05. April 2014 · Categories: Software

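The Cortex-M0 lacks several instructions that the Cortex-M3 core has. One of them is SMULL, the signed 32 × 32 → 64-bit multiply. This instruction is extremely helpful for fixed-point arithmetic with more than 16 bits, because the product of two such numbers no longer fits into 32 bits. Compilers typically emulate it in software, so on the M0 you can still write int64_t z = (int64_t)x * y; in C.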
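Here is a minimal sketch of such a fixed-point use case. It assumes a Q16.16 format (16 integer and 16 fractional bits), which is chosen for illustration and does not come from the original. The names q16_16, Q16_ONE and q16_mul are hypothetical. The widening multiply is the operation that SMULL performs in one instruction on the Cortex-M3 and that has to be emulated on the M0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 type: value = raw / 65536. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)

/* Multiply two Q16.16 numbers. The full 64-bit product carries 32
   fractional bits, and dropping the lowest 16 brings it back to Q16.16.
   Assumes the result fits into Q16.16 (no overflow handling). */
q16_16 q16_mul(q16_16 x, q16_16 y)
{
    int64_t z = (int64_t)x * y;   /* signed 32 x 32 -> 64, i.e. SMULL */
    /* Arithmetic shift on GCC and Clang (implementation-defined in ISO C
       for negative z), so negative results round toward minus infinity. */
    return (q16_16)(z >> 16);
}
```

For example, q16_mul(3 * Q16_ONE / 2, 2 * Q16_ONE) gives 3 * Q16_ONE (1.5 × 2 = 3). The intermediate raw product is 0x300000000, which would not survive a plain 32-bit multiply.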

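To implement its function, int64_t z = (int64_t)x * y, on the M0, we have to add up individual partial products. The M0's MULS instruction only returns the lower 32 bits of a product, so we split each operand into halfwords. Write x = [a,b] and y = [c,d], where a and c are the signed upper halfwords and b and d are the unsigned lower halfwords, so that x = a·2^16 + b and y = c·2^16 + d. Every partial product then fits into 32 bits. Let ~ denote a sign-extended halfword and 0 a zeroed halfword. The 64-bit product x × y, or [a,b] × [c,d], can then be calculated as 00[b*d] + ~[~a*d]0 + ~[~c*b]0 + [~a*~c]00.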
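The decomposition can be checked in portable C before moving to assembly. The following is a sketch, not the author's routine. The names smull_emulated and add_middle are hypothetical. The code multiplies only halfword operands, so every product fits in 32 bits. It keeps the 64-bit result in two 32-bit words, lo and hi, and propagates the carry between them explicitly, as the assembly has to.

```c
#include <assert.h>
#include <stdint.h>

/* Add the sign-extended middle term ~[p]0, i.e. p * 2^16, to hi:lo. */
static void add_middle(uint32_t *lo, uint32_t *hi, int32_t p)
{
    uint32_t pu   = (uint32_t)p;
    uint32_t p_lo = pu << 16;          /* low halfword of p -> upper half of lo */
    uint32_t p_hi = pu >> 16;          /* high halfword of p -> lower half of hi */
    if (p < 0)
        p_hi |= 0xFFFF0000u;           /* the "~": sign-extend into hi */
    uint32_t old = *lo;
    *lo += p_lo;
    *hi += p_hi + (*lo < old);         /* carry out of lo, if any */
}

/* Hypothetical helper: signed 32 x 32 -> 64 multiply from 16-bit pieces. */
int64_t smull_emulated(int32_t x, int32_t y)
{
    /* x = [a,b], y = [c,d]: a, c signed upper halfwords, b, d unsigned lower. */
    uint32_t b = (uint32_t)x & 0xFFFFu;
    uint32_t d = (uint32_t)y & 0xFFFFu;
    int32_t  a = (x - (int32_t)b) / 65536;   /* exact division == x >> 16 */
    int32_t  c = (y - (int32_t)d) / 65536;

    /* Every partial product fits into 32 bits. */
    uint32_t bd = b * d;                     /* 00[b*d]   */
    int32_t  ad = a * (int32_t)d;            /* ~[~a*d]0  */
    int32_t  cb = c * (int32_t)b;            /* ~[~c*b]0  */
    int32_t  ac = a * c;                     /* [~a*~c]00 */

    /* Middle terms first. */
    uint32_t lo = 0, hi = 0;
    add_middle(&lo, &hi, ad);
    add_middle(&lo, &hi, cb);

    /* Then the outer terms: b*d into lo (with carry), a*c into hi. */
    uint32_t old = lo;
    lo += bd;
    hi += (lo < old) + (uint32_t)ac;

    /* Reassemble hi:lo as a signed value without implementation-defined
       unsigned-to-signed conversions. */
    int64_t h = (hi & 0x80000000u) ? (int64_t)hi - 4294967296LL : (int64_t)hi;
    return h * 4294967296LL + (int64_t)lo;
}
```

Comparing smull_emulated against (int64_t)x * y for corner cases such as INT32_MIN × INT32_MIN or INT32_MAX × INT32_MIN is a cheap way to validate the halfword bookkeeping before translating it into Thumb instructions.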

To do this efficiently in assembly, we have to account for two constraints. Most Thumb instructions on the M0 can only access the eight low registers r0–r7, and a carry has to be propagated from the lower to the upper word. So we start by calculating the middle terms and add the remaining ones at the end. The following code takes its parameters in r0 and r1 and returns the product in r1:r0.