Performing Fixed-Point Arithmetic
Fixed-Point Arithmetic
Addition and Subtraction
Whenever you add two fixed-point numbers, you may need a carry bit to correctly represent the result. For this reason, when adding two B-bit numbers (with the same scaling), the resulting value has an extra bit compared to the two operands used.
a = fi(0.234375,0,4,6);
c = a+a
c =
    0.4688
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Unsigned
            WordLength: 5
        FractionLength: 6
a.bin
ans = 1111
c.bin
ans = 11110
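The carry bit can also be seen in the stored integers. The following is a minimal sketch, assuming Fixed-Point Designer is installed; the variable names ia, ic, va, and vc are illustrative only.

```matlab
a = fi(0.234375, 0, 4, 6);   % unsigned, 4-bit word, 6 fraction bits
c = a + a;                   % full-precision sum under the default fimath

ia = storedInteger(a);       % 15, binary 1111
ic = storedInteger(c);       % 30, binary 11110 (one extra bit for the carry)

% Real-world value = stored integer * 2^(-FractionLength)
va = double(ia) * 2^(-a.FractionLength);
vc = double(ic) * 2^(-c.FractionLength);
```

Both operands share the same scaling, so the sum keeps the fraction length of 6 and only the word length grows.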
If you add or subtract two numbers with different precision, the radix points must first be aligned before the operation is performed. As a result, the word length of the result can exceed the word length of either operand by more than one bit.
a = fi(pi,1,16,13);
b = fi(0.1,1,12,14);
c = a + b
c =
    3.2416
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 18
        FractionLength: 14
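One way to see where the 18 bits come from is to align the radix points by hand. This sketch assumes both operands are signed and that the default full-precision SumMode is in effect; intBits and fracBits are illustrative names.

```matlab
a = fi(pi, 1, 16, 13);
b = fi(0.1, 1, 12, 14);
c = a + b;

% Bits to the left of the radix point (sign included), plus one carry bit
intBits = max(a.WordLength - a.FractionLength, ...
              b.WordLength - b.FractionLength) + 1;

% Bits to the right of the radix point after alignment
fracBits = max(a.FractionLength, b.FractionLength);

expectedWordLength = intBits + fracBits;   % 4 + 14
```

Here a contributes the integer bits and b contributes the fraction bits, which is why the result is wider than either operand by more than one bit.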
Multiplication
In general, a full precision product requires a word length equal to the sum of the word lengths of the operands. In the following example, note that the word length of the product c is equal to the word length of a plus the word length of b. The fraction length of c is also equal to the fraction length of a plus the fraction length of b.
a = fi(pi,1,20), b = fi(exp(1),1,16)
a =
    3.1416
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 20
        FractionLength: 17
b =
    2.7183
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 13
c = a*b
c =
    8.5397
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 36
        FractionLength: 30
Math with Other Built-In Data Types
Note that in C, the result of an operation between an integer data type and a double data type promotes to a double. However, in MATLAB®, the result of an operation between a built-in integer data type and a double data type is an integer. In this respect, the fi object behaves like the built-in integer data types in MATLAB.
When doing addition between fi and double, the double is cast to a fi with the same numerictype as the fi input. The result of the operation is a fi. When doing multiplication between fi and double, the double is cast to a fi with the same word length and signedness as the fi, and best precision fraction length. The result of the operation is a fi.
a = fi(pi)
a =
    3.1416
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 13
b = 0.5 * a
b =
    1.5708
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 32
        FractionLength: 28
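The casting rules for double operands can be made explicit by building the intermediate fi by hand. This is a sketch under the default fimath settings; k, b1, b2, s1, and s2 are illustrative names.

```matlab
a = fi(pi);                      % s16,13 under the default settings

% Multiplication: the double is cast with a's word length and signedness,
% using best-precision scaling, which for 0.5 is 15 fraction bits.
k  = fi(0.5, 1, 16);
b1 = 0.5 * a;
b2 = k * a;                      % explicit equivalent of b1

% Addition: the double is first cast to a's numerictype (s16,13).
s1 = a + 0.5;
s2 = a + fi(0.5, 1, 16, 13);     % explicit equivalent of s1
```

The product fraction length of 28 is the sum of 13 and 15, and the sum gains one carry bit over the 16-bit operands.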
When doing arithmetic between a fi and one of the built-in integer data types, [u]int[8, 16, 32], the word length and signedness of the integer are preserved. The result of the operation is a fi.
a = fi(pi);
b = int8(2) * a
b =
    6.2832
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 24
        FractionLength: 13
When doing arithmetic between a fi and a logical data type, the logical is treated as an unsigned fi object with a value of 0 or 1, and word length 1. The result of the operation is a fi object.
a = fi(pi);
b = logical(1);
c = a*b
c =
    3.1416
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 17
        FractionLength: 13
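The integer and logical rules both reduce to an ordinary full-precision product once the built-in operand is written as an explicit fi. The sketch below assumes the default fimath; bExplicit and cExplicit are illustrative names, and the explicit fi objects are hand-built stand-ins for the implicit casts described above.

```matlab
a = fi(pi);                        % s16,13

b = int8(2) * a;                   % int8 is treated as signed, 8 bits, 0 fraction bits
bExplicit = fi(2, 1, 8, 0) * a;    % same product with the cast written out

c = a * true;                      % logical is treated as unsigned, 1 bit
cExplicit = a * fi(1, 0, 1, 0);    % same product with the cast written out
```

In both cases the product word length is the sum of the operand word lengths: 8 + 16 = 24 and 16 + 1 = 17.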
The fimath Object
fimath properties define the rules for performing arithmetic operations on fi objects, including math, rounding, and overflow properties. A fi object can have a local fimath object, or it can use the default fimath properties. You can attach a fimath object to a fi object by using setfimath. Alternatively, you can specify fimath properties in the fi constructor at creation. When a fi object has a local fimath, rather than using the default properties, the display of the fi object shows the fimath properties. In this example, a has the ProductMode property specified in the constructor.
a = fi(5,1,16,4,'ProductMode','KeepMSB')
a =
     5
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 4
        RoundingMethod: Nearest
        OverflowAction: Saturate
           ProductMode: KeepMSB
     ProductWordLength: 32
               SumMode: FullPrecision
The ProductMode property of a is set to KeepMSB, while the remaining fimath properties use the default values.
Note
For more information on the fimath object, its properties, and their default values, see fimath Object Properties.
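Attaching and removing a local fimath after construction might look like the following sketch. The ProductWordLength of 24 is an arbitrary value chosen for illustration, and hadLocal and hasLocal are illustrative names.

```matlab
a = fi(5, 1, 16, 4);
hadLocal = isfimathlocal(a);     % false: a uses the default fimath

F = fimath('ProductMode', 'KeepMSB', 'ProductWordLength', 24);
a = setfimath(a, F);
hasLocal = isfimathlocal(a);     % true: a now carries F

p = a * a;                       % product word length follows F, not 16 + 16

a = removefimath(a);             % a goes back to the default fimath
```

Checking isfimathlocal before arithmetic is a quick way to tell which rules an operation will follow.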
Bit Growth
The following table shows the bit growth of fi objects, A and B, when their SumMode and ProductMode properties use the default fimath value, FullPrecision.
Property | A | B | Sum = A+B | Prod = A*B |
---|---|---|---|---|
Format | fi(vA,s1,w1,f1) | fi(vB,s2,w2,f2) | n/a | n/a |
Sign | s1 | s2 | Ssum = (s1 \|\| s2) | Sproduct = (s1 \|\| s2) |
Integer bits | I1 = w1-f1-s1 | I2 = w2-f2-s2 | Isum = max(w1-f1, w2-f2) + 1 - Ssum | Iproduct = (w1 + w2) - (f1 + f2) |
Fraction bits | f1 | f2 | Fsum = max(f1, f2) | Fproduct = f1 + f2 |
Total bits | w1 | w2 | Ssum + Isum + Fsum | w1 + w2 |
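The table formulas can be evaluated with plain arithmetic, without creating any fi objects. The sketch below plugs in the formats s16,13 and s12,14 used in the earlier addition example; all variable names are illustrative.

```matlab
% Operand formats: signedness (1 = signed), word length, fraction length
s1 = 1; w1 = 16; f1 = 13;
s2 = 1; w2 = 12; f2 = 14;

% Sum column of the table
Ssum = double(s1 || s2);
Isum = max(w1 - f1, w2 - f2) + 1 - Ssum;
Fsum = max(f1, f2);
Wsum = Ssum + Isum + Fsum;       % total bits of A+B

% Product column of the table
Fprod = f1 + f2;
Wprod = w1 + w2;                 % total bits of A*B
```

For these operands the sum needs 18 bits, matching the word length reported for a + b earlier, and the product needs 28 bits with 27 fraction bits.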
This example shows how bit growth can occur in a for-loop.
T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
for n = 1:length(x)
    acc = acc + x(n)
end
acc =
     1
      s33,0
acc =
     3
      s34,0
acc =
     6
      s35,0
The word length of acc increases with each iteration of the loop. This increase causes two problems. First, code generation does not allow a variable's data type to change in a loop. Second, if the loop is long enough, you run out of memory in MATLAB. See Controlling Bit Growth for some strategies to avoid this problem.
Controlling Bit Growth
Using fimath
By specifying the fimath properties of a fi object, you can control the bit growth as operations are performed on the object.
F = fimath('SumMode', 'SpecifyPrecision', 'SumWordLength', 8,...
    'SumFractionLength', 0);
a = fi(8,1,8,0, F);
b = fi(3, 1, 8, 0);
c = a+b
c =
    11
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 8
        FractionLength: 0
        RoundingMethod: Nearest
        OverflowAction: Saturate
           ProductMode: FullPrecision
               SumMode: SpecifyPrecision
         SumWordLength: 8
     SumFractionLength: 0
         CastBeforeSum: true
The fi object a has a local fimath object F. F specifies the word length and fraction length of the sum. Under the default fimath settings, the output, c, would have word length 9 and fraction length 0. However, because a has a local fimath object, the resulting fi object has word length 8 and fraction length 0.
You can also use fimath properties to control bit growth in a for-loop.
F = fimath('SumMode', 'SpecifyPrecision','SumWordLength',32,...
    'SumFractionLength',0);
T.acc = fi([],1,32,0,F);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
for n = 1:length(x)
    acc = acc + x(n)
end
acc =
     1
      s32,0
acc =
     3
      s32,0
acc =
     6
      s32,0
Unlike when T.acc was using the default fimath properties, the bit growth of acc is now restricted. Thus, the word length of acc stays at 32.
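Because acc carries the local fimath F, any later arithmetic on it also follows F. If that is not wanted, one option is to strip the fimath once the loop is done. The following is a sketch of that pattern, not something the loop itself requires.

```matlab
F = fimath('SumMode', 'SpecifyPrecision', 'SumWordLength', 32, ...
    'SumFractionLength', 0);
T.acc = fi([], 1, 32, 0, F);
T.x = fi([], 1, 16, 0);

x = cast(1:3, 'like', T.x);
acc = zeros(1, 1, 'like', T.acc);
for n = 1:length(x)
    acc = acc + x(n);            % sum is kept at s32,0 by F
end

% Hand the result back with default fimath behavior
acc = removefimath(acc);
```

After removefimath, acc keeps its value and its s32,0 numerictype, but subsequent operations use the default fimath properties.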
Subscripted Assignment
Another way to control bit growth is by using subscripted assignment. a(I) = b assigns the values of b into the elements of a specified by the subscript vector, I, while retaining the numerictype of a.
T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
% Assign into acc without changing its type
for n = 1:length(x)
    acc(:) = acc + x(n)
end
acc(:) = acc + x(n) dictates that the values at subscript vector, (:), change. However, the numerictype of output acc remains the same. Because acc is a scalar, you also receive the same output if you use (1) as the subscript vector.
for n = 1:numel(x)
    acc(1) = acc + x(n);
end
acc =
     1
      s32,0
acc =
     3
      s32,0
acc =
     6
      s32,0
The numerictype of acc remains the same at each iteration of the for-loop.
Subscripted assignment can also help you control bit growth in a function. In the function, cumulative_sum, the numerictype of y does not change, but the values in the elements specified by n do.
function y = cumulative_sum(x)
% CUMULATIVE_SUM Cumulative sum of elements of a vector.
%
%   For vectors, Y = cumulative_sum(X) is a vector containing the
%   cumulative sum of the elements of X. The type of Y is the type of X.
y = zeros(size(x),'like',x);
y(1) = x(1);
for n = 2:length(x)
    y(n) = y(n-1) + x(n);
end
end
y = cumulative_sum(fi([1:10],1,8,0))
y =
     1     3     6    10    15    21    28    36    45    55
          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 8
        FractionLength: 0
Note
For more information on subscripted assignment, see the subsasgn function.
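Holding the type fixed means a running sum can exceed the range of that type. The sketch below repeats the cumulative-sum loop inline on a longer input, assuming the default fimath (OverflowAction set to Saturate), to show what happens once the total passes 127.

```matlab
x = fi(1:20, 1, 8, 0);           % s8,0 holds integers in [-128, 127]
y = zeros(size(x), 'like', x);
y(1) = x(1);
for n = 2:length(x)
    % The right-hand side is computed in full precision, then quantized
    % to the type of y on assignment.
    y(n) = y(n-1) + x(n);
end
```

The first 15 partial sums fit (1 + 2 + ... + 15 = 120). From the 16th element on, the true sum exceeds 127, so each assignment saturates and the remaining elements of y are all 127.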
accumpos and accumneg
Another way you can control bit growth is by using the accumpos and accumneg functions to perform addition and subtraction operations. Similar to using subscripted assignment, accumpos and accumneg preserve the data type of one of their input fi objects while allowing you to specify a rounding method and overflow action in the input values.
For more information on how to implement accumpos and accumneg, see Avoid Multiword Operations in Generated Code.
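As a minimal sketch, the accumulator loop from the earlier examples could be written with accumpos as follows. Only the two-argument forms are used here, so the rounding method and overflow action are left at the functions' defaults; d is an illustrative name.

```matlab
x = fi(1:3, 1, 16, 0);
acc = fi(0, 1, 32, 0);

for n = 1:length(x)
    acc = accumpos(acc, x(n));   % add x(n), keeping the data type of acc
end

d = accumneg(acc, x(1));         % subtract x(1), keeping the data type of acc
```

The accumulator stays at 32 bits throughout, with no local fimath and no subscripted assignment required.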
Overflows and Rounding
When performing fixed-point arithmetic, consider the possibility and consequences
of overflow. The fimath
object specifies the overflow and
rounding modes used when performing arithmetic operations.
Overflows
Overflows can occur when the result of an operation exceeds the maximum or minimum representable value. The fimath object has an OverflowAction property, which offers two ways of dealing with overflows: saturation and wrap. If you set OverflowAction to Saturate, overflows are saturated to the maximum or minimum value in the range. If you set OverflowAction to Wrap, any overflows wrap using modulo arithmetic, if unsigned, or two’s complement wrap, if signed.
For more information on how to detect overflow, see Underflow and Overflow Logging Using fipref.
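The two overflow actions are easy to compare by quantizing a value that does not fit into 8 bits. The sketch below sets OverflowAction in the fi constructor; sat, wrapS, and wrapU are illustrative names.

```matlab
% Signed 8-bit integers cover [-128, 127]; unsigned cover [0, 255]
sat   = fi(200, 1, 8, 0);                             % default action: Saturate
wrapS = fi(200, 1, 8, 0, 'OverflowAction', 'Wrap');   % two's complement wrap
wrapU = fi(300, 0, 8, 0, 'OverflowAction', 'Wrap');   % modulo 2^8 wrap
```

Saturation clips 200 to 127. Signed wrap yields 200 - 256 = -56, and unsigned wrap yields 300 - 256 = 44.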
Rounding
There are several factors to consider when choosing a rounding method, including cost, bias, and whether or not there is a possibility of overflow. Fixed-Point Designer™ software offers several different rounding functions to meet the requirements of your design.
Rounding Method | Description | Cost | Bias | Possibility of Overflow |
---|---|---|---|---|
ceil | Rounds to the closest representable number in the direction of positive infinity. | Low | Large positive | Yes |
convergent | Rounds to the closest representable number. In the case of a tie, convergent rounds to the nearest even number. This approach is the least-biased rounding method provided by the toolbox. | High | Unbiased | Yes |
floor | Rounds to the closest representable number in the direction of negative infinity, equivalent to two’s complement truncation. | Low | Large negative | No |
nearest | Rounds to the closest representable number. In the case of a tie, nearest rounds to the closest representable number in the direction of positive infinity. This rounding method is the default for fi object creation and fi arithmetic. | Moderate | Small positive | Yes |
round | Rounds to the closest representable number. In the case of a tie, round rounds positive numbers in the direction of positive infinity and negative numbers in the direction of negative infinity. | High | Small negative for negative samples; unbiased for samples with evenly distributed positive and negative values; small positive for positive samples | Yes |
fix | Rounds to the closest representable number in the direction of zero. | Low | Large positive for negative samples; unbiased for samples with evenly distributed positive and negative values; large negative for positive samples | No |
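The tie-breaking behavior in the table can be checked by quantizing 2.5 and -2.5 to integers with each method. This sketch assumes the fimath RoundingMethod values Ceiling, Convergent, Floor, Nearest, Round, and Zero correspond to the ceil, convergent, floor, nearest, round, and fix rows respectively; names and q are illustrative.

```matlab
v = [2.5 -2.5];                  % both values are exact ties at 0 fraction bits
names = {'Ceiling', 'Convergent', 'Floor', 'Nearest', 'Round', 'Zero'};

q = zeros(numel(names), numel(v));
for k = 1:numel(names)
    % Quantize to signed 8-bit integers using the k-th rounding method
    q(k, :) = double(fi(v, 1, 8, 0, 'RoundingMethod', names{k}));
end
```

Each row of q holds the results for one method, in the order listed in names. Comparing the Nearest and Round rows shows that the two methods differ only on the negative tie.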