HDL Filter Block Properties

AdderTreePipeline

This property applies to frame-based filters. It specifies how many pipeline registers the architecture includes between levels of the adder tree. These pipeline stages increase filter throughput while adding latency. The default value is 0. To improve the speed of this architecture, the recommended setting is 2.

Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the code generator automatically adds balancing delays on parallel data paths.

For more information on the frame-based filter architecture, see Frame-Based Architecture.
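
As a rough illustration, the property can be set from the command line with hdlset_param. The model name and block path below are hypothetical placeholders; the value 2 is the recommended setting mentioned above.

```matlab
% Hypothetical model and block path; substitute names from your own design.
mdl = 'myFilterModel';
blk = [mdl '/Discrete FIR Filter'];

load_system(mdl);                            % the model must be loaded first
hdlset_param(blk, 'AdderTreePipeline', 2);   % two registers between adder tree levels
hdlget_param(blk, 'AdderTreePipeline')       % display the stored value to confirm
```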

AddPipelineRegisters

This property applies to scalar input filters. When you enable this property, the default linear adder of the filter is implemented as a pipelined tree adder instead. This architecture increases filter throughput while adding latency. The default value is off.

The following limitations apply to AddPipelineRegisters:

If you use AddPipelineRegisters, the code generator forces full precision in the HDL and the generated filter model. This option implements a pipelined adder tree structure in the HDL code for which only full precision is supported. If you generate a validation model, you must use full precision in the original model to avoid validation mismatches.
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the code generator automatically adds balancing delays on parallel data paths.
Note
When you use this property with the CIC Interpolation (DSP System Toolbox) block, delays in parallel paths are not automatically balanced. Manually add delays where required by your design.

For filter architecture diagrams that indicate where the pipeline stages are added, see HDL Filter Architectures.

ChannelSharing

You can use the ChannelSharing implementation parameter with a multichannel filter to share a single filter implementation among channels for a more area-efficient design. This parameter is either 'on' or 'off'. The default is 'off', in which case a separate filter is implemented for each channel.

See Multichannel FIR Filter for FPGA (DSP System Toolbox).

CoeffMultipliers

The CoeffMultipliers implementation parameter lets you specify use of canonical signed digit (CSD) or factored CSD optimizations for processing coefficient multiplier operations in code generated for certain filter blocks. Specify the CoeffMultipliers parameter using one of the following options:

'csd': Use CSD techniques to replace multiplier operations with shift-and-add operations. CSD techniques minimize the number of addition operations required for constant multiplication by representing binary numbers with a minimum count of nonzero digits. This representation decreases the area used by the filter while maintaining or increasing clock speed.
'factored-csd': Use factored CSD techniques, which replace multiplier operations with shift-and-add operations on prime factors of the coefficients. This option lets you achieve a greater filter area reduction than CSD, at the cost of decreasing clock speed.
'multipliers' (default): Retain multiplier operations.

HDL Coder™ supports CoeffMultipliers for fully parallel filter implementations. The parameter is not supported for fully serial and partly serial architectures.
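
To make the shift-and-add idea concrete, the following sketch recodes one integer coefficient into signed digits (-1, 0, 1) using the standard non-adjacent form recoding, then rebuilds the product from shifts and adds. It is illustrative only and is not HDL Coder's internal algorithm. The coefficient 119 and input 5 are arbitrary example values.

```matlab
% Sketch: recode a positive integer coefficient into canonical signed digits.
c = 119;                 % example coefficient, binary 1110111 (six nonzero bits)
digits = [];             % CSD digits, least significant first, each in {-1, 0, 1}
k = c;
while k ~= 0
    if mod(k, 2) == 1
        d = 2 - mod(k, 4);   % +1 when mod(k,4) == 1, -1 when mod(k,4) == 3
    else
        d = 0;
    end
    digits(end+1) = d;       %#ok<SAGROW>
    k = (k - d) / 2;
end

% Rebuild the product x*c using only shifts (powers of two) and adds.
x = 5;
shifts = 0:numel(digits)-1;
y = sum(digits .* (x * 2.^shifts));

fprintf('CSD digits (LSB first): %s\n', mat2str(digits));
fprintf('nonzero digits: %d (binary has %d)\n', nnz(digits), nnz(dec2bin(c) == '1'));
assert(y == x * c);
```

Here 119 is recoded as 128 - 8 - 1, so three add or subtract terms replace the six nonzero bits of the plain binary form.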

DALUTPartition

The size of the LUT grows exponentially with the order of the filter. For a filter with N coefficients, the LUT must have 2^N values. For higher order filters, LUT size must be reduced to reasonable levels. To reduce the size, you can subdivide the LUT into a number of LUTs, called LUT partitions. Each LUT partition operates on a different set of taps. The results obtained from the partitions are summed.

For example, for a 160-tap filter, the LUT size is (2^160)*W bits, where W is the word size of the LUT data. Dividing this into 16 LUT partitions, each taking 10 inputs (taps), the total LUT size is reduced to 16*(2^10)*W bits.

Although LUT partitioning reduces LUT size, more adders are required to sum the LUT data.
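
The arithmetic in this example can be checked with a short script. The word size W = 16 is an assumed value chosen only for illustration, and the adder count assumes one adder per pairwise sum of partition outputs.

```matlab
taps = 160;                        % filter length from the example above
W = 16;                            % LUT word size in bits (assumed value)
partition = 10 * ones(1, 16);      % 16 partitions of 10 taps each

assert(sum(partition) == taps);    % the partitions must cover every tap

unpartitionedBits = 2^taps * W;            % (2^160)*W
partitionedBits = sum(2.^partition) * W;   % 16*(2^10)*W
sumAdders = numel(partition) - 1;          % assumed: N-1 adders to sum N partition outputs

fprintf('unpartitioned: %.3g bits\n', unpartitionedBits);
fprintf('partitioned:   %d bits\n', partitionedBits);
fprintf('adders to sum partitions: %d\n', sumAdders);
```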

You can use DALUTPartition to enable DA code generation and specify the number and size of LUT partitions.

Specify LUT partitions as a vector of integers [p1 p2...pN] where:

N is the number of partitions.
Each vector element specifies the size of a partition. The maximum size for an individual partition is 12.
The sum of all vector elements equals the filter length FL. FL is calculated differently depending on the filter type. The following sections describe how FL is calculated for single-rate and multirate filters.

See Distributed Arithmetic for HDL Filters.
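
The three rules above can be checked before code generation with a few lines of MATLAB. This is a minimal sketch; the filter length and candidate partition are assumed example values, and the variable names are illustrative.

```matlab
FL = 16;                    % filter length (assumed example value)
partition = [6 5 5];        % candidate DALUTPartition vector

% Rule 1: a vector of positive integers
isIntegerVector = isvector(partition) && ...
    all(partition == floor(partition)) && all(partition > 0);
% Rule 2: no individual partition larger than 12
withinMaxSize = all(partition <= 12);
% Rule 3: the elements sum to the filter length
sumsToLength = sum(partition) == FL;

isValid = isIntegerVector && withinMaxSize && sumsToLength;
fprintf('partition %s valid for FL = %d: %d\n', mat2str(partition), FL, isValid);
```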

Specifying DALUTPartition for Single-Rate Filters

To determine the LUT partition for one of the supported single-rate filter types, calculate FL as shown in the following table. Then, specify the partition as a vector whose elements sum to FL.

Filter Type	Filter Length (FL) Calculation
Direct-form FIR	`FL = length(find(Hd.numerator ~= 0))`
Direct-form asymmetrical FIR, direct-form symmetrical FIR	`FL = ceil(length(find(Hd.numerator ~= 0))/2)`

You can also specify generation of DA code for your filter design without LUT partitioning. To do so, specify a vector of one element, whose value is equal to the filter length.
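
As a sketch of the calculations in the table, the following script applies the same expressions to a plain coefficient vector b, which stands in for Hd.numerator. The coefficient values are assumed for illustration.

```matlab
b = [0.1 0 0.25 0.4 0.4 0.25 0 0.1];    % example symmetric numerator (assumed)

nonzeroTaps = length(find(b ~= 0));      % same expression as the table, applied to b
FL_direct = nonzeroTaps;                 % direct-form FIR
FL_symmetric = ceil(nonzeroTaps / 2);    % direct-form symmetrical FIR

partitionDirect = [3 3];                 % two LUT partitions whose sizes sum to FL_direct
partitionUnpartitioned = FL_direct;      % one-element vector: DA without LUT partitioning

assert(sum(partitionDirect) == FL_direct);
fprintf('FL (direct form) = %d, FL (symmetrical) = %d\n', FL_direct, FL_symmetric);
```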

Specifying DALUTPartition for Multirate Filters

For supported multirate filters (FIR Decimation and FIR Interpolation), you can specify the LUT partition as

A vector defining a partition for LUTs for all polyphase subfilters.
A matrix of LUT partitions, where each row vector specifies a LUT partition for a corresponding polyphase subfilter. In this case, the FL is uniform for all subfilters. This approach provides fine control for partitioning each subfilter.

The following table shows the FL calculations for each type of LUT partition.

LUT Partition	Filter Length (FL) Calculation
Vector: Determine `FL` as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition as a vector of integers whose elements sum to `FL`.	FL = size(polyphase(Hm), 2)
Matrix: Determine the subfilter length `FLi` based on the polyphase decomposition of the filter, as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition for each subfilter as a row vector whose elements sum to `FLi`.	`p = polyphase(Hm); FLi = length(find(p(i,:)));` where `i` is the index to the ith row of the polyphase matrix of the multirate filter. The ith row of the matrix `p` represents the ith subfilter.
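
The following sketch reproduces both calculations without a filter object. It builds the polyphase matrix by hand, assuming that row i holds the coefficients b(i:M:end) padded with zeros, which is an assumption about the layout that polyphase(Hm) returns. The coefficients and decimation factor are example values.

```matlab
b = 1:10;                      % example numerator (assumed), no zero coefficients
M = 3;                         % decimation factor (assumed)

% Build the polyphase matrix by hand; row i holds subfilter i.
cols = ceil(numel(b) / M);
padded = [b, zeros(1, M*cols - numel(b))];
p = reshape(padded, M, cols);

FL = size(p, 2);               % vector form: one partition shared by all subfilters
FLi = zeros(1, M);
for i = 1:M
    FLi(i) = length(find(p(i,:)));   % matrix form: length of the ith subfilter
end

fprintf('FL = %d, FLi = %s\n', FL, mat2str(FLi));
```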

DARadix

The inherently bit-serial nature of DA can limit throughput. To improve throughput, the basic DA algorithm can be modified to compute more than one bit sum at a time. The number of simultaneously computed bit sums is expressed as a power of two called the DA radix. For example, a DA radix of 2 (2^1) indicates that one bit sum is computed at a time. A DA radix of 4 (2^2) indicates that two bit sums are computed at a time, and so on.

To compute more than one bit sum at a time, the LUT is replicated. For example, to perform DA on 2 bits at a time (radix 4), the odd bits are fed to one LUT and the even bits are simultaneously fed to an identical LUT. The LUT results corresponding to odd bits are left-shifted before they are added to the LUT results corresponding to even bits. This result is then fed into a scaling accumulator that shifts its feedback value by 2 places.

Processing more than one bit at a time introduces a degree of parallelism into the operation, improving speed at the expense of area.

You can use DARadix to specify the number of bits processed simultaneously in DA. The number of bits is expressed as N, which must be:

A nonzero positive integer that is a power of two
Such that mod(W, log2(N)) = 0, where W is the input word size of the filter

The default value for N is 2, specifying processing of one bit at a time, or fully serial DA, which is slow but low in area. The maximum value for N is 2^W, where W is the input word size of the filter. This maximum specifies fully parallel DA, which is fast but high in area. Values of N between these extrema specify partly serial DA.
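
The constraints on N can be enumerated for a given word size. The following sketch lists every DARadix value that satisfies them; the word size W = 12 is an assumed example value.

```matlab
W = 12;                           % input word size in bits (assumed)
bitsPerStep = 1:W;                % number of bit sums computed at a time
candidates = 2.^bitsPerStep;      % powers of two from 2 (default) up to 2^W (maximum)

% Keep the radix values N for which mod(W, log2(N)) = 0.
valid = candidates(mod(W, bitsPerStep) == 0);
disp(valid);                      % valid is [2 4 8 16 64 4096]
```

The first entry corresponds to fully serial DA, the last to fully parallel DA, and the values in between to partly serial DA.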

Note

When setting a DARadix value for symmetrical and asymmetrical filters, see Considerations for Symmetrical and Asymmetrical Filters.

See Distributed Arithmetic for HDL Filters.

FoldingFactor

FoldingFactor specifies the total number of clock cycles taken for the computation of filter output in an IIR SOS filter with serial architecture. It is complementary with NumMultipliers. You must select one property or the other; you cannot use both. If you do not specify either FoldingFactor or NumMultipliers, HDL code for the filter is generated with fully parallel architecture.

MultiplierInputPipeline

You can use this parameter to generate a specified number of pipeline stages at multiplier inputs for FIR filter structures. The default value is 0.

The following limitation applies to MultiplierInputPipeline:

Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the code generator automatically adds balancing delays on parallel data paths.

For diagrams of where these pipeline stages occur in the filter architecture, see HDL Filter Architectures.

MultiplierOutputPipeline

You can use this parameter to generate a specified number of pipeline stages at multiplier outputs for FIR filter structures. The default value is 0.

The following limitation applies to MultiplierOutputPipeline:

Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the code generator automatically adds balancing delays on parallel data paths.

For diagrams of where these pipeline stages occur in the filter architecture, see HDL Filter Architectures.

NumMultipliers

NumMultipliers specifies the total number of multipliers used for the filter implementation in an IIR SOS filter with serial architecture. It is complementary with FoldingFactor. You must select one property or the other; you cannot use both. If you do not specify either FoldingFactor or NumMultipliers, HDL code for the filter is generated with fully parallel architecture.

ReuseAccum

You can use this parameter to enable or disable accumulator reuse in a serial HDL architecture. The default is a fully parallel architecture.

To Generate This Architecture...	Set ReuseAccum to...
Fully parallel	Omit this property
Fully serial	Not specified, or `'off'`
Partly serial	`'off'`
Cascade-serial with explicitly specified partitioning	`'on'`
Cascade-serial with automatically optimized partitioning	`'on'`

For more information on parallel and serial filter architectures, see HDL Filter Architectures.

SerialPartition

Use this parameter to specify partitions for a serial filter architecture. The default is a fully parallel architecture.

To Generate This Architecture...	Set SerialPartition to...
Fully parallel	Omit this property
Fully serial	`N`, where `N` is the length of the filter
Partly serial	`[p1 p2 p3...pN]`: A vector of integers having `N` elements, where `N` is the number of serial partitions. Each element of the vector specifies the length of the corresponding partition. The sum of the vector elements must be equal to the length of the filter. When you define the partitioning for a partly serial architecture, consider the following: The filter length should be divided as uniformly as possible into a vector equal in length to the number of multipliers intended. For example, if your design requires a filter of length 9 with 2 multipliers, the recommended partition is `[5 4]`. If your design requires 3 multipliers, the recommended partition is `[3 3 3]` rather than some less uniform division such as `[1 4 4]` or `[3 4 2]`. If your design is constrained by the need to compute each output value (corresponding to each input value) in an exact number `N` of clock cycles, use `N` as the largest partition size and partition the other elements as uniformly as possible. For example, if the filter length is 9 and your design requires exactly 4 cycles to compute the output, define the partition as `[4 3 2]`. This partition executes in 4 clock cycles, at the cost of 3 multipliers.
Cascade-serial with explicitly specified partitioning	`[p1 p2 p3...pN]`: A vector of `N` integers, where `N` is the number of serial partitions. Each element of the vector specifies the length of the corresponding partition. The sum of the vector elements must be equal to the length of the filter. The values of the vector elements must be in descending order, except the last two elements, which can be equal. For example, for a filter length of 8, partitions `[5 3]` or `[4 2 2]` are valid, but the partitions `[2 2 2 2]` and `[3 2 3]` raise an error at code generation time.
Cascade-serial with automatically optimized partitioning	Omit this property.

For more information on parallel and serial filter architectures, see HDL Filter Architectures.
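
The partitioning guidance in the table can be sketched in MATLAB. The first part splits a filter length as uniformly as possible across a chosen number of multipliers. The second part checks a cascade-serial partition, reading "descending order" as strictly descending except for the last two elements, which is an interpretation based on the examples in the table. The helper name isCascadeValid is hypothetical.

```matlab
% Partly serial: divide the filter length as uniformly as possible.
filterLength = 9;
numMultipliers = 2;
base = floor(filterLength / numMultipliers);
extra = mod(filterLength, numMultipliers);     % partitions that get one extra tap
partition = [repmat(base + 1, 1, extra), repmat(base, 1, numMultipliers - extra)];
fprintf('uniform partition: %s\n', mat2str(partition));

% Cascade-serial: elements sum to the filter length and descend,
% except that the last two elements can be equal.
isCascadeValid = @(p, L) sum(p) == L && ...
    all(diff(p(1:end-1)) < 0) && (numel(p) < 2 || p(end) <= p(end-1));

examples = {[5 3], [4 2 2], [2 2 2 2], [3 2 3]};
for k = 1:numel(examples)
    fprintf('%s valid: %d\n', mat2str(examples{k}), isCascadeValid(examples{k}, 8));
end
```

For a filter length of 9 and 2 multipliers, the first part yields `[5 4]`, matching the recommendation in the table.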

This property is also used for Min/Max blocks with cascade-serial architectures. To learn how to configure Min/Max cascades, see SerialPartition.