[FPGA] FIR Filter Structure and Optimization (Part 2): Coefficient Padding

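Where the official Xilinx documentation covers the multiply-accumulate (MAC) architecture (the same MAC architecture described in my earlier post, "[FPGA] FIR Filter Architectures"), it includes a short passage on coefficient padding. I left it out of that post because I felt it would make the post harder to follow.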

I was reluctant to include it in this post as well, since I do not fully understand it, but I am putting it here for reference:

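The number of multipliers needed to implement a filter is found by taking the number of multiplies the filter computation requires (accounting for symmetric and half-band coefficient structures and for sample rate changes) and dividing it by the number of clock cycles available to process each input sample. The available clock cycle count is always rounded down, and the number of multipliers is rounded up. If there is a non-zero remainder, some of the MAC engines compute fewer filter coefficients than the others, and the coefficients are padded with zeros to fill the excess cycles.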
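The rounding rule above can be sketched in a few lines of Python. This is only an illustration, not the FIR Compiler's actual algorithm: the function name `mac_count`, the clock and sample rates, and the assumption of a single-rate, single-channel filter in which symmetry halves the multiply count are all mine.

```python
import math

def mac_count(num_taps, clock_hz, sample_hz, symmetric=False):
    """Estimate MAC engines for a single-rate, single-channel FIR (hypothetical helper)."""
    # Clock cycles available per input sample, rounded down.
    cycles = clock_hz // sample_hz
    # Assumption: symmetry lets one multiply serve a pair of taps.
    multiplies = math.ceil(num_taps / 2) if symmetric else num_taps
    # Number of multipliers, rounded up.
    macs = math.ceil(multiplies / cycles)
    # Multiply slots left over, which would be filled with zero coefficients.
    padded_slots = macs * cycles - multiplies
    return cycles, macs, padded_slots

# Hypothetical rates: 400 MHz clock, 100 MHz sample rate, 30 symmetric taps.
print(mac_count(30, 400_000_000, 100_000_000, symmetric=True))  # (4, 4, 1)
```

With these made-up rates the result is 4 cycles per sample, 4 MAC engines, and 1 unused multiply slot, which for a symmetric filter corresponds to 2 padded taps.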

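Padding the coefficient vector with zeros shows up in the output samples: a number of zero outputs appear before the specified impulse response begins. The FIR Compiler IP core automatically generates an implementation that meets the user-defined performance requirements, based on the system clock rate, the sample rate, the number of taps and channels, and the rate change. The core inserts one or more multipliers to meet the overall throughput requirement.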

Now for today's main topic:

When implementing a filter with symmetric coefficients using the Multiply-Accumulate architecture, you must be aware that the core reorganizes the filter coefficients if required to exploit symmetry, and this might alter the filter response. This is only necessary if the core is configured such that all processing cycles are not utilized. For example, when the core has four cycles to process each sample for a 30-tap symmetric response filter, the core pads the coefficient storage out as shown in Figure 3-18.

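In other words, when a filter with symmetric coefficients is implemented with the MAC architecture, keep in mind that the core may reorganize the filter coefficients in order to exploit the symmetry, and that this can change the filter response. This only happens when the core is configured so that some processing cycles go unused.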

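For example, for a 30-tap symmetric filter where the core has 4 clock cycles to process each sample, the core pads the coefficient vector with some zero coefficients, as shown in Figure 3-18: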

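The number of padded zeros is determined by the number of filter taps and the required sample rate. In this example 4 MAC units work in parallel. To produce one output sample every 4 clock cycles, each unit handles 4 coefficients, so the total number of coefficients that can be processed is 4*4*2 = 32. The filter has only 30 taps, so 2 zero coefficients must be padded. Because the coefficients are symmetric, one zero goes before the first coefficient and one after the last.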

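(The factor of 2 in this calculation comes from symmetry. In 4*4*2, the first 4 is the number of coefficients each MAC unit handles in 4 clock cycles, the second 4 is the number of MAC units working in parallel, and the 2 reflects that symmetry doubles the number of coefficients covered.)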
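The arithmetic above can be checked with a small Python sketch. The helper `pad_symmetric` and the placeholder coefficient values are hypothetical, and the sketch assumes the padding is split evenly between the two ends of the vector, as the text describes. It does not reproduce the actual storage layout of Figure 3-18.

```python
def pad_symmetric(coeffs, macs, cycles):
    """Zero-pad a symmetric coefficient list to macs * cycles * 2 taps (hypothetical helper)."""
    capacity = macs * cycles * 2      # factor 2: symmetry doubles the taps covered
    extra = capacity - len(coeffs)
    if extra < 0 or extra % 2:
        raise ValueError("taps do not fit an even symmetric padding")
    half = extra // 2                 # same number of zeros at each end
    return [0] * half + list(coeffs) + [0] * half

# Placeholder 30-tap symmetric coefficients: 1..15 followed by 15..1.
h = list(range(1, 16)) + list(range(15, 0, -1))
padded = pad_symmetric(h, macs=4, cycles=4)
print(len(padded), padded[0], padded[-1])  # 32 0 0
```

The padded vector has 32 entries, is still symmetric, and carries the original 30 taps between one leading and one trailing zero.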

The appended zeroes after the non-zero coefficients do not affect the filter response, but the prepended zero coefficients do alter the phase response of the filter implementation when compared to the ideal coefficients. There are two ways to avoid this issue: First, and simplest, you can force the Coefficient Structure to be Non-Symmetric. This avoids the issue of prepending zero coefficients to the coefficient vector, and only appended zeroes are used to pad out the filter response to the required number of cycles. Second, and more efficient, you can increase the number of taps implemented by the filter at little or no cost in resource usage. In the previous example, the filter could process 32 taps in the same time, with the same hardware resources, and with the same cycle latency as the 30-tap implementation, and the phase response of the 32-tap filter would be unaltered.

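Zeros appended after the non-zero coefficients do not affect the filter response, but zeros prepended before them do change the phase response of the implemented filter compared with the ideal coefficients. There are two ways to avoid this problem: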

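The first and simplest is to force the Coefficient Structure to Non-Symmetric. This avoids prepending zero coefficients to the coefficient vector, and only appended zeros are used to pad the filter out to the required number of cycles.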

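The second and more efficient way is to increase the number of taps the filter implements, at little or no cost in resource usage. In the example above, the filter could process 32 taps in the same time, with the same hardware resources and the same cycle latency as the 30-tap implementation, and the phase response of the 32-tap filter would be unaltered.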
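The difference between appended and prepended zeros is easy to see on a toy filter. The sketch below is plain Python with made-up 4-tap coefficients, not the 30-tap example, and `fir` is a hypothetical direct-form helper rather than anything from the IP core.

```python
def fir(x, h):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

h = [1, 2, 2, 1]                  # toy symmetric coefficients
x = [1, 0, 0, 0, 0, 0, 0, 0]      # unit impulse

ideal = fir(x, h)
appended = fir(x, h + [0])        # zero added after the last coefficient
prepended = fir(x, [0] + h)       # zero added before the first coefficient

print(ideal)      # [1, 2, 2, 1, 0, 0, 0, 0]
print(appended)   # [1, 2, 2, 1, 0, 0, 0, 0]
print(prepended)  # [0, 1, 2, 2, 1, 0, 0, 0]
```

The appended zero leaves the impulse response untouched, while the prepended zero delays it by one sample. That extra delay is the change in phase response the documentation refers to.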

The Vivado IDE displays the actual number of coefficients calculated on the Implementation Details tab. You can use this information to determine if you can increase the number of coefficients used by your filter definition.

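That is, the Implementation Details tab in the Vivado IDE shows the actual number of coefficients calculated, and you can use it to decide whether your filter definition has room for more coefficients.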

Next post: half-band filters