当前位置: 代码迷 >> 综合 >> CSAPP: 极度舒适的Perflab
  详细解决方案

CSAPP: 极度舒适的Perflab

热度:59   发布时间:2023-12-06 12:12:12.0

最近的lab和pj太多了5555
每天写一点叭…要加油鸭!!

Perflab

共两个函数 rotate & smooth
需要优化性能 CPE越低越好
but竟然没有评分标准的嘛…

5.15:rotate优化至13.9

原始版本:
在这里插入图片描述
性能:
在这里插入图片描述

改进1:32*32块划分
注意这里调换了原本的RIDX(dim-1-j, i, dim)的i/j顺序,性能提升不少在这里插入图片描述性能:
在这里插入图片描述
改进2:循环4路展开
在这里插入图片描述
性能:
在这里插入图片描述

改进3:32*32块+指针
but为什么用指针更好呢?应该是避免重复计算地址叭~
这里是我自己的构思,分32*32块
在这里插入图片描述
在这里插入图片描述

性能:
在这里插入图片描述

改进4:32*dim+指针
这是参考大佬的,32*dim一块,直接整行拷走
在这里插入图片描述
在这里插入图片描述

今日最佳:
在这里插入图片描述

几个很奇怪的地方:
1.同样的代码,大佬们跑出来能到20+??(电脑的问题嘛…
2.指针一次取6个字节,不知道取内存时是不是4字节对齐性能更好,明天继续冲冲冲~

5.17 完结撒花~

发现大家得到的基准都不一样,所以应该是看加速比哒

首先对rotate做了最后的优化,即把所有dp++改为++dp(如果是dp++在汇编中会多用一个寄存器保存加之前的值)
性能:
在这里插入图片描述
然后是smooth函数:
原始版本:

void naive_smooth(int dim, pixel *src, pixel *dst) 
{
    int i, j;for (i = 0; i < dim; i++)for (j = 0; j < dim; j++)dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
}

性能:
在这里插入图片描述

改进1:减少对函数的调用,改为define

#define min(a, b) (a < b ? a : b)
#define max(a, b) (a > b ? a : b)void smooth_inline(int dim, pixel *src, pixel *dst) 
{
    int i, j, i1, j1, max1, max2, min1, min2;pixel_sum sum;pixel tmp;for (i = 0; i < dim; i++)for (j = 0; j < dim; j++){
    	initialize_pixel_sum(&sum);max1=max(i-1, 0); max2=max(j-1, 0);min1=min(i+1, dim-1);min2=min(j+1, dim-1);for(i1=max1; i1<=min1; i1++)for(j1=max2; j1<=min2; j1++)accumulate_sum(&sum, src[RIDX(i1, j1, dim)]);assign_sum_to_pixel(&tmp, sum);dst[RIDX(i, j, dim)]=tmp;}
}

性能忘记截图了…改进不是很大

改进2:重写avg函数,不调用原始函数,按像素位置分别计算:

void smooth_avg(int dim, pixel *src, pixel *dst) 
{
    int i, j, tmp=dim+1, tmp2=2*dim-1, tmp3=dim*(dim-2), urc=dim-1, llc=dim*(dim-1), lrc=dim*dim-1, i1=0, j1=0;//upper left cornerdst[0].red=(src[0].red+src[1].red+src[dim].red+src[tmp].red)>>2;dst[0].blue=(src[0].blue+src[1].blue+src[dim].blue+src[tmp].blue)>>2;dst[0].green=(src[0].green+src[1].green+src[dim].green+src[tmp].green)>>2;//upper right cornerdst[urc].red=(src[urc].red+src[urc-1].red+src[tmp2].red+src[tmp2-1].red)>>2;dst[urc].blue=(src[urc].blue+src[urc-1].blue+src[tmp2].blue+src[tmp2-1].blue)>>2;dst[urc].green=(src[urc].green+src[urc-1].green+src[tmp2].green+src[tmp2-1].green)>>2;//lower left cornerdst[llc].red=(src[llc].red+src[llc+1].red+src[tmp3].red+src[tmp3+1].red)>>2;dst[llc].blue=(src[llc].blue+src[llc+1].blue+src[tmp3].blue+src[tmp3+1].blue)>>2;dst[llc].green=(src[llc].green+src[llc+1].green+src[tmp3].green+src[tmp3+1].green)>>2;//lower right cornerdst[lrc].red=(src[lrc].red+src[lrc-1].red+src[llc-1].red+src[llc-2].red)>>2;dst[lrc].blue=(src[lrc].blue+src[lrc-1].blue+src[llc-1].blue+src[llc-2].blue)>>2;dst[lrc].green=(src[lrc].green+src[lrc-1].green+src[llc-1].green+src[llc-2].green)>>2;//upper sidefor(j=1; j<dim-1; j++){
    dst[j].red=(src[j].red+src[j-1].red+src[j+1].red+src[j+dim].red+src[j+dim-1].red+src[j+dim+1].red)/6;dst[j].blue=(src[j].blue+src[j-1].blue+src[j+1].blue+src[j+dim].blue+src[j+dim-1].blue+src[j+dim+1].blue)/6;dst[j].green=(src[j].green+src[j-1].green+src[j+1].green+src[j+dim].green+src[j+dim-1].green+src[j+dim+1].green)/6;}//lower sidefor(j=llc+1; j<lrc; j++){
    dst[j].red=(src[j].red+src[j-1].red+src[j+1].red+src[j-dim].red+src[j-dim-1].red+src[j-dim+1].red)/6;dst[j].blue=(src[j].blue+src[j-1].blue+src[j+1].blue+src[j-dim].blue+src[j-dim-1].blue+src[j-dim+1].blue)/6;dst[j].green=(src[j].green+src[j-1].green+src[j+1].green+src[j-dim].green+src[j-dim-1].green+src[j-dim+1].green)/6;}//left sidefor(j=dim; j<llc; j+=dim){
    dst[j].red=(src[j].red+src[j-dim].red+src[j+dim].red+src[j+1].red+src[j-dim+1].red+src[j+dim+1].red)/6;dst[j].blue=(src[j].blue+src[j-dim].blue+src[j+dim].blue+src[j+1].blue+src[j-dim+1].blue+src[j+dim+1].blue)/6;dst[j].green=(src[j].green+src[j-dim].green+src[j+dim].green+src[j+1].green+src[j-dim+1].green+src[j+dim+1].green)/6;}//right sidefor(j=2*dim-1; j<lrc; j+=dim){
    dst[j].red=(src[j].red+src[j-1].red+src[j-dim].red+src[j+dim].red+src[j-dim-1].red+src[j+dim-1].red)/6;dst[j].blue=(src[j].blue+src[j-1].blue+src[j-dim].blue+src[j+dim].blue+src[j-dim-1].blue+src[j+dim-1].blue)/6;dst[j].green=(src[j].green+src[j-1].green+src[j-dim].green+src[j+dim].green+src[j-dim-1].green+src[j+dim-1].green)/6;}//othersfor(i=1; i<urc; i++){
    i1=i*dim;for(j=1; j<dim-1; j++){
    j1=i1+j;dst[j1].red=(src[j1].red+src[j1-1].red+src[j1+1].red+src[j1-dim].red+src[j1-dim-1].red+src[j1-dim+1].red+src[j1+dim].red+src[j1+dim-1].red+src[j1+dim+1].red)/9;dst[j1].blue=(src[j1].blue+src[j1-1].blue+src[j1+1].blue+src[j1-dim].blue+src[j1-dim-1].blue+src[j1-dim+1].blue+src[j1+dim].blue+src[j1+dim-1].blue+src[j1+dim+1].blue)/9;dst[j1].green=(src[j1].green+src[j1-1].green+src[j1+1].green+src[j1-dim].green+src[j1-dim-1].green+src[j1-dim+1].green+src[j1+dim].green+src[j1+dim-1].green+src[j1+dim+1].green)/9;}}}

性能有大幅提升啦!最后用点小聪明
改进3:中间部分像素2*2划分块,循环展开

void smooth(int dim, pixel *src, pixel *dst) 
{
    int i, j, tmp=dim+1, tmp2=2*dim-1, tmp3=dim*(dim-2), urc=dim-1, llc=dim*(dim-1), lrc=dim*dim-1, i1=0, j1=0;//upper left cornerdst[0].red=(src[0].red+src[1].red+src[dim].red+src[tmp].red)>>2;dst[0].blue=(src[0].blue+src[1].blue+src[dim].blue+src[tmp].blue)>>2;dst[0].green=(src[0].green+src[1].green+src[dim].green+src[tmp].green)>>2;//upper right cornerdst[urc].red=(src[urc].red+src[urc-1].red+src[tmp2].red+src[tmp2-1].red)>>2;dst[urc].blue=(src[urc].blue+src[urc-1].blue+src[tmp2].blue+src[tmp2-1].blue)>>2;dst[urc].green=(src[urc].green+src[urc-1].green+src[tmp2].green+src[tmp2-1].green)>>2;//lower left cornerdst[llc].red=(src[llc].red+src[llc+1].red+src[tmp3].red+src[tmp3+1].red)>>2;dst[llc].blue=(src[llc].blue+src[llc+1].blue+src[tmp3].blue+src[tmp3+1].blue)>>2;dst[llc].green=(src[llc].green+src[llc+1].green+src[tmp3].green+src[tmp3+1].green)>>2;//lower right cornerdst[lrc].red=(src[lrc].red+src[lrc-1].red+src[llc-1].red+src[llc-2].red)>>2;dst[lrc].blue=(src[lrc].blue+src[lrc-1].blue+src[llc-1].blue+src[llc-2].blue)>>2;dst[lrc].green=(src[lrc].green+src[lrc-1].green+src[llc-1].green+src[llc-2].green)>>2;//upper sidefor(j=1; j<dim-1; ++j){
    dst[j].red=(src[j].red+src[j-1].red+src[j+1].red+src[j+dim].red+src[j+dim-1].red+src[j+dim+1].red)/6;dst[j].blue=(src[j].blue+src[j-1].blue+src[j+1].blue+src[j+dim].blue+src[j+dim-1].blue+src[j+dim+1].blue)/6;dst[j].green=(src[j].green+src[j-1].green+src[j+1].green+src[j+dim].green+src[j+dim-1].green+src[j+dim+1].green)/6;}//lower sidefor(j=llc+1; j<lrc; ++j){
    dst[j].red=(src[j].red+src[j-1].red+src[j+1].red+src[j-dim].red+src[j-dim-1].red+src[j-dim+1].red)/6;dst[j].blue=(src[j].blue+src[j-1].blue+src[j+1].blue+src[j-dim].blue+src[j-dim-1].blue+src[j-dim+1].blue)/6;dst[j].green=(src[j].green+src[j-1].green+src[j+1].green+src[j-dim].green+src[j-dim-1].green+src[j-dim+1].green)/6;}//left sidefor(j=dim; j<llc; j+=dim){
    dst[j].red=(src[j].red+src[j-dim].red+src[j+dim].red+src[j+1].red+src[j-dim+1].red+src[j+dim+1].red)/6;dst[j].blue=(src[j].blue+src[j-dim].blue+src[j+dim].blue+src[j+1].blue+src[j-dim+1].blue+src[j+dim+1].blue)/6;dst[j].green=(src[j].green+src[j-dim].green+src[j+dim].green+src[j+1].green+src[j-dim+1].green+src[j+dim+1].green)/6;}//right sidefor(j=2*dim-1; j<lrc; j+=dim){
    dst[j].red=(src[j].red+src[j-1].red+src[j-dim].red+src[j+dim].red+src[j-dim-1].red+src[j+dim-1].red)/6;dst[j].blue=(src[j].blue+src[j-1].blue+src[j-dim].blue+src[j+dim].blue+src[j-dim-1].blue+src[j+dim-1].blue)/6;dst[j].green=(src[j].green+src[j-1].green+src[j-dim].green+src[j+dim].green+src[j-dim-1].green+src[j+dim-1].green)/6;}//othersfor(i=1; i<urc-1; i+=2){
    i1=i*dim;for(j=1; j<dim-2; j+=2){
    j1=i1+j;dst[j1].red=(src[j1].red+src[j1-1].red+src[j1+1].red+src[j1-dim].red+src[j1-dim-1].red+src[j1-dim+1].red+src[j1+dim].red+src[j1+dim-1].red+src[j1+dim+1].red)/9;dst[j1].blue=(src[j1].blue+src[j1-1].blue+src[j1+1].blue+src[j1-dim].blue+src[j1-dim-1].blue+src[j1-dim+1].blue+src[j1+dim].blue+src[j1+dim-1].blue+src[j1+dim+1].blue)/9;dst[j1].green=(src[j1].green+src[j1-1].green+src[j1+1].green+src[j1-dim].green+src[j1-dim-1].green+src[j1-dim+1].green+src[j1+dim].green+src[j1+dim-1].green+src[j1+dim+1].green)/9;j1=j1+1;dst[j1].red=(src[j1].red+src[j1-1].red+src[j1+1].red+src[j1-dim].red+src[j1-dim-1].red+src[j1-dim+1].red+src[j1+dim].red+src[j1+dim-1].red+src[j1+dim+1].red)/9;dst[j1].blue=(src[j1].blue+src[j1-1].blue+src[j1+1].blue+src[j1-dim].blue+src[j1-dim-1].blue+src[j1-dim+1].blue+src[j1+dim].blue+src[j1+dim-1].blue+src[j1+dim+1].blue)/9;dst[j1].green=(src[j1].green+src[j1-1].green+src[j1+1].green+src[j1-dim].green+src[j1-dim-1].green+src[j1-dim+1].green+src[j1+dim].green+src[j1+dim-1].green+src[j1+dim+1].green)/9;}i1=(i+1)*dim;for(j=1; j<dim-2; j+=2){
    j1=i1+j;dst[j1].red=(src[j1].red+src[j1-1].red+src[j1+1].red+src[j1-dim].red+src[j1-dim-1].red+src[j1-dim+1].red+src[j1+dim].red+src[j1+dim-1].red+src[j1+dim+1].red)/9;dst[j1].blue=(src[j1].blue+src[j1-1].blue+src[j1+1].blue+src[j1-dim].blue+src[j1-dim-1].blue+src[j1-dim+1].blue+src[j1+dim].blue+src[j1+dim-1].blue+src[j1+dim+1].blue)/9;dst[j1].green=(src[j1].green+src[j1-1].green+src[j1+1].green+src[j1-dim].green+src[j1-dim-1].green+src[j1-dim+1].green+src[j1+dim].green+src[j1+dim-1].green+src[j1+dim+1].green)/9;j1=j1+1;dst[j1].red=(src[j1].red+src[j1-1].red+src[j1+1].red+src[j1-dim].red+src[j1-dim-1].red+src[j1-dim+1].red+src[j1+dim].red+src[j1+dim-1].red+src[j1+dim+1].red)/9;dst[j1].blue=(src[j1].blue+src[j1-1].blue+src[j1+1].blue+src[j1-dim].blue+src[j1-dim-1].blue+src[j1-dim+1].blue+src[j1+dim].blue+src[j1+dim-1].blue+src[j1+dim+1].blue)/9;dst[j1].green=(src[j1].green+src[j1-1].green+src[j1+1].green+src[j1-dim].green+src[j1-dim-1].green+src[j1-dim+1].green+src[j1+dim].green+src[j1+dim-1].green+src[j1+dim+1].green)/9;}}}

性能:
在这里插入图片描述

最终结果
在这里插入图片描述

这种没有明确评分的lab…就很虚( ??? ? ??? )
助教手下留情~
相比起前两个lab,感觉这个回到了上学期还在学基础理论的时候…
谁让机智的老师跳着上呢~
加油冲冲冲!