CS143-project4基于滑窗的人脸检测 Face detection with a sliding window_综合

利用一个窗口（模板）对图像进行滑动计算匹配。是目标检测中的传统方法，尤其是人脸检测。

1.hog和方法简介
2.获取积极(人脸)特征
3.获取随机消极(非人脸)特征
4.训练模型
5.检测验证

一.hog和方法简介
方向梯度直方图(Histogram of Oriented Gradient)，简称HOG 是用于目标检测的描述算子
将每个图像划分为若干个cells，计算每个cell中像素的梯度，然后统计每个cell中不同梯度的直方图，形成一个完整的hog特征算子
这里直接调用开源api:vl_feat的vl_hog算法，得到我们想要的hog算子
我们先对训练数据进行标记，分为两类：人脸(积极positive)和非人脸(消极negative),是人脸则标记为1，非人脸标记为-1
然后分别获取两类图像的hog特征算子
使用svm或其他算法对训练数据进行训练
使用训练好的模型对测试集进行验证

二.获取积极(人脸)特征 get_positive_features
project已经提供了一些头像的图片，只需要读入这些图像，并生成对应的hog特征
代码：
function features_pos = get_positive_features(train_path_pos, feature_params)

% 'train_path_pos' is a string. This directory contains 36x36 images of
%   faces
% 'feature_params' is a struct, with fields
%   feature_params.template_size (probably 36), the number of pixels %块的大小
%   36*36个像素，一个块=6个cell*6个像素 cell包含像素越小越好，但是性能变差
%      spanned by each train / test template and
%   feature_params.hog_cell_size (default 6), the number of pixels in each
%      HoG cell. template size should be evenly divisible by hog_cell_size.
%      Smaller HoG cell sizes tend to work better, but they make things
%      slower because the feature dimensionality increases and more
%      importantly the step size of the classifier decreases at test time.

% 'features_pos' is N by D matrix where N is the number of faces and D
% is the template dimensionality, which would be
% (feature_params.template_size / feature_params.hog_cell_size)^2 * 31
% if you're using the default vl_hog parameters

% Useful functions:
% vl_hog, HOG = VL_HOG(IM, CELLSIZE)
% http://www.vlfeat.org/matlab/vl_hog.html (API)
% http://www.vlfeat.org/overview/hog.html (Tutorial)
% rgb2gray

image_files = dir( fullfile( train_path_pos, '*.jpg') ); %Caltech Faces stored as .jpg
num_images = length(image_files);
features_pos = [];
for i = 1:num_images %循环每幅头像图像
    im = image_files(i);
    image_path = fullfile(train_path_pos,im.name);
    image = imread(image_path); %读入头像
    if(size(image,3) > 1)
        image = rgb2gray(image);
    end
    hog = vl_hog(single(image), feature_params.hog_cell_size); %获取hog算子
%     imhog = vl_hog('render', hog, 'verbose') ;
%     clf ;
%     imagesc(imhog) ;
%     colormap gray ;
    features_pos = cat(1,features_pos,reshape(hog,1,1116)); %拼接成N(头像数量)*D(6*6*31) N*1116的矩阵
end

三.获取随机消极(非人脸)特征 get_positive_features
function features_pos = get_positive_features(train_path_pos, feature_params)
% 'train_path_pos' is a string. This directory contains 36x36 images of
%   faces
% 'feature_params' is a struct, with fields
%   feature_params.template_size (probably 36), the number of pixels %块的大小
%   36*36个像素，一个块=6个cell*6个像素 cell包含像素越小越好，但是性能变差
%      spanned by each train / test template and
%   feature_params.hog_cell_size (default 6), the number of pixels in each
%      HoG cell. template size should be evenly divisible by hog_cell_size.
%      Smaller HoG cell sizes tend to work better, but they make things
%      slower because the feature dimensionality increases and more
%      importantly the step size of the classifier decreases at test time.

% 'features_pos' is N by D matrix where N is the number of faces and D
% is the template dimensionality, which would be
%   (feature_params.template_size / feature_params.hog_cell_size)^2 * 31
% if you're using the default vl_hog parameters

% Useful functions:
% vl_hog, HOG = VL_HOG(IM, CELLSIZE)
% http://www.vlfeat.org/matlab/vl_hog.html (API)
% http://www.vlfeat.org/overview/hog.html   (Tutorial)
% rgb2gray

image_files = dir( fullfile( train_path_pos, '*.jpg') ); %Caltech Faces stored as .jpg
num_images = length(image_files);
features_pos = [];
for i = 1:num_images %循环每幅头像图像
    im = image_files(i);
    image_path = fullfile(train_path_pos,im.name);
    image = imread(image_path); %读入头像
    if(size(image,3) > 1)
        image = rgb2gray(image);
    end
    hog = vl_hog(single(image), feature_params.hog_cell_size); %获取hog算子

features_pos = cat(1,features_pos,reshape(hog,1,1116)); %拼接成N(头像数量)*D(6*6*31) N*1116的矩阵
end

四.对模型进行训练

代码
%% step 2. Train Classifier
% Use vl_svmtrain on your training features to get a linear classifier
% specified by 'w' and 'b'
% [w b] = vl_svmtrain(X, Y, lambda)
% http://www.vlfeat.org/sandbox/matlab/vl_svmtrain.html
% 'lambda' is an important parameter, try many values. Small values seem to
% work best e.g. 0.0001, but you can try other values

%YOU CODE classifier training. Make sure the outputs are 'w' and 'b'.
% category = categories{i}; %当前分类
% labels = double(strcmp(category, train_labels)); %生成label变量属于此类别值为1，否则是-1
% labels(find(labels == 0)) = -1;
% [W B] = vl_svmtrain(train_image_feats', labels, 0.00006); %调用vl_feat库函数训练svm
if ~exist('w.mat', 'file')
    pos_size = size(features_pos,1);
    neg_size = size(features_neg,1);
    labels = zeros(pos_size+neg_size,1);
    labels(1:pos_size,1)=1;
    labels(pos_size+1:pos_size+neg_size,1)=-1;
    train_feats = cat(1,features_pos,features_neg);
    %[w b] = vl_svmtrain(train_feats, labels, 0.00006); %调用vl_feat库函数训练svm
    [w b] = vl_svmtrain(train_feats', labels', 0.00006); %每列一个样品
    save('w.mat', 'w');
    save('b.mat', 'b');
else
    load('w.mat');
    load('b.mat');
end

五.将训练好的模型用于测试

代码
function [bboxes, confidences, image_ids] = ....
    run_detector(test_scn_path, w, b, feature_params)
% 'test_scn_path' is a string. This directory contains images which may or
%    may not have faces in them. This function should work for the MIT+CMU
%    test set but also for any other images (e.g. class photos) 测试数据
% 'w' and 'b' are the linear classifier parameters
% 'feature_params' is a struct, with fields
%   feature_params.template_size (probably 36), the number of pixels
%      spanned by each train / test template and
%   feature_params.hog_cell_size (default 6), the number of pixels in each
%      HoG cell. template size should be evenly divisible by hog_cell_size.
%      Smaller HoG cell sizes tend to work better, but they make things
%      slower because the feature dimensionality increases and more
%      importantly the step size of the classifier decreases at test time.

% 'bboxes' is Nx4. N is the number of detections. bboxes(i,:) is 探测器
%   [x_min, y_min, x_max, y_max] for detection i.
%   Remember 'y' is dimension 1 in Matlab!
% 'confidences' is Nx1. confidences(i) is the real valued confidence of
%   detection i.
% 'image_ids' is an Nx1 cell array. image_ids{i} is the image file name
%   for detection i. (not the full path, just 'albert.jpg')

% The placeholder version of this code will return random bounding boxes in
% each test image. It will even do non-maximum suppression on the random
% bounding boxes to give you an example of how to call the function.

% Your actual code should convert each test image to HoG feature space with
% a _single_ call to vl_hog for each scale. Then step over the HoG cells,
% taking groups of cells that are the same size as your learned template,
% and classifying them. If the classification is above some confidence,
% keep the detection and then pass all the detections for an image to
% non-maximum suppression. For your initial debugging, you can operate only
% at a single scale and you can skip calling non-maximum suppression.

test_scenes = dir( fullfile( test_scn_path, '*.jpg' ));

%initialize these as empty and incrementally expand them.
bboxes = zeros(0,4);
confidences = zeros(0,1);
image_ids = cell(0,1);

for i = 1:length(test_scenes)

    fprintf('Detecting faces in %s\n', test_scenes(i).name)
    img = imread( fullfile( test_scn_path, test_scenes(i).name ));
    img = single(img)/255;
    if(size(img,3) > 1)
        img = rgb2gray(img);
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%
    %增加多尺度地判断
    %多尺度参考
    %deciding downsample parameters.
    edgesize = min(size(img,2),size(img,1));%最短的边
    maxdownsize = log(feature_params.template_size/edgesize)/log(0.9);%可以降多少次采样？
    cur_bboxes = [];
    cur_confidences = [];
    cur_image_ids = [];
    for downsize = 0:1:maxdownsize %min(maxdownsize,0) %0开始 step为1
        downsample_scale = realpow(0.9,downsize); %0.7 的downsize次方
        timg = imresize(img,downsample_scale);%降采样
        %使用hog来表示image，如何滑动窗口（模板）检测confidences
        %hogs = vl_hog(img,feature_params.hog_cell_size);
        hogs = vl_hog(timg,feature_params.hog_cell_size);
        max_j = size(hogs,1) - 5;
        max_k = size(hogs,2) - 5;
        %blocks=min(floor(size(hogs,1)/6),floor(size(hogs,2)/6));

        for j = 1:max_j
            for k = 1:max_k
                cur_hog=hogs(j:j+5,k:k+5,:);
                cur_hog= reshape(cur_hog,1,1116);
                confidence = cur_hog*w + b;
                if confidence > 0.5 %计算出来如果是大于阀值（是脸）
                    x_min = k*6/downsample_scale; %记录窗口的位置
                    y_min = j*6/downsample_scale;
                    x_max = (k+5)*6/downsample_scale;
                    y_max = (j+5)*6/downsample_scale;
                    cur_bboxes = cat(1,cur_bboxes,[x_min, y_min, x_max, y_max]);
                    cur_confidences = cat(1,cur_confidences,confidence);
                    cur_image_ids = cat(1,cur_image_ids,{test_scenes(i).name});
                end
            end
        end
    end

    if size(cur_bboxes,1)==0
        continue;
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%
    %You can delete all of this below.
    % Let's create 15 random detections per image

%     cur_x_min = rand(15,1) * size(img,2);
%     cur_y_min = rand(15,1) * size(img,1);
%     cur_bboxes = [cur_x_min, cur_y_min, cur_x_min + rand(15,1) * 50, cur_y_min + rand(15,1) * 50];
%     cur_confidences = rand(15,1) * 4 - 2; %confidences in the range [-2 2]
%     cur_image_ids(1:15,1) = {test_scenes(i).name};

    %non_max_supr_bbox can actually get somewhat slow with thousands of
    %initial detections. You could pre-filter the detections by confidence,
    %e.g. a detection with confidence -1.1 will probably never be
    %meaningful. You probably _don't_ want to threshold at 0.0, though. You
    %can get higher recall with a lower threshold. You don't need to modify
    %anything in non_max_supr_bbox, but you can.
    [is_maximum] = non_max_supr_bbox(cur_bboxes, cur_confidences, size(img));

    cur_confidences = cur_confidences(is_maximum,:);
    cur_bboxes      = cur_bboxes(     is_maximum,:);
    cur_image_ids   = cur_image_ids( is_maximum,:);

    bboxes      = [bboxes;      cur_bboxes];
    confidences = [confidences; cur_confidences];
    image_ids   = [image_ids;   cur_image_ids];

end

效果：