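A sliding-window detector moves a fixed-size window (template) across the image and scores how well each position matches. It is a classical approach to object detection, and to face detection in particular.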
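1. HOG and an overview of the method
2. Extracting positive (face) features
3. Extracting random negative (non-face) features
4. Training the model
5. Detection and evaluation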
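I. HOG and an overview of the method
The Histogram of Oriented Gradients (HOG) is a feature descriptor used for object detection.
Each image is divided into a grid of cells. The gradient of every pixel in a cell is computed, the gradient orientations within each cell are accumulated into a histogram, and the per-cell histograms together form the complete HOG descriptor.
Here we call an open-source implementation directly: the vl_hog function from VLFeat returns the HOG descriptor we need.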
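To make the per-cell histogram concrete, here is a minimal Python sketch (standard library only) of the idea, not of vl_hog itself. It assumes 9 unsigned orientation bins, central-difference gradients clamped at the cell border, and magnitude-weighted voting. vl_hog's default output has 31 values per cell and also normalizes across neighbouring cells, which this sketch leaves out. The function name is hypothetical.

```python
import math

def cell_orientation_histogram(cell, num_bins=9):
    """Histogram of unsigned gradient orientations for one cell.

    `cell` is a list of rows of gray values. Each pixel votes for the bin
    of its gradient orientation (0..180 degrees) with its gradient magnitude.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for r in range(rows):
        for c in range(cols):
            # Central differences, clamped at the cell border.
            gx = cell[r][min(c + 1, cols - 1)] - cell[r][max(c - 1, 0)]
            gy = cell[min(r + 1, rows - 1)][c] - cell[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180): opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist

# A 6x6 cell containing a vertical edge: all gradient energy lands in bin 0.
vertical_edge = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
edge_hist = cell_orientation_histogram(vertical_edge)
```

A cell with a horizontal edge instead puts its energy in the bin around 90 degrees, which is what lets the descriptor distinguish local edge directions.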
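We first label the training data with two classes, face (positive) and non-face (negative): faces are labeled 1 and non-faces -1.
We then compute the HOG descriptors of both classes of images.
An SVM (or another classifier) is trained on this data.
Finally, the trained model is evaluated on the test set.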
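II. Extracting positive (face) features: get_positive_features
The project already provides a set of cropped face images, so we only need to read them in and compute the HOG feature of each one.
Code: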
function features_pos = get_positive_features(train_path_pos, feature_params)
% 'train_path_pos' is a string. This directory contains 36x36 images of
% faces
% 'feature_params' is a struct, with fields
% feature_params.template_size (probably 36), the number of pixels
% spanned by each train / test template (with the defaults, 36x36 pixels
% = 6x6 cells of 6x6 pixels each) and
% feature_params.hog_cell_size (default 6), the number of pixels in each
% HoG cell. template size should be evenly divisible by hog_cell_size.
% Smaller HoG cell sizes tend to work better, but they make things
% slower because the feature dimensionality increases and more
% importantly the step size of the classifier decreases at test time.
% 'features_pos' is N by D matrix where N is the number of faces and D
% is the template dimensionality, which would be
% (feature_params.template_size / feature_params.hog_cell_size)^2 * 31
% if you're using the default vl_hog parameters
% Useful functions:
% vl_hog, HOG = VL_HOG(IM, CELLSIZE)
% http://www.vlfeat.org/matlab/vl_hog.html (API)
% http://www.vlfeat.org/overview/hog.html (Tutorial)
% rgb2gray
image_files = dir( fullfile( train_path_pos, '*.jpg') ); %Caltech Faces stored as .jpg
num_images = length(image_files);
%D = (template_size / hog_cell_size)^2 * 31, i.e. 6*6*31 = 1116 with the defaults
D = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
features_pos = zeros(num_images, D, 'single'); %preallocate the N x D feature matrix
for i = 1:num_images %loop over the face images
    im = image_files(i);
    image_path = fullfile(train_path_pos,im.name);
    image = imread(image_path); %read the face image
    if(size(image,3) > 1)
        image = rgb2gray(image);
    end
    hog = vl_hog(single(image), feature_params.hog_cell_size); %compute the HoG descriptor
    % imhog = vl_hog('render', hog, 'verbose') ;
    % clf ;
    % imagesc(imhog) ;
    % colormap gray ;
    features_pos(i,:) = reshape(hog,1,[]); %flatten into row i of the N (number of faces) x D matrix
end
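The 1116 in the listing is not arbitrary: it follows from the formula in the header comment. A small Python sketch of that arithmetic is below. The function name is hypothetical, and the 31 values per cell are the default vl_hog output mentioned in the comments above.

```python
def hog_template_dim(template_size=36, cell_size=6, channels_per_cell=31):
    """Length of the flattened HOG descriptor of one square template."""
    if template_size % cell_size != 0:
        raise ValueError("template_size must be evenly divisible by cell_size")
    cells_per_side = template_size // cell_size
    return cells_per_side * cells_per_side * channels_per_cell

default_dim = hog_template_dim()          # 6 * 6 * 31 = 1116
finer_dim = hog_template_dim(36, 4)       # 9 * 9 * 31 = 2511
```

Halving the cell size roughly quadruples the descriptor length, which is one reason smaller cells make both training and detection slower.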
III. Extracting random negative (non-face) features
The negative features are built the same way as the positive ones, except that the patches are non-faces sampled at random. Each patch is converted to a HOG descriptor with vl_hog, exactly as in get_positive_features above, and the descriptors are stacked into features_neg, one row per patch, with the same 1116 columns as features_pos.
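One way to draw random negatives is to pick random window positions inside images that contain no faces, then crop those windows and compute their HOG descriptors. The Python sketch below (standard library only) covers just the position sampling. The function name, the 0-based coordinates and the uniform sampling are illustrative assumptions, not the project's actual code.

```python
import random

def sample_random_windows(image_sizes, num_samples, template_size=36, seed=0):
    """Pick random template-sized windows from a set of non-face images.

    `image_sizes` is a list of (height, width) pairs. Returns a list of
    (image_index, top, left) tuples with 0-based pixel coordinates, each
    describing a window that lies fully inside its image.
    """
    rng = random.Random(seed)
    # Only images at least as large as the template can supply a window.
    usable = [i for i, (h, w) in enumerate(image_sizes)
              if h >= template_size and w >= template_size]
    if not usable:
        raise ValueError("no image is large enough for the template")
    windows = []
    for _ in range(num_samples):
        idx = rng.choice(usable)
        height, width = image_sizes[idx]
        top = rng.randint(0, height - template_size)    # inclusive bounds
        left = rng.randint(0, width - template_size)
        windows.append((idx, top, left))
    return windows

# Three scenes; the 20x20 one is too small and is never sampled.
sizes = [(100, 200), (20, 20), (36, 36)]
windows = sample_random_windows(sizes, num_samples=50)
```

Each returned window would then be cropped, converted to grayscale if needed, and passed through the same HOG step as the face images.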
IV. Training the model
Code:
%% step 2. Train Classifier
% Use vl_svmtrain on your training features to get a linear classifier
% specified by 'w' and 'b'
% [w b] = vl_svmtrain(X, Y, lambda)
% http://www.vlfeat.org/sandbox/matlab/vl_svmtrain.html
% 'lambda' is an important parameter, try many values. Small values seem to
% work best e.g. 0.0001, but you can try other values
%YOUR CODE: classifier training. Make sure the outputs are 'w' and 'b'.
% category = categories{i}; %current category
% labels = double(strcmp(category, train_labels)); %build the labels: 1 if the sample belongs to this category, otherwise -1
% labels(find(labels == 0)) = -1;
% [W B] = vl_svmtrain(train_image_feats', labels, 0.00006); %train the SVM with the VLFeat library function
if ~exist('w.mat', 'file')
pos_size = size(features_pos,1);
neg_size = size(features_neg,1);
labels = zeros(pos_size+neg_size,1);
labels(1:pos_size,1)=1;
labels(pos_size+1:pos_size+neg_size,1)=-1;
train_feats = cat(1,features_pos,features_neg);
%[w b] = vl_svmtrain(train_feats, labels, 0.00006); %train the SVM with the VLFeat library function
[w b] = vl_svmtrain(train_feats', labels', 0.00006); %one sample per column
save('w.mat', 'w');
save('b.mat', 'b');
else
load('w.mat');
load('b.mat');
end
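To show what w and b represent, here is a minimal Python sketch of training a linear SVM on toy 2-D data. It uses plain batch subgradient descent on the regularized hinge loss, which is an assumption made for readability and is not the solver vl_svmtrain uses. The function names, learning rate and epoch count are hypothetical.

```python
def train_linear_svm(samples, labels, lam=1e-4, learning_rate=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean(max(0, 1 - y * (w.x + b))) by batch subgradient descent."""
    dim = len(samples[0])
    n = len(samples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]     # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:                # only margin violators contribute
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def score(w, b, x):
    """Signed confidence of one sample: positive means 'face' in the detector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy data: label 1 for the positive class, -1 for the negative class.
positives = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0)]
negatives = [(-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
train_x = positives + negatives
train_y = [1] * len(positives) + [-1] * len(negatives)
w, b = train_linear_svm(train_x, train_y)
```

The detector later computes exactly this kind of score, cur_hog*w + b, for every window, so the sign and size of the score act as the detection confidence.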
V. Running the trained model on the test set
Code:
function [bboxes, confidences, image_ids] = ....
run_detector(test_scn_path, w, b, feature_params)
% 'test_scn_path' is a string. This directory contains images which may or
% may not have faces in them. This function should work for the MIT+CMU
% test set but also for any other images (e.g. class photos). This is the test data.
% 'w' and 'b' are the linear classifier parameters
% 'feature_params' is a struct, with fields
% feature_params.template_size (probably 36), the number of pixels
% spanned by each train / test template and
% feature_params.hog_cell_size (default 6), the number of pixels in each
% HoG cell. template size should be evenly divisible by hog_cell_size.
% Smaller HoG cell sizes tend to work better, but they make things
% slower because the feature dimensionality increases and more
% importantly the step size of the classifier decreases at test time.
% 'bboxes' is Nx4. N is the number of detections. bboxes(i,:) is
% [x_min, y_min, x_max, y_max] for detection i.
% Remember 'y' is dimension 1 in Matlab!
% 'confidences' is Nx1. confidences(i) is the real valued confidence of
% detection i.
% 'image_ids' is an Nx1 cell array. image_ids{i} is the image file name
% for detection i. (not the full path, just 'albert.jpg')
% The placeholder version of this code will return random bounding boxes in
% each test image. It will even do non-maximum suppression on the random
% bounding boxes to give you an example of how to call the function.
% Your actual code should convert each test image to HoG feature space with
% a _single_ call to vl_hog for each scale. Then step over the HoG cells,
% taking groups of cells that are the same size as your learned template,
% and classifying them. If the classification is above some confidence,
% keep the detection and then pass all the detections for an image to
% non-maximum suppression. For your initial debugging, you can operate only
% at a single scale and you can skip calling non-maximum suppression.
test_scenes = dir( fullfile( test_scn_path, '*.jpg' ));
%initialize these as empty and incrementally expand them.
bboxes = zeros(0,4);
confidences = zeros(0,1);
image_ids = cell(0,1);
for i = 1:length(test_scenes)
fprintf('Detecting faces in %s\n', test_scenes(i).name)
img = imread( fullfile( test_scn_path, test_scenes(i).name ));
img = single(img)/255;
if(size(img,3) > 1)
img = rgb2gray(img);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%
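%Multi-scale detection: run the detector on progressively downsampled copies of the image.
%Decide how many times the image can be downsampled.
edgesize = min(size(img,2),size(img,1)); %length of the shorter edge
maxdownsize = log(feature_params.template_size/edgesize)/log(0.9); %largest n with edgesize*0.9^n >= template_size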
cur_bboxes = [];
cur_confidences = [];
cur_image_ids = [];
for downsize = 0:1:maxdownsize %start at 0 (original size), step 1
downsample_scale = realpow(0.9,downsize); %0.9 raised to the power downsize
timg = imresize(img,downsample_scale); %downsample
%Represent the image by its HoG cells, then slide the template window over them and score each position
hogs = vl_hog(timg,feature_params.hog_cell_size);
max_j = size(hogs,1) - 5;
max_k = size(hogs,2) - 5;
%blocks=min(floor(size(hogs,1)/6),floor(size(hogs,2)/6));
for j = 1:max_j
for k = 1:max_k
cur_hog=hogs(j:j+5,k:k+5,:);
cur_hog= reshape(cur_hog,1,1116);
confidence = cur_hog*w + b;
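if confidence > 0.5 %score is above the threshold: treat the window as a face
%Cell k spans pixels (k-1)*6+1 .. k*6, so the window starts at pixel (k-1)*6+1 and ends at pixel (k+5)*6.
x_min = ((k-1)*6+1)/downsample_scale; %map the window back to original-image coordinates
y_min = ((j-1)*6+1)/downsample_scale;
x_max = (k+5)*6/downsample_scale;
y_max = (j+5)*6/downsample_scale;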
cur_bboxes = cat(1,cur_bboxes,[x_min, y_min, x_max, y_max]);
cur_confidences = cat(1,cur_confidences,confidence);
cur_image_ids = cat(1,cur_image_ids,{test_scenes(i).name});
end
end
end
end
if size(cur_bboxes,1)==0
continue;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%
%You can delete all of this below.
% Let's create 15 random detections per image
% cur_x_min = rand(15,1) * size(img,2);
% cur_y_min = rand(15,1) * size(img,1);
% cur_bboxes = [cur_x_min, cur_y_min, cur_x_min + rand(15,1) * 50, cur_y_min + rand(15,1) * 50];
% cur_confidences = rand(15,1) * 4 - 2; %confidences in the range [-2 2]
% cur_image_ids(1:15,1) = {test_scenes(i).name};
%non_max_supr_bbox can actually get somewhat slow with thousands of
%initial detections. You could pre-filter the detections by confidence,
%e.g. a detection with confidence -1.1 will probably never be
%meaningful. You probably _don't_ want to threshold at 0.0, though. You
%can get higher recall with a lower threshold. You don't need to modify
%anything in non_max_supr_bbox, but you can.
[is_maximum] = non_max_supr_bbox(cur_bboxes, cur_confidences, size(img));
cur_confidences = cur_confidences(is_maximum,:);
cur_bboxes = cur_bboxes( is_maximum,:);
cur_image_ids = cur_image_ids( is_maximum,:);
bboxes = [bboxes; cur_bboxes];
confidences = [confidences; cur_confidences];
image_ids = [image_ids; cur_image_ids];
end
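Two pieces of arithmetic in the detector are easy to get wrong: how many pyramid levels there are, and how a window of HOG cells maps back to pixel coordinates in the original image. The Python sketch below (standard library only) works through both. It assumes a scale factor of 0.9 per level, 6-pixel cells, a 6x6-cell template and 1-based cell indices as in the MATLAB loops. The function names are hypothetical.

```python
def pyramid_scales(short_edge, template_size=36, factor=0.9):
    """Scales factor**0, factor**1, ... for which the resized short edge
    is still at least one template wide."""
    scales = []
    level = 0
    while short_edge * factor ** level >= template_size:
        scales.append(factor ** level)
        level += 1
    return scales

def window_to_box(row, col, scale, cell_size=6, cells_per_side=6):
    """Map a window whose top-left HOG cell is (row, col), 1-based, at the
    given scale to [x_min, y_min, x_max, y_max] in original-image pixels."""
    # Cell c covers pixels (c-1)*cell_size+1 .. c*cell_size in the resized image.
    x_min = ((col - 1) * cell_size + 1) / scale
    y_min = ((row - 1) * cell_size + 1) / scale
    x_max = (col - 1 + cells_per_side) * cell_size / scale
    y_max = (row - 1 + cells_per_side) * cell_size / scale
    return (x_min, y_min, x_max, y_max)

scales = pyramid_scales(100)            # 10 levels for a 100-pixel short edge
first_box = window_to_box(1, 1, 1.0)    # the top-left window at full size
half_box = window_to_box(2, 3, 0.5)     # a window found at half resolution
```

At full resolution the first window covers pixels 1 to 36 in both directions, a full template. Dividing by the scale enlarges boxes found on downsampled images, which is how a 36-pixel template can detect larger faces.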
Results: