SLAM, 3D Scene Understanding and Reconstruction, and Object Tracking: An Overview

I. DeepSORT

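The Hungarian algorithm tells us whether a target in the current frame is the same as some target in the previous frame. It solves the assignment problem: it finds the one-to-one pairing between the two sets that minimizes the total cost.
In DeepSORT, the Hungarian algorithm associates the tracks from the previous frame with the detections in the current frame. The cost matrix is computed either from appearance information combined with the Mahalanobis distance, or from IOU (intersection over union).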
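The dependency-free Python below is a rough sketch of this assignment step. It implements the classic O(n²m) Hungarian (Kuhn-Munkres) algorithm with row and column potentials, plus a gating wrapper. The function names `hungarian` and `min_cost_matching_sketch` are hypothetical. The gating idea is to clip costs above `max_distance` before solving, then reject any assigned pair whose real cost exceeds it. This follows deep_sort's `min_cost_matching` as we understand it, but it is an illustration, not that library's code. Rows stand for tracks and columns for detections.

```python
from typing import List, Tuple

INF = float("inf")


def hungarian(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost one-to-one assignment for an n x m cost matrix.

    Returns sorted (row, col) pairs; min(n, m) pairs are always assigned.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    if n == 0 or m == 0:
        return []
    if n > m:  # the loop below needs rows <= columns, so solve the transpose
        transposed = [list(col) for col in zip(*cost)]
        return sorted((r, c) for c, r in hungarian(transposed))
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row assigned to column j, 0 = free
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        while j0:  # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


def min_cost_matching_sketch(cost, max_distance):
    """Gated assignment: returns (matches, unmatched_rows, unmatched_cols)."""
    # Clip costs above the gate to just over max_distance so infeasible pairs look alike
    gated = [[min(c, max_distance + 1e-5) for c in row] for row in cost]
    # Keep only assigned pairs whose real cost passes the gate
    matches = [(r, c) for r, c in hungarian(gated) if cost[r][c] <= max_distance]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    n_cols = len(cost[0]) if cost else 0
    unmatched_rows = [r for r in range(len(cost)) if r not in matched_rows]
    unmatched_cols = [c for c in range(n_cols) if c not in matched_cols]
    return matches, unmatched_rows, unmatched_cols


cost = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.9, 0.9, 0.95]]
print(min_cost_matching_sketch(cost, max_distance=0.5))
# -> ([(0, 0), (1, 1)], [2], [2])
```

In the example, the solver does assign track 2 to detection 2. The gate then rejects that pair because its cost of 0.95 exceeds 0.5, so both end up unmatched.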
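The Kalman filter predicts a target's position at the current time step from its state at the previous time step. It then fuses this motion-model prediction with the sensor measurement; in object tracking, the sensor is the object detector, e.g. YOLO. The result is usually a more accurate position estimate than the detector gives on its own.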
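In object tracking, two quantities must be estimated for each track:
Mean: the target's position information. It consists of the bbox center (cx, cy), the aspect ratio r, the height h, and the velocity of each of these, giving the 8-dimensional vector x = [cx, cy, r, h, vx, vy, vr, vh]. The velocities are initialized to 0.
Covariance: the uncertainty of the target's position information, an 8x8 matrix in which larger values mean greater uncertainty. It is initialized as a diagonal matrix, and the initial values are a tuning choice; the deep_sort implementation scales the initial standard deviations by the box height h.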
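Here is a minimal sketch of how the mean and covariance might be initialized from a first detection. The noise weights `STD_WEIGHT_POSITION = 1/20` and `STD_WEIGHT_VELOCITY = 1/160` and the per-component standard deviations are assumptions, modeled on the `initiate` method of deep_sort's `KalmanFilter`. Plain lists stand in for NumPy arrays, and the function name is illustrative.

```python
from typing import List, Tuple

# Assumption: noise weights modeled on deep_sort's kalman_filter.py
STD_WEIGHT_POSITION = 1.0 / 20
STD_WEIGHT_VELOCITY = 1.0 / 160


def initiate(measurement: List[float]) -> Tuple[List[float], List[List[float]]]:
    """Build the 8-dim mean and 8x8 covariance from a first box (cx, cy, r, h)."""
    cx, cy, r, h = measurement
    mean = [cx, cy, r, h, 0.0, 0.0, 0.0, 0.0]  # velocities start at 0
    std = [
        2 * STD_WEIGHT_POSITION * h,   # cx
        2 * STD_WEIGHT_POSITION * h,   # cy
        1e-2,                          # aspect ratio r (scale-free, so not scaled by h)
        2 * STD_WEIGHT_POSITION * h,   # height h
        10 * STD_WEIGHT_VELOCITY * h,  # vx
        10 * STD_WEIGHT_VELOCITY * h,  # vy
        1e-5,                          # vr
        10 * STD_WEIGHT_VELOCITY * h,  # vh
    ]
    # Diagonal covariance: variances on the diagonal, zeros elsewhere
    cov = [[std[i] ** 2 if i == j else 0.0 for j in range(8)] for i in range(8)]
    return mean, cov


mean, cov = initiate([320.0, 240.0, 0.5, 160.0])
```

For this 160 px tall box, the mean ends in four zeros and the covariance entry for cx is about 256, which is a 16 px standard deviation. Scaling by h makes the initial uncertainty proportional to the object's apparent size.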
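The Kalman filter works in two stages: (1) predict the track's position at the next time step, and (2) update (correct) the predicted position using the matched detection.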
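To make the two stages concrete, here is a deliberately simplified 1-D Kalman filter. Its state is just [position, velocity] for one coordinate under a constant-velocity model, whereas DeepSORT's filter tracks the full 8-dimensional state described above. The noise parameters `q` and `r` and the function names are illustrative assumptions.

```python
def predict(x, P, dt=1.0, q=1e-2):
    """Stage 1: propagate [position, velocity] and its 2x2 covariance."""
    pos, vel = x
    x_pred = [pos + dt * vel, vel]  # constant-velocity model, F = [[1, dt], [0, 1]]
    (p00, p01), (p10, p11) = P
    # P' = F P F^T + Q, with Q = q * I as a simplified process noise
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def update(x, P, z, r=1.0):
    """Stage 2: correct the prediction with a position measurement z (H = [1, 0])."""
    s = P[0][0] + r                    # innovation covariance S = H P H^T + R
    k = [P[0][0] / s, P[1][0] / s]     # Kalman gain K = P H^T / S
    y = z - x[0]                       # innovation (measurement minus prediction)
    x_new = [x[0] + k[0] * y, x[1] + k[1] * y]
    # P = (I - K H) P
    P_new = [[P[i][j] - k[i] * P[0][j] for j in range(2)] for i in range(2)]
    return x_new, P_new


x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for z in [2.0 * t for t in range(1, 11)]:  # a point moving 2 px per frame
    x, P = predict(x, P)
    x, P = update(x, P, z)
print(x)
```

The example feeds in the noise-free positions 2, 4, ..., 20, starting from x = [0, 0]. Within a few frames the velocity estimate converges toward 2, and the position variance P[0][0] drops well below its initial value.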
DeepSORT workflow
For each frame, DeepSORT proceeds as follows:

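Detector outputs bboxes → build detections → Kalman filter prediction → Hungarian algorithm matches the predicted tracks with the current frame's detections (cascade matching, then IOU matching) → Kalman filter update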
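The IOU matching step uses a cost of 1 - IoU between each track's predicted box and each detection. Below is a minimal sketch. It assumes boxes in (x, y, w, h) format with (x, y) the top-left corner, and the helper names are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x, y, w, h), where (x, y) is the top-left corner."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap rectangle (0 if the boxes do not intersect)
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def iou_cost_matrix(track_boxes, det_boxes):
    """Cost = 1 - IoU: perfect overlap costs 0, no overlap costs 1."""
    return [[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes]


print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # half-shifted box: 50 / 150
```

Pairs whose 1 - IoU cost exceeds a maximum IOU distance are then rejected after assignment.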

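Frame 0: The detector finds 3 detections. There are no tracks yet, so these 3 detections are initialized as tracks.
Frame 1: The detector again finds 3 detections. First, the tracks from Frame 0 are predicted forward to obtain new tracks. Next, the Hungarian algorithm matches these predicted tracks with the detections, producing (track, detection) pairs. Finally, each track is updated with the detection in its pair.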

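First, the cost matrix between tracks and detections is computed from appearance information and gated by the Mahalanobis distance. Cascade matching and IOU matching are then run in turn. This yields all matched pairs for the current frame, together with the unmatched tracks and the unmatched detections.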
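Here is a minimal sketch of how such a gated appearance cost matrix could be assembled, with rows for tracks and columns for detections. It rests on these assumptions:
- Each track keeps a small gallery of past feature vectors, and its cost to a detection is the smallest cosine distance to that gallery.
- For brevity, the squared Mahalanobis distance uses a diagonal covariance; deep_sort uses the full covariance projected into measurement space.
- 9.4877 is the 95% chi-square quantile for 4 degrees of freedom, matching the 4-dim measurement (cx, cy, r, h).
- `1e5` is simply a large cost.

All function names are hypothetical.

```python
import math

GATING_THRESHOLD = 9.4877  # 95% chi-square quantile, 4 degrees of freedom
INFTY_COST = 1e5           # large value written into gated-out entries


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_distance(a, b):
    """1 - cosine similarity of two feature vectors."""
    a, b = _normalize(a), _normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def mahalanobis_sq_diag(mean, var, z):
    """Squared Mahalanobis distance, ASSUMING a diagonal covariance (simplification)."""
    return sum((zi - mi) ** 2 / vi for zi, mi, vi in zip(z, mean, var))


def gated_appearance_cost(track_galleries, track_means, track_vars, det_features, det_boxes):
    cost = []
    for gallery, mean, var in zip(track_galleries, track_means, track_vars):
        row = []
        for feat, box in zip(det_features, det_boxes):
            # Appearance: smallest cosine distance to any stored feature of this track
            c = min(cosine_distance(g, feat) for g in gallery)
            # Motion gate: detections at implausible positions are pushed out of reach
            if mahalanobis_sq_diag(mean, var, box) > GATING_THRESHOLD:
                c = INFTY_COST
            row.append(c)
        cost.append(row)
    return cost
```

In this design, appearance decides which of the plausible detections a track prefers. The Mahalanobis gate only rules out detections that the motion model considers impossible.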

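Appearance feature extraction network: a small residual network. Its input is the image content of a detection box, resized to 128x64 (height x width, a shape chosen for pedestrians). Its output is a 128-dimensional feature vector.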

```python
# tracker.py (excerpt; imports as used by the methods below)
import numpy as np

from . import iou_matching, linear_assignment


def _match(self, detections):

    def gated_metric(tracks, dets, track_indices, detection_indices):
        """Cost matrix between the Kalman-predicted tracks and the current detections,
        based on appearance information and the Mahalanobis distance."""
        features = np.array([dets[i].feature for i in detection_indices])
        targets = np.array([tracks[i].track_id for i in track_indices])
        # Cosine-distance cost matrix from the appearance features
        cost_matrix = self.metric.distance(features, targets)
        # Mahalanobis gating: set infeasible entries to a large value
        cost_matrix = linear_assignment.gate_cost_matrix(
            self.kf, cost_matrix, tracks, dets, track_indices, detection_indices)
        return cost_matrix

    # Split the tracks into confirmed and unconfirmed tracks
    confirmed_tracks = [i for i, t in enumerate(self.tracks) if t.is_confirmed()]
    unconfirmed_tracks = [i for i, t in enumerate(self.tracks) if not t.is_confirmed()]

    # Cascade matching for the confirmed tracks
    matches_a, unmatched_tracks_a, unmatched_detections = \
        linear_assignment.matching_cascade(
            gated_metric, self.metric.matching_threshold, self.max_age,
            self.tracks, detections, confirmed_tracks)

    # IOU matching for the unconfirmed tracks plus those cascade-unmatched
    # tracks whose time_since_update is 1 (last updated in the previous frame)
    iou_track_candidates = unconfirmed_tracks + [
        k for k in unmatched_tracks_a if self.tracks[k].time_since_update == 1]
    unmatched_tracks_a = [
        k for k in unmatched_tracks_a if self.tracks[k].time_since_update != 1]
    matches_b, unmatched_tracks_b, unmatched_detections = \
        linear_assignment.min_cost_matching(
            iou_matching.iou_cost, self.max_iou_distance, self.tracks,
            detections, iou_track_candidates, unmatched_detections)

    # Merge all matched pairs and unmatched tracks
    matches = matches_a + matches_b
    unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b))
    return matches, unmatched_tracks, unmatched_detections
```

Cascade matching source:

```python
# linear_assignment.py
def matching_cascade(distance_metric, max_distance, cascade_depth, tracks, detections,
                     track_indices=None, detection_indices=None):
    ...
    unmatched_detections = detection_indices
    matches = []
    # Match the tracks level by level, from the smallest time_since_update upwards
    for level in range(cascade_depth):
        # No detections left: stop
        if len(unmatched_detections) == 0:
            break
        # Indices of all tracks at the current level
        track_indices_l = [
            k for k in track_indices if tracks[k].time_since_update == 1 + level]
        # No tracks at this level: move on to the next one
        if len(track_indices_l) == 0:
            continue
        # Hungarian matching
        matches_l, _, unmatched_detections = min_cost_matching(
            distance_metric, max_distance, tracks, detections,
            track_indices_l, unmatched_detections)
        matches += matches_l
    unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches))
    return matches, unmatched_tracks, unmatched_detections
```
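The toy below shows why the cascade favors recently seen tracks by reimplementing the level-by-level loop on a plain cost matrix. For brevity, an exhaustive search over assignments stands in for the Hungarian solver; it is only viable for tiny inputs. Both function names are hypothetical.

```python
from itertools import permutations


def brute_force_matching(cost, rows, cols, max_distance):
    """Exhaustive min-cost assignment (stand-in for the Hungarian solver, tiny inputs only)."""
    def clipped(r, c):
        return min(cost[r][c], max_distance + 1e-5)

    if len(rows) <= len(cols):
        candidates = [list(zip(rows, perm)) for perm in permutations(cols, len(rows))]
    else:
        candidates = [list(zip(perm, cols)) for perm in permutations(rows, len(cols))]
    best = min(candidates, key=lambda pairs: sum(clipped(r, c) for r, c in pairs))
    return [(r, c) for r, c in best if cost[r][c] <= max_distance]


def matching_cascade_sketch(cost, time_since_update, max_distance, cascade_depth):
    """Match tracks level by level: time_since_update == 1 first, then 2, and so on."""
    unmatched_dets = list(range(len(cost[0]))) if cost else []
    matches = []
    for level in range(cascade_depth):
        if not unmatched_dets:
            break
        level_tracks = [t for t, tsu in enumerate(time_since_update) if tsu == 1 + level]
        if not level_tracks:
            continue
        level_matches = brute_force_matching(cost, level_tracks, unmatched_dets, max_distance)
        matches += level_matches
        taken = {d for _, d in level_matches}
        unmatched_dets = [d for d in unmatched_dets if d not in taken]
    unmatched_tracks = sorted(set(range(len(time_since_update))) - {t for t, _ in matches})
    return matches, unmatched_tracks, unmatched_dets


cost = [[0.3],   # track 0: seen in the previous frame (time_since_update = 1)
        [0.2]]   # track 1: lost for three frames (time_since_update = 3)
print(matching_cascade_sketch(cost, [1, 3], max_distance=0.5, cascade_depth=30))
# -> ([(0, 0)], [1], [])
print(brute_force_matching(cost, [0, 1], [0], max_distance=0.5))
# -> [(1, 0)]
```

In the example, the only detection matches the fresh track at cost 0.3 and the long-lost track at cost 0.2. The cascade gives it to the fresh track, whereas a single global assignment gives it to the lost track. In deep_sort, the tracker passes `self.max_age` as `cascade_depth`, as in the `_match` code above.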

Kalman filter update stage
Each matched track is updated with its corresponding detection, after which the unmatched tracks and unmatched detections are handled:

```python
# tracker.py
def update(self, detections):
    """Perform measurement update and track management.

    Parameters
    ----------
    detections : List[deep_sort.detection.Detection]
        A list of detections at the current time step.
    """
    # Get the matched pairs, unmatched tracks and unmatched detections
    matches, unmatched_tracks, unmatched_detections = self._match(detections)
    # Update each matched track with its detection
    for track_idx, detection_idx in matches:
        self.tracks[track_idx].update(self.kf, detections[detection_idx])
    # Mark each unmatched track as missed
    for track_idx in unmatched_tracks:
        self.tracks[track_idx].mark_missed()
    # Initialize a new track from each unmatched detection
    for detection_idx in unmatched_detections:
        self._initiate_track(detections[detection_idx])
    ...
```
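The `update` and `mark_missed` calls above drive each track's life cycle. The hypothetical class below sketches that bookkeeping as we understand it from deep_sort's `Track`:
- A new track starts as tentative.
- It becomes confirmed after `n_init` hits.
- A tentative track is deleted at its first miss.
- A confirmed track is deleted once it goes unmatched for more than `max_age` frames.

The default values are illustrative, and the Kalman state itself is omitted.

```python
TENTATIVE, CONFIRMED, DELETED = "tentative", "confirmed", "deleted"


class TrackSketch:
    """Hypothetical minimal track: only the state bookkeeping, no Kalman filter."""

    def __init__(self, n_init=3, max_age=30):
        self.state = TENTATIVE
        self.hits = 1               # the creating detection counts as the first hit
        self.time_since_update = 0
        self.n_init, self.max_age = n_init, max_age

    def predict(self):
        # Called once per frame before matching
        self.time_since_update += 1

    def update(self):
        # Matched with a detection in this frame
        self.hits += 1
        self.time_since_update = 0
        if self.state == TENTATIVE and self.hits >= self.n_init:
            self.state = CONFIRMED

    def mark_missed(self):
        # Tentative tracks die at their first miss; confirmed ones after max_age misses
        if self.state == TENTATIVE or self.time_since_update > self.max_age:
            self.state = DELETED


t = TrackSketch(n_init=3, max_age=2)
for _ in range(2):
    t.predict()
    t.update()
print(t.state)  # -> confirmed
```

Deleting tentative tracks so eagerly keeps spurious one-off detections from turning into long-lived tracks.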

Reference: https://zhuanlan.zhihu.com/p/202993073

II. SLAM

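**SLAM (simultaneous localization and mapping) works as follows. A mobile device, such as a robot, drone, or phone, starts from an unknown location in an unknown environment. While it moves, it uses its sensors (lidar, cameras, etc.) to estimate its own position, orientation, and trajectory. It then builds a map incrementally from that estimated pose, so that localization and mapping happen at the same time.** The two processes reinforce each other: the map enables better localization, and localization in turn allows the map to be extended. Note that for a robot vacuum, for example, localization and mapping are the basic requirements of SLAM. Path planning is a higher-level function built on top of them and falls outside the scope of SLAM.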
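The sketch below is a toy illustration of the "localize, then extend the map" loop described above. It is not a SLAM solver: the pose comes from dead reckoning alone and is never corrected against the map. It composes 2-D odometry into a pose and drops range/bearing observations into an occupancy set. All names, the 2-D pose model, and the grid cell size are assumptions.

```python
import math


def compose(pose, odom):
    """Apply a motion (dx, dy, dtheta), expressed in the robot frame, to a world pose."""
    x, y, th = pose
    dx, dy, dth = odom
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)


def to_world(pose, rng, bearing):
    """Convert a range/bearing observation in the robot frame to world coordinates."""
    x, y, th = pose
    return (x + rng * math.cos(th + bearing), y + rng * math.sin(th + bearing))


def run(odometry, scans, cell=1.0):
    pose = (0.0, 0.0, 0.0)
    trajectory, grid_map = [pose], set()
    for odom, scan in zip(odometry, scans):
        pose = compose(pose, odom)           # localization step (dead reckoning only)
        trajectory.append(pose)
        for rng, bearing in scan:            # mapping step: add the observed cells
            wx, wy = to_world(pose, rng, bearing)
            grid_map.add((math.floor(wx / cell), math.floor(wy / cell)))
    return trajectory, grid_map


traj, occupied = run([(1.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2)],
                     [[(2.5, 0.0)], [(1.5, 0.0)]])
print(occupied)  # the two cells hit by the two observations
```

A real SLAM system would also use the map to correct the pose, for example by scan matching or loop closure. That correction is exactly the "map enables better localization" half that this toy leaves out.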

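Applications of SLAM
The sensors used for SLAM fall into two main categories: lidar and vision (cameras). Early SLAM research relied almost entirely on lidar. Its advantages are high accuracy and relatively mature solutions, but its drawbacks are just as clear: it is expensive and bulky, and its data carry less information and are less intuitive than camera images.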

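Visual SLAM uses a camera as the main sensor and takes the captured video stream as input for simultaneous localization and mapping. It is widely used in cutting-edge fields such as AR, autonomous driving, intelligent robots, and drones. As noted above, the two cores of SLAM are localization and mapping.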

III. VR & AR & MR

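Virtual Reality (VR)
VR presents a completely virtual world to the user through a head-mounted display (as shown in the figure below). The display is usually fully enclosed, which creates a sense of immersion.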

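Augmented Reality (AR)
1. Marker-based AR
Early marker-based AR usually used QR codes as the trigger pattern, because QR code recognition is mature, simple, fast, and highly reliable. In addition, the QR code's known square pattern makes it easy to compute the camera's position and orientation.
2. Location-based (LBS) AR
Location-based AR typically gets its position data from sensors built into phones and other smart devices, such as GPS, an electronic compass, and accelerometers. It is most commonly used in map applications.
3. Projection-based AR
Projection-based AR presents information by projecting it directly onto the surfaces of real objects.
4. Scene-understanding-based AR
Here, object recognition and scene understanding play a crucial role, because they directly determine how realistic the final result looks.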
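For location-based AR (type 2 above), one common way to place a label on screen is to compare the bearing from the device to a point of interest (POI) with the compass heading. The minimal sketch below rests on three assumptions. It uses the standard great-circle initial-bearing formula, and it assumes the camera points along the compass heading. It also maps angles to pixels linearly, which simplifies a real pinhole projection. The field of view, screen width, and function names are hypothetical.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Great-circle initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def screen_x(device_lat, device_lon, heading_deg, poi_lat, poi_lon,
             fov_deg=60.0, screen_width=1080):
    """Horizontal pixel for a POI label, or None if the POI is outside the camera view."""
    bearing = initial_bearing(device_lat, device_lon, poi_lat, poi_lon)
    # Signed angle from the camera direction to the POI, wrapped into [-180, 180)
    offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    if abs(offset) > fov_deg / 2:
        return None
    # Linear angle-to-pixel mapping (simplification)
    return round((offset / fov_deg + 0.5) * screen_width)


print(screen_x(0.0, 0.0, 0.0, 1.0, 0.0))  # POI due north, camera facing north -> 540
print(screen_x(0.0, 0.0, 0.0, 0.0, 1.0))  # POI due east, camera facing north -> None
```

In practice, the accelerometer mentioned above is used to estimate the device's tilt, which the vertical placement would need. This sketch ignores it.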

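Mixed Reality (MR)
AR overlays virtual things onto the real world, whereas MR overlays real things onto a virtual world. This may sound like almost the same thing, since both combine the real and the virtual, but the difference is large. Overlaying the virtual onto reality is relatively easy: the computer generates the virtual object and displays it on top of the real camera image. Overlaying reality onto the virtual world is much harder, because the real objects must first be virtualized. Virtualization generally means scanning an object with a camera and reconstructing it in 3D. A camera image is two-dimensional: it is flat and has lost the depth information, so it has no sense of volume. Algorithms are therefore needed to reconstruct 3D from the camera's 2D video and generate virtual 3D objects; this is called virtualizing a real object. The biggest difference between MR and AR is that MR can present the virtualized result to multiple people, enabling multi-user interaction.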
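A pinhole camera model shows the loss of depth mentioned above. Every 3D point along the same ray projects to the same pixel, so recovering the 3D point requires an extra depth value. Below is a minimal sketch with hypothetical intrinsics: focal length `f` and principal point `cx`, `cy`.

```python
def project(point, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a camera-frame 3D point (X, Y, Z) to a pixel (u, v)."""
    X, Y, Z = point
    return (f * X / Z + cx, f * Y / Z + cy)


def back_project(pixel, depth, f=500.0, cx=320.0, cy=240.0):
    """Recover the 3D point from a pixel; only possible once its depth Z is known."""
    u, v = pixel
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


print(project((1.0, 0.5, 2.0)))           # -> (570.0, 365.0)
print(project((2.0, 1.0, 4.0)))           # -> (570.0, 365.0), same pixel
print(back_project((570.0, 365.0), 4.0))  # -> (2.0, 1.0, 4.0)
```

Both points land on the same pixel, and only supplying the depth Z = 4 picks out one of them. 3D reconstruction methods obtain that missing depth from other sources, for example multiple views or a depth sensor.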