Paper - DeepSort Paper Review
I wrote up a paper review.
DeepSort Paper Review
- This is a review of the paper, written as a summary of its key points.
ABSTRACT
- Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms.
- In this paper, we integrate appearance information to improve the performance of SORT.
- Due to this extension, we are able to track objects through longer periods of occlusion, effectively reducing the number of identity switches.
- During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space.
- Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.
1. INTRODUCTION
- While achieving overall good performance in terms of tracking precision and accuracy, SORT returns a relatively high number of identity switches. This is because the employed association metric is only accurate when state estimation uncertainty is low.
- We overcome this issue by replacing the association metric with a more informed metric that combines motion and appearance information.
2. SORT WITH DEEP ASSOCIATION METRIC
- We adopt a conventional single-hypothesis tracking methodology with recursive Kalman filtering and frame-by-frame data association.
2.1. Track Handling and State Estimation
- The track handling and Kalman filtering framework is mostly identical to the original formulation in SORT.
2.2. Assignment Problem
- Into this problem formulation we integrate motion and appearance information through a combination of two appropriate metrics.
Mahalanobis distance
- To incorporate motion information, we use the (squared) Mahalanobis distance between predicted Kalman states and newly arrived measurements: $d^{(1)}(i,j) = (d_j - y_i)^{T} S_i^{-1} (d_j - y_i)$.
- Here $(y_i, S_i)$ denotes the projection of the $i$-th track distribution into measurement space, and $d_j$ denotes the $j$-th bounding box detection.
- The Mahalanobis distance takes state estimation uncertainty into account by measuring how many standard deviations the detection is away from the mean track location.
- Using this metric, it is possible to exclude unlikely associations by thresholding the Mahalanobis distance at a 95% confidence interval computed from the inverse $\chi^{2}$ distribution.
- We denote this decision with an indicator $b^{(1)}_{i,j} = \mathbb{1}[d^{(1)}(i,j) \le t^{(1)}]$, which evaluates to 1 if the association between the $i$-th track and $j$-th detection is admissible. Here $t^{(1)}$ is the threshold obtained from the $\chi^{2}$ distribution.
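The motion gate described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the example track, and the example detections are hypothetical, and the sketch assumes a four-dimensional measurement space (which these notes do not state), so the 95% threshold is read from a standard chi-square table for four degrees of freedom.

```python
# 95% quantiles of the chi-square distribution, indexed by degrees of freedom
# (standard table values).
CHI2_95 = {1: 3.8415, 2: 5.9915, 3: 7.8147, 4: 9.4877}


def solve(S, v):
    """Solve S x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented matrix [S | v], copied so the inputs are not modified.
    A = [[float(x) for x in S[r]] + [float(v[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - tail) / A[r][r]
    return x


def squared_mahalanobis(d, y, S):
    """Return (d - y)^T S^{-1} (d - y) for detection d and track (y, S)."""
    diff = [a - b for a, b in zip(d, y)]
    return sum(a * b for a, b in zip(diff, solve(S, diff)))


def motion_gate(d, y, S):
    """Indicator: 1 if the detection lies inside the 95% gate, else 0."""
    return 1 if squared_mahalanobis(d, y, S) <= CHI2_95[len(d)] else 0


# Hypothetical track projected into a 4-D measurement space.
y = [0.0, 0.0, 0.0, 0.0]
S = [[4.0, 0, 0, 0], [0, 4.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
near = [2.0, 2.0, 1.0, 1.0]  # one standard deviation off in every dimension
far = [10.0, 0.0, 0.0, 0.0]  # five standard deviations off in one dimension

print(squared_mahalanobis(near, y, S), motion_gate(near, y, S))
print(squared_mahalanobis(far, y, S), motion_gate(far, y, S))
```

The first detection stays inside the gate and the second falls outside it, which is the sense in which the threshold excludes unlikely associations.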
Smallest cosine distance
- The second metric measures the smallest cosine distance between the $i$-th track and $j$-th detection in appearance space, taken over the appearance descriptors stored for that track.
- We apply a pre-trained CNN to compute bounding box appearance descriptors.
- We call an association admissible if it is within the gating region of both metrics.
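- To build the association problem, we combine both metrics using a weighted sum, $c_{i,j} = \lambda\, d^{(1)}(i,j) + (1 - \lambda)\, d^{(2)}(i,j)$, where $d^{(1)}$ is the Mahalanobis distance, $d^{(2)}$ is the smallest cosine distance, and $\lambda$ controls the influence of each metric.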
- During our experiments we found that setting $\lambda = 0$ is a reasonable choice when there is substantial camera motion. In that case only appearance information enters the association cost.
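The appearance metric, the weighted sum, and the combined gate can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the paper's code: the function names, the two-dimensional descriptors, and the appearance threshold `T_APPEARANCE` are hypothetical, and the motion threshold assumes a four-dimensional measurement space.

```python
import math

T_MOTION = 9.4877      # chi-square 95% quantile, 4 degrees of freedom (assumed)
T_APPEARANCE = 0.3     # hypothetical appearance threshold, for illustration only


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def smallest_cosine_distance(gallery, descriptor):
    """Smallest cosine distance between a detection descriptor and any
    descriptor stored for the track."""
    return min(cosine_distance(r, descriptor) for r in gallery)


def combined_cost(d_motion, d_appearance, lam):
    """Weighted sum of the motion and appearance distances."""
    return lam * d_motion + (1.0 - lam) * d_appearance


def admissible(d_motion, d_appearance):
    """An association is admissible only if it passes both gates."""
    return d_motion <= T_MOTION and d_appearance <= T_APPEARANCE


# Hypothetical track gallery and detection descriptor (2-D for readability).
gallery = [[1.0, 0.0], [0.0, 1.0]]
descriptor = [0.6, 0.8]

d_app = smallest_cosine_distance(gallery, descriptor)
d_mot = 4.0  # hypothetical squared Mahalanobis distance for the same pair

cost = combined_cost(d_mot, d_app, lam=0.0)  # appearance-only cost
print(d_app, cost, admissible(d_mot, d_app))
```

With `lam=0.0` the cost equals the appearance distance, but `admissible` still checks the motion gate, so a detection that is implausibly far from the predicted location can be rejected even when it looks similar.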
2.3. Matching Cascade
- When an object is occluded for a longer period of time, subsequent Kalman filter predictions increase the uncertainty associated with the object location.
- Therefore, we introduce a matching cascade that gives priority to more frequently seen objects to encode our notion of probability spread in the association likelihood.
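The priority rule of the cascade can be illustrated with a small sketch. It assumes each track carries an age, meaning the number of frames since it was last matched, and it processes tracks from the smallest age upward. The names `matching_cascade`, `greedy_match`, `ages`, and `max_cost` are hypothetical, and the per-level assignment uses a simple greedy matcher as a stand-in for an optimal assignment solver, so this is not a reproduction of the paper's procedure.

```python
def greedy_match(cost, track_ids, det_ids, max_cost):
    """Greedy stand-in for an optimal assignment solver: repeatedly take
    the cheapest remaining (track, detection) pair under the gate."""
    pairs = sorted(
        (cost[t][d], t, d)
        for t in track_ids
        for d in det_ids
        if cost[t][d] <= max_cost
    )
    used_tracks, used_dets, matches = set(), set(), []
    for _, t, d in pairs:
        if t not in used_tracks and d not in used_dets:
            matches.append((t, d))
            used_tracks.add(t)
            used_dets.add(d)
    return matches


def matching_cascade(cost, ages, num_dets, max_age, max_cost):
    """Match tracks to detections, most recently seen tracks first.

    cost[t][d] is the association cost of track t and detection d;
    ages[t] is the number of frames since track t was last matched.
    """
    unmatched = list(range(num_dets))
    matches = []
    for age in range(1, max_age + 1):
        if not unmatched:
            break
        level_tracks = [t for t, a in enumerate(ages) if a == age]
        if not level_tracks:
            continue
        level = greedy_match(cost, level_tracks, unmatched, max_cost)
        matches.extend(level)
        taken = {d for _, d in level}
        unmatched = [d for d in unmatched if d not in taken]
    return matches, unmatched


# Two hypothetical tracks compete for a single detection. Track 0 has been
# occluded for three frames; track 1 was matched in the previous frame.
cost = [[0.1], [0.2]]
ages = [3, 1]

matches, unmatched = matching_cascade(cost, ages, num_dets=1, max_age=3, max_cost=0.5)
print(matches, unmatched)
```

In this example the detection goes to the recently seen track 1 even though the occluded track 0 has the lower cost, which is the behavior the cascade is meant to enforce.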
3. EXPERIMENTS
- Compared to SORT, the number of identity switches decreases by approximately 45%.
- Overall, due to the integration of appearance information, we successfully maintain identities through longer occlusions.