Too Far to See? Not Really! — Pedestrian Detection with Scale-aware Localization Policy

Proposed System: an active pedestrian detector that explicitly operates over multi-layer representations and a learned localization policy.

Representations from multiple ResNet layers are each used to compile pedestrian proposals of a different size. These proposals are then passed to our localization policy module, which produces the final outputs.
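One way to picture "different layers for different sizes" is a rule that routes each proposal to a ResNet stage by its height. The sketch below assumes that shallower, higher-resolution layers serve small (distant) pedestrians and deeper layers serve large ones. The stage names (`res3`, `res4`, `res5`) and the pixel thresholds are hypothetical, not values from the paper.

```python
# Hypothetical routing of proposals to ResNet stages by pedestrian height.
# Stage names and thresholds are illustrative assumptions only.
STAGE_THRESHOLDS = [
    (50.0, "res3"),   # small, distant pedestrians -> shallower, finer feature map
    (100.0, "res4"),  # medium-sized pedestrians
]
DEFAULT_STAGE = "res5"  # large pedestrians -> deepest, coarsest feature map


def assign_stage(box):
    """Return the stage whose feature map a proposal would be pooled from.

    box is (x1, y1, x2, y2) in image pixels. Height is used as the scale
    measure because pedestrians have a roughly constant aspect ratio.
    """
    height = box[3] - box[1]
    for max_height, stage in STAGE_THRESHOLDS:
        if height < max_height:
            return stage
    return DEFAULT_STAGE


def group_by_stage(boxes):
    """Bucket proposals by their assigned stage."""
    groups = {}
    for box in boxes:
        groups.setdefault(assign_stage(box), []).append(box)
    return groups
```

Under these thresholds, a 40-pixel-tall proposal lands in `res3` and a 300-pixel-tall one in `res5`.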

Existing System: Initial pedestrian proposals are obtained with Faster R-CNN techniques, i.e. a region proposal network (RPN) followed by a region of interest (RoI) pooling layer placed right after the ResNet convolutional layer of interest. Together they produce joint predictions of the bounding-box proposals’ locations and categories (i.e. pedestrian or not).
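In the standard Faster R-CNN formulation, the location half of these joint predictions is a set of offsets (tx, ty, tw, th) relative to a reference box: center shifts scaled by the reference width and height, and log-space size ratios. The sketch below encodes and decodes those offsets. The function names are illustrative, and the continuous-coordinate form (no "+1" on widths) is an assumption, since implementations differ on that detail.

```python
import math


def _center_size(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def encode_box(anchor, box):
    """Regression target (tx, ty, tw, th) that maps anchor onto box."""
    xa, ya, wa, ha = _center_size(anchor)
    x, y, w, h = _center_size(box)
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode_box(anchor, deltas):
    """Apply predicted offsets to a reference box (RPN anchor or RoI)."""
    xa, ya, wa, ha = _center_size(anchor)
    tx, ty, tw, th = deltas
    x, y = xa + tx * wa, ya + ty * ha              # shift the center
    w, h = wa * math.exp(tw), ha * math.exp(th)    # rescale in log space
    return (x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h)
```

Encoding a ground-truth box against a proposal and decoding the result recovers the original box, which is why the regression branch can be trained on encoded targets and applied with `decode_box` at test time.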

Methodology:


A high-level overview of our approach is displayed in Figure; it consists of two main stages. An input image first passes through an initialization stage, which prepares the multi-layer feature representations and the initial pedestrian proposals (proposals of different sizes are compiled from different ResNet layers). These are then fed into an active detection stage, where a dedicated localization policy produces the final bounding-box (bbox) predictions by executing sequences of coordinate transformation actions.
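The "sequences of coordinate transformation actions" can be sketched as a small action set applied repeatedly to a box until a stop action is chosen. The actions below (translations, scalings, aspect-ratio changes and a terminating `stop`), the step size `ALPHA`, and the step limit are assumptions for illustration; the source does not list the exact actions.

```python
ALPHA = 0.2  # assumed step size, as a fraction of the current box width/height


def apply_action(box, action, alpha=ALPHA):
    """Apply one hypothetical transformation action to box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    dw, dh = alpha * (x2 - x1), alpha * (y2 - y1)
    if action == "left":
        return (x1 - dw, y1, x2 - dw, y2)
    if action == "right":
        return (x1 + dw, y1, x2 + dw, y2)
    if action == "up":
        return (x1, y1 - dh, x2, y2 - dh)
    if action == "down":
        return (x1, y1 + dh, x2, y2 + dh)
    if action == "bigger":   # grow both sides symmetrically
        return (x1 - dw / 2, y1 - dh / 2, x2 + dw / 2, y2 + dh / 2)
    if action == "smaller":  # shrink both sides symmetrically
        return (x1 + dw / 2, y1 + dh / 2, x2 - dw / 2, y2 - dh / 2)
    if action == "wider":    # widen only, keeping the height
        return (x1 - dw / 2, y1, x2 + dw / 2, y2)
    if action == "taller":   # heighten only, keeping the width
        return (x1, y1 - dh / 2, x2, y2 + dh / 2)
    if action == "stop":
        return box
    raise ValueError(f"unknown action: {action}")


def run_actions(box, actions, max_steps=10):
    """Apply actions in order, ending at 'stop' or after max_steps actions."""
    for step, action in enumerate(actions):
        if step >= max_steps or action == "stop":
            break
        box = apply_action(box, action)
    return box
```

Scaling each step by the current box size keeps the actions meaningful for both small and large pedestrians, which matters for a scale-aware detector.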

ResNet: An input image is passed through the ResNet layers to form multi-layer feature representations, from which bbox proposals of different scales are also generated. The region of interest (RoI) pooling layer is then used to pool the feature maps of each pedestrian proposal into a fixed-length feature vector, which is fed into a fully connected layer. Finally, each pedestrian proposal has two outputs: a classification score and a predicted bbox position.
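RoI pooling can be sketched on plain Python lists: the RoI is divided into a fixed grid of bins and each bin keeps its maximum, so every proposal yields a vector of the same length regardless of its size. The `stride` used to map image coordinates onto a layer's feature map, the 2 x 2 output grid, and the function names are assumptions for illustration.

```python
import math


def to_feature_coords(box, stride):
    """Map an image-space box (x1, y1, x2, y2) to feature-map cells.

    stride is the assumed downsampling factor of the chosen ResNet layer.
    The result is half-open: columns x1..x2-1 and rows y1..y2-1.
    """
    x1, y1, x2, y2 = box
    fx1, fy1 = int(x1 // stride), int(y1 // stride)
    fx2 = max(fx1 + 1, math.ceil(x2 / stride))
    fy2 = max(fy1 + 1, math.ceil(y2 / stride))
    return fx1, fy1, fx2, fy2


def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one RoI of a single-channel map (list of rows) into output_size**2 values."""
    x1, y1, x2, y2 = roi
    if not (0 <= x1 < x2 <= len(feature_map[0]) and 0 <= y1 < y2 <= len(feature_map)):
        raise ValueError("RoI must be non-empty and lie inside the feature map")
    roi_w, roi_h = x2 - x1, y2 - y1
    pooled = []
    for i in range(output_size):
        # Floor the start and ceil the end so every bin covers at least one cell.
        ys = y1 + math.floor(i * roi_h / output_size)
        ye = y1 + math.ceil((i + 1) * roi_h / output_size)
        for j in range(output_size):
            xs = x1 + math.floor(j * roi_w / output_size)
            xe = x1 + math.ceil((j + 1) * roi_w / output_size)
            pooled.append(max(feature_map[r][c] for r in range(ys, ye) for c in range(xs, xe)))
    return pooled


def roi_pool_vector(channels, roi, output_size=2):
    """Concatenate per-channel pooled values into one fixed-length feature vector."""
    vector = []
    for fmap in channels:
        vector.extend(roi_max_pool(fmap, roi, output_size))
    return vector
```

Each proposal ends up with `len(channels) * output_size**2` values, which is what lets a single fully connected layer consume proposals of any size.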

Localization Policy: The localization policy, which delivers the final results, is learned from data by exploiting both local contextual information and multi-layer representations, using deep reinforcement learning and recurrent neural network (RNN) techniques. It draws on earlier localization policies that identify the bounding box of an object of interest, as well as on digit recognition with a recurrent attention mechanism: at each time step t, both spatial and temporal contextual information are incorporated to decide the current transformation action.
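A single step of such a recurrent policy can be sketched as follows: an observation for the current box (standing in for the spatial context) is combined with the previous hidden state (the temporal context) in a plain tanh RNN cell, and a softmax over actions gives the distribution to sample from. The weights are random and untrained, the observation vector is a hypothetical stand-in for pooled features, and the sign-of-IoU-improvement reward is an assumed reinforcement learning signal; the source does not specify the cell type or the reward.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_params(obs_dim, hidden_dim, num_actions, seed=0):
    """Random, untrained weights for illustration only."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"W_x": rand(hidden_dim, obs_dim),
            "W_h": rand(hidden_dim, hidden_dim),
            "W_a": rand(num_actions, hidden_dim)}


def policy_step(params, obs, h_prev):
    """One time step t: fuse observation and previous state, then score actions."""
    pre = [a + b for a, b in zip(matvec(params["W_x"], obs), matvec(params["W_h"], h_prev))]
    h = [math.tanh(v) for v in pre]
    return h, softmax(matvec(params["W_a"], h))


def sample_action(probs, rng):
    """Sample an action index from the categorical policy."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def step_reward(old_box, new_box, gt_box):
    """Assumed reward: +1 if the action raised IoU with the ground truth, else -1."""
    return 1.0 if iou(new_box, gt_box) > iou(old_box, gt_box) else -1.0
```

Carrying `h` from one call of `policy_step` to the next is what gives the policy memory of earlier steps; during training, rewards like `step_reward` would drive a policy-gradient update of the weights.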

Drawback: the approach does not yet address the challenge of detecting heavily occluded pedestrian instances, and its ability to tackle more generic object detection scenarios remains to be investigated.

Requirement: an IDE (MATLAB, Java, or DotNet) with built-in or plugin support for ResNet and RNNs.
