# Mono-Sense

End-to-end autonomous vehicle scene understanding: monocular depth, multi-class detection, 3D tracking, velocity estimation, and collision prediction.

*Mono-Sense side-by-side comparison (demo image)*

**MonoSense Demo** — ▶ Click to watch the full Product Pitch on YouTube.

---

## Directory Structure

Code/
├── config.py                                  # Camera calibration constants (front/left/right/back fisheye)
├── load_camera_params.py                      # Calibration loader utility
├── process_image.py                           # Single-image detection (YOLO + Depth Anything)
├── process_video.py                           # Single-video detection (YOLO + Depth Anything)
├── make_comparison.py                         # Side-by-side output comparison
├── run_pipeline.sh                            # Main pipeline orchestrator (Phases 3b–3h)
├── run_slurm.sh                               # SLURM job submission wrapper
│
├── Depth-Anything-V2/                         # Depth estimation module
│   ├── run.py                                 # Single-image depth inference
│   ├── run_video.py                           # Video-to-depth inference
│   ├── test_batch_depth.py                    # Batch depth testing
│   ├── app.py                                 # Interactive Gradio app
│   ├── requirements.txt
│   └── depth_anything_v2/                     # Model code (DPT + DINOv2 backbone)
│
├── Phase2and3/src/                            # Phase 2 master detection pipeline
│   ├── detect_all.py                          # Master script: depth + pose + objects + lanes
│   ├── depth_estimator.py                     # Depth Anything V2 wrapper
│   ├── object_detector.py                     # YOLOWorld open-vocabulary detector
│   ├── pose_estimator.py                      # YOLOv8-pose pedestrian keypoint estimator
│   ├── road_marking_detector.py               # HSV-based road arrow detector
│   ├── pick_best_frames.py                    # Frame quality selector
│   └── lane_detection/                        # UFLD v2 (CULane ResNet-34) lane detector
│
├── fcos3d/                                    # FCOS3D monocular 3D object detection
│   └── fcos3d_test.py                         # FCOS3D inference (3D bbox, yaw, score)
│
├── fcos_smoothing/                            # FCOS3D trajectory post-processing
│   ├── smooth_trajectories.py                 # EMA smoothing on 3D box sequences
│   └── script.sh
│
├── vehicle_tracking_fcos/                     # Phase 3: velocity estimation & collision
│   ├── ego_motion_estimator.py                # Ego-motion via KLT optical flow + depth
│   ├── absolute_velocity_estimator.py         # Vehicle velocity (BoT-SORT + EMA)
│   ├── smooth_velocities.py                   # Temporal velocity smoothing (moving avg)
│   ├── pedestrian_velocity_estimator.py       # Pedestrian absolute velocity (ego-relative)
│   ├── collision_detector.py                  # Time-to-collision prediction (5 s horizon)
│   └── visualize_velocity.py                  # Velocity vector + collision overlay on video
│
├── traffic_light_detection.py                 # Emissive LED traffic light classifier
└── traffic_sign_detection/                    # LISA-dataset YOLO traffic sign detector
    ├── process_all_scenes.py                  # Batch scene processor
    └── traffic_sign_video_inference.py        # Per-video inference

## Pipeline Overview

### Phase 2 — Per-Frame Scene Understanding

Run via `Phase2and3/src/detect_all.py`. Produces one JSON file per frame.

| Step | Module | Method | Output fields |
|---|---|---|---|
| Depth estimation | `depth_estimator.py` | Depth Anything V2 (vitl, 518 px input) | `depth_map` (H×W float32, metres) |
| Vehicle/pedestrian detection | — | Pre-computed Phase 1 JSON | `objects[]` (class, bbox, position_3d) |
| Misc object detection | `object_detector.py` | YOLOWorld (open-vocab) | `objects[]` + traffic cones, poles, signs |
| Pedestrian pose | `pose_estimator.py` | YOLOv8-pose, 17 COCO keypoints | `keypoints_3d[]` per pedestrian |
| Lane detection | `lane_detection/` | UFLD v2, CULane ResNet-34, Bézier fit | `lanes[]` (points_3d, confidence) |
| Road markings | `road_marking_detector.py` | White/yellow HSV + morphology | `road_markings[]` (type, bbox_2d, position_3d) |
| Traffic lights | `traffic_light_detection.py` | HSV + CLAHE, 6× upscale | `traffic_lights[]` (state, confidence, position_3d) |
| Speed bumps | `detect_all.py` | YOLOv5n TF SavedModel (640×640) | `speed_bumps[]` (bbox_2d, position_3d) |
| Vehicle sub-type | `detect_all.py` | YOLOv8n-cls, ImageNet mapping | `subtype` per vehicle (sedan/suv/pickup) |
| Brake lights | `detect_all.py` | Red HSV on lower half of vehicle crop | `brake_light` bool per vehicle |

Usage:

```bash
python Phase2and3/src/detect_all.py \
  --video path/to/scene.mp4 \
  --detections_dir Phase1/JSONFiles/scene1/ \
  --out_dir output/scene1/ \
  --step 1
```
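Downstream tools can consume the per-frame JSON with a short helper. The field names come from the table above, but the exact file layout is an assumption — a minimal sketch, not the canonical reader:

```python
import json

def summarize_frame(path):
    """Count detections in one Phase 2 per-frame JSON.

    Field names follow the table above; the exact layout written by
    detect_all.py is assumed here — adjust keys to match real output.
    """
    with open(path) as f:
        frame = json.load(f)
    return {key: len(frame.get(key, []))
            for key in ("objects", "lanes", "road_markings",
                        "traffic_lights", "speed_bumps")}
```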

### Phase 3 — 3D Tracking, Velocity & Collision

Orchestrated by `run_pipeline.sh`. Steps run sequentially per scene.

```
Video
  └─► [3a] FCOS3D 3D Detection          fcos3d/fcos3d_test.py
        └─► [3b] Trajectory Smoothing    fcos_smoothing/smooth_trajectories.py
              └─► [3c] Ego-Motion        vehicle_tracking_fcos/ego_motion_estimator.py
                    └─► [3d] Vehicle Velocity   absolute_velocity_estimator.py
                          └─► [3e] Velocity Smoothing  smooth_velocities.py
                                └─► [3f] Pedestrian Velocity  pedestrian_velocity_estimator.py
                                      └─► [3g] Collision Detection  collision_detector.py
                                            └─► [3h] Visualization  visualize_velocity.py
```

### Step Details

#### 3a — FCOS3D Inference (`fcos3d/fcos3d_test.py`)

```bash
python fcos3d/fcos3d_test.py \
  --video scene.mp4 \
  --config fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d.py \
  --checkpoint fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth \
  --calib-mat <path/to/front.mat/derived/from/calibration.mat> \
  --run-name scene1 --skip-frames 0
```

Output: one `frame_XXXXXX.json` per frame with `objects[]{label, score, bbox_3d[x, y, z, w, h, l, yaw]}`.
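Given the `objects[]` layout documented above, the per-frame output can be loaded and filtered like this (a sketch assuming that exact field layout; the score threshold is illustrative):

```python
import json

def load_fcos3d_frame(path, score_thresh=0.3):
    """Load one FCOS3D frame JSON and keep confident detections.

    Assumes the documented layout:
    objects[] = [{label, score, bbox_3d: [x, y, z, w, h, l, yaw]}, ...]
    """
    with open(path) as f:
        frame = json.load(f)
    dets = []
    for obj in frame.get("objects", []):
        if obj["score"] < score_thresh:
            continue  # drop low-confidence 3D boxes
        x, y, z, w, h, l, yaw = obj["bbox_3d"]
        dets.append({"label": obj["label"], "center": (x, y, z),
                     "size": (w, h, l), "yaw": yaw})
    return dets
```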

#### 3b — Trajectory Smoothing (`fcos_smoothing/smooth_trajectories.py`)

```bash
python fcos_smoothing/smooth_trajectories.py \
  --input-dir fcos_output/scene1/ \
  --output-dir fcos_output/scene1_smoothed/ \
  --ema-alpha-pos 0.35 --ema-alpha-size 0.2 \
  --distance-thresh 4.0 --max-missed 2 --interp-gap 2 --fps 10.0
```

Tracks objects frame-to-frame via 2D bbox IoU, applies EMA to position, size, and yaw, and linearly interpolates across gaps of ≤ 2 frames.
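The smoother combines two standard ingredients — 2D IoU for association and an exponential moving average for the state. A minimal sketch of both (not the repository's exact implementation):

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def ema(prev, new, alpha):
    """Exponential moving average: alpha weights the new observation,
    e.g. --ema-alpha-pos 0.35 for positions, 0.2 for sizes."""
    return tuple(alpha * n + (1 - alpha) * p for p, n in zip(prev, new))
```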

#### 3d — Vehicle Velocity (`vehicle_tracking_fcos/absolute_velocity_estimator.py`)

```bash
python vehicle_tracking_fcos/absolute_velocity_estimator.py \
  --input-dir fcos_output/scene1_smoothed/ \
  --output-dir velocity_output/scene1/
```

Uses YOLOv8n + BoT-SORT for track IDs, then computes the ego-corrected Δposition/Δt per track. Output fields per object: `track_id`, `speed_m_s`, `velocity_vector` `[vx, vy, vz]`, `position_3d`.
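The per-track computation reduces to a finite difference plus an ego-motion correction. A sketch under an assumed sign convention (`ego_vel` taken as the ego-induced apparent velocity of the static scene, which is subtracted out; the repository's actual convention may differ):

```python
def track_velocity(pos_prev, pos_curr, ego_vel, dt):
    """Ego-corrected Δposition/Δt for one track.

    pos_prev / pos_curr: (x, y, z) camera-frame positions (m)
    ego_vel: assumed ego-induced apparent scene velocity (m/s)
    dt: time between the two frames (s)
    """
    rel = tuple((c - p) / dt for p, c in zip(pos_prev, pos_curr))
    vel = tuple(r - e for r, e in zip(rel, ego_vel))
    speed = sum(v * v for v in vel) ** 0.5
    return vel, speed
```

For a parked car the apparent motion cancels exactly against the ego term, yielding zero absolute speed.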

#### 3e — Velocity Smoothing (`vehicle_tracking_fcos/smooth_velocities.py`)

```bash
python vehicle_tracking_fcos/smooth_velocities.py \
  --input-dir velocity_output/scene1/ \
  --output-dir smoothed_velocities/scene1/ \
  --window-size 30 --distance-thresh 5.0
```
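A moving average over the last `--window-size` samples per track is the core of this step. A minimal sketch (the `--distance-thresh` re-association of tracks across files is omitted):

```python
from collections import deque

class VelocitySmoother:
    """Per-track sliding-window mean of speed samples — a sketch of the
    --window-size behaviour, not the repository's exact implementation."""

    def __init__(self, window=30):
        self.history = {}  # track_id -> deque of recent speeds
        self.window = window

    def update(self, track_id, speed):
        buf = self.history.setdefault(track_id, deque(maxlen=self.window))
        buf.append(speed)
        return sum(buf) / len(buf)
```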

#### 3f — Pedestrian Velocity (`vehicle_tracking_fcos/pedestrian_velocity_estimator.py`)

```bash
python vehicle_tracking_fcos/pedestrian_velocity_estimator.py \
  --ego-vel-dir smoothed_velocities/scene1/ \
  --out-dir pedestrian_velocities/scene1/ \
  --fps 36.0 --pos-window 10 --vel-window 10
```

Detects pedestrians by `label == 7`. Absolute velocity = relative velocity − ego velocity.

#### 3g — Collision Detection (`vehicle_tracking_fcos/collision_detector.py`)

```bash
python vehicle_tracking_fcos/collision_detector.py \
  --input-dir pedestrian_velocities/scene1/ \
  --output-dir collision_results/scene1/ \
  --time-horizon 5.0 --radius 2.5 --parked-thresh 1.0
```

Predicts the closest approach via linear extrapolation over a 5-second horizon. Objects with speed < 1.0 m/s are classified as parked and excluded. The per-object collision flag is smoothed by majority vote across frames.
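Under constant-velocity extrapolation, the closest approach of an object at relative position p with relative velocity v occurs at t* = −(p·v)/(v·v), clamped to the horizon. A sketch of that calculation (not the repository's exact code):

```python
def time_of_closest_approach(rel_pos, rel_vel, horizon=5.0):
    """Closest approach to the ego origin under constant velocity.

    Returns (t, distance): t* = -(p.v)/(v.v) clamped to [0, horizon],
    and the separation at that time. A collision would then be flagged
    when distance falls below the --radius threshold.
    """
    vv = sum(v * v for v in rel_vel)
    t = 0.0 if vv == 0 else -sum(p * v for p, v in zip(rel_pos, rel_vel)) / vv
    t = max(0.0, min(horizon, t))
    d = sum((p + v * t) ** 2 for p, v in zip(rel_pos, rel_vel)) ** 0.5
    return t, d
```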

#### 3h — Visualization (`vehicle_tracking_fcos/visualize_velocity.py`)

```bash
python vehicle_tracking_fcos/visualize_velocity.py \
  --video scene.mp4 \
  --velocity-dir collision_results/scene1/ \
  --calib-mat <path/to/front.mat/derived/from/calibration.mat> \
  --out-video annotated_scene1.mp4
```

## Coordinate Conventions

| Space | X | Y | Z |
|---|---|---|---|
| Camera (raw) | Right | Down | Forward |
| Phase 2 JSON output | Right | Up | −Forward |

Valid depth range: 0.5–120 m.
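The two conventions differ only by flipped Y and Z axes, so the conversion is a pair of sign changes. A small sketch encoding the table and depth range above:

```python
def camera_to_phase2(p):
    """Map a raw camera-frame point (X right, Y down, Z forward)
    to the Phase 2 JSON convention (X right, Y up, Z = -forward)."""
    x, y, z = p
    return (x, -y, -z)

def depth_valid(z_forward):
    """Valid metric depth range per the table above."""
    return 0.5 <= z_forward <= 120.0
```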

## Key Dependencies

| Package | Used by |
|---|---|
| `torch`, `torchvision` | Depth Anything V2, UFLD, FCOS3D |
| `ultralytics` | YOLOWorld, YOLOv8-pose, YOLOv8n-cls, YOLOv8n (BoT-SORT) |
| `mmdet3d` | FCOS3D inference |
| `opencv-python` | All video I/O, HSV detectors, KLT flow |
| `tensorflow` | Speed bump YOLOv5n SavedModel |
| `scipy`, `scikit-image` | Morphology, Bézier fitting |
| `transformers` | Depth Anything V2 metric model |

Install Phase 2/3 dependencies:

```bash
pip install torch torchvision ultralytics opencv-python scipy scikit-image transformers
# For FCOS3D: follow the MMDetection3D install guide (use a separate virtual environment)
```

## Camera Calibration

Intrinsic parameters for all four cameras are defined in `config.py` as dicts with keys `fx`, `fy`, `cx`, `cy` and distortion coefficients `k1`, `k2`, `p1`, `p2`. `load_camera_params.py` uses them to construct the 3×3 camera matrix `K` passed to all 3D projection steps.
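From those keys, `K` has the standard pinhole form. A minimal sketch of building `K` and projecting a 3D point (fisheye distortion from `k1`, `k2`, `p1`, `p2` is deliberately ignored here):

```python
def make_K(fx, fy, cx, cy):
    """3x3 pinhole intrinsic matrix from the config.py-style keys."""
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]

def project(K, point_cam):
    """Project a camera-frame 3D point (X right, Y down, Z forward)
    to pixel coordinates (u, v); distortion is not modelled."""
    x, y, z = point_cam
    return (K[0][0] * x / z + K[0][2],
            K[1][1] * y / z + K[1][2])
```

A point on the optical axis lands exactly at the principal point `(cx, cy)`.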

## About

Tesla-inspired autonomous driving visualization using monocular camera perception — lane detection, 3D object placement, pedestrian pose estimation, and depth-based scene reconstruction rendered in Blender.
