Monocular 3D Indoor Detection

Overview

The mono3d_indoor_detection package is an indoor 3D object detection example built on the hobot_dnn package. It runs a 3D detection model on the RDK's BPU against indoor image data and outputs the inference results.

2D object detection only identifies object categories and bounding boxes. 3D object detection also estimates each object's position and orientation in 3D space. In navigation and obstacle avoidance, for example, this richer information helps the planning and control modules avoid obstacles more effectively.

Supported indoor object detection categories include: charging dock, trash can, and slippers.

Detection results for each category include:

Length, width, height: The dimensions of the object's 3D bounding box (a hexahedron), in meters.
Rotation: The object's orientation relative to the camera, in radians, in the range -π to π. It is the angle, measured in the camera coordinate system, between the object's forward direction and the camera's x-axis.
Depth information: The distance from the camera to the object, in meters.
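Applications that add offsets to or compare rotation values may need to keep them within the documented -π to π range. The following minimal Python sketch is not part of the package, and the helper names `wrap_to_pi` and `rotation_to_degrees` are hypothetical. It wraps an arbitrary angle into [-π, π) and converts a rotation to degrees for easier reading.

```python
import math


def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians into the half-open interval [-pi, pi)."""
    # Shift by pi, take the floored modulo over one full turn, then shift back.
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def rotation_to_degrees(rotation: float) -> float:
    """Convert a rotation in radians to degrees after wrapping it to [-pi, pi)."""
    return math.degrees(wrap_to_pi(rotation))


if __name__ == "__main__":
    # Rotation value taken from the example log in this document.
    print(f"{rotation_to_degrees(-1.571989):.2f} deg")
    # An angle slightly larger than +pi wraps around to the negative side.
    print(f"{wrap_to_pi(math.pi + 0.1):.4f} rad")
```

Note that with this half-open convention, an input of exactly π maps to -π.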

Code repository: https://github.com/D-Robotics/mono3d_indoor_detection

Application scenarios: Monocular 3D indoor detection identifies the position and orientation of objects directly from a single image, which enables object pose recognition. It is mainly used in autonomous driving, smart homes, and related fields.

Monocular 3D vehicle detection example: https://github.com/RayXie29/Kaggle-Peking-University-Baidu-Autonomous-Driving-32-place-solution

Supported Platforms

Platform	Runtime	Example Features
RDK X3, RDK X3 Module	Ubuntu 20.04 (Foxy), Ubuntu 22.04 (Humble)	Start MIPI/USB camera and display inference rendering results via Web
RDK X5, RDK X5 Module	Ubuntu 22.04 (Humble)	Start MIPI/USB camera and display inference rendering results via Web
X86	Ubuntu 20.04 (Foxy)	Start local image playback; save inference rendering results locally

Algorithm Information

Model	Platform	Input Size	Inference FPS
centernet	X3	1x3x512x960	85.93
centernet	X5	1x3x512x960	196.33
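For a rough reading of the table, throughput can be converted into an average time per frame (1000 / FPS milliseconds). This minimal Python sketch assumes the FPS figures reflect sustained model inference throughput only. The result says nothing about end-to-end latency, which also includes image acquisition, pre-processing, and post-processing.

```python
# Inference throughput from the Algorithm Information table, keyed by platform.
INFERENCE_FPS = {"X3": 85.93, "X5": 196.33}


def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    for platform, fps in INFERENCE_FPS.items():
        print(f"{platform}: {frame_time_ms(fps):.2f} ms/frame")
    # Relative inference throughput of X5 compared with X3.
    print(f"X5/X3 speedup: {INFERENCE_FPS['X5'] / INFERENCE_FPS['X3']:.2f}x")
```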

Preparation

RDK Platform

The RDK has been flashed with an Ubuntu system image.
TogetheROS.Bot has been successfully installed on the RDK.

X86 Platform

The X86 environment has been configured with an Ubuntu 20.04 system image.
tros.b has been successfully installed in the X86 environment.

Usage

Because the 3D detection model depends on camera parameters, those parameters must be adjusted when a different camera is used.
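The sketch below illustrates why camera parameters matter, using the standard pinhole camera model. A pixel (u, v) observed at depth z maps to camera coordinates X = (u - cx) · z / fx and Y = (v - cy) · z / fy, where fx, fy, cx, and cy are the camera intrinsics. This is a generic illustration, not the package's internal code, and the intrinsic values are hypothetical. Under a different lens, the same pixel and depth correspond to a different 3D position, which is one reason the model's camera parameters must match the camera in use.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical example values)."""
    fx: float  # focal length along the image x-axis
    fy: float  # focal length along the image y-axis
    cx: float  # principal point x-coordinate
    cy: float  # principal point y-coordinate


def back_project(u: float, v: float, depth: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Map pixel (u, v) at the given depth in meters to camera coordinates (X, Y, Z)."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return x, y, depth


if __name__ == "__main__":
    # Hypothetical intrinsics for a 960x512 input; real values come from calibration.
    narrow = Intrinsics(fx=500.0, fy=500.0, cx=480.0, cy=256.0)
    wide = Intrinsics(fx=250.0, fy=250.0, cx=480.0, cy=256.0)
    # The same pixel and depth land at different lateral offsets under different lenses.
    print(back_project(580.0, 256.0, 1.0, narrow))  # (0.2, 0.0, 1.0)
    print(back_project(580.0, 256.0, 1.0, wide))    # (0.4, 0.0, 1.0)
```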

The monocular 3D indoor detection example reads local images and runs detection inference on them. The algorithm detects object categories and 3D localization information, then publishes algorithm messages containing the 3D detection results. Users can subscribe to these messages to build their own applications.

RDK Platform

Foxy

# Configure the tros.b environment
source /opt/tros/setup.bash

Humble

# Configure the tros.b environment
source /opt/tros/humble/setup.bash

# Copy the configuration files required by the example from the tros.b installation path
cp -r /opt/tros/${TROS_DISTRO}/lib/mono3d_indoor_detection/config/ .

# Start the launch file
ros2 launch mono3d_indoor_detection mono3d_indoor_detection.launch.py

X86 Platform

# Configure the tros.b environment
source /opt/tros/setup.bash

# Copy the configuration files required by the example from the tros.b installation path
cp -r /opt/tros/${TROS_DISTRO}/lib/mono3d_indoor_detection/config/ .

# Start the launch file
ros2 launch mono3d_indoor_detection mono3d_indoor_detection.launch.py

Result Analysis

After the mono3d_indoor_detection package processes one frame of image data, the terminal prints output like the following:

[mono3d_indoor_detection-1] [INFO] [1662612553.868256257] [mono3d_detection]: target type: trash_can
[mono3d_indoor_detection-1] [INFO] [1662612553.868303755] [mono3d_detection]: target type: width, value: 0.236816
[mono3d_indoor_detection-1] [INFO] [1662612553.868358420] [mono3d_detection]: target type: height, value: 0.305664
[mono3d_indoor_detection-1] [INFO] [1662612553.868404002] [mono3d_detection]: target type: length, value: 0.224182
[mono3d_indoor_detection-1] [INFO] [1662612553.868448000] [mono3d_detection]: target type: rotation, value: -1.571989
[mono3d_indoor_detection-1] [INFO] [1662612553.868487790] [mono3d_detection]: target type: x, value: -0.191978
[mono3d_indoor_detection-1] [INFO] [1662612553.868530705] [mono3d_detection]: target type: y, value: -0.143963
[mono3d_indoor_detection-1] [INFO] [1662612553.868570870] [mono3d_detection]: target type: z, value: 0.714024
[mono3d_indoor_detection-1] [INFO] [1662612553.868611119] [mono3d_detection]: target type: depth, value: 0.714024
[mono3d_indoor_detection-1] [INFO] [1662612553.868651409] [mono3d_detection]: target type: score, value: 0.973215
[mono3d_indoor_detection-1] [INFO] [1662612553.868760238] [mono3d_detection]: target type: trash_can
[mono3d_indoor_detection-1] [INFO] [1662612553.868799486] [mono3d_detection]: target type: width, value: 0.253052
[mono3d_indoor_detection-1] [INFO] [1662612553.868842610] [mono3d_detection]: target type: height, value: 0.282349
[mono3d_indoor_detection-1] [INFO] [1662612553.868885191] [mono3d_detection]: target type: length, value: 0.257935
[mono3d_indoor_detection-1] [INFO] [1662612553.868929273] [mono3d_detection]: target type: rotation, value: -1.542728
[mono3d_indoor_detection-1] [INFO] [1662612553.868968855] [mono3d_detection]: target type: x, value: 0.552460
[mono3d_indoor_detection-1] [INFO] [1662612553.869010645] [mono3d_detection]: target type: y, value: -0.164073
[mono3d_indoor_detection-1] [INFO] [1662612553.869050018] [mono3d_detection]: target type: z, value: 1.088358
[mono3d_indoor_detection-1] [INFO] [1662612553.869088767] [mono3d_detection]: target type: depth, value: 1.088358
[mono3d_indoor_detection-1] [INFO] [1662612553.869126765] [mono3d_detection]: target type: score, value: 0.875521

The log excerpt shows the result for one frame. The subscribed algorithm message contains two targets of type trash_can. Each target includes its 3D dimensions (width, height, length), position (x, y, z), depth, rotation angle, and confidence score.
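If only a saved console log is available, the printed fields can still be grouped back into per-target records. The following Python sketch is a hypothetical helper, not part of the package. It assumes the exact log layout shown in this section: a `target type: <name>` line starts a new target, and each subsequent `target type: <field>, value: <number>` line adds one field to that target.

```python
import re

# "target type: <field>, value: <number>" carries one attribute of the current target.
FIELD_RE = re.compile(r"target type: (\w+), value: (-?\d+(?:\.\d+)?)")
# "target type: <name>" at the end of a line starts a new target.
TARGET_RE = re.compile(r"target type: (\w+)\s*$")


def parse_targets(log_lines):
    """Group mono3d_detection log lines into one dictionary per detected target."""
    targets = []
    for line in log_lines:
        field = FIELD_RE.search(line)
        if field:
            # Attach the numeric attribute to the most recent target, if any.
            if targets:
                targets[-1][field.group(1)] = float(field.group(2))
            continue
        target = TARGET_RE.search(line)
        if target:
            targets.append({"type": target.group(1)})
    return targets


if __name__ == "__main__":
    sample = [
        "[mono3d_indoor_detection-1] [INFO] [1662612553.868256257] [mono3d_detection]: target type: trash_can",
        "[mono3d_indoor_detection-1] [INFO] [1662612553.868303755] [mono3d_detection]: target type: width, value: 0.236816",
        "[mono3d_indoor_detection-1] [INFO] [1662612553.868570870] [mono3d_detection]: target type: z, value: 0.714024",
        "[mono3d_indoor_detection-1] [INFO] [1662612553.868651409] [mono3d_detection]: target type: score, value: 0.973215",
    ]
    for target in parse_targets(sample):
        print(target)
```

In a real application, subscribing to the published algorithm messages is more robust than parsing console text, because the log format may change between releases.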

The rendering of the inference result on the local image is saved to the result directory under the program's working directory. To use a different image, modify the feed_image field in mono3d_indoor_detection.launch.py. The inference result and rendering information for the image are shown below:
