[Hongke Solution] SimData High-Fidelity Virtual Dataset: An aiSim-Based Multi-Sensor Perception Data Solution for Autonomous Driving

Hongke's latest articles

HongKe


[HONGKE Solution] Autonomous Driving Perception Data Costs Too Much? It's Time to Use High-Fidelity Virtual Data

01 Introduction

When developing an autonomous driving perception system, model performance depends heavily on large-scale, high-quality perception datasets. Public datasets widely used in the industry include KITTI, nuScenes, and the Waymo Open Dataset. These datasets have laid an important foundation for the research and deployment of autonomous driving algorithms.

However, building a real-world perception dataset is not easy. It requires a significant investment of manpower, resources, and time. It also faces several further challenges:

- limited data acquisition
- privacy and compliance requirements
- time-consuming labeling
- difficulty capturing extreme scenarios

In this context, high-fidelity virtual datasets are becoming an important direction in autonomous driving perception research. A simulation platform can scale up data volume quickly. It can also flexibly construct complex traffic scenes, adverse weather conditions, and rare events, which gives models more comprehensive and diverse training samples.

With this in mind, Hongke has officially launched a new high-fidelity virtual dataset: SimData.

SimData relies on the high-precision physical modeling and realistic visual rendering of the aiSim simulation platform. It generates synchronized multi-sensor data (including Camera, LiDAR, Radar, and IMU) whose multimodal characteristics closely match real-world data.

SimData's data structure strictly follows the nuScenes dataset format specification. It can therefore be parsed and visualized directly with the official nuscenes-devkit tools, which significantly lowers the barrier to entry and the integration cost for developers.

In this article, we introduce the core features and construction process of SimData and demonstrate its performance on a typical perception task. The official release of SimData and the related comparison test report will be published soon, so stay tuned for the latest news from Hongke.

02 SimData Construction Process

Sensor Layout

On the aiSim simulation platform, we strictly reproduce the sensor layout of the nuScenes dataset. This keeps the data structure and the multimodal synchronization characteristics highly consistent with nuScenes.

The simulated vehicle is configured as follows:

6 surround-view cameras (Camera)
5 radars (Radar)
1 LiDAR
1 inertial measurement unit (IMU)
1 positioning system (GPS)

The camera and radar sample at 40 Hz and the LiDAR samples at 80 Hz. These rates support synchronized multi-sensor acquisition with high temporal precision.
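To make these rates concrete, the sketch below does two things:

- It builds ideal capture-time grids in microseconds, the unit of the timestamp field in sample_data.json.
- It matches each camera frame to the nearest LiDAR sweep.

It assumes all sensors start at the same instant with no clock jitter or phase offset, which real logs (and possibly SimData itself) do not guarantee. The helper names timestamp_grid and nearest are illustrative and are not part of any SimData tooling.

```python
import bisect


def timestamp_grid(rate_hz: float, duration_s: float, start_us: int = 0) -> list:
    """Ideal capture times in microseconds for a sensor running at rate_hz."""
    period_us = 1_000_000 / rate_hz
    n_frames = int(duration_s * rate_hz)
    return [start_us + round(i * period_us) for i in range(n_frames)]


def nearest(sorted_ts: list, t: int) -> int:
    """Return the element of sorted_ts closest to t (ties go to the earlier one)."""
    i = bisect.bisect_left(sorted_ts, t)
    candidates = sorted_ts[max(i - 1, 0):i + 1]  # neighbours on either side of t
    return min(candidates, key=lambda s: abs(s - t))


camera_ts = timestamp_grid(40, 1.0)  # 40 Hz -> 25 ms period
lidar_ts = timestamp_grid(80, 1.0)   # 80 Hz -> 12.5 ms period

# Pair every camera frame with its closest LiDAR sweep and measure the worst offset.
pairs = [(c, nearest(lidar_ts, c)) for c in camera_ts]
max_offset_us = max(abs(c - l) for c, l in pairs)
print(len(camera_ts), len(lidar_ts), max_offset_us)
```

On these ideal grids every camera frame lands exactly on every second LiDAR sweep, so the script prints `40 80 0`. Run on real timestamps, the same nearest-match logic reports the residual misalignment instead.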

The spatial layout and orientation of each sensor are shown below:

Coordinate System Description

All sensors in SimData use the FLU (Forward-Left-Up) coordinate system. This differs from the nuScenes dataset, where the camera sensors use the RDF (Right-Down-Forward) coordinate system.

During data construction, all annotation files underwent rigorous coordinate conversion and alignment so that their logical definitions are fully consistent with nuScenes.

As a result, users do not need to handle any additional coordinate differences when working with SimData. The data parsing workflow and development experience are the same as with nuScenes.
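The difference between the two camera conventions is a fixed axis permutation:

- right = -left
- down = -up
- forward = forward

The NumPy sketch below illustrates what the conversion involves. It is not SimData's actual conversion code. It covers two operations:

- mapping points from an FLU camera frame to the RDF optical frame used by nuScenes cameras;
- turning a camera-to-ego rotation defined for an FLU camera frame into the equivalent rotation for the RDF frame.

The function names are hypothetical.

```python
import numpy as np

# Rows express the RDF axes in FLU coordinates:
#   x_rdf (right)   = -y_flu (left)
#   y_rdf (down)    = -z_flu (up)
#   z_rdf (forward) =  x_flu (forward)
R_RDF_FROM_FLU = np.array([
    [0.0, -1.0, 0.0],
    [0.0, 0.0, -1.0],
    [1.0, 0.0, 0.0],
])
# Rotation matrices are orthonormal, so the inverse is the transpose.
R_FLU_FROM_RDF = R_RDF_FROM_FLU.T


def flu_to_rdf(points_flu: np.ndarray) -> np.ndarray:
    """Convert (N, 3) points from an FLU camera frame to the RDF optical frame."""
    return points_flu @ R_RDF_FROM_FLU.T


def ego_from_rdf_camera(r_ego_from_flu_cam: np.ndarray) -> np.ndarray:
    """Camera-to-ego rotation for the RDF frame, given the one for the FLU frame.

    A point goes RDF -> FLU (R_FLU_FROM_RDF), then FLU -> ego (r_ego_from_flu_cam).
    """
    return r_ego_from_flu_cam @ R_FLU_FROM_RDF
```

For example, `flu_to_rdf(np.array([[10.0, 2.0, 1.0]]))` gives `[[-2., -1., 10.]]`. A point 10 m ahead, 2 m to the left and 1 m up lands at 2 m "negative right", 1 m "negative down" and 10 m along the optical axis.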

Data Structure Design

The SimData dataset is fully aligned with nuScenes in its overall architecture. Developers already familiar with nuScenes can get started quickly, with no additional adaptation or learning cost.

The overall directory structure is as follows (consistent with the nuScenes organization):

maps folder

Holds all the high-resolution map images used in the dataset. They provide geographic location information and scene context.

samples folder

Stores the keyframe data for all sensor types, including:

6 camera images (.jpg)
5 radar point clouds (.pcd)
1 LiDAR point cloud (.bin)

A keyframe is saved every 0.5 seconds.
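In nuScenes, each LiDAR .bin file is a flat array of float32 values with five fields per point: x, y, z, intensity and ring index. nuscenes-devkit's `LidarPointCloud.from_file` parses it this way and keeps the first four fields. The article says SimData follows the nuScenes format but does not list the fields, so the five-field layout below is an assumption to check against the released data. `load_lidar_bin` is a hypothetical NumPy-only helper.

```python
import numpy as np

# x, y, z, intensity, ring index (nuScenes layout; assumed to hold for SimData)
POINT_FIELDS = 5


def load_lidar_bin(path: str, fields: int = POINT_FIELDS) -> np.ndarray:
    """Read a nuScenes-style LiDAR .bin file into an (N, fields) float32 array."""
    raw = np.fromfile(path, dtype=np.float32)
    if raw.size % fields != 0:
        # A size mismatch usually means the per-point field count is wrong.
        raise ValueError(f"{path}: {raw.size} floats is not a multiple of {fields}")
    return raw.reshape(-1, fields)
```

An (N, 5) float32 array written with `ndarray.tofile` reads back unchanged. If SimData turns out to use a different field count, only `POINT_FIELDS` needs to change.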

sweeps folder

Stores the continuous sensor data between keyframes. It is used to build temporal information and for multi-frame fusion tasks.
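Every sample_data record carries prev / next tokens. The frames preceding a keyframe can therefore be gathered by walking the prev chain, which is the usual way multi-sweep LiDAR input is assembled for nuScenes-format data.

The sketch below works on plain dicts shaped like sample_data.json records (token, prev, timestamp). As in nuScenes, it assumes that an empty prev string marks the start of a sequence. `previous_frames` and the toy tokens are illustrative only.

```python
from typing import Optional


def previous_frames(index: dict, key_token: str, max_frames: int,
                    max_age_us: Optional[int] = None) -> list:
    """Walk 'prev' links from a keyframe; return up to max_frames earlier records, newest first.

    index maps sample_data tokens to records with 'prev' and 'timestamp' (microseconds).
    An empty 'prev' string marks the first frame of the sequence.
    """
    key_ts = index[key_token]["timestamp"]
    frames = []
    token = index[key_token]["prev"]
    while token and len(frames) < max_frames:
        rec = index[token]
        if max_age_us is not None and key_ts - rec["timestamp"] > max_age_us:
            break  # this record and everything older is too stale to fuse
        frames.append(rec)
        token = rec["prev"]
    return frames


# Toy LiDAR chain at 80 Hz (12.5 ms apart); tokens are made up for readability.
index = {
    "a": {"token": "a", "prev": "", "timestamp": 0},
    "b": {"token": "b", "prev": "a", "timestamp": 12_500},
    "c": {"token": "c", "prev": "b", "timestamp": 25_000},
    "d": {"token": "d", "prev": "c", "timestamp": 37_500},
}
print([r["token"] for r in previous_frames(index, "d", max_frames=2)])
```

The script prints `['c', 'b']`. The optional `max_age_us` cut-off is a design choice that stops fusion from pulling in frames too old to align well with the current keyframe.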

v1.0-* folder

Stores sensor annotations and metadata, all in .json format, covering:

Timestamps
Pose parameters
Annotations
Scene descriptions

The JSON association structure is identical to that of nuScenes.
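A quick sanity check before pointing nuscenes-devkit at a download is to confirm that the four top-level folders described above are present. The function below is a hypothetical helper, not part of SimData or the devkit. It reports which expected entries are missing under a data root and treats v1.0-* as a glob pattern, so any version folder (such as v1.0-custom) satisfies it.

```python
from pathlib import Path

# Top-level layout shared with nuScenes; "v1.0-*" matches any version folder.
EXPECTED = ("maps", "samples", "sweeps", "v1.0-*")


def missing_folders(dataroot: str) -> list:
    """Return the expected top-level folder patterns that have no matching directory."""
    root = Path(dataroot)
    missing = []
    for pattern in EXPECTED:
        if not any(p.is_dir() for p in root.glob(pattern)):
            missing.append(pattern)
    return missing
```

An empty result means the layout matches. Any returned pattern names the folder that still needs to be extracted or moved.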

In SimData, every piece of data and information is identified by a globally unique identifier, a UUID (Universally Unique Identifier), referred to as a token.

These tokens link the different pieces of data together. Users can reach most of the token-linked, structured information through the following three core files:

sample.json
sample_data.json
sample_annotation.json
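Under the hood, this token linkage amounts to two steps:

1. Load each JSON table, which is a list of records that each have a token field.
2. Index the records by token.

This is essentially what nuscenes-devkit exposes through `nusc.get(table_name, token)`. The dependency-free sketch below shows the idea; real code should simply use the devkit. `load_tables` and `get` are illustrative names. In the test, tokens are generated with `uuid.uuid4().hex`, which produces 32-character hex strings like those in nuScenes, purely for demonstration.

```python
import json
from pathlib import Path

CORE_TABLES = ("sample", "sample_data", "sample_annotation")


def load_tables(version_dir: str, names=CORE_TABLES) -> dict:
    """Load <name>.json tables from a v1.0-* folder and index each record by its token."""
    tables = {}
    for name in names:
        with open(Path(version_dir) / f"{name}.json", encoding="utf-8") as f:
            records = json.load(f)  # each table is a JSON list of records
        tables[name] = {rec["token"]: rec for rec in records}
    return tables


def get(tables: dict, table_name: str, token: str) -> dict:
    """Look up one record, mirroring the devkit's nusc.get(table_name, token)."""
    return tables[table_name][token]
```

Following a field such as `sample_token` in a sample_data record is then just another `get(tables, "sample", ...)` call.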

sample.json

Records basic information about each keyframe:

Each keyframe corresponds to one sample_token.
scene_token links the keyframe to its scene.
prev / next tokens link consecutive keyframes.
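Following next from a scene's first keyframe until it is empty reconstructs the ordered keyframe sequence. With nuscenes-devkit this is usually done by starting from the scene record's `first_sample_token` and repeatedly reading `nusc.get('sample', token)['next']`. The sketch below performs the same walk on a plain token-to-record dict. The short tokens are made up for readability, and the 500 000 µs spacing reflects the 0.5 s keyframe interval stated earlier.

```python
def iter_keyframes(samples: dict, first_token: str):
    """Yield sample records in temporal order by following 'next' links."""
    token = first_token
    seen = set()
    while token:  # an empty 'next' ends the scene
        if token in seen:
            raise ValueError(f"cycle detected at sample {token}")
        seen.add(token)
        rec = samples[token]
        yield rec
        token = rec["next"]


samples = {
    "s1": {"token": "s1", "prev": "", "next": "s2", "timestamp": 0},
    "s2": {"token": "s2", "prev": "s1", "next": "s3", "timestamp": 500_000},
    "s3": {"token": "s3", "prev": "s2", "next": "", "timestamp": 1_000_000},
}
print([r["token"] for r in iter_keyframes(samples, "s1")])
```

The script prints `['s1', 's2', 's3']`. The cycle check is a defensive choice that turns a malformed chain into an error instead of an infinite loop.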

sample_data.json

Contains details of the multi-sensor data for the corresponding frame:

ego_pose_token: corresponds to the vehicle pose in ego_pose.json
calibrated_sensor_token: corresponds to the sensor calibration parameters (intrinsic and extrinsic) in calibrated_sensor.json
filename: path to the raw data file
height / width (for images)
timestamp (microseconds)
is_key_frame (boolean)
next / prev (temporal links)
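These two tokens exist so that any sensor measurement can be placed in a common frame:

- calibrated_sensor gives the sensor-to-ego transform;
- ego_pose gives the ego-to-global transform.

Each is stored as a translation in meters plus a rotation quaternion ordered [w, x, y, z] in nuScenes. Composing them yields the sensor-to-global matrix. The sketch uses only NumPy, whereas the devkit itself relies on pyquaternion. It assumes SimData keeps the nuScenes field names translation and rotation. The helper names are hypothetical.

```python
import numpy as np


def quat_to_matrix(q) -> np.ndarray:
    """3x3 rotation matrix from a quaternion given as [w, x, y, z] (normalized first)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def to_matrix(record: dict) -> np.ndarray:
    """4x4 homogeneous transform from a record with 'translation' and 'rotation'."""
    T = np.eye(4)
    T[:3, :3] = quat_to_matrix(record["rotation"])
    T[:3, 3] = record["translation"]
    return T


def sensor_to_global(calibrated_sensor: dict, ego_pose: dict) -> np.ndarray:
    """Compose global <- ego <- sensor into one 4x4 matrix."""
    return to_matrix(ego_pose) @ to_matrix(calibrated_sensor)
```

As a check, take a sensor mounted 1 m ahead of the ego origin with no rotation, and an ego pose at (100, 50, 0) turned 90° about z, i.e. rotation `[cos(pi/4), 0, 0, sin(pi/4)]`. The sensor origin then maps to global (100, 51, 0).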

sample_annotation.json

Records information about the target objects in each keyframe, including:

instance_token (unique token of the object instance)
category_token (category information)
visibility_token (visibility level)
Geometry and pose information:
- translation (center point)
- size
- rotation (quaternion)
Point cloud statistics:
- num_lidar_pts
- num_radar_pts
prev / next tokens linking the same object's annotations in adjacent keyframes
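translation, size and rotation fully determine a 3D box. For a BEV display, only the ground-plane footprint is needed. In nuScenes:

- size is ordered [width, length, height];
- the box's local x axis runs along its length;
- rotation is a [w, x, y, z] quaternion.

The sketch below assumes SimData keeps that ordering. It extracts the heading (yaw about z) with the standard quaternion-to-yaw formula and returns the four footprint corners. `quat_yaw` and `bev_footprint` are hypothetical helpers, not devkit functions.

```python
import math


def quat_yaw(q) -> float:
    """Heading angle (rad) about the z axis from a [w, x, y, z] quaternion."""
    w, x, y, z = q
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


def bev_footprint(translation, size, rotation) -> list:
    """Four (x, y) corners of a box footprint, counter-clockwise from front-left."""
    cx, cy = translation[0], translation[1]
    width, length = size[0], size[1]  # nuScenes size is [width, length, height]
    yaw = quat_yaw(rotation)
    c, s = math.cos(yaw), math.sin(yaw)
    half_l, half_w = length / 2, width / 2
    corners = []
    # front-left, rear-left, rear-right, front-right in the box frame
    for dx, dy in ((half_l, half_w), (-half_l, half_w), (-half_l, -half_w), (half_l, -half_w)):
        corners.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return corners
```

For an unrotated 2 m wide, 4 m long box centered at (10, 5), the function returns (12, 6), (8, 6), (8, 4), (12, 4). These corners can be passed straight to a matplotlib polygon for a BEV view.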

03 SimData Usage and Perception Model Application Examples

SimData can be parsed directly with nuscenes-devkit, exactly as nuScenes is:

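```python
from nuscenes.nuscenes import NuScenes
data_path = "/path/to/SimData"  # dataroot containing maps/, samples/, sweeps/ and v1.0-custom/
nusc = NuScenes(version="v1.0-custom", dataroot=data_path, verbose=True)
```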

The official tools can be used for analysis and modeling. Combined with cv2 or matplotlib, they support data visualization, including:

Ground-truth (GT) boxes on all 6 camera frames
Synchronized LiDAR point clouds
Annotation display from the BEV (bird's-eye view)
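As a starting point for the six-camera GT view, the sketch below collects the image path, ground-truth boxes and camera intrinsic for every camera channel of one keyframe. It uses nuscenes-devkit's `NuScenes.get` and `NuScenes.get_sample_data`; the latter returns (data_path, boxes, camera_intrinsic) with the boxes already transformed into the sensor frame. It assumes SimData reuses the nuScenes camera channel names (CAM_FRONT, CAM_FRONT_LEFT and so on), which the article implies but does not list. `camera_views` is a hypothetical helper, and drawing is left to cv2 or matplotlib.

```python
# nuScenes camera channel names, assumed to be reused by SimData.
CAMERA_CHANNELS = (
    "CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT",
    "CAM_BACK_LEFT", "CAM_BACK", "CAM_BACK_RIGHT",
)


def camera_views(nusc, sample_token: str) -> dict:
    """Map each camera channel to (image_path, boxes, camera_intrinsic) for one keyframe."""
    sample = nusc.get("sample", sample_token)
    views = {}
    for channel in CAMERA_CHANNELS:
        sd_token = sample["data"].get(channel)
        if sd_token is None:
            continue  # channel not recorded for this keyframe
        views[channel] = nusc.get_sample_data(sd_token)
    return views
```

With the `nusc` object created above, `camera_views(nusc, nusc.sample[0]["token"])` yields one entry per camera, in a 2 x 3 grid order suited to plotting. For a quick look without custom drawing, the devkit's `nusc.render_sample(sample_token)` renders all sensors with their annotations.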

BEVFormer Test Results

We ran inference directly with BEVFormer-Tiny weights trained on nuScenes, without retraining on SimData, to verify that the data is usable.

Official code repository:
https://github.com/fundamentalvision/BEVFormer/tree/master
Paper:
https://arxiv.org/pdf/2203.17270

04 Conclusion

This article has explained the importance of virtual datasets for autonomous driving perception research. It has also introduced SimData, a high-fidelity virtual perception dataset generated with the aiSim simulation platform.

The article described SimData's data structure and usage in detail, and tests with an open-source perception model demonstrated its usability and effectiveness.

The Hongke team will release more detailed test and comparison reports to further validate the consistency between SimData and real-world datasets.

Through this work, we have verified the high fidelity of the aiSim simulation environment. We have also given researchers and developers a high-quality, easy-to-use and scalable virtual perception data resource that will continue to support research and model training for autonomous driving perception algorithms.


