VIRAT Video Dataset
Purpose and Characteristics
The dataset is designed to be more realistic, natural, and challenging for video surveillance domains than existing action recognition datasets in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories.
Compared to existing datasets, the distinguishing characteristics of the dataset are the following:
- Realism and natural scenes: Data was collected in natural scenes showing people performing normal actions in standard contexts, with uncontrolled, cluttered backgrounds. There are frequent incidental movers and background activities. Actions performed by directed actors were minimized; most were actions performed by the general population.
- Diversity: Data was collected at multiple sites distributed throughout the USA. A variety of camera viewpoints and resolutions were included, and actions are performed by many different people.
- Quantity: Diverse types of human actions and human-vehicle interactions are included, with a large number of examples (>30) per action class.
- Wide range of resolution and frame rates: Many applications such as video surveillance operate across a wide range of spatial and temporal resolutions. The dataset is designed to capture these ranges, with frame rates of 2–30 Hz and person heights of 10–200 pixels. The dataset provides both the original HD-quality videos and versions downsampled spatially and temporally.
- Ground and Aerial Videos: Both ground camera videos and aerial videos are collected and released as part of the VIRAT Video Dataset.
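The frame-rate and person-height ranges in the resolution item above imply simple ratios when deriving lower-resolution versions of a clip. The following is a minimal sketch in Python, not the procedure used to produce the released downsampled videos; the function names and the keep-every-Nth-frame strategy are assumptions made for illustration.

```python
def temporal_downsample_indices(num_frames, src_hz, dst_hz):
    """Return indices of the source frames kept when going from src_hz to dst_hz.

    Hypothetical helper: keeps one frame every (src_hz / dst_hz) source frames.
    """
    if src_hz <= 0 or dst_hz <= 0:
        raise ValueError("frame rates must be positive")
    if dst_hz > src_hz:
        raise ValueError("target rate cannot exceed source rate")

    step = src_hz / dst_hz  # source frames per kept frame
    indices = []
    k = 0
    while True:
        idx = int(k * step)  # floor to the nearest earlier source frame
        if idx >= num_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def spatial_scale_for_height(src_height_px, dst_height_px):
    """Return the uniform scale factor that maps one person height to another."""
    if src_height_px <= 0 or dst_height_px <= 0:
        raise ValueError("heights must be positive")
    return dst_height_px / src_height_px


if __name__ == "__main__":
    # Two seconds of 30 Hz video reduced to 2 Hz.
    print(temporal_downsample_indices(60, 30, 2))
    # Scale factor taking a 200-pixel-tall person down to 10 pixels.
    print(spatial_scale_for_height(200, 10))
```

Under this scheme, reducing a 30 Hz clip to 2 Hz keeps every 15th frame, and bringing a 200-pixel-tall person down to 10 pixels corresponds to a scale factor of 0.05.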
The VIRAT Video Dataset will contain two broad categories of activities (single-object and two-object) that involve both humans and vehicles. Details of the included activities and the annotation formats may differ per release. Relevant information can be found in the documentation for each release.
2012 Jan 11th: We are glad to announce that Version 2.0 of VIRAT Public Dataset is updated with Aerial video subsets.
- Currently, only videos are available. Annotations will be available soon.
2011 Oct 4th: We are glad to announce that Version 2.0 of VIRAT Public Dataset is released with Ground video subsets.
The main characteristics of this new version are as follows:
- All videos are Stationary Ground Videos.
- Large amount of data: total ~8.5 hours of HD videos
- A total of 12 event types are annotated, in videos from 11 different outdoor scenes.
- Includes suggested evaluation metrics and methodologies (data folds for cross-validation, etc.)
Release 2.0 is described in a PDF available here.
To browse and download Release 2.0, please click here.
To browse and download all releases of the data, including prior releases, please click here.
If you make use of the VIRAT Video Dataset, please use the following citation (with release version information):
"A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video" by Sangmin Oh, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen, Jong Taek Lee, Saurajit Mukherjee, J.K. Aggarwal, Hyungtae Lee, Larry Davis, Eran Swears, Xiaoyang Wang, Qiang Ji, Kishore Reddy, Mubarak Shah, Carl Vondrick, Hamed Pirsiavash, Deva Ramanan, Jenny Yuen, Antonio Torralba, Bi Song, Anesco Fong, Amit Roy-Chowdhury, and Mita Desai, in Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), 2011.
Support & Contact
A dedicated e-mail list to share information and report issues about the dataset can be found here. Please subscribe to the list for announcements and Q&A.
The VIRAT Video Dataset collection work is supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-08-C-0135. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA.
The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.