VIRAT Video Dataset

Purpose and Characteristics
The dataset is designed to be more realistic, natural and challenging for video surveillance domains than existing action recognition datasets in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories.

Compared to existing datasets, the distinguishing characteristics of the dataset are the following:

  • Realism and natural scenes: Data was collected in natural scenes showing people performing normal actions in standard contexts, with uncontrolled, cluttered backgrounds. There are frequent incidental movers and background activities. Actions performed by directed actors were minimized; most were actions performed by the general population.
  • Diversity: Data was collected at multiple sites distributed throughout the USA. A variety of camera viewpoints and resolutions were included, and actions are performed by many different people.
  • Quantity: Diverse types of human actions and human-vehicle interactions are included, with a large number of examples (>30) per action class.
  • Wide range of resolution and frame rates: Many applications such as video surveillance operate across a wide range of spatial and temporal resolutions. The dataset is designed to capture these ranges, with frame rates of 2 to 30 Hz and person heights of 10 to 200 pixels. The dataset provides both the original HD-quality videos and versions downsampled both spatially and temporally.
  • Ground and Aerial Videos: Both ground camera videos and aerial videos are collected and released as part of the VIRAT Video Dataset.
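
As a rough illustration of what spatial and temporal downsampling involve, the following Python sketch picks which frames to keep when lowering the frame rate and computes a resized frame size for a desired person height. The function names, the 1920x1080 frame size, and the example rates and heights are assumptions made for illustration; this is not the procedure used to produce the released downsampled videos.

```python
import math


def temporal_indices(num_frames, src_fps, dst_fps):
    """Return indices of source frames to keep when going from src_fps to dst_fps.

    Hypothetical helper: keeps roughly every (src_fps / dst_fps)-th frame.
    """
    if not 0 < dst_fps <= src_fps:
        raise ValueError("dst_fps must be positive and no larger than src_fps")
    step = src_fps / dst_fps
    return [int(i * step) for i in range(math.ceil(num_frames / step))]


def scale_for_person_height(current_height_px, target_height_px):
    """Return the spatial scale factor that maps one person height to another."""
    if current_height_px <= 0 or target_height_px <= 0:
        raise ValueError("heights must be positive")
    return target_height_px / current_height_px


def scaled_size(width, height, scale):
    """Return the (width, height) of a frame after uniform scaling."""
    return (round(width * scale), round(height * scale))


# One second of 30 Hz video reduced to 2 Hz keeps two frames.
kept = temporal_indices(num_frames=30, src_fps=30, dst_fps=2)

# Assumed 1920x1080 frame in which a person is 200 px tall, reduced to 20 px.
size = scaled_size(1920, 1080, scale_for_person_height(200, 20))
```

In practice the resized frames would be produced by an image or video library; the sketch only covers the arithmetic for choosing frames and sizes.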

The VIRAT Video Dataset contains two broad categories of activities (single-object and two-object) which involve both humans and vehicles. Details of the included activities and the annotation formats may differ per release. Relevant details can be found in the information provided with each release.

Release News
2012 Jan 11th: We are glad to announce that Version 2.0 of the VIRAT Public Dataset has been updated with aerial video subsets.
- Currently, only videos are available. Annotations will be available soon.

2011 Oct 4th: We are glad to announce that Version 2.0 of VIRAT Public Dataset is released with Ground video subsets.

The main characteristics of this new version are as follows:
- All videos are Stationary Ground Videos.
- Large amount of data: total ~8.5 hours of HD videos
- 12 event types annotated in total, in videos from 11 different outdoor scenes.
- Includes suggested evaluation metrics and methodologies (data folds for cross-validation, etc.)
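
To make the idea of data folds concrete, here is a minimal Python sketch of one generic way to build cross-validation folds, holding out all clips from one scene at a time. The function name, the clip and scene identifiers, and the leave-one-scene-out scheme itself are assumptions for illustration; the folds suggested with the release should be used for any comparison with published results.

```python
from collections import defaultdict


def leave_one_scene_out(clips):
    """Build folds from (clip_id, scene_id) pairs.

    Hypothetical helper: each fold holds out every clip from one scene
    and uses the clips from all remaining scenes for training.
    Returns a list of (held_out_scene, train_ids, test_ids) tuples.
    """
    by_scene = defaultdict(list)
    for clip_id, scene_id in clips:
        by_scene[scene_id].append(clip_id)

    scenes = sorted(by_scene)
    folds = []
    for held_out in scenes:
        test_ids = list(by_scene[held_out])
        train_ids = [c for s in scenes if s != held_out for c in by_scene[s]]
        folds.append((held_out, train_ids, test_ids))
    return folds


# Made-up clip and scene identifiers, for illustration only.
example_clips = [("a", "s1"), ("b", "s1"), ("c", "s2"), ("d", "s3")]
folds = leave_one_scene_out(example_clips)
```

Splitting by scene rather than by clip keeps footage of the same background out of both the training and test sets of a fold.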

For more detailed information, download from the links provided below.

  • Data Release 2.0 (2011 Oct)
Data description (Updated 2011-03-14)
Instruction for Data Download, MIDAS FAQ

Citation Information
If you make use of the VIRAT Video Dataset, please use the following citation (with release version information):

"A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video" by Sangmin Oh, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen, Jong Taek Lee, Saurajit Mukherjee, J.K. Aggarwal, Hyungtae Lee, Larry Davis, Eran Swears, Xiaoyang Wang, Qiang Ji, Kishore Reddy, Mubarak Shah, Carl Vondrick, Hamed Pirsiavash, Deva Ramanan, Jenny Yuen, Antonio Torralba, Bi Song, Anesco Fong, Amit Roy-Chowdhury, and Mita Desai, in Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), 2011.

Previous Releases
Release 1.0

Support & Contact
A dedicated e-mail list for sharing information and reporting issues about the dataset can be found here. Please subscribe to the list for announcements and Q&A.

Acknowledgements
The VIRAT Video Dataset collection work is supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-08-C-0135. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA.

Disclaimer:
The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.