Special Issue on Visual Datasets
Submission Date: 2024-09-30
Guest Editors
Xin Zhao, University of Science and Technology Beijing, China
Liang Zheng, Australian National University, Australia
Qiang Qiu, Purdue University, USA
Yin Li, University of Wisconsin-Madison, USA
Limin Wang, Nanjing University, China
José Lezama, Google Research, USA
Qiuhong Ke, Monash University, Australia
Yongchan Kwon, Columbia University, USA
Ruoxi Jia, Virginia Tech, USA
Jungong Han, University of Sheffield, UK
Data is the fuel of computer vision, on which state-of-the-art systems are built. A robust computer vision system needs not only a strong model architecture and learning algorithm but also a comprehensive large-scale training set. Despite the pivotal significance of datasets, existing research in computer vision is usually algorithm-centric: given fixed training and test data, it is the algorithms or models that are primarily targeted for improvement. As such, while significant progress has been made in understanding and improving algorithms, the community has devoted much less effort to dataset-level analysis. For example, compared with the number of algorithm-centric works in domain adaptation, the quantitative understanding of the domain gap is much more limited. As a result, there are currently few investigations into the representations of datasets, whereas an abundance of literature concerns ways to represent images or videos, the essential elements of datasets.
Research centered on datasets can bring many benefits. For example, if we can quantify the distribution difference in a more principled way, such as through end-to-end training, we will have a better idea of how datasets differ from each other and thus be able to design better domain adaptation algorithms. If we can learn to predict the level of labeling noise in a training set, we will be better positioned to design specific noise-resistant learning schemes. Moreover, by quantifying the difficulty of test datasets, it will eventually be possible to predict model performance under evolving environments so as to ensure safe applications of AI. This will be made possible through the launch of the label-free model evaluation challenge.
Topics of Interest
This special issue invites original research articles focusing on Visual Datasets. Appropriate submissions include, but are not limited to, the following forms or a combination thereof:
Properties and attributes of vision datasets: The first and foremost problem is the definition of dataset-level properties. While computer vision typically studies image-level properties, such as image category, human identities, object bounding boxes and region semantics, the dataset-level counterparts are not extensively studied. Examples of such dataset properties include but are not limited to the level of database noise, the extent to which a dataset looks realistic, dataset diversity, bias and fairness, its quality as a training set, and its difficulty level as a test or validation set. Analogous to image-level properties, dataset properties will require dedicated methods to evaluate and will induce the problem of dataset representation learning and end-to-end training.
Application of dataset-level analysis: We find numerous application opportunities for dataset-level analysis. For example, understanding the quality of datasets allows us to better design dataset composition schemes, thus obtaining higher accuracy for given models. This creates interesting directions for the computer vision community, especially considering that datasets for existing tasks are usually fixed, and that dynamic dataset composition would foster new opportunities. Moreover, mining and understanding the content and label bias of datasets will give us a clearer picture of the generalization ability of models trained on such datasets, and subsequently allow us to make corresponding improvements to the datasets. For example, having automated dataset-level metrics could be very beneficial for active learning.
Representations of and similarities between vision datasets: While image representations, whether hand-crafted or deeply learned, have been widely studied, those of datasets are much less investigated. Work on the latter has largely focused on first- and second-order statistics, which is somewhat analogous to the hand-crafted features in computer vision. In the context of computer vision, it would be very interesting to explore how to extract relevant image characteristics and aggregate them into global set representations. For example, when analyzing label noise levels, it would be beneficial to separate image foregrounds and backgrounds when computing dataset features. A more exciting topic is to perform (semi-)end-to-end learning, where the entire dataset or its selected parts are fed into neural networks so that task-oriented feature representations can be learned.
Improving vision dataset quality through generation and simulation: The community has seen interesting research that uses synthetic data generated by simulation engines or generative adversarial networks (GANs), or existing real data, to compose new training sets. These methods give flexible and inexpensive solutions to various testing scenarios where training data are expensive to collect or where corner cases occur.
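As a rough illustration of the first- and second-order statistics mentioned above under dataset representations, the sketch below summarizes each dataset by the mean and covariance of its image features and compares two datasets with the squared Fréchet distance between the corresponding Gaussians. This is only one possible hand-crafted baseline, not a method prescribed by this call. The function names are hypothetical, and the (n, d) feature arrays are assumed to come from some image encoder that is not specified here.

```python
import numpy as np


def dataset_statistics(features):
    """Return the mean vector and covariance matrix of an (n, d) feature array."""
    features = np.asarray(features, dtype=np.float64)
    mean = features.mean(axis=0)
    # rowvar=False: each row is one image, each column is one feature dimension.
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    return mean, cov


def frechet_distance_squared(features_a, features_b):
    """Squared Frechet distance between Gaussians fitted to two feature sets."""
    mean_a, cov_a = dataset_statistics(features_a)
    mean_b, cov_b = dataset_statistics(features_b)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative up to numerical error.
    eigenvalues = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigenvalues.real, 0.0, None)).sum()
    diff = mean_a - mean_b
    dist = diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt
    return float(max(dist, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(size=(500, 8))           # stand-in for encoder features
    target = rng.normal(loc=0.5, size=(500, 8))  # a shifted "domain"
    print(frechet_distance_squared(source, target))
```

A larger value suggests a larger gap between the two feature distributions under the Gaussian assumption. Learned set representations, as discussed above, would replace this fixed summary with one trained for the task at hand.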
In summary, the questions related to the proposed Special Issue include but are not limited to:
Can vision datasets be analyzed on a large scale?
How to holistically understand the visual semantics contained in a dataset?
How to define vision-related properties and problems on the dataset level?
How can we improve algorithm design by better understanding vision datasets?
Can we predict the performance of an existing model on a new dataset?
What are good dataset representations? Can they be hand-crafted, learned through neural nets or a combination of both?
How do we measure similarities between datasets and their bias and fairness?
Can we improve training data quality through data engineering or simulation?
How to efficiently create labelled datasets under new environments?
How to create realistic datasets that serve our real-world application purpose?
How can we alleviate the need for large-scale labelled datasets in deep learning?
How to best analyze model performance under various environments without requiring access to the ground-truth labels?
How to evaluate diffusion models and large language models?
Important Dates
Paper submission deadline: 30 September 2024
First review decision: 30 December 2024
Revision deadline: 15 March 2025
Final decision: 15 June 2025