Hamed Barzamini, Mona Rahimi
The inductive nature of artificial neural models makes dataset quality a key factor in their proper functioning. For this reason, multiple research studies have proposed metrics to assess the quality of the models’ datasets, such as dataset correctness, completeness, and consistency. However, these studies commonly lack a point of reference against which the proposed quality metrics can be assessed. To this end, this paper proposes a generic process that extracts the knowledge necessary to build a reliable reference point for explaining, assessing, and augmenting the datasets of AI software. This process automatically builds a benchmark specific to the software's operational domain, interprets the training and validation datasets of AI-enabled perception software systems, and evaluates the datasets' semantic quality and completeness relative to the benchmark. We implemented this process within a framework called Concept Augmentation and Dataset Evaluation (CADE), which leverages a series of novel natural language and image processing techniques to construct a semantic benchmark with respect to the domain specifications. Applying CADE to three commonly used autonomous driving datasets revealed several common weaknesses in the arbitrarily collected datasets relative to the encoded domain specifications, demonstrating that the datasets diverge from the domain concepts and under-represent variances of those concepts. The qualitative evaluation results showed an average relevance of about 75% for CADE-generated topics.