The UCI Machine Learning Repository is one of the most popular public sources of data sets for machine learning, and a large number of machine learning papers use UCI data sets as experimental data. However, little is known about how much diversity the UCI data sets actually offer. Yet such information is critical to 1) understanding the behavior of learning algorithms, and 2) determining whether meaningful meta-level information can be extracted to support meta-learning research. We propose to fill this gap by examining over 60 UCI data sets and characterizing them using statistical measures, information-theoretic measures, and landmarkers. We then discuss how the meta-information extracted from these data sets correlates with predictive accuracy.

