College of Physical and Mathematical Sciences

Characterizing UCI Data Sets - Jun won Lee

Personal Information
Primary Presenter First Name: Jun won
Primary Presenter Last Name: Lee
Abstract Information
Department: Computer Science
Faculty Advisor: Christophe Giraud-Carrier
Title of Abstract: Characterizing UCI Data Sets

The UCI Machine Learning Repository is one of the most popular public sources of data sets for machine learning, and a large number of machine learning papers use UCI data sets as experimental data. However, it is not well known how much diversity exists across the UCI data sets. Such information is critical to 1) understand the behavior of learning algorithms, and 2) determine whether meaningful meta-level information can be extracted to support meta-learning research. We propose to fill this gap by examining over 60 UCI data sets and characterizing them using statistical measures, information-theoretic measures, and landmarkers. We then discuss how the meta-information extracted from these data sets correlates with predictive accuracy.
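
To make the three families of characteristics concrete, the following is a minimal sketch of how a data set with nominal attributes might be summarized. It is an illustration under stated assumptions, not the procedure used in this work: the function names (entropy, mutual_information, majority_landmarker, stump_landmarker, characterize) are hypothetical, the particular measures are only plausible examples of each family, and the four-row data set is a toy invented for the example rather than a UCI data set.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def majority_landmarker(labels):
    """Accuracy of always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def stump_landmarker(rows, labels):
    """Training accuracy of the best single-attribute rule.

    For each attribute, every attribute value predicts the majority class
    among the instances that carry it; the best attribute's accuracy is kept.
    """
    best = 0.0
    for j in range(len(rows[0])):
        by_value = {}
        for row, y in zip(rows, labels):
            by_value.setdefault(row[j], []).append(y)
        correct = sum(Counter(ys).most_common(1)[0][1] for ys in by_value.values())
        best = max(best, correct / len(labels))
    return best


def characterize(rows, labels):
    """Collect simple statistical, information-theoretic, and landmarking measures."""
    columns = [list(col) for col in zip(*rows)]
    return {
        # Simple statistics
        "n_instances": len(rows),
        "n_attributes": len(columns),
        "n_classes": len(set(labels)),
        # Information-theoretic measures
        "class_entropy": entropy(labels),
        "mean_mutual_info": sum(mutual_information(c, labels) for c in columns)
        / len(columns),
        # Landmarkers: accuracies of very cheap learners
        "majority_accuracy": majority_landmarker(labels),
        "stump_accuracy": stump_landmarker(rows, labels),
    }


# Toy data set (invented for illustration): two nominal attributes, two classes.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
features = characterize(rows, labels)
```

Computing such a dictionary for each data set yields one meta-feature vector per data set. The spread of those vectors is one way to gauge diversity, and each meta-feature can then be compared against the accuracy that learning algorithms achieve on the same data sets.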

Copyright © 2009. Brigham Young University. All Rights Reserved.