Friday, June 08, 2007

The treatment of missing values and its effect in the classifier accuracy

A good paper on the effect of missing values on the accuracy of your model. Its organization would improve if the authors had included their recommendation in the Summary.

Nevertheless, this is the crucial recommendation (p. 8): "We recommend that we can deal with datasets having up to 20 % of missing values. For the CD (Complete Deletion) method we have up to 60 % of instances containing missing values and still have a reasonable performance."
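
To make the two percentages in that quote concrete, here is a minimal Python sketch of complete deletion on a toy dataset. The data and the helper names (is_missing, fraction_incomplete, complete_deletion, mean_imputation) are made up for illustration and are not taken from the paper. Mean imputation is included only as one common alternative to deletion, not as the method the authors recommend.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fraction_incomplete(rows):
    """Share of instances (rows) that contain at least one missing value."""
    if not rows:
        return 0.0
    incomplete = sum(1 for row in rows if any(is_missing(v) for v in row))
    return incomplete / len(rows)


def complete_deletion(rows):
    """Complete (case) deletion: keep only rows with no missing values."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


def mean_imputation(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        means.append(sum(observed) / len(observed) if observed else float("nan"))
    return [
        [means[j] if is_missing(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: 5 instances, 3 numeric features, None marks a missing value.
data = [
    [5.1, 3.5, 1.4],
    [4.9, None, 1.4],
    [6.2, 2.9, None],
    [5.8, 2.7, 4.1],
    [None, 3.0, 5.0],
]

print("Instances with missing values:", fraction_incomplete(data))
print("Rows left after complete deletion:", len(complete_deletion(data)))
print("Rows left after mean imputation:", len(mean_imputation(data)))
```

In this toy case three of the five instances contain a missing value, so complete deletion keeps only two rows while imputation keeps all five. That trade-off is what the quoted thresholds are about: deletion throws away instances, imputation keeps them but fills in estimated values.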

This paper is important for anyone working with healthcare, pharma, and biotech data because of the complexity and diversity of that data.

Business Analytics

