In this thesis, Instance Neighbor Entropy (INE) with weighting was
proposed to estimate the Class Structure Ambiguity (CSA) of class
structures. The main idea of INE(x)k for an instance x was
to compute the weighted entropy of the class probability distribution
of the k nearest neighbors of x. The weights used in that entropy
were determined by the inverse of the distance between x and the
other instances. An instance was regarded as ambiguous if most of
its neighbors came from other classes.
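
The computation described above can be sketched as follows. This is
only a minimal illustration under stated assumptions, not the exact
formulation of the thesis: the function names, the Euclidean distance,
the normalization of inverse-distance weights into class
probabilities, the base-2 logarithm, and the averaging of INE over
all instances to obtain a CSA value are all assumptions made here.
Note also that a plain entropy is 0 whenever all k neighbors share
one class, whether or not that class is the class of x, so the
actual definition may treat that case differently.

```python
import numpy as np

def instance_neighbor_entropy(X, y, i, k, weighted=True, eps=1e-12):
    """Entropy of the class distribution among the k nearest neighbors of X[i].

    Hypothetical sketch: the weighting scheme and the distance metric are assumed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dists = np.linalg.norm(X - X[i], axis=1)   # Euclidean distance (assumed)
    dists[i] = np.inf                          # exclude the instance itself
    nn = np.argsort(dists)[:k]                 # indices of the k nearest neighbors
    # Assumed weighting: inverse distance; uniform weights when weighted=False.
    w = 1.0 / (dists[nn] + eps) if weighted else np.ones(len(nn))
    # Weighted class probability distribution over the neighbors.
    p = np.array([w[y[nn] == c].sum() for c in np.unique(y)]) / w.sum()
    p = p[p > 0]                               # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())

def class_structure_ambiguity(X, y, k, weighted=True):
    """Assumed aggregation: the mean INE over all instances."""
    n = len(X)
    return float(np.mean([instance_neighbor_entropy(X, y, i, k, weighted)
                          for i in range(n)]))
```

With two neighbors of different classes, the unweighted entropy is
1 bit regardless of distance, whereas the inverse-distance weighting
lowers it when the neighbor from one class is much closer than the
neighbor from the other.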
Therefore, a class structure might be ambiguous if it contained
many ambiguous instances. To evaluate the effectiveness of the CSA
via INE, Pearson’s correlation coefficient ρ between the accuracy
achieved by SVM classifiers and the values of CSA was computed and
expected to be as close to -1 (perfect negative correlation) as
possible. For the experiments, there were two types of datasets. One
was generated synthetically: some seed points were chosen for each
class and, for each seed point, a fixed number of instances were
generated randomly under a normal distribution, with the class
ambiguity kept under control. The other consisted of real-world
datasets selected from LIBSVM. Experimental results showed that the
evaluation of the CSA via INE(x)k did reveal the degree of class
ambiguity on the randomly generated datasets, because the values of
ρ were almost -1, and that INE(x)k with weighted entropy evaluated
the CSA more precisely than INE(x)k without weighted entropy on both
types of datasets.
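
The synthetic data generation and the correlation check described
above might look roughly like the sketch below. It is illustrative
only: the function names, the dictionary layout of the seed points,
and the use of the noise standard deviation (spread) as the knob that
controls class ambiguity are assumptions, since the text does not
state how the ambiguity was controlled. Training the SVM classifiers
and collecting their accuracies is not shown.

```python
import numpy as np

def make_seeded_dataset(seeds, n_per_seed, spread, rng):
    """Draw n_per_seed normally distributed instances around each seed point.

    seeds:  dict mapping a class label to a list of seed points (hypothetical layout).
    spread: standard deviation of the noise; assumed here to control class overlap.
    """
    X, y = [], []
    for label, points in seeds.items():
        for point in points:
            X.append(rng.normal(loc=point, scale=spread,
                                size=(n_per_seed, len(point))))
            y.extend([label] * n_per_seed)
    return np.vstack(X), np.array(y)

def pearson_r(csa_values, accuracies):
    """Pearson's correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(csa_values, accuracies)[0, 1])
```

In use, one CSA value and one SVM accuracy would be collected per
dataset, and pearson_r applied to the two resulting lists; a value
near -1 would indicate that a higher CSA goes with a lower accuracy.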