A recent paper published at the end of 2007 surveys the 10 most successful data mining algorithms. According to its authors, who include Ross Quinlan (creator of the legendary ID3 and C4.5 decision trees), the selection was based on the opinions of the authors of award-winning papers and of the scientific committees of the most prestigious data mining conferences (IEEE ICDM and KDD), together with the number of references to each algorithm in Google Scholar (minimum 50).
And the winners are…
1. C4.5 (1993)
Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc.
Google Scholar Count in October 2006: 6907
2. K-Means (1967)
MacQueen, J. B., Some methods for classification and analysis of multivariate observations, in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, 1967, pp. 281-297.
Google Scholar Count in October 2006: 1579
3. SVM (1995)
Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc.
Google Scholar Count in October 2006: 6441
4. Apriori (1994)
Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int’l Conference on Very Large Databases (VLDB ’94), Santiago, Chile, September 1994.
Google Scholar Count in October 2006: 3639
5. EM (2000)
McLachlan, G. and Peel, D. (2000). Finite Mixture Models. J. Wiley, New York.
Google Scholar Count in October 2006: 848
6. PageRank (1998)
Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the Seventh International Conference on World Wide Web (WWW-7) (Brisbane, Australia). P. H. Enslow and A. Ellis, Eds. Elsevier Science Publishers B. V., Amsterdam, The Netherlands, 107-117.
Google Scholar Count: 2558
7. AdaBoost (1997)
Freund, Y. and Schapire, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (Aug. 1997), 119-139.
Google Scholar Count in October 2006: 1576
8. K Nearest Neighbours (1996)
Hastie, T. and Tibshirani, R. 1996. Discriminant Adaptive Nearest Neighbor Classification. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI). 18, 6 (Jun. 1996), 607-616.
Google Scholar Count: 183
9. Naive Bayes (??)
Hand, D.J., Yu, K., 2001. Idiot’s Bayes: Not So Stupid After All? Internat. Statist. Rev. 69, 385-398.
Google Scholar Count in October 2006: 51
10. CART (1984)
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
Google Scholar Count in October 2006: 6078