这是indexloc提供的服务,不要输入任何密码
Skip to main content
Springer Nature Link
Log in
Menu
Find a journal Publish with us Track your research
Search
Cart
  1. Home
  2. Machine Learning
  3. Article

A k-Median Algorithm with Running Time Independent of Data Size

  • Published: July 2004
  • Volume 56, pages 61–87, (2004)
  • Cite this article
Download PDF
Machine Learning Aims and scope Submit manuscript
A k-Median Algorithm with Running Time Independent of Data Size
Download PDF
  • Adam Meyerson,
  • Liadan O'Callaghan &
  • Serge Plotkin 
  • 1118 Accesses

  • 34 Citations

  • 7 Altmetric

  • 1 Mention

  • Explore all metrics

Abstract

We give a sampling-based algorithm for the k-Median problem, with running time O(k \((\frac{{k^2 }}{ \in } \log k)^2 \) log\((\frac{k}{ \in } \log k)\)), where k is the desired number of clusters and ∈ is a confidence parameter. This is the first k-Median algorithm with fully polynomial running time that is independent of n, the size of the data set. It gives a solution that is, with high probability, an O(1)-approximation, if each cluster in some optimal solution has Ω\((\frac{{n \in }}{k})\) points. We also give weakly-polynomial-time algorithms for this problem and a relaxed version of k-Median in which a small fraction of outliers can be excluded. We give near-matching lower bounds showing that this assumption about cluster size is necessary. We also present a related algorithm for finding a clustering that excludes a small number of outliers.

Article PDF

Download to read the full article text

Similar content being viewed by others

Faster Algorithms for the Constrained k-means Problem

Article 06 November 2017

To Close Is Easier Than To Open: Dual Parameterization To k-Median

Chapter © 2021

K-Medoids Clustering Is Solvable in Polynomial Time for a 2d Pareto Front

Chapter © 2020

Explore related subjects

Discover the latest articles, books and news in related subjects, suggested using machine learning.
  • Algorithmic Complexity
  • Algorithms
  • Computational Geometry
  • Computational Complexity
  • Genome assembly algorithms
  • Optimization
Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

  • Alon, N., Dar, S., Parnas, M., & Ron, D. (2000). Testing of clustering. In Proc. FOCS.

  • Arora, S., Raghavan, P., & Rao, S. (1998). Approximation schemes for euclidean k-medians and related problems. In Proc. STOC.

  • Arya, V., Garg, N., Khandekar, R., Meyerson, A., Munagala, K., & Pandit, V. (2001). Local search heuristics for k-median and facility location problems. In Proc. STOC.

  • Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. In Proc. FOCS.

  • Bradley, P. S., Fayyad, U. M., & Reina, C. (1998). Scaling clustering algorithms to large databases. In Proc. KDD.

  • Charikar, M., Chekuri, C., Feder, T., & Motwani, R. (1997). Incremental clustering and dynamic information retrieval. In Proc. STOC.

  • Charikar, M., & Guha, S. (1999). Improved combinatorial algorithms for the facility location and k-median problems. In Proc. FOCS.

  • Charikar, M., Guha, S., Tardos, É., & Shmoys, D. (2002). A constant-factor approximation algorithm for the k-median problem. Journal of Computer and System Sciences, 65:1, 129–149.

    Google Scholar 

  • Charikar, M., Khuller, S., Mount, D., & Narasimhan, G. (2001). Algorithms for facility location problems with outliers. In Proc. SODA.

  • Charikar, M., & Panigrahy, R. (2001). Clustering to minimize the sum of cluster diameters. In Proc. 33rd STOC.

  • Dasgupta, S. (1999). Learning mixtures of gaussians. In Proc. FOCS.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1984). Maximum-likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Ser. B, 39.

  • Duda, R. O., & Hart, P. E. (1972). Pattern classification and scene analysis. Wiley.

  • Feder, T., & Greene, D. (1988). Optimal algorithms for approximate clustering. In Proc. STOC.

  • Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38, 2–3.

    Google Scholar 

  • Guha, S. (2000). Approximation algorithms for facility location problems. Ph.D. Thesis, Stanford University.

  • Guha, S., Mishra, N., Motwani, R., & O'Callaghan, L. (2000). Clustering data streams. In Proc. FOCS.

  • Guha, S., Rastogi, R., & Shim, K. (1998). CURE: An efficient clustering algorithm for large databases. In Proc. SIGMOD.

  • Hartigan, J. A. (1975). Clustering algorithms. John Wiley and Sons.

  • Hochbaum, D. S., & Shmoys, D. B. (1985). A best possible approximation algorithm for the k-center problem. Mathematics of Operations Research, 10, 180–184.

    Google Scholar 

  • Indyk, P. (1999). Sublinear time algorithms for metric space problems. In Proc. STOC.

  • Jain, K., Mahdian, M., Markakis, E., Saberi, A., & Vazirani, V. (to appear). Greedy facility location algorithms analyzed using dual-fitting with factor-revealing LP. Journal of the ACM.

  • Jain, K., & Vazirani, V. (1999). Primal-dual approximation algorithms for metric facility location and k-median problems. In Proc. FOCS.

  • Joze-Hkajavi, N., & Salem, K. (1998). Two-phase clustering of large datasets. Tech. rpt. CS-98-27, Univ. of Waterloo.

  • Lin, J.-H., & Vitter, J. S. (1992). ε-Approximations with minimum packing constraint violation. In Proc. STOC.

  • Marroquin, J., & Girosi, F. (1993). Some extensions of the k-means algorithm for image segmentation and pattern recognition. AI Memo 1390, MIT AI Lab..

  • Mettu, R., & Plaxton, C. G. (2002). Optimal time bounds for approximate clustering. In Proc. 18th UAI.

  • Meyerson, A. (2001). Online facility location. In Proc. FOCS.

  • Mishra, N., Oblinger, D., & Pitt, L. (2001). Sublinear time approximate clustering. In Proc. SODA.

  • Palmer, C. R., & Faloutsos, C. (2000). Density biased sampling: An improved method for data mining and clustering. In Proc. SIGMOD.

  • Pelleg, D., & Moore, A. W. (1999). Accelerating exact k-means algorithms with geometric reasoning. In Proc. KDD.

  • Pitt, L., & Reinke, R. E. (1988). Criteria for polynomial-time (conceptual) clustering. Machine Learning, 2, 4.

    Google Scholar 

  • Redner, R. A., & Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 2.

    Google Scholar 

  • Steinbach, M., Karypis, G., & Kumar, V. (2000). A comparison of document clustering techniques. Tech rept. No. 00-034, Univ. of Minn.

  • Thorup, M. (2001). Quick k-median, k-center, and facility location for sparse graphs. In Proc. ICALP.

  • Young, N. E. (2000). k-medians, facility location, and the Chernoff-Wald bound. In Proc. SODA (pp. 86–95).

  • Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH: An efficient data clustering method for very large databases. In Proc. SIGMOD (pp. 103–114).

Download references

Authors
  1. Adam Meyerson
    View author publications

    Search author on:PubMed Google Scholar

  2. Liadan O'Callaghan
    View author publications

    Search author on:PubMed Google Scholar

  3. Serge Plotkin
    View author publications

    Search author on:PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Meyerson, A., O'Callaghan, L. & Plotkin, S. A k-Median Algorithm with Running Time Independent of Data Size. Machine Learning 56, 61–87 (2004). https://doi.org/10.1023/B:MACH.0000033115.78247.f0

Download citation

  • Issue date: July 2004

  • DOI: https://doi.org/10.1023/B:MACH.0000033115.78247.f0

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • clustering
  • sampling
  • sublinear
Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Advertisement

Search

Navigation

  • Find a journal
  • Publish with us
  • Track your research

Discover content

  • Journals A-Z
  • Books A-Z

Publish with us

  • Journal finder
  • Publish your research
  • Language editing
  • Open access publishing

Products and services

  • Our products
  • Librarians
  • Societies
  • Partners and advertisers

Our brands

  • Springer
  • Nature Portfolio
  • BMC
  • Palgrave Macmillan
  • Apress
  • Discover
  • Your US state privacy rights
  • Accessibility statement
  • Terms and conditions
  • Privacy policy
  • Help and support
  • Legal notice
  • Cancel contracts here

23.94.208.52

Not affiliated

Springer Nature

© 2025 Springer Nature