Clarification on "Cluster sizes in a t-SNE plot mean nothing" #13

@luanamarinho


Dear Team,

First, I would like to express my appreciation for your contributions to the understanding of t-SNE. I'm beyond grateful for the insights shared thus far.

That said, I would like to seek clarification on the statement: "The t-SNE algorithm adapts its notion of 'distance' to regional density variations in the data set. As a result, it naturally expands dense clusters, and contracts sparse ones, evening out cluster sizes."

I’m struggling to understand the connection between this statement and the actual functioning of the algorithm. I’d appreciate your thoughts and would be happy to hear if there are any misunderstandings on my part.

To explain my perspective: t-SNE, like SNE, converts distances into similarities. Setting the (single) perplexity parameter gives every point roughly the same effective number of neighbors, so the local densities are somewhat equalized. Specifically, in Barnes-Hut-SNE, the local densities are effectively uniform, since every data point is given the same number of nearest neighbors, 3 × perplexity. However, I don't see how this translates to equalizing cluster sizes, neither in terms of the number of points per cluster nor in terms of the pairwise distances in the resulting low-dimensional map.
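
To illustrate the calibration step I am referring to, here is a minimal NumPy sketch. It is my own toy code, not taken from any t-SNE implementation: the function names, the bisection bounds, and the two synthetic clusters are all assumptions made for illustration. It searches, per point, for the Gaussian bandwidth sigma_i whose conditional distribution p_{j|i} has the requested perplexity.

```python
import numpy as np

def conditional_p(sq_dists, sigma):
    # p_{j|i} for one point i: Gaussian kernel on squared distances, normalized.
    logits = -sq_dists / (2.0 * sigma ** 2)
    logits = logits - logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def perplexity(p):
    # Perplexity = 2 ** (Shannon entropy in bits).
    nz = p[p > 0]
    return 2.0 ** (-np.sum(nz * np.log2(nz)))

def calibrate_sigma(sq_dists, target, lo=1e-6, hi=1e6, iters=100):
    # Geometric bisection: perplexity grows monotonically with sigma.
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if perplexity(conditional_p(sq_dists, mid)) > target:
            hi = mid
        else:
            lo = mid
    return np.sqrt(lo * hi)

# Hypothetical data: a tight cluster and a cluster that is 100x more spread out.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(100, 2))
sparse = rng.normal(50.0, 10.0, size=(100, 2))
X = np.vstack([dense, sparse])

def sigma_for(i, target=30.0):
    # Squared distances from point i to every other point.
    sq = np.sum((X - X[i]) ** 2, axis=1)
    return calibrate_sigma(np.delete(sq, i), target)

sigma_dense = np.mean([sigma_for(i) for i in range(0, 100, 10)])
sigma_sparse = np.mean([sigma_for(i) for i in range(100, 200, 10)])
print(f"mean sigma, dense cluster:  {sigma_dense:.4f}")
print(f"mean sigma, sparse cluster: {sigma_sparse:.4f}")
```

In this sketch the calibrated bandwidth is much smaller inside the tight cluster than inside the spread-out one, which is the sense in which the notion of "distance" adapts to local density. What it does not show, and what I am asking about, is why this should even out the extent of the clusters in the final map.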

In fact, the use of a Student's t-distribution in the low-dimensional space typically results in stretched distances. To match q_ij in the low-dimensional space with p_ij in the high-dimensional space, the distances d(y_i, y_j) between points in the map tend to be larger than their high-dimensional counterparts. This observation leads me to question the idea that t-SNE "equalizes" cluster sizes, or "expands dense clusters, and contracts sparse ones", as stated in the documentation.
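
A small numerical sketch of the stretching I have in mind, under strong simplifying assumptions: it compares only the unnormalized kernels, exp(-d^2) for a Gaussian and 1/(1 + d^2) for the Student's t with one degree of freedom, and ignores the normalization over all pairs as well as the per-point bandwidths. For a given kernel value v, it solves each kernel for the distance that produces it.

```python
import numpy as np

# Target (unnormalized) similarity values, from "very similar" to "dissimilar".
v = np.array([0.9, 0.5, 0.1, 0.01])

# Distance at which each kernel takes the value v.
d_gauss = np.sqrt(-np.log(v))   # solves exp(-d**2) = v
d_t = np.sqrt(1.0 / v - 1.0)    # solves 1 / (1 + d**2) = v

for vi, dg, dt in zip(v, d_gauss, d_t):
    print(f"v={vi:<5} gaussian d={dg:.3f}  student-t d={dt:.3f}  ratio={dt / dg:.2f}")
```

Because ln(x) <= x - 1, the Student's t distance is never smaller than the Gaussian one for the same v, and the gap widens as v decreases. Since the real q_ij are normalized over all pairs, this is only meant to show the direction of the effect, not actual map distances.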

I look forward to hearing your thoughts on this, and I’m open to any points that might help resolve my issue.

Kind regards,
Luana
