6. Similarity Functions

In statistics and related fields, a similarity measure (or similarity function) is a real-valued function that quantifies how alike two data objects are [1].

6.1. Common similarity functions

Equations (1) to (3) are distances (Euclidean, Manhattan, and Minkowski), so a smaller value means the two objects are more alike.

\begin{align} d(X, Y) = \sqrt{\sum_{i=1}^n (X_i - Y_i)^2} \tag{1} \end{align}
\begin{align} d(X, Y) = \sum_{i=1}^n |X_i - Y_i| \tag{2} \end{align}
\begin{align} d(X, Y) = \left(\sum_{i=1}^n |X_i - Y_i|^p\right)^{\frac{1}{p}} \tag{3} \end{align}
\begin{align} similarity(X, Y) = \cos(\theta) = \frac{\vec{X} \cdot \vec{Y}}{\|\vec{X}\| \cdot \|\vec{Y}\|} = \frac{\sum_{i=1}^n X_i \cdot Y_i}{\sqrt{\sum_{i=1}^n X_i^2} \cdot \sqrt{\sum_{i=1}^n Y_i^2}} \tag{4} \end{align}
\begin{align} similarity(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \cdot \sigma_Y} = \frac{\sum_{i=1}^n (X_i - \bar{X}) \cdot (Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^n (Y_i - \bar{Y})^2}} \tag{5} \end{align}
\begin{align} similarity(X, Y) = J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} = \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|} \tag{6} \end{align}

6.2. Manual examples

Euclidean distance
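
As a minimal sketch of equation (1), the function below computes the Euclidean distance in plain Python. The function name `euclidean_distance` and the sample vectors are illustrative assumptions, not taken from the original text.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences, as in equation (1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Illustrative vectors (assumed for this example).
X = [1, 2, 3]
Y = [4, 6, 3]

# Differences are (-3, -4, 0), so the squared differences sum to 9 + 16 + 0 = 25.
print(euclidean_distance(X, Y))  # 5.0
```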

Manhattan distance
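
Equation (2) can be sketched the same way: sum the absolute coordinate differences. The name `manhattan_distance` and the sample vectors are assumptions chosen for illustration.

```python
def manhattan_distance(x, y):
    """Sum of absolute coordinate differences, as in equation (2)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# Illustrative vectors (assumed for this example).
X = [1, 2, 3]
Y = [4, 6, 3]

# Absolute differences are (3, 4, 0), which sum to 7.
print(manhattan_distance(X, Y))  # 7
```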

Minkowski distance
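
The following sketch of equation (3) takes the order p as a parameter. With p = 1 it should reduce to the Manhattan distance, and with p = 2 to the Euclidean distance. The name `minkowski_distance`, the restriction to p >= 1, and the sample vectors are assumptions made for this example.

```python
def minkowski_distance(x, y, p):
    """p-th root of the summed p-th powers of absolute differences, as in equation (3)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)


# Illustrative vectors (assumed for this example).
X = [1, 2, 3]
Y = [4, 6, 3]

print(minkowski_distance(X, Y, 1))  # matches the Manhattan distance, 7
print(minkowski_distance(X, Y, 2))  # matches the Euclidean distance, 5
print(minkowski_distance(X, Y, 3))  # cube root of 27 + 64 + 0 = 91
```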

Cosine similarity
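
A small sketch of equation (4): the dot product divided by the product of the two vector norms. The name `cosine_sim`, the zero-vector check, and the sample vectors are assumptions for illustration.

```python
import math


def cosine_sim(x, y):
    """Cosine of the angle between x and y, as in equation (4)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Illustrative vectors (assumed for this example).
# Dot product is 3*4 + 4*3 = 24, and both norms are 5, so the result is 24 / 25.
print(cosine_sim([3, 4], [4, 3]))  # 0.96

# Orthogonal vectors have a cosine similarity of zero.
print(cosine_sim([1, 0], [0, 1]))  # 0.0
```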

Pearson similarity
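
The sketch below follows the standard Pearson correlation: the sum of products of the mean-centred values, divided by the product of the two root-sum-of-squares terms. The name `pearson_similarity`, the check for constant inputs, and the sample data are assumptions for illustration.

```python
import math


def pearson_similarity(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("Pearson similarity is undefined for a constant sequence")
    return numerator / denominator


# Illustrative data (assumed): Y is exactly 2 * X, a perfect positive linear relation,
# so the result should be 1 up to floating-point rounding.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]
print(pearson_similarity(X, Y))

# Reversing Y gives a perfect negative linear relation, so the result should be close to -1.
print(pearson_similarity(X, Y[::-1]))
```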

Jaccard similarity
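
Equation (6) operates on sets rather than vectors. A minimal sketch, assuming Python sets as input, with the name `jaccard_similarity` and the sample sets chosen for illustration:

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union, as in equation (6)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        raise ValueError("Jaccard similarity is undefined for two empty sets")
    return len(a & b) / len(union)


# Illustrative sets (assumed for this example).
X = {1, 2, 3, 4}
Y = {3, 4, 5, 6}

# Intersection is {3, 4} (size 2); union has 4 + 4 - 2 = 6 elements, so the result is 2 / 6.
print(jaccard_similarity(X, Y))
```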

6.3. Sklearn examples
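
As a rough sketch, several of the measures above are available in scikit-learn. The example assumes `scikit-learn` and `numpy` are installed. The pairwise functions expect 2-D arrays (one row per sample) and return a matrix of pairwise values, so each vector is wrapped in a single-row array. Pearson correlation is computed here with `numpy.corrcoef` rather than scikit-learn. The sample data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import jaccard_score
from sklearn.metrics.pairwise import (
    cosine_similarity,
    euclidean_distances,
    manhattan_distances,
)

# Illustrative vectors (assumed), each stored as a single-row 2-D array.
X = np.array([[1.0, 2.0, 3.0]])
Y = np.array([[4.0, 6.0, 3.0]])

# Each call returns a 1x1 matrix here; [0, 0] extracts the scalar.
euclid = euclidean_distances(X, Y)[0, 0]     # about 5.0
manhattan = manhattan_distances(X, Y)[0, 0]  # 7.0
cosine = cosine_similarity(X, Y)[0, 0]

# Pearson correlation via NumPy: entry [0, 1] of the 2x2 correlation matrix.
pearson = np.corrcoef(X[0], Y[0])[0, 1]

# jaccard_score compares binary label vectors: 1 marks set membership.
a = np.array([1, 1, 0, 1, 0])
b = np.array([1, 0, 0, 1, 1])
jaccard = jaccard_score(a, b)  # 2 shared positions / 4 positions in the union = 0.5

print(euclid, manhattan, cosine, pearson, jaccard)
```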

Reference

[1] Wikipedia - Similarity measure.

