# How To / Python: Calculate Cosine Distance I/II

Suppose we have some multi-dimensional data at the country level and we want to see the extent to which two countries are similar. One way to do this is to calculate the Cosine distance between the countries. Here is some Python code to do just that.

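In this code, I use the SciPy library to take advantage of its built-in function cosine. This function returns 1 - Cosine Proximity (Cosine Proximity is also known as cosine similarity). Because Cosine Proximity ranges from -1 to 1, the results of this function range from 0 to 2: a value of 0 means the two vectors point in the same direction, and a value of 2 means they point in opposite directions.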
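To make these ranges concrete, here is a minimal sketch that calls SciPy's cosine on three toy vector pairs and compares it with a hand-written version of the formula. The helper name `manual_cosine_distance` and the toy vectors are illustrative assumptions, not part of the code in this post.

```python
import numpy as np
from scipy.spatial.distance import cosine


def manual_cosine_distance(u, v):
    """Illustrative helper: 1 minus the cosine of the angle between u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    similarity = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - similarity


# Toy vectors (assumed for illustration only)
same_direction = cosine([1, 2], [2, 4])    # parallel vectors
orthogonal = cosine([1, 0], [0, 1])        # perpendicular vectors
opposite = cosine([1, 2], [-1, -2])        # antiparallel vectors

print(same_direction, orthogonal, opposite)
print(manual_cosine_distance([1, 2], [-1, -2]))
```

Up to floating-point error, the three distances come out at approximately 0, 1 and 2, and the hand-written helper agrees with SciPy. Note that the distance depends only on the direction of the vectors, not on their length.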

```python
import pandas as pd
from scipy.spatial.distance import cosine

# Six variables (d1 to d6) for each country
datadict = {
    'country': ['Argentina', 'Bolivia', 'Brazil', 'Chile', 'Ecuador', 'Colombia', 'Paraguay', 'Peru', 'Venezuela'],
    'd1': [0.34, -0.19, 0.37, 1.17, -0.31, -0.3, -0.48, -0.15, -0.61],
    'd2': [-0.57, -0.69, -0.28, 0.68, -2.19, -0.83, -0.53, -1, -1.39],
    'd3': [-0.02, -0.55, 0.07, 1.2, -0.14, -0.85, -0.9, -0.47, -1.02],
    'd4': [-0.69, -0.18, 0.05, 1.43, -0.02, -0.7, -0.72, 0.23, -1.08],
    'd5': [-0.83, -0.69, -0.39, 1.31, -0.7, -0.75, -1.04, -0.52, -1.22],
    'd6': [-0.45, -0.77, 0.05, 1.37, -0.1, -0.67, -1.4, -0.35, -0.89]}

df = pd.DataFrame(datadict)

# Pairs of countries to compare
pairsdict = {
    # NOTE: the 'country1' entry (the first country of each pair, four names)
    # is missing from this listing and must be supplied before running.
    'country2': ['Bolivia', 'Chile', 'Colombia', 'Peru']}

pairs = pd.DataFrame(pairsdict)

# Add data to the country pairs (_x columns belong to country1, _y to country2)
pairs = pairs.merge(df, how='left', left_on=['country1'], right_on=['country'])
pairs = pairs.merge(df, how='left', left_on=['country2'], right_on=['country'])

# Convert data columns to a list in a single cell
pairs['vector1'] = pairs[['d1_x','d2_x','d3_x','d4_x','d5_x','d6_x']].values.tolist()
pairs['vector2'] = pairs[['d1_y','d2_y','d3_y','d4_y','d5_y','d6_y']].values.tolist()

# .copy() avoids pandas' SettingWithCopyWarning when the new column is added below
cosinedf = pairs[['country1', 'country2', 'vector1', 'vector2']].copy()

# Calculate Cosine distance, rounded to two decimals
cosinedf['cosine_dist'] = cosinedf.apply(lambda x: round(cosine(x['vector1'], x['vector2']),2), axis=1)

cosinedf = cosinedf[['country1', 'country2', 'cosine_dist']]
print(cosinedf)
```

The df dataframe contains 6 variables for each country. The pairs dataframe contains pairs of countries that we want to compare.

The two merge calls add the 6 variables (d1 to d6) to each country of the dyad; columns with the _x suffix belong to country1 and columns with the _y suffix belong to country2. Next, the 6 columns for each country are converted to one column (vector1 or vector2) containing a list with the 6 values of variables d1 to d6. Finally, we apply the cosine function from SciPy to each pair of countries and store the result, rounded to two decimals, in the new column called cosine_dist.
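The same steps can also be sketched without the merges, by looking up each country's row by name. This is only an alternative sketch, not the method used in this post: the helper `pair_distance` and the names `data`, `example_pairs` and `results` are hypothetical, and the data is limited to the four countries that appear in the two pairs of the results table.

```python
import pandas as pd
from scipy.spatial.distance import cosine

# Values copied from the post for the four countries used below
data = pd.DataFrame({
    'country': ['Argentina', 'Bolivia', 'Chile', 'Venezuela'],
    'd1': [0.34, -0.19, 1.17, -0.61],
    'd2': [-0.57, -0.69, 0.68, -1.39],
    'd3': [-0.02, -0.55, 1.2, -1.02],
    'd4': [-0.69, -0.18, 1.43, -1.08],
    'd5': [-0.83, -0.69, 1.31, -1.22],
    'd6': [-0.45, -0.77, 1.37, -0.89],
}).set_index('country')


def pair_distance(first, second, table=data):
    """Hypothetical helper: rounded cosine distance between two rows of `table`."""
    u = table.loc[first].to_numpy()
    v = table.loc[second].to_numpy()
    return round(float(cosine(u, v)), 2)


# Pairs taken from the results table in this post
example_pairs = [('Argentina', 'Bolivia'), ('Chile', 'Venezuela')]

results = pd.DataFrame(
    [(a, b, pair_distance(a, b)) for a, b in example_pairs],
    columns=['country1', 'country2', 'cosine_dist'],
)
print(results)
```

Indexing the dataframe by country keeps the lookup short, at the cost of computing one pair at a time; for many pairs the merge-based approach or a vectorized calculation may be preferable.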

As a result, we get the following table:

```
country1, country2, cosine_dist
Argentina, Bolivia, 0.26
Chile, Venezuela, 1.93