CLEU: Cross-Lingual Embeddings Utility

CLEU is a a high-level easy to use python library that can be used to analyze cross-lingual word embeddings on multiple languages, In addition CLEU provide some interactive visualization, where users can explore the cross-lingual word embeddings on two or more languages.

Requirements

altair==4.1.0
faiss==1.7.1.post2
matplotlib==3.4.3
numpy==1.20.3
pandas==1.3.2
scikit-learn==0.24.2
scipy==1.7.1
seaborn==0.11.2
umap-learn==0.5.1

Installation

CLEU is on PyPI, so you can use pip to install it, Currently only python 3 is supported.

pip install cleu

Example

The code demonstrate how to use CLEU from loading embeddings, querying neighbours on different languages and visualizing the embeddings.

Example: Loading the Embeddings

Load cross-lingual word embeddings for English (en), Indonesian (id) and Spanish

from cleu.embeddings import Embeddings

clwe_en = Embeddings(dim=300,lang='en',cuda=False)
clwe_en.load_embeddings('data/vectors-en.txt',lang='en',max_vocab=10000)

clwe_id = Embeddings(dim=300,lang='id',cuda=False)
clwe_id.load_embeddings('data/vectors-id.txt',lang='id',max_vocab=10000)


clwe_es = Embeddings(dim=300,lang='es',cuda=False)
clwe_es.load_embeddings('data/vectors-es.txt',lang='es',max_vocab=10000)

Example: Accessing the Embeddings property

Accessing the Embeddings, more property can be seen in Embedding Class

# There are two ways to get the Embeddings, by word or id
# emb_you = clwe_en.get_embedding_by_id(0)
emb_you = clwe_en.get_embedding_by_word("you")
print(emb_you.vector, emb_you.id)

Example: Querying Nearest Neighbours Embeddings

Find nearest neighbour of Embedding you (en) in English word embeddings

neighbours_you_en = clwe_en.get_nearest_neighbours(emb_you,k=5,distance_function='cosine')
neighbours_you_en = clwe_en.get_nearest_neighbours(emb_you,k=5,distance_function='csls',csls_k=10)

Find nearest neighbour of Embedding you (en) in Indonesian word embeddings

neighbours_you_id = clwe_id.get_nearest_neighbours(emb_you,k=5,distance_function='cosine')
neighbours_you_id = clwe_id.get_nearest_neighbours(emb_you,k=5,distance_function='csls',csls_k=10)

Example: Plotting Module

Import the plotting module

from cleu.plot import plot_embeddings

Plot Similarity Matrix between List of Embedding in English and Indonesian word embeddings

words_id = ["makan","memakan","minum","jalan","hipotesis","kucing","belajar","uang","obligasi"]
words_en = ["eat","eating","drink","walk","hypothesis","cat","study","money","bonds"]
# Map each list of word to list of Embedding
emb_id = list(map(lambda word : clwe_id.get_embedding_by_word(word) ,words_id))
emb_en = list(map(lambda word : clwe_en.get_embedding_by_word(word) ,words_en))
# plot_embeddings.plot_similarity(emb_id,emb_en,distance_function='csls',csls_k=3)
plot_embeddings.plot_similarity(emb_id,emb_en,distance_function='cosine')

Reduce Dimensionality and plot English, Indonesian and Spanish cross-lingual word embeddings on 2d plane

import altair as alt
# If you have more than 5000 embedding, uncomment the line below 
# alt.data_transformers.disable_max_rows()
plot_embeddings.plot_embeddings_2d([clwe_en,clwe_id,clwe_es],width=800,height=500,dimensionality_reduction='umap')

Find English words nearest neighbours in Indonesian and Spanish cross-lingual word embeddings and plot it on 2d plance

words_en = ["hypothesis","cat","study","comic"]
emb_en = list(map(lambda word : clwe_en.get_embedding_by_word(word) ,words_en))
plot_embeddings.plot_embeddings_neighbours(emb_en,[clwe_id,clwe_es],width=800,height=500,dimensionality_reduction='umap',k=3,distance_function='csls',csls_k=10)

Support

Open bug reports and feature requests on GitHub issues.

License

CLEU is licensed under the MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.vscode		.vscode
cleu		cleu
data		data
images		images
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.ipynb		main.ipynb
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CLEU: Cross-Lingual Embeddings Utility

Requirements

Installation

Example

Example: Loading the Embeddings

Example: Accessing the Embeddings property

Example: Querying Nearest Neighbours Embeddings

Example: Plotting Module

Support

License

About

Uh oh!

Releases 7

Packages

Uh oh!

Languages

License

williammulianto/cleu

Folders and files

Latest commit

History

Repository files navigation

CLEU: Cross-Lingual Embeddings Utility

Requirements

Installation

Example

Example: Loading the Embeddings

Example: Accessing the Embeddings property

Example: Querying Nearest Neighbours Embeddings

Example: Plotting Module

Support

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Languages

Packages