rdt

RDT: Russian Distributional Thesaurus (Русский Дистрибутивный Тезаурус)

This package lets you efficiently use the word graph of the Russian Distributional Thesaurus.

Quickstart

Download the pre-packed resource:

wget http://panchenko.me/data/russe/rdt.pkl

Install dependencies, e.g.:

pip install -r requirements.txt

Load the distributional thesaurus (specify the path to the downloaded 'rdt.pkl' file):

from dt import RDT, DistributionalThesaurus
rdt = RDT(dt_pkl_fpath="rdt.pkl")

Loading takes about 5 minutes, and the resulting structure occupies around 1.3 GB of RAM. This is nevertheless more efficient, in both time and memory consumption, than parsing the CSV file into a dict. The implementation relies on a marisa trie for storing keys and on NumPy arrays for storing similarity scores.
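To illustrate why a key index plus flat arrays is more compact than a dict of dicts, here is a minimal sketch of that kind of layout. It is not the code in dt.py: the class name `TinyThesaurus` is hypothetical, a sorted list with `bisect` stands in for the marisa trie, the standard-library `array` module stands in for NumPy, and the words and scores are made up.

```python
# -*- coding: utf-8 -*-
from array import array
from bisect import bisect_left


class TinyThesaurus(object):
    """Toy layout: sorted keys give word ids, neighbours live in flat arrays."""

    def __init__(self, neighbours):
        # neighbours: dict mapping word -> list of (similar_word, score)
        vocab = set(neighbours)
        for pairs in neighbours.values():
            vocab.update(w for w, _ in pairs)
        self.words = sorted(vocab)      # index in this list is the word id
        self.offsets = array("l", [0])  # row i spans offsets[i]:offsets[i + 1]
        self.ids = array("l")           # neighbour word ids, all rows concatenated
        self.scores = array("f")        # 32-bit scores, parallel to self.ids
        for word in self.words:
            # Store each row with the highest score first.
            pairs = sorted(neighbours.get(word, []), key=lambda p: -p[1])
            for other, score in pairs:
                self.ids.append(self._word_id(other))
                self.scores.append(score)
            self.offsets.append(len(self.ids))

    def _word_id(self, word):
        # Binary search over the sorted keys (the trie's role in this sketch).
        i = bisect_left(self.words, word)
        if i == len(self.words) or self.words[i] != word:
            raise KeyError(word)
        return i

    def most_similar(self, word):
        i = self._word_id(word)
        start, end = self.offsets[i], self.offsets[i + 1]
        return [(self.words[self.ids[j]], self.scores[j]) for j in range(start, end)]


# Invented scores, for illustration only.
toy = TinyThesaurus({u"граф": [(u"дерево", 0.25), (u"граф-схема", 0.5)]})
```

Each word is stored once, and every neighbour costs one integer and one 32-bit float instead of a full Python dict entry, which is where the savings in such a layout come from.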

Search for nearest neighbours:

# Print each neighbour with its similarity score (runs on Python 2 and 3).
for w, s in rdt.most_similar(u"граф"):
    print(u"{} {}".format(w, s))
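When only the strongest neighbours are needed, the loop above can be wrapped in a small helper. This is a sketch that assumes only what the loop shows, namely that `most_similar` yields (word, score) pairs. The helper name `top_neighbours` and its `min_score` and `limit` parameters are hypothetical and not part of the package.

```python
from itertools import islice


def top_neighbours(thesaurus, word, min_score=0.0, limit=10):
    """Return up to `limit` (word, score) pairs with score >= `min_score`.

    Pairs are kept in the order in which `thesaurus.most_similar` yields them.
    """
    pairs = ((w, s) for w, s in thesaurus.most_similar(word) if s >= min_score)
    return list(islice(pairs, limit))
```

With a loaded thesaurus this might be called as `top_neighbours(rdt, u"граф", min_score=0.5)`, where the threshold is an arbitrary example value.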
