estnltk.database.database module

class estnltk.database.database.Database(index, doc_type='document', **kwargs)

The Database class represents a single Elasticsearch index and provides helpers for inserting and querying Estnltk documents.

Parameters:

index: str

The name of the Elastic index.

doc_type: str

The document type to use (default: 'document').

**kwargs:

All remaining keyword arguments are passed to the constructor of the Python Elasticsearch client.

Attributes

doc_type The document type used for documents in this index.
es The Elasticsearch client instance.
index The name of the index.
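A minimal sketch of the constructor pattern described above, runnable without a cluster. DatabaseSketch and its client_factory argument are hypothetical; the real Database class is assumed to pass **kwargs to elasticsearch.Elasticsearch, which is replaced here by dict so that the forwarded arguments can be inspected.

```python
from typing import Any, Callable


class DatabaseSketch:
    """Hypothetical stand-in for Database, showing only how index, doc_type
    and the remaining keyword arguments could be stored and forwarded."""

    def __init__(self, index: str, doc_type: str = "document",
                 client_factory: Callable[..., Any] = dict, **kwargs: Any) -> None:
        self.index = index          # name of the Elasticsearch index
        self.doc_type = doc_type    # document type used for inserted documents
        # Assumption: the real class calls elasticsearch.Elasticsearch(**kwargs);
        # dict is the default factory here so the forwarded kwargs stay visible.
        self.es = client_factory(**kwargs)


db = DatabaseSketch("estnltk_demo", hosts=["localhost:9200"])
print(db.index, db.doc_type, db.es)  # estnltk_demo document {'hosts': ['localhost:9200']}
```

With the real class, a call such as Database('estnltk_demo', hosts=['localhost:9200']) would hand hosts straight to the client, assuming the forwarding works as the parameter description states.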

Methods

bulk_insert(list_of_texts[, id, refresh]) Generator to use for bulk inserts.
close_connection()
count()
delete(index, id)
delete_index() Delete the index.
get(id) Retrieve the document with the given id.
insert(text[, id]) Insert a document into the index.
query_documents(query[, layer, es_result, ...]) Find all Text documents that match keywords in the query.
query_matches(uqwy[, layer])
refresh() Commit all changes to the index.
update()

bulk_insert(list_of_texts, id=None, refresh=True)

Generator to use for bulk inserts.
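Elasticsearch bulk indexing consumes a stream of action dictionaries, and a generator lets them be produced lazily instead of holding every document in memory. The helper below is a hypothetical sketch of such a generator: its name, the plain dicts standing in for prepared Text documents, and the treatment of ids as a list parallel to the documents are assumptions, not the documented behaviour of bulk_insert.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def bulk_actions(index: str, doc_type: str,
                 documents: Iterable[Dict[str, Any]],
                 ids: Optional[List[str]] = None) -> Iterator[Dict[str, Any]]:
    """Lazily yield one bulk 'index' action per document (hypothetical helper)."""
    for position, source in enumerate(documents):
        action: Dict[str, Any] = {
            "_index": index,
            "_type": doc_type,  # only meaningful where mapping types still exist
            "_source": source,  # in estnltk this would be the prepared Text
        }
        if ids is not None:
            # Assumption: ids is a list parallel to documents.
            action["_id"] = ids[position]
        yield action


docs = [{"text": "Tere"}, {"text": "Head aega"}]
for action in bulk_actions("estnltk_demo", "document", docs, ids=["1", "2"]):
    print(action["_id"], action["_source"]["text"])
```

With a live client, such a generator could be passed to elasticsearch.helpers.bulk(client, actions). The _type key applies only to Elasticsearch versions that still support mapping types.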

delete_index()

Delete the index.

doc_type

The document type used for documents in this index.

es

The Elasticsearch client instance.

get(id)

Retrieve the document with the given id.

index

The name of the index.

insert(text, id=None)

Insert a document into the index.

Parameters:

text: estnltk.text.Text

The text instance to be inserted.

id: str

Optional id for the document. If omitted, an id is generated automatically.

Returns:

str

The id of the created document.

query_documents(query, layer=None, es_result=False, start=0, size=10)

Find all Text documents that match keywords in the query.

See the Elasticsearch query string query documentation for the full syntax: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

Example queries:

krokodill Gena
Find documents containing “krokodill” or “gena” or both
+venemaa -eesti
Find documents that contain the word or lemma “venemaa” and do not contain “eesti”
“suur pauk”
Find documents containing exact phrase “suur pauk”
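To make the example queries concrete, here is a toy matcher that reproduces only their boolean meaning: bare terms are alternatives (any one suffices), + marks a required term, - a prohibited one, and a quoted string (typed with straight double quotes) must appear as consecutive words. When a required term is present, bare terms no longer restrict the match, as in Lucene boolean queries. This is a hypothetical illustration, not how Elasticsearch evaluates queries: lemmatisation, analysis and relevance scoring are ignored.

```python
import re
from typing import List

# Hypothetical toy matcher: mimics only the boolean meaning of the example
# queries; unlike Elasticsearch it does no lemmatisation or scoring.
_CLAUSE = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def _tokens(text: str) -> List[str]:
    # Lower-case word tokens; \w also covers Estonian letters such as õ and š.
    return re.findall(r"\w+", text.lower())


def _contains(tokens: List[str], phrase: List[str]) -> bool:
    # True if the phrase occurs as consecutive tokens.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def toy_match(query: str, document: str) -> bool:
    tokens = _tokens(document)
    required: List[List[str]] = []
    prohibited: List[List[str]] = []
    optional: List[List[str]] = []
    for sign, quoted, word in _CLAUSE.findall(query):
        phrase = _tokens(quoted or word)
        if not phrase:
            continue
        if sign == "+":
            required.append(phrase)
        elif sign == "-":
            prohibited.append(phrase)
        else:
            optional.append(phrase)
    if any(_contains(tokens, p) for p in prohibited):
        return False
    if required:
        # With required clauses present, bare terms would only affect ranking.
        return all(_contains(tokens, p) for p in required)
    return any(_contains(tokens, p) for p in optional)


print(toy_match("+venemaa -eesti", "venemaa ja soome"))
print(toy_match('"suur pauk"', "pauk oli suur"))
```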

Parameters:

query: str

The keywords to use for search.

layer: str

The layer to search in (for example words, sentences, clauses or verb_phrases). If layer is None (the default), the full document text is searched.

start: int (default: 0)

The start index of the results. Equivalent to “from” in Elasticsearch; that name cannot be used here because from is a reserved keyword in Python.

size: int (default: 10)

The maximum number of matches to return.

es_result: boolean (default: False)

If True, return the raw Elasticsearch response; otherwise return a list of Text instances.

Returns:

list of Text instances if es_result is False

dict (the raw Elasticsearch response) if es_result is True
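The request behind query_documents can be pictured as a query_string search whose from and size fields come from start and size. The two helpers below are a hypothetical sketch: the field naming (the layer name, or "text" for the full document) is an assumption, and plain dicts stand in for the Text instances the real method returns. The response layout read by hits_to_sources (hits.hits[*]._source) is the standard Elasticsearch search response.

```python
from typing import Any, Dict, List, Optional


def build_query_body(query: str, layer: Optional[str] = None,
                     start: int = 0, size: int = 10) -> Dict[str, Any]:
    """Build a query_string search body (hypothetical helper)."""
    # Assumption: a layer is indexed under a field with the layer's name and
    # the full document text under "text".
    field = layer if layer is not None else "text"
    return {
        "query": {"query_string": {"query": query, "default_field": field}},
        "from": start,  # Elasticsearch's name; start avoids Python's reserved from
        "size": size,
    }


def hits_to_sources(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the stored _source of every hit in a raw search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


body = build_query_body("+venemaa -eesti", layer="sentences", start=10)
print(body["from"], body["size"], body["query"]["query_string"]["default_field"])
```

With a live client, the body could be passed to the client's search method, and hits_to_sources applied to the kind of raw response that es_result=True returns.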

refresh()

Commit all changes to the index.

estnltk.database.database.prepare_text(text)

Convert a Text instance into a format that can be easily indexed in an Elasticsearch database.
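The exact output of prepare_text is not documented here. As a hedged illustration only, the hypothetical function below flattens a Text-like dict (assumed to hold the raw string under "text" and each layer as a list of spans with "start" and "end" character offsets) into one field per layer, the kind of structure a per-layer search would need.

```python
from typing import Any, Dict, List


def prepare_text_sketch(text: Dict[str, Any], layers: List[str]) -> Dict[str, Any]:
    """Hypothetical flattening of a Text-like dict into an indexable document."""
    document: Dict[str, Any] = {"text": text["text"]}
    for layer in layers:
        # Assumption: a layer is a list of spans with character offsets;
        # a missing layer becomes an empty list.
        document[layer] = [text["text"][span["start"]:span["end"]]
                           for span in text.get(layer, [])]
    return document


sample = {"text": "Tere maailm",
          "words": [{"start": 0, "end": 4}, {"start": 5, "end": 11}]}
print(prepare_text_sketch(sample, ["words"]))
```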