estnltk.core module

Core module of the Estnltk library. It sets up some common paths and provides functions for converting between binary and unicode data.

Python 2.x and Python 3.x differ in the way they handle unicode data.

  • Python 2 uses str for binary data and unicode for textual data.
  • Python 3 uses str for unicode data and bytes for binary data.
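A short Python 3 illustration of the split described above: text and binary data are separate types, converting between them always requires an explicit encoding, and mixing them raises an error. The sample string is arbitrary.

```python
# Python 3: textual data is str, binary data is bytes.
text = "Tere, õun!"           # str: a sequence of Unicode code points
data = text.encode("utf-8")   # bytes: the UTF-8 encoded form of the text

print(type(text).__name__)    # str
print(type(data).__name__)    # bytes
print(len(text), len(data))   # 10 11 ('õ' takes two bytes in UTF-8)

# Decoding the bytes with the same encoding restores the original text.
assert data.decode("utf-8") == text

# Python 3 refuses to mix the two types implicitly.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```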

Because the two versions use different types for the same kind of data, code that refers to these types directly cannot be shared unchanged between Python 2 and Python 3. The functions as_unicode() and as_binary() abstract this conversion away.
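A common way to build such version-independent helpers is to decide once, at import time, which built-in types play the text and binary roles. This is the same idea the `six` library exposes as `six.text_type` and `six.binary_type`; the sketch below is illustrative only, and the page does not state that estnltk.core is implemented this way. The names `text_type` and `binary_type` here are local assumptions.

```python
import sys

# Decide once which built-in types represent text and binary data
# on the running interpreter (names chosen for this sketch only).
if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:  # Python 2
    text_type = unicode  # noqa: F821 -- this name exists only on Python 2
    binary_type = str

print(text_type.__name__, binary_type.__name__)  # on Python 3: str bytes
```

Code that checks `isinstance(s, text_type)` then behaves the same on both versions, which is the kind of check a conversion helper needs.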

estnltk.core.as_unicode(s, encoding='utf-8')

Forces conversion of the given string to the unicode type, which is str in Python 3.x and unicode in Python 2.x.

If the string is already unicode, no conversion is done and the same string is returned.

Parameters:

s: str or bytes (Python3), str or unicode (Python2)

The string to convert to unicode.

encoding: str

The encoding of the input string (default: utf-8)

Returns:

str for Python 3 or unicode for Python 2.

Raises:

ValueError

If an input of an invalid type is passed to the function.
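The contract above (return text unchanged, decode binary input with `encoding`, reject anything else with ValueError) can be sketched for Python 3 alone as follows. `to_unicode` is a hypothetical stand-in written from the documented behavior; it is not estnltk's source and does not cover Python 2.

```python
def to_unicode(s, encoding="utf-8"):
    """Hypothetical Python 3 stand-in for the documented as_unicode() contract."""
    if isinstance(s, str):
        # Already text: return the very same object, no conversion.
        return s
    if isinstance(s, bytes):
        # Binary input: decode it using the given encoding.
        return s.decode(encoding)
    # Any other type is rejected, as the documentation describes.
    raise ValueError("expected str or bytes, got %s" % type(s).__name__)


word = "õun"
assert to_unicode(word) is word                            # str passes through
assert to_unicode(b"\xc3\xb5un") == "õun"                  # UTF-8 by default
assert to_unicode(b"\xf5un", encoding="latin-1") == "õun"  # explicit encoding

try:
    to_unicode(42)
except ValueError as exc:
    print(exc)  # expected str or bytes, got int
```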

estnltk.core.as_binary(s, encoding='utf-8')

Forces conversion of the given string to the binary type, which is bytes in Python 3.x and str in Python 2.x.

If the string is already binary, no conversion is done: the same string is returned and the encoding argument is ignored.
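The encoding only matters when text has to be turned into bytes: the same text produces different byte sequences under different codecs, while existing bytes already carry whatever encoding they were produced with. A plain Python 3 illustration, independent of estnltk:

```python
text = "õun"
utf8 = text.encode("utf-8")      # 'õ' becomes two bytes
latin1 = text.encode("latin-1")  # 'õ' becomes one byte

print(utf8)    # b'\xc3\xb5un'
print(latin1)  # b'\xf5un'

# Decoding with the wrong codec silently produces different text.
print(utf8.decode("latin-1"))  # Ãµun
```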

Parameters:

s: str or bytes (Python3), str or unicode (Python2)

The string to convert to binary.

encoding: str

The encoding of the resulting binary string (default: utf-8)

Returns:

bytes for Python 3 or str for Python 2.

Raises:

ValueError

If an input of an invalid type is passed to the function.
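A matching Python 3 sketch of the as_binary() contract: binary input is returned as the same object with `encoding` ignored, text is encoded, and other types raise ValueError. `to_binary` is a hypothetical name used only for this illustration; it is not estnltk's implementation.

```python
def to_binary(s, encoding="utf-8"):
    """Hypothetical Python 3 stand-in for the documented as_binary() contract."""
    if isinstance(s, bytes):
        # Already binary: return the same object; `encoding` is ignored.
        return s
    if isinstance(s, str):
        # Text input: encode it with the requested encoding.
        return s.encode(encoding)
    # Any other type is rejected, as the documentation describes.
    raise ValueError("expected str or bytes, got %s" % type(s).__name__)


raw = b"\xc3\xb5un"
assert to_binary(raw, encoding="latin-1") is raw          # encoding ignored
assert to_binary("õun") == b"\xc3\xb5un"                  # UTF-8 by default
assert to_binary("õun", encoding="latin-1") == b"\xf5un"  # explicit encoding

# Round trip: text -> bytes -> text returns the original string.
assert to_binary("Tere").decode("utf-8") == "Tere"
```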