prepars package#

Submodules#

prepars.normalizer module#

class prepars.normalizer.Normalizer[source]#

Bases: object

characterRefine(text: str) → str[source]#

This method applies common text preprocessing rules.

It is used to:
  • Remove extra spaces

  • Remove extra newlines

  • Remove extra ZWNJs (zero-width non-joiners)

  • Remove kashida (tatweel) and carriage returns

  • Translate Latin digits to Persian digits

  • Replace quotation marks with guillemets (gyoome)

  • Replace dot with momayez (the Persian decimal separator)

  • Replace 3 dots

  • Remove the diacritics FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SHADDA, SUKUN

Parameters:

text (str) – plain text to refine

Returns:

Refined text as string

Return type:

str
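The rules listed above can be illustrated with a small standalone sketch. It is not the implementation behind characterRefine; the function name, the regular expressions, and the order of the steps are assumptions made for illustration, and only a subset of the rules is covered.

```python
import re

# Assumed digit table: ASCII digits mapped to Persian digits (U+06F0..U+06F9).
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
LATIN_TO_PERSIAN = str.maketrans("0123456789", PERSIAN_DIGITS)

# Kashida (tatweel, U+0640) and carriage return.
KASHIDA_AND_CR = re.compile("[\u0640\r]")
# FATHATAN through SUKUN occupy the contiguous range U+064B..U+0652.
DIACRITICS = re.compile("[\u064B-\u0652]")
EXTRA_SPACES = re.compile(" {2,}")
EXTRA_NEWLINES = re.compile("\n{2,}")
EXTRA_ZWNJ = re.compile("\u200c{2,}")


def character_refine_sketch(text: str) -> str:
    """Hypothetical helper covering a subset of the documented rules."""
    text = KASHIDA_AND_CR.sub("", text)
    text = DIACRITICS.sub("", text)
    text = EXTRA_SPACES.sub(" ", text)
    text = EXTRA_NEWLINES.sub("\n", text)
    text = EXTRA_ZWNJ.sub("\u200c", text)
    return text.translate(LATIN_TO_PERSIAN)
```

The quotation, momayez, and three-dot rules are left out because the page does not state their exact replacements.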

makeTrans(A, B)[source]#

This method maps the characters of A to the characters of B, position by position (like zip). Example: 1 -> ۱

Parameters:
  • A (str) – source string

  • B (str) – destination string

Returns:

a dictionary that maps each character of A to the corresponding character of B

Return type:

dict
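A minimal sketch of this kind of positional mapping is shown below. The helper name and the choice of code-point keys (so that the result can be passed to str.translate) are assumptions; the real makeTrans may build its dictionary differently.

```python
def make_trans_sketch(source: str, destination: str) -> dict:
    """Hypothetical helper: pair characters by position, like zip."""
    # str.translate expects integer code points as keys, hence ord().
    return {ord(a): b for a, b in zip(source, destination)}


# Persian digits 0 to 9, written as escapes (U+06F0..U+06F9).
persian_digits = "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9"
table = make_trans_sketch("0123456789", persian_digits)
converted = "1402".translate(table)
```

Characters that have no entry in the table pass through str.translate unchanged.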

normalize(text)[source]#

This method manages the normalization operation.

Parameters:

text (str) – unnormalized text

Returns:

normalized text

Return type:

str

punctuationRefine(text)[source]#

This method is used to:
  • Remove space before and after quotation

  • Remove space before and after symbols

  • Put space after . and :

Parameters:

text (str) – plain text to refine

Returns:

refined text as string

Return type:

str
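The punctuation rules can be sketched with a few regular expressions. This is an illustration under stated assumptions rather than the code behind punctuationRefine: the function name, the set of symbols, and the decision to skip digits after a dot or colon (so that values such as 3.14 stay intact) are all choices made for this example.

```python
import re


def punctuation_refine_sketch(text: str) -> str:
    """Hypothetical helper illustrating the three documented rules."""
    # Drop spaces just inside guillemets.
    text = re.sub(r"« +", "«", text)
    text = re.sub(r" +»", "»", text)
    # Drop spaces before common punctuation (assumed symbol set,
    # including the Persian comma, semicolon and question mark).
    text = re.sub(r" +([.:!\u060C\u061B\u061F])", r"\1", text)
    # Put a space after "." and ":" when a letter follows directly.
    text = re.sub(r"([.:])(?=[^\W\d_])", r"\1 ", text)
    return text
```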

prepars.regexer module#

class prepars.regexer.Regexer[source]#

Bases: object

compilePatterns(patterns)[source]#

This method takes an array of (pattern, replacement) tuples and compiles them.

Parameters:

patterns – array of tuples (pattern, replacement)

Returns:

an array of compiled regex patterns
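As a rough sketch of the idea, the (pattern, replacement) tuples can be compiled once and then applied in order. The function names are hypothetical, and keeping each replacement paired with its compiled pattern is an assumption about the return shape.

```python
import re


def compile_patterns_sketch(patterns):
    """Hypothetical helper: compile each pattern, keep its replacement."""
    return [(re.compile(pattern), replacement)
            for pattern, replacement in patterns]


def apply_patterns(text, compiled):
    """Apply every compiled rule to the text, in list order."""
    for regex, replacement in compiled:
        text = regex.sub(replacement, text)
    return text


rules = [(r" +", " "), (r"\n+", "\n")]
compiled = compile_patterns_sketch(rules)
cleaned = apply_patterns("a   b\n\n\nc", compiled)
```

Compiling up front avoids re-parsing the same patterns each time a text is processed.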

prefixPatternGenerator()[source]#

This method fetches all prefix patterns from the rule file and generates regex patterns.

Parameters:

self – the Regexer instance

Returns:

an array of regex patterns: [(pattern, replacement)]

sffixPatternGenerator()[source]#

This method fetches all suffix patterns from the rule file and generates regex patterns.

Parameters:

self – the Regexer instance

Returns:

an array of regex patterns: [(pattern, replacement)]
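To illustrate how suffix rules might become (pattern, replacement) tuples, the sketch below takes a plain list of suffixes instead of reading a rule file. The function name, the input format, and the choice to join the suffix to the preceding word with a ZWNJ are assumptions, not the library's documented behaviour.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner (half-space)


def suffix_pattern_generator_sketch(suffixes):
    """Hypothetical helper: one (pattern, replacement) tuple per suffix."""
    patterns = []
    for suffix in suffixes:
        # A word character, one space, then the suffix as a whole token.
        pattern = r"(\w) " + re.escape(suffix) + r"(?!\w)"
        # Escape backslashes so the suffix is inserted literally.
        replacement = r"\1" + ZWNJ + suffix.replace("\\", "\\\\")
        patterns.append((pattern, replacement))
    return patterns


# "\u0647\u0627" is the Persian plural suffix.
pattern, replacement = suffix_pattern_generator_sketch(["\u0647\u0627"])[0]
joined = re.sub(pattern, replacement, "\u06a9\u062a\u0627\u0628 \u0647\u0627")
```

A prefix generator could follow the same shape with the affix placed before the word instead of after it.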

prepars.spacing module#

class prepars.spacing.Spacing[source]#

Bases: object

fix(text)[source]#

This method fixes the text by calling all spacing methods.

Parameters:

text – plain text

Returns:

processed text

prefixFixer(text)[source]#

This method applies the prefix rules to the text.

Parameters:

text – plain text

Returns:

processed text

suffixFixer(text)[source]#

This method applies the suffix rules to the text.

Parameters:

text – plain text

Returns:

processed text

unregularWords(text)[source]#

This method applies the irregular-word rules to the text.

Parameters:

text – plain text

Returns:

processed text

prepars.verb module#

class prepars.verb.verbProcessing[source]#

Bases: object

fixVerbs(text)[source]#

This method fixes all verb half-space and space problems.

Parameters:

text – input text

Returns:

corrected text
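One common verb spacing fix in Persian is attaching a verbal prefix to its stem with a half-space (ZWNJ) instead of a regular space. The sketch below shows only that single case. It is a naive illustration, not what fixVerbs does internally: the function name and the pattern are assumptions, and it does not check that the following word is really a verb.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner (half-space)

# Optional negative marker (U+0646) followed by the "mi" prefix (U+0645 U+06CC).
PREFIX = "\u0646?\u0645\u06cc"
# The prefix must start a word and be followed by a space and a word character.
VERB_PREFIX = re.compile(r"(?<!\w)(" + PREFIX + r") (?=\w)")


def fix_verbs_sketch(text: str) -> str:
    """Hypothetical helper: replace the space after the prefix with a ZWNJ."""
    return VERB_PREFIX.sub(r"\1" + ZWNJ, text)
```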

Module contents#