prepars package
Submodules
prepars.normalizer module
- class prepars.normalizer.Normalizer
Bases: object
- characterRefine(text: str) → str
Applies a set of common text-preprocessing rules.
- This method is used to:
Remove extra spaces
Remove extra newlines
Remove extra ZWNJs (zero-width non-joiners)
Remove keshide (kashida) and carriage returns
Translate Latin digits to Persian digits
Replace quotation marks with gyoome (guillemets)
Replace dot with momayez (the Persian decimal separator)
Replace 3 dots
Remove the diacritics FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SHADDA, SUKUN
- Parameters:
text (str) – the plain text to refine
- Returns:
Refined text as string
- Return type:
str
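The rules listed for characterRefine can be illustrated with a minimal sketch. It is not the prepars implementation: the function name refine_sketch, the order of the steps, and the choice to collapse each run of repeated characters to a single one are assumptions made for illustration. The quotation, momayez, and three-dot rules are left out because their exact replacements are not stated here.

```python
import re

# Hypothetical sketch, not the prepars implementation.
LATIN_DIGITS = "0123456789"
PERSIAN_DIGITS = "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9"
DIGIT_TABLE = str.maketrans(LATIN_DIGITS, PERSIAN_DIGITS)

KASHIDA = "\u0640"  # ARABIC TATWEEL (keshide)
ZWNJ = "\u200c"     # ZERO WIDTH NON-JOINER
# U+064B..U+0652: FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SHADDA, SUKUN
DIACRITICS = re.compile("[\u064b-\u0652]")


def refine_sketch(text: str) -> str:
    # Remove carriage returns and kashida.
    text = text.replace("\r", "").replace(KASHIDA, "")
    # Remove the eight diacritics listed above.
    text = DIACRITICS.sub("", text)
    # Collapse runs of spaces, newlines, and ZWNJs (assumed meaning of "extra").
    text = re.sub(" {2,}", " ", text)
    text = re.sub("\n{2,}", "\n", text)
    text = re.sub(ZWNJ + "{2,}", ZWNJ, text)
    # Translate Latin digits to Persian digits.
    return text.translate(DIGIT_TABLE)
```

Calling refine_sketch on a string with doubled spaces, blank lines, and Latin digits returns the same text with single spaces, single newlines, and Persian digits.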
- makeTrans(A, B)
Maps the characters of A to the characters of B position by position (like zip). Example: 1 → ۱
- Parameters:
A (str) – source string
B (str) – destination string
- Returns:
a dictionary mapping each character of A to the corresponding character of B
- Return type:
dict
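The zip-style mapping described for makeTrans can be sketched as follows. The name make_trans_sketch is hypothetical, and keying the dictionary by code point (so that it can be passed straight to str.translate) is an assumption, not something this page confirms.

```python
def make_trans_sketch(a: str, b: str) -> dict:
    """Hypothetical sketch: pair the characters of a and b by position."""
    if len(a) != len(b):
        raise ValueError("source and destination must have the same length")
    # Keys are code points (an assumption) so the dict works with str.translate.
    return {ord(src): dst for src, dst in zip(a, b)}


# Usage: map Latin digits to Persian digits.
table = make_trans_sketch("0123456789", "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9")
converted = "1".translate(table)  # the Persian digit one, as in the example above
```

For plain character-to-character tables, the built-in str.maketrans(a, b) produces an equivalent mapping.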
prepars.regexer module
- class prepars.regexer.Regexer
Bases: object
- compilePatterns(patterns)
Takes an array of (pattern, replacement) tuples and compiles the patterns.
- Parameters:
patterns – array of tuples (pattern, replacement)
- Returns:
an array of compiled regex patterns
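As a rough illustration of compiling (pattern, replacement) tuples, the sketch below uses the standard re module. The names compile_patterns_sketch and apply_patterns are hypothetical, and keeping each replacement next to its compiled pattern is an assumption about the return shape.

```python
import re


def compile_patterns_sketch(patterns):
    """Hypothetical sketch: compile each pattern once and keep its replacement."""
    return [(re.compile(pattern), replacement) for pattern, replacement in patterns]


def apply_patterns(compiled, text):
    """Apply the compiled (regex, replacement) pairs to text in order."""
    for regex, replacement in compiled:
        text = regex.sub(replacement, text)
    return text


# Usage: collapse runs of spaces, then runs of newlines.
rules = compile_patterns_sketch([(r" {2,}", " "), (r"\n{2,}", "\n")])
cleaned = apply_patterns(rules, "a   b\n\n\nc")
```

Compiling once up front avoids re-parsing the same patterns on every call when many texts are processed.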
prepars.spacing module
- class prepars.spacing.Spacing
Bases: object
- fix(text)
Fixes the spacing of a text by calling all spacing methods.
- Parameters:
text – the plain text to process
- Returns:
processed text
- prefixFixer(text)
Applies the prefix rules to the text.
- Parameters:
text – the plain text to process
- Returns:
processed text
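The prefix rules themselves are not listed on this page. As one plausible example of such a rule, the sketch below joins the Persian verb prefixes "mi" and "nemi" to the following word with a ZWNJ instead of a space. Both the rule and the name prefix_fixer_sketch are assumptions for illustration and may differ from what prefixFixer actually does.

```python
import re

ZWNJ = "\u200c"      # ZERO WIDTH NON-JOINER
MI = "\u0645\u06cc"  # the prefix "mi" (meem + Farsi yeh)
NA = "\u0646"        # noon, which turns "mi" into the negative "nemi"

# A standalone "mi" or "nemi" followed by spaces and another word (assumed rule).
PREFIX_RULE = re.compile(rf"(?<!\S)({NA}?{MI}) +(?=\S)")


def prefix_fixer_sketch(text: str) -> str:
    """Hypothetical sketch: attach the prefix to the next word with a ZWNJ."""
    return PREFIX_RULE.sub(lambda match: match.group(1) + ZWNJ, text)
```

The lookbehind keeps the rule from firing inside longer words that merely end in the same letters. A real rule set would need more context, since the same letters can also form a standalone word.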