"Low Quality" filter


(Robert Weissgraeber) #1

For fully automatic text production, some low-quality texts should be filtered out. To support this, texts now have a new state, “Low Quality”.

This state can be triggered by multiple criteria; currently, only “minimum text length” is implemented. If a minimum text length is configured in the collection, all texts shorter than that are sorted into the “Low Quality” state.

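As an illustration only, the rule can be sketched as a list of criteria, where a text lands in “Low Quality” as soon as any criterion fires. This is not the platform's implementation. The names `classify_text`, `min_length_criterion` and the `"Regular"` state are hypothetical. Measuring length in characters after trimming whitespace is also an assumption, since the post does not say whether the minimum counts characters or words.

```python
from typing import Callable, Iterable

LOW_QUALITY = "Low Quality"  # state named in the feature description
REGULAR = "Regular"          # hypothetical name for every other state

# A criterion returns True when a text should count as low quality.
Criterion = Callable[[str], bool]


def min_length_criterion(min_length: int) -> Criterion:
    """Flag texts shorter than min_length characters (assumed unit)."""
    return lambda text: len(text.strip()) < min_length


def classify_text(text: str, criteria: Iterable[Criterion]) -> str:
    """Sort a text into LOW_QUALITY if any configured criterion fires."""
    return LOW_QUALITY if any(check(text) for check in criteria) else REGULAR


# Usage: only the length criterion exists so far; more can be appended.
criteria = [min_length_criterion(20)]
print(classify_text("Too short.", criteria))                          # Low Quality
print(classify_text("This text is long enough to pass.", criteria))   # Regular
```

Modelling criteria as independent predicates keeps the door open for the future checks mentioned below, such as spellchecking or grammar checks. Each one is just another entry in the list. With no criteria configured, `any` over an empty list is `False`, so nothing is filtered.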

More criteria will be added in the future; current ideas include spellchecking/grammar-check information, axite-based predictions, etc.


(Florence) #2

Hi Robert,

Would it be possible to add detection of words stuck together, multiple consecutive blank characters, sentences starting with a blank character, sentences whose first word does not start with a capital letter, and, for languages that require it, a missing blank character before exclamation and question marks?
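As a rough sketch of how such checks might work, each one can be expressed as a regular expression. This is a hypothetical illustration, not an existing feature. The issue names and `find_issues` are made up. The patterns are heuristics with known false positives and negatives. For example, "stuck words" is only detected where sentence punctuation is glued to the next capitalized word ("end.Next"). Spotting run-together words in general would need a dictionary. Restricting the space-before-`!`/`?` rule to French (`"fr"`) is also an assumption.

```python
import re

# Heuristic checks: hypothetical issue name -> regex that signals the issue.
CHECKS = {
    # two or more consecutive spaces
    "multiple_blanks": re.compile(r" {2,}"),
    # a line (or the whole text) starting with a space or tab
    "leading_blank": re.compile(r"^[ \t]", re.MULTILINE),
    # sentence-final punctuation glued to the next sentence, e.g. "end.Next"
    "stuck_words": re.compile(r"[a-zäöüß][.!?][A-ZÄÖÜ]"),
    # first word of a sentence starting with a lowercase letter
    # (abbreviations such as "e.g. something" will be false positives)
    "lowercase_sentence_start": re.compile(r"(?:\A|[.!?]\s+)[a-zäöüß]"),
}

# Languages assumed to need a (narrow no-break) space before ! and ?
SPACE_BEFORE_MARK_LANGS = {"fr"}
# ! or ? preceded by something other than whitespace or another ! / ?
# (in Python, \s also matches U+00A0 and U+202F)
MISSING_SPACE_BEFORE_MARK = re.compile(r"[^\s!?][!?]")


def find_issues(text: str, lang: str) -> list[str]:
    """Return the names of all heuristic checks that fire on the text."""
    issues = [name for name, pattern in CHECKS.items() if pattern.search(text)]
    if lang in SPACE_BEFORE_MARK_LANGS and MISSING_SPACE_BEFORE_MARK.search(text):
        issues.append("missing_space_before_mark")
    return issues


print(find_issues(" the end.Next  one", "en"))
# ['multiple_blanks', 'leading_blank', 'stuck_words']
print(find_issues("Bonjour!", "fr"))   # ['missing_space_before_mark']
print(find_issues("Bonjour !", "fr"))  # []
```

Each returned issue name could then act as one more criterion for the “Low Quality” state.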

Best regards,
Flo