General Aims of OTMI
The general aim of OTMI is to enable text-mining analysis without releasing the human-readable text of a document. This allows publishers to support text-mining research within their existing business models. In practice, this is achieved in two principal ways: word vectors (i.e. the words occurring in the text, with their frequency counts) and 'snippets' (sentences and phrases from the text, presented out of order).
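To make the two representations concrete, here is a minimal Python sketch. It is an illustration only, not the OTMI file format. The helper names `word_vector` and `snippets`, the lowercase word tokenizer and the naive sentence splitter are all assumptions chosen for brevity. A word vector keeps which words occur and how often but discards their order. Snippets keep whole sentences but shuffle them, so the original running text cannot be read off directly.

```python
import random
import re
from collections import Counter
from typing import Dict, List, Optional


def word_vector(text: str) -> Dict[str, int]:
    """Hypothetical helper: count word occurrences, discarding word order."""
    # Assumed tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return dict(Counter(words))


def snippets(text: str, seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: split text into sentences and return them shuffled."""
    # Assumed sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Shuffle with a private RNG so a seed gives a reproducible order.
    random.Random(seed).shuffle(sentences)
    return sentences


text = "Mice were fed daily. The mice gained weight. Weight gain was measured."
print(word_vector(text)["mice"])    # 2
print(snippets(text, seed=1))       # the three sentences, in shuffled order
```

A real implementation would need proper tokenization (hyphenation, numbers, non-English text) and would likely also emit phrase-level snippets, not just whole sentences.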
Other considerations:
- Remove markup
- Remove entities/macros
- Make documents self-contained (with metadata and links back to the original content), so that OTMI documents make sense outside their original context.
- Retain section-level structure of document
- Separate figure legends from main text
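Several of these considerations can be illustrated with a short preprocessing sketch. It assumes a hypothetical JATS-like XML input in which `<sec>` elements carry a `<title>`, `<p>` paragraphs and `<fig><caption>` figure legends as direct children. The function name `to_plain_sections` and that input layout are assumptions, not part of OTMI. The sketch removes inline markup, lets the XML parser resolve character references (named entities declared only in a DTD would need extra handling), keeps one entry per section, and collects figure legends separately from the main text.

```python
import xml.etree.ElementTree as ET


def _plain(el):
    # itertext() yields the text of el and all descendants, dropping the tags;
    # split/join collapses runs of whitespace left behind by the markup.
    return " ".join("".join(el.itertext()).split())


def to_plain_sections(xml_text):
    """Hypothetical helper: return (sections, legends) from JATS-like XML.

    sections: list of (title, [paragraph text, ...]) in document order.
    legends:  figure caption texts, kept apart from the main text.
    """
    root = ET.fromstring(xml_text)  # resolves &#176;, &lt; and similar references
    sections, legends = [], []
    for sec in root.iter("sec"):
        title_el = sec.find("title")
        title = _plain(title_el) if title_el is not None else ""
        # Only direct <p> children, so nested sections are not counted twice.
        sections.append((title, [_plain(p) for p in sec.findall("p")]))
        # Assumed layout: figures are direct children of their section.
        legends.extend(_plain(cap) for cap in sec.findall("fig/caption"))
    return sections, legends


doc = """<article><body>
<sec><title>Methods</title>
<p>Cells were grown at 37 &#176;C in <italic>vitro</italic>.</p>
<fig><caption><p>Growth curve.</p></caption></fig>
</sec>
<sec><title>Results</title><p>Growth was &lt; expected.</p></sec>
</body></article>"""

sections, legends = to_plain_sections(doc)
print(sections)  # [('Methods', ['Cells were grown at 37 °C in vitro.']), ('Results', ['Growth was < expected.'])]
print(legends)   # ['Growth curve.']
```

The metadata and links back to the original content (to keep documents self-contained) are left out here, since the sketch does not assume any particular metadata schema.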
