Decomposing documents into claims
Part 1 of 11 in Episteme
Turning a large corpus of internet discourse, academic papers, and the like into a clean database of claims is the foundational subproblem. A document must be read, the propositions it asserts must be extracted, and each one organized into the graph. Several distinct difficulties fall out of this:
- Splitting vs. lumping similar claims
- Canonical forms of claims
- Structuring the claim graph (Wiki vs. DAG vs. DCG)
- Defining edges between claims
- Determining claim importance
Once a claim has been extracted, it still has to be reconciled against the claims already in the graph; see matching instances of claims to claims in the graph.
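To make the moving parts concrete, here is a minimal sketch of what a claim store could look like. Everything in it is an assumption made for illustration rather than a design Episteme commits to: the names `canonical_form`, `ClaimGraph`, `add_claim`, `add_edge`, and `importance` are hypothetical, the normalization is a naive stand-in for real canonicalization, the edge labels are free-form placeholders, and counting source documents is only a crude proxy for importance.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def canonical_form(text: str) -> str:
    """Hypothetical normalization: lowercase, collapse whitespace,
    and drop trailing sentence punctuation."""
    return " ".join(text.lower().split()).rstrip(".!?")


@dataclass
class ClaimGraph:
    # canonical claim text -> ids of the documents that assert it
    sources: dict[str, set[str]] = field(default_factory=dict)
    # (from_claim, relation, to_claim) triples; relation labels are placeholders
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add_claim(self, text: str, doc_id: str) -> str:
        """Lump the claim with any existing claim sharing its canonical form."""
        key = canonical_form(text)
        self.sources.setdefault(key, set()).add(doc_id)
        return key

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        """Record a directed, labeled edge between two known claims."""
        if src not in self.sources or dst not in self.sources:
            raise KeyError("both endpoints must already be in the graph")
        self.edges.add((src, relation, dst))

    def importance(self, key: str) -> int:
        """Crude proxy (an assumption): number of distinct asserting documents."""
        return len(self.sources[key])


graph = ClaimGraph()
a = graph.add_claim("Smoking causes cancer.", "doc-1")
b = graph.add_claim("smoking  causes cancer", "doc-2")  # lumped with `a`
c = graph.add_claim("Tobacco taxes reduce smoking.", "doc-2")
graph.add_edge(c, "relevant_to", a)
```

Exact-match lumping on a normalized string is the simplest possible policy and will split paraphrases that a human would treat as the same claim, which is why splitting vs. lumping and canonical forms are listed as open difficulties above.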