Decomposing documents into claims

note, in-progress, Jun 22, 2026

Part 1 of 11 in Episteme

Turning a large corpus of internet discourse, academic papers, and the like into a clean database of claims is the foundational subproblem. A document must be read, the propositions it asserts must be extracted, and each one organized into the graph. Several distinct difficulties fall out of this:

Splitting vs. lumping similar claims
Canonical forms of claims
Structuring the claim graph (Wiki vs. DAG vs. DCG)
Defining edges between claims
Determining claim importance

Once a claim has been extracted, it still has to be reconciled against what is already there (see matching instances of claims to claims in the graph).
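To make the read, extract, organize loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not the design this series settles on: the names `Claim`, `ClaimGraph`, `canonical_key`, `extract_claims`, and `ingest` are hypothetical, the extractor naively treats each sentence as one claim, and the canonical form is plain text normalization. A real system would presumably swap in a model-based extractor and a much more careful matching step.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    text: str    # the proposition as worded in the source document
    source: str  # identifier of the document it was extracted from


def canonical_key(text: str) -> str:
    """Placeholder canonical form (assumption): lowercase, drop punctuation,
    collapse whitespace. Stands in for real claim canonicalization."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def extract_claims(document: str, source: str) -> list[Claim]:
    """Stand-in extractor (assumption): one sentence equals one claim."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [Claim(s, source) for s in sentences if s]


@dataclass
class ClaimGraph:
    # canonical key -> every extracted instance that mapped to that key
    nodes: dict[str, list[Claim]] = field(default_factory=dict)
    # (from_key, relation, to_key); the relation vocabulary is left open here
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, claim: Claim) -> str:
        """Reconcile a claim against existing nodes by canonical key."""
        key = canonical_key(claim.text)
        self.nodes.setdefault(key, []).append(claim)
        return key


def ingest(graph: ClaimGraph, document: str, source: str) -> list[str]:
    """Read a document, extract its claims, and file each one in the graph."""
    return [graph.add(claim) for claim in extract_claims(document, source)]
```

With this sketch, ingesting "Coffee improves focus. Sleep matters!" from one document and "coffee improves focus" from another leaves two nodes in `graph.nodes`, with the coffee claim holding two instances. The exact-match key is the crudest possible answer to the splitting vs. lumping question: it lumps only trivially different wordings and splits everything else.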
