Language Modeling with DEFLATE

Can you generate language using compression, selecting the next token by which candidate compresses the most? Can a high-quality starting corpus influence the quality of the generation?

The setup: compress the concatenation <corpus, input text, [ generation sequence ]> and, at each step, pick the candidate token that adds the fewest bytes to the compressed output.
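A minimal sketch of that selection step, assuming whitespace-separated tokens and Python's zlib (which implements DEFLATE); the function names here are placeholders of mine, not part of the original setup:

```python
import zlib

def compressed_size(text: str, level: int = 9) -> int:
    # Size in bytes of the DEFLATE-compressed text.
    return len(zlib.compress(text.encode("utf-8"), level))

def pick_next_token(corpus: str, prompt: str, candidates: list[str]) -> str:
    # Greedy step: choose the candidate that adds the fewest compressed
    # bytes on top of <corpus, prompt>.
    base = compressed_size(corpus + prompt)
    return min(
        candidates,
        key=lambda tok: compressed_size(corpus + prompt + " " + tok) - base,
    )
```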

Fetch from Project Gutenberg

I used the plaintext of Darwin's text, fetched from https://www.gutenberg.org/cache/epub/1228/pg1228.txt
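Fetching it looks roughly like this (stripping the Project Gutenberg header/footer and caching to disk are omitted, and the UTF-8 encoding is an assumption):

```python
import urllib.request

URL = "https://www.gutenberg.org/cache/epub/1228/pg1228.txt"

# Download the plaintext corpus; Gutenberg plaintext files are assumed UTF-8.
with urllib.request.urlopen(URL) as resp:
    corpus = resp.read().decode("utf-8", errors="replace")

print(f"Fetched {len(corpus):,} characters")
```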


How well does it compress?
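A quick way to check, reusing the corpus string from the fetch above; the exact ratio will depend on the text and the compression level:

```python
import zlib

raw = corpus.encode("utf-8")      # corpus fetched above
packed = zlib.compress(raw, 9)    # level 9 = maximum DEFLATE compression

print(f"raw:        {len(raw):,} bytes")
print(f"compressed: {len(packed):,} bytes")
print(f"ratio:      {len(raw) / len(packed):.2f}x")
```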