Language Modeling with DEFLATE
Can you generate language using compression, selecting the next token by whichever candidate compresses best? And can a high-quality starting corpus influence the quality of the generation?
The buffer to compress is <corpus, input text, [ generation sequence ]>: for each candidate next token, append it to the generation sequence, DEFLATE the whole buffer, and keep the candidate that yields the smallest compressed output.
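A minimal sketch of that loop, assuming DEFLATE via Python's `zlib` and a naive whitespace word vocabulary (both of these are my assumptions, not fixed by the notes above):

```python
import zlib

def compressed_size(text: str) -> int:
    """Length in bytes of the DEFLATE (zlib) output for the given text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def generate(corpus: str, prompt: str, n_tokens: int = 20) -> list[str]:
    """Greedy generation: at each step, keep the candidate token that
    makes <corpus, prompt, generation-so-far> compress smallest."""
    candidates = sorted(set(corpus.split()))  # naive word-level vocabulary
    context = corpus + " " + prompt
    generated: list[str] = []
    for _ in range(n_tokens):
        # Recompressing the full buffer per candidate is slow; fine for a sketch.
        best = min(candidates, key=lambda tok: compressed_size(context + " " + tok))
        generated.append(best)
        context += " " + best
    return generated
```

One caveat: compressed sizes are integer byte counts, so many candidates tie, and how ties are broken (here, `min` keeps the first) noticeably shapes the output.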
Fetch from Project Gutenberg
I used the plaintext of Darwin's text, fetched from https://www.gutenberg.org/cache/epub/1228/pg1228.txt
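A sketch of the fetch step, assuming Python's standard-library `urllib` (the helper name is mine):

```python
import urllib.request

URL = "https://www.gutenberg.org/cache/epub/1228/pg1228.txt"

def fetch_corpus(url: str = URL) -> str:
    """Download the plaintext corpus from Project Gutenberg."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

corpus = fetch_corpus()
print(len(corpus), "characters")
```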