This is not a sutta: AI-generated Buddhist scripture

March 17, 2019

Well, someone had to do it.

This is a very rough hack put together in the early morning when I couldn’t sleep due to some leg pain.

I based it on the GPT-2 Colab notebook example, training it on all the Middle Length suttas translated by Thanissaro, which I copied from Access To Insight into a single text file. Other than swapping out A Tale of Two Cities for my text file, the code is identical, and I’m not doing anything further with this, so there’s no point in forking the repository.

I didn’t expect this to work all that well at generating “interesting” text that is both intelligible and novel. The data set is relatively small, and with its high content of pericopes (stock passages repeated from sutta to sutta) it is quite redundant, so it carries little information.
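One rough way to put a number on that redundancy is to see how well the text compresses. The sketch below is not something I ran on the sutta file; it uses Python's standard zlib module, and the function name and sample strings are made up for illustration. A lower ratio suggests more repetition.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size divided by raw size; lower means more redundant."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text is empty")
    return len(zlib.compress(raw, 9)) / len(raw)


# Illustrative strings only, not the actual training data.
repetitive = "Thus have I heard. " * 200
varied = " ".join(str(i * i) for i in range(500))

print("repetitive:", round(compression_ratio(repetitive), 3))
print("varied:    ", round(compression_ratio(varied), 3))
```

Running the same function over the training file and over some ordinary prose of similar length would give a crude comparison, nothing more rigorous than that.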

When run, the model converged quickly, which makes sense given the nature of the training set:

[2257 | 2349.62] loss=0.18 avg=0.31
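For anyone wanting to plot the loss rather than eyeball it, a line like that can be pulled apart with a regular expression. This is a minimal sketch that assumes the fields are, in order, the training step, elapsed seconds, current loss, and running average loss; the field names in the returned dictionary are my own labels, not something the training script defines.

```python
import re

# Assumed layout: [step | elapsed seconds] loss=<current> avg=<running average>
LOG_PATTERN = re.compile(
    r"\[(?P<step>\d+)\s*\|\s*(?P<elapsed>[\d.]+)\]\s*"
    r"loss=(?P<loss>[\d.]+)\s+avg=(?P<avg>[\d.]+)"
)


def parse_log_line(line: str) -> dict:
    """Extract the numeric fields from one training log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        "step": int(match["step"]),
        "elapsed_seconds": float(match["elapsed"]),
        "loss": float(match["loss"]),
        "avg_loss": float(match["avg"]),
    }


record = parse_log_line("[2257 | 2349.62] loss=0.18 avg=0.31")
print(record)
```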

As is the custom with such things, I ran it a bunch of times to get a sense of the output and cherry-picked some good/interesting results.

Common patterns I saw in the output:

Verbatim repeating of verses from the training text with no changes or slight additions at the end
The model getting stuck, repeating the same word or phrase several times
The generally aphasic style of output seen in undertrained GPT-2 models
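I spotted these patterns by eye, but the first two could be checked mechanically. Here is a rough sketch, not something I actually ran: the function names, the crude tokenizer, and the 8-word window are all arbitrary choices for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def longest_repeat_run(text, max_phrase_len=4):
    """Largest count of back-to-back repeats of any phrase up to max_phrase_len words."""
    words = tokenize(text)
    best = 1 if words else 0
    for n in range(1, max_phrase_len + 1):
        for start in range(len(words) - n + 1):
            phrase = words[start:start + n]
            run = 1
            pos = start + n
            # Keep counting while the next n words repeat the phrase exactly.
            while words[pos:pos + n] == phrase:
                run += 1
                pos += n
            best = max(best, run)
    return best


def verbatim_fraction(sample, training_text, n=8):
    """Fraction of the sample's n-word windows that also occur in the training text."""
    sample_words = tokenize(sample)
    train_words = tokenize(training_text)
    train_ngrams = {
        tuple(train_words[i:i + n]) for i in range(len(train_words) - n + 1)
    }
    windows = [
        tuple(sample_words[i:i + n]) for i in range(len(sample_words) - n + 1)
    ]
    if not windows:
        return 0.0
    return sum(w in train_ngrams for w in windows) / len(windows)


print(longest_repeat_run("awakened, blessed, blessed, blessed."))
```

A high repeat run flags the stuck output, and a verbatim fraction near 1.0 flags a sample that is mostly copied from the training text.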

Some example output:

Ignorance & craving for becoming: these five skillful qualities that through having not-of-the-flesh might be attained while one is attained right here & now, timeless, timeless, eternal, not worth remaining in the supreme rest from the yoke.

Not terrible, but a bit incoherent…

“With his mind thus concentrated, purified, and bright, unblemished, free from defects, pliant, malleable, steady, and fixed on one point, pliant,neither enlarged nor too thin, nor too thin, nor too too thin, nor too thin, nor too thin, nor too thin; the hair — done [d] with blood, named with blood, smeared with blood.

This one is interesting; it starts with a Jhanic pericope, stutters a bit then gets kinda dark. I don’t find “named with blood” in the training text, though “smeared with blood” is repeated several times.

“With his mind thus concentrated, purified, and bright, unblemished, free from defects, pliant, malleable, steady, and fixed, he said to one of them: ‘This five hundred years… the five hundred, the six hundred thousand, the five thousand, the four hundred Good, the seven hundred existents, the seven hundred devalations, the seven fathoms. And the very first to which one was a a leper pod, a master of the Vinaya, possessed of the Abodes, awakened, blessed, blessed, blessed.’

Jhanic pericope at the start again, then a list of numbers (a common theme, though these particular numbers are not found together in that list), plus some stuttering.

“Monks, is Master Ananda here fit for you to admonish & fine conduct, to proclaim the holy life both in its particulars and in the presence of the Blessed One who knows & sees no drawbacks in telling a deliberate lie?” “In that case, please let contact with fists come to this monk, for this is how the Buddha’s bidding is done.”

And sometimes it kinda goes off the rails…

I don’t see much point in pursuing this further. Even if I were to bundle up all of the translated Tipitaka, it would likely not provide sufficient information to generate interesting or novel output much better than what I’ve got already.

Overall an interesting experiment. I’d never used the Google Colab system before and was impressed, particularly in that I could set it to run on GPU, which sped things up considerably. I’ll likely use this as a training system at work to help students do bioinformatics analysis.