Description of the problem
One of the brilliant ideas that Kurt Vonnegut came up with was that one can track the plot of a literary work using graphical methods. He did that intuitively. Today’s question is “Can we track the plot changes in a text using computational or algorithmic methods?”
Overlaps between units of texts
The basic idea is to split a text into fundamental units (whether this is a sentence, or a paragraph depends on the document) and then convert each unit into a hash table where the keys are stemmed words within the unit, and the values are the number of times these (stemmed) words appear in each unit.
My hypothesis is (and I will test that in this experiment below) that the amount of overlap (the number of common words) between two consecutive units tells us how the plot is advancing.
I will take the fundamental unit as a sentence below.