Distant Reading

While learning about distant reading last week, I found it interesting to discover what makes the method appealing. Distant reading, as Whitley argues, is not meant to take the place of traditional close reading, but to give readers a broader view of whole documents in a quantitative rather than qualitative way.
This becomes apparent when related to personal reading experience. People tend to pay more attention to the changing feelings of the characters over the course of a story, since words are qualitative. In this case, you need to sniff out what is hidden underneath the word, not just the word itself. However, various digital tools give us a clearer way, through numerical data, tree maps and other methods, to view the whole document from a distance, calmly and dispassionately. We transcribed the travel journal written by Jasper Payne in 1747. At first, we read it as usual to get a general view of what the travelers did every day during the journey: how they crossed the river, met the Justice and talked to Negroes. As Payne himself was heavily involved in the events, his perspective is closely tied to his religion and personal character. In this case, readers will feel heavily involved themselves when reading. However, when we put the Travel Diary into Cirrus, one of the Voyant tools, the screen shows the following.

http://voyeurtools.org/tool/Cirrus/

When you click on a specific word, a graph at the top right shows how frequently that word appears in the text.

http://voyeurtools.org/?corpus=1423537324433.2082&type=came&stopList=stop.en.taporware.txt&skin=simple&event=documentTypeSelected

By studying graphs of this kind, we are less likely to be confused about the logical relationships within the text, and we get a clearer quantitative view of what is going on.

http://voyeurtools.org/tool/Links/

Links is another representative tool for re-processing this travel journal. Because it supports not only distant reading but also spatial reading, it shows how the key words in the document relate to each other. Another very interesting point in the Whitley reading concerns the Poetess Archive, a visualization project that relates the flowery poetry written in Britain and America between 1750 and 1900. This is quite attractive to me, since human eyes are not able to bring these two bodies of poetry together in a short period of time, but DH can. In today’s class, we put the Travel Diary and the Powell Diary, two documents which are similar but not identical, together in DocuBurst. The image below is what came out in one second. It is really cool to use digital tools to connect two old documents.

http://vialab.science.uoit.ca/docuburst/search.php?doc=1501_travel_diary&doc2=963_powell_diary&root=Justice&sense=00

 

On Distant Reading

Humanists often spend a great deal of time extracting useful information from enormous collections of humanities material. The theme of a book, the intentions of its author and the subtle evidence of contemporary lives, culture and thought are all hidden in hundreds of pages, or even more. Nowadays, however, such exploration no longer has to consume as much time as it did before. The distant reading method can help readers condense hundreds of pages into simple visual images. Such images can help readers find not only what the author focuses on, but also the relationships between different terms. To better understand the distant reading method, I will use DH methods to explore the question “what are the major groups of people Payne and Froehlich met” using the full compiled transcription of the Payne/Froehlich Travel Journal.

The picture on the right is Cirrus, which creates a word cloud in which the more frequently a word appears, the bigger it is drawn. By finding the large words, we can easily see what the author mentioned most. Therefore, to find out whom Payne and Froehlich met, we can simply look for large words that are people’s names or groups of people. For example, the words “negroes” and “brother” are large, which suggests that these are people Payne and Froehlich often cared about or met.
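To make the idea behind the word cloud concrete, here is a minimal sketch of the counting step in Python. It is only an illustration under my own assumptions: the tiny stop list and the sample sentence are invented for this example (they are not taken from the journal or from Voyant), and Cirrus itself may tokenize and filter words differently.

```python
import re
from collections import Counter

# A tiny, made-up stop list for illustration; a real stop list is much longer.
STOP_WORDS = {"the", "and", "a", "to", "of", "in", "at"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Invented sample sentence, not a quotation from the travel journal.
sample = "We met the brother and the children. We spoke to the children."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

A word cloud would then draw each word at a size proportional to its count, so the words at the top of this list would be the largest ones on the screen.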

To find the relationships among different people, Links is a good choice. When I found the names of people, I clicked on them, and the number of links increased. The graph shows that “we” and “negroes” are linked together, which indicates an interaction between Payne and Froehlich and the negroes. Links also shows no connection between “brother” or “children” and “negroes”, but it does show a connection between “brother” and “children” and Payne and Froehlich.

Bubblelines shows the relative locations of different words in the text. I put some of the names that I found in Cirrus into Bubblelines. I found that most occurrences of “brother” and “children” are spread over the front of the text, while “negroes” is spread from the middle to the end. This provides evidence that during their journey, Payne and Froehlich first paid much more attention to brothers and children, and later paid more attention to negroes.
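The kind of pattern Bubblelines reveals can be approximated by cutting the text into equal segments and counting a term in each one. The sketch below is my own simplification, not Voyant's actual code; the function name, the number of segments and the sample text are all assumptions made for illustration.

```python
import re

def term_distribution(text, term, segments=4):
    """Return a list with the count of `term` in each equal-sized segment."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division so that every word falls into one of the segments.
    size = max(1, -(-len(words) // segments))
    counts = [0] * segments
    for i, word in enumerate(words):
        if word == term.lower():
            counts[min(i // size, segments - 1)] += 1
    return counts

# Invented sample text: "brother" appears early, "river" appears late.
sample = "brother brother children walked rode crossed river river"
print(term_distribution(sample, "brother"))
print(term_distribution(sample, "river"))
```

Each number in the returned list corresponds to one bubble on the line: a larger count means a bigger bubble at that position in the text.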

On Distant Reading

An analysis of Payne’s Journal using a distant reading tool called “Knots” from http://voyeurtools.org/tool/Knots/

The picture to the left may look like a children’s drawing to most people, but in reality it is an analysis of the frequency of five different words in a document. Every time the word shows up in the document, the assigned, colored line takes a turn at a specified angle. This is a clever visualization, which Digital Humanists use for something called distant reading.

Distant reading is a useful method for analyzing text from far away. Instead of closely reading a specific document word for word, we can use tools like online applications or programs to analyze the types of words used in the document.

An analysis of Payne’s Journal using a distant reading tool called “Links” from http://voyeurtools.org/tool/Links/

These tools provide overall summaries of data like the frequency of words, popularity of words, or relationships (as seen in the image to the right) between words in a certain document. The image to the right is also an example of spatial reading, which means that the distances between the words are meaningful: they reflect the words’ relationships with each other.

Recently, we transcribed a Moravian travel journal written by Jasper Payne in 1747. By closely reading this document we learned about the everyday life Payne and his “brethren” lived on their journey, including the places and people they saw and met. A distant reading of this journal may reveal similar aspects of the writings, but it also unveils broader ideas. For example, the following image shows us many interesting things about the writing that we may not have realized in the close reading (click on image to read more clearly):

An analysis of Payne’s Journal using a program named “Jigsaw” by John Stasko.

This “tree map” shows the words that follow the word “our” in the travel journal, listed from most common to least common. As you can see, Jasper Payne expressed many emotional religious views in his writing. This distant reading suggests how loving and faithful Payne and the Moravian group actually were.
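The list behind a tree map like this one can be sketched in a few lines of Python: collect every word that comes directly after “our” and count the results. This is only my own approximation of the idea, not how Jigsaw works internally, and the sample text is invented rather than quoted from the journal.

```python
import re
from collections import Counter

def words_following(text, target):
    """Count the words that appear immediately after `target`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(nxt for cur, nxt in zip(words, words[1:]) if cur == target)

# Invented sample text for illustration only.
sample = "Our Saviour kept us. Our hearts were glad and our Saviour was near."
followers = words_following(sample, "our")
print(followers.most_common())
```

Sorting the counts from most common to least common gives the same ordering that the tree map displays.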

Distant Reading: Plural Pronouns Reflect Collectivist Society

Distant reading of texts can help readers understand overall themes, concepts, and cultural context. For this reason, distant reading can help to answer the following question: Were the Moravians in the 18th century a collectivist or individualist society? Some distant reading strategies will display this orientation better than others; however, the approach remains sound. In order to test whether distant reading can adequately answer this question, I will use the full compiled transcription of the Payne/Froehlich Travel Journal.

The size of the words indicates the number of times the word was used in the travel journal. Some of the largest words are “we,” “our,” and “us” — all plural pronouns.

Though the writer was directly and heavily involved in the travel events described, he chose to use plural pronouns, as can be seen in the Cirrus cloud to the right. This is something that Whitley, in Visualizing the Archive, would identify as a form of “spatial reading.” He goes on to explain the ways in which spatial readings of this type, word clouds, blur the line between close and distant reading: the graphics display the most frequently used words in the text visually (through the size of the words), yet they are still very much displaying the words themselves. This allows distant readers to gather information about the text that may otherwise go unnoticed in a close reading. Whitley also points out that the reader must find a balance between looking at the word cloud as a big picture and focusing on specific words. In order to help answer the research question posed about Moravian society, it is important to notice that some of the largest, and most used, words are plural pronouns.

The image below comes from another visualization tool that helps to show word use and frequency in the Payne/Froehlich Travel Journal. The pink line of bubbles represents the use of “we,” the purple represents “he,” the neon blue represents “our,” the yellow represents “us,” and the blue represents “I.” All of the plural pronouns are used more frequently in the text overall than the singular ones, again supporting the idea that the Moravians were likely a more collectivist society. To answer the posed research question, though, it is not necessary to show which segments of the text use these pronouns most frequently.

This tool, Bubblelines, shows the frequency of words in certain segments of the text.

However, it is interesting to note that “I” is hardly used at all in the beginning of the text, but becomes more frequent towards the end. There are many possible reasons for this pattern, but it is something that would have gone entirely unnoticed without the distant reading tools.
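The comparison between plural and singular pronouns can also be checked with a simple count. The following is a rough sketch under my own assumptions: the two pronoun sets are my choice (the visualizations above only tracked “we,” “our,” “us,” “he,” and “I”), and the sample text is invented, so the numbers say nothing about the actual journal.

```python
import re
from collections import Counter

# Assumed pronoun groups; "my" and "me" are additions for this illustration.
PLURAL = {"we", "our", "us"}
SINGULAR = {"i", "my", "me"}

def pronoun_totals(text):
    """Return total counts of first-person plural and singular pronouns."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return {
        "plural": sum(words[p] for p in PLURAL),
        "singular": sum(words[s] for s in SINGULAR),
    }

# Invented sample text, not a quotation from the travel journal.
sample = "We set out early. Our horses were tired, so we rested. I wrote this at night."
totals = pronoun_totals(sample)
print(totals)
```

Running the same function on the full transcription would give one number per group, which is a quick way to test whether the impression from the word cloud holds up.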

The next representation of the text is through Links, which shows both the frequency of word use and the connections between the words.

This is the first Links image. It shows the connections between words, especially those that are frequently used. There is a “the” cluster and a “we” cluster that are not connected.
This is the second Links image, which includes “I.” It shows that adding the word “I” was essential to connect two different sets of clusters.

The first Links image shows networks of words in relation to the two most frequently used words: “we” and “the.” However, the two networks do not connect. Once the word “I” is introduced, the clusters become connected. This suggests that, though less frequently used, singular pronouns (such as “I”) are more important to the coherence of the text than would have been deduced from the other two spatial readings. For this reason, this approach is not the most helpful in answering the research question.

Although the journal documents its writer’s personal experiences, it is clear that for this society at this point in history a more collective and plural writing style was preferred. The visualization methods used in the distant reading of the Payne/Froehlich Travel Journal allow the reader to see the big picture, instead of getting bogged down in the details of the text. It would be interesting to research the question “were the Moravians in the 18th century a collectivist or individualist society?” further by comparing this journal with another Moravian travel journal from that time.

On Distant Reading

In the past, people tended to value close reading over the broad brushstrokes of information visualization, because the subtlety of word choice and the nuance of phrasing reveal that apparently straightforward texts are more complicated than they seem. More recently, a number of scholars have cited Franco Moretti’s concept of “distant reading”. Distant reading means understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data. In contrast to close reading, distant reading aims to uncover the broader scope and nature of literature. By using visualization tools, it also gives us the opportunity not only to enhance our vision but also to rethink some of our basic assumptions about how to read. Text visualization tools offer distant reading as a complementary practice that promises a wide-angle perspective on the large corpora of texts housed in digital archives and the serendipitous discovery of the knowledge these archives contain. Now, let’s explore these tools with our transcription of the travel journal.

Links is one of my favorite tools. It represents the collocation of terms in the text by depicting them in a network through the use of a force-directed graph. The frequency of a word is indicated by the relative size of the term, which makes it easier for me to find the main terms. The most attractive part is that I can click on any term I’m interested in; after I click on it, other terms related to that term appear. By doing so, I can discover more and more relationships between the terms in the text.
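To show what “collocation” means in practice, here is a minimal Python sketch that counts how often two words appear near each other. It is my own simplified version: the window size and the sample text are assumptions for illustration, and Voyant may compute its collocations differently.

```python
import re
from collections import Counter

def collocations(text, window=2):
    """Count unordered word pairs that occur within `window` words of each other."""
    words = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, word in enumerate(words):
        # Look only at the next `window` words so each pair is counted once per position.
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs

# Invented sample text for illustration only.
sample = "we met brother we met children"
pairs = collocations(sample)
print(pairs.most_common(3))
```

In a network view, each pair would become a line between two terms, and pairs with higher counts could be drawn closer together or with thicker lines.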

Bubblelines is another tool that visualizes the frequency and repetition of a word’s use in the text. Each document is represented as a horizontal line and the selected word is represented as bubbles. The size of a bubble indicates the word’s frequency. When you check the box for separate lines for terms, the bubble line is split into separate lines, one for each term, each with its own color. When you click on a bubble, you can see the passage of text that includes the word in the bubble.