Visualization of Text’s Polysingularity Using Network Analysis

By Dmitry Paranyushkin, Nodus Labs. Published January 2012, Berlin

Abstract:

In this research we propose a method for visualizing a text’s polysingularity: the multiple clusters of meaning circulation contained within a text. These clusters can be described as “strange attractors” (to use the term from dynamical systems theory), which are actualized during the process of reading. We use network analysis to plot the text’s structure onto a two-dimensional plane and represent these strange attractors as communities of co-occurring nodes, positioned within the graph according to their influence on the production of meaning. Such a visual representation can then be used to identify the most influential topics within the text, provide quick summaries, enhance text comprehension (aiding the retention of previous information according to the Landscape model of reading), create interfaces for inter-textual navigation, produce individual or group “mental maps”, and perform comparative analysis of texts. Our method can also improve on the existing graphical methods for identifying topical structure (pLSA and LDA) in that it takes into account the overall structure of the text and brings in the tools and approaches of network and graph analysis to further enhance qualitative and quantitative evaluation of textual data.

Keywords: polysingularity, meaning, text network analysis, graph, visualization, data, diagram, mental map, politics

1. INTRODUCTION

Normally we think of a text as a sequence of words that produce a meaning. However, what really happens when we speak? Cognitive science has two points of view on this process. The first one is based on Noam Chomsky’s transformational grammar: there is an internal representation called deep structure, a kind of semantic mental map, which after a series of transformations is expressed through the language (Friedenberg & Silverman 2011). The second one belongs to the camp of radical embodied cognitive science, which explains language in terms of agent-environment dynamics, denying the existence of representation (Chemero 2009; Johnson 2007). Whether there is representation or not, it is clear that there’s a constant process of interplay between the concepts that come into relation with one another and produce the meaning. This perceptual dynamics is very complex: most often we are aware of several things at the same time and exist in the multiple parallel worlds. There is here and now, but then there’s also that plan you have for the weekend and that nice flashback from some months ago and your Facebook profile and today’s news…

The point is that when we have to put all this into what we call a “text”, we face the insurmountable challenge of cramming this dynamic complexity into a linear narrative, thereby reducing the multiplicity of meanings that every concept can carry.

This is a very important process, because it allows expression to be specific (to the particular time and space) and at the same time maintain co-isolated multiplicities (the underlying experience of the text). We call this process polysingularity because it has several possible “solutions” that co-exist simultaneously and yet only one solution is available at each point of time and space for actualization (Gabdulkhaev, 2005; Simonenko, 1965; Boikov, 2000). Polysingularity emerges when our experience meets the commonly accepted notion of linear time. Therefore it’s an expression of a certain purpose from the multitude of simultaneously existing possibilities. The question of what is real gets a totally different aspect when we think of it in terms of polysingularity.

Polysingularity may be present within a text as the actualization of a certain purpose through the relationships between the concepts contained within it. A text in itself can also be part of a larger polysingularity: one of many possible actualizations of concepts.

In other words, certain concepts within a text may work together to produce different meanings, parallel narratives, contexts – each representing its own interconnected singularity within the text’s polysingularity (multiplicity of possibilities). In the same way the text itself, in relation to other texts, proposes a certain set of rules or functions, which produce its specific expression from the concepts it employs. Two texts could have absolutely the same vocabulary, but produce a totally different meaning.

Henri Bergson (2002) said that “time prevents everything from being given at once”. So any deviation from the conventionally accepted way of speaking or writing (longer pauses, gestures and movement, stuttering, repetition) automatically makes much more information available to us. Italo Calvino, a prominent Italian journalist and philosopher, once said that “writing is essentially a combinatorial exercise” (Calvino 1985). We have all the possibilities in front of us, and when we write or speak we align them together for a certain purpose. The point here is that this combinatorial exercise does not have to be made in one way only; there are many more possibilities to be explored.

So text is an interface between our perception and a certain purpose that was compelling enough for us to express it. It is polysingular in that it has multiple expressions but potentiates only one at a time. The way we use text is very limited; it is still in its early alpha. We read only that one potentiation, disregarding the underlying dynamics that produced it. Our proposition is to look more closely at these dynamics in order to see the text in its polysingularity, or multiplicity of possible expressions, offering a more holistic approach to writing and reading. The aim is to go beyond a mere sequence of words, beyond the 18-second gap of short-term memory (which in turn defines how much text we can “see” at each moment of time), and to focus instead on the relationships between these words in order to enhance our communication and show that there are many more interesting nuances than the standard representation of “text” offers.

One of the many possible ways to represent text in its complexity is to use a network (Popping 2000; Ryan 2007; Carley 1997; Doyle et al 2007). While there are many different ways of mapping a text into a graph, we will use the most direct one: the words are the nodes, and their proximity to each other in the text defines the relationships between them. This encoding reflects the well-documented capacity of the human mind for priming, or associative recall of related concepts (Friedenberg & Silverman 2011). Here we assume that the closer the words appear within the text, the more related they are.

Visual representation of text as a graph can also play an important role in improving text comprehension. It has been shown in the so-called Landscape model (Myers & O’Brien 1998) that comprehension during reading requires that the concepts and propositions presented earlier in the text are reactivated (the resonance process). Thus, having a diagram (graph) where the most “influential” concepts are represented and related to one another may increase the rates of recall from working memory and lead to better overall text comprehension. The text graph can also serve as a quick reference for retrieving background information that may be necessary for better comprehension of other related texts (Rawson & Kintsch 2004). In his seminal book “The Symbolic Species” Terrence Deacon speaks about the importance of interrelations between words for the interpretation of meaning. He adds that “symbolic reference derives from combinatorial possibilities and impossibilities, and we therefore depend on combinations both to discover it (during learning) and to make use of it (during communication)” (Deacon, 1997). Visually emphasizing these relationships through graphs that represent clusters of related words (the contexts present within the text) can also lead to a better understanding of the symbolic value of each text.

Therefore representing text as a network graph not only reflects the core processes of human cognition, such as priming and associative thinking, but also helps towards better comprehension. Treating text as a network opens up more possibilities for interpretation and allows it to be manifested in its polysingularity. The many meanings and agendas that can arise from that interconnectedness are very diverse; however, each expression of the text serves a certain purpose related to a certain moment of time, so this diversity is reduced to the specific singularity that was selected among many possibilities at that moment. Having the source code of the text as a network, however, allows for a much more holistic view of the text and for many other expressions of the same agenda that could be more closely related to a specific context.

Representing a text as a network has several applications.

First is finding a new way of writing and reading a text. We can see how the main concepts relate to one another. Such representation can expose what Foucault (1977) called a “heterogeneous ensemble consisting of discourses”, or dispositif (apparatus). We showed above that every text has a purpose, so it is a priori ideological. Stripping it of the timeline allows us to see it in its entirety and reveal the inner workings of ideology production within the text structure. Derrida (1979) once said that “the relief and design of structures appears more clearly when content, which is the living energy of meaning, is neutralized”. A text graph is such a content neutralization: it allows us to expose the pathways of meaning circulation and see the text as a Deleuzian rhizome (Deleuze & Guattari 1979), representing the underlying dynamics of multiple simultaneous discourses in their complexity. Ryan (2007) proposed to think of a diagram as a heuristic device that can represent a narrative in spatial, temporal and mental planes, thus allowing for multiple readings and novel interpretations of textual data. Calvino (1985) proposed to think of reading as “a way of exercising the potentialities contained in the system of signs” and of writing as a “combinatorial exercise”. Therefore, network representation of textual data brings in new possibilities excluded by the linear nature of written text.

Second, representing a text through the framework of a network gives us a range of tools from graph analysis that can be used to detect communities of interrelated concepts within the text (clusters of meaning circulation), find the most influential concepts, and, more importantly, offer quantitative evaluation of textual data, which can be very useful for comparative analysis of texts. In other words, we can easily find the topical structure of a text and use the graph’s structural properties to evaluate the relative importance of the topics for the production of meaning.

Third, text network analysis can be used to analyze group sentiments. A series of interviews can be represented as a graph that shows the concepts and topics that are most relevant to the group as a whole and show the possible points of contact and the specific areas of expertise for each individual.

Fourth, it can be used for creating novel interfaces for inter-textual navigation.

Finally, text network analysis may be used to provide more choices to patients in psychotherapeutic treatment. It has been shown that many psychological deficiencies arise due to the structural gaps in the patients’ cognition (Havens 2005; Friedenberg & Silverman 2011; Bateson 1973; Watzlawick 1968). Text network analysis could be used to represent the “mental model” of such patients (Carley & Palmquist 1992) and therefore identify the structural gaps that lead to psychosomatic conditions.

It’s important to note the difference between our approach and other similar methods. Most text graph visualizations (Lima 2011; Paley 2002; Carley 1997) designed to give additional insight into texts or their underlying mental maps do not make full use of graph visualization and analysis techniques. While they successfully represent text as a network and sometimes emphasize the most frequently mentioned concepts on the graph, they rarely show the most often co-occurring words as clusters, and they place little emphasis on a node’s influence, or betweenness centrality, which is one of the most important measures for nodes in a graph. In other words, they don’t make use of all the advantages that graph and network theory can offer for a better understanding of textual data.

There are several other methods of finding structure within text that employ graphical models and topic modeling: latent semantic analysis or LSA (Landauer et al 1998), pLSA (Hofmann 1999), Pachinko allocation (Li & McCallum 2006), latent Dirichlet allocation or LDA (Blei et al 2003), and relational topic models (Chang & Blei 2010). All these methods rely on detecting concept co-occurrence in order to identify the topical structure of texts. Sometimes they also use hyperlink structure in order to identify relevant topics. Our approach works in a similar way, but in addition takes into account the structural properties of a text. For example, some texts, especially literary works and poetry but also everyday speech, may cover a very diverse range of topics. There will be words that are used more frequently, but they will serve for cohesion rather than for the production of meaning. LSA will therefore show these frequent words as the most prominent concepts within a text, when in fact the structure of the text is much more diverse: several words that co-occur less often but nevertheless form a densely-knit community will be detected by our method and mostly omitted by LSA (Paranyushkin 2011).

 

2. TEXT NETWORK VISUALIZATION AND ANALYSIS

We will now discuss the method for text network analysis in more detail, using Martin Luther King’s speech “I Have a Dream” as an example (see Appendix A). First, we prepare the text for network representation and choose which words will be shown on the graph. To do that we remove so-called stopwords (e.g. “the”, “is”, “are”) and reduce the remaining words to their appropriate morphemes (e.g. “dreaming”, “dream”, “dreamer” become “dream”) using the K-Stemmer algorithm (Krovetz 1993). We then remove punctuation, numbers and extra spaces; the resulting text can be seen in Appendix A.
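
The preparation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword list is a tiny hypothetical sample, and `naive_stem` is a crude suffix-stripping stand-in for the K-Stemmer, not the algorithm itself. The function names are our own.

```python
import re

# Tiny illustrative stopword list (an assumption; a real run needs a fuller one).
STOPWORDS = {"the", "is", "are", "a", "an", "of", "and", "to", "in",
             "i", "have", "that", "will", "be", "this"}

# Crude suffix stripping: a stand-in for the K-Stemmer, not the real algorithm.
SUFFIXES = ("ing", "ers", "er", "ed", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Return one list of normalized tokens per paragraph."""
    paragraphs = []
    for paragraph in re.split(r"\n\s*\n", text):
        # Keeping only runs of letters drops punctuation, numbers and spaces.
        words = re.findall(r"[a-z]+", paragraph.lower())
        tokens = [naive_stem(w) for w in words if w not in STOPWORDS]
        if tokens:
            paragraphs.append(tokens)
    return paragraphs
```

Keeping the paragraphs as separate token lists matters for the next step, where the scan restarts at each paragraph.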

The modified text is then converted into the XML-based GraphML graph format (Brandes et al 2002), where co-occurrences of words are recorded. This can be done using Automap software (Carley 2008), which can produce a semantic map of a text as an XML file. Different settings can be used to transform co-occurrences of words into a network. For this research we follow the method outlined in Paranyushkin (2011), using the InfraNodus open-source text-to-network visualization tool: the text is scanned twice, using 5-word and 2-word “windows” that record co-occurrences between words depending on their proximity to each other within these windows. The scanning starts anew at each paragraph in order to retain the initial structure of the text. The resulting GraphML file is opened in Gephi software (Bastian et al 2009), which visualizes it as a graph. Initially the nodes are placed randomly in a two-dimensional space (Figure 1).
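
A minimal sketch of the two-pass window scan is given below. It assumes that every co-occurrence inside a window adds 1 to the weight of the corresponding edge; the exact weighting used by the tools named above may differ, and the function name is hypothetical.

```python
from collections import Counter

def cooccurrence_edges(paragraphs, windows=(2, 5)):
    """Count word pairs that fall inside a sliding window.

    `paragraphs` is a list of token lists. Scanning restarts at each
    paragraph, so no edge ever spans a paragraph break.
    """
    weights = Counter()
    for tokens in paragraphs:
        for size in windows:              # one full pass per window size
            for start in range(len(tokens)):
                window = tokens[start:start + size]
                first = window[0]
                # Link the first word of the window to each word after it.
                for other in window[1:]:
                    if other != first:
                        edge = tuple(sorted((first, other)))
                        weights[edge] += 1
    return weights
```

With this weighting, adjacent words are counted in both passes and so end up with heavier edges than words that are three or four positions apart, which matches the assumption that closer words are more related.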

 

Figure 1: Random layout of nodes, text graph for Martin Luther King – I Have a Dream

 

The next step is to produce a more readable version of this graph. To do that we apply the Force Atlas layout (Noack 2007; Jacomy 2009), which pushes the most connected hubs (words) away from each other and gathers the nodes connected to these hubs around them. This produces a text network diagram (Figure 2) where the communities formed by the words used in the text are much more visible: words that co-occur are located near each other.
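
The general idea of a force-directed layout can be illustrated with a toy spring model in which all nodes repel each other and linked nodes attract. The sketch below is a simplified Fruchterman-Reingold-style loop, not the Force Atlas implementation in Gephi; the constants (`k`, the starting temperature, the cooling rate) are arbitrary assumptions.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, seed=1):
    """Toy spring layout: all nodes repel, linked nodes attract."""
    rng = random.Random(seed)
    # Start from a random placement, like the one in Figure 1.
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between nodes
    temperature = 0.1                # maximum move per iteration
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[a][0] += dx / dist * push
                force[a][1] += dy / dist * push
                force[b][0] -= dx / dist * push
                force[b][1] -= dy / dist * push
        # Attraction along every edge.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[a][0] -= dx / dist * pull
            force[a][1] -= dy / dist * pull
            force[b][0] += dx / dist * pull
            force[b][1] += dy / dist * pull
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            fx, fy = force[n]
            length = math.hypot(fx, fy) or 1e-9
            step = min(length, temperature)
            pos[n][0] += fx / length * step
            pos[n][1] += fy / length * step
        temperature *= 0.98  # cool down so the layout settles
    return pos
```

Here `nodes` is expected to be a list and `edges` a list of node pairs; the returned dictionary maps each node to its `[x, y]` position.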

 

Figure 2: Force-atlas layout, Martin Luther King, I have a dream

 

We then calculate the key metrics for the resulting graph (average degree, diameter, average path, clustering, graph density) in order to have quantitative data for further analysis. Next we scale the size of the nodes by their betweenness centrality (Freeman 1977; Brandes 2001) in order to emphasize the more influential words in the resulting text graph. The betweenness centrality of a node shows how often it appears on the shortest path between any two randomly chosen nodes in the network. It is an indicator of how important the node is to the overall connectivity of the network: nodes that connect distinct, separated communities will have a higher betweenness centrality. We then apply different colors to the distinct communities identified by the modularity algorithm (Blondel et al 2008). This algorithm scans through all the relations between the nodes, grouping them into communities on the basis of how densely they are connected. If a group of nodes is more tightly knit together than it is connected to the rest of the network, it is considered a distinct community.
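
To make the betweenness measure concrete, here is a compact sketch of the shortest-path counting scheme from Brandes (2001) for an unweighted, undirected graph. It assumes the graph is given as a dictionary mapping each node to the set of its neighbours, ignores edge weights, and returns unnormalized scores; these are simplifications for illustration.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Betweenness for an unweighted, undirected graph (unnormalized)."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = paths[pred] / paths[node] * (1 + dependency[node])
                dependency[pred] += share
            if node != source:
                centrality[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in centrality.items()}
```

In a three-word chain such as dream, freedom, justice, only the middle word lies on a shortest path between two others, so it is the only node with a non-zero score.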

These communities represent polysingularity within the text. If one were to start reading the text using the network from any random node (word), the clusters of meaning circulation would occur more frequently during the process of reading. To use the terminology of non-equilibrium dynamics, these clusters are similar to strange attractors (Strogatz 1994) within the text, determined by the structure of the original discourse. They provide temporary solutions towards which the text is drawn; in other words, they express the polysingularity of the text’s discourse.

Polysingularity in text is also represented by the nodes with the highest betweenness centrality as they appear more often bridging separate communities together. These concepts form the backbone of meaning circulation within the text and this backbone is another strange attractor, where the system (reader – text) tends to shift as a whole during the course of reading. Therefore an important question in interpreting this data is the relationship between the communities and the concepts, which we will discuss below.

Figure 3 shows the resulting graph and Figure 4 shows the community structure of that graph more precisely.

Figure 3: community structure coded by color, nodes with the highest betweenness centrality are shown bigger on the graph.

 

 

Figure 4: community structure of the network


Using Gephi we also chart the degree distribution, where the X axis shows the number of edges a node has (its degree) and the Y axis shows the number of nodes that have that degree (Figure 5). The distribution is close to a normal distribution (where most nodes have an average number of edges) with a fat tail.
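
The data behind such a chart is a simple tally. As a sketch, assuming the graph is available as a list of distinct undirected edges (the function name is hypothetical):

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Count how many nodes share each degree value.
    return Counter(degree.values())
```

The keys of the result correspond to the X axis of the chart and the values to the Y axis. Nodes without any edges do not appear in an edge list and are therefore not counted here.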

 

Figure 5: Degree distribution chart

 

Finally, we calculate key metrics for the resulting text graph (Table 1). Average degree shows the average number of edges each node (morpheme) has in the graph. The higher it is, the more densely connected the text is (which means that morphemes are spread more or less equally across the whole text). Based on the results of previous research that used the same algorithm to create graph representations of different texts, we can say that this particular text has a medium average degree, demonstrating a diversity of distinct topics that are nevertheless well connected to each other in the text. Average path shows how many edges on average one has to traverse to get from one randomly chosen node to another. Diameter is the longest of the shortest paths that exist in the network. The higher these two parameters are, the more long-winded the text and the greater the diversity of presented topics. This particular text has a relatively low diameter and average path, indicating that the agenda is fairly centralized. Finally, a modularity measure of more than 0.4 shows the presence of prominent communities within the text (Freeman 2010, Blondel et al 2008). This text has a modularity measure of 0.496, which indicates the presence of communities that are significantly more connected internally than to the rest of the network.
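
For readers who want to reproduce these measures outside Gephi, the sketch below computes them for a small graph. It assumes an undirected, unweighted, connected graph stored as a dictionary of neighbour sets, and it evaluates modularity for a community assignment that is already given rather than searching for the best one as the algorithm of Blondel et al does. Function names are our own.

```python
from collections import deque

def graph_metrics(adjacency):
    """Average degree, density, average path length and diameter."""
    n = len(adjacency)
    degree_sum = sum(len(neigh) for neigh in adjacency.values())
    lengths = []
    for source in adjacency:
        # Breadth-first search gives shortest distances from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        lengths.extend(d for d in distance.values() if d > 0)
    return {
        "average_degree": degree_sum / n,
        "density": degree_sum / (n * (n - 1)),
        "average_path": sum(lengths) / len(lengths),
        "diameter": max(lengths),
    }

def modularity(adjacency, community):
    """Modularity Q for a given mapping of node -> community label."""
    two_m = sum(len(neigh) for neigh in adjacency.values())
    q = 0.0
    for a in adjacency:
        for b in adjacency:
            if community[a] == community[b]:
                linked = 1.0 if b in adjacency[a] else 0.0
                # Observed link minus the link expected by chance.
                q += linked - len(adjacency[a]) * len(adjacency[b]) / two_m
    return q / two_m
```

As a small check of the intuition, two triangles of words joined by a single edge and labelled as two communities give a modularity of 5/14, or about 0.36, with a diameter of 3.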

All these data, combined with the near-normal degree distribution that we observe, indicate that this text is located in the middle of the spectrum between “centralized” and “dispersed” proposed in comparative text network topology analysis (Paranyushkin 2011). In other words, it has a central agenda expressed by a few key concepts, together with several topics that participate in the production of meaning while at the same time supporting the key concepts within the text. We will therefore take both the relations between the influential concepts and the communities into account when analysing the text.

| Av Path | Diameter | Av Degree | Density | Modularity | Community 42 (% of nodes) | Community 3 (% of nodes) | Community 13 (% of nodes) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 6 | 9.8 | 0.025 | 0.496 | 5.5 | 4.5 | 4.3 |

Table 1: Text graph metrics

We can now proceed to the qualitative evaluation of the text graph (Figures 3 and 4). First of all, since the graph has a significant community structure and the data from Table 1 indicate that the communities are also of relatively significant size, we will analyze the relations between them, following the procedure for identifying the clusters of meaning circulation (Paranyushkin 2011). The largest communities (42, 3, and 13) consist of the following words, which co-occur more often within the text:

42: day – make – time – dream
3: freedom – ring
13: satisfy – long 

These three communities (42, 3 and 13) represent the possible polysingular “solutions” for this text, the direction it tends to shift towards temporarily when it is read by the audience.

The communities 42, 3 and 13 are mainly related through the concepts that have the highest betweenness centrality. These are:

freedom – negro – nation – justice

The word “freedom” is the central concept of the text as it is both the most influential concept and also represents one of the more prominent communities.

Therefore we can interpret the graph in the following way. The main pathway for meaning circulation (or the ideological agenda of the text) is composed of the 4 main concepts: “freedom”, “negro”, “nation” and “justice”. Further qualitative content analysis (Appendix A) shows that this is also the main message of the text: that “negroes” have an equal right to freedom in the nation and that taking that freedom would do justice. This agenda is supported by the two main clusters for meaning circulation: “day-make-time-dream”, which is a declaration of vision in the moment of time, and “satisfy – long”, which is a long-overdue call for “justice”. While it does give a new insight into the text itself, it definitely offers a more holistic overview of the speech and points out the major semantic units that support the prevalent agenda within the text.

It has been shown through the framework of Landscape Model (Linderholm et al 2004; van den Broek et al 1996) that the fluctuation of information availability is related to structural properties of a text. Specifically, so-called “cohort activation” type of information recall functions on the basis of concepts’ proximity to one another. Therefore, as the resulting text graph can depict the community structure of words based on their semantic proximity, this will, in turn, enhance the process of comprehension. The reader can use the graph (Figure 3) as a reference while reading to improve background information recall and to have a constant overview of the whole text’s “landscape”.

Referring back to the first part of this paper, where we talked about the possible interpretations of the text, graph analysis allows us to transform a narrative into a sort of “field of tension” that shows the discursive field from which the actual text originated. Our brief analysis shows that while a range of various concepts formed the basis of the speech, the particular choices Martin Luther King made in delivering them emphasized the importance of freedom for black people to the overall concept of justice for all upon which America was built. It has also shown that the text structurally combines random and small-world network properties (random distribution of degree, short average paths). One branch of network science studies how the structure of a network affects its capacity to spread epidemics. It has been repeatedly shown that a structural combination of random and small-world networks allows epidemics to be global with a high-amplitude infection rate (Kuperman & Abramson, 2001; Barahona & Pecora 2002; Newman 2002), while at the same time maintaining localized endemic infection states. If we transpose these results onto ideology and text perception and interpretation, they offer a plausible explanation of why this text was so moving and important for a lot of people. It provided a certain connectivity, which was specific (small-world) and general (interconnected) at the same time. We can say that when one hears a text like this, word after word, the network of concepts that is formed within one’s memory has a high capacity for explosive outbreaks (short-term actions) and at the same time for maintaining low-endemic levels within communities (long-term memory).

In practical terms that means having a clear agenda defined by 4-5 main concepts that appear repeatedly throughout the text, while at the same time dispersing distinct stories within the text that relate together through the central concepts.

Knowing the structural properties of a text can also help in appealing to the “mental set” it represents. If we assume that after listening to the text individuals form a certain mental model, which could be represented by the graph shown in Figure 3 (see Carley 1997 for more on the subject), then it would be possible to access the emotional state and action associated with that mental model once we know how to initiate it (see Friedenberg & Silverman 2011 on the link between the network of activated concepts and emotions). One way of initiating it would be to follow a well-documented path for viral information contagion (Ball 1997; Leskovec et al 2007), targeting interconnected communities first, which then activate the rest of the network through their ties to the other nodes and communities. Therefore, a focus on topics #13 (long-due satisfaction), #42 (the necessity to realize a dream here and now) and #3 (freedom), combined with several key concepts (negro-nation-justice), could be used to reactivate the “mental map” that would resonate with the audience’s previous experience. Once this resonance is reached, further content could be added in order to modify the meaning or shift the ideological agenda of the text.

 

3. COMPARATIVE ANALYSIS OF TEXT STRUCTURES

We will now proceed to perform a comparative analysis of the text presented above and George W. Bush’s second inauguration address. Using the same methodology, we produce the following graph representation of this text:

Figure 6: George W Bush second inauguration speech text network graph

Figure 7: Clusters of meaning circulation, Bush text network graph

 

Figure 8: Node degree distribution, text graph for G.W. Bush inauguration speech

 

| Av Path | Diameter | Av Degree | Density | Modularity | Community 4 (% of nodes) | Community 38 (% of nodes) | Community 16 (% of nodes) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2.8 | 6 | 11.8 | 0.023 | 0.417 | 10.7 | 7.5 | 3.8 |

Table 2: Text graph metrics

It can be seen from the charts and the table above that this text is structurally quite similar to Martin Luther King’s text. Both texts have similar average path and diameter, as well as modularity > 0.4. The difference is that Bush’s text has a slightly higher average degree and a skewed degree distribution that follows a power law (so the higher average degree is due to the presence of a small but significant number of concepts that reappear quite frequently throughout the whole text). Also, the two most prominent communities in Bush’s text accumulate a higher proportion of nodes than the two most prominent communities in Luther King’s text. This indicates that Bush’s text is topologically closer to the “centralized” part of the spectrum (Paranyushkin 2011), so the communities are less important in generating the agenda than in Luther King’s text, and the meaning is produced by the most influential concepts, which are, in turn, supported by the communities connected to them.

The most influential concepts within the text are:

nation – america – freedom – liberty

The most prominent communities are:

4: liberty – american – time – history

38: freedom – hope – long – word

The meaning is produced through the concepts “america” and “nation” related to “freedom” and “liberty”, both of which are in turn supported and elaborated by the two major communities they belong to: “hope, freedom” and “american, time, history”.

As we can see, the notions of “freedom”, “liberty” and “nation” are very prominent in both texts: Martin Luther King’s address and George Bush’s inauguration speech. Moreover, both appeal to “time” as an important notion to emphasize the immediacy of their message. Both also appeal to “hope” and “long”: a certain expectation and necessity of change. But while in Martin Luther King’s speech this expectation is related to a “negro”, in Bush’s speech it is related to the “American people”.

Therefore, both texts have an almost identical field of relations, represented through the graph structure. However, a slight change of a few concepts makes the agenda and the pathos of a text very different. This is a good example of the resonance we mentioned previously, where structurally similar texts can evoke similar emotions. It cannot be proved, of course, that Bush’s speech-writers purposefully used Martin Luther King’s speech for inspiration, but our analysis shows that both texts at least operate with very similar concepts and have similar intentions. Although their target groups are very different (black Americans in Luther King’s speech vs. all American people in Bush’s speech), the similar structural use of key concepts in Bush’s text could well be a purposeful attempt to appeal to the black electorate.

We will finally put the two networks together (Figure 9) and calculate the metrics for the composite graph:

| Av Path | Diameter | Av Degree | Density | Modularity | Community 7 (% of nodes) | Community 9 (% of nodes) | Community 8 (% of nodes) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2.8 | 6 | 12 | 0.015 | 0.472 | 11.6 | 10.8 | 9.2 |

Table 3: Text graph metrics
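
One simple way to build such a composite graph is to take the union of the two edge lists. The sketch below assumes each text graph is stored as a mapping from a word pair to an edge weight and that the weights of shared edges are summed; how the composite was actually assembled for this research is not specified here, so both the representation and the function names are assumptions.

```python
from collections import Counter

def merge_text_graphs(*edge_weights):
    """Union of several text graphs; weights of shared edges are summed."""
    merged = Counter()
    for weights in edge_weights:
        merged.update(weights)
    return merged

def shared_nodes(first, second):
    """Words that occur as nodes in both graphs."""
    nodes_first = {word for edge in first for word in edge}
    nodes_second = {word for edge in second for word in edge}
    return nodes_first & nodes_second
```

The shared nodes are the words through which the two speeches become connected in the composite graph, so they are natural candidates for high betweenness centrality.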

The most influential concepts within the composite text are very similar to those of the texts above (except that in Luther King’s text “america” is replaced with “negro” and in Bush’s speech “justice” is replaced with “liberty”):

nation – america – freedom – justice

The largest community in this network (Figure 1) is composed of concepts that are dispersed more or less equally across the graph and do not have much influence on the whole discourse. They act as the supporting material. The majority of interaction happens between the two smaller (but still prominent) communities:

nation – time – justice

america

These communities are also part of the conceptual “backbone” of the text. Therefore, the community structure supports the agenda circulated by the main concepts, which is, for both texts, related to the notions of “nation” and “freedom”. But whereas Luther King’s text applies this agenda to “justice” for “negroes”, Bush applies exactly the same agenda to “american” and “liberty”. Both use the rhetoric of freedom: in Bush’s case it is a promise for the future (belonging to the same community as “hope” and “long”), in Luther King’s case it is a call to “satisfy” the “dream” right “now”. While the emotional appeal of both texts can be quite similar, the action called for through that appeal is very different. Bush’s text looks forward to the future, while Luther King’s text is aimed directly at the “now”. This could perhaps explain why some speeches have the capacity to move, while others serve more to secure the current position in exchange for the promise of a positive future.

Figure 9: Luther King’s and Bush’s text visualizations

 

Figure 10: Most prominent communities in both texts.

 

4. CONCLUSION

In this paper we presented a type of text network analysis that draws on the approaches and methodology of graph analysis in order to interpret textual data. Our method makes it possible to find the clusters of meaning circulation within a text, which function as a kind of strange attractor within the dynamic text structure during the process of reading. Plotting the dynamics of a text onto a two-dimensional plane allows for a visual representation of the polysingularity (the multiplicity of possible specific expressions) contained within the text, and also suggests thinking of the text itself as a special case of a larger polysingularity of expressions. Comparative analysis of different networked text structures produced using our method makes it possible to find the similarities and differences that define the texts’ specificity and yet make them part of a larger networked body of interrelated concepts.

Our method has several practical applications. It produces a readable visual representation (a diagram) of a text, which can then be used to provide a summary, expose the text’s topical structure, find the most influential concepts, and better understand the structural properties of the writing. Our method can also be used for comparative text analysis, to create visual interfaces for inter-textual navigation, and to open the original text up to a wider variety of readings while retaining the influence of the original structure (which can be a useful tool for writers, media, and artistic production).

The existing methods for finding topical structure within text (such as pLSA or LDA, mentioned in part 1) detect topical structure based on probabilistic co-occurrences of concepts within a textual corpus. Our method draws on a similar methodology, but takes into account the structural properties of the resulting text graph, which may further inform the methods above and provide possible strategies for ranking the importance of the various topics present within a text according to their structural position.

Using the example of two very different political speeches – “I Have a Dream” by Martin Luther King and President George Bush’s second inauguration address – we’ve shown how our method can be used to derive and visually represent the topical structure within a text and also to compare different texts to each other.

Our method is comprised of the following steps.

First, a text is represented as a network, where the words are the nodes and their co-occurrences are the connections between them. The closer to each other and the more often two words appear in a text, the stronger the connection between them in the resulting text graph.
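
This first step can be sketched as a sliding window over the normalized word sequence. The window size of 4 and the weighting scheme (adjacent words add 3, words two apart add 2, words three apart add 1) are assumptions made for illustration only, and the function name `cooccurrence_network` is hypothetical; the text above only states that closer and more frequent co-occurrences give stronger connections.

```python
from collections import defaultdict

def cooccurrence_network(words, window=4):
    """Build a weighted, undirected co-occurrence graph from a word sequence.

    Each pair of distinct words at most `window - 1` positions apart gets an
    edge; nearer pairs add more weight, and repeated pairs accumulate.
    """
    weights = defaultdict(int)
    for i, word in enumerate(words):
        for gap in range(1, window):
            if i + gap >= len(words):
                break
            other = words[i + gap]
            if other == word:
                continue  # skip self-loops such as "dream dream"
            edge = tuple(sorted((word, other)))
            weights[edge] += window - gap  # adjacent words weigh the most
    return dict(weights)

# A fragment of the normalized speech from Appendix A.
text = "dream day nation rise live true mean creed"
network = cooccurrence_network(text.split())
```

The result maps each alphabetically sorted word pair to its accumulated weight, which can then be loaded into any graph tool as a weighted edge list.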

The next step is to visualize the network as a graph, applying the Force-Layout method to produce a more readable community structure. The nodes (words) that have the most connections (co-occurrences with other words) are pushed apart, while the less connected nodes are grouped around them. This community structure is further enhanced by applying the modularity algorithm, which detects groups of nodes that are more densely connected to each other than to the rest of the network. This produces what we call the clusters of meaning circulation, which represent the topical structure of the text. The picture is made more precise by detecting the nodes (words) with the highest betweenness centrality, that is, those that occur most often on the shortest paths between any two nodes in the network; these form the backbone for meaning circulation within the text.
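
As an illustration of the betweenness step, the following is a minimal sketch of the algorithm from Brandes (2001) for an unweighted, undirected graph. The toy adjacency list is hypothetical and edge weights are ignored, so this should be read as a simplified sketch rather than the exact procedure used to produce the figures.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adj` maps each node to an iterable of its neighbours.
    """
    centrality = {node: 0.0 for node in adj}
    for source in adj:
        # Single-source shortest paths by breadth-first search.
        stack = []
        preds = {node: [] for node in adj}
        sigma = {node: 0 for node in adj}   # number of shortest paths
        dist = {node: -1 for node in adj}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        delta = {node: 0.0 for node in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {node: value / 2 for node, value in centrality.items()}

# Hypothetical toy network: "freedom" connects three otherwise separate parts.
adj = {
    "freedom": ["nation", "dream", "ring"],
    "nation": ["freedom", "justice"],
    "justice": ["nation"],
    "dream": ["freedom"],
    "ring": ["freedom"],
}
scores = betweenness_centrality(adj)
backbone = max(scores, key=scores.get)
```

Here the node with the highest score is the one that most shortest paths pass through, which is the sense in which such words form the conceptual backbone.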

As a final step, we provide quantitative data for the graph in order to estimate the extent to which the communities produced by the modularity method participate in the production of meaning within a text. If the modularity measure is low, and the community structure is therefore not very prominent, the ideological agenda of the text is generated through the most central, influential concepts (we call such texts “centralized”). If the modularity measure is high, the community structure is modular, meaning that there are several distinct contexts present within the text that use different vocabulary. If the distribution curve for the nodes is close to normal (most of the nodes have, on average, the same measure of co-occurrence, which indicates that only a few singular nodes, rather than a significant number, exert their influence on the text’s agenda), we call such texts “dispersed” and consider that the meaning is produced within the communities, or clusters of meaning circulation, while the conceptual backbone of the text serves to provide connectivity but does not actively participate in generating the content of the meaning. The texts used in our examples (parts 2 and 3) have “mixed” properties: the meaning is produced both within the clusters of meaning circulation and through the conceptual backbone of the text, so the content of both is taken into account when detecting the topical structure of the text.
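
The modularity measure referred to above can be illustrated with a short sketch that scores a given partition of an undirected, unweighted graph using Newman’s modularity Q. It does not search for the communities (that is the job of the community detection algorithm); it only evaluates a partition that is supplied to it. The example graph and the function name `modularity` are assumptions made for illustration.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    `edges` is a list of (node, node) pairs; `community` maps node -> label.
    """
    m = len(edges)
    internal = {}   # edges with both endpoints inside a community
    degree = {}     # summed degree of the nodes in a community
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree[ca] = degree.get(ca, 0) + 1
        degree[cb] = degree.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
               for c in degree)

# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)    # two communities, one per triangle
q_merged = modularity(edges, merged)  # everything in a single community
```

Splitting the graph along the bridge gives a clearly positive score, while treating everything as one community gives zero, which matches the intuition that a higher value indicates a more pronounced community structure.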

 

APPENDIX A:

Martin Luther King, “I Have a Dream”

I Have a Dream

I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.

Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of their captivity.

But one hundred years later, the Negro still is not free. One hundred years later, the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination. One hundred years later, the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity. One hundred years later, the Negro is still languished in the corners of American society and finds himself an exile in his own land. And so we’ve come here today to dramatize a shameful condition.

In a sense we’ve come to our nation’s capital to cash a check. When the architects of our republic wrote the magnificent words of the Constitution and the Declaration of Independence, they were signing a promissory note to which every American was to fall heir. This note was a promise that all men, yes, black men as well as white men, would be guaranteed the “unalienable Rights” of “Life, Liberty and the pursuit of Happiness.” It is obvious today that America has defaulted on this promissory note, insofar as her citizens of color are concerned. Instead of honoring this sacred obligation, America has given the Negro people a bad check, a check which has come back marked “insufficient funds.”

But we refuse to believe that the bank of justice is bankrupt. We refuse to believe that there are insufficient funds in the great vaults of opportunity of this nation. And so, we’ve come to cash this check, a check that will give us upon demand the riches of freedom and the security of justice.

We have also come to this hallowed spot to remind America of the fierce urgency of Now. This is no time to engage in the luxury of cooling off or to take the tranquilizing drug of gradualism. Now is the time to make real the promises of democracy. Now is the time to rise from the dark and desolate valley of segregation to the sunlit path of racial justice. Now is the time to lift our nation from the quicksands of racial injustice to the solid rock of brotherhood. Now is the time to make justice a reality for all of God’s children.

It would be fatal for the nation to overlook the urgency of the moment. This sweltering summer of the Negro’s legitimate discontent will not pass until there is an invigorating autumn of freedom and equality. Nineteen sixty-three is not an end, but a beginning. And those who hope that the Negro needed to blow off steam and will now be content will have a rude awakening if the nation returns to business as usual. And there will be neither rest nor tranquility in America until the Negro is granted his citizenship rights. The whirlwinds of revolt will continue to shake the foundations of our nation until the bright day of justice emerges.

But there is something that I must say to my people, who stand on the warm threshold which leads into the palace of justice: In the process of gaining our rightful place, we must not be guilty of wrongful deeds. Let us not seek to satisfy our thirst for freedom by drinking from the cup of bitterness and hatred. We must forever conduct our struggle on the high plane of dignity and discipline. We must not allow our creative protest to degenerate into physical violence. Again and again, we must rise to the majestic heights of meeting physical force with soul force.

The marvelous new militancy which has engulfed the Negro community must not lead us to a distrust of all white people, for many of our white brothers, as evidenced by their presence here today, have come to realize that their destiny is tied up with our destiny. And they have come to realize that their freedom is inextricably bound to our freedom.

We cannot walk alone.

And as we walk, we must make the pledge that we shall always march ahead.

We cannot turn back.

There are those who are asking the devotees of civil rights, “When will you be satisfied?” We can never be satisfied as long as the Negro is the victim of the unspeakable horrors of police brutality. We can never be satisfied as long as our bodies, heavy with the fatigue of travel, cannot gain lodging in the motels of the highways and the hotels of the cities. We cannot be satisfied as long as the negro’s basic mobility is from a smaller ghetto to a larger one. We can never be satisfied as long as our children are stripped of their self-hood and robbed of their dignity by signs stating: “For Whites Only.” We cannot be satisfied as long as a Negro in Mississippi cannot vote and a Negro in New York believes he has nothing for which to vote. No, no, we are not satisfied, and we will not be satisfied until “justice rolls down like waters, and righteousness like a mighty stream.”

I am not unmindful that some of you have come here out of great trials and tribulations. Some of you have come fresh from narrow jail cells. And some of you have come from areas where your quest — quest for freedom left you battered by the storms of persecution and staggered by the winds of police brutality. You have been the veterans of creative suffering. Continue to work with the faith that unearned suffering is redemptive. Go back to Mississippi, go back to Alabama, go back to South Carolina, go back to Georgia, go back to Louisiana, go back to the slums and ghettos of our northern cities, knowing that somehow this situation can and will be changed.

Let us not wallow in the valley of despair, I say to you today, my friends.

And so even though we face the difficulties of today and tomorrow, I still have a dream. It is a dream deeply rooted in the American dream.

I have a dream that one day this nation will rise up and live out the true meaning of its creed: “We hold these truths to be self-evident, that all men are created equal.”

I have a dream that one day on the red hills of Georgia, the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.

I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.

I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.

I have a dream today!

I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of “interposition” and “nullification” — one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers.

I have a dream today!

I have a dream that one day every valley shall be exalted, and every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight; “and the glory of the Lord shall be revealed and all flesh shall see it together.”

This is our hope, and this is the faith that I go back to the South with.

With this faith, we will be able to hew out of the mountain of despair a stone of hope. With this faith, we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith, we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day.

And this will be the day — this will be the day when all of God’s children will be able to sing with new meaning:

My country ’tis of thee, sweet land of liberty, of thee I sing.

Land where my fathers died, land of the Pilgrim’s pride,

From every mountainside, let freedom ring!

And if America is to be a great nation, this must become true.

And so let freedom ring from the prodigious hilltops of New Hampshire.

Let freedom ring from the mighty mountains of New York.

Let freedom ring from the heightening Alleghenies of Pennsylvania.

Let freedom ring from the snow-capped Rockies of Colorado.

Let freedom ring from the curvaceous slopes of California.

But not only that:

Let freedom ring from Stone Mountain of Georgia.

Let freedom ring from Lookout Mountain of Tennessee.

Let freedom ring from every hill and molehill of Mississippi.

From every mountainside, let freedom ring.

And when this happens, when we allow freedom ring, when we let it ring from every village and every hamlet, from every state and every city, we will be able to speed up that day when all of God’s children, black men and white men, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the old Negro spiritual:

Free at last! Free at last!

Thank God Almighty, we are free at last!

 

Martin Luther King, “I Have a Dream”, normalized version (stopwords, punctuation and extra spaces removed; words reduced to their morphemes by K-Stemming)

dream

happy join today history greatest demonstration freedom history nation

score year ago great american symbolic shadow stand today sign emancipation proclamation momentous decree great beacon light hope million negro slave sear flame wither injustice joyous daybreak end long night captivity

hundred year negro free hundred year life negro sadly cripple manacle segregation chain discrimination hundred year negro live lonely island poverty midst vast ocean material prosperity hundred year negro languish corner american society find exile land ve today dramatize shameful condition

sense ve nation capital cash check architect republic write magnificent word constitution declaration independence sign promissory note american fall heir note promise men black men white men guarantee unalienable right life liberty pursuit happiness obvious today america default promissory note citizen color concern honor sacred obligation america negro people bad check check back mark insufficient fund

refuse bank justice bankrupt refuse insufficient fund great vault opportunity nation ve cash check check give demand rich freedom security justice

hallow spot remind america fierce urgency time engage luxury cool tranquilizing drug gradualism time make real promise democracy time rise dark desolate valley segregation sunlit path racial justice time lift nation quicksand racial injustice solid rock brotherhood time make justice reality go children

fatal nation overlook urgency moment swelter summer negro legitimate discontent pass invigorate autumn freedom equality nineteen sixtythree end begin hope negro need blow steam content rude awaken nation return business usual rest tranquility america negro grant citizenship right whirlwind revolt continue shake foundation nation bright day justice emerge

people stand warm threshold lead palace justice process gain rightful place guilty wrongful deed seek satisfy thirst freedom drink cup bitterness hatred forever conduct struggle high plane dignity discipline creative protest degenerate physical violence rise majestic height meet physical force soul force

marvelous militancy engulf negro community lead distrust white people white brother evidence presence today realize destiny tie destiny realize freedom inextricably bind freedom

walk

walk make pledge march ahead

turn back

devotee civil right satisfy satisfy long negro victim unspeakable horror police brutality satisfy long body heavy fatigue travel gain lodge motel highway hotel city satisfy long negro basic mobility smaller ghetto larger satisfy long children strip selfhood rob dignity sign stating white satisfy long negro mississippi vote negro york believe vote satisfy satisfy justice roll water righteousness mighty stream

unmindful great trial tribulation fresh narrow jail cell area quest quest freedom leave batter storm persecution stagger wind police brutality veteran creative suffer continue work faith unearned suffer redemptive back mississippi back alabama back south carolina back georgia back louisiana back slum ghetto northern city know situation change

wallow valley despair today friend

face difficulty today tomorrow dream dream deeply root american dream

dream day nation rise live true mean creed hold truth selfevident men create equal

dream day red hill georgia son slave son slave owner sit table brotherhood

dream day state mississippi state swelter heat injustice swelter heat oppression transform oasis freedom justice

dream children day live nation judge color skin content character

dream today

dream day alabama vicious racist governor lip drip word interposition nullification day alabama black boy black girl join hand white boy white girl sister brother

dream today

dream day valley exalt hill mountain make low rough place make plain crook place make straight glory lord reveal flesh

hope faith back south

faith hew mountain despair stone hope faith transform jangling discord nation beautiful symphony brotherhood faith work pray struggle jail stand freedom know free day

day day go children sing meaning

country ti thee sweet land liberty thee sing

land father die land pilgrim pride

mountainside freedom ring

america great nation true

freedom ring prodigious hilltop hampshire

freedom ring mighty mountain york

freedom ring heighten allegheny pennsylvania

freedom ring snowcap rocky colorado

freedom ring curvaceous slope california

freedom ring stone mountain georgia

freedom ring lookout mountain tennessee

freedom ring hill molehill mississippi

mountainside freedom ring

freedom ring ring village hamlet state city speed day go children black men white men jew gentile protestant catholic join hand sing word negro spiritual

free free

go almighty free
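
The kind of normalization that produces the version above can be sketched in a few lines. This is only an approximation under stated assumptions: the stopword list below is a tiny hypothetical one (the list actually used is not given in the text), and the K-Stemming step is left out entirely.

```python
import re

# A deliberately tiny, illustrative stopword list (an assumption; the list
# actually used for this appendix is not given in the text).
STOPWORDS = {"i", "am", "to", "with", "you", "in", "what", "will", "go",
             "down", "as", "the", "for", "of", "our", "a", "an", "and"}

def normalize(text):
    """Lowercase, strip punctuation and digits, and drop stopwords.

    Stemming (the K-Stemming step) is intentionally left out of this sketch.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]

sentence = ("I am happy to join with you today in what will go down in "
            "history as the greatest demonstration for freedom in the "
            "history of our nation.")
tokens = normalize(sentence)
```

Applied to the opening sentence of the speech, this yields the same word sequence as the corresponding normalized line above, since none of those words happens to need stemming.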

 

REFERENCES:

Ball, F. (1997). Epidemics with two levels of mixing. The Annals of Applied Probability, 7(1), 46–89. Institute of Mathematical Statistics. Retrieved from http://projecteuclid.org/euclid.aoap/1034625252

Barahona, M., & Pecora, L. M. (2002). Synchronization in Small-world systems. Physical Review Letters, 89(5), 54101. APS. Retrieved from http://link.aps.org/doi/10.1103/PhysRevLett.89.054101

Bastian, M.; Heymann, S.; Jacomy, M.; (2009). Gephi: An Open Source Software for Exploring and Manipulating Networks. Association for the Advancement of Artificial Intelligence

Bateson, G. (1973). Steps to an Ecology of Mind. (C. P. Co, Ed.) The Western Political Quarterly (Vol. 26, p. 345). Aronson. doi:10.2307/446833

Bergson, H (2002). The Possible and the Real. Continuum

Blei, D.; Ng, A.; Jordan, M. (2003). Latent Dirichlet Allocation. In Journal of Machine Learning Research, 3, 993-1022

Blondel, V.; Guillaume, J-L; Lambiotte, R; Lefebvre, E; (2008). Fast Unfolding of Communities in Large Networks. In Journal of Statistical Mechanics: Theory and Experiment, Volume 2008

Boikov, I. V. (2001). Numerical Methods of Computation of Singular and Hypersingular Integrals. 3, 127-179.

Brandes, U. (2001). A Faster Algorithm for Betweenness Centrality. In Journal of Mathematical Sociology 25(2):163-177

Brandes, U.; Eiglsperger, M.; Herman, I.; Himsolt, M.; Marshall, M. S. (2002). GraphML Progress Report: Structural Layer Proposal. Proc. 9th Intl. Symp. Graph Drawing (GD ’01), LNCS 2265, pp. 501-512. Springer-Verlag

Calvino, I (1986). The Uses of Literature. Orlando: A Harvest Book

Carley, K., & Palmquist, M. (1992). Extracting, Representing, and Analyzing Mental Models. Science.

Carley, K. (1997). Extracting Team Mental Models through Textual Analysis. Journal of Organizational Behavior

Carley, K. (2008). Automap software. URL: http://www.casos.cs.cmu.edu/projects/automap/software.html

Chang, J. & Blei, D. (2010). Hierarchical Relational Models for Document Networks. Annals of Applied Statistics, Vol. 4, No. 1, 124–150

Chemero, A. (2009). Radical Embodied Cognitive Science. The MIT Press.

Deacon, T. (1997). The Symbolic Species. Penguin

Deleuze, G. & Guattari, F. (1987). A Thousand Plateaus. Minneapolis: University of Minnesota Press

Doyle, J.; Radzicki, M; Trees, S; (2008). Measuring Change in Mental Models of Complex Dynamic Systems. Springer

 

Foucault, M. (1977). The Confession of the Flesh interview. In Power/Knowledge Selected Interviews and Other Writings (ed Colin Gordon), 1980: pp. 194-228.

Freeman, L. (1977). A Set of Measures of Centrality Based on Betweenness. Sociometry Vol. 40, No. 1 (Mar., 1977): 35-41

Friedenberg, J.; Silverman, G. (2011). Cognitive Science: An Introduction to the Study of Mind, 2nd Edition. SAGE Publications.

Gabdulkhaev, B. G. (2005). Polysingular integral equations with positive operators, 49(11), 2005.

Havens, R. (2005). The Wisdom of Milton H. Erickson: The Complete Volume. Wales: Crown House Publishing

Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

Jacomy, M. (2009). Force-Atlas Graph Layout Algorithm. URL: http://gephi.org/2011/forceatlas2-the-new-version-of-our-home-brew-layout/

Johnson, M. (2007). The Meaning of the Body. The University of Chicago Press

Krovetz, R. (1993). Viewing Morphology as an Inference Process. SIGIR ’93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval

Kuperman, M., & Abramson, G. (2001). Small World Effect in an Epidemiological Model. Physical Review Letters, 86(13), 2909-2912. doi:10.1103/PhysRevLett.86.2909

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259-284.

Leskovec, J., Adamic, L. A., & Huberman, B. A. (2007). The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1), 5. ACM. Retrieved from http://dl.acm.org/citation.cfm?id=1232727

Li, W. & McCallum, A. (2006). Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations. In Proceedings of the 23rd International Conference on Machine Learning

Lima, M. (2011). Visual Complexity: Mapping Patterns of Information. Princeton Architectural Press.

Linderholm, T; Virtue, S; Tzeng, Y; van den Broek, P; (2004). Fluctuations in the Availability of Information During Reading: Capturing Cognitive Processes Using Graph Model. In Discourse Processes, 37(2), 165-186

Myers, J & O’Brien, E (1998). Accessing the Discourse Representation During Reading. In Discourse Processes, 26: 131-157

Newman, M. E. J. (2002). Assortative mixing in networks. Physical Review Letters, 89(20), 5. American Physical Society. Retrieved from http://arxiv.org/abs/cond-mat/0205405

Noack, A. (2007). Energy Models for Graph Clustering. In Journal of Graph Algorithms and Applications, vol 11, no 2: 453-480

Paley, B.W. (2002). TextArc text visualization software. URL: www.textarc.org

Paranyushkin, D; (2011). Identifying the pathways for meaning circulation using text network analysis. Nodus Labs, text to network visualization tool used: www.infranodus.com

Popping, R. (2000). Computer-assisted Text Analysis. SAGE.

Rawson, K & Kintsch, W (2004). Exploring Encoding and Retrieval Effects of Background Information on Text Memory. In Discourse Processes, 38(3): 323-344

Ryan, M-L. (2007). Diagramming Narrative. In Semiotica 165–1/4: 11–40.

Simonenko, I. (1965). A new general method for researching linear operational integral equation. 29, 567-586.

Strogatz, S. H. (1994). Nonlinear Dynamics And Chaos.

van den Broek, P; Risden, K; Fletcher, C; Thurlow, R. (1996). A “landscape” view of reading: Fluctuating patterns of activation and construction of a stable memory representation. In Models of Understanding Text

Watzlawick, P., Beavin, J. H., & Jackson, D. D. (1967). Pragmatics of Human Communication. (B. S, Ed.)Dissertation Abstracts International, 44(1-B), 356. W W Norton & Company. doi:10.1177/003803856900300341

 

 

 

 

 
