Electronic Text Analysis and the Wary Humanist

March 30th, 2009 § 2

For a long list of complicated reasons, most practitioners of my discipline—political theory—tend to be suspicious of, if not altogether opposed to, the integration of computer technology into their research and teaching. While some scholars cite the superfluity of computer technology to the discipline (excepting, of course, Microsoft Word), others argue that the introduction of certain technologies might somehow actually endanger both thinking and learning (and who wouldn’t find the reduction of Plato to a series of PowerPoint slides, well, a tad reductive?).

Nevertheless, working at the Scholars’ Lab has afforded me the opportunity to sample a range of digital scholarship tools/resources, some of which might appeal to that most skeptical of techno-skeptics, the political theorist. One such resource is TAPoR, the Text Analysis Portal for Research, a website that provides access to tools used in the analysis of electronic texts.

Many classic texts are now available in electronic form, and “computer-assisted text analysis” (TAPoR website) enables the researcher to explore a text in ways that are difficult, if not impossible, using only conventional tools for text analysis such as the index or a concordance (though much electronic text analysis is modeled conceptually on both of these). This is generally accomplished by allowing the researcher to search a text for specific words or word patterns or to generate a listing of the most frequently used words. In the case of TAPoR’s word/word pattern search, the results are displayed in the context of the surrounding text—the words sought are in bold, while several additional words on either side give the researcher some sense of how the words are being used. One may employ this kind of analysis with many ends in view, though some common goals include: a) testing to see whether—and if so, how often—an author employs either specific language or a certain kind of language and b) exploring certain words or phrases in context in order to gauge the narrowness or expansiveness of the author’s meaning when such language is used.
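
To make the two operations described above concrete, here is a minimal sketch in Python of a keyword-in-context search and a word-frequency listing. It is not TAPoR's code; the function names, the asterisks standing in for bold type, and the sample sentence (an invented placeholder, not a quotation from any author) are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_in_context(text, keyword, width=2):
    """Return every hit for `keyword` with `width` words of context per side."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Asterisks mark the keyword, standing in for bold type.
            hits.append(f"{left} *{token}* {right}".strip())
    return hits


def most_frequent(text, n=5):
    """Return the n most common words with their counts."""
    return Counter(tokenize(text)).most_common(n)


# Invented placeholder text, used only to exercise the functions.
SAMPLE = (
    "A person is a thinking being. "
    "The same person remains the same thinking being over time."
)

for line in keyword_in_context(SAMPLE, "person"):
    print(line)
print(most_frequent(SAMPLE, 3))
```

A real tool would also need to handle punctuation, spelling variants, and stop words, which this sketch deliberately ignores.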

In order to explore the features and capacities of TAPoR, I brought to the portal an aspect of a particular research question that I had been thinking about for some time. To write my dissertation, I must provide at least a provisional answer to the following question: Does John Locke articulate a consistent view of the human person throughout his corpus? Although I was very familiar with the way he spoke about the “person” in one text, I was less familiar with his usage in other texts. After sketching a brief definition of the “person” based on the first text, I proceeded to investigate whether Locke spoke of the “person” in similar terms in a second text. I pasted the URL of a webpage that contained the second text into TAPoR, which I then asked to search the document for the word “person.” I performed several additional searches, using other key words/phrases from my original definition as well as others that came to mind as I was searching. The results were illuminating. I discovered that although Locke tended to use the word “person” in a more ordinary, less philosophical sense in the second text, all of the basic features of the first text’s conception were nevertheless present. While this confirmed the intuition I had about the consistency of Locke’s view of the “person” across his texts (or at least two of them), the specific instances of personhood language that I isolated with the help of TAPoR will allow me to present a much more convincing defense of my position in the dissertation. Additionally, the fact that TAPoR allows the researcher to view all results of a word search simultaneously helped me to formulate more precisely what was going on in the text and to relate it to my more general argument, i.e., that a looser, more familiar usage of “person” in certain contexts can co-exist with a unified, consistent account of personhood.

TAPoR enabled me to “see” more than I otherwise would have in a text and could be a valuable resource for scholars in any field concerned with the close and careful reading of texts.

Mining and Mapping Apocalyptic Texts, Part 2

March 30th, 2009 § 0

As I explained in my last blog post, my dissertation will compare several statements about the final fate of humankind in Paul to similar statements in apocalyptic texts. In that post, I described how text-mining could help with the interpretation of the texts which stand at the center of my dissertation. In this post, I will discuss how geographic information systems (GIS) can help to visualize geographic relationships among texts. My ideas here, as in my first blog post, are the result of conversations with other staff members here at the Scholars’ Lab. The question that I pose and answer in this blog post is, What does geography have to do with the analysis of biblical texts? The short answer is, “Much, in every way.” But I can’t just assert that; I need to show it.

Historical-critical study of the Bible has understood for the last two hundred years that the historical circumstances of any person or group profoundly affect the literature that that person or group produces. And scholars understand geographical location to be an integral part of any author’s historical circumstances. I was always dubious about the ability of GIS to help me with my research into the Apostle Paul. After all, scholars have no more evidence for Paul’s geographic location than he gives in his letters, and scholars have already thoroughly discussed this evidence. Another basic tenet of historical-criticism, however, is that we understand an author’s history better when we put it in relationship to the histories of other authors. This goes for geography as well. That means that I should put Paul in geographical relationship with the apocalyptic texts I will study.

But this process will be more than simply plotting each work’s points of origin on a map. Since GIS is driven by databases, one can query the databases and display the results geographically. For instance, I may find that certain texts assert that the Messiah will descend with angels before the final judgment. If I have geographical data for these texts, I can tell GIS to show the place of origin of all texts that meet these criteria. I could then discover that all of these texts come from a certain area or that they all fall along a certain trade route. I might also discover that they have no apparent geographical similarity. And that is the beauty of GIS. I can follow leads quickly enough that pursuing a red herring no longer requires wasted hours or days. I can check out multiple leads in the time it would take to follow one lead manually.
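
The kind of query I have in mind can be sketched with an ordinary relational database. The example below uses Python's built-in sqlite3 module rather than a real GIS package, and everything in it is hypothetical: the table layout, the feature labels, and the titles, places, and coordinates are placeholders I made up, not data about any actual text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE texts (
    id INTEGER PRIMARY KEY, title TEXT, place TEXT, lat REAL, lon REAL
);
CREATE TABLE features (
    text_id INTEGER REFERENCES texts(id), feature TEXT
);
""")

# Placeholder rows only; none of this describes a real document.
conn.executemany("INSERT INTO texts VALUES (?, ?, ?, ?, ?)", [
    (1, "Text A", "Place X", 31.8, 35.2),
    (2, "Text B", "Place Y", 36.2, 36.2),
    (3, "Text C", "Place X", 31.8, 35.2),
])
conn.executemany("INSERT INTO features VALUES (?, ?)", [
    (1, "messiah_descends_with_angels"),
    (2, "final_judgment"),
    (3, "messiah_descends_with_angels"),
    (3, "final_judgment"),
])


def places_of_origin(conn, feature):
    """Return (title, place, lat, lon) for every text tagged with `feature`."""
    return conn.execute(
        """SELECT t.title, t.place, t.lat, t.lon
           FROM texts t JOIN features f ON f.text_id = t.id
           WHERE f.feature = ?
           ORDER BY t.id""",
        (feature,),
    ).fetchall()


for row in places_of_origin(conn, "messiah_descends_with_angels"):
    print(row)
```

In an actual GIS, the rows returned by such a query would be handed to the mapping layer for display instead of being printed.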

The ultimate question, however, is how this technology could help my research. One scenario will make its usefulness apparent. I will consider dozens of apocalyptic texts. If I find that Paul shares some textual characteristic with only two of these texts, I would be hard pressed to show that these three sources by themselves demonstrate an historical pattern. But, if I could show that all three of these texts originated in approximately the same area at approximately the same time, I would show that the texts share more than just textual characteristics. This demonstration would relate the texts more closely to one another and thus strengthen my argument that the textual similarity represents a geographically specific historical pattern. Once such a pattern is recognized, I could interpret these three texts together to reach a fuller understanding of the textual characteristic that is partially represented in each text. And with GIS, one is not limited to analyzing one relationship at a time. One can also assign different symbols to texts depending on which characteristics they have. In this way, one can produce a graphical representation of textual features that may suggest relationships that otherwise would not have been clear. In the end, GIS technologies make it easier to analyze and visualize geographical relationships among texts. As a result, my interpretation of Paul would be based more firmly in Paul’s own historical circumstances.

A Kindle for Every Student?

March 30th, 2009 § 2

The blogosphere has been abuzz with diverse opinions on the release of Amazon’s new Kindle 2. So far, most of the news has surrounded the controversial text-to-speech function and whether or not it violates copyright law (more on this here and here). Regardless of its legality, the speech sounds mechanical, and I don’t see this posing a threat to genuine audio books read with intonation by real people. My primary interest, though, is not in this feature but in reading on an ebook reader at all. I’ll admit, when it comes to ebooks, I’m still in the undecided camp. On the one hand, I love technology, and can’t resist the latest gadget. On the other hand, I consider myself a “book person.” And the book as physical object matters to me. I want to be able to pick it up, smell it, leaf through the pages. I’m guessing there’s not much to be said for ebook smell.

Where Kindle does seem to have gotten it right is in the screen. I can’t read books or journal articles on my computer screen, because it’s just not like reading a book. There’s too much glare, it puts too much strain on the eyes, and it’s too distracting. Kindle has solved the glare and eyestrain problems with “electronic ink,” a new technology designed to make letters look more like they do on the written page. From what I can tell, this is a vast improvement over the first generation of ebooks. But what of the distraction factor? Christine Rosen argues that the Kindle is too distracting to generate productive reading. She tried to read Nicholas Nickleby on a Kindle, but got lured away into Wikipedia searches on Dickens. Alan Jacobs, however, argues almost the opposite. He writes that it is too hard to navigate pages to get to the internet—and that this is a good thing. It keeps you reading, because it’s too hard to leave the book you’re on. If I’m going to shell out the money for an ebook reader, though, I’d want it to do as many things as possible. Ultimately, the discipline required to stave off distraction is not inherent to the print book, but to the act of reading. It is something that is learned by readers, to varying degrees, when they learn to read (on this point, see a recent post, “In Defense of Readers”). I can read a novel in a crowded coffee shop, on a busy beach, or just about anywhere without getting distracted. I can’t say the same of my computer. But an ebook needs to be able to do neat techie things that a computer can do in order to be worthwhile. After a point, it’s up to the user to learn how to read well on it.

What would I want an ebook reader to do? Here’s my wish list: I’d like to be able to read a book without interruption. But I’d also want to be able to read journal articles. Could I go to the UVA library website, get an article as .pdf from JSTOR, and read it on a Kindle? This is a copyright nightmare. But it sure would be nice (I did say that this was a wish list). For all of the ancient language work I do, I’d also love to have a Greek dictionary, or maybe even a Bible program with the ability to switch languages. I have this on my PDA, but a PDA screen is too small and too difficult to read from for any length of time. The same goes for an iPod/iPhone. This sort of basic reference work requires huge volumes of books that can’t be lugged around everywhere, but would always be at hand with an ebook reader. This would be of tremendous value for students. Even though many have praised the fact that the Kindle and its biggest competitor from Sony so far have limited the number of tasks that they can perform, this will ultimately restrict their usage to a niche market: avid readers with money to spend on gadgets. Unfortunately, this is a rather small market. Right now, ebook readers are not made to read all books. They are best suited for reading novels, especially quick page-turners. This leaves out all sorts of books that are read in all sorts of different ways. Dictionaries aren’t read in the same way as Grisham thrillers, but they’re still books and they’re still “read” in their own way. The difference, I imagine, is that Amazon doesn’t have as much interest in books that are re-read or referenced, because they aren’t “consumed” in the same way and don’t create the need to go buy another book when one is finished.

For me, the Kindle is still too proprietary, in terms of what can be read on it, and too limited, in terms of its non-reading functionality. Ebooks won’t replace books, unless they can do “e” things. I guess I’m OK with that. I like books.

Day of Digital Humanities 2009

March 26th, 2009 § 0

Ever wonder how folks in the Scholars’ Lab spend their day? Bethany Nowviskie, Director of Digital Research & Scholarship at the UVA Library, and Joseph Gilbert, Head of the Scholars’ Lab, recently participated in the “Day in the Life of the Digital Humanities” project initiated by our friends at the University of Alberta. The “Day of DH” project encouraged scholars, administrators, students, and others who self-identify as “digital humanists” to blog about their day on March 18, 2009. You can read about Bethany’s day and Joseph’s day, as well as the experiences of a host of other participants.

Ada Lovelace Day

March 24th, 2009 § 1

Today has been declared — quite spontaneously, and to the cheers of a great many people — Ada Lovelace Day, a day on which to honor women working in technology by writing blog posts about their often-unsung achievements, and about ways in which they inspire and challenge us.


Rome Reborn

March 11th, 2009 § 1

My wife and I frequently engage in a strange kind of “culture war.” She thinks ancient Rome is the more interesting civilization, and I’m partial to ancient Greece. In these debates, I always tell her that I prefer philosophers to politicians. Still, I was excited when I first encountered Rome Reborn, a joint project between UVA’s Institute for Advanced Technology in the Humanities, a few other schools, and Google (which allows access to the project through Google Earth). The goal of Rome Reborn is to create a 3D digital model of ancient Rome in the year 320. There are plans to extend the project so that you will be able to track the development and growth of the city over time. The buildings have all been reconstructed by computer modeling, and mapped onto Rome’s actual terrain. What a cool project.

I should say, before continuing, that if you want to check out Rome Reborn for yourself, you might have some trouble getting to it. First, you need to download Google Earth. Then, you need to turn on the “Ancient Rome 3D” layer, which is listed under the “Gallery” layers. Next, get to Rome, zoom into the ancient city and click on a yellow building, which brings up a popup window to add the ancient terrain, landmarks, and buildings. Then, you are finally ready to enjoy the model. (But be warned, if you don’t have a good computer with a fast processor and a hefty bit of RAM, you’ll only send yourself into conniptions rather than enjoy the grandeur of this ancient civilization.)

My first impression, in wandering through the reconstructed forum on Google Earth, was of how chock-a-block the buildings are. You realize how many of the buildings are right on top of one another. You do get this feeling in person, walking around the ruins, but the 3D model captures the hustle and bustle of a true big city that is not conveyed adequately by pictures alone. This project will help scholars puzzle over details of the architecture itself, but having it available to such a wide audience on Google will also help those just learning about Rome. It has the potential to spark students’ interest in learning—for me, this is well worth the effort.

Mining and Mapping Apocalyptic Texts, Part 1

March 11th, 2009 § 0

I have used computer technology to help my work in biblical interpretation for a while. I learned to do complex digital word searches with the Bibleworks software package early in my graduate career. When I started working at the Scholars’ Lab in the summer of 2006, I was introduced to digital humanities. I found these technologies fascinating. But how, I asked, could they help me interpret ancient religious texts in their original languages? I recently posed this question to some of my colleagues in the Scholars’ Lab and was pleasantly surprised by the answers. In this two-part blog, I will consider these answers in relation to my dissertation, which focuses on several passages in the Apostle Paul that speak of the final fate of humankind. Some of these passages suggest that all people will, in the end, be made right with God. Other passages suggest that some people will be permanently alienated from God. I wish to discover the central kernel of Paul’s thinking about the fate of humankind (called soteriology) that would make both of these statements true. In this first entry, I will focus on how I plan to use text-mining to enhance my ability to compare dozens of Greek, Hebrew, and Latin texts with each other more quickly and more thoroughly than I could manually. The second part will focus on how geographic information systems (GIS) will help me to place Paul’s writings in spatial relationships with other writings of the same time.

Text-mining involves parsing a digital text, inserting the words along with their linguistic features into a database, searching for patterns within the database, and, finally, evaluating the results. In my case, I will use text-mining methodologies to extract linguistic data from the Pauline texts as well as other early Jewish texts that speak of the fate of humankind. This process will be fairly straightforward for the Pauline texts. There are many versions of the Greek text of Paul that have linguistic data attached to the words. One need simply extract this data from the text and insert it into a database. I will use the text of the Bibleworks 6 package as my source for Paul. For other texts, this process will not be as easy. For instance, the Thesaurus Linguae Graecae has a huge collection of Greek texts for the period that interests me. But they have no linguistic data attached to the words. To attach linguistic data to these words, I will need to write a script, probably in Perl, to query the open source parsing engine from the Perseus Project at Tufts University (www.perseus.tufts.edu). I will then insert the results from these queries into the database for that text.
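
A minimal sketch of the parse, insert, and search steps might look like the following. I have written it in Python with the built-in sqlite3 module purely for illustration. The "form/lemma/part-of-speech" input format, the table layout, and the transliterated sample tokens are all my own assumptions; they are not the export format of Bibleworks or the output of the Perseus parser.

```python
import sqlite3

# Hand-made sample in a hypothetical "form/lemma/part-of-speech" format.
TAGGED = {
    "sample_text_1": "pantes/pas/adj anthropoi/anthropos/noun sothesontai/sozo/verb",
    "sample_text_2": "ho/ho/article theos/theos/noun sozei/sozo/verb",
}


def parse(tagged_line):
    """Split a tagged line into (form, lemma, pos) tuples."""
    return [tuple(token.split("/")) for token in tagged_line.split()]


def build_database(tagged_texts):
    """Insert every word, with its linguistic features, into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words "
        "(source TEXT, position INTEGER, form TEXT, lemma TEXT, pos TEXT)"
    )
    for source, line in tagged_texts.items():
        for position, (form, lemma, pos) in enumerate(parse(line)):
            conn.execute(
                "INSERT INTO words VALUES (?, ?, ?, ?, ?)",
                (source, position, form, lemma, pos),
            )
    return conn


def find_lemma(conn, lemma):
    """Find every inflected form of `lemma`, whichever text it occurs in."""
    return conn.execute(
        "SELECT source, form FROM words WHERE lemma = ? "
        "ORDER BY source, position",
        (lemma,),
    ).fetchall()


conn = build_database(TAGGED)
print(find_lemma(conn, "sozo"))
```

Searching by lemma rather than by surface form is the point of storing the linguistic data: differently inflected forms of the same word are found by a single query.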

The next step will be to design queries that will find appropriate relationships among the texts. Good methodology requires that I test my queries against a set of texts for which I know the results. I will test my queries on several Greek apocalyptic texts which I have already read carefully, noting the sections that relate closely to Paul. Once I have designed a set of useful queries, I will apply these queries to the databases I created earlier.
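
One way to score a query against texts whose relevant passages I already know is with precision and recall. These are standard information-retrieval measures that I am bringing in for illustration, not something prescribed by the tools mentioned above, and the passage labels below are invented placeholders.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall) for one query against a hand-checked answer set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: what share of the returned passages are actually relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what share of the known relevant passages the query found.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Placeholder labels: passages I marked by hand versus passages a query returned.
known_relevant = {"passage_1", "passage_2", "passage_3", "passage_4"}
query_results = {"passage_1", "passage_2", "passage_9"}

precision, recall = evaluate(query_results, known_relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A query with low recall misses passages I know to be relevant and needs loosening; one with low precision returns too much noise and needs tightening.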

The application of these queries should point to numerous texts that I will then manually analyze to determine their meaning and how they relate to the Pauline passages under investigation. If my investigation were purely manual, I would begin by reading the texts in English in order to find promising passages that I would then examine more closely in their original language, whether Greek, Hebrew, or Latin. This digital method, though, will do this first analysis using the original languages. That means that this computerized comparison of the texts in their original languages will find verbal and grammatical similarities that may be obscured or destroyed in translation. In the end, I expect text-mining to return data that would be only partially accessible by manual means.

In my next post, I will consider how another area of digital humanities, geographic information systems, can help me to explore how the Pauline and apocalyptic texts are related spatially, instead of linguistically, to each other.

Library Innovation Grant Yields Dividends for Numismatists

February 22nd, 2009 § 0

Ethan in the SLab

A recent post by Ethan Gruber, a UVA Library staff member who has lately joined the Scholars’ Lab team, detailed his experiments with 3-dimensional modeling to re-contextualize Roman mosaics, right down to the interplay of light and shadow in ancient villas. Now Ethan’s work on creating a scholarly interface for the study of Greek and Roman coins has been profiled in UVA Today. This project came about through an internal UVA Library Innovation Grant and was undertaken in consultation with Art History professor John Dobbins, a 1994 IATH Fellow, whose Pompeii Forum project provided an early example of the utility of digital tools for archaeological inquiry. The rare coins were scanned by Andrew Curley of the Library’s Scholarly Resources Digitization Services.

Photo credit: Dan Addison. Read the full UVA Today press release here, or jump straight to the coins collection.

Research Applications for 3D Models in Art History

February 11th, 2009 § 4

These days, it is difficult to find a television documentary detailing an archaeological site that does not feature a representation in the form of a 3D model. Computer models make good teaching tools. A class of students may not have the opportunity to travel to Rome to view the Colosseum first-hand, and even if they did, they would have great difficulty visualizing what the mostly-ruined structure looked like 1,900 years ago. A model based on the most recent archaeological research, however, can help fill in the gaps left by time and the elements.

One of the more important aspects of a computer model is that it is dynamic. Using software, a model can be adjusted to reflect newer theories of the site’s architectural reconstruction. This is certainly a stark contrast to artists’ sketches and paintings, which, over time, tend to become outdated. Importantly, like other visualization methods used in the humanities (such as GIS), 3D models can help scholars get a fuller picture of a site and formulate research questions that never would have been considered otherwise.  This is the case in my most recent research.

Having never truly given up on the video game design aspirations of my high school days (I specifically remember my father turning the breaker off to the upstairs when I was up until 4 AM designing a Quake map), I have found a niche within my field of academic interest—Roman archaeology and architectural history. While many of my Pompeianist classmates take a more traditional approach to graduate research projects, I chose to develop a 3D model of the House of the Faun, one of the largest and most famous houses in the city. The model was constructed as accurately as possible based on the archaeological plan, a number of artists’ reconstructions, and photographs of the house (many gathered from Flickr).

The intent of the model was to test art historians’ philosophical assertions about Roman atrium houses. With accurate lighting simulation (i.e., calibrating a simulated sunlight to the latitude and longitude of the house and to any point in time back to antiquity), high resolution images of the model rendered by Mentalray software gave me a glimpse of what the House of the Faun looked like at noon on January 1st, 100 B.C., which is something no artist can replicate.
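
To give a sense of what calibrating sunlight to latitude and date involves, here is a rough sketch in Python of the sun's height at solar noon. It is not how the rendering software works. It uses a common textbook approximation for solar declination with the modern axial tilt, it ignores the differences between ancient and modern calendars, and the latitude for Pompeii is my own approximate figure.

```python
import math

# Approximate latitude of Pompeii in degrees north (an assumed round figure).
POMPEII_LATITUDE = 40.75


def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees for a day of the year (1-365)."""
    angle = math.radians(360.0 / 365.0 * (day_of_year + 10))
    return -23.44 * math.cos(angle)


def noon_elevation_deg(latitude_deg, day_of_year):
    """Approximate height of the sun above the horizon at solar noon, in degrees."""
    return 90.0 - abs(latitude_deg - solar_declination_deg(day_of_year))


winter = noon_elevation_deg(POMPEII_LATITUDE, 1)    # around January 1st
summer = noon_elevation_deg(POMPEII_LATITUDE, 172)  # around the June solstice
print(f"winter noon: {winter:.1f} degrees, summer noon: {summer:.1f} degrees")
```

Under these assumptions the midwinter noon sun stands only about 26 degrees above the horizon, far lower than in midsummer, which is the kind of difference that changes how much light reaches a floor mosaic.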

Coincidentally, lighting simulation may have an impact on how we consider the artwork within the house. For example, when many art historians point to the colors of a mosaic as being proof of its Greek influence, can that assertion bear the burden of the fact that the mosaic was rarely in sunlight?

House of the Faun

Many of us have seen Roman floor mosaics hanging on the walls of American and European museums, but they have been removed from their original context. Even in Pompeii, one of the best-preserved sites of the ancient world, the roofs collapsed long ago, making it difficult to visualize the natural lighting scenario within the House of the Faun and other structures within the city. 3D models allow us to put artworks back in their original context and consider how the ancients viewed them, which is quite different from how we view them now. In this case, the computer model is more than just a teaching tool; it is a scholarly research tool.

On “Asian American” Digital Identity Politics

February 4th, 2009 § 2

Every day, I receive Google Alerts about any websites, blogs, or news feeds containing the keywords “Asian / American / music” in whatever order and combination Google’s search engine finds. Most of the Alerts, unsurprisingly, point to stories related to U.S. politics. Interestingly, around the time of the 2008 Presidential Election, my inbox experienced a minor Google Alert “explosion” with news stories and criticisms listing all the color-based social groups, connecting Obama’s racial politics to the now dominant American ideology of multiculturalism. To my disappointment, none of these news stories included any substantial information with regard to the Asian American (if there is such a thing) perspective on the Obama and Biden duo.

Is “Asian American” coming to stand in for a keyword, tag (in the speak of the blogosphere), or a hip buzzword in our current media environment as digitally informed and constructed? Is there “real content” beyond the textual reference of “Asian” and “American”? If so, how do we assess this content considering the methods of information retrieval, i.e. Google Alerts, and the context of presentation, i.e. the hypertextual state of Internet media?

Today, my Google Alerts linked me to a couple of exciting pages of content-worthy materials related to Asian American arts and culture. One of these is a New Yorker article titled “By the Skin of Our Teeth” about “The Shipment”, the new play by Young Jean Lee. The reviewer Hilton Als comments on Lee’s “irreverent take on racial politics.” Commenting on her 2005 play “Songs of the Dragons Flying to Heaven”, featuring the self-violence of an Asian American female character, Lee declares her attitude toward the state of identity politics in the U.S.: “For this project, I decided the worst thing I could possibly do was to make an Asian-American identity-politics show, because it can be a very formulaic, very clichéd genre, and very assimilated into white American culture. It’s almost become part of the dominant white power structure to have identity-politics plays about how screwed-over minorities are. It’s such a familiar, soothing pattern. . . . It’s become the status quo.”

When I read the passage, I thought to myself, “now, here’s a kernel of wisdom” worth pursuing. What does she mean by “identity-politics show”? What constitutes this ‘clichéd genre’ of formulaic and assimilationist plays? A good content analyst would seek information about the playwright and this play. Before I jumped into my usual mode of searching Google or Wikipedia for Young Jean Lee, I slowed down and pondered the path of information that allowed me to arrive at this intellectually compressed bit of information.

The New Yorker tags this article with the following keywords: “The Shipment”; Young Jean Lee; Korean-Americans; Douglas Scott Streater; Race Relations; Asian-Americans; “Pullman, WA.” Google’s search engine must have picked up this article because of the tag “Asian-Americans.” But search engines are not able to make a qualitative distinction between this article [or other substantive articles] and the sources that simply use “Asian American” as a stand-in for cultural multiplicity and diversity. Unfortunately, Asian America still exists, in the digital environment, mostly under a pile of diversity-bound laundry lists at best, or pornography and ads for mail-order brides or other forms of race-related sex industry, at worst.

The risk of being pigeonholed, tokenized, or even sexualized is no news to individuals of Asian descent in the United States. Playwright Young Jean Lee asserts provocative and vehement critiques of the discursive objectification of Asianness in her 2005 play, which opens with a monologue by a woman with the name of “Korean-American”:

“Have you ever noticed how most Asian-Americans are slightly brain-damaged from having grown up with Asian parents? It’s like being raised by monkeys—these retarded monkeys who can barely speak English and are too evil to understand anything besides conformity and status. . . . Asian people from Asia are even more brain-damaged, but in a different way, because they are the original monkey. . . . I am so mad about all of the racist things against me in this country, which is America. Like the fact that the reason why so many white men date Asian women is that they can get better-looking Asian women than they can get white women because we . . . have lower self-esteem. It’s like going with an inferior brand so that you can afford more luxury features.”

This is intellectually dense, emotionally heavy stuff. But the fact that it’s available in a point-and-click fashion is astounding. Google Alerts keep information from fossilizing. Without Google Alerts, I would find this article only somewhere down the line, when doing an archival search and plowing through databases for historical artifacts. The newness and immediacy of this information would be lost. Also, it would take many more steps to link this article to other articles related to the subject of “Asian / American / music” published today.

The other noteworthy piece Google Alerts linked me to is an interview with jazz pianist Vijay Iyer by the RVAjazz blog entitled “Intellect Meets Creativity.” Iyer speaks reflexively about his role as an Indian American musician in the Afro-centric tradition of jazz music: “I’m just fortunate to be able to interact with the music from my perspective, and to reconsider what resonances there might be with my own experience, or with anyone’s. The point is to honor that legacy and not commodify it, but also to learn from it. I think that America was invited to reconsider a lot of this in light of the ascent and success of Obama. Those are symptoms of a larger development in our culture – it’s about who we are and where we are and what time it is!”

The juxtaposition between the New Yorker article on Young Jean Lee’s play and Vijay Iyer’s interview is intellectually curious. Iyer’s perspective on race in America is less dystopic than Lee’s. In fact, his alliance with African American culture and struggle speaks to a larger discourse about race in terms of minoritarian politics, quite contrary to the uncritical multiculturalist orientation. Iyer’s interview could tap into the historical and contemporary moments of Afro-Asian connections formed in anti-racist solidarity.

My research aims to track these moments deliberately and shamelessly, making links and disconnects among them as they occur in real time. Information as such, categorized and recategorized based on similar or dissimilar terms, is generated and circulated at high volume daily on the Internet. Digital technologies allow discourse to flow in disparate, rhizomatic directions. The hypertextual state of Internet media is overwhelming to sort through, but this quality allows information to seep into unexpected cracks and generate surprising juxtapositions. Like keywords and tags, identity categories also reproduce themselves in a semi-irrational, hypertextual fashion in our time. These contradictory patterns as discovered in the digital environment may best represent the schizophrenic style of identity proliferation that would mark our post-identity-politics (or post-Race) age.
