June 30th, 2010 § Adam Soroka
In our work on Neatline, we have made a deliberate choice to start by constraining ourselves to map sources that are quickly and easily provided through WMS. This leaves out (for now) two popular sources of map imagery: Google Maps and OpenStreetMap. I’m going to explain why we made that choice, and why, when we do come to make these sources usable with Neatline, we will do so with great care and with an eye to scholarly method. » Read the rest of this entry «
May 7th, 2010 § Jean Bauer
Jean Bauer, former Scholars’ Lab Graduate Fellow in Digital Humanities, announces: “I have just released my first open source project. HUZZAH!”
DAVILA is a database schema visualization/annotation tool that creates “humanist readable” technical diagrams. It is written in Processing with the toxiclibs physics library and released under GPLv3. DAVILA takes in the database’s schema and a pipe-separated customization file and uses them to produce an interactive, color-coded, annotated diagram similar in format to UML. There are many applications that will create technical diagrams from a database schema, but as a digital humanist I require more than they can provide. » Read the rest of this entry «
March 25th, 2010 § Kelly Johnston
In the Scholars’ Lab we are working with remarkably detailed datasets showing changes to US political boundaries over time. We’ve all been fascinated with visualizations where the familiar outlines of the US states emerge from thousands of boundary changes to their underlying counties over the last few hundred years. Did you know Virginia once spanned from the Atlantic Ocean to the Mississippi River?

We’re developing a new web-based tool for visualizing these historic boundary changes and it’s nearly ready for prime time. We’ll announce the beta release here soon.
So with the knowledge that US state boundaries have already been subject to drastic change over time, let’s have some fun with geographic information systems to visualize drastic mathematically-induced changes to those familiar US state boundaries.
For our experiment, let’s keep all our current state capital cities right where they are since they are laden with the necessary infrastructure of government. But we’ll move the state boundary lines Voronoi-style so anywhere you travel in each of our new states you’ll be closer to that state’s capital than to any other state capital. In other words, when you’re standing anywhere inside our newly outlined Virginia, you will always be closer to the Virginia state capital, Richmond, than to any other state capital. That seems very efficient. Let’s have a look.
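The Voronoi rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the GIS workflow used for the maps in this post: the coordinates are approximate, only four capitals are included, and the names `CAPITALS`, `haversine_km`, and `voronoi_state` are made up for the example.

```python
import math

# Approximate (latitude, longitude) of a few state capitals; illustrative only.
CAPITALS = {
    "Virginia": (37.54, -77.43),        # Richmond
    "North Carolina": (35.78, -78.64),  # Raleigh
    "West Virginia": (38.35, -81.63),   # Charleston
    "Maryland": (38.98, -76.49),        # Annapolis
}

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def voronoi_state(point, capitals=CAPITALS):
    """Return the state whose capital is nearest to the given point."""
    return min(capitals, key=lambda state: haversine_km(point, capitals[state]))

# Charlottesville, Virginia (approximate coordinates).
print(voronoi_state((38.03, -78.48)))
# Arlington, Virginia (approximate coordinates).
print(voronoi_state((38.88, -77.10)))
```

Under this rule, and with only these four capitals considered, Charlottesville stays in Virginia while Arlington ends up closer to Annapolis than to Richmond. A full Voronoi map applies the same test to every location at once, which is what the GIS software does when it draws the new boundaries.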

Here’s that familiar grade-school wall map of the lower 48 US states and their capital cities. Now let’s tweak the map with GIS software to reconfigure the states, Voronoi-style.

Wow, what a difference Voronoi makes.
Let’s measure just how much the states have changed in our new layout. In absolute terms, Utah and New Mexico make the biggest land grabs while Texas and California lose the most real estate. But as a percentage of their current area, Rhode Island is the big winner, ballooning in size by over 240%, while Massachusetts shrinks 60%.
To visualize the state-by-state changes, Todd Burks from neighboring Clemons Library overlaid the two maps.

Intrigued? Read more about Voronoi and Thiessen polygon GIS techniques.
February 15th, 2010 § Jason Kirby
Hello. In my last blog, I began my discussion of Pandora.com, the streaming audio website which offers a new kind of web radio to listeners. Enter a “seed” song into Pandora’s search engine, and the site will create a streaming “station” composed of songs that resemble your seed song. This process is powered by the Music Genome Project, a massive research endeavor which began in the early 2000s and is based out of the company’s Oakland, California headquarters.
How is Pandora’s song-recommendation engine different from the web radio platforms that came before it? Well, the majority of other online radio stations, such as last.fm, operate on a system called collaborative filtering. What is collaborative filtering? In layperson’s terms, collaborative filtering involves matching one user’s taste to another’s (or to that of a series of other people). On a site like last.fm, over time a user amasses a playlist of songs they’ve expressed a preference for, a sort of musical taste profile. Last.fm’s search tools automatically identify other users with whom your tastes seem to overlap, and use this information to power “radio” stations you can stream on the site. The process is pretty simple, and based on personal intuition and the data existing users have already entered into the system. Collaborative filtering powers aspects of many media websites, such as Amazon.com’s personal recommendation feature for shoppers.
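For readers who like to see the idea in code, here is a minimal Python sketch of collaborative filtering as described above. It is not last.fm’s or Amazon’s actual algorithm, which I have no knowledge of; the user names, song names, and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of liked songs, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, likes, top_n=3):
    """Suggest songs liked by similar users that `user` has not yet liked."""
    scores = {}
    for other, songs in likes.items():
        if other == user:
            continue
        # Users whose taste overlaps more with ours get a larger say.
        weight = jaccard(likes[user], songs)
        for song in songs - likes[user]:
            scores[song] = scores.get(song, 0.0) + weight
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return [s for s in ranked if scores[s] > 0][:top_n]

# Hypothetical taste profiles: each user maps to the set of songs they like.
LIKES = {
    "ana": {"song_a", "song_b", "song_c"},
    "ben": {"song_a", "song_b", "song_d"},
    "cal": {"song_e"},
}

print(recommend("ana", LIKES))
```

Here "ana" and "ben" share two songs, so "ben" pulls "song_d" into the recommendations, while "cal" shares nothing with "ana" and contributes nothing. Note that the sketch never looks at what the songs sound like, only at who likes them.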
» Read the rest of this entry «
May 4th, 2009 § Wendy Hsu
At the onset of my field research in summer 2007, I launched a blog – YellowBuzz.org – with the intention to: 1) archive and organize my field notes in textual and audio-visual form; 2) convey my research purpose and progress to informant musicians and the public; 3) self-position as a “participant” in the scene. Since then, I have made over 160 posts, some directly linked and others tangentially related to my research findings about the activities and media of Asian American indie rock musicians. Over the past one and a half years, my field research blog has received attention from both print and online media. Evidently, this blog has constructed a community consisting of musician- and music-enthusiast-visitors with an interest in Asian American and transpacific music-culture. » Read the rest of this entry «
April 8th, 2009 § Jason Kirby
Hello, it’s been a while since I blogged. You may remember me as the music Ph.D. student who was last heard from pondering the uses of Google Scholar. I’m on a new mission this semester, studying for my comprehensive exams. One of the topics I am researching and preparing an essay on is genre in popular music. The concept may seem initially so self-evident, you may wonder what there is to write about it, per se. Oh, but there’s lots. This is because the issue of genre always involves the issue of classification, which inherently provokes debate. Take, for instance, a star performer like Beck. His music often includes acoustic guitar, and he’s covered Mississippi John Hurt. So he must be a folkie. Oh wait, but he also apes Prince on some funky jams. So maybe he’s a pop star. But he also headlines a bunch of big rock festivals, and we find his music in the “Rock” section at the record store (wait, what’s a record store?). So I guess we’ll call him a rocker.
My point being, popular music can be difficult to pin down using genre tags. You’ll find this evidenced in any number of press interviews with musicians who, when pressed by a journalist, pull out that time-worn chestnut that their sound is “unclassifiable”. Identifying a genre, be it pop, country, rock, hip-hop, salsa, or what have you, is almost like identifying pornography: I’ll know it when I see it. It’s often somewhat easier to identify what a genre isn’t than what it actually is. » Read the rest of this entry «
April 3rd, 2009 § Ethan Gruber
Following up on my introduction to using 3D models to recreate archaeological sites and perform meaningful academic analysis on simulated virtual environments, I will discuss in further detail my current project concerning the recreation of the House of the Drinking Contest in Seleucia Pieria, the port city of Roman Antioch.
» Read the rest of this entry «
March 30th, 2009 § Sara Henary
For a long list of complicated reasons, most practitioners of my discipline—political theory—tend to be suspicious of, if not altogether opposed to, the integration of computer technology into their research and teaching. While some scholars cite the superfluity of computer technology to the discipline (excepting, of course, Microsoft Word), others argue that the introduction of certain technologies might somehow actually endanger both thinking and learning (and who wouldn’t find the reduction of Plato to a series of PowerPoint slides, well, a tad reductive?).
Nevertheless, working at the Scholars’ Lab has afforded me the opportunity to sample a range of digital scholarship tools/resources, some of which might appeal to that most skeptical of techno-skeptics, the political theorist. One such resource is TAPoR, the Text Analysis Portal for Research, a website that provides access to tools used in the analysis of electronic texts.
Many classic texts are now available in electronic form, and “computer-assisted text analysis” (TAPoR website) enables the researcher to explore a text in ways that are difficult, if not impossible, using only conventional tools for text analysis such as the index or a concordance (though much electronic text analysis is modeled conceptually on both of these). This is generally accomplished by allowing the researcher to search a text for specific words or word patterns or to generate a listing of the most frequently used words. In the case of TAPoR’s word/word pattern search, the results are displayed in the context of the surrounding text—the words sought are in bold, while several additional words on either side give the researcher some sense of how the words are being used. One may employ this kind of analysis with many ends in view, though some common goals include: a) testing to see whether—and if so, how often—an author employs either specific language or a certain kind of language and b) exploring certain words or phrases in context in order to gauge the narrowness or expansiveness of the author’s meaning when such language is used.
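The two operations described above, a keyword-in-context search and a listing of the most frequent words, are simple enough to sketch in Python. This is not TAPoR’s code; the function names are made up, the tokenizer is deliberately crude, and the sample sentence is an invented placeholder rather than a quotation from any author.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_in_context(text, keyword, width=3):
    """Return each occurrence of `keyword` with `width` words on either side."""
    words = tokenize(text)
    hits = []
    for i, word in enumerate(words):
        if word == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Brackets stand in for the bold type a web display would use.
            hits.append(f"{left} [{word}] {right}".strip())
    return hits

def most_frequent(text, n=5):
    """List the n most frequently used words with their counts."""
    return Counter(tokenize(text)).most_common(n)

# Invented placeholder sentence, not a quotation.
SAMPLE = "The person who reads is the same person who remembers reading."

for line in keyword_in_context(SAMPLE, "person", width=2):
    print(line)
print(most_frequent(SAMPLE, 3))
```

Seeing every hit lined up with its neighboring words is what makes it possible to judge at a glance how narrowly or broadly a term is being used.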
In order to explore the features and capacities of TAPoR, I brought to the portal an aspect of a particular research question that I had been thinking about for some time. To write my dissertation, I must provide at least a provisional answer to the following question: Does John Locke articulate a consistent view of the human person throughout his corpus? Although I was very familiar with the way he spoke about the “person” in one text, I was less familiar with his usage in other texts. After sketching a brief definition of the “person” based on the first text, I proceeded to investigate whether Locke spoke of the “person” in similar terms in a second text. I pasted the URL of a webpage that contained the second text into TAPoR, which I then asked to search the document for the word “person.” I performed several additional searches, using other key words/phrases from my original definition as well as others that came to mind as I was searching. The results were illuminating. I discovered that although Locke tended to use the word “person” in a more ordinary, less philosophical sense in the second text, all of the basic features of the first text’s conception were nevertheless present. While this confirmed the intuition I had about the consistency of Locke’s view of the “person” across his texts (or at least two of them), the specific instances of personhood language that I isolated with the help of TAPoR will allow me to present a much more convincing defense of my position in the dissertation. Additionally, the fact that TAPoR allows the researcher to view all results of a word search simultaneously helped me to formulate more precisely what was going on in the text and to relate it to my more general argument, i.e., that a looser, more familiar usage of “person” in certain contexts can co-exist with a unified, consistent account of personhood.
TAPoR enabled me to “see” more than I otherwise would have in a text and could be a valuable resource for scholars in any field concerned with the close and careful reading of texts.
March 11th, 2009 § Wendy Robertson
So for the thousandth (or so it seems) time I’ve gotten into this discussion with my friends from the East Coast and Midwest (I’m from Texas) about the correct way to refer to a sweet carbonated beverage, and I have finally got to thinking about ways to map locally spoken slang and jargon using GIS. Starting a database of ‘events’ where a person uses unique language in reference to a common-place item or occurrence (I have a friend from Wisconsin who calls the drinking fountain a “bubbler”) would be an insightful way to examine how jargon or slang starts and spreads geographically.
So I decided to indulge my curiosity and create a small database consisting of the answers to two quick survey questions: “What do you call sweet carbonated beverages?” and “What state do you identify yourself as being ‘from’?” I solicited friends and colleagues for the answers to these questions and ended up with about 150 usable responses (if you were one of the people who responded with “beer”, I thank you for the interest in the survey, but your answer was not included). I chose to ask this question (please bear in mind that linguistics is not a focus of my studies) because, regardless of what you call it, most people have had experience with a coke/soda/soda-pop/pop, which isn’t true for all objects of regional jargon (example: before moving to the East Coast I had never seen nor heard of scrapple), and I wanted to document the geographical extent and overlap of a single object rather than attempt to compare multiple similar objects with this first foray.
Approximately 94% of respondents identified that they referred to sweet carbonated beverages as either “coke”, “soda”, “pop”, or “soda-pop”, so I chose to focus the mapping of this data on those four responses. I took the responses I received and calculated a ‘count’ by state of each type of response; for example, I received a total of 4 responses from people who identified as being from the state of Missouri. Three of the respondents refer to sweet carbonated beverages as “coke”, and one refers to it as “soda”. I took these counts and normalized them to the total number of responses received from the state and used that percentage to map the responses by state broken into ~25%, ~50%, ~75%, and 100%. For each response (coke, soda, pop, soda-pop) I chose a single color to represent responses on the map, and varied the transparency of the color to represent the percentage of the response (25% response = 75% transparency, 50% response = 50% transparency, 75% response = 25% transparency, and 100% response = 0% transparency). I mapped all four responses separately first (figure 1).
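The counting and normalizing steps above can be sketched in Python, using the Missouri example from the text as input. The function names are made up for illustration, and the sketch leaves out the rounding of percentages into the approximate 25/50/75/100 classes, since the exact rule used for that is not spelled out here.

```python
from collections import Counter, defaultdict

def response_shares(responses):
    """Map each state to {term: percent of that state's responses}."""
    counts = defaultdict(Counter)
    for state, term in responses:
        counts[state][term] += 1
    return {
        state: {term: 100 * n / sum(terms.values()) for term, n in terms.items()}
        for state, terms in counts.items()
    }

def transparency(percent):
    """Transparency rule from the text: a 25% response is drawn 75% transparent."""
    return 100 - percent

# The Missouri example from the text: three "coke" responses and one "soda".
MISSOURI = [("Missouri", "coke")] * 3 + [("Missouri", "soda")]

shares = response_shares(MISSOURI)
for term, percent in shares["Missouri"].items():
    print(term, percent, transparency(percent))
```

For Missouri this gives a 75% share for “coke” (drawn at 25% transparency) and a 25% share for “soda” (drawn at 75% transparency), matching the scheme described above.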

I chose to vary transparency as opposed to saturation of color (e.g., a monochromatic choropleth) because I wanted to be able to overlap the response maps to visualize the confluence of the regional terms yet keep the original colors of each response (figure 2).

The map above shows the overlap of “coke” responses with “soda” responses, which are displayed by the variation in colors from bright red where 100% responses were “coke” to bright blue where 100% responses were “soda” and various shades of purple and pink in between where there was a mix of responses in that state. This kind of map can be created using a map with a double ended scale, but that type of visualization is limited to displaying the spectrum between two absolute responses, which would mean that I could only display the confluence of two responses rather than all four (figure 3).

One interesting thing I noticed when looking at the results of this survey is that I need to meet more people from the Pacific Northwest section of the country. The other interesting result I noticed which is more pertinent to the questions asked in this study is the confluence of the regional jargon that occurs in the region that includes Kentucky, Indiana, Ohio, and Illinois. This area represents the confluence of the “soda” and “pop” responses and is also the region with responses of “soda-pop”, a hybridization of “soda” and “pop”.
This exercise seems to make the argument that assembling databases of ideas such as regional jargon and using tools like GIS to display that information is a thought-provoking and possibly worthwhile endeavor. (I’d like to thank all of my friends and colleagues who participated in this survey, which allowed me to assemble and produce this study for digestion by the blogosphere. Thanks, you guys!)
March 11th, 2009 § Matt Munson
I have used computer technology to help my work in biblical interpretation for a while. I learned to do complex digital word searches with the Bibleworks software package early in my graduate career. When I started working at the Scholars’ Lab in the summer of 2006, I was introduced to digital humanities. I found these technologies fascinating. But how, I asked, could they help me interpret ancient religious texts in their original languages? I recently posed this question to some of my colleagues in the Scholars’ Lab and was pleasantly surprised by the answers. In this two-part blog, I will consider these answers in relation to my dissertation, which focuses on several passages in the Apostle Paul that speak of the final fate of humankind. Some of these passages suggest that all people will, in the end, be made right with God. Other passages suggest that some people will be permanently alienated from God. I wish to discover the central kernel of Paul’s thinking about the fate of humankind (called soteriology) that would make both of these statements true. In this first entry, I will focus on how I plan to use text-mining to enhance my ability to compare dozens of Greek, Hebrew, and Latin texts with each other more quickly and more thoroughly than I could manually. The second part will focus on how geographic information systems (GIS) will help me to place Paul’s writings in spatial relationships with other writings of the same time.
Text-mining involves parsing a digital text, inserting the words along with their linguistic features into a database, searching for patterns within the database, and, finally, evaluating the results. In my case, I will use text-mining methodologies to extract linguistic data from the Pauline texts as well as other early Jewish texts that speak of the fate of humankind. This process will be fairly straightforward for the Pauline texts. There are many versions of the Greek text of Paul that have linguistic data attached to the words. One need simply extract this data from the text and insert it into a database. I will use the text of the Bibleworks 6 package as my source for Paul. For other texts, this process will not be as easy. For instance, the Thesaurus Linguae Graecae has a huge collection of Greek texts for the period that interests me, but these texts have no linguistic data attached to the words. To attach linguistic data to these words, I will need to write a script, probably in Perl, to query the open source parsing engine from the Perseus Project at Tufts University (www.perseus.tufts.edu). I will then insert the results from these queries into the database for that text.
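To make the pipeline above concrete, here is a small Python sketch of its first three stages: parse, insert into a database, and search. Everything in it is an assumption for illustration. The `TOY_LEXICON` dictionary is a hypothetical stand-in for a real morphological parser (it does not call the Perseus engine or read Bibleworks data), the sample texts are invented English placeholders rather than Greek sources, and the table layout is just one plausible design.

```python
import sqlite3

# Hypothetical stand-in for a morphological parser: form -> (lemma, part of speech).
# A real pipeline would take this from tagged source texts or a parsing service.
TOY_LEXICON = {
    "all": ("all", "adjective"),
    "will": ("will", "verb"),
    "be": ("be", "verb"),
    "saved": ("save", "verb"),
    "saves": ("save", "verb"),
    "some": ("some", "adjective"),
    "perish": ("perish", "verb"),
}

# Invented English placeholder texts, not real sources.
SAMPLE_TEXTS = {
    "text_a": "all will be saved",
    "text_b": "some will perish",
    "text_c": "mercy saves all",
}

def build_database(texts):
    """Parse each text and store one row per word with its linguistic features."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE words (text_id TEXT, position INTEGER, "
               "form TEXT, lemma TEXT, pos TEXT)")
    for text_id, content in texts.items():
        for position, form in enumerate(content.lower().split()):
            # Words the toy parser does not know keep their form as the lemma.
            lemma, pos = TOY_LEXICON.get(form, (form, "unknown"))
            db.execute("INSERT INTO words VALUES (?, ?, ?, ?, ?)",
                       (text_id, position, form, lemma, pos))
    return db

def texts_sharing_lemma(db, lemma):
    """Find every text that contains any inflected form of the lemma."""
    rows = db.execute("SELECT DISTINCT text_id FROM words WHERE lemma = ? "
                      "ORDER BY text_id", (lemma,))
    return [row[0] for row in rows]

db = build_database(SAMPLE_TEXTS)
print(texts_sharing_lemma(db, "save"))
```

Because the query matches on the lemma rather than the surface form, it finds both “saved” and “saves”. This is the reason for storing linguistic data alongside each word: a plain string search would miss one of the two.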
The next step will be to design queries that will find appropriate relationships among the texts. Good methodology requires that I test my queries against a set of texts for which I know the results. I will test my queries on several Greek apocalyptic texts which I have already read carefully, noting the sections that relate closely to Paul. Once I have designed a set of useful queries, I will apply these queries to the databases I created earlier.
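One common way to score a query against texts whose relevant sections are already known is to compute precision and recall. That is my assumption about how such a test could be run, not a method named above, and the passage labels in this Python sketch are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compare query hits against passages already judged relevant by hand.

    Precision: what share of the hits are actually relevant.
    Recall: what share of the relevant passages the query found.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical passage labels: what a query returned vs. what careful reading flagged.
query_hits = ["p1", "p2", "p3", "p4"]
hand_flagged = ["p2", "p3", "p5"]

precision, recall = precision_recall(query_hits, hand_flagged)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A query with high recall but low precision returns too much to read through by hand, while the reverse misses relevant passages. Adjusting the queries on the known texts until both numbers are acceptable is one way to decide when they are ready for the larger databases.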
The application of these queries should point to numerous texts that I will then manually analyze to determine their meaning and how they relate to the Pauline passages under investigation. If my investigation were purely manual, I would begin by reading the texts in English in order to find promising passages that I would then examine more closely in their original language, whether Greek, Hebrew, or Latin. This digital method, though, will do this first analysis using the original languages. That means that this computerized comparison of the texts in their original languages will find verbal and grammatical similarities that may be obscured or destroyed in translation. In the end, I expect text-mining to return data that would be only partially accessible by manual means.
In my next post, I will consider how another area of digital humanities, geographic information systems, can help me to explore how the Pauline and apocalyptic texts are related spatially, instead of linguistically, to each other.