<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Scholars&#039; Lab &#187; Visualization and Data Mining</title>
	<atom:link href="http://www.scholarslab.org/category/visualization-and-data-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.scholarslab.org</link>
	<description>Works in Progress</description>
	<lastBuildDate>Thu, 09 Sep 2010 18:42:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>WMS vs. tilecaching</title>
		<link>http://www.scholarslab.org/geospatial-and-temporal/wms-vs-tilecaching/</link>
		<comments>http://www.scholarslab.org/geospatial-and-temporal/wms-vs-tilecaching/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 11:59:35 +0000</pubDate>
		<dc:creator>Adam Soroka</dc:creator>
				<category><![CDATA[Geospatial and Temporal]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>
		<category><![CDATA[gis]]></category>
		<category><![CDATA[historic]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[plugins]]></category>

		<guid isPermaLink="false">http://www.scholarslab.org/?p=853</guid>
		<description><![CDATA[In our work on Neatline, we have made a deliberate choice to start by restricting our work to map sources that are quickly and easily provided through WMS. This leaves out (for now) two popular sources of map imagery: Google Maps and OpenStreetMap. I’m going to explain why we made that choice, and why, when we do come to make these sources usable with Neatline, we will do so with great care and with an eye to scholarly method.]]></description>
			<content:encoded><![CDATA[<p>In our work on <a href="http://neatline.org/">Neatline</a>, we have made a deliberate choice to start by constraining ourselves to map sources that are quickly and easily provided through <a href="http://www.opengeospatial.org/standards/wms">WMS</a>. This leaves out (for now) two popular sources of map imagery: <a href="http://maps.google.com/">Google Maps</a> and <a href="http://www.openstreetmap.org/">OpenStreetMap</a>. I&#8217;m going to explain why we made that choice, and why, when we do come to make these sources usable with Neatline, we will do so with great care and with an eye to scholarly method. <span id="more-853"></span></p>
<p>All two-dimensional maps (as opposed to globes) are <a href="http://en.wikipedia.org/wiki/Map_projection">projected</a>. That is, the curved three-dimensional surface of the Earth is transformed onto a flat two-dimensional surface. This can be done in an infinite variety of ways, many of which have been mathematically characterized and named by cartographers, for whom they are necessary tools. We must note, however, that no such transform can obtain a perfect representation of a section of the Earth. The mapmaker must choose which qualities to preserve and in what measures. Is it more important to provide an accurate depiction of relative areas or of relative lengths? Is the area around Greenland to be kept in the focus of accuracy, or that around New Zealand?</p>
<p>Each map therefore carries with it from its creation certain choices like these, which form part of the argument the map makes about the world by its very construction. We chose to start building our tools on WMS because, among other reasons, it allows projection information to be transmitted as part of its operation. This allows us to produce imagery from historical maps (themselves in any number of projections) while maintaining the original choices the mapmaker made. Google Maps and OpenStreetMap are not WMS sources. They can be described as tile caches: huge reservoirs of rendered imagery. As such, they embody their own choices about how the world is to be projected. (Google&#8217;s choice has become so closely associated with Google that it is known widely as &#8220;the Google projection&#8221;.)</p>
<p>Now we come to an important technical distinction: WMS servers are able (depending on the capabilities of the specific software in use) to reproject their contents. That is, in response to a specific request for imagery, they can produce the imagery in a projection different from the one in which it was stored. <a href="http://geoserver.org/display/GEOS/Welcome">GeoServer</a>, the software we are using for Neatline, has a library of thousands of projections, to which users can add more as desired. This allows us to take imagery from a WMS source and lay it under a historical map layer while keeping the historical map&#8217;s original projection as the projection of the map as a whole. Tile caches, by and large, do not allow for this. (Google Maps offers its one projection, and OpenStreetMap offers two.) This means that in order to lay historical map imagery over a layer from one of these sources, we would have to reproject the foreground (the historical imagery), overriding the choices of the mapmaker and introducing additional choices of our own about which facets of the geographies at stake are to be preserved and which abandoned.</p>
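<p>To make the idea of reprojection on request concrete, here is a minimal sketch of two WMS 1.1.1 GetMap requests for the same layer, one in plain longitude/latitude (EPSG:4326) and one in Web Mercator (EPSG:3857, the projection popularly called the Google projection). The server URL, the layer name, and the bounding box are assumptions made up for illustration; they are not Neatline&#8217;s actual configuration. The request parameter names (SERVICE, VERSION, REQUEST, LAYERS, STYLES, SRS, BBOX, WIDTH, HEIGHT, FORMAT) come from the WMS 1.1.1 specification.</p>

```python
import math
from urllib.parse import urlencode

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def getmap_url(base_url, layer, srs, bbox, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy) in srs units."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks for the layer's default style
        "SRS": srs,    # the projection we want the server to answer in
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


BASE = "http://example.org/geoserver/wms"  # hypothetical server
LAYER = "demo:historic_map"                # hypothetical layer name
west, south, east, north = -84.0, 36.0, -75.0, 40.0  # illustrative extent

# Same layer, same extent, two different projections requested from the server.
geographic = getmap_url(BASE, LAYER, "EPSG:4326", (west, south, east, north))
min_x, min_y = web_mercator(west, south)
max_x, max_y = web_mercator(east, north)
mercator = getmap_url(BASE, LAYER, "EPSG:3857", (min_x, min_y, max_x, max_y))

print(geographic)
print(mercator)
```

<p>The only things that change between the two requests are the SRS value and the units of the bounding box. Whether a given server can honor both depends, as noted above, on the capabilities of the software behind it.</p>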
<p>(Neogeographers will remark that georectifying a digital image introduces similar issues. This is true, but unavoidable for our purposes. We would like to avoid compounding the matter in a way that is subtle and hard to detect.)</p>
<p>We are working out means by which we can provide the undeniable utility of popular tilecaching services in a way that is respectful of the historical context and story of map artifacts. Until we do, we will continue to concentrate on the more flexible and sophisticated apparatus provided by WMS.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/geospatial-and-temporal/wms-vs-tilecaching/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introducing DAVILA</title>
		<link>http://www.scholarslab.org/digital-humanities/introducing-davila/</link>
		<comments>http://www.scholarslab.org/digital-humanities/introducing-davila/#comments</comments>
		<pubDate>Fri, 07 May 2010 19:10:46 +0000</pubDate>
		<dc:creator>Jean Bauer</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Grad Fellows]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://www.scholarslab.org/?p=732</guid>
		<description><![CDATA[Jean Bauer, former Scholars&#8217; Lab Graduate Fellow in Digital Humanities, announces: &#8220;I have just released my first open source project. HUZZAH!&#8221; DAVILA is a database schema visualization/annotation tool that creates “humanist readable” technical diagrams. It is written in Processing with the toxiclibs physics library and released under GPLv3. DAVILA takes in the database’s schema and [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>Jean Bauer, former Scholars&#8217; Lab Graduate Fellow in Digital Humanities, announces: &#8220;I have just released my first open source project. HUZZAH!&#8221;</p>
<p>DAVILA is a database schema visualization/annotation tool that creates “humanist readable” technical diagrams. It is written in <a href="http://processing.org/" target="_blank">Processing</a> with the <a href="http://toxiclibs.org/" target="_blank">toxiclibs</a> physics library and released under GPLv3. DAVILA takes in the database’s schema and a pipe-separated customization file and uses them to produce an interactive, color-coded, annotated diagram similar in format to UML. There are many applications that will create technical diagrams based on database schema, but as a digital humanist I require more than they can provide. <span id="more-732"></span></p>
<p>Technical diagrams are wonderfully compact ways of conveying  information about extremely complex systems.  But they only work for  people who have been trained to read them.  If you design a database for  a historian, and then hand him or her a basic E-R or UML diagram, you  will end up explaining the diagram’s nomenclature before you can talk  about the database (and oftentimes you run out of time before getting  back to the research question underlying the database).  This removes  the major advantage of technical diagrams and can also create an  unnecessary divide between the technical and non-technical members of a  digital humanities development team.</p>
<p>I have become fascinated by how documenting a project (either in development or after release) can build community. I’m not just talking about user-generated documentation (à la wikis), but rather the feeling created by a diagram or README file that really takes the time to explain how the software works and why it works the way it does. There is a generosity and even warmth that comes from thoughtful, helpful documentation, just as inadequate documentation can make someone feel stupid, slighted, or unwanted as a user/developer. I will be writing on this topic more in the months to come (perhaps leading up to an article). In the meantime, check out DAVILA and let me know what you think.</p>
<p>Project homepage: <a href="http://www.jeanbauer.com/davila.html" target="_blank">http://www.jeanbauer.com/davila.html</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/introducing-davila/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mr. Voronoi, meet the US state boundaries</title>
		<link>http://www.scholarslab.org/geospatial-and-temporal/mr-voronoi-meet-the-us-state-boundaries/</link>
		<comments>http://www.scholarslab.org/geospatial-and-temporal/mr-voronoi-meet-the-us-state-boundaries/#comments</comments>
		<pubDate>Thu, 25 Mar 2010 18:57:34 +0000</pubDate>
		<dc:creator>kgj3t</dc:creator>
				<category><![CDATA[Geospatial and Temporal]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://www.scholarslab.org/?p=609</guid>
		<description><![CDATA[In the Scholars&#8217; Lab we are working with remarkably detailed datasets showing changes to US political boundaries over time.  We&#8217;ve all been fascinated with visualizations where the familiar outlines of the US states emerge from thousands of boundary changes to their underlying counties over the last few hundred years.  Did you know Virginia once spanned [...]]]></description>
			<content:encoded><![CDATA[<p>In the Scholars&#8217; Lab we are working with remarkably detailed datasets showing changes to US political boundaries over time.  We&#8217;ve all been fascinated with visualizations where the familiar outlines of the US states emerge from thousands of boundary changes to their underlying counties over the last few hundred years.  Did you know Virginia once spanned from the Atlantic Ocean to the Mississippi River?</p>
<p><a rel="attachment wp-att-634" href="http://www.scholarslab.org/geospatial-and-temporal/mr-voronoi-meet-the-us-state-boundaries/attachment/virginiatomiss/"><img class="alignnone size-full wp-image-634" src="http://www.scholarslab.org/wp-content/uploads/2010/03/VirginiaToMiss.png" alt="Virginia" width="476" height="343" /></a></p>
<p>We&#8217;re developing a new web-based tool for visualizing these historic boundary changes and it&#8217;s nearly ready for prime time.  We&#8217;ll  announce the beta release here soon.</p>
<p>So with the knowledge that US state boundaries have already been subject to drastic change over time, let&#8217;s have some fun with geographic information systems to visualize drastic mathematically-induced changes to those familiar US state boundaries.</p>
<p>For our experiment, let&#8217;s keep all our current state capital cities right where they are, since they are laden with the necessary infrastructure of government. But we&#8217;ll move the state boundary lines <a href="http://mathworld.wolfram.com/VoronoiDiagram.html" target="_blank">Voronoi-style</a>, so that anywhere you travel in each of our new states you&#8217;ll be closer to that state&#8217;s capital than to any other state capital. In other words, when you&#8217;re standing anywhere inside our newly outlined Virginia, you will always be closer to the Virginia state capital, Richmond, than to any other state capital. That seems very efficient. Let&#8217;s have a look.</p>
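<p>The rule behind a Voronoi layout is simple enough to sketch in a few lines: a point belongs to whichever capital is nearest. The snippet below is only an illustration of that rule, not the GIS workflow used for the maps in this post. It assumes rounded, approximate coordinates for four capitals, uses great-circle distance (a GIS tool would typically work in a projected plane), and the function names are our own.</p>

```python
import math

# Approximate (latitude, longitude) pairs, rounded for illustration only.
CAPITALS = {
    "Virginia": (37.54, -77.44),        # Richmond
    "North Carolina": (35.78, -78.64),  # Raleigh
    "West Virginia": (38.35, -81.63),   # Charleston
    "Maryland": (38.98, -76.49),        # Annapolis
}


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def voronoi_state(point, capitals=CAPITALS):
    """Return the state whose capital is nearest to the given (lat, lon) point."""
    return min(capitals, key=lambda state: great_circle_km(point, capitals[state]))


# Charlottesville, Virginia (approximate) stays with Richmond.
print(voronoi_state((38.03, -78.48)))
# Bristol, Virginia (approximate) is nearer to Charleston than to Richmond,
# so among these four capitals it changes hands.
print(voronoi_state((36.60, -82.19)))
```

<p>Applying this nearest-capital test to every location at once is what produces the polygon boundaries; with only four capitals in the table, the second result says nothing about the other capitals near Bristol.</p>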
<p><a rel="attachment wp-att-626" href="http://www.scholarslab.org/geospatial-and-temporal/mr-voronoi-meet-the-us-state-boundaries/attachment/usanow2/"><img class="alignnone size-large wp-image-626" src="http://www.scholarslab.org/wp-content/uploads/2010/03/USAnow2-1024x651.png" alt="US States with capitals" width="470" height="298" /></a></p>
<p>Here&#8217;s that familiar grade-school wall map of the lower 48 US states and their capital cities.   Now let&#8217;s tweak the map with GIS software to reconfigure the states, Voronoi-style.</p>
<p><a rel="attachment wp-att-629" href="http://www.scholarslab.org/geospatial-and-temporal/mr-voronoi-meet-the-us-state-boundaries/attachment/usathen2/"><img class="alignnone size-large wp-image-629" src="http://www.scholarslab.org/wp-content/uploads/2010/03/USAthen2-1024x655.png" alt="US Voronoi states with capitals" width="470" height="300" /></a></p>
<p>Wow, what a difference Voronoi makes.</p>
<p>Let&#8217;s measure just how much the states have changed in our new layout. In absolute terms, Utah and New Mexico make the biggest land grabs, while Texas and California lose the most real estate. But as a percentage of their current area, Rhode Island is the big winner, ballooning in size by over 240%, while Massachusetts shrinks by 60%.</p>
<p>To visualize the state-by-state changes, Todd Burks from neighboring Clemons Library overlaid the two maps.</p>
<p><a rel="attachment wp-att-729" href="http://www.scholarslab.org/geospatial-and-temporal/mr-voronoi-meet-the-us-state-boundaries/attachment/toddmashup/"><img class="alignnone size-large wp-image-729" src="http://www.scholarslab.org/wp-content/uploads/2010/03/ToddMashup-1024x655.jpg" alt="" width="470" height="300" /></a></p>
<p>Intrigued?  <a href="http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?id=1349&amp;pid=1347&amp;topicname=Create_Thiessen_Polygons_%28Analysis%29" target="_blank">Read more</a> about Voronoi and Thiessen polygon GIS techniques.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/geospatial-and-temporal/mr-voronoi-meet-the-us-state-boundaries/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More on Pandora:  genres, genomes, and musical taste&#8230;</title>
		<link>http://www.scholarslab.org/digital-humanities/more-on-pandora-genres/</link>
		<comments>http://www.scholarslab.org/digital-humanities/more-on-pandora-genres/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 14:39:52 +0000</pubDate>
		<dc:creator>jbk3y</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=76</guid>
		<description><![CDATA[Hello. In my last blog, I began my discussion of Pandora.com, the streaming audio website which offers a new kind of web radio to listeners. Enter a “seed” song into Pandora’s search engine, and the site will create a streaming “station” composed of songs that resemble your seed song. This process is powered by the [...]]]></description>
			<content:encoded><![CDATA[
<p class="MsoNormal">Hello. In my last blog post, I began my discussion of <a href="http://www.pandora.com" target="_blank">Pandora.com</a>, the streaming audio website which offers a new kind of web radio to listeners. Enter a “seed” song into Pandora’s search engine, and the site will create a streaming “station” composed of songs that resemble your seed song. This process is powered by the Music Genome Project, a massive research endeavor which began in the early 2000s and is based out of the company’s Oakland, California headquarters.</p>
<p class="MsoNormal">How is Pandora’s song-recommendation engine different from the web radio platforms that came before it? Well, the majority of other online radio stations, such as last.fm, operate on a system called collaborative filtering. What is collaborative filtering? In layperson’s terms, collaborative filtering involves matching one user’s taste to another’s (or to those of a series of other people). On a site like <a href="last.fm" class="broken_link">last.fm</a>, over time a user amasses a playlist of songs they’ve expressed a preference for, a sort of musical taste profile. Last.fm’s search tools automatically identify other users with whom your tastes seem to overlap, and use this information to power “radio” stations you can stream on the site. The process is pretty simple, and <a href="http://www.wired.com/culture/lifestyle/news/2003/07/59522">based on personal intuition and the data existing users have already entered into the system</a>. Collaborative filtering powers aspects of many media websites, such as Amazon.com’s personal recommendation feature for shoppers.</p>
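<p>For readers who like to see the mechanics, here is a toy sketch of the matching idea in Python. It is my own illustration, not last.fm’s or Amazon’s actual method: I am assuming that taste overlap is measured with Jaccard similarity between sets of liked songs, and the user names and song labels are invented.</p>

```python
def jaccard(a, b):
    """Overlap between two sets of liked songs, from 0.0 (none) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes, k=2):
    """Suggest songs liked by the k users whose taste profiles best match `user`."""
    mine = likes[user]
    others = [u for u in likes if u != user]
    neighbours = sorted(others, key=lambda u: jaccard(mine, likes[u]), reverse=True)[:k]
    scores = {}
    for u in neighbours:
        weight = jaccard(mine, likes[u])
        for song in likes[u] - mine:  # only songs the user has not already liked
            scores[song] = scores.get(song, 0.0) + weight
    # Highest score first; ties broken alphabetically so the result is stable.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Invented taste profiles.
likes = {
    "ana": {"s1", "s2", "s3"},
    "ben": {"s1", "s2", "s4"},
    "cy": {"s2", "s3", "s4", "s5"},
    "dee": {"s6"},
}

print(recommend("ana", likes))
```

<p>Here “ana” is matched with “ben” and “cy”, and the song both of them like ranks first. Nothing in the sketch looks at what the songs sound like; collaborative filtering works only from who liked what.</p>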
<p class="MsoNormal"><span id="more-76"></span>It does have some limitations for online radio listeners, however.<span> </span>As Pandora’s founder Tim Westergren pointed out in a 2006 interview with Leo Laporte and Amber MacArthur, collaborative filtering-powered online radio stations have a tendency to only recommend what is broadly popular in contemporary pop music.<span> </span>While independent-label music certainly has a strong presence on last.fm, a quick scan of various users’ profiles on the site may suggest that Westergren has a point.<span> </span>Even among “indie” users on last.fm, there’s a whole lot of Death Cab for Cutie and Modest Mouse ruling the playlists (nothing against either of these bands).<span> </span>Collaborative filtering doesn’t necessarily ensure that the site’s users will discover truly obscure stuff they hadn’t heard of before.<span> </span>And in keeping with my interest in genre boundaries vis-à-vis Internet radio, in interviews Westergren has attributed the problem to the mainstream music business’ interest in keeping consumers bracketed into genre-specific niches.<span> </span>In the aforementioned chat with Laporte and MacArthur, Westergren cited the <a href="http://twit.tv/itn6" target="_blank">“age-old problem in the music industry”</a> wherein a tiny percentage of music released by a given label typically accounts for nearly all its sales—a problem codified by genre boundaries.</p>
<p class="MsoNormal">Pandora, through its Music Genome Project, aims to circumvent this problem, by offering its users a new kind of recommendation engine.<span> </span>As I mentioned in my earlier post, the Music Genome is a systematic endeavor to deconstruct and analyze individual pop songs using over 400 “musical attributes” that the company has identified.<span> </span>These attributes include everything from tempo, to vocal timbre, to harmonic movement—even sound production aspects like echo and reverb.<span> </span>In other words, it is essentially a musicological approach in the strictest sense of the word.<span> </span>The focus is on sound itself, rather than a band’s cultural associations with other bands (as is the case in collaborative filtering).<span> </span>Indeed, Westergren bragged in the aforementioned interview that “when we recommend to you a piece of music, we don’t even <em>know</em><span> how popular it is.”<span> </span></span></p>
<p class="MsoNormal">Instead, the Music Genome Project entails the company’s roughly fifty analysts sitting down in the Oakland, CA headquarters and methodically tagging a given song using these 400+ attributes. Westergren has described the process in ways akin to the scientific method, noting that a percentage of songs the analysts deconstruct are reviewed twice for quality. The songs, categorized by attributes, are added to the more than 500,000 songs (and counting) accumulating in the company’s database. Songs sharing a similar musical “DNA” are then automatically matched and linked by Pandora’s search engine when you enter a “seed” song. Westergren has called the Genome “kind of like a musical taxonomy,” and I don’t think this language is accidental. As Fabian Holt has pointed out about musical genres, “Discourse on the temporal dimensions of categories is saturated with organicist metaphors, as in discussions of how genres are <em>born</em>, how they <em>grow</em>, <em>mature</em>, <em>branch off</em>, <em>explode</em>, and <em>die</em>.”<sup>1</sup> Even though Pandora in fact aims to get <em>around</em> genre, it seems to me that this biological language informs the company’s mission and direction.</p>
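<p>Pandora has not published how this matching works, so the following is only a guess at the general shape of an attribute-based approach. I am assuming that each song is scored on a handful of attributes and that songs are compared by cosine similarity; the attribute names, scores, and song labels are all invented for illustration.</p>

```python
import math

# Invented attribute scores on a 0-5 scale; not Pandora's real attributes or data.
SONGS = {
    "seed": {"tempo": 2.0, "twang": 4.0, "reverb": 1.0},
    "song_x": {"tempo": 2.0, "twang": 4.5, "reverb": 1.0},
    "song_y": {"tempo": 5.0, "twang": 0.5, "reverb": 4.0},
}


def cosine(a, b):
    """Cosine similarity between two attribute dictionaries."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def station(seed, catalog):
    """Order every other song in the catalog by similarity to the seed song."""
    candidates = [title for title in catalog if title != seed]
    return sorted(candidates, key=lambda t: cosine(catalog[seed], catalog[t]), reverse=True)


print(station("seed", SONGS))
```

<p>Note what is absent: no play counts, no user profiles, no genre labels. In a scheme like this, a song’s popularity never enters the calculation, which fits Westergren’s remark quoted above.</p>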
<p class="MsoNormal">In any case, as a Pandora user, I have often benefited from the happy accidents occasioned by the way the Music Genome Project works.<span> </span>For instance, I entered in Pandora as a &#8220;seed&#8221; Bob Dylan&#8217;s song &#8220;Tonight I&#8217;ll Be Staying Here With You&#8221;, a lilting, mid-tempo country-rock stroll.<span> </span>The Genome built a streaming station for me that included folk-rocky chestnuts by relatively obscure ‘60s and ‘70s groups like UFO and Earth Opera.<span> </span>It&#8217;s likely that I would not have heard about these groups without Pandora, or at least that I would&#8217;ve heard about them years from now in another context.<span> </span>In this regard, it seems that Westergren does have something to boast about regarding his claim that Pandora&#8217;s search engine connects listeners with &#8220;invisible&#8221; music in a way that mainstream, genre-bound, multinational music corporations just can&#8217;t.</p>
<p class="MsoNormal">On the other hand, there are several notable gaps in the logic and execution of Pandora and the Music Genome Project model.<span> </span>The first gap I feel compelled to point out is a very practical one.<span> </span>Returning to my example of the station based around &#8220;Tonight I&#8217;ll Be Staying Here With You,&#8221; Pandora is skilled in giving a user a lot of what they like.<span> </span>Enter in a twangy rock song like Dylan’s and you’ll get a station with loads of twangy rock songs.<span> </span>But there can be too much of a good thing; namely, I find homogeneity of songs’ <em>tempo</em><span> an issue on Pandora stations.<span> </span>“Tonight I’ll Be Staying Here With You” is a bit plodding, and I’ve found that over a few hours of playing this station, I mostly get one plodding song after the next.<span> </span>This can be useful in terms of finding hidden gems, but makes for monotonous, even frustrating listening over a span of a few hours. </span></p>
<p class="MsoNormal">I do know that Pandora makes much of its “Thumbs Up/Thumbs Down” feature, which allows the user to indicate her or his preference for a given song.<span> </span>Pandora’s algorithms will adjust the playlist’s direction (ever so slightly) upon a “Thumbs Down” for a song you don’t care for.<span> </span>In a 2006 interview with the <em>New York Times</em><span>, Westergren describes this feature as <a href="http://www.nytimes.com/2006/09/03/arts/music/03leed.html?_r=1&amp;pagewanted=all">a concession to human subjectivity</a> (within an otherwise “objective” platform), and I agree.<span> </span>The “Thumbs Up/Thumbs Down” feature requires active listening and participation on the user’s part—generally a good thing, I’ll admit.<span> </span>But what if I want to just sit back with a cold beverage and let the music play?<span> </span>The Genome’s platform, as it currently works, seems unable to deliver the ebbs and flows in tempo and musical texture which I enjoy in a good mixtape or college radio show.</span></p>
<p class="MsoNormal">These kinds of practical gaps in Pandora’s service point me toward a larger theoretical problem worth discussing. In its insistence upon musical sound as <em>the</em> key ingredient for making song recommendations, the Music Genome Project willingly suspends belief in some basic social facts about the way music works. Music is undeniably social, cultural, and political. It’s the soundtrack to our lives as we dance, eat dinner, exercise, commute to work, fall in love, and so on. Music blasts out of loudspeakers at political rallies. We argue with friends over drinks about the relative merit of this or that musical group. And music is always part of a commercial marketplace, even in this age of file-sharing. Given all this, I find Westergren’s claim that a band’s marketplace popularity is <a href="http://twit.tv/itn6">“completely irrelevant to what we do”</a> a wee bit disingenuous, or at the very least requiring a willing suspension of disbelief regarding music’s social and marketplace role.</p>
<p class="MsoNormal">The Music Genome Project’s near-exclusive focus on sound itself, coupled with its organicist rhetoric regarding “musical DNA”, seems to suggest the company believes it can map out music in its totality—that it can “crack the code” of music, so to speak.<span> </span>As an aspiring musicologist, this reminds me a bit of another massive scholarly endeavor which worked toward a similar goal of cataloging music:<span> </span>Alan Lomax’s Cantometrics project.<span> </span>Developed by Lomax in the late 1950s and into the ‘60s, Cantometrics was a project wherein Lomax and several co-researchers analyzed the performance styles of (mostly traditional “folk”) songs from hundreds of different cultures around the world, tagging them with a variety of traits.<span> </span>These performance traits, such as vocal timbre, were organized into a computerized system wherein elements of the different musics could be compared.<span> </span>Lomax made the bold claim that one could draw conclusions about the social structure of a given society based on some of these performance traits (societies noted for a certain style of singing were sexually repressive, for instance).<span> </span>Of course, this claim was quite controversial, and has been challenged by other scholars since as overly reductive and essentialist.<span> </span>Fortunately, the Music Genome Project doesn’t attempt to make the connection between music and social structures the way Cantometrics did; indeed, as I said, the Genome Project’s rhetoric seems to <em>deny</em><span> aspects of the social world, if anything.</span></p>
<p class="MsoNormal">However, in the desire to systematically categorize and compare different aspects of music, one could say that the Genome Project and Cantometrics spring from a shared wellspring of human curiosity.<span> </span>One issue with this categorizing mission, though, is the problem of sample size.<span> </span>Lomax’s research was criticized for not casting a broad enough net in collecting these comparable performance traits.<span> </span>One could ask similar questions about the Music Genome Project’s scope.<span> </span>As I mentioned, the company’s website points out that its database currently features over 500,000 songs, and counting.<span> </span>This seems like a lot, but how useful is that number, when one considers the thousands and thousands of songs which are released commercially every year?<span> </span>And how can one ensure that multiple varieties, styles, and (yes, even) genres of music are adequately represented within those 500,000 songs?</p>
<p class="MsoNormal">Additionally, Pandora shares potentially problematic assumptions with Cantometrics regarding humans’ ability to fully categorize and catalog the world, to reduce music to its essence.<span> </span>This descends from the Enlightenment idea that our natural world is fully knowable through empirical and objective observation.<span> </span>That bedrock assumption has been the basis for the natural sciences, and one can see its influence on a project like Alan Lomax’s.<span> </span>The problem is:<span> </span>while an empirical observation approach might work well for classifying different varieties of tree frogs, when one wades into the murky waters of human behavior, it’s a lot more difficult to claim objectivity.<span> </span>Indeed, for myself and for a growing number of musicologists and humanities scholars more broadly, it is basically impossible to claim objectivity in one’s understanding of the world.</p>
<p class="MsoNormal">This is not to say that Pandora explicitly makes a claim of total objectivity, on its website or elsewhere. But as with Cantometrics, the fact that the company breaks songs down into discrete components and then makes comparisons and connections based on those components suggests that it believes music is knowable in some objective way. Pandora hasn’t made public a list of its over 400 “musical attributes”, but <a href="http://blog.pandora.com/faq/#92">shares a handful of them</a> on its website’s Frequently Asked Questions page. Some of the attributes it shares make a lot of sense, and could even be called “objective”: major or minor key tonality, for instance. But consider an attribute like “headnodic beats”: in its FAQ entry, Pandora’s analysts admit they created the term themselves (it describes hip-hop beats which are strong, but not forceful enough to dance to). Given that probably almost no one outside of the Pandora offices uses this term, it can’t reasonably be called objective. This is not to say that an identification of some subjectivity within Pandora’s research model makes the whole enterprise come crashing down. Rather, I just wish to point out that while the company prides itself on the cold objectivity of a computer algorithm choosing your music for you, human beings with subjective viewpoints created the components which power that algorithm.</p>
<p class="MsoNormal">Related to this, both Pandora and Cantometrics raise questions regarding musical gatekeepers, tastemakers, and their authority. Indeed, as ethnomusicologist Steven Feld has mused in response to Lomax’s work, “What are the sources of authority, wisdom, and legitimacy about sounds and music? Who can know about sound? Is musical knowledge public, private, ritual, esoteric?”<sup>2</sup> Many researchers of pop music, pop culture, and genre agree that this issue of <em>who</em> is doing the classifying, categorizing, and ranking is a really important question. For instance, snobby clerks at your local independent record store may decide that Gillian Welch’s music belongs in the “folk” rather than “rock” section of the store. But where does the authority behind their judgment come from? Their judgment is informed by their life experiences and backgrounds as (mostly) well-educated middle-class white males.</p>
<p class="MsoNormal">These gatekeepers’ judgments are also informed by a deep knowledge of various musical genres: the ability to distinguish glam from punk from grunge, and so on. Since Pandora’s fifty music analysts perform a similar function, I find this aspect of their job paradoxical: though the company seems to pride itself on getting beyond musical genre, these analysts must be extremely well-versed in genre in order to do their jobs well. In a recent video post on Pandora’s blog, Westergren states that the purpose of the Music Genome Project is ultimately to connect musicians with audiences, in ways the traditional music business can’t.<sup>3</sup> This is notably egalitarian rhetoric; it works off the assumption that consumers and musicians are empowered enough to seek each other out, and that they don’t need tastemakers dictating what music they should like.</p>
<p class="MsoNormal">As I noted above, however, the computerized system that listeners use to connect with musicians is designed and maintained by a group of (relatively) elite tastemakers.<span> </span>And in Westergren’s public statements about these analysts’ qualifications, I read a certain degree of anxiety over what kind of authority is vested in that role of analyst.<span> </span>In his 2006 interview with Leo Laporte and Amber MacArthur, Westergren pointed out that while all their analysts are regularly-gigging musicians, in order to carry out the depth of analysis required for the Music Genome Project, one really “need[s] an academic background”.<span> </span>Thus, in addition to being a working musician, an analyst employed by Pandora also needs at least a four-year undergraduate degree in music theory.<span> </span></p>
<p class="MsoNormal">On a practical level, this makes a lot of sense to me.<span> </span>If you’re going to employ folks to analyze songs for you, wouldn’t you want them to have an understanding of musical principles on several different levels?<span> </span>On the other hand, on a theoretical level, Pandora’s insistence on both “street” and “book” smarts from its analysts demonstrates an unresolved subliminal conflict over whether &#8220;brains and corporate no-how&#8221; or &#8220;gut, ‘Id’ feelings&#8221; are what shape the music we listen to.<span> </span>Thus, in this way, Pandora and the Music Genome Project struggle with these issues of taste and knowledge hierarchies just like other public pop prognosticators, even as their seemingly objective research platform denies this social fact.</p>
<p class="MsoNormal">This may read as though I am beating up on Pandora, but I hope the position I’m staking out is subtler than that.<span> </span>Rather, I have simply been attempting to point out some slight contradictions of logic within the Music Genome Project’s overall research platform.<span> </span>On a practical, user’s level, I enjoy the site.<span> </span>And to be fair to Pandora’s employees, on a certain level they seem to recognize the issues I am bringing up here.<span> </span>For instance, in a recent post on Pandora’s official blog by one of its music analysts, Michael Zapruder likens evaluating songs to judging a baby beauty contest, and then points out,</p>
<p class="MsoNormal">&#8220;<em>The idea that all music is equal and deserves equal rights is somehow fundamentally a democratic idea; as is the corresponding idea that the public, and not some small cadre of experts, is the best judge of musical quality.<span> </span>But the fact that some music not only attracts more listeners, but also seems to mean more to more people over a longer period of time, indicates that there is actually something fundamentally unequal about music as well</em>.&#8221;4</p>
<p class="MsoNormal">In other words, perhaps this issue of taste isn’t an “either/or” problem, but rather a “both/and” one.<span> </span>And by its nature, it’s most likely a problem with no definitive answer.</p>
<p class="MsoNormal">It seems that in blog entries like this one, Pandora employees such as Zapruder are trying to find a <em>practical</em><span>, everyday way of working around and through this problem, and I can’t fault them for that.<span> </span>Certainly, the academic in me bristles when I see Pandora present something like “headnodic beats” as some kind of objective criterion for judging music.<span> </span>But on a practical level, it seems that these classifications, even if they’re vague (such as<span> </span>“vinyl ambience,” or what have you) are perhaps vague at least partly in the service of the </span><em>listener’s</em><span> experience, that is, of trying to match users to interesting new music.<span> </span>It doesn’t seem that the point of the Genome is to categorize musical attributes simply for the sake of categorization.<span> </span>Rather, the point seems to be to put that information to use, making musical connections for the listener.<span> </span>So perhaps there’s a utilitarian reason why the Music Genome cuts certain logical corners on the “objective vs. subjective” question.</span></p>
<p class="MsoNormal">Ultimately, Pandora’s service rests upon the assumption that sound itself is the only aspect which really matters when analyzing different forms of music.<span> </span>It assumes that sound automatically trumps the sociocultural boundaries of genre, taste, and marketplace.<span> </span>This isn&#8217;t true, of course:<span> </span>in the real world we live in, rhetoric surrounding genre and taste guides the musical choices we all make, from Walmart AC/DC lovers to bebop nerds.<span> </span>But the Music Genome Project’s fiction regarding the supremacy of sound is an important, if very one-sided, position to have out there in the world.<span> </span>In fact, it’s almost counter-cultural in a way, because journalists and advertisers often focus so much on <em>image</em><span> when considering contemporary pop music.<span> </span>Pandora’s vision is a kind of imagined musical utopia, making a particularly 21st-century-specific stand for the importance of musical sound, a stand made possible by the shared </span><em>cultural</em><span> resource of the Internet.</span></p>
<p class="MsoNormal">Finally, closing with an idea my professor Fred Maus pointed out to me, when you&#8217;re confronted with enjoying a song you didn&#8217;t think you would like on Pandora (you enter in a Turbonegro song as your “seed” and are rewarded with a Poison song, for example), that tells us something important about genre boundaries.<span> </span>Your bemusement proves that musical genres exist.<span> </span>They&#8217;re cultural; they don&#8217;t hold up to objective scrutiny.<span> </span>And they&#8217;re based on something more than just musical sound; they&#8217;re built around assumptions that have to do with hierarchies of taste and class.<span> </span>Thus, paradoxically, we can learn quite a bit about the rules of genre from a website devoted to transcending those rules.</p>
<div>
<hr size="1" />
<div>
<p class="MsoNormal">1  Holt, Fabian.<span> </span><em>Genre in Popular Music</em><span>.<span> </span>Chicago:<span> </span>University of Chicago Press, 2007.<span> </span></span>Pg. 14.</p>
<p class="MsoFootnoteText">2  Feld, Steven.<span> </span>“Sound Structure as Social Structure.”<span> </span><em>Ethnomusicology</em><span>, Vol. 28. No. 3 </span>(Sept. 1984), pp. 383-409.</p>
<p class="MsoFootnoteText">3  http://blog.pandora.com/pandora/archives/2009/03/index.html<span> </span>March 15, 2009 entry.</p>
<p class="MsoFootnoteText">4  http://blog.pandora.com/pandora/archives/2009/02/index.html      February 25, 2009 entry.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/more-on-pandora-genres/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Mapping the Digital Diaspora of a Dissertation Research Blog</title>
		<link>http://www.scholarslab.org/digital-humanities/mapping-out-the-geography-of-an-%e2%80%9casian-american-music%e2%80%9d-blog/</link>
		<comments>http://www.scholarslab.org/digital-humanities/mapping-out-the-geography-of-an-%e2%80%9casian-american-music%e2%80%9d-blog/#comments</comments>
		<pubDate>Mon, 04 May 2009 14:32:00 +0000</pubDate>
		<dc:creator>wendyhsu</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Geospatial and Temporal]]></category>
		<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=63</guid>
		<description><![CDATA[At the onset of my field research in summer 2007, I launched a blog – YellowBuzz.org – with the intention to: 1) archive and organize my field notes in textual and audio-visual form; 2) convey my research purpose and progress to informant musicians and the public; 3) self-position as a “participant” in the scene. Since [...]]]></description>
			<content:encoded><![CDATA[<p>At the onset of my field research in summer 2007, I launched a blog – <a href="http://yellowbuzz.org" target="_blank">YellowBuzz.org</a> – with the intention to: 1) archive and organize my field notes in textual and audio-visual form; 2) convey my research purpose and progress to informant musicians and the public; 3) self-position as a “participant” in the scene. Since then, I have made over 160 posts, some directly linked and others tangentially related to my research findings about the activities and media of Asian American indie rock musicians. Over the past one and a half years, my field research blog has received attention from both print and online media.  Evidently, this blog has constructed a community consisting of musician- and music-enthusiast-visitors with an interest in Asian American and transpacific music-culture.<span id="more-63"></span></p>
<p>This past January, I began tracking the blog traffic by using <a href="http://www.google.com/analytics/">Google Analytics</a>. This service monitors the physical location of site visitors and their interactions with the pages on the site. The geographical data are analyzed in terms of the number of visits per unit of geographical organization such as city, country/territory, sub continent region, and continent.  This information is also visualized in the form of an interactive map on which users can zoom in and out of specific locales and find site visit patterns specific to cities, countries, regions, or continents in the world.</p>
<p>Over the last four months, I have been playing with the Map Overlay function, which projects geospatial patterns of the site traffic on my blog. These interactive moments have helped me imagine interesting questions such as: What is the geography of an electronic community based on the topic of “Asian American music,” the tagline of my blog? What does the geo-spatial terrain of this “digital diaspora” look like? Are there any striking patterns at each of the organizational levels, namely the city, country, sub-continental region, and continent? What spatial boundaries are transcended and created in these visualizations? Or, fancifully, how does the digital geography of my blog reconfigure the more general social geography of “Asian America” online or offline?</p>
<p>Today marks a 4-month anniversary of this thought experiment. I decided to take some screen shots of a few of the visualizations that I’ve found more meaningful in Google Analytics. This analysis uses data from a sample of 3,061 site visits collected from January 1 to April 30, 2009. I will highlight a few interesting findings below:</p>
<p><a href="http://lh4.ggpht.com/_POx44XG38pY/SfsEEpcmLGI/AAAAAAAABME/jPp5MEsm9t0/s800/Analytics_Map_UScities.jpg"><img style="float:right" src="http://lh4.ggpht.com/_POx44XG38pY/SfsEEpcmLGI/AAAAAAAABME/jPp5MEsm9t0/s800/Analytics_Map_UScities.jpg" alt="blog visits in U.S. cities" width="240" /></a>1) Here’s a map of blog visits in various U.S. cities. It appears that the visitors are concentrated in central Virginia (the home of yours truly), New York City, Boulder, Los Angeles, and San Francisco. Other than central Virginia and Boulder, these are areas of high concentration of Asian Americans and indie rock activities. I&#8217;m not quite sure how to explain the traffic flow from the Denver area (Boulder and Aurora, ranked sixth in this map) other than to link it to the thriving indie rock scene in Boulder and the physical location of an Asian/Japanese music blogger Shay of <a href="http://sparkplugged.net/" target="_blank">Sparkplugged</a>.</p>
<p><a href="http://lh4.ggpht.com/_POx44XG38pY/SfsEE0fNXFI/AAAAAAAABMM/LccJoFTelNY/s800/Analytics_May_countries_PieChart.jpg"><img style="float:right" src="http://lh4.ggpht.com/_POx44XG38pY/SfsEE0fNXFI/AAAAAAAABMM/LccJoFTelNY/s800/Analytics_May_countries_PieChart.jpg" alt="site visits per country" width="240" /></a>2) According to this chart, 76% of the site visits have occurred within the boundaries of the United States. Next on the list are Canada, the United Kingdom, and Australia, all English-speaking countries with close historical ties to American music. In the continent of Asia, countries such as Taiwan, South Korea, the Philippines, and Singapore have among the highest numbers of visitors to my site. I attribute this pattern to my blog posts about U.S.-based artists who have a large following in these particular countries. Specifically, Hsu-nami (of New Jersey) and Johnny Hi-Fi (SF-based) have strong ties to Taiwan; Kite Operations (New York) to South Korea; Plus/Minus (New York) to the Philippines and Taiwan.</p>
<p><a href="http://lh3.ggpht.com/_POx44XG38pY/SfsEE4M02WI/AAAAAAAABMU/3_qKDX7hSoY/s800/Analytics_piechart_subcont.jpg"><img style="float:right" src="http://lh3.ggpht.com/_POx44XG38pY/SfsEE4M02WI/AAAAAAAABMU/3_qKDX7hSoY/s800/Analytics_piechart_subcont.jpg" alt="site visits per sub-continent region" width="240" /></a>3) This last chart represents the sub-continental spread of the site visits. North America takes the lead (taking 80% of all visits). Northern Europe and Eastern Asia tie for second, followed by South-Eastern Asia and Western Europe. I’m not quite sure how to explain the high number of visits from Northern Europe other than to link it to the popularity of the Taiwanese metal band Chthonic in Northern Europe. Chthonic has a strong international presence, having worked with producers in Denmark and the U.S. including Rob Caggiano, the guitarist of Anthrax. In 2007, Chthonic toured with the OzzFest and established close ties with Taiwanese-American-led erhu rock group Hsu-nami.</p>
<p>So what does this all mean? YellowBuzz, a blog on “Asian American music”, has constructed a global, transnational readership. Asian America in the online digital environment exists beyond the boundaries of the United States and the Asian continent. These observations of transnational crossings work against the geography of Orientalism: a now-classical theory within postcolonial studies that refers to the representational control of the non-west by western-produced discourse.  The digital diaspora of YellowBuzz has tampered with the so-called east-west binary.</p>
<p>Now if I were serious about pursuing the research on the transnationality of Internet music journalism, I would look for a correlation between blog content and traffic patterns. This would require systematic, post-to-post observations. I would also consider mapping information regarding Internet access and user demographic with the intention to find links between the blog statistics and general Internet sociality. I would also look for statistical and mapping methods more powerful than Google Analytics.</p>
<p>But – to get back to my dissertation that asks: What paths do musicians and their music take as they establish routes crossing territories constructed by nation-states, corporations, international laws, etc? Unfortunately, these visualizations lack the analytical strength to provide an insight on the musicians’ perspective on the scene. They have offered a perspective on media, in particular in understanding the role of a music blog in constructing “Asian America.”</p>
<p>In the coming months, I will be working on a digital humanities project with Joe Gilbert at UVa’s <a href="http://www2.lib.virginia.edu/scholarslab/">Scholars’ Lab</a> pursuing questions related to the musicians’ side of the story. I hope to unravel the terrain of musicians’ sociality within the transnational scene of indie rock music by mapping out their tours, their networks on social networking sites (SNS), and their record distribution. Meanwhile, I’m experiencing a bout of euphoria, loving the fact that I have reclaimed a free market analytical tool offered by Google for my academic(-y) ethnomusicological thought experiment.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/mapping-out-the-geography-of-an-%e2%80%9casian-american-music%e2%80%9d-blog/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Pandora and the &#8220;genes&#8221; of music genres</title>
		<link>http://www.scholarslab.org/digital-humanities/pandora-and-the-genes-of-music-genres/</link>
		<comments>http://www.scholarslab.org/digital-humanities/pandora-and-the-genes-of-music-genres/#comments</comments>
		<pubDate>Wed, 08 Apr 2009 20:23:39 +0000</pubDate>
		<dc:creator>jbk3y</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=59</guid>
		<description><![CDATA[Hello, it’s been a while since I blogged. You may remember me as the music Ph.D. student who was last heard from pondering the uses of Google Scholar. I’m on a new mission this semester, studying for my comprehensive exams. One of the topics I am researching and preparing an essay on is about genre [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal"><span> </span>Hello, it’s been a while since I blogged.<span> </span>You may remember me as the music Ph.D. student who was last heard from pondering the uses of Google Scholar.<span> </span>I’m on a new mission this semester, studying for my comprehensive exams. <span> </span>One of the topics I am researching and preparing an essay on is about genre in popular music.<span> </span>The concept may seem initially so self-evident, you may wonder what there is to write about it, per se.<span> </span>Oh, but there’s lots.<span> </span>This is because the issue of genre always involves the issue of classification, which inherently provokes debate.<span> </span>Take, for instance, a star performer like Beck.<span> </span>His music often includes acoustic guitar, and he’s covered Mississippi John Hurt.<span> </span>So he must be a folkie. Oh wait, but he also apes Prince on some funky jams.<span> </span>So maybe he’s a pop star.<span> </span>But he also headlines a bunch of big rock festivals, and we find his music in the “Rock” section at the record store (wait, what’s a record store?).<span> </span>So I guess we’ll call him a rocker.</p>
<p class="MsoNormal">My point being, popular music can be difficult to pin down using genre tags.<span> </span>You’ll find this evidenced in any number of press interviews with musicians who, when pressed by a journalist, pull out that time-worn chestnut that their sound is “unclassifiable”. Genre tags, be they pop, country, rock, hip-hop, salsa, or what have you, are almost like identifying pornography:<span> </span>I’ll know it when I see it.<span> </span>It’s often somewhat easier to identify what a genre <em>isn’t</em> than what it actually <em>is</em>.<span> </span><span id="more-59"></span>Fans and even so-called experts often have difficulty articulating why a particular song or artist fits in a given genre.<span> </span>Based on my readings for this exam topic thus far, I would argue this is because the act of classifying something is essentially making a statement about its <em>meaning</em>: <span> </span>not just semantic/musical meaning, but also meaning that’s intensely cultural, and often political.</p>
<p class="MsoNormal"><span> </span>That’s why I like musicologist Robert Walser’s definition of genre in popular music.<span> </span>Updating similar ideas that literary critic Tzvetan Todorov has explored, Walser argues that “Genres…come to function as horizons of expectations for readers (or listeners) and as models of composition for authors (or musicians).”<a name="_ftnref1" href="#_ftn1"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span>[1]</span></span><!--[endif]--></span></span></a><span> </span>In other words, genre labels are modes of discourse wherein musicians, fans, and music industry workers collaborate to make meaning surrounding the music they love.<span> </span>And perhaps <em>surrounding</em> is the key word there; one can debate about styles of music with a friend all night long, or a record store employee can create increasingly hyper-specialized bin cards for various sub-genres (Psychobilly, Krautrock, etc.).<span> </span>We circle around the sounds we hear through discourse about them, but ultimately, on a musical-sound level, how do we <em>know</em> that Beck belongs in the “rock” section?<span> </span>As Franco Fabbri has noted, problems with genre are “frontier” problems:<span> </span>“We meet with these whenever we attempt to indicate something which exists at the boundary of two or three zones of meaning.”<a name="_ftnref2" href="#_ftn2"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span>[2]</span></span><!--[endif]--></span></span></a><span> </span>When considering genre tags, there seems to exist an inescapable gap between our need to accurately label a piece of music and the slippery semantic meanings of recorded sound, which may in any given moment resemble two, three, or even more genres.</p>
<p class="MsoNormal"><span> </span>What then, friends, does this problem have to do with digital humanities?<span> </span>Well, as I research and read in preparation for this exam, I can’t help but frequently think of an Internet service called Pandora that’s richly illustrative of many of these genre issues.<span> </span>For those who may not know, <a href="http://www.pandora.com" target="_blank">Pandora</a>, a free website based out of Oakland,  California since the early ‘00s, is a relatively new and different kind of streaming online radio station.<span> </span>Whereas the playlist at a streaming station such as UVa’s own <a href="http://wtju.net/" target="_blank">WTJU</a> is determined by in-the-studio DJs, and the “radio” playlist at a website such as last.fm is determined by a process called collaborative filtering (more on this in my next blog), Pandora is notable for its novel method of selecting songs for listeners.<span> </span>Here’s how it works:<span> </span>enter an artist or song into Pandora’s search engine, and the website will issue you a streaming series of songs by artists considered similar to the one you entered.<span> </span>And how is this similarity determined?<span> </span>Through the Music Genome Project, Pandora’s massive undertaking and claim to fame.</p>
<p class="MsoNormal"><span> </span>The Music Genome Project, whose work is carried out by roughly fifty analysts at the company’s headquarters in Oakland, is an effort to deconstruct and categorize aspects of pop songs using over 400 different “musical attributes” that the company believes comprise the spectrum of recorded sound.<span> </span>It’s a rigorous close-listening endeavor, specifically intended to focus on aspects like timbre, tempo, harmonic movement, and instrumentation instead of aspects like album art and whether or not the artist has appeared on TRL.<span> </span>This scientific (or pseudo-scientific, depending upon one’s perspective) attention to sonic detail seems—to me at least—an attempt to get beyond the established languages of pop music genres by diving into the nitty-gritty which makes genres what they are.<span> </span>The company builds its credibility as an almost-biologic “genome” of musical characteristics through the depth and breadth of songs it has analyzed:<span> </span>over half a million and counting, according to Pandora’s <a href="http://blog.pandora.com/faq/" target="_blank">website</a><a name="_ftnref3" href="#_ftn3"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><!--[endif]--></span></span></a>.<span> </span>Pandora says that each of these songs is listened to for 20 to 30 minutes by a trained analyst who tags the song with Pandora-authored characteristics—everything from<span> </span>“meandering melodic phrasing” to “chopped and screwed production”.<span> </span>The company claims that theirs is the most comprehensive effort to systematically categorize music—ever.</p>
<p class="MsoNormal" style="0.5in;">From the perspective of an academic studying pop music genres, the Music Genome Project presents several fascinating issues.<span> </span>Until my next blog, I’ll briefly set aside an investigation of the company’s claim it can create a taxonomy of the “genes” of popular music.<span> </span>Instead, what I first find most interesting is the story of Pandora and the Music Genome Project’s origins.<span> </span>In a January 2006 <a href="http://twit.tv/itn6" target="_blank">interview</a> with podcast program “Inside the Net”<a name="_ftnref4" href="#_ftn4"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><!--[endif]--></span></span></a>, Pandora co-founder Tim Westergren told hosts Leo Laporte and Amber MacArthur that he first originated the idea of a music genome while a struggling rock musician himself.<span> </span>He told them his band was “facing the challenge of trying to get known,” and in so doing brainstorming about what aspects of a rock song tend to attract the most commercial attention.<span> </span>Additionally, Westergren shared the intriguing information that he was a film composer at the same time, working for hire to complement a director’s visuals in a given scene.<span> </span>He told Laporte and MacArthur that “In that capacity, one of the things that I had to do was to try and figure out the music taste of a film director.”<span> </span>Westergren said that this challenge was part of what got him thinking about music in terms of distinct, differentiable attributes.</p>
<p class="MsoNormal" style="0.5in;">It just so happens that one of my other exam topics this semester concerns film music soundtracks.<span> </span>Having read much of the academic literature on film music, I find what Westergren recounts here fascinating, though not surprising.<span> </span>The specific challenge a composer faces when working on a film (How do I balance my need to please the director with my need to express personal creativity?) is a central theme of the literature.<span> </span>Scholars even further back than Irwin Bazelon in 1975 have remarked that the collaboration is by nature difficult, “since the composer spends his entire life <em>in</em> music, working out specific musical relationships, while the director spends his time <em>out</em> of music, involved full-time with films—a visual medium—and only part-time with music, as it affects his film”.<a name="_ftnref5" href="#_ftn5"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span>[3]</span></span><!--[endif]--></span></span></a><span> </span>Even if the director is a fan of music and has interesting ideas about how she wants it used, if she’s not a musician herself she may have difficulty communicating concepts to the composer which can be actualized musically.</p>
<p class="MsoNormal" style="0.5in;">That challenge has interesting aspects as regards musical genre.<span> </span>On the one hand, bridging the communication gap between composer and director can often force each out<span> </span>of their comfort zone, resulting in new timbres, new melodies—innovation.<span> </span>Many pieces composed for film are quite short, maybe less than a minute in duration, so they often don’t have space in the film to unfold into full-blown genre exercises.<span> </span>This process of collaboration between sound and images in many ways results in the most “hybrid” music imaginable.</p>
<p class="MsoNormal" style="0.5in;">From another perspective, however, film music cues are also remarkably genre-bound.<span> </span>Not necessarily bound by <em>musical</em> genres per se (classical, country, pop, rock, etc.), but bound by the generic conventions of film itself.<span> </span>Due to the standardized production practices of most movies, film as a medium tends to be more amenable to genre classification—and soundtracks can play a key role in that<a name="_ftnref6" href="#_ftn6"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span>[4]</span></span><!--[endif]--></span></span></a>.<span> </span>For instance, soaring strings sketching out an American traditional folk or “cowboy” song, and the audience suddenly knows we’re in a Western.<span> </span>A minor key piano melody and a sultry saxophone, and we know we’re watching a film noir.<span> </span>Given that these generic conventions of film music most certainly exist, it makes me wonder what sorts of films Westergren was scoring as part of his job.<span> </span>Perhaps offbeat indie-Sundance dramas which were aiming for a kind of transcendence of genre strictures?</p>
<p class="MsoNormal" style="0.5in;">In any case, the fact that the Music Genome Project origin story involves the world of film music tells me that musical genre was on Westergren’s mind as he brainstormed, even if he sought to rebel against the concept.<span> </span>Additionally, the Project’s roots in soundtrack-for-hire work demonstrate that you can never take the music <em>business</em> out of the music:<span> </span>commercial forces and the presence of a paying audience (real or imagined) inflect in some way all decisions musicians and music industry workers make.<span> </span>As Westergren’s rock band tailoring their sound in an attempt to “get known” reminds us, market considerations are basically always in a dialectical relationship with “creativity” as a musical genre forms, and this includes classical and avant-garde genres which claim to be above that kind of stuff.</p>
<p class="MsoNormal" style="0.5in;">Given the commercialized roots of the Music Genome Project, it’s a bit surprising to me that the music industry has fought Pandora as fiercely as it has.<span> </span>It’s common knowledge that traditional AM/FM commercial radio was the music industry’s biggest promotional tool during the 20<sup>th</sup> century.<span> </span>But as traditional radio fades in influence and web radio such as Pandora ascends, the industry leaders have been notoriously less willing to jump onto the 21<sup>st</sup>-century Internet bandwagon of music promotion.</p>
<p class="MsoNormal" style="0.5in;">Despite the fact that Pandora <em>streams</em> tracks instead of allowing users to illegally download—which means, in my opinion, that record company executives should be groveling at Pandora’s feet in thanks—the major-label music industry has seemed intent upon shutting them down.<span> </span>Or, to be more specific, the labels’ demands (through representative organization SoundExchange) for higher royalty payments created expenses Pandora was finding impossible to sustain.<span> </span>Following a federal board ruling mandating increased royalty rates for web radio, in August of last year the <em>Washington Post</em> <a href="http://www.washingtonpost.com/wp-dyn/content/article/2008/08/15/AR2008081503367_2.html" target="_blank">quoted Westergren</a> as saying Pandora was “approaching a pull-the-plug kind of decision”.<a name="_ftnref7" href="#_ftn7"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><!--[endif]--></span></span></a><span> </span>On the brink of shutting down, Westergren appealed to the company’s subscribers to contact their local representatives on behalf of web radio—and the gambit seems to have worked.<span> </span>The ruling was reconsidered, and <a href="http://www.dmwmedia.com/news/2009/02/23/cnet%3A-webcasters%2C-music-industry-battling-over-royalties" target="_blank">currently representatives from both web radio and the music industry are negotiating</a> newer, more manageable royalty rates.<a name="_ftnref8" href="#_ftn8"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><!--[endif]--></span></span></a><span> </span>Pandora seems to be approaching a delicate truce with the market forces which, to me, seem constitutive of its role as a source for promoting and discovering popular music.<span> </span>(In other words, what took the industry so long to accept current reality?)</p>
<p class="MsoNormal" style="0.5in;">In the next installment of this blog, I’ll continue my exploration of Pandora, especially the logic behind its attempt to map “genes” of music and to approach music in a mode somehow “beyond” genre.</p>
<div><!--[if !supportFootnotes]-->
<hr size="1" /><!--[endif]-->
<div>
<p class="MsoFootnoteText"><a name="_ftn1" href="#_ftnref1"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span>[1]</span></span><!--[endif]--></span></span></a> Walser, Robert.<span> </span><em>Running with the Devil: Power, Gender, and Madness in Heavy Metal Music</em>.<span> </span>Middletown, CT:<span> </span>Wesleyan University Press, 1993.<span> </span>p. 29.</p>
</div>
<div>
<p class="MsoNormal" style="-2in;"><a name="_ftn2" href="#_ftnref2"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span>[2]</span></span><!--[endif]--></span></span></a> Fabbri, Franco and Iain Chambers.<span> </span>“What Kind of Music?”.<span> </span><em>Popular Music,</em> Vol. 2, Theory and Method (1982), pp. 131-143.</p>
</div>
<div>
<p class="MsoFootnoteText"><a name="_ftn5" href="#_ftnref5"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--></span><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span>[3]</span></span><!--[endif]--></span></span></a> Bazelon, Irwin.<span> </span><em>Knowing the Score:<span> </span>Notes on Film Music</em>.<span> </span>New York:<span> </span>Van Nostrand Reinhold Company, 1975.</p>
</div>
<div>
<p class="MsoFootnoteText"><a name="_ftn6" href="#_ftnref6"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span>[4]</span></span><!--[endif]--></span></span></a> Holt, Fabian.<span> </span><em>Genre in Popular Music</em>.<span> </span>Chicago:<span> </span>The University of Chicago Press,<span> </span>2007.<span> </span>pp. 4-5.</p>
</div>
<div>
<p class="MsoFootnoteText"><a name="_ftn7" href="#_ftnref7"><span class="MsoFootnoteReference"><span><!--[if !supportFootnotes]--></span></span></a></p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/pandora-and-the-genes-of-music-genres/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Illuminating Historical Architecture</title>
		<link>http://www.scholarslab.org/digital-humanities/illuminating-historical-architectur/</link>
		<comments>http://www.scholarslab.org/digital-humanities/illuminating-historical-architectur/#comments</comments>
		<pubDate>Fri, 03 Apr 2009 20:30:22 +0000</pubDate>
		<dc:creator>Ethan Gruber</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Geospatial and Temporal]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=60</guid>
		<description><![CDATA[Following up on my introduction to using 3D models to recreate archaeological sites and perform meaningful academic analysis on simulated virtual environments, I will discuss in further detail my current project concerning the recreation of the House of the Drinking Contest in Seleucia Pieria, the port city of Roman Antioch. The house in its final [...]]]></description>
			<content:encoded><![CDATA[<p>Following up on my <a title="Research Applications for 3D Models in Art History" href="http://scholarslab.lib.virginia.edu/index.php/digital-humanities/research-applications-for-3d-models-in-art-history/" target="_blank">introduction</a> to using 3D models to recreate archaeological sites and perform meaningful academic analysis on simulated virtual environments, I will discuss in further detail my current project concerning the recreation of the <a href="http://cti.itc.virginia.edu/~jjd5t/ant-pics/10/index.htm">House of the Drinking Contest</a> in Seleucia Pieria, the port city of Roman Antioch.</p>
<p><span id="more-60"></span></p>
<p>The house in its final phase dates to the third century A.D. and contained some of the most complete eastern Roman mosaics, all of which were removed from the site following the 1930s excavations and placed in American museums (including Richmond&#8217;s very own <a title="Virginia Museum of Fine Arts" href="http://www.vmfa.state.va.us/" target="_blank">Virginia Museum of Fine Arts</a>).  What better way to view the mosaics than to recreate the environment in which they existed?  Mosaics in museums are entirely out of their original context.  Many floor mosaics are now hanging on walls.  Even when museums create elaborate sets to mimic the rooms from which the artwork was taken, it is impossible to recreate the entire structure or to reproduce the lighting accurately enough to let us view the mosaics as the original owners of the House of the Drinking Contest would have.</p>
<p>In my previous project of modeling the <a href="http://en.wikipedia.org/wiki/House_of_the_Faun">House of the Faun</a>, one of the largest houses in Pompeii, I had a lot of information to work with.  I had many photographs and artists&#8217; reconstructions to consider.  While the ceilings and roofs are gone, the walls are still more or less intact, and so are many of the wall paintings.  The House of the Drinking Contest is much more of a challenge since the walls collapsed and were removed long ago, leaving at most half a meter of brick and rubble.  There are clues, however, that let us accurately estimate the height of the walls, and hence attempt a full reconstruction.  The plan indicates that columns were about 0.9 meters in diameter.  From our knowledge of classical orders and the overall dimensions of the house and rooms, we can assume the columns would not have been Corinthian or Ionic, since both would have been out of proportion with the rest of the house.  The reason is that Corinthian and Ionic columns have 10:1 and 9:1 height-to-diameter ratios, respectively.  We can then safely assume an average-height Doric colonnade at a 5.5:1 ratio.  Other clues and experimentation with natural light simulation allow us to predict plausible window locations.</p>
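<p>The proportional argument above comes down to simple arithmetic.  The short Python sketch below only multiplies the 0.9 meter diameter from the plan by the three ratios quoted in this post; the variable and function names are illustrative assumptions and are not part of the modeling workflow itself.</p>

```python
# Height-to-diameter ratios as quoted in the post; the diameter comes from the plan.
COLUMN_DIAMETER_M = 0.9
ORDER_RATIOS = {"Corinthian": 10.0, "Ionic": 9.0, "Doric": 5.5}


def column_height(diameter_m, ratio):
    """Return the column height implied by a height-to-diameter ratio."""
    return diameter_m * ratio


# Round to centimeters so floating-point noise does not clutter the output.
heights = {
    order: round(column_height(COLUMN_DIAMETER_M, ratio), 2)
    for order, ratio in ORDER_RATIOS.items()
}

for order, height in heights.items():
    print(f"{order}: {height} m")
```

<p>Under these figures a Corinthian or Ionic column would stand roughly 9 or 8.1 meters tall, against just under 5 meters for a Doric one, which is why the taller orders look out of proportion for a house of this size.</p>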
<p><a title="House of the Drinking Contest 3D by Ethan Gruber" href="http://people.virginia.edu/~ewg4x/house_of_the_drinking_contest.jpg" target="_blank"><img src="http://people.virginia.edu/~ewg4x/house_of_the_drinking_contest_thumb.jpg" alt="House of the Drinking Contest 3D by Ethan Gruber" width="320" height="180" /></a></p>
<p>(click for larger image)</p>
<p>Lighting simulation and computer modeling enable us to take this a step further and create timelapse animations demonstrating how light shifted throughout the hours of the day or days of the year.  We then know when mosaics would have been exposed to direct sunlight or were in the shade.  I found it useful to create an animation of standing in the <em>triclinium </em>(dining room) of the house, looking west toward the courtyard, to see if the <em>triclinium </em>received direct sunlight at any point of the day.  So far I have found that it does on March 21st of A.D. 200, and probably throughout the spring and autumn.  In fact, the room&#8217;s mosaics are illuminated quite beautifully right around dinner time.</p>
<p><a title="House of the Drinking Contest timelapse" href="http://people.virginia.edu/~ewg4x/timelapse.mov" target="_blank" class="broken_link">[Link to video]</a>.</p>
<p>While there is still work to do in the modeling, texturing, and animation of this particular Roman house, the use of accurate modeling techniques and lighting simulation can have a profound impact on archaeology, particularly in cultures that are solar-oriented.  I attended the <a title="CAA 2009" href="http://caa2009.org" target="_blank">Computer Applications and Quantitative Methods in Archaeology</a> conference last week in Williamsburg, and while there were many demonstrations of 3D models, none of the projects focused on incorporating temporal lighting and analyzing the outcome.  In nearly every case, temporal lighting is not even a consideration.</p>
<p>I did get a chance to informally demonstrate some of my work on the House of the Faun and the House of the Drinking Contest to some other classical archaeologists who are also involved in virtual reconstruction, but this facet of computer modeling has yet to hit the mainstream digital archaeology field, it seems.  Perhaps I will have the opportunity to demonstrate it to a wider audience at CAA next year.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/illuminating-historical-architectur/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
<enclosure url="http://people.virginia.edu/~ewg4x/timelapse.mov" length="3260274" type="video/quicktime" />
		</item>
		<item>
		<title>Electronic Text Analysis and the Wary Humanist</title>
		<link>http://www.scholarslab.org/digital-humanities/electronic-text-analysis-and-the-wary-humanist/</link>
		<comments>http://www.scholarslab.org/digital-humanities/electronic-text-analysis-and-the-wary-humanist/#comments</comments>
		<pubDate>Mon, 30 Mar 2009 21:18:34 +0000</pubDate>
		<dc:creator>sh3vr</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=55</guid>
		<description><![CDATA[For a long list of complicated reasons, most practitioners of my discipline—political theory—tend to be suspicious of, if not altogether opposed to, the integration of computer technology into their research and teaching. While some scholars cite the superfluity of computer technology to the discipline (excepting, of course, Microsoft Word), others argue that the introduction of [...]]]></description>
			<content:encoded><![CDATA[<p>For a long list of complicated reasons, most practitioners of my discipline—political theory—tend to be suspicious of, if not altogether opposed to, the integration of computer technology into their research and teaching. While some scholars cite the superfluity of computer technology to the discipline (excepting, of course, Microsoft Word), others argue that the introduction of certain technologies might somehow actually endanger both thinking and learning (and who wouldn’t find the reduction of Plato to a series of PowerPoint slides, well, a tad reductive?).</p>
<p><!--cut--></p>
<p>Nevertheless, working at the Scholars’ Lab has afforded me the opportunity to sample a range of digital scholarship tools/resources, some of which might appeal to that most skeptical of techno-skeptics, the political theorist. One such resource is <a href="http://portal.tapor.ca/portal/portal">TAPoR</a>, the Text Analysis Portal for Research, a website that provides access to tools used in the analysis of electronic texts.</p>
<p>Many classic texts are now available in electronic form, and “computer-assisted text analysis” (TAPoR website) enables the researcher to explore a text in ways that are difficult, if not impossible, using only conventional tools for text analysis such as the index or a concordance (though much electronic text analysis is modeled conceptually on both of these). This is generally accomplished by allowing the researcher to search a text for specific words or word patterns or to generate a listing of the most frequently used words. In the case of TAPoR’s word/word pattern search, the results are displayed in the context of the surrounding text—the words sought are in bold, while several additional words on either side give the researcher some sense of how the words are being used. One may employ this kind of analysis with many ends in view, though some common goals include: a) testing to see whether—and if so, how often—an author employs either specific language or a certain kind of language and b) exploring certain words or phrases in context in order to gauge the narrowness or expansiveness of the author’s meaning when such language is used.</p>
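<p>For readers curious about what such a search does under the hood, here is a minimal Python sketch of a keyword-in-context listing and a word-frequency count.  It is not TAPoR&#8217;s code, and it does not reproduce TAPoR&#8217;s interface; the function names, the three-word window, and the sample sentence are all assumptions made for illustration (the sentence is a placeholder, not a quotation from Locke).</p>

```python
import string
from collections import Counter


def normalize(word):
    """Lower-case a word and strip surrounding ASCII punctuation."""
    return word.strip(string.punctuation).lower()


def keyword_in_context(text, keyword, window=3):
    """Return each hit of `keyword` with up to `window` words on either side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        if normalize(word) == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            # Mark the matched word, much as a concordance view highlights it.
            hits.append(f"{left} **{word}** {right}".strip())
    return hits


def most_frequent_words(text, n=3):
    """Return the `n` most common normalized words with their counts."""
    counts = Counter(normalize(w) for w in text.split() if normalize(w))
    return counts.most_common(n)


# Placeholder sentence written for this sketch; it is not from any real text.
SAMPLE = ("The person who reasons is a person who chooses; "
          "each person answers for that choice.")

hits = keyword_in_context(SAMPLE, "person")
for hit in hits:
    print(hit)
print(most_frequent_words(SAMPLE, 1))
```

<p>Seeing every hit side by side, each with a little surrounding text, is what makes it practical to compare how a single word is used across a long document.</p>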
<p>In order to explore the features and capacities of TAPoR, I brought to the portal an aspect of a particular research question that I had been thinking about for some time. To write my dissertation, I must provide at least a provisional answer to the following question: Does John Locke articulate a consistent view of the human person throughout his corpus? Although I was very familiar with the way he spoke about the “person” in one text, I was less familiar with his usage in other texts. After sketching a brief definition of the “person” based on the first text, I proceeded to investigate whether Locke spoke of the “person” in similar terms in a second text. I pasted the URL of a webpage that contained the second text into TAPoR, which I then asked to search the document for the word “person.” I performed several additional searches, using other key words/phrases from my original definition as well as others that came to mind as I was searching. The results were illuminating. I discovered that although Locke tended to use the word “person” in a more ordinary, less philosophical sense in the second text, all of the basic features of the first text&#8217;s conception were nevertheless present. While this confirmed the intuition I had about the consistency of Locke’s view of the “person” across his texts (or at least two of them), the specific instances of personhood language that I isolated with the help of TAPoR will allow me to present a much more convincing defense of my position in the dissertation. Additionally, the fact that TAPoR allows the researcher to view all results of a word search simultaneously helped me to formulate more precisely what was going on in the text and to relate it to my more general argument&#8211;i.e., a looser, more familiar usage of &#8220;person&#8221; in certain contexts can co-exist with a unified, consistent account of personhood.</p>
<p>TAPoR enabled me to “see” more than I otherwise would have in a text and could be a valuable resource for scholars in any field concerned with the close and careful reading of texts.<!--/cut--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/electronic-text-analysis-and-the-wary-humanist/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Mapping Regional Language Use</title>
		<link>http://www.scholarslab.org/geospatial-and-temporal/50/</link>
		<comments>http://www.scholarslab.org/geospatial-and-temporal/50/#comments</comments>
		<pubDate>Wed, 11 Mar 2009 14:15:36 +0000</pubDate>
		<dc:creator>wmr8e</dc:creator>
				<category><![CDATA[Geospatial and Temporal]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=50</guid>
		<description><![CDATA[So for the thousandth (or so it seems) time I’ve gotten into this discussion with my friends from the East Coast and Midwest (I’m from Texas) about the correct way to refer to a sweet carbonated beverage, and I have finally got to thinking about ways to map locally spoken slang and jargon using GIS.  [...]]]></description>
			<content:encoded><![CDATA[<p>So for the thousandth (or so it seems) time I’ve gotten into this discussion with my friends from the East Coast and Midwest (I’m from Texas) about the correct way to refer to a sweet carbonated beverage, and I have finally got to thinking about ways to map locally spoken slang and jargon using GIS.  Starting a database of ‘events’ where a person uses unique language in reference to a common-place item or occurrence (I have a friend from Wisconsin who calls the drinking fountain a “bubbler”) would be an insightful way to examine how jargon or slang starts and spreads geographically.<!--cut--></p>
<p>So I decided to indulge my curiosity and create a small database consisting of the answers to two quick survey questions: What do you call sweet carbonated beverages? And what state do you identify yourself as being “from”?  I solicited friends and colleagues for answers to these questions and ended up with about 150 usable responses (if you were one of the people who responded with “beer”, I thank you for the interest in the survey, but your answer was not included).  I chose to ask this question (please bear in mind that linguistics is not a focus of my studies) because, regardless of what you call it, most people have had experience with a coke/soda/soda-pop/pop, which isn’t true for all objects of regional jargon (example: before moving to the East Coast I had never seen nor heard of scrapple), and I wanted to document the geographical extent and overlap of terms for a single object rather than attempt to compare multiple similar objects in this first foray.</p>
<p>Approximately 94% of respondents identified that they referred to sweet carbonated beverages as either “coke”, “soda”, “pop”, or “soda-pop”, so I chose to focus the mapping of this data on those four responses.  I took the responses I received and calculated a ‘count’ by state of each type of response; for example, I received a total of 4 responses from people who identified as being from the state of Missouri.  Three of the respondents refer to sweet carbonated beverages as “coke”, and one refers to it as “soda”.  I took these counts and normalized them to the total number of responses received from the state and used that percentage to map the responses by state broken into ~25%, ~50%, ~75%, and 100%.  For each response (coke, soda, pop, soda-pop) I chose a single color to represent responses on the map, and varied the transparency of the color to represent the percentage of the response (25% response = 75% transparency, 50% response = 50% transparency, 75% response = 25% transparency, and 100% response = 0% transparency).  I mapped all four responses separately first (figure 1).</p>
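<p>The normalization described above can be sketched in a few lines of Python.  The Missouri counts are the ones given in this post; the function names and the data layout are assumptions for illustration, not the GIS workflow actually used to draw the maps.</p>

```python
from collections import Counter


def state_shares(responses):
    """Turn (state, term) pairs into {state: {term: percent of that state}}."""
    totals = Counter(state for state, _ in responses)
    counts = Counter(responses)
    shares = {}
    for (state, term), n in counts.items():
        shares.setdefault(state, {})[term] = 100.0 * n / totals[state]
    return shares


def transparency(percent):
    """Map a response share to layer transparency (100% share = fully opaque)."""
    return 100.0 - percent


# The four Missouri responses described in the post: three "coke", one "soda".
missouri = [("Missouri", "coke")] * 3 + [("Missouri", "soda")]

shares = state_shares(missouri)
for term, percent in shares["Missouri"].items():
    print(f"{term}: {percent}% of responses, {transparency(percent)}% transparency")
```

<p>For Missouri this gives a 75% share for “coke” (drawn at 25% transparency) and a 25% share for “soda” (drawn at 75% transparency), matching the scheme above.</p>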
<p><img style="vertical-align: middle;" src="http://people.virginia.edu/~jfg9x/clip_image002.jpg" alt="Figure 1" width="480" /></p>
<p>I chose to vary transparency as opposed to saturation of color (e.g., a monochromatic choropleth) because I wanted to be able to overlap the response maps to visualize the confluence of the regional terms yet keep the original colors of each response (figure 2).</p>
<p><img style="vertical-align: middle;" src="http://people.virginia.edu/~jfg9x/clip_image003.jpg" alt="Figure 2" width="480" /></p>
<p>The map above shows the overlap of “coke” responses with “soda” responses, which are displayed by the variation in colors from bright red where 100% responses were “coke” to bright blue where 100% responses were “soda” and various shades of purple and pink in between where there was a mix of responses in that state.  This kind of map can be created using a map with a double ended scale, but that type of visualization is limited to displaying the spectrum between two absolute responses, which would mean that I could only display the confluence of two responses rather than all four (figure 3).</p>
<p><img style="vertical-align: middle;" src="http://people.virginia.edu/~jfg9x/clip_image004.jpg" alt="Figure 3" width="480" /></p>
<p>One interesting thing I noticed when looking at the results of this survey is that I need to meet more people from the Pacific Northwest section of the country.  The other interesting result I noticed which is more pertinent to the questions asked in this study is the confluence of the regional jargon that occurs in the region that includes Kentucky, Indiana, Ohio, and Illinois.  This area represents the confluence of the “soda” and “pop” responses and is also the region with responses of “soda-pop”, a hybridization of “soda” and “pop”.</p>
<p>This exercise seems to make the argument that assembling databases of ideas such as regional jargon and using tools like GIS to display that information is a thought-provoking and possibly worthwhile endeavor.  (I’d like to thank all of my friends and colleagues who participated in this survey that allowed me to assemble and produce this study for digestion by the blogosphere. Thanks, you guys!)<!--/cut--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/geospatial-and-temporal/50/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Mining and Mapping Apocalyptic Texts, Part 1</title>
		<link>http://www.scholarslab.org/digital-humanities/how-digital-humanities-can-improve-my-dissertation-part-1/</link>
		<comments>http://www.scholarslab.org/digital-humanities/how-digital-humanities-can-improve-my-dissertation-part-1/#comments</comments>
		<pubDate>Wed, 11 Mar 2009 14:10:20 +0000</pubDate>
		<dc:creator>mam3tc</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=49</guid>
		<description><![CDATA[I have used computer technology to help my work in biblical interpretation for a while. I learned to do complex digital word searches with the Bibleworks software package early in my graduate career. When I started working at the Scholars’ Lab in the summer of 2006, I was introduced to digital humanities. I found these [...]]]></description>
			<content:encoded><![CDATA[<p>I have used computer technology to help my work in biblical interpretation for a while.<span> </span>I learned to do complex digital word searches with the Bibleworks software package early in my graduate career.<span> </span>When I started working at the Scholars’ Lab in the summer of 2006, I was introduced to digital humanities.<span> </span>I found these technologies fascinating.<span> </span>But how, I asked, could they help me interpret ancient religious texts in their original languages?<span> </span>I recently posed this question to some of my colleagues in the Scholars’ Lab and was pleasantly surprised by the answers.<span> <!--cut--></span>In this two-part blog, I will consider these answers in relation to my dissertation, which focuses on several passages in the Apostle Paul that speak of the final fate of humankind.<span> </span>Some of these passages suggest that all people will, in the end, be made right with God.<span> </span>Other passages suggest that some people will be permanently alienated from God.<span> </span>I wish to discover the central kernel of Paul’s thinking about the fate of humankind (called soteriology) that would make both of these statements true.<span> </span>In this first entry, I will focus on how I plan to use text-mining to enhance my ability to compare dozens of Greek, Hebrew, and Latin texts with each other more quickly and more thoroughly than I could manually.<span> </span>The second part will focus on how geographic information systems (GIS) will help me to place Paul’s writings in spatial relationships with other writings of the same time.</p>
<p class="MsoNormal">Text-mining involves parsing a digital text, inserting the words along with their linguistic features into a database, searching for patterns within the database, and, finally, evaluating the results.<span> </span>In my case, I will use text-mining methodologies to extract linguistic data from the Pauline texts as well as other early Jewish texts that speak of the fate of humankind.<span> </span>This process will be fairly straightforward for the Pauline texts.<span> </span>There are many versions of the Greek text of Paul that have linguistic data attached to the words.<span> </span>One need simply extract this data from the text and insert it into a database.<span> </span>I will use the text of the Bibleworks 6 package as my source for Paul.<span> </span>For other texts, this process will not be as easy.<span> </span>For instance, the Thesaurus Linguae Graecae has a huge collection of Greek texts for the period that interests me.<span> </span>But they have no linguistic data attached to the words.<span> </span>To attach linguistic data to these words, I will need to write a script, probably in Perl, to query the open-source parsing engine from the Perseus Project at Tufts University (<a href="http://www.perseus.tufts.edu/">www.perseus.tufts.edu</a>).<span> </span>I will then insert the results from these queries into the database for that text.</p>
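<p class="MsoNormal">To make the parse, store, and query steps concrete, here is a minimal sketch using Python and an in-memory SQLite database.  It is an illustration only and not the script I plan to write: the table layout, the function name, and the English placeholder tokens are all assumptions, and none of the rows come from Bibleworks, the Thesaurus Linguae Graecae, or the Perseus parser.</p>

```python
import sqlite3

# An in-memory database standing in for the per-text databases described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tokens ("
    "text_id TEXT, position INTEGER, form TEXT, lemma TEXT, pos TEXT)"
)

# Hypothetical, hand-written rows; real rows would come from a parsed source text.
rows = [
    ("text_a", 1, "saves", "save", "verb"),
    ("text_a", 2, "all", "all", "adjective"),
    ("text_b", 1, "saved", "save", "verb"),
    ("text_b", 2, "some", "some", "adjective"),
]
conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?, ?)", rows)


def shared_lemmas(connection, first, second):
    """Return the lemmas that occur in both texts, whatever their inflected form."""
    query = """
        SELECT DISTINCT a.lemma
        FROM tokens AS a
        JOIN tokens AS b ON a.lemma = b.lemma
        WHERE a.text_id = ? AND b.text_id = ?
        ORDER BY a.lemma
    """
    return [row[0] for row in connection.execute(query, (first, second))]


shared = shared_lemmas(conn, "text_a", "text_b")
print(shared)
```

<p class="MsoNormal">Because the comparison runs on lemmas rather than surface forms, two passages can match even when the same word appears in different inflections, which is exactly the kind of similarity a translation can hide.</p>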
<p class="MsoNormal">The next step will be to design queries that will find appropriate relationships among the texts.<span> </span>Good methodology requires that I test my queries against a set of texts for which I know the results.<span> </span>I will test my queries on several Greek apocalyptic texts which I have already read carefully, noting the sections that relate closely to Paul.<span> </span>Once I have designed a set of useful queries, I will apply these queries to the databases I created earlier.</p>
<p class="MsoNormal">The application of these queries should point to numerous texts that I will then manually analyze to determine their meaning and how they relate to the Pauline passages under investigation.<span> </span>If my investigation were purely manual, I would begin by reading the texts in English in order to find promising passages that I would then examine more closely in their original language, whether Greek, Hebrew, or Latin.<span> </span>This digital method, though, will do this first analysis using the original languages.<span> </span>That means that this computerized comparison of the texts in their original languages will find verbal and grammatical similarities that may be obscured or destroyed in translation.<span> </span>In the end, I expect text-mining to return data that would be only partially accessible by manual means.<span> </span><span> </span></p>
<p class="MsoNormal">In my next post, I will consider how another area of digital humanities, geographic information systems, can help me to explore how the Pauline and apocalyptic texts are related spatially, instead of linguistically, to each other.</p>
<p><!--/cut--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/how-digital-humanities-can-improve-my-dissertation-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Research Applications for 3D Models in Art History</title>
		<link>http://www.scholarslab.org/digital-humanities/research-applications-for-3d-models-in-art-history/</link>
		<comments>http://www.scholarslab.org/digital-humanities/research-applications-for-3d-models-in-art-history/#comments</comments>
		<pubDate>Wed, 11 Feb 2009 21:51:23 +0000</pubDate>
		<dc:creator>Ethan Gruber</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=45</guid>
		<description><![CDATA[These days, it is difficult to find a television documentary detailing an archaeological site that does not feature a representation in the form of a 3D model. Computer models make good teaching tools. A class of students may not have the opportunity to travel to Rome to view the Colosseum first-hand, and even if they [...]]]></description>
			<content:encoded><![CDATA[<p style="0in;">These days, it is difficult to find a television documentary detailing an archaeological site that does not feature a representation in the form of a 3D model.  Computer models make good teaching tools.  A class of students may not have the opportunity to travel to Rome to view the Colosseum first-hand, and even if they did, they would have great difficulty visualizing what the mostly-ruined structure looked like 1,900 years ago.  A model based on the most recent archaeological research, however, can help fill in the gaps left by time and the elements. <!--cut--></p>
<p>One of the more important aspects of a computer model is that it is dynamic.  Using software, a model can be adjusted to reflect newer theories of the site&#8217;s architectural reconstruction.  This is certainly a stark contrast to artists&#8217; sketches and paintings, which, over time, tend to become outdated.  Importantly, like other visualization methods used in the humanities (such as GIS), 3D models can help scholars get a fuller picture of a site and formulate research questions that never would have been considered otherwise.  This is the case in my most recent research.</p>
<p style="0in;">Having never truly given up on the video game design aspirations of my high school days (I specifically remember my father turning the breaker off to the upstairs when I was up until 4 AM designing a Quake map), I have found a niche within my field of academic interest—Roman archaeology and architectural history.  While many of my Pompeianist classmates take a more traditional approach to graduate research projects, I chose to develop a 3D model of the House of the Faun, one of the largest and most famous houses in the city.  The model was constructed as accurately as possible based on the archaeological plan, a number of artists&#8217; reconstructions, and photographs of the house (many gathered from <a href="http://www.flickr.com/">Flickr</a>).</p>
<p style="0in;">The intent of the model was to test art historians&#8217; philosophical assertions about Roman atrium houses.  With accurate lighting simulation (i.e., calibrating a simulated sunlight to the latitude and longitude of the house and to any point in time back to antiquity), high-resolution images of the model rendered by <a href="http://www.mentalimages.com/">mental ray</a> software gave me a glimpse of what the House of the Faun looked like at noon on January 1<sup>st</sup>, 100 B.C., which is something no artist can replicate.</p>
<p style="0in;">Coincidentally, lighting simulation may have an impact on how we consider the artwork within the house.  For example, when many art historians point to the colors of a mosaic as being proof of its Greek influence, can that assertion bear the burden of the fact that the mosaic was rarely in sunlight?</p>
<p style="0in;"><img class="alignleft" src="http://farm4.static.flickr.com/3316/3300953093_53b43154c2.jpg" alt="House of the Faun" /></p>
<p style="0in;">Many of us have seen Roman floor mosaics hanging on the walls of American and European museums, but they have been removed from their original context.  Even in Pompeii, one of the best-preserved sites of the ancient world, the roofs collapsed long ago, making it difficult to visualize the natural lighting scenario within the House of the Faun and other structures within the city.  3D models allow us to put artworks back in their original context and consider how the ancients viewed them, which is quite different from how we view them now.  In this case, the computer model is more than just a teaching tool; it is a scholarly research tool.<!--/cut--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/research-applications-for-3d-models-in-art-history/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Peer Review for Visual Aids?</title>
		<link>http://www.scholarslab.org/visualization-and-data-mining/peer-review-for-visual-aids/</link>
		<comments>http://www.scholarslab.org/visualization-and-data-mining/peer-review-for-visual-aids/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 19:01:20 +0000</pubDate>
		<dc:creator>wmr8e</dc:creator>
				<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=43</guid>
		<description><![CDATA[How frustrating is this: You sit down to take in some form of scholarly work (be it a book, an article, or a talk) and you find yourself increasingly confused with a bombardment of information from graphs and figures and maps which don’t make sense because they either have too much or too little information [...]]]></description>
			<content:encoded><![CDATA[<p>How frustrating is this: You sit down to take in some form of scholarly work (be it a book, an article, or a talk) and you find yourself increasingly confused by a bombardment of graphs, figures, and maps that don’t make sense, either because they contain too much or too little information or because the information is poorly labeled (if at all).  Or even worse, you are the person writing the book/article or giving the talk and, instead of fielding questions on your scholarly processes, you are repeatedly explaining to the audience what your visual aids <em>actually </em>represent.</p>
<p>A picture may be worth a thousand words, but if it is not a language your audience speaks, where have your efforts gotten you?<!--cut--><br />
Typically, when I read a scholarly article, my first read-through goes as follows: I read the abstract, I look at each one of the figures/maps/tables/graphs and their annotations, and I read the conclusion.  It’s not until the second read-through that I examine the bulk of the text.  I think that words sometimes have the unfortunate tendency to obfuscate the true findings of research and, truth be told, I like to find out if I draw the same conclusions from the provided data as the author(s) do.  My process stumbles when I encounter articles with figures/graphs/maps etc. which have either a glut or a dearth of information contained in them, making them non-intuitive to the uninitiated reader.  Some highlights:  A map of a state containing rivers, waterbodies, and watershed boundaries (the focus of this particular article) AND all of the major roads and highways (NOT the focus of the article).  All in gray-scale.  Add in the point locations and names of the state’s twelve most populous cities and cram it into a box three inches tall by five inches wide.  The focus of the article was on modeling and delineating the major and minor watersheds of the area in order to develop a best management practice for cooperating water districts.  Needless to say, that point was lost in the shuffle.  Another example which is all too common: a graph depicting change over time of 10 or more constituents using various dotted, dashed, and solid lines of variable thickness.  With that amount of information crammed into a single visual aid, the results simply disappear.</p>
<p>We have writing clinics and public speaking critique sessions, so why don’t we have a peer evaluation system for visual aids?  I think that many people (myself included) fall into a habit of having our material critiqued solely by our close working group.  While this is certainly a necessary step in the writing process&#8211;the people most familiar with our work are the ones most likely to pick up on the esoteric flaws&#8211;many scholars neglect to obtain peer review from individuals tangential to or completely outside of their small fields.  I would say that one of our main objectives as scholars is to use our work to excite interest from members of the scholarly community inside and outside of our focused area.   In my opinion, an important step towards this goal is to make our visual aids more accessible to the curious non-expert.</p>
<p>I would like to see our scholarly community develop this type of peer-review network where we can utilize the human resources around us to improve our intellectual contribution to all of our respective fields.  We could have minds from a variety of fields of study working collaboratively to improve the accessibility (and therefore the use) of our collective body of knowledge.   I think the concept has amazing potential.<!--/cut--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/visualization-and-data-mining/peer-review-for-visual-aids/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Social Media and the Inauguration</title>
		<link>http://www.scholarslab.org/announcements/social-media-and-the-inauguration/</link>
		<comments>http://www.scholarslab.org/announcements/social-media-and-the-inauguration/#comments</comments>
		<pubDate>Fri, 16 Jan 2009 17:46:21 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Geospatial and Temporal]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=41</guid>
		<description><![CDATA[Join us in the Scholars&#8217; Lab Monday morning through Wednesday night next week, as we project the social media landscape surrounding next week&#8217;s historic presidential inauguration. We&#8217;ll be showing real-time Twitter and Flickr feeds that record people&#8217;s responses to the event and their efforts at citizen-journalism. We&#8217;ve also created a home-grown geospatial visualization so that [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://farm4.static.flickr.com/3525/3202007970_729b7e0186.jpg' alt='Social Media in the SLab' class='alignleft' /> Join us in the Scholars&#8217; Lab Monday morning through Wednesday night next week, as we project the social media landscape surrounding next week&#8217;s historic presidential inauguration.</p>
<p>We&#8217;ll be showing real-time <a href="http://twitter.com">Twitter</a> and <a href="http://flickr.com">Flickr</a> feeds that record people&#8217;s responses to the event and their efforts at citizen-journalism.  We&#8217;ve also created a home-grown geospatial visualization so that you can follow the worldwide conversation!</p>
<p>Visit the Lab for a little social interaction of your own, or <a href="http://www2.lib.virginia.edu/scholarslab/inauguration/">access the site</a> (which includes more information and related links) online.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/announcements/social-media-and-the-inauguration/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Map &#8220;Vocabularies&#8221;</title>
		<link>http://www.scholarslab.org/geospatial-and-temporal/map-vocabularies/</link>
		<comments>http://www.scholarslab.org/geospatial-and-temporal/map-vocabularies/#comments</comments>
		<pubDate>Wed, 19 Nov 2008 15:09:46 +0000</pubDate>
		<dc:creator>wmr8e</dc:creator>
				<category><![CDATA[Geospatial and Temporal]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=33</guid>
		<description><![CDATA[For the past year, I have been working on the Scholars&#8217; Lab Geospatial Data Portal, the lab’s effort to make our GIS data sets readily available to UVA students, faculty, and staff via the world wide web by using a suite of open source, open standards-based applications. A particular aspect of this project that I [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal">For the past year, I have been working on the Scholars&#8217; Lab Geospatial Data Portal, the lab’s effort to make our GIS data sets readily available to UVA students, faculty, and staff via the world wide web by using a suite of open source, open standards-based applications. A particular aspect of this project that I have enjoyed exploring is the way in which we display our visual information.</p>
<p class="MsoNormal">Stop to think about the last paper map you used. Minor roads were probably displayed with a line of a certain color and thickness, highways with another. Green spaces were colored differently from open water, buildings, and so on. Cartographers have long toiled to develop visual representations of our environment and to make them widely recognizable. <!--cut-->People naturally associate certain colors on a map with identifiable features in their environment (e.g., the association of green on a map with forests, parks, and open areas). Much like the words in a book, these symbols and representations must create a language which is understandable to the audience; otherwise the information contained on the map will go unused.</p>
<p class="MsoNormal">What I have done for the Geospatial Data Portal is to expand our symbolic vocabulary. I create styles: XML-based documents that allow us to display visual information through symbols that our patrons will understand and identify with specific attributes. An example: I can map the waterlines for a given city with a solid pink line 2 pixels wide. While the information is mapped and is useful to an extent, I think there is a way to display the same information while making it more visually recognizable as city waterlines and ultimately more usable to our patrons. Instead of a solid pink line of a single width, we can display the information as blue lines with differing widths dependent upon the size of the pipe (e.g., a main line feeder pipe with a diameter of 15ft is represented as a blue line with a pixel width of 8, whereas a small pipeline with a diameter of 2ft is represented as a blue line with a 1 pixel width).</p>
<p><img style="width: 100%;" src="http://people.virginia.edu/~wmr8e/clip_image002.jpg" alt="" /></p>
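<p>As a rough illustration of the width-weighting idea, here is a minimal Python sketch that maps a pipe's diameter to a line width. Only the two anchor points (2ft to 1 pixel, 15ft to 8 pixels) come from the example above; the linear interpolation between them, the clamping, and the names <code>stroke_width</code> and <code>waterline_style</code> are assumptions made for illustration, not the portal's actual XML styles.</p>

```python
# Hypothetical sketch: scale a pipe's diameter (feet) to a line width (pixels).
# The two anchor points come from the example in the post; the linear
# interpolation between them and the clamping are assumptions.

MIN_DIAMETER_FT, MIN_WIDTH_PX = 2.0, 1
MAX_DIAMETER_FT, MAX_WIDTH_PX = 15.0, 8


def stroke_width(diameter_ft):
    """Return a whole-pixel line width for a pipe of the given diameter."""
    # Clamp so unusually small or large pipes stay within the legend's range.
    clamped = min(max(diameter_ft, MIN_DIAMETER_FT), MAX_DIAMETER_FT)
    fraction = (clamped - MIN_DIAMETER_FT) / (MAX_DIAMETER_FT - MIN_DIAMETER_FT)
    return round(MIN_WIDTH_PX + fraction * (MAX_WIDTH_PX - MIN_WIDTH_PX))


def waterline_style(diameter_ft):
    """Describe the symbol for one waterline as a plain dictionary."""
    return {"stroke": "blue", "stroke_width_px": stroke_width(diameter_ft)}
```

<p>Calling <code>waterline_style(15)</code> describes a blue line 8 pixels wide, and <code>waterline_style(2)</code> a blue line 1 pixel wide. A real style document would more likely define a handful of discrete width classes than a continuous scale.</p>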
<p class="MsoNormal">So what has this accomplished? People tend to associate size on a map with importance in the real world, so by exaggerating the size difference of the pipes through weighted pixel widths we can draw our users’ attention to the important locations on the map. And by using blue, we identify our information of interest as a water feature, because most people associate blue on a map with water features in their environment. Now our patrons are able to go from displaying simple lines on a page to creating a map that displays intuitively symbolized information using only their web browser. I believe this project has the potential to greatly expand the user base for our GIS data sets and allow for new forms of scholarship because it makes the process of displaying information in an identifiable and comprehensible way much more user friendly.</p>
<p><!--/cut--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/geospatial-and-temporal/map-vocabularies/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Biblical Statistics</title>
		<link>http://www.scholarslab.org/digital-humanities/biblical-statistics/</link>
		<comments>http://www.scholarslab.org/digital-humanities/biblical-statistics/#comments</comments>
		<pubDate>Thu, 09 Oct 2008 15:39:38 +0000</pubDate>
		<dc:creator>mam3tc</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=22</guid>
		<description><![CDATA[The first topic that I chose for my dissertation in UVA&#8217;s Department of Religious Studies was the “School of Saint Paul.” I hoped to show the existence of a group of followers who surrounded Paul and engaged with him in the interpretation of the Old Testament. In order to do this, I decided to investigate [...]]]></description>
			<content:encoded><![CDATA[<p>The first topic that I chose for my dissertation in UVA&#8217;s <a href="http://artsandsciences.virginia.edu/religiousstudies/index.html">Department of Religious Studies</a> was the “School of Saint Paul.”  I hoped to show the existence of a group of followers who surrounded Paul and engaged with him in the interpretation of the Old Testament.  In order to do this, I decided to investigate how Paul used scripture in his epistles and how the followers of Paul used the same scripture in their writings.  I anticipated finding certain portions of the Old Testament that either were used exclusively in the Pauline and post-Pauline literature or were used differently in the Pauline and post-Pauline literature than in the rest of the New Testament.</p>
<p>But I had a problem.  <!--cut--> The fund of Pauline and post-Pauline quotations and allusions to the Old Testament numbered more than 1000 cases.  How could I represent such a large set of data in a way that made them easily comprehensible?  A friend of mine suggested that I needed to represent the data graphically.  And a colleague here at the Scholars’ Lab, where I work as a graduate consultant, recommended <a href="http://en.wikipedia.org/wiki/SPSS">SPSS</a> as the best way to accomplish such graphical representation.</p>
<p>I already had a table that I had made in Microsoft Word of every usage of the Old Testament in both the Pauline and post-Pauline literature.  I needed to get this data into SPSS with as little headache as possible.  So, I converted the data into an Excel file, saved this in a format that SPSS could read, and then imported it into SPSS.  At that point, I had accomplished the hard part.  All that was left to do was to analyze and graphically represent this data.  And here is one example of what I produced:</p>
<p><a href="http://farm4.static.flickr.com/3089/2926354745_1271e8325a_o.jpg"><img src='http://farm4.static.flickr.com/3089/2926354745_30d7a3ab10.jpg' alt='Genesis 1-3 in the New Testament' class='alignnone' /></a></p>
<p>Unfortunately, this analysis of the data made it clear that the evidence was insufficient for my dissertation!  I found no significant chunks of the Old Testament that were used exclusively in the Pauline and post-Pauline literature.  And I discovered that trying to set the Pauline and post-Pauline use of scripture against that of the rest of the New Testament was speculative, at best.  I ended up having to change my dissertation topic.  But it was this statistical analysis and work in information visualization that made it clear to me that the evidence was insufficient.  Without it, it is possible that I would still be chasing the wild goose that was my previous topic.<!--/cut--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/biblical-statistics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Measure Text?</title>
		<link>http://www.scholarslab.org/digital-humanities/how-to-measure-text/</link>
		<comments>http://www.scholarslab.org/digital-humanities/how-to-measure-text/#comments</comments>
		<pubDate>Tue, 09 Sep 2008 04:03:13 +0000</pubDate>
		<dc:creator>cforster</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[Visualization and Data Mining]]></category>

		<guid isPermaLink="false">http://scholarslab.lib.virginia.edu/?p=14</guid>
		<description><![CDATA[&#8230;the words we join have been joined before, and continue to be joined daily. So writing is largely quotation, quotation newly energized, as a cyclotron augments the energies of common particles circulating. - Hugh Kenner, The Pound Era This month marks the beginning of the complicated process of starting up the Large Hadron Collider, the [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>&#8230;the words we join have been joined before, and continue to be joined daily. So writing is largely quotation, quotation newly energized, as a cyclotron augments the energies of common particles circulating.</p>
<p>- Hugh Kenner, <em>The Pound Era</em></p></blockquote>
<p>This month marks the beginning of the complicated process of starting up the <a title="Large Hadron Collider" href="http://lhc.web.cern.ch/lhc/">Large Hadron Collider</a>, the world&#8217;s largest particle accelerator (Kenner would have called it a &#8220;cyclotron&#8221;), buried beneath the Franco-Swiss border. Near the top of the LHC&#8217;s agenda is having a peek into the fabric of space-time to see about the Higgs boson, the theorized source of mass.</p>
<p>But to do so they&#8217;ll need data&#8211;lots of data. According to <a title="Data from the LHC Experiments" href="http://gridcafe.web.cern.ch/gridcafe/animations/LHCdata/LHCdata.html" class="broken_link">CERN</a>, the event summary data extracted from the collider&#8217;s sensors will amount to around 10 terabytes daily. That is, to use the cliché, something like a Library of Congress&#8217;s worth of data every day (the raw data is much, much greater).<br />
<!--cut--></p>
<p>The physics involved is obviously too complicated for a mere humanities major to discuss in any intelligent way. The interesting thing is the disparity between the sheer amount of data with which the LHC deals and the scale of the (textual) data of the humanities. How can the LHC, in a single day, focussed on a highly specific set of questions, produce as much information as the literary output of humans represented by the Library of Congress? Why, in short, is the textual data of the humanities so much smaller than the data produced by the LHC?</p>
<p>It is, of course, in some ways a silly, completely naive question. But the differences, in size alone, of these two datasets are nevertheless instructive and worthy of consideration. We might oversimplify the matter, and say that the LHC&#8217;s data, collected from its sensors and culled by its arrays of servers, is fundamentally information-poor data. The challenge faced by the LHC project is sorting through the complexities of the data to find the relevant information that will allow physicists to answer the questions they have. Language, by contrast, is information rich&#8211;so rich that our challenge is not how to separate the wheat from the chaff, but how to deal with the sheer flood of information compressed in text.</p>
<p>It is this fact that explains the disparity in size between the LHC&#8217;s data and the textual record of the humanities. The textual data of the humanities comes &#8220;preorganized&#8221; by language. While our digital texts encode only strings, language fills texts with syntactic and semantic information to which our systems of markup are completely oblivious.</p>
<p>Martin Wattenberg at IBM&#8217;s Watson Research Center puts it well in his <a title="Visualizing Big Data" href="http://www.wired.com/science/discoveries/magazine/16-07/pb_visualizing">interview with <em>Wired</em></a> when he describes language&#8217;s ability to compress information:</p>
<blockquote><p>Language is one of the best data-compression mechanisms we have. The information contained in literature or email, encodes our identity as human beings. The entire literary canon may be smaller than what comes out of particle accelerators or models of the human brain, but the meaning coded into words can&#8217;t be measured in bytes. It&#8217;s deeply compressed. Twelve words from Voltaire can hold a lifetime of experience.</p></blockquote>
<p>What happens if we take this understanding of language seriously? How would it change the way we deal with textual data?</p>
<p>Right now we have plenty of digital texts available, but in order to get the information out of the textual data we have to <em>read </em>it. Right now, only by reading do we attend to the specifically linguistic nature of textual data. Existing text analysis technologies and techniques remain largely quantitative, relying on machine learning techniques to classify texts that are represented by vectors of frequency counts. Key sources of linguistic information, however, like syntax, remain fundamentally unexploited. We are still, in effect, discarding some of the most basic sources of textual information&#8211;such as the <em>order</em> in which the words occur (seriously).</p>
<p>One avenue is to use a technique like part-of-speech tagging to supplement raw text with part-of-speech tags, which provide a fuller, more information-rich digital representation of the linguistic data. By analysing such part-of-speech tags, taking them in pairs, or looking at where in a sentence they occur, we get some sense of how a writer uses language. We step, in short, over the threshold from a purely quantitative view of language use (e.g. how many times does &#8220;of&#8221; occur per thousand words? what are the most frequently occurring terms?) to a mode of analysis that is able to extract the sort of information that we humans are able to extract when we read. Such techniques are admittedly crude, but they begin to recapture the fundamentally linguistic nature of textual data, which is too easily discarded in representations of natural languages. To truly capitalize on the information contained in textual data requires finding more ways to digitally attend to that linguistic nature.</p>
<p>We are trying to read the finely wrought braille of language through the burlap sack that current digital tools offer. With the combination of natural language processing tools (such as POS taggers, parsers, etc.) and ever-more sophisticated machine learning techniques, we may be able to get closer. Humanities data is not necessarily smaller&#8211;it is just more compressed.<!--/cut--></p>
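<p>To make the contrast concrete, here is a minimal Python sketch that sets a plain frequency count beside a count of adjacent part-of-speech tag pairs. The sentence is tagged by hand, and the tag names and the helper names <code>word_frequencies</code> and <code>tag_bigrams</code> are assumptions made for illustration; in practice an actual POS tagger would supply the tags.</p>

```python
from collections import Counter

# Hand-tagged toy sentence; in practice a POS tagger would supply the tags.
# The tag names (DET, NOUN, ...) are illustrative assumptions.
TAGGED = [
    ("the", "DET"), ("words", "NOUN"), ("we", "PRON"), ("join", "VERB"),
    ("have", "AUX"), ("been", "AUX"), ("joined", "VERB"), ("before", "ADV"),
]


def word_frequencies(tagged):
    """Bag-of-words view: counts survive, word order is discarded."""
    return Counter(word for word, _tag in tagged)


def tag_bigrams(tagged):
    """Count adjacent pairs of tags, which keeps some trace of word order."""
    tags = [tag for _word, tag in tagged]
    return Counter(zip(tags, tags[1:]))
```

<p>Shuffling the words would leave <code>word_frequencies</code> unchanged but would alter <code>tag_bigrams</code>, which is the sense in which tag pairs recover a little of the information that frequency vectors throw away.</p>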
]]></content:encoded>
			<wfw:commentRss>http://www.scholarslab.org/digital-humanities/how-to-measure-text/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
