My Yale PDP 2010 Presentation

Below is the text that I read at The Past’s Digital Presence: Database, Archive, and Knowledge Work in the Humanities, a graduate student symposium held Feb 19 & 20, 2010. I was lucky enough to be selected as one of the 24 presenters in the eight sessions held on Saturday. Before the sessions we were given the treat of wonderful talks by Jacqueline Goldsby and Peter Stallybrass, and after the sessions the closing roundtable featured Rolena Adorno, Ed Ayers, Willard McCarty, and George Miles. These esteemed scholars didn’t just pop in for their talks and then leave—they were with us in sessions, asking questions, learning stuff along with everyone else. They all seemed like incredibly nice people (and those I spoke to directly I know are incredibly nice people).

I was speaker #2 on a panel called “Theorizing the Digital Archive” with Stewart Campbell (Columbia) who presented “Eugène Atget and the Digital Archive” and Alexandre Monnin (Paris 1 Pantheon-Sorbonne) who discussed “What is a Tag: Digital Artifacts as Hermeneutical Devices”. Our panel was moderated by the esteemed Jessica Pressman (Yale) who is just super cool.

The title of my talk (no slides, just talking) was “Toward a Realization of the n-Dimensional Text” but, as I told some people, the secret title was really “Archives: Ur Doin it Wrong”.

Links are included in the text below for anyone who finds them useful. I use [emphasized text in brackets] for more info; it wasn’t spoken or anything like that.

[Before I started reading, I asked everyone to imagine a world without walls between institutions and where everyone in different departments got along. Everyone just needs to hug and move forward and make things.]

For the last five years I’ve been transitioning out of a lengthy career in the high tech industry in which I developed, designed, implemented, and maintained enterprise web-based applications. For most of that time it has been my job to take a list of needs (desires, pipe dreams) and make something out of it—a new process, a new piece of software, something that fundamentally makes a difference in the way something else is done. So, even though now I’m a student in the humanities, when I read scholarly articles about literary and textual studies that include technical descriptions, specifications, or just plain technical terms, I necessarily scrutinize these aspects of those works as closely as one would examine discussion of meter and rhyme in an article about poetry.

Such was the case a few years ago when a special issue of PMLA arrived in my mailbox. In an issue devoted to “Remapping Genre”, Ed Folsom—co-director of the Walt Whitman Archive—argues for “database” as a genre, following Lev Manovich who discusses in The Language of New Media the narrative of databases and the need to develop a poetics and aesthetics for their study. But the way in which Folsom invokes the term “database” is not the way I, stepping aside and wearing my application developer hat, conceive of a database. Borrowing from Manovich, Folsom describes the “database” that is part and parcel of the Whitman Archive in its “symbolic form” rather than as a physical appliance or application whose design is crucial to the interface, user experience, and future scalability. I’m not here to debate whether or not “database” is a genre or has narrative elements and aesthetics (although I think it surely can and perhaps should), but instead to point out where I, coming to humanities computing more experienced in computing than humanities, recognize a slippage in terms and the implications of that slippage for projects in development.

But I wasn’t the only one who noticed, as both Meredith McGill and Jerome McGann respond to Folsom in the same PMLA issue by (in part) questioning the use of certain technical terms and also raising issues with the way in which users can (or cannot) interact with the Whitman Archive. In short, while Folsom uses the Whitman Archive as the basis for an argument regarding database as genre, and talks about the rhizomorphous nature of Whitman’s work contained within this database, McGill and McGann both point out the reliance of the Whitman Archive and its design on print works and ultimately static displays of content. Whitman’s work may be rhizomorphous, but the same cannot be said for either the database powering the Whitman archive or the archive itself.

For example, McGill argues that instead of providing a space through which rhizomorphous connections can be made, readers still are limited by what has been specifically placed within the archive, in the manner in which the editors placed it, accessible through interfaces ostensibly directed by them and implemented by developers, reaching into a database structured again with direction by the editors and by database administrators who are unlikely to understand the possibilities of the internal and external connections the texts and paratexts might contain and require for completeness. [unless they are also literary scholars, which is possible and will be more likely in the future]

What I’m saying here is in no way meant to disparage the Whitman Archive or anyone who has worked on it, but instead is intended to shed some light on the ways in which the development processes of humanities computing projects can work—more easily than one would think—within different paradigms for content interaction and display…paradigms that more closely match the ways in which users typically interact with content online today.

McGill mentions that one way of solving the problem of reliance on book-like forms of display and interaction is to pay attention to “new ideas about database architecture and new developments in technology.” While “new” might be true for considerations of database use in the humanities, I challenge that notion because the “ideas” about database and application architecture that allow for the organic formations of paths in and around a text are not new; relational databases are as old as computational databases, application programming interfaces—or APIs [see my “Working with APIs” parts one, two, three, and four]—are as old as applications, much in the same way that paratextual elements are as old as written texts.

Without detouring into software and platform studies, I will simply state that it is beyond time that digital archives of texts begin to adhere to the methodologies learned from Web 2.0 activities and applications (for we are now in a post-2.0, a Web n.0, time) and embrace the concepts of layering shared data and user-generated or user-customized content onto the core curated data within the archive.

Reading the articles in this PMLA cluster at the same time I was revisiting Jerome McGann’s theories on n-dimensional texts in various publications such as Radiant Textuality, I realized the source of my discomfort, or confusion—there appeared to be an assumption that current technology could not adequately remediate the textual condition. In other words, comments about “what we want” and “what we need” with regard to archive and interface were framed as technological limitations rather than what they really are—conceptual and developmental limitations in place at the beginning of project development and a continued adherence to static models.

We can move past these limitations by closely examining the texts and paratexts that are physically present in a text-based archive, recognizing that these tangible elements are themselves already highly relational and overlap and crisscross in ways not easily seen, analyzing the whole of the work’s textual condition, and finally building up from the frameworks available in literary theory and the structures already present in texts. This is not groundbreaking; it’s what textual scholars do. In a recent presentation by members of the INKE Group, “Beyond Remediation: The Role of Textual Studies in Implementing New Knowledge Environments”, the Group articulates the crucial role of discipline-specific knowledge in the creation of new models of interactive archives. Specifically, they note how “Textual studies, book history, literary studies, and other humanities disciplines have recently seen approaches that examine long-term continuities and discontinuities, overlap between new and old technology, and the multiplicity of social and cultural effects that result.” Leveraging, and not ignoring, these skills is how we will move forward and begin to model new knowledge environments.

This leads me to the “pipe dream” portion of this talk, in which I discuss how the n-dimensional texts that Jerome McGann describes relatively briefly, and what he terms the ‘Patacritical Demon, are not only possible with current technology and computing practices, but should actually be the starting point for design work when creating an interactive archive of texts. Consideration at the beginning of the project of issues of content, database design, interface design, user experience and interaction, scalability, extensibility, and sharability is the path to success.

In his contribution to A Companion to Digital Humanities, “Marking Text in Many Dimensions” [and I should note (but didn’t) is placed directly after Stephen Ramsay’s chapter called “Databases”!] and in fact scattered throughout Radiant Textuality and other texts, Jerome McGann’s conceptualization of n-dimensional literary texts asserts that all features of the text—perceptual, semantic, syntactic, and rhetorical—signify meaning not only alone but in accordance or discordance with each other. [This lovely little summary comes right from Tanya Clement talking about similar things] The interaction of these features is “recursive,” like “a mobile with a shifting set of poles and hinge points carrying a variety of objects.” The ‘Patacritical Demon is something we in industry call “vaporware,” or a product that has been announced and perhaps lives somewhere in the world of specification documents and whiteboard drawings, but does not actually exist—that’s why there’s nothing on the screen behind me. [pretend you’re in the audience] But the Demon, as spec’d out by McGann’s team six years ago, is technically possible and, I would argue, a necessary layer of actions and interactivity for any text-based archive.

Briefly, [and according to a spec document I’m sure I shouldn’t have seen, but it wasn’t locked down and I found it, so…] McGann’s Demon is “a markup tool for allowing the reader to record and observe interpretive moves through a textual field. The reader marks what are judged to be meaningfully interesting places/moments in that spacetime field—the marks being keyed to a set of control dimensions […] and behavioral dimensions.” The behavioral dimensions are linguistic, imagistic, documentary, graphical, semiotic, rhetorical, and social; in other words, readers can mark words, how those words (or images) relate within the whole field, and note issues of transmission and reception—among other things. The control dimensions are temporal, resonance, and connection; for example, one could compare marks made in the linguistic dimension during one session against those made in another, or reevaluate the resonance of a mark made in some behavioral dimension when revisiting it in another session. A second site of interactivity would be to view the marks made by other readers of the same text.
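
[Not spoken, and not from the spec: a minimal Python sketch of how marks keyed to those dimensions might be stored and compared. The field names, the character-offset spans, the integer resonance scale, and the helper functions are all assumptions made for illustration; the spec quoted above names only the dimensions themselves.]

```python
from dataclasses import dataclass
from enum import Enum


class Behavioral(Enum):
    """The seven behavioral dimensions listed in the talk."""
    LINGUISTIC = "linguistic"
    IMAGISTIC = "imagistic"
    DOCUMENTARY = "documentary"
    GRAPHICAL = "graphical"
    SEMIOTIC = "semiotic"
    RHETORICAL = "rhetorical"
    SOCIAL = "social"


@dataclass(frozen=True)
class Mark:
    """One reader's mark on a span of text (hypothetical layout)."""
    reader: str
    session: int             # temporal control dimension
    start: int               # character offset where the span begins
    end: int                 # character offset where the span ends
    dimension: Behavioral    # behavioral dimension the mark belongs to
    resonance: int = 0       # resonance control dimension (assumed scale)
    connections: tuple = ()  # connection control dimension: linked spans


def spans(marks, reader, dimension, session):
    """Spans one reader marked in one dimension during one session."""
    return {
        (m.start, m.end)
        for m in marks
        if m.reader == reader
        and m.dimension is dimension
        and m.session == session
    }


def compare_sessions(marks, reader, dimension, first, second):
    """Return (dropped, kept, added) spans between two sessions."""
    a = spans(marks, reader, dimension, first)
    b = spans(marks, reader, dimension, second)
    return sorted(a - b), sorted(a & b), sorted(b - a)


def marks_by_others(marks, reader):
    """The second site of interactivity: everyone else's marks."""
    return [m for m in marks if m.reader != reader]
```

Comparing the linguistic marks from session 1 against session 2 is then a single call, `compare_sessions(marks, "reader_a", Behavioral.LINGUISTIC, 1, 2)`, which reports what the reader stopped marking, what stayed, and what is new.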

Although steeped in academic language and indeed consisting of many layers of interaction, this concept isn’t really functionally different from some of the current browser-based Web annotation applications such as Diigo and ReframeIt, which allow for marks by users to appear within the visual field of the document presented (this is different from something like CommentPress or SideWiki which contain the marks specifically in the margins). [see “SideWiki, Reframe It, Diigo: Considering Competing Web Annotation Systems” by me, for a little more info] In practice, the multidimensional aspects of the Demon could be developed using the z-index property within Cascading Style Sheets; this property allows for stackable elements which, given varying degrees of transparency through the opacity property, let one effectively layer a user’s marks on top of each other and interact on the client side to reveal or hide those marks as required, using client-side keyboard or mouse events to swap the visibility properties of the underlying markup elements.
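
[Not spoken: a minimal sketch of the layering idea, written in Python so it can be run without a browser. It only generates the CSS a page would need; the class names, the `.marks-` selector prefix, and the default opacity are assumptions for illustration. In CSS itself, z-index controls stacking order on positioned elements, while transparency comes from the separate opacity property and show/hide from visibility.]

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One overlay of marks, e.g. one behavioral dimension."""
    name: str
    opacity: float = 0.5
    visible: bool = True


class LayerStack:
    """Overlays listed bottom to top; the text itself sits beneath them."""

    def __init__(self, names):
        self.layers = [Layer(name) for name in names]

    def toggle(self, name):
        """Flip one layer's visibility, as a key or mouse handler would."""
        for layer in self.layers:
            if layer.name == name:
                layer.visible = not layer.visible
                return layer.visible
        raise KeyError(name)

    def css(self):
        """Render one CSS rule per layer, higher z-index drawn on top."""
        rules = []
        for depth, layer in enumerate(self.layers, start=1):
            state = "visible" if layer.visible else "hidden"
            rules.append(
                f".marks-{layer.name} {{ position: absolute; "
                f"z-index: {depth}; opacity: {layer.opacity}; "
                f"visibility: {state}; }}"
            )
        return "\n".join(rules)
```

In a real page the `toggle` step would live in client-side script attached to keyboard or mouse events; here it simply changes what `css()` emits the next time it is called.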

But more important to the discussion of the ‘Patacritical Demon as paradigm for archival interface is the necessity for the texts in the archive itself to be multifaceted, multidimensional, and in effect a boundless quantum archive. Turning back to the PMLA cluster and Meredith McGill’s comments about the Whitman Archive, McGill challenges Folsom’s claim that the Whitman Archive “permits readers to follow ‘the webbed roots’ of Whitman’s writing as they ‘zig and zag with everything’”—for the archive doesn’t do this. The texts are there, to be sure, but in no form other than a static digital reproduction of the book, important though that may be, with no interface that identifies, highlights, or in any way illuminates or stores these webs and zigs and zags any differently than the physical book, or with any apparatus that allows the user to mark their own path or paths. McGill offers suggestions for encouraging the rhizomorphous connections that could be made when she says the archive should “provide hyperlinks to Whitman’s editorials in the Brooklyn Daily Eagle or to his short fiction that is available through public-domain sites such as Making of America” and so on. These are good suggestions to be sure, but still antiquated in relation to how content can be shared online, and in fact is shared online. Invoking Richard Stallman [restating Stewart Brand] here, “information wants to be free.” Stallman is not referring to price, but instead “to the freedom to copy the information and to adapt it to one’s own uses […] When information is generally useful, redistributing it makes humanity wealthier no matter who is distributing and no matter who is receiving.”

In her response to Ed Folsom, Meredith McGill asks “what would it take to realize Folsom’s vision of a database that allows readers to follow Whitman’s writing as it ‘darts off in unexpected ways’?” First, it’s not the database that allows this—it’s the interface to the database, or really, the interface to someone else’s database. Instead of using an uncontextualized hyperlink to send the reader outside the archive, use application programming interfaces to bring the content inside. In other words, mash up the archive.

If you want an archive to be rhizomorphous, then implement the processes that can mine the data from other sources—on demand, as the user winds their way through Whitman (in this case). With editorial (and programmatic) control over the APIs in use, “containing multitudes” takes on a whole new meaning. The multitudes of contextual and paratextual information that could be associated with the core archival texts through external means would truly allow the scholar to find and illuminate their own paths through the Whitman archive.
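
[Not spoken: a minimal sketch of what “editorial (and programmatic) control over the APIs in use” could look like. Everything named here is hypothetical: the `Mashup` class, the `approve` step, and the idea that each external source is wrapped in a function taking a passage identifier. No real archive’s API is being described, and the sketch makes no network calls.]

```python
from typing import Callable, Dict, List

# A fetcher stands in for a client of some external API: given a
# passage identifier, it returns a list of related records.
Fetcher = Callable[[str], List[dict]]


class Mashup:
    """Pulls outside context into the archive, on demand."""

    def __init__(self) -> None:
        self._sources: Dict[str, Fetcher] = {}

    def approve(self, name: str, fetcher: Fetcher) -> None:
        """Editorial control: only approved sources are ever queried."""
        self._sources[name] = fetcher

    def context_for(self, passage_id: str) -> Dict[str, List[dict]]:
        """Ask every approved source about one passage as it is read."""
        gathered: Dict[str, List[dict]] = {}
        for name, fetch in self._sources.items():
            try:
                gathered[name] = fetch(passage_id)
            except Exception:
                # One unreachable source should not break the page.
                gathered[name] = []
        return gathered
```

The design choice worth noting is that the archive’s own curated text never changes; the outside material is gathered at reading time and laid alongside it, so a source can be added or dropped without touching the core data.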

Return now to McGann’s ‘Patacritical Demon and think for a moment about how just an interaction layer for marking text in multiple dimensions, sitting on top of a content layer, changes modes of scholarship.

Now think about integrating other tools into that application—Juxta, for comparing and collating textual works; Collex, for collecting and annotating objects; the Voyeur tools, for seeing through the texts; Omeka, for sharing custom collections; all through application programming interfaces (APIs).

Now think about an archive that doesn’t stop with the last digitized image and TEI encoded page pair that has been carefully curated and sits happily on a server; think about the connections or pathways that could unfold much like a choose-your-own-adventure novel. Each path through—each set of connections followed—brings about new knowledge, and the ability to mark those paths from session to session and compare not only the paths and the texts that unfold but the impetus behind following those paths, is in fact the creation of a new knowledge environment.

All of these processes either exist in parts or have clear developmental paths to their creation. I know the title of this panel is “Theorizing the Digital Archive,” and I’m all for theorizing, but I’m also all about building. McGann’s ‘Patacritical Demon isn’t a pipe dream, and archives should be conceived initially as sites of interactivity and rhizomorphous growths of content. An audience member in the Goldsby talk yesterday asked how the content in the Mapping the Stacks project can be shared—a question like that should be the first question asked by team members before development commences, and not as an afterthought or an add-on. The technology for sharing, integration, and creating new models of interactivity is there for the using, and has been there for several years; we need to catch up, and think outside the book so that we can fully immerse and engage with scholars inside it.

So there you have it. I am grateful for the comments and questions I received both during and after the discussion portion. Probably when I finish the dissertation I’ll start working on a demo.


15 Comments on “My Yale PDP 2010 Presentation”

  1. Good piece, Julie — thanks for this. I’m trying these days to clarify some of the things I keep finding myself thinking about the uses of technology, and this one helps!

  2. Thanks, Richard! I’m happy to talk about anything, anytime (just ask Cheryll G!) I have so many ideas for projects that it’s a little ridiculous. On the literature side of me, I work on Muir (who knew, right?) and I’m going to start a project once I get to Victoria about Muir’s Victoria stopovers on his way to Alaska.

  3. Enjoyed the piece. It made me think of another riposte to the Folsom forum I heard during a talk by Matt Cohen, a sometime associate editor (I think) of the Whitman Archive. After pointing out (as you did here) the various ways interface already structures what we think of as database, Cohen talked about some of his “desires, pipe dreams” for a way of searching the archive outside the box, as it were. Rightly complaining of how a search box seems to presume a single, navigational authority over a site’s contents, Cohen said he wanted “multiple and redundant search engines” (not sure I have the phrase exactly right) for the archive, which would actually be (his point) more Whitmanesque than what Folsom claims. Cohen said a lot more during his talk, but that phrase has stuck with me, and reminds me a lot about how you’ve specced out the Patacritical Demon.

  4. It’s an ambitious dream. Seeing it in action would surely be something. I agree that crowdsourcing textual markup is long overdue. But would the resulting creation be a chorus or cacophony?

    I wonder why you dismiss databases so quickly in this post. I think Folsom connected databases to catalogs, because they both offer a vision of control and completeness over the disparate and data. The gaze of the gatekeeper can certainly give us pause, but maybe information wants to be both free and linked.

    I think a case could be made for viewing the millions of web sites mentioning Walt Whitman as a corpus of n dimensions. We might even think of mining the web of Whitman as reenacting the rhizomorphous nature of his compositions. I guess it all depends on whether the meaning and significance of texts are found in crowdsourced close readings or in something more akin to the semantic web.

  5. Paul – thanks for the additional info! The PD specs are pure UVa — I just took what I saw and ran with them. It’ll be interesting to see how this is received at UVa.

    Sterling — I’m not talking about crowdsourcing textual markup; I’m talking about creating an individualized research tool as an overlay onto an archive (in part). I also don’t know where I dismiss databases. I revere databases. I do not revere a misunderstanding of what computational databases are and do. I’m also not at all suggesting anywhere that categorization and gatekeeperish actions do not exist. I am, however, saying that data view can coexist with others. And yes, I am talking about ongoing data mining through automated means as one part of producing rhizomorphous growth. Obviously the dissertation version of this is clearer. Or will be.

    Augusta — no one’s looking forward to the next installment more than I am. Except maybe you! 🙂

  6. If you’re not already familiar with it, check out the Open Annotation Collaboration which is funded by the Mellon Foundation:

    Among the integrations of their annotation spec is that of the AXE annotation tool, developed at MITH, into CHNM’s Zotero. This is the largest project I’m currently working on at MITH. You’ll also be able to sync your annotations to the Zotero server, and using the OAC data standards it’s feasible that those annotations could be shared with any other tool. Sounds related to what you’re discussing.

  7. Dave, Dave, Dave. Am I familiar with it. OF COURSE. For years it was something I had on my to-do list to build, and when MITH released the first version of it I was thrilled. I completely agree that it will “revolutionize the production of electronic editions and digital archives” (so says its description page). I discuss its existence and what it means for scholars and these sorts of future collaborations in the ol’ diss.

    Which is to say we should probably talk more if you’re the one working on it now (yay)…

  8. Right. I was so happy w/ your comment that I just focused on the part that mentioned how the tools I like and identified as most useful (to what I want to see happen) are actually actively (and probably quickly) moving forward in that way.

    Would you agree that the timelines for these sorts of things are much more industry than academic? In other words, I see a lot of work happening very quickly, which is what I am used to, but I don’t think people in academia are, for sure. I wonder if it will open an even further divide (as people already think they’re far behind). That’s just musing on my part.
