When I was in Fort Worth for the ARLIS conference this spring, I learned a lot about the history of the cattle business in the post-Civil War period. The cities of the East and the West were hungry for Texas beef, but there was no practical way to get it to them. The age of the cowboy took place because a particular kind of soul was needed to lead the herds north from Fort Worth through exposed frontier to the train yards of Kansas City. But when the railroad reached Fort Worth (in 1876), everyone quickly adapted. The cattlemen soon realized that they could do more than load their cattle on trains at Fort Worth, and entrepreneurs raised capital from wealthy Boston investors to build “processing” plants at the Fort Worth rail yards. The moral of this story (and I’m sorry to have turned off the vegetarians in the reading audience) is simply that everyone involved in the process was more than happy to take advantage of infrastructure as it became available.
In May I was at HASTAC, a growing and thriving hub of digital humanities practitioners. There were some lively debates about, for example, whether librarians should be seen as a service bureau to their colleagues in the humanities or whether they should be creative participants and partners in the formulation and execution of DH projects. But my two- or three-year-old impression, that the wild west of DH fostered go-it-alone, one-off, boutique, and unscalable projects, didn’t really hold up. At least it didn’t fully hold up. Really smart people who can talk about George Eliot and talk about big data don’t want their work to be orphaned in non-conforming software. But being on a panel (with Nancy Maron of ITHAKA and Julie Bobay from the library at Indiana University) about how to make DH projects scalable, shareable, connectable, sustainable, and preservable left me feeling like we (collectively) don’t have the balance between infrastructure and the innovative edges quite right yet.
I was there to report on how some projects that won a contest to use our Shared Shelf cataloging and asset management software were approaching their work. The Historic Dress project created by faculty and staff from Smith, Vassar, OSU, Drexel, and others has been gathering content and (even more impactfully) designing and implementing a Costume Core data model for working with these materials. The people who have envisioned and are carrying out that project recognize the importance of authority files and are deriving Costume Core from the Getty’s Art & Architecture Thesaurus as much as possible. Another one of our digital humanities contest winners is Medieval Portland, which gathers content related to digital collections in Oregon from over 10 institutions ranging from Portland State to Mount Angel Abbey, and which will use Shared Shelf and its ability to publish to Mirador (a Hydra-IIIF-based image-viewing platform) to bring its digital content into a higher level of aggregation and exposure. To us, these are like the enterprising cattlemen who saw a railroad track arriving and said, “Hey, there’s one thing that we don’t have to figure out. How can we take advantage of that?”
But our session was sparsely attended, amidst a conference of hundreds of people, and those in the audience were almost entirely from the librarian side of a very mixed population. It was a perfect example of what the sociologist Robert K. Merton called self-exemplification: if our thesis in assembling the panel was that many DH practitioners didn’t focus on long-term sustainability or seek to enmesh their projects in larger networks of digital asset building, it would make sense that they would not come to a panel that focused on those aspects of DH work. People who live in borderlands think that the right place to live is in the borderlands. They like speaking the language of both sets of neighbors and figuring out how to negotiate between cultures. They feel that figuring out how to solve a mix of complex technical and humanistic challenges is a great place to be.
Except when it isn’t. The humanities rely on primary source material and the one fundamental truth about primary source material is that it is endless. Literally endless. New people are writing new journal entries every day, and photographers are taking new pictures of new buildings being built every day. And the archives of manuscripts, church records, shipping contracts, voting registers, movie posters, ticket stubs, and massing models for unbuilt buildings boggle the mind.
And these are the basis of many thrilling DH projects. But as the digital manifestations of these materials are created and managed to solve the short-term needs of a project, there is a very high probability that they (the digital manifestations) will live a short and only modestly productive life. The world of primary source users recognizes, just as the eaters of beef in 1880s Boston and the investors of capital in 1880s Fort Worth did, that the world is optimally served by relying on networked infrastructure. And no, the Internet alone is not the equivalent of the railroads.
One missing aspect of a networked approach to digital content building is the capacity to build, manage, and use a network of shared authority files/registries/taxonomies/thesauri and controlled lists. Why is this major stratum of a national digital infrastructure missing?
Many of the smart people who design and carry out DH projects recognize the need to control and standardize the metadata that they use to manage their digital content. This awareness is particularly acute in those who are cataloging media assets like images and video for which the words attached to the asset are (for the moment, at least) the only way to discover that asset. But tags are also crucial to categorizing other assets, such as the content of political cartoons, the subject matter of letters, or the topics broached in oral histories. Convenience dictates the use of thesauri and controlled lists long before the consideration of the project’s legacy does; the DH investigator quickly learns that free-form cataloging gets sloppy and redundant once your list of terms requires scrolling down. So, homebrewed lists are created and often they are created in a sophisticated hierarchical fashion – projects around the country and around the world have forged their own hierarchies of emotions, topics, materials, roles, or foods intermixed with cultures or places or times. In this way, following all the right instincts of metadata strategy, taxonomic islands are created.
With the emergence of the Digital Public Library of America and with the IMLS setting out to encourage and support the cyber-infrastructure of a National Digital Platform, Digital Humanities practitioners can play a formative role in working together to craft shared, shareable, community-supported, and scalable solutions to the need for discipline-based (and subdiscipline-based) authority files and taxonomies. Do you know examples of such efforts? If so, we would love to know about them; we need to start with lessons learned, and then we will all benefit by putting lessons about collaboration and scalability in this key area of infrastructure into practice. If you have good examples or good ideas, tweet me @jamesshulman or write to me at js[at]artstor.org.