Early English Print in the HathiTrust (ElEPHãT)

The ElEPHãT project uses Linked Data to enable scholarly investigation across dynamic collections combining EEBO-TCP and the HathiTrust.

The ElEPHãT project (Early English Print in HathiTrust, a Linked Semantic Worksets Prototype) demonstrates how Linked Data can combine, through worksets, information from independent collections into a coherent view that scholars can study and analyze, supporting academic investigation of the items those collections contain.

The project focuses on the potential symbiosis between two datasets. The first is Early English Books Online - Text Creation Partnership (EEBO-TCP), a mature corpus of digitized English text, from the first book printed through to 1700, offering highly accurate, fully searchable, XML-encoded texts. The second is a custom dataset from the HathiTrust Digital Library of all materials in English published between 1470 and 1700.

The project is a sub-award of the Mellon-funded Workset Creation for Scholarly Analysis (WCSA) project at the University of Illinois, and within Oxford is a collaboration between the Oxford e-Research Centre and the Bodleian Libraries. The project is working towards several technical objectives:

  • To generate RDF metadata for EEBO-TCP to complement the WCSA HathiTrust RDF;
  • To identify suitable ontologies for encoding the EEBO-TCP RDF that can be usefully linked to the HathiTrust data, and other external entities;
  • To identify and align co-references to entities within both datasets, and store these as RDF;
  • To provide infrastructure to host the RDF datasets and SPARQL query interfaces;
  • To create SPARQL queries of sufficient expressivity to parameterise worksets for scholarly investigation;
  • To demonstrate the construction and utility of such parameterised worksets through prototype user interfaces, showing how a user might create and view a workset and the content within it.
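
To make the idea of a parameterised workset more concrete, the following is a minimal sketch of how a SPARQL query might be assembled from user-supplied parameters (a date range and, optionally, an author). It is an illustration only, not the project's actual queries or data model. The function name `build_workset_query`, the use of `dcterms:title`, `dcterms:date` and `dcterms:creator` to describe works, the assumption that dates are stored as plain year values, and the use of `owl:sameAs` to record co-references between the two datasets are all assumptions made for this example.

```python
from typing import Optional

# Standard namespace prefixes; which vocabularies the project's RDF
# actually uses is an assumption of this sketch.
PREFIXES = (
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)

# Characters that must not appear inside an IRI written as <...> in SPARQL.
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\ \n\r\t')


def build_workset_query(start_year: int, end_year: int,
                        author_uri: Optional[str] = None) -> str:
    """Return a SPARQL SELECT query for works dated within a year range.

    Hypothetical helper: the graph patterns below assume each work has a
    dcterms:title and a dcterms:date whose lexical form is a year.
    """
    start_year, end_year = int(start_year), int(end_year)
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")

    lines = [
        "SELECT DISTINCT ?work ?title ?year ?coref WHERE {",
        "  ?work dcterms:title ?title ;",
        "        dcterms:date ?date .",
    ]
    if author_uri is not None:
        # Reject input that could break out of the <...> IRI syntax.
        if any(ch in _FORBIDDEN_IRI_CHARS for ch in author_uri):
            raise ValueError("author_uri contains characters not allowed in an IRI")
        lines.append(f"  ?work dcterms:creator <{author_uri}> .")
    lines += [
        # Assumed modelling: co-references between datasets as owl:sameAs links.
        "  OPTIONAL { ?work owl:sameAs ?coref }",
        "  BIND(xsd:integer(STR(?date)) AS ?year)",
        f"  FILTER(?year >= {start_year} && ?year <= {end_year})",
        "}",
        "ORDER BY ?year",
    ]
    return PREFIXES + "\n".join(lines)


if __name__ == "__main__":
    # Example: a workset of works dated 1600 to 1650, by any author.
    print(build_workset_query(1600, 1650))
```

The resulting string could be sent to whichever SPARQL endpoint hosts the RDF. Keeping the parameters separate from the query text means a prototype interface could offer them as form fields, so a user defines a workset without writing SPARQL directly.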