Digital humanities: Text mining using Gale Digital Scholar Lab

The University Library has acquired digital archives from Gale Cengage, a publisher of large collections of primary source materials, including historical documents and newspapers. These archives are now available within a new resource called the “Gale Digital Scholar Lab”, which has been designed specifically to enable text mining and analysis.

Using the Lab, you can search the archives as you would on their native platforms and build content sets from the search results. You can create multiple content sets and analyse the corpus you amass using the tools provided in the Lab. The tools currently available are all open source, and the publisher intends to expand on them over time: Topic Modelling (MALLET); Frequencies (Lucene); Clustering (scikit-learn); Parts-of-Speech Tagger (spaCy); Sentiment Analysis (OpenNLP); Named Entity Recognition (spaCy); Ngrams (Lucene).
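For readers new to these methods, the short Python sketch below illustrates what the Frequencies and Ngrams tools compute in principle: counts of single words and of runs of consecutive words. It is not the Lab's implementation (the Lab uses Lucene for both), and the sample sentence and the function names `tokenize` and `ngram_counts` are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=1 gives word frequencies)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample text standing in for the OCR text of a content set.
sample = "The Times reported the news. The Times reported the weather."

tokens = tokenize(sample)
word_freq = ngram_counts(tokens, 1)  # single-word frequencies
bigrams = ngram_counts(tokens, 2)    # two-word sequences

print(word_freq.most_common(3))
print(bigrams.most_common(3))
```

In a real content set the OCR text is far noisier than this sample, so tools of this kind typically also offer cleaning steps, such as removing stop words, before counting.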

The Lab promises to open up new possibilities for the relative newcomer to digital scholarship in this area, allowing natural language processing tools to be applied to raw text data (OCR) and facilitating new discoveries and insights. The Lab places strong emphasis on the visualisation of results and data, and so lends itself to scholarly sharing and to “bridging the gap between scholarly resources and faculty researchers/students”. It also helps you organise content sets (renaming, duplicating and versioning them) and records the searches used to create each one, which makes sharing and reproducing research projects easier than is usually the case.

The archives included in the Lab to which Cambridge has access for analysis are:

17th and 18th century Burney collection

19th century UK periodicals

British Library newspapers

Economist historical archive, 1843-2014

Eighteenth century collections online

Illustrated London News historical archive, 1842-2003

Making of modern law: legal treatises, 1800-1926

Nineteenth century U.S. newspapers

Times digital archive

Times literary supplement historical archive

U.S. declassified documents online

 

Access to the Lab is on a trial basis, to help Cambridge assess its usefulness to practitioners and to promote the resource for digital humanities scholarship in Cambridge generally. Access is available now, using the details below, until 31 December 2018.

Please contact ejournals@lib.cam.ac.uk to obtain the username and password, which will be supplied by return email.

http://galesupport.com/international/trialsite

When you first log in, you will be asked to create a user account using your Google account or your Microsoft account. We recommend the Microsoft account option, which connects to your institutional account via the University of Cambridge login page.

Because access is on a trial basis only, restrictions are in place on the export of documents.

Feedback on the trial is welcome. You may prefer to ask that it be sent to your library’s email address, or you can use ejournals@lib.cam.ac.uk for any communication; we will collect what we receive for onward sharing in 2019.
