
Collections as Data

In Digital Library Services, we've recently been thinking about how to develop our digital collections so they better support researchers who want to use them for data visualization or other forms of computational research. This work is part of a broader movement within the LIS field, and we've been developing a local approach to releasing some of our materials as bulk downloads for researchers.

Our first project is a curated collection of OCR text files extracted from oral histories centered on the topic of mining. The data is available in the Marriott Library's GitHub repository for anyone who wishes to experiment with digital humanities techniques such as topic modeling. The repository also includes metadata for the collection and instructions for bulk downloading the source PDF files.
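For anyone wondering what a first experiment with the text files might look like, here is a minimal sketch of a common preparatory step before topic modeling: tokenizing each file and counting terms. It uses only the Python standard library. The folder name `ocr_text`, the function names, and the tiny stopword list are all assumptions made for illustration; they are not part of the repository.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "and", "was", "that", "this", "with", "for", "you", "they", "were"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of 3+ letters that are not stopwords."""
    return [w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOPWORDS]


def term_counts(folder):
    """Return a dict mapping each .txt file name in `folder` to a Counter of its tokens."""
    counts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # OCR output can contain odd bytes, so replace anything undecodable.
        text = path.read_text(encoding="utf-8", errors="replace")
        counts[path.name] = Counter(tokenize(text))
    return counts


def top_terms(counts, n=10):
    """Combine the per-file counts and return the n most common terms overall."""
    total = Counter()
    for file_counter in counts.values():
        total.update(file_counter)
    return total.most_common(n)


if __name__ == "__main__":
    # "ocr_text" is an assumed local folder holding the downloaded text files.
    for term, count in top_terms(term_counts("ocr_text")):
        print(term, count)
```

The per-file counts are the kind of input a topic modeling library would typically build on, so this is a starting point for exploration, not a topic model in itself.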

[Figure: topic model example]

A group of us recently presented on the project at the 4th Digital Humanities Utah Symposium. Slides from that presentation are available and give more background on the project.
