Harvest, Clean and Analyse large amounts of digitised text

Time & Room

Time	09:30 – 15:30
Room	to be announced
Signing up	Send an e-mail to DHBenelux (attn. Martijn Kleppe)

Description

When analysing sources of the National Library of the Netherlands (KB), researchers often use Delpher, the online gateway to more than 10 million pages of historical text (newspapers, books, journals and radio bulletins), mostly in Dutch. Delpher lets you search and browse all documents in full text, making it a good resource for close reading. However, if you want to analyse large amounts of data for distant reading, the KB gives researchers bulk access to the digital images, metadata and full text via the KB’s Dataservices & APIs, as well as to additional data such as the Medieval Illuminated Manuscripts and the Dutch Digital Parliamentary Papers. To successfully harvest this data and subsequently clean and analyse it, you need knowledge about:

the KB’s data formats and infrastructure,
tools to clean the data, and
tools to analyse the data.

During this workshop, you will get hands-on experience with, and guidance on, all three steps. Experts from the KB (René Voorburg, Steven Claeyssens and Martijn Kleppe) will first guide you through the KB’s metadata and available datasets. Then a PhD researcher from Utrecht University (Melvin Wevers) will show you which tools are available to clean the data and will assist you in making your first analyses.

During the first part of the workshop, you will be guided through a number of exercises, with all participants using the same dataset. During the second part, you can start collecting and working with a selection of KB datasets that best fits your research interests, under the guidance of KB experts.

This workshop is aimed specifically at beginning users who have an interest in KB data. We assume no prior experience with KB (meta)data and no significant technical knowledge or skills, such as programming, although basic computer skills are expected. The workshop will be held in English. All data that we will work with will be in Dutch.

Program

The program of this workshop can be downloaded here as a PDF file.

Organisers

Dr. Steven Claeyssens, dr. Martijn Kleppe, ir. René Voorburg, drs. Melvin Wevers

Contact

For more information, please e-mail: Martijn Kleppe

Digital Humanities Benelux Conference 2017