Optional Module: Distant Reading and Text Analysis

This module introduces the basics of distant reading and text analysis. The readings and discussion survey a few of the many possibilities of text analysis; the technical activities give you hands-on practice distant reading historical documents.

Readings/Discussion

1. read Mark J. Hill and Simon Hengchen, “Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study,” Digital Scholarship in the Humanities 34, no. 4 (2019).

2. read Elizabeth Callaway et al., “The Push and Pull of Digital Humanities: Topic Modeling the ‘What is digital humanities?’ Genre,” Digital Humanities Quarterly 14, no. 1 (2020).

3. read Shawn Martin, “Topic Modeling and Textual Analysis of American Scientific Journals, 1818-1922,” Current Research in Digital History 2 (2019) OR Peter Carr Jones, “Macroanalysis of the Indian Claims Commission,” Current Research in Digital History 1 (2018).

4. use the Slack channel to make note of your thoughts as you read and complete these exercises; depending on how many people are completing this optional module, this may or may not turn into an active discussion (that’s okay).

Technical Activities (Required)

1. download AntConc and complete the Programming Historian tutorial on corpus analysis

2. create your own (small!) dataset of historical documents and work through the tutorial again using your own documents. What kinds of historical questions can you ask and answer with your dataset and this tool?

3. try to upload one of those historical documents to Voyant Tools and compare what you can see in Voyant to what you can do with AntConc. (For help figuring out the Voyant interface, there is a brief introduction/tutorial available on the website.) When might you want to use one versus the other?
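
If you are curious what tools like AntConc and Voyant are doing under the hood, the two most basic operations (counting word frequencies and showing a keyword in context) can be sketched in a few lines of Python. This is only an illustrative sketch, not how either tool is actually implemented: the function names, the simple tokenizing rule, and the sample sentence are all made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, top_n=5):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokens).most_common(top_n)


def keyword_in_context(tokens, keyword, window=3):
    """Return each occurrence of keyword with up to `window` tokens on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample text; swap in the contents of one of your own documents.
sample = ("The committee heard the claim in the spring. "
          "The claim was denied, and the tribe filed a new claim.")

tokens = tokenize(sample)
print(word_frequencies(tokens, top_n=2))
for line in keyword_in_context(tokens, "claim"):
    print(line)
```

Running the same functions on one of your own documents is a quick way to check whether what you see in AntConc's word list and concordance views matches your expectations.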

Technical Activities (Optional)

4. feeling adventurous and particularly interested in topic modeling? Complete the Programming Historian tutorial on topic modeling instead of (or in addition to) the corpus analysis activities (activities 1-3).

NOTE: if you haven’t used the command line before (especially if your computer runs Windows), work through the “Introduction to the Bash Command Line” tutorial before trying to do anything with wget.