Textmining

About

This site contains a GitHub repository of files for a workshop in text mining using Python, Voyant Tools, and the DH-Lab tools. The DH-Lab tools consist of web applications and Jupyter Notebooks with Python example code for text mining the collections of the National Library of Norway, whereas Python and Voyant Tools can be used to text mine any corpus, given proper preprocessing of the data. Voyant accepts a wide variety of file formats directly, whereas Python usually requires more preprocessing. Read more about preprocessing data for Python in the recommended textbooks, especially “Processing Raw Text” and “Parsing and Manipulating Structured Data”.
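
As a rough illustration of what preprocessing for Python can involve, here is a minimal sketch that lowercases a raw string, splits it into word tokens, and counts them. It is not taken from the workshop files: the function names `preprocess` and `word_frequencies` and the sample sentence are made up for this example, and real corpora usually need more careful cleaning.

```python
import re
from collections import Counter


def preprocess(raw_text):
    """Lowercase the text and split it into word tokens."""
    # \w+ matches runs of letters, digits and underscores,
    # so punctuation and whitespace are dropped.
    return re.findall(r"\w+", raw_text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


# Hypothetical sample text, used only for illustration.
sample = "Text mining extracts patterns from text. Text is unstructured data."
tokens = preprocess(sample)
frequencies = word_frequencies(tokens)
print(frequencies.most_common(3))
```

Only the Python standard library is used here, so the sketch runs without installing anything. The textbook chapters mentioned above cover more robust approaches.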

The lessons on text mining with Python are adapted from Library Carpentry: Data & Text Mining.

My GitHub profile

My University of Oslo Library profile

I also maintain the UiO Library’s webpages on text mining:

“Text Mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources… The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts.” - from What is Text Mining? by Marti Hearst

Index