Text Mining

Introduction to Text Mining with DH-Lab Jupyter Notebooks with example Python code

Key points:

The DH-Lab at the National Library of Norway has written example code for text mining the National Library’s large digitized collection, and it is developing web-based apps that offer a simpler introduction to text mining the collection. The code is written in Python and shared in Jupyter Notebooks; the apps are built with Streamlit, a free and open-source app framework for Python.

The DH-Lab’s Python code is available as Jupyter Notebooks at DH-Lab Digital tekstanalyse.

To run code to text mine the National Library’s digitized collections:

Sentiment Analysis

Also known as “opinion mining”, sentiment analysis describes automated methods for identifying affective states in data sets. It works by systematically selecting expressions of subjective opinion and emotional evaluation in the material. Sentiment analysis is popular in marketing and advertising, and it is used to examine the tone of political communication, public debate, and social media, as well as in studies of plot and genre in literary corpora.

Digital sentiment analysis of text data relies on word lists and data sets in which words and expressions are assigned a score based on their perceived emotional meaning. NLTK comes with a sentiment package, and DataCamp has published an NLTK Sentiment Analysis Tutorial for Beginners.
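The idea of scoring text against a word list can be illustrated with a minimal sketch. The lexicon below is a hypothetical toy example with made-up scores, not taken from NLTK or any published resource, and the function names are chosen only for illustration. Real lexicons are far larger and usually handle negation and intensifiers, which this sketch ignores.

```python
import re

# Toy lexicon with made-up scores (an assumption for illustration only).
TOY_LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "happy": 1.5,
    "bad": -1.0,
    "terrible": -2.0,
    "sad": -1.5,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens; unknown words contribute 0."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


if __name__ == "__main__":
    # "great" (+2.0) and "bad" (-1.0) are the only words found in the lexicon.
    print(sentiment_score("The concert was great, but the weather was bad."))
```

A positive total suggests a mostly positive text and a negative total a mostly negative one. How the scores are weighted and normalized is a design choice that varies between tools.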

Researchers at the University of Oslo participating in the project Sentiment Analysis for Norwegian Text (SANT) have developed data sets for sentiment analysis in Norwegian, including NorSentLex, a Norwegian sentiment lexicon that indicates the prior positive or negative polarity of words. DH-Lab has written example Python code that uses NorSentLex to run a sentiment analysis of a newspaper corpus in the National Library’s digitized collection. The example code is shared in a Jupyter Notebook.
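As a rough illustration of how a polarity lexicon can be applied to a corpus, the sketch below counts positive and negative words per document. It is not the DH-Lab notebook code. The two word sets are small hypothetical stand-ins for a polarity lexicon such as NorSentLex, and the document identifiers and texts are invented for the example.

```python
import re

# Hypothetical stand-ins for the positive and negative word lists
# of a polarity lexicon (not the actual NorSentLex contents).
POSITIVE = {"god", "glad", "fin", "flott"}
NEGATIVE = {"dårlig", "trist", "stygg", "vond"}


def polarity_counts(text):
    """Return (positive, negative) word counts for one text."""
    tokens = re.findall(r"\w+", text.lower())
    positive = sum(1 for token in tokens if token in POSITIVE)
    negative = sum(1 for token in tokens if token in NEGATIVE)
    return positive, negative


def corpus_polarity(docs):
    """Map each document id to (positive, negative, net) counts."""
    result = {}
    for doc_id, text in docs.items():
        positive, negative = polarity_counts(text)
        result[doc_id] = (positive, negative, positive - negative)
    return result


if __name__ == "__main__":
    # Invented example texts; a real corpus would be loaded from the collection.
    docs = {
        "avis_1": "En fin og flott dag, men trist slutt.",
        "avis_2": "Dårlig vær og vond vinter.",
    }
    for doc_id, counts in corpus_polarity(docs).items():
        print(doc_id, counts)
```

Counting word forms in this way ignores context, so the net count is best read as a coarse indicator for comparing documents or time periods rather than as a judgment about any single text.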
