Text analysis: tools & approaches

Advanced Qualitative Methods


Jeremy Buhler, jeremy.buhler@ubc.ca
Mathew Vis-Dunbar, mathew.vis-dunbar@ubc.ca

Learning objectives


  1. Identify common text analysis techniques
  2. Understand sources of bias in automated analysis
  3. Use NVivo software for text coding

Outline

2:05 Initial questions
2:15 Text analysis techniques
2:25 Automated text analysis
2:35 Working with NLP libraries
2:45 5-minute break
2:50 Orientation to NVivo
3:05 Coding in NVivo
3:20 Importing survey data
3:30 Questions and discussion

Initial questions

    What is your dataset?

    What is your objective?

    Manual or automated processing?

What is your dataset?

  • size/extent
  • format
  • privacy/sensitivity

What is your objective?

  • research questions
  • type of analysis
  • expected output

Manual or automated processing?

    automated: machine learning, algorithms

    manual: computer-assisted, but the researcher is more directly involved

Text analysis techniques

  • Word frequency
  • Collocation
  • Concordance
  • Text extraction
  • Text classification

Word frequency
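
A minimal sketch of word frequency counting, using only the Python standard library. The sample sentence and the function name `word_frequencies` are made up for illustration; a real project would likely also remove stop words such as "the" and "is".

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercase word occurs in text."""
    # Simple tokenizer: runs of letters, with an optional internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)

# Hypothetical sample text, not taken from any dataset.
sample = "The water is clean. Clean water matters, and the water is shared."
freq = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in freq.most_common(3):
    print(word, count)
```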

Word frequency with stemming
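
Stemming groups inflected forms under one stem before counting. The sketch below uses a deliberately crude suffix stripper (an assumption for illustration, not the Porter stemmer or any toolkit's algorithm) to show the idea and one of its limits: "development" is not merged with "develop".

```python
import re
from collections import Counter

# Assumption: a toy suffix list. Real stemmers use far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stemmed_frequencies(text):
    """Count words after reducing each one to its naive stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(token) for token in tokens)

# Hypothetical sample text.
sample = "Develop, develops, developed, developing: development keeps developing."
counts = stemmed_frequencies(sample)
print(counts)
```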

Collocation

...there's some concern that development happen in a way that still protects the environment water quality. Hopefully development is gonna start to become...

...one of the highest producing water bodies in North Carolina is because there isn't that much development. And so you know, I...
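
One simple way to look for collocates is to count the words that fall within a few tokens of a target word. The sketch below applies this to the two excerpts above; the window size of 4 and the function name `collocates` are arbitrary choices for illustration, and toolkits typically add statistical measures on top of raw counts.

```python
import re
from collections import Counter

def collocates(text, target, window=4):
    """Count words appearing within `window` tokens of each `target` hit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            # Words before the hit plus words after it, excluding the hit.
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts

excerpts = [
    "...there's some concern that development happen in a way that still "
    "protects the environment water quality. Hopefully development is gonna "
    "start to become...",
    "...one of the highest producing water bodies in North Carolina is "
    "because there isn't that much development. And so you know, I...",
]

# Process each excerpt separately so windows never span two excerpts.
total = Counter()
for excerpt in excerpts:
    total += collocates(excerpt, "development")

print(total.most_common(5))
```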

Concordance
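
A concordance lists every occurrence of a keyword with its surrounding context, often called keyword-in-context (KWIC). A minimal sketch, assuming a single-line text and a fixed character width on each side; the sample text and the function name `concordance` are hypothetical.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the keyword lines up in a column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right:<{width}}")
    return lines

# Hypothetical sample text.
sample = ("Water quality matters here. "
          "Development near the water changes the shoreline.")
hits = concordance(sample, "water")
for line in hits:
    print(line)
```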

Text extraction

  • Keywords
  • Named entities
    • people
    • places
    • organizations
    • ...
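
To give a feel for why named entity extraction is hard, here is a naive heuristic that treats any run of capitalized words as a candidate entity. This is an assumption-laden sketch, not how NLP toolkits do it (they generally rely on trained models), and it produces false positives such as sentence-initial words.

```python
import re

# Heuristic: one or more capitalized words in a row.
CAPITALIZED_RUN = re.compile(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*")

def candidate_entities(text):
    """Return runs of capitalized words as candidate named entities."""
    return CAPITALIZED_RUN.findall(text)

# Hypothetical sample text.
sample = ("Emily Dickinson wrote the poem. "
          "Residents of North Carolina discussed water quality.")
candidates = candidate_entities(sample)
print(candidates)
```

Note that "Residents" is picked up only because it starts a sentence, which is exactly the kind of error a human coder or a trained model would need to catch.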

Text classification

  • Sentiment
  • Action
  • Activity
  • Belief
  • Emotion
  • Issues
  • ...

Creatures of Classification

Implications of classifications

Genetic lineage

Harakeke

Horticulture

Questions

  • Context of creation
  • Human-identified patterns
  • Machine-identified patterns
  • Degree of intervention

Natural Language Processing Toolkits

Functions, lexicons, and algorithms.

Examples

  • NLTK
  • spaCy
  • Stanford CoreNLP
  • CogCompNLP
  • MALLET (MAchine Learning for LanguagE Toolkit)

Breaking down text

Because I could not stop for Death -,

He kindly stopped for me -,

The Carriage held but just Ourselves -,

and Immortality

Line  Sentence
1     Because I could not stop for Death -
2     He kindly stopped for me -
3     The Carriage held but just Ourselves -
4     and Immortality

Line  Order  Word
1     1      because
1     2      i
1     3      could
1     4      not
1     5      stop
1     6      for
1     7      death
2     1      he
2     2      kindly
2     3      stopped
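
The line/order/word breakdown above can be sketched in a few lines of Python. The function name `tokenize_lines` and the simple letters-only tokenizer are assumptions for illustration; NLP toolkits ship more careful tokenizers.

```python
import re

poem = [
    "Because I could not stop for Death -",
    "He kindly stopped for me -",
    "The Carriage held but just Ourselves -",
    "and Immortality",
]

def tokenize_lines(lines):
    """Return (line number, word order, lowercase word) for every word."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        # Lowercase the line and keep only runs of letters as words.
        words = re.findall(r"[a-z']+", line.lower())
        for order, word in enumerate(words, start=1):
            rows.append((line_no, order, word))
    return rows

rows = tokenize_lines(poem)
for line_no, order, word in rows[:10]:
    print(line_no, order, word)
```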

Patterns

Year  Candidate  Won (W) or Lost (L) the Popular Vote  Number of 'will', 'shall', 'going to'
1960  Kennedy    W                                     163
1960  Nixon      L                                     122
1976  Carter     W                                     68
1976  Ford       L                                     32
1980  Reagan     W                                     19
1980  Carter     L                                     18
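
Counts like those in the table can be produced with a simple pattern match. The sketch below is an assumption about how such a tally might be done, not the method behind the table, and the sample sentence is invented. It would also count "will" used as a noun and would miss contractions such as "we'll".

```python
import re

# Whole-word matches for the three future-oriented markers.
FUTURE_MARKERS = re.compile(r"\b(?:will|shall|going to)\b", re.IGNORECASE)

def count_future_markers(text):
    """Count occurrences of 'will', 'shall', and 'going to' in text."""
    return len(FUTURE_MARKERS.findall(text))

# Hypothetical sample, not a quotation from any candidate.
sample = ("We will build. We shall not wait. "
          "We are going to act, and we will listen.")
marker_count = count_future_markers(sample)
print(marker_count)
```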

Complex algorithms

  • Rules of grammar
  • Lexicons of sentiment correlation
  • Document structures or genres
  • Machine-detected patterns

The problem

I have a tear...

in my pants.

Hermans, F. (January 25, 2019). Explicit Direct Instruction in Programming Education. [Talk]. https://rstudio.com/resources/rstudioconf-2019/explicit-direct-instruction-in-programming-education/
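
The ambiguity is easy to reproduce with a word-level sentiment lexicon, which scores each word without looking at context. The lexicon below is a tiny made-up example (an assumption for illustration, not a published lexicon), and `lexicon_score` is a hypothetical helper.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1,
    "clean": 1,
    "hopeful": 1,
    "bad": -1,
    "concern": -1,
    "tear": -1,  # assumed negative, as in crying
}

def lexicon_score(text):
    """Sum the lexicon values of every word in text; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

# The word "tear" lowers the score even though the sentence is about a rip.
pants_score = lexicon_score("I have a tear in my pants.")
print(pants_score)
```

The score comes out negative because the lexicon cannot tell a torn seam from crying; resolving that requires context, which is where the more complex algorithms come in.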

The needed outcome

Text: ... The thieves stole the paintings. They were subsequently sold. ...

Human: Who or what was sold?

Machine: The paintings.

Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. http://www.nltk.org/book/

Hands-on practice with NVivo

For NVivo and QDA resources see
https://ubc-library-rc.github.io/nvivo/

NVivo alternatives

  • ATLAS.ti - similar features, also not free
  • Taguette - for tagging/coding only, open source

NVivo demo