R

Web Scraping with R: A Great Resource for Language Learning and Teaching

1. Introduction
  1.1. Tools Needed
2. Example 1: Scraping Webpages
  2.1. Wikipedia entries
  2.2. More ideas
3. Example 2: Scraping Online Newspapers
  3.1. Op-Eds from The Washington Post
  3.2. More ideas
4. Example 3: Scraping Blogs
  4.1. The Big Bang Theory transcripts
  4.2. More ideas
5. Summary

Recently, I helped a colleague scrape text from Wikipedia for a class project.
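As a flavor of the approach, here is a minimal sketch using the rvest package; the URL is only an illustrative example, and the post's actual tooling and selectors may differ.

```r
# Minimal rvest sketch: pull paragraph text from a Wikipedia entry.
# The URL below is just an illustrative example.
library(rvest)

url <- "https://en.wikipedia.org/wiki/Corpus_linguistics"
page <- read_html(url)

# Grab every paragraph node and extract its visible text
paras <- html_text2(html_elements(page, "p"))
head(paras)
```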

Update to Using Stanford CoreNLP with R

1. Preparation
  1.1. Install Java
  1.2. Install cleanNLP and language model
2. Annotation Using Stanford CoreNLP
3. Example Text Analysis: Creating Bigrams and Trigrams
  3.1. With tidytext
  3.2. Manually Creating Bigrams and Trigrams
  3.3. Example Analysis: Be + words

Forget my previous posts on using the Stanford NLP engine via the command line and retrieving information from XML files in R….
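For context, annotation through cleanNLP follows roughly the pattern below. This is a sketch, not the post's exact code: the cnlp_* function names follow the older cleanNLP (2.x) interface and may differ in newer releases.

```r
# Sketch of annotating text with cleanNLP's CoreNLP backend.
# Function names follow the older cleanNLP (2.x) interface.
library(cleanNLP)

cnlp_init_corenlp(language = "en")  # assumes Java and the models are installed

anno <- cnlp_annotate("The students are reading. They have been busy.")

# Token table with lemmas and part-of-speech tags, usable for
# building bigrams/trigrams or filtering "be + word" patterns
tokens <- cnlp_get_token(anno)
head(tokens)
```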

Comparing tools for obtaining word token and type counts

1. Text files
2. Working with R packages
  2.1. Quanteda
  2.2. Tidytext
3. Results from Natural Language Processing Tools
  3.1. spacy
  3.2. Stanford CoreNLP
4. Comparisons
  4.1. Tokens
  4.2. Types

When analyzing texts in any context, the most basic linguistic characteristics of the corpus (i.e., texts) to describe are word tokens (i.e., the number of words) and types (i.e., the number of unique words)….
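Since the post compares these counts across tools, here is a minimal sketch of the quanteda side of such a comparison; the sample text is invented.

```r
# Counting word tokens and types with quanteda.
library(quanteda)

txt <- c(sample = "The cat sat on the mat, and the cat slept.")

toks <- tokens(txt, remove_punct = TRUE)
ntoken(toks)  # word tokens: every running word counts
ntype(toks)   # word types: each distinct word counts once
```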

Working with XML-formatted text annotations in R

1. From XML to tagged corpus
  1.1. Creating tagged text
  1.2. Rendering XML to data frame
  1.3. Creating tagged texts
2. Example query and concordances

In this post I'm documenting how to reformat the XML-formatted files output by the Stanford CoreNLP tool. This might not be the most elegant way to go about it, but it works for me.
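The general shape of that reformatting, sketched with the xml2 package: the element names follow CoreNLP's XML output, and the file name is a placeholder, not the post's actual data.

```r
# Sketch: read CoreNLP XML output and flatten tokens into a data frame,
# then render a word_POS tagged version of the text.
library(xml2)

doc <- read_xml("corenlp_output.xml")  # placeholder file name

words <- xml_text(xml_find_all(doc, "//token/word"))
pos   <- xml_text(xml_find_all(doc, "//token/POS"))

tok_df <- data.frame(word = words, pos = pos)

# One tagged string, e.g. "This_DT is_VBZ ..."
tagged <- paste(tok_df$word, tok_df$pos, sep = "_", collapse = " ")
```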