Download, manage and visualize Twitter data with R


Hey there,

In this article I will show you how to download, manage and visualize Twitter data. I will supply all the code needed to connect to the Twitter API, download tweets that match user-defined keywords, and plot them on a map.

Let’s start! So how do we get the Twitter data? Twitter allows users to access subsets of its data via its public API. To be allowed to download them, you have to be a registered user on Twitter. Then simply follow these steps:

1. Create App

Create a Twitter app: go to apps.twitter.com, click on the “Create New App” button, and fill in the name, a description of use and a website for your app. Please read the disclaimer carefully.

[Image: Create Twitter app]

2. Download twitteR package for R

The second step is to open R and install the twitteR package. It lets you use the Twitter API from R, which makes life much easier. Simply install and load the following packages:

install.packages("RCurl")
install.packages("twitteR")
library(RCurl)
library(twitteR)
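
If you rerun the script often, you may prefer to install only what is missing. A minimal sketch, assuming a hypothetical helper named missing_packages() (it is not part of any package):

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet
missing_packages = function(pkgs) {
  # requireNamespace() returns FALSE (instead of failing) when a package is absent
  installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install = missing_packages(c("RCurl", "twitteR"))
print(to_install)  # pass this vector to install.packages() if it is not empty
```

The check itself installs nothing, so it is safe to run at the top of a script before calling library().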

3. Establish a connection

Go back to the apps.twitter.com page, click on your app and then navigate to Keys and Access Tokens. Under “Your Access Token” you should be able to see the following:
Access Token: CombinationOfNumbersAndLetters
Access Token Secret: CombinationOfNumbersAndLetters

These tokens are unique to every user, so you should not reveal them in public.
Now go back to R and use the following code to establish a connection to the Twitter API:

connect2twitter = function() {
  # Set SSL certs globally
  cacert = system.file("CurlSSL", "cacert.pem", package = "RCurl")
  options(RCurlOptions = list(cainfo = cacert))

  # OAuth endpoints
  reqURL = "https://api.twitter.com/oauth/request_token"
  accessURL = "https://api.twitter.com/oauth/access_token"
  authURL = "https://api.twitter.com/oauth/authorize"

  # Credentials of your app
  consumerKey = 'PASTE YOUR KEY HERE'
  consumerSecret = 'PASTE YOUR SECRET HERE'

  # Create the OAuth object and run the handshake (asks for the PIN)
  twitCred = OAuthFactory$new(consumerKey = consumerKey,
                              consumerSecret = consumerSecret,
                              requestURL = reqURL,
                              accessURL = accessURL,
                              authURL = authURL)
  twitCred$handshake(cainfo = cacert)
  # Note: later twitteR releases dropped ROAuth; there, see ?setup_twitter_oauth instead
  registerTwitterOAuth(twitCred)
}
connect2twitter() # execute function

Simply plug in your Consumer Key and your Consumer Secret and run the function. Please note that this code is adapted from a post on Stack Overflow.

When you run the code, a new browser window should open. Click on “Authorize app”, copy the PIN provided by the website, paste it into your R console and press Enter.

[Image: Connect to Twitter]

If the console returns TRUE, you are ready to go.
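
On more recent twitteR releases, registerTwitterOAuth() stops with an error that points to ?setup_twitter_oauth (see the comments at the end of this post). A minimal sketch of that route, assuming the four credentials from the Keys and Access Tokens page are stored in environment variables so they never appear in your script. The variable names and the function name connect2twitter_httr() are made up for this example:

```r
# Sketch for newer twitteR versions; the environment variable names are assumptions
connect2twitter_httr = function() {
  keys = Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                      "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))

  # Stop early if any credential is missing (unset variables come back as "")
  if (any(keys == "")) {
    stop("Missing credentials: ", paste(names(keys)[keys == ""], collapse = ", "))
  }

  # Hand all four values to twitteR
  twitteR::setup_twitter_oauth(
    consumer_key    = keys[["TWITTER_CONSUMER_KEY"]],
    consumer_secret = keys[["TWITTER_CONSUMER_SECRET"]],
    access_token    = keys[["TWITTER_ACCESS_TOKEN"]],
    access_secret   = keys[["TWITTER_ACCESS_SECRET"]]
  )
}

# connect2twitter_httr()  # run this once the environment variables are set
```

Keeping the keys in environment variables (for example in your .Renviron file) means you can share the script without sharing the tokens.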

 

4. Download data

There are several functions for downloading Twitter data. In this tutorial we will use the searchTwitter() function, which lets you search for tweets by keyword, coordinates and date. The code looks like this:

tweets = searchTwitter("elections", n = 1000,
                       since = "2015-01-01",
                       geocode = "37.78,-122.41,1000km",
                       retryOnRateLimit = 200)

# create a data frame out of the query
tweets.df = do.call("rbind", lapply(tweets, as.data.frame))

The code above returns a subset of tweets that contain the word “elections”, were created from 2015-01-01 until now, and lie within a radius of 1000 km around latitude 37.78, longitude -122.41 (this is San Francisco). n=1000 caps the number of tweets you want to get. Usually you only get about 20-80 tweets per request, so you don’t have to worry about ending up with too big a data set.
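
The geocode argument is a single string of the form latitude,longitude,radius. A small sketch of building it from numbers, assuming a hypothetical helper called make_geocode() (it is not part of twitteR):

```r
# Hypothetical helper: build the "lat,lon,radius" string used by the geocode argument
make_geocode = function(lat, lon, radius_km) {
  stopifnot(is.numeric(lat), is.numeric(lon), is.numeric(radius_km))
  # Reject coordinates outside the valid range and non-positive radii
  stopifnot(lat >= -90, lat <= 90, lon >= -180, lon <= 180, radius_km > 0)
  sprintf("%s,%s,%skm", lat, lon, radius_km)
}

san_francisco = make_geocode(37.78, -122.41, 1000)
print(san_francisco)
```

The result can then be passed on as geocode = san_francisco. The range checks catch a malformed coordinate before the request is sent.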

The second line of code creates a data frame out of the search result, which makes data handling much easier.

If you run the code, you might get a warning message, but don’t be alarmed. It simply states that the Twitter API could not find the requested number of tweets; the function executes anyway.

In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit,  :
  1000 tweets were requested but the API can only return 81

5. Simple visualization on a map

To make a quick and simple visualization of the tweets on a map I will use the basic plot functions and data from the maps package:

install.packages("maps")
library(maps)
# plot the world map
map('world')
# plot the tweets
points(tweets.df$longitude, tweets.df$latitude, pch = 20, cex = 1, col = "red")

The result will look like this:

[Image: Simple map with tweets]

Every red dot represents a single tweet. Since we restricted the search to a radius of 1000 km around San Francisco, it’s no surprise that all the dots are in California and the surrounding states.
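
Many tweets carry no coordinates, in which case the longitude and latitude columns hold NA and no dot is drawn for them (a reader reports exactly this in the comments). A minimal sketch of cleaning the coordinates before plotting, using a small invented data frame in place of tweets.df and assuming the coordinate columns may come back as character strings:

```r
# Mock data standing in for tweets.df; the values are invented for illustration
mock.df = data.frame(
  text      = c("tweet a", "tweet b", "tweet c"),
  longitude = c("-122.41", NA, "-118.24"),
  latitude  = c("37.78", NA, "34.05"),
  stringsAsFactors = FALSE
)

# Convert the coordinates to numbers (NA stays NA)
mock.df$longitude = as.numeric(mock.df$longitude)
mock.df$latitude  = as.numeric(mock.df$latitude)

# Keep only the rows that have both coordinates
geo.df = mock.df[!is.na(mock.df$longitude) & !is.na(mock.df$latitude), ]
print(nrow(geo.df))
```

Applied to the real data, comparing nrow(geo.df) with nrow(tweets.df) shows how many of the downloaded tweets can be placed on the map at all.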

Conclusions

I hope this short introduction was helpful to some of you and showed you how to establish a connection to the Twitter API via R using the twitteR package.

Note that this tutorial only provides code for downloading data around a single location on a map. Another tutorial is therefore planned for next week, where I will show you how to download data for the entire world automatically and how to make maps with a nice layout using ggplot2 and web-mapping services like CartoDB.

Pay us a visit soon!

Cheers, Martin

Martin

Martin was born in Czech Republic and studied at the University of Natural Resources and Life Sciences, Vienna. He is currently working at GeoVille - an Earth Observation Company based in Austria, specialised in Land Monitoring. His main interests are: Open-source applications like R, (geospatial) statistics and data-management, web-mapping and visualization. He loves travelling, geocaching, photography and sports.

3 Comments



  • Hi Martin.

    I just followed your code in RStudio, and I have a problem getting the same result as yours.
    I found there is an error. Could you please help me solve the problem?
    The error message is as below:

    Error in registerTwitterOAuth(twitCred) :
    ROAuth is no longer used in favor of httr, please see ?setup_twitter_oauth

    Version: R 3.2.1 GUI 1.66 Mavericks build

    Irene Lee 2 years ago


    • Irene, same for me.
      Did you find a solution for that?

      Best

      Jura 2 years ago


  • Hi Martin,

    I am trying to find the location where tweets are coming from, so I followed your code

    tweets <- searchTwitter('influenza',n=1000,since='2017-01-01',geocode="12.97.59,1000km",lang='en')

    I converted the result of the above command into a data frame and tried plotting it, but I couldn't find any change in the map: no red dots. Moreover, when I printed the data frame I found NA in the latitude and longitude columns. Could you please help me solve this?

    Amrutha 8 months ago

