Hey there,
in this article I will show you how to download, manage and visualize Twitter data. I will supply all the code you need to connect to the Twitter API, download tweets for user-defined keywords and plot them on a map.
Let’s start! So how do we get the Twitter data? Twitter allows users to access subsets of its data via its public API. To be allowed to download them, you have to be a registered Twitter user; then simply follow these steps:
1. Create App
Create a Twitter app: go to apps.twitter.com, click on the “Create New App” button and fill in the name, a description of use and a website for your app. Please read the disclaimer carefully.
2. Download twitteR package for R
The second step is to open R and install the twitteR package. It lets you use the Twitter API from R, which makes life much easier. Simply install and load the following packages:
install.packages("RCurl")
install.packages("twitteR")
library(RCurl)
library(twitteR)
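If some of the packages may already be installed, a small check avoids reinstalling them on every run. This is only a sketch; the helper name missing_packages() is made up for this example and is not part of any package:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE instead of stopping with an error
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("RCurl", "twitteR"))

# Only install in an interactive session, and only what is actually missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After that, library(RCurl) and library(twitteR) load the packages as usual.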
3. Establish a connection
Go back to the apps.twitter.com page, click on your app and then navigate to Keys and Access Tokens. Under “Your Access Token” you should be able to see the following:
Access Token: CombinationOfNumbersAndLetters
Access Token Secret: CombinationOfNumbersAndLetters
These tokens are unique to every user, so you should not reveal them in public.
Now go back to R and use the following code to establish a connection to the Twitter API:
connect2twitter = function() {
  # Set SSL certs globally
  options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
  reqURL = "https://api.twitter.com/oauth/request_token"
  accessURL = "https://api.twitter.com/oauth/access_token"
  authURL = "https://api.twitter.com/oauth/authorize"
  consumerKey = 'PASTE YOUR KEY HERE'
  consumerSecret = 'PASTE YOUR SECRET HERE'
  twitCred = OAuthFactory$new(consumerKey = consumerKey,
                              consumerSecret = consumerSecret,
                              requestURL = reqURL,
                              accessURL = accessURL,
                              authURL = authURL)
  twitCred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
  registerTwitterOAuth(twitCred)
}
connect2twitter() # execute function
Simply plug in your Consumer Key and your Consumer Secret and run the function. Please note that this code is adapted from a post on Stack Overflow.
When you run the code, a new browser window should open. Click on “Authorize app”, copy the PIN provided by the website, paste it into your R console and press Enter.
If the console returns TRUE, you are ready to go.
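Newer releases of twitteR no longer support registerTwitterOAuth(); the function stops with a message pointing to ?setup_twitter_oauth. Below is a minimal sketch of that alternative. It assumes the four keys from the “Keys and Access Tokens” page are stored in environment variables. The variable names and the two helper functions are made up for this example:

```r
# Hypothetical helper: collect the four credentials from environment
# variables. The variable names are an assumption of this sketch, not
# something twitteR prescribes.
read_twitter_credentials <- function() {
  vars <- c(consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  creds <- Sys.getenv(vars, unset = "", names = FALSE)
  names(creds) <- names(vars)
  # Stop early with a readable message if anything is missing
  missing <- names(creds)[!nzchar(creds)]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  creds
}

# Hypothetical wrapper around twitteR's setup_twitter_oauth()
connect2twitter_httr <- function() {
  creds <- read_twitter_credentials()
  twitteR::setup_twitter_oauth(creds[["consumer_key"]],
                               creds[["consumer_secret"]],
                               creds[["access_token"]],
                               creds[["access_secret"]])
}

# connect2twitter_httr()  # run this once the four variables are set
```

Keeping the keys in environment variables rather than in the script also makes it harder to publish them by accident.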
4. Download data
There are several functions and ways to download Twitter data. In this tutorial we will use the searchTwitter() function, which allows you to search for tweets by keyword, coordinates and date. The code looks like this:
tweets = searchTwitter("elections", n=1000, since="2015-01-01", geocode="37.78,-122.41,1000km", retryOnRateLimit=200)
#create df out of query
tweets.df = do.call("rbind",lapply(tweets,as.data.frame))
The code above will return a subset of tweets containing the word “elections” that were created from 2015-01-01 until now in the area around latitude 37.78, longitude -122.41 (this is San Francisco) within a radius of 1000km. n=1000 limits how many tweets you want to get. Usually you only get about 20-80 tweets per request, so you don’t have to worry about ending up with too large a data set.
The second line of code creates a data frame out of the search result, which makes data handling much easier.
If you run the code, you might get a warning message, but don’t be alarmed. It simply states that the Twitter API could not find the requested number of tweets; the function executes anyway.
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
  1000 tweets were requested but the API can only return 81
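The geocode argument used in the search is a single string of the form "latitude,longitude,radius". Building it from numbers makes it harder to mistype. The helper make_geocode() below is a made-up name for this sketch and not part of twitteR:

```r
# Hypothetical helper: build the "lat,lon,radius" string for searchTwitter()
make_geocode <- function(lat, lon, radius_km) {
  # Basic sanity checks on the inputs
  stopifnot(is.numeric(lat), is.numeric(lon), is.numeric(radius_km),
            lat >= -90, lat <= 90,
            lon >= -180, lon <= 180,
            radius_km > 0)
  paste0(format(lat, scientific = FALSE), ",",
         format(lon, scientific = FALSE), ",",
         format(radius_km, scientific = FALSE), "km")
}

san_francisco <- make_geocode(37.78, -122.41, 1000)
san_francisco  # "37.78,-122.41,1000km"

# The string can then be passed on as before:
# tweets = searchTwitter("elections", n = 1000, geocode = san_francisco)
```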
5. Simple visualization on a map
To make a quick and simple visualization of the tweets on a map I will use the basic plot functions and data from the maps package:
install.packages("maps")
library(maps)
#plots worldmap
map('world')
#plots tweets
points(tweets.df$longitude,tweets.df$latitude, pch=20, cex=1, col="red")
The result will look like this:
[Figure: world map with each tweet plotted as a red dot]
Every red dot represents a single tweet, and since we restricted the area to San Francisco and a radius of 1000km it’s not a surprise that all the dots are in California and surrounding states.
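Tweets without attached coordinates show up with NA in the longitude and latitude columns and cannot be drawn. The sketch below drops those rows and converts the remaining coordinates to numbers, on the assumption that the columns may come back as text. The helper geotagged_only() and the sample rows are made up for illustration:

```r
# Hypothetical helper: keep only rows with usable, numeric coordinates
geotagged_only <- function(df) {
  # as.character() first, so factor columns are converted by label
  df$longitude <- suppressWarnings(as.numeric(as.character(df$longitude)))
  df$latitude  <- suppressWarnings(as.numeric(as.character(df$latitude)))
  df[!is.na(df$longitude) & !is.na(df$latitude), , drop = FALSE]
}

# Made-up sample rows standing in for tweets.df
sample.df <- data.frame(
  text      = c("tweet a", "tweet b", "tweet c"),
  longitude = c("-122.41", NA, "-118.24"),
  latitude  = c("37.78", NA, "34.05"),
  stringsAsFactors = FALSE
)

geo.df <- geotagged_only(sample.df)
nrow(geo.df)  # 2 of the 3 sample rows have coordinates

# With the maps package loaded, the plot would then be:
# map('world')
# points(geo.df$longitude, geo.df$latitude, pch = 20, cex = 1, col = "red")
```

If nrow() returns 0 for your own data, none of the downloaded tweets carried coordinates and the map will stay empty.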
Conclusions
I hope this short introduction was helpful to some of you and showed you how to establish a connection to the Twitter API via R using the twitteR package.
Note that this tutorial only provides code for downloading data for a single location on a map. Another tutorial is therefore planned for next week, in which I will show you how to download data for the entire world automatically and how to make maps with a nice layout using ggplot2 and web-mapping services like CartoDB.
Pay us a visit soon!
Cheers Martin
4 Comments
Hi Martin.
I just followed your code in RStudio, but I had a problem getting the same result as yours.
I found there is an error. Could you please help me to solve the problem?
The error message is as below:
Error in registerTwitterOAuth(twitCred) :
ROAuth is no longer used in favor of httr, please see ?setup_twitter_oauth
Version: R 3.2.1 GUI 1.66 Mavericks build
Irene Lee 8 years ago
Irene, same for me.
Did you find a solution for that?
Best
Jura 7 years ago
Hi Martin,
I am trying to find the location where tweets are coming from, so I followed your code:
tweets <- searchTwitter('influenza',n=1000,since='2017-01-01',geocode="12.97.59,1000km",lang='en')
I converted the result of the above command into a data frame and tried plotting it, but I couldn't find any change in the map: there were no red dots. Moreover, when I printed the data frame, I found NA in the latitude and longitude columns. Could you please help me solve this?
Amrutha 6 years ago
How can I extract tweets with replies, please?
amina bahri 4 years ago