Downloading stops when using the "search_tweets" function of the "rtweet" library - r

I'm trying to use the rtweet package to download some tweets from a certain hashtag. I've been following a guide from a site called OpenCodez, and I've run into problems.
Using the "search_tweets" function of the rtweet package, I'm not able to download more than 5 tweets, even though the limit of rtweet should be around 18,000 tweets.
I don't get any errors, but the "Downloading" progress bar simply stops at 10% (when trying to download n=2000).
I've tried using "retryonratelimit = TRUE" without luck. I've reset my script and tried different tutorials to establish a connection - which all work fine - up until I actually use the search_tweets function.
So this is my code to connect to the API:
library(rtweet)

api_key <- "xxxx"
api_secret_key <- "xxxx"
access_token <- "xxxx"
access_token_secret <- "xxxx"

## authenticate via web browser
token <- create_token(
  app = "xxxx",
  consumer_key = api_key,
  consumer_secret = api_secret_key,
  access_token = access_token,
  access_secret = access_token_secret)
And this is my "scraper":
my_tweets <- search_tweets("#vmd19", n = 2000, lang = "en")
The resulting data frame contains only 5 rows (tweets), which is odd, since there should be at least a couple of hundred tweets under the hashtag. I've tried different queries (hashtags etc.) without luck. The download stalls, looking like this:
Downloading [===>-------------------------------------] 10%
I cannot figure out what I'm doing wrong. Hopefully, someone can help me troubleshoot this!

This issue was addressed here: https://github.com/ropensci/rtweet/issues/364
It comes down to the window from which you can gather tweets (roughly the last week). If the number of tweets available in that window is less than the n in your search_tweets call, the progress bar cuts out before reaching 100%. So if you ask for 100 tweets with a certain term, and that term was only tweeted 7 times in the last week, the download stops at 7%.
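A quick way to confirm this is to compare what you asked for with what actually came back; the snippet below is a sketch reusing the query from the question:

library(rtweet)
# If nrow() is far below n, the 7-day search window simply ran out of
# matching tweets; the early-stopping progress bar is cosmetic.
my_tweets <- search_tweets("#vmd19", n = 2000, lang = "en")
nrow(my_tweets)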

Related

Error Crawling Data Twitter With R Programming

How do I fix an error like this:
library(twitteR)
library(ROAuth)
customer_key <- "-aaaaaa"
customer_secret <- "aaaaaaaa"
access_token <- "212123213-aaaaa"
access_secret <- "ccccccccccccc"
setup_twitter_oauth(customer_key, customer_secret, access_token, access_secret)
Tweets <- searchTwitter("anxiety", n = 100, lang = "en")
Error: invalid assignment for reference class field ‘language’, should be from class “character” or a subclass (was class “NULL”)
I use RStudio to run this source code; this is my first time using the Twitter API. I followed a tutorial where it worked, but when I tried it I got the error above. Thank you.
consumer_key <- "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
consumer_secret <- "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
access_token <- "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
access_secret <- "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
The steps below extract tweets containing the keyword #statistics and retrieve up to 10,000 results:
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
tw <- searchTwitter('#statistics',
                    n = 10000,
                    retryOnRateLimit = 1617)
Make sure the token you use is up to date, and consider the possibility that the word you want to scrape ("anxiety") simply does not appear in tweets from the last 7 days, because scraping with the standard API only reaches 7 days back. For example, if I extract data on the 8th, the data pulled will be tweets containing the word "anxiety" from the 2nd to the 8th.
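If you want to make that window explicit, twitteR's searchTwitter accepts since and until arguments; here is a sketch with placeholder dates (anything older than about 7 days returns nothing on the standard API):

# Bound the search window explicitly; the dates below are hypothetical.
tw <- searchTwitter('anxiety',
                    n = 100,
                    lang = 'en',
                    since = '2019-06-02',
                    until = '2019-06-08')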

No search results for a # that really exists

I am analysing #angtunaynalalake or #AngTunayNaLalake - a hashtag that is popular in the Philippines. I run the following code, but it returns 0 results. I tried the same code with other popular hashtags like #MeToo or #rstats, and the code was successful. I tried using 'lalake' instead of '#angtunaynalalake' and again the code was successful.
I was able to get results for #angtunaynalalake when I use twitteR. I want to use rtweet because it can give more search results than twitteR.
Why do you think this happens?
create_token(
  app = "my_twitter_research_app",
  consumer_key = "xxxx",
  consumer_secret = "xxxx",
  access_token = "xxxx",
  access_secret = "xxxx")
> rt <- search_tweets(
+ "#angtunaynalalake", n = 25000, retryonratelimit = TRUE
+ )
Searching for tweets...
This may take a few seconds...
Finished collecting tweets!
> rt
data frame with 0 columns and 0 rows
This is an issue with the Twitter API, not with the rtweet package.
See this quote from the Twitter API page:
"Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week."
The most recent tweet to #angtunaynalalake seems to have been in January 2019, therefore the standard Twitter API won't find any tweets.
If you need to access tweets older than 7 days, consider the Twitter API pricing plans.
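If you do have access to a premium environment, rtweet exposes it through search_fullarchive(); the sketch below assumes a hypothetical dev environment name and illustrative dates:

# Premium full-archive search; env_name must match a dev environment
# you have set up at developer.twitter.com (hypothetical here).
library(rtweet)
rt <- search_fullarchive("#angtunaynalalake",
                         n = 100,
                         fromDate = "201901010000",
                         toDate = "201901310000",
                         env_name = "myFullArchive")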

Live streaming of results with rtweet package

I'm using the R package rtweet to stream live tweets.
Everything is OK, but what I want is to automatically store the information in Google BigQuery and display it in Data Studio, with the information updated every X minutes (for example, 5 minutes).
How can I do it? The problem is that while streaming, the R session is busy, so I can't do anything else.
I would also consider stopping the streaming for a second to store the information and resuming it afterwards...
Here is my code:
library(rtweet)
library(bigrquery)

token <- create_token(
  app = "app name",
  consumer_key = "consumer_key",
  consumer_secret = "consumer_secret",
  access_token = "access_token",
  access_secret = "access_secret")

palabras <- ""
streamtime <- 2 * 60
rt <- stream_tweets(q = palabras, timeout = streamtime)

# This is what I want to do every X minutes to store the information in BigQuery:
insert_upload_job("project id", "dataset name", "table name", rt, write_disposition = "WRITE_APPEND")
Thanks to all,
I don't know much about R, but I had a similar case, and there's nothing to do while stream_tweets() is running except wait for the timeout.
I'm not sure if this is possible, but stream_tweets() writes to a JSON file that is filled while the function is running. Wouldn't it be possible to run another R script that stores each new item appended to that JSON file in BigQuery? Something like splitting your code in two and running the parts in parallel?
Hope my answer gives you some ideas.
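Another way to get the "every X minutes" behaviour is to alternate streaming and uploading in one loop: stream in short chunks, upload each chunk, then stream again. This is only a sketch; the BigQuery identifiers are the placeholders from the question, and any tweets arriving during an upload are missed.

library(rtweet)
library(bigrquery)

palabras <- ""
chunk_seconds <- 5 * 60  # stream in 5-minute chunks

repeat {
  # Stream for one chunk; with parse = TRUE (the default) this
  # returns a data frame of tweets.
  df <- stream_tweets(q = palabras, timeout = chunk_seconds)
  if (nrow(df) > 0) {
    # rtweet data frames contain list-columns, which BigQuery cannot
    # ingest directly; keep only the atomic columns before uploading.
    flat <- df[, !sapply(df, is.list)]
    insert_upload_job("project id", "dataset name", "table name",
                      flat, write_disposition = "WRITE_APPEND")
  }
}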

R - Twitter - fromJSON - get list of tweets

I would like to retrieve a list of tweets from Twitter for a given hashtag using the RJSONIO package in R. I think I am pretty close to the solution, but I seem to be missing one step.
My code reads as follows (in this example, I use #NBA as a hashtag):
library(httr)
library(RJSONIO)

# 1. Find OAuth settings for twitter:
# https://dev.twitter.com/docs/auth/oauth
oauth_endpoints("twitter")

# Replace key and secret below
myapp <- oauth_app("twitter",
                   key = "XXXXXXXXXXXXXXX",
                   secret = "YYYYYYYYYYYYYYYYY")

# 3. Get OAuth credentials
twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), myapp)

# 4. Use API
req <- GET("https://api.twitter.com/1.1/search/tweets.json?q=%23NBA&src=typd",
           config(token = twitter_token))
req <- content(req, as = "text")
response <- fromJSON(req)
How can I get the list of tweets from object 'response'?
Eventually, I would like to get something like:
searchTwitter("#NBA", n=5000, lang="en")
Thanks a lot in advance!
The response object should be a list of length two: statuses and metadata. So, for example, to get the text of the first tweet, try:
response$statuses[[1]]$text
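To pull the text of every tweet on the page at once, something like this should work (a sketch based on the same structure):

# Extract the text field from each status in the parsed response.
tweet_texts <- sapply(response$statuses, function(s) s$text)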
However, there are a couple of R packages designed to make just this kind of thing easier: Try streamR for the streaming API, and twitteR for the REST API. The latter has a searchTwitter function exactly as you describe.

Using R to send tweets

I saw a cute demonstration of tweeting from R in a presentation some months ago. The scratch code used by the presenter is here:
http://www.r-bloggers.com/twitter-from-r%E2%80%A6-sure-why-not/
the code is short and sweet:
library("RCurl")
opts <- curlOptions(header = FALSE,
userpwd = "username:password", netrc = FALSE)
tweet <- function(status){
method <- "http://twitter.com/statuses/update.xml?status="
encoded_status <- URLencode(status)
request <- paste(method,encoded_status,sep = "")
postForm(request,.opts = opts)
}
With this function, you can send a tweet simply by calling it:
tweet("This tweet comes from R! #rstats")
I thought this could be a useful way of announcing when long jobs are completed. I tried to run this on my machine, and I got an error:
[1] "\n\n Basic authentication is not supported\n\n"
attr(,"Content-Type")
charset
"application/xml" "utf-8"
Warning message:
In postForm(request, .opts = opts) : No inputs passed to form
I'm wondering if there have been some changes on the Twitter end that make this code produce this error? I don't know too much about getting R to talk to web pages, so any guidance is much appreciated!
E
Yes, the basic authentication scheme was disabled on 16 August 2010. You'll need to set it up to use OAuth. Unfortunately, that is not nearly as simple as using basic authentication.
See this Twitter wiki page for more information and this StackOverflow question about OAuth for R.
Besides the code you show, there is also a full-blown twitteR package on CRAN you could look at.
The easiest way to tweet in R through the Twitter API is to use the twitteR package.
You can set up your Twitter API app here: https://apps.twitter.com/
The first step is to authenticate:
library(twitteR)

consumer_key <- "yourcredentials"
consumer_secret <- "yourcredentials"
access_token <- "yourcredentials"
access_secret <- "yourcredentials"
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
And just tweet (limit: 2,400 tweets per day):
tweet("Hello World")
If twitteR does not work, or you simply want to try to build it yourself...
See here for a demo of how to do your own Twitter authentication and use of the API with help of the httr package.
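For reference, here is a minimal sketch of that approach with httr, reusing the oauth_app()/oauth1.0_token() pattern from the earlier question (keys are placeholders):

library(httr)

# Register the app and fetch an OAuth 1.0 token interactively.
myapp <- oauth_app("twitter", key = "XXXX", secret = "YYYY")
twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), myapp)

# POST to the statuses/update endpoint; encode = "form" sends the
# status as form data, which this endpoint expects.
r <- POST("https://api.twitter.com/1.1/statuses/update.json",
          body = list(status = "This tweet comes from R! #rstats"),
          encode = "form",
          config(token = twitter_token))
status_code(r)  # 200 on success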
