Rate limit issue when crawling tweets using the twitteR package - r

I'm scraping tweets using the twitteR package. It all works fine, but when I want to scrape a significant number of tweets I get the following message:
[1] "Rate limited .... blocking for a minute and retrying up to 119 times ..."
From reading https://dev.twitter.com/streaming/overview/request-parameters I understand there's a maximum number of requests that can be made. What I don't understand, however, is that sometimes I already hit the wall after crawling 20 tweets, and sometimes I can get up to 260 before I'm limited.
Any thoughts on how many tweets you can gather per time span?

Rate limits, and the way they function, differ from API call to API call. Which call are you making specifically? If you are just interested in gathering tweets related to a subject, I'd suggest using the streaming API (via the streamR package), as it requires only one API call and allows you to stream for an indefinite amount of time.
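As an illustration, a minimal streamR sketch might look like the following; the keyword, file name, and my_oauth credentials object are placeholders you would supply yourself:
library(streamR)
# my_oauth is an OAuth object created beforehand (e.g. with the ROAuth workflow)
load("my_oauth.Rdata")
# Stream tweets matching a keyword for 10 minutes into a local JSON file
filterStream(file.name = "tweets.json", track = "grammy",
             timeout = 600, oauth = my_oauth)
# Parse the collected JSON into a data frame
tweets_df <- parseTweets("tweets.json")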

Related

Twitter API limit on live streaming tweets through the "rtweet" R package

I am currently live-streaming tweets via the stream_tweets command from the "rtweet" package, based on a predefined query. My only concern is whether I am subject to some sort of limit from Twitter's API?
Note that I am a beginner with regard to APIs, so this question may be quite foolish.
Thank you
Update: The stream_tweets section of the documentation references the link below, which states that you have a rate limit of 10 requests per 60 seconds.
https://developer.twitter.com/en/docs/twitter-api/enterprise/decahose-api/api-reference/decahose
Original: The rtweet documentation says the rate limit for the standard search API is 18,000 tweets per fifteen minutes. This is on page 5, in the bearer_token function section.
https://cran.r-project.org/web/packages/rtweet/rtweet.pdf
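For reference, a minimal rtweet streaming call looks roughly like this; the query and timeout are illustrative values only:
library(rtweet)
# Stream tweets matching a query for 60 seconds; with parse = TRUE (the
# default) stream_tweets() returns the results as a data frame
tweets_df <- stream_tweets(q = "superbowl", timeout = 60)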

Twitter tweets extraction with R - not able to extract a larger number of tweets

I need around 10k tweets from Twitter but I am not able to extract them.
I am getting the warning message below:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit
= retryOnRateLimit, : 10000 tweets were requested but the API can only return 476
Is there any way to extract 10k tweets?
See the Twitter search API documentation: with a standard account you can only request tweets from the last 7 days, or 180 tweets in a 15-minute window with user auth (450 with app auth).
Edit 1: It seems that I misunderstood the API description. That you can make 180/450 requests per window does not mean you get 180/450 tweets; it means you can make 180/450 different API calls. The explanation for the phenomenon you are describing is also given at the link mentioned above:
Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
For one keyword, Twitter may index only a few hundred tweets as important, whereas for other keywords a few thousand may be interesting enough.
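If you still want to work toward 10k tweets within these limits, one common approach (sketched below; it is not guaranteed to reach 10k because of the indexing behaviour quoted above) is to let twitteR block and retry when a rate-limit window resets, via retryOnRateLimit:
library(twitteR)
# Assumes setup_twitter_oauth() has already been called with your app's keys.
# retryOnRateLimit makes twitteR wait out rate-limit windows and retry.
tweets <- searchTwitter("keyword", n = 10000, retryOnRateLimit = 120)
tweets_df <- twListToDF(tweets)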

Google Analytics Add-on limitations

I would like to get some data from GA via the spreadsheet add-on, as I did a few weeks ago (I gathered ~200,000 rows). I am using the same metrics, dimensions, and other settings, but I keep getting this error:
https://i.stack.imgur.com/hTpIg.png
I found that I do get some data when I do not set "max-results", but the default is 1000 rows, which is not enough for my needs. Why?
What I have tried to solve this problem, none of which worked:
change GA views
change dimensions and metrics
change time range
create new spreadsheet
set up sharing settings of spreadsheet to "public on web"
I found this page on API limits and quotas (https://developers.google.com/analytics/devguides/config/mgmt/v3/limits-quotas#), which says I am allowed only 50,000 requests per project, and I apparently exceed that on the first run. So another question: how is it even possible to get more data than I am supposed to get?
Should I really order more requests, or does "request" mean something other than "one row"? And if so, what?
No interpretation of the error is given anywhere.
Perhaps I am missing something, appreciate your help.
In short: while one can only guess at what causes your problem, it is most certainly not the API limit. Rows and requests are not at all the same thing; every request may fetch up to 10,000 rows.
A "request" is a call to the API, which might return one or many rows of data (unless your script somehow requests only one row at a time, which would be unusual).
If you exceeded your API quota the error message would say pretty much that.
The default is 1000 rows because that's a sensible default (a compromise between convenience and performance). The API will return at most 10,000 rows per request, so to fetch 200,000 results the add-on has to make 20 requests, not 50,000.
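The arithmetic, spelled out in R with the numbers from this answer:
rows_needed  <- 200000   # rows the question asker gathered previously
rows_per_req <- 10000    # maximum rows the API returns per request
ceiling(rows_needed / rows_per_req)   # 20 requests, nowhere near the 50,000 quota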
Also, a Google spreadsheet supports at most 2 million cells, and this limit might be exceeded by your result set.
"Service error" is a very unspecific message that can be caused by anything from out-of-bounds ranges to script timeouts or network latency. Sometimes the spreadsheet service dumps an additional error message in the browser console, so you should check your developer tools.

Search Twitter by hour in R

I am currently using the twitteR package and running into a roadblock when trying to extract tweets by minute or hour. My ultimate goal is to see the total number of tweets for a particular topic at a granular level (specifically for big events like the Super Bowl or the World Cup).
The package allows tweets to be searched using since and until, but the finest granularity one can get is by day.
Here is an example of the code:
tweets <- searchTwitter("grammy", n=1500, since='2016-02-15', until='2016-02-16')
Based on the results by @SQLMenace, it appears that twitteR only retrieves the status without returning accurate date/time information.
In that case, it depends on when you're performing the analysis. If you're performing it "live" while the event is occurring, you can simply run the R script as a cron job. Say every twenty minutes you run a job to get all the most recent tweets; you can then eliminate duplicates to get an idea of how many unique tweets occurred in each 20-minute span.
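A minimal sketch of that "live" approach, assuming OAuth is already set up; the search term and store file name are placeholders:
library(twitteR)
# Run this script every 20 minutes (e.g. via cron)
recent_df <- twListToDF(searchTwitter("grammy", n = 1500))
store_file <- "tweet_store.rds"   # hypothetical local store
old_df <- if (file.exists(store_file)) readRDS(store_file) else recent_df[0, ]
# Append the new batch and drop tweets already seen, keyed on tweet id
combined <- rbind(old_df, recent_df)
combined <- combined[!duplicated(combined$id), ]
saveRDS(combined, store_file)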
However, if you're performing the analysis retrospectively, the above method wouldn't work. And I'd caution against using twitteR. It seems as though the functionality for gathering tweets by date is not that versatile. I'd recommend using tweepy (for Python) which retrieves not only the status, but also the exact time the tweet was sent.
Hope that helps.

twitteR r package: How to get as many tweets as possible per account within API limits

I'm a novice R and twitteR package user, but I wasn't able to find a strong recommendation on how to accomplish the following.
I'd like to mine a small number of Twitter accounts to identify their output for keyword usage (i.e. I don't know what the keywords are yet).
Assumptions:
I have a small number of Twitter accounts (<6) I want to mine, with a maximum of 7000 tweets if you aggregate the various accounts' statuses
Those accounts are not generating new tweets at a fast rate (a few a day)
The accounts all have less than 3200 tweets according to the profile data returned by lookupUsers()
When I use the twitteR function userTimeline("accountname", n=3200) I get between 40 and 600 observations returned, i.e. nowhere near the 3200. I know there are API limits, but if this were a limit issue I would expect to get the same number of observations back each time, or a notice that I need to wait 15 minutes.
How do I get all the text I need while still playing nice?
By using a combination of CRAN and GitHub packages it was possible to get all the tweets for a user.
The packages used were streamR (available on CRAN) and smappR (https://github.com/SMAPPNYU/smappR/), which helps with getting the tweets and with the analysis.
The basic steps, sketched in code below, are:
Authenticate to Twitter using OAuth and your Twitter keys, tokens, and secrets
Use the smappR function getTimeline(), which saves the tweets to a JSON file you specify
Use parseTweets(jsonfile) to read the JSON contents into a data frame
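A rough sketch of that flow; the getTimeline() argument names are from memory and may differ between smappR versions, and the account name and file paths are placeholders:
library(streamR)
library(smappR)
# 1. OAuth credentials created beforehand and saved as .Rdata files in a
#    folder, which is how smappR expects to find them
oauth_folder <- "~/credentials"
# 2. Save up to 3,200 of one account's tweets to a JSON file
getTimeline(filename = "account_tweets.json",
            screen_name = "accountname",
            n = 3200,
            oauth_folder = oauth_folder)
# 3. Read the JSON contents into a data frame
tweets_df <- parseTweets("account_tweets.json")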
This can be accomplished with the rtweet package, which is still supported. First you need to be approved as a developer and create an app. (As a note, Twitter has now changed its policies, and approval can take a while; it took me almost a week.)
After that, just use get_timeline() to get all of the tweets from a timeline, up to the 3200 limit.
djt <- get_timeline("adamgreatkind", n = 3200)
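If authentication is the sticking point, a one-time setup in the pre-1.0 rtweet style looks roughly like this; the app name and all keys are placeholders from your own developer dashboard:
library(rtweet)
token <- create_token(
  app             = "my_twitter_app",   # placeholder app name
  consumer_key    = "XXXX",
  consumer_secret = "XXXX",
  access_token    = "XXXX",
  access_secret   = "XXXX")
djt <- get_timeline("adamgreatkind", n = 3200, token = token)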
