Currently using the twitteR package, and I'm running into a roadblock trying to extract tweets by minute or hour. My ultimate goal is to see the total number of tweets for a particular topic at a granular level (specifically for big events like the Super Bowl or the World Cup).
The package allows tweets to be searched using since and until, but the finest granularity you can get is one day.
Here is an example of the code:
tweets <- searchTwitter("grammy", n=1500, since='2016-02-15', until='2016-02-16')
Based on the results posted by @SQLMenace, it appears that twitteR only retrieves the status itself without returning accurate date/time information.
In that case, it depends on when you're performing the analysis. If you're performing the analysis "live" while the event is occurring, you can simply run the R script as a cron job, say every twenty minutes, to grab the most recent tweets. Then you can eliminate duplicates to count how many unique tweets occurred in each 20-minute span.
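A minimal sketch of that idea, assuming twitteR's searchTwitter() and twListToDF() and using a CSV file as the running archive between cron runs (the file name and search term are illustrative):

library(twitteR)

# Each cron run: grab the latest matching tweets and flatten to a data frame
tweets <- searchTwitter("grammy", n = 1500)
new_df <- twListToDF(tweets)

# Append to the archive kept between runs, then drop duplicate status IDs
archive <- if (file.exists("grammy_tweets.csv")) {
  rbind(read.csv("grammy_tweets.csv", stringsAsFactors = FALSE), new_df)
} else {
  new_df
}
archive <- archive[!duplicated(archive$id), ]
write.csv(archive, "grammy_tweets.csv", row.names = FALSE)

The growth in unique IDs between two runs then approximates the tweet volume for that 20-minute window.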
However, if you're performing the analysis retrospectively, the above method wouldn't work. And I'd caution against using twitteR. It seems as though the functionality for gathering tweets by date is not that versatile. I'd recommend using tweepy (for Python) which retrieves not only the status, but also the exact time the tweet was sent.
Hope that helps.
I'm using rtweet's get_timeline() function to download tweets. However, some of the users I'm interested in have far more than the 3200 tweets you are allowed to download (some have around 47,000). There is a retryonratelimit argument if you are downloading tweets by keyword or hashtag, so I'm wondering whether there is a similar way to get more than 3200 tweets from a single user.
The documentation (see ?get_timeline) includes a link to the Twitter developer documentation for GET statuses/user_timeline. The R function is just a wrapper for this.
If you then follow the link to Working with timelines, you'll find an explanation of the max_id parameter.
The basic approach then is:
get the first 3200 tweets
get the earliest status ID using something like min(as.numeric(zanetti$status_id))
run get_timeline again setting max_id = ID where ID is the ID from step 2
Note: I just tried this using my own timeline and only 40 tweets were returned by step 3. So you may also have to wait an appropriate amount of time to avoid rate limits. And be aware that Twitter basically does all it can to prevent you from requesting large amounts of data via the API - at the end of the day, what you want may not be possible.
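A minimal sketch of those three steps, assuming rtweet's get_timeline() accepts max_id (it is passed through to the underlying endpoint) and using "some_user" as a placeholder account:

library(rtweet)

# Step 1: get the first 3200 tweets
zanetti <- get_timeline("some_user", n = 3200)

# Step 2: find the earliest status ID in that batch (note: as.numeric() can
# lose precision on 64-bit IDs; it's kept here because step 2 above uses it)
oldest_id <- min(as.numeric(zanetti$status_id))

# Step 3: request the next page of older tweets, resuming from that ID
older <- get_timeline("some_user", n = 3200, max_id = as.character(oldest_id))

Repeating steps 2-3 in a loop, with pauses between requests, walks backward through the timeline as far as Twitter will allow.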
I'm scraping some tweets using the twitteR package. It all works fine, but when I want to scrape a significant number of tweets I get the following message:
[1] "Rate limited .... blocking for a minute and retrying up to 119 times ..."
From reading https://dev.twitter.com/streaming/overview/request-parameters I understand there's a maximum number of requests. What I don't understand, however, is that sometimes I hit the wall after crawling only 20 tweets, and sometimes I can get up to 260 before being limited.
Any thoughts on how many tweets you can actually gather per time span?
Rate limits work differently from API call to API call, so which call are you making specifically? If you are just interested in gathering tweets on a subject, I'd suggest using the streaming API (streamR), since it requires only one API call and lets you stream for an indefinite amount of time.
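A minimal streamR sketch, assuming an OAuth credential object my_oauth set up beforehand; the keyword, file name, and timeout are illustrative:

library(streamR)

# Stream matching tweets to a JSON file for one hour (timeout is in seconds)
filterStream(file.name = "tweets.json",
             track = "superbowl",
             timeout = 3600,
             oauth = my_oauth)

# Then parse the raw JSON into a data frame for analysis
tweets_df <- parseTweets("tweets.json")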
I'm a novice R and twitteR package user, but I wasn't able to find a strong recommendation on how to accomplish the following.
I'd like to mine a small number of Twitter accounts to analyze their output for keyword usage (i.e., I don't know what the keywords are yet).
Assumptions:
I have a small number of Twitter accounts (<6) I want to mine, with a max of 7000 tweets if you aggregate the various accounts' statuses
Those accounts are not generating new tweets at a fast rate (a few a day)
The accounts all have less than 3200 tweets according to the profile data returned by lookupUsers()
When I use the twitteR function userTimeline("accountname", n=3200) I get between 40 and 600 observations returned, i.e. nowhere near the 3200. I know there are API limits, but if it were an issue of limits I would expect to get the same number of observations back each time, or a notice that I need to wait 15 minutes.
How do I get all the text I need while still playing nice?
By using a combination of CRAN and GitHub packages it was possible to get all the tweets for a user.
The packages used were streamR (available on CRAN) and smappR (https://github.com/SMAPPNYU/smappR/), which helps with getting the tweets and analyzing them.
The basic steps are:
Authenticate to Twitter using OAuth and your Twitter keys, tokens, and secrets
Use the smappR function getTimeline(), which saves the tweets to a JSON file you specify
Use parseTweets(jsonfile) to read the JSON contents into a data frame
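A sketch of those steps; the argument names for smappR's getTimeline() follow its README and may have changed, and the account name and paths are placeholders:

library(streamR)
library(smappR)  # devtools::install_github("SMAPPNYU/smappR")

# Step 1: OAuth credentials saved earlier (e.g. with ROAuth) live in a
# folder of token files that smappR functions read from.

# Step 2: save up to 3200 of one account's tweets to a JSON file
getTimeline(filename = "account_tweets.json",
            screen_name = "accountname",     # placeholder account
            n = 3200,
            oauth_folder = "~/credentials")  # folder with saved OAuth tokens

# Step 3: read the JSON contents into a data frame
tweets <- parseTweets("account_tweets.json")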
This can be accomplished with the rtweet package, which is still supported. First you need to be approved as a developer and create an app. (As a note, Twitter has now changed its policies, and approval can take a while. It took me almost a week.)
After that, just use get_timeline() to get all of the tweets from a timeline, up to 3200.
djt <- get_timeline("adamgreatkind", n = 3200)
How can I use the twitteR package for R to get more than 100 search results?
Although I would prefer an example in R (since my code currently uses R), I could just as easily use Java, so an example that searches Twitter in Java to get 200 search results would also suffice.
I don't know if this is even possible. I don't remember seeing a "page number" you can specify when you search (the Google API supports that). I think that with Twitter you can specify the minimum Tweet ID, which lets you get only newer tweets, but I don't expect that to work as desired in this case (unless the search results are somehow ordered by date/time rather than relevance to the search term).
I'm getting tweets this way.
someTweets <- searchTwitter("#EroticBroadway", n=500)
The n argument tells it how many tweets to cap it at. If there aren't that many tweets, it won't return 500, though.
From the docs:
n The maximum number of tweets to return
There is also a time limit on the Twitter API search.
The Search API is not a complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets.
The t gem (a command-line Twitter client for Ruby) doesn't have that.
You have to jump through more hoops, which has some (a lot of) disadvantages, but I've used TAGS (Twitter Archiving Google Sheet) to collect hundreds of tweets (even thousands over the course of time) and then read them into R as a CSV.
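Reading the exported archive back into R is then a one-liner (the file name is a placeholder):

# Export the TAGS archive sheet as a CSV, then:
tags_tweets <- read.csv("tags_archive.csv", stringsAsFactors = FALSE)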
I'm trying to use searchTwitter() to find a certain topic on Twitter. For example:
searchTwitter("#Fast and Furious 7", n = 10000)
can only give me a few thousand results. I have also done some research on other topics. Judging by the dates in the results, it seems the search can only return tweets from up to 9 days before. (There are arguments called since and until that are meant to specify a time range, but they don't work.)
So I'm wondering: is there a way to get all the tweets for a topic, or at least to control the date range?
Apart from that, can I use XML in R to achieve the same purpose?
Twitter provides search for the last few days only.
The cost of keeping the data indexed is too high, given how few users are interested. Twitter's business model is live information.
If you want historical data, you will have to buy this from third party providers. I don't remember the name, but a company offering such data was linked from the Twitter web page where they explained this limitation of their search API.