Search Twitter (using R and the twitteR package) to get 200 results

How can I use the twitteR package for R to get more than 100 search results?
Although I would prefer an example in R (since my project currently uses R), I could just as easily use Java, so an example that searches Twitter in Java to get 200 search results would also suffice.
I don't know if this is even possible. I do not remember seeing a "page number" that you can specify when you search (the Google API supports that). I think that with Twitter you can specify the minimum tweet ID, which allows you to get only newer tweets, but I don't expect that to work as desired in this case (unless the search results are somehow ordered by date/time rather than relevance to the search term).

I'm getting tweets this way.
someTweets <- searchTwitter("#EroticBroadway", n=500)
The n argument caps the number of tweets returned. If there aren't that many matching tweets, though, it will return fewer than 500.
From the docs:
n: The maximum number of tweets to return
There is also a time limit on the Twitter search API:
The Search API is not a complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets.
The t gem (Twitter's Ruby command-line client) doesn't have a way around that, either.

You have to jump through more hoops, which has some (a lot of) disadvantages, but I've used TAGS to collect hundreds of tweets (even thousands over the course of time) and then read them into R as a CSV.
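Once TAGS has filled its Google Sheet, you can export that sheet as CSV and load it like any other file; something like this (the filename is hypothetical):

tweets <- read.csv("tags_archive.csv", stringsAsFactors = FALSE)  # TAGS export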

Related

Can I get more than 3200 tweets from a user with "rtweet"?

I'm using rtweet's function get_timeline to download tweets. However, some of the users I'm interested in have far more than the 3,200 tweets you are allowed to download (some have around 47,000). There is the retryonratelimit argument if you are downloading tweets based on words or hashtags, so I'm wondering whether there is a similar way to get more than 3,200 tweets from one user?
The documentation (see ?get_timeline) includes a link to the Twitter developer documentation for GET statuses/user_timeline. The R function is just a wrapper for this.
If you then follow the link to Working with timelines, you'll find an explanation of the max_id parameter.
The basic approach then is:
1. Get the first 3,200 tweets.
2. Get the earliest status ID, using something like min(as.numeric(zanetti$status_id)).
3. Run get_timeline again, setting max_id = ID, where ID is the ID from step 2.
Note: I just tried this using my own timeline and only 40 tweets were returned by step 3. So you may also have to wait an appropriate amount of time to avoid rate limits. And be aware that Twitter basically does all it can to prevent you from requesting large amounts of data via the API - at the end of the day, what you want may not be possible.
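A minimal sketch of those three steps (the screen name is hypothetical, and this assumes your rtweet token is already set up):

library(rtweet)

# step 1: first page of up to 3200 tweets
zanetti <- get_timeline("some_user", n = 3200)

# step 2: earliest status ID in that batch; note that recent tweet IDs are
# large enough for as.numeric() to lose precision in the last digits
oldest <- min(as.numeric(zanetti$status_id))

# step 3: the page of tweets at or before that ID; max_id is inclusive, so
# the boundary tweet comes back twice and duplicates should be dropped
older <- get_timeline("some_user", n = 3200,
                      max_id = format(oldest, scientific = FALSE))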

Why do I get different results when I use twitteR to search for tweets by location

I am using the twitteR R package to search for tweets by keyword. The function searchTwitter() allows you to search with or without a location specified. If you specify a location, it returns tweets within a given radius of that point; otherwise it returns all tweets containing the keyword.
I have limited my search to just one or two days, and I am searching all the locations globally where the keywords I am interested in would likely be tweeted from. However, I get many more tweets with the general search (i.e. without location) than if I search by location and add up the tweets from all the likely cities around the world. I was not expecting exactly the same number, but I am way off.
Am I missing something with this function or is it not possible to get the location of some tweets?
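For reference, here is roughly what the two calls being compared look like (the keyword, coordinates, and radius are hypothetical):

library(twitteR)

# general search: every tweet matching the keyword
all_tweets <- searchTwitter("keyword", n = 1000)

# location-bound search: only tweets placed within 50 miles of a point
ny_tweets <- searchTwitter("keyword", n = 1000,
                           geocode = "40.7128,-74.0060,50mi")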

Search Twitter by hour in R

I am currently using the twitteR package and running into a roadblock trying to extract tweets by minute or hour. My ultimate goal is to see the total number of tweets for a particular topic at a granular level (specifically for big events like the Super Bowl or the World Cup).
The package allows tweets to be searched using since and until, but the finest granularity one can get is by day.
Here is an example of the code:
tweets <- searchTwitter("grammy", n=1500, since='2016-02-15', until='2016-02-16')
Based on the results by @SQLMenace, it appears that twitteR only retrieves the status without returning accurate date/time information.
In that case, it depends on the scenario in which you're performing the analysis. If you're performing the analysis "live" while the event is occurring, you can simply run the R script as a cron job. Say every twenty minutes you run a job to get all the most recent tweets; you can then eliminate duplicates to get an idea of how many unique tweets occurred in each 20-minute span.
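A rough sketch of such a collection script (the query, filename, and schedule are all hypothetical):

# collect_tweets.R, scheduled e.g. via cron: */20 * * * * Rscript collect_tweets.R
library(twitteR)

# setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
recent <- twListToDF(searchTwitter("grammy", n = 1500))

# append each run to one file; duplicates across runs can be dropped later
# with something like df[!duplicated(df$id), ]
out <- "grammy_tweets.csv"
write.table(recent, out, sep = ",", row.names = FALSE,
            col.names = !file.exists(out), append = file.exists(out))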
However, if you're performing the analysis retrospectively, the above method wouldn't work. And I'd caution against using twitteR. It seems as though the functionality for gathering tweets by date is not that versatile. I'd recommend using tweepy (for Python) which retrieves not only the status, but also the exact time the tweet was sent.
Hope that helps.

twitteR r package: How to get as many tweets as possible per account within API limits

I'm a novice user of R and the twitteR package, but I wasn't able to find a strong recommendation on how to accomplish the following.
I'd like to mine a small number of Twitter accounts to analyze their output for keyword usage (i.e. I don't know what the keywords are yet).
Assumptions:
I have a small number of Twitter accounts (<6) I want to mine, with a max of 7,000 tweets if you aggregate the various accounts' statuses.
Those accounts are not generating new tweets at a fast rate (a few a day).
The accounts all have fewer than 3,200 tweets according to the profile data returned by lookupUsers().
When I use the twitteR function userTimeline("accountname", n=3200) I get between 40 and 600 observations returned, i.e. nowhere near 3,200. I know there are API limits, but if this were a rate-limit issue I would expect to get the same number of observations back each time, or a notice that I need to wait 15 minutes.
How do I get all the text I need while still playing nice?
By using a combination of CRAN and GitHub packages, it was possible to get all the tweets for a user.
The packages used were streamR (available on CRAN) and smappR (https://github.com/SMAPPNYU/smappR/), which helps with getting the tweets and analyzing them.
The basic steps are:
1. Authenticate to Twitter using OAuth and your Twitter keys, tokens, and secrets.
2. Use the smappR function getTimeline(), which saves the tweets to a JSON file you specify.
3. Use parseTweets(jsonfile) to read the JSON contents into a data frame.
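A sketch of those steps (the getTimeline() argument names follow the smappR README and should be treated as assumptions; the file path and account name are hypothetical):

library(streamR)  # CRAN
library(smappR)   # install from github.com/SMAPPNYU/smappR

# steps 1-2: smappR reads stored OAuth token files from a folder and
# saves one account's timeline (up to 3200 tweets) to a JSON file
getTimeline(filename = "accountname_tweets.json",
            screen_name = "accountname",
            n = 3200,
            oauth_folder = "~/credentials")

# step 3: parse the saved JSON into a data frame with streamR
tweets <- parseTweets("accountname_tweets.json")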
This can be accomplished with the rtweet package, which is still supported. First you need to be approved as a developer and create an app. (As a note, Twitter has now changed its policies, and approval can take a while; it took me almost a week.)
After that, just use get_timeline() to get all of the tweets from a timeline, up to 3200.
djt <- get_timeline("adamgreatkind", n = 3200)

Count number of results for a particular word on Twitter

To further a personal project of mine, I have been pondering how to count the number of results for a user-specified word on Twitter. I have used their API extensively, but I have not been able to come up with an efficient, or even halfway practical, way to count the occurrences of a particular word. The actual results are not critical, just the overall count. I'll keep scratching my head; any ideas or pointers would be most appreciated.
e.g. http://search.twitter.com/search?q=tomatoes
I'm able to go back about a week. I start my search with the parameters that Adam posted and then key off of the smallest ID in the set of search results, like so:
http://search.twitter.com/search.atom?lang=en&q=iphone&rpp=100&max_id=
where max_id is the min(id) of the 100 results I just pulled.
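In R, that paging scheme might look like the following sketch (the search.twitter.com Atom endpoint was retired long ago, so this only illustrates the max_id technique):

library(httr)
library(xml2)

count_results <- function(query) {
  base   <- paste0("http://search.twitter.com/search.atom?lang=en&q=", query, "&rpp=100")
  max_id <- NULL
  total  <- 0
  repeat {
    url  <- if (is.null(max_id)) base else paste0(base, "&max_id=", max_id)
    feed <- read_xml(content(GET(url), as = "text"))
    # each Atom entry's <id> ends in the numeric tweet ID
    ids  <- xml_text(xml_find_all(feed, "//*[local-name()='entry']/*[local-name()='id']"))
    if (length(ids) == 0) break
    nums   <- as.numeric(sub(".*:", "", ids))
    total  <- total + length(nums)
    max_id <- format(min(nums) - 1, scientific = FALSE)
  }
  total
}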
I'm working in .NET, but I made a recursive function to call the search query again and again until I no longer find the word "page=" in the result.
