With the package twitteR, it is possible to search tweets as follows:
tweets <- searchTwitter("term", n=100,lang="en",resultType="recent",
since="2016-06-10", until="2016-06-26")
When resultType="recent", I can get the full number of tweets, but they are ordered by creation time, so the results begin with many tweets from 2016-06-25 23:59:59.
I wanted to search for popular tweets first, so I used resultType="popular":
tweets <- searchTwitter("term", n=100,lang="en",resultType="popular",
since="2016-06-10", until="2016-06-26")
But then I got this warning:
Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
100 tweets were requested but the API can only return 93
I understand that Twitter limits the requests, but since it can return 100 tweets ordered by creation time, I hoped that I could get the same number of tweets ordered by popularity. Apparently that is not the case.
Or maybe I didn't use the function in the right way.
So I would like to find a way to search tweets efficiently:
How can I get more popular tweets in a day?
How can I specify an hour for the search, for example 10 am, so that the tweets are not all from 2016-06-25 23:59:59, which can introduce a bias?
Do we have to pay in order to get more tweets and more information? For example, I noticed that my tweets are never geocoded.
Usually I save them in a data.frame and then work with the number of retweets, etc. I don't think you can do it directly. Hope it helps.
I don't believe Twitter will return the most popular tweets in order. Either the most recent or the popular (however Twitter determines that) tweets are returned. Since Twitter only returned 93 tweets, I'd suggest you try broadening your search terms and then looking at the number of favorites, retweets, replies, etc. for each tweet.
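If you want a popularity ordering yourself, one workaround (a sketch, assuming you have already authenticated with setup_twitter_oauth()) is to fetch the "recent" results and sort them by engagement fields such as retweetCount and favoriteCount:

```r
# Sketch: rank "recent" results by engagement yourself, instead of relying
# on resultType = "popular". Requires a prior setup_twitter_oauth() call.
library(twitteR)

tweets <- searchTwitter("term", n = 100, lang = "en", resultType = "recent",
                        since = "2016-06-10", until = "2016-06-26")
tweets.df <- twListToDF(tweets)

# Sort by retweet count, with favourites as a tie-breaker, most popular first
tweets.df <- tweets.df[order(-tweets.df$retweetCount,
                             -tweets.df$favoriteCount), ]
head(tweets.df[, c("screenName", "retweetCount", "favoriteCount", "text")])
```

This is not the same signal Twitter uses internally for resultType="popular", but it gives you a reproducible ranking over the full set of returned tweets.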
I'm trying to get all user data for the followers of an account, but am running into an issue with the 90,000 user lookup limit. The documentation page says that that can be done by iterating through the user IDs while avoiding the rate limit that has a 15 minute reset time, but doesn't really give any guidance on how to do this. How would a complete user lookup with a list of users that is greater than 90,000 be achieved?
I'm using the rtweet package. Below is an attempt with @lisamurkowski, who has 266,000 followers. I have tried using a retryonratelimit = TRUE argument to lookup_users(), but that doesn't do anything.
lisa <- lookup_users("lisamurkowski")
mc_flw <- get_followers("lisamurkowski", n = lisa$followers_count,
retryonratelimit = TRUE)
mc_flw_users <- lookup_users(mc_flw$user_id)
The expected output would be a tibble of all the user lookups, but instead I get
max number of users exceeded; looking up first 90,000
And then the returned object contains 90,000 observations and the process ends.
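One possible workaround (a sketch, not an official rtweet feature; the chunk size and sleep interval are assumptions based on the 90,000-user lookup limit and the 15-minute rate-limit reset mentioned above) is to split the IDs into chunks and wait out the rate-limit window between calls:

```r
# Sketch: look up more than 90,000 users by chunking the ID list and
# sleeping between chunks. Assumes mc_flw$user_id already holds the full
# list of follower IDs from get_followers().
library(rtweet)

chunk_size <- 90000
ids <- mc_flw$user_id
chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))

results <- list()
for (i in seq_along(chunks)) {
  results[[i]] <- lookup_users(chunks[[i]])
  if (i < length(chunks)) {
    Sys.sleep(15 * 60)  # wait for the 15-minute rate-limit window to reset
  }
}
mc_flw_users <- do.call(rbind, results)
```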
I am doing sentiment analysis of Twitter data in R, but I have many repeated tweets in the data. Does this affect the result?
RT #Ananduvi: Will You Support #BharathBandh on Today against #demonetization ???
RT #Ananduvi: Will You Support #BharathBandh on Today against #demonetization ???
If yes, then how do I deal with it? I want to remove those tweets from the Twitter dataset.
text<- gsub("(RT|via)((?:\\b\\W*#\\w+)+ )", "", text)
This code removes only the name of the person, but the tweet itself remains as it is.
I will be glad if you help me.
If you have repeated tweets, they will skew the analytics!
With the Twitter API, tweets are returned in JSON format - you need to treat the "id" field (or better, the "id_str" field) of the tweet as the unique identifier and only select a single instance of each "id" in your analytics.
{"id": 123456789, "id_str": "123456789"}
If you make sure you only have one instance of each tweet keyed on the field above, you will avoid this problem.
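In R, that deduplication is a one-liner with duplicated(). A minimal sketch, using a toy data frame in place of your real dataset:

```r
# Toy data frame standing in for the real tweet dataset
tweets <- data.frame(
  id_str = c("123456789", "123456789", "987654321"),
  text   = c("RT: Will You Support #BharathBandh on Today ...",
             "RT: Will You Support #BharathBandh on Today ...",
             "Some other tweet"),
  stringsAsFactors = FALSE
)

# Keep only the first instance of each id_str
tweets_unique <- tweets[!duplicated(tweets$id_str), ]
nrow(tweets_unique)  # 2
```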
I'm trying to retrieve all the tweets from a specific user's timeline (Donald Trump, for example) using the userTimeline() function from the twitteR package. My problem is that the function is only returning 469 tweets, as opposed to the 3200 I've specified.
I saw a post about a similar problem, where the answer suggested that the Twitter API only returns tweets from the past week. But in my data, I can see tweets that were created nearly 3 months ago. Does anyone know a way I can get the maximum number of tweets possible? If not, does anyone know why I'm getting tweets that were created longer than a week ago?
You can see my code below:
library(devtools)
library(httr)
install_github("twitteR", username = "geoffjentry")
library(twitteR)
setup_twitter_oauth(...."details")
mht=userTimeline('realDonaldTrump',n=3200)
tweets.df <- do.call(rbind, lapply(mht, as.data.frame))
Running this code gets me 469 tweets, from his most recent back to 2016-08-12. I want to get as many tweets as possible, over as long a period as I can.
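One thing worth checking (an assumption based on twitteR's documented defaults, not a confirmed fix): userTimeline() excludes retweets unless includeRts = TRUE, yet the excluded tweets still count against the API's 3,200-tweet window, so the call can return far fewer tweets than requested. Asking for retweets and replies explicitly may get you closer to the maximum:

```r
# Sketch: request the timeline with retweets and replies included,
# after authenticating with setup_twitter_oauth() as in the question.
library(twitteR)

mht <- userTimeline('realDonaldTrump', n = 3200,
                    includeRts = TRUE, excludeReplies = FALSE)
tweets.df <- twListToDF(mht)  # equivalent to the do.call(rbind, ...) step
nrow(tweets.df)
```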
I'm trying to dabble in some basic sentiment analysis using the twitteR library and the searchTwitter function. Say I'm searching for tweets specific to "Samsung". I can retrieve the tweets with the command below:
samsung_t = searchTwitter("#samsung", n=1500, lang="en",cainfo="cacert.pem")
This, I know, will return all the tweets containing the hashtag #samsung. However, if I wanted to search for tweets containing "samsung" in them, I would give the same command but without the "#":
samsung_t = searchTwitter("samsung", n=1500, lang="en",cainfo="cacert.pem")
This, however, will return all the tweets containing the term "samsung", including in the handle. For example, it will return a tweet like "@I_Love_Samsung: I like R programming", which is completely irrelevant to my criteria. If I wanted to do a sentiment analysis on, say, "Samsung phones", I'm afraid that data like this can skew the results.
Is there a way I can force searchTwitter to only look in the "Tweet" but not the "Handle"?
Thanks a lot in advance.
Looking at the search API documentation and the listing of available search operators, I don't think the twitter search API offers this specific search capability (which seems kind of strange, frankly). I think your best bet is probably to run your search with the tools available to you and filter out the tweets that don't match your criteria from the results you get back.
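The post-filtering step can be as simple as stripping @mentions from each tweet's text first, then keeping only tweets that still contain the keyword. A minimal sketch, with a toy character vector in place of the real search results:

```r
# Toy tweet texts standing in for the searchTwitter() results
texts <- c("@I_Love_Samsung: I like R programming",
           "My new Samsung phone is great",
           "#samsung makes phones")

# Remove @handles, then test for the keyword in what is left
stripped <- gsub("@\\w+", "", texts)
keep <- grepl("samsung", stripped, ignore.case = TRUE)
texts[keep]  # drops the first tweet, keeps the other two
```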
This question is about measuring twitter impressions and reach using R.
I'm working on a Twitter analysis of "People's voice about Lynas Malaysia through Twitter Analysis with R". To be more complete, I wish to find out how to measure impressions, reach, frequency, and so on from Twitter.
Definition:
Impressions: The aggregated number of followers that have been exposed to a brand/message.
Reach: The total number of unique users exposed to a message/brand.
Frequency: The number of times each unique user reached is exposed to a message.
My trial for #1:
From my understanding, impressions are the sum of the follower counts of all the tweeters that tweet the specific keyword.
For #1, I wrote:
rdmTweets <- searchTwitter(cloudstatorg, n=1500)
tw.df=twListToDF(rdmTweets)
n <- length(tw.df[,2])
S <- 0
X <- 0
for (i in 1:n) {
  tuser <- getUser(tw.df$screenName[[i]])
  X <- tuser$followersCount
  S <- S + X
}
S
But the problem that occurs is:
Error in .self$twFromJSON(out) :
Error: Rate limit exceeded. Clients may not make more than 150 requests per hour.
For #2 and #3, I still don't have any ideas; I hope to get help here. Thanks a lot.
The problem you are having with #1 has nothing to do with R or your code; it is about the number of calls you have made to the Twitter Search API, which exceeded the 150 calls you get by default.
Depending on what you are trying to do, you can mix and match several components of the API to get the results you need.
You can read more in their docs: https://dev.twitter.com/docs/rate-limiting
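As an illustration of that mixing and matching (a sketch; cloudstatorg is the search-term variable from the question), you can cut the number of calls dramatically by looking up each unique screen name once with the batched lookupUsers() call, instead of calling getUser() once per tweet:

```r
# Sketch: compute impressions with far fewer API calls by batching
# the user lookups. Requires prior authentication.
library(twitteR)

rdmTweets <- searchTwitter(cloudstatorg, n = 1500)
tw.df <- twListToDF(rdmTweets)

# Look up each distinct author once, in batched requests
users <- lookupUsers(unique(tw.df$screenName))
followers <- sapply(users, function(u) u$followersCount)

# Map the follower counts back onto every tweet and sum them (impressions)
S <- sum(followers[tw.df$screenName], na.rm = TRUE)
S
```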