This question is about measuring Twitter impressions and reach using R.
I'm working on a Twitter analysis titled "People's voice about Lynas Malaysia through Twitter Analysis with R". To make it more complete, I would like to find out how to measure impressions, reach, frequency and so on from Twitter.
Definition:
Impressions: The aggregated number of followers that have been exposed to a brand/message.
Reach: The total number of unique users exposed to a message/brand.
Frequency: The number of times each unique user reached is exposed to a message.
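So, as I understand it, the three are related roughly by frequency = impressions / reach. A tiny made-up example:
impressions <- 12000               # total exposures to the message (hypothetical)
reach       <- 4000                # unique users exposed (hypothetical)
frequency   <- impressions / reach # = 3 exposures per unique user on average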
My attempt at #1:
From my understanding, the impressions figure is the combined follower count of all the tweeters who tweeted a specific keyword.
For #1, I wrote:
library(twitteR)

rdmTweets <- searchTwitter("cloudstatorg", n = 1500)  # search for the keyword
tw.df <- twListToDF(rdmTweets)                        # convert the results to a data frame
n <- length(tw.df[, 2])                               # number of tweets returned

S <- 0
for (i in 1:n) {
  tuser <- getUser(tw.df$screenName[[i]])  # one API call per tweet author
  S <- S + tuser$followersCount            # accumulate follower counts
}
S
But the problem I run into is:
Error in .self$twFromJSON(out) :
Error: Rate limit exceeded. Clients may not make more than 150 requests per hour.
For #2 and #3, I still don't have any ideas; I hope to get some help here. Thanks a lot.
The problem you are having with #1 has nothing to do with R or your code; it is about the number of calls you have made to the Twitter Search API, which exceeded the 150 calls per hour you get by default.
Depending on what you are trying to do, you can mix and match several components of the API to get the results you need.
You can read more in their docs: https://dev.twitter.com/docs/rate-limiting
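As a sketch of how you might work within those limits (assuming tw.df is the data frame you already built with twListToDF()), you can batch the user lookups with lookupUsers() instead of calling getUser() once per tweet, and then approximate all three metrics locally:
library(twitteR)

authors <- unique(tw.df$screenName)

# One batched lookup instead of one getUser() call per tweet
users <- lookupUsers(authors)
followers <- sapply(users, function(u) u$followersCount)
names(followers) <- sapply(users, function(u) u$screenName)

impressions <- sum(followers[tw.df$screenName], na.rm = TRUE) # follower counts summed per tweet
reach       <- sum(followers, na.rm = TRUE)                   # rough upper bound: unique authors' followers
frequency   <- impressions / reach                            # average exposures per user reached
This treats the summed follower counts of unique authors as an upper bound on reach; true reach would require knowing the overlap between follower lists.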
Related
I am using the R tweetscores package to estimate Twitter users’ ideology score (i.e. estimating a user’s ideology based on the accounts they follow).
I am using the code below to loop through a list of usernames, get who they follow (getFriends()) and then estimate their ideology score (estimateIdeology2()).
The getFriends() function makes calls to the Twitter API until it hits the rate limit. In that case, it should wait and then resume making calls.
However, the loop seems to self-terminate after about 40 minutes.
It looks like the variable that holds the number of calls left changes from 0 to NULL after a while, causing the loop to break.
Has anyone encountered this and/or knows how to fix this issue? I have tried adapting the code to catch this variable turning to NULL and change its value, but that doesn't prevent the loop from terminating. I would ideally like to keep this loop running rather than manually restarting it every 40 minutes. The raw code for the getFriends() function is here (it seems to break at line 47): https://github.com/pablobarbera/twitter_ideology/blob/master/pkg/tweetscores/R/get-friends.R
for (user in usernames$user_screen_name) {
  skip_to_next <- FALSE
  tryCatch({
    friends <- getFriends(screen_name = user, oauth = my_oauth)
    results <- estimateIdeology2(user, friends)
  }, error = function(e) { skip_to_next <<- TRUE })
  if (skip_to_next) { next }
  print("results computed successfully.")
  user_scores[nrow(user_scores) + 1, ] <- list(screen_name = user,
                                               ideology_score = results)
}
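What I am effectively after is behaviour like the sketch below, i.e. retrying the same user after waiting out a window rather than skipping it (get_friends_retry and the 15-minute sleep are just my guesses, not tested):
# Sketch only: retry getFriends() for the same user after waiting out a window,
# instead of skipping to the next user.
get_friends_retry <- function(user, oauth, max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(getFriends(screen_name = user, oauth = oauth),
                       error = function(e) NULL)
    if (!is.null(result)) return(result)
    message("Call failed for ", user, "; sleeping 15 minutes (attempt ", attempt, ")")
    Sys.sleep(15 * 60)  # assumed length of one rate-limit window
  }
  NULL  # give up after max_tries failures
}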
The tweetscores package uses API v1 endpoints and the rtweet package. These are being replaced by API v2 and academictwitteR, so I would suggest getting the friends list through academictwitteR:
get_user_following(user_ids, bearer_token)
But the rate limits are real: you can make 15 requests per 15-minute window. So if your users only follow a handful of accounts (so that no pagination is required), in the best-case scenario you can fetch the accounts followed by one user per minute. If you have hundreds of thousands of users, this could take ages. I am still looking for ways to work around this issue.
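A minimal way to pace those calls, sketched under the assumption of one request per user (no pagination) and a valid bearer_token, could be:
library(academictwitteR)

# Sketch: spread requests so that at most 15 fall in any 15-minute window.
following <- list()
for (id in user_ids) {
  following[[id]] <- tryCatch(
    get_user_following(id, bearer_token),
    error = function(e) NULL   # keep going if a single user fails
  )
  Sys.sleep(61)  # ~1 request per minute stays under 15 per 15 minutes
}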
I have a data set, Recoset, of ratings given by users for certain product categories. I am trying to use the ratings given by some users to predict product categories that other users may like, using the User-Based Collaborative Filtering method from the recommenderlab package in R.
Here is the code
library(recommenderlab)

m <- read.csv("recoset2.csv")
rrm <- as(m, "realRatingMatrix")
rrm2 <- rrm[rowCounts(rrm) > 5] # only users who have bought from more than 5 verticals (any random 5)
learn <- Recommender(rrm2, method = "UBCF")
learned <- predict(learn, rrm[2001:2010, ], n = 3)
learned_list <- as(learned, "list")
Now the code works perfectly fine (sometimes) as long as I am predicting for 10 users or fewer. But the moment I increase the number of users to 11 or more, in this manner,
learned <- predict(learn,rrm[2001:2020],n=3)
I am greeted by this error
Error in neighbors[, x] : incorrect number of dimensions
This error at times also crops up for as few as 2 users, but I have never received it for 1 user.
I have spent days on this by myself, gone through the entire recommenderlab documentation, and scoured numerous sources and tutorials, but I cannot debug this error. Any help in resolving this would be immensely appreciated.
user                           item               rating
ORG1SNQ0TV16NQP6ZB5SD9XGX1FP7  MobileCable        2
ORG441VE999BMCTYGZ0H7HDWHGX62  OTGPendrive        2
ORG7L1NRFQZTPDJRFXC0CQ1LXLY6E  MobileScreenGuard  3
ORGBYFYMG92YFDC043NG7PZEHEPTS  MobileScreenGuard  2
ORGLZH07SFPSFQ3RZJMCV85XKKDKE  Smartphone         5
ORGMBN2841ZDJDZD4HHEN28HB5YYP  Headphone          1
Here is the link to the dataset
I know it's probably not a whole lot of help, but I found that using Item-Based Collaborative Filtering circumvents this issue. I found an example here which returned the same error:
https://rpubs.com/dhairavc/639597
I think it has something to do with the number of neighbours for UBCF.
If you change the line to:
learn <- Recommender(rrm2, method = "IBCF")
I imagine you will likely get a result back. User-based filtering isn't as friendly towards sparse matrices.
I hope this helps, I'll update you if I get any closer to a fix!
Change the Recommender line to use a different method:
learn <- Recommender(rrm2, method = "IBCF") # or change to "UBCF", "SVD", "POPULAR"
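If it helps narrow things down, recommenderlab also lets you compare those methods side by side; a rough sketch (the split parameters are picked arbitrarily):
library(recommenderlab)

# Sketch: compare several methods on the same rating matrix.
set.seed(1)
scheme <- evaluationScheme(rrm2, method = "split", train = 0.8,
                           given = 3, goodRating = 3)

algorithms <- list(
  IBCF    = list(name = "IBCF",    param = NULL),
  UBCF    = list(name = "UBCF",    param = NULL),
  POPULAR = list(name = "POPULAR", param = NULL)
)

results <- evaluate(scheme, algorithms, type = "topNList", n = 3)
avg(results)  # average evaluation metrics per method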
I'm trying to get all user data for the followers of an account, but am running into an issue with the 90,000-user lookup limit. The documentation page says that this can be done by iterating through the user IDs while avoiding the rate limit, which has a 15-minute reset time, but it doesn't really give any guidance on how to do this. How would a complete user lookup with a list of more than 90,000 users be achieved?
I'm using the rtweet package. Below is an attempt with @lisamurkowski, who has 266,000 followers. I have tried using the retryonratelimit = TRUE argument to lookup_users(), but that doesn't do anything.
lisa <- lookup_users("lisamurkowski")
mc_flw <- get_followers("lisamurkowski", n = lisa$followers_count,
retryonratelimit = TRUE)
mc_flw_users <- lookup_users(mc_flw$user_id)
The expected output would be a tibble of all the user lookups, but instead I get
max number of users exceeded; looking up first 90,000
And then the returned object contains 90,000 observations and the process ends.
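What I imagine is needed is something like the sketch below (assuming mc_flw already holds all the follower IDs and that the lookup limit resets every 15 minutes), but I am not sure this is the right approach:
library(rtweet)

# Sketch: split the follower IDs into chunks of at most 90,000 and
# wait out a rate-limit window between chunks.
ids <- mc_flw$user_id
chunks <- split(ids, ceiling(seq_along(ids) / 90000))

mc_flw_users <- list()
for (i in seq_along(chunks)) {
  mc_flw_users[[i]] <- lookup_users(chunks[[i]])
  if (i < length(chunks)) Sys.sleep(15 * 60)  # wait for the limit to reset
}
mc_flw_users <- do.call(rbind, mc_flw_users)  # combine the per-chunk results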
I have a list of 36 locations for which I have to get a distance matrix from each location to every other location, i.e. a 36x36 matrix. Using help from other questions on this topic on this forum, I was able to put together a basic code (demonstrated with four locations only) as follows:
library(googleway)
library(plyr)
key <- "VALID KEY" #removed for security reasons
districts <- c("Attock, Pakistan",
"Bahawalnagar, Pakistan",
"Bahawalpur, Pakistan",
"Bhakkar, Pakistan")
#Calculate pairwise distance between each location
lst <- google_distance(origins=districts, destinations=districts, key=key)
res.lst <- list()
for (i in 1:length(districts)) {
  e.row <- rbind(cbind(districts[i], distance_destinations(lst),
                       distance_elements(lst)[[i]][['distance']]))
  res.lst[[i]] <- e.row
}
# view results as list
res.lst
# combine each element of list into a dataframe.
res.df <- ldply(res.lst, rbind)
#give names to columns
colnames(res.df) <- c("origin", "destination", "dist.km", "dist.m")
#Display result
res.df
This code works fine for a small number of queries, i.e. when there are only a few locations, e.g. 5 at a time. For anything larger, I get an "Over-Query-Limit" error with the message "You have exceeded your rate-limit for this API", even though I have not reached the 2500-element limit. I also signed up for the 'Pay-as-you-use' billing option, but I continue to get the same error.
I wonder if this is an issue with how many requests are being sent per second (i.e. the rate)? If so, can I modify my code to address this? Even without an API key this code does not ask for more than 2500 queries, so I should be within the limit, but I'm stumped on how to resolve this even with billing enabled.
The free quota is 2500 elements.
Each query sent to the Distance Matrix API is limited by the number of allowed elements, where the number of origins times the number of destinations defines the number of elements.
Standard Usage Limits
Users of the standard API:
2,500 free elements per day, calculated as the sum of client-side and server-side queries.
Maximum of 25 origins or 25 destinations per request.
A 36x36 request would be 1,296 elements, so after two such requests you would be over the 2,500-element quota.
For anyone still struggling with this issue: I was able to resolve it by using a while loop. Since I was well under the 2500-element limit, this was a rate problem rather than a query-limit problem. With a while loop, I broke the locations into chunks (running distance queries for 2x36 at a time) and looped over the entire data to get the 36x36 matrix I needed.
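Roughly, the chunked approach looked like the sketch below (the 2-origin chunk size and pause length are illustrative rather than my exact code):
library(googleway)

# Sketch: query 2 origins at a time against all destinations,
# pausing between requests to stay under the per-second rate limit.
chunk_size <- 2
chunks <- split(districts, ceiling(seq_along(districts) / chunk_size))

dist.lst <- list()
for (i in seq_along(chunks)) {
  dist.lst[[i]] <- google_distance(origins = chunks[[i]],
                                   destinations = districts,
                                   key = key)
  Sys.sleep(2)  # brief pause so requests are not sent too quickly
}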
I've recently been playing with the mapdist function in the ggmap package.
For small volumes of queries it works fine for me, but for larger numbers (still below the 2,500 limit) it falls over and I'm not sure why.
I've had an old colleague try this script and they get the same results as I do (they are in a different organisation, using a different computer, on a different network etc.).
Here is my testing script, which runs the same request again and again to see how many queries it manages to pass before failing. For a time it was consistently returning 129; lately it has begun returning 127 (though the number is still consistent within any given test).
Note that although this repeats the same postcodes, I have tried similar with a random selection of destination postcodes and get the same results.
library("ggmap")
# Setup ----------
no.of.pcd.to.check <- 500
a <- rep("SW1A 1AA",no.of.pcd.to.check) # Use a repeating list of the same postcode to remove it as a causal factor
b <- rep("WC2H 0HE",no.of.pcd.to.check) # As above
test.length <- 5 # How many iterations should the test run over
# Create results dataframe ----------
# and pre-set capacity to speed up the for loop
results.df <- data.frame(
  Iteration = seq_len(test.length),
  Result    = as.integer(rep(0, test.length)),
  Remaining = as.integer(rep(0, test.length)))
# Run the test ----------
for (i in 1:test.length) {
  x <- distQueryCheck() # Get remaining number of queries pre submission
  try(mapdist(a, b, mode = "driving", output = "simple", override_limit = TRUE))
  y <- distQueryCheck() # Get remaining number of queries post submission
  query.use <- (x - y)  # Difference between pre and post (ie number of successful queries submitted)
  print(paste(query.use, "queries used"))
  results.df[i, "Result"] <- query.use # Save successful number of queries for each test iteration
  results.df[i, "Remaining"] <- y
}
I'd be really grateful for any insight on where I'm going wrong here.
So I had this same error message, and what ended up fixing it was simply changing the '#' in an address to 'Number '. I'm no expert and haven't even looked into the mapdist code, but eliminating '#' allowed me to use mapdist with no problems.
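In code, that substitution could look something like this sketch (the addresses are made up):
library(ggmap)

# Hypothetical addresses containing '#', which seems to trip up the geocoding
origins <- c("123 Main St #4, Springfield", "456 Oak Ave #12, Springfield")

# Replace '#' with 'Number ' before querying
origins_clean <- gsub("#", "Number ", origins, fixed = TRUE)

dists <- mapdist(origins_clean, "SW1A 1AA", mode = "driving", output = "simple")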