I have a list of 36 locations and need a distance matrix from each location to every other location, i.e. a 36x36 matrix. Using help from other questions on this topic on this forum, I was able to put together some basic code (demonstrated with four locations only):
library(googleway)
library(plyr)

key <- "VALID KEY" # removed for security reasons

districts <- c("Attock, Pakistan",
               "Bahawalnagar, Pakistan",
               "Bahawalpur, Pakistan",
               "Bhakkar, Pakistan")

# Calculate pairwise distance between each location
lst <- google_distance(origins = districts, destinations = districts, key = key)

# Build one block of rows per origin: origin, destination, distance text and value
res.lst <- list()
for (i in seq_along(districts)) {
  e.row <- cbind(districts[i],
                 distance_destinations(lst),
                 distance_elements(lst)[[i]][['distance']])
  res.lst[[i]] <- e.row
}
# view results as list
res.lst
# combine each element of list into a dataframe.
res.df <- ldply(res.lst, rbind)
#give names to columns
colnames(res.df) <- c("origin", "destination", "dist.km", "dist.m")
#Display result
res.df
This code works fine for a small number of queries, i.e. when there are only a few locations (say 5) at a time. For anything larger I get an "Over-Query-Limit" error with the message "You have exceeded your rate-limit for this API", even though I have not reached the 2,500 limit. I also signed up for the 'Pay-as-you-use' billing option, but I continue to get the same error. I wonder whether this is an issue of how many requests are being sent per second (i.e. the rate), and if so, whether I can modify my code to address it. Even without an API key this code does not ask for more than 2,500 queries, so I should be able to run it, but I'm stumped as to how to resolve this even with billing enabled.
The free quota is 2500 elements.
Each query sent to the Distance Matrix API is limited by the number of allowed elements, where the number of origins times the number of destinations defines the number of elements.
Standard Usage Limits
Users of the standard API:
2,500 free elements per day, calculated as the sum of client-side and server-side queries.
Maximum of 25 origins or 25 destinations per request.
A 36x36 request would be 36 x 36 = 1,296 elements, so after two such requests you would be out of quota. (Note also that, per the limits above, a single request may contain at most 25 origins or 25 destinations, so a 36x36 matrix cannot be fetched in one request anyway.)
For anyone still struggling with this issue: I was able to resolve it by using a while loop. Since I was well under the 2,500 query limit, this was a rate problem rather than a quota problem. With the while loop I broke the locations into chunks (running distance queries for 2x36 at a time) and looped over the entire list to build the 36x36 matrix I needed.
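A minimal sketch of that chunked approach (written with a for loop for brevity), assuming a valid key and the districts vector from the question; the chunk size and pause length are illustrative and may need tuning:
library(googleway)
chunk.size <- 2  # two origin rows per request, i.e. 2 x 36 = 72 elements
origin.chunks <- split(districts, ceiling(seq_along(districts) / chunk.size))
res.lst <- list()
for (chunk in origin.chunks) {
  res.lst[[length(res.lst) + 1]] <- google_distance(origins = chunk,
                                                    destinations = districts,
                                                    key = key)
  Sys.sleep(1)  # pause between requests to stay under the per-second rate limit
}
Each element of res.lst then holds the response for one chunk of origins and can be post-processed with the same accessor calls used in the question.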
Related
I am using the R tweetscores package to estimate Twitter users’ ideology score (i.e. estimating a user’s ideology based on the accounts they follow).
I am using the code below to loop through a list of usernames, get who they follow (getFriends()) and then estimate their ideology score (estimateIdeology2()).
The getFriends() function makes calls to the Twitter API until it hits the rate limit, at which point it should wait and then resume making calls.
However, the loop seems to self-terminate after about 40 minutes.
It looks like the variable that holds the number of calls left changes from 0 to NULL after a while, causing the loop to break.
Has anyone encountered this and/or know how to fix it? I have tried adapting the code to catch this variable when it turns NULL and change its value, but that doesn't prevent the loop from terminating. I would ideally like to keep the loop running rather than manually restart it every 40 minutes. The raw code for the getFriends() function is here (it seems to break at line 47): https://github.com/pablobarbera/twitter_ideology/blob/master/pkg/tweetscores/R/get-friends.R
for (user in usernames$user_screen_name) {
  skip_to_next <- FALSE
  tryCatch({
    friends <- getFriends(screen_name = user, oauth = my_oauth)
    results <- estimateIdeology2(user, friends)
  }, error = function(e) { skip_to_next <<- TRUE })
  if (skip_to_next) { next }  # skip this user if either call errored
  print("results computed successfully.")
  user_scores[nrow(user_scores) + 1, ] <- list(screen_name = user,
                                               ideology_score = results)
}
The tweetscores package uses Twitter API v1 endpoints and the rtweet package. These are being replaced by API v2 and academictwitteR, so I would suggest getting the friends list through academictwitteR instead:
get_user_following(user_ids, bearer_token)
But the rate limits are real: you can make 15 requests per 15-minute window. So if your users only follow a handful of accounts (so that no pagination is required), in the best-case scenario you can fetch the accounts followed by one user per minute. If you have hundreds of thousands of users, this could take ages. I am still looking for ways to work around this issue.
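A hedged sketch of that approach, assuming a valid bearer_token and the usernames data frame from the question; the user_id column is an assumption (the v2 endpoint takes user IDs rather than screen names), and the explicit pause is just a simple way to stay within the 15-requests-per-15-minutes window:
library(academictwitteR)
following <- list()
for (id in usernames$user_id) {  # hypothetical user_id column; adjust to your data
  following[[as.character(id)]] <- tryCatch(
    get_user_following(id, bearer_token = bearer_token),
    error = function(e) NULL  # record NULL for this user and carry on
  )
  Sys.sleep(60)  # roughly one request per minute, i.e. 15 per 15-minute window
}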
I'm using R to access Google Analytics data with the RGoogleAnalytics package.
I wrote the following query to get the Search Terms from Site Search for Oct 16 to 22.
library(RGoogleAnalytics)

query <- Init(start.date = "2017-10-16",
              end.date = "2017-10-22",
              dimensions = "ga:searchKeyword,ga:searchKeywordRefinement",
              metrics = "ga:searchUniques,ga:searchSessions,ga:searchExits,ga:searchRefinements",
              max.results = 99999,
              sort = "-ga:searchUniques",
              table.id = "ga:my_view_id")
ga.query2 <- QueryBuilder(query)
ga.data.refined <- GetReportData(ga.query2, token, paginate_query = TRUE)
However, this returns 34,000 rows, which doesn't match the 45,000 rows I see in GA. Note: I did add the second dimension to the Search Terms report in GA as well.
Interestingly, if I remove the ga:searchKeywordRefinement dimension from the code and also in GA, the number of rows does match.
This is most likely caused by sampling in the data. I can't seem to locate the documentation on how to access this, but the documentation otherwise makes it clear it is possible:
RGoogleAnalytics GitHub with Readme
In cases where queries are sampled, the output also returns the percentage of sessions that were used for the query
So the answer is to access the output that returns the percentage of sessions used for the query; if it is less than 100%, you have found your problem.
To deal with sampling there are some techniques. Review the section in the documentation that talks about splitting your queries into single days and then unioning all the dates back together.
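A hedged sketch of that day-splitting approach, assuming the query and token objects from the question; RGoogleAnalytics exposes a split_daywise argument in GetReportData() (described in the package README as a way to mitigate query sampling) that issues one query per day and stitches the results together:
ga.query2 <- QueryBuilder(query)
ga.data.unsampled <- GetReportData(ga.query2, token,
                                   split_daywise = TRUE,  # one query per day in the date range
                                   paginate_query = TRUE)
Note that day-splitting reduces, but does not always eliminate, sampling for high-traffic views.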
I have a table in DDB with site_id as my hash key and person_id as the range key. There are another 6-8 columns on this table with numeric statistics about each person (e.g. times seen, last log-in, etc.). This table has data for about 10 sites and 20 million rows (it is only used as a proof of concept for now; the production table will have much bigger numbers).
I'm trying to retrieve all person_ids for a given site where time_seen > 10, so I'm doing a query using the hash key with time_seen > 10 as a criterion. This results in a few thousand entries, which I expected to get pretty much instantly. My test harness runs in AWS in the same region.
The read capacity on this table is 100 units. The results I'm getting are attached.
For some reason I'm hitting the limits. The only two limits I'm aware of are the maximum size of the returned data and the time limit. I'm only returning 32 bytes per row (so approximately 100 KB per result), so there is no chance that is the issue. The time, as you can see, doesn't hit the 5-second limit either. So why can't I get my results faster?
Results are retrieved in a single thread from C#.
Thanks
I've recently been playing with the mapdist function in the ggmap package.
For small volumes of queries it works fine for me, but for larger numbers (still below the 2,500 limit) it falls over and I'm not sure why.
I've had an old colleague try this script and they get the same results as I do (they are in a different organisation, using a different computer, on a different network etc.).
Here is my testing script, which runs the same request again and again to see how many queries it manages to pass before failing. It was consistently returning 129 for a time; lately it has begun returning 127 (though this number is still consistent within a given test).
Note that although this repeats the same postcodes, I have tried similar with a random selection of destination postcodes and get the same results.
library("ggmap")
# Setup ----------
no.of.pcd.to.check <- 500
a <- rep("SW1A 1AA",no.of.pcd.to.check) # Use a repeating list of the same postcode to remove it as a causal factor
b <- rep("WC2H 0HE",no.of.pcd.to.check) # As above
test.length <- 5 # How many iterations should the test run over
# Create results dataframe ----------
# and pre-set capacity to speed up the for loop
results.df <- data.frame(
Iteration=seq(1:test.length),
Result=as.integer(rep(0,test.length)),
Remaining=as.integer(rep(0,test.length)))
# Run the test ----------
for(i in 1:test.length){
x <- distQueryCheck() # Get remaining number of queries pre submission
try(mapdist(a, b, mode="driving", output="simple",override_limit=TRUE))
y <- distQueryCheck() # Get remaining number of queries post submission
query.use <- (x-y) # Difference between pre and post (ie number of successful queries submitted)
print(paste(query.use, "queries used"))
results.df[i,"Result"] <- query.use # Save successful number of queries for each test iteration
results.df[i,"Remaining"] <- y
}
I'd be really grateful for any insight on where I'm going wrong here.
So I had this same error message, and what ended up fixing it was simply changing the '#' in an address to 'Number '. I'm no expert and haven't even looked into the mapdist code, but eliminating '#' allowed me to use mapdist with no problems.
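A hedged sketch of that workaround, using a made-up addresses vector purely for illustration; the idea is simply to strip '#' from the strings before they reach mapdist():
library(ggmap)
addresses <- c("10 High Street #2, London", "221B Baker Street, London")  # illustrative addresses only
clean.addresses <- gsub("#", "Number ", addresses, fixed = TRUE)          # replace '#' with 'Number '
mapdist(clean.addresses, rep("SW1A 1AA", length(clean.addresses)),
        mode = "driving", output = "simple")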
This question is about measuring Twitter impressions and reach using R.
I'm working on a Twitter analysis, "People voice about Lynas Malaysia through Twitter Analysis with R". To make it more complete, I wish to find out how to measure impressions, reach, frequency and so on from Twitter.
Definition:
Impressions: The aggregated number of followers that have been exposed to a brand/message.
Reach: The total number of unique users exposed to a message/brand.
Frequency: The number of times each unique user reached is exposed to a message.
My trial: #1.
From my understanding, the impressions figure is the sum of the follower counts of all the tweeters that tweet a specific keyword.
For #1, I put together the following:
library(twitteR)

rdmTweets <- searchTwitter(cloudstatorg, n = 1500)  # cloudstatorg holds the search keyword
tw.df <- twListToDF(rdmTweets)
n <- length(tw.df[, 2])
S <- 0  # running total of follower counts (the impressions figure)
X <- 0
for (i in 1:n) {
  tuser <- getUser(tw.df$screenName[[i]])  # one API call per tweet
  X <- tuser$followersCount
  S <- S + X
}
S
But the problem that occurs is:
Error in .self$twFromJSON(out) :
Error: Rate limit exceeded. Clients may not make more than 150 requests per hour.
For #2 and #3 I still don't have any ideas; I hope to get some help here. Thanks a lot.
The problem you are having with #1 has nothing to do with R or your code; it is about the number of calls you have made to the Twitter Search API, which exceeded the 150 calls per hour you get by default.
Depending on what you are trying to do, you can mix and match several components of the API to get the results you need.
You can read more in their docs: https://dev.twitter.com/docs/rate-limiting
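One way to cut the number of calls (a sketch only, not part of the original answer): batch the user lookups with twitteR's lookupUsers(), which resolves many screen names per request instead of making one getUser() call per tweet. It assumes the tw.df data frame from the question:
library(twitteR)
user.objects <- lookupUsers(unique(tw.df$screenName))  # batched lookup of unique screen names
followers <- sapply(user.objects, function(u) u$followersCount)
names(followers) <- sapply(user.objects, function(u) u$screenName)
S <- sum(followers[tw.df$screenName], na.rm = TRUE)  # same per-tweet sum as the loop, far fewer API calls
S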