I'm trying to import data from Google Analytics into R using the rga library and the following line:
myresults <- ga$getData(id, start.date = "2015-04-28",
                        end.date = "2015-05-28", metrics = "ga:exits",
                        start = 1, max = 1000)
The code above works and extracts the information specified by the query "ga:exits". I was wondering whether there exists a query that would provide a report of page views for every page.
P.S. I have tried the Google Analytics Query Explorer.
Sincerely,
YJ
What about something like this:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3Axxxxxxxx&start-date=2015-05-01&end-date=2015-06-08&metrics=ga%3Apageviews&dimensions=ga%3ApagePath&sort=-ga%3Apageviews
Look at the dimensions reference: https://developers.google.com/analytics/devguides/reporting/core/dimsmets. For example, ga:pageTitle or ga:pagePath.
You should use ga:pagePath as the dimension and ga:pageviews as the metric.
rga.open(instance = "ga",
         client.id = "xxxxxxxxxxxxxxgoogleusercontent.com",
         client.secret = "xxxxx-xxxxe46z_N")
ga$getData(ids,
           metrics = "ga:pageviews",
           dimensions = "ga:pagePath",
           start.date = "yyyy-mm-dd",
           end.date = "yyyy-mm-dd")
Hope this helps.
I'm trying to fetch data from the Google Plus API but I only know how to search if I know the user_id.
Here's how I get the JSON using the RCurl library:
data <- getURL(paste0("https://www.googleapis.com/plus/v1/people/",
user_id,"/activities/public?maxResults=100&key=", api_key),
ssl.verifypeer = FALSE)
I have tried formatting the URL as shown in Google's documentation, like so:
data <- getURL(paste0("https://www.googleapis.com/plus/v1/activities/",
keyword,"?key=",api_key),ssl.verifypeer = FALSE)
but it doesn't work.
Is it even possible to search using a keyword from R? R isn't among the supported programming languages for the API, according to this link.
I figured out how to make it work.
The GET request should be formatted as:
data <- getURL(paste0("https://www.googleapis.com/plus/v1/activities?key=",
                      api_key, "&query=", search_string),
               ssl.verifypeer = FALSE)
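One detail worth noting: the search string should be URL-encoded before being pasted into the query string, otherwise keywords containing spaces or special characters will break the request. A minimal base-R sketch (api_key and search_string are placeholder values, not real credentials):

```r
# Build the Google+ activities search URL; URLencode() percent-encodes
# characters such as spaces in the keyword
api_key <- "YOUR_API_KEY"     # placeholder, not a real key
search_string <- "r language" # contains a space that must be encoded
url <- paste0("https://www.googleapis.com/plus/v1/activities?key=", api_key,
              "&query=", URLencode(search_string, reserved = TRUE))
print(url)
```

The resulting string can then be passed to getURL() as above.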
I have quite a big set of URLs (> 8,500) that I want to query the Google Analytics API with from R. I'm working with the googleAnalyticsR package. The problem is that I am able to loop through my set of URLs, but the resulting data frame only contains the total values for the host ID in every row (i.e. the same values in each row).
Here's how far I got to this point:
library(googleAnalyticsR)
library(lubridate)

#Authorize with google
ga_auth()
ga.acc.list = ga_account_list()
my.id = 123456

#set time range
soty = floor_date(Sys.Date(), "year")
yesterday = floor_date(Sys.Date(), "day") - days(1)

#get some - in this case - random URLs
urls = c("example.com/de/", "example.com/us/", "example.com/en/")
urls = gsub("^example.com/", "ga:pagePath=~", urls)

df = data.frame()

#get data
for (i in urls) {
  ga.data = google_analytics_4(my.id,
                               date_range = c(soty, yesterday),
                               metrics = c("pageviews", "avgTimeOnPage",
                                           "entrances", "bounceRate", "exitRate"),
                               filters = i) #the loop variable already holds the filter string
  df = rbind(df, ga.data)
}
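As a side note, the gsub() step above can be checked in isolation; this is simply the question's own transformation replayed on its placeholder paths:

```r
# Turn the URL paths into Core Reporting API filter expressions
urls <- c("example.com/de/", "example.com/us/", "example.com/en/")
filters <- gsub("^example.com/", "ga:pagePath=~", urls)
print(filters)
```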
The result is that I always receive the total statistics for the my.id property in each row of the resulting data frame (screenshot of the output omitted).
Does anyone know a better way to tackle this, or does Google Analytics simply prevent us from querying it in such a way?
What you're getting is normal: you only queried for metrics (pageviews, avgTimeOnPage, entrances, bounceRate, exitRate), so you only get the overall totals for those metrics.
If you want to break down those metrics, you need to use dimensions:
https://developers.google.com/analytics/devguides/reporting/core/dimsmets
In your case you're interested in the ga:pagePath dimension, so something like this (untested code):
ga.data = google_analytics_4(my.id,
                             date_range = c(soty, yesterday),
                             dimensions = c("pagePath"),
                             metrics = c("pageviews", "avgTimeOnPage",
                                         "entrances", "bounceRate", "exitRate"),
                             filters = i) # "i" is the loop variable from your for loop
I advise you to use the Google Analytics Query Explorer until you get the desired results, then port it to R.
As for the number of results, you might be limited to 1,000 rows by default until you increase max_rows. There is a hard limit of 10,000 rows per request from the API, which means you then have to use pagination to retrieve more results if needed. I have seen examples in the R documentation with max = 99999999; I don't know whether the R library automatically handles pagination beyond the first 10,000 rows or whether the authors are unaware of the hard limit:
batch_gadata <- google_analytics(id = ga_id,
                                 start = "2014-08-01", end = "2015-08-02",
                                 metrics = c("sessions", "bounceRate"),
                                 dimensions = c("source", "medium",
                                                "landingPagePath",
                                                "hour", "minute"),
                                 max = 99999999)
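To make the pagination point concrete, here is a small illustrative helper (hypothetical, not part of googleAnalyticsR) that computes the start-index values a client would need in order to page through a result set in API-sized batches:

```r
# Start indices for paging through `total` rows in batches of `page_size`
# (the v3 Core Reporting API returns at most 10,000 rows per request)
page_starts <- function(total, page_size = 10000) {
  seq(1, total, by = page_size)
}
page_starts(25000)
```

Each start index would be sent as the start-index parameter of a separate request, and the per-request results concatenated.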
To download data from Google Analytics I use RStudio (R version 3.2.3 (2015-12-10)) with the following code:
library(RGoogleAnalytics)
library(RGA)
gaData4_5 <- ga$getData(profileId,
                        start.date = as.Date("2017-01-04"),
                        end.date = as.Date("2017-01-05"),
                        metrics = "ga:impressions,ga:adCost,ga:adClicks,ga:users",
                        dimensions = "ga:date,ga:adwordsCampaignID,ga:adwordsAdGroupID",
                        filter = "ga:source==google,ga:medium==cpc",
                        sort = "ga:date", batch = TRUE)
When summing up impressions, adCost and adClicks, the numbers in R match exactly the numbers shown on the Analytics website itself. However, the number of users is too high and I do not understand why.
I also checked newUsers and userType (not in the code above), but neither of these alternatives gave the correct result.
Can anyone please explain how three of the metrics can be perfectly correct while one is not? And what can be done to correct this?
Thanks a lot!
I am new to the Google Analytics API. I authenticated my application in R using this code:
library(RGoogleAnalytics)
client.id <- "**************.apps.googleusercontent.com"
client.secret <- "**********************"
token <- Auth(client.id, client.secret)
save(token,file="./token_file")
ValidateToken(token)
I am trying to figure out what to enter for the parameters below:
query.list <- Init(start.date = "2011-11-28",
                   end.date = "2014-12-04",
                   dimensions = "ga:date,ga:pagePath,ga:hour,ga:medium",
                   metrics = "ga:sessions,ga:pageviews",
                   max.results = 10000, sort = "-ga:date",
                   table.id = "ga:33093633")
Where can I find the dimensions, metrics, sort and table.id values?
My eventual goal is to pull the text from "https://plus.google.com/105253676673287651806/posts".
Please assist me with this.
Using Google Analytics and R may not suit what you want to do here, as the Google+ website won't be included in the data you collect.
You may want to look at rvest, a web-scraping package for R. With it you can pull the information you need from any public URL into an R data frame to analyse later.
Query Explorer:
https://ga-dev-tools.appspot.com/query-explorer/?csw=1
Dimensions and metrics:
https://developers.google.com/analytics/devguides/reporting/core/dimsmets
I am trying to extract Twitter data for a keyword using the following code:
cred <- OAuthFactory$new(consumerKey = 'XXXX', consumerSecret = 'XXXX',
                         requestURL = 'https://api.twitter.com/oauth/request_token',
                         accessURL = 'https://api.twitter.com/oauth/access_token',
                         authURL = 'https://api.twitter.com/oauth/authorize')
cred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
To enable the connection, please direct your web browser to:
https://api.twitter.com/oauth/authorize?oauth_token=Cwr7GgWIdjh9pZCmaJcLq6CG1zIqk4JsID8Q7v1s
When complete, record the PIN given to you and provide it here: 8387466
registerTwitterOAuth(cred)
search=searchTwitter('facebook',cainfo="cacert.pem",n=1000)
But even with n=1000, the function returns a list of only 99 tweets when it should return more than that. I also tried the same function with a specific date range:
search=searchTwitter('facebook',cainfo="cacert.pem",n=1000,since='2013-01-01',until='2014-04-01')
But this function returns an empty list.
Can anyone help me out with the correct set of additional arguments, so that I can extract data from a specific date range and without any restriction on the number of tweets? Does it have anything to do with the amount of data fetched by the API?
Thanks in advance
It looks like the Twitter API restricts the number of returned tweets; you should check this in the API documentation. Keeping the restriction in mind, you can use the since and sinceID arguments of searchTwitter() within a loop, something like:
for (i in 1:20) {
  if (i == 1) {
    search = searchTwitter('facebook', cainfo = "cacert.pem", n = 2,
                           since = '2014-04-15')
  } else {
    search = searchTwitter('facebook', cainfo = "cacert.pem", n = 2,
                           since = '2014-04-15', sinceID = search[[1]]$id)
  }
  print(search)
  Sys.sleep(10)
}
You may need to adjust the Sys.sleep(10) portion if you hit API restrictions.