I am attempting to extract unsampled data for the past nine months. The website is pretty active, so I'm unable to get the data in its entirety (over 3 million rows) unsampled. I'm currently trying to break up the filtering so that I only return under 10k rows at a time (the API response limit). Is there a way I can loop over a number of days? I tried using the batch function with no success. I've included my code for reference; I was thinking of writing a loop that works in 10-day intervals. I appreciate any input.
Thanks!
library(RGA)
gaData <- get_ga(id,
                 start.date = start_date,
                 end.date = "today",
                 metrics = "ga:sessions",
                 dimensions = "ga:date,ga:medium,ga:country,ga:hour,ga:minute",
                 filters = "ga:country==United States;ga:medium==organic",
                 max.results = NULL,
                 batch = TRUE,
                 sort = "ga:date")
The get_ga function doesn't have a batch parameter (see ?get_ga). Try the fetch.by option instead. You can test different variants: "month", "week", "day".
library(RGA)
authorize()
gaData <- get_ga(id,
                 start.date = start_date,
                 end.date = "today",
                 metrics = "ga:sessions",
                 dimensions = "ga:date,ga:medium,ga:country,ga:hour,ga:minute",
                 filters = "ga:country==United States;ga:medium==organic",
                 sort = "ga:date",
                 fetch.by = "week")
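If fetch.by's fixed units aren't flexible enough, the manual 10-day loop floated in the question is also workable. A minimal sketch, reusing the id, start_date, and query settings from above; the chunk size and date bookkeeping are illustrative, not tested:

library(RGA)
authorize()

chunk_days <- 10  # illustrative chunk size
starts <- seq(as.Date(start_date), Sys.Date(), by = chunk_days)

pieces <- lapply(seq_along(starts), function(i) {
  s <- starts[i]
  e <- min(s + chunk_days - 1, Sys.Date())  # cap the last chunk at today
  get_ga(id,
         start.date = as.character(s),
         end.date = as.character(e),
         metrics = "ga:sessions",
         dimensions = "ga:date,ga:medium,ga:country,ga:hour,ga:minute",
         filters = "ga:country==United States;ga:medium==organic",
         sort = "ga:date")
})
gaData <- do.call(rbind, pieces)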
I am currently using the Radlibrary package. I used the following code:
query <- adlib_build_query(ad_active_status = "ALL",
                           ad_delivery_date_max = "2022-11-08",
                           ad_delivery_date_min = "2022-06-24",
                           ad_reached_countries = "US",
                           ad_type = "POLITICAL_AND_ISSUE_ADS",
                           search_terms = "democrat",
                           publisher_platforms = "FACEBOOK",
                           fields = c("id",
                                      "ad_creation_time",
                                      "ad_creative_bodies",
                                      "ad_creative_link_captions",
                                      "ad_creative_link_descriptions",
                                      "ad_creative_link_titles",
                                      "ad_delivery_start_time",
                                      "ad_delivery_stop_time",
                                      "ad_snapshot_url",
                                      "bylines",
                                      "currency",
                                      "delivery_by_region",
                                      "estimated_audience_size",
                                      "languages",
                                      "page_id",
                                      "page_name",
                                      "spend",
                                      "publisher_platforms",
                                      "demographic_distribution",
                                      "impressions"))
response <- adlib_get(query)
data <- as_tibble(response)
I've noticed that I only get 1,000 observations at a time from that time frame. Is there an efficient way to collect all the observations within it? I've thought about changing the "stop time" based on the last date in the dataset (see the sketch below), but that might take a long time if there are a lot of ads in the span of a few days. Any suggestions?
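One pattern worth trying is exactly that date-windowing idea: keep moving ad_delivery_date_max down to the earliest date seen in the previous batch until nothing comes back. A minimal sketch, reusing the query settings from above; the loop bookkeeping, the reduced field list, and the column used for stepping are illustrative, and depending on your Radlibrary version a built-in pagination helper (e.g. adlib_get_paginated()) may already do this for you:

library(Radlibrary)
library(dplyr)

fetch_window <- function(date_max) {
  query <- adlib_build_query(ad_active_status = "ALL",
                             ad_delivery_date_max = date_max,
                             ad_delivery_date_min = "2022-06-24",
                             ad_reached_countries = "US",
                             ad_type = "POLITICAL_AND_ISSUE_ADS",
                             search_terms = "democrat",
                             publisher_platforms = "FACEBOOK",
                             fields = c("id", "ad_delivery_start_time", "spend"))
  as_tibble(adlib_get(query))
}

all_ads <- list()
date_max <- "2022-11-08"
repeat {
  batch <- fetch_window(date_max)
  if (nrow(batch) == 0) break
  all_ads[[length(all_ads) + 1]] <- batch
  # step the upper bound down to the earliest delivery date in this batch
  earliest <- as.character(min(as.Date(batch$ad_delivery_start_time)))
  if (earliest >= date_max) break  # guard against stalling on a dense day
  date_max <- earliest
}
result <- distinct(bind_rows(all_ads))  # drop rows duplicated across windows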
I am at a complete loss at this point on how to resolve my query issue. We have been using the R package RGA for about a year now without any problems. I have a script that fetches data from 7 views, matches sessions based on specific pages on our website, and totals them against our product offerings.
This had been working without problem for months. Out of nowhere I started getting 503 and 500 internal server errors, and I'm not sure why.
I've tried changing the fetch.by setting to "month", "quarter", "year", "day", etc., but I think the initial query is just too big.
I've also tried changing the max.results option and fetching just one profile ID at a time. We have 7 to process.
library(RGA)
library(dplyr)

date1 <- as.character(cut(Sys.Date(), "month"))
date1_name <- format(as.Date(date1), format = "%Y%m")
date2 <- as.character(Sys.Date())
date2_name <- format(as.Date(date2), format = "%Y%m")

dimensions <- c("ga:yearMonth")
metrics <- c("ga:sessions")
filters2 <- "ga:sessions>0"

# fetch trip-level data for all users and for the micro-goal segment
# country_short_table
short_unq <- unique(country_short_table$destination)
brand_trip_unique <- unique(trip_country_brand$brand_trip)

all_sessions <- data.frame()
for (brand_trip in seq_along(brand_trip_unique)) {
  mkt <- gsub("_.*", "", brand_trip_unique[brand_trip])
  trip <- gsub(".*_", "", brand_trip_unique[brand_trip])
  id <- as.character(ids[ids$market == mkt, "id"])
  segment <- paste0("ga:pagePath=~(reisen|circuit)/.*/", trip)
  segment_def <- paste0("users::condition::", segment)
  table <- get_ga(profileId = id,
                  start.date = date1,
                  end.date = date2,
                  metrics = metrics,
                  dimensions = dimensions,
                  filters = filters2,
                  segment = segment_def,
                  samplingLevel = "HIGHER_PRECISION",
                  fetch.by = "quarter",
                  max.results = NULL)
  if (is.list(table)) {
    table$trip <- trip
    table$market <- mkt
    all_sessions <- bind_rows(all_sessions, table)
  } else {
    next
  }
}
GOAL: Can you recommend any way that I could avoid this issue, maybe by separating the date queries and cycling them by weeks or days of the month? I need monthly data aggregated every day, but I'm not sure how to edit this script I inherited.
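A minimal sketch of the chunk-and-retry idea, reusing the names from the script above (id, metrics, dimensions, filters2, segment_def); the monthly window, the three retries, and the exponential backoff are illustrative choices, not a tested fix:

library(RGA)
library(dplyr)

# Illustrative range: replace with the months you actually need.
month_starts <- seq(as.Date("2023-01-01"), as.Date("2023-06-01"), by = "month")
max_tries <- 3
chunks <- list()

for (m in seq_along(month_starts)) {
  start <- month_starts[m]
  end <- min(seq(start, by = "month", length.out = 2)[2] - 1, Sys.Date())
  res <- NULL
  for (try in seq_len(max_tries)) {
    res <- tryCatch(
      get_ga(profileId = id,
             start.date = as.character(start),
             end.date = as.character(end),
             metrics = metrics,
             dimensions = dimensions,
             filters = filters2,
             segment = segment_def,
             samplingLevel = "HIGHER_PRECISION"),
      error = function(e) e
    )
    if (!inherits(res, "error")) break
    Sys.sleep(2^try)  # back off before retrying a 500/503
  }
  if (!inherits(res, "error") && !is.null(res)) chunks[[length(chunks) + 1]] <- res
}
monthly_sessions <- bind_rows(chunks)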
If I remove custom dimensions 15 and 16, the code runs perfectly fine, but with them included it does not return anything.
I checked the query with query explorer and it returns the data perfectly fine.
query.list <- Init(start.date = "2017-02-01",
                   end.date = "2017-02-04",
                   dimensions = "ga:eventAction,ga:eventLabel,ga:pagePath,ga:dimension15,ga:dimension16",
                   # dimensions = paste(toString(paste("ga:dimension", dim, sep = "")), "ga:pagePath,ga:eventLabel,ga:eventAction", sep = ", "),
                   metrics = "ga:totalEvents",
                   max.results = 10000,
                   table.id = "ga:XXXXX")
ga.query <- QueryBuilder(query.list)
ga.data <- GetReportData(ga.query, token, split_daywise = TRUE)
It turned out the team had not created those custom dimensions during the date range I was passing, which is why nothing came back.
Hi, I'm using googleAnalyticsR to import my data from Google Analytics, but I'm having a problem because it only downloads 1,000 rows out of a total of 1,000,000.
Any advice on how to download all 1,000,000?
Here's my code:
df1 <- google_analytics_4(my_id,
                          date_range = c("2016-05-13", "2017-05-13"),
                          metrics = c("pageviews"),
                          dimensions = c("pagePath"))
By default it gets 1,000 rows; if you set max = -1 in your call, it gets everything:
df1 <- google_analytics_4(my_id,
                          date_range = c("2016-05-13", "2017-05-13"),
                          metrics = "pageviews",
                          dimensions = "pagePath",
                          max = -1)
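Separately, if sampling (rather than the row cap) becomes a problem at this volume, googleAnalyticsR also has an anti-sampling mode. A hedged sketch, assuming a package version where google_analytics_4 accepts anti_sample (it splits the query into smaller date batches so each response stays under the sampling threshold):

df1 <- google_analytics_4(my_id,
                          date_range = c("2016-05-13", "2017-05-13"),
                          metrics = "pageviews",
                          dimensions = "pagePath",
                          max = -1,
                          anti_sample = TRUE)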
I'm using skardhamar's rga ga$getData to query GA and get all data in an unsampled manner. The data is based on more than 500k sessions per day.
At https://github.com/skardhamar/rga, the paragraph 'extracting more observations than 10,000' mentions this is possible by using batch = TRUE. The paragraph 'Get the data unsampled' also mentions that by walking over the days, you can get unsampled data. I'm trying to combine the two, but I cannot get it to work. E.g.
ga$getData(xxx,
           start.date = "2015-03-30",
           end.date = "2015-03-31",
           metrics = "ga:totalEvents",
           dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
           sort = "",
           filters = "",
           segment = "",
           batch = TRUE, walk = TRUE)
.. indeed gets unsampled data, but not all of it: I get a dataframe with only 20k rows (10k per day). The pull is limited to chunks of 10k per day, contrary to what I expected from the batch = TRUE setting. So for 30-31 March, I get a dataframe of 20k rows after seeing this output:
Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
When I leave out the walk = TRUE setting, I do get all observations (771k rows, around 335k per day), but only in a sampled manner:
ga$getData(xxx,
           start.date = "2015-03-30",
           end.date = "2015-03-31",
           metrics = "ga:totalEvents",
           dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
           sort = "",
           filters = "",
           segment = "",
           batch = TRUE)
Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...
Is my data just too big to get all observations unsampled?
You could try querying by device with filters = "ga:deviceCategory==desktop" (and filters = "ga:deviceCategory!=desktop" respectively) and then merging the resulting dataframes.
I'm assuming that your users use different devices to access your site. The underlying logic is that when you filter data, Google Analytics servers filter it before you get it, so you can "divide" your query and get unsampled data. I think this is the same methodology as the "walk" option.
Desktop only
ga$getData(xxx,
           start.date = "2015-03-30",
           end.date = "2015-03-31",
           metrics = "ga:totalEvents",
           dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
           sort = "",
           filters = "ga:deviceCategory==desktop",
           segment = "",
           batch = TRUE, walk = TRUE)
Mobile and Tablet
ga$getData(xxx,
           start.date = "2015-03-30",
           end.date = "2015-03-31",
           metrics = "ga:totalEvents",
           dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
           sort = "",
           filters = "ga:deviceCategory!=desktop",
           segment = "",
           batch = TRUE, walk = TRUE)
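Assuming the two calls above are assigned to (hypothetically named) desktop_df and other_df, stacking them reconstructs the full unsampled set:

# desktop_df / other_df hold the results of the two calls above
full_df <- rbind(desktop_df, other_df)
nrow(full_df)  # should approach the ~771k total seen in the sampled pull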