I am using the RGoogleAnalytics library to get all the data from my Google Analytics Account into R. However, complex queries deliver 0 results.
My code looks like:
query.list <- Init(start.date = paste(c(lastmonth.startdate)),
end.date = paste(c(lastmonth.enddate)),
metrics = "ga:goalCompletionsAll",
dimensions = "ga:countryIsoCode,ga:yearMonth",
filters = "ga:goalCompletionsAll>0",
max.results = 10000,
table.id = sprintf("ga:%s", sites$profile.id[i]))
# Create the Query Builder object so that the query parameters are validated
ga.query <- QueryBuilder(query.list)
# Extract the data and store it in a data-frame
ga.countriesConversions1 <- GetReportData(ga.query, token)
Everything is inside a "for", and the script stops if one of the queries end in 0 results, because GetReportData(ga.query, token) cannot create a dataframe if there is no data.
I would like to know if there is a way use the warning message ("Your query matched 0 results. Please verify your query using the Query Feed Explorer and re-run it") fired by the library to the console, assign it to a variable and use this as an if condition. So I could create a dummy data.frame before the next function comes.
Assuming getReportData is throwing an error, then you can try:
ga.countriesConversions1 <- try(GetReportData(ga.query, token), silent=TRUE)
if(inherits(ga.countriesConversions1, "try-error")) {
warning(geterrmessage())
... error handling logic ...
}
Related
Each day, I get an email with the quantities of fruit sold on a particular day. The structure of the email is as below:
Date of report:,04-JAN-2022
Time report produced:,5-JAN-2022 02:04
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2
I'm trying to build some code in R that will search through my emails, find all emails with a particular subject, iterate through each email to find the variables I'm looking for, take the values and place them in a dataframe with the "Date of report" put in a date column.
With the assistance of people in the community, I was able to achieve the desired result in Python. However as my project has developed, I need to now achieve the same result in R if at all possible.
Unfortunately, I'm quite new to R and therefore if anyone has any advice on how to take this forward I would greatly appreciate it.
For those interested, my Python code is below:
#PREP THE STUFF
Fruit_1 = "Apples"
Fruit_2 = "Pears"
searchf = [
Fruit_1,
Fruit_2
]
#DEF THE STUFF
def get_report_vals(report, searches):
dct = {}
for line in report:
term, *value = line
if term.casefold().startswith('date'):
dct['date'] = pd.to_datetime(value[0])
elif term in searches:
dct[term] = float(value[0])
if len(dct.keys()) != len(searches):
dct.update({x: None for x in searches if x not in dct})
return dct
#DO THE STUFF
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)
results = []
for message in messages:
if message.subject == 'FRUIT QUANTITIES':
if Fruit_1 in message.body and Fruit_2 in message.body:
data = [line.strip().split(",") for line in message.body.split('\n')]
results.append(get_report_vals(data, searchf))
else:
pass
fruit_vals = pd.DataFrame(results)
fruit_vals.columns = map(str.upper, fruit_vals.columns)
I'm probably going about this the wrong way, but I'm trying to use the steps I took in Python to achieve the same result in R. So for example I create some variables to hold the fruit sales I'm searching for, then I create a vector to store the searchables, and then when I create an equivalent 'get_vals' function, I create an empty vector.
library(RDCOMClient)
Fruit_1 <- "Apples"
Fruit_2 <- "Pears"
##Create vector to store searchables
searchf <- c(Fruit_1, Fruit_2)
## create object for outlook
OutApp <- COMCreate("Outlook.Application")
outlookNameSpace = OutApp$GetNameSpace("MAPI")
search <- OutApp$AdvancedSearch("Inbox", "urn:schemas:httpmail:subject = 'FRUIT QUANTITIES'")
inbox <- outlookNameSpace$Folders(6)$Folders("Inbox")
vec <- c()
for (x in emails)
{
subject <- emails(i)$Subject(1)
if (grepl(search, subject)[1])
{
text <- emails(i)$Body()
print(text)
break
}
}
read.table could be a good start for get_report_vals.
Code below outputs result as a list, exception handling still needs to be implemented :
report <- "
Date of report:,04-JAN-2022
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2
"
get_report_vals <- function(report,searches) {
data <- read.table(text=report,sep=",")
colnames(data) <- c('key','value')
# find date
date <- data[grepl("date",data$key,ignore.case=T),"value"]
# transform dataframe to list
lst <- split(data$value,data$key)
# output result as list
c(list(date=date),lst[searches])
}
get_report_vals(report,c('Lemons','Oranges'))
$date
[1] "04-JAN-2022"
$Lemons
[1] "4"
$Oranges
[1] "2"
The results of various reports can then be concatenated in a data.frame using rbind:
rbind(get_report_vals(report,c('Lemons','Oranges')),get_report_vals(report,c('Lemons','Oranges')))
date Lemons Oranges
[1,] "04-JAN-2022" "4" "2"
[2,] "04-JAN-2022" "4" "2"
The code now functions as intended. Function was written quite a bit differently from those recommended:
get_vals <- function(email) {
body <- email$body()
date <- str_extract(body, "\\d{2}-[:alpha:]{3}-\\d{4}") %>%
as.character()
data <- read.table(text = body, sep = ",", skip = 9, strip.white = T) %>%
row_to_names(1) %>%
mutate("Date" = date)
return(data)
}
In addition I've written this to bind the rows together:
info <- sapply(results, get_vals, simplify = F) %>%
bind_rows()
May this is not what you are expecting to get as an answer, but I must state that here to help other readers to avoid such mistakes in future.
Unfortunately your Python code is not well-written. For example, I've noticed the following code where you iterate over all items in a folder and check the Subject and message bodies for keywords:
for message in messages:
if message.subject == 'FRUIT QUANTITIES':
if Fruit_1 in message.body and Fruit_2 in message.body:
You need to use the Find/FindNext or Restrict methods of the Items class instead. So, you don't need to iterate over all items in a folder. Instead, you get only items that correspond to your conditions. Read more about these methods in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
You may combine all your search criteria into a single query. So, you just need to iterate over found items and extract the data.
Also you may find the AdvancedSearch method helpful. The key benefits of using the AdvancedSearch method in Outlook are:
The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The Restrict and Find/FindNext methods can be applied to a particular Items collection (see the Items property of the Folder class in Outlook).
Full support for DASL queries (custom properties can be used for searching too). You can read more about this in the Filtering article in MSDN. To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
You can stop the search process at any moment using the Stop method of the Search class.
See Advanced search in Outlook programmatically: C#, VB.NET for more information.
Below is a snippet of my code that I use in R to extract IDs from a PostgreSQL database. When I run the function I get the following warning message from R:
In result_create(conn#ptr, statement) :
Closing open result set, cancelling previous query
How do I avoid this warning message from happening without making use of options(warn=-1) at the beginning of my code, suppressing the warning instead of
con <- dbConnect(RPostgres::Postgres(),
user = "postgres",
dbname = "DataBaseName",
password = "123456",
port = 5431)
get_id <- function(connection, table){
query <- toString(paste("SELECT id FROM ", table, sep = ""))
data_extract_query <- dbSendQuery(connection, query)
data_extract <- dbFetch(data_extract_query)
return(data_extract)
}
get_id(con, "users")
I found a method for solving the problem.
I found a thread on GitHub for RSQLite a https://github.com/r-dbi/RSQLite/issues/143. In this thread, they explicitly set n = -1 in the dbFetch() function.
This seemed to solve my problem, and the warning message did not show up again by editing the code like the following:
data_extract <- dbFetch(data_extract_query, n = -1)
The meaning of n is the number of rows that the query should return. By setting this to -1 all rows will be retrieved. By default, it is set to n = -1 but for some reason, in this build of R (3.6.3) the warning will still be shown.
Calling ?dbFetch in R you can see more information on this. I have included a snippet from the R-help page:
Usage
dbFetch(res, n = -1, ...)
fetch(res, n = -1, ...)
Arguments
res An object
inheriting from DBIResult, created by dbSendQuery().
n maximum number of records to retrieve per fetch. Use n = -1 or
n = Inf to retrieve all pending records. Some implementations may
recognize other special values.
... Other arguments passed on to methods.
This issue comes up with other database implementations if the results are not cleared before submitting a new one. From the docs of DBI::dbSendQuery
Usage
dbSendQuery(conn, statement, ...)
...
Value
dbSendQuery() returns an S4 object that inherits from DBIResult. The result set can be used with dbFetch() to extract records. Once you have finished using a result, make sure to clear it with dbClearResult(). An error is raised when issuing a query over a closed or invalid connection, or if the query is not a non-NA string. An error is also raised if the syntax of the query is invalid and all query parameters are given (by passing the params argument) or the immediate argument is set to TRUE.
To get rid of the warning the get_id() function must be modified as follows:
get_id <- function(connection, table){
query <- toString(paste("SELECT id FROM ", table, sep = ""))
data_extract_query <- dbSendQuery(connection, query)
data_extract <- dbFetch(data_extract_query)
# Here we clear whatever remains on the server
dbClearResult(data_extract_query)
return(data_extract)
}
See Examples section in help for more.
So I tried to use gl_translate function to 500,000 characters in Rstudio, which means I have to authenticate my google translate API. The problem is that I tried it like two months ago with my old google account and now I'm using the new one.
So when I tried to authenticate new client_id with my new google account, I got error message that my API hadn't been enabled yet, which I had enabled it. I restarted my Rstudio and now I got this error message:
020-01-22 19:01:24 -- Translating html: 147 characters -
2020-01-22 19:01:24> Request Status Code: 403
Error: API returned: Request had insufficient authentication scopes.
It is very frustrating because then I tried to enable the old google account and it requires me to put my credit card number, which is again I did and then now they asked me to wait several days.
Anyone can figure out what's problem with this?
here is my R code for authentication:
install.packages("googleAnalyticsR", dependencies = TRUE)
library(googleAnalyticsR)
install.packages("googleLanguageR")
library(googleLanguageR)
install.packages("dplyr")
library(dplyr)
library(tidyverse)
install.packages("googleAuthR")
library(googleAuthR)
client_id <- "107033903887214478396"
private_key <- "-----BEGIN PRIVATE KEY-----\nMIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQChPmvib1v9/CFA\nX7fG8b8iXyS370ivkufMmX30C6/rUNOttA+zMhamS/EO0uaYtPw44B4QdNzRsSGq\nm+0fQ5Sp1SHJaVPkoImuUXZdMlLO73pvY48nMmEFg7deoOZI+CNWZYgIvPY8whMo\nk4vKE2iyuG+pl9MT7dT6dwWNmXDWr8kgxAfryfVEUIeqaf+57Z9g3FfVPLARz4iH\nCaTu55hhbmo/XknUx0hPsrwBMPaNLGl2+o5MU1ZuIkl47EJvdL8CdUuEmb9qJDtv\nUyKqANlwFa7cVW8ij2tjFpjJ7bigRVJsI8odbsEbwmx1b5SLckDOQ7t4l8zERtmo\nUv5fxyNNAgMBAAECggEAApeLyWyL2IXcjPnc7OxG68kGwJQuoW/lnQLcpPcpIUm/\n1Vt/IxzLg2nWGqxmO48xPMLRiOcwA4jq5yCxi56c/avo6qFwUU0JWY2CrxXXge8U\nk0TQ8MrdB2cqI/HHMeYXP1TLfoR3GtvtzemtRhbQyIqxdNL1eC0LDump47BTQYg0\nVPuCxU3zXVIj+Qt0FZa+Pa/nAOGHf5b4ye56L7vxL2BCeRncuHdDcE6Ilkpz79Gv\nkXP1K5j22uEVCmobe1qRlq3BLx2Qimj4h8MI8CKiNS40tGR/oewJ5uMgmeCePYKv\nqSFOwCDvRkw9V2KdGu40WQFEq21mczlv9gMWhp2/EQKBgQDRmBZZM7ugIpo64wx6\nDFYhZo05LmMiwymIfWs2CibzKDeXPuy3OSytvTPVFCkG+RlcYthxAIAn1Z/qJ4UI\n+8c8Zwfg+toYtEa2gTYM2185vmnqQwqmAsaK+4xKZzgfqxie/CBuPzUOZO41q6P8\ni7A2KqXHcDb4SMqnkdGGLk/7+QKBgQDE8dBesgx4DsHFYg1sJyIqKO4d2pnLPkDS\nAzx5xvQuUcVCNTbugOC7e0vGxWmQ/Eqal5b3nitH590m8WHnU9UiE4HciVLe+JDe\nDe5CWzBslnncBjpgiDudeeEubhO7MHv/qZyZXMh73H2NBdO8j0uiGTNbBFoOSTYq\nsFACiCZu9QKBgE2KjMoXn5SQ+KpMkbMdmUfmHt1G0hpsRZNfgyiM/Pf8qwRjnUPz\n/RmR4/ky6jLQOZe6YgT8gG08VVtVn5xBOeaY34tWgxWcrIScrRh4mHROg/TNNMVS\nRY3pnm9wXI0qyYMYGA9xhvl6Ub69b3/hViHUCV0NoOieVYtFIVUZETJRAoGAW/Y2\nQCGPpPfvD0Xr0parY1hdZ99NdRQKnIYaVRrLpl1UaMgEcHYJekHmblh8JNFJ3Mnw\nGovm1dq075xDBQumOBU3zEzrP2Z97tI+cQm3oNza5hyaYbz7aVsiBNYtrHjFTepb\nT1l93ChnD9SqvB+FR5nQ2y07B/SzsFdH5QbCO4kCgYBEdRFzRLvjdnUcxoXRcUpf\nfVMZ6fnRYeV1+apRSiaEDHCO5dyQP8vnW4ewISnAKdjKv/AtaMdzJ5L3asGRWDKU\n1kP/KDBlJkOsOvTkmJ4TxbIhgcSI62/wqDBi5Xqw1ljR2mh8njzRwqDRKs12EtQ0\n9VaUDm7LCNTAskn2SR/o4Q==\n-----END PRIVATE KEY-----\n"
options(googleAuthR.client_id = client_id)
options(googleAuthR.client_secret = private_key)
devtools::reload(pkg = devtools::inst("googleAnalyticsR"))
ga_auth()
in case you need to see what's my translate code like:
translate <- function(tibble) {
tibble <- tibble
count <- data.frame(nchar = 0, cumsum = 0) # create count file to stay within API limits
for (i in 1:nrow(tibble)) {
des <- pull(tibble[i,2]) # extract description as single character string
if (count$cumsum[nrow(count)] >= 80000) { # API limit check
print("nearing 100000 character per 100 seconds limit, pausing for 100 seconds")
Sys.sleep(100)
count <- count[1,] # reset count file
}
if (grepl("^\\s*$", des) == TRUE) { # if description is only whitespace then skip
trns <- tibble(translatedText = "", detectedSourceLanguage = "", text = "")
} else { # else request translation from API
trns <- gl_translate(des, target='en', format='html') # request in html format to anticipate html descriptions
}
tibble[i,3:4] <- trns[,1:2] # add to tibble
nchar = nchar(pull(tibble[i,2])) # count number of characters
req <- data.frame(nchar = nchar, cumsum = nchar + sum(count$nchar))
count <- rbind(count, req) # add to count file
if (nchar > 20000) { # addtional API request limit safeguard for large descriptions
print("large description (>20,000), pausing to manage API limit")
Sys.sleep(100)
count <- count[1,] # reset count file
}
}
return(tibble)
}
I figured it out after 24 hours.
Apparently it is really easy. I just followed the step from this link.
But yesterday I make mistake because the json file I downloaded is the json file from service client Id, while I actually need the json file from service account.
Then I install the googleLanguageR package with this code:
remotes::install_github("ropensci/googleLanguageR")
library(googleLanguageR)
and then just set the file location of my download Google Project JSON file in a GL_AUTH argument like this:
gl_auth("G:/My Drive/0. Thesis/R-Script/ZakiServiceAccou***************kjadjib****.json")
and now I'm happy :)
I've set up a connection between R and SQL using the package RODBC and managed to connect and query the database from R.
I've created a small R function which objective is to delete some lines (as parameter) in a specific table.
Here it is (nameDB is the name of my database, and values_conversion another function I did to convert some data from R format to SQL format) :
delete_SQL = function(data, table){
ch = odbcConnect(nameDB,"postgres")
names = sqlColumns(channel=ch,table,schema="public",catalog = nameDB)$COLUMN_NAME
for(i in 1:nrow(data)){
sqlQuery(channel=ch,query=paste0("DELETE FROM public.\"",table,"\" WHERE ",
paste0(names," = ",values_conversion(data[i,]),collapse = " and "),";"),errors = TRUE)
}
odbcCloseAll()
}
Query exemple : "DELETE FROM public.\"lieu_protection\" WHERE lieu_id = 3 and protection_id = 1430;"
The code inside this function works fine when I execute everything directly, but when I call the function it throws an
Error in sqlQuery(channel = ch, query = paste0("DELETE FROM public.\"", :
first argument is not an open RODBC channel
I have a similar function that's getting and returning data from SQL and which is working fine, so I guess it has something to do with the delete, but the error is about the channel, so I'm quite confused.
Thanks to anyone that can help !
I am building a function to download Google analytics data from a long list of profiles and need a loop function that can tolerate a profile returning no data.
The problem is that there are several functions needed between the start of the loop and where the error can occur.
The Paste function is pulling an ID from idsvector and then the API query is constructed in 2 successive steps. This is then sent to the API using GetReportData(). The second ID in the list returns no data from the API. Currently it downloads the data from the first profile, merges it with the master dataset and then stops.
for (v in idsvector){
view.id <- paste("ga:",v,sep="") #the View ID parameter need to have "ga:" in front of the ID
sourcequery.list <- Init(
start.date = start.date,
end.date = end.date,
dimensions = "ga:channelGrouping,ga:campaign,ga:source,ga:medium,ga:date",
metrics = "ga:sessions,ga:bounces",
table.id = view.id,
max.results = 9999999
)
}
ga.sourcequery <- QueryBuilder(sourcequery.list)
data <- GetReportData(ga.sourcequery, token)
error=function(e){dev.off(); return(NULL)}
if (!is.null(data)) {
data$Property <- view.id
final.data<-rbind(sourcequery.data,data)
}
else {
next
}
}
How do I adapt this so that it loops back and tries the next ID?
Not sure if this will solve your problem, but a better approach for this is to use lapply.
It is not clear which library you are using to access GA, so I will make up some code for you:
library(data.table)
ga_load_property_data <- function(property) {
# here goes GA API wrapper magic
}
data <- lapply(properties, ga_load_property_data)
data <- rbindlist(data, idcol = "property")
This way you separate the load logic from your iterations.