How to include / exclude a filter statement in an R httr query for Localytics

I can successfully query data from Localytics using R, such as the following example:
r <- POST(url = "https://api.localytics.com/v1/query",
          body = list(app_id = <APP_ID>,
                      metrics = c("occurrences", "users"),
                      dimensions = c("a:URI"),
                      conditions = list(day = c("between", "2020-02-11", "2020-03-12"),
                                        event_name = "Content Viewed",
                                        "a:Item URI" = "testing")),
          encode = "json",
          authenticate(Key, Secret),
          accept("application/json"),
          content_type("application/json"))
stop_for_status(r)
But what I would like to do is wrap this in a function so I can run it quickly without copying and pasting the code each time.
The issue I am running into is the line "a:Item URI" = "testing", which filters every search to rows where the Item URI equals "testing". Sometimes I don't want that filter at all, and in those cases I just remove the line entirely.
When I wrote my function, I tried something like the following:
get_localytics <- function(appID, metrics, dimensions, from = Sys.Date() - 30,
                           to = Sys.Date(), eventName = "Content Viewed",
                           Key, Secret, filterDim = NULL, filterCriteria = NULL) {
  r <- httr::POST(url = "https://api.localytics.com/v1/query",
                  body = list(app_id = appID,
                              metrics = metrics,
                              dimensions = dimensions,
                              conditions = list(day = c("between", as.character(from), as.character(to)),
                                                event_name = eventName,
                                                filterDim = filterCriteria)),  # sends the literal key "filterDim", not the value of the variable
                  encode = "json",
                  authenticate(Key, Secret),
                  accept("application/json"),
                  content_type("application/json"))
  stop_for_status(r)
  result <- paste(rawToChar(r$content), collapse = "")
  document <- jsonlite::fromJSON(result)
  df <- document$results
  return(df)
}
But my attempt at adding filterDim and filterCriteria only produces the error Unprocessable Entity. (Keep in mind, there are lots of variables I can filter by, not just "a:Item URI", so I need to be able to change the filter dimension as well.)
How can I include a statement, where if I need to filter, I can incorporate that line, but if I don't need to filter, that line isn't included?

conditions is just a list, so you can conditionally add elements to it. Here we use an if statement to test whether the filter values were passed and, if so, add them in.
get_localytics <- function(appID, metrics, dimensions, from = Sys.Date() - 30,
                           to = Sys.Date(), eventName = "Content Viewed",
                           Key, Secret, filterDim = NULL, filterCriteria = NULL) {
  conditions <- list(day = c("between", as.character(from), as.character(to)),
                     event_name = eventName)
  # Only add the filter when both parts are supplied; [[filterDim]] uses the
  # *value* of filterDim (e.g. "a:Item URI") as the list name.
  if (!is.null(filterDim) && !is.null(filterCriteria)) {
    conditions[[filterDim]] <- filterCriteria
  }
  r <- httr::POST(url = "https://api.localytics.com/v1/query",
                  body = list(app_id = appID,
                              metrics = metrics,
                              dimensions = dimensions,
                              conditions = conditions),
                  encode = "json",
                  authenticate(Key, Secret),
                  accept("application/json"),
                  content_type("application/json"))
  stop_for_status(r)
  result <- paste(rawToChar(r$content), collapse = "")
  document <- jsonlite::fromJSON(result)
  df <- document$results
  return(df)
}
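For example, you can now call it with or without the optional filter (a sketch, reusing the placeholder app ID and credentials from the question):
# No filter: the "a:Item URI" condition is simply never added
df_all <- get_localytics(appID = "<APP_ID>", metrics = c("occurrences", "users"),
                         dimensions = c("a:URI"), Key = Key, Secret = Secret)
# With filter: equivalent to the hard-coded "a:Item URI" = "testing" line
df_filtered <- get_localytics(appID = "<APP_ID>", metrics = c("occurrences", "users"),
                              dimensions = c("a:URI"), Key = Key, Secret = Secret,
                              filterDim = "a:Item URI", filterCriteria = "testing")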

Related

Can't fetch all records in Qualtrics API using httr package

I am trying to fetch all of my mailing list contacts using the following custom function, but it doesn't download all of the records in each contact list. What am I doing wrong?
get_all_contacts <- function(mailingListID) {
  directoryId <- "POOL_XXXXXXXXXX"
  apiToken <- "XXXXXXXXXX"
  fetch_url <- VERB(verb = "GET",
                    url = paste("https://iad1.qualtrics.com/API/v3/directories/", directoryId,
                                "/mailinglists/", mailingListID, "/contacts", sep = ""),
                    add_headers(`X-API-TOKEN` = apiToken), encode = "json")
  fetch_url <- content(fetch_url, "parsed", encoding = "UTF-8")
  # Only the nextPage URL from this first response is kept here;
  # its result$elements are never appended below.
  fetch_url <- fetch_url$result$nextPage
  elements <- list()
  while (!is.null(fetch_url)) {
    res <- VERB(verb = "GET", url = fetch_url,
                add_headers(`X-API-TOKEN` = apiToken),
                encode = "json")
    res <- content(res, "parsed", encoding = "UTF-8")
    elements <- append(elements, res$result$elements)
    fetch_url <- res$result$nextPage
  }
  return(elements)
}
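One likely culprit, judging from the code alone: the first page's result$elements are discarded before the loop starts, so only records from the second page onward are collected. A minimal sketch of a fix, assuming the response shape the code above implies (result$elements and result$nextPage):
get_all_contacts <- function(mailingListID) {
  directoryId <- "POOL_XXXXXXXXXX"
  apiToken <- "XXXXXXXXXX"
  url <- paste0("https://iad1.qualtrics.com/API/v3/directories/", directoryId,
                "/mailinglists/", mailingListID, "/contacts")
  elements <- list()
  while (!is.null(url)) {
    res <- content(GET(url, add_headers(`X-API-TOKEN` = apiToken)),
                   "parsed", encoding = "UTF-8")
    # Append this page's records *before* following nextPage,
    # so the first page is not skipped.
    elements <- append(elements, res$result$elements)
    url <- res$result$nextPage
  }
  elements
}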

How to get all of the records in COVID-19 Data Lake linelistrecords

I'd like to use the https://api.c3.ai/covid/api/1/linelistrecord/fetch API but only get 2000 records back. I know that there are more than 2000 records -- how do I get them?
Here's my code in R:
library(tidyverse)
library(httr)
library(jsonlite)
resp <- POST(
  "https://api.c3.ai/covid/api/1/linelistrecord/fetch",
  body = list(
    spec = {}
  ) %>% toJSON(auto_unbox = TRUE),
  accept("application/json")
)
length(content(resp)$objs)
I get 2000 records.
The spec you are passing in has the following optional fields, among others:
limit // maximum number of objects to return
offset // offset to use for paged reads
The default value of limit is 2000.
The fetch result that is returned has, along with the array of objects, a boolean field called hasMore, which indicates whether there are more records in the underlying data store.
You can write a loop that ends once hasMore is false. Start with an offset of 0 and a limit of n (say, n = 2000), then iteratively increase offset by n.
library(tidyverse)
library(httr)
library(jsonlite)

limit <- 2000
offset <- 0
hasMore <- TRUE
all_objs <- c()

while (hasMore) {
  resp <- POST(
    "https://api.c3.ai/covid/api/1/linelistrecord/fetch",
    body = list(
      spec = list(
        limit = limit,
        offset = offset,
        filter = "contains(location, 'California')"  # just as an example, to cut down on the dataset
      )
    ) %>% toJSON(auto_unbox = TRUE),
    accept("application/json")
  )
  parsed <- content(resp)        # parse once per page
  hasMore <- parsed$hasMore
  offset <- offset + limit
  all_objs <- c(all_objs, parsed$objs)
}
length(all_objs)
You can do something similar in Python. Here is a snippet that implements the same loop:
import requests
import pandas as pd

headers = {'Accept': 'application/json'}

def read_data(url, payload, headers=headers):
    df_list = []
    has_more = True
    payload['spec']['offset'] = 0
    while has_more:
        response = requests.post(url, json=payload, headers=headers)
        df = pd.DataFrame.from_dict(response.json()['objs'])
        has_more = response.json()['hasMore']
        # advance the offset by the number of rows actually returned
        payload['spec']['offset'] += df.shape[0]
        df_list.append(df)
    return pd.concat(df_list)

url = 'https://api.c3.ai/covid/api/1/linelistrecord/fetch'
payload = {
    "spec": {
        "filter": "exists(hospitalAdmissionDate)",
        "include": "caseConfirmationDate, outcomeDate, hospitalAdmissionDate, age"
    }
}
df = read_data(url, payload)

Truncated updated string with R DBI package

I need to update a wide table on a SQL Server database from R, and the DBI package seems well suited for that.
The problem is that the R data.frame contains strings of more than 3000 characters, and when I use the DBI dbSendQuery function, all strings are truncated to 256 characters.
Here is a code example:
con <- odbc::dbConnect(drv = odbc::odbc(),
                       dsn = '***',
                       UID = '***',
                       PWD = '***')
df <- data.frame(TEST = paste(rep("A", 300), collapse = ""),  # a 300-character string
                 TEST_ID = 1068858)
df$TEST <- as.character(df$TEST)
query <- 'UPDATE MY_TABLE SET "TEST"=? WHERE TEST_ID=?'
update <- DBI::dbSendQuery(con, query)
DBI::dbBind(update, df)
DBI::dbClearResult(update)
odbc::dbDisconnect(con)
Then the following query returns 256 instead of 300:
SELECT LEN(TEST) FROM MY_TABLE WHERE TEST_ID = 1068858
NB: TEST is of type varchar(max), NULL, and already contains strings of more than 256 characters.
Thanks in advance for any advice
In the end, I chose to drop the sophisticated functions. My solution was to write the table to a .csv file and bulk insert it into the database. Here is an example using the RODBC package:
write.table(x = df,
            file = "/path/DBI_error_test.csv",
            sep = ";",
            row.names = FALSE, col.names = FALSE,
            na = "NULL",
            quote = FALSE)
# NB: the BULK INSERT path is read by the SQL Server machine, not the R client
query <- paste("CREATE TABLE #MY_TABLE_TMP (
                  TEST varchar(max),
                  TEST_ID int
                );
                BULK INSERT #MY_TABLE_TMP
                FROM 'C:\\DBI_error_test.csv'
                WITH
                (
                  FIELDTERMINATOR = ';',
                  ROWTERMINATOR = '\n',
                  BATCHSIZE = 500000,
                  CHECK_CONSTRAINTS
                )
                UPDATE R
                SET R.TEST = #MY_TABLE_TMP.TEST
                FROM MY_TABLE AS R
                INNER JOIN #MY_TABLE_TMP ON #MY_TABLE_TMP.TEST_ID = R.TEST_ID;
                DROP TABLE #MY_TABLE_TMP;
               ")
channel <- RODBC::odbcConnect(dsn = .DB_DSN_NAME,
                              uid = .DB_UID,
                              pwd = .DB_PWD)
RODBC::sqlQuery(channel = channel, query = query, believeNRows = FALSE)
RODBC::odbcClose(channel = channel)
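Before closing the channel, the length check from the question can be re-run to confirm nothing was truncated (a sketch, reusing the same hypothetical table and ID):
RODBC::sqlQuery(channel = channel,
                query = "SELECT LEN(TEST) FROM MY_TABLE WHERE TEST_ID = 1068858")
# should now return 300 rather than 256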

How to scrape additional data points from Zillow using R

I inherited a file from a previous coworker that uses R to pull Zillow "Zestimate" and "Rent Zestimate" data for properties and output them to a CSV file. However, I am very new to coding and have not been successful at pulling the additional information I know is available. I have searched the site for answers, but since I am still learning how to code, I haven't managed to make my own edits to the current code. Any help adding code to pull any of these additional data points would be much appreciated.
Property details (sqft, year built, beds, baths, property type)
Zestimate range (high and low)
Rent Zestimate range (high and low)
Last sold date and price
Price history (latest event, date, and price) (not sure this can be scraped)
Tax history (latest year and property taxes) (not sure this can be scraped)
Current code:
houseAddsSplit = read.csv(houseAddsFileLocation)
zillowAdds = paste(houseAddsSplit$STREET, houseAddsSplit$CITY, houseAddsSplit$STATE, houseAddsSplit$ZIP, sep = " ")
library(ZillowR)
library(XML)

set_zillow_web_service_id(zwsId)

zpidList = NULL
zestimate = NULL
rentZestimate = NULL

for (i in 1:length(zillowAdds)) {
  print(paste("Processing house: ", i, ", address: ", zillowAdds[i]))
  print(zillowAdds[i])
  houseZpidClean = "ERR"
  houseZestClean = "ERR"
  houseRentZestClean = "ERR"
  houseInfo = try(GetSearchResults(address = zillowAdds[i], citystatezip = as.character(houseAddsSplit$ZIP[i]), rentzestimate = TRUE))
  # while(houseInfo$message$code != "0"){
  #   houseInfo = try(GetSearchResults(address = cipAdds[i], citystatezip = as.character(cipLoans$ZIP[i]), rentzestimate = TRUE))
  #   Sys.sleep(runif(1, 3, 5))
  # }
  if (houseInfo$message$code == "0") {
    houseZpid = try(xmlElementsByTagName(houseInfo$response, "zpid", recursive = TRUE))
    houseZest = try(xmlElementsByTagName(houseInfo$response, "amount", recursive = TRUE))
    houseZpidAlmostClean = try(toString.XMLNode(houseZpid$results.result.zpid))
    houseZestAC = try(toString.XMLNode(houseZest$results.result.zestimate.amount))
    houseRentZestAC = try(toString.XMLNode(houseZest$results.result.rentzestimate.amount))
    # strip the surrounding XML tags from the extracted values
    houseZpidClean = try(substr(houseZpidAlmostClean, 7, nchar(houseZpidAlmostClean) - 7))
    houseZestClean = try(substr(houseZestAC, 24, nchar(houseZestAC) - 9))
    houseRentZestClean = try(substr(houseRentZestAC, 24, nchar(houseRentZestAC) - 9))
  }
  closeAllConnections()
  zpidList[i] = houseZpidClean
  print(paste("zpid: ", houseZpidClean))
  zestimate[i] = houseZestClean
  print(paste("zestimate: ", houseZestClean))
  rentZestimate[i] = houseRentZestClean
  print(paste("rent zestimate: ", houseRentZestClean))
  Sys.sleep(runif(1, 7, 10))  # pause between requests
}
outputData = cbind(houseAddsSplit, zestimate, rentZestimate)
write.csv(outputData, paste(writeToFolder, "/zillowPullOutput.csv", sep = ""))
print(paste("All done. File written to", paste(writeToFolder, "/zillowPullOutput.csv", sep = "")))
Hope you solved this, but the GetSearchResults API won't return all the results you are looking for. You may have to call the GetUpdatedPropertyDetails API to get the rest.
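A minimal sketch of what that might look like with ZillowR, assuming you already have a zpid from the loop above (the exact field names in the returned XML are not verified here):
zpid <- zpidList[1]  # hypothetical: first property from the loop above
details <- try(GetUpdatedPropertyDetails(zpid = zpid))
if (!inherits(details, "try-error") && details$message$code == "0") {
  # The response is an XML node, like houseInfo$response above; inspect it
  # to locate the fields you need (e.g. beds, baths, sqft, year built).
  print(details$response)
}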

Skip a value in a loop if URL doesn't exist

I am trying to write code to grab all NBA box scores for the month of October. I want the code to try every URL possible for the combination of dates (27-31) and the 30 teams. However, since not all of the teams play every day, some combinations won't exist, so I am trying to use the try function to skip the non-existent URLs, but I can't seem to figure it out. Here is what I have written so far:
install.packages("XML")
library(XML)

teams = c('ATL','BKN','BOS','CHA','CHI',
          'CLE','DAL','DEN','DET','GS',
          'HOU','IND','LAC','LAL','MEM',
          'MIA','MIL','MIN','NOP','NYK',
          'OKC','ORL','PHI','PHX','POR',
          'SAC','SA','TOR','UTA','WSH')
october = c()
for (i in teams){
  for (j in c(27:31)){
    url = paste("http://www.basketball-reference.com/boxscores/201510",
                j, "0", i, ".html", sep = "")
    data <- try(readHTMLTable(url, stringsAsFactors = FALSE))
    if(inherits(data, "error")) next
    away_1 = as.data.frame(data[1])
    colnames(away_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
                         "FT%","ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")
    away_1 = away_1[away_1$Players != "Reserves",]
    away_1 = away_1[away_1$MP != "Did Not Play",]
    away_1$team = rep(toupper(substr(names(as.data.frame(data[1]))[1],
                                     5, 7)), length(away_1$Players))
    away_1$loc = rep(i, length(away_1$Players))
    home_1 = as.data.frame(data[3])
    colnames(home_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
                         "FT%","ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")
    home_1 = home_1[home_1$Players != "Reserves",]
    home_1 = home_1[home_1$MP != "Did Not Play",]
    home_1$team = rep(toupper(substr(names(as.data.frame(data[2]))[1],
                                     5, 7)), length(home_1$Players))
    home_1$loc = rep(i, length(home_1$Players))
    game = rbind(away_1, home_1)
    october = rbind(october, game)
  }
}
Everything above and below the following lines appears to work:
data <- try(readHTMLTable(url, stringsAsFactors = FALSE))
if(inherits(data, "error")) next
I just need to properly format these two.
For anyone interested, I figured it out using url.exists in RCurl. Just implement the following after the url definition line:
if(url.exists(url) == TRUE){...}
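In context, that looks something like this (a sketch; the loop body is the same scraping code as in the question):
library(RCurl)
for (i in teams) {
  for (j in c(27:31)) {
    url = paste("http://www.basketball-reference.com/boxscores/201510",
                j, "0", i, ".html", sep = "")
    # Only attempt the scrape when the page actually exists
    if (url.exists(url)) {
      data <- readHTMLTable(url, stringsAsFactors = FALSE)
      # ... build away_1 / home_1 and rbind onto october as above ...
    }
  }
}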
How about using tryCatch for error handling?
result = tryCatch({
  expr
}, warning = function(w) {
  warning-handler-code
}, error = function(e) {
  error-handler-code
}, finally = {
  cleanup-code
})
where readHTMLTable will be used as the main part ('expr'). You can simply return a missing value if an error or warning occurs, and then omit missing values from the final result.
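Applied to the loop in the question, that might look like this (a sketch; a NULL return signals failure so the iteration can be skipped):
data <- tryCatch(
  readHTMLTable(url, stringsAsFactors = FALSE),
  error = function(e) NULL,    # URL doesn't exist or parse failed
  warning = function(w) NULL
)
if (is.null(data)) next  # skip this date/team combination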
