How to scrape additional data points from Zillow using R

I inherited a file from a previous coworker to use R to pull Zillow "Zestimate" and "Rent Zestimate" data for properties, and then output these data points to a CSV file. However, I am very new to coding and have not been successful with pulling additional information that I know is available. I have searched the site for answers, but since I am still trying to learn how to code I haven't been successful with making my own edits to the current code. Any help I can get adding code to pull any of these additional data points would be much appreciated.
Property details (sqft, year built, beds, baths, property type)
Zestimate range (high and low)
Rent Zestimate range (high and low)
Last sold date and price
Price history (latest event, date, and price) (not sure this can be scraped)
Tax history (latest year and property taxes) (not sure this can be scraped)
Current code:
houseAddsSplit = read.csv(houseAddsFileLocation)
zillowAdds = paste(houseAddsSplit$STREET, houseAddsSplit$CITY, houseAddsSplit$STATE, houseAddsSplit$ZIP, sep = " ")
library(ZillowR)
library(XML)
set_zillow_web_service_id(zwsId)
zpidList = NULL
zestimate = NULL
rentZestimate = NULL
for(i in 1:length(zillowAdds)){
print(paste("Processing house: ", i, ", address: ", zillowAdds[i]))
print(zillowAdds[i])
houseZpidClean = "ERR"
houseZestClean = "ERR"
houseRentZestClean = "ERR"
houseInfo = try(GetSearchResults(address = zillowAdds[i], citystatezip = as.character(houseAddsSplit$ZIP[i]), rentzestimate = TRUE))
# while(houseInfo$message$code != "0"){
# houseInfo = try(GetSearchResults(address = cipAdds[i], citystatezip = as.character(cipLoans$ZIP[i]), rentzestimate = TRUE))
# Sys.sleep(runif(1, 3, 5))
# }
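# Skip the parsing step if try() caught an error; a "try-error" object has no $message
if(!inherits(houseInfo, "try-error") && houseInfo$message$code == "0"){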
houseZpid = try(xmlElementsByTagName(houseInfo$response, "zpid", recursive = TRUE))
houseZest = try(xmlElementsByTagName(houseInfo$response, "amount", recursive = TRUE))
houseZpidAlmostClean = try(toString.XMLNode(houseZpid$results.result.zpid))
houseZestAC = try(toString.XMLNode(houseZest$results.result.zestimate.amount))
houseRentZestAC = try(toString.XMLNode(houseZest$results.result.rentzestimate.amount))
houseZpidClean = try(substr(houseZpidAlmostClean, 7, nchar(houseZpidAlmostClean) - 7))
houseZestClean = try(substr(houseZestAC, 24, nchar(houseZestAC) - 9))
houseRentZestClean = try(substr(houseRentZestAC, 24, nchar(houseRentZestAC) - 9))
}
closeAllConnections()
zpidList[i] = houseZpidClean
print(paste("zpid: ", houseZpidClean))
zestimate[i] = houseZestClean
print(paste("zestimate: ", houseZestClean))
rentZestimate[i] = houseRentZestClean
print(paste("rent zestimate: ", houseRentZestClean))
Sys.sleep(runif(1, 7, 10))
}
outputData = cbind(houseAddsSplit, zestimate, rentZestimate)
write.csv(outputData, paste(writeToFolder, "/zillowPullOutput.csv", sep = ""))
print(paste("All done. File written to", paste(writeToFolder, "/zillowPullOutput.csv", sep = "")))

Hope you solved this, but the GetSearchResults API doesn't return all of the data points you are looking for. You may have to call the GetUpdatedPropertyDetails API to get them.
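As a rough sketch of how the extra fields might be read once a response is in hand: the helper below reuses xmlElementsByTagName from the question's code and reads each value with xmlValue instead of trimming strings with substr. The helper name first_tag_value, the sample XML, and the tag names (bedrooms, bathrooms, finishedSqFt, yearBuilt, lotSizeSqFt) are assumptions for illustration only; check them against the response you actually get back.

```r
library(XML)

# Hypothetical helper: text of the first <tag> element anywhere under `node`,
# or NA if the tag is missing from the response.
first_tag_value <- function(node, tag) {
  hits <- xmlElementsByTagName(node, tag, recursive = TRUE)
  if (length(hits) == 0) {
    return(NA_character_)
  }
  xmlValue(hits[[1]])
}

# Made-up stand-in for the $response part of an API result, so the sketch runs
# offline. In the real loop this would presumably come from something like
# GetUpdatedPropertyDetails(zpid = houseZpidClean)$response (an assumption).
sample_xml <- paste0(
  "<response><editedFacts>",
  "<bedrooms>3</bedrooms><bathrooms>2.0</bathrooms>",
  "<finishedSqFt>1850</finishedSqFt><yearBuilt>1987</yearBuilt>",
  "</editedFacts></response>"
)
response <- xmlRoot(xmlTreeParse(sample_xml, asText = TRUE))

# Assumed tag names; lotSizeSqFt is deliberately absent from the sample.
tags <- c("bedrooms", "bathrooms", "finishedSqFt", "yearBuilt", "lotSizeSqFt")
details <- vapply(tags, function(tag) first_tag_value(response, tag), character(1))
print(details)
```

Each value in details could then be stored in its own vector inside the loop, the same way zestimate[i] and rentZestimate[i] are filled, and bound onto outputData before writing the CSV.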

Related

azure kql parse function - unable to parse ? using regex (zero or one time)

I'm trying to parse this line:
01/11/1011 11:11:11: LOG SERVER = 1 URL = /one/one.aspx/ AccountId = 1111 MainId = 1111 UserAgent = Browser = Chrome , Version = 11.0, IsMobile = False, IP = 1.1.1.1 MESSAGE = sample message TRACE = 1
using this parse statement:
parse-where kind=regex flags=i message with
timestamp:datetime
":.*LOG SERVER = " log_server:string
".*URL = " url:string
".*AccountId = " account_id:string
".*MainId = " main_id:string
".*?UserAgent = " user_agent:string
",.*Version = " version:string
",.*IsMobile = " is_mobile:string
",.*IP = " ip:string
".*MESSAGE = " event:string
".*TRACE = " trace:string
Sometimes I get records that have one "key = value" pair missing, but the order of the remaining columns stays the same.
To match all kinds of rows, I wanted to make each segment optional by wrapping it in (<name_of_column>)?, for example:
"(,.*Version = )?" version:string
but it fails every time.
I think parse/parse-where operators are more useful when you have well formatted inputs - the potentially missing values in this case would make it tricky/impossible to use these operators.
If you control the formatting of the input strings, consider normalizing it to always include all fields and/or add delimiters and quotes where appropriate.
Otherwise, you could use the extract function to parse it - the following expression would work even if some lines are missing some fields:
| extend
timestamp = extract("(.*): .*", 1, message, typeof(datetime)),
log_server = extract(".*LOG SERVER = ([^\\s]*).*", 1, message),
url = extract(".*URL = ([^\\s]*).*", 1, message),
account_id = extract(".*AccountId = ([^\\s]*).*", 1, message),
main_id = extract(".*MainId = ([^\\s]*).*", 1, message),
user_agent = extract(".*UserAgent = ([^,]*).*", 1, message),
version = extract(".*Version = ([^,]*).*", 1, message),
is_mobile = extract(".*IsMobile = ([^,]*).*", 1, message),
ip = extract(".*IP = ([^\\s]*).*", 1, message),
event = iff(message has "TRACE", extract(".*MESSAGE = (.*) TRACE.*", 1, message), extract(".*MESSAGE = (.*)", 1, message)),
trace = extract(".*TRACE = (.*)", 1, message)
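The reason this tolerates missing fields is that each column is matched by its own independent pattern, so one absent key only blanks out that one column instead of failing the whole row. A minimal sketch of the same idea, written in R rather than KQL purely for illustration (the helper name extract_field and the shortened sample lines are assumptions):

```r
# Hypothetical helper: first capture group of `pattern` in `text`,
# or NA when the pattern does not match at all.
extract_field <- function(pattern, text) {
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# One line with every field, and one line where "Version = ..." is missing.
full <- paste("LOG SERVER = 1 URL = /one/one.aspx/ AccountId = 1111 MainId = 1111",
              "UserAgent = Browser = Chrome , Version = 11.0, IsMobile = False,",
              "IP = 1.1.1.1 MESSAGE = sample message TRACE = 1")
partial <- paste("LOG SERVER = 1 URL = /one/one.aspx/ AccountId = 1111 MainId = 1111",
                 "UserAgent = Browser = Chrome , IsMobile = False,",
                 "IP = 1.1.1.1 MESSAGE = sample message TRACE = 1")

version_full    <- extract_field("Version = ([^,]*)", full)     # "11.0"
version_partial <- extract_field("Version = ([^,]*)", partial)  # NA
ip_partial      <- extract_field("IP = ([^ ]*)", partial)       # "1.1.1.1"
```

The missing Version field yields NA for that one value, while IP is still extracted from the same line.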

How to include / exclude filter statement in R httr query for Localytics

I can successfully query data from Localytics using R, such as the following example:
r <- POST(url = "https://api.localytics.com/v1/query",
body=list(app_id=<APP_ID>,
metrics=c("occurrences","users"),
dimensions=c('a:URI'),
conditions=list(day = c("between", "2020-02-11", "2020-03-12"),
event_name = "Content Viewed",
"a:Item URI" = "testing")
),
encode="json",
authenticate(Key,Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
But what I would like to do is create a function so I can do this quickly and not have to copy/paste data.
The issue I am running into is with the line "a:Item URI" = "testing", where I am filtering all searches by the Item URI where they all equal "testing", but sometimes, I don't want to include the filter statement, so I just remove that line entirely.
When I wrote my function, I tried something like the following:
get_localytics <- function(appID, metrics, dimensions, from = Sys.Date()-30,
to = Sys.Date(), eventName = "Content Viewed",
Key, Secret, filterDim = NULL, filterCriteria = NULL){
r <- httr::POST(url = "https://api.localytics.com/v1/query",
body = list(app_id = appID,
metrics = metrics,
dimensions = dimensions,
conditions = list(day = c("between", as.character(from), as.character(to)),
event_name = eventName,
filterDim = filterCriteria)
),
encode="json",
authenticate(Key, Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
result <- paste(rawToChar(r$content),collapse = "")
document <- fromJSON(result)
df <- document$results
return(df)
}
But my attempt at adding filterDim and filterCriteria only produces the error Unprocessable Entity. (Keep in mind, there are lots of variables I can filter by, not just "a:Item URI", so I need to be able to change that as well.)
How can I include a statement, where if I need to filter, I can incorporate that line, but if I don't need to filter, that line isn't included?
conditions is just a list, so you can conditionally add elements to it. Here we use an if statement to test whether both values were passed and, if so, add them in.
get_localytics <- function(appID, metrics, dimensions, from = Sys.Date()-30,
to = Sys.Date(), eventName = "Content Viewed",
Key, Secret, filterDim = NULL, filterCriteria = NULL){
conditions <- list(day = c("between", as.character(from), as.character(to)),
event_name = eventName)
if (!is.null(filterDim) && !is.null(filterCriteria)) {
conditions[[filterDim]] <- filterCriteria
}
r <- httr::POST(url = "https://api.localytics.com/v1/query",
body = list(app_id = appID,
metrics = metrics,
dimensions = dimensions,
conditions = conditions),
encode="json",
authenticate(Key, Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
result <- paste(rawToChar(r$content),collapse = "")
document <- fromJSON(result)
df <- document$results
return(df)
}
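To see why this works where the original attempt did not: inside list(), writing filterDim = filterCriteria creates an element literally named "filterDim", whereas conditions[[filterDim]] uses the value of the variable as the name. The sketch below isolates just that list-building step so it can be checked without calling the API; the helper name build_conditions is an assumption for illustration.

```r
# Hypothetical helper mirroring the conditions logic, with no HTTP call.
build_conditions <- function(from, to, eventName,
                             filterDim = NULL, filterCriteria = NULL) {
  conditions <- list(day = c("between", as.character(from), as.character(to)),
                     event_name = eventName)
  # Add the filter only when both the dimension name and its value are given.
  if (!is.null(filterDim) && !is.null(filterCriteria)) {
    conditions[[filterDim]] <- filterCriteria
  }
  conditions
}

no_filter <- build_conditions("2020-02-11", "2020-03-12", "Content Viewed")
with_filter <- build_conditions("2020-02-11", "2020-03-12", "Content Viewed",
                                filterDim = "a:Item URI",
                                filterCriteria = "testing")

names(no_filter)    # "day" "event_name"
names(with_filter)  # "day" "event_name" "a:Item URI"
```

Any other dimension can be passed the same way, since the name is taken from filterDim at run time.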

Is it possible to create Isolines using public transport or cycling using the Here API?

I am trying to create something based on this Isolines Shiny app.
This uses the Here API. Supposedly it is possible to use public transport and cycling in the Mode, but when I do this, I get an error:
Isoline API failed with the following error: InvalidInputData
From looking at the documentation, public transport and cycling work with routing, but possibly not with isolines.
This is the relevant Shiny code (heavily influenced by the app mentioned above):
is_valid = validate_inputs(session, input$origin, input$departure, input$min, input$max, input$step)
if(is_valid) {
progress$status$set(message = 'Requesting...')
departure = paste0(input$departure, ' ',input$time, ':00:00')
range_type = switch(input$range_type, 'Time (minutes)' = 'time', 'Distance (metres)' = 'distance')
unit = switch(input$range_type, 'Time (minutes)' = ' minutes', 'Distance (metres)' = ' metres')
mode = switch(input$mode, 'Pedestrian' = 'pedestrian', 'Public Transport' = 'publicTransport', 'Bike' = 'bicycle', 'Car' = 'car')
isoline_sequence = if(input$range_type == 'Time (minutes)') {
(seq(input$min, input$max, input$step) * 60) %>% sort()
} else {
round(seq(input$min, input$max, input$step), digits = 0) %>% sort()
}
layers = sapply(1:length(isoline_sequence), function(x) {
progress$status$inc(amount = 1/length(isoline_sequence),
message = paste0('Processing request ', x, ' of ', length(isoline_sequence)))
isoline(str_remove(input$origin, ' '), departure = departure, range_type = range_type,
range = isoline_sequence[x], mode = mode, app_id = keys$app_id, app_code = keys$app_code)
})
Any help would be great!
As of now, the Calculate Isoline API supports only the car, truck and pedestrian transport modes.
See documentation:
https://developer.here.com/documentation/routing/topics/resource-calculate-isoline.html:
Types supported in isoline request: fastest, shortest.
TransportModes supported in isoline request: car, truck (only with type fastest), pedestrian.
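Given that list, one option in the Shiny app is to reject unsupported modes before the request is sent, so the user sees a readable message rather than InvalidInputData. A minimal sketch, assuming the mode strings used in the question; the function name check_isoline_mode is hypothetical:

```r
# Modes the quoted documentation lists as supported for isoline requests.
isoline_modes <- c("car", "truck", "pedestrian")

# Hypothetical guard: fail early with a clear message for unsupported modes.
check_isoline_mode <- function(mode) {
  if (!mode %in% isoline_modes) {
    stop("Mode '", mode, "' is not supported for isolines; use one of: ",
         paste(isoline_modes, collapse = ", "))
  }
  invisible(mode)
}

check_isoline_mode("pedestrian")   # passes silently

# An unsupported mode raises an error that can be shown to the user.
msg <- tryCatch(check_isoline_mode("bicycle"),
                error = function(e) conditionMessage(e))
print(msg)
```

Calling the guard right after mode is computed from input$mode would stop the loop over isoline_sequence from issuing requests that cannot succeed.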

Trouble with a function in R, "BinHist"

I'm trying to use a bit of code that I found in an academic journal. I'm new-ish to R. When I reach the line that calls the function binhist, I keep getting the error "could not find function "binhist"". I can't figure out whether it lives in a library/package I need to install or whether there's another problem with the code. Any help would be much appreciated. Here's the code I extracted from the article:
whichData = yourData
baseH = data.frame()
RunningSum = 0
for (i in 2:16) {
tempBin = NULL
tempBin = binhist(i, whichData$rt)
theMean = sum(tempBin)/(i)
Divisor = sum(tempBin)
new = data.frame()
for (j in 1:ncol(tempBin)) {
grabVal = (tempBin[j] - theMean)^2
names(grabVal) <- NULL
new = c(new,grabVal)
}
extra = i - ncol(tempBin)
NewSum = Reduce("+",new) + extra*((0 - theMean)^2)
StdDev = sqrt(NewSum /(i-1))
RowVal = StdDev /Divisor
RunningSum = RunningSum + RowVal
baseH = c(baseH, list(tempBin))
}
paste("Number of Trials:",Divisor)
paste("Modulo-Binning Score (MBS): ",RunningSum)
library(plyr)
baseNow = do.call(rbind.fill,baseH)

Skip a value in a loop if URL doesn't exist

I am trying to write code to grab all NBA box scores for the month of October. I want the code to try every possible URL for the combination of dates (27-31) and the 30 teams. However, since not every team plays every day, some combinations won't exist, so I am trying to use the try function to skip the non-existent URLs, but I can't seem to figure it out. Here is what I have written so far:
install.packages("XML")
library(XML)
teams = c('ATL','BKN','BOS','CHA','CHI',
'CLE','DAL','DEN','DET','GS',
'HOU','IND','LAC','LAL','MEM',
'MIA','MIL','MIN','NOP','NYK',
'OKC','ORL','PHI','PHX','POR',
'SAC','SA','TOR','UTA','WSH')
october = c()
for (i in teams){
for (j in (c(27:31))){
url = paste("http://www.basketball-reference.com/boxscores/201510",
j,"0",i,".html",sep = "")
data <- try(readHTMLTable(url, stringsAsFactors = FALSE))
if(inherits(data, "error")) next
away_1 = as.data.frame(data[1])
colnames(away_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
"FT%", "ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")
away_1 = away_1[away_1$Players != "Reserves",]
away_1 = away_1[away_1$MP != "Did Not Play",]
away_1$team = rep(toupper(substr(names(as.data.frame(data[1]))[1],
5, 7)),length(away_1$Players))
away_1$loc = rep(i,length(away_1$Players))
home_1 = as.data.frame(data[3])
colnames(home_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
"FT%", "ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")
home_1 = home_1[home_1$Players != "Reserves",]
home_1 = home_1[home_1$MP != "Did Not Play",]
home_1$team = rep(toupper(substr(names(as.data.frame(data[2]))[1],
5, 7)),length(home_1$Players))
home_1$loc = rep(i,length(home_1$Players))
game = rbind(away_1,home_1)
october = rbind(october, game)
}
}
Everything above and below the following lines appears to work:
data <- try(readHTMLTable(url, stringsAsFactors = FALSE))
if(inherits(data, "error")) next
I just need to properly format these two.
For anyone interested, I figured it out using url.exists in RCurl. Just implement the following after the url definition line:
if(url.exists(url) == TRUE){...}
How about using tryCatch for error handling?
result = tryCatch({
expr
}, warning = function(w) {
warning-handler-code
}, error = function(e) {
error-handler-code
}, finally = {
cleanup-code
})
where the readHTMLTable call is used as the main part ('expr'). You can simply return a missing value if an error or warning occurs, and then omit the missing values from the final result.
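A minimal sketch of that pattern applied to the loop, under stated assumptions: safe_read and fake_reader are hypothetical names, and fake_reader stands in for readHTMLTable so the example runs offline. Note also that an object returned by a failed try() has class "try-error", so the original check would need inherits(data, "try-error") rather than inherits(data, "error").

```r
# Hypothetical wrapper: run `reader(url)` and return NULL instead of failing.
safe_read <- function(url, reader) {
  tryCatch(
    reader(url),
    error = function(e) NULL,
    warning = function(w) NULL
  )
}

# Stand-in for readHTMLTable: raises an error for URLs containing "missing".
fake_reader <- function(url) {
  if (grepl("missing", url)) stop("cannot open URL")
  data.frame(url = url, stringsAsFactors = FALSE)
}

urls <- c("http://example.com/ok1",
          "http://example.com/missing",
          "http://example.com/ok2")

results <- list()
for (u in urls) {
  data <- safe_read(u, fake_reader)
  if (is.null(data)) next   # skip date/team combinations with no page
  results[[length(results) + 1]] <- data
}

# Combine whatever was successfully read into one data frame.
october <- do.call(rbind, results)
print(nrow(october))
```

In the real script, reader would be something like function(u) readHTMLTable(u, stringsAsFactors = FALSE), and the body after the next line would build away_1 and home_1 as before.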

Resources