I am trying to write code to grab all NBA box scores for the month of October. I want the code to try every possible URL for the combination of dates (27-31) and the 30 teams. However, since not every team plays every day, some combinations won't exist, so I am trying to use the try function to skip the non-existent URLs, but I can't seem to figure it out. Here is what I have written so far:
install.packages("XML")
library(XML)
teams = c('ATL','BKN','BOS','CHA','CHI',
'CLE','DAL','DEN','DET','GS',
'HOU','IND','LAC','LAL','MEM',
'MIA','MIL','MIN','NOP','NYK',
'OKC','ORL','PHI','PHX','POR',
'SAC','SA','TOR','UTA','WSH')
october = c()
for (i in teams){
for (j in (c(27:31))){
url = paste("http://www.basketball-reference.com/boxscores/201510",
j,"0",i,".html",sep = "")
data <- try(readHTMLTable(url, stringsAsFactors = FALSE))
if(inherits(data, "error")) next
away_1 = as.data.frame(data[1])
colnames(away_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
"FT%", "ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")
away_1 = away_1[away_1$Players != "Reserves",]
away_1 = away_1[away_1$MP != "Did Not Play",]
away_1$team = rep(toupper(substr(names(as.data.frame(data[1]))[1],
5, 7)),length(away_1$Players))
away_1$loc = rep(i,length(away_1$Players))
home_1 = as.data.frame(data[3])
colnames(home_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
"FT%", "ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")
home_1 = home_1[home_1$Players != "Reserves",]
home_1 = home_1[home_1$MP != "Did Not Play",]
home_1$team = rep(toupper(substr(names(as.data.frame(data[2]))[1],
5, 7)),length(home_1$Players))
home_1$loc = rep(i,length(home_1$Players))
game = rbind(away_1,home_1)
october = rbind(october, game)
}
}
Everything above and below the following lines appears to work:
data <- try(readHTMLTable(url, stringsAsFactors = FALSE))
if(inherits(data, "error")) next
I just need to properly format these two.
For anyone interested, I figured it out using url.exists from RCurl. Just implement the following after the url definition line:
if(url.exists(url) == TRUE){...}
How about using tryCatch for error handling?
result = tryCatch({
expr
}, warning = function(w) {
warning-handler-code
}, error = function(e) {
error-handler-code
}, finally = {
cleanup-code
})
where readHTMLTable will be used as the main part ('expr'). You can simply return a missing value if an error or warning occurs and then omit the missing values from the final result.
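As a minimal sketch of that idea, the snippet below wraps a failing call in tryCatch and drops the failures afterwards. fetch_table is a hypothetical stand-in for readHTMLTable so the example runs without network access, and the example.com URLs are made up. It also shows that an object returned by try after a failure has class "try-error", so that is the class name inherits needs to check for.

```r
# Hypothetical stand-in for readHTMLTable(): fails for URLs that "do not exist"
fetch_table <- function(url) {
  if (grepl("missing", url)) stop("page not found: ", url)
  data.frame(url = url, pts = 100)
}

# Return NULL instead of stopping when the fetch fails
safe_fetch <- function(url) {
  tryCatch(fetch_table(url), error = function(e) NULL)
}

urls <- c("http://example.com/a.html",
          "http://example.com/missing.html",
          "http://example.com/b.html")

results <- lapply(urls, safe_fetch)
results <- Filter(Negate(is.null), results)  # drop the failed fetches
october <- do.call(rbind, results)           # one row per successful fetch

# With try(), a failure yields an object of class "try-error"
res <- try(fetch_table("http://example.com/missing.html"), silent = TRUE)
is_failure <- inherits(res, "try-error")
```

In the original loop, the same pattern would mean testing inherits(data, "try-error") before calling next.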
I can successfully query data from Localytics using R, such as the following example:
r <- POST(url = "https://api.localytics.com/v1/query",
body=list(app_id=<APP_ID>,
metrics=c("occurrences","users"),
dimensions=c('a:URI'),
conditions=list(day = c("between", "2020-02-11", "2020-03-12"),
event_name = "Content Viewed",
"a:Item URI" = "testing")
),
encode="json",
authenticate(Key,Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
But what I would like to do is create a function so I can do this quickly and not have to copy/paste data.
The issue I am running into is with the line "a:Item URI" = "testing", where I am filtering all searches by the Item URI where they all equal "testing", but sometimes, I don't want to include the filter statement, so I just remove that line entirely.
When I wrote my function, I tried something like the following:
get_localytics <- function(appID, metrics, dimensions, from = Sys.Date()-30,
to = Sys.Date(), eventName = "Content Viewed",
Key, Secret, filterDim = NULL, filterCriteria = NULL){
r <- httr::POST(url = "https://api.localytics.com/v1/query",
body = list(app_id = appID,
metrics = metrics,
dimensions = dimensions,
conditions = list(day = c("between", as.character(from), as.character(to)),
event_name = eventName,
filterDim = filterCriteria)
),
encode="json",
authenticate(Key, Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
result <- paste(rawToChar(r$content),collapse = "")
document <- fromJSON(result)
df <- document$results
return(df)
}
But my attempt at adding filterDim and filterCriteria only produces the error Unprocessable Entity. (Keep in mind, there are lots of variables I can filter by, not just "a:Item URI", so I need to be able to change that as well.)
How can I include a statement, where if I need to filter, I can incorporate that line, but if I don't need to filter, that line isn't included?
conditions is just a list, so you can conditionally add elements to it. Here we just use an if statement to test whether both values were passed and, if so, add them in.
get_localytics <- function(appID, metrics, dimensions, from = Sys.Date()-30,
to = Sys.Date(), eventName = "Content Viewed",
Key, Secret, filterDim = NULL, filterCriteria = NULL){
conditions <- list(day = c("between", as.character(from), as.character(to)),
event_name = eventName)
# Add the filter only when both the dimension name and its value are supplied
if (!is.null(filterDim) && !is.null(filterCriteria)) {
conditions[[filterDim]] <- filterCriteria
}
r <- httr::POST(url = "https://api.localytics.com/v1/query",
body = list(app_id = appID,
metrics = metrics,
dimensions = dimensions,
conditions = conditions),
encode="json",
authenticate(Key, Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
result <- paste(rawToChar(r$content),collapse = "")
document <- fromJSON(result)
df <- document$results
return(df)
}
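The list-building step can be checked on its own, without calling the API. The sketch below pulls it into a helper, build_conditions, which is a hypothetical name used only for illustration. It also shows why the original attempt failed: inside list(), filterDim = filterCriteria creates an element literally named "filterDim" rather than one named after the value of filterDim.

```r
# Hypothetical helper: builds the conditions list, adding a filter only if given
build_conditions <- function(from, to, eventName,
                             filterDim = NULL, filterCriteria = NULL) {
  conditions <- list(day = c("between", as.character(from), as.character(to)),
                     event_name = eventName)
  if (!is.null(filterDim) && !is.null(filterCriteria)) {
    # [[<- uses the *value* of filterDim as the element name
    conditions[[filterDim]] <- filterCriteria
  }
  conditions
}

with_filter <- build_conditions("2020-02-11", "2020-03-12", "Content Viewed",
                                filterDim = "a:Item URI",
                                filterCriteria = "testing")
no_filter <- build_conditions("2020-02-11", "2020-03-12", "Content Viewed")

names(with_filter)  # "day" "event_name" "a:Item URI"
names(no_filter)    # "day" "event_name"

# The original attempt: the element is named "filterDim", not "a:Item URI"
filterDim <- "a:Item URI"
names(list(filterDim = "testing"))  # "filterDim"
```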
I'm trying to use a bit of code that I found in an academic journal. I'm new-ish to R. When I reach the line that calls the function binhist, I keep getting the error could not find function "binhist". I can't figure out whether it's in a library/package I need to install or whether there's another problem with the code. Any help would be much appreciated. Here's the code I extracted from the article:
whichData = yourData
baseH = data.frame()
RunningSum = 0
for (i in 2:16) {
tempBin = NULL
tempBin = binhist(i, whichData$rt)
theMean = sum(tempBin)/(i)
Divisor = sum(tempBin)
new = data.frame()
for (j in 1:ncol(tempBin)) {
grabVal = (tempBin[j] - theMean)^2
names(grabVal) <- NULL
new = c(new,grabVal)
}
extra = i - ncol(tempBin)
NewSum = Reduce("+",new) + extra*((0 - theMean)^2)
StdDev = sqrt(NewSum /(i-1))
RowVal = StdDev /Divisor
RunningSum = RunningSum + RowVal
baseH = c(baseH, list(tempBin))
}
paste("Number of Trials:",Divisor)
paste("Modulo-Binning Score (MBS): ",RunningSum)
library(plyr)
baseNow = do.call(rbind.fill,baseH)
I inherited a file from a previous coworker to use R to pull Zillow "Zestimate" and "Rent Zestimate" data for properties, and then output these data points to a CSV file. However, I am very new to coding and have not been successful with pulling additional information that I know is available. I have searched the site for answers, but since I am still trying to learn how to code I haven't been successful with making my own edits to the current code. Any help I can get adding code to pull any of these additional data points would be much appreciated.
Property details (sqft, year built, beds, baths, property type)
Zestimate range (high and low)
Rent Zestimate range (high and low)
Last sold date and price
Price history (latest event, date, and price) (not sure this can be scraped)
Tax history (latest year and property taxes) (not sure this can be scraped)
Current code:
houseAddsSplit = read.csv(houseAddsFileLocation)
zillowAdds = paste(houseAddsSplit$STREET, houseAddsSplit$CITY, houseAddsSplit$STATE, houseAddsSplit$ZIP, sep = " ")
library(ZillowR)
library(XML)
set_zillow_web_service_id(zwsId)
zpidList = NULL
zestimate = NULL
rentZestimate = NULL
for(i in 1:length(zillowAdds)){
print(paste("Processing house: ", i, ", address: ", zillowAdds[i]))
print(zillowAdds[i])
houseZpidClean = "ERR"
houseZestClean = "ERR"
houseRentZestClean = "ERR"
houseInfo = try(GetSearchResults(address = zillowAdds[i], citystatezip = as.character(houseAddsSplit$ZIP[i]), rentzestimate = TRUE))
# while(houseInfo$message$code != "0"){
#   houseInfo = try(GetSearchResults(address = cipAdds[i], citystatezip = as.character(cipLoans$ZIP[i]), rentzestimate = TRUE))
#   Sys.sleep(runif(1, 3, 5))
# }
if(houseInfo$message$code == "0"){
houseZpid = try(xmlElementsByTagName(houseInfo$response, "zpid", recursive = TRUE))
houseZest = try(xmlElementsByTagName(houseInfo$response, "amount", recursive = TRUE))
houseZpidAlmostClean = try(toString.XMLNode(houseZpid$results.result.zpid))
houseZestAC = try(toString.XMLNode(houseZest$results.result.zestimate.amount))
houseRentZestAC = try(toString.XMLNode(houseZest$results.result.rentzestimate.amount))
houseZpidClean = try(substr(houseZpidAlmostClean, 7, nchar(houseZpidAlmostClean) - 7))
houseZestClean = try(substr(houseZestAC, 24, nchar(houseZestAC) - 9))
houseRentZestClean = try(substr(houseRentZestAC, 24, nchar(houseRentZestAC) - 9))
}
closeAllConnections()
zpidList[i] = houseZpidClean
print(paste("zpid: ", houseZpidClean))
zestimate[i] = houseZestClean
print(paste("zestimate: ", houseZestClean))
rentZestimate[i] = houseRentZestClean
print(paste("rent zestimate: ", houseRentZestClean))
Sys.sleep(runif(1, 7, 10))
}
outputData = cbind(houseAddsSplit, zestimate, rentZestimate)
write.csv(outputData, paste(writeToFolder, "/zillowPullOutput.csv", sep = ""))
print(paste("All done. File written to", paste(writeToFolder, "/zillowPullOutput.csv", sep = "")))
Hope you solved this, but the GetSearchResults API doesn't return all the results you are looking for. You may have to call the GetUpdatedPropertyDetails API to get the rest.
RStudio provides a nice function View (with uppercase V) to take a look into the data, but with R it's still nasty to get orientation in a large data set. The most common options are...
names(df)
str(df)
If you're coming from SPSS, R seems like a downgrade in this respect. I wondered whether there is a more user-friendly option? I did not find a ready-one, so I'd like to share my solution with you.
Using RStudio's built-in function View, it's quite simple to get a variable listing for a data.frame similar to the one in SPSS. This function creates a new data.frame with the variable information and displays it in the RStudio GUI via View.
# Better variables view
Varlist = function(sia) {
# Init varlist output
varlist = data.frame(row.names = names(sia))
varlist[["comment"]] = NA
varlist[["type"]] = NA
varlist[["values"]] = NA
varlist[["NAs"]] = NA
# Fill with meta information
for (var in names(sia)) {
if (!is.null(comment(sia[[var]]))) {
varlist[[var, "comment"]] = comment(sia[[var]])
}
varlist[[var, "NAs"]] = sum(is.na(sia[[var]]))
if (is.factor(sia[[var]])) {
varlist[[var, "type"]] = "factor"
varlist[[var, "values"]] = paste(levels(sia[[var]]), collapse=", ")
} else if (is.character(sia[[var]])) {
varlist[[var, "type"]] = "character"
} else if (is.logical(sia[[var]])) {
varlist[[var, "type"]] = "logical"
n = sum(!is.na(sia[[var]]))
if (n > 0) {
varlist[[var, "values"]] = paste(round(sum(sia[[var]], na.rm=T) / n * 100), "% TRUE", sep="")
}
} else if (is.numeric(sia[[var]])) {
varlist[[var, "type"]] = typeof(sia[[var]])
n = sum(!is.na(sia[[var]]))
if (n > 0) {
varlist[[var, "values"]] = paste(min(sia[[var]], na.rm=T), "...", max(sia[[var]], na.rm=T))
}
} else {
varlist[[var, "type"]] = typeof(sia[[var]])
}
}
View(varlist)
}
My recommendation is to store that as a file (e.g., Varlist.R) and, whenever you need it, just type:
source("Varlist.R")
Varlist(df)
Again, please take note of the uppercase V in the function name.
Limitation: When working with data.frame, the listing will not be updated unless Varlist(df) is run again.
Note: R has a built-in option to view data with print. If working with pure R, just replace the View(varlist) by print(varlist). Yet, depending on screen size, Hmisc::describe() could be a better option for the console.
I am trying to use a function to modify another function's default settings through formals, but when I check that function's defaults afterwards, nothing has changed. My code (minus unrelated stuff) is:
ScouringSettings <- function(min.MAF=NULL, eq.thresh=NULL){
if (is.null(min.MAF) && is.null(eq.thresh)){
maf <- paste0("Minimum MAF criterion is: ", formals(GeneScour)$min.maf)
eq <- paste0("Chi² HW equilibrium threshold: ", formals(GeneScour)$min.eq)
cat(paste(maf, eq, sep="\n"))
} else if (is.null(eq.thresh)) {
formals(GeneScour) <- alist(gene=, min.maf = min.MAF, min.eq = formals(GeneScour)$min.eq)
} else if (is.null(min.MAF)){
formals(GeneScour) <- alist(gene=, min.maf = formals(GeneScour)$min.maf, min.eq = eq.thresh)
} else {
formals(GeneScour) <- alist(gene=, min.maf = min.maf, min.eq = eq.thresh)
}
}
I thought that maybe it was because of a problem of scope or something so I tried printing out the defaults while still being in my first function and it printed :
$gene
$min.maf
min.MAF
$min.eq
formals(GeneScour)$min.eq
And even when I forcefully type
formals(GeneScour) <- alist(gene=, min.maf = 2, min.eq = formals(GeneScour)$min.eq)
The modification is not carried over outside of the ScouringSettings.
I am a bit lost, how could I manage that ?
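One possible explanation, offered as a sketch rather than a definitive answer: formals(GeneScour) <- ... inside a function assigns to a local copy of GeneScour that disappears when the function returns, and alist() stores its arguments unevaluated, which is why the printout shows the symbol min.MAF rather than its value. The example below uses a made-up GeneScour with assumed defaults (0.05 and 0.01) and a hypothetical ScouringSettingsSketch. It assumes GeneScour lives in an enclosing environment such as the global environment, so that <<- can reach it. It would not work as written for a function inside a package namespace, where bindings are locked.

```r
# Made-up stand-in for GeneScour; the default values are assumptions
GeneScour <- function(gene, min.maf = 0.05, min.eq = 0.01) {
  list(gene = gene, min.maf = min.maf, min.eq = min.eq)
}

# Local modification: only a copy inside the function is changed
change_locally <- function(value) {
  formals(GeneScour)$min.maf <- value
  formals(GeneScour)$min.maf
}
local_result <- change_locally(0.2)
unchanged <- formals(GeneScour)$min.maf  # the outer GeneScour is untouched

# Sketch of a setter: <<- modifies GeneScour where it is defined, and the
# values are assigned directly instead of being wrapped in alist()
ScouringSettingsSketch <- function(min.MAF = NULL, eq.thresh = NULL) {
  if (!is.null(min.MAF))   formals(GeneScour)$min.maf <<- min.MAF
  if (!is.null(eq.thresh)) formals(GeneScour)$min.eq  <<- eq.thresh
  invisible(formals(GeneScour))
}

ScouringSettingsSketch(min.MAF = 0.2)
changed <- formals(GeneScour)$min.maf    # now reflects the new default
```

Assigning single elements with formals(GeneScour)$min.maf also avoids having to restate the arguments that should stay as they are.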