Related
I would like to choose the initial population of the genetic algorithm for variable selection with makeFeatSelControlGA() function from mlr package.
Is it doable ?
Edit
Since it is not implemented I did a dirty editing of selectFeaturesGA function for this purpose. Here is the code, if it helps anyone.
library(R.utils)
reassignInPackage("selectFeaturesGA","mlr",function (learner, task, resampling, measures, bit.names, bits.to.features,
control, opt.path, show.info)
{
states=get("states",envir=.GlobalEnv)
mu = control$extra.args$mu
lambda = control$extra.args$lambda
yname = opt.path$y.names[1]
minimize = opt.path$minimize[1]
if (!length(states))
{
for (i in seq_len(mu)) {
while (TRUE) {
states[[i]] = rbinom(length(bit.names), 1, 0.5)
if (is.na(control$max.features) || sum(states[[i]] <=
control$max.features))
break
}
}
}
evalOptimizationStatesFeatSel(learner, task, resampling,
measures, bits.to.features, control, opt.path, show.info,
states, 0L, NA_integer_)
pop.inds = seq_len(mu)
for (i in seq_len(control$maxit)) {
pop.df = as.data.frame(opt.path)[pop.inds, , drop = FALSE]
pop.featmat = as.matrix(pop.df[, bit.names, drop = FALSE])
mode(pop.featmat) = "integer"
pop.y = pop.df[, yname]
kids.list = replicate(lambda, generateKid(pop.featmat,
control), simplify = FALSE)
kids.evals = evalOptimizationStatesFeatSel(learner, task,
resampling, measures, bits.to.features, control,
opt.path, show.info, states = kids.list, i, as.integer(NA))
kids.y = extractSubList(kids.evals, c("y", yname))
oplen = getOptPathLength(opt.path)
kids.inds = seq(oplen - lambda + 1, oplen)
if (control$extra.args$comma) {
setOptPathElEOL(opt.path, pop.inds, i - 1)
pool.inds = kids.inds
pool.y = kids.y
}
else {
pool.inds = c(pop.inds, kids.inds)
pool.y = c(pop.y, kids.y)
}
pop.inds = pool.inds[order(pool.y, decreasing = !minimize)[seq_len(mu)]]
setOptPathElEOL(opt.path, setdiff(pool.inds, pop.inds),
i)
}
makeFeatSelResultFromOptPath(learner, measures, resampling,
control, opt.path)
})
And in the global environment, you need to add a list named states with length equal to mu
This is currently not supported by mlr, so you would have to modify the source code to achieve that.
I inherited a file from a previous coworker to use R to pull Zillow "Zestimate" and "Rent Zestimate" data for properties, and then output these data points to a CSV file. However, I am very new to coding and have not been successful with pulling additional information that I know is available. I have searched the site for answers, but since I am still trying to learn how to code I haven't been successful with making my own edits to the current code. Any help I can get adding code to pull any of these additional data points would be much appreciated.
Property details (sqft, year built, beds, baths, property type)
Zestimate range (high and low)
Rent Zestimate range (high and low)
Last sold date and price
Price history (latest event, date, and price)(not sure this can be scraped )
Tax history (latest year and property taxes) (not sure this can be scraped )
Current code:
houseAddsSplit = read.csv(houseAddsFileLocation) zillowAdds = paste(houseAddsSplit$STREET, houseAddsSplit$CITY, houseAddsSplit$STATE, houseAddsSplit$ZIP, sep = " ")
library(ZillowR)
library(XML)
set_zillow_web_service_id(zwsId)
zpidList = NULL
zestimate = NULL
rentZestimate = NULL
for(i in 1:length(zillowAdds)){
print(paste("Processing house: ", i, ", address: ", zillowAdds[i]))
print(zillowAdds[i])
houseZpidClean = "ERR"
houseZestClean = "ERR"
houseRentZestClean = "ERR"
houseInfo = try(GetSearchResults(address = zillowAdds[i], citystatezip = as.character(houseAddsSplit$ZIP[i]), rentzestimate = TRUE))
'#'while(houseInfo$message$code != "0"){
'#' houseInfo = try(GetSearchResults(address = cipAdds[i], citystatezip = as.character(cipLoans$ZIP[i]), rentzestimate = TRUE))
'#' Sys.sleep(runif(1, 3, 5))
'#'}
if(houseInfo$message$code == "0"){
houseZpid = try(xmlElementsByTagName(houseInfo$response, "zpid", recursive = TRUE))
houseZest = try(xmlElementsByTagName(houseInfo$response, "amount", recursive = TRUE))
houseZpidAlmostClean = try(toString.XMLNode(houseZpid$results.result.zpid))
houseZestAC = try(toString.XMLNode(houseZest$results.result.zestimate.amount))
houseRentZestAC = try(toString.XMLNode(houseZest$results.result.rentzestimate.amount))
houseZpidClean = try(substr(houseZpidAlmostClean, 7, nchar(houseZpidAlmostClean) - 7))
houseZestClean = try(substr(houseZestAC, 24, nchar(houseZestAC) - 9))
houseRentZestClean = try(substr(houseRentZestAC, 24, nchar(houseRentZestAC) - 9))
}
closeAllConnections()
zpidList[i] = houseZpidClean
print(paste("zpid: ", houseZpidClean))
zestimate[i] = houseZestClean
print(paste("zestimate: ", houseZestClean))
rentZestimate[i] = houseRentZestClean
print(paste("rent zestimate: ", houseRentZestClean))
Sys.sleep(runif(1, 7, 10))
}
outputData = cbind(houseAddsSplit, zestimate, rentZestimate)
write.csv(outputData, paste(writeToFolder, "/zillowPullOutput.csv", sep = ""))
print(paste("All done. File written to", paste(writeToFolder, "/zillowPullOutput.csv", sep = "")))
Hope you solved this, but GetSearchResult API wouldn't return all the results you are looking for. You may have to call GetUpdatedPropertyDetails API to get all the results.
RStudio provides a nice function View (with uppercase V) to take a look into the data, but with R it's still nasty to get orientation in a large data set. The most common options are...
names(df)
str(df)
If you're coming from SPSS, R seems like a downgrade in this respect. I wondered whether there is a more user-friendly option? I did not find a ready-one, so I'd like to share my solution with you.
Using RStudio's built-in function View, it's white simple to have a variable listing for a data.frame similar to the one in SPSS. This function creates a new data.frame with the variable information and displays in the RStudio GUI via View.
# Better variables view
Varlist = function(sia) {
# Init varlist output
varlist = data.frame(row.names = names(sia))
varlist[["comment"]] = NA
varlist[["type"]] = NA
varlist[["values"]] = NA
varlist[["NAs"]] = NA
# Fill with meta information
for (var in names(sia)) {
if (!is.null(comment(sia[[var]]))) {
varlist[[var, "comment"]] = comment(sia[[var]])
}
varlist[[var, "NAs"]] = sum(is.na(sia[[var]]))
if (is.factor(sia[[var]])) {
varlist[[var, "type"]] = "factor"
varlist[[var, "values"]] = paste(levels(sia[[var]]), collapse=", ")
} else if (is.character(sia[[var]])) {
varlist[[var, "type"]] = "character"
} else if (is.logical(sia[[var]])) {
varlist[[var, "type"]] = "logical"
n = sum(!is.na(sia[[var]]))
if (n > 0) {
varlist[[var, "values"]] = paste(round(sum(sia[[var]], na.rm=T) / n * 100), "% TRUE", sep="")
}
} else if (is.numeric(sia[[var]])) {
varlist[[var, "type"]] = typeof(sia[[var]])
n = sum(!is.na(sia[[var]]))
if (n > 0) {
varlist[[var, "values"]] = paste(min(sia[[var]], na.rm=T), "...", max(sia[[var]], na.rm=T))
}
} else {
varlist[[var, "type"]] = typeof(sia[[var]])
}
}
View(varlist)
}
My recommendation is to store that as a file (e.g., Varlist.R) and whever you need it, just type:
source("Varlist.R")
Varlist(df)
Again please take note of the uppercase V using as function name.
Limitation: When working with data.frame, the listing will not be updated unless Varlist(df) is run again.
Note: R has a built-in option to view data with print. If working with pure R, just replace the View(varlist) by print(varlist). Yet, depending on screen size, Hmisc::describe() could be a better option for the console.
I am trying to get a code to grab all NBA box scores for the month of October. I want the code to try every URL possible for the combination of dates (27-31) and the 30 teams. However, as not all of the teams play every day, some combinations won't exist, so I am trying to implement the try function to skip the non-existent URLs, but I cant seem to figure it out. Here is what I have written so far:
install.packages("XML")
library(XML)
teams = c('ATL','BKN','BOS','CHA','CHI',
'CLE','DAL','DEN','DET','GS',
'HOU','IND','LAC','LAL','MEM',
'MIA','MIL','MIN','NOP','NYK',
'OKC','ORL','PHI','PHX','POR',
'SAC','SA','TOR','UTA','WSH')
october = c()
for (i in teams){
for (j in (c(27:31))){
url = paste("http://www.basketball-reference.com/boxscores/201510",
j,"0",i,".html",sep = "")
data <- try(readHTMLTable(url, stringsAsFactors = FALSE))
if(inherits(data, "error")) next
away_1 = as.data.frame(data[1])
colnames(away_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
"FT%", "ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")
away_1 = away_1[away_1$Players != "Reserves",]
away_1 = away_1[away_1$MP != "Did Not Play",]
away_1$team = rep(toupper(substr(names(as.data.frame(data[1]))[1],
5, 7)),length(away_1$Players))
away_1$loc = rep(i,length(away_1$Players))
home_1 = as.data.frame(data[3])
colnames(home_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
"FT%", "ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")
home_1 = home_1[home_1$Players != "Reserves",]
home_1 = home_1[home_1$MP != "Did Not Play",]
home_1$team = rep(toupper(substr(names(as.data.frame(data[2]))[1],
5, 7)),length(home_1$Players))
home_1$loc = rep(i,length(home_1$Players))
game = rbind(away_1,home_1)
october = rbind(october, game)
}
}
Everything above and below the following lines appears to work:
data <- try(readHTMLTable(url, stringsAsFactors = FALSE))
if(inherits(data, "error")) next
I just need to properly format these two.
For anyone interested, I figured it out using url.exists in RCurl. Just impliment the following after the url definition line:
if(url.exists(url) == TRUE){...}
How about using tryCatch for error handling?
result = tryCatch({
expr
}, warning = function(w) {
warning-handler-code
}, error = function(e) {
error-handler-code
}, finally = {
cleanup-code
})
where readHTMLTable will be use as the main part ('expr'). You can simply return missing value if error/warning occurs and then omit missing values on final result.
I am trying to use a function to modify another function default settings through formals but my problem is that when I check my function defaults afterwards then nothing has changed. My code (minus unrelated stuff) is:
ScouringSettings <- function(min.MAF=NULL, eq.thresh=NULL){
if (is.null(min.MAF) && is.null(eq.thresh)){
maf <- paste0("Minimum MAF criterion is: ", formals(GeneScour)$min.maf)
eq <- paste0("ChiĀ² HW equilibrium threshold: ", formals(GeneScour)$min.eq)
cat(paste(maf, eq, sep="\n"))
} else if (is.null(eq.thresh)) {
formals(GeneScour) <- alist(gene=, min.maf = min.MAF, min.eq = formals(GeneScour)$min.eq)
} else if (is.null()){
formals(GeneScour) <- alist(gene=, min.maf = formals(GeneScour)$min.maf, min.eq = eq.thresh)
} else {
formals(GeneScour) <- alist(gene=, min.maf = min.maf, min.eq = eq.thresh)
}
}
I thought that maybe it was because of a problem of scope or something so I tried printing out the defaults while still being in my first function and it printed :
$gene
$min.maf
min.MAF
$min.eq
formals(GeneScour)$min.eq
And even when I forcefully type
formals(GeneScour) <- alist(gene=, min.maf = 2, min.eq = formals(GeneScour)$min.eq)
The modification is not carried over outside of the ScouringSettings.
I am a bit lost, how could I manage that ?