I have a function that takes an argument xm, where xm is a trained machine learning model. Is there a way, within the function, to print the name of xm rather than the summary of the model, which is what I get when I call print(xm)?
For example, my function generates graphs that I save within the function:
modsummary <- function(xm){
mypath <- file.path("C:","Users","Documents",paste("rf_fit_hmeas_random", ".png", sep = ""))
png(file = mypath)
print(plot(xm))
dev.off()
}
modsummary(rf_fit)
What I am trying to do is set this up so that it pastes the name of xm (in this case rf_fit) into the file name. In other words, it should automatically detect which model was passed in and replace rf_fit in rf_fit_hmeas_random each time a different model is used.
Thank you
Yes, you can use deparse(substitute(xm)) to get this.
So it would be:
modsummary <- function(xm){
mypath <- file.path("C:", "Users", "Documents", paste0(deparse(substitute(xm)), "_hmeas_random.png"))
png(file = mypath)
print(plot(xm))
dev.off()
}
modsummary(rf_fit)
I've switched to using paste0(), which doesn't need a separator argument. There is no need to add slashes to the directory names yourself: file.path() already puts "/" between its arguments.
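To see what deparse(substitute()) returns on its own, here is a minimal sketch with a hypothetical helper arg_name() and a plain list standing in for a fitted model. substitute() captures the unevaluated expression that was passed as xm, and deparse() turns that expression into a character string.

```r
# Hypothetical helper: return the expression passed as xm, as a string
arg_name <- function(xm) {
  deparse(substitute(xm))
}

# Stand-in object; a fitted model would behave the same way
rf_fit <- list(type = "random forest")

arg_name(rf_fit)  # "rf_fit"

# Building the file name the same way modsummary() does
plot_file <- function(xm) paste0(deparse(substitute(xm)), "_hmeas_random.png")
plot_file(rf_fit)  # "rf_fit_hmeas_random.png"
```

Note that if you pass an expression rather than a bare name, e.g. modsummary(models[[1]]), the deparsed text is "models[[1]]", which contains characters you may not want in a file name.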
I have a few data frames (colors, sets, inventory) and I want to save each of them into a folder that I have set as my wd. I want to do this using a for loop, but I am not sure how to write the file argument such that R understands that it should use the elements of the vector as the file names.
I might write:
DFs <- c("colors", "sets", "inventory")
for (x in 1:length(DFs)){
save(x, file = "x.Rda")
}
The goal would be that the files would save as colors.Rda, sets.Rda, etc. However, the last element to run through the loop simply saves as x.Rda.
In short, perhaps my question is: how do I tell R to use the element currently being run through the loop inside an argument that requires a character string?
For bonus points, I am sure I will encounter the same problem if I want to load a series of files from that folder in the future. Rather than loading each one individually, I'd also like to write a for loop. To load these a few minutes ago, I used the incredibly clunky code:
sets_file <- "~/Documents/ME teaching/R notes/datasets/sets.csv"
sets <- read.csv(sets_file)
inventories_file <- "~/Documents/ME teaching/R notes/datasets/inventories.csv"
inventories <- read.csv(inventories_file)
colors_file <- "~/Documents/ME teaching/R notes/datasets/colors.csv"
colors <- read.csv(colors_file)
For compactness I use lapply instead of a for loop here, but the idea is the same:
lapply(DFs, \(x) save(list = x, file = paste0(x, ".Rda")))
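The key is to build each file name from the value of x with paste0(), rather than typing x inside the quotes ("x.Rda"), which R reads as the literal letter x. Likewise, save(list = x) tells save() to save the object whose name is the string stored in x.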
To load those files, you can simply do:
lapply(paste0(DFs, ".Rda"), load, envir = globalenv())
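A self-contained sketch of the whole round trip, using small made-up data frames and a temporary directory instead of your working directory (both are assumptions for the demo). save(list = x) saves the object whose name is the string in x, and load() recreates it under that same name.

```r
# Made-up data frames standing in for colors, sets and inventory
colors    <- data.frame(id = 1:2, name = c("red", "blue"))
sets      <- data.frame(set_num = c("001-1", "002-1"), year = c(1999, 2000))
inventory <- data.frame(id = 1:3, qty = c(5, 2, 7))

DFs <- c("colors", "sets", "inventory")
out_dir <- tempdir()  # demo folder; use your own path instead

# Save each object to <name>.Rda; save(list = x) looks the object up by name
invisible(lapply(DFs, function(x)
  save(list = x, file = file.path(out_dir, paste0(x, ".Rda")))))

# Remove the originals, then load them back into the global environment
rm(colors, sets, inventory)
invisible(lapply(file.path(out_dir, paste0(DFs, ".Rda")), load, envir = globalenv()))

exists("sets")  # TRUE
```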
To save you can do this:
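DFs <- list(colors, sets, inventory)
names(DFs) = c("colors", "sets", "inventory")
for (x in seq_along(DFs)){
dx = paste(names(DFs)[[x]], "Rda", sep = ".")  # e.g. "colors.Rda"
# save() stores objects by name, so look the data frame up in an environment
# built from the named list; load() then recreates "colors", not "dfx"
save(list = names(DFs)[[x]], envir = list2env(DFs), file = dx)
}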
To save to a specific folder, just include the path when you build dx, as in the reading code below.
To read:
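DFs <- c("colors", "sets", "inventory")
# or take every .csv file in the folder, dropping the extension:
# DFs = tools::file_path_sans_ext(dir("~/Documents/ME teaching/R notes/datasets/", pattern = "\\.csv$"))
tables = list()
for(x in seq_along(DFs)){
arq = paste("~/Documents/ME teaching/R notes/datasets/", DFs[x], ".csv", sep = "")
tables[[DFs[x]]] = read.csv(arq)
}
The result is a named list, so you can access each data frame with [[ ]] indexing, e.g. tables[["sets"]].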
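A more compact variant of the reading loop, sketched under the assumption that every .csv file in the folder should be read: list the files with full paths, read them with lapply(), and name the list after the files with setNames(). To keep the demo runnable, it first writes two tiny made-up CSV files to a temporary folder; in practice data_dir would be your datasets folder.

```r
# Demo folder with two small CSV files (stand-ins for colors.csv and sets.csv)
data_dir <- file.path(tempdir(), "datasets")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, name = c("red", "blue")),
          file.path(data_dir, "colors.csv"), row.names = FALSE)
write.csv(data.frame(set_num = c("001-1", "002-1")),
          file.path(data_dir, "sets.csv"), row.names = FALSE)

# Every .csv in the folder; file names without the extension become list names
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- setNames(lapply(csv_files, read.csv),
                   tools::file_path_sans_ext(basename(csv_files)))

names(tables)  # "colors" "sets"

# Optional: copy each table into the global environment as its own object
invisible(list2env(tables, envir = globalenv()))
```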
My goal is to read many files into R, and ultimately, run a Root Mean Square Error (rmse) function on each pair of columns within each file.
I have this code:
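library(readxl)      # read_excel()
library(data.table)  # rbindlist()
#This lists the matching files (pattern is a regular expression, not a wildcard)
filnames <- dir("~/Desktop/LGsampleHUCsWgraphs/testRSMEs", pattern = "_45Fall_", full.names = TRUE)
#This reads each file
read_data <- function(z){
dat <- read_excel(z, skip = 0)
return(dat)
}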
#This combines them into one list and splits them by the names in the first column
datalist <- lapply(filnames, read_data)
bigdata <- rbindlist(datalist, use.names = T)
splitByHUCs <- split(bigdata, f = bigdata$HUC...1 , sep = "\n", lex.order = TRUE)
So far, all is working well. Now I want to apply an rmse analysis (from library(Metrics)) to each of the "splits" created above, but I don't know what to call the "splits". Here I have used names, but that is already the name of an R function and won't work. I tried the bigdata object, but that didn't work either. I also tried splitByHUCs and rMSEs.
rMSEs <- sapply(splitByHUCs, function(x) rmse(names$Predicted, names$Actual))
write.csv(rMSEs, file = "~/Desktop/testRMSEs.csv")
The rmse code works fine when I run it on a single file and create a name for the dataframe:
read_excel("bcc1_45Fall_1010002.xlsm")
bcc1F1010002 <- read_excel("bcc1_45Fall_1010002.xlsm")
rmse(bcc1F1010002$Predicted, bcc1F1010002$Actual)
The "splits" are named by the "splitByHUCs" script, like this:
They are named for the file they came from, appropriately. I need some kind of reference name for the rmse formula and I don't know what it would be. Any ideas? Thanks. I made some small versions of the files, but I don't know how to add them here.
As splitByHUCs is a list, you can loop over it with sapply()/lapply() as in your code, but names$ is incorrect: inside the anonymous function, each element of the list (i.e. a data.frame) is passed in as x. Therefore, use x$ instead of names$:
sapply(splitByHUCs, function(x) rmse(x$Predicted, x$Actual))
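A small self-contained sketch of the same idea, using made-up data with HUC, Predicted and Actual columns (the column names come from the question; the values are invented). To avoid depending on Metrics, it uses a hand-written rmse_fn() with the standard RMSE formula, which should agree with Metrics::rmse(actual, predicted); RMSE is symmetric, so the argument order does not change the result.

```r
# Standard RMSE: square root of the mean squared difference
rmse_fn <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Made-up data: two HUCs with two observations each
bigdata <- data.frame(
  HUC       = c("A", "A", "B", "B"),
  Actual    = c(1, 2, 3, 3),
  Predicted = c(1, 4, 5, 1)
)

# One data frame per HUC, named by the HUC value
splitByHUCs <- split(bigdata, bigdata$HUC)

# Inside the anonymous function, x is one of those data frames
rMSEs <- sapply(splitByHUCs, function(x) rmse_fn(x$Actual, x$Predicted))
rMSEs  # A: sqrt(2), B: 2
```

If you want the HUC names as a proper column in the CSV, write data.frame(HUC = names(rMSEs), RMSE = unname(rMSEs)) instead of the bare vector.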
Here is a function to take a file name stem, assemble the file name from path, stem, and suffix, read the file, drop a variable, and save the file again. In this simplified version the read stem and the save stem are the same, and I remove just one variable.
library(tidyverse)
strip_cps <- function(X, path = path){
readRDS(paste0(path = path, X, ".RDS")) %>%
select(-"YEAR") %>%
saveRDS(file = paste0(path = path, X, ".RDS"))
}
Here is a function to call the previous function repeatedly, once per stem.
compact_cps <- function(stems, path){
stems %>%
walk(strip_cps(., path=path))
}
Here is a microscopic version of my data. Replace the data directory with a directory on your own system.
df1 <- tibble(YEAR = 1, SEX = 1)
df2 = df1[1,] + c(1, 1)
saveRDS(df1, file = paste0("./CPS_1962-2018/", "df1", ".RDS"))
saveRDS(df2, file = paste0("./CPS_1962-2018/", "df2", ".RDS"))
Running the foregoing code on this data gets me the following error:
Error in gzfile(file, "rb") : invalid 'description' argument
From the traceback, I learn that gzfile is called by readRDS. Rerunning the code with debug, I learn that the description argument of gzfile is
chr [1:2] "./CPS_1962-2018/df1.RDS" "./CPS_1962-2018/df2.RDS"
which is to say, a length-2 character vector.
Now, it seems to be a characteristic of the bugs in my code that I cannot resolve myself that they are not where I think they are. But it appears to me that walk, having received the character vector stems = c("df1", "df2") as its principal argument via %>% from the calling function compact_cps, hands the whole vector to strip_cps instead of handing it over one element at a time.
It seems more likely that I am missing something than that walk would do that, but I can't see what.
Try this:
compact_cps <- function(stems, path){
stems %>%
walk(strip_cps, path = path)
}
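The first argument to walk() is the vector stems (supplied here by %>%), the next is the function you want to call for each element of the vector, and any further arguments are passed on to that function. In your version, strip_cps(., path = path) is an ordinary function call rather than a function: %>% substitutes the whole stems vector for the dot, so strip_cps() receives both stems at once, which is why gzfile() saw a length-2 character vector.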
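Here is a self-contained sketch of the corrected call, assuming purrr is installed. For brevity it uses a temporary directory instead of ./CPS_1962-2018/, builds paths with file.path() rather than paste0(), and drops YEAR with base subsetting instead of select(); these are simplifications for the demo, not changes the fix requires.

```r
library(purrr)

# Drop the YEAR column from <path>/<stem>.RDS and save it back in place
strip_cps <- function(stem, path) {
  file <- file.path(path, paste0(stem, ".RDS"))
  df <- readRDS(file)
  df <- df[setdiff(names(df), "YEAR")]  # single-bracket keeps a data frame
  saveRDS(df, file = file)
}

compact_cps <- function(stems, path) {
  # walk() calls strip_cps(stem, path = path) once per element of stems
  walk(stems, strip_cps, path = path)
}

# Demo data in a temporary directory
demo_dir <- tempfile("cps_demo")
dir.create(demo_dir)
df1 <- data.frame(YEAR = 1, SEX = 1)
df2 <- df1 + 1
saveRDS(df1, file.path(demo_dir, "df1.RDS"))
saveRDS(df2, file.path(demo_dir, "df2.RDS"))

compact_cps(c("df1", "df2"), path = demo_dir)
names(readRDS(file.path(demo_dir, "df2.RDS")))  # "SEX"
```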
My question is related to the following one:
Run external R script n times and save outputs in a data frame
The difference is that I don't generate different results with random-number functions; instead, I'd like to use a different set of input variables each time (e.g. run the chunk of code for a range of latitudes, lat = c(50, 60, 70, 80)).
Does anyone have a hint for me?
Thanks a lot!
Wrap the script into a function by putting:
my_function <- function(latitude) {
at the top and
}
at the bottom.
That way, you could source it once and then use ldply() from the plyr package:
library(plyr)
results <- ldply(10 * 5:8, my_function)
If you wanted a column to identify which latitude was used, you could either add that to your function's data.frame or use:
results <- ldply(10 * 5:8, function(lat) data.frame(latitude = lat, my_function(lat)))
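If plyr isn't installed, a base-R alternative (not what the answer above uses) is to call the function once per latitude with lapply() and stack the resulting data frames with do.call(rbind, ...). In this sketch, my_function() is a hypothetical stand-in for your wrapped script that just returns a one-row data frame.

```r
# Hypothetical stand-in for the wrapped script: one row of results per call
my_function <- function(latitude) {
  data.frame(cos_lat = cos(latitude * pi / 180))
}

lats <- 10 * 5:8  # 50, 60, 70, 80

# One data frame per latitude, tagged with the latitude that produced it
pieces <- lapply(lats, function(lat) data.frame(latitude = lat, my_function(lat)))
results <- do.call(rbind, pieces)
results
```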
If for some reason you didn't want to modify your script, you could create a wrapper function:
my_wrapper <- function(a) {
latitude <- a
source("script.R", local = TRUE)$value
}
or even use eval and parse:
my_function <- eval(parse(text = c("function(latitude) {",
readLines("script.R"), "}")))
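A self-contained check of both approaches, assuming a toy script.R written to a temporary file (its contents are invented for the demo). The script uses latitude without defining it, just as your script would once latitude is supplied from outside.

```r
# Toy script: uses `latitude` but does not define it; its last value is the result
script <- tempfile(fileext = ".R")
writeLines(c(
  "cos_lat <- cos(latitude * pi / 180)",
  "data.frame(latitude = latitude, cos_lat = cos_lat)"
), script)

# Wrapper: local = TRUE evaluates the script in the wrapper's environment,
# where `latitude` exists; $value holds the value of the last expression
my_wrapper <- function(a) {
  latitude <- a
  source(script, local = TRUE)$value
}

# eval/parse: paste the script's lines into the body of a new function
my_function <- eval(parse(text = c("function(latitude) {", readLines(script), "}")))

res_wrapper  <- do.call(rbind, lapply(10 * 5:8, my_wrapper))
res_function <- do.call(rbind, lapply(10 * 5:8, my_function))
```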
While fine-tuning plot parameters, I want to save every test run to a separate file so that none of them is lost. So far, I have managed to do this with the code below:
# Save the plot as WMF file - using random numbers to avoid overwriting
number <- sample(1:20,1)
filename <- paste("dummy", number, sep="-")
fullname <- paste(filename, ".wmf", sep="")
# Next line actually creates the file
dev.copy(win.metafile, fullname)
dev.off() # Turn off the device
This code works, generating files with name "dummy-XX.wmf", where XX is a random number between 1 and 20, but it looks cumbersome and not elegant at all.
Is there a more elegant way to accomplish the same thing? Or, better still, a way to keep count of how many times the code has been run and generate nice progressive numbers for the files?
If you really want to increment the number (to avoid overwriting files that already exist), you can create a small function like this one:
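createNewFileName = function(path = getwd(), pattern = "plot_of_something", extension = ".png") {
# Match <pattern><digits><extension> exactly, escaping the dot in the extension
# (pattern itself is still used as a regular expression)
completePattern = paste0("^(", pattern, ")([0-9]+)(", gsub("\\.", "\\\\.", extension), ")$")
myExistingFiles = list.files(path = path, pattern = completePattern)
existingNumbers = as.numeric(gsub(pattern = completePattern, replacement = "\\2", x = myExistingFiles))
# Start at 1 if no file matches yet, otherwise use the highest number + 1
nextNumber = if (length(existingNumbers) == 0) 1 else max(existingNumbers) + 1
return(file.path(path, paste0(pattern, nextNumber, extension)))
}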
# will create the file myplot1.png
png(filename = createNewFileName(pattern="myplot"))
hist(rnorm(100))
dev.off()
# will create the file myplot2.png
png(filename = createNewFileName(pattern="myplot"))
hist(rnorm(100))
dev.off()
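If you only need progressive numbers within one R session, rather than checking which files already exist on disk, a hypothetical counter built from a closure is another option: the count lives in the enclosing environment of the returned function and goes up by one on each call. It resets when R restarts, which is the gap the file-based createNewFileName() approach covers.

```r
# Hypothetical helper: returns a function yielding dummy-01.wmf, dummy-02.wmf, ...
make_namer <- function(prefix = "dummy", extension = ".wmf") {
  count <- 0
  function() {
    count <<- count + 1  # updates `count` in the enclosing environment
    sprintf("%s-%02d%s", prefix, count, extension)
  }
}

next_name <- make_namer()
next_name()  # "dummy-01.wmf"
next_name()  # "dummy-02.wmf"
```

In the question's code this would replace the random number, e.g. dev.copy(win.metafile, next_name()) followed by dev.off() (win.metafile is only available on Windows).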
If you are printing many plots, you can do something like
png("plot-%02d.png")
plot(1)
plot(1)
plot(1)
dev.off()
This will create three files "plot-01.png", "plot-02.png", "plot-03.png"
The file name you specify can contain an sprintf-like format into which the index of the plot is substituted. Note that counting is reset when you open a new graphics device, so all calls to plot() need to be made before calling dev.off().
Note, however, that this method does not check which files already exist: it always restarts counting at 1. Also, there is no way to make the first index anything other than 1.
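According to ?png, the page number is substituted when the file name contains a C integer format such as %02d, so you can preview the names a template will produce by calling sprintf() on it yourself. The first template below is the one from the answer; the %03d one is only an illustration of wider padding.

```r
# %02d: the plot number, zero-padded to two digits
template <- "plot-%02d.png"
sprintf(template, 1:3)  # "plot-01.png" "plot-02.png" "plot-03.png"

# %03d pads to three digits instead
sprintf("plot-%03d.png", c(1, 12))  # "plot-001.png" "plot-012.png"
```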