This is calling for some "tricky R", but this time it's beyond my fantasy :-) I need to save() an object whose name is in the variable var. I tried:
save(get(var), file = ofn)
# Error in save(get(var), file = ofn) : object ‘get(var)’ not found
save(eval(parse(text = var)), file = ofn)
# Error in save(eval(parse(text = var)), file = ofn) :
# object ‘eval(parse(text = var))’ not found
both of which fail, unfortunatelly. How would you solve this?
Use the list argument. This saves x in the file x.RData. (The list argument can specify a vector of names if you need to save more than one at a time.)
x <- 3
name.of.x <- "x"
save(list = name.of.x, file = "x.RData")
# loading x.RData to check that it worked
rm(x)
load("x.RData")
x
## [1] 3
Note
Regarding the first attempt in the question which attempts to use get we need to specify the name rather than its value so that attempt could use do.call converting the character name to a name class object.
do.call("save", list(as.name(name.of.x), file = "x.RData"))
Regarding the second attempt in the question which uses eval, to do that write out the save, substitute in its name as a name class object and then evaluate it.
eval(substitute(save(Name, file = "x.RData"), list(Name = as.name(name.of.x))))
If it's just one object, you can use saveRDS:
a<-1:4
var<-"a"
saveRDS(get(var),file="test.R")
readRDS(file="test.R")
[1] 1 2 3 4
Related
According to the article https://diskframe.com/articles/ingesting-data.html a good use case for inmapfn as part of csv_to_disk_frame(...) is for date conversion. In my data I know the name of the date column at runtime and would like to feed in the date to a convert at read in time function. One issue I am having is that it doesn't seem any additional parameters can be passed into the inmapfn argument beyond the chunk itself. I can't use a hardcoded variable at runtime as the name of the column isn't known until runtime.
To clarify the issue is that the inmapfn seems to run in its own environment to prevent any data races/other parallelisation issues but I know the variable won't be changed so I am hoping there is someway to override this as I can make sure that this is safe.
I know the function I am calling works when called on an arbitrary dataframe.
I have provided a reproducible example below.
library(tidyverse)
library(disk.frame)
setup_disk.frame()
a <- tribble(~dates, ~val,
"09feb2021", 2,
"21feb2012", 2,
"09mar2013", 3,
"20apr2021", 4,
)
write_csv(a, "a.csv")
dates_col <- "dates"
tmp.df <- csv_to_disk.frame(
"a.csv",
outdir = file.path(tempdir(), "tmp.df"),
in_chunk_size = 1L,
inmapfn = function(chunk) {
chunk[, sdate := as.Date(do.call(`$`, list(chunk,dates_col)), "%d%b%Y")]
}
)
#> -----------------------------------------------------
#> Stage 1 of 2: splitting the file a.csv into smallers files:
#> Destination: C:\Users\joelk\AppData\Local\Temp\RtmpcFBBkr\file4a1876e87bf5
#> -----------------------------------------------------
#> Stage 1 of 2 took: 0.020s elapsed (0.000s cpu)
#> -----------------------------------------------------
#> Stage 2 of 2: Converting the smaller files into disk.frame
#> -----------------------------------------------------
#> csv_to_disk.frame: Reading multiple input files.
#> Please use `colClasses = ` to set column types to minimize the chance of a failed read
#> =================================================
#>
#> -----------------------------------------------------
#> -- Converting CSVs to disk.frame -- Stage 1 of 2:
#>
#> Converting 5 CSVs to 6 disk.frames each consisting of 6 chunks
#>
#> Error in do.call(`$`, list(chunk, dates_col)): object 'dates_col' not found
You can experiment with different backend and chunk_reader arguments. For example, if you set the backend to readr, the inmapfn user defined function will have access to previously defined variables. Furthermore, readr will do column type guessing
and will automatically impute Date type columns if it recognizes the string format as a date (in your example data it wouldn't recognize that as a date type, however).
If you don't want to use the readr backend for performance reasons, then I would ask if your example correctly represents your actual scenario? I'm not seeing the need to pass in the date column as a variable in the example you provided.
There is a working solution in the Just-in-time transformation section of the link you provided, and I'm not seeing any added complexities between that example and yours.
If you really need to use the default backend and chunk_reader plan AND you really need to send the inmapfn function a previously defined variable, you can wrap the the csv_to_disk.frame call in a wrapper function:
library(disk.frame)
setup_disk.frame()
df <- tribble(~dates, ~val,
"09feb2021", 2,
"21feb2012", 2,
"09mar2013", 3,
"20apr2021", 4,
)
write.csv(df, file.path(tempdir(), "df.csv"), row.names = FALSE)
wrap_csv_to_disk <- function(col) {
my_date_col <- col
csv_to_disk.frame(
file.path(tempdir(), "df.csv"),
in_chunk_size = 1L,
inmapfn = function(chunk, dates = my_date_col) {
chunk[, dates] <- lubridate::dmy(chunk[[dates]])
chunk
})
}
date_col <- "dates"
df_disk_frame <- wrap_csv_to_disk(date_col)
#> str(collect(df_disk_frame)$dates)
# Date[1:4], format: "2021-02-09" "2012-02-21" "2013-03-09" "2021-04-20"
I see. For a work around would it be possible to do something like this?
date_var = knonw_at_runtime()
saveRDS(date_var, "some/path/date_var.rds")
a = csv_to_disk.frame(files, inmapfn = function(chunk) {
date_var = readRDS("some/path/date_var.rds")
# do the rest
})
I think letting inmapfn have other options is doable see https://github.com/xiaodaigh/disk.frame/issues/377 for tracking
I was working through an example from the Hadley Wickham's "The Joy of Functional Programming (for Data Science)" (available on youtube) lecture on purrr and wanted to add in error-recovery using possibly() to handle any zip files that fail to be read.
A bit of his code;
paths <- dir_ls("NSFopen_Alldata/", glob= ".zip")
files<- map_dfr(paths, ~ tibble(path=.x, files= unzip(.x, list = TRUE)$Name))
Adding error recovery with possibly();
unzip_safe <- possibly(.f= unzip, otherwise = NA)
files<- map_dfr(paths, ~ tibble(path=.x, files= unzip_safe(.x, list = TRUE)$Name))
I get the following error: $ operator is invalid for atomic vectors.
Is this because possibly is in a tibble?
Files that fail return an NA and you are trying to extract $Name from it which returns an error. See
NA$NAme
Error in NA$NAme : $ operator is invalid for atomic vectors
Extract $Name from the successful files in possibly itself. Try :
library(purrr)
unzip_safe <- possibly(.f= ~unzip(., list = TRUE)$Name, otherwise = NA)
files <- map_dfr(paths, ~ tibble(path=.x, files = unzip_safe(.x))
Is there a way to document variables in R with their date of creation/modification?
Because I have .RData files that I used fluency, but sometimes need update the values based in how old is it.
Try file.info():
To get the last modified time:
file.info('path/to/file.Rdata')$mtime
If you want to know when individual variables within your .RData object were last defined by R the only thing I know to do would be to manually add that metadata in something like this:
a = 3
attr(a, 'time_defined') = Sys.time()
b = 4
attr(b, 'time_defined') = Sys.time()
save(a, b, file = 'my_data.RData')
# ... later on ...
load('my_data.RData')
if(difftime(attr(a, 'time_defined'), Sys.time(), units = 'days') > 10) # do the following if more than 10 days old
I have a zoo object, prices, which, when I type class(prices), it returns “zoo.” I then create a file using:
write.zoo(prices, file = “foo”, index.name = “time”)
The resulting files looks like this:
"time" "AAPL.Adjusted" “SHY.Adjusted"
2013-05-01 60.31 84.12
2013-05-02 61.16 84.11
2013-05-03 61.77 84.08
I then try and read this file with this statement:
myData <- read.zoo(“foo”)
and I get this error:
Error in read.zoo(“foo") :
index has bad entries at data rows: 1 2 3 4
I’ve tried a number of parameter settings and nothing seems to work. Help much appreciated.
Newbie
The file has a header line so try:
z <- read.zoo("foo", header = TRUE, check.names = FALSE)
The check.names part gives nicer looking column names but you could leave it out if that were not important.
I made a list, read the list into a for loop, do some calculations with it and export a modified dataframe to [1] "IAEA_C2_NoStdConditionResiduals1" [2] "IAEA_C2_EAstdResiduals2" ect. When I do View(IAEA_C2_NoStdConditionResiduals1) after the for loop then I get the following error message in the console: Error in print(IAEA_C2_NoStdConditionResiduals1) : object 'IAEA_C2_NoStdConditionResiduals1' not found, but I know it is there because RStudio tells me in its Environment view. So the question is: How can I access the saved data (in this assign construct) for further usage?
ResidualList = list(IAEA_C2_NoStdCondition = IAEA_C2_NoStdCondition,
IAEA_C2_EAstd = IAEA_C2_EAstd,
IAEA_C2_STstd = IAEA_C2_STstd,
IAEA_C2_Bothstd = IAEA_C2_Bothstd,
TIRI_I_NoStdCondition = TIRI_I_NoStdCondition,
TIRI_I_EAstd = TIRI_I_EAstd,
TIRI_I_STstd = TIRI_I_STstd,
TIRI_I_Bothstd = TIRI_I_Bothstd
)
C = 8
for(j in 1:C) {
#convert list Variable to string for later usage as Variable Name as unique identifier!!
SubNameString = names(ResidualList)[j]
SubNameString = paste0(SubNameString, "Residuals")
#print(SubNameString)
LoopVar = ResidualList[[j]]
LoopVar[ ,"F_corrected_normed"] = round(LoopVar[ ,"F_corrected_normed"] / mean(LoopVar[ ,"F_corrected_normed"]),
digit = 5
)
LoopVar[ ,"F_corrected_normed_error"] = round(LoopVar[ ,"F_corrected_normed_error"] / mean(LoopVar[ ,"F_corrected_normed_error"]),
digit = 5
)
assign(paste(SubNameString, j), LoopVar)
}
View(IAEA_C2_NoStdConditionResiduals1)
Not really a problem with assign and more with behavior of the paste function. This will build a variable name with a space in it:
assign(paste(SubNameString, j), LoopVar)
#simple example
> assign(paste("v", 1), "test")
> `v 1`
[1] "test"
,,,, so you need to get its value by putting backticks around its name so the space is not misinterpreted as a parse-able delimiter. See what happens when you type:
`IAEA_C2_NoStdCondition 1`
... and from here forward, use paste0 to avoid this problem.