I am new to R and am having a lot of trouble with the following:
I have to read in over 200 datasets and I want to do this automatically. I wrote some code that works perfectly for Rdata extensions but if I try it for SAS-files it always blocks...
library(haven)  # provides read_sas()
path <- "road"
# I make a list of all the different paths of all the files in my folder
File_pathnames <- list.files(path = path, pattern = "\\.sas7bdat$", full.names = TRUE)
# I create an empty list
list.data<-list()
# I try to run a loop to load all the SAS files:
for (i in seq_along(File_pathnames)) {
  list.data[[i]] <- read_sas(File_pathnames[i])
}
Problem: it does not load the tables into my global environment (when I used the .RData files, I used the load function and all the data appeared in the global environment). How can I solve this?
Many thanks!
Actually, your data ARE in the global environment, as elements of list.data (check list.data[[1]], list.data[[2]], ...)
The issue you have is linked to the fact that load loads an object in the environment using the name it had when it was saved. As an example
x <- 10
save(x, file='tmp')
rm(x)
x
load('tmp')
x
This saves x and reloads it under the same name, whereas read_sas only returns the data, which you have to assign to a variable yourself.
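A small sketch of this behaviour, using a temporary file (the path and the variable name loaded_names are only for illustration): load restores x under its saved name and returns that name as a character vector, which is one way to find out what a file contained.

```r
x <- 10
tmp <- tempfile(fileext = ".RData")  # temporary file, for illustration only
save(x, file = tmp)
rm(x)

# load() puts the object back under its original name and
# invisibly returns the names of everything it restored
loaded_names <- load(tmp)
print(loaded_names)
print(x)
```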
If you want to assign specifically each data set, you have to define a name for each of them and assign the data. Your loop would look like
for (i in seq_along(File_pathnames)) {
  namei <- paste0("name", i)
  data <- read_sas(File_pathnames[i])
  assign(namei, data)
}
and your data would be stored in "name1", "name2", ...
You should then assign each SAS file read from File_pathnames[i] to an object named FilenamesS[i] (FilenamesS is assumed here to be a character vector holding one object name per file). Try
for (i in seq_along(File_pathnames)) {
  data <- read_sas(File_pathnames[i])
  assign(FilenamesS[i], data)
}
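If the aim is mainly to have one object per file in the global environment, one possible alternative to calling assign in a loop is to name the list elements after the files and then copy them over with list2env. The sketch below uses small csv files written to a temporary directory so that it runs without any SAS data; with real SAS files, read.csv would be replaced by read_sas. The directory and the file names first.csv and second.csv are made up for the example.

```r
# Illustration data: two small csv files in a temporary directory
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
write.csv(data.frame(v = 1:3), file.path(demo_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(v = 4:6), file.path(demo_dir, "second.csv"), row.names = FALSE)

File_pathnames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list and name each element after its file
list.data <- lapply(File_pathnames, read.csv)
names(list.data) <- tools::file_path_sans_ext(basename(File_pathnames))

# Copy each list element into the global environment under its name
invisible(list2env(list.data, envir = globalenv()))
```

After this, first and second exist as ordinary data frames, and the same data remain available as list.data$first and list.data$second.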
Related
Is there a way to load a csv file in R and define the variable automatically from the filename? So, if you have a csv file called 'hello', can I load it in R and have the data frame/variable created without naming it myself?
So, rather than naming hello in the load step, as in hello = read("filepath/hello"), we would just call read("filepath/hello") together with a command that creates a variable named after the file (hello in this example).
Depending on why you would like to do this I would offer you another solution:
I suppose your problem is that you have a big folder with a lot of csv files and you would like to load them all and give the variables the name of the csv file without typing everything manually.
Then you can run
> setwd("C:/Users/Testuser/testfiles")
> file_names <- list.files()
> file_names
[1] "rest" "test1.txt" "test2.csv" "test3.csv"
Here, the directory passed to setwd is the one where all your csv files are stored.
If other files are stored there too and you only want the csv files, filter the names with a regular expression:
> file_names_csv <- file_names[grepl("\\.csv$", file_names)]
> file_names_csv
[1] "test2.csv" "test3.csv"
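One detail worth knowing when filtering by extension: the pattern is a regular expression, so an unescaped dot matches any character and the pattern may match anywhere in the name. A short sketch with made-up file names shows the difference between a loose pattern and one that is escaped and anchored to the end of the name:

```r
# Hypothetical file names, chosen to expose the difference
candidates <- c("test2.csv", "notes_csv_old.txt", "test3.csv", "test3.csv.bak")

# Unescaped dot, no anchor: any character followed by "csv", anywhere
loose <- candidates[grepl(".csv", candidates)]

# Escaped dot, anchored with $: a literal ".csv" at the very end
strict <- candidates[grepl("\\.csv$", candidates)]

print(loose)
print(strict)
```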
Now we load them with a for loop and assign each one to a variable named after the corresponding csv file:
for (name in file_names_csv) {
  assign(name, read.csv(file = name))
}
And we have
> test2.csv
test
1 1234
> test3.csv
test
1 2323
You can also strip the .csv extension from the names (the full file name is still needed to read the file itself):
> file_names_csv <- sub("\\.csv$", "", file_names_csv)
> file_names_csv
[1] "test2" "test3"
So basically you have exactly what you have asked for without using global variables.
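Putting the pieces together, a minimal sketch could read each file by its full path while naming the object after the file without its extension. It assumes a temporary directory with two made-up files, and it assigns into a separate environment (called target here, a name chosen for the example) rather than the global one, so nothing existing gets overwritten.

```r
# Illustration data mirroring the example above
csv_dir <- tempfile("csvdemo_")
dir.create(csv_dir)
write.csv(data.frame(test = 1234), file.path(csv_dir, "test2.csv"), row.names = FALSE)
write.csv(data.frame(test = 2323), file.path(csv_dir, "test3.csv"), row.names = FALSE)

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Separate environment that will hold one object per file
target <- new.env()
for (f in files) {
  obj_name <- tools::file_path_sans_ext(basename(f))  # "test2", "test3"
  assign(obj_name, read.csv(f), envir = target)
}

print(ls(target))
```

Replacing target with globalenv() would create the variables in the global environment instead.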
I would advise you not to do this in any real-world scenario, but if it helps you understand the concepts, here are the important ingredients (this is not a complete solution).
<<- is the superassignment operator. It assigns in the enclosing environment, which in the following case is the global environment:
rm(hello) # just in case, ignore warning if there is any
dont <- function() {
  hello <<- 42
}
print(hello) # error: object 'hello' not found
dont()
print(hello) # prints 42
So you can define values in the enclosing environment within a function without a return value.
The name of that variable does not have to be fixed (as hello in the example above) but can depend on an argument to that function as in
dontdothis <- function(name){
eval(parse(text = paste0(name, " <<- 42")))
}
dontdothis("frederik")
print(frederik * 2)
You will need to add the file operations and some small detail but that is how one could do it. You may want to google for namespaces and environments and assignment operators in R to get a better understanding of the details in there.
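For the variable-name case, a sketch that avoids eval(parse(...)) is to call assign with an explicit target environment. The helper name set_global is invented for this example, and the same caution about creating global variables from inside functions applies.

```r
# Hypothetical helper: create a global variable whose name is given as a string
set_global <- function(name, value) {
  assign(name, value, envir = globalenv())
}

set_global("frederik", 42)
print(frederik * 2)
```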
Worthwhile short read to distinguish between global environment and enclosing environment: Why is using `<<-` frowned upon and how can I avoid it?
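To make the distinction concrete, here is a small sketch (make_counter is a made-up example) in which the enclosing environment is not the global one. <<- updates count in the environment where the inner function was created, and no count variable is created in the global environment.

```r
make_counter <- function() {
  count <- 0              # lives in the environment of this call
  function() {
    count <<- count + 1   # modifies the enclosing 'count', not a global one
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()
print(c(first, second))

# Check whether a global variable called 'count' was created
print(exists("count", envir = globalenv(), inherits = FALSE))
```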
I am repeatedly applying a function to read and process a bunch of csv files. Each time it runs, the function creates a data frame (this.csv.data) and uses save() to write it to a .RData file with a unique name. Problem is, later when I read these .RData files using load(), the loaded variable names are not unique, because each one loads with the name this.csv.data....
I'd like to save them with unique tags so that they come out properly named when I load() them. I've created the following code to illustrate.
this.csv.data = list(data=c(1:9), unique_tag = "some_unique_tag")
assign(this.csv.data$unique_tag,this.csv.data$data)
# I want to save the data,
# with variable name of <unique_tag>,
# in a file named <unique_tag>.RData
saved_file_name <- paste(this.csv.data$unique_tag,"RData",sep=".")
save(get(this.csv.data$unique_tag), saved_file_name)
but the last line returns:
"Error in save(get(this_unique_tag), file = data_tag) :
object ‘get(this_unique_tag)’ not found"
even though the following returns the data just fine:
get(this.csv.data$unique_tag)
Just name the arguments you use. With your code the following works fine:
save(list = this.csv.data$unique_tag, file=saved_file_name)
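The reason is that save expects the names of the objects to store, either as unquoted names in ... or as a character vector in list; an expression such as get(...) is not treated as a value. A round-trip sketch, writing to a temporary directory (the file location is an assumption made for the example):

```r
this.csv.data <- list(data = 1:9, unique_tag = "some_unique_tag")

# Create a variable whose name is the tag
assign(this.csv.data$unique_tag, this.csv.data$data)

saved_file_name <- file.path(tempdir(),
                             paste(this.csv.data$unique_tag, "RData", sep = "."))

# list = takes the object NAMES as strings; file = must be named
save(list = this.csv.data$unique_tag, file = saved_file_name)

# Remove the variable and load it back: it returns under the tag name
rm(list = this.csv.data$unique_tag)
loaded_names <- load(saved_file_name)
print(loaded_names)
```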
My preference is to avoid the name in the RData file on load:
obj = local(get(load('myfile.RData')))
This way you can load various RData files and name the objects whatever you want, or store them in a list etc.
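A slightly more explicit variant of the same idea, as a sketch: load each file into a throwaway environment and return the single object it contains. The helper name load_object and the temporary files are made up for the example, and the helper assumes each file holds exactly one object.

```r
# Hypothetical helper: return the single object stored in an .RData file
load_object <- function(file) {
  env <- new.env()
  obj_names <- load(file, envir = env)
  stopifnot(length(obj_names) == 1)  # assumption: one object per file
  env[[obj_names]]
}

# Illustration: three files that all contain an object called this.csv.data
files <- character(0)
for (i in 1:3) {
  this.csv.data <- data.frame(id = i)
  f <- tempfile(fileext = ".RData")
  save(this.csv.data, file = f)
  files <- c(files, f)
}

# The name clash no longer matters: each object becomes a list element
all_data <- lapply(files, load_object)
print(length(all_data))
```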
You really should use saveRDS/readRDS to serialize a single object.
save and load are meant for one or more named objects, up to a whole workspace: load always restores them under the names they were saved with.
saveRDS(this.csv.data, saved_file_name)
# later
mydata <- readRDS(saved_file_name)
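A self-contained round trip as a sketch (the .rds extension and the temporary directory are choices made for the example): because readRDS returns the object instead of restoring a name, the result can be assigned to any variable.

```r
this.csv.data <- list(data = 1:9, unique_tag = "some_unique_tag")

rds_file <- file.path(tempdir(), paste0(this.csv.data$unique_tag, ".rds"))

# Store only the data; no variable name is written to the file
saveRDS(this.csv.data$data, rds_file)

# later: pick whatever name is convenient
some_other_name <- readRDS(rds_file)
print(some_other_name)
```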
You can use save.image, which saves the entire workspace:
save.image("myfile.RData")
This worked for me:
env <- new.env()
env[[varname]] <- object_to_save
save(list=c(varname), envir=env, file='out.Rda')
You could probably do it without a new env (but I didn't try this):
.GlobalEnv[[varname]] <- object_to_save
save(list=c(varname), envir=.GlobalEnv, file='out.Rda')
You might even be able to remove the envir argument.
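The new-environment approach can be wrapped in a small function, sketched here with an invented name save_as, so that the caller's workspace is never touched. The check at the end loads the file into a fresh environment to confirm the stored name.

```r
# Hypothetical helper: save 'object' under the name 'varname' in 'file'
save_as <- function(object, varname, file) {
  env <- new.env()
  env[[varname]] <- object
  save(list = varname, envir = env, file = file)
  invisible(file)
}

out <- tempfile(fileext = ".Rda")  # temporary file, for illustration only
save_as(1:9, "some_unique_tag", out)

# Verify: the file contains one object with the requested name
check_env <- new.env()
restored <- load(out, envir = check_env)
print(restored)
```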
I have thousands of hdf files in a folder. Is there a way to create a loop to read all of the hdf files in that folder and write some specific data to another file?
I read the first file in the folder using the code below:
mydata <- h5read("/path to file/name of the file.he5", "/HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily")
But I have 1686 more files in the folder, and it is not practical to read them one by one. I think I need to write a for loop to read all files in the folder.
I found some code that lists the txt files in a folder and then reads all of them:
nm <- list.files(path="path/to/file")
do.call(rbind, lapply(nm, function(x) read.table(file=x)[, 2]))
I tried to change the code as seen below:
nm <- list.files(path="path/to/file")
do.call(rbind, lapply(nm, function(x) h5read(file=x)[, 2]))
But the error message says:
Error in h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
Error in h5checktypeOrOpenLoc(). Cannot open file. File 'D:\path to file\name of the file.he5' does not exist.
What should I do in that situation?
If you are not bound to a specific technology, you may want to take a look at HDFql. Using HDFql in R, your issue can be solved as follows (for the sake of this example, assume that (1) dataset /HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily exists in all the HDF5 files stored in the directory, (2) it has one dimension (size 1024), and (3) is of data type integer):
# load HDFql R wrapper (make sure it can be found by the R interpreter)
source("HDFql.R")
# create variable "values" and initialize it
values <- array(dim = c(1024))
for(x in 1:1024)
{
values[x] <- as.integer(0)
}
# show (i.e. get) files stored in directory "/path/to/hdf5/files" and populate HDFql default cursor with it
hdfql_execute("SHOW FILE /path/to/hdf5/files")
# iterate HDFql default cursor
while(hdfql_cursor_next() == HDFQL_SUCCESS)
{
file_name <- hdfql_cursor_get_char()
# select (i.e. read) data from dataset "/HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily" and populate variable "values" with it
hdfql_execute(paste("SELECT FROM", file_name, "\"/HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily\" INTO MEMORY", hdfql_variable_transient_register(values)))
# display values stored in variable "values"
for(x in 1:1024)
{
print(values[x])
}
}
Additional examples on how to read datasets using HDFql can be found in the quick start guide and reference manual.
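If you would rather stay with h5read, a likely cause of the "does not exist" error is that list.files returns bare file names unless full.names = TRUE is set, so the files are looked up relative to the current working directory; the call in the question also omits the dataset name. The sketch below shows the looping pattern with a made-up helper read_all that takes the reader as an argument. It is demonstrated on small text files so that it runs without any HDF5 library. For the real data, the reader would presumably be something like function(f) h5read(f, "/HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily"), which is not tested here.

```r
# Hypothetical helper: apply 'reader' to every matching file in 'dir'
read_all <- function(dir, pattern, reader) {
  # full.names = TRUE keeps the directory in each path
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  results <- lapply(files, reader)
  names(results) <- basename(files)
  results
}

# Illustration data: two small text files in a temporary directory
demo_dir <- tempfile("swe_")
dir.create(demo_dir)
writeLines("1 2 3", file.path(demo_dir, "day1.txt"))
writeLines("4 5 6", file.path(demo_dir, "day2.txt"))

values <- read_all(demo_dir, "\\.txt$", function(f) scan(f, quiet = TRUE))
print(names(values))
```

Each element of values can then be processed or written out in turn, for example inside a loop over names(values).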
I am using R version 3.03 on a Windows 7 OS and am trying to solve a problem. I am not sure if this is just me being stupid or if this is really a problem with my version of R.
Initial problem: I have a folder with 300+ csv files and I need to write a function that reads in a user-specified number of files. So my idea was to use the list.files function to give me a list of the csv files and then choose from this list rather than having to reformat the user input to match the csv filenames.
pm <- function(directory, id = 1:332) {
setwd("C:/Users/cw/Documents")
setwd(directory)
x <- id[1]
x
files <- list.files()
#for (x in 1:length(id))
#data[i] <- read.csv(files[x], header=T)
#}
}
pm("specdata", 25:30)
So first I set the wd which works like a charm. Then I wanted to set x equal to the first element of id to obtain a starting point. Next I wanted to build a vector 'files' to choose the filenames from.
Real problem: if I run the pm function, R tells me that the object files does not exist. So am I doing something wrong (obviously I am), and if so, what?
Thanks very much,
C
files is just a local variable that you declare inside your pm function. To use the results in your calling code, you should assign it to a variable (I used filelist here):
filelist <- pm("specdata", 25:30)
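In other words, files exists only while pm is running, and its value survives only if the function returns it. As a sketch of where this could lead, the made-up function read_selected below avoids setwd altogether and returns the selected data frames as a list. The temporary directory and the zero-padded file names are assumptions made so that the example runs on its own.

```r
# Hypothetical variant of pm(): return the data instead of relying on side effects
read_selected <- function(directory, id) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  lapply(files[id], read.csv)  # the list is the function's return value
}

# Illustration data: five small csv files named 001.csv ... 005.csv
demo_dir <- tempfile("specdata_")
dir.create(demo_dir)
for (i in 1:5) {
  write.csv(data.frame(monitor = i),
            file.path(demo_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}

selected <- read_selected(demo_dir, 2:3)
print(length(selected))
```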