Load csv file in R

Is there a way to load a csv file in R and define the variable automatically from the filename? So, if you have a csv file called 'hello', can I load it in R and have the data frame created without naming it myself?
In other words, rather than naming hello in the load call, as in hello <- read.csv("filepath/hello"), could I call read.csv("filepath/hello") together with a command that creates a variable named after the file (hello in this example)?

Depending on why you want to do this, I would offer another solution.
I suppose your problem is that you have a big folder with many csv files and you would like to load them all, giving each variable the name of its csv file, without typing everything manually.
Then you can run
> setwd("C:/Users/Testuser/testfiles")
> file_names <- list.files()
> file_names
[1] "rest" "test1.txt" "test2.csv" "test3.csv"
where the path passed to setwd is the folder in which all your csv files are stored.
If other files are stored there as well and you only want the csv files, we can filter them with a regular expression:
> file_names_csv <- file_names[grepl("\\.csv$", file_names)]
> file_names_csv
[1] "test2.csv" "test3.csv"
Now we load them in a for loop and assign each data frame to a variable named after the corresponding csv file:
for (name in file_names_csv) {
  assign(name, read.csv(file = name))
}
And we have
> test2.csv
test
1 1234
> test3.csv
test
1 2323
You can also gsub the ".csv" away before you load the data:
> file_names_csv <- gsub("\\.csv$", "", file_names_csv)
> file_names_csv
[1] "test2" "test3"
(in that case, remember that read.csv still needs the full file name, e.g. read.csv(paste0(name, ".csv")), as in the sketch below).
So basically you have exactly what you asked for, without typing the variable names by hand.

I would advise you not to do this in any real-world scenario, but if it helps with understanding the concepts, here is not a complete solution but the important ingredients.
<<- is the superassignment operator: it assigns in the enclosing environment, which in the following case is the global environment:
rm(hello)      # just in case; ignore the warning if there is one
dont <- function() {
  hello <<- 42
}
print(hello)   # error: object 'hello' not found
dont()
print(hello)   # 42
So a function can define values in the enclosing environment without returning anything.
The name of that variable does not have to be fixed (as hello in the example above) but can depend on an argument of that function, as in:
dontdothis <- function(name) {
  eval(parse(text = paste0(name, " <<- 42")))
}
dontdothis("frederik")
print(frederik * 2)
You will need to add the file operations and a few small details, but that is how one could do it. You may want to search for namespaces, environments, and assignment operators in R to get a better understanding of the details involved.
A worthwhile short read on distinguishing the global environment from the enclosing environment: Why is using `<<-` frowned upon and how can I avoid it?
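For completeness, the file operation mentioned above can also be done with assign and an explicit environment instead of parse/eval; a minimal sketch, where the helper name load_into_global and the .csv extension are my own assumptions:
load_into_global <- function(path) {
  var_name <- sub("\\.csv$", "", basename(path))     # "hello" from "filepath/hello.csv"
  assign(var_name, read.csv(path), envir = globalenv())
}
load_into_global("filepath/hello.csv")               # afterwards `hello` exists in the global environment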

Related

Save .RData in a different directory

I load my files (.RData) from a particular folder, and I created a subfolder to save some samples and subsets. I want to save these elements in the subfolder, but they don't share a common name structure because I have multiple datasets (for example it cannot be sub1, sub2, etc.; I have to write try1, full_sample, sub_2021 and so on).
I tried the following:
subsets_samples <- file.path <-("/Volumes/WD_BLACK/Merge/SAMPLES_SUBSETS")
fname <- file.path(subsets_samples, ".RData")
save(mydata, file=fname)
But obviously there is a problem with the saving part. My goal is to have something like:
save(mydata, file = "newname")
with the .RData extension from fname added automatically.
I saw some answers with loops and so on but I don't really understand the process, sorry.
Thanks!
The problem with file.path is that it places a separator (e.g., /) between each of its elements, so for the actual file name you have to use paste0 in addition:
# If I understand you correctly, you want the varying part (try1, full_sample, sub_2021 and so on)
# in the file name; define it somewhere in your loop/script
iteration <- "full_sample"
fname <- file.path("/Volumes", "WD_BLACK", "Merge", "SAMPLES_SUBSETS", paste0(iteration, ".Rds"))
Additionally, I would suggest using saveRDS instead of save, since it is the appropriate function for saving a single object.
saveRDS(mydata, file = fname)
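If there are several such objects to write, the same pattern fits in a loop; a sketch, assuming the subsets are collected in a named list (the names try1, full_sample and sub_2021 are just the examples from the question):
subsets <- list(try1 = try1, full_sample = full_sample, sub_2021 = sub_2021)
for (nm in names(subsets)) {
  fname <- file.path("/Volumes", "WD_BLACK", "Merge", "SAMPLES_SUBSETS", paste0(nm, ".Rds"))
  saveRDS(subsets[[nm]], file = fname)
}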

Read many .sas7bdat files via a loop in R

I am a noob in R and experience a lot of trouble with the following:
I have to read in over 200 datasets and I want to do this automatically. I wrote some code that works perfectly for Rdata extensions but if I try it for SAS-files it always blocks...
path= "road"
# I make a list of all the different paths of all the files in my folder
File_pathnames <- list.files (path= Road, pattern = "*.sas7bdat", full.names=T)
# I create an empty list
list.data <- list()
# I try to run a loop to load all the SAS files:
for (i in 1:length(File_pathnames)) {
  list.data[[i]] <- read_sas(File_pathnames[i])
}
Problem: it does not load the tables into my global environment (when I used the Rdata files I used the load function and all the data appeared in the global environment). How can I solve this?
many thanks!
Actually, your data ARE in the global environment, as elements of list.data (check list.data[[1]], list.data[[2]], ...)
The issue you have is linked to the fact that load loads an object in the environment using the name it had when it was saved. As an example
x <- 10
save(x, file = 'tmp')
rm(x)
x             # Error: object 'x' not found
load('tmp')
x             # 10 again, under its original name
This saves x and reloads it under its original name, whereas read_sas only returns the data, which you then have to assign to a variable yourself.
If you want each data set assigned to its own variable, you have to define a name for each of them and assign the data. Your loop would look like:
for (i in 1:length(File_pathnames)) {
  namei <- paste0("name", i)
  data <- read_sas(File_pathnames[i])
  assign(namei, data)
}
and your data would be stored in "name1", "name2", ...
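If you would rather name each object after its file, a variation of the same loop (a sketch; it assumes read_sas from the haven package and file names ending in .sas7bdat):
library(haven)
for (i in 1:length(File_pathnames)) {
  obj_name <- sub("\\.sas7bdat$", "", basename(File_pathnames[i]))  # e.g. "survey2020" from ".../survey2020.sas7bdat"
  assign(obj_name, read_sas(File_pathnames[i]))
}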
You could then assign each SAS file read from File_pathnames[i] to an object named FilenamesS[i]. Try:
for (i in 1:length(File_pathnames)) {
  data <- read_sas(File_pathnames[i])
  assign(FilenamesS[i], data)
}
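Here FilenamesS is assumed to be a character vector holding one desired object name per file; it could for instance be derived from the paths themselves:
FilenamesS <- sub("\\.sas7bdat$", "", basename(File_pathnames))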

Why does "write.dat" (R) save data files within folders?

In order to conduct some analysis using a particular software, I am required to have separate ".dat" files for each participant, with each file named as the participant number, all saved in one directory.
I have tried to do this using the "write.dat" function in R (from the 'multiplex' package).
I have written a loop that outputs a ".dat" file for each participant in a dataset. I would like each file that is outputted to be named the participant number, and for them all to be stored in the same folder.
## Using write.dat
participants_ID <- unique(newdata$SJNB)
for (i in 1:length(participants_ID)) {
  data_list[[i]] <- newdata %>%
    filter(SJNB == participants_ID[i])
  write.dat(data_list[[i]], paste0("/Filepath/Directory/", participants_ID[i], ".dat"))
}
## Using write_csv this works perfectly:
participants_ID <- unique(newdata$SJNB)
for (i in 1:length(participants_ID)) {
  newdata %>%
    filter(SJNB == participants_ID[i]) %>%
    write_csv(paste0("/Filepath/Directory/", participants_ID[i], ".csv"), append = FALSE)
}
If I use the function "write_csv", this works perfectly (saving .csv files for each participant). However, if I use the function "write.dat" each participant file is saved inside a separate folder - the folder name is the participant number, and the file inside the folder is called "data_list[[i]]". In order to get all of the data_list files into the same directory, I then have to rename them which is time consuming.
I could theoretically output the files to .csv and then convert them to .dat, but I'm just intrigued to know if there's anything I could do differently to get the write.dat function to work the way I'm trying it :)
The documentation on write.dat is subminimal, but it would appear that you have confused a directory path with a file name. You have in effect created a directory named "/Filepath/Directory/participants_ID[i].dat", and that's where each output file is placed. That you cannot assign a name to the .dat file itself appears to be a defect in the package as supplied.
However, not all is lost. Inside your loop, replace your write.dat line with the following lines, or something similar (not tested):
Edit:
It occurs to me that there's a smoother solution, albeit using the dreaded eval. Again inside the loop (assuming participants_ID[i] is a character string):
eval(parse(text = paste0(participants_ID[i], ' <- data_list[[i]]')))
eval(parse(text = paste0('write.dat(', participants_ID[i], ', "/Filepath/Directory/")')))
Previous answer:
write.dat(data_list[[i]], "/Filepath/Directory/")
thecommand <- paste0('mv "/Filepath/Directory/data_list[[i]]" "/Filepath/Directory/', participants_ID[i], '.dat"')
system(thecommand)
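A more portable variant of that last step replaces the shell mv with R's own file.rename (a sketch; it assumes write.dat really does name the output file "data_list[[i]]", as described in the question):
write.dat(data_list[[i]], "/Filepath/Directory/")
file.rename(from = "/Filepath/Directory/data_list[[i]]",
            to   = paste0("/Filepath/Directory/", participants_ID[i], ".dat"))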

Reading a file in, then writing out a file with similar name in R

In R, I would like to read in data from a file, then do a bunch of stuff, then write out data to another file. I can do that. But I'd like to have the two files have similar names automatically.
e.g. if I create a file params1.R I can read it in with
source("c:\\personal\\consults\\ElwinWu\\params1.R")
then do a lot of stuff
then write out a resulting table with write.table and a filename similar to above, except with output1 instead of params1.
But I will be doing this with many different params files, and I can foresee careless mistakes where I forget to change the output file name to match the params file. Is there a way to automate this?
That is, set the number for output to match the number for params?
thanks
Peter
If your source file name always contains "params", which you want to change to "output", then you can easily do this with gsub:
source(file <- "c:\\personal\\consults\\ElwinWu\\params1.R")
### some stuff
write.table(youroutput, gsub("params","output",file) )
# Will write in "c:\\personal\\consults\\ElwinWu\\output1.R"
Edit:
Or to get .txt as filetype:
write.table(youroutput, gsub(".R",".txt",gsub("params","output",file)))
# Will output in c:\\personal\\consults\\ElwinWu\\output1.txt"
Edit2:
A loop over 20 params files would then be:
n <- 20  # number of files
for (i in 1:n) {
  source(file <- paste("c:\\personal\\consults\\ElwinWu\\params", i, ".R", sep = ""))
  ### some stuff
  write.table(youroutput, gsub(".R", ".txt", gsub("params", "output", file)))
}
If the idea is just to make sure that all the outputs go in the same directory as the input then try this:
source(file <- "c:\\personal\\consults\\ElwinWu\\params1.R")
old.dir <- setwd(dirname(file))
write.table(...whatever..., file = "output1.dat")
write.table(...whatever..., file = "output2.dat")
setwd(old.dir)
If you don't need to preserve the initial directory you can omit the last line.
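If you want the directory change undone even when one of the intermediate steps fails, the work can be wrapped in tryCatch with a finally clause (a sketch along the lines of the snippet above):
source(file <- "c:\\personal\\consults\\ElwinWu\\params1.R")
old.dir <- setwd(dirname(file))
tryCatch({
  write.table(youroutput, file = "output1.dat")
}, finally = setwd(old.dir))   # restores the original directory no matter what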

How to save() with a particular variable name

I am repeatedly applying a function to read and process a bunch of csv files. Each time it runs, the function creates a data frame (this.csv.data) and uses save() to write it to a .RData file with a unique name. Problem is, later when I read these .RData files using load(), the loaded variable names are not unique, because each one loads with the name this.csv.data....
I'd like to save them with unique tags so that they come out properly named when I load() them. I've created the following code to illustrate:
this.csv.data = list(data=c(1:9), unique_tag = "some_unique_tag")
assign(this.csv.data$unique_tag,this.csv.data$data)
# I want to save the data,
# with variable name of <unique_tag>,
# at a file named <unique_tag>.dat
saved_file_name <- paste(this.csv.data$unique_tag,"RData",sep=".")
save(get(this.csv.data$unique_tag), saved_file_name)
but the last line returns:
"Error in save(get(this_unique_tag), file = data_tag) :
object ‘get(this_unique_tag)’ not found"
even though the following returns the data just fine:
get(this.csv.data$unique_tag)
Just name the arguments you use. With your code the following works fine:
save(list = this.csv.data$unique_tag, file=saved_file_name)
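For context, the whole round trip with the question's own example values would then look like this (a sketch):
this.csv.data <- list(data = c(1:9), unique_tag = "some_unique_tag")
assign(this.csv.data$unique_tag, this.csv.data$data)
saved_file_name <- paste(this.csv.data$unique_tag, "RData", sep = ".")
save(list = this.csv.data$unique_tag, file = saved_file_name)  # writes some_unique_tag.RData

rm(some_unique_tag)
load(saved_file_name)  # the object comes back under the name some_unique_tag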
My preference is to avoid depending on the stored name when loading an RData file:
obj = local(get(load('myfile.RData')))
This way you can load various RData files and name the objects whatever you want, or store them in a list etc.
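For example, several .RData files that each contain a single object can be collected into a named list with that idiom (a sketch; the file names are made up):
files <- c("a.RData", "b.RData")
objs <- lapply(files, function(f) local(get(load(f))))
names(objs) <- tools::file_path_sans_ext(basename(files))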
You really should use saveRDS/readRDS to serialize your objects.
save and load are meant for saving and restoring whole sets of named objects (up to an entire environment), which is why the original names come back with them.
saveRDS(this.csv.data, saved_file_name)
# later
mydata <- readRDS(saved_file_name)
You can also use
save.image("myfile.RData")
but note that this saves your entire workspace, not just one object under a chosen name.
This worked for me:
env <- new.env()
env[[varname]] <- object_to_save
save(list=c(varname), envir=env, file='out.Rda')
You could probably do it without a new env (but I didn't try this):
.GlobalEnv[[varname]] <- object_to_save
save(list=c(varname), envir=.GlobalEnv, file='out.Rda')
You might even be able to drop the envir argument.
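A concrete usage of the new-environment variant (the variable name and data here are made up):
varname <- "my_result"
object_to_save <- data.frame(x = 1:3)
env <- new.env()
env[[varname]] <- object_to_save
save(list = c(varname), envir = env, file = 'out.Rda')
load('out.Rda')  # restores the data frame under the name my_result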
