I am reading some files into R with a "for loop" and naming them based on the looping index. I have used the assign function to dynamically name one data frame for each file being read, as follows:
for (i in 1:10){
assign(paste0("df",i),read_csv(paste0("file",i,".csv")))
}
This works fine. However, while inside the for loop, I also want to do some other operations on the data frame, but I don't know how to refer to it given that the name changes as the loop progresses.
I want to add to the loop something along the lines of:
names(paste0("df",i)) <- make.names(names(paste0("df",i)))
I want to execute the above command but I don't know how to properly pass the changing data frame name to the names() function.
Thanks in advance,
MR
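A minimal sketch of two common ways around this, not necessarily the only ones. It assumes toy CSV files written to a temporary directory (the column names are invented) and uses base read.csv so it runs without extra packages; read_csv from the question could be swapped in. Option 1 modifies an ordinary temporary variable before calling assign(); option 2 keeps everything in a named list, which avoids dynamic names altogether.

```r
# Toy setup (assumption): three small CSV files with names that need cleaning.
csv_dir <- tempdir()
for (i in 1:3) {
  toy <- data.frame(`my col` = i, `other col` = i * 2, check.names = FALSE)
  write.csv(toy, file.path(csv_dir, paste0("file", i, ".csv")), row.names = FALSE)
}

# Option 1: do the work on a plain variable, then assign() the finished result.
for (i in 1:3) {
  tmp_df <- read.csv(file.path(csv_dir, paste0("file", i, ".csv")),
                     check.names = FALSE)
  names(tmp_df) <- make.names(names(tmp_df))
  assign(paste0("df", i), tmp_df)
}

# Option 2: store the data frames in a named list instead of separate objects.
dfs <- list()
for (i in 1:3) {
  d <- read.csv(file.path(csv_dir, paste0("file", i, ".csv")),
                check.names = FALSE)
  names(d) <- make.names(names(d))
  dfs[[paste0("df", i)]] <- d
}
```

With the list, later steps can use dfs[["df2"]] or lapply(dfs, ...) rather than building object names from strings.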
Related
Sorry if this sounds a bit messy, I'm new to using R. I have a data frame named "AB13" (a postcode) and a list of all postcodes, which I imported from a .csv file. Is there any way I can take an element of that list and have the console display the data frame with that name? Say, take the 10th element of the list, which is AB13, and have it display AB13 (the data frame)?
I have tried using print, cat and multiple other print functions but I haven't been able to get anywhere.
You can do
get(your_list[[10]])
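As a small illustration of how get() behaves here, assuming an invented data frame and a short postcode vector (the real list would come from the imported .csv):

```r
# Hypothetical data: a data frame named after a postcode, plus a vector of postcodes.
AB13 <- data.frame(street = c("North Rd", "South Rd"), houses = c(12, 7))
postcodes <- c("AB10", "AB11", "AB13")

# get() returns the object whose name is stored in the string.
pc_df <- get(postcodes[3])
print(pc_df)

# get() raises an error for a name with no object behind it, so a guard can help.
has_ab10 <- exists(postcodes[1])
```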
I'm a new student of R. I have a directory containing Excel files and I need to make a summary in a data frame with the complete cases in each file. How can I do this? I tried the following code but it doesn't work. I appreciate your support.
Always begin with the steps required. You will need to do the following:
Read in your data
Clean up your data
Since you do not have any code shown, I will provide you with pseudo code.
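library(readxl)
# `path` is a placeholder for one Excel file; add other read_xls() arguments as needed
df <- read_xls(path)
# complete.cases() returns a logical vector, so use it to keep only the complete rows
df <- df[complete.cases(df), ]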
You'll want to do that for all of your files. You can use lapply once you are more advanced, and loop over your list.files() list of excel files.
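A rough sketch of that loop, under some assumptions: count_complete and its reader argument are hypothetical names, and the demonstration uses CSV files with read.csv so that it runs without readxl installed.

```r
# Count complete rows per file. `reader` is any function that takes a path and
# returns a data frame (an assumption, so the sketch is not tied to one package).
count_complete <- function(files, reader) {
  n <- vapply(files, function(f) sum(complete.cases(reader(f))), integer(1))
  data.frame(file = basename(files), nobs = unname(n))
}

# Toy demonstration with two CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "complete_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c("x", "y", NA)),
          file.path(demo_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(a = c(1, 2), b = c("x", "y")),
          file.path(demo_dir, "two.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
summary_df <- count_complete(files, read.csv)
print(summary_df)
```

For a directory of Excel files, the same function could presumably be called with a pattern such as "\\.xlsx?$" and readxl::read_excel as the reader.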
It should be very simple, but for now cannot figure it out. Say I create a generic dataframe with some data in a loop, let's call it df.
Now I want to assign a specific name to it and want to save it to specific destination. I generate two character variables - filename and file_destination and try to use the following in the script code:
assign(filename, df)
save(filename, file = file_destination)
Of course, it saves just the string containing the name to the file and not the actual data.
How do I save the data frame created via assign(filename, df)?
Try save(list=filename,file=file_destination). Also, use better names for your variables. filename for an object which is not a file name is very odd.
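A short sketch of the difference, with invented names (loop_df, object_name) and a temporary file as the destination. save(list = ...) takes the names of the objects as strings, which is why it picks up the data frame rather than the string itself.

```r
loop_df <- data.frame(x = 1:3, y = c("a", "b", "c"))   # stands in for the df from the loop
object_name <- "results_2020"                          # hypothetical object name
file_destination <- file.path(tempdir(), "results_2020.RData")

assign(object_name, loop_df)
# `list =` expects a character vector of object names to save.
save(list = object_name, file = file_destination)

# Check by loading into a fresh environment; load() returns the restored names.
check_env <- new.env()
loaded <- load(file_destination, envir = check_env)
```

If only one object per file is ever needed, saveRDS() and readRDS() avoid the naming issue, since the name is chosen when the file is read back.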
Put this as answer, to ensure other people find it easily.
I have a dataframe loaded successfully in R.
I would like to give the data in df to someone else in a quick and easy way, so that they can use it without having to load the original file into a data frame again.
Which command gives the whole contents of df (not just the str() output)?
You can save the file into a .RData using save or save.image, depending on your needs. First one will save specific objects while the latter will dump the whole workspace to a file. This method has the advantage of working on probably any R object.
Another option, as @user1945827 mentioned, is dput, which produces a text representation that can be parsed in another R session. This will not work for complex (like S4) objects.
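Both routes can be sketched roughly as follows, using an invented data frame and temporary files. The recipient environment only stands in for the other person's R session.

```r
my_data <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Route 1: binary .RData file written with save(), restored with load().
rdata_path <- file.path(tempdir(), "my_data.RData")
save(my_data, file = rdata_path)

recipient <- new.env()              # stands in for the other person's session
load(rdata_path, envir = recipient)

# Route 2: text representation written with dput(), read back with dget().
dput_path <- file.path(tempdir(), "my_data.R")
dput(my_data, file = dput_path)
from_text <- dget(dput_path)
```

Calling dput(my_data) with no file argument prints the same text to the console, which is handy for pasting into a message.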
I am writing an R script (not a function, just a collection of commands) that will read a csv file into a data frame. The CSV file is large and I don't want to read it every time I am running the script, if it has been already read. This is how I am checking if the variable exists:
if (!exists("df")) {
df <- read_csv(file = "./some_file.csv")
}
However, every time I run the script no matter whether df exists or not, the read_csv function runs.
What am I missing here? Should I specify where the df data frame should be searched for?
Edit: Here is a bit of context for what I am trying to achieve. Usually, I work interactively in R or RStudio. If I am reading a file, I read it, the data then sits in the global environment, and I play with my data. I was trying to put all my work in a script and add to it step by step. At the beginning of the script, I read this CSV file, which is about 11MB, and then start manipulating the data. However, as I add new lines to my script and want to test them, I don't want to read the CSV file again. It has already been read and the data frame is available in the global environment. That was the reason I put the call to the read_csv() function inside an if statement.
The problem is that despite the variable existing in the global environment, every time I run the script the read_csv() function is run, as if the if statement were ignored.
df is actually a function in the stats package which normally exists :-)
So basically, just choose a better variable name!
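A small sketch of why the name matters, assuming a toy CSV in a temporary directory and a made-up variable name (survey_data); read.csv replaces read_csv only to keep it free of extra packages. The loop simulates running the script twice.

```r
# stats::df is the density function of the F distribution, so "df" can be found
# even when no data frame called df has been created.
found_in_stats <- exists("df", envir = asNamespace("stats"), inherits = FALSE)
df_is_function <- is.function(stats::df)

# With a name that is not otherwise in use, the guard behaves as intended.
csv_path <- file.path(tempdir(), "some_file.csv")
write.csv(data.frame(a = 1:3), csv_path, row.names = FALSE)

reads <- 0
for (run in 1:2) {                  # simulate running the script twice
  if (!exists("survey_data")) {
    survey_data <- read.csv(csv_path)
    reads <- reads + 1
  }
}
```

After the loop, reads is 1: the file is read on the first pass only.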
Can you please try again using the "where" and "envir" arguments? These arguments tell exists() which position or environment to look in for the variable.
exists(x, where = -1, envir = , frame, mode = "any",
inherits = TRUE)
Here x is the variable name, given as a character string.
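To illustrate what these arguments change, here is a sketch that uses a fresh environment (ws, a hypothetical stand-in for the workspace) so the result does not depend on what is already loaded. With inherits = FALSE only the given environment is checked, so the stats function called df is not picked up.

```r
ws <- new.env()   # stands in for the workspace in this sketch

# Only `ws` itself is searched: nothing called df is there yet.
before <- exists("df", envir = ws, inherits = FALSE)

# With the default inherits = TRUE the search continues through the enclosing
# environments and attached packages, where stats::df is found.
inherited <- exists("df", envir = ws)

assign("df", data.frame(a = 1:3), envir = ws)
after <- exists("df", envir = ws, inherits = FALSE)
```

In a script, the equivalent guard would be along the lines of if (!exists("df", envir = globalenv(), inherits = FALSE)) { ... }.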