Loading and removing datasets using paste in R

I have a large number of similarly named .R datasets. I am trying to load them in (which I can do successfully), do something to them, and then remove them from the workspace, all in one loop. I am struggling, however, to remove them because what comes out of the paste command is the wrong class. While I know what's wrong, I have no idea how to correct my code, so suggestions are welcome. Here is some example code:
for(i in 1:n){
load(paste("C",i,".R",sep=""))
# do stuff to dataset
rm(paste("C",i,sep="")) #this line is clearly wrong
}
Thanks for the help

The list argument of rm should do what you want. It takes a character vector of the names of the variables to remove. So, something like this should work:
for (i in 1:n) {
loaded <- load(paste0("C", i, ".R"))
# do stuff to dataset
rm(list = loaded)
}
Note that the load function returns a character vector with the names of the loaded object(s), so we can use that when removing the loaded object(s) again. The name(s) of the object(s) loaded by load do not necessarily correspond to the filename.
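For example (the object name x and the file name here are just made up for illustration):
x <- 1:5
save(x, file = "C1.R")    # the file is called C1.R, but the object inside it is named x
loaded <- load("C1.R")
loaded                    # prints "x", not "C1"
rm(list = loaded)         # removes x again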

Related

How to Read Data from .rda with read.table [duplicate]

I am trying to load an .rda file in R which was a saved data frame. I do not remember its name, though.
I have tried
a<-load("al.rda")
which then does not let me do anything with a. I get the error
Error: object 'a' not found
I have also tried to use the = sign.
How do I load this .rda file so I can use it?
I restarted R with load("al.rda") and I now get the following error
Error: C stack usage is too close to the limit
Use 'attach' and then 'ls' with a name argument. Something like:
attach("al.rda")
ls("file:al.rda")
The data file is now on your search path in position 2, most likely. Do:
search()
ls(pos=2)
for enlightenment. Typing the name of any object saved in al.rda will now get it, unless something with the same name exists in search path position 1; if so, R will probably warn you with a message about one object masking another.
However, I now suspect you've saved nothing in your RData file. Two reasons:
You say you don't get an error message
load says there's nothing loaded
I can duplicate this situation. If you do save(file="foo.RData") then you'll get an empty RData file - what you probably meant to do was save.image(file="foo.RData") which saves all your objects.
How big is this .rda file of yours? If it's under 100 bytes (my empty RData files are 42 bytes long) then I suspect that's what's happened.
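To illustrate (the file names here are only examples):
save(file = "empty.RData")       # no objects listed: writes an essentially empty file
file.info("empty.RData")$size    # only a few dozen bytes
save.image(file = "all.RData")   # saves every object in the workspace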
I had to reinstall R... somehow it was corrupt. The simple command I expected to work,
load("al.rda")
finally worked.
I had a similar issue, and it was solved without reinstalling R. For example, doing
load("al.rda") works fine; however, if you do
a <- load("al.rda") it will not work.
The load function does return the names of the variables it loaded. I suspect you actually get an error when you load "al.rda". What exactly does R output when you load it?
Example of how it should work:
d <- data.frame(a=11:13, b=letters[1:3])
save(d, file='foo.rda')
a <- load('foo.rda')
a # prints "d"
Just to be sure, check that the load function you actually call is the original one:
find("load") # should print "package:base"
EDIT Since you now get an error when you load the file, it is probably corrupt in some way. Try this and say what it prints:
file.info("a1.rda") # Prints the file size etc...
readBin("a1.rda", "raw", 50) # reads first 50 bytes from the file
Without having access to the file, it's hard to investigate more... Maybe you could share the file somehow (http://www.filedropper.com or similar)?
I usually use save to save only a single object, and I then use the following utility function to retrieve that object into a variable of my choosing. It uses load, but loads into a temporary environment to avoid overwriting existing objects. Maybe it will be helpful for others as well:
load_first_object <- function(fname){
e <- new.env(parent = parent.frame())
load(fname, e)
return(e[[ls(e)[1]]])
}
The function can of course be extended to also return named objects and lists of objects, but this simple version is the one I find most useful.
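For example (the file and object names here are arbitrary):
d <- data.frame(a = 11:13, b = letters[1:3])
save(d, file = "single.rda")
x <- load_first_object("single.rda")   # x now holds the data frame, whatever it was called inside the file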

R program does not output

I'm new to R and programming and taking a Coursera course. I've asked in their forums, but nobody can seem to provide an answer there. To be clear, I'm trying to determine why this does not produce output.
When I first wrote the program, I was getting accurate outputs, but after I tried to upload, something went wonky. Rather than producing any output with [1], [2], etc. when I run the program from RStudio, I only get the blue +++ (continuation prompts), but no errors, and anything I change still does not produce any output.
I tried with a previous version of R, and reinstalled the most recent version 3.2.1 for Windows.
What I've done:
Set the correct working directory through RStudio
pol <- function(directory, pol, id = 1:332) {
files <- list.files("specdata", full.names = TRUE);
data <- data.frame();
for (i in ID) {
data <- rbind(data, read.csv(files_list[i]))
}
subset <- subset(data, ID %in% id);
polmean <- mean(subset[pol], na.rm = TRUE);
polmean("specdata", "sulfate", 1:10)
polmean("specdata", "nitrate", 70:72)
polmean("specdata", "nitrate", 23)
}
Can someone please provide some direction - debug help?
When I adjust the code, the following errors tend to appear:
ID not found
Missing or unexpected } (although I've matched them all).
The updated code is as follows, if I'm understanding correctly:
data <- data.frame();
files <- files[grepl(".csv",files)]
pollutantmean <- function(directory, pollutant, id = 1:332) {
pollutantmean <- mean(subset1[[pollutant]], na.rm = TRUE);
}
Looks like you haven't declared what ID is (I assume: a vector of numbers)?
Also, using 'subset' as a variable name while it's also a function, and pol as both a function name and the name of one of the arguments of that same function is just asking for trouble...
And I think there is a missing ")" in your for-loop.
EDIT
So the way I understand it now, you want to do a couple of things.
Read in a bunch of files, which you'll use multiple times without changing them.
Get some mean value out of those files, under different conditions.
Here's how I would do it.
Since you only want to read in the data once, you don't really need a function to do this (you can have one, but I think it's overkill for now). You correctly have code that makes a vector with the file names and then loops over them, rbinding them to each other. The problem is that this can become very slow. Check here. Make sure your directory only contains files that you want to read in, so no R scripts or other stuff. A way (not 100% foolproof) to do this is using files <- files[grepl(".csv",files)], which makes sure you only have the csv's (grepl checks whether a certain string is a substring of another and returns a boolean; the [] then only keeps the elements for which TRUE was returned).
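A rough sketch of that first step (assuming, as in your code, that the csv files live in a "specdata" directory); binding everything once at the end also avoids the slow grow-by-rbind pattern:
files <- list.files("specdata", full.names = TRUE)
files <- files[grepl(".csv", files)]            # keep only the csv files
df <- do.call(rbind, lapply(files, read.csv))   # read them all, then bind once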
Next, there is 'a thing you want to do multiple times', namely getting out mean values. This is where you'd use a function. Apparently you want to get the mean for different types of pollution, and you want this in restricted IDs.
Let's assume that 1. has given you a dataframe df with a column named Type for the type of pollution and a column called Id that somehow represents a sort of ID (substitute with the actual names in your script - if you don't have a column for ID, I'll edit the answer later on). Now you want a function
polmean <- function(type, id) {
# some code that returns the mean of a restricted version of df
}
This is all you need. You write the code that generates df, you then write a function that will get you what you want from that dataframe, and then you call it for the circumstances you want to use it in (the three polmean calls at the end of your original code, but now without the first argument as you no longer need this).
Ok - I finally solved this. Thanks for the help.
I didn't need to call "specdata" in line 2; the directory argument in line 1 already referred to the correct directory.
My for/in statement needed to refer to the id in the first line, not the ID in the dataset. The for/in statement doesn't appear to need to be indented (but it looks cleaner).
I did not need a subset
The last 3 lines for pollutantmean did not need to be a part of the program. These are used in the R console to call the results one by one.
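Pulling those points together, the final shape is roughly this (a sketch based on the description above; the csv layout and column names are whatever your data actually uses):
pollutantmean <- function(directory, pollutant, id = 1:332) {
files <- list.files(directory, full.names = TRUE)
data <- data.frame()
for (i in id) {
data <- rbind(data, read.csv(files[i]))
}
mean(data[[pollutant]], na.rm = TRUE)
}
# then, from the console:
# pollutantmean("specdata", "sulfate", 1:10)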

Change object name in loop in R

I have a for loop which reads in a different RData file in each iteration and that works well with just paste.
However, once the RData file is loaded there is an object loaded called
topy in the first instance of the loop, in the second it is ropy, then eopy and so on.
What I now tried is
vals<-c("topy","ropy","eopy")
paste("vals[i]")->r
to assign these different objects to r, which is used further on in the script and is overwritten in every step of the loop. But that does not work. topy, ropy and the rest are matrices.
When I load the RData file and then just type topy manually, the matrix is shown, but if I do the paste and then type r it only shows "topy". I also tried assign - that did not work. Any idea?
I'm not sure whether I got your point, is the following what you need?
p <- parse(text = vals[i])
r <- eval(p)
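A simpler alternative that covers this case (assuming vals[i] holds the name of an object that exists after the load) is:
r <- get(vals[i])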

Pass variable name to a function and output inside of loop in R

There seem to be variations of this question, but none seem to address the situation of being in a loop AND naming an output file. How I thought this might work:
for(j in 1:3) {
for(k in 1:17){
extract_[j]km <- extract(RasterStack, SpatialPolygonsDataFrame_[j]km, layer=[k], nl=1, df=TRUE)
}
}
The extract function is from the raster package. I have already created a series of RasterStacks and SpatialPolygons and I want to pass these to a function ("extract") that has several parameters, some of which I wish to manipulate through the loop, and label the output accordingly. This is a breeze in BASH, but I can't figure this out in R.
Ultimately, I'd like to pass strings as well, but another post seems to show the way there.
EDIT: I originally posted the above function as being a single dataframe, when in fact, they are specified objects from the raster package (which are ultimately dataframes).
As Justin points out, working with a list is more in line with R's structure than messing up the workspace with lots of named variables. When you have a lot of objects in the workspace, it quickly becomes challenging to "know" what's next.
Your way:
for(j in 1:3) {
assign(
paste("extract",j,"km",sep=""), # or paste0 to avoid need for sep=""
your_function(   # whatever function you want to apply
get(
paste("data",j,"km",sep="")
)
)
)
}
Personally, I prefer working with lists, so below I convert your data objects to a list and show you how to run a function on all elements of that list. Working in this way usually removes the need to use strings in the "get" and "assign" fashion.
# just converting your variables to a list
data.list <- mget(grep("data",ls(),value=TRUE),envir=.GlobalEnv)
# then output results
result.list <- lapply(data.list,your_function)
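Since mget returns a named list and lapply keeps those names, individual results can then be pulled out by the original object names, e.g. (assuming objects named like data1km, as in the snippet above):
names(result.list)         # e.g. "data1km" "data2km" "data3km"
result.list[["data1km"]]   # the result computed from data1km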

How do you stop source from abbreviating function names?

I am trying to source multiple functions that differ by a number in the name.
For example: func1, func2.
I tried using "func_1", and "func_2", as well as putting the number first, "1func" and "2func". No matter how I index the function names, the source function just reads in one function that it calls "func" - which is not what I want.
I have tried using for-loops and sapply:
for-loop:
func.list <- list.files(path="/some_path",pattern="some pattern",full.names=TRUE)
for(i in 1:length(func.list)){
source(func.list[i])
}
sapply:
sapply(func.list,FUN=source)
I am going to be writing multiple versions of a data correction function and would really like to be able to index them, because giving a concise but specific name would be difficult and would not allow me to selectively source just the function files from their directory.
In my code, func.list gives the output (I have replaced the actual directory because of privacy/contractual issues):
[1] "mypath/1resp.correction.R"
[2] "mypath/2resp.correction.R"
Then when I source func.list with either the for-loop or sapply code (listed above), R only loads one function named resp.correction, with the code body from "2resp.correction.R".
The argument to source is a file name, not a function name. So you cannot be fancy here: you need to provide the exact filenames.
It sounds like both of your files contain the definition of a function with the same name (resp.correction), so yes, as you source one file after the other, the function is overwritten in your global environment.
You could, inside your loop, reassign the function to a different name:
func.list <- list.files(path="/some_path",pattern="some pattern",full.names=TRUE)
for(i in 1:length(func.list)) {
source(func.list[i], local = TRUE)
assign(paste0("resp.correction", i), resp.correction, envir = .GlobalEnv)
}
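After the loop the renamed copies can be called like any other function (the argument here is only a placeholder):
resp.correction1(my_data)   # version defined in "1resp.correction.R"
resp.correction2(my_data)   # version defined in "2resp.correction.R"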
