I'm trying to write a function in R that drops columns from a data frame and returns the new data with a name specified as an argument of the function:
drop <- function(my.data,col,new.data) {
new.data <<- my.data[,-col]
return(new.data)
}
In other words, after the function is called, I want a new data frame to exist that is named whatever the user passes as the third argument.
When I call the function, the correct data frame is returned, but if I then try to use the new data frame in the global environment, I get "object not found". I thought that using the <<- operator would define new.data globally.
Can someone help me understand what's going on and if there is a way to accomplish this?
I found two questions that seemed related, but neither quite answered mine.
Use the assign() function.
assign("new.data", my.data[,-col], envir = .GlobalEnv)
The first argument should be a string. In this case, the resulting global variable will be named "new.data". If new.data is the function argument that holds the desired name, drop the quotes so that assign() uses its value: assign(new.data, my.data[,-col], envir = .GlobalEnv).
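<<- does not always assign to the global environment: it searches the enclosing environments for an existing variable with that name and assigns there, creating a global variable only if none is found. It also always uses the literal name on its left-hand side, so new.data <<- ... creates a variable called new.data, not one named after the value of the argument.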
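A minimal sketch of the difference, assuming both functions are defined at the top level of a fresh R session (the function names drop_global and drop_assign and the data frame my_df are hypothetical):

```r
# Hypothetical data frame used only for this illustration
my_df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# <<- uses the literal name on its left-hand side, so this always
# creates (or overwrites) a global variable called `new.data`
drop_global <- function(my.data, col, new.data) {
  new.data <<- my.data[, -col]
}

# assign() takes the name as a string, so the caller chooses it
drop_assign <- function(my.data, col, new.data) {
  assign(new.data, my.data[, -col], envir = .GlobalEnv)
}

drop_global(my_df, 2, "df_small")
exists("df_small")   # FALSE in a fresh session: nothing was created under that name
exists("new.data")   # TRUE: the literal name was used instead

drop_assign(my_df, 2, "df_small")
names(df_small)      # "a" "c"
```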
In general, however, it is better to return things from a function than set global variables from inside a function. The latter is a lot harder to debug.
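Returning the result keeps the function free of side effects, and the caller chooses the name with an ordinary assignment. A sketch (drop_cols is a hypothetical name, chosen so it does not mask base::drop()):

```r
# Return the reduced data frame instead of assigning it globally;
# drop = FALSE keeps a data frame even if only one column is left
drop_cols <- function(my.data, col) {
  my.data[, -col, drop = FALSE]
}

scores <- data.frame(id = 1:3, x = c(2.5, 3.1, 4.0), y = c("a", "b", "c"))

# The caller picks the new name here
scores_small <- drop_cols(scores, 2)
names(scores_small)   # "id" "y"
```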
One reason to need this is when you work heavily in the RStudio console doing a lot of text mining. For example, if you have a large corpus and want to break it up into sub-corpora by theme, wrapping the processing in a function that produces a cleaned corpus can be much faster. An example is below:
library(tm)
processText <- function(inputText, corpName){
# VCorpus keeps one document object per text, which the PlainTextDocument step expects
outputName <- VCorpus(VectorSource(inputText))
outputName <- tm_map(outputName, PlainTextDocument)
outputName <- tm_map(outputName, removeWords, stopwords("english"))
outputName <- tm_map(outputName, removePunctuation)
outputName <- tm_map(outputName, removeNumbers)
outputName <- tm_map(outputName, stripWhitespace)
# Create a global object whose name is the string passed in corpName
assign(corpName, outputName, envir = .GlobalEnv)
return(corpName)
}
In the case above, I pass the data frame column as inputText and the desired name of the output corpus as corpName. This makes processing a batch of text data as simple as:
processText(retail$Essay,"retailCorp")
Then the new corpus "retailCorp" shows up in the global environment, ready for further work such as plotting word clouds. I can also send lists through the function and get many corpora back.
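If you would rather not create global objects, the same workflow can return a named list with one element per theme. The sketch below uses a hypothetical base-R clean_text() as a stand-in for the tm pipeline (it is not equivalent to it; the tm steps could be swapped in), and the essay texts are made up:

```r
# Hypothetical stand-in for the tm pipeline: lower-case, strip
# punctuation and digits, then squeeze repeated whitespace
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)
  x <- gsub("[[:digit:]]", "", x)
  trimws(gsub("[[:space:]]+", " ", x))
}

# One element per theme; the names carry over to the results
essays <- list(
  retail  = c("Great service!!", "Store #12 was   closed."),
  banking = c("Fees are too high.")
)

cleaned <- lapply(essays, clean_text)
cleaned$retail   # "great service" "store was closed"
```

Each result is then reachable as cleaned$retail, cleaned$banking and so on, without touching the global environment.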
I'm trying to rename multiple variables that appear in several different files I'm working with. In this example I'll show just one renaming rule. Here's the code:
renaming <- function(dataset){
names(dataset)[names(dataset)=="Lookup Code...3"]<-"Recipient Code"
.
.
.
}
data <- read_excel("File.xlsx",sheet = "Sheet name")
renaming(data)
In the above example I am passing in one dataset, but the variable is not being renamed. I'm new to writing functions in R, so maybe my syntax is off somewhere.
Once that problem is resolved, I would like to be able to pass a list of datasets into this function, using a for loop that would look something like this:
dataset_list <- c("Data","Data_1",...)
for(i in 1:length(dataset_list)){
renaming(dataset_list[i])
}
I tried a for loop similar to this, but the datasets don't seem to get picked up and passed into the function.
I appreciate the help and if you need clarification on this please ask.
You can try:
renaming <- function(dataset){
names(dataset)[names(dataset)=="Lookup Code...3"]<-"Recipient Code"
#Some other code
#Some more code
#Return the changed dataset
dataset
}
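The key point is that the function returns a modified copy, so the result has to be assigned back; calling renaming(data) on its own leaves data unchanged. A small check with a toy data frame (the column name comes from the question, the values are made up):

```r
# Same renaming rule as above; the last expression is the return value
renaming <- function(dataset) {
  names(dataset)[names(dataset) == "Lookup Code...3"] <- "Recipient Code"
  dataset
}

data <- data.frame(a = 1:2, b = 3:4)
names(data) <- c("Lookup Code...3", "Amount")

renaming(data)           # prints a renamed copy; `data` itself is unchanged
data <- renaming(data)   # reassign to keep the change
names(data)              # "Recipient Code" "Amount"
```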
#Get all the Excel filenames in a vector ('\\.xlsx$' matches only names ending in .xlsx)
filenames <- list.files(pattern = '\\.xlsx$')
#apply the function to each file
list_data <- lapply(filenames, function(x) {
renaming(readxl::read_excel(x))
})
list_data will be a list of data frames, each with the renamed column and any other changes applied by the renaming function. You can access individual data frames with list_data[[1]], list_data[[2]], etc.
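If the data frames are already loaded and you only have their names, as in dataset_list from the question, mget() turns the names into a named list of the objects themselves, and list2env() can write the results back under the same names. A sketch with two hypothetical data frames, assuming everything lives in the global environment:

```r
renaming <- function(dataset) {
  names(dataset)[names(dataset) == "Lookup Code...3"] <- "Recipient Code"
  dataset
}

# Hypothetical data frames already present in the global environment
Data   <- data.frame(`Lookup Code...3` = 1:2, check.names = FALSE)
Data_1 <- data.frame(`Lookup Code...3` = 3:4, check.names = FALSE)

dataset_list <- c("Data", "Data_1")

# mget() looks the names up and returns a named list of data frames
renamed <- lapply(mget(dataset_list, envir = .GlobalEnv), renaming)

# Optional: overwrite the originals, keeping their names
invisible(list2env(renamed, envir = .GlobalEnv))
names(Data_1)   # "Recipient Code"
```

Passing the character vector itself (renaming(dataset_list[i])) hands the function a string such as "Data", not the data frame, which is why the original loop did nothing useful.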
I am trying to add columns to several data frames. My plan is to write a function that adds the columns and then use it with lapply over a list of objects. For now the function just adds empty columns to the data frame, but once I solve the problem below I would like to extend it to populate the new columns automatically (while still keeping the original name of each object).
This is the code I have so far:
AAA_Metadata <- data.frame(AAA_Code=character(),AAA_REV4=character(),AAA_CONCEPT=character(),AAA_Unit=character(),AAA_Date=character(),AAA_Vintage=character())
add_empty_metadata <- function(x) {
temp_dataframe <- setNames(data.frame(matrix(ncol=length(AAA_Metadata),nrow=nrow(x))),as.list(colnames(AAA_Metadata)))
x <- cbind(temp_dataframe,x)
}
However when I run this
a <- data.frame(matrix(ncol=6,nrow=100))
add_empty_metadata(a)
and look at the Global Environment, object "a" still has 6 columns instead of 12.
I understand that I am actually working on a copy of "a" within the function (based on the other topics I checked, e.g. Update data frame via function doesn't work). So I tried:
x <<- cbind(temp_dataframe,x)
and
x <- cbind(temp_dataframe,x)
assign('x',x, envir=.GlobalEnv)
But none of those work. I want to have the new a in the Global Environment for future reference and keep the name 'a' unchanged. Any idea what I am doing wrong here?
Is this what you're looking for:
addCol <- function(x, newColNames){
for(i in newColNames){
x[,i] <- NA
}
return(x)
}
a <- data.frame(matrix(ncol=6,nrow=100));dim(a)
a <- addCol(a, newColNames = names(AAA_Metadata));dim(a)
An excellent resource for this kind of thing is Advanced R by Hadley Wickham, which is also freely available online.
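R functions do not modify their arguments in place. R uses copy-on-modify semantics: when the function changes x, it is changing its own copy, and a in the global environment is never touched. Unless the modified object is returned and reassigned (a <- addCol(a, ...)), the version inside the function, which lives in a separate environment, is discarded when the function completes. This is also why x <<- ... and assign('x', x, envir=.GlobalEnv) did not help: both create a global variable literally named x, not a.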
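A quick way to see copy-on-modify in action (add_flag is a hypothetical helper, and nrow is reduced to 3 to keep the printed output short):

```r
# Adds one column; only the function's local copy is modified
add_flag <- function(x) {
  x$flag <- NA
  x
}

a <- data.frame(matrix(ncol = 6, nrow = 3))
add_flag(a)          # the result is printed, then discarded
ncol(a)              # 6: the global `a` was not changed
a <- add_flag(a)     # rebind the name `a` to the returned object
ncol(a)              # 7
```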
I am trying to write a program that opens a large number of files and runs each one through a function I made called "sort". All of my file names start with "sa1", but the characters after that vary from file to file. I was hoping to do something along these lines:
for(x in c("Put","Characters","which","Vary","by","File","here")){
sa1+x <- read.csv("filepath/sa1+x",header= FALSE)
sa1+x=sort(sa1+x)
return(sa1+x)
}
For example, if x were 88, the loop would open the file sa188, name that data frame sa188, and then run it through the function sort. I don't think writing sa1+x is the correct way to join two values together, but I don't know another way to do it.
You need to use a list to hold the data from each CSV file, and build the file names inside the loop with paste0.
file_suffixes <- c("put","characters","which","vary","by","file","here")
numfiles <- length(file_suffixes)
list_data <- list()
sorted_data <- list()
filename <- "filepath/sa1"
for (x in 1:numfiles) {
list_data[[x]] <- read.csv(paste0(filename, file_suffixes[x]), header=FALSE)
sorted_data[[x]] <- sort(list_data[[x]])
}
I am not sure why you use return in that loop: return only makes sense inside a function. If you are writing a function, it should return the sorted_data list, which contains all of your sorted data.
Note: you shouldn't call your function sort because there is already a base R function called sort.
Additional note: you can use dir() (or its alias list.files()) with a regular expression such as "^sa1" to find all the files whose names start with "sa1" and loop over them, which frees you from having to specify file_suffixes by hand.
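A sketch of that approach. It writes a few throwaway files to tempdir() so it can run anywhere, and sort_rows() is a hypothetical stand-in for your own function (renamed so it does not mask base::sort()) that simply orders rows by the first column:

```r
# Hypothetical stand-in for the user's own function
sort_rows <- function(df) df[order(df[[1]]), , drop = FALSE]

# Throwaway files with no header and no extension, like "sa188"
dir_path <- file.path(tempdir(), "sa1_demo")
dir.create(dir_path, showWarnings = FALSE)
for (name in c("sa188", "sa1b", "other")) {
  write.table(data.frame(v = c(3, 1, 2)), file.path(dir_path, name),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

# Every file whose name starts with "sa1"; full.names = TRUE gives usable paths
files <- list.files(dir_path, pattern = "^sa1", full.names = TRUE)

# Read and sort each file, then name the results after the files
sorted_data <- lapply(files, function(f) sort_rows(read.csv(f, header = FALSE)))
names(sorted_data) <- basename(files)

names(sorted_data)    # "sa188" "sa1b"
sorted_data$sa188$V1  # 1 2 3
```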
I have created a list of objects in my work environment
data <- c("variable1", "variable2", "variable3")
I would like to save each object to a separate directory named after the variable, so I did this to build a vector of file names to pass to the save function via lapply:
paste0(data,"/",data,".rda")
lapply(data,FUN=save,file = paste0(data,"/",data,".rda"))
I get the error
Error in FUN(X[[i]], ...) : object ‘X[[i]]’ not found
I'm not sure what I'm doing wrong here.
Do you have a list of objects, or a list of names of objects? You say you have the former, but the code you give is for the latter.
Also, if you only have one object per file, it's better to use the saveRDS function (and readRDS to load it).
lapply(data, function(x) saveRDS(get(x), paste0(x, "/", x, ".rds")))
If you have to use save:
lapply(data, function(x) save(list=x, file=paste0(x, "/", x, ".rda"), envir=.GlobalEnv))
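Note that neither saveRDS() nor save() creates missing directories, so the folders must exist first. A sketch that creates them with dir.create(), writing under tempdir() so it runs anywhere (the three objects are hypothetical):

```r
# Hypothetical objects standing in for variable1..variable3
variable1 <- 1:3
variable2 <- letters[1:2]
variable3 <- data.frame(x = 1)
data <- c("variable1", "variable2", "variable3")

base_dir <- file.path(tempdir(), "rds_demo")

for (x in data) {
  out_dir <- file.path(base_dir, x)
  # saveRDS() will not create the folder for us
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(get(x), file.path(out_dir, paste0(x, ".rds")))
}

# Round trip: read one object back
identical(readRDS(file.path(base_dir, "variable2", "variable2.rds")), variable2)  # TRUE
```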
Several things are going on here.
First, you need not use lapply when you don't care about the return value of the function called at each iteration. It offers nothing in this case.
Second, and more importantly, what you are doing is writing objects to files with names derived from their variable names in R. That's an anti-pattern.
Instead, create a list of the objects and use a for loop for the work. We need to use saveRDS for this (thanks Hong Ooi), as l[[n]] is also not the name of an object in the environment.
l <- list(variable1 = variable1, variable2 = variable2, variable3=variable3)
for (n in names(l)) {
fname <- paste0(n, '/', n, '.rds')
saveRDS(l[[n]], file = fname)
}
It would be better to just save the entire list, but then all the data would be in one file in one directory.
As for what's actually wrong with your code:
You pass the same value for file to every call to save, which is not what you intend. That value is the whole vector of file names, whereas each call should get just one element of it.
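The way lapply passes each element to the function also confuses save, which works out the names of the objects to save from the unevaluated expressions of its arguments. In particular, save does this: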
names <- as.character(substitute(list(...)))[-1L]
Because lapply calls the function as FUN(X[[i]], ...), this yields the name shown in the error message, which is not the name of any object in the environment:
"X[[i]]"
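You can reproduce the problem with a tiny function that, like save(), derives names from its unevaluated arguments (arg_names is a hypothetical helper):

```r
# Uses the same substitute() idiom as save() to recover argument names
arg_names <- function(...) as.character(substitute(list(...)))[-1L]

arg_names(variable1)                                     # "variable1"
unlist(lapply(c("variable1", "variable2"), arg_names))   # "X[[i]]" "X[[i]]"
```

Called directly, the function sees the symbol you typed; called through lapply, it only ever sees the expression X[[i]].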
There seem to be variations of this question, but none address the situation of being in a loop AND naming an output file. Here is how I thought this might work:
for(j in 1:3) {
for(k in 1:17){
extract_[j]km <- extract(RasterStack, SpatialPolygonsDataFrame_[j]km, layer=[k], nl=1, df=TRUE)
}
}
The extract function is from the raster package. I have already created a series of RasterStacks and SpatialPolygons and I want to pass these to a function ("extract") that has several parameters, some of which I wish to manipulate through the loop, and label the output accordingly. This is a breeze in BASH, but I can't figure this out in R.
Ultimately, I'd like to pass strings as well, but another post seems to show the way there.
EDIT: I originally described the objects above as a single data frame, when in fact they are objects from the raster package (which are ultimately data frames).
As Justin points out, working with a list is more in line with R's structure than cluttering the workspace with lots of named variables. Once the workspace holds many objects, it quickly becomes hard to keep track of which one comes next.
Your way:
for(j in 1:3) {
assign(
paste0("extract", j, "km"), # name of the new object, e.g. "extract1km"
your_function( # placeholder for the function you want to apply
get(paste0("data", j, "km")) # fetch the existing object, e.g. data1km
)
)
}
Personally, I prefer working with lists, so below I convert your data objects to a list and show how to run a function on every element of it. Working this way usually removes the need to use strings with get() and assign().
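# collect the objects whose names look like data1km, data2km, ... into a named list
data.list <- mget(ls(pattern = "^data[0-9]+km$", envir = .GlobalEnv), envir = .GlobalEnv)
# then apply your function (a placeholder) to every element; the names are kept
result.list <- lapply(data.list, your_function)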
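Applied to the nested j/k loop from the question, the list approach might look like the sketch below. fake_extract() is a hypothetical stand-in for raster::extract() (it does not reproduce that function's behaviour), and the placeholder objects take the place of the RasterStack and SpatialPolygonsDataFrame objects:

```r
# Hypothetical stand-in for raster::extract(); it just records its inputs
fake_extract <- function(stack, polygons, layer) {
  data.frame(polygons = polygons, layer = layer, value = stack * layer)
}

stack <- 10                                                  # placeholder for a RasterStack
polygons <- list("1km" = "p1", "2km" = "p2", "3km" = "p3")   # placeholder polygon sets

results <- list()
for (j in names(polygons)) {
  for (k in 1:17) {
    # one list element per combination, e.g. "extract_1km_layer5"
    results[[paste0("extract_", j, "_layer", k)]] <-
      fake_extract(stack, polygons[[j]], layer = k)
  }
}

length(results)                        # 51
results[["extract_2km_layer3"]]$value  # 30
```

The names play the role that variable names would in Bash, but everything stays in one object that can be passed to lapply for plotting or summarising.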