I am trying to add columns to several data frames. I am trying to create a function that adds the columns, and then I want to use that function with lapply over a list of objects. Currently the function just adds empty columns to the data frame. Once I solve the problem below, I would like to extend it to populate the new columns automatically (while still keeping the original name of each object).
This is the code I have so far:
AAA_Metadata <- data.frame(AAA_Code=character(),AAA_REV4=character(),AAA_CONCEPT=character(),AAA_Unit=character(),AAA_Date=character(),AAA_Vintage=character())
add_empty_metadata <- function(x) {
temp_dataframe <- setNames(data.frame(matrix(ncol=length(AAA_Metadata),nrow=nrow(x))),as.list(colnames(AAA_Metadata)))
x <- cbind(temp_dataframe,x)
}
However when I run this
a <- data.frame(matrix(ncol=6,nrow=100))
add_empty_metadata(a)
and look at the Global Environment, object "a" still has 6 columns instead of 12.
I understand that I am actually working on a copy of "a" within the function (based on the other topics I checked, e.g. Update data frame via function doesn't work). So I tried:
x <<- cbind(temp_dataframe,x)
and
x <- cbind(temp_dataframe,x)
assign('x',x, envir=.GlobalEnv)
But none of those work. I want to have the new a in the Global Environment for future reference and keep the name 'a' unchanged. Any idea what I am doing wrong here?
Is this what you're looking for:
addCol <- function(x, newColNames){
for(i in newColNames){
x[,i] <- NA
}
return(x)
}
a <- data.frame(matrix(ncol=6,nrow=100));dim(a)
a <- addCol(a, newColNames = names(AAA_Metadata));dim(a)
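Because lapply() keeps the names of a named list, addCol() can be applied to several data frames at once without losing track of which is which. A minimal sketch: meta_names stands in for names(AAA_Metadata), the data frames a and b are made up for illustration, and the list2env() step is optional, only needed if you want a and b back as separate objects.

```r
# addCol as in the answer above: add each name in newColNames as an NA column
addCol <- function(x, newColNames) {
  for (i in newColNames) {
    x[, i] <- NA
  }
  x
}

# Stand-in for names(AAA_Metadata) from the question
meta_names <- c("AAA_Code", "AAA_REV4", "AAA_CONCEPT",
                "AAA_Unit", "AAA_Date", "AAA_Vintage")

dfs <- list(a = data.frame(matrix(ncol = 6, nrow = 100)),
            b = data.frame(matrix(ncol = 6, nrow = 20)))

# lapply keeps the list names, so the results are still labelled a and b
dfs <- lapply(dfs, addCol, newColNames = meta_names)

# Optional: copy the elements back out as separate objects named a and b
invisible(list2env(dfs, envir = .GlobalEnv))
dim(a)
```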
An excellent source for this kind of material is Advanced R by Hadley Wickham, which is also available as a free website.
R objects are effectively immutable: they are not changed in place, just rebuilt and bound to the same name. a itself is never changed. It is used as an input to the function, and unless the object built inside the function is returned and reassigned (a <- add_empty_metadata(a)), the version inside the function (which lives in a separate environment) is discarded when the function completes.
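A small illustration of that behaviour, using a hypothetical add_col() helper: calling it without reassignment leaves a unchanged, and reassigning keeps the change under the same name.

```r
# Hypothetical helper: adds an NA column named "new"
add_col <- function(df) {
  df$new <- NA   # modifies the function's local copy only
  df
}

a <- data.frame(x = 1:3)
add_col(a)       # result is printed, then discarded
ncol(a)          # still 1
a <- add_col(a)  # reassign to keep the change under the same name
ncol(a)          # now 2
```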
Related
Suppose that after some calculations I obtain a floating-point number, stored in the variable a, for example
a <- sqrt(2)
Now, I want to define a function that uses that parameter, for example:
myfunction <- function(x){x-a}
How can I save myfunction into an RDS file, in such a way that it can be loaded and used in a new R session, where the variable a is not defined?
Or from a different perspective: How to define the function, but substituting for a its actual numerical value in the function definition? That is, I'd like R to actually define the function
myfunction <- function(x){x - 1.41421356...}
where the actual value of a has been substituted in the definition.
Simply trying saveRDS(myfunction, 'myfunction.rds') does not work: if I start a new R session and do
myfunction <- readRDS('myfunction.rds')
myfunction(1)
then R complains that a is not defined.
Please note that I'm here giving a minimal working example of the problem. Obviously, in the case above I could just define myfunction <- function(x){x-sqrt(2)} and save that in an RDS file; it could be loaded in a new session and used without problems.
However, in my case I have many parameters like a, not just one, obtained from long calculations. I'm not interested in saving their values, I only want to save the function that uses them in its definition, and be able to use that function in a new R session.
An RDS file won't save the global environment, but if you create a closure, the values stored in the closure's own environment are saved along with the function. One way to do that would be:
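# force(a) evaluates a immediately, so the closure stores its value rather than a lazy reference to the global a
myfunction <- (function(a) { force(a); function(x) x - a })(a)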
And then you can call this function like a regular function
myfunction(1)
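For the other formulation in the question (substituting the value of a into the function's definition), a minimal sketch uses bquote() to splice the current value of a into the body as a literal number. The resulting function no longer refers to a at all, so it survives a saveRDS()/readRDS() round trip. The temporary file path is an assumption for illustration.

```r
a <- sqrt(2)

myfunction <- function(x) NULL
# bquote() replaces .(a) with the current value of a, as a literal number
body(myfunction) <- bquote(x - .(a))

path <- file.path(tempdir(), "myfunction.rds")
saveRDS(myfunction, path)

rm(a, myfunction)  # simulate a fresh session in which a is not defined
myfunction <- readRDS(path)
myfunction(1)      # works: the body contains the number, not the name a
```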
I have a series of tables and graphs that are produced from a list in R. I would like to create a PDF for each element of the list. I have tried simply using pdf() within the function, but I get an error that too many graphics devices are open. How can I do this (and name the files according to the list element names)?
So far I have tried:
ReportPDF<-function(x){
pdf(paste(name(x),"~\\Myfile.pdf")
tb<-table(x$acolumn)
print(fb)
}
lapply(mylist,ReportPDF)
I can't quite work out how to attach the name of the list element to the filename, and I'm not even sure this is the best way to create a PDF from lapply.
Can you clean some of this up?
Please give a more specific example of the object you're passing to ReportPDF(). I would expect a plot object, rather than what appears to be a data frame from which you are selecting a column.
The function example has some errors too, did you mean this?
ReportPDF<-function(x){
pdf(paste(names(x),"Myfile.pdf"))
tb<-table(x$acolumn)
print(tb)
dev.off()
}
lapply(mylist,ReportPDF)
I believe I've done something similar before and can update this answer when I get the other information.
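One more point about the function above: print(tb) writes the table to the console, not onto the open PDF device, so nothing from the table ends up in the PDF. A minimal sketch of one base-R workaround, assuming a plain-text rendering of the table is acceptable: capture the printed table as text and draw it on a blank page. The example table and the temporary output file are made up for illustration.

```r
tb <- table(c("x", "y", "x", "z"))
outfile <- tempfile(fileext = ".pdf")

pdf(outfile)
plot.new()                                # start an empty page on the PDF device
txt <- paste(capture.output(print(tb)), collapse = "\n")
text(0.5, 0.5, txt, family = "mono")      # monospaced so the columns line up
dev.off()
```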
Here's an update making some assumptions about your objects. It uses a for loop as lmo suggests, though I suspect a more elegant method exists. I'm using the for loop because lapply passes only the object stored in each element of the list, with no reference to the element's name, which is what you need to name the files individually. Note the difference between mylist[i] and mylist[[i]]: mylist[i] returns a one-element list that keeps its name, while mylist[[i]] returns the data frame itself. This is part of what's breaking the code in your example. In your code, names(x) returns the column names of x rather than the name x has inside mylist, which is what you want.
x <- data.frame(acolumn = rnorm(10))
q<- data.frame(acolumn = rnorm(10))
mylist <- list(a = x,b = q)
for(i in seq_along(mylist) ){
filename <- paste(names(mylist[i]),'-myFile.pdf', sep = "")
pdf(filename)
plot(mylist[[i]]$acolumn)
dev.off()
}
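A minimal sketch of a more compact alternative uses Map() (a wrapper around mapply()) to pass each element together with its name. The report_pdf() helper and writing the files to tempdir() are assumptions for illustration. on.exit(dev.off()) closes each device even if plotting fails, so devices do not pile up (the cause of the error in the question).

```r
mylist <- list(a = data.frame(acolumn = rnorm(10)),
               b = data.frame(acolumn = rnorm(10)))

# Hypothetical helper: one PDF per element, named after the element
report_pdf <- function(df, name) {
  filename <- file.path(tempdir(), paste0(name, "-myFile.pdf"))
  pdf(filename)
  on.exit(dev.off())  # always close this device when the function exits
  plot(df$acolumn)
  filename
}

# Map() walks the elements and their names in parallel;
# the result is named after mylist (a, b)
files <- Map(report_pdf, mylist, names(mylist))
```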
I am creating a user-defined function that helps me create 20+ data tables. However, unlike when I run the same code outside a function, these data tables do not get created in the global environment. See the sample code below.
library(data.table)
samplefunction <- function(f) {
df=as.data.table(list(V1=c('a','a','b','c'),V2=c(1,2,3,4)))
df.a=df[V1==f]
df.a
}
samplefunction('a')
I can only tell whether the function works because the final line df.a prints the data.table to the console; df and df.a themselves are not created in the global environment, so I cannot view my data tables this way. Is it possible to create the tables in the global environment?
Use the <<- operator instead of = to assign df.a as a global variable in the function:
samplefunction <- function(f) {
df=as.data.table(list(V1=c('a','a','b','c'),V2=c(1,2,3,4)))
df.a <<- df[V1==f]
}
samplefunction('a')
df.a
See https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html
Or you could just have the function return a list of the data tables and use standard assignment.
I think that Nat's second suggestion (returning a list) is the better choice:
samplefunction <- function(f) {
df=as.data.table(list(V1=c('a','a','b','c'),V2=c(1,2,3,4)))
df.a=df[V1==f]
return(list("df"=df, "df.a"=df.a))
}
Global assignment can silently overwrite existing objects in the global environment, which is usually undesirable. With the modified function, you can use it as follows:
myresults <- samplefunction(input)
# show df
myresults[["df"]]
# show df.a
myresults[["df.a"]]
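For the 20+ tables case, a minimal sketch generalises the same idea: build all tables in one call and return them in a named list. It uses a base-R data.frame so it runs without extra packages (the same pattern works for data.table objects), and sample_tables() is a hypothetical name. list2env() is shown only for the case where separate global objects are really needed.

```r
# Hypothetical variant of samplefunction: one filtered table per level
sample_tables <- function(levels) {
  df <- data.frame(V1 = c("a", "a", "b", "c"), V2 = c(1, 2, 3, 4))
  tables <- lapply(levels, function(f) df[df$V1 == f, , drop = FALSE])
  names(tables) <- paste0("df.", levels)  # df.a, df.b, ...
  tables
}

res <- sample_tables(c("a", "b", "c"))
res[["df.a"]]  # view one table

# Only if separate global objects are really needed:
invisible(list2env(res, envir = .GlobalEnv))
```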
I'm trying to write a function in R that drops columns from a data frame and returns the new data with a name specified as an argument of the function:
drop <- function(my.data,col,new.data) {
new.data <<- my.data[,-col]
return(new.data)
}
So in the above example, I want a new data frame to exist after the function is called that is named whatever the user inputs as the third argument.
When I call the function, the correct data frame is returned, but if I then try to use the new data frame in the global environment, I get "object not found". I thought that by using the <<- operator I was defining new.data globally.
Can someone help me understand what's going on and if there is a way to accomplish this?
I found a couple of related questions, but neither quite answered mine.
Use the assign() function.
assign("new.data", my.data[,-col], envir = .GlobalEnv)
The first argument must be a string. As written, the resulting global variable will be named "new.data". If new.data is the argument holding the desired name (as in your function), drop the quotes: assign(new.data, my.data[,-col], envir = .GlobalEnv).
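<<- does not always assign to the global environment: it searches the enclosing environments for an existing variable with that name and assigns there, falling back to the global environment only if none is found. It also uses the literal name on its left-hand side (here new.data), not the string stored in that argument.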
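A small demonstration of that search behaviour, with hypothetical names: <<- inside a nested function updates the variable in the enclosing function, and the global variable of the same name is left alone.

```r
demo_superassign <- function() {
  total <- 0
  add <- function(n) total <<- total + n  # finds total in demo_superassign's frame
  add(5)
  total
}

total <- 100
demo_superassign()  # 5
total               # still 100: the global total was not touched
```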
In general, however, it is better to return things from a function than set global variables from inside a function. The latter is a lot harder to debug.
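Putting this together, a minimal sketch of the corrected function. It is renamed drop_cols here (a hypothetical name) so it does not mask base R's drop(), uses drop = FALSE so a single remaining column stays a data frame, and also returns the result invisibly, so the return-value style works too. The built-in mtcars data set is used only as example input.

```r
drop_cols <- function(my.data, col, new.data) {
  result <- my.data[, -col, drop = FALSE]
  # new.data holds the *name* as a string; assign() uses its value as the variable name
  assign(new.data, result, envir = .GlobalEnv)
  invisible(result)
}

drop_cols(mtcars, 1:2, "mtcars_small")
dim(mtcars_small)  # 32 rows, 9 columns
```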
One situation where this is useful is when working heavily in the RStudio console on text mining. For example, if you have a large corpus and want to break it up into sub-corpora based on themes, wrapping the processing in a function and returning a cleaned corpus can be much faster. An example (using the tm package) is below:
library(tm)
processText <- function(inputText, corpName){
outputName <- Corpus(VectorSource(inputText))
outputName <- tm_map(outputName,PlainTextDocument)
outputName <- tm_map(outputName, removeWords, stopwords("english"))
outputName <- tm_map(outputName, removePunctuation)
outputName <- tm_map(outputName, removeNumbers)
outputName <- tm_map(outputName, stripWhitespace)
assign(corpName, outputName, envir = .GlobalEnv)
return(corpName)
}
In the case above, I pass a column from the data frame as inputText and the name of the desired output corpus (a string) as corpName. This makes it simple to process a batch of text data:
processText(retail$Essay,"retailCorp")
The new corpus "retailCorp" then shows up in the global environment for further work, such as plotting word clouds. I can also pass a list of inputs through the function and get many corpora back.
There seem to be variations on this question, but none seem to address the situation of being in a loop AND naming an output file. This is how I thought it might work:
for(j in 1:3) {
for(k in 1:17){
extract_[j]km <- extract(RasterStack, SpatialPolygonsDataFrame_[j]km, layer=[k], nl=1, df=TRUE)
}
}
The extract function is from the raster package. I have already created a series of RasterStacks and SpatialPolygons and I want to pass these to a function ("extract") that has several parameters, some of which I wish to manipulate through the loop, and label the output accordingly. This is a breeze in BASH, but I can't figure this out in R.
Ultimately, I'd like to pass strings as well, but another post seems to show the way there.
EDIT: I originally posted the above function as being a single dataframe, when in fact, they are specified objects from the raster package (which are ultimately dataframes).
As Justin points out, working with a list is more in line with R's structure than cluttering the workspace with lots of named variables. With many objects in the workspace, it quickly becomes hard to "know" what comes next.
Your way:
for(j in 1:3) {
  assign(
    paste0("extract", j, "km"),     # name of the new object, e.g. extract1km
    your_function(                  # your_function stands in for the function you want to run
      get(paste0("data", j, "km"))  # fetch the input object, e.g. data1km
    )
  )
}
Personally, I prefer working with lists, so below I convert your data objects to a list and show you how to run a function on all elements of that list. Working this way usually removes the need to use strings with get() and assign().
# just converting your variables to a list
data.list <- mget(grep("^data[0-9]+km$", ls(), value = TRUE), envir = .GlobalEnv)
# then apply your function to every element of the list
result.list <- lapply(data.list,your_function)
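To mirror the nested j/k loop from the question with lists instead of assign(), a minimal sketch with hypothetical stand-ins: inputs replaces the data1km to data3km objects and your_function() replaces raster's extract(), so it runs without the raster package. The outer lapply plays the role of the j loop and the inner one the k loop.

```r
# Hypothetical stand-ins for data1km, data2km, data3km
inputs <- list(data1km = 1:10, data2km = 11:20, data3km = 21:30)

# Hypothetical stand-in for extract(..., layer = k): return the k-th element
your_function <- function(x, k) x[k]

# Outer lapply over the inputs (j loop), inner lapply over layers (k loop)
results <- lapply(inputs, function(x) {
  lapply(1:3, function(k) your_function(x, k))
})

# Label the output: data1km -> extract1km, etc.
names(results) <- sub("^data", "extract", names(results))
results[["extract2km"]][[3]]  # 13
```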