I have this example data and some example functions
other_data<-c(1,2,3) # data I have to have
fun<-function(a,b,c){
data<-c(a,b,c)
return(data)
} # first function
var_1<-runif(20,10,20) # variables
var_2<-runif(20,10,20)
var_3<-runif(20,10,20)
vars<-data.frame(var_1,var_2,var_3) # data frame of variables
subfun<-function(x){
res<-fun(vars[x,1],vars[x,2],vars[x,3])
return(res)
} # sub function of the first one to use more options and get them into list
final<-lapply(c(1:nrow(vars)),subfun) # this should be the final result I want to get
The problem is that my real data is much bigger, and I have about 500 objects like "data" (as in the first function) that have to be recreated on every call because a, b, c take different values. This seems to slow the function down because of memory use, i.e. the environment filling up.
I don't want to write rm(data) 500 times in the first function before the line return(data).
So my questions are:
Is there a straightforward way to remove all objects that were created during the function call, but only the objects inside fun(a,b,c)? I must NOT remove other_data.
Or, more simply, is there a straightforward way to delete all objects except some, something like rm(ls(), instead of = c("other_data"))?
If you only want to keep certain objects, you can use the function keep from the gdata package. http://www.inside-r.org/packages/cran/gdata/docs/keep
library('gdata')
var1 <- 1
var2 <- 2
ls()
[1] "var1" "var2"
keep(var1, sure = TRUE)
ls()
[1] "var1"
Setting sure to TRUE performs the removal. Otherwise keep only returns the names of the objects that would have been removed.
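For comparison, here is a minimal base-R sketch of the same "remove everything except" idea, without gdata. It works on a scratch environment (the names scratch, tmp_a and tmp_b are made up for this illustration) so that it does not wipe a real workspace. At the top level the same pattern would be rm(list = setdiff(ls(), "other_data")).

```r
# Scratch environment standing in for the workspace (hypothetical name)
scratch <- new.env()
assign("other_data", c(1, 2, 3), envir = scratch)  # object to keep
assign("tmp_a", runif(5), envir = scratch)         # throwaway objects
assign("tmp_b", runif(5), envir = scratch)

# Everything except the objects we want to keep
to_drop <- setdiff(ls(envir = scratch), "other_data")
rm(list = to_drop, envir = scratch)

remaining <- ls(envir = scratch)
remaining
```

Note that objects created inside a function body, like data in fun(a,b,c), are local to that call and become eligible for garbage collection once the function returns, so an explicit rm() inside the function is usually not needed.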
I have a list of data frames. I want to use lapply on a specific column of each of those data frames, but I keep getting errors when I try methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (a plain <- inside the function passed to lapply only modifies a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible(), otherwise lapply would print the output on the console. The <<- assigns the vector runif(...) to the newly created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
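On the get() part of the question: get() looks up an object by its name, so the string "a[[1]]$DIM" is treated as one (non-existent) object name rather than being parsed as an expression. A minimal sketch of the more direct route, applying lapply to the list itself, is below. The two small data frames are made up for illustration and are not the asker's data.

```r
# Toy stand-in for the asker's list of data frames (hypothetical data)
a <- list(
  data.frame(DIM = 1:3, other = 4:6),
  data.frame(DIM = c(10, 20), other = c(1, 2))
)

# Pull the DIM column out of every data frame
dims <- lapply(a, function(df) df$DIM)

# Equivalent, using the extraction function directly
dims2 <- lapply(a, `[[`, "DIM")

# Then apply any function to each DIM vector
dim_means <- lapply(dims, mean)
```

No name strings are built and no get() is needed, because each list element is handed to the function as an object.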
My objective: read data files from yahoo then perform calculations on each xts using lists to create the names of xts and the names of columns to assign results to.
Why? I want to perform the same calculations for a large number of xts datasets without having to retype separate lines to perform the same calculations on each dataset.
First, get the datasets for 2 ETFs:
library(quantmod)
# get ETF data sets for example
startDate = as.Date("2013-12-15") #Specify period of time we are interested in
endDate = as.Date("2013-12-31")
etfList <- c("IEF","SPY")
getSymbols(etfList, src = "yahoo", from = startDate, to = endDate)
To simplify coding, remove the ETF prefix that yahoo adds to the column names
colnames(SPY) <- gsub("SPY.","", colnames(SPY), fixed = TRUE)
colnames(IEF) <- gsub("IEF.","", colnames(IEF), fixed = TRUE)
head(IEF,2)
# Open High Low Close Volume Adjusted
#2013-12-16 100.86 100.87 100.52 100.61 572400 98.36
#2013-12-17 100.60 100.93 100.60 100.93 694800 98.67
Creating new columns using the functions in quantmod is straightforward, e.g.,
SPY$logRtn <- periodReturn(Ad(SPY),period='daily',subset=NULL,type='log')
IEF$logRtn <- periodReturn(Ad(IEF),period='daily',subset=NULL,type='log')
head(IEF,2)
# Open High Low Close Volume Adjusted logRtn
#2013-12-16 100.86 100.87 100.52 100.61 572400 98.36 0.0000000
#2013-12-17 100.60 100.93 100.60 100.93 694800 98.67 0.0031467
but rather than writing a new statement to perform the calculation for each ETF, I want to use a list instead. Here's the general idea:
etfList
#[1] "IEF" "SPY"
etfColName = "logRtn"
for (etfName in etfList) {
newCol <- paste(etfName, etfColName, sep = "$")
newcol <- periodReturn(Ad(etfName),period='daily',subset=NULL,type='log')
}
Of course, using strings (obviously) doesn't work, because
typeof(newCol) # is [1] "character"
typeof(logRtn) # is [1] "double"
I've tried everything I can think of (at least twice) to coerce the character string etfName$etfColName into an object that I can assign calculations to.
I've looked at many variations that work with data.frames, e.g., mutate() from dplyr, but don't work on xts data files. I could convert datasets back/forth from xts to data.frames, but that's pretty kludgy (to say the least).
So, can anyone suggest an elegant and straightforward solution to this problem (i.e., in somewhat less than 25 lines of code)?
I shall be so grateful that, when I make enough to buy my own NFL team, you will always have a place of honor in the owner's box.
This type of task is a lot easier if you store your data in a new environment. Then you can use eapply to loop over all the objects in the environment and apply a function to them.
library(quantmod)
etfList <- c("IEF","SPY")
# new environment to store data
etfEnv <- new.env()
# use env arg to make getSymbols load the data to the new environment
getSymbols(etfList, from="2013-12-15", to="2013-12-31", env=etfEnv)
# function containing stuff you want to do to each instrument
etfTransform <- function(x, ...) {
# remove instrument name prefix from colnames
colnames(x) <- gsub(".*\\.", "", colnames(x))
# add return column
x$logRtn <- periodReturn(Ad(x), ...)
x
}
# use eapply to apply your function to each instrument
etfData <- eapply(etfEnv, etfTransform, period='daily', type='log')
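To see what eapply returns without downloading anything, here is a small offline sketch of the same pattern. The environment toyEnv, the function stripAndReturn and the price values are all invented for this illustration, and the log return is computed by hand with diff(log(...)) instead of quantmod's periodReturn.

```r
# Hypothetical environment holding two toy "instruments"
toyEnv <- new.env()
assign("IEF", data.frame(IEF.Close = c(100, 101, 103)), envir = toyEnv)
assign("SPY", data.frame(SPY.Close = c(180, 182, 181)), envir = toyEnv)

# Same shape as etfTransform: strip the prefix, add a return column
stripAndReturn <- function(x) {
  colnames(x) <- gsub(".*\\.", "", colnames(x))
  x$logRtn <- c(0, diff(log(x$Close)))  # first period has no return
  x
}

# eapply returns a list named after the objects in the environment
toyData <- eapply(toyEnv, stripAndReturn)
toyData$IEF
```

The result is a named list, so each transformed object is available as toyData$IEF or toyData[["SPY"]]. The order of the elements is not guaranteed, so index by name rather than by position.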
(I didn't realize that you had posted a reproducible example.)
See if this is helpful:
etfColName = "logRtn"
for ( etfName in etfList ) {
newCol <- get(etfName)[ , etfColName]
assign(etfName, cbind( get(etfName),
periodReturn( Ad(get(etfName)),
period='daily',
subset=NULL,type='log')))
}
> names(SPY)
[1] "SPY.Open" "SPY.High" "SPY.Low" "SPY.Close"
[5] "SPY.Volume" "SPY.Adjusted" "logRtn" "daily.returns"
I'm not a quantmod user and it's only from the behavior I see that I believe the Ad function returns a named vector. (So I did not need to do any naming.)
R is not a macro language, which means you cannot just string together character values and expect them to get executed as though you had typed them at the command line. The get and assign functions allow you to 'pull' and 'push' items from the data object environment on the basis of character values, but you should not use the $-function in conjunction with them.
I still do not see a connection between the creation of newCol and the actual new column that your code was attempting to create. They have different spellings so would have been different columns ... if I could have figured out what you were attempting.
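The pull, modify, push cycle with get and assign can be sketched on toy data as follows. The two data frames and their values are made up for illustration, and the column is added with [[<- on the retrieved object rather than by pasting a "$" into a string.

```r
# Toy stand-ins for the downloaded objects (hypothetical values)
IEF <- data.frame(Adjusted = c(98.36, 98.67))
SPY <- data.frame(Adjusted = c(170.10, 171.00))

colName <- "logRtn"
for (nm in c("IEF", "SPY")) {
  obj <- get(nm)                                    # pull the object by name
  obj[[colName]] <- c(0, diff(log(obj$Adjusted)))   # modify the copy
  assign(nm, obj)                                   # push it back under the same name
}

names(IEF)
```

The character value is only ever used as an object name (for get and assign) or as a column name (for [[), never as a piece of code to be evaluated.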
An R novice is once again seeking help.
General situation: I am currently writing a script, and I have several data frames per experiment.
The experiments vary in the time steps of the measurements and in the number of reactors, so I need
two-dimensional flexibility in my script to "massage" the data into the right shape for the desired tests, and to draw the necessary data from multiple data frames.
Unfortunately I chose to use for loops to account for this, which I now see is bad practice in R,
but I have gotten too far to change direction now.
The problem: inside a for loop, I want one-column matrices to have their column named after the object's name. I need them to be in matrix format because of further functions I want to apply.
# Simple but non- flexible examples of what I want to do:
# creates two matrix objects
a1 <- matrix(c(1,2,3,4,5))
a2 <- matrix(c(1,2,3,4,5))
#names header of the objects name
colnames(a1) <- "a1"
colnames(a2) <- "a2"
This works, but I need it to work within a for loop...
# here are the two flexible but non- working approaches of mine
# creates two matrix objects
a1 <- matrix(c(1,2,3,4,5))
a2 <- matrix(c(1,2,3,4,5))
# should name object according to progress in loop
for(i in 1:2)
{
assign(colnames(paste("a",i,sep="",collapse="")),do.call("c",list(paste("a",i,sep=""))))
}
which isn't the proper use of assign and raises an error.
The second attempt doesn't raise an error but doesn't work either; it creates empty objects.
# creates two matrix objects
a1 <- matrix(c(1,2,3,4,5))
a2 <- matrix(c(1,2,3,4,5))
# should name object according to progress in loop
for(i in 1:2)
{
assign(paste("a",i,sep="", colapse=""),do.call("colnames",list(paste("a",i,sep="", colapse=""))))
}
My conclusion: I do not understand the proper way of combining assign and colnames.
If anyone has a suggestion for how I could get it up and running, that would be awesome.
So far I have searched for: R combining assign and colnames inside for loop, R using assign and colnames, R naming data with for loops,...
but unfortunately I didn't manage to extrapolate a solution to my problem.
The following is a function that, by default, will take any objects in the parent environment that have a name that starts with a followed by numbers, check that they are one column matrices, and if they are, name the columns with the name of the object.
a1 <- matrix(c(1:5))
a2 <- matrix(c(1:5))
name_cols()
a1
# a1
# [1,] 1
# [2,] 2
# ...
a2
# a2
# [1,] 1
# [2,] 2
# ...
And here is the code:
name_cols <- function(pattern="^a[0-9]+", env=parent.frame()) {
lapply(
ls(pattern=pattern, envir=env),
function(x) {
var <- get(x, envir=env)
if(is.matrix(var) && identical(ncol(var), 1L)) {
colnames(var) <- x
assign(x, var, env)
} } )
invisible(NULL)
}
Note I chose specification by pattern, but you can easily change this to specify the names of the variables (instead of using ls, just pass the names and lapply over those), or potentially even the objects themselves (though you have to use substitute for that, and the wisdom of this becomes questionable).
More generally, if you have several related objects on which you will be performing related analysis (e.g. in this case modifying columns), you should really consider storing them in lists rather than at the top level. If you do this, then you can easily use the built in *pply functions to operate on all your objects at once. For example:
a.lst <- list(a1=matrix(1:5), a2=matrix(1:5))
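# sapply(..., simplify=FALSE) keeps the element names; lapply over names() would drop them
a.lst <- sapply(names(a.lst), function(x) {colnames(a.lst[[x]]) <- x; a.lst[[x]]}, simplify=FALSE)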
(edited to reflect help...I'm not doing great with formatting, but appreciate the feedback)
I'm a bit stuck on what I suspect is an easy enough problem. I have multiple different data sets that I have loaded into R, all of which have different numbers of observations, but all of which have three variables named "A1," "A2," and "A3". I want to create a new variable in each of the data frames that contains the value held in "A1" if A3 contains a value greater than zero, and the value held in "A2" if A3 contains a value less than zero. Seems simple enough, right?
My attempt at this code uses this faux-data:
set.seed(1)
A1=seq(1,100,length=100)
A2=seq(-100,-1,length=100)
A3=runif(100,-1,1)
df1=cbind(A1,A2,A3)
A3=runif(100,-1,1)
df2=cbind(A1,A2,A3)
I'm about a thousand percent sure that R has some functionality for creating the same named variable in multiple data frames, but I have tried doing this with lapply:
mylist=list(df1,df2)
lapply(mylist,function(x){
x$newVar=x$A1
x$newVar[x$A3>0]=x$A2[x$A3>0]
return(x)
})
But the newVar is not available for me once I leave the lapply loop. For example, if I ask for the mean of the new variable:
mean(df1$newVar)
[1] NA
Warning message:
In mean.default(df1$newVar) :
argument is not numeric or logical: returning NA
Any help would be appreciated.
Thank you.
Well first of all, df1 and df2 are not data.frames but matrices (the dollar syntax doesn't work on matrices).
In fact, if you do:
set.seed(1)
A1=seq(1,100,length=100)
A2=seq(-100,-1,length=100)
A3=runif(100,-1,1)
df1=as.data.frame(cbind(A1,A2,A3))
A3=runif(100,-1,1)
df2=as.data.frame(cbind(A1,A2,A3))
mylist=list(df1,df2)
lapply(mylist,function(x){
x$newVar=x$A1
x$newVar[x$A3>0]=x$A2
})
the code almost works but gives some warnings. In fact, there's still an error in the last line of the function called by lapply. If you change it like this, it works as expected:
lapply(mylist,function(x){
x$newVar=x$A1
x$newVar[x$A3>0]=x$A2[x$A3>0] # you need to subset x$A2 otherwise it's too long
return(x) # better to state explicitly what's the return value
})
EDIT (as per comment):
As is almost always the case in R, functions do not mutate existing objects but return brand new objects.
So, in this case df1 and df2 are unchanged, but lapply returns a list with the expected 2 new data.frames, i.e.:
resultList <- lapply(mylist,function(x){
x$newVar=x$A1
x$newVar[x$A3>0]=x$A2[x$A3>0]
return(x)
})
newDf1 <- resultList[[1]]
newDf2 <- resultList[[2]]
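A slightly more compact variant of the same idea, as a sketch: give the list elements names so the results can be reached by name, and build the new column in one step with ifelse. The logic mirrors the code above (A2 where A3 > 0, A1 otherwise). The names df1 and df2 inside the list are just labels chosen here.

```r
set.seed(1)
A1 <- seq(1, 100, length = 100)
A2 <- seq(-100, -1, length = 100)

# Named list, so each result can be picked out by name afterwards
mylist <- list(
  df1 = data.frame(A1, A2, A3 = runif(100, -1, 1)),
  df2 = data.frame(A1, A2, A3 = runif(100, -1, 1))
)

# Overwrite the list with the modified copies that lapply returns
mylist <- lapply(mylist, function(x) {
  x$newVar <- ifelse(x$A3 > 0, x$A2, x$A1)
  x
})

mean(mylist$df1$newVar)
```

Because the result is assigned back to mylist, the new column is available afterwards as mylist$df1$newVar, while the original top-level df1 and df2 (if they exist) remain untouched.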
I am trying to calculate the column means for different groups in R. There are several ways to assign the groups, so two columns were created that contain different groupings.
# create a test df
df.abcd.2<-data.frame(Grouping1=c("a","f","a","d","d","f","a"),Grouping2=c("y","y","z","z","x","x","q"),Var1=sample(1:7),Var2=sample(1:7),Var3=rnorm(1:7))
df.abcd.2
Now I created a loop with assign, lapply, split and colMeans to get my results and store them in different dfs. The loop works fine.
#Loop to create the colmeans and store them in dataframes
for (i in 1:2){
nam <- paste("RRRRRR",deparse(i), sep=".")
assign(nam, as.data.frame(
lapply(
split(df.abcd.2[,3:5], df.abcd.2[,i]), colMeans)
)
)
}
So now I would like to create a function to apply this method to different dataframes. My attempt looked like this:
# 1. function to calculate colMeans for different groups
# df= desired dataframe,
# a=starting column: beginning of the columns that contain the groups, b= end of columns that contain the groups
# c=starting column: beginning of columns to be analyzed, d=end of columns to be analyzed
function.split.colMeans<-function(df,a,b,c,d)
{for (i in a:b){
nam <- paste("OOOOO",deparse(i), sep=".")
assign(nam, as.data.frame(
lapply(
split(df[,c:d], df[,i]), colMeans)
)
)
}
}
#test the function
function.split.colMeans(df.abcd.2,1,2,3,5)
So when I test this function I get neither an error message nor results... Can anyone help me out, please?
It's working perfectly. Read the help for assign. Learn about frames and environments.
In other words, it's creating the variables inside your function, but they don't leak out into the environment you see when you do ls() at the command line. If you put print(ls()) inside your function's loop you'll see them, but when the function ends, they disappear.
Normally, the only way functions interact with their calling environment is by their return value. Any other method is entering a whole world of pain.
DON'T use assign to create things with sequential or informative names. Ever. Unless you know what you are doing, which you don't... Stick them in lists, then you can index the parts for looping and so on.
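A minimal sketch of the list-returning version of that function follows. The name split_col_means and its arguments are made up for this illustration, and the test data uses fixed numbers instead of sample() and rnorm() so that the result is reproducible.

```r
# Returns one data frame of group means per grouping column, collected in a list
split_col_means <- function(df, group_cols, value_cols) {
  res <- lapply(group_cols, function(i) {
    as.data.frame(lapply(split(df[, value_cols], df[, i]), colMeans))
  })
  names(res) <- names(df)[group_cols]  # label each result by its grouping column
  res
}

# Fixed test data with the same layout as df.abcd.2
df <- data.frame(
  Grouping1 = c("a", "f", "a", "d", "d", "f", "a"),
  Grouping2 = c("y", "y", "z", "z", "x", "x", "q"),
  Var1 = 1:7,
  Var2 = 7:1,
  Var3 = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5)
)

out <- split_col_means(df, group_cols = 1:2, value_cols = 3:5)
out$Grouping1
```

The caller keeps the return value and indexes it, e.g. out$Grouping1 or out[[2]], instead of looking for objects that the function created as a side effect.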