Using lm(my_formula) inside [.data.table's j

I have gotten in the habit of accessing data.table columns in j even when I do not need to:
require(data.table)
set.seed(1); n = 10
DT <- data.table(x=rnorm(n),y=rnorm(n))
frm <- formula(x~y)
DT[,lm(x~y)] # 1 works
DT[,lm(frm)] # 2 fails
lm(frm,data=DT) # 3 what I'll do instead
I expected # 2 to work, since lm should search for variables in DT and then in the global environment... Is there an elegant way to get something like # 2 to work?
In this case, I'm using lm, which takes a "data" argument, so # 3 works just fine.
EDIT. Note that this works:
x1 <- DT$x
y1 <- DT$y
frm1 <- formula(x1~y1)
lm(frm1)
and this, too:
rm(x1,y1)
bah <- function(){
x1 <- DT$x
y1 <- DT$y
frm1 <- formula(x1~y1)
lm(frm1)
}
bah()
EDIT2. However, this fails, illustrating @eddi's answer:
frm1 <- formula(x1~y1)
bah1 <- function(){
x1 <- DT$x
y1 <- DT$y
lm(frm1)
}
bah1()

lm looks up the variables used in a formula in the environment of that formula. Since you create your formula in the global environment, lm is not going to look in the j-expression environment, so the only way to make the exact expression lm(frm) work would be to add the appropriate variables to the correct environment:
DT[, {assign('x', x, environment(frm));
assign('y', y, environment(frm));
lm(frm)}]
Obviously this is not a very good solution; both Arun's and Josh's suggestions are much better. I'm only putting it here to help explain the problem at hand.
Edit: Another (possibly more perverse, and quite fragile) way would be to change the environment of the formula itself (I do it permanently here, but you could revert it afterwards, or copy the formula and change the copy):
DT[, {setattr(frm, '.Environment', get('SDenv', parent.frame(2))); lm(frm)}]
By the way, something funny is happening here: whenever you use get in a j-expression, all of the columns are constructed (so avoid it if you can), which is why I don't need to mention x and y in some other way for data.table to know that those variables are needed.
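To see the lookup rule outside of data.table, here is a minimal base-R sketch. The names d and fit_local are made up for illustration. The only point is that lm resolves x and y through the formula's environment, so re-pointing that environment on a local copy of the formula lets lm find local variables.

```r
# Illustrative sketch (base R only); `d` and `fit_local` are hypothetical names.
set.seed(1)
d <- data.frame(x = rnorm(10), y = rnorm(10))
frm <- x ~ y  # the formula remembers the environment it was created in

fit_local <- function(f, d) {
  x <- d$x
  y <- d$y
  # Re-point the environment of the local copy `f` to this function's frame,
  # so that lm() finds the local x and y.
  environment(f) <- environment()
  lm(f)
}

fit <- fit_local(frm, d)
coef(fit)
```

In everyday code, passing data = d to lm remains the simpler choice; the sketch is only meant to make the environment lookup visible.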

Related

Making defining objects easier

So, I'm a newbie in R and want to make my experience with it as straightforward as possible. I work with multi-response datasets (50+ responses) and would like to avoid manually typing x1 = dataset$x1, x2 = dataset$x2, etc.
Is there a script to make every column header an object?
Cheers!
There are two common approaches (these have also been mentioned in the comments):
You could attach() the dataset, and detach() when done.
You could also use with().
Suppose you have a data.frame named dataset, and in it are $x1 and $x2.
An example using attach() would be:
attach(dataset)
newvar <- x1 + x2
newvar2 <- x1 - x2
detach(dataset)
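And an example using with(). Note that variables assigned inside the braces are local to the with() call and are gone once it returns, so assign the value of with() itself if you need a result afterwards: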
with(dataset, {
newvar <- x1 + x2
newvar2 <- x1 - x2
})
I hope I answered your question, if not, feel free to rephrase / edit.
For further examples, take a look at the example in ?attach(), and the boxplot example in ?with().
Here is a reproducible suggestion using only base R functions:
# mtcars is a dummy dataset to work with
list_objects = as.list(mtcars) # make a list with all your columns
# note that you can do lapply(list_objects, function) at this stage...
# but if you really want your objects in your global environment, here is the trick:
list2env(list_objects, globalenv()) # copy each element of the list into the global environment
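If populating the global environment feels too invasive, a gentler variant is to unpack the columns into a dedicated environment, or to skip the unpacking and evaluate expressions against the data directly. A minimal sketch, where cols, ratio and mean_mpg are illustrative names:

```r
# Unpack the columns into their own environment instead of globalenv().
cols <- list2env(as.list(mtcars), envir = new.env())

# Evaluate an expression with the columns visible by name.
ratio <- evalq(mpg / wt, cols)

# with() evaluates against the data frame directly; assign its return value
# to keep the result.
mean_mpg <- with(mtcars, mean(mpg))
```

This keeps the workspace free of 50+ loose vectors while still letting you refer to columns by their bare names.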

how to emulate parameters passed by reference [closed]

Is there a way?
NB: the question is not whether it is right, good or sensible to do such a thing.
The question is whether there is a way, so if your answer would be
"why would you want to do that?", "R uses functions; what you want was once called a procedure, and good R usage/style does not ...", or "could you explain better... provide some code", do NOT answer.
I made a quick attempt using environments, which eventually worked, more or less:
function(mydf) {
varName <- deparse(substitute(mydf))
...
assign(varName,mydf,envir=parent.frame(n = 1))
}
1) Wrap the function body in eval.parent(substitute({...})) like this:
f <- function(x) eval.parent(substitute({
x <- x + 1
}))
mydf <- data.frame(z = 1)
f(mydf)
mydf
## z
## 1 2
Also see the defmacro function in gtools and the wrapr package.
2) An alternative might be to use a replacement function:
"incr<-" <- function(x, value) {
x + value
}
mydf <- data.frame(z = 1)
incr(mydf) <- 1
mydf
## z
## 1 2
3) or just overwrite the input:
f2 <- function(x) x + 1
mydf <- data.frame(z = 1)
mydf <- f2(mydf)
mydf
## z
## 1 2
If the problem is that there are multiple outputs then use list in the gsubfn package. This is used on the left hand side of an assignment with square brackets as shown. See help(list, gsubfn)
library(gsubfn)
f3 <- function(x, y) list(x + 1, y + 2)
mydf <- mydf2 <- data.frame(z = 1)
list[mydf, mydf2] <- f3(mydf, mydf2)
mydf
## z
## 1 2
mydf2
## z
## 1 3
At least for my specific/limited needs I found a solution
myVar = 11
myF <- function(x) {
varName <- deparse(substitute(x))
# print(paste("var name is", varName))
x = 99
assign(varName,x,envir=parent.frame(n = 1))
NA # sorry this is not a function
# in real life sometimes you also need procedures
}
myF(myVar)
print(myVar)
# [1] 99
I think there is no way to emulate call-by-reference. However, several tricks can be used from case to case:
globals: It is, of course, possible to have a global variable instead of the parameter. This can be written from within a function using <<- instead of = or <-. In this way, many cases of needing call-by-reference vanish.
However, this is not compatible with parallelization and also not compatible with recursion.
When you need recursion, you can do very much the same and have a global stack. Before the recursive call, you have to append to this stack and as the first line of your function, you can get the index (similar to a stack pointer in CPUs) in order to write to the global stack.
Neither approach is encouraged; both should be used only as a last resort or for education. If you really can't avoid call-by-reference, go to C++ with Rcpp and write a C++ function that does your heavy lifting. If needed, it can actually call R functions. Look at some Rcpp tutorials; most of them cover this case...
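For illustration only, the global-variable trick mentioned above might look like this. The names total and add_to_total are hypothetical, and the same caveats about parallelization and recursion apply.

```r
total <- 0  # state that lives outside the function

add_to_total <- function(x) {
  # `<<-` assigns to the existing `total` found in an enclosing environment
  # rather than creating a local variable.
  total <<- total + x
  invisible(total)
}

add_to_total(3)
add_to_total(4)
total
```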

Function to rename values in r doesn't work [duplicate]

How do I modify an argument being passed to a function in R? In C++ this would be pass by reference.
g=4
abc <- function(x) {x<-5}
abc(g)
I would like g to be set to 5.
There are ways as @Dason showed, but really - you shouldn't!
The whole paradigm of R is to "pass by value". @Rory just posted the normal way to handle it - just return the modified value...
Environments are typically the only objects that can be passed by reference in R.
But lately new objects called reference classes have been added to R (they use environments). They can modify their values (but in a controlled way). You might want to look into using them if you really feel the need...
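A rough sketch of both ideas, using made-up names (box, set_to_five, Counter): an environment modified inside a function stays modified for the caller, and a reference class wraps the same behaviour in an object with fields and methods.

```r
library(methods)  # provides setRefClass(); usually attached already

# An environment is not copied when it is passed to a function.
box <- new.env()
box$value <- 4
set_to_five <- function(e) {
  e$value <- 5
  invisible(NULL)
}
set_to_five(box)
box$value

# A reference class: methods update fields in place via `<<-`.
Counter <- setRefClass(
  "Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1
      invisible(.self)
    }
  )
)
cnt <- Counter$new(count = 0)
cnt$increment()
cnt$count
```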
There has got to be a better way to do this but...
abc <- function(x){eval(parse(text = paste(substitute(x), "<<- 5")))}
g <- 4
abc(g)
g
gives the output
[1] 5
I have a solution similar to @Dason's, and I am curious if there are good reasons not to use this or if there are important pitfalls I should be aware of:
changeMe = function(x){
assign(deparse(substitute(x)), "changed", envir=.GlobalEnv)
}
I think that @Dason's method is the only way to do it theoretically, but practically I think R's way already does it.
For example, when you do the following:
y <- c(1,2)
x <- y
x is really just a pointer to the value c(1,2). Similarly, when you do
abc <- function(x) {x <- 5; x}
g <- abc(g)
It is not that you are spending time copying g to the function and then copying the result back into g. I think what R does with the code
g <- abc(g)
is:
The right side is looked at first. An environment for the function abc is set up.
A pointer is created in that environment called x.
x points to the same value that g points to.
Then x points to 5
The function returns the pointer x
g now points to the same value that x pointed to at the time of return.
Thus, it is not that there is a whole bunch of unnecessary copying of large objects.
I hope that someone can confirm/correct this.
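A small sketch of the visible semantics, with made-up values: modifying x inside the function leaves the caller's g alone, and only assigning the returned value changes what a name refers to. R defers the actual copy until an object is modified, which is consistent with the description above.

```r
abc <- function(x) {
  x[1] <- 5  # modifies the function's own version of x, not the caller's
  x
}

g <- c(4, 4)
h <- abc(g)  # g is not altered by the call
g
h

g <- abc(g)  # rebinding g to the returned value is the usual idiom
g
```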
Am I missing something as to why you can't just do this?
g <- abc(g)

Using functions to change variable names from upper to lower

I'm working with a bunch of SAS datasets and I prefer the variable names to all be lowercase, using read.sas7bdat, but I do it so often I wanted to write a function. This method works fine,
df <- data.frame(ALLIGATOR=1:4, BLUEBIRD=rnorm(4))
names(file1) <- tolower(names(file1))
but when I try to put it into a function it doesn't assign.
lower <- function (df) {names(df) <- tolower(names(df))}
lower(file1)
I know that there is some larger concept that I'm missing, that is blocking me. It doesn't seem to do anything.
Arguments in R are passed by copy. You have to do:
lower <- function (df) {
names(df) <- tolower(names(df))
df
}
file1 <- lower(file1)
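Since the function now returns its result, it also composes with lapply when several datasets need the same treatment. A sketch with made-up data (datasets and its contents are illustrative; lower is repeated so the block runs on its own):

```r
lower <- function(df) {
  names(df) <- tolower(names(df))
  df  # return the modified copy
}

# Hypothetical stand-ins for several imported SAS datasets.
datasets <- list(
  first  = data.frame(ALLIGATOR = 1:4, BLUEBIRD = 4:1),
  second = data.frame(CAT = 1:2)
)

datasets <- lapply(datasets, lower)
names(datasets$first)
```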
Although I don't see why you would do this rather than simply names(df) <- tolower(names(df)), I think you should do:
lower <- function (x) {tolower(names(x))}
names(df) <- lower(df)
Here is an answer that I don't recommend using anywhere other than the global environment, but it does provide you with some convenient shorthand. Basically we take care of the assignment inside the function, overwriting the object passed to it. Shorthand for you, but please be careful about how you use it:
tl <- function(x){
ass <- all.names(match.call()[-1])
assign( ass , setNames( x , tolower(names(x))) , envir = sys.frame(sys.parent()) )
}
# This will 'overwrite' the names of df...
tl(df)
# Check what df now looks like...
df
alligator bluebird
1 1 0.2850386
2 2 -0.9570909
3 3 -1.3048907
4 4 -0.9077282

