Replace (recode) values in a list

I have a list with the following structure:
mylist=list(y~ A,
y ~ A+B,
y ~ A+B+C)
I want to replace (recode) the "y" with a "z". My goal is:
mylist=list(z~ A,
z ~ A+B,
z ~ A+B+C)
Q: How to replace (recode) values in a list?
I have tried this:
for i in range(len(mylist)):
mylist[i] = mylist[i].replace('y','z')
but it is not working.

The update function is useful for formulas.
Just put a . on any side of the formula that you want to retain. So, for your problem, the following is a quick one-liner:
lapply(mylist, update, new = z~.)
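As a minimal sketch of how the . placeholder behaves (the names newlist, lhs, extended and the term D are only illustrative, not from the question), update keeps whichever side is written as . and rewrites the other:

```r
mylist <- list(y ~ A, y ~ A + B, y ~ A + B + C)

# Replace the response, keep the right-hand side untouched
newlist <- lapply(mylist, update, new = z ~ .)

# Collect the left-hand side of every updated formula as text
lhs <- vapply(newlist, function(f) as.character(f[[2]]), character(1))

# Keep the response, extend the right-hand side (D is a hypothetical term)
extended <- update(y ~ A, . ~ . + D)
```

Here lhs should contain only "z", and extended should read y ~ A + D.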

Alternatively, I would suggest using R's built-in formula manipulation functionality. This allows us to operate on the different terms of a formula separately, without using regex:
lapply(mylist, function(x) reformulate(as.character(terms(x))[3], "z"))
# [[1]]
# z ~ A
# <environment: 0x59c6040>
#
# [[2]]
# z ~ A + B
# <environment: 0x59c0308>
#
# [[3]]
# z ~ A + B + C
# <environment: 0x59bb7b8>
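A possible variation on the same idea, assuming you would rather pull the right-hand side terms from the "term.labels" attribute than from as.character (the names f, rhs_terms and g are illustrative only):

```r
f <- y ~ A + B + C

# Term labels of the right-hand side as a character vector
rhs_terms <- attr(terms(f), "term.labels")

# Rebuild the formula with a new response
g <- reformulate(rhs_terms, response = "z")
```

Note that this sketch goes through the expanded term labels, so a formula such as y ~ A * B would come back in its expanded form.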

Since you have a list of formulas as a start, you can convert the formulas to characters, use gsub to do the replacement and convert it back to formula. Use env parameter to specify the environment of the formulas to make sure they are the same as original list:
lapply(mylist, function(f) formula(gsub("y", "z", format(f)), env = .GlobalEnv))
# [[1]]
# z ~ A
# [[2]]
# z ~ A + B
# [[3]]
# z ~ A + B + C
To address @David Arenburg's concern, so that the replacement of y always happens on the left side of the formula, we can use a more restrictive regular expression:
lapply(mylist, function(f) formula(gsub("y(\\s)?(?=~)", "z", format(f), perl = TRUE), env = .GlobalEnv))
# [[1]]
# z ~ A
# [[2]]
# z ~ A + B
# [[3]]
# z ~ A + B + C
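To see why the restricted pattern matters, here is a small sketch on plain strings. The formula text "y ~ xy", with a predictor named xy, is a made-up case that is not in the question:

```r
f_chr <- "y ~ xy"  # hypothetical formula text; the predictor name also contains a y

# Naive replacement touches every y, including the one inside the predictor name
naive <- gsub("y", "z", f_chr)

# The lookahead only matches a y (plus optional whitespace) directly before the ~
restricted <- gsub("y(\\s)?(?=~)", "z", f_chr, perl = TRUE)

# Converting back to a formula normalises the spacing
f_new <- as.formula(restricted)
```

The naive version yields "z ~ xz", while the restricted one yields "z~ xy", which parses to z ~ xy.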

Just like expressions, formulas have replaceable parts. So you could use [[<- to replace parts of the formula. The y value is the second in the expression list, as ~ is a function and hence the first.
lapply(mylist, "[[<-", 2, as.name("z"))
# [[1]]
# z ~ A
#
# [[2]]
# z ~ A + B
#
# [[3]]
# z ~ A + B + C
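A short sketch of the structure this relies on, using a single formula (the names f and parts are illustrative):

```r
f <- y ~ A + B

# A two-sided formula is a call with three parts: the function `~`, the LHS, the RHS
parts <- as.list(f)
# parts[[1]] is `~`, parts[[2]] is y, parts[[3]] is A + B

# Replacing the second part swaps the response and leaves the rest alone
f[[2]] <- as.name("z")
```

After the assignment, f reads z ~ A + B.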

Related

Applying function on vector in R

I am fairly new to programming in R and I am wondering why this does not work:
w <- c(1,0)
deriv(~x^2+y,c("x","y"),function.arg = TRUE)(w)
I really want to apply the function produced by deriv() on a variable w.
Maybe some background on how to deal with these kinds of "macros" might be helpful...

We can use do.call and pass the 'w' as a list of arguments
do.call(deriv(~x^2+y,c("x","y"),function.arg = TRUE), as.list(w))
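As a rough illustration of what do.call does here (g, via_do_call and direct are names I made up), as.list(w) turns the vector into one argument per element, so the call is equivalent to passing x and y separately:

```r
g <- deriv(~ x^2 + y, c("x", "y"), function.arg = TRUE)
w <- c(1, 0)

# do.call spreads the list elements over the arguments x and y
via_do_call <- do.call(g, as.list(w))

# The same call written out by hand
direct <- g(w[1], w[2])
```

Both results hold the value 1 with a "gradient" attribute of 2 for x and 1 for y.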

Your function exposes two non-default parameters but you only pass one argument. Below would work, if this is your intention:
w <- c(1,0)
deriv( ~ x^2 + y, c("x","y"), function.arg = TRUE)(w, w)
# [1] 2 0
# attr(,"gradient")
# x y
# [1,] 2 1
# [2,] 0 1
Alternatively, set up a default parameter:
w <- c(1,0)
deriv( ~ x^2 + y, c("x","y"), function.arg = function(x, y=2){})(x=w)
# [1] 3 2
# attr(,"gradient")
# x y
# [1,] 2 1
# [2,] 0 1
# MORE READABLE VERSION WITH identity()
myfunc <- deriv( ~x^2 + y, c("x","y"), func = function(x, y=2) identity(x,y))
myfunc(x=w)

Create expression test to see if `as.formula('X ~ 1')` in R contains an intercept?

In R, one can specify a formula:
F <- as.formula('X ~ 1')
I am trying to come up with a way to test whether F above contains only an intercept, i.e., ~ 1. I tried using grepl, to no avail. Is there a way to definitively test whether the above formula contains only an intercept? That is, I am hoping to come up with a method that would return TRUE in the following different cases:
F <- as.formula('X~ 1')
F <- as.formula('X~1')
F <- as.formula('X ~1')
as well. Thanks!

attr(terms(x~y), 'intercept') will do what you want.
formula <- x~y
formula2 <- x~y-1 # no intercept
attr(terms(formula), 'intercept')
## [1] 1
attr(terms(formula2), 'intercept')
## [1] 0
EDIT: I initially misread the question. If you want a function that checks whether a formula contains only an intercept, you could use:
f1 <- x ~ y
f2 <- x ~ y-1
f3 <- x ~ 1
f4 <- x ~ 0
onlyIntercept <- function(f){
return(attr(terms(f), 'intercept') & length(attr(terms(f), 'factors')) == 0)
}
# Examples on above, then on OPs examples:
onlyIntercept(f1)
## [1] FALSE
onlyIntercept(f2)
## [1] FALSE
onlyIntercept(f3)
## [1] TRUE
onlyIntercept(f4)
## [1] FALSE
onlyIntercept(as.formula('X~ 1'))
## [1] TRUE
onlyIntercept(as.formula('X~1'))
## [1] TRUE
onlyIntercept(as.formula('X ~1'))
## [1] TRUE
The onlyIntercept function I define here checks that the intercept attribute is 1 (rather than 0) and that there are no additional factors (variables) that would normally be included in a model. If none are present, the factors attribute has a length of 0, which can easily be checked.
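A variant sketch of the same check, assuming you prefer the "term.labels" attribute and the scalar && operator (only_intercept_terms is a hypothetical name, not part of the answer above):

```r
only_intercept_terms <- function(f) {
  tt <- terms(f)
  # Intercept present and no other terms on the right-hand side
  attr(tt, "intercept") == 1 && length(attr(tt, "term.labels")) == 0
}

only_intercept_terms(as.formula("X~1"))  # TRUE
only_intercept_terms(X ~ 0)              # FALSE
only_intercept_terms(X ~ y)              # FALSE
```

Using && guarantees a single logical value, which is convenient inside an if statement.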

You can use the lazyeval package:
> F <- as.formula('X ~ 1')
> lazyeval::f_rhs(F)
[1] 1

We can extract and check
F[[3]] == 1
because if we do as.list, the 3rd list element is 1
as.list(F)
#[[1]]
# `~`
#[[2]]
#X
#[[3]]
#[1] 1
It will return TRUE for all three F's in the OP's post.
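One caveat, sketched under the assumption that one-sided formulas may also turn up: a formula like ~ 1 has only two parts, so F[[3]] would fail. Indexing the last element covers both cases (rhs_is_one is a hypothetical helper name):

```r
rhs_is_one <- function(f) {
  # The right-hand side is always the last element of the formula call
  identical(f[[length(f)]], 1)
}

rhs_is_one(as.formula("X ~ 1"))  # TRUE
rhs_is_one(~ 1)                  # TRUE
rhs_is_one(X ~ y + 1)            # FALSE
```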

Sum of all vectors of variables with common prefix

Is it possible to sum all vector variables with a common prefix?
Example:
x1 <- c(1,2,3)
x2 <- c(4,5,6)
.
.
.
xn <- c(n,n,n)
y = x1 + x2 + ... + xn
The number of variables xn (i.e., those with prefix x) is only known at runtime.

Assuming y has the same length as each x vector, you could collect all the matching variables into a list and sum them element-wise.
> x2 <- c(4,5,6)
> x1 <- c(1,2,3)
> ls(pattern = "^x\\d+$") # this is regex for finding "x" and "digits",
# ^ is start of string, $ is end of string
[1] "x1" "x2"
> sapply(ls(pattern = "^x\\d+$"), get, simplify = FALSE)
$x1
[1] 1 2 3
$x2
[1] 4 5 6
> out <- sapply(ls(pattern = "^x\\d+$"), get, simplify = FALSE)
> Reduce("+", out)
[1] 5 7 9
You can also use mget, as suggested by @LyzandeR, especially if you fancy one-liners.
Reduce("+", mget(ls(pattern = "^x\\d+$")))
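If you would rather not depend on whatever else happens to live in the global environment, the same approach can be pointed at a dedicated environment. This is only a sketch; the environment e and the extra variable z are assumptions for illustration:

```r
e <- new.env()
assign("x1", c(1, 2, 3), envir = e)
assign("x2", c(4, 5, 6), envir = e)
assign("z",  c(7, 8, 9), envir = e)  # does not match the pattern, so it is ignored

# Look up the names inside e only, then fetch and add the vectors element-wise
nms <- ls(envir = e, pattern = "^x\\d+$")
y <- Reduce(`+`, mget(nms, envir = e))
y
# [1] 5 7 9
```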

You can check an example:
xx <- 1
xx2 <- 2
xx3 <- 3
#get the names of the variables containing xx
vars <- ls(pattern = 'xx')
#mget will get the variables from the names, unlist will add them in an atomic vector
sum(unlist(mget(vars)))
#[1] 6
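Note that with vector-valued variables, sum(unlist(...)) collapses everything into a single total, whereas Reduce("+", ...) adds element-wise. A small sketch with a made-up list vals standing in for the result of mget:

```r
vals <- list(x1 = c(1, 2, 3), x2 = c(4, 5, 6))

total       <- sum(unlist(vals))   # one number: 21
elementwise <- Reduce(`+`, vals)   # one number per position: 5 7 9
```

Which one you want depends on whether y should be a scalar or a vector of the same length as the inputs.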

A very naive solution could be:
# first 2 vectors are of interest
x1 <- c(1,2,3)
x2 <- c(4,5,6)
# answer doesn't need to have z sum in it
z <- c(7,8,9)
# create a dummy answer vector, initialized with all 0s; its length is the length of a single vector that we are adding
answer<-rep(0,length(x1))
# loop through each variable in current environment
for (var in ls()){
# see if variable name begins with x
if (startsWith(var,'x')){
# add it to our answer
answer = answer + get(var)
}
}
# print the answer
print(answer)

Behavior of do.call() in the presence of arguments without defaults

This question is a follow-up to a previous answer which raised a puzzle.
Reproducible example from the previous answer:
Models <- list( lm(runif(10)~rnorm(10)),lm(runif(10)~rnorm(10)),lm(runif(10)~rnorm(10)) )
lm1 <- lm(runif(10)~rnorm(10))
library(functional)
# This works
do.call( Curry(anova, object=lm1), Models )
# But so does this
do.call( anova, Models )
The question is: why does do.call(anova, Models) work fine, as @Roland points out?
The signature for anova is anova(object, ...)
anova calls UseMethod, which should* call anova.lm which should call anova.lmlist, whose first line is objects <- list(object, ...), but object doesn't exist in that formulation.
The only thing I can surmise is that do.call might not just fill in ellipses but fills in all arguments without defaults and leaves any extra for the ellipsis to catch? If so, where is that documented, as it's definitely new to me!
* Which is itself a clue--how does UseMethod know to call anova.lm if the first argument is unspecified? There's no anova.list method or anova.default or similar...

In a regular function call ... captures arguments by position, partial match and full match:
f <- function(...) g(...)
g <- function(x, y, zabc) c(x = x, y = y, zabc = zabc)
f(1, 2, 3)
# x y zabc
# 1 2 3
f(z = 3, y = 2, 1)
# x y zabc
# 1 2 3
do.call behaves in exactly the same way except instead of supplying the arguments directly to the function, they're stored in a list and do.call takes care of passing them into the function:
do.call(f, list(1, 2, 3))
# x y zabc
# 1 2 3
do.call(f, list(z = 3, y = 2, 1))
# x y zabc
# 1 2 3

I think it is worth stressing that the names of the list elements do matter. Hadley mentioned it, but it can be an annoyance. Consider the next example:
x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000) + 0.2
Models <- list()
Models$xy <- lm(z~x)
Models$yz <- lm(z~y)
# This will fail, because do.call will not assign anything to the argument "object" of anova
do.call(anova, Models)
# This won't
do.call(anova, unname(Models))
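The effect of the names can be reproduced without fitting any models. In this sketch, h is a toy function with the same signature shape as anova (the function and its return strings are made up for illustration):

```r
h <- function(object, ...) {
  if (missing(object)) "object is missing" else "object was supplied"
}

args <- list(xy = 1, yz = 2)

# Named elements that match no formal argument are all swallowed by ...
named <- do.call(h, args)

# Without names, the first element is matched to object by position
unnamed <- do.call(h, unname(args))
```

Here named is "object is missing" and unnamed is "object was supplied".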

do.call passes the first element of the list to the first argument:
fun <- function(x,...) {
print(paste0("x=",x))
list(x, ...)
}
do.call(fun, list(1,2))
# [1] "x=1"
# [[1]]
# [1] 1
#
# [[2]]
# [1] 2

Evaluate a symbolic Ryacas expression

This is a reproducible example:
a <- 0.05
za.2 <- qnorm(1-a/2)
b <- 0.20
zb <- qnorm(1-b)
lambda12 <- -log(1/2)/12
lambda18 <- -log(1/2)/18
theta <- lambda18/lambda12
(d = round(4*(za.2+zb)^2/log(theta)^2))
Tf<-36
library(Ryacas)
n <- Sym("n")
Solve(n/2*(2-exp(-lambda12*Tf)-exp(-lambda18*Tf))==d , n)
The last line returns
expression(list(n == 382/1.625))
Is there a way to extract the quotient and assign it to another variable (235.0769)?

G.Grothendieck pointed out in the comments that you'll first need to capture the expression to be operated upon below:
soln <- Solve(n/2*(2-exp(-lambda12*Tf)-exp(-lambda18*Tf))==d , n)
X <- yacas(soln)$text
Then, to extract the quotient, you can take advantage of the fact that many R language objects either are or can be coerced to lists.
X <- expression(list(n == 382/1.625))
res <- eval(X[[1]][[2]][[3]])
res
[1] 235.0769
The following just shows why that sequence of indices extracts the right piece of the expression:
as.list(X)
# [[1]]
# list(n == 382/1.625)
as.list(X[[1]])
# [[1]]
# list
#
# [[2]]
# n == 382/1.625
as.list(X[[1]][[2]])
# [[1]]
# `==`
#
# [[2]]
# n
#
# [[3]]
# 382/1.625
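If you also want to keep the variable name alongside the value, the same indexing can be wrapped up as follows. This is only a sketch in base R on the expression shown above; eq and solution are illustrative names:

```r
X <- expression(list(n == 382/1.625))

# The equation n == 382/1.625 is the second element of the list(...) call
eq <- X[[1]][[2]]

# eq[[2]] is the symbol n, eq[[3]] is the unevaluated quotient
solution <- setNames(list(eval(eq[[3]])), as.character(eq[[2]]))
solution$n  # about 235.0769
```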