Subset R data.table conditionally using is.null()

I have a data.table
library(data.table)
testDT <- data.table(
L = (1:32),
M = rep(letters[23:26], each = 64),
N = rep(LETTERS[1:2], times = 2, each = 512),
O = rnorm(2048, 1))
testDT$L <- factor(testDT$L, levels = seq(from = 1, to = 32, by = 1))
I created a function to subset the data set conditionally. If the subsetting variable G is NULL and H is "w", then I want all values within testDT$N and all values "w" in testDT$M to be returned in testDT. This is what I created, which does not function correctly:
G <- NULL
H <- "w"
testDT1 <- testDT[if(is.null(G)) {eval(call("%in%", as.name("N"), G))} &
if(is.null(H)) {eval(call("%in%", as.name("M"), H))}]
I verified that everything but the if(is.null()) portion works by creating this, which subsets correctly:
G <- "A"
H <- "w"
testDT1 <- testDT[{eval(call("%in%", as.name("N"), G))} &
{eval(call("%in%", as.name("M"), H))}]
How can I use the is.null() condition correctly?

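The original attempt fails for two reasons: the test is inverted (the filter is applied only when the value is NULL), and an if without an else returns NULL when its condition is FALSE, so the & has nothing valid to combine. Using computing on the language, you can instead build the call object with a dedicated function that returns TRUE (keep every row) when the value is NULL.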
library(data.table)
testDT = data.table(
L = factor(1:32),
M = rep(letters[23:26], each = 64),
N = rep(LETTERS[1:2], times = 2, each = 512),
O = rnorm(2048, 1)
)
i.expr = function(var, x){
if(is.null(x)) TRUE
else call("%in%", as.name(var), x)
}
G = NULL
H = "w"
i.expr("N",G)
#[1] TRUE
i.expr("M",H)
#M %in% "w"
testDT1 = testDT[eval(i.expr("N",G)) & eval(i.expr("M",H))]
G = "A"
H = "w"
i.expr("N",G)
#N %in% "A"
i.expr("M",H)
#M %in% "w"
testDT2 = testDT[eval(i.expr("N",G)) & eval(i.expr("M",H))]
If you always subset by two conditions joined with the & operator, I would merge them into a single function so that you can call it once using testDT[eval(i.expr(...))].
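One possible shape for such a merged function is sketched below. The name build_filter and its named-list argument are hypothetical, not part of the answer above. To keep the sketch runnable without data.table it evaluates the call against a small plain data.frame; with data.table the same call object would presumably be passed as testDT[eval(build_filter(list(N = G, M = H)))].

```r
# Hypothetical helper: build one filter call from a named list of values.
# NULL entries are dropped; if nothing remains, TRUE keeps every row.
build_filter <- function(filters) {
  filters <- Filter(Negate(is.null), filters)
  if (length(filters) == 0L) return(TRUE)
  calls <- Map(function(var, x) call("%in%", as.name(var), x),
               names(filters), filters)
  # Chain the individual conditions with `&`
  Reduce(function(a, b) call("&", a, b), calls)
}

# Small stand-in for testDT (assumption: same column names M and N)
df <- data.frame(
  M = rep(c("w", "x", "y", "z"), each = 2),
  N = rep(c("A", "B"), times = 4),
  stringsAsFactors = FALSE
)

expr_all  <- build_filter(list(N = NULL, M = "w"))  # only M is filtered
expr_both <- build_filter(list(N = "A", M = "w"))   # both are filtered

res_all  <- df[eval(expr_all, df), ]
res_both <- df[eval(expr_both, df), ]
```

Dropping the NULL entries before building the call means the number of conditions is no longer fixed at two.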

Related

Conditional statement: change one variable in a data list based on certain input

Can I use a conditional statement to change one variable in a data list based on certain input?
For instance, take the data list below. I need d to be perd or phyd depending on the input: with dlist[x], d = perd; with dlist[y], d = phyd. x and y can be anything; I just need to pass a key and have d set to perd or phyd.
perd <- c(1, 2, 3)
phyd <- c(4, 5, 6)
dlist <- list(
Nsubjects = 1,
Ntrials = 2,
d = perd
)
You could create another named list to store perd and phyd:
plist <- list(x = c(1, 2, 3), y = c(4, 5, 6))
You can then extract the data from it by its name.
val <- 'x'
dlist <- list(
Nsubjects = 1,
Ntrials = 2,
d = plist[[val]]
)
Without creating plist, you can do:
list(
Nsubjects = 1,
Ntrials = 2,
d = if(val == 1) c(1,2,3) else c(4,5,6)
)
Or:
list(
Nsubjects = 1,
Ntrials = 2,
d = list(c(1,2,3),c(4,5,6))[[val]]
)
where val is 1 or 2.
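If the keys are character strings, switch() is another option. The sketch below is only an illustration: the helper name make_dlist and the keys "x" and "y" are assumptions carried over from the question's wording.

```r
perd <- c(1, 2, 3)
phyd <- c(4, 5, 6)

# Hypothetical helper: build the list for a given key ("x" or "y")
make_dlist <- function(val) {
  list(
    Nsubjects = 1,
    Ntrials = 2,
    # switch() picks the vector by name; the unnamed last argument
    # is the fallback for any other key
    d = switch(val,
               x = perd,
               y = phyd,
               stop("unknown key: ", val))
  )
}

dx <- make_dlist("x")
dy <- make_dlist("y")
```

Unlike the if/else version, an unexpected key raises an error instead of silently falling through to phyd.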

Changing behavior for closure stored in data.table between R 3.4.3 and R 3.6.0

I noticed the following peculiar behavior when I upgraded from R 3.4.3 to R 3.6.0 (both were using data.table 1.12.6). In 3.4.3 the code below leads to the all.equal statement being TRUE, but in 3.6.0 there is a mean relative difference that comes from the fact that even though we are trying to access the approxfun calculated from group "a", the values from group "b" are used (probably somehow due to lazy evaluation). In 3.6.0, this issue can be solved by adding a copy statement in the calls to approxfun based on this question:
Handling of closures in data.table
The fascinating thing to me is that I do not get an error in 3.4.3. Any idea what changed?
library(data.table)
data <- data.table(
group = c(rep("a", 4), rep("b", 4)),
x = rep(c(.02, .04, .12, .21), 2),
y = c(
0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)
dtFuncs <- data[ , list(
func = list(stats::approxfun(x, y, rule = 2))
), by = group]
f <- function(group, x) {
dtResults <- CJ(group = group, x = x)
dtResults <- dtResults[ , {
.g <- group
f2 <- dtFuncs[group == .g, func][[1]]
list(x = x, y = f2(x))
}, by = group]
dtResults
}
x0 <- .07
g <- "a"
all.equal(
with(data[group == g], approx(x, y, x0, rule = 2)$y),
f(group = g, x = x0)$y
)
After running git bisect on the r-source, I was able to deduce that it was this commit that caused the behavior: https://github.com/wch/r-source/commit/adcf18b773149fa20f289f2c8f2e45e6f7b0dbfe
What fundamentally happened was that in the case where x's were ordered in approxfun, an internal copy was no longer made. If the data had been randomly sorted, the code would have continued to work! (see snippet below)
The lesson for me is that it's probably best not to mix complicated objects with data.table, as the same environment is reused for each "by" group (or else to be very deliberate with data.table::copy).
## should be run under R >= 3.6.0 to see disparity
library(data.table)
## original sorted x (does not work)
data <- data.table(
group = c(rep("a", 4), rep("b", 4)),
x = rep(c(.02, .04, .12, .21), 2),
y = c(
0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)
dtFuncs <- data[ , {
print(environment())
list(
func = list(stats::approxfun(x, y, rule = 2))
)
}, by = group]
f <- function(group, x) {
dtResults <- CJ(group = group, x = x)
dtResults <- dtResults[ , {
.g <- group
f2 <- dtFuncs[group == .g, func][[1]]
list(x = x, y = f2(x))
}, by = group]
dtResults
}
get("y", environment(dtFuncs$func[[1]]))
get("y", environment(dtFuncs$func[[2]]))
x0 <- .07
g <- "a"
all.equal(
with(data[group == g], approx(x, y, x0, rule = 2)$y),
f(group = g, x = x0)$y
)
## unsorted x (works)
data <- data.table(
group = c(rep("a", 4), rep("b", 4)),
x = rep(c(.02, .04, .12, .21), 2),
y = c(
0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)
set.seed(10)
data <- data[sample(1:.N, .N)]
dtFuncs <- data[ , {
print(environment())
list(
func = list(stats::approxfun(x, y, rule = 2))
)
}, by = group]
f <- function(group, x) {
dtResults <- CJ(group = group, x = x)
dtResults <- dtResults[ , {
.g <- group
f2 <- dtFuncs[group == .g, func][[1]]
list(x = x, y = f2(x))
}, by = group]
dtResults
}
get("y", environment(dtFuncs$func[[1]]))
get("y", environment(dtFuncs$func[[2]]))
x0 <- .07
g <- "a"
all.equal(
with(data[group == g], approx(x, y, x0, rule = 2)$y),
f(group = g, x = x0)$y
)
## better approach: maybe safer to avoid mixing objects treated by reference
## (data.table & closures) all together...
fList <- lapply(split(data, by = "group"), function(x){
with(x, stats::approxfun(x, y, rule = 2))
})
fList
fList[[1]](.07) != fList[[2]](.07)
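The data.table::copy route mentioned above can be sketched as follows. It assumes, as the question reports, that copying x and y inside the call to approxfun is enough to give each closure its own data; it needs the data.table package to be installed.

```r
library(data.table)

data <- data.table(
  group = rep(c("a", "b"), each = 4),
  x = rep(c(.02, .04, .12, .21), 2),
  y = c(0.0122, 0.01231, 0.01325, 0.01374,
        0.01218, 0.01229, 0.0133, 0.01379)
)

# copy() hands each closure its own x and y rather than the
# per-group vectors that data.table reuses across "by" groups
dtFuncs <- data[, list(
  func = list(stats::approxfun(copy(x), copy(y), rule = 2))
), by = group]

fa <- dtFuncs[group == "a", func][[1]]
fb <- dtFuncs[group == "b", func][[1]]

# Reference value computed directly from group "a"
expected_a <- with(data[group == "a"], approx(x, y, .07, rule = 2)$y)
```

With the copies in place, fa(.07) should agree with expected_a and differ from fb(.07).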

Creating a Table out of a While Loop in R

I am trying to make a table from a while loop. Basically, I want to make a while loop where the value of r increases by 1 and repeats this until the inequality is met. But in addition to that, I want to combine these values into a table with three columns: the value of r, the value of w, and the value of rhs (rounded to 3 decimal places).
```{r}
al = 0.10; n = 30; a = 3; b = 5; r = 2; int = 8; h = (int/2); msE = 19.19
table = function(MSE, V, H, alpha = al, r = 2){
rhs = h^2*r/((V-1)*MSE)
w = qf(alpha, V-1, V*(r-1), lower.tail = FALSE)
g = data.frame(r, round(w, 3), round(rhs, 3))
while(w > rhs){
r = r+1
rhs = h^2*r/((V-1)*MSE)
w = qf(alpha, V-1, V*(r-1), lower.tail = FALSE)
g = data.frame(r, round(w, 3), round(rhs, 3))
}
rbind(g)
}
table(MSE = msE, V = a*b, H = h)
```
I figured it would go something like this, but this only prints out the last value of r before the loop ends (it ends at 26), which results in a "table" that only has one row. I would like a table with 24 rows (since it starts at r = 2).
Any help would be appreciated!
Perhaps this might help:
al = 0.10; n = 30; a = 3; b = 5; r = 2; int = 8; h = (int/2); msE = 19.19
table = function(MSE, V, H, alpha = al, r = 2){
rhs = H^2*r/((V-1)*MSE) # use the H argument rather than the global h
w = qf(alpha, V-1, V*(r-1), lower.tail = FALSE)
gn = data.frame(r, round(w, 3), round(rhs, 3)) # accumulates one row per value of r
while(w > rhs){
r = r+1
rhs = H^2*r/((V-1)*MSE)
w = qf(alpha, V-1, V*(r-1), lower.tail = FALSE)
g = data.frame(r, round(w, 3), round(rhs, 3))
gn <- rbind(gn,g) # append the new row instead of overwriting
}
return(gn)
}
table(MSE = msE, V = a*b, H = h)
A slightly different approach, eliminating the need for an interim data frame and for rbind(). Commented in the code.
# your parameters
al <- 0.10; n <- 30; a <- 3; b <- 5; int <- 8; h <- (int/2); msE <- 19.19
# your function definition (name changed to avoid confusion / conflict with existing R function)
tabula <- function(MSE, V, H, alpha = al, r = 2)
{
g <- data.frame( N = 0, W = 1, RHS = 0 ) # initiate data frame, values set
# so that the while condition is met
# the while function populates the data frame cell by cell,
# eliminating the need for an interim data.frame and rbind()
while( g[ r - 1, "W" ] > g[ r - 1, "RHS" ] ) # check condition in the last data frame row
{ # write values in a new row
g[ r, "N" ] <- r
g[ r, "W" ] <- round( qf( alpha, V - 1, V * ( r - 1 ), lower.tail = FALSE ), 3 )
g[ r, "RHS" ] <- round( H^2 * r / ( ( V - 1 ) * MSE ), 3 ) # H is the function argument
r <- r + 1 # increment row counter
}
return( g[ -1, ] ) # return the data frame, removing the initial row
}
tabula( MSE = msE, V = a * b, H = h )
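A third variant, sketched here as an illustration rather than as either answerer's method, collects the rows in a list and binds them once at the end. The function name collect_rows and the column names r, w and rhs are assumptions.

```r
# Hypothetical variant: store one small data frame per iteration in a list,
# then bind them all in a single rbind() call at the end.
collect_rows <- function(MSE, V, H, alpha = 0.10, r = 2) {
  rows <- list()
  repeat {
    w   <- qf(alpha, V - 1, V * (r - 1), lower.tail = FALSE)
    rhs <- H^2 * r / ((V - 1) * MSE)
    rows[[length(rows) + 1L]] <-
      data.frame(r = r, w = round(w, 3), rhs = round(rhs, 3))
    if (w <= rhs) break  # stop once the inequality is met
    r <- r + 1
  }
  do.call(rbind, rows)
}

res <- collect_rows(MSE = 19.19, V = 3 * 5, H = 4)
res
```

Binding once avoids copying the growing data frame on every pass through the loop.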

R put for loop into vector

I would like to put the output of the following for loop into a single vector.
test=c("A","B","C","D")
for(i in 1:3)
{e=runif(1,5,10);
f=round(e);
g=sample(test,f,TRUE);
h=paste(g,collapse = "");
print(h)}
Output:
[1] "BDCCABD"
[1] "DDBAADBBAA"
[1] "DACCAB"
I would like to get a vector like:
i=c("BDCCABD","DDBAADBBAA","DACCAB")
Thank you for your help
Just a slight adaptation of your code will do it.
set.seed(8632) # make the results reproducible
i <- sapply(1:3, function(x){
e = runif(1, 5, 10)
f = round(e)
g = sample(test, f, TRUE)
h = paste(g, collapse = "")
print(h)
h
})
i
#[1] "CACDAABCC" "ADDAACA" "ACCDAACAB"
Do you really need to print(h)?
EDIT.
I've just tested it and, with this seed, the following simplification gives exactly the same result.
set.seed(8632) # make the results reproducible
j <- sapply(1:3, function(x){
f <- sample(5:10, 1) # a length from 5 to 10; note round(runif(1, 5, 10)) picks 5 and 10 half as often as 6 to 9
g = sample(test, f, TRUE)
h = paste(g, collapse = "")
print(h)
h
})
j
#[1] "CACDAABCC" "ADDAACA" "ACCDAACAB"
identical(i, j)
#[1] TRUE
You mention a vector, so let us use one:
V=vector()
test=c("A","B","C","D")
for(i in 1:3)
{e=runif(1,5,10);
f=round(e);
g=sample(test,f,TRUE);
h=paste(g,collapse = "");
V[i]=h}
V
[1] "BCCAD" "CCDCACBAD" "ADCDBCBCC"
V[1]
[1] "BCCAD"
I think something like this:
test=c("A","B","C","D")
h_final<-0
for(i in 1:3){e=runif(1,5,10);
f=round(e);
g=sample(test,f,TRUE);
h=paste(g,collapse = "");
h_final[i]<-h
if(i==3){print(h_final)}
}
something like this?
j <- character()
test = c("A", "B", "C", "D")
for (i in 1:3) {
e = runif(1, 5, 10)
f = round(e)
g = sample(test, f, TRUE)
h = paste(g, collapse = "")
j <- c(j, h)
}
print(j)
> print(j)
[1] "DDDBADBCD" "ABBCBCC" "BBCAA"
EDIT: Even simpler
test = c("A", "B", "C", "D")
h <- character(3) # pre-allocate the result; h must exist before h[i] is assigned
for (i in 1:3) {
e = runif(1, 5, 10)
f = round(e)
g = sample(test, f, TRUE)
h[i] = paste(g, collapse = "")
}
> print(h)
[1] "DBDADDD" "AABDA" "CDBDDABC"
Not the most elegant way especially if you have lots of iterations but it works:
test=c("A","B","C","D")
k = NA
for(i in 1:3)
{e=runif(1,5,10)
f=round(e)
g=sample(test,f,TRUE)
h=paste(g,collapse = "")
k = append(k,h)
print(h)}
k <- na.omit(k)
You should take a look at vectors in R.
You need to initialize an empty vector; let's call it test_vector:
test_vector = c()
test=c("A","B","C","D")
for(i in 1:3)
{e=runif(1,5,10);
f=round(e);
g=sample(test,f,TRUE);
h=paste(g,collapse = "");
print(h)
test_vector <- c(test_vector,h)
}
Note that you could apply a function to your test vector directly without using a for loop.
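A loop-free version along those lines could look like the sketch below. The helper name make_string is hypothetical, and vapply() is one of several functions that would do here.

```r
test <- c("A", "B", "C", "D")

# Hypothetical helper: one random string of 5 to 10 letters.
# The argument i is only the iteration index and is not used.
make_string <- function(i) {
  n <- round(runif(1, 5, 10))
  paste(sample(test, n, replace = TRUE), collapse = "")
}

set.seed(1)  # make the results reproducible
# vapply() returns a character vector with one element per iteration
result <- vapply(1:3, make_string, character(1))
result
```

The character(1) template makes vapply() fail loudly if an iteration ever returns something other than a single string.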

Example quaternion multiplication in R

My question "multiplying numbers and symbols in R" was answered, and here I would like to give an example of using this for quaternion multiplication. Actually, I am using this on a much larger set (a group of 256 elements), but the principle is the same. I'm very new to working with data.tables, so any additional tips are appreciated.
groupMult = data.table(
e = c("i","j","k", "e"),
i = c("-e","-k","j", "i"),
j = c("k","-e","-i", "j"),
k = c("-j","i","-e", "k")
);
row.names(groupMult) = c("i", "j", "k", "e");
setkey(groupMult);
# Find X*Y with X = 2i - 3j, Y = k - 4e
X = data.table(i = 2, j = -3);
Y = data.table(k = 1, e = -4);
# reduce groupMult to the vectors we need for multiplication
multMa = groupMult[names(X), names(Y), with = F];
# repeat values of Y ncol(X) times
multY = Y[rep(seq_len(nrow(Y)), each=ncol(X)),];
# repeat values of X ncol(Y) times
multX = t(X[rep(seq_len(nrow(X)), each=ncol(Y)),]);
# coefficient matrix
multMaNum = multY*multX;
row.names(multMaNum) = names(X);
# elementwise multiplication of multMaNum with multMa
res = mapply(paste, multMaNum, multMa, MoreArgs=list(sep='*') )
res[] <- sapply(res , function(x) sub("(.*)([-])(.*)", "\\2\\1\\3", x));
# collapse all elements of the data.table to get final result
res = paste(lapply(res, paste, collapse = " "), collapse = " + ");
> res
[1] "-2*j + -3*i + -8*i + 12*j"
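As a cross-check of the symbolic result, the same product can be computed numerically with the standard Hamilton product. This is a base R sketch, not the data.table approach above; the function name qmult and the c(e, i, j, k) coefficient layout are assumptions made for the illustration.

```r
# Quaternions as numeric vectors c(e, i, j, k), with e the real unit
qmult <- function(p, q) {
  p <- unname(p)
  q <- unname(q)
  c(
    e = p[1] * q[1] - p[2] * q[2] - p[3] * q[3] - p[4] * q[4],
    i = p[1] * q[2] + p[2] * q[1] + p[3] * q[4] - p[4] * q[3],
    j = p[1] * q[3] - p[2] * q[4] + p[3] * q[1] + p[4] * q[2],
    k = p[1] * q[4] + p[2] * q[3] - p[3] * q[2] + p[4] * q[1]
  )
}

X <- c(e = 0,  i = 2, j = -3, k = 0)  # X = 2i - 3j
Y <- c(e = -4, i = 0, j = 0,  k = 1)  # Y = k - 4e

XY <- qmult(X, Y)
XY
```

Collecting like terms in "-2*j + -3*i + -8*i + 12*j" gives -11*i + 10*j, which matches the i and j coefficients returned here.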

Resources