How to apply row-wise data frame operation in R - r

I have the data frame blood2 (49x3) to which I wish to apply the following row-operation.
cblood2[1,] <- ((blood2[1,]*mcabc)*100)/(sum(blood2[1,]*mcabc))
where, mcabc is a 1x3 row-vector.
I want to do the above operation on every row in blood2 (49 row-wise operations) and store the results in cblood2. But I do not want to write out each of the row-operations in my R code. Is there a way to do this with tidyverse or similar R packages?
I tried the following code but the results are different from what I expect.
cblood2[1:49,] <- ((blood2[1:49,]*mcabc)*100)/(sum(blood2[1:49,]*mcabc))
Any advice is greatly appreciated. Thank you!

Use apply with MARGIN = 1 for loop over the rows and apply the function
out1 <- t(apply(blood2, 1, function(x) x * mcabc * 100/(sum(x * mcabc))))
Or another option with vectorized functions (rowSums)
mcabc2 <- c(mcabc)[col(blood2)]
out2 <- ((blood2 * mcabc2) * 100)/(rowSums(blood2 *mcabc2))
-checking the outputs
> identical(out1, out2)
[1] TRUE
data
set.seed(24)
blood2 <- matrix(rnorm(49 * 3), nrow = 49, ncol = 3)
mcabc <- matrix(1:3, ncol = 3)

Related

R - change matrix values based on another matrix indices

I have two matrices:
m1 <- matrix(runif(750), nrow = 50, byrow=T)
m2 <- matrix(rep(TRUE,750), nrow = 50, byrow=T)
For each m1 row, I need to find the indices of the two lowest values. Then, I need to use the remaining indices (i.e. not the two lowest values) to assign FALSE in m2.
It is fairly easy to do for one row:
ind <- order(m1[1,], decreasing=FALSE)[1:2]
m2[1,][-ind] <- FALSE
Therefore, I can use a loop to do the same for all rows:
for (i in 1:dim(m1)[1]){
ind <- order(m1[i,], decreasing=FALSE)[1:2]
m2[i,][-ind] <- FALSE
}
However, in my data set this loop runs slower than I would like (since my matrices are quite large - 500000x150000).
Is there any faster, R way to achieve the same result without the use of loops?
You can try the code below
m2 <- t(apply(m1,1,function(x) x %in% head(sort(x),2)))
You can try apply since you have matrix :
val <- rep(TRUE, ncol(m1))
m3 <- t(apply(m1, 1, function(x) {val[-order(x)[1:2]] <- FALSE;val}))
You can do:
m2 <- t(apply(m1, 1, function(x) rank(x)<3))
Using pmap
library(purrr)
pmap_dfr(as.data.frame(m1), ~ min_rank(c(...)) < 3)

apply list of indices to list of dataframes

I need to apply a list of indices to a list of dataframes with a one on one mapping. First element of the list of indices goes to the first dataframe only and so on. List of indices applies to the rows in the dataframes.
And a list of complementary dataframes needs to created by selecting rows not mentioned in the indices list.
Here is some sample data:
set.seed(1)
A <- data.frame(matrix(rnorm(40,0,1), nrow = 10))
B <- data.frame(matrix(rnorm(40,2,3), nrow = 10))
C <- data.frame(matrix(rnorm(40,3,4), nrow = 10))
dflis <- list(A,B,C)
# Create a sample row index
ix <- lapply(lapply(dflis,nrow), sample, size = 6)
So far I have managed this working but ugly looking code:
dflis.train <- lapply(seq_along(dflis), function(x) dflis[[x]][ix[[x]],])
dflis.test <- lapply(seq_along(dflis), function(x) dflis[[x]][-ix[[x]],])
Can someone suggest something better, more elegant?
Use Map/mapply instead of the univariate lapply, so that you can iterate over both objects and apply a function, like:
Map(function(d,r) d[r,], dflis, ix)
Or if you want to be fancy:
Map(`[`, dflis, ix, TRUE)
Matches your requested answer.
identical(
Map(function(d,r) d[r,], dflis, ix),
lapply(seq_along(dflis), function(x) dflis[[x]][ix[[x]],])
)
#[1] TRUE

Avoiding a loop when populating data frames in R

I have an empty data frame T_modelled with 2784 columns and 150 rows.
T_modelled <- data.frame(matrix(ncol = 2784, nrow = 150))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
where
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
I filled T_modelled by column with a nested for loop, based on a formula:
for (i in 1:ncol(T_modelled)) {
col_tmp <- colnames(T_modelled)[i]
for (j in 1:nrow(T_modelled)) {
z_tmp <- z[j]-0.1
T_tmp <- MANSRT+As*e^(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
T_modelled[j ,col_tmp] <- T_tmp
}
}
where
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
c <- 790
k <- 0.00219
pb <- 2600
K <- (k*1000)/(c*pb)
e <- exp(1)
I do get the desired results but I keep thinking there must be a more efficient way of filling that data frame. The loop is quite slow and looks cumbersome to me. I guess there is an opportunity to take advantage of R's vectorized way of calculating. I just cannot see myself how to incorporate the formula in an easier way to fill T_modelled.
Anyone got any ideas how to get the same result in a faster, more "R-like" manner?
I believe this does it.
Run this first instruction right after creating T_modelled, it will be needed to test that the results are equal.
Tm <- T_modelled
Now run your code then run the code below.
z_tmp <- z - 0.1
for (i in 1:ncol(Tm)) {
T_tmp <- MANSRT + As*exp(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
Tm[ , i] <- T_tmp
}
all.equal(T_modelled, Tm)
#[1] TRUE
You don't need the inner loop, that's the only difference.
(I also used exp directly but that is of secondary importance.)
Much like your previous question's solution which you accepted, consider simply using sapply, iterating through the vector, t_sec_ERT, which is the same length as your desired dataframe's number of columns. But first adjust every element of z by 0.1. Plus, there's no need to create empty dataframe beforehand.
z_adj <- z - 0.1
T_modelled2 <- data.frame(sapply(t_sec_ERT, function(ert)
MANSRT+As*e^(-z_adj*(omega/(2*K))^0.5)*sin(omega*ert-((omega/(2*K))^0.5)*z_adj)))
colnames(T_modelled2) <- paste0("t=", t_sec_ERT)
rownames(T_modelled2) <- paste0("z=", z)
all.equal(T_modelled, T_modelled2)
# [1] TRUE
Rui is of course correct, I just want to suggest a way of reasoning when writing a loop like this.
You have two numeric vectors. Functions for numerics in R are usually vectorized. By which I mean you can do stuff like this
x <- c(1, 6, 3)
sum(x)
not needing something like this
x_ <- 0
for (i in x) {
x_ <- i + x_
}
x_
That is, no need for looping in R. Of course looping takes place none the less, it just happens in the underlying C, Fortran etc. code, where it can be done more efficiently. This is usually what we mean when we call a function vectorized: looping takes place "under the hood" as it were. The output of Vectorize() thus isn't strictly vectorized by this definition.
When you have two numeric vectors you want to loop over you have to first see if the constituent functions are vectorized, usually by reading the docs.
If it is, you continue by constructing that central vectorized compound function and and start testing it with one vector and one scalar. In your case it would be something like this (testing with just the first element of t_sec_ERT).
z_tmp <- z - 0.1
i <- 1
T_tmp <- MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
Looks OK. Then you start looping over the elements of t_sec_ERT.
T_tmp <- matrix(nrow=length(z), ncol=length(t_sec_ERT))
for (i in 1:length(t_sec_ERT)) {
T_tmp[, i] <- MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
}
Or you can do it with sapply() which is often neater.
f <- function(x) {
MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*x - ((omega/(2*K))^0.5)*z_tmp)
}
T_tmp <- sapply(t_sec_ERT, f)
I would prefer to put the data in a long format, with all combinations of z and t_sec_ERT as two columns, in order to take advantage of vectorization. Although I usually prefer tidyr for switching between long and wide formats, I've tried to keep this as a base solution:
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
v <- expand.grid(t_sec_ERT, z)
names(v) <- c("t_sec_ERT", "z")
v$z_tmp <- v$z-0.1
v$T_tmp <- MANSRT+As*e^(-v$z_tmp*(omega/(2*K))^0.5)*sin(omega*v$t_sec_ERT-((omega/(2*K))^0.5)*v$z_tmp)
T_modelled <- data.frame(matrix(v$T_tmp, nrow = length(z), ncol = length(t_sec_ERT), byrow = TRUE))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))

How to sum over range and calculate a series in R?

Here is the formula which I am trying to calculate in R.
So far, this is my approach using a simplified example
t <- seq(1, 2, 0.1)
expk <- function(k){exp(-2*pi*1i*t*k)}
set.seed(123)
dat <- ts(rnorm(100), start = c(1994,3), frequency = 12)
arfit <- ar(dat, order = 4, aic = FALSE) # represent \phi in the formula
tmp1 <- numeric(4)
for (i in seq_along(arfit$ar)){
ek <- expk(i)
arphi <- arfit$ar[i]
tmp1[i] <- ek * arphi
}
tmp2 <- sum(tmp1)
denom = abs(1-tmp2)^2
s2 <- t/denom
Error : Warning message:
In tmp1[i] <- ek * arphi :
number of items to replace is not a multiple of replacement length
I was trying to avoid using for loop and tried using sapply as in solutions to this question.
denom2 <- abs(1- sapply(seq_along(arfit$ar), function(x)sum(arfit$ar[x]*expf(x))))^2
but doesnt seem to be correct. The problem is to do the sum of the series(over index k) when it is taking values from another vector as well, in this case, t which is in the numerator.
Any solutions ?
Any suggestion for a test dataset, maybe using 0 and 1 to check if the calculation is done correctly in this loop here ?
Typing up the answer determined in chat. Here's a solution involving vapply.
First correct expk to:
expk <- function(k){sum(exp(-2*pi*1i*t*k))}
Then you can create this function and vapply it:
myFun <- function(i) return(expk(i) * arfit$ar[i])
tmp2 <- sum(vapply(seq_along(arfit$ar), myFun, complex(1)))

lapply with condition?

I try to do a function which works with for loops with lapply instead. I am very new to R and not confortable with lapply. I especially don't get how to make an "if" condition.
My current code with for loops looks like that (it nomalizes volume series):
function(TableVolume,VolumeM,VolumeStD,n){
TableBN = TableVolume[n:nrow(TableVolume),]
for(k in 1:nrow(TableBN)){for (i in 2:ncol(TableBN)){if(!is.na(VolumeM[k,i]) && (VolumeM[k,i]) && (TableVolume[n-1+k,i]>VolumeM[k,i]+1.96/sqrt(n)*VolumeStD[k,i])){TableBN[k,i]=TableVolume[n-1+k,i]/VolumeM[k,i]}else{TableBN[k,i]=0}}}
TableBN=TableBN[n:nrow(TableVolume),]
return(TableBN)
}
I know from Apply over two data frames how to do a function that works with 2 data frame but I still don't see how to handle tests.
Thanks for your support,
Vincent
You need use lapply (or other apply family function). It is generally required when you have some non-vectorized function to apply to vectorized argument. Your function and conditions is a combinations of arithmetic functions, which are nicely vectorized. So you can use subsetting and ifelse function please see as below:
set.seed(123)
# simulation
TableBN <- matrix(1:12, nrow = 3)
VolumeM <- matrix(12:1, nrow = 3)
VolumeStD <- matrix(12:1, nrow = 3)
TableVolume <- matrix(abs(rnorm(10)), nrow = 5)
# function
f <- function(TableVolume, VolumeM, VolumeStD, n){
TableBN <- TableVolume[n:nrow(TableVolume), ]
ifelse(
test = !is.na(VolumeM) && VolumeM && TableBN[, -1] > (VolumeM + 1.96 / sqrt(n) * VolumeStD),
yes = TableBN[, -1] / VolumeM,
no = 0)
}
# Test
f(TableVolume, VolumeM, VolumeStD, 3)
Output:
0

Resources