R apply and for loop calculation - r

Would someone be able to explain why this apply doesn't work correctly? I wanted to normalise all the values in each row by the sum of the values in that row, so that each row sums to 1. However, when I did this using apply, the answer is incorrect.
data <- data.frame(Sample=c("A","B","C"),val1=c(1235,34567,234346),val2=c(3445,23446,234235),val3=c(457643,234567,754234))
norm <- function(x){
x/sum(x)}
applymeth <- data
applymeth[,2:4] <- apply(applymeth[,2:4], 1, norm)
rowSums(applymeth[,2:4])
loopmeth <- data
for(i in 1:nrow(data)){
loopmeth[i,2:4] <- norm(loopmeth[i,2:4])
}
rowSums(loopmeth[,2:4])
Thanks.

apply() assembles its result column by column: each call to the function contributes one column of the output, even when the input is processed row by row (MARGIN = 1), as in your case. You have to transpose the result:
applymeth <- data
applymeth[,2:4] <- t(apply(applymeth[,2:4], 1, norm))
rowSums(applymeth[,2:4])
Have a look at
apply(matrix(1:12, 3), 1, norm)
The reason for this result of apply() is a convention:
in a matrix or a multidimensional array the index of the first dimension varies fastest, then the second, and so on. Example:
array(1:12, dim=c(2,2,3))
So (without any reorganisation of the data) apply() produces one column after the other. This behaviour does not depend on the MARGIN= parameter of apply().
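To make the shape issue concrete, here is a minimal sketch on a small matrix. The name normalise is only an illustrative stand-in for the norm function in the question, and the last line shows a vectorised alternative that avoids apply() altogether; it relies on R recycling rowSums(m) down the columns.

```r
# Illustrative helper, equivalent to norm() in the question
normalise <- function(x) x / sum(x)

m <- matrix(1:12, nrow = 3)        # 3 rows, 4 columns

res <- apply(m, 1, normalise)      # one output column per input row
dim(res)                           # 4 x 3: transposed relative to m
colSums(res)                       # the columns sum to 1, not the rows

fixed <- t(res)                    # back to 3 x 4, rows now sum to 1
rowSums(fixed)

# Vectorised alternative: divide each element by its row total
vectorised <- m / rowSums(m)
```

Both fixed and vectorised hold the same values; the vectorised form is usually the shorter option when the function is a simple arithmetic operation.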

Related

I want to apply two functions one function on the block diagonal and the second function on the off-diagonal elements in the data frame

df<- data.frame(a=c(1:10), b=c(21:30),c=c(1:10), d=c(14:23),e=c(11:20),f=c(-6:-15),g=c(11:20),h=c(-14:-23),i=c(4:13),j=c(1:10))
In this data frame, I have three block-diagonal matrices which are as shown in the image below
I want to apply two functions, one is the sine function for block diagonal and the second is cosine function for the other elements and generates the same structure of the data frame.
sin(df[1:2,1:2])
sin(df[3:5,3:5])
sin(df[6:10,6:10])
# cos() of all the remaining (off-block) elements
1) outer/arithmetic Create a logical block diagonal matrix indicating whether the current cell is on the block diagonal or not and then use that to take a convex combination of the sin and cos values giving a data.frame as follows:
v <- rep(1:3, c(2, 3, 5))
ind <- outer(v, v, `==`)
ind * sin(df) + (!ind) * cos(df)
2) ifelse Alternately, this gives a matrix result (or use as.matrix on the above). ind is from above.
m <- as.matrix(df)
ifelse(ind, sin(m), cos(m))
3) Matrix::bdiag Another approach is to use bdiag in the Matrix package (which comes with R -- no need to install it).
library(Matrix)
ones <- function(n) matrix(1, n, n)
ind <- bdiag(ones(2), ones(3), ones(5)) == 1
Now proceed as in the last line of (1) or as in (2).
If it is acceptable to store the result in a new data frame, you could reverse the order of the operations: take the cosine of everything first, then overwrite the diagonal blocks with the sine:
ndf <- cos(df)
ndf[1:2,1:2] <- sin(df[1:2,1:2])
ndf[3:5,3:5] <- sin(df[3:5,3:5])
ndf[6:10,6:10] <- sin(df[6:10,6:10])
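As a rough cross-check, the mask-based approach and the block-by-block assignment can be compared on the data from the question. This is only a sketch: the names sizes, starts, ends, res_mask and res_assign are introduced here for illustration, and the work is done on a matrix copy of df.

```r
df <- data.frame(a = 1:10, b = 21:30, c = 1:10, d = 14:23, e = 11:20,
                 f = -6:-15, g = 11:20, h = -14:-23, i = 4:13, j = 1:10)
m <- as.matrix(df)

# Block sizes along the diagonal: 2, 3 and 5
sizes <- c(2, 3, 5)

# Approach A: logical mask, TRUE inside a diagonal block
v <- rep(seq_along(sizes), sizes)
ind <- outer(v, v, `==`)
res_mask <- ifelse(ind, sin(m), cos(m))

# Approach B: cosine everywhere, then overwrite each block with the sine
res_assign <- cos(m)
ends <- cumsum(sizes)
starts <- ends - sizes + 1
for (k in seq_along(sizes)) {
  idx <- starts[k]:ends[k]
  res_assign[idx, idx] <- sin(m[idx, idx])
}

# ifelse() drops the column names, so compare the values only
all.equal(unname(res_mask), unname(res_assign))
```

Deriving the block boundaries from sizes means only one vector has to change if the block layout changes.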

Compute 15 rows in parallel (through vectorization) and create df with them

I am creating 15 rows in a dataframe, like this. I cannot show my real code, but the create row function involves complex calculations that can be put in a function. Any ideas on how I can do this using lapply, apply, etc. to create all 15 in parallel and then concatenate all the rows into a dataframe? I think using lapply will work (i.e. put all rows in a list, then unlist and concatenate, but not exactly sure how to do it).
for( i in 1:15 ) {
row <- create_row()
# row is essentially a dataframe with 1 row
my_df <- rbind(my_df,row)
}
Something like this should work for you,
create_row <- function(){
rnorm(10, 0,1)
}
my_list <- vector(100, mode = "list")
my_list_2 <- lapply(my_list, function(x) create_row())
data.frame(t(sapply(my_list_2,c)))
The create_row function is only there to make the example reproducible. We predefine an empty list, fill it with the results of create_row(), and then convert the resulting list to a data frame.
Alternatively, predefine a data frame, apply over the row margin, and then use t() (transpose) to get the output in the right orientation:
df <- data.frame(matrix(ncol = 10, nrow = 100))
t(apply(df, 1, function(x) create_row()))
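If create_row() really returns a one-row data frame, as the question says, a common pattern is to collect the rows in a list and bind them once at the end. The sketch below assumes a stand-in create_row(i) with made-up columns id and value; the real function is not shown in the question. Note that lapply() itself runs sequentially, so this removes the repeated rbind() rather than adding parallelism.

```r
# Hypothetical stand-in: returns a data frame with exactly one row
create_row <- function(i) {
  data.frame(id = i, value = i^2)
}

# Build all 15 rows, then bind them in a single call
rows <- lapply(1:15, create_row)
my_df <- do.call(rbind, rows)

dim(my_df)   # 15 rows, 2 columns
```

For genuinely parallel execution, the parallel package that ships with R provides mclapply() and parLapply() as drop-in style replacements for lapply(); whether they pay off depends on how expensive each row is to compute.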

Sequentially/recursively replace the first numeric element of a column in a matrix with zero, until all = zero

I want to iteratively remove the first numeric elements, in all columns of a matrix, one iteration at a time, until all values=0. i.e.
Iteration1=c(1,1,0,1,1,0)
Iteration2=c(0,1,0,0,1,0)
Iteration3=c(0,0,0,0,0,0)
matrix(Iteration1,nrow=3,ncol=2)
The following function, based on a previous post, works to identify and replace the first numeric element. How can I cycle this result back into the function to remove the next numeric elements, and keep the results of each iteration?
#simulate data
data<-rbinom(20, size=1, prob=0.5)
data<-c(data)
dat<-matrix(data,nrow = 5,ncol=4)
mat<-dat
#identify first element=1
fun1<-function(mat){
cols<-c(1,2,3,4)
rown <-apply(mat[,cols] , 2, function(x) which(x==1)[1])
mat[cbind(rown,cols )] <- c(0,0,0,0)
return(mat)}
fun1(mat)
There are two problems to be solved here. The first one is that your fun1() fails as soon as one of the columns does not contain any non-zero values. The second one is how to do the recursion and store the intermediate results. I will address both of them below.
Improving fun1()
Your fun1() throws an error as soon as one of the columns does not contain any non-zero values. The reason is that for these columns the value of rown is NA, which leads to an error in mat[cbind(rown,cols )]. The following version of fun1() does not have this issue (and is also more general than your function, because it works for matrices with any number of columns, not just 4):
fun1 <- function(mat) {
rows <- apply(mat, 2, function(x) which(x==1)[1])
idx <- cbind(rows[!is.na(rows)], which(!is.na(rows)))
mat[idx] <- 0
return (mat)
}
How to do the recursion
You can do the recursion with a while loop. At each step, the resulting matrix is stored in a list:
mat <- dat
mats <- list()
while (any(mat != 0)) {
mat <- fun1(mat)
mats <- append(mats, list(mat))
}
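mats is a list of matrices, one per iteration, where the first one (mats[[1]]) is equal to fun1(dat) and the last one contains only zeroes. Because dat is random, the number of iterations varies; in my run there were four, so mats[[4]] was the all-zero matrix.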
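For a deterministic check, the same loop can be run on the small 3 x 2 example from the question (the Iteration1 values) instead of random data. This is just a sketch that reuses fun1() as defined above.

```r
# fun1() as in the answer: zero out the first 1 in every column that has one
fun1 <- function(mat) {
  rows <- apply(mat, 2, function(x) which(x == 1)[1])
  idx <- cbind(rows[!is.na(rows)], which(!is.na(rows)))
  mat[idx] <- 0
  mat
}

# Iteration1 from the question, as a 3 x 2 matrix
dat <- matrix(c(1, 1, 0, 1, 1, 0), nrow = 3, ncol = 2)

mat <- dat
mats <- list()
while (any(mat != 0)) {
  mat <- fun1(mat)
  mats <- append(mats, list(mat))   # keep every intermediate result
}

length(mats)   # one entry per iteration
mats[[1]]      # corresponds to Iteration2 in the question
```

The number of iterations equals the largest count of ones in any single column, since each pass removes at most one element per column.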

R - apply over increasing submatrices, instead of individual rows/cols

So I've been pondering how to do this without a for loop and I couldn't come up with a good answer. Here is an example of what I mean:
sampleData <- matrix(rnorm(25,0,1),5,5)
meanVec <- vector(length=length(sampleData[,1]))
for(i in 1:length(sampleData[,1])){
subMat <- sampleData[1:i,]
ifelse( i == 1 , sumVec <- sum(subMat) ,sumVec <- apply(subMat,2,sum) )
meanVec[i] <- mean(sumVec)
}
meanVec
The actual matrix I want to do this to is reasonably large, and to be honest, for this application it won't make a huge difference in speed, but it's a question I think should be answered:
How can I get rid of that for loop and replace with some *ply call?
Edit: In the example given, I generate sample data, and define a vector whose length equals the number of rows in the matrix.
The for loop does the following steps:
1) takes a submatrix, from row 1 to row i
2) if i is 1, it just sums up the values in that vector
3) if i is not 1, it gets the sum of each row, then gets the mean of the sum and stores that in position i of the vector meanVec.
Finally, it prints out the mean of that sum.
This does what you describe:
cumsum(rowSums(sampleData))/seq_len(nrow(sampleData))
However, your code doesn't do the same.
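To see where the two differ, here is a sketch that runs the loop from the question next to the one-liner. The seed and the names described and loop_equiv are assumptions added for illustration. In the loop, every step after the first takes the mean of the column sums, which divides the running total by the number of columns rather than by the number of rows used so far.

```r
set.seed(1)  # arbitrary seed, only so the comparison is repeatable
sampleData <- matrix(rnorm(25, 0, 1), 5, 5)
n <- nrow(sampleData)

# The loop from the question, lightly tidied but with the same logic
meanVec <- numeric(n)
for (i in seq_len(n)) {
  subMat <- sampleData[1:i, ]
  sumVec <- if (i == 1) sum(subMat) else colSums(subMat)
  meanVec[i] <- mean(sumVec)
}

# What the written description asks for: running mean of the row sums
described <- cumsum(rowSums(sampleData)) / seq_len(n)

# Loop-free equivalent of what the loop actually computes
loop_equiv <- cumsum(rowSums(sampleData)) / c(1, rep(ncol(sampleData), n - 1))

all.equal(meanVec, loop_equiv)
```

Which of the two is wanted depends on the intent; both avoid the explicit loop.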

Multiplying data frame column values based on the value of another column in R

I have a data frame (150000 obs, 15 variables) in R and need to correct a subset of values of one variable (simply by multiplying by a constant) based on the value of another. What's an easy way to do this?
I thought apply would work, but I'm not sure how to write the function (obviously I can't multiply in the function like this) and the qualifier:
df$RESULT <- df[apply(df$RESULT, 1, function(x * 18.01420678) where(SITE==1)), ]
Do you mean this?
dat <- data.frame(x=1:10,y=sample(20,10))
constant <- 100
dat$y <- ifelse(dat$x > dat$y, dat$y*constant, dat$y)
You could use the ability of "[" to do subsetting, but to "correct" a subset you need to use the logical expression that defines the subset on both sides of the assignment. Since you are then working only with the values that need correction, you do not need any further conditional function.
df[ df$SITE==1, "RESULT" ] <- df[ df$SITE==1, "RESULT"] * 18.01420678
In cases where the operation is to be done on a large number (millions) of cases or done repeatedly in simulations, this approach may be much faster than the ifelse approach.
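A small sketch comparing the two approaches on made-up data; the column names SITE and RESULT come from the question, while the values and the names k, sel, by_index and by_ifelse are invented for illustration.

```r
# Toy data: only rows with SITE == 1 should be rescaled
df <- data.frame(SITE = c(1, 2, 1, 3), RESULT = c(1, 2, 3, 4))
k <- 18.01420678

# Logical subsetting: the same condition selects what is read and written
sel <- df$SITE == 1
by_index <- df
by_index$RESULT[sel] <- by_index$RESULT[sel] * k

# ifelse(): evaluates both branches for every row, then picks one
by_ifelse <- df
by_ifelse$RESULT <- ifelse(df$SITE == 1, df$RESULT * k, df$RESULT)

all.equal(by_index, by_ifelse)
```

If SITE can contain NA, a logical index with NA will fail in the assignment; wrapping the condition in which() is one way around that.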
