I would like to create a vector from a matrix by applying a condition to each column. If any value in the column exceeds a fixed threshold, the corresponding element of the vector should be the value in the last row of that column; otherwise it should be 0. The resulting vector should have one element per column of the matrix. Any tips on how to do this?
Something like this?
mat <- matrix(rnorm(100), nrow = 10, ncol = 10)
# For each column: the last element if any value exceeds 0.7, otherwise 0
apply(mat, 2, function(v) {
  if (any(v > 0.7)) v[length(v)] else 0
})
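A vectorised variant, as a minimal sketch rather than the answerer's method: it assumes the same threshold of 0.7 and uses colSums() on the logical matrix instead of apply(). The names threshold, exceeds and result are only illustrative.

```r
set.seed(42)  # fixed seed so the example is reproducible
mat <- matrix(rnorm(100), nrow = 10, ncol = 10)
threshold <- 0.7  # assumed threshold, as in the apply() version

# TRUE for every column with at least one value above the threshold
exceeds <- colSums(mat > threshold) > 0

# Last-row value where the column qualifies, 0 elsewhere
result <- ifelse(exceeds, mat[nrow(mat), ], 0)
```

This should give the same vector as the apply() call, with one element per column.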
I have a 20x12 matrix of numerical values and a list of 12 numbers. If a value in the matrix is less than the number in the list at the corresponding column index, I would like to replace it with that number. How can I do it?
mat <- matrix(rpois(240,10),ncol=12)
list_to_replace <- rpois(12,10)
I think this is what is desired. Use the same logical index to pick out the positions of the possible replacements and the re-assignments:
t( apply(mat, 1, function(r) {
r[ r < list_to_replace] <- list_to_replace[ r < list_to_replace]; r}) )
The t() is needed to transpose back because apply() always returns the result of each function call as a column, even when it iterates over rows.
BTW, you would be well advised to only use the term "list" when referring to an R object with class "list". What you have is a "vector".
You could use the code below:
index <- t(t(mat) < list_to_replace)
mat[index] <- list_to_replace[which(index, TRUE)[, 2]]
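Replacing every value that is below its column's number with that number is the same as taking an elementwise maximum, so the answers can be cross-checked against pmax(). This is a hedged sketch, not part of either answer: min_per_col, by_pmax and by_apply are hypothetical names, and sweep() is used only to line the vector up with the columns.

```r
set.seed(1)  # fixed seed so the example is reproducible
mat <- matrix(rpois(240, 10), ncol = 12)
min_per_col <- rpois(12, 10)  # plays the role of list_to_replace

# Elementwise maximum of each column and its own number
by_pmax <- sweep(mat, 2, min_per_col, pmax)

# Row-wise apply() version from the first answer, for comparison
by_apply <- t(apply(mat, 1, function(r) {
  below <- r < min_per_col
  r[below] <- min_per_col[below]
  r
}))

isTRUE(all.equal(by_pmax, by_apply))
```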
I want to test whether every element of a data frame is greater than 0. If it is greater than zero the value should become "buy", otherwise "sell". I used sapply, but it assigned "sell" to every value. I used the following code. Please also recommend a for loop solution.
df1<-sapply(df,function(x) ifelse(x>0,yes="buy",no="sell"))
If it is a matrix (or even a data.frame), create a logical matrix with the comparison operator. This gives a TRUE/FALSE matrix, which is coerced to 1/0 in arithmetic. Adding 1 turns it into 2/1, and those values can be used as an index to pick the replacement labels (in R, indexing starts from 1).
df[] <- c("sell", "buy")[(df >0) + 1]
Also, in the comments, it was recommended not to use sapply on a matrix: a matrix is a vector with a dim attribute, so its unit is a single element (in a data.frame the unit is a column, so sapply/lapply loop over columns). On a matrix, sapply loops over every element, which may not be efficient. For a matrix, apply with MARGIN can be used:
df[] <- apply(df, 2, FUN = function(x) ifelse(x > 0, "buy", "sell"))
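To make the index trick concrete, here is a small sketch on made-up data (the data frame and the names labels and loop_labels are assumptions for illustration). It also includes a plain for loop over the columns, since the question asked for one.

```r
# Hypothetical toy data; zero counts as "sell" because it is not > 0
df <- data.frame(a = c(-1, 2, 0), b = c(3, -4, 5))

# Index lookup: FALSE + 1 = 1 picks "sell", TRUE + 1 = 2 picks "buy"
labels <- df
labels[] <- c("sell", "buy")[(df > 0) + 1]

# Equivalent for loop, one column at a time
loop_labels <- df
for (j in seq_along(df)) {
  loop_labels[[j]] <- ifelse(df[[j]] > 0, "buy", "sell")
}
```

Working on copies keeps the numeric df intact, so both results can be compared against it.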
Would someone be able to explain why this apply doesn't work correctly? I wanted to normalise the values in each row by the sum of that row, so that each row sums to 1. However, when I did this using an apply function, the answer is incorrect.
data <- data.frame(Sample=c("A","B","C"),val1=c(1235,34567,234346),val2=c(3445,23446,234235),val3=c(457643,234567,754234))
norm <- function(x){
x/sum(x)}
applymeth <- data
applymeth[,2:4] <- apply(applymeth[,2:4], 1, norm)
rowSums(applymeth[,2:4])
loopmeth <- data
for(i in 1:nrow(data)){
loopmeth[i,2:4] <- norm(loopmeth[i,2:4])
}
rowSums(loopmeth[,2:4])
Thanks.
apply() returns the result of each function call as one column of the output matrix, and in your case each call processed a row of the input. You have to transpose the result:
applymeth <- data
applymeth[,2:4] <- t(apply(applymeth[,2:4], 1, norm))
rowSums(applymeth[,2:4])
Have a look at
apply(matrix(1:12, 3), 1, norm)
The reason for this result of apply() is a convention:
in a matrix or a multidimensional array, the index of the first dimension runs fastest, then the second, and so on. Example:
array(1:12, dim=c(2,2,3))
So (without any reorganisation of the data) apply() produces one column after the other. This behavior does not depend on the MARGIN= parameter of apply().
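A small sketch of that behaviour on the 3x4 matrix mentioned above. The names m, by_row, fixed and alt are illustrative, and the last line shows an apply-free alternative that relies on vector recycling down the columns; it is an addition, not part of the original answer.

```r
norm <- function(x) x / sum(x)  # same helper as in the question

m <- matrix(1:12, nrow = 3)     # 3 rows, 4 columns
by_row <- apply(m, 1, norm)     # each normalised row comes back as a column

dim(by_row)      # 4 rows, 3 columns: the shape is flipped
colSums(by_row)  # the columns, not the rows, sum to 1

fixed <- t(by_row)  # transpose to restore the 3x4 layout
rowSums(fixed)      # now every row sums to 1

# Alternative without apply(): rowSums(m) is recycled down each column,
# so element [i, j] is divided by the sum of row i
alt <- m / rowSums(m)
```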
I have a matrix and I want to create a list with selected rows of that matrix being the list elements.
For example this is my matrix
my.matrix=matrix(1:100, nrow=20)
and I want to create a list from this matrix in such a way that each element of the list is a block of rows of the matrix, with the size of each block defined by
my.n=c(1,2,4,3,5,5)
where my.n gives the number of consecutive rows that should be extracted from my.matrix. my.n[1]=1 means row 1; my.n[2]=2 means rows 2 and 3; my.n[3]=4 means rows 4 to 7, and so on.
So the first element of my list should be
my.matrix[1,]
second
my.matrix[2:3,]
and so on.
How to do it in an elegant way?
Not quite sure, but I think you want something like this ...
S <- split(seq_len(nrow(my.matrix)), rep.int(seq_along(my.n), my.n))
lapply(S, function(x) my.matrix[x, , drop = FALSE])
Here we are splitting the row numbers of my.matrix by replications of my.n. Then we use lapply() over the resulting list S to subset my.matrix with those row numbers.
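To see what the split step produces, this sketch prints the intermediate grouping vector. It assumes, as in the question, that sum(my.n) equals the number of rows (20); the name groups is illustrative.

```r
my.n <- c(1, 2, 4, 3, 5, 5)

# One group label per row: label k is repeated my.n[k] times
groups <- rep.int(seq_along(my.n), my.n)
groups

# Row numbers collected by group label
S <- split(seq_len(sum(my.n)), groups)
S[["3"]]  # rows belonging to the third list element
```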
end <- cumsum(my.n)
start <- c(1,(end+1)[-length(end)])
mapply(function(a, b) my.matrix[a:b, , drop = FALSE], start, end, SIMPLIFY = FALSE)
mapply takes the first element of each of the two vectors and passes them to the function, then moves on to the second element of each, and so on to the end. SIMPLIFY = FALSE makes it return a list even when all pieces happen to have the same shape, which is what is wanted here. Credit to #nongkrong for the mapply approach.
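A sketch that shows the start and end row of each block and checks the range-based result against the split-based one. Map() is used here as a stand-in for mapply() that always returns a list; the names by_range and by_split are hypothetical.

```r
my.matrix <- matrix(1:100, nrow = 20)
my.n <- c(1, 2, 4, 3, 5, 5)

end <- cumsum(my.n)                      # last row of each block
start <- c(1, (end + 1)[-length(end)])   # first row of each block
cbind(start, end)

# Map() always returns a list, one matrix per start/end pair
by_range <- Map(function(a, b) my.matrix[a:b, , drop = FALSE], start, end)

# Split-based version for comparison
groups <- rep.int(seq_along(my.n), my.n)
by_split <- lapply(split(seq_len(nrow(my.matrix)), groups),
                   function(rows) my.matrix[rows, , drop = FALSE])

isTRUE(all.equal(by_range, unname(by_split)))
```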
I have a dataframe similar to the one this creates:
dummy=data.frame(c(1,2,3,4),c("a","b","c","d"));colnames(dummy)=c("Num","Let")
dummy$X1=rnorm(4,35,6)
dummy$X2=rnorm(4,35,6)
dummy$X3=rnorm(4,35,6)
dummy$X4=rnorm(4,35,6)
dummy$X5=rnorm(4,35,6)
dummy$X6=rnorm(4,35,6)
dummy$X7=rnorm(4,35,6)
dummy$X8=rnorm(4,35,6)
dummy$X9=rnorm(4,35,6)
dummy$X10=rnorm(4,35,6)
dummy$Xmax=apply(dummy[3:12],1,max)
Only the real thing is roughly 260*13000 cells.
What I aim to do is apply the equation below to each row, across a set of columns defined by data[x:x] (in the example, the columns dummy[3:12]):
TSP = Sum( (1-(Xi/Xmax)) /(n-1))
where Xi is each individual value within the row among the columns of interest (i signifying each column, i.e. there is an X1, an X2, an X3... value for each row), Xmax is the largest of all those values in the row (as defined in the dummy$Xmax column), and n is the number of columns selected (in the example, n=10). In the actual data set I will be selecting 26 columns.
I would like to create a tidy little function which performs this calculation and deposits each row's value into a column called dummy$TSP, and does so for all 13000 rows.
One crude solution is the following, but like I said I would like to get this into some kind of tidy function, where I can select the columns and the rest is (nearly) automatic.
dummy$TSP<- ((((1-(dummy$X1/dummy$Xmax))/(10-1))
+(((1-(dummy$X2/dummy$Xmax))/(10-1))
...
+(((1-(dummy$X10/dummy$Xmax))/(10-1)))
I would also really appreciate answers which explain the process well so I will be more likely to be able to learn, thanks in advance!
If you know the columns you want to apply the function over, you can, as you suspect, use apply to run the function over the rows of just those columns, like so:
# Columns you want to use for this function (X1 to X10; column 13 is Xmax)
cols <- 3:12
# Use apply to loop over rows
dummy$TSP <- apply( dummy[,cols] , 1 , FUN = function(x){ sum( ( 1 - ( x / max(x) ) ) / (length(x) - 1) ) } )
R is vectorised, so when apply passes a row to the function (as the argument x, a vector of 10 numbers), each arithmetic operation is carried out on every element of that vector.
So x/max(x) returns a vector of 10 numbers: each value in the row divided by the maximum of that row. Each result of 1 - x/max(x) is then divided by the number of columns minus 1, and sum collapses these into the single value returned by the function.
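The same steps, worked through for a single made-up row of three values (an assumption chosen so the arithmetic is easy to follow; the real rows have 10 values):

```r
x <- c(2, 4, 8)  # hypothetical row: three columns, maximum is 8

ratio <- x / max(x)                     # 0.25 0.50 1.00
terms <- (1 - ratio) / (length(x) - 1)  # 0.375 0.250 0.000
tsp_row <- sum(terms)                   # 0.625
```

The largest value always contributes 0, which is why the formula divides by n - 1 rather than n.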
A more vectorized solution would be to perform the inner function over all elements and then perform the sum operation for each row with the efficient rowSums function like this:
vars.to.use <- paste0("X", 1:10)
dummy$TSP <- rowSums((1-(dummy[vars.to.use]/dummy$Xmax))/(length(vars.to.use) - 1))
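Since the question asked for a tidy function where only the columns need to be chosen, here is one possible sketch. The function name tsp and its arguments are hypothetical; it recomputes the row maximum itself, so it does not rely on an existing Xmax column, and it assumes all selected columns are numeric.

```r
# Hypothetical helper: TSP per row over the selected columns
tsp <- function(data, cols) {
  x <- as.matrix(data[cols])   # selected columns as a numeric matrix
  xmax <- apply(x, 1, max)     # row-wise maximum
  # xmax is recycled down the columns, so row i is divided by xmax[i]
  rowSums(1 - x / xmax) / (ncol(x) - 1)
}

# Usage on a tiny made-up data frame
toy <- data.frame(id = c("r1", "r2"),
                  X1 = c(1, 2), X2 = c(2, 4), X3 = c(4, 4))
toy$TSP <- tsp(toy, c("X1", "X2", "X3"))
toy
```

With the example from the question, the call would look like dummy$TSP <- tsp(dummy, paste0("X", 1:10)).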