I have a 3D matrix of float values, and I need the maximum and minimum values inside, like:
[[1]]
0.1 0.3 0.6 0.8
0.09 0.1 0.4 0.6
[[2]]
0.08 0.2 0.5 0.75
0.01 0.3 0.5 0.55
generated by:
m <- list(matrix(c(0.1,0.3,0.6,0.8,
0.09,0.1,0.4,0.6),2,4,byrow=TRUE),
matrix(c(0.08,0.2,0.5,0.75,
0.01,0.3,0.5,0.55),2,4,byrow=TRUE))
The max is 0.8 and the min is 0.01.
I tried the max() and min() functions, but they don't work.
Try this:
Toy data:
> m1<-matrix(runif(8,1,100),2,4)
> m1
[,1] [,2] [,3] [,4]
[1,] 98.88260 97.13486 2.712315 35.09616
[2,] 43.90468 99.59167 41.873303 94.19843
> m2<-matrix(runif(8,1,100),2,4)
> m2
[,1] [,2] [,3] [,4]
[1,] 26.61785 61.11184 42.28846 55.87778
[2,] 49.27263 59.28607 46.72659 76.24321
> l<-list(m1,m2)
Calculate max and min
> lapply(l,FUN = max)
[[1]]
[1] 99.59167
[[2]]
[1] 76.24321
> lapply(l,FUN = min)
[[1]]
[1] 2.712315
[[2]]
[1] 26.61785
Also, you can get both in one line of code: lapply(l,FUN = range) (Thanks to Joseph Wood for this last suggestion. See comments.)
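The lapply calls above return one value per matrix. If a single overall maximum and minimum across the whole list is wanted (0.8 and 0.01 for the question's data), one possible sketch is to flatten the list with unlist() first. It uses the question's m and only base R; the names overall_max, overall_min and overall_range are just illustrative.

```r
# The list of matrices from the question
m <- list(matrix(c(0.1, 0.3, 0.6, 0.8,
                   0.09, 0.1, 0.4, 0.6), 2, 4, byrow = TRUE),
          matrix(c(0.08, 0.2, 0.5, 0.75,
                   0.01, 0.3, 0.5, 0.55), 2, 4, byrow = TRUE))

# unlist() flattens every matrix into one numeric vector
overall_max   <- max(unlist(m))     # 0.8
overall_min   <- min(unlist(m))     # 0.01
overall_range <- range(unlist(m))   # c(min, max) in one call
```

Calling max(m) directly fails because max() does not accept a list, which is probably why the original attempt did not work.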
I have two data frames:
DF1
e l u
1 0.5 1.5
2 1 3
3 2 4
DF2
e l u
0.1 0.01 0.15
0.2 0.1 0.3
0.3 0.2 0.4
I want to combine these two data frames into single list like so:
L
[[1]]
$e: [(1 0.1);(2 0.2);(3 0.3)] #numeric
$l: [(0.5 0.01);(1 0.1);(2 0.2)] #numeric
$u: [(1.5 0.15);(3 0.3);(4 0.4)] #numeric
I have tried to rbind the two data frames and then split by the same column. I was also advised to use Map, but it results in multiple lists rather than a single one, or all variables become factors.
Thank you for any suggestions.
It looks like you want a list of arrays.
> mapply(cbind, DF1, DF2, SIMPLIFY=FALSE)
$e
[,1] [,2]
[1,] 1 0.1
[2,] 2 0.2
[3,] 3 0.3
$l
[,1] [,2]
[1,] 0.5 0.01
[2,] 1.0 0.10
[3,] 2.0 0.20
$u
[,1] [,2]
[1,] 1.5 0.15
[2,] 3.0 0.30
[3,] 4.0 0.40
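For a self-contained version, here is a minimal sketch that rebuilds the two data frames from the question and pairs their columns. It assumes both data frames have the same columns in the same order; Map() is used here as shorthand for mapply(..., SIMPLIFY=FALSE).

```r
DF1 <- data.frame(e = c(1, 2, 3), l = c(0.5, 1, 2), u = c(1.5, 3, 4))
DF2 <- data.frame(e = c(0.1, 0.2, 0.3), l = c(0.01, 0.1, 0.2),
                  u = c(0.15, 0.3, 0.4))

# Columns are paired by position; names are taken from DF1
L <- Map(cbind, DF1, DF2)

names(L)         # "e" "l" "u"
is.numeric(L$e)  # TRUE: each element is a numeric matrix, not a factor
dim(L$u)         # 3 2
```

The result is one named list with a 3 x 2 numeric matrix per variable.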
Sorry for the dummy question
My sample data looks like this
DF
a b c
0.01 0.02 0.03
0.08 0.09 0.10
I use rank to assign ranks for values in DF as
s <- sapply(DF, rank, ties.method ="average")
How can I then assign values to the ranks? Or perhaps I am misunderstanding something.
Thank you for any suggestions.
I'm not sure I understand the question. Assuming you want ranks within each column (variable):
> set.seed(100)
> df<-matrix(rnorm(6),ncol=3)
> df
[,1] [,2] [,3]
[1,] -0.5021924 -0.07891709 0.1169713
[2,] 0.1315312 0.88678481 0.3186301
> s <- apply(df,2, rank, ties.method ="average")
> s
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
Or, if you want ranks across all the data:
> s <- matrix(rank(c(df), ties.method ="average"), ncol=3)
> s
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 6 5
I have a list with three matrices:
a<-matrix(runif(100))
b<-matrix(runif(100))
c<-matrix(runif(100))
mylist<-list(a,b,c)
I would like to obtain the mean of each element in the three matrices.
I tried: aaply(laply(mylist, as.matrix), c(1, 1), mean) but this returns the means of each matrix instead of taking the mean of each element as rowMeans() would.
Maybe what you want is:
> set.seed(1)
> a<-matrix(runif(4))
> b<-matrix(runif(4))
> c<-matrix(runif(4))
> mylist<-list(a,b,c) # a list of 3 matrices
>
> apply(simplify2array(mylist), c(1,2), mean)
[,1]
[1,] 0.3654349
[2,] 0.4441000
[3,] 0.5745011
[4,] 0.5818541
The vector c(1,2) for MARGIN in the apply call indicates that the function mean should be applied to rows and columns (both at once), see ?apply for further details.
Another alternative is to use the Reduce function:
> Reduce("+", mylist)/ length(mylist)
[,1]
[1,] 0.3654349
[2,] 0.4441000
[3,] 0.5745011
[4,] 0.5818541
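As a quick cross-check, the sketch below computes the element-wise mean three ways and compares the results. The rowMeans(..., dims = 2) variant is an extra option not mentioned above: on the 3D array built by simplify2array() it averages over the third dimension. The variable names are illustrative only.

```r
set.seed(1)
mylist <- list(matrix(runif(4)), matrix(runif(4)), matrix(runif(4)))

by_reduce   <- Reduce(`+`, mylist) / length(mylist)
by_apply    <- apply(simplify2array(mylist), c(1, 2), mean)
# dims = 2 keeps the first two dimensions and averages over the rest
by_rowmeans <- rowMeans(simplify2array(mylist), dims = 2)

all.equal(by_reduce, by_apply)
all.equal(by_reduce, by_rowmeans)
```

All three assume the matrices share the same dimensions. Note that the Reduce version has no na.rm option, so a missing value in any matrix propagates to that cell of the result.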
The simplify2array option is really slow because it calls the mean function nrow*ncol times:
library(microbenchmark)
# three 10 x 100000 matrices
l <- lapply(1:3, function(i) matrix(i * (1:1e6), 10))
microbenchmark(times = 10,
  reduce = {Reduce(`+`, l) / length(l)},
  simplify2array = {apply(simplify2array(l), c(1, 2), mean)},
  via_vector = {matrix(rowMeans(sapply(l, as.numeric)), nrow(l[[1]]))},
  for_loop = {o <- l[[1]]; for (i in 2:length(l)) o <- o + l[[i]]; o / length(l)}
)
Unit: milliseconds
expr min lq mean median uq max neval
reduce 7.320327 8.051267 11.23352 12.17859 13.59846 13.72176 10
simplify2array 4233.090223 4674.827077 4802.74033 4808.00417 5010.75771 5228.05362 10
via_vector 27.720372 42.757517 51.95250 59.47917 60.11251 61.83605 10
for_loop 10.405315 12.919731 13.93157 14.46218 15.82175 15.89977 10
Your question is not clear.
For the mean of all elements of each matrix:
sapply(mylist, mean)
For the mean of every row of each matrix:
sapply(mylist, rowMeans)
For the mean of every column of each matrix:
sapply(mylist, colMeans)
Note that sapply will automatically simplify the results to a vector or matrix, if possible. In the first case, the result will be a vector, but in the second and third, it may be a list or matrix.
Example:
a <- matrix(1:6,2,3)
b <- matrix(7:10,2,2)
c <- matrix(11:16,3,2)
mylist <- list(a,b,c)
> mylist
[[1]]
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[[2]]
[,1] [,2]
[1,] 7 9
[2,] 8 10
[[3]]
[,1] [,2]
[1,] 11 14
[2,] 12 15
[3,] 13 16
Results:
> sapply(mylist, mean)
[1] 3.5 8.5 13.5
> sapply(mylist, rowMeans)
[[1]]
[1] 3 4
[[2]]
[1] 8 9
[[3]]
[1] 12.5 13.5 14.5
> sapply(mylist, colMeans)
[[1]]
[1] 1.5 3.5 5.5
[[2]]
[1] 7.5 9.5
[[3]]
[1] 12 15
I have a correlation matrix:
a <- matrix(c(1, .8, .8, .8, 1, .8, .8, .8, 1), 3)
## [,1] [,2] [,3]
## [1,] 1.0 0.8 0.8
## [2,] 0.8 1.0 0.8
## [3,] 0.8 0.8 1.0
I would now like to create a covariance matrix from the correlation matrix. How can this be done in R?
I tried:
e1.sd <- 3
e2.sd <- 10
e3.sd <- 3
e.cov <- a * as.matrix(c, e1.sd, e2.sd, e3.sd) %*% t(as.matrix(c(e1.sd, e2.sd, e3.sd)))
But I get the error:
Error in a * as.matrix(c, e1.sd, e2.sd, e3.sd) %*% t(as.matrix(c(e1.sd, :
non-conformable arrays
What am I doing wrong?
If you know the standard deviations of your individual variables, you can:
stdevs <- c(e1.sd, e2.sd, e3.sd)
#stdevs is the vector that contains the standard deviations of your variables
b <- stdevs %*% t(stdevs)
# b is an n*n matrix whose generic term is stdevs[i]*stdevs[j] (n is your number of variables)
a_covariance <- b * a #your covariance matrix
On the other hand, if you don't know the standard deviations, it's impossible.
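Here is a worked sketch of that recipe using the matrix and standard deviations from the question. It swaps in outer(), which builds the same stdevs[i]*stdevs[j] matrix as stdevs %*% t(stdevs), and uses base R's cov2cor() as a sanity check in the reverse direction.

```r
a <- matrix(c(1, .8, .8, .8, 1, .8, .8, .8, 1), 3)
stdevs <- c(3, 10, 3)

# outer(stdevs, stdevs)[i, j] is stdevs[i] * stdevs[j];
# `*` multiplies element by element, unlike the matrix product %*%
a_covariance <- a * outer(stdevs, stdevs)

diag(a_covariance)                   # variances: 9 100 9
all.equal(cov2cor(a_covariance), a)  # converting back recovers a
```

The off-diagonal entries follow the same rule, for example 0.8 * 3 * 10 = 24 for variables 1 and 2.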
require(MBESS)
a <- matrix(c(1,.8,.8,.8,1,.8,.8,.8,1),3)
> cor2cov(a,c(3,10,3))
[,1] [,2] [,3]
[1,] 9.0 24 7.2
[2,] 24.0 100 24.0
[3,] 7.2 24 9.0
Building on S4M's answer, in base R, I would write this function:
cor2cov <- function(V, sd) {
V * tcrossprod(sd)
}
tcrossprod(sd) computes the product of every pair of elements of the sd vector (equivalent to sd %*% t(sd)), which we then multiply element-wise by the correlation matrix V.
Here's a quick check that the function is correct, using the built-in mtcars data set:
all.equal(
cor2cov(cor(mtcars), sapply(mtcars, sd)),
cov(mtcars)
)
The answer marked as correct is wrong.
The correct solution seems to be the one provided by MBESS package, so see the post from dayne.
> a
[,1] [,2] [,3]
[1,] 1.0 0.8 0.8
[2,] 0.8 1.0 0.8
[3,] 0.8 0.8 1.0
> b <- c(3,10,3)
> b %*% t(b)
[,1] [,2] [,3]
[1,] 9 30 9
[2,] 30 100 30
[3,] 9 30 9
> c <- b %*% t(b)
> c %*% a
[,1] [,2] [,3]
[1,] 40.2 44.4 40.2
[2,] 134.0 148.0 134.0
[3,] 40.2 44.4 40.2
> cor2cov(cor.mat=a, b )
[,1] [,2] [,3]
[1,] 9.0 24 7.2
[2,] 24.0 100 24.0
[3,] 7.2 24 9.0
> a %*% c
[,1] [,2] [,3]
[1,] 40.2 134 40.2
[2,] 44.4 148 44.4
[3,] 40.2 134 40.2
>
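The transcript above uses the matrix product %*%. For comparison, here is a small sketch with the element-wise product * on the same a and b; the name sd_outer is illustrative and is used instead of c to avoid masking the base function c().

```r
a <- matrix(c(1, .8, .8, .8, 1, .8, .8, .8, 1), 3)
b <- c(3, 10, 3)

sd_outer <- b %*% t(b)        # 3 x 3 matrix of b[i] * b[j]
elementwise <- a * sd_outer   # element-by-element product
matprod     <- sd_outer %*% a # matrix product, as in the transcript

elementwise
```

With these inputs the element-wise product gives the same 9, 24, 7.2 / 24, 100, 24 / 7.2, 24, 9 matrix that cor2cov prints above, while the matrix product does not, so the discrepancy comes down to which operator is used.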
I have a large dataset (202k points). I know that there are 8 values over 0.5. I want to subset on those rows.
How do I find/return a list of the row numbers where the values are > 0.5?
If the dataset is a vector named x:
(1:length(x))[x > 0.5]
If the dataset is a data.frame or matrix named x and the variable of interest is in column j:
(1:nrow(x))[x[,j] > 0.5]
But if you just want to find the subset and don't really need the row numbers, use
subset(x, x > 0.5)
for a vector and
subset(x, x[,j] > 0.5)
for a matrix or data.frame.
which(x > 0.5)
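A short sketch of which() on a data frame column, using made-up data (the data frame x and its columns id and value are hypothetical). It also shows one difference from plain logical indexing: which() silently drops NA, whereas a logical index containing NA yields NA.

```r
# Hypothetical data, for illustration only
x <- data.frame(id = 1:10,
                value = c(0.1, 0.7, 0.2, 0.9, 0.3, 0.4, 0.05, 0.6, 0.2, 0.1))

rows <- which(x$value > 0.5)  # row numbers: 2 4 8
x[rows, ]                     # the matching rows

# Behaviour with missing values
v <- c(0.6, NA, 0.2)
which(v > 0.5)                # 1
seq_along(v)[v > 0.5]         # 1 NA
```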
Here's some dummy data:
D<-matrix(c(0.6,0.1,0.1,0.2,0.1,0.1,0.23,0.1,0.8,0.2,0.2,0.2),nrow=3)
Which looks like:
> D
[,1] [,2] [,3] [,4]
[1,] 0.6 0.2 0.23 0.2
[2,] 0.1 0.1 0.10 0.2
[3,] 0.1 0.1 0.80 0.2
And here's the logical row index,
index <- (rowSums(D>0.5))>=1
You can use it to extract the rows you want:
PeakRows <- D[index,]
Which looks like this:
> PeakRows
[,1] [,2] [,3] [,4]
[1,] 0.6 0.2 0.23 0.2
[2,] 0.1 0.1 0.80 0.2
Using the argument arr.ind=TRUE with which is a great way to find the row (or column) numbers where a condition is TRUE:
df <- matrix(c(0.6,0.2,0.1,0.25,0.11,0.13,0.23,0.18,0.21,0.29,0.23,0.51), nrow=4)
# [,1] [,2] [,3]
# [1,] 0.60 0.11 0.21
# [2,] 0.20 0.13 0.29
# [3,] 0.10 0.23 0.23
# [4,] 0.25 0.18 0.51
which with arr.ind=TRUE returns the array indices where the condition is TRUE
which(df > 0.5, arr.ind=TRUE)
row col
[1,] 1 1
[2,] 4 3
so the subset that drops those rows becomes (remove the minus sign to keep them instead)
df[-which(df > 0.5, arr.ind=TRUE)[, "row"], ]
# [,1] [,2] [,3]
# [1,] 0.2 0.13 0.29
# [2,] 0.1 0.23 0.23
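To keep the rows that contain a value above 0.5, which is what the question asks for, a possible sketch on the same df follows. The names hits, rows, kept and has_peak are illustrative.

```r
df <- matrix(c(0.6, 0.2, 0.1, 0.25, 0.11, 0.13, 0.23, 0.18,
               0.21, 0.29, 0.23, 0.51), nrow = 4)

hits <- which(df > 0.5, arr.ind = TRUE)
rows <- unique(hits[, "row"])     # a row may match in several columns
kept <- df[rows, , drop = FALSE]  # rows 1 and 4

# Caveat with negative indexing: if nothing matches, the index is
# empty and df[-integer(0), ] returns zero rows, not all rows.
# A logical index avoids that edge case:
has_peak <- rowSums(df > 0.5) > 0
dropped  <- df[!has_peak, , drop = FALSE]  # rows 2 and 3
```

drop = FALSE keeps the result a matrix even when only one row is selected.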