R image function - r

I have trouble understanding the image function in R. I have the following matrix:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 6.931799 7.092166 7.136029 6.735593 6.621951 6.740000 6.049774 6.162304 6.169014 5.626374
[2,] 7.942623 7.909091 9.923077 5.888889 8.647059 8.166667 6.625000 6.529412 7.571429 5.590643
[3,] 8.446237 6.800000 9.000000 9.631579 8.892857 7.083333 6.857143 6.250000 6.413793 5.491525
[4,] 7.698276 6.666667 8.833333 7.565217 9.100000 6.705882 6.421053 7.045455 6.045455 5.267857
[5,] 6.082524 8.300000 8.250000 8.777778 7.250000 7.928571 6.500000 6.920000 5.041667 4.970833
[6,] 6.128571 8.636364 7.300000 6.266667 7.500000 7.384615 6.727273 6.312500 5.638889 4.569231
[7,] 6.146739 7.000000 7.625000 6.615385 5.466667 5.941176 7.100000 6.687500 5.789474 4.479675
[8,] 5.403509 7.714286 6.500000 8.500000 6.384615 7.133333 6.294118 5.900000 5.615385 4.759804
[9,] 5.444444 5.666667 4.875000 6.200000 6.777778 6.166667 5.642857 6.222222 5.428571 4.385093
[10,] 5.186180 5.621118 5.004878 5.045016 4.875433 4.594340 4.260377 4.276382 4.205128 3.632721
and I would like to display it as a heatmap. To do so, I use the image function as follows:
image(1:10,1:10,mat,axes=FALSE)
but the result is definitely not what is in my matrix!
Any ideas?
Thanks

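Firstly, keep in mind that a matrix is printed starting from the top left but plotted by image() starting from the bottom left, as Badger said: mat[1, 1] ends up in the bottom-left corner, increasing the row index moves you to the right on the plot, and increasing the column index moves you up.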
The color goes from red for the lowest values to white for the highest values.
Another thing you might want to change is the range of your z values. By default, image() uses the minimum and maximum of your matrix as the range. You might want to fix it instead by adding the argument zlim=c(0,10), so that the range is from 0 to 10.
Lastly, if you want the plot to match the layout of the printed matrix, you could create a new matrix by rotating your original matrix 90 degrees clockwise:
t(apply(mat, 2, rev))
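A minimal sketch of the rotation and the zlim argument together, using a small hypothetical 2 x 3 matrix in place of the 10 x 10 mat from the question. After t(apply(mat, 2, rev)), element [i, j] of the rotated matrix is mat[nrow(mat) + 1 - j, i], so image() draws row 1 at the top and column 1 on the left, as in the printout. The axis labels are an illustrative addition, not part of the original answer.

```r
# Hypothetical small matrix standing in for `mat` from the question
mat <- matrix(1:6, nrow = 2, byrow = TRUE)

# Rotate 90 degrees clockwise: rotated[i, j] == mat[nrow(mat) + 1 - j, i]
rotated <- t(apply(mat, 2, rev))

# x runs over the original columns, y over the original rows (bottom to top);
# zlim fixes the color range instead of relying on the data's min and max
image(seq_len(ncol(mat)), seq_len(nrow(mat)), rotated,
      zlim = range(mat), axes = FALSE, xlab = "column", ylab = "row")

# Label the axes so the top row reads as row 1, as in the printed matrix
axis(1, at = seq_len(ncol(mat)), labels = seq_len(ncol(mat)))
axis(2, at = seq_len(nrow(mat)), labels = rev(seq_len(nrow(mat))))
```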

Related

Moving from a matrix of character names to a vector of those names (for fMRI data)

I have a lower triangular matrix of fMRI network connectivities with sum(1:235) = 27730 values. I want to cbind another vector containing the names of the corresponding regions of interest (ROIs), but I'm not sure how to go from the vector of 236 ROI names to a vector of 27730 connection names.
So the connections should go like this: SN1-SN2, SN1-SN3, ..., SN1-CB4, SN2-SN3, ..., SN2-CB4, SN3-SN4, ..., SN3-CB4, and so on. If you take all the unique connections, the first of the 236 ROIs has 235 connections, the second has 234, the third has 233, and so on, so the total number of unique connections is sum(1:235) = 27730.
Per a comment, though, I have reduced the vector to only 7 of these ROIs.
Thus, I've also changed the connectivities to have sum(1:6) = 21 values.
Thanks much!
roi <- c("SN2", "SN3", "SN4", "SN5", "CON1", "CON2", "CB4")
connectivities <- rnorm(21)  # 7 ROIs give 6 + 5 + ... + 1 = 21 unique connections
Here's a way:
m <- outer(roi, roi, paste, sep = "-")
m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "SN2-SN2" "SN2-SN3" "SN2-SN4" "SN2-SN5" "SN2-CON1" "SN2-CON2" "SN2-CB4"
# [2,] "SN3-SN2" "SN3-SN3" "SN3-SN4" "SN3-SN5" "SN3-CON1" "SN3-CON2" "SN3-CB4"
# [3,] "SN4-SN2" "SN4-SN3" "SN4-SN4" "SN4-SN5" "SN4-CON1" "SN4-CON2" "SN4-CB4"
# [4,] "SN5-SN2" "SN5-SN3" "SN5-SN4" "SN5-SN5" "SN5-CON1" "SN5-CON2" "SN5-CB4"
# [5,] "CON1-SN2" "CON1-SN3" "CON1-SN4" "CON1-SN5" "CON1-CON1" "CON1-CON2" "CON1-CB4"
# [6,] "CON2-SN2" "CON2-SN3" "CON2-SN4" "CON2-SN5" "CON2-CON1" "CON2-CON2" "CON2-CB4"
# [7,] "CB4-SN2" "CB4-SN3" "CB4-SN4" "CB4-SN5" "CB4-CON1" "CB4-CON2" "CB4-CB4"
m[upper.tri(m)]
# [1] "SN2-SN3" "SN2-SN4" "SN3-SN4" "SN2-SN5" "SN3-SN5" "SN4-SN5" "SN2-CON1" "SN3-CON1" "SN4-CON1"
# [10] "SN5-CON1" "SN2-CON2" "SN3-CON2" "SN4-CON2" "SN5-CON2" "CON1-CON2" "SN2-CB4" "SN3-CB4" "SN4-CB4"
# [19] "SN5-CB4" "CON1-CB4" "CON2-CB4"
Because there are 7 elements in roi, the first element ("SN2") has six connections, the second ("SN3") has five, and so on, producing 21 connections in total.
Another way, using (and improving on) Ben's use of combn:
apply(combn(roi,2), 2, paste, collapse = "-")
# [1] "SN2-SN3" "SN2-SN4" "SN2-SN5" "SN2-CON1" "SN2-CON2" "SN2-CB4" "SN3-SN4" "SN3-SN5" "SN3-CON1"
# [10] "SN3-CON2" "SN3-CB4" "SN4-SN5" "SN4-CON1" "SN4-CON2" "SN4-CB4" "SN5-CON1" "SN5-CON2" "SN5-CB4"
# [19] "CON1-CON2" "CON1-CB4" "CON2-CB4"
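Note that m[upper.tri(m)] walks the pairs column by column ("SN2-SN3", "SN2-SN4", "SN3-SN4", ...), while combn() follows the row-by-row order asked for in the question. A minimal sketch, assuming the connectivity values are stored in that row-by-row order (an assumption about the asker's data), that gets the combn() order from the outer() matrix by transposing before applying lower.tri(), and binds the names to the values:

```r
roi <- c("SN2", "SN3", "SN4", "SN5", "CON1", "CON2", "CB4")

# Hypothetical values standing in for the real connectivities,
# assumed to be ordered SN2-SN3, SN2-SN4, ..., CON2-CB4
set.seed(1)
connectivities <- rnorm(choose(length(roi), 2))  # 21 values for 7 ROIs

m <- outer(roi, roi, paste, sep = "-")

# Transposing first makes lower.tri() visit the pairs row by row,
# which is the same order combn(roi, 2) produces
pairs <- t(m)[lower.tri(m)]

result <- data.frame(connection = pairs, value = connectivities)
head(result, 3)
```

For the full data set, choose(236, 2) is 27730, matching the number of connectivity values.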
Here is an example with a smaller set of values (7). For 7 values, there are 21 combinations: 6 + 5 + 4 + 3 + 2 + 1 = 21.
roi <- c("SN2", "SN3", "SN4", "SN5", "CON1", "CON2", "CB4")
The combn() function generates all pairs as a 2 x 21 matrix:
combn(roi, 2)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] "SN2" "SN2" "SN2" "SN2" "SN2" "SN2" "SN3" "SN3" "SN3" "SN3" "SN3"
[2,] "SN3" "SN4" "SN5" "CON1" "CON2" "CB4" "SN4" "SN5" "CON1" "CON2" "CB4"
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
[1,] "SN4" "SN4" "SN4" "SN4" "SN5" "SN5" "SN5" "CON1" "CON1" "CON2"
[2,] "SN5" "CON1" "CON2" "CB4" "CON1" "CON2" "CB4" "CON2" "CB4" "CB4"
To get your final desired output, transpose the matrix, convert to data.frame, and use unite() from tidyr to stitch the two roi values together.
library(dplyr) # for the pipe %>%
library(tidyr) # for unite()
combn(roi, 2) %>%
t() %>% as.data.frame() %>%
unite(col = "combination", sep = "-")
combination
1 SN2-SN3
2 SN2-SN4
3 SN2-SN5
4 SN2-CON1
5 SN2-CON2
6 SN2-CB4
7 SN3-SN4
8 SN3-SN5
9 SN3-CON1
10 SN3-CON2
11 SN3-CB4
12 SN4-SN5
13 SN4-CON1
14 SN4-CON2
15 SN4-CB4
16 SN5-CON1
17 SN5-CON2
18 SN5-CB4
19 CON1-CON2
20 CON1-CB4
21 CON2-CB4

Is it possible to read only a sample of data using readRDS?

I have a big matrix saved with saveRDS:
# create a big matrix and save it
x <- matrix(1:(10 * 10000), nrow = 10000, ncol = 10)
saveRDS(x, 'test.RDS')
Now I would like to analyze only a sample of the data, but at the moment I read the full matrix before taking the sample:
# load the whole matrix, then take a sample of its rows
x <- readRDS('test.RDS')
set.seed(1)
x[sample.int(dim(x)[1],5),]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 2656 12656 22656 32656 42656 52656 62656 72656 82656 92656
[2,] 3721 13721 23721 33721 43721 53721 63721 73721 83721 93721
[3,] 5728 15728 25728 35728 45728 55728 65728 75728 85728 95728
[4,] 9080 19080 29080 39080 49080 59080 69080 79080 89080 99080
[5,] 2017 12017 22017 32017 42017 52017 62017 72017 82017 92017
Is it possible to read only a sample of the data stored in an RDS file? That would mean not reading the whole matrix into memory before taking the sample, but somehow skipping the data that does not belong to the sample.
I tried the following, and got the same result:
# find out the size of the matrix first and load only the part that is needed?
n <- dim(readRDS('test.RDS'))[1]
set.seed(1)
readRDS('test.RDS')[sample.int(n, 5), ]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 2656 12656 22656 32656 42656 52656 62656 72656 82656 92656
[2,] 3721 13721 23721 33721 43721 53721 63721 73721 83721 93721
[3,] 5728 15728 25728 35728 45728 55728 65728 75728 85728 95728
[4,] 9080 19080 29080 39080 49080 59080 69080 79080 89080 99080
[5,] 2017 12017 22017 32017 42017 52017 62017 72017 82017 92017
How can I read a sample from an RDS file without temporarily loading the full data into memory?
Alternatively, which storage and loading functions should one use to be able to read only a sample from a file containing a matrix or data frame?
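readRDS() has to deserialize the whole object (and saveRDS() compresses with gzip by default), so indexing its result cannot avoid loading everything. A minimal sketch of one possible alternative storage format, not taken from the original thread: write the matrix as raw column-major integers with writeBin(), then use seek() and readBin() to read only the sampled rows. The file layout, the helper read_rows(), and the 4-byte integer size are assumptions of this sketch; the dimensions must be known separately, and R's documentation advises caution with seek() on Windows.

```r
# The matrix from the question, written as raw 4-byte integers (column-major, no header)
x <- matrix(1:(10 * 10000), nrow = 10000, ncol = 10)
path <- tempfile(fileext = ".bin")  # hypothetical file location
con <- file(path, "wb")
writeBin(as.vector(x), con, size = 4)
close(con)

# Hypothetical helper: read only the requested rows by seeking to each element
read_rows <- function(path, rows, nrow, ncol) {
  con <- file(path, "rb")
  on.exit(close(con))
  out <- matrix(NA_integer_, nrow = length(rows), ncol = ncol)
  for (j in seq_len(ncol)) {
    for (k in seq_along(rows)) {
      # byte offset of element [rows[k], j] in column-major order
      offset <- ((j - 1) * nrow + (rows[k] - 1)) * 4
      seek(con, where = offset, origin = "start")
      out[k, j] <- readBin(con, what = "integer", n = 1, size = 4)
    }
  }
  out
}

set.seed(1)
rows <- sample.int(10000, 5)
read_rows(path, rows, nrow = 10000, ncol = 10)
```

Reading element by element keeps memory use tiny but costs one seek per value; for larger samples, reading contiguous runs per column would be faster.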

Calculate average of lowest values of matrix rows

I have a large matrix, e.g.
> mat = matrix(runif(100), ncol = 5)
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 0.264442954 0.6408534 0.76472904 0.2437074 0.08019882
[2,] 0.575443586 0.6428957 0.44188123 0.0230842 0.07502289
[3,] 0.894885901 0.5926238 0.55431966 0.7717503 0.52806173
[4,] 0.231978411 0.1192595 0.08170498 0.4264405 0.97486053
[5,] 0.344765840 0.5349323 0.85523617 0.2257759 0.20549035
[6,] 0.499130844 0.9882825 0.99417390 0.8070708 0.29963075
[7,] 0.613479990 0.8877605 0.34282782 0.9525512 0.91488004
[8,] 0.967166001 0.6115709 0.68169111 0.3067973 0.30094691
[9,] 0.957612804 0.5565989 0.88180650 0.3359184 0.17980137
[10,] 0.342177768 0.7735620 0.48154937 0.3692096 0.31299886
[11,] 0.871928110 0.3397143 0.57596030 0.4749349 0.47800019
[12,] 0.387563040 0.1656725 0.47796646 0.8956274 0.68345302
[13,] 0.628535870 0.3418692 0.86513964 0.8052477 0.01850535
[14,] 0.379472842 0.9176644 0.08829197 0.8548662 0.42151935
[15,] 0.071958980 0.6644800 0.90061596 0.4484674 0.32649345
[16,] 0.229463192 0.9995178 0.63995121 0.8369698 0.35091430
[17,] 0.291761976 0.5014815 0.35260028 0.6188047 0.68192891
[18,] 0.077610797 0.2747788 0.07084273 0.5977530 0.37134566
[19,] 0.675912490 0.6059304 0.29321852 0.5638336 0.73866322
[20,] 0.006010715 0.7697045 0.43627939 0.1723969 0.88665973
I want to extract the two lowest and the two highest values of each row and calculate their averages.
Eventually, I'd like to generate a new matrix where the first column is the average of the lowest values, and the second column is the average of the highest values.
Thanks in advance!
I believe this does what you want:
do.call(rbind, apply(mat, 1, function(x) {
  sorted <- sort(x)
  data.frame(min = mean(head(sorted, 2)), max = mean(tail(sorted, 2)))
}))
Output (from a different call to runif(), so the numbers do not match the matrix printed above):
min max
1 0.14333229 0.8877635
2 0.12311651 0.5283049
3 0.09367614 0.5433373
4 0.39926848 0.6361645
5 0.05196898 0.5473783
6 0.12876148 0.6153546
7 0.29893684 0.8436462
8 0.14254481 0.7023039
9 0.20889814 0.8863141
10 0.44838327 0.8641790
11 0.14859312 0.5533045
12 0.19728414 0.8619284
13 0.37049481 0.7448965
14 0.30070570 0.9320575
15 0.30333510 0.6774024
16 0.21908982 0.7077274
17 0.61804571 0.9239816
18 0.36525615 0.8531795
19 0.22751108 0.4993744
20 0.14251095 0.6353147
Hope this helps!
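Since the question asks for a matrix rather than a data frame, a minimal variant of the same idea (sort each row, average the first two and last two values) can return a named numeric vector per row; apply() then simplifies to a 2 x nrow matrix, which t() turns into an nrow x 2 matrix with columns min and max. The seed is only there to make the sketch reproducible.

```r
set.seed(1)
mat <- matrix(runif(100), ncol = 5)

# One row of the result per row of mat: mean of the 2 smallest and 2 largest values
res <- t(apply(mat, 1, function(x) {
  s <- sort(x)
  c(min = mean(head(s, 2)), max = mean(tail(s, 2)))
}))
head(res)
```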

How to replace ties with NA in R

I am working on a function to return the column name of the largest value for each row. Something like:
colnames(x)[apply(x,1,which.max)]
However, before applying a function like this, is there a straightforward and general way to replace ties with NA (or any other arbitrary value)?
I have the following matrix:
0 1
[1,] 5.000000e-01 0.5000000000
[2,] 9.901501e-01 0.0098498779
[3,] 9.981358e-01 0.0018641935
[4,] 9.996753e-01 0.0003246823
[5,] 9.998598e-01 0.0001402322
[6,] 1.303731e-02 0.9869626938
[7,] 1.157919e-03 0.9988420815
[8,] 6.274074e-07 0.9999993726
[9,] 1.659164e-07 0.9999998341
[10,] 6.517362e-08 0.9999999348
[11,] 8.951474e-06 0.9999910485
[12,] 5.070740e-06 0.9999949293
[13,] 1.278186e-07 0.9999998722
[14,] 9.914646e-08 0.9999999009
[15,] 7.058751e-08 0.9999999294
[16,] 2.847667e-09 0.9999999972
[17,] 1.675766e-08 0.9999999832
[18,] 2.172290e-06 0.9999978277
[19,] 4.964820e-06 0.9999950352
[20,] 1.333680e-07 0.9999998666
[21,] 2.087793e-07 0.9999997912
[22,] 2.358360e-06 0.9999976416
The first row has equal values, which I would like to replace with NA. While this is simple for this particular example, I want to be able to replace all ties with NA wherever they occur, in a matrix of any size. For example, in this matrix:
1 2 3
[1,] 0.25 0.25 0.5
[2,] 0.3 0.3 0.3
all values would be replaced with NA except for [1,3]
I have looked at the function which.max.simple(), which can deal with ties by replacing them with NA, but it doesn't appear to work any more, and none of the other methods of dealing with ties address my issue.
I hope that makes sense
Thanks,
C
Here's a simple approach to replace any row-wise duplicated values with NA in a matrix m:
is.na(m) <- t(apply(m, 1, FUN = function(x) {
duplicated(x) | duplicated(x, fromLast = TRUE)}))
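Applied to the 2 x 3 example matrix from the question, the snippet marks every value that occurs more than once in its row, so only the unique 0.5 in row 1 survives. The values are typed in as exact literals here, which sidesteps the floating-point caveat in note 1.

```r
m <- matrix(c(0.25, 0.25, 0.5,
              0.3,  0.3,  0.3),
            nrow = 2, byrow = TRUE, dimnames = list(NULL, c("1", "2", "3")))

# TRUE wherever a value appears more than once within its row
is.na(m) <- t(apply(m, 1, FUN = function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)}))

m  # only m[1, 3] (0.5) is left non-NA
```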
But consider the following notes:
1) be extra careful when comparing floating point numbers for equality (see Why are these numbers not equal?);
2) depending on your ultimate target, there may be simpler ways than replacing duplicated values in your data (since it seems that you are only interested in column names); and
3) if you are going to replace values in a numeric matrix, don't use arbitrary characters for replacement, since that will convert your whole matrix to character class (replacement with NA is not a problem).
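Following notes 1 and 2, a minimal sketch that skips the replacement step and goes straight to the column name of each row's maximum, returning NA when the maximum is shared by more than one column within a tolerance. The helper name max_col_or_na() and the default tolerance sqrt(.Machine$double.eps) are assumptions for illustration. Unlike the snippet above, only ties at the maximum matter here; ties between smaller values are ignored.

```r
# Hypothetical helper: column name of the row maximum, or NA if the maximum is tied
max_col_or_na <- function(x, tol = sqrt(.Machine$double.eps)) {
  apply(x, 1, function(row) {
    # columns whose value is within `tol` of the row maximum
    hits <- which(abs(row - max(row)) <= tol)
    if (length(hits) == 1L) colnames(x)[hits] else NA_character_
  })
}

m <- rbind(c(0.25, 0.25, 0.5),
           c(0.3,  0.3,  0.3),
           c(0.9,  0.05, 0.05))
colnames(m) <- c("1", "2", "3")
max_col_or_na(m)  # "3" NA "1"

# 0.1 + 0.2 is not exactly 0.3 in floating point, but the tolerance treats it as a tie
m2 <- rbind(c(0.1 + 0.2, 0.3))
colnames(m2) <- c("a", "b")
max_col_or_na(m2)  # NA
```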

Matrix multiplication R

I just have an easy question: I have these two matrices
Matrix Y (264 rows and 4 columns)
[,1] [,2] [,3] [,4]
1751 -1.745529 0.3692280 0.04607022 -0.07004973
1752 -1.532722 0.5642921 0.07477571 0.03380135
1753 -1.657636 0.4660229 0.05772685 -0.03314599
1754 -1.685309 0.4540047 0.08254891 -0.01623810
1755 -1.702469 0.4483389 0.10709689 -0.03936556
1756 -1.761332 0.4505378 0.04801420 -0.06385137
Matrix E (4 x 4, with elements e; called P in the code below)
[,1] [,2] [,3] [,4]
[1,] -0.8769976 -0.4706054 -0.07186508 0.06512449
[2,] -0.4085563 0.8198519 -0.40067903 -0.01951755
[3,] 0.2190770 -0.3206892 -0.86394973 -0.32055350
[4,] -0.1263415 0.0594299 0.29644997 -0.94478745
For each year t, I want to compute b(t) = Σ e[1,i] · Y[t,i], summing over i = 1, ..., 4.
This is the code I've used, followed by what I should get (a 264 x 4 matrix whose column i is e[1,i] · Y[,i]; only the first rows are shown):
betaNew1<-(Y[,1]%*%t(P[1,1]))
betaNew2<-(Y[,2]%*%t(P[1,2]))
betaNew3<-(Y[,3]%*%t(P[1,3]))
# the output below was computed with Y[,3] on this line; Y[,4] is what the formula calls for
betaNew4<-(Y[,4]%*%t(P[1,4]))
beta_t<-data.frame(betaNew1,betaNew2,betaNew3,betaNew4)
betaNew1 betaNew2 betaNew3 betaNew4
1 1.530825 -0.1737607 -0.003310840 0.003000300
2 1.344193 -0.2655589 -0.005373763 0.004869730
3 1.453743 -0.2193129 -0.004148544 0.003759431
4 1.478012 -0.2136570 -0.005932384 0.005375955
5 1.493062 -0.2109907 -0.007696526 0.006974630
6 1.544684 -0.2120255 -0.003450544 0.003126900
How can I avoid writing four separate statements?
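We can multiply every column of Y by the matching element of a row of P in one step. This returns a list with one 264 x 4 matrix per row of P; res[[1]] corresponds to the code in the question: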
res <- lapply(seq_len(nrow(P)), function(i) Y*P[i,][col(Y)])
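Two other base-R ways to scale the columns of Y by P[1, ] are sweep() and right-multiplication by a diagonal matrix. If b(t) is meant as the row-wise sum of those terms, as the formula in the question suggests, a single matrix product gives it for every year at once. A minimal sketch with random stand-ins for Y and P (the real data are not reproduced here):

```r
set.seed(1)
# Hypothetical stand-ins for the question's Y (264 x 4) and P (4 x 4)
Y <- matrix(rnorm(264 * 4), nrow = 264, ncol = 4)
P <- matrix(rnorm(16), nrow = 4, ncol = 4)

# Column i of the result is P[1, i] * Y[, i]
terms1 <- sweep(Y, 2, P[1, ], `*`)

# Equivalent: multiplying by a diagonal matrix scales each column
terms2 <- Y %*% diag(P[1, ])

# b(t) = sum over i of P[1, i] * Y[t, i], for all t: a 264 x 1 matrix
b <- Y %*% P[1, ]
```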
