R: How do I apply the sum function in a list?

I am having a problem with summing the rows of my matrices. I have a list made up of 30 matrices:
Matrix<-matrix(1:45, ncol=9)
List<-list(lapply(seq_len(30), function(X) Matrix))
The idea is to end up with 30 matrices of size 5*3. First, I need to sum the columns in groups (1:3, 4:6, 7:9), so that the result for each matrix is the following:
[,1] [,2] [,3]
[1,] 18 63 108
[2,] 21 66 111
[3,] 24 69 114
[4,] 27 72 117
[5,] 30 75 120
I am trying to get this matrix using this code:
Y<-lapply(List, function(x) rowSums(x[, 1:3]))
But it only lets me sum the first 3 columns.
After this, I need to sum over the list and obtain a single 5*3 matrix. I think that the command final<-reduce(Y,+) could help. The expected result is:
540 1890 3240
630 1980 3330
720 2070 3420
810 2160 3510
900 2250 3600
Thank you for your help

You need to find some way to group your columns by threes, for example:
grp = (1:ncol(Matrix) -1) %/% 3
or if you know the dimensions:
grp = rep(0:2,each=3)
To do rowSums in columns of threes, we can do this with a function:
SumCols = function(M,col_grp){
sapply(unique(col_grp),function(i)rowSums(M[,col_grp==i]))
}
SumCols(Matrix,grp)
[,1] [,2] [,3]
[1,] 18 63 108
[2,] 21 66 111
[3,] 24 69 114
[4,] 27 72 117
[5,] 30 75 120
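Then apply this to every matrix in your list and add the results with Reduce. Note that List[[1]] is needed because the outer list() call in your code wraps the 30 matrices in a list of length one: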
Reduce("+",lapply(List[[1]],SumCols,grp))
[,1] [,2] [,3]
[1,] 540 1890 3240
[2,] 630 1980 3330
[3,] 720 2070 3420
[4,] 810 2160 3510
[5,] 900 2250 3600
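A self-contained sketch of the same idea, assuming the list is built without the extra list() wrapper so that no [[1]] is needed. The names mats, sum_cols and total are illustrative and not from the question:

```r
Matrix <- matrix(1:45, ncol = 9)

# A plain list of 30 matrices (no outer list() call)
mats <- lapply(seq_len(30), function(i) Matrix)

# Column groups: 0 0 0 1 1 1 2 2 2
grp <- (seq_len(ncol(Matrix)) - 1) %/% 3

# Row sums within each column group; drop = FALSE keeps a one-column
# group as a matrix so rowSums() still works
sum_cols <- function(M, col_grp) {
  sapply(unique(col_grp),
         function(i) rowSums(M[, col_grp == i, drop = FALSE]))
}

# Element-wise sum of the 30 reduced matrices
total <- Reduce(`+`, lapply(mats, sum_cols, grp))
total
```

Computing grp from ncol(Matrix) rather than hard-coding it means the same function should work for any number of columns that is a multiple of three.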

Here is another base R solution
out <- Reduce(`+`,Map(function(x) do.call(cbind,Map(rowSums, split.default(data.frame(x),ceiling(seq(ncol(x))/3)))),List[[1]]))
such that
> out
0 1 2
[1,] 540 1890 3240
[2,] 630 1980 3330
[3,] 720 2070 3420
[4,] 810 2160 3510
[5,] 900 2250 3600

Related

Create a vector that contains the first 10 powers of 2, then the first 10 powers of 3, in R

I try to use x <- rep(2,10) , but I don't know what I should do next. Can anyone give me some advice? Thanks in advance.
sapply(2:3, function(v) v**(1:10))
gives
[,1] [,2]
[1,] 2 3
[2,] 4 9
[3,] 8 27
[4,] 16 81
[5,] 32 243
[6,] 64 729
[7,] 128 2187
[8,] 256 6561
[9,] 512 19683
[10,] 1024 59049
You don't need rep: R recycles the base 2 across the exponent vector, so 2^(1:10) computes 2^1, 2^2, ..., 2^10 in one call (1:10 is R shorthand for the vector 1, 2, ..., 10).
2^(1:10)
#[1] 2 4 8 16 32 64 128 256 512 1024
3^(1:10)
#[1] 3 9 27 81 243 729 2187 6561 19683 59049
Parentheses are necessary around (1:10); otherwise, because ^ binds more tightly than :, R evaluates 2^1:10 as (2^1):10, which is simply the vector 2:10
2^1:10
#[1] 2 3 4 5 6 7 8 9 10
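Since the question asks for a single vector rather than a matrix, the two results can be concatenated with c(). A minimal sketch (the name powers is only for illustration):

```r
# First 10 powers of 2 followed by the first 10 powers of 3
powers <- c(2^(1:10), 3^(1:10))

length(powers)            # total number of elements
powers[c(1, 10, 11, 20)]  # first and last element of each half
```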

How do I sum over specific number of columns in dataframe in R?

I have a dataframe (cenMca) with 1020 rows and 800 columns.
Every 4 columns hold a set of data I call "cen". So columns 1 to 4 hold cen1, columns 5 to 8 hold cen2, and so on.
I wanted to split cenMca into 200 smaller dataframes of 1020 rows by 4 columns each and sum the values per row. For this I'd apply a function sum to each row; however, I searched for ways to split my dataframe in the way I wanted and failed. Also, I have no idea how I would iterate through these smaller dataframes to save each with a different name.
So I thought that instead of breaking cenMca into smaller dataframes, I'd sum the values from cenMca and assign them to a single dataframe I called sumvec. So, for every 4 columns in cenMca, I'd have one corresponding column in sumvec. This gives sumvec dimensions equal to 1020 rows and 200 columns.
To accomplish this, I tried:
sumvec = matrix(NA,1020,200)
for (i in 1:1020 ){
for (j in seq(1,800,4)){
sumvec[i,(j+3)/4] = cenMca[i,j]+cenMca[i,j+1]+cenMca[i,j+2]+cenMca[i,j+3]
}
}
The first for runs through rows, and the second for runs through the columns. My increment is 4 for the second for because then I'd get all four values I wanted in a cycle.
I know this is far from efficient, but I thought it'd work.
After I ran the script, I got this:
I tried warnings() but nothing came up. All I have in sumvec is "NA"
How could I fix this?
Other techniques on how to get this done will be appreciated. Thank you.
This seems like a good application for rowSums. You could use lapply to run it over the grouped columns like you're trying to do.
I'll use a data setup similar to @R.Schifini's:
set.seed(1)
z <- matrix( rnorm( 1020*800 ), ncol = 800 )
Make it a data frame, like your data.
z <- as.data.frame(z)
Now group the data frame into groups of 4 columns, running rowSums on each group.
x <- lapply( seq.int( 1, ncol(z), 4 ),
function(i) {
rowSums( z[ , i:(i+3) ] )
} )
Bind it together as a single data frame, with the column names you need.
x <- as.data.frame( x, col.names = paste0( "cen", seq_along( x ) ) )
Here's a small sample of the output.
> head( x[1:6] )
cen1 cen2 cen3 cen4 cen5 cen6
1 -0.8027240 -0.7437158 -1.5305678 -0.7055544 2.0122082 0.7851487
2 0.0854064 0.2422316 -2.5071390 1.7854231 -3.5219698 -0.7699433
3 1.2738387 1.7360087 1.4317099 -3.3501584 -1.8412381 -2.1396324
4 -0.5864149 -0.5648199 -0.3099392 -1.9144969 0.7874474 -2.4840934
5 -0.3887289 -1.0745042 -1.9729363 1.8971846 -4.3374676 2.5744197
6 0.9104741 -0.7546090 4.2516971 1.0335885 2.6814576 -0.2548666
Is this what you are trying to achieve?
I'll create a sample matrix (also works if it is a data frame)
z = matrix(floor(runif(120, 0, 100)), ncol = 12)
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 37 50 37 0 71 84 29 65 0 34 33 65
[2,] 53 60 17 44 39 94 16 66 72 12 27 32
[3,] 10 26 5 26 11 58 39 47 71 38 11 19
[4,] 80 42 65 93 24 50 45 96 18 92 4 11
[5,] 73 36 57 71 86 18 43 40 64 80 37 99
[6,] 5 94 98 16 43 0 51 84 54 75 33 37
[7,] 48 12 60 47 49 87 84 75 33 95 17 56
[8,] 92 7 6 69 69 13 5 53 63 99 62 73
[9,] 4 96 16 46 76 2 55 87 82 60 39 87
[10,] 29 44 47 95 15 93 68 46 70 2 95 57
Then add columns in groups of four:
result = z[,seq(1,12,by = 4)]+z[,seq(2,12,by = 4)]+z[,seq(3,12,by = 4)]+z[,seq(4,12,by = 4)]
> result
[,1] [,2] [,3]
[1,] 124 249 132
[2,] 174 215 143
[3,] 67 155 139
[4,] 280 215 125
[5,] 237 187 280
[6,] 213 178 199
[7,] 167 295 201
[8,] 174 140 297
[9,] 162 220 268
[10,] 215 222 224
First of all, you don't need to loop over rows. R works well with vectors.
Secondly, the NAs in sumvec might be the result of NAs in cenMca. If you have NAs in cenMca, use sum with na.rm = TRUE instead of +.
for (j in seq(1,800,4)) sumvec[,(j+3)/4] <- apply(cenMca[,j:(j+3)],1,sum, na.rm=TRUE)
Hope this helps.
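Another option, not mentioned in the answers above, is base R's rowsum(), which sums rows by group. The sketch below applies it to the transposed data so that it sums columns in blocks of four. The small random matrix z and the names grp and sums are illustrative stand-ins for cenMca:

```r
set.seed(1)
z <- matrix(sample(0:99, 120, replace = TRUE), ncol = 12)

# One group label per column: 1 1 1 1 2 2 2 2 3 3 3 3
grp <- rep(seq_len(ncol(z) / 4), each = 4)

# rowsum() sums rows by group, so transpose, sum, and transpose back
sums <- t(rowsum(t(z), grp))
dim(sums)   # 10 rows, 3 grouped columns
```

For a data frame such as cenMca, t() converts the data to a matrix first, so the same call should apply as long as all columns are numeric.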

Repeating sequences in R

I want to generate a vector of the numbers (1:5), (1:5)+45 and so on:
nums <- seq(1,22500,45)
rws <- c(1:5)
nums2 <- nums - 1
for (i in nums2[2:500]){
rwsx <- append(rws, rws+i)
rwsx}
But the loop just stores the most recent append and I get:
1 2 3 4 5 22456 22457 22458 22459 22460
It sounds like you're looking for outer. Try:
> nums <- seq(1, 22500, 45)
> out <- outer(nums, 0:4, "+")
> head(out)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 46 47 48 49 50
[3,] 91 92 93 94 95
[4,] 136 137 138 139 140
[5,] 181 182 183 184 185
[6,] 226 227 228 229 230
> tail(out)
[,1] [,2] [,3] [,4] [,5]
[495,] 22231 22232 22233 22234 22235
[496,] 22276 22277 22278 22279 22280
[497,] 22321 22322 22323 22324 22325
[498,] 22366 22367 22368 22369 22370
[499,] 22411 22412 22413 22414 22415
[500,] 22456 22457 22458 22459 22460
As mentioned in the comments, a matrix is a vector with dimension attributes. Matrices in R are generally filled by column, so if you want to drop the dimensions and get a single vector in row-wise order, you need to transpose the matrix first.
> head(as.vector(t(out)), 16)
[1] 1 2 3 4 5 46 47 48 49 50 91 92 93 94 95 136
We can also use a matrix product, set up so that each entry is (nums[i] - 1) + rws[j]:
rwsx <- cbind(nums - 1, 1) %*% rbind(1, rws)
head(rwsx)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 46 47 48 49 50
[3,] 91 92 93 94 95
[4,] 136 137 138 139 140
[5,] 181 182 183 184 185
[6,] 226 227 228 229 230
r <- 22500%/%45
m <- matrix(45*0:(r-1), r, 5)
m <- m+ col(m)
rwsx <- c(t(m))
or
r <- 22500%/%45
m <- matrix(45*0:(r-1), 5, r, byrow=TRUE)
m <- m+ row(m)
rwsx <- c(m)
As a one-liner (shown here for the first 10 blocks; use length.out = 500 for the full vector):
as.vector(sapply(seq(by = 45, length.out = 10), function(x) x + 0:4))
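The loop in the question keeps only the last result because append(rws, rws+i) always starts again from rws and overwrites rwsx on every pass. A loop-free sketch that relies on rep() and vector recycling, reusing the question's nums and rws:

```r
nums <- seq(1, 22500, 45)   # block starts: 1, 46, 91, ...
rws  <- 1:5

# Repeat each block offset 5 times, then let 1:5 recycle across the result
rwsx <- rep(nums - 1, each = length(rws)) + rws
head(rwsx, 12)
```

This builds the vector directly in the requested order, so no transpose is needed.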

Segment vector according to whether or not values are above a threshold in R

I have a long vector and I need to divide it into segments according to a threshold. A segment is consecutive values over the threshold. When values drop below the threshold, the segment ends and the next segment begins where the values once again cross above the threshold. I need to record the start and end indices of each segment.
Below is an inefficient implementation. What's the fastest and most appropriate way to write this? This is pretty ugly, I have to assume that there's a cleaner implementation.
set.seed(10)
test.vec <- rnorm(100, 8, 10)
threshold <- 0
segments <- list()
in.segment <- FALSE
for(i in 1:length(test.vec)){
# If we're in a segment
if(in.segment){
if(test.vec[i] > threshold){
next
}else{
end.ind <- i - 1
in.segment <- FALSE
segments[[length(segments) + 1]] <- c(start.ind, end.ind)
}
}
# if not in segment
else{
if(test.vec[i] > threshold){
start.ind <- i
in.segment <- TRUE
}
}
}
EDIT: Runtime of all solutions
Thanks for all the replies, this has been helpful and very instructive. A small test of all five solutions is below (the four provided plus the original example). As you can see, all four are a huge improvement over the original solution, but Khashaa's solution is by far the fastest.
set.seed(1)
test.vec <- rnorm(1e6, 8, 10);threshold <- 0
originalFunction <- function(x, threshold){
  segments <- list()
  in.segment <- FALSE
  # Loop over the function argument x, not the global test.vec
  for(i in seq_along(x)){
    # If we're in a segment
    if(in.segment){
      if(x[i] > threshold){
        next
      }else{
        end.ind <- i - 1
        in.segment <- FALSE
        segments[[length(segments) + 1]] <- c(start.ind, end.ind)
      }
    }
    # if not in segment
    else{
      if(x[i] > threshold){
        start.ind <- i
        in.segment <- TRUE
      }
    }
  }
  segments
}
SimonG <- function(x, threshold){
hit <- which(x > threshold)
n <- length(hit)
ind <- which(hit[-1] - hit[-n] > 1)
starts <- c(hit[1], hit[ ind+1 ])
ends <- c(hit[ ind ], hit[n])
cbind(starts,ends)
}
Rcpp::cppFunction('DataFrame Khashaa(NumericVector x, double threshold) {
x.push_back(threshold - 1); // sentinel below the threshold so a trailing segment is closed
int n = x.size(), startind, endind;
std::vector<int> startinds, endinds;
bool insegment = false;
for(int i=0; i<n; i++){
if(!insegment){
if(x[i] > threshold){
startind = i + 1;
insegment = true; }
}else{
if(x[i] <= threshold){ // a value equal to the threshold also ends the segment
endind = i;
insegment = false;
startinds.push_back(startind);
endinds.push_back(endind);
}
}
}
return DataFrame::create(_["start"]= startinds, _["end"]= endinds);
}')
bgoldst <- function(x, threshold){
with(rle(x>threshold),
t(matrix(c(0L,rep(cumsum(lengths),2L)[-length(lengths)]),2L,byrow=T)+1:0)[values,])
}
ClausWilke <- function(x, threshold){
suppressMessages(require(dplyr, quietly = TRUE))
in.segment <- (x > threshold)
start <- which(c(FALSE, in.segment) == TRUE & lag(c(FALSE, in.segment) == FALSE)) - 1
end <- which(c(in.segment, FALSE) == TRUE & lead(c(in.segment, FALSE) == FALSE))
data.frame(start, end)
}
system.time({ originalFunction(test.vec, threshold); })
## user system elapsed
## 66.539 1.232 67.770
system.time({ SimonG(test.vec, threshold); })
## user system elapsed
## 0.028 0.008 0.036
system.time({ Khashaa(test.vec, threshold); })
## user system elapsed
## 0.008 0.000 0.008
system.time({ bgoldst(test.vec, threshold); })
## user system elapsed
## 0.065 0.000 0.065
system.time({ ClausWilke(test.vec, threshold); })
## user system elapsed
## 0.274 0.012 0.285
Here's another option, mostly using which. The start and end points are determined by finding the non-consecutive elements of the hit sequence.
test.vec <- rnorm(100, 8, 10)
threshold <- 0
findSegments <- function(x, threshold){
hit <- which(x > threshold)
n <- length(hit)
ind <- which(hit[-1] - hit[-n] > 1)
starts <- c(hit[1], hit[ ind+1 ])
ends <- c(hit[ ind ], hit[n])
cbind(starts,ends)
}
findSegments(test.vec, threshold=0)
This gives something like:
> findSegments(test.vec, threshold=0)
starts ends
[1,] 1 3
[2,] 5 7
[3,] 9 11
[4,] 13 28
[5,] 30 30
[6,] 32 32
[7,] 34 36
[8,] 38 39
[9,] 41 41
[10,] 43 43
[11,] 46 51
[12,] 54 54
[13,] 56 61
[14,] 63 67
[15,] 69 72
[16,] 76 77
[17,] 80 81
[18,] 83 84
[19,] 86 88
[20,] 90 92
[21,] 94 95
[22,] 97 97
[23,] 100 100
Compare that to the original sequence:
> round(test.vec,1)
[1] 20.7 15.7 4.3 -15.1 24.6 9.4 23.2 -4.5 16.9 20.9 13.2 -1.2
[13] 22.6 7.7 6.0 6.6 4.1 21.3 5.3 16.7 11.4 16.7 19.6 16.7
[25] 11.6 7.3 3.7 8.4 -4.5 11.7 -7.1 8.4 -18.5 12.8 22.5 11.0
[37] -3.3 11.1 6.9 -7.9 22.9 -3.7 3.5 -7.1 -5.9 3.5 13.2 20.0
[49] 13.2 23.4 15.9 -5.0 -6.3 10.0 -6.2 4.7 2.1 26.4 5.9 27.3
[61] 14.3 -12.4 28.4 30.9 18.2 11.4 5.7 -4.5 6.2 12.0 10.9 11.1
[73] -2.0 -9.0 -1.4 15.4 19.1 -1.6 -5.4 5.4 7.8 -5.6 15.2 13.8
[85] -18.8 7.1 17.1 9.3 -3.9 22.6 1.7 28.9 -21.3 21.2 8.2 -15.4
[97] 3.2 -10.2 -6.2 14.1
I like for loops because translating them to Rcpp is straightforward.
Rcpp::cppFunction('DataFrame findSegment(NumericVector x, double threshold) {
x.push_back(threshold - 1); // sentinel below the threshold so a trailing segment is closed
int n = x.size(), startind, endind;
std::vector<int> startinds, endinds;
bool insegment = false;
for(int i=0; i<n; i++){
if(!insegment){
if(x[i] > threshold){
startind = i + 1;
insegment = true; }
}else{
if(x[i] <= threshold){ // a value equal to the threshold also ends the segment
endind = i;
insegment = false;
startinds.push_back(startind);
endinds.push_back(endind);
}
}
}
return DataFrame::create(_["start"]= startinds, _["end"]= endinds);
}')
set.seed(1); test.vec <- rnorm(1e7,8,10); threshold <- 0;
system.time(findSegment(test.vec, threshold))
# user system elapsed
# 0.045 0.000 0.045
# SimonG's solution
system.time(findSegments(test.vec, threshold))
# user system elapsed
# 0.533 0.012 0.548
with(rle(test.vec>threshold),t(matrix(c(0L,rep(cumsum(lengths),2L)[-length(lengths)]),2L,byrow=T)+1:0)[values,]);
## [,1] [,2]
## [1,] 1 8
## [2,] 10 13
## [3,] 16 17
## [4,] 20 26
## [5,] 28 28
## [6,] 30 34
## [7,] 36 38
## [8,] 41 46
## [9,] 48 49
## [10,] 51 53
## [11,] 55 81
## [12,] 84 90
## [13,] 92 100
Explanation
test.vec>threshold
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Compute which elements in the input vector are above the threshold using vectorized comparison.
rle(...)
## Run Length Encoding
## lengths: int [1:25] 8 1 4 2 2 2 7 1 1 1 ...
## values : logi [1:25] TRUE FALSE TRUE FALSE TRUE FALSE ...
Compute the run-length encoding of the logical vector. This returns a list classed as 'rle' which contains two named components: lengths, containing the lengths of each run-length, and values, containing the value that ran that length, which in this case will be TRUE or FALSE, with the former representing a segment of interest, and the latter representing a non-segment run length.
with(...,...)
The first argument is the run-length encoding as described above. This will evaluate the second argument in a virtual environment consisting of the 'rle'-classed list, thus making the lengths and values components accessible as lexical variables.
Below I dive into the contents of the second argument.
cumsum(lengths)
## [1] 8 9 13 15 17 19 26 27 28 29 34 35 38 40 46 47 49 50 53 54 81 83 90 91 100
Compute the cumulative sum of the lengths. This will form the basis for computing both the start indexes and end indexes of each run-length. Critical point: Each element of the cumsum represents the end index of that run-length.
rep(...,2L)
## [1] 8 9 13 15 17 19 26 27 28 29 34 35 38 40 46 47 49 50 53 54 81 83 90 91 100 8 9 13 15 17 19 26 27 28 29 34 35 38 40 46 47 49 50 53 54 81 83 90 91 100
Duplicate the cumulative sum. The first repetition will serve as the basis for the start indexes, the second the end. I will henceforth refer to these repetitions as the "start-index repetition" and the "end-index repetition".
c(0L,...[-length(lengths)])
## [1] 0 8 9 13 15 17 19 26 27 28 29 34 35 38 40 46 47 49 50 53 54 81 83 90 91 8 9 13 15 17 19 26 27 28 29 34 35 38 40 46 47 49 50 53 54 81 83 90 91 100
This removes the last element at the end of the start-index repetition, and prepends a zero to the beginning of it. This effectively lags the start-index repetition by one element. This is necessary because we need to compute each start index by adding one to the previous run-length's end index, taking zero as the end index of the non-existent run-length prior to the first.
matrix(...,2L,byrow=T)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## [1,] 0 8 9 13 15 17 19 26 27 28 29 34 35 38 40 46 47 49 50 53 54 81 83 90 91
## [2,] 8 9 13 15 17 19 26 27 28 29 34 35 38 40 46 47 49 50 53 54 81 83 90 91 100
This builds a two-row matrix out of the previous result. The lagged start-index repetition is the top row, the end-index repetition is the bottom row.
...+1:0
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## [1,] 1 9 10 14 16 18 20 27 28 29 30 35 36 39 41 47 48 50 51 54 55 82 84 91 92
## [2,] 8 9 13 15 17 19 26 27 28 29 34 35 38 40 46 47 49 50 53 54 81 83 90 91 100
R recycles this two-element addend in column-major order (down each column, then on to the next), so 1 is added to every element of the top row and 0 to every element of the bottom row. This completes the computation of the start indexes.
t(...)
## [,1] [,2]
## [1,] 1 8
## [2,] 9 9
## [3,] 10 13
## [4,] 14 15
## [5,] 16 17
## [6,] 18 19
## [7,] 20 26
## [8,] 27 27
## [9,] 28 28
## [10,] 29 29
## [11,] 30 34
## [12,] 35 35
## [13,] 36 38
## [14,] 39 40
## [15,] 41 46
## [16,] 47 47
## [17,] 48 49
## [18,] 50 50
## [19,] 51 53
## [20,] 54 54
## [21,] 55 81
## [22,] 82 83
## [23,] 84 90
## [24,] 91 91
## [25,] 92 100
Transpose to a two-column matrix. This is not entirely necessary, if you're ok with getting the result as a two-row matrix.
...[values,]
## [,1] [,2]
## [1,] 1 8
## [2,] 10 13
## [3,] 16 17
## [4,] 20 26
## [5,] 28 28
## [6,] 30 34
## [7,] 36 38
## [8,] 41 46
## [9,] 48 49
## [10,] 51 53
## [11,] 55 81
## [12,] 84 90
## [13,] 92 100
Subset just the segments of interest. Since values is a logical vector representing which run-lengths surpassed the threshold, we can use it directly as a row index vector.
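The steps above can also be written out less compactly. The following is a sketch of the same rle/cumsum idea rather than the exact one-liner. It uses a short made-up vector x, and the names r, starts, ends and segs are illustrative:

```r
x <- c(1, 2, -1, 3, -2, -3, 4, 5, 6)
threshold <- 0

r      <- rle(x > threshold)
ends   <- cumsum(r$lengths)        # last index of every run
starts <- ends - r$lengths + 1     # first index of every run

# Keep only the runs that are above the threshold
segs <- cbind(starts, ends)[r$values, , drop = FALSE]
segs
```

Here the start of each run is derived from its own end and length instead of from the lagged cumulative sum, which avoids the rep() and matrix() steps at the cost of two extra named vectors.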
Performance
I guess I'm screwing myself here, but SimonG's solution performs about twice as well as mine:
bgoldst <- function() with(rle(test.vec>threshold),t(matrix(c(0L,rep(cumsum(lengths),2L)[-length(lengths)]),2L,byrow=T)+1:0)[values,]);
simong <- function() findSegments(test.vec,threshold);
set.seed(1); test.vec <- rnorm(1e7,8,10); threshold <- 0;
identical(bgoldst(),unname(simong()));
## [1] TRUE
system.time({ bgoldst(); })
## user system elapsed
## 1.344 0.204 1.551
system.time({ simong(); })
## user system elapsed
## 0.656 0.109 0.762
+1 from me...
Here is another solution that I think is simpler. Note that you have to use set.seed(10), not set.seed <- 10, to set the seed of the random number generator.
require(dplyr) # for lead() and lag()
set.seed(10)
test.vec <- rnorm(100, 8, 10)
threshold <- 0
in.segment <- (test.vec > threshold)
start <- which(c(FALSE, in.segment) == TRUE & lag(c(FALSE, in.segment) == FALSE)) - 1
end <- which(c(in.segment, FALSE) == TRUE & lead(c(in.segment, FALSE) == FALSE))
segments <- data.frame(start, end)
head(segments)
## start end
## 1 1 2
## 2 4 6
## 3 8 8
## 4 10 16
## 5 18 21
## 6 23 23
In general, in R, if you find yourself writing complicated loops and if statements you're probably doing it wrong. Most problems can be solved in a vectorized form.
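As one more vectorized variant that needs only base R, the segment boundaries can be read off the differences of the padded logical vector. This is a sketch under the assumption that x contains no NA values, and the function name find_segments is hypothetical:

```r
find_segments <- function(x, threshold) {
  above <- x > threshold
  # Pad with FALSE on both sides so segments touching either end are closed
  d <- diff(c(FALSE, above, FALSE))
  data.frame(start = which(d == 1),        # FALSE -> TRUE transitions
             end   = which(d == -1) - 1)   # TRUE -> FALSE transitions
}

find_segments(c(1, 2, -1, 3, -2, -3, 4, 5, 6), threshold = 0)
```

Because of the padding, a vector with no values above the threshold gives a data frame with zero rows rather than NA entries.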

Calculating ratios and put them into a matrix in R

I have a table similar to this one, and want to calculate the ratio between column A and B. For example:
A B C D E F
[1,] 187 174 183 115 101 104
[2,] 451 166 177 842 101 133
[3,] 727 171 187 12803 98 134
[4,] 1532 181 196 730 98 108
[5,] 4139 188 214 20358 105 159
[6,] 689 185 211 1633 110 162
[7,] 1625 184 195 2283 109 114
[8,] 771 181 190 904 105 110
[9,] 950 177 190 1033 106 112
[10,] 703 180 191 463 106 110
[11,] 2052 178 188 2585 100 105
[12,] 1161 178 187 2874 99 110
[13,] 214 175 184 173 98 110
[14,] 473 184 191 971 104 111
[15,] 756 185 193 14743 107 114
I want to create a new matrix that has all of those previous rows as new rows and columns (15 rows and 15 columns) like so (values in parentheses are placeholders for the calculated ratios):
[,1] [,2] [,3] [,4]
[1,] (A1:B1) (A1:B2) (A1:B3) (A1:B4) ...
[2,]
[3,]
[4,]
...
That is maybe not the best example, but I hope it is not too confusing.
To calculate the ratios A1:B1, A2:B2, A3:B3 I could do something like:
data.matrix(data["A"]/data["B"])
And to do it for all, I would do something like:
data.matrix(data[1,]/data[1,1])
data.matrix(data[1,]/data[1,2])
...
and so on.
This seems to be a lot of work and maybe someone knows a quicker and more efficient method.
EDIT
I thought the combn function would work, but then I figured out it doesn't. When I have a 2 column matrix, such as:
A B
[1,] 187 115
[2,] 451 842
[3,] 727 12803
[4,] 1532 730
[5,] 4139 20358
[6,] 689 1633
[7,] 1625 2283
[8,] 771 904
[9,] 950 1033
[10,] 703 463
[11,] 2052 2585
[12,] 1161 2874
[13,] 214 173
[14,] 473 971
[15,] 756 14743
And I use the combn function to calculate all possible ratios (A1:B1, A1:B2, ... A2:B1, A2:B2...) I get just the result for A1 vs all values of B.
> combn(ncol(data), 2, function(x) data[,x[1]]/data[,x[2]])
[,1]
[1,] 1.62608696
[2,] 0.53562945
[3,] 0.05678357
[4,] 2.09863014
[5,] 0.20331074
[6,] 0.42192284
[7,] 0.71178274
[8,] 0.85287611
[9,] 0.91965150
[10,] 1.51835853
[11,] 0.79381044
[12,] 0.40396660
[13,] 1.23699422
[14,] 0.48712667
[15,] 0.05127857
Or maybe I just don't understand the combn function and I am doing something wrong here.
You can achieve what you want by using the expand.grid, apply and matrix functions as below.
I am assuming that what you want is a matrix like
A1/B1 A1/B2 A1/B3 ...
A2/B1 A2/B2 A2/B3 ...
... ... ... ...
... ... ... ...
Here is the code to do that. Explanation in comments
txt <- "A B C D E F\n187 174 183 115 101 104\n451 166 177 842 101 133\n727 171 187 12803 98 134\n1532 181 196 730 98 108\n4139 188 214 20358 105 159\n689 185 211 1633 110 162\n1625 184 195 2283 109 114\n771 181 190 904 105 110\n950 177 190 1033 106 112\n703 180 191 463 106 110\n2052 178 188 2585 100 105\n1161 178 187 2874 99 110\n214 175 184 173 98 110\n473 184 191 971 104 111\n756 185 193 14743 107 114"
data <- as.matrix(read.table(textConnection(txt), header = TRUE))
# expand.grid : creates every combination of one element from column A and one
# from column B, with A varying fastest
# apply : calls function(x) { x[1]/x[2] } for every combination produced by
# expand.grid
# matrix : converts the result of apply into matrix. dimnames arguments sets
# rownames and colnames for easy verification for us
result <- matrix(apply(expand.grid(data[, "A"], data[, "B"]), 1, function(x) x[1]/x[2]),
nrow = nrow(data), dimnames = list(data[, "A"], data[, "B"]))
# note that we have set rownames for result to be values of A and colnames for
# result to be value of B
result
## 174 166 171 181 188 185 184
## 187 1.074713 1.126506 1.093567 1.033149 0.9946809 1.010811 1.016304
## 451 2.591954 2.716867 2.637427 2.491713 2.3989362 2.437838 2.451087
## 727 4.178161 4.379518 4.251462 4.016575 3.8670213 3.929730 3.951087
## 1532 8.804598 9.228916 8.959064 8.464088 8.1489362 8.281081 8.326087
## 4139 23.787356 24.933735 24.204678 22.867403 22.0159574 22.372973 22.494565
## 689 3.959770 4.150602 4.029240 3.806630 3.6648936 3.724324 3.744565
## 1625 9.339080 9.789157 9.502924 8.977901 8.6436170 8.783784 8.831522
## 771 4.431034 4.644578 4.508772 4.259669 4.1010638 4.167568 4.190217
## 950 5.459770 5.722892 5.555556 5.248619 5.0531915 5.135135 5.163043
## 703 4.040230 4.234940 4.111111 3.883978 3.7393617 3.800000 3.820652
## 2052 11.793103 12.361446 12.000000 11.337017 10.9148936 11.091892 11.152174
## 1161 6.672414 6.993976 6.789474 6.414365 6.1755319 6.275676 6.309783
## 214 1.229885 1.289157 1.251462 1.182320 1.1382979 1.156757 1.163043
## 473 2.718391 2.849398 2.766082 2.613260 2.5159574 2.556757 2.570652
## 756 4.344828 4.554217 4.421053 4.176796 4.0212766 4.086486 4.108696
## 181 177 180 178 178 175 184
## 187 1.033149 1.056497 1.038889 1.050562 1.050562 1.068571 1.016304
## 451 2.491713 2.548023 2.505556 2.533708 2.533708 2.577143 2.451087
## 727 4.016575 4.107345 4.038889 4.084270 4.084270 4.154286 3.951087
## 1532 8.464088 8.655367 8.511111 8.606742 8.606742 8.754286 8.326087
## 4139 22.867403 23.384181 22.994444 23.252809 23.252809 23.651429 22.494565
## 689 3.806630 3.892655 3.827778 3.870787 3.870787 3.937143 3.744565
## 1625 8.977901 9.180791 9.027778 9.129213 9.129213 9.285714 8.831522
## 771 4.259669 4.355932 4.283333 4.331461 4.331461 4.405714 4.190217
## 950 5.248619 5.367232 5.277778 5.337079 5.337079 5.428571 5.163043
## 703 3.883978 3.971751 3.905556 3.949438 3.949438 4.017143 3.820652
## 2052 11.337017 11.593220 11.400000 11.528090 11.528090 11.725714 11.152174
## 1161 6.414365 6.559322 6.450000 6.522472 6.522472 6.634286 6.309783
## 214 1.182320 1.209040 1.188889 1.202247 1.202247 1.222857 1.163043
## 473 2.613260 2.672316 2.627778 2.657303 2.657303 2.702857 2.570652
## 756 4.176796 4.271186 4.200000 4.247191 4.247191 4.320000 4.108696
## 185
## 187 1.010811
## 451 2.437838
## 727 3.929730
## 1532 8.281081
## 4139 22.372973
## 689 3.724324
## 1625 8.783784
## 771 4.167568
## 950 5.135135
## 703 3.800000
## 2052 11.091892
## 1161 6.275676
## 214 1.156757
## 473 2.556757
## 756 4.086486
Edit: I seem to have misunderstood the question. The answer is even simpler using outer:
# mm is the data matrix; this gives the same 15*15 matrix as geektrader's
outer(mm[,1], mm[,2], '/')
Old answer (not correct):
You should use combn:
# combn(ncol(mm), 2) gives you all possible combinations
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# [1,] 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5
# [2,] 2 3 4 5 6 3 4 5 6 4 5 6 5 6 6
# it also accepts a function argument. we can use it to divide
# respective columns
mm.div <- combn(ncol(mm), 2, function(x) mm[,x[1]]/mm[,x[2]])
# set the column names of the matrix
colnames(mm.div) <- combn(colnames(mm), 2, paste, collapse="")
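For reference, a small self-contained sketch of the outer() approach from the edit above, using only the first three rows of columns A and B from the question. The dimnames are added purely to make the A-versus-B layout visible:

```r
# First three rows of columns A and B from the question
A <- c(187, 451, 727)
B <- c(174, 166, 171)

# ratios[i, j] is A[i] / B[j]
ratios <- outer(A, B, "/")
dimnames(ratios) <- list(paste0("A", seq_along(A)),
                         paste0("B", seq_along(B)))
round(ratios, 3)
```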
I might be completely missing the point here, but why not just use a couple of for loops? I wrote a quick function that you could pass the pairs to.
For example:
A <- rnorm(15)
B <- rnorm(15)
data <- data.frame(A,B)
ratio <- function(input1, input2){
  # One row per element of input1, one column per element of input2
  out <- matrix(0, nrow=length(input1), ncol=length(input2))
  for (i in seq_along(input1)){
    for (j in seq_along(input2)){
      out[i, j] <- input1[i] / input2[j]
    }
  }
  return(out)
}
ratio(data$A, data$B)
EDIT
Another thought: to use the function on all possible pairs of columns, you could simply add another for loop, like this:
# each column of combs is one pair of column indices
combs <- combn(ncol(data), 2)
out <- list()
for (i in seq_len(ncol(combs))){
  out[[i]] <- ratio(data[,combs[1,i]], data[,combs[2,i]])
}
Hope that helps!
