split data matrix - r

I have a data matrix with 100,000 rows of values corresponding to methylation values across several cell types. I would like to visually display the changes in methylation in a clustered heatmap. To get the data down to a more manageable size I was thinking of creating a new data matrix from every 10th or so row. Is there any simple way to do this?

Use seq() with a suitable combination of arguments. E.g.:
m1 <- matrix(runif(100000*10), ncol = 10)
m2 <- m1[seq(from = 1, to = nrow(m1), by = 10), ]
> dim(m2)
[1] 10000 10
How does this work? Look at what this does:
> sq <- seq(from = 1, to = nrow(m1), by = 10)
> head(sq)
[1] 1 11 21 31 41 51
> tail(sq)
[1] 99941 99951 99961 99971 99981 99991
> nrow(m1)
[1] 100000
We specify to go from the first row to the last, incrementing by 10 each step. This gives us rows 1, 11, 21, etc. When we get to the end of the sequence, even though we specified nrow(m1) (which is 100000), the last element in our sequence is 99991. This is because 99991 + 10 would take us beyond the to argument limit (beyond 100000), and hence it is not included in the sequence.
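If the aim is the clustered heatmap mentioned in the question, the thinned matrix can be passed straight to heatmap(); here is a minimal sketch, where the extra round of thinning and the invented column names are assumptions for illustration only.
m3 <- m2[seq(1, nrow(m2), by = 10), ]          # thin again: 10,000 -> 1,000 rows, assumed just to keep clustering fast
colnames(m3) <- paste0("celltype_", seq_len(ncol(m3)))  # made-up column names
heatmap(m3, scale = "none", labRow = NA)       # clustered heatmap, row labels suppressed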

Try the following, which takes your large matrix m and generates a list of smaller matrices. It builds a sequence of start indices, one every chunk.length rows, and then collects the chunks.
# m is your large matrix; chunk.length is the desired number of rows per chunk
list.of.matrices <- lapply(X = seq.int(1, nrow(m), by = chunk.length),
                           FUN = function(k) {
                             # clamp the last chunk so it never runs past the end of m
                             m[seq(k, min(k + chunk.length - 1, nrow(m))), , drop = FALSE]
                           })
However, if you have 100,000 rows it will be wasteful of RAM to save all of these chunks separately. Perhaps you can just do the required computation on each subset and save only the results. Just a suggestion.
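For instance, a minimal sketch of that idea, where the chunk.length value and the choice of colMeans as the summary are assumptions to be replaced by whatever statistic you actually need:
chunk.length <- 10                                 # assumed chunk size
chunk.starts <- seq.int(1, nrow(m), by = chunk.length)
chunk.means <- t(sapply(chunk.starts, function(k) {
  rows <- seq(k, min(k + chunk.length - 1, nrow(m)))
  colMeans(m[rows, , drop = FALSE])                # keep only the summary, not the chunk
}))
dim(chunk.means)                                   # one row of column means per chunk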

Related

R – Populate a vector via rep with an increasing exponential value

I have one vector like this:
years <- c(2021:2091)
And I want to create another vector to bind to it, based on an initial value that increases compound-style for every row according to an arbitrary rate (such as 10%, 15%, 20%):
number = x
rep(x*(1 + .10)^n, length(years))
How do I match the length of years for the second vector while increasing the exponent every time? Say there are 71 rows in years; I need n to start at 1 and run through 71.
I have tried:
rep(x*(1 + .10)^(1:71), length(years))
But this does it 71*71 times. I just need one value for each exponent!
Hopefully this makes sense, thanks in advance!
Here is how you could do it with a function:
future_value <- function(years, x = 1, interest = 0.1) {
  x * (1 + interest) ^ seq_along(years)
}
Example outputs:
> future_value(2021:2025)
[1] 1.10000 1.21000 1.33100 1.46410 1.61051
> future_value(2021:2025, x = 2, interest = 0.15)
[1] 2.300000 2.645000 3.041750 3.498012 4.022714
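To bind the result to years as a two-column table, as the question describes, one sketch (with illustrative column names) is:
years <- 2021:2091
out <- data.frame(year = years,
                  value = future_value(years, x = 1, interest = 0.10))
head(out)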

Unique combinations of vector elements that fulfill criteria

I have a vector of integers, e.g., totalVector <- c(4,2,1), and two variables totalResult and totalNumber. What I want to do is the following:
I want to find all UNIQUE combinations of "totalNumber" elements from totalVector that add up to "totalResult". To clarify, if totalResult = 100 and totalNumber = 50, I want all combinations of 50 elements from totalVector that have a sum of 100 (repetitions are obviously allowed, but results that are merely rearrangements of the same values should only be counted once).
I originally did this by expanding the total vector (repeating each element 50 times), getting all combinations of 50 elements with combn() and then filtering their sums. For large values however, this proved very inefficient, and failed due to the sheer amount of data. Is there a quicker and less data-heavy way to do this?
I think the OP is looking for the combinations with repetition of a vector that sum to a particular number. This will do it:
totalVector <- c(4,2,1)
totalNumber <- 50
totalResult <- 100
library(RcppAlgos)
myAns <- comboGeneral(totalVector, totalNumber, repetition = TRUE,
constraintFun = "sum", comparisonFun = "==",
limitConstraints = totalResult)
dim(myAns)
[1] 17 50
all(apply(myAns, 1, sum) == totalResult)
[1] TRUE
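Each row of myAns is one qualifying combination; tabulating how many 4s, 2s and 1s each solution uses makes the 17 results easier to read (purely an inspection step, not part of the answer):
t(apply(myAns, 1, function(r) table(factor(r, levels = totalVector))))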
Disclaimer: I am the author of RcppAlgos
This would give you what you need for a small sample, but you will encounter issues with combinatorial explosion very quickly as you increase the size of the problem.
tv <- sample(1:10, 10, replace = TRUE)
tn <- 5
tr <- 20
combinations <- combn(tv, tn)
equals.tr <- apply(combinations, MARGIN = 2, FUN = function(x) sum(x) == tr)
combinations[, equals.tr]
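One caveat, given the question's requirement to count rearrangements only once: combn() treats equal values at different positions of tv as distinct, so the filtered columns can still contain rearranged duplicates. A small optional clean-up step (not part of the answer above):
hits <- combinations[, equals.tr, drop = FALSE]
if (ncol(hits) > 0) unique(t(apply(hits, 2, sort)))  # one row per unique set of values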

R loop does not stop

I have a problem that sounds easy, but I really cannot find the mistake. I have 3377 data points (measurements of body temperature). The sampling rate is 5min and I would like to put the data into a matrix. However, R starts recycling once it has put all 3377 data points into the matrix. To prevent r from doing this, I wrote a loop and I want the loop to stop when the end of the vector is reached.
Ankle.r <- 1:3377 # Example data
a = 288 # sampling rate = 5min -> 288 measurement points per day
c = 11 # 11 full days of sampling (and a few more points, which is why the matrix needs 12 rows)
Ankle.r2 <- matrix(NA, ncol = a, nrow = c+1) # matrix with NAs for 12 days with 288 cols each (=3456 cells)
x <- length (Ankle.r) # total number of data points, is 3377
for (f in 1:(c+1)){ # for each row
for (p in 1:a){ # for each column (i.e. cell)
st_op <- (((f-1)*p)+p) # STOP criterion, gives the number of cells that have already been filled
if (st_op<x){ # only perform operation if the number of cells filled is < the number of data points (i.e. 3377)
Ankle.r2[f,p] <- Ankle.r[(((f-1)*p)+p)]
} else {stop
}
}
}
However, the loop does not stop...it loops till the last cell in my matrix. According to my calculations, the last 79 cells should remain free (i.e. NA, because 3456 cells - 3377 = 79), but that is only true for the last 8 or so...
Any hints where the mistake is?
Thanks!
I think this does what you would like to do:
Ankle.r <- 1:3377 # Example data
a = 288 # sampling rate = 5min -> 288 measurement points per day
c = 11
length(Ankle.r) <- a * (c + 1) #pad input vector with NA values
m <- matrix(Ankle.r, ncol = a, byrow = TRUE)
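A quick sanity check on the padded matrix (the numbers follow from the example: 12 * 288 = 3456 cells holding 3377 values):
dim(m)        # 12 288
sum(is.na(m)) # 79 padded cells, matching 3456 - 3377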
OK, working through an example showed me where my mistake was (sigh). The loop must be:
Ankle.r2 <- matrix(NA, ncol = a, nrow = c+1) # matrix with NAs for 12 days with 288 cols each (=3456 cells)
x <- length (Ankle.r) # total number of data points, is 3377
for (f in 1:(c+1)){ # for each row
for (p in 1:a){ # for each column (i.e. cell)
st_op <- (((f-1)*a)+p) # STOP criterion: the linear index of the current cell
if (st_op<=x){ # only fill the cell if its index is <= the number of data points (i.e. 3377)
Ankle.r2[f,p] <- Ankle.r[(((f-1)*a)+p)]
} else {break # past the last data point, so stop filling this row
}
}
}
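A quick check that the corrected loop leaves exactly the expected cells empty:
sum(!is.na(Ankle.r2)) # 3377 cells filled with data
sum(is.na(Ankle.r2))  # 79 cells left as NA, as expected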
Thanks anyway!

Add replicate column to end of matrix

I have a matrix of one column and 6 rows. I would like to replicate that column i times but change one value randomly each time, and after each iteration, calculate the mean and variance across all columns.
For example:
values = rnorm(6, 6, 1); matrix1 = matrix(values, 6)
After i=1, would look like:
values2 = values
idx = sample(1:6, 1)
values2[idx] = values2[idx] + runif(1, 0, 1)
matrix2 = matrix(c(values, values2), 6)
At the end, I would like to output a data frame that looks like so:
i mean var
1 1.23 2.31
2 1.24 2.33 etc...
For many i's. I imagine there is a way to do this with loops, but my skills are not such that I can figure it out. Thanks for all your help!
If you know how many times you're doing this, it would be best to construct your final matrix beforehand, especially if i is large. However, without that:
jitter.func <- function(x, vec) {
cell <- sample(1:length(vec), 1)
vec[cell] <- vec[cell] + runif(1, 0, 1)
return(c(mean=mean(vec), var=var(vec)))
}
i <- 10
sapply(1:i, jitter.func, vec=values)
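sapply() returns a 2 x i matrix here; to get the i / mean / var data frame sketched in the question, transpose it and bind on an index column:
res <- t(sapply(1:i, jitter.func, vec = values))
data.frame(i = 1:i, mean = res[, "mean"], var = res[, "var"])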
A vectorised alternative builds all of the jittered columns at once:
j <- 20 # number of columns
i <- 6  # number of rows
vec <- matrix(rnorm(i, 6, 1), ncol = j, nrow = i)  # a length-i vector recycled across the j columns
idx <- sample(seq(i), j, replace = TRUE)           # one random row per column
vec[cbind(idx, seq(j))] <- vec[cbind(idx, seq(j))] + runif(j)  # add a random number to that row in each column
apply(vec, 2, plyr::each(mean, var))               # summary statistics per column
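If you would rather avoid the plyr dependency, the last line can be written with a plain anonymous function instead:
apply(vec, 2, function(col) c(mean = mean(col), var = var(col)))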

Constrained randomization of column order in a data.frame

I am trying to take each column of a data frame and move it to a randomly chosen position within 1-3 columns of its original location, doing this for every column in the data frame. I want each column to move AT LEAST one place to the left or right. Of course sample(dat) reorders columns randomly, but my attempts to put it in a loop are embarrassingly bad (I admit I skipped the majority of my linear algebra classes, damn...). Below is some example data:
dat <- read.table(textConnection(
"-515.5718 94.33423 939.6324 -502.9918 -75.14629 946.6926
-515.2283 96.10239 939.5687 -503.1425 -73.39015 946.6360
-515.0044 97.68119 939.4177 -503.4021 -71.79252 946.6909
-514.7430 99.59141 939.3976 -503.6645 -70.08514 946.6887
-514.4449 101.08511 939.2342 -503.9207 -68.48133 946.7183
-514.2769 102.29453 939.0013 -504.2665 -67.04509 946.7809
-513.9294 104.02753 938.9436 -504.4703 -65.34361 946.7899
-513.5900 105.49624 938.7684 -504.7405 -63.75965 946.7991"
),header=F,as.is=T)
sample(dat) # random column positions
How about this brute-force but plenty-fast solution?
It tries out different permutations of the columns until it finds one in which each column is moved at least 1, and not more than 3 columns to left or right. When it finds such a permutation, the test in the final line of the while() call evaluates to FALSE, terminating the loop and leaving the variable x containing the acceptable permutation.
n <- ncol(dat)
while({x <- sample(n)   # Proposed new column positions
       y <- seq_len(n)  # Original column positions
       max(abs(x - y)) > 3 | min(abs(x - y)) == 0
}) NULL
dat[x]
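You can confirm that the accepted permutation really moves every column by one to three places by re-applying the test outside the loop:
d <- abs(x - seq_len(n)) # displacement of each column
range(d)                 # both values should lie within 1..3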
I should probably wait to post this until I have time to comment it up, and discuss some of the ambiguities in the problem as currently specified in the comments above. But since I won't be able to do that, possibly for a while, I thought I'd give you code for a solution that you can examine yourself.
# Create a function that generates acceptable permutations of the data
getPermutation <- function(blockSize, # number of columns/block
nBlock, # number of blocks of data
fromBlocks) { # indices of blocks to be moved
X <- unique(as.vector(outer(fromBlocks, c(-2,-1,1,2), "+")))
# To remove nonsensical indices like 0 or -1
X <- X[X %in% seq.int(nBlock)]
while({toBlocks <- sample(X, size = length(fromBlocks))
max(abs(toBlocks - fromBlocks)) > 2 | min(abs(toBlocks - fromBlocks)) < 1
}) NULL
A <- seq.int(nBlock)
A[toBlocks] <- fromBlocks
A[fromBlocks] <- toBlocks
blockColIndices <-
lapply(seq.int(nBlock) - 1,
function(X) {
seq(from = X * blockSize + 1,
by = 1,
length.out = blockSize)
})
unlist(blockColIndices[A])
}
# Create an example dataset, a 90 column data.frame
dat <- as.data.frame(matrix(seq.int(90*4), ncol=90))
# Call the function for a data frame with 30 3-column blocks
# within which you want to move blocks 2, 14, and 15.
index <- getPermutation(3, 30, c(2, 14, 15))
newdat <- dat[index]
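To see which column positions ended up holding a different original column, compare the permutation against the identity ordering (just an inspection step):
which(index != seq_along(index))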
