I have a list of data, which I wish to turn into a matrix. I know the exact size my matrix needs to be, but the data does not completely fill it.
For example, I would like to turn a vector of length 95 into a 25 x 4 matrix. The matrix command does not work as intended because the number of values does not fit the matrix, so I need a way to pad the data with NAs and fill the matrix by row.
The size of the matrix is known in each scenario but varies from one data set to the next, so ideally there would be a function that automatically pads the matrix with NAs where data is not available.
Example code:
example=c(20.28671, 20.28544, 20.28416, 20.28288, 20.28161, 20.28033, 20.27906, 20.27778, 20.27651, 20.27523, 20.27396, 20.27268, 20.27141,
20.27013, 20.26885, 20.26758, 20.26533, 20.26308, 20.26083, 20.25857, 20.25632, 20.25407, 20.25182, 20.24957, 20.24732, 20.24507,
20.24282, 20.24057, 20.23832, 20.23606, 20.23381, 20.22787, 20.22193, 20.21598, 20.21004, 20.20410, 20.19816, 20.19221, 20.18627,
20.18033, 20.17438, 20.16844, 20.16250, 20.15656, 20.15061, 20.14467, 20.13527, 20.12587, 20.11646, 20.10706, 20.09766, 20.08826,
20.07886, 20.06946, 20.06005, 20.05065, 20.04125, 20.03185, 20.02245, 20.01305, 20.00364, 20.00369, 20.00374, 20.00378, 20.00383,
20.00388, 20.00392, 20.00397, 20.00401, 20.00406, 20.00411, 20.00415, 20.00420, 20.00425, 20.00429, 20.00434, 20.01107, 20.01779,
20.02452, 20.03125, 20.03798, 20.04470, 20.05143, 20.05816, 20.06489, 20.07161, 20.07834, 20.08507, 20.09180, 20.09853, 20.10525,
20.11359, 20.12193, 20.13026, 20.13860)
mat=matrix(example,ncol=4,nrow=25)
Warning message:
In matrix(example, ncol = 4, nrow = 25) :
data length [95] is not a sub-multiple or multiple of the number of rows [25]
Whilst I'm sure this is not the best answer it does achieve what you want:
If you subset a vector with [ using indices that are beyond its length, the result is padded with NA:
mat = matrix(example[1:100],nrow = 25, byrow = TRUE, ncol = 4)
This feels as though it is a bit messy though. Perhaps one of the others is better R code.
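Since the question asks for a reusable function, the same indexing trick can be wrapped up. This is only a sketch: pad_to_matrix is a hypothetical helper name, not a base R function, and it assumes the data never exceeds nrow * ncol values.

```r
# Hypothetical helper (not part of base R): pad x with NA and fill by row.
pad_to_matrix <- function(x, nrow, ncol) {
  # Assumption: the data must fit; longer input is treated as an error.
  stopifnot(length(x) <= nrow * ncol)
  # Indexing past the end of x yields NA for the missing positions.
  matrix(x[seq_len(nrow * ncol)], nrow = nrow, ncol = ncol, byrow = TRUE)
}

x <- c(1.5, 2.5, 3.5, 4.5, 5.5)
m <- pad_to_matrix(x, nrow = 2, ncol = 3)
print(m)
```

For the data in the question the call would be pad_to_matrix(example, nrow = 25, ncol = 4).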
You can try this:
mat <- matrix(NA, nrow = 4, ncol = 25)   # transposed shape, because R fills column by column
mat[seq_along(example)] <- example
mat <- t(mat)                            # 25 x 4, filled by row
We can use length<- to pad the vector with NAs up to the desired length when it falls short, and then call matrix.
nC <- 4
nR <- 25
matrix(`length<-`(example, nC * nR), nR, nC, byrow = TRUE)
The length<- option can also be used in several other cases, e.g. for a list of vectors whose lengths are not equal. In that case, we pad with NAs when we need to convert the list to a data.frame or matrix.
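A minimal sketch of that list case, using a made-up list vecs (the names and values are purely illustrative):

```r
# Illustrative data: three vectors of unequal length.
vecs <- list(a = 1:3, b = 1:5, c = 1:2)

# Pad every vector to the longest length, then bind them as columns.
n_max <- max(lengths(vecs))
padded <- sapply(vecs, `length<-`, n_max)
print(padded)
```

Each column of padded is one of the original vectors followed by as many NAs as it takes to reach n_max.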
I have a pretty basic question I haven't been able to find the answer to because I'm not 100% clear on exactly how to ask it. The tutorials I found that seem appropriate are all a little too simplified and missing some key info.
I have a list vector of processed data and I'm trying to convert it to a table for downstream analysis and I'm stuck.
Currently I have a Values list with 4088 numerical data points. The metadata of 'Data' has my subtype information. I generated my list this way:
vec <-vector("list",3)
vec[[1]] <- Values[which(colData(Data)$Type=="Type1")]
vec[[2]] <- Values[which(colData(Data)$Type=="Type2")]
vec[[3]] <- Values[which(colData(Data)$Type=="Type3")]
Now, [[1]] has just the values I care about for Type 1, [[2]] for type 2 etc.
So, how do I convert this into a dataframe or a table for downstream stuff like t-tests between groups? The readout below tells me that I need to define my data somehow and that having a different number of points per group is a problem, but I'm completely lost here.
df <- as.data.frame(vec)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1461, 658, 1969
Thanks for any help, or just a direction to a tutorial that will help me answer this for myself!
It really depends on what your downstream steps are, but you can generate a data.frame whose number of rows is the maximum of lengths(vec). The shorter vectors are then padded with NA, so make sure you handle the NAs afterwards. Examples of how to build such a data.frame can be found in @AndyBrown's answer.
Specifically, to perform t-tests between groups you do not need to generate a data.frame. You could use pairwise combinations of your list elements, which can have different lengths.
Below is an example using all pairwise combinations of list elements of vec to perform t-tests:
lapply(combn(vec, 2, simplify=FALSE), function(x) t.test(x[[1]], x[[2]]))
There are actually a few options that you can use.
#base R
data.frame(lapply(vec, "length<-", max(lengths(vec))))
#Data Table
library(data.table)
setDT(lapply(vec, "length<-", max(lengths(vec))))
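Another option that avoids padding altogether is a long-format table with one row per value and a column for the group. The sketch below uses small made-up vectors in place of the real vec, and the names Type1 to Type3 are assumed from the question:

```r
# Made-up stand-in for the list from the question.
vec <- list(c(1.2, 0.8, 1.1), c(2.0, 2.4), c(0.5, 0.7, 0.6, 0.9))
names(vec) <- c("Type1", "Type2", "Type3")

# stack() returns a data.frame with columns `values` and `ind` (the group).
long <- stack(vec)
print(long)

# Pairwise t-tests between all groups, with p-value adjustment (Holm by default).
res <- pairwise.t.test(long$values, long$ind)
print(res)
```

Unequal group sizes are not a problem in this layout because no NA padding is needed.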
I wanted to get all the combinations for 73,000 choose 2 and I tried to use combn in order to calculate it.
combn(73000,2)
I received the following error:
Error in matrix(r, nrow = len.r, ncol = count) :
invalid 'ncol' value (too large or NA)
I figured that the number of combinations is 2,664,463,500 so multiplied by 8 should yield around 22GB which I had free on my machine.
So even though it's a lot of combinations, it shouldn't fail.
Any alternative way to calculate the number of combinations or explanations of why combn fails?
I dug into the code and apparently when constructing the output matrix the dimensions are converted to integer:
count <- as.integer(round(choose(n, m)))
out <- matrix(r, nrow = len.r, ncol = count) # matrix for now
Removing the as.integer from count extends its range, so in my case the value no longer overflows.
That wasn't enough, though: I still received the same error.
I couldn't find a way to initialize the matrix as type integer, so for m = 2 I created two vectors instead, like this:
col1 <- vector(mode="integer", length=n)
col2 <- vector(mode="integer", length=n)
With a few more adjustments it runs now.
I hope this helps others as well.
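A likely reason why removing as.integer was not enough is that each dimension of an R matrix is itself limited to .Machine$integer.max, and choose(73000, 2) exceeds that limit. Below is a minimal sketch of the two-vector idea for m = 2. The helper pairs_as_vectors is hypothetical (it is not part of combn) and is checked here only on a small n:

```r
# The number of pairs is larger than the biggest allowed matrix dimension.
n_pairs <- choose(73000, 2)
print(n_pairs > .Machine$integer.max)

# Hypothetical helper: all pairs (i, j) with i < j as two integer vectors,
# in the same order that combn(n, 2) produces its columns.
pairs_as_vectors <- function(n) {
  n <- as.integer(n)
  reps <- (n - 1L):1L                       # how often each first index occurs
  first <- rep.int(seq_len(n - 1L), times = reps)
  second <- sequence(reps) + first          # 1..k offsets shifted past `first`
  list(first = first, second = second)
}

p <- pairs_as_vectors(6)
print(rbind(p$first, p$second))
```

I have not run this at n = 73000. At that size each vector would be a long vector, and the two results alone would need at least about 21 GB, before counting any temporary copies.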
I am trying to create a matrix of coordinates (indexes) from which I randomly pick one using the sample function. I then use these to select a cell in another matrix. What is the best way to do this? The trouble is how to store these integers in the matrix so that they are easy to separate. Right now I have them stored as strings with a comma, which I then split. Someone suggested I use a pair, or a string, but I cannot seem to get these to work with a matrix. Thanks!
EDIT: What I currently have looks like this (changed a little to make sense out of context):
probs <- matrix(c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0),5,5)
cordsMat <- matrix("",5,5)
for (x in 1:5){
for (y in 1:5){
cordsMat[x,y] = paste(x,y,sep=",")
}
}
cords <- sample(cordsMat,1,,probs)
cordsVec <- unlist(strsplit(cords,split = ","))
cordX <- as.numeric(cordsVec[1])
cordY <- as.numeric(cordsVec[2])
otherMat[cordX,cordY]
It sort of works, but I would also be interested in a better way, as this will get repeated a lot.
If you want to set the probabilities, this can easily be done by passing them to sample:
# creating the matrix
matrix(sample(rep(1:6, 15:20), 25), 5) -> other.mat
# set the probs vec
probs <- c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0)
# the coordinates matrix
mat <- as.matrix(expand.grid(1:nrow(other.mat),1:ncol(other.mat)))
# sampling a row index of mat at random, weighted by probs
sample(nrow(mat), 1, prob=probs) -> rand
# getting the value
other.mat[mat[rand,1], mat[rand,2]]
[1] 6
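A possibly simpler variant skips the coordinates matrix: sample a single linear index and, only if the row and column are needed, convert it with arrayInd. This is a sketch under the assumption that probs and the target matrix have the same dimensions. The name other_mat and its contents are made up for illustration:

```r
set.seed(1)  # only to make the sketch reproducible

probs <- matrix(c(0,   0,   0.6, 0,   0,
                  0,   0.7, 1,   0.7, 0,
                  0.6, 1,   0,   1,   0.6,
                  0,   0.7, 1,   0.7, 0,
                  0,   0,   0.6, 0,   0), 5, 5)
other_mat <- matrix(seq_len(25), 5, 5)   # illustrative target matrix

# One weighted draw over all cells, as a linear (column-major) index.
idx <- sample(length(probs), 1, prob = as.vector(probs))

# Convert to (row, column) if the coordinates themselves are needed.
rc <- arrayInd(idx, dim(probs))
print(rc)

# A one-row, two-column index matrix selects the single cell (row, column).
print(other_mat[rc])
```

If only the value is needed, other_mat[idx] gives the same cell directly, which avoids string splitting on every draw.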
I have a list with 20 elements, each of which contains a vector of 2 numbers. I have also generated a sequence of 20 numbers. Now I would like to construct one long vector that first lists the elements of intervals[[1]] followed by newvals[1], then intervals[[2]] followed by newvals[2], and so on.
I think the plyr package might be helpful, although I am not sure how to structure it. Help will be much appreciated!
s1 <- seq(0, 1, by = 0.05)
intervals <- Map(c, s1[-length(s1)], s1[-1])
intervals[[length(intervals)]][2] <- intervals[[length(intervals)]][2]+0.1
newvals <- seq(1,length(intervals),1)
#### HERE I WOULD LIKE TO HAVE A VECTOR IN THE FOLLOWING PATTERN
####UP TO THE LAST ELEMENT OF THE LIST:
stringreclass <- c(intervals[[1]], newvals[1], .... , intervals[[20]], newvals[20])
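One way to get this pattern in base R, without plyr, is to pair each interval with its value using Map and then flatten the result. A minimal sketch that reuses the setup from the question:

```r
s1 <- seq(0, 1, by = 0.05)
intervals <- Map(c, s1[-length(s1)], s1[-1])
intervals[[length(intervals)]][2] <- intervals[[length(intervals)]][2] + 0.1
newvals <- seq(1, length(intervals), 1)

# Append newvals[i] to intervals[[i]], then flatten into one long vector.
stringreclass <- unlist(Map(c, intervals, newvals))
print(head(stringreclass, 6))
```

The result has three entries per interval: lower bound, upper bound, new value.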
I am using R to code simulations for a research project I am conducting in college. After creating relevant data structures and generating data, I seek to randomly modify a proportion P of observations (in increments of 0.02) in a 20 x 20 matrix by some effect K. In order to randomly determine the observations to be modified, I sample a number of integers equal to P*400 twice to represent row (rRow) and column (rCol) indices. In order to guarantee that no observation will be modified more than once, I perform this algorithm:
1. I create a matrix, alrdyModded, that is 20 x 20 and initialized to 0s.
2. I take the first value in rRow and rCol and check whether alrdyModded[rRow[1]][rCol[1]]==1; while it is 1, I randomly select new integers for the indices until it is 0.
3. When alrdyModded[rRow[1]][rCol[1]]==0, I modify the value in the treatment matrix at the same indices and set alrdyModded[rRow[1]][rCol[1]] to 1.
4. I repeat this for the entire length of the rRow and rCol vectors.
I believe a good method to perform this operation is a while loop nested in a for loop. However, when I enter the code below into R, I receive the following error code:
R CODE:
propModded<-1.0
trtSize<-2
numModded<-propModded*400
trt1<- matrix(rnorm(400,0,1),nrow = 20, ncol = 20)
cont<- matrix(rnorm(400,0,1),nrow = 20, ncol = 20)
alrdyModded1<- matrix(0, nrow = 20, ncol = 20)
## data structures for computation have been initialized and filled
rCol<-sample.int(20,numModded,replace = TRUE)
rRow<-sample.int(20,numModded,replace = TRUE)
## indices for modifying observations have been generated
for(b in 1:numModded){
while(alrdyModded1[rRow[b]][rCol[b]]==1){
rRow[b]<-sample.int(20,1)
rCol[b]<-sample.int(20,1)}
trt1[rRow[b]][rCol[b]]<-'+'(trt1[rRow[b]][rCol[b]],trtSize)
alrdyModded[rRow[b]][rCol[b]]<-1
}
## algorithm for guaranteeing no observation in trt1 is modified more than once
R OUTPUT
" Error in while (alrdyModded1[rRow[b]][rCol[b]] == 1) { :
missing value where TRUE/FALSE needed "
When I take out the for loop and run the code, the while loop evaluates the statement just fine, which implies an issue with accessing the correct values from the rRow and rCol vectors. I would appreciate any help in resolving this problem.
It appears you're not indexing right within the matrix. Instead of having a condition like while(alrdyModded1[rRow[b]][rCol[b]]==1){, it should read like this: while(alrdyModded1[rRow[b], rCol[b]]==1){. Matrices are indexed like this: matrix[1, 1], and it looks like you're forgetting your commas. The for-loop should be something closer to this:
for (b in 1:numModded) {
  while (alrdyModded1[rRow[b], rCol[b]] == 1) {
    rRow[b] <- sample.int(20, 1)
    rCol[b] <- sample.int(20, 1)
  }
  trt1[rRow[b], rCol[b]] <- trt1[rRow[b], rCol[b]] + trtSize
  alrdyModded1[rRow[b], rCol[b]] <- 1
}
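To see where the NA in the error comes from: m[i] with a single index returns a length-one vector, so a second [j] with j greater than 1 reads past its end. A small sketch with a hypothetical matrix m:

```r
m <- matrix(0, nrow = 20, ncol = 20)

single <- m[3]        # one element, returned as a length-one vector
print(single[1])      # the only position that exists
print(m[3][5])        # position 5 of a length-one vector: NA
print(m[3, 5])        # row 3, column 5 of the matrix
```

An NA inside while() is what triggers "missing value where TRUE/FALSE needed".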
On a side note, why not make alrdyModded1 a logical matrix (populated with just TRUE and FALSE values) by initializing it with alrdyModded1<- matrix(FALSE, nrow = 20, ncol = 20), and have the condition be just while(alrdyModded1[rRow[b], rCol[b]]){ instead?
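If the only goal is that no cell is modified twice, the bookkeeping matrix and the retry loop can probably be avoided by sampling cell positions without replacement. This is a sketch of that alternative rather than the approach from the question. It uses a proportion of 0.5 purely for illustration, and the variable before exists only to check the result:

```r
set.seed(42)  # only to make the sketch reproducible

propModded <- 0.5
trtSize <- 2
trt1 <- matrix(rnorm(400, 0, 1), nrow = 20, ncol = 20)
before <- trt1                       # copy kept for comparison

# Draw distinct linear indices; sample() does not replace by default.
numModded <- round(propModded * length(trt1))
cells <- sample(length(trt1), numModded)

# Apply the effect to exactly those cells, each once.
trt1[cells] <- trt1[cells] + trtSize
print(sum(trt1 != before))
```

Because the indices are distinct by construction, each selected observation is shifted by trtSize exactly once.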