Creating empty rows on matrix of various size - r

I am trying to generate a matrix with the following data. Is there any way to create empty rows so that the matrices are all the same size?
#Generating original data
n <- c(12,24)
mu <- c(6.573,6.5)
sigma <- sqrt(0.25)
Diseased.Data <- round(rnorm(n[1],mu[1],sigma),4)
Healthy.Data <- round(rnorm(n[2],mu[2],sigma),4)
g <- c(2,3,4)
cstar.pool <- (mu[1]+mu[2])/2
#generating pooled data
for(i in 1:3){
assign(paste("pool.dis.data",i,sep = ""),replicate(n[1]/g[i],mean(sample(Diseased.Data,g[i]))))
assign(paste("pool.hel.data",i,sep = ""),replicate(n[2]/g[i],mean(sample(Healthy.Data,g[i]))))
}
#generating the pooled diseased data matrix
dis.mat1<- matrix(data = pool.dis.data1,length(pool.dis.data1),1)
dis.mat2 <- matrix(data = pool.dis.data2,length(pool.dis.data2),1)
dis.mat3 <- matrix(data = pool.dis.data3,length(pool.dis.data3),1)
dis.mat2 <- rbind(dis.mat2,NA)
dis.mat2 <- rbind(dis.mat2,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.matrix <- matrix(NA, max(length(pool.dis.data1),length(pool.dis.data2),length(pool.dis.data3)),3)
dis.matrix[,1] <- cbind(dis.mat1)
dis.matrix[,2] <- cbind(dis.mat2)
dis.matrix[,3] <- cbind(dis.mat3)

I'd say your best bet is to start out with an empty matrix of the size you need. You can specify the dimensions when you create the matrix, like so:
new <- matrix( data = NA, nrow = 10, ncol = 20 )
So you just need to create a value for each dimension, based on your input data:
num.rows <- max( length(n), length(mu) )  # include every vector you want to store
num.columns <- 3  # enter the number of columns you need; 3 is only an example value
new <- matrix( data = NA, nrow = num.rows, ncol = num.columns )
Then you can fill the columns as needed, making sure to leave any excess empty. For example:
new[(1:length(n)),3] <- n
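The "1:length(n)" part tells R to fill only as many rows as you have values, leaving the rest of the column as NA. If you assign to the whole column instead, R either recycles the values to fill it (so you get repeated values) or throws an error when the column length is not a multiple of the vector length. I'm guessing you want neither.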
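As a hedged sketch of the same idea applied to vectors of unequal length: the list `pools` and its values are made-up stand-ins for pool.dis.data1 to pool.dis.data3, not the question's actual output. Each column is filled only as far as its vector reaches, so the remaining rows stay NA without any manual rbind calls.

```r
# Hypothetical pooled vectors of unequal length (illustrative values only)
pools <- list(p1 = c(6.1, 6.4, 6.7, 6.5, 6.9, 6.2),
              p2 = c(6.3, 6.6, 6.4, 6.8),
              p3 = c(6.5, 6.4, 6.6))

# Start from an all-NA matrix sized for the longest vector
n.rows <- max(lengths(pools))
padded <- matrix(NA_real_, nrow = n.rows, ncol = length(pools),
                 dimnames = list(NULL, names(pools)))

# Fill only the first length(v) rows of each column; the rest stay NA
for (j in seq_along(pools)) {
  v <- pools[[j]]
  padded[seq_along(v), j] <- v
}
padded
```

Because the matrix is sized from the data, adding a fourth vector to the list should need no other change.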

Related

How to run function on individual columns instead of data frame?

Hello everyone. I have two data frames and am trying to do bootstrapping with Script1 below, which takes the number of rows from data frames one and two. Instead of taking the row count from the entire data frame, I want to split each column into its own data frame, remove the zero values, take the row count of what remains, and then do the bootstrapping as in Script1. In Script2 I create the individual data frames in a for loop, but as I am new to R I am unsure how to add the Script1 logic to it efficiently.
Please advise. Below are Script1, which runs, and Script2, where I try to subset each column into an individual data frame.
Script1
set.seed(2)
m1 <- matrix(sample(c(0, 1:10), 100, replace = TRUE), 10)
m2 <- matrix(sample(c(0, 1:5), 50, replace = TRUE), 5)
m1 <- as.data.frame(m1)
m2 <- as.data.frame(m2)
nboot <- 1e3
n_m1 <- nrow(m1); n_m2 <- nrow(m2)
temp<- c()
for (j in seq_len(nboot)) {
boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
value <- colSums(m2)/colSums(m1[boot,])
temp <- rbind(temp, value)
}
boot_data<- apply(temp, 2, median)
Script2
for (i in colnames(m1)){
m1_subset=(m1[m1[[i]] > 0, ])
m1_subset=m1_subset[i]
m2_subset=m2[m2[[i]] >0, ]
m2_subset=m2_subset[i]
num_m1 <- nrow(m1_subset); n_m2 <- nrow(m2_subset) # after this, I want to run the Script1 loop on these inputs
}
If I understand correctly, you want to do the sampling and calculation on each column individually, after removing the 0 values. I modified your code to work on a single vector instead of a data frame (i.e., using length() instead of nrow() and sum() instead of colSums()). I also suggest creating the empty matrix for your results ahead of time and filling it in; it will be faster.
temp <- matrix(nrow = nboot, ncol = ncol(m1))
for (i in seq_along(m1)){
m1_subset = m1[m1[,i] > 0, i]
m2_subset = m2[m2[,i] > 0, i]
n_m1 <- length(m1_subset); n_m2 <- length(m2_subset)
for (j in seq_len(nboot)) {
boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
temp[j, i] <- sum(m2_subset)/sum(m1_subset[boot])
}
}
boot_data <- apply(temp, 2, median)
boot_data <- setNames(data.frame(t(boot_data)), names(m1))
boot_data
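One possible way to package the per-column logic, offered as a sketch rather than the answerer's method: the helper name `boot_ratio_median` is hypothetical, and it assumes each column of m1 keeps at least one non-zero value after filtering.

```r
# Hypothetical helper: bootstrap median of sum(y) / sum(resampled x)
# after dropping zeros; x and y are numeric vectors (one column each)
boot_ratio_median <- function(x, y, nboot = 1000) {
  x <- x[x > 0]
  y <- y[y > 0]
  ratios <- replicate(nboot, {
    # Resample positions of x, drawing as many as y has values
    boot <- sample(seq_along(x), size = length(y), replace = TRUE)
    sum(y) / sum(x[boot])
  })
  median(ratios)
}

set.seed(2)
m1 <- as.data.frame(matrix(sample(c(0, 1:10), 100, replace = TRUE), 10))
m2 <- as.data.frame(matrix(sample(c(0, 1:5), 50, replace = TRUE), 5))

# Apply the helper to matching columns of m1 and m2
boot_data <- mapply(boot_ratio_median, m1, m2)
boot_data
```

mapply walks over the two data frames column by column, so the result is a named vector with one median per column.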

generate m-matrices of different lengths using foreach command

Take this simple worked example with dummy data:
ab <- c(1:500)
cd <- sample(1:100, 500, replace = T)
ef <- sample(1:10, 500, replace = T)
df1 <- data.frame(ab, cd, ef)
m <- 4
Now I want to use the foreach command to generate m matrices.
Each matrix should have a different number of rows, chosen like this:
#size1 <- sample(50:60, 1)
#indices <- sample(1:500, size1)
#df2 <- df1[indices,]
I am not sure how to generate the different matrices with the foreach command. This is what I tried:
Result = foreach(i=1:m,.combine=matrix(df2)) %do%{
size1 <- sample(50:60, 1)
indices <- sample(1:500, size1)
df2 <- df1[indices,]
}
By default, foreach returns its results as a list. The following code produces a list of matrices with different dimensions.
library(foreach)
Result <- foreach(i=1:5) %do%{
# randomly select number of rows and columns
random.rows <- sample(1:5, 1)
random.columns <- sample(1:5, 1)
# generate matrix out of this
matrix(sample(1:100, random.rows*random.columns), random.rows)
}
The Result object is a list of length 5 whose matrices range in size from 1x1 to 5x5.
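To connect this back to the question's df1, here is a sketch that swaps foreach for base R's lapply (a deliberate substitution so the example runs without extra packages). It draws between 50 and 60 random rows m times and keeps each subset as one list element; the name `subsets` is made up for illustration.

```r
set.seed(1)
ab <- 1:500
cd <- sample(1:100, 500, replace = TRUE)
ef <- sample(1:10, 500, replace = TRUE)
df1 <- data.frame(ab, cd, ef)
m <- 4

# One list element per iteration, each with its own number of rows
subsets <- lapply(seq_len(m), function(i) {
  size1 <- sample(50:60, 1)             # random number of rows for this subset
  indices <- sample(nrow(df1), size1)   # random row positions, no repeats
  as.matrix(df1[indices, ])
})

sapply(subsets, nrow)
```

Since a list is also the default result of foreach, placing the same body inside `foreach(i = 1:m) %do% { ... }` should give an equivalent structure.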

Avoid a for loop

I am trying to optimize an algorithm and I really want to avoid all my loops. Hence I am wondering if there is a way to avoid the following simple loop:
library(FNN)
data <- cbind(1:10, 1:10)
NN.index <- get.knn(data, 5)$nn.index
bc <- matrix(0, nrow(NN.index), max(NN.index))
for(i in 1:nrow(bc)){
bc[i,NN.index[i,]] <- 1
}
where bc is a matrix of zeros.
In R, if you index a matrix M with a k-by-2 matrix I inside the brackets, each row of I is treated as a (row, column) pair that selects one element of M. For example:
M = matrix(1:12, nrow = 4, ncol = 3)
print(M)
I = rbind(c(1,2), c(4,2), c(3,3))
print(M[I])
In this case, M[1,2], M[4,2] and M[3,3] are extracted.
In your case, we can create row_index and col_index from NN.index as below, and then assign 1 to the corresponding entries.
bc <- matrix(0, nrow(NN.index), max(NN.index))
row_index <- rep(1:nrow(NN.index), times = ncol(NN.index))
col_index <- as.vector(NN.index)
bc[cbind(row_index, col_index)] <- 1
print(bc)
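As a quick check that the two approaches agree, here is a sketch that does not need the FNN package. The small `NN.index` matrix is a hypothetical stand-in for the output of get.knn, and row() is used in place of rep() to build the row indices.

```r
# Hypothetical neighbour-index matrix standing in for get.knn(...)$nn.index:
# row i lists the column positions to set to 1 in row i
NN.index <- rbind(c(2, 3), c(1, 3), c(2, 4), c(3, 1))

# Loop version from the question
bc.loop <- matrix(0, nrow(NN.index), max(NN.index))
for (i in 1:nrow(bc.loop)) {
  bc.loop[i, NN.index[i, ]] <- 1
}

# Vectorised version using a two-column (row, column) index matrix
bc.vec <- matrix(0, nrow(NN.index), max(NN.index))
bc.vec[cbind(as.vector(row(NN.index)), as.vector(NN.index))] <- 1

identical(bc.loop, bc.vec)  # TRUE
```

row(NN.index) returns a matrix of row numbers with the same shape as NN.index, so flattening both gives index pairs in matching order.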

MHSMM package R input data format with multiple variables

My problem is similar to the question "the problem of R-input Format".
I tried the code from that question and revised parts of it to suit my data.
I want my data to be turned into a data frame with 4 variable vectors. The code I revised is:
formatMhsmm <- function(data){
nb.sequences = nrow(data)
nb.variables = ncol(data)
data_df <- data.frame(matrix(unlist(data), ncol = 4, byrow = TRUE))
# iterate over these in loops
rows <- 1: nb.sequences
# build vector with id value
id = numeric(length = nb.sequences)
for( i in rows)
{
id[i] = data_df[i,2]
}
# build vector with time value
time = numeric (length = nb.sequences)
for( i in rows)
{
time[i] = data_df[i,3]
}
# build vector with observation values
sequences = numeric(length = nb.sequences)
for(i in rows)
{
sequences[i] = data_df[i, 4]
}
data.df = data.frame(id,time,sequences)
# creation of hsmm data object need for training
N <- as.numeric(table(data.df$id))
train <- list(x = data.df$sequences, N = N)
class(train) <- "hsmm.data"
return(train)
}
library(mhsmm)
dataset <- read.csv("location.csv", header = TRUE)
train <- formatMhsmm(dataset)
print(train)
The output observation is not the data of the 4th column; it is a list like (4, 8, 12, ..., 396, 1, 1, ..., 56, 192, ..., 6550, 68, NA, NA, ...). It seems to have picked up a quarter of the data from each column. Why does this happen?
Thank you very much!!!!
Why don't you simply count your observations by id and create the hsmm.data object directly? Supposing your data frame is called "data", we have:
N <- as.numeric(table(data$id))
train <- list(x=data$location, N = N)
class(train) <- "hsmm.data"
Extracted from http://www.jstatsoft.org/v39/i04/paper
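One likely explanation for the scrambled output in the question, shown as a sketch on a made-up data frame: unlist() flattens a data frame column by column, so refilling the values with byrow = TRUE mixes the columns together. The data frame `df` and its values are purely illustrative.

```r
# Small made-up data frame with 4 columns (illustrative only)
df <- data.frame(a = 1:4, b = 5:8, c = 9:12, d = 13:16)

# unlist() concatenates column by column: 1, 2, ..., 16
flat <- unlist(df, use.names = FALSE)

# Refilling by row therefore mixes the columns together
scrambled <- matrix(flat, ncol = 4, byrow = TRUE)
scrambled[, 4]   # 4 8 12 16: one value taken from each original column

# as.matrix() keeps the columns intact
intact <- as.matrix(df)
intact[, 4]      # 13 14 15 16, the original 4th column
```

The pattern 4, 8, 12 matches the start of the output reported in the question, which suggests the same mechanism is at work there.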

Filling in a matrix from a list of row,column,value

I have a data frame that contains a list of row positions, column positions and values, like so:
combs <- as.data.frame(t(combn(1:10,2)))
colnames(combs) <- c('row','column')
combs$value <- rnorm(nrow(combs))
I would like to fill in a matrix with these values such that every value appears in the matrix in exactly the position specified by row and column. I suppose I could do this manually
mat <- matrix(nrow=10,ncol=10)
for(i in 1:nrow(combs)) {
mat[combs[i,'row'],combs[i,'column']] <- combs[i,'value']
}
But surely there is a more elegant way to accomplish this in R?
Like this:
mat <- matrix(nrow = 10, ncol = 10)
mat[cbind(combs$row, combs$column)] <- combs$value
You could also consider building a sparse matrix using the Matrix package:
library(Matrix)
mat <- sparseMatrix(i = combs$row, j = combs$column, x = combs$value,
dims = c(10, 10))
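For anyone who wants to confirm that the one-line assignment matches the loop, here is a small check in base R. The seed is an arbitrary choice for reproducibility, and the names `mat.loop` and `mat.vec` are made up for this sketch.

```r
set.seed(42)
combs <- as.data.frame(t(combn(1:10, 2)))
colnames(combs) <- c("row", "column")
combs$value <- rnorm(nrow(combs))

# Loop version from the question
mat.loop <- matrix(nrow = 10, ncol = 10)
for (i in 1:nrow(combs)) {
  mat.loop[combs[i, "row"], combs[i, "column"]] <- combs[i, "value"]
}

# Matrix-index version: each (row, column) pair receives its value
mat.vec <- matrix(nrow = 10, ncol = 10)
mat.vec[cbind(combs$row, combs$column)] <- combs$value

identical(mat.loop, mat.vec)  # TRUE
sum(!is.na(mat.vec))          # 45 filled cells, one per pair
```

All other cells stay NA because the matrix was created without data.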
