converting a dataframe to a matrix - r

I have a data frame like this:
V1 V2 V3
1 1 1 0.0000000
2 2 1 1.6646331
3 3 1 1.6649136
4 4 1 1.7420642
5 5 1 1.4441743
6 6 1 0.7544465
7 7 1 1.9796860
8 8 1 1.0744425
9 9 1 2.1503288
10 10 1 1.0408388
11 11 1 2.0822162
....
841 29 29 0.0000000
I want to convert this data frame to a matrix in which V2 gives the row and V1 gives the column:
[1] [2] [3] [4] ....
[1] 0.0000000 1.6646331 1.664936 1.7420642...
How can I do that in R?

Assuming you have contiguous values for your matrix (i.e. no gaps in the matrix) and the running order of values is continuous (i.e. row 1, columns 1:10; row 2, columns 1:10; and so on), then:
Take the values in the appropriate column (V3 in your case) and reshape them according to your parameters of matrix size:
m <- matrix( df$V3 , ncol = max(df$V1) , nrow = max(df$V2) , byrow = TRUE )
#[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#[1,] 0 1.664633 1.664914 1.742064 1.444174 0.7544465 1.979686 1.074442 2.150329 1.040839 2.082216
If you need to match the values (i.e. the running order is not continuous), you can take advantage of vectorised matrix subscripting like so:
# Create empty matrix
m <- matrix( NA, ncol = max(df$V1) , nrow = max(df$V2) )
# Then fill with values at defined locations
m[ cbind( df$V2 , df$V1 ) ] <- df$V3
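The subscripting approach can be checked on a small made-up example. The data frame `df_demo` below is hypothetical (it is not the asker's data); its rows are deliberately shuffled, and each value encodes its target position as 10 * row + column.

```r
# Hypothetical long-format data: V1 = column index, V2 = row index, V3 = value
df_demo <- data.frame(
  V1 = c(3, 1, 2, 2, 1, 3),
  V2 = c(2, 1, 1, 2, 2, 1),
  V3 = c(23, 11, 12, 22, 21, 13)
)

# Empty matrix sized from the largest row and column index
m_demo <- matrix(NA_real_, nrow = max(df_demo$V2), ncol = max(df_demo$V1))

# A two-column index matrix addresses (row, column) pairs, one per data row
m_demo[cbind(df_demo$V2, df_demo$V1)] <- df_demo$V3
m_demo
#      [,1] [,2] [,3]
# [1,]   11   12   13
# [2,]   21   22   23
```

Because each value is placed by its own (row, column) pair, the order of the rows in the data frame does not matter, and any position that never appears stays NA.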

You need to do two things:
Reorder your columns
Transpose (using the t function)
So first create some example data
d = data.frame(V1=runif(5), V2= 5+runif(5), V3 = 10+runif(5))
then
t(d[, ncol(d):1])
or
t(d)[ncol(d):1, ]

This should do the job (where df is your data frame)
m <- do.call(cbind, df)

Related

How to vectorize this operation?

I have a n x 3 x m array, call it I. It contains 3 columns, n rows (say n=10), and m slices. I have a computation that must be done to replace the third column in each slice based on the other 2 columns in the slice.
I've written a function insertNewRows(I[,,simIndex]) that takes a given slice and replaces the third column. The following for-loop does what I want, but it's slow. Is there a way to speed this up by using one of the apply functions? I cannot figure out how to get them to work in the way I'd like.
for(simIndex in 1:m){
I[,, simIndex] = insertNewRows(I[,,simIndex])
}
I can provide more details on insertNewRows if needed, but the short version is that it takes a probability based on the columns I[,1:2, simIndex] of a given slice of the array, and generates a binomial RV based on the probability.
It seems like one of the apply functions should work just by using
I = apply(FUN = insertNewRows, MARGIN = c(1,2,3)) but that just produces gibberish..?
Thank you in advance!
IK
The question defines neither the input, the transformation, nor the result, so we can't really answer it, but here is an example of adding a row of ones to a[,,i] for each i, which may suggest how you could solve the problem yourself.
This is how you could use sapply, apply, plyr::aaply, reshaping using matrix/aperm and abind::abind.
# input array and function
a <- array(1:24, 2:4)
f <- function(x) rbind(x, 1) # append a row of 1's
aa <- array(sapply(1:dim(a)[3], function(i) f(a[,,i])), dim(a) + c(1,0,0))
aa2 <- array(apply(a, 3, f), dim(a) + c(1,0,0))
aa3 <- aperm(plyr::aaply(a, 3, f), c(2, 3, 1))
aa4 <- array(rbind(matrix(a, dim(a)[1]), 1), dim(a) + c(1,0,0))
aa5 <- abind::abind(a, array(1, dim(a)[2:3]), along = 1)
dimnames(aa3) <- dimnames(aa5) <- NULL
sapply(list(aa2, aa3, aa4, aa5), identical, aa)
## [1] TRUE TRUE TRUE TRUE
aa[,,1]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 1 1 1
aa[,,2]
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
## [3,] 1 1 1
aa[,,3]
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
## [3,] 1 1 1
aa[,,4]
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
## [3,] 1 1 1
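If the per-slice computation needs only element-wise arithmetic, the loop can sometimes be dropped altogether by operating on a whole column across all slices at once. The sketch below is only an illustration: the question never shows insertNewRows, so the probability rule `p` is a purely hypothetical stand-in, as are the sizes `n` and `m`.

```r
set.seed(1)
n <- 10
m <- 4
I <- array(runif(n * 3 * m), dim = c(n, 3, m))

# Hypothetical probability rule standing in for whatever insertNewRows computes
# from columns 1 and 2; the result is an n x m matrix, one column per slice
p <- plogis(I[, 1, ] + I[, 2, ])

# A single vectorised draw replaces the third column of every slice
I[, 3, ] <- rbinom(n * m, size = 1, prob = p)
```

I[, 1, ], I[, 2, ] and I[, 3, ] are all n x m matrices laid out the same way, so the draws line up element by element with the probabilities that produced them.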

Transform matrix to array assigning specific columns

I have a numerical matrix of size 17 (rows) x 6 (columns). It looks like this:
Now I want to transform this matrix to an array of size 2 (rows) x 3 (columns) x (17 dimensions) in a way that every row is transformed to one dimension in the new array, in a way that columns 1-3 go to the first row and columns 4-6 go to the second row.
I have used the numbers out of the example to give an example how dimension 1 looks in this new array (it includes the values of the first row):
How can I transform this matrix to the array I would like to have?
m <- matrix(c(1:12), ncol = 6)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 3 5 7 9 11
#[2,] 2 4 6 8 10 12
a <- array(t(m), dim = c(3, 2, length(m)/6))
a <- aperm(a, c(2, 1, 3)) #switch first and second dimension
#, , 1
#
# [,1] [,2] [,3]
#[1,] 1 3 5
#[2,] 7 9 11
#
#, , 2
#
# [,1] [,2] [,3]
#[1,] 2 4 6
#[2,] 8 10 12
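To see that the same two steps scale to the 17 x 6 case in the question, here is a quick check on made-up data (the names `m17` and `a17` are illustrative only). Each slice is compared against the corresponding row reshaped directly.

```r
# Made-up 17 x 6 matrix standing in for the asker's data
m17 <- matrix(seq_len(17 * 6), nrow = 17, ncol = 6)

# Transpose so each original row becomes contiguous, reshape, then swap dims 1 and 2
a17 <- aperm(array(t(m17), dim = c(3, 2, nrow(m17))), c(2, 1, 3))
dim(a17)
# [1]  2  3 17

# Slice k should hold row k of m17: columns 1-3 on top, columns 4-6 below
all(sapply(seq_len(nrow(m17)), function(k)
  identical(a17[, , k], matrix(m17[k, ], nrow = 2, byrow = TRUE))))
# [1] TRUE
```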

rowsums for matrix over randomly specified subsets of columns in R

I have this matrix
mu<-1:100
sigma<-100:1
sample.size<-10
toy.mat<-mapply(function(x,y){rnorm(x,y,n=sample.size)},x=mu,y=sigma)
colnames(toy.mat) <- c(rep(1,10),rep(2,10), rep(3,10), rep(4,10), rep(5,10),
rep(6,10), rep(7,10), rep(8,10), rep(9,10), rep(10,10) )
For the 10 columns named 1, I would like to randomly select 5 pairs and take the row sums of each pair to generate 5 columns named 1a, 1b, 1c, 1d, 1e. I will do the same with the columns named 2, 3, and so on up to 10.
Is there a data.table method to do this?
I'm still unsure about what you're trying to do.
This is what I understood.
I first split toy.mat into a list of 10 matrices (chunks). This is for convenience.
# Split toy.mat into list of matrices
lst <- lapply(seq(1, 100, by = 10), function(i) toy.mat[, i:(i+9)]);
Next, generate 5 random pairs by sampling 10 numbers from the sequence 1:10 and arranging them in a 2x5 matrix, so that each column holds one pair. Repeat for all 10 matrix chunks.
# Generate 5 random pairs
set.seed(2017); # For reproducibility of results
rand <- replicate(10, matrix(sample(1:10, 10), ncol = 5), simplify = FALSE);
head(rand, n = 2);
#[[1]]
# [,1] [,2] [,3] [,4] [,5]
#[1,] 10 4 9 1 6
#[2,] 5 3 8 2 7
#
#[[2]]
# [,1] [,2] [,3] [,4] [,5]
#[1,] 7 9 3 5 10
#[2,] 1 4 2 6 8
Select corresponding columns based on pairs from rand and calculate the rowSums. Do that for every matrix chunk.
# Select column pairs and calculate rowSums
lst.rand <- lapply(1:10, function(i)
sapply(as.data.frame(rand[[i]]), function(w) rowSums(lst[[i]][, w])));
Bind list elements into matrix, and set column names.
# Bind into a single matrix
mat <- do.call(cbind, lst.rand);
colnames(mat) <- as.vector(sapply(1:10, function(i) paste0(i, letters[1:5])));
mat[1:5, 1:6];
# 1a 1b 1c 1d 1e 2a
#[1,] 21.410826 34.90337 -11.297396 -50.56332 -115.82456 51.32369
#[2,] 5.323713 -144.26640 169.697538 -58.35540 96.25637 -78.95717
#[3,] -78.925937 -45.32790 -177.546469 251.69348 -52.85132 123.38741
#[4,] -33.673704 -95.64937 3.561921 -253.95046 -136.88182 -10.20650
#[5,] 51.080564 -180.87033 -161.861342 108.41120 188.07454 52.34226
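The core step (summing the columns named by each pair) is easier to verify on a tiny deterministic example with hand-picked pairs instead of random ones. The objects `toy`, `pairs_demo` and `sums` below are illustrative names, not part of the answer above.

```r
# 2 rows, 6 columns: column j holds the values (2j - 1, 2j)
toy <- matrix(1:12, nrow = 2)

# Hand-picked pairs, one per column: (1, 6), (2, 5), (3, 4)
pairs_demo <- matrix(c(1, 6, 2, 5, 3, 4), nrow = 2)

# For each pair, select those two columns and add them row-wise
sums <- apply(pairs_demo, 2, function(w) rowSums(toy[, w]))
sums
#      [,1] [,2] [,3]
# [1,]   12   12   12
# [2,]   14   14   14
```

With random pairs the only change is how `pairs_demo` is built, for example from `sample()` as in the answer.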

Reshape each row of a data.frame to be a matrix in R

I am working with the hand-written zip codes dataset. I have loaded the dataset like this:
digits <- read.table("./zip.train",
quote = "",
comment.char = "",
stringsAsFactors = F)
Then I get only the ones:
ones <- digits[digits$V1 == 1, -1]
Right now, in ones I have 442 rows with 256 columns. I need to transform each row in ones to a 16x16 matrix. I think what I am looking for is a list of 16x16 matrices like the ones in this question:
How to create a list of matrix in R
But I tried with my data and did not work.
At first I tried ones <- apply(ones, 1, matrix, nrow = 16, ncol = 16) but it does not work the way I expected. I also tried lapply with no luck.
An alternative is to just change the dims of your matrix.
Consider the following matrix "M":
M <- matrix(1:12, ncol = 4)
M
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
We are looking to create a three dimensional array from this, so you can specify the dimensions as "row", "column", "third-dimension". However, since the matrix is constructed by column, you first need to transpose it before changing the dimensions.
`dim<-`(t(M), c(2, 2, nrow(M)))
# , , 1
#
# [,1] [,2]
# [1,] 1 7
# [2,] 4 10
#
# , , 2
#
# [,1] [,2]
# [1,] 2 8
# [2,] 5 11
#
# , , 3
#
# [,1] [,2]
# [1,] 3 9
# [2,] 6 12
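Though there are probably simpler ways, you can try with lapply. Since ones is a data frame, ones[i, ] is a one-row data frame, so unlist it to get a plain vector before reshaping:
ones_matrix <- lapply(seq_len(nrow(ones)), function(i){matrix(unlist(ones[i, ]), nrow=16)})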
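Both ideas (a list of matrices, or a single 3-D array) can be tried on a small stand-in before running them on the real data. The data frame `ones_demo` below is hypothetical: 3 rows, each a flattened 2 x 2 "image" rather than a 16 x 16 one.

```r
# Hypothetical stand-in for `ones`: row k holds 4 values to be reshaped to 2 x 2
ones_demo <- data.frame(matrix(1:12, nrow = 3, byrow = TRUE))

# Option 1: a list with one matrix per row (unlist turns the one-row
# data frame into a plain vector first)
mats_demo <- lapply(seq_len(nrow(ones_demo)), function(i)
  matrix(unlist(ones_demo[i, ]), nrow = 2))

# Option 2: one 3-D array whose slice k corresponds to row k
arr_demo <- array(t(as.matrix(ones_demo)), dim = c(2, 2, nrow(ones_demo)))

mats_demo[[1]]
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
```

Both options fill each matrix column by column. If the values in a row are stored row by row instead (an assumption worth checking against the dataset description), pass byrow = TRUE to matrix() in option 1, or transpose each slice in option 2.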

Saving vectors of different lengths in a matrix/data frame

I have a numeric vector called area of length 166860. It consists of 412 different elements, most of length 405 and some of length 809. I have their start and end ids.
My goal is to extract them and put them in a matrix/data frame with 412 columns
Right now, I'm trying this code:
m = matrix(NA,ncol=412, nrow=809)
for (j in 1:412){
temp.start = start.ids[j]
temp.end = end.ids[j]
m[,j] = area[temp.start:temp.end]
}
But I just end up with this error message:
"Error in m[, j] = area[temp.start:temp.end] :
number of items to replace is not a multiple of replacement length"
Here's a quite easy approach:
Example data:
area <- c(1:4, 1:5, 1:6, 1:3)
# [1] 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6 1 2 3
start.ids <- which(area == 1)
# [1] 1 5 10 16
end.ids <- c(which(area == 1)[-1] - 1, length(area))
# [1] 4 9 15 18
Create a list with one-row matrices:
mats <- mapply(function(x, y) t(area[seq(x, y)]), start.ids, end.ids)
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
#
# [[2]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
#
# [[3]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 2 3 4 5 6
#
# [[4]]
# [,1] [,2] [,3]
# [1,] 1 2 3
Use the function rbind.fill.matrix from the plyr package to create the matrix and transpose it (t):
library(plyr)
m <- t(rbind.fill.matrix(mats))
# [,1] [,2] [,3] [,4]
# 1 1 1 1 1
# 2 2 2 2 2
# 3 3 3 3 3
# 4 4 4 4 NA
# 5 NA 5 5 NA
# 6 NA NA 6 NA
You are creating the matrix with 412 columns and 809 rows, so every column has length 809, and a matrix cannot have columns of varying length. This means the value you assign to a column must have length 809 (or a length that recycles evenly into 809), which a run of 405 values does not. From the manual on ?matrix:
If there are too few elements in data to fill the matrix, then the elements in data are recycled. If data has length zero, NA of an appropriate type is used for atomic vectors (0 for raw vectors) and NULL for lists.
If you intended one element per row, as another commenter said, the matrix would need 412 rows and 809 columns and m[j, ] would be the way to assign. To keep your 412 columns, assign to m[, j] and pad the value with NA so that it is always of length 809.
m = matrix(NA,ncol=412, nrow=809)
for (j in 1:412){
temp.start = start.ids[j]
temp.end = end.ids[j]
val <- area[temp.start:temp.end]
# pad with NA up to the column length of 809
m[, j] = c(val, rep(NA, 809 - length(val)))
}
How about this? I've manufactured some sample data:
#here are the random sets of numbers - length either 408 or 809
nums<-lapply(1:412,function(x)runif(sample(c(408,809),1)))
#this represents your numeric (one list of all the numbers)
nums.vec<-unlist(nums)
#get data about the series (which you have)
nums.lengths<-sapply(nums,function(x)length(x))
#each series starts right after the previous one ends, so drop the last length, not the first
nums.starts<-cumsum(c(1,head(nums.lengths,-1)))
nums.ends<-nums.starts+nums.lengths-1
new.vec<-unlist(lapply(1:412,function(x){
v<-nums.vec[nums.starts[x]:nums.ends[x]]
c(v,rep(0,(809-length(v))))
}))
matrix(new.vec,ncol=412)
What about
m[j,] = area[temp.start:temp.end]
?
Edit:
a <- area[temp.start:temp.end]
m[1:length(a),j] <- a
Maybe others have better answers. As I see it, you have two options:
Change m[,j] to m[1:length(area[temp.start:temp.end]),j] and then you will not get an error but you would have some NA's left.
Use a list of matrices instead, so you would get different dimensions for each matrix.
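Another way to get the NA padding without an explicit loop, sketched here on the small example data from the first answer rather than on the real 166860-element vector: cut the vector into a list of pieces, then extend every piece to the longest length. Assigning a longer length to a vector with `length<-` fills the new positions with NA.

```r
area <- c(1:4, 1:5, 1:6, 1:3)
start.ids <- c(1, 5, 10, 16)
end.ids   <- c(4, 9, 15, 18)

# Cut the vector into a list of pieces (SIMPLIFY = FALSE always returns a list,
# even if all pieces happen to have the same length)
pieces <- mapply(function(s, e) area[s:e], start.ids, end.ids, SIMPLIFY = FALSE)

# Pad every piece with NA up to the longest length; sapply binds them as columns
m <- sapply(pieces, `length<-`, max(lengths(pieces)))
m
#      [,1] [,2] [,3] [,4]
# [1,]    1    1    1    1
# [2,]    2    2    2    2
# [3,]    3    3    3    3
# [4,]    4    4    4   NA
# [5,]   NA    5    5   NA
# [6,]   NA   NA    6   NA
```

With the real data, the result would have one column per element (412) and as many rows as the longest element (809).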
