Create vectors within lapply (or loop)

I would like to loop a function over a vector of characters. The function will create a vector or a list, and the name of each vector (list) will be taken from the vector of characters. For example.
# The data would look like:
first second third
1 2 3
1 NA 3
1 2 3
1 2 NA
NA 2 3
# I want to create three lists/vectors such as
first <- c("1.pdf", "1.pdf", "1.pdf", "1.pdf")
second <- c("2.pdf", "2.pdf", "2.pdf", "2.pdf")
third <- c("3.pdf", "3.pdf", "3.pdf", "3.pdf")
# where, first, second, third, now are the names of the vectors. I tried the following way.
vector_names <- c("first", "second", "third")
cleanNA <- function(x){
x <- as.character(as.data.frame(t(data[paste0(x)])))
x <- na.omit(x) # remove all NA observations.
x <- paste0(x, ".pdf")
return(x)
}
# I can do this by a vector length 1.
name <- c("first")
namef <- cleanNA(name)
assign(name, namef)
# But when I use lapply, it doesn't create the three vectors. The lapply call runs and returns what I want, but the three vectors are not created.
lapply(vector_names, cleanNA)
I've searched for this type of question many times and feel that R doesn't really provide a good way to generate a new vector within a loop. Am I right? Thanks.

Here's a simplified version :
cleanNA <- function(data, x){
x <- data[[x]]
x <- na.omit(x)
x <- paste0(x, ".pdf")
return(x)
#Or a one-liner
#paste0(na.omit(data[[x]]), '.pdf')
}
list_vec <- lapply(vector_names, cleanNA, data = data)
list_vec
#[[1]]
#[1] "1.pdf" "1.pdf" "1.pdf" "1.pdf"
#[[2]]
#[1] "2.pdf" "2.pdf" "2.pdf" "2.pdf"
#[[3]]
#[1] "3.pdf" "3.pdf" "3.pdf" "3.pdf"
It is better to keep the data in a list: it is easier to manage and avoids creating a lot of objects in the global environment. However, if you want them as separate vectors, you can use list2env:
list_vec <- setNames(list_vec, vector_names)
list2env(list_vec, .GlobalEnv)
data
data <- structure(list(first = c(1L, 1L, 1L, 1L, NA), second = c(2L,
NA, 2L, 2L, 2L), third = c(3L, 3L, 3L, NA, 3L)), class = "data.frame",
row.names = c(NA, -5L))
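
As a possible shortcut, the naming step can be folded into the apply call itself. The sketch below rebuilds the same data and vector_names as above; clean_na is a hypothetical one-liner variant of cleanNA introduced only for this illustration. It relies on sapply() with simplify = FALSE, which returns a list named after the character vector it iterates over.

```r
# Same example data as in the answer above.
data <- data.frame(first  = c(1L, 1L, 1L, 1L, NA),
                   second = c(2L, NA, 2L, 2L, 2L),
                   third  = c(3L, 3L, 3L, NA, 3L))
vector_names <- c("first", "second", "third")

# Hypothetical helper: drop NAs from one column and append ".pdf".
clean_na <- function(data, x) paste0(na.omit(data[[x]]), ".pdf")

# simplify = FALSE keeps a list; USE.NAMES = TRUE names it after vector_names.
named_vecs <- sapply(vector_names, clean_na, data = data,
                     simplify = FALSE, USE.NAMES = TRUE)

named_vecs$second
```

Each vector is then reachable as named_vecs$first, named_vecs[["second"]], and so on, without adding objects to the global environment.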

Related

Get the name of the list object then add that name as a new column of the each list

I have a list of dataframes. I want to extract each list element's name and add it as the first column of the corresponding dataframe. Then I want to combine the dataframes by row into a single dataframe. They all have the same dimensions.
A small section of my list of dataframes below
mylist <- list(SiO2 = structure(c(5.121, 0.00836394378003293, 0.0199499373432604,
5, 10, 1.87883863763252, 0.0836503954062112, 2.1240292640167), .Dim = c(1L,
8L), .Dimnames = list(NULL, c("Analyte_Mean", "SDBetweenGroup",
"SDWithinGroup", "replicates", "NumSamples", "FValue", "PValue",
"FCritical"))), Al2O3 = structure(c(2.0812, 0.0053103672189408,
0.0159059737205869, 5, 10, 0.442687747035557, 0.903289230024797,
2.1240292640167), .Dim = c(1L, 8L), .Dimnames = list(NULL, c("Analyte_Mean",
"SDBetweenGroup", "SDWithinGroup", "replicates", "NumSamples",
"FValue", "PValue", "FCritical"))))
I have the following to get the names of the list objects, but I am unsure how to pass them back as a column for each dataframe:
names(mylist)
to make a single dataframe I have
new_list <- as.data.frame(do.call(rbind, mylist))
Any help is appreciated.

If I've understood you correctly, I'd probably use the purrr library. imap_dfr will pass both your matrix and its name to the function, and then bind the results into a single dataframe.
library(purrr)
new_list <- imap_dfr(mylist, function(mat, name) {
result <- as.data.frame(mat)
result$name <- name
result
})
gives
> new_list
Analyte_Mean SDBetweenGroup SDWithinGroup replicates NumSamples FValue PValue FCritical name
1 5.1210 0.008363944 0.01994994 5 10 1.8788386 0.0836504 2.124029 SiO2
2 2.0812 0.005310367 0.01590597 5 10 0.4426877 0.9032892 2.124029 Al2O3

You can use purrr::map_df -
purrr::map_df(mylist, data.frame, .id = 'name')
# name Analyte_Mean SDBetweenGroup SDWithinGroup replicates NumSamples
#1 SiO2 5.1210 0.008363944 0.01994994 5 10
#2 Al2O3 2.0812 0.005310367 0.01590597 5 10
# FValue PValue FCritical
#1 1.8788386 0.0836504 2.124029
#2 0.4426877 0.9032892 2.124029
Or in base R -
do.call(rbind, Map(cbind.data.frame, mylist, name = names(mylist)))
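
The question asks for the name as the first column, whereas the base R line above appends it last. A minimal base R sketch of that variant follows; it uses a cut-down, rounded two-column version of mylist purely for illustration, and the object name combined is an assumption.

```r
# Cut-down stand-in for mylist: two 1-row matrices with column names.
mylist <- list(
  SiO2  = matrix(c(5.121, 0.0084), nrow = 1,
                 dimnames = list(NULL, c("Analyte_Mean", "SDBetweenGroup"))),
  Al2O3 = matrix(c(2.0812, 0.0053), nrow = 1,
                 dimnames = list(NULL, c("Analyte_Mean", "SDBetweenGroup")))
)

# Build one data frame per element with the element name as first column,
# then stack them. unname() keeps the list names out of the row names.
combined <- do.call(
  rbind,
  unname(Map(function(mat, nm) data.frame(name = nm, mat, stringsAsFactors = FALSE),
             mylist, names(mylist)))
)

combined
```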

Combine matrices of different length and keep column names

There is a similar question about combining vectors with different lengths here, but all answers (except @Ronak Shah's answer) lose the names/colnames.
My problem is that I need to keep the column names, which seems to be possible using the rowr package and cbind.fill.
I would like to stay in base R or use stringi, and the output should remain a matrix.
Test data:
inp <- list(structure(c("1", "2"), .Dim = 2:1, .Dimnames = list(NULL,"D1")),
structure(c("3", "4", "5"), .Dim = c(3L, 1L), .Dimnames = list(NULL, "D2")))
I know that I could get the column names beforehand and then reassign them after creating the matrix, like:
## Using stringi
library(stringi)
colnam <- unlist(lapply(inp, colnames))
out <- stri_list2matrix(inp)
colnames(out) <- colnam
out
## Using base-R
colnam <- unlist(lapply(inp, colnames))
max_length <- max(lengths(inp))
nm_filled <- lapply(inp, function(x) {
ans <- rep(NA, length = max_length)
ans[1:length(x)]<- x
ans
})
out <- do.call(cbind, nm_filled)
colnames(out) <- colnam
out
Are there other options that keep the column names?

Since stringi is ok for you to use, you can use the function stri_list2matrix(), i.e.
setNames(as.data.frame(stringi::stri_list2matrix(inp)), sapply(inp, colnames))
# D1 D2
#1 1 3
#2 2 4
#3 <NA> 5

Here is a slightly more concise base R variation
len <- max(lengths(inp))
nms <- sapply(inp, colnames)
do.call(cbind, setNames(lapply(inp, function(x)
replace(rep(NA, len), 1:length(x), x)), nms))
# D1 D2
#[1,] "1" "3"
#[2,] "2" "4"
#[3,] NA "5"
Not sure if this constitutes a sufficiently different solution from what you've already posted. Will remove if deemed too similar.
Update
Or how about a merge?
Reduce(
function(x, y) merge(x, y, all = TRUE, by = 0),
lapply(inp, as.data.frame))[, -1]
# D1 D2
#1 1 3
#2 2 4
#3 <NA> 5
The idea here is to convert the list entries to data.frames and then merge them by row name by setting by = 0 (thanks @Henrik). Note that this will return a data.frame rather than a matrix.

Here is using base:
do.call(cbind,
lapply(inp, function(i){
x <- data.frame(i, stringsAsFactors = FALSE)
as.matrix( x[ seq(max(lengths(inp))), , drop = FALSE ] )
# if the matrices have more than one column, use:
#as.matrix( x[ seq(max(sapply(inp, nrow))), , drop = FALSE ] )
}
))
# D1 D2
# 1 "1" "3"
# 2 "2" "4"
# NA NA "5"
The idea is to give all matrices the same number of rows. When we subset a data frame by index, rows that do not exist are returned as NA; we then convert back to a matrix and cbind.
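
The same padding idea can also be sketched without going through a data frame. The helper pad_rows below is hypothetical (it is not part of base R or of any answer above): it appends NA rows with rbind(), which takes the column names from its first matrix argument, so the names survive and the result stays a matrix.

```r
# Hypothetical helper: append NA rows until matrix m has n rows.
pad_rows <- function(m, n) {
  rbind(m, matrix(NA, nrow = n - nrow(m), ncol = ncol(m)))
}

# Same test data as in the question.
inp <- list(matrix(c("1", "2"), ncol = 1, dimnames = list(NULL, "D1")),
            matrix(c("3", "4", "5"), ncol = 1, dimnames = list(NULL, "D2")))

# Pad every matrix to the largest row count, then bind column-wise.
n_max <- max(vapply(inp, nrow, integer(1)))
out <- do.call(cbind, lapply(inp, pad_rows, n = n_max))

out
```

Because it works on nrow() rather than length(), this sketch should also cope with input matrices that have more than one column.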

Converting a data.table with missing or NA-values into a matrix with R [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I have a data.frame that looks like this.
x a 1
x b 2
x c 3
y a 3
y b 3
y c 2
I want this in matrix form so I can feed it to heatmap to make a plot. The result should look something like:
a b c
x 1 2 3
y 3 3 2
I have tried cast from the reshape package and I have tried writing a manual function to do this but I do not seem to be able to get it right.

There are many ways to do this. This answer starts with what is quickly becoming the standard method, but also includes older methods and various other methods from answers to similar questions scattered around this site.
tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
y=gl(3,1,6, labels=letters[1:3]),
z=c(1,2,3,3,3,2))
Using the tidyverse:
The new cool new way to do this is with pivot_wider from tidyr 1.0.0. It returns a data frame, which is probably what most readers of this answer will want. For a heatmap, though, you would need to convert this to a true matrix.
library(tidyr)
pivot_wider(tmp, names_from = y, values_from = z)
## # A tibble: 2 x 4
## x a b c
## <fct> <dbl> <dbl> <dbl>
## 1 x 1 2 3
## 2 y 3 3 2
The old cool new way to do this is with spread from tidyr. It similarly returns a data frame.
library(tidyr)
spread(tmp, y, z)
## x a b c
## 1 x 1 2 3
## 2 y 3 3 2
Using reshape2:
One of the first steps toward the tidyverse was the reshape2 package.
To get a matrix use acast:
library(reshape2)
acast(tmp, x~y, value.var="z")
## a b c
## x 1 2 3
## y 3 3 2
Or to get a data frame, use dcast, as here: Reshape data for values in one column.
dcast(tmp, x~y, value.var="z")
## x a b c
## 1 x 1 2 3
## 2 y 3 3 2
Using plyr:
In between reshape2 and the tidyverse came plyr, with the daply function, as shown here: https://stackoverflow.com/a/7020101/210673
library(plyr)
daply(tmp, .(x, y), function(x) x$z)
## y
## x a b c
## x 1 2 3
## y 3 3 2
Using matrix indexing:
This is kinda old school but is a nice demonstration of matrix indexing, which can be really useful in certain situations.
with(tmp, {
out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
dimnames=list(levels(x), levels(y)))
out[cbind(x, y)] <- z
out
})
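
To see what the matrix-indexing approach does when a row/column combination is absent from the long data, here is a small sketch on made-up data (the data frame long is an assumption, not the tmp above). Cells that never receive a value simply keep their initial NA.

```r
# Made-up long data with the ("y", "b") combination missing.
long <- data.frame(x = factor(c("x", "x", "y")),
                   y = factor(c("a", "b", "a")),
                   z = c(1, 2, 3))

# Start from an all-NA matrix sized by the factor levels.
out <- matrix(NA_real_,
              nrow = nlevels(long$x), ncol = nlevels(long$y),
              dimnames = list(levels(long$x), levels(long$y)))

# A two-column integer matrix indexes (row, column) pairs directly.
out[cbind(as.integer(long$x), as.integer(long$y))] <- long$z

out
```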
Using xtabs:
xtabs(z~x+y, data=tmp)
Using a sparse matrix:
There's also sparseMatrix within the Matrix package, as seen here: R - convert BIG table into matrix by column names
library(Matrix)
with(tmp, sparseMatrix(i = as.numeric(x), j=as.numeric(y), x=z,
dimnames=list(levels(x), levels(y))))
## 2 x 3 sparse Matrix of class "dgCMatrix"
## a b c
## x 1 2 3
## y 3 3 2
Using reshape:
You can also use the base R function reshape, as suggested here: Convert table into matrix by column names, though you have to do a little manipulation afterwards to remove an extra column and get the names right (not shown).
reshape(tmp, idvar="x", timevar="y", direction="wide")
## x z.a z.b z.c
## 1 x 1 2 3
## 4 y 3 3 2
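
The follow-up manipulation mentioned for reshape is not shown in the answer; one possible way to do it is sketched below. It assumes the same tmp as above and that the value columns come out prefixed with "z.", as in the printed output. The names wide and mat are illustrative.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

wide <- reshape(tmp, idvar = "x", timevar = "y", direction = "wide")

# Drop the id column, convert to a matrix, and tidy both sets of names:
# row names come from the id column, column names lose the "z." prefix.
mat <- as.matrix(wide[, -1])
dimnames(mat) <- list(as.character(wide$x), sub("^z\\.", "", colnames(mat)))

mat
```

The result is a plain numeric matrix, which is the form the question needs for heatmap.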

The question is some years old but maybe some people are still interested in alternative answers.
If you don't want to load any packages, you might use this function:
#' Converts three columns of a data.frame into a matrix -- e.g. to plot
#' the data via image() later on. Two of the columns form the row and
#' col dimensions of the matrix. The third column provides values for
#' the matrix.
#'
#' @param data data.frame: input data
#' @param rowtitle string: row-dimension; name of the column in data, which distinct values should be used as row names in the output matrix
#' @param coltitle string: col-dimension; name of the column in data, which distinct values should be used as column names in the output matrix
#' @param datatitle string: name of the column in data, which values should be filled into the output matrix
#' @param rowdecreasing logical: should the row names be in ascending (FALSE) or in descending (TRUE) order?
#' @param coldecreasing logical: should the col names be in ascending (FALSE) or in descending (TRUE) order?
#' @param default_value numeric: default value of matrix entries if no value exists in data.frame for the entries
#' @return matrix: matrix containing values of data[[datatitle]] with rownames data[[rowtitle]] and colnames data[coltitle]
#' @author Daniel Neumann
#' @date 2017-08-29
data.frame2matrix = function(data, rowtitle, coltitle, datatitle,
rowdecreasing = FALSE, coldecreasing = FALSE,
default_value = NA) {
# check, whether titles exist as columns names in the data.frame data
if ( (!(rowtitle%in%names(data)))
|| (!(coltitle%in%names(data)))
|| (!(datatitle%in%names(data))) ) {
stop('data.frame2matrix: bad row-, col-, or datatitle.')
}
# get number of rows in data
ndata = dim(data)[1]
# extract rownames and colnames for the matrix from the data.frame
rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
nrows = length(rownames)
colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
ncols = length(colnames)
# initialize the matrix
out_matrix = matrix(NA,
nrow = nrows, ncol = ncols,
dimnames=list(rownames, colnames))
# iterate rows of data
for (i1 in 1:ndata) {
# get matrix-row and matrix-column indices for the current data-row
iR = which(rownames==data[[rowtitle]][i1])
iC = which(colnames==data[[coltitle]][i1])
# throw an error if the matrix entry (iR,iC) is already filled.
if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
out_matrix[iR, iC] = data[[datatitle]][i1]
}
# set empty matrix entries to the default value
out_matrix[is.na(out_matrix)] = default_value
# return matrix
return(out_matrix)
}
How it works:
myData = as.data.frame(list('dim1'=c('x', 'x', 'x', 'y','y','y'),
'dim2'=c('a','b','c','a','b','c'),
'values'=c(1,2,3,3,3,2)))
myMatrix = data.frame2matrix(myData, 'dim1', 'dim2', 'values')
myMatrix
> a b c
> x 1 2 3
> y 3 3 2

base R, unstack
unstack(df, V3 ~ V2)
# a b c
# 1 1 2 3
# 2 3 3 2
This may not be a general solution but works well in this case.
data
df<-structure(list(V1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("x",
"y"), class = "factor"), V2 = structure(c(1L, 2L, 3L, 1L, 2L,
3L), .Label = c("a", "b", "c"), class = "factor"), V3 = c(1L,
2L, 3L, 3L, 3L, 2L)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA,
-6L))

For the sake of completeness, there's also a tapply() solution.
with(d, tapply(z, list(x, y), sum))
# a b c
# x 1 2 3
# y 3 3 2
Data
d <- structure(list(x = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("x",
"y"), class = "factor"), y = structure(c(1L, 2L, 3L, 1L, 2L,
3L), .Label = c("a", "b", "c"), class = "factor"), z = c(1, 2,
3, 3, 3, 2)), class = "data.frame", row.names = c(NA, -6L))
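
The aggregation function passed to tapply() only matters when a row/column pair occurs more than once. The sketch below uses made-up data (dup is an assumption, not the d above) with one repeated pair and one missing pair to illustrate both cases.

```r
# Made-up data: ("x", "a") appears twice, ("y", "b") never appears.
dup <- data.frame(x = c("x", "x", "x", "y"),
                  y = c("a", "a", "b", "a"),
                  z = c(1, 2, 3, 4))

# Repeated pairs are combined with sum; missing pairs are left as NA.
m <- with(dup, tapply(z, list(x, y), sum))

m
```

Whether summing is appropriate depends on the data; swapping in mean or another function changes how repeated pairs are collapsed.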

From tidyr 0.8.3.9000, a new function called pivot_wider() is introduced. It is basically an upgraded version of the previous spread() function (which is, moreover, no longer under active development). From pivoting vignette:
This vignette describes the use of the new pivot_longer() and
pivot_wider() functions. Their goal is to improve the usability of
gather() and spread(), and incorporate state-of-the-art features found
in other packages.
For some time, it’s been obvious that there is something fundamentally
wrong with the design of spread() and gather(). Many people don’t find
the names intuitive and find it hard to remember which direction
corresponds to spreading and which to gathering. It also seems
surprisingly hard to remember the arguments to these functions,
meaning that many people (including me!) have to consult the
documentation every time.
How to use it (using the data from @Aaron):
pivot_wider(data = tmp, names_from = y, values_from = z)
x a b c
<fct> <dbl> <dbl> <dbl>
1 x 1 2 3
2 y 3 3 2
Or in a "full" tidyverse fashion:
tmp %>%
pivot_wider(names_from = y, values_from = z)

The tidyr package from the tidyverse has an excellent function that does this.
Assuming your variables are named v1, v2 and v3, left to right, and your data frame is named dat:
dat %>%
spread(key = v2,
value = v3)
Ta da!

