Convert a matrix with dimnames into a long format data.frame - r

Hoping there's a simple answer here but I can't find it anywhere.
I have a numeric matrix with row names and column names:
# 1 2 3 4
# a 6 7 8 9
# b 8 7 5 7
# c 8 5 4 1
# d 1 6 3 2
I want to melt the matrix to a long format, with the values in one column and matrix row and column names in one column each. The result could be a data.table or data.frame like this:
# col row value
# 1 a 6
# 1 b 8
# 1 c 8
# 1 d 1
# 2 a 7
# 2 c 5
# 2 d 6
...
Any tips appreciated.

Use melt from reshape2:
library(reshape2)
#Fake data
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4
x.m <- melt(x)
x.m
Var1 Var2 value
1 1 a 1
2 2 a 2
3 3 a 3
4 4 a 4
...

The as.table and as.data.frame functions together will do this:
> m <- matrix( sample(1:12), nrow=4 )
> dimnames(m) <- list( One=letters[1:4], Two=LETTERS[1:3] )
> as.data.frame( as.table(m) )
One Two Freq
1 a A 7
2 b A 2
3 c A 1
4 d A 5
5 a B 9
6 b B 6
7 c B 8
8 d B 10
9 a C 11
10 b C 12
11 c C 3
12 d C 4

Assuming 'm' is your matrix...
data.frame(col = rep(colnames(m), each = nrow(m)),
row = rep(rownames(m), ncol(m)),
value = as.vector(m))
This executes extremely fast on a large matrix and also shows you a bit about how a matrix is made, how to access things in it, and how to construct your own vectors.

A modification that doesn't require you to know anything about the storage structure, and that easily extends to high dimensional arrays if you use the dimnames, and slice.index functions:
data.frame(row=rownames(m)[as.vector(row(m))],
col=colnames(m)[as.vector(col(m))],
value=as.vector(m))

Related

Combining elements of one column into two columns by group in R

Given a two column data.frame with one containing group labels and a second containing integer values ordered from smallest to largest. How can the data be expanded creating pairs of combinations of the integer column?
Not sure the best way to state this. I'm not interested in all possible combinations but instead all unique combinations starting from the lowest value.
In r, the combn function gives the desired output not considering groups, for example:
t(combn(seq(1:4),2))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 2 3
[5,] 2 4
[6,] 3 4
Since the first values is 1 we get the unique combination of (1,2) and not the additional combination of (2,1) which I don't need. How would one then apply a similar method by groups?
for example given a data.frame
test <- data.frame(Group = rep(c("A","B"),each=4),
Val = c(1,3,6,8,2,4,5,7))
test
Group Val
1 A 1
2 A 3
3 A 6
4 A 8
5 B 2
6 B 4
7 B 5
8 B 7
I was able to come up with this solution that gives the desired output:
test <- data.frame(Group = rep(c("A","B"),each=4),
Val = c(1,3,6,8,2,4,5,7))
j=1
for(i in unique(test$Group)){
if(j==1){
one <- filter(test,i == Group)
two <- data.frame(t(combn(one$Val,2)))
test1 <- data.frame(Group = i,Val1=two$X1,Val2=two$X2)
j=j+1
}else{
one <- filter(test,i == Group)
two <- data.frame(t(combn(one$Val,2)))
test2 <- data.frame(Group = i,Val1=two$X1,Val2=two$X2)
test1 <- rbind(test1,test2)
}
}
test1
Group Val1 Val2
1 A 1 3
2 A 1 6
3 A 1 8
4 A 3 6
5 A 3 8
6 A 6 8
7 B 2 4
8 B 2 5
9 B 2 7
10 B 4 5
11 B 4 7
12 B 5 7
However, this is not elegant and is really slow as the number of groups and length of each group become large. It seems like there should be a more elegant and efficient solution but so far I have not come across anything on SO.
I would appreciate any ideas!
here is a data.table approach
library( data.table )
#make test a data.table
setDT(test)
#split by group
L <- split( test, by = "Group")
#get unique combinations of 2 Vals
L2 <- lapply( L, function(x) {
as.data.table( t( combn( x$Val, m = 2, simplify = TRUE ) ) )
})
#merge them back together
data.table::rbindlist( L2, idcol = "Group" )
# Group V1 V2
# 1: A 1 3
# 2: A 1 6
# 3: A 1 8
# 4: A 3 6
# 5: A 3 8
# 6: A 6 8
# 7: B 2 4
# 8: B 2 5
# 9: B 2 7
#10: B 4 5
#11: B 4 7
#12: B 5 7
You can set simplify = F in combn() and then use unnest_wider() in dplyr.
library(dplyr)
library(tidyr)
test %>%
group_by(Group) %>%
summarise(Val = combn(Val, 2, simplify = F)) %>%
unnest_wider(Val, names_sep = "_")
# Group Val_1 Val_2
# <chr> <dbl> <dbl>
# 1 A 1 3
# 2 A 1 6
# 3 A 1 8
# 4 A 3 6
# 5 A 3 8
# 6 A 6 8
# 7 B 2 4
# 8 B 2 5
# 9 B 2 7
# 10 B 4 5
# 11 B 4 7
# 12 B 5 7
library(tidyverse)
df2 <- split(df$Val, df$Group) %>%
map(~gtools::combinations(n = 4, r = 2, v = .x)) %>%
map(~as_tibble(.x, .name_repair = "unique")) %>%
bind_rows(.id = "Group")

Reorder a subset of an R data.frame modifying the row names as well

Given a data.frame:
foo <- data.frame(ID=1:10, x=1:10)
rownames(foo) <- LETTERS[1:10]
I would like to reorder a subset of rows, defined by their row names. However, I would like to swap the row names of foo as well. I can do
sel <- c("D", "H") # rows to reorder
foo[sel,] <- foo[rev(sel),]
sel.wh <- match(sel, rownames(foo))
rownames(foo)[sel.wh] <- rownames(foo)[rev(sel.wh)]
but that is long and complicated. Is there a simpler way?
We can replace the sel values in rownames with the reverse of sel.
x <- rownames(foo)
foo[replace(x, x %in% sel, rev(sel)), ]
# ID x
#A 1 1
#B 2 2
#C 3 3
#H 8 8
#E 5 5
#F 6 6
#G 7 7
#D 4 4
#I 9 9
#J 10 10
Not as concise as ronak-shah's answer, but you could also use order.
# extract row names
temp <- row.names(foo)
# reset of vector
temp[which(temp %in% sel)] <- temp[rev(which(temp %in% sel))]
# reset order of data.frame
foo[order(temp),]
ID x
A 1 1
B 2 2
C 3 3
H 8 8
E 5 5
F 6 6
G 7 7
D 4 4
I 9 9
J 10 10
As noted in the comments, this relies on the row names following a lexicographical order. In instances where this is not true, we can use match.
# set up
set.seed(1234)
foo <- data.frame(ID=1:10, x=1:10)
row.names(foo) <- sample(LETTERS[1:10])
sel <- c("D", "H")
Now, the rownames are
# initial data.frame
foo
ID x
B 1 1
F 2 2
E 3 3
H 4 4
I 5 5
D 6 6
A 7 7
G 8 8
J 9 9
C 10 10
# grab row names
temp <- row.names(foo)
# reorder vector containing row names
temp[which(temp %in% sel)] <- temp[rev(which(temp %in% sel))]
Using, match along with order
foo[order(match(row.names(foo), temp)),]
ID x
B 1 1
F 2 2
E 3 3
D 6 6
I 5 5
H 4 4
A 7 7
G 8 8
J 9 9
C 10 10
your data frame is small so you can duplicate it then change the value of each raw:
footmp<-data.frame(foo)
foo[4,]<-footemp[8,]
foot{8,]<-footemp[4,]
Bob

R - split list every x items

I have data to analyse that is presented in the form of a list (just one row and MANY columns).
A B C D E F G H I
1 2 3 4 5 6 7 8 9
Is there a way to tell R to split this list every x items and get something as seen below (the columns C D E F G H I are virtually the same as A B)?
A B
1 2
3 4
5 6
7 8
9
If the number of columns is a multiple of 'x', then we unlist the dataset, and use matrix to create the expected output.
as.data.frame(matrix(unlist(df1), ncol=2, dimnames=list(NULL, c("A", "B")) , byrow=TRUE))
If the number of columns is not a multiple of 'x', then
x <- 2
gr <- as.numeric(gl(ncol(df1), x, ncol(df1)))
lst <- split(unlist(df1), gr)
do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
# A B
# 1 1 2
# 2 3 4
# 3 5 6
# 4 7 8
# 5 9 NA

add column to existing column in r

How do I convert 2 columns from a data.frame onto 2 different columns?
I.E:
Data
A B C D
1 3 5 7
2 4 6 8
to
Data
A B
1 3
2 4
5 7
6 8
You can use rbind
rbind(df[,1:2], data.frame(A = df$C, B = df$D))
You can use a fast version of rbind, rbindlist from data.table:
library(data.table)
rbindlist(lapply(seq(1, ncol(df), 2), function(i) df[,i:(i+1)]))
Here is my solution but it requires to change names of the columns.
names(dat) <- c("A", "B", "A", "B")
merge(dat[1:2], dat[3:4], all = T)
A B
1 1 3
2 2 4
3 5 7
4 6 8
And here is another solution more easy.
dat[3:4, ] <- dat[ ,3:4]
dat <- dat[1:2]
dat
A B
1 1 3
2 2 4
3 5 7
4 6 8
For scalability, a solution that will halve any even size data frame and append the rows:
half <- function(df) {m <- as.matrix(df)
dim(m) <- c(nrow(df)*2,ncol(df)/2)
nd <- as.data.frame(m)
names(nd) <- names(df[(1:dim(nd)[2])]);nd}
half(Data)
A B
1 1 5
2 2 6
3 3 7
4 4 8

Condensing Data Frame in R

I just have a simple question, I really appreciate everyones input, you have been a great help to my project. I have an additional question about data frames in R.
I have data frame that looks similar to something like this:
C <- c("","","","","","","","A","B","D","A","B","D","A","B","D")
D <- c(NA,NA,NA,2,NA,NA,1,1,4,2,2,5,2,1,4,2)
G <- list(C=C,D=D)
T <- as.data.frame(G)
T
C D
1 NA
2 NA
3 NA
4 2
5 NA
6 NA
7 1
8 A 1
9 B 4
10 D 2
11 A 2
12 B 5
13 D 2
14 A 1
15 B 4
16 D 2
I would like to be able to condense all the repeat characters into one, and look similar to this:
J B C E
1 2 1
2 A 1 2 1
3 B 4 5 4
4 D 2 2 2
So of course, the data is all the same, it is just that it is condensed and new columns are formed to hold the data. I am sure there is an easy way to do it, but from the books I have looked through, I haven't seen anything for this!
EDIT I edited the example because it wasn't working with the answers so far. I wonder if the NA's, blanks, and unevenness from the blanks are contributing??
hereĀ“s a reshape solution:
require(reshape)
cast(T, C ~ ., function(x) x)
Changed T to df to avoid a bad habit. Returns a list, which my not be what you want but you can convert from there.
C <- c("A","B","D","A","B","D","A","B","D")
D <- c(1,4,2,2,5,2,1,4,2)
my.df <- data.frame(id=C,val=D)
ret <- function(x) x
by.df <- by(my.df$val,INDICES=my.df$id,ret)
This seems to get the results you are looking for. I'm assuming it's OK to remove the NA values since that matches the desired output you show.
T <- na.omit(T)
T$ind <- ave(1:nrow(T), T$C, FUN = seq_along)
reshape(T, direction = "wide", idvar = "C", timevar = "ind")
# C D.1 D.2 D.3
# 4 2 1 NA
# 8 A 1 2 1
# 9 B 4 5 4
# 10 D 2 2 2
library(reshape2)
dcast(T, C ~ ind, value.var = "D", fill = "")
# C 1 2 3
# 1 2 1
# 2 A 1 2 1
# 3 B 4 5 4
# 4 D 2 2 2

Resources