Turn 3x3 data.frame into 1x9 data.frame while preserving row and column names - r

I am having trouble coming up with an elegant solution to this seemingly simple data manipulation problem. I can see a looped solution but I assume there is a 1-2 function single-line solution.
Here is what I have:
x <- data.frame(c1=c(1,2,3),
c2=c(4,5,6),
c3=c(7,8,9),
row.names = c("r1","r2","r3"))
> x
c1 c2 c3
r1 1 4 7
r2 2 5 8
r3 3 6 9
And here is what I want:
> y
c1.r1 c1.r2 c1.r3 c2.r1 c2.r2 c2.r3 c3.r1 c3.r2 c3.r3
1 1 2 3 4 5 6 7 8 9
How do I manipulate x to give me y?

Here's one way to do it:
R> unlist(lapply(x, setNames, rownames(x)))
c1.r1 c1.r2 c1.r3 c2.r1 c2.r2 c2.r3 c3.r1 c3.r2 c3.r3
1 2 3 4 5 6 7 8 9
A data.frame is a list, so lapply just loops over the columns. Then it sets the names of each vector to the rownames of the data.frame. Then unlist flattens the list to a vector (recursively, setting names, by default).
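The result above is a named vector rather than a data.frame. If a 1x9 data.frame is really what is needed, one possible follow-up step (a sketch, not part of the original answer; the names `v` and `y` are only illustrative) is to transpose the vector and convert it:

```r
x <- data.frame(c1 = c(1, 2, 3),
                c2 = c(4, 5, 6),
                c3 = c(7, 8, 9),
                row.names = c("r1", "r2", "r3"))

# Named vector: each name combines a column label and a row label.
v <- unlist(lapply(x, setNames, rownames(x)))

# t() turns the named vector into a 1 x 9 matrix whose column names are
# the vector's names; as.data.frame() keeps those column names.
y <- as.data.frame(t(v))
```

Here `y` has a single row and the columns c1.r1 through c3.r3.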

Related

How to split each column into its own data frame? [duplicate]

This question already has answers here:
Split data.frame into groups by column name
(2 answers)
Closed 4 years ago.
I have a data frame with 3 columns, for example:
my.data <- data.frame(A=c(1:5), B=c(6:10), C=c(11:15))
I would like to split each column into its own data frame (so I'd end up with a list containing three data frames). I tried to use the "split" function but I don't know what I would set as the factor argument. I tried this:
data.split <- split(my.data, my.data[,1:3])
but that's definitely wrong and just gives me a bunch of empty data frames. It sounds fairly simple but after searching through previous questions I haven't come across a way to do this.
Not sure why you'd want to do that, since lapply already lets you operate on the columns directly, but you could do
lst <- split(t(my.data), 1:3)
names(lst) <- names(my.data)
lst
#$A
#[1] 1 2 3 4 5
#
#$B
#[1] 6 7 8 9 10
#
#$C
#[1] 11 12 13 14 15
Turn vector entries into data.frames with
lapply(lst, as.data.frame)
You can use split.default, i.e.
split.default(my.data, seq_along(my.data))
$`1`
A
1 1
2 2
3 3
4 4
5 5
$`2`
B
1 6
2 7
3 8
4 9
5 10
$`3`
C
1 11
2 12
3 13
4 14
5 15
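If the list elements should be named after the columns rather than "1", "2", "3", a variant worth trying (a sketch under that assumption, not taken from the answers above) is to loop over the column names and use single-bracket indexing, which keeps each piece a data.frame:

```r
my.data <- data.frame(A = 1:5, B = 6:10, C = 11:15)

# lapply() over a named character vector returns a list with the same names;
# my.data[nm] (single bracket) returns a one-column data.frame.
lst <- lapply(setNames(names(my.data), names(my.data)),
              function(nm) my.data[nm])
```

`lst$B` is then a 5x1 data.frame holding only column B.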

cbind named vectors in R by name

I have two named vectors similar to these ones:
x <- c(1:5)
names(x) <- c("a","b","c","d","e")
t <- c(6:10)
names(t) <- c("e","d","c","b","a")
I would like to combine them so as to get the following outcome:
x t
a 1 10
b 2 9
c 3 8
d 4 7
e 5 6
Unfortunately, when I run cbind(x,t) the result combines them in their existing order, disregarding the names of t and keeping only those of x. This gives the following result:
x t
a 1 6
b 2 7
c 3 8
d 4 9
e 5 10
I'm pretty sure there must be an easy solution, but I cannot find it. As this step is part of a long and tedious loop (and the vectors I'm working with are much longer), it is important to have the least convoluted and fastest option.
We can use the names of 'x' to reorder the elements of 't' and then cbind with 'x'
cbind(x, t = t[names(x)])
# x t
#a 1 10
#b 2 9
#c 3 8
#d 4 7
#e 5 6
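Indexing by name only lines up correctly when both vectors carry the same set of names. A self-contained sketch with a guard for that (the `stopifnot` check and the name `m` are additions for illustration, not part of the answer):

```r
x <- setNames(1:5, c("a", "b", "c", "d", "e"))
t <- setNames(6:10, c("e", "d", "c", "b", "a"))

# Added assumption: both vectors have exactly the same names.
stopifnot(setequal(names(x), names(t)))

# t[names(x)] looks up each element of t by name, in the order of x.
m <- cbind(x, t = t[names(x)])
```

If the names could differ, the lookup would produce NA for any name missing from `t`, so the guard fails early instead.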

Convert a full length column to one variable in a row in R

I was wondering if it is possible to convert 1 column into 1 variable next to each other
i.e.:
d <- data.frame(y = 1:10)
> d
y
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
Convert this column into:
> d
1 2 3 4 5 6 7 8 9 10
We don't know how you are going to use the numbers, but I think no transformation is necessary. You can use d$y directly to index into any color palette. For example:
d <- data.frame(y = 1:7)
library(RColorBrewer)
mypalette<-brewer.pal(4,"Greens")
mycol <-palette()#rainbow(7)
heatmap(matrix(1:28,ncol=4),col=mypalette[d$y[1:4]],xlab="Greens (sequential)",
ylab="",xaxt="n",yaxt="n",bty="n",RowSideColors=mycol[d$y])
Not sure what the purpose is of:
1 variable next to each other
But there are a few ways to get the desired result (again, depending on the objective). You can do either:
d$y
unname(unlist(d)) #suggested by agstudy
or, better yet, to convert your dataframe's column into a vector, do this:
v <- as.vector(d[,1])
As a single string:
args <- paste(d$y, collapse = " ")
args <- noquote(args)
now you'll have
[1] 1 2 3 4 5 6 7 8 9 10
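If the goal is literally one row with ten columns rather than a vector or a string, a possible sketch (an assumption about what was meant by the question; `wide` is an illustrative name) is to transpose the column and convert back to a data.frame:

```r
d <- data.frame(y = 1:10)

# Plain vector of the column's values.
v <- d$y

# t() gives a 1 x 10 matrix; as.data.frame() turns it into ten columns,
# which get the default names V1..V10.
wide <- as.data.frame(t(v))
```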

Extract data from data.frame based on coordinates in another data.frame

So here is my problem. I have a really big data.frame with two columns: the first represents x coordinates (rows) and the second y coordinates (columns), for example:
x y
1 1
2 3
3 1
4 2
3 4
In another frame I have some data (numbers actually):
a b c d
8 7 8 1
1 2 3 4
5 4 7 8
7 8 9 7
1 5 2 3
I would like to add a third column in first data.frame with data from second data.frame based on coordinates from first data.frame. So the result should look like this:
x y z
1 1 8
2 3 3
3 1 5
4 2 8
3 4 8
Since my data.frames are really big, for loops are too slow. I think there is a way to do this with the apply family of functions, but I can't find how. Thanks in advance (and sorry for the ugly message layout; this is my first post here and I don't know how to produce the nice layout with code and proper data.frames that I see in other questions).
This is a simple indexing question. No need for external packages or *apply loops, just do
df1$z <- df2[as.matrix(df1)]
df1
# x y z
# 1 1 1 8
# 2 2 3 3
# 3 3 1 5
# 4 4 2 8
# 5 3 4 8
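The line above relies on matrix indexing: when a data.frame is indexed with a two-column numeric matrix, each row of that matrix is read as a (row, column) pair. A self-contained sketch using the data from the question (the intermediate name `idx` is only illustrative):

```r
df1 <- data.frame(x = c(1, 2, 3, 4, 3),
                  y = c(1, 3, 1, 2, 4))
df2 <- data.frame(a = c(8, 1, 5, 7, 1),
                  b = c(7, 2, 4, 8, 5),
                  c = c(8, 3, 7, 9, 2),
                  d = c(1, 4, 8, 7, 3))

# Each row of idx is a (row, column) pair into df2.
# Selecting only x and y keeps this safe to re-run after z has been added.
idx <- as.matrix(df1[, c("x", "y")])
df1$z <- df2[idx]
```

Note that this form of indexing converts `df2` to a matrix internally, so it is best suited to data frames whose columns all share one type.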
A base R solution: (df1 and df2 are coordinates and numbers as data frames):
df1$z <- mapply(function(x,y) df2[x,y], df1$x, df1$y )
It works once the last y in the first data frame is corrected from 5 to 4.
I guess that was a typo, since the second data frame does not have 5 columns.
Here's how I would do this.
First, use data.table for fast merging; then convert your data frames (I'll call them dt1 with coordinates and vals with values) to data.tables.
library(data.table)
dt1 <- data.table(dt1)
vals <- data.table(vals)
Second, put vals into a new data.table with coordinates:
vals_dt <- data.table(x = rep(1:nrow(vals), ncol(vals)),
                      y = rep(1:ncol(vals), each = nrow(vals)),
                      z = unlist(vals, use.names = FALSE),  # values in column-major order
                      key = c("x", "y"))
Now merge:
setkey(dt1,x,y)[vals_dt,z:=z]
You can also try the data.table package and update df1 by reference
library(data.table)
setDT(df1)[, z := df2[cbind(x, y)]][]
# x y z
# 1: 1 1 8
# 2: 2 3 3
# 3: 3 1 5
# 4: 4 2 8
# 5: 3 4 8

Subset dataframe in a list by a dataframe column criteria

I have a list of dataframes. I need to select a dataframe from this list according to a criterion on one column of the dataframe.
(all dataframes of the list have the same number and names of columns, and the same number of rows)
For example, I have:
l <- list(data.frame(x=c(2,3,4,5), y = c(4,4,4,4), z=c(2,3,4,5)),
data.frame(x=c(1,4,7,3), y = c(7,7,7,7), z=c(2,5,7,8)),
data.frame(x=c(2,3,1,8), y = c(1,1,1,1), z=c(6,4,1,3)))
names(l) <- c("MH1", "MH2","MH3")
output
$MH1
x y z
1 2 4 2
2 3 4 3
3 4 4 4
4 5 4 5
$MH2
x y z
1 1 7 2
2 4 7 5
3 7 7 7
4 3 7 8
$MH3
x y z
1 2 1 6
2 3 1 4
3 1 1 1
4 8 1 3
So I want to select the dataframe whose column "y" is closest to a given number. For example, if I say a=3, the chosen dataframe should be "MH1" (where column y=4).
If "l" were a dataframe I would do something like:
closestDF <- subset(l, abs(l$y - a) == min(abs(l$y - a)))
How can I do this with the list of dataframes?
Following the answers and comments of @David Arenburg, @akrun and @shadow, here are three possible solutions to the problem I posted:
Option 1)
library(data.table)
rbindlist(l)[abs(y - a) == min(abs(y - a))]
Option 2) (needs an R version > 3.1.2)
library(dplyr)
bind_rows(l) %>% filter(abs(y - a) == min(abs(y - a)))
Option 3) (also works perfectly, but is computationally slower than the first two options if used within a big loop or an iterative process)
l[[which.min(sapply(l, function(df) sum(abs(df$y - a))))]]
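To see what Option 3 computes, here is a self-contained sketch on the example list with a = 3 (the intermediate names `dist_per_df` and `closest_name` are introduced only for illustration):

```r
l <- list(MH1 = data.frame(x = c(2, 3, 4, 5), y = c(4, 4, 4, 4), z = c(2, 3, 4, 5)),
          MH2 = data.frame(x = c(1, 4, 7, 3), y = c(7, 7, 7, 7), z = c(2, 5, 7, 8)),
          MH3 = data.frame(x = c(2, 3, 1, 8), y = c(1, 1, 1, 1), z = c(6, 4, 1, 3)))
a <- 3

# Total absolute distance of column y from a, one value per data.frame.
dist_per_df <- sapply(l, function(df) sum(abs(df$y - a)))

# which.min() returns the position (with its name) of the smallest distance.
closest_name <- names(which.min(dist_per_df))
closestDF <- l[[closest_name]]
```

Unlike Options 1 and 2, which return only the matching rows of the combined table, this returns the whole data.frame from the list and also tells you which one it was.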
