How to split each column into its own data frame? [duplicate]

This question already has answers here:
Split data.frame into groups by column name
(2 answers)
Closed 4 years ago.
I have a data frame with 3 columns, for example:
my.data <- data.frame(A=c(1:5), B=c(6:10), C=c(11:15))
I would like to split each column into its own data frame (so I'd end up with a list containing three data frames). I tried to use the "split" function but I don't know what I would set as the factor argument. I tried this:
data.split <- split(my.data, my.data[,1:3])
but that's definitely wrong and just gives me a bunch of empty data frames. It sounds fairly simple but after searching through previous questions I haven't come across a way to do this.

Not sure why you'd want to do that, since lapply already lets you operate on the columns directly, but you could do:
lst <- split(t(my.data), 1:3)
names(lst) <- names(my.data)
lst
#$A
#[1] 1 2 3 4 5
#
#$B
#[1] 6 7 8 9 10
#
#$C
#[1] 11 12 13 14 15
Convert each vector into a data.frame with:
lapply(lst, as.data.frame)

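You can call split.default directly, which treats the data frame as a list of columns (plain split would dispatch to the data frame method and split rows instead), i.e.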
split.default(my.data, seq_along(my.data))
$`1`
A
1 1
2 2
3 3
4 4
5 5
$`2`
B
1 6
2 7
3 8
4 9
5 10
$`3`
C
1 11
2 12
3 13
4 14
5 15
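If the list should be named after the columns rather than `1`, `2`, `3`, one possible variation is to loop over the column names instead. This is a minimal sketch rather than part of either answer; the variable name `cols` is made up for illustration.

```r
my.data <- data.frame(A = c(1:5), B = c(6:10), C = c(11:15))

# Single-bracket indexing with a column name keeps the data.frame class,
# so each list element is a one-column data frame.
# With simplify = FALSE, sapply returns a list named after the character
# vector it iterates over (here: "A", "B", "C").
cols <- sapply(names(my.data), function(nm) my.data[nm], simplify = FALSE)

names(cols)
cols$B
```

Looping over names keeps the original column order and avoids having to rename the list afterwards.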

Related

Create all possible combinations from two values for each element in a vector in R [duplicate]

This question already has answers here:
How to generate a matrix of combinations
(3 answers)
Closed 6 years ago.
I have been trying to create vectors where each element can take two different values present in two different vectors.
For example, if there are two vectors a and b, where a is c(6,2,9) and b is c(12,5,15) then the output should be 8 vectors given as follows,
6 2 9
6 2 15
6 5 9
6 5 15
12 2 9
12 2 15
12 5 9
12 5 15
The following piece of code works,
aa1 <- c(6,12)
aa2 <- c(2,5)
aa3 <- c(9,15)
for(a1 in 1:2)
for(a2 in 1:2)
for(a3 in 1:2)
{
v <- c(aa1[a1],aa2[a2],aa3[a3])
print(v)
}
But I was wondering if there is a simpler way to do this than writing nested for loops, since the number of loops grows with the number of elements in the final vector.
expand.grid makes all combinations of whatever vectors you pass it. In this case you first need to rearrange your vectors into a pair of first elements, a pair of second elements, and a pair of third elements, so the ultimate call is:
expand.grid(c(6, 12), c(2, 5), c(9, 15))
A quick way to rearrange the vectors in base R is Map, the multivariate version of lapply, with c() as the function:
a <- c(6, 2, 9)
b <- c(12, 5, 15)
Map(c, a, b)
## [[1]]
## [1] 6 12
##
## [[2]]
## [1] 2 5
##
## [[3]]
## [1] 9 15
Conveniently expand.grid is happy with either individual vectors or a list of vectors, so we can just call:
expand.grid(Map(c, a, b))
## Var1 Var2 Var3
## 1 6 2 9
## 2 12 2 9
## 3 6 5 9
## 4 12 5 9
## 5 6 2 15
## 6 12 2 15
## 7 6 5 15
## 8 12 5 15
If Map is confusing, an alternative is to put a and b in a list and call purrr::transpose, which flips a list of two elements of length three into a list of three elements of length two. The pipeline below returns the same thing:
library(purrr)
list(a, b) %>% transpose() %>% expand.grid()
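Note that expand.grid varies its first argument fastest, so the rows above come out in a different order from the listing in the question, where the last element changes fastest. If that order matters, one possible approach (a sketch, with `pairs`, `grid`, and `combos` as illustrative names) is to reverse the inputs and then reverse the columns back:

```r
a <- c(6, 2, 9)
b <- c(12, 5, 15)

# Pair up the i-th elements of a and b: list(c(6, 12), c(2, 5), c(9, 15))
pairs <- Map(c, a, b)

# Expand the reversed list so that the LAST position varies fastest,
# then put the columns back into their original order.
grid <- expand.grid(rev(pairs))
combos <- unname(as.matrix(grid[, rev(seq_along(pairs))]))

combos
```

Each row of `combos` is one of the 8 vectors, starting with 6 2 9 and ending with 12 5 15.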
I think what you're looking for is expand.grid.
a <- c(6,2,9)
b <- c(12,5,15)
expand.grid(a,b)
Var1 Var2
1 6 12
2 2 12
3 9 12
4 6 5
5 2 5
6 9 5
7 6 15
8 2 15
9 9 15

How can I make a list from an existing data frame, where each element of the list is a vector built from one or more rows of the data frame?

I am very new to R, still getting my head around so my question can be very basic but please help me out!
I have a large data frame, with more than 400000 rows.
GENE_ID p1 p2 p3 ...
41 1 2 3
41 4 5 6
41 7 8 9
85 1 2 3
1923 1 2 3
1923 4 5 6
First, I wanted to simply use GENE_ID as the row names, but that failed because some gene IDs are not unique.
Now I am thinking of turning this data frame into a list in which each element contains the expression levels of one gene.
So what I want is a list that looks something like this:
mylist$`41`
[1] 1 2 3 4 5 6 7 8 9
mylist$`85`
[1] 1 2 3
mylist$`1923`
[1] 1 2 3 4 5 6
Any advice to achieve this would be greatly appreciated.
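We can melt by 'GENE_ID' and then split to get a list of vectors. Note that the values within each gene come out column by column (all of p1, then p2, then p3), not row by row:
library(reshape2)
mylist <- melt(df1, id.vars = 'GENE_ID')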
split(mylist$value, mylist$GENE_ID)
#$`41`
#[1] 1 4 7 2 5 8 3 6 9
#$`85`
#[1] 1 2 3
#$`1923`
#[1] 1 4 2 5 3 6
Also, we can do this in base R
v1 <- unlist(df1[-1], use.names = FALSE)
grp <- rep(df1[,1], ncol(df1[-1]))
split(v1, grp)
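If the row-by-row order shown in the question (1 2 3 4 5 6 7 8 9 for gene 41) is needed, one possible base R variation is to split the rows first and flatten each group afterwards. This is a sketch only: `df1` below is reconstructed from the sample rows in the question, and `by_gene` is an illustrative name.

```r
# Sample data rebuilt from the rows shown in the question (an assumption;
# the real data frame has many more rows and possibly more columns).
df1 <- data.frame(
  GENE_ID = c(41, 41, 41, 85, 1923, 1923),
  p1 = c(1, 4, 7, 1, 1, 4),
  p2 = c(2, 5, 8, 2, 2, 5),
  p3 = c(3, 6, 9, 3, 3, 6)
)

# Split the expression columns into one small data frame per gene.
by_gene <- split(df1[-1], df1$GENE_ID)

# t() puts each original row into a column, and as.vector() reads the
# matrix column by column, so the values come out row by row.
mylist <- lapply(by_gene, function(d) as.vector(t(as.matrix(d))))

mylist[["41"]]
```

Elements are accessed with backticks or double brackets, e.g. mylist$`41` or mylist[["41"]], because the names start with a digit.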

R Subset using first and last column names of interest [duplicate]

This question already has answers here:
refer to range of columns by name in R
(6 answers)
Closed 6 years ago.
> df
a b c d e
1 1 4 7 10 13
2 2 5 8 11 14
3 3 6 9 12 15
To subset the columns b, c, d we can use df[,2:4] or df[,c("b", "c", "d")]. However, I am looking for a solution that fetches the columns b, c, d using something like df[,b:d]. In other words, I want to use only the first and last column names of interest to subset the data. I have been looking for a solution but have been unsuccessful. All the examples I have seen to date refer to every specific column name while subsetting.
It's also simple in base R, e.g.:
subset(df, select=b:d)
Or roll your own:
df[do.call(seq, as.list(match(c("b","d"), names(df))) )]
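The "roll your own" one-liner can be wrapped in a small function if it is needed in several places. The helper below is hypothetical (`col_range` is not a base R function), and the error check is an added assumption about what should happen when a name is missing.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)

# Hypothetical helper: select every column from `first` to `last`, by name.
col_range <- function(data, first, last) {
  idx <- match(c(first, last), names(data))
  if (anyNA(idx)) stop("column name not found in data")
  # seq() also works when `last` comes before `first`; the columns are
  # then returned in reverse order.
  data[seq(idx[1], idx[2])]
}

res <- col_range(df, "b", "d")
res
```

Because the column names are passed as strings, this also works when the names are stored in variables, which the non-standard evaluation in subset(df, select = b:d) does not handle directly.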
If you are open to using dplyr:
dplyr::select(df, b:d)
b c d
1 4 7 10
2 5 8 11
3 6 9 12

Turn 3x3 data.frame into 1x9 data.frame while preserving row and column names

I am having trouble coming up with an elegant solution to this seemingly simple data manipulation problem. I can see a looped solution but I assume there is a 1-2 function single-line solution.
Here is what I have:
x <- data.frame(c1=c(1,2,3),
c2=c(4,5,6),
c3=c(7,8,9),
row.names = c("r1","r2","r3"))
> x
c1 c2 c3
r1 1 4 7
r2 2 5 8
r3 3 6 9
And here is what I want:
> y
c1.r1 c1.r2 c1.r3 c2.r1 c2.r2 c2.r3 c3.r1 c3.r2 c3.r3
1 1 2 3 4 5 6 7 8 9
How do I manipulate x to give me y?
Here's one way to do it:
R> unlist(lapply(x, setNames, rownames(x)))
c1.r1 c1.r2 c1.r3 c2.r1 c2.r2 c2.r3 c3.r1 c3.r2 c3.r3
1 2 3 4 5 6 7 8 9
A data.frame is a list, so lapply just loops over the columns. Then it sets the names of each vector to the rownames of the data.frame. Then unlist flattens the list to a vector (recursively, setting names, by default).
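The result above is a named numeric vector rather than a data.frame. If a true 1x9 data.frame is required, as in the question, one possible extra step is to convert the named vector to a list first. This is a sketch; `v` and `y` are illustrative names.

```r
x <- data.frame(c1 = c(1, 2, 3),
                c2 = c(4, 5, 6),
                c3 = c(7, 8, 9),
                row.names = c("r1", "r2", "r3"))

# Named vector with names like "c1.r1", as in the answer above.
v <- unlist(lapply(x, setNames, rownames(x)))

# Each element of the named list becomes one column of a one-row data frame.
y <- as.data.frame(as.list(v))
y
```

The conversion through as.list keeps one column per name, so the column names c1.r1 through c3.r3 carry over unchanged.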

Making a data frame that is a subset of two data frames

I am stumped again.
I have two data frames
dataframe1
a b c
[1] 21 12 22
[2] 11 9 6
[3] 4 6 7
and
dataframe2
f g h
[1] 21 12 22
[2] 11 9 6
[3] 4 6 7
I want to take the first column of dataframe1 and make three new dataframes with the second column being each of the three f,g and h
Obviously I could just do a subset over and over
subset1 <- cbind(dataframe1[,1], dataframe2[,1])
subset2 <- cbind(dataframe1[,1], dataframe2[,2])
but my data frames will have variable numbers of columns and a very large number of rows, so I am looking for something a little more general. My data frames will always have the same number of rows.
The closest I have come was with apply and cbind, but I got either a set of three rows (a and f, a and g, a and h), each combined into a single numeric vector, or a single data frame with four columns: a, f, g, h.
Help is deeply appreciated.
You can use lapply to iterate over the columns of dataframe2 like so:
lapply(dataframe2, function(x) as.data.frame(cbind(dataframe1[,1], x)))
This will result in a list object where each entry corresponds to a column of dataframe2. For example:
$f
V1 x
1 21 21
2 11 11
3 4 4
$g
V1 x
1 21 12
2 11 9
3 4 6
$h
V1 x
1 21 22
2 11 6
3 4 7
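In the output above the columns are named V1 and x, because cbind is working on bare vectors. If the original column names should be kept, one possible variation is to bind one-column data frames instead. This is a sketch: the two data frames are rebuilt from the values in the question, and `key` and `pairs` are illustrative names.

```r
# Data rebuilt from the values printed in the question.
dataframe1 <- data.frame(a = c(21, 11, 4), b = c(12, 9, 6), c = c(22, 6, 7))
dataframe2 <- data.frame(f = c(21, 11, 4), g = c(12, 9, 6), h = c(22, 6, 7))

# Name of the first column of dataframe1.
key <- names(dataframe1)[1]

# Single-bracket indexing keeps each piece a data frame, so cbind
# preserves the column names a, f, g, h.
pairs <- lapply(names(dataframe2), function(nm) {
  cbind(dataframe1[key], dataframe2[nm])
})
names(pairs) <- names(dataframe2)

pairs$g
```

This works for any number of columns in dataframe2, as long as both data frames have the same number of rows.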
