I have a data.frame with several columns I'd like to join into one column in a new data.frame.
df1 <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9)
How would I create a new data.frame with a single column that's 1:9?
Since data.frames are essentially lists of columns, unlist(df1) will give you one large vector of all the values. Now you can simply construct a new data.frame from it:
data.frame(col = unlist(df1))
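As a minimal sketch of the point above (a data.frame is a list of columns, so unlist() concatenates them in column order), the following rebuilds df1 from the question. The use.names = FALSE argument is an addition not in the original answer; it only drops the col11, col12, ... names that would otherwise become row names.

```r
# Rebuild the example data from the question
df1 <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9)

# A data.frame is a list whose elements are the columns
is.list(df1)

# unlist() concatenates the columns in order; use.names = FALSE
# (an assumption about what you want) keeps the row names as 1..9
long <- data.frame(col = unlist(df1, use.names = FALSE))
long$col
```

Leave use.names at its default if you would rather keep the source column encoded in the row names.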
In case you want an indicator too:
stack(df1)
# values ind
# 1 1 col1
# 2 2 col1
# 3 3 col1
# 4 4 col2
# 5 5 col2
# 6 6 col2
# 7 7 col3
# 8 8 col3
# 9 9 col3
For completeness, here is the tidyr way. (gather() still works, but since tidyr 1.0.0 it has been superseded by pivot_longer().)
library(tidyr)
gather(df1)
key value
1 col1 1
2 col1 2
3 col1 3
4 col2 4
5 col2 5
6 col2 6
7 col3 7
8 col3 8
9 col3 9
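For newer tidyr versions, a rough equivalent of the gather() call can be sketched with pivot_longer(). The names "key" and "value" are chosen here only to mirror the gather() output. One difference to be aware of: pivot_longer() works row by row, so the result has to be reordered by key if you want the values to run 1:9.

```r
library(tidyr)

df1 <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9)

# Reshape every column into key/value pairs (output is ordered row by row)
long <- pivot_longer(df1, cols = everything(),
                     names_to = "key", values_to = "value")

# Reorder by source column so the values run col1, col2, col3
long <- long[order(long$key), ]
long
```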
One more, using c() with recursive = TRUE:
data.frame(col11 = c(df1,recursive=TRUE))
col11
col11 1
col12 2
col13 3
col21 4
col22 5
col23 6
col31 7
col32 8
col33 9
You could try:
as.data.frame(as.vector(as.matrix(df1)))
# as.vector(as.matrix(df1))
#1 1
#2 2
#3 3
#4 4
#5 5
#6 6
#7 7
#8 8
#9 9
Another approach, just for using Reduce...
data.frame(Reduce(c, df1))
I have a list with two dataframes, the first of which has two columns and the second of which has three.
dat.list<-list(dat1=data.frame(col1=c(1,2,3),
col2=c(10,20,30)),
dat2= data.frame(col1=c(5,6,7),
col2=c(30,40,50),
col3=c(7,8,9)))
# $dat1
# col1 col2
# 1 1 10
# 2 2 20
# 3 3 30
# $dat2
# col1 col2 col3
# 1 5 30 7
# 2 6 40 8
# 3 7 50 9
I am trying to create a new column in both dataframes using map(), mutate() and case_when(). I want this new column to be identical to col3 if the dataframe has more than two columns, and identical to col1 if it has two or fewer columns. I have tried to do this with the following code:
library(tidyverse)
dat.list %>% map(~ .x %>%
mutate(newcol=case_when(ncol(.)>2 ~ col3,
TRUE ~ col1),
))
However, this returns the following error: "object 'col3' not found". How can I get the desired output? Below is the exact output I am trying to achieve.
# $dat1
# col1 col2 newcol
# 1 1 10 1
# 2 2 20 2
# 3 3 30 3
# $dat2
# col1 col2 col3 newcol
# 1 5 30 7 7
# 2 6 40 8 8
# 3 7 50 9 9
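A plain if/else will do. case_when() evaluates every right-hand side before choosing, so col3 has to exist even in the data frame where the condition is FALSE; if/else evaluates only the branch it takes: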
library(dplyr)
library(purrr)
dat.list %>% map(~ .x %>% mutate(newcol= if(ncol(.) > 2) col3 else col1))
#$dat1
# col1 col2 newcol
#1 1 10 1
#2 2 20 2
#3 3 30 3
#$dat2
# col1 col2 col3 newcol
#1 5 30 7 7
#2 6 40 8 8
#3 7 50 9 9
Base R using lapply:
lapply(dat.list, function(x) transform(x, newcol = if (ncol(x) > 2) col3 else col1))
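A variation on the same idea, as a sketch only: instead of counting columns, test whether col3 actually exists and pick the source column by name. The helper add_newcol is a hypothetical name introduced for illustration, and checking for "col3" rather than ncol() is an assumption about what the asker really needs.

```r
dat.list <- list(
  dat1 = data.frame(col1 = c(1, 2, 3), col2 = c(10, 20, 30)),
  dat2 = data.frame(col1 = c(5, 6, 7), col2 = c(30, 40, 50), col3 = c(7, 8, 9))
)

# Hypothetical helper: copy col3 when present, otherwise fall back to col1
add_newcol <- function(x) {
  src <- if ("col3" %in% names(x)) "col3" else "col1"
  x$newcol <- x[[src]]
  x
}

res <- lapply(dat.list, add_newcol)
res
```

Because the column is looked up by name with [[ ]], nothing is evaluated for a column that is missing.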
Following a question I came across today, I would like to know how I can use bind_rows function in a pipe while avoiding duplication and NA values. Consider I have the following simple tibble:
df <- tibble(
col1 = c(3, 4, 5),
col2 = c(5, 3, 1),
col3 = c(6, 4, 9),
col4 = c(9, 6, 5)
)
I would like to bind col1 & col2 row-wise with col3 & col4 so that I have a tibble with 2 columns and 6 observations, and finally rename the columns to colnew1 and colnew2.
But when I use bind_rows I get the following output, with a lot of duplicated rows and NA values.
df %>%
bind_rows(
select(., 1:2),
select(., 3:4)
)
# A tibble: 9 x 4
col1 col2 col3 col4
<dbl> <dbl> <dbl> <dbl>
1 3 5 6 9
2 4 3 4 6
3 5 1 9 5
4 3 5 NA NA
5 4 3 NA NA
6 5 1 NA NA
7 NA NA 6 9
8 NA NA 4 6
9 NA NA 9 5
# My desired output would be something like this:
f1 <- function(x) {
df <- x %>%
set_names(nm = rep(c("newcol1", "newcol2"), 2))
bind_rows(df[, c(1, 2)], df[, c(3, 4)])
}
f1(df)
# A tibble: 6 x 2
newcol1 newcol2
<dbl> <dbl>
1 3 5
2 4 3
3 5 1
4 6 9
5 4 6
6 9 5
I can get the desired output without a pipe. First, I would like to know how to use bind_rows in a pipe without getting NA values and duplicated rows. Second, can I use the select function inside bind_rows? I remember Hadley Wickham once wrapping filter calls in bind_rows.
I would appreciate any explanation to this problem and thank you in advance.
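In the original attempt the pipe passes df itself as the first argument to bind_rows, and bind_rows matches columns by name, hence the three extra rows and the NAs. Instead, select the first two columns, then bind to them a copy of col3 and col4 renamed to col1 and col2 with transmute: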
df1 <- df %>%
select(col1, col2) %>%
bind_rows(
df %>%
transmute(col1 = col3, col2 = col4)
)
Results:
# A tibble: 6 x 2
col1 col2
<dbl> <dbl>
1 3 5
2 4 3
3 5 1
4 6 9
5 4 6
6 9 5
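If you would rather keep select() inside bind_rows() as in the question, one possible sketch is to wrap the call in braces, which stops the magrittr pipe from also inserting df as the first argument, and to rename inside select() so both pieces share column names. The names colnew1 and colnew2 follow the wording of the question; the brace approach is offered as an alternative, not as the answerer's method.

```r
library(dplyr)

df <- tibble(
  col1 = c(3, 4, 5),
  col2 = c(5, 3, 1),
  col3 = c(6, 4, 9),
  col4 = c(9, 6, 5)
)

# Braces: the pipe supplies `.` only where it is written explicitly.
# select() can rename while selecting, so both halves line up by name.
res <- df %>%
  {
    bind_rows(
      select(., colnew1 = col1, colnew2 = col2),
      select(., colnew1 = col3, colnew2 = col4)
    )
  }
res
```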
I have a data table in R, called A, which has three columns Col1, Col2, and Col3. Another table, called B, also has the same three columns. I want to remove all the rows in table A for which the pair (Col1, Col2) is present in table B. I have tried, but I am not sure how to do this, and I have been stuck on it for the last few days.
Thanks,
library(data.table)
A = data.table(Col1 = 1:4, Col2 = 4:1, Col3 = letters[1:4])
# Col1 Col2 Col3
#1: 1 4 a
#2: 2 3 b
#3: 3 2 c
#4: 4 1 d
B = data.table(Col1 = c(1,3,5), Col2 = c(4,2,1))
# Col1 Col2
#1: 1 4
#2: 3 2
#3: 5 1
A[!B, on = c("Col1", "Col2")]
# Col1 Col2 Col3
#1: 2 3 b
#2: 4 1 d
We can use anti_join
library(dplyr)
anti_join(A, B, by = c('Col1', 'Col2'))
Here's a go, using interaction:
A <- data.frame(Col1=1:3, Col2=2:4, Col3=10:12)
B <- data.frame(Col1=1:2, Col2=2:3, Col3=10:11)
A
# Col1 Col2 Col3
#1 1 2 10
#2 2 3 11
#3 3 4 12
B
# Col1 Col2 Col3
#1 1 2 10
#2 2 3 11
byv <- c("Col1","Col2")
A[!(interaction(A[byv]) %in% interaction(B[byv])),]
# Col1 Col2 Col3
#3 3 4 12
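Or create a unique id for each row, and then exclude those that merged (note that if nothing merges, the negative index is empty and A[-integer(0), ] returns no rows at all):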
A[-merge(cbind(A[byv],id=seq_len(nrow(A))), B[byv], by=byv)$id,]
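A slightly more defensive sketch of the id approach uses a logical filter instead of a negative index, so that the case where no rows of A match B still returns all of A. The variable names hit and res are introduced here for illustration only.

```r
A <- data.frame(Col1 = 1:3, Col2 = 2:4, Col3 = 10:12)
B <- data.frame(Col1 = 1:2, Col2 = 2:3, Col3 = 10:11)
byv <- c("Col1", "Col2")

# ids of the rows of A whose (Col1, Col2) pair also occurs in B
hit <- merge(cbind(A[byv], id = seq_len(nrow(A))), B[byv], by = byv)$id

# Keep rows whose id was not matched; an empty `hit` keeps every row
res <- A[!(seq_len(nrow(A)) %in% hit), ]
res
```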
I have below data frame
col1 <- c("A","B", "A")
col2 <- c("C","D","D")
col3 <- c("E","E","E")
col4 <- c("F","F","H")
x <- data.frame(col1,col2,col3,col4)
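Printing the above data frame gives:
# col1 col2 col3 col4
# 1 A C E F
# 2 B D E F
# 3 A D E H
I want to replace the characters with numbers, as below:
# col1 col2 col3 col4
# 1 1 3 5 6
# 2 2 4 5 6
# 3 1 4 5 7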
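Here's a one-liner in base R that works with any number of columns and any names; nothing is hard-coded. It relies on the columns being factors, which was the data.frame() default before R 4.0.0; on R 4.0.0 or later, build x with stringsAsFactors = TRUE first: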
> setNames(data.frame(matrix(as.numeric(unlist(x)),ncol=ncol(x))),names(x))
col1 col2 col3 col4
1 1 3 5 6
2 2 4 5 6
3 1 4 5 7
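The same idea as a pipe (load magrittr or dplyr for %>%; this also relies on the columns being factors):
library(magrittr)
x <- x %>%
  unlist %>%
  as.numeric %>%
  matrix(ncol = 4) %>%
  data.frame
names(x) <- paste0("col", 1:4)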
x
col1 col2 col3 col4
1 1 3 5 6
2 2 4 5 6
3 1 4 5 7
Here is a solution with base R:
x[] <- match(as.matrix(x), unique(c(as.matrix(x))))
# > x
# col1 col2 col3 col4
# 1 1 3 5 6
# 2 2 4 5 6
# 3 1 4 5 7
Here is a shorter solution, which works when the columns are factors:
x[] <- as.integer(unlist(x))
data:
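# stringsAsFactors=TRUE is needed on R >= 4.0.0 for the shorter solution above
x <- data.frame(col1=c("A","B", "A"), col2=c("C","D","D"), col3=c("E","E","E"), col4=c("F","F","H"), stringsAsFactors=TRUE)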
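Several of the answers here depend on whether the columns are factors. A sketch that behaves the same for character columns is to build one shared set of levels across the whole data frame and convert each column against it. The name lv is introduced for illustration, and sorting the levels alphabetically is an assumption that happens to reproduce the numbering shown above.

```r
x <- data.frame(col1 = c("A", "B", "A"), col2 = c("C", "D", "D"),
                col3 = c("E", "E", "E"), col4 = c("F", "F", "H"),
                stringsAsFactors = FALSE)

# One shared, sorted set of levels across all columns
lv <- sort(unique(unlist(lapply(x, as.character), use.names = FALSE)))

# Convert each column to its integer position within the shared levels
x[] <- lapply(x, function(col) as.integer(factor(col, levels = lv)))
x
```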
We can use lapply from base R. Note that this maps each letter to its position in the alphabet, so H becomes 8 rather than 7:
x[] <- lapply(x, match, LETTERS)
x
# col1 col2 col3 col4
#1 1 3 5 6
#2 2 4 5 6
#3 1 4 5 8
Given a data frame:
df=data.frame(co1=c(5,9,6,1,6),co2=c(8,5,4,6,2),co3=c(6,5,4,1,2),co4=c(6,1,5,3,2),co5=c(5,1,2,6,8))
rownames(df)=c('row1','row2','row3','row4','row5')
df
# co1 co2 co3 co4 co5
#row1 5 8 6 6 5
#row2 9 5 5 1 1
#row3 6 4 4 5 2
#row4 1 6 1 3 6
#row5 6 2 2 2 8
How can I select the values greater than 5, and find which row and column each of them is in? That is, how do I get a data frame like this:
# rownames colnames value
# row1 co2 8
# row1 co3 6
# row1 co4 6
# row2 co1 9
# row3 co1 6
# ... ... ...
We can use melt with subset
library(reshape2)
subset(melt(as.matrix(df)), value>5)
Or if you prefer tidyverse,
library(tidyverse)
df %>%
rownames_to_column() %>%
gather(colname,value,-rowname) %>%
filter(value > 5)
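For comparison, a base R sketch without any packages can use which() with arr.ind = TRUE, which returns the row and column index of every matching cell. The names m, idx and res are introduced here for illustration; note that the rows come out in column-major order rather than row by row.

```r
df <- data.frame(co1 = c(5, 9, 6, 1, 6), co2 = c(8, 5, 4, 6, 2),
                 co3 = c(6, 5, 4, 1, 2), co4 = c(6, 1, 5, 3, 2),
                 co5 = c(5, 1, 2, 6, 8))
rownames(df) <- c("row1", "row2", "row3", "row4", "row5")

m <- as.matrix(df)

# Two-column matrix ("row", "col") of positions where the value exceeds 5
idx <- which(m > 5, arr.ind = TRUE)

res <- data.frame(
  rownames = rownames(m)[idx[, "row"]],
  colnames = colnames(m)[idx[, "col"]],
  value    = m[idx]   # matrix indexing picks one cell per (row, col) pair
)
res
```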