Collapse several columns of data frame into one data frame - r

For some reason, I have data in which a few columns are themselves data frames, each consisting of a single column. I want to "collapse" these data-frame columns so that the result is one flat data frame.
library(tidyverse)
df <- tibble(col1=1:5,
col2=tibble(newcol=LETTERS[1:5]),
col3=tibble(newcol2=LETTERS[6:10]))
df
# A tibble: 5 x 3
col1 col2$newcol col3$newcol2
<int> <chr> <chr>
1 1 A F
2 2 B G
3 3 C H
4 4 D I
5 5 E J
I have tried unnest(), but it replicates the tibbles in col2 and col3 for each row of col1, which is not what I want.
df2 <- df %>% unnest(cols = c(col2, col3))
df2
# A tibble: 25 x 3
col1 col2 col3
<int> <chr> <chr>
1 1 A F
2 1 B G
3 1 C H
4 1 D I
5 1 E J
6 2 A F
7 2 B G
8 2 C H
9 2 D I
10 2 E J
# ... with 15 more rows
The result that I want is as below:
df3 <- tibble(col1=1:5,
newcol=LETTERS[1:5],
newcol2=LETTERS[6:10])
df3
# A tibble: 5 x 3
col1 newcol newcol2
<int> <chr> <chr>
1 1 A F
2 2 B G
3 3 C H
4 4 D I
5 5 E J
Any idea how to do this? Any help is much appreciated.

It looks like you only want to change the column names, or am I missing something here?
df <- df %>% mutate(col2 = df$col2$newcol, col3 = df$col3$newcol2)
After your comment, here is a more general version. It relies on col1 holding the row index (1 to 5 here), so it might not be suitable for all use cases.
df1 <- df %>%
  unnest(cols = c(1:3)) %>%
  group_by(col1) %>%
  mutate(row = row_number()) %>%
  filter(row == col1) %>%
  select(-row)

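If I understand correctly, you have three data frames, each containing one column, and you want to bring them together in one data frame. Then cbind is an option. Since col2 and col3 are stored inside df rather than as separate objects, extract them first:
df3 <- cbind(df["col1"], df$col2, df$col3)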
Output:
col1 newcol newcol2
1 1 A F
2 2 B G
3 3 C H
4 4 D I
5 5 E J
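Another option, assuming tidyr 1.0.0 or later is available, is unpack(), which is meant for data-frame columns (as opposed to the list columns that unnest() targets). A minimal sketch on the question's data:

```r
library(tibble)
library(tidyr)

# Rebuild the example: col2 and col3 are one-column tibbles stored as columns
df <- tibble(col1 = 1:5,
             col2 = tibble(newcol = LETTERS[1:5]),
             col3 = tibble(newcol2 = LETTERS[6:10]))

# unpack() lifts the inner columns to the top level, one row per input row
df3 <- unpack(df, cols = c(col2, col3))
names(df3)
```

The inner column names (newcol, newcol2) are kept, which matches the df3 the question asks for.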

Related

Find rows that occur only once, in two datasets

I have data as follows:
library(data.table)
datA <- fread("A B C
1 1 1
2 2 2")
datB <- fread("A B C
1 1 1
2 2 2
3 3 3")
I want to figure out which rows are unique (which is the one with 3 3 3, because all others occur more often).
I tried:
dat <- rbind(datA, datB)
unique(dat)
!duplicated(dat)
I also tried
setDT(dat)[,if(.N ==1) .SD,]
But that is NULL.
How should I do this?
You can use fsetdiff:
rbind.data.frame(fsetdiff(datA, datB, all = TRUE),
fsetdiff(datB, datA, all = TRUE))
In general, this is called an anti_join:
library(dplyr)
bind_rows(anti_join(datA, datB),
anti_join(datB, datA))
A B C
1: 4 4 4
2: 3 3 3
Data: I added a row in datA to show how to keep rows from both data sets (a simple anti-join does not work otherwise):
library(data.table)
datA <- fread("A B C
1 1 1
2 2 2
4 4 4")
datB <- fread("A B C
1 1 1
2 2 2
3 3 3")
One possible solution
library(data.table)
datB[!datA, on=c("A", "B", "C")]
A B C
<int> <int> <int>
1: 3 3 3
Or (if you are interested in the symmetric difference)
funion(fsetdiff(datB, datA), fsetdiff(datA, datB))
A B C
<int> <int> <int>
1: 3 3 3
Another dplyr option by filtering rows that appear once with a group_by and filter:
library(data.table)
library(dplyr)
datA %>%
bind_rows(., datB) %>%
group_by(across(everything())) %>%
filter(n() == 1)
#> # A tibble: 1 × 3
#> # Groups: A, B, C [1]
#> A B C
#> <int> <int> <int>
#> 1 3 3 3
Created on 2022-11-09 with reprex v2.0.2
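The attempt in the question returned NULL because, without by, the whole table is treated as a single group whose .N is 5, so the if condition is never true. A minimal sketch that groups on every column instead, using the original two-row datA (the names counts, cols and once are illustrative):

```r
library(data.table)

datA <- data.table(A = 1:2, B = 1:2, C = 1:2)
datB <- data.table(A = 1:3, B = 1:3, C = 1:3)
dat <- rbind(datA, datB)
cols <- names(dat)

# Count how often each distinct row occurs across both tables
counts <- dat[, .N, by = cols]

# Keep the rows that occur exactly once and drop the helper count column
once <- counts[N == 1L, cols, with = FALSE]
```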

How to remove all rows from dataframe if count of similar `person_id` values is not `== 2`

I need to remove all rows from a dataframe where the count of rows sharing the same person_id is not == 2. For example:
a1 <- data.frame(person_id = 1:5, b=letters[1:5])
a2 <- data.frame(person_id = 2:6, b=letters[6:10])
data = rbind(a1, a2)
person_id b
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 2 f
7 3 g
8 4 h
9 5 i
10 6 j
Rows 1 and 10 must be removed, because person_id==1 and person_id==6 have only 1 record each. By contrast, person_id==2 has 2 rows.
How can I get a new dataset with only the rows whose person_id occurs exactly 2 times (and in future 3 or 4)?
Base R solution:
subset(
data,
ave(person_id, person_id, FUN = length) == 2
)
To remove the rows where count of person_id isn't equal to 2:
library(dplyr)
data %>%
group_by(person_id) %>%
filter(n() == 2)
person_id b
<int> <chr>
1 2 b
2 3 c
3 4 d
4 5 e
5 2 f
6 3 g
7 4 h
8 5 i
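For the "in future 3 or 4" part of the question, the target count can be made a parameter. A sketch, where keep_ids_with_count and k are hypothetical names introduced here for illustration:

```r
library(dplyr)

a1 <- data.frame(person_id = 1:5, b = letters[1:5])
a2 <- data.frame(person_id = 2:6, b = letters[6:10])
data <- rbind(a1, a2)

# Hypothetical helper: keep rows whose person_id occurs k times.
# k may also be a vector, e.g. c(3, 4), to accept several counts.
keep_ids_with_count <- function(df, k) {
  df %>%
    group_by(person_id) %>%
    filter(n() %in% k) %>%
    ungroup()
}

kept <- keep_ids_with_count(data, 2)
```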

R slide window through tibble

I have a simple question that I cannot figure out a solution to.
Also, I didn't find an answer that I understand.
Imagine I have this data frame
(ts <- tibble(
  a = LETTERS[1:10],
  b = c(rep(1, 5), rep(2, 5))
))
# A tibble: 10 x 2
a b
<chr> <dbl>
1 A 1
2 B 1
3 C 1
4 D 1
5 E 1
6 F 2
7 G 2
8 H 2
9 I 2
10 J 2
What I want is simple: a data frame in which column b indexes a sliding window of size n over column a.
The output can be something like this:
# A tibble: 8 x 2
b a
<dbl> <chr>
1 1 A B
2 1 B C
3 1 C D
4 1 D E
5 2 F G
6 2 G H
7 2 H I
8 2 I J
I don't care if the column a contains an array (nest values).
I just need a new data frame based on the sliding window.
Since this operation will run in a relational database, I'd like a function compatible with DBI and PostgreSQL.
Any help is appreciated.
Thanks in advance
We can group by 'b', create the new column based on the lead of 'a', remove the NA rows with na.omit
library(dplyr)
ts %>%
group_by(b) %>%
mutate(a2 = lead(a)) %>%
ungroup %>%
na.omit %>%
select(b, everything())
# A tibble: 8 x 3
# b a a2
# <dbl> <chr> <chr>
#1 1 A B
#2 1 B C
#3 1 C D
#4 1 D E
#5 2 F G
#6 2 G H
#7 2 H I
#8 2 I J
If lead doesn't work, then just remove the first element and append NA at the end in the mutate step
ts %>%
group_by(b) %>%
mutate(a2 = c(a[-1], NA)) %>%
ungroup %>%
na.omit %>%
select(b, everything())
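The answers above cover a window of two. For a larger window the same idea extends by adding one lead() column per extra offset. A sketch for n = 3, with a3 and win3 as illustrative names. Note that rows in a database table have no inherent order, so a database-backed version would need an explicit ordering column, which is not shown here.

```r
library(dplyr)

ts <- tibble(a = LETTERS[1:10], b = c(rep(1, 5), rep(2, 5)))

win3 <- ts %>%
  group_by(b) %>%
  mutate(a2 = lead(a, 1), a3 = lead(a, 2)) %>%  # next and next-but-one value
  ungroup() %>%
  filter(!is.na(a3)) %>%                        # drop incomplete windows
  select(b, everything())
```

Each group of five rows yields three complete windows, so the last two rows of every group are dropped.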

Drop list columns from dataframe using dplyr and select_if

Is it possible to drop all list columns from a dataframe using dplyr select, similar to dropping a single column?
df <- tibble(
a = LETTERS[1:5],
b = 1:5,
c = list('bob', 'cratchit', 'rules!','and', 'tiny tim too"')
)
df %>%
select_if(-is.list)
Error in -is.list : invalid argument to unary operator
This seems to be a workable workaround, but I wanted to know whether it can be done with select_if.
df %>%
select(-which(map(df,class) == 'list'))
Use Negate
df %>%
select_if(Negate(is.list))
# A tibble: 5 x 2
a b
<chr> <int>
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
There is also purrr::negate that would give the same result.
We can use Filter from base R
Filter(Negate(is.list), df)
# A tibble: 5 x 2
# a b
# <chr> <int>
#1 A 1
#2 B 2
#3 C 3
#4 D 4
#5 E 5
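In dplyr 1.0.0 and later, select_if() is superseded and the same selection can be written with where(). A sketch, assuming a recent dplyr (no_lists is an illustrative name):

```r
library(dplyr)

df <- tibble(
  a = LETTERS[1:5],
  b = 1:5,
  c = list("bob", "cratchit", "rules!", "and", "tiny tim too")
)

# Keep every column that is not a list column
no_lists <- df %>% select(!where(is.list))
```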

subtract first or second value from each row [duplicate]

I'm manipulating my data using dplyr, and after grouping my data, I would like to subtract all values by the first or second value in my group (i.e., subtracting a baseline). Is it possible to perform this in a single pipe step?
MWE:
test <- tibble(one=c("c","d","e","c","d","e"), two=c("a","a","a","b","b","b"), three=1:6)
test %>% group_by(`two`) %>% mutate(new=three-three[.$`one`=="d"])
My desired output is:
# A tibble: 6 x 4
# Groups: two [2]
one two three new
<chr> <chr> <int> <int>
1 c a 1 -1
2 d a 2 0
3 e a 3 1
4 c b 4 -1
5 d b 5 0
6 e b 6 1
However I am getting this as the output:
# A tibble: 6 x 4
# Groups: two [2]
one two three new
<chr> <chr> <int> <int>
1 c a 1 -1
2 d a 2 NA
3 e a 3 1
4 c b 4 -1
5 d b 5 NA
6 e b 6 1
We can use first() from dplyr
test %>%
group_by(two) %>%
mutate(new=three- first(three))
# A tibble: 6 x 4
# Groups: two [2]
# one two three new
# <chr> <chr> <int> <int>
#1 c a 1 0
#2 d a 2 1
#3 e a 3 2
#4 c b 4 0
#5 d b 5 1
#6 e b 6 2
If we are subsetting the 'three' values based on the string "c" in 'one', then we don't need .$, because .$one returns the whole column 'one' instead of only the values within the current group
test %>%
group_by(`two`) %>%
mutate(new=three-three[one=="c"])
library(tidyverse)
tibble(
one = c("c", "d", "e", "c", "d", "e"),
two = c("a", "a", "a", "b", "b", "b"),
three = 1:6
) -> test_df
test_df %>%
group_by(two) %>%
mutate(new = three - three[1])
## # A tibble: 6 x 4
## # Groups: two [2]
## one two three new
## <chr> <chr> <int> <int>
## 1 c a 1 0
## 2 d a 2 1
## 3 e a 3 2
## 4 c b 4 0
## 5 d b 5 1
## 6 e b 6 2
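To reproduce the desired output from the question, where the baseline is the row with one == "d", the same within-group subsetting works with "d" in place of "c". A sketch, assuming each group contains exactly one "d" row (res is an illustrative name):

```r
library(dplyr)

test <- tibble(one = c("c", "d", "e", "c", "d", "e"),
               two = c("a", "a", "a", "b", "b", "b"),
               three = 1:6)

res <- test %>%
  group_by(two) %>%
  mutate(new = three - three[one == "d"]) %>%  # baseline looked up per group
  ungroup()
```

If a group had no "d" row, or more than one, the subtraction would not have a single baseline to use, so that case would need separate handling.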

Resources