I'm starting with a data frame with 5 columns: one treatment column, T_type, and four outcome variable columns, A, B, C and D. I'm trying to stack the outcome variables so I end up with one column of values, another with the names of the four outcome variables, and a third with the treatment names repeated alongside the stacked values. It's what's shown in the R help page for pivot_longer in the relig_income example, and pretty much what Jason was trying to do in "dplyr `pivot_longer()` object not found but it's right there?"
I get the same sort of error Jason was getting with pivot_longer and have no idea why. Here's what's happening.
library(tidyr)  # pivot_longer() and %>%
dd <- as.data.frame(matrix(rpois(32, 4), nrow = 8))
names(dd) <- LETTERS[1:4]
dd <- data.frame(dd, T_type = rep(c("M", "P"), each = 4))
dd
A B C D T_type
1 3 5 5 4 M
2 7 5 2 2 M
3 2 3 3 10 M
4 3 3 2 3 M
5 8 3 4 3 P
6 4 4 5 1 P
7 6 4 2 6 P
8 9 4 3 6 P
So now I try pivot_longer.
dd %>% pivot_longer(-T_type, cols = A:D, names_to = "response", values_to = "y_obs")
Error in build_longer_spec(data, !!cols, names_to = names_to, values_to = values_to, :
object 'T_type' not found
Re-arranging the columns in dd so T_type is before columns A to D doesn't help.
I'd be grateful if someone could tell me what's going on here and how I can get pivot_longer to do the job.
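You need to remove -T_type from the pivot_longer call: the columns to pivot should be given only once. The first argument of pivot_longer is the dataset (which you omit inside a %>% pipeline), and since cols is already supplied by name, the extra unnamed -T_type is matched by position to a different argument, where it is evaluated as an ordinary R object instead of as a column selection. That is why R reports "object 'T_type' not found".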
dd %>% pivot_longer(cols = A:D, names_to = "response", values_to = "y_obs")
Output
# A tibble: 32 x 3
# T_type response y_obs
# <chr> <chr> <int>
# 1 M A 7
# 2 M B 4
# 3 M C 4
# 4 M D 3
# 5 M A 8
# 6 M B 3
# 7 M C 5
# 8 M D 3
# 9 M A 4
# 10 M B 6
# ... with 22 more rows
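To see where the stray `-T_type` goes, here is a base R sketch of R's argument matching: named arguments are matched first, then the remaining unnamed ones fill the unused formals by position. `toy_pivot()` is a hypothetical function; its argument order only imitates older tidyr releases of `pivot_longer()`, which is an assumption, so check `args(pivot_longer)` in your own installation.

```r
# Hypothetical stand-in for pivot_longer(): it only records how R matched
# the arguments and never evaluates them, so dd need not exist.
toy_pivot <- function(data, cols, names_to = "name", names_prefix = NULL,
                      values_to = "value") {
  match.call()
}

# Same shape as the failing call: dd comes first (as %>% would supply it),
# cols is named, and -T_type is left unnamed.
mc <- toy_pivot(dd, -T_type, cols = A:D,
                names_to = "response", values_to = "y_obs")

mc$cols          # A:D
mc$names_prefix  # -T_type, matched by position to the next free argument
```

Presumably the real function then evaluates that argument as an ordinary value rather than as a column selection, which is when R goes looking for an object named `T_type` and fails.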
Try this:
dd %>%
gather("response", "y_obs", -T_type)  # gather() is superseded by pivot_longer()
Or:
dd %>% pivot_longer(names_to = "response", values_to = "y_obs", -T_type)
Or:
dd %>% pivot_longer(names_to = "response", values_to = "y_obs", A:D)
In the last version you specify the range of columns, A to D, so T_type is not part of the selection and is simply carried along as an identifier column.
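A quick check, assuming tidyr is installed: once the selection is passed only through `cols`, excluding T_type and naming the range A:D give the same result. `set.seed()` is added here only to make the random data reproducible.

```r
library(tidyr)

set.seed(1)  # reproducible Poisson draws
dd <- as.data.frame(matrix(rpois(32, 4), nrow = 8))
names(dd) <- LETTERS[1:4]
dd <- data.frame(dd, T_type = rep(c("M", "P"), each = 4))

# Select the columns to stack exactly once, either by exclusion ...
long_excl <- pivot_longer(dd, cols = -T_type,
                          names_to = "response", values_to = "y_obs")
# ... or by range; T_type is not in A:D, so it is kept unchanged.
long_range <- pivot_longer(dd, cols = A:D,
                           names_to = "response", values_to = "y_obs")

identical(long_excl, long_range)  # TRUE
dim(long_range)                   # 32 3
```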
How do we combine two or more columns using dplyr?
df <- data.frame(a = 1:6, b = rep(2, 6))
I need my output to look like this:
a 1
a 2
a 3
a 4
a 5
a 6
b 2
b 2
b 2
b 2
b 2
b 2
You can use pivot_longer() from the tidyr package:
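library(dplyr)  # mutate(), across(), arrange() and %>%
library(tidyr)  # pivot_longer()
df <- data.frame(a = 1:6, b = rep(2, 6))
# give both columns the same (numeric) type before stacking them
df %>% mutate(across(.cols = everything(), .fns = as.numeric)) %>%
pivot_longer(cols = everything(), names_to = "var", values_to = "value") %>%
# pivot_longer() interleaves a and b row by row, so sort by variable name
arrange(var)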
Or, in base R:
rev(stack(df))
ind values
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 b 2
8 b 2
9 b 2
10 b 2
11 b 2
12 b 2
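For readers unfamiliar with `stack()`: it concatenates the columns one after another and records each value's source column in `ind`; `rev()` just swaps the two output columns so `ind` comes first. A rough base R sketch of the same idea (an illustration, not `stack()`'s actual source):

```r
df <- data.frame(a = 1:6, b = rep(2, 6))

# Concatenate the columns top to bottom and repeat each column name once
# per row. Because the columns stay in blocks, no arrange() step is needed,
# unlike pivot_longer(), which emits a and b for each row in turn.
long <- data.frame(
  ind = rep(names(df), each = nrow(df)),
  values = unlist(df, use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```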
I have a data frame in the following format, with IDs and A/B values. The data frame is very long, with over 3000 IDs.
id type
1 A
2 B
3 A
4 A
5 B
6 A
7 B
8 A
9 B
10 A
11 A
12 A
13 B
... ...
I need to remove all rows (A and B) wherever two or more A's follow one another. So I don't just want to remove the duplicates: if there is a duplicate (2 or more consecutive A's), I want to remove all of those A's and the B up to the next A.
id type
1 A
2 B
6 A
7 B
8 A
9 B
... ...
Do I need a loop for this problem? Any help is appreciated, thank you!
This might be what you want:
First, define a function that notes the indices of what you want to remove:
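row_sequence <- function(value) {
# rows whose next value is identical (the first A of each duplicated pair);
# lead() comes from dplyr, loaded below
inds <- which(value == lead(value))
# mark those rows plus the next two (the repeated A and the following B)
sort(unique(c(inds, inds + 1, inds + 2)))
}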
Apply the function to your data frame: first extract the rows you want to remove into df1, then anti-join df against df1 to obtain the final data frame:
library(dplyr)
df1 <- df %>% slice(row_sequence(type))
df2 <- df %>%
anti_join(., df1)
Result:
df2
id type
1 1 A
2 2 B
3 6 A
4 7 B
5 8 A
6 9 B
Data:
df <- data.frame(
id = 1:13,
type = c("A","B","A","A","B","A","B","A","B","A","A","A","B")
)
I assumed there is only one B after each series of duplicated A values; if that is not the case, just let me know and I'll modify my code:
library(dplyr)
library(tidyr)
library(data.table)
df %>%
mutate(rles = data.table::rleid(type)) %>%
group_by(rles) %>%
mutate(rles = ifelse(length(rles) > 1, NA, rles)) %>%
ungroup() %>%
mutate(rles = ifelse(!is.na(rles) & is.na(lag(rles)) & type == "B", NA, rles)) %>%
drop_na() %>%
select(-rles)
# A tibble: 6 x 2
id type
<int> <chr>
1 1 A
2 2 B
3 6 A
4 7 B
5 8 A
6 9 B
Data
df <- read.table(header = TRUE, text = "
id type
1 A
2 B
3 A
4 A
5 B
6 A
7 B
8 A
9 B
10 A
11 A
12 A
13 B")
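No loop is needed. Here is a base R sketch using run-length encoding, assuming the rule is: drop every run of two or more consecutive "A"s together with the run of "B"s that immediately follows it (so, unlike the answer above, it does not assume exactly one B). `drop_dup_runs()` is a hypothetical helper name.

```r
# Hypothetical helper: returns TRUE for the rows to keep.
drop_dup_runs <- function(type) {
  r <- rle(type)                            # runs of identical values
  dup_a <- r$values == "A" & r$lengths > 1  # runs of 2+ A's
  after_dup <- c(FALSE, head(dup_a, -1))    # the run right after such a run
  drop_run <- dup_a | (after_dup & r$values == "B")
  !rep(drop_run, r$lengths)                 # expand back to one flag per row
}

df <- data.frame(
  id = 1:13,
  type = c("A","B","A","A","B","A","B","A","B","A","A","A","B"),
  stringsAsFactors = FALSE
)

df[drop_dup_runs(df$type), ]
```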
I have a data.frame that contains duplicate column names that I want to lengthen. I don't want to fix the names because they correspond to values in my future column. I am trying to use pivot_longer but it throws an error.
Error: Can't transform a data frame with duplicate names.
I looked at the documentation for the function and used the "names_repair" argument to get around the issue but it didn't help.
I also found this issue on tidyverse's GitHub, but I'm not sure what's going on there.
Here's my code:
library(dplyr)
library(tidyr)
df %>%
mutate_all(as.character) %>%
pivot_longer(-a, names_to = "Names", values_to = "Values", names_repair = "minimal")
Is there a way to do this?
Desired output:
a Names Values
<chr> <chr> <chr>
1 1 b 4
2 1 c a
3 1 c d
4 2 b 5
5 2 c b
6 2 c e
7 3 b 6
8 3 c c
9 3 c f
Sample data:
df <- setNames(data.frame(c(1,2,3),
c(4,5,6),
c("a","b","c"),
c("d","e","f"),
stringsAsFactors = F),
c("a","b","c","c"))
The problem is not pivot_longer, which can be used on data frames containing columns with the same name; mutate can't. The conversion itself is still needed, because the numeric column b and the character columns c can't be combined into a single Values column. So we convert the columns to character either (i) with base R or (ii), if you want to stay in the wider tidyverse, with purrr::modify_at (after all, a data.frame is always a list). After that, it's just a regular call to pivot_longer.
df <- setNames(data.frame(c(1,2,3),
c(4,5,6),
c("a","b","c"),
c("d","e","f"),
stringsAsFactors = F),
c("a","b","c","c"))
library(dplyr)
library(tidyr)
# Alternatively use base R to transform cols to character
# df[,c("a", "b")] <- lapply(df[,c("a", "b")], as.character)
df %>%
purrr::modify_at(c("a","b"), as.character) %>%
pivot_longer(-a,
names_to = "Names",
values_to = "Values")
#> # A tibble: 9 x 3
#> a Names Values
#> <chr> <chr> <chr>
#> 1 1 b 4
#> 2 1 c a
#> 3 1 c d
#> 4 2 b 5
#> 5 2 c b
#> 6 2 c e
#> 7 3 b 6
#> 8 3 c c
#> 9 3 c f
Created on 2021-02-23 by the reprex package (v0.3.0)
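If you would rather sidestep the tidyverse duplicate-name checks entirely, here is a base R sketch that builds the same long table by hand. It assumes, as in the sample data, that the first column is the identifier and every other column is pivoted. It goes through `as.list(df)` because subsetting the data frame itself with `[` makes duplicated names unique (`c`, `c.1`).

```r
df <- setNames(data.frame(c(1, 2, 3),
                          c(4, 5, 6),
                          c("a", "b", "c"),
                          c("d", "e", "f"),
                          stringsAsFactors = FALSE),
               c("a", "b", "c", "c"))

# as.list() keeps the duplicated names; converting every value column to
# character lets numbers and text share one Values column.
vals <- lapply(as.list(df)[-1], as.character)

long <- data.frame(
  a      = rep(as.character(df[[1]]), each = length(vals)),
  Names  = rep(names(vals), times = nrow(df)),
  # rbind() gives one row per value column; reading the matrix column by
  # column then walks through the original data one row at a time.
  Values = as.vector(do.call(rbind, vals)),
  stringsAsFactors = FALSE
)
long
```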
In my data frame, the column names have brackets. I want to use the function select_ to pick the columns I need.
However, I get this error message:
Error in overscope_eval_next(overscope, expr) : object 'A.B.V1' not found
How could I solve this problem?
Here is a minimal example to reproduce my problem.
library(dplyr)
a <- data_frame(`A.B.V1:7(1)` = seq(1, 10), B = seq(1, 10))
# Can select one column
a %>% select(`A.B.V1:7(1)`)
# Cannot select columns
col <- c('A.B.V1:7(1)', 'B')
a %>% select_(.dots = col)
You can wrap the non-syntactic name in backquotes inside the string, e.g.:
col <- c('`A.B.V1:7(1)`', 'B')
a %>% select_(.dots = col)
# A tibble: 10 x 2
`A.B.V1:7(1)` B
<int> <int>
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
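`select_()` and the other underscore verbs have since been deprecated in dplyr. With current versions (dplyr 1.0 or later is assumed here), a character vector of names can be passed through `all_of()`, and no backquotes are needed because the names are matched as plain strings:

```r
library(dplyr)

a <- tibble(`A.B.V1:7(1)` = 1:10, B = 1:10)
col <- c("A.B.V1:7(1)", "B")

# all_of() treats col as exact column names, so the special characters
# in the first name need no quoting.
a %>% select(all_of(col))
```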
I have trouble repeating rows of my real data using dplyr. There is already another post here, repeat-rows-of-a-data-frame, but it has no dplyr solution.
I just wonder what the dplyr solution would be. Here is what I tried:
library(dplyr)
df <- data.frame(column = letters[1:4])
df_rep <- df%>%
mutate(column=rep(column,each=4))
but it failed with this error:
Error: wrong result size (16), expected 4 or 1
Expected output
>df_rep
column
#a
#a
#a
#a
#b
#b
#b
#b
#...
The uncount function solves this problem as well. The count column indicates how often each row should be repeated.
library(tidyverse)
df <- tibble(letters = letters[1:4])
df
# A tibble: 4 x 1
letters
<chr>
1 a
2 b
3 c
4 d
df %>%
mutate(count = c(2, 3, 2, 4)) %>%
uncount(count)
# A tibble: 11 x 1
letters
<chr>
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 d
9 d
10 d
11 d
I was looking for a similar (but slightly different) solution. Posting here in case it's useful to anyone else.
In my case, I needed a more general solution that allows each letter to be repeated an arbitrary number of times. Here's what I came up with:
library(tidyverse)
df <- data.frame(letters = letters[1:4])
df
letters
1 a
2 b
3 c
4 d
Let's say I want 2 A's, 3 B's, 2 C's and 4 D's:
df %>%
mutate(count = c(2, 3, 2, 4)) %>%
group_by(letters) %>%
expand(count = seq_len(count))
# A tibble: 11 x 2
# Groups: letters [4]
letters count
<fctr> <int>
1 a 1
2 a 2
3 b 1
4 b 2
5 b 3
6 c 1
7 c 2
8 d 1
9 d 2
10 d 3
11 d 4
If you don't want to keep the count column:
df %>%
mutate(count = c(2, 3, 2, 4)) %>%
group_by(letters) %>%
expand(count = seq_len(count)) %>%
select(letters)
# A tibble: 11 x 1
# Groups: letters [4]
letters
<fctr>
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 d
9 d
10 d
11 d
If you want the count to reflect the number of times each letter is repeated:
df %>%
mutate(count = c(2, 3, 2, 4)) %>%
group_by(letters) %>%
expand(count = seq_len(count)) %>%
mutate(count = max(count))
# A tibble: 11 x 2
# Groups: letters [4]
letters count
<fctr> <dbl>
1 a 2
2 a 2
3 b 3
4 b 3
5 b 3
6 c 2
7 c 2
8 d 4
9 d 4
10 d 4
11 d 4
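`tidyr::uncount()` can produce these variants directly through its `.remove` and `.id` arguments. A sketch assuming tidyr is installed; the column name `rep_id` is just illustrative:

```r
library(tidyr)

df <- data.frame(letters = letters[1:4], count = c(2, 3, 2, 4),
                 stringsAsFactors = FALSE)

# .remove = FALSE keeps `count`, so it shows how often each letter repeats;
# .id adds the running number of each copy (1, 2, ... within each letter).
out <- uncount(df, count, .remove = FALSE, .id = "rep_id")
out
```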
This is rife with peril if the data.frame has other columns (there, I said it!), but the do block will allow you to generate a derived data.frame within a dplyr pipe (though, ceci n'est pas un pipe):
library(dplyr)
df <- data.frame(column = letters[1:4], stringsAsFactors = FALSE)
df %>%
do( data.frame(column = rep(.$column, each = 4), stringsAsFactors = FALSE) )
# column
# 1 a
# 2 a
# 3 a
# 4 a
# 5 b
# 6 b
# 7 b
# 8 b
# 9 c
# 10 c
# 11 c
# 12 c
# 13 d
# 14 d
# 15 d
# 16 d
As @Frank suggested, a much better alternative is:
df %>% slice(rep(1:n(), each=4))
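The `slice(rep(1:n(), each = 4))` idiom is plain row indexing, so the same thing works in base R without dplyr. Swapping `each` for `times` gives a separate count per row, which is the behaviour `uncount()` provides:

```r
df <- data.frame(column = letters[1:4], stringsAsFactors = FALSE)

# Repeat every row index four times, then subset the rows.
rep_each <- df[rep(seq_len(nrow(df)), each = 4), , drop = FALSE]

# Per-row repeat counts instead of a fixed number.
rep_times <- df[rep(seq_len(nrow(df)), times = c(2, 3, 2, 4)), , drop = FALSE]

rownames(rep_each) <- NULL  # drop the 1.1, 1.2, ... row names
rep_each$column
```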
I did a quick benchmark to show that uncount() is a lot faster than expand()
# for the pipe
library(magrittr)
# create some test data
df_test <-
tibble::tibble(
letter = letters,
row_count = sample(1:10, size = 26, replace = TRUE)
)
# benchmark
bench <- microbenchmark::microbenchmark(
expand = df_test %>%
dplyr::group_by(letter) %>%
tidyr::expand(row_count = seq_len(row_count)),
uncount = df_test %>%
tidyr::uncount(row_count)
)
# plot the benchmark
ggplot2::autoplot(bench)