Take an example dataframe like so (the real dataframe has more columns):
df <- data.frame(A = seq(1, 3, 1),
B = seq(4, 6, 1))
I can use pivot_longer to collect my columns of interest (A and B) like so:
library(dplyr)
library(tidyr)
df <- df %>%
pivot_longer(cols = c("A", "B"), names_to = "Letter", values_to = "Number")
df
Letter Number
<chr> <dbl>
1 A 1
2 B 4
3 A 2
4 B 5
5 A 3
6 B 6
Now let's say I have another column C in my dataframe, making it no longer tidy
C <- seq(7, 12, 1)
df_2 <- data.frame(df, C)
df_2
Letter Number C
1 A 1 7
2 B 4 8
3 A 2 9
4 B 5 10
5 A 3 11
6 B 6 12
I want to use pivot_longer again to make df_2 tidy and get this output:
data.frame(Letter = c(rep("A", 3), rep("B", 3), rep("C", 3)),
Number = seq(1, 12, 1))
Letter Number
1 A 1
2 A 2
3 A 3
4 B 4
5 B 5
6 B 6
7 C 7
8 C 8
9 C 9
10 C 10
11 C 11
12 C 12
Using the same strategy creates an error though:
df_2 %>%
pivot_longer(cols = "C", names_to = "Letter", values_to = "Number")
Error: Failed to create output due to bad names.
* Choose another strategy with `names_repair`
Setting names_repair to minimal runs but doesn't produce the output I want.
You can pivot C on its own and then bind the two results together, like this:
library(tidyverse)
df <- data.frame(A = seq(1, 3, 1),
B = seq(4, 6, 1))
df <- df %>%
pivot_longer(cols = c("A", "B"), names_to = "Letter", values_to = "Number")
C <- seq(7, 12, 1)
df_2 <- data.frame(C)
df_2 <- df_2 %>% pivot_longer(cols = C, names_to = "Letter", values_to = "Number")
df_result <- rbind(df, df_2)
Output
> df_result
# A tibble: 12 x 2
Letter Number
<chr> <dbl>
1 A 1
2 B 4
3 A 2
4 B 5
5 A 3
6 B 6
7 C 7
8 C 8
9 C 9
10 C 10
11 C 11
12 C 12
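As for why the call in the question fails: df_2 already has columns called Letter and Number, so asking pivot_longer to create new columns with those same names produces duplicates. Here is a minimal sketch of that reading, assuming df_2 is built as in the question. The names Letter_new and Number_new are placeholders I made up for illustration.

```r
library(tidyr)

# df_2 as in the question: already-long Letter/Number plus the extra column C
df_2 <- data.frame(Letter = rep(c("A", "B"), 3),
                   Number = c(1, 4, 2, 5, 3, 6),
                   C = 7:12)

# Using names that do not clash with the existing columns avoids the error
# (Letter_new and Number_new are hypothetical names)
out <- pivot_longer(df_2, cols = "C",
                    names_to = "Letter_new", values_to = "Number_new")
names(out)

# Stack the original pair of columns on top of the newly created pair
result <- rbind(df_2[c("Letter", "Number")],
                data.frame(Letter = out$Letter_new, Number = out$Number_new))
result
```

This is just one way to get there; binding the separately pivoted pieces as shown above does the same job.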
Maybe try this if it is helpful:
library(tidyverse)
#Code
df_2 %>% pivot_longer(everything()) %>%
arrange(name) %>% group_by(name) %>%
filter(!duplicated(value))
Output:
# A tibble: 12 x 2
# Groups: name [3]
name value
<chr> <dbl>
1 A 1
2 A 2
3 A 3
4 B 4
5 B 5
6 B 6
7 C 7
8 C 8
9 C 9
10 C 10
11 C 11
12 C 12
We could do this easily with stack
library(dplyr)
stack(df_2)[2:1] %>%
distinct %>%
setNames(c("Letter", "Number"))
-output
# Letter Number
#1 A 1
#2 A 2
#3 A 3
#4 B 4
#5 B 5
#6 B 6
#7 C 7
#8 C 8
#9 C 9
#10 C 10
#11 C 11
#12 C 12
Or an option with unnest/enframe
library(tidyr)
library(tibble)
unclass(df_2) %>%
enframe(name = "Letter", value = "Number") %>%
unnest(c(Number)) %>%
distinct
Or using melt
library(reshape2)
melt(df_2) %>%
distinct()
Or in a single line in base R
unique(stack(df_2)[2:1])
I'm starting with a data frame with 5 columns: one treatment column, T_type, and four outcome variable columns, A, B, C and D. I'm trying to stack the outcome variables so I end up with one column for values, another with the names of the four outcome variables and then a column with the treatment names repeated down along the stacked columns. It's what's shown in the R help page for pivot_longer in the relig_income example and pretty much what Jason was trying to do here: dplyr `pivot_longer()` object not found but it's right there?
I get the same sort of error Jason was getting with pivot_longer and have no idea why. Here's what's happening.
dd <- as.data.frame(matrix(rpois(32, 4), nrow = 8))
names(dd) <- LETTERS[1:4]
dd <- data.frame(dd, T_type = rep(c("M", "P"), each = 4))
dd
A B C D T_type
1 3 5 5 4 M
2 7 5 2 2 M
3 2 3 3 10 M
4 3 3 2 3 M
5 8 3 4 3 P
6 4 4 5 1 P
7 6 4 2 6 P
8 9 4 3 6 P
So now I try pivot_longer.
dd %>% pivot_longer(-T_type, cols = A:D, names_to = "response", values_to = "y_obs")
Error in build_longer_spec(data, !!cols, names_to = names_to, values_to = values_to, :
object 'T_type' not found
Re-arranging the columns in dd so T_type is before columns A to D doesn't help.
I'd be grateful if someone could tell me what's going on here and how I can get pivot_longer to do the job.
You need to remove -T_type from the pivot_longer call: the first argument of this function is the dataset (which is supplied for you if you are in a %>% pipeline), and the columns to pivot are already selected by cols = A:D.
dd %>% pivot_longer(cols = A:D, names_to = "response", values_to = "y_obs")
Output
# A tibble: 32 x 3
# T_type response y_obs
# <chr> <chr> <int>
# 1 M A 7
# 2 M B 4
# 3 M C 4
# 4 M D 3
# 5 M A 8
# 6 M B 3
# 7 M C 5
# 8 M D 3
# 9 M A 4
# 10 M B 6
# ... with 22 more rows
Try this :
dd %>%
gather("response", "y_obs", -T_type)
Or :
dd %>% pivot_longer(names_to = "response", values_to = "y_obs", -T_type)
Or :
dd %>% pivot_longer(names_to = "response", values_to = "y_obs", A:D)
You specify the range of cols as A to D, so T_type is already excluded and does not need to be mentioned.
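To check that the two ways of selecting columns are interchangeable, here is a small sketch assuming the dd from the question. The set.seed call is my addition so the random data is repeatable.

```r
library(tidyr)

set.seed(1)  # assumption: any seed will do, only added for repeatability
dd <- as.data.frame(matrix(rpois(32, 4), nrow = 8))
names(dd) <- LETTERS[1:4]
dd$T_type <- rep(c("M", "P"), each = 4)

# Select the outcome columns by range ...
long_range <- pivot_longer(dd, cols = A:D,
                           names_to = "response", values_to = "y_obs")

# ... or by excluding the treatment column; pass only one selection to cols
long_negate <- pivot_longer(dd, cols = -T_type,
                            names_to = "response", values_to = "y_obs")

# Both calls should give the same long table (8 rows x 4 outcomes = 32 rows)
all.equal(long_range, long_negate)
dim(long_range)
```

Either form works; the point is that the selection goes in cols once, not as a separate unnamed argument.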
I am looking to create a new column based on two other column conditions using data.table. Here is my example code:
library(data.table)
group <- c(1,1,1,2,2,2,3,3,3,4,4,4)
date <- c(6,2,3,7,6,9,7,1,4,6,8,9)
val1<- c("","A","A","","A","A","","A","A","","A","A")
df1<-data.frame(group,date,val1)
dt1<-as.data.table(df1)
Here is the output:
group date val1
1 6
1 2 A
1 3 A
2 7
2 6 A
2 9 A
3 7
3 1 A
3 4 A
4 6
4 8 A
4 9 A
I am looking to find the minimum value of date given that val1 = A within each group (1,2,3,4) to look like this:
group date val1 findmin
1 6
1 2 A Y
1 3 A
2 7
2 6 A Y
2 9 A
3 7
3 1 A Y
3 4 A
4 6
4 8 A Y
4 9 A
I have tried
dt1[,findmin:= ifelse(date=min(date[val1 == "A"])),"Y","", by = group]
Read as: if date equals the minimum date where val1 = "A", put a "Y" in a new column called 'findmin', else put nothing, and do this for each group (1,2,3,4). I get this error:
Error in `[.data.table`(dt1, , `:=`(findmin, ifelse(min(date[val1 == "A"]))), :
Provide either by= or keyby= but not both
I appreciate the help, thanks!
You have to be careful with your brackets and that equality is checked with ==:
dt1[,findmin := fifelse(date == min(date[val1 == "A"]), "Y", ""), by = group]
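One thing to watch: the condition above only compares date, so a row with an empty val1 whose date happens to equal the group's minimum "A" date would also get a "Y". That does not occur in the example data, but if it could in yours, a slightly stricter variant might look like this (a sketch under that assumption, not the only way to write it):

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt1 <- data.table(
  group = rep(1:4, each = 3),
  date  = c(6, 2, 3, 7, 6, 9, 7, 1, 4, 6, 8, 9),
  val1  = rep(c("", "A", "A"), times = 4)
)

# Flag a row only if it is an "A" row AND holds the minimum "A" date in its group
dt1[, findmin := fifelse(val1 == "A" & date == min(date[val1 == "A"]), "Y", ""),
    by = group]

# Inspect the flagged rows
dt1[findmin == "Y"]
```

Note that a group with no "A" rows at all would make min() return Inf with a warning, so nothing would be flagged in that group.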
This code works using dplyr. I'm sure there is a much more elegant way to do this.
if (!require(dplyr)) {
install.packages("dplyr")
}
library(dplyr)
if (!require(data.table)) {
install.packages("data.table")
}
library(data.table)
group <- c(1,1,1,2,2,2,3,3,3,4,4,4)
date <- c(6,2,3,7,6,9,7,1,4,6,8,9)
val1<- c("","A","A","","A","A","","A","A","","A","A")
df1<-data.frame(group,date,val1)
dt1<-as.data.table(df1)
# filter for A
df2 <- df1 %>% filter(val1 == "A")
# group by group, arrange by date, get the 1st row, ungroup, add findmin = Y
df3 <- df2 %>% group_by(group) %>% arrange(date) %>% slice(1) %>% ungroup() %>% mutate(findmin = "Y", )
# join back to the original data
df4 <- df1 %>% left_join(df3, by = c("group", "date", "val1"))
# set NA in findmin to "" if you want
df5 <- df4 %>% mutate(findmin = ifelse(is.na(findmin), "", findmin))
# print
df5
group date val1 findmin
1 1 6
2 1 2 A Y
3 1 3 A
4 2 7
5 2 6 A Y
6 2 9 A
7 3 7
8 3 1 A Y
9 3 4 A
10 4 6
11 4 8 A Y
12 4 9 A
Testing with randomized data
# test randomized
df6 <- sample_frac(df1, size=1)
df6
group date val1
1 3 4 A
2 3 1 A
3 4 8 A
4 4 6
5 4 9 A
6 2 9 A
7 2 7
8 3 7
9 1 3 A
10 1 6
11 2 6 A
12 1 2 A
df6 <- df6 %>%
filter(val1 == "A") %>%
group_by(group) %>%
arrange(date) %>%
slice(1) %>%
ungroup() %>%
mutate(findmin = "Y", )
df7 <- df1 %>%
left_join(df6, by = c("group", "date", "val1")) %>%
mutate(findmin = ifelse(is.na(findmin), "", findmin)) %>%
arrange(group, val1, date, findmin)
df7
group date val1 findmin
1 1 6
2 1 2 A Y
3 1 3 A
4 2 7
5 2 6 A Y
6 2 9 A
7 3 7
8 3 1 A Y
9 3 4 A
10 4 6
11 4 8 A Y
12 4 9 A
Alternatively, use which.min in place of arrange and slice
df6 <- sample_frac(df1, size=1)
df6
df6 <- df6 %>%
filter(val1 == "A") %>%
group_by(group) %>%
slice(which.min(date)) %>%
ungroup() %>%
mutate(findmin = "Y", )
df7 <- df1 %>%
left_join(df6, by = c("group", "date", "val1")) %>%
mutate(findmin = ifelse(is.na(findmin), "", findmin)) %>%
arrange(group, val1, date, findmin)
df7
group date val1 findmin
1 1 6
2 1 2 A Y
3 1 3 A
4 2 7
5 2 6 A Y
6 2 9 A
7 3 7
8 3 1 A Y
9 3 4 A
10 4 6
11 4 8 A Y
12 4 9 A
I have a data set
df <- data.frame("ID" = c("sue_1","bob_2","nick_3","joe_4"),
"1_confidence.x" = c(3,3,1,5),
"2_reading.x" = c(4,3,2,5),
"3_maths.x" = c(3,2,4,2),
"1_confidence.y" = c(3,2,3,4),
"2_reading.y" = c(3,4,2,1),
"3_maths.y" = c(3,4,2,5)
)
Giving this df:
> df
ID X1_confidence.x X2_reading.x X3_maths.x X1_confidence.y X2_reading.y X3_maths.y
1 sue_1 3 4 3 3 3 3
2 bob_2 3 3 2 2 4 4
3 nick_3 1 2 4 3 2 2
4 joe_4 5 5 2 4 1 5
I would like it to get into this format:
ID Test X1_confidence X2_reading X3_maths
1 sue_1 pre 3 4 3
2 sue_1 post 3 3 3
3 bob_2 pre 3 3 2
4 bob_2 post 2 4 4
5 nick_3 pre 1 2 4
6 nick_3 post 3 2 2
7 joe_4 pre 5 5 2
8 joe_4 post 4 1 5
I've tried reshape and gather, but just can't seem to figure it out...
There should be a more "direct" way to do this only with pivot_longer. I was not able to get the arguments correct for it. Here is one way using pivot_longer and pivot_wider together from tidyr 1.0.0
library(dplyr)
library(tidyr)
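df %>%
pivot_longer(cols = starts_with("X"), names_to = "key") %>%
# drop the .x / .y suffix so both measurements share one key
mutate(key = sub("\\.x$|\\.y$", "", key)) %>%
group_by(ID, key) %>%
# within each ID/key pair the .x row comes first (pre), then the .y row (post)
mutate(Test = c("pre", "post")) %>%
ungroup() %>%
pivot_wider(id_cols = c(ID, Test), names_from = key, values_from = value)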
# ID Test X1_confidence X2_reading X3_maths
# <fct> <chr> <dbl> <dbl> <dbl>
#1 sue_1 pre 3 4 3
#2 sue_1 post 3 3 3
#3 bob_2 pre 3 3 2
#4 bob_2 post 2 4 4
#5 nick_3 pre 1 2 4
#6 nick_3 post 3 2 2
#7 joe_4 pre 5 5 2
#8 joe_4 post 4 1 5
If your tidyr is not updated here is the same using gather and spread
df %>%
gather(key, value, -ID) %>%
mutate(key = sub("\\.x$|\\.y$", "", key)) %>%
# group by ID as well, so each group holds exactly one pre and one post row
group_by(ID, key) %>%
mutate(Test = c("pre", "post")) %>%
spread(key, value)
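The more "direct" pivot_longer route mentioned above might look something like the following. This is a sketch assuming tidyr 1.0.0 or later, where names_to accepts the special ".value" entry and names_sep takes a regular expression. It splits each column name at the dot, so the part before it becomes the output column and the part after it (x or y) becomes Test.

```r
library(tidyr)

# Same data as in the question, written with the X-prefixed names R assigns
df <- data.frame(ID = c("sue_1", "bob_2", "nick_3", "joe_4"),
                 X1_confidence.x = c(3, 3, 1, 5),
                 X2_reading.x = c(4, 3, 2, 5),
                 X3_maths.x = c(3, 2, 4, 2),
                 X1_confidence.y = c(3, 2, 3, 4),
                 X2_reading.y = c(3, 4, 2, 1),
                 X3_maths.y = c(3, 4, 2, 5))

# ".value" keeps the part before the dot as column names;
# the part after the dot goes into the Test column
long <- pivot_longer(df, cols = -ID,
                     names_to = c(".value", "Test"),
                     names_sep = "\\.")

# Relabel the suffixes: x is the pre test, y is the post test
long$Test <- ifelse(long$Test == "x", "pre", "post")
long
```

This relies on each column name containing exactly one dot, which holds for the example data.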
This should do the trick:
df_long <- reshape(
data = df,
varying = list(c("X1_confidence.x","X1_confidence.y"),
c("X2_reading.x","X2_reading.y"),
c("X3_maths.x","X3_maths.y")),
idvar = 'ID',
v.names = c('X1_confidence', 'X2_reading', 'X3_maths'),
timevar = 'Test',
times = c('pre', 'post'),
direction = 'long'
)
Then just sort by ID:
df_long <- df_long[order(df_long$ID, decreasing = TRUE), ]
I'm hoping to reshape a dataframe in R so that a set of columns read in with duplicated names, and then renamed as var, var.1, var.2, anothervar, anothervar.1, anothervar.2 etc. can be treated as independent observations. I would like the number appended to the variable name to be used as the observation so that I can melt my data.
For example,
dat <- data.frame(ID=1:3, var=c("A", "A", "B"),
anothervar=c(5,6,7),var.1=c("C","D","E"),
anothervar.1 = c(1,2,3))
> dat
ID var anothervar var.1 anothervar.1
1 1 A 5 C 1
2 2 A 6 D 2
3 3 B 7 E 3
How can I reshape the data so it looks like the following:
ID obs var anothervar
1 1 A 5
1 2 C 1
2 1 A 6
2 2 D 2
3 1 B 7
3 2 E 3
Thank you for your help!
We can use melt from data.table that takes multiple patterns in the measure
library(data.table)
melt(setDT(dat), measure = patterns("^var", "anothervar"),
variable.name = "obs", value.name = c("var", "anothervar"))[order(ID)]
# ID obs var anothervar
#1: 1 1 A 5
#2: 1 2 C 1
#3: 2 1 A 6
#4: 2 2 D 2
#5: 3 1 B 7
#6: 3 2 E 3
As for a tidyverse solution, we can use unite with gather
library(dplyr)
library(tidyr)
dat %>%
unite("1", var, anothervar) %>%
unite("2", var.1, anothervar.1) %>%
gather(obs, value, -ID) %>%
separate(value, into = c("var", "anothervar"))
# ID obs var anothervar
#1 1 1 A 5
#2 2 1 A 6
#3 3 1 B 7
#4 1 2 C 1
#5 2 2 D 2
#6 3 2 E 3
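With a newer tidyr (1.0.0 or later), pivot_longer with the ".value" entry in names_to could also handle this. The sketch below assumes the unsuffixed columns are first renamed with a ".0" suffix (my own placeholder convention) so that every name splits cleanly at the dot.

```r
library(tidyr)

dat <- data.frame(ID = 1:3, var = c("A", "A", "B"),
                  anothervar = c(5, 6, 7), var.1 = c("C", "D", "E"),
                  anothervar.1 = c(1, 2, 3))

# Give the first set of columns an explicit suffix (".0" is an arbitrary choice)
names(dat)[names(dat) == "var"] <- "var.0"
names(dat)[names(dat) == "anothervar"] <- "anothervar.0"

# Part before the dot becomes the column name, part after it becomes obs
out <- pivot_longer(dat, cols = -ID,
                    names_to = c(".value", "obs"),
                    names_sep = "\\.")

# Shift the suffix so observations are numbered from 1, as in the desired output
out$obs <- as.integer(out$obs) + 1L
out
```

The rows come out grouped by ID, so no extra sorting step should be needed for this example.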