I have a data set:
df <- data.frame("ID" = c("sue_1","bob_2","nick_3","joe_4"),
"1_confidence.x" = c(3,3,1,5),
"2_reading.x" = c(4,3,2,5),
"3_maths.x" = c(3,2,4,2),
"1_confidence.y" = c(3,2,3,4),
"2_reading.y" = c(3,4,2,1),
"3_maths.y" = c(3,4,2,5)
)
Giving this df:
> df
ID X1_confidence.x X2_reading.x X3_maths.x X1_confidence.y X2_reading.y X3_maths.y
1 sue_1 3 4 3 3 3 3
2 bob_2 3 3 2 2 4 4
3 nick_3 1 2 4 3 2 2
4 joe_4 5 5 2 4 1 5
I would like it to get into this format:
ID Test X1_confidence X2_reading X3_maths
1 sue_1 pre 3 4 3
2 sue_1 post 3 3 3
3 bob_2 pre 3 3 2
4 bob_2 post 2 4 4
5 nick_3 pre 1 2 4
6 nick_3 post 3 2 2
7 joe_4 pre 5 5 2
8 joe_4 post 4 1 5
I've tried reshape and gather, but just can't seem to figure it out...
There should be a more direct way to do this with pivot_longer alone, but I could not get its arguments right. Here is one way that uses pivot_longer and pivot_wider together, from tidyr 1.0.0:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with("X"), names_to = "key") %>%
mutate(key = sub("\\.x$|\\.y$", "", key)) %>%
group_by(ID, key) %>%
mutate(Test = c("pre", "post")) %>%
ungroup() %>%
pivot_wider(id_cols = c(ID, Test), names_from = key, values_from = value)
# ID Test X1_confidence X2_reading X3_maths
# <fct> <chr> <dbl> <dbl> <dbl>
#1 sue_1 pre 3 4 3
#2 sue_1 post 3 3 3
#3 bob_2 pre 3 3 2
#4 bob_2 post 2 4 4
#5 nick_3 pre 1 2 4
#6 nick_3 post 3 2 2
#7 joe_4 pre 5 5 2
#8 joe_4 post 4 1 5
If your tidyr is not up to date, here is the same approach using gather and spread:
df %>%
gather(key, value, -ID) %>%
mutate(key = sub("\\.x$|\\.y$", "", key)) %>%
# group by ID as well, so each group holds exactly one pre and one post value
group_by(ID, key) %>%
mutate(Test = c("pre", "post")) %>%
ungroup() %>%
spread(key, value)
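For the single-call pivot_longer version that the answer above could not pin down, here is a minimal sketch. It assumes tidyr 1.0.0 or later and that the .x columns are the "pre" scores and the .y columns the "post" scores, as in the desired output. The regular expression in names_pattern and the name long are my own choices, not from the original posts.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, written with the names R assigns after check.names
df <- data.frame(ID = c("sue_1", "bob_2", "nick_3", "joe_4"),
                 X1_confidence.x = c(3, 3, 1, 5),
                 X2_reading.x = c(4, 3, 2, 5),
                 X3_maths.x = c(3, 2, 4, 2),
                 X1_confidence.y = c(3, 2, 3, 4),
                 X2_reading.y = c(3, 4, 2, 1),
                 X3_maths.y = c(3, 4, 2, 5))

long <- df %>%
  # ".value" keeps the part before the dot as the output column name;
  # the part after the dot (x or y) goes into the new Test column
  pivot_longer(cols = -ID,
               names_to = c(".value", "Test"),
               names_pattern = "^(.*)\\.(x|y)$") %>%
  # Assumption: x means the first (pre) measurement, y the second (post)
  mutate(Test = if_else(Test == "x", "pre", "post"))

long
```

This avoids the round trip through pivot_wider, because the measurement names never leave the column headers.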
This should do the trick:
df_long <- reshape(
data = df,
varying = list(c("X1_confidence.x","X1_confidence.y"),
c("X2_reading.x","X2_reading.y"),
c("X3_maths.x","X3_maths.y")),
idvar = 'ID',
v.names = c('X1_confidence', 'X2_reading', 'X3_maths'),
timevar = 'Test',
times = c('pre', 'post'),
direction = 'long'
)
Then just sort by ID:
df_long <- df_long[order(df_long$ID, decreasing = TRUE), ]
Let's say I have a data frame, and I would like to create new columns by subtracting pairs of existing columns. The pairs follow a pattern. In the code below, the first operand of every subtraction has the same prefix (base_g00) and the second operand has another (allow_m00). The numeric id of the first operand runs from 27 to 43 and that of the second from 20 to 36, i.e. the first id minus 7. Can I rewrite the following code with an apply function or a loop inside mutate to make it simpler? Thanks in advance for any suggestions!
pred_error<-y07_13%>%mutate(annual_util_1=base_g0027-allow_m0020,
annual_util_2=base_g0028-allow_m0021,
annual_util_3=base_g0029-allow_m0022,
annual_util_4=base_g0030-allow_m0023,
annual_util_5=base_g0031-allow_m0024,
annual_util_6=base_g0032-allow_m0025,
annual_util_7=base_g0033-allow_m0026,
annual_util_8=base_g0034-allow_m0027,
annual_util_9=base_g0035-allow_m0028,
annual_util_10=base_g0036-allow_m0029,
annual_util_11=base_g0037-allow_m0030,
annual_util_12=base_g0038-allow_m0031,
annual_util_13=base_g0039-allow_m0032,
annual_util_14=base_g0040-allow_m0033,
annual_util_15=base_g0041-allow_m0034,
annual_util_16=base_g0042-allow_m0035,
annual_util_17=base_g0043-allow_m0036)
I think a more idiomatic tidyverse approach would be to reshape your data so those column groups are encoded as a variable instead of as separate columns which have the same semantic meaning.
For instance,
library(dplyr); library(tidyr); library(stringr)
y07_13 <- tibble(allow_m0021 = 1:5,
allow_m0022 = 2:6,
allow_m0023 = 11:15,
base_g0028 = 5,
base_g0029 = 3:7,
base_g0030 = 100)
y07_13 %>%
mutate(row = row_number()) %>%
pivot_longer(-row) %>%
mutate(type = str_extract(name, "allow_m|base_g"),
num = str_remove(name, type) %>% as.numeric(),
group = num - if_else(type == "allow_m", 20, 27)) %>%
select(row, type, group, value) %>%
pivot_wider(names_from = type, values_from = value) %>%
mutate(annual_util = base_g - allow_m)
Result
# A tibble: 15 x 5
row group allow_m base_g annual_util
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 5 4
2 1 2 2 3 1
3 1 3 11 100 89
4 2 1 2 5 3
5 2 2 3 4 1
6 2 3 12 100 88
7 3 1 3 5 2
8 3 2 4 5 1
9 3 3 13 100 87
10 4 1 4 5 1
11 4 2 5 6 1
12 4 3 14 100 86
13 5 1 5 5 0
14 5 2 6 7 1
15 5 3 15 100 85
Here is a vectorised base R approach:
base_cols <- paste0("base_g00", 27:43)
allow_cols <- paste0("allow_m00", 20:36)
# match the annual_util_1 ... annual_util_17 names used in the question
new_cols <- paste0("annual_util_", 1:17)
y07_13[new_cols] <- y07_13[base_cols] - y07_13[allow_cols]
y07_13
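To try the base R approach without the full data set, the sketch below runs it on a toy stand-in for y07_13 with only three column pairs. The values are made up for illustration, and building the names with sprintf() from a single ids vector is my own variation, chosen so that the "minus 7" rule from the question is stated once.

```r
# Toy stand-in for y07_13 (hypothetical values, three column pairs only)
y07_13 <- data.frame(base_g0027 = c(10, 20),
                     base_g0028 = c(11, 21),
                     base_g0029 = c(12, 22),
                     allow_m0020 = c(1, 2),
                     allow_m0021 = c(3, 4),
                     allow_m0022 = c(5, 6))

ids <- 27:29                                   # ids of the base_g columns
base_cols  <- sprintf("base_g%04d", ids)       # base_g0027, base_g0028, ...
allow_cols <- sprintf("allow_m%04d", ids - 7)  # allow_m0020, allow_m0021, ...
new_cols   <- paste0("annual_util_", seq_along(ids))

# Subtracting two data frames of equal shape works element-wise, column by column
y07_13[new_cols] <- y07_13[base_cols] - y07_13[allow_cols]

y07_13[new_cols]
```

For the real data, setting ids <- 27:43 should reproduce the seventeen columns from the question, provided all the named columns exist.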
How do we combine two or more columns using dplyr?
df = data.frame(a=1:6, b=rep(2,6))
I need my output as
a 1
a 2
a 3
a 4
a 5
a 6
b 2
b 2
b 2
b 2
b 2
b 2
You can use pivot_longer() from the tidyr package:
library(dplyr)
library(tidyr)
df <- data.frame(a = 1:6, b = rep(2, 6))
df %>% mutate(across(.cols = everything(), .fns = as.numeric)) %>%
pivot_longer(cols = everything(), names_to = "var", values_to = "value") %>%
arrange(var)
In base R, stack() gives the same result; rev() puts the name column first:
rev(stack(df))
ind values
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 b 2
8 b 2
9 b 2
10 b 2
11 b 2
12 b 2
I am trying to gather() a data.frame, but somehow it is not doing what I want.
This is my data:
df <- data.frame("id" = c(1),
"reco_1"= c(2),
"sim_1" = c(2),
"title_1"= c(2),
"reco_2" = c(3),
"sim_2" = c(3),
"title_2"= c(3))
And this is what it looks like printed:
> df
id reco_1 sim_1 title_1 reco_2 sim_2 title_2
1 1 2 2 2 3 3 3
When I now gather() my df, it looks like this:
> df %>% gather(reco, sim, -id)
id reco sim
1 1 reco_1 2
2 1 sim_1 2
3 1 title_1 2
4 1 reco_2 3
5 1 sim_2 3
6 1 title_2 3
However, what I would like to have is the following structure:
id reco sim title
1 1 2 2 2
2 2 3 3 3
I would appreciate any help, since I do not know whether gather() is even the right verb for this.
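We can use pivot_longer. The ".value" entry in names_to tells it to use the part of each column name before the underscore as the name of an output column: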
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-id, names_to = c(".value", "new_id"), names_sep = "_") %>%
select(-id)
# A tibble: 2 x 4
new_id reco sim title
<chr> <dbl> <dbl> <dbl>
1 1 2 2 2
2 2 3 3 3
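If the original id should be kept and the suffix should come out as a number rather than a string, a variation along these lines may help. It is a sketch that assumes tidyr 1.1.0 or later, where pivot_longer gained the names_transform argument; the name res is mine.

```r
library(tidyr)

# Same one-row data as in the question
df <- data.frame(id = 1,
                 reco_1 = 2, sim_1 = 2, title_1 = 2,
                 reco_2 = 3, sim_2 = 3, title_2 = 3)

res <- pivot_longer(df,
                    cols = -id,
                    names_to = c(".value", "new_id"),
                    names_sep = "_",
                    # convert the "1"/"2" suffix from character to integer
                    names_transform = list(new_id = as.integer))

res
```

Here id stays in the result next to new_id, so rows from different original ids remain distinguishable once the data has more than one row.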
I am looking to create a new column based on two other column conditions using data.table. Here is my example code:
library(data.table)
group <- c(1,1,1,2,2,2,3,3,3,4,4,4)
date <- c(6,2,3,7,6,9,7,1,4,6,8,9)
val1<- c("","A","A","","A","A","","A","A","","A","A")
df1<-data.frame(group,date,val1)
dt1<-as.data.table(df1)
Here is the output:
group date val1
1 6
1 2 A
1 3 A
2 7
2 6 A
2 9 A
3 7
3 1 A
3 4 A
4 6
4 8 A
4 9 A
I am looking to find the minimum value of date given that val1 = A within each group (1,2,3,4) to look like this:
group date val1 findmin
1 6
1 2 A Y
1 3 A
2 7
2 6 A Y
2 9 A
3 7
3 1 A Y
3 4 A
4 6
4 8 A Y
4 9 A
I have tried
dt1[,findmin:= ifelse(date=min(date[val1 == "A"])),"Y","", by = group]
Read as: if date equals the minimum date where val1 = "A", put a "Y" in a new column called 'findmin', else put nothing, and do this for each group (1,2,3,4). I get this error:
Error in `[.data.table`(dt1, , `:=`(findmin, ifelse(min(date[val1 == "A"]))), :
Provide either by= or keyby= but not both
I appreciate the help, thanks!
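Two things need fixing: the parenthesis that closes ifelse() comes too early, so "Y" and "" are passed to [.data.table instead of to ifelse(), and equality is tested with ==, not =: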
dt1[,findmin := fifelse(date == min(date[val1 == "A"]), "Y", ""), by = group]
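One caveat with the corrected line: it compares date against the minimum for every row in the group, so a row with an empty val1 whose date happens to equal that minimum would also get a "Y". That does not occur in the example data, but the sketch below adds an explicit val1 == "A" condition as a guard. The extra condition is my addition, not part of the original answer.

```r
library(data.table)

# Same data as in the question
dt1 <- data.table(group = c(1,1,1,2,2,2,3,3,3,4,4,4),
                  date  = c(6,2,3,7,6,9,7,1,4,6,8,9),
                  val1  = c("","A","A","","A","A","","A","A","","A","A"))

# Flag only rows that are themselves "A" rows and carry the smallest "A" date in their group
dt1[, findmin := fifelse(val1 == "A" & date == min(date[val1 == "A"]), "Y", ""),
    by = group]

dt1
```

A group without any "A" rows would still make min() warn and return Inf, so such groups would need separate handling.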
This code works using dplyr. I'm sure there is a much more elegant way to do this.
if (!require(dplyr)) {
install.packages("dplyr")
}
library(dplyr)
if (!require(data.table)) {
install.packages("data.table")
}
library(data.table)
group <- c(1,1,1,2,2,2,3,3,3,4,4,4)
date <- c(6,2,3,7,6,9,7,1,4,6,8,9)
val1<- c("","A","A","","A","A","","A","A","","A","A")
df1<-data.frame(group,date,val1)
dt1<-as.data.table(df1)
# filter for A
df2 <- df1 %>% filter(val1 == "A")
# group by group, arrange by date, get the 1st row, ungroup, add findmin = Y
df3 <- df2 %>% group_by(group) %>% arrange(date) %>% slice(1) %>% ungroup() %>% mutate(findmin = "Y")
# join back to the original data
df4 <- df1 %>% left_join(df3, by = c("group", "date", "val1"))
# set NA in findmin to "" if you want
df5 <- df4 %>% mutate(findmin = ifelse(is.na(findmin), "", findmin))
# print
df5
group date val1 findmin
1 1 6
2 1 2 A Y
3 1 3 A
4 2 7
5 2 6 A Y
6 2 9 A
7 3 7
8 3 1 A Y
9 3 4 A
10 4 6
11 4 8 A Y
12 4 9 A
Testing with randomized data
# test randomized
df6 <- sample_frac(df1, size=1)
df6
group date val1
1 3 4 A
2 3 1 A
3 4 8 A
4 4 6
5 4 9 A
6 2 9 A
7 2 7
8 3 7
9 1 3 A
10 1 6
11 2 6 A
12 1 2 A
df6 <- df6 %>%
filter(val1 == "A") %>%
group_by(group) %>%
arrange(date) %>%
slice(1) %>%
ungroup() %>%
mutate(findmin = "Y")
df7 <- df1 %>%
left_join(df6, by = c("group", "date", "val1")) %>%
mutate(findmin = ifelse(is.na(findmin), "", findmin)) %>%
arrange(group, val1, date, findmin)
df7
group date val1 findmin
1 1 6
2 1 2 A Y
3 1 3 A
4 2 7
5 2 6 A Y
6 2 9 A
7 3 7
8 3 1 A Y
9 3 4 A
10 4 6
11 4 8 A Y
12 4 9 A
Alternatively, use slice(which.min(date)) in place of arrange() followed by slice(1):
df6 <- sample_frac(df1, size=1)
df6
df6 <- df6 %>%
filter(val1 == "A") %>%
group_by(group) %>%
slice(which.min(date)) %>%
ungroup() %>%
mutate(findmin = "Y")
df7 <- df1 %>%
left_join(df6, by = c("group", "date", "val1")) %>%
mutate(findmin = ifelse(is.na(findmin), "", findmin)) %>%
arrange(group, val1, date, findmin)
df7
group date val1 findmin
1 1 6
2 1 2 A Y
3 1 3 A
4 2 7
5 2 6 A Y
6 2 9 A
7 3 7
8 3 1 A Y
9 3 4 A
10 4 6
11 4 8 A Y
12 4 9 A
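As a possible shorter dplyr variant of the filter-and-join steps above, a grouped mutate() can compute the flag in place. This is a sketch, not the original answerer's code. It assumes every group has at least one "A" row, and the name res is mine.

```r
library(dplyr)

# Same data as in the question
df1 <- data.frame(group = c(1,1,1,2,2,2,3,3,3,4,4,4),
                  date  = c(6,2,3,7,6,9,7,1,4,6,8,9),
                  val1  = c("","A","A","","A","A","","A","A","","A","A"))

res <- df1 %>%
  group_by(group) %>%
  # "Y" only for "A" rows whose date equals the smallest "A" date in the group
  mutate(findmin = if_else(val1 == "A" & date == min(date[val1 == "A"]), "Y", "")) %>%
  ungroup()

res
```

Because nothing is filtered out or joined back, the rows keep their original order and no NA values have to be replaced afterwards.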
How can I get a dense rank of multiple columns in a dataframe? For example,
# I have:
df <- data.frame(x = c(1,1,1,1,2,2,2,3,3,3),
y = c(1,2,3,4,2,2,2,1,2,3))
# I want:
res <- data.frame(x = c(1,1,1,1,2,2,2,3,3,3),
y = c(1,2,3,4,2,2,2,1,2,3),
r = c(1,2,3,4,5,5,5,6,7,8))
res
x y r
1 1 1 1
2 1 2 2
3 1 3 3
4 1 4 4
5 2 2 5
6 2 2 5
7 2 2 5
8 3 1 6
9 3 2 7
10 3 3 8
My hack approach works for this particular dataset:
df %>%
arrange(x,y) %>%
mutate(r = if_else(y - lag(y,default=0) == 0, 0, 1)) %>%
mutate(r = cumsum(r))
But there must be a more general solution, maybe using functions like dense_rank() or row_number(). But I'm struggling with this.
dplyr solutions are ideal.
Right after posting, I think I found a solution here. In my case, it would be:
mutate(df, r = dense_rank(interaction(x, y, lex.order = TRUE)))
But if you have a better solution, please share.
data.table
data.table has you covered with frank().
library(data.table)
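frank(df, x, y, ties.method = 'dense')
[1] 1 2 3 4 5 5 5 6 7 8
You can run df$r <- frank(df, x, y, ties.method = 'dense') to add the rank as a new column. Note that ties.method = 'min' would leave gaps after ties (1 2 3 4 5 5 5 8 9 10), which is not a dense rank.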
tidyr/dplyr
Another option (though clunkier) is to use tidyr::unite to collapse your columns into one, then apply dplyr::dense_rank.
library(tidyverse)
df %>%
# add a single column with all the info, keeping x and y
unite(xy, x, y, remove = FALSE) %>%
# dense rank on that (xy is a string, so this follows string order, not numeric order)
mutate(r = dense_rank(xy)) %>%
# now drop the helper col
select(-xy)
You can use cur_group_id:
library(dplyr)
df %>%
group_by(x, y) %>%
mutate(r = cur_group_id())
# x y r
# <dbl> <dbl> <int>
# 1 1 1 1
# 2 1 2 2
# 3 1 3 3
# 4 1 4 4
# 5 2 2 5
# 6 2 2 5
# 7 2 2 5
# 8 3 1 6
# 9 3 2 7
# 10 3 3 8
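For completeness, the same dense rank can be sketched in base R without any packages. It relies on interaction() with lex.order = TRUE, as in the asker's own solution, plus drop = TRUE so that unused combinations of x and y do not leave gaps in the integer codes. The helper name key is mine.

```r
# Same data as in the question
df <- data.frame(x = c(1,1,1,1,2,2,2,3,3,3),
                 y = c(1,2,3,4,2,2,2,1,2,3))

# One factor level per observed (x, y) pair, ordered by x first, then y;
# drop = TRUE removes pairs that never occur, so the codes are consecutive
key <- interaction(df$x, df$y, drop = TRUE, lex.order = TRUE)

# The integer code of each level is the dense rank of its (x, y) pair
df$r <- as.integer(key)

df
```

Since the levels come from factor(x) and factor(y), numeric columns are ordered numerically here, unlike a rank computed on pasted strings.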