Heyho,
I want to split my dataframe which looks like:
ID name1_attr1 name1_attr2 name2_attr2 ...
1 2 3 1
2 1 3 4
3 3 4 2
4 6 7 5
into:
ID name attr1 attr2
1 1 2 3
2 1 1 3
3 1 3 4
4 1 6 7
1 2 1
2 2 4
3 2 2
4 2 5
I am really not sure how to do that. Do you have any hints or a starting point for me?
Thanks in advance :)
We can use melt from data.table, whose measure.vars argument accepts multiple patterns via patterns():
library(data.table)
melt(setDT(df), measure.vars = patterns("attr1", "attr2"),
value.name = c("attr1", "attr2"), variable.name = "name")
# ID name attr1 attr2
#1: 1 1 2 3
#2: 2 1 1 3
#3: 3 1 3 4
#4: 4 1 6 7
#5: 1 2 NA 1
#6: 2 2 NA 4
#7: 3 2 NA 2
#8: 4 2 NA 5
data
df <- structure(list(ID = 1:4, name1_attr1 = c(2L, 1L, 3L, 6L), name1_attr2 = c(3L,
3L, 4L, 7L), name2_attr2 = c(1L, 4L, 2L, 5L)), .Names = c("ID",
"name1_attr1", "name1_attr2", "name2_attr2"), class = "data.frame", row.names = c(NA,
-4L))
You can try a tidyverse solution using a combination of gather and spread.
d <- read.table(text="ID name1_attr1 name1_attr2 name2_attr2
1 2 3 1
2 1 3 4
3 3 4 2
4 6 7 5", header=T)
library(tidyverse)
d %>%
gather(k, v, -ID) %>%
separate(k, c("name","b"), sep = "_") %>%
spread(b, v, fill = "") %>%
arrange(name)
ID name attr1 attr2
1 1 name1 2 3
2 2 name1 1 3
3 3 name1 3 4
4 4 name1 6 7
5 1 name2 1
6 2 name2 4
7 3 name2 2
8 4 name2 5
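For comparison, here is a base R sketch of the same reshape using reshape(). Since name2 has no attr1 column, the sketch first pads a hypothetical name2_attr1 column with NA so that both measurement sets have the same length. That padding step is an assumption of this sketch, not part of the answers above.

```r
# Data from the question
d <- data.frame(
  ID = 1:4,
  name1_attr1 = c(2L, 1L, 3L, 6L),
  name1_attr2 = c(3L, 3L, 4L, 7L),
  name2_attr2 = c(1L, 4L, 2L, 5L)
)

# Assumption: pad the missing name2_attr1 column with NA
d$name2_attr1 <- NA_integer_

# One vector of source columns per output column, in name order
long <- reshape(
  d,
  direction = "long",
  idvar = "ID",
  varying = list(
    c("name1_attr1", "name2_attr1"),
    c("name1_attr2", "name2_attr2")
  ),
  v.names = c("attr1", "attr2"),
  timevar = "name",
  times = 1:2
)
rownames(long) <- NULL
long
```

The result has one row per ID and name, with attr1 left as NA for name 2.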
I have a data set that looks like this
ID Week
1 3
1 5
1 5
1 8
1 11
1 16
2 2
2 2
2 3
2 3
2 9
Now, what I would like to do is add another column to the DataFrame so that, for every ID, I mark each week's relative position. More precisely, I would like to mark the ID's earliest week (smallest number) as 1, the next week for that ID as 2, and so forth; if there are two observations of the same week, they get the same number.
So, in the above example I should get:
ID Week Order
1 3 1
1 5 2
1 5 2
1 8 3
1 11 4
1 16 5
2 2 1
2 2 1
2 3 2
2 3 2
2 9 3
How could I achieve this?
Thank you very much!
A base R option using ave + match
transform(
df,
Order = ave(Week,
ID,
FUN = function(x) match(x, sort(unique(x)))
)
)
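or ave + order (thanks to @IRTFM for the comment). Note that order returns the sort permutation rather than a rank, so tied weeks do not share a number; it matches the desired output only when Week is sorted and has no repeats within each ID.
transform(
df,
Order = ave(Week,
ID,
FUN = order
)
)
The ave + match version gives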
ID Week Order
1 1 3 1
2 1 5 2
3 1 5 2
4 1 8 3
5 1 11 4
6 1 16 5
7 2 2 1
8 2 2 1
9 2 3 2
10 2 3 2
11 2 9 3
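A small check of the two ave variants on the question's data. It suggests that FUN = order returns each group's sort permutation, which is not the same as a shared (dense) rank once a week repeats. The vector names dense and perm are introduced here for illustration only.

```r
# Data from the question
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  Week = c(3L, 5L, 5L, 8L, 11L, 16L, 2L, 2L, 3L, 3L, 9L)
)

# Dense rank: position of each week among the sorted unique weeks of its ID
dense <- ave(df$Week, df$ID, FUN = function(x) match(x, sort(unique(x))))

# Sort permutation: ties get distinct numbers
perm <- ave(df$Week, df$ID, FUN = order)

cbind(df, dense, perm)
```

In this data the two columns diverge from the second row with Week 5 onward within ID 1, and from the second row within ID 2.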
A data.table option with frank
> setDT(df)[, Order := frank(Week, ties.method = "dense"), ID][]
ID Week Order
1: 1 3 1
2: 1 5 2
3: 1 5 2
4: 1 8 3
5: 1 11 4
6: 1 16 5
7: 2 2 1
8: 2 2 1
9: 2 3 2
10: 2 3 2
11: 2 9 3
Data
> dput(df)
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), Week = c(3L, 5L, 5L, 8L, 11L, 16L, 2L, 2L, 3L, 3L, 9L)), class = "data.frame", row.names =
c(NA,
-11L))
You can use dense_rank in dplyr:
library(dplyr)
df %>% group_by(ID) %>% mutate(Order = dense_rank(Week)) %>% ungroup
# ID Week Order
# <int> <int> <int>
# 1 1 3 1
# 2 1 5 2
# 3 1 5 2
# 4 1 8 3
# 5 1 11 4
# 6 1 16 5
# 7 2 2 1
# 8 2 2 1
# 9 2 3 2
#10 2 3 2
#11 2 9 3
I'm a beginner in R and I'm facing an issue.
Problem: I need to sort a data frame by two columns (ID and the i-th column), then take the lagged difference of the i-th column and record it. Then I need to re-sort the data by ID and the (i+1)-th column, and so on.
What I have written up till now:
for (val in (4:length(colnames(df)))){
df <- df[with(df, order(ID, df[val])), ]
d2_df <- df %>%
mutate_at(c(df[val]), list(lagged = ~ . - lag(.)))
}
The code above fails because mutate_at throws the error below:
Error: `.vars` must be a character/numeric vector or a `vars()` object, not a list.
Original dataset:
ID S1 S2
1 1 3 1
2 1 5 2
3 1 1 3
4 2 2 7
5 3 4 9
6 3 2 11
After Sort on ID and S1
ID S1 S2
1 1 1 3
2 1 3 1
3 1 5 2
4 2 2 7
5 3 2 11
6 3 4 9
What I need now is S1.1, the lagged difference within each ID in the sorted data frame:
ID S1 S2 S1.1
1 1 1 3 NA
2 1 3 1 2
3 1 5 2 2
4 2 2 7 NA
5 3 2 11 NA
6 3 4 9 2
Similar logic applies for S2 where a new S2.2 will be generated.
Any help would be immensely appreciated.
Additionally what is required (below); where sum.S1 is the sum of the lagged differences and count.S1 is the count of observations at S1 for respective ID:
ID sum.S1 sum.S2 count.S1 count.S2
1 1 4 2 3 3
2 2 NA NA 1 1
3 3 2 2 2 2
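Here's a way using non-standard evaluation (NSE). Each difference column is computed on the data sorted by ID and that column, then bound back to df by position: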
library(dplyr)
library(purrr)
library(rlang)
cols <- c('S1', 'S2')
bind_cols(df, map_dfc(cols, ~{
col <- sym(.x)
df %>%
arrange(ID, !!col) %>%
group_by(ID) %>%
transmute(!!paste0(.x, '.1') := !!col - lag(!!col)) %>%
ungroup %>%
select(-ID)
}))
# ID S1 S2 S1.1 S2.1
#1 1 3 1 NA NA
#2 1 5 2 2 1
#3 1 1 3 2 1
#4 2 2 7 NA NA
#5 3 4 9 NA NA
#6 3 2 11 2 2
data
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 3L), S1 = c(3L, 5L,
1L, 2L, 4L, 2L), S2 = c(1L, 2L, 3L, 7L, 9L, 11L)),
class = "data.frame", row.names = c(NA, -6L))
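The question also asks for a per-ID summary (sum.S1, count.S1, and so on), which the answer above does not cover. A minimal base R sketch, assuming the df from the data block; the helper name sum_lagged is hypothetical.

```r
# Data from the question
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 3L, 3L),
  S1 = c(3L, 5L, 1L, 2L, 4L, 2L),
  S2 = c(1L, 2L, 3L, 7L, 9L, 11L)
)

# Sum of lagged differences of the sorted values;
# NA when a group has a single row (no difference exists)
sum_lagged <- function(v) {
  if (length(v) < 2) return(NA_integer_)
  sum(diff(sort(v)))
}

# tapply returns results in sorted ID order, matching sort(unique(ID))
res <- data.frame(
  ID = sort(unique(df$ID)),
  sum.S1 = as.vector(tapply(df$S1, df$ID, sum_lagged)),
  sum.S2 = as.vector(tapply(df$S2, df$ID, sum_lagged)),
  count.S1 = as.vector(tapply(df$S1, df$ID, length)),
  count.S2 = as.vector(tapply(df$S2, df$ID, length))
)
res
```

Because the values are sorted inside the helper, the row order of df does not matter for this summary.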
I have the following panel data set:
group i f r d
1 4 8 3 3
1 9 4 5 1
1 2 2 2 2
2 5 5 3 2
2 3 9 3 3
2 9 1 3 1
I want to reorder column i in this data frame according to values in column d for each group. So the highest value for group 1 in column i should correspond to the highest value in column d. In the end my data.frame should look like this:
group i f r d
1 9 8 3 3
1 2 4 5 1
1 4 2 2 2
2 5 5 3 2
2 9 9 3 3
2 3 1 3 1
Here is a dplyr solution.
First, group by group. Then store the ordering permutation of column d in a temporary column, ord, and use it to reorder i.
library(dplyr)
df1 %>%
group_by(group) %>%
mutate(ord = order(d),
i = i[ord]) %>%
ungroup() %>%
select(-ord)
## A tibble: 6 x 5
# group i f r d
# <int> <int> <int> <int> <int>
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 9 5 3 2
#5 2 5 9 3 3
#6 2 3 1 3 1
original (wrong)
You can achieve this using dplyr and rank:
library(dplyr)
df1 %>% group_by(group) %>%
mutate(i = i[rev(rank(d))])
Edit
This question is actually trickier than it first seems and the original answer I posted is incorrect. The correct solution orders by i before subsetting by the rank of d. This gives OP's desired output which my previous answer did not (not paying attention!)
df1 %>% group_by(group) %>%
mutate(i = i[order(i)][rank(d)])
# A tibble: 6 x 5
# Groups: group [2]
# group i f r d
# <int> <int> <int> <int> <int>
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 5 5 3 2
#5 2 9 9 3 3
#6 2 3 1 3 1
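To see why the two approaches differ, here is a worked check on group 2 of the question's data (the names i2, d2, by_order and by_rank are introduced for illustration). i[order(d)] uses order(d) as positions into i, whereas sort(i)[rank(d)] gives the row with the k-th smallest d the k-th smallest i.

```r
# Group 2 from the question
i2 <- c(5L, 3L, 9L)
d2 <- c(2L, 3L, 1L)

# order(d2) is c(3, 1, 2): positions of d2 from smallest to largest
by_order <- i2[order(d2)]

# rank(d2) is c(2, 3, 1): each row's rank, used to index the sorted i2
by_rank <- sort(i2)[rank(d2)]

by_order
by_rank
```

Only by_rank reproduces the 5, 9, 3 that the question asks for in group 2. If d contained ties, rank would return fractional values by default, so a ties.method would have to be chosen first.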
There is some confusion regarding the expected output. Here I am showing a way to get both the versions of the output.
A base R option using split and mapply
df$i <- c(mapply(function(x, y) sort(y)[x],
split(df$d, df$group), split(df$i, df$group)))
df
# group i f r d
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 5 5 3 2
#5 2 9 9 3 3
#6 2 3 1 3 1
Or another version
df$i <- c(mapply(function(x, y) y[order(x)],
split(df$d, df$group), split(df$i, df$group)))
df
# group i f r d
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 9 5 3 2
#5 2 5 9 3 3
#6 2 3 1 3 1
We can also use dplyr for this:
For the first version
library(dplyr)
df %>%
group_by(group) %>%
mutate(i = sort(i)[d])
The second version is already shown by @Rui using order
df %>%
group_by(group) %>%
mutate(i = i[order(d)])
An option with data.table
library(data.table)
setDT(df1)[, i := i[order(d)], group]
df1
# group i f r d
#1: 1 9 8 3 3
#2: 1 2 4 5 1
#3: 1 4 2 2 2
#4: 2 9 5 3 2
#5: 2 5 9 3 3
#6: 2 3 1 3 1
If we need the second version
setDT(df1)[, i := sort(i)[d], group]
data
df1 <- structure(list(group = c(1L, 1L, 1L, 2L, 2L, 2L), i = c(4L, 9L,
2L, 5L, 3L, 9L), f = c(8L, 4L, 2L, 5L, 9L, 1L), r = c(3L, 5L,
2L, 3L, 3L, 3L), d = c(3L, 1L, 2L, 2L, 3L, 1L)), class = "data.frame",
row.names = c(NA,
-6L))
I'm trying to expand an R data table that looks like this:
a step_num duration
1 1 5
1 2 4
1 3 1
2 1 7
2 2 2
2 3 9
3 1 1
3 2 1
3 3 3
Into something that looks like this:
a | step_num | duration | 1_duration | 2_duration | 3_duration |
----------------------------------------------------------------
1 1 5 5 - -
1 2 4 - 4 -
1 3 1 - - 1
2 1 7 7 - -
2 2 2 - 2 -
2 3 9 - - 9
3 1 1 1 - -
3 2 1 - 1 -
3 3 3 - - 3
I'm wondering if there's an 'expand' function, so to speak, that would do this.
Thanks!
We can do this in base R.
cbind(df,
reshape(df, idvar = c("a","step_num"), timevar = "step_num", direction = "wide")[,-1])
#> a step_num duration duration.1 duration.2 duration.3
#> 1 1 1 5 5 NA NA
#> 2 1 2 4 NA 4 NA
#> 3 1 3 1 NA NA 1
#> 4 2 1 7 7 NA NA
#> 5 2 2 2 NA 2 NA
#> 6 2 3 9 NA NA 9
#> 7 3 1 1 1 NA NA
#> 8 3 2 1 NA 1 NA
#> 9 3 3 3 NA NA 3
Created on 2019-05-21 by the reprex package (v0.2.1)
Simple tidyverse solution:
library(tidyverse)
df %>%
mutate(step = step_num) %>%
spread(step, duration, fill = '-') %>%
rename_all( ~ gsub('(\\d+)', 'duration_\\1', .))
# a step_num duration_1 duration_2 duration_3
# 1 1 1 5 - -
# 2 1 2 - 4 -
# 3 1 3 - - 1
# 4 2 1 7 - -
# 5 2 2 - 2 -
# 6 2 3 - - 9
# 7 3 1 1 - -
# 8 3 2 - 1 -
# 9 3 3 - - 3
Or an option with dcast from data.table
library(data.table)
dcast(setDT(df), a + step_num ~
paste0("duration_", step_num), value.var = 'duration')
# a step_num duration_1 duration_2 duration_3
#1: 1 1 5 NA NA
#2: 1 2 NA 4 NA
#3: 1 3 NA NA 1
#4: 2 1 7 NA NA
#5: 2 2 NA 2 NA
#6: 2 3 NA NA 9
#7: 3 1 1 NA NA
#8: 3 2 NA 1 NA
#9: 3 3 NA NA 3
NOTE: It is better to keep NA instead of "-": NA is easy to handle with is.na/complete.cases/na.omit, and it does not change the class of the column to character.
data
df <- structure(list(a = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), step_num = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), duration = c(5L, 4L, 1L, 7L,
2L, 9L, 1L, 1L, 3L)), class = "data.frame", row.names = c(NA,
-9L))
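In the same spirit of keeping NA, here is a base R sketch that adds one column per step with a loop and ifelse. The duration_<step> column names follow the dcast output above; the loop itself is just one possible approach, and out is a name introduced for illustration.

```r
# Data from the question
df <- data.frame(
  a = rep(1:3, each = 3),
  step_num = rep(1:3, times = 3),
  duration = c(5L, 4L, 1L, 7L, 2L, 9L, 1L, 1L, 3L)
)

out <- df
for (s in sort(unique(df$step_num))) {
  # Keep the duration where the row belongs to step s, NA elsewhere
  out[[paste0("duration_", s)]] <- ifelse(df$step_num == s, df$duration, NA)
}
out
```

This keeps the original a, step_num and duration columns in place, and the new columns stay numeric.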
Here's an approach using dplyr and tidyr.
We take the original data and first add a new column, col, which holds the column header we want, based on step_num. Then we use tidyr::spread to put the durations into different columns depending on which col they go with. fill = "-" fills all the empty cells with dashes. Finally, we drop the a and step_num columns, since they already exist in the original data and we don't want copies of them, and bind the result onto the original data.
(Note, we needed step_num to still exist at the spread step, because we wanted to keep each row aligned with the original rows. Without step_num, the data would get spread into a wider, shorter format that would have misaligned rows.)
library(dplyr); library(tidyr)
df %>%
mutate(col = paste0(step_num, "_duration")) %>%
spread(col, duration, fill = "-") %>%
select(-a, -step_num) %>%
bind_cols(df, .) # Edit, per excellent suggestion from M-M
a step_num duration 1_duration 2_duration 3_duration
1 1 1 5 5 - -
2 1 2 4 - 4 -
3 1 3 1 - - 1
4 2 1 7 7 - -
5 2 2 2 - 2 -
6 2 3 9 - - 9
7 3 1 1 1 - -
8 3 2 1 - 1 -
9 3 3 3 - - 3
I would like to have a sequence that restarts with each run of Blocks, like this:
Blocks MySeq
1 1
1 2
2 1
2 2
1 1
1 2
1 3
1 4
3 1
3 2
3 3
4 1
4 2
4 3
4 4
Based on this, I have tried:
myDf %>% dplyr::mutate(MySeq= seq(1:length(unique(Blocks)),rle(Blocks)$"lengths")
However, the sequence is not resetting with each new block. See below:
Blocks MySeq
1 1
1 2
2 1
2 2
1 3
1 4
1 5
1 6
3 1
3 2
3 3
4 1
4 2
4 3
4 4
How can I start a new sequence for each individual block?
Try this
unlist(sapply(rle(df1$Blocks)$lengths,seq_len))
We can use rleid from data.table: group by the rleid of 'Blocks' and assign (:=) 'MySeq' as the sequence of rows within each group.
library(data.table)
setDT(df1)[, MySeq := seq_len(.N) , .(rleid(Blocks))]
df1
# Blocks MySeq
# 1: 1 1
# 2: 1 2
# 3: 2 1
# 4: 2 2
# 5: 1 1
# 6: 1 2
# 7: 1 3
# 8: 1 4
# 9: 3 1
#10: 3 2
#11: 3 3
#12: 4 1
#13: 4 2
#14: 4 3
#15: 4 4
Or, if we are using base R, sequence applied to the run lengths gives the expected output
sequence(rle(df1$Blocks)$lengths)
#[1] 1 2 1 2 1 2 3 4 1 2 3 1 2 3 4
data
df1 <- structure(list(Blocks = c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 4L, 4L, 4L, 4L)), .Names = "Blocks", row.names = c(NA,
-15L), class = "data.frame")
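Another base R sketch, for the case where the counter should live in the data frame itself: build a run identifier with cumsum and count within it using ave. The name run is a hypothetical helper, not something from the answers above.

```r
# Data from the question
df1 <- data.frame(
  Blocks = c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L)
)

# A new run starts wherever the value differs from the previous row
run <- cumsum(c(TRUE, df1$Blocks[-1] != df1$Blocks[-nrow(df1)]))

# Count rows within each run
df1$MySeq <- ave(seq_along(run), run, FUN = seq_along)
df1
```

Grouping by run rather than by Blocks is what makes the counter restart when the value 1 appears a second time.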