Expanding data frame using tidyverse [duplicate] - r

Here's an example of what I'm trying to do:
df <- data.frame(
id = letters[1:5],
enum_start = c(1, 1, 1, 1, 1),
enum_end = c(1, 5, 3, 7, 2)
)
library(dplyr)
df2 <- df %>%
  split(.$id) %>%
  lapply(function(x) cbind(x, hello = seq(x$enum_start, x$enum_end, by = 1L))) %>%
  bind_rows()
df2
# id enum_start enum_end hello
# 1 a 1 1 1
# 2 b 1 5 1
# 3 b 1 5 2
# 4 b 1 5 3
# 5 b 1 5 4
# 6 b 1 5 5
# 7 c 1 3 1
# 8 c 1 3 2
# 9 c 1 3 3
# 10 d 1 7 1
# 11 d 1 7 2
# 12 d 1 7 3
# 13 d 1 7 4
# 14 d 1 7 5
# 15 d 1 7 6
# 16 d 1 7 7
# 17 e 1 2 1
# 18 e 1 2 2
Note that the starting and ending values for hello depend on the data and hence the number of rows for each id is dynamic. I'm looking for a solution that involves maybe expand from tidyr but am struggling.

Here's a dplyr/tidyr approach:
library(dplyr)
library(tidyr)
group_by(df, id) %>%
  expand(enum_start, enum_end, hello = full_seq(enum_end:enum_start, 1))
Not sure if there's a tidyr-way without grouping the data (would be interesting to know)
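One possible way to do this without grouping is to build the sequences as a list column and then unnest it. This is only a sketch, assuming dplyr and a tidyr version that provides unnest(cols = ...) (1.0.0 or later); the use of Map() to build the list column is my choice, not something from the answer above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = letters[1:5],
  enum_start = c(1, 1, 1, 1, 1),
  enum_end = c(1, 5, 3, 7, 2)
)

df2 <- df %>%
  # one sequence per row, stored as a list column
  mutate(hello = Map(seq, enum_start, enum_end)) %>%
  # one output row per element of each sequence
  unnest(cols = hello)

nrow(df2)
```

Because each row carries its own sequence, no group_by() is needed, and rows whose enum_start is not 1 are handled the same way.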

Here is a base R method that produces the desired output.
dfNew <- within(df[rep(seq_len(nrow(df)), df$enum_end), ],
hello <- sequence(df$enum_end))
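sequence takes a vector of lengths and, for each length n, returns 1, 2, ..., n, with the results concatenated. It is used to produce the "hello" variable. within reduces typing and returns a modified data.frame. I fed it an expanded version of df in which each row is repeated enum_end times using rep and [. Note that this relies on every enum_start being 1, as in the example; with other start values both the repeat counts and the sequence need an offset.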
dfNew
id enum_start enum_end hello
1 a 1 1 1
2 b 1 5 1
2.1 b 1 5 2
2.2 b 1 5 3
2.3 b 1 5 4
2.4 b 1 5 5
3 c 1 3 1
3.1 c 1 3 2
3.2 c 1 3 3
4 d 1 7 1
4.1 d 1 7 2
4.2 d 1 7 3
4.3 d 1 7 4
4.4 d 1 7 5
4.5 d 1 7 6
4.6 d 1 7 7
5 e 1 2 1
5.1 e 1 2 2
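If enum_start is not always 1, the same base R idea can be adapted by repeating each row (enum_end - enum_start + 1) times and shifting the sequence by the start value. The following is a sketch with made-up start values, not part of the original answer; the names n and expanded are illustrative.

```r
df <- data.frame(
  id = letters[1:3],
  enum_start = c(1, 3, 5),   # hypothetical starts other than 1
  enum_end   = c(1, 5, 6)
)

# number of rows each id should expand to
n <- df$enum_end - df$enum_start + 1

# repeat row i n[i] times
expanded <- df[rep(seq_len(nrow(df)), n), ]

# 1..n[i] within each id, shifted so that it begins at enum_start
expanded$hello <- sequence(n) + expanded$enum_start - 1

expanded
```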

Related

convert lists of vectors in just one tibble data frame

I have two lists. Each of them with many vectors (around 500) of different lengths and I would like to get a tibble data frame with three columns.
My reproducible example is the following:
> a
[[1]]
[1] 1 3 6
[[2]]
[1] 5 4
> b
[[1]]
[1] 3 4
[[2]]
[1] 5 6 7
I would like to get the following tibble data frame:
name index value
a 1 1
a 1 3
a 1 6
a 2 5
a 2 4
b 1 3
b 1 4
b 2 5
b 2 6
b 2 7
I would be grateful if someone could help me with this issue
using Base R:
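# substr() here assumes one-character list names and fewer than ten vectors per list
transform(stack(c(a = a, b = b)), name = substr(ind, 1, 1), ind = substr(ind, 2, 2))
values ind name
1 1 1 a
2 3 1 a
3 6 1 a
4 5 2 a
5 4 2 a
6 3 1 b
7 4 1 b
8 5 2 b
9 6 2 b
10 7 2 b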
using tidyverse:
library(tidyverse)
list(a = a, b = b) %>%
  map(~ stack(setNames(.x, seq_along(.x)))) %>%
  bind_rows(.id = "name")
name values ind
1 a 1 1
2 a 3 1
3 a 6 1
4 a 5 2
5 a 4 2
6 b 3 1
7 b 4 1
8 b 5 2
9 b 6 2
10 b 7 2
Here is one option with tidyverse
library(tidyverse)
list(a = a, b = b) %>%
  map_df(enframe, name = "index", .id = "name") %>%
  unnest(value)
# A tibble: 10 x 3
# name index value
# <chr> <int> <dbl>
# 1 a 1 1
# 2 a 1 3
# 3 a 1 6
# 4 a 2 5
# 5 a 2 4
# 6 b 1 3
# 7 b 1 4
# 8 b 2 5
# 9 b 2 6
#10 b 2 7
data
a <- list(c(1, 3, 6), c(5, 4))
b <- list(c(3, 4), c(5, 6, 7))
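For lists with hundreds of vectors, a base R version that does not parse names may be easier to scale. The sketch below builds the three columns directly with rep(), lengths() and unlist(); the names lists and res are illustrative and not from the answers above.

```r
a <- list(c(1, 3, 6), c(5, 4))
b <- list(c(3, 4), c(5, 6, 7))

lists <- list(a = a, b = b)

res <- do.call(rbind, lapply(names(lists), function(nm) {
  l <- lists[[nm]]
  data.frame(
    name  = nm,                                # list name, recycled
    index = rep(seq_along(l), lengths(l)),     # position of each vector in its list
    value = unlist(l)                          # all values, in order
  )
}))

res
```

If a tibble is required, the result can be converted afterwards, for example with tibble::as_tibble(res).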

Extract Index of repeat value

How do I extract specific rows when a column contains repeating runs of values? My data looks like the table below. I want to extract the last row of each run of X (A 3 10, A 2 3, etc.), or the index of that row.
Name X M
A 1 1
A 2 9
A 3 10
A 1 1
A 2 3
A 1 5
A 2 6
A 3 4
A 4 5
A 5 3
B 1 1
B 2 9
B 3 10
B 1 1
B 2 3
Expected output
Index Name X M
3 A 3 10
5 A 2 3
10 A 5 3
13 B 3 10
15 B 2 3
Using base R duplicated and cumsum (with the data stored in dat):
dups <- !duplicated(cumsum(dat$X == 1), fromLast=TRUE)
cbind(dat[dups,], Index=which(dups))
# Name X M Index
#3 A 3 10 3
#5 A 2 3 5
#10 A 5 3 10
#13 B 3 10 13
#15 B 2 3 15
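To see why this works, it may help to look at the intermediate values. The sketch below rebuilds the example data under the name dat and assumes, as in the question, that every run of X starts at 1; the names run and last_in_run are illustrative.

```r
dat <- data.frame(
  Name = rep(c("A", "B"), c(10, 5)),
  X = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 1, 2),
  M = c(1, 9, 10, 1, 3, 5, 6, 4, 5, 3, 1, 9, 10, 1, 3)
)

# each X == 1 starts a new run, so the running count of 1s labels the runs
run <- cumsum(dat$X == 1)
run
# 1 1 1 2 2 3 3 3 3 3 4 4 4 5 5

# scanning from the end, the first time a label appears is the last row of that run
last_in_run <- !duplicated(run, fromLast = TRUE)
which(last_in_run)
# 3 5 10 13 15
```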
A solution using dplyr.
library(dplyr)
df2 <- df %>%
mutate(Flag = ifelse(lead(X) < X, 1, 0)) %>%
mutate(Index = 1:n()) %>%
filter(Flag == 1 | is.na(Flag)) %>%
select(Index, X, M)
df2
# Index X M
# 1 3 3 10
# 2 5 2 3
# 3 10 5 3
# 4 13 3 10
# 5 15 2 3
Flag is a column showing whether the next value of X is smaller than the current one. If so, Flag is 1; otherwise it is 0. For the last row, lead(X) is NA, so Flag is NA. We can then filter for Flag == 1 or where Flag is NA, which keeps the last row of every run. df2 is the final filtered data frame.
DATA
df <- read.table(text = "Name X M
A 1 1
A 2 9
A 3 10
A 1 1
A 2 3
A 1 5
A 2 6
A 3 4
A 4 5
A 5 3
B 1 1
B 2 9
B 3 10
B 1 1
B 2 3",
header = TRUE, stringsAsFactors = FALSE)

numbering duplicated rows in dplyr [duplicate]

I ran into a problem numbering the duplicated rows in a data.frame and could not find a similar post.
Let's say we have data like this:
df <- data.frame(gr=gl(7,2),x=c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
> df
gr x
1 1 a
2 1 a
3 2 b
4 2 b
5 3 c
6 3 c
7 4 a
8 4 a
9 5 c
10 5 c
11 6 d
12 6 d
13 7 a
14 7 a
I want to add a new column called x_dupl that numbers the occurrences of each x value: 1 for the first occurrence, 2 for the second, 3 for the third, and so on.
Thanks in advance!
The expected output
> df
gr x x_dupl
1 1 a 1
2 1 a 1
3 2 b 1
4 2 b 1
5 3 c 1
6 3 c 1
7 4 a 2
8 4 a 2
9 5 c 2
10 5 c 2
11 6 d 1
12 6 d 1
13 7 a 3
14 7 a 3
Your example data (plus rows where gr = 7 as in your output), and named df1, not df:
df1 <- data.frame(gr = gl(7,2),
x = c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
library(dplyr)
df1 %>%
group_by(x) %>%
mutate(x_dupl = dense_rank(gr)) %>%
ungroup()
# A tibble: 14 x 3
gr x x_dupl
<fctr> <fctr> <int>
1 1 a 1
2 1 a 1
3 2 b 1
4 2 b 1
5 3 c 1
6 3 c 1
7 4 a 2
8 4 a 2
9 5 c 2
10 5 c 2
11 6 d 1
12 6 d 1
13 7 a 3
14 7 a 3
A base R solution:
df <- data.frame(gr=gl(7,2),x=c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
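# convert via factor(): since R 4.0, data.frame() keeps strings as character, and as.numeric("a") is NA
x <- rle(as.integer(factor(df$x)))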
x$values <- ave(x$values, x$values, FUN = seq_along)
df$x_dupl <- inverse.rle(x)
# gr x x_dupl
# 1 1 a 1
# 2 1 a 1
# 3 2 b 1
# 4 2 b 1
# 5 3 c 1
# 6 3 c 1
# 7 4 a 2
# 8 4 a 2
# 9 5 c 2
# 10 5 c 2
# 11 6 d 1
# 12 6 d 1
# 13 7 a 3
# 14 7 a 3

how to mutate a column with ID in group

I have a data.frame like:
a b c
1 a 1 1
2 a 1 2
3 a 2 3
4 b 1 4
5 b 2 5
6 b 3 6
Grouping by a, flag starts at 1 in each group; if b equals the previous b, flag stays the same, otherwise flag increases by 1:
a b c flag
1 a 1 1 1 <- group a start with 1
2 a 1 2 1 <-- in group a, 1(in row 2)=1(in row 1)
3 a 2 3 2 <- in group a, 2(in row 3)!=1(in row 2)
4 b 1 4 1 <- group b start with 1
5 b 2 5 2 <- in group b, 2(in row 5)!=1(in row 4)
6 b 3 6 3 <- in group b, 3(in row 6)!=2(in row 5)
I am currently using this:
x$flag <- 1  # initialise the column so that row 1 starts at 1
for (i in 2:nrow(x)) {
  x[i, 'flag'] <- ifelse(x[i, 'a'] != x[i-1, 'a'], 1,
                         ifelse(x[i, 'b'] == x[i-1, 'b'], x[i-1, 'flag'], x[i-1, 'flag'] + 1))
}
but it is inefficient on a large dataset.
UPDATE
dense_rank in dplyr gives me the answer:
> x %>% group_by(a) %>% mutate(dense_rank(b))
Source: local data frame [10 x 4]
Groups: a
a b c dense_rank(b)
1 a x 1 1
2 a x 2 1
3 a y 3 2
4 b x 4 1
5 b y 5 2
6 b z 6 3
7 c x 7 1
8 c y 8 2
9 c z 9 3
10 c z 10 3
thanks.
I am not entirely sure what you are trying to do. But it seems to me that you are trying to assign index numbers to values in b for each group (a or b).
#I modified your example here.
a <- rep(c("a","b"), each =3)
b <- c(4,4,5,11,12,13)
c <- 1:6
foo <- data.frame(a,b,c, stringsAsFactors = F)
a b c
1 a 4 1
2 a 4 2
3 a 5 3
4 b 11 4
5 b 12 5
6 b 13 6
#Since you referred to dplyr, I will use it (rbindlist comes from data.table).
library(dplyr)
library(data.table)
cats <- list()
for(i in unique(foo$a)){
ana <- foo %>%
filter(a == i) %>%
arrange(b) %>%
mutate(indexInb = as.integer(as.factor(b)))
cats[[i]] <- ana
}
bob <- rbindlist(cats)
a b c indexInb
1: a 4 1 1
2: a 4 2 1
3: a 5 3 2
4: b 11 4 1
5: b 12 5 2
6: b 13 6 3
Here's a quick vectorized way to solve this without using any for loops.
Base R solution using ave and transform (diff(x) != 0 marks each change, so steps larger than 1 still add only 1):
transform(x, flag = ave(b, a, FUN = function(x) cumsum(c(1, diff(x) != 0))))
# a b c flag
# 1 a 1 1 1
# 2 a 1 2 1
# 3 a 2 3 2
# 4 b 1 4 1
# 5 b 2 5 2
# 6 b 3 6 3
Or a data.table solution (more efficient)
library(data.table)
setDT(x)[, flag := cumsum(c(1, diff(b) != 0)), by = a]
x
# a b c flag
# 1: a 1 1 1
# 2: a 1 2 1
# 3: a 2 3 2
# 4: b 1 4 1
# 5: b 2 5 2
# 6: b 3 6 3
Or a dplyr solution (because you tagged it)
library(dplyr)
x %>%
group_by(a) %>%
  mutate(flag = cumsum(c(1, diff(b) != 0)))
# Source: local data frame [6 x 4]
# Groups: a
#
# a b c flag
# 1 a 1 1 1
# 2 a 1 2 1
# 3 a 2 3 2
# 4 b 1 4 1
# 5 b 2 5 2
# 6 b 3 6 3
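The diff-based solutions need a numeric b. When b is character, as in the UPDATE in the question, one option is to compare each value with the one before it. This is a base R sketch; the helper run_id is a hypothetical name introduced here, and the data are the first six rows of the UPDATE example.

```r
x <- data.frame(
  a = c("a", "a", "a", "b", "b", "b"),
  b = c("x", "x", "y", "x", "y", "z"),
  c = 1:6,
  stringsAsFactors = FALSE
)

# hypothetical helper: starts at 1 and increases by 1 whenever the value changes
run_id <- function(v) cumsum(c(TRUE, v[-1] != v[-length(v)]))

# apply the helper to the row positions of each group of a
x$flag <- ave(seq_along(x$b), x$a, FUN = function(i) run_id(x$b[i]))

x
```

Unlike dense_rank(b), which ranks by value, this counts changes in row order, so the two agree only when b never decreases within a group.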

How to group consecutive columns of a dataframe using split function of R

Here's a short version of my large dataframe
>k
a b c d e f
1 3 4 5 7 8
2 1 7 9 0 3
3 2 2 5 6 9
I want to split it so that I get separate data frames for a, b, c and for d, e, f, like this:
>k
$`1`
a b c
1 3 4
2 1 7
3 2 2
$`2`
d e f
5 7 8
9 0 3
5 6 9
I tried something like this:
range = seq(3,6,3)
k<-split(k, cut(colnames(k), range))
But it doesn't work since colnames(k) has to be numeric. Any other simple idea?
Something like this?
group <- rep(1:2, each=3)
lapply(unique(group), FUN=function(n) k[group==n])
# [[1]]
# a b c
# 1 1 3 4
# 2 2 1 7
# 3 3 2 2
#
# [[2]]
# d e f
# 1 5 7 8
# 2 9 0 3
# 3 5 6 9
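Since the question asks about split, it may be worth noting that split() on a data frame splits rows, whereas calling split.default() directly treats the data frame as a list of columns. The sketch below rebuilds the example data and derives the grouping from the column positions; the chunk size of 3 and the names group and parts are assumptions for illustration.

```r
k <- data.frame(
  a = 1:3, b = c(3, 1, 2), c = c(4, 7, 2),
  d = c(5, 9, 5), e = c(7, 0, 6), f = c(8, 3, 9)
)

# columns 1-3 get group 1, columns 4-6 get group 2
group <- ceiling(seq_along(k) / 3)

# split the columns (not the rows) into a named list of data frames
parts <- split.default(k, group)

names(parts)
names(parts[["1"]])
names(parts[["2"]])
```

Computing group from seq_along(k) avoids hard-coding the number of chunks when the data frame has many columns.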
