How to enumerate groups in R? [duplicate] - r

This question already has answers here:
How to create a consecutive group number
(13 answers)
Closed 2 years ago.
Which base R function, or function from another package, can I use to add a column that numbers each group consecutively in order of appearance (C = 1, B = 2, A = 3, D = 4)?
Dataset
lines = "Group
C
C
C
B
B
A
A
A
A
A
A
D
D
D
D
"
dataset = read.table(textConnection(lines), sep=";", header=TRUE)

Try with cur_group_id() from dplyr:
library(dplyr)
#Code 1
newdf <- dataset%>%
mutate(Group=factor(Group,levels = unique(Group),ordered = T)) %>%
group_by(Group) %>% mutate(Num=cur_group_id())
Output:
# A tibble: 15 x 2
# Groups: Group [4]
Group Num
<ord> <int>
1 C 1
2 C 1
3 C 1
4 B 2
5 B 2
6 A 3
7 A 3
8 A 3
9 A 3
10 A 3
11 A 3
12 D 4
13 D 4
14 D 4
15 D 4
Or using base R:
#Code 2
dataset$Num <- as.integer(factor(dataset$Group,levels = unique(dataset$Group)))
Output:
Group Num
1 C 1
2 C 1
3 C 1
4 B 2
5 B 2
6 A 3
7 A 3
8 A 3
9 A 3
10 A 3
11 A 3
12 D 4
13 D 4
14 D 4
15 D 4
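A further base R option, sketched here as an assumption rather than taken from the answers above, is match() against unique(): each value is replaced by the position of its first appearance, which gives the same numbering without building a factor. The Group column is retyped directly so the block runs on its own.

```r
# Same Group column as in the question, typed in directly
dataset <- data.frame(
  Group = rep(c("C", "B", "A", "D"), times = c(3, 2, 6, 4))
)

# unique() keeps values in order of first appearance;
# match() returns each value's position in that vector
dataset$Num <- match(dataset$Group, unique(dataset$Group))

print(dataset)
```

Because match() works on the values themselves, the numbering follows the row order of the data, not alphabetical order.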

Related

How to stack multiple columns into one using R [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 1 year ago.
I have the following data frame:
A <- c(3,5,6,7)
B <- c(2,4,5,3)
C <- c(4,6,7,8)
D <- c(2,4,5,3)
gene <- c(1,2,3,4)
df <- data.frame(gene,A,B,C,D)
df
gene A B C D
1 1 3 2 4 2
2 2 5 4 6 4
3 3 6 5 7 5
4 4 7 3 8 3
How can I stack each lettered column into one new column called "count", with another new column called "sample" that keeps track of the original column each count value came from (i.e., I would like the following output):
count sample
3 A
5 A
6 A
7 A
2 B
4 B
5 B
3 B
4 C
6 C
7 C
8 C
2 D
4 D
5 D
3 D
Sorry this is difficult to explain but the output data frame above should make it clear.
Thanks
In base R, use stack() after removing the first column:
out <- stack(df[-1])
names(out) <- c("count", "sample")
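As a self-contained check of the stack() approach: stack() returns the stacked values in a column named values and the source column names in a factor named ind, which is why the rename is needed. The as.character() step is an optional addition for illustration, not part of the answer.

```r
# Data frame from the question
df <- data.frame(gene = 1:4,
                 A = c(3, 5, 6, 7), B = c(2, 4, 5, 3),
                 C = c(4, 6, 7, 8), D = c(2, 4, 5, 3))

out <- stack(df[-1])                    # drop gene, stack columns A..D
names(out) <- c("count", "sample")      # stack() names them values / ind
out$sample <- as.character(out$sample)  # optional: factor -> character

str(out)
```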
We could use pivot_longer:
library(tidyr)
library(dplyr)
df %>%
pivot_longer(
cols = -gene,
names_to = "sample",
values_to = "count"
) %>%
select(-gene) %>%
arrange(sample)
sample count
<chr> <dbl>
1 A 3
2 A 5
3 A 6
4 A 7
5 B 2
6 B 4
7 B 5
8 B 3
9 C 4
10 C 6
11 C 7
12 C 8
13 D 2
14 D 4
15 D 5
16 D 3

Summing up consecutive values in groups [duplicate]

This question already has answers here:
Calculate cumulative sum (cumsum) by group
(5 answers)
Closed 2 years ago.
I'd like to compute a running sum of the values in one column, within each group. I have a df like this:
set.seed(1)
gr <- c(rep('A',3),rep('B',2),rep('C',5),rep('D',3))
vals <- floor(runif(length(gr), min=0, max=10))
idx <- c(seq(1:3),seq(1:2),seq(1:5),seq(1:3))
df <- data.frame(gr,vals,idx)
gr vals idx
1 A 2 1
2 A 3 2
3 A 5 3
4 B 9 1
5 B 2 2
6 C 8 1
7 C 9 2
8 C 6 3
9 C 6 4
10 C 0 5
11 D 2 1
12 D 1 2
13 D 6 3
And I'm looking for this one:
gr vals idx
1 A 2 1
2 A 5 2
3 A 10 3
4 B 9 1
5 B 11 2
6 C 8 1
7 C 17 2
8 C 23 3
9 C 29 4
10 C 29 5
11 D 2 1
12 D 3 2
13 D 9 3
For example, in group C we have 8+9=17 (the first and second elements of the group), and the second value is replaced by that sum. Then 17+6=23 (the running sum plus the third element), the third element is replaced by the new result, and so on.
I looked at some existing solutions here, but they aren't what I'm looking for.
OK, I think I got it:
library(dplyr)
df %>%
group_by(gr) %>%
mutate(nvals = cumsum(vals))
gr vals idx nvals
1 A 2 1 2
2 A 3 2 5
3 A 5 3 10
4 B 9 1 9
5 B 2 2 11
6 C 8 1 8
7 C 9 2 17
8 C 6 3 23
9 C 6 4 29
10 C 0 5 29
11 D 2 1 2
12 D 1 2 3
13 D 6 3 9
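The same running total can be sketched in base R with ave(), which applies a function within each level of a grouping variable. The vals below are typed in from the question's printed data rather than regenerated with runif(), so the example does not depend on the random number generator.

```r
# Values copied from the printed data frame in the question
gr   <- rep(c("A", "B", "C", "D"), times = c(3, 2, 5, 3))
vals <- c(2, 3, 5, 9, 2, 8, 9, 6, 6, 0, 2, 1, 6)
df   <- data.frame(gr, vals)

# cumsum() is applied separately to the vals of each gr level
df$nvals <- ave(df$vals, df$gr, FUN = cumsum)

print(df)
```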

Transpose and Merge columns in R [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 2 years ago.
Quite new to R and I have a dataset in this format:
A B C
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
But I want it in this format:
A 1
A 2
A 3
A 4
A 5
B 1
B 2
B 3
...etc.
Seems like such a simple issue but I need HELP! Thanks
df <- data.frame(
A = 1:5,
B = 1:5,
C = 1:5
)
stack(df)
values ind
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 1 B
7 2 B
8 3 B
9 4 B
10 5 B
11 1 C
12 2 C
13 3 C
14 4 C
15 5 C
An example using tidyr's gather function:
library(tidyverse)
A <- c(1,2,3,4,5)
B <- c(1,2,3,4,5)
C <- c(1,2,3,4,5)
df <- data.frame(A,B,C)
df %>% gather(key = "key", value = "value")
key value
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 B 1
7 B 2
8 B 3
9 B 4
10 B 5
11 C 1
12 C 2
13 C 3
14 C 4
15 C 5
You can use the package tidyr. This lets you choose which columns to gather into the column "variable".
# if not installed yet
install.packages("tidyr")
library(tidyr)
data <- data.frame(
A = 1:5,
B = 1:5,
C = 1:5
)
data %>% pivot_longer(c(A, B, C), names_to = "variable", values_to = "value")
# Result
variable value
<chr> <int>
1 A 1
2 B 1
3 C 1
4 A 2
5 B 2
6 C 2
7 A 3
8 B 3
9 C 3
10 A 4
11 B 4
12 C 4
13 A 5
14 B 5
15 C 5
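Note that pivot_longer() returns the rows interleaved (A, B, C, A, B, C, ...), whereas the question asks for all of A first, then B, then C. A minimal base R sketch that produces the column-by-column order directly, reusing the column names variable and value, could look like this:

```r
data <- data.frame(A = 1:5, B = 1:5, C = 1:5)

long <- data.frame(
  # each column name repeated once per row of the wide data
  variable = rep(names(data), each = nrow(data)),
  # unlist() concatenates the columns one after another
  value    = unlist(data, use.names = FALSE)
)

print(long)
```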

Dynamic select expression in function [duplicate]

This question already has answers here:
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 4 years ago.
I am trying to write a function that will convert this data frame
library(dplyr)
library(rlang)
library(purrr)
df <- data.frame(obj=c(1,1,2,2,3,3,3,4,4,4),
S1=rep(c("a","b"),length.out=10),PR1=rep(c(3,7),length.out=10),
S2=rep(c("c","d"),length.out=10),PR2=rep(c(7,3),length.out=10))
obj S1 PR1 S2 PR2
1 1 a 3 c 7
2 1 b 7 d 3
3 2 a 3 c 7
4 2 b 7 d 3
5 3 a 3 c 7
6 3 b 7 d 3
7 3 a 3 c 7
8 4 b 7 d 3
9 4 a 3 c 7
10 4 b 7 d 3
into this data frame:
df %>% {bind_rows(select(., obj, S = S1, PR = PR1),
select(., obj, S = S2, PR = PR2))}
obj S PR
1 1 a 3
2 1 b 7
3 2 a 3
4 2 b 7
5 3 a 3
6 3 b 7
7 3 a 3
8 4 b 7
9 4 a 3
10 4 b 7
11 1 c 7
12 1 d 3
13 2 c 7
14 2 d 3
15 3 c 7
16 3 d 3
17 3 c 7
18 4 d 3
19 4 c 7
20 4 d 3
But I would like the function to work with any number of columns, so it would also work if I had S1, S2, S3, S4, or if there was an additional category, i.e., DS1, DS2. Ideally the function would take as arguments the patterns that determine which columns are stacked on top of each other, the number of sets of each column, the names of the output columns, and the names of any variables that should also be kept.
This is my attempt at this function:
stack_col <- function(df, patterns, nums, cnames, keep){
keep <- enquo(keep)
build_exp <- function(x){
paste0("!!sym(cnames[[", x, "]]) := paste0(patterns[[", x, "]],num)") %>%
parse_expr()
}
exps <- map(1:length(patterns), ~expr(!!build_exp(.)))
sel_fun <- function(num){
df %>% select(!!keep,
!!!exps)
}
map(nums, sel_fun) %>% bind_rows()
}
I can get the sel_fun part to work for a fixed number of patterns like this
patterns <- c("S", "PR")
cnames <- c("Species", "PR")
keep <- quo(obj)
sel_fun <- function(num){
df %>% select(!!keep,
!!sym(cnames[[1]]) := paste0(patterns[[1]], num),
!!sym(cnames[[2]]) := paste0(patterns[[2]], num))
}
sel_fun(1)
But the dynamic version that I have tried does not work and gives this error:
Error: `:=` can only be used within a quasiquoted argument
Here is a function to get the expected output. Loop through the 'patterns' and the corresponding new column names ('cnames') using map2, gather into 'long' format, rename the 'val' column to the 'cnames' passed into the function, bind the columns (bind_cols), and select the columns of interest:
stack_col <- function(dat, pat, cname, keep) {
purrr::map2(pat, cname, ~
dat %>%
dplyr::select(keep, matches(.x)) %>%
tidyr::gather(key, val, matches(.x)) %>%
dplyr::select(-key) %>%
dplyr::rename(!! .y := val)) %>%
dplyr::bind_cols(.) %>%
dplyr::select(keep, cname)
}
stack_col(df, patterns, cnames, 1)
# obj Species PR
#1 1 a 3
#2 1 b 7
#3 2 a 3
#4 2 b 7
#5 3 a 3
#6 3 b 7
#7 3 a 3
#8 4 b 7
#9 4 a 3
#10 4 b 7
#11 1 c 7
#12 1 d 3
#13 2 c 7
#14 2 d 3
#15 3 c 7
#16 3 d 3
#17 3 c 7
#18 4 d 3
#19 4 c 7
#20 4 d 3
Also, reshaping with multiple patterns can be done with data.table::melt:
library(data.table)
melt(setDT(df), measure = patterns("^S\\d+", "^PR\\d+"),
value.name = c("Species", "PR"))[, variable := NULL][]
This solves your problem, although it does not fix your function.
The idea is to use gather and spread on the columns which start with the specific patterns. I create a regex which matches the column names, gather all matching columns, extract the group, and then rename the groups with the cnames. Finally, spread separates the values into the new columns.
library(dplyr)
library(purrr)
library(tidyr)
library(stringr)
patterns <- c("S", "PR")
cnames <- c("Species", "PR")
names(cnames) <- patterns
complete_pattern <- str_c("^", str_c(patterns, collapse = "|^"))
df %>%
mutate(rownumber = 1:n()) %>%
gather(new_variable, value, matches(complete_pattern)) %>%
mutate(group = str_extract(new_variable, complete_pattern),
group = str_replace_all(group, cnames),
group_number = str_extract(new_variable, "\\d+")) %>%
select(-new_variable) %>%
spread(group, value)
# obj rownumber group_number PR Species
# 1 1 1 1 3 a
# 2 1 1 2 7 c
# 3 1 2 1 7 b
# 4 1 2 2 3 d
# 5 2 3 1 3 a
# 6 2 3 2 7 c
# 7 2 4 1 7 b
# 8 2 4 2 3 d
# 9 3 5 1 3 a
# 10 3 5 2 7 c
# 11 3 6 1 7 b
# 12 3 6 2 3 d
# 13 3 7 1 3 a
# 14 3 7 2 7 c
# 15 4 8 1 7 b
# 16 4 8 2 3 d
# 17 4 9 1 3 a
# 18 4 9 2 7 c
# 19 4 10 1 7 b
# 20 4 10 2 3 d
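For comparison, a base R sketch of a function with the argument list the question asks for (patterns, set numbers, output names, columns to keep). The name stack_col_base is hypothetical, and keep is assumed to be a character vector of column names, not an unquoted column as in the question's attempt.

```r
df <- data.frame(obj = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4),
                 S1 = rep(c("a", "b"), length.out = 10),
                 PR1 = rep(c(3, 7), length.out = 10),
                 S2 = rep(c("c", "d"), length.out = 10),
                 PR2 = rep(c(7, 3), length.out = 10))

# Hypothetical helper: for each set number, pick the keep columns plus
# <pattern><num>, rename them to cnames, then stack the pieces with rbind()
stack_col_base <- function(df, patterns, nums, cnames, keep) {
  pieces <- lapply(nums, function(num) {
    piece <- df[c(keep, paste0(patterns, num))]
    names(piece) <- c(keep, cnames)
    piece
  })
  do.call(rbind, pieces)
}

res <- stack_col_base(df, patterns = c("S", "PR"), nums = 1:2,
                      cnames = c("Species", "PR"), keep = "obj")
print(res)
```

Building the column names with paste0() and indexing by name avoids tidy evaluation entirely, so the `:=` error from the question does not arise in this version.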

numbering duplicated rows in dplyr [duplicate]

This question already has answers here:
Using dplyr to get cumulative count by group
(3 answers)
Closed 5 years ago.
I ran into an issue with numbering duplicated rows in a data.frame and could not find a similar post.
Let's say we have data like this:
df <- data.frame(gr=gl(7,2),x=c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
> df
gr x
1 1 a
2 1 a
3 2 b
4 2 b
5 3 c
6 3 c
7 4 a
8 4 a
9 5 c
10 5 c
11 6 d
12 6 d
13 7 a
14 7 a
and we want to add a new column called x_dupl that numbers each run of an x value by occurrence: the first time the value appears it gets 1, the second time 2, the third time 3, and so on.
thanks in advance!
The expected output
> df
gr x x_dupl
1 1 a 1
2 1 a 1
3 2 b 1
4 2 b 1
5 3 c 1
6 3 c 1
7 4 a 2
8 4 a 2
9 5 c 2
10 5 c 2
11 6 d 1
12 6 d 1
13 7 a 3
14 7 a 3
Your example data (plus rows where gr = 7 as in your output), and named df1, not df:
df1 <- data.frame(gr = gl(7,2),
x = c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
library(dplyr)
df1 %>%
group_by(x) %>%
mutate(x_dupl = dense_rank(gr)) %>%
ungroup()
# A tibble: 14 x 3
gr x x_dupl
<fctr> <fctr> <int>
1 1 a 1
2 1 a 1
3 2 b 1
4 2 b 1
5 3 c 1
6 3 c 1
7 4 a 2
8 4 a 2
9 5 c 2
10 5 c 2
11 6 d 1
12 6 d 1
13 7 a 3
14 7 a 3
A base R solution:
df <- data.frame(gr=gl(7,2),x=c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
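# factor() first: since R 4.0, data.frame() keeps strings as character,
# and as.numeric() on a character column would return NA
x <- rle(as.integer(factor(df$x)))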
x$values <- ave(x$values, x$values, FUN = seq_along)
df$x_dupl <- inverse.rle(x)
# gr x x_dupl
# 1 1 a 1
# 2 1 a 1
# 3 2 b 1
# 4 2 b 1
# 5 3 c 1
# 6 3 c 1
# 7 4 a 2
# 8 4 a 2
# 9 5 c 2
# 10 5 c 2
# 11 6 d 1
# 12 6 d 1
# 13 7 a 3
# 14 7 a 3
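The run-numbering idea can also be sketched without rle(), as an alternative that is not taken from the answers above: mark the first row of each run, then take a cumulative sum of those marks within each x value. The name run_start is introduced here for illustration only.

```r
x <- c("a", "a", "b", "b", "c", "c", "a", "a", "c", "c", "d", "d", "a", "a")

# TRUE where a new run begins (first row, or value differs from the row above)
run_start <- c(TRUE, x[-1] != x[-length(x)])

# Within each x value, count how many runs have started so far
x_dupl <- ave(as.integer(run_start), x, FUN = cumsum)

print(data.frame(x, x_dupl))
```

This assumes x has at least one element and contains no NA values, since the comparison with the previous row would otherwise yield NA.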
