Create new rows based on numbers in a column [duplicate] - r

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
I currently have a table with a quantity in it.
ID Code Quantity
1 A 1
2 B 3
3 C 2
4 D 1
Is there any way to get this table?
ID Code Quantity
1 A 1
2 B 1
2 B 1
2 B 1
3 C 1
3 C 1
4 D 1
I need to break out the quantity so that each row is repeated that many times.
Thanks!!!!

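Updated
uncount() from tidyr repeats each row Quantity times. With .remove = FALSE the original Quantity column is kept, and the per-row quantity of 1 is stored in a new column NewQ: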
library(dplyr)
library(tidyr)
df %>%
  group_by(ID) %>%
  uncount(Quantity, .remove = FALSE) %>%
  mutate(NewQ = 1)
# A tibble: 7 x 4
# Groups: ID [4]
ID Code Quantity NewQ
<int> <chr> <int> <dbl>
1 1 A 1 1
2 2 B 3 1
3 2 B 3 1
4 2 B 3 1
5 3 C 2 1
6 3 C 2 1
7 4 D 1 1
Updated
An alternative that also keeps the existing Quantity column: build a string of ones for each row, then split it back into rows with separate_rows():
df %>%
  group_by(ID) %>%
  mutate(NewQ = ifelse(Quantity != 1,
                       paste(rep(1, Quantity), collapse = ", "),
                       as.character(Quantity))) %>%
  separate_rows(NewQ) %>%
  mutate(NewQ = as.numeric(NewQ))
# A tibble: 7 x 4
# Groups: ID [4]
ID Code Quantity NewQ
<int> <chr> <int> <dbl>
1 1 A 1 1
2 2 B 3 1
3 2 B 3 1
4 2 B 3 1
5 3 C 2 1
6 3 C 2 1
7 4 D 1 1

We could use slice():
library(dplyr)
df %>%
  group_by(ID) %>%
  slice(rep(1:n(), each = Quantity)) %>%
  mutate(Quantity = 1)
Output:
ID Code Quantity
<dbl> <chr> <dbl>
1 1 A 1
2 2 B 1
3 2 B 1
4 2 B 1
5 3 C 1
6 3 C 1
7 4 D 1

A base R option using rep
transform(
`row.names<-`(df[rep(1:nrow(df), df$Quantity), ], NULL),
Quantity = 1
)
gives
ID Code Quantity
1 1 A 1
2 2 B 1
3 2 B 1
4 2 B 1
5 3 C 1
6 3 C 1
7 4 D 1
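The rep-based indexing above can be checked on a small reconstruction of the question's table. This is a minimal sketch: the data frame is an assumed re-creation of the example (the question never gives its code), and the fifth row with quantity 0 is hypothetical, added only to show that such rows are dropped rather than kept.

```r
# Assumed reconstruction of the question's table, plus a hypothetical
# row "E" with Quantity 0 (not in the original data)
df <- data.frame(
  ID = 1:5,
  Code = c("A", "B", "C", "D", "E"),
  Quantity = c(1, 3, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Row i is repeated Quantity[i] times; a count of 0 removes the row
idx <- rep(seq_len(nrow(df)), times = df$Quantity)
expanded <- df[idx, ]
rownames(expanded) <- NULL   # drop the "2.1", "2.2" style row names
expanded$Quantity <- 1       # each expanded row stands for one unit

expanded
```

If rows with a quantity of 0 should survive in some form, they would need to be handled separately before the expansion.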

Related

Nested list to grouped rows in R

I have the following nested list called l (dput below):
> l
$A
$A$`1`
[1] 1 2 3
$A$`2`
[1] 3 2 1
$B
$B$`1`
[1] 2 2 2
$B$`2`
[1] 3 4 3
I would like to convert this to a grouped dataframe where A and B are the first group column and 1 and 2 are the subgroups with respective values. The desired output should look like this:
group subgroup values
1 A 1 1
2 A 1 2
3 A 1 3
4 A 2 3
5 A 2 2
6 A 2 1
7 B 1 2
8 B 1 2
9 B 1 2
10 B 2 3
11 B 2 4
12 B 2 3
As you can see A and B are the main group and 1 and 2 are the subgroups. Using purrr::flatten(l) or unnest doesn't work. So I was wondering if anyone knows how to convert a nested list to a grouped row dataframe?
dput of l:
l <- list(A = list(`1` = c(1, 2, 3), `2` = c(3, 2, 1)), B = list(`1` = c(2,
2, 2), `2` = c(3, 4, 3)))
Using stack and rowbind with id:
data.table::rbindlist(lapply(l, stack), idcol = "id")
# id values ind
# 1: A 1 1
# 2: A 2 1
# 3: A 3 1
# 4: A 3 2
# 5: A 2 2
# 6: A 1 2
# 7: B 2 1
# 8: B 2 1
# 9: B 2 1
# 10: B 3 2
# 11: B 4 2
# 12: B 3 2
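For readers without data.table, roughly the same result can be sketched in base R. stack() turns one inner list into a two-column data frame (values, ind), and the outer name is attached by hand. The column names group and subgroup are taken from the desired output in the question; the variables pieces and res are only illustrative.

```r
l <- list(A = list(`1` = c(1, 2, 3), `2` = c(3, 2, 1)),
          B = list(`1` = c(2, 2, 2), `2` = c(3, 4, 3)))

# stack() each inner list, tagging the rows with the outer name
pieces <- lapply(names(l), function(g) {
  s <- stack(l[[g]])                       # columns: values, ind
  data.frame(group = g,
             subgroup = as.character(s$ind),
             values = s$values,
             stringsAsFactors = FALSE)
})

# Bind the per-group pieces into one long data frame
res <- do.call(rbind, pieces)
res
```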
You can use enframe() to convert the list into a data.frame, and unnest the value column twice.
library(tidyr)
tibble::enframe(l, name = "group") %>%
  unnest_longer(value, indices_to = "subgroup") %>%
  unnest(value)
# A tibble: 12 × 3
group value subgroup
<chr> <dbl> <chr>
1 A 1 1
2 A 2 1
3 A 3 1
4 A 3 2
5 A 2 2
6 A 1 2
7 B 2 1
8 B 2 1
9 B 2 1
10 B 3 2
11 B 4 2
12 B 3 2
Turn the list directly into a data frame, then pivot it into long format and arrange it in your desired order.
library(tidyverse)
l %>%
  as.data.frame() %>%
  pivot_longer(everything(), names_to = c("group", "subgroup"),
               values_to = "values",
               names_pattern = "(.+?)\\.(.+?)") %>%
  arrange(group, subgroup)
# A tibble: 12 × 3
group subgroup values
<chr> <chr> <dbl>
1 A 1 1
2 A 1 2
3 A 1 3
4 A 2 3
5 A 2 2
6 A 2 1
7 B 1 2
8 B 1 2
9 B 1 2
10 B 2 3
11 B 2 4
12 B 2 3
You can combine rrapply with unnest, which has the benefit of working on lists of arbitrary length:
library(rrapply)
library(tidyr)
rrapply(l, how = "melt") |>
  unnest(value)
# A tibble: 12 × 3
L1 L2 value
<chr> <chr> <dbl>
1 A 1 1
2 A 1 2
3 A 1 3
4 A 2 3
5 A 2 2
6 A 2 1
7 B 1 2
8 B 1 2
9 B 1 2
10 B 2 3
11 B 2 4
12 B 2 3

create new order for existing column values without reordering rows in dataframe - R

I have cluster labels from kmeans run separately on different ids (reprex below). The problem is that the kmeans cluster codes are not ordered consistently across ids, although every id has 3 clusters.
reprex = data.frame(id = rep(1:2, each = 4),
                    v1 = rep(seq(1:4), 2),
                    cluster = c(2,2,1,3,3,1,2,2))
reprex
id v1 cluster
1 1 1 2
2 1 2 2
3 1 3 1
4 1 4 3
5 2 1 3
6 2 2 1
7 2 3 2
8 2 4 2
What I want is for the variable cluster to always start with 1 within each id. Note that I don't want to reorder the dataframe by cluster; the row order needs to remain the same. So the desired result would be:
reprex_desired<- data.frame(id = rep(1:2, each = 4),
v1 = rep(seq(1:4), 2),
cluster = c(2,2,1,3,3,1,2,2),
what_iWant = c(1,1,2,3,1,2,3,3))
reprex_desired
id v1 cluster what_iWant
1 1 1 2 1
2 1 2 2 1
3 1 3 1 2
4 1 4 3 3
5 2 1 3 1
6 2 2 1 2
7 2 3 2 3
8 2 4 2 3
We can use match after grouping by 'id'
library(dplyr)
reprex <- reprex %>%
  group_by(id) %>%
  mutate(what_IWant = match(cluster, unique(cluster))) %>%
  ungroup
Output:
reprex
# A tibble: 8 × 4
id v1 cluster what_IWant
<int> <int> <dbl> <int>
1 1 1 2 1
2 1 2 2 1
3 1 3 1 2
4 1 4 3 3
5 2 1 3 1
6 2 2 1 2
7 2 3 2 3
8 2 4 2 3
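To see why match() does the relabeling: unique() lists the cluster codes in order of first appearance, and match() returns each value's position in that list. A minimal base R sketch of the same idea, using ave() for the per-id grouping; the helper relabel and the column name new_label are only illustrative.

```r
reprex <- data.frame(id = rep(1:2, each = 4),
                     v1 = rep(1:4, 2),
                     cluster = c(2, 2, 1, 3, 3, 1, 2, 2))

# Relabel one vector by order of first appearance
relabel <- function(x) match(x, unique(x))

relabel(c(2, 2, 1, 3))
# unique() gives 2, 1, 3, so the result is 1 1 2 3

# Apply the relabeling within each id; row order is untouched
reprex$new_label <- ave(reprex$cluster, reprex$id, FUN = relabel)
reprex
```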
Here is a version with cumsum combined with lag:
library(dplyr)
reprex %>%
  group_by(id) %>%
  mutate(what_i_want = cumsum(cluster != lag(cluster, default = first(cluster))) + 1)
id v1 cluster what_i_want
<int> <int> <dbl> <dbl>
1 1 1 2 1
2 1 2 2 1
3 1 3 1 2
4 1 4 3 3
5 2 1 3 1
6 2 2 1 2
7 2 3 2 3
8 2 4 2 3
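One caveat, as far as I can tell: the cumsum() version counts changes between consecutive rows, so it agrees with the match() version only when each cluster's rows are contiguous within an id, as they are in the example data. A small base R sketch with a hypothetical vector where a label comes back later:

```r
# Hypothetical cluster labels for a single id; label 2 reappears at the end
x <- c(2, 1, 2)

# Relabel by order of first appearance
by_first_seen <- match(x, unique(x))

# Count changes from the previous element (base R stand-in for dplyr::lag)
prev <- c(x[1], head(x, -1))
by_run <- cumsum(x != prev) + 1

by_first_seen  # 1 2 1
by_run         # 1 2 3
```

Which of the two is wanted depends on whether a returning label should get its old code back or a new one.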

Code values in new column based on whether values in another column are unique

Given the following data I would like to create a new column new_sequence based on the condition:
If an id occurs only once, the new value should be 0. If an id occurs several times, the new value should be numbered according to the values in sequence.
dat <- tibble(id = c(1,2,3,3,3,4,4),
sequence = c(1,1,1,2,3,1,2))
# A tibble: 7 x 2
id sequence
<dbl> <dbl>
1 1 1
2 2 1
3 3 1
4 3 2
5 3 3
6 4 1
7 4 2
So, for the example data I am looking to produce the following output:
# A tibble: 7 x 3
id sequence new_sequence
<dbl> <dbl> <dbl>
1 1 1 0
2 2 1 0
3 3 1 1
4 3 2 2
5 3 3 3
6 4 1 1
7 4 2 2
I have tried the code below, which does not work because the first occurrence of every id is coded as 0:
dat %>% mutate(new_sequence = ifelse(!duplicated(id), 0, sequence))
Use dplyr::add_count() rather than !duplicated():
library(dplyr)
dat %>%
  add_count(id) %>%
  mutate(new_sequence = ifelse(n == 1, 0, sequence)) %>%
  select(!n)
Output:
# A tibble: 7 x 3
id sequence new_sequence
<dbl> <dbl> <dbl>
1 1 1 0
2 2 1 0
3 3 1 1
4 3 2 2
5 3 3 3
6 4 1 1
7 4 2 2
You can also try the following. After grouping by id check if the number of rows in the group n() is 1 or not. Use separate if and else instead of ifelse since the lengths are different within each group.
dat %>%
  group_by(id) %>%
  mutate(new_sequence = if(n() == 1) 0 else sequence)
Output
id sequence new_sequence
<dbl> <dbl> <dbl>
1 1 1 0
2 2 1 0
3 3 1 1
4 3 2 2
5 3 3 3
6 4 1 1
7 4 2 2
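The difference matters because ifelse() returns a result shaped like its test. With a length-1 test such as n() == 1, it returns a single value even when the else branch is a longer vector. A base R sketch of what happens for one group; the vector mirrors id 3 from the example, and the names seq_vals and is_single are illustrative.

```r
# The sequence values for one group with three rows (id 3 in the example)
seq_vals <- c(1, 2, 3)
is_single <- length(seq_vals) == 1   # plays the role of n() == 1

# ifelse() is vectorised over the test, which has length 1 here,
# so only the first element of the "no" branch is returned
ifelse(is_single, 0, seq_vals)
# 1

# Plain if/else returns the whole chosen branch
if (is_single) 0 else seq_vals
# 1 2 3
```

Inside a grouped mutate(), that single value would then be recycled across the group, so the mistake would not raise an error.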

R Tidyverse - Randomize by ID

I have a df like this one:
id <- c(1,1,2,2,3,3,4,4,5,5)
v1 <- c(3,1,2,3,4,5,6,1,5,4)
pos <- c(1,2,1,2,1,2,1,2,1,2)
df <- data.frame(id,v1,pos)
How can I "randomize" the values of v1 while keeping the order of the id variable and the values of pos, so that I get a df with randomized values like this:
id v1 pos
1 1 1
1 3 2
2 2 1
2 3 2
3 5 1
3 4 2
4 6 1
4 1 2
5 5 1
5 4 2
Above is an example of the resulting df, with id and pos staying as originally created and v1 randomized.
Thx!
Is sample what you're looking for?
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(v1 = sample(v1, size = length(v1)))
# A tibble: 10 x 3
# Groups: id [5]
id v1 pos
<dbl> <dbl> <dbl>
1 1 3 1
2 1 1 2
3 2 3 1
4 2 2 2
5 3 4 1
6 3 5 2
7 4 1 1
8 4 6 2
9 5 5 1
10 5 4 2
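A point worth keeping in mind with this approach: sample(x) treats a single number x as 1:x, so a group with only one row could get a value that was never in the data. A base R sketch that sidesteps this by shuffling positions instead of values; shuffle is a hypothetical helper name, and set.seed() is only there to make the run repeatable.

```r
id  <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
v1  <- c(3, 1, 2, 3, 4, 5, 6, 1, 5, 4)
pos <- c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)
df  <- data.frame(id, v1, pos)

# Permute by index so a one-element group is returned unchanged
shuffle <- function(x) x[sample.int(length(x))]

set.seed(42)  # arbitrary seed, for reproducibility only
df$v1 <- ave(df$v1, df$id, FUN = shuffle)
df
```

The id and pos columns are never touched, so their order stays exactly as created.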

Create a combination ID number from a set of factors in R

Can anyone help me compute a new variable that numbers each distinct combination of some factors?
Assuming there are 4 within-subject factors (A, B, C, D) with 8 repetitions of each combination for each of 10 subjects, this is how my data could look, to represent its actual structure:
library(AlgDesign) # for generating a factorial design
df <- gen.factorial(c(2,2,2,2,8,10), factors = "all",
                    varNames = c("A", "B", "C", "D", "replication", "Subject"))
> head(df)
A B C D replication Subject
1 1 1 1 1 1 1
2 2 1 1 1 1 1
3 1 2 1 1 1 1
4 2 2 1 1 1 1
5 1 1 2 1 1 1
6 2 1 2 1 1 1
> tail(df)
A B C D replication Subject
1275 1 2 1 2 8 10
1276 2 2 1 2 8 10
1277 1 1 2 2 8 10
1278 2 1 2 2 8 10
1279 1 2 2 2 8 10
1280 2 2 2 2 8 10
In this example replication was simply generated to force 8 reps, but it doesn't code the combination itself.
My original data has only the variables A, B, C, D and Subject, and I'd like to compute replication so that it has a distinct value for each combination of A, B, C, D.
library(AlgDesign)
library(dplyr)
df <- gen.factorial(c(2,2,2,2,8,10), factors = "all",
                    varNames = c("A", "B", "C", "D", "replication", "Subject"))
df %>%
  rowwise() %>% # for each row
  mutate(factors = paste0(c(A,B,C,D), collapse = "_")) %>% # create a combination of your factors
  ungroup() %>% # forget the row grouping
  mutate(replication_upd = as.numeric(factor(factors))) # create a number based on the combination you have
# # A tibble: 1,280 x 8
# A B C D replication Subject factors replication_upd
# <fct> <fct> <fct> <fct> <fct> <fct> <chr> <dbl>
# 1 1 1 1 1 1 1 1_1_1_1 1
# 2 2 1 1 1 1 1 2_1_1_1 9
# 3 1 2 1 1 1 1 1_2_1_1 5
# 4 2 2 1 1 1 1 2_2_1_1 13
# 5 1 1 2 1 1 1 1_1_2_1 3
# 6 2 1 2 1 1 1 2_1_2_1 11
# 7 1 2 2 1 1 1 1_2_2_1 7
# 8 2 2 2 1 1 1 2_2_2_1 15
# 9 1 1 1 2 1 1 1_1_1_2 2
#10 2 1 1 2 1 1 2_1_1_2 10
# # ... with 1,270 more rows
You can remove any unnecessary variables. I left them there so you can see how the process works.
Another option is this:
# create a look up table based on unique combinations and assign them a number
df %>% distinct(A,B,C,D) %>% mutate(replication_upd = row_number()) -> look_up
# join back to original dataset (as_tibble() replaces the deprecated tbl_df())
df %>% inner_join(look_up, by=c("A","B","C","D")) %>% as_tibble()
# # A tibble: 1,280 x 7
# A B C D replication Subject replication_upd
# <fct> <fct> <fct> <fct> <fct> <fct> <int>
# 1 1 1 1 1 1 1 1
# 2 2 1 1 1 1 1 2
# 3 1 2 1 1 1 1 3
# 4 2 2 1 1 1 1 4
# 5 1 1 2 1 1 1 5
# 6 2 1 2 1 1 1 6
# 7 1 2 2 1 1 1 7
# 8 2 2 2 1 1 1 8
# 9 1 1 1 2 1 1 9
# 10 2 1 1 2 1 1 10
# # ... with 1,270 more rows
Note that the first approach picks the numbers based on the new variable we create (i.e. it sorts the combinations of A,B,C,D), and the second approach uses the initial order of your dataset to pick the number for each unique combination.
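The two numbering schemes can be compared without AlgDesign on a small stand-in design. This is a sketch under assumptions: expand.grid() replaces gen.factorial(), the column rep is a placeholder for the replications, and the names key, id_sorted and id_first_seen are illustrative.

```r
# Small stand-in for the factorial design: 16 combinations, 2 replications
d <- expand.grid(A = 1:2, B = 1:2, C = 1:2, D = 1:2, rep = 1:2)

# One string key per combination of A, B, C, D
key <- paste(d$A, d$B, d$C, d$D, sep = "_")

# First approach: number by the sorted order of the keys
d$id_sorted <- as.integer(factor(key))

# Second approach: number by order of first appearance in the data
d$id_first_seen <- match(key, unique(key))

head(d, 3)
```

Both columns give every combination its own number; they differ only in which combination gets which number.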
