How to add new rows conditionally on R

I have a df with
v1 t1 c1 o1
1 1 9 1
1 1 12 2
1 2 2 1
1 2 7 2
2 1 3 1
2 1 6 2
2 2 3 1
2 2 12 2
And I would like to add 2 rows each time that v1 changes its value, in order to get this:
v1 t1 c1 o1
1 1 1 1
1 1 1 2
1 2 9 1
1 2 12 2
1 3 2 1
1 3 7 2
2 1 1 1
2 1 1 2
2 2 3 1
2 2 6 2
2 3 3 1
2 3 12 2
So every time v1 changes its value I'm adding 2 rows of ones and adding 1 to the existing values of t1. This is kind of tricky. I've been able to do it in Excel, but I would like to scale to big files in R.

We may do the expansion in group_modify
library(dplyr)
df1 %>%
group_by(v1) %>%
group_modify(~ .x %>%
slice_head(n = 2) %>%
mutate(across(-o1, ~ 1)) %>%
bind_rows(.x) %>%
mutate(t1 = as.integer(gl(n(), 2, n())))) %>%
ungroup
-output
# A tibble: 12 × 4
v1 t1 c1 o1
<int> <int> <dbl> <int>
1 1 1 1 1
2 1 1 1 2
3 1 2 9 1
4 1 2 12 2
5 1 3 2 1
6 1 3 7 2
7 2 1 1 1
8 2 1 1 2
9 2 2 3 1
10 2 2 6 2
11 2 3 3 1
12 2 3 12 2
Or do a group by summarise
df1 %>%
group_by(v1) %>%
summarise(t1 = as.integer(gl(n() + 2, 2, n() + 2)),
c1 = c(1, 1, c1), o1 = rep(1:2, length.out = n() + 2),
.groups = 'drop')
-output
# A tibble: 12 × 4
v1 t1 c1 o1
<int> <int> <dbl> <int>
1 1 1 1 1
2 1 1 1 2
3 1 2 9 1
4 1 2 12 2
5 1 3 2 1
6 1 3 7 2
7 2 1 1 1
8 2 1 1 2
9 2 2 3 1
10 2 2 6 2
11 2 3 3 1
12 2 3 12 2
data
df1 <- structure(list(v1 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), t1 = c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), c1 = c(9L, 12L, 2L, 7L, 3L, 6L,
3L, 12L), o1 = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L)),
class = "data.frame", row.names = c(NA,
-8L))
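
In dplyr 1.1.0 and later, a summarise() call that returns more than one row per group is deprecated in favour of reframe(). A minimal sketch of the second approach rewritten that way, assuming dplyr >= 1.1.0 is installed (the name out is only for illustration):

```r
library(dplyr)

# Same data as df1 above
df1 <- data.frame(
  v1 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  t1 = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  c1 = c(9L, 12L, 2L, 7L, 3L, 6L, 3L, 12L),
  o1 = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L)
)

# reframe() may return any number of rows per group;
# .by groups for this call only, so no ungroup() is needed
out <- df1 %>%
  reframe(
    t1 = as.integer(gl(n() + 2, 2, n() + 2)),  # 1 1 2 2 3 3 per group
    c1 = c(1, 1, c1),                          # two rows of ones, then the data
    o1 = rep(1:2, length.out = n() + 2),
    .by = v1
  )
out
```

Like the summarise version, this assumes the rows of each v1 group come in pairs.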

Related

Pasting values from a vector to a new column in a for loop with nested data

I have a dataframe that currently looks like this:
subjectID Trial
1 3
1 3
1 3
1 4
1 4
1 5
1 5
1 5
2 1
2 1
2 3
2 3
2 3
2 5
2 5
2 6
3 1
Etc., where trial number is nested under subject ID. I need to make a new column, "NewTrial", that simply gives the order in which the trials now appear within each subject. For example:
subjectID Trial NewTrial
1 3 1
1 3 1
1 3 1
1 4 2
1 4 2
1 5 3
1 5 3
1 5 3
2 1 1
2 1 1
2 3 2
2 3 2
2 3 2
2 5 3
2 5 3
2 6 4
3 1 1
So far, I have a for-loop written that looks like this:
for (myperson in unique(data$subjectID)) {
  # This line creates a vector of the number of unique trials per subject: for subject 1, c(1, 2, 3)
  triallength <- 1:length(unique(data$Trial[data$subjectID == myperson]))
}
I'm having trouble now finding a way to paste the numbers from the created triallength vector as a column in the dataframe. Does anyone know of a way to accomplish this? I am lacking some experience with for-loops and hoping to gain more. If anyone has a tidyverse/dplyr solution, however, I am open to that as well as an alternative to a for-loop. Thanks in advance, and let me know if any clarification is needed!

Converting to factor with unique values as levels, then as.numeric in an ave should be nice.
transform(dat, NewTrial=ave(Trial, subjectID, FUN=\(x) as.numeric(factor(x, levels=unique(x)))))
# subjectID Trial NewTrial
# 1 1 3 1
# 2 1 3 1
# 3 1 3 1
# 4 1 4 2
# 5 1 4 2
# 6 1 5 3
# 7 1 5 3
# 8 1 5 3
# 9 2 1 1
# 10 2 1 1
# 11 2 3 2
# 12 2 3 2
# 13 2 3 2
# 14 2 5 3
# 15 2 5 3
# 16 2 6 4
# 17 3 1 1
Data:
dat <- structure(list(subjectID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), Trial = c(3L, 3L, 3L, 4L,
4L, 5L, 5L, 5L, 1L, 1L, 3L, 3L, 3L, 5L, 5L, 6L, 1L)), class = "data.frame", row.names = c(NA,
-17L))

We could use match on the unique values after grouping by 'subjectID'
library(dplyr)
df1 <- df1 %>%
group_by(subjectID) %>%
mutate(NewTrial = match(Trial, unique(Trial))) %>%
ungroup
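
Since the question asked about a for-loop, the same match idea can be written as one. This is only a sketch: it uses the dat object from the Data block above, and the names id and rows are made up for illustration.

```r
dat <- data.frame(
  subjectID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L),
  Trial     = c(3L, 3L, 3L, 4L, 4L, 5L, 5L, 5L,
                1L, 1L, 3L, 3L, 3L, 5L, 5L, 6L, 1L)
)

# Create the column first, then fill it one subject at a time
dat$NewTrial <- NA_integer_
for (id in unique(dat$subjectID)) {
  rows <- dat$subjectID == id                 # logical index of this subject's rows
  trials <- dat$Trial[rows]
  # position of each trial among the subject's unique trials, in order of appearance
  dat$NewTrial[rows] <- match(trials, unique(trials))
}
dat
```

The key step is assigning into dat$NewTrial[rows], which writes the per-subject vector back into the matching rows of the data frame.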

We could use rleid:
library(dplyr)
library(data.table)
df %>%
group_by(subjectID) %>%
mutate(NewTrial = rleid(subjectID, Trial))
subjectID Trial NewTrial
<int> <int> <int>
1 1 3 1
2 1 3 1
3 1 3 1
4 1 4 2
5 1 4 2
6 1 5 3
7 1 5 3
8 1 5 3
9 2 1 1
10 2 1 1
11 2 3 2
12 2 3 2
13 2 3 2
14 2 5 3
15 2 5 3
16 2 6 4
17 3 1 1
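
Note that rleid numbers runs of consecutive equal values, whereas match(Trial, unique(Trial)) numbers distinct values. The two agree on the data above because each trial forms one contiguous block per subject. A small made-up vector (not from the question) shows where they would differ:

```r
library(data.table)

# Hypothetical trial sequence in which trial 3 comes back after trial 4
trial <- c(3L, 3L, 4L, 3L)

by_value <- match(trial, unique(trial))  # numbers distinct values: 1 1 2 1
by_run   <- rleid(trial)                 # numbers consecutive runs: 1 1 2 3

by_value
by_run
```

Which behaviour is wanted depends on whether a repeated trial number should count as a new trial.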

Count number of observations by group

I'm trying to count the number of non-missing observations of each variable in a dataset, separately for each group.
The data looks like this:
grp v1 vn
1 2 5
2 4 NA
3 3 4
1 NA 3
1 2 12
4 NA 5
5 3 6
5 6 NA
The Result should be a table like this:
grp v1 vn
1 2 3
2 1 0
3 1 1
4 0 1
5 2 1
I tried to use
x %>% group_by(grp) %>% summarise(across(everything(),n = n()))
but it didn't really work.
Any help is appreciated. Thanks in advance!

You can also use the following solution:
library(dplyr)
df %>%
group_by(grp) %>%
summarise(across(v1:vn, ~ sum(!is.na(.x))))
# A tibble: 5 x 3
grp v1 vn
<int> <int> <int>
1 1 2 3
2 2 1 0
3 3 1 1
4 4 0 1
5 5 2 1

Get the data in long format, count non-NA values for each column in each group and get the data in wide format.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -grp) %>%
group_by(grp, name) %>%
summarise(n = sum(!is.na(value))) %>%
ungroup %>%
pivot_wider(names_from = name, values_from = n)
# grp v1 vn
# <int> <int> <int>
#1 1 2 3
#2 2 1 0
#3 3 1 1
#4 4 0 1
#5 5 2 1
data
df <- structure(list(grp = c(1L, 2L, 3L, 1L, 1L, 4L, 5L, 5L), v1 = c(2L,
4L, 3L, NA, 2L, NA, 3L, 6L), vn = c(5L, NA, 4L, 3L, 2L, 5L, 6L,
NA)), class = "data.frame", row.names = c(NA, -8L))

Using data.table
library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(!is.na(x))), grp]
# grp v1 vn
#1: 1 2 3
#2: 2 1 0
#3: 3 1 1
#4: 4 0 1
#5: 5 2 1

Using aggregate.
aggregate(cbind(v1, vn) ~ grp, replace(dat, is.na(dat), 0), function(x) sum(as.logical(x)))
# grp v1 vn
# 1 1 2 3
# 2 2 1 0
# 3 3 1 1
# 4 4 0 1
# 5 5 2 1
Data:
dat <- read.table(header=T, text='grp v1 vn
1 2 5
2 4 NA
3 3 4
1 NA 3
1 2 12
4 NA 5
5 3 6
5 6 NA
')
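
One caveat with replacing NA by 0 and summing as.logical(x): a genuine 0 in the data would also be counted as missing. A possible variant, sketched here under the assumption that only NA should count as missing, keeps the NAs and passes na.action = na.pass so that aggregate does not drop incomplete rows (res is an illustrative name):

```r
dat <- data.frame(
  grp = c(1, 2, 3, 1, 1, 4, 5, 5),
  v1  = c(2, 4, 3, NA, 2, NA, 3, 6),
  vn  = c(5, NA, 4, 3, 12, 5, 6, NA)
)

# The formula interface defaults to na.omit, which would remove every row
# containing an NA; na.pass keeps them so they can be counted explicitly
res <- aggregate(cbind(v1, vn) ~ grp, data = dat,
                 FUN = function(x) sum(!is.na(x)),
                 na.action = na.pass)
res
```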

By group relative order

I have a data set that looks like this
ID Week
1 3
1 5
1 5
1 8
1 11
1 16
2 2
2 2
2 3
2 3
2 9
Now, what I would like to do is add another column to the DataFrame that marks, for every ID, each week's relative position. More precisely, I would like to mark each ID's earliest week (smallest number) as 1, the next week for that ID as 2, and so forth, where two observations of the same week get the same number.
So, in the above example I should get:
ID Week Order
1 3 1
1 5 2
1 5 2
1 8 3
1 11 4
1 16 5
2 2 1
2 2 1
2 3 2
2 3 2
2 9 3
How could I achieve this?
Thank you very much!

A base R option using ave + match
transform(
df,
Order = ave(Week,
ID,
FUN = function(x) match(x, sort(unique(x)))
)
)
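or ave + order (thanks to @IRTFM for the comments). Note that order returns a sorting permutation rather than a rank, so tied weeks receive different numbers; this variant matches the ave + match result only when no Week repeats within an ID.
transform(
df,
Order = ave(Week,
ID,
FUN = order
)
)
The ave + match version gives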
ID Week Order
1 1 3 1
2 1 5 2
3 1 5 2
4 1 8 3
5 1 11 4
6 1 16 5
7 2 2 1
8 2 2 1
9 2 3 2
10 2 3 2
11 2 9 3
A data.table option with frank
> setDT(df)[, Order := frank(Week, ties.method = "dense"), ID][]
ID Week Order
1: 1 3 1
2: 1 5 2
3: 1 5 2
4: 1 8 3
5: 1 11 4
6: 1 16 5
7: 2 2 1
8: 2 2 1
9: 2 3 2
10: 2 3 2
11: 2 9 3
Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), Week = c(3L, 5L, 5L, 8L, 11L, 16L, 2L, 2L, 3L, 3L, 9L)),
class = "data.frame", row.names = c(NA, -11L))
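
A quick check on the weeks of ID 1 shows how order and the match-based ranking treat the tied week 5. This is a sketch on a plain vector; the names week, perm and dense are illustrative.

```r
week <- c(3L, 5L, 5L, 8L, 11L, 16L)

# order() returns the permutation that sorts the vector, so ties get distinct values
perm <- order(week)                       # 1 2 3 4 5 6

# matching against the sorted unique values gives a dense rank, so ties share a value
dense <- match(week, sort(unique(week)))  # 1 2 2 3 4 5

perm
dense
```

Only the dense version reproduces the Order column requested in the question.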

You can use dense_rank in dplyr :
library(dplyr)
df %>% group_by(ID) %>% mutate(Order = dense_rank(Week)) %>% ungroup
# ID Week Order
# <int> <int> <int>
# 1 1 3 1
# 2 1 5 2
# 3 1 5 2
# 4 1 8 3
# 5 1 11 4
# 6 1 16 5
# 7 2 2 1
# 8 2 2 1
# 9 2 3 2
#10 2 3 2
#11 2 9 3

Define an indicator when the number of duplicate rows minus 1 equals one of the columns

I have some rows that are duplicates in some of their columns. I want to define an indicator that is 1 when the number of duplicate rows minus 1 equals the value of one of the columns.
example
SAMPN PERNO ARR_HR HHMEM
1 1 2 1
1 2 2 1
2 1 3 2
2 3 3 2
3 1 4 2
3 2 4 2
3 3 4 2
Rows are duplicates if they agree in the first, third and fourth columns (SAMPN, ARR_HR and HHMEM). I want the indicator to be 1 if the number of duplicate rows minus 1 equals HHMEM.
For example, the first 2 rows are duplicates, so 2 - 1 = 1 = HHMEM and the indicator is 1.
Output
SAMPN PERNO ARR_HR HHMEM indicator
1 1 2 1 1
1 2 2 1 1
2 1 3 2 0
2 3 3 2 0
3 1 4 2 1
3 2 4 2 1
3 3 4 2 1

After grouping by 'SAMPN' and other grouping variables (from OP's comments) create the 'indicator' by coercing the logical vector ((n()- 1) == HHMEM) into binary with as.integer
library(dplyr)
df1 %>%
group_by(SAMPN, ARR_HR, HHMEM) %>%
mutate(indicator = as.integer((n()-1) == HHMEM))
# A tibble: 7 x 5
# Groups: SAMPN [3]
# SAMPN PERNO ARR_HR HHMEM indicator
# <int> <int> <int> <int> <int>
#1 1 1 2 1 1
#2 1 2 2 1 1
#3 2 1 3 2 0
#4 2 3 3 2 0
#5 3 1 4 2 1
#6 3 2 4 2 1
#7 3 3 4 2 1
NOTE: We don't need to create any additional column and then remove it later
Or the same logic in base R with ave
df1$indicator <- +(with(df1, HHMEM == ave(HHMEM, HHMEM, SAMPN,
ARR_HR, FUN = length)-1))
Or using duplicated with table
i1 <- table(cumsum(!duplicated(df1[c(1, 3, 4)])))
as.integer(rep(i1, i1) - 1 == df1$HHMEM)
data
df1 <- structure(list(SAMPN = c(1L, 1L, 2L, 2L, 3L, 3L, 3L), PERNO = c(1L,
2L, 1L, 3L, 1L, 2L, 3L), ARR_HR = c(2L, 2L, 3L, 3L, 4L, 4L, 4L
), HHMEM = c(1L, 1L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame",
row.names = c(NA,
-7L))

We can use add_count to get count and compare it with HHMEM.
library(dplyr)
df %>%
add_count(SAMPN, ARR_HR, HHMEM) %>%
mutate(indicator = as.integer(n - 1 == HHMEM)) %>%
select(-n)
# SAMPN PERNO ARR_HR HHMEM indicator
# <int> <int> <int> <int> <int>
#1 1 1 2 1 1
#2 1 2 2 1 1
#3 2 1 3 2 0
#4 2 3 3 2 0
#5 3 1 4 2 1
#6 3 2 4 2 1
#7 3 3 4 2 1
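
The same count-and-compare logic can also be sketched in data.table, where .N is the number of rows in the current group. This assumes the data.table package is available; dt is an illustrative name.

```r
library(data.table)

dt <- data.table(
  SAMPN  = c(1L, 1L, 2L, 2L, 3L, 3L, 3L),
  PERNO  = c(1L, 2L, 1L, 3L, 1L, 2L, 3L),
  ARR_HR = c(2L, 2L, 3L, 3L, 4L, 4L, 4L),
  HHMEM  = c(1L, 1L, 2L, 2L, 2L, 2L, 2L)
)

# Within each group, compare the group size minus 1 with HHMEM
# (HHMEM is a grouping column, so it is a single value inside j)
dt[, indicator := as.integer(.N - 1L == HHMEM), by = .(SAMPN, ARR_HR, HHMEM)]
dt
```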

How to refill a column with the help of 2 other columns?

I have data grouped by 3 variables: SAMPN, PERNO and loop.
There are 2 columns, mode1 and mode2, and a column called int.
SAMPN PERNO loop mode1 mode2 int
1 1 1 1 2 NA
1 1 1 2 1 NA
1 1 1 3 2 0
1 2 1 3 2 NA
1 2 1 1 1 2
2 2 1 3 2 NA
2 2 1 1 3 NA
2 2 1 3 1 0
2 2 2 1 2 NA
2 2 2 3 1 2
SAMPN is the family index, PERNO is the index of each person within a family, and loop is the tour of each person. The last row of each loop for each person is 0 or 2, and the rest of the loop is NA. Within each family, person and loop, I want to copy mode1 into int if the last row of the loop is 0, and copy mode2 if the last row of the loop is 2.
output
SAMPN PERNO loop mode1 mode2 int
1 1 1 1 2 1
1 1 1 2 1 2
1 1 1 3 2 3
1 2 1 3 2 2
1 2 1 1 1 1
2 2 1 3 2 3
2 2 1 1 3 1
2 2 1 3 1 3
2 2 2 1 2 2
2 2 2 3 1 1
The first 3 rows are the loop of the first person in the first family; I filled that loop with mode1 because the third row was 0, and so on.

Here's a way using dplyr
df <- read.table(h=T,text="SAMPN PERNO loop mode1 mode2 int
1 1 1 1 2 NA
1 1 1 2 1 NA
1 1 1 3 2 0
1 2 1 3 2 NA
1 2 1 1 1 2
2 2 1 3 2 NA
2 2 1 1 3 NA
2 2 1 3 1 0
2 2 2 1 2 NA
2 2 2 3 1 2")
library(dplyr)
df %>%
group_by(loop, SAMPN, PERNO) %>%
mutate(int = if(last(int) == 0) mode1 else mode2) %>%
ungroup()
#> # A tibble: 10 x 6
#> SAMPN PERNO loop mode1 mode2 int
#> <int> <int> <int> <int> <int> <int>
#> 1 1 1 1 1 2 1
#> 2 1 1 1 2 1 2
#> 3 1 1 1 3 2 3
#> 4 1 2 1 3 2 2
#> 5 1 2 1 1 1 1
#> 6 2 2 1 3 2 3
#> 7 2 2 1 1 3 1
#> 8 2 2 1 3 1 3
#> 9 2 2 2 1 2 2
#> 10 2 2 2 3 1 1
If you have more values than 0 or 2, switch could be a good alternative :
df %>%
group_by(loop, SAMPN, PERNO) %>%
mutate(int = switch(
as.character(last(int)),
`0` = mode1,
`2` = mode2)) %>%
ungroup()
# same output!
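
One thing to keep in mind with switch: when no name matches and no default is supplied, it returns NULL invisibly. A sketch of a helper with an explicit fallback, written in base R so it can be tried outside mutate (pick_mode and its arguments are hypothetical names, not part of the answer above):

```r
# Choose between two vectors based on a single flag value
pick_mode <- function(flag, mode1, mode2) {
  switch(as.character(flag),
         `0` = mode1,
         `2` = mode2,
         rep(NA_integer_, length(mode1)))  # unnamed last argument is the default
}

a <- pick_mode(0L, 1:3, c(2L, 1L, 2L))  # flag 0 picks mode1
b <- pick_mode(2L, 1:3, c(2L, 1L, 2L))  # flag 2 picks mode2
d <- pick_mode(5L, 1:3, c(2L, 1L, 2L))  # unknown flag falls back to NA

a
b
d
```

Inside the grouped mutate this would be called as pick_mode(last(int), mode1, mode2).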

We can also use case_when
library(dplyr)
df %>%
group_by(loop, SAMPN, PERNO) %>%
mutate(int = case_when(rep(last(int) == 0, n()) ~ mode1, TRUE ~mode2))
# A tibble: 10 x 6
# Groups: loop, SAMPN, PERNO [4]
# SAMPN PERNO loop mode1 mode2 int
# <int> <int> <int> <int> <int> <int>
# 1 1 1 1 1 2 1
# 2 1 1 1 2 1 2
# 3 1 1 1 3 2 3
# 4 1 2 1 3 2 2
# 5 1 2 1 1 1 1
# 6 2 2 1 3 2 3
# 7 2 2 1 1 3 1
# 8 2 2 1 3 1 3
# 9 2 2 2 1 2 2
# 10 2 2 2 3 1 1
data
df <- structure(list(SAMPN = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), PERNO = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), loop = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), mode1 = c(1L, 2L, 3L, 3L,
1L, 3L, 1L, 3L, 1L, 3L), mode2 = c(2L, 1L, 2L, 2L, 1L, 2L, 3L,
1L, 2L, 1L), int = c(NA, NA, 0L, NA, 2L, NA, NA, 0L, NA, 2L)),
class = "data.frame", row.names = c(NA,
-10L))