Ifelse with dplyr in R

I would like to use dplyr to replace the NA values in the DV column of each ID with the DV value at a specific time point within that individual:
I want to replace the NA (DV column) at time 2 of each ID with the DV value at time 4 of that ID.
I want to replace the NA (DV column) at time 4 of each ID with the DV value at time 0 of that ID.
I cannot figure out how to do it with dplyr.
Here is my dataset:
ID TIME DV
1 0 5
1 2 NA
1 4 4
2 0 3
2 2 3
2 4 NA
3 0 7
3 2 NA
3 4 9
Expected output:
ID TIME DV
1 0 5
1 2 4
1 4 4
2 0 3
2 2 3
2 4 3
3 0 7
3 2 9
3 4 9
Any suggestions are appreciated.

I agree with @akrun that perhaps fill is a good fit in general, but your rules suggest handling things a little differently (since "updown" does not follow your rules).
library(dplyr)
# library(tidyr)
dat %>%
  tidyr::pivot_wider(id_cols = "ID", names_from = "TIME", values_from = "DV") %>%
  mutate(
    `2` = if_else(is.na(`2`), `4`, `2`),
    `4` = if_else(is.na(`4`), `0`, `4`)
  ) %>%
  tidyr::pivot_longer(-ID, names_to = "TIME", values_to = "DV")
# # A tibble: 9 x 3
# ID TIME DV
# <int> <chr> <int>
# 1 1 0 5
# 2 1 2 4
# 3 1 4 4
# 4 2 0 3
# 5 2 2 3
# 6 2 4 3
# 7 3 0 7
# 8 3 2 9
# 9 3 4 9
It might help to visualize what this is doing by looking mid-pipe:
dat %>%
  tidyr::pivot_wider(id_cols = "ID", names_from = "TIME", values_from = "DV")
# # A tibble: 3 x 4
# ID `0` `2` `4`
# <int> <int> <int> <int>
# 1 1 5 NA 4
# 2 2 3 3 NA
# 3 3 7 NA 9
dat %>%
  tidyr::pivot_wider(id_cols = "ID", names_from = "TIME", values_from = "DV") %>%
  mutate(
    `2` = if_else(is.na(`2`), `4`, `2`),
    `4` = if_else(is.na(`4`), `0`, `4`)
  )
# # A tibble: 3 x 4
# ID `0` `2` `4`
# <int> <int> <int> <int>
# 1 1 5 4 4
# 2 2 3 3 3
# 3 3 7 9 9
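The `dat` object is not defined in this answer; a minimal construction, assuming it matches the table in the question:

```r
# Reconstruction of the question's data (an assumption; the answer itself
# does not define 'dat')
dat <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  TIME = c(0L, 2L, 4L, 0L, 2L, 4L, 0L, 2L, 4L),
  DV   = c(5L, NA, 4L, 3L, 3L, NA, 7L, NA, 9L)
)
```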

We could use fill after grouping by 'ID'
library(dplyr)
library(tidyr)
df1 %>%
  arrange(ID, TIME) %>%
  # or, as @r2evans mentioned,
  # arrange(ID, factor(TIME, levels = c(0, 2, 4))) %>%
  group_by(ID) %>%
  fill(DV, .direction = 'downup')
# A tibble: 9 x 3
# Groups: ID [3]
# ID TIME DV
# <int> <int> <int>
#1 1 0 5
#2 1 2 4
#3 1 4 4
#4 2 0 3
#5 2 2 3
#6 2 4 3
#7 3 0 7
#8 3 2 9
#9 3 4 9
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), TIME = c(0L,
2L, 4L, 0L, 2L, 4L, 0L, 2L, 4L), DV = c(5L, NA, 4L, 3L, 3L, NA,
7L, NA, 9L)), class = "data.frame", row.names = c(NA, -9L))


Grouped filter common value in a column

Sample data:
# A tibble: 10 × 2
id value
<int> <dbl>
1 1 1
2 1 2
3 1 3
4 1 5
5 1 6
6 2 6
7 2 3
8 2 2
9 2 0
10 2 10
structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
value = c(1, 2, 3, 5, 6, 6, 3, 2, 0, 10)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
How do I perform a group filter for common values in the column value with dplyr? Such that the expected output would be:
# A tibble: 6 × 2
# Groups: id [2]
id value
<int> <dbl>
1 1 2
2 1 3
3 1 6
4 2 6
5 2 3
6 2 2
We could use n_distinct for filtering after grouping by 'value'
library(dplyr)
df1 %>%
  group_by(value) %>%
  filter(n_distinct(id) == n_distinct(df1$id)) %>%
  ungroup
-output
# A tibble: 6 × 2
id value
<int> <dbl>
1 1 2
2 1 3
3 1 6
4 2 6
5 2 3
6 2 2
Or use split/reduce/intersect
library(purrr)
df1 %>%
  filter(value %in% (split(value, id) %>% reduce(intersect)))
-output
# A tibble: 6 × 2
id value
<int> <dbl>
1 1 2
2 1 3
3 1 6
4 2 6
5 2 3
6 2 2
In base R, it would be
subset(df1, value %in% Reduce(intersect, split(value, id)))
-output
# A tibble: 6 × 2
id value
<int> <dbl>
1 1 2
2 1 3
3 1 6
4 2 6
5 2 3
6 2 2
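To see what the split/intersect step produces on the sample data, here is the same idiom run standalone in base R:

```r
# Sample columns from the question
id    <- c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L)
value <- c(1, 2, 3, 5, 6, 6, 3, 2, 0, 10)

# One vector of values per id...
split(value, id)

# ...then fold intersect() over that list to keep only values
# present in every id's vector
Reduce(intersect, split(value, id))
# 2 3 6
```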

How to add new rows conditionally in R

I have a df with
v1 t1 c1 o1
1 1 9 1
1 1 12 2
1 2 2 1
1 2 7 2
2 1 3 1
2 1 6 2
2 2 3 1
2 2 12 2
And I would like to add 2 rows each time v1 changes its value, in order to get this:
v1 t1 c1 o1
1 1 1 1
1 1 1 2
1 2 9 1
1 2 12 2
1 3 2 1
1 3 7 2
2 1 1 1
2 1 1 2
2 2 3 1
2 2 6 2
2 3 3 1
2 3 12 2
So every time v1 changes its value, I'm adding 2 rows of ones and incrementing the t1 values by 1. This is kind of tricky. I've been able to do it in Excel, but I would like to scale to big files in R.
We may do the expansion in group_modify
library(dplyr)
df1 %>%
  group_by(v1) %>%
  group_modify(~ .x %>%
    slice_head(n = 2) %>%
    mutate(across(-o1, ~ 1)) %>%
    bind_rows(.x) %>%
    mutate(t1 = as.integer(gl(n(), 2, n())))) %>%
  ungroup
-output
# A tibble: 12 × 4
v1 t1 c1 o1
<int> <int> <dbl> <int>
1 1 1 1 1
2 1 1 1 2
3 1 2 9 1
4 1 2 12 2
5 1 3 2 1
6 1 3 7 2
7 2 1 1 1
8 2 1 1 2
9 2 2 3 1
10 2 2 6 2
11 2 3 3 1
12 2 3 12 2
Or do a group by summarise
df1 %>%
  group_by(v1) %>%
  summarise(t1 = as.integer(gl(n() + 2, 2, n() + 2)),
            c1 = c(1, 1, c1), o1 = rep(1:2, length.out = n() + 2),
            .groups = 'drop')
-output
# A tibble: 12 × 4
v1 t1 c1 o1
<int> <int> <dbl> <int>
1 1 1 1 1
2 1 1 1 2
3 1 2 9 1
4 1 2 12 2
5 1 3 2 1
6 1 3 7 2
7 2 1 1 1
8 2 1 1 2
9 2 2 3 1
10 2 2 6 2
11 2 3 3 1
12 2 3 12 2
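The `gl()` call is the compact part of both answers: `gl(n, k, length)` generates factor levels 1..n, each repeated k times, truncated to `length`, which is exactly the new `t1` index. A standalone illustration with 6 rows per group (the size after adding the two new rows):

```r
# gl(n, k, length): levels 1..n, each repeated k times, cut to 'length'
as.integer(gl(6, 2, 6))
# 1 1 2 2 3 3  -> consecutive pairs of rows share a t1 value
```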
data
df1 <- structure(list(v1 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), t1 = c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), c1 = c(9L, 12L, 2L, 7L, 3L, 6L,
3L, 12L), o1 = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L)),
class = "data.frame", row.names = c(NA,
-8L))

How to remove the last N rows on a dataset based on ID in R?

How do I go from a dataframe like this:
ID X Y
1 4 6
1 6 5
1 8 4
1 9 6
2 6 4
2 7 5
2 3 9
to this:
ID X Y
1 4 6
1 6 5
2 6 4
In this example, I wanted to remove the last 2 rows for every ID.
You can use the following solution:
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(between(row_number(), 1, n()-2))
# A tibble: 3 x 3
# Groups: ID [2]
ID X Y
<int> <int> <int>
1 1 4 6
2 1 6 5
3 2 6 4
Or this one:
df %>%
  group_by(ID) %>%
  slice(1:(n()-2))
# A tibble: 3 x 3
# Groups: ID [2]
ID X Y
<int> <int> <int>
1 1 4 6
2 1 6 5
3 2 6 4
An option with head used within slice
library(dplyr)
df %>%
  group_by(ID) %>%
  slice(head(row_number(), -2)) %>%
  ungroup
-output
# A tibble: 3 x 3
ID X Y
<int> <int> <int>
1 1 4 6
2 1 6 5
3 2 6 4
data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), X = c(4L,
6L, 8L, 9L, 6L, 7L, 3L), Y = c(6L, 5L, 4L, 6L, 4L, 5L, 9L)),
class = "data.frame", row.names = c(NA,
-7L))
You can use:
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(row_number() <= n()-2)
Output:
ID X Y
<int> <int> <int>
1 1 4 6
2 1 6 5
3 2 6 4
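For completeness, a base R sketch of the same idea (not from the answers above): `head(g, -2)` drops the last two rows of each per-ID piece.

```r
df <- data.frame(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
                 X  = c(4L, 6L, 8L, 9L, 6L, 7L, 3L),
                 Y  = c(6L, 5L, 4L, 6L, 4L, 5L, 9L))

# Split by ID, drop the last 2 rows of each piece, then re-combine
res <- do.call(rbind, lapply(split(df, df$ID), function(g) head(g, -2)))
res
# ID X Y
#  1 4 6
#  1 6 5
#  2 6 4
```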

Count number of observations by group

I'm trying to count the number of non-missing observations for each variable in a dataset, within each group.
The data looks like this:
grp v1 vn
1 2 5
2 4 NA
3 3 4
1 NA 3
1 2 12
4 NA 5
5 3 6
5 6 NA
The Result should be a table like this:
grp v1 vn
1 2 3
2 1 0
3 1 1
4 0 1
5 2 1
I tried to use
x %>% group_by(grp) %>% summarise(across(everything(),n = n()))
but it didn't really work.
Any help is appreciated. Thanks in advance!
You can also use the following solution:
library(dplyr)
df %>%
  group_by(grp) %>%
  summarise(across(v1:vn, ~ sum(!is.na(.x))))
# A tibble: 5 x 3
grp v1 vn
<int> <int> <int>
1 1 2 3
2 2 1 0
3 3 1 1
4 4 0 1
5 5 2 1
Get the data in long format, count non-NA values for each column in each group and get the data in wide format.
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = -grp) %>%
  group_by(grp, name) %>%
  summarise(n = sum(!is.na(value))) %>%
  ungroup %>%
  pivot_wider(names_from = name, values_from = n)
# grp v1 vn
# <int> <int> <int>
#1 1 2 3
#2 2 1 0
#3 3 1 1
#4 4 0 1
#5 5 2 1
data
df <- structure(list(grp = c(1L, 2L, 3L, 1L, 1L, 4L, 5L, 5L), v1 = c(2L,
4L, 3L, NA, 2L, NA, 3L, 6L), vn = c(5L, NA, 4L, 3L, 2L, 5L, 6L,
NA)), class = "data.frame", row.names = c(NA, -8L))
Using data.table
library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(!is.na(x))), grp]
# grp v1 vn
#1: 1 2 3
#2: 2 1 0
#3: 3 1 1
#4: 4 0 1
#5: 5 2 1
Using aggregate.
aggregate(cbind(v1, vn) ~ grp, replace(dat, is.na(dat), 0), function(x) sum(as.logical(x)))
# grp v1 vn
# 1 1 2 3
# 2 2 1 0
# 3 3 1 1
# 4 4 0 1
# 5 5 2 1
Data:
dat <- read.table(header=T, text='grp v1 vn
1 2 5
2 4 NA
3 3 4
1 NA 3
1 2 12
4 NA 5
5 3 6
5 6 NA
')

How do I count only previous value not using summarize in R?

This is my dataset.
num col1
1 SENSOR_01
2 SENSOR_01
3 SENSOR_01
4 SENSOR_05
5 SENSOR_05
6 SENSOR_05
7 NA
8 SENSOR_01
9 SENSOR_01
10 SENSOR_05
11 SENSOR_05
structure(list(num = 1:11, col1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
NA, 1L, 1L, 2L, 2L), .Label = c("SENSOR_01", "SENSOR_05" ), class =
"factor"), count = c(3L, 3L, 3L, 3L, 3L, 3L, 0L, 2L, 2L, 2L, 2L)),
class = "data.frame", row.names = c(NA, -11L))
I would like to count only consecutive duplicated rows. In rows 1-3, SENSOR_01 appears 3 times in a row, so count = 3. Here is my expected outcome.
num col1 count
1 SENSOR_01 3
2 SENSOR_01 3
3 SENSOR_01 3
4 SENSOR_05 3
5 SENSOR_05 3
6 SENSOR_05 3
7 NA 1
8 SENSOR_01 2
9 SENSOR_01 2
10 SENSOR_05 2
11 SENSOR_05 2
How can I produce this outcome using dplyr?
We can use rleid to create groups and then count number of rows in each group.
library(dplyr)
df %>%
  group_by(group = data.table::rleid(col1)) %>%
  mutate(n = n()) %>%
  ungroup() %>%
  dplyr::select(-group)
# A tibble: 11 x 4
# num col1 count n
# <int> <fct> <int> <int>
# 1 1 SENSOR_01 3 3
# 2 2 SENSOR_01 3 3
# 3 3 SENSOR_01 3 3
# 4 4 SENSOR_05 3 3
# 5 5 SENSOR_05 3 3
# 6 6 SENSOR_05 3 3
# 7 7 NA 1 1
# 8 8 SENSOR_01 2 2
# 9 9 SENSOR_01 2 2
#10 10 SENSOR_05 2 2
#11 11 SENSOR_05 2 2
Keeping both the columns for comparison purposes.
Or using data.table
library(data.table)
setDT(df)[, n := .N, by = rleid(col1)]
As an option, we can use the order of the records (rownames in a traditional data.frame). The idea is simple:
1. Within each group of identical sensor names, set a flag to zero if the distance between adjacent records equals 1 both within the group and in the global, ungrouped view, and to one otherwise;
2. Still within the group, take the cumulative sum of the flags, which identifies the subgroups of records that appear consecutively in the global data set;
3. Still within the group, count the number of elements in each subgroup;
4. Repeat for each group of records.
In tidyverse:
dat %>%
  mutate(tmp = 1:n()) %>%
  group_by(col1) %>%
  add_count(tmp = cumsum(c(0, diff(tmp)) > 1)) %>%
  ungroup() %>%
  select(-tmp)
# # A tibble: 11 x 3
# num col1 n
# <int> <fct> <int>
# 1 1 SENSOR_01 3
# 2 2 SENSOR_01 3
# 3 3 SENSOR_01 3
# 4 4 SENSOR_05 3
# 5 5 SENSOR_05 3
# 6 6 SENSOR_05 3
# 7 7 NA 1
# 8 8 SENSOR_01 2
# 9 9 SENSOR_01 2
# 10 10 SENSOR_05 2
# 11 11 SENSOR_05 2
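Steps 1 and 2 above can be seen on a toy vector of global row positions for one sensor:

```r
# Global row positions of one sensor's records (toy example)
tmp <- c(1, 2, 3, 8, 9)

# Flag each record whose gap to the previous one exceeds 1,
# then cumulate the flags to label consecutive subgroups
cumsum(c(0, diff(tmp)) > 1)
# 0 0 0 1 1  -> rows 1-3 form one subgroup, rows 8-9 another
```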
Data:
dat <- structure(
list(
num = 1:11,
col1 = structure(
c(1L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 1L, 2L, 2L),
.Label = c("SENSOR_01", "SENSOR_05" ),
class = "factor")
),
class = "data.frame",
row.names = c(NA, -11L)
)
We can use base R with rle to create the 'count' column
df$count <- with(rle(df$col1), rep(lengths, lengths))
df$count
#[1] 3 3 3 3 3 3 1 2 2 2 2
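Note how `rle` handles the NA row: missing values are regarded as unequal even to other NAs, so every NA forms its own length-1 run, which is why row 7 gets count 1. A small illustration:

```r
x <- c("SENSOR_01", "SENSOR_01", "SENSOR_01", NA, "SENSOR_05", "SENSOR_05")
r <- rle(x)
r$lengths                  # run lengths: 3 1 2
rep(r$lengths, r$lengths)  # per-row counts: 3 3 3 1 2 2
```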
Or the dplyr implementation of the above
library(dplyr)
df %>%
  mutate(count = with(rle(col1), rep(lengths, lengths)))
Or a tidyverse option (note that replace_na comes from tidyr)
library(dplyr)
library(tidyr)
df %>%
  group_by(grp = replace_na(col1, "VALUE"),
           grp = cumsum(grp != lag(grp, default = first(grp)))) %>%
  mutate(count = n()) %>%
  ungroup %>%
  select(-grp)
# A tibble: 11 x 3
# num col1 count
# <int> <chr> <int>
# 1 1 SENSOR_01 3
# 2 2 SENSOR_01 3
# 3 3 SENSOR_01 3
# 4 4 SENSOR_05 3
# 5 5 SENSOR_05 3
# 6 6 SENSOR_05 3
# 7 7 <NA> 1
# 8 8 SENSOR_01 2
# 9 9 SENSOR_01 2
#10 10 SENSOR_05 2
#11 11 SENSOR_05 2
data
df <- structure(list(num = 1:11, col1 = c("SENSOR_01", "SENSOR_01",
"SENSOR_01", "SENSOR_05", "SENSOR_05", "SENSOR_05", NA, "SENSOR_01",
"SENSOR_01", "SENSOR_05", "SENSOR_05")),
class = "data.frame", row.names = c(NA,
-11L))
