Reverse the order of non-NA values in a variable - r

I am interested in reversing the values for a column that has NA values in a tidy way.
The rev call won't do the trick here:
library(tidyverse)
tibble(
Which = LETTERS[1:11],
x = c( c(3,1,4,2,16), NA, NA, 4, rep(NA, 2), 10)) %>%
mutate(y = rev(x))
This reverses the entire vector, NAs included.
I essentially want a tidy mutate command (no splitting / joining) that reverses the x values by rank across the Which rows, so that E gets the value 1 (max becomes min), B gets the value 16 (min becomes max), and so on, while NA values stay NA (F, G, I and J).
Edit:
Several answers do not achieve the intended outcome. The question is about making a reverse (rev) work while keeping NAs in position.
@Moody_Mudskipper has a solution for the case where there are no repeats, but it fails when there are repeats, e.g.:
rev_na <- function(x) setNames(sort(x), sort(x, TRUE))[as.character(x)]
Works here:
tibble(
Which = LETTERS[1:11],
x = c( c(3,1,4,2,16), NA, NA, 4, rep(NA, 2), 10)) %>%
mutate(y = rev_na(x))
Fails here:
tibble(
Which = LETTERS[1:7],
x = c(3,1,9,9,9, 9, 10)
) %>% mutate(y = rev_na(x), z = rev(x))
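A likely reason rev_na breaks on repeats: it looks values up by name, and R returns only the first element when several elements share a name. A minimal sketch of that behaviour (the lookup vector is made up for illustration):

```r
# Hypothetical lookup vector with a duplicated name "b"
lookup <- setNames(c(10, 20, 30), c("a", "b", "b"))

# Character indexing matches the first element with that name,
# so the value 30 can never be reached through "b"
res <- unname(lookup[c("b", "b", "a")])
res
```

Every repeated value therefore maps to the same partner, instead of each repeat getting its own.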

If you can tolerate a little hack:
tibble(
Which = LETTERS[1:11],
x = c( c(3,1,4,2,16), NA, NA, 4, rep(NA, 2), 10)) %>%
mutate(y = setNames(sort(x), sort(x, TRUE))[as.character(x)])
#> # A tibble: 11 x 3
#> Which x y
#> <chr> <dbl> <dbl>
#> 1 A 3 4
#> 2 B 1 16
#> 3 C 4 3
#> 4 D 2 10
#> 5 E 16 1
#> 6 F NA NA
#> 7 G NA NA
#> 8 H 4 3
#> 9 I NA NA
#> 10 J NA NA
#> 11 K 10 2
Created on 2021-05-11 by the reprex package (v0.3.0)

This will do:
library(dplyr)
df <- data.frame(
Which = LETTERS[1:11],
x = c( c(3,1,4,2,16), NA, NA, 4, rep(NA, 2), 10))
df %>% group_by(d = is.na(x)) %>%
arrange(x) %>%
mutate(y = ifelse(!d, rev(x), x)) %>%
ungroup %>% select(-d)
# A tibble: 11 x 3
Which x y
<chr> <dbl> <dbl>
1 B 1 16
2 D 2 10
3 A 3 4
4 C 4 4
5 H 4 3
6 K 10 2
7 E 16 1
8 F NA NA
9 G NA NA
10 I NA NA
11 J NA NA
Needless to say, you can arrange the results back by Which if it was already sorted, or create a row_number() column at the start of the pipeline and arrange by that at the end.
df %>%
group_by(d = is.na(x)) %>%
arrange(x) %>%
mutate(y = ifelse(!d, rev(x), x)) %>%
ungroup %>% select(-d) %>%
arrange(Which)
# A tibble: 11 x 3
Which x y
<chr> <dbl> <dbl>
1 A 3 4
2 B 1 16
3 C 4 4
4 D 2 10
5 E 16 1
6 F NA NA
7 G NA NA
8 H 4 3
9 I NA NA
10 J NA NA
11 K 10 2
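The same sort-then-reverse idea can probably be written as a plain vector helper, which avoids the arrange / re-arrange round trip. This is a sketch, not part of the answer above: rev_rank_na is a hypothetical name, and breaking ties by order of appearance is an assumption.

```r
# Hypothetical helper: reverse non-NA values by rank, keeping NAs in place.
# Ties are broken by order of appearance (an assumption, not from the question).
rev_rank_na <- function(x) {
  ok <- !is.na(x)
  y <- x
  # sort() drops NAs by default; rank() gives each non-NA value its sorted position
  y[ok] <- sort(x, decreasing = TRUE)[rank(x[ok], ties.method = "first")]
  y
}

x <- c(3, 1, 4, 2, 16, NA, NA, 4, NA, NA, 10)
rev_rank_na(x)
```

On this x it reproduces the y column shown above, and it can be used directly as mutate(y = rev_rank_na(x)).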

Related

R Lag Variable And Skip Value Between

DATA = data.frame(STUDENT = c(1,1,1,2,2,2,3,3,4,4),
SCORE = c(6,4,8,10,9,0,2,3,3,7),
CLASS = c('A', 'B', 'C', 'A', 'B', 'C', 'B', 'C', 'A', 'B'),
WANT = c(NA, NA, 2, NA, NA, -10, NA, NA, NA, NA))
I have DATA and wish to create 'WANT', which is calculated as follows:
For each STUDENT, WANT equals SCORE(CLASS = C) - SCORE(CLASS = A), placed on the CLASS = C row.
EX: SCORE(STUDENT = 1, CLASS = C) - SCORE(STUDENT = 1, CLASS = A) = 8-6=2
Assuming at most one 'C' and one 'A' CLASS per 'STUDENT', subset the 'SCORE' where CLASS is 'C' and where it is 'A', subtract, and (after grouping by 'STUDENT') assign the result only to the position where CLASS is 'C', setting all other positions to NA.
library(dplyr)
DATA <- DATA %>%
group_by(STUDENT) %>%
mutate(WANT2 = (SCORE[CLASS == 'C'][1] - SCORE[CLASS == 'A'][1]) *
NA^(CLASS != "C")) %>%
ungroup
-output
# A tibble: 10 × 5
STUDENT SCORE CLASS WANT WANT2
<dbl> <dbl> <chr> <dbl> <dbl>
1 1 6 A NA NA
2 1 4 B NA NA
3 1 8 C 2 2
4 2 10 A NA NA
5 2 9 B NA NA
6 2 0 C -10 -10
7 3 2 B NA NA
8 3 3 C NA NA
9 4 3 A NA NA
10 4 7 B NA NA
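The NA^(CLASS != "C") part works as a mask: in R, NA^0 is 1 and NA^1 is NA, so multiplying by it keeps the value where the condition is FALSE and blanks it elsewhere. A small sketch with made-up inputs:

```r
# Made-up example: only the third position is a "C" row
is_c <- c(FALSE, FALSE, TRUE)

# NA^1 is NA and NA^0 is 1, so the mask is NA for non-C rows and 1 for the C row
mask <- NA^(!is_c)

diff_value <- 2
masked <- diff_value * mask
masked
```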
Here is a solution that first reshapes the data to a wider format, then back to a longer format below. It works regardless of the order of the "CLASS" column (for instance, if the CLASS order for some student is CBA or BCA instead of ABC, this solution still works).
Solution
library(dplyr)
library(tidyr)
wider <- DATA %>% select(-WANT) %>%
pivot_wider( names_from = "CLASS", values_from = "SCORE") %>%
rowwise() %>%
mutate(WANT = C-A) %>%
ungroup()
output wider
# A tibble: 4 × 5
STUDENT A B C WANT
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 6 4 8 2
2 2 10 9 0 -10
3 3 NA 2 3 NA
4 4 3 7 NA NA
If you really want it to look like your output example, we can reorganize the wider data this way:
Reorganizing wider to long format
wider %>%
pivot_longer(A:C, values_to = "SCORE", names_to = "CLASS") %>%
relocate(WANT, .after = SCORE) %>%
mutate(WANT = if_else(CLASS == "C", WANT, NA_real_))
Final Output
# A tibble: 12 × 4
STUDENT CLASS SCORE WANT
<dbl> <chr> <dbl> <dbl>
1 1 A 6 NA
2 1 B 4 NA
3 1 C 8 2
4 2 A 10 NA
5 2 B 9 NA
6 2 C 0 -10
7 3 A NA NA
8 3 B 2 NA
9 3 C 3 NA
10 4 A 3 NA
11 4 B 7 NA
12 4 C NA NA
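If the reshaping round trip is not wanted, an order-independent lookup can also be sketched in base R with match(). WANT3 and the helper variable names are made up for this illustration, and at most one 'A' and one 'C' row per STUDENT is assumed.

```r
DATA <- data.frame(STUDENT = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                   SCORE = c(6, 4, 8, 10, 9, 0, 2, 3, 3, 7),
                   CLASS = c("A", "B", "C", "A", "B", "C", "B", "C", "A", "B"))

is_a <- DATA$CLASS == "A"
is_c <- DATA$CLASS == "C"

# For every row, look up that student's A score and C score (NA if absent)
a_score <- DATA$SCORE[is_a][match(DATA$STUDENT, DATA$STUDENT[is_a])]
c_score <- DATA$SCORE[is_c][match(DATA$STUDENT, DATA$STUDENT[is_c])]

# Keep the difference only on the C rows
DATA$WANT3 <- ifelse(is_c, c_score - a_score, NA)
DATA
```

Unlike the pivot_longer step, this keeps the original 10 rows and does not add rows for missing STUDENT / CLASS combinations.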

Expand each group to the max n of rows

How can I expand a group to length of the max group:
df <- structure(list(ID = c(1L, 1L, 2L, 3L, 3L, 3L), col1 = c("A",
"B", "O", "U", "L", "R")), class = "data.frame", row.names = c(NA,
-6L))
ID col1
1 A
1 B
2 O
3 U
3 L
3 R
Desired Output:
1 A
1 B
NA NA
2 O
NA NA
NA NA
3 U
3 L
3 R
You can take advantage of the fact that df[n_bigger_than_nrow,] gives a row of NAs
dplyr
max_n <- max(count(df, ID)$n)
df %>%
group_by(ID) %>%
summarise(cur_data()[seq(max_n),])
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.
#> # A tibble: 9 × 2
#> # Groups: ID [3]
#> ID col1
#> <int> <chr>
#> 1 1 A
#> 2 1 B
#> 3 1 <NA>
#> 4 2 O
#> 5 2 <NA>
#> 6 2 <NA>
#> 7 3 U
#> 8 3 L
#> 9 3 R
base R
n <- tapply(df$ID, df$ID, length)
max_n <- max(n)
i <- lapply(n, \(x) c(seq(x), rep(Inf, max_n - x)))
i <- Map(`+`, i, c(0, cumsum(head(n, -1))))
df <- df[unlist(i),]
rownames(df) <- NULL
df$ID <- rep(as.numeric(names(i)), each = max_n)
df
#> ID col1
#> 1 1 A
#> 2 1 B
#> 3 1 <NA>
#> 4 2 O
#> 5 2 <NA>
#> 6 2 <NA>
#> 7 3 U
#> 8 3 L
#> 9 3 R
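The out-of-range indexing that this answer relies on can be seen in isolation. A minimal sketch with a made-up two-row data frame:

```r
small <- data.frame(ID = c(1L, 1L), col1 = c("A", "B"))

# Row 3 does not exist, so asking for it returns a row filled with NA
padded <- small[seq(3), ]
padded
```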
Here's a base R solution.
Split df by the ID column, then use lapply to iterate over the pieces, and rbind each piece with a data frame of NA rows if it has fewer rows than the largest group (3 here, from max(table(df$ID))).
do.call(rbind,
lapply(split(df, df$ID),
\(x) rbind(x, data.frame(ID = NA, col1 = NA)[rep(1, max(table(df$ID)) - nrow(x)), ]))
)
ID col1
1.1 1 A
1.2 1 B
1.3 NA <NA>
2.3 2 O
2.1 NA <NA>
2.1.1 NA <NA>
3.4 3 U
3.5 3 L
3.6 3 R
Here is a possible tidyverse solution. We can use add_row inside of summarise to add rows to each group. I use max(count(df, ID)$n) to get the max group length, then subtract the number of rows in each group from it to get the number of rows that need to be added to that group. I use rep to produce the correct number of NA values for each group. Finally, I replace ID with NA wherever col1 is NA.
library(tidyverse)
df %>%
group_by(ID) %>%
summarise(add_row(cur_data(),
col1 = rep(NA_character_,
unique(max(count(df, ID)$n) - n()))),
.groups = "drop") %>%
mutate(ID = replace(ID, is.na(col1), NA))
Output
ID col1
<int> <chr>
1 1 A
2 1 B
3 NA NA
4 2 O
5 NA NA
6 NA NA
7 3 U
8 3 L
9 3 R
Or another option without using add_row:
library(dplyr)
# Get maximum number of rows for all groups
N = max(count(df,ID)$n)
df %>%
group_by(ID) %>%
summarise(col1 = c(col1, rep(NA, N-length(col1))), .groups = "drop") %>%
mutate(ID = replace(ID, is.na(col1), NA))
Another option could be:
df %>%
group_split(ID) %>%
map_dfr(~ rows_append(.x, tibble(col1 = rep(NA_character_, max(pull(count(df, ID), n)) - group_size(.x)))))
ID col1
<int> <chr>
1 1 A
2 1 B
3 NA NA
4 2 O
5 NA NA
6 NA NA
7 3 U
8 3 L
9 3 R
A base R using merge + rle
merge(
transform(
data.frame(ID = with(rle(df$ID), rep(values, each = max(lengths)))),
q = ave(ID, ID, FUN = seq_along)
),
transform(
df,
q = ave(ID, ID, FUN = seq_along)
),
all = TRUE
)[-2]
gives
ID col1
1 1 A
2 1 B
3 1 <NA>
4 2 O
5 2 <NA>
6 2 <NA>
7 3 U
8 3 L
9 3 R
A data.table option may also work
library(data.table)
setDT(df)[, .(col1 = `length<-`(col1, max(df[, .N, ID][, N]))), ID]
ID col1
1: 1 A
2: 1 B
3: 1 <NA>
4: 2 O
5: 2 <NA>
6: 2 <NA>
7: 3 U
8: 3 L
9: 3 R
An option to tidyr::complete the ID and row_new, using row_old to replace ID with NA.
library (tidyverse)
df %>%
group_by(ID) %>%
mutate(
row_new = row_number(),
row_old = row_number()) %>%
ungroup() %>%
complete(ID, row_new) %>%
mutate(ID = if_else(is.na(row_old),
NA_integer_,
ID)) %>%
select(-matches("row_"))
# A tibble: 9 x 2
ID col1
<int> <chr>
1 1 A
2 1 B
3 NA <NA>
4 2 O
5 NA <NA>
6 NA <NA>
7 3 U
8 3 L
9 3 R
n <- max(table(df$ID))
df %>%
group_by(ID) %>%
summarise(col1 =`length<-`(col1, n), .groups = 'drop') %>%
mutate(ID = `is.na<-`(ID, is.na(col1)))
# A tibble: 9 x 2
ID col1
<int> <chr>
1 1 A
2 1 B
3 NA NA
4 2 O
5 NA NA
6 NA NA
7 3 U
8 3 L
9 3 R
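Both this answer and the data.table one lean on `length<-`, the replacement form of length(): extending a vector this way pads it with NA. A short sketch on a made-up vector:

```r
x <- c("U", "L")

# Same as: length(x) <- 3, but usable inside an expression
padded <- `length<-`(x, 3)
padded
```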
Another base R solution using sequence.
print(
df[
sequence(
abs(rep(i <- rle(df$ID)$lengths, each = 2) - c(0L, max(i))),
rep(cumsum(c(1L, i))[-length(i) - 1L], each = 2) + c(0L, nrow(df)),
),
],
row.names = FALSE
)
#> ID col1
#> 1 A
#> 1 B
#> NA <NA>
#> 2 O
#> NA <NA>
#> NA <NA>
#> 3 U
#> 3 L
#> 3 R
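To unpack the sequence() call: with a from argument (available in R 4.0.0 and later, as far as I know) it concatenates runs of given lengths starting at given positions. Runs that start beyond nrow(df) then index non-existent rows, which come back as NA rows. A reduced sketch with made-up inputs:

```r
# Two runs: length 2 starting at 1, and length 1 starting at 7
idx <- sequence(c(2L, 1L), from = c(1L, 7L))
idx

small <- data.frame(ID = c(1L, 1L), col1 = c("A", "B"))
# Index 7 is out of range for a 2-row data frame, so it yields an NA row
res <- small[idx, ]
res
```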

If a column is NA, calculate row mean on other columns using dplyr

In the example below how can I calculate the row mean when column A is NA? The row mean would replace the NA in column A. Using base R, I can use this:
foo <- tibble(A = c(3,5,NA,6,NA,7,NA),
B = c(4,5,4,5,6,4,NA),
C = c(6,5,2,8,8,5,NA))
foo
tmp <- rowMeans(foo[,-1],na.rm = TRUE)
foo$A[is.na(foo$A)] <- tmp[is.na(foo$A)]
foo$A[is.nan(foo$A)] <- NA
Curious how I might do this with dplyr?
You can use ifelse :
library(dplyr)
foo %>%
mutate(A = ifelse(is.na(A), rowMeans(., na.rm = TRUE), A),
A = replace(A, is.nan(A), NA))
# A B C
# <dbl> <dbl> <dbl>
#1 3 4 6
#2 5 5 5
#3 3 4 2
#4 6 5 8
#5 7 6 8
#6 7 4 5
#7 NA NA NA
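The second replace() step is there because rowMeans() with na.rm = TRUE returns NaN, not NA, for a row with no non-missing values. A small sketch on a made-up matrix:

```r
# Second row is entirely missing
m <- rowMeans(cbind(B = c(4, NA), C = c(2, NA)), na.rm = TRUE)
had_nan <- is.nan(m)

# Turn the NaN back into a regular NA
m <- replace(m, is.nan(m), NA)
m
```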
Here is a solution that replaces NA not only in column A, but in all columns of the data frame.
library(dplyr)
foo2 <- foo %>%
mutate(RowMean = rowMeans(., na.rm = TRUE)) %>%
mutate(across(-RowMean, .fns =
function(x) ifelse(is.na(x) & !is.nan(RowMean), RowMean, x))) %>%
select(-RowMean)
Use if_else with a helper column holding the row mean:
foo %>%
mutate(m = rowMeans(across(), na.rm = T),
A = if_else(is.na(A) & !is.na(m), m, A)) %>%
select(-m)
# # A tibble: 7 x 3
# A B C
# <dbl> <dbl> <dbl>
# 1 3 4 6
# 2 5 5 5
# 3 3 4 2
# 4 6 5 8
# 5 7 6 8
# 6 7 4 5
# 7 NA NA NA

join and sum columns together R

I have a dataframe:
df <- data.frame(ca = c("a","b","a","c","b", "b"),
f = c(3,4,0,NA,3, 4),
f2 = c(NA,5,6,1,9, 7),
f3 = c(3,0,6,3,0, 8))
I want to join and sum my columns "f" and "f2" and rename the result "f_new".
Example:
df <- data.frame(ca = c("a","b","a","c","b", "b"),
f_new = c(3,9,6,1,12, 11),
f3 = c(3,0,6,3,0, 8))
Do you have an idea of how to do this with summarise, spread, group_by?
Using dplyr you can do this:
df %>%
rowwise() %>%
mutate(f_new=sum(f, f2, na.rm = T))
# A tibble: 6 x 5
# ca f f2 f3 f_new
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 a 3 NA 3 3
#2 b 4 5 0 9
#3 a 0 6 6 6
#4 c NA 1 3 1
#5 b 3 9 0 12
#6 b 4 7 8 11
This method retains any NA values in the original f and f2 columns.
Here is an answer using tidyverse methods from dplyr and tidyr
library(tidyverse)
df <- data.frame(ca = c("a","b","a","c","b", "b"),
f = c(3,4,0,NA,3, 4),
f2 = c(NA,5,6,1,9, 7),
f3 = c(3,0,6,3,0, 8))
df %>%
replace_na(list(f = 0, f2 = 0)) %>%
mutate(f_new = f + f2)
#> ca f f2 f3 f_new
#> 1 a 3 0 3 3
#> 2 b 4 5 0 9
#> 3 a 0 6 6 6
#> 4 c 0 1 3 1
#> 5 b 3 9 0 12
#> 6 b 4 7 8 11
dplyr can do this quite nicely with the following code. rowwise() lets you treat each row separately, and mutate() sums whichever columns you want. na.rm = TRUE ignores NAs in the sum. As a comment mentioned, without it the result is NA whenever any of the summed values is NA.
library(dplyr)
df %>%
rowwise() %>%
mutate(f_new = sum(f,f2, na.rm = TRUE))
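For larger data, a vectorised base R alternative to rowwise() is rowSums() over just the two columns. A sketch on the question's data:

```r
df <- data.frame(ca = c("a", "b", "a", "c", "b", "b"),
                 f  = c(3, 4, 0, NA, 3, 4),
                 f2 = c(NA, 5, 6, 1, 9, 7),
                 f3 = c(3, 0, 6, 3, 0, 8))

# Sum f and f2 row by row, ignoring NAs; unname() drops the row-name labels
df$f_new <- unname(rowSums(df[c("f", "f2")], na.rm = TRUE))
df
```

As with sum(..., na.rm = TRUE), a row where both f and f2 are NA would give 0 rather than NA.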

NA filling only if "sandwiched" by the same value using dplyr

Ok, here is yet another missing value filling question.
I am looking for a way to fill NAs based on both the previous and next existent values in a column. Standard filling in a single direction is not sufficient for this task.
If the previous and next valid values in a column are not the same, then the chunk remains as NA.
The code for the sample data frame is:
df_in <- tibble(id= 1:12,
var1 = letters[1:12],
var2 = c(NA,rep("A",2),rep(NA,2),rep("A",2),rep(NA,2),rep("B",2),NA))
Thanks,
Comparing na.locf() (last observation carried forward) and na.locf(fromLast = TRUE) (backward):
mutate(df_in,
var_new = if_else(
zoo::na.locf(var2, na.rm = FALSE) ==
zoo::na.locf(var2, na.rm = FALSE, fromLast = TRUE),
zoo::na.locf(var2, na.rm = FALSE),
NA_character_
))
# # A tibble: 12 x 4
# id var1 var2 var_new
# <int> <chr> <chr> <chr>
# 1 1 a NA NA
# 2 2 b A A
# 3 3 c A A
# 4 4 d NA A
# 5 5 e NA A
# 6 6 f A A
# 7 7 g A A
# 8 8 h NA NA
# 9 9 i NA NA
# 10 10 j B B
# 11 11 k B B
# 12 12 l NA NA
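If zoo is not available, the same forward / backward comparison can be sketched in base R. The locf helper below is a hypothetical stand-in for zoo::na.locf(na.rm = FALSE), written only for this illustration:

```r
# Hypothetical helper: carry the last non-NA value forward, leading NAs stay NA
locf <- function(x) {
  # Index of the most recent non-NA element at each position (0 if none yet)
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  idx[idx == 0L] <- NA
  x[idx]
}

var2 <- c(NA, rep("A", 2), rep(NA, 2), rep("A", 2), rep(NA, 2), rep("B", 2), NA)

fwd <- locf(var2)            # fill forward
bwd <- rev(locf(rev(var2)))  # fill backward

# Keep the filled value only where both directions agree
var_new <- ifelse(fwd == bwd, fwd, NA_character_)
var_new
```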
Something like this?
df_in %>% mutate(var_new = {
tmp <- var2
tmp[is.na(tmp)] <- "NA"
rl <- rle(tmp)
tibble(before = c(NA, head(rl$values, -1)),
value = rl$values,
after = c(tail(rl$values, -1), NA),
lengths = rl$lengths) %>%
mutate(value = ifelse(value == "NA" & before == after, before, value),
value = ifelse(value == "NA", NA, value)) %>%
select(value, lengths) %>%
unname() %>%
do.call(rep, .)})
# # A tibble: 12 x 4
# id var1 var2 var_new
# <int> <chr> <chr> <chr>
# 1 1 a NA <NA>
# 2 2 b A A
# 3 3 c A A
# 4 4 d NA A
# 5 5 e NA A
# 6 6 f A A
# 7 7 g A A
# 8 8 h NA <NA>
# 9 9 i NA <NA>
# 10 10 j B B
# 11 11 k B B
# 12 12 l NA <NA>
Explanation
Convert NA to "NA" (because rle does not group consecutive NA values into a single run).
Create a run-length-encoded representation of tmp.
Now you can look at the values before and after the relevant blocks.
Replace the values.
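The first step can be checked on a small made-up vector: rle() treats every NA as its own run, whereas the "NA" placeholder strings are merged like any other value.

```r
x <- c(NA, NA, "A")
raw_lengths <- rle(x)$lengths      # each NA is a separate run

x[is.na(x)] <- "NA"
fixed_lengths <- rle(x)$lengths    # the two placeholders now form one run

raw_lengths
fixed_lengths
```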

Resources