I work with a data set of customers' purchase baskets. Here is a sample of it:
basket item quant
1 1 B 1
2 1 A 2
3 1 C 1
4 2 A 1
5 2 C 1
6 3 A 2
7 4 B 1
8 4 C 1
Here is the code for reproducing it:
input <- data.frame(
  basket = c(1, 1, 1, 2, 2, 3, 4, 4),
  item = c("B", "A", "C", "A", "C", "A", "B", "C"),
  quant = c(1, 2, 1, 1, 1, 2, 1, 1)
)
So in the first basket there are three items with specified quantities. I have a custom function that only works with a specific input format: we define a maximum basket size, say 5, and the input to that function should then look like this:
basket item_1 item_2 item_3 item_4 item_5
1 1 B A A C <NA>
2 2 A C <NA> <NA> <NA>
3 3 A A <NA> <NA> <NA>
4 4 B C <NA> <NA> <NA>
I have been trying to do it using dplyr and summarise but have had no luck.
Any help would be appreciated!
One possible solution:
library(dplyr)
library(tidyr)
input[rep(1:nrow(input), input$quant),] %>%
group_by(basket) %>%
mutate(item2 = paste0('item_', row_number())) %>%
complete(item2 = paste0('item_', 1:5)) %>%
select(-quant) %>%
spread(item2, item)
which gives:
# A tibble: 4 x 6
basket item_1 item_2 item_3 item_4 item_5
<dbl> <fct> <fct> <fct> <fct> <fct>
1 1. B A A C NA
2 2. A C NA NA NA
3 3. A A NA NA NA
4 4. B C NA NA NA
Using the same logic, but with the data.table package:
library(data.table)
setDT(input)
input[input[, rep(.I, quant)]                                       # replicate each row 'quant' times
      ][, .(basket, item, item2 = paste0('item_', rowid(basket)))   # number the items within each basket
      ][CJ(basket = basket, item2 = paste0('item_', 1:5), unique = TRUE)
        , on = .(basket, item2)                                     # join on every basket/slot pair to add the missing slots
      ][, dcast(.SD, basket ~ item2, value.var = 'item')]           # reshape to wide
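Here rowid(basket) numbers the replicated rows within each basket; a quick illustration of what rowid() returns:
library(data.table)
rowid(c(1, 1, 1, 1, 2, 2)) # position of each element within its group
# [1] 1 2 3 4 1 2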
Here is an idea via the tidyverse. The tricks here are to replicate your rows based on quant, then remove the quant variable so it doesn't interfere with reshaping to a wide data frame. After that you create a new variable that deals with the duplicates, and finally spread to get the desired wide data frame.
library(tidyverse)
input[rep(rownames(input), input$quant),] %>%
select(-quant) %>%
group_by(basket) %>%
mutate(new = paste0('item_', row_number())) %>%
spread(new, item)
which gives,
# A tibble: 4 x 5
# Groups: basket [4]
basket item_1 item_2 item_3 item_4
<dbl> <fct> <fct> <fct> <fct>
1 1. B A A C
2 2. A C NA NA
3 3. A A NA NA
4 4. B C NA NA
I have the following data
df <- tibble(Type=c(1,2,2,1,1,2),ID=c(6,4,3,2,1,5))
Type ID
1 6
2 4
2 3
1 2
1 1
2 5
For each of the type 2 rows, I want to find the IDs of the type 1 rows just below and above them. For the above dataset, the output will be:
Type ID IDabove IDbelow
1 6 NA NA
2 4 6 2
2 3 6 2
1 2 NA NA
1 1 NA NA
2 5 1 NA
Naively, I can write a for loop to achieve this, but that would be too time consuming for the dataset I am dealing with.
One approach uses dplyr's lead and lag to get the next and previous values, respectively, and data.table's rleid to create groups of consecutive Type values.
library(dplyr)
library(data.table)
df %>%
mutate(IDabove = ifelse(Type == 2, lag(ID), NA),
IDbelow = ifelse(Type == 2, lead(ID), NA),
grp = rleid(Type)) %>%
group_by(grp) %>%
mutate(IDabove = first(IDabove),
IDbelow = last(IDbelow)) %>%
ungroup() %>%
select(-grp)
# Type ID IDabove IDbelow
# <dbl> <dbl> <dbl> <dbl>
#1 1 6 NA NA
#2 2 4 6 2
#3 2 3 6 2
#4 1 2 NA NA
#5 1 1 NA NA
#6 2 5 1 NA
A dplyr-only solution:
You could create your own rleid function and then apply the logic provided by Ronak (many thanks; upvoted).
library(dplyr)
my_func <- function(x) {
x <- rle(x)$lengths
rep(seq_along(x), times=x)
}
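As a quick check, my_func() reproduces data.table::rleid's run ids on the Type column:
my_func(df$Type)
# [1] 1 2 2 3 3 4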
# this part is the same as provided by Ronak.
df %>%
mutate(IDabove = ifelse(Type == 2, lag(ID), NA),
IDbelow = ifelse(Type == 2, lead(ID), NA),
grp = my_func(Type)) %>%
group_by(grp) %>%
mutate(IDabove = first(IDabove),
IDbelow = last(IDbelow)) %>%
ungroup() %>%
select(-grp)
Output:
Type ID IDabove IDbelow
<dbl> <dbl> <dbl> <dbl>
1 1 6 NA NA
2 2 4 6 2
3 2 3 6 2
4 1 2 NA NA
5 1 1 NA NA
6 2 5 1 NA
I have a shopping list data like this:
df <- data.frame(id = 1:5, item = c("apple2milk5", "milk1", "juice3apple5", "egg10juice1", "egg8milk2"), stringsAsFactors = F)
# id item
# 1 1 apple2milk5
# 2 2 milk1
# 3 3 juice3apple5
# 4 4 egg10juice1
# 5 5 egg8milk2
I want to separate the variable item into multiple columns, recording the quantity that follows each good. The problem is that each person purchases different goods, so I cannot solve it using tidyr::separate() or other analogous functions. What I expect is:
# id apple milk juice egg
# 1 1 2 5 NA NA
# 2 2 NA 1 NA NA
# 3 3 5 NA 3 NA
# 4 4 NA NA 1 10
# 5 5 NA 2 NA 8
Note: The categories of goods in the market are unknown. So don't assume there are only 4 kinds of goods.
Thanks for any help!
I just came up with a tidyverse solution which uses stringr::str_extract_all() to extract the quantities, sets their names as product names, and expands them to wide using tidyr::unnest_wider().
library(tidyverse)
df %>%
mutate(N = map2(str_extract_all(item, "\\d+"), str_extract_all(item, "\\D+"), set_names)) %>%
unnest_wider(N, transform = as.numeric)
# # A tibble: 5 × 6
# id item apple milk juice egg
# <int> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 apple2milk5 2 5 NA NA
# 2 2 milk1 NA 1 NA NA
# 3 3 juice3apple5 5 NA 3 NA
# 4 4 egg10juice1 NA NA 1 10
# 5 5 egg8milk2 NA 2 NA 8
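To see what the map2() call builds before the unnest, here is the named vector for the first row (a quick illustration):
library(stringr)
library(purrr)
set_names(str_extract_all("apple2milk5", "\\d+")[[1]],
          str_extract_all("apple2milk5", "\\D+")[[1]])
# apple  milk
#   "2"   "5"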
I'll add yet another answer. It only slightly differs from @ASuliman's but uses a bit of the newer tidyr and some cute regex to become a bit more straightforward.
The regex trick is that the pattern "(?<=\\d)\\B(?=[a-z])" matches the non-boundary (i.e. an empty location) between a digit and a letter, allowing you to create a row for every "apple5"-type entry. Extract the letters into an item column and the numbers into a count column. Using the new pivot_wider, which replaces spread, you can convert those counts to numeric values as you reshape.
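A quick sanity check of that pattern on one of the strings, using base strsplit with perl = TRUE:
strsplit("juice3apple5", "(?<=\\d)\\B(?=[a-z])", perl = TRUE)[[1]]
# [1] "juice3" "apple5"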
library(dplyr)
library(tidyr)
df %>%
separate_rows(item, sep = "(?<=\\d)\\B(?=[a-z])") %>%
extract(item, into = c("item", "count"), regex = "^([a-z]+)(\\d+)$") %>%
pivot_wider(names_from = item, values_from = count, values_fn = list(count = as.numeric))
#> # A tibble: 5 x 5
#> id apple milk juice egg
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 5 NA NA
#> 2 2 NA 1 NA NA
#> 3 3 5 NA 3 NA
#> 4 4 NA NA 1 10
#> 5 5 NA 2 NA 8
Possibly something like this; it should work with any item/quantity.
It just assumes that the quantity follows the item.
Let's use a custom function which extracts the items and quantities:
my_fun <- function(w) {
items <- stringr::str_split(w, "\\d+", simplify = T)
items <- items[items != ""] # drop the empty string left over by the split (each input ends with a digit)
quantities <- stringr::str_split(w, "\\D+", simplify = T)
quantities <- quantities[quantities!=""]
d <- data.frame(item = items, quantity=quantities, stringsAsFactors = F)
return(d)
}
Example:
my_fun("apple2milk5")
# gives:
# item quantity
# 1 apple 2
# 2 milk 5
Now we can apply the function to each id, using nest and map:
library(dplyr)
df_result <- df %>%
nest(item) %>%
mutate(res = purrr::map(data, ~my_fun(.x))) %>%
unnest(res)
df_result
# # A tibble: 9 x 3
# id item quantity
# <int> <chr> <chr>
# 1 1 apple 2
# 2 1 milk 5
# 3 2 milk 1
# 4 3 juice 3
# 5 3 apple 5
# 6 4 egg 10
# 7 4 juice 1
# 8 5 egg 8
# 9 5 milk 2
Now we can use dcast() (probably spread() would work too):
data.table::dcast(df_result, id~item, value.var="quantity")
# id apple egg juice milk
# 1 1 2 <NA> <NA> 5
# 2 2 <NA> <NA> <NA> 1
# 3 3 5 <NA> 3 <NA>
# 4 4 <NA> 10 1 <NA>
# 5 5 <NA> 8 <NA> 2
Data:
df <- data.frame(id = 1:5, item = c("apple2milk5", "milk1", "juice3apple5", "egg10juice1", "egg8milk2"), stringsAsFactors = F)
# split each string into alternating name/number tokens
tmp = lapply(strsplit(df$item, "(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)", perl = TRUE),
             function(x) {
               d = split(x, 0:1)  # d[[1]] = good names, d[[2]] = quantities
               setNames(as.numeric(d[[2]]), d[[1]])
             })
nm = unique(unlist(lapply(tmp, names)))  # every good name seen
# align each named vector on the full set of names and bind the rows
cbind(df, do.call(rbind, lapply(tmp, function(x) setNames(x[nm], nm))))
# id item apple milk juice egg
#1 1 apple2milk5 2 5 NA NA
#2 2 milk1 NA 1 NA NA
#3 3 juice3apple5 5 NA 3 NA
#4 4 egg10juice1 NA NA 1 10
#5 5 egg8milk2 NA 2 NA 8
Place a space before each numeric substring and a newline after it. Then read that data using read.table and unnest it. Finally use pivot_wider to convert from long to wide form.
library(dplyr)
library(tidyr)
df %>%
mutate(item = gsub("(\\d+)", " \\1\n", item)) %>%
rowwise %>%
mutate(item = list(read.table(text = item, as.is = TRUE))) %>%
ungroup %>%
unnest(item) %>%
pivot_wider(names_from = "V1", values_from = "V2")
giving:
# A tibble: 5 x 5
id apple milk juice egg
<int> <int> <int> <int> <int>
1 1 2 5 NA NA
2 2 NA 1 NA NA
3 3 5 NA 3 NA
4 4 NA NA 1 10
5 5 NA 2 NA 8
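For a single item, the intermediate text that read.table parses looks like this (a quick illustration):
cat(gsub("(\\d+)", " \\1\n", "apple2milk5"))
# apple 2
# milk 5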
Variation
This is a variation of the above code that eliminates the unnest. We replace each numeric string by a space, that string, another space, the id and a newline. Then use read.table to read that in. Note the use of %$% rather than %>% before the read.table. Finally use pivot_wider to convert from long to wide form.
library(dplyr)
library(magrittr)
library(tidyr)
df %>%
rowwise %>%
mutate(item = gsub("(\\d+)", paste(" \\1", id, "\n"), item)) %$%
read.table(text = item, as.is = TRUE, col.names = c("nm", "no", "id")) %>%
ungroup %>%
pivot_wider(names_from = "nm", values_from = "no")
You can try
library(tidyverse)
library(stringi)
df %>%
mutate(item2 =gsub("[0-9]", " ", df$item)) %>%
mutate(item3 =gsub("[a-z]", " ", df$item)) %>%
mutate_at(vars(item2, item3), ~stringi::stri_extract_all_words(.) %>% map(paste, collapse=",")) %>%
separate_rows(item2, item3, sep = ",") %>%
spread(item2, item3)
id item apple egg juice milk
1 1 apple2milk5 2 <NA> <NA> 5
2 2 milk1 <NA> <NA> <NA> 1
3 3 juice3apple5 5 <NA> 3 <NA>
4 4 egg10juice1 <NA> 10 1 <NA>
5 5 egg8milk2 <NA> 8 <NA> 2
# replace any digit that is followed by a non-digit (a positive look-ahead assertion) with the digit plus a comma
library(dplyr)
library(tidyr)
df %>% mutate(item=gsub('(\\d+(?=\\D))','\\1,' ,item, perl = TRUE)) %>%
separate_rows(item, sep = ",") %>%
extract(item, into = c('prod','quan'), '(\\D+)(\\d+)') %>%
spread(prod, quan, fill=0)
id apple egg juice milk
1 1 2 0 0 5
2 2 0 0 0 1
3 3 5 0 3 0
4 4 0 10 1 0
5 5 0 8 0 2
This is a simple solution in base R and stringr:
# find all distinct good names by splitting on the digits
goods <- unique(unlist(stringr::str_split(df$item, pattern = "[0-9]")))
goods <- goods[goods != ""]
# for each good, pull the "name + digits" substring out of every row
df <- cbind(df$id, sapply(goods,
            function(x) stringr::str_extract(df$item, pattern = paste0(x, "[0-9]*"))))
df <- as.data.frame(df)
# keep only the trailing digits and convert them to numeric
df[-1] <- lapply(df[-1], function(x) as.numeric(stringr::str_extract(x, pattern = "[0-9]*$")))
names(df)[1] <- "id"
Output
id apple milk juice egg
1 1 2 5 NA NA
2 2 NA 1 NA NA
3 3 5 NA 3 NA
4 4 NA NA 1 10
5 5 NA 2 NA 8
Mostly base R with some input from stringr and data.table:
library(stringr)
library(data.table)
cbind(
id = df$id,
rbindlist(
lapply(df$item, function(x) as.list(setNames(str_extract_all(x, "[0-9]+")[[1]], strsplit(x, "[0-9]+")[[1]]))),
fill = TRUE
)
)
id apple milk juice egg
1: 1 2 5 <NA> <NA>
2: 2 <NA> 1 <NA> <NA>
3: 3 5 <NA> 3 <NA>
4: 4 <NA> <NA> 1 10
5: 5 <NA> 2 <NA> 8
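The inner lapply() builds one named list per row; for the first row that is (a quick check):
library(stringr)
as.list(setNames(str_extract_all("apple2milk5", "[0-9]+")[[1]],
                 strsplit("apple2milk5", "[0-9]+")[[1]]))
# $apple
# [1] "2"
#
# $milk
# [1] "5"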
A cleaner data.table solution with input from stringr (note that df must first be converted with setDT):
setDT(df)
df[,
   .(it_count = str_extract_all(item, "[0-9]+")[[1]],
     it_name = str_extract_all(item, "[^0-9]+")[[1]]),
   by = id
][, dcast(.SD, id ~ it_name, value.var = "it_count")]
id apple egg juice milk
1: 1 2 <NA> <NA> 5
2: 2 <NA> <NA> <NA> 1
3: 3 5 <NA> 3 <NA>
4: 4 <NA> 10 1 <NA>
5: 5 <NA> 8 <NA> 2
This is slightly difficult to phrase; as far as I saw, none of the similar questions answered my problem.
I have a data.frame such as:
df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))
df1
id val
1 a NA
2 a NA
3 a NA
4 a NA
5 b 1
6 b 2
7 b 2
8 b 3
9 c NA
10 c 2
11 c NA
12 c 3
and I want to get rid of all the NA values (easy enough using e.g. filter()), but make sure that if this removes every row of an id (in this case it removes every instance of "a"), one extra row is inserted for that id with (e.g.) val = 0,
so that:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c 2
7 c 3
It's obviously easy enough to do this in a roundabout way, but I was wondering if there's a tidy/elegant way to do it. I thought tidyr::complete() might help, but I'm not entirely sure how to apply it to a case like this.
I don't care about the order of the rows
Cheers!
Edit: updated with a clearer desired output, which might make answers submitted before the edit a bit less clear.
Another idea using dplyr,
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>%
na.omit()
which gives,
# A tibble: 7 x 2
# Groups: id [3]
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c 2
7 c 3
We may do
df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))
# A tibble: 7 x 2
# Groups: id [3]
# id val
# <fct> <dbl>
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
# 6 c 2
# 7 c 3
After grouping by id, if everything in val is NA, then we leave only the first row with the second element replaced by 0; otherwise the same data is returned after applying na.omit.
In a more readable format that would be
df1 %>% group_by(id) %>%
do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))
(Here I presume that you indeed want to get rid of all NA values; otherwise there is no need for na.omit.)
df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]
id val
1 a 0
5 b 1
6 b 2
7 b 2
8 b 3
A base R option is to find the groups that are all NA and transform them by changing their val to 0, keeping only the unique rows so there is one row per such group. We then rbind this dataframe with the groups which are not all NA.
all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])
# id val
#1 a 0
#5 b 1
#6 b 2
#7 b 2
#8 b 3
The dplyr option looks ugly, but one way is to build two dataframes: one with the groups whose values are all NA and another with the groups whose values are all non-NA. For the all-NA groups we add a row with the id and val = 0, and bind it to the other group.
library(dplyr)
bind_rows(df1 %>%
group_by(id) %>%
filter(all(!is.na(val))),
df1 %>%
group_by(id) %>%
filter(all(is.na(val))) %>%
ungroup() %>%
summarise(id = unique(id),
val = 0)) %>%
arrange(id)
# id val
# <fct> <dbl>
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Changed the df to make the example more exhaustive:
df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%
mutate(val=ifelse(is.na(val)&case,0,val)) %>%
filter( !(case&row_num!=1) ) %>%
select(id, val)
Output
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:
df1 <- na.omit(df1)
df1 <- rbind(
df1,
data.frame(
id = levels(df1$id)[!levels(df1$id) %in% df1$id],
val = 0)
)
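The "factors remembering lost values" part is simply that subsetting does not drop factor levels; a minimal illustration:
x <- factor(c("a", "b", "b"))
x <- x[x != "a"] # every "a" element removed
levels(x)        # "a" is still a known level
# [1] "a" "b"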
I do personally prefer the dplyr approach given by Sotos, as I don't like rbind-ing data.frames back together, so it's a matter of taste; but this isn't unbearably complicated to my eye. It's easy enough to adapt to a character id column with a unique(df1$id) variable, as sketched below.
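For a character id column, a sketch of that adaptation (it assumes the ids are captured before the NA rows are dropped, and that at least one id loses all its rows):
ids <- unique(df1$id)                            # capture every id up front
df1 <- na.omit(df1)
df1 <- rbind(df1,
             data.frame(id = setdiff(ids, df1$id), # ids that lost every row
                        val = 0))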
Here is an option too:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
slice(4:nrow(.))
This gives:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
Alternative:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(funs(replace(.,is.na(.),0))) %>%
unique()
UPDATE based on other requirements:
Some users suggested testing on this dataframe. This answer assumes you'll inspect the data by hand, so it may be less useful at scale, but here goes:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate(val=ifelse(id=="a",0,val)) %>%
slice(4:nrow(.))
This yields:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
Here is a base R solution.
res <- lapply(split(df1, df1$id), function(DF){
if(anyNA(DF$val)) {
i <- is.na(DF$val)
DF$val[i] <- 0
DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
}
DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
# id val
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Edit.
A dplyr solution could be the following.
It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2 and df3, respectively.
library(dplyr)
na2zero <- function(DF){
DF %>%
group_by(id) %>%
mutate(val = ifelse(is.na(val), 0, val),
crit = val == 0 & duplicated(val)) %>%
filter(!crit) %>%
select(-crit)
}
na2zero(df1)
na2zero(df2)
na2zero(df3)
One may try this:
df1 = data.frame(id = rep(c("a", "b", "c"), each = 4),
                 val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))
df1
# id val
#1 a NA
#2 a NA
#3 a NA
#4 a NA
#5 b 1
#6 b 2
#7 b 2
#8 b 3
#9 c NA
#10 c 2
#11 c NA
#12 c 3
The task is to remove all rows for an id if and only if its val is all NA, and to add a new row with that id and val = 0.
In this example, that id is a.
Note: val for c also has NAs, but not all of c's values are NA, so we only remove the rows for c where val = NA.
So let's create another column, say val2, which is 0 if the id's values are all NA and 1 otherwise.
library(dplyr)
df1 = df1 %>%
group_by(id) %>%
mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false = 1))
df1
# A tibble: 12 x 3
# Groups: id [3]
# id val val2
# <fct> <dbl> <dbl>
#1 a NA 0
#2 a NA 0
#3 a NA 0
#4 a NA 0
#5 b 1 1
#6 b 2 1
#7 b 2 1
#8 b 3 1
#9 c NA 1
#10 c 2 1
#11 c NA 1
#12 c 3 1
Get the list of ids whose val is all NA.
all_na = unique(df1$id[df1$val2 == 0])
Then remove the rows with val = NA from the dataframe df1.
df1 = na.omit(df1)
df1
# A tibble: 6 x 3
# Groups: id [2]
# id val val2
# <fct> <dbl> <dbl>
# 1 b 1 1
# 2 b 2 1
# 3 b 2 1
# 4 b 3 1
# 5 c 2 1
# 6 c 3 1
And create a new dataframe with the ids in all_na and val = 0:
all_na_df = data.frame(id = all_na, val = 0)
all_na_df
# id val
# 1 a 0
Then combine these two dataframes.
df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1
# id val
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
# 6 c 2
# 7 c 3
Hope this helps; edits are most welcome :-)
I have a grouped df and I would like to add additional rows to the top of each group, matched on a variable (item_code) from the df.
The additional rows do not have an id column, and they should not be duplicated within the groups of df.
Example data:
df <- as.tibble(data.frame(id = rep(1:3, each = 2),
                           item_code = c("A", "A", "B", "B", "B", "Z"),
                           score = rep(1, 6)))
additional_rows <- as.tibble(data.frame(item_code = c("A", "Z"),
                                        score = c(6, 6)))
What I tried
I found this post and tried to apply it:
Add row in each group using dplyr and add_row()
df %>% group_by(id) %>% do(add_row(additional_rows %>%
filter(item_code %in% .$item_code)))
What I get:
# A tibble: 9 x 3
# Groups: id [3]
id item_code score
<int> <fct> <dbl>
1 1 A 6
2 1 Z 6
3 1 NA NA
4 2 A 6
5 2 Z 6
6 2 NA NA
7 3 A 6
8 3 Z 6
9 3 NA NA
What I am looking for:
# A tibble: 8 x 3
id item_code score
<int> <fct> <dbl>
1 1 A 6
2 1 A 1
3 1 A 1
4 2 B 1
5 2 B 1
6 3 B 1
7 3 Z 6
8 3 Z 1
This should do the trick:
library(plyr)
library(dplyr) # for %>% and arrange(); loading plyr first keeps dplyr's verbs unmasked
df %>%
join(subset(df, item_code %in% additional_rows$item_code, select = c(id, item_code)) %>%
join(additional_rows) %>%
subset(!duplicated(.)), type = "full") %>%
arrange(id, item_code, -score)
Not sure if it's the best way, but it works.
Edit: added the other arrange terms to keep the scores in the same order.
Edit 2: alright, there should now be no duplicated rows added from the additional rows, as per your comment.
Probably the solution to this problem is really easy but I just can't see it. Here is my sample data frame:
df <- data.frame(id=c(1,1,1,2,2,2), value=rep(1:3,2), level=rep(letters[1:3],2))
df[6,2] <- NA
And here is the desired output that I would like to create:
df$new_value <- c(3,2,1,NA,2,1)
So the order of all columns stays the same, but in the new_value column the order of value is reversed within each level of the id column. Any ideas? Thanks!
As I understand your question, it's a coincidence that your data is sorted; if you just want to reverse the order without sorting:
library(dplyr)
df %>% group_by(id) %>% mutate(new_value = rev(value)) %>% ungroup
# A tibble: 6 x 4
id value level new_value
<dbl> <int> <fctr> <int>
1 1 1 a 3
2 1 2 b 2
3 1 3 c 1
4 2 1 a NA
5 2 2 b 2
6 2 NA c 1
A slightly different approach, using the parameters in the sort function:
library(dplyr)
df %>% group_by(id) %>%
mutate(value = sort(value, decreasing=TRUE, na.last=FALSE))
Output:
# A tibble: 6 x 3
# Groups: id [2]
id value level
<dbl> <int> <fctr>
1 1.00 3 a
2 1.00 2 b
3 1.00 1 c
4 2.00 NA a
5 2.00 2 b
6 2.00 1 c
Hope this helps!
We can use order on the missing values and on the column itself
library(dplyr)
df %>%
group_by(id) %>%
mutate(new_value = value[order(!is.na(value), -value)])
# A tibble: 6 x 4
# Groups: id [2]
# id value level new_value
# <dbl> <int> <fctr> <int>
#1 1.00 1 a 3
#2 1.00 2 b 2
#3 1.00 3 c 1
#4 2.00 1 a NA
#5 2.00 2 b 2
#6 2.00 NA c 1
Or using arrange from dplyr:
df %>%
arrange(id, !is.na(value), desc(value)) %>%
transmute(new_value = value) %>%
bind_cols(df, .)
Or using base R, specifying the na.last option as FALSE in order:
with(df, ave(value, id, FUN = function(x) x[order(-x, na.last = FALSE)]))
#[1] 3 2 1 NA 2 1
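And to attach that base R result as a column:
df$new_value <- with(df, ave(value, id, FUN = function(x) x[order(-x, na.last = FALSE)]))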