dplyr mutate to replace specific values in a data frame - r

I have a data frame that consists of characters "a", "b", "x", "y".
df <- data.frame(v1 = c("a", "b", "x", "y"),
v2 = c("a", "b", "a", "y"))
Now I want to replace all values with the following scheme and also convert the whole data frame to numeric.
"a" -> 0
"b" -> 1
"x" -> 1
"y" -> 2
I know this must be somehow possible with mutate_all but I cannot figure out how
df %>% mutate_all(replace("a", 1)) %>%
mutate_all(is.character, as.numeric)

One solution is case_when inside across (funs() has been deprecated since dplyr 0.8.0):
library(dplyr)
df %>%
  mutate(across(everything(), ~ case_when(.x == "a" ~ 0,
                                          .x %in% c("b", "x") ~ 1,
                                          .x == "y" ~ 2,
                                          TRUE ~ NA_real_)))
# v1 v2
# 1 0 0
# 2 1 1
# 3 1 0
# 4 2 2

Create a named vector with the mappings and then subset it by value inside mutate_all:
vec <- c(a = 0, b = 1, x = 1, y = 2)
library(dplyr)
df %>% mutate_all(~vec[.])
# v1 v2
#1 0 0
#2 1 1
#3 1 0
#4 2 2
In base R that would be just
df[] <- vec[unlist(df)]
data
df <- data.frame(v1 = c("a", "b", "x", "y"),
v2 = c("a", "b", "a", "y"), stringsAsFactors = FALSE)
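The named-vector lookup above can also be written with across(), which superseded mutate_all() in dplyr 1.0.0. This is a minimal sketch, assuming dplyr >= 1.0.0 is installed; the name df_num and the unname() call (which drops the lookup names from the result) are my additions, not part of the original answer.

```r
library(dplyr)

# Sample data from the question; characters are kept as-is, not factors
df <- data.frame(v1 = c("a", "b", "x", "y"),
                 v2 = c("a", "b", "a", "y"),
                 stringsAsFactors = FALSE)

# Named vector: names are the old values, elements are the new values
vec <- c(a = 0, b = 1, x = 1, y = 2)

# Index the lookup vector by each column's values; unname() drops the
# "a", "b", ... names that subsetting would otherwise carry along
df_num <- df %>%
  mutate(across(everything(), ~ unname(vec[.x])))

df_num
```

Values that are missing from vec come back as NA, which matches the TRUE ~ NA_real_ fallback of the case_when version.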


Apply a function between two lists of data frames

I have the following data example and code:
lt1 <- list(df1 <- data.frame(V1 = c("a", "b"),
V2 = c("b", "c"),
V3 = c(1, 2)),
df2 <- data.frame(V1 = c("x", "y"),
V2 = c("x", "z"),
V3 = c(1, 2)))
lvls_func <- function(x) {
x[1:2] %>%
unlist() %>%
unique() %>%
sort()
}
lt_lvls <- lapply(lt1, lvls_func)
complete_func <- function(x) {
tidyr::complete(x[1] = factor(x[1], levels = lt_lvls),
x[2] = factor(x[2], levels = lt_lvls),
x[3] = x[3],
fill = list(x[3] = 0))
}
lt1_final <- lapply(lt1, complete_func)
I am having difficulty building complete_func().
I get this error when I run it:
Error: unexpected '=' in:
"complete_func <- function(x) {
tidyr::complete(x[1] ="
In my final list lt1_final I expect this output:
lt1_final <- list(df1 <- data.frame(V1 = c("a", "b", "a", "a", "b", "b", "c", "c", "c"),
V2 = c("b", "c", "a", "c", "b", "a", "a", "b", "c"),
V3 = c(1, 2, 0, 0, 0, 0, 0, 0, 0)),
df2 <- data.frame(V1 = c("x", "y", "x", "x", "y", "y", "z", "z", "z"),
V2 = c("x", "z", "y", "z", "y", "x", "z", "x", "y"),
V3 = c(1, 2, 0, 0, 0, 0, 0, 0, 0)))
Thanks for any help.
Because lt_lvls is a list of level vectors (one per data frame), the function has to be applied pairwise, with either Map (base R) or purrr::map2.
The function itself also needs several changes:
Add an argument lvls to the function.
Convert columns 1 and 2 to factor by looping with across inside mutate, passing lvls as the levels.
Call complete on those two columns using splicing (!!!) (invoke/exec would also work), and pass fill as a named list built with dplyr::lst (or a regular list with setNames).
library(dplyr)
library(tidyr)
library(purrr)
complete_func <- function(x, lvls) {
  x %>%
    # lambda form; passing extra arguments through across() is deprecated in dplyr 1.1.0
    dplyr::mutate(across(1:2, ~ factor(.x, levels = lvls))) %>%
    tidyr::complete(!!! .[1:2], fill = dplyr::lst(!! names(.)[3] := 0)) %>%
    arrange(across(3, ~ .x == 0))
}
Testing:
map2(lt1, lt_lvls, ~ complete_func(.x, .y))
[[1]]
# A tibble: 9 × 3
V1 V2 V3
<fct> <fct> <dbl>
1 a b 1
2 b c 2
3 a a 0
4 a c 0
5 b a 0
6 b b 0
7 c a 0
8 c b 0
9 c c 0
[[2]]
# A tibble: 9 × 3
V1 V2 V3
<fct> <fct> <dbl>
1 x x 1
2 y z 2
3 x y 0
4 x z 0
5 y x 0
6 y y 0
7 z x 0
8 z y 0
9 z z 0
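For comparison, the pairwise application with Map mentioned above can be sketched in base R alone. This is not the answer's method: it swaps tidyr::complete for expand.grid plus merge, the helper name complete_base is hypothetical, and the column names V1, V2, V3 are hard-coded from the example data.

```r
# Example data from the question (strings kept as characters)
lt1 <- list(
  data.frame(V1 = c("a", "b"), V2 = c("b", "c"), V3 = c(1, 2),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("x", "y"), V2 = c("x", "z"), V3 = c(1, 2),
             stringsAsFactors = FALSE)
)

# One sorted vector of levels per data frame, taken from columns 1 and 2
lt_lvls <- lapply(lt1, function(x) sort(unique(unlist(x[1:2]))))

# Hypothetical helper: build every V1 x V2 combination, keep the observed
# V3 where it exists, and fill the remaining combinations with 0
complete_base <- function(x, lvls) {
  grid <- expand.grid(V1 = lvls, V2 = lvls, stringsAsFactors = FALSE)
  out <- merge(grid, x, by = c("V1", "V2"), all.x = TRUE)
  out$V3[is.na(out$V3)] <- 0
  out
}

# Map pairs the i-th data frame with the i-th vector of levels
lt1_base <- Map(complete_base, lt1, lt_lvls)
lt1_base
```

Unlike the tidyverse version, the rows come back sorted by V1 and V2 rather than with the observed rows first, and the columns stay character instead of factor.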

How to replace all values in a column with another value?

Suppose I have a data frame df with two columns:
id category
A 1
B 4
C 3
D 1
I want to replace the numbers in category with the following: 1 = "A", 2 = "B", 3 = "C", 4 = "D".
I.e. the output should be
id category
A A
B D
C C
D A
Does anyone know how to do this?
Here I propose three methods to achieve your goal.
Base R
If you have a vector of values for conversion, you can use match to find the index of the vector to replace the category column.
vec <- c("1" = "A", "2" = "B", "3" = "C", "4" = "D")
df$category <- vec[match(df$category, names(vec))]
dplyr
Use a case_when statement to match the values in category, and assign new strings to it.
library(dplyr)
df %>% mutate(category = case_when(category == 1 ~ "A",
category == 2 ~ "B",
category == 3 ~ "C",
category == 4 ~ "D",
TRUE ~ NA_character_))
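left_join from dplyr
Or, if you have a data frame with two columns specifying the conversion, you can left_join it. Here the conversion table is built from vec with tibble::enframe; category is converted to character first so that it matches the type of the name column.
library(tibble)
df %>% mutate(category = as.character(category)) %>%
  left_join(enframe(vec), by = c("category" = "name")) %>%
  transmute(id, category = value)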
Output
id category
1 A A
2 B D
3 C C
4 D A
Data
df <- structure(list(id = c("A", "B", "C", "D"), category = c(1,
4, 3, 1)), row.names = c(NA, -4L), class = "data.frame")
A possible solution:
library(tidyverse)
df %>%
mutate(category = LETTERS[category])
#> id category
#> 1 A A
#> 2 B D
#> 3 C C
#> 4 D A
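If the mapping lives in a two-column lookup table rather than a named vector, match can do the join in base R. This is a minimal sketch: the table name lookup and its columns code and label are hypothetical, and the data are the ones shown in the question.

```r
# Data as shown in the question
df <- data.frame(id = c("A", "B", "C", "D"),
                 category = c(1, 4, 3, 1),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per code
lookup <- data.frame(code = 1:4,
                     label = c("A", "B", "C", "D"),
                     stringsAsFactors = FALSE)

# match() returns, for each category, the row of lookup with that code;
# codes that are absent from lookup would give NA
df$category <- lookup$label[match(df$category, lookup$code)]
df
```

This keeps the original row order, which merge does not guarantee.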

How to convert tidy hierarchical data frame to hierarchical list grid in R?

This is a more complex version of a previous question ("R convert tidy hierarchical data frame to hierarchical list"), where I had abstracted the actual problem too much to apply the answers.
I've converted a hierarchical data frame with two grouping levels into a hierarchical list-grid using a for loop.
Is there a more efficient base R, tidyverse or other approach to achieve this?
In the real dataset:
The grouping variables and description are multi word strings.
The description preface - d# - is in the MWE for ease of checking.
There are 14 associated variables variously of type: character, integer and double
Rules
Group 1 and Group 2 headings to be in description column
Group 1 headings to appear once only
Group 2 headings are children of group 1 headings, and only change when there is a new group 2 heading
Descriptions are children of group 2 headings
From this
g1 g2 desc var1 var2 var3
A a d1 KS3 0.0500 2 PLs
A a d2 CTI 0.0500 9 7O0
A b d3 b8x 0.580 5 he2
A b d4 XOf 0.180 12 XJE
A b d5 ygn 0.900 11 v48
A c d6 dGY 0.770 6 UcH
A d d7 jpG 0.600 4 P5M
B d d8 Z95 0.600 10 j6O
To this
desc var1 var2 var3
A
a
d1 KS3 0.0500 2 PLs
d2 CTI 0.0500 9 7O0
b
d3 b8x 0.580 5 he2
d4 XOf 0.180 12 XJE
d5 ygn 0.900 11 v48
c
d6 dGY 0.770 6 UcH
d
d7 jpG 0.600 4 P5M
B
d
d8 Z95 0.600 10 j6O
Code
library(tidyverse)
library(stringi)
set.seed(2018)
tib <- tibble(g1 = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
g2 = c("a", "a", "b", "b", "b", "c", "d", "d", "b", "b", "e", "e"),
desc = paste0("d", 1:12, " ", stri_rand_strings(12, 3)),
var1 = round(runif(12), 2),
var2 = sample.int(12),
var3 = stri_rand_strings(12, 3))
tib
# Number of rows in final table
n_rows <- length(unique(tib$g1)) + length(unique(paste0(tib$g1, tib$g2))) + nrow(tib)
# create empty output tibble
output <-
as_tibble(matrix(nrow = n_rows, ncol = ncol(tib)-1)) %>%
rename(id = V1, desc = V2, var1 = V3, var2 = V4, var3 = V5) %>%
mutate(id = NA_character_,
desc = NA_character_,
var1 = NA_real_,
var2 = NA_integer_,
var3 = NA_character_)
# Loop counters
level_1 <- 0
level_2 <- 0
output_row <- 1
for(i in seq_len(nrow(tib))){
# level 1 headings
if(tib$g1[[i]] != level_1) {
output$id[[output_row]] <- "g1"
output$desc[[output_row]] <- tib$g1[[i]]
output_row <- output_row + 1
}
# level 2 headings
if(paste0(tib$g1[[i]], tib$g2[[i]]) != paste0(level_1, level_2)) {
output$id[[output_row]] <- "g2"
output$desc[[output_row]] <- tib$g2[[i]]
output_row <- output_row + 1
}
level_1 <- tib$g1[[i]]
level_2 <- tib$g2[[i]]
# Description and data grid
output$desc[[output_row]] <- tib$desc[[i]]
output$var1[[output_row]] <- tib$var1[[i]]
output$var2[[output_row]] <- tib$var2[[i]]
output$var3[[output_row]] <- tib$var3[[i]]
output_row <- output_row + 1
}
output
Adapting tyluRp's answer to "R convert tidy hierarchical data frame to hierarchical list", I've hit on a solution.
library(tidyverse)
library(stringi)
set.seed(2018)
tib <- tibble(g1 = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
g2 = c("a", "a", "b", "b", "b", "c", "d", "d", "b", "b", "e", "e"),
desc = paste0("d", 1:12, " ", stri_rand_strings(12, 3)),
var1 = round(runif(12), 2),
var2 = sample.int(12),
var3 = stri_rand_strings(12, 3))
# add unique identifier for description and variable rows
tib <-
tib %>%
rowid_to_column() %>%
mutate(rowid = paste0("z_", rowid))
# separate tibble for variables associated with descriptions
tib_var <-
tib %>%
select(rowid, var1, var2, var3)
# code adapted from tyluRp to reorder the data and add description variables
tib <-
tib %>%
select(g1, g2, desc, rowid) %>%
mutate(g2 = paste(g1, g2, sep = "_")) %>%
transpose() %>%
unlist() %>%
stack() %>%
distinct(values, ind) %>%
mutate(detect_var = str_detect(values, "^z_"),
ind = lead(case_when(detect_var == TRUE ~ values)),
values = case_when(detect_var == TRUE ~ NA_character_,
TRUE ~ values))%>%
drop_na(values) %>%
select(values, ind) %>%
mutate(values = str_remove(values, "\\D_")) %>%
left_join(tib_var, by = c("ind" = "rowid")) %>%
select(-ind) %>%
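mutate(across(c(var1, var2), as.character)) %>% # tidyr >= 1.2.0: replace_na() cannot change a column's type
replace_na(list(var1 = "", var2 = "", var3 = ""))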
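Another way to get the same layout without a loop is to build the heading rows separately, tag every row with its source position and nesting level, and sort. The sketch below is base R only and uses a cut-down, made-up version of the data with a single variable column. It assumes the rows are already ordered so that each group is contiguous; the helper columns row and lvl are hypothetical names.

```r
# Cut-down example data (hypothetical values), already ordered by group
tib <- data.frame(g1   = c("A", "A", "A", "B"),
                  g2   = c("a", "a", "b", "b"),
                  desc = c("d1", "d2", "d3", "d4"),
                  var1 = c(0.1, 0.2, 0.3, 0.4),
                  stringsAsFactors = FALSE)

row <- seq_len(nrow(tib))
first_g1 <- !duplicated(tib$g1)               # first row of each g1 group
first_g2 <- !duplicated(tib[c("g1", "g2")])   # first row of each g1/g2 pair

# Heading rows carry the position of the first row they introduce
h1 <- data.frame(row = row[first_g1], lvl = 1, desc = tib$g1[first_g1],
                 var1 = NA_real_, stringsAsFactors = FALSE)
h2 <- data.frame(row = row[first_g2], lvl = 2, desc = tib$g2[first_g2],
                 var1 = NA_real_, stringsAsFactors = FALSE)
body <- data.frame(row = row, lvl = 3, desc = tib$desc,
                   var1 = tib$var1, stringsAsFactors = FALSE)

# Sorting by position, then level, puts each heading above its children
out <- rbind(h1, h2, body)
out <- out[order(out$row, out$lvl), c("desc", "var1")]
rownames(out) <- NULL
out
```

Heading rows keep NA in the variable columns, so the numeric columns stay numeric; they only need converting to character if blanks are wanted for display.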

Get elements by position from one data frame to another

Let's say we have two data frames:
df1 <- data.frame(A = letters[1:3], B = letters[4:6], C = letters[7:9], stringsAsFactors = FALSE)
A B C
1 a d g
2 b e h
3 c f i
df2 <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
V1 V2 V3
1 1 4 7
2 2 5 8
3 3 6 9
I need to build a function that takes as input a single value or a vector containing elements from one of the data frames and returns the elements from the other data frame according to their positional indexes.
The function should work like this:
> matchdf(values = c("a", "e", "i"), dfin = df1, dfout = df2)
[1] 1 5 9
> matchdf(values = c(1, 5, 9), dfin = df2, dfout = df1)
[1] "a" "e" "i"
> matchdf(values = c(1, 1, 1), dfin = df2, dfout = df1)
[1] "a" "a" "a"
This is what I have tried so far:
library(dplyr)
toVec <- function(df) df %>% as.matrix %>% as.vector
matchdf <- function(values, dfin, dfout) toVec(dfout)[toVec(dfin) %in% values]
# But sometimes the output values aren't in the correct order:
> matchdf(c("c", "i", "h"), df1, df2)
[1] 3 8 9
# should output 3 9 8
> matchdf(values = c("a", "a", "a"), dfin = df1, dfout = df2)
[1] 1
# Should output 1 1 1
Feel free to use data.table and/or dplyr if it eases the task. I would prefer a solution without for loops.
Assumptions:
elements from df1 are different from df2
dim(df1) = dim(df2)
matchdf <- function(values, dfin, dfout) {
  unlist(sapply(values,
                function(val) dfout[dfin == val],
                USE.NAMES = FALSE))
}
matchdf(c("c", "i", "h"), df1, df2)
#should output 3 9 8
[1] 3 9 8
matchdf(values = c("a", "a", "a"), dfin = df1, dfout = df2)
#should output 1 1 1
[1] 1 1 1
matchdf(values = c("X", "Y", "a"), dfin = df1, dfout = df2)
#should output vector, not list
[1] 1
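If values that are not found should show up as NA instead of being dropped, a fully vectorised variant can use match on the flattened matrices. This is a sketch under the question's assumptions (same dimensions, no value repeated within dfin); the name matchdf2 is hypothetical.

```r
df1 <- data.frame(A = letters[1:3], B = letters[4:6], C = letters[7:9],
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)

# match() gives the column-major position of each value in dfin (NA if
# absent); the same positions are then read from dfout
matchdf2 <- function(values, dfin, dfout) {
  as.matrix(dfout)[match(values, as.matrix(dfin))]
}

matchdf2(c("c", "i", "h"), df1, df2)   # order of `values` is preserved
matchdf2(c("a", "a", "a"), df1, df2)   # repeated values are repeated
matchdf2(c(1, 5, 9), df2, df1)         # works in the other direction
matchdf2(c("X", "Y", "a"), df1, df2)   # unmatched values become NA
```

The result always has the same length as values, which makes it easier to line up with the input than the sapply version.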

How to assign a value to a data.frame filtered by dplyr?

I am trying to modify a data.frame filtered by dplyr but I don't quite seem to grasp what I need to do. In the following example, I am trying to filter the data frame z and then assign a new value to the third column -- I give two examples, one with "9" and one with "NA".
require(dplyr)
z <- data.frame(w = c("a", "a", "a", "b", "c"), x = 1:5, y = c("a", "b", "c", "d", "e"))
z %>% filter(w == "a" & x == 2) %>% select(y)
z %>% filter(w == "a" & x == 2) %>% select(y) <- 9 # Should be similar to z[z$w == "a" & z$ x == 2, 3] <- 9
z %>% filter(w == "a" & x == 3) %>% select(y) <- NA # Should be similar to z[z$w == "a" & z$ x == 3, 3] <- NA
Yet it doesn't work; I get the following error message:
Error in z %>% filter(w == "a" & x == 3) %>% select(y) <- NA : could not find function "%>%<-"
I know that I can use the old data.frame notation, but what would be the solution for dplyr?
Thanks!
Filtering will subset the data frame. If you want to keep the whole data frame, but modify part of it, you can, for example use mutate with ifelse. I've added stringsAsFactors=FALSE to your sample data so that y will be a character column.
z <- data.frame(w = c("a", "a", "a", "b", "c"), x = 1:5, y = c("a", "b", "c", "d", "e"),
stringsAsFactors=FALSE)
z %>% mutate(y = ifelse(w=="a" & x==2, 9, y))
w x y
1 a 1 a
2 a 2 9
3 a 3 c
4 b 4 d
5 c 5 e
Or with replace:
z %>% mutate(y = replace(y, w=="a" & x==2, 9),
y = replace(y, w=="a" & x==3, NA))
w x y
1 a 1 a
2 a 2 9
3 a 3 <NA>
4 b 4 d
5 c 5 e
It is my impression that the dplyr package is philosophically opposed to modifying your underlying data. You might find the data.table package friendlier for this operation:
library(data.table)
z <- data.table(w = c("a", "a", "a", "b", "c"), x = 1:5, y = c("a", "b", "c", "d", "e"))
m <- data.table(w = c("a","a"), x = c(2,3), new_y = c("9", NA))
z[m, y := new_y, on=c("w","x")]
w x y
1: a 1 a
2: a 2 9
3: a 3 NA
4: b 4 d
5: c 5 e
I'm sure there's a way in base R as well, but I don't know it. In particular, I can't get merge or match to do the job.
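For what it's worth, a base R counterpart of that update join can be sketched with match on a pasted key. It assumes every row of m has exactly one matching row in z and that pasting w and x with a space yields unambiguous keys; the names m and new_y follow the data.table example above.

```r
z <- data.frame(w = c("a", "a", "a", "b", "c"), x = 1:5,
                y = c("a", "b", "c", "d", "e"),
                stringsAsFactors = FALSE)

# Table of updates: which (w, x) pairs get which new y
m <- data.frame(w = c("a", "a"), x = c(2, 3), new_y = c("9", NA),
                stringsAsFactors = FALSE)

# Row of z that each update refers to, found through a combined key
idx <- match(paste(m$w, m$x), paste(z$w, z$x))

# Assign in place; rows of z not listed in m are left untouched
z$y[idx] <- m$new_y
z
```

If some rows of m might have no match in z, drop the NA entries of idx (and the corresponding new_y values) before assigning.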
