I need to recode some columns in my data. There are 29 columns that share the same coding scheme.
The cells are coded with numbers, something like this:
1 - Normal
2 - Altered
3 - NA
I want to write a for loop that changes all of these columns at once. I need to turn the numeric codes (1; 2; 3) into labels (Normal; Altered; NA).
This is what I'm trying to do. I don't get any error message, but it isn't working:
for (i in names(df[,123:151])){
mutate(i = case_when(
i == 1 ~ 'Normal',
i == 2 ~ 'Altered',
i == 3 ~ 'NA'))
}
An easy way to do this would be to use dplyr from tidyverse.
library(tidyverse)
#make test dataframe
col1 <- c("1", "2", "3")
col2 <- c(3, 2, 2)
df <- data.frame(col1, col2)
df_recoded<-df %>%
mutate(across(.cols = everything(), ~case_when(
. == 1 ~ 'Normal',
. == 2 ~ 'Altered',
. == 3 ~ NA_character_)))
Try this:
# columns 123:151 are the 29 coded columns from the question
df %>% mutate(across(.cols = names(df)[123:151],
.fns = ~recode(., `1` = "Normal", `2` = "Altered", `3` = "NA", .default = NA_character_)))
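One likely reason the original loop appears to do nothing is that the loop variable holds a column name (a string), so `i == 1` compares the name rather than the column's values, and the result is never assigned back to `df`. Below is a minimal sketch of a loop that does work, next to the `across()` equivalent. The toy data frame, the `cols` vector and the `labels` lookup are assumptions made for illustration, and code 3 is treated as a true missing value, as in the first answer.

```r
library(dplyr)

# Hypothetical toy data standing in for columns 123:151 of the real df
df <- data.frame(id = 1:3, a = c(1, 2, 3), b = c(3, 2, 2))
cols <- c("a", "b")

# Lookup vector: position = numeric code, value = label (3 becomes a real NA)
labels <- c("Normal", "Altered", NA_character_)

# Base R loop: index the column by name and assign the result back
df_loop <- df
for (i in cols) {
  df_loop[[i]] <- labels[df_loop[[i]]]
}

# Equivalent dplyr version without an explicit loop
df_across <- df %>%
  mutate(across(all_of(cols), ~ labels[.x]))
```

With the real data, `cols` could be set to `names(df)[123:151]`.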
I want to apply multiple functions to the same dataframe. However, I have not been able to successfully pass column names as a parameter in purrr::imap. I keep getting the following error:
Error in UseMethod("select") : no applicable method for 'select'
applied to an object of class "character"
I have tried many combinations for evaluation (e.g., using !!!, [[, enquo, sys.lang, and so on). When I apply a function (e.g., check_1) directly to a dataframe, select works fine. However, it does not work when I try to pass column names as a parameter using imap and exec. The format of the column names is part of the issue (e.g., 1.1.), but I have tried double quotes, single quotes, etc.
This is a follow up to a previous post, but that post and solution focused on applying multiple functions to individual columns. Now, I need to apply multiple functions, which use more than one column in the dataframe; hence, the need to specify column names in a function.
Minimal Example
Data
df <- structure(
list(
`1.1.` = c("Andrew", "Max", "Sylvia", NA, "1",
NA, NA, "Jason"),
`1.2.` = c(1, 2, 2, NA, 4, 5, 3, NA),
`1.2.1.` = c(
"cool", "amazing", "wonderful", "okay",
NA, NA, "chocolate", "fine"
)
),
class = "data.frame",
row.names = c(NA, -8L)
)
What I have Tried
library(purrr)
library(dplyr)
check_1 <- function(x, col1, col2) {
x %>%
dplyr::select(col1, col2) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(col1 == "Jason" & is.na(col2) == TRUE) %>%
dplyr::select(row.index) %>%
unlist() %>%
as.vector()
}
check_2 <- function(x, col1, col2) {
index <- x %>%
dplyr::select(col1, col2) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(col1 >= 3 & col1 <= 5 & is.na(col2) == TRUE) %>%
dplyr::select(row.index) %>%
unlist() %>%
as.vector()
return(index)
}
checks <-
list("df" = list(fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
"df" = list(fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))
results <-
purrr::imap(checks, ~ exec(.x$fn, x = .y,!!!.x$pars))
Expected Output
> results
$df
[1] 8
$df
[1] 5 6
Besides the "class character" error, I also get an additional error when I try to test the check_2 function on its own, where it returns no expected values.
[1] 1.2. 1.2.1. row.index
<0 rows> (or 0-length row.names)
I have looked at many other similar SO posts (e.g., this one), but none have solved this issue for me.
The first issue is that you pass the name of the dataframe but not the dataframe itself. That's why you get the first error: you are trying to select from a character string. To solve this, add the dataframe to the list you are looping over.
The second issue is that when you pass the column names as character strings, you have to tell dplyr that these strings refer to columns in your data. This can be done by, e.g., using the .data pronoun in filter and all_of() in select.
Finally, instead of select + unlist + as.vector you could simply use dplyr::pull:
library(purrr)
library(dplyr)
check_1 <- function(x, col1, col2) {
x %>%
dplyr::select(all_of(c(col1, col2))) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(.data[[col1]] == "Jason" & is.na(.data[[col2]]) == TRUE) %>%
dplyr::pull(row.index)
}
check_2 <- function(x, col1, col2) {
x %>%
dplyr::select(all_of(c(col1, col2))) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(.data[[col1]] >= 3 & .data[[col1]] <= 5 & is.na(.data[[col2]]) == TRUE) %>%
dplyr::pull(row.index)
}
checks <-
list(df = list(df = df, fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
df = list(df = df, fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))
purrr::map(checks, ~ exec(.x$fn, x = .x$df, !!!.x$pars))
#> $df
#> [1] 8
#>
#> $df
#> [1] 5 6
Use select({{col1}}, {{col2}}).
This will most probably help you.
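The two suggestions in this thread suit different calling styles: `.data[[ ]]` fits column names passed as strings, while `{{ }}` is meant for column names passed unquoted. A small sketch of both follows; the toy data and the helper names `rows_by_string` and `rows_by_symbol` are hypothetical, not part of the original posts.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(name = c("Andrew", "Jason", "Jason"), score = c(1, NA, 3))

# Column names arrive as strings: look them up with .data[[ ]]
rows_by_string <- function(x, col1, col2) {
  x %>%
    mutate(row.index = row_number()) %>%
    filter(.data[[col1]] == "Jason", is.na(.data[[col2]])) %>%
    pull(row.index)
}

# Column names arrive unquoted: forward them with {{ }}
rows_by_symbol <- function(x, col1, col2) {
  x %>%
    mutate(row.index = row_number()) %>%
    filter({{ col1 }} == "Jason", is.na({{ col2 }})) %>%
    pull(row.index)
}

res_string <- rows_by_string(df, "name", "score")
res_symbol <- rows_by_symbol(df, name, score)
```

Because the question stores the column names as strings in a list of parameters, the `.data[[ ]]` form is probably the more natural fit there.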
I need to set the values of a column to 0 or 1 based on other columns values.
If they are 0 or NA the new column should be 1.
I thought about:
ifelse(df[,53:62]==0|NA, df$newCol <- 1, df$newCol <- 0)
But in the end I get only 1 in the new column.
Thanks for your help
I think the tidyverse fits this common use case perfectly:
library(tidyverse)
df_example <- matrix(c(0,1),ncol = 100,nrow = 100) %>%
as_tibble()
df_example %>%
mutate(across(.cols = 53:62,
.fns = ~ if_else(.x == 0|is.na(.x),
1,
0))
) %>%
select(V54) # example: inspect one of the recoded columns
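The answer above recodes each selected column in place. If a single new column is wanted instead, something like the sketch below might work. It assumes dplyr 1.0.4 or later (for `if_all()` and `if_any()`), uses hypothetical toy columns `a` and `b` in place of `df[, 53:62]`, and shows both readings of the requirement, since the question does not say whether all or any of the columns must be 0 or NA. Note that `x == 0 | NA` does not test for missing values; `is.na()` is needed for that.

```r
library(dplyr)

# Hypothetical toy data; in the question the columns are df[, 53:62]
df <- data.frame(a = c(0, 1, NA, 0), b = c(NA, 3, 0, 2))

df_flagged <- df %>%
  mutate(
    # 1 only when every selected column is 0 or NA
    all_zero_or_na = as.integer(if_all(c(a, b), ~ .x == 0 | is.na(.x))),
    # 1 when at least one selected column is 0 or NA
    any_zero_or_na = as.integer(if_any(c(a, b), ~ .x == 0 | is.na(.x)))
  )
```

For the real data, the selection `c(a, b)` could be replaced by the positions `53:62`.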
How can I use mutate to achieve the below?
bd_diag_date <- df %>%
apply(1, function(dates) last(na.omit(dates))) %>%
as.data.frame() %>%
`colnames<-`("diag_date")
I tried the code below, but it didn't work. I can't figure out why; it says Error: Column 'diagnosis_date' is of unsupported type symbol. Should I assume mutate takes any function operation that can apply to a vector? If not, then what kind of operation does it accept?
bd_diag_date <- df %>%
rowwise() %>%
{mutate(., diag_date=last(na.omit(all_vars(.))))}
I also have a more general question: how can I debug this? Every time I run into this kind of problem I have to search Stack Exchange, but I feel this isn't the right way to improve my dplyr skills.
We can use pmap
library(dplyr)
library(purrr)
df %>%
mutate(diag_date = pmap(., ~ last(na.omit(c(...)))))
If the columns are numeric, we can use pmap_dbl; plain pmap returns a list column
df %>%
mutate(diag_date = pmap_dbl(., ~ last(na.omit(c(...)))))
# col1 col2 col3 diag_date
#1 1 NA 2 2
#2 NA 2 NA 2
#3 3 4 NA 4
If we need to return only a single column, use transmute
df %>%
transmute(diag_date = pmap_dbl(., ~ last(na.omit(c(...)))))
Or with group_split and map
df %>%
group_split(grp = row_number(), .keep = FALSE) %>% # `keep` is deprecated in favour of `.keep`
map_dfr(~ .x %>%
transmute(diag_date = last(na.omit(unlist(.)))))
Or using base R with max.col
df$diag_date <- df[cbind(seq_len(nrow(df)), max.col(!is.na(df), 'last'))]
data
df <- data.frame(col1 = c(1, NA, 3), col2 = c(NA, 2, 4), col3 = c(2, NA, NA))
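On the question of what mutate accepts: each expression has to evaluate to a vector of length 1 or of the group's length, and `all_vars()` is a helper for the scoped `filter_*` verbs rather than something that returns the row's values. A sketch closer to the original `rowwise()` attempt could use `c_across()`; this assumes dplyr 1.0.0 or later and reuses the sample data above. Missing values are dropped with logical indexing here instead of `na.omit()` so that no extra attributes are carried into the result.

```r
library(dplyr)

df <- data.frame(col1 = c(1, NA, 3), col2 = c(NA, 2, 4), col3 = c(2, NA, NA))

df_rowwise <- df %>%
  rowwise() %>%
  mutate(diag_date = {
    vals <- c_across(everything())   # this row's values as one vector
    last(vals[!is.na(vals)])         # last non-missing value in the row
  }) %>%
  ungroup()
```

Running the inner expression on a single row (for example on `unlist(df[1, ])`) is one simple way to debug this kind of step before wrapping it in `mutate()`.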
I have a seemingly small problem. I want to use mutate_all() in conjunction with case_when(). A sample data frame:
tbl <- tibble(
x = c(0, 1, 2, 3, NA),
y = c(0, 1, NA, 2, 3),
z = c(0, NA, 1, 2, 3),
date = rep(today(), 5)
)
I first made another data frame, replacing all the NAs with zeros and all other values with 1, using the following piece of code.
tbl %>%
mutate_all(
funs(
case_when(
. %>% is.na() ~ 0,
TRUE ~ 1
)))
Now I want to replace the NA values with blanks ("") and leave the other values as they are. However, I don't know how to set the TRUE value in a way that keeps the value of the column.
Any suggestions would be much appreciated!
To leave the NA as "", we can use replace_na from tidyr
library(dplyr)
library(tidyr)
tbl %>%
mutate_all(replace_na, "")
# A tibble: 5 x 3
# x y z
# <chr> <chr> <chr>
#1 0 0 0
#2 1 1 ""
#3 2 "" 1
#4 3 2 2
#5 "" 3 3
With case_when or if_else, we have to make sure the types are the same across all branches. Here, inserting "" makes the result character, so the other values must also be coerced to character
tbl %>%
mutate_all(~ case_when(is.na(.) ~ "", TRUE ~ as.character(.)))
If we want to use only specific columns, then we can use mutate_at
tbl %>%
mutate_at(vars(x:y), ~ case_when(is.na(.) ~ "", TRUE ~ as.character(.)))
Also, to simplify the code in OP's post, it can be directly coerced to integer with as.integer or +
tbl %>%
mutate_all(~ as.integer(!is.na(.)))
Or if we are using case_when
tbl %>%
mutate_all(~ case_when(is.na(.)~ 0, TRUE ~ 1))
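In current dplyr, `mutate_all()` and `mutate_at()` are superseded by `across()`. The sketch below shows how the same two operations might look in that style. It assumes dplyr 1.0.0 or later, swaps `today()` for base R's `Sys.Date()` so that lubridate is not needed, and restricts the recoding to the numeric columns so that the date column is left alone.

```r
library(dplyr)

tbl <- tibble(
  x = c(0, 1, 2, 3, NA),
  y = c(0, 1, NA, 2, 3),
  z = c(0, NA, 1, 2, 3),
  date = rep(Sys.Date(), 5)
)

# Blank out NAs in the numeric columns; values are coerced to character
tbl_blank <- tbl %>%
  mutate(across(where(is.numeric), ~ if_else(is.na(.x), "", as.character(.x))))

# 0/1 indicator: 0 where the value was NA, 1 otherwise
tbl_flag <- tbl %>%
  mutate(across(where(is.numeric), ~ as.integer(!is.na(.x))))
```

Coercing with `as.character()` before inserting "" also avoids the type mismatch that recent tidyr releases may report when `replace_na()` is asked to put a string into a numeric column.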
I am quite new to R. Using dplyr and filter, I want to select records for which a list of variables !=NA.
df %>% filter (var1 != "NA" | var2 != "NA" | var3 != "NA" )
The problem is that I have 85 such variables (ending with HR). So I have extracted them and put them in a vector.
hr_variables <- grep("HR$", names(ssc), value=TRUE)
I would like to make a loop that will fetch hr_variable and then filter() by applying the OR condition to each element.
Is this possible in R?
We can use base R to do this more easily
ssc[!rowSums(is.na(ssc[hr_variables])),]
# col1_HR col2_HR col3
#2 1 3 0.5365853
#3 2 4 0.4196231
Or using tidyverse
library(tidyverse)
ssc %>%
select(all_of(hr_variables)) %>% # select_() is deprecated
map(~is.na(.)) %>%
reduce(`|`) %>%
`!` %>%
magrittr::extract(ssc, ., ) # qualified, because tidyr::extract() is a different function
Or with complete.cases
ssc %>%
select(all_of(hr_variables)) %>% # select_() is deprecated
complete.cases(.) %>%
magrittr::extract(ssc, ., ) # qualified, because tidyr::extract() is a different function
data
set.seed(24)
ssc <- data.frame(col1_HR = c(NA, 1, 2, 3), col2_HR = c(NA, 3, 4, NA), col3 = rnorm(4))
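The answers above keep rows where every HR variable is present, whereas the question literally asks for an OR condition. With dplyr 1.0.4 or later, both readings can be written directly inside `filter()` using `if_any()` and `if_all()`. This is only a sketch: the data mirrors the sample above, with fixed values in `col3` instead of `rnorm()` so the result is reproducible.

```r
library(dplyr)

ssc <- data.frame(
  col1_HR = c(NA, 1, 2, 3),
  col2_HR = c(NA, 3, 4, NA),
  col3    = c(0.1, 0.2, 0.3, 0.4)  # fixed values instead of rnorm(4)
)

# OR condition: keep rows where at least one *HR column is not NA
any_present <- ssc %>%
  filter(if_any(ends_with("HR"), ~ !is.na(.x)))

# AND condition: keep rows where every *HR column is not NA
all_present <- ssc %>%
  filter(if_all(ends_with("HR"), ~ !is.na(.x)))
```

If the column names are already collected in `hr_variables`, `all_of(hr_variables)` can be used in place of `ends_with("HR")`.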