I have
df = data.frame(Col1 = c( NA, 1," ", 2345.768,"hi","", NA, 3.4, "44.99"))
and want to format specific values, and created a udf
format_it = function(y, n_decimals, dash_type, suffix = ""){
if(is.na(y)) return(dash_type)
if(nchar(gsub(" ", "", y))==0) return(y)
has_letter = grep("[A-z]+", y)
if(is_empty(has_letter)== TRUE) {
return(paste0(format(round(as.numeric(y), n_decimals), nsmall=n_decimals, big.mark = ","),suffix))
}
if(has_letter == 1){
return(y)
} else{
x = as.numeric(y)
ifelse(is.na(x),
dash_type,
paste0(format(round(as.numeric(x), n_decimals), nsmall=n_decimals, big.mark = ","),suffix))}
}
I tested each value individually, ie format_it(df$Col1[1],1,"-"), and each one worked ok
but, when I set up a set_formatter in flextable,
df %>%
flextable() %>%
set_formatter(Col1 = function(x) format_it(x,1,"-"))
I hoped the results would be correct, but received the wrong results,
with the message: the condition has length > 1 and only the first element will be used
I tried updating to include Vectorize, but received the same error
Any suggestions?
I would like to see
I'm a little confused by your function, but below is a fresh, reproducible approach (based on your function) that rebuilds the table and produces your desired output. It first replaces any NA values in the original data with "-", then uses grepl to find values that consist only of letters (e.g. "hi") and keeps those unchanged, then formats the remaining values to one decimal place with sprintf. This approach stays within the dplyr "world", using mutate() and case_when(), and does not need a user-defined function.
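library(dplyr)

df %>%
  mutate(Col1 = case_when(
    is.na(Col1) ~ "-",                                     # NA becomes a dash
    !grepl("[^A-Za-z]", Col1) ~ Col1,                      # only letters (or empty): keep unchanged
    grepl(".", Col1) ~ sprintf("%.1f", as.numeric(Col1))   # any other non-empty value: one decimal place
  )) %>%
  flextable::flextable()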
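The warning in the question comes from if() receiving the whole column at once: the formatter is handed a vector, while if() only looks at a single value. A minimal base-R sketch of a vectorized variant of format_it follows. The name format_vec is hypothetical, and formatC is swapped in for format so that each value keeps its own width; treat it as one possible approach, not the only fix.

```r
# Hypothetical vectorized counterpart of format_it(): works on a whole column.
format_vec <- function(y, n_decimals, dash_type, suffix = "") {
  y <- as.character(y)
  out <- y                                   # blanks and text ("hi") stay as they are
  out[is.na(y)] <- dash_type                 # missing values become the dash
  num <- suppressWarnings(as.numeric(y))     # NA for anything that is not a number
  is_num <- !is.na(num)
  out[is_num] <- paste0(
    formatC(round(num[is_num], n_decimals),
            format = "f", digits = n_decimals, big.mark = ","),
    suffix
  )
  out
}

col1 <- c(NA, 1, " ", 2345.768, "hi", "", NA, 3.4, "44.99")
format_vec(col1, 1, "-")
# "-" "1.0" " " "2,345.8" "hi" "" "-" "3.4" "45.0"
```

In the pipeline from the question it could presumably be plugged in as set_formatter(Col1 = function(x) format_vec(x, 1, "-")); that step was not run here.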
Related
I built a function to use it inside the purrr::map2 function and run it in two lists. When I run the function steps separately it works ok. But apparently in map2 it runs the first time (for the first elements of list .x[[1]] .y[[1]]) and then in the second round throws this error in the for function:
How can I find out why it's not working?
PS: It's hard to put an example of the data here because they are lists with very specific characteristics for this function. I'm sorry.
The function follows:
df <- list()
build_HUW_raster <- function(.x, .y) {
list.time <- .x %>%
split(.$id) %>%
purrr::map(~list(t=as.matrix(.x$date),
xy=unname(as.matrix(.x[,c(22,23)])))
)
for(i in 1:50){
cat(i," ")
path=list.time[[i]]
ctmc=ctmcmove::path2ctmc(path$xy,path$t,r,method="LinearInterp")
df[[i]] <- as.data.frame(do.call(cbind, ctmc))
}
df <- df %>% purrr::map(~ group_by(., ec) %>%
summarise(rt = mean(rt)) %>%
arrange(desc(rt))
)
stacktime <- df %>% purrr::map(~ rename(., cell = ec)) %>%
map(~dplyr::left_join(cargo.grid, ., by="cell", copy=T)) %>%
map(~raster::rasterize(., r, field="rt", na.rm=F, background=0)) %>%
raster::stack()
stackprop <- .y %>%
split(.$id) %>%
purrr::map(~ raster::rasterize(., y = r,
field=.$proportion,
fun=function(x, ...)median(x))) %>%
raster::stack()
stack_huw <- raster::overlay(raster::calc(stacktime, fun=function(x)
ifelse(is.na(x), NA, x/sum(x, na.rm=T))), stackprop, fun=function(x,y)x*y
)
raster_mean <- raster::stackApply(stack_huw,
indices = rep(1,raster::nlayers(stack_huw)),
fun = "mean",
na.rm = F
)
}
result.list <- purrr::map2(.x = list1, .y = list2, .f = build_HUW_raster)
The reason lies in which element is being looped over. [[ extracts the list element itself, and map then loops over whatever that element contains: the individual values if it is a vector/matrix, or the columns if it is a data.frame, since columns are the units of a data.frame. [ instead returns a list of length one that still wraps the element, so map passes the whole element to the function.
list(1, 2, 3)[1]
[[1]]
[1] 1
vs
list(1, 2, 3)[[1]]
[1] 1
When we loop with map and apply a function that requires a specific structure (e.g. colSums requires a matrix/data.frame, i.e. an object with a dim attribute), it fails if we use [[, because each column then arrives as a plain vector
> map(replicate(2, data.frame(col1 = 1:5, col2 = 6:10), simplify = FALSE)[[1]], colSums)
Error in .f(.x[[i]], ...) :
'x' must be an array of at least two dimensions
> map(replicate(2, data.frame(col1 = 1:5, col2 = 6:10), simplify = FALSE)[1], colSums)
[[1]]
col1 col2
15 40
Here, we may change the code to
purrr::map2(.x = list1[1], .y = list2[1], .f = build_HUW_raster)
I want to apply multiple functions to the same dataframe. However, I have not been able to successfully pass column names as a parameter in purrr::imap. I keep getting the following error:
Error in UseMethod("select") : no applicable method for 'select'
applied to an object of class "character"
I have tried many combinations for evaluation (e.g., using !!!, [[, enquo, sys.lang, and on and on). When I apply a function (e.g., check_1) directly to a dataframe, select works fine. However, it does not work when I try to pass column names as a parameter using imap and exec. The format of the column name is part of the issue (e.g., 1.1.), but I have tried quotes and single quotes, etc.
This is a follow up to a previous post, but that post and solution focused on applying multiple functions to individual columns. Now, I need to apply multiple functions, which use more than one column in the dataframe; hence, the need to specify column names in a function.
Minimal Example
Data
df <- structure(
list(
`1.1.` = c("Andrew", "Max", "Sylvia", NA, "1",
NA, NA, "Jason"),
`1.2.` = c(1, 2, 2, NA, 4, 5, 3, NA),
`1.2.1.` = c(
"cool", "amazing", "wonderful", "okay",
NA, NA, "chocolate", "fine"
)
),
class = "data.frame",
row.names = c(NA, -8L)
)
What I have Tried
library(purrr)
library(dplyr)
check_1 <- function(x, col1, col2) {
x %>%
dplyr::select(col1, col2) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(col1 == "Jason" & is.na(col2) == TRUE) %>%
dplyr::select(row.index) %>%
unlist() %>%
as.vector()
}
check_2 <- function(x, col1, col2) {
index <- x %>%
dplyr::select(col1, col2) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(col1 >= 3 & col1 <= 5 & is.na(col2) == TRUE) %>%
dplyr::select(row.index) %>%
unlist() %>%
as.vector()
return(index)
}
checks <-
list("df" = list(fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
"df" = list(fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))
results <-
purrr::imap(checks, ~ exec(.x$fn, x = .y,!!!.x$pars))
Expected Output
> results
$df
[1] 8
$df
[1] 5 6
Besides the "class character" error, I also get an additional error when I try to test the check_2 function on its own, where it returns no expected values.
[1] 1.2. 1.2.1. row.index
<0 rows> (or 0-length row.names)
I have looked at many other similar SO posts (e.g., this one), but none have solved this issue for me.
The first issue is that you pass the name of the dataframe but not the dataframe itself. That's why you get the first error: you are trying to select from a character string. To solve this, add the dataframe to the list you are looping over.
The second issue is that when you pass the column names as character string you have to tell dplyr that these characters refer to columns in your data. This could be achieved by e.g. making use of the .data pronoun.
Finally, instead of select + unlist + as.vector you could simply use dplyr::pull:
library(purrr)
library(dplyr)
check_1 <- function(x, col1, col2) {
x %>%
dplyr::select(all_of(c(col1, col2))) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(.data[[col1]] == "Jason" & is.na(.data[[col2]]) == TRUE) %>%
dplyr::pull(row.index)
}
check_2 <- function(x, col1, col2) {
x %>%
dplyr::select(all_of(c(col1, col2))) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(.data[[col1]] >= 3 & .data[[col1]] <= 5 & is.na(.data[[col2]]) == TRUE) %>%
dplyr::pull(row.index)
}
checks <-
list(df = list(df = df, fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
df = list(df = df, fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))
purrr::map(checks, ~ exec(.x$fn, x = .x$df, !!!.x$pars))
#> $df
#> [1] 8
#>
#> $df
#> [1] 5 6
Use select({{col1}},{{col2}})
This will most probably help you.
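For context, the embrace operator {{ }} forwards column names that are passed unquoted to the function, which is a different calling style from the character strings stored in the checks list above. A minimal sketch under that assumption, using a cut-down copy of the question's data (check_1_embraced is a hypothetical name):

```r
library(dplyr)

# Cut-down copy of the question's data; check.names = FALSE keeps the odd names
df <- data.frame(
  `1.1.` = c("Andrew", "Max", "Sylvia", NA, "1", NA, NA, "Jason"),
  `1.2.` = c(1, 2, 2, NA, 4, 5, 3, NA),
  check.names = FALSE
)

# Hypothetical variant of check_1 that takes unquoted column names
check_1_embraced <- function(x, col1, col2) {
  x %>%
    select({{ col1 }}, {{ col2 }}) %>%
    mutate(row.index = row_number()) %>%
    filter({{ col1 }} == "Jason" & is.na({{ col2 }})) %>%
    pull(row.index)
}

# Non-syntactic names need backticks when passed unquoted
check_1_embraced(df, `1.1.`, `1.2.`)
# 8
```

When the column names arrive as strings, as they do through exec and the pars lists, the .data[[col1]] and all_of() forms from the other answer are the ones that apply inside filter and select.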
I'm recoding survey responses (character) to a set of questions (that are not in continuous columns), and I was thrilled to get the following code to work:
#make a list of the selected columns
fcols <- c(2, 6, 8, 9, 14)
#recode the selected columns
d <- d %>% mutate_at(vars(fcols),
~(recode(.,
"OriginalResponse1" = "NewResponse1",
"OriginalResponse2" = "NewResponse2",
"OriginalResponse3" = "NewResponse3",
"OriginalResponse4" = "NewResponse4",
.default = NA_character_)))
My main question has to do with making this work with "across", since "mutate_at" is apparently superseded.
I tried the below - put in the "across", and make sure to add a new closed paren at the end - but it doesn't work:
d <- d %>% mutate(across(vars(fcols),
~(recode(.,
"OriginalResponse1" = "NewResponse1",
"OriginalResponse2" = "NewResponse2",
"OriginalResponse3" = "NewResponse3",
"OriginalResponse4" = "NewResponse4",
.default = NA_character_))))
Error: Problem with `mutate()` input `..1`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `quosures`.
i It must be numeric or character.
i Input `..1` is `across(...)`.
Also, I've been trying to create a new set of columns (rather than just changing the existing ones) using the .names argument, after the .default argument, but I haven't been able to get that to work, except once only partially - when the columns appeared but they were all empty.
Main question: what am I missing in converting this to "across" from the working "mutate_at" version?
Bonus: how do I get the .names part to work?
With across, when fcols holds the column positions as numbers, you don't need vars -
library(dplyr)
d %>% mutate(across(fcols,
~recode(.,
"OriginalResponse1" = "NewResponse1",
"OriginalResponse2" = "NewResponse2",
"OriginalResponse3" = "NewResponse3",
"OriginalResponse4" = "NewResponse4",
.default = NA_character_)))
.names is useful when you want to keep the original columns as they are and create new columns.
d %>% mutate(across(fcols,
~recode(.,
"OriginalResponse1" = "NewResponse1",
"OriginalResponse2" = "NewResponse2",
"OriginalResponse3" = "NewResponse3",
"OriginalResponse4" = "NewResponse4",
.default = NA_character_), .names = '{col}_new'))
We can wrap fcols with all_of instead of vars. Also, recode can take a named vector (spliced in with !!!).
library(dplyr)
library(stringr)
nm1 <- setNames(str_c("NewResponse", 1:4),
str_c("OriginalResponse", 1:4))
d %>%
mutate(across(all_of(fcols),
~recode(., !!! nm1,
.default = NA_character_)))
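To see the all_of() and .names pieces working together, here is a small sketch on made-up data. The tibble d, its column names, and the lookup vector are all assumptions for illustration; only the shape of the mutate(across(...)) call mirrors the answers above.

```r
library(dplyr)

# Made-up survey data: columns 2 and 3 hold the responses to recode
d <- tibble(
  id = 1:3,
  q1 = c("OriginalResponse1", "OriginalResponse2", "something else"),
  q2 = c("OriginalResponse2", NA, "OriginalResponse1")
)
fcols <- c(2, 3)

# Named vector: names are the old values, elements are the new ones
lookup <- c(OriginalResponse1 = "NewResponse1",
            OriginalResponse2 = "NewResponse2")

d_new <- d %>%
  mutate(across(all_of(fcols),
                ~ recode(.x, !!!lookup, .default = NA_character_),
                .names = "{.col}_new"))   # .names belongs to across(), not recode()

d_new
```

The original q1 and q2 columns are left untouched, and the recoded copies are appended as q1_new and q2_new; values without a match become NA because of .default.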
I have a vector of strings where people were asked to guess someone's age; this includes statements like "50-60", "ca. 50" or ">50". I want to use regular expressions to match these cases and get the real numerical values. "50-60" should produce 55 (as the mean of both values), the other two examples 50.
For each variant, I wanted to have a case in a switch like below, but it doesn't seem to work. Is it even possible to use a regex in a switch?
switch (string,
str_detect(string, "[:digit:]+[:blank:]*(-|_)[:blank:]*[:digit:]+") = {
first <- str_sub(string, 1, 2) %>% as.numeric()
second <- str_sub(string, str_length(string)-1, str_length(string)) %>% as.numeric()
value <- mean(c(first, second))
},
str_detect(string, "((ca)\.?)|>|~[:blank:]*[:digit:]+") = {
value <- str_sub(string, str_length(string)-1, str_length(string)) %>% as.numeric()
},
str_detect(string, "[:digit:]+[:punct:]") = {
value <- str_sub(string, 1, 2) %>% as.numeric()
},
print(string, " could not be matched")
)
The expressions themselves are working as intended (as far as I tested), so I guess I can't use them in the switch like this. However I couldn't find a solution anywhere.
Edit: Added what the expected output for the examples is
We can do this with tidyverse methods:
Convert the string to a tibble/data.frame
Remove the characters not needed with str_remove_all
Then, separate the column into two by specifying the sep
Get the rowMeans
library(dplyr)
library(tidyr)
library(stringr)
tibble(mystring) %>%
mutate(mystring = str_remove_all(mystring, "[A-Za-z.><]+")) %>%
separate(mystring, into = c('col1', 'col2'), sep="[- ]+",
convert = TRUE) %>%
transmute(out = rowMeans(., na.rm = TRUE))
-output
# A tibble: 3 x 1
out
<dbl>
1 55
2 50
3 50
data
mystring <- c("50-60", "ca. 50", ">50")
You can use a nested if/else approach -
library(stringr)
string <- "50-60"
if(str_detect(string, "[:digit:]+[:blank:]*(-|_)[:blank:]*[:digit:]+")) {
first <- str_sub(string, 1, 2) %>% as.numeric()
second <- str_sub(string, str_length(string)-1, str_length(string)) %>% as.numeric()
value <- mean(c(first, second))
value
} else if(str_detect(string, "((ca)\\.?)|>|~[:blank:]*[:digit:]+")) {
value <- str_sub(string, str_length(string)-1, str_length(string)) %>% as.numeric()
value
} else if(str_detect(string, "[:digit:]+[:punct:]")) {
value <- str_sub(string, 1, 2) %>% as.numeric()
value
} else NA
#[1] 55
For string <- "ca. 50" it returns 50.
mystring <- c("50-60", "ca. 50", ">50")
library(stringr)
lapply(str_extract_all(mystring, "[0-9]+"),
function(x) if (length(x) == 1) as.numeric(x[1]) else mean(as.numeric(x)))
[[1]]
[1] 55
[[2]]
[1] 50
[[3]]
[1] 50
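The same idea (pull out every run of digits, then average whatever was found) can also be sketched in base R without stringr. This variant returns a plain numeric vector instead of a list and yields NA for strings that contain no digits; that NA handling is an assumption added here, not part of the answer above.

```r
mystring <- c("50-60", "ca. 50", ">50")

# Extract every run of digits from each string (one character vector per string)
digits <- regmatches(mystring, gregexpr("[0-9]+", mystring))

# Average the numbers found in each string; NA when nothing matched
ages <- vapply(
  digits,
  function(x) if (length(x) == 0) NA_real_ else mean(as.numeric(x)),
  numeric(1)
)
ages
# 55 50 50
```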
I am making my first baby steps with non standard evaluation (NSE) in dplyr.
Consider the following snippet: it takes a tibble, sorts it according to the values inside a column and replaces the n-k lower values with "Other".
See for instance:
library(dplyr)
df <- cars%>%as_tibble
k <- 3
df2 <- df %>%
arrange(desc(dist)) %>%
mutate(dist2 = factor(c(dist[1:k],
rep("Other", n() - k)),
levels = c(dist[1:k], "Other")))
What I would like is a function such that:
df2bis<-df %>% sort_keep(old_column, new_column, levels_to_keep)
produces the same result, where old_column is the column "dist" (the column I use to sort the data set), new_column (the column I generate) is "dist2" and levels_to_keep is "k" (the number of values I explicitly retain).
I am getting lost in enquo, quo_name etc...
Any suggestion is appreciated.
You can do:
library(dplyr)
sort_keep=function(df,old_column, new_column, levels_to_keep){
old_column = enquo(old_column)
new_column = as.character(substitute(new_column))
df %>%
arrange(desc(!!old_column)) %>%
mutate(use = !!old_column,
!!new_column := factor(c(use[1:levels_to_keep],
rep("Other", n() - levels_to_keep)),
levels = c(use[1:levels_to_keep], "Other")),
use=NULL)
}
df%>%sort_keep(dist,dist2,3)
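With more recent versions of rlang and dplyr, the enquo/!! pair can usually be written with the embrace operator {{ }}, including on the left-hand side of := inside a string. A sketch of sort_keep in that style follows; it assumes levels_to_keep is not larger than the number of rows, and it adds unique() so that tied top values do not produce duplicated factor levels (a small departure from the snippet in the question).

```r
library(dplyr)

sort_keep <- function(df, old_column, new_column, levels_to_keep) {
  df %>%
    arrange(desc({{ old_column }})) %>%
    mutate(
      # "{{ new_column }}" := names the new column after the unquoted argument
      "{{ new_column }}" := factor(
        c(as.character({{ old_column }})[seq_len(levels_to_keep)],
          rep("Other", n() - levels_to_keep)),
        levels = c(unique(as.character({{ old_column }})[seq_len(levels_to_keep)]),
                   "Other")
      )
    )
}

df2bis <- as_tibble(cars) %>% sort_keep(dist, dist2, 3)
levels(df2bis$dist2)
```

This version avoids the temporary use column, since {{ old_column }} can be referenced directly wherever the column is needed.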
Something like this?
old_column = "dist"
new_column = "dist2"
levels_to_keep = 3
command = "df2bis<-df %>% sort_keep(old_column, new_column, levels_to_keep)"
command = gsub('old_column', old_column, command)
command = gsub('new_column', new_column, command)
command = gsub('levels_to_keep', levels_to_keep, command)
eval(parse(text=command))