Apply a self-made function with mutate across

Apply a self-made function with mutate across - r

I have created a function and I want to create new variables according to the conditions of the self-made function but I get mistakes...
This is my function:
percentages <- function(x) {
mutate(x = case_when(
x < 25 ~ "Menos 25%",
x <50 ~ "Entre 30% 50%",
x <75 ~ "Entre 50% 75%",
x >= 75 ~ "Más 75%")
)
return(x)
}
When I apply this code:
transp6 <- transp_articles %>% mutate
(across(c(perc_index_info_inst_org:perc_index_art_10_portal), percentages))
I get this result:
Error: Problem with mutate() input ..1.
i ..1 = across(...).
x no applicable method for 'mutate' applied to an object of class "character"
Run rlang::last_error() to see where the error occurred.
When I apply this:
transp6 <- transp_articles %>%
mutate(across
(.cols = c(perc_index_info_inst_org:perc_index_art_10_portal),
.fns = percentages(c(perc_index_info_inst_org:perc_index_art_10_portal)),
.names = "{.col}_rank"))
I get this:
rror: Problem with mutate() input ..1.
i ..1 = across(...).
x no applicable method for 'mutate' applied to an object of class "character"
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Problem with `mutate()` input `..1`.
i `..1 = across(...)`.
i numerical expression has 1031 elements: only the first used
Error: Problem with `mutate()` input `..1`.
i `..1 = across(...)`.
x no applicable method for 'mutate' applied to an object of class "character"
Run `rlang::last_error()` to see where the error occurred.
Finally, I try to do it with an apply:
apply(transp_articles$perc_index_info_inst_org:transp_articles$perc_index_art_10_portal,
margin = 2, percentages)
And I get:
Error in match.fun(FUN) : argument "FUN" is missing, with no default

Your problem is that you're passing a mutate to a mutate; across takes functions like case_when that accepts a vector and returns a vector. Easiest like this:
library(dplyr)
transp6 <-
transp_articles |>
mutate(across(perc_index_info_inst_org:perc_index_art_10_portal,
~ case_when(
. < 25 ~ "Menos 25%",
. < 50 ~ "Entre 30% 50%",
. < 75 ~ "Entre 50% 75%",
. >= 75 ~ "Más 75%"
)))
Alternatively, define your function:
percentages <- function(x) {
case_when(
x < 25 ~ "Menos 25%",
x < 50 ~ "Entre 30% 50%",
x < 75 ~ "Entre 50% 75%",
x >= 75 ~ "Más 75%")
)
}
And then:
library(dplyr)
transp6 <-
transp_articles |>
mutate(across(.cols = perc_index_info_inst_org:perc_index_art_10_portal,
.fns = percentages,
.names = "{.col}_rank"))
Without reproducible data I cannot make you a working example.
Edit: Bug fix + add example with across with the names argument as OP.

Related

How to create a loop over multiple columns for the clusWilcox.test function

For my data analysis, I need to run clusWilcox.test (clusrank package) for multiple columns in my dataset. This dataset consists of 92 columns, of which colums 1 to 26 are possible independent variables (all factors), and 27 to 92 are dependent variables (all numeric) that I want to test.
If I were to do all the tests manually, it should look like:
clusWilcox.test(column27 ~ treatment + cluster(factor1), data = df)
clusWilcox.test(column28 ~ treatment + cluster(factor1), data = df)
...
clusWilcox.test(column92 ~ treatment + cluster(factor1), data = df)
I though it should be possible to make R do these subsequent test automatically and then give me the output for all these tests at once. I found a code for using the lapply function for a linear model (lm function) that I tried to adapt to the clusWilcox.test, however this does not work.
This is de code I found:
names(df)
varlist <- names(df)[27:92]
models <- lapply(varlist, function(x) {
lm(substitute(i ~ treatment + factor1, list(i = as.name(x))), data = df)
})
models
This works perfectly for my dataset. So my next step was to adapt this code, as I need to run the clusWilcox.test instead of a lm. I adapted the code as follows:
names(df)
varlist <- names(df)[27:92]
models2 <- lapply(varlist, function(x) {
clusWilcox.test(substitute(i ~ treatment + cluster(factor1), list(i = as.name(x))), data = df)
})
models2
However, this code gives the following error message:
Error in `vectbl_as_col_location()`:
! Can't subset columns that don't exist.
✖ Columns `substitute`, `i ~ Treatment`, and `list(i = as.name(x))` don't exist.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/vctrs_error_subscript_oob>
Error in `vectbl_as_col_location()`:
! Can't subset columns that don't exist.
✖ Columns `substitute`, `i ~ Treatment`, and `list(i = as.name(x))` don't exist.
---
Backtrace:
1. base::lapply(...)
2. global FUN(X[[i]], ...)
4. clusrank:::clusWilcox.test.default(...)
5. clusrank:::extractVar("x", pars, data)
7. tibble:::`[.tbl_df`(data, , as.character(pars[[var]]))
8. tibble:::vectbl_as_col_location(j, length(x), names(x), j_arg = j_arg, assign = FALSE)
Run `rlang::last_trace()` to see the full context.
>
> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `vectbl_as_col_location()`:
! Can't subset columns that don't exist.
✖ Columns `substitute`, `i ~ Treatment`, and `list(i = as.name(x))` don't exist.
---
Backtrace:
▆
1. └─base::lapply(...)
2. └─global FUN(X[[i]], ...)
3. ├─clusrank::clusWilcox.test(...)
4. └─clusrank:::clusWilcox.test.default(...)
5. └─clusrank:::extractVar("x", pars, data)
6. ├─data[, as.character(pars[[var]])]
7. └─tibble:::`[.tbl_df`(data, , as.character(pars[[var]]))
8. └─tibble:::vectbl_as_col_location(j, length(x), names(x), j_arg = j_arg, assign = FALSE)
9. ├─tibble:::subclass_col_index_errors(...)
10. │ └─base::withCallingHandlers(...)
11. └─vctrs::vec_as_location(j, n, names)
12. └─vctrs (local) `<fn>`()
13. └─vctrs:::stop_subscript_oob(...)
14. └─vctrs:::stop_subscript(...)
15. └─rlang::abort(...)
What did I do wrong, and how I can let R repeat the test multiple times over all the columns?

Convert to formula class:
library(clusrank)
#example data
df <- mtcars[, c("vs", "am", "mpg")]
names(df)
# [1] "vs" "am" "mpg"
varlist <- names(df)[1:2]
# [1] "vs" "am"
lapply(varlist, function(x) {
clusWilcox.test(as.formula(substitute(mpg ~ i, list(i = as.name(x)))), data = df)
})
#[[1]]
#
# Clustered Wilcoxon rank sum test using Rosner-Glynn-Lee method
#
#data: mpg; group: vs; (from df)
#number of observations: 32; number of clusters: 32
#Z = 3.9342, p-value = 8.349e-05
#alternative hypothesis: true difference in locations is not equal to 0
#
#
#[[2]]
#
# Clustered Wilcoxon rank sum test using Rosner-Glynn-Lee method
#
#data: mpg; group: am; (from df)
#number of observations: 32; number of clusters: 32
#Z = 3.1291, p-value = 0.001753
#alternative hypothesis: true difference in locations is not equal to 0
Alternatively, I prefer to use paste to construct the formula:
lapply(varlist, function(x) {
f <- as.formula(paste("mpg ~", x))
clusWilcox.test(f, data = df)
})

Getting variables names from glmnet lasso into a data.frame

I'm working with a phyloseq object ps.scale and trying to get the most important variables/features that can predict health status sample_data(ps.scale)$group.
Code is as follows:
library(glmnet)
metadata <- factor(sample_data(ps.scale)$group)
otu_tab <- otu_table(ps.scale)
otu_tab <- apply(otu_tab, 2, function(x) x+1/sum(x+1))
otu_tab <- t(log10(otu_tab))
y <- metadata
x <- otu_tab
lasso <- cv.glmnet(x, y, family="multinomial", alpha=1)
print(lasso)
plot(lasso)
So I get the results and a plot here.
#Call: cv.glmnet(x = x, y = y, family = "multinomial", alpha = 1)
#Measure: Multinomial Deviance
# Lambda Index Measure SE Nonzero
#min 0.03473 36 1.704 0.05392 68
#1se 0.05529 26 1.751 0.05474 16
Now I want to be able to extract the important variables/features (i.e., OTUs). Below are some codes I gathered from the internet:
Code 1
all_1se <- coef(lasso, s = "lambda.1se")
chosen_1se <- all_1se[all_1se > 0, ]
chosen_1se
#Error: 'list' object cannot be coerced to type 'double'
Code 2
tmp_coeffs <- coef(lasso, s = "lambda.1se")
data.frame(name = tmp_coeffs#Dimnames[[1]][tmp_coeffs#i + 1], coefficient = tmp_coeffs#x)
#Error in data.frame(name = tmp_coeffs#Dimnames[[1]][tmp_coeffs#i + 1], :
# trying to get slot "Dimnames" from an object of a basic class ("list") with no slots
Code 3
myCoefs <- coef(lasso, s="lambda.min");
myCoefs[which(myCoefs != 0 ) ]
myCoefs#Dimnames[[1]][which(myCoefs != 0 ) ] #feature names: intercept included
## Asseble into a data.frame
myResults <- data.frame(
features = myCoefs#Dimnames[[1]][ which(myCoefs != 0 ) ], #intercept included
coefs = myCoefs [ which(myCoefs != 0 ) ] #intercept included
)
myResults
#Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'which': 'list' object cannot be coerced to type 'double'
#3.h(simpleError(msg, call))
#2..handleSimpleError(function (cond)
# .Internal(C_tryCatchHelper(addr, 1L, cond)), "'list' object cannot be coerced to type 'double'",
# base::quote(which(myCoefs != 0)))
#1.which(myCoefs != 0)
I need help fixing the above errors, mainly 'list' object cannot be coerced to type 'double'.
Thank you in advance.

R: filter issue - Error in `filter()`: ! Problem while computing `..1 = between(data$column, 400, 1800)`

I have a data frame and am looking to get a subset based on conditions related to another column. This seems to work for some variable, but somehow for one particular one, I am getting the following error message:
Error in `filter()`:
! Problem while computing `..1 = between(Unitranche$Yield, 400, 1800)`.
Caused by error in `Unitranche$Yield`:
! $ operator is invalid for atomic vectors
Run `rlang::last_error()` to see where the error occurred.
The code is as follows:
X = filter(data, X == 1)
Y = filter(data, Y == 1)
Z = filter(data), Z == 1)
no problem until here, but then:
X = filter(X, between(X$a,400,1500))
Y = filter(Y, between(Y$a,400,1800))
Z = filter(Z, a >= 800)
The error pops up for Y, although Y is created/filtered exactly like the X and Z.
Not sure how to replicate this error with sample data; but just in case Y is 278 observations of 5 variables.
Thank you!

R nls function, error in lhs - rhs non-numeric argument to binary operator

I am trying to use the nls (non linear least squared) function on some covid data. I think the error is in my "sigfunction", but I can't seem to figure out how to make it work.
covid <- read.csv("covid19.csv")
plot(covid$Algeria ~ covid$days,xlab = "Time (days)",ylab="Cases")
I know that the line should follow this function.
s = width, m = middle, a = height
sigfunction <- function(a,x,m,s){a*exp(((x-m)/s)^2)}
mod <- nls(y ~ sigfunction, start=list(m=70,s=60,a=30), trace=TRUE)
but the nls command gives me the error:
Error in lhs - rhs : non-numeric argument to binary operator
2. nlsModel(formula, mf, start, wts)
1. nls(y ~ sigfunction, start = list(m = 70, s = 60, a = 30), trace = TRUE)
thanks to G. Grothendieck i fixed one problem in the function, but now when i run it i get this error:
Error in qr(.swts * gr) : dims [product 4] do not match the length of object [146]
i changed the last two lines to:
Algeria <- covid$Algeria
sigfunction <- function(a,x,m,s){a*exp(((x-m)/s)^2)}
mod <- nls(Algeria ~ sigfunction(a,x,m,s), start=list(x=75,m=70,s=60,a=30), trace=TRUE)

Trying to specify predict.coxph type within map2() function

I’ve been scouring the web for the last few days looking at the documentation for map2. I have taken a training set, nested the data and created coxph models for it, saving those models in the nested table. Now I want to predict from that model, but I want to use a type=“expected" as, according to the documentation (R documentation: predict.coxph)
The survival probability for a subject is equal to exp(-expected)
I’ve adapted the relevant code to reproduce my issues using the mpg data set.
I have 4 examples below that do not work after the predict function that does work. Please note that I have removed the coxph.null models from this set, so the only models are of class(coxph). This code can be used to replicate the errors.
#Needed libraries
library(ggplot2)
library(tidyverse)
library(purrr)
library(broom)
library(survival)
#Create data set
mpg_data <- mpg
mpg_data <- mpg_data %>%
mutate(mpg_diff = cty - hwy)
mpg_data <- mpg_data %>%
mutate(EVENT = (mpg_diff >= -8))
set.seed(1)
mpg_data <- mpg_data %>%
mutate(TIME_TO_EVENT = as.integer(runif(234, 1, 100)))
mpg_nested <- mpg_data %>%
group_by(manufacturer) %>%
mutate(n_prot = length(model)) %>%
nest()
# Stepwise regression
stepwise <- function(data) {
response <- Surv(time = data$TIME_TO_EVENT, event = data$EVENT, type = "right")
full <- "Surv(time = data$TIME_TO_EVENT, event = data$EVENT, type = 'right') ~ data$cyl+data$cty+data$hwy+data$displ"
x <- factor(as.factor(data$model))
full <- ifelse(nlevels(x) >= 2, paste(full, "as.character(data$model)", sep = "+"), full)
x <- factor(as.factor(data$trans))
full <- ifelse(nlevels(x) >= 2, paste(full, "as.character(data$trans)", sep = "+"), full)
x <- factor(as.factor(data$drv))
full <- ifelse(nlevels(x) >= 2, paste(full, "as.character(data$drv)", sep = "+"), full)
null_model_ONE <- coxph(response ~ 1, data=data)
full_model_ONE <- coxph(as.formula(full), data=data)
model_ONE <- step(null_model_ONE, scope=list(lower=null_model_ONE, upper=full_model_ONE))
}
survival_mpg <- mpg_nested %>%
mutate(model_fit = map(data, stepwise))
#Predicting values
#This works but is not type="expected"
survival_mpg_predict <- survival_mpg %>%
mutate(mpg_predict = map2(model_fit, data, predict))
##TRY 1##
predict.F <- function(model_fit, data){
predict(model_fit, newdata=data, type="expected")
}
survival_mpg_predict <- survival_mpg %>%
mutate(mpg_predict = map2(model_fit, data, predict.F))
#Error in mutate_impl(.data, dots) : Evaluation error: requires numeric/complex matrix/vector arguments.
##Try 2##
survival_mpg_predict <- survival_mpg %>%
mutate(mpg_predict = map2(model_fit, data, predict(model_fit, newdata = data, type="expected")))
#Error in mutate_impl(.data, dots) : Evaluation error: no applicable method for 'predict' applied to an object of class "list".
##Try 3##
survival_mpg_predict <- survival_mpg %>%
mutate(mpg_predict = map2(model_fit, data, ~ predict(.x, newdata = .y, type="expected")))
#Error in mutate_impl(.data, dots) : Evaluation error: requires numeric/complex matrix/vector arguments.
##Try 4##
survival_mpg_predict <- survival_mpg %>%
mutate(mpg_predict = map2(model_fit, data, function(model_fit, data) predict(model_fit, newdata=data, type="expected")))
#Error in mutate_impl(.data, dots) : Evaluation error: requires numeric/complex matrix/vector arguments.

Modifying ##TRY 1## to remove the newdata argument and change the map2() function to the map() function worked
predict.F <- function(model_fit, data){
predict(model_fit, type="expected")
}
survival_mpg_predict <- survival_mpg %>%
mutate(mpg_predict = map(model_fit, predict.F))

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Apply a self-made function with mutate across - r

Related

How to create a loop over multiple columns for the clusWilcox.test function

Getting variables names from glmnet lasso into a data.frame

R: filter issue - Error in `filter()`: ! Problem while computing `..1 = between(data$column, 400, 1800)`

R nls function, error in lhs - rhs non-numeric argument to binary operator

Trying to specify predict.coxph type within map2() function

Categories

Resources