I am trying to create a function that I can use to quickly check different variables for normality. I'm new to R and am not quite sure how to proceed. This is my attempt, but it does not work:
normality_test <- function(my_data) { shapiro.test(my_data$"x") }
My goal is to be able to use the function as follows:
normality_test("variable name")
Use [[ to access column data.
normality_test<- function(my_data, col) shapiro.test(my_data[[col]])
You can use it as:
normality_test(my_data, "var1")
normality_test(my_data, "var2")
To apply normality_test to all the columns, you could use:
result <- lapply(names(my_data), normality_test, my_data = my_data)
However, if you want to run this for all the columns, you can call shapiro.test directly:
result <- lapply(my_data, shapiro.test)
with no need to create the normality_test function.
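As a minimal sketch of the direct lapply route, assuming the built-in mtcars data stands in for my_data. The numeric-column filter and the p-value extraction are additions for illustration, not part of the answer above:

```r
# mtcars stands in for my_data (an assumption for illustration).
my_data <- mtcars

# shapiro.test() needs numeric input, so keep only the numeric columns.
numeric_cols <- my_data[vapply(my_data, is.numeric, logical(1))]

# One htest object per column, named after the column.
result <- lapply(numeric_cols, shapiro.test)

# Collect the p-values into a named numeric vector for a quick overview.
p_values <- vapply(result, function(res) res$p.value, numeric(1))
print(round(p_values, 4))
```

Each element of result is the full test object, so result$mpg$statistic and similar fields remain available.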
Here is a working solution for you. The main differences from yours are the use of [ ] notation instead of $ notation for variable extraction, and that mine passes both the data and the variable name to the function. Be sure to select only variables that are numeric or can be coerced to numeric. Also, since the function now has two arguments and the first one is the data, you can use the magrittr pipe (%>%) to make it more readable and apply the function to a data set.
test <- mtcars
normality_test <- function(my_data, x) {
  return(shapiro.test(as.numeric(my_data[, x])))
}
normality_test(test, "qsec")
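The pipe usage mentioned above could look roughly like this. The sketch uses the native pipe |> (which assumes R 4.1 or later) so that no package is needed; the vars vector is made up for illustration:

```r
# Same function as above, repeated so the sketch is self-contained.
normality_test <- function(my_data, x) {
  shapiro.test(as.numeric(my_data[, x]))
}

# The native pipe |> passes mtcars as the first argument; with
# library(magrittr) loaded, mtcars %>% normality_test("qsec") is equivalent.
piped <- mtcars |> normality_test("qsec")

# Apply the function to several columns and keep the results by name.
vars <- c("mpg", "qsec", "wt")
results <- lapply(setNames(vars, vars), function(v) normality_test(mtcars, v))
print(piped$p.value)
```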
I have been trying to create a simple function with two arguments in R that takes a dataset and a categorical feature and, based on that feature, stores multiple csv files, one per category, in a folder ("DATA") inside the working directory.
The problem I have been facing is as simple as the function itself: I introduced non-standard evaluation with rlang, but the enquo call throws errors (either a symbol is expected, or the argument is not a vector). As a result, the function always fails.
The code I used is the following, assuming everyone has a folder called "DATA" in their RStudio project to store the split csv files.
library(tidyverse)
library(data.table)
library(rlang)
csv_splitter <- function(df, parameter){
df <- df
# We set categorical features missing values vector, with names automatically applied with
# sapply. We introduce enquo on the parameter for non-standard evaluation.
categories <- df %>% select(where(is.character))
NA_in_categories <- sapply(categories, FUN = function(x) {sum(is.na(x))})
parameter <- enquo(c(parameter))
#We make sure such parameter is included in the set of categorical features
if (!!parameter %in% names(NA_in_categories)) {
df %>%
split(paste0(".$", !!parameter)) %>%
map2(.y = names(.), ~ fwrite(.x, paste0('./DATA/data_dfparam_', .y, '.csv')))
print("The csv's are stored now in your DATA folder")
} else {
print("your variable is not here or it is continuous, buddy, try another one")
}
}
Whether the error is "arg must be a symbol" from enquo or parameter not being a vector (which this version of the code works around with c(parameter)), I am stuck and cannot find a change that solves it.
If anyone does have a suggestion, I'll be more than happy to try it out on my code. In any case, I'll be extremely grateful for your help!
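One possible way around the enquo errors, sketched under the assumption that the column can be passed as a quoted string rather than a bare name, so no non-standard evaluation is needed. This is a base R variant, not a fix of the tidyverse code above; the out_dir argument, the file-name pattern and the demo data frame are all made up for illustration:

```r
# Hypothetical base R variant: 'parameter' is a string, so enquo() is not needed.
csv_splitter <- function(df, parameter, out_dir = "DATA") {
  # Only character columns count as categorical, as in the original code.
  categorical <- names(df)[vapply(df, is.character, logical(1))]
  if (!parameter %in% categorical) {
    message("Column '", parameter, "' is missing or not categorical")
    return(invisible(character(0)))
  }
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # split() returns one data frame per category, named by category.
  pieces <- split(df, df[[parameter]])
  paths <- file.path(out_dir,
                     paste0("data_", parameter, "_", names(pieces), ".csv"))
  for (i in seq_along(pieces)) {
    write.csv(pieces[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

# Usage with a small made-up data frame and a temporary folder.
demo <- data.frame(group = c("a", "b", "a"), value = 1:3,
                   stringsAsFactors = FALSE)
written <- csv_splitter(demo, "group", out_dir = file.path(tempdir(), "DATA"))
print(basename(written))
```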
I want to use a function to repeatedly create data sets with different names.
For example, suppose I have 5 random vectors.
number1<-sample(1:10, 3)
number2<-sample(1:10, 3)
number3<-sample(1:10, 3)
number4<-sample(1:10, 3)
number5<-sample(1:10, 3)
Then I use these vectors to select rows from a raw data set (i.e. a data frame):
testset1<-raw[number1,]
testset2<-raw[number2,]
testset3<-raw[number3,]
testset4<-raw[number4,]
testset5<-raw[number5,]
Writing out each command takes a lot of space in the script, so I'm trying to shorten these commands with a function.
However, I found it hard to use a variable inside a function to build part of an object name. For example, it is easy to use a variable like this:
mean_function<-function(x){
mean(x)
}
But I want to use a function like this:
testset "number with 1-5" <-raw[number"number 1-5",]
I would really appreciate your help.
You don't need to create a function for this task, simply use lapply to loop over the list of elements produced by mget(), then set some names and finally put all results in the global environment:
rowSelected <-lapply(mget(paste0("number", 1:5)), function(x) raw[x, ])
names(rowSelected) <- paste0("testset", 1:5)
list2env(rowSelected, envir = .GlobalEnv)
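A related sketch that keeps everything in lists from the start, so mget() and list2env() are not needed. It assumes mtcars as a stand-in for raw, and the set.seed call is only there to make the draws repeatable:

```r
# mtcars stands in for the raw data frame (an assumption for illustration).
raw <- mtcars
set.seed(1)  # only to make the random draws repeatable

# Five index vectors held in one named list instead of number1..number5.
numbers <- lapply(1:5, function(i) sample(1:10, 3))
names(numbers) <- paste0("number", 1:5)

# One subset of raw per index vector, named testset1..testset5.
testsets <- lapply(numbers, function(idx) raw[idx, ])
names(testsets) <- paste0("testset", 1:5)

print(testsets$testset1)
```

Individual sets are then available as testsets$testset3 or testsets[["testset3"]].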
Is there any way to use a loop to write this code? Each line of code is identical except for the species name.
ensembl_hsapiens <- useMart("ensembl",
dataset = "hsapiens_gene_ensembl")
ensembl_mouse <- useMart("ensembl",
dataset = "mmusculus_gene_ensembl")
ensembl_chicken <- useMart("ensembl",
dataset = "ggallus_gene_ensembl")
Here's an approach. Note that using a loop (or a loop-equivalent construct) to populate the global environment isn't often a good idea. But it's what you asked for.
There's nothing special about useMart, so I'll make up a nonsense function that takes two character arguments:
foo <- function(x, y) {
nchar(paste(x, y))
}
Here are the species names. I'll use them for the object names as well.
species <- c("hsapiens", "mmusculus", "ggallus")
Now, you want to create three named objects in the global environment. You can use the assign function for this, noting that you need pos = 1 (the global environment) because each function call made by lapply is evaluated in its own environment.
lapply(species, function(s) assign(paste0("ensembl_", s),
                                   foo("ensembl", paste0(s, "_gene_ensembl")),
                                   pos = 1))
This gives you what you want. You can replace foo with useMart.
Now, is this a good idea? Perhaps not. I would be more inclined to keep the objects themselves in a list.
objs <- lapply(species, function(s) foo("ensembl", paste0(s, "_gene_ensembl")))
names(objs) <- paste0("ensembl_", species)
You can access them using statements like objs$ensembl_hsapiens or objs[["ensembl_hsapiens"]]
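A compact variant of the list approach, as a sketch: naming the input vector first means lapply returns an already named list. foo is still the placeholder for useMart, and the ensembl_ prefix follows the object names in the question:

```r
# Placeholder for useMart(), as in the answer above.
foo <- function(x, y) {
  nchar(paste(x, y))
}

species <- c("hsapiens", "mmusculus", "ggallus")

# Naming the input vector makes lapply() return a list with the same names.
objs <- lapply(setNames(species, paste0("ensembl_", species)),
               function(s) foo("ensembl", paste0(s, "_gene_ensembl")))

print(names(objs))
```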
I wrote the following function:
rename.fun <- function(rai, pred) {
  assign('pred', rai)
  return(pred)
}
I called it with the arguments rename.fun(k2e,k2e_cat2) and it returns the object I want but it is named pred.
The point of this function is to assign the object I define as rai to the object I define as pred. So rename k2e to k2e_cat2.
I am new to R but I am a SAS programmer. This is a very simple task with the SAS macro processor, but I can't seem to figure it out in R.
EDIT:
In SAS I would do the following:
%macro rename_fun(rai=) ;
data output (rename=(&rai.=&rai._cat2));
set input;
run;
%mend;
Essentially, I want to add the suffix _cat2 to a bunch of variables, but they need to be in a function call. I know this seems odd, but it's for a specific project at work. I am new to R, so I apologize if this seems silly.
Since you say that you want to rename several columns in a data.frame, you could simply do this with a function that takes a data.frame and a vector of column names to rename:
add_suffix_cat2 <- function(df, vars){
names(df)[match(vars, names(df))] <- paste0(vars, "_cat2")
return(df)
}
Then you can call the function like:
mydf <- mtcars
res <- add_suffix_cat2(mydf, c("hp","mpg"))
If you wanted to make the suffix customizable, that's simple enough to do by adding another parameter to the function.
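A sketch of that customizable version. The function name add_suffix, the suffix parameter and the missing-column check are illustrative additions rather than part of the answer above:

```r
add_suffix <- function(df, vars, suffix = "_cat2") {
  # Stop early if a requested column does not exist, rather than
  # failing later on an NA index.
  missing_vars <- setdiff(vars, names(df))
  if (length(missing_vars) > 0) {
    stop("Columns not found: ", paste(missing_vars, collapse = ", "))
  }
  names(df)[match(vars, names(df))] <- paste0(vars, suffix)
  df
}

# The default suffix reproduces add_suffix_cat2; any other suffix can be passed in.
res <- add_suffix(mtcars, c("hp", "mpg"), suffix = "_raw")
print(names(res))
```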
I am having a little problem doing a Levene test in R. I do not get any output value, only NaN. Does anyone know what the problem might be?
I used this code:
with(Test,levene.test(Sample1,Sample2,location="median"))
Best regards
The levene.test function assumes the data are in a single vector. The second argument is a grouping variable.
Concatenate your data using the c() function: data = c(Sample1, Sample2). Construct a vector of group names like gp = rep(c('Gp1', 'Gp2'), each = 240). Then call the function as follows: levene.test(data, gp, location = 'median').
This can also be done directly:
levene.test(c(Sample1, Sample2), rep(c('Gp1', 'Gp2'), each = 240), location = 'median')
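To show the data layout the test expects, here is a base R sketch of what the median-based Levene test (the Brown-Forsythe variant) computes: a one-way ANOVA on absolute deviations from each group's median. It does not call levene.test itself, and the two samples are simulated, assuming 240 observations per group as in the answer:

```r
set.seed(42)  # simulated data, for illustration only
Sample1 <- rnorm(240, mean = 10, sd = 1)
Sample2 <- rnorm(240, mean = 10, sd = 3)

# One vector of values plus one grouping factor of the same length.
values <- c(Sample1, Sample2)
gp <- factor(rep(c("Gp1", "Gp2"), each = 240))

# Absolute deviation of each value from its own group's median.
abs_dev <- abs(values - ave(values, gp, FUN = median))

# One-way ANOVA on those deviations gives the median-based Levene statistic.
fit <- anova(lm(abs_dev ~ gp))
print(fit)
```

If the values and the grouping vector differ in length, or the grouping vector is not a single vector of labels, the deviations cannot be matched to groups, which is the kind of mismatch the original call ran into.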