I have created some simple functions in R that take data and return variables. I'd appreciate instruction on taking this to another level, using the code below as illustration. There's a data frame 'df' which contains the variables DOFW and CDFME (there are others, but two will suffice for this example). I'd like to pass the name of a variable to the function and have it build the barplot. Basically, I want to create a function that performs the actions from z<- through abline, and call that function for each variable. Thank you.
df<-data.frame(y.train,x.train)
z<-ddply(df,~DOFW,summarise,median=median(PCTCHG))
row.names(z)<-z$DOFW
z$DOFW<-NULL
barplot(t(as.matrix(z)),main="Median Return by DOFW",xlab="DOFW")
abline(h=median(y.train))
z<-ddply(df,~CDFME,summarise,median=median(PCTCHG))
row.names(z)<-z$CDFME
z$CDFME<-NULL
barplot(t(as.matrix(z)),main="Median Return by CDFME",xlab="CDFME")
abline(h=median(y.train))
Actually you don't really need non-standard evaluation since ddply allows you to pass a string for the variable name rather than a formula. Here's how you could do that.
#sample data
dd<-data.frame(a=rep(1:5, 2), b=2:11, c=runif(10))
#define function
library(plyr)
myplot<-function(coln) {
z<-ddply(dd, coln, summarise, median=median(c))
barplot(z[,2], main=paste("Median Return By", coln))
abline(h=median(dd$c))
}
#make plots
myplot("a")
myplot("b")
The easiest way to get the names of the columns as a character vector is names(dd).
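If plyr is not available, the same idea can be sketched in base R. This is only an illustration: plot_median_by is a hypothetical helper name, tapply() is swapped in for ddply(), and the data are made up to mirror the sample above.

```r
# Sample data mirroring the answer above (values are random)
set.seed(42)
dd <- data.frame(a = rep(1:5, 2), b = 2:11, c = runif(10))

# Hypothetical helper: median of `value` within each level of `group`,
# drawn as a barplot with a reference line at the overall median
plot_median_by <- function(data, group, value) {
  med <- tapply(data[[value]], data[[group]], median)
  barplot(med, main = paste("Median", value, "by", group), xlab = group)
  abline(h = median(data[[value]]))
  invisible(med)  # hand the medians back without printing them
}

# One plot per grouping column; each name is passed as the `group` argument
meds <- lapply(c("a", "b"), plot_median_by, data = dd, value = "c")
```

Because the column names are ordinary strings, any character vector (for example a subset of names(dd)) can drive the loop.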
I am trying to create a formula which I can use to quickly check different variables for normality. I'm new to R and am not quite sure how to proceed. This is my attempt, but it does not work:
normality_test <- function(my_data) { shapiro.test(my_data$"x") }
My goal is to be able to use the formula as follows:
normality_test("variable name")
Use [[ to access column data.
normality_test<- function(my_data, col) shapiro.test(my_data[[col]])
You can use it as :
normality_test(my_data, "var1")
normality_test(my_data, "var2")
To apply normality_test for all the columns, you could use :
result <- lapply(names(my_data), normality_test, my_data = my_data)
However, if you want to run this for all the columns you can directly use
result <- lapply(my_data, shapiro.test)
with no need to create normality_test function.
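Note that shapiro.test() only accepts numeric input, so a data frame with factor or character columns needs filtering first. A minimal sketch, using the built-in iris data as a stand-in for my_data:

```r
# iris has four numeric columns and one factor column (Species)
num_cols <- vapply(iris, is.numeric, logical(1))

# Run the test on the numeric columns only
results <- lapply(iris[num_cols], shapiro.test)

# Collect the p-values into a named numeric vector for a quick overview
p_values <- vapply(results, function(r) r$p.value, numeric(1))
p_values
```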
Here is a working solution for you. The main difference from yours is the use of [ ] notation as opposed to $ notation for variable extraction, and that mine provides both the data and the variable name to the function. Be sure to select only variables which are numeric, or can be coerced to numeric, for use with the function. Also, since the function now has two arguments and the first one is the data, you can use the magrittr pipe (%>%) to make it more readable and use the function over a data set.
test <- mtcars
normality_test<- function(my_data, x) {
return(shapiro.test(as.numeric(my_data[,x])))
}
normality_test(test, "qsec")
I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (<- inside lapply only changes a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible(), otherwise lapply would print its output on the console. The <<- assigns the vector runif(...) to the newly created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
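A slightly shorter variant, offered as a sketch: iterate over the data frames themselves rather than over their indices. Unlike lapply(seq_along(aList), ...), this keeps the names of the list elements. newList is just an illustrative name.

```r
set.seed(1)
aList <- list(cars = mtcars, iris = iris)

# Each data frame is passed in as `d`; the original list is left untouched
newList <- lapply(aList, function(d) {
  d[["newcol"]] <- runif(nrow(d))  # modifies the local copy only
  d                                # return the modified copy
})

names(newList)  # "cars" "iris"
```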
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
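As for the original goal of working on the DIM column of every data frame: get() looks up an object by its name, and "a[[1]]$DIM" is an expression rather than the name of an object, which is why the lookup fails. Building strings is not needed at all, since lapply can go over the list directly. A minimal sketch with made-up data frames standing in for a, and mean() as a placeholder for whatever function is wanted:

```r
# Hypothetical stand-ins for the data frames in `a`
a <- list(
  data.frame(DIM = c(10, 20, 30), other = 1:3),
  data.frame(DIM = c(5, 15), other = 1:2)
)

# Extract the DIM column from every data frame
dims <- lapply(a, `[[`, "DIM")

# Or apply a function to the column directly
results <- lapply(a, function(d) mean(d$DIM))
```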
I wrote the following function:
rename.fun <- function(rai,pred){
assign('pred',rai)
return(pred) }
I called it with the arguments rename.fun(k2e,k2e_cat2) and it returns the object I want but it is named pred.
The point of this function is to assign the object I define as rai to the object I define as pred. So rename k2e to k2e_cat2.
I am new to R but I am a SAS programmer. This is a very simple task with the SAS macro processor, but I can't seem to figure it out in R.
EDIT:
In SAS I would do the following:
%macro rename_fun(rai=) ;
data output (rename=(&rai.=&rai._cat2));
set input;
run;
%mend;
Essentially, I want to add the suffix _cat2 to a bunch of variables, but they need to be in a function call. I know this seems odd but it's for a specific project at work. I am new to R so I apologize if this seems silly.
Since you say that you want to rename several columns in a data.frame, you could simply do this by using a function that takes a data.frame and a character vector of column names to rename:
add_suffix_cat2 <- function(df, vars){
names(df)[match(vars, names(df))] <- paste0(vars, "_cat2")
return(df)
}
Then you can call the function like:
mydf <- mtcars
res <- add_suffix_cat2(mydf, c("hp","mpg"))
If you wanted to make the suffix customizable, that's simple enough to do by adding another parameter to the function.
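One way that could look, as a sketch: add_suffix and its suffix argument are illustrative names, and the check for missing columns is an extra that was not part of the function above.

```r
# Hypothetical generalisation of add_suffix_cat2 with a configurable suffix
add_suffix <- function(df, vars, suffix = "_cat2") {
  idx <- match(vars, names(df))
  # Fail early if any requested column does not exist
  if (anyNA(idx)) {
    stop("columns not found: ", paste(vars[is.na(idx)], collapse = ", "))
  }
  names(df)[idx] <- paste0(vars, suffix)
  df
}

res <- add_suffix(mtcars, c("hp", "mpg"), suffix = "_raw")
names(res)
```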
I am having a little problem doing a Levene test in R. I do not get any output value, only NaN. Does anyone know what the problem might be?
Have used the code:
with(Test,levene.test(Sample1,Sample2,location="median"))
Best regards
The levene.test function assumes the data are in a single vector. The second argument is a grouping variable.
Concatenate your data using the c() function: data=c(Sample1, Sample2). Construct a vector of group names like gp = rep(c('Gp1','Gp2'), each=240). Then, call the function as follows: levene.test(data, gp, location='median').
This can also be done directly:
levene.test(c(Sample1, Sample2), rep(c('Gp1', 'Gp2'), each=240), location='median')
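The each=240 above assumes both samples have 240 observations. A sketch that derives the group sizes from the data instead, using made-up samples and assuming levene.test comes from the lawstat package (the call is skipped if that package is not installed):

```r
# Made-up samples; replace with the real Sample1 and Sample2
set.seed(1)
Sample1 <- rnorm(240, sd = 1)
Sample2 <- rnorm(240, sd = 2)

# One vector of values plus a grouping factor of the same length
values <- c(Sample1, Sample2)
groups <- factor(rep(c("Gp1", "Gp2"),
                     times = c(length(Sample1), length(Sample2))))

# Assumes the lawstat package provides levene.test
if (requireNamespace("lawstat", quietly = TRUE)) {
  print(lawstat::levene.test(values, groups, location = "median"))
}
```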
I have created a function which computes statistics on various patients' data and, as well as outputting plots, generates data frames containing summary statistics for each patient.
If I copy and run the function within R, the outputs are available to me. However, I am now calling the function from a separate R script, and the data frames are no longer available.
Is there any way to correct this?
For example,
test=function(a){
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
}
a=c(1,2,3,4)
test(a)
This does not return DF, yet if I were to type:
a=c(1,2,3,4)
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
Then clearly DF is returned. Is there a simple way to fix this so that DF becomes available from the test function?
Try:
test=function(a){
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
}
a=c(1,2,3,4)
df<-test(a)
print(df)
By assigning the function's returned value to a new variable it is now accessible in the global space.
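If the function produces more than one object, as in the question, one common pattern is to return them together in a named list and pick them apart at the call site. A minimal sketch; the totals element is a made-up second output for illustration:

```r
test <- function(a) {
  DF <- data.frame(A = a, B = 2 * a, C = 3 * a, D = 4 * a)
  totals <- colSums(DF)            # hypothetical second output
  list(DF = DF, totals = totals)   # last expression is the return value
}

out <- test(c(1, 2, 3, 4))
out$DF       # the data frame
out$totals   # the column sums
```

This keeps everything the function creates in one returned object, so nothing depends on variables appearing in the global environment.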
If you want to assign an object from within a function to the global environment for easy retrieval, then your operators are "<<-" or "->>". For more info see ?assignOps. For example:
test <- function(a) {
A=a
B=2*a
C=3*a
D=4*a
DF <<- data.frame(A,B,C,D)
}
# trial your dummy data
a=c(1,2,3,4)
test(a)
DF
Hey presto ... it works! Writing return(DF) within the function will not, by itself, create DF in the global environment; it only hands the value back to the caller, which still has to assign it.