I want to write a function that runs the same analysis on different data.frames. Here is a simple version of my code:
set1 <- data.frame(x=c(1,2,4,6,2), y=c(4,6,3,56,4))
set2 <- data.frame(x=c(3,2,3,8,2), y=c(2,6,3,6,3))
mydata <- c("set1", "set2")
for (dataCount in 1:length(data)) {
lm(x~y, data=mydata)
}
How do I call a data.frame by name inside the function? Right now "data" obviously only returns the names of "mydata" as a character.
There are a number of ways of doing this. Your "native" way would be
mydata <- ls(pattern = "set")
for (dataCount in mydata) {
print(summary(lm(x~y, data=get(dataCount))))
}
or you could collate your data.frames into a list and work on that.
mylist <- list(set1, set2)
lapply(mylist, FUN = function(yourdata) {
print(summary(lm(x ~ y, data = yourdata)))
})
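A minimal sketch of the list approach that keeps the fitted models instead of only printing them. The names in the list and the variables models and coefs are illustrative choices, not part of the original answer.

```r
# Example data from the question
set1 <- data.frame(x = c(1, 2, 4, 6, 2), y = c(4, 6, 3, 56, 4))
set2 <- data.frame(x = c(3, 2, 3, 8, 2), y = c(2, 6, 3, 6, 3))

# A named list keeps track of which result belongs to which data set
mylist <- list(set1 = set1, set2 = set2)

# Fit one model per data frame and keep the fitted objects
models <- lapply(mylist, function(d) lm(x ~ y, data = d))

# Pull the coefficients out of every model (one column per data set)
coefs <- sapply(models, coef)
```

Because the list is named, models$set1 and models$set2 can be inspected later without refitting.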
I'm a bit puzzled by the following. Obviously, if I create a generic function with methods for tibbles, data.frames, and matrices, like this:
dummy_func <- function(data) {
UseMethod("dummy_func")
}
dummy_func.tbl_df <- function(data) {
data <- tibble_data
#do something down-stream
}
dummy_func.data.frame <- function(data) {
data <- data_frame_one
#do something down-stream
}
R will know which one to use since the variable data will be associated with that type. However, what happens if I have a list of a specific object:
dummy_func.tbl_df <- function(data_list) {
data_list <- list_of_tibble_data
#do something down-stream
}
Can R recognize that the data_list in this generic method is associated with a list of tibbles? And same with a list of data.frames, etc?
The short answer is "no", but it's easy to achieve the desired action by using an S3 dummy_func.list method that checks the contents of any list passed. For simplicity we will get it to just report the type of the contents of the list passed to it, but obviously you might want the conditional branches to have specific actions for different types.
dummy_func <- function(data) {
UseMethod("dummy_func")
}
dummy_func.list <- function(data) {
types <- sapply(data, function(x) class(x)[1])
if(all(types == "data.frame")) return("A list of data frames was passed")
if(all(types == "tbl_df")) return("A list of tibbles was passed")
stop("Need a list of all data frames or all tibbles")
}
So we can test it like this:
# Dummy tibble and dummy dataframe
my_tibble <- dplyr::tibble(a = 1:3)
my_df <- data.frame(a = 1:3)
# Dummy lists
my_tbl_list <- list(my_tibble, my_tibble)
my_df_list <- list(my_df, my_df)
my_mixed_list <- list(my_tibble, my_df)
# Test dispatch:
dummy_func(my_tbl_list)
#> [1] "A list of tibbles was passed"
dummy_func(my_df_list)
#> [1] "A list of data frames was passed"
dummy_func(my_mixed_list)
#> Error in dummy_func.list(my_mixed_list): Need a list of all data frames or all tibbles
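Another option, sketched here under the assumption that you control how the lists are built, is to tag the list with its own S3 class so that UseMethod() can dispatch on it directly. The constructor new_df_list and the class name df_list are hypothetical names made up for this illustration.

```r
# Hypothetical constructor: checks the contents once and tags the list
new_df_list <- function(x) {
  stopifnot(is.list(x), all(vapply(x, is.data.frame, logical(1))))
  structure(x, class = c("df_list", "list"))
}

dummy_func <- function(data) {
  UseMethod("dummy_func")
}

# Method chosen whenever the object carries the "df_list" class
dummy_func.df_list <- function(data) {
  paste("A df_list with", length(data), "data frames was passed")
}

my_df <- data.frame(a = 1:3)
result <- dummy_func(new_df_list(list(my_df, my_df)))
result
#> [1] "A df_list with 2 data frames was passed"
```

Note that is.data.frame() is also TRUE for tibbles, so a separate class would be needed to tell the two apart.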
I am trying to write a function in R that:
1) Receives a data frame and column name as parameters.
2) Performs an operation on the column in the data frame.
func <- function(col, df)
{
col = deparse(substitute(col))
print(paste("Levels: ", levels(df[[col]])))
}
func(Col1, DF)
func(Col2, DF)
mapply(func, colnames(DF)[1:2], DF)
Output
> func(Col1, DF)
[1] "Levels: GREEN" "Levels: YELLOW"
> func(Col2, DF)
[1] "Levels: 0.1" "Levels: 1"
> mapply(func, colnames(DF)[1:2], DF)
Error in `[[.default`(df, col) : subscript out of bounds
Two things:
In your function func, you apply deparse(substitute(col)) to col, which assumes col is passed as an unquoted name and not as a string. That is why func(Col1, DF) works. In your mapply() call, however, the argument colnames(...) is a character vector, so it creates an error. func('Col1', DF) fails to find the column for the same reason.
In a mapply() call, all arguments need to be vectors or lists. So you need to use list(df, df), or, if you don't want to replicate the data frame, remove the argument df from your function func.
This is one alternative that should work:
func <- function(col, df)
{
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, colnames(DF)[1:2], list(DF, DF))
Please have a look at the last comment by @demarsylvain. It may be a copy-paste error on your side. You should have done:
func <- function(col,df) {
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, c('Species', 'Species'), list(iris, iris))
you did:
func <- function(col) {
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, c('Species', 'Species'), list(iris, iris))
Please upvote and accept the solution by @demarsylvain; it works.
EDIT to address your comment:
To have a generic version for an arbitrary list of column names you can use this code, sorry for the loop :)
func <- function(col,df) {
print(paste("Levels: ", levels(df[,col])))
}
cnames = colnames(iris)
i <- 1
l = list()
while(i <= length(cnames)) {
l[[i]] <- iris
i <- i + 1
}
mapply(FUN = func, cnames, l)
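The replication loop can be avoided with the MoreArgs argument of mapply(), which passes the same extra arguments to every call. A minimal sketch, assuming it is acceptable for func to return the strings instead of printing them (a change made here only so the result can be inspected):

```r
func <- function(col, df) {
  paste("Levels:", levels(df[[col]]))
}

# MoreArgs hands the same data frame to every call, so no list of copies is needed
res <- mapply(FUN = func, colnames(iris),
              MoreArgs = list(df = iris), SIMPLIFY = FALSE)

res$Species
#> [1] "Levels: setosa"     "Levels: versicolor" "Levels: virginica"
```

Numeric columns have no levels, so their entries contain only the "Levels: " prefix.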
I want to build a function that uses another object whose name is related to the main object's name.
For example, the main object is 'VCU_Players' and the other object is 'VCU_Players_opp'.
In my function I need to use both objects in my calculations.
So I am trying to do
my_function<- function(x) {
y<-deparse(substitute(x))
z<-"_opp"
y<- paste(y,z,sep = "")
#My Calculations
x$newfield<- x$pts+ y$pts
Return(x)
}
Now I want to pass the object VCU_Players to the function
my_function(VCU_Players)
But the function doesn't find the VCU_Players_opp object
Consider passing string literals and using get() to retrieve corresponding object:
teams <- c("Team1", "Team2", "Team3", "Team4", "Team5", "Team6",
"Team7", "Team8", "Team9", "Team10", "Team11", "Team12")
my_function <- function(i) {
x <- get(paste0(i, "_players"))
y <- get(paste0(i, "_opp"))
# My Calculations
x$newfield <- x$pts + y$pts
return(x)
}
dfList <- lapply(teams, my_function)
Ideally, however, you would work with a few lists of many objects and not with many separate objects in your global environment. Try importing multiple objects from your data source (e.g., Excel) into single lists:
# Use list(), not c(): c() on data frames flattens them into a list of columns
teamdfs <- list(Team1_players, Team2_players, Team3_players, Team4_players, Team5_players, Team6_players,
Team7_players, Team8_players, Team9_players, Team10_players, Team11_players, Team12_players)
team_oppdfs <- list(Team1_opp, Team2_opp, Team3_opp, Team4_opp, Team5_opp, Team6_opp,
Team7_opp, Team8_opp, Team9_opp, Team10_opp, Team11_opp, Team12_opp)
my_function <- function(x, y) {
# My Calculations
x$newfield <- x$pts + y$pts
return(x)
}
dfList <- mapply(my_function, teamdfs, team_oppdfs, SIMPLIFY = FALSE)
# EQUIVALENT TO Map(my_function, teamdfs, team_oppdfs)
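If the separate objects already exist, mget() can gather them into named lists by name, which avoids typing every object out. A small sketch with two made-up teams; the data values are purely illustrative.

```r
# Illustrative data only: two teams with made-up points
Team1_players <- data.frame(pts = c(10, 12))
Team1_opp     <- data.frame(pts = c(8, 9))
Team2_players <- data.frame(pts = c(7, 15))
Team2_opp     <- data.frame(pts = c(11, 6))

teams <- c("Team1", "Team2")

# mget() looks up several objects by name and returns them as a named list
teamdfs     <- mget(paste0(teams, "_players"))
team_oppdfs <- mget(paste0(teams, "_opp"))

my_function <- function(x, y) {
  x$newfield <- x$pts + y$pts
  x
}

# Map() pairs the lists element by element and keeps the names of the first list
dfList <- Map(my_function, teamdfs, team_oppdfs)
```

The result is a named list, so dfList$Team1_players holds the first team's data frame with the new column.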
I want to build a function such that, once I supply data='name of data frame', there is no need to write variable=data$variable; just writing the variable name from the supplied data frame should serve the purpose.
myfunction<-function(variable,data)
{
result=sum(data)/sum(variable)
return(result)
}
For example, I have a data frame df
df<-data.frame(x=1:5,y=2:6,z=3:7,u=4:8)
I want to provide the following input
myfunction(variable=x,data=df)
instead of the input below
myfunction(variable=df$x,data=df)
We can use non-standard evaluation:
myfunction <- function(variable, data) {
var <- eval(substitute(variable), data)
result = sum(data)/sum(var)
return(result)
}
# Test
myfunction(variable = x, data = df)
#[1] 6
The with or attach functions can help you here, see the ?with and ?attach documentation. Alternatively, you can supply the variable name as a character and use this in the function body. I.e. you can do something like this:
myfunction2 <- function(variable, data) {
result <- sum(data)/sum(data[[variable]])
return(result)
}
df <- data.frame(x=1:5,y=2:6,z=3:7,u=4:8)
myfunction2("x", df)
#[1] 6
Yet another option is to use non-standard evaluation. A small example of this is something like:
myfunction3 <- function(variable, data) {
var.name <- deparse(substitute(variable))
result <- sum(data)/sum(data[[var.name]])
return(result)
}
myfunction3(variable = x, data = df)
#[1] 6
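For completeness, the with() route mentioned above needs no custom function at all. A minimal sketch using the same example data frame:

```r
df <- data.frame(x = 1:5, y = 2:6, z = 3:7, u = 4:8)

# with() evaluates the expression using df's columns as variables
result <- with(df, sum(df) / sum(x))
result
#[1] 6
```

This is convenient interactively; inside functions, passing the column name as a string is usually easier to reason about.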
I need to use the function below in a loop as I have hundreds of variables.
binning <- function (df,vars,by=0.1,eout=TRUE,verbose=FALSE) {
for (col in vars) {
breaks <- numeric(0)
if(eout) {
x <- boxplot(df[,col][!df[[col]] %in% boxplot.stats(df[[col]])$out],plot=FALSE)
non_outliers <- df[,col][df[[col]] <= x$stats[5] & df[[col]] >= x$stats[1]]
if (!(min(df[[col]])==min(non_outliers))) {
breaks <- c(breaks, min(df[[col]]))
}
}
breaks <- c(breaks, quantile(if(eout) non_outliers else df[[col]], probs=seq(0,1, by=by)))
if(eout) {
if (!(max(df[[col]])==max(non_outliers))) {
breaks <- c(breaks, max(df[[col]]))
}
}
return (cut(df[[col]],breaks=breaks,include.lowest=TRUE))
}}
It creates a variable with the binned score. The naming convention for the variable is "the original name" plus "_bin".
data$credit_amount_bin <- binning(data,"credit_amount",eout=FALSE)
I want the function to run for all the NUMERIC variables, store the converted bin variables in a different data frame, and name them "the original name" plus "_bin".
Any help would be highly appreciated.
Using your function, you could go via lapply, looping over all columns that are numeric.
# some data
dat0 <- data.frame(a=letters[1:10], x=rnorm(10), y=rnorm(10), z=rnorm(10))
# find all numeric by names
vars <- colnames(dat0)[which(sapply(dat0,is.numeric))]
# target data set
dat1 <- as.data.frame( lapply(vars, function(x) binning(dat0,x,eout=FALSE)) )
colnames(dat1) <- paste(vars, "_bin", sep="")
Personally, I would prefer having this function with vector input instead of data frame plus variable names. It might run more efficiently, too.
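A rough sketch of what such a vector-input version might look like. The name bin_vector is hypothetical, and the function is deliberately simplified: it uses quantile breaks only and leaves out the outlier handling behind eout = TRUE in the original.

```r
# Hypothetical vector-input variant, simplified to quantile breaks only
# (the outlier handling behind eout = TRUE is left out)
bin_vector <- function(x, by = 0.1) {
  # unique() guards against duplicated breaks when many values are tied
  breaks <- unique(quantile(x, probs = seq(0, 1, by = by)))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

set.seed(1)
dat0 <- data.frame(a = letters[1:10], x = rnorm(10), y = rnorm(10), z = rnorm(10))

# Select numeric columns, bin each one, and rename with the "_bin" suffix
num_cols <- vapply(dat0, is.numeric, logical(1))
dat1 <- as.data.frame(lapply(dat0[num_cols], bin_vector))
names(dat1) <- paste0(names(dat1), "_bin")
```

With vector input the function no longer needs to know about data frames or column names, so it can be applied directly with lapply().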