R: Saving function output to an object when using the assign function

I am currently trying to make my code DRYer by rewriting some parts as functions. One of the functions I am using is:
datasetperuniversity <- function(university, year) {
  assign(paste("data", university, sep = ""),
         subset(get(paste("originaldata", year, sep = "")),
                get(paste("allcollaboration", university, sep = "")) == 1))
}
Executing datasetperuniversity("Harvard","2000") would amount, inside the function, to something like this:
dataHarvard=subset(originaldata2000,allcollaborationHarvard==1)
The function runs nearly perfectly, except that it does not store the results in dataHarvard. I read that this is normal inside functions, and that using <<- instead of = could solve the issue. However, since I am using assign, that is not really possible: the = is only the effect of the assign call.
Here is some data:
sales = c(2, 3, 5,6)
numberofemployees = c(1, 9, 20,12)
allcollaborationHarvard = c(0, 1, 0,1)
originaldata = data.frame(sales, numberofemployees, allcollaborationHarvard)

Generally, it's best not to embed data/a variable into the name of an object. So instead of using assign to dataHarvard, make a list data with an element called "Harvard":
# enumerate unis, attaching names for lapply to use
unis = setNames(, "Harvard")
# make a table for each subset with lapply
data = lapply(unis, function(x)
originaldata[originaldata[[ paste0("allcollaboration", x) ]] == 1, ]
)
which gives
> data
$Harvard
  sales numberofemployees allcollaborationHarvard
2     3                 9                       1
4     6                12                       1
As seen here, you can use DF[["column name"]] to access a column instead of get as in the OP. Also, see the note in ?subset:
Warning
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
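A minimal sketch of the same idea wrapped in a function, assuming the data frame is passed in as an argument; the name subset_by_university is made up for illustration. By default, assign() inside a function writes into that function's own environment, which is why dataHarvard never shows up in the workspace. Returning the value and assigning it at the call site sidesteps that.

```r
# Sample data from the question
sales <- c(2, 3, 5, 6)
numberofemployees <- c(1, 9, 20, 12)
allcollaborationHarvard <- c(0, 1, 0, 1)
originaldata <- data.frame(sales, numberofemployees, allcollaborationHarvard)

# Hypothetical helper: return the subset instead of assign()-ing it
subset_by_university <- function(df, university) {
  flag_col <- paste0("allcollaboration", university)  # column name as a string
  df[df[[flag_col]] == 1, ]                           # standard subsetting, no get()
}

# The caller decides where the result is stored
dataHarvard <- subset_by_university(originaldata, "Harvard")
```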
Generally, it's also better not to embed data in column names if possible. If the allcollaboration* columns are mutually exclusive, they can be collapsed to a single categorical variable with values like "Harvard", "Yale", etc. Alternately, it might make sense to put the data in long form.
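As a rough illustration of collapsing the indicator columns, here is a sketch that assumes the allcollaboration* columns are mutually exclusive. The Yale column and its values are invented for the example and are not part of the question's data.

```r
# Hypothetical data: one 0/1 indicator column per university
df <- data.frame(
  sales = c(2, 3, 5, 6),
  allcollaborationHarvard = c(0, 1, 0, 1),
  allcollaborationYale    = c(1, 0, 1, 0)
)

# Find the indicator columns and, per row, the position of the one set to 1
flag_cols <- grep("^allcollaboration", names(df), value = TRUE)
winner <- max.col(df[flag_cols], ties.method = "first")

# Store the university as data rather than in a column name
df$university <- sub("^allcollaboration", "", flag_cols[winner])

# One table per university, kept together in a named list
by_university <- split(df[c("sales", "university")], df$university)
```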
For more guidance on arranging data, I recommend Hadley Wickham's tidy data paper.

Creating a simple for loop in R

I have a tibble called 'Volume' in which I store some data (10 columns - the first 2 columns are characters, 30 rows).
Now I want to calculate the volume in every column relative to column 3 of my tibble.
My current solution looks like this:
rel.Volume_unmod = tibble(
"Volume_OD" = Volume[[3]] / Volume[[3]],
"Volume_Imp" = Volume[[4]] / Volume[[3]],
"Volume_OD_1" = Volume[[5]] / Volume[[3]],
"Volume_WS_1" = Volume[[6]] / Volume[[3]],
"Volume_OD_2" = Volume[[7]] / Volume[[3]],
"Volume_WS_2" = Volume[[8]] / Volume[[3]],
"Volume_OD_3" = Volume[[9]] / Volume[[3]],
"Volume_WS_3" = Volume[[10]] / Volume[[3]])
rel.Volume_unmod
I would like to keep the tibble structure and the labels. I am sure there is a better solution for this, but I am relatively new to R, so it's not obvious to me. What I tried is something like this, but I can't actually run it:
rel.Volume = NULL
for(i in Volume[,3:10]){
rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
}
Mockup Data
Since you did not provide any data, I've followed your description to create some mockup data. Here:
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
GR = sample(LETTERS, 30, TRUE))
Volume[3:10] <- rnorm(30*8)
Solution with Dplyr
library(dplyr)
# rename columns [brute force]
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(Volume)[3:10] <- cols
# divide by Volume_OD
# (store the divisor first; across() can otherwise pick up the already-overwritten Volume_OD)
vol_od <- Volume$Volume_OD
rel.Volume_unmod <- Volume %>%
  mutate(across(all_of(cols), ~ . / vol_od))
# result
rel.Volume_unmod
Explanation
I don't know the names of your columns. Probably, the names correspond to the names of the columns you intended to create in rel.Volume_unmod. Anyhow, to avoid any problem I renamed the columns (kinda brutally). You can do it with dplyr::rename if you want to.
There are many ways to select the columns you want to mutate. mutate is a verb from dplyr that allows you to create new columns or perform operations or functions on columns.
across is an adverb from dplyr. Let's simplify by saying that it's a function that allows you to perform a function over multiple columns. In this case I want to divide by the original Volume_OD values, stored in vol_od.
~ is a tidyverse way to create anonymous functions. ~ . / vol_od is equivalent to function(x) x / vol_od.
all_of is necessary because in this specific case I'm providing across with a vector of characters. Without it, it will work anyway, but you will receive a warning because it's ambiguous and it may work incorrectly in some cases.
More info
Check out this book to learn more about data manipulation with tidyverse (which dplyr is part of).
Solution with Base-R
rel.Volume_unmod <- Volume
# rename columns
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(rel.Volume_unmod)[3:10] <- cols
# divide by column 3
rel.Volume_unmod[3:10] <- lapply(rel.Volume_unmod[3:10], `/`, rel.Volume_unmod[[3]])
rel.Volume_unmod
Explanation
lapply is a base R function that allows you to apply a function to every item of a list or a "listable" object.
In this case rel.Volume_unmod is a listable object: a dataframe is just a list of vectors with the same length. Therefore, lapply takes one column [= one item] at a time and applies a function.
The function is /. You usually see / used like this: A / B, but actually / is a Primitive function. You could write the same thing in this way:
`/`(A, B) # same as A / B
lapply can be provided with additional parameters that are passed directly to the function that is being applied over the list (in this case /). Therefore, we pass rel.Volume_unmod[[3]] as an additional parameter. Note the double brackets: [[3]] extracts the column as a plain vector, whereas [3] would return a one-column dataframe, and every result would then itself be a dataframe.
lapply always returns a list. But, since we are assigning the result of lapply to a "fraction of a dataframe", we will just edit the columns of the dataframe and, as a result, we will have a dataframe instead of a list.
Let me rephrase in a more technical way. When you write rel.Volume_unmod[3:10] <- lapply(...), you are not simply assigning a list to rel.Volume_unmod[3:10]. You are technically using this assignment function: [<-. This is a function that allows you to edit the items in a list/vector/dataframe. Specifically, [<- allows you to assign new items without modifying the attributes of the list/vector/dataframe. As I said before, a dataframe is just a list with specific attributes. So when you use [<- you modify the columns, but you leave the attributes (the class data.frame in this case) untouched. That's why the magic works.
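The points about lapply and [<- can be checked on a tiny example. This is only a sketch with made-up data; the names by_vector and by_frame are hypothetical. It also shows why the divisor is better extracted with [[ ]] (a plain vector) than with [ ] (a one-column dataframe).

```r
# Made-up data: one id column and two numeric columns
df <- data.frame(id = c("a", "b", "c"), x = c(2, 4, 8), y = c(1, 2, 4))

# Divisor as a plain vector: each result is a numeric vector,
# and [<- writes the columns back without touching the class
by_vector <- df
by_vector[2:3] <- lapply(df[2:3], `/`, df[[2]])

# Divisor as a one-column dataframe: each result is itself a dataframe
by_frame <- lapply(df[2:3], `/`, df[2])

class(by_vector)     # the container is still a data.frame
class(by_vector$y)   # plain numeric column
class(by_frame$y)    # a data.frame, usually not what you want in a column
```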
Without a minimal working example it's hard to guess what the variable Volume actually refers to. Apart from that, there seems to be a problem with your for-loop:
for(i in Volume[,3:10]){
Assuming Volume refers to a data.frame or tibble, this causes the actual column-vectors with indices between 3 and 10 to be assigned to i successively. You can verify this by putting print(i) inside the loop. But inside the loop it seems like you actually want to use i as a variable containing just the index of the current column as a number (not the column itself):
rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
Also, two brackets are usually used with lists, not data.frames or tibbles. (You can, however, do so, because data.frames are special cases of lists.)
Last but not least, initialising the variable rel.Volume with NULL will result in an error when you try to assign into it, since you haven't told R what rel.Volume should be.
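To make the difference between looping over columns and looping over positions concrete, here is a small sketch on made-up data (a plain data.frame with hypothetical columns V1 and V2):

```r
# Made-up data: two id columns and two numeric columns
Volume <- data.frame(ID = c("a", "b"), GR = c("A", "B"),
                     V1 = c(2, 4), V2 = c(1, 8))

# Iterating over the data frame hands you the columns themselves
for (col in Volume[, 3:4]) {
  print(col)  # a numeric vector each time, not an index
}

# Iterating over positions hands you numbers that work with [[ ]]
rel <- Volume[1:2]  # start from the id columns so the row count is fixed
for (i in 3:ncol(Volume)) {
  rel[[names(Volume)[i]]] <- Volume[[i]] / Volume[[3]]
}
rel
```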
Try this, if you like (thanks @Edo for the example data):
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
GR = sample(LETTERS, 30, TRUE),
Vol1 = rnorm(30),
Vol2 = rnorm(30),
Vol3 = rnorm(30))
rel.Volume <- Volume[1:2] # Assuming you want to keep the IDs.
# Your data.frame will need to have the correct number of rows here already.
for (i in 3:ncol(Volume)){ # ncol gives the total number of columns in data.frame
rel.Volume[i] = Volume[i]/Volume[3]
}
A more R-like approach would be to avoid using a for-loop altogether, since R's strength is implicit vectorization. These expressions will produce the same result without a loop:
# OK, this one messes up variable names...
rel.V.2 <- data.frame(sapply(X = Volume[3:5], FUN = function(x) x/Volume[3]))
rel.V.3 <- data.frame(Map(`/`, Volume[3:5], Volume[3]))
Since you said you were new to R, frankly I would recommend avoiding the Tidyverse packages while you are still learning the basics. In my experience, in the long run you're better off learning base R first and adding the "sugar" when you're more familiar with the core language. You can still learn to use Tidyverse functions later (but then, why would anybody? ;-) ).

Use Master dataframe to aggregate regression loop using rbind

All, I'm very new to R, and can't find anything in the existing questions database that fits my exact issue. I'm running a loop of several regressions (200), and am trying to bind the results/coefficients into a single dataframe that I can export to Excel, with one set of headers. All variables in each regression are identical. The regression part of my loop looks like,
getreg<-OutChg~USInput
stepreg<-lm(getreg,data=mydata)
I'm trying to use a "master" dataframe to bind everything together, such as,
master<-data.frame()
master<-rbind(master,stepreg$coefficients)
But I get the response Error in stepreg$coefficients : $ operator is invalid for atomic vectors. Ideally, I'd like to have something where I don't even have to define master<-data.frame().
Any advice is much appreciated. Thank you!
Try using getreg <- as.formula(OutChg ~ USInput), or just put the formula directly in the lm() call.
If you use str(stepreg) you will probably find that it is not a list but some other data type (in this case an atomic vector).
In order to use rbind(), the variable "master" has to exist already (as something).
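One way to read the advice above in base R is sketched below. It assumes every regression has the same two terms; the simulated data and the name coef_list are placeholders, not the asker's actual setup. The container is created once before the loop, each fit stores its coefficients, and rbind is called a single time at the end.

```r
set.seed(1)

# Create the container once, before the loop
coef_list <- vector("list", 5)

for (k in seq_along(coef_list)) {
  # Simulated stand-in data; replace with the real data for regression k
  USInput <- rnorm(100, 0, 5)
  OutChg  <- USInput * 5 + 10 + rnorm(100, 0, 5)
  mydata  <- data.frame(USInput, OutChg)

  stepreg <- lm(OutChg ~ USInput, data = mydata)
  coef_list[[k]] <- coef(stepreg)  # named numeric vector
}

# One row per regression, one column per coefficient
master <- as.data.frame(do.call(rbind, coef_list))

# write.csv(master, "coefficients.csv", row.names = FALSE)  # file can be opened in Excel
```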
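Using data.table
library(data.table)
datandfit <- function(x) {
  # simulate some data (skip these three lines if you already have data)
  USInput <- rnorm(100, 0, 5)
  OutChg <- USInput*5 + 10 + rnorm(100, 0, 5)
  mydata <- data.table(USInput, OutChg)
  # fit the model and return the coefficients as a one-row data.table
  stepreg <- lm(OutChg ~ USInput, data = mydata)
  data.table(t(stepreg$coefficients))
}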
This will generate some random data, fit a model to it, and return a data.table of the results. You would skip the first three lines of the function, since you would already have data. Then, you can lapply over the function, which will return a list of 200 data.tables, and use rbindlist to combine them all into one data.table.
rbindlist(lapply(1:200, datandfit))
     (Intercept)  USInput
  1:    9.979968 4.909842
  2:   10.086159 5.083225
  3:   10.285307 4.873432
  4:   10.457751 4.905266
  5:    9.108176 5.005555
 ---
196:   10.715356 4.846002
197:    9.938905 4.966180
198:    9.968473 5.073163
199:   10.098703 5.065169
200:    9.538539 4.946085
All, I finally figured this out! As a new user and non-programmer, figuring out how the different R objects work together is cumbersome, but using master<-list() before doing any of the "binding" got it to work...took me a minute or two to realize I can't have the loop designate the master as a list every time or it erases previous aggregation, too...thanks all for your help!

Use a string in R to refer to part of an object [duplicate]

I am at the point with R where I would like to start writing my own functions because I tend to need to do the same things over and over. However, I am struggling to see how I can generalize what I write. Looking at source code has not helped me learn very well because often it seems that .Internal or .Primitive functions (or other commands I do not know) are used extensively. I would like to simply start by turning my normal copy-pasted solutions into functions - fancier things can come later!
As an example: I do a lot of data formatting that requires doing some operation, and then filling in a data frame with zeros for all other combinations that did not have any data (e.g., years that did not have observations and were therefore not originally recorded, etc). I need to do this over and over for different data sets that have different sets of variables, but the idea and implementation is always the same.
My non-function way of solving this has been (for a specific implementation and minimal example):
df <- data.frame(County = c(1, 45, 57),
Year = c(2002, 2003, 2003),
Level = c("Mean", "Mean", "Mean"),
Obs = c(1.4, 1.9, 10.2))
#Create expanded version of data frame
Counties <- seq(from = 1, to = 77, by = 2)
Years <- seq(from = 1999, to = 2014, by = 1)
Levels <- c("Max", "Mean")
Expansion <- expand.grid(Counties, Years, Levels)
Expansion[4] <- 0
colnames(Expansion) <- colnames(df)
#Merge and order them so that the observed value is on top
df_full <- merge(Expansion, df, all = TRUE)
df_full$duplicate <- with(df_full,
paste(Year, County, Level))
df_full <- df_full[order(df_full$Year,
df_full$County,
df_full$Level,
-abs(df_full$Obs)), ]
#Deduplicate by taking the first that shows up (the observation)
df_full <- df_full[ !duplicated(df_full$duplicate), ]
df_full$duplicate <- NULL
I would like to generalize this so that I could somehow put in a data frame (and probably select the columns I need to order by since that sometimes changes) and then get the expanded version out. My first implementation consisted of a function with too many arguments (the data-frame and then all the column names I wanted to order/expand.grid by) and it also did not work:
gridExpand <- function(df, col1, col2=NULL, col3=NULL, measure){
#Started with "Expansion" being a global outside of the function
#It is identical the first part of the above code
ex <- merge(Expansion, df, all = TRUE)
ex$dupe <- with(ex,
paste(col1, col2, col3))
ex <- ex[order(with(ex,
col1, col2, col3, -abs(measure)))]
ex <- ex[ !duplicated(ex$dupe)]
ex <- subset(ex, select = -(dupe))
}
df_full <- gridExpand(df, Year, County, Level, Obs)
Error in paste(col1, col2, col3) : object 'Year' not found
I am assuming that this did not work because R has no way to know where 'Year' came from. I could potentially try paste(df, "$Year") but it would create "df$Year" which obviously will not work. And I do not ever see anyone else do this in their functions so clearly I am missing how it is that people reference things in data frame relevant functions.
I would ideally like to know of some resources that could help with thinking about generalization, or if someone can point me in the right direction to solving this particular problem I think it might help me see what I am doing wrong. I do not know of a better way to ask for help - I have been trying to read tutorials on writing functions for about 3 months and it is not clicking.
At a glance, the biggest thing that you can do is to not use non-standard-evaluation shortcuts inside your functions: things like $, subset() and with(). These are functions intended for convenient interactive use, not extensible programmatic use. (See, e.g., the Warning in ?subset which should probably be added to ?with, fortunes::fortune(312), fortunes::fortune(343).)
fortunes::fortune(312)
The problem here is that the $ notation is a magical shortcut and like
any other magic if used incorrectly is likely to do the programmatic
equivalent of turning yourself into a toad. -- Greg Snow (in
response to a user that wanted to access a column whose name is stored
in y via x$y rather than x[[y]])
R-help (February 2012)
fortunes::fortune(343)
Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R
newbie, think of R as your bank account: overuse of $-extraction can lead to undesirable
consequences. It's best to acquire the [[ and [ habit early.
-- Peter Ehlers (about the use of $-extraction)
R-help (March 2013)
When you start writing functions that work on data frames, if you need to reference column names you should pass them in as strings, and then use [ or [[ to get the column based on the string stored in a variable name. This is the simplest way to make functions flexible with user-specified column names. For example, here's a simple stupid function that tests if a data frame has a column of the given name:
does_col_exist_1 = function(df, col) {
  # looks for a column literally named "col"
  return(!is.null(df$col))
}
does_col_exist_2 = function(df, col) {
  # evaluates col and looks for a column with that name
  return(!is.null(df[[col]]))
  # equivalent to df[, col]
}
These yield:
does_col_exist_1(mtcars, col = "jhfa")
# [1] FALSE
does_col_exist_1(mtcars, col = "mpg")
# [1] FALSE
does_col_exist_2(mtcars, col = "jhfa")
# [1] FALSE
does_col_exist_2(mtcars, col = "mpg")
# [1] TRUE
The first function is wrong because $ doesn't evaluate what comes after it. No matter what value I set col to when I call the function, df$col will look for a column literally named "col". The brackets, however, will evaluate col and see "oh hey, col is set to 'mpg', let's look for a column of that name."
If you want lots more understanding of this issue, I'd recommend the Non-Standard Evaluation Section of Hadley Wickham's Advanced R book.
I'm not going to re-write and debug your functions, but if I wanted to, my first step would be to remove all $, with(), and subset(), replacing them with [. There's a pretty good chance that's all you need to do.
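For what it's worth, here is one possible shape the function could take once column names are passed as strings. This is a sketch, not a drop-in replacement: the name grid_expand and its arguments (expansion, key_cols, measure) are invented for illustration, and it assumes the expansion grid already has the same column names as the data.

```r
# Data and expansion grid from the question (grid built with matching names)
df <- data.frame(County = c(1, 45, 57),
                 Year = c(2002, 2003, 2003),
                 Level = c("Mean", "Mean", "Mean"),
                 Obs = c(1.4, 1.9, 10.2))
expansion <- expand.grid(County = seq(from = 1, to = 77, by = 2),
                         Year = seq(from = 1999, to = 2014, by = 1),
                         Level = c("Max", "Mean"),
                         stringsAsFactors = FALSE)
expansion$Obs <- 0

# Hypothetical rewrite: key_cols and measure are character strings
grid_expand <- function(df, expansion, key_cols, measure) {
  ex <- merge(expansion, df, all = TRUE)
  # Order by the key columns, then largest absolute measure first
  ord <- do.call(order, c(unname(as.list(ex[key_cols])),
                          list(-abs(ex[[measure]]))))
  ex <- ex[ord, ]
  # One key per row; keep the first row of each key (the observation, if any)
  key <- do.call(paste, unname(as.list(ex[key_cols])))
  ex[!duplicated(key), ]
}

df_full <- grid_expand(df, expansion, c("Year", "County", "Level"), "Obs")
```

Because the columns arrive as strings, the same function should work for other data sets by changing only the arguments.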

Loop and clear the basic function in R

I've got this dataset
install.packages("combinat")
install.packages("quantmod")
library(quantmod)
library(combinat)
library(utils)
getSymbols("AAPL",from="2012-01-01")
data<-AAPL
p1<-4
dO<-data[,1]
dC<-data[,4]
emaO<-EMA(dO,n=p1)
emaC<-EMA(dC,n=p1)
Pos_emaO_dO_UP<-emaO>dO
Pos_emaO_dO_D<-emaO<dO
Pos_emaC_dC_UP<-emaC>dC
Pos_emaC_dC_D<-emaC<dC
Pos_emaC_dO_D<-emaC<dO
Pos_emaC_dO_UP<-emaC>dO
Pos_emaO_dC_UP<-emaO>dC
Pos_emaO_dC_D<-emaO<dC
Profit_L_1<-((lag(dC,-1)-lag(dO,-1))/(lag(dO,-1)))*100
Profit_L_2<-(((lag(dC,-2)-lag(dO,-1))/(lag(dO,-1)))*100)/2
Profit_L_3<-(((lag(dC,-3)-lag(dO,-1))/(lag(dO,-1)))*100)/3
Profit_L_4<-(((lag(dC,-4)-lag(dO,-1))/(lag(dO,-1)))*100)/4
Profit_L_5<-(((lag(dC,-5)-lag(dO,-1))/(lag(dO,-1)))*100)/5
Profit_L_6<-(((lag(dC,-6)-lag(dO,-1))/(lag(dO,-1)))*100)/6
Profit_L_7<-(((lag(dC,-7)-lag(dO,-1))/(lag(dO,-1)))*100)/7
Profit_L_8<-(((lag(dC,-8)-lag(dO,-1))/(lag(dO,-1)))*100)/8
Profit_L_9<-(((lag(dC,-9)-lag(dO,-1))/(lag(dO,-1)))*100)/9
Profit_L_10<-(((lag(dC,-10)-lag(dO,-1))/(lag(dO,-1)))*100)/10
which are given to this frame
frame<-data.frame(Pos_emaO_dO_UP,Pos_emaO_dO_D,Pos_emaC_dC_UP,Pos_emaC_dC_D,Pos_emaC_dO_D,Pos_emaC_dO_UP,Pos_emaO_dC_UP,Pos_emaO_dC_D,Profit_L_1,Profit_L_2,Profit_L_3,Profit_L_4,Profit_L_5,Profit_L_6,Profit_L_7,Profit_L_8,Profit_L_9,Profit_L_10)
colnames(frame)<-c("Pos_emaO_dO_UP","Pos_emaO_dO_D","Pos_emaC_dC_UP","Pos_emaC_dC_D","Pos_emaC_dO_D","Pos_emaC_dO_UP","Pos_emaO_dC_UP","Pos_emaO_dC_D","Profit_L_1","Profit_L_2","Profit_L_3","Profit_L_4","Profit_L_5","Profit_L_6","Profit_L_7","Profit_L_8","Profit_L_9","Profit_L_10")
There is a vector with the variable names for later use
vector<-c("Pos_emaO_dO_UP","Pos_emaO_dO_D","Pos_emaC_dC_UP","Pos_emaC_dC_D","Pos_emaC_dO_D","Pos_emaC_dO_UP","Pos_emaO_dC_UP","Pos_emaO_dC_D")
I made all possible combinations of 4 variables from the vector (there are no dependent variables)
comb<-as.data.frame(combn(vector,4))
comb
and removed the "nonsense" combinations (those containing both possible values of the same variable)
rc<-comb[!sapply(comb, function(x) any(duplicated(sub('_D|_UP', '', x))))]
rc
Then I prepare the first combination for later subsetting
var<-paste(rc[,1],collapse=" & ")
var
and subset the frame (with all DVs)
kr<-eval(parse(text=paste0('subset(frame,' , var,')' )))
kr
Now I have the df subsetted by the first combination of 4 variables.
Then I used the evaluation function on it
evaluation<-function(x){
s_1<-nrow(x[x$Profit_L_1>0,])/nrow(x)
s_2<-nrow(x[x$Profit_L_2>0,])/nrow(x)
s_3<-nrow(x[x$Profit_L_3>0,])/nrow(x)
s_4<-nrow(x[x$Profit_L_4>0,])/nrow(x)
s_5<-nrow(x[x$Profit_L_5>0,])/nrow(x)
s_6<-nrow(x[x$Profit_L_6>0,])/nrow(x)
s_7<-nrow(x[x$Profit_L_7>0,])/nrow(x)
s_8<-nrow(x[x$Profit_L_8>0,])/nrow(x)
s_9<-nrow(x[x$Profit_L_9>0,])/nrow(x)
s_10<-nrow(x[x$Profit_L_10>0,])/nrow(x)
n_1<-nrow(x[x$Profit_L_1>0,])/nrow(frame)
n_2<-nrow(x[x$Profit_L_2>0,])/nrow(frame)
n_3<-nrow(x[x$Profit_L_3>0,])/nrow(frame)
n_4<-nrow(x[x$Profit_L_4>0,])/nrow(frame)
n_5<-nrow(x[x$Profit_L_5>0,])/nrow(frame)
n_6<-nrow(x[x$Profit_L_6>0,])/nrow(frame)
n_7<-nrow(x[x$Profit_L_7>0,])/nrow(frame)
n_8<-nrow(x[x$Profit_L_8>0,])/nrow(frame)
n_9<-nrow(x[x$Profit_L_9>0,])/nrow(frame)
n_10<-nrow(x[x$Profit_L_10>0,])/nrow(frame)
pr_1<-sum(kr[,"Profit_L_1"])/nrow(kr[,kr=="Profit_L_1"])
pr_2<-sum(kr[,"Profit_L_2"])/nrow(kr[,kr=="Profit_L_2"])
pr_3<-sum(kr[,"Profit_L_3"])/nrow(kr[,kr=="Profit_L_3"])
pr_4<-sum(kr[,"Profit_L_4"])/nrow(kr[,kr=="Profit_L_4"])
pr_5<-sum(kr[,"Profit_L_5"])/nrow(kr[,kr=="Profit_L_5"])
pr_6<-sum(kr[,"Profit_L_6"])/nrow(kr[,kr=="Profit_L_6"])
pr_7<-sum(kr[,"Profit_L_7"])/nrow(kr[,kr=="Profit_L_7"])
pr_8<-sum(kr[,"Profit_L_8"])/nrow(kr[,kr=="Profit_L_8"])
pr_9<-sum(kr[,"Profit_L_9"])/nrow(kr[,kr=="Profit_L_9"])
pr_10<-sum(kr[,"Profit_L_10"])/nrow(kr[,kr=="Profit_L_10"])
mat<-matrix(c(s_1,n_1,pr_1,s_2,n_2,pr_2,s_3,n_3,pr_3,s_4,n_4,pr_4,s_5,n_5,pr_5,s_6,n_6,pr_6,s_7,n_7,pr_7,s_8,n_8,pr_8,s_9,n_9,pr_9,s_10,n_10,pr_10),ncol=3,nrow=10,dimnames=list(c(1:10),c("s","n","pr")))
df<-as.data.frame(mat)
return(df)
}
result<-evaluation(kr)
result
I need help with several things.
1. In the evaluation function, the matrix is built the wrong way: s_1, n_1, pr_1 all end up in the first column, but I need the values filled in by rows.
2. I need some loop/lapply construct to go through all possible combinations (not only the first one, as in var<-paste(rc[,1],collapse=" & ") above), with readable output: the evaluation function applied to every combination, so that I can see which combination of variables each evaluation belongs to and compare the results across combinations.
3. This is not the main point, but in general I want to evaluate all possible combinations (that is, for 2:n variables, and all combinations within each size) and then pick the best combination according to a specific DV (Profit_L_1, Profit_L_2 and so on). I am still weak at looping, so if possible, keep in mind what I am going to do with this later.
Thanks. Feel free to update, repair or improve the question; if something could be done more easily or efficiently, go ahead. I am open to any sensible advice.
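A possible direction for points 1 and 2, sketched on toy data rather than the AAPL series: the columns A_UP, A_D, B_UP, B_D and the two profit columns are invented stand-ins for frame. Building the result as a data frame with one column per statistic avoids the fill-order problem; with matrix(), passing byrow = TRUE would be the alternative. The combinations are kept in a named list so each result can be traced back to its variables.

```r
# Toy stand-in for `frame` (hypothetical data)
set.seed(42)
n <- 50
frame <- data.frame(A_UP = runif(n) > 0.5,
                    B_UP = runif(n) > 0.5,
                    Profit_L_1 = rnorm(n),
                    Profit_L_2 = rnorm(n))
frame$A_D <- !frame$A_UP
frame$B_D <- !frame$B_UP

signals <- c("A_UP", "A_D", "B_UP", "B_D")
profits <- c("Profit_L_1", "Profit_L_2")

# One row per profit column, one column per statistic
evaluation <- function(x, total_rows) {
  data.frame(
    s  = sapply(profits, function(p) mean(x[[p]] > 0)),
    n  = sapply(profits, function(p) sum(x[[p]] > 0) / total_rows),
    pr = sapply(profits, function(p) mean(x[[p]])),
    row.names = profits
  )
}

# All pairs of signals, minus pairs holding both directions of one signal
combos <- combn(signals, 2, simplify = FALSE)
combos <- Filter(function(cmb) !any(duplicated(sub("_D$|_UP$", "", cmb))), combos)
names(combos) <- sapply(combos, paste, collapse = " & ")

# Evaluate every combination; the list names say which one is which
results <- lapply(combos, function(cmb) {
  keep <- Reduce(`&`, frame[cmb])  # rows where all signals are TRUE
  evaluation(frame[keep, , drop = FALSE], nrow(frame))
})
```

With the real series, the lagged and EMA columns contain NA values, so the means would probably need na.rm = TRUE.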
