I am trying to solve the following constrained maximization problem.
The example here is simply me trying to recreate a simple example.
I have a dataframe as follows:
Obs=c(1,2,3,4,5)
Var1=c(11,15,16,19,20)
Var2=c(1.5,22,0.9,1.7,.1)
Var3=c(2.6,2.5,3.5,3.6,2.1)
Value_One = c(10,12.5,8.4,7.5,2.6)
Cost = c(1.1,1.2,1.3,1.6,1.7)
Value_overall = c(10,21,31,4,29)
df=data.frame(Obs,Var1,Var2,Var3,Value_One,Cost,Value_overall)
var_sel=c('Var1','Var2')
coeff_sel=c(2.5,4.5)
gamma=.7
I have to run a constrained optimization problem, an example of which is as follows (note that the exact values do not matter; please feel free to change them as you please):
Value_func = function(x){
Value_var=x$Cost
# - since the contrained optimum function is for minima.
-((x$Value_overall+gamma*(x$Value_null-
(as.matrix(x[var_sel])%*%(as.matrix(coeff_sel)))))-2*x[Cost])
}
#Please feel free to change the values below.
#I just want to know where I am going wrong. The exact values do not matter here.
for (i2 in 1:nrow(df)){
x=df[i2,]
zzz=constrOptim(-1.2, Value_func, NULL,ui=1,ci=-1.3)
}
What I want to do is to run the above for each row of the dataframe. When I run the above example, I get the following error:
Error: $ operator is invalid for atomic vectors
Called from: f(theta, ...)
I tried to look for a solution and this is what I got but it does not seem to be applicable in my case (R $ operator is invalid for atomic vectors in constraOptim).
Please help. Thanks in advance.
This runs without error. Changes to your code include:
Added Value_null to the data.frame, since the function refers to it.
Changed the function argument to costs, the value being optimized, and replaced x[Cost] with costs in the term after the matrix product.
Saved the results in zzz as a list instead of overwriting a single object on each iteration.
If you design this as a matrix in the first place, you could use apply.
df <- data.frame(Obs=c(1,2,3,4,5)
,Var1=c(11,15,16,19,20)
,Var2=c(1.5,22,0.9,1.7,.1)
,Var3=c(2.6,2.5,3.5,3.6,2.1)
,Value_One = c(10,12.5,8.4,7.5,2.6)
,Cost = c(1.1,1.2,1.3,1.6,1.7)
,Value_overall = c(10,21,31,4,29)
#added to match
,Value_null = 5
)
var_sel=c('Var1','Var2')
coeff_sel=c(2.5,4.5)
gamma=.7
Value_func = function(costs){
# Negated, since constrOptim() minimizes the objective.
-((x$Value_overall+gamma*(x$Value_null-
(as.matrix(x[var_sel])%*%(as.matrix(coeff_sel)))))-2*costs)
}
# zzz must exist before elements can be assigned into it
zzz=list()
for (i2 in 1:nrow(df)){
x=df[i2,]
zzz[[i2]]=constrOptim(1, Value_func, NULL,ui=1,ci=-1.3, x$Cost)
}
Or the apply approach. I don't like that I'm assigning x <<- z but it gives results.
Value_func = function(costs){
# Negated, since constrOptim() minimizes the objective.
-((x['Value_overall']+gamma*(x['Value_null']-
(x[var_sel]%*%(coeff_sel))))-2*costs)
}
apply(df, 1, function(z) {
x<<- z
constrOptim(1, Value_func, NULL, ui = 1, ci = -1.3, z['Cost'])
}
)
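A possible way to avoid the global x, offered only as a sketch and not as part of the answer above: constrOptim() forwards extra named arguments to the objective and gradient, so the current row can be passed in explicitly. The names value_func, value_grad and obs are hypothetical. The analytic gradient is an addition that lets constrOptim() use BFGS rather than one-dimensional Nelder-Mead, and ui is written as a 1x1 matrix on the assumption that the gradient code path needs a matrix.

```r
df <- data.frame(
  Var1 = c(11, 15, 16, 19, 20),
  Var2 = c(1.5, 22, 0.9, 1.7, 0.1),
  Value_overall = c(10, 21, 31, 4, 29),
  Value_null = 5
)
var_sel <- c("Var1", "Var2")
coeff_sel <- c(2.5, 4.5)
gamma <- 0.7

# Objective in `costs`; `obs` (hypothetical name) is a one-row data frame
# that constrOptim() passes through its `...` argument.
value_func <- function(costs, obs) {
  linear_part <- sum(unlist(obs[var_sel]) * coeff_sel)
  -((obs$Value_overall + gamma * (obs$Value_null - linear_part)) - 2 * costs)
}

# Derivative of the objective with respect to `costs` (constant here).
value_grad <- function(costs, obs) 2

# One constrained fit per row; the constraint is costs >= -1.3.
fits <- lapply(seq_len(nrow(df)), function(i) {
  constrOptim(theta = 1, f = value_func, grad = value_grad,
              ui = matrix(1, nrow = 1, ncol = 1), ci = -1.3,
              obs = df[i, ])
})
pars <- sapply(fits, function(fit) fit$par)
pars
```

Because the objective is increasing in costs, each fit should move from the starting value towards the lower bound of the constraint.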
I'm still new to writing my own functions. As an exercise, and because I use it a lot, I want to write a flexible function to easily reverse survey response scales. This is what I came up with:
rev_scale = function(var, new_var, scale){
for (i in 1:length(abs(var))){
new_var[i] = scale-abs(var[i])+1
}
}
Info on code
var = variable I want to reverse.
new_var = new column with the reversed variable
scale = how many points in the scale (eg. 5 for a 5-point scale)
The reason why I use 'abs' instead of just 'var' is that some dataframes also return value-labels, and I only want the values in this function.
Question
When applying this new function on a variable, R returns "NULL". However, if I run the for-loop separately, with the arguments 'imputed', my new variable is properly reversed.
Any ideas on what is happening here?
Thanks in advance!
### Example of the (working) for-loop with arguments 'imputed' ###
df <- data.frame(matrix(ncol = 1, nrow = 4))
df$var = c(1,2,3,4)
for (i in 1:length(abs(df$var))){
df$var_rev[i] = 4-abs(df$var[i])+1
}
df$var_rev
OUTPUT:
[1] 4 3 2 1
R does not use reference-variables (think pointers)*. So your new_var outside of your function does not get updated when referred to inside a function. Instead, R creates a new copy of new_var and updates that.
You should instead return the new value from your function. I.e.
rev_scale = function(var, scale){
res <- vector('numeric', length(var))
for (i in 1:length(abs(var))){
res[i] = scale-abs(var[i])+1
}
return(res)
}
Also note that I have removed new_var from the function's arguments. In other words, I have completely separated the function's input arguments from its output.
The reason you get NULL from the function is that in R every function returns something. If nothing is specified, the function returns the value of its last evaluated expression. When that last expression is a loop (for, while, repeat), its value is NULL, so the function returns NULL.
* There are a couple of exceptions and work-arounds, but I will not go into that here.
Edit:
As benimwolfspelz noted, you do not need to explicitly iterate over each element in var, as R does this implicitly. Your entire function could be reduced to:
rev_scale = function(var, scale) {
scale-abs(var)+1
}
Secondly, in your for-loop, you can simplify length(abs(var)) to length(var), as abs(var) does not change the length of the vector.
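The return-value behaviour described above can be checked with a small sketch. The helper names loop_last and value_last are hypothetical and exist only for this illustration; rev_scale is the vectorised version from the answer.

```r
# A function whose last expression is a for loop returns NULL.
loop_last <- function(v) {
  for (i in seq_along(v)) v[i] <- v[i] * 2
}

# The same loop, followed by the object to return.
value_last <- function(v) {
  for (i in seq_along(v)) v[i] <- v[i] * 2
  v
}

# Vectorised reversal of a `scale`-point response scale.
rev_scale <- function(var, scale) scale - abs(var) + 1

df <- data.frame(var = c(1, 2, 3, 4))
df$var_rev <- rev_scale(df$var, scale = 4)

is.null(loop_last(1:3))  # TRUE
value_last(1:3)          # 2 4 6
df$var_rev               # 4 3 2 1
```

Assigning the returned vector to df$var_rev keeps the function free of side effects, which is the separation of input and output mentioned above.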
I have been trying to analyse the relationship between the cost parameter, C, and the gamma parameter using the ksvm function from the kernlab package. The program I have written is as follows:
function (data)
{
library(kernlab)
p<-ncol(data)
y<-data[,p]
x<-data[,-p]
Rad.gamma<-matrix(seq(exp(-10),exp(1),length=20))
Con.c<-matrix(c(0.1,0.5,1.5),nrow=1)
mat<-expand.grid(Rad.gamma,Con.c)
Output<-data.frame(0,nrow=80,ncol=2)
for(i in 1:80)
{
Gamma<-mat[i,1]
CC<-mat[i,2]
Svm<-ksvm(y~.,data=as.data.frame(x),
kernel="rbfdot",kpar=list(sigma=Gamma),
cross=5, C=CC, type='C-svc',prod.model=FALSE)
Output[i,1]<-error(Svm)
Output[i,2]<-cross(svm)
Output[i,3]<-nSV(svm)/nrow(data)
}
Output<-data.frame(Output)
results<-cbind(mat,Output)
colnames(results)<-c("C","Train","Cross","SVs")
results
}
The error I obtain is:
Error in votematrix[i, ret < 0] <- votematrix[i, ret < 0] + 1 :
NAs are not allowed in subscripted assignments
I have attempted to check stackoverflow for a solution but the best answer I could find is that data.frame needs to come before cbind when there are missing values. I have been testing this function with the iris data set and there are no missing values. I would like to plot the results and analyze the patterns of the output matrix's contents; that should be simple enough. The problem is getting the results table to use for the plotting.
Any help would be greatly appreciated.
The mat made by expand.grid has 60 rows (20 gamma values times 3 cost values), but your loop tries to index up to row 80. This should work:
data(iris)
library(kernlab)
p<-ncol(iris)
y<-iris[,p]
x<-iris[,-p]
Rad.gamma<-matrix(seq(exp(-10),exp(1),length=20))
Con.c<-matrix(c(0.1,0.5,1.5),nrow=1)
mat<-expand.grid(Rad.gamma,Con.c)
Output<-matrix(NA_real_,nrow=nrow(mat),ncol=3)
for(i in 1:60){
Gamma<-mat[i,1]
CC<-mat[i,2]
Svm<-ksvm(y~.,data=as.data.frame(x),
kernel="rbfdot",kpar=list(sigma=Gamma),
cross=5, C=CC, type='C-svc',prob.model=FALSE)
Output[i,1]<-error(Svm)
Output[i,2]<-cross(Svm)
Output[i,3]<-nSV(Svm)/nrow(iris)
}
Output<-data.frame(Output)
results<-cbind(mat,Output)
colnames(results)<-c("C","Train","Cross","SVs")
results
Additionally, results has 5 columns, so perhaps:
colnames(results)<-c("gamma", "C","Train", "Cross","SVs")
May I also suggest using apply instead of the for loop? In this case you need not worry about where to store the results:
out = apply(mat, 1, function(p){
Gamma<-p[1]
CC<-p[2]
Svm<-ksvm(y~.,data=as.data.frame(x),
kernel="rbfdot",kpar=list(sigma=Gamma),
cross=5, C=CC, type='C-svc',prob.model=FALSE)
out = data.frame(error(Svm), cross(Svm), nSV(Svm)/nrow(iris))
colnames(out) = c("train", "Cross","SVs")
return(out)
})
out = do.call(rbind, out)
out = data.frame(mat, out)
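To avoid hard-coding the number of rows, the loop bounds can be taken from the grid itself. The following is a minimal sketch of that idea only: fit_one is a hypothetical stand-in for the ksvm() fit and returns made-up numbers, so that the example runs without kernlab.

```r
gammas <- seq(exp(-10), exp(1), length.out = 20)
costs <- c(0.1, 0.5, 1.5)

# Every combination of gamma and C: 20 * 3 = 60 rows.
grid <- expand.grid(gamma = gammas, C = costs)

# Hypothetical stand-in for the ksvm() call; the values are NOT SVM results.
fit_one <- function(gamma, C) c(Train = gamma * C, Cross = gamma + C)

# One call per grid row; mapply() returns a 2 x 60 matrix, so transpose it.
scores <- t(mapply(fit_one, grid$gamma, grid$C))
results <- cbind(grid, scores)

nrow(grid)
names(results)
```

Replacing the body of fit_one with the real model fit would leave the rest of the code unchanged, whatever the size of the grid.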
I'm attempting to create a sigma/summation function with the variables in my dataset that looks like this:
paste0("(choose(",zipdistrib$Leads[1],",",zipdistrib$Starts[1],")*beta(a+",zipdistrib$Starts[1],",b+",zipdistrib$Leads[1],"-",zipdistrib$Starts[1],")/beta(a,b))")
When I enter that code, I get
[1] "(choose(9,6)*beta(a+6,b+9-6)/beta(a,b))"
I want to create a sigma/summation function where a and b are unknown free-floating variables and the values of Leads[i] and Starts[i] are determined by the values for Leads and Starts for observation i in my dataset. I have tried using a sum function in conjunction with mapply and sapply to no avail. Currently, I am taking the tack of creating the function as a string using a for loop in conjunction with a paste0 command so that the only things that change are the values of the variables Leads and Starts. Then, I try coercing the result into a function. To my surprise, I can actually enter this code without creating a syntax error, but when I try to optimize the function for variables a and b, I'm not having success.
Here's my attempt to create the function out of a string.
betafcn <- function (a,b) {
abfcnstring <-
for (i in 1:length(zipdistrib$Zip5))
toString(
paste0(" (choose(",zipdistrib$Leads[i],",",zipdistrib$Starts[i],")*beta(a+",zipdistrib$Starts[i],",b+",zipdistrib$Leads[i],"-",zipdistrib$Starts[i],")/beta(a,b))+")
)
as.function(
as.list(
substr(abfcnstring, 1, nchar(abfcnstring)-1)
)
)
}
Then when I try to optimize the function for a and b, I get the following:
optim(c(a=.03, b=100), betafcn(a,b))
## Error in as.function.default(x, envir) :
argument must have length at least 1
Is there a better way for me to compile a sigma from i=1 to length of dataset with mapply or lapply or some other *apply function? Or am I stuck using a dreaded for loop? And then once I create the function, how do I make sure that I can optimize for a and b?
Update
This is what my dataset would look like:
leads <-c(7,4,2)
sales <-c(3,1,0)
zipcodes <-factor(c("11111", "22222", "33333"))
zipleads <-data.frame(ZipCode=zipcodes, Leads=leads, Sales=sales)
zipleads
## ZipCode Leads Sales
# 1 11111 7 3
# 2 22222 4 1
# 3 33333 2 0
My goal is to create a function that would look something like this:
betafcn <-function (a,b) {
(choose(7,3)*beta(a+3,b+7-3)/beta(a,b))+
(choose(4,1)*beta(a+1,b+4-1)/beta(a,b))+
(choose(2,0)*beta(a+0,b+2-0)/beta(a,b))
}
The difference is that I would ideally like to replace the dataset values with any other possible vectors for Leads and Sales.
Since R vectorizes most of its operations by default, you can write an expression in terms of single values of a and b (which will automatically be recycled to the length of the data) and vectors of x and y (i.e., Leads and Sales); if you compute on the log scale, then you can use sum() (rather than prod()) to combine the results. Thus I think you're looking for something like:
betafcn <- function(a,b,x,y,log=FALSE) {
r <- lchoose(x,y)+lbeta(a+y,b+x-y)-lbeta(a,b)
if (log) r else exp(r)
}
Note that (1) optim() minimizes by default, and (2) if you're trying to optimize a likelihood you're better off optimizing the log-likelihood instead ...
Since all of the internal functions (+, lchoose, lbeta) are vectorized, you should be able to apply this across the whole data set via:
zipleads <- data.frame(Leads=c(7,4,2),Sales=c(3,1,0))
objfun <- function(p) { ## negative log-likelihood
-sum(betafcn(p[1],p[2],zipleads$Leads,zipleads$Sales,
log=TRUE))
}
objfun(c(1,1))
optim(fn=objfun,par=c(1,1))
I got crazy answers for this example (extremely large values of both shape parameters), but I think that's because it's awfully hard to fit a two-parameter model to three data points!
Since the shape parameters of the beta-binomial (which is what this appears to be) have to be positive, you might run into trouble with unconstrained optimization. You can use method="L-BFGS-B", lower=c(0,0) or optimize the parameters on the log scale ...
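Optimizing on the log scale could look roughly like the sketch below. It follows the formula from the question, choose(n, k) * beta(a + k, b + n - k) / beta(a, b), and the names log_term and nll_log are hypothetical. With only three observations the fitted values should not be taken seriously.

```r
leads <- c(7, 4, 2)
sales <- c(3, 1, 0)

# Log of choose(n, k) * beta(a + k, b + n - k) / beta(a, b).
log_term <- function(a, b, n, k) {
  lchoose(n, k) + lbeta(a + k, b + n - k) - lbeta(a, b)
}

# Negative log-likelihood parameterised by log(a) and log(b),
# so that a and b stay positive without explicit bounds.
nll_log <- function(log_par) {
  a <- exp(log_par[1])
  b <- exp(log_par[2])
  -sum(log_term(a, b, n = leads, k = sales))
}

fit <- optim(par = c(0, 0), fn = nll_log)
exp(fit$par)  # back-transformed shape parameters
```

The back-transformation exp(fit$par) recovers a and b. The alternative mentioned above, method="L-BFGS-B" with lower bounds, keeps the parameters on their original scale instead.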
I thought your example was hopelessly complex. If you are going to attempt making a function by pasting character values, you first need to understand how to make a function body from an unevaluated expression; once that basic task is understood, you can elaborate ... if in fact it is necessary, noting BenBolker's suggestions.
choosefcn <- function (a,b) {}
txtxpr <- paste0("choose(",9,",",6,")" )
body(choosefcn) <- parse(text= txtxpr)
#----------
choosefcn
#function (a, b)
#choose(9, 6)
val1 <- "a"
val2 <- "b"
txtxpr <- paste0("choose(", val1, ",", val2, ")" )
body(choosefcn) <- parse(text= txtxpr)
#
choosefcn
#function (a, b)
#choose(a, b)
It is also possible to configure the formal arguments separately with the formals<- function. See each of these help pages:
?formals
?body
?'function' # needs to be quoted
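If the goal is only a function of a and b that sums over whatever Leads and Sales vectors are supplied, a closure may be simpler than pasting text. This is a sketch of one such alternative, not the approach shown above; make_betafcn is a hypothetical name.

```r
# Returns a function of (a, b) that remembers the data it was built with.
make_betafcn <- function(leads, sales) {
  force(leads)
  force(sales)
  function(a, b) {
    sum(choose(leads, sales) * beta(a + sales, b + leads - sales) / beta(a, b))
  }
}

betafcn <- make_betafcn(leads = c(7, 4, 2), sales = c(3, 1, 0))
betafcn(1, 1)
```

Calling make_betafcn() again with different vectors yields a new function without any string manipulation.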
I have written this function
tippett <- function(p1, p2) {
t <- 1 - (1 - min(p1, p2))^2
return(list(p.value=t))
}
and I would like to apply it in a matrix of generated numbers horizontally. For instance in this matrix: z <- matrix(c(rbeta(5, 1, 5), rbeta(5, 1, 10)), ncol=2).
Unfortunately apply does not seem to work and using it in this fashion s<-apply(z,1,tippett)
generates this error
Error in FUN(newX[, i], ...) : argument "p2" is missing, with no default
Could you please tell me how I could circumvent that? It is my hope to generalize this procedure to thousands of rows and I can't imagine doing it manually.
Thank you.
I think you could change your function...
tippett<-function(p){
t<-1-(1-min(p))^2
return(list(p.value=t))
}
Then your function call should work. (I have not tested this though...)
s<-apply(z,1,tippett)
Alternatively, if you cannot change this function, you could try something like...
s <- sapply(seq_len(nrow(z)), function(i) tippett(z[i,1], z[i,2]))
I think that should all work. Sorry, running some code currently and cannot verify/test it.
Try this in one line:
tippet2 = function(z) as.list(1-(1-do.call(pmin, as.data.frame(z)))^ncol(z))
tippet2(z)
This is vectorized.
This applies to your matrix or a matrix with more columns.
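As a quick check for the two-column case, the row-by-row and vectorised calculations can be compared directly. This is a sketch; the seed and the names row_wise and vectorised are assumptions added for the illustration.

```r
set.seed(1)  # arbitrary seed, only to make the example reproducible
z <- matrix(c(rbeta(5, 1, 5), rbeta(5, 1, 10)), ncol = 2)

# The original two-argument function from the question.
tippett <- function(p1, p2) {
  t <- 1 - (1 - min(p1, p2))^2
  return(list(p.value = t))
}

# Row by row: pair up the two columns and extract the p-value.
row_wise <- mapply(function(p1, p2) tippett(p1, p2)$p.value, z[, 1], z[, 2])

# Vectorised: pmin() takes the element-wise minimum of the two columns.
vectorised <- 1 - (1 - pmin(z[, 1], z[, 2]))^2

all.equal(row_wise, vectorised)
```

mapply() leaves the original function untouched, while the pmin() form avoids calling it once per row.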
Having a bit of a day for questions today.
Given the following data:
library(data.table)
library(lubridate)
set.seed(1234)
a = data.table(date=seq(ymd('2001-6-30'),ymd('2003-6-30'),by='weeks'),a=rnorm(105),b=rnorm(105),c=rnorm(105))
b = data.table(date=seq(ymd('2001-6-30'),ymd('2003-6-30'),by='weeks'),a=rnorm(105),b=rnorm(105),c=rnorm(105))
a[,idkey:='port']
b[,idkey:='bm']
setkeyv(a,names(a))
setkeyv(b,names(b))
beta=merge(a,b,all=T)
If I try and calculate the beta of each column in the portfolio using the following code
beta[,lapply(.SD,function(x)cov(x[idkey=='port'],x[idkey=='bm'])/var(x[idkey=='bm'])),.SDcols=2:5]
I get the error
Error: is.numeric(x) || is.logical(x) is not TRUE
Which I suspect is because the idkey column is non-numeric.
The following code works fine
for(i in 2:4){
be = cov(beta[idkey=='port',i,with=F],beta[idkey=='bm',i,with=F])/var(beta[idkey=='bm',i,with=F])
print(be)
}
My question is - how do I calculate the beta using both data.tables without having to go down the clunkier for-loop route?
I don't see a way to avoid loop functions. However, you should use binary search:
setkeyv(beta,c("idkey", "date"))
sapply(list("a","b","c"),
function(x) cov(beta['port', ..x],
beta['bm', ..x])/var(beta['bm', ..x]))
Here ..x means "look up one level": data.table looks for x in the calling scope rather than among the table's columns.
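The calculation itself, cov(port, bm) / var(bm) for each column, can be illustrated in base R. This is only a sketch with two hypothetical data frames, port and bm, standing in for the two groups; it does not use data.table.

```r
set.seed(1234)
# Hypothetical stand-ins for the 'port' and 'bm' rows of the merged table.
port <- data.frame(a = rnorm(105), b = rnorm(105), c = rnorm(105))
bm   <- data.frame(a = rnorm(105), b = rnorm(105), c = rnorm(105))

cols <- c("a", "b", "c")

# Beta of each portfolio column against the matching benchmark column.
betas <- sapply(cols, function(col) {
  cov(port[[col]], bm[[col]]) / var(bm[[col]])
})
betas
```

Each beta equals the slope from regressing the portfolio column on the benchmark column, which gives an independent way to check the numbers.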