How to find the global maximum in an R optimization with bounds

I have five variables, each with bounds, and I am investing some amount of money on each channel. My question: is there an optimizer or some logic to find the global maximum of the given functional form, subject to the constraint that the sum of the spends does not exceed my total spend?
parameters <- c(10, 120, 105, 121, 180, 140)  # intercept and the five variable coefficients
spend <- c(16, 120, 180, 170, 180)            # current spend per channel
total <- sum(spend)                           # total budget
upper_bound <- c(50, 200, 250, 220, 250)
lower_bound <- c(10, 70, 100, 90, 70)
var1 <- seq(lower_bound[1], upper_bound[1], by = 1)
var2 <- seq(lower_bound[2], upper_bound[2], by = 1)
var3 <- seq(lower_bound[3], upper_bound[3], by = 1)
var4 <- seq(lower_bound[4], upper_bound[4], by = 1)
var5 <- seq(lower_bound[5], upper_bound[5], by = 1)
The functional form is exp(beta_0 - beta_i / x_i).
I have used the expand.grid function to enumerate the possible combinations, but I am getting far too many of them.
Here is my code.
seq_data <- expand.grid(var1 = var1, var2 = var2, var3 = var3,
                        var4 = var4, var5 = var5)   # every integer combination
rs <- rowSums(seq_data)
seq_data <- seq_data[rs <= total, ]                  # keep only combinations within budget
seq_data1 <- seq_data
for (i in seq_len(ncol(seq_data1)))                  # per-channel response
  seq_data1[, i] <- exp(parameters[1] - parameters[i + 1] / seq_data1[, i])
How can I overcome this problem? Please suggest any alternative approaches.
Thanks in advance.
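One alternative to full enumeration is to treat the spends as continuous and use a constrained optimiser such as base R's constrOptim(), which handles linear constraints of the form ui %*% x - ci >= 0 and maximises when control = list(fnscale = -1) is supplied. The sketch below assumes the objective is the sum of the per-channel responses exp(beta_0 - beta_i / x_i) (the question does not say how the channels are combined) and that non-integer spends are acceptable:
obj <- function(x) sum(exp(parameters[1] - parameters[-1] / x))

ui <- rbind( diag(5),     # x_i >= lower_bound[i]
            -diag(5),     # x_i <= upper_bound[i]
            rep(-1, 5))   # sum(x) <= total
ci <- c(lower_bound, -upper_bound, -total)

start <- lower_bound + 1  # a point strictly inside the feasible region
fit <- constrOptim(start, obj, grad = NULL, ui = ui, ci = ci,
                   control = list(fnscale = -1))
fit$par    # suggested (continuous) spend per channel
fit$value  # objective value at that allocation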

Related

Is there a way to handle calculations involving exponentials of big values in R?

I have looked a bit online and on this site, but I did not find any solution. My problem is relatively simple, so if you could point me to a possible solution it would be much appreciated.
test_vec <- c(2,8,709,600)
mean(exp(test_vec))
test_vec_bis <- c(2,8,710,600)
mean(exp(test_vec_bis))
exp(709)
exp(710)
# exp(709) is still representable; exp(710) overflows to Inf (the double limit is about exp(709.78))
How can I calculate the mean of my vector and deal with the Inf values, knowing that R could probably represent the mean itself but not every term in the numerator of the mean calculation?
There is an edge case where you can solve your problem simply by restating it mathematically, but that only helps when your vector is long and/or your large exponents are only slightly beyond the numeric limit:
Since the mean sum(x)/n can be written as sum(x/n) and since exp(x)/exp(y) = exp(x-y), you can calculate sum(exp(x-log(n))), which gives you a relief of log(n).
mean(exp(test_vec))
[1] 2.054602e+307
sum(exp(test_vec - log(length(test_vec))))
[1] 2.054602e+307
sum(exp(test_vec_bis - log(length(test_vec_bis))))
[1] 5.584987e+307
While this works for your example, it most likely won't work for your real vector.
In that case, you will have to consult packages like Rmpfr, as suggested by @fra.
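If only the mean (or its logarithm) is needed, the usual log-sum-exp shift also avoids the overflow without extra packages. A small sketch, not part of the original answer:
# Shift by the maximum so every exponent is <= 0, then undo the shift on the
# log scale: log(mean(exp(x))) = max(x) + log(mean(exp(x - max(x)))).
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

log_mean_exp(test_vec_bis)   # finite even though exp(710) overflows
exp(log_mean_exp(test_vec))  # matches mean(exp(test_vec)) when that is representable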
Here's one way, where you keep only the elements of test_vec_bis whose exponential is finite (< Inf):
mean(exp(test_vec_bis)[which(exp(test_vec_bis) < Inf)])
[1] 1.257673e+260
t2 <- c(2,8,600)
mean(exp(t2))
[1] 1.257673e+260
This assumes you were looking to exclude values that result in Inf, of course.

Forcing a discrete time series to be monotonically decreasing

I have a series of evaluations. Each evaluation can take discrete values from 0 to 4, and the series should decrease over time. However, since values are entered manually, errors can happen.
Therefore, I would like to modify my series so that it is monotonically decreasing. Moreover, I would like to minimize the number of evaluations modified. Finally, if two or more modified series satisfy these criteria, I would choose the one with the higher overall sum of values.
E.g.
Recorded evaluation
4332422111
Ideal evaluation
4332222111
Recorded evaluation
4332322111
Ideal evaluation
4333322111
(in this case, 4332222111 would have satisfied the criteria too, but I chose the one with the higher values)
I tried a brute-force approach: generating all possible combinations, selecting those that are monotonically decreasing, and finally comparing each of them with the recorded series.
However, a series can be up to 20 evaluations long, and the combinations become far too many.
x1 <- c(4,3,3,2,4,2,2,1,1,1)
x2 <- c(4,3,3,2,3,2,2,1,1,1)
You could almost certainly break this algorithm, but here's a first try: replace locations with increased values by NA, then fill them in with the previous location.
dfun <- function(x) {
  # positions where the series increases become NA ...
  r <- replace(x, which(c(0, diff(x)) > 0), NA)
  # ... and are then filled in with the last preceding value
  zoo::na.locf(r)
}
dfun(x1)
dfun(x2)
This gives the "less-ideal" answer in the second case.
For the record, I also tried
dfun2 <- function(x) {
  # isotonic (monotone) regression on the negated series
  s <- as.stepfun(isoreg(-x))
  -s(seq_along(x))
}
but this doesn't handle the first example as desired.
You could also try to do this with discrete programming (about which I know almost nothing), or with a slightly more sophisticated form of brute force: a stochastic algorithm that strongly penalizes non-monotonicity and weakly penalizes the distance from the initial sequence, e.g. optim(..., method = "SANN") with a candidate function that adds or subtracts 1 from a randomly chosen element. A rough sketch of that idea follows.
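This sketch only illustrates the simulated-annealing idea above; the penalty weights are arbitrary assumptions and the result is not guaranteed to be optimal (or even monotone) on every run:
fit_monotone <- function(x, maxit = 20000) {
  obj <- function(s) {
    up    <- sum(pmax(diff(s), 0))   # total amount of (unwanted) increase
    moved <- sum(s != x)             # number of evaluations changed
    1000 * up + 10 * moved - 0.01 * sum(s)  # small bonus for higher sums
  }
  cand <- function(s) {
    i <- sample(seq_along(s), 1)                        # pick a random position ...
    s[i] <- max(0, min(4, s[i] + sample(c(-1, 1), 1)))  # ... and nudge it by 1 within 0..4
    s
  }
  # for method = "SANN", the 'gr' argument is the candidate-generation function
  optim(x, obj, gr = cand, method = "SANN",
        control = list(maxit = maxit))$par
}

fit_monotone(c(4, 3, 3, 2, 3, 2, 2, 1, 1, 1))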

How to filter genes in a matrix based on a quantile cutoff?

This is a matrix with some example data:
          S1      S2      S3
ARHGEF10L 11.1818 11.0186 11.243
HIF3A     5.2482  5.3847  4.0013
RNF17     4.1956  0       0
RNF10     11.504  11.669  12.0791
RNF11     9.5995  11.398  9.8248
RNF13     9.6257  10.8249 10.5608
GTF2IP1   11.8053 11.5487 12.1228
REM1      5.6835  3.5408  3.5582
MTVR2     0       1.4714  0
RTN4RL2   8.7486  7.9144  7.9795
C16orf13  11.8009 9.7438  8.9612
C16orf11  0       0       0
FGFR1OP2  7.679   8.7514  8.2857
TSKS      2.3036  2.8491  0.4699
I have a matrix "h" with 10,000 genes as row names and 100 samples as columns. I need to select the top 20% most variable genes for clustering, but I'm not sure whether what I did is right.
So, for this filtering I used the genefilter R package.
varFilter(h, var.func=IQR, var.cutoff=0.8, filterByQuantile=TRUE)
Do you think the command I used is right for getting the top 20% most variable genes? And can anyone explain how this method works statistically?
I haven't used this package myself, but the helpfile of the function you're using makes the following remark:
IQR is a reasonable variance-filter choice when the dataset is split
into two roughly equal and relatively homogeneous phenotype groups. If
your dataset has important groups smaller than 25% of the overall
sample size, or if you are interested in unusual individual-level
patterns, then IQR may not be sensitive enough for your needs. In such
cases, you should consider using less robust and more sensitive
measures of variance (the simplest of which would be sd).
Since your data has a bunch of small groups, it might be wise to follow this advice to change your var.func to var.func = sd.
sd computes the standard deviation, which should be easy to understand.
However, this function expects its data in the form of an ExpressionSet object. The error message you got (Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'exprs' for signature '"matrix"') implies that you don't have one, just a plain matrix instead.
I don't know how to create an ExpressionSet, but I think doing so is overly complicated anyway. So I would suggest going with the code that you posted in the comments:
vars <- apply(h, 1, sd)          # per-gene standard deviation
h[vars > quantile(vars, 0.8), ]  # keep genes above the 80th percentile
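As a quick sanity check, here is the same filter applied to a simulated matrix of the same shape (h_sim stands in for your h; the names are made up for the example):
set.seed(1)
h_sim <- matrix(rnorm(10000 * 100), nrow = 10000,
                dimnames = list(paste0("gene", 1:10000), paste0("S", 1:100)))
vars  <- apply(h_sim, 1, sd)
h_top <- h_sim[vars > quantile(vars, 0.8), ]
dim(h_top)  # about 2000 rows, i.e. the top 20% most variable genes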

How to run a function on EACH of my observations in R?

My problem is as follows:
I have a dataset of 6000 observations containing information on customers (each observation is one client's information).
I'm optimizing a given function (in my case a profit function) in order to find an optimum for my variable of interest. In particular, I'm looking for the optimal interest rate I should offer in order to maximize my expected profits.
I don't have any doubt about my function. The problem is that I don't know how to apply this function to EACH OBSERVATION in order to obtain an OPTIMAL INTEREST RATE for EACH of my 6000 clients (or observations, as you prefer).
Until now, it has been easy to find the UNIQUE optimum (the same for all clients) that maximizes my profits (this is the global maximum, I guess). But what I need to know is how to apply my optimization problem to EACH of my 6000 observations INDIVIDUALLY, in order to get the optimal interest rate to offer to each customer (that is, 6000 optimal interest rates, one per customer).
I guess I should do something similar to a for loop, but my experience in this area is limited, and I'm quite frustrated already. What's more, I've tried to use mapply(myfunction, mydata) as usual, but I only get error messages.
This is what my (really) simple code looks like now:
profits <- function(Rate)
  sum((Amount * (Rate - 1.2) / 100) *
        (1 / (1 + exp(0.600002438 - 0.140799335888812 *
                        ((Previous.Rate - Rate) + (Competition.Rate - Rate))))))
And the result for ONE optimum for the entire sample:
> optimise(profits, lower = 0, upper = 100, maximum = TRUE)
$maximum
[1] 6.644821
$objective
[1] 1347291
So the thing is, how do I rewrite my code in order to maximize this and obtain the optimum of my variable of interest for EACH of my rows?
Hope I've been clear! Thank you all in advance!
It appears each of your customers is independent, so you can just put lapply() around the optimise() call. Note that profits() must then use only that one customer's Amount, Previous.Rate and Competition.Rate rather than the whole vectors:
lapply(customer_list, function(one_customer) {
  optimise(profits, lower = 0, upper = 100, maximum = TRUE)
})
This will return a very big list, where each list element has a $maximum (the optimal rate) and an $objective (the profit at that rate). You can then sum the $objective values to find out just how rich you have become!
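A self-contained sketch of that per-row idea, assuming your data frame is the mydata mentioned in the question and has columns named Amount, Previous.Rate and Competition.Rate (rename as needed), with profits() rewritten to take those values as arguments:
profits <- function(Rate, Amount, Previous.Rate, Competition.Rate) {
  (Amount * (Rate - 1.2) / 100) *
    (1 / (1 + exp(0.600002438 - 0.140799335888812 *
                    ((Previous.Rate - Rate) + (Competition.Rate - Rate)))))
}

# one optimise() call per row; extra named arguments are passed through to profits()
results <- lapply(seq_len(nrow(mydata)), function(i) {
  optimise(profits, lower = 0, upper = 100, maximum = TRUE,
           Amount = mydata$Amount[i],
           Previous.Rate = mydata$Previous.Rate[i],
           Competition.Rate = mydata$Competition.Rate[i])
})

best_rates   <- sapply(results, `[[`, "maximum")    # optimal rate per customer
best_profits <- sapply(results, `[[`, "objective")  # expected profit per customer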

Time Series Clustering in R

I have two time series: a baseline (x) and one with an event (y). I'd like to cluster based on the dissimilarity of these two time series. Specifically, I'm hoping to create new features to predict the event. I'm much more familiar with clustering, but fairly new to time series.
I've tried a few different things with a limited understanding...
Simulating data...
x<-rnorm(100000,mean=1,sd=10)
y<-rnorm(100000,mean=1,sd=10)
This package seems awesome but there is limited information available on SO or Google.
library(TSclust)
d<-diss.ACF(x, y)
the value of d is
[,1]
[1,] 0.07173596
I then move on to clustering...
hc <- hclust(d)
but I get the following error:
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") :
missing value where TRUE/FALSE needed
My assumption is this error is because I only have one value in d.
Alternatively, I've tried the following on a single time series (the event).
library(dtw)
distMatrix <- dist(y, method = "DTW")
hc <- hclust(distMatrix, method = "complete")
but it takes FOREVER to compute the distance matrix.
I have a couple of guesses at what is going wrong, but could use some guidance.
My questions...
Do I need a set of baseline time series and a set of event time series, or is one pair OK to start?
My time series are quite large (100000 points). I'm guessing this is causing the SLOW distMatrix calculation. Thoughts on this?
Any resources on applied clustering on large time series are welcome. I've done a pretty thorough search, but I'm sure there are things I haven't found.
Is this the code you would use to accomplish these goals?
Thanks!
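For what it's worth, a sketch of how TSclust clustering is usually set up: the dissimilarity object needs to cover several series (for example, windows cut from a long series), not just one pair, before hclust() has anything to cluster. The window split below is an arbitrary assumption for illustration, using simulated data:
library(TSclust)

set.seed(1)
# cut one long series into 100 windows of 1000 points each (one window per row)
windows <- matrix(rnorm(100000, mean = 1, sd = 10), nrow = 100, byrow = TRUE)

d  <- diss(windows, METHOD = "ACF")   # pairwise ACF-based dissimilarities
hc <- hclust(d, method = "complete")
plot(hc)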
