Finding the minima of a function using a genetic algorithm

I am working on a genetic algorithm project. I need code to find the maxima/minima of the Rastrigin function or the Easom function (for y = 0) using a basic genetic algorithm.

OK, let's look at the Easom function.
Problem statement
Find the minimum of:
f(x) = -cos(x1) cos(x2) exp(-((x1 - π)^2 + (x2 - π)^2))
Choice of representation
For example, a vector of real numbers, where each element takes values in the interval [-5, 5].
Fitness
This is the main issue for the GA. Say we have, for example, two individuals:
Individual1: [-1|2.7|-0.68|3.78||-2.14|1.63|-1.75|-3.8]
Individual2: [1|1|1|1||-0.5|-0.5|-0.5|-0.5]
The first individual decodes to (4.8, -6.06); its fitness is roughly -9.23 × 10^-40.
The second individual decodes to (4, -2); its fitness is roughly -4.301 × 10^-13.
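To make the example concrete, here is a minimal R sketch of that evaluation. The decoding rule (x1 is the sum of the first four genes, x2 the sum of the last four) is my reading of the example above, not something dictated by the GA itself:
# Easom function; the global minimum is f = -1 at x1 = x2 = pi
easom <- function(x1, x2) {
  -cos(x1) * cos(x2) * exp(-((x1 - pi)^2 + (x2 - pi)^2))
}
# Decode an 8-gene individual: x1 = sum of the first four genes, x2 = sum of the last four
decode <- function(genes) c(sum(genes[1:4]), sum(genes[5:8]))
ind1 <- c(-1, 2.7, -0.68, 3.78, -2.14, 1.63, -1.75, -3.8)
ind2 <- c(1, 1, 1, 1, -0.5, -0.5, -0.5, -0.5)
p1 <- decode(ind1)   # (4.8, -6.06)
p2 <- decode(ind2)   # (4, -2)
easom(p1[1], p1[2])  # about -9.2e-40
easom(p2[1], p2[2])  # about -4.3e-13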
And now the issue: both fitness values are so close to 0 that we can treat both as 0. You have two options. You can wait for Godot (maybe in some generation an individual near the global minimum is born by chance), or you can use a heuristic. The heuristic is based on splitting the fitness into two values, a major fitness and a minor fitness. The major fitness is the function value f(x); here it is effectively always 0, so the search has nothing to work with. The minor fitness is a heuristic whose purpose is to give the search a direction. You define some function, for example the average of the decoded coordinates. The minor fitness of individual 1 is then (4.8 - 6.06)/2 = -0.63 and that of individual 2 is (4 - 2)/2 = 1, so individual 2 is "better" and will have a higher probability of selection, and so on.
The minor fitness only gives the search a direction.
Can this go wrong? Yes, it is a heuristic.
Importantly, the purpose of the minor fitness is to create a preference among individuals with the same (effectively equal) major fitness. When the major fitness differs, we use the major fitness for orientation.
Example:
Individual1 fitness: Major: -0.1| Minor: 3
Individual2 fitness: Major: 0| Minor: 8
The first one is better because of its major fitness (we are minimising, and -0.1 < 0).
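As a sketch of that two-tier comparison (reusing the easom and decode helpers above; the tolerance and the "higher average wins" rule for the minor fitness are assumptions that match the example):
major_fitness <- function(genes) {               # the function value we are minimising
  p <- decode(genes)
  easom(p[1], p[2])
}
minor_fitness <- function(genes) mean(decode(genes))  # heuristic tie-breaker
# TRUE if individual a should be preferred over individual b
better <- function(a, b, tol = 1e-9) {
  fa <- major_fitness(a)
  fb <- major_fitness(b)
  if (abs(fa - fb) > tol) return(fa < fb)        # lower major fitness wins (minimisation)
  minor_fitness(a) > minor_fitness(b)            # otherwise the minor fitness decides
}
better(ind1, ind2)  # FALSE: both majors are effectively 0, so the minor fitness prefers ind2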

Related

Conditional Binomial - Does sample space change given previous results?

I am brushing up on some probability using R - and came across the following question:
It is estimated that approximately 20% of marketing calls result in a sale. What is the probability that the last 4 marketing calls made on a day (where 12 calls are made) are the only ones to result in a sale?
My initial thought was that the probability of the last 4 calls being sales is independent, ergo it's a binomial distribution and I can use dbinom(4, 12, .2). However, after looking at it further, I'm not sure whether the sample space changes to just the remaining 4 calls, ergo dbinom(4, 4, .2).
Also, my understanding of the binomial PMF,
$P(Z = z) = \binom{n}{z} p^z (1-p)^{n-z},$
which I believe R implements in the dbinom function, is that it gives the probability of any 4 successes, not specifically the last 4 calls.
Is it a simple case of removing the N Choose Z piece of the PMF function? Is there an equivalent function in R?
Haven't looked at probability in a while, so I'd appreciate any assistance!
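For what it's worth, here is how the candidate calculations I have been considering compare numerically (just a sketch of what I have tried):
# Probability that exactly 4 of the 12 calls are sales, in any positions:
dbinom(4, 12, 0.2)      # = choose(12, 4) * 0.2^4 * 0.8^8, about 0.1329
# Probability of the specific pattern "the last 4 are sales and the first 8 are not"
# (the binomial PMF with the choose(n, z) term removed):
0.2^4 * 0.8^8           # about 0.000268
# dbinom(4, 4, 0.2) conditions only on the last 4 calls and ignores the first 8:
dbinom(4, 4, 0.2)       # = 0.2^4 = 0.0016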

Code syntax in calculating posterior distribution in WinBUGS

Recently I read "The BUGS Book – A Practical Introduction to Bayesian Analysis" to learn WinBUGS. The way WinBUGS describes the derivation of the posterior distribution confuses me.
Let's take Example 4.1.1 from the book to illustrate:
Suppose we observe the number of deaths y in a given hospital for a
high-risk operation. Let n denote the total number of such
operations performed and suppose we wish to make inferences regarding
the underlying true mortality rate, $\theta$.
The code of WinBUGS is:
y <- 10 # the number of deaths
n <- 100 # the total number of such operations
#########################
y ~ dbin(theta,n) # likelihood, also a parametric sampling distribution
logit(theta) <- logit.theta # normal prior for the logistic transform of theta
logit.theta ~ dnorm(0,0.368) # precision = 1/2.71
The author said that:
The software knows how to derive the posterior distribution and
subsequently sample from it.
My question is:
Which part of the code tells WinBUGS which parameter I want the posterior distribution of?
This question may seem silly, but without reading the background first, I genuinely cannot tell from the code above which parameter is the focus (e.g., theta, or y?).
Below are some of my thoughts (as a beginner of WinBUGS):
I think the following three features of the WinBUGS code style confuse me:
(1) The code does not follow a specific sequence. For example, why is logit.theta ~ dnorm(0,0.368) not placed before logit(theta) <- logit.theta?
(2) Repeated variables. For example, why aren't the last two lines reduced to a single line: logit(theta) ~ dnorm(0,0.368)?
(3) Variables are defined in more than one place. For example, y is defined twice: y <- 10 and y ~ dbin(theta, n). This is explained in Appendix A of the book ("a check has been built in so that when finding a logical node that also features as a stochastic node, a stochastic node is created with the calculated values as fixed data"), yet I still cannot grasp its meaning.
BUGS is a declarative language. For the most part, statements aren't executed in sequence, they define different parts of the model. BUGS works on models that can be represented by directed acyclic graphs, i.e. those where you put a prior on some components, then conditional distributions on other components given the earlier ones.
It's a fairly simple language, so I think logit(theta) ~ dnorm(0, 0.368) is just too complicated for it.
The language lets you define a complicated probability model and declare observations of certain components in it. Once you declare an observation, the model that BUGS samples from is the original full model conditioned on that observation. y <- 10 defines observed data; y ~ dbin(theta,n) is part of the model.
The statement n <- 100 could be read either way: for a fixed constant like n, it doesn't really matter which way you think of it. Either the model says that n is always 100, or n has an undeclared prior distribution not depending on any other parameter and an observed value of 100. These two readings are equivalent.
Finally, your big question: Nothing in the code above says which parameter you want to look at. BUGS will compute the joint posterior distribution of every parameter. n and y will take on their fixed values, theta and logit.theta will both be simulated from the posterior. In another part of your code (or by using the WinBUGS menus) you can decide which of those to look at.
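For example, if you drive WinBUGS from R with the R2WinBUGS package, the choice of which posteriors to summarise sits entirely outside the model code. The sketch below assumes a placeholder model file; in this workflow y and n are passed as data rather than written as y <- 10 inside the model:
library(R2WinBUGS)
# Observed data for the model block y ~ dbin(theta, n)
hospital.data <- list(y = 10, n = 100)
# parameters.to.save is where you say which posterior distributions to monitor
fit <- bugs(data = hospital.data,
            inits = NULL,                       # let WinBUGS generate initial values
            parameters.to.save = c("theta", "logit.theta"),
            model.file = "mortality-model.txt", # placeholder file containing the model block
            n.chains = 3,
            n.iter = 10000)
print(fit)  # posterior summaries for theta and logit.theta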

Topsis - Query regarding negative and positive attributes

In the TOPSIS technique, we calculate the negative and positive ideal solutions, so we need positive and negative attributes (criteria) measuring the impact. But what if all the attributes in my model have only a positive impact? Is it possible to calculate TOPSIS results using only positive attributes? If yes, how do I calculate the relative closeness part? Thanks in advance.
Good question. Yes, you can have all positive attributes or even all negatives. So, while assessing alternatives you might encounter two different types of attributes: desirable attributes or undesirable attributes.
As a decision-maker, you want to maximise desirable attributes (beneficial criteria) and minimise undesirable attributes (costing criteria).
TOPSIS was created in 1981 by Hwang and Yoon*1. The central idea behind this algorithm is that the most desirable solution is the one that is most similar to the ideal solution, i.e. a hypothetical alternative with the highest possible desirable attributes and the lowest possible undesirable attributes, and least similar to the so-called 'anti-ideal' solution, i.e. a hypothetical alternative with the lowest possible desirable attributes and the highest possible undesirable attributes.
That similarity is modelled with a geometric distance, known as Euclidean distance.*2
Assume you have already built the decision matrix, so you know the alternatives with their respective criteria and values, and you have already identified which attributes are desirable and which are undesirable. (Make sure you normalise and weight the matrix.)
The steps of TOPSIS are:
Model the IDEAL Solution.
Model the ANTI-IDEAL Solution.
Calculate Euclidean distance to the Ideal solution for each alternative.
Calculate Euclidean distance to the Anti-Ideal solution for each alternative.
Calculate the ratio of relative proximity to the ideal solution.
The formula for each alternative is C = d_anti / (d_ideal + d_anti): the distance to the anti-ideal solution divided by the sum of the distance to the ideal solution and the distance to the anti-ideal solution.
Then, you have to sort the alternatives by this ratio and select the one that outranks the others.
Now, let's put this theory into practice... let's say you want to select which is the best investment out of different startups. And you will only consider 4 beneficial criteria: (A) Sales revenue, (B) Active Users, (C) Life-time value, (D) Return rate
## Here we have our decision matrix, known in R as the performance matrix...
performanceTable <- matrix(c(5490, 51.4, 8.5, 285,
                             6500, 70.6, 7.0, 288,
                             6489, 54.3, 7.5, 290),
                           nrow = 3,
                           ncol = 4,
                           byrow = TRUE)
# The rows of the matrix contain the alternatives.
row.names(performanceTable) <- c("Wolox", "Globant", "Bitex")
# The columns contain the attributes:
colnames(performanceTable) <- c("Revenue", "Users", "LTV", "Rrate")
# You set the weights depending on their importance to the decision-maker.
weights <- c(0.35, 0.25, 0.25, 0.15)
# And here is WHERE YOU INDICATE THAT YOU WANT TO MAXIMISE ALL THOSE ATTRIBUTES:
criteriaMinMax <- c("max", "max", "max", "max")
Then for the rest of the process you can follow R documentation on the TOPSIS function: https://www.rdocumentation.org/packages/MCDA/versions/0.0.19/topics/TOPSIS
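If you prefer to see the steps spelled out rather than calling the packaged function, here is a minimal base-R sketch of the procedure above. It assumes vector normalisation, and since all criteria are "max" the ideal solution takes the column maxima and the anti-ideal the column minima:
# 1. Normalise the decision matrix (vector normalisation) and apply the weights
normalised <- sweep(performanceTable, 2, sqrt(colSums(performanceTable^2)), "/")
weighted   <- sweep(normalised, 2, weights, "*")
# 2. Ideal and anti-ideal solutions (all criteria are "max" here)
ideal      <- apply(weighted, 2, max)
anti.ideal <- apply(weighted, 2, min)
# 3. Euclidean distances to the ideal and anti-ideal solutions
d.plus  <- sqrt(rowSums(sweep(weighted, 2, ideal)^2))
d.minus <- sqrt(rowSums(sweep(weighted, 2, anti.ideal)^2))
# 4. Relative closeness ratio: higher is better
closeness <- d.minus / (d.plus + d.minus)
sort(closeness, decreasing = TRUE)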
Resources:
Youtube tutorial: TOPSIS - Technique for Order Preference by Similarity to Ideal Solution. Manoj Mathew Retrieved from https://www.youtube.com/watch?v=kfcN7MuYVeI
R documentation on TOPSIS function. Retrieved from: https://www.rdocumentation.org/packages/MCDA/versions/0.0.19/topics/TOPSIS
REFERENCES:
*1 Hwang, C. L., & Yoon, K. (1981). Methods for multiple attribute decision making. In Multiple Attribute Decision Making (pp. 58-191). Springer, Berlin, Heidelberg.
*2 Gentle, J. E. (2007). Matrix Algebra: Theory, Computations, and Applications in Statistics. Springer-Verlag. p. 299. ISBN 0-387-70872-3.

Error probability function

I have DNA amplicons with base mismatches which can arise during the PCR amplification process. My interest is, what is the probability that a sequence contains errors, given the error rate per base, number of mismatches and the number of bases in the amplicon.
I came across an article [Cummings, S. M. et al (2010). Solutions for PCR, cloning and sequencing errors in population genetic analysis. Conservation Genetics, 11(3), 1095–1097. doi:10.1007/s10592-009-9864-6]
that proposes this formula to calculate the probability mass function in such cases.
I implemented the formula in R as shown here:
pcr.prob <- function(k, N, eps) {
  # k = number of mismatches, N = amplicon length in bases, eps = per-base error rate
  v <- numeric(k)
  for (i in 1:k) {
    v[i] <- choose(N, k - i) * (eps^(k - i)) * (1 - eps)^(N - (k - i))
  }
  1 - sum(v)
}
From the article: suppose we analysed an 800 bp amplicon using a PCR of 30 cycles with 1.85 × 10^-5 misincorporations per base per cycle, and found 10 unique sequences that are each 3 bp different from their most similar sequence. The probability that a novel sequence was generated by three independent PCR errors equals P = 0.0011.
However when I use my implementation of the formula I get a different value.
pcr.prob(3,800,0.0000185)
[1] 5.323567e-07
What could I be doing wrong in my implementation? Am I misinterpreting something?
Thanks
I think they've got the right number (0.00113), but badly explained in their paper.
The calculation you want to be doing is:
pbinom(3, 800, 1-(1-1.85e-5)^30, lower=FALSE)
I.e., what's the probability of seeing more than three modifications across 800 independent bases, given 30 amplification cycles that each have a 1.85e-5 chance of going wrong per base? The per-base probability 1-(1-1.85e-5)^30 is the probability that a given base doesn't stay correct through all 30 cycles.
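A quick check of that calculation in R:
p.base <- 1 - (1 - 1.85e-5)^30                 # per-base error probability after 30 cycles
pbinom(3, 800, p.base, lower.tail = FALSE)     # about 0.00113, the paper's value
1 - sum(dbinom(0:3, 800, p.base))              # the same tail probability, written out explicitly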
Thinking about this more, you will start to see floating-point inaccuracies when working with very small probabilities here: an expression like 1 - x, where x is a small number, starts to lose precision once the absolute value of x drops below about 1e-10. Working with log-probabilities is a good idea at this point, and specifically the log1p function is a great help. Using:
pbinom(3, 800, 1-exp(log1p(-1.85e-5)*30), lower=FALSE)
will continue to work even when the error incorporation rate is very low.

Help understanding unipolar transfer function

There is a question I am stuck on using the following formula for the unipolar transfer function:
f(net) = 1 / (1 + e^(-net))
The example has the following:
out = 1 / (1 + e^(-3.75)) = 0.977
How do we arrive at 0.977?
What is e?
e = 2.71828... is the base of natural logarithms. It's a mathematical constant that comes up in many different equations, similar to π. You will see it all the time when doing exponents and logarithms.
Plug it into your equation and you get 0.977.
While factually correct, the other responses merely provide the value of e and confirm the underlying computation. This type of sigmoid function is so ubiquitous in neural networks that some additional insight may be welcome.
Essentially, the exponential function (e to the x power) has a very characteristic curve:
Mostly flat at zero (very slightly above zero, actually), from minus infinity to about -2
An increasingly sharp turn towards the vertical, between about -2 and +4
Quasi "vertical", with values in excess of 150 and growing ever larger, from +5 to infinity
As a result, exponential curves are very useful for producing "S-shaped" functions; incidentally, "S" is sigma in Greek, which supplied the etymology of "sigmoid". Such functions are often patterned on the formula shown in the question:
1 / (1 + e^(-x))
where x is the variable. Typically such functions also include constants aimed at stretching the range (the input zone where changes in x are significant) and/or at modifying the curve in this middle zone.
The result of such functions is that up to a particular input value the function is quasi-constant; then, for a particular range of inputs, the function produces an increasing output; and finally, past the upper value of that range, the function is quasi-constant again. Looking in more detail, such sigmoids have a point of inflection, at which the slope switches from increasing to decreasing and around which, on either side, the slope itself changes most slowly.
In turn, such S-shaped curves (1) are very useful to normalize the output of neural network neurons, or more generally, to normalize various numeric values during processes of various nature. Intuitively these correspond to a "sweet spot" or a "sweet range" of the underlying neuron or device.
(1) Or also, possibly, "step-down" shaped curves, i.e. curves with a mostly constant high value, a decreasing value within the mid-range, and a low mostly constant value thereafter.
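If it helps, the S-shape described above is easy to visualise in R (the plotting range is arbitrary):
curve(1 / (1 + exp(-x)), from = -6, to = 6, xlab = "net", ylab = "f(net)")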
e is Euler's number == 2.718281828....
If you raise e to the -3.75 power, add one to it, and take the inverse, you'll get precisely 0.977022630....
'e' is the base of the natural logarithm function; its value equals the sum of the infinite series 1/n! for n from 0 to infinity. The exponential function e^x is available in the C standard library and in the Java Math package as exp(), so e itself is exp(1).
If you evaluate 1/(1 + exp(-3.75)) you will get 0.977.
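The same check in R, for completeness:
sigmoid <- function(net) 1 / (1 + exp(-net))   # unipolar (logistic) transfer function
sigmoid(3.75)                                   # 0.9770226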
