Calculating mutual information between two classifiers for a fixed dataset

After searching the web, I still have no answer to my question. There are formulas that, for example, calculate the entropy between two classifiers.
How can I calculate the mutual information?

Consider two random variables X and Y. The mutual information between them is defined by I(X,Y) = H(X) + H(Y) − H(X,Y), where H(X) is the entropy of variable X, H(X,Y) is the joint entropy of both variables, and H(Y|X) = H(X,Y) − H(X) is the conditional entropy that measures the uncertainty of variable Y given the value of variable X. Equivalently, I(X,Y) = H(Y) − H(Y|X), so the MI measures how much the uncertainty about one variable is reduced by knowing the value of the other.
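
For two classifiers evaluated on the same fixed dataset, X and Y can be taken to be the labels each classifier predicts for a sample, with the probabilities estimated from counts. The following is a minimal Python sketch under that assumption; the function names `entropy` and `mutual_information` are illustrative, and the result is in bits because a base-2 logarithm is used.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Empirical entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(pred_a, pred_b):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) for two prediction vectors.

    pred_a and pred_b hold the labels that two classifiers assigned
    to the same samples, in the same order.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("both classifiers must label the same samples")
    joint = list(zip(pred_a, pred_b))  # one (label_a, label_b) pair per sample
    return entropy(pred_a) + entropy(pred_b) - entropy(joint)
```

For example, `mutual_information([0, 1, 0, 1], [0, 1, 0, 1])` gives 1.0 (the classifiers agree completely on a balanced binary problem), while `mutual_information([0, 0, 1, 1], [0, 1, 0, 1])` gives 0.0.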

Related

Generate random data based on correlation matrix for multiple timesteps in R

I would like to simulate data for some cases (e.g. nPerson=1000 observations) at
some consecutive timesteps (e.g. ts = 3) for N intercorrelated variables (e.g. N=5).
The simulation should be based on a correlation matrix (corrMat, nrow = N, ncol = N).
corrMat should be identical for all timesteps.
I already found out that the MASS package has a function to create
random data fitting the constraints given by corrMat.
library(MASS)
t1 <- mvrnorm(nPerson, mu = rep(0, N), Sigma = corrMat, empirical = TRUE)
Now I would like to simulate t2 as a function of t1 and corrMat.
The data of t2 should therefore correlate according to corrMat,
and they should also have the same variance as the variables of t1.
One important constraint: for the initial values corrMat[i,i] = 1, but
for consecutive timesteps it should be possible that corrMat[i,i] < 1,
because each variable depends on itself one timestep before,
but a perfect correlation is not intended.
Maybe there is a variance decomposition of the correlation matrix
that calculates an error variance for each of the N variables at the
next timestep, so that one could calculate the
values at timestep t+1 as the sum of the weighted correlations of the
variables at timestep t plus a random error, distributed
according to the error variance (with mean of error = 0), that replicates
the correlation matrix again at t+1.
Assuming normal errors:
getRand <- function(range) {
  return(rnorm(1, mean = 0, sd = range))
}
This is the (very simplified) code for the i-th variable x_i:
x_i[t+1] = 0
for (j:1..N) {
  x_i[t+1] = x_i[t+1] + corrMat[i,j] * x_j[t]
}
x_i[t+1] = x_i[t+1] + getRand(sdErr)
So, more specifically, the question is: how to calculate sdErr?
To simplify, I assume that the variance of every variable
should be 1.
Thank you for any hint on how to get one step further!

I will post a mathematical formulation of the problem on stats.stackexchange.com,
as mikeck suggested, to discuss the details of the correlation problem
in more depth.
I am still interested in finding a general formula for sdErr
to use in the calculation of x_i[t+1].
Meanwhile I found a useful practical solution to the specific question "how to calculate sdErr?" without a formula for sdErr:
(1) simply calculate all variables WITHOUT errors (according to the equation above).
(2) calculate the variances of the new variables.
(3) calculate (for each i) the difference var(x_i[t]) - var(x_i[t+1]) = sdErr^2.
An error with standard deviation sdErr can then be added to each variable for each new observation.
This should lead to observations at t+1 which at least have the same variances as the observations at t.
Details concerning the question of whether the model definition is adequate
will be part of another post.
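
Steps (1) to (3) also have a population-level counterpart, sketched here under the assumption that the covariance matrix S of the variables at timestep t is known: the noise-free update x[t+1] = W x[t] has covariance W S W^T, so the variance missing from variable i is S[i][i] minus the i-th diagonal entry of that product. The Python sketch below uses hypothetical names (`weights` for the matrix W used in the update, `cov_t` for S) and, like the practical solution, only restores the variances, not the off-diagonal correlations.

```python
from math import sqrt


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def error_sd(weights, cov_t):
    """Per-variable sdErr so that Var(x_i[t+1]) equals Var(x_i[t]).

    weights: matrix W of the noise-free update x[t+1] = W x[t]
    cov_t:   covariance matrix of the variables at timestep t
    """
    cov_next = matmul(matmul(weights, cov_t), transpose(weights))
    sds = []
    for i in range(len(weights)):
        gap = cov_t[i][i] - cov_next[i][i]  # this is sdErr^2 for variable i
        if gap < 0:
            # The noise-free update already has too much variance;
            # adding an error term cannot reduce it.
            raise ValueError(f"variable {i}: variance gap is negative")
        sds.append(sqrt(gap))
    return sds
```

Note that using a correlation matrix with corrMat[i,i] = 1 directly as `weights` typically makes the noise-free variance exceed the original one, in which case the sketch raises an error instead of returning a negative variance.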

Puzzling on probabilistic programming language in Julia, Turing.jl and Distributions.jl

I am currently working on a simple Bayesian problem in which my likelihood is the product of N Poisson distributions, so I got the following:
likelihood(x) ~ product_i(Poisson(mean_i))
where x is a vector representing the variables, and the product on the right-hand side runs over the dimensions (from i = 1 to N). mean_i is the mean associated with the variable x_i in the vector x.
As far as I have understood Turing and Distributions, I think that the following is the 'correct' code part for the likelihood:
vector_x = [x for x in variables]
vector_x ~ Product([Poisson(mean_i) for mean_i in means])
where means is a vector containing the mean associated with each Poisson process.
Is it correct? :D
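
Independently of the Turing syntax, the quantity such a model statement stands for can be checked by hand: a product of independent Poisson terms has a log-likelihood equal to the sum of the individual Poisson log-pmfs. A minimal Python sketch of that sum follows; the function names are illustrative and this is not Turing.jl or Distributions.jl code.

```python
from math import lgamma, log


def poisson_logpmf(k, mean):
    """log P(K = k) for K ~ Poisson(mean), with k a non-negative integer."""
    return k * log(mean) - mean - lgamma(k + 1)


def joint_loglik(xs, means):
    """Log-likelihood of independent Poisson counts xs with the given means."""
    if len(xs) != len(means):
        raise ValueError("need exactly one mean per observed count")
    return sum(poisson_logpmf(k, m) for k, m in zip(xs, means))
```

Comparing this sum with the log-density reported by the probabilistic program for the same `xs` and `means` is one way to confirm that the likelihood was specified as intended.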

r - Estimate selection-unbiased allele frequencies with linear regression systems

I have a few data sets consisting of frequencies for i distinct alleles/SNPs of some populations. Additionally, I recorded some factors that are suspected of having changed the frequencies of these alleles within the populations in the past through their selective effect. It is assumed that the selection impact can be described in the form of a simple linear regression for every selection factor.
Now I'd like to estimate how the allele frequencies are expected to be under identical selectional forces (thus, I set selection=1). These new allele frequencies a'_i are derived as
a'_i = a_i - function[a_i|selection=1]
with the current frequency a_i of the allele i of a population and function[a_i|selection=1] as the estimated allele frequency under the absence of selectional forces.
However, there are some constraints for the whole process:
The minimal values of a'_i allowed is 0.
The sum of all allele frequencies a'_i has to be 1.
Usually I'd solve this problem by applying multiple linear regressions. But then the constraints are not fulfilled ...
Any idea how to approach this analysis with constraints (maybe using linear equation/regression systems or structural equation modelling)?
Here is an example data set containing allele frequencies for the ABO major allele groups (p, q, r) as well as the selection variables (x, y, z).
Although this example file only contains 3 alleles and 3 influential variables, all my data sets contain up to ~1050 alleles/SNPs and always 8 selection variables that may have (but don't have to) an impact on the allele frequencies ...
Many thanks in advance for ideas, code snippets and hints!

Efficiently calculating integral of a multivariate function on non-rectangular region?

I want to compute the expected value of a multivariate function f(x) with respect to a Dirichlet distribution. My problem is "penta-nomial" (i.e. 5 variables), so deriving an explicit form of the expected value seems unreasonable. Is there a way to integrate it numerically and efficiently?
f(x) = \sum_{i=0}^{4} x_i \log(n / x_i)
x = <x_0, x_1, x_2, x_3, x_4> and n is a constant
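
One common way to approximate such an expectation is plain Monte Carlo: draw samples from the Dirichlet distribution and average f over them. The Python sketch below assumes the concentration parameters (called `alphas` here, a hypothetical name since the question does not give them) are known, and samples the Dirichlet by normalising independent Gamma draws.

```python
import random
from math import log


def sample_dirichlet(alphas, rng):
    """One Dirichlet(alphas) draw via normalised Gamma(alpha_i, 1) variables."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def f(x, n):
    """f(x) = sum_i x_i * log(n / x_i); terms with x_i == 0 contribute 0."""
    return sum(xi * log(n / xi) for xi in x if xi > 0.0)


def expected_f(alphas, n, num_samples=20000, seed=0):
    """Monte Carlo estimate of E[f(x)] for x ~ Dirichlet(alphas)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(num_samples):
        total += f(sample_dirichlet(alphas, rng), n)
    return total / num_samples
```

The standard error of the estimate shrinks like 1/sqrt(num_samples), so the sample count can be raised until the estimate is stable to the required number of digits.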

A Feature Selection Algorithm POE1ACC for features with continuous value

I want to implement the "Probability of Error and Average Correlation Coefficient" algorithm (more info: page 143). It is an algorithm to select unused features from a set of features. As far as I know, this algorithm is not limited to boolean-valued features, but I don't know how I can use it for continuous features.
This is the only example that I could find about this algorithm:
Thus, X is the feature to be predicted and C is any feature. To calculate the Probability of Error value of C, they select the values which mismatch the green pieces. Thus the PoE of C is (9/16)(1-7/9) + (7/16)(1-6/7) = 3/16 = 0.1875.
My question is thus: How can we use a continuous feature instead of a boolean feature, like in this example, to calculate PoE? Or is it not possible?

The algorithm that you describe is a feature selection algorithm, similar to the forward selection technique. At each step, we find a new feature Fi that minimizes this criterion:
weight_1 * ErrorProbability(Fi) + weight_2 * ACC(Fi)
ACC(Fi) represents the mean correlation between the feature Fi and the features already selected. You want to minimize this so that your features are not correlated with each other, which gives a well-conditioned problem.
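
A single selection step with this criterion could look like the Python sketch below. It is an illustration under stated assumptions: the error probabilities are taken as given inputs (`poe`), ACC is computed as the mean absolute Pearson correlation (the absolute value is an assumption), features are non-constant, and the function and parameter names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def acc(candidate, selected):
    """Mean absolute correlation between a candidate and the selected features."""
    if not selected:
        return 0.0
    return sum(abs(pearson(candidate, s)) for s in selected) / len(selected)


def next_feature(candidates, selected, poe, weight_1=0.5, weight_2=0.5):
    """Name of the candidate minimising weight_1 * PoE + weight_2 * ACC.

    candidates: dict mapping feature name -> list of values
    selected:   list of value lists for the features chosen so far
    poe:        dict mapping feature name -> error probability
    """
    return min(
        candidates,
        key=lambda name: weight_1 * poe[name]
        + weight_2 * acc(candidates[name], selected),
    )
```

With equal error probabilities, a candidate that merely duplicates an already selected feature loses against one that is uncorrelated with it.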
ErrorProbability(Fi) represents how well the feature describes the variable you want to predict. For example, let's say you want to predict whether tomorrow will be rainy depending on temperature (a continuous feature).
The Bayes error rate is (http://en.wikipedia.org/wiki/Bayes_error_rate):
P = 1 - Sum_Ci { Integral_xeHi { P(x|Ci)*P(Ci) } }
(the sum itself is the probability of a correct prediction).
In our example:
Ci belongs to {rainy ; not rainy}
x are instances of temperatures
Hi represents all temperatures that would lead to a Ci prediction.
What is interesting is that you can take any predictor you like.
Now, suppose you have all temperatures in one vector, all states rainy/not rainy in another vector :
In order to have P(x|Rainy), consider the following values :
temperaturesWhenRainy <- temperatures[which(state=='rainy')]
What you should do next is plot a histogram of these values. Then you should try to fit a distribution to it. You will then have a parametric formula for P(x|Rainy).
If your distribution is Gaussian, you can do it simply:
m <- mean(temperaturesWhenRainy)
s <- sd(temperaturesWhenRainy)
Given some x value, you have the probability density P(x|Rainy):
p <- dnorm(x, mean = m, sd = s)
You can do the same procedure for P(x|Not Rainy). Then P(Rainy) and P(Not Rainy) are easy to compute.
Once you have all that stuff you can use the Bayes error rate formula, which yields your ErrorProbability for a continuous feature.
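
For two classes with fitted Gaussian densities, the last step can be sketched numerically. The rule that predicts the class with the larger P(x|Ci)*P(Ci) is wrong with probability equal to the integral of the smaller of the two terms, so the Python sketch below integrates that minimum with a midpoint rule. The function names, the integration range of 10 standard deviations and the step count are assumptions made for illustration.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))


def bayes_error(mean_a, sd_a, prior_a, mean_b, sd_b, prior_b, steps=100000):
    """Error probability of the optimal rule for two Gaussian classes."""
    lo = min(mean_a - 10 * sd_a, mean_b - 10 * sd_b)
    hi = max(mean_a + 10 * sd_a, mean_b + 10 * sd_b)
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width  # midpoint of the k-th sub-interval
        # At x the optimal rule picks the larger term, so the smaller
        # one is the probability mass that gets misclassified.
        total += min(prior_a * normal_pdf(x, mean_a, sd_a),
                     prior_b * normal_pdf(x, mean_b, sd_b))
    return total * width
```

Here the means and standard deviations would be the fitted values (such as m and s above) for each class, and the priors would be the class proportions P(Rainy) and P(Not Rainy).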
Cheers
