I am given a table, indexed on 2 dimensions (x, y), with values z (not necessarily ordered, though I think assuming an ordering isn't a horrendously unsafe assumption), such that f(x, y) = z.
So given an x and a y, I interpolate to find a z value. Now, given an x value (or a y value, I suppose; it's not really important) and a z value, I need to find the y value that corresponds to the data. Is it possible to do this without knowledge of an ordering in the z values of the table? If there is an ordering to the z values of the table, is it possible? In my head, given an ordering, it should be possible to find a unique solution, but I don't know how I can do it if I am not given an ordering.
Could you post some, or preferably all, of the data? Assuming it is linear and continuous, we have n copies of a*x + b*y = z. Let's say x=3 and z=4; we then have 3 unknowns, which we can put in a matrix with n rows and 3 columns. The first row would look like 3 b 4, since we are treating a, y and z as the unknowns. Now try echelon row reduction. More specifically, do row1 - row2 (but don't then do row2 - row1), row1 - row3, row1 - row4, ..., row2 - row3, row2 - row4, ...; there should be n choose 2 of these combinations. If there is a solution, then each combination will be in the form qi*a = qj*z (the a and z won't literally be there, of course), where qi and qj are known numbers and i and j are constant indices.
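As a practical alternative (a minimal sketch in R with made-up table values, assuming z is monotone in y at the fixed x): interpolate z as a function of y along the column for the fixed x, then root-find for the y where the interpolant hits the target z.

ys <- c(0, 1, 2, 3, 4)                    # hypothetical y grid at the fixed x
zs <- c(1.0, 2.2, 3.1, 4.5, 6.0)          # hypothetical z values, monotone in y
fz <- approxfun(ys, zs)                   # linear interpolant z(y)
z_target <- 4.0
uniroot(function(y) fz(y) - z_target, range(ys))$root  # the y with f(x, y) = z_target

Without monotonicity there may be several such y values, which is why an ordering matters.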
I have a Mixed Integer Programming (MIP) problem, currently modelled in Python's PuLP library. My issue, however, is very generic; syntax doesn't play a role here.
I want to add a constraint to my model that works like this:
if b=1 then x=y
The variable b is a binary variable taking values 0 or 1. x and y are variables that represent the current stock level: x is a continuous variable, y an integer variable.
I know constraints can only be modelled in the following format:
a*x+c <= y # a, c are constants, x, y variables
I hope there is some workaround for how I can model the above-described "if b then x equals y" constraint.
Here are my approaches so far:
b*y <= x
y >= x*b # works in theory, but multiplication of 2 variables is not allowed
For 2 binary variables x and y the following is true:
M*y > x # represents: if x then y (M is a sufficiently large constant)
I guess the solution involves a large M constant, maybe even further helper variables.
A little background: I want to model an inventory problem with continuous stock levels. However, order decisions should only be possible in integer numbers. I therefore need the stock level to be modelled as a float in general, but at the point of an order (b == 1) it must take an integer value.
I hope someone can help here, even if this is more theoretical than directly coding-related. Hints to further resources that might help are also highly appreciated.
b=1 => x=y
can be modeled as:
y-M(1-b) <= x <= y+M(1-b)
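To see why this enforces the implication, check the two cases (assuming M is at least as large as any feasible gap between x and y):

b = 1:  y - 0 <= x <= y + 0  =>  x = y
b = 0:  y - M <= x <= y + M  (non-binding, so x is effectively free)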
I'm a complete beginner with R and I need to perform regressions on some data sets. My problem is that I'm not sure how to rewrite a model into its mathematical formula.
Most confusing are interactions and the poly function.
Can they be understood as a product and a polynomial?
Example
Let's take the following model, where both a and b are vectors of numbers:
y ~ poly(a, 2):b
Can it be rewritten mathematically like this?
y = a*b + a^2 * b
Example 2
And when I get the following term from the fit summary,
poly(a, 2)2:b
is it equal to the following formula?
a^2 * b
Your question is two-fold:
what does poly do;
what does : do.
For the first question, I refer you to my answer https://stackoverflow.com/a/39051154/4891738 for a complete explanation of poly. Note that for most users, it is sufficient to know that it generates a design matrix with degree columns, each of which is a basis function.
: is not a mystery. In your case, where b is also numeric, poly(a, 2):b will return
Xa <- poly(a, 2) # a matrix of two columns
X <- Xa * b # scale each row of Xa by the corresponding element of b
So your guess in the question is correct. But note that poly gives you an orthogonal polynomial basis, so it is not the same as I(a) and I(a^2). You can set raw = TRUE when calling poly to get the ordinary polynomial basis.
Xa has column names. poly(a,2)2 just means the 2nd column of Xa.
Note that when b is a factor, there will be a design matrix, say Xb, for b. Obviously this is a 0-1 binary matrix, as factor variables are coded as dummy variables. Then poly(a,2):b forms a row-wise Kronecker product between Xa and Xb. This sounds tricky, but it is essentially just pair-wise multiplication between all columns of the two matrices. So if Xa has ka columns and Xb has kb columns, the resulting matrix has ka * kb columns. Such mixing is called an 'interaction'.
The resulting matrix also has column names. For example, poly(a, 2)2:b3 means the product of the 2nd column of Xa and the dummy column in Xb for the third level of b. I am not saying 'the 3rd column of Xb', as this would be false if b is contrasted. Usually a factor will be contrasted, so if b has 5 levels, Xb will have 4 columns. Then the dummy column for the third level will be the 2nd column of Xb, if the first factor level is the reference level (and hence does not appear in Xb).
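To see this concretely, here is a small check with made-up numeric vectors (a hypothetical sketch; the exact printed column names may vary slightly):

set.seed(1)
a <- rnorm(10); b <- rnorm(10)
Xa <- poly(a, 2)                         # orthogonal basis with 2 columns
X  <- model.matrix(~ poly(a, 2):b)       # intercept plus the two interaction columns
colnames(X)                              # e.g. "(Intercept)" "poly(a, 2)1:b" "poly(a, 2)2:b"
all.equal(unname(X[, 3]), Xa[, 2] * b)   # TRUE: the 2nd basis column scaled by b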
Given a set of variables x_1, ..., x_n, I want to find the values of the coefficients in this equation:
y = a_1*x_1 +... +a_n*x_n + c
where a_1, a_2, ..., a_n are all unknown. Thinking of this from the perspective of a data frame, I want to create this value of y for every row in the data.
My question is: with y, a_1, ..., a_n and c all unknown, is there a way for me to find a set of solutions a_1, ..., a_n under the condition that corr(y, x_1), corr(y, x_2), ..., corr(y, x_n) are all greater than 0.7? For simplicity, take correlation here to be Pearson correlation. I know there would be no unique solution, but how can I construct a set of solutions for a_1, ..., a_n that fulfils this condition?
I spent a day searching for the idea but could not get any information out of it. An answer in any programming language is welcome, or at least some reference for this problem.
No, it is not possible in general. It may be possible in some special cases.
Given x₁, x₂, ... you want to find y = a₁x₁ + a₂x₂ + ... + c so that all the correlations between y and the x's are greater than some target R. Since the correlation is
Corr(y, xi) = Cov(y, xi) / Sqrt[ Var(y) * Var(xi) ]
your constraint is
Cov(y, xi) / Sqrt[ Var(y) * Var(xi) ] > R
which can be rearranged to
Cov(y, xi)² > R² * Var(y) * Var(xi)
and this needs to be true for all i.
Consider the simple case where there are only two columns x₁ and x₂, and further assume that they both have mean zero (so you can ignore the constant c) and variance 1, and that they are uncorrelated. In that case y = a₁x₁ + a₂x₂ and the covariances and variances are
Cov(y, x₁) = a₁
Cov(y, x₂) = a₂
Var(x₁) = 1
Var(x₂) = 1
Var(y) = (a₁)² + (a₂)²
so you need to simultaneously satisfy
(a₁)² > R² * ((a₁)² + (a₂)²)
(a₂)² > R² * ((a₁)² + (a₂)²)
Adding these inequalities together, you get
(a₁)² + (a₂)² > 2 * R² * ((a₁)² + (a₂)²)
which means that in order to satisfy both of the inequalities, you must have R < Sqrt(1/2) (cancel the common factor (a₁)² + (a₂)² on both sides of the inequality, leaving 1 > 2R²). So the very best you could do in this simple case is to choose a₁ = a₂ (the exact value doesn't matter as long as they are equal), and both of the correlations Corr(y, x₁) and Corr(y, x₂) will be equal to 0.707. You cannot achieve correlations higher than this between y and all of the x's simultaneously in this case.
For the more general case with n columns (each of which has mean zero, variance 1 and zero correlation with the other columns), you cannot simultaneously achieve correlations greater than 1/Sqrt(n) (as pointed out in the comments by @kazemakase).
In general, the more independent variables there are, the lower the correlation you will be able to achieve between y and the x's. Also (although I haven't mentioned it above) the correlations between the x's matter. If they are in general positively correlated, you will be able to achieve a higher target correlation between y and the x's. If they are in general uncorrelated or negatively correlated, you will only be able to achieve low correlations between y and the x's.
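As a quick numerical illustration of that bound (a hypothetical simulation with standard-normal columns, not from the question's data):

set.seed(42)
n <- 3; m <- 1e5
X <- matrix(rnorm(m * n), m, n)   # columns are approximately uncorrelated
y <- rowMeans(X)                  # a_i = 1/n, c = 0
cor(y, X)                         # each entry close to 1/sqrt(3) ~ 0.577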
I am not an expert in this field, so read with extreme prejudice!
I am a bit confused by your y.
Is your y a single constant, and you want the correlation between it and all the x_i values to be > 0.7? I am no math/statistics expert, but my feeling is that this is achievable only if the correlations between the x_i, x_j uphold the same condition. In that case you can simply take the average of the x_i, like this:
y=(x_1+x_2+x_3+...+x_n)/n
so a_i = 1.0/n and c = 0.0. But still the question is:
What meaning does a correlation between just 2 numbers have?
It would be more reasonable if y were a function dependent on x,
for example like this:
y(x) = a_1*(x-x_1)+... +a_n*(x-x_n) + c
or any other equation (hard to pick one without knowing where it came from and for what purpose). Then you can compute the correlation between the two sets
X = { x_1 , x_2 ,..., x_n }
Y = { y(x_1),y(x_2),...y(x_n) }
In that case I would try an approximation search for the c, a_i constants to maximize the correlation between X and Y, but the complexity of the whole thing would be insane. So instead I would tweak just one constant at a time (see the sketch after these steps):
set some safe c,a_1,a_2,... constants
tweak a_1
compute the correlation for (a_1 - delta) and (a_1 + delta), and then choose the direction which is in favor of the correlation; keep going in that direction until the correlation coefficient starts to drop
Then you can recursively do this again with a smaller delta. Btw, this is exactly what my approx class does from the link above.
loop step #2 through all the a_i
loop this whole thing a few times to enhance precision
Maybe you could also compute c after each run to minimize the distance between the X and Y sets.
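A minimal sketch of that search in R (assuming X is a numeric matrix with one column per x_i; everything here is hypothetical). c is omitted because Pearson correlation is unaffected by adding a constant:

score <- function(a, X) min(cor(X %*% a, X))  # worst correlation over all columns
a <- rep(1, ncol(X)); delta <- 0.5
for (pass in 1:20) {                # repeat the whole sweep to enhance precision
  for (i in seq_along(a)) {         # tweak one constant a_i at a time
    for (step in c(-delta, delta)) {
      trial <- a; trial[i] <- trial[i] + step
      if (score(trial, X) > score(a, X)) a <- trial  # keep the favorable direction
    }
  }
  delta <- delta / 2                # refine with a smaller delta
}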
I have a number of strings (n strings) and I am computing the edit distance between them, in the sense that I take the first one and compare it to the (n-1) remaining strings, the second one and compare it to the (n-2) remaining, ..., until I run out of strings.
Why would an average edit distance be computed as the sum of all the edit distances between all the strings divided by the number of comparisons squared? This squaring is confusing me.
Thanks,
Jannine
I assume you have seen somewhere an answer that comes with a squared factor, which I'll take as n^2, where n is the number of strings (not the number of distinct comparisons, which is n*(n-1)/2, as @flaschenpost points out). It would be easier to give you a more precise answer if you'd quote exactly what that answer is.
From what I understand of your question, it isn't the usual sample average. It is, however, a valid estimator of central tendency, with the caveat that it is a biased estimator.
See https://en.wikipedia.org/wiki/Bias_of_an_estimator.
Let's define the sample average, which I will denote as X', by
X' = \sum_{i=1}^m X_i / N
If N = m, we get the standard average. In your case, m is the number of distinct pairs, which is m = n*(n-1)/2. Let's call this average Xo.
Then if N = n*n, the same sum gives
X' = (m/N) * Xo = ((n*(n-1)/2) / n^2) * Xo = ((n-1)/(2*n)) * Xo
Xo is an unbiased estimator of the population mean \mu. Therefore, X' is biased by the factor f = (n-1)/(2*n). For n very large this factor tends to 1/2.
That said, it could be that the answer you saw has a sum that runs over more than just the distinct pairs. The normalization would then change, of course. For instance, we could extend the sum to all ordered pairs without changing the average value: the correct normalization would then be N = n*(n-1), and the value of the average would still be Xo, since the number of summands has doubled as well.
These things get easier to understand if done by hand, with pen and paper, on a small example.
If you have 7 strings named a, b, c, d, e, f, g, then the simplest version would be:
Compare a to b, a to c, ..., a to g (these are 6)
Compare b to a, b to c, ..., b to g (these are 6)
. . .
Compare g to a, g to b, ..., g to f (these are 6)
So you have 7*6 or n*(n-1) values, and you divide by nearly 7^2. This is where the square comes from. Maybe you even compare a to a, which should give a distance of 0 and increase the count of values to 7*7 or n*n. But I would count that a bit as cheating for the average distance.
You could double the speed of the algorithm by changing it just a small bit:
Compare a to b, a to c, ..., a to g (these are 6)
Compare b to c, ..., b to g (these are 5)
Compare c to d, ..., c to g (these are 4)
. . .
Compare f to g (this is 1)
That is following good ol' Gauss: 7*6/2, or n*(n-1)/2.
So in essence: try doing a simple example on paper and then count your distance values.
Since the average is still, very simply, the same as ever:
sum(values) / count(values)
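If you want to check the counting in code, base R's adist gives the full edit-distance matrix (a small sketch with made-up strings):

s <- c("kitten", "sitting", "kitchen")
d <- adist(s)                              # n x n matrix of edit distances
n <- length(s)
sum(d[upper.tri(d)]) / (n * (n - 1) / 2)   # average over distinct pairs
sum(d) / n^2                               # the n^2 version; the zero diagonal is included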
p1 <- c(.25,.025,.025,.1,.2,.4)
T <- sample(1:6,size=N,replace=TRUE, prob=someprobabilityvector)
Y <- rbinom(N,1,p1[c(T)])
Hi folks, I am new to R and programming in general and need some help with understanding something basic. Could someone explain to me what is happening in the vector Y above? I have figured out what p1[c(T)] does, but I have no idea what the vector Y is doing. All help is appreciated in advance.
The first line of your code creates a vector of six probabilities:
p1 <- c(.25,.025,.025,.1,.2,.4)
In the second line, you randomly choose N values from the numbers one to six (with replacement). The probability for each value is specified in someprobabilityvector. Hence, the function will return a vector of length N containing values between 1 and 6:
T <- sample(1:6,size=N,replace=TRUE, prob=someprobabilityvector)
In the third line, N random numbers from a binomial distribution with one trial and probabilities specified by p1[c(T)] are generated. c(T) is the same as T: the vector containing values from 1 to 6. This vector is used to index the vector p1; hence, p1[c(T)] will return a vector of N values from the vector p1:
Y <- rbinom(N,1,p1[c(T)])
Since the specified binomial distribution has one trial only, the vector Y will contain zeroes and ones.
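For reference, here is a runnable version of the snippet (N and the probability vector are filled in purely for illustration). As an aside, T is best avoided as a variable name in R, since it is the built-in shorthand for TRUE:

set.seed(1)
N <- 10
p1 <- c(.25, .025, .025, .1, .2, .4)
T <- sample(1:6, size = N, replace = TRUE, prob = rep(1/6, 6))  # uniform probabilities, as an assumption
Y <- rbinom(N, 1, p1[T])   # one Bernoulli draw per element, with success probability p1[T[i]]
Y                          # a vector of N zeroes and ones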