In trying to solve a Project Euler question, I ran into difficulties with a particular mathematical formula. According to this web page (http://www.mathpages.com/home/kmath093.htm), the formula for determining the probability of rolling a sum T on n dice, each with s sides numbered 1 to s, can be given as follows:
[dead image link: the formula as transcribed from the mathpages article - http://www.freeimagehosting.net/uploads/8294d47194.gif]
After I started getting nonsensical answers in my program, I started stepping through it, and tried the formula for some specific values. In particular, I decided to try it for a sum T=20, with n=9 dice, each with s=4 sides. As the sum of 9 four-sided dice should give a bell-like curve of results ranging from 9 to 36, a sum of 20 seems like it should be fairly (relatively speaking) likely. Dropping the values into the formula, I got:
[dead image link: the formula with T=20, n=9, s=4 substituted - http://www.freeimagehosting.net/uploads/8e7b339e32.gif]
Since j runs from 0 to 7, we must add over all j... but for most of these values the result is 0, because at least one of the binomial coefficients comes out to 0. The only values of j that seem to return non-zero results are 3 and 4. Dropping 3 and 4 into the formula, I got:
[dead image link: the j=3 and j=4 terms of the sum - http://www.freeimagehosting.net/uploads/490f943fa5.gif]
Which, when simplified, seemed to go to:
[dead image link: the simplified expression - http://www.freeimagehosting.net/uploads/603ca84541.gif]
which eventually simplifies down to ~30.75. As a probability, of course, 30.75 is way off... a probability must be between 0 and 1, so something has gone terribly wrong. But I'm not clear what it is.
Could I be misunderstanding the formula? Very possible, though I'm not at all clear where the breakdown would be occurring. Could it be transcribed wrong on the web page? Also possible, but I've found it difficult to find another version of it online to check against. Could I just be making a silly math error? Also possible... though my program comes up with a similar value, so I think it's more likely that I'm misunderstanding something.
Any hints?
(I would post this on MathOverflow.com, but I don't think it even comes close to being the kind of "postgraduate-level" mathematics that is required to survive there.)
Also: I definitely do not want the answer to the Project Euler question, and I suspect that other people who may stumble across this would feel the same way. I'm just trying to figure out where my math skills are breaking down.
According to mathworld (formula 9 is the relevant one), the formula from your source is wrong.
The correct formula is supposed to be n choose j, not n choose T. That'll really reduce the size of the values within the summation.
The mathworld formula uses k instead of j and p instead of T:
P(p, n, s) = (1/s^n) * sum_{k=0}^{floor((p-n)/s)} (-1)^k * C(n, k) * C(p - s*k - 1, n - 1)
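For concreteness, here is a quick R sketch of that corrected formula (the function name is mine, not from either source):

# Probability of rolling a total of tsum on n fair s-sided dice,
# using the C(n, k) form of the formula above.
dice_sum_prob <- function(tsum, n, s) {
  k <- 0:floor((tsum - n) / s)
  sum((-1)^k * choose(n, k) * choose(tsum - s * k - 1, n - 1)) / s^n
}

dice_sum_prob(20, 9, 4)  # ~0.090, a sensible probability for the example above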
Take a look at the Wikipedia article on Dice. The formula there looks similar but has one difference. I think it will solve your problem.
I'm going to have to show my ignorance here... Isn't 9 choose 20 = 0? More generally, isn't n choose T going to be 0 whenever T > n? Perhaps I'm reading this formula incorrectly (I'm not a math expert), but looking at de Moivre's work, I'm not sure how this formula was derived; it seems slightly off. You might try working up from de Moivre's original math, page 39, in the lemma.
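A quick sanity check in R confirms that suspicion:

choose(9, 20)  # 0: a binomial coefficient is 0 whenever the lower index exceeds the upper
choose(9, 3)   # 84: terms only survive when the lower index is between 0 and n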
This is a language-agnostic question, but I wanted to know if anyone can provide guidance, logically, on how to create a set of numbers that, through adding or subtracting its elements, could generate any number between 1 and N.
An idea.
If we used only summation, then the set consisting of the powers of 2 less than or equal to N would be such a minimal set. With subtraction allowed, the powers of 3 seem like a good enough idea to start with. I suspect that set is minimal, but I don't have a proof.
My reasoning is the following. Suppose we have k numbers in the set. Then there are at most 3^k possible results we can get with summation and subtraction, since each number is either added, subtracted, or left out. (Note that those results contain negative integers. In fact, such a set of results is symmetric with respect to 0. And, usually, we'll get fewer than 3^k different results - some of them will simply repeat.)
An example of "optimal" choice:
Let's take the first number equal to 1. Then the possible results are: -1, 0, 1. Next, take 3 - then we'll get all integers between -4 and 4. (Choosing 2 instead of 3 is clearly less efficient.) Then, to obtain the next contiguous (and non-overlapping) sequence of integers, we should take 9. And so on, until N is reached.
We could use other numbers instead of powers of 3. In such a case, we should take care of the gaps that will result in consecutive steps. And I doubt it would produce a set with fewer elements. (I may be wrong, though.)
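To make the powers-of-3 idea concrete, here is a small R sketch (the function name and digit convention are my own) that decomposes any positive integer into a sum/difference of distinct powers of 3, via balanced ternary:

# Balanced-ternary digits of x: digits[i] is the coefficient (-1, 0, or +1)
# of 3^(i-1), so that x == sum(digits * 3^(seq_along(digits) - 1)).
balanced_ternary <- function(x) {
  digits <- integer(0)
  while (x != 0) {
    r <- x %% 3
    if (r == 2) r <- -1  # a digit of -1 means "subtract this power of 3"
    digits <- c(digits, r)
    x <- (x - r) / 3
  }
  digits
}

balanced_ternary(20)  # -1 1 -1 1, i.e. 20 = -1 + 3 - 9 + 27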
Sort all the numbers.
Remove duplicates that are next to each other (is the current equal to the previous?).
Add/subtract what remains.
The problem:
ceiling(31)
#31
ceiling(31/60*60)
#32
What is the correct way to fix this kind of error?
Doing the multiplication before the division is not an option; my code looks something like this:
x <- 31/60
...
y <- ceiling(x*60)
I'm thinking of writing a new function:

ceil <- function(x) {
  ceiling(signif(x))  # signif() rounds to 6 significant digits by default
}
But I'm new to R; maybe there is a better way.
UPDATE
Sorry, I didn't give more details. I have the same problem in different parts of my code, for different reasons, but always with ceiling.
I am aware of rounding error in floating-point calculations. Maybe the title of the question could be improved: I don't want to fix an imprecision of the ceiling function; what I want to do is perhaps the opposite, and make ceiling less exact. A way to tell R to ignore the digits that are clearly noise:
options(digits=17)
31/60*60
#31.000000000000004
But, apparently, the epsilon required to ignore the noise digits depends on the context of the problem.
The real problem here, I strongly believe, is found in my hero The Data Munger Guru's tagline, which is: "What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it."
There are myriad cases where floating-point precision will cause apparent integers to turn into "integer +/- epsilon", and so you need to figure out why you are going for "ceiling", why you allow your values to not be integers, etc. <-- more or less what Pascal Cuoq wrote in his comment.
The solution to your concern thus depends on what's actually going on. Perhaps you want, say, trunc(x/60) -> y followed by trunc(y*60), or maybe not :-). Maybe you want y <- round(x/60*60) + 1, or jhoward's suggested approach. It depends, as I stress here, critically on what your goal is and how you want to deal with corner cases.
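If the goal really is "ceiling, but ignore digits that are clearly noise", one minimal sketch is a tolerance-based ceiling (the size of eps is an assumption; as noted in the question, the right epsilon depends on the context of the problem):

ceiling_eps <- function(x, eps = 1e-9) {
  ceiling(x - eps)  # values at most eps above an integer get pulled back down to it
}

ceiling_eps(31/60 * 60)  # 31, as desired
ceiling_eps(31.5)        # 32, normal ceiling behaviour is preserved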
Preamble:
I have been implementing my own CAT system. The resources that have helped me most are these:
An On-line, Interactive, Computer Adaptive Testing Tutorial, 11/98 -- A good explanation of how to pick a test question based on which one would return the most information. Fascinating idea, really. The equations are not illustrated with examples, however... but there is a simulation to play with. Unfortunately the simulation is down!
Computer-Adaptive Testing: A Methodology Whose Time Has Come -- This has similar equations, although it does not use IRT or the Newton-Raphson Method. It is also Rasch, not 3PL. It does, however, have a BASIC program that is far more explicit than the usual equations that are cited. I have converted portions of the program in order to get my own system to experiment with, but I would prefer to use 1PL and/or 3PL.
Rasch Dichotomous Model vs. One-parameter Logistic Model -- This clears some stuff up, but perhaps only makes me more dangerous at this stage.
Now, the question.
I want to be able to measure someone's ability level based on a series of questions that are rated at a 1PL difficulty level and of course the person's answers and whether or not they are correct.
I first have to have a function that calculates the probability of a correct answer for a given item. This equation gives the probability function for 1PL:
Probability correct = e^(ability - difficulty) / (1+ e^(ability - difficulty))
I'll go with this one arbitrarily for now. Using an ability estimate of 0, we get the following probabilities (difficulty --> probability of a correct answer):
-0.3 --> 0.574442516811659
-0.2 --> 0.549833997312478
-0.1 --> 0.52497918747894
0 --> 0.5
0.1 --> 0.47502081252106
0.2 --> 0.450166002687522
0.3 --> 0.425557483188341
This makes sense. A problem targeting their level is 50/50... and the questions are harder or easier depending on which direction you go. The harder questions have a smaller chance of coming out correct.
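In R, for instance, the table above can be reproduced directly (a quick sketch; p_correct is my own name for the 1PL function):

p_correct <- function(ability, difficulty) {
  exp(ability - difficulty) / (1 + exp(ability - difficulty))
}

p_correct(0, seq(-0.3, 0.3, by = 0.1))
# 0.5744425 0.5498340 0.5249792 0.5000000 0.4750208 0.4501660 0.4255575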
Now... consider a test taker that has done five questions at these difficulties: -0.1, 0, 0.1, 0.2, 0.1. Assume they got them all correct except the one at difficulty 0.2. Assuming an ability level of 0... I would want some equations to indicate that this person is slightly above average.
So... how to calculate that with 1PL? This is where it gets hard.
Looking at the equations on the various pages... I will start with an assumed ability level... and then gradually adjust it after each question, more or less like the following:
Starting Ability: B0 = 0
Ability after problem 1: B1 = B0 + [summations and function evaluated for item 1 at ability B0]
Ability after problem 2: B2 = B1 + [summations and functions evaluated for items 1-2 at ability B1]
Ability after problem 3: B3 = B2 + [summations and functions evaluated for items 1-3 at ability B2]
Ability after problem 4: B4 = B3 + [summations and functions evaluated for items 1-4 at ability B3]
Ability after problem 5: B5 = B4 + [summations and functions evaluated for items 1-5 at ability B4]
And so on.
From just reading papers on this, that is the gist of what the algorithm should be doing. But there are so many different ways to do it. The behaviour of my code is clearly wrong, as I get division-by-zero errors... so this is where I get lost. I've messed with information functions and taken derivatives, but my college-level math is not cutting it.
Can someone explain to me how to do this part? The literature I've read is short on examples, and the descriptions of the math appear incomplete to me. I suppose I'm asking how to do this with a 3PL model that assumes c is always zero and a is always 1.7 (or maybe -1.7, whatever works). I was trying to get to 1PL somehow anyway.
Edit: A visual guide to item response theory is the best explanation of how to do this I've seen so far, but the text gets confusing at the most critical point. I'm closer to getting this, but I'm still not understanding something. Also... the pattern of summations and functions isn't in this text like I expected.
How to do this:
This is an inefficient solution, but it works and is reasonably intuitive.
The last link I mentioned in the edit explains this.
Start with a probability function, a set of question difficulties, and the corresponding set of evaluations - i.e., whether or not the test taker got each item correct.
With that, you can get a series of functions that tell you the chance of the test taker giving exactly that response to each item. Now... multiply all of those functions together.
We now have a big mess! But it's a single function in terms of the unknown ability variable that we want to find.
Next... run a slew of numbers through this function. Whichever returns the maximum value is the test taker's ability level. This can be used either to determine the standard error or to pick the next question for computer adaptive testing.
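A minimal R sketch of that procedure, using the example response pattern from the question (all of the names are mine, and the grid bounds and step size are arbitrary choices):

p_correct <- function(ability, difficulty) {
  exp(ability - difficulty) / (1 + exp(ability - difficulty))
}

# Likelihood of the whole response pattern at a given ability: multiply
# P for each correct answer and (1 - P) for each incorrect one.
likelihood <- function(ability, difficulties, correct) {
  p <- p_correct(ability, difficulties)
  prod(ifelse(correct, p, 1 - p))
}

difficulties <- c(-0.1, 0, 0.1, 0.2, 0.1)
correct <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

grid <- seq(-4, 4, by = 0.01)
lik <- sapply(grid, likelihood, difficulties, correct)
grid[which.max(lik)]  # about 1.45: the maximum-likelihood ability estimate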
From my understanding, the expectation of a vector (let's say nx1) is equivalent to finding its mean. However, if we have two vectors x and y, both of which are nx1, what does it mean to find the expectation of the product of these vectors?
e.g:
E[x * y] = ?
Here, are we taking the inner product or the outer product? If I were using Matlab, would I be doing:
E[x' * y]
or
E[x * y']
or
E[x .* y]
I'm not really understanding the intuition behind expectation as applied to the product of vectors (my background is not in mathematics), so if someone could shed light on this for me I would really appreciate it. Thanks!
== EDIT ==
You're right, I wasn't clear. I came across the definition of the covariance where the formula given was:
Cov[X; Y] = E[X * Y] - E[X] * E[Y]
And the part where E[X * Y] came up is what confused me. I should have put this up on a math site, and will next time. Thanks for the help.
As much as I believe this belongs either on a math or statistics site, I'm feeling bored at the moment, so I'll say a few words.
You need to define WHAT you are doing, and understand what you want to see. Numbers, vectors, by themselves are all just that - numbers. There is no meaning without context. I'll argue this is your problem.
For example, you can view a vector as a list of numbers, thus samples from some distribution, but samples of a scalar valued parameter. Thus, my vector might be a list of the temperatures in my house over the course of a day, or of the rainfall for the last week. As such, we can talk about a mean of those measurements. If we had a distribution, we could talk about the expected value of that distribution.
You might also look at a vector as a SINGLE piece of information. It might represent my location on the surface of the earth, so perhaps [latitude, longitude, elevation]. As such, it makes no sense to take the mean of these three pieces of information. However, I might be interested in an average location, taken over many such location measurements over a period of time.
As far as worrying about inner versus outer products, they are confusing you. Instead, think about WHAT these numbers represent and what you need to do with them, and only THEN worry about how to compute what you need.
Following on from @woodchips's answer: when it does make sense to multiply two random variables and find the expectation of the product, in the discrete case it depends on whether you have values of X and Y that correspond with each other, i.e. whether for each event you have an x and a y. In that case, to find the expectation of the product, you simply multiply each pair of x and y and find the mean. If they're independent and you just have two vectors of samples with no co-occurrence, the expectation of the product is simply the product of their individual expectations.
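A small R illustration of that distinction (the sample values are made up):

x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

# Paired samples (each x[i] co-occurs with y[i]): E[XY] is the mean of the
# elementwise products - the analogue of Matlab's x .* y.
mean(x * y)        # 7

# Independent X and Y: E[XY] = E[X] * E[Y].
mean(x) * mean(y)  # 6.25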
I have a question that comes from an algorithms book I'm reading, and I am stumped on how to solve it (it's been a long time since I've done log or exponent math). The problem is as follows:
Suppose we are comparing implementations of insertion sort and merge sort on the same
machine. For inputs of size n, insertion sort runs in 8n^2 steps, while merge sort runs in 64n log n steps. For which values of n does insertion sort beat merge sort?
The log is base 2. I started out trying to solve for equality, but got stuck around n = 8 log n.
I would like the answer to discuss how to solve this mathematically (brute force with Excel is not admissible, sorry ;) ). Any links to a description of log math would be very helpful to my understanding of your answer as well.
Thank you in advance!
http://www.wolframalpha.com/input/?i=solve%288+log%282%2Cn%29%3Dn%2Cn%29
(edited since old link stopped working)
Your best bet is to use Newton's method.
http://en.wikipedia.org/wiki/Newton%27s_method
One technique to solving this would be to simply grab a graphing calculator and graph both functions (see the Wolfram link in another answer). Find the intersection that interests you (in case there are multiple intersections, as there are in your example).
In any case, there isn't a simple expression to solve n = 8 log₂ n (as far as I know). It may be simpler to rephrase the question as: "Find a zero of f(n) = n - 8 log₂ n". First, find a region containing the intersection you're interested in, and keep shrinking that region. For instance, suppose you know your target n is greater than 42, but less than 44. f(42) is less than 0, and f(44) is greater than 0. Try f(43). It's less than 0, so try 43.5. It's still less than 0, so try 43.75. It's greater than 0, so try 43.625. It's greater than 0, so keep going down, and so on. This technique is called binary search.
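For instance, here is a sketch of that bisection in R (the bracketing interval and tolerance are arbitrary choices):

f <- function(n) n - 8 * log2(n)

lo <- 42  # f(lo) < 0
hi <- 44  # f(hi) > 0
while (hi - lo > 1e-9) {
  mid <- (lo + hi) / 2
  if (f(mid) < 0) lo <- mid else hi <- mid
}
lo  # ~43.559, so insertion sort beats merge sort for 2 <= n <= 43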
Sorry, that's just a variation of "brute force with excel" :-)
Edit:
For the fun of it, I made a spreadsheet that solves this problem with binary search: binary-search.xls. The binary search logic is in the second data column, and I just auto-extended that.