In what order we need to put weights on scale? - math

I' am doing my homework in programming, and I don't know how to solve this problem:
We have a set of n weights, we are putting them on a scale one by one until all weights is used. We also have string of n letters "R" or "L" which means which pen is heavier in that moment, they can't be in balance. There are no weights with same mass. Compute in what order we have to put weights on scale and on which pan.
The goal is to find order of putting weights on scale, so the input string is respected.
Input: number 0 < n < 51, number of weights. Then weights and the string.
Output: in n lines, weight and "R" or "L", side where you put weight. If there are many, output any of them.
Example 1:
Input:
3
10 20 30
LRL
Output:
10 L
20 R
30 L
Example 2:
Input:
3
10 20 30
LLR
Output:
20 L
10 R
30 R
Example 3:
Input:
5
10 20 30 40 50
LLLLR
Output:
50 L
10 L
20 R
30 R
40 R
I already tried to compute it with recursion but unsuccessful. Can someone please help me with this problem or just gave me hints how to solve it.

Since you do not show any code of your own, I'll give you some ideas without code. If you need more help, show more of your work then I can show you Python code that solves your problem.
Your problem is suitable for backtracking. Wikipedia's definition of this algorithm is
Backtracking is a general algorithm for finding all (or some) solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it determines that the candidate cannot possibly be completed to a valid solution.
and
Backtracking can be applied only for problems which admit the concept of a "partial candidate solution" and a relatively quick test of whether it can possibly be completed to a valid solution.
Your problem satisfies those requirements. At each stage you need to choose one of the remaining weights and one of the two pans of the scale. When you place the chosen weight on the chosen pan, you determine if the corresponding letter from the input string is satisfied. If not, you reject the choice of weight and pan. If so, you continue by choosing another weight and pan.
Your overall routine first inputs and prepares the data. It then calls a recursive routine that chooses one weight and one pan at each level. Some of the information needed by each level could be put into mutable global variables, but it would be more clear if you pass all needed information as parameters. Each call to the recursive routine needs to pass:
the weights not yet used
the input L/R string not yet used
the current state of the weights on the pans, in a format that can easily be printed when finalized (perhaps an array of ordered pairs of a weight and a pan)
the current weight imbalance of the pans. This could be calculated from the previous parameter, but time would be saved by passing this separately. This would be total of the weights on the right pan minus the total of the weights on the left pan (or vice versa).
Your base case for the recursion is when the unused-weights and unused-letters are empty. You then have finished the search and can print the solution and quit the program. Otherwise you loop over all combinations of one of the unused weights and one of the pans. For each combination, calculate what the new imbalance would be if you placed that weight on that pan. If that new imbalance agrees with the corresponding letter, call the routine recursively with appropriately-modified parameters. If not, do nothing for this weight and pan.
You still have a few choices to make before coding, such as the data structure for the unused weights. Show me some of your own coding efforts then I'll give you my Python code.
Be aware that this could be slow for a large number of weights. For n weights and two pans, the total number of ways to place the weights on the pans is n! * 2**n (that is a factorial and an exponentiation). For n = 50 that is over 3e79, much too large to do. The backtracking avoids most groups of choices, since choices are rejected as soon as possible, but my algorithm could still be slow. There may be a better algorithm than backtracking, but I do not see it. Your problem seems to be designed to be handled by backtracking.
Now that you have shown more effort of your own, here is my un-optimized Python 3 code. This works for all the examples you gave, though I got a different valid solution for your third example.
def weights_on_pans():
def solve(unused_weights, unused_tilts, placement, imbalance):
"""Place the weights on the scales using recursive
backtracking. Return True if successful, False otherwise."""
if not unused_weights:
# Done: print the placement and note that we succeeded
for weight, pan in placement:
print(weight, 'L' if pan < 0 else 'R')
return True # success right now
tilt, *later_tilts = unused_tilts
for weight in unused_weights:
for pan in (-1, 1): # -1 means left, 1 means right
new_imbalance = imbalance + pan * weight
if new_imbalance * tilt > 0: # both negative or both positive
# Continue searching since imbalance in proper direction
if solve(unused_weights - {weight},
later_tilts,
placement + [(weight, pan)],
new_imbalance):
return True # success at a lower level
return False # not yet successful
# Get the inputs from standard input. (This version has no validity checks)
cnt_weights = int(input())
weights = {int(item) for item in input().split()}
letters = input()
# Call the recursive routine with appropriate starting parameters.
tilts = [(-1 if letter == 'L' else 1) for letter in letters]
solve(weights, tilts, [], 0)
weights_on_pans()
The main way I can see to speed up that code is to avoid the O(n) operations in the call to solve in the inner loop. That means perhaps changing the data structure of unused_weights and changing how it, placement, and perhaps unused_tilts/later_tilts are modified to use O(1) operations. Those changes would complicate the code, which is why I did not do them.

Related

what if the FD steps varied w.r.t output/input

I am using the finite difference scheme to find gradients.
Lets say i have 2 outputs (y1,y2) and 1 input (x) in a single component. And in advance I know that the sensitivity of y1 with respect to x is not same as the sensitivity of y2 to x. And thus i could potentially have two different steps for those as in ;
self.declare_partials(of=y1, wrt=x, method='fd',step=0.01, form='central')
self.declare_partials(of=y2, wrt=x, method='fd',step=0.05, form='central')
There is nothing that stops me (algorithmically) but it is not clear what would openmdao gradient calculation exactly do in this case?
does it exchange information from the case where the steps are different by looking at the steps ratios or simply treating them independently and therefore doubling computational time ?
I just tested this, and it does the finite difference twice with the two different step sizes, and only saves the requested outputs for each step. I don't think we could do anything with the ratios as you suggested, as the reason for using different stepsizes to resolve individual outputs is because you don't trust the accuracy of the outputs at the smaller (or large) stepsize.
This is a fair question about the effect of the API. In typical FD applications you would get only 1 function call per design variable for forward and backward difference and 2 function calls for central difference.
However in this case, you have asked for two different step sizes for two different outputs, both with central difference. So here, you'll end up with 4 function calls to compute all the derivatives. dy1_dx will be computed using the step size of .01 and dy2_dx will be computed with a step size of .05.
There is no crosstalk between the two different FD calls, and you do end up with more function calls than you would have if you just specified a single step size via:
self.declare_partials(of='*', wrt=x, method='fd',step=0.05, form='central')
If the cost is something you can bear, and you get improved accuracy, then you could use this method to get different step sizes for different outputs.

Is it possible to represent 'average value' in programming?

Had a tough time thinking of an appropriate title, but I'm just trying to code something that can auto compute the following simple math problem:
The average value of a,b,c is 25. The average value of b,c is 23. What is the value of 'a'?
For us humans we can easily compute that the value of 'a' is 29, without the need to know b and c. But I'm not sure if this is possible in programming, where we code a function that takes in the average values of 'a,b,c' and 'b,c' and outputs 'a' automatically.
Yes, it is possible to do this. The reason for this is that you can model the sort of problem being described here as a system of linear equations. For example, when you say that the average of a, b, and c is 25, then you're saying that
a / 3 + b / 3 + c / 3 = 25.
Adding in the constraint that the average of b and c is 23 gives the equation
b / 2 + c / 2 = 23.
More generally, any constraint of the form "the average of the variables x1, x2, ..., xn is M" can be written as
x1 / n + x2 / n + ... + xn / n = M.
Once you have all of these constraints written out, solving for the value of a particular variable - or determining that many solutions exists - reduces to solving a system of linear equations. There are a number of techniques to do this, with Gaussian elimination with backpropagation being a particularly common way to do this (though often you'd just hand this to MATLAB or a linear algebra package and have it do the work for you.)
There's no guarantee in general that given a collection of equations the computer can determine whether or not they have a solution or to deduce a value of a variable, but this happens to be one of the nice cases where the shape of the contraints make the problem amenable to exact solutions.
Alright I have figured some things out. To answer the question as per title directly, it's possible to represent average value in programming. 1 possible way is to create a list of map data structures which store the set collection as key (eg. "a,b,c"), while the average value of the set will be the value (eg. 25).
Extract the key and split its string by comma, store into list, then multiply the average value by the size of list to get the total (eg. 25x3 and 23x2). With this, no semantic information will be lost.
As for the context to which I asked this question, the more proper description to the problem is "Given a set of average values of different combinations of variables, is it possible to find the value of each variable?" The answer to this is open. I can't figure it out, but below is an attempt in describing the logic flow if one were to code it out:
Match the lists (from Paragraph 2) against one another in all possible combinations to check if a list contains all elements in another list. If so, substract the lists (eg. abc-bc) as well as the value (eg. 75-46). If upon substracting we only have 1 variable in the collection, then we have found the value for this variable.
If there's still more than 1 variables left such as abcd - bc = ad, then store the values as a map data structure and repeat the process, till the point where the substraction count in the full iteration is 0 for all possible combinations (eg. ac can't substract bc). This is unfortunately not where it ends.
Further solutions may be found by combining the lists (eg. ac + bd = abcd) to get more possible ways to subtract and derive at the answer. When this is the case, you just don't know when to stop trying, and the list of combinations will get exponential. Maybe someone with strong related mathematical theories may be able to prove that upon a certain number of iteration, further additions are useless and hence should stop. Heck, it may even be possible that negative values are also helpful, and hence contradict what I said earlier about 'ac' can't subtract 'bd' (to get a,c,-b,-d). This will give even more combinations to compute.
People with stronger computing science foundations may try what templatetypedef has suggested.

How to represent Graham's number in programming language (specifically python)?

I'm new to programming, and I would like to know how to represent Graham's number in python (the language I decided to learn as my first). I can show someone in real life on paper how to somewhat get to Graham's number, but for anyone who doesn't know what it is, here it is.
So imagine you have a 3. Now I can represent powers of 3 using ^ (up arrow). So 3^3 = 27. 3^^3 = 3^3^3 = 3^(3^3) = 3^27 = 7625597484987. 3^^^3 = 3^7625597484987 = scary big number. Now we can see that every time you add an up arrow, the number gets massively big.
Now imagine 3^^^^3 = 3^(3^7625597484987)... this number is stupid big.
So now that we have 3^(3^7625597484987), we will call this G1. That's 3^^^^3 (4 arrows in between the 3's).
Now G2 is basically 3 with G1 number of arrows in between them. Whatever number 3^(3^7625597484987) is, is the number of arrows in between the 2 3's of G2. So this number has 3^(3^7625597484987) number of arrows.
Now G3 has G2 number of arrows in between the 2 3's. We can see that this number (G3) is just huge. Stupidly huge. So each G has the number of arrows represented by the G before the current G.
Do this over and over again, placing and previous number of "G" arrows into the next number. Do this until G64, THAT'S Graham's number.
Now here is my question. How do you represent "the number of a certain thing (in this case arrows) is the number of arrows in the next G"? How do you represent the number of something "goes into" the next string in programming. If I can't do it in python, please post which languages this would be possible in. Thanks for any responses.
This is an example of a recursively defined number, which can be expressed using a base case and a recursive case. These two cases capture the idea of "every output is calculated from a previous output following the rule '____,' except for the first one which is ____."
A simple canonical one would be the factorial, which uses the base case 1! = 1 (or sometimes 0! = 1) and the recursive case n! = n * (n-1)! (when n>1). Or, in a more code-like fashion, f(1) = 1 and f(n) = n * f(n-1). For more information on how to write a function that behaves like this, find a good resource to learn about recursion.
The G1, G2, G3, etc. in the definition of Graham's number are like these calls to the previous results. You need a function to represent an "arrow" so you can call an arbitrary number of them. So to translate your mental model into code:
"[...] we will call this [3^^^^3] G1. [...] 'The number of a certain thing (in this case arrows) is the number of arrows in the next G' [...]"
Let the aforementioned function be arrow(n) = 3^^^...^3 (where there are n arrows). Then to get Graham's number, your base case will be G(1) = arrow(4), and your recursive case will be G(n) = arrow(G(n-1)) (when n>1).
Note that arrow(n) will also have a recursive definition (see Knuth's up-arrow notation), as you showed but didn't describe, by calculating each output from the previous (emphasis added):
3^3 = 27.
3^^3 = 3^3^3 = 3^(3^3) = 3^27 = 7625597484987.
3^^^3 = 3^7625597484987 = scary big number.
You might describe this as "every output [arrow(n)] is 3 to the power of the previous output, except for the first one [arrow(1)] which is just 3." I'll leave the translation from description to definition as an exercise.
(Also, I hope you didn't actually want to represent Graham's number, but rather just its calculation, since it's too big even for Python or any computer for that matter.)
class GrahamsNumber(object):
pass
G = GrahamsNumber()
For most purposes, this is as good of a representation of Graham's number as any other one. Sure, you can't do anything useful with G, but for the most part you can't do anything useful with Graham's number either, so it accurately reflects realistic use cases.
If you do have some specific use cases (and they're feasible), then a representation can be tailored to allow those use cases; but we can't guess what you want to be able to do with G, you have to spell it out explicitly.
For fun, I implemented hyperoperators in python as the module hyperop and one of my examples is Graham's number:
def GrahamsNumber():
# This may take awhile...
g = 4
for n in range(1,64+1):
g = hyperop(g+2)(3,3)
return g
The magic is in the recursion, which loosely stated looks like H[n](x,y) = reduce(lambda x,y: H[n-1](y,x), [a,]*b).
To answer your question, this function (in theory) will calculate the number in question. Since there is no way this program will ever finish before the heat death of the Universe, you gain more of an understanding from the "function" itself than the actual value.

Concept of Naive Bayes for demonstration purposes, how to calculate word possibilities

I need to demonstrate the Bayesian spam filter in school.
To do this I want to write a small Java application with GUI (this isn't a problem).
I just want to make sure that I really grasped the concept of the filter, before starting to write my code. So i will describe what I am going to build and how I will program it and would be really grateful if you could give a "thumbs up" or "thumbs down".
Remember: It is for a short presentation, just to demonstrate. It does not have to be performant or something else ;)
I imagine the program having 2 textareas.
In the first I want to enter a text, for example
"The quick brown fox jumps over the lazy dog"
I then want to have two buttons under this field with "good" or "bad".
When I hit one of the buttons, the program counts the appearances of each word in each section.
So for example, when I enter following texts:
"hello you viagra" | bad
"hello how are you" | good
"hello drugs viagra" | bad
For words I do not know I assume a probability of 0.5
My "database" then looks like this:
<word>, <# times word appeared in bad message>
hello, 2
you, 1
viagra, 2
how, 0
are, 0
drugs, 1
In the second textarea I then want to enter a text to evaluate if it is "good" or "bad".
So for example:
"hello how is the viagra selling"
The algorithm then takes the whole text apart and looks up for every word it's probability to appear in a "bad" message.
This is now where I'm stuck:
If I calculate the probability of a word to appear in a bad message by # times it appeared in bad messages / # times it appeared in all messages, the above text would have 0 probability to be in any category, because:
how never appeared in a bad message, so probability is 0
viagra never appeared in a good message, so probability also 0
When I now multiply the single probabilities, this would give 0 in both cases.
Could you please explain, how I calculate the probability for a single word to be "good" or "bad"?
Best regards and many thanks in advance
me
For unseen words you would like to do Laplace smoothing. What does that mean: having a zero for some word count is counterintuitive since it implies that probability of this word is 0, which is false for any word you can imagine :-) Thus you want to add a little, but positive probability to every word.
Also, consider using logarithms. Long messages will have many words with probability < 1. When you multiply lots of small floating numbers on a computer you can easily run into numerical issues. In order to overcome it, you may note that:
log (p1 * ... * pn) = log p1 + ... + log pn
So we traded n multiplications of small numbers for n additions of relatively big (and negative) ones. Then you can exponentiate result to obtain a probability estimate.
UPD: Actually, it's an interesting subtopic for your demo. It shows a drawback of NB of outputting zero probabilities and a way one can fix it. And it's not an ad-hoc patch, but a result of applying Bayesian approach (it's equivalent to adding a prior)
UPD 2: Didn't notice it first time, but it looks like you got Naive Bayes concept wrong. Especially, bayesian part of it.
Essentially, NB consists of 2 components:
We use Bayes' rule for a posterior distribution over class labels. This gives us p(class|X) = p(X|class) p(class) / p(X) where p(X) is the same for all classes, so it doesn't have any influence on the order of probabilities. Or, another way to say the same, is to say that p(class|X) is proportional to p(X|class) p(class) (up to a constant). As you may guessed already, that's where Bayes comes from.
The formula above does not have any model assumptions, it's a probability theory law. However, it's too hard to apply it directly since p(X|class) denotes probability of encountering message X in a class. No way we would have enough data to estimate probability of every single message possible. So here goes our model assumption: we say that words of a message are independent (which is, obviously, wrong and incorrect, thus method is Naive). This leads us to p(X|class) = p(x1|class) * ... * p(xn|class) where n is amount of words in X.
Now we need somehow estimate probabilities p(x|class). x here is not a whole message, but just a (one) word. Intuitively, probability of getting some word from a given a class is equal to the number of occurrences of that word in that class divided by the total size of the class: #(word, class) / #(class) (or, we could use Bayes' rule once again: p(x|class) = p(x, class) / p(class)).
Accordingly, since p(x|class) is a distribution over xs, we need it to sum to 1. Thus, if we apply Laplace smoothing by saying p(x|class) = (#(x, class) + a) / Z where Z is a normalizing constant, we need to enforce the following constraint: sum_x p(x|class) = 1, or, equivalently, sum_x(#(x, class) + a) = Z. It gives us Z = #(class) + a * N where N is number of all words (just number of words, not their occurrences!)

Quantifying the non-randomness of a specialized random generator?

I just read this interesting question about a random number generator that never generates the same value three consecutive times. This clearly makes the random number generator different from a standard uniform random number generator, but I'm not sure how to quantitatively describe how this generator differs from a generator that didn't have this property.
Suppose that you handed me two random number generators, R and S, where R is a true random number generator and S is a true random number generator that has been modified to never produce the same value three consecutive times. If you didn't tell me which one was R or S, the only way I can think of to detect this would be to run the generators until one of them produced the same value three consecutive times.
My question is - is there a better algorithm for telling the two generators apart? Does the restriction of not producing the same number three times somehow affect the observable behavior of the generator in a way other than preventing three of the same value from coming up in a row?
As a consequence of Rice's Theorem, there is no way to tell which is which.
Proof: Let L be the output of the normal RNG. Let L' be L, but with all sequences of length >= 3 removed. Some TMs recognize L', but some do not. Therefore, by Rice's theorem, determining if a TM accepts L' is not decidable.
As others have noted, you may be able to make an assertion like "It has run for N steps without repeating three times", but you can never make the leap to "it will never repeat a digit three times." More appropriately, there exists at least one machine for which you can't determine whether or not it meets this criterion.
Caveat: if you had a truly random generator (e.g. nuclear decay), it is possible that Rice's theorem would not apply. My intuition is that the theorem still holds for these machines, but I've never heard it discussed.
EDIT: a secondary proof. Suppose P(X) determines with high probability whether or not X accepts L'. We can construct an (infinite number of) programs F like:
F(x): if x(F), then don't accept L'
else, accept L'
P cannot determine the behavior of F(P). Moreover, say P correctly predicts the behavior of G. We can construct:
F'(x): if x(F'), then don't accept L'
else, run G(x)
So for every good case, there must exist at least one bad case.
If S is defined by rejecting from R, then a sequence produced by S will be a subsequence of the sequence produced by R. For example, taking a simple random variable X with equal probability of being 1 or 0, you would have:
R = 0 1 1 0 0 0 1 0 1
S = 0 1 1 0 0 1 0 1
The only real way to differentiate these two is to look for streaks. If you are generating binary numbers, then streaks are incredibly common (so much so that one can almost always differentiate between a random 100 digit sequence and one that a student writes down trying to be random). If the numbers are taken from [0,1] uniformly, then streaks are far less common.
It's an easy exercise in probability to calculate the chance of three consecutive numbers being equal once you know the distribution, or even better, the expected number of numbers needed until the probability of three consecutive equal numbers is greater than p for your favourite choice of p.
Since you defined that they only differ with respect to that specific property there is no better algorithm to distinguish those two.
If you do triples of randum values of course the generator S will produce all other triples slightly more often than R in order to compensate the missing triples (X,X,X). But to get a significant result you'd need much more data than it will cost you to find any value three consecutive times the first time.
Probably use ENT ( http://fourmilab.ch/random/ )

Resources