Assigning specific values to a boolean array - julia

Say I am tossing a fair coin where 'tails' is assigned the value x = -1/2 and 'heads' is assigned x = 1/2.
I do this N times and I want to obtain the sum. This is what I have tried:
p = 0.5
N = 1e4
X(N,p)=(rand(N).<p)
I know this is incomplete but when I check (rand(N).<p) I see an array consisting of true, false. I interpret this as 'Tails' or 'Heads'. However, I don't know how to assign the values 1/2 and -1/2 to each of these elements in order for me to find the sum. If I simply use sum((rand(N).<p)) I do get an integer value, but I don't think this is the right way to do it because I haven't specified the values 1/2 and -1/2 anywhere.
Any help is greatly appreciated.

As indicated by the comments already, you want to do
sum(rand([-0.5, 0.5], N))
where N must be an integer (you wrote N=1e4, therefore typeof(N) == Float64 and rand won't work).
The documentation of rand (obtained by ?rand) describes what rand(S, N) does:
Pick a random element or array of random elements from the set of
values specified by S
Here, S can be an optional indexable collection, an array of values in your case (or a type like Int). So, above S = [-0.5, 0.5] and rand draws N random elements from this collection, which we can afterwards sum up.
Assigning specific values to a boolean array
Since this is the title of your question, and the answer above doesn't actually address this, let me comment on this as well.
You could do sum((rand(N) .< p) .- 0.5), i.e. you shift all the ones to 0.5 and all the zeros to -0.5 to get the wanted result. On current Julia versions the subtraction has to be broadcast with .-, because subtracting a scalar from an array with a plain - is an error. Note that this is a general strategy: let's say you want true to be a and false to be b, where a and b are numbers. You achieve this by (rand(N) .< p) .* (a - b) .+ b.
However, beyond being more "complicated", sum((rand(N) .< p) .- 0.5) will allocate temporary arrays, first one of booleans, then one of numbers, the latter of which will eventually go into sum. Because of these unnecessary allocations this approach will be slower than the solution above.
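The true-to-a, false-to-b mapping is not specific to Julia. As a rough illustration only, here is the same idea in Python using just the standard library; the function name coin_sum and its parameters are made up for this sketch, and it accumulates the sum directly so no temporary arrays are built.

```python
import random

def coin_sum(n, p=0.5, a=0.5, b=-0.5, rng=random):
    """Sum n tosses where an outcome below p counts as a, otherwise as b."""
    total = 0.0
    for _ in range(n):
        is_true = rng.random() < p      # boolean outcome, like rand() < p
        total += b + (a - b) * is_true  # True -> a, False -> b
    return total

if __name__ == "__main__":
    print(coin_sum(10_000))  # random value, always a multiple of 0.5
```

Note that n has to be an integer here as well, which mirrors the N = 1e4 issue mentioned above.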

Related

Implementation of Speck cipher

I am trying to implement the speck cipher as specified here: Speck Cipher. On page 18 of the document you can find some speck pseudo-code I want to implement.
It seems that I have a problem understanding the pseudo-code. As you can find there, x and y are plaintext words with length n. l[m-2],...l[0], k[0] are key words (being words, they have length n, right?). When you do the key expansion, we iterate for i from 0 to T-2, where T is the number of rounds (for example 34). However, I get an IndexOutOfBoundsException, because the array with the l's has only m-2 positions and not T-2.
Can someone clarify what the key expansion does and how?
Ah, I get where the confusion lies:
l[m-2],...l[0], k[0]
these are the input key words, in other words, they represent the key. These are not declarations of the size of the arrays, as you might expect if you're a developer.
Then the subkeys in array k should be derived, using array l for intermediate values.
According to the formulas, taking the largest i, i.e. i_max = T - 2 you get a highest index for array l of i_max + m - 1 = T - 2 + m - 1 = T + m - 3 and therefore a size of the array of one more: T + m - 2. The size of a zero-based array is always the index of the last element - plus one, after all.
Similarly, for subkey array k you get a highest index of i_max + 1, which is T - 2 + 1 or T - 1. Again, the size of the array is one more, so there are T elements in k. This makes a lot of sense if you require T round keys :)
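To make the array sizes concrete, here is a minimal Python sketch of the key expansion as I read the pseudo-code. The function names, the argument layout (k0 plus a list l[0], ..., l[m-2]) and the default word size of 64 bits are my own assumptions for illustration; the rotation amounts 8 and 3 are the ones Speck uses for word sizes other than 16 bits.

```python
ALPHA, BETA = 8, 3  # rotation amounts (assumed: word size n is not 16)

def ror(x, r, n):
    """Rotate the n-bit word x right by r bits."""
    return ((x >> r) | (x << (n - r))) & ((1 << n) - 1)

def rol(x, r, n):
    """Rotate the n-bit word x left by r bits."""
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)

def expand_key(k0, l_words, rounds, n=64):
    """Derive the round keys from the key words k0 and l_words = [l[0], ..., l[m-2]]."""
    m = len(l_words) + 1
    mask = (1 << n) - 1
    k = [0] * rounds              # T round keys, indexes 0 .. T-1
    l = [0] * (rounds + m - 2)    # highest index is T + m - 3
    k[0] = k0
    l[:m - 1] = l_words           # the input key words fill the first m-1 slots
    for i in range(rounds - 1):   # i = 0 .. T-2
        l[i + m - 1] = ((k[i] + ror(l[i], ALPHA, n)) & mask) ^ i
        k[i + 1] = rol(k[i], BETA, n) ^ l[i + m - 1]
    return k, l
```

With m = 4 key words and T = 34 rounds this allocates 34 entries for k and 36 for l, so the loop never indexes past the end of either array.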
Note that it seems possible to simply redo the subkey derivation for each round if you require a minimum of RAM. The entire l array doesn't seem necessary either. For software implementations that doesn't matter a single iota of course.

Generate Unique Combinations of Integers

I am looking for help with pseudo code (unless you are a user of Game Maker 8.0 by Mark Overmars and know the GML equivalent of what I need) for how to generate a list / array of unique combinations of a set of X integers, where the size of the set is variable. It can be 1-5 or 1-1000.
For example:
IntegerList{1,2,3,4}
1,2
1,3
1,4
2,3
2,4
3,4
I feel like the math behind this is simple, I just can't seem to wrap my head around it after checking multiple sources on how to do it in languages such as C++ and Java. Thanks everyone.
As there are not many details in the question, I assume:
Your input is a natural number n and the resulting array contains all natural numbers from 1 to n.
The expected output given by the combinations above, resembles a symmetric relation, i. e. in your case [1, 2] is considered the same as [2, 1].
Combinations [x, x] are excluded.
There are only combinations with 2 elements.
There is no List<> datatype or dynamic array, so the array length has to be known before creating the array.
The number of elements in your result is therefore the binomial coefficient m = n over 2 = n! / (2! * (n - 2)!) (which is 4! / (2! * (4 - 2)!) = 24 / 4 = 6 in your example) with ! being the factorial.
First, initializing the array with the first n natural numbers should be quite easy using the array element index. However, since each element's value is determined by its index, you don't need to create and initialize this input array in the first place.
You need 2 nested loops. The outer loop ranges i from 1 to n - 1, the inner loop ranges j from i + 1 to n, so that every pair is produced exactly once and [x, x] never occurs. If your indexes start from 0 instead of 1, you have to take this into consideration for the loop limits. Now, you only need to fill your target array with the combinations [i, j]. To find the correct index in your target array, you should use a third counter variable, initialized with the first index and incremented at the end of each inner-loop iteration.
I agree, the math behind is not that hard and I think this explanation should suffice to develop the corresponding code yourself.
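For anyone who wants to check their own version, here is a short Python sketch of the two nested loops under the assumptions listed above (pairs only, numbers 1 to n, fixed-size target array). The name unique_pairs and the counter pos are just illustrative choices.

```python
def unique_pairs(n):
    """Return all pairs (i, j) with 1 <= i < j <= n in a preallocated array."""
    count = n * (n - 1) // 2      # binomial coefficient "n over 2"
    result = [None] * count       # target array with known length
    pos = 0                       # third counter: next free index in result
    for i in range(1, n):                 # i = 1 .. n-1
        for j in range(i + 1, n + 1):     # j = i+1 .. n
            result[pos] = (i, j)
            pos += 1
    return result

if __name__ == "__main__":
    for pair in unique_pairs(4):
        print(pair)
```

The same loop structure should carry over to GML or any other language with plain arrays, since it only needs integer counters and indexed assignment.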

Domain Error Issue

When I compute the difference between the largest and the smallest number in an empty vector (v←⍳0) using ⌈⌿(⌈/c)-⌊⌿(⌊/c), it gives me a domain error. This statement works fine with normal vectors and matrices.
How do I handle the exception such that it does not give me an error when the vector is empty? It should not return anything or just return a zero.
A guard is the best way to do this:
{0=⍴⍵:0 ⋄ (⌈/⍵)-⌊/⍵}
Note that the use of two reductions, one with axis specification, is not really needed or correct actually. That is, if you want it to work on all of the elements of a simple array of any dimension, simply ravel the argument first:
{0=⍴⍵:0 ⋄ (⌈/⍵)-⌊/⍵},10 10 ⍴⍳100
99
Or for an array of any structure or depth, you can use "super ravel":
{0=⍴⍵:0 ⋄ (⌈/⍵)-⌊/⍵}∊(1 2 3)(7 8 9 10)
9
Note that quadML (Migration Level) must be set to 3 to ensure that epsilon is "super ravel."
Note also the equivalence of the following when operating on a matrix:
⌈⌿⌈/10 10 ⍴⍳100
99
⌈/⌈/10 10 ⍴⍳100
99
⌈/⌈⌿10 10 ⍴⍳100
99
⌈⌿⌈⌿10 10 ⍴⍳100
99
Using reduction with axis is not needed in this case, and obscures the intent and is also potentially more expensive. Better to just ravel the whole thing.
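For readers who do not have an APL interpreter at hand, the guard-plus-flatten idea can be sketched in Python. This is only a loose analogue: flatten stands in for ravel/enlist and the early return stands in for the guard; both function names are invented for the example.

```python
def flatten(data):
    """Yield the numbers in an arbitrarily nested list, loosely like APL enlist."""
    for item in data:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

def value_range(data):
    """Largest minus smallest element, or 0 when there are no elements."""
    values = list(flatten(data))
    if not values:              # guard: nothing to reduce over
        return 0
    return max(values) - min(values)

if __name__ == "__main__":
    print(value_range([]))
    print(value_range([[1, 2, 3], [7, 8, 9, 10]]))
```

As in the APL version, the empty case is decided before any reduction runs, so max and min are never applied to an empty sequence.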
As I mentioned in the comments, Dyalog APL has guards, which can be used for conditional execution, and thus you can simply check for the empty vector and give a different answer.
This can be implemented in a more traditional/pure APL method however.
This version only works in 1-dimension
In the APL font:
Z←DIFFERENCE V
⍝ Calculate the difference between the largest and smallest element of a vector, with empty set protection
⍝ Difference is calculated by subtracting the reduced floor from the reduced ceiling
⍝ eg. (⌈⌿(⌈V)) - (⌊⌿(⌊V))
⍝ Protection is implemented by comparison against the empty set ⍬≡V
⍝ Which yields 0 or 1, and using that result to select an answer from a tuple
⍝ If empty, then it drops the first element, yielding just a zero, otherwise both are retained
⍝ eg. <condition>↓(a b) => 0 = (a b), 1 = (b)
⍝ The final operation is first ↑, to remove the first element from the tuple.
Z←↑(⍬≡V)↓(((⌈⌿(⌈V)) - (⌊⌿(⌊V))) 0)
Or in brace notation, for people without the font.
Z{leftarrow}DIFFERENCE V
{lamp} Calculate the difference between the largest and smallest element of a vector, with empty set protection
{lamp} Difference is calculated by subtracting the reduced floor from the reduced ceiling
{lamp} eg. ({upstile}{slashbar}({upstile}V)) - ({downstile}{slashbar}({downstile}V))
{lamp} Protection is implemented by comparison against the empty set {zilde}{equalunderbar}V
{lamp} Which yields 0 or 1, and using that result to select an answer from a tuple
{lamp} If empty, then it drops the first element, yielding just a zero, otherwise both are retained
{lamp} eg. <condition>{downarrow}(a b) => 0 = (a b), 1 = (b)
{lamp} The final operation is first {uparrow}, to remove the first element from the tuple.
Z{leftarrow}{uparrow}({zilde}{equalunderbar}V){downarrow}((({upstile}{slashbar}({upstile}V)) - ({downstile}{slashbar}({downstile}V))) 0)
Updated. multi-dimensional
Z←DIFFERENCE V
⍝ Calculate the difference between the largest and smallest element of an array, with empty set protection
⍝ Initially enlist the array to reduce it to a single dimension
⍝ eg. ∊V
⍝ Difference is calculated by subtracting the reduced floor from the reduced ceiling
⍝ eg. (⌈/V) - (⌊/V)
⍝ Protection is implemented by comparison against the empty set ⍬≡V
⍝ Which yields 0 or 1, and using that result to select an answer from a tuple
⍝ If empty, then it drops the first element, yielding just a zero, otherwise both are retained
⍝ eg. <condition>↓(a b) => 0 = (a b), 1 = (b)
⍝ The final operation is first ↑, to remove the first element from the tuple.
V←∊V
Z←↑(⍬≡V)↓(((⌈/V) - (⌊/V)) 0)

Calculating Cosine Similarity of two Vectors of Different Size

I have 2 questions,
I've made a vector from a document by finding out how many times each word appeared in a document. Is this the right way of making the vector? Or do I have to do something else also?
Using the above method I've created vectors of 16 documents, which are of different sizes. Now I want to apply cosine similarity to find out how similar each document is. The problem I'm having is getting the dot product of two vectors because they are of different sizes. How would I do this?
Sounds reasonable, as long as it means you have a list/map/dict/hash of (word, count) pairs as your vector representation.
You should pretend that you have zero values for the words that do not occur in some vector, without storing these zeros anywhere. Then, you can use the following algorithm to compute the dot product of these vectors (pseudocode):
algorithm dot_product(a : WordVector, b : WordVector):
    dot = 0
    for word, x in a do
        y = lookup(word, b)    // yields 0 if word does not occur in b
        dot += x * y
    return dot
The lookup part can be anything, but for speed, I'd use hashtables as the vector representation (e.g. Python's dict).
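As a minimal sketch of how this might look with Python dicts, assuming each document is already a mapping from word to count: dict.get(word, 0) plays the role of lookup, and the cosine similarity is the dot product divided by the product of the two vector lengths. Returning 0.0 for an empty document is my own assumption, not something the question specifies.

```python
import math

def dot_product(a, b):
    """Dot product of two sparse word-count vectors stored as dicts."""
    return sum(x * b.get(word, 0) for word, x in a.items())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot_product(a, b) / (norm_a * norm_b)

if __name__ == "__main__":
    doc1 = {"the": 2, "cat": 1}
    doc2 = {"the": 1, "dog": 3}
    print(cosine_similarity(doc1, doc2))
```

Because missing words count as zero, the two dicts never need to have the same keys or the same size.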

Efficient Multiplication of Varying-Length #s [Conceptual]

EDIT
So it seems I "underestimated" what varying length numbers meant. I didn't even think about situations where the operands are 100 digits long. In that case, my proposed algorithm is definitely not efficient. I'd probably need an implementation whose complexity depends on the # of digits in each operand as opposed to its numerical value, right?
As suggested below, I will look into the Karatsuba algorithm...
Write the pseudocode of an algorithm that takes in two arbitrary length numbers (provided as strings), and computes the product of these numbers. Use an efficient procedure for multiplication of large numbers of arbitrary length. Analyze the efficiency of your algorithm.
I decided to take the (semi) easy way out and use the Russian Peasant Algorithm. It works like this:
a * b = a/2 * 2b if a is even
a * b = (a-1)/2 * 2b + b if a is odd
My pseudocode is:
rpa(x, y){
    if x is 1
        return y
    if x is even
        return rpa(x/2, 2y)
    if x is odd
        return rpa((x-1)/2, 2y) + y
}
I have 3 questions:
Is this efficient for arbitrary length numbers? I implemented it in C and tried varying length numbers. The run-time was near-instant in all cases so it's hard to tell empirically...
Can I apply the Master Theorem to understand the complexity...?
a = # subproblems in recursion = 1 (max 1 recursive call across all states)
n / b = size of each subproblem = n / 1 -> b = 1 (problem doesn't change size...?)
f(n^d) = work done outside recursive calls = 1 -> d = 0 (the addition when a is odd)
a = 1, b^d = 1, a = b^d -> complexity is in n^d*log(n) = log(n)
this makes sense logically since we are halving the problem at each step, right?
What might my professor mean by providing arbitrary length numbers "as strings". Why do that?
Many thanks in advance
What might my professor mean by providing arbitrary length numbers "as strings". Why do that?
This actually changes everything about the problem (and makes your algorithm incorrect).
It means that 1234 is provided as 1,2,3,4 and you cannot operate directly on the whole number. You need to analyze your algorithm in terms of #additions, #multiplications, #divisions.
You should expect a division to be a bit more expensive than a multiplication, and a multiplication to be a lot more expensive than an addition. So a good algorithm tries to reduce the number of divisions and multiplications.
Check out the Karatsuba algorithm, which is one of the fastest for this specification (PS: don't copy it, that's not what your teacher wants).
Regarding 3): Native integers are limited in how large (or small) numbers they can represent (32- or 64-bit integers for example). To represent arbitrary length numbers you can choose strings, because then you are not really limited by this. The problem is then, of course, that your arithmetic units are not really made to add strings ;-)
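To show what operating on digits rather than on the whole number looks like, here is a sketch of plain schoolbook multiplication on decimal strings in Python. This is deliberately not Karatsuba; it is the simple baseline, and the function name multiply_strings is made up. It assumes both inputs are non-empty strings of decimal digits without a sign.

```python
def multiply_strings(a, b):
    """Schoolbook multiplication of two non-negative decimal digit strings."""
    x = [int(ch) for ch in reversed(a)]   # least significant digit first
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))      # a product has at most len(a)+len(b) digits
    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(y)] += carry       # leftover carry of this row
    while len(result) > 1 and result[-1] == 0:
        result.pop()                      # strip leading zeros
    return "".join(str(d) for d in reversed(result))

if __name__ == "__main__":
    print(multiply_strings("1234", "5678"))  # prints 7006652
```

The number of single-digit multiplications is len(a) * len(b), so the cost depends on the number of digits of the operands rather than on their numerical value; Karatsuba lowers that count by splitting the operands recursively.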
