Arranging Vigenere Cipher into columns - encryption

As I understand it, if you arrange a Vigenere ciphertext into columns, you can use the Index of Coincidence (IOC) to find the key length.
I'm struggling to write an algorithm that takes a piece of text and arranges it into columns.
For example -
1 2 3 4 5 6 7 8 9 10
Would return this if the period is 2 -
1,3,5,7,9
2,4,6,8,10
and perform an IOC test on each of these strings
If the period is 3 -
1,4,7,10
2,5,8
3,6,9
and perform an IOC test on each of these strings
Etc etc.
I've constructed an IOC test, but I'm struggling to think of an algorithm to split the text up into columns. Any tips on how to think more like a computer scientist and construct algorithms like this?

If you already know the key length, it's pretty trivial. If you don't know the key length, you have to guess it by entropy. Here is an example in Python:
# placeholder names: the flag and find_key_length_by_entropy stand for your own code
if you_dont_know_key_length:
    key_length = find_key_length_by_entropy(ciphertext)
# column i holds every key_length-th character, starting at offset i
columns = [ciphertext[i::key_length] for i in range(key_length)]
Any language should have basically the same construct (pick every n-th element of the ciphertext).
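As a rough sketch of how the column split and the IOC test could fit together, one might try each candidate period and compare the average IOC of the resulting columns. The function names split_columns, index_of_coincidence and average_ioc are made up for this illustration, and the asker's own IOC test may well differ.

```python
from collections import Counter


def split_columns(text, period):
    # Column i takes every period-th item, starting at offset i.
    return [text[i::period] for i in range(period)]


def index_of_coincidence(s):
    # Probability that two items drawn without replacement are equal.
    n = len(s)
    if n < 2:
        return 0.0
    counts = Counter(s)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def average_ioc(text, period):
    # Mean IOC over all columns for one candidate period.
    columns = split_columns(text, period)
    return sum(index_of_coincidence(c) for c in columns) / len(columns)


# The example from the question: the numbers 1..10 with period 2 and 3.
numbers = list(range(1, 11))
print(split_columns(numbers, 2))
print(split_columns(numbers, 3))
```

For English plaintext, the average IOC is usually expected to be near 0.066 at the correct period and closer to 0.038 otherwise, so scanning average_ioc(ciphertext, p) over a range of p and looking for a clear jump is one plausible way to guess the key length.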

Finding whether there is a relationship between numbers

I have a challenge. This may be a little tricky or even impossible, but I wanted to check whether anyone has any thoughts on it.
PS: This question is general and not specific to R. Maybe I can say it's general mathematics.
I have some data:
df
ColA ColB ColC
6 9 27
1 4 32
4 8 40
If you look closely, there is a relationship between these columns.
For example, (ColC/ColB)+ColA gives 9 in every row.
df
ColA ColB ColC ColD
6 9 27 9
1 4 32 9
4 8 40 9
However, I constructed this data on purpose so that such a relationship exists.
But in general, if we take any numbers, is there a way to find out whether there is any relationship between them? It need not be (ColC/ColB)+ColA; it could be anything.
Say we have 5 columns of numeric data. I need to find a mathematical operation on them that yields a common number in every row.
This is more a question of mathematics (algebra).
Can anyone let me know whether this is even possible?
For some types of relationships this is doable. But when such a method fails to find a relationship, it typically just means there could be a relationship of a kind not covered by your approach.
One common tool for finding relationships is linear algebra, and linear dependencies in particular. Write your data in a matrix like you did. Consider a linear equation
a*ColA + b*ColB + c*ColC = 0
Use standard techniques such as Gaussian elimination to find coefficients a, b, c which satisfy this equation but are not all zero themselves. You probably can find a library to compute the kernel of a matrix which you can use for that. Now you know whether one of the columns can be expressed as a linear combination of the other two.
This is a very limited class of relationships, and doesn't cover your example yet. But you can improve it by including more columns. Include a column with ones everywhere to allow for a constant term in your formula. Include all pairwise products.
x + a*ColA + b*ColB + c*ColC + ab*ColA*ColB + ac*ColA*ColC + bc*ColB*ColC + aa*ColA^2 + bb*ColB^2 + cc*ColC^2 = 0
Now for your data this could tell you that there is a solution of the form
b=-9 c=1 ab=1 x=a=ac=bc=aa=bb=cc=0
-9*ColB + ColC + ColA*ColB = 0
which is equivalent to the relationship you described in your question.
But also observe that you are now using 3 data points to determine 10 variables. So this one relationship is by far not the only one.
In general you want at least as many data points as you have variables in your equation, that is, at least as many rows as you have columns in your extended matrix. Only then can you say that a relationship between them is indeed a property of the underlying data and not merely an artifact of having too much flexibility and too little data.
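To make the kernel idea concrete, here is a minimal sketch in pure Python that uses exact fractions instead of a linear algebra library. The helper names null_space and extended_row are assumptions made for this illustration, as is the column order 1, A, B, C, A*B, A*C, B*C, A^2, B^2, C^2.

```python
from fractions import Fraction


def null_space(rows):
    # Basis of the kernel of a matrix, via Gauss-Jordan elimination.
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0])
    pivots = []  # pivots[k] is the pivot column of reduced row k
    r = 0
    for c in range(n_cols):
        if r == len(m):
            break
        # Find a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    # Each free column yields one basis vector of the kernel.
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for k, pc in enumerate(pivots):
            v[pc] = -m[k][free]
        basis.append(v)
    return basis


def extended_row(a, b, c):
    # Assumed column order: 1, A, B, C, A*B, A*C, B*C, A^2, B^2, C^2
    return [1, a, b, c, a * b, a * c, b * c, a * a, b * b, c * c]


data = [(6, 9, 27), (1, 4, 32), (4, 8, 40)]
extended = [extended_row(*row) for row in data]

plain_kernel = null_space(data)
extended_kernel = null_space(extended)
print(len(plain_kernel), len(extended_kernel))
```

With these three rows the extended matrix has a 7-dimensional kernel, which illustrates the point about too much flexibility: the relationship -9*ColB + ColC + ColA*ColB = 0 is only one of many that fit. The plain three-column matrix also happens to have a non-trivial kernel, spanned by (12, -11, 1), which with only three rows may well be a coincidence of exactly that sort.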
In R you might want to look into linear models for determining coefficients in the presence of imprecise data. You can also use powers of formulas to include all interactions between columns, i.e. the higher-degree terms which I included above as well.

What is the correct name for those mathematical operations?

I am not a native English speaker, but I need to write code that prints messages in English, so I need the English terminology from math, statistics, etc.
This is the case:
I have two lists and I compare them, let's say:
list 1 - 1 2 3 4 5
list 2 - 2 4 6
So naturally, when you compare both lists, you see that 2 and 4 are present in both. What is this operation called? When I try to translate it from my language to English, I get "section" or "cutting". I don't believe that this is the official mathematical term for this operation.
I also want to know what it is called when you show the items that are not shared by both lists, for example 1 3 5 6.
Thanks, and sorry for the silly question.
Intersection of {1,2,3,4,5} and {2,4,6} = {2,4}
Symmetric difference of {1,2,3,4,5} and {2,4,6} = {1,3,5,6}
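Since the goal is to print these terms from code, here is a small sketch of both operations using Python's built-in set type. The variable names list1, list2, common and only_in_one are chosen just for this example.

```python
list1 = [1, 2, 3, 4, 5]
list2 = [2, 4, 6]

# Intersection: elements present in both lists.
common = sorted(set(list1) & set(list2))

# Symmetric difference: elements present in exactly one of the lists.
only_in_one = sorted(set(list1) ^ set(list2))

print("Intersection:", common)
print("Symmetric difference:", only_in_one)
```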

Program asked in an online hiring challenge

Given N integers in the form of Ai where 1≤i≤N, the goal is to find the M that minimizes the sum of |M-Ai| and then report that sum.
For example,
Sample Input: 1 2 4 5
Sample Output: 6
Explanation: One of the best M′s you could choose in this case is 3.
So the answer = |1−3|+|2−3|+|4−3|+|5−3| = 6.
The approach I used is to sort the given input and take the middle number as M.
But I was not able to pass all the test cases, and I am unable to find any other approach for this question. Where did I go wrong? (Please help me; this question has been bugging me for the past 2 days. Thanks.)
Can M be any real number or must it be an integer?
If there are no constraints on M, your algorithm should work fine.
If M must be an integer, then you have to choose M from floor(the middle number) and ceiling(the middle number).
In which language did you code up the algorithm?
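For reference, the sort-and-take-the-middle approach from the question can be sketched as follows. The function name min_total_distance is made up here, and picking the upper of the two middle elements for even N is just one valid choice, since any M between the two middle values gives the same sum.

```python
def min_total_distance(values):
    # A median minimizes the sum of absolute differences.
    ordered = sorted(values)
    m = ordered[len(ordered) // 2]  # upper middle element when N is even
    return sum(abs(m - a) for a in ordered)


print(min_total_distance([1, 2, 4, 5]))
```

Python integers do not overflow. In a language with fixed-width integers, the running sum may need a 64-bit type for large inputs, which is one possible reason for failing test cases even when the logic is right.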

Most efficient format for array data for R import?

I'm in the enviable position of being able to set up the format for my data collection ahead of time, rather than being handed some crazy format and having to struggle with it. I'd like to make sure I'm setting it up in a way that minimizes headaches down the road, but I'm not very familiar with importing into multidimensional arrays so I'd like input. It also seems like a thought exercise that others might get some use from.
I am compiling a large number of data summaries (500+) with 23 single data values for each experiment and two additional vectors that vary between 100 and 1500 data values (these two vectors happen to always match in length for each sample, but their length is different for each sample). I'm having to store all of these in an Excel sheet which I'm currently building. I want to set it up in a way that efficiently stores this data for import into an R array.
I'm assuming that the longer dimensions, which vary in length, will have the max length (1500) and a bunch of NA's at the end rather than try to keep track of ragged data in Excel.
My current plan would be to store these in long form in Excel, with data labels in the first column (dim1, dim2,...), and the data summaries in each subsequent column (a, b, c...), since this saves the most space. Using a smaller number of dimensions as an example (7 single values, 2 vectors of length 1500), the data would look like this in Excel:
a b c...
dim1 2 5 7...
dim2 3 6 8...
dim3 6 8 2 ...
dim4 5 6 1...
dim5 6 2 1...
dim6 0 3 8...
dim7 8 5 4...
dim8 1 1 1...
dim8 2 2 2 ...
... continued x1500
dim9 4 4 4...
dim9 5 5 5 ...
...continued x1500
Can I easily import this, using the leftmost column to identify the dimensions of the array in long form? I don't see an easy way to do this using Reshape2, but perhaps I'm missing something. Or, do I need to have the data in paired columns?
It isn't clear to me whether this format is the most efficient way to organize this data for import into a multidimensional array, or if there is a better way. Eventually there will be a large number of samples so I'd like to think through this now rather than struggle later.
What is the most painless way to import this...or, is there a more efficient way of setting it up for easier import?
Hmm, I can't think of a case where you would have to use melt. If you keep the current format and add a heading to the 'dim' column, then you should be able to work with that data fairly easily.
If you did transpose the data on 'dim', I think it would make things a lot more difficult.
It might be good to know what variable types a, b, c, etc. are in order to make a better assessment.

Formula for finding the Fibonacci word character at a certain index

Is there a formula you can apply directly to find a character within a Fibonacci word, without having to construct the word from scratch?
For example, let's consider:
0 a
1 b
2 ba
3 bab
4 babba
5 babbabab
Is there a way to find which character is at index 3 of w(4), which in this case is b since w(4) equals babba, provided you know beforehand that w(4) has 5 characters?
Thanks
That would have been really easy to google.
Note: there are no separate words like w(4); they are just construction steps of a single infinite word.
You can find the closed formula and its description on Wikipedia.
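As a sketch of that closed formula: for the standard Fibonacci word over 0 and 1, the n-th digit (counting from 1) is 2 + floor(n*phi) - floor((n+1)*phi), where phi is the golden ratio. The word in the question appears to be the same sequence with 0 written as b and 1 written as a. The function names floor_n_phi and fib_word_char, the 1-based indexing, and the use of integer square roots instead of floating point are choices made for this illustration.

```python
from math import isqrt


def floor_n_phi(n):
    # floor(n * (1 + sqrt(5)) / 2), computed exactly with integers.
    return (n + isqrt(5 * n * n)) // 2


def fib_word_char(n):
    # Character at 1-based position n of the infinite word babbabab...
    digit = 2 + floor_n_phi(n) - floor_n_phi(n + 1)
    return "b" if digit == 0 else "a"


print("".join(fib_word_char(i) for i in range(1, 9)))
```

Because every w(k) with k >= 1 is a prefix of the infinite word, the character at position n of w(k) is simply fib_word_char(n), as long as n does not exceed the length of w(k).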
