How to create an 8-bit comparator with four 2-bit comparators?

I am designing an 8-bit comparator in Xilinx ISE Project Navigator. My goal is to chain four 2-bit comparators, as shown in the picture. The input is a 16-bit value, of which the upper 8 bits are number A and the lower 8 are number B (SW(15:8) -> A; SW(7:0) -> B). There are two button inputs, BTN0 and BTN1; I use BTN0 to drive the first comparator's EQ input to 1.
In ISim, the comparison works fine when the two numbers are equal, but gives odd results when I try two different numbers. I am working from several sources and I'm a beginner at all this, so there could easily be a bug/error I didn't think of.
http://25.media.tumblr.com/4e443e33d84b43e80e4f595b0044ab86/tumblr_mjd7vttpuc1r65yueo1_1280.png

I am afraid the 2-bit comparator is not correct. For example, if A1 = 1, A0 = 0, B1 = 0, and B0 = 0, the output of the AND3B1 is 0, and the output of the AND4B1 will also be 0, so AG = 0, even though A = 10b is greater than B = 00b and AG should therefore be 1.
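The schematic itself isn't reproduced here, but the expected cascade behavior is easy to sanity-check in software. Below is a minimal Python model of one 2-bit comparator stage with EQ/GT cascade inputs; the cascade order (most significant slice first, seeded with EQ = 1) is an assumption about the intended design, not something taken from the schematic:

```python
def stage(a, b, eq_in, gt_in):
    # One 2-bit comparator stage; a and b are 2-bit slices,
    # eq_in/gt_in come from the more significant stage.
    if not eq_in:                  # higher bits already decided the result
        return eq_in, gt_in
    if a == b:
        return 1, gt_in            # still equal so far
    return 0, (1 if a > b else 0)

def compare8(A, B):
    # Cascade four stages, seeding EQ = 1 (the BTN0 input in the question).
    eq, gt = 1, 0
    for shift in (6, 4, 2, 0):     # most significant 2-bit slice first
        eq, gt = stage((A >> shift) & 3, (B >> shift) & 3, eq, gt)
    return eq, gt                  # (A == B, A > B)
```

For the example above (A1 = 1, A0 = 0, B1 = 0, B0 = 0, i.e. A = 2, B = 0), this model gives AG = 1, which supports the suspicion that a schematic producing AG = 0 for those inputs is wrong.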

Related

Most elegant way to determine how much one has been bitshifted

So let's say we bitshift 1 by some number x; e.g., in C:
unsigned char cNum= 1, x= 6;
cNum <<= x;
cNum will equal 01000000b (0x40).
Easy peasy. But without using a lookup table or while loop, is there a simple operation that will take cNum and give me x back?
AFAIK, no 'simple' formula is available.
One can, however, calculate the index of the most significant (or least significant) set bit:
a = 000010010, a_left = 3, a_right = 1
b = 001001000, b_left = 5, b_right = 3
The difference of the shifts is 2 (or -2).
One can then shift the smaller by abs(shift) to compare that a << 2 == b. (In some architectures there exists a shift by signed value, which works without absolute value or checking which way the shift needs to be carried.)
In ARM there is an instruction for counting leading zeros (CLZ, which gives the MSB index), and on Intel there are instructions to scan for the lowest or highest set bit (BSF/BSR).
log2(cNum) will yield x where cNum != 0 (for example, log2(0x40) == 6), at least in GNU C.
And the compiler does the casts automagically, which is probably bad form, but it gives me what I need.
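In Python the same idea is a one-liner: for a power of two, the index of the set bit is just bit_length() - 1. A small sketch (not the C answer above):

```python
def shift_amount(c_num):
    # Recover x from c_num == 1 << x.
    # Only valid when c_num is an exact power of two.
    assert c_num > 0 and c_num & (c_num - 1) == 0, "not a power of two"
    return c_num.bit_length() - 1
```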

Modulo operator in decryption

I'm creating an encryptor/decryptor for ASCII strings where I take the ASCII value of a char, add 1 to it, then mod it by the highest ASCII value so that I get a valid ASCII char out.
The problem is the decryption.
Let's say that (a + b) % c = d
I know the values of b, c, and d.
How do I get the value of a back out of that?
This is exactly the ROT1 substitution cipher. Subtract 1, and if less than lowest value (0 I assume, given how you're describing it), then add the highest value.
Using terms like "mod", while accurate, makes this seem more complicated than it is. It's just addition on a ring. When you go past the last letter, you come back to the first letter, and vice versa. Once you wrap your head around how the math works, the equations should pop out. Basically, you just add or subtract as normal (add to encrypt, subtract to decrypt in this case), and at the end, mod "normalizes" you back onto the ring of legal values.
Use the inverse formula
a = (d - b) mod c
or in practice
a = (d - b + c) % c.
The + c term needs to be added as a safeguard because the % operator does not implement a true mathematical modulo for negative operands.
Let's assume that c is 2, d is 0 and b is 4.
Now we know that a must be even: it could be 0, 2, 4, or any other even number.
You can't solve this problem uniquely; there are infinitely many solutions, because only a mod c is determined.
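A quick sketch of the round trip in Python. The recovery is unique here only because a valid code point is already below c; the value 128 for the "highest ascii value" is an assumption, since the question doesn't pin it down:

```python
HIGHEST = 128   # assumed "highest ascii value" from the question
SHIFT = 1       # the b in (a + b) % c

def encrypt_char(ch):
    # d = (a + b) % c
    return (ord(ch) + SHIFT) % HIGHEST

def decrypt_code(d):
    # inverse formula: a = (d - b + c) % c
    return chr((d - SHIFT + HIGHEST) % HIGHEST)
```

The + HIGHEST safeguard matters for the wrap-around case, e.g. decrypting the code for chr(0).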

All numbers in a given range but random order

Let's say I want to generate all integers from 1-1000 in a random order. But...
No numbers are generated more than once
Without storing an Array, List... of all possible numbers
Without storing the already generated numbers.
Without missing any numbers in the end.
I think that should be impossible but maybe I'm just not thinking about the right solution.
I would like to use it in C#, but I'm more interested in the approach than the actual implementation.
Encryption. An encryption is a one-to-one mapping between two sets. If the two sets are the same, then it is a permutation specified by the encryption key. Write or find an encryption that maps {0, ..., 1000} onto itself. Read up on Format Preserving Encryption (FPE) to help you here.
To generate the random order just encrypt the numbers 0, 1, 2, ... in order. You don't need to store them, just keep track of how far you have got through the list.
From a practical point of view, numbers in {0, ..., 1023} would be easier to deal with, as that is a block cipher with a 10-bit block size, and you could write a simple Feistel cipher to generate your numbers. You might want to do that anyway, and just re-encrypt numbers above 1000 -- the cycle walking method of FPE.
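A sketch of that idea in Python: a toy 10-bit balanced Feistel network combined with cycle walking to stay within 0-1000. The hash-based round function is an arbitrary illustrative choice, not a vetted cipher; the Feistel structure guarantees a permutation regardless of what the round function computes:

```python
import hashlib

BITS = 10              # block size: permutes 0..1023
HALF = BITS // 2
MASK = (1 << HALF) - 1

def round_fn(half, key, rnd):
    # Arbitrary keyed round function producing a 5-bit value.
    data = f"{key}:{rnd}:{half}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big") & MASK

def feistel(x, key, rounds=4):
    # Balanced Feistel network over a 10-bit block; always a bijection.
    left, right = x >> HALF, x & MASK
    for rnd in range(rounds):
        left, right = right, left ^ round_fn(right, key, rnd)
    return (left << HALF) | right

def encrypt_in_range(x, key, limit=1001):
    # Cycle walking: re-encrypt until the value lands back in [0, limit).
    y = feistel(x, key)
    while y >= limit:
        y = feistel(y, key)
    return y
```

Encrypting 0, 1, 2, ... with a fixed key then visits each of 0-1000 exactly once, with no storage beyond the current index.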
If randomness isn't a major concern, you could use a linear congruential generator. Since an LCG won't produce a maximal length sequence when the modulus is a prime number, you would need to choose a larger modulus (the next highest power of 2 would be an obvious choice) and skip any values outside the required range.
I'm afraid C# isn't really my thing, but hopefully the following Python is self-explanatory. It will need a bit of tweaking if you want to generate sequences over very small ranges:
# randint(a, b) returns a random integer in the range (a..b) (inclusive)
from random import randint

def lcg_params(u, v):
    # Generate parameters for an LCG that produces a maximal length sequence
    # of numbers in the range (u..v)
    diff = v - u
    if diff < 4:
        raise ValueError("Sorry, range must be at least 4.")
    m = 2 ** diff.bit_length()               # Modulus
    a = (randint(1, (m >> 2) - 1) * 4) + 1   # Random odd integer, (a-1) divisible by 4
    c = randint(3, m) | 1                    # Any odd integer will do
    return (m, a, c, u, diff + 1)

def generate_pseudorandom_sequence(rmin, rmax):
    (m, a, c, offset, seqlength) = lcg_params(rmin, rmax)
    x = 1            # Start with a seed value of 1
    result = []      # Create empty list for output values
    for i in range(seqlength):
        # To generate numbers on the fly without storing them in an array,
        # just run the following while loop to fetch a new number
        while True:
            x = (x * a + c) % m        # Iterate LCG until we get a value in the
            if x < seqlength: break    # required range
        result.append(x + offset)      # Add this value to the list
    return result
Example:
>>> generate_pseudorandom_sequence(1, 20)
[4, 6, 8, 1, 10, 3, 12, 5, 14, 7, 16, 9, 18, 11, 20, 13, 15, 17, 19, 2]

Increment number stored as array of digit-counters

I'm trying to store a counter that can become very large (well over 32 and probably 64-bit limits), but rather than use a single integer, I'd like to store it as an array of counters for each digit. This should be pretty language-agnostic.
In this form, 0 would be [1, 0, 0, 0, 0, 0, 0, 0, 0, 0] (one zero, none of the other digits up to 9). 1 would be [0, 1, 0, ...] and so on. 10 would therefore be [1, 1, 0, ...].
I can't come up with a way to keep track of which digits should be decremented (moving from 29 to 30, for example) and how those changes should propagate. I suspect that it can't be done without another counter, either a single value representing the last cell touched, or an array of 10 more counters to flag when each digit should be touched.
Is it possible to represent a number in this fashion and count up without using a simple integer value?
No, this representation by itself would be useless because it fails to encode digit position, leading to many numbers having the same representation (e.g. 121 and 211).
Either use a bignum library, or 80 bits' worth of raw binary (sufficient to store your declared range of 10e23).
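The ambiguity is easy to demonstrate with a short Python check:

```python
def digit_counts(n):
    # Count how many times each digit 0-9 occurs, discarding position --
    # exactly the representation proposed in the question.
    counts = [0] * 10
    for ch in str(n):
        counts[int(ch)] += 1
    return counts

# 121 and 211 collapse to the same counter array, so the
# representation cannot be incremented unambiguously.
assert digit_counts(121) == digit_counts(211)
```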

How do you represent data records containing non-numerical features as vectors (mathematical, NOT c++ vector)?

Many data mining algorithms/strategies use vector representation of data records in order to simulate a spatial representation of the data (like support vector machines).
My trouble comes from how to represent non-numerical features within the dataset. My first thought was to 'alias' each possible value for a feature with a number from 1 to n (where n is the number of possible values for that feature).
While doing some research I came across a suggestion that, when dealing with features that have a small number of possible values, you should use a bit string of length n where each bit represents a different value and only the bit corresponding to the stored value is set. I can see how this could save memory when a feature has fewer possible values than the number of bits in an integer on your target system, but the data set I'm working with has many different values for various features, so I don't think that solution will help me.
What are some of the accepted methods of representing these values in vectors and when is each strategy the best choice?
So there's a convention to do this. It's much easier to show by example than to explain.
Suppose you have collected, from your web analytics app, four sets of metrics describing each visitor to a web site:
sex/gender
acquisition channel
forum participation level
account type
Each of these is a categorical variable (aka factor) rather than a continuous variable (e.g., total session time, or account age).
# column headers of raw data--all fields are categorical ('factors')
col_headers = ['sex', 'acquisition_channel', 'forum_participation_level', 'account_type']

# a single data row represents one user
row1 = ['M', 'organic_search', 'moderator', 'premium_subscriber']

# expand data matrix width-wise by adding new fields (columns) for each factor level:
input_fields = ['male', 'female', 'new', 'trusted', 'active_participant', 'moderator',
                'direct_typein', 'organic_search', 'affiliate', 'premium_subscriber',
                'regular_subscriber', 'unregistered_user']

# now, original 'row1' above, becomes (for input to ML algorithm, etc.)
warehoused_row1 = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
This transformation technique seems more sensible to me than keeping each variable as a single column. For instance, if you do the latter, then you have to reconcile the three types of acquisition channels with their numerical representation--i.e., if organic search is a "1" should affiliate be a 2 and direct_typein a 3, or vice versa?
Another significant advantage of this representation is that it is, despite the width expansion, a compact representation of the data. (In instances where the column expansion is substantial -- e.g., one field is user state, which might mean 1 column becomes 50 -- a sparse matrix representation is obviously a good idea.)
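The expansion can be done mechanically. A small sketch, assuming the raw values have first been normalized to the expanded column names (e.g. 'M' -> 'male'):

```python
input_fields = ['male', 'female', 'new', 'trusted', 'active_participant', 'moderator',
                'direct_typein', 'organic_search', 'affiliate', 'premium_subscriber',
                'regular_subscriber', 'unregistered_user']

def one_hot(row, fields=input_fields):
    # Set a 1 in each column whose factor level appears in the raw row.
    present = set(row)
    return [1 if f in present else 0 for f in fields]

row1 = ['male', 'organic_search', 'moderator', 'premium_subscriber']
assert one_hot(row1) == [1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
```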
For this type of work I use the numerical computation libraries NumPy and SciPy.
From the Python interactive prompt:
>>> import numpy as NP
>>> # create two data rows, representing two unique visitors to a Web site:
>>> row1 = NP.array([0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0])
>>> row2 = NP.array([1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
>>> row1.dtype
dtype('int64')
>>> row1.itemsize
8
>>> # these two data arrays can be converted from int/float to boolean, substantially
>>> # reducing their size w/ concomitant performance improvement
>>> row1 = NP.array(row1, dtype=bool)
>>> row2 = NP.array(row2, dtype=bool)
>>> row1.dtype
dtype('bool')
>>> row1.itemsize    # compare with row1.itemsize = 8, above
1
>>> # element-wise comparison of two data vectors (two users) is straightforward:
>>> row1 == row2     # element-wise comparison
array([False, False,  True,  True, False, False, False, False,  True,
       False, False,  True], dtype=bool)
>>> NP.sum(row1 == row2)
4
For similarity-based computation (e.g., k-Nearest Neighbors), there is a particular metric used for expanded data vectors comprised of categorical variables, the Tanimoto Coefficient. For the particular representation I have used here, the function would look like this:

def tanimoto_bool(A, B):
    AuB = NP.sum(A == B)
    numer = AuB
    denom = len(A) + len(B) - AuB
    return numer / float(denom)

>>> tanimoto_bool(row1, row2)
0.2
There is no "widely accepted answer" that I know of; it entirely depends on what you want.
The main idea behind your post is that the trivial memory representation of a state may be too memory-intensive. For example, to store a value that can have at most four states, you would use an int (32 bits), but you could manage with only 2 bits, 16 times less.
However, the cleverer (i.e., more compact) your representation of a vector, the longer it will take to code/decode it from/to the trivial representation.
I did a project where I represented the state of a Connect-4 board with 2 doubles (64 bits each), where each double coded the discs owned by one player. It was a vast improvement over storing the state as 42 integers! I could explore much farther by having a smaller memory footprint. This is typically what you want.
It is possible, through clever understanding of Connect-4, to code it with only one double! I tried it, and the program became so long that I reverted to using 2 doubles instead of one. The program spent most of its time in the code/decode functions. This is typically what you do not want.
Now, because you want an answer, here are some guidelines:
If you can store booleans with one byte only, then keep them as booleans (language/compiler dependent).
Concatenate all small features (between 3 and 256 possible values) in primitive types like int, double, long double or whatever your language uses. Then write functions to code/decode, using bitshift operators for speed if possible.
Keep features that can have "lots of" (more than 256) possible values as-is.
Of course, these are not absolutes. If you have one feature that can take exactly 2^15 values and another that can take 2^17, then concatenate them in a primitive type that is 32 bits wide.
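The concatenate/decode idea above can be sketched with shift operators; the field widths here are chosen purely for illustration:

```python
def pack(features, widths):
    # Concatenate small features into one integer, leftmost feature first.
    value = 0
    for feat, width in zip(features, widths):
        assert 0 <= feat < (1 << width), "feature out of range for its width"
        value = (value << width) | feat
    return value

def unpack(value, widths):
    # Decode by peeling fields off the low end, then restore the order.
    feats = []
    for width in reversed(widths):
        feats.append(value & ((1 << width) - 1))
        value >>= width
    return list(reversed(feats))

widths = [2, 3, 5]   # features with 4, 8 and 32 possible values
assert unpack(pack([3, 5, 17], widths), widths) == [3, 5, 17]
```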
TL;DR: There's a trade-off between memory consumption and speed. You need to adjust according to your problem.
