The 68k registers are divided into two groups of eight: eight data registers (D0 to D7) and eight address registers (A0 to A7). What is the purpose of this separation? Would it not be better if they were unified?
The short answer is, this separation comes from the architecture limitations and design decisions made at the time.
The long answer:
The M68K implements quite a lot of addressing modes (especially when compared with RISC-based processors), with many of its instructions supporting most (if not all) of them. This gives a large variety of addressing-mode combinations within every instruction.
This also adds complexity to opcode execution. Take the following example:
move.l $10(pc), -$20(a0,d0.l)
The instruction just copies a long-word from one location to another, simple enough. But in order to actually perform the operation, the processor needs to figure out the actual (raw) memory addresses to work with for both the source and destination operands. This process, in which the operands' addressing modes are decoded (resolved), is called the effective address calculation.
For this example:
In order to calculate the source effective address, $10(pc), the processor loads the value of the PC (program counter) register and adds $10 to it.
In order to calculate the destination effective address, -$20(a0,d0.l), the processor loads the value of the A0 register, adds the value of the D0 register to it, then subtracts $20.
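The two calculations can be sketched in Python. This is only an illustrative model of the arithmetic, not a description of the hardware: the function names and register contents are made up, and it ignores details such as exactly which PC value the real processor uses as the base for PC-relative addressing.

```python
MASK32 = 0xFFFFFFFF  # addresses are modelled as 32-bit values that wrap around


def ea_pc_displacement(pc: int, disp: int) -> int:
    """Effective address for d(PC): PC plus a signed displacement."""
    return (pc + disp) & MASK32


def ea_indexed(an: int, xn: int, disp: int) -> int:
    """Effective address for d(An,Xn.L): base + index + signed displacement."""
    return (an + xn + disp) & MASK32


# Hypothetical register contents, chosen only for illustration.
pc, a0, d0 = 0x0000_2000, 0x0001_0000, 0x0000_0040

source = ea_pc_displacement(pc, 0x10)   # $10(pc)
dest = ea_indexed(a0, d0, -0x20)        # -$20(a0,d0.l)
print(hex(source), hex(dest))           # 0x2010 0x10020
```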
This is quite a lot of calculation for a single opcode, isn't it?
But the M68K is quite fast in performing these calculations. In order to calculate effective addresses quickly, it implements a dedicated Address Unit (AU).
As a general rule, operations on data registers are handled by the ALU (Arithmetic Logical Unit) and operations involving address calculations are handled by the AU (Address Unit).
The AU is well optimized for 32-bit address operations: it performs a 32-bit addition or subtraction within one bus cycle (4 CPU ticks), whereas the ALU takes 2 bus cycles for 32-bit operations.
However, the AU is limited to just load and basic addition/subtraction operations (as dictated by the addressing modes), and it's not connected to the CCR (Condition Code Register), which is why arithmetic on address registers (MOVEA, ADDA, SUBA) does not update the flags.
That said, the AU was there to optimize the calculation of complex addressing modes, but it just couldn't replace the ALU completely (after all, there were only about 68,000 transistors in the M68K); hence there are two register sets (data and address registers), each having its own dedicated unit.
So this is just based on a quick lookup, but using 16 registers is obviously easier to program. The problem could be that you would then have to make instructions for each of the 16 registers, which would double the number of opcodes needed. Using half for each purpose is not ideal but gives access to more registers in general.
How bad is it to change a generated GUID manually and use it? Is the probability of collision still insignificant, or is manipulating GUIDs dangerous?
Sometimes we just change a letter of a previously generated GUID and use it. Should we stop doing it?
This depends on the version of the GUID and where you are making the change. Let's dissect a little what a GUID actually looks like:
A GUID has a version. The 13th hex digit in a GUID marks its version. Current GUIDs are usually generated with version 4. If you change the version backwards you risk collision with a GUID that already exists. Change it forwards and you risk collision with potential future GUIDs.
A GUID has a variant too. The 17th hex digit in a GUID is the variant field. Some values of it are reserved for backward compatibility, one value is reserved for future expansion. So changing something there means you risk collision with previously-generated GUIDs or maybe GUIDs to be generated in the future.
A GUID is structured differently depending on the version. Version 4 GUIDs use (for the most part, excepting the 13th and 17th hex digits) truly random or pseudo-random bits (pseudo-random in most implementations). Change something there and your probability of collision remains about the same.
It should be very similar for version 3 and 5 GUIDs which use hashes, although I don't recall ever seeing one in the wild. Not so much for versions 1 and 2, though. Those have a structure and depending on where you change something you make things difficult.
Version 1 GUIDs include a timestamp and a counter field which gets incremented if two GUIDs are generated in the same clock interval (and thus would lead to the same timestamp). If you change the timestamp you risk colliding with a GUID generated earlier or later on the same machine. If you change the counter you risk colliding with a GUID that was generated at the same time and thus needed the counter as a “uniquifier”.
Version 2 GUIDs expand on version 1 and include a user ID as well. The timestamp is less accurate and contains a user or group ID while a part of the counter is used to indicate which one is meant (but which only has a meaning to the generating machine). So with a change in those parts you risk collision with GUIDs generated by another user on the same machine.
Version 1 and 2 GUIDs include a MAC address. Specifically, the MAC address of the computer that generated them. This ensures that GUIDs from different machines are different even if generated in the very same instant. There is a fallback if a machine doesn't have a MAC address but then there is no uniqueness guarantee. A MAC address also has a structure and consists of an “Organisationally Unique Identifier” (OUI; which is either locally-administered or handed out by the IEEE) and a unique identifier for the network card.
If you make a change in the OUI you risk colliding with GUIDs generated in computers with network cards of other manufacturers. Unless you make the change so the second-least significant bit of the first octet is 1, in which case you're switching to a locally-administered OUI and only risk collision with GUIDs generated on computers that have an overridden MAC address (which might include most VMs with virtual network hardware).
If you change the card identifier you risk collision with GUIDs generated on computers with other network cards by the same manufacturer or, again, with those where the MAC address was overridden.
No other versions exist so far but the gist is the following: A GUID needs all its parts to ensure uniqueness; if you change something you may end up with a GUID which isn't necessarily unique anymore. So you're probably making it more of a GID or something. The safest to change are probably the current version 4 GUIDs (which is what Windows and .NET will generate) as they don't really guarantee uniqueness but instead make it very, very unlikely.
Generally I'd say you're much better off generating a new GUID, though. This also helps the person reading them because you can tell two GUIDs apart as different easily if they look totally different. If they only differ in a single digit a person is likely to miss the change and assume the GUIDs to be the same.
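To see which digits carry the version and the variant, here is a small sketch using Python's standard uuid module. The helper name is my own, and the "edited" GUID at the end is only an illustration of the kind of hand edit discussed above, not a recommendation.

```python
import uuid


def version_and_variant_digits(guid: uuid.UUID) -> tuple:
    """Return the 13th and 17th hex digits (version and variant fields)."""
    digits = guid.hex  # 32 hex digits, without hyphens
    return digits[12], digits[16]


g = uuid.uuid4()
version_digit, variant_digit = version_and_variant_digits(g)
print(g, version_digit, variant_digit)

# Changing a digit in the random part leaves version and variant intact;
# the result is simply another, hand-made, version 4 value.
replacement = "f" if g.hex[0] != "f" else "0"
edited = uuid.UUID(replacement + g.hex[1:])
print(edited, edited.version)
```

For a version 4 GUID the first digit printed is 4 and the second is one of 8, 9, a or b.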
Further reading:
Wikipedia: GUID
Wikipedia: UUID
Eric Lippert: GUID guide. Part 1, part 2, part 3. (Read it; this guy can explain wonderfully and happens to be on SO too)
Wikipedia: MAC address
RFC 4122: The GUID versions
RFC 4122: The variant field
DCE 1.1: Authentication and security services – The description of version 2 GUIDs
Raymond Chen: GUIDs are globally unique, but substrings of GUIDs aren't
Raymond Chen: GUIDs are designed to be unique, not random
I have no idea how that would affect the uniqueness of the GUID, but it's probably not a good idea.
Visual Studio has a built in GUID generator that takes a couple of seconds to spin up and create a new GUID. If you don't use VS then there are other easy ways to create a new one. This page has 2 scripts (VB script and PHP) that will do the job and here's a .net version
I'm looking for a simple clock synchronization protocol that would be easy to implement with small footprint and that would work also in the absence of internet connection, so that it could be used e.g. within closed laboratory networks. To be clear, I'm not looking for something that can be used just to order events (like vector clocks), but something that would enable processes on different nodes to synchronize their actions based on local clocks. As far as I understand, this would require a solution that can take clock drift into account. Presence of TCP/IP or similar relatively low-latency stream connections can be assumed.
Disclaimer: I'm not an NTP expert by any means. Just a hobbyist having fun on the weekend.
I realize you said you didn't want an NTP implementation, because of the perceived complexity and because an Internet NTP server may not be available in your environment.
However, a simplified NTP look-up may be easy to implement, and if you have a local NTP server you can achieve good synchronization.
Here's how:
Review RFC 5905
You'll see NTP v4 packets look something like:
LI (2 bits)
VN (3 bits) - Use '100' (4)
Mode (3 bits)
Stratum (8 bits)
Poll (8 bits)
Precision (8 bits)
Root Delay (32 bits)
Root Dispersion (32 bits)
Reference Id (32 bits)
Reference Timestamp (64 bits)
Origin Timestamp (64 bits)
Receive Timestamp (64 bits)
Transmit Timestamp (64 bits)
Extension Field 1 (variable)
Extension Field 2 (variable)
...
Key Identifier
Digest (128 bits)
The digest is not required, so forming a valid client request is very easy. Following the guidance in the RFC, use LI = '00', VN = '100' (decimal 4), Mode = '011' (decimal 3).
Using C# to illustrate:
byte[] ntpData = new byte[48];
Array.Clear(ntpData, 0, ntpData.Length);
ntpData[0] = 0x23; // LI = 00, VN = 100, Mode = 011
Open a socket to your target server and send it over.
int ntpPort = 123;
IPEndPoint target = new IPEndPoint(Dns.GetHostEntry(serverDnsName).AddressList[0], ntpPort);
Socket s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
s.Connect(target);
s.Send(ntpData);
In the response, the current time will be in the Transmit Timestamp (bytes 40 to 47). Timestamps are 64-bit unsigned fixed-point numbers. The integer part is the first 32 bits, the fractional part is the last 32 bits. It represents the number of seconds since 0h on Jan-1-1900.
s.Receive(ntpData);
s.Close();
ulong intPart = 0;
ulong fractPart = 0;
for (int i = 0; i < 4; i++)
intPart = (intPart << 8) | ntpData[40 + i];
for (int i = 4; i < 8; i++)
fractPart = (fractPart << 8) | ntpData[40 + i];
To update the clock with (roughly) second granularity, use: # of seconds since 0h Jan-1-1900 = intPart + (fractPart / 2^32). (I say roughly because network latency isn't accounted for, and we're rounding down here)
ulong seconds = intPart + (fractPart / 4294967296);
TimeSpan ts = TimeSpan.FromTicks((long)seconds * TimeSpan.TicksPerSecond);
DateTime now = new DateTime(1900, 1, 1);
now = DateTime.SpecifyKind(now, DateTimeKind.Utc);
now += ts;
"now" is now a DateTime with the current time, in UTC.
While this might not answer your question, hopefully it makes NTP a little less opaque. =)
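For comparison, here is a hedged Python sketch of the timestamp conversion that keeps the fractional part instead of rounding down to whole seconds. The function name is my own, and the sketch assumes the timestamp lies in the current NTP era (it ignores the 32-bit seconds rollover in 2036).

```python
import struct
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_timestamp_to_datetime(raw: bytes) -> datetime:
    """Convert an 8-byte big-endian NTP timestamp to a UTC datetime."""
    seconds, fraction = struct.unpack("!II", raw)
    # The fraction counts units of 1/2^32 s; scale it to microseconds.
    microseconds = (fraction * 1_000_000) >> 32
    return NTP_EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


# Example: 2208988800 s after 1900 is the Unix epoch; 0x80000000 is half a second.
sample = struct.pack("!II", 2208988800, 0x80000000)
print(ntp_timestamp_to_datetime(sample))
```

With a real response you would pass the eight bytes starting at offset 40 of the received packet.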
I was able to implement a pared-down version of the Precision Time Protocol very quickly and easily based solely on the Wikipedia article. If all you are interested in is synchronizing them with each other as opposed to synchronizing them with the outside world, you should be able to get millisecond accuracy with minimal effort.
The fundamental basics of the protocol involve the following:
A master clock broadcasts a sync message with the timestamp of when he sent the message (T1).
The clients record the time at which they received the sync message as T1'.
The clients send a delay request back to the master and record the time they sent the message as T2.
The master responds to the delay request with the time he received the message. This time is T2'.
Client adjusts their clock by (T1' - T1 - T2' + T2)/2.
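The arithmetic in the last step can be sketched as follows. The function and parameter names are my own, and the sketch assumes the network delay is the same in both directions, which is what makes the formula work. The value it returns is how far the client clock is ahead of the master, so the client would subtract it from its own clock.

```python
def ptp_offset_and_delay(t1, t1_rx, t2, t2_rx):
    """
    t1:    master's send time of the sync message (master clock), T1
    t1_rx: client's receive time of the sync message (client clock), T1'
    t2:    client's send time of the delay request (client clock), T2
    t2_rx: master's receive time of the delay request (master clock), T2'
    """
    offset = ((t1_rx - t1) - (t2_rx - t2)) / 2  # client clock minus master clock
    delay = ((t1_rx - t1) + (t2_rx - t2)) / 2   # one-way network delay
    return offset, delay


# Made-up example in microseconds: client runs 5000 us ahead, one-way delay is 10 us.
offset, delay = ptp_offset_and_delay(100_000, 105_010, 106_000, 101_010)
print(offset, delay)  # 5000.0 10.0
```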
If you need better stability, you can implement a phase-locked loop or a linear regression or something similar to better control your jitter and avoid large swings due to network lags. There are a number of more complicated features specified by the protocol, but whether you want to implement them depends on how close 'good enough' is.
ntp is the right tool for the job. You do not need an internet connection, and for an extra $105 and a few hours of your life, you can even be GPS synchronized for an absolute time reference without an internet connection, though that appears to not be important to you.
Ignoring the slight additional complexity of GPS synchronization, you can get synchronized to a chosen system's clock using a few configuration file lines (four lines on each client, five lines on the server). The ntpd binary is 505 kB on my system. You can also use ntpdate, which can be run periodically to adjust the system clock (zero lines of configuration on the client, other than the call to the ntpdate application with the right arguments). That binary is 80 kB. There is an SNTP protocol which allows even smaller footprints for embedded applications (talking to a normal NTP server). There is also an alternate NTP implementation called chrony.
There is also a program called rdate (typically only on older systems, though source is available) which works kinda like ntpdate but much less precisely. You also need an RFC 868 server, often provided in inetd.
The only other alternative is the Precision Time Protocol already mentioned.
Might Precision Time Protocol fit the bill? It doesn't look real simple, but it seems to do more or less exactly what you are asking for. (There are some open source implementations referenced on the Wikipedia page.)
I think the problem is that this is an inherently tricky problem, so the solutions tend to be complex. NTP is trying to provide a correct absolute time, which definitely goes beyond what you need, but it does have the advantage of being well known and widely implemented.
Might using http://www.ietf.org/rfc/rfc5905.txt be appropriate?
Even if it is much more than what you need, you could certainly implement a "compatible" client that works with an NTP server (even if you run your own NTP server), but where the client implementation is purposely naive?
Eg, if you don't care about small time adjustments, don't implement them. If you don't care about bidirectional synchronisation, don't implement that, etcetera.
(Be warned: Most of the functionality present in that RFC is there for a reason - Accurate time synchronisation has many pitfalls - including the fact that many OS's do not like it if the time suddenly changes)
Not really a proper answer, but just a reminder to make sure that you understand exactly what the hardware clock sources are and any caveats about them - especially if you are planning to use some slightly exotic possibilities like the low-power CPU / RTOS combination you mention.
Even the x86 case has at least 2 or 3 clocks which could be in use, depending on the setup - all with different properties.
I understand how standard random number generators work. But when working with cryptography, the random numbers really have to be random.
I know there are instruments that read cosmic white noise to help generate secure hashes, but your standard PC doesn't have this.
How does a cryptographically secure random number generator get its values with no repeatable patterns?
A cryptographically secure random number generator, as you might use for generating encryption keys, works by gathering entropy - that is, unpredictable input - from a source which other people can't observe.
For instance, /dev/random(4) on Linux collects information from the variation in timing of hardware interrupts from sources such as hard disks returning data, keypresses and incoming network packets. This approach is secure provided that the kernel does not overestimate how much entropy it has collected. A few years back the estimations of entropy from the various different sources were all reduced, making them far more conservative. Here's an explanation of how Linux estimates entropy.
None of the above is particularly high-throughput. /dev/random(4) probably is secure, but it maintains that security by refusing to give out data once it can't be sure that that data is securely random. If you want to, for example, generate a lot of cryptographic keys and nonces then you'll probably want to resort to hardware random number generators.
Often hardware RNGs are designed around sampling the difference between a pair of oscillators that are running at close to the same speed, but whose rates vary slightly according to thermal noise. If I remember rightly, the random number generator that's used for the UK's premium bond lottery, ERNIE, works this way.
Alternate schemes include sampling the noise on a CCD (see lavaRND), radioactive decay (see hotbits) or atmospheric noise (see random.org, or just plug an AM radio tuned somewhere other than a station into your sound card). Or you can directly ask the computer's user to bang on their keyboard like a deranged chimpanzee for a minute, whatever floats your boat.
As andras pointed out, I only thought to talk about some of the most common entropy gathering schemes. Thomas Pornin's answer and Johannes Rössel's answer both do good jobs of explaining how one can go about mangling gathered entropy in order to hand bits of it out again.
For cryptographic purposes, what is needed is that the stream shall be "computationally indistinguishable from uniformly random bits". "Computationally" means that it needs not be truly random, only that it appears so to anybody without access to God's own computer.
In practice, this means that the system must first gather a sequence of n truly random bits. n shall be large enough to thwart exhaustive search, i.e. it shall be infeasible to try all 2^n combinations of n bits. This is achieved, with regards to today's technology, as long as n is greater than 90-or-so, but cryptographers just love powers of two, so it is customary to use n = 128.
These n random bits are obtained by gathering "physical events" which should be unpredictable, as far as physics is concerned. Usually, timing is used: the CPU has a cycle counter which is updated several billion times per second, and some events occur with an inevitable amount of jitter (incoming network packets, mouse movements, key strokes...). The system encodes these events and then "compresses" them by applying a cryptographically secure hash function such as SHA-256 (the output is then truncated to yield our n bits). What matters here is that the encoding of the physical events has enough entropy: roughly speaking, that the said events could have collectively assumed at least 2^n combinations. The hash function, by its definition, should do a good job of concentrating that entropy into an n-bit string.
Once we have n bits, we use a PRNG (Pseudo-Random Number Generator) to crank out as many bits as necessary. A PRNG is said to be cryptographically secure if, assuming that it operates over a wide enough unknown n-bit key, its output is computationally indistinguishable from uniformly random bits. In the 90's, a popular choice was RC4, which is very simple to implement, and quite fast. However, it turned out to have measurable biases, i.e. it was not as indistinguishable as was initially wished for. The eSTREAM Project consisted in gathering newer designs for PRNGs (actually stream ciphers, because most stream ciphers consist of a PRNG whose output is XORed with the data to encrypt), documenting them, and promoting analysis by cryptographers. The eSTREAM Portfolio contains seven PRNG designs which were deemed secure enough (i.e. they resisted analysis and cryptographers tend to have a good understanding of why they resisted). Among them, four are "optimized for software". The good news is that while these new PRNGs seem to be much more secure than RC4, they are also noticeably faster (we are talking about hundreds of megabytes per second, here). Three of them are "free for any use" and source code is provided.
From a design point of view, PRNGs reuse many of the elements of block ciphers. The same concepts of avalanche and diffusion of bits into a wide internal state are used. Alternatively, a decent PRNG can be built from a block cipher: simply use the n-bit sequence as the key of a block cipher, and encrypt successive values of a counter (expressed as an m-bit sequence, if the block cipher uses m-bit blocks). This produces a pseudo-random stream of bits which is computationally indistinguishable from random, as long as the block cipher is secure, and the produced stream is no longer than m*2^(m/2) bits (for m = 128, this means about 300 billion gigabytes, so that's big enough for most purposes). That kind of usage is known as counter mode (CTR).
Usually, a block cipher in CTR mode is not as fast as a dedicated stream cipher (the point of the stream cipher is that, by forfeiting the flexibility of a block cipher, better performance is expected). However, if you happen to have one of the most recent CPUs from Intel with the AES-NI instructions (which are basically an AES implementation in hardware, integrated in the CPU), then AES with CTR mode will yield unbeatable speed (several gigabytes per second).
First of all, the point of a cryptographically secure PRNG is not to generate entirely unpredictable sequences. As you noted, the absence of something that generates large volumes of (more or less) true randomness [1] makes that impossible.
So you resort to something which is only hard to predict. “Hard” meaning here that it takes unfeasibly long by which time whatever it was necessary for would be obsolete anyway. There are a number of mathematical algorithms that play a part in this—you can get a glimpse if you take some well-known CSPRNGs and look at how they work.
The most common variants to build such a PRNG are:
Using a stream cipher, which already outputs a (supposedly secure) pseudo-random bit stream.
Using a block cipher in counter mode
Hash functions on a counter are also sometimes used. Wikipedia has more on this.
General requirements are just that it's unfeasible to determine the original initialization vector from a generator's bit stream and that the next bit cannot be easily predicted.
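As an illustration of the "hash function on a counter" variant, here is a toy sketch in Python. The class name and layout are my own and it is not a vetted design; for real use, rely on the operating system's generator (for example os.urandom or the secrets module). SHA-256 and the 16-byte counter encoding are assumptions made for the example.

```python
import hashlib
import os


class HashCounterGenerator:
    """Toy generator: output block i is SHA-256(seed || counter_i)."""

    def __init__(self, seed: bytes):
        self._seed = seed      # must stay secret and come from real entropy
        self._counter = 0

    def read(self, n: int) -> bytes:
        """Return n pseudo-random bytes (leftover bytes of a block are discarded)."""
        out = bytearray()
        while len(out) < n:
            counter_bytes = self._counter.to_bytes(16, "big")
            out.extend(hashlib.sha256(self._seed + counter_bytes).digest())
            self._counter += 1
        return bytes(out[:n])


# Seed from the operating system's entropy pool, then expand it.
generator = HashCounterGenerator(os.urandom(16))
print(generator.read(40).hex())
```

Anyone who learns the seed can reproduce the whole stream, which is why the quality of the initialization matters so much.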
As for initialization, most CSPRNGs use various sources available on the system, ranging from truly random things like line noise, interrupts or other events in the system to other things like certain memory locations, &c. The initialization vector is preferably really random and not dependent on a mathematical algorithm. This initialization was broken for some time in Debian's implementation of OpenSSL which led to severe security problems.
[1] This has its problems too, and one has to be careful in eliminating bias, as things such as thermal noise have different characteristics depending on the temperature; you almost always have bias and need to eliminate it. And that's not a trivial task in itself.
In order for a random number generator to be considered cryptographically secure, it needs to be secure against attack by an adversary who knows the algorithm and a (large) number of previously generated bits. What this means is that someone with that information can't reconstruct any of the hidden internal state of the generator and give predictions of what the next bits produced will be with better than 50% accuracy.
Normal pseudo-random number generators are generally not cryptographically secure, as reconstructing the internal state from previously output bits is generally trivial (often, the entire internal state is just the last N bits produced directly). Any random number generator without good statistical properties is also not cryptographically secure, as its output is at least partly predictable even without knowing the internal state.
So, as to how they work, any good crypto system can be used as a cryptographically secure random number generator -- use the crypto system to encrypt the output of a 'normal' random number generator. Since an adversary can't reconstruct the plaintext output of the normal random number generator, he can't attack it directly. This is a somewhat circular definition and begs the question of how you key the crypto system to keep it secure, which is a whole other problem.
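To make the state-reconstruction point concrete, here is a small sketch with a linear congruential generator whose output is its entire internal state. The class and the constants are illustrative choices of mine, not any particular library's implementation.

```python
class Lcg:
    """Linear congruential generator that outputs its whole state."""

    A, C, M = 1103515245, 12345, 2**31

    def __init__(self, seed: int):
        self.state = seed % self.M

    def next(self) -> int:
        self.state = (self.A * self.state + self.C) % self.M
        return self.state


victim = Lcg(seed=20091204)
observed = victim.next()  # the attacker sees a single output

# The output *is* the internal state, so the attacker can clone the generator.
clone = Lcg(seed=0)
clone.state = observed
predicted = [clone.next() for _ in range(5)]
actual = [victim.next() for _ in range(5)]
print(predicted == actual)  # True
```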
Each generator will use its own seeding strategy, but here's a bit from the Windows API documentation on CryptGenRandom
With Microsoft CSPs, CryptGenRandom uses the same random number
generator used by other security components. This allows numerous
processes to contribute to a system-wide seed. CryptoAPI stores an
intermediate random seed with every user. To form the seed for the
random number generator, a calling application supplies bits it might
have—for instance, mouse or keyboard timing input—that are then
combined with both the stored seed and various system data and user
data such as the process ID and thread ID, the system clock, the
system time, the system counter, memory status, free disk clusters,
the hashed user environment block. This result is used to seed the
pseudorandom number generator (PRNG).
In Windows Vista with Service Pack 1 (SP1) and later, an
implementation of the AES counter-mode based PRNG specified in NIST
Special Publication 800-90 is used. In Windows Vista, Windows Storage
Server 2003, and Windows XP, the PRNG specified in Federal Information
Processing Standard (FIPS) 186-2 is used. If an application has access
to a good random source, it can fill the pbBuffer buffer with some
random data before calling CryptGenRandom. The CSP then uses this data
to further randomize its internal seed. It is acceptable to omit the
step of initializing the pbBuffer buffer before calling
CryptGenRandom.
I'm looking for a simple encryption tutorial, for encoding a string into another string. I'm looking for it in general mathematical terms or pseudocode; we're doing it in a scripting language that doesn't have access to libraries.
We have a Micros POS ( point of sale ) system and we want to write a script that puts an encoded string on the bottom of receipts. This string is what a customer would use to log on to a website and fill out a survey about the business.
So in this string, I would like to get a three-digit hard-coded location identifier, the date, and time; e.g.:
0010912041421
Where 001 is the location identifier, 09 the year, 12 the month, 04 the day, and 1421 the military time (2:21 PM). That way we know which location the respondent visited and when.
Obviously if we just printed that string, it would be easy for someone to crack the 'code' and fill out endless surveys at our expense, without having actually visited our stores. So if we could do a simple encryption, and decode it with a pre-set key, that would be great. The decoding would take place on the website.
The encrypted string should also be about the same number of characters, to lessen the chance of people mistyping a long arbitrary string.
Encryption won't give you any integrity protection or authentication, which are what you need in this application. The customer knows when and where they made a purchase, so you have nothing to hide.
Instead, consider using a Message Authentication Code. These are often based on a cryptographic hash, such as SHA-1.
Also, you'll want to consider a replay attack. Maybe I can't produce my own code, but what's to stop me from coming back a few times with the same code? I assume you might serve more than one customer per minute, and so you'll want to accept duplicate timestamps from the same location.
In that case, you'll want to add a unique identifier. It might only be unique when combined with the timestamp. Or, you could simply extend the timestamp to include seconds or tenths of seconds.
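A minimal sketch of the idea in Python, using HMAC-SHA-256 from the standard library instead of SHA-1, with the tag truncated to a few decimal digits so the receipt code stays short. The key, the function names and the 6-digit tag length are assumptions for illustration; a tag that short can be guessed with probability of about one in a million per attempt, so the website would also need to limit attempts.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key shared by POS and website


def make_code(payload: str, tag_digits: int = 6) -> str:
    """Append a truncated HMAC tag (as decimal digits) to the payload."""
    mac = hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).digest()
    tag = int.from_bytes(mac[:8], "big") % 10**tag_digits
    return f"{payload}{tag:0{tag_digits}d}"


def verify_code(code: str, tag_digits: int = 6) -> bool:
    """Recompute the tag from the payload part and compare in constant time."""
    payload = code[:-tag_digits]
    return hmac.compare_digest(make_code(payload, tag_digits), code)


code = make_code("0010912041421")  # location, date and time from the question
print(code, verify_code(code))
```

The website recomputes the tag with the same key, and it would still have to remember which codes were already used in order to stop replays.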
First off, I should point out that this is probably a fair amount of work to go through if you're not solving a problem you are actually having. Since you're going to want some sort of monitoring/analysis of your survey functionality anyway, you're probably better off trying to detect suspicious behavior after the fact and providing a way to rectify any problems.
I don't know if it would be feasible in your situation, but this is a textbook case for asymmetric crypto.
Give each POS terminal its own private key
Give each POS terminal the public key of your server
Have the terminal encrypt the date, location, etc. info (using the server's public key)
Have the terminal sign the encrypted data (using the terminal's private key)
Encode the results into human-friendly string (Base64?)
Print the string on the receipt
You may run into problems with the length of the human-friendly string, though.
NOTE You may need to flip flop the signing and encrypting steps; I don't have my crypto reference book(s) handy. Please look this up in a reputable reference, such as Applied Cryptography by Schneier.
Which language are you using/familiar with?
The Rijndael website has c source code to implement the Rijndael algorithm. They also have pseudo code descriptions of how it all works. Which is probably the best you could go with. But most of the major algorithms have source code provided somewhere.
If you do implement your own Rijndael algorithm, then be aware that the Advanced Encryption Standard limits the key and block sizes. So if you want to be cross-compatible you will need to use those sizes: a 128-bit block size and 128-, 192-, or 256-bit key sizes.
Rolling your own encryption algorithm is something that you should never do if you can avoid it. So finding a real algorithm and implementing it if you have to is definitely a better way to go.
Another alternative that might be easier is DES, or 3DES more specifically. But I don't have a link handy. I'll see if I can dig one up.
EDIT:
This link has the FIPS standard for DES and Triple DES. It contains all the permutation tables and such, I remember taking some 1s and 0s through a round of DES manually once. So it is not too hard to implement once you get going, just be careful not to change around the number tables. P and S Boxes they are called if I remember correctly.
If you go with these, then use Triple DES, not DES. 3DES actually uses two keys, doubling the key size of the algorithm; the short key is the only real weakness of DES, which as far as I know has not been cracked by anything other than brute force. 3DES goes through DES using one key to encrypt, the other to decrypt, and the first one to encrypt again.
The Blowfish website also has links to implement the Blowfish algorithm in various languages.
I've found Cryptographic Right Answers to be a helpful guide in choosing the right cryptographic primitives to use under various circumstances. It tells you what crypto/hash to use and what sizes are appropriate. It contains links to the various cryptographic primitives it refers to.
One way would be to use AES: take the location, year, month, and day, encrypt them using a secret key, and then tack on the last 4 digits (the military time) as the initialization vector. You can then convert it to some form of Base32. You'll end up with something that looks like a product key. It may be too long for you though.
A slight issue would be that you would probably want to use more digits on the military time though since you could conceivably get multiple transactions on the same day from the same location within the same minute.
What I want to use is XOR. It's simple enough that we can do it in the proprietary scripting language (we're not going to be able to do any real encryption in it), and if someone breaks it, then we can change the key easily enough.