QR Code - is uniqueness guaranteed for unique inputs?

I want to generate QR codes to pass a unique alpha-numeric code to a site. The QR will be generated from a string like:
https://example.com/ABCD1234
The ABCD1234 part is the unique code, and there will be ~100 million of them. Can I be sure no two QR codes will be the same, and that they will be read with 100% accuracy?
Anything to watch out for standards wise?

100% is too absolute a number to give specific advice on. This reply is general, and not meant for any super mission-critical needs.
The clarity of barcodes in general depends on the quality of the printed image, as well as the reader's ability to decode (optics and decoder). QR codes have some error correction attributes (https://en.wikipedia.org/wiki/QR_code). Commercial decoders, even for webcams, are used widely for consumer and retail applications.
Keep in mind, barcodes were designed for reading labels quickly, and often while moving. They are not well suited for things like security codes, where deep levels of checks are needed. It is possible for things to go wrong.
Given that, it really depends on what you mean by 100%. The barcode symbology cannot make many guarantees, but the content might begin to. If your line-of-business app is mildly mission critical, your app controls what's printed on the labels, and the http address points to 'your' line-of-business web site, you could append a check value to the content printed in the QR code. For example, https://example.com/ABCD1234?check=5551212 carries a check value which the web site can optionally verify. However, the more content you have, the denser the printed pattern gets (and possibly the more difficult it is to read).
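As a minimal sketch of that idea in Python (the third-party qrcode package needs Pillow for PNG output; SECRET, the 8-character check length, and the URL are illustrative assumptions, not anything standard):

import hashlib
import hmac

import qrcode  # third-party package: pip install qrcode[pil]

SECRET = b"server-side-secret"  # hypothetical key, never printed on labels

def make_url(code):
    # Truncated HMAC as the check value; the web site recomputes and compares.
    check = hmac.new(SECRET, code.encode(), hashlib.sha256).hexdigest()[:8]
    return "https://example.com/%s?check=%s" % (code, check)

def verify(code, check):
    expected = hmac.new(SECRET, code.encode(), hashlib.sha256).hexdigest()[:8]
    return hmac.compare_digest(expected, check)

# Encode at a raised error-correction level: denser, but survives more damage.
qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_Q)
qr.add_data(make_url("ABCD1234"))
qr.make(fit=True)
qr.make_image().save("label.png")

Truncating an HMAC rather than using a plain checksum means a casual party can't forge valid-looking codes without the server-side key, and the error-correction level trades pattern density against tolerance of print damage.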

Related

Barcode for identification of separator pages

I want to use a barcode as a way to identify a separator page in a stack of scanned documents.
I want to figure out the best type of barcode to use for that.
Here is the current situation: The user scans in a stack of paper (1-10 pages) that represent one document.
It would be much faster for them to scan in a bigger stack of paper.
To accommodate this, I am going to create a page with a special pattern on it and write a C# program that will look for that pattern and split the stack into separate documents at those separator pages.
I am writing my own program because I will be looking for barcodes on the actual documents as well so I need custom code.
My question is:
Which barcode technology will be the best for the separator page?
My gut tells me to use QR Code, but I would like to hear what others have to say.
As long as your scanning code can rely on your barcode being relatively level with the page and the amount of data that you want to scan is less than 50 or so characters, you don't need to go 2D with your symbology. I would recommend Code 128.
If you aren't relying on a library, it is much easier to write the code to spot and decode a raster with a predefined pattern of 1's and 0's. Using QR Code or any other 2D symbology (Data Matrix or PDF417) should only be considered necessary if you need a high volume of characters, as decoding a 2D symbol is much more complex.
This assumes that you also have control over the symbology that will be used within the documents and they follow the same constraints.
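As an illustrative sketch of both halves (in Python rather than the C# you'll actually use; it assumes the python-barcode, pyzbar and Pillow packages, and the 'SEPARATOR' payload is a made-up marker, not any standard):

import barcode                    # pip install python-barcode
from pyzbar.pyzbar import decode  # pip install pyzbar (needs the zbar library)
from PIL import Image             # pip install Pillow

SEPARATOR_PAYLOAD = "SEPARATOR"   # hypothetical marker text

def make_separator_page(path="separator"):
    # Generate a Code 128 symbol to print on the separator sheet (SVG by default).
    code128 = barcode.get_barcode_class("code128")
    return code128(SEPARATOR_PAYLOAD).save(path)

def split_stack(page_images):
    # Group scanned page images into documents, starting a new document
    # whenever a page carries the separator barcode.
    documents, current = [], []
    for path in page_images:
        symbols = decode(Image.open(path))
        if any(s.data.decode() == SEPARATOR_PAYLOAD for s in symbols):
            if current:
                documents.append(current)
            current = []
        else:
            current.append(path)
    if current:
        documents.append(current)
    return documents

The same logic ports to C# with any library that can generate and read Code 128.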

Can One Time Passwords be used as identifiers?

If I have a bunch of OTPs mixed together, and I know all of their generation seeds (the OTP URIs), can I group them by source URI?
I have a use case where I need the system to be 100% blind to the relationships in the data it is passing around.
For example: if users enter OTPs from their smartphones instead of their logins, it should become very difficult to identify which entries came from one user. If data is exported out of the system that holds the OTP seeds, is it possible to re-establish each entry's ownership?
That's possible, but computationally expensive. You would need to generate codes for all the seeds you have and then look for matches.
Also, there is a chance of receiving the same code for different seeds at some moment. To avoid this problem you can ask a user for several consecutive codes, which significantly decreases the chance of codes matching just by coincidence.
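A minimal sketch of both checks, assuming the pyotp package and that the seeds are standard TOTP secrets with 30-second steps (the function names are made up):

import time
from datetime import datetime

import pyotp  # pip install pyotp

def candidate_seeds(code, seeds, when=None, window=1):
    # Which known seeds could have produced this code around time 'when'?
    # More than one may match: 6-digit codes collide by chance.
    when = when or datetime.now()
    return [s for s in seeds
            if pyotp.TOTP(s).verify(code, for_time=when, valid_window=window)]

def matches_consecutive(code1, code2, seed, when=None):
    # Two consecutive 30-second codes make an accidental match far less likely.
    when = int(when or time.time())
    totp = pyotp.TOTP(seed)
    return totp.at(when) == code1 and totp.at(when + 30) == code2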

R geographic address validation

I am trying to calculate physical distances between geographic locations (addresses) with the mapdist function from the ggmap package in R. Apart from the uncomfortable fact that Google Maps allows only 2,500 queries per session, I have to cope with misspelled or otherwise imperfect "addresses". The most typical problem is that the exact address strings are padded with extra info (floor, door, etc.), and it is very hard to detect any pattern in these that would allow applying a regular expression.
My goal is:
Check if the address string is recognizable to Google Maps;
If not, find a way to truncate to an acceptable form, perhaps by parsing words step by step from the string.
Has anybody coped with this kind of problem?
Thanks.
There are a couple of factors running into each other here. One factor is the misspellings and other complexities related to addresses and the other is pinpointing (geocoding) a given address. Although they are related problems, each must be handled to accomplish your objectives.
There are numerous service providers out there that can do either or both with minimal cost involved. This can be found with a simple Google search. You can then investigate each to see if they match your use case and licensing requirements.
All of that considered, you'll want to get your address list cleaned up at a minimum. Doing that will enable you to utilize any number of geocoding providers.
Depending upon the size of your list, you can get your list cleaned up and geocoded for perhaps $20.
In the interest of full disclosure, I'm the founder of SmartyStreets. We provide a web interface (to help clean up the address list) as well as an API (which can be used on a continual basis to keep addresses clean). We also geocode your list at no extra charge. Further, we don't have any licensing restrictions on the number of lookups that can be performed during a given timeframe. (We have customers that hit us hundreds of millions of times per day.) The entire process of signing up and cleaning up your list takes just a few minutes.
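If you want to experiment with your truncation idea yourself before committing to a provider, here is a minimal sketch (in Python rather than R, against the public Google Geocoding web API via the requests package; note every retry costs one query against your quota):

import requests  # pip install requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode(address, api_key):
    # Return (lat, lng) for the first match, or None if unrecognized.
    resp = requests.get(GEOCODE_URL, params={"address": address, "key": api_key})
    results = resp.json().get("results", [])
    if results:
        loc = results[0]["geometry"]["location"]
        return loc["lat"], loc["lng"]
    return None

def geocode_with_truncation(address, api_key):
    # Retry with the last word dropped each time, on the theory that
    # trailing tokens (floor, door, etc.) are what confuse the geocoder.
    words = address.split()
    while words:
        hit = geocode(" ".join(words), api_key)
        if hit:
            return " ".join(words), hit
        words = words[:-1]
    return None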

Is information a subset of data?

I apologize as I don't know whether this is more of a math question that belongs on mathoverflow or if it's a computer science question that belongs here.
That said, I believe I understand the fundamental difference between data, information, and knowledge. My understanding is that information carries both data and meaning. One thing that I'm not clear on is whether information is data. Is information considered a special kind of data, or is it something completely different?
The words data, information and knowledge are value-based concepts used to categorize, in a subjective fashion, the general "conciseness" and "usefulness" of a particular information set.
These words have no precise meaning because they are relative to the underlying purpose and methodology of information processing; in the field of information theory they have no meaning at all, because all three are the same thing: a collection of "information" (in the information-theoretic sense).
Yet they are useful, in context, to summarize the general nature of an information set as loosely explained below.
Information is obtained (or sometimes induced) from data, but it can be richer, as well as cleaner (whereby some values have been corrected) and "simpler" (whereby some irrelevant data has been removed). So in the set-theory sense, information is not a subset of data, but a separate set [which typically intersects, somewhat, with the data but can also have elements of its own].
Knowledge (sometimes called insight) is yet another level up; it is based on information and, likewise, is not a [set-theory] subset of information. Indeed, knowledge typically doesn't refer directly to information elements, but rather tells a "meta story" about the information / data.
The unfounded idea that along the Data -> Information -> Knowledge chain, the higher levels are subsets of the lower ones, probably stems from the fact that there is [typically] a reduction of the volume of [IT sense] information. But qualitatively this info is different, hence no real [set theory] subset relationship.
Example:
Raw stock exchange data from Wall Street is ... Data
A "sea of data"! Someone has a hard time finding what he/she needs, directly, from this data. This data may need to be normalized. For example the price info may sometimes be expressed in a text string with 1/32th of a dollar precision, in other cases prices may come as a true binary integer with 1/8 of a dollar precision. Also the field which indicate, say, the buyer ID, or seller ID may include typos, and hence point to the wrong seller/buyer. etc.
A spreadsheet made from the above is ... Information
Various processes were applied to the data:
-cleaning / correcting various values
-cross referencing (for example looking up associated codes such as adding a column to display the actual name of the individual/company next to the Buyer ID column)
-merging when duplicate records pertaining to the same event (but say from different sources) are used to corroborate each other, but are also combined in one single record.
-aggregating: for example making the sum of all transaction values for a given stock (rather than showing all the individual transactions).
All this (and then some) turned the data into Information, i.e. a body of [IT sense] Information that is easily usable, where one can quickly find some "data", such as say the Opening and Closing rate for the IBM stock on June 8th 2009.
Note that while being more convenient to use, in part more exact/precise, and also boiled down, there is no real [IT sense] information in there which couldn't be located or computed from the original by relatively simple (if painstaking) processes.
A financial analyst's report may contain ... knowledge
For example, the report may indicate [bogus example] that whenever the price of oil goes past a certain threshold, the value of gold starts declining, but then quickly spikes again, around the time the prices of coffee and tea stabilize. This particular insight constitutes knowledge. This knowledge may have been hidden in the data alone, all along, but only became apparent when one applied some fancy statistical analysis, and/or required the help of a human expert to find or confirm such patterns.
By the way, in the information theory sense of the word, "data", "information" and "knowledge" all contain [IT sense] information.
One could possibly get on the slippery slope of stating that "as we go up the chain, the entropy decreases", but that is only loosely true, because:
-entropy decrease is not directly or systematically tied to "usefulness for humans" (a typical example is that a zipped text file has less entropy yet is no fun to read)
-there is effectively a loss of information (in addition to entropy loss): for example, when data is aggregated, the [IT sense] information about individual records gets lost
-there is, particularly in the case of Information -> Knowledge, a change in the level of abstraction
A final point (if I haven't confused everybody yet...) is the idea that the data->info->knowledge chain is effectively relative to the intended use/purpose of the [IT-sense] Information.
ewernli in a comment below provides the example of the spell checker: when the focus is on English orthography, the most insightful paper from a Wall Street genius is merely a string of words, effectively "raw data", some of it in need of improvement (along the orthography purpose chain).
Similarly, a linguist using thousands of newspaper articles which typically (we can hope...) contain at least some insight/knowledge (in the general sense) may just consider these articles raw data which will help him/her automatically create a French-German lexicon (this will be information), and as he works on the project, he may discover a systematic semantic shift in the use of common words between the two languages, and hence gather insight into the distinct cultures.
Define information and data first, very carefully.
What is information and what is data is very dependent on context. An extreme example is a picture of you at a party which you email. For you it's information, but for the ISP it's just data to be passed on.
Sometimes just adding the right context changes data to information.
So, to answer your question: no, information is not a subset of data. It could be at least the following.
A superset, when you add context
A subset, as in a needle-in-a-haystack situation
A function of the data, e.g. in a digest
There are probably more situations.
This is how I see it...
Data is dirty and raw. You'll probably have too much of it.
... Jason ... 27 ... Denton ...
Information is the data you need, organised and meaningful.
Jason.age=27
Jason.city=Denton
Knowledge is why there are wikis, blogs: to keep track of insights and experiences. Note that these are human (and community) attributes. Except for maybe a weird science project, no computer is on Facebook telling people what it believes in.
information is an enhancement of data:
data is inert
information is actionable
note that information without data is merely an opinion ;-)
Information could be data if you had some way of representing the additional content that makes it information. A program that tries to 'understand' written text might transform the input text into a format that allows for more complex processing of the meaning of that text. This transformed format is a kind of data that represents information, when understood in the context of the overall processing system. From outside the system it appears as data, whereas inside the system it is the information that is being understood.

How to decode U.P.S. Information from UPS MaxiCode Barcode?

I recently purchased a 2D Barcode reader. When scanning a U.P.S. barcode, I get about half of the information I want, and about half of it looks to be encrypted in some way. I have heard there is a UPS DLL.
Example - the trailing portion (shown separately below) seems to be encrypted, while the leading text contains valuable, legitimate data.
[)>01961163522424800031Z50978063UPSN12312307G:%"6*AH537&M9&QXP2E:)16(E&539R'64O
In other words, this text seems OK - and I can parse the information
[)>01961163522424800031Z50978063UPSN123123 ...
While, this data seems to be encrypted
... 07G:%"6*AH537&M9&QXP2E:)16(E&539R'64O
Any Ideas???
Everything I read on the internet says I should be able to read this thing. I'm just not finding any information on specifics. The "encrypted" info contains the street address, package number (like 1/2), weight, and several other items I'm dying to get my hands on. I suppose I will contact UPS directly.
The data after the SCAC is compressed and requires a DLL or some other component from UPS in order to decode. Note that a MaxiCode holds only about 100 characters of data so compression is required in order to encode more shipping data.
It seems to be well-documented ... anything cryptic is likely to be info the shipper is including for their own (or their customer's) purposes.
I know that the block of characters you get when scanning those barcodes is divided up into blocks using non-printing characters, so trying to view the characters without knowing how they are divided by the encoder is tough. Look for info on the format in which they store their data, or find a decoder that will display those characters.
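As a minimal sketch of that, assuming your reader can be configured to pass the control characters through (Python; the header handling below follows the ANSI MH10.8 message layout as I understand it, so treat it as an assumption):

GS = "\x1d"   # ASCII Group Separator: delimits fields
RS = "\x1e"   # ASCII Record Separator: ends the message header
EOT = "\x04"  # ASCII End of Transmission: ends the message

def split_carrier_message(scan):
    body = scan
    # Strip the '[)>' RS '01' GS '96' message header, if present.
    header = "[)>" + RS + "01" + GS + "96"
    if body.startswith(header):
        body = body[len(header):]
    # Strip the trailer, then split the remaining fields.
    body = body.rstrip(EOT).rstrip(RS)
    return body.split(GS)

# Usage: fields = split_carrier_message(raw_scan)
# With separators intact, each element is one field
# (tracking number, SCAC, shipper number, and so on).

If your reader strips the separators instead of passing them through (as the fused strings above suggest), there is no reliable way to recover the field boundaries from the text alone.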
This is a page I have come across before, however, this page discusses ENCODING a barcode, using plain English and some component. The information used to ENCODE is the information I would like to retrieve when DECODING.
Like I said, when the information is RETRIEVED, half of it comes out garbled (encrypted?) and there is no documentation about how to decrypt that code...
According to the link you sent me, I should see something like this:
[)>01963360910628400021Z14647438UPSN410E1W1951/1Y135ReoTAMPAFL
However, I get something like this:
[)>01961163522424800031Z50978063UPSN12312307G:%"6*AH537&M9&QXP2E:)16(E&539R'64O
This leads me to believe the page you sent me is either out-of-date, or that it is simply a reference for how to use their controls to encode, not decode.
Why would you suppose that UPS wants you to decode that part? Moreover, I believe that piece of the code is not encrypted - it may be the ID of something in their DB.
I expect the unreadable part is not UPS data, but private data intentionally obfuscated by agreement between the shipper and receiver.
Check this site out; it provides a free decoding app.
MaxiCode Barcode FAQ & Tutorial by IDAutomation®
Maxicode is an international 2D (two-dimensional) barcode that is currently used by UPS on shipping labels for world-wide addressing and package sortation. ...
http://www.idautomation.com/maxicodefaq.html
http://www.google.com/patents/US7039496 has quite a bit of information about the encoded data in images 3-12. It looks like the first gives the uncompressed format, while the second is a compression dictionary. The description makes reference to a lot of ANSI standards that are beyond my comprehension, but it does appear that what you're seeing is a format '07' string, so perhaps there's enough information here to do a complete decode?
Bearing in mind, of course, that this is part of a patent and that implementing it without paying royalties could get you in trouble. IANAL
