I want to use a barcode as a way to identify a separator page in a stack of scanned documents.
I want to figure out the best type of barcode to use for that.
Here is the current situation: The user scans in a stack of paper (1-10 pages) that represent one document.
It would be much faster for them to scan in a bigger stack of paper.
To accommodate this I am going to create a page with a special pattern on it and write a C# program that looks for that pattern and splits the scan into separate documents at each separator page.
I am writing my own program because I will be looking for barcodes on the actual documents as well so I need custom code.
My question is:
Which barcode technology will be the best for the separator page?
My gut tells me to use QR Code, but I would like to hear what others have to say.
As long as your scanning code can rely on your barcode being relatively level with the page and the amount of data that you want to scan is less than 50 or so characters, you don't need to go 2D with your symbology. I would recommend Code 128.
If you aren't relying on a library, it is much easier to write the code to spot and decode a raster with a predefined pattern of 1's and 0's. QR Code or any other 2D symbology (Data Matrix or PDF417) should only be considered if you need a high volume of characters, as decoding a 2D symbol is much more complex.
This assumes that you also have control over the symbology that will be used within the documents and they follow the same constraints.
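The raster idea in this answer could be sketched roughly as follows. This is a minimal illustration in Python rather than C#, and it assumes a single clean, level scanline of grayscale pixel values. The names `row_to_modules`, `is_separator_row`, and `SEPARATOR_PATTERN` are hypothetical, and the pattern itself is an arbitrary example, not a real Code 128 symbol.

```python
from itertools import groupby

# Hypothetical separator pattern: 1 = dark module, 0 = light module.
# It must start and end with a dark module, because the quiet zone is stripped.
SEPARATOR_PATTERN = "110100111001011"

def row_to_modules(pixels, threshold=128):
    """Convert one scanline of grayscale pixels into a string of module bits."""
    bits = [1 if p < threshold else 0 for p in pixels]  # dark pixel -> 1
    runs = [(bit, len(list(group))) for bit, group in groupby(bits)]
    # Drop the light quiet zone on both sides of the pattern.
    while runs and runs[0][0] == 0:
        runs.pop(0)
    while runs and runs[-1][0] == 0:
        runs.pop()
    if not runs:
        return ""
    # Simplification: treat the shortest run as one module wide.
    unit = min(length for _, length in runs)
    return "".join(str(bit) * round(length / unit) for bit, length in runs)

def is_separator_row(pixels, pattern=SEPARATOR_PATTERN):
    """Return True if the scanline decodes to the expected pattern."""
    return row_to_modules(pixels) == pattern
```

Using the shortest run as the module width is fragile on noisy scans. A real implementation would probably sample several rows and tolerate small width errors.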
I want to generate QR codes to pass a unique alpha-numeric code to a site. The QR will be generated from a string like:
https://example.com/ABCD1234
ABCD1234 is the unique code, and there will be ~100 million of them. Can I be sure that no two QR codes will be the same, and that they will be read with 100% accuracy?
Anything to watch out for standards wise?
100% is too absolute a number to give specific advice on. This reply is general, and not meant for any truly mission-critical needs.
The clarity of barcodes in general depends on the quality of the printed image, as well as the reader's ability to decode (optics and decoder). QR codes have some error correction attributes (https://en.wikipedia.org/wiki/QR_code). Commercial decoders, even for webcams, are used widely for consumer and retail applications.
Keep in mind, barcodes were designed for reading labels quickly, and often while moving. They are not well suited for things like security codes, where deep levels of checks are needed. It is possible for things to go wrong.
Given that, it really depends on what you mean by 100%. The barcode symbology cannot make many guarantees, but the content might begin to. If your line-of-business app is mildly mission critical, your app can control what's printed on labels, and the http address points to 'your' line-of-business web site, you could append a check value to the content printed in the QR code. For example, https://example.com/ABC1234?check=5551212 has a check value which the web site can optionally verify. However, the more content you have, the denser the printed pattern will get (and possibly the more difficult to read).
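One possible way to produce and verify such a check value is sketched below. The answer does not say how the check value should be computed. The truncated HMAC-SHA256 used here is just one option, and `SECRET`, `check_value`, `build_url`, and `verify_url` are hypothetical names.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlparse

# Assumption: a secret known only to the web site. Replace with a real key.
SECRET = b"replace-with-a-real-secret"

def check_value(code, length=8):
    """Derive a short check value from the unique code (truncated HMAC)."""
    digest = hmac.new(SECRET, code.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:length]

def build_url(code):
    """Build the string that would be encoded in the QR code."""
    return f"https://example.com/{code}?check={check_value(code)}"

def verify_url(url):
    """Server side: recompute the check value and compare it to the one supplied."""
    parsed = urlparse(url)
    code = parsed.path.lstrip("/")
    supplied = parse_qs(parsed.query).get("check", [""])[0]
    return hmac.compare_digest(supplied, check_value(code))
```

A shorter check value keeps the QR code less dense, at the cost of a higher chance that a misread or guessed code passes verification.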
It's a Drupal site with Solr for search. Mainly, I am not satisfied with the current search results for Chinese. The tokenizer breaks the text into what it considers small words. Most of them are reasonable, but it still makes mistakes: it fails to treat some things as a valid token, either by breaking them into pieces or by not breaking them when it should.
Assuming I am writing Chinese now: big data analysis is one word which shouldn't be broken. So my search on it should find it. Also I want people to find AI and big data analysis training as the first hit when they search the exact phrase AI and big data analysis training.
So I want a way to override or supplement the current tokens to make the search smarter.
Maybe there is a file in Solr that lets me write these tokens down manually and relate them to certain phrases, so that Solr can use it as a reference every time it indexes?
You can take different steps to achieve what you want:
1) I don't see an extremely big problem with your "over-tokenization":
"big data analysis is one word which shouldn't be broken. So my search on it should find it." -> Your search will find it even if it is tokenized. I understand this was an example and the actual words are Chinese, but I suspect a different issue there.
2) You can use the edismax[1] query parser with phrase boosts at various levels to boost consecutive tokens or phrases (pf, pf2, pf3 ... ps, ps2, ps3 ...).
[1] https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html , https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html#TheExtendedDisMaxQueryParser-ThepsParameter
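As a rough illustration of point 2, the request parameters for such a query might be assembled like this. The field name `content` and the boost values are assumptions chosen for the example, and `build_edismax_params` is a hypothetical helper. Only `defType`, `q`, `qf`, `pf`, `pf2`, `pf3`, and `ps` are actual edismax parameters.

```python
from urllib.parse import urlencode

def build_edismax_params(user_query, field="content"):
    """Assemble edismax parameters that boost documents matching the phrase."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": field,            # field(s) the individual terms are searched in
        "pf": f"{field}^10",    # boost when the whole query matches as a phrase
        "pf2": f"{field}^5",    # boost for each matching pair of adjacent terms
        "pf3": f"{field}^7",    # boost for each matching triple of adjacent terms
        "ps": "0",              # phrase slop for pf: terms must be adjacent
    }

params = build_edismax_params("AI and big data analysis training")
query_string = urlencode(params)  # append to the collection's /select URL
```

With these boosts, a document containing the exact phrase should score higher than one that only contains the individual tokens scattered through the text.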
I need to recognize complex chemical names from a scanned document (PDF). They contain special characters and are written in a table format. I also have an Excel document that contains ALL possible names (I would say rows, because there are no combinations) that I may encounter during scanning. Is there a way to create ligatures (so that FineReader will recognize an entire row instead of dissecting it into separate characters)? I tried creating a user dictionary, but FineReader does not treat an entry as one row.
The only way to create ligatures is to use "user pattern training". In FineReader, go to Tools -> Options -> Read tab (changes slightly depending on FR version) and enable User pattern training. During training extend your box to include several combined characters, thus creating a ligature.
Recognizing formulas with this method is tough, but it may be possible.
I have done this many times in my work at www.wisetrend.com. I am a former ABBYY support employee and current integrator and OCR consulting specialist. I will be glad to help if you need more specific assistance.
I was wondering if there's a way to build a QR code with two kinds of data - one text data and two link URLs. Is it possible to do it?
A QR Code is a two-dimensional barcode capable of storing (according to Wikipedia) up to 2,953 bytes of binary data or 4,296 simple alphanumeric characters. The data can contain whatever you like.
The difficulty with storing multiple URLs in a QR code is not that it is impossible, but that most scanner apps on smartphones and so on will only process a single URL. If you are writing the scanner app too then, yes, it is possible; otherwise it is possible but probably not advisable.
If you wish to store a single URL and some contact details you might look at storing a vCard in your QR code (here is a generator; I have no affiliation with this project).
It's indeed possible, but not all scanner apps will recognize all of the data, and most will show only one item. This QR code generator has a Multi URL feature that can redirect based on different parameters such as time, location, device, ...
It is possible. We can put text, a URL, and a vCard in a single QR code.
Well, actually, the QR code is "only" storing characters, so you could imagine an app or any other software that reads the QR code content (some data plus two URLs) and splits the string in order to open two tabs.
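A custom reader along those lines might split the decoded string as in the sketch below. It assumes the payload is free text with the URLs embedded and separated by whitespace. `split_payload` and `open_all` are hypothetical names, and decoding the QR image itself is left to whatever scanner library is in use.

```python
import re
import webbrowser

# Assumption: URLs in the payload start with http:// or https:// and contain no spaces.
URL_PATTERN = re.compile(r"https?://\S+")

def split_payload(payload):
    """Separate the decoded QR string into its plain text and its list of URLs."""
    urls = URL_PATTERN.findall(payload)
    text = " ".join(URL_PATTERN.sub("", payload).split())
    return text, urls

def open_all(urls):
    """Open each URL in its own browser tab."""
    for url in urls:
        webbrowser.open_new_tab(url)
```

For example, `split_payload("Hello world https://example.com/a https://example.com/b")` returns the text `"Hello world"` and a list holding the two URLs, which can then be passed to `open_all`.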
I recently purchased a 2D Barcode reader. When scanning a U.P.S. barcode, I get about half of the information I want, and about half of it looks to be encrypted in some way. I have heard there is a UPS DLL.
Example - Everything in bold seems to be encrypted, while the non-bold text contains valuable, legitimate data.
[)>01961163522424800031Z50978063UPSN12312307G:%"6*AH537&M9&QXP2E:)16(E&539R'64O
In other words, this text seems OK - and I can parse the information
[)>01961163522424800031Z50978063UPSN123123 ...
While, this data seems to be encrypted
... 07G:%"6*AH537&M9&QXP2E:)16(E&539R'64O
Any Ideas???
Everything I read on the internet says I should be able to read this thing. I'm just not finding any information on specifics. The "encrypted" info contains the street address, package number (like 1/2), weight, and several other items I'm dying to get my hands on. I suppose I will contact UPS directly.
The data after the SCAC is compressed and requires a DLL or some other component from UPS in order to decode. Note that a MaxiCode holds only about 100 characters of data so compression is required in order to encode more shipping data.
It seems to be well-documented ... anything cryptic is likely to be info the shipper is including for their own (or their customer's) purposes.
I know that the block of characters you get when scanning those barcodes is divided up into blocks using non-printing characters, so trying to view the characters without knowing how they are divided by the encoder is tough. Look for info on the format in which they store their data, or find a decoder that will display those characters.
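To see how the scanned string is divided, it can help to split it on those non-printing characters before trying to interpret anything. The sketch below assumes the separators are the ASCII control characters GS (0x1D), RS (0x1E), and EOT (0x04), and that the scanner passes them through unchanged; many scanners strip or replace them by default. `split_message` is a hypothetical name, and the sketch does not decompress the compressed portion.

```python
GS, RS, EOT = "\x1d", "\x1e", "\x04"  # assumed field, record, and end markers
HEADER = "[)>"

def split_message(raw):
    """Split a scanned message into records, and each record into fields."""
    if not raw.startswith(HEADER):
        raise ValueError("message does not start with the expected header")
    body = raw[len(HEADER):].rstrip(EOT)
    records = [record for record in body.split(RS) if record]
    return [record.split(GS) for record in records]

def show_control_chars(raw):
    """Make the separators visible so the structure can be inspected by eye."""
    return raw.replace(GS, "<GS>").replace(RS, "<RS>").replace(EOT, "<EOT>")
```

If `show_control_chars` reveals no separators at all, the scanner configuration is probably dropping them, and the fields cannot be told apart reliably.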
This is a page I have come across before, however, this page discusses ENCODING a barcode, using plain English and some component. The information used to ENCODE is the information I would like to retrieve when DECODING.
Like I said, when the information is RETRIEVED, half of it comes out garbled (encrypted?) and there is no documentation about how to decrypt that code...
According to the link you sent me, I should see something like this:
[)>01963360910628400021Z14647438UPSN410E1W1951/1Y135ReoTAMPAFL
However, I get something like this:
[)>01961163522424800031Z50978063UPSN12312307G:%"6*AH537&M9&QXP2E:)16(E&539R'64O
This leads me to believe the page you sent me is either out-of-date, or that it is simply a reference for how to use their controls to encode, not decode.
Why would you suppose that UPS wants you to decode that part? Moreover, I believe that the piece of code is not encoded - it may be ID of something in their DB.
I expect the unreadable part is not UPS data, but private data intentionally obfuscated by agreement between the shipper and receiver.
Check this site out; it provides a free decode app.
MaxiCode Barcode FAQ & Tutorial by IDAutomation®
Maxicode is an international 2D (two-dimensional) barcode that is currently used by UPS on shipping labels for world-wide addressing and package sortation. ...
http://www.idautomation.com/maxicodefaq.html
http://www.google.com/patents/US7039496 has quite a bit of information about the encoded data in images 3-12. It looks like the first gives the uncompressed format, while the second is a compression dictionary. The description makes reference to a lot of ANSI standards that are beyond my comprehension, but it does appear that what you're seeing is a format '07' string, so perhaps there's enough information here to do a complete decode?
Bearing in mind, of course, that this is part of a patent and that implementing it without paying royalties could get you in trouble. IANAL