There are two QR codes on a page that encode dynamic input data. The data contains numeric, alphanumeric, and Chinese (UTF-8) characters, and the two QR codes use the same module width and error correction level (M). If the data is:
QR1 = 0000|ABC|def|中文|
QR2 = aaa#bbb.com| |XYZ
Is there any way to make QR1 and QR2 render at almost the same size?
I tried to make the data of QR1 and QR2 the same length by appending spaces, but that didn't work. :(
Thanks.
I made those QR Codes at www.unitaglive.com/qrcode.
They have the same number of columns (that is called the Version: http://www.qrcode.com/en/about/version.html).
What would you like to do exactly?
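For what it's worth, at a fixed module width the rendered size depends only on the version the encoder ends up using: a version-V symbol is 4V + 17 modules on a side. A minimal R sketch of just that relationship (plain arithmetic, no QR library assumed):
## Width of a QR symbol in modules for a given version (1..40)
modules_per_side <- function(version) 4 * version + 17
modules_per_side(1)   # 21 (version 1 is 21 x 21 modules)
modules_per_side(4)   # 33 (version 4 is 33 x 33 modules)
So QR1 and QR2 will only render at (almost) the same size if both payloads end up needing the same version at error correction level M. Padding to equal character counts is not enough, because the numeric, alphanumeric and byte modes pack characters at different densities, so equal lengths can still require different versions.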
I am trying to get abundance estimates of rodents for my survey area using capture-mark-recapture. I am using the package "marked" but keep running into a problem where my capture histories are changed to numbers, e.g. 0010 becomes 10.
I have tried CSVs, txt files, and changing numbers to text in Excel, but can't seem to get it right. When I am able to get the data into R as a capture history, as opposed to a number, I get "Incorrect ch values in data:10;FMU".
I think the error lies in the input data, and I'm wondering if anyone perhaps has an example of their input data for something similar?
Any suggestions would be greatly appreciated.
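For reference, "0010" usually turns into 10 because the capture-history column is being read as a number, which drops the leading zeros; functions in marked generally expect ch to be a character string of 0s and 1s. A minimal sketch that keeps the histories as text when reading a CSV (the file name "captures.csv" and the column name ch are only assumptions about your data):
## Force the capture-history column to stay character so "0010" keeps its zeros
## ("captures.csv" and the column name "ch" are assumptions about the input file)
caps <- read.csv("captures.csv", colClasses = c(ch = "character"),
                 stringsAsFactors = FALSE)
str(caps$ch)  # should show chr "0010" ..., not num 10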
I want to create a data frame that contains > 100 observations on ~20 variables. This will be based on a list of html files which are saved to my local folder. I would like to make sure that R matches the correct values per variable to each observation. Assuming that R goes through the files in the same order when constructing each variable AND does not skip variables in case of errors or the like, this should happen automatically.
But is there a "safe way" to do this, meaning assigning observation names to each variable value when retrieving the info?
Take my sample code for extracting a variable to make it more clear:
library(rvest)

#Specifying the url for the desired website to be scraped
url <- 'http://www.imdb.com/search/title?count=100&release_date=2016,2016&title_type=feature'

#Reading the HTML code from the website
webpage <- read_html(url)

#Extracting one vector per variable
title_data_html <- html_text(html_nodes(webpage, '.lister-item-header a'))
rank_data_html <- html_text(html_nodes(webpage, '.text-primary'))
description_data_html <- html_text(html_nodes(webpage, '.ratings-bar+ .text-muted'))

#Binding the vectors column-wise into a data frame
df <- data.frame(title_data_html, rank_data_html, description_data_html)
This would come up with a list of rank and description data, but with no reference to the observation name for rank or description (before binding it in the df). Now, in my actual code one variable suddenly comes up with one value too many, so 201 descriptions when there are only 200 movies. Without a reference to which movie a description belongs to, it is very tough to see why that happens.
A colleague suggested extracting all variables for one observation at a time and extending the data frame row-wise (one observation at a time), instead of extending it column-wise (one variable at a time), but spotting errors and clean-up needs per variable seems much more time-consuming that way.
Does anyone have a suggestion of what is the "best practice" in such a case?
Thank you!
I know it's not a satisfying answer, but there is not a single strategy for solving this type of problem. That is the nature of web scraping: there is no guarantee that the HTML is structured the way you'd expect it to be.
You haven't shown us a reproducible example (something we can run on our own machine that reproduces the problem you're having), so we can't help you troubleshoot why you ended up extracting 201 nodes during one call to html_nodes when you expected 200. Best practice here is the boring old advice to LOOK at the website you're scraping, LOOK at your data, and see where the extra or duplicate description is (or where the missing movie is). Perhaps there's an odd element that has an attribute that is also matching your xpath selector text. Look at both the website as it appears in a browser, as well as the source. Right click, CTL + U (PC), or OPT + CTL + U (Mac) are some ways to pull up the source code. Use the search function to see what matches the selector text.
If the html document you're working with is like the example you used, you won't be able to use the strategy you're looking for help with (extract the name of the movie together with the description). You're already extracting the names. The names are not in the same elements as the descriptions.
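One way to keep the pairing while you debug, even though the titles and descriptions live in different elements, is to first select one container node per movie and then pull each field from inside that container; html_node() (singular) returns exactly one match per container, or NA when there is none, so an extra or missing description shows up next to the right title instead of shifting the whole column. A rough sketch along those lines (treating '.lister-item' as the per-movie container is an assumption about the page's markup):
library(rvest)

url <- 'http://www.imdb.com/search/title?count=100&release_date=2016,2016&title_type=feature'
webpage <- read_html(url)

## One node per movie listing, then exactly one field per listing (NA if absent)
items <- html_nodes(webpage, '.lister-item')
df <- data.frame(
  title       = html_text(html_node(items, '.lister-item-header a')),
  rank        = html_text(html_node(items, '.text-primary')),
  description = html_text(html_node(items, '.ratings-bar+ .text-muted')),
  stringsAsFactors = FALSE
)
Because every column is built from the same items vector, all three columns have the same length by construction, and rows containing NA point you straight at the listings that don't match your selectors.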
I have a problem with unformatted output and I don't know where it arises, so I will post my entire workflow.
I'm integrating my own code into an existing climate model, written in Fortran, to generate a custom variable from the model output. I have been successful in getting sensible and readable formatted output (values up to the thousands), but when I try to write unformatted output the values I get are absurd (on the order of 1E10).
Would anyone be able to take a look at my process and see where I might be going wrong?
I'm unable to make a functional replication of the entire code used to output the data; however, the relevant snippet is:
c write customvar to file [UNFORMATTED]
open (unit=10,file="~/output_test_u",form="unformatted")
write (10)customvar
close(10)
c write customvar to file [FORMATTED]
c open (unit=10,file="~/output_test_f")
c write (10,*)customvar
c close(10)
The model was run twice, once with the FORMATTED code commented out and once with the UNFORMATTED code commented out, although I now realise I could have run it once if I'd used different unit numbers. Either way, different runs should not produce different values.
The files produced are available here:
unformatted (9 kB)
formatted (31 kB)
In order to interpret these files, I am using R. The following code is what I used to read each file, and shape them into comparable matrices.
##Read in FORMATTED data
formatted <- scan(file="output_test_f",what="numeric")
formatted <- (matrix(formatted,ncol=64,byrow=T))
formatted <- apply(formatted,1:2,as.numeric)
##Read in UNFORMATTED data
to.read <- file("output_test_u","rb")
unformatted <- readBin(to.read,integer(),n=10000)
close(to.read)
unformatted <- unformatted[c(-1,-2050)] #to remove padding
unformatted <- matrix(unformatted,ncol=64,byrow=T)
unformatted <- apply(unformatted,1:2,as.numeric)
In order to check that the general structure of the data between the two files is the same, I checked that zero and non-zero values were in the same position in each matrix (each value represents a grid square, zeros represent where there was sea) using:
as.logical(unformatted)-as.logical(formatted)
and an array of zeros was returned, indicating that it is just the values which differ between the two, and not the way I've shaped them.
To see how the values relate to each other, I tried plotting the formatted vs. unformatted values (note that all zero values are removed).
As you can see, they have some sort of relationship, so the inflation of the values is not random.
I am completely stumped as to why the unformatted data values are so inflated. Is there an error in the way I'm reading and interpreting the file? Is there some underlying way that Fortran writes unformatted data that alters the values?
The usual method that Fortran uses to write unformatted files is:
A leading record marker, usually four bytes, with the length of the following record
The actual data
A trailing record marker, the same number of bytes as the leading record marker, with the same information (used for BACKSPACE)
The usual number of bytes in the record marker is four bytes, but eight bytes have also been sighted (e.g. very old versions of gfortran for 64-bit systems).
If you don't want to deal with these complications, just use stream access. On the Fortran side, open the file with
OPEN(unit=10,file="foo.dat",form="unformatted",access="stream")
This will give you a stream-oriented I/O model like C's binary streams.
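On the R side, such a stream file can then be read back without any marker handling, for example (assuming customvar was written as 4-byte reals, which depends on the kind used in the Fortran code):
to.read <- file("foo.dat", "rb")
## No record markers to strip; size/type must match the Fortran kind actually written
unformatted <- readBin(to.read, numeric(), size = 4, n = 10000)
close(to.read)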
Otherwise, you would have to look at your compiler's documentation to see how exactly unformatted I/O is implemented, and take care of the record markers from the R side. A word of caution here: Different compilers have different methods of dealing with very long records of more than 2^31 bytes, even if they have four-byte record markers.
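If you do stay with sequential unformatted output, a rough sketch of reading one record from R and checking the markers might look like this (assuming four-byte markers, native endianness and 4-byte data, all of which are compiler-dependent):
con <- file("output_test_u", "rb")
reclen  <- readBin(con, integer(), n = 1, size = 4)            # leading marker: record length in bytes
payload <- readBin(con, numeric(), n = reclen / 4, size = 4)   # the actual data
trailer <- readBin(con, integer(), n = 1, size = 4)            # trailing marker, should equal reclen
close(con)
stopifnot(trailer == reclen)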
Following on from the comments of @Stibu and @IanH, I experimented with the R code and found that the source of error was the incorrect handling of the byte size in R. Explicitly specifying a byte size of 4, i.e.
unformatted <- readBin(to.read, integer(), size = 4, n = 10000)
allows the data to be perfectly read in.
I have some products which have 2D GS1 barcodes on them. Most have the format 01.17.10, which is GTIN.Expiry Date.Lot Number.
This makes sense as 01 and 17 are fixed length, so can be parsed easily, just by splitting the string in the appropriate place.
However, I also have some in the format 01.10.17.21 (GTIN.Lot.Expiry.Serial Number) which doesn't make sense because Lot and Serial number are variable length, meaning I cannot use position to decode the various elements. Also, I cannot search for the AIs as they could legitimately appear in the data.
It seems that I've no way of reliably decoding this format. Am I missing something?
Thanks!
According to the GS1 website, "More than one AI can be carried in one bar code. When this happens, AIs with a fixed length data content (e.g., SSCC has a fixed length of 18 digits) are placed at the beginning and AI with variable lengths are placed at the end. If more than one variable length AI is placed in one bar code, then a special "function" character is used to tell the scanner system when one ends and the other one starts."
So it looks like they intend for you to order your AIs with the fixed-width identifiers first, then separate the variable-width fields with a function character, which, it appears, is FNC1. Implementing that will depend on the barcode symbology you are using; it may be different between DataMatrix, Code 128 and QR Code, for example.
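In other words, a decoder walks the string: it reads an AI, takes a fixed number of characters for fixed-length AIs (01 is 14 digits, 17 is 6 digits), and for variable-length AIs (10, 21) takes everything up to the next FNC1, which scanners typically pass through as the ASCII GS character (0x1D), or to the end of the data. A rough sketch of that loop (the table below covers only the four AIs you mentioned, and assumes the scanner delivers GS for FNC1):
## Minimal GS1 element-string parser for AIs 01, 17, 10 and 21 only
parse_gs1 <- function(x, gs = "\u001d") {
  fixed    <- c("01" = 14, "17" = 6)   # fixed-length AIs: GTIN, expiry (YYMMDD)
  variable <- c("10", "21")            # variable-length AIs: lot, serial
  out <- list()
  while (nchar(x) > 0) {
    ai <- substr(x, 1, 2)
    x  <- substr(x, 3, nchar(x))
    if (ai %in% names(fixed)) {
      n <- fixed[[ai]]
      out[[ai]] <- substr(x, 1, n)
      x <- substr(x, n + 1, nchar(x))
    } else if (ai %in% variable) {
      sep <- regexpr(gs, x, fixed = TRUE)   # next FNC1/GS, or -1 if none
      n <- if (sep > 0) sep - 1 else nchar(x)
      out[[ai]] <- substr(x, 1, n)
      x <- substr(x, n + 2, nchar(x))       # also skip the GS separator
    } else {
      stop("unknown AI: ", ai)
    }
  }
  out
}

## Example call with made-up data: GTIN, expiry, lot (GS-terminated), serial
parse_gs1("0101234567890128171905011012AB\u001d21998877")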
Why do some QR codes look different when using the same URL?
There are 40 Versions (sizes) of QR Codes, 4 error correction levels and 8 masking possibilities, giving a total of 1280 possible QR codes for any given input.
Typically the version is chosen based on the amount of data to be stored and the mask is chosen to produce the best image in terms of readability. The error correction level is chosen by the encoder based on how much data might need to be recovered...
Choosing a different error correction level will result in a different image. The higher the level, the better the chances it can recover from unreadable data.
http://en.wikipedia.org/wiki/QR_code#Storage