Why does InDesign put "c_" before literal text? - adobe

I'm working with the old interchange format of Adobe InDesign (.inx files).
This XML file has text contents like the following:
<pcnt>c_Stackoverflow

</pcnt>
Which results in
Stackoverflow<CR><CR>
Question: Why does it put c_ before the actual value instead of simply using CDATA where needed?

Adobe encodes every value this way.
The payload is prefixed because a single field can hold several values, so CDATA would not work.
c_ indicates a string
x_ represents a list (enum)
x_a represents a list with 0x0a items (a is hexadecimal for 10)
l_ represents a long
See a full list of all prefixes in the PDF:
http://partners.adobe.com/public/developer/en/indesign/sdk/working_with_inx_file_format.pdf
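To illustrate the prefix scheme listed above, here is a minimal Python sketch that splits a raw INX value into its type and payload. It covers only the three prefixes named in this answer. The function name split_inx_value is made up for illustration, and the assumption that list items follow the hexadecimal count after another underscore is not confirmed here, so check the PDF before relying on it.

```python
# Prefix codes taken from the answer above; the full list is in Adobe's PDF.
KINDS = {"c": "string", "x": "list", "l": "long"}


def split_inx_value(raw: str):
    """Split a raw INX value into (kind, payload).

    For lists the result is (kind, item_count, rest), where item_count is
    parsed from the hexadecimal digits that follow "x_".
    """
    code, sep, payload = raw.partition("_")
    if not sep or code not in KINDS:
        raise ValueError(f"unknown or missing INX prefix in {raw!r}")
    kind = KINDS[code]
    if kind == "list":
        # Assumption: the count is followed by "_" and then the items.
        count_hex, _, rest = payload.partition("_")
        return kind, int(count_hex, 16), rest
    # Strings and longs are returned undecoded.
    return kind, payload


print(split_inx_value("c_Stackoverflow\n\n"))  # ('string', 'Stackoverflow\n\n')
print(split_inx_value("x_a"))                  # ('list', 10, '')
```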

Related

How concat multiple fields for a GS1 Data-matrix (BXN) in Zebra Programming Lang (ZPL)

I'm trying to encode some data in a GS1 Data Matrix that contains field separators (FNC1, GS) passed within the variable to a ZPL template.
Originally, in ZebraDesigner, I couldn't get ZPL to let me pass the separators within the parameter/variable. The separators would only show up as text within the data, not as control characters for the scanner. (I was able to pass the separators as fixed data, but it needs to work with a parameter.)
Alternatively, I was hoping to edit the ZPL and concatenate the control characters and QR values into a single data string for the Data Matrix.
This is zpl using one variable QRCode: (This works but not with passed separators)
^BY208,208^FT448,1123^BXN,8,200,0,0,1,~
^FH\^FN18^FDQRCode^FS
This is using fixed data where FNC1 is \7E and GS is \1D: (This works but doesn't use variables/parameters)
^BY208,208^FT448,1123^BXN,8,200,0,0,1,~
^FH\^FD\7E188text234567890\1Dmoretext^FS
This is my attempt to concat the separators and variables QRData1...:
^BY208,208^FT448,1123^BXN,8,200,0,0,1,~
^FH\^FD\7E^FN18^FDQRData1^FN22^FD\1D^FDQRData2^FD\1D^FN23^FDQRData3^FS
Unfortunately, the QR code only shows the value for the last var QRData3
Escape your field separator hex codes with an _ (underscore), not with a backslash.
And use only one ^FD command, as in your second example.
For reference, see the pages for the commands ^FD, ^FH and ^BX in the Zebra ZPL II Programming Guide.
As the OP found out, the field separator _d029 worked for him! Here d029 is the decimal character code 029 (0x1D) of the GS control character, not a hex value.
More information can be found here:
Encode GS,RS, and EOT for Code 128 and PDF417
GS is ~029
RS is ~030
EOT is ~004
Example:
[)><RS>06<GS>13V12GG7<GS>1P029-102489-157<GS>NC-411-661478-1<RS><EOT>
Enter the data as:
[)>~03006~02913V12GG7~0291P029-102489-157~029NC-411-661478-1~030~004
Encode GS,RS, and EOT for Data Matrix, Aztec, and QR Code
GS is ~d029
RS is ~d030
EOT is ~d004
Example:
[)><RS>06<GS>13V12GG7<GS>1P029-102489-157<GS>NC-411-661478-1<RS><EOT>
Enter the data as:
[)>~d03006~d02913V12GG7~d0291P029-102489-157~d029NC-411-661478-1~d030~d004
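The substitution shown in these examples can be sketched in a few lines of Python. This is only an illustration of the tilde notation quoted above, not part of ZPL or of any Zebra tool, and the function name escape_controls is hypothetical. Pass prefix="~" for the Code 128 / PDF417 form.

```python
# Decimal codes of the control characters named above.
CONTROL_CODES = {0x1D, 0x1E, 0x04}  # GS, RS, EOT


def escape_controls(data: str, prefix: str = "~d") -> str:
    """Replace GS, RS and EOT with prefix + three-digit decimal code."""
    out = []
    for ch in data:
        if ord(ch) in CONTROL_CODES:
            out.append(f"{prefix}{ord(ch):03d}")
        else:
            out.append(ch)
    return "".join(out)


raw = "[)>\x1e06\x1d13V12GG7\x1d1P029-102489-157\x1dNC-411-661478-1\x1e\x04"
print(escape_controls(raw))
# [)>~d03006~d02913V12GG7~d0291P029-102489-157~d029NC-411-661478-1~d030~d004
```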

DICOM VR ST with multiple value

If I read the definition of DICOM VR ST, short text:
A character string that may contain one or more paragraphs. It may
contain the Graphic Character set and the Control Characters, CR, LF,
FF, and ESC. It may be padded with trailing spaces, which may be
ignored, but leading spaces are considered to be significant. Data
Elements with this VR shall not be multi-valued and therefore
character code 5CH (the BACKSLASH "\" in ISO-IR 6) may be used.
So the data element shall not be multi-valued.
But I found a few DICOM tags in the dictionary that have VR=ST and VM=1-n, which is multi-valued.
For example:
(0014,0023) CAD File Format
And few others from (0014,...)
So, how should I understand this? Is the DICOM VR definition wrong?
AFAIK the definition of Short Text is correct and the VM should always be 1.
The Tags 0014,0023 and 0014,0024 are retired anyway.
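As a rough illustration of what the VR definition implies for a parser, the Python sketch below treats the backslash as a value delimiter only for VRs that allow multiple values. The helper split_values is hypothetical and handles string values only. The set of text VRs is limited to ST, LT and UT, which the standard defines as not multi-valued.

```python
# Text VRs in which a backslash is ordinary data, not a value delimiter.
NON_MULTI_VALUED_TEXT_VRS = {"ST", "LT", "UT"}


def split_values(vr: str, raw: str) -> list:
    """Return the list of values held in a string-typed data element."""
    if vr in NON_MULTI_VALUED_TEXT_VRS:
        # VM is always 1, so the whole string is a single value.
        return [raw]
    return raw.split("\\")


print(split_values("ST", "C:\\cad\\part.dxf"))  # one value, backslashes kept
print(split_values("LO", "first\\second"))      # two values
```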

How to convert IBM file to hexadecimal using DFSORT?

I'm trying to convert an IBM file to hex values.
With this input:
H800
It should save this output in a file:
48383030
I tried by this way:
//R45ORF80V JOB (EFAS,2SGJ000),'LLAMI',NOTIFY=R45ORF80,
// MSGLEVEL=(1,1),MSGCLASS=X,CLASS=A,
// REGION=0M,TIME=5
//*---------------------------------------------------
//SORTEST EXEC PGM=ICEMAN
//SORTIN DD DSN=LF58.DFE.V1408001,DISP=SHR
//SORTOUT DD DSN=LF58.DFE.V1408001.OUT,
// DISP=(NEW,CATLG,DELETE),
// LRECL=4,DATACLAS=CDMULTI
//SYSOUT DD SYSOUT=X
//SYSPRINT DD SYSOUT=X
//SYSUDUMP DD SYSOUT=X
//SYSIN DD *
  SORT FIELDS=COPY
  OUTREC FIELDS=(1,4,HEX)
  END
/*
But it outputs the following:
C8F1F0F0
What am I doing wrong?
Is it possible to convert a file with an LRECL of 500 that also contains COMP-3 fields to hexadecimal?
By the way, I can already use the "HEX" command while browsing a file in File Manager.
Your control cards are giving you the output you asked for. They show the hexadecimal values of those characters in EBCDIC, not the ASCII hexadecimal values you are expecting.
If you actually want to see the ASCII equivalent, use TRAN=ETOA, then TRAN=HEX.
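The difference between the two encodings is easy to reproduce off the mainframe. The Python sketch below hex-dumps the same four characters in both. It assumes code page cp037 for EBCDIC; your system may use a different EBCDIC code page, although letters and digits are encoded the same way in the common ones.

```python
def hex_dump(text: str, codec: str) -> str:
    """Encode text with the given codec and return its bytes as upper-case hex."""
    return text.encode(codec).hex().upper()


# Same characters, two encodings (cp037 is an assumed EBCDIC code page).
print(hex_dump("H800", "cp037"))
print(hex_dump("H800", "ascii"))  # 48383030
```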
You are using OUTREC FIELDS. FIELDS has had a newer synonym, BUILD, for exactly 10 years now. FIELDS is still supported for backwards compatibility.
INREC and OUTREC are similar: INREC operates before a SORT or MERGE, OUTREC afterwards.
Unless you need to do the conversion after a SORT/MERGE, I recommend using INREC.
So:
INREC BUILD=(1,4,TRAN=ETOA)
But, there is no need to use BUILD. BUILD always creates a new version of the record. Many times this is what you want when you are rearranging fields. Here, you are not.
INREC OVERLAY=(1,4,TRAN=ETOA)
If you replace your OUTREC with that, your output file will be encoded in ASCII.
If you want to see the hex values of the ASCII as well:
INREC OVERLAY=(1,4,TRAN=ETOA,1,4,TRAN=HEX)
If you want to see the hex values of the ASCII instead:
INREC OVERLAY=(1,4,TRAN=ETOA,1:1,4,TRAN=HEX)
Note the 1: in the last example. This says "the results are going to be at position 1", so overwriting your previous converted data. OVERLAY can do that, BUILD cannot in one statement.

How to read a non-standard DBF memo (BLOB) file from ACT?

I am trying to convert data from Act 2000 to a MySQL database. I have successfully imported the DBF files into individual MySQL tables. However I am having issues with the *.BLB file, which seems to be a non-standard memo file.
The DBF files identify themselves as dBase III Plus, no memo format. There is a single *.BLB file, which is a memo file that multiple DBFs share for BLOB data.
If you read this document: http://cicorp.com/act/sdk/ACT6-SDK-ChapterA.htm#_Toc483994053
You can see that the REGARDING column is a 6 character one. The description is: This 6-byte field is supplied by the system and contains a reference to a field in the Binary Large Object (BLOB) Database.
Now upon opening the *.BLB I can see that the block size is 64 bytes. All the blocks of text are NULL padded out to that size.
Where I am stumbling is trying to convert the values stored in the REGARDING column to blocks location in the BLB file. My assumption is that 6 character field is an offset.
For example, one value for REGARDING is, (ignoring the square brackets): [ ",J$]
In my Googling, I found this: http://ulisse.elettra.trieste.it/services/doc/dbase/DBFstruct.htm#C1.5
It explains that in memo fields (in normal DBF files at least) the space value is ignored (i.e. it's padding out the column).
Therefore if I'm correct (again, square brackets) [",J$] should be the offset in my BLB file. Luckily I've still got access to the original ACT2000 software, so I can compare the full text in the program / MySQL and BLB file.
Using my example value, I know that the DB row with REGARDING value of [ ",J$] corresponds to a 1024 byte offset (or 16 blocks, assuming my guess of a 64 byte sized block).
I've tried reading some Python code for open source projects that read DBF files - but I'm in over my head.
I think what I need to do is unpack the characters to binary, but am not sure.
How can I find the 64-block based spot to read from based on what's found in the DBF files?
EDIT by Jerry Dodge
I've attempted to reverse-engineer the strings in this field to hexadecimal values, and then to an integer value using StrToInt64, but the result still does not match up with the blob file. I've also tried multiplying this integer value by 64 and not multiplying, but the result keeps winding up outside of the size of the blob file, not actually finding any data.
For example, a value of ___/BD (_ = space) translates to $2f4244 hexadecimal, which in turn translates to the integer value 3097156, but this does not correspond to any relevant portion of data in the blob file, even when multiplied or divided by 64.
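The arithmetic in this edit can be reproduced with a short Python sketch, which also makes it easy to try other interpretations of the six characters. Everything here is exploratory: the function name candidate_offsets is made up, the byte orders tried are guesses, and the 64-byte block size is the asker's assumption, not a documented fact about the BLB format.

```python
def candidate_offsets(regarding: str, block_size: int = 64) -> dict:
    """Interpret a REGARDING value as an integer in several ways.

    Padding spaces are stripped first, following the assumption above that
    spaces only fill out the 6-character column.
    """
    raw = regarding.strip(" ").encode("latin-1")
    big = int.from_bytes(raw, "big")        # same reading as the $2f4244 attempt
    little = int.from_bytes(raw, "little")  # alternative guess: reversed byte order
    return {
        "big": big,
        "little": little,
        "big_as_block": big * block_size,
        "little_as_block": little * block_size,
    }


print(candidate_offsets("   /BD"))
```

Comparing these candidates against the known 1024-byte offset of a record whose text you can see in ACT is one way to rule interpretations in or out.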
According to the SDK you linked, the following happens as I understand:
There is a TYPE field (right behind REGARDING) that encodes what REGARDING is used for (see the second table of the linked chapter). So I'd assume that if type=6 (meeting not held), REGARDING is either irrelevant or only contains a meeting ID reference from some other table. On that line of thought, I would only expect REGARDING to be a BLB offset if type=101 (or possibly 100). I'd also not abandon the thought that in these relevant cases TYPE might be a concatenation of BLB file index and offset (because there is a mention that each file must not be longer than 30K chars, and I really expect to be able to store much more data even in one table).

How can I convert MathType equation into MathML format?

I want to convert MathType equations saved in GIF format to MathML. First, I opened these GIF files and saved them in MathType 6.7. As a result, MathML text was appended to the end of the GIF files. However, when I extracted the MathML text from these GIF files with a Perl script, I found some garbled characters in it, as in the following line:
<mn>xxx</mn>
In the line above, a garbled character is inserted before the 'mn' tag. Is this a bug in MathType? How can I work around this problem? I have uploaded my test GIF files here: http://ubuntuone.com/p/1352/
Update:
I tried to paste the full block of MathML here, but its formatting got messed up, so I put the MathML on GitHub instead: https://gist.github.com/1068723.
There is a garbled character in the seventh line of MathML text: "  ?#x00A0;".
The original GIF file which doesn't contain MathML text: http://ubuntuone.com/p/13Ba/
Perl script that extracts MathML from GIF image generated by MathType: https://gist.github.com/1068749
Thanks,
thinkhy
Thanks thinkhy. It could be that you are extracting the data incorrectly (we haven't looked at your script yet). Only one of your GIFs had MathML, the one whose file name starts with 106R. In that one, if you just grab all the bytes from the first bit that looks like MathML until the end, you do periodically get odd bytes in there, mostly 255's except the last one. (This, however, doesn't appear to be the junk character you're seeing.) The reason for the 255's is that the MathML is distributed over multiple comment records, each of which starts with a count of the bytes in the record. From the MathType SDK (free download; link below):
GIF Image Files
MathML text is embedded into a GIF file as an Application Extension Record, which consists of a 14-byte header (Application Extension Descriptor), followed by the MTEF data. The header contains:
Byte Introducer = 0x21;
Byte ExtensionLabel = 0xFF;
Byte BlockSize = 0x0B;
Byte ApplicationId[8] = "MathType";
Byte AuthenticationCode[3] = "003";
The data follows this header and is written as a series of blocks each containing 255 bytes or less. Each block starts with a single byte count followed by the data. The end is marked as a block with length 0.
The header is unique enough that the easiest way to extract the data might be to scan the file for the 14-byte header, then expect the MathML data blocks to follow. Properly decoding the GIF records isn't that hard either, but obviously requires you read the GIF specification.
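The header-scanning approach described above could look roughly like the following Python sketch. It is not taken from the SDK: the function name extract_mathtype_payload is made up, and it relies only on the 14-byte header and the length-prefixed block layout quoted above. Scanning for the header instead of parsing the GIF properly could in principle match inside image data, so treat it as a shortcut.

```python
# 14-byte Application Extension Descriptor as quoted from the SDK.
HEADER = b"\x21\xff\x0b" + b"MathType" + b"003"


def extract_mathtype_payload(gif_bytes: bytes) -> bytes:
    """Return the data stored after the MathType header, with block counts removed."""
    start = gif_bytes.find(HEADER)
    if start < 0:
        raise ValueError("MathType application extension not found")
    pos = start + len(HEADER)
    chunks = []
    while pos < len(gif_bytes):
        size = gif_bytes[pos]  # single-byte count, 0..255
        if size == 0:          # a zero-length block marks the end
            break
        chunks.append(gif_bytes[pos + 1 : pos + 1 + size])
        pos += 1 + size
    return b"".join(chunks)


# Usage, with a placeholder file name:
# with open("equation.gif", "rb") as f:
#     print(extract_mathtype_payload(f.read()).decode("utf-8", errors="replace"))
```

Dropping the count bytes this way is what removes the stray 255's from the extracted text.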
You may already be using the SDK, but you didn't say whether you were or not, so here's the link: http://www.dessci.com/en/reference/sdk/.
