Using GhostPCL to convert PCL with images to PDF - unix

I'm currently attempting to convert some PCL files into PDF using GhostPCL (PCL6).
For the most part this works. However, there is an odd problem with some of the conversions: PCL6 is not converting some logos which are at the top of our documents. The logo is of the format:
^[(25XABCDEFGHIJKLMNOPQ^[(3#^M
^[(25X^[&a+1.49RRSTUVWXYZ[\]^_`ab^[(3#^M
^[(25X^[&a+1.49Rcdefghijklmnopqrs^M
when viewing the PCL file in vim. When printing the file as a PCL file, the image prints out correctly, but when converting to PDF, the following takes its place:
ABCDEFGHIJKLMNOPQ
RSTUVWXYZ[\]^_`ab
cdefghijklmnopqrs
I recognize that the format is meant to be matched against some sort of embedded image or font, but it has been really difficult trying to find useful documentation on PCL (so I can actually figure out what these characters mean) or the conversion process.
Can anyone offer some insight on how to approach the conversion? We will need these images/logos in the converted documents since they often contain disclaimer information as part of the image.
EDIT1: I've also attempted converting to PostScript and printing that, and the same behavior occurs.
EDIT2: When rendering the PCL file in a viewer, the same text shows up instead of the image. But when printing, the logo does show up. Strange...
EDIT3: To clarify, sending the PCL file to a printer directly does not seem to cause the problem (i.e., the logo does print correctly). It's only when I attempt to convert it to another file format that the problem occurs.

What happens when you try rendering the PCL input with Ghostscript, e.g. to the display device? If it doesn't render, it's not going to end up in a PDF either.
Have you tried printing the file to a PCL printer?
If it works on a PCL printer but not when rendering, you can open a bug against GhostPCL. If it renders but does not end up in the PDF, then you can open a bug against GhostPCL with the 'pdf writer' component.
It's possible that the logo is drawn using a rasterop. This is a part of the PCL imaging model which has no counterpart in PDF and so cannot be reproduced. The result of using a rasterop with the PDF device is variable: sometimes it will do what you expect, often it will not.
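As a rough sketch of the comparison suggested above, the following Python builds GhostPCL command lines that render the same PCL file once to a raster device and once to the PDF writer, so the two outputs can be checked side by side. The executable name `gpcl6` (it may be `pcl6` on older installs), the file names, and the helper functions are assumptions for illustration; `png16m` and `pdfwrite` are standard Ghostscript devices.

```python
import shutil
import subprocess


def gpcl_command(pcl_path, out_path, device, exe="gpcl6"):
    """Build a GhostPCL command line that renders pcl_path to out_path."""
    return [
        exe,
        "-dNOPAUSE",                  # do not pause between pages
        f"-sDEVICE={device}",         # output device, e.g. png16m or pdfwrite
        f"-sOutputFile={out_path}",   # %03d expands to the page number
        pcl_path,
    ]


def render_both(pcl_path, exe="gpcl6"):
    """Render to PNG (raster) and to PDF so the two results can be compared."""
    commands = [
        gpcl_command(pcl_path, "page-%03d.png", "png16m", exe),
        gpcl_command(pcl_path, "out.pdf", "pdfwrite", exe),
    ]
    if shutil.which(exe) is None:
        # GhostPCL is not on PATH; just return what would have been run.
        return commands
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return commands
```

If the logo is missing from the PNG pages as well, the problem is in rendering rather than in the PDF writer.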

Related

Inserting images in R markdown using a path variable

I am new to R and Rmd and trying to generate a report using Rmd. This report has several images inserted along with the text. I am able to insert an image by hardcoding the path of the image. I have no problems with that but I need the path as a variable because it varies with the project. Can anyone help me with the syntax for calling a variable within a path to the image?
![Relatedness check](/data/array_processing/in_progress/Project123/files/data/plots/Project123.ibd.png)
"Project123" changes based on the project. Is there a way I can declare this variable and call it to define the path?
Help please.
Images can use inline R code for dynamic paths and/or alt text. (Early adopters of rmarkdown often used this as the default way of including R plots in reports: png(filepath...); plot(...); dev.off(), followed by the syntax I recommend you use.)
This will allow you to do what you need:
![something meaningful](`r filepath`)
as raw markdown (and not inside a traditional code chunk).
If you aren't familiar with inline code, then know that you can put just about any R expression in an inline code span. This is handy for including dynamic content in a paragraph of text, for example "the variance of the sample is `r var(sample(99))`". (Often it is just a pre-created variable; if numeric, it is often rounded or formatted to control the display of significant figures.)

Multibyte characters reading problem in IronPdf

I am trying out IronPDF. I want to insert PDF metadata, which I read with IronPDF, into a database. However, some "ı" characters in the metadata are not read by IronPDF; spaces are left in place of these characters. Here is my code sample:
var md = PdfDocument.FromFile("___PATH OF PDF FILE___");
var article_title = md.MetaData.Title;
When I copy paste string to Notepad++ it gives a result like this:
And here is the screenshot of application view:
Is there a way to solve this problem, or is this a bug in IronPDF? If everything goes well, I am thinking of buying it. But if it fails on the first try, I will move on to iTextSharp.
EDIT: First of all, I apologize for the delay; Windows surprised me. I struggled all day to get a new system up, and unfortunately Visual Studio etc. are still not installed. I have added one of the files I had problems with below, and the IronPDF version appears as 2019.7.0.0.
PDF file: https://yadi.sk/d/HwP9JWRWTzMlSA
First of all, since you haven't provided us with a sample PDF to work with, I googled some Turkish PDF documents that have metadata with Turkish characters. This is the file that I came up with: link
As you can see above, the Author metadata field contains the Turkish character ı.
Then I created a dotnet fiddle in order to test this file using IronPDF (with the latest available version - since you haven't specified any):
sample using IronPDF
The output from this sample is ElifCakroglu which is showing the exact same symptom when copied to Notepad++:
Playing with the encodings did not help resolve this issue. So I created another dotnet fiddle to test your alternative solution, iTextSharp: sample using iTextSharp
This time everything was working as it should be: ElifCakıroglu
Note: I've also tried creating a Word 2016 document, saving it as a PDF, and using that file with the above samples. Neither of them worked (the file was not accepted as a valid PDF) for some reason. After that I tried an online PDF document validator, but the file was fine. Then I used an online converter to change the PDF version with the default settings and used the output PDF with both samples, and surprisingly both of them worked correctly.
My conclusion is that iTextSharp is working consistently with both documents having metadata with Turkish characters present, while IronPDF works correctly 50% of the time.
I believe that this issue is resolved and can be tested in the 2020.9 release branch of IronPdf.
https://www.nuget.org/packages/IronPdf/

RStudio character encoding

I am trying to view characters from multiple languages in RStudio. What I find unusual is that I am able to view these in the console, but not in the viewer. UTF-8 encoded characters appear like 'U+3042', 'U+500B', etc. in the viewer.
Is there a way to get the viewer to display the actual characters instead of the encoded character?
Here are a couple of images showing what I mean -
In console: https://ibb.co/T0681H7
In viewer: https://ibb.co/QnxF25c
This is a known issue in RStudio. Feel free to comment/upvote here:
https://github.com/rstudio/rstudio/issues/4193

Converting .pdf files to excel (.xls)

A friend of mine doing an internship asked me 2 hours ago if I could help him avoid manually converting 462 PDF files to .xls using free online software.
I thought of a shell script using unoconv, but I couldn't work out how to use it properly, and I am not sure unoconv can solve this problem, since it mainly converts files to PDF, not the reverse.
Conversion from PDF to any other structured format is not always possible and not generally recommended.
Having said that, this does look like a one-off job and there's a fair few of them (462).
It's worth pursuing if you can reliably extract text from most of them and it's reasonably structured. It's a matter of trying to get regular text output across a sample of the PDFs that you can reliably parse into a table structure.
There are plenty of tools around that target either direct or OCR-based text extraction; just google around.
One I like is pstotext from the Ghostscript suite; the -bboxes option lets me get the coordinates of each word and leaves it up to me to re-assemble the structure. Despite its name, it does work on input PDFs. The downside is that it can be a bit flaky and works on some PDFs but not others.
If you get this far, you'd then most likely need to write a shell script or program to convert that to a CSV. You can either open this directly in a spreadsheet or look for tools to convert it into XLS.
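A minimal sketch of that re-assembly step, assuming the extraction tool's output has already been parsed into `(x, y, text)` tuples. The tuple layout, the tolerance value, the sample data, and the function names are all hypothetical, and the sketch assumes y grows down the page (flip the sort if your tool reports PDF-style coordinates with the origin at the bottom).

```python
import csv
import io


def group_words_into_rows(words, y_tolerance=2.0):
    """Group (x, y, text) tuples into rows of text, left to right."""
    rows = []
    current = []
    current_y = None
    # Sort by vertical position first so words on the same line are adjacent.
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        # Start a new row when the word is too far from the row's first word.
        if current_y is None or abs(y - current_y) > y_tolerance:
            if current:
                rows.append(current)
            current = []
            current_y = y
        current.append((x, text))
    if current:
        rows.append(current)
    # Within each row, order the words by horizontal position.
    return [[text for _, text in sorted(row)] for row in rows]


def rows_to_csv(rows):
    """Serialize rows of cells to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical word positions for a two-column table.
sample = [
    (50, 100, "Name"), (150, 100.5, "Qty"),
    (50, 120, "Widget"), (150, 119.8, "3"),
]
print(rows_to_csv(group_words_into_rows(sample)))
```

This treats every word as its own cell; real tables usually also need words merged into columns by x position, which depends on the layout of the PDFs in question.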
PS: If he hasn't already, get the intern to ask if there's any possible way of getting at the original data that was used to create the PDFs. It will save a lot of time and effort and lead to a far more accurate result.
Update: An alternative to pstotext is the renderpdf.pl command, which is included in the Perl CAM::PDF module. It is more robust, but just reports the text (x,y) position, not bounding boxes.
Other responses on a linked question suggest Tabula, too.
https://github.com/tabulapdf/tabula
I tried and it works very well.

Annotating Adobe Reader PDFs with math symbols

Many of the math textbooks and other literature I read are in PDF format, so I frequently find myself annotating these with the Adobe Reader comments tool.
I did find a helpful guide here, but sometimes I'd like the option of inserting math symbols, too. Has anyone found a reliable way to insert math symbols, TeX, or other arbitrary formatting into the annotations?
So far, the best I've come up with is to enter the Unicode code point prefixed by "0x" and hit Alt+X after it. Maybe with the Adobe JavaScript SDK you could write a script to shortcut this.
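To find the code to type for a given symbol, a small lookup like the one below may help. It is only a sketch: the function name is made up for illustration, and it simply prints the hexadecimal code point (with the "0x" prefix mentioned above) and the official Unicode name of each character.

```python
import unicodedata


def alt_x_code(symbol):
    """Return the hex code point of a single character, prefixed with 0x."""
    return f"0x{ord(symbol):04X}"


# A few common math symbols; extend the string as needed.
for symbol in "∑∫≤":
    print(symbol, alt_x_code(symbol), unicodedata.name(symbol))
```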
I don't think any of the current commercial editors make this easy, which is too bad. I am sure the vendors monitor this site, so there is hope.
In the meantime, here is a manual workaround.
Use TikZ to create your comment boxes. Here are the two examples I found to be most relevant: Boxes and Positioning. Play around with the options to get both the shape and the placement you want. Generate a PDF file from the LaTeX source that contains your comments.
IMPORTANT: if your comments end before the last page of the original document, insert:
\pagebreak{} % create empty page
\thispagestyle{empty} % get rid of page numbers et al
~ % put a space so the page gets generated
before your \end{document}, to get an empty last page. The following command will reuse the last page of your comments document on all subsequent pages of the original document.
Use a recent version of pdftk with the multistamp command to overlay your equations file with your original file like so:
pdftk original.pdf multistamp comments.pdf output out.pdf
Also see this question.
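If the overlay step needs to be scripted, the pdftk call above can be wrapped along these lines. This is a sketch only: the function names are hypothetical, and it assumes a `pdftk` executable on the PATH that supports `multistamp`.

```python
import shutil
import subprocess


def multistamp_command(original, comments, output):
    """Build the pdftk command that overlays comments onto original."""
    return ["pdftk", original, "multistamp", comments, "output", output]


def stamp(original, comments, output):
    """Run pdftk multistamp, failing early if pdftk is not installed."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk was not found on PATH")
    subprocess.run(multistamp_command(original, comments, output), check=True)
```

Calling `stamp("original.pdf", "comments.pdf", "out.pdf")` runs the same command as shown earlier.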
The free (as in speech) PDF tool Okular supports this functionality: put a LaTeX formula directly between $$...$$.
