What is the format of the string output in mPDF, and how do I read the string generated by mPDF? - mpdf

When using mPDF to generate the output PDF as a string, as described in the documentation ($mpdf->Output('', "S");), what is the format of that string (e.g., Base64)? And how can it be read in the front-end, or converted into something that can be used to view the PDF in the front-end?

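The string output is the raw content of the resulting PDF document: a mix of plain-text and binary data, exactly the bytes a .pdf file would contain (it is not Base64-encoded). You read it with a PDF viewer directly, for example by saving it to a .pdf file or by sending it to the browser with a Content-Type: application/pdf header.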
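As a minimal sketch (in Python, with a hypothetical helper name), this is one way to turn that raw string into something a front-end can display without writing a file: check for the %PDF- header that every PDF starts with, then Base64-encode the bytes into a data: URI that can serve as the src of an iframe or embed element. The Base64 step exists only for transport; in PHP the equivalent would be calling base64_encode on the value returned by Output('', "S").

```python
import base64


def pdf_bytes_to_data_uri(pdf_bytes: bytes) -> str:
    """Wrap raw PDF bytes (e.g. what mPDF's Output('', "S") returns) in a data: URI.

    Hypothetical helper for illustration only.
    """
    # Every PDF file begins with the "%PDF-" header; anything else is not raw PDF output.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like raw PDF data")
    # Base64 is applied here only so the bytes can travel inside a text URI.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return "data:application/pdf;base64," + encoded


if __name__ == "__main__":
    # Stand-in bytes; a real PDF body would follow the header.
    sample = b"%PDF-1.4\n%illustrative stub\n"
    print(pdf_bytes_to_data_uri(sample)[:40])
```

For large documents, serving the bytes directly with a Content-Type: application/pdf header is usually simpler than embedding a data URI.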

Related

Base64-encoded binary string, txt to PDF in R

I am trying to download PDFs via an API, where I receive the data as Base64-encoded binary in JSON format.
Is there a way to convert that to a PDF using R?
My approach is the following, but the generated "PDF" can't be read properly by a PDF reader.
Looking at it in Notepad, it also seems to be missing something, perhaps additional metadata. It should create the following PDF:
file <- fromJSON("data.txt")
decoded <- base64_dec(file$data)
save(decoded, "file.pdf")
File: data.txt
save() is meant for serializing R objects (.RData files), not for writing raw bytes. Use writeBin to write the decoded raw data to a file:
data <- jsonlite::fromJSON("data.txt")
raw <- openssl::base64_decode(data$data)
writeBin(raw, "output.pdf")
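For comparison, here is a minimal Python sketch of the same fix, assuming (as in the R code) that the Base64 string sits under a top-level "data" key; the function name is hypothetical. The key point is identical: decode the Base64 text and write the resulting bytes unchanged, with no serialization wrapper.

```python
import base64
import json
from pathlib import Path


def json_base64_to_pdf(json_path: str, out_path: str, field: str = "data") -> int:
    """Decode the Base64 string under `field` in a JSON file and write the raw bytes.

    Hypothetical helper; `field` defaults to "data" as in the R example.
    Returns the number of bytes written.
    """
    with open(json_path, "r", encoding="utf-8") as fh:
        payload = json.load(fh)
    # b64decode returns the original binary content of the PDF.
    raw = base64.b64decode(payload[field])
    # Write bytes as-is, like R's writeBin (no R/pickle serialization around them).
    Path(out_path).write_bytes(raw)
    return len(raw)
```

Usage: json_base64_to_pdf("data.txt", "output.pdf").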

Error while converting .pdf file to .ppm in R

I'm trying to apply OCR to a PDF to extract its text, and I'm following this post.
For that, I first need to convert the PDF file to PPM format, since OCR cannot be applied directly to a PDF file. But the first step itself fails.
So how do I convert a PDF to PPM in order to run OCR on it?
This is a post of mine where I get an error on the first step itself.
Any suggestions would be helpful.
Thanks.
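One common route, offered here as an assumption rather than the method from the linked post, is Poppler's pdftoppm command-line tool, which renders PDF pages to PPM images by default (-r sets the resolution in DPI). This Python sketch assumes pdftoppm is installed and on PATH; the helper names and the 300 DPI default are illustrative choices.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm writes PPM images by default; -r sets the render resolution in DPI.
    return ["pdftoppm", "-r", str(dpi), pdf_path, out_prefix]


def pdf_to_ppm(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[Path]:
    """Render every page of `pdf_path` to <out_prefix>-N.ppm (assumes poppler-utils is installed)."""
    subprocess.run(build_pdftoppm_cmd(pdf_path, out_prefix, dpi), check=True)
    prefix = Path(out_prefix)
    # Page numbers may be zero-padded (e.g. page-01.ppm), so glob instead of guessing names.
    return sorted(prefix.parent.glob(prefix.name + "-*.ppm"))
```

The returned image paths can then be passed to whatever OCR engine you use; a higher DPI generally gives the OCR step more detail to work with.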

How do you convert a table that is in a .docx file to an .xlsx or a CSV file in Python or R?

I have a document like the one shown below. There is some text above the table, and then there's a table. How do I extract the table from the .docx file in R or Python and then convert it to a CSV or an XLSX file? I don't even mind a .txt file if it retains the exact format of the table. I just don't know what to do with this .docx file.
If the document is a .docx, then it is all XML: a .docx file is just a ZIP container with various XML "parts". Take a look at the Open XML SDK for some ideas on how to parse the file. The SDK is for C#, but you may still get some ideas from it.
If you are just going to extract the table, it should not be too bad (updating complex .docx documents can get very complicated; I'm working on this now). My tip to make things easier: open the table properties, go to the Alt Text tab, and enter a unique value in the "Title" field. The value will show up like this within the table properties: <w:tblCaption w:val="TBL1"/>, which makes the table easier to extract from the XML.
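To illustrate the ZIP-of-XML structure and the Alt Text title tip, here is a minimal Python sketch using only the standard library (the function names are hypothetical, and this is not the Open XML SDK). It reads word/document.xml out of the .docx archive, walks each w:tbl, joins the text of every w:t inside each cell, and can optionally keep only tables whose w:tblCaption value matches, such as "TBL1". Merged and nested tables get no special handling, and multiple paragraphs in one cell are concatenated without a separator.

```python
import csv
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(path, caption=None):
    """Return each table in a .docx as a list of rows (lists of cell text).

    If `caption` is given, keep only tables whose <w:tblCaption w:val="..."/> matches it.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        if caption is not None:
            cap = tbl.find(f"{W}tblPr/{W}tblCaption")
            if cap is None or cap.get(W + "val") != caption:
                continue
        rows = []
        for tr in tbl.findall(W + "tr"):
            # Join all text runs (<w:t>) inside each cell (<w:tc>).
            rows.append(["".join(t.text or "" for t in tc.iter(W + "t"))
                         for tc in tr.findall(W + "tc")])
        tables.append(rows)
    return tables


def write_csv(rows, out_path):
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
```

Usage (hypothetical file names): write_csv(docx_tables("report.docx", caption="TBL1")[0], "table.csv").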
If you are going to work with Open XML documents, get the OOXML Chrome add-in. It is great for exploring the internals of .docx files.
Note: I saw the link to another SO answer for this. That answer uses Office "automation", which is certainly easier to code, but Microsoft does not recommend running Office via automation on a server.
You can extract tables from a .docx file in Python using python-docx.
Try this:
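from docx import Document
import pandas as pd
file_path = "input.docx"  # hypothetical path: point this at your .docx file
document = Document(file_path)
for index, table in enumerate(document.tables):
    # Pre-fill a rows x columns grid with empty strings
    df = [['' for _ in range(len(table.columns))] for _ in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            df[i][j] = cell.text
    # One workbook per table: "Table# 0.xlsx", "Table# 1.xlsx", ... (to_excel needs openpyxl)
    pd.DataFrame(df).to_excel("Table# " + str(index) + ".xlsx")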

How can I get an Arabic report using iTextSharp?

I used this tutorial:
http://www.aspsnippets.com/Articles/Export-ASPNet-Web-Page-with-images-to-PDF-using-ITextsharp.aspx
to convert my page to a PDF report, but when I use Arabic characters in the HTML code, I get an empty report. I think iTextSharp can't handle the Arabic language; for example, I write this:
<tr><td>نام</td><td>Age</td></tr>
<tr><td>John</td><td>11</td></tr>
When I use Arabic, both the input and the output show empty cells. How can I solve that?

How to generate Rich Text Format (RTF) document in ASP.NET?

The RTF specification from Microsoft:
http://msdn.microsoft.com/en-us/library/aa140277%28office.10%29.aspx
Or, to bypass the RTF learning curve, you could generate HTML from ASP.NET, then shell out to the html2rtf tool to convert it to RTF:
http://www.gkrueger.com/personal/html2rtf/index.html
Or, if your document is going to be used like a template, you could create the RTF file by hand in WordPad and save it with special placeholder strings that you invent, like $YOUR_NAME_HERE$. Later, read the saved RTF file and replace those special strings.
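A minimal Python sketch of the placeholder approach, assuming the $NAME$ convention from the answer (function names are hypothetical). Replacement values are escaped first, because a backslash or brace in the inserted text would otherwise be read as RTF markup; non-ASCII characters are written as \uN? escapes, where N is a signed 16-bit UTF-16 code unit and ? is the fallback character for readers that ignore \u.

```python
def rtf_escape(text: str) -> str:
    """Escape plain text for safe insertion into an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # backslash and braces are RTF control characters
        elif ch == "\n":
            out.append("\\par ")  # a newline becomes a paragraph break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit UTF-16 code unit; '?' is the fallback character.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def fill_rtf_template(template: str, values: dict[str, str]) -> str:
    """Replace $KEY$ placeholders in RTF source with escaped values."""
    for key, value in values.items():
        template = template.replace(f"${key}$", rtf_escape(value))
    return template
```

If a placeholder is not replaced, inspect the saved file: the editor may have split the placeholder text with control words.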
Just follow the spec:
http://en.wikipedia.org/wiki/Rich_Text_Format
Start your file with
MyFile.WriteLine("{\\rtf1 ")
and just follow the spec linked by Spencer.
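The answer shows only the opening control word. As a sketch under stated assumptions (hypothetical function name, ASCII-only text, an arbitrary font choice), this Python function builds a complete minimal RTF document: \rtf1 opens it, \fonttbl declares one font, \fs24 sets 12-point text (RTF font sizes are in half-points), each paragraph ends with \par, and a final closing brace matches the opening one.

```python
from pathlib import Path


def minimal_rtf(paragraphs: list[str]) -> str:
    """Build a small, complete RTF document from ASCII-only paragraphs."""

    def esc(s: str) -> str:
        # Backslash and braces are RTF control characters and must be escaped.
        return s.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")

    body = "".join(esc(p) + "\\par\n" for p in paragraphs)
    return ("{\\rtf1\\ansi\\deff0\n"
            "{\\fonttbl{\\f0 Times New Roman;}}\n"
            "\\f0\\fs24\n" + body + "}")


if __name__ == "__main__":
    # Hypothetical output file name.
    Path("example.rtf").write_text(minimal_rtf(["Hello", "World"]), encoding="ascii")
```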
