Using the following PHP code:
$pdf = new mPDF('utf-8', 'A4-L'); //have tried several of the formats
$pdf->WriteHTML($content,2);
$pdf->Output();
where $content is a very plain HTML table.
As long as the table fits on one page, the PDF is generated without errors. As soon as I add enough rows to the HTML to go past one page, I get these PHP warnings:
Message: Invalid argument supplied for foreach()
Filename: mpdf/mpdf.php
Line Number: 11043
repeated many, many times. With mPDF error notices turned on, I get:
mPDF error: Some data has already been output to browser, can't send PDF file
I tried suppressing the warnings using the code from the question "Can't get rid of PHP notices in mPDF".
This sort of worked, but the force-downloaded PDF has the following quirk:
first page: blank
2nd page: table header only
3rd to next to last pages: header and 1 row of data
last page: header + full page of data
This is the third PHP PDF library I've tried to use with my CodeIgniter framework, so I'm extremely frustrated at this point.
PHPExcel (which I'm using to give my client an Excel download option) works great for formatting Excel spreadsheets, but its PDF output is hideous no matter what options I throw at it (mainly, it puts an outline around each cell with HUGE padding).
ezPDF/PHP-PDF (per http://www.ahowto.net/php/easily-integrate-ezpdf-a-k-a-pdf-php-into-codeigniter-framework) worked great, except that I couldn't get it to work with the client's logo, which they were adamant about showing at the top of the PDF.
I guess I'm going to try domPDF next.
I'm experimenting with Azure Machine Learning using the Designer and am getting a "Delimiter not found" error when importing my data.
I originally started with a few hundred HTML files stored as Azure blobs. Each file would be considered a single row of text; however, I had no luck importing these files for further text analytics.
I created a Data Factory job that imported each file, stripped all the tabs, quotes, and CR/LF characters from the text, added a column for the file name, and stored it all as a combined tab-delimited file. In Notepad++ I can confirm that the format is FileName tab HtmlText. This is the file I'm trying to import into ML, and I get the missing-delimiter message while defining the import module.
Here is the error when I try to create a dataset:
{
"message": "'Delimiter' is not specified or invalid."
}
Question 1: Is there a better way to do text analytics on a large collection of html files?
Question 2: Is there a format I need to use in my combined .tsv file that works?
Question 3: Is there maybe a max length to the string column? My HTML can be tens of thousands of characters long.
You're right that it might be line length, but my guess is that there are still some special characters (i.e. anything starting with \) that aren't properly escaped or removed. How did you scrape and strip the text data? Have you tried using BeautifulSoup?
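One way to rule out stray delimiters is to rebuild the combined file with a CSV writer instead of hand-assembling the lines. The sketch below is only an illustration: it uses Python's standard library rather than BeautifulSoup or Data Factory, the names `TextExtractor`, `html_to_text` and `write_tsv` are made up for this example, and whether Azure ML accepts the result has not been verified here.

```python
import csv
import io
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # str.split() with no argument splits on any whitespace run,
        # so tabs, CR and LF all collapse into single spaces.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def write_tsv(rows, out):
    """Write (file_name, html) pairs as FileName<TAB>HtmlText rows."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["FileName", "HtmlText"])
    for name, html in rows:
        writer.writerow([name, html_to_text(html)])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_tsv([("a.html", "<p>Hello\tworld</p>\r\n<p>again</p>")], buffer)
    print(buffer.getvalue())
```

Note that with the default quoting, the writer wraps any field that still contains a double quote in quotes and doubles the inner quote, so the importing side has to treat `"` as the quote character.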
I am trying out IronPDF. I want to read PDF metadata with IronPDF and insert it into a database. However, some "ı" characters in the metadata are not read by IronPDF; spaces are left in place of these characters. Here is my code sample:
var md = PdfDocument.FromFile("___PATH OF PDF FILE___");
var article_title = md.MetaData.Title;
When I copy paste string to Notepad++ it gives a result like this:
And here is the screenshot of application view:
Is there a way to solve this problem, or is this a bug in IronPDF? If everything goes well, I'm thinking of buying it. But if it fails on the first try, I'll move on to iTextSharp.
EDIT: First of all, I apologize; Windows gave me a surprise. I struggled all day to get a new system up, and unfortunately Visual Studio etc. are still not installed. I've added one of the files I had problems with below, and the IronPDF version shows as 2019.7.0.0.
PDF file: https://yadi.sk/d/HwP9JWRWTzMlSA
First of all, since you haven't provided us with a sample PDF to work with, I googled some Turkish PDF documents whose metadata contains Turkish characters. This is the file that I came up with: link
As you can see above the Author metadata field has ı Turkish character.
Then I created a dotnet fiddle in order to test this file using IronPDF (with the latest available version - since you haven't specified any):
sample using IronPDF
The output from this sample is ElifCakroglu which is showing the exact same symptom when copied to Notepad++:
Playing with the encodings did not help resolve this issue. So I created another dotnet fiddle to test your alternative solution, iTextSharp: sample using iTextSharp
This time everything was working as it should be: ElifCakıroglu
Note: I also tried creating a Word 2016 document, saving it as a PDF, and using that file with the samples above. Neither sample worked (the file was not accepted as a valid PDF) for some reason. I then tried an online PDF document validator, which reported the file as fine. Finally I used an online converter to change the PDF version with the default settings, and surprisingly both samples worked correctly with the converted PDF.
My conclusion is that iTextSharp works consistently with both documents whose metadata contains Turkish characters, while IronPDF read the characters correctly in only one of the two.
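When a string only looks wrong after being pasted into an editor, it can help to list the actual code points it contains. The following is a language-agnostic check sketched in Python, not part of either fiddle; `describe_suspicious` is a hypothetical helper, and what IronPDF really returns in place of "ı" is not known from this thread.

```python
import unicodedata


def describe_suspicious(text):
    """Return (index, code point, Unicode name) for every character
    outside printable ASCII, including control characters."""
    report = []
    for index, char in enumerate(text):
        if ord(char) > 0x7E or ord(char) < 0x20:
            name = unicodedata.name(char, "<unnamed>")
            report.append((index, f"U+{ord(char):04X}", name))
    return report


if __name__ == "__main__":
    # The correctly read author name from the iTextSharp sample.
    for entry in describe_suspicious("ElifCakıroglu"):
        print(entry)
```

Running the same check on the value returned by each library would show whether the character is dropped entirely, replaced by a space, or replaced by some control character.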
I believe that this issue is resolved and can be tested in the 2020.9 release branch of IronPdf.
https://www.nuget.org/packages/IronPdf/
So I recently helped write some code for my lab that takes our processed data and builds a merged data frame from it. To keep the lab updated, we keep our data tables on a secure wiki, so I need an HTML file generated that I can upload to the wiki easily. This has worked before: all I did was copy what was already written and working and edit it for a different time point in our data collection. I get no errors, and the data looks how I want it to look. As far as I can tell the script is logically sound and works, except for one issue: R creates the file for the HTML, but nothing is written to it.
I have HTML files generated for the other data time points by scripts written exactly the same way as this one, so I don't think it is a script construction issue.
Any ideas as to why this could be happening? I just need to know where to triage.
The package used for HTML is R2HTML, included in my packages list up at the top of the script. For HTML(, file=paste()), you will need to use your own directory to see if the HTML is written as a text file.
If I am not wrong, you are trying to get the data frame in HTML format.
In that case you can use the xtable package in R.
Just add the code below at the bottom of the script:
## install the xtable package before importing it
library("xtable")
print(xtable(ChildSRPtotsFU_wiki), type="html", file="check_stack_overflow.html")
I love knitr & rmarkdown, but I often find myself in situations where I have a lengthy report that takes some nontrivial amount of time to run. After it's generated, I notice my inevitable typos in text. However, re-knitting everything to just fix a couple typos (just in text, not code) takes a long time and seems avoidable. I was about to start taking a hack at developing my own solution to this, but I'm thinking it's the kind of thing that could already have a mature solution which would likely be more robust than the one I'd build.
I'm wondering if there is a solution out there, within knitr or from a third party, that would allow me to edit just the text of my reports without rerunning code, generating plots and outputs, etc. I know I can simply edit the generated HTML, but then those changes must be replicated in the R/Rmd source that generated it, or the two get out of sync. I'm envisioning a function like this:
argument 1: the R/Rmd script with text edits (no code changes)... perhaps a warning is generated when code chunks change
argument 2: the html output file from the last time the R script in argument 1 was knitted, without the text edits.
return: the html report (argument 2) updated with the comments in the R/Rmd script (argument 1).
I use the cache option sometimes for large datasets. I toggle eval and echo on and off when developing if I'm just working on the text of my report. However, I'm looking for a function that would take care of all this for me, so one doesn't have to mess with the code and chunk options to make small edits to text.
Here's an interim solution that lets you retain the speed of making changes directly to the rendered text, but you have to do a little work after you're done making changes.
Assuming the following files:
input.knitr is the original Knitr file with text and code integrated.
output.html is the resulting HTML code that has been rendered by Knitr.
Consider making direct text edits to output.html and then running something like Meld visual merge tool:
meld output.html input.knitr
Then manually select the edits in output.html that are new and should be fixed in the original source input.knitr. Tools such as Meld do a pretty good job of aligning the texts so that the chunks and knitted output will appear as large "changes" that, in practice, you would ignore. You would focus on the small changes in the non-chunk sections.
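Before or after such a manual merge, it may be useful to confirm that only prose changed between two versions of the source, which is the "warning when code chunks change" idea from the question. This is a rough sketch in Python, assuming an R Markdown file with standard fenced chunks; `split_rmd`, `only_text_changed` and `prose_changes` are hypothetical names, and inline R code inside prose is not detected.

```python
import difflib

FENCE = "`" * 3  # the three-backtick chunk delimiter used by R Markdown


def split_rmd(source):
    """Split R Markdown text into (prose_lines, code_chunks)."""
    prose, chunks, current = [], [], None
    for line in source.splitlines():
        if current is None and line.startswith(FENCE + "{"):
            current = [line]  # chunk header line, e.g. {r plot, cache=TRUE}
        elif current is not None:
            current.append(line)
            if line.strip() == FENCE:  # closing fence ends the chunk
                chunks.append("\n".join(current))
                current = None
        else:
            prose.append(line)
    return prose, chunks


def only_text_changed(old_source, new_source):
    """True if both versions contain identical code chunks."""
    return split_rmd(old_source)[1] == split_rmd(new_source)[1]


def prose_changes(old_source, new_source):
    """Return the added ('+ ') and removed ('- ') prose lines."""
    old_prose, _ = split_rmd(old_source)
    new_prose, _ = split_rmd(new_source)
    diff = difflib.ndiff(old_prose, new_prose)
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

If `only_text_changed` returns True, the lines from `prose_changes` are the only edits that would need to be carried into the rendered HTML; if it returns False, a full re-knit is the safer choice.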
I have a classic ASP web app that outputs reports to Excel, but it's really just html.
Some reports output with multiple groups and each group can span multiple pages (vertically). I'm aware of the "Page Titles" ability of Excel to print a specified row (or rows) on every page, however, I need the title of each group to also display in the title. Otherwise the title of the first group gets displayed as the title of every group.
I saw on Google Groups that someone suggested putting each group on a separate worksheet; however, I don't think I can output multiple worksheets easily, or at all, using HTML alone.
I'm looking for a quick and dirty solution as I don't have much time to devote to maintaining this crufty old app.
This is a bit late as answers go but I think I have found a solution. What you can do is open Excel, manually mock up what you want, then save it as a webpage. Open the generated file(s) up in a simple text editor and examine the generated HTML/XML. I did this for a workbook with multiple worksheets and it appears to work.
You can do the same with the multiple groups since that seems like the solution you really want, the process is the same. But the multiple worksheets option will work as well. Here are the interesting bits of what Excel generated for me (from Book.htm, not the Sheet files) when I saved a simple 2 sheet workbook with 'abc' on the first page and 'def' on the second:
<script language="JavaScript">
var c_lTabs=2;
var c_rgszSh=new Array(c_lTabs);
c_rgszSh[0] = "Sheet1";
c_rgszSh[1] = "Sheet2";
------
<xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>Sheet1</x:Name>
<x:WorksheetSource HRef="Book1_files/sheet001.htm"/>
</x:ExcelWorksheet>
<x:ExcelWorksheet>
<x:Name>Sheet2</x:Name>
<x:WorksheetSource HRef="Book1_files/sheet002.htm"/>
</x:ExcelWorksheet>
</x:ExcelWorksheets>
<x:Stylesheet HRef="Book1_files/stylesheet.css"/>
<x:WindowHeight>13065</x:WindowHeight>
<x:WindowWidth>15315</x:WindowWidth>
<x:WindowTopX>360</x:WindowTopX>
<x:WindowTopY>75</x:WindowTopY>
<x:ProtectStructure>False</x:ProtectStructure>
<x:ProtectWindows>False</x:ProtectWindows>
</x:ExcelWorkbook>
</xml><![endif]-->
</head>
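The worksheet list in the excerpt above is regular enough to generate from a list of group names. The sketch below is in Python rather than classic ASP and only reproduces the `x:ExcelWorksheets` elements seen in the excerpt; `worksheets_xml` and its `files_dir` parameter are hypothetical, the per-sheet .htm files would still have to be written separately, and whether Excel accepts a hand-generated set has not been tested here.

```python
from xml.sax.saxutils import escape


def worksheets_xml(sheet_names, files_dir="Book1_files"):
    """Build an <x:ExcelWorksheets> fragment with one entry per sheet name,
    following the sheet001.htm, sheet002.htm naming seen in the excerpt."""
    parts = ["<x:ExcelWorksheets>"]
    for index, name in enumerate(sheet_names, start=1):
        parts.append(" <x:ExcelWorksheet>")
        # Escape &, < and > so that group titles cannot break the XML.
        parts.append(f"  <x:Name>{escape(name)}</x:Name>")
        parts.append(
            f'  <x:WorksheetSource HRef="{files_dir}/sheet{index:03d}.htm"/>'
        )
        parts.append(" </x:ExcelWorksheet>")
    parts.append("</x:ExcelWorksheets>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(worksheets_xml(["Group A", "Group B"]))
```

The same loop would also need to emit the matching `c_rgszSh` entries in the JavaScript block so that the tab names stay in step with the worksheet list.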