Read MS Word file page by page with header and footer using OpenXml and iTextSharp

I'm new to OpenXml and recently started developing with it in ASP.NET (C#/VB).
I want to read an MS Word file page by page, including its header and footer, using OpenXml, and then write those pages to a PDF file using iTextSharp.
Code for reference:
Dim fileName As String = "D:\With_Header.docx"
Dim body As Body
Dim contentOfFile As String = String.Empty
Dim pdfDoc As New iTextSharp.text.Document(iTextSharp.text.PageSize.A4, 25, 25, 30, 30)
Dim pdfWriter As PdfWriter = pdfWriter.GetInstance(pdfDoc, fs)
Using strm As Stream = File.Open(fileName, FileMode.Open)
    Using doc As WordprocessingDocument = WordprocessingDocument.Open(strm, False)
        body = doc.MainDocumentPart.Document.Body
        contentOfFile = body.InnerText
        pdfDoc.Add(New iTextSharp.text.Paragraph(contentOfFile))
        pdfDoc.Close()
        pdfWriter.Close()
    End Using
End Using

Related

Illegal file path

I have an aspx page that takes its parameters from the query string. I want to open that page, read its contents, and export them as a PDF. I am using the following code, but it does not seem to work:
Dim strpath As String = Server.MapPath("pagename.aspx?id=0000")
Dim sr As StreamReader = New StreamReader(strpath, False)
Dim line As String
line = sr.ReadToEnd
sr.Close()
' Code to convert to pdf
'Dim doc As New Document(PageSize.LETTER, 80, 50, 30, 65)
Dim fsNew As New StringReader(line)
Dim doc As New Document(PageSize.A4, 80, 50, 30, 65)
Dim Styles As New StyleSheet()
'stryle.LoadTagStyle("ol", "16,0")
Using fs As New FileStream("newpdf.pdf", FileMode.Create)
    PdfWriter.GetInstance(doc, fs)
    Using stringReader As New StringReader(line)
        Dim parsedList = HTMLWorker.ParseToList(stringReader, Styles)
        doc.Open()
        ' parse each html object and add it to the pdf document
        For Each item As Object In parsedList
            doc.Add(DirectCast(item, IElement))
        Next
        doc.Close()
    End Using
End Using

Merging Jpg file to Pdf Stream

Here is my issue: I'm trying to read JPG files from a folder and combine them into one PDF file. For example, if my folder contains 1) Hello.jpg and 2) World.jpg, I want to grab those files and combine them into a single PDF, so the result will be
newPDF.pdf
I'm reading the images from the folder correctly and adding them to the document, but the new PDF file is not being created in the folder. How can I solve this?
Here is my code:
'!= Originally, after setting all the files in the folder, we need to read the path from the session file.
'!= After reading the path we need to read each file from the folder and generate one pdf file.
Dim attachmentsFolder As String = "E:/IRAttachments/PSC/2013/2/IR-7264"
Dim fileName As String = String.Concat("IR_7264(", DateTime.Now.ToString("yyyyMMddHHmmssfff").ToString(), ").pdf")
Dim finalPathName As String = String.Concat(attachmentsFolder, "/", fileName)
'!= Step 2). read the pdf/images from folder and merge them to a one pdf file.
Dim files As New List(Of String)()
Dim readerList As New List(Of PdfReader)()
m_HashTableIRAttachments = New Hashtable
m_DictionaryEntryIRAttachments = New DictionaryEntry
Dim fileExtentionType As String = String.Empty
Dim doc As Document = New Document
For Each filePath As String In Directory.GetFiles(attachmentsFolder)
    fileExtentionType = filePath.Substring(filePath.LastIndexOf("."))
    If fileExtentionType = ".jpg" Then '# Get the extension type
        Dim document As New Document()
        Using stream = New FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None)
            PdfWriter.GetInstance(document, stream)
            document.Open()
            Using imageStream = New FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
                Dim image__1 = Image.GetInstance(imageStream)
                document.Add(image__1)
                Dim pdfFile As String = finalPathName
            End Using
            document.Close()
        End Using
        'PdfWriter.GetInstance(doc, New FileStream(Request.PhysicalApplicationPath + fileName, FileMode.Create))
        'doc.Open()
        'doc.Add(New Paragraph("Hello World"))
        'Dim myDoc As New Document(PageSize.A4, 10.0F, 10.0F, 100.0F, 0.0F)
        'Dim pdfFile As String = finalPathName
        'Dim writer As PdfWriter = PdfWriter.GetInstance(myDoc, New FileStream(pdfFile, FileMode.Create))
        'myDoc.Open()
        'Dim para As New Paragraph("Let's write some text before inserting image.")
        'Dim myImage As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(filePath)
        'myImage.ScaleToFit(300.0F, 250.0F)
        'myImage.SpacingBefore = 50.0F
        'myImage.SpacingAfter = 10.0F
        'myImage.Alignment = Element.ALIGN_CENTER
        'myDoc.Add(para)
        'myDoc.Add(myImage)
        'myDoc.Close()
        'doc.Close()
    Else
        '# Means it's a pdf and not a jpg file.
        Dim pdfReader1 As New PdfReader(filePath)
        readerList.Add(pdfReader1)
    End If
Next

When you create the Stream for your PDF file, you are using the fileName variable, which is only the name, not the full path. It is likely that the PDF is being created - just not where you are expecting it to be. You probably want to use finalPathName instead:
Using stream = New FileStream(finalPathName, FileMode.Create, FileAccess.Write, FileShare.None)
I would also recommend you take a look at the methods available on the System.IO.Path class, and use them when constructing file paths and getting the file extension, e.g.
Dim finalPathName As String = Path.Combine(attachmentsFolder, fileName)
'...
fileExtentionType = Path.GetExtension(filePath)
' etc.
EDIT
It looks like you are also overwriting the PDF file for each image file, while I would imagine you want all of the images in one PDF file. Your loop for the images should probably be inside the Using stream = ... block (e.g. between document.Open() and document.Close()).
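Putting both suggestions together, the loop might look roughly like the sketch below. This is only an outline under some assumptions: the module and method names (MergeImagesSketch, MergeJpgsToPdf) are made up for illustration, each image goes on its own page, and the PDF attachments handled by the Else branch in the question are left out.

```vb
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module MergeImagesSketch
    ' Hypothetical helper: writes every .jpg found in attachmentsFolder
    ' into ONE PDF at finalPathName (the full path, not just the file name).
    Sub MergeJpgsToPdf(attachmentsFolder As String, finalPathName As String)
        Dim document As New Document()
        Using stream As New FileStream(finalPathName, FileMode.Create, FileAccess.Write, FileShare.None)
            PdfWriter.GetInstance(document, stream)
            document.Open()
            ' The loop now sits between Open() and Close(), so every image
            ' lands in the same document instead of overwriting the file.
            For Each filePath As String In Directory.GetFiles(attachmentsFolder)
                If String.Equals(Path.GetExtension(filePath), ".jpg", StringComparison.OrdinalIgnoreCase) Then
                    Dim img As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(filePath)
                    document.NewPage() ' ignored while the current page is still empty
                    document.Add(img)
                End If
            Next
            ' Note: Close() throws if nothing was added (folder had no .jpg files).
            document.Close()
        End Using
    End Sub
End Module
```

It could then be called with the values from the question, e.g. MergeJpgsToPdf(attachmentsFolder, Path.Combine(attachmentsFolder, fileName)).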

itextsharp SetFields not setting

I have my template PDF on the network and am using ASP.NET to fill in the fields and then download the file.
The problem is that the downloaded file is just the blank template; none of the fields are filled in.
My code:
Dim doc As New Document(PageSize.A4.Rotate)
Dim ms As New MemoryStream()
Dim writer = PdfWriter.GetInstance(doc, ms)
writer.Open()
Dim PdfR As New PdfReader("http://192.168.0.221/template.pdf")
Dim PdfS As New PdfStamper(PdfR, ms)
Dim fields As AcroFields = PdfS.AcroFields
fields.SetField("s1", "00")
fields.SetField("pono", "100")
PdfS.FormFlattening = True
PdfS.Close()
PdfR.Close()
Dim r = System.Web.HttpContext.Current.Response
r.ContentType = "application/pdf"
r.AddHeader("Content-Disposition", String.Format("attachment;filename=Testing.pdf", "Testing"))
r.BinaryWrite(ms.ToArray)

If anyone else ever hits this issue:
1) If you don't mind your fields remaining editable, remove the FormFlattening line.
2) Otherwise, add this: fields.GenerateAppearances = True
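For option 2, a minimal sketch of how the filling code might look. The function name FillTemplate and the module name are invented for illustration. The sketch also leaves out the separate Document and PdfWriter from the question, on the assumption that PdfStamper alone is what writes the filled form to the stream.

```vb
Imports System.IO
Imports iTextSharp.text.pdf

Module FillFormSketch
    ' Hypothetical helper: fills two fields in the template, flattens
    ' the form, and returns the resulting PDF as a byte array.
    Function FillTemplate(templatePath As String) As Byte()
        Using ms As New MemoryStream()
            Dim reader As New PdfReader(templatePath)
            Dim stamper As New PdfStamper(reader, ms)
            Dim fields As AcroFields = stamper.AcroFields
            ' Ask iTextSharp to build the field appearances so the values
            ' are still visible once the form has been flattened.
            fields.GenerateAppearances = True
            fields.SetField("s1", "00")
            fields.SetField("pono", "100")
            stamper.FormFlattening = True
            stamper.Close()
            reader.Close()
            ' ToArray still works after the stamper has closed the stream.
            Return ms.ToArray()
        End Using
    End Function
End Module
```

The returned bytes can be passed to r.BinaryWrite in place of ms.ToArray in the question's code.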

Cannot get CSS to work in iTextSharp (5.4.3) when making pdf

I have a problem applying a CSS file to my PDF using the iTextSharp (5.4.3) generation library. Basically, the CSS is not being applied at all.
I have the following method in my VB.NET file:
Protected Sub btnPreview_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnPreview.Click
    Dim bytes As Byte()
    bytes = System.Text.Encoding.UTF8.GetBytes(letterRadEdit.Content)
    Dim tagProcessor As tool.xml.html.DefaultTagProcessorFactory()
    Using input As New MemoryStream(bytes, False)
        Dim ms As New MemoryStream()
        Dim document As New iTextSharp.text.Document(iTextSharp.text.PageSize.LETTER, 36, 36, 36, 36)
        Dim writer As PdfWriter = PdfWriter.GetInstance(document, ms)
        writer.CloseStream = False
        document.Open()
        Dim htmlContext As HtmlPipelineContext = New HtmlPipelineContext(Nothing)
        htmlContext.SetAcceptUnknown(True)
        htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory())
        Dim cssResolver As ICSSResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(False)
        cssResolver.AddCssFile(HttpContext.Current.Server.MapPath("/assets/css/pdf.css"), True)
        Dim pipeline As New CssResolverPipeline(cssResolver, New HtmlPipeline(htmlContext, New PdfWriterPipeline(document, writer)))
        Dim pdfworker As New XMLWorker(pipeline, True)
        Dim p As New XMLParser(True, pdfworker, New System.Text.UTF8Encoding)
        Try
            'p.AddListener(pdfworker)
            'p.Parse(input, Encoding.UTF8)
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, input, New FileStream(HttpContext.Current.Server.MapPath("~/assets/css/pdf.css"), FileMode.Open, FileAccess.Read))
        Catch
        Finally
            pdfworker.Close()
        End Try
        document.Close()
        ms.Position = 0
        Response.Buffer = True
        Response.Clear()
        Response.ContentType = "application/pdf"
        Response.AddHeader("content-disposition", "attachment; filename=preview.pdf")
        Response.BinaryWrite(ms.GetBuffer())
        Response.Flush()
    End Using
End Sub
The CSS file simply contains:
p{color:#e10000;margin-bottom:1.2em;}
(This is to test whether it's rendering correctly; all text should be red.)
My problem is that the following command
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, input, New FileStream(HttpContext.Current.Server.MapPath("~/assets/css/pdf.css"), FileMode.Open, FileAccess.Read))
correctly produces the PDF but doesn't apply the CSS to it. I know it's reading the CSS file because I got a permissions exception until I added the FileAccess.Read argument.
The method
p.Parse(input, Encoding.UTF8)
doesn't produce any PDF, just an 'Element not allowed' exception. This is because the HTML (coming from a RadEditor text box, Q3 2013) is old-style HTML, and the parser seems to have a problem with tables.

In my experience, iTextSharp handles designs that rely on CSS, images, and the like poorly. I'd suggest using wkhtmltopdf instead.

Well, it appears the CSS was being applied correctly after all. I tested by adding this rule
td{
border:1px solid red;
padding:0.4em;
margin:0;
}
to the stylesheet, and all the table cells got red borders in the PDF. So it seems that the PDF generation overrides or ignores certain styles. I'm not sure why.

asp.net openxml open docx, change content and stream to user

My code is below. I'm trying to open a Word document with Open XML and replace certain text. The document must then be sent to the client, who can save it to their PC or open it. A document is sent to the client, but it is blank. When I save the in-memory document, I get an error saying the file cannot be opened because it must contain at least one root element. I'm using Visual Studio 2010 Express. What is wrong with my code?
Dim fileName As String = "directory on server\doc.docx"
Dim myDocument As WordprocessingDocument = WordprocessingDocument.Open(fileName, True)
Dim docText As String = Nothing
Dim sr As StreamReader = New StreamReader(myDocument.MainDocumentPart.GetStream)
docText = sr.ReadToEnd
sr.Close()
Dim regexText As Regex = New Regex("XXXCourtXXX")
docText = regexText.Replace(docText, "JOHANNESBURG")
Dim ms As New MemoryStream()
Dim sw As StreamWriter = New StreamWriter(ms)
sw.Write(docText)
myDocument.MainDocumentPart.FeedData(ms)
Dim mem = New MemoryStream()
myDocument.MainDocumentPart.GetStream().CopyTo(Response.OutputStream)
Response.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
Response.AppendHeader("Content-Disposition", "attachment;filename=Notice.docx")
mem.Position = 0
mem.CopyTo(Response.OutputStream)
Response.Flush()
Response.End()

You're declaring a new memory stream, mem, writing nothing to it, and then copying it to the output stream. Remove all lines that reference your mem variable.
