I have generated a PDF report using MigraDoc. The initial code is as follows:
MigraDoc.DocumentObjectModel.Document document = new MigraDoc.DocumentObjectModel.Document();
MigraDoc.DocumentObjectModel.Section section = document.AddSection();
...
Paragraph paragraph = section.Headers.Primary.AddParagraph();
....
table = section.AddTable();
...
paragraph = section.Footers.Primary.AddParagraph();
...
The PDF was rendered successfully. Now I want to add some graphics to the pages of this document. I have gone through several articles on this and found that everyone uses the PdfDocument class instead of MigraDoc.DocumentObjectModel.Document. Is it possible to draw graphics on the pages of a MigraDoc.DocumentObjectModel.Document using XGraphics? If not, what is the best way to mix PdfDocument with MigraDoc.DocumentObjectModel.Document to accomplish the same?
MigraDoc uses PDFsharp and an XGraphics object to create the PDF pages.
There are several ways to add content to pages created by MigraDoc.
This MigraDoc sample shows some options:
http://pdfsharp.net/wiki/MixMigraDocAndPdfSharp-sample.ashx
You can even call MigraDoc to use "your" XGraphics object for drawing:
// Alternative rendering with progress indicator.
// Set a callback for phase 1.
pdfRenderer.DocumentRenderer.PrepareDocumentProgress += PrepareDocumentProgress;
// Now start phase 1: Preparing pages (i.e. calculate the layout).
pdfRenderer.PrepareRenderPages();
// Now phase 2: create the PDF pages.
Console.WriteLine("\r\nRendering document ...");
int pages = pdfRenderer.DocumentRenderer.FormattedDocument.PageCount;
for (int i = 1; i <= pages; ++i)
{
    var page = pdfRenderer.PdfDocument.AddPage();
    Console.Write("\rRendering page " + i + "/" + pages);
    PageInfo pageInfo = pdfRenderer.DocumentRenderer.FormattedDocument.GetPageInfo(i);
    page.Width = pageInfo.Width;
    page.Height = pageInfo.Height;
    page.Orientation = pageInfo.Orientation;
    using (XGraphics gfx = XGraphics.FromPdfPage(page))
    {
        gfx.MUH = pdfRenderer.Unicode ? PdfFontEncoding.Unicode : PdfFontEncoding.WinAnsi;
        gfx.MFEH = pdfRenderer.FontEmbedding;
        pdfRenderer.DocumentRenderer.RenderPage(gfx, i);
    }
}
Console.WriteLine("\r\nSaving document ...");
Sample code taken from this post:
http://forum.pdfsharp.net/viewtopic.php?p=9293#p9293
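A simpler variant of the same idea can be sketched as follows: let MigraDoc render the whole document first, then open an XGraphics object on each finished page and draw on top of it. This is only a sketch, assuming the PDFsharp/MigraDoc 1.5x API (`PdfDocumentRenderer(bool)`, `XGraphics.FromPdfPage` with `XGraphicsPdfPageOptions.Append`); the class name, method name and the red frame are made up for illustration.

```csharp
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;
using PdfSharp.Drawing;
using PdfSharp.Pdf;

public static class OverlayExample
{
    // Hypothetical helper: renders a MigraDoc document and draws a frame on every page.
    public static void RenderWithOverlay(Document document, string outputPath)
    {
        // Assumption: 1.5x-style constructor; "true" selects Unicode encoding.
        var renderer = new PdfDocumentRenderer(true) { Document = document };
        renderer.RenderDocument();

        foreach (PdfPage page in renderer.PdfDocument.Pages)
        {
            // Append draws above the MigraDoc content; Prepend would draw below it.
            using (XGraphics gfx = XGraphics.FromPdfPage(page, XGraphicsPdfPageOptions.Append))
            {
                // Illustrative graphic only: a red frame 20 points inside the page edge.
                gfx.DrawRectangle(XPens.Red, 20, 20,
                    page.Width.Point - 40, page.Height.Point - 40);
            }
        }

        renderer.PdfDocument.Save(outputPath);
    }
}
```

With the `document` object from the question, the call would be `OverlayExample.RenderWithOverlay(document, "report.pdf")`, where the file name is a placeholder.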
Related
Is it possible to scale a page from e.g. A2 to A1 with PDFsharp?
I can set the size of the page via Size, Width and Height. But how can I scale the content of the page?
Based on the comment from Vive and the link provided there, here is an example for resizing to A4 with C#.
You must include:
using PdfSharp.Pdf;
using PdfSharp.Drawing;
using PdfSharp;
then:
// resize this file from A3 to A4
string filename = @"C:\temp\A3.pdf";
// Create the new output document (A4)
PdfDocument outputDocument = new PdfDocument();
outputDocument.PageLayout = PdfPageLayout.SinglePage;
XGraphics gfx;
XRect box;
// Open the file to resize
XPdfForm form = XPdfForm.FromFile(filename);
// Add a new page to the output document
PdfPage page = outputDocument.AddPage();
if (form.PixelWidth > form.PixelHeight)
    page.Orientation = PageOrientation.Landscape;
else
    page.Orientation = PageOrientation.Portrait;
double width = page.Width;
double height = page.Height;
gfx = XGraphics.FromPdfPage(page);
box = new XRect(0, 0, width, height);
gfx.DrawImage(form, box);
// Save the document...
string newfilename = @"c:\temp\resized.pdf";
outputDocument.Save(newfilename);
You can use DrawImage() to draw an existing PDF page on a new PDF page. You can specify the destination rectangle and thus you can scale the page as needed.
Use the XPdfForm class to access the existing PDF file.
See the Two Pages on One sample for details:
http://www.pdfsharp.net/wiki/TwoPagesOnOne-sample.ashx
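Drawing into the full page rectangle stretches the content whenever the source and target pages differ in aspect ratio. A minimal sketch of computing a centered destination rectangle that preserves the aspect ratio is shown below; `PageFit` and `Fit` are hypothetical names, and the code is plain arithmetic that does not touch PDFsharp itself.

```csharp
using System;

public static class PageFit
{
    // Returns the largest rectangle with the source's aspect ratio that fits
    // inside the target area, centered. All values use one unit (e.g. points).
    public static (double X, double Y, double Width, double Height) Fit(
        double sourceWidth, double sourceHeight,
        double targetWidth, double targetHeight)
    {
        if (sourceWidth <= 0 || sourceHeight <= 0)
            throw new ArgumentException("Source dimensions must be positive.");

        // The smaller of the two ratios is the scale at which both sides still fit.
        double scale = Math.Min(targetWidth / sourceWidth, targetHeight / sourceHeight);
        double width = sourceWidth * scale;
        double height = sourceHeight * scale;

        // Center the scaled content inside the target area.
        return ((targetWidth - width) / 2, (targetHeight - height) / 2, width, height);
    }
}
```

Assuming the form's size is read from `form.PointWidth` and `form.PointHeight`, the result `r` could be passed on as `gfx.DrawImage(form, new XRect(r.X, r.Y, r.Width, r.Height))`.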
I have a PDF file that has the following security properties: printing: allowed; document assembly: NOT allowed; content copy: allowed; content copy for accessibility: allowed; page extraction: NOT allowed.
I try to get the text with the sample code from the documentation, as follows:
pdftext.Text = null;
StringBuilder text = new StringBuilder();
PdfReader pdfReader = new PdfReader(filename);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
text.Append(System.Environment.NewLine);
text.Append("\n Page Number:" + page);
text.Append(System.Environment.NewLine);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
progressBar1.Value++;
}
pdftext.Text += text.ToString();
pdfReader.Close();
but the output consists of lines such as "??? ? ???????\n?? ??? ? ".
It seems that the file is encrypted or that we have an encoding problem.
Note the results of the following calls:
var f = pdfReader.IsOpenedWithFullPermissions; // false
var f1 = pdfReader.IsEncrypted(); // false
var f2 = pdfReader.ComputeUserPassword(); // null
var f3 = pdfReader.Is128Key(); // false
var f4 = pdfReader.HasUsageRights(); // false
f, f1, f3 and f4 all return false, so the document does not seem to be encrypted,
and I don't know whether this is an encoding problem or something related to encrypted strings.
Can someone help me?
Thanks in advance.
G.G.
Whenever you have trouble extracting text from a document using standard code, the first thing to do is to try to copy&paste the text from it using Adobe Acrobat Reader. Adobe Reader copy&paste implements text extraction according to the recommendations of the PDF specification, and if this fails, it usually means that the information required for text extraction is either missing from the document or broken (by accident or by design). To extract the text, one either needs to customize the code for that specific PDF or resort to OCR.
In case of the document at hand, Adobe Reader copy&paste does result in garbage, too, just like when extracting with iText. Thus, there is something fishy in the document.
Inspecting the document one finds that the fonts contain ToUnicode mappings like this:
/CIDInit /ProcSet
findresource begin 12 dict begin begincmap /CIDSystemInfo<</Registry(Adobe)
/Ordering(Identity)
/Supplement 0
>>
def
/CMapName/F18 def
1 begincodespacerange <0000> <FFFF> endcodespacerange
44 beginbfrange
<20> <20> <0020>
<21> <21> <E0F9>
<22> <22> <E0F1>
<23> <23> <E0FA>
<24> <24> <E0F7>
<25> <25> <E0A3>
<26> <26> <E084>
<27> <27> <E097>
<28> <28> <E098>
<29> <29> <E09A>
<2A> <2A> <E08A>
<2B> <2B> <E099>
<2C> <2C> <E0A5>
<2D> <2D> <E086>
<2E> <2E> <E094>
<2F> <2F> <E0DE>
<30> <30> <E0A6>
<31> <31> <E096>
<32> <32> <E088>
<33> <33> <E082>
<34> <34> <E04C>
<35> <35> <E0A4>
<36> <36> <E0F6>
<37> <37> <E0F2>
<38> <38> <E0D8>
<39> <39> <E0AA>
<3A> <3A> <E06C>
<3B> <3B> <E087>
<3C> <3C> <E095>
<3D> <3D> <E0C4>
<3E> <3E> <E07E>
<3F> <3F> <E055>
<40> <40> <E089>
<41> <41> <E085>
<42> <42> <E083>
<43> <43> <E070>
<44> <44> <E0E6>
<45> <45> <E080>
<46> <46> <E0C8>
<47> <47> <E0F4>
<48> <48> <E062>
<49> <49> <E0F3>
<4A> <4A> <E04E>
<4B> <4B> <E05E>
endbfrange
endcmap CMapName currentdict /CMap defineresource pop end end
I.e., if you are not into this, the fonts claim that all their glyphs (with the exception of the space glyph at 0x20) represent characters U+E0xx from the Unicode private use area. As the name of that area indicates, there is no common meaning of characters with these values.
Thus, text extraction according to the PDF specification will return strings of characters with undefined meaning with results as you observed in iText or I saw in Adobe Reader.
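A quick way to spot this situation in extracted text is to check how many of its characters fall into the Basic Multilingual Plane private use area (U+E000 to U+F8FF). The following is a small sketch of such a check; `PuaCheck` and its members are hypothetical names, not part of iText.

```csharp
public static class PuaCheck
{
    // True for characters in the BMP private use area, U+E000..U+F8FF.
    public static bool IsPrivateUse(char c) => c >= '\uE000' && c <= '\uF8FF';

    // Fraction of non-whitespace characters that are private use characters.
    // Returns 0 when the text has no non-whitespace characters.
    public static double PrivateUseRatio(string text)
    {
        int total = 0, privateUse = 0;
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c)) continue;
            total++;
            if (IsPrivateUse(c)) privateUse++;
        }
        return total == 0 ? 0.0 : (double)privateUse / total;
    }
}
```

A ratio close to 1 for a page's extracted text suggests that the fonts on that page use a ToUnicode map like the one above.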
Sometimes in such a situation one can still enforce proper text extraction by ignoring the ToUnicode map and using either the font Encoding or information inside the embedded font program.
Unfortunately it turns out that here the Encoding effectively contains the same information as does the ToUnicode map, e.g. for the same font as above
/Differences [ 32 /space /uniE0F9 /uniE0F1 /uniE0FA /uniE0F7 /uniE0A3 /uniE084 /uniE097 /uniE098
/uniE09A /uniE08A /uniE099 /uniE0A5 /uniE086 /uniE094 /uniE0DE /uniE0A6 /uniE096
/uniE088 /uniE082 /uniE04C /uniE0A4 /uniE0F6 /uniE0F2 /uniE0D8 /uniE0AA /uniE06C
/uniE087 /uniE095 /uniE0C4 /uniE07E /uniE055 /uniE089 /uniE085 /uniE083 /uniE070
/uniE0E6 /uniE080 /uniE0C8 /uniE0F4 /uniE062 /uniE0F3 /uniE04E /uniE05E ]
and the fonts turn out to be Type3 fonts, i.e. there is no embedded font program but each glyph is defined as an individual PDF canvas without further character information.
Thus, nothing to gain here either.
Actually these small PDF canvasses contain inlined bitmap graphics of the respective glyph which also is the cause of the poor graphical quality of the document (if you don't see that immediately, simply zoom in a bit and you'll see the ragged outlines of the glyphs).
By the way, such a construct usually means that the producer of the PDF explicitly wants to prevent text extraction.
If you happen to have to extract text from many such documents, you can try and determine a mapping from their U+E0xx characters to actually sensible Unicode characters and apply that mapping to your extracted text.
If all those fonts in all those documents happen to use the same U+E0xx codepoints for the same actual characters, you'll be able to do text extraction from those documents after investing a certain amount of initial work.
Otherwise do try OCR.
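Once such a mapping has been determined, applying it to the extracted text is straightforward. The sketch below assumes a single mapping is valid for all fonts involved; `PuaRemapper` is a hypothetical name, and the mapping entries must come from your own inspection of the glyphs, they cannot be derived from the PDF.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PuaRemapper
{
    // Replaces every character that has an entry in the map and leaves all
    // other characters (e.g. the space at U+0020) untouched.
    public static string Remap(string extracted, IReadOnlyDictionary<char, char> map)
    {
        var result = new StringBuilder(extracted.Length);
        foreach (char c in extracted)
        {
            result.Append(map.TryGetValue(c, out char real) ? real : c);
        }
        return result.ToString();
    }
}
```

If different fonts use the same U+E0xx code point for different characters, one dictionary per font would be needed, and the text would have to be extracted font by font.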
The following code adds pages to a document which map the ToUnicode values to the characters shown:
void AddFontsTo(PdfReader reader, PdfStamper stamper)
{
int documentPages = reader.NumberOfPages;
for (int page = 1; page <= documentPages; page++)
{
// ignore inherited resources for now
PdfDictionary pageResources = reader.GetPageResources(page);
if (pageResources == null)
continue;
PdfDictionary pageFonts = pageResources.GetAsDict(PdfName.FONT);
if (pageFonts == null || pageFonts.Size == 0)
continue;
List<BaseFont> fonts = new List<BaseFont>();
List<string> fontNames = new List<string>();
HashSet<char> chars = new HashSet<char>();
foreach (PdfName key in pageFonts.Keys)
{
PdfIndirectReference fontReference = pageFonts.GetAsIndirectObject(key);
if (fontReference == null)
continue;
DocumentFont font = (DocumentFont) BaseFont.CreateFont((PRIndirectReference)fontReference);
if (font == null)
continue;
PdfObject toUni = PdfReader.GetPdfObjectRelease(font.FontDictionary.Get(PdfName.TOUNICODE));
CMapToUnicode toUnicodeCmap = null;
if (toUni is PRStream)
{
try
{
byte[] touni = PdfReader.GetStreamBytes((PRStream)toUni);
CidLocationFromByte lb = new CidLocationFromByte(touni);
toUnicodeCmap = new CMapToUnicode();
CMapParserEx.ParseCid("", toUnicodeCmap, lb);
}
catch
{
toUnicodeCmap = null;
}
}
if (toUnicodeCmap == null)
continue;
ICollection<int> mapValues = toUnicodeCmap.CreateDirectMapping().Values;
if (mapValues.Count == 0)
continue;
fonts.Add(font);
fontNames.Add(key.ToString());
foreach (int value in mapValues)
chars.Add((char)value);
}
if (fonts.Count == 0 || chars.Count == 0)
continue;
Rectangle size = (fonts.Count > 10) ? PageSize.A4.Rotate() : PageSize.A4;
PdfPTable table = new PdfPTable(fonts.Count + 1);
table.AddCell("Page " + page);
foreach (String name in fontNames)
{
table.AddCell(name);
}
table.HeaderRows = 1;
float[] widths = new float[fonts.Count + 1];
widths[0] = 2;
for (int i = 1; i <= fonts.Count; i++)
widths[i] = 1;
table.SetWidths(widths);
table.WidthPercentage = 100;
List<char> charList = new List<char>(chars);
charList.Sort();
foreach (char character in charList)
{
table.AddCell(((int)character).ToString("X4"));
foreach (BaseFont font in fonts)
{
table.AddCell(new PdfPCell(new Phrase(character.ToString(), new Font(font))));
}
}
stamper.InsertPage(reader.NumberOfPages + 1, size);
ColumnText columnText = new ColumnText(stamper.GetUnderContent(reader.NumberOfPages));
columnText.AddElement(table);
columnText.SetSimpleColumn(size);
while ((ColumnText.NO_MORE_TEXT & columnText.Go(false)) == 0)
{
stamper.InsertPage(reader.NumberOfPages + 1, size);
columnText.Canvas = stamper.GetUnderContent(reader.NumberOfPages);
columnText.SetSimpleColumn(size);
}
}
}
I applied it to your document like this:
string input = @"4700198773.pdf";
string output = @"4700198773-fonts.pdf";
using (PdfReader reader = new PdfReader(input))
using (FileStream stream = new FileStream(output, FileMode.Create, FileAccess.Write))
using (PdfStamper stamper = new PdfStamper(reader, stream))
{
AddFontsTo(reader, stamper);
}
The additional pages look like this:
Now you have to compare the outputs for the different fonts and pages of this document with each other and with those of a representative selection of files. If you find a good enough pattern, you can try this replacement approach.
I need to repeat a particular paragraph that sits inside a section of a Word document.
The Word document is divided into sections, and each section contains paragraphs like the ones below:
#Section Start
1) TO RECEIVE AND ADOPT FINANCIAL STATEMENTS FOR THE YEAR ENDED [FYE]
a.That the Financial Statements of the Company for the financial year ended [FYE] together with the Director(s)' Report and Statement thereon be hereby received and adopted.
b. Second paragraph.
c. Third paragraph.
#Section End
I want to repeat point "a" 3 times.
I tried the code below:
// Copy all content including headers and footers from the specified
//pages into the destination document.
ArrayList pageSections = finder.RetrieveAllNodesOnPages(1, doc.Sections.Count, NodeType.Section);
System.Data.DataTable dt = GetDataTable(); //Sample DataTable which is having Keys and Values
int sectionCount = 0;
foreach (Section section in pageSections)
{
NodeCollection paragraphs = section.GetChildNodes(NodeType.Paragraph, true);
for (int i = 0; i < paragraphs.Count; i++)
{
string text = paragraphs[i].Range.Text;
}
}
Please help me with how to repeat a paragraph.
I am working as Social Media Developer at Aspose. Please use the following sample code to repeat a paragraph using Aspose.Words for .NET.
Document doc = new Document("document.docx");
PageNumberFinder finder = new PageNumberFinder(doc);
// Split nodes which are found across pages.
finder.SplitNodesAcrossPages(true);
// Copy all content including headers and footers from the specified pages into the
//destination document.
ArrayList pageSections = finder.RetrieveAllNodesOnPages(1, doc.Sections.Count, NodeType.Section);
//Sample DataTable which is having Keys and Values
System.Data.DataTable dt = GetDataTable();
int sectionCount = 0;
foreach (Section section in pageSections)
{
NodeCollection paragraphs = section.GetChildNodes(NodeType.Paragraph, true);
for (int i = 0; i < paragraphs.Count; i++)
{
//Paragraph you want to copy
if (i == 10)
{
//Use Document Builder to Navigate to the paragraph
DocumentBuilder builder = new DocumentBuilder(doc);
builder.MoveTo(paragraphs[i]);
//Insert a Paragraph break
builder.InsertParagraph();
//Insert the Paragraph to repeat it
builder.Writeln(paragraphs[i].ToString(SaveFormat.Text));
}
}
}
doc.Save("test.docx");
Is there a library or code somewhere that does that?
Some questions, such as "Convert a PDF to a Transparent PNG with GhostScript", suggest standalone software.
I need something that's done by program. So my site, which is an asp site, should have a function
function PNGfromPDF (someFile as String) as PNGSomething
end function
Something like that.
Any open source solution for that?
Try:
PdfDocument inputDocument = PdfReader.Open(fileNames[i], PdfDocumentOpenMode.Import);
// for each page create a new PDF file and save it on the disk
for (int pageCount = 0; pageCount < inputDocument.PageCount; pageCount++)
{
fileNameWithoutExtension = Path.GetFileNameWithoutExtension(fileNames[i]);
fileName = string.Format("{0}\\Documents\\{1}", Session.CentralWorkingDirectory, String.Format("{0} ({1}-{2}).pdf", fileNameWithoutExtension, pageCount + 1, inputDocument.PageCount));
pdfFile = PDFFile.Open(fileName);
pdfFile.SerialNumber = Configurations.PDFVIEW_KEY;
// Get image file name
string imageFileName = string.Format("{0}.png", fileName.Remove(fileName.Length - 4));
// If thumbnail already exists delete it
if (File.Exists(imageFileName))
{
File.Delete(imageFileName);
}
// Convert page to PNG and save it.
//Bitmap pageImage = pdfFile.GetPageImage(0, 32);
Bitmap pageImage = pdfFile.GetPageImage(0, 92);
pageImage.Save(imageFileName, ImageFormat.Png);
// Cleanup resources
pageImage.Dispose();
pdfFile.Dispose();
}
Here I am using the namespaces below:
using PdfSharp.Drawing;
using O2S.Components.PDFRender4NET; // Third-party component, used together with PDFsharp
using System.Drawing.Imaging;
I am using the PngCs DLL to fetch the chunk data of a PNG image file in ASP.NET. I am able to do that, but now I want to update the chunk data for that PNG.
I used PngWriter, but it creates a whole new file without inheriting the chunk data.
PngReader pngr = FileHelper.CreatePngReader(path);
pngr.GetMetadata().GetTxtForKey(PngChunkITXT.KEY_Title);
Response.Write(pngr.GetMetadata().GetTxtForKey(PngChunkITXT.KEY_Title));
The code below writes a new PNG image through PngWriter. I want to embed a new iTXt chunk while creating the new file.
PngReader pngr = FileHelper.CreatePngReader(origFilename); // or you can use the constructor
PngWriter pngw = FileHelper.CreatePngWriter(destFilename, pngr.ImgInfo, true); // idem
Console.WriteLine(pngr.ToString()); // just information
int chunkBehav = ChunkCopyBehaviour.COPY_ALL_SAFE; // tell to copy all 'safe' chunks
pngw.CopyChunksFirst(pngr, chunkBehav); // copy some metadata from reader
for (int row = 0; row < pngr.ImgInfo.Rows; row++)
{
ImageLine l1 = pngr.ReadRowInt(row); // format: RGBRGB... or RGBARGBA...
pngw.WriteRow(l1, row);
}
pngw.CopyChunksLast(pngr, chunkBehav); // metadata after the image pixels? can happen
pngw.End(); // dont forget this
pngr.End();
Try this
PngReader pngr = FileHelper.CreatePngReader(origFilename);
PngWriter pngw = FileHelper.CreatePngWriter(destFilename, pngr.ImgInfo, true);
pngw.CopyChunksFirst(pngr, ChunkCopyBehaviour.COPY_ALL);
pngw.GetMetadata().SetText(myKey, myText,false,false); // provide your own data
for (int row = 0; row < pngr.ImgInfo.Rows; row++) {
ImageLine l1 = pngr.ReadRowInt(row);
pngw.WriteRow(l1, row);
}
pngw.CopyChunksLast(pngr, ChunkCopyBehaviour.COPY_ALL);
pngw.End(); // dont forget this
pngr.End();
The problem has been solved by using CsXMpToolKit.dll, which is the best option to fetch the metadata from any type of file.