Is there a library or code somewhere that does that?
Some questions suggest software like Convert a PDF to a Transparent PNG with GhostScript
I need something that's done by program. So my site, which is an asp site, should have a function
function PNGfromPDF (someFile as String) as PNGSomething
end function
Something like that.
Any open source solution for that?
Try:
PdfDocument inputDocument = PdfReader.Open(fileNames[i], PdfDocumentOpenMode.Import);
// for each page create a new PDF file and save it on the disk
for (int pageCount = 0; pageCount < inputDocument.PageCount; pageCount++)
{
fileNameWithoutExtension = Path.GetFileNameWithoutExtension(fileNames[i]);
fileName = string.Format("{0}\\Documents\\{1}", Session.CentralWorkingDirectory, String.Format("{0} ({1}-{2}).pdf", fileNameWithoutExtension, pageCount + 1, inputDocument.PageCount));
pdfFile = PDFFile.Open(fileName);
pdfFile.SerialNumber = Configurations.PDFVIEW_KEY;
// Get image file name
string imageFileName = string.Format("{0}.png", fileName.Remove(fileName.Length - 4));
// If thumbnail already exists delete it
if (File.Exists(imageFileName))
{
File.Delete(imageFileName);
}
// Convert page to PNG and save it.
//Bitmap pageImage = pdfFile.GetPageImage(0, 32);
Bitmap pageImage = pdfFile.GetPageImage(0, 92);
pageImage.Save(imageFileName, ImageFormat.Png);
// Cleanup resources
pageImage.Dispose();
pdfFile.Dispose();
}
Here I am using below namespace...
using PdfSharp.Drawing;
using O2S.Components.PDFRender4NET; // Thrid party components so you use PDF sharp with this componets
using System.Drawing.Imaging;
Related
I need to create thumbnail starting from a WebP image but ImageIO doesn't support this format. Are there any library that allow me to do something like this ?
String format = getImageFormat(imageFile);
Iterator readers = ImageIO.getImageReadersByFormatName(format);
// rescaling the image
BufferedImage bi = loadImageRescalingIfNeeded(imageFile, metadata,...);
//resample if needed
bi = resampleImageIfNeeded(bi, thumbWidth, thumbHeight);
// rotate if degree > 0
bi = rotateBufferedImage(bi, degree);
// create jpeg in output
try (ImageOutputStream imageOut = ImageIO.createImageOutputStream(fileOutputStream)){
try {
ImageWriter writer = ImageIO.getImageWritersBySuffix("jpeg").next();
ImageWriteParam iwp = writer.getDefaultWriteParam();
iwp.setProgressiveMode(ImageWriteParam.MODE_DEFAULT);
writer.setOutput(imageOut);
writer.write(null, new IIOImage(bi, null, metadata), iwp);
} catch (Exception e) {...}
....}
I have generated some PDF report using MigraDoc. Initial code is as follows:-
MigraDoc.DocumentObjectModel.Document document = new MigraDoc.DocumentObjectModel.Document();
MigraDoc.DocumentObjectModel.Section section = document.AddSection();
...
Paragraph paragraph = section.Headers.Primary.AddParagraph();
....
table = section.AddTable();
...
paragraph = section.Footers.Primary.AddParagraph();
...
The PDF was rendered successfully. Now I want to add some graphics into the pages of this document. I have gone through several articles for that and found that everyone using PdfDocument class instead of MigraDoc.DocumentObjectModel.Document. Is it possible to apply graphics into pages of a document of type MigraDoc.DocumentObjectModel.Document using XGraphics? If it is not possible, what is the best way to mix PdfDocument with MigraDoc.DocumentObjectModel.Document to accomplish the same?
MigraDoc uses PDFsharp and an XGraphics object to create the PDF pages.
There are several ways to add content to pages created by MigraDoc.
This MigraDoc sample shows some options:
http://pdfsharp.net/wiki/MixMigraDocAndPdfSharp-sample.ashx
You can even call MigraDoc to use "your" XGraphics object for drawing:
// Alternative rendering with progress indicator.
// Set a callback for phase 1.
pdfRenderer.DocumentRenderer.PrepareDocumentProgress += PrepareDocumentProgress;
// Now start phase 1: Preparing pages (i.e. calculate the layout).
pdfRenderer.PrepareRenderPages();
// Now phase 2: create the PDF pages.
Console.WriteLine("\r\nRendering document ...");
int pages = pdfRenderer.DocumentRenderer.FormattedDocument.PageCount;
for (int i = 1; i <= pages; ++i)
{
var page = pdfRenderer.PdfDocument.AddPage();
Console.Write("\rRendering page " + i + "/" + pages);
PageInfo pageInfo = pdfRenderer.DocumentRenderer.FormattedDocument.GetPageInfo(i);
page.Width = pageInfo.Width;
page.Height = pageInfo.Height;
page.Orientation = pageInfo.Orientation;
using (XGraphics gfx = XGraphics.FromPdfPage(page))
{
gfx.MUH = pdfRenderer.Unicode ? PdfFontEncoding.Unicode : PdfFontEncoding.WinAnsi;
gfx.MFEH = pdfRenderer.FontEmbedding;
pdfRenderer.DocumentRenderer.RenderPage(gfx, i);
}
}
Console.WriteLine("\r\nSaving document ...");
Sample code taken from this post:
http://forum.pdfsharp.net/viewtopic.php?p=9293#p9293
I have a pdf file that as the follow security properties: printing: allowed; document assembly: NOT allowed; content copy: allowed; content copy for accessibility: allowed; page extraction:NOT allowed;
I try to get text with sample code as documentation sample as follow:
pdftext.Text = null;
StringBuilder text = new StringBuilder();
PdfReader pdfReader = new PdfReader(filename);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
text.Append(System.Environment.NewLine);
text.Append("\n Page Number:" + page);
text.Append(System.Environment.NewLine);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
progressBar1.Value++;
}
pdftext.Text += text.ToString();
pdfReader.Close();
but the output text is lines with ""??? ? ???????\n?? ??? ? " values;
seems that file is crypted or we have a encoding problem...
note that in the follow lines
var f = pdfReader.IsOpenedWithFullPermissions; -> FALSE
var f1 = pdfReader.IsEncrypted(); - > FALSE
var f2 = pdfReader.ComputeUserPassword(); - > NULL
var f3 = pdfReader.Is128Key(); - > FALSE
var f4 = pdfReader.HasUsageRights();
f, f1, f3, f4 return FALSE ...than seems that the document is not crypted,
...so I don't know if is a Encoding problem or question related to encrypet strings...
Someone can help me?
thanks in advance.
G.G.
Whenever you have trouble extracting text from a document using standard code, the first thing to do is try and copy&paste the text from it using Adobe Acrobat Reader. Adobe Reader copy&paste implements text extraction according to the recommendations of the PDF specification, and if this fails, this usually means that the necessary information required for text extraction in the document are either missing or broken (by accident or by design). To extract the text, one either needs to customize the code specifically to the specific PDF or resort to OCR.
In case of the document at hand, Adobe Reader copy&paste does result in garbage, too, just like when extracting with iText. Thus, there is something fishy in the document.
Inspecting the document one finds that the fonts contain ToUnicode mappings like this:
/CIDInit /ProcSet
findresource begin 12 dict begin begincmap /CIDSystemInfo<</Registry(Adobe)
/Ordering(Identity)
/Supplement 0
>>
def
/CMapName/F18 def
1 begincodespacerange <0000> <FFFF> endcodespacerange
44 beginbfrange
<20> <20> <0020>
<21> <21> <E0F9>
<22> <22> <E0F1>
<23> <23> <E0FA>
<24> <24> <E0F7>
<25> <25> <E0A3>
<26> <26> <E084>
<27> <27> <E097>
<28> <28> <E098>
<29> <29> <E09A>
<2A> <2A> <E08A>
<2B> <2B> <E099>
<2C> <2C> <E0A5>
<2D> <2D> <E086>
<2E> <2E> <E094>
<2F> <2F> <E0DE>
<30> <30> <E0A6>
<31> <31> <E096>
<32> <32> <E088>
<33> <33> <E082>
<34> <34> <E04C>
<35> <35> <E0A4>
<36> <36> <E0F6>
<37> <37> <E0F2>
<38> <38> <E0D8>
<39> <39> <E0AA>
<3A> <3A> <E06C>
<3B> <3B> <E087>
<3C> <3C> <E095>
<3D> <3D> <E0C4>
<3E> <3E> <E07E>
<3F> <3F> <E055>
<40> <40> <E089>
<41> <41> <E085>
<42> <42> <E083>
<43> <43> <E070>
<44> <44> <E0E6>
<45> <45> <E080>
<46> <46> <E0C8>
<47> <47> <E0F4>
<48> <48> <E062>
<49> <49> <E0F3>
<4A> <4A> <E04E>
<4B> <4B> <E05E>
endbfrange
endcmap CMapName currentdict /CMap defineresource pop end end
I.e., if you are not into this, the fonts claim that all their glyphs (with the exception of the space glyph at 0x20) represent characters U+E0xx from the Unicode private use area. As the name of that area indicates, there is no common meaning of characters with these values.
Thus, text extraction according to the PDF specification will return strings of characters with undefined meaning with results as you observed in iText or I saw in Adobe Reader.
Sometimes in such a situation one can still enforce proper text extraction by ignoring the ToUnicode map and using either the font Encoding or information inside the embedded font program.
Unfortunately it turns out that here the Encoding effectively contains the same information as does the ToUnicode map, e.g. for the same font as above
/Differences [ 32 /space /uniE0F9 /uniE0F1 /uniE0FA /uniE0F7 /uniE0A3 /uniE084 /uniE097 /uniE098
/uniE09A /uniE08A /uniE099 /uniE0A5 /uniE086 /uniE094 /uniE0DE /uniE0A6 /uniE096
/uniE088 /uniE082 /uniE04C /uniE0A4 /uniE0F6 /uniE0F2 /uniE0D8 /uniE0AA /uniE06C
/uniE087 /uniE095 /uniE0C4 /uniE07E /uniE055 /uniE089 /uniE085 /uniE083 /uniE070
/uniE0E6 /uniE080 /uniE0C8 /uniE0F4 /uniE062 /uniE0F3 /uniE04E /uniE05E ]
and the fonts turns out to be Type3 fonts, i.e. there is no embedded font program but each glyph is defined as an individual PDF canvas without further character information.
Thus, nothing to gain here either.
Actually these small PDF canvasses contain inlined bitmap graphics of the respective glyph which also is the cause of the poor graphical quality of the document (if you don't see that immediately, simply zoom in a bit and you'll see the ragged outlines of the glyphs).
By the way, such a construct usually means that the producer of the PDF explicitly wants to prevent text extraction.
If you happen to have to extract text from many such documents, you can try and determine a mapping from their U+E0xx characters to actually sensible Unicode characters and apply that mapping to your extracted text.
If all those fonts in all those documents happen to use the same U+E0xx codepoints for the same actual characters, you'll be able to do text extraction from those documents after investing a certain amount of initial work.
Otherwise do try OCR.
The following code adds pages to a document which map the ToUnicode values to the characters shown:
void AddFontsTo(PdfReader reader, PdfStamper stamper)
{
int documentPages = reader.NumberOfPages;
for (int page = 1; page <= documentPages; page++)
{
// ignore inherited resources for now
PdfDictionary pageResources = reader.GetPageResources(page);
if (pageResources == null)
continue;
PdfDictionary pageFonts = pageResources.GetAsDict(PdfName.FONT);
if (pageFonts == null || pageFonts.Size == 0)
continue;
List<BaseFont> fonts = new List<BaseFont>();
List<string> fontNames = new List<string>();
HashSet<char> chars = new HashSet<char>();
foreach (PdfName key in pageFonts.Keys)
{
PdfIndirectReference fontReference = pageFonts.GetAsIndirectObject(key);
if (fontReference == null)
continue;
DocumentFont font = (DocumentFont) BaseFont.CreateFont((PRIndirectReference)fontReference);
if (font == null)
continue;
PdfObject toUni = PdfReader.GetPdfObjectRelease(font.FontDictionary.Get(PdfName.TOUNICODE));
CMapToUnicode toUnicodeCmap = null;
if (toUni is PRStream)
{
try
{
byte[] touni = PdfReader.GetStreamBytes((PRStream)toUni);
CidLocationFromByte lb = new CidLocationFromByte(touni);
toUnicodeCmap = new CMapToUnicode();
CMapParserEx.ParseCid("", toUnicodeCmap, lb);
}
catch
{
toUnicodeCmap = null;
}
}
if (toUnicodeCmap == null)
continue;
ICollection<int> mapValues = toUnicodeCmap.CreateDirectMapping().Values;
if (mapValues.Count == 0)
continue;
fonts.Add(font);
fontNames.Add(key.ToString());
foreach (int value in mapValues)
chars.Add((char)value);
}
if (fonts.Count == 0 || chars.Count == 0)
continue;
Rectangle size = (fonts.Count > 10) ? PageSize.A4.Rotate() : PageSize.A4;
PdfPTable table = new PdfPTable(fonts.Count + 1);
table.AddCell("Page " + page);
foreach (String name in fontNames)
{
table.AddCell(name);
}
table.HeaderRows = 1;
float[] widths = new float[fonts.Count + 1];
widths[0] = 2;
for (int i = 1; i <= fonts.Count; i++)
widths[i] = 1;
table.SetWidths(widths);
table.WidthPercentage = 100;
List<char> charList = new List<char>(chars);
charList.Sort();
foreach (char character in charList)
{
table.AddCell(((int)character).ToString("X4"));
foreach (BaseFont font in fonts)
{
table.AddCell(new PdfPCell(new Phrase(character.ToString(), new Font(font))));
}
}
stamper.InsertPage(reader.NumberOfPages + 1, size);
ColumnText columnText = new ColumnText(stamper.GetUnderContent(reader.NumberOfPages));
columnText.AddElement(table);
columnText.SetSimpleColumn(size);
while ((ColumnText.NO_MORE_TEXT & columnText.Go(false)) == 0)
{
stamper.InsertPage(reader.NumberOfPages + 1, size);
columnText.Canvas = stamper.GetUnderContent(reader.NumberOfPages);
columnText.SetSimpleColumn(size);
}
}
}
I applied it to your document like this:
string input = #"4700198773.pdf";
string output = #"4700198773-fonts.pdf";
using (PdfReader reader = new PdfReader(input))
using (FileStream stream = new FileStream(output, FileMode.Create, FileAccess.Write))
using (PdfStamper stamper = new PdfStamper(reader, stream))
{
AddFontsTo(reader, stamper);
}
The additional pages look like this:
Now you have to compare the outputs for the different fonts and pages of this document with each other and with those of a representative selection of file. If you find good enough a pattern, you can try this replacement way.
I am using PngCs dll to fetch the chunk data for Png image file in asp.net, I am able to do that but now I want to update the chunk data for that PNG.
I used PngWriter but it is creating whole new file without inheriting chunk data.
PngReader pngr = FileHelper.CreatePngReader(path);
pngr.GetMetadata().GetTxtForKey(PngChunkITXT.KEY_Title);
Response.Write(pngr.GetMetadata().GetTxtForKey(PngChunkITXT.KEY_Title));
Below code is for writing new Png Image through PngWriter ,I want embed new itxt chunk while creating new file.
PngReader pngr = FileHelper.CreatePngReader(origFilename); // or you can use the constructor
PngWriter pngw = FileHelper.CreatePngWriter(destFilename, pngr.ImgInfo, true); // idem
Console.WriteLine(pngr.ToString()); // just information
int chunkBehav = ChunkCopyBehaviour.COPY_ALL_SAFE; // tell to copy all 'safe' chunks
pngw.CopyChunksFirst(pngr, chunkBehav); // copy some metadata from reader
for (int row = 0; row < pngr.ImgInfo.Rows; row++)
{
ImageLine l1 = pngr.ReadRowInt(row); // format: RGBRGB... or RGBARGBA...
pngw.WriteRow(l1, row);
}
pngw.CopyChunksLast(pngr, chunkBehav); // metadata after the image pixels? can happen
pngw.End(); // dont forget this
pngr.End();
for further reference click this link
Try this
PngReader pngr = FileHelper.CreatePngReader(origFilename);
PngWriter pngw = FileHelper.CreatePngWriter(destFilename, pngr.ImgInfo, true);
pngw.CopyChunksFirst(pngr, ChunkCopyBehaviour.COPY_ALL);
pngw.GetMetadata().SetText(myKey, myText,false,false); // provide your own data
for (int row = 0; row < pngr.ImgInfo.Rows; row++) {
ImageLine l1 = pngr.ReadRowInt(row);
pngw.WriteRow(l1, row);
}
pngw.CopyChunksLast(pngr, ChunkCopyBehaviour.COPY_ALL);
pngw.End(); // dont forget this
pngr.End();
the problem has been solved by using CsXMpToolKit.dll ,which is the best option to fetch the metadat from any type of file.
In the YSOD below, the stacktrace (and the source file line) contain the full path to the source file. Unfortunately, the full path to the source file name contains my user name, which is firstname.lastname.
I want to keep the YSOD, as well as the stack trace including the filename and line number (it's a demo and testing system), but the username should vanish from the sourcefile path. Seeing the file's path is also OK, but the path should be truncated at the solution root directory.
(without me having to copy-paste the solution every time to another path before publishing it...)
Is there any way to accomplish this ?
Note: Custom error pages aren't an option.
Path is embedded in .pdb files, which are produced by the compiler. The only way to change this is to build your project in some other location, preferably somewhere near the build server.
Never mind, I found it out myself.
Thanks to Anton Gogolev's statement that the path is in the pdb file, I realized it is possible.
One can do a binary search-and-replace on the pdb file, and replace the username with something else.
I quickly tried using this:
https://codereview.stackexchange.com/questions/3226/replace-sequence-of-strings-in-binary-file
and it worked (on 50% of the pdb files).
So mind the crap, that code-snippet in the link seems to be buggy.
But the concept seems to work.
I now use this code:
public static void SizeUnsafeReplaceTextInFile(string strPath, string strTextToSearch, string strTextToReplace)
{
byte[] baBuffer = System.IO.File.ReadAllBytes(strPath);
List<int> lsReplacePositions = new List<int>();
System.Text.Encoding enc = System.Text.Encoding.UTF8;
byte[] baSearchBytes = enc.GetBytes(strTextToSearch);
byte[] baReplaceBytes = enc.GetBytes(strTextToReplace);
var matches = SearchBytePattern(baSearchBytes, baBuffer, ref lsReplacePositions);
if (matches != 0)
{
foreach (var iReplacePosition in lsReplacePositions)
{
for (int i = 0; i < baReplaceBytes.Length; ++i)
{
baBuffer[iReplacePosition + i] = baReplaceBytes[i];
} // Next i
} // Next iReplacePosition
} // End if (matches != 0)
System.IO.File.WriteAllBytes(strPath, baBuffer);
Array.Clear(baBuffer, 0, baBuffer.Length);
Array.Clear(baSearchBytes, 0, baSearchBytes.Length);
Array.Clear(baReplaceBytes, 0, baReplaceBytes.Length);
baBuffer = null;
baSearchBytes = null;
baReplaceBytes = null;
} // End Sub ReplaceTextInFile
Replace firstname.lastname with something that has equally many characters, for example "Poltergeist".
Now I only need to figure out how to run the binary search and replace as a post-build action.