C# TIFF file compression methodology - ASP.NET

I am trying to understand and implement a piece of code for TIFF compression.
I have already tried two techniques: the third-party LibTiff.NET DLL (which is bulky), and the Image.Save method, http://msdn.microsoft.com/en-us/library/ytz20d80%28v=vs.110%29.aspx (which works on a Windows 7 machine but not on Windows Server 2003 or 2008).
Now I want to explore a third method.
using System.Windows.Forms;
using System.Windows.Media.Imaging;
using System.Drawing.Imaging;
int width = 800;
int height = 1000;
int stride = width/8;
byte[] pixels = new byte[height*stride];
// Try creating a new image with a custom palette.
List<System.Windows.Media.Color> colors = new List<System.Windows.Media.Color>();
colors.Add(System.Windows.Media.Colors.Red);
colors.Add(System.Windows.Media.Colors.Blue);
colors.Add(System.Windows.Media.Colors.Green);
BitmapPalette myPalette = new BitmapPalette(colors);
// Creates a new empty image with the pre-defined palette
BitmapSource image = BitmapSource.Create(
width,
height,
96,
96,
System.Windows.Media.PixelFormats.BlackWhite,
myPalette,
pixels,
stride);
FileStream stream = new FileStream(Original_File, FileMode.Create);
TiffBitmapEncoder encoder = new TiffBitmapEncoder();
encoder.Compression = TiffCompressOption.Ccitt4;
encoder.Frames.Add(BitmapFrame.Create(image));
encoder.Save(stream);
But I don't have a full understanding of what is happening here.
There is obviously some kind of a memory stream that the compression technique is being applied to. But I am a bit confused how to apply this to my specific case. I have an original tiff file, I want to use this method to set its compression to CCITT and save it back. Can anyone help?
I copied the above code and the code runs. But my end output file is a solid black background image. Although on the positive side it is of the correct compression type.
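A possible explanation for the solid black output, offered as a guess rather than a diagnosis: the pixels array in the code above is never written to, so every bit is 0, and in a 1-bit black/white format a 0 bit is, as far as I know, rendered as black. The sketch below uses Java only to show the arithmetic (the same integer math carries over to C#). The class and method names are made up for illustration, and the most-significant-bit-first packing is an assumption about the pixel layout.

```java
final class OneBitBuffer {
    // Bytes per row for a 1-bit-per-pixel image, rounded up to a whole byte.
    static int strideFor(int width) {
        return (width + 7) / 8;
    }

    // Sets pixel (x, y) to 1. Assumes the leftmost pixel is the most significant bit.
    static void setBit(byte[] pixels, int stride, int x, int y) {
        pixels[y * stride + x / 8] |= 0x80 >> (x % 8);
    }

    // Reads pixel (x, y) back under the same bit-order assumption.
    static boolean isSet(byte[] pixels, int stride, int x, int y) {
        return (pixels[y * stride + x / 8] & (0x80 >> (x % 8))) != 0;
    }

    public static void main(String[] args) {
        int width = 800;
        int height = 1000;
        int stride = strideFor(width);
        byte[] pixels = new byte[height * stride]; // all zero at this point
        // Set every bit of the first row as a visible change.
        for (int x = 0; x < width; x++) {
            setBit(pixels, stride, x, 0);
        }
        System.out.println("stride=" + stride + " firstPixelSet=" + isSet(pixels, stride, 0, 0));
    }
}
```

With width/8 the stride is only correct when the width is a multiple of 8; rounding up avoids a short buffer for other widths. To re-encode an existing TIFF, the pixel data would have to come from that file rather than from an empty array.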
http://msdn.microsoft.com/en-us/library/ms616002%28v=vs.110%29.aspx
http://msdn.microsoft.com/en-us/library/system.windows.media.imaging.tiffcompressoption%28v=vs.100%29.aspx
http://social.msdn.microsoft.com/Forums/vstudio/en-US/1585c562-f7a9-4cfd-9674-6855ffaa8653/parameter-is-not-valid-for-compressionccitt4-on-windows-server-2003-and-2008?forum=netfxbcl

LibTiff.net is a little bulky because it's based off LibTiff, which has its own set of problems.
My company (Atalasoft) has the ability to do that fairly easily, and the free version of the SDK will do the task you want with a few restrictions. The code for re-encoding a file would look like this:
public bool ReencodeFile(string path)
{
AtalaImage image = new AtalaImage(path);
if (image.PixelFormat == PixelFormat.Pixel1bppIndexed)
{
TiffEncoder encoder = new TiffEncoder();
encoder.Compression = TiffCompression.Group4FaxEncoding;
image.Save(path, encoder, null); // destroys the original - use carefully
return true;
}
return false;
}
Things you should be aware of:
- this code will only work properly on 1bpp images
- this code will NOT work properly on multi-page TIFFs
- this code does NOT preserve metadata within the original file
I would want the code to at least check for those cases. If you would rather have a solution that better preserves the content of the file, you would want to do this:
public bool ReencodeFile(string origPath, string outputPath)
{
if (origPath == outputPath) throw new ArgumentException("outputPath needs to be different from input path.");
TiffDocument doc = new TiffDocument(origPath);
bool needsReencoding = false;
for (int i = 0; i < doc.Pages.Count; i++) {
if (doc.Pages[i].PixelFormat == PixelFormat.Pixel1bppIndexed) {
doc.Pages[i] = new TiffPage(new AtalaImage(origPath, i, null), TiffCompression.Group4FaxEncoding);
needsReencoding = true;
}
}
if (needsReencoding)
doc.Save(outputPath);
return needsReencoding;
}
This solution will respect all pages within the document as well as document metadata.


Annotation in pdfclown

I am trying to put a sticky note at some x,y location. For this I am using the PDF Clown annotation class in .NET.
Below is what is available.
using files = org.pdfclown.files;
public override bool Run()
{
files::File file = new files::File();
Document document = file.Document;
Populate(document);
Serialize(file, false, "Annotations", "inserting annotations");
return true;
}
private void Populate(Document document)
{
Page page = new Page(document);
document.Pages.Add(page);
PrimitiveComposer composer = new PrimitiveComposer(page);
StandardType1Font font = new StandardType1Font(document, StandardType1Font.FamilyEnum.Courier, true, false);
composer.SetFont(font, 12);
annotations::Note note = new annotations::Note(page, new Point(78, 658), "this is my annotation...");
note.IconType = annotations::Note.IconTypeEnum.Help;
note.ModificationDate = new DateTime();
note.IsOpen = true;
composer.Flush();
}
Link for annotation
This puts a sticky note at coordinates (78, 658) in a blank PDF.
The problem is that I want the sticky note in an existing PDF that already has content. How can I modify the code? Thanks for the help.
I'm the author of PDF Clown -- this is the right way to insert an annotation like a sticky note into an existing page:
using org.pdfclown.documents;
using annotations = org.pdfclown.documents.interaction.annotations;
using files = org.pdfclown.files;
using System.Drawing;
. . .
// Open the PDF file!
using(files::File file = new files::File(@"C:\mypath\myfile.pdf"))
{
// Get the document (high-level representation of the PDF file)!
Document document = file.Document;
// Get, e.g., the first page of the document!
Page page = document.Pages[0];
// Insert your sticky note into the page!
annotations::Note note = new annotations::Note(page, new Point(78, 658), "this is my annotation...");
note.IconType = annotations::Note.IconTypeEnum.Help;
note.ModificationDate = new DateTime();
note.IsOpen = true;
// Save the PDF file!
file.Save(files::SerializationModeEnum.Incremental);
}
Please consider that there are lots of options about the way you can save your file (to an output (in-memory) stream, to a distinct path, as a compacted file, as an appended file...).
If you look at the 50+ samples accompanying the library's distribution, along with the API documentation, you can discover how expressive and powerful it is. Its architecture strictly adheres to the official Adobe PDF Reference 1.7.
enjoy!

Image resizing for image gallery on Tridion 2011

I'm currently working on a web site that will show a kind of image gallery on some detail pages. It must show a navigation bar at the bottom with small thumbnail images, and for each element it must show some basic information and the big image.
The big image must be resized too, because there is a maximum size allowed for it.
The point is to use just one source image per multimedia component and to resize the images at publish time, so that both a thumbnail and a big image are generated from the source image and sent to the client browser. It's possible to show small and big images using just styles or HTML, but this is quite inefficient because the source image (some of them really heavy) is always sent to the customer.
My first thought was a custom code fragment, something written in C#, but I find it complicated to resize only some images to a certain size and then resize them again to another size. I can't find a way to replace the SRC in the final HTML with the appropriate paths either.
Another idea was to create an old-style PublishBinary method, but I find this really complex because it looks like the current Tridion architecture is not meant to do this at all...
And the most important point: even if we can do the resizing successfully (somehow), publishing the same image twice is currently an issue in Tridion 2011. Both the big and the small version would actually come from the same multimedia component, so it shouldn't be possible to publish both of them, even by playing with the names: the first one would always be gone, because the path would be updated with the second one :-S.
Any ideas?
I have built an image re-sizing TBB in the past which reads the output of a Dreamweaver or XSLT template. The idea is to produce a tag like the following with the first template.
<img src="tcm:1-123" maxWidth="250" maxHeight="400"
cropPosition="middle" variantId="250x400"
action="PostProcess" enlargeIfTooSmall="true"
/>
The "Re-Sizing" TBB then post processes the Output item in the package, looking for nodes with the PostProcess action.
It then creates a variant of the Multimedia Component using the System.Drawing library according to the maxHeight and maxWidth dimension attributes, and publishes it using the AddBinary() method @frank mentioned, using the variantId attribute as both the filename prefix and the variant id (and replaces the SRC attribute with the URL of the new binary).
To make this 100% flexible, if either of the maxHeight or maxWidth attributes is set to 0, the TBB re-sizes based on just the non-zero dimension; if both are set, it crops the image based on the cropPosition attribute. This enables us to make square thumbnails for both landscape and portrait images without distorting them. The enlargeIfTooSmall attribute is used to prevent small images from being stretched too much.
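The sizing rules described above can be sketched roughly as follows. This is not the TBB's actual code (which is not shown here), just one reading of the description, written in Java for illustration. The ResizePlan class and its field names are hypothetical, and treating "both limits set" as "scale to cover the box, then crop the excess" is an assumption.

```java
final class ResizePlan {
    final int width;    // width after scaling, before any crop
    final int height;   // height after scaling, before any crop
    final boolean crop; // true when both limits were given and the excess must be cropped

    private ResizePlan(int width, int height, boolean crop) {
        this.width = width;
        this.height = height;
        this.crop = crop;
    }

    // maxW or maxH of 0 means "no limit on this dimension".
    static ResizePlan plan(int srcW, int srcH, int maxW, int maxH, boolean enlargeIfTooSmall) {
        if (maxW <= 0 && maxH <= 0) {
            return new ResizePlan(srcW, srcH, false);
        }
        boolean crop = maxW > 0 && maxH > 0;
        double scale;
        if (crop) {
            // Scale until the image covers the whole box; the overflow is cropped later.
            scale = Math.max((double) maxW / srcW, (double) maxH / srcH);
        } else if (maxW > 0) {
            scale = (double) maxW / srcW;
        } else {
            scale = (double) maxH / srcH;
        }
        if (!enlargeIfTooSmall && scale > 1.0) {
            scale = 1.0; // never stretch a small source image
        }
        return new ResizePlan((int) Math.round(srcW * scale), (int) Math.round(srcH * scale), crop);
    }

    public static void main(String[] args) {
        ResizePlan p = plan(1000, 500, 100, 100, true);
        System.out.println(p.width + "x" + p.height + " crop=" + p.crop);
    }
}
```

When crop is true, the final image would be cut down to maxW by maxH around the position named by cropPosition, which is how a landscape source can end up as a square thumbnail without distortion.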
You can see samples of the final galleries here: http://medicine.yale.edu/web/help/examples/specific/photo/index.aspx
and other image re-sizing examples here:
http://medicine.yale.edu/web/help/examples/general/images.aspx
All of the images are just uploaded to the CMS once, and then re-sized and cropped on the fly at publish time.
Tridion can perfectly well publish multiple variants on a single MMC. When you call AddBinary you can specify that this binary is a variant of the MMC, with each variant being identified by a simple string that you specify.
public Binary AddBinary(
Stream content,
string filename,
StructureGroup location,
string variantId,
Component relatedComponent,
string mimeType
)
As you can see you can also specify the filename for the binary. If you do, it is your responsibility that variants have unique filenames and filenames between different MMCs remain unique. So typically, it is easiest to simply prefix or suffix the filename with some indication of the variantId: <MmcImageFileName>_thumbnail.jpg.
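A small sketch of the naming idea, in Java for illustration only. The VariantNames helper is hypothetical and not part of any Tridion API; it just inserts the variant id before the file extension.

```java
final class VariantNames {
    // Inserts "_<variantId>" before the extension, e.g. photo.jpg -> photo_thumbnail.jpg.
    static String withVariant(String fileName, String variantId) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            // No extension (or a leading dot only): append the variant id at the end.
            return fileName + "_" + variantId;
        }
        return fileName.substring(0, dot) + "_" + variantId + fileName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(withVariant("photo.jpg", "thumbnail"));
    }
}
```

This only keeps variants of one file apart. Keeping filenames unique between different MMCs still needs something extra in the name, for example the item id.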
For a recent demo project, I took a completely different approach. The binaries are all published to a broker database. They are extracted from the broker with an HttpModule, which writes the binary data to the file system.
I made it possible to encode the desired width or height in the URL of the image (of course, for binaries that are not images this part of the logic will not work). The module then resizes the image on the fly (truly on the fly, not during publishing!) and writes the resized version to the disk.
For example: if I request /Images/photo.jpg, I will get the original image. If I request /Images/photo_h100.jpg, I get a version of 100 pixels high. The url /Images/photo_w150.jpg leads to a width of 150 pixels.
No variants needed, no republishing because of different size requirements either: resizing is completely done on demand! The performance penalty on the server is negligible: each size is generated only once, until the image is republished.
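The URL convention from the example above (photo_h100.jpg, photo_w150.jpg) could be parsed along these lines. This is a sketch under assumptions, not the module's real code: the SizedImageUrl class is hypothetical, it is written in Java although the original module was .NET, and the five-digit cap on the size is an arbitrary safety limit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SizedImageUrl {
    // Matches "<base>_h<number>.<ext>" or "<base>_w<number>.<ext>".
    private static final Pattern SIZED =
            Pattern.compile("^(.*)_([hw])(\\d{1,5})(\\.[^./]+)$");

    final String originalPath; // path of the unresized binary
    final int width;           // 0 when no width was requested
    final int height;          // 0 when no height was requested

    private SizedImageUrl(String originalPath, int width, int height) {
        this.originalPath = originalPath;
        this.width = width;
        this.height = height;
    }

    static SizedImageUrl parse(String path) {
        Matcher m = SIZED.matcher(path);
        if (!m.matches()) {
            return new SizedImageUrl(path, 0, 0); // plain request for the original
        }
        int size = Integer.parseInt(m.group(3));
        String original = m.group(1) + m.group(4);
        return "w".equals(m.group(2))
                ? new SizedImageUrl(original, size, 0)
                : new SizedImageUrl(original, 0, size);
    }

    public static void main(String[] args) {
        SizedImageUrl u = parse("/Images/photo_w150.jpg");
        System.out.println(u.originalPath + " width=" + u.width + " height=" + u.height);
    }
}
```

One thing to watch with this convention: an original file that is genuinely named something like banner_w150.jpg would be mistaken for a resize request, so the handler would need to check for an exact match on disk first.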
I used .NET, but of course it can work in Java as well.
Following Frank's and Quirijn's answers, you may be interested in resizing the image in a Cartridge Claims processor using the Ambient Data Framework. This solution would be technology agnostic and can be re-used in both Java and .NET. You just need to put the resized image bytes in a Claim and then use it in Java or .NET.
Java Claims Processor:
public void onRequestStart(ClaimStore claims) throws AmbientDataException {
int publicationId = getPublicationId();
int binaryId = getBinaryId();
BinaryContentDAO bcDAO = (BinaryContentDAO)StorageManagerFactory.getDAO(publicationId, StorageTypeMapping.BINARY_CONTENT);
BinaryContent binaryContent = bcDAO.findByPrimaryKey(publicationId, binaryId, null);
byte[] binaryBuff = binaryContent.getContent();
pixelRatio = getPixelRatio();
int resizeWidth = getResizeWidth();
BufferedImage original = ImageIO.read(new ByteArrayInputStream(binaryBuff));
if (original.getWidth() < MAX_IMAGE_WIDTH) {
float ratio = (resizeWidth * 1.0f) / (float)MAX_IMAGE_WIDTH;
float width = original.getWidth() * ratio;
float height = original.getHeight() * ratio;
BufferedImage resized = new BufferedImage(Math.round(width), Math.round(height), original.getType());
Graphics2D g = resized.createGraphics();
g.setComposite(AlphaComposite.Src);
g.drawImage(original, 0, 0, resized.getWidth(), resized.getHeight(), null);
g.dispose();
ByteArrayOutputStream output = new ByteArrayOutputStream();
BinaryMeta meta = new BinaryMetaFactory().getMeta(String.format("tcm:%s-%s", publicationId, binaryId));
String suffix = meta.getPath().substring(meta.getPath().lastIndexOf('.') + 1);
ImageIO.write(resized, suffix, output);
binaryBuff = output.toByteArray();
}
claims.put(new URI("taf:extensions:claim:resizedimage"), binaryBuff);
}
.Net HTTP Handler:
public void ProcessRequest(HttpContext context) {
if (context != null) {
HttpResponse httpResponse = HttpContext.Current.Response;
ClaimStore claims = AmbientDataContext.CurrentClaimStore;
if (claims != null) {
Codemesh.JuggerNET.byteArray javaArray = claims.Get<Codemesh.JuggerNET.byteArray>("taf:extensions:claim:resizedimage");
byte[] resizedImage = javaArray.ToNative(javaArray);
if (resizedImage != null && resizedImage.Length > 0) {
httpResponse.Expires = -1;
httpResponse.Flush();
httpResponse.BinaryWrite(resizedImage);
}
}
}
}
Java Filter:
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
ClaimStore claims = AmbientDataContext.getCurrentClaimStore();
if (claims != null) {
Object resizedImage = claims.get(new URI("taf:extensions:claim:resizedimage"));
if (resizedImage != null) {
byte[] binaryBuff = (byte[])resizedImage;
response.getOutputStream().write(binaryBuff);
}
}
}

AS3: reading embedded metadata in flex

I saw that you can embed meta-data into images very much like you can in mp3s, here.
Can someone point me to a tutorial of how to embed and read this sort of information w/ photoshop and flex together?
I really wouldn't know where to start... Tried googling but I'm not sure I have the right keywords down.
Thanks!
I've written a small snippet on the matter. This snippet is far from being properly tested, and is most definitely not written in a clear and coherent way. But for now it seems to work. I'll update it as I work on it.
private function init(event:Event):void
{
var ldr:Loader = new Loader();
ldr.contentLoaderInfo.addEventListener(Event.COMPLETE, imgLoaded);
var s:String = "link/to/asset.jpg";
ldr.load(new URLRequest(s));
}
private function imgLoaded(e:Event):void{
var info:LoaderInfo = e.target as LoaderInfo;
var xmpXML:XML = getXMP(info.bytes);
//trace(xmpXML);
var meta:XMPMeta = new XMPMeta(xmpXML);
}
private function trim(s:String):String{
return s.replace( /^([\s|\t|\n]+)?(.*)([\s|\t|\n]+)?$/gm, "$2" );
}
private function getXMP(ba:ByteArray):XML{
var LP:ByteArray = new ByteArray();
var PACKET:ByteArray = new ByteArray();
var l:int;
ba.readBytes(LP, 2, 2);
/*
http://www.adobe.com/devnet/xmp.html
read part 3: Storage in Files.
that will explain the -2 -29 and other things you see here.
*/
l = LP.readInt() - 2 -29;
ba.readBytes(PACKET, 33, l);
var p:String = trim(""+PACKET);
var i:int = p.search('<x:xmpmeta xmlns:x="adobe:ns:meta/"');
/* Delete all in front of the XMP XML */
p = p.substr(i);
/*
For some reason this left some rubbish in front, so I'll hardcode it out for now
TODO clean up
*/
var ar:Array = p.split('<');
var s:String = "";
var q:int;
var j:int = ar.length;
for(q=1;q<j;q++){
s += '<'+ar[q];
}
i = s.search('</x:xmpmeta>');
i += ('</x:xmpmeta>').length;
s = s.slice(0,i);
/* Delete all behind the XMP XML */
return XML(s);
}
Originally from http://snipplr.com/view/51037/xmp-metadata-from-jpg/
Photoshop (CS4+ I think) can also add XMP headers (XML style) which will be easier to parse than bytes but it contains different information.
http://code.google.com/p/exif-as3/
Here is a class that should do the job. It is non-commercial only but there is another option.
www.ultrashock.com/forums/server-side/extracting-metadata-from-photos-86065.html
Here is a php script that will do it that could be ported to as3 - it might be easier than creating one from scratch. If you did want php to read the info I would use the built in exif functions :)
Well, AS3 doesn't have a built-in class to read the JPG header.
BUT, if you are loading the image using URLLoader you can use the ByteArray to read it manually.
You can find the spec here:
http://www.obrador.com/essentialjpeg/HeaderInfo.htm
If you need some tutorial of using Bytearray you can start from here:
How to convert bytearray to image or image to bytearray ?
or here:
http://digitalmedia.oreilly.com/pub/a/oreilly/digitalmedia/helpcenter/flex3cookbook/chapter8.html?page=7
The principle is the same: read the bytes, convert them to readable data using the spec above, and use it.
Good luck!
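To make the "read the bytes" idea concrete, here is a rough sketch of walking the JPEG segments until the APP1 segment that carries XMP is found. It is written in Java rather than AS3, purely to show the byte layout, and the JpegXmp class is hypothetical. It assumes a simple file: no fill bytes between segments, and the XMP packet stored in a single APP1 segment that starts with the 29-byte identifier "http://ns.adobe.com/xap/1.0/" plus a NUL byte.

```java
import java.nio.charset.StandardCharsets;

final class JpegXmp {
    // Identifier that precedes the XMP packet inside an APP1 segment (28 chars + NUL).
    private static final byte[] XMP_HEADER =
            "http://ns.adobe.com/xap/1.0/\0".getBytes(StandardCharsets.US_ASCII);

    // Returns the XMP packet as text, or null when no XMP APP1 segment is found.
    static String extractXmp(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return null; // does not start with the SOI marker
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                return null; // not at a marker, give up
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                return null; // start of scan or end of image: no metadata beyond here
            }
            // Big-endian segment length; it counts its own two bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int payloadStart = pos + 4;
            int payloadLength = length - 2;
            if (length < 2 || payloadStart + payloadLength > jpeg.length) {
                return null; // truncated or malformed segment
            }
            if (marker == 0xE1 && startsWith(jpeg, payloadStart, payloadLength)) {
                int start = payloadStart + XMP_HEADER.length;
                int count = payloadLength - XMP_HEADER.length;
                return new String(jpeg, start, count, StandardCharsets.UTF_8);
            }
            pos = payloadStart + payloadLength;
        }
        return null;
    }

    private static boolean startsWith(byte[] data, int offset, int available) {
        if (available < XMP_HEADER.length) {
            return false;
        }
        for (int i = 0; i < XMP_HEADER.length; i++) {
            if (data[offset + i] != XMP_HEADER[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The two length bytes and the 29-byte identifier are what the "- 2 -29" in the AS3 snippet earlier in this thread accounts for. Unlike that snippet, this version does not assume the XMP segment is the first one in the file.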
Yes, entirely possible. ByteArray is your friend.
You may want to give a read to this:
http://www.anttikupila.com/flash/getting-jpg-dimensions-with-as3-without-loading-the-entire-file/
This may also be of use, but I'd rather go with the first option:
http://download.macromedia.com/pub/developer/xmp/sdk/XMPLibrary-v1.0.zip

Out Of Memory exception on System.Drawing.Image.FromFile()

I have an image uploader and cropper which creates thumbnails and I occasionally get an Out Of Memory exception on the following line:
Dim bm As Bitmap = System.Drawing.Image.FromFile(imageFile)
The error occurs very rarely, but I always like to know what might be causing it. The imageFile variable is just a Server.MapPath to the path of the image.
I was curious whether anyone had experienced this issue previously and had any ideas about what might be causing it. Is it the size of the image, perhaps?
I can post the code if necessary and any supporting information I have, but would love to hear people's opinions on this one.
It's worth knowing that OutOfMemoryException doesn't always really mean it's out of memory - particularly not when dealing with files. I believe it can also happen if you run out of handles for some reason.
Are you disposing of all your bitmaps after you're done with them? Does this happen repeatably for a single image?
If this wasn't a bad image file but was in fact the normal issue with Image.FromFile, wherein it leaves file handles open, then the solution is to use Image.FromStream instead.
using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
using (Image original = Image.FromStream(fs))
{
...
Using an explicit Dispose(), a using() statement, or setting the bitmap to null doesn't solve the issue with Image.FromFile.
So if your app runs for a long time and opens a lot of files, consider using Image.FromStream() instead.
I hit the same issue today while creating thumbnail images for a folder full of images. It turns out that the "Out Of Memory" error occurred at exactly the same point each time. When I looked at the folder with the images to be converted, I found that the file that was creating the problem was thumbs.db. I added some code to make sure that only image files were being converted and the issue was resolved.
My code is basically
For Each imageFile as FileInfo in fileList
If imageFile.Extension = ".jpg" Or imageFile.Extension = ".gif" Then
...proceed with the conversion
End If
Next
Hope this helps.
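One caveat with a plain string comparison on the extension: as far as I can tell it is case sensitive by default, so a file named PHOTO.JPG would be skipped along with thumbs.db. A case-insensitive whitelist avoids that. The sketch below is in Java just to show the idea; the ImageFileFilter name and the exact list of extensions are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ImageFileFilter {
    // Extensions treated as images; adjust to whatever the converter can really read.
    private static final Set<String> IMAGE_EXTENSIONS =
            new HashSet<>(Arrays.asList(".jpg", ".jpeg", ".gif", ".png", ".bmp"));

    // True when the file name ends in a known image extension, ignoring case.
    static boolean looksLikeImage(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        String extension = fileName.substring(dot).toLowerCase(Locale.ROOT);
        return IMAGE_EXTENSIONS.contains(extension);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"photo.JPG", "Thumbs.db", "logo.png"}) {
            System.out.println(name + " -> " + looksLikeImage(name));
        }
    }
}
```

An extension check only filters by name. A file with an image extension can still be corrupt, so the load call itself is worth wrapping in error handling as well.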
Also check if you haven't opened the same file somewhere else. Apparently, when you open a file twice (even with File.Open()) OutOfMemoryException is thrown too...
You can also open it in read mode with shared access (if you want to use it in two places at the same time):
public Image OpenImage(string previewFile)
{
FileStream fs = new FileStream(previewFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
return Image.FromStream(fs);
}
This happens when the image file is corrupted. It is a bad error message, because memory has nothing to do with it. I haven't worked out the coding, but a try/catch/finally will stop the program from abending.
I had a similar problem today when I was trying to resize an image and then crop it, what happened is I used this code to resize the image.
private static Image resizeImage(Image imgToResize, Size size)
{
int sourceWidth = imgToResize.Width;
int sourceHeight = imgToResize.Height;
float nPercent = 0;
float nPercentW = 0;
float nPercentH = 0;
nPercentW = ((float)size.Width / (float)sourceWidth);
nPercentH = ((float)size.Height / (float)sourceHeight);
if (nPercentH < nPercentW)
nPercent = nPercentH;
else
nPercent = nPercentW;
int destWidth = (int)(sourceWidth * nPercent);
int destHeight = (int)(sourceHeight * nPercent);
Bitmap b = new Bitmap(destWidth, destHeight);
Graphics g = Graphics.FromImage((Image)b);
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
g.DrawImage(imgToResize, 0, 0, destWidth, destHeight);
g.Dispose();
return (Image)b;
}
And then this code for the crop...
private static Image cropImage(Image img, Rectangle cropArea)
{
Bitmap bmpImage = new Bitmap(img);
Bitmap bmpCrop = bmpImage.Clone(cropArea,
bmpImage.PixelFormat);
return (Image)(bmpCrop);
}
Then this is how I called the above code...
Image img = Image.FromFile(@"C:\Users\****\Pictures\image.jpg");
img = ImageHandler.ResizeImage(img, new Size(400, 300));
img = ImageHandler.CropImage(img, new Rectangle(0, 25, 400, 250));
long quality = 90;
I kept getting errors on the crop part, the resizer worked fine!
Turns out, what was happening inside the resizer was throwing errors in the crop function. The resized calculations were making the actual dimensions of the image come out to be like 399 rather than 400 that I passed in.
So, when I passed in 400 as the argument for the crop, it was trying to crop a 399px wide image with a 400px width bmp and it threw the out of memory error!
Most of the above code was found on http://www.switchonthecode.com/tutorials/csharp-tutorial-image-editing-saving-cropping-and-resizing
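One way to guard against the 399-versus-400 mismatch described above is to clip the crop rectangle to the real image size before cropping. The sketch below shows only the clipping arithmetic, in Java for illustration; CropBounds is a hypothetical helper, and in the C# code the same numbers would be fed into the Rectangle passed to the crop function.

```java
final class CropBounds {
    // Returns {x, y, width, height} clipped to the image, or throws if nothing is left.
    static int[] clamp(int x, int y, int width, int height, int imageWidth, int imageHeight) {
        int left = Math.max(x, 0);
        int top = Math.max(y, 0);
        int right = Math.min(x + width, imageWidth);
        int bottom = Math.min(y + height, imageHeight);
        if (right <= left || bottom <= top) {
            throw new IllegalArgumentException("crop area lies outside the image");
        }
        return new int[] {left, top, right - left, bottom - top};
    }

    public static void main(String[] args) {
        // A 400x250 crop requested on an image that came out 399 pixels wide.
        int[] r = clamp(0, 25, 400, 250, 399, 300);
        System.out.println(r[0] + "," + r[1] + " " + r[2] + "x" + r[3]);
    }
}
```

The alternative is to fix the resize step so it cannot come up a pixel short, for example by rounding instead of truncating when computing destWidth and destHeight. Clipping is the more defensive of the two because it also covers callers that pass a wrong rectangle.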
If an image is an icon, then different loading handling is required, as in the following function:
public static Image loadImage(string imagePath)
{
Image loadedImage = null;
if (!File.Exists(imagePath)) return loadedImage;
try
{
FileInfo fileInfo = new FileInfo(imagePath);
if (fileInfo.Extension.Equals(".jpg") || fileInfo.Extension.Equals(".jpeg") ||
fileInfo.Extension.Equals(".bmp") || fileInfo.Extension.Equals(".png") ||
fileInfo.Extension.Equals(".gif"))
{
loadedImage = Image.FromFile(imagePath);
}
else if (fileInfo.Extension.Equals(".ico"))
{
Bitmap aBitmap = Bitmap.FromHicon(new
Icon(imagePath, new Size(200, 200)).Handle);
loadedImage = ImageFuncs.ResizeImage(aBitmap, new Size(30, 30));
}
}
catch (Exception eLocal)
{
MessageBox.Show(imagePath + " loading error: " + eLocal.Message);
}
return loadedImage;
}
I had the same problem with a utility I wrote to convert TIFF(s) to PDF(s). Often I would get the "out of memory" error on the same line as you.
System.Drawing.Image.FromFile(imageFile)
Then I discovered the error only happened when the file extension was ".tiff" and worked fine after I renamed it with an extension of ".tif"
I had the same issue. Before looking elsewhere in the code, I wanted to make sure I could open the image with an image viewer, and found that the image was corrupted/damaged, even though it was a .PNG file only 1 KB in size. I added a new image in the same location, and then it worked fine.
I am having the same problem batch processing TIFF files. Most of the files don't throw an exception, but a few throw an "Out of Memory" exception in ASP.NET 4.0. I have used binary data to find out why it happens for just a few files from within the same folder. It can't be a permission issue for the ASP.NET ASPNET or NETWORK SERVICE account, because the other files are working fine.
I opened the iTextSharp.text.Image class and found that there are many overloaded methods for GetInstance(). I resolved my problem using the following code. Note: the catch block will run for the problematic files.
iTextSharp.text.Image image = null;
try
{
var imgStream = GetImageStream(path);
image = iTextSharp.text.Image.GetInstance(imgStream);
}
catch {
iTextSharp.text.pdf.RandomAccessFileOrArray ra = null;
ra = new iTextSharp.text.pdf.RandomAccessFileOrArray(path);
image = iTextSharp.text.pdf.codec.TiffImage.GetTiffImage(ra, 1);
if (ra != null)
ra.Close();
}
If you're serving from IIS, try recycling the Application Pool. This solved a similar image upload "Out of Memory" error for me.
I created a minimal form example that still gives me errors.
private void button1_Click(object sender, EventArgs e)
{
string SourceFolder = ImageFolderTextBox.Text;
string FileName = "";
DirectoryInfo Mydir = new DirectoryInfo(SourceFolder);
FileInfo[] JPEGS = Mydir.GetFiles("*.jpg");
for (int counter = 0; counter < JPEGS.Count(); counter++)
{
FileName = Mydir + "\\" + JPEGS[counter].Name;
//using (Image MyImage = System.Drawing.Image.FromFile(FileName))
using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
StatusBtn.BackColor = Color.Green;
}
}
}
I tried both the commented out line using Image.FromFile() as well as the line using FileStream(). Both produced file errors.
The Image.FromFile() error was:
System.OutOfMemoryException: 'Out of Memory'
The FileStream() error was:
System.UnauthorizedAccessException: 'Access to the path 'E:\DCIM\100Canon\dsc_7218.jpg' is denied.'
I placed a Breakpoint just prior to the lines producing the error and I am able to open the image file using the Windows image viewer. I then closed the viewer and after I advanced to the next line and get the error, I can no longer view the image with the Windows viewer. Instead, I get a message that I do not have permission to access the file. I am able to delete the file.
This error is repeatable. I've done it over 10 times. Each time, after I get the error, I delete the file used for FileName.
All files were verified to be non-corrupt.
My original code that used Image.FromFile() worked fine when I compiled it 2 years ago. In fact, the .exe file runs just fine. I made a minor change somewhere else in the code and was surprised to find that the code would not compile without this error. I tried the FileStream() method based on the information on this page.

Fill a word document in asp.net?

I am working on an ASP.NET project which needs to fill in a Word document. My client provides a Word template with last name, first name, birth date, etc. I have all that information in the SQL database, and the client wants the users of the application to be able to download the Word document filled in with information from the database.
What's the best way to achieve this? Basically, I need to identify those "fillable spots" in the Word document and fill in the information when the application user clicks on the download button.
If you can use Office 2007 the way to go is to use the Open XML API to format the documents:
http://support.microsoft.com/kb/257757. The reason you have to go that route is that you can't really use Word Automation in a server environment. (you CAN, but it's a huge pain to get working properly, and can EASILY break).
If you can't go the 2007 route, I've actually had pretty good success with just opening up a word template as a stream and finding and replacing the tokens and serving that to the user. This has actually worked surprisingly well in my experience and it's REALLY simple to implement.
I'm not sure about some of the ASP.Net aspects, but I am working on something similar and you might want to look into using an RTF instead. You can use pattern replacement in the RTF. For example you can add a tag like {USER_FIRST_NAME} in the RTF document. When the user clicks the download button, your application can take the information from the database and replace every instance of {USER_FIRST_NAME} with the data from the database. I am currently doing this with PHP and it works great. Word will open the RTF without a problem so that is another reason I chose this method.
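The replacement step itself is small. Here is a sketch in Java (the answer above used PHP, and the same idea works in C#); the TemplateFiller class is hypothetical. It assumes each token appears in the raw RTF text exactly as the key you pass in, which depends on how the template was authored, and it only escapes the three characters that are structural in RTF, so non-ASCII values would need extra handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TemplateFiller {
    // Replaces every occurrence of each token with its (RTF-escaped) value.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace(entry.getKey(), escapeRtf(entry.getValue()));
        }
        return result;
    }

    // Backslash and braces have structural meaning in RTF, so escape them in values.
    static String escapeRtf(String text) {
        return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}");
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("{USER_FIRST_NAME}", "Ada");
        values.put("{USER_LAST_NAME}", "Lovelace");
        System.out.println(fill("Dear {USER_FIRST_NAME} {USER_LAST_NAME},", values));
    }
}
```

It is worth opening the saved template in a text editor first to see how the token really looks in the file, since a word processor may split it across formatting runs or escape the braces.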
I have used Aspose.Words for .NET. It's a little on the pricey side, but it works extremely well and the API is fairly intuitive for something that is potentially very complex.
If you want to pre-design your documents (or allow others to do that for you), anyone can put fields into the document. Aspose can open the document, find and fill the fields, and save a new filled-out copy for download.
Aspose works okay, but again: it's pricey.
Definitely avoid Office Automation in web apps as much as possible. It just doesn't scale well.
My preferred solution for this kind of problem is xml: specifically here I recommend WordProcessingML. You create an Xml document according to the schema, put a .doc extension on it, and MS Word will open it as if it were native in any version as far back as Office XP. This supports most Word features, and this way you can safely reduce the problem to replacing tokens in a text stream.
Be careful googling for more information on this: there's a lot of confusion between this and new Xml-based format for Office 2007. They're not the same thing.
This code works for WordMl text boxes and checkboxes. It's index based, so just pass in an array of strings for all textboxes and an array of bool's for all checkboxes.
public void FillInFields(
Stream sourceStream,
Stream destinationStream,
bool[] pageCheckboxFields,
string[] pageTextFields
) {
StreamUtil.Copy(sourceStream, destinationStream);
sourceStream.Close();
destinationStream.Seek(0, SeekOrigin.Begin);
Package package = Package.Open(destinationStream, FileMode.Open, FileAccess.ReadWrite);
Uri uri = new Uri("/word/document.xml", UriKind.Relative);
PackagePart packagePart = package.GetPart(uri);
Stream documentPart = packagePart.GetStream(FileMode.Open, FileAccess.ReadWrite);
XmlReader xmlReader = XmlReader.Create(documentPart);
XDocument xdocument = XDocument.Load(xmlReader);
List<XElement> textBookmarksList = xdocument
.Descendants(w + "fldChar")
.Where(e => (e.AttributeOrDefault(w + "fldCharType") ?? "") == "separate")
.ToList();
var textBookmarks = textBookmarksList.Select(e => new WordMlTextField(w, e, textBookmarksList.IndexOf(e)));
List<XElement> checkboxBookmarksList = xdocument
.Descendants(w + "checkBox")
.ToList();
IEnumerable<WordMlCheckboxField> checkboxBookmarks = checkboxBookmarksList
.Select(e => new WordMlCheckboxField(w, e, checkboxBookmarksList.IndexOf(e)));
for (int i = 0; i < pageTextFields.Length; i++) {
string value = pageTextFields[i];
if (!String.IsNullOrEmpty(value))
SetWordMlElement(textBookmarks, i, value);
}
for (int i = 0; i < pageCheckboxFields.Length; i++) {
bool value = pageCheckboxFields[i];
SetWordMlElement(checkboxBookmarks, i, value);
}
PackagePart newPart = packagePart;
StreamWriter streamWriter = new StreamWriter(newPart.GetStream(FileMode.Create, FileAccess.Write));
XmlWriter xmlWriter = XmlWriter.Create(streamWriter);
if (xmlWriter == null) throw new Exception("Could not open an XmlWriter to 4311Blank-1.docx.");
xdocument.Save(xmlWriter);
xmlWriter.Close();
streamWriter.Close();
package.Flush();
destinationStream.Seek(0, SeekOrigin.Begin);
}
private class WordMlTextField {
public int? Index { get; set; }
public XElement TextElement { get; set; }
public WordMlTextField(XNamespace ns, XObject element, int index) {
Index = index;
XElement parent = element.Parent;
if (parent == null) throw new NicException("fldChar must have a parent.");
if (parent.Name != ns + "r") {
log.Warn("Expected parent of fldChar to be a run for fldChar at position '" + Index + "'");
return;
}
var nextSibling = parent.ElementsAfterSelf().First();
if (nextSibling.Name != ns + "r") {
log.Warn("Expected a 'r' element after the parent of fldChar at position = " + Index);
return;
}
var text = nextSibling.Element(ns + "t");
if (text == null) {
log.Warn("Expected a 't' element inside the 'r' element after the parent of fldChar at position = " + Index);
}
TextElement = text;
}
}
private class WordMlCheckboxField {
public int? Index { get; set; }
public XElement CheckedElement { get; set; }
public readonly XNamespace _ns;
public WordMlCheckboxField(XNamespace ns, XContainer checkBoxElement, int index) {
_ns = ns;
Index = index;
XElement checkedElement = checkBoxElement.Elements(ns + "checked").FirstOrDefault();
if (checkedElement == null) {
checkedElement = new XElement(ns + "checked", new XAttribute(ns + "val", "0"));
checkBoxElement.Add(checkedElement);
}
CheckedElement = checkedElement;
}
}
// Helper referenced above as StreamUtil.Copy.
public static class StreamUtil {
public static void Copy(Stream readStream, Stream writeStream) {
const int Length = 256;
Byte[] buffer = new Byte[Length];
int bytesRead = readStream.Read(buffer, 0, Length);
// write the required bytes
while (bytesRead > 0) {
writeStream.Write(buffer, 0, bytesRead);
bytesRead = readStream.Read(buffer, 0, Length);
}
readStream.Flush();
writeStream.Flush();
}
}
In general you are going to want to avoid doing Office automation on a server, and Microsoft has even stated that it is a bad idea as well. However, the technique that I generally use is the Office Open XML that was noted by aquinas. It does take a bit of time to learn your way around the format, but it is well worth it once you do, as you don't have to worry about some of the issues involved with Office automation (e.g. processes hanging).
Awhile back I answered a similar question to this that you might find useful, you can find it here.
If you need to do this in DOC files (as opposed to DOCX), then the OpenXML SDK won't help you.
Also, just want to add another +1 about the danger of automating the Office apps on servers. You will run into problems with scale - I guarantee it.
To add another reference to a third-party tool that can be used to solve your problem:
http://www.officewriter.com
OfficeWriter lets you control docs with a full API, or a template-based approach (like what your requirement is) that basically lets you open, bind, and save DOC and DOCX in scenarios like this with little code.
Could you not use Microsoft's own Interop framework to utilise Word functionality?
See Here
