Programmatically generating editable Word docs from ASP.NET?

The purpose is to generate proposal documents that can manually be edited in Word after the fact, but before sending them out to the customers.
Much proposal content would be drawn from existing HTML website content (backing CMS) and also some custom (non-HTML) injection for certain scenarios. Of course the conditional logic could go into server-side ASP.NET to vary the content appropriately.
I'm open to 3rd-party tools if raw manipulation of the Word API is arduous. In fact a good 3rd party tool might be the answer.

Use the Aspose Words component for .Net.
Aspose Words Component Link
The component reads and writes the Microsoft Word file formats natively, so no Microsoft Office products need to be installed in your application environment. You can start from an existing Word template or programmatically build up an entire document from scratch. Its document object model then lets you export to .doc, .docx and other formats and save a native Word file wherever you need it.
They have plenty of demos set up on their website.

I've not used any third-party tools before, as I've only ever written Office automation applications for PCs which already have Office installed.
Creating documents from scratch, or basing them on a template, is quite straightforward. With templates, you can define bookmarks and mail-merge fields to make finding and replacing document elements easier.
Here are a few things that you may find useful:
Named and Optional Arguments
The Word object model is reasonably easy to work with. VB.NET used to be easier to work with than C#: as the Office automation APIs were originally written with VB in mind, you could take advantage of optional parameters. In earlier versions of C#, you had to specify every argument in API calls, which was quite tedious. I understand that this has changed in Visual C# 2010:
How to: Use Named and Optional Arguments in Office Programming (C# Programming Guide)
http://msdn.microsoft.com/en-us/library/dd264738.aspx
Tutorials
I found these tutorials quite handy:
Automating Office Programs with VB.NET
http://www.xtremevbtalk.com/showthread.php?t=160433
VB.NET Office Automation FAQ
http://www.xtremevbtalk.com/showthread.php?t=160459
Understanding the Word Object Model from a .NET Developer's Perspective
http://msdn.microsoft.com/en-us/library/aa192495%28office.11%29.aspx
Early and Late binding
One point worth mentioning: late binding is normally discouraged, but it can be very useful if you don't know which version of Office will be installed on the application's host. Early binding tends to run faster and gives you IntelliSense in your IDE:
Using early binding and late binding in Automation
http://support.microsoft.com/kb/245115
Early vs. Late Binding
http://word.mvps.org/faqs/interdev/earlyvslatebinding.htm
Search and Replace
One thing to be aware of is that the Find and Replacement objects may not work as you would expect. Rather than searching the whole document, Find searches just the main text, so any text frames in the document are ignored. To cover them, you have to loop through all the StoryRanges and search the content of each. Here's what I do in VB.NET to search the main text story and text frames:
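' Word enumeration values, declared locally because the document is late-bound (Option Strict Off).
Private Const wdMainTextStory As Integer = 1
Private Const wdTextFrameStory As Integer = 5
Private Const wdFindContinue As Integer = 1
Private Const wdReplaceAll As Integer = 2

Private Sub FindReplaceAll(ByVal objDoc As Object, ByVal strFind As String, ByVal strReplacement As String)
    Dim rngStory As Object
    For Each rngStory In objDoc.StoryRanges
        Do
            ' Only search the main text and text frames.
            If rngStory.StoryType = wdMainTextStory Or rngStory.StoryType = wdTextFrameStory Then
                With rngStory.Find
                    .Text = strFind
                    .Replacement.Text = strReplacement
                    .Wrap = wdFindContinue
                    .Execute(Replace:=wdReplaceAll)
                End With
            End If
            ' Move on to the next linked story of the same type, if any.
            rngStory = rngStory.NextStoryRange
        Loop Until rngStory Is Nothing
    Next rngStory
End Sub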
StoryRanges Collection Object
http://msdn.microsoft.com/en-us/library/bb178940%28office.12%29.aspx

I have a long history with document generation and mail merge. In the old days we used Office COM extensively, even in server-side (ASP) applications. Over the years we learnt that this approach caused many problems, and today I always advise against using Office COM (Word automation) in almost any scenario.
With Microsoft's introduction of the Open XML SDK we managed to create a solid mail-merge component that was many times faster and much more robust than the solution(s) based on Office COM. In my experience the Open XML SDK allows a developer to create a solid solution, but it takes a lot of effort and time to make it useful and robust.
There are several good document generation/processing libraries on the market. We later ended up purchasing one, and in my opinion creating your own solution (based on the Open XML SDK or Office COM) simply never pays off.
Currently we are using Docentric Toolkit, which is a general-purpose document processing library and an even better template-based/mail-merge toolkit for .NET. It lets you design templates in MS Word, populate them with application data, and produce final documents in different formats.

You can look into using XSL to generate some WordML.
This technique is definitely convoluted but gives you a lot of power over your layout.
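To give an idea of what such a transform has to emit, here is a rough sketch of a minimal WordML (Word 2003 XML) document. It is written in Python rather than XSL purely for brevity, and the function name build_wordml is made up for this illustration; a real XSL stylesheet would produce the same element structure from your CMS content.

```python
from xml.sax.saxutils import escape

# Namespace of the Word 2003 XML ("WordML") format.
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def build_wordml(paragraphs):
    """Return a minimal WordML document with one plain paragraph per string.

    Hypothetical helper: it only shows the target structure
    (wordDocument > body > p > r > t) that a transform would generate.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        # This processing instruction tells Windows to open the file in Word.
        '<?mso-application progid="Word.Document"?>\n'
        f'<w:wordDocument xmlns:w="{WORDML_NS}">'
        f"<w:body>{body}</w:body></w:wordDocument>"
    )


doc = build_wordml(["Proposal for ACME", "Scope & pricing"])
```

Saving the returned string as an .xml file should give something Word can open and edit; formatting, tables and styles need additional elements that are not shown here.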

You don't need any 3rd-party controls to create a Word document. From Word 2007 onward, Word can open HTML as a Word document. You simply save any web page with the ".doc" extension and Word will sort it out.
Simply create your web page with whatever formatting you want, then save it with a .doc extension.
I used HttpWebRequest to call the URL (with parameters) of my page, then used WebResponse and Stream to get my page into a buffer, then StreamReader and StreamWriter to save it to an actual document. I've then got my own custom function to download the file.
If anyone wants my code let me know
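The answer's own code is not shown, so the following is only a rough sketch of the same idea, written in Python for brevity rather than ASP.NET. The helper names (build_proposal_html, save_as_doc) are invented for this example, and instead of requesting a page over HTTP it builds the HTML directly.

```python
import tempfile
from html import escape
from pathlib import Path


def build_proposal_html(title, paragraphs):
    """Return a small HTML page; Word opens it when it is saved as .doc."""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        f"<body><h1>{escape(title)}</h1>\n{body}\n</body></html>"
    )


def save_as_doc(html, path):
    """Write the HTML to a file with a .doc extension and return its path."""
    path = Path(path).with_suffix(".doc")
    path.write_text(html, encoding="utf-8")
    return path


# Usage: write the file to a temporary folder and open it in Word from there.
out_path = save_as_doc(
    build_proposal_html("Proposal for ACME", ["Scope of work", "Price: < 10,000"]),
    Path(tempfile.gettempdir()) / "proposal.doc",
)
```

Note that the file is still HTML inside; it only becomes a real Word file once someone opens it in Word and saves it in a Word format.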

Related

MS Word template with loops, tables and charts

For our SaaS (LAMP) product reporting we are currently using JasperReports. We find it too cumbersome to develop reports with and the output in Word unworkable. Moreover, a couple of customers request to be able to develop simple reports themselves (to be used as mail merge). We would therefore like to develop templates right in Word. The idea is to have an application/webservice that would receive the Word template and JSON data from the LAMP application and return the filled-in report. The report has to support:
Loops inside content (repeating a document section several times while filling in array data)
Filling in tables (populating rows from array)
Filling in chart data in pre-created charts (from array)
This is the functionality we are using in JasperReports right now. Are there existing solutions to this? I've found quite a lot that can substitute simple variables, but no info about the above three points. Will it be a lot of effort to write one from scratch? I would prefer a Windows OpenXML-based solution rather than a Linux PHPOffice-based one as I presume the former would handle the text split by spell-checker and language tags (though I'm not sure).
Windward and Docmosis are both commercial products that support the features you've listed, and they are intended to be added to your application to provide reporting capabilities. Neither is OpenXML-based. They can use Word documents as templates and perform the data merge into different output formats. Please note I work for Docmosis.
Aspose Words is another tool and it can populate a template but most of the power is through code rather than controls/directives in the template. Given your OpenXML thoughts, perhaps this is more what you are looking for.
More tools are recommended here in StackExchange.
I hope that helps.
ReportBox is a web-based reporting solution that any software application can use to generate documents and reports in Microsoft Word, Excel, PowerPoint or HTML (DOCX/XLSX/PPTX/HTML) using OpenXML.
The process starts by building a Microsoft Word, Excel, PowerPoint or HTML document as a template and uploading it to the ReportBox portal. Your application either sends data to ReportBox or ReportBox pulls data from your application database; the data is then merged with the template to produce the finished report. Please note that I work for GreenThoughts.

How to feed Word 2010 (.docx) documents/templates with data from MySQL database?

What would be the best approach to replace placeholders in a .docx document (Word 2010) with data coming from a MySQL database?
Can I just open the file using a server side language and do a string replace per each placeholder?
Is there any existing tool/library available?
Thanks
Disclosure: I work for Invantive.
Using Invantive Composition (http://www.invantive.com/products/invantive-composition) you can fill Word documents (letters, legal pleadings, insurance policies) with data from a database (IBM DB2, Oracle, MySQL, Teradata and SQL Server) and then freely edit the contents manually. It is intended for real Microsoft Word end users (both those who make the templates and those who use them), who access the databases through a central web service and models with queries. Invantive Composition allows nested repeating groups of data and layout. It integrates into Microsoft Word using ClickOnce.
In the past, I personally have also used JasperReports (http://community.jaspersoft.com/project/jasperreports-library) to generate letters using the RTF output target of JasperReports. It is free and works fine as long as you do not want to edit the output by more than a few words and you have Java/SQL development skills. Like Invantive Composition, it works fine for large numbers of different reports.
As long as you can control the environment completely, you can also consider using RTF as an intermediate language (not for end users, only for real developers). Save the document as RTF, mark the parts of the text that need to be replaceable, and write a web service that accepts the parameters and returns the resulting RTF. It takes some time to generate more complex tables (tables are obviously something invented by the human race after the RTF specification was written :-). This approach only works with a very limited number of templates and when you have sufficient developer time available to get it up and running and stabilized.
As an independent reviewer, I have also seen cases where XML templates were used, but the results were not as good as with JasperReports.
Disclosure: I lead the docx4j project
There are heaps of existing tools/libraries available!
Yes, you can just do a string replace, but that is a brittle approach, since Word may have split the string across runs.
You can use MERGEFIELDs, or content control data binding.
docx4j supports all three approaches, but content control data binding is the most powerful.
ContentControlsMergeXML
MERGEFIELDs
VariableReplace
One thing to consider especially is "repeats". If you want say a row of a table in Word, for each matching row in your MySQL table, then you need a way to make this happen.
docx4j does this with a "repeat" content control around the table row; whichever solution you choose, I'd make sure up front that it can handle repeats.
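To illustrate why a plain string replace is brittle, here is a small Python sketch (not docx4j code; the sample paragraph and the function name replace_in_paragraph are made up for this example). It assumes a paragraph in which Word has split the placeholder ${firstName} over three runs, and it uses a deliberately crude fix: join the text of all runs in the paragraph, replace, and put the result in the first text node.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# A paragraph as Word might store it: "${firstName}" is spread over three runs.
SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Dear ${</w:t></w:r>"
    "<w:r><w:t>first</w:t></w:r>"
    "<w:r><w:t>Name},</w:t></w:r>"
    "</w:p>"
)


def replace_in_paragraph(paragraph, placeholder, value):
    """Replace a placeholder even when it spans several runs.

    Simplification: the merged text ends up in the first text node and the
    others are emptied, so formatting that differed between runs is lost.
    """
    texts = paragraph.findall(".//w:t", NS)
    joined = "".join(t.text or "" for t in texts)
    if placeholder not in joined:
        return False
    texts[0].text = joined.replace(placeholder, value)
    for t in texts[1:]:
        t.text = ""
    return True


# The naive approach finds nothing, because the placeholder is not contiguous.
naive = SAMPLE.replace("${firstName}", "John")

paragraph = ET.fromstring(SAMPLE)
replace_in_paragraph(paragraph, "${firstName}", "John")
result = "".join(t.text or "" for t in paragraph.findall(".//w:t", NS))
```

MERGEFIELDs and content controls avoid this problem because the replacement target is a structural element rather than a piece of text.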
If you want to use PHP the most complete available solution is PHPDocX.
You may check in the tutorial how to substitute placeholder variables by data coming from any data source (like a MySQL DB).
In particular, you may populate table rows with an indefinite number of entries, delete whole blocks of the Word document depending on the data fed to the application, or build dynamic Word charts.
You may check the available DEMO for a simple but quite illustrative example (its inner workings are explained in the tutorial section).
You can use the Open XML SDK and replace your placeholders like this.
Disclosure: I lead the docxgenjs project
I think you shouldn't have to code everything by yourself, that's why I created a Mustache-like templating engine for docx
Demo:
http://javascript-ninja.fr/docxgenjs/examples/demo.html
Repo
https://github.com/edi9999/docxgenjs
It is JS-based and works client and server side.
Yes, you can use a server-side language to do it.
Take a look at Apache POI.
http://poi.apache.org
Hello. I read the above, especially the comments, and Invantive looks impressive, but the solution I needed was much simpler. Use Selection.Range.InsertDatabase in Word to fetch records from an Access database, an Excel spreadsheet, or even just another Word document. With the Access solution you can choose the layout of the records to fetch and have it fetch just particular records based on a field (e.g. ID). Search for the words above and you will find Microsoft's guidance and an example VB script. It worked well within a few minutes. Now I'm looking for a VB script that asks the user which ID they want from the database, and we're done.
XDocReport uses docx templates containing merge fields that are filled from Java objects (the objects hold the information you load from MySQL or any other source). XDocReport is a Java project; its home page is https://code.google.com/p/xdocreport/.
Disclosure: I created the templ4docx project
Hello
You can use templ4docx java library, which is on maven central repository, so you can just add it to your maven dependencies:
<dependency>
    <groupId>pl.jsolve</groupId>
    <artifactId>templ4docx</artifactId>
    <version>2.0.0</version>
</dependency>
Example usage:
Docx docx = new Docx("E:\\template.docx");
Variables variables = new Variables();
variables.addTextVariable(new TextVariable("${firstName}", "John"));
variables.addTextVariable(new TextVariable("${lastName}", "Sky"));
docx.fillTemplate(variables);
docx.save("E:\\filledTemplate.docx");
You can find more details here: http://jsolve.github.io/java/templ4docx/

Mailmerge in asp.net

How do I do a mail merge in ASP.NET without installing Word on the server?
Are there any DLLs or components available?
Edits
The template document is already available. I'm not trying to create a Word document; I just want to link the Word document with the data.
Thanks
Personally, I would just look at using the System.Net.Mail class and its templating abilities. There is a nice library here: https://github.com/lukencode/FluentEmail which you can pass templates into and send emails that way with the data you require inserted into it.
EDIT: noticed you didn't actually specify whether it was print mailmerge or email, apologies if it is a print mailmerge you are trying to create, but for mass emailing with customized data in it, templating is definitely the way to go.
To accomplish the Word doc creation part of the question there is a previous thread about this: How can a Word document be created in C#?
To send the completed doc check out the System.Net.Mail namespace: http://msdn.microsoft.com/en-us/library/system.net.mail.aspx or if you can afford it I have had great experience with http://www.aspnetemail.com/.
We use Aspose.Words to perform mail merges from .net code. It's not cheap but once you get to grips with it it's very powerful.
Edit: I'm assuming you are looking to merge data from some sort of data store into a template word document which can be printed and distributed.
Another option is Docentric Toolkit. It is pure .NET and based on OpenXML without any dependency on MS Word, so it is a good fit for server side report generation.
Merging with data is done through placeholders, which get filled up with data at run time. Data can come from database or XML.
Templates are created in MS Word which needs Docentric Toolkit add-in installed (license is needed).
It is really easy to create templates and to merge them with data from .NET code.

Replace text in Word Document via ASP.NET

How can I replace a string/word in a Word Document via ASP.NET? I just need to replace a couple words in the document, so I would like to stay AWAY from 3rd party plugins & interop. I would like to do this by opening the file and replacing the text.
The following attempts were made:
I created a StreamReader and Writer to read the file but I think that I am reading and writing in the wrong format. I think that Word Documents are stored in binary?? If word documents are binary, how would I read and write the file in binary?
Dim template As String = Request.MapPath("documentName.doc")
If File.Exists(template) Then
    Dim sr As New StreamReader(template)
    Dim content As String = sr.ReadToEnd()
    sr.Close()
    Dim sw As New StreamWriter(template)
    content = content.Replace("# T O D A Y S D A T E", Date.Now.ToString("MM/dd/yyyy"))
    sw.Write(content)
    sw.Close()
Else
The Word binary format is proprietary to Microsoft. The specification is complex, and it will take you ages to learn the document structure and the internal bit and byte layout. I really don't think you will save yourself any time going down this path, so consider the options below:
Use Open XML
Automate Word
Use third party library like Aspose
Use RTF rather than Doc. You can then look for specific RTF tag with your text and replace it with another set of RTF text block. This is probably the simplest for what you want to do if RTF is an acceptable format.
In my personal experience, automating Word isn't as bad as it sounds. It is really not suitable for a high-volume server environment, but for smaller loads it works well, provided you write your code carefully to manage the application object and handle exceptions.
EDITED: Corrected my initial comment about an NDA. That was the case when I worked on this back in 2005/6; I didn't realize Microsoft had decided to publish the specification in recent years.
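As a rough illustration of the RTF option, here is a Python sketch (Python chosen for brevity; the same steps apply in .NET). The placeholder syntax [[TODAY]] and the function names are invented for this example, and it assumes the placeholder appears as one contiguous string in the RTF, which is not guaranteed if Word has inserted formatting codes in the middle of it.

```python
def rtf_escape(text):
    """Escape plain text so it can be inserted into an RTF body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF control characters.
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\line ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN per UTF-16 code unit, where N is a signed
            # 16-bit decimal and "?" is the fallback for old readers.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append(f"\\u{code}?")
    return "".join(out)


def fill_rtf(template, values):
    """Replace each placeholder in the RTF template with escaped text."""
    for placeholder, value in values.items():
        template = template.replace(placeholder, rtf_escape(value))
    return template


template = r"{\rtf1\ansi Date: [[TODAY]]\par}"
filled = fill_rtf(template, {"[[TODAY]]": "01/02/2024 {draft}"})
```

Escaping matters here: an unescaped brace or backslash in the inserted value would corrupt the RTF structure.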
Lots of choices:
Some of them expensive (Aspose)
Some of them hard (binary formats)
Some of them require Interop (VSTO)
or newer formats (Open XML)
Some of them not mentioned yet, like running Word on the server and just writing to that (not recommended by MSFT, but probably your only real choice for a) cheap, b) simple)
OfficeWriter.
If word documents are binary, how would I read and write the file in binary?
They are, and that's why you should use a third party library to program against them.
I would like to stay AWAY from 3rd party plugins & interop
This requirement makes the task extremely hard. If your documents are in the "old Word format" (.doc), I would almost say that you are out of luck. If you can use Word 2007 documents (.docx) instead, you should be able to solve the problem by unzipping the file (it's essentially a ZIP archive), doing a search/replace in the contained XML files, and zipping the document up again.
See also: Generating a Word Document with C#
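The unzip, replace and re-zip steps can be sketched as follows. This is a Python illustration rather than ASP.NET code, the function name replace_in_docx and the {{...}} placeholder style are invented for the example, and it assumes the main document part is word/document.xml encoded as UTF-8 and that each placeholder is stored contiguously in the XML (Word sometimes splits text across several runs, in which case a plain replace will not find it).

```python
import io
import zipfile
from xml.sax.saxutils import escape


def replace_in_docx(docx_bytes, replacements, part="word/document.xml"):
    """Return a copy of the .docx with placeholders replaced in one XML part."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == part:
                xml = data.decode("utf-8")
                for placeholder, value in replacements.items():
                    # Escape &, < and > so the result stays well-formed XML.
                    xml = xml.replace(placeholder, escape(value))
                data = xml.encode("utf-8")
            # Every other entry (styles, relationships, ...) is copied as-is.
            dst.writestr(item.filename, data)
    return out_buf.getvalue()


def read_part(docx_bytes, part="word/document.xml"):
    """Helper for inspecting the result: return one part as text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read(part).decode("utf-8")
```

Typical usage would be to read the template with open(path, "rb").read(), call replace_in_docx(template_bytes, {"{{name}}": "A & B"}), and write the returned bytes to a new file, leaving the template itself untouched.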
You could perform Word automation on the server to easily do it, but that route is fraught with danger. Automation is not designed to run server side, and you will find it regularly hangs when Word pops up a prompt or confirmation box waiting for input that nobody can see.
You have to make a trade off, use Word automation and accept it may hang pretty regularly (anything from daily to weekly), or buy a third party solution. I use Aspose and it has solved a lot of problems.

How do you use Excel server-side?

A client wants to "Web-enable" a spreadsheet calculation: the user specifies the values of certain cells and is then shown the resulting values in other cells.
(They do NOT want to show the user a "spreadsheet-like" interface. This is not a UI question.)
They have a huge spreadsheet with lots of calculations over many, many sheets. But, in the end, only two things matter -- (1) you put numbers in a couple cells on one sheet, and (2) you get corresponding numbers off a couple cells in another sheet. The rest of it is a black box.
I want to present a UI to the user to enter the numbers they want, then I'd like to programmatically open the Excel file, set the numbers, tell it to re-calc, and read the result out.
Is this possible/advisable? Is there a commercial component that makes this easier? Are there pitfalls I'm not considering?
(I know I can use Office Automation to do this, but I know it's not recommended to do that server-side, since it tries to run in the context of a user, etc.)
A lot of people are saying I need to recreate the formulas in code. However, this would be staggeringly complex.
It is possible, but not advisable (and officially unsupported).
You can interact with Excel through COM or the .NET Primary Interop Assemblies, but this is meant to be a client-side process.
On the server side, no display or desktop is available, and any unexpected dialog box (for example) will make your web app hang, so your app will behave unreliably.
Also, attaching an Excel process to each request isn't exactly a low-resource approach.
Working out the black box and re-implementing it in a proper programming language is clearly the better (as in "more reliable and faster") option.
Related reading: KB257757: Considerations for server-side Automation of Office
You definitely don't want to be using interop on the server side, it's bad enough using it as a kludge on the client side.
I can see two options:
Figure out the spreadsheet logic. This may benefit you in the long term by making the business logic a known quantity, and in the short term you may find that there are actually bugs in the spreadsheet (I have encountered tons of monster spreadsheets used for years that turn out to have simple bugs in them - everyone just assumed the answers must be right)
Evaluate SpreadSheetGear.NET, which is basically a replacement for interop that does it all without Excel (it replicates a huge chunk of Excel's non-visual logic and IO in .NET)
Although this is certainly possible using ASP.NET, it's very inadvisable. It's un-scalable and prone to concurrency errors.
Your best bet is to analyze the spreadsheet calculations and duplicate them. Now, granted, your business is not going to like the time it takes to do this, but it will (presumably) give them a more usable system.
Alternatively, you can simply serve up the spreadsheet to users from your website, in which case you do almost nothing.
Edit: If your stakeholders really insist on using Excel server-side, I suggest you take a good hard look at Excel Services as @John Saunders suggests. It may not get you everything you want, but it'll get you quite a bit, and should solve some of the issues you'll end up with trying to do it server-side with ASP.NET.
That's not to say that it's a panacea; your mileage will certainly vary. And SharePoint isn't exactly cheap to buy or maintain. In fact, short-term costs could easily be dwarfed by long-term costs if you go the SharePoint route, but it might be the best option to fit a requirement.
I still suggest you push back in favor of coding all of your logic in a separate .NET module. That way you can use it both server-side and client-side. Excel can easily pass calculations to a COM object, and you can very easily publish your .NET library as COM objects. In the end, you'd have a much more maintainable and usable architecture.
Leaving aside the discussion of whether it makes sense to manipulate an Excel sheet on the server side, one way to do this would be to use the
Microsoft.Office.Interop.Excel.dll
Using this library, you can tell Excel to open a Spreadsheet, change and read the contents from .NET. I have used the library in a WinForm application, and I guess that it can also be used from ASP.NET.
Still, consider the concurrency problems already mentioned... However, if the sheet is accessed infrequently, why not...
The simplest way to do this might be to:
Upload the Excel workbook to Google Docs -- this is very clean, in my experience
Use the Google Spreadsheets Data API to update the data and return the numbers.
Here's a link to get you started on this, if you want to go that direction:
http://code.google.com/apis/spreadsheets/overview.html
Let me be more adamant than others have been: do not use Excel server-side. It is intended to be used as a desktop application, meaning it is not intended to be used from random different threads, possibly multiple threads at a time. You're better off writing your own spreadsheet than trying to use Excel (or any other Office desktop product) from a server.
This is one of the reasons that Excel Services exists. A quick search on MSDN turned up this link: http://blogs.msdn.com/excel/archive/category/11361.aspx. That's a category list, so contains a list of blog posts on the subject. See also Microsoft.Office.Excel.Server.WebServices Namespace.
It sounds like you're saying that the user has the spreadsheet open on their local system, and you want a web site to manipulate that local spreadsheet?
If that's the case, you can't really do that. Even Office automation won't help, unless you want to require them to upload the sheet to the server and download a new altered version.
What you can do is create a web service to do the calculations and add some VBA or VSTO code to the Excel sheet to talk to that service.
