How to feed Word 2010 (.docx) documents/templates with data from a MySQL database?

What would be the best approach to replace placeholders in a .docx document (Word 2010) with data coming from a MySQL database?
Can I just open the file using a server-side language and do a string replace for each placeholder?
Is there any existing tool/library available?
Thanks

Disclosure: I work for Invantive.
Using Invantive Composition (http://www.invantive.com/products/invantive-composition) you can fill Word documents (letters, legal pleadings, insurance policies) with data from a database (IBM DB2, Oracle, MySQL, Teradata and SQL Server) and then fully change the contents at will manually. It is intended for real Microsoft Word end-users (both the people who make the templates and the ones who use them) who access the databases through a central webservice and models with queries. Invantive Composition allows nested repeating groups of data and layout. It integrates into Microsoft Word using ClickOnce.
In the past, I personally have also used JasperReports (http://community.jaspersoft.com/project/jasperreports-library) to generate letters using the RTF output target of JasperReports. It is free and works fine as long as you do not want to edit the output more than a few words and have Java/SQL development skills. Like Invantive Composition, it works fine for large numbers of different reports.
As long as you can control the environment completely, you can also consider using RTF as an intermediate language (not for end-users, only real developers). Save the document as RTF, replace the parts of the text you need to be replaceable with placeholders, and write a webservice that accepts the parameters and returns the resulting RTF. It takes some time to generate more complex tables (tables are obviously something invented by the human race after the RTF specification was written :-) ). This approach only works with a very limited number of templates and when you have sufficient developer time available to get it up and running and stabilized.
As an independent reviewer, I have also seen cases where XML templates were used, but the results were not as good as with JasperReports.

**Disclosure: I lead the docx4j project**
There are heaps of existing tools/libraries available!
Yes, you can just do a string replace, but that is a brittle approach, since Word may have split the string across runs.
You can use MERGEFIELDs, or content control data binding.
docx4j supports all three approaches, but content control data binding is the most powerful.
ContentControlsMergeXML
MERGEFIELDs
VariableReplace
One thing to consider especially is "repeats". If you want, say, a row of a table in Word for each matching row in your MySQL table, then you need a way to make this happen.
docx4j does this with a "repeat" content control around the table row; whichever solution you choose, I'd make sure up front that it can handle repeats.
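To illustrate why plain string replacement is brittle when Word has split a placeholder across runs, here is a minimal sketch. It does not use docx4j or any real docx API: the runs of a paragraph are simply modeled as a list of strings, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a paragraph is a list of run texts, nothing more.
final class RunAwareReplace {

    /** Naive approach: replace the placeholder inside each run separately. */
    static List<String> replacePerRun(List<String> runs, String placeholder, String value) {
        List<String> out = new ArrayList<>();
        for (String run : runs) {
            out.add(run.replace(placeholder, value));
        }
        return out;
    }

    /**
     * Run-aware approach: join the run texts, replace once, and put the
     * result in the first run while emptying the others. This sketch
     * discards per-run formatting, which a real tool has to preserve.
     */
    static List<String> replaceAcrossRuns(List<String> runs, String placeholder, String value) {
        String joined = String.join("", runs).replace(placeholder, value);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            out.add(i == 0 ? joined : "");
        }
        return out;
    }

    public static void main(String[] args) {
        // The placeholder ${firstName} has been split over three runs.
        List<String> runs = Arrays.asList("Dear ${first", "Name", "},");
        System.out.println(String.join("", replacePerRun(runs, "${firstName}", "John")));
        System.out.println(String.join("", replaceAcrossRuns(runs, "${firstName}", "John")));
    }
}
```

The per-run version leaves the placeholder untouched because no single run contains it, whereas the joined version finds it. Libraries that support placeholder replacement have to do something along these lines, plus keep the formatting intact.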

If you want to use PHP the most complete available solution is PHPDocX.
You may check in the tutorial how to substitute placeholder variables by data coming from any data source (like a MySQL DB).
In particular, you may populate table rows with an indefinite number of entries, delete whole blocks of the Word document depending on the data fed to the application, or build dynamic Word charts.
You may check the available DEMO for a simple but quite illustrative example (its inner workings are explained in the tutorial section).

You can use the Open XML SDK and replace your placeholders like this.

Disclosure: I lead the docxgenjs project
I think you shouldn't have to code everything yourself. That's why I created a Mustache-like templating engine for docx.
Demo:
http://javascript-ninja.fr/docxgenjs/examples/demo.html
Repo
https://github.com/edi9999/docxgenjs
It is JS-based and works client and server side.

Yes, you can use server side language to do it.
Check out Apache POI.
http://poi.apache.org

Hello, I read the above, especially the comments, and Invantive looks impressive, but the solution I needed was much simpler. Use Selection.Range.InsertDatabase in Word to fetch records from an Access database, an Excel spreadsheet or even just another Word document. With the Access solution you can choose the layout of the records to fetch and have it fetch just particular records based on a field (e.g. ID). Google the words above and it'll take you to MS guidance and an example VB script. Worked well in just a few minutes. Now I'm looking for a VB script that asks the person what ID they want from the database, and we're done.

You can use XDocReport. It uses docx templates that have merge fields filled from Java objects (the objects hold the information you load from MySQL or any other source). XDocReport is a Java project; its home page is https://code.google.com/p/xdocreport/.

*Disclosure: I created the templ4docx project*
Hello
You can use the templ4docx Java library, which is in the Maven Central repository, so you can just add it to your Maven dependencies:
<dependency>
    <groupId>pl.jsolve</groupId>
    <artifactId>templ4docx</artifactId>
    <version>2.0.0</version>
</dependency>
Example usage:
Docx docx = new Docx("E:\\template.docx");
Variables variables = new Variables();
variables.addTextVariable(new TextVariable("${firstName}", "John"));
variables.addTextVariable(new TextVariable("${lastName}", "Sky"));
docx.fillTemplate(variables);
docx.save("E:\\filledTemplate.docx");
You can find more details here: http://jsolve.github.io/java/templ4docx/

Related

MS Word template with loops, tables and charts

For our SaaS (LAMP) product reporting we are currently using JasperReports. We find it too cumbersome to develop reports with and the output in Word unworkable. Moreover, a couple of customers request to be able to develop simple reports themselves (to be used as mail merge). We would therefore like to develop templates right in Word. The idea is to have an application/webservice that would receive the Word template and JSON data from the LAMP application and return the filled-in report. The report has to support:
Loops inside content (repeating a document section several times while filling in array data)
Filling in tables (populating rows from array)
Filling in chart data in pre-created charts (from array)
This is the functionality we are using in JasperReports right now. Are there existing solutions to this? I've found quite a lot that can substitute simple variables, but no info about the above three points. Will it be a lot of effort to write one from scratch? I would prefer a Windows OpenXML-based solution rather than a Linux PHPOffice-based one, as I presume the former would handle text split by spell-checker and language tags (though I'm not sure).
Windward and Docmosis are both commercial products that support the features you've listed, and they are intended to be added to your application to provide reporting capabilities. Neither is OpenXML based. They can use Word documents as templates and perform the data merge into different output formats. Please note I work for Docmosis.
Aspose Words is another tool and it can populate a template but most of the power is through code rather than controls/directives in the template. Given your OpenXML thoughts, perhaps this is more what you are looking for.
More tools are recommended here in StackExchange.
I hope that helps.
ReportBox is a Web based reporting solution that can be used by any software application to generate documents and reports in Microsoft Word/ Excel/ PowerPoint/ HTML(DocX/Xlsx/PPTx/HTML) using OpenXML.
The process starts by building a Microsoft Word/ Excel/ PowerPoint/ HTML document as a template and uploading it to the ReportBox portal. Your application either sends data to ReportBox or ReportBox pulls data from your application database, and the data is then merged with the template to produce the finished report. Please note that I work for GreenThoughts.

Mailmerge in asp.net

How to do a mail merge in asp.net without installing word on the server?
Are any DLLs or components available?
Edits
The template document is already available. I'm not trying to create a Word document. I just want to link the Word document with the data.
Thanks
Personally, I would just look at using the System.Net.Mail namespace together with templating. There is a nice library here: https://github.com/lukencode/FluentEmail which you can pass templates into and send emails that way with the data you require inserted into them.
EDIT: noticed you didn't actually specify whether it was print mailmerge or email, apologies if it is a print mailmerge you are trying to create, but for mass emailing with customized data in it, templating is definitely the way to go.
To accomplish the Word doc creation part of the question there is a previous thread about this: How can a Word document be created in C#?
To send the completed doc check out the System.Net.Mail namespace: http://msdn.microsoft.com/en-us/library/system.net.mail.aspx or if you can afford it I have had great experience with http://www.aspnetemail.com/.
We use Aspose.Words to perform mail merges from .net code. It's not cheap but once you get to grips with it it's very powerful.
Edit: I'm assuming you are looking to merge data from some sort of data store into a template word document which can be printed and distributed.
Another option is Docentric Toolkit. It is pure .NET and based on OpenXML without any dependency on MS Word, so it is a good fit for server side report generation.
Merging with data is done through placeholders, which get filled up with data at run time. Data can come from database or XML.
Templates are created in MS Word which needs Docentric Toolkit add-in installed (license is needed).
It is really easy to create templates and to merge them with data from .NET code.

Reporting platform for Asp.net - with excel/pdf/word export

I am looking for a reporting platform for our asp.net application, which will allow the report to be exported in excel (for tabular data), or PDF/Word (for document reports like Invoice prints).
Are there any standard options available?
I tried RDLC, but it does not seem to help in the second case (at least I didn't see a way; if you can, please enlighten me :) ).
Currently we are using Interop for Excel export (I know it's not recommended for ASP.NET; we are planning to switch soon), use RTF templates for Word reports (which also makes them somewhat customizable), and we don't have PDF export (planning to build it). But it seems like a waste of effort if standard controls are already available!
Cheaper the better! Free rocks!!
What's the issue with RDLC? You can create any kind of format in it. For invoice prints etc. you can use the List data region. It's used for free-flow kinds of layout and works like an ASP.NET Repeater. In your case, you will have only one row of data.
Edit: even Crystal reports has equivalent functionality. As said, you will have only one row of data for invoices etc.
In both Crystal & RDLC, you can even supply multiple rows of data to your free flow report and generate multiple invoices in one go. Can be very helpful feature for users.

Excel Upload to database table

I'm looking for the best solution to allow our users to upload XLS spreadsheet so that they can be used to populate tables in our data warehouse (DW).
Our users are heavy Business Object (BO) users, and BO lets you export to XLS. When they have data in a spreadsheet that needs to be loaded to the DW, they need a process to upload the data in the XLS to the DW's db. As a result, we end up with many of these "interfaces" when I think that what we really need is a programmatic automated feed. Using Excel as a data source for inter-system feeds, in my gut, just seems like a bad idea to me.
Question #1: I'd like to see if you agree and why or why not.
OK, there is no swimming against that tide, so I now take as a given that XLS uploads are here to stay for us. Now I need to find the best solution. First, I'll explain what we do now and then what I don't like about it:
Via web pages, we provide empty XLS files (no rows) with a defined set of columns. Each file is intended to be used to update a different target dest table. In each spreadsheet is an "upload" button. Pushing the Upload button results in the macro in the spreadsheet serializing the contents of the file to CSV and FTPing the data to server folder. Periodically, a scheduler fires off an Informatica ETL job that uses the CSV file as input and loads the data into a custom XLS-specific staging table and then, if the records pass edits, into the appropriate target table. Any errors encountered are logged to an error table. For each XLS file uploaded, the data ends up in a separate staging and error table that is specific for the file.
Some of the things I don't like about our process are:
1) The macro code in the XLS is too exposed, includes passwords for example, can be tampered with and there are issues ensuring that the users are using the latest XLS templates.
2) Business rule edits are placed in the ETL program, where they should probably be, but because we would like to catch the errors ASAP, i.e., in the spreadsheet, edits are also added to the macro code. This results in duplication of business edits. I want these rules in one place and centrally controlled. IMHO, putting any macro code in the XLS introduces a maintenance issue, even calls to stored procedures (some of which we have) or calls to web services (we haven't yet tried to call .NET Web Services from XLS macros).
3) Every XLS file upload template has its own process with distinct set of staging and error tables and a custom screen for reporting errors encountered. It seems like we need a more generalized re-usable solution.
Besides often getting data exported to XLS from BO, the users also like Excel because it is easier for editing a large number of records and less clunky than editing individual records via a web interface.
This is the general direction that I am thinking:
First, I want the users to have the ease of editing in Excel, but without embedded macros in the spreadsheet. I experimented with Farpoint's Grid with Excel compatibility...
http://www.fpoint.com/netproducts/spreadweb/tour/excel.aspx
...and I found that it was quite easy to let a user open an XLS file that resides on their PC in a browser and to access the data from server-side .NET web code. Excel isn't running locally in their browser, but the functionality of Excel is reproduced, presumably through a lot of client-side scripting that I expect would be a real pain to duplicate myself. You can even cut and paste from a local spreadsheet into the web spreadsheet. This sounds great, but my biggest problem is cost. Our company is near death and won't allow us to purchase any new software.
Next, I want to identify the common components across all spreadsheet upload processing and come up with generic processing code. For example, I imagine a table which defines each of our spreadsheets and the format of each, including the column names and data type definitions, perhaps in terms of their destination columns instead of hard coding. Based on this table definition, I can generate XLS templates for download. I can also perform simple generic edits to ensure that the data entered matches the table definition. And one common web page can be used to present the data, report data type mismatch errors and allow the user to correct them. I would also define a common "staging" table for storing the data, with four columns perhaps: submission #, row num, name and value. No more "custom everything" is the goal.
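A rough sketch of what such a generic staging step could look like, written in Java purely for illustration (the same shape would carry over to .NET). All names here, including the `GenericStaging` class, the column names and the two type labels "int" and "text", are assumptions made up for the example, not part of any existing system.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GenericStaging {

    /** One row of the generic staging table: submission #, row num, name, value. */
    static final class StagingRecord {
        final int submissionId;
        final int rowNum;
        final String name;
        final String value;

        StagingRecord(int submissionId, int rowNum, String name, String value) {
            this.submissionId = submissionId;
            this.rowNum = rowNum;
            this.name = name;
            this.value = value;
        }
    }

    /** Flattens uploaded rows (column name -> raw cell text) into name/value records. */
    static List<StagingRecord> flatten(int submissionId, List<Map<String, String>> rows) {
        List<StagingRecord> records = new ArrayList<>();
        int rowNum = 1; // row numbers start at 1, as in a spreadsheet
        for (Map<String, String> row : rows) {
            for (Map.Entry<String, String> cell : row.entrySet()) {
                records.add(new StagingRecord(submissionId, rowNum, cell.getKey(), cell.getValue()));
            }
            rowNum++;
        }
        return records;
    }

    /** Generic edit: checks each value against the declared column type ("int" or "text"). */
    static List<String> validate(List<StagingRecord> records, Map<String, String> columnTypes) {
        List<String> errors = new ArrayList<>();
        for (StagingRecord r : records) {
            String type = columnTypes.get(r.name);
            if (type == null) {
                errors.add("row " + r.rowNum + ": unknown column " + r.name);
            } else if (type.equals("int") && !r.value.trim().matches("-?\\d+")) {
                errors.add("row " + r.rowNum + ": " + r.name + " is not an integer");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Hypothetical template definition: column name -> data type.
        Map<String, String> types = new LinkedHashMap<>();
        types.put("customer_id", "int");
        types.put("region", "text");

        Map<String, String> row1 = new LinkedHashMap<>();
        row1.put("customer_id", "42");
        row1.put("region", "EMEA");
        Map<String, String> row2 = new LinkedHashMap<>();
        row2.put("customer_id", "4x2"); // deliberately invalid
        row2.put("region", "APAC");

        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row1);
        rows.add(row2);

        List<StagingRecord> staged = flatten(7, rows);
        System.out.println(staged.size() + " staging records");
        for (String error : validate(staged, types)) {
            System.out.println(error);
        }
    }
}
```

Because every spreadsheet ends up in the same name/value shape, one validation routine and one error screen can serve all templates. The trade-off is that typed, per-column queries on the staging data become more awkward.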
Next I need to decide where to put the business rules. My dept's mgt firmly believes that all loading of data should be done by Informatica ETL batch processes and therefore the rules/edits belong "in Informatica". I have zero experience with Informatica tools; I am more of a .NET guy. I am therefore unsure how these rules are implemented, but I suspect that they are not reusable in the sense that a .NET web page could use them to validate a particular record. You see, in some cases, when the user is not performing a bulk upload, they do have the ability to edit a specific record, and I would like the same edits that were applied by the ETL bulk insert process to be applied to an individual update of a single record via a web page. Is the solution to write a single web service or stored procedure that can be called either from the web page doing an update of a single record or thousands of times, once for each record in a bulk upload? The latter sounds inefficient.
Your thoughts on anything above would be very much welcomed.
From a cost perspective, the efforts you'll need to go through to re-create spreadsheet functionality on the web will exceed the cost of Farpoint or other controls. Even if you made $20 an hour, do you think you could complete a working product in under 2 weeks? I think you have the facts on your side when you discussed maintenance issues if you allow ETL functionality to exist in Excel - you have twice the amount of work to maintain the transformation rules. I think you need to convince management that in order to create a maintainable, robust solution you need some flexible utilities.
Farpoint is a good choice. There is also SpreadsheetGear, a .NET engine that interprets Excel macros and can run on a web server. It has a Win32 control that allows you to create a WinForms solution with very Excel-like interface functionality. Last time I checked there was no web control for the product. It does an excellent job of providing Excel capability for processing large amounts of data.
Good luck. I think you will find a good solution since you seem to have a good grasp of the pros and cons of all the different potential solutions.

Best way to create a search function ASP.NET and SQL server

I have an SQL database with multiple tables, and I am working on creating a searching feature. Other than having multiple queries for the different tables, is there a different way to go about said searching function?
I should probably add that a lot of my content is database driven to make upkeep easier. Lucene will not work for this, correct?
Different approaches to consider:
1) Multiple queries pre-baked, like you described.
2) Dynamic sql that you put together on the fly based on user-entered criteria.
3) If text is involved, full-text search based on SQL Server full text search or Lucene.
In my open source app BugTracker.NET, I do both 2 and 3 (using Lucene.NET).
I documented how I use Lucene.NET here:
http://www.ifdefined.com/blog/post/2009/02/Full-Text-Search-in-ASPNET-using-LuceneNET.aspx
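Approach 2 above (dynamic SQL assembled from user-entered criteria) might look roughly like the following sketch. It is written in Java for illustration and is not how BugTracker.NET does it; the `DynamicSearchQuery` class, the `bugs` table and the column names are all hypothetical. User values only ever become `?` parameters, and column names are checked against a whitelist, so nothing the user types is concatenated into the SQL text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DynamicSearchQuery {
    final String sql;
    final List<String> parameters;

    private DynamicSearchQuery(String sql, List<String> parameters) {
        this.sql = sql;
        this.parameters = parameters;
    }

    /**
     * Builds a parameterized SELECT from the criteria the user filled in.
     * The table name is assumed to be a trusted constant chosen by the caller.
     */
    static DynamicSearchQuery build(String table, Set<String> allowedColumns,
                                    Map<String, String> criteria) {
        List<String> conditions = new ArrayList<>();
        List<String> parameters = new ArrayList<>();
        for (Map.Entry<String, String> c : criteria.entrySet()) {
            String value = c.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // criterion left blank: no condition for it
            }
            if (!allowedColumns.contains(c.getKey())) {
                throw new IllegalArgumentException("Unknown search column: " + c.getKey());
            }
            conditions.add(c.getKey() + " LIKE ?");
            parameters.add("%" + value.trim() + "%");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return new DynamicSearchQuery(sql.toString(), parameters);
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("title", "status", "assignee"));
        Map<String, String> criteria = new LinkedHashMap<>();
        criteria.put("title", "crash");
        criteria.put("status", "");        // left blank by the user
        criteria.put("assignee", "corey");

        DynamicSearchQuery q = build("bugs", allowed, criteria);
        System.out.println(q.sql);
        System.out.println(q.parameters);
    }
}
```

The resulting SQL string and parameter list would then be handed to a prepared statement (or, in ADO.NET, a parameterized command), which keeps the query flexible without opening it up to SQL injection.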
Since you have tagged the question with Asp.net I suppose you want to search your webpages. In that case you can use Indexing Server to perform freetext searches easily that search the generated html and any keywords you have set up.
As Corey Trager suggested, using Lucene.NET is also an option. It has a good reputation of being fast and quite easy to use.
Although the other answers provide good suggestions such as using Lucene, I have much preferred using a custom caching method.
So for a website that I helped create, we cached the searchable data every couple of hours, from many tables, into one simple table with columns such as:
URL
Item/Page Name
Main Keywords
Text Only Contents
Date Updated
I would then write my SQL statement to search this field using different functions to determine the rank.
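The answer does not show its ranking functions, so the following is only a guess at the general idea, done in memory in Java rather than in SQL. The `CachedSearch` class and the weights (3 for a name match, 2 for keywords, 1 for contents) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CachedSearch {

    /** One row of the cached search table (a subset of the columns listed above). */
    static final class Entry {
        final String url;
        final String name;
        final String keywords;
        final String contents;

        Entry(String url, String name, String keywords, String contents) {
            this.url = url;
            this.name = name;
            this.keywords = keywords;
            this.contents = contents;
        }
    }

    private static boolean containsIgnoreCase(String text, String term) {
        return text.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    /** Assumed weighting: name match = 3, keyword match = 2, contents match = 1. */
    static int rank(Entry e, String term) {
        int score = 0;
        if (containsIgnoreCase(e.name, term)) score += 3;
        if (containsIgnoreCase(e.keywords, term)) score += 2;
        if (containsIgnoreCase(e.contents, term)) score += 1;
        return score;
    }

    /** Returns the entries that match at all, best rank first. */
    static List<Entry> search(List<Entry> cache, String term) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : cache) {
            if (rank(e, term) > 0) {
                hits.add(e);
            }
        }
        hits.sort((a, b) -> Integer.compare(rank(b, term), rank(a, term)));
        return hits;
    }

    public static void main(String[] args) {
        List<Entry> cache = new ArrayList<>();
        cache.add(new Entry("/about", "About us", "company", "We sell invoice software."));
        cache.add(new Entry("/invoices", "Invoice help", "invoice, billing", "How to print an invoice."));
        cache.add(new Entry("/contact", "Contact", "email", "Write to us."));
        for (Entry hit : search(cache, "invoice")) {
            System.out.println(hit.url + " (rank " + rank(hit, "invoice") + ")");
        }
    }
}
```

In SQL the same effect could be had by summing weighted `CASE` expressions per column and ordering by the total, but the exact functions used on the original site are not stated.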
You might want to check out this post I wrote on writing full text queries. It's in C#, but it's easily portable, or you can just stick it in a library and use it as is.
How to build an SQL full text index search term in c#
