How to replace a list of words into a word document automatically? - dictionary

I have a custom Dictionary I have made in word and it is about 200 words. it contain the common errors that some people mistype and it didn't exist in the original word dictionary.
how to replace any word in my list in any document automatically. I mean whenever word see the wrong word it replace it with the one existed in my list.
I don't want to auto correct every single word every time I open a document.

I used to to something like this some years ago with Windows scripting host and OLE.
So today this would be kind of old fashioned but I bet it still works. However it requires you to have Word installed.
Since Word is remote controllable via COM, you could use nearly every programming language (C++ works natively but is a native nightmare) to remote control it. Then it would be easy to traverse thru and replace the words. Or invoke the search and replace function multiple times.
200 words can easily be held in your source code.
As the "included" feature Windows Scripting host is ok - you can choose between JScript and VB. Just create a .js file and by default it should be executed with WSH. But why don't you use macros (Essentially the same functionality - only seen thru the Word -GUI). But you may also use C# or Delphi....

Related

count words in word file that was uploaded

is there a way so i can count the words in a word file (all versions) in classic asp or asp.net?
what i need is to know how many words and if possible to make an array of word length and how many from each so words of 1,2,3 letters will get less attention from the code later.
i was thinking of using FSO or something like that but that won't work for docx
i can upload the file with aspupload or any other object if needed. if there is an object that can be bought that will upload and count words i don't have a problem purchasing it
thanks in advance
You have several options -
If you can have office installed on the server and don't require this to be an fast solution, you can try Word Interop. See Word count using Microsoft.Office.Interop.Word. A similar option is to have OpenOffice installed and work with that, never did that myself.
You can use the IFilter interface (http://msdn.microsoft.com/en-us/library/ms691105(v=vs.85).aspx). Microsoft already implemented logic to take Word files and give you access to the inner text, so all you'll have to do is count the words. Look at the first answer here Are IFilters necessary to index full text documents using Lucene.NET and the link it provides or How to extract text from MS office documents in C#. You can also look at http://blogs.msdn.com/b/jasonz/archive/2009/08/31/sample-parsing-content-in-c-using-ifilter.aspx
You can use 3rd party tools, I know there are some out there, but I'm not really familiar with any of them. For example see http://www.aspose.com/.net/word-component.aspx
If you don't really need support for ALL word versions, then there are various ways to work with Word 2007+ files - for example - the official openXML or the open source docx
Option (2) seems like the way to go to me.

Change Word 2013 autocorrect behaviour

This question involves bending Microsoft Word 2013 to one's will.
I have been asked to help fix a problem with Word 2013's autocorrect.
We are working on a spell checker for my native language (Afrikaans), and many Afrikaans words contain a diacritical/umlaut (ë, ö, Ü, etc).
The spell checker consists of a .dic file which is basically just a text file that contains about 508 000 words, and an autocorrect list (.acl) file that is used to automatically replace text as you type.
The spell checker works very well for the most part. It replaces the text as you type, which is the desired effect. The problem is that autocorrect doesn't work with all words.
For example, if I want to type the Afrikaans word 'pêrels' (which means 'pearls'), I should only have to type 'perels' (without the ^ character on the 'e'), and autocorrect should automatically change it to the correct form.
Same with 'reën' (rain). If I type 'reen' (without the umlaut), it is supposed to automatically correct it.
However, in both of the above cases, the words remain unchanged. A red line appears under the words, and when you right-click, you can select the correct word from the pop-up autocorrect menu as shown in the image below.
As you can see, the correct form of the word is the first one in the context menu. I need autocorrect to automatically change the wrong word into the first word that appears in said menu. It should completely ignore the other menu items, and just go with the first word.
My initial instinct was to manually add the words to the *.acl file using a text editor, but the file is encrypted and not readable (I used Notepad++).
I then tried adding them inside Word's autocorrect options menu. However, Word 2013 has a maximum autocorrect memory of 64KB, and the size of the file is already at that maximum. Whenever I add more words, it bombs out and basically wipes the file contents. This doesn't seem like the most efficient strategy anyway, since I would need to manually enter hundreds, if not thousands of autocorrect cases. Ain't nobody got time for that!
What makes this even more complicated (ironically), is that there is no real "program". In other words, this isn't a C# program with source code that I can manipulate. I have the two files mentioned above, and Word's built-in options (which I have already explored). That's it. Nothing else.
I'm stuck. Does anyone have any ideas?
Is it perhaps possible for me to hack Word to increase the autocorrect memory to, let's say, 128 KB? Google hasn't turned up anything of use.
Or, is there a way to set Word to not give the autocorrect context menu, and instead default to the first matching word in the dictionary, as mentioned above?
I can probably write a batch script, C# program, or edit the registry if need be. I just need to know where to start.
Thanks for any help!
In case you are still looking for a solution, you might consider using AutoHotkey (http://www.autohotkey.com). It is a very powerful free open-source utility, and can handle substitutions similar to AutoCorrect. Whenever the built-in program features of Word and others fail to handle my needs, I use AutoHotkey. It has the added benefit of not being tied to any specific program (e.g., Word), so the substitutions can occur anywhere needed. I hope it helps you. I have used and depended on AutoHotkey for years of new Windows versions, new Office versions, and highly recommend having a look. You might even get new ideas about time-saving automation with AutoHotkey. Good luck!

How to feed Word 2010 (.docx) documents/templates with data from MySQL database?

What would be the best approach to replace placeholders in a .docx document (Word 2010) with data coming from a MySQL database?
Can I just open the file using a server side language and do a string replace per each placeholder?
Is there any existing tool/library available?
Thanks
Disclosure: I work for Invantive.
Using Invantive Composition (http://www.invantive.com/products/invantive-composition) you can fill Word documents (letters, legal pleadings, insurancy policies) with data from a database (IBM DB2, Oracle, MySQL, Teradata and SQL Server) and then fully change the contents at will manually. It is intended for real Microsoft Word end-users (both the guys that make the template and the ones that use it) that access the databases through a central webservice and models with queries. Invantive Composition allows nested repeating groups of data and lay-out. Integrates into Microsoft Word using click once.
In the past, I personally have also been using JasperReports (http://community.jaspersoft.com/project/jasperreports-library) to generate letters using the RTF output target of JasperReports. It is free and works fine as long as you do not want to edit the output more than a few words and have Java/SQL development skills. Just as Invantive Composition it works fine for large numbers of different reports.
As long as you can control the environment completely, you can also consider using RTF as intermediate language (not for end-users, only real developers). Save document as RTF, replace parts of the text you need to be replacable, write a webservice that accepts the parameter and dumps back the resulting RTF. Takes some time to generate more complex tables (tables are obviously something invented by the human race after the RTF specification was written :-) This approach only works with very limited number of templates and when you have sufficient developer time available to get it up and running and stabilized.
As an independent reviewer, I have also seen cases where XML templates were used, but the results were not as good as with JasperReports.
**Disclosure: I lead the docx4j project **
There are heaps of existing tools/libraries available!
Yes, you can just do a string replace, but that is a brittle approach, since Word may have split the string across runs.
You can use MERGEFIELDs, or content control data binding.
docx4j supports all three approaches, but content control data binding is the most powerful.
ContentControlsMergeXML
MERGEFIELDs
VariableReplace
One thing to consider especially is "repeats". If you want say a row of a table in Word, for each matching row in your MySQL table, then you need a way to make this happen.
docx4j does this with a "repeat" content control around the table row; whichever solution you choose, I'd make sure up front that it can handle repeats.
If you want to use PHP the most complete available solution is PHPDocX.
You may check in the tutorial how to substitute placeholder variables by data coming from any data source (like a MySQL DB).
In particular, you may populate table rows with an indefinite number of entries and you may delete whole blocks of the Word document depending on the data fed to the application or build dynamical Word charts.
You may check the available DEMO for a simple but quite illustrative example (its inner workings are explained in the tutorial section).
You can use open Open XML SDK and replace your placeholders like this.
Disclosure: I lead the docxgenjs project
I think you shouldn't have to code everything by yourself, that's why I created a Mustache-like templating engine for docx
Demo:
http://javascript-ninja.fr/docxgenjs/examples/demo.html
Repo
https://github.com/edi9999/docxgenjs
It is JS-based and works client and server side.
Yes, you can use server side language to do it.
Check on apache POI.
http://poi.apache.org
Hello I read the above esp the comments and Ivantive looks impressive - but the solution I needed was much simpler. Use Selection.Range.InsertDatabase in Word to fetch records from an access database or excel spreadsheet or even just another word document. With the access solution you can choose the layout of the records to fetch and have it fetch just particular recordds based on a field (eg ID). Google the words above and it'll take you to MS guidance and an example VB script. Worked well in just a few mins. Now looking for VB script that asks the person what ID they want from the dbase and we're done.
it uses docx templates that have merge fields with java objects (the objects have the information you load from mysql or any other source). The xdoc report is an project for java language, the home page of the project is https://code.google.com/p/xdocreport/.
*Disclosure: I create the templ4docx project *
Hello
You can use templ4docx java library, which is on maven central repository, so you can just add it to your maven dependencies:
<dependency>
<groupId>pl.jsolve</groupId>
<artifactId>templ4docx</artifactId>
<version>2.0.0</version>
</dependency>
Example usage:
Docx docx = new Docx("E:\\template.docx");
Variables variables = new Variables();
variables.addTextVariable(new TextVariable("${firstName}", "John"));
variables.addTextVariable(new TextVariable("${lastName}", "Sky"));
docx.fillTemplate(variables);
docx.save("E:\\filledTemplate.docx");
More details you can find here: http://jsolve.github.io/java/templ4docx/

How to find all strings from a list of strings that do NOT appear in any of my source files?

I've got a bit of a messy DB. There are a lot of stored procedures that I strongly suspect are not used. I can easily get all their names in a text file, one name per line. Now I would like to search all through my code files to find which ones are mentioned and, more importantly, which ones are not.
How could I do such a thing? I'm using Windows 7, Visual Studio 2008 - if it matters.
Write your own small app (console app would do).
read your file, get the procedure-names inside a list, and then for each name search your code file, and log the name if you did not find the nam einside your code-files.
See System.IO.FileStream class for the easy (but not that fast) way of search a file.

Replace text in Word Document via ASP.NET

How can I replace a string/word in a Word Document via ASP.NET? I just need to replace a couple words in the document, so I would like to stay AWAY from 3rd party plugins & interop. I would like to do this by opening the file and replacing the text.
The following attempts were made:
I created a StreamReader and Writer to read the file but I think that I am reading and writing in the wrong format. I think that Word Documents are stored in binary?? If word documents are binary, how would I read and write the file in binary?
Dim template As String = Request.MapPath("documentName.doc")
If File.Exists(template) Then
Dim sr As New StreamReader(template)
Dim content As String = sr.ReadToEnd()
sr.Close()
Dim sw As New StreamWriter(template)
content = content.Replace("# T O D A Y S D A T E", Date.Now.ToString("MM/dd/yyyy"))
sw.Write(content)
sw.Close()
Else
Word binary format is proprietary to Microsoft. The specification to read the binary format is complex and will take you ages to learn about the document structure and the internal bit and byte structure. I really dont think you will save yourself anytime going down this path, so consider the below:
Use Open XML
Automate Word
Use third party library like Aspose
Use RTF rather than Doc. You can then look for specific RTF tag with your text and replace it with another set of RTF text block. This is probably the simplest for what you want to do if RTF is an acceptable format.
Personal experience, automating Word isn't as bad as it sounds. It is really not suitable for server high volume environment, but for smaller load, it works well of course if you write your code well to manage the application object and handling exceptions.
EDITED: Corrected about my initial NDA comment mentioned. This was the case when I worked on this back in 2005/6 and didnt realize Microsoft had decided to publish that in the recent year.
Lots of choices:
Some of them expensive (Apose)
Some of them hard (binary formats)
Some of them require Interop (VSTO)
or newer formats (Open XML)
Some of them not mentioned yet, like
running Word on the server and just
writing to that (not recommended by
MSFT, but probably your only real
choice for a) cheap, b) simple
OfficeWriter.
If word documents are binary, how would I read and write the file in binary?
They are, and that's why you should use a third party library to program against them.
I would like to stay AWAY from 3rd party plugins & interop
This requirement makes the task extremely hard. If your documents are in the "old Word format" (.doc), I will almost say that you are out of luck. If you can use Word 2007 documents (.docx) instead, you should be able to solve the problem by unzipping the file (it's essentially a ZIP archive), do search/replace in contained XML files and zip the document up again.
See also: Generating a Word Document with C#
You could perform Word automation on the server to easily do it, but that route is fraught with danger. Automation is not designed to run server side and you will find it regularly hangs when Word pop's up a prompt or confirmation box waiting for input that nobody can see.
You have to make a trade off, use Word automation and accept it may hang pretty regularly (anything from daily to weekly), or buy a third party solution. I use Aspose and it has solved a lot of problems.

Resources