Change Word 2013 autocorrect behaviour - dictionary

This question involves bending Microsoft Word 2013 to one's will.
I have been asked to help fix a problem with Word 2013's autocorrect.
We are working on a spell checker for my native language (Afrikaans), and many Afrikaans words contain a diacritic such as an umlaut or circumflex (ë, ö, Ü, etc.).
The spell checker consists of a .dic file which is basically just a text file that contains about 508 000 words, and an autocorrect list (.acl) file that is used to automatically replace text as you type.
The spell checker works very well for the most part. It replaces the text as you type, which is the desired effect. The problem is that autocorrect doesn't work with all words.
For example, if I want to type the Afrikaans word 'pêrels' (which means 'pearls'), I should only have to type 'perels' (without the circumflex on the 'e'), and autocorrect should automatically change it to the correct form.
Same with 'reën' (rain). If I type 'reen' (without the umlaut), it is supposed to automatically correct it.
However, in both of the above cases, the words remain unchanged. A red line appears under the words, and when you right-click, you can select the correct word from the pop-up autocorrect menu as shown in the image below.
As you can see, the correct form of the word is the first one in the context menu. I need autocorrect to automatically change the wrong word into the first word that appears in said menu. It should completely ignore the other menu items, and just go with the first word.
My initial instinct was to manually add the words to the *.acl file using a text editor, but the file is encrypted and not readable (I used Notepad++).
I then tried adding them inside Word's autocorrect options menu. However, Word 2013 has a maximum autocorrect memory of 64KB, and the size of the file is already at that maximum. Whenever I add more words, it bombs out and basically wipes the file contents. This doesn't seem like the most efficient strategy anyway, since I would need to manually enter hundreds, if not thousands of autocorrect cases. Ain't nobody got time for that!
What makes this even more complicated (ironically), is that there is no real "program". In other words, this isn't a C# program with source code that I can manipulate. I have the two files mentioned above, and Word's built-in options (which I have already explored). That's it. Nothing else.
I'm stuck. Does anyone have any ideas?
Is it perhaps possible for me to hack Word to increase the autocorrect memory to, let's say, 128 KB? Google hasn't turned up anything of use.
Or, is there a way to set Word to not give the autocorrect context menu, and instead default to the first matching word in the dictionary, as mentioned above?
I can probably write a batch script, C# program, or edit the registry if need be. I just need to know where to start.
Thanks for any help!

In case you are still looking for a solution, you might consider using AutoHotkey (http://www.autohotkey.com). It is a very powerful, free, open-source utility that can handle substitutions similar to AutoCorrect. Whenever the built-in features of Word and other programs fail to handle my needs, I use AutoHotkey. It has the added benefit of not being tied to any specific program (e.g., Word), so the substitutions can occur anywhere you need them. I have used and depended on AutoHotkey for years, across new Windows and Office versions, and highly recommend having a look. You might even get new ideas about time-saving automation with AutoHotkey. I hope it helps you. Good luck!
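For example, a minimal sketch of two AutoHotkey hotstrings (standard v1 syntax; the two word pairs are just the examples from the question, and a small script could generate thousands of such lines from the .dic word list):

    ; Save the script as UTF-8 with BOM so the accented characters are read correctly.
    ; Each line means: when the text on the left is typed and followed by an ending
    ; key (space, punctuation, Enter), replace it with the text on the right.
    ::perels::pêrels
    ::reen::reën

Because the replacement happens at the keyboard level, it works in Word, Outlook, the browser, or anywhere else you type.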

Related

VSCode: mvbasic extension on editing Unidata code with MV marks in code, i.e. CHAR(253), CHAR(254)

I have searched for a setting within the mvbasic extension within VSCode but I may have hit a dead end. I am new to using VSCode with the rocket mvbasic extension and still in the learning process, so please bear with me.
Our development has, for the most part, always been done directly on the server, using its editor to code and develop on a Unix/AIX platform with Unidata. Some of our code has array assignments with CHAR(253)/CHAR(254) characters within them. See the link to the image that shows how it's done. I didn't write this code; the original software developer did it many years ago, and we just aren't going to go and change it all.
How code looks on actual server
The issue is that when I pull the code into VSCode to edit, the extension changes those characters. I uploaded it back without paying attention, it went into production incorrectly, and that created a few bugs.
ALIST="H�V�P�R�M�D"
How code looks in VSCode
How code looks after uploaded back to server from VSCode
Easy to fix, no biggie, but now to my question.
Does anyone else have this issue, or can anyone point me in a direction? Maybe I need to create a setting to keep the characters in the correct ASCII format so that this doesn't happen again by mistake.
VSCode defaults to the sane choice for character encoding in 2022 (UTF-8), but sometimes you have to deal with legacy stuff.
https://code.visualstudio.com/docs/editor/codebasics#_file-encoding-support
If you click on the UTF-8 indicator in the bottom right corner, you can choose "Reopen with Encoding":
After that, you can select a different encoding. I chose DOS (CP437) as a guess; literal MV characters are then displayed as superscript two (²), and I can save to the server and confirm those characters remain as #VM after a round trip (though in my terminal emulator they appear as }, which is useful).
You can edit your preferences and set "files.encoding": "cp437". One other thing that can be helpful, if your programs don't have a standard extension like .bas (most don't), is to set the default mode to basic so that most of what you're editing is identified as MVbasic; a quick Ctrl+K M switches to any other mode if you're just pasting in something else, like SQL.
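In settings.json that could look roughly like this (a sketch; the "mvbasic" language id and the *.bp file pattern are guesses on my part, so check what the Rocket extension actually registers):

    {
        // Open and save files as CP437 so the CHAR(253)/CHAR(254) marks survive a round trip.
        "files.encoding": "cp437",
        // If your programs use a non-standard extension, one way to get them recognised
        // as MV BASIC is to map that pattern to the extension's language id.
        "files.associations": {
            "*.bp": "mvbasic"
        }
    }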
Some useful links - the Rocket forums are helpful and the folks there are always super nice
https://community.rocketsoftware.com/forums/multivalue?CommunityKey=521bce2e-71d5-4d32-b560-dfa95e950eb5
The MV Extensions community is a good group and has always been helpful when I've had issues. I've made some small contributions - they're very open. I prefer their extension, but honestly haven't done a deep comparison.
https://github.com/mvextensions

Is there a way in Windows 10 to convert a hexadecimal code to its symbol regardless of the program?

I've read many pages that point out that many office applications allow for this by typing the code followed by Alt + X, but frequently, I want to insert a symbol when I'm not in one of those applications. Is there a universal way to achieve this?
The character map is useless, unless you have time to manually search through all the characters available.
I posted the question at Super User, and basically, the response I got there was to use Alt codes for the symbols. However, I discovered that, on the whole, these only work for the first 256 Alt codes. So basically, the answer to my question is "No, there's not a good way."

How to replace a list of words in a Word document automatically?

I have a custom dictionary that I made in Word, and it is about 200 words. It contains the common errors that some people mistype and that don't exist in the original Word dictionary.
How can I replace any word from my list in any document automatically? I mean, whenever Word sees the wrong word, it should replace it with the one in my list.
I don't want to correct every single word by hand every time I open a document.
I used to do something like this some years ago with Windows Scripting Host and OLE.
Today this would be considered kind of old-fashioned, but I bet it still works. However, it requires you to have Word installed.
Since Word can be remote-controlled via COM, you could use nearly any programming language to drive it (C++ works natively, but is a nightmare for this). Then it would be easy to traverse the document and replace the words, or to invoke the search-and-replace function multiple times.
200 words can easily be held in your source code.
As the "included" feature Windows Scripting host is ok - you can choose between JScript and VB. Just create a .js file and by default it should be executed with WSH. But why don't you use macros (Essentially the same functionality - only seen thru the Word -GUI). But you may also use C# or Delphi....

Interpreting Search Results

I am tasked with writing a program that, given a search term and the HTML source of a page representing search results from some unknown search engine (it can really be anything: a blog, a shop, Google, eBay, ...), needs to build a data structure of the results containing "what's in the results": a title for each result, the "details" link, the position within the results, etc. It is not known whether the results page contains any of this data at all, or whether there are any search results. The goal is to feed the data structure into another program that extracts meaning.
What I am looking for is not BeautifulSoup or a regexp, but rather some clever ideas or algorithms on how to interpret the HTML source. What do I do to find out what part of the page constitutes a single result item? How do I filter out the markup noise to extract the important bits? What would you do? Pointers to fields of research covering what I am trying to do are also greatly appreciated.
Thanks, Simon
I doubt that there exists a silver-bullet algorithm that will just work on any arbitrary search output without any training.
However, this task can be solved, and it actually is solved in many applications, just with a different approach. First you have to define the general structure of a single search result item based on what you are actually going to do with it (it could be a name, date, link, description snippet, etc.), and then write a number of HTML parsers that extract the necessary fields from the search result output of particular web sites.
I know it is not a super sexy solution, but it is probably the only one that works, and it is not rocket science. Writing parsers is actually extremely simple; you can make a dozen per day. If you look into the HTML source of a search result, you will notice that the output results are typically very structured and marked with specific div sections or class attributes, so they are very easy to find in the document. You don't even have to use any complicated HTML parsing library for that; something grep-like will be enough.
For example, on this particular page your question starts with <div class="post-text"> and ends with </div>. Everything in between is the post text, with some HTML formatting that you may want to remove along with extra spaces and "\n". And this <div class="post-text"> appears on the page only once.
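As a rough illustration of that idea in Python (standard library only; the "post-text" class is just the example above, and a real parser would be tuned per target site):

    from html.parser import HTMLParser

    class PostTextExtractor(HTMLParser):
        """Collect the text of every <div class="post-text">...</div> block."""

        def __init__(self):
            super().__init__()
            self.depth = 0        # > 0 while inside a post-text div
            self.results = []     # extracted text, one entry per block
            self._buf = []

        def handle_starttag(self, tag, attrs):
            if tag != "div":
                return
            if self.depth:
                self.depth += 1   # nested div inside the block
            elif "post-text" in (dict(attrs).get("class") or "").split():
                self.depth = 1
                self._buf = []

        def handle_endtag(self, tag):
            if tag == "div" and self.depth:
                self.depth -= 1
                if self.depth == 0:
                    self.results.append(" ".join("".join(self._buf).split()))

        def handle_data(self, data):
            if self.depth:
                self._buf.append(data)

    parser = PostTextExtractor()
    with open("results.html", encoding="utf-8") as f:   # placeholder file name
        parser.feed(f.read())
    for position, text in enumerate(parser.results, 1):
        print(position, text[:80])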
Once you go to a large scale with your retrieval application, you will find out that there is not that big a variety of search engines across different sites, and you will be able to re-use already-created parsers for sites using similar search engines.
The only thing you have to remember is built-in self-testing. Sites tend to upgrade and change their design from time to time. If your application is going to live for some time, you will need to include in your parsers some logic that checks the validity of their results and notifies you whenever the search output has changed and is no longer compatible with your parser. Then you will have to modify the particular parser or write a new one.
Hope this helps.

How to determine which file in the previous version a text block of a file in the current version comes from?

The problem is described below:
Suppose I have a list of files in one version (say A, B, C, D). In the next version I have the following files (A, E, F, G). There are some similarities in their contents. The files in the later version come from the previous version by file renaming, content addition, deletion, or partial modification, or without any change (for example, A is not changed).
I take a block of text from a file (E, in the 2nd version) and check which files in the 1st version contain this text block. I found that B, C, and D contain the text fragment. I want to determine which file (B, C, or D) this text block actually comes from. (I assume that E is a file whose name changed in the second version.)
Since content may be changed, added, or deleted in the later version, I use the LCS algorithm to determine similarity. But I cannot map the file to its previous version.
I think one possible approach might be to use the location information of the matching text blocks, but this heuristic does not always work. Does any research or algorithm exist for this? Any direction will be helpful. Thanks in advance.
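For concreteness, here is a rough sketch in Python of the kind of scoring I mean, using difflib's LCS-style matcher (the file names are placeholders for B, C, D and for the block taken from E):

    import difflib

    def best_source(block_lines, candidates):
        """Return the candidate file whose lines are most similar to the block."""
        scores = {
            name: difflib.SequenceMatcher(None, old_lines, block_lines).ratio()
            for name, old_lines in candidates.items()
        }
        return max(scores, key=scores.get), scores

    block = open("E.txt", encoding="utf-8").read().splitlines()
    candidates = {
        name: open(name, encoding="utf-8").read().splitlines()
        for name in ("B.txt", "C.txt", "D.txt")
    }
    print(best_source(block, candidates))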
I think it may be helpful to take a look at Subversion, and its capability to track file renaming between versions. http://svnbook.red-bean.com/
It's tried and tested, because it's used by so many developers. Renaming has to be done using Subversion tools, but there are many of those (command line, file explorer integration for different OSes, GUIs, IDEs, you name it). It also covers moving files between directories, and merging several lines of changes (branches).
