Comment in aspell .dic files?

They look like this:
abaft
abbreviation/M
abdicate/DNGSn
Abelard/M
abider/M
Abidjan
ablaze
abloom
I am using this kind of dictionary with a Node.js application, but I need it to be smarter. Specifically, I want to remember the occurrence probability of every word based on already-processed text. I'd like to save this information in the existing .dic file - but how can I do that without making it invalid?
Is there any comment syntax that would allow me to store additional data next to the words in the file, such that a normal dictionary parser will ignore it?
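If it turns out there is no comment syntax you can rely on, one possible workaround (not something the .dic format promises, just a sketch) is to leave the .dic file untouched and keep the occurrence counts in a sidecar file keyed by the bare word. A minimal sketch in Python - the same idea ports directly to a Node.js app; the file names and the "/" flag convention are assumptions based on the sample above:

import json
from collections import Counter
from pathlib import Path

DIC_FILE = Path("en.dic")           # hypothetical dictionary file
FREQ_FILE = Path("en.freq.json")    # hypothetical sidecar file holding the counts

def load_words(dic_path):
    # Read the word list, ignoring any affix flags after "/".
    words = set()
    for line in dic_path.read_text(encoding="utf-8").splitlines():
        word = line.split("/", 1)[0].strip()
        if word:
            words.add(word)
    return words

def update_counts(text, words, freq_path):
    # Count occurrences of known words in the text and merge them into the sidecar file.
    counts = Counter(json.loads(freq_path.read_text())) if freq_path.exists() else Counter()
    for token in text.split():
        if token in words:
            counts[token] += 1
    freq_path.write_text(json.dumps(counts, indent=2, ensure_ascii=False))
    return counts

words = load_words(DIC_FILE)
update_counts("the abbreviation was ablaze", words, FREQ_FILE)

The dictionary stays valid for any parser because it is never modified; the probabilities can be derived from the counts on demand.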

Related

Open an XML file without knowing its complete name and parse the XML

I am using Robot Framework with RIDE, and for a test I need to find an XML file on my computer and open it to parse the XML and be able to use the data.
The thing is that I don't know the exact name of the file; the format is numberNameOfTheFile, so it could be 1NameOfTheFile or 25NameOfTheFile.
How can I use regexp in my keyword? Or any other way to achieve this?
Thank you
How would you do it manually - how would you pick the file to use for the verification?
I presume you are going to look at all the files that match a specific name pattern; in Robot Framework you can do that with OperatingSystem's List Files In Directory keyword, which supports passing a name pattern:
${the files}= List Files In Directory /the/path/to/the/dir *NameOfTheFile.xml
Now you have a list object with the filenames that match. If it's empty, there's no such file, which may be a problem (that depends on your test/requirements). If it has a single member - great, that's your file.
And if there are multiple files - that's another "problem". How would you pick the right file manually? It could be that the newest file is the target one - for that you would go over all of them and find it through OperatingSystem's Get Modified Time; or it could be the largest; or the number in its prefix could be the biggest. This really depends on your requirements and what you are trying to achieve.
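For illustration, here is the same selection logic sketched outside Robot Framework, in plain Python; the directory, the *NameOfTheFile.xml pattern, and the "pick the newest" rule are just the assumptions from the example above:

import glob
import os
import xml.etree.ElementTree as ET

# Hypothetical directory and pattern from the question: <number>NameOfTheFile.xml
candidates = glob.glob("/the/path/to/the/dir/*NameOfTheFile.xml")

if not candidates:
    raise FileNotFoundError("no file matching *NameOfTheFile.xml")

# One possible rule: take the most recently modified file
newest = max(candidates, key=os.path.getmtime)

# Parse it so the data can be used in the test
root = ET.parse(newest).getroot()
print(newest, root.tag)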
"How would you do it manually" is probably the most important question to ask. Think and break down to steps the individual tasks you would do, and now you have the algorithm; see how to put that in code - and presto, the implementation. This applies to scripts, test cases, and business process automation (e.g. software).
I was tempted to mark the question for closing, because precisely this - the algorithm - was missing, only the end goal is stated - while SO is for helping in the implementation part. But, here we are :)

How to access a specific part of data as input to AWK

Suppose I want to access an online dictionary and need to look up a specific word. I would like to have only the relevant part of the data - the word and its translation - as input to AWK. Any idea?
In other words, I just want to have a small portion of the data on my machine. How can I avoid downloading all the data and hopefully save space and time? Is there any way to do so without downloading all the data to the local machine?
This question is related to my last question here.
Edit 1:
I selected a dictionary as an example because when you want to look up a word, it is enough to access a specific part of the data and there is no need to process the whole of it.
I am not an expert in programming, so I was thinking I could modify this answer to make it work (that is why I added the AWK tag again). I don't use any specific OS or tool; this is just a basic idea to see what the possibilities are, so I don't know how I can improve the tags.
awk cannot download. You must download the file and pipe it into a command that terminates as soon as it finds a result:
wget -qqO- http://example.com/path |grep -wim1 "word"
wget -qqO- URL will produce no output other than the content of the given URL, which is placed on standard out so you can then parse it. grep -wim1 "word" will find the first bounded word matching "word" and then terminate. If you don't need the match printed, you can use -wiq instead. If the dictionary has one word per line (and nothing else), you're better off with -x instead of -w so that you match "can" in its entirety rather than "can't" (' is a word boundary). Remove the -i if you want case-sensitive matching.
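The same "stop as soon as a match is found" idea can be sketched in Python with only the standard library; the URL is just the placeholder from the command above, and this assumes a one-word-per-line dictionary (like grep -x):

import urllib.request

def find_word(url, word):
    # Stream the response line by line and stop at the first whole-line match.
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            line = raw.decode("utf-8", errors="replace").strip()
            if line == word:
                return line
    return None

print(find_word("http://example.com/path", "word"))

Like the grep pipeline, this still downloads the file up to the point of the match, but nothing beyond it.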
In the comments, you asked:
it may improve things to jump to the start of the "w" section, so as not to download all the data from "a" to "w". Is that possible? I guess not
Some programs can "resume" downloads and you may be able to play with that, but you'd have to guess where to start. This would be a lot of work and you might seek too far and therefore fail to get a match.
If you are querying this dictionary more than once, I'd recommend downloading it and saving it so you can query it locally. Even the largest dictionary I know of is only 213MB (compressed, search it with zgrep), though I am assuming you're talking about a traditional word list rather than a hash table or other arbitrary data form. Of course, anything much larger would take such a long time to download that you'd only want to do it once.
If you really don't want to store it locally, you should probably consider a database rather than a flat file.

How to replace a list of words in a Word document automatically?

I have a custom dictionary I have made in Word, and it is about 200 words. It contains the common errors that some people mistype and that don't exist in the original Word dictionary.
How do I replace any word from my list in any document automatically? I mean, whenever Word sees the wrong word, it should replace it with the one in my list.
I don't want to auto-correct every single word every time I open a document.
I used to do something like this some years ago with Windows Scripting Host and OLE.
Today this would be kind of old-fashioned, but I bet it still works. However, it requires you to have Word installed.
Since Word is remote-controllable via COM, you could use nearly any programming language (C++ works natively but is a native nightmare) to remote-control it. Then it would be easy to traverse the document and replace the words, or to invoke the search-and-replace function multiple times.
200 words can easily be held in your source code.
As the "included" feature Windows Scripting host is ok - you can choose between JScript and VB. Just create a .js file and by default it should be executed with WSH. But why don't you use macros (Essentially the same functionality - only seen thru the Word -GUI). But you may also use C# or Delphi....

BizTalk: identify message

In my case, I need to parse a bunch of text files and search for specific strings in each. Each text file is formatted differently, so I can't create a generic flat file schema (or can I?).
Is there a way to simply parse the text in each file, and then use orchestration to make decisions based on the result of the search?
This thread answers my question:
MSDN Forum: Multiple flat files on single rcv location, which recommends using different receive locations and file masks to distinguish the different files.

Merge translation files (.ts) with existing .ts files using QT Utilities (lconvert)

Here's my problem: we've got .ts files for nine different languages for our product. We've added about 100 new strings that need to be translated, but some are for our next release, and some are for the release after that. We've run into problems with translators missing strings or translating strings ahead of time. We want to be able to send them a smaller .ts file containing only the strings we want translated now, and then merge that .ts file into the larger .ts file containing the rest of the translation.
Our translators are required to use QT Linguist (previously we let them edit the raw XML with less than stellar results).
One solution would be to use contexts, but our dev team is not very keen on that idea. Another would be to merge the .ts files by hand, but that seems like a recipe for cut & paste errors.
Is there a method with lupdate and the project file to add or merge secondary .ts files? I've read through the forums in Qt-land without finding the answer, but the switches in lupdate allude to being able to point to other translation files. Specifically the -pro switch, which says:
-pro <filename>
Name of a .pro file. Useful for files with .pro file syntax but
different file suffix. Projects are recursed into and merged.
Example1: we have a German .ts file, we want to add 20 strings from a separate German translation file such that the primary translation file contains all the strings including the 20 new ones.
Example2: we have a German .ts file, we want to add 20 strings from a separate German translation file such that the secondary translation file will be merged with the primary during lupdate so that the resultant .qm file contains all the strings including the 20 new ones.
Has anyone done either of these (and either would work) and can you give me some insight?
The answer doesn't use lupdate; it lies in another utility called lconvert. It's quite easy to create a secondary file that contains only the strings you're interested in (and delete those same strings from the primary file), then run:
lconvert -i primary.ts secondary.ts -o complete.ts
This will take all the strings from the two input files and put them together into the output file. Using this method I was able to produce a file with zero differences (other than the timestamp) from the original file that I had split into the primary and secondary files.
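If you have to do this for all nine languages, the lconvert call is easy to script; a small sketch, where the app_<lang>_primary.ts / _secondary.ts naming scheme is only an assumption:

import subprocess

languages = ["de", "fr", "es"]  # ...and the rest of your nine language codes

for lang in languages:
    primary = f"app_{lang}_primary.ts"      # full translation file
    secondary = f"app_{lang}_secondary.ts"  # the small file sent to translators
    complete = f"app_{lang}.ts"             # merged result
    # Same lconvert invocation as above, once per language
    subprocess.run(["lconvert", "-i", primary, secondary, "-o", complete], check=True)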
This question didn't get a lot of attention, but maybe someone will have this same problem and this will help.
Thanks for this tip. It seems to work properly for my case:
I tried to extract the updated and new strings from my project, which is currently under translation in an older version/release for which I do not already have the translated strings.
The problem was to send only the new/updated strings to the translators.
I set the older strings to status resolved, added the new strings using lupdate, ran a search in OxygenXML Editor with the XPath "/TS/context/message[not(translation/@type)]" to delete the older strings, and cleaned it of useless blanks and carriage returns.
I tried a merge using lconvert with your solution, in order to merge the translated strings, older and newer. It passes lrelease correctly and the strings are displayed properly.
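The extraction step (dropping the already-translated messages so only the new/unfinished ones go to the translators) can also be sketched with Python's standard XML library instead of an XPath search in an editor; the file names are placeholders, and note that ElementTree will not preserve the .ts DOCTYPE, so treat this as a starting point only:

import xml.etree.ElementTree as ET

tree = ET.parse("app_de.ts")   # placeholder input file
root = tree.getroot()          # the <TS> element

for context in root.findall("context"):
    for message in list(context.findall("message")):
        translation = message.find("translation")
        # A finished translation has no "type" attribute; remove it so only
        # new/unfinished strings are left for the translators.
        if translation is not None and "type" not in translation.attrib:
            context.remove(message)

tree.write("app_de_secondary.ts", encoding="utf-8", xml_declaration=True)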
