English dictionary needed for a word game

I'm looking for a way to include a full-blown English dictionary in an iPhone app (a word game). The database must include all conjugated forms of verbs as well as singular and plural spellings, so that my app can query it to check whether a spelling is correct.
Is there a free or commercial database that would include those data?

You can use UITextChecker for spell-checking.
Regarding a dictionary: when I built an iOS dictionary library some time ago (www.lexicontext.com), I used WordNet. WordNet contains a lot of interesting semantic info ...

NSSpellChecker is your easiest option (on iOS the equivalent is UITextChecker), but it might be more complete to also use the official online Scrabble dictionary and check each word against both (only one match required).
You could make a web-service request to http://www.hasbro.com/scrabble/en_US/search.cfm

http://www.a2zwordfinder.com/cgi-bin/scrabble.cgi?Letters=&Pattern=______&MatchType=Exactly&MinLetters=3&SortBy=Alpha&SearchType=Scrabble
Change the MinLetters parameter to get different results.

The best place to find a database for a spell checker is probably a free text-processing application, so I'd try OpenOffice's equivalent of Word (Writer). Download it, install it and simply find the dictionary file.
OpenOffice is licensed under the LGPL, so it should be fine; just check whether the licence also covers the data (i.e. the dictionary file).
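One caveat if you go the OpenOffice route: as far as I know its spelling dictionaries are Hunspell `.dic`/`.aff` pairs. The `.dic` file lists stems with affix flags (e.g. `walk/GSD`), and the plurals and conjugations are produced by the rules in the `.aff` file. The Java sketch below (class and method names are my own, and escaped slashes `\/` are not handled) only loads the stems. That is enough to inspect the file, but not enough for the word-game requirement: "walked" will not be found unless you expand the affixes first, for example with Hunspell's `unmunch` tool. The same logic ports directly to Objective-C or Swift.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DicWordList {

    /**
     * Loads the stems from a Hunspell-style .dic file. The first line holds an
     * approximate entry count; every other line is "stem[/FLAGS][ extra fields]".
     * Affix flags are dropped, so inflected forms are NOT generated.
     */
    public static Set<String> loadStems(Reader dic) {
        Set<String> stems = new HashSet<>();
        try (BufferedReader in = new BufferedReader(dic)) {
            in.readLine(); // skip the entry-count header
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                String entry = line.trim();
                int end = entry.length();
                int slash = entry.indexOf('/');
                if (slash >= 0) end = slash; // "walk/GSD" -> "walk"
                for (int i = 0; i < end; i++) { // cut morphological fields after whitespace
                    if (Character.isWhitespace(entry.charAt(i))) { end = i; break; }
                }
                if (end > 0) stems.add(entry.substring(0, end).toLowerCase(Locale.ROOT));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stems;
    }

    /** Case-insensitive membership test, e.g. for validating a player's word. */
    public static boolean isWord(Set<String> stems, String candidate) {
        return stems.contains(candidate.trim().toLowerCase(Locale.ROOT));
    }
}
```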

Maybe this English corpus helps: http://www.wordfrequency.info/free.asp

Related

Count words in an uploaded Word file

Is there a way to count the words in a Word file (all versions) in classic ASP or ASP.NET?
What I need is the total word count and, if possible, an array of word lengths with how many words have each length, so that words of 1, 2 or 3 letters get less attention from the code later.
I was thinking of using FSO or something similar, but that won't work for .docx.
I can upload the file with AspUpload or any other component if needed. If there is a component for sale that uploads files and counts words, I don't mind purchasing it.
Thanks in advance.
You have several options:
1. If you can have Office installed on the server and don't need this to be a fast solution, you can try Word Interop. See "Word count using Microsoft.Office.Interop.Word". A similar option is to have OpenOffice installed and work with that, though I've never done that myself.
2. You can use the IFilter interface (http://msdn.microsoft.com/en-us/library/ms691105(v=vs.85).aspx). Microsoft has already implemented the logic to take Word files and give you access to the inner text, so all you have to do is count the words. Look at the first answer to "Are IFilters necessary to index full text documents using Lucene.NET" and the link it provides, or at "How to extract text from MS office documents in C#". You can also look at http://blogs.msdn.com/b/jasonz/archive/2009/08/31/sample-parsing-content-in-c-using-ifilter.aspx
3. You can use third-party tools; I know there are some out there, but I'm not really familiar with any of them. For example, see http://www.aspose.com/.net/word-component.aspx
4. If you don't really need support for ALL Word versions, there are various ways to work with Word 2007+ files, for example the official Open XML SDK or the open-source DocX library.
Option (2) seems like the way to go to me.
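For the Word 2007+ route, it may help to see how little is needed: a .docx is a zip archive whose body text lives in `word/document.xml`, in `w:t` elements grouped into `w:p` paragraphs. The sketch below is Java rather than ASP.NET (in .NET the same steps map onto System.IO.Compression and System.Xml) and uses only the JDK (Java 9+ for `readAllBytes`). It extracts the text and then builds the word-length histogram the question asks for. Assumptions: only the main body is read (no headers, footers, footnotes or comments), text boxes nested inside a paragraph may be counted twice, a "word" is any whitespace-separated token after trimming surrounding punctuation, and the old binary .doc format is not handled at all.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DocxWordCount {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Extracts the body text of a .docx (a zip archive), one line per w:p paragraph. */
    public static String extractText(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (!entry.getName().equals("word/document.xml")) continue;
                byte[] xml = zip.readAllBytes(); // reads to the end of this entry only
                DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
                f.setNamespaceAware(true);
                // uploaded files are untrusted: refuse DOCTYPEs (blocks XXE tricks)
                f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                StringBuilder text = new StringBuilder();
                NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    NodeList parts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "*");
                    for (int j = 0; j < parts.getLength(); j++) {
                        Node n = parts.item(j);
                        String name = n.getLocalName();
                        if (name.equals("t")) {
                            text.append(n.getTextContent()); // runs of one word are joined as-is
                        } else if ((name.equals("tab") || name.equals("br"))
                                && "r".equals(n.getParentNode().getLocalName())) {
                            text.append(' '); // in-run tab/break separates words
                        }
                    }
                    text.append('\n');
                }
                return text.toString();
            }
            throw new IllegalArgumentException("no word/document.xml: not a .docx file?");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalArgumentException("could not parse word/document.xml", e);
        }
    }

    /** Maps word length -> number of words with that length (sorted by length). */
    public static Map<Integer, Integer> lengthHistogram(String text) {
        Map<Integer, Integer> histogram = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            // trim leading/trailing punctuation; inner characters stay, so "don't" is one word
            String word = token.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!word.isEmpty()) histogram.merge(word.length(), 1, Integer::sum);
        }
        return histogram;
    }

    public static int wordCount(Map<Integer, Integer> histogram) {
        return histogram.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```

Words of one to three letters can then be de-emphasised by reading `histogram.get(1)` to `histogram.get(3)`.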

How to feed Word 2010 (.docx) documents/templates with data from MySQL database?

What would be the best approach to replace placeholders in a .docx document (Word 2010) with data coming from a MySQL database?
Can I just open the file using a server side language and do a string replace per each placeholder?
Is there any existing tool/library available?
Thanks
Disclosure: I work for Invantive.
Using Invantive Composition (http://www.invantive.com/products/invantive-composition) you can fill Word documents (letters, legal pleadings, insurance policies) with data from a database (IBM DB2, Oracle, MySQL, Teradata and SQL Server) and then freely edit the contents manually. It is intended for real Microsoft Word end users (both the people who make the templates and the ones who use them), who access the databases through a central web service and models with queries. Invantive Composition supports nested repeating groups of data and layout, and it integrates into Microsoft Word using ClickOnce.
In the past, I personally have also used JasperReports (http://community.jaspersoft.com/project/jasperreports-library) to generate letters using its RTF output target. It is free and works fine as long as you don't need to edit the output beyond a few words and you have Java/SQL development skills. Like Invantive Composition, it works well for large numbers of different reports.
As long as you control the environment completely, you can also consider using RTF as an intermediate format (not for end users, only for real developers). Save the document as RTF, replace the parts of the text that need to be replaceable, and write a web service that accepts the parameters and returns the resulting RTF. Generating more complex tables takes some time (tables are obviously something the human race invented after the RTF specification was written :-). This approach only works with a very limited number of templates and when you have enough developer time to get it up and running and stabilized.
As an independent reviewer, I have also seen cases where XML templates were used, but the results were not as good as with JasperReports.
Disclosure: I lead the docx4j project.
There are heaps of existing tools/libraries available!
Yes, you can just do a string replace, but that is a brittle approach, since Word may have split the string across runs.
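To make the runs problem concrete: in `word/document.xml` a placeholder such as `${name}` can end up as `<w:t>${na</w:t>` in one run and `<w:t>me}</w:t>` in the next (typically a side effect of spell-checking or editing history), so a plain string replace simply doesn't match. The Java sketch below uses only the JDK; the class and method names are hypothetical and this is not docx4j's API. It contrasts the naive replace with a crude paragraph-level workaround: join all `w:t` texts of a paragraph, replace, write the result into the first `w:t` and empty the others. The price is that any formatting difference between the runs of that paragraph is lost.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RunAwareReplace {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** What the question proposes: a plain string replace on the raw document.xml. */
    public static String naiveReplace(String documentXml, Map<String, String> values) {
        String out = documentXml;
        for (Map.Entry<String, String> v : values.entrySet()) {
            out = out.replace(v.getKey(), v.getValue()); // NB: values are not XML-escaped here
        }
        return out;
    }

    /** Joins each paragraph's w:t texts, replaces, and writes the result into the first w:t. */
    public static String paragraphReplace(String documentXml, Map<String, String> values) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(documentXml)));
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
                if (texts.getLength() == 0) continue;
                StringBuilder joined = new StringBuilder();
                for (int j = 0; j < texts.getLength(); j++) joined.append(texts.item(j).getTextContent());
                String before = joined.toString();
                String after = before;
                for (Map.Entry<String, String> v : values.entrySet()) {
                    after = after.replace(v.getKey(), v.getValue());
                }
                if (after.equals(before)) continue; // no placeholder: keep the run formatting intact
                Element first = (Element) texts.item(0);
                first.setTextContent(after); // the serializer escapes &, < etc.
                first.setAttributeNS(XMLConstants.XML_NS_URI, "xml:space", "preserve");
                for (int j = 1; j < texts.getLength(); j++) texts.item(j).setTextContent("");
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("could not rewrite document.xml", e);
        }
    }
}
```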
You can use MERGEFIELDs, or content control data binding.
docx4j supports all three approaches, but content control data binding is the most powerful.
ContentControlsMergeXML
MERGEFIELDs
VariableReplace
One thing to consider especially is "repeats". If you want, say, a row of a table in Word for each matching row in your MySQL table, then you need a way to make this happen.
docx4j does this with a "repeat" content control around the table row; whichever solution you choose, I'd make sure up front that it can handle repeats.
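To illustrate what handling repeats involves at the XML level, here is a rough Java sketch using only the JDK DOM. It is not how docx4j implements its repeat content control, and the method and placeholder names are made up. It locates the table row (`w:tr`) containing a marker placeholder, deep-clones it once per record, fills in each clone and removes the template row. The records would come from your MySQL query; here they are just a list of maps. It assumes each placeholder sits inside a single `w:t`, i.e. it has not been split across runs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RepeatTableRow {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Clones the first w:tr whose text contains marker once per record, then drops the template. */
    public static String repeatRow(String documentXml, String marker, List<Map<String, String>> records) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(documentXml)));
            Element template = null;
            NodeList rows = doc.getElementsByTagNameNS(W_NS, "tr");
            for (int i = 0; i < rows.getLength() && template == null; i++) {
                if (rows.item(i).getTextContent().contains(marker)) template = (Element) rows.item(i);
            }
            if (template == null) throw new IllegalArgumentException("no table row contains " + marker);
            Node table = template.getParentNode();
            for (Map<String, String> record : records) {
                Element row = (Element) template.cloneNode(true); // deep copy keeps cell formatting
                NodeList texts = row.getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    String s = texts.item(j).getTextContent();
                    for (Map.Entry<String, String> v : record.entrySet()) s = s.replace(v.getKey(), v.getValue());
                    texts.item(j).setTextContent(s);
                }
                table.insertBefore(row, template); // keeps the records in list order
            }
            table.removeChild(template);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("could not rewrite document.xml", e);
        }
    }
}
```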
If you want to use PHP, the most complete available solution is PHPDocX.
The tutorial shows how to replace placeholder variables with data coming from any data source (like a MySQL DB).
In particular, you can populate table rows with an indefinite number of entries, delete whole blocks of the Word document depending on the data fed to the application, or build dynamic Word charts.
You can check the available DEMO for a simple but quite illustrative example (its inner workings are explained in the tutorial section).
You can use the Open XML SDK and replace your placeholders like this.
Disclosure: I lead the docxgenjs project
I don't think you should have to code everything yourself; that's why I created a Mustache-like templating engine for docx.
Demo:
http://javascript-ninja.fr/docxgenjs/examples/demo.html
Repo:
https://github.com/edi9999/docxgenjs
It is JS-based and works client and server side.
Yes, you can use a server-side language to do it.
Check out Apache POI:
http://poi.apache.org
Hello, I read the above, especially the comments, and Invantive looks impressive, but the solution I needed was much simpler: use Selection.Range.InsertDatabase in Word to fetch records from an Access database, an Excel spreadsheet or even just another Word document. With the Access solution you can choose the layout of the fetched records and fetch only particular records based on a field (e.g. ID). Google the words above and it'll take you to Microsoft's guidance and an example VB script. It worked well in just a few minutes. Now I'm looking for a VB script that asks the user which ID they want from the database, and then we're done.
XDocReport uses docx templates containing merge fields that are bound to Java objects (the objects hold the information you load from MySQL or any other source). XDocReport is a Java project; its home page is https://code.google.com/p/xdocreport/.
Disclosure: I created the templ4docx project.
Hello
You can use the templ4docx Java library, which is available from the Maven Central repository, so you can just add it to your Maven dependencies:
<dependency>
    <groupId>pl.jsolve</groupId>
    <artifactId>templ4docx</artifactId>
    <version>2.0.0</version>
</dependency>
Example usage:
Docx docx = new Docx("E:\\template.docx");
Variables variables = new Variables();
variables.addTextVariable(new TextVariable("${firstName}", "John"));
variables.addTextVariable(new TextVariable("${lastName}", "Sky"));
docx.fillTemplate(variables);
docx.save("E:\\filledTemplate.docx");
You can find more details here: http://jsolve.github.io/java/templ4docx/

Version diff in Alfresco

Alfresco allows uploading newer versions of documents to the repository and, it seems, also keeps track of the version history. However, I could not find any way to compare or diff a document with its prior versions.
Is this possible? Are there any good external plugins or tools for this?
I assume you're thinking of something similar to the good old Unix diff tool, which basically compares text files and can show the result in a human-readable form.
The equivalent situation in Alfresco is far more complicated: a node can have an arbitrary number of properties of different types. The text file you have in mind just happens to have its character bytes in cm:content.
So, to answer your question: I don't know of any extension providing a general diff between versions, but it should not be too hard to roll your own for text files or other simple, special-purpose comparisons. For text files, you might want to look at the question "Java library for free-text diff" for libraries providing the base functionality.
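For the plain-text case, the core of such a diff is a longest-common-subsequence comparison of the two versions' lines. A minimal Java sketch, assuming you have already read the content of both versions (fetching them from Alfresco's version history is not shown) and split it into lines; output lines are prefixed with two spaces (unchanged), "- " (removed) or "+ " (added), in the spirit of Unix diff. It uses O(n*m) memory, so for large documents one of the dedicated libraries, which typically use more efficient algorithms such as Myers', is the better choice.

```java
import java.util.ArrayList;
import java.util.List;

public class LineDiff {

    /** Line-based diff via a longest-common-subsequence table. */
    public static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size(), m = newLines.size();
        // lcs[i][j] = length of the LCS of oldLines[i..] and newLines[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = oldLines.get(i).equals(newLines.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                out.add("  " + oldLines.get(i)); i++; j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + oldLines.get(i)); i++; // dropping the old line keeps the LCS longest
            } else {
                out.add("+ " + newLines.get(j)); j++;
            }
        }
        while (i < n) out.add("- " + oldLines.get(i++));
        while (j < m) out.add("+ " + newLines.get(j++));
        return out;
    }
}
```

For example, diffing `[a, b, c]` against `[a, c, d]` yields `"  a", "- b", "  c", "+ d"`.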
I was looking for the same functionality and found this Alfresco add-on:
http://code.google.com/p/versions-difference-alfresco-plug-in/

Are there any preset I18n word lists / resource files?

I'm creating a web application that uses I18n. As I don't want to translate very common, basic strings like "forgot password?" on my own, I'm asking whether there are already any resource files or word lists containing these strings. One option is to download an existing framework and somehow extract these strings, but that might be a hassle.
In particular, I'm looking for translations related to user authentication, from English to Italian, French and German. The file format doesn't matter.
Professional translators use translation memory tools (TMX, Translation Memory eXchange, is the generic term, I think) that do what you are talking about: they build up standard phrase lists in other languages, so that when translating they can bring these phrases in to speed up the job and reduce the repetitive tedium. So these lists exist.
There is a free plugin for MS Word that does this and may come with lists (sorry, I can't remember the name, although Rosetta rings a bell).
There is a FOSS TMX tool called Okapi on SourceForge. It may come with dictionaries; if not, it is a good starting point for further investigation.
You could also approach Proz, a site for translators, which might be able to point you in the right direction.
Be careful with machine translation (MT) such as the Google API, as it can give some weird results, but you could use it to build your list and then double-check it. Remember that when you check a language, you need to do it with a native speaker who can pick up on the nuances and colloquialisms.
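If you do get hold of TMX files, turning one into a word list for your resource files is straightforward: a TMX file is XML in which each translation unit (`tu`) holds one `tuv` per language, each with a `seg` containing the text. A Java sketch, assuming TMX 1.4-style markup (`xml:lang` on `tuv`); the class name and the language-tag matching rule ("de-DE" counts as "de") are my own choices, and inline markup inside `seg` is flattened to its text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TmxGlossary {

    /** Returns source segment -> target segment for every tu that has both languages. */
    public static Map<String, String> load(String tmxXml, String sourceLang, String targetLang) {
        try {
            // not namespace-aware, so "xml:lang" is read as a plain attribute name
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // don't try to download the DTD that TMX files usually reference
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(tmxXml)));
            Map<String, String> glossary = new LinkedHashMap<>();
            NodeList units = doc.getElementsByTagName("tu");
            for (int i = 0; i < units.getLength(); i++) {
                String source = null, target = null;
                NodeList variants = ((Element) units.item(i)).getElementsByTagName("tuv");
                for (int j = 0; j < variants.getLength(); j++) {
                    Element tuv = (Element) variants.item(j);
                    String lang = tuv.getAttribute("xml:lang");
                    if (lang.isEmpty()) lang = tuv.getAttribute("lang"); // older files may use plain "lang"
                    NodeList seg = tuv.getElementsByTagName("seg");
                    if (seg.getLength() == 0) continue;
                    String text = seg.item(0).getTextContent().trim();
                    if (matches(lang, sourceLang)) source = text;
                    else if (matches(lang, targetLang)) target = text;
                }
                if (source != null && target != null) glossary.put(source, target);
            }
            return glossary;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse TMX", e);
        }
    }

    /** Case-insensitive; "de-DE" matches "de". */
    private static boolean matches(String lang, String wanted) {
        String l = lang.toLowerCase(Locale.ROOT), w = wanted.toLowerCase(Locale.ROOT);
        return l.equals(w) || l.startsWith(w + "-");
    }
}
```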
You can use the Google Translate API together with your own custom resource bundle.
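On the resource-bundle side, in Java this is the standard `java.util.ResourceBundle` mechanism: one bundle per language, looked up by locale, with keys missing from a language bundle falling back to the base bundle. A small sketch with hypothetical key names; the bundles are classes here only so that everything fits in one file (a web app would more likely use `Messages_it.properties` files filled with checked translations). The no-fallback `Control` stops `getBundle` from silently switching to the server's default locale when a requested language has no bundle.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class AuthMessages {

    /** Base (root) bundle: English defaults, used for any key a language bundle lacks. */
    public static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"auth.forgotPassword", "Forgot password?"},
                {"auth.signIn", "Sign in"},
            };
        }
    }

    public static class Messages_it extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"auth.forgotPassword", "Password dimenticata?"}};
        }
    }

    public static class Messages_fr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"auth.forgotPassword", "Mot de passe oublié ?"}};
        }
    }

    public static class Messages_de extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"auth.forgotPassword", "Passwort vergessen?"}};
        }
    }

    /** Looks up a key for a locale, e.g. text("auth.forgotPassword", Locale.ITALIAN). */
    public static String text(String key, Locale locale) {
        ResourceBundle bundle = ResourceBundle.getBundle(
                Messages.class.getName(), locale,
                ResourceBundle.Control.getNoFallbackControl(ResourceBundle.Control.FORMAT_CLASS));
        return bundle.getString(key);
    }
}
```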

Where can I obtain dictionary files for use in checking spelling?

I thought this was asked before, but 15 minutes of searching on Google and the site search didn't turn anything up...so:
Where can I obtain free (as in beer and/or as in speech) dictionary files? I'm mainly interested in English, but if you know of any dictionary files, please point them out.
Note: This question doesn't have a right/wrong answer, so I made it community-wiki. However, I feel that it might be valuable to not only myself, but anyone who wishes to implement or use a spell checker with various dictionary files.
I have found a SourceForge project called Word List, which appears to have a number of dictionaries. I have downloaded a couple and am currently checking them out.
On Linux you can look in places like /usr/share/dict/words.
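Those plain one-word-per-line lists are easy to use directly. A Java sketch (class and method names are hypothetical) that loads such a file into a set for lookups and proposes corrections by Levenshtein edit distance. It assumes the file is UTF-8, and the suggestion step simply scans the whole list, which is easy to follow but slow for large lists; real spell checkers generate candidates more cleverly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordListChecker {
    private final Set<String> words;

    public WordListChecker(Set<String> words) {
        this.words = words;
    }

    /** Loads a one-word-per-line list such as /usr/share/dict/words (assumed UTF-8). */
    public static WordListChecker fromFile(Path path) {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return new WordListChecker(lines.map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .map(s -> s.toLowerCase(Locale.ROOT))
                    .collect(Collectors.toSet()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean contains(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    /** Known words within maxDistance edits, closest first, then alphabetical. */
    public List<String> suggest(String word, int maxDistance) {
        String w = word.toLowerCase(Locale.ROOT);
        return words.stream()
                .filter(c -> Math.abs(c.length() - w.length()) <= maxDistance) // cheap pre-filter
                .filter(c -> distance(c, w) <= maxDistance)
                .sorted(Comparator.comparingInt((String c) -> distance(c, w))
                        .thenComparing(Comparator.<String>naturalOrder()))
                .collect(Collectors.toList());
    }

    /** Levenshtein distance (insertions, deletions, substitutions) using two DP rows. */
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```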
I would presume that OpenOffice contains dictionaries for several languages.
I don't know what your target platform is, but here is a solution for VB.NET. It uses the Office libraries; Office itself isn't free, but if your users are all internal and have Office, you could leverage these libraries. There is also a zip file with the example source code that you can download.
Check spelling and grammar
There is what appears to be a half-decent dictionary available for free here on CodeProject.com (registration required unfortunately).
