Where can I obtain dictionary files for use in checking spelling? - dictionary

I thought this was asked before, but 15 minutes of searching on Google and the site search didn't turn anything up...so:
Where can I obtain free (as in beer and/or as in speech) dictionary files? I'm mainly interested in English, but if you know of any dictionary files, please point them out.
Note: This question doesn't have a right/wrong answer, so I made it community-wiki. However, I feel that it might be valuable to not only myself, but anyone who wishes to implement or use a spell checker with various dictionary files.

I have found a SourceForge project called Word List, which appears to have a number of dictionaries. I have downloaded a couple and am currently checking them out.

On Linux you can look in places like /usr/share/dict/words
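As a rough sketch of how such a word list could be used for a basic spelling check: the Python below assumes a plain one-word-per-line file (which is how /usr/share/dict/words is normally laid out). The function names `parse_words`, `load_words` and `misspelled` are made up for illustration, and the tokenizing is deliberately naive.

```python
from pathlib import Path


def parse_words(lines):
    """Turn an iterable of lines (one word per line) into a lowercase set."""
    return {line.strip().lower() for line in lines if line.strip()}


def load_words(path="/usr/share/dict/words"):
    """Load a word list from disk; the default path is an assumption and
    is not present on every system."""
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return parse_words(text.splitlines())


def misspelled(text, words):
    """Return the tokens of `text` that are not in `words`.

    Tokens are split on whitespace and stripped of surrounding punctuation,
    which is a simplification of what a real spell checker does.
    """
    tokens = (token.strip(".,;:!?\"'()").lower() for token in text.split())
    return [token for token in tokens if token and token not in words]
```

Typical use would be `misspelled("Some text to check", load_words())`. Holding the words in a set keeps each lookup cheap even for lists with several hundred thousand entries.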

I would presume that OpenOffice contains dictionaries for several languages.

I don't know what your target platform is, but here is a solution for VB.NET. It uses the Office libraries. Office itself isn't free, but if your users are all internal and already have Office, you could leverage these libraries. There is also a zip file with the example source code that you can download.
Check spelling and grammar

There is what appears to be a half-decent dictionary available for free here on CodeProject.com (registration required unfortunately).

Related

Possible to search through all JSPs in Adobe CQ5 repository with CRXDE?

We have a couple of relatively simple websites running on Adobe CQ 5.5 that were developed by a third party. I'm pretty familiar with how CQ works, but I'm working with somebody else's code here and I need to be able to search through all components in the system for a particular string.
The issue is that I can't seem to find a way to search across all of the various .jsp files stored with the various system components. I would have figured that the query tool in CRXDE Lite would have done the trick with something like this:
/jcr:root//*[jcr:contains(., 'Find this exact string in a JSP')] order by @jcr:score
But I've had no luck.
What I am looking for is some sort of global search that includes JSP files. Is that possible? Were I using a regular Java system, any IDE worth the download would be able to do this.
Thanks.
This might not be the easiest way, but you can use the VLT tool to check out the repository onto your filesystem. Then you can search it with whatever tool you prefer. It might even be faster in the long run.
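Once the repository is on disk, any grep-like tool will do. As a minimal sketch in Python, the helper below walks a checked-out tree and reports every line in a .jsp file that contains a given string. The name `find_in_jsps` and the `./jcr_root` path in the usage note are assumptions for illustration, not something the VLT tool prescribes.

```python
from pathlib import Path


def find_in_jsps(root, needle):
    """Yield (path, line_number, line) for every line containing `needle`
    in any .jsp file found below `root`."""
    for path in sorted(Path(root).rglob("*.jsp")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()
```

For example, `for path, number, line in find_in_jsps("./jcr_root", "Find this exact string in a JSP"): print(path, number, line)` would list the matching components, assuming the checkout lives in `./jcr_root`.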
I don't have the actual answer but I suppose the JSPs are indexed via a filter that strips out some of their content.
It should be possible to configure the repository to index them as is instead, based on the info at http://wiki.apache.org/jackrabbit/IndexingConfiguration and http://jackrabbit.apache.org/jackrabbit-text-extractors.html
Sorry about the vagueness of this answer - I know the basic principles but to provide the details I would need more time than I can afford now ;-)

Version diff in alfresco

Alfresco allows uploading newer versions of documents in the repository and also keeps track of the version history, it seems. However, I could not find any way to compare or diff a document with its prior versions.
Is this possible? Are there any good external plugins or tools for this?
I assume you are thinking of something similar to the good old Unix diff tool, which basically compares text files and shows the result in human-readable form.
The equivalent situation in Alfresco is far more complicated. You have an arbitrary number of properties of different types. The text file you might be thinking of just happens to have character bytes in cm:content.
So, to answer your question: I don't know of any extension providing a general diff between versions, but it should not be too hard to roll your own for text files or other simple special comparisons. In the former case you might want to have a look at Java library for free-text diff for libraries providing the base functionality.
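To give an idea of how little is needed for the plain-text case, here is a hedged sketch that uses Python's standard difflib rather than one of the Java libraries mentioned above. It assumes the text content of the two versions has already been fetched as strings. How that is done against Alfresco is not shown, and the function name and version labels are invented for illustration.

```python
import difflib


def diff_versions(old_text, new_text, old_label="1.0", new_label="1.1"):
    """Return a unified diff between two versions of a text document.

    The labels only appear in the diff header; they are placeholders for
    whatever version identifiers the repository reports.
    """
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )
```

This only covers cm:content when it holds text. Comparing the other properties of a node, or binary formats, would need separate handling.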
Looking for the same functionality and found this Alfresco addon
http://code.google.com/p/versions-difference-alfresco-plug-in/

Are there any preset I18n word lists / resource files?

I'm creating a web application that uses I18n. As I don't want to translate very common basic strings like "forgot password?" on my own, I'm asking whether there are already any resource files or word lists containing these strings. One option is to download an existing framework and somehow extract these strings, but this might be a hassle.
In particular, I'm looking for translations related to user authentication, from English to Italian, French and German. The file format doesn't matter.
Professional translators use a kind of tool (TMX, Translation Memory eXchange, is the generic term, I think) that does what you are talking about: it builds up standard phrase lists in other languages, so that when they translate they can bring these phrases in to speed up their job and reduce the repetitive tedium. So these lists exist.
There is a free plugin for MS Word that does this and may come with lists (sorry, I cannot remember the name, although Rosetta rings a bell).
There is a FOSS TMX tool called Okapi on SourceForge. It may come with the dictionaries, but if not, it is a place to start investigating.
You could also approach Proz, a site for translators, which might be able to point you in the right direction.
Take care with machine translation such as the Google API, as it can give some weird results, but you could use it to build your list and then double-check it. Remember that when you check a language, you need to do it with a native speaker who can pick up on the nuances and colloquialisms.
You can use the Google Translate API together with your own custom resource bundle.

English dictionary needed for a word game

I'm looking for a way to include a full-blown English dictionary in an iPhone app (a word game). The database must include all conjugation possibilities for verbs, as well as singular and plural spellings, so that my app can query it to check whether a spelling is correct.
Is there a free or commercial database that would include those data?
You can use UITextChecker for spell-checking.
Regarding a dictionary, when I built an iOS dictionary library some time ago (www.lexicontext.com) I used WordNet. WordNet contains a lot of interesting semantic info ...
NSSpellChecker is your easiest option, but it might be more complete to use the official online Scrabble dictionary as well and check against both (only one match required).
You could do a web-service request using http://www.hasbro.com/scrabble/en_US/search.cfm
http://www.a2zwordfinder.com/cgi-bin/scrabble.cgi?Letters=&Pattern=______&MatchType=Exactly&MinLetters=3&SortBy=Alpha&SearchType=Scrabble
Change MinLetters to get different results.
The best place to find a database for a spell checker is probably a free word-processing application. So I'd try OpenOffice Writer, the OpenOffice counterpart to Word. Download it, install it and simply find the dictionary file.
OpenOffice is licensed under the LGPL, so it should be fine; just check whether the licence covers the data as well (i.e. the dictionary file).
Maybe this English corpus helps: http://www.wordfrequency.info/free.asp

Which library should I use to generate RSS in Common Lisp?

What's the best library to use to generate RSS for a webserver written in Common Lisp?
Most anything will probably do. Personally, I've been using xml-emitter for my blog's Atom feed, which has worked out well so far.
Just choose whichever XML generation library you like and hack away, I'd say. As others have remarked, RSS is simple; it's little work to generate it manually.
That said, I recommend not generating plain strings directly. Having to deal with quoting data is more of a hassle than installing an XML library, and it's also insecure in case your feed contains data submitted by visitors of your website.
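The quoting point is independent of the language. As an illustration only (in Python rather than Common Lisp, using the standard xml.etree.ElementTree module), the sketch below builds a small RSS 2.0 document through an XML library, so that markup in visitor-submitted text is escaped automatically. The function name and the shape of the `items` dictionaries are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 feed as a string.

    `items` is an iterable of dicts with "title", "link" and "description"
    keys (a made-up shape for this sketch). All text goes through the XML
    library, which escapes &, < and > instead of emitting them raw.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["description"]
    return ET.tostring(rss, encoding="unicode")
```

With plain string concatenation, a comment containing `<script>` or a stray `&` would either break the feed or inject markup into it. Here it simply ends up as escaped text.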
xml-emitter says it has an RSS 2.0 emitter built in.
CL-WHO can generate XML pretty easily.
I am not aware of any specific RSS library. But the format is fairly simple, so any library that can write XML will do at that level.
You could, for example, have a look at the nuclblog project (http://cyrusharmon.org/projects?project=nuclblog), as it can generate an RSS feed for the blog entries it maintains.
cl-rss-gen is a tiny library (LGPL, depends on CL-WHO) that does some boilerplate work for you (supports generating RSS entries directly from CLOS class instances by specifying which slot maps to which attribute).
Take a look at the code before using it; it may give you an idea of how it works and whether you need it or not (as other posters said, you can generate RSS yourself with CL-WHO or any XML generation library).
Oh, and sorry for resurrecting a four-year-old thread, but anyone searching for a similar library will find the answer here.
