Accent-insensitive search with SQLite for html5sql

I'm a little confused by a simple-looking problem.
I'm working on an offline HTML5 Web SQL database for mobile, using html5sql.js.
So far, everything works fine, except the search with accents.
Here is an example:
Records: "Céline", "Elisa"
Search "el" --> "Elisa"
Search "él" --> No result
I would expect to find both results in both searches.
So far, I don't specify any encoding in my queries. I've read several posts about collations, but I couldn't get any of them to work.
Would you have a typical example of how to write a query for this search?

PhoneGap does not have any portable way to do case-insensitive comparisons of non-ASCII characters.
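You could remove the accents from your strings and store those accent-free copies in a separate column, then search that column (normalizing the search term the same way).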
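A minimal sketch of the "store an accent-free copy" approach, written with Python's built-in sqlite3 module rather than html5sql.js, since the idea is the same in any SQLite client. The table `people`, the column `name_plain` and the helpers `strip_accents`/`search_names` are hypothetical names for illustration. In the browser you would do the stripping in JavaScript (`String.prototype.normalize("NFD")` provides the same decomposition) before inserting and before querying.

```python
import sqlite3
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD and drop the combining marks, so 'Céline' -> 'Celine'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_names(conn: sqlite3.Connection, term: str) -> list[str]:
    # Normalize the search term exactly like the stored copies.
    # SQLite's LIKE is case-insensitive for ASCII letters only, which is
    # enough once the accents are gone.
    pattern = "%" + strip_accents(term) + "%"
    rows = conn.execute(
        "SELECT name FROM people WHERE name_plain LIKE ? ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
# name keeps the original spelling; name_plain holds the accent-free copy.
conn.execute("CREATE TABLE people (name TEXT, name_plain TEXT)")
for name in ("Céline", "Elisa"):
    conn.execute("INSERT INTO people VALUES (?, ?)", (name, strip_accents(name)))

print(search_names(conn, "el"))  # ['Céline', 'Elisa']
print(search_names(conn, "él"))  # ['Céline', 'Elisa']
```

Displaying `name` while matching on `name_plain` keeps the original spelling visible to the user.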

You could try my open-source library, ydn-db, which has full-text search. A demo app can be found here. Currently, it only has English language normalization, though. You can find more languages in the NaturalNode repo and recompile it.

Related

trying to understand how regex works

I'm learning about regular expressions and am confused by how this whole field works. I've taken an example from a tutorial here and pasted it into https://regexr.com/.
The regex below is supposed to capture email addresses but it doesn't seem to work, at least as is.
I'm posting here in the hopes that there's a simple explanation I might look further into.
From the tutorial website, I gleaned that there are different "flavors" of regex. From the regexr.com site, it seems I have the option to choose a JavaScript or PCRE engine (I assume engine is a synonym for flavor). It doesn't seem to make a difference.
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
Ultimately I'm working in R, so I've added the R tag to this post. I suspect R may use yet another flavor, different from the one above.
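One likely reason the pattern "doesn't work as is": it only lists uppercase letters (`[A-Z]`), so it assumes a case-insensitive match. The usual form of this tutorial pattern puts `@` between the local part and the domain; the Python sketch below uses that form and shows the difference the flag makes. A tester like regexr.com needs its ignore-case (`i`) flag turned on for the same reason. In R, backslashes must be doubled inside a string literal, and functions such as `grepl()` and `gregexpr()` accept `ignore.case = TRUE`. The sample text is made up for illustration.

```python
import re

# Character classes list only A-Z, so lowercase letters match only with IGNORECASE.
EMAIL = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b"

text = "Contact john.doe@example.com or SALES@EXAMPLE.ORG for details."

print(re.findall(EMAIL, text))
# ['SALES@EXAMPLE.ORG']
print(re.findall(EMAIL, text, flags=re.IGNORECASE))
# ['john.doe@example.com', 'SALES@EXAMPLE.ORG']
```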

count words in word file that was uploaded

Is there a way to count the words in a Word file (all versions) in classic ASP or ASP.NET?
What I need is the total number of words and, if possible, an array of word lengths with how many words there are of each length, so that words of 1, 2 or 3 letters get less attention from the code later.
I was thinking of using FSO (FileSystemObject) or something similar, but that won't work for .docx.
I can upload the file with ASPUpload or any other component if needed. If there is a component I can buy that will upload the file and count the words, I don't mind purchasing it.
Thanks in advance.
You have several options:
1. If you can have Office installed on the server and don't need this to be a fast solution, you can try Word Interop. See "Word count using Microsoft.Office.Interop.Word". A similar option is to install OpenOffice and work with that; I've never done that myself.
2. You can use the IFilter interface (http://msdn.microsoft.com/en-us/library/ms691105(v=vs.85).aspx). Microsoft has already implemented the logic to open Word files and give you access to their inner text, so all you have to do is count the words. Look at the first answer to "Are IFilters necessary to index full text documents using Lucene.NET" and the link it provides, or at "How to extract text from MS office documents in C#". You can also look at http://blogs.msdn.com/b/jasonz/archive/2009/08/31/sample-parsing-content-in-c-using-ifilter.aspx
3. You can use third-party tools. I know there are some out there, but I'm not really familiar with any of them. For example, see http://www.aspose.com/.net/word-component.aspx
4. If you don't really need support for ALL Word versions, there are various ways to work with Word 2007+ files, for example the official Open XML SDK or the open-source DocX library.
Option 2 seems like the way to go to me.
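The Word 2007+ route mentioned above relies on the fact that a .docx file is a zip archive whose main text lives in `word/document.xml`. As a rough, language-neutral illustration (Python rather than ASP/.NET), this sketch reads that part, counts words and builds the word-length histogram the question asks for. It does not handle legacy binary .doc files, and the function names and the tiny demo document are hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the WordprocessingML elements inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source) -> str:
    """Return the main body text of a .docx (path or file object), one line per paragraph.

    Rough sketch: headers, footers and footnotes live in other parts and are
    ignored, and nested content such as text boxes is not handled specially.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        # Runs (<w:r>) split text arbitrarily, so join all <w:t> pieces of a paragraph.
        lines.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(lines)


def word_stats(text: str) -> tuple[int, Counter]:
    """Total word count plus a {word length: number of words} histogram."""
    words = re.findall(r"\w+", text)
    return len(words), Counter(len(w) for w in words)


def make_demo_docx(paragraphs: list[str]) -> io.BytesIO:
    """Build a minimal in-memory .docx-like zip, for demonstration only."""
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


demo = make_demo_docx(["The quick brown fox", "jumps over a lazy dog"])
total, by_length = word_stats(docx_text(demo))
print(total)                      # 9
print(sorted(by_length.items()))  # [(1, 1), (3, 3), (4, 2), (5, 3)]
```

With the histogram in hand, skipping short words later is just a matter of ignoring the entries for lengths 1 to 3.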

English dictionary needed for a word game

I'm looking for a way to include a full-blown English dictionary in an iPhone app (a word game). The database must include all possible verb conjugations and both singular and plural spellings, so that my app can query it to check whether a spelling is correct.
Is there a free or commercial database that would include those data?
You can use UITextChecker for spell-checking.
Regarding a dictionary: when I built an iOS dictionary library some time ago (www.lexicontext.com), I used WordNet. WordNet contains a lot of interesting semantic info ...
The system spell checker (NSSpellChecker on macOS, UITextChecker on iOS) is your easiest option, but it might be more complete to also use the official online Scrabble dictionary and check words against both (only one match required).
You could do a web-service request using http://www.hasbro.com/scrabble/en_US/search.cfm
http://www.a2zwordfinder.com/cgi-bin/scrabble.cgi?Letters=&Pattern=______&MatchType=Exactly&MinLetters=3&SortBy=Alpha&SearchType=Scrabble
Change MinLetters to get different results.
The best place to find a database for a spell checker is probably a free text-processing application, so I'd try OpenOffice, the open-source alternative to Microsoft Office. Download it, install it and find the dictionary file.
OpenOffice is licensed under the LGPL, so it should be fine; just check whether the licence also covers the data (i.e. the dictionary file).
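OpenOffice's spelling dictionaries use the Hunspell format: a .dic word list plus an .aff file of affix rules. Here is a minimal Python sketch, with the hypothetical helper `load_dic_stems`, that loads the stems into a set for fast lookups. Note what it does not do: plurals and conjugations are produced by the .aff rules, so for a word game you would still need to expand those affixes (for example with Hunspell's own tooling) before stems alone are enough.

```python
def load_dic_stems(lines) -> set[str]:
    """Collect the stems of a Hunspell-style .dic word list (lower-cased).

    Line 1 of a .dic file is an approximate entry count; every other line is a
    stem, optionally followed by '/' and affix flags (e.g. 'walk/GDS') and by
    whitespace-separated morphological fields, which this sketch ignores.
    """
    stems = set()
    for index, raw in enumerate(lines):
        line = raw.strip()
        if not line or (index == 0 and line.isdigit()):
            continue  # skip blank lines and the entry-count header
        entry = line.split()[0]
        stems.add(entry.split("/", 1)[0].lower())
    return stems


# For a real file, open it with the encoding named by the SET line of its .aff file.
sample = ["3", "walk/GDS", "cat/MS", "the"]
stems = load_dic_stems(sample)
print("walk" in stems, "walked" in stems)  # True False
```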
Maybe this English corpus helps: http://www.wordfrequency.info/free.asp

Is there a utility for finding SQL statements in multiple files and listing any referenced tables and stored procedures

I'm currently looking at a terrible legacy ColdFusion app written with very few stored procedures and lots of nasty inline SQL statements (it has a similarly bad database too).
Does anyone know of any app which could be used to search the files of the app picking out any SQL statements and listing the tables/stored procedures which are referenced?
Dreamweaver will allow you to search the code of the entire site. If the site is set up properly, including the RDS password and a data source, it can tell you a lot. I've only set it up once, so I can't remember exactly what information it gives you; I think maybe just the DB structure (Application window > Databases). Even if it isn't set up properly, just searching for "cfquery" will quickly find all your queries.
You could also write a CF script using CFDirectory/CFFile to loop over the .cfm files and parse everything between the cfquery and /cfquery tags.
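The CFDirectory/CFFile idea above, sketched in Python instead of CFML: walk the source tree, pull out every `<cfquery>` body and every `<cfstoredproc procedure="...">` name. The regexes are approximate (a `>` inside a tag attribute will confuse them, and SQL assembled in variables elsewhere is not found), and scanning .cfc files as well as .cfm is an assumption on my part. `scan_source` and `scan_tree` are hypothetical names.

```python
import re
from pathlib import Path

# <cfquery ...> ... </cfquery>; DOTALL lets the body span several lines.
CFQUERY = re.compile(r"<cfquery\b[^>]*>(.*?)</cfquery\s*>", re.IGNORECASE | re.DOTALL)
# <cfstoredproc ... procedure="name" ...>
CFSTOREDPROC = re.compile(
    r"<cfstoredproc\b[^>]*\bprocedure\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE
)


def scan_source(text: str) -> tuple[list[str], list[str]]:
    """Return (query bodies, stored procedure names) found in one file's text."""
    queries = [body.strip() for body in CFQUERY.findall(text)]
    procs = CFSTOREDPROC.findall(text)
    return queries, procs


def scan_tree(root: str):
    """Yield (path, queries, procs) for each .cfm/.cfc file that contains any."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in (".cfm", ".cfc"):
            queries, procs = scan_source(path.read_text(errors="replace"))
            if queries or procs:
                yield path, queries, procs


sample = """
<cfquery name="getUsers" datasource="legacy">
    SELECT id, name FROM users WHERE active = 1
</cfquery>
<cfstoredproc procedure="usp_GetOrders" datasource="legacy">
</cfstoredproc>
"""
queries, procs = scan_source(sample)
print(queries)  # ['SELECT id, name FROM users WHERE active = 1']
print(procs)    # ['usp_GetOrders']
```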
CFBuilder may have some features like that, but I'm not too familiar with it yet.
Edit: I've heard that CFBuilder can't natively find all your cfqueries that don't use cfqueryparam, but you can use CF to extend CFB to do so. I imagine you could find or write something for CFB to help you with your problem.
Another edit:
I know it isn't indexing the contents of the queries, but you can also search with a regex in the editor. Searching for <cfquery.+(select|insert|update|delete) with the regex box checked should find the queries that aren't using cfstoredproc (be sure to uncheck the match case option if there is one). I know Dreamweaver and Eclipse can both search with regexes.
HTH
As mentioned above, I would try grep with regexes looking for
"<cfquery.*", "</cfquery>", "<cfstoredproc.*" and "</cfstoredproc>".
In addition, if you have tests with good code coverage, or simply feel that the app is fully exercised in production, you could try turning on "Log Database Calls" in Admin -> Datasources, or maybe even at the JDBC driver level. Just monitor performance to make sure it doesn't slow the site down unacceptably.
In short: no. You'd have to do a lot of tricky parsing to make sure you get all the SQL. And because SQL can be glued together from lots of strings, you'll almost always miss some of it.
The best you're likely to do is a case-insensitive grep for "SELECT|INSERT|UPDATE|DELETE" and then manually pulling out the table names.
Depending on how the code is structured, you might be able to get the table names by regexing the SQL FROM clause, but that's not foolproof. A lot of people use string concatenation to build SQL statements. This is bad because it can introduce SQL injection vulnerabilities, and it also makes this particular problem harder.
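A sketch of the "regex the FROM clause" idea, in Python: grab the identifier after FROM, JOIN, INTO or UPDATE, allowing schema-qualified and [bracketed] names. As the answer warns, this is not foolproof; it misses comma-separated FROM lists, anything after a subquery's opening parenthesis, and table names built by string concatenation. `referenced_tables` is a hypothetical helper.

```python
import re

# Identifier after FROM / JOIN / INTO / UPDATE, optionally schema-qualified
# (dbo.Users) and optionally wrapped in [brackets].
TABLE_REF = re.compile(
    r"\b(?:from|join|into|update)\s+(\[?\w+\]?(?:\.\[?\w+\]?)*)",
    re.IGNORECASE,
)


def referenced_tables(sql: str) -> set[str]:
    """Lower-cased table names referenced by one SQL statement (best effort)."""
    return {
        name.replace("[", "").replace("]", "").lower()
        for name in TABLE_REF.findall(sql)
    }


sql = """SELECT u.name, o.total
FROM dbo.Users u
JOIN [Orders] o ON o.user_id = u.id"""
print(sorted(referenced_tables(sql)))  # ['dbo.users', 'orders']
```

Run over the query bodies pulled out of the cfquery tags, this gives a first-pass inventory that still needs the manual review the answer describes.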

How do I handle an apostrophe (') in MS SQL 2008 FTS?

I have a website that utilizes MS SQL 2008's FTS (Full-Text Search). The search works fine if the user searches for a string with an apostrophe like that's - it returns any results that contain that's. However, it will not return a result if the user searches for thats, and the database stores that's.
Also, ideally a search for that's should also return thats and that, etc.
Anybody know if FTS supports this, and what I can do to enable it?
WT
Use a parameter:
myCommand.Parameters.AddWithValue("@searchterm", txtSearch.Text.Trim());
The apostrophe will then be handled for you, without any hassle.
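The same parameter-binding idea as ADO.NET's `Parameters.AddWithValue`, shown with Python's sqlite3 module and its `?` placeholder, since it is easy to run on its own. The table `docs` and the helper `find` are hypothetical. The driver passes the value separately from the SQL text, so an apostrophe in the search term needs no escaping. Note that binding only solves the quoting problem; it does not make a search for thats match that's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [("that's it",), ("thats all",)])


def find(term: str) -> list[str]:
    # The value is bound as a parameter; no manual quoting or escaping of "'".
    rows = conn.execute("SELECT body FROM docs WHERE body LIKE ?", ("%" + term + "%",))
    return [body for (body,) in rows]


print(find("that's"))  # ["that's it"]
print(find("thats"))   # ['thats all']
```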
Regarding that, that's and thats, you should look up the SOUNDEX function. 4GuysFromRolla has an old article about it as well.
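To see what SOUNDEX actually compares, here is a small Python sketch of the classic American Soundex algorithm (first letter kept, consonants mapped to digits, adjacent duplicates collapsed, padded to four characters). SQL Server's built-in SOUNDEX follows the same general scheme, but I haven't verified that it agrees in every edge case, so treat this as illustrative. It reproduces the `'M240'` code used for "Michael" in the quoted T-SQL further down, and it also shows a limitation: that's and thats share a code, but that does not.

```python
# American Soundex digit for each consonant; vowels, Y, H and W have none.
CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Michael' -> 'M240'."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]  # drops apostrophes etc.
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W don't separate consonants with the same digit
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it counts again
    return (code + "000")[:4]


for w in ("Michael", "that's", "thats", "that"):
    print(w, soundex(w))
# Michael M240, that's T320, thats T320, that T300
```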
Update:
There is a great discussion of this by one of the FTS team members here.
Quoting him:
Daniel is correct: Full-text Search (FTS) does not use SOUNDEX directly, but it can be used in combination with SOUNDEX. Additionally, you may want to review the following links as well as the T-SQL examples below of combining CONTAINS and SOUNDEX.
You may want to look at some of the improved Soundex algorithms as well as the Levenshtein distance algorithm. You should be able to search Google to find more code examples; for example, searching for 'METAPHONE soundex "sql server" fuzzy name search', I quickly found "Double Metaphone Sounds Great" at http://www.winnetmag.com/Article/ArticleID/26094/26094.html
You can freely download the code in a zip file that contains several user-defined functions (UDFs) implementing Double Metaphone. Below are some additional SOUNDEX links:
http://www.merriampark.com/ld.htm
http://www.bcs-mt.org.uk/nala_006.htm
(omitted as out of scope)
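USE pubs;
-- Combined SOUNDEX OR CONTAINS query that
-- searches for names that sound like "Michael".
-- A prefix term must be in double quotes inside CONTAINS ('"Mich*"');
-- without them the asterisk is treated as a literal character.
SELECT au_lname, au_fname
FROM authors -- returns 2 rows
WHERE CONTAINS(au_fname, '"Mich*"') OR SOUNDEX(au_fname) = 'M240';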
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
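The quoted post also points to the Levenshtein distance algorithm. As a sketch of what it computes, here is the standard two-row dynamic-programming version in Python: the minimum number of single-character insertions, deletions and substitutions that turn one string into another. In SQL Server you would need to implement it yourself (for example as a UDF) or compute it in application code; the function name here is my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("that's", "thats"))    # 1
print(levenshtein("thats", "that"))      # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Unlike SOUNDEX, a small distance threshold (say, 1) would treat that's, thats and that as near matches of each other.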
