SQLite 3: Character Issue When Ordering Records

In my SQLite 3 database, I have some records with Turkish characters such as "Ö", "Ü", "İ", etc. When I select my values with a SELECT * FROM TABLE ORDER BY COLUMN_NAME query, the records that begin with these characters are sorted to the end.
Normally, they should come right after their dot-less counterparts: "Ö" after "O", "Ü" after "U".
Is it something about regional settings? Is there a way to control these settings?
I use SQLite Manager in Firefox to manage my DB.
Thanks in advance.
P.S. I know it's not a solution inside SQLite itself, but those who need to use an SQLite DB from Objective-C can sort the data array after fetching it from the DB. Here's a good solution: How to sort an NSMutableArray with custom objects in it?

Unfortunately, it seems there's no direct solution for this, at least on iOS. But there are a few approaches you can take.
After I subscribed to the SQLite mailing list, a user named Jean-Christophe Deschamps replied with the following:
"In my SQLite 3 Database, I have some records with Turkish characters
such as "Ö", "Ü", "İ" etc. When I select my values with 'SELECT * FROM
TABLE ORDER BY COLUMN_NAME' query, the records that begin with these
characters are coming at the end."
Bare bone SQLite only collates correctly on the lower ASCII charset.
While that's fine for plain english, it doesn't work for most of us.
"Normally, they should've come after the letter that is dot-less
version of each. Like "Ö" is after "O", "Ü" is after "U". Is it
something about regional settings? Is there a way to control these
settings?"
You have the choice among some ways to get it right, or close to right, for your language(s):
o) Use ICU, either as an extension (for third-party managers) or linked to your application.
Advantages: it works 100% correctly for a given language at a time in each operation.
Drawbacks: it's huge and slow, and it requires you to register a collation for every language you deal with. Also it won't work well for columns containing several non-English languages.
o) Write your own collation(s) invoking your OS's ICU routines to collate strings.
Advantages: doesn't bloat your code with huge libraries.
Drawbacks: requires you to write this extension (in C or something); same other drawbacks as ICU.
o) If you use Windows, download and use the functions in the extension I wrote for a close-to-correct result.
Advantages: it's small, fairly fast and ready to use; it is language-independent yet works decently well for many languages at the same time; it also offers a number of Unicode-aware string manipulation functions (unaccenting or not), a fuzzy search function and much more. Comes as C source and an x86 DLL, free for any purpose.
Drawback: it probably doesn't work 100% correctly for any language using more than "vanilla English letters": your dotless i will collate along dotted i, for instance. It's a good compromise between absolute correctness for ONE language and "fair" correctness for most languages (including some Asian languages using diacritics).
Download: http://dl.dropbox.com/u/26433628/unifuzz.zip
"I use SQLite Manager in Firefox to manage my DB."
My little extension will work with this one. You might also want to try SQLite Expert, which has ICU built-in (at least in its Pro version) and much more.
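For readers who reach SQLite through a language binding rather than the C API, the "register your own collation" option above is easy to sketch. Below is a minimal, hedged example using Python's built-in sqlite3 module and the standard locale module; the locale name, database file, and table/column names are assumptions for illustration only, and the same idea applies to sqlite3_create_collation() in C or an Objective-C wrapper.

```python
import locale
import sqlite3

# Assumes a Turkish locale is installed on the system; the exact name
# ("tr_TR.UTF-8") varies by OS and is an assumption here.
locale.setlocale(locale.LC_COLLATE, "tr_TR.UTF-8")

def turkish(a, b):
    # locale.strxfrm produces keys whose plain byte-wise order matches
    # the locale's collation order, so comparing the keys compares the
    # strings "correctly" for that locale.
    ka, kb = locale.strxfrm(a), locale.strxfrm(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect("example.db")          # placeholder file name
conn.create_collation("TURKISH", turkish)

# Hypothetical table and column, for illustration only.
for (name,) in conn.execute(
        "SELECT name FROM people ORDER BY name COLLATE TURKISH"):
    print(name)
```

The cost is the same as the ICU options described above: the collation is tied to one locale at a time, so you register one collation per language you care about.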

It could be the regional settings, but first I would verify that UTF-8 encoding is being used.

Related

parse uniVerse hash / data files in R

I have inherited a uniVerse database (link to Rocketsoftware site) and would like to know if it's possible to read/parse the underlying data files (which I believe are hash tables?) into 'R'?
I'm aware there are ODBC drivers as well as .NET libraries, but I'm interested in parsing the files in R (if possible) without these drivers.
(I've searched and seen a few topics on parsing hash tables in Java and C#, but nothing in R yet)
It's a proprietary format, so unless you want to reverse-engineer it and re-implement it in R, that isn't the path forward. Also note that it isn't a single hash-table format either; aside from the standard modulo and bucket sizes, there are several different formats you'll encounter.
If you don't want to work with any of the native APIs of the database to read the data, you can issue database commands that will dump it to CSV or XML flat files. Take a look at the RetrieVe query language manuals to learn more.

ASP.NET - How to properly split a string for search?

I'm trying to build a search that is similar to Google's (with regard to exact-match phrases encapsulated in double quotes).
Let's use the following phrase for an example
"phrase search" single terms [different phrase]
Currently if I use the following code
Dim searchTermsArray As String() = searchTerms.Split(New String() {" ", ",", ";"}, StringSplitOptions.RemoveEmptyEntries)
For Each entry In searchTermsArray
    Response.Write(entry & "<br>")
Next
my output is
"phrase
search"
single
terms
[different
phrase]
but what I really need is to build a key value pair
phrase search | table1
single | table1
terms | table1
different phrase | table2
where table1 is a table with general info, and table2 is a table of "tags" similar to that on stackoverflow.
Can anybody point me in the right direction on how to properly capture the input?
What you are trying to do is not that trivial. Implementing a search "similar to Google's" goes far beyond parsing the search string.
I'd suggest you not reinvent the wheel and instead use production-ready solutions such as Apache Lucene.NET or Apache Solr. Those handle both parsing and full-text search.
But if you only need to parse this kind of string, then you should really consider the solution Pete pointed to.
Regex is your friend. See this question
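For illustration, here is roughly what the regex approach can look like. This is a sketch in Python, chosen only to keep the pattern readable; the same pattern works with .NET's Regex class. The table names are the ones from the question.

```python
import re

search_terms = '"phrase search" single terms [different phrase]'

# Three alternatives: a double-quoted phrase, a [bracketed] tag, or a bare word.
pattern = r'"([^"]+)"|\[([^\]]+)\]|(\S+)'

for quoted, tagged, word in re.findall(pattern, search_terms):
    if quoted:
        print(quoted, "| table1")   # exact phrase -> general info table
    elif tagged:
        print(tagged, "| table2")   # bracketed term -> tags table
    else:
        print(word, "| table1")
```

This handles the example input from the question, but as the next answers point out, edge cases (unbalanced quotes, nested brackets, concatenated separators) pile up quickly with a pure regex.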
Depending on how fancy you plan in getting, you might consider the search grammar/implementation that's included with Irony.
http://irony.codeplex.com/
Search string parsing is a non-regular problem. That means that while a regular expression can get deceptively close, it won't take you all the way there without using proprietary extensions, building an unmaintainable mess of an expression, leaving nasty edge cases open that don't work how you'd like, or some combination of the three.
Instead, there are three correct ways to handle this:
Use a third-party solution like Lucene.
Build a grammar via something like antlr.
Build your own state machine.
For a problem of this level (and assuming that search is core enough to what you're doing to really want to implement it yourself), I'd probably go with option 3. This makes more sense when you realize that regular expressions are themselves instructions for how to set up state machines. All you're doing is building that right into your code. This should give you the ability to tune performance and features as well, without requiring you to add a larger lexer component to your code.
For an example of how you might do this take a look at my answer to this question:
Reading CSV files in C#
What I would do is build a state machine to parse the string character by character. This will be the easiest way to implement a fully correct solution, and should also result in the fastest code.
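To make the state-machine idea concrete, here is a minimal character-by-character tokenizer sketch. It is written in Python for brevity, but the structure translates directly to VB.NET or C#. The table-assignment rules (quoted phrases and bare terms go to table1, bracketed terms to table2) are taken from the question; everything else is an assumption.

```python
def tokenize(query):
    """Walk the query one character at a time, tracking a simple state."""
    tokens = []          # list of (term, target_table) pairs
    state = "default"    # one of: default, in_quotes, in_brackets
    current = []

    def flush(table):
        # Emit whatever has accumulated so far, if anything.
        if current:
            tokens.append(("".join(current), table))
            current.clear()

    for ch in query:
        if state == "default":
            if ch == '"':
                flush("table1"); state = "in_quotes"
            elif ch == '[':
                flush("table1"); state = "in_brackets"
            elif ch in ' ,;':
                flush("table1")
            else:
                current.append(ch)
        elif state == "in_quotes":
            if ch == '"':
                flush("table1"); state = "default"
            else:
                current.append(ch)
        elif state == "in_brackets":
            if ch == ']':
                flush("table2"); state = "default"
            else:
                current.append(ch)

    flush("table1")      # anything left over is an ordinary term
    return tokens

print(tokenize('"phrase search" single terms [different phrase]'))
# [('phrase search', 'table1'), ('single', 'table1'),
#  ('terms', 'table1'), ('different phrase', 'table2')]
```

Because each character is examined exactly once, this is easy to extend (escaped quotes, nested brackets) without the expression-level rewrites a regex would need.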

Is there a utility for finding SQL statements in multiple files and listing any referenced tables and stored procedures

I'm currently looking at a terrible legacy ColdFusion app written with very few stored procedures and lots of nasty inline SQL statements (it has a similarly bad database too).
Does anyone know of any app which could be used to search the files of the app picking out any SQL statements and listing the tables/stored procedures which are referenced?
Dreamweaver will allow you to search the code of the entire site. If the site is set up properly, including the RDS password, and you provide a data source, it can tell you a lot of information. I've only set it up once, so I can't remember exactly what information it gives you; I think maybe just the DB structure. Application window > Databases. Even if it isn't set up properly, just searching for "cfquery" will quickly find all your queries.
You could also write a CF script using CFDirectory/CFFile to loop over the .cfm files and parse everything between the cfquery and /cfquery tags.
CFBuilder may have some features like that, but I'm not too familiar with it yet.
edit: I've heard that CFBuilder can't natively find all your cfqueries that don't have cfqueryparam, but you can use CF to extend CFB to do so. I imagine you could find/write something for CFB to help you with your problem.
another edit:
I know it isn't indexing the contents of the query, but you can use regex to search in the editor as well. Searching for <cfquery.+(select|insert|update|delete) with the regex box checked should find the queries that aren't using cfstoredproc (be sure to uncheck the match-case option if there is one). I know Dreamweaver and Eclipse can both search with regex.
HTH
As mentioned above, I would try a grep with a regex looking for
"<cfquery*" "</cfquery>" and "<cfstoredproc*" "</cfstoredproc>"
In addition, if you have tests with good code coverage, or even if you just feel the app is fully exercised in production, you could try turning on "Log Database Calls" in Admin -> Datasources, or maybe even at the JDBC driver level; just monitor performance to make sure it does not slow the site down unacceptably.
In short: no. You'd have to do a lot of tricky parsing to make sure you get all the SQL. And because you can piece SQL together from lots of strings, you'll almost always miss some of it.
The best you're likely to do will be a case-insensitive grep for "SELECT|INSERT|UPDATE|DELETE" and then manually pulling out the table names.
Depending on how the code is structured, you might be able to get the table names by regexing the SQL FROM clause. But that's not foolproof. A lot of people use string concatenation to build SQL statements. This is bad because it can introduce SQL injection attacks, and it also makes this particular problem harder.
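As a rough illustration of the grep/regex approach, and of its limits, here is a sketch in Python that walks a directory of ColdFusion files, pulls out <cfquery>...</cfquery> blocks, and lists the identifiers that follow FROM, JOIN, UPDATE, or INSERT INTO. As noted above, it will miss any SQL built up by string concatenation; the directory path and file extensions are assumptions.

```python
import os
import re

# Capture the body of each cfquery block (case-insensitive, across lines).
CFQUERY = re.compile(r'<cfquery\b.*?>(.*?)</cfquery>', re.IGNORECASE | re.DOTALL)
# Grab the identifier that follows the common table-referencing keywords.
TABLES = re.compile(r'\b(?:from|join|update|insert\s+into)\s+([A-Za-z_][\w.]*)',
                    re.IGNORECASE)

def scan(root):
    """Yield (file, table) pairs for every table name found inside a cfquery."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(('.cfm', '.cfc')):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='ignore') as fh:
                source = fh.read()
            for sql in CFQUERY.findall(source):
                for table in TABLES.findall(sql):
                    yield path, table

for path, table in sorted(set(scan('/path/to/legacy/app'))):  # placeholder path
    print(f'{table}\t{path}')
```

Treat the output as a starting point for a manual review rather than a complete inventory.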

String sort with special characters (ä, ö) in Flex/AS3

Is there any way to sort Strings correctly in languages other than English? In German we have umlauts, and e.g. 'ä' should come right after 'a' in ascending sort order.
I am using ObjectUtil.stringCompare(), but it always puts those special characters at the end. Any ideas how to solve this? I thought the locale (de_DE) would take care of it, but it does not.
Thanks,
Martin
In ECMAScript Third Edition (and hence both ActionScript and current browser JavaScript) there is the string.localeCompare method. This does a comparison that depends on the current client locale. For example, if I set my system locale (in Windows terms, "language to match the language version of the non-Unicode programs you want to use") to "German (Germany)" and run javascript:alert('ä'.localeCompare('b')), I get -1, but with English I get 1.
It's generally questionable to depend on the client-side locale, though. Your application would work differently depending on the client OS installation, and it is not nearly as easy for the user to change their system locale as it is to choose a different language in the web browser's preferences UI. I'd avoid it if at all possible, and either:
do an ad-hoc string replacement (e.g. ä with ae) before comparison. This may be OK if you are only worried about a few umlauts, but is unfeasible for covering the whole of Unicode... even the whole of the Latin diacritical set.
try to do the comparison on the server side, in a scripting language with better character model support than ECMAScript.
You can write your own compare function and pass it to array.sort(compareFunction). compareFunction(a, b):int should compare two strings and return a negative value if a should come before b, 0 if they are equal, and a positive value if a should come after b.
In that function you'll want to compare your strings symbol by symbol, taking into account the special German characters.
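A minimal sketch of that idea follows, in Python just to show the shape; the same comparator structure carries over to an AS3 compareFunction. The alphabet below, which places each umlaut (and ß) immediately after its base letter, is an assumption; adjust it to whichever German sorting convention you actually need.

```python
# Hypothetical alphabet: each special character is its own letter, placed
# right after its base letter (ä after a, ö after o, ü after u, ß after s).
ALPHABET = 'aäbcdefghijklmnoöpqrsßtuüvwxyz'
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def german_key(s):
    # Characters outside the table sort after the known alphabet.
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in s.lower()]

def compare_german(a, b):
    # Shaped like an Array.sort compare function: negative means a comes first.
    ka, kb = german_key(a), german_key(b)
    return (ka > kb) - (ka < kb)

words = ['Zug', 'Ärger', 'Apfel', 'Öl', 'Ofen']
print(sorted(words, key=german_key))
# ['Apfel', 'Ärger', 'Ofen', 'Öl', 'Zug']
```

The per-character rank table keeps the comparator cheap, which matters when the sort routine calls it O(n log n) times.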
I don't know AS3, but almost every language that supports locales has a locale-aware comparison function for strings, or locale-aware sorting too.
For example, here is one such locale-aware string comparison: localeCompare().

How do I handle an apostrophe (') in MS SQL 2008 FTS?

I have a website that utilizes MS SQL 2008's FTS (Full-Text Search). The search works fine if the user searches for a string with an apostrophe like that's - it returns any results that contain that's. However, it will not return a result if the user searches for thats, and the database stores that's.
Also, ideally a search for that's should also return thats and that, etc.
Anybody know if FTS supports this, and what I can do to enable it?
WT
Use a parameter:
myCommand.Parameters.AddWithValue("@searchterm", txtSearch.Text.Trim());
The apostrophe will then be handled for you, without any hassle.
Regarding that, that's, and thats, you should look up the SOUNDEX keyword. 4GuysFromRolla have an old article about it as well.
updated:
There is a great talk from one of the TFS Team members regarding this here.
Quoting him:
Daniel is correct: Full-Text Search (FTS) does not use SOUNDEX directly, but it can be used in combination with SOUNDEX. Additionally, you may want to review the following links as well as the TSQL examples below of combining CONTAINS & SOUNDEX. You may want to look at some of the improved Soundex algorithms as well as the Levenshtein Distance algorithm. You should be able to search Google to find more code examples, for example: 'METAPHONE soundex "sql server" fuzzy name search', and I quickly found "Double Metaphone Sounds Great" at http://www.winnetmag.com/Article/ArticleID/26094/26094.html
You can freely download the code in a zip file that has several user-defined functions (UDFs) that implement Double Metaphone. Below are some additional SOUNDEX links:
http://www.merriampark.com/ld.htm
http://www.bcs-mt.org.uk/nala_006.htm
(omitted as out of scope)
use pubs
-- Combined SOUNDEX OR CONTAINS query that
-- searches for names that sound like "Michael".
SELECT au_lname, au_fname
FROM authors -- returns 2 rows
WHERE contains(au_fname, 'Mich*') OR SOUNDEX(au_fname) = 'M240'
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
