How do implicit conversions occur in SQLite? - sqlite

In SQLite, dynamic typing is used and implicit conversions are done in expressions. For example
SELECT (3 < 2); -- false
SELECT (3 < '2'); -- true (what's happening here?)
SELECT ('3' < '2'); -- false
SELECT (3 < 20); -- true
SELECT (3 < '20'); -- true (what's happening here?)
SELECT ('3' < '20'); -- false
But the official documentation and the O'Reilly book Using SQLite say nothing about how operands are cast in implicit conversions.
In C++, the standard strictly defines (i.e. explicitly explains) how implicit conversions occur. For example, if either operand is of type long double, the other operand is converted to long double.
Is there such a rule in SQLite?

From Datatypes In SQLite Version 3/4.1. Sort Order:
An INTEGER or REAL value is less than any TEXT or BLOB value.
Obviously:
SELECT TYPEOF(3);
returns integer
and
SELECT TYPEOF('2');
SELECT TYPEOF('20');
return text.
So there is no conversion here:
SELECT 3 < '2';
which returns true.
But there is an implicit conversion in an expression like this:
SELECT 3 < '2' + 0;
which returns false. The conversion is forced by the + operator, which applies a numeric operation to '2' and thereby converts it to an integer.
Edit to clarify:
This behavior applies only to literal values like 3 and '2'.
When it comes to expressions or column values then an implicit conversion does happen.
So if you define a table like:
create table test(id integer);
insert into test(id) values (1), (2), (3);
a statement like:
select * from test where id > '1'
will return:
| id |
| --- |
| 2 |
| 3 |
see the demo.

Implicit conversions sometimes occur but sometimes don't. The conditions which determine whether conversions are done before comparisons are described in 4.2. Type Conversions Prior To Comparison. According to the section,
Affinity is applied to operands of a comparison operator prior to the comparison according to the following rules in the order shown:
If one operand has INTEGER, REAL or NUMERIC affinity and the other operand has TEXT or BLOB or no affinity then NUMERIC affinity is applied to other operand.
If one operand has TEXT affinity and the other has no affinity, then TEXT affinity is applied to the other operand.
Otherwise, no affinity is applied and both operands are compared as is.
But how is the type affinity of an expression (including literals) defined? It is explained in 3.2. Affinity Of Expressions as
Every table column has a type affinity (one of BLOB, TEXT, INTEGER, REAL, or NUMERIC) but expressions do not necessarily have an affinity.
Expression affinity is determined by the following rules:
The right-hand operand of an IN or NOT IN operator has no affinity if the operand is a list and has the same affinity as the affinity of the result set expression if the operand is a SELECT.
When an expression is a simple reference to a column of a real table (not a VIEW or subquery) then the expression has the same affinity as the table column.
Parentheses around the column name are ignored. Hence if X and Y.Z are column names, then (X) and (Y.Z) are also considered column names and have the affinity of the corresponding columns.
Any operators applied to column names, including the no-op unary "+" operator, convert the column name into an expression which always has no affinity. Hence even if X and Y.Z are column names, the expressions +X and +Y.Z are not column names and have no affinity.
An expression of the form "CAST(expr AS type)" has an affinity that is the same as a column with a declared type of "type".
A COLLATE operator has the same affinity as its left-hand side operand.
Otherwise, an expression has no affinity.
So in the examples from the question, the literals have no affinity and are thus compared as-is. Since
An INTEGER or REAL value is less than any TEXT or BLOB value.
as pointed out in forpas's answer, 3 < '2' returns true.
These rules also correctly describe the apparently strange behavior referred to in this comment. CAST ('1' AS INTEGER) does have the type affinity INTEGER, so >= '1' is interpreted as >= 1 and thus CAST ('1' AS INTEGER) >= '1' returns true, whereas 1 >= '1' returns false.
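The rules quoted above can be checked from Python's built-in sqlite3 module. This is only a minimal sketch: the helper name scalar is made up for illustration, and the table is the test table from the other answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Run a single-value query and return that value (hypothetical helper)."""
    return conn.execute(sql).fetchone()[0]


# Two literals: neither has an affinity, so they are compared as-is
# and an INTEGER value sorts before any TEXT value.
literal_cmp = scalar("SELECT 3 < '2'")

# The + operator produces a numeric value, so this is a numeric comparison.
arith_cmp = scalar("SELECT 3 < '2' + 0")

# CAST(... AS INTEGER) has INTEGER affinity, so '1' is converted to 1 first.
cast_cmp = scalar("SELECT CAST('1' AS INTEGER) >= '1'")

# Plain literals again: INTEGER 1 is less than TEXT '1'.
plain_cmp = scalar("SELECT 1 >= '1'")

# A simple column reference carries the column's affinity.
conn.execute("CREATE TABLE test(id INTEGER)")
conn.execute("INSERT INTO test(id) VALUES (1), (2), (3)")
rows = [r[0] for r in conn.execute("SELECT id FROM test WHERE id > '1' ORDER BY id")]

print(literal_cmp, arith_cmp, cast_cmp, plain_cmp, rows)  # 1 0 1 0 [2, 3]
```

SQLite reports true and false as the integers 1 and 0, which is why the results print as numbers.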

Related

SQLite Affinity Bug? [duplicate]

This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Closed 4 years ago.
EDIT:
The answer here: Is floating point math broken? assists in understanding this question. However, this question is not language agnostic. It is specific to the documented behavior and affinity of floating point numbers as handled by SQLite. Having a very similar answer to a different question != duplicate question.
QUESTION:
I have a rather complex SQLite Where Clause comparing numerical values. I have read and "think" I understand the Datatype Documentation here: https://www.sqlite.org/datatype3.html
I am still confused about the logic SQLite uses to determine datatypes in comparison clauses such as =, >, <, <> etc. I can narrow my example down to this bit of test SQL, whose results make little sense to me.
SELECT
CAST(10 AS NUMERIC) + CAST(254.53 AS NUMERIC) = CAST(264.53 AS NUMERIC) AS TestComparison1,
CAST(10 AS NUMERIC) + CAST(254.54 AS NUMERIC) = CAST(264.54 AS NUMERIC) AS TestComparison2
Result: "1" "0"
The second expression in the select statement (TestComparison2) is converting the left-side of the equation to a TEXT value. I can prove this by casting the right-side of the equation to TEXT and the result = 1.
Obviously I'm missing something in the way SQLite computes Affinity. These are values coming from columns in a large/complex query. Should I be casting both sides of the equations in WHERE/Join Clauses to TEXT to avoid these issues?
The reason why you are not getting the expected result is that the underlying results will be floating point.
Although DataTypes in SQLite3 covers much, you should also consider the following section from Expressions :-
Affinity of type-name Conversion Processing
NONE
Casting a value to a type-name with no affinity causes the value to be converted into a BLOB. Casting to a BLOB consists of first
casting the value to TEXT in the encoding of the database connection,
then interpreting the resulting byte sequence as a BLOB instead of as
TEXT.
TEXT
To cast a BLOB value to TEXT, the sequence of bytes that make up the BLOB is interpreted as text encoded using the database
encoding.
Casting an INTEGER or REAL value into TEXT renders the value as if via
sqlite3_snprintf() except that the resulting TEXT uses the encoding of
the database connection.
REAL
When casting a BLOB value to a REAL, the value is first converted to TEXT.
When casting a TEXT value to REAL, the longest possible prefix of the
value that can be interpreted as a real number is extracted from the
TEXT value and the remainder ignored. Any leading spaces in the TEXT
value are ignored when converting from TEXT to REAL.
If there is no prefix that can be interpreted as a real number, the
result of the conversion is 0.0.
INTEGER
When casting a BLOB value to INTEGER, the value is first converted to TEXT.
When casting a TEXT value to INTEGER, the longest possible prefix of the value that can be interpreted as an integer number is extracted
from the TEXT value and the remainder ignored. Any leading spaces in
the TEXT value when converting from TEXT to INTEGER are ignored.
If there is no prefix that can be interpreted as an integer number,
the result of the conversion is 0.
If the prefix integer is greater than +9223372036854775807 then the
result of the cast is exactly +9223372036854775807.
Similarly, if the
prefix integer is less than -9223372036854775808 then the result of
the cast is exactly -9223372036854775808.
When casting to INTEGER, if the text looks like a floating point value with an exponent, the exponent will be ignored because it is not
part of the integer prefix. For example, "CAST('123e+5' AS INTEGER)"
results in 123, not in 12300000.
The CAST operator understands decimal integers only — conversion of hexadecimal integers stops at the "x" in the "0x" prefix of the
hexadecimal integer string and thus result of the CAST is always zero.
A cast of a REAL value into an INTEGER results in the integer between the REAL value and zero that is closest to the REAL value. If
a REAL is greater than the greatest possible signed integer
(+9223372036854775807) then the result is the greatest possible signed
integer and if the REAL is less than the least possible signed integer
(-9223372036854775808) then the result is the least possible signed
integer.
Prior to SQLite version 3.8.2 (2013-12-06), casting a REAL value greater than +9223372036854775807.0 into an integer resulted in the
most negative integer, -9223372036854775808. This behavior was meant
to emulate the behavior of x86/x64 hardware when doing the equivalent
cast.
NUMERIC
Casting a TEXT or BLOB value into NUMERIC first does a forced conversion into REAL but then further converts the result into
INTEGER if and only if the conversion from REAL to INTEGER is lossless
and reversible. This is the only context in SQLite where the NUMERIC
and INTEGER affinities behave differently.
Casting a REAL or INTEGER value to NUMERIC is a no-op, even if a real
value could be losslessly converted to an integer.
NOTE
Before this section there is a section on Literal Values (i.e. casting probably only needs to be applied to values extracted from columns).
Try :-
SELECT
round(CAST(10 AS NUMERIC) + CAST(254.53 AS NUMERIC),2) = round(CAST(264.53 AS NUMERIC),2) AS TestComparison1,
round(CAST(10 AS NUMERIC) + CAST(254.54 AS NUMERIC),2) = round(CAST(264.54 AS NUMERIC),2) AS TestComparison2
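The same effect can be reproduced outside SQLite, which suggests it comes from IEEE 754 double arithmetic rather than from affinity. The sketch below uses Python's sqlite3 module and drops the CASTs, since casting a REAL or INTEGER value to NUMERIC is a no-op according to the table quoted above. The tolerance check at the end is an alternative that is not part of the answer, and the 1e-9 threshold is an arbitrary assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The same arithmetic in plain Python floats (also IEEE 754 doubles).
py_first = (10 + 254.53 == 264.53)
py_second = (10 + 254.54 == 264.54)

# The raw comparison from the question, without the no-op CASTs.
raw = conn.execute(
    "SELECT 10 + 254.53 = 264.53, 10 + 254.54 = 264.54"
).fetchone()

# The fix suggested in the answer: round both sides to two decimals.
rounded = conn.execute(
    "SELECT round(10 + 254.53, 2) = round(264.53, 2), "
    "round(10 + 254.54, 2) = round(264.54, 2)"
).fetchone()

# Alternative (assumption, not from the answer): compare within a tolerance.
tolerant = conn.execute(
    "SELECT abs((10 + 254.54) - 264.54) < 1e-9"
).fetchone()[0]

print(py_first, py_second)  # True False
print(raw)                  # (1, 0)
print(rounded)              # (1, 1)
print(tolerant)             # 1
```

If exact decimal behavior matters, for example for money, storing the values as integer cents avoids the problem entirely.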

General Comparisons vs Value Comparisons

Why does XQuery treat the following expressions differently?
() = 2 returns false (general Comparison)
() eq 2 returns an empty sequence (value Comparison)
This effect is explained in the XQuery specifications. For XQuery 3, it is in chapter 3.7.1, Value Comparisons (highlighting added by me):
Atomization is applied to the operand. The result of this operation is called the atomized operand.
If the atomized operand is an empty sequence, the result of the value comparison is an empty sequence, and the implementation need not evaluate the other operand or apply the operator. However, an implementation may choose to evaluate the other operand in order to determine whether it raises an error.
Thus, if you're comparing two single element sequences (or scalar values, which are equal to those), you will as expected receive a true/false value:
1 eq 2 is false
2 eq 2 is true
(1) eq 2 is false
(2) eq 2 is true
(2) eq (2) is true
and so on
But if one or both of the operands is the empty sequence, you will receive the empty sequence instead:
() eq 2 is ()
2 eq () is ()
() eq () is ()
This behavior allows you to pass through empty sequences, which could be used as a kind of null value here. As @adamretter added in the comments, the empty sequence () has the effective boolean value of false, so even if you run something like if ( () eq 2) ..., you won't observe anything surprising.
If any operand is a sequence of more than one item, a type error is raised.
General comparison, $sequence1 = $sequence2, tests whether any item in $sequence1 is equal to some item in $sequence2. Because this definition already covers sequences of arbitrary length, the empty sequence needs no special case.
Why?
The difference comes from the requirements imposed by the operators' signatures. If you compare sequences of arbitrary length in a set-based manner, there is no reason to include any special cases for empty sequences -- if an empty sequence is included, the comparison is automatically false by definition.
For the operators comparing single values, one has to consider the case where an empty sequence is passed; the decision was to not raise an error, but to return something that still counts as false: the empty sequence. This allows the empty sequence to be used as a kind of null value when the value is unknown; anything compared to an unknown value can never be true, but need not be false either. If you need to, you could check for an empty(...) result: if it is empty, one of the values to be compared was unknown; otherwise they're simply different. In Java and other languages, a null value would have been used to achieve similar results; in Haskell there's Data.Maybe.
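To make the difference concrete, here is a rough Python model of the two operators. It is not how any XQuery processor is implemented: sequences are modelled as Python lists of already atomized values, None stands in for the empty sequence, and the function names general_eq and value_eq are made up for this sketch.

```python
from typing import Optional, Sequence


def general_eq(left: Sequence, right: Sequence) -> bool:
    """Model of `=`: true if any item on the left equals any item on the right."""
    return any(a == b for a in left for b in right)


def value_eq(left: Sequence, right: Sequence) -> Optional[bool]:
    """Model of `eq`: returns None (the empty sequence) if an operand is empty."""
    for operand in (left, right):
        if len(operand) > 1:
            # This model checks both operands up front; a real processor may
            # return () without ever looking at the other operand.
            raise TypeError("value comparison needs at most one item per operand")
    if not left or not right:
        return None
    return left[0] == right[0]


print(general_eq([], [2]))            # False
print(value_eq([], [2]))              # None
print(value_eq([2], [2]))             # True
print(general_eq([1, 2], [2, 3]))     # True
```

Note that bool(None) is False in Python, which mirrors the effective boolean value of the empty sequence mentioned above.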

DevExpress VGridControl DisplayFormatString doesn't work

I have an unbound vgrid control. One field's unbound expression is like this:
Iif([NETSAL]=0, 0, [GP] / [NETSAL] * 100 )
The unbound type is decimal, the format type is numeric, the format string n1.
The problem is that I don't get correctly formatted values. For example, if gp=200 and netsal=1500, I should get 13,3, but I get 0,0. I checked the computed value, and it is 0,0 too.
But if gp=2500 and netsal=1000, then the value is 200, so it seems that the value is rounded.
But why?
Thanks.
The expression result type depends on the types of the expression's members. In your case, all members of the expression [GP] / [NETSAL] are integer values, so the division is an integer division and the fractional part of the quotient is dropped (200 / 1500 gives 0, 2500 / 1000 gives 2).
Adding a decimal constant value to the expression will change the type of the expression result to decimal. According to the Criteria Language Syntax, the type of a numeric constant can be declared using special literals. For the decimal type, the literal is 'm'.
Try the following expression, it should work as you expect:
Iif([NETSAL]=0, 0, 1m * [GP] / [NETSAL] * 100 )
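The effect is not specific to the criteria language. The following Python sketch shows the same arithmetic, using // for integer division and Decimal as a stand-in for the decimal type; the variable names are only illustrative.

```python
from decimal import Decimal

gp, netsal = 200, 1500

# Integer division drops the fractional part before the multiplication by 100.
integer_result = gp // netsal * 100

# Starting from a decimal operand keeps the whole chain in decimal arithmetic,
# which is what the leading 1m does in the corrected expression.
decimal_result = Decimal(1) * gp / netsal * 100

print(integer_result)             # 0
print(round(decimal_result, 1))   # 13.3
print(2500 // 1000 * 100)         # 200
```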

When would indeterminate NULL in PL/SQL be useful?

I was reading some PL/SQL documentation, and I am seeing that NULL in PL/SQL is indeterminate.
In other words:
x := 5;
y := NULL;
...
IF x != y THEN -- yields NULL, not TRUE
sequence_of_statements; -- not executed
END IF;
The statement would not evaluate to true, because the value of y is unknown and therefore it is unknown if x != y.
I am not finding much info other than the facts stated above, and how to deal with this in PL/SQL. What I would like to know is, when would something like this be useful?
This is three valued logic, see http://en.wikipedia.org/wiki/Three-valued_logic, and - specific for SQL - in http://en.wikipedia.org/wiki/Null_(SQL).
It follows the concept that a NULL value means: this value is currently unknown, and might be filled with something real in the future. Hence, the behavior is defined in a way that would be correct for every possible future non-null value. For example, true or unknown is true: no matter whether the unknown (which is the truth value of NULL) is later replaced by something true or something false, the outcome will be true. However, true and unknown is unknown, as the result will be true if the unknown is later replaced by a true value, and false if the unknown is later replaced by a false value.
And finally, this behavior is not "non-deterministic": the result is well defined, and you get the same result on each execution, which is by definition deterministic. It is just defined in a way that is a bit more complex than the standard Boolean two-valued logic used in most other programming languages. A non-deterministic function would be dbms_random.random, as it returns a different value each time it is called, or even SYSTIMESTAMP, which also returns different values if called several times.
You can find a good explanation of why NULL was introduced, and more, on Wikipedia.
In PL/SQL you deal with NULL by
using IS (NOT) NULL as a comparison, when you would like to test against NULL
using the COALESCE and NVL functions, when you want to substitute NULL with something else, like here: IF NVL(SALARY, 0) = 0
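The truth tables described above are small enough to write out. Here is a sketch in Python, where None plays the role of the unknown truth value; the function names are made up, and this only models the logic, not how Oracle evaluates it.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None for "unknown"


def tri_and(a: Tri, b: Tri) -> Tri:
    """AND: false wins over unknown, unknown wins over true."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    """OR: true wins over unknown, unknown wins over false."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not_equal(x, y) -> Tri:
    """Model of x != y where either side may be NULL (None)."""
    if x is None or y is None:
        return None
    return x != y


print(tri_or(True, None))    # True
print(tri_and(True, None))   # None
print(not_equal(5, None))    # None, so an IF guarded by it would not run
```

Calling these functions twice with the same arguments always gives the same answer, which is the sense in which the behavior is deterministic.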

What are union types and intersection types? [duplicate]

This question already has answers here:
Union types and Intersection types
(4 answers)
Closed 9 years ago.
What are union types and intersection types?
I have consulted this question, but some small working type systems
would be better, not necessarily practical ones.
Specifically, by union types I'm referring to the one mentioned in this blog post
instead of sum types, where the pseudo-code looks like
{String, null} findName1() {
    if (...) {
        return "okay";
    } else {
        return null;
    }
}
The wikipedia page has a short explanation on intersection types and union
types, but there seems to be no further references about this.
Union and intersection types are just considering types to be sets of values (infinite sets, mostly). If that's how you're thinking of types then any operation on sets that results in a set could be applied to types (sets of values) to make a new type (set of values), at least conceptually.
A union type is similar in some senses to a sum type, which you seem to be familiar with. Indeed I've often heard sum types described as "discriminated union" types. The basic difference is that a sum type like (Haskell notation) data FooBar = Foo Integer | Bar String allows you to tell whether a given FooBar value contains an Integer or a String (because FooBar values are tagged with Foo or Bar). Even if we write data FooBar = Foo Integer | Bar Integer where both types are the same, the "tag" adds extra information and we can tell "which integer" the FooBar value is.
The union type equivalent would be something like (not valid Haskell) data FooBar = Integer | String. The values in FooBar simply are all the string values and all the integer values. If we make a union type of the same two types like data FooBar = Integer | Integer it should be logically indistinguishable from just Integer, since the union of a set with itself is itself.
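The distinction can be imitated in Python's typing module, which offers untagged unions. This is only an analogy, with Foo and Bar as small wrapper classes standing in for the Haskell constructors.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Foo:
    value: int


@dataclass(frozen=True)
class Bar:
    value: int


# Sum type analogue: the wrapper class acts as the tag,
# so Foo(1) and Bar(1) remain distinguishable.
FooBar = Union[Foo, Bar]

# Untagged union of a type with itself collapses to that type.
IntOrInt = Union[int, int]

print(Foo(1) == Bar(1))          # False
print(isinstance(Foo(1), Foo))   # True: the tag can be inspected
print(IntOrInt is int)           # True
```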
In principle, the things you could do with values in a type U that is the union of types A and B are just the operations that work on As and also work on Bs; any operation that only works on either As or Bs might get the wrong kind of input, because a U has no information to say whether it's an A or a B. [1]
(Undiscriminated) union types wouldn't be very interesting in languages with type systems similar to Haskell's, because concrete types are disjoint [2], so the only operations that work on both As and Bs work on all values (unless A is B, in which case it's just all the operations that work on that single type).
But in a way, type classes (if you're familiar with them) are a way of providing something a bit like a union type. A type which is polymorphic but constrained to be a member of some type class is a little like a union of all the types which are in the type class (except that you don't know what those are, because type classes are in principle open); the only things you can do with such a value are the things which have been declared to work on values of every type in the type class.
Union types could be interesting in a language with sub-typing (as is common in object-oriented programming languages). If you union together two subtypes of a common super-type you get something that supports at least the operations of the super-type, but it excludes any other subtypes of the super-type, so it isn't the same as just using the super-type.
An intersection type is exactly the same concept, but using intersection instead of union. This means the things you could do with a value in a type I that is the intersection of types A and B are the operations that work on As plus the operations that work on Bs; anything in I is guaranteed to be both an A and a B, so it can safely be given to either kind of operation.
These also wouldn't be very interesting in languages with Haskell-like type systems. Because concrete types are disjoint [2], any non-trivial intersection is empty. But again, type class constraints can provide something a bit like an intersection type; if you add multiple type class constraints to the same type variable, then the only values that can be used where that type variable is expected are of types that are in the "intersection" of all the type classes, and the operations you can use on such values are the operations that work with any of the type classes.
[1] You could imagine combining an operation A -> C and an operation B -> D to get an operation (A | B) -> (C | D), much like you can use the tags of sum types to "route" a sum type to the appropriate operation. But it gets murky for fully general union types. If A and B overlap (and overlapping types enter the fray as soon as you've got union types), then which operation do you invoke on a value in the overlapping region? If you can tell whether it's an A or a B then you've really got a sum type rather than a union type, and if you apply some arbitrary resolution strategy like picking the A -> C operation because A was listed earlier in the union type's definition, then things work fine in simple cases but get very confusing if you've got types like (A | B) & (B | A) (where I'm using & to denote intersection).
[2] Although the "disjoint types" point is debatable. In types like data Maybe a = Nothing | Just a you could justifiably argue that the Nothing is the "same value" even for different a. If so, then the union of Maybe String and Maybe Integer only contains one Nothing (rather than both the Nothing that is "no string" and the Nothing that is "no integer"). And the intersection of Maybe String and Maybe Integer contains only one value, that being Nothing.
The Whiley programming language supports union and intersection types. If you think of types as sets (i.e. the type int is the set of all valid integers), then union types correspond to set union, whilst intersection types to set intersection.
The classic example in Whiley for union types is representing a "nullable" type, as follows:
null|int indexOf(string str, char c):
    for i in 0..|str|:
        if str[i] == c:
            return i // found a match
    // didn't find a match
    return null
Here, the type null|int may hold any valid integer, or the special null value. In Whiley, you cannot perform arithmetic on such a type. Therefore, you first have to type test to check for the null value before you can use the returned value. For example, like so:
string replaceFirst(string str, char old, char new):
    idx = indexOf(str,old)
    if idx is int:
        str[idx] = new
    // return potentially updated string
    return str
I've written some posts on union types in Whiley here and here. Intersection types are similar, although not currently very well supported in the compiler.
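For readers who don't have Whiley at hand, roughly the same pattern can be sketched in Python, where Optional[int] is the union of int and None. This is an analogy rather than a translation: Python strings are immutable, so replace_first builds a new string, and it is an external type checker (not the interpreter) that uses the "is not None" test to narrow the type.

```python
from typing import Optional


def index_of(text: str, ch: str) -> Optional[int]:
    """Return the index of the first match, or None if there is no match."""
    for i, current in enumerate(text):
        if current == ch:
            return i  # found a match
    return None  # didn't find a match


def replace_first(text: str, old: str, new: str) -> str:
    idx = index_of(text, old)
    if idx is not None:
        # Inside this branch idx is known to be an int, so arithmetic is safe.
        return text[:idx] + new + text[idx + 1:]
    return text


print(replace_first("hello", "l", "L"))  # heLlo
print(index_of("abc", "z"))              # None
```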
A type can be seen as a set of values. For example, if Boolean is the set of the values true and false, saying that some value has type Boolean means it is either the value true or false.
Note that some types, like String, can have infinitely many possible values.
As you probably know, union and intersection are set operations, and hence they apply to types as well. For example, when one has types T1 = {male, female} and T2 = {not-applicable}, one can build the type T3 = T1 ∪ T2 = {male, female, not-applicable}. An example where this type would be useful would be answers to the question: "What is the gender of your firstborn child?" Since some people don't have any children, they could answer: not-applicable.
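Taking the "types are sets of values" view literally, the example can be written with Python sets. This is a toy model: real types are usually far too large to enumerate, and has_type is a made-up helper.

```python
T1 = frozenset({"male", "female"})
T2 = frozenset({"not-applicable"})

T3 = T1 | T2       # union type: a value of either T1 or T2
overlap = T1 & T2  # intersection type: values that are in both


def has_type(value, type_set):
    """A value 'has' a type if it is a member of the set (hypothetical helper)."""
    return value in type_set


print(sorted(T3))                        # ['female', 'male', 'not-applicable']
print(has_type("not-applicable", T3))    # True
print(has_type("not-applicable", T1))    # False
print(len(overlap))                      # 0
```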

Resources