May a Cassandra (CQL 3) map hold null values? I thought null values were permitted, but failure of my program suggests otherwise. Or is there a bug in the driver I am using?
The official documentation for CQL maps says:
A map is a typed set of key-value pairs, where keys are unique. Furthermore, note that the map are internally sorted by their keys and will thus always be returned in that order.
So the keys may not be null (otherwise sorting would be impossible), but there is no mention of a requirement that map values are not null.
I have a field that is a map<timestamp,uuid>, which I am trying to write to using values from a Java Map<Date, UUID>. One of the map values (UUIDs) is null. This seems to cause an NPE in the Cassandra client code (Cassandra version 1.2.6, called from DataStax Java driver 1.0.1) when marshalling the UUID of the map:
java.lang.NullPointerException
at org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:82)
at org.apache.cassandra.cql.jdbc.JdbcUUID.decompose(JdbcUUID.java:55)
at org.apache.cassandra.db.marshal.UUIDType.decompose(UUIDType.java:187)
at org.apache.cassandra.db.marshal.UUIDType.decompose(UUIDType.java:43)
at org.apache.cassandra.db.marshal.MapType.decompose(MapType.java:122)
at org.apache.cassandra.db.marshal.MapType.decompose(MapType.java:29)
at com.datastax.driver.core.BoundStatement.bind(BoundStatement.java:188)
at [my method]
The UUIDGen.decompose(UUID) method has no special handling of a null UUID, hence the NPE. Contrast with JdbcBoolean.decompose(Boolean), which decomposes a null Boolean to an empty byte-buffer. Similarly, JdbcDate.decompose(Date) decomposes a null Date to an empty byte-buffer.
I can produce a similar problem if I have a map holding null integers (using a Java Map< Date, Integer > with a null value, for a Cassandra map<timestamp,int>), so this problem is not restricted to uuid values.
You are right: null values are not (yet?) supported inside maps. I ran into this before and, like you, couldn't find any relevant documentation. In situations like this I check the behavior with cqlsh.
A small test gives you the answer:
CREATE TABLE map_example (
id text,
m map<text, text>,
PRIMARY KEY ((id))
)
Then try:
insert into map_example (id, m ) VALUES ( 'a', {'key':null});
> Bad Request: null is not supported inside collections
HTH, Carlo
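Since the server rejects null inside collections, one possible workaround is to strip null-valued entries from the map before binding it. The sketch below shows the idea in Python; the same filtering could be applied to the Java Map before calling BoundStatement.bind. The helper name drop_null_values is hypothetical, and whether silently dropping entries is acceptable depends on what a null value means in your application.

```python
import uuid
from datetime import datetime
from typing import Dict, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def drop_null_values(m: Dict[K, Optional[V]]) -> Dict[K, V]:
    """Return a copy of the map without entries whose value is None.

    Hypothetical helper: not part of any Cassandra driver.
    """
    return {k: v for k, v in m.items() if v is not None}


# Example map resembling map<timestamp,uuid>, with one null value.
first = datetime(2013, 7, 1)
second = datetime(2013, 7, 2)
some_id = uuid.UUID("12345678-1234-5678-1234-567812345678")

raw = {first: some_id, second: None}
safe_to_bind = drop_null_values(raw)
```

After filtering, safe_to_bind contains only the entry for the first timestamp, so nothing null reaches the marshalling code.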
Using Delphi 10.2, SQLite and Teecharts. My SQLite database has two fields, created with:
CREATE TABLE HistoryRuntime ('DayTime' DateTime, Device1 INTEGER DEFAULT (0));
I access the table using a TFDQuery called qryGrpahRuntime with the following SQL:
SELECT DayTime AS TheDate, Sum(Device1) As DeviceTotal
FROM HistoryRuntime
WHERE (DayTime >= "2017-06-01") and (DayTime <= "2017-06-26")
Group by Date(DayTime)
Using the Field Editor in the Delphi IDE, I can add two persistent fields, getting TheDate as a TDateTimeField and DeviceTotal as a TLargeIntField.
I run this query in a program to create a TeeChart, which I created at design time. As long as the query returns some records, all this works. However, if there are no records for the requested dates, I get an EDatabaseError exception with the message:
qryGrpahRuntime: Type mismatch for field 'DeviceTotal', expecting: LargeInt actual: Widestring
I have done plenty of searching on the web for ways to prevent this error on an empty query, but have had no luck with anything I found. From what I can tell, SQLite defaults to a wide string field when no data is returned. I have tried using CAST in the query, and it did not seem to make any difference.
If I remove the persistent fields, the query will open without problems on an empty return set. However, in order to use the TeeChart editor in the IDE, it appears I need persistent fields.
Is there a way I can make this work with persistent fields, or am I going to have to throw out the persistent fields and then add the TeeChart Series at runtime?
This behavior is described in the "Adjusting FireDAC Mapping" chapter of FireDAC's SQLite manual:
For an expression in a SELECT list, SQLite avoids type name
information. When the result set is not empty, FireDAC uses the value
data types from the first record. When empty, FireDAC describes those
columns as dtWideString. To explicitly specify the column data type,
append ::<type name> to the column alias:
SELECT count(*) as "cnt::INT" FROM mytab
So modify your command, for example like this (I used BIGINT, but you can use any pseudo data type that maps to a 64-bit signed integer data type and is not auto-incrementing, which corresponds to your persistent TLargeIntField field):
SELECT
DayTime AS "TheDate",
Sum(Device1) AS "DeviceTotal::BIGINT"
FROM
HistoryRuntime
WHERE
DayTime BETWEEN {d 2017-06-01} AND {d 2017-06-26}
GROUP BY
Date(DayTime)
P.S. I made a small optimization by using the BETWEEN operator (which evaluates the column value only once), and I used an escape sequence for the date constants (which I guess you will replace with parameters in real code, so this is just for illustration).
This data type hint is parsed by the FDSQLiteTypeName2ADDataType procedure, which takes a column name in the format <column name>::<type name> in its AColName parameter and parses it.
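To see why the driver has nothing to infer the type from, here is a small sketch using Python's built-in sqlite3 module instead of FireDAC. It runs the same query against an empty table: no rows come back, and as far as SQLite is concerned the ::BIGINT suffix is simply part of the column alias. Interpreting that suffix is something FireDAC does on top; Python leaves the name untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HistoryRuntime (DayTime DateTime, Device1 INTEGER DEFAULT (0))"
)

# Same shape as the query above, with plain string literals for the dates.
cur = conn.execute(
    """
    SELECT DayTime AS "TheDate", Sum(Device1) AS "DeviceTotal::BIGINT"
    FROM HistoryRuntime
    WHERE DayTime BETWEEN '2017-06-01' AND '2017-06-26'
    GROUP BY Date(DayTime)
    """
)

# SQLite reports the alias verbatim, including the type hint suffix.
column_names = [d[0] for d in cur.description]

# The table is empty, so there is no first record to take a value type from.
rows = cur.fetchall()
```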
I am modeling an OLAP cube using Mondrian Schema Workbench and using Jaspersoft to present it. The cube is built upon a fact table with FKs to dimension tables.
Currently my fact table has nullable foreign keys to the dimensions, which I personally find interesting (and, as far as I know, whether to use nullable or non-nullable FKs is just a style decision: https://dba.stackexchange.com/questions/3512/fact-table-foreign-keys-null ).
The problem is that when selecting ALL States (State is a dimension in my design), I get only the records that have a state, not the records without states (in which the state id is null).
Is Mondrian capable of getting the rows that have no state id information? How can I define that?
I think you'll have to go with non-nullable FKs and a none / n/a / unknown etc. member if you want the ALL member to refer to all facts.
If you later want to write queries that only consider rows with real dimension values, you can exclude the none member again.
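The effect can be sketched with Python's built-in sqlite3 module, assuming the ALL total is computed through an inner join between the fact table and the dimension table. All table names, column names, and the key 0 for the "Unknown" member are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state (state_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE fact (fact_id INTEGER PRIMARY KEY, state_id INTEGER, amount INTEGER);
    INSERT INTO state VALUES (1, 'Ohio');
    INSERT INTO state VALUES (2, 'Utah');
    INSERT INTO fact VALUES (1, 1, 10);
    INSERT INTO fact VALUES (2, 2, 20);
    INSERT INTO fact VALUES (3, NULL, 5);
    """
)

JOIN_TOTAL = (
    "SELECT SUM(f.amount) FROM fact f JOIN state s ON s.state_id = f.state_id"
)

# With a NULL foreign key, the inner join silently drops fact 3.
total_with_nulls = conn.execute(JOIN_TOTAL).fetchone()[0]

# Add an explicit 'Unknown' member and point the NULL facts at it.
conn.execute("INSERT INTO state VALUES (0, 'Unknown')")
conn.execute("UPDATE fact SET state_id = 0 WHERE state_id IS NULL")
total_with_unknown = conn.execute(JOIN_TOTAL).fetchone()[0]

# Queries that only care about real states can exclude the member again.
total_known_only = conn.execute(
    JOIN_TOTAL + " WHERE s.name <> 'Unknown'"
).fetchone()[0]
```

Here total_with_nulls is 30, total_with_unknown is 35, and total_known_only is 30 again.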
I am currently using DynamoDB and having a problem scanning. I am able to get paged results in forward order by using the ExclusiveStartKey. However, regardless of whether I set ScanIndexForward to true or false, I get results in forward order from my scan operation. How can I get results in reverse order from a Scan in DynamoDB?
ScanIndexForward is the correct way to get items in descending order by the range key of the table or index you are querying. From the AWS API Reference:
A value that specifies ascending (true) or descending (false)
traversal of the index. DynamoDB returns results reflecting the
requested order determined by the range key. If the data type is
Number, the results are returned in numeric order. For type String,
the results are returned in order of ASCII character code values. For
type Binary, DynamoDB treats each byte of the binary data as unsigned
when it compares binary values.
Based on the docs for Scan, I conclude that there is no way to Scan in reverse. However, I would say that you are not using DynamoDB correctly if you need to do that. When designing a schema for a database like DynamoDB, you should plan the schema around your expected queries to ensure that almost all application queries have a good index. Scans are meant more for sysadmin operations or for feeding into MapReduce or analytics. "A Scan operation always scans the entire table, then filters out values to provide the desired result, essentially adding the extra step of removing data from the result set." (Query and Scan Performance) That can lead to performance problems and other issues.
Using DynamoDB is fundamentally different from working with a traditional relational database and requires a big change in the way you think about using it. You need to decide whether DynamoDB's advantages in storage, performance, reliability and availability are worth accepting its limitations.
As of now, a DynamoDB Scan cannot return sorted results.
You need to use a query with a new global secondary index (GSI) with a hashkey and range field. The trick is to use a hashkey which is assigned the same value for all data in your table.
I recommend making a new field for all data and calling it "Status" and set the value to "OK", or something similar.
Then your query to get all the results sorted would look like this:
{
TableName: "YourTable",
IndexName: "Status-YourRange-index",
KeyConditions: {
Status: {
ComparisonOperator: "EQ",
AttributeValueList: [
"OK"
]
}
},
ScanIndexForward: false
}
The docs for how to write GSI queries are found here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Querying
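The request above uses the older KeyConditions syntax. As a rough sketch, the same query could be written with a KeyConditionExpression in the shape the low-level Query API accepts. The function name build_reverse_query is hypothetical, the table and index names are the placeholders from above, and the "#h" alias is used because Status is on DynamoDB's reserved word list. Nothing here talks to AWS; passing the dictionary to a client (for example boto3's client.query(**params)) is left out.

```python
from typing import Any, Dict


def build_reverse_query(
    table: str, index: str, hash_attr: str, hash_value: str
) -> Dict[str, Any]:
    """Build Query parameters that read one GSI partition in descending order.

    Hypothetical helper: it only assembles the request dictionary.
    """
    return {
        "TableName": table,
        "IndexName": index,
        # Alias the attribute name, since it may be a reserved word.
        "KeyConditionExpression": "#h = :v",
        "ExpressionAttributeNames": {"#h": hash_attr},
        "ExpressionAttributeValues": {":v": {"S": hash_value}},
        # False means descending order by the index's range key.
        "ScanIndexForward": False,
    }


params = build_reverse_query("YourTable", "Status-YourRange-index", "Status", "OK")
```

Keep in mind that giving every item the same hash key value puts all items of the index into a single partition, which is a trade-off of this approach.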
I'm concerned about read performance, I want to know if putting an indexed field value as null is faster than giving it a value.
I have lots of items with a status field. The status can be, "pending", "invalid", "banned", etc...
My typical request is to find the status "ok" (or null). Since null fields are not saved to the datastore, it is already a win to replace a "useless" default value with null, so I already use less disk space.
But I was wondering, since datastore is noSql, it doesn't know about the data structure and it doesn't know there is a missing column status. So how does it do the status = null request check?
Does it have to check all columns of each row trying to find my column? or is there some smarter mechanism?
For example, index (null=Entity,key) when we pass a column explicitly saying it is null (if this is the case, does Objectify respect that and keep the field in the list when passing it to the native API if it's null?)
And mainly, which request is more efficient?
The low-level API (and Objectify) stores and indexes nulls if you specify that a field/property should be indexed. For Objectify, you can specify @Ignore(IfNull.class) or @Unindex(IfNull.class) if you want to alter this behavior. You are probably confusing this with documentation for other data access APIs.
Since GAE only allows you to query for indexed fields, your question is really: Is it better to index nulls and query for them, or to query for everything and filter out non-null values?
This is purely a question of sparsity. If the overwhelming majority of your records contain null values, then you're probably better off querying for everything and filtering out the ones you don't want manually. A handful of extra entity reads are probably cheaper than updating and storing an extra index. On the other hand, if null records are a small percentage of your data, then you will certainly want the index.
This indexing dilemma is not unique to GAE. All databases present this question with respect to low-cardinality fields; it's just that they'll do the table scan (testing and skipping rows) for you.
If you really want to fine-tune this behavior, read Objectify's documentation on Partial Indexes.
null is also treated as a value in datastore and there will be entries for null values in indexes. Datastore doc says, "Datastore distinguishes between an entity that does not possess a property and one that possesses the property with a null value"
Datastore will never check all columns or all records. If you have this property indexed, it will get records from the index only. If it is not indexed, you cannot query by that property.
In terms of query performance, it should be the same, but you can always profile and check.
I'm working with SQLite in Flash.
I have this unique index:
CREATE UNIQUE INDEX songsIndex ON songs ( DiscID, Artist, Title )
I have a parameterised recursive function set up to insert any new rows (single or multiple).
It works fine if I try to insert a row with the same DiscID, Artist and Title as an existing row - ie it ignores inserting the existing row, and tells me that 0 out of 1 records were updated - GOOD.
However, if, for example the DiscId is blank, but the artist and title are not, a new record is created when there is already one with a blank DiscId and the same artist and title - BAD.
I traced out the disc id prior to the insert, and Flash is telling me it's undefined. So I've coded it to set anything undefined to "" (an empty string) to make sure it's truly an empty string being inserted - but subsequent inserts still ignore the unique index and add a brand new row even though the same row exists.
What am I misunderstanding?
Thanks for your time and help.
SQLite allows NULLable fields to participate in UNIQUE indexes. If you have such an index, and if you add records such that two of the three columns have identical values and the other column is NULL in both records, SQLite will allow that, matching the behavior you're seeing.
Therefore the most likely explanation is that despite your effort to INSERT zero-length strings, you're actually still INSERTing NULLs.
Also, unless you've explicitly included OR IGNORE in your INSERT statements, the expected behavior of SQLite is to throw an error when you attempt to insert a duplicate INDEX value into a UNIQUE INDEX. Since you're not seeing that behavior, I'm guessing that Flash provides some kind of wrapper around SQLite that's hiding the true behavior from you (and could also be translating empty strings to NULL).
Larry's answer is great. To anyone having the same problem here's the SQLite docs citation explaining that in this case all NULLs are treated as different values:
For the purposes of unique indices, all NULL values are considered
different from all other NULL values and are thus unique. This is one
of the two possible interpretations of the SQL-92 standard (the
language in the standard is ambiguous). The interpretation used by
SQLite is the same and is the interpretation followed by PostgreSQL,
MySQL, Firebird, and Oracle. Informix and Microsoft SQL Server follow
the other interpretation of the standard, which is that all NULL
values are equal to one another.
See here: https://www.sqlite.org/lang_createindex.html
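The difference between NULL and an empty string can be checked outside Flash. The following sketch uses Python's built-in sqlite3 module with the same index definition; the table layout (three TEXT columns) and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (DiscID TEXT, Artist TEXT, Title TEXT)")
conn.execute("CREATE UNIQUE INDEX songsIndex ON songs (DiscID, Artist, Title)")

INSERT = "INSERT INTO songs (DiscID, Artist, Title) VALUES (?, ?, ?)"

# Two rows with a NULL DiscID: both are accepted, because every NULL
# counts as distinct for the purposes of the unique index.
conn.execute(INSERT, (None, "Artist", "Title"))
conn.execute(INSERT, (None, "Artist", "Title"))
null_rows = conn.execute(
    "SELECT COUNT(*) FROM songs WHERE DiscID IS NULL"
).fetchone()[0]

# Two rows with an empty-string DiscID: the second one violates the index.
conn.execute(INSERT, ("", "Artist", "Title"))
try:
    conn.execute(INSERT, ("", "Artist", "Title"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With OR IGNORE the duplicate is skipped quietly instead of raising.
conn.execute(
    "INSERT OR IGNORE INTO songs (DiscID, Artist, Title) VALUES (?, ?, ?)",
    ("", "Artist", "Title"),
)
empty_rows = conn.execute(
    "SELECT COUNT(*) FROM songs WHERE DiscID = ''"
).fetchone()[0]
```

If the rows in your table show up under the IS NULL count rather than the empty-string count, the wrapper is still inserting NULLs.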