I have the following seed file entry:
shapes = Shape.create([
  {
    id: 1,
    name: 'Rectangle',
    surface_count: 4,
    created_at: Time.now,
    updated_at: Time.now
  },
  {
    id: 3,
    name: 'H-shape',
    surface_count: 12,
    created_at: Time.now,
    updated_at: Time.now
  }
])
When I seed my SQLite database I end up with two rows that have ids one and two, not one and three. This is just a sample. The actual table is much larger. I am trying to get my test environment to match my production environment where the second row was just deleted and the remaining rows were left as-is.
SQLite allows you to override an autoincrement value simply by including the id column in your INSERT statement.
From your description, it seems likely that your actual insert statement is not including id and your id is defined as autoincrement. If you are using some kind of framework to serialize, you'll need to see how it is defined there and how it treats autoincrement fields. Tracing the actual SQL statement should tell you.
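To see the override at the SQL level, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names mirror the seed file, but the schema itself is an assumption, since the question does not show it.

```python
import sqlite3

# In-memory database; the schema is an assumption modelled on the seed file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shapes ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, "
    "surface_count INTEGER NOT NULL)"
)

# Naming the id column in the INSERT overrides the autoincrement value.
conn.executemany(
    "INSERT INTO shapes (id, name, surface_count) VALUES (?, ?, ?)",
    [(1, "Rectangle", 4), (3, "H-shape", 12)],
)

# Read the ids back to confirm they were stored as given.
ids = [row[0] for row in conn.execute("SELECT id FROM shapes ORDER BY id")]
print(ids)
```

If the rows come back as 1 and 3 here but as 1 and 2 from your seed file, the framework is most likely dropping id before it builds the INSERT.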
CURRENTLY
I have a table in DynamoDB with a single attribute - Primary Key - that contains unique values.
PK
------
#A#B#C#
#B#C#
#C#D#E#
#BC#
ISSUE
I am looking to do 2 searches for #B#C# (1) exact match, and (2) containing match, and therefore only want results:
(1) Exact Match:
#B#C#
(2) Containing Match:
#A#B#C#
#B#C#
Are these 2 searches possible against the primary key?
If so, what is the most efficient query to run? e.g. QUERY or SCAN
Note:
For (2) I am using the following code, but it is returning all items in DB:
params = {
  TableName: 'myTable',
  FilterExpression: "contains(#key, :v)",
  ExpressionAttributeNames: { "#key": "PK" },
  ExpressionAttributeValues: { ":v": "#B#C#" }
}
dynamodb.scan(params, callback)
DynamoDB supports two main types of searches: query and scan. The Query operation finds items based on primary key values. The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
If you wanted to find the item with the primary key #B#C#, you would use the query API:
ddbClient.query(
  {
    "TableName": "<YOUR TABLE NAME>",
    "KeyConditionExpression": "#pk = :pk",
    "ExpressionAttributeValues": {
      ":pk": {
        "S": "#B#C#"
      }
    },
    "ExpressionAttributeNames": {
      "#pk": "PK"
    }
  }
)
For your second access pattern, you'll need to use the scan API because you are searching across the entire table/secondary index.
You can use scan to test if a primary key has a substring using contains. I don't see anything wrong with the format of your scan operation.
Be careful when using scan this way. Because scan will read your entire table to fetch results, you will have a fairly inefficient operation at scale. If this operation is run infrequently, or you are running it against a sparse index, it's probably fine. However, if it's one of your primary access patterns, you may want to reconsider using the scan API for this operation.
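For the containing match, here is a hedged sketch of the same scan in Python. It assumes a boto3-style low-level client whose scan method accepts these keyword arguments and returns Items plus an optional LastEvaluatedKey. The function name scan_keys_containing is hypothetical. A single Scan call reads at most 1 MB of data and applies the filter only after reading, so the loop follows LastEvaluatedKey until the whole table has been covered.

```python
def scan_keys_containing(client, table_name, needle):
    """Return every PK value that contains `needle`, following scan pagination."""
    params = {
        "TableName": table_name,
        "FilterExpression": "contains(#key, :v)",
        "ExpressionAttributeNames": {"#key": "PK"},
        # Low-level clients expect typed values; "S" marks a string.
        "ExpressionAttributeValues": {":v": {"S": needle}},
    }
    matches = []
    while True:
        page = client.scan(**params)
        matches.extend(item["PK"]["S"] for item in page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            # No more pages: the entire table has been read.
            return matches
        # Resume the scan where the previous page stopped.
        params["ExclusiveStartKey"] = last_key
```

You would call it as scan_keys_containing(boto3.client("dynamodb"), "myTable", "#B#C#"), assuming boto3 is installed and credentials are configured. Every page still consumes read capacity for all items read, not just the matches, which is why this gets expensive at scale.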
Please forgive my ignorance on sqlalchemy, up until this point I've been able to navigate the seas just fine. What I'm looking to do is this:
Return a count of how many items are in the table.
Return a count of how many times each status appears in the table.
I'm currently using sqlalchemy, but even a pure sqlite solution would be beneficial in figuring out what I'm missing.
Here is how my table is configured:
class KbStatus(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    status = db.Column(db.String, nullable=False)
It's a very basic table but I'm having a hard time getting back the data I'm looking for. I have this working with 2 separate queries, but I have to believe there is a way to do this all in one query.
Here are the separate queries I'm running:
total = len(cls.query.all())
status_count = cls.query.with_entities(KbStatus.status, func.count(KbStatus.id).label("total")).group_by(KbStatus.status).all()
From here I'm converting it to a dict and combining it to make the output look like so:
{
  "data": {
    "status_count": {
      "Assigned": 1,
      "In Progress": 1,
      "Peer Review": 1,
      "Ready to Publish": 1,
      "Unassigned": 4
    },
    "total_requests": 8
  }
}
Any help is greatly appreciated.
I don't know about sqlalchemy, but it's possible to generate the results you want in a single query with pure sqlite using the JSON1 extension:
Given the following table and data:
CREATE TABLE data(id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO data(status) VALUES ('Assigned'),('In Progress'),('Peer Review'),('Ready to Publish')
,('Unassigned'),('Unassigned'),('Unassigned'),('Unassigned');
CREATE INDEX data_idx_status ON data(status);
this query
WITH individuals AS (SELECT status, count(status) AS total FROM data GROUP BY status)
SELECT json_object('data'
, json_object('status_count'
, json_group_object(status, total)
, 'total_requests'
, (SELECT sum(total) FROM individuals)))
FROM individuals;
will return one row holding the following (shown after running through a JSON pretty printer; the actual string is more compact):
{
  "data": {
    "status_count": {
      "Assigned": 1,
      "In Progress": 1,
      "Peer Review": 1,
      "Ready to Publish": 1,
      "Unassigned": 4
    },
    "total_requests": 8
  }
}
If the sqlite instance you're using wasn't built with support for JSON1:
SELECT status, count(status) AS total FROM data GROUP BY status;
will give
status total
-------------------- ----------
Assigned 1
In Progress 1
Peer Review 1
Ready to Publish 1
Unassigned 4
which you can iterate through in Python, inserting each row into your dict and adding up the total values in another variable as you go to get the total_requests value at the end. There is no need for another query just to calculate that number; do it manually. I bet it's really easy to do the same thing with your existing second sqlalchemy query.
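A minimal sketch of that loop, using Python's built-in sqlite3 module against the same table and data as above (in-memory here, purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data(id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO data(status) VALUES (?)",
    [("Assigned",), ("In Progress",), ("Peer Review",), ("Ready to Publish",)]
    + [("Unassigned",)] * 4,
)

status_count = {}
total_requests = 0
# One GROUP BY query; the grand total is accumulated while iterating.
for status, total in conn.execute(
    "SELECT status, count(status) AS total FROM data GROUP BY status"
):
    status_count[status] = total
    total_requests += total

result = {"data": {"status_count": status_count, "total_requests": total_requests}}
print(result)
```

The same accumulation should work over the (status, total) rows returned by your group_by query, which would make the separate len(cls.query.all()) call unnecessary.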
I'm familiar with MySQL and am starting to use Amazon DynamoDB for a new project.
Assume I have a MySQL table like this:
CREATE TABLE foo (
    id CHAR(64) NOT NULL,
    scheduledDelivery DATETIME NOT NULL,
    -- ...other columns...
    PRIMARY KEY(id),
    INDEX schedIndex (scheduledDelivery)
);
Note the secondary index schedIndex, which is supposed to speed up the following query (which is executed periodically):
SELECT *
FROM foo
WHERE scheduledDelivery <= NOW()
ORDER BY scheduledDelivery ASC
LIMIT 100;
That is: Take the 100 oldest items that are due to be delivered.
With DynamoDB I can use the id column as primary partition key.
However, I don't understand how I can avoid full-table scans in DynamoDB. When adding a secondary index I must always specify a "partition key". However, (in MySQL words) I see these problems:
the scheduledDelivery column is not unique, so it can't be used as a partition key itself AFAIK
adding id as a unique partition key and using scheduledDelivery as the "sort key" sounds like an (id, scheduledDelivery) secondary index to me, which makes that index practically useless
I understand that MySQL and DynamoDB require different approaches, so what would be an appropriate solution in this case?
It's not possible to avoid a full table scan with this kind of query.
However, you may be able to disguise it as a Query operation, which would allow you to sort the results (not possible with a Scan).
You must first create a GSI. Let's name it scheduled_delivery-index.
We will specify our index's partition key to be an attribute named fixed_val, and our sort key to be scheduled_delivery.
fixed_val will contain any value you want, but it must always be that value, and you must know it from the client side. For the sake of this example, let's say that fixed_val will always be 1.
GSI keys do not have to be unique, so don't worry if there are two duplicated scheduled_delivery values.
You would query the table like this:
var now = Date.now();
//...
{
  TableName: "foo",
  IndexName: "scheduled_delivery-index",
  ExpressionAttributeNames: {
    "#f": "fixed_val",
    "#d": "scheduled_delivery"
  },
  ExpressionAttributeValues: {
    ":f": 1,
    ":d": now
  },
  KeyConditionExpression: "#f = :f and #d <= :d",
  ScanIndexForward: true
}
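For comparison, here is a hedged Python sketch that builds the equivalent request for a boto3-style low-level client, with a Limit added to mirror the LIMIT 100 of the MySQL query. It assumes scheduled_delivery is stored as a Number holding epoch milliseconds (which is what Date.now() produces) and that fixed_val is the Number 1. The helper name build_due_items_query is hypothetical.

```python
import time


def build_due_items_query(now_ms, limit=100):
    """Build query kwargs for the oldest items that are due for delivery."""
    return {
        "TableName": "foo",
        "IndexName": "scheduled_delivery-index",
        # Fixed partition key value plus a range condition on the sort key.
        "KeyConditionExpression": "#f = :f AND #d <= :d",
        "ExpressionAttributeNames": {"#f": "fixed_val", "#d": "scheduled_delivery"},
        # Low-level clients expect numbers as strings tagged with "N".
        "ExpressionAttributeValues": {":f": {"N": "1"}, ":d": {"N": str(now_ms)}},
        "ScanIndexForward": True,  # ascending sort key order, oldest first
        "Limit": limit,
    }


query_kwargs = build_due_items_query(int(time.time() * 1000))
```

The dict can then be passed as client.query(**query_kwargs). Keep in mind that every item shares one partition key value in this design, so all reads and writes for the index land on a single partition.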
Let's say that I am storing records with following structure in DynamoDB:
{
  "id": "57cf5b43-f9ec-4796-9de6-6a50f556cfd8",
  "created_at": "2015-09-18T13:27:00+12:00",
  "count": 3
}
Now, is it possible to achieve the following in one request:
if the record with the given id doesn't exist, it should be created with count = 1
if the record for that id exists, its counter should be incremented.
Currently I'm doing a query to check whether the record exists, and depending on the result I do a put or an update. It would be nice to fold that into a single operation.
What I didn't mention in my question was that I wanted the count to go up for subsequent events without modifying created_at.
My final working UpdateInput looks like this:
{
  Key: {
    id: {
      S: "some_unique_id"
    }
  },
  TableName: "test",
  ExpressionAttributeNames: {
    #t: "created_at",
    #c: "count"
  },
  ExpressionAttributeValues: {
    :t: {
      S: "2015-09-26T15:58:57+12:00"
    },
    :c: {
      N: "1"
    }
  },
  UpdateExpression: "SET #t = if_not_exists(#t, :t) ADD #c :c"
}
You can do this in a single call with the UpdateItem API and an UpdateExpression. Since count is a Number here, you can use either the SET or the ADD action.
The documentation for ADD tells you that you can use it for Number types (emphasis mine):
ADD - Adds the specified value to the item, if the attribute does not already exist. If the attribute does exist, then the behavior of ADD depends on the data type of the attribute:
If the existing attribute is a number, and if Value is also a number, then Value is mathematically added to the existing attribute. If Value is a negative number, then it is subtracted from the existing attribute.
If you use ADD to increment or decrement a number value for an item that doesn't exist before the update, DynamoDB uses 0 as the initial value. Similarly, if you use ADD for an existing item to increment or decrement an attribute value that doesn't exist before the update, DynamoDB uses 0 as the initial value. For example, suppose that the item you want to update doesn't have an attribute named itemcount, but you decide to ADD the number 3 to this attribute anyway. DynamoDB will create the itemcount attribute, set its initial value to 0, and finally add 3 to it. The result will be a new itemcount attribute in the item, with a value of 3.
For your example, you could have your UpdateExpression be ADD #c :n, where :n has an ExpressionAttributeValue of the Number type, 1 is the value, and #c has the ExpressionAttributeName substitution for count. You need to use a placeholder for count because it is a reserved word.
See more examples on the documentation page Modifying Items and Attributes with Update Expressions.
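Putting the pieces together, a hedged Python sketch of the single-call upsert might look like the following. It assumes a boto3-style low-level client whose update_item method takes these keyword arguments. The function name record_event is hypothetical, and the table and attribute names are taken from the question.

```python
def record_event(client, table_name, record_id, created_at):
    """Create the record with count = 1, or increment count if it already exists."""
    response = client.update_item(
        TableName=table_name,
        Key={"id": {"S": record_id}},
        # SET keeps the first created_at; ADD starts count at 0 and adds 1.
        UpdateExpression="SET #t = if_not_exists(#t, :t) ADD #c :n",
        # count is a reserved word, so it needs a placeholder.
        ExpressionAttributeNames={"#t": "created_at", "#c": "count"},
        ExpressionAttributeValues={":t": {"S": created_at}, ":n": {"N": "1"}},
        ReturnValues="UPDATED_NEW",
    )
    # Numbers come back as strings in the low-level response format.
    return int(response["Attributes"]["count"]["N"])
```

Calling it twice with the same id should return 1 and then 2, with created_at left at the value from the first call.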
I am trying to move from DynamoDB to DynamoDB2 to use tables with global secondary indices. I need to create a table and then batch-write items into it. Here's a block of test code:
from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
from boto.dynamodb2.layer1 import DynamoDBConnection
from boto.dynamodb2.table import Table
from boto.dynamodb2.items import Item
import boto
conn = DynamoDBConnection(aws_access_key_id=<MYID>,aws_secret_access_key=<MYKEY>)
tables = conn.list_tables()
table_name = 'myTable001'
if table_name not in tables['TableNames']:
    Table.create(table_name, schema=[HashKey('firstKey')], throughput={'read': 5, 'write': 2}, global_indexes=[
        GlobalAllIndex('secondKeyIndex', parts=[HashKey('secondKey')], throughput={'read': 5, 'write': 3})], connection=conn)
table = Table(table_name, connection=conn)
with table.batch_write() as batch:
    batch.put_item(data={'firstKey': 'fk01', 'secondKey':'sk001', 'message': '{"firstKey":"fk01", "secondKey":"sk001", "comments": "fk01-sk001"}'})
    # ...
    batch.put_item(data={'firstKey': 'fk74', 'secondKey':'sk112', 'message': '{"firstKey":"fk74", "secondKey":"sk012", "comments": "fk74-sk012"}'})
When I run this code for the 1st time with a new value of table_name, I get the following error at the last line of the block:
boto.exception.JSONResponseError: JSONResponseError: 400 Bad Request
{u'message': u'Requested resource not found', u'__type': u'com.amazonaws.dynamodb.v20120810#ResourceNotFoundException'}
When I run it one more time, it executes fine. I suspect the reason is simply that the table is still being created when I run the code for the first time. How do I check a table's status in DDB2? In DDB I used table.status, but this does not seem to be available in DDB2. What should I use instead?
UPDATE: Based on final response here, the right way to extract table status is:
tdescr = conn.describe_table(table_name)
print(tdescr['Table']['TableStatus'])
Here are the other elements of the description dictionary:
for key in tdescr['Table'].keys():
    print(key)
GlobalSecondaryIndexes
AttributeDefinitions
ProvisionedThroughput
TableSizeBytes
TableName
TableStatus
KeySchema
ItemCount
CreationDateTime
You can use conn.describe_table('table') to fetch the details of the table and then check the TableStatus field in the returned output.
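To avoid the ResourceNotFoundException on the first run, you could poll that field until the table is ready before writing. Below is a minimal sketch. It assumes only that conn.describe_table(table_name) returns the dictionary shown in the question and that a ready table reports the status ACTIVE. The function name wait_until_active and its parameters are hypothetical.

```python
import time


def wait_until_active(conn, table_name, poll_seconds=2.0, max_attempts=30,
                      sleep=time.sleep):
    """Poll describe_table until TableStatus is ACTIVE; return False on timeout."""
    for _ in range(max_attempts):
        status = conn.describe_table(table_name)["Table"]["TableStatus"]
        if status == "ACTIVE":
            return True
        # Table is still being created (or updated); wait and try again.
        sleep(poll_seconds)
    return False
```

In the code from the question, calling wait_until_active(conn, table_name) right after Table.create(...) and before table.batch_write() should be enough. The sleep parameter exists only so the delay can be swapped out in tests.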