I am trying to move from DynamoDB to DynamoDB2 to use tables with global secondary indices. I need to create a table and then batch-write items into it. Here's a block of test code:
from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
from boto.dynamodb2.layer1 import DynamoDBConnection
from boto.dynamodb2.table import Table
from boto.dynamodb2.items import Item
import boto
conn = DynamoDBConnection(aws_access_key_id='<MYID>', aws_secret_access_key='<MYKEY>')
tables = conn.list_tables()
table_name = 'myTable001'
if table_name not in tables['TableNames']:
    Table.create(table_name, schema=[HashKey('firstKey')], throughput={'read': 5, 'write': 2}, global_indexes=[
        GlobalAllIndex('secondKeyIndex', parts=[HashKey('secondKey')], throughput={'read': 5, 'write': 3})], connection=conn)
table = Table(table_name, connection=conn)
with table.batch_write() as batch:
    batch.put_item(data={'firstKey': 'fk01', 'secondKey':'sk001', 'message': '{"firstKey":"fk01", "secondKey":"sk001", "comments": "fk01-sk001"}'})
    # ...
    batch.put_item(data={'firstKey': 'fk74', 'secondKey':'sk112', 'message': '{"firstKey":"fk74", "secondKey":"sk012", "comments": "fk74-sk012"}'})
When I run this code for the 1st time with a new value of table_name, I get the following error at the last line of the block:
boto.exception.JSONResponseError: JSONResponseError: 400 Bad Request
{u'message': u'Requested resource not found', u'__type': u'com.amazonaws.dynamodb.v20120810#ResourceNotFoundException'}
When I run it a second time, it executes fine. I suspect the reason is simply that the table is still being created the first time I run it. How do I check a table's status in DDB2? In DDB I used table.status, but this does not seem to be available in DDB2. What should I use instead?
UPDATE: Based on the final response here, the right way to extract the table status is:
tdescr = conn.describe_table(table_name)
print(tdescr['Table']['TableStatus'])
Here are the other elements of the description dictionary:
for key in tdescr['Table'].keys():
    print(key)
GlobalSecondaryIndexes
AttributeDefinitions
ProvisionedThroughput
TableSizeBytes
TableName
TableStatus
KeySchema
ItemCount
CreationDateTime
You can use conn.describe_table('table') to fetch the details about the table and then check the TableStatus field in the returned output.
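The same check can be wrapped in a polling loop so the batch write only starts once the table is usable. This is a minimal sketch, not part of boto's API: `wait_until_active` is a hypothetical helper name, and it assumes `conn.describe_table(name)` returns the DescribeTable response as a dict (as in the UPDATE above). It also waits for every global secondary index to report `IndexStatus` ACTIVE; the timeout and poll interval are arbitrary defaults.

```python
import time


def wait_until_active(conn, table_name, timeout=120.0, poll_interval=2.0):
    """Poll describe_table until the table and all of its GSIs are ACTIVE.

    `conn` is any object with describe_table(name) returning the DescribeTable
    response as a dict, e.g. boto.dynamodb2.layer1.DynamoDBConnection.
    Returns the final 'Table' description, or raises RuntimeError on timeout.
    """
    deadline = time.time() + timeout
    while True:
        description = conn.describe_table(table_name)['Table']
        indexes = description.get('GlobalSecondaryIndexes', [])
        table_ready = description['TableStatus'] == 'ACTIVE'
        indexes_ready = all(idx.get('IndexStatus') == 'ACTIVE' for idx in indexes)
        if table_ready and indexes_ready:
            return description
        if time.time() >= deadline:
            raise RuntimeError('%s still %s after %s seconds'
                               % (table_name, description['TableStatus'], timeout))
        time.sleep(poll_interval)
```

In the question's code, calling `wait_until_active(conn, table_name)` right after `Table.create(...)` (inside the `if`) should keep the first run from hitting the ResourceNotFoundException.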
Related
This is my first time using boto3 to query items from my DynamoDB and I can't figure out how to grab a certain value.
My table has a primary key of "Company" and a sort key of "DailyPrice".
I looked at the boto3 docs and used the example they had and I'm able to return all of the information related to AAPL by searching for that key value.
Here's my Python script:
import boto3
client = boto3.client('dynamodb')
response = client.query(
    ExpressionAttributeValues={
        ':AAPL': {
            'S': 'AAPL',
        },
    },
    KeyConditionExpression='Company = :AAPL',
    TableName='stock_tracker',
)
number_of_days = response['Count']
items = response['Items']
print(items)
Here's the response:
{'Items': [
{'Company': {'S': 'AAPL'}, 'DailyPrice': {'S': '142.56'}},
{'Company': {'S': 'AAPL'}, 'DailyPrice': {'S': '154.51'}},
{'Company': {'S': 'AAPL'}, 'DailyPrice': {'S': '156.77'}}],
'Count': 3,
'ScannedCount': 3,}
I basically want to grab the daily price of every AAPL item, because I want to add them all up in a separate Python script. I'm not sure how to extract just the daily price from my DynamoDB query.
Your life will be easier with boto3.resource than with boto3.client, because you don't need all that 'S' type wrapping around every value.
Here's a repo with sample code:
https://github.com/aws-samples/aws-dynamodb-examples/blob/master/DynamoDB-SDK-Examples/python/WorkingWithQueries/query_equals.py
Then just loop over the returned values in Python.
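A minimal sketch of that loop, summing the prices from the response shown above. `daily_prices` and `query_all_for_company` are hypothetical helper names. Because DailyPrice is stored as a string ('S') in this table, the sketch converts each value with `Decimal` rather than `float` to avoid rounding surprises; it accepts both the low-level client shape (`{'S': '142.56'}`) and the plain values a `boto3.resource` Table returns. The query helper follows `LastEvaluatedKey`, since a single Query call returns at most 1 MB of data.

```python
from decimal import Decimal


def daily_prices(items):
    """Return each item's DailyPrice as a Decimal.

    Works for client items ({'DailyPrice': {'S': '142.56'}}) and for
    resource items ({'DailyPrice': '142.56'} or a Decimal).
    """
    prices = []
    for item in items:
        value = item['DailyPrice']
        if isinstance(value, dict):  # client-style typed attribute value
            value = value['S'] if 'S' in value else value['N']
        prices.append(Decimal(str(value)))
    return prices


def query_all_for_company(table, company):
    """Yield every item for one Company from a boto3 dynamodb.Table resource."""
    from boto3.dynamodb.conditions import Key  # boto3 is only needed when querying
    kwargs = {'KeyConditionExpression': Key('Company').eq(company)}
    while True:
        response = table.query(**kwargs)
        for item in response['Items']:
            yield item
        if 'LastEvaluatedKey' not in response:
            break
        kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']


if __name__ == '__main__':
    # Items copied from the client response above
    items = [
        {'Company': {'S': 'AAPL'}, 'DailyPrice': {'S': '142.56'}},
        {'Company': {'S': 'AAPL'}, 'DailyPrice': {'S': '154.51'}},
        {'Company': {'S': 'AAPL'}, 'DailyPrice': {'S': '156.77'}},
    ]
    print(sum(daily_prices(items)))  # 453.84
```

With the resource, the equivalent would be `sum(daily_prices(query_all_for_company(boto3.resource('dynamodb').Table('stock_tracker'), 'AAPL')))`.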
I'm writing a function that takes in a raw data table (containing multi-JSON telemetry data) and reformats it into multiple columns. I use .set MyTable <| myfunction | limit 0 to create my target table based on the function's output, and then use an update policy (set with .alter table) to populate my target table.
Here is the code:
.set-or-append MyTargetTable <|
myfunction
| limit 0
.alter table MyTargetTable policy update
@'[{ "IsEnabled": true, "Source": "raw", "Query": "myfunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
But I'm getting ingestion failures. Here is the ingestion failure message:
Failed to invoke update policy. Target Table = 'MyTargetTable', Query = '
let raw = __table("raw", 'All', 'AllButRowStore')
| where extent_id() in (guid(659e3b3c-6859-426d-9c37-003623834455));
myfunction()': Query schema does not match table schema
I double-checked the query schema and the target table; they are the same. I'm not sure what this error means.
Also, I ran a count on both the raw table and MyTargetTable; there is a relatively large discrepancy (400 rows in MyTargetTable versus 2000 rows in the raw table).
Any advice will be appreciated.
Generally speaking, to find the root of the mismatch between the schemas, you can run something along the following lines and filter for differences:
myfunction
| getschema
| join kind=leftouter (
table('MyTargetTable')
| getschema
) on ColumnOrdinal, ColumnType
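If you'd rather compare the two schemas outside Kusto, here is a small Python sketch of the same idea. It assumes you have copied the `ColumnName` and `ColumnType` columns of both `getschema` outputs, in ordinal order, into lists of tuples by hand; `schema_mismatches` and the sample columns are hypothetical. Like the join above, it compares by ordinal and type, and it additionally flags extra columns on the table side, which the left-outer join alone would not show.

```python
def schema_mismatches(function_schema, table_schema):
    """Compare two getschema outputs position by position.

    Each argument is a list of (ColumnName, ColumnType) tuples in ColumnOrdinal
    order. Returns (ordinal, function_column, table_column) for every position
    whose type differs or that exists on only one side (the missing side is None).
    """
    mismatches = []
    for ordinal in range(max(len(function_schema), len(table_schema))):
        left = function_schema[ordinal] if ordinal < len(function_schema) else None
        right = table_schema[ordinal] if ordinal < len(table_schema) else None
        left_type = left[1] if left else None
        right_type = right[1] if right else None
        if left_type != right_type:
            mismatches.append((ordinal, left, right))
    return mismatches


if __name__ == '__main__':
    # Hypothetical column lists, for illustration only
    from_function = [('Timestamp', 'datetime'), ('DeviceId', 'string'), ('Value', 'real')]
    from_table = [('Timestamp', 'datetime'), ('DeviceId', 'string'), ('Value', 'long'),
                  ('Extra', 'string')]
    for row in schema_mismatches(from_function, from_table):
        print(row)
```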
In addition, you should make sure the output schema of the function used in your update policy is 'stable', i.e. it isn't affected by the input data.
The output schema of some query plugins, such as pivot() and bag_unpack(), depends on the input data, so using them in update policies isn't recommended.
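To see why an input-dependent schema is a problem, here is a loose Python analogy (not Kusto code): `unpack_bags` stands in for bag_unpack(), deriving its columns from whatever keys appear in the current batch, so two ingestion batches can produce two different schemas. `project_fixed` stands in for the stable alternative of extracting an explicit, fixed list of fields, so every batch yields the same columns. Both function names and the sample fields are hypothetical.

```python
def unpack_bags(rows):
    """Analogy for bag_unpack(): one column per key seen in *this* batch."""
    columns = sorted({key for row in rows for key in row})
    return columns, [[row.get(col) for col in columns] for row in rows]


def project_fixed(rows, columns):
    """Stable alternative: always the same columns, None where a key is missing."""
    return list(columns), [[row.get(col) for col in columns] for row in rows]


if __name__ == '__main__':
    batch1 = [{'temp': 20.5, 'humidity': 41}]
    batch2 = [{'temp': 21.0}]  # this batch happens to lack 'humidity'
    print(unpack_bags(batch1)[0], unpack_bags(batch2)[0])  # column lists differ
    print(project_fixed(batch2, ['temp', 'humidity']))      # column list is fixed
```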
I have a table, AdvgCountries, which has two columns:
a. CountryId (String) (Partition Key)
b. CountryName (String) (Sort Key)
When creating the table, I defined only the partition key; later I added a global secondary index with the following settings:
Index name: CountryName-index
Type: GSI
Partition key: CountryId
Sort key: CountryName
I am able to retrieve CountryName based on CountryId, but unable to retrieve CountryId based on CountryName. From my reading, this should be possible by providing the index name, but I get the following error:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Query operation: Query condition missed key schema element: CountryId
import boto3
import json
import os
from boto3.dynamodb.conditions import Key, Attr
def query_bycountryname(pCountryname, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="https://dynamodb.us-east-1.amazonaws.com")
    table = dynamodb.Table('AdvgCountires')
    print("table")
    response = table.query(
        IndexName="CountryName-index",
        KeyConditionExpression=Key('CountryName').eq(pCountryname),
    )
    print("before return")
    # Always return a list of items so the caller can iterate over it
    return response.get('Items', [])
if __name__ == '__main__':
    CountryName = "India"
    print(f"Data for {CountryName}")
    countries = query_bycountryname(CountryName)
    for country in countries:
        print(country['CountryId'], ":", country['CountryName'])
Any help is appreciated.
You can't fetch the partition key value based on the sort key alone; DynamoDB does not work like this.
In DynamoDB, each item's location is determined by the hash value of its partition key.
The Query operation in Amazon DynamoDB finds items based on primary key values.
Key condition expressions are used to write conditional statements by using comparison operators that evaluate against a key and limit the items returned. In other words, you can use special operators to include, exclude, and match items by their sort key values.
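In this case the index itself is the obstacle: CountryName-index uses CountryId as its partition key, so a Query on it still needs an equality condition on CountryId. If you control the table, one option is to add a second GSI whose partition key is CountryName. The sketch below assumes boto3 and the UpdateTable `GlobalSecondaryIndexUpdates` request shape; `build_gsi_create_request`, `country_ids_by_name`, and the index name `CountryNameOnly-index` are hypothetical. Pass capacity units only for a provisioned-capacity table and omit them for an on-demand one.

```python
def build_gsi_create_request(table_name, index_name, read_units=None, write_units=None):
    """Build update_table kwargs that add a GSI keyed on CountryName alone."""
    create = {
        'IndexName': index_name,
        'KeySchema': [{'AttributeName': 'CountryName', 'KeyType': 'HASH'}],
        'Projection': {'ProjectionType': 'ALL'},
    }
    if read_units is not None and write_units is not None:
        # Only valid for provisioned-capacity tables
        create['ProvisionedThroughput'] = {
            'ReadCapacityUnits': read_units,
            'WriteCapacityUnits': write_units,
        }
    return {
        'TableName': table_name,
        # A new key attribute must be declared in AttributeDefinitions
        'AttributeDefinitions': [{'AttributeName': 'CountryName', 'AttributeType': 'S'}],
        'GlobalSecondaryIndexUpdates': [{'Create': create}],
    }


def country_ids_by_name(table, country_name, index_name):
    """Query the new index; `table` is a boto3 dynamodb.Table resource."""
    from boto3.dynamodb.conditions import Key  # boto3 is only needed when querying
    response = table.query(
        IndexName=index_name,
        KeyConditionExpression=Key('CountryName').eq(country_name),
    )
    return [item['CountryId'] for item in response.get('Items', [])]
```

Usage would be along the lines of `boto3.client('dynamodb').update_table(**build_gsi_create_request('AdvgCountries', 'CountryNameOnly-index', 5, 5))`, then waiting until the new index's `IndexStatus` is ACTIVE before calling `country_ids_by_name(table, 'India', 'CountryNameOnly-index')`.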
Please forgive my ignorance of SQLAlchemy; up until this point I've been able to navigate the seas just fine. What I'm looking to do is this:
Return a count of how many items are in the table.
Return a count of how many times each status appears in the table.
I'm currently using sqlalchemy, but even a pure sqlite solution would be beneficial in figuring out what I'm missing.
Here is how my table is configured:
class KbStatus(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    status = db.Column(db.String, nullable=False)
It's a very basic table but I'm having a hard time getting back the data I'm looking for. I have this working with 2 separate queries, but I have to believe there is a way to do this all in one query.
Here are the separate queries I'm running:
total = len(cls.query.all())
status_count = cls.query.with_entities(KbStatus.status, func.count(KbStatus.id).label("total")).group_by(KbStatus.status).all()
From here I'm converting it to a dict and combining it to make the output look like so:
{
"data": {
"status_count": {
"Assigned": 1,
"In Progress": 1,
"Peer Review": 1,
"Ready to Publish": 1,
"Unassigned": 4
},
"total_requests": 8
}
}
Any help is greatly appreciated.
I don't know about sqlalchemy, but it's possible to generate the results you want in a single query with pure sqlite using the JSON1 extension:
Given the following table and data:
CREATE TABLE data(id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO data(status) VALUES ('Assigned'),('In Progress'),('Peer Review'),('Ready to Publish')
,('Unassigned'),('Unassigned'),('Unassigned'),('Unassigned');
CREATE INDEX data_idx_status ON data(status);
this query
WITH individuals AS (SELECT status, count(status) AS total FROM data GROUP BY status)
SELECT json_object('data'
, json_object('status_count'
, json_group_object(status, total)
, 'total_requests'
, (SELECT sum(total) FROM individuals)))
FROM individuals;
will return one row holding the following (after running it through a JSON pretty printer; the actual string is more compact):
{
"data": {
"status_count": {
"Assigned": 1,
"In Progress": 1,
"Peer Review": 1,
"Ready to Publish": 1,
"Unassigned": 4
},
"total_requests": 8
}
}
If the sqlite instance you're using wasn't built with support for JSON1:
SELECT status, count(status) AS total FROM data GROUP BY status;
will give
status                total
--------------------  ----------
Assigned              1
In Progress           1
Peer Review           1
Ready to Publish      1
Unassigned            4
which you can iterate through in Python, inserting each row into your dict and adding up all the total values in another variable as you go, to get the total_requests value at the end. There's no need for another query just to calculate that number; do it manually. I bet it's really easy to do the same thing with your existing second SQLAlchemy query.
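That iterate-and-sum approach, sketched with Python's built-in sqlite3 module against the example table above. `status_summary` is a hypothetical helper; it runs one GROUP BY query and builds the same nested dict in Python, so total_requests comes from a running sum instead of a second query. It relies on count(status), which skips NULL statuses; that is harmless for the nullable=False column in the question.

```python
import json
import sqlite3


def status_summary(conn):
    """Build the {"data": {...}} payload from a single GROUP BY query.

    Assumes a table data(id INTEGER PRIMARY KEY, status TEXT) like the example above.
    """
    status_count = {}
    total_requests = 0
    rows = conn.execute(
        "SELECT status, count(status) AS total FROM data GROUP BY status ORDER BY status"
    )
    for status, total in rows:
        status_count[status] = total
        total_requests += total  # running sum replaces a second COUNT query
    return {"data": {"status_count": status_count, "total_requests": total_requests}}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data(id INTEGER PRIMARY KEY, status TEXT)")
    statuses = ["Assigned", "In Progress", "Peer Review", "Ready to Publish"] + ["Unassigned"] * 4
    conn.executemany("INSERT INTO data(status) VALUES (?)", [(s,) for s in statuses])
    print(json.dumps(status_summary(conn), indent=2))
```

The same loop body should also work over the rows from the question's second SQLAlchemy query, since each result row unpacks into a (status, total) pair.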
I am having trouble using AWS Boto3 to query DynamoDB with a hash key and a range key at the same time using the recommended KeyConditionExpression. I have attached an example query:
import boto3
from boto3 import dynamodb
from boto3.session import Session
dynamodb_session = Session(aws_access_key_id=AWS_KEY,
                           aws_secret_access_key=AWS_PASS,
                           region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)
request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    'ExpressionAttributeValues': {
        ':v0': {'S': MY_HASH_KEY},
        ':v1': {'N': GT_RANGE_KEY}
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
    'TableName': TABLE_NAME
}
response = table.query(**request)
When I run this against a table with the following schema:
Table Name: TABLE_NAME
Primary Hash Key: hash_key (String)
Primary Range Key: range_key (Number)
I get the following error and I cannot understand why:
ClientError: An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: >, operand type: M
From my understanding, type M is a map (dictionary) type, whereas I am using type N, a number type, which matches my table schema for the range key. Could someone explain why this error happens? I am also open to a different way of accomplishing the same query, even if you cannot explain the error.
The Boto 3 SDK constructs a Condition Expression for you when you use the Key and Attr functions imported from boto3.dynamodb.conditions:
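from boto3.dynamodb.conditions import Key

response = table.query(
    # For "range_key greater than", use Key('range_key').gt(range_key_value) instead of .eq(...)
    KeyConditionExpression=Key('hash_key').eq(hash_value) & Key('range_key').eq(range_key_value)
)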
Reference: Step 4: Query and Scan the Data
Hope it helps
I'm adding this answer because the accepted answer did not explain why the original query did not work.
TLDR: Using query on a Table resource in boto3 has subtle differences as opposed to using client.query(...) and requires a different syntax.
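The syntax is valid for a query on a client, but not on a Table. The Table resource infers each value's DynamoDB type from its Python type, so ExpressionAttributeValues on a Table take plain values without a type descriptor. A client-style value like {'N': GT_RANGE_KEY} is itself a Python dict, so the resource sends it as a map (type M), which is what the error message reports. Also, if you are executing a query on a Table resource, you do not need to specify the TableName again.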
Working solution:
from boto3.session import Session
dynamodb_session = Session(aws_access_key_id=AWS_KEY, aws_secret_access_key=AWS_PASS, region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)
request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    # Plain Python values: the resource infers the DynamoDB type itself,
    # so GT_RANGE_KEY should be an int or Decimal (a str would be sent as type S)
    'ExpressionAttributeValues': {
        ':v0': MY_HASH_KEY,
        ':v1': GT_RANGE_KEY
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
    # No 'TableName' here: the Table resource already supplies it
}
response = table.query(**request)
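Why the error mentions type M: the Table resource runs every value in ExpressionAttributeValues through boto3's type serializer (`boto3.dynamodb.types.TypeSerializer`) before sending the request, so a client-style value such as `{'N': '10'}` is treated as an ordinary Python dict and sent as a map. The function below is a deliberately simplified, hypothetical model of that conversion (booleans, strings, ints/Decimals and dicts only), not boto3's actual code.

```python
from decimal import Decimal


def serialize(value):
    """Simplified model of how the Table resource types a plain Python value."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return {'BOOL': value}
    if isinstance(value, str):
        return {'S': value}
    if isinstance(value, (int, Decimal)):
        return {'N': str(value)}
    if isinstance(value, dict):
        # Any dict, including a client-style {'N': '10'}, becomes a map (type M)
        return {'M': {k: serialize(v) for k, v in value.items()}}
    raise TypeError('unsupported type in this sketch: %r' % type(value))


if __name__ == '__main__':
    print(serialize(10))           # {'N': '10'}
    print(serialize({'N': '10'}))  # {'M': {'N': {'S': '10'}}}
```

So the `>` comparison receives a map operand, which is exactly what the ValidationException complains about.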
I am the author of a package called botoful, which might be useful to avoid dealing with these complexities. The code using botoful would be as follows:
import boto3
from botoful import Query
client = boto3.Session(
    aws_access_key_id=AWS_KEY,
    aws_secret_access_key=AWS_PASS,
    region_name=DYNAMODB_REGION
).client('dynamodb')
results = (
    Query(TABLE_NAME)
    .key(hash_key=MY_HASH_KEY, range_key__gt=GT_RANGE_KEY)
    .execute(client)
)
print(results.items)