Scan all records in AWS DynamoDB

I have a Python program to scan all the records from a DynamoDB table; however, it's not retrieving all of them. I am using LastEvaluatedKey to page through the records because of the 1 MB limit per scan, but it looks like LastEvaluatedKey is not present in my response. Can someone please help?
import json
import sys
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    table = dynamodb.Table('Your_Table_Name')
    queryCount = 1
    response = table.scan()
    print("Total Records:-", response['ScannedCount'])
    # Extract the results
    items = response['Items']
    for item in items:
        print(item)
        queryCount = queryCount + 1
    while 'LastEvaluatedKey' in response:
        print('1---------')
        key = response['LastEvaluatedKey']
        response = table.scan(ExclusiveStartKey=key)
        items = response['Items']
        for item in items:
            queryCount = queryCount + 1
            print("2---------")
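Note that DynamoDB simply omits LastEvaluatedKey when the scan has reached the end of the table, so if the whole table fits in a single 1 MB page, a response without that key is expected behaviour rather than an error. A minimal sketch of the pagination loop, factored into a helper (it assumes only that the table object exposes boto3's `Table.scan` interface):

```python
def scan_all(table):
    """Collect every item, following LastEvaluatedKey until it disappears.

    `table` is assumed to expose boto3's Table.scan interface:
    scan(**kwargs) -> {"Items": [...], "LastEvaluatedKey": ...}.
    """
    items = []
    kwargs = {}
    while True:
        response = table.scan(**kwargs)
        items.extend(response["Items"])
        # DynamoDB omits LastEvaluatedKey on the final page.
        if "LastEvaluatedKey" not in response:
            return items
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```

Structuring it this way also makes the loop easy to unit-test with a stub table, without touching AWS.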

Related

How to load a joined table in SQLAlchemy ORM where the joined table does not provide a foreign key (relationship)

I have tables like below:
import sqlalchemy as sa

class A(Base):
    id = sa.Column(sa.Integer)
    name = sa.Column(sa.String)

class B(Base):
    id = sa.Column(sa.Integer)
    a_id = sa.Column(sa.Integer)
and this query:
# Basic query
query = sa.select(B).join(A, A.id == B.a_id)
result = await session.execute(query)
results = result.scalars().all()
How should I change it to get the desired result?
query = sa.select(B).join(A, A.id == B.a_id)
result = session.execute(query)
results = result.scalars().all()

# Problem:
# SOME_KEY should be indicated in the query as a loaded column
# SOME_KEY's type should be the A class
# I want this:
results[0].SOME_KEY.name  # it should give the joined `A` entity's property value
I have read the documentation and looked at the loading techniques, but could not find a solution; it is mostly about relationships.
Arbitrary query with multiple objects per result
with Session(engine) as session:
    for (b, a) in session.execute(select(B, A).join(A, B.a_id == A.id)).all():
        print(b, a)
Relationship without ForeignKey
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import Session, declarative_base, aliased, relationship, remote, foreign

class A(Base):
    __tablename__ = 'a_table'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    b_list = relationship('B', primaryjoin="remote(A.id) == foreign(B.a_id)", back_populates='a')

class B(Base):
    __tablename__ = 'b_table'
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer)
    a = relationship('A', primaryjoin="remote(A.id) == foreign(B.a_id)", back_populates='b_list')

with Session(engine) as session:
    for (b,) in session.execute(select(B).join(B.a)).all():
        print(b, b.a_id, b.a, b.a.id, b in b.a.b_list)
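Putting the pieces together, here is a self-contained sketch of the relationship-without-ForeignKey pattern (assuming SQLAlchemy 1.4+, an in-memory SQLite engine, and one sample row per table; the ids and names are made up for illustration):

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class A(Base):
    __tablename__ = "a_table"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # remote()/foreign() in the primaryjoin string tell the ORM how to
    # join even though no ForeignKey is declared on the columns.
    b_list = relationship("B", primaryjoin="remote(A.id) == foreign(B.a_id)",
                          back_populates="a")

class B(Base):
    __tablename__ = "b_table"
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer)
    a = relationship("A", primaryjoin="remote(A.id) == foreign(B.a_id)",
                     back_populates="b_list")

engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([A(id=1, name="first"), B(id=10, a_id=1)])
    session.commit()

with Session(engine) as session:
    b = session.execute(select(B).join(B.a)).scalars().one()
    joined_name = b.a.name  # joined A entity reached without a ForeignKey
    print(joined_name)
```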

DynamoDB primary key for date range and user id

I'm still trying to wrap my head around primary key selection in DynamoDB. My current structure is the following, where userId is HASH and sort is RANGE.
userId | sort                           | event
-------|--------------------------------|------
1      | 2021-01-18#u2d3-f3d5-s22d-3f52 | ...
1      | 2021-01-08#f1d3-s30x-s22d-w2d3 | ...
2      | 2021-02-21#s2d2-u2d3-230s-3f52 | ...
2      | 2021-02-13#w2d3-e5d5-w2d3-3f52 | ...
1      | 2021-01-19#f2d4-f3d5-s22d-3f52 | ...
1      | 2020-12-13#f3d5-e5d5-s22d-w2d3 | ...
2      | 2020-11-11#e5d5-u2d3-s22d-0j32 | ...
What I want to achieve is to query all events for a particular user between date A and date B. I have tested a few solutions that all work, like:

- Figure out the closest common begins_with for the range I want. If date A is 2019-02-01 and date B is 2021-01-03, then it would be userId = 1 and begins_with(sort, 20), which would return everything from the twenty-first century.
- Loop through all months between date A and date B and do a bunch of small queries like userId = 1 and begins_with(sort, 2021-01), then concatenate the results afterwards.
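The month-looping workaround amounts to enumerating begins_with prefixes; a sketch of just the date arithmetic (the query calls themselves are omitted):

```python
from datetime import date

def month_prefixes(start: date, end: date):
    """Yield 'YYYY-MM' sort-key prefixes for every month in [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1

print(list(month_prefixes(date(2020, 11, 1), date(2021, 2, 1))))
# → ['2020-11', '2020-12', '2021-01', '2021-02']
```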
They all work but have their drawbacks. I'm also a bit unsure of when I'm just complicating things to the point where a scan might actually be worth it instead. Being able to use between would of course be the best option, but I need to put the unique #guid at the end of the range key in order to make each primary key unique.
Am I approaching this the wrong way?
I created a little demo app to show how this works.
You can just use the between condition, because DynamoDB compares string sort keys in byte order. The idea is to convert your start date A to a string and use it as the beginning of the range, then add one day to your end date, convert it to a string, and use that as the (exclusive) end of the range.
The script creates this table (it will look different when you run it):
PK | SK
------------------------------------------------------
demo | 2021-02-26#a4d0f5f3-588a-49d9-8eaa-a3e2f9436ade
demo | 2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd
demo | 2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01
demo | 2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5
demo | 2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874
demo | 2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc
demo | 2021-03-04#ee30b1ce-3910-4a59-9e62-09f051b0dc72
demo | 2021-03-05#f0e2405f-6ce9-4fcb-a798-394f7a2f9490
demo | 2021-03-06#bcf76e07-7582-4fe3-8ffd-14f450e60120
demo | 2021-03-07#58d01231-a58d-4c23-b1ed-e525ba102b80
And when I run this function to select the items between two given dates, it returns the result below:
def select_in_date_range(pk: str, start: datetime, end: datetime):
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    start = start.isoformat()[:10]
    end = (end + timedelta(days=1)).isoformat()[:10]
    print(f"Requesting all items starting at {start} and ending before {end}")
    result = table.query(
        KeyConditionExpression=conditions.Key("PK").eq(pk)
        & conditions.Key("SK").between(start, end)
    )
    print("Got these items")
    for item in result["Items"]:
        print(f"PK={item['PK']}, SK={item['SK']}")
Requesting all items starting at 2021-02-27 and ending before 2021-03-04
Got these items
PK=demo, SK=2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd
PK=demo, SK=2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01
PK=demo, SK=2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5
PK=demo, SK=2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874
PK=demo, SK=2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc
Full script to try it yourself.
import uuid
from datetime import datetime, timedelta

import boto3
import boto3.dynamodb.conditions as conditions

TABLE_NAME = "sorting-test"

def create_table():
    ddb = boto3.client("dynamodb")
    ddb.create_table(
        AttributeDefinitions=[
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
        ],
        TableName=TABLE_NAME,
        KeySchema=[
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        BillingMode="PAY_PER_REQUEST",
    )

def create_sample_data():
    pk = "demo"
    amount_of_events = 10
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    start_date = datetime.now()
    increment = timedelta(days=1)
    print("PK | SK")
    print("------------------------------------------------------")
    for i in range(amount_of_events):
        date = start_date.isoformat()[:10]
        unique_id = str(uuid.uuid4())
        sk = f"{date}#{unique_id}"
        print(f"{pk} | {sk}")
        start_date += increment
        table.put_item(Item={"PK": pk, "SK": sk})

def select_in_date_range(pk: str, start: datetime, end: datetime):
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    start = start.isoformat()[:10]
    end = (end + timedelta(days=1)).isoformat()[:10]
    print(f"Requesting all items starting at {start} and ending before {end}")
    result = table.query(
        KeyConditionExpression=conditions.Key("PK").eq(pk)
        & conditions.Key("SK").between(start, end)
    )
    print("Got these items")
    for item in result["Items"]:
        print(f"PK={item['PK']}, SK={item['SK']}")

def main():
    # create_table()
    # create_sample_data()
    start = datetime.now() + timedelta(days=1)
    end = datetime.now() + timedelta(days=5)
    select_in_date_range("demo", start, end)

if __name__ == "__main__":
    main()

Different behavior of QSqlQueryModel when refreshing data after adding vs. updating a record

I've got a QTableView with a menu for adding and editing records. The model is a QSqlQueryModel, as it also shows associated data (sums of amounts) for the records.
After these actions I want to refresh the table.
I don't understand why, for the edit action, it is enough to call model.query().exec_() to see the update, but for the new action I additionally need model.setQuery(model.query()) to see the newly inserted rows.
def build_model(self):
    self.model = QSqlQueryModel()
    self.model.setQuery(
        "SELECT b.id, b.name, SUM(coalesce(s.amount, 0.00)) AS amount "
        "FROM budget AS b "
        "LEFT OUTER JOIN transaction_split AS s ON s.id_budget = b.id "
        "GROUP BY b.id "
        "ORDER BY name")
    self.table.setModel(self.model)

def act_new(self):
    dlg = BudgetEd()
    dlg.dialog.exec()
    self.model.query().exec_()
    self.model.setQuery(self.model.query())  # Why do I need this to refresh the view?

def act_ed(self):
    # ... retrieve id_
    dlg = BudgetEd(id_)
    dlg.dialog.exec()
    self.model.query().exec_()  # or why does this work without setting the query again?
Finally, I decided not to use QSqlQueryModel, as it conflicted with sqlalchemy queries on Windows (Windows locks files for reading).
I implemented my own TableModel(QAbstractTableModel), as it was a really easy task:
class TableModel(QAbstractTableModel):
    sql: text

    def __init__(self):
        super().__init__()
        self._data = []
        self.sql = None

    def set_sql(self, sql: str):
        self.sql = text(sql)

    def load_data(self):
        sess: session = Session()
        self._data = sess.execute(self.sql).fetchall()
        self.layoutChanged.emit()

    def set_data(self, data):
        self._data = data

    def data(self, index: PySide2.QtCore.QModelIndex, role: int = ...) -> typing.Any:
        if role == Qt.DisplayRole:
            value = self._data[index.row()][index.column()]
            return value
        if role == Qt.UserRole:
            return self._data[index.row()][index.column()]
        if role == Qt.TextAlignmentRole:
            value = self._data[index.row()][index.column()]
            if isinstance(value, int) or isinstance(value, float):
                return int(Qt.AlignRight | Qt.AlignVCenter)

    def rowCount(self, parent: PySide2.QtCore.QModelIndex = ...) -> int:
        return len(self._data)

    def columnCount(self, parent: PySide2.QtCore.QModelIndex = ...) -> int:
        return len(self._data[0])

# ...

def act_new(self):
    dlg = AssetEd()
    dlg.dialog.exec()
    self.model.load_data()

Which column should I pick as the secondary index in this DynamoDB table?

I have a dynamodb table with the following attributes:
PurchaseOrderNumber (partition key)
CustomerID
PurchaseDate
TotalPurchaseValue

My application must retrieve items from the table to calculate the total value of purchases for a particular customer over a date range. What secondary index should I add to the table?
Thank you.
You can create a Global Secondary Index where the partition key is CustomerID and the sort key is PurchaseDate. That way, you can query a particular customer by CustomerID within a date range of PurchaseDate. Note that querying a GSI requires passing its name via IndexName. You can try out the query code below in the AWS Lambda console :)
import json
import boto3
from boto3.dynamodb.conditions import Key

def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "your_table_name"
    table = dynamodb.Table(table_name)
    customer_id = 1
    start_date = "2019-12-10"
    end_date = "2019-12-16"
    response = table.query(
        IndexName="your_index_name",  # the GSI described above
        KeyConditionExpression=Key('CustomerID').eq(customer_id) &
                               Key('PurchaseDate').between(start_date, end_date)
    )
    return response
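Once the index returns the matching items, totalling the purchases is a matter of summing the TotalPurchaseValue attribute. A sketch with hypothetical item dicts shaped like a boto3 query response (boto3 deserializes DynamoDB numbers as Decimal):

```python
from decimal import Decimal  # boto3 returns DynamoDB numbers as Decimal

def total_purchase_value(items):
    """Sum the TotalPurchaseValue attribute over queried items."""
    return sum(item["TotalPurchaseValue"] for item in items)

# Hypothetical items, as the query above might return them.
items = [
    {"PurchaseOrderNumber": "PO-1", "TotalPurchaseValue": Decimal("100.50")},
    {"PurchaseOrderNumber": "PO-2", "TotalPurchaseValue": Decimal("49.50")},
]
print(total_purchase_value(items))  # → 150.00
```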

sqlalchemy with sqlite to_sql not creating table nor database

I am new to sqlalchemy. When I run this code there is NO database. I want it to create the database, add the table defined, and insert the data. According to the documentation for to_sql, this code should create the table if it doesn't exist (it doesn't). When I run it, it throws an error that the table has no column 'num 1', and it does NOT create the database. What am I doing wrong, please?
import pandas as pd
import sqlite3
from sqlalchemy import create_engine

date_stuff = [(20171219, 13.71, 28), (20171319, 144.71, 33), (20171919, 99.99, 99)]
labels = ['date', 'num 1', 'num 2']
dev_env = "/home/test/Desktop/mtest/hvdata/"
db_name = "tinydatabase.db"

def new_sql_add(todays_data):
    todays_data.to_sql(name='mcm_trends', con=db, if_exists='append')

if __name__ == '__main__':
    db_path = dev_env + db_name
    db = create_engine('sqlite:///db_path')
    df_for_sql = pd.DataFrame.from_records(date_stuff, columns=labels)
    new_sql_add(df_for_sql)
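One likely culprit: 'sqlite:///db_path' is a literal string, so SQLAlchemy creates a file named db_path in the current working directory rather than tinydatabase.db at the intended location. A minimal sketch of the corrected URL construction (string handling only; the to_sql call is unchanged, and the "no column num 1" error likely comes from an older mcm_trends table created without that column, so dropping it or starting from a fresh database file should clear it):

```python
dev_env = "/home/test/Desktop/mtest/hvdata/"
db_name = "tinydatabase.db"

# Interpolate the actual path; 'sqlite:///db_path' would be taken literally.
db_url = f"sqlite:///{dev_env}{db_name}"
print(db_url)  # → sqlite:////home/test/Desktop/mtest/hvdata/tinydatabase.db
```

Passing db_url to create_engine then points SQLAlchemy at the real file, which SQLite creates on first connection.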
