Bulk Insert and Returning IDs using sqlite


I understand that SQLite doesn't support RETURNING; at least, that is what SQLAlchemy is telling me:
sqlalchemy.exc.CompileError: RETURNING is not supported by this dialect's statement compiler.
I get this error when using SQLAlchemy's Core library. Here is a code example:
from sqlalchemy.engine.url import URL
from sqlalchemy import create_engine, MetaData
from sqlalchemy import Table, Column, Integer, String
engine = create_engine('sqlite:///:memory:', echo=False)
# create table
meta = MetaData(engine)
table = Table('userinfo', meta,
    Column('id', Integer, primary_key=True),
    Column('first_name', String),
    Column('age', Integer),
)
meta.create_all()
# generate rows
data = [{'first_name': f'Name {i}', 'age': 18+i} for i in range(10)]
# this seems to work on PostgreSQL only
stmt = table.insert().values(data).returning(table.c.id)
for rowid in engine.execute(stmt).fetchall():
    print(rowid['id'])
Now, when I use similar code with SQLAlchemy's ORM, the IDs are returned. Here is the source code:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String
from sqlalchemy import ForeignKey
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.orm import relationship
Base = declarative_base()
class UserInfo(Base):
    __tablename__ = "userinfo"
    id = Column(Integer, primary_key=True)
    first_name = Column(String)
    age = Column(Integer)
engine = create_engine('sqlite:///:memory:', echo=False)
Base.metadata.create_all(engine)
session = scoped_session(sessionmaker(bind=engine))
# same rows as in the Core example
data = [dict(first_name=f'Name {i}', age=18+i) for i in range(10)]
session.bulk_insert_mappings(UserInfo, data, return_defaults=True)
session.commit()
print([s['id'] for s in data])
Why does this work while the Core version does not? When I look at the generated sql I don't see.

After some digging I found this link
In this document, bulk_insert_mappings is described as 'Batched INSERT statements via the ORM "bulk", using dictionaries'. With return_defaults=True, I assume SQLAlchemy inserts the rows one at a time and reads the last row ID after each INSERT, which is why the IDs are available.
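To illustrate the idea of reading the last row ID after each INSERT, here is a minimal sketch that uses only the standard-library sqlite3 module rather than SQLAlchemy. The helper name insert_and_collect_ids is made up for this example, and this is not SQLAlchemy's actual implementation; it only shows that per-row inserts make the generated IDs available without RETURNING.

```python
import sqlite3


def insert_and_collect_ids(conn, rows):
    """Insert rows one at a time and collect the ID SQLite assigned to each.

    Hypothetical helper for illustration; it mirrors the "one INSERT per row,
    then read lastrowid" approach assumed above.
    """
    ids = []
    for first_name, age in rows:
        cur = conn.execute(
            "INSERT INTO userinfo (first_name, age) VALUES (?, ?)",
            (first_name, age),
        )
        # lastrowid is the rowid generated by this single-row INSERT
        ids.append(cur.lastrowid)
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE userinfo (id INTEGER PRIMARY KEY, first_name TEXT, age INTEGER)"
)
rows = [(f"Name {i}", 18 + i) for i in range(10)]
ids = insert_and_collect_ids(conn, rows)
conn.commit()
print(ids)
```

The trade-off is that every row needs its own statement, so the batching benefit of a bulk insert is lost when the IDs are required.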

Related

Invalid KeyConditionExpression in boto3 dynamodb query

This is what my table looks like:
toHash is my primary partition key and timestamp is the sort key.
When I execute this code (to get the timestamps in reverse sorted order):
import boto3
from boto3.dynamodb.conditions import Key, Attr
client = boto3.client('dynamodb')
response = client.query(
    TableName='logs',
    Limit=1,
    ScanIndexForward=False,
    KeyConditionExpression="toHash = :X",
)
I get the following error:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: An expression attribute value used in expression is not defined; attribute value: :X
Am I doing something wrong here? Why isn't X considered a valid attribute value?

This is why I don't use boto directly anymore. The API for what should be simple stuff like this is nuts. @notionquest is correct: you need to add the ExpressionAttributeValues. I don't mess with these things anymore; I use dynamof. Here's an example with your query:
from boto3 import client
from dynamof import db as make_db
from dynamof import query
client = client('dynamodb', endpoint_url='http://localstack:4569')
db = make_db(client)
db(query(
    table_name='logs',
    conditions=attr('toHash').equals('somevalue')
))
dynamof builds the KeyConditionExpression and ExpressionAttributeValues for you.
Disclaimer: I wrote dynamof.

Add ExpressionAttributeValues as shown below:
response = client.query(
    TableName='logs',
    Limit=1,
    ScanIndexForward=False,
    KeyConditionExpression="toHash = :X",
    ExpressionAttributeValues={":X" : {"S" : "somevalue"}}
)
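If you want to catch this mistake before the request is sent, a small local check can help. The sketch below is a hypothetical helper (missing_placeholders is not part of boto3) that lists the :name placeholders an expression uses but the values dictionary does not define. It assumes placeholders consist of a colon followed by letters, digits, or underscores.

```python
import re


def missing_placeholders(expression, values):
    """Return the ':name' placeholders used in `expression` that `values` lacks.

    Hypothetical helper for illustration only; it does not call DynamoDB.
    """
    used = set(re.findall(r":[A-Za-z0-9_]+", expression))
    return sorted(used - set(values))


expr = "toHash = :X"
# The query from the question: no ExpressionAttributeValues at all
print(missing_placeholders(expr, {}))
# The corrected query: :X is defined
print(missing_placeholders(expr, {":X": {"S": "somevalue"}}))
```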

Query DynamoDB with a hash key and a range key with Boto3

I am having trouble using AWS Boto3 to query DynamoDB with a hash key and a range key at the same time using the recommended KeyConditionExpression. I have attached an example query:
import boto3
from boto3 import dynamodb
from boto3.session import Session
dynamodb_session = Session(aws_access_key_id=AWS_KEY,
                           aws_secret_access_key=AWS_PASS,
                           region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table=dynamodb.Table(TABLE_NAME)
request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    'ExpressionAttributeValues': {
        ':v0': {'S': MY_HASH_KEY},
        ':v1': {'N': GT_RANGE_KEY}
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
    'TableName': TABLE_NAME
}
response = table.query(**request)
When I run this against a table with the following schema:
Table Name: TABLE_NAME
Primary Hash Key: hash_key (String)
Primary Range Key: range_key (Number)
I get the following error and I cannot understand why:
ClientError: An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: >, operand type: M
From my understanding, the type M would be a map or dictionary type, while I am using type N, which is a number type and matches my table schema for the range key. Could someone explain why this error is happening? I am also open to a different way of accomplishing the same query, even if you cannot explain why this error occurs.

The Boto 3 SDK constructs a Condition Expression for you when you use the Key and Attr functions imported from boto3.dynamodb.conditions:
response = table.query(
    KeyConditionExpression=Key('hash_key').eq(hash_value) & Key('range_key').eq(range_key_value)
)
Reference: Step 4: Query and Scan the Data
Hope it helps

I am adding this solution because the accepted answer did not address why the original query did not work.
TL;DR: Calling query on a Table resource in boto3 differs subtly from calling client.query(...) and requires a different syntax.
The syntax in the question is valid for a query on a client, but not on a Table. On a Table resource, ExpressionAttributeValues takes plain Python values and you do not specify the data type; a value written as {'N': ...} is itself treated as a map, which is why the error reports operand type M. Also, when you query a Table resource you do not have to specify TableName again.
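To make the M in the error message concrete, here is a simplified stand-in for the type tagging that the resource layer applies to plain Python values. The function to_wire is hypothetical and written only for this illustration; boto3's own serializer covers many more types. It models just enough to show what happens when an already-tagged value is passed to a Table.

```python
from decimal import Decimal


def to_wire(value):
    """Tag a plain Python value with a DynamoDB type code.

    Simplified sketch, not boto3's implementation: handles only strings,
    integers/Decimals, and dictionaries.
    """
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)) and not isinstance(value, bool):
        return {"N": str(value)}
    if isinstance(value, dict):
        # Any dictionary becomes a map (M), including {'N': ...}
        return {"M": {key: to_wire(item) for key, item in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


plain = to_wire(5)
double_tagged = to_wire({"N": "5"})
print(plain)
print(double_tagged)
```

The second result is a map wrapping the original dictionary, so the > comparison ends up with a map operand instead of a number.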
Working solution:
from boto3.session import Session
dynamodb_session = Session(aws_access_key_id=AWS_KEY,aws_secret_access_key=AWS_PASS,region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)
request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    'ExpressionAttributeValues': {
        ':v0': MY_HASH_KEY,
        ':v1': GT_RANGE_KEY
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
}
response = table.query(**request)
I am the author of a package called botoful which might be useful to avoid dealing with these complexities. The code using botoful will be as follows:
import boto3
from botoful import Query
client = boto3.Session(
    aws_access_key_id=AWS_KEY,
    aws_secret_access_key=AWS_PASS,
    region_name=DYNAMODB_REGION
).client('dynamodb')
results = (
    Query(TABLE_NAME)
    .key(hash_key=MY_HASH_KEY, range_key__gt=GT_RANGE_KEY)
    .execute(client)
)
print(results.items)

Migrate Sqlalchemy schema from mssql to sqlite db?

I have all my table classes written for MSSQL, but now I want to test my application locally, so I need a SQLite db. Is there a way to replicate my database in SQLite?
I am facing some issues, such as SQLite not supporting Float as a primary key. I have more than 200 tables, so I cannot go and edit them all just for testing. I can have all the tables in one metadata.
My idea is to use SQLite just for testing; for production I will still be using MSSQL.
Note: I changed Float to Integer, but my tables are still not created; it just creates an empty db.
My code:
for table in metadata.tables:
    keys_to_change = []
    for pkey_column in metadata.tables[table].primary_key.columns.keys():
        keys_to_change.append(pkey_column)
    for data in list(metadata.tables[table].foreign_keys):
        keys_to_change.append(data.column.name)
    for column in metadata.tables[table].columns:
        if column.name in keys_to_change:
            if str(column.type) == 'FLOAT':
                column.type = INTEGER
engine = create_engine('sqlite:///mytest.db', echo=True, echo_pool=True)
metadata.create_all(engine)

If you are able to change the model code, I would suggest creating an alias for Float and using it to define those primary_key and ForeignKey columns, which you could then change for your SQLite testing:
# CONFIGURATION
PKType = Float # default: MSSQL; or Float(N, M)
# PKType = Integer # uncomment this for sqlite
Your model then becomes:
class MyParent(Base):
    __tablename__ = 'my_parent'
    id = Column(PKType, primary_key=True)
    name = Column(String)
    children = relationship('MyChild', backref='parent')
class MyChild(Base):
    __tablename__ = 'my_child'
    id = Column(PKType, primary_key=True)
    parent_id = Column(PKType, ForeignKey('my_parent.id'))
    name = Column(String)
Alternatively, if you would like to change only the engine and no other configuration variable, you can use dialect-specific custom type handling:
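import sqlalchemy.types as types
from sqlalchemy import Float, Integer
class PKType(types.TypeDecorator):
    # default implementation type (used for MSSQL)
    impl = Float
    def load_dialect_impl(self, dialect):
        # use an integer key on SQLite, a float key everywhere else
        if dialect.name == 'sqlite':
            return dialect.type_descriptor(Integer())
        else:
            return dialect.type_descriptor(Float())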
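As background on why the key type matters on SQLite, the following sketch uses the standard-library sqlite3 module with two made-up tables, float_pk and int_pk. It is not part of the model code above; it only shows that SQLite assigns an ID automatically when the column is declared INTEGER PRIMARY KEY, while a FLOAT primary key column gets no generated value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables that differ only in the declared key type
conn.execute("CREATE TABLE float_pk (id FLOAT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE int_pk (id INTEGER PRIMARY KEY, name TEXT)")

# Insert one row into each table without supplying an id
conn.execute("INSERT INTO float_pk (name) VALUES ('a')")
conn.execute("INSERT INTO int_pk (name) VALUES ('a')")

float_id = conn.execute("SELECT id FROM float_pk").fetchone()[0]
int_id = conn.execute("SELECT id FROM int_pk").fetchone()[0]
print("FLOAT key:", float_id)
print("INTEGER key:", int_id)
```

Only the INTEGER PRIMARY KEY column is an alias for SQLite's internal rowid, which is what makes the automatic assignment work.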

How to use sqlalchemy to select data from a database?

I have two sqlalchemy scripts, one that creates a database and a few tables and another that selects data from them.
create_database.py
from sqlalchemy import create_engine, Table, Column, Integer, String, MetaData, ForeignKey, select
engine = create_engine('sqlite:///test.db', echo=True)
metadata = MetaData()
addresses = Table('addresses', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_id', None, ForeignKey('users.id')),
    Column('email_addresses', String, nullable=False)
)
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)
metadata.create_all(engine)
select.py
from sqlalchemy import create_engine, select
engine = create_engine('sqlite:///test.db', echo=True)
conn = engine.connect()
s = select([users])
result = conn.execute(s)
I am able to run the create_database.py script, but when I run the select.py script I get the following error:
$ python select.py
Traceback (most recent call last):
  File "select.py", line 5, in <module>
    s = select([users])
I am able to run the select statement from within create_database.py by appending the following to it:
conn = engine.connect()
s = select([users])
result = conn.execute(s)
How can I run the select statements from a script other than create_database.py?

The script select.py does not see users and addresses defined in create_database.py. Import them in select.py before using them.
In select.py:
from create_database import users, addresses
## Do something with users and addresses

ProgrammingError Thread error in SQLAlchemy

I have two simple tables in a SQLite db.
from sqlalchemy import MetaData, Table, Column, Integer, ForeignKey, \
    create_engine, String
from sqlalchemy.orm import mapper, relationship, sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine('sqlite:///dir_graph.sqlite', echo=True)
session_factory = sessionmaker(bind=engine)
Session = scoped_session(session_factory)
session = Session()
Base = declarative_base()
class NodeType(Base):
    __tablename__ = 'nodetype'
    id = Column(Integer, primary_key=True)
    name = Column(String(20), unique=True)
    nodes = relationship('Node', backref='nodetype')
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return "Nodetype: %s" % self.name
class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    name = Column(String(20), unique=True)
    type_id = Column(Integer,
                     ForeignKey('nodetype.id'))
    def __init__(self, _name, _type_id):
        self.name = _name
        self.type_id = _type_id
Base.metadata.create_all(engine)
After running this, I interact with the interpreter to learn about SQLAlchemy, e.g. n1 = Node('Node1',1). After I do a session.commit() and try another statement, e.g. n2 = Node('n2',1), I get this error:
sqlalchemy.exc.ProgrammingError: (ProgrammingError) SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 3932 and this is thread id 5740 None None.
How can I continue a session after a commit?
tnx

By default, SQLite prohibits the use of a single connection in more than one thread.
Just add the connect_args={'check_same_thread': False} parameter to your create_engine call, like this:
engine = create_engine('sqlite:///dir_graph.sqlite', connect_args={'check_same_thread': False}, echo=True)
According to the documentation for sqlite3.connect:
By default, check_same_thread is True and only the creating thread may
use the connection. If set False, the returned connection may be
shared across multiple threads. When using multiple threads with the
same connection writing operations should be serialized by the user to
avoid data corruption.
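The behaviour described in that quote can be reproduced with the standard-library sqlite3 module alone. In this sketch, run_in_thread is a helper written just for the demonstration: it uses a connection from a second thread and records either the query result or the ProgrammingError that was raised.

```python
import sqlite3
import threading


def run_in_thread(conn):
    """Run a trivial query on `conn` from a new thread.

    Demonstration helper: returns {'value': ...} on success or
    {'error': exception} if sqlite3 rejects the cross-thread use.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = conn.execute("SELECT 1").fetchone()[0]
        except sqlite3.ProgrammingError as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome


# Default: only the creating thread may use the connection
strict = sqlite3.connect(":memory:")
# Opt out of the check; serializing writes is then up to you
shared = sqlite3.connect(":memory:", check_same_thread=False)

strict_outcome = run_in_thread(strict)
shared_outcome = run_in_thread(shared)
print("default:", strict_outcome)
print("check_same_thread=False:", shared_outcome)
```

Keep in mind the warning in the quote: with the check disabled, concurrent writes through the same connection need your own locking.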