How to add a DynamoDB global secondary Index via Python/Boto3 - amazon-dynamodb

Is it possible to add a Global Secondary Index to an existing DynamoDB table AFTER it has been created? I am using Python 3.x with Boto3 and have not been able to find any examples of one being added to a table after it was created.

In general, yes, it is possible to add a Global Secondary Index (GSI) after the table is created.
However, it can take a long time for the change to take effect, because building the GSI requires scanning the existing table.
In boto3, have a look at the documentation for update_table.
For example, you could try something like this:
import boto3

client = boto3.client('dynamodb')

response = client.update_table(
    TableName='YourTableName',
    # ...snip...
    # (AttributeDefinitions must declare any key attribute the new index uses)
    GlobalSecondaryIndexUpdates=[
        {
            'Create': {
                'IndexName': 'YourGSIName',
                'KeySchema': [
                    {
                        'AttributeName': 'YourGSIFieldName',
                        'KeyType': 'HASH'
                    }
                ],
                'Projection': {
                    'ProjectionType': 'ALL'
                },
                'ProvisionedThroughput': {
                    'ReadCapacityUnits': 1,
                    'WriteCapacityUnits': 1
                }
            }
        }
    ],
    # ...snip...
)
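Since the index is built in the background, it can help to wait until it is usable before querying it. The following is a minimal sketch, not a definitive implementation: the helper names build_gsi_create_request and wait_until_index_active are hypothetical, the key attribute type "S" (string) is an assumption, and the polling interval is arbitrary. It relies only on update_table and describe_table, and on the IndexStatus field that describe_table reports for each GSI.

```python
import time


def build_gsi_create_request(table_name, index_name, key_attr, key_type="S",
                             read_units=1, write_units=1):
    """Build keyword arguments for update_table that create one GSI.

    key_type="S" is an assumption; use "N" or "B" if the attribute is not a string.
    """
    return {
        "TableName": table_name,
        # Any attribute used as an index key must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": key_attr, "AttributeType": key_type},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [{"AttributeName": key_attr, "KeyType": "HASH"}],
                    "Projection": {"ProjectionType": "ALL"},
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": read_units,
                        "WriteCapacityUnits": write_units,
                    },
                }
            }
        ],
    }


def wait_until_index_active(client, table_name, index_name,
                            delay_seconds=10, max_attempts=60, sleep=time.sleep):
    """Poll describe_table until the index reports IndexStatus == 'ACTIVE'.

    Returns True once the index is active, False if max_attempts is exhausted.
    """
    for _ in range(max_attempts):
        table = client.describe_table(TableName=table_name)["Table"]
        for index in table.get("GlobalSecondaryIndexes", []):
            if index["IndexName"] == index_name and index.get("IndexStatus") == "ACTIVE":
                return True
        sleep(delay_seconds)
    return False


# Usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   client = boto3.client("dynamodb")
#   client.update_table(**build_gsi_create_request(
#       "YourTableName", "YourGSIName", "YourGSIFieldName"))
#   wait_until_index_active(client, "YourTableName", "YourGSIName")
```

Passing the client in as a parameter keeps the helpers independent of how credentials and regions are configured.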

Related

Get a filtered list from a terraform map of objects?

I have a list of users with characteristics like this and I want to create a local variable that includes the names of the users in the "maker" group.
variable "users" {
  type = map(object({
    groups = list(string)
  }))
  default = {
    "kevin.mccallister" = {
      groups = ["kids", "maker"],
    },
    "biff" = {
      groups = ["kids", "teens", "bully"],
    },
  }
}
I want to write the local like this, but it complains
Error: Invalid 'for' expression ... Key expression is required when
building an object.
locals {
  makers_list = flatten({
    for user, attr in var.users: user
    if contains(attr.groups, "makers")
  })
}
How can I take that map of objects and get out a list of names based on group affiliation?
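flatten() is not required for this. Also, the {} makes the for expression build an object, which is why Terraform asks for a key expression. Use [] instead to build a list, and you get the user names filtered by their group membership. Note that the group in your sample data is "maker", not "makers", so the contains() check has to use that spelling or the list will be empty.
makers_list = [
  for user, attr in var.users: user
  if contains(attr.groups, "maker")
]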

Weaviate: using near_text with the exact property doesn't return a distance of 0

Here's a minimal example:
import weaviate

CLASS = "Superhero"
PROP = "superhero_name"

client = weaviate.Client("http://localhost:8080")

class_obj = {
    "class": CLASS,
    "properties": [
        {
            "name": PROP,
            "dataType": ["string"],
            "moduleConfig": {
                "text2vec-transformers": {
                    "vectorizePropertyName": False,
                }
            },
        }
    ],
    "moduleConfig": {
        "text2vec-transformers": {
            "vectorizeClassName": False
        }
    }
}

client.schema.delete_all()
client.schema.create_class(class_obj)
batman_id = client.data_object.create({PROP: "Batman"}, CLASS)

by_text = (
    client.query.get(CLASS, [PROP])
    .with_additional(["distance", "id"])
    .with_near_text({"concepts": ["Batman"]})
    .do()
)
print(by_text)

batman_vector = client.data_object.get(
    uuid=batman_id, with_vector=True, class_name=CLASS
)["vector"]

by_vector = (
    client.query.get(CLASS, [PROP])
    .with_additional(["distance", "id"])
    .with_near_vector({"vector": batman_vector})
    .do()
)
print(by_vector)
Please note that I specified both "vectorizePropertyName": False and "vectorizeClassName": False
The code above returns:
{'data': {'Get': {'Superhero': [{'_additional': {'distance': 0.08034378, 'id': '05fbd0cb-e79c-4ff2-850d-80c861cd1509'}, 'superhero_name': 'Batman'}]}}}
{'data': {'Get': {'Superhero': [{'_additional': {'distance': 1.1920929e-07, 'id': '05fbd0cb-e79c-4ff2-850d-80c861cd1509'}, 'superhero_name': 'Batman'}]}}}
If I look up the exact vector I get 'distance': 1.1920929e-07, which I guess is actually 0 (for some floating point evil magic), as expected.
But if I use near_text to search for the exact property, I get a distance > 0.
This is leading me to believe that, when using near_text, the embedding is somehow different.
My question is:
Why does this happen?
With two corollaries:
Is 1.1920929e-07 actually 0 or do I need to read something deeper into that?
Is there a way to check the embedding created during the near_text search?
Here is some information that may help:
Is 1.1920929e-07 actually 0 or do I need to read something deeper into that?
Yes, this value 1.1920929e-07 should be interpreted as 0. I think there are some unfortunate float32/64 conversions going on that need to be rooted out.
Is there a way to check the embedding created during the near_text search?
The embeddings are either imported or generated during object creation, not at search-time. So performing multiple queries on an unchanged object will utilize the same search vector.
We are looking into both of these issues.
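On the 1.1920929e-07 value: it happens to be exactly the float32 machine epsilon (2^-23), which fits the float32 rounding explanation, although it does not show where in the pipeline the rounding occurs. The sketch below checks this with the standard library only. The helper names and the 1e-6 tolerance are assumptions chosen for illustration, not anything Weaviate defines.

```python
import struct


def to_float32(x):
    """Round a Python float (a float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Gap between 1.0 and the next representable float32.
FLOAT32_EPSILON = 2.0 ** -23

reported_distance = to_float32(1.1920929e-07)
print(reported_distance == FLOAT32_EPSILON)


def is_effectively_zero(distance, tolerance=1e-6):
    """Treat distances within a small tolerance as 0 (tolerance is an assumption)."""
    return abs(distance) <= tolerance


print(is_effectively_zero(1.1920929e-07))  # the near_vector result
print(is_effectively_zero(0.08034378))     # the near_text result
```

Comparing against a tolerance rather than testing for exactly 0 avoids depending on this kind of rounding.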

RethinkDB filtering Object Array

I'm new to rethinkdb and I wanted to filter something like... get all with Kiwi or Strawberry as preferred fruit
{
  "id": "65dbaa34-f7d5-4a25-b01f-682032fc6e05",
  "fruits": {
    "favorite": "Mango",
    "preferred": [
      "Kiwi",
      "Watermelon"
    ]
  }
}
I tried something like this after reading the contains docs:
r.db('appname').table('food')
  .filter(r.row('fruits').contains(function(doc) {
    return doc('preferred').contains('Kiwi');
  }))
And I'm getting an "e: Cannot convert OBJECT to SEQUENCE in:" error.
This is what you're looking for:
r.db('appname').table('food')
  .filter((row) => {
    return r.or( // Returns true if any of the following are true
      row('fruits')('preferred').contains('Kiwi'),
      row('fruits')('preferred').contains('Strawberry')
    );
  });
You should also know that you can create your own index that calculates this for you; you would then be able to run a .getAll query against that custom index and return all matching documents very quickly.
Lastly, here is something that would also work but is probably less efficient on large arrays:
r.db('appname').table('food')
  .filter((row) => {
    return row('fruits')('preferred').setIntersection(['Kiwi', 'Strawberry']).count().gt(0);
  })

InterSystems Caché - relational mapping (custom SQL storage)

I have several globals in a Caché database with the same data structure. For each global I defined a class with an SQL storage map, but I need to do this generically for all globals. Is it possible to define one class with an SQL storage map that is used for the mapping before every SQL query execution? I need to avoid declaring a class for each global that has to be accessible via SQL. I use ODBC to execute SQL statements.
If someone can help me, I would very much appreciate it.
My globals look like this:
^glob1("x","y","SL",1) = "Name"
^glob1("x","y","SL",1,"Format") = "myFormat"
^glob1("x","y","SL",1,"Typ") = "my Type"
^glob1("x","y","SL",2) = "Name2"
^glob1("x","y","SL",2,"Format") = "myFormat2"
^glob1("x","y","SL",2,"Typ") = "Type2"
^nextGlob("x","y","SL",1) = "Next Name"
^nextGlob("x","y","SL",1,"Format") = "Next myFormat"
^nextGlob("x","y","SL",1,"Typ") = "my Type"
^another("x","y","SL",13) = "Another Name"
^another("x","y","SL",13,"Format") = "Another myFormat"
^another("x","y","SL",13,"Typ") = "Another Type"
I want to have sql access to globals using one ObjectScript class.
If you only need to read data from Caché via ODBC, you can use the CALL statement: write a SqlProc, which can then be called over ODBC.
As far as I can see, all of your globals have the same structure. If so, this will be easy. You can put something like this in your class.
Query Test() As %Query(ROWSPEC = "ID:%String,Global:%String,Name:%String,Typ:%String,Format:%String") [ SqlProc ]
{
}
ClassMethod TestExecute(ByRef qHandle As %Binary) As %Status
{
    #; Initial settings
    #; List of globals
    set $li(qHandle,1)=$lb("glob1","nextGlob","another")
    #; Current global index
    set $li(qHandle,2)=1
    #; Current ID in global
    set $li(qHandle,3)=""
    Quit $$$OK
}
ClassMethod TestClose(ByRef qHandle As %Binary) As %Status [ PlaceAfter = TestExecute ]
{
    Quit $$$OK
}
ClassMethod TestFetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status [ PlaceAfter = TestExecute ]
{
    set globals=$lg(qHandle,1)
    set globalInd=$lg(qHandle,2)
    set id=$lg(qHandle,3)
    set AtEnd=1
    for {
        set global=$lg(globals,globalInd)
        quit:global=""
        #; Build a reference to ^<global>("x","y","SL") for subscript indirection
        set globalData="^"_global
        set globalData=$na(@globalData@("x","y","SL"))
        #; Next ID under that node; its value is returned in name
        set id=$o(@globalData@(id),1,name)
        if id'="" {
            set AtEnd=0
            set typ=$get(@globalData@(id,"Typ"))
            set format=$get(@globalData@(id,"Format"))
            set Row=$lb(id,global,name,typ,format)
            set $li(qHandle,3)=id
            quit
        } elseif $i(globalInd) {
            #; Current global exhausted: move on to the next one
            set id=""
            set $li(qHandle,2)=globalInd
        }
    }
    Quit $$$OK
}
And then you can execute statement like this
CALL pkg.classname_test()
The result will look something like the one in this picture.
If all of the globals are the same then you could do this, but it is likely that your globals are all different, which makes a single storage map unlikely. Do you already have a data dictionary or metadata system that describes your existing globals? If so, I would consider writing a conversion from your existing data dictionary definitions to Caché classes.

Why does DynamoDB simple deleteItem operation use 2 CapacityUnits?

I have a simple delete operation which goes like this:
{
  "TableName": "demo_events",
  "Key": {
    "category": {"S": "Demo"},
    "DynamoID": {"S": "164933868Slt1396454204"}
  },
  "Expected": {
    "category": {
      "Exists": true,
      "Value": {"S": "Demo"}
    }
  },
  "ReturnConsumedCapacity": "TOTAL",
  "ReturnItemCollectionMetrics": "SIZE"
}
There is only a single item in database with that ID. The response is this:
{
  ConsumedCapacity: {
    CapacityUnits: 2,
    TableName: 'demo_events'
  },
  ItemCollectionMetrics: {
    ItemCollectionKey: {
      category: { S: 'Demo' }
    },
    SizeEstimateRangeGB: [ 0, 1 ]
  }
}
Shouldn't this only consume 1 write unit?
Many thanks.
For PutItem, UpdateItem, and DeleteItem, which write only one item, DynamoDB rounds the item size up to the next 1 KB. If the item has other attributes in addition to the key attributes, they could add up to more than 1 KB together.
If there is a Local Secondary Index (LSI) on the table, DeleteItem also deletes the corresponding item from the LSI, and that item's size contributes to the total Write Capacity Units consumed. The DeleteItem response returns ItemCollectionMetrics when an LSI is defined for the table, so based on the sample response there seems to be an LSI defined for this table.
regards
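The arithmetic in the answer above can be sketched as follows. This is a simplified model for illustration, not DynamoDB's exact accounting: the function names and the item sizes are hypothetical, and the model only captures the two effects mentioned above (rounding up to the next 1 KB, plus one extra write per affected LSI entry).

```python
import math

KB = 1024  # item sizes are rounded up to the next 1 KB for writes


def write_units(item_size_bytes):
    """Write capacity units for a single standard write of this size."""
    return max(1, math.ceil(item_size_bytes / KB))


def estimated_delete_units(item_size_bytes, lsi_entry_sizes=()):
    """Rough estimate: the table write plus one write per affected LSI entry."""
    return write_units(item_size_bytes) + sum(
        write_units(size) for size in lsi_entry_sizes
    )


# A small item with no index: one unit.
print(estimated_delete_units(200))
# The same small item with one LSI entry: two units.
print(estimated_delete_units(200, lsi_entry_sizes=[200]))
# An item larger than 1 KB with no index: also two units.
print(estimated_delete_units(1500))
```

Either an item over 1 KB or a single LSI on a small item would be enough to explain the 2 capacity units in the response.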
