I'm using the Cosmos DB Data Migration Tool to migrate data between environments. During the migration I need to update the hostname of the website in the data. I was able to do this easily for the top-level fields with a query like this:
SELECT Farms["name"], Farms["farmerInfo"], REPLACE(Farms["websiteLink"], "thiswebsite", "newHostName") AS websiteLink FROM Farms
My Cosmos DB data is structured like this (the data is just an example):
{
    "name": "Red's Farm",
    "websiteLink": "www.thiswebsite.com/goats/",
    "farmerInfo": {
        "name": "Bob",
        "websiteLink": "www.thiswebsite.com/goats/",
        "hasGoats": true,
        "numGoats": 17
    }
}
I don't actually need to modify any of the top-level data. The data I need to modify is within the "farmerInfo" object. I've tried a few things but have had no luck. How can I replace a string in this nested object using the SQL API?
I want the data to look like this after the migration:
{
    "name": "Red's Farm",
    "websiteLink": "www.thiswebsite.com/goats/",
    "farmerInfo": {
        "name": "Bob",
        "websiteLink": "www.newHostName.com/goats/", <--- Updated data
        "hasGoats": true,
        "numGoats": 17
    }
}
You can nest a SELECT statement inside your SELECT statement to build up the sub-objects. For example:
SELECT
    c.name,
    c.websiteLink,
    (
        SELECT
            c.farmerInfo.name,
            REPLACE(c.farmerInfo.websiteLink, "thiswebsite", "newHostName") AS websiteLink
    ) AS farmerInfo
FROM c
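If you also need to carry hasGoats and numGoats through unchanged (as in the desired output above), the same nested SELECT can presumably list them as well; a sketch, not verified against your data:

SELECT
    c.name,
    c.websiteLink,
    (
        SELECT
            c.farmerInfo.name,
            REPLACE(c.farmerInfo.websiteLink, "thiswebsite", "newHostName") AS websiteLink,
            c.farmerInfo.hasGoats,
            c.farmerInfo.numGoats
    ) AS farmerInfo
FROM c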
DataBrew recipes can be written in JSON for transformations that will be reused across multiple datasets.
This is an example that I copied from the DataBrew Developer Guide for joining two datasets:
{
    "Action": {
        "Operation": "JOIN",
        "Parameters": {
            "joinKeys": "[{\"key\":\"assembly_session\",\"value\":\"assembly_session\"},{\"key\":\"state_code\",\"value\":\"state_code\"}]",
            "joinType": "INNER_JOIN",
            "leftColumns": "[\"year\",\"assembly_session\",\"state_code\",\"state_name\",\"all_votes\",\"yes_votes\",\"no_votes\",\"abstain\",\"idealpoint_estimate\",\"affinityscore_usa\",\"affinityscore_russia\",\"affinityscore_china\",\"affinityscore_india\",\"affinityscore_brazil\",\"affinityscore_israel\"]",
            "rightColumns": "[\"assembly_session\",\"vote_id\",\"resolution\",\"state_code\",\"state_name\",\"member\",\"vote\"]",
            "secondInputLocation": "s3://databrew-public-datasets-us-east-1/votes.csv",
            "secondaryDatasetName": "votes"
        }
    }
}
Is it possible to select all columns with a * within "leftColumns", or anything close to that?
I've tried adding just * but it doesn't work.
I will apply the same transformations to multiple tables, and this functionality would work great if I could select everything on the left side of the join without needing to specify all the columns.
I ingest raw telemetry data as JSON records into a single-column table called RawEvents, in the column called Event. This is what a record/event looks like:
{
    "clientId": "myclient1",
    "IotHubDeviceId": "myiothubdevice1",
    "deviceId": "mydevice1",
    "timestamp": "2022-04-12T10:29:00.123",
    "telemetry": [
        {
            "telemetryId": "total.power",
            "value": 123.456
        },
        {
            "telemetryId": "temperature",
            "value": 34.56
        },
        ...
    ]
}
The RawEvents table is created and set up like this:
.create table RawEvents (Event: dynamic)
.create table RawEvents ingestion json mapping 'MyRawEventMapping' '[{"column":"Event","Properties":{"path":"$"}}]'
There is also the Telemetry table that will be used for queries and analysis. The Telemetry table has strongly-typed columns that match the structure of the raw records in RawEvents. It gets created like this:
.create table Telemetry (ClientId:string, IotHubDeviceId:string, DeviceId:string, Timestamp:datetime, TelemetryId:string, Value: real)
In order to get the Telemetry table updated whenever a new raw event is ingested into RawEvents, I have tried to define a data transformation function and to use it in an update policy attached to the Telemetry table.
To that end, I have used the following script to verify that my data transformation logic works as expected:
datatable (event:dynamic)
[
    dynamic(
        {
            "clientId": "myclient1",
            "IotHubDeviceId": "myiothubdevice1",
            "deviceId": "mydevice1",
            "timestamp": "2022-04-12T10:29:00.123",
            "telemetry": [
                {
                    "telemetryId": "total.power",
                    "value": 123.456
                },
                {
                    "telemetryId": "temperature",
                    "value": 34.56
                }
            ]
        }
    )
]
| evaluate bag_unpack(event)
| mv-expand telemetry
| evaluate bag_unpack(telemetry)
Executing that script gives me the desired output which matches the Telemetry table structure:
clientId deviceId IotHubDeviceId timestamp telemetryId value
myclient1 mydevice1 myiothubdevice1 2022-04-12T10:29:00.123Z total.power 123.456
myclient1 mydevice1 myiothubdevice1 2022-04-12T10:29:00.123Z temperature 34.56
Next, I have created a function called ExpandTelemetryEvent which contains that same data transformation logic applied to RawEvents.Event:
.create function ExpandTelemetryEvent() {
RawEvents
| evaluate bag_unpack(Event)
| mv-expand telemetry
| evaluate bag_unpack(telemetry)
}
And as a final step, I have tried to create an update policy for the Telemetry table which would use RawEvents as a source and ExpandTelemetryEvent() as the transformation function:
.alter table Telemetry policy update @'[{"Source": "RawEvents", "Query": "ExpandTelemetryEvent()", "IsEnabled": "True"}]'
This is where I got the error message saying
Error during execution of a policy operation: Caught exception while validating query for Update Policy: 'IsEnabled = 'True', Source = 'RawEvents', Query = 'ExpandTelemetryEvent()', IsTransactional = 'False', PropagateIngestionProperties = 'False''. Exception: Request is invalid and cannot be processed: Semantic error: SEM0100: 'mvexpand' operator: Failed to resolve scalar expression named 'telemetry'
I sort of understand why the policy cannot be applied. With the sample script the transformation worked because there was enough information to infer what telemetry is, whereas here there is nothing in RawEvents.Event that tells the engine what the structure of the raw events stored in the Event column will be.
How can this be solved? Is this the right approach at all?
As the bag_unpack plugin documentation indicates:
The plugin's output schema depends on the data values, making it as "unpredictable" as the data itself. Multiple executions of the plugin, using different data inputs, may produce different output schema.
Use a well-defined transformation instead:
RawEvents
| project clientId = Event.clientId, deviceId = Event.deviceId, IotHubDeviceId = Event.IotHubDeviceId, timestamp = Event.timestamp, telemetry = Event.telemetry
| mv-expand telemetry
| extend telemetryId = telemetry.telemetryId, value = telemetry.value
| project-away telemetry
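For use in the update policy, the query's output schema also has to match the Telemetry table exactly (column names and types), so the transformation presumably needs to cast and rename its columns. A minimal sketch, assuming the Telemetry schema above (run the two commands separately):

.create-or-alter function ExpandTelemetryEvent() {
    // Output columns must match the Telemetry table: ClientId, IotHubDeviceId, DeviceId, Timestamp, TelemetryId, Value
    RawEvents
    | mv-expand telemetry = Event.telemetry
    | project
        ClientId = tostring(Event.clientId),
        IotHubDeviceId = tostring(Event.IotHubDeviceId),
        DeviceId = tostring(Event.deviceId),
        Timestamp = todatetime(Event.timestamp),
        TelemetryId = tostring(telemetry.telemetryId),
        Value = toreal(telemetry.value)
}

.alter table Telemetry policy update @'[{"Source": "RawEvents", "Query": "ExpandTelemetryEvent()", "IsEnabled": true}]'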
I have a table with PK (String) and SK (Integer) - e.g.
PK_id SK_version Data
-------------------------------------------------------
c3d4cfc8-8985-4e5... 1 First version
c3d4cfc8-8985-4e5... 2 Second version
I can do a conditional insert to ensure we don't overwrite the PK/SK pair using ConditionalExpression (in the GoLang SDK):
putWriteItem := dynamodb.Put{
TableName: "example_table",
Item: itemMap,
ConditionExpression: aws.String("attribute_not_exists(PK_id) AND attribute_not_exists(SK_version)"),
}
However, I would also like to ensure that SK_version is always consecutive, but I don't know how to write the expression. In pseudo-code this is:
putWriteItem := dynamodb.Put{
TableName: "example_table",
Item: itemMap,
ConditionExpression: aws.String("attribute_not_exists(PK_id) AND attribute_not_exists(SK_version) **AND attribute_exists(SK_version = :SK_prev_version)**"),
}
Can someone advise how I can write this?
In SQL I'd do something like:
INSERT INTO example_table (PK_id, SK_version, Data)
SELECT {pk}, {sk}, {data}
WHERE NOT EXISTS (
SELECT 1
FROM example_table
WHERE PK_id = {pk}
AND SK_version = {sk}
)
AND EXISTS (
SELECT 1
FROM example_table
WHERE PK_id = {pk}
AND SK_version = {sk} - 1
)
Thanks
A conditional check applies to a single item; it cannot span multiple items. In other words, you simply need multiple conditional checks. DynamoDB has the TransactWriteItems API, which performs multiple conditional checks along with writes/deletes. The code below is in Node.js.
const previousVersionCheck = {
TableName: 'example_table',
Key: {
PK_id: 'prev_pk_id',
SK_version: 'prev_sk_version'
},
ConditionExpression: 'attribute_exists(PK_id)'
}
const newVersionPut = {
TableName: 'example_table',
Item: {
// your item data
},
ConditionExpression: 'attribute_not_exists(PK_id)'
}
await documentClient.transactWrite({
TransactItems: [
{ ConditionCheck: previousVersionCheck },
{ Put: newVersionPut }
]
}).promise()
The transaction has two operations: one is a validation against the previous version, and the other is a conditional write. If either conditional check fails, the whole transaction fails.
You are hitting your head on some of the differences between a SQL and a NoSQL database. DynamoDB is, of course, a NoSQL database. It does not, out of the box, support optimistic locking. I see two straightforward options:
Use a software layer to give you locking on your DynamoDB table. This may or may not be feasible depending on how often updates are made to your table. How fast 'versions' are generated and the maximum time your application can be gated on the lock will likely tell you whether this can work for you. I am not familiar with Go, but the Java API supports this. Again, this isn't a built-in feature of DynamoDB. If there is no Go equivalent, you could use the technique described in the link to 'lock' the table for updates. Generally speaking, locking a NoSQL DB isn't a typical pattern, as it isn't what it was created for (part of which is achieving large scale on unstructured documents while allowing fast access by many consumers at once).
Stop using an incrementor to guarantee uniqueness. Typically, incrementors are frowned upon in DynamoDB, in part due to the lack of intrinsic support for them and in part because of how DynamoDB shards data: you don't want a lot of similarity between records. Using a UUID will solve the uniqueness problem, but if you are porting an existing application that means more changes to the elements that create that ID and to the code that reads it (perhaps adding a creation-time field so you can tell which is newest, or prepending or appending an epoch time to the UUID to do the same). Here is a pertinent SO question explaining why to use UUIDs instead of incrementing integers.
Based on Hung Tran's answer, here is a Go example:
checkItem := dynamodb.TransactWriteItem{
ConditionCheck: &dynamodb.ConditionCheck{
TableName: "example_table",
ConditionExpression: aws.String("attribute_exists(pk_id) AND attribute_exists(version)"),
Key: map[string]*dynamodb.AttributeValue{"pk_id": {S: id}, "version": {N: prevVer}},
},
}
putItem := dynamodb.TransactWriteItem{
Put: &dynamodb.Put{
TableName: "example_table",
ConditionExpression: aws.String("attribute_not_exists(pk_id) AND attribute_not_exists(version)"),
Item: data,
},
}
writeItems := []*dynamodb.TransactWriteItem{&checkItem, &putItem}
_, _ = db.TransactWriteItems(&dynamodb.TransactWriteItemsInput{TransactItems: writeItems})
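One caveat with the snippet above: it discards the error returned by TransactWriteItems. When either condition fails, the call returns a TransactionCanceledException, which is the signal that the version chain was broken. A rough sketch of handling it, reusing db and writeItems from above and assuming the aws-sdk-go v1 awserr helper:

// import "github.com/aws/aws-sdk-go/aws/awserr"

_, err := db.TransactWriteItems(&dynamodb.TransactWriteItemsInput{TransactItems: writeItems})
if err != nil {
    if aerr, ok := err.(awserr.Error); ok && aerr.Code() == dynamodb.ErrCodeTransactionCanceledException {
        // Either the previous version does not exist or the new version was already
        // written: treat this as a failed conditional insert, not a hard failure.
    }
    // handle/propagate other errors
}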
The attribute itemJson is stored as follows:
"itemJson": {
"S": "{\"sold\":\"3\",\"listingTime\":\"20210107211621\",\"listCountry\":\"US\",\"sellerCountry\":\"US\",\"currentPrice\":\"44.86\",\"updateTime\":\"20210302092220\",\"itemLocation\":\"Miami,FL,USA\",\"listType\":\"FixedPrice\",\"categoryName\":\"Machines\",\"itemID\":\"293945109477\",\"sellerID\":\"holiday_for_you\",\"s3Key\":\"US/2021/2/FixedPrice/293945109477.json\",\"visitCount\":\"171\",\"createTime\":\"20210201233158\",\"listingStatus\":\"Completed\",\"endTime\":\"2021-02-28T20:22:57\",\"currencyID\":\"USD\"}"
},
I want to query with the filter contains(itemJson, "sold":"0") using the Java SDK. I tried these variants, and they all fail:
expressionValues.put(":v2", AttributeValue.builder().s("\\\"sold\\\":\\\"0\\\"").build());
expressionValues.put(":v2", AttributeValue.builder().s("sold:0"").build());
What is the right way to write my filter?
I tried @Balu Vyamajala's syntax on the DynamoDB web console as follows, but have not got it working yet.
contains(itemJson, :subValue) with a value of "sold\":\"3\"" seems to be working.
Here is a working example on the Query API that behaves as expected:
QuerySpec querySpec = new QuerySpec()
.withKeyConditionExpression("pk = :v_pk")
.withFilterExpression("contains (itemJson, :subValue)")
.withValueMap(new ValueMap().withString(":v_pk", "6").withString(":subValue", "sold\":\"3\""));
And to test from the AWS console, we just need to enter "sold":"2".
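For the AttributeValue-based attempt in the question (SDK v2 builder), the escaping likely only needs one level of Java string escaping so that the value passed to DynamoDB is literally "sold":"0". A hedged sketch, untested against your table:

// import java.util.HashMap; import java.util.Map;
// import software.amazon.awssdk.services.dynamodb.model.AttributeValue;

Map<String, AttributeValue> expressionValues = new HashMap<>();
// The stored string literally contains "sold":"0", so only Java-level escaping is needed.
expressionValues.put(":v2", AttributeValue.builder().s("\"sold\":\"0\"").build());
// FilterExpression: "contains(itemJson, :v2)"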
I'm new to DynamoDB and I'm trying to query a table from javascript using the Dynamoose library. I have a table with a primary partition key of type String called "id" which is basically a long string with a user id. I have a second column in the table called "attributes" which is a DynamoDB map and is used to store arbitrary user attributes (I can't change the schema as this is how a predefined persistence adapter works and I'm stuck working with it for convenience).
This is an example of a record in the table:
Item{2}
attributes Map{2}
10 Number: 2
11 Number: 4
12 Number: 6
13 Number: 8
id String: YVVVNIL5CB5WXITFTV3JFUBO2IP2C33BY
The numeric fields, such as the "12" field, in the Map can be interpreted as "week10", "week11","week12" and "week13" and the numeric values 2,4,6 and 8 are the number of times the application was launched that week.
What I need to do is get all user ids of the records that have more than 4 launches in a specific week (eg week 12) and I also need to get the list of user ids with a sum of 20 launches in a range of four weeks (eg. from week 10 to 13).
With Dynamoose I have to use the following model:
dynamoose.model(
DYNAMO_DB_TABLE_NAME,
{id: String, attributes: Map},
{useDocumentTypes: true, saveUnknown: true}
);
(to match the table structure generated by the persistence adapter I'm using).
I assume I will need to do a DynamoDB "scan" rather than a "query" to achieve this. To get started, I tried the following to get the records where week 12 equals 6, to no avail (I get an empty set as the result):
const filter = {
FilterExpression: 'contains(#attributes, :val)',
ExpressionAttributeNames: {
'#attributes': 'attributes',
},
ExpressionAttributeValues: {
':val': {'12': 6},
},
};
model.scan(filter).all().exec(function (err, result, lastKey) {
console.log('query result: '+ JSON.stringify(result));
});
If you don't know Dynamoose but can help with solving this via the AWS SDK, by running a DynamoDB scan directly, that would also be helpful for me.
Thanks!!
Try the following.
const filter = {
FilterExpression: '#attributes.#12 = :val',
ExpressionAttributeNames: {
'#attributes': 'attributes',
'#12': '12'
},
ExpressionAttributeValues: {
':val': 6,
},
};
It sounds like what you are really trying to do is filter the items where attributes.12 = 6, which is what the expression above will do.
contains() can't be used on objects or arrays.
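If you would rather bypass Dynamoose, a rough equivalent with the plain AWS SDK for JavaScript (v2) could look like this; the table name is a placeholder, and the week key and threshold are just examples for the "more than 4 launches in week 12" case:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

const params = {
    TableName: 'YOUR_TABLE_NAME', // placeholder: use the table your persistence adapter created
    FilterExpression: '#attributes.#week > :minLaunches',
    ExpressionAttributeNames: {
        '#attributes': 'attributes',
        '#week': '12',
        '#id': 'id'
    },
    ExpressionAttributeValues: { ':minLaunches': 4 },
    ProjectionExpression: '#id'
};

// Note: a single scan call returns at most 1 MB of data; loop on data.LastEvaluatedKey
// to cover the whole table.
docClient.scan(params, (err, data) => {
    if (err) console.error(err);
    else console.log(data.Items.map(item => item.id));
});

Filter expressions have no arithmetic, so the second requirement (a sum of 20 launches across weeks 10 to 13) would have to be computed client-side from the scanned items.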