Where are Update Policy failures logged? - azure-data-explorer

I had a table with an update policy applied like so:
.create table Foo (
    data: dynamic
)

.create function ParseFoo() {
    Foo
    | project
        a = tostring(data.a),
        b = tostring(data.b)
}

.create table Bar (
    a: string,
    b: string
)

.alter table Bar policy update
```
[{
    "IsEnabled": true,
    "Source": "Foo",
    "Query": "ParseFoo",
    "IsTransactional": false,
    "PropagateIngestionProperties": false
}]
```
Someone changed the definition of ParseFoo to extract another column:
.alter function ParseFoo() {
    Foo
    | project
        a = tostring(data.a),
        b = tostring(data.b),
        c = tostring(data.c)
}
The schema difference prevented the update policy from applying, and data ingestion stopped. I was able to figure out the mismatch and correct it, but I would like to proactively monitor for this in the future.
From a very cursory glance, I don't see any errors related to the failed ingestion into this table logged in any of the places I've thought to check so far:
.show journal
the ADXCommand table

From the link in this Q&A: https://learn.microsoft.com/en-us/azure/kusto/management/updatepolicy#failures
.show ingestion failures
| where TableName == 'Bar'
| project-reorder Details
Failed to invoke update policy. Target Table = 'Bar', Query = 'let Foo =
__table("Foo", 'All', 'AllButRowStore')
| where extent_id() in (guid(29f13c11-e6cf-472a-a1cd-91af4cfb2c44));
ParseFoo': Query schema does not match table schema
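To monitor for this proactively, one option is to poll .show ingestion failures on a schedule and alert on new rows. Below is a rough sketch using the azure-kusto-data Python package; the cluster URL and database name are placeholders, and the alerting is left as a print statement:

# A minimal monitoring sketch. Assumptions: azure-kusto-data is installed and
# Azure CLI authentication is available; cluster/database are placeholders.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

CLUSTER = "https://<your-cluster>.kusto.windows.net"
DATABASE = "<your-database>"

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(CLUSTER)
client = KustoClient(kcsb)

# Reuse the query from above; .show commands are management commands.
command = ".show ingestion failures | where TableName == 'Bar' | project-reorder Details"
response = client.execute_mgmt(DATABASE, command)

for row in response.primary_results[0]:
    # Wire this up to whatever alerting you already use (email, webhook, etc.).
    print(row["Details"])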

Related

Data ingestion issue with KQL update policy; Query schema does not match table schema

I'm writing a function which takes in a raw data table (containing multijson telemetry data) and reformats it into multiple columns. I use .set MyTable <| myfunction | limit 0 to create my target table based off of the function, and use an update policy to populate my target table.
Here is the code:
.set-or-append MyTargetTable <|
myfunction
| limit 0
.alter table MyTargetTable policy update
@'[{ "IsEnabled": true, "Source": "raw", "Query": "myfunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
But I'm getting ingestion failures. Here is the failure message:
Failed to invoke update policy. Target Table = 'MyTargetTable', Query = '
let raw = __table("raw", 'All', 'AllButRowStore')
| where extent_id() in (guid(659e3b3c-6859-426d-9c37-003623834455));
myfunction()': Query schema does not match table schema
I double-checked the query schema and the target table; they are the same. I'm not sure what this error means.
Also, I ran count on both the raw and target tables; there is a relatively large discrepancy (400 rows in my target table vs. 2000 rows in the raw table).
Any advice will be appreciated.
Generally speaking, to find the root cause of the mismatch between schemas, you can run something along the following lines and filter for differences:
myfunction
| getschema
| join kind=leftouter (
    table('MyTargetTable')
    | getschema
) on ColumnOrdinal, ColumnType
In addition, you should make sure the output schema of the function you use in your update policy is 'stable', i.e. it isn't affected by the input data.
The output schema of some query plugins, such as pivot() and bag_unpack(), depends on the input data, and therefore it isn't recommended to use them in update policies.
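If you would rather run this check outside of Kusto, for example as a guard in a deployment script, a rough sketch along the same lines with the azure-kusto-data Python package could look like the following (the cluster URL and database are placeholders; the function and table names are the ones from this question):

# A rough schema-comparison sketch; assumes azure-kusto-data and Azure CLI auth.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

CLUSTER = "https://<your-cluster>.kusto.windows.net"
DATABASE = "<your-database>"

client = KustoClient(KustoConnectionStringBuilder.with_az_cli_authentication(CLUSTER))

def get_schema(expression):
    # getschema returns one row per column: ColumnName, ColumnOrdinal, DataType, ColumnType
    rows = client.execute(DATABASE, expression + " | getschema").primary_results[0]
    return [(r["ColumnOrdinal"], r["ColumnName"], r["ColumnType"]) for r in rows]

func_schema = get_schema("myfunction")
table_schema = get_schema("MyTargetTable")
if func_schema != table_schema:
    print("function output schema does not match target table schema")
    print("  function:", func_schema)
    print("  table:   ", table_schema)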

Return the content of a specific object in an array — CosmosDB

This is a follow-up to question 56126817.
My current query
SELECT c.EventType.EndDeviceEventDetail FROM c
WHERE c.EventType.EndDeviceEventType.eventOrAction = '93'
AND c.EventType.EndDeviceEventType.subdomain = '137'
AND c.EventType.EndDeviceEventType.domain = '26'
AND c.EventType.EndDeviceEventType.type = '3'
AND ARRAY_CONTAINS(c.EventType.EndDeviceEventDetail, {"name": "RCDSwitchReleased", "value": "true"})
My Query Output
[
    {
        "EndDeviceEventDetail": [
            {
                "name": "Spontaneous",
                "value": "true"
            },
            {
                "name": "DetectionActive",
                "value": "true"
            },
            {
                "name": "RCDSwitchReleased",
                "value": "true"
            }
        ]
    }
]
Question
How could I change my query so that I select only the "value" of the array entry whose "name" is "DetectionActive"?
The idea is to filter the query on one array entry and return as output the "value" of another array entry. From reading here, a UDF (not the best fit in this case) or a JOIN should be used.
First attempt
SELECT t.value FROM c JOIN t in c.EventType.EndDeviceEventDetail
WHERE c.EventType.EndDeviceEventType.eventOrAction = '93'
AND c.EventType.EndDeviceEventType.subdomain = '137'
AND c.EventType.EndDeviceEventType.domain = '26'
AND c.EventType.EndDeviceEventType.type = '3'
AND ARRAY_CONTAINS(c.EventType.EndDeviceEventDetail, {"name": "RCDSwitchReleased", "value": "true"})
Gets Bad Request (400) error
Your idea and direction are absolutely right; I simplified and tested your SQL.
SELECT detail.value FROM c
join detail in c.EventType.EndDeviceEventDetail
WHERE c.EventType.EndDeviceEventType.eventOrAction = '93'
AND ARRAY_CONTAINS(c.EventType.EndDeviceEventDetail, {"name": "RCDSwitchReleased", "value": "true"})
I found the error message: it is because value is a reserved word in Cosmos DB SQL syntax; please refer to this case: Using reserved word field name in DocumentDB.
You could try to modify the SQL like:
SELECT detail["value"] FROM c
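Putting the pieces together, the full query would look something like the sketch below. Note that the extra filter on detail["name"] is an addition on top of the answer above (to return only the DetectionActive entry's value), and the azure-cosmos account, key, database, and container names are placeholders:

# A sketch combining the JOIN, the reserved-word workaround, and a name filter.
# Endpoint, key, database, and container names are placeholders.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
container = client.get_database_client("<database>").get_container_client("<container>")

query = """
SELECT detail["value"]
FROM c
JOIN detail IN c.EventType.EndDeviceEventDetail
WHERE c.EventType.EndDeviceEventType.eventOrAction = '93'
  AND c.EventType.EndDeviceEventType.subdomain = '137'
  AND c.EventType.EndDeviceEventType.domain = '26'
  AND c.EventType.EndDeviceEventType.type = '3'
  AND ARRAY_CONTAINS(c.EventType.EndDeviceEventDetail, {"name": "RCDSwitchReleased", "value": "true"})
  AND detail["name"] = 'DetectionActive'
"""

for item in container.query_items(query=query, enable_cross_partition_query=True):
    print(item)  # e.g. {"value": "true"}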

Delete large data with same partition key from DynamoDB

I have a DynamoDB table structured like this:
A   B    C    D
1   id1  foo  hi
1   id2  var  hello
A is the partition key and B is the sort key.
Let's say I only have the partition key and don't know the sort keys, and I'd like to delete all entries that have the same partition key.
So I am thinking about loading entries via a query with a fixed page size (e.g. 1000) and deleting them in batches until there are no more entries with that partition key left in DynamoDB.
Is it possible to delete entries without loading them first?
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DeleteItem.html
DeleteItem
Deletes a single item in a table by primary key.
For the primary key, you must provide all of the attributes. For
example, with a simple primary key, you only need to provide a value
for the partition key. For a composite primary key, you must provide
values for both the partition key and the sort key.
In order to delete an item you must provide the whole primary key (partition key + sort key). So in your case you would need to query on the partition key, get all of the primary keys, then use those to delete each item. You can also use BatchWriteItem.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
BatchWriteItem
The BatchWriteItem operation puts or deletes multiple items in one or
more tables. A single call to BatchWriteItem can write up to 16 MB of
data, which can comprise as many as 25 put or delete requests.
Individual items to be written can be as large as 400 KB.
DeleteRequest - Perform a DeleteItem operation on the specified item. The item to be deleted is identified by a Key subelement: Key -
A map of primary key attribute values that uniquely identify the item.
Each entry in this map consists of an attribute name and an attribute
value. For each primary key, you must provide all of the key
attributes. For example, with a simple primary key, you only need to
provide a value for the partition key. For a composite primary key,
you must provide values for both the partition key and the sort key.
No, but you can Query all the items for the partition, and then issue an individual DeleteRequest for each item, which you can batch in multiple BatchWrite calls of up to 25 items.
JS code
// Assumes the AWS SDK v2 DocumentClient is configured, e.g.:
// const AWS = require('aws-sdk');
// const docClient = new AWS.DynamoDB.DocumentClient();
async function deleteItems(tableName, partitionId) {
  const queryParams = {
    TableName: tableName,
    KeyConditionExpression: 'partitionId = :partitionId',
    ExpressionAttributeValues: { ':partitionId': partitionId },
  };
  const queryResults = await docClient.query(queryParams).promise();
  if (queryResults.Items && queryResults.Items.length > 0) {
    const batchCalls = chunks(queryResults.Items, 25).map(async (chunk) => {
      const deleteRequests = chunk.map((item) => {
        return {
          DeleteRequest: {
            Key: {
              'partitionId': item.partitionId,
              'sortId': item.sortId,
            },
          },
        };
      });
      const batchWriteParams = {
        RequestItems: {
          [tableName]: deleteRequests,
        },
      };
      await docClient.batchWrite(batchWriteParams).promise();
    });
    await Promise.all(batchCalls);
  }
}

// https://stackoverflow.com/a/37826698/3221253
function chunks(inputArray, perChunk) {
  return inputArray.reduce((all, one, i) => {
    const ch = Math.floor(i / perChunk);
    all[ch] = [].concat(all[ch] || [], one);
    return all;
  }, []);
}
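For reference, roughly the same approach in Python with boto3. The table and key attribute names are placeholders matching the JS example above; batch_writer() takes care of splitting the deletes into 25-item batches and retrying unprocessed items, and this version also pages through the whole partition:

# A rough boto3 sketch of query-then-batch-delete. Table/key names are placeholders.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('my-table')  # placeholder table name

def delete_partition(partition_id):
    key_condition = Key('partitionId').eq(partition_id)
    response = table.query(KeyConditionExpression=key_condition)
    with table.batch_writer() as batch:
        while True:
            for item in response['Items']:
                batch.delete_item(Key={'partitionId': item['partitionId'],
                                       'sortId': item['sortId']})
            # Keep paging until the whole partition has been covered.
            if 'LastEvaluatedKey' not in response:
                break
            response = table.query(KeyConditionExpression=key_condition,
                                   ExclusiveStartKey=response['LastEvaluatedKey'])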
For production databases and critical Amazon DynamoDB tables, the recommendation is to use batch-write-item to purge huge amounts of data.
batch-write-item (with DeleteRequest) is 10 to 15 times faster than delete-item.
aws dynamodb scan --table-name "test_table_name" --projection-expression "primary_key, timestamp" --filter-expression "timestamp < :oldest_date" --expression-attribute-values '{":oldest_date":{"S":"2020-02-01"}}' --max-items 25 --total-segments "$TOTAL_SEGMENT" --segment "$SEGMENT_NUMBER" > $SCAN_OUTPUT_FILE
cat $SCAN_OUTPUT_FILE | jq -r ".Items[] | tojson" | awk '{ print "{\"DeleteRequest\": {\"Key\": " $0 " }}," }' | sed '$ s/.$//' | sed '1 i { "test_table_name": [' | sed '$ a ] }' > $INPUT_FILE
aws dynamodb batch-write-item --request-items file://$INPUT_FILE
Please find more information at https://medium.com/analytics-vidhya/how-to-delete-huge-data-from-dynamodb-table-f3be586c011c
You can use "begins_with" on the range key.
For example (pseudo code)
DELETE WHERE A = '1' AND B BEGINS_WITH 'id'

Query DynamoDB with a hash key and a range key with Boto3

I am having trouble using AWS Boto3 to query DynamoDB with a hash key and a range key at the same time using the recommended KeyConditionExpression. I have attached an example query:
import boto3
from boto3 import dynamodb
from boto3.session import Session

dynamodb_session = Session(aws_access_key_id=AWS_KEY,
                           aws_secret_access_key=AWS_PASS,
                           region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)

request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    'ExpressionAttributeValues': {
        ':v0': {'S': MY_HASH_KEY},
        ':v1': {'N': GT_RANGE_KEY}
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
    'TableName': TABLE_NAME
}

response = table.query(**request)
When I run this against a table with the following scheme:
Table Name: TABLE_NAME
Primary Hash Key: hash_key (String)
Primary Range Key: range_key (Number)
I get the following error and I cannot understand why:
ClientError: An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: >, operand type: M
From my understanding, the type M is a map or dictionary type, whereas I am using type N, a number type, which matches my table schema for the range key. If someone could explain why this error is happening, that would be great; I am also open to a different way of accomplishing the same query even if you cannot explain why this error exists.
The Boto 3 SDK constructs a Condition Expression for you when you use the Key and Attr functions imported from boto3.dynamodb.conditions:
from boto3.dynamodb.conditions import Key

response = table.query(
    KeyConditionExpression=Key('hash_key').eq(hash_value) & Key('range_key').eq(range_key_value)
)
Reference: Step 4: Query and Scan the Data
Hope it helps
Adding this solution as the accepted answer did not address why the query used did not work.
TLDR: Using query on a Table resource in boto3 has subtle differences as opposed to using client.query(...) and requires a different syntax.
The syntax is valid for a query on a client, but not on a Table. The ExpressionAttributeValues on a table do not require you to specify the data type. Also if you are executing a query on a Table resource you do not have to specify the TableName again.
Working solution:
from boto3.session import Session

dynamodb_session = Session(aws_access_key_id=AWS_KEY,
                           aws_secret_access_key=AWS_PASS,
                           region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)

request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    'ExpressionAttributeValues': {
        ':v0': MY_HASH_KEY,
        ':v1': GT_RANGE_KEY
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
}

response = table.query(**request)
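For comparison, this is roughly what the original request shape would look like against the low-level client, where the typed attribute values and TableName are required. This is a sketch, not the accepted fix; note that with type N the number is still passed as a string:

# Low-level client counterpart: typed values and TableName are required here.
client = dynamodb_session.client('dynamodb')

response = client.query(
    TableName=TABLE_NAME,
    ExpressionAttributeNames={'#n0': 'hash_key', '#n1': 'range_key'},
    ExpressionAttributeValues={
        ':v0': {'S': MY_HASH_KEY},
        ':v1': {'N': str(GT_RANGE_KEY)},  # N values go over the wire as strings
    },
    KeyConditionExpression='(#n0 = :v0) AND (#n1 > :v1)',
)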
I am the author of a package called botoful which might be useful to avoid dealing with these complexities. The code using botoful will be as follows:
import boto3
from botoful import Query

client = boto3.Session(aws_access_key_id=AWS_KEY,
                       aws_secret_access_key=AWS_PASS,
                       region_name=DYNAMODB_REGION).client('dynamodb')

results = (
    Query(TABLE_NAME)
    .key(hash_key=MY_HASH_KEY, range_key__gt=GT_RANGE_KEY)
    .execute(client)
)

print(results.items)

How to check status of table in DynamoDB2?

I am trying to move from DynamoDB to DynamoDB2 to use tables with global secondary indices. I need to create a table and then batch-write items into it. Here's a block of test code:
from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
from boto.dynamodb2.layer1 import DynamoDBConnection
from boto.dynamodb2.table import Table
from boto.dynamodb2.items import Item
import boto

conn = DynamoDBConnection(aws_access_key_id=<MYID>, aws_secret_access_key=<MYKEY>)
tables = conn.list_tables()
table_name = 'myTable001'

if table_name not in tables['TableNames']:
    Table.create(table_name,
                 schema=[HashKey('firstKey')],
                 throughput={'read': 5, 'write': 2},
                 global_indexes=[GlobalAllIndex('secondKeyIndex',
                                                parts=[HashKey('secondKey')],
                                                throughput={'read': 5, 'write': 3})],
                 connection=conn)

table = Table(table_name, connection=conn)
with table.batch_write() as batch:
    batch.put_item(data={'firstKey': 'fk01', 'secondKey': 'sk001',
                         'message': '{"firstKey":"fk01", "secondKey":"sk001", "comments": "fk01-sk001"}'})
    # ...
    batch.put_item(data={'firstKey': 'fk74', 'secondKey': 'sk112',
                         'message': '{"firstKey":"fk74", "secondKey":"sk012", "comments": "fk74-sk012"}'})
When I run this code for the 1st time with a new value of table_name, I get the following error at the last line of the block:
boto.exception.JSONResponseError: JSONResponseError: 400 Bad Request
{u'message': u'Requested resource not found', u'__type': u'com.amazonaws.dynamodb.v20120810#ResourceNotFoundException'}
When I run it one more time, it executes fine. I suspect the reason is simply that the table is still being created when I run it for the first time. How do I check a table's status in DDB2? In DDB I used table.status, but this does not seem to be available in DDB2. What should I use instead?
UPDATE: Based on the final response here, the right way to extract the table status is:
tdescr = conn.describe_table(tName)
print "%s" % ((tdescr['Table'])['TableStatus'])
Here are the other elements of the description dictionary:
for key in tdescr['Table'].keys():
    print key
GlobalSecondaryIndexes
AttributeDefinitions
ProvisionedThroughput
TableSizeBytes
TableName
TableStatus
KeySchema
ItemCount
CreationDateTime
You can use conn.describe_table('table') to fetch the details about the table and then check the TableStatus field in the returned output.
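Building on that, a small sketch of how one might wait for the table to become ACTIVE before batch-writing, using the same describe_table call (the timeout and polling interval are arbitrary choices):

import time

def wait_until_active(conn, table_name, timeout=300, poll_interval=5):
    # Poll DescribeTable until the table reports ACTIVE (or we give up).
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = conn.describe_table(table_name)['Table']['TableStatus']
        if status == 'ACTIVE':
            return True
        time.sleep(poll_interval)
    return False

# e.g. call this right after Table.create(...) and before table.batch_write()
# wait_until_active(conn, table_name)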
