I have created an AES key via the following Azure CLI command:
az keyvault key create --hsm-name myhsmkeyvault --name myaeskey2 --ops encrypt decrypt --tags --kty oct-HSM --size 128
Now I expected that, if I show the key, I would see the key bytes of this AES key, but the value is null:
{
  "attributes": {
    "created": "2022-08-30T10:21:38+00:00",
    "enabled": true,
    "expires": null,
    "exportable": false,
    "notBefore": null,
    "recoverableDays": 7,
    "recoveryLevel": "CustomizedRecoverable",
    "updated": "2022-08-30T10:21:38+00:00"
  },
  "key": {
    "crv": null,
    "d": null,
    "dp": null,
    "dq": null,
    "e": null,
    "k": null,
    "keyOps": [
      "encrypt",
      "decrypt"
    ],
    "kid": "https://myhsm.managedhsm.azure.net/keys/myaeskey2/88adxxxxx2865",
    "kty": "oct-HSM",
    "n": null,
    "p": null,
    "q": null,
    "qi": null,
    "t": null,
    "x": null,
    "y": null
  },
  "managed": null,
  "releasePolicy": null,
  "tags": null
}
Did I miss something here?
The Get Key operation only returns the public portions of keys.
From the Get Key documentation:
If the requested key is symmetric, then no key material is released in the response.
The idea behind Key Vault is that keys don't leave the vault. Instead you're intended to send ENCRYPT/DECRYPT requests to have data encrypted/decrypted by the service.
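For example, here's a minimal sketch (my illustration, not part of the linked docs) of encrypting and decrypting through the service with the oct-HSM key, assuming the azure-keyvault-keys and azure-identity packages; the key URL and the AES-GCM algorithm choice are assumptions:
from azure.identity import DefaultAzureCredential
from azure.keyvault.keys.crypto import CryptographyClient, EncryptionAlgorithm

credential = DefaultAzureCredential()
crypto_client = CryptographyClient("https://myhsm.managedhsm.azure.net/keys/myaeskey2", credential)

# Encrypt inside the HSM; the key material never leaves the vault.
encrypt_result = crypto_client.encrypt(EncryptionAlgorithm.a128_gcm, b"some plaintext")

# Decrypt later with the IV and authentication tag returned by the service.
decrypt_result = crypto_client.decrypt(
    EncryptionAlgorithm.a128_gcm,
    encrypt_result.ciphertext,
    iv=encrypt_result.iv,
    authentication_tag=encrypt_result.tag,
)
plaintext = decrypt_result.plaintext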
From the "Key Operations" section of the "Key types, algorithms, and operations" documentation:
Key Vault doesn't support EXPORT operations. Once a key is provisioned in the system, it cannot be extracted or its key material modified.
An alternative approach is generating an AES key locally, and then using the Key Vault WRAP/UNWRAP operations to encrypt that key before storing or transmitting it.
If you're using Python, see the documentation for wrap_key() and unwrap_key().
Example usage would be something like:
import secrets

from azure.identity import DefaultAzureCredential
from azure.keyvault.keys.crypto import CryptographyClient, KeyWrapAlgorithm

# Create a CryptographyClient instance for the wrapping key.
credential = DefaultAzureCredential()
key_id = "https://<your vault>.vault.azure.net/keys/<key name>/fe4fdcab688c479a9aa80f01ffeac26"
crypto_client = CryptographyClient(key_id, credential)

# Generate a 32-byte (256-bit) AES key locally.
key_bytes = secrets.token_bytes(32)

# Wrap the AES key with the Key Vault key.
wrap_result = crypto_client.wrap_key(KeyWrapAlgorithm.rsa_oaep, key_bytes)
encrypted_key = wrap_result.encrypted_key

# At this point, you can safely store/transmit the encrypted_key for later
# usage. In addition, using Key Vault in this manner allows enforcing
# access controls, logging usage, etc.

# When you want to use the key, you have to unwrap it.
unwrap_result = crypto_client.unwrap_key(KeyWrapAlgorithm.rsa_oaep, encrypted_key)
key = unwrap_result.key
I'm not super familiar with PySpark, but you'll probably want to convert the key to a BINARY literal format before using it in an expression. This sidesteps any potential issues with implicit casting of STRING to BINARY.
import binascii

key_hex = f"X'{binascii.hexlify(key).decode()}'"
Also note that the key property contains a JSON Web Key (JWK). The n property within the JWK is for an RSA modulus, while the k field would contain the symmetric key, if it were populated.
Related
How do I query a DynamoDB table with both dataset_id and an image_name, using the code below?
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table_name')
response = table.query(
    IndexName='dataset_id',
    KeyConditionExpression='dataset_id = :value AND begins_with (image_name, :name)',
    ExpressionAttributeValues={
        ':value': str(dataset_id),
        ':name': {'S': 'a'}
    },
    Limit=int(results_per_page))
These are my DynamoDB GSIs:
(screenshot of the DynamoDB GSIs omitted)
What am I doing wrong here?
I am expecting the dynamodb response to return images that start with 'a'.
First of all, I assume your GSI is called dataset_id, its partition key is dataset_id, and its sort key is image_name; if this assumption is false, then your use case is not valid.
Now to the issue I see: you are using the Resource client, which uses native JSON rather than DynamoDB JSON, so your query should look like this:
response = table.query(
    IndexName='dataset_id',
    KeyConditionExpression='dataset_id = :value AND begins_with (image_name, :name)',
    ExpressionAttributeValues={
        ':value': str(dataset_id),
        ':name': 'a'
    },
    Limit=int(results_per_page))
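As a side note (my suggestion, not strictly required), with the Resource client you can also build the same key condition with boto3's condition helpers and avoid string expressions altogether:
from boto3.dynamodb.conditions import Key

response = table.query(
    IndexName='dataset_id',
    KeyConditionExpression=Key('dataset_id').eq(str(dataset_id)) & Key('image_name').begins_with('a'),
    Limit=int(results_per_page))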
I have a DynamoDB table and I need to find the records that fall within a given date range.
So here is my table structure:
{
  "Id": "String",
  "Name": "String",
  "CrawledAt": "String"
}
In this table, the partition key is Id and the CrawledAt field is also used. I have also created a local secondary index on the CrawledAt field, named "CrawledAt-index".
Most articles show querying by Id together with CrawledAt, but in my case I don't know the Id; I only need to retrieve records for a particular date range.
Here is the code I have tried
request = {
    "TableName": "sflnd00001-test",
    "IndexName": "CrawledAt-index",
    "ConsistentRead": False,
    "ProjectionExpression": "Name",
    "KeyConditionExpression": "CrawledAt between :v_start and :v_end",
    "ExpressionAttributeValues": {
        ":v_start": {"S": "2020-01-31T00:00:00.000Z"},
        ":v_end": {"S": "2025-11-31T00:00:00.000Z"}
    }
}
response = table.query(**request)
It's returning this error
"An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: BETWEEN, operand type: M",
Can someone please tell me how to find the records in a given date range without providing the primary key?
You cannot use between or any other function on a partition key; you must always provide the entire partition key value.
For your use case, your GSI partition key should be a single static value, and CrawledAt should be the sort key.
{
  "Id": "String",
  "Name": "String",
  "CrawledAt": "String",
  "GsiPk": "Number"
}
"KeyConditionExpression":
"GsiPk = 1 AND CrawledAt between :v_start and :v_end"
This would then allow you to retrieve all the data in the table between two dates. But be aware of the caveat of doing this: using a single value for the GSI partition key is not scalable, and would cap write throughput at approximately 1000 WCU.
If you need more scale, you can assign a random number from 0 to n-1 to the GsiPk to increase the number of partitions, which would then require you to make n queries and merge the results to collect all the data, as sketched below.
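Here's a rough sketch of that sharded approach (my illustration; the GSI name, shard count, and boto3 usage are assumptions):
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("sflnd00001-test")
NUM_SHARDS = 10  # n random values written to GsiPk

items = []
for shard in range(NUM_SHARDS):
    response = table.query(
        IndexName="GsiPk-CrawledAt-index",  # assumed GSI name
        KeyConditionExpression=Key("GsiPk").eq(shard)
            & Key("CrawledAt").between("2020-01-31T00:00:00.000Z", "2025-11-31T00:00:00.000Z"),
    )
    items.extend(response["Items"])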
Alternatively, you can Scan the table and use a FilterExpression, which is also not a scalable solution:
aws dynamodb scan \
    --table-name MusicCollection \
    --filter-expression "#ts between :a and :b" \
    --expression-attribute-names file://expression-attribute-names.json \
    --expression-attribute-values file://expression-attribute-values.json
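For completeness, the two files referenced above might contain something like this (illustrative values; the #ts placeholder is needed because timestamp is a DynamoDB reserved word):
expression-attribute-names.json:
{ "#ts": "timestamp" }

expression-attribute-values.json:
{
  ":a": { "S": "2020-01-31T00:00:00.000Z" },
  ":b": { "S": "2025-11-31T00:00:00.000Z" }
}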
CURRENTLY
I have a table in DynamoDB with a single attribute - Primary Key - that contains unique values.
PK
------
#A#B#C#
#B#C#
#C#D#E#
#BC#
ISSUE
I am looking to do 2 searches for #B#C#: (1) an exact match, and (2) a containing match, and therefore only want these results:
(1) Exact Match:
#B#C#
(2) Containing Match:
#A#B#C#
#B#C#
Are these 2 searches possible against the primary key?
If so, what is the most efficient query to run? e.g. QUERY or SCAN
Note:
For (2) I am using the following code, but it is returning all items in DB:
params = {
    TableName: 'myTable',
    FilterExpression: "contains(#key, :v)",
    ExpressionAttributeNames: { "#key": "PK" },
    ExpressionAttributeValues: { ":v": "#B#C#" }
}
dynamodb.scan(params, callback)
DynamoDB supports two main types of searches: query and scan. The Query operation finds items based on primary key values. The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
If you wanted to find the item with the primary key #B#C#, you would use the query API:
ddbClient.query(
  {
    "TableName": "<YOUR TABLE NAME>",
    "KeyConditionExpression": "#pk = :pk",
    "ExpressionAttributeValues": {
      ":pk": {
        "S": "#B#C#"
      }
    },
    "ExpressionAttributeNames": {
      "#pk": "PK"
    }
  }
)
For your second access pattern, you'll need to use the scan API because you are searching across the entire table/secondary index.
You can use scan to test if a primary key has a substring using contains. I don't see anything wrong with the format of your scan operation.
Be careful when using scan this way. Because scan will read your entire table to fetch results, you will have a fairly inefficient operation at scale. If this operation is run infrequently, or you are running it against a sparse index, it's probably fine. However, if it's one of your primary access patterns, you may want to reconsider using the scan API for this operation.
I'm currently trying to create a dynamic query using AppSync and Apache Velocity Template Language (VTL).
I want to evaluate a series of begins_with conditions combined with "OR".
Such as:
{
  "operation": "Query",
  "query": {
    "expression": "pk = :pk and (begins_with(sk,:sk) or begins_with(sk, :sk1)",
    "expressionValues": {
      ":pk": {"S": "tenant:${context.args.tenantId}"},
      ":sk": {"S": "my-sort-key-${context.args.evidenceId[0]}"},
      ":sk1": {"S": "my-sort-key-${context.args.evidenceId[1]}"}
    }
  }
}
But that isn't working. I've also tried using | instead of or but it hasn't worked either. I get:
Invalid KeyConditionExpression: Syntax error; token: "|", near: ") | begins_with" (Service: AmazonDynamoDBv2;
How can I achieve this using VTL?
Original answer
You're missing a closing parenthesis after the begins_with(sk, :sk1). That is, the third line should be:
"expression": "pk = :pk and (begins_with(sk,:sk) or begins_with(sk, :sk1))"
I just ran the fixed expression and it worked as expected.
Revised
Actually, there are subtleties.
The or operator can be used in filter expressions but not in key condition expressions. For instance, a = :v1 and (b = :v2 or b = :v3) will work as long as a and b are "regular" attributes. If a and b are the table's primary key (partition key, sort key), then DDB will reject the query.
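A small boto3 sketch of that distinction (my own illustration; the table and attribute names are made up):
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("myTable")

# "or" on non-key attributes is fine in a FilterExpression...
response = table.query(
    KeyConditionExpression=Key("a").eq("v1"),                  # key condition: AND only
    FilterExpression=Attr("b").eq("v2") | Attr("b").eq("v3"),  # OR is accepted here
)
# ...but an equivalent "or" inside the KeyConditionExpression on key attributes is rejected.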
Reading this answer, it seems that this isn't possible, as DynamoDB only accepts a single sort key value and a single operation.
There's also no "OR" condition in the operation:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#DDB-Query-request-KeyConditionExpression
If you also want to provide a condition for the sort key, it must be combined using AND with the condition for the sort key. Following is an example, using the = comparison operator for the sort key:
partitionKeyName = :partitionkeyval AND sortKeyName = :sortkeyval
I am going to be restructuring the access pattern to better match my request.
Here is the sample JSON which we are planning to insert into a DynamoDB table. As of now we have organizationID as the partition key and __id__ as the sort key. Since we will query based on organizationID, we kept it as the partition key. Is it a good approach to keep __id__ as the sort key?
{
  "__class__": "package",
  "__updated__": "2015-10-19T14:30:13Z",
  "__created__": "2015-10-19T12:32:28Z",
  "transactions": [
    {
      transaction1
    },
    {
      transaction2
    }
  ],
  "carrier": "USPS",
  "organizationID": "6406fa6fd32393908125d4d81ec358",
  "barcode": "9400110891302408",
  "queryString": [
    "xxxxxxx",
    "YYYY",
    "delivered"
  ],
  "deliveredTo": null,
  "__id__": "3232d1a045476786fg22dfg32b82209155b32"
}
As per best practice, you can have a timestamp as the sort key for the above data model. One advantage of having a timestamp as the sort key is that you can sort the data for a particular partition key and identify the latest updated item. This is a very common use case for a sort key.
It doesn't make much sense to keep both the partition key and the sort key as randomly generated values, because then you can't use the sort key efficiently (unless I'm missing something here).
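To illustrate the timestamp-as-sort-key pattern (a sketch under assumed names, not part of the original data model), fetching the latest item for an organization becomes a single query:
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("packages")  # assumed table name

# With organizationID as the partition key and a timestamp as the sort key,
# the newest item is the first result when reading in descending order.
latest = table.query(
    KeyConditionExpression=Key("organizationID").eq("6406fa6fd32393908125d4d81ec358"),
    ScanIndexForward=False,  # descending by sort key
    Limit=1,
)["Items"]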