DynamoDB SK range key is not returning data as expected

I am using the query below to fetch from the DB (a GSI):
results = table.query(
    IndexName="Table-ID-index",
    KeyConditionExpression=Key("id").eq(id),
)
However, my data is not sorted by the range key defined on the GSI. Here is a sample response from the query above:
{
    "value": "test1",
    "sk": "1#1",
    "id": "1"
},
{
    "value": "test19",
    "sk": "19#19",
    "id": "19"
},
{
    "value": "test2",
    "sk": "2#2",
    "id": "2"
}
sk 19 should come after sk 2. Is there anything I have missed in my query?

If memory serves, this is because the strings are stored and compared in their UTF-8-encoded form. From the documentation:
"DynamoDB collates and compares strings using the bytes of the underlying UTF-8 string encoding. For example, "a" (0x61) is greater than "A" (0x41), and "¿" (0xC2BF) is greater than "z" (0x7A)."
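A common workaround (not from the original thread) is to zero-pad the numeric portion of the sort key so that byte-wise string ordering matches numeric ordering. A minimal Python sketch of the effect, using the sk values from the sample response:

```python
# DynamoDB compares string sort keys byte-by-byte (UTF-8), so "19#19"
# sorts before "2#2": the first differing byte, "1" (0x31), is lower
# than "2" (0x32). Python's sorted() on strings shows the same order.
keys = ["1#1", "19#19", "2#2"]
print(sorted(keys))  # ['1#1', '19#19', '2#2'] -- lexicographic, like DynamoDB

# Zero-padding the numeric part to a fixed width restores numeric order.
padded = [f"{int(k.split('#')[0]):04d}#{k.split('#')[1]}" for k in keys]
print(sorted(padded))  # ['0001#1', '0002#2', '0019#19']
```

The padding width (4 here) is an assumption; it must be chosen large enough for the biggest id you ever expect to store, since the keys are compared as strings forever after.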

Related

How can I use jq to sort by datetime field and filter based on attribute?

I am trying to sort the following JSON response based on "startTime", and I also want to filter based on "name" and fetch only the "dataCenter" of each matched record. Can you please help with a jq filter for doing this?
I tried something like jq '.[]|= sort_by(.startTime)' but it doesn't return the correct result.
[
    {
        "name": "JPCSKELT",
        "dataCenter": "mvsADM",
        "orderId": "G9HC8",
        "scheduleTable": "FD33515",
        "nodeGroup": null,
        "controlmApp": "P/C-DEVELOPMENT-LRSP",
        "groupName": "SCMTEST",
        "assignmentGroup": "HOST_CONFIG_MGMT",
        "owner": "PC00000",
        "description": null,
        "startTime": "2021-11-11 17:45:48.0",
        "endTime": "2021-11-11 17:45:51.0",
        "successCount": 1,
        "failureCount": 0,
        "dailyRunCount": 0,
        "scriptName": "JPCSKELT"
    },
    {
        "name": "JPCSKELT",
        "dataCenter": "mvsADM",
        "orderId": "FWX98",
        "scheduleTable": "JPCS1005",
        "nodeGroup": null,
        "controlmApp": "P/C-DEVELOPMENT-LRSP",
        "groupName": "SCMTEST",
        "assignmentGroup": "HOST_CONFIG_MGMT",
        "owner": "PC00000",
        "description": null,
        "startTime": "2021-07-13 10:49:47.0",
        "endTime": "2021-07-13 10:49:49.0",
        "successCount": 1,
        "failureCount": 0,
        "dailyRunCount": 0,
        "scriptName": "JPCSKELT"
    },
    {
        "name": "JPCSKELT",
        "dataCenter": "mvsADM",
        "orderId": "FWX98",
        "scheduleTable": "JPCS1005",
        "nodeGroup": null,
        "controlmApp": "P/C-DEVELOPMENT-LRSP",
        "groupName": "SCMTEST",
        "assignmentGroup": "HOST_CONFIG_MGMT",
        "owner": "PC00000",
        "description": null,
        "startTime": "2021-10-13 10:49:47.0",
        "endTime": "2021-10-13 10:49:49.0",
        "successCount": 1,
        "failureCount": 0,
        "dailyRunCount": 0,
        "scriptName": "JPCSKELT"
    }
]
You can use the following expression to sort the input:
sort_by(.startTime | sub("(?<time>.*)\\..*"; "\(.time)") | strptime("%Y-%m-%d %H:%M:%S") | mktime)
The sub("(?<time>.*)\\..*"; "\(.time)") expression removes the trailing decimal fraction so that strptime can parse the timestamp.
I assume you can use the result of the above query to perform the desired filtering.
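For comparison, the same sort expressed in Python (a sketch using a trimmed-down version of the sample records; note that Python's %f directive tolerates the trailing ".0", so no stripping step is needed there):

```python
from datetime import datetime

# Trimmed-down records from the sample input; only the fields used here.
records = [
    {"startTime": "2021-11-11 17:45:48.0", "orderId": "G9HC8"},
    {"startTime": "2021-07-13 10:49:47.0", "orderId": "FWX98"},
    {"startTime": "2021-10-13 10:49:47.0", "orderId": "FWX98"},
]

# %f accepts 1-6 fractional digits, so ".0" parses without preprocessing.
records.sort(key=lambda r: datetime.strptime(r["startTime"], "%Y-%m-%d %H:%M:%S.%f"))
print([r["startTime"] for r in records])
```

Since this particular timestamp format is fixed-width and zero-padded, a plain lexicographic sort on the raw strings would also produce chronological order; parsing is only required when formats vary.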
Welcome. If I'm guessing right, you want to supply a value to filter the records on via the name property, sort the results by the startTime property, and then output just the value of the dataCenter property for those records. How about this:
jq --arg name JPCSKELT '
map(select(.name==$name))|sort_by(.startTime)[].dataCenter
' data.json
Based on your sample data, this produces:
"mvsADM"
"mvsADM"
"mvsADM"
So I'm wondering if this is what you're really asking?

Group nested array objects to parent key in JQ

I have JSON coming from an external application, formatted like so:
{
    "ticket_fields": [
        {
            "url": "https://example.com/1122334455.json",
            "id": 1122334455,
            "type": "tagger",
            "custom_field_options": [
                {
                    "id": 123456789,
                    "name": "I have a problem",
                    "raw_name": "I have a problem",
                    "value": "help_i_have_problem",
                    "default": false
                },
                {
                    "id": 456789123,
                    "name": "I have feedback",
                    "raw_name": "I have feedback",
                    "value": "help_i_have_feedback",
                    "default": false
                }
            ]
        },
        {
            "url": "https://example.com/6677889900.json",
            "id": 6677889900,
            "type": "tagger",
            "custom_field_options": [
                {
                    "id": 321654987,
                    "name": "United States",
                    "raw_name": "United States",
                    "value": "location_123_united_states",
                    "default": false
                },
                {
                    "id": 987456321,
                    "name": "Germany",
                    "raw_name": "Germany",
                    "value": "location_456_germany",
                    "default": false
                }
            ]
        }
    ]
}
The end goal is to get the data into a TSV where each object in the custom_field_options array is grouped by its parent ID (ticket_fields.id) and transposed so that each object is represented on a single line, like so:
Ticket Field ID	Name	Value
1122334455	I have a problem	help_i_have_problem
1122334455	I have feedback	help_i_have_feedback
6677889900	United States	location_123_united_states
6677889900	Germany	location_456_germany
I have been able to export the data to TSV already, but it puts each ticket field on one line and does not preserve the name/value pairing, like so:
Using jq -r '.ticket_fields[] | select(.type=="tagger") | [.id, .custom_field_options[].name, .custom_field_options[].value] | @tsv'
Ticket Field ID	Name	Name	Value	Value
1122334455	I have a problem	I have feedback	help_i_have_problem	help_i_have_feedback
6677889900	United States	Germany	location_123_united_states	location_456_germany
Each of the custom_field_options arrays in production may consist of any number of objects (not limited to 2 each). But I seem to be stuck on how to appropriately group or map these objects to their parent ticket_fields.id and to transpose the data in a clean manner. The select(.type=="tagger") is mentioned in the query as there are multiple values for ticket_fields.type which need to be filtered out.
Based on another answer on here, I did try variants of jq -r '.ticket_fields[] | select(.type=="tagger") | map(.custom_field_options |= from_entries) | group_by(.custom_field_options.ticket_fields) | map(map( .custom_field_options |= to_entries))' without success. Any assistance would be greatly appreciated!
You need two nested iterations, one in each array. Save the value of .id in a variable to access it later.
jq -r '
  .ticket_fields[] | select(.type=="tagger") | .id as $id
  | .custom_field_options[] | [$id, .name, .value]
  | @tsv
'
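The same two nested iterations, sketched in Python against a trimmed-down version of the sample input, may make the structure clearer (this is an illustration, not part of the original answer):

```python
# Reduced sample: only the fields the jq filter touches.
data = {"ticket_fields": [
    {"id": 1122334455, "type": "tagger", "custom_field_options": [
        {"name": "I have a problem", "value": "help_i_have_problem"},
        {"name": "I have feedback", "value": "help_i_have_feedback"},
    ]},
    {"id": 6677889900, "type": "tagger", "custom_field_options": [
        {"name": "United States", "value": "location_123_united_states"},
        {"name": "Germany", "value": "location_456_germany"},
    ]},
]}

rows = []
for field in data["ticket_fields"]:          # outer iteration (.ticket_fields[])
    if field["type"] != "tagger":            # select(.type=="tagger")
        continue
    for opt in field["custom_field_options"]:  # inner iteration repeats the parent id
        rows.append(f'{field["id"]}\t{opt["name"]}\t{opt["value"]}')
print("\n".join(rows))
```

Saving the parent id before descending into the inner array is exactly what `.id as $id` does in the jq version.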

How to group by parent and collect all property values of child in gremlin?

I want to collect all shows and their associated genres together. GENRES are a child relationship of SHOWS.
Sample Gremlin graph:
So that the output is something similar to:
"1" [a,b]
"2" [c,d]
Sample graph: https://gremlify.com/x8i8stszn2
You can accomplish this using the project() step within Gremlin like this:
g.V("2789").out('WATCHED').hasLabel('SHOW').
project('show', 'genre').
by('NAME').
by(out('HAS_GENRE').values('NAME').fold())
This will return your data formatted like this:
[
    {
        "show": 1,
        "genre": [
            "a",
            "b"
        ]
    },
    {
        "show": 2,
        "genre": [
            "c",
            "d"
        ]
    }
]

Is there an R library or function for formatting international currency strings?

Here's a snippet of the JSON data I'm working with:
{
    "item" = "Mexican Thing",
    ...
    "raised": "19",
    "currency": "MXN"
},
{
    "item" = "Canadian Thing",
    ...
    "raised": "42",
    "currency": "CDN"
},
{
    "item" = "American Thing",
    ...
    "raised": "1",
    "currency": "USD"
}
You get the idea.
I'm hoping there's a function out there that can take in a standard currency abbreviation and a number and spit out the appropriate string. I could theoretically write this myself except I can't pretend like I know all the ins and outs of this stuff and I'm bound to spend days and weeks being surprised by bugs or edge cases I didn't think of. I'm hoping there's a library (or at least a web api) already written that can handle this but my Googling has yielded nothing useful so far.
Here's an example of the result I want (let's pretend "currency" is the function I'm looking for)
currency("USD", "32") --> "$32"
currency("GBP", "45") --> "£45"
currency("EUR", "19") --> "€19"
currency("MXN", "40") --> "MX$40"
Assuming your real JSON is valid, then it should be relatively simple. I'll provide a valid JSON string, fixing the three invalid portions here: = should be :, ... is obviously a placeholder, and it should be a list wrapped in [ and ]:
js <- '[{
    "item": "Mexican Thing",
    "raised": "19",
    "currency": "MXN"
},
{
    "item": "Canadian Thing",
    "raised": "42",
    "currency": "CDN"
},
{
    "item": "American Thing",
    "raised": "1",
    "currency": "USD"
}]'
with(jsonlite::parse_json(js, simplifyVector = TRUE),
paste(raised, currency))
# [1] "19 MXN" "42 CDN" "1 USD"
Edit: in order to change to specific currency characters, don't make this too difficult: just instantiate a lookup vector where "USD" (for example) prepends "$" and appends "" (nothing) to the raised string. (I say both prepend/append because I believe some currencies are always post-digits ... I could be wrong.)
pre_currency <- Vectorize(function(curr) switch(curr, USD="$", GBP="£", EUR="€", CDN="$", "?"))
post_currency <- Vectorize(function(curr) switch(curr, USD="", GBP="", EUR="", CDN="", "?"))
with(jsonlite::parse_json(js, simplifyVector = TRUE),
paste0(pre_currency(currency), raised, post_currency(currency)))
# [1] "?19?" "$42" "$1"
I intentionally left "MXN" out of the vector here to demonstrate that you need a default setting, "?" (pre/post) here. You may choose a different default/unknown currency value.
An alternative:
currency <- function(val, currency) {
  pre <- sapply(currency, switch, USD="$", GBP="£", EUR="€", CDN="$", "?")
  post <- sapply(currency, switch, USD="", GBP="", EUR="", CDN="", "?")
  paste0(pre, val, post)
}
with(jsonlite::parse_json(js, simplifyVector = TRUE),
currency(raised, currency))
# [1] "?19?" "$42" "$1"
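The same lookup-table idea translates directly to other languages. A rough Python sketch (the symbol map is illustrative only, not a complete ISO 4217 mapping, and the MX$ prefix for MXN is an assumption based on the question's example):

```python
# Prefix lookup for a handful of currency codes; everything here prepends,
# but a parallel suffix table could be added for post-digit currencies.
PREFIX = {"USD": "$", "GBP": "£", "EUR": "€", "CDN": "$", "MXN": "MX$"}

def currency(code, amount, default="? "):
    # Unknown codes fall back to a visible placeholder, mirroring the
    # "?" default in the R switch() version above.
    return PREFIX.get(code, default) + str(amount)

print(currency("USD", "32"))  # $32
print(currency("MXN", "40"))  # MX$40
```

As the answer notes, a hand-rolled table like this ignores locale subtleties (grouping, decimal marks, symbol placement); a dedicated i18n library is the safer route for production use.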

Use jq to combine two arrays of objects on a certain key

I am trying to use jq to solve this problem.
Suppose I have the following object
{
    "listA": [
        {
            "id": "12345",
            "code": "001"
        }
    ],
    "listB": [
        {
            "id": "12345",
            "prop": "AABBCC"
        }
    ]
}
In reality my two lists are longer, but the id isn't repeated within each list.
How can I combine the two lists into a single list where, for each id, the non-id properties are collected into a single object?
For example, from the object above, I'd like the following:
{
"listC" : [
{
"id": "12345",
"code": "001",
"prop": "AABBCC"
}
]
}
A simple way would be to concatenate the arrays, group the elements by id, and map each group into a single object using add:
jq '.listA+.listB | group_by(.id) | map(add)' test.json
If there may be more than two arrays you need to merge in the file, you could instead use flatten to concatenate all of them.
Test case below
# cat test.json
{
    "listA": [
        { "id": "12345", "code": "001" },
        { "id": "12346", "code": "002" }
    ],
    "listB": [
        { "id": "12345", "prop": "AABBCC" }
    ]
}
# jq 'flatten | group_by(.id) | map(add)' test.json
# or
# jq '.listA+.listB | group_by(.id) | map(add)' test.json
[
    {
        "id": "12345",
        "code": "001",
        "prop": "AABBCC"
    },
    {
        "id": "12346",
        "code": "002"
    }
]
Using group_by entails a sort, which is unnecessary, so if efficiency is a concern, then an alternative approach such as the following should be considered:
INDEX(.listA[]; .id) as $one
| INDEX(.listB[]; .id) as $two
| reduce ($one|keys_unsorted[]) as $k ($two; .[$k] += $one[$k])
| {listC: [.[]] }
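The INDEX-and-merge approach has a direct analogue in Python, which may help in following what the reduce does (a sketch for illustration, not part of the original answer):

```python
# Build a dict keyed by id for one list, then merge the other into it.
# Like the jq INDEX/reduce version, this avoids any sorting.
list_a = [{"id": "12345", "code": "001"}, {"id": "12346", "code": "002"}]
list_b = [{"id": "12345", "prop": "AABBCC"}]

merged = {item["id"]: dict(item) for item in list_b}   # INDEX(.listB[]; .id)
for item in list_a:                                    # reduce over listA's keys
    merged.setdefault(item["id"], {}).update(item)     # .[$k] += $one[$k]

list_c = list(merged.values())
print(list_c)
```

Dict insertion order is preserved in Python, so ids that appear only in list_a come after those from list_b, just as the jq version folds listA's entries into the listB index.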
