R list/dataframes to JSON array of objects

I am trying to create a JSON array of objects using jsonlite in R.
The goal is a JSON like this:
{
  "top": [
    {
      "master1": {
        "item1": "value1"
      }
    },
    {
      "master2": {
        "item2": "value2"
      }
    }
  ]
}
I tried a list of lists and data frames with a list column, but I can't get the desired output.
Apart from that, round-tripping the JSON above through fromJSON/toJSON produces a different format:
library(jsonlite)
txt <- '{
  "top": [
    {
      "master1": {
        "item1": "value1"
      }
    },
    {
      "master2": {
        "item2": "value2"
      }
    }
  ]
}'
toJSON(fromJSON(txt), pretty = TRUE)
# Output
{
  "top": [
    {
      "master1": {
        "item1": "value1"
      },
      "master2": {}
    },
    {
      "master1": {},
      "master2": {
        "item2": "value2"
      }
    }
  ]
}
Do I need to set a parameter for this to work?

By default, fromJSON simplifies the "top" array to a data frame, padding the missing fields with NA; those NAs become the empty entries in your output JSON:
$top
   item1  item2
1 value1   <NA>
2   <NA> value2
You need to add simplifyDataFrame = FALSE to the fromJSON call to prevent it from creating a data frame.
toJSON(fromJSON(txt, simplifyDataFrame = FALSE), pretty = TRUE)
gives
{
  "top": [
    {
      "master1": {
        "item1": ["value1"]
      }
    },
    {
      "master2": {
        "item2": ["value2"]
      }
    }
  ]
}
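If you want to build this JSON directly in R rather than round-trip it, a minimal sketch with jsonlite: use an unnamed list for the array and named lists for the objects, and pass auto_unbox = TRUE so length-1 vectors serialize as scalars ("value1") instead of single-element arrays (["value1"]):

library(jsonlite)

x <- list(
  top = list(                                # unnamed list -> JSON array
    list(master1 = list(item1 = "value1")),  # named lists  -> JSON objects
    list(master2 = list(item2 = "value2"))
  )
)
toJSON(x, auto_unbox = TRUE, pretty = TRUE)

The same auto_unbox = TRUE flag also removes the single-element arrays from the simplifyDataFrame = FALSE output above.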

Transforming two arrays into a set of objects using jq

Is there a way to transform these two arrays, using jq, into a set of objects like in the example below?

Example JSON data:
{
  "data": [
    {
      "place": "FM346",
      "id": ["7_day_A", "7_day_B", "7_day_C", "7_day_D"],
      "values": [0, 30, 23, 43]
    },
    {
      "place": "LH210",
      "id": ["1_day_A", "1_day_B", "1_day_C", "1_day_D"],
      "values": [4, 45, 100, 9]
    }
  ]
}
What I need to transform it into:
{
  "data": [
    {
      "place": "FM346",
      "7_day_A": { "value": 0 },
      "7_day_B": { "value": 30 },
      "7_day_C": { "value": 23 },
      "7_day_D": { "value": 43 }
    },
    {
      "place": "LH210",
      "1_day_A": { "value": 4 },
      "1_day_B": { "value": 45 },
      "1_day_C": { "value": 100 },
      "1_day_D": { "value": 9 }
    }
  ]
}
I have tried this:
{
  data: [.data | .[] |
    {
      place: (.place),
      (.id[]): {
        value: (.values[])
      }
    }]
}
(in jqplay: https://jqplay.org/s/f4BBtN9gwmp)
and this:
{
  data: [.data | .[] |
    {
      place: (.place),
      test: [{
        (.id[]): {
          value: (.values[])
        }
      }]
    }]
}
(in jqplay: https://jqplay.org/s/pKIvQe1CzgX)
but they aren't grouped the way I wanted, and each id gets paired with every value rather than only its corresponding one.
I have been trying for some time now, but I'm new to jq and have no idea how to transform it this way. Thanks in advance for any answers.
You can use transpose here, which plays the key role in converting the two arrays to key/value pairs:
.data[] |= {place} +
([ .id, .values ] | transpose | map({(.[0]): { value: .[1] } }) | add)
The solution works by transposing the array-of-arrays [.id, .values], i.e. converting
[["7_day_A","7_day_B","7_day_C","7_day_D"],[0,30,23,43]]
[["1_day_A","1_day_B","1_day_C","1_day_D"],[4,45,100,9]]
to
[["7_day_A",0],["7_day_B",30],["7_day_C",23],["7_day_D",43]]
[["1_day_A",4],["1_day_B",45],["1_day_C",100],["1_day_D",9]]
With the transposition done, we construct one object per pair, using the zeroth element as the key and an object wrapping the first element as the value, and then merge the results together with add.
Demo - jqplay
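To inspect the intermediate step on its own, a quick sketch (assuming the input above is saved as input.json):

jq -c '.data[] | [.id, .values] | transpose' input.json

which prints the transposed pair arrays shown above, one line per input object.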

AWS Glue job to import DynamoDB data

We are migrating DynamoDB data from a prod account to a stage account.
In the source account, we use the "Export" feature of DDB to put the compressed .json.gz files into the destination S3 bucket.
We have written a Glue script that reads the exported .json.gz files and writes them to a DDB table.
We are making the code generic, so we should be able to migrate any DDB table from prod to stage.
As part of that process, while testing we are facing issues when trying to write NUMBER SET data to the target DDB table.
The following sample snippet raises a ValidationException when trying to insert into DDB:
from decimal import Decimal

def number_set(datavalue):
    # datavalue will be ['0', '1']
    set_of_values = set()
    for value in datavalue:
        set_of_values.add(Decimal(value))
    return set_of_values
When running the code, we get the following ValidationException:
An error occurred while calling o82.pyWriteDynamicFrame. Supplied AttributeValue is empty, must contain exactly one of the supported datatypes (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: UKEU70T0BLIKN0K2OL4RU56TGVVV4KQNSO5AEMVJF66Q9ASUAAJG; Proxy: null)
However, if we use int(value) instead of Decimal(value), no ValidationException is thrown and the job succeeds.
I suspect that write_dynamic_frame_from_options tries to infer the schema from the values an element contains: if the element has int values the datatype is inferred as NS, but if the element contains only Decimal values it fails to infer the datatype.
The Glue job we have written is:
dyf = glue_context.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": [file_path]
    },
    format="json",
    transformation_ctx="dyf",
    recurse=True,
)

def number_set(datavalue):
    list_of_values = []
    for value in datavalue:
        list_of_values.append(Decimal(value))
    print("list of values ")
    print(list_of_values)
    return set(list_of_values)

def parse_list(datavalue):
    list_of_values = []
    for object in datavalue:
        list_of_values.append(generic_conversion(object))
    return list_of_values

def generic_conversion(value_dict):
    for datatype, datavalue in value_dict.items():
        if datatype == 'N':
            value = Decimal(datavalue)
        elif datatype == 'S':
            value = datavalue
        elif datatype == 'NS':
            value = number_set(datavalue)
        elif datatype == 'BOOL':
            value = datavalue
        elif datatype == 'M':
            value = construct_map(datavalue)
        elif datatype == 'B':
            value = datavalue.encode('ascii')
        elif datatype == 'L':
            value = parse_list(datavalue)
    return value

def construct_map(row_dict):
    ddb_row = {}
    for key, value_dict in row_dict.items():
        # value_dict is keyed by the DDB type tag ('N', 'S', ...);
        # for 'N' we use the Decimal type
        ddb_row[key] = generic_conversion(value_dict)
    return ddb_row

def map_function(rec):
    row_dict = rec["Item"]
    return construct_map(row_dict)

mapped_dyF = Map.apply(frame=dyf, f=map_function, transformation_ctx="mapped_dyF")

datasink2 = glue_context.write_dynamic_frame_from_options(
    frame=mapped_dyF,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.region": "us-east-1",
        "dynamodb.output.tableName": destination_table,
        "dynamodb.throughput.write.percent": "0.5"
    },
    transformation_ctx="datasink2"
)
Can anyone help us figure out how to unblock this situation?
The record that we are trying to insert:
{
  "region": { "S": "to_delete" },
  "date": { "N": "20210916" },
  "number_set": { "NS": ["0", "1"] },
  "test": { "BOOL": false },
  "map": {
    "M": {
      "test": { "S": "value" },
      "test2": { "S": "value" },
      "nestedmap": {
        "M": {
          "key": { "S": "value" },
          "nestedmap1": {
            "M": {
              "key1": { "N": "0" }
            }
          }
        }
      }
    }
  },
  "binary": { "B": "QUFBY2Q=" },
  "list": {
    "L": [
      { "S": "abc" },
      { "S": "def" },
      { "N": "123" },
      {
        "M": {
          "key2": { "S": "value2" },
          "nestedmaplist": {
            "M": {
              "key3": { "S": "value3" }
            }
          }
        }
      }
    ]
  }
}
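Not an answer to the Glue inference question itself, but a minimal boto3 sketch (the table name and key schema are hypothetical) that can help rule out the data: boto3's resource layer serializes a Python set of Decimal values as an NS attribute, so a plain put_item of a converted row should succeed if the values themselves are valid.

import boto3
from decimal import Decimal

# Hypothetical smoke test outside Glue: the resource API maps a Python set
# of Decimals to a DynamoDB number set (NS).
table = boto3.resource("dynamodb", region_name="us-east-1").Table("stage-test-table")
table.put_item(Item={
    "region": "to_delete",                      # assumed partition key; adjust to the real schema
    "date": Decimal("20210916"),
    "number_set": {Decimal("0"), Decimal("1")},
})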

jq build string from multiple levels

I have the following JSON
{
  "count": 2,
  "load_arg": {
    "limit": 100
  },
  "limit": 100,
  "result": [
    {
      "id": 1234,
      "name": "Item1",
      "parts_list": {
        "subattributes": {
          "1234": {
            "chassisid": 111236,
            "part_attributes": {
              "subattributes": {
                "134322": {
                  "attribute_id": 27,
                  "attribute_value": 3
                }
              }
            }
          },
          "1235": {
            "chassisid": 76,
            "part_attributes": {
              "subattributes": {
                "192134": {
                  "attribute_id": 17,
                  "attribute_value": "steel"
                }
              }
            }
          }
        }
      }
    },
    {
      "id": 4321,
      "name": "Item2",
      "parts_list": {
        "subattributes": {
          "2212": {
            "chassisid": 93245,
            "part_attributes": {
              "subattributes": {
                "76423": {
                  "attribute_id": 17,
                  "attribute_value": "plastic"
                }
              }
            }
          },
          "65": {
            "chassisid": 2,
            "part_attributes": {
              "subattributes": {
                "1285": {
                  "attribute_id": 27,
                  "attribute_value": 94
                }
              }
            }
          }
        }
      }
    }
  ],
  "offset": 0
}
and I'm trying to get jq to output the following from the result array:
Item1: has a cover made of steel and requires part ID 27 to be 3
Item2: has a cover made of plastic and requires part ID 27 to be 94
I know some basic jq queries, but I'm struggling with grabbing a top-level value and then building a string from some specific child values. In my JSON each result could have tens of subattributes, and they are not all the same, so I'd need to search for attribute ID 17.
Effectively I'm trying to pull (pseudocode, as I know it's not right):
.[].result[].name + ": has a cover made of " + .[].result[].parts_list.subattributes[].part_attributes[][] | select(.attribute_id == 17).attribute_value + " and requires part ID 27 to be " + [].result[].parts_list.subattributes[].part_attributes[][] | select(.attribute_id == 27).attribute_value
This should get you on your way:
def id2value($id): .. | objects | select(.attribute_id == $id) | .attribute_value;
.result[]
| "\(.name): has a cover made of \(.parts_list | id2value(17))"
With your input and using the -r command-line option, the above produces:
Item1: has a cover made of steel
Item2: has a cover made of plastic
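Covering the second attribute is just one more call to the same helper; a sketch assuming every item carries exactly one attribute 17 and one attribute 27 (string interpolation stringifies the numeric attribute_value):

def id2value($id): .. | objects | select(.attribute_id == $id) | .attribute_value;
.result[]
| "\(.name): has a cover made of \(.parts_list | id2value(17)) and requires part ID 27 to be \(.parts_list | id2value(27))"

With -r this prints the two sentences from the question verbatim.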

Using toJSON: how to get an array nested in a data.frame into a JSON object

I am trying to convert a data.frame into JSON text. How can I include an array in this data.frame? That is, the variable characteristics needs to contain multiple elements, and I have not succeeded in this.
Here is my code:
library(dplyr)
library(tidyr)

data <- data.frame(sequenceNbr = c(1:3),
                   requestTypeCode = "MR001",
                   managementStartingDate = format(Sys.Date(), "%Y-%m-%d"),
                   TopicCode = "TH02",
                   Nbr = c("1760448", "6580364", "1391363"),
                   TypeCode = "R003008",
                   char1 = "CK001001", char2 = "CK001002",
                   stringsAsFactors = FALSE) %>%
  group_by(sequenceNbr) %>%
  nest(Assignment = c(TopicCode),
       entity = c(Nbr),
       ToControl = c(TypeCode),
       characteristics = c(char1, char2))

data <- data %>%
  mutate(Assignment = purrr::map(Assignment, as.list),
         entity = purrr::map(entity, as.list))

json <- jsonlite::toJSON(list(DefList = data),
                         auto_unbox = TRUE, pretty = TRUE)
This gives the following output:
{
  "DefList": [
    {
      "sequenceNbr": 1,
      "requestTypeCode": "MR001",
      "managementStartingDate": "2020-06-16",
      "Assignment": {
        "TopicCode": "TH02"
      },
      "entity": {
        "Nbr": "1760448"
      },
      "ToControl": [
        {
          "TypeCode": "R003008"
        }
      ],
      "characteristics": [
        {
          "char1": "CK001001",
          "char2": "CK001002"
        }
      ]
    },
    {
      "sequenceNbr": 2,
      "requestTypeCode": "MR001",
      "managementStartingDate": "2020-06-16",
      "Assignment": {
        "TopicCode": "TH02"
      },
      "entity": {
        "Nbr": "6580364"
      },
      "ToControl": [
        {
          "TypeCode": "R003008"
        }
      ],
      "characteristics": [
        {
          "char1": "CK001001",
          "char2": "CK001002"
        }
      ]
    }
  ]
}
What I want as output is this (the difference is in the characteristics element):
{
  "DefList": [
    {
      "sequenceNbr": 1,
      "requestTypeCode": "MR001",
      "managementStartingDate": "2020-06-16",
      "Assignment": {
        "TopicCode": "TH02"
      },
      "entity": {
        "Nbr": "1760448"
      },
      "ToControl": [
        {
          "TypeCode": "R003008"
        }
      ],
      "characteristics": [
        "CK001001",
        "CK001002"
      ]
    },
    {
      "sequenceNbr": 2,
      "requestTypeCode": "MR001",
      "managementStartingDate": "2020-06-16",
      "Assignment": {
        "TopicCode": "TH02"
      },
      "entity": {
        "Nbr": "6580364"
      },
      "ToControl": [
        {
          "TypeCode": "R003008"
        }
      ],
      "characteristics": [
        "CK001001",
        "CK001002"
      ]
    }
  ]
}
It should also work when characteristics contains only one value. Any ideas? I think the data.frame structure is the limiting factor, since it cannot contain an array. I see in this post: https://blog.exploratory.io/working-with-json-data-in-very-simple-way-ad7ebcc0bb89 that it is correct JSON syntax, but I do not manage to obtain it. Thanks in advance for your help.
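A minimal sketch of one way to get there with the code above: turn each nested characteristics tibble into an unnamed character vector, because jsonlite renders unnamed vectors as JSON arrays and named lists/data frames as objects. Wrapping the vector in I() keeps it an array even at length 1, despite auto_unbox = TRUE:

data <- data %>%
  mutate(characteristics = purrr::map(characteristics,
                                      ~ I(unname(unlist(.x)))))

jsonlite::toJSON(list(DefList = data), auto_unbox = TRUE, pretty = TRUE)

This is a sketch against the structure shown above, not a tested drop-in; the key idea is the unnamed vector plus the AsIs wrapper.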

How can I duplicate multiple times an existing object within a JSON array using jq?

I have the following JSON file:
{
  "actions": [
    {
      "values": "test",
      "features": [
        {
          "v1": 100,
          "v2": {
            "dates": [
              "2020-04-08 06:58:26",
              "2020-04-08 06:58:26"
            ]
          }
        }
      ]
    }
  ]
}
I would like to append the object within the "actions" array to the end of it n times, creating n+1 objects in total.
Expected output if n=2:
{
  "actions": [
    {
      "values": "test",
      "features": [
        {
          "v1": 100,
          "v2": {
            "dates": [
              "2020-04-08 06:58:26",
              "2020-04-08 06:58:26"
            ]
          }
        }
      ]
    },
    {
      "values": "test",
      "features": [
        {
          "v1": 100,
          "v2": {
            "dates": [
              "2020-04-08 06:58:26",
              "2020-04-08 06:58:26"
            ]
          }
        }
      ]
    },
    {
      "values": "test",
      "features": [
        {
          "v1": 100,
          "v2": {
            "dates": [
              "2020-04-08 06:58:26",
              "2020-04-08 06:58:26"
            ]
          }
        }
      ]
    }
  ]
}
I found this answer: How can I duplicate an existing object within a JSON array using jq? However, it only works for a single element at the end.
You can use reduce together with range() to generate the indices at which to insert copies of the object:
jq --arg n 2 'reduce range(0, ($n|tonumber)) as $d (.; .actions[$d+1] += .actions[0] )' json
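A reduce-free sketch that does the same thing (assuming the count is passed as a number via --argjson): bind the first element, build an array of n copies, and concatenate it onto actions.

jq --argjson n 2 '.actions[0] as $tpl | .actions += [range($n) | $tpl]' json

Binding $tpl first keeps the copies independent of how jq evaluates the right-hand side of +=.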
