Data structure for key-value list in RFC-enabled function module?

I am writing a new RFC callable function in ABAP which should be able to import a list of key-values mapping.
The RFC calling application will use Python with the PyRFC library.
I am unsure whether I should create a new custom data structure or if I can re-use an existing one.
The import argument should be able to contain a list of this:
('key1', ['key1val1', 'key1val2', ...])
('key2', ['key2val1', 'key2val2', ...])
....
If possible I would like to re-use an existing data structure.
One ugly hack would be to design the API to take a string and parse it as JSON, but that is a work-around I would like to avoid.
I found the data structure WDY_KEY_VALUE but there the value is a string. I would need a structure where the value is a list of strings.

You can create a deep structure with KEY defined with type STRING and VALUE defined with type STRINGTAB.

Modelling such data is perfectly possible in the ABAP DDIC:
1. Create a table type z_t_values with the row being the built-in type string.
2. Create a structure type z_s_key_values with fields key type string and values type z_t_values.
3. Create a table type z_t_key_values with row type z_s_key_values.
Now the type z_t_key_values corresponds to your example input: it is a table of rows, where each row contains a single key and a table of values.
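On the Python side, PyRFC maps an ABAP table parameter to a list of dicts and the nested string table to a plain list of strings. A minimal sketch, assuming a function module Z_MY_RFC_FUNCTION with a table parameter IT_KEY_VALUES of type z_t_key_values (connection details are placeholders):

from pyrfc import Connection

# Placeholder connection parameters.
conn = Connection(ashost='10.0.0.1', sysnr='00', client='100',
                  user='demo', passwd='secret')

# z_t_key_values maps to a list of dicts; the nested z_t_values
# maps to a plain list of strings. Field names are uppercase.
key_values = [
    {'KEY': 'key1', 'VALUES': ['key1val1', 'key1val2']},
    {'KEY': 'key2', 'VALUES': ['key2val1', 'key2val2']},
]

result = conn.call('Z_MY_RFC_FUNCTION', IT_KEY_VALUES=key_values)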

Related

Kusto table creation where column names come from a file or list

I want to create a Kusto table where the columns come from the result of some function or from a file.
I have a file which contains column names and datatypes as key-value pairs.
The query should take the column names and datatypes from the file.
You can use a client language such as JavaScript or C# to do this, but you still need to define the data type along with each column name. If you have a proper data type defined along with each column in the list/file, then you can read the file and create the table on the fly, as the sketch below shows.
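A minimal sketch of that in Python (the file name, table name, and CSV layout are assumptions): read the name/datatype pairs and build a Kusto .create table management command, which any Kusto client can then run (for example execute_mgmt in the azure-kusto-data package).

import csv

# Build a ".create table" command from "name,type" pairs in a CSV file.
def build_create_table(table_name, schema_file):
    with open(schema_file, newline='') as f:
        pairs = [(name.strip(), dtype.strip()) for name, dtype in csv.reader(f)]
    columns = ", ".join(f"{name}: {dtype}" for name, dtype in pairs)
    return f".create table {table_name} ({columns})"

# schema.csv contains lines like:  Id,long  /  Name,string  /  Ts,datetime
print(build_create_table("MyTable", "schema.csv"))
# -> .create table MyTable (Id: long, Name: string, Ts: datetime)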

Parsing nested JSON data within a Kusto column

After parsing the JSON data in a column within my Kusto Cluster using parse_json, I'm noticing there is still more data in JSON format nested within the resulting projected value. I need to access that information and make every piece of the JSON data its own column.
I've attempted to follow the answer from this SO post (Parsing json in kusto query) but haven't been successful in getting the syntax correct.
myTable
| project
    Time,
    myColumnParsedJSON = parse_json(column)
| project myColumnParsedNestedJSON = parse_json(myColumnParsedJSON.nestedJSONDataKey)
I expect the results to be projected columns, each named as each of the keys, with their respective values displayed in one row record.
Please see the note at the bottom of this doc:
It is somewhat common to have a JSON string describing a property bag in which one of the "slots" is another JSON string. In such cases, it is not only necessary to invoke parse_json twice, but also to make sure that in the second call, tostring will be used. Otherwise, the second call to parse_json will simply pass the input on to the output as-is, because its declared type is dynamic.
Once you're able to get parse_json to parse your payload properly, you can use the bag_unpack plugin (doc) in order to achieve the requirement you mentioned:
I expect the results to be projected columns, each named as each of the keys, with their respective values displayed in one row record.
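Putting the two together, here is a sketch of the corrected query, run through the azure-kusto-data Python client (the table, column, and key names come from the question; the cluster URI and database name are placeholders). Note the tostring() around the nested slot before the second parse_json, and the final bag_unpack to turn each key into its own column:

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

query = """
myTable
| project Time, myColumnParsedJSON = parse_json(column)
| project myColumnParsedNestedJSON =
    parse_json(tostring(myColumnParsedJSON.nestedJSONDataKey))
| evaluate bag_unpack(myColumnParsedNestedJSON)
"""

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://mycluster.kusto.windows.net")  # placeholder cluster URI
client = KustoClient(kcsb)
response = client.execute("MyDatabase", query)  # placeholder database
for row in response.primary_results[0]:
    print(row)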

How can I read the parquet dictionary in Java

I have seen that the Parquet format uses dictionaries to store some columns and that these dictionaries can be used to speed up filters if useDictionaryFilter() is used on the ParquetReader.
Is there any way to access these dictionaries from Java code?
I'd like to use them to create a list of distinct members of my column and thought that it would be faster to read only the dictionary values than to scan the whole column.
I have looked into the org.apache.parquet.hadoop.ParquetReader API but did not find anything.
The methods in org.apache.parquet.column.Dictionary allow you to:
Query the range of dictionary indexes: Between 0 and getMaxId().
Look up the entry corresponding to any index, for example for an int field you can use decodeToInt().
Once you have a Dictionary, you can iterate over all indexes to get all entries, so the question boils down to getting a Dictionary. To do that, use ColumnReaderImpl as a guide:
Dictionary getDictionary(ColumnDescriptor path, PageReader pageReader) throws IOException {
    DictionaryPage dictionaryPage = pageReader.readDictionaryPage();
    if (dictionaryPage == null) {
        return null; // this column chunk has no dictionary page
    }
    return dictionaryPage.getEncoding().initDictionary(path, dictionaryPage);
}
Please note that a column chunk may contain a mixture of data pages, some dictionary-encoded and some not, because if the dictionary "gets full" (reaches the maximum allowed size), then the writer outputs the dictionary page and the dictionary-encoded data pages and switches to not using dictionary-encoding for the rest of the data pages.
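As an aside, if the end goal is just the list of distinct values and Python is an option, pyarrow can keep a column dictionary-encoded while reading a Parquet file. A sketch, with the file and column names as assumptions:

import pyarrow.parquet as pq

# Ask pyarrow to keep "col" dictionary-encoded instead of decoding it.
table = pq.read_table("data.parquet", read_dictionary=["col"])

distinct = set()
for chunk in table.column("col").chunks:  # one DictionaryArray per chunk
    distinct.update(chunk.dictionary.to_pylist())
print(sorted(distinct))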

DynamoDB index with JSON attribute

I am referring to a thread about creating an index with JSON.
I have a column called data in my DynamoDB table. It is in JSON, and its structure looks like this:
{
    "config": "aasdfds",
    "state": "PROCESSED",
    "value": "asfdasasdf"
}
The AWS documentation says that I can create an index on a top-level JSON attribute, but I don't know how to do this exactly. When I create the index, should I specify the partition key as data.state and then, in my code, query the column data.state with the value set to PROCESSED? Or should I create the partition key as data and then, in my code, look for the column data with the value set to state = "PROCESSED"?
"Top-level attribute" means that DynamoDB supports creating an index on scalar attributes only (String, Number, or Binary). The JSON attribute is stored as the Document data type, so an index can't be created on it.
The key schema for the index. Every attribute in the index key schema must be a top-level attribute of type String, Number, or Binary. Other data types, including documents and sets, are not allowed.
Scalar Types – A scalar type can represent exactly one value. The scalar types are number, string, binary, Boolean, and null.
Document Types – A document type can represent a complex structure with nested attributes—such as you would find in a JSON document. The document types are list and map.
Set Types – A set type can represent multiple scalar values. The set types are string set, number set, and binary set.
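The usual workaround is therefore to duplicate the nested value as its own top-level scalar attribute and index that. A minimal sketch with boto3, assuming a table MyTable and a global secondary index state-index (partition key state) already exist:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('MyTable')

# Store the document, plus a top-level copy of the nested state value
# so that a GSI can use it as its partition key.
table.put_item(Item={
    'id': 'item-1',
    'data': {'config': 'aasdfds', 'state': 'PROCESSED', 'value': 'asfdasasdf'},
    'state': 'PROCESSED',
})

# Query the GSI on the top-level attribute instead of data.state.
resp = table.query(
    IndexName='state-index',
    KeyConditionExpression=Key('state').eq('PROCESSED'),
)
print(resp['Items'])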

Filter pushdown using Spark SQL on a map-type column in Parquet

I am trying to store my data in a nested way in Parquet, using a map-type column to store complex objects as values.
Could somebody let me know whether filter pushdown works on map-type columns or not? For example, below is my SQL query:
`select measureMap['CR01'].tenorMap['1M'] from RiskFactor where businessDate='2016-03-14' and bookId='FI-UK'`
measureMap is a map with key as String and value as a custom data type containing 2 attributes: a String and another map of String,Double pairs.
I want to know whether pushdown will work on the map or not, i.e. if the map has 10 key-value pairs, will Spark bring the whole map's data into memory and create the object model, or will it filter out the data depending on the key at the I/O read level?
I also want to know whether there is any way to specify the key in the where clause, something like: where measureMap.key = 'CR01'?
The short answer is no: Parquet predicate pushdown doesn't work with MapType columns or for nested Parquet structures.
The Spark Catalyst optimizer only understands the top-level columns in the Parquet data. It uses the column type, column data range, encoding, etc. to finally generate the whole-stage code for the query.
When the data is in a MapType format, it is not possible to get this information from the column. You could have hundreds of key-value pairs inside a map, which makes it impossible for the current Spark infrastructure to do a predicate pushdown.
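For illustration, a PySpark sketch of the behaviour and a common workaround (the path and the flattened column name are assumptions; the other names follow the question):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/data/risk_factor")  # assumed path

# Filters on top-level columns are pushdown-eligible; the map lookup is not.
result = (df
    .filter((col("businessDate") == "2016-03-14") & (col("bookId") == "FI-UK"))
    .select(col("measureMap")["CR01"]["tenorMap"]["1M"]))

result.explain()  # "PushedFilters" lists only the top-level column filters

# One workaround: promote hot map keys to top-level columns at write time
# so later queries can benefit from pushdown, e.g.:
# df.withColumn("cr01_1m", col("measureMap")["CR01"]["tenorMap"]["1M"]) \
#   .write.parquet("/data/risk_factor_flat")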
