Error when using Kotlin groupBy function applied to a list - dictionary

I'm very new to Kotlin, and I can't seem to figure out how to do something that should be quite simple: grouping my data by the name field. I've tried map and groupBy, alone and in combination, but I either get an error, or the data is in a list with the duplicates filtered out, which I don't want. I need the duplicates grouped.
As an aside, my SQL function (in the repository), which finds records in my DB based on the date entered, returns fields from two different classes. That isn't the optimal way of doing things, but I didn't know how else to combine the data from the two tables; I don't mind correcting that if someone can tell me how. The error that I get is:
Type inference failed. Expected type mismatch: Required: List<ReportWithBatches>, Found: Map<String?, List<ReportWithBatches>>
Below is my code. TIA.
My repository code:
#Query("SELECT new XXX.report.model.ReportWithBatches(r.adlsPath, r.fileSize, r.lastUpdate, r.remoteFileName, b.dataPath , b.version, b.dataSource, r.recordCount, r.transferStatus, r.businessDate) FROM ReportOutput r INNER JOIN BatchInput b ON r.job.jobUuid = b.reportJob.jobUuid WHERE r.businessDate = ?1")
fun findAllByBusinessDateJoinBatches(date: LocalDate): List<ReportWithBatches>
}
My service code:
fun findAllByCreationDateJoinBatches(date: LocalDate): List<ReportWithBatches> {
val reportBatchesList = reportRepository.findAllByBusinessDateJoinBatches(date)
return reportBatchesList.groupBy({it.adlsFullPath}, {it})
// .map { it.value }
// return reportBatchesList
// return reportBatchesList.map { rB ->
// ReportWithBatches(
// rB.adlsFullPath,
// rB.contentLength,
// rB.lastModified,
// rB.remoteFileName,
// rB.dataPath,
// rB.version,
// rB.source,
// rB.numberOfRecords,
// rB.transferStatus,
// rB.creationDate)
//
// }
}
Code in my controller:
@GetMapping(value = ["/linkBatches/{today}"])
fun findAllByCreationDateJoinBatches(@PathVariable("today") @DateTimeFormat(pattern = "yyyyMMdd") date: LocalDate): List<ReportWithBatches> {
return eligibleService.findAllByCreationDateJoinBatches(date)
}
My result is this - note there's only one batch per record in the result:
[
{
"adlsFullPath": "part-00000-1399b2e0-5fa5-484b-91f1-9dec0601b885-c000.csv.gz",
"contentLength": 20,
"lastModified": "2020-02-07T16:50:16.132-05:00",
"remoteFileName": null,
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=3/version=1",
"version": 1,
"source": "NETS",
"numberOfRecords": -1,
"transferStatus": "REPORT_CREATED",
"creationDate": "2020-02-07"
},
{
"adlsFullPath": "part-00007-1399b2e0-5fa5-484b-91f1-9dec0601b885-c000.csv.gz",
"contentLength": 1104,
"lastModified": "2020-02-07T16:50:16.133-05:00",
"remoteFileName": null,
"dataPath": "preprd/datalake/lake/nes/negotiation/execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=3/version=1",
"version": 1,
"source": "NETS",
"numberOfRecords": -1,
"transferStatus": "REPORT_CREATED",
"creationDate": "2020-02-07"
},
{
"adlsFullPath": "part-00015-1399b2e0-5fa5-484b-91f1-9dec0601b885-c000.csv.gz",
"contentLength": 1057,
"lastModified": "2020-02-07T16:50:16.133-05:00",
"remoteFileName": null,
"dataPath": "preprd/datalake/lake/nes/negotiation/execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=3/version=1",
"version": 1,
"source": "NETS",
"numberOfRecords": -1,
"transferStatus": "REPORT_CREATED",
"creationDate": "2020-02-07"
}
]
I would like to get something like this:
[
{
"adlsFullPath": "part-00000-1399b2e0-5fa5-484b-91f1-9dec0601b885-c000.csv.gz",
"contentLength": 20,
"lastModified": "2020-02-07T16:50:16.132-05:00",
"remoteFileName": null,
"numberOfRecords": -1,
"transferStatus": "REPORT_CREATED",
"creationDate": "2020-02-07",
"batches": [
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=3/version=1",
"version": 1,
"source": "NETS"
},
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=2/version=1",
"version": 1,
"source": "NETS"
},
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=1/version=1",
"version": 1,
"source": "NETS"
}
]
},
{
"adlsFullPath": "part-00007-1399b2e0-5fa5-484b-91f1-9dec0601b885-c000.csv.gz",
"contentLength": 1104,
"lastModified": "2020-02-07T16:50:16.133-05:00",
"remoteFileName": null,
"numberOfRecords": -1,
"transferStatus": "REPORT_CREATED",
"creationDate": "2020-02-07",
"batches": [
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=3/version=1",
"version": 1,
"source": "NETS"
},
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=2/version=1",
"version": 1,
"source": "NETS"
},
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=1/version=1",
"version": 1,
"source": "NETS"
}
]
},
{
"adlsFullPath": "part-00015-1399b2e0-5fa5-484b-91f1-9dec0601b885-c000.csv.gz",
"contentLength": 1104,
"lastModified": "2020-02-07T16:50:16.133-05:00",
"remoteFileName": null,
"numberOfRecords": -1,
"transferStatus": "REPORT_CREATED",
"creationDate": "2020-02-07",
"batches": [
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=3/version=1",
"version": 1,
"source": "NETS"
},
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=2/version=1",
"version": 1,
"source": "NETS"
},
{
"dataPath": "execution_v1/integration_date=2020-02-07/business_date=2020-02-07/batch_id=1/version=1",
"version": 1,
"source": "NETS"
}
]
}
]

Since groupBy returns a Map, not a List, you get an error when you do
return reportBatchesList.groupBy({it.adlsFullPath}, {it})
from a function whose return type is List<ReportWithBatches>. So you will have to change the return type of your service function to Map<String?, List<ReportWithBatches>>.
In your case, groupBy will give you a Map that will look something like the following:
{adlsFullPath1=[Report1, Report2], adlsFullPath2=[Report3]}
For more information read the docs.
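If what you actually want is the nested shape from your desired output, not a Map, another option is to group and then map each group into a wrapper type. Below is a minimal sketch; GroupedReport and BatchDto are hypothetical classes introduced for illustration, so adjust the fields to match your ReportWithBatches:

// Hypothetical DTOs for the nested shape; not part of the original code.
data class BatchDto(val dataPath: String?, val version: Int?, val source: String?)

data class GroupedReport(val adlsFullPath: String?, val batches: List<BatchDto>)

// Collapse the flat join rows into one entry per adlsFullPath,
// keeping every joined batch instead of losing duplicates.
fun groupReports(rows: List<ReportWithBatches>): List<GroupedReport> =
    rows.groupBy { it.adlsFullPath }   // Map<String?, List<ReportWithBatches>>
        .map { (path, group) ->
            GroupedReport(
                adlsFullPath = path,
                batches = group.map { BatchDto(it.dataPath, it.version, it.source) }
            )
        }

The controller can then return List<GroupedReport>, and the JSON serializer will produce one "batches" array per report.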

Related

jq array filter for nested array elements

I am trying to add a new user to the JSON below for the policy items that match group NP01-RW. I am able to do it without NP01-RW, but I am not able to select the users under NP01-RW and return the updated JSON.
{
"id": 181,
"guid": "c9b7dbde-63de-42cc-9840-1b4a06e13364",
"isEnabled": true,
"version": 17,
"service": "Np-Hue",
"name": "DATASCIENCE-CUROPT-RO",
"policyType": 0,
"policyPriority": 0,
"isAuditEnabled": true,
"resources": {
"database": {
"values": [
"hive_cur_acct_1dev",
"hive_cur_acct_1eng",
"hive_cur_acct_1rwy",
"hive_cur_acct_1stg",
"hive_opt_acct_1dev",
"hive_opt_acct_1eng",
"hive_opt_acct_1stg",
"hive_opt_acct_1rwy"
],
"isExcludes": false,
"isRecursive": false
},
"column": {
"values": [
"*"
],
"isExcludes": false,
"isRecursive": false
},
"table": {
"values": [
"*"
],
"isExcludes": false,
"isRecursive": false
}
},
"policyItems": [
{
"accesses": [
{
"type": "select",
"isAllowed": true
},
{
"type": "update",
"isAllowed": true
},
{
"type": "create",
"isAllowed": true
},
{
"type": "drop",
"isAllowed": true
},
{
"type": "alter",
"isAllowed": true
},
{
"type": "index",
"isAllowed": true
},
{
"type": "lock",
"isAllowed": true
},
{
"type": "all",
"isAllowed": true
},
{
"type": "read",
"isAllowed": true
},
{
"type": "write",
"isAllowed": true
}
],
"users": [
"user1",
"user2",
"user3"
],
"groups": [
"NP01-RW"
],
"conditions": [],
"delegateAdmin": false
},
{
"accesses": [
{
"type": "select",
"isAllowed": true
}
],
"users": [
"user1"
],
"groups": [
"NP01-RO"
],
"conditions": [],
"delegateAdmin": false
}
],
"denyPolicyItems": [],
"allowExceptions": [],
"denyExceptions": [],
"dataMaskPolicyItems": [],
"rowFilterPolicyItems": [],
"options": {},
"validitySchedules": [],
"policyLabels": [
"DATASCIENCE-CurOpt-RO_NP01"
]
}
Below is what I have tried, but it returns only the part of the JSON matching the group, not the full JSON:
jq --arg username "$sync_userName" '.policyItems[] | select(.groups[] | IN("NP01-RO")).users += [$username]' > ${sync_policyName}.json
Operator precedence in jq is not always intuitive. Your program is parsed as:
.policyItems[] | (select(.groups[] | IN("NP01-RO")).users += [$username])
Which first streams all policyItems and only then changes them, leaving you with policyItems only in the output.
You need to make sure that the stream selects the correct values, which you can then assign:
(.policyItems[] | select(.groups[] | IN("NP01-RO")).users) += [$username]
This will do the assignment, but still return the full input (.).
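Putting it together on the original command line (a sketch; policy.json stands in for whatever input the original command read, and swap the group name for NP01-RW if that is the group you need):

jq --arg username "$sync_userName" \
  '(.policyItems[] | select(.groups[] | IN("NP01-RO")).users) += [$username]' \
  policy.json > "${sync_policyName}.json"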

How to Add data to a nested field in Firebase Firestore using a conditional field path

var k = [
{
"category": "Cars",
"products": [
{
"productName": "Aston Martin",
"quantity": 2,
"costPrice": 13500000,
"coverPhoto": "www.astonmartinphotoUrl.jpg"
},
{
"productName": "Mercedes",
"quantity": 1,
"costPrice": 220000,
"coverPhoto": "www.mercerdezphotoUrl.jpg"
}
]
},
{
"category": "Food",
"products": [
{
"productName": "Pizza",
"quantity": 50,
"costPrice": 30,
"coverPhoto": "www.pizzaphotoUrl.jpg"
},
{
"productName": "Pancake",
"quantity": 3,
"costPrice": 3,
"coverPhoto": "www.pancakephotoUrl.jpg"
}
]
}
];
Given the above code sample, I'm trying to add a new map to the nested products list using FieldValue.arrayUnion().
Map _map =
{
"productName": 'Tesla',
"quantity": 2,
"costPrice": 45000,
"coverPhoto": "www.teslaPhotoUrl.jpg"
};
I want to add this map only where the key 'category' is equal to 'Cars':
_firebaseFirestoreRef.collection('data').doc(id).update({
"fieldPath to the the list where key category == Cars": FieldValue.arrayUnion([_map])
});
And I want final result to be like this on my firebase firestore database
[
{
"category": "Cars",
"products": [
{
"productName": "Aston Martin",
"quantity": 2,
"costPrice": 13500000,
"coverPhoto": "www.astonmartinphotoUrl.jpg"
},
{
"productName": "Mercedes",
"quantity": 1,
"costPrice": 220000,
"coverPhoto": "www.mercerdezphotoUrl.jpg"
},
{
"productName": 'Tesla',
"quantity": 2,
"costPrice": 45000,
"coverPhoto": "www.teslaPhotoUrl.jpg"
}
]
},
{
"category": "Food",
"products": [
{
"productName": "Pizza",
"quantity": 50,
"costPrice": 30,
"coverPhoto": "www.pizzaphotoUrl.jpg"
},
{
"productName": "Pancake",
"quantity": 3,
"costPrice": 3,
"coverPhoto": "www.pancakephotoUrl.jpg"
}
]
}
];
I know I need to use the FirebaseFirestore FieldPath and some query objects, but I don't know how to use them effectively to achieve this ...
Since your k is an array, you're trying to update an existing element in an array field, which is not possible in Firestore. You'll first need to read the document, get the k array from it, update it in your application code, and then write the resulting field back to the database.
This has been covered quite regularly before, so I recommend looking at some other questions about updating an item in an array.
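As a minimal sketch of that read-modify-write cycle with the Flutter cloud_firestore API, assuming the array is stored under a hypothetical field named k in the document:

import 'package:cloud_firestore/cloud_firestore.dart';

Future<void> addProductToCars(String id, Map<String, dynamic> newProduct) async {
  final docRef = FirebaseFirestore.instance.collection('data').doc(id);
  final snapshot = await docRef.get();

  // Copy the stored array, find the entry whose category is 'Cars',
  // and append the new product in application code.
  final List<dynamic> k = List.from(snapshot.data()!['k'] as List);
  final cars = k.firstWhere((e) => e['category'] == 'Cars');
  (cars['products'] as List).add(newProduct);

  // Write the whole array back.
  await docRef.update({'k': k});
}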
You can also consider turning the top-level array into a map, using the category for the first-level field name:
products: {
"Cars": [{
"productName": "Aston Martin",
"quantity": 2,
"costPrice": 13500000,
"coverPhoto": "www.astonmartinphotoUrl.jpg"
}, ... ],
"Food": [{
...
}]
}
Now you can add an item to the Cars array with an array union on products.Cars.
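With that structure, the update becomes a single call (sketch, reusing _map from the question):

await FirebaseFirestore.instance.collection('data').doc(id).update({
  'products.Cars': FieldValue.arrayUnion([_map]),
});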
As Dharmaraj commented, you could also consider putting the products into a subcollection. This will allow you to query the products separately, and allows you to read the parent document without reading all products.

How to retrieve array element from a specific position in CosmosDB document?

Suppose I have a document with the following structure,
{
"VehicleDetailId": 1,
"VehicleDetail": [
{
"Id": 1,
"Make": "BMW"
},
{
"Id": 1,
"Model": "ABDS"
},
{
"Id": 1,
"Trim": "5.6L/ASMD"
},
{
"Id": 1,
"Year": 2008
}
]
}
Now I want to retrieve an array element located at a specific position in the VehicleDetail array. For example, the second element:
{
"Id": 1,
"Model": "ABDS"
}
or the third,
{
"Id": 1,
"Trim": "5.6L/ASMD"
}
How should I write the query to achieve this?
Use the built-in ARRAY_SLICE function, which allows you to select part of an array. Pass it the array, the starting position, and the number of elements to select:
SELECT ARRAY_SLICE(c.VehicleDetail, 1, 1) As SecondElement
FROM c
Output:
{
"SecondElement": [
{
"Id": 1,
"Model": "ABDS"
}
]
}
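If you want the element itself rather than a one-element array, the SQL API also supports direct array indexing (positions are zero-based):

SELECT c.VehicleDetail[1] AS SecondElement
FROM c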

GeoJson data in R

I want to work with GeoJSON data in the format below:
{ "id": 1,
"geometry":
{ "type": "Point",
"coordinates": [
-3.706,
40.3],
"properties": {"appuserid": "5b46-7d3c-48a6-9c08-cc894",
"eventtype": "location",
"devicedate": "2016-06-08T07:25:21",
"date": "2016-06-08T07:25:06.507",
"location": {
"building": "2",
"floor": "0",
"elevation": ""
}}}
The problem is that I want to apply a "where" clause on "appuserid" and select the matching records for processing. I don't know how to do it. I have already saved the data from a MongoDB in a data frame.
Right now I am trying to do it as follows:
library(sqldf)
sqldf("SELECT * FROM d WHERE d$properties$appuserid = '0000-0000-0000-0000'")
But it gives an error.
Error: Only lists of raw vectors are currently supported
The code is below:
library(jsonlite);
con <- mongo(collection = "geodata", db = "MongoDb", url = "mongodb://192.168.26.18:27017", verbose = FALSE, options = ssl_options());
d <- con$find();
library(jqr)
jq(d, '.features[] | select(d$properties$appuserid == "5b46-7d3c-48a6-9c08-cc894")')
Error : Error in jq.default(d, ".features[] | select(d$properties$appuserid == \"5b46-7d3c-48a6-9c08-cc894\")") :
jq method not implemented for data.frame.
jqr is one option: an R client for jq (https://stedolan.github.io/jq/).
x <- '{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"population": 200
},
"geometry": {
"type": "Point",
"coordinates": [
10.724029,
59.926807
],
"properties": {
"appuserid": "5b46-7d3c-48a6-9c08-cc894"
}
}
},
{
"type": "Feature",
"properties": {
"population": 600
},
"geometry": {
"type": "Point",
"coordinates": [
10.715789,
59.904778
],
"properties": {
"appuserid": "c7e866a7-e32d-4dc2-adfd-c2ca065b25ce"
}
}
}
]
}'
library(jqr)
jq(x, '.features[] | select(.geometry.properties.appuserid == "5b46-7d3c-48a6-9c08-cc894")')
returns
{
"type": "Feature",
"properties": {
"population": 200
},
"geometry": {
"type": "Point",
"coordinates": [
10.724029,
59.926807
],
"properties": {
"appuserid": "5b46-7d3c-48a6-9c08-cc894"
}
}
}
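The "jq method not implemented for data.frame" error in the question comes from passing a data frame to jq(), which operates on JSON text. A possible workaround (a sketch; the exact filter path depends on how mongolite flattened the documents) is to serialize the data frame first with jsonlite:

library(jsonlite)
library(jqr)

# jqr works on JSON strings, not data frames, so serialize first.
json <- toJSON(d, auto_unbox = TRUE)
jq(json, '.[] | select(.geometry.properties.appuserid == "5b46-7d3c-48a6-9c08-cc894")')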

Different results between date histogram and date range on Elastic Search

I would like to analyse my log data with Elastic Search/Kibana and count unique customers by month.
Results are different when I use a date histogram aggregation and date range aggregation.
Here is the date histogram query:
"query": {
"query_string": {
"query": "_type:logs AND created_at:[2015-04-01 TO now]",
"analyze_wildcard": true
}
},
"size": 0,
"aggs": {
"2": {
"date_histogram": {
"field": "created_at",
"interval": "1M",
"min_doc_count": 1
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
And the results:
"aggregations": {
"2": {
"buckets": [
{
"1": {
"value": 595805
},
"key_as_string": "2015-04-01T00:00:00.000Z",
"key": 1427839200000,
"doc_count": 6410438
},
{
"1": {
"value": 647788
},
"key_as_string": "2015-05-01T00:00:00.000Z",
"key": 1430431200000,
"doc_count": 6669555
},...
Here is the date range query:
"query": {
"query_string": {
"query": "_type:logs AND created_at:[2015-04-01 TO now]",
"analyze_wildcard": true
}
},
"size": 0,
"aggs": {
"2": {
"date_range": {
"field": "created_at",
"ranges": [
{
"from": "2015-04-01",
"to": "2015-05-01"
},
{
"from": "2015-05-01",
"to": "2015-06-01"
}
]
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
And the response:
"aggregations": {
"2": {
"buckets": [
{
"1": {
"value": 592179
},
"key": "2015-04-01T00:00:00.000Z-2015-05-01T00:00:00.000Z",
"from": 1427846400000,
"from_as_string": "2015-04-01T00:00:00.000Z",
"to": 1430438400000,
"to_as_string": "2015-05-01T00:00:00.000Z",
"doc_count": 6411884
},
{
"1": {
"value": 616995
},
"key": "2015-05-01T00:00:00.000Z-2015-06-01T00:00:00.000Z",
"from": 1430438400000,
"from_as_string": "2015-05-01T00:00:00.000Z",
"to": 1433116800000,
"to_as_string": "2015-06-01T00:00:00.000Z",
"doc_count": 6668060
}
]
}
}
In the first case, I have 595,805 for April and 647,788 for May.
In the second case, I have 592,179 for April and 616,995 for May.
Could someone explain why I get these differences between these two use cases?
Thank you
I updated my first post to add another example with less data (one day) but with the same issue. Here is the first request, with a date histogram:
{
"size": 0,
"query": {
"query_string": {
"query": "_type:logs AND logs.created_at:[2015-04-01 TO 2015-04-01]",
"analyze_wildcard": true
}
},
"aggs": {
"2": {
"date_histogram": {
"field": "created_at",
"interval": "1h",
"pre_zone": "00:00",
"pre_zone_adjust_large_interval": true,
"min_doc_count": 1
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
}
And we can see a unique count of 660 with a doc count of 1717 for the first hour:
{
"hits":{
"total":203961,
"max_score":0,
"hits":[
]
},
"aggregations":{
"2":{
"buckets":[
{
"1":{
"value":660
},
"key_as_string":"2015-04-01T00:00:00.000Z",
"key":1427846400000,
"doc_count":1717
},
{
"1":{
"value":324
},
"key_as_string":"2015-04-01T01:00:00.000Z",
"key":1427850000000,
"doc_count":776
},
{
"1":{
"value":190
},
"key_as_string":"2015-04-01T02:00:00.000Z",
"key":1427853600000,
"doc_count":481
}
]
}
}
}
But with the second request, using the date range:
{
"size": 0,
"query": {
"query_string": {
"query": "_type:logs AND logs.created_at:[2015-04-01 TO 2015-04-01]",
"analyze_wildcard": true
}
},
"aggs": {
"2": {
"date_range": {
"field": "created_at",
"ranges": [
{
"from": "2015-04-01T00:00:00",
"to": "2015-04-01T01:00:00"
},
{
"from": "2015-04-01T01:00:00",
"to": "2015-04-01T02:00:00"
}
]
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
}
We can see a unique count of only 633 with the same doc count of 1717:
{
"hits":{
"total":203961,
"max_score":0,
"hits":[
]
},
"aggregations":{
"2":{
"buckets":[
{
"1":{
"value":633
},
"key":"2015-04-01T00:00:00.000Z-2015-04-01T01:00:00.000Z",
"from":1427846400000,
"from_as_string":"2015-04-01T00:00:00.000Z",
"to":1427850000000,
"to_as_string":"2015-04-01T01:00:00.000Z",
"doc_count":1717
},
{
"1":{
"value":328
},
"key":"2015-04-01T01:00:00.000Z-2015-04-01T02:00:00.000Z",
"from":1427850000000,
"from_as_string":"2015-04-01T01:00:00.000Z",
"to":1427853600000,
"to_as_string":"2015-04-01T02:00:00.000Z",
"doc_count":776
}
]
}
}
}
Could someone please tell me why? Thank you
When using the date_histogram aggregation, you need to take the timezone into account, which date_range doesn't, as it always uses the GMT timezone.
If you look at the long millisecond values in your results, you'll see the following:
For your date histogram, the key 1427839200000 is actually equal to 2015-03-31T22:00:00.000Z, which differs from the key_as_string value (i.e. 2015-04-01T00:00:00.000Z) that is formatted according to the GMT timezone.
In your first aggregation, try explicitly specifying the time_zone parameter to be your current timezone (apparently GMT+2) and you should get the same results:
"date_histogram": {
"field": "created_at",
"interval": "1M",
"min_doc_count": 1,
"time_zone": -2
},
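Note that recent Elasticsearch versions expect an ISO offset string (or a timezone ID) for time_zone, and interval has since been split into calendar_interval/fixed_interval, so the equivalent of the above would look like:

"date_histogram": {
  "field": "created_at",
  "calendar_interval": "1M",
  "min_doc_count": 1,
  "time_zone": "-02:00"
}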
