Project by, with optional properties - azure-cosmosdb

I believe this question applies to TinkerPop in general rather than to the CosmosDB implementation specifically; some CosmosDB semantics may just be baked into my query examples.
I've developed a data layer that creates queries based on some metadata information. Currently, my data layer only persists non-null data values to the graph vertex, which is causing trouble with my retrieval mechanism.
Consider the following data model, where the field "HomeRoute" may or may not exist on the actual vertex, depending on whether it was populated:
{
  "ApplicationModule": string,
  "Title": string,
  "HomeRoute": string?
}
My initial query structure is as follows; it does not support optional properties (discussed below).
g.V()
.has('ApplicationsTest', 'partitionId', '')
.project('ApplicationModule','Title','HomeRoute')
.by('ApplicationModule')
.by('Title')
.by('HomeRoute');
To simulate, we can insert a vertex:
g.addV('ApplicationsTest')
.property('partitionId', '')
.property('ApplicationModule', 'TestApp')
.property('Title', 'Test App')
.property('HomeRoute', 'testapphome');
And we can successfully query it using my base query noted above, which returns it in my desired JSON format.
[
{
"ApplicationModule": "TestApp",
"Title": "Test App",
"HomeRoute": "testapphome"
}
]
If we now insert a vertex without the HomeRoute property (since it was null within the application layer), my base query will fail.
g.addV('ApplicationsTest')
.property('partitionId', '')
.property('ApplicationModule', 'TestApp')
.property('Title', 'Test App');
Executing my base query now results in an error:
Gremlin Query Execution Error: Project By: Next: The provided
traverser of key "HomeRoute" maps to nothing.
I can apply a coalesce operation to "optional" fields; as I currently understand it, this lets me return a constant value when a property is undefined. Updating my base query as follows returns "!dbnull" whenever a property does not exist on the vertex:
g.V()
.has('ApplicationsTest', 'partitionId', '')
.project('ApplicationModule','Title','HomeRoute')
.by('ApplicationModule')
.by('Title')
.by(values('HomeRoute')
.fold()
.coalesce(unfold(), constant('!dbnull')));
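To unpack why this works, here is the same by() modulator annotated (Groovy-style comments, which may need to be stripped for the Cosmos DB string-based Gremlin endpoint):
.by(values('HomeRoute')        // emits nothing if the property is absent
    .fold()                    // always produces a list, possibly empty
    .coalesce(
        unfold(),              // non-empty list: yield the stored value
        constant('!dbnull')))  // empty list: fall back to the sentinel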
When executed, this query returns the values as expected, again in JSON format.
[
{
"ApplicationModule": "TestApp",
"Title": "Test App",
"HomeRoute": "testapphome"
},
{
"ApplicationModule": "TestApp",
"Title": "Test App",
"HomeRoute": "!dbnull"
}
]
My question (I'm still new to Gremlin/TinkerPop queries): is there any way to get this result with only the properties that are present on the respective vertices?
My desired output for this example is below; it would allow my data layer to unbundle only the values present on the graph vertex, without having to consider the sentinel string "!dbnull".
[
{
"ApplicationModule": "TestApp",
"Title": "Test App",
"HomeRoute": "testapphome"
},
{
"ApplicationModule": "TestApp",
"Title": "Test App"
}
]

I've found a way to achieve what I'm looking for. I'd still love input from the community, though, if there are optimizations or other considerations.
g.V()
.has('ApplicationsTest', 'partitionId', '')
.project('ApplicationModule','Title','HomeRoute')
.by('ApplicationModule')
.by('Title')
.by(values('HomeRoute')
.fold()
.coalesce(unfold(), constant('!dbnull')))
.local(unfold()
.where(select(values).is(without('!dbnull')))
.group().by(select(keys)).by(select(values)))
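For anyone following along, here is my reading of what that final local() step does, shown as an annotated sketch (again with Groovy-style comments, which the Cosmos DB string-based endpoint may not accept):
.local(unfold()                                    // explode the projected map into key/value entries
    .where(select(values).is(without('!dbnull')))  // keep only entries whose value is not the sentinel
    .group()                                       // rebuild the map from the surviving entries
        .by(select(keys))
        .by(select(values)))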

If you only need specific keys that already exist on the vertex, you can use valueMap; there is no need to use project:
g.V()
.has('ApplicationsTest', 'partitionId', '')
.valueMap("ApplicationModule", "Title", "HomeRoute").by(unfold())
example: https://gremlify.com/9fua9jsu0dh
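Assuming the two test vertices inserted above, this should return something like the following; properties that are absent on a vertex are simply omitted from its map:
[
  {
    "ApplicationModule": "TestApp",
    "Title": "Test App",
    "HomeRoute": "testapphome"
  },
  {
    "ApplicationModule": "TestApp",
    "Title": "Test App"
  }
]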

Related

How can a CosmosDb SQL query projection be done so that the structure of the object is maintained?

I want to project while maintaining the structure of the object. Below is an example; the solution should work for an arbitrary JSON schema.
SELECT c["user"]["firstname"] from c
Returns:
{
"firstname": "Foo"
}
Instead, I want it to return
{
"user": {
"firstname": "Foo"
}
}
In addition, if the property does not actually exist on the object, I want the property to not be returned.
This rules out doing something like this because the property "user" will still be populated even if it does not exist on the object.
SELECT VALUE {"user": { "firstname": c["user"]["firstname"] }} from c
The only solution I am aware of is using an alias, and then "unflattening" the results. But that requires having a special character (CosmosDb only allows '_') as a delimiter for nested properties, which I want to avoid. Example:
SELECT c["user"]["firstname"] as user_firstname from c
Try this query:
SELECT {"username": c.user.firstname} AS user from c WHERE IS_DEFINED(c.user.firstname)
In the above query, the projection {"username": c.user.firstname} AS user creates the desired output structure and IS_DEFINED() method filters out the objects without property c.user.username.
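For example, given a hypothetical container holding these two documents:
{ "id": "1", "user": { "firstname": "Foo" } }
{ "id": "2" }
the query should return only:
[
  {
    "user": {
      "firstname": "Foo"
    }
  }
]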

Using Retool to Update Dynamodb Item - ExpressionAttributeValues contains invalid key: Syntax error; key: "44"

I'm using the Dynamodb resource in Retool, which is successful for GETs/Scans/Puts/Queries, but I can't seem to get an UpdateItem statement to work.
I'm trying to update an item to add a key for a list of maps if it doesn't exist and append an item if the key already does exist.
Configuration
Update Expression
SET images = list_append(:val, if_not_exists(images, :emptylist))
ExpressionAttributeValues
In Retool, my ExpressionAttributeValues are:
":val": [{"location": "{{s3Uploader1.s3FolderName}}/{{s3Uploader1.s3FileName}}"}], ":emptylist": []
This pulls the S3 folder and file names from an s3Uploader component and renders to:
":val": [{"location": "redactedpath/redacted/redactedfilename"}], ":emptylist": []
I originally tried the format that spells out the data types, e.g. "M", "L", etc., but I got exactly the same error.
":val":
{
"L":
[
{
"M":
{
"location":
{
"S": "{{s3Uploader1.s3FolderName}}/{{s3Uploader1.s3FileName}}"
}
}
}
]
},
":emptylist":
{
"L":[]
}
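For reference, in the low-level DynamoDB API this typed format sits inside a complete UpdateItem request shaped roughly as follows (the table name and key here are placeholders, not values from my actual setup):
{
  "TableName": "my-table",
  "Key": { "id": { "S": "some-item-id" } },
  "UpdateExpression": "SET images = list_append(:val, if_not_exists(images, :emptylist))",
  "ExpressionAttributeValues": {
    ":val": { "L": [ { "M": { "location": { "S": "folder/file" } } } ] },
    ":emptylist": { "L": [] }
  }
}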
Result/Error
When I run the query, I get the following error:
statusCode:422
error:"Unprocessable Entity"
message:"ExpressionAttributeValues contains invalid key: Syntax error; key: "44""
data:null
estimatedResponseSizeBytes:147
resourceTimeTakenMs:363
isPreview:false
resourceType:"dynamodb"
lastReceivedFromResourceAt:1644774304601
source:"resource"
From my understanding, that error message usually names the actual key that caused the problem, but as far as I can tell, my ExpressionAttributeValues does not contain the string "44". I'm wondering if this is something coming from Retool, or if "44" is perhaps a character position rather than an actual key.
I've dug through what feels like the depths of StackOverflow to try different things, but now I feel like I'm stuck.
Additional Information
My original ExpressionAttributeValues was based on Is it possible to combine if_not_exists and list_append in update_item
Similar question, but no answer and different key: ValidationException: ExpressionAttributeValues contains invalid key
Is there anything in the ExpressionAttributeValues that looks like it could cause that error?

AppSync query resolver: are expressionNames and expressionValues necessary?

https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-dynamodb.html#aws-appsync-resolver-mapping-template-reference-dynamodb-query
The AppSync documentation says that expressionNames and expressionValues are optional fields, yet they are always populated by code generation. First question: should they be included as a best practice when working with DynamoDB? If so, why?
AppSync resolver for a query on the partition key:
{
  "version": "2017-02-28",
  "operation": "Query",
  "query": {
    "expression": "#partitionKey = :partitionKey",
    "expressionNames": {
      "#partitionKey": "partitionKey"
    },
    "expressionValues": {
      ":partitionKey": {
        "S": "${ctx.args.partitionKey}"
      }
    }
  }
}
Second question: what is the layman's translation of the expression field in the code above? What exactly is that statement telling DynamoDB to do? What is the use of the # in "expression": "#partitionKey = :partitionKey", and are the expression names and values just formatting safeguards?
Let me answer your second question first:
expressionNames
expressionNames are used for interpolation. What this means is that, after interpolation, this filter expression object:
"expression": "#partitionKey = :value",
"expressionNames": {
  "#partitionKey": "id"
}
will be transformed to:
"expression": "id = :value",
The #partitionKey acts as a placeholder for your column name id; '#' happens to be the delimiter.
But why?
expressionNames are necessary because certain keywords are reserved by DynamoDB, meaning you can't use these words inside a DynamoDB expression.
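For example, assuming a hypothetical table with an attribute called name, which is on DynamoDB's reserved word list, the first expression below would be rejected while the second is accepted:
"expression": "name = :value"

"expression": "#name = :value",
"expressionNames": {
  "#name": "name"
}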
expressionValues
When you need to compare anything in a DynamoDB expression, you will also need to substitute the actual value with a placeholder, because the DynamoDB typed value is a complex object.
In the following example:
"expression": "myKey = :partitionKey",
"expressionValues": {
":partitionKey": {
"S": "123"
}
}
:partitionKey is the placeholder for the complex value
{
  "S": "123"
}
':' is the other delimiter; it tells DynamoDB to pull the replacement from the expressionValues map.
Why are expressionNames and expressionValues always used by code generation?
It is simply easier for the code generation logic to always use expressionNames and expressionValues, because then there is no need to maintain two code paths for reserved vs. non-reserved DynamoDB words. Using expressionNames always prevents collisions!

How do I make a Hasura data API query to fetch rows based on the length of the their array relationship's value?

Referring to the default sample schema mentioned in https://hasura.io/hub/project/hasura/hello-world/data-apis, i.e. the following two tables:
1) author: id, name
2) article: id, title, content, rating, author_id
where author has an array relationship (articles) to article via article.author_id.
How do I make a query to select authors who have written at least one article? Basically, something like select author where len(author.articles) > 0
TL;DR:
There's no length function you can use in the Hasura data API syntax right now. Workarounds: 1) filter on a property that is guaranteed to be true for every row, like id > 0, or 2) build a view and expose data APIs on the view.
Option 1:
Use an 'always true' attribute as a filter.
{
  "type": "select",
  "args": {
    "table": "author",
    "columns": [
      "*"
    ],
    "where": {
      "articles": {
        "id": {
          "$gt": "0"
        }
      }
    }
  }
}
This reads as: select all authors where ANY article has id > 0
This works because id is an auto-incrementing int.
Option 2:
Create a view and then expose data APIs on them.
Head to the Run SQL window in the API console and run a migration:
CREATE VIEW author_article_count AS (
  SELECT au.*, ar.no_articles
  FROM
    author au,
    (SELECT author_id, COUNT(*) AS no_articles FROM article GROUP BY author_id) ar
  WHERE
    au.id = ar.author_id
);
Make sure you mark this as a migration (a checkbox below the Run SQL window) so that it gets added to your migrations folder.
Now add data APIs to the view, by hitting "Track table" on the API console's schema page.
Now you can make select queries using no_articles as the length attribute:
{
  "type": "select",
  "args": {
    "table": "author_article_count",
    "columns": [
      "*"
    ],
    "where": {
      "no_articles": {
        "$gt": "0"
      }
    }
  }
}
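Each returned row then carries the computed column, so a result might look like this (illustrative values, not actual sample data):
[
  {
    "id": 1,
    "name": "Author One",
    "no_articles": 3
  }
]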

Nest elastic search.net not returning any results via any query

I've created an index in Sense that I'm happy with, and I'm trying to implement a typed query in the NEST client as follows:
var node = new Uri("http://elasticsearch-blablablamrfreeman");
var settings = new ConnectionSettings(node)
.SetTimeout(300000)
.SetDefaultIndex("films")
.MapDefaultTypeIndices(d => d
.Add(typeof(film), "films"))
.SetDefaultPropertyNameInferrer(p=>p);
Inject it (amongst the searcher and indexer) with my DI:
builder.Register(c => new ElasticClient(settings)).Named<ElasticClient>("esclient");
Search using any query, such as the below:
var result = _client.Search<film>(s => s
.AllIndices()
.From(0)
.Size(10)
.Query(q => q
.Term(p => p.Title, query)
));
The indexer seems to work fine, so that code is not included here. I've swapped in any number of settings parameters, so I know there's some redundancy in the setup above (at the very least, the default index would have sufficed).
The result var contains nothing whatsoever, with a big fat 0 across all its properties, despite my having a wealth of data across my indices (including the "films" index).
I've even tried a raw QueryRaw method with a match_all, and nada!
EDIT (Chris Pratt was on the right track here)
Running:
var result = _client.Search<film>(s => s
.From(0)
.Size(10)
.QueryRaw(@"{ ""match_all"": {} }"));
And having:
var settings = new ConnectionSettings(node)
.SetTimeout(300000)
.MapDefaultTypeIndices(d => d
.Add(typeof (film), "chosen_index"))
.MapDefaultTypeNames(t => t
.Add(typeof (film), "en"));
Returns debug info as:
[Elasticsearch.Net.ElasticsearchResponse<Nest.SearchResponse<film>>] = {StatusCode: 200,
Method: POST,
Url: http://elasticsearch-blablablamrfreeman/chosen_index/film/_search,
Request: {
"from": 0,
"size": 10,
"query": { "match_all": {} }
},
Response: <Response stream not captured or already read...
My question being: it seems I was in fact querying the wrong URL, as per Chris Pratt's comment, but why isn't the inference working for the type when it is working for the index?
/chosen_index/film/_search
should read
/chosen_index/en/_search
If my inferencing is correct.
Should it POST or GET? I usually GET via the search API in Sense. And finally, what if I want to write my queries against my native film type but have it override the ES type in the URL in some instances?
For example, suppose I inject a different language parameter and now wish to query the same index across both the "en" and "de" ES types (which are all valid types under the same index, as already constructed via Sense).
Thanks in advance!
Nothing obvious is jumping out at me for why this isn't working for you. However, I can give you a few avenues to pursue to attempt to resolve the issue.
I'm not familiar with the particular DI container you're using, but it's possible that it's not binding properly, resulting in some of your settings options not actually being utilized in the instance that's created. It might be a long shot, but I'd recommend digging in and at least verifying that the client instance you're getting is set up the way it should be.
It sort of side-steps the issue, but Elasticsearch explicitly recommends that you don't handle localization via different types. You should either use different indexes, e.g. chosen_index_en, chosen_index_es, etc., or use multifields:
"title": {
"type": "string",
"fields": {
"en": {
"type": "string",
"analyzer": "english"
},
"es": {
"type": "string",
"analyzer": "spanish"
}
}
Then you can search on things like title.en or title.es.
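A minimal sketch of such a search, assuming the NEST 1.x fluent syntax used elsewhere in this question (the field name "title.en" comes from the multifield mapping above; query holds the search text):
var result = _client.Search<film>(s => s
    .From(0)
    .Size(10)
    .Query(q => q
        .Match(m => m
            .OnField("title.en")  // target the English sub-field of the multifield
            .Query(query))));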
As far as I can see, you are using the default mapping for the film type; that is, the data is analyzed by the standard analyzer before being indexed.
In your query you are using a term query, which finds documents containing the exact term specified in the inverted index, without analyzing the search input (see here). So be careful what your query contains.
Try using a match query instead, like below:
var result = _client.Search<film>(s => s
    .AllIndices()
    .From(0)
    .Size(10)
    .Query(q => q
        .Match(m => m
            .OnField(p => p.Title)
            .Query(query))));
The query is now analyzed by the standard analyzer before being applied (see here).
