How do we convert string to number in a gremlin step? - gremlin

I have a graph which has vertex E1 with property "price" and "name" which are storing String values. I need to calculate sum of the column "price" grouped by "name". I am writing the below query using Java:
g.withSideEffect("Neptune#repeatMode","BFS")
.V().hasLabel("E1").group()
.by("name").by(values("price").unfold().sum())
.unfold()
.project("rowName","data")
.by(select(keys).properties(MandatoryCustomerAttributes.firstName.name()).value())
I am getting this error:
{
"requestId": "38b781ce-fc02-4f7d-a71e-476dfd1925ce",
"code": "UnsupportedOperationException",
"detailedMessage": "java.lang.String cannot be cast to java.lang.Number"
}
Please help me in converting the String to any number format so that I can do some mathematical operations.

At this time, you can not do such a conversion with Gremlin steps unless you use a lambda step (which is not always possible depending on the graph database you are using and since you tagged this question with Neptune, you definitely can't take that approach - it also isn't advisable). You would need to store your data natively as a number or do the conversion and related mathematical calculation within your application. There is possibility that Gremlin will address this limitation in 3.7.0 as part of the various primitive operations aimed at String.

Related

Tinkerpop Gremlin is it better to query with hasId or to search by property values

Using Tinkerpop Gremlin (Neptune DB), is there a preferred/"faster" way to query?
For example, let's say I have a graph containing the node:
label: Student
id: 'student/12345'
studentId: '12345'
name: 'Bob'
Is there a preferred query? (for this example let's say we know the field 'studentId' value, which is also part of the id)
g.V().filter('studentId', '12345')
vs
g.V().filter(hasId(TextP.containing('12345'))
or using "has"/"hasId" vs "filter"?
g.V().has('studentId', '12345')
vs
g.V().hasId(TextP.containing('12345'))
So there seems to be two questions here, one about filter() vs has() and the other about using the vertex id versus a property.
The answer to the first question is going to depend on the underlying database implementation and what is has/has not optimized. In general, and in Neptune, I would suggest using the g.V().has('studentId', '12345') pattern to filter on a property as it is optimized and easier to read.
The answer to the second question also depends on the database implementaiton, as not all allow for setting of the vertex ids. Other databases may vary but in Neptune setting ids is allowed and a direct lookup by ID is the fastest (e.g. g.V('12345') or g.V().hasId('12345')) way to look something up as it is a single index lookup. One thing to note is that in Neptune vertex/edge id values need to be globally unique so you need to ensure that you will only have one vertex or edge with a specific id.

Proper handling of date operations in Gremlin

I am using AWS Neptune Gremlin with gremlin_python.
My date in property is stored as datetime as required in Neptune specs.
I created it using Python code like this:
properties_dict['my_date'] = datetime.fromtimestamp(my_date, timezone.utc)
and then constructed the Vertex with properties:
for prop in properties:
query += """.property("%s", "%s")"""%(prop, properties[prop])
Later when interacting with the constructed graph, I am only able to find the vertices by an exact string matching query like the following:
g.V().hasLabel('Object').has("my_date", "2017-12-01 00:00:00+00:00").valueMap(True).limit(3).toList()
What's the best way for dealing with date or datetime in Gremlin?
How can I do range queries such as "give me all Vertices that have date in year 2017"?
Personally, I prefer to store date/time values as days/seconds/milliseconds since epoch. This will definitely work on any Graph DB and makes range queries much simpler. Also, the conversion to days or seconds since epoch and back should be a simple method call in pretty much any language.
So, when you create your properties dictionary, you could simplify your code by changing it to:
properties_dict['my_date'] = my_date
... as my_date should represent the number of seconds since epoch. And a range query would be as simple as:
g.V().has("Object", "my_date", P.between(startTimestamp, endTimestamp)).
limit(3).valueMap(True)

How can I Scan an index in reverse in DynamoDB?

I am currently using DynamoDB and having a problem scanning. I am able to get paged results in forward order by using the ExclusiveStartKey. However, regardless of whether I set ScanIndexForward true or false, I get results in forward order from my scan operation. How can i get results in reverse order from a Scan in DynamoDB?
ScanIndexForward is the correct way to get items in descending order by the range key of the table or index you are querying. From the AWS API Reference:
A value that specifies ascending (true) or descending (false)
traversal of the index. DynamoDB returns results reflecting the
requested order determined by the range key. If the data type is
Number, the results are returned in numeric order. For type String,
the results are returned in order of ASCII character code values. For
type Binary, DynamoDB treats each byte of the binary data as unsigned
when it compares binary values.
Based on the docs for Scan, I conclude that there is no way to Scan in reverse. However, I would say that you are not using DynamoDB correctly if you need to do that. When designing a schema for a database like DyanmoDB you should plan the schema based on your expected queries to ensure that almost all application queries have a good index. Scans are meant more for sys admin operations or for feeding into MapReduce or analytics. "A Scan operation always scans the entire table, then filters out values to provide the desired result, essentially adding the extra step of removing data from the result set." (Query and Scan Performance) That can lead to performance problems and other issues.
Using DynamoDB is fundamentally different from working with a traditional relational database and requires a big change in the way you think about using it. You need to decide whether DynamoDB's advantages of availability in storage and performance, reliability and availability are worth accepting its limitations.
As of now the dynamoDB scan cannot return you sorted results.
You need to use a query with a new global secondary index (GSI) with a hashkey and range field. The trick is to use a hashkey which is assigned the same value for all data in your table.
I recommend making a new field for all data and calling it "Status" and set the value to "OK", or something similar.
Then your query to get all the results sorted would look like this:
{
TableName: "YourTable",
IndexName: "Status-YourRange-index",
KeyConditions: {
Status: {
ComparisonOperator: "EQ",
AttributeValueList: [
"OK"
]
}
},
ScanIndexForward: false
}
The docs for how to write GSI queries are found here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Querying

When is it appropriate to use g.query()...vertex() vs g.V.has()

I'm a little confused about this one. Several similar examples can be found throughout the documentation. Such as :
g.V.has('name','hercules').next()
g.query().has("name",EQUAL,"hercules").vertices()
Could someone clarify what the difference in the process is between the two above?
Thanks
The first is gremlin-groovy syntax:
g.V.has('name','hercules').next()
and either iterates all vertices looking for vertices that have a "name" property with a value of "hercules". In the event that "name" is indexed then titan will utilize the index to avoid the linear scan to find such vertices.
The second is basically Java and the Titan API. The above gremlin-groovy code basically compiles down to your second statement:
g.query().has("name",EQUAL,"hercules").vertices()
however, in the case of the second statement it returns an iterator of all vertices that match the filter and doesn't just pop off the first one as shown in the gremlin statement (given the use of next()).

F# query expressions - restriction using string comparison in SqlProvider with SQLite

SQLite doesn't really have date columns. You can store your dates as ISO-8601 strings, or as the integer number of seconds since the epoch, or as Julian day numbers. In the table I'm using, I want my dates to be human-readable, so I've chosen to use ISO-8601 strings.
Suppose I want to query all the records with dates after today. The ISO-8601 strings will sort properly, so I should be able to use string comparison with the ISO-8601 string for today's date.
However, I see no way to do the comparison using the F# SqlProvider type provider. I'm hoping that this is just a reflection of my lack of knowledge of F# query expressions.
For instance, I can't do:
query {
for calendarEntry in dataContext.``[main].[calendar_entries]`` do
where (calendarEntry.date >= System.DateTime.Today.ToString("yyyy-MM-dd hh:mm:ss"))
... }
I get:
The binary operator GreaterThanOrEqual is not defined for the types 'System.String' and 'System.String'.
I also can't do any variation of:
query {
for calendarEntry in dataContext.``[main].[calendar_entries]`` do
where (calendarEntry.date.CompareTo(System.DateTime.Today.ToString("yyyy-MM-dd hh:mm:ss")) >= 0)
... }
I get:
Unsupported expression. Ensure all server-side objects appear on the left hand side of predicates. The In and Not In operators only support the inline array syntax.
Anyone know how I might do string comparisons in the where clause? It seems that my only option for filtering inside the query is to store seconds-since-epoch in the database and use integer comparisons.
This was a temporary bug with old SQLProvider version and it should be working now. If not, please open a new issue to the GitHub repository: https://github.com/fsprojects/SQLProvider

Resources