Gremlin query to check for two property values

I am trying to do a query in Gremlin similar to the following.
SELECT * FROM profiles WHERE firstName like 'John' OR lastName like 'John'
Both firstName and lastName are properties of a single vertex.

Assuming that profiles is a node label (akin to a table name in SQL), and that the column names are properties on a node, the simple Gremlin form (without like) would be something like:
g.V().hasLabel('profile').
or(has('firstName','John'),has('lastName','John'))
However, the Gremlin language (before release 3.6) did not have a way to express anything along the lines of like. Some implementations offer language extensions or integration with an external index such as Elasticsearch or OpenSearch, and in those cases that integration is a way to achieve the like functionality.
Starting with TinkerPop 3.6, a new regex text predicate has been added, so the query above can be rewritten using any supported regular expression. For example, a simple case where you are not sure whether the name is capitalized might be queried using:
g.V().hasLabel('profile').
or(has('firstName',regex('[Jj]ohn')),has('lastName',regex('[Jj]ohn')))
It may take a while before implementations move up to this release, but once they do, this is one way to address queries that need fuzzier matching.
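For reference, the same two queries can be written with the gremlinpython client. This is a minimal sketch assuming a TinkerPop 3.6+ server; the websocket endpoint is a placeholder, and everything else mirrors the Gremlin above:

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import TextP

# Placeholder endpoint; point this at your own Gremlin Server or Neptune cluster.
g = traversal().withRemote(
    DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))

# Exact match on either property (works on any TinkerPop version).
exact = (g.V().hasLabel('profile').
         or_(__.has('firstName', 'John'), __.has('lastName', 'John')).
         valueMap().toList())

# Regex text predicate, available from TinkerPop 3.6 onward.
fuzzy = (g.V().hasLabel('profile').
         or_(__.has('firstName', TextP.regex('[Jj]ohn')),
             __.has('lastName', TextP.regex('[Jj]ohn'))).
         valueMap().toList())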

Related

TinkerPop Gremlin: is it better to query with hasId or to search by property values?

Using Tinkerpop Gremlin (Neptune DB), is there a preferred/"faster" way to query?
For example, let's say I have a graph containing the node:
label: Student
id: 'student/12345'
studentId: '12345'
name: 'Bob'
Is there a preferred query? (for this example let's say we know the field 'studentId' value, which is also part of the id)
g.V().filter('studentId', '12345')
vs
g.V().filter(hasId(TextP.containing('12345')))
or using "has"/"hasId" vs "filter"?
g.V().has('studentId', '12345')
vs
g.V().hasId(TextP.containing('12345'))
There seem to be two questions here: one about filter() vs has(), and the other about using the vertex id versus a property.
The answer to the first question is going to depend on the underlying database implementation and what it has and has not optimized. In general, and in Neptune, I would suggest using the g.V().has('studentId', '12345') pattern to filter on a property, as it is optimized and easier to read.
The answer to the second question also depends on the database implementation, as not all of them allow you to set vertex ids. Other databases may vary, but in Neptune setting ids is allowed, and a direct lookup by id (e.g. g.V('student/12345') or g.V().hasId('student/12345')) is the fastest way to find something, as it is a single index lookup. One thing to note is that in Neptune vertex/edge id values need to be globally unique, so you need to ensure that you will only ever have one vertex or edge with a specific id.
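As an illustration, here is a hedged gremlinpython sketch of the two lookups against the example vertex; the connection endpoint is a placeholder and not part of the question:

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Placeholder endpoint; substitute your Neptune cluster endpoint.
g = traversal().withRemote(DriverRemoteConnection(
    'wss://your-neptune-endpoint:8182/gremlin', 'g'))

# Direct id lookup: a single index lookup, the fastest option in Neptune.
by_id = g.V('student/12345').valueMap(True).next()

# Property lookup: also optimized in Neptune, and easy to read.
by_prop = g.V().has('studentId', '12345').valueMap(True).next()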

Create Vertex only if "from" and "to" vertex exists

I want to create 1000+ edges in a single query.
Currently, I am using the AWS Neptune database and Gremlin.Net to create them.
The issue I am facing is related to speed. It takes a huge amount of time because of the number of HTTP requests.
So I am planning to combine all of my queries into a single string and execute it in one shot.
_g.AddE("allow").From(_g.V().HasLabel("person").Has("name", "name1")).To(_g.V().HasLabel("phone").Where(__.Out().Has("sensor", "nfc"))).Next();
There is a chance that the "To" (target) vertex may not exist in the database, in which case this query fails as well. So I had to check whether that vertex exists before executing this query, using hasNext().
As of now it is working fine, but if I combine all 1000+ edge creations at once, is it possible to write a query which doesn't break if the "To" (target) vertex is not found?
You should look at using the Element Existence pattern for each vertex as shown in the TinkerPop Recipes.
In your example you would replace this section of your query:
_g.V().HasLabel("person").Has("name", "name1")
with something like this (I don't have a .NET environment to test the syntax):
__.V().Has("person", "name", "name1").Fold().
    Coalesce(__.Unfold(), __.AddV("person").Property("name", "name1"))
This will act as an upsert and either return the existing vertex or add a new one with the name property. The same pattern can then be used in your To() step to ensure that the target vertex exists before the edge is created.
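As a rough illustration in Python (gremlinpython) rather than Gremlin.Net, the whole thing can be built as one traversal that upserts both endpoints inside the From()/To() modulators, so the edge step never sees a missing vertex. The 'serial' property used to identify the phone here is a hypothetical stand-in for the original Where(Out().Has('sensor', 'nfc')) condition, and the pattern is worth verifying against your Neptune/TinkerPop version:

from gremlin_python.process.graph_traversal import __

# Assumes an already-connected traversal source g (see the gremlinpython docs).
# Each endpoint is upserted with the fold()/coalesce()/unfold() pattern, so the
# edge is always created between existing-or-new vertices.
(g.addE('allow').
   from_(__.V().has('person', 'name', 'name1').fold().
         coalesce(__.unfold(),
                  __.addV('person').property('name', 'name1'))).
   to(__.V().has('phone', 'serial', 'nfc-1').fold().  # 'serial' is hypothetical
      coalesce(__.unfold(),
               __.addV('phone').property('serial', 'nfc-1'))).
   next())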

Can SQLite return default values for non-existent columns instead of error?

I know how to use IFNULL to get default values for non-existent rows or null values, but for creating queries that are compatible with older schema versions, it would be nice to be able to do this:
Schema v1: CREATE TABLE Employee (Name TEXT, Phone TEXT)
Schema v2: CREATE TABLE Employee (Name TEXT, Phone TEXT, Address TEXT)
Theoretical backward compatible query:
SELECT Name, Phone, IFNULL(Address, '') FROM Employee
Obviously this doesn't work for a file created with schema v1. Is there some way to do this though?
There are two alternative workarounds, but both are rather annoying: either 1) update the old database by adding the missing columns (which would start out with null values), or 2) build the query code dynamically based on the schema version.
Create a temporary view that references a particular schema, substituting default values (or even transforming other data) for individual columns which differ between the base schemas.
SQLite views can even be made modifiable by defining appropriate triggers.
This still requires programming some conditional logic upon connection, but it would allow more uniform queries and interaction with different versions of the schema.
The syntax suggested above would perhaps be convenient in some limited cases, but this approach is much more useful, since it can be expanded beyond a simple "if column exists" Boolean check and instead used to perform a dynamic transformation of one schema into another, perhaps joining tables and providing more advanced logic for updates across differing schemas, and so on.
Pseudo code mixed with view definitions to demonstrate:
db <- Open database connection
db_schema <- determine schema version
If db_schema == 1 Then
db.execute( "CREATE VIEW temp.EmployeeX AS
SELECT Name, Phone, '' AS Address
FROM main.Employee;" )
Else If db_schema == 2 Then
db.execute( "CREATE VIEW temp.EmployeeX AS
SELECT Name, Phone, Address
FROM main.Employee;" )
End If
#Later in code
data <- db.getdata("SELECT Name, Address
FROM EmployeeX")
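The same idea as a minimal runnable sketch using Python's sqlite3 module; the file name and the schema detection via PRAGMA table_info (rather than a stored version number) are illustrative choices:

import sqlite3

# Illustrative file name; detect the schema by checking whether the Address
# column exists, then create a per-connection view that papers over the gap.
conn = sqlite3.connect('employees.db')

columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
if 'Address' in columns:  # schema v2
    conn.execute("CREATE TEMP VIEW EmployeeX AS "
                 "SELECT Name, Phone, Address FROM main.Employee")
else:                     # schema v1
    conn.execute("CREATE TEMP VIEW EmployeeX AS "
                 "SELECT Name, Phone, '' AS Address FROM main.Employee")

# Later in the code, the same query works against either schema version.
rows = conn.execute("SELECT Name, Address FROM EmployeeX").fetchall()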
If you're really averse to conditional statements for the schema, this may still be annoying, but it would at least reduce or eliminate conditional statements throughout the rest of the code, ideally confining them to the connection logic at one location.
You might further notice that this is really the kind of problem object-oriented programming is meant to solve. There's no mention of the language in the question, but a well-designed object model could be built in a similar fashion so that all database access is done through a unified interface, with the implementation details for each schema internal to different objects that derive from a basic set of interfaces (i.e. implement the interfaces and/or inherit from a base class). Consider whether the problem could be solved this way in the language you're using.
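A hypothetical sketch of that object model in Python, with one implementation per schema version chosen once when the connection is opened (the class and method names are invented for illustration):

import sqlite3
from abc import ABC, abstractmethod

class EmployeeStore(ABC):
    """Unified interface: callers never see the schema difference."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    @abstractmethod
    def employees(self):
        """Return (Name, Phone, Address) rows regardless of schema version."""

class EmployeeStoreV1(EmployeeStore):
    def employees(self):
        return self.conn.execute(
            "SELECT Name, Phone, '' FROM Employee").fetchall()

class EmployeeStoreV2(EmployeeStore):
    def employees(self):
        return self.conn.execute(
            "SELECT Name, Phone, Address FROM Employee").fetchall()

def open_store(path):
    conn = sqlite3.connect(path)
    cols = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
    return EmployeeStoreV2(conn) if 'Address' in cols else EmployeeStoreV1(conn)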

CustTableListPage filtering is too slow

When I try to filter the CustAccount field on CustTableListPage it takes too long to filter. On the other fields there is no latency. I'm trying to filter on just part of the account number, like "*123".
I have reindexed CustTable and also updated statistics, but there is no appreciable difference at all.
When I add the list page's query to a view, it filters the CustAccount field normally, like the other fields.
Any suggestions?
Edit:
Our version is AX 2012 R2 CU8. It is not a user-specific problem; it occurs for every user. The interaction class has some customizations, but just for setting some buttons' enabled/disabled properties, etc. I tried to look at the query execution; what I found is not clear, something like FETCH_API_CURSOR_000000..x
Record a trace of this execution and locate the bottleneck.
Keep in mind that wildcards (such as *) have to be used with care. Using a filter string that starts with a wildcard kills performance because the SQL indexes cannot be used.
Using a wildcard at the end
Imagine that you have a dictionary and have to list all the words starting with 'Foo'. You can skip all entries before 'F', then all those before 'Fo', then all those before 'Foo', and start your result list from there.
Similarly, asking the underlying SQL engine to list all CustAccount entries starting with '123' (= filter string '123*') allows it to use an index on CustAccount to skip quickly to the relevant data.
Using a wildcard at the start
Imagine that you still have that dictionary and have to list all the words ending with 'ing'. You would have no choice but to go through the entire dictionary and check the ending of every word (because of the alphabetical sorting).
This explains why asking the SQL engine to list all CustAccount entries ending with '123' (= filter string '*123') means that all CustAccount values must be inspected. So the AOS loops through all the entries and uses an SQL cursor to do this. That is the FETCH_API_CURSOR statement you see on the SQL level.
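To see the difference concretely, here is a small self-contained demonstration using Python and SQLite rather than the SQL Server instance behind AX (the table and index are made up for illustration); the underlying B-tree index behavior is the same in principle, and GLOB is used because SQLite's prefix optimization applies to it with the default collation:

import sqlite3

# Illustrative only: an indexed account column queried with a trailing
# versus a leading wildcard.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE CustTable (CustAccount TEXT)")
conn.execute("CREATE INDEX idx_custaccount ON CustTable (CustAccount)")

for pattern in ("'123*'", "'*123'"):
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM CustTable "
        "WHERE CustAccount GLOB " + pattern).fetchall()
    print(pattern, '->', plan[0][-1])

# Expected output (wording varies by SQLite version):
#   '123*' -> a SEARCH using the index (range seek on CustAccount)
#   '*123' -> a SCAN over everything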
Possible solutions
Educate your end users that using a wildcard at the beginning of a filter string will always be slow on a large table.
Step up the SQL server hardware / allocated resources (faster CPU, more RAM, faster disk, ...).
Create a full-text index on CustAccount (I'm not a fan of this one, and the performance impact should be thoroughly investigated).
I've solved the problem. The CustTableListPage query had a sort over the DirPartyTable.Name field. When I removed this sorting, filtering with a wildcard worked like a charm.

Riak search queries via the java client

I am trying to perform queries using the OR operator as follows:
MapReduceResult result = riakClient.
mapReduce("some_bucket", "Name:c1 OR c2").
addMapPhase(new NamedJSFunction("Riak.mapValuesJson"), true).
execute();
I only get the first object from the query (where Name='c1').
If I change the order of the query (i.e. Name:c2 OR c1), again I get only the first object (where Name='c2').
Is the OR operator (and are other query operators) supported in the Java client?
I got this answer from Basho engineer Sean C.:
You either need to group the terms or qualify both of them. Without a field identifier, the search query assumes that the default field is being searched. You can determine how the query will be interpreted by using the 'search-cmd explain' command. Here are two alternate ways to express your query:
Name:c1 OR Name:c2
Name:(c1 OR c2)
Both options worked for me!
