I'm using the Alfresco API in my Java-backed webscript. I have a UUID, I construct a new NodeRef from it, and then I call getChildAssocs on NodeService to obtain all the direct child nodes. But on folders with more than 100 subfolders it takes too long. Is there a quicker way to obtain all the direct children of a node? The fastest method you know of.
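For reference, this is roughly what the code looks like (a minimal sketch; the SpacesStore prefix and the Spring-injected nodeService are assumptions about a standard Java-backed webscript):

import java.util.List;
import org.alfresco.service.cmr.repository.ChildAssociationRef;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;

public class ChildLister {
    private NodeService nodeService; // injected by Spring

    public List<ChildAssociationRef> listChildren(String uuid) {
        // build the NodeRef from the bare UUID
        NodeRef parent = new NodeRef("workspace://SpacesStore/" + uuid);
        // this is the call that gets slow on folders with many children
        return nodeService.getChildAssocs(parent);
    }
}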
Thanks!
We are stamping user permissions as a property (of SET cardinality) on each node and edge. I'm wondering what the best way is to apply the has() step to all the visited nodes/edges for a given Gremlin traversal query.
Take a very simple traversal query:
// Flights from London Heathrow (LHR) to airports in the USA
g.V().has('code','LHR').out('route').has('country','US').values('code')
We'd like to add has('permission', 'team1') to all the vertices and edges visited while traversing with the above query.
There are two approaches you may consider.
Write a custom TraversalStrategy
Develop a Gremlin DSL
For a TraversalStrategy, you would develop one similar to SubgraphStrategy or PartitionStrategy, which would take your user permissions on construction and then automatically inject the necessary has() steps after out() / in() sorts of steps. The drawback here is that your TraversalStrategy must be written in a JVM language and, if using Gremlin Server, must be installed on the server. If you intend to configure this TraversalStrategy from the client side in any way, you would need to build custom serializers to make that possible.
For a DSL you would create new navigational steps for out() / in() sorts of steps and they would insert the appropriate combination of navigation step and has() step. The DSL approach is nice because you could write it in any programming language and it would work, but it doesn't allow server-side configuration and you must always ensure clients use the DSL when querying the graph.
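Either way, the net effect is the same: a permission filter is injected after each element the traversal touches. A minimal sketch of what the expanded traversal would look like (outE()/inV() replaces out() so the edge can be filtered as well as the adjacent vertex; 'team1' stands in for the caller's permission):

g.V().has('code','LHR').has('permission','team1').
  outE('route').has('permission','team1').
  inV().has('permission','team1').
  has('country','US').values('code')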
We are stamping user permissions as a property (of SET cardinality) on each node and edge.
As a final note, by "SET cardinality" I assume that you mean multi-properties. Edges don't allow for those so you would only be able to stamp such a property on vertices.
We are planning to implement a virtual filesystem using Google Firestore.
The idea of subcollections is nice because it allows us to model our data in terms of a folder hierarchy, like so: /folders/folderA/entities/folderB/entities/fileX
Much like an actual filesystem, we'd like to handle cross-folder moves, such as moving nested subfolder folderB from parent folderA to parent folderC. Indeed, it will often be the case that the folders we want to move may themselves contain their own subcollections of files and folders, an arbitrary K levels deep.
This comment suggests that moving a document will not automagically move its associated subcollections. Similarly, deleting a document will forgo deleting its underlying subcollections, leaving them as orphans. It seems like the only way to move a folder (and its entities) from one parent to another would be through a recursive clone + delete strategy, which may be difficult to accomplish reliably and transactionally if the subtree of sub-entities is massive.
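For what it's worth, here is roughly what that recursive clone + delete looks like with the Firestore Java client (a sketch; the method name is ours, and it is not atomic across the subtree since Firestore transactions cap the number of writes):

import com.google.cloud.firestore.CollectionReference;
import com.google.cloud.firestore.DocumentReference;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class FolderMover {
    // Copy src (fields plus all nested subcollections) under dst, then delete src.
    static void moveRecursively(DocumentReference src, DocumentReference dst)
            throws ExecutionException, InterruptedException {
        Map<String, Object> data = src.get().get().getData();
        if (data != null) {
            dst.set(data).get(); // clone the document's own fields
        }
        for (CollectionReference sub : src.listCollections()) {
            for (DocumentReference child : sub.listDocuments()) {
                moveRecursively(child, dst.collection(sub.getId()).document(child.getId()));
            }
        }
        src.delete().get(); // children were already deleted by the recursion above
    }
}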
The alternative is to abandon subcollections and store all folders at the root instead, using a document field like parent_id to point to other docs within the flat collection. This shouldn't impact query speed thanks to Firestore's aggressive indexing, but we've been unable to reproduce this claim locally; i.e., querying via subcollections is vastly more performant as the total number of documents in the DB increases, versus storing everything at the top level. A reproducible repo is available here. Note that the repo uses a local emulator instance, as opposed to an actual Firestore server.
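Under that flat model, a cross-folder move collapses to a single field update (again a sketch with the Java client; collection and field names are our own):

import com.google.api.core.ApiFuture;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.Query;
import com.google.cloud.firestore.WriteResult;

public class FlatModel {
    // moving an entity is just re-pointing its parent_id field
    static ApiFuture<WriteResult> move(Firestore db, String entityId, String newParentId) {
        return db.collection("entities").document(entityId).update("parent_id", newParentId);
    }

    // listing a folder's children becomes an indexed equality query
    static Query childrenOf(Firestore db, String folderId) {
        return db.collection("entities").whereEqualTo("parent_id", folderId);
    }
}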
Any advice would be very helpful!
I have a collection of JSON documents in CosmosDB that can contain references to other documents in the collection (by id).
I'd like to automatically manage graph edges between these documents by using triggers that run whenever a doc is created/updated/deleted.
Can I access the Gremlin API from Javascript inside the trigger function?
Is there any documentation for triggers in the context of graphs? I couldn't find any.
A dirtier alternative would be to just "manually" create the edge document in the trigger, but this would break if the CosmosDB team changes the underlying format of the documents describing the edges.
The Cosmos DB Trigger will probably work and it will give you a set of Documents which you might need to process first.
Since the Trigger is listening to the Change Feed you will get Documents that represent any insertion / update on the Collection. In the case of a Graph, these can be Vertices or Edges, so you might need to first detect what type of Document it is to work with it.
As for persisting the new relationship, the DocumentDB Output binding might not work for you because, like you said, the internal representation might change. But what you can do is include some C# / Node Gremlin library in your Azure Function and use it to talk to the Cosmos DB Graph API directly.
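For example, a minimal sketch of that second idea with the Apache TinkerPop Java driver (host, key, and ids below are placeholders, and the same pattern applies with the C# / Node drivers):

import java.util.Map;
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class EdgeWriter {
    public static void main(String[] args) {
        // connect to the Cosmos DB Gremlin endpoint of the graph account
        Cluster cluster = Cluster.build("your-account.gremlin.cosmos.azure.com")
                .port(443)
                .enableSsl(true)
                .credentials("/dbs/your-db/colls/your-graph", "your-primary-key")
                .create();
        Client client = cluster.connect();
        // create the edge for a reference found in a change feed document
        client.submit("g.V(srcId).addE('references').to(g.V(dstId))",
                Map.of("srcId", "doc-1", "dstId", "doc-2")).all().join();
        cluster.close();
    }
}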
Graph API is currently not supported within UDFs/database triggers/stored procedures, and we don't have a timeline for when this will be supported.
The next best approach is to manually create the graph elements as you described.
New to DocumentDB, and I am trying to determine the best way to store documents. We are uploading documents every 15 minutes, and I need to keep them as easily separated by upload as possible. At first glance, I thought I could have a database and a collection for each upload. Then I discovered you can only have 3 collections per database. This leaves me with either adding a naming convention or trying to use folders and paths. According to the same source (http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/), we are limited to 100 paths per collection. This leaves folders. I have been looking, but I haven't found anything concrete on creating folders within a collection. The object API doesn't have an obvious add/create method.
Is this possible? If so, are we limited to how many (assuming I stay within the allowed collection/database size)?
You could define a sequential naming convention and create a range index in the collection's indexing policy. Then, if you need to retrieve a range of documents, a range query will leverage DocumentDB's indexing capabilities efficiently.
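For instance, if each document carries a property encoding its upload batch (uploadBatch is a hypothetical name), one batch comes back with a single range query. A sketch with the DocumentDB Java SDK:

import com.microsoft.azure.documentdb.ConnectionPolicy;
import com.microsoft.azure.documentdb.ConsistencyLevel;
import com.microsoft.azure.documentdb.Document;
import com.microsoft.azure.documentdb.DocumentClient;

public class BatchQuery {
    public static void main(String[] args) {
        DocumentClient client = new DocumentClient(
                "https://your-account.documents.azure.com:443/",
                "your-master-key", ConnectionPolicy.GetDefault(), ConsistencyLevel.Session);
        // range query over the batch property; served by the range index
        for (Document d : client.queryDocuments(
                "dbs/your-db/colls/your-coll",
                "SELECT * FROM c WHERE c.uploadBatch >= '2015-06-01T10:00' "
                        + "AND c.uploadBatch < '2015-06-01T10:15'",
                null).getQueryIterable()) {
            System.out.println(d.getId());
        }
    }
}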
As a recommendation, you can examine the request charge response header (x-ms-request-charge) on the requests you fire off during your tests. This lets you gauge how efficient your setup is, i.e. how expensive each operation is against the DB, which translates into your cost for the service.
What we ended up doing was just dumping everything into one collection. The Azure DocumentDB query language (i.e. SQL-like) seems robust enough to handle detailed queries, though I am not sure what the efficiency will be like once we have a ton of documents in there.
Using SolrCloud 4.6, let's say I have an 8-node cluster with a shard running on each node and many different collections. Basically, a collection is created each day (to partition the data). Now the question is how to search all of the collections without knowing their names.
The wiki says I can do this:
http://localhost:8983/solr/collection1/select?collection=collection1_NY,collection1_NJ,collection1_CT
which is basically searching multiple collections at the same time. But in my case, collections are created dynamically and I don't know their current names.
Is there a way to send a generic search query that hits all the collections?
Or a way to specify a range of collections like collection1-10 or collection*2013?
I also know I can hook into the ZK and get the info but that would be too advanced for what I'm doing.
This should be done by SOLR-5466 (EDIT: this is done as of 4.8), but it has no patch ready yet...
In this question on the mailing list, two workarounds are given, both retrieving the info from ZooKeeper: via the ZK client API, or by parsing the HTML response to a GET.
Here are the two workarounds:
With the ZK client API: you could just do a get_children on the zk node /collections to get all collections.
Without the ZK client API: point this URL at your SolrCloud install:
http://host:port/solr/zookeeper?detail=true&path=%2Fcollections
and look for the children under the collections node.
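A minimal sketch of the first workaround with the plain ZooKeeper Java client (the connection string is a placeholder for whatever ensemble SolrCloud uses):

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ListCollections {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 15000, event -> { });
        // each child znode of /collections is a collection name
        List<String> collections = zk.getChildren("/collections", false);
        System.out.println(collections); // e.g. [collection20131201, collection20131202]
        zk.close();
    }
}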
To my knowledge you need to know something about the collections, and then you can create an alias for a group of collections.
You can do something like this:
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=AliasName&collection=ListOfCollections
More on this topic: http://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
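Once created, the alias can be queried like any single collection, e.g.:
http://localhost:8983/solr/AliasName/select?q=*:*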
I found the list of collections by requesting:
http://localhost:8983/solr/admin/collections?action=LIST
My Solr version is: 8.7.0