Google Datastore deletes and multiple parents - google-cloud-datastore

I have a data model in which entity A contains references to two other entities, B and C. If either B or C is deleted, I want A to be deleted.
When creating A, it's possible to name either B or C as its parent. Is it also possible to name both B and C as its parents so that if either B or C is deleted, A is also deleted?
In more concrete terms, take search results: a result might have both a category and a region, say a web page about birds in North America. The result is stored with a reference to its category and to its region. Later, you delete the category birds and want the result deleted as well. Likewise, you delete the region North America and want the result deleted.
I hate to go on at such length about such a trivial scenario, but it doesn't seem to be covered anywhere in the Datastore documentation. What am I missing? Is it a fundamentally flawed data model?

Single-parent limitation:
A child can have only one parent in Datastore; in other words, A can be a child of B or of C, but not both. A parent, of course, can have multiple children.
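For illustration, a minimal sketch of naming a parent in ndb, using the model class A from the question; b_key is assumed to be the key of an existing B entity:
# The parent is fixed when the entity is created and cannot be changed
# later; there is no way to name a second parent.
a = A(parent=b_key)  # A now lives in B's entity group
a.put()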
Alternative:
You can use a KeyProperty with the repeated=True argument and store many entity keys on it. In Python (ndb), it looks like this:
class A(ndb.Model):
    associated_with = ndb.KeyProperty(repeated=True)
    some_other_property = ndb.StringProperty()

a_entity = A(
    associated_with=[b_key, c_key],
    some_other_property='any value',
)
a_entity.put()
Automatically triggering deletes:
Datastore doesn't offer this functionality out of the box, but you can mimic it in your application. As just one idea for implementing it in Python, you could extend the model class with your own delete method (I haven't tested this code; it's just for illustration):
class A(ndb.Model):
    associated_with = ndb.KeyProperty(repeated=True)
    some_other_property = ndb.StringProperty()

    def delete_ext(self):
        # delete the associated entities first, then the entity itself
        for associated_key in self.associated_with:
            associated_key.delete()
        self.key.delete()
You may want to wrap all the deletes in a transaction. Beware that a single transaction can operate on up to 25 entity groups.
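To propagate deletes in the opposite direction (remove every A when a B or C it references is deleted), you can query the repeated KeyProperty and delete transactionally. A minimal, untested sketch; delete_with_cascade is a hypothetical helper name:
from google.appengine.ext import ndb

def delete_with_cascade(target_key):
    # Find every A that references the entity being deleted. The query
    # runs outside the transaction, because non-ancestor queries are not
    # allowed inside Datastore transactions.
    dependent_keys = A.query(A.associated_with == target_key).fetch(keys_only=True)

    @ndb.transactional(xg=True)  # cross-group, limited to 25 entity groups
    def txn():
        ndb.delete_multi(dependent_keys + [target_key])

    txn()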


Method to export Incidence Matrix from Grakn?

We often use GraphBLAS for graph processing, so we need the incidence matrix. I haven't been able to find a way to export this from Grakn to a CSV or any other file. Is this possible?
There isn't a built-in way to dump data to CSV in Grakn right now. However, we do highly encourage our community to contribute open-source tooling for these kinds of tasks! Feel free to chat to us about it on our Discord.
As to how it can be done, conceptually it's pretty easy:
Query to stream all hyper-relations out:
match $r isa relation;
and then for each relation, we can pipeline another query (possibly in a new transaction, if you wish to keep memory usage lower):
match $r iid <iid of $r from previous query>; $r ($x); get $x;
which will get you everything playing a role in this particular hyper-relation $r.
If you also wish to extract attributes that are attached to the hyper-relation, you can use the following:
match $r iid <iid of $r from first query>; $r has $a; get $a;
In effect, we can use these steps to build up each column of the incidence matrix A.
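Putting those steps together, a rough sketch in Python, assuming the Grakn 2.x Python client (grakn-client) and a database named my_database; exact method names vary between client versions:
from grakn.client import GraknClient, SessionType, TransactionType

with GraknClient.core("localhost:1729") as client:
    with client.session("my_database", SessionType.DATA) as session:
        with session.transaction(TransactionType.READ) as tx:
            # first query: stream all hyper-relations
            for ans in tx.query().match("match $r isa relation;"):
                r_iid = ans.get("r").get_iid()
                # second query: everything playing a role in this relation
                players = tx.query().match(f"match $r iid {r_iid}; $r ($x); get $x;")
                for p in players:
                    print(r_iid, p.get("x").get_iid())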
There are a couple of important caveats I should bring up:
What you'll end up with will exclude all type information: the types of the hyper-relations, of the role players in them, the actual roles being played, and the attribute types owned.
==> It would be interesting to hear/discuss how one could encode type information for use in GraphBLAS
In Graql, it's entirely possible to have relations participating in relations. In the worst case, this means all hyper-edges E will also be present in the set V. In practice, only a few relations will play a role in other relations, so only a subset of E may be in V.
So the incidence matrix is equivalent to the nodes/edges array used in force-graph visualisation. In this case it is pretty straightforward.
My approach would be slightly different from the above, as all I need to do is pull all of the things in the db (entities, relations, attributes) with:
match $ting isa thing;
Now when I get my transaction back, for each $ting I want to pull all of the available properties, using both local and remote methods, if I am building a force-graph viz; but for your incidence matrix, I really only care about pulling 3 bits of data:
The iid of the thing
The attributes the thing may own.
The roles (and their players), if the thing is a relation
Essentially one tests each returned object to find out its type (e.g. entity, attribute, relation), and then uses some of the local and remote methods to get the data one wants. In Python, the code for pulling the data for relations looks like this:
# pull relation data (`key`, `res` and `layer` come from the enclosing loop)
elif thing.is_relation():
    rel = {}
    rel['type'] = 'relation'
    rel['symbol'] = key
    rel['G_id'] = thing.get_iid()
    rel['G_name'] = thing.get_type().get_label().name()
    att_obj = thing.as_remote(r_tx).get_has()
    att = []
    for a in att_obj:
        att.append(a.get_iid())
    rel['has'] = att
    links = thing.as_remote(r_tx).get_players_by_role_type()
    logger.debug(f' links are -> {links}')
    edges = {}
    for edge_key, edge_thing in links.items():
        # materialise once: edge_thing may be an iterator, so listing it
        # twice would leave the second pass empty
        players = list(edge_thing)
        logger.debug(f' edge key is -> {edge_key}')
        logger.debug(f' edge_thing is -> {players}')
        edges[edge_key.get_label().name()] = [e.get_iid() for e in players]
    rel['edges'] = edges
    res.append(rel)
    layer.append(rel)
    logger.debug(f'rel -> {rel}')
This then gives us a node array, which we can easily process to build an edges array (i.e. the links joining an object and the attributes it owns, or the links joining a relation to its role players). Thus, exporting your incidence matrix is pretty straightforward, as sketched below.
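A minimal sketch of that last step, assuming nodes is the node array built above, where every dict carries a 'G_id' and a 'type', and relations additionally carry the 'edges' mapping of role name to role-player iids:
def incidence_matrix(nodes):
    # rows are vertices (all things), columns are hyper-edges (relations)
    vertex_ids = [n['G_id'] for n in nodes]
    row_of = {iid: i for i, iid in enumerate(vertex_ids)}
    relations = [n for n in nodes if n['type'] == 'relation']
    matrix = [[0] * len(relations) for _ in vertex_ids]
    for col, rel in enumerate(relations):
        for players in rel['edges'].values():
            for iid in players:
                matrix[row_of[iid]][col] = 1
    return matrix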

Function of Rows, Rowsets in PeopleCode

I'm trying to get a better understanding of what Rows and Rowsets are used for in PeopleCode. I've read through PeopleBooks and still don't feel like I have a good understanding. I'm looking to get more understanding of these as they pertain to Application Engine programs. Perhaps walking through an example may help. Here are some specific questions I have:
I understand that Rowsets, Rows, Records, and Fields are used to access component buffer data, but is this still the case for stand-alone Application Engine programs run via Process Scheduler?
What would be the need for or advantage of using these as opposed to using SQL objects/functions (CreateSQL, SQLExec, etc.)? I often see AE programs where a rowset is instantiated with CreateRowset and populated via its .Fill method with a SQL WHERE clause, and I don't quite understand why plain SQL was not used instead.
I've seen in PeopleBooks that a Row object in a component scroll is a row; how does a component scroll relate to the row? I've seen references to rows having different scroll levels. Is this just a way of grouping and nesting related data?
After you have instantiated a rowset with CreateRowset, what are typical uses of it in the program afterwards? How would you perform logic (If, Then, Else, etc.) on data retrieved by the rowset, or use it to update data?
I appreciate any insight you can share.
You can still use Rowsets, Rows, Records and Fields in stand-alone Application Engines. Application Engines do not have component buffer data, as they are not running within the context of a component. Therefore, to use these items you need to populate them using built-in methods like .fill() on a rowset, or .selectByKey() on a record.
The advantage of using rowsets over SQL is that it makes the CRUD easier. There are built-in methods for selecting, updating, inserting and deleting. Additionally, you don't have to worry about declaring a large number of variables for multiple fields like you would with a SQL object. Another advantage is that when you do the fill, the data is read into memory, whereas if you looped through the SQL, the SQL cursor would stay open longer. The rowset, row, record and field objects also have a lot of other useful methods, such as allowing you to run executeEdits (validation) or copy from one rowset/row/record to another.
This question is a bit less clear to me, but I'll try to explain. If you have a Page, it would have a level 0 row. It then could have multiple level 1 rowsets. Under each of those it could have level 2 rowsets.
              Level0
             /      \
        Level1      Level1
        /    \      /    \
   Level2  Level2 Level2  Level2
If one of your level 1 rowsets had 3 rows, then you would find 3 rows in the Rowset associated with that level 1. Not sure I explained this well enough to answer what you need; please clarify if I can provide more info.
Typically after I create a rowset, I loop through it, access the record on each row, and do some processing with it. In the example below, I loop through all locked accounts, prefix their description with LOCKED, and then update the database.
Local boolean &updateResult;
Local integer &i;
Local Record &lockedAccount;
Local Rowset &lockedAccounts;

&lockedAccounts = CreateRowset(Record.PSOPRDEFN);
&lockedAccounts.Fill("WHERE ACCTLOCK = 1");

For &i = 1 To &lockedAccounts.ActiveRowCount
   &lockedAccount = &lockedAccounts(&i).PSOPRDEFN;
   If Left(&lockedAccount.OPRDEFNDESCR.Value, 6) <> "LOCKED" Then
      &lockedAccount.OPRDEFNDESCR.Value = "LOCKED " | &lockedAccount.OPRDEFNDESCR.Value;
      &updateResult = &lockedAccount.Update();
      If Not &updateResult Then
         /* Error handle failed update */
      End-If;
   End-If;
End-For;

Adding a new user to neo4j

A total neo4j noob talking here.
I'd like to create a graph to store a set of users; a typical user is as follows:
CREATE
(node_1 {FullName:"Peter Parker",FirstName:"peter",FamilyName:"parker"}),
(node_2 {Address:"Newyork",CountryCode:"US"}),
(node_3 {Location:"Hidden"}),
(node_4 {phoneNumber:11111}),
(node_5 {InternetEmailAddress:"peter#peterland.com"})
Now the problem is:
Every time I execute this, I add 5 more nodes.
I know I need to use a unique key, but all the examples I've seen use a unique key for a specific node. So how can I make sure a user doesn't get added if it already exists (I can use the email address as a unique key)?
How do I update the nodes if some changes occur? For example, after a week I want to update the graph to contain the following instead of the previous data (no duplicates):
CREATE
(node_1 {FullName:"Peter Parker",FirstName:"peter",FamilyName:"parker"}),
(node_2 {Address:"Newyork",CountryCode:"US"}),
(node_3 {Location:"public"}),
(node_4 {phoneNumber:11111}),
(node_5 {InternetEmailAddress:"peter#peterland.com"}),
(node_6 {status:"Jailed"})
(Note: the update changed Location to "public" and added a new node for Peter.)
Seeing as you had a load of nodes anyway.
Some of the data you have modelled as Nodes are probably properties, as the other answer suggests; some are possibly correctly modelled as Nodes; and one could probably form the relationship, or a part of it.
Location public/hidden can be modelled in one of three ways: as a property on the Person, as a property on the relationship between the Person and the Location, or as the relationship type. To understand that, first you need to have a relationship.
Your address at the moment is another Node. I think this is correct, but possibly you would want two nodes, related something like this:
(s:State)-[:IN_COUNTRY]-(c:Country)
YMMV and clearly that's a US-centric model, but you can extend it easily enough.
Now you could create Peter with a LIVES_IN relationship:
CREATE (p:Person{fullName:"Peter Parker"}), (s:State{name:"New York"}), (c:Country{code:"US"}),
(p)-[:LIVES_IN]->(s), (s)-[:IN_COUNTRY]->(c)
For speed, you are better off modelling two relationship types, say LIVES_IN_PUBLIC and LIVES_IN_HIDDEN, which means that to perform the update you want above, you have to delete the one and create the other. However, if speed is not of the essence, it is also common to use properties on the relationship.
CREATE (p:Person{fullName:"Peter Parker"}), (s:State{name:"New York"}), (c:Country{code:"US"}),
(p)-[:LIVES_IN{public:false}]->(s), (s)-[:IN_COUNTRY]->(c)
So your complete create and update:
CREATE (p:Person {fullName:"Peter Parker",firstName:"peter",familyName:"parker", phoneNumber:1111, internetEmailAddress:"peter#peterland.com"}),
(s:State {name:"New York"}), (c:Country {code:"US"}),
(p)-[:LIVES_IN{public:false}]->(s), (s)-[:IN_COUNTRY]->(c)
MATCH (p:Person {internetEmailAddress:"peter#peterland.com"})-[li:LIVES_IN]->()
SET li.public = true, p.status = "jailed"
When adding other People you probably do not want to recreate States and Countries; rather you want to MATCH them, and possibly MERGE them (a MERGE sketch follows below), but we'll stick to CREATE.
MATCH (s:State{name:"New York"})
CREATE (p:Person{name:"John Smith", internetEmailAddress:"john#google.com"})-[:LIVES_IN{public:false}]->(s)
John Smith now implicitly lives in the US too as you can follow the relationship through the State Node.
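And for the deduplication problem in the question, MERGE is the tool. A rough sketch using the official Python driver (the URI and credentials are placeholders):
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def upsert_person(tx, email, name):
    # MERGE matches on the unique key if the node already exists and
    # creates it otherwise, so re-running this never duplicates a person.
    tx.run("MERGE (p:Person {internetEmailAddress: $email}) "
           "ON CREATE SET p.fullName = $name",
           email=email, name=name)

with driver.session() as session:
    session.write_transaction(upsert_person, "peter#peterland.com", "Peter Parker")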
Treatise complete.
I think you're modeling your data incorrectly here - you're setting up each property of the person as a separate node, which is not a good idea. You don't have any linkages between those nodes, so with this data pattern, later on you won't be able to tell what Peter Parker's address is. You're also not using node labels, which I think could really help here.
The quick answer to your question about updating nodes is that you have to MATCH them, then use SET to modify a property. So if you had a person, you might do this:
MATCH (p:Person { FullName: "Peter Parker" })
SET p.Address = "123 Fake Street"
RETURN p;
But notice I'm making assumptions about the way your data is structured. Taking the same data you provided, this might be a better way of creating it:
CREATE (node_1:Person {FullName:"Peter Parker",
                       FirstName:"peter",
                       FamilyName:"parker",
                       Address:"Newyork",
                       CountryCode:"US",
                       Location:"Hidden",
                       phoneNumber:11111,
                       InternetEmailAddress:"peter#peterland.com"});
The difference with this suggestion is that I'm putting all the properties into a single node (instead of one property per node) and I'm applying the Person label to the node.
If you structured the data like this, then the update query I provided would work. Structured the way you have it now, it's not possible to update Peter Parker's address, because there's no relationship between your node_1 and node_2.

Database design for complex requirement

I have a table for ticket types (like new, update, escalate), and each type has subtypes (for example, new has around 5 subtypes, e.g. server, access, etc.).
Now every subtype has different info to be stored (subtype server needs start date, end date, and server name; access needs customer id, access type, confirmation attachment, etc.).
Basically, every subtype will require a different number of fields, so I want to ask whether I should create a different table for each subtype: for server I would create a table with 4 columns, and for access a table with 7 columns.
Chances are low that new subtypes will be required once deployed, but it is still a possibility, in which case a new table would be created.
Is going this way the correct thing to do, or is there another method?
Whether you should create a new table for each subtype depends on the nature of the inheritance. The following rule of thumb applies:
MANDATORY; AND: 1 Table for superclass + all subclasses
MANDATORY; OR: 1 Table for each subclass
OPTIONAL; AND: 1 Table for the superclass, 1 table for all subclasses
OPTIONAL; OR: 1 Table for superclass, 1 table for each subclass
MANDATORY means: every member of the superclass must be a member of a subclass (OPTIONAL is the opposite).
AND means: a member of the superclass can be a member of several subclasses at once (OR means it belongs to at most one).
Calculated example: let's say you have a model that contains 1 superclass and 3 subclasses.
The total number of tables you would get for each of the four types of inheritance is respectively: 1 (MANDATORY; AND), 3 (MANDATORY; OR), 2 (OPTIONAL; AND), and 4 (OPTIONAL; OR).
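To make the OPTIONAL; OR layout concrete for the ticket example, here is a sketch in Python with SQLite; all table and column names are made up:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (                -- superclass table
    ticket_id INTEGER PRIMARY KEY,
    subtype   TEXT                   -- 'server', 'access', ...
);
CREATE TABLE ticket_server (         -- one table per subclass,
    ticket_id   INTEGER PRIMARY KEY  -- holding only its own columns
                REFERENCES ticket(ticket_id),
    start_date  TEXT,
    end_date    TEXT,
    server_name TEXT
);
CREATE TABLE ticket_access (
    ticket_id    INTEGER PRIMARY KEY REFERENCES ticket(ticket_id),
    customer_id  INTEGER,
    access_type  TEXT,
    attachment   BLOB
);
""")
A new subtype later then just means one more table, while rows in ticket stay queryable across all subtypes.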

Drupal create views involving LEFT JOIN Sub-Select with non-existent node

I'm using Drupal 6.
I have this table relationship and I've translated it into CCK, complete with its relations.
Basically, when I view a Period node, I have tabs to display ALL Faculty nodes combined with their Presence Number.
here's the table diagram: http://i.stack.imgur.com/7Y5cU.png
Translated into CCK like this:
CCK Faculty (name),
CCK Period (desc,from,to) and
CCK Presence(node-reference-faculty, node-reference-period, presence_number)
Here's my simple manual SQL query that achieves this result: http://i.stack.imgur.com/oysd3.png
SELECT faculty.name, presence.presence_number FROM Faculty AS faculty
LEFT JOIN (SELECT * FROM Presence WHERE Period_id=1) AS presence ON faculty.id=presence.Faculty_id
The value of 1 for Period_id will be given by the Period Node ID from the url argument.
Now the hardest part: reproducing the simple SQL query above in Views. How can I build such a query in Views in Drupal 6 or Drupal 7?
Thanks for any help.
The main issue, which I think you've noticed, is that if you treat Faculty as the base for your join, then there is no way to join on the Presence nodes. Conversely, if you treat Presence as the base, then you will not see faculties that have no presence number.
There is no easy way, using your currently defined structure, to do these joins in views.
I would say your easiest option is to remove the 'node-reference-faculty' field from the presence node and add a node-reference-presence field to the faculty. Since CCK fields can have multiple values, you can still have your one-to-many relationship properly.
The one downside of this is that then you need to manage the presence-faculty relationship from the faculty nodes instead of the presence nodes. If that's a show stopper, which it could be depending on your workflow, you could have BOTH node-reference fields, and use a module like http://drupal.org/project/backreference to keep them in sync.
Once you have your reference from faculty -> presence, you will need to add a relationship in Views. Just like adding a field or a filter, open the list of relationships and find the one for your node-reference field.
Next, you will need to add an argument for period id and set it up to use a node id from the url. The key thing is that when you add the argument, it will ask which relationship to use in its options. You will want to tell it to use your newly added presence relationship.
You don't really need to do a subquery in your SQL like that. This should be the same thing (with the period condition moved into the ON clause, so that faculties without a presence are kept) and won't make MySQL try to create a temporary table. I mention it because you can't really do subqueries in Views unless you are writing a very custom Views handler, but in this case you don't really need the subquery anyway.
Ex.
SELECT f.name, p.presence_number
FROM Faculty AS f
LEFT JOIN Presence AS p ON f.id = p.Faculty_id AND p.Period_id = 1;
I wrote an article about how to achieve a similar outcome here. http://scottanderson.com.au/#joining-a-views-query-to-a-derived-table-or-subquery
Basically how to alter a Views query to left join on a sub-query.
