Add node to linked list after given node in cypher - graph

I have the following graph:
(Boxer)-[:STARTS]->(Round)-[:CONTINUES]->(Round)-[:CONTINUES]->(Round)-[:CONTINUES]->(Round)
I want to insert a new Round AFTER a specified Round called prevRound. Right now I am doing this:
MERGE (round:Round {uuid: $round.uuid})
MERGE (prevRound:Round {uuid: $prevRound.uuid})
MERGE (prevRound)-[oldRel:CONTINUES]->(nextRound)
MERGE (prevRound)-[:CONTINUES]->(round)-[:CONTINUES]->(nextRound)
DELETE oldRel
This works but it will actually create an empty node when I try to insert a node at the end of the list. I know it's because of:
MERGE (prevRound)-[oldRel:CONTINUES]->(nextRound)
Indeed, this will create a nextRound node when one does not exist.
How can I prevent that?
I tried with OPTIONAL MATCH but it did not work well.

MERGE is not the right clause to use here: as you saw, it will create the pattern if it does not exist, giving you a blank node and a relationship to it from prevRound. OPTIONAL MATCH is the correct clause for that line (though you do need a WITH clause between it and the preceding MERGE)...but a better approach is actually to rearrange your query a little (see the last paragraph).
You should also split up the last MERGE, since a longer pattern like this will likely not do what you expect it to do under certain circumstances. Read our knowledge base article on understanding how MERGE works for some of the finer details that might otherwise trip you up.
We can actually accomplish what you want fairly simply by rearranging parts of your query.
MERGE (round:Round {uuid: $round.uuid})
MERGE (prevRound:Round {uuid: $prevRound.uuid})
WITH round, prevRound
OPTIONAL MATCH (prevRound)-[oldRel:CONTINUES]->(nextRound)
DELETE oldRel
MERGE (prevRound)-[:CONTINUES]->(round)
WITH round, nextRound
WHERE nextRound IS NOT NULL
MERGE (round)-[:CONTINUES]->(nextRound)
We guard the MERGE between round and nextRound by the preceding WHERE clause, which filters out any rows where nextRound doesn't exist.
A perhaps simpler way to do this, though slightly less efficient, is to deal with the nodes you know exist first (round and prevRound), then deal with the pattern that may or may not exist: the MATCH to the old node. You will need a bit of filtering, though, since that MATCH will also pick up the relationship you just created to round:
MERGE (round:Round {uuid: $round.uuid})
MERGE (prevRound:Round {uuid: $prevRound.uuid})
MERGE (prevRound)-[:CONTINUES]->(round)
WITH round, prevRound
MATCH (prevRound)-[oldRel:CONTINUES]->(nextRound)
WHERE nextRound <> round
DELETE oldRel
MERGE (round)-[:CONTINUES]->(nextRound)
You might also consider if there are any places where you know such a relationship does not exist, and if so, use CREATE instead of MERGE. I have a feeling the last MERGE here probably could be a CREATE instead.
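The insert-after logic, including the tail-of-list guard, can be sketched as a plain singly linked list in Python (illustrative only; the Round class and its next field are stand-ins for the :Round nodes and :CONTINUES relationships):

```python
class Round:
    def __init__(self, uuid):
        self.uuid = uuid
        self.next = None  # stands in for the outgoing :CONTINUES relationship

def insert_after(prev_round, new_round):
    next_round = prev_round.next       # OPTIONAL MATCH: None at the tail
    prev_round.next = new_round        # MERGE (prevRound)-[:CONTINUES]->(round)
    if next_round is not None:         # the WHERE nextRound IS NOT NULL guard
        new_round.next = next_round    # MERGE (round)-[:CONTINUES]->(nextRound)

a, b, c = Round("a"), Round("b"), Round("c")
a.next = b
insert_after(a, c)           # mid-list insert: a -> c -> b
insert_after(b, Round("d"))  # tail insert: no blank node is ever created
```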

Related

Can sqlite-utils convert function select two columns?

I'm using sqlite-utils to load a csv into sqlite which will later be served via Datasette. I have two columns, likes and dislikes. I would like to have a third column, quality-score, by adding likes and dislikes together then dividing likes by the total.
The sqlite-utils convert function should be my best bet, but all I see in the documentation is how to select a single column for conversion.
sqlite-utils convert content.db articles headline 'value.upper()'
From the example given, it looks like convert is followed by the db filename, the table name, then the col you want to operate on. Is it possible to simply add another col name or is there a flag for selecting more than one column to operate on? I would be really surprised if this wasn't possible, I just can't find any documentation to support it.
This isn't a perfect answer as it doesn't resolve whether sqlite-utils supports multiple column selection for transforms, but this is how I solved this particular problem.
Since my quality_score column would just be basic math, I was able to make use of sqlite's Generated Columns. I created a file called quality_score.sql that contained:
ALTER TABLE testtable
ADD COLUMN quality_score GENERATED ALWAYS AS (likes /(likes + dislikes));
and then implemented it by:
$ sqlite3 mydb.db < quality_score.sql
You do need to make sure you are using a compatible version of sqlite, as this only works with version 3.31 or later.
Another consideration is to make sure you are performing the math on numbers, not text, and to watch out for integer division: SQLite's / on two integers truncates, so likes / (likes + dislikes) yields 0 (or 1) rather than a fraction. Multiply by 1.0 or CAST one operand to REAL to get a fractional score.
I also attempted to create the table with the virtual generated column first and then fill it with my data later, but that didn't work in my case: it threw an error saying the number of items provided didn't match the number of columns available. So I just stuck with the ALTER operation after the fact.
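The same approach can be sketched with Python's built-in sqlite3 module (table and column names are taken from the question; the version guard mirrors the 3.31 caveat above, and the 1.0 * factor avoids integer division):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE testtable (likes INTEGER, dislikes INTEGER)")
con.execute("INSERT INTO testtable VALUES (75, 25)")

if sqlite3.sqlite_version_info >= (3, 31):  # generated columns need SQLite 3.31+
    # Multiply by 1.0 so SQLite performs float division, not integer division.
    con.execute(
        "ALTER TABLE testtable ADD COLUMN quality_score "
        "GENERATED ALWAYS AS (1.0 * likes / (likes + dislikes))"
    )
    score = con.execute("SELECT quality_score FROM testtable").fetchone()[0]
```

Note that a column added via ALTER TABLE must be a VIRTUAL generated column (the default); STORED generated columns cannot be added to an existing table.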

Is there a way to extract a substring from a cell in OpenOffice Calc?

I have tens of thousands of rows of unstructured data in csv format. I need to extract certain product attributes from a long string of text. Given a set of acceptable attributes, if there is a match, I need it to fill in the cell with the match.
Example data:
"[ROOT];Earrings;Brands;Brands>JeweleryExchange;Earrings>Gender;Earrings>Gemstone;Earrings>Metal;Earrings>Occasion;Earrings>Style;Earrings>Gender>Women's;Earrings>Gemstone>Zircon;Earrings>Metal>White Gold;Earrings>Occasion>Just to say: I Love You;Earrings>Style>Drop/Dangle;Earrings>Style>Fashion;Not Visible;Gifts;Gifts>Price>$500 - $1000;Gifts>Shop>Earrings;Gifts>Occasion;Gifts>Occasion>Christmas;Gifts>Occasion>Just to say: I Love You;Gifts>For>Her"
Look up table of values:
Zircon, Diamond, Pearl, Ruby
Output:
Zircon
I tried using the VLOOKUP() function, but it needs to match an entire cell and works better for translating acronyms. I haven't really found a built-in function that accomplishes what I need. The data is totally unstructured and changes from row to row, with no consistency even within variations of the same product. Does anyone have an idea how to do this? Or how to write an OpenOffice Calc function to accomplish it? I'm also open to other, better methods if anyone has experience with or ideas on how to approach this.
OK, so I figured out how to do this on my own... I created many different columns, each with a keyword I was looking to extract as its header.
[Screenshot: spreadsheet solution for structured data extraction]
Then I used this formula to extract the keywords into the correct row beneath the column header: =IF(ISERROR(SEARCH(CF$1,$D769)),"",CF$1)
The SEARCH function returns a number for the position of a search string, and otherwise produces an error. I use the ISERROR function to detect that error condition, and the IF statement so that on an error the cell is left blank, and otherwise it takes the value of the header. I had over 100 columns of specific information to extract, feeding into one final column where I join all the previous cells in the row together for the final list. Worked like a charm. I recommend this approach to anyone who has a similar task.
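The same per-keyword check can also be sketched outside the spreadsheet; here is a Python model of the formula (extract_attributes is a hypothetical name; the keyword list and cell text are abbreviated from the question):

```python
# Mimics =IF(ISERROR(SEARCH(CF$1,$D769)),"",CF$1) for each keyword column:
# keep a keyword when it occurs (case-insensitively, like SEARCH) in the cell.
def extract_attributes(cell, keywords):
    return [kw for kw in keywords if kw.lower() in cell.lower()]

keywords = ["Zircon", "Diamond", "Pearl", "Ruby"]
cell = "[ROOT];Earrings;Earrings>Gemstone>Zircon;Earrings>Metal>White Gold"
matches = extract_attributes(cell, keywords)  # -> ['Zircon']
```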

Regex in R match specified words when they all (two or more) occur in whatever order within certain distance in particular line

I have a double challenge.
First, I want to match lines that contain two (or eventually more) specified words within a certain distance of each other, in whatever order.
Using lookaheads I manage to select lines matching two or more words, regardless of the order in which they occur. I can also easily add more words that must be found in the same line, so this approach scales without much effort when more words must occur for a line to be selected. The disadvantage is that I can't specify the maximal distance between them.
^(?=.*\bjohn)(?=.*\bjack).*$
By using the pipe operator I can specify both the orders in which the terms may occur and the accepted distance between them, but when more words should be matched the code becomes lengthy and error-prone.
jack.{0,100}john|john.{0,100}jack
Is there a way to combine the respective advantages of both approaches in one regular expression?
Second, ideally I would like only 'jack' and 'john' to be selected, not the whole line.
Is there a possibility to do this all at once?
For this case, you have to use the second approach, but it can't really be done with a regex written by hand. You have to ask your language's tools for help, like paste in R, in order to build the regex (in the second format) programmatically.
In Python, I would do as below to create a long regex.
>>> def create_reg(lis):
...     out = []
...     for i in lis:
...         out.append(''.join(i) + '|' + ''.join([i[2], i[1], i[0]]))
...     return '(?:' + '|'.join(out) + ')'
...
>>> lst = [('john', '.{0,100}', 'jack'), ('foo', '.{0,100}', 'bar')]
>>> create_reg(lst)
'(?:john.{0,100}jack|jack.{0,100}john|foo.{0,100}bar|bar.{0,100}foo)'
>>>
(Note the . before each {0,100}; without it the quantifier would apply to the last letter of the preceding word rather than to any character.)
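For the second part of the question, selecting just the matched words rather than the whole line, one option is to wrap each word in a capturing group and keep the non-empty groups. A sketch using Python's re (build_pattern is a hypothetical helper along the same lines as create_reg above):

```python
import re

# Hypothetical helper: same alternation idea as create_reg, but with capturing
# groups around the words so only they are extracted, not the whole line.
def build_pattern(word_a, word_b, max_gap=100):
    gap = '.{0,%d}' % max_gap
    return '(%s)%s(%s)|(%s)%s(%s)' % (word_a, gap, word_b, word_b, gap, word_a)

pattern = build_pattern('john', 'jack')
m = re.search(pattern, 'well john met jack yesterday')
# Only one alternation branch matches, so drop the other branch's None groups.
words = [g for g in m.groups() if g] if m else []
```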

Neo4j MATCH then MERGE too many DB hits

This is the query:
MATCH (n:Client{curp:'SOME_VALUE'})
WITH n
MATCH (n)-[:HIZO]-()-[r:FB]-()-[:HIZO]-(m:Client)
WHERE ID(n)<>ID(m)
AND NOT (m)-[:FB]->(n)
MERGE (n)-[:FB]->(m) RETURN m.curp
Why is the MERGE stage getting so many DB hits if the query has already narrowed down the n, m pairs to 6,781 rows?
The details of that stage show this:
n, m, r
(n)-[ UNNAMED155:FB]->(m)
Keep in mind that queries build up rows, and operations in your query get run on every row that is built up.
Because the pattern in your MATCH may find multiple paths to the same :Client, it will build up multiple rows with the same n and m (though possibly a different r; since you aren't using r anywhere else in your query, I encourage you to remove the variable).
This means that even though you mean to MERGE a single relationship between n and a distinct m, the MERGE operation will actually be run for every duplicate row of n and m. One of those MERGEs will create the relationship; the others will waste cycles matching the relationship that was just created, without doing anything more.
That's why we should be able to lower our db hits by only considering distinct pairs of n and m before doing the MERGE.
Also, since your query made sure we're only considering n and m where the relationship doesn't exist, we can safely use CREATE instead of MERGE, and it should save us some db hits because MERGE always attempts a MATCH first, which isn't necessary.
An improved query might look like this:
MATCH (n:Client{curp:'SOME_VALUE'})
WITH n
MATCH (n)-[:HIZO]-()-[:FB]-()-[:HIZO]-(m:Client)
WHERE n <> m
AND NOT (m)-[:FB]->(n)
WITH DISTINCT n, m
MERGE (n)-[:FB]->(m)
RETURN m.curp
EDIT
Returning the query to use MERGE for the :FB relationship, as attempts to use CREATE instead ended up not being as performant.
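Either way, the effect of the WITH DISTINCT line can be modeled in plain Python (illustrative only; the rows and counts are made up):

```python
# MATCH builds one row per path, so the same (n, m) pair can appear many times.
rows = [("n1", "m1"), ("n1", "m1"), ("n1", "m2"), ("n1", "m2"), ("n1", "m2")]

# Without DISTINCT, the MERGE runs once per row: five operations, only two useful.
merges_without_distinct = len(rows)

# WITH DISTINCT n, m collapses duplicate rows first: one MERGE per unique pair.
merges_with_distinct = len(set(rows))
```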

Why does union not return a unique list?

Is there a logical reason why the following statement from the Hyperspec is the way it is?
"If there is a duplication between list-1 and list-2, only one of the duplicate instances will be in the result. If either list-1 or list-2 has duplicate entries within it, the redundant entries might or might not appear in the result."
Until I read this I was assuming that union should return a unique list, and I was frustrated that my code didn't do so. It also seems odd to remove duplicates between lists but not within them. Why even specify this?
It seems like one should be able to assume union will produce a unique list of the set's elements or am I missing something?
For the full page in Hyperspec see http://clhs.lisp.se/Body/f_unionc.htm
If your code has sets only with unique elements (like (1 2 3)), then UNION will preserve this property.
If your code has sets with non-unique elements (like (1 2 2 3)), then UNION does not need to make any effort to enforce uniqueness in the result set.
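A Python sketch of one conforming implementation illustrates this (union here is a model of the behavior the spec permits, not CL's actual algorithm; the spec also leaves the result order unspecified):

```python
# One valid UNION: keep list1 as-is, append elements of list2 not already
# present. Cross-list duplicates appear once; within-list duplicates survive.
def union(list1, list2):
    return list1 + [x for x in list2 if x not in list1]

union([1, 2, 3], [3, 4])     # -> [1, 2, 3, 4]: cross-list duplicate removed
union([1, 2, 2, 3], [3, 4])  # -> [1, 2, 2, 3, 4]: the within-list 2 remains
```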
Removing duplicates is done with a separate function: REMOVE-DUPLICATES.
Assuming that the elements of both lists passed to UNION are unique means the complexity of the algorithm in the worst case (non-sortable, non-hashable elements) is O(n*m). Removing duplicates within a list in that case is, on the other hand, O(n^2). Making UNION remove duplicates would approximately triple the running time even when there are no duplicates, since most of the time is consumed by doing the comparisons.
