Is there a difference between Alter.Table().AddColumn() and Create.Column().OnTable()? - fluent-migrator

FluentMigrator supports two ways to add a column to a table:
Alter.Table("foo").AddColumn("bar")...
and
Create.Column("bar").OnTable("foo")...
Is there any difference between the two, or are they synonymous?
The documentation on Auto-reversing migrations says that Create.Column is supported, but it does not list Alter.Table.
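To make the comparison concrete, here is how the two forms look side by side inside a migration; a minimal sketch assuming the standard fluent API, with the table and column names from above and an arbitrary column type:
using FluentMigrator;

[Migration(1)]
public class AddBarColumn : Migration
{
    public override void Up()
    {
        // Form 1: start from the table, then add the column to it.
        Alter.Table("foo").AddColumn("bar").AsString().Nullable();

        // Form 2: start from the column, then attach it to the table.
        // Create.Column("bar").OnTable("foo").AsString().Nullable();
    }

    public override void Down()
    {
        Delete.Column("bar").FromTable("foo");
    }
}
As far as the generated SQL goes, both forms should produce an ALTER TABLE ... ADD COLUMN statement; the practical difference the question points at is auto-reversal, which the documentation only promises for the Create.Column form (an AutoReversingMigration defines only Up(), so expressions it doesn't support cannot be reversed automatically).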

Related

Efficient way to create relationships from a csv in neo4j

I am trying to populate my graph db with relationships that I currently have access to in a file.
They are in a form where each line of the relationship CSV holds the unique IDs of the two nodes the relationship connects, as well as the kind of relationship it is.
Each line in the relationship CSV looks something like:
uniqueid1,uniqueid2,relationship_name,property1_value,..., propertyn_value
I already had all the nodes created, and was working on matching the nodes to the unique IDs specified in each file and then creating the relationship between them.
However, the relationships are taking a long time to create, and my suspicion is that I am doing something wrong.
The CSV file has about 2.5 million lines with different relationship types, so I manually set the relationships.rela property to one of them, create all the relationships of that type, and then follow up with the next one using my WHERE clause.
The properties each node has have been reduced with an ellipsis (...) and the names redacted.
I currently have the query to create the relationships set up in the following way:
:auto USING PERIODIC COMMIT 100 LOAD CSV WITH HEADERS FROM 'file:///filename.csv' as relationships
WITH relationships.uniqueid1 as uniqueid1, relationships.uniqueid2 as uniqueid2, relationships.extraproperty1 as extraproperty1, relationships.rela as rela... , relationships.extrapropertyN as extrapropertyN
WHERE rela = "manager_relationship"
MATCH (a:Item {uniqueid: uniqueid1})
MATCH (b:Item {uniqueid: uniqueid2})
MERGE (b) - [rel: relationship_name {propertyvalue1: extraproperty1,...propertyvalueN: extrapropertyN }] -> (a)
RETURN count(rel)
I would appreciate it if alternate patterns could be recommended.
Indexing is a mechanism databases use to speed up data lookups. In your case, since Item nodes are not indexed, these two MATCH clauses can take a lot of time, especially if the number of Item nodes is very large.
MATCH (a:Item {uniqueid: uniqueid1})
MATCH (b:Item {uniqueid: uniqueid2})
To speed this up, you can create an index on the Item nodes' uniqueid property, like this:
CREATE INDEX unique_id_index FOR (n:Item) ON (n.uniqueid)
When you run your import query after creating the index, it will be much faster. It will still take a while, though, as there are 2.5 million relationships. Read more about indexing in neo4j here.
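As a side note, index population is asynchronous, so it is worth confirming the index is online before re-running the import; a minimal sketch, assuming a Neo4j 4.x install (the comments just describe intent):
SHOW INDEXES              // the new index should report state ONLINE
CALL db.awaitIndexes()    // or simply block until all indexes have come online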
Aside from Charchit's suggestion to create an index, I recommend using the APOC procedure apoc.periodic.iterate, which will execute the query in parallel batches of 10k rows.
https://neo4j.com/labs/apoc/4.4/overview/apoc.periodic/apoc.periodic.iterate/
For example:
CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM 'file:///filename.csv' as relationships RETURN relationships",
"WITH relationships.uniqueid1 as uniqueid1, relationships.uniqueid2 as uniqueid2, relationships.extraproperty1 as extraproperty1, relationships.rela as rela... , relationships.extrapropertyN as extrapropertyN
WHERE rela = 'manager_relationship'
MATCH (a:Item {uniqueid: uniqueid1})
MATCH (b:Item {uniqueid: uniqueid2})
MERGE (b)-[rel:relationship_name {propertyvalue1: extraproperty1,...propertyvalueN: extrapropertyN}]->(a)",
{batchSize:10000, parallel:true})
The first query streams all the rows from the CSV file; apoc.periodic.iterate then divides them into batches of 10k rows and runs the second query over each batch in parallel, using the default concurrency (50 workers).
I use it often to load 40M nodes/edges in about 30 minutes.
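One caveat worth adding: with parallel:true, batches that touch the same nodes can contend for locks (for example, many relationships fanning into a few dense nodes), and batches may fail or need retries. If that happens, a safe fallback is to keep the batching but drop the parallelism; same call as above, only the config changes:
CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM 'file:///filename.csv' as relationships RETURN relationships",
"<same inner query as above>",
{batchSize:10000, parallel:false})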

Can sqlite-utils convert function select two columns?

I'm using sqlite-utils to load a CSV into SQLite, which will later be served via Datasette. I have two columns, likes and dislikes. I would like a third column, quality_score, computed by adding likes and dislikes together and then dividing likes by the total.
The sqlite-utils convert function should be my best bet, but all I see in the documentation is how to select a single column for conversion.
sqlite-utils convert content.db articles headline 'value.upper()'
From the example given, it looks like convert is followed by the database filename, the table name, and then the column you want to operate on. Is it possible to simply add another column name, or is there a flag for selecting more than one column to operate on? I would be really surprised if this weren't possible; I just can't find any documentation to support it.
This isn't a perfect answer, as it doesn't resolve whether sqlite-utils supports multiple column selection for transforms, but this is how I solved this particular problem.
Since my quality_score column would just be basic math, I was able to make use of SQLite's Generated Columns. I created a file called quality_score.sql that contained:
ALTER TABLE testtable
ADD COLUMN quality_score GENERATED ALWAYS AS (likes /(likes + dislikes));
and then implemented it by:
$ sqlite3 mydb.db < quality_score.sql
You do need to make sure you are using a compatible version of SQLite, as generated columns only work in version 3.31 or later.
Another consideration is to make sure you are performing the math on integers or floats, and not text.
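On that note, bear in mind that dividing two INTEGER columns in SQLite performs integer division, so likes / (likes + dislikes) truncates to 0 for any score below 100%. A variant of the same generated column that casts to a float first (same table and column names as above, offered as a sketch rather than a drop-in fix):
ALTER TABLE testtable
ADD COLUMN quality_score GENERATED ALWAYS AS (CAST(likes AS REAL) / (likes + dislikes));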
I also attempted to create the table with the virtual generated column first and fill it with my data later, but that didn't work in my case - it threw an error saying the number of items provided didn't match the number of columns available. So I just stuck with the ALTER operation after the fact.

Add node to linked list after given node in cypher

I have the following graph:
(Boxer)-[:STARTS]->(Round)-[:CONTINUES]->(Round)-[:CONTINUES]->(Round)-[:CONTINUES]->(Round)
I want to insert a new Round AFTER a specified Round called prevRound. Right now I am doing this:
MERGE (round:Round {uuid: $round.uuid})
MERGE (prevRound:Round {uuid: $prevRound.uuid})
MERGE (prevRound)-[oldRel:CONTINUES]->(nextRound)
MERGE (prevRound)-[:CONTINUES]->(round)-[:CONTINUES]->(nextRound)
DELETE oldRel
This works but it will actually create an empty node when I try to insert a node at the end of the list. I know it's because of:
MERGE (prevRound)-[oldRel:CONTINUES]->(nextRound)
Indeed, this will create a nextRound node when one does not exist.
How can I prevent that?
I tried with OPTIONAL MATCH, but it did not work well.
MERGE is not the right clause to use here, since as you saw it will create the whole pattern if it does not exist, giving you a blank node and a relationship to it from prevRound. OPTIONAL MATCH is the correct clause to use for that line (though you do need a WITH clause between it and the preceding MERGE)... but a better approach would actually be to rearrange your query a little (see the last paragraph).
You should also split up the last MERGE, since a longer pattern like this will likely not do what you expect under certain circumstances. Read our knowledge base article on understanding how MERGE works for some of the finer details that might otherwise trip you up.
We can actually accomplish what you want fairly simply by rearranging parts of your query.
MERGE (round:Round {uuid: $round.uuid})
MERGE (prevRound:Round {uuid: $prevRound.uuid})
WITH round, prevRound
OPTIONAL MATCH (prevRound)-[oldRel:CONTINUES]->(nextRound)
DELETE oldRel
MERGE (prevRound)-[:CONTINUES]->(round)
WITH round, nextRound
WHERE nextRound IS NOT NULL
MERGE (round)-[:CONTINUES]->(nextRound)
We guard the MERGE between round and nextRound by the preceding WHERE clause, which filters out any rows where nextRound doesn't exist.
A perhaps simpler way to do this, though slightly less efficient, is to deal with the nodes you know exist first (round and prevRound), then deal with the pattern that may or may not exist: the MATCH to the old node. You will need to do a bit of filtering, since the MATCH would otherwise also pick up the relationship you just created to round:
MERGE (round:Round {uuid: $round.uuid})
MERGE (prevRound:Round {uuid: $prevRound.uuid})
MERGE (prevRound)-[:CONTINUES]->(round)
WITH round, prevRound
MATCH (prevRound)-[oldRel:CONTINUES]->(nextRound)
WHERE nextRound <> round
DELETE oldRel
MERGE (round)-[:CONTINUES]->(nextRound)
You might also consider whether there are any places where you know such a relationship does not already exist, and if so, use CREATE instead of MERGE. I have a feeling the last MERGE here could probably be a CREATE instead.
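For reference, that swap is a one-line change to the second query; a sketch assuming the old outgoing relationship has always been deleted by this point, so the new relationship cannot already exist:
MERGE (round:Round {uuid: $round.uuid})
MERGE (prevRound:Round {uuid: $prevRound.uuid})
MERGE (prevRound)-[:CONTINUES]->(round)
WITH round, prevRound
MATCH (prevRound)-[oldRel:CONTINUES]->(nextRound)
WHERE nextRound <> round
DELETE oldRel
CREATE (round)-[:CONTINUES]->(nextRound)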

Multiple feature vectors in weka

I am working on a text categorization project using Weka, and I have 12 classes.
I need to find the text keywords for each class that distinguish it from the others,
so I am thinking of building a feature vector (FV) for each class independently and storing the 12 FVs in 12 separate ARFF files!
The question is: how can I combine 12 different feature vectors into one feature vector?
Depending on whether the classes overlap, I propose two different approaches instead of joining the feature vectors:
If classes are not overlapping (that is, no document is in two or more classes at the same time), you would rather build a single ARFF file and then make use of the AttributeSelection filter (Ranker search, InfoGainAttributeEval evaluator suggested) to determine which features most discriminate among all the classes (see the sketch below).
If classes are overlapping, you could build twelve one-against-the-rest classifiers, each one with its own vocabulary. You could apply attribute selection to each independent problem as well, finding the features that best discriminate a single class from all of the rest.
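A minimal sketch of the first approach in Java, assuming a hypothetical documents.arff with the class attribute in the last position (the file name and the number of features to keep are assumptions, not from the question):
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KeywordSelection {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF file; the class attribute is assumed to be last.
        Instances data = DataSource.read("documents.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Rank all features by information gain with respect to the class.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(50); // keep the 50 most discriminative terms
        selector.setSearch(ranker);
        selector.SelectAttributes(data);

        // Print the names of the selected attributes (the candidate keywords).
        for (int idx : selector.selectedAttributes()) {
            if (idx == data.classIndex()) continue; // the class index is appended to the result
            System.out.println(data.attribute(idx).name());
        }
    }
}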

Creating a boxplot from two tables in KNIME

I am trying to plot two data columns coming from different tables using KNIME. I want to plot them in a single plot to make them easier to compare. In R this can be achieved with the following:
boxplot(df$Delay, df2$Delay,names=c("Column from Table1","Column from Table2"), outline=FALSE)
However, by using KNIME, I cannot think of a way that you can use data coming from two different tables. Have you ever faced this issue in KNIME?
If you have the same number of rows in the two columns, you can use the Column Appender node to get both columns into the same table. If not, you can use the Column Joiner node to create a superset of both columns.
According to the discussion in the comments, a working solution would be the following:
Installing the KNIME Interactive R Statistics Integration (you might already have it installed)
Using the Add Table To R node to add the second table to R
I guess the usual R code can be used to create the figures
The above answer using the "Add Table To R" node is a very nice option.
You could also do it beforehand in KNIME. If the two tables have the same columns, you could concatenate them using the "Concatenate" node and, if needed, mark which table each row originally came from with the "Constant Value" node.
If the two tables have different columns but some row identifiers in common, you could join them into one table using the "Joiner" node. Then pass the concatenated or joined table over to R.
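Putting the R route together; a sketch that assumes (and this is the part to verify against your KNIME version) that each R scripting node exposes its incoming table as knime.in, and that the node names match your installation:
# In the Table to R node: keep the first table under its own name.
df1 <- knime.in
# In the Add Table To R node: keep the second table too.
df2 <- knime.in
# In a downstream R View (Workspace) node: reuse the question's call.
boxplot(df1$Delay, df2$Delay,
        names = c("Column from Table1", "Column from Table2"),
        outline = FALSE)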
