How to get the out-degree/in-degree of a vertex with a given name in Nebula Graph? - nebula-graph

I have created a set of data in Nebula Graph, and I want to know how to get the out-degree and in-degree of a vertex with a given name.
(root@nebula) [basketballplayer]> match (v) return v limit 10
+-----------------------------------------------------------+
| v |
+-----------------------------------------------------------+
| ("player102" :player{age: 33, name: "LaMarcus Aldridge"}) |
| ("player106" :player{age: 25, name: "Kyle Anderson"}) |
| ("player115" :player{age: 40, name: "Kobe Bryant"}) |
| ("player129" :player{age: 37, name: "Dwyane Wade"}) |
| ("player138" :player{age: 38, name: "Paul Gasol"}) |
| ("team209" :team{name: "Timberwolves"}) |
| ("team225" :team{name: "Bucks"}) |
| ("team226" :team{name: "Magic"}) |
| ("player108" :player{age: 36, name: "Boris Diaw"}) |
| ("player122" :player{age: 30, name: "DeAndre Jordan"})    |
+-----------------------------------------------------------+

The out-degree of a vertex refers to the number of edges starting from that vertex, while the in-degree refers to the number of edges pointing to that vertex.
nebula > MATCH (s)-[e]->() WHERE id(s) == "given" RETURN count(e); #Out-degree
nebula > MATCH (s)<-[e]-() WHERE id(s) == "given" RETURN count(e); #In-degree
Note that these queries match by vertex ID; if you only have the name, you first need to look the vertex up (for example through an index on the name property) to get its ID. Getting the out/in-degree this way is a slow operation, since no acceleration can be applied (there are no indexes or caches for it). It can also run out of memory when it hits a super-node.

Related

DynamoDB intersection select with pagination

I have the following DB schema and I'd like to find the best way to select the list of sort keys that are common to PK_A and PK_B:
+---------------+---------+
| PK            | SortKey |
+---------------+---------+
|               | SK_A    |
| PK_A          | SK_B    |
|               | SK_C    |
| - - - - - - - |         |
|               | SK_B    |
| PK_B          | SK_C    |
|               | SK_D    |
+---------------+---------+
So when I select by PK_A and PK_B, it should return only SK_B and SK_C?
Any help is appreciated.
Simple answer: you can't do it (in one call).
Dynamo is not a relational database; operations such as intersection are not supported.
You'd need to query() once for each partition key and then compute the intersection yourself client-side.
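For illustration, a minimal Scala sketch of that client-side step, assuming the sort keys returned by the two query() calls have already been collected into sets (the values below are just the ones from the question):
// Hypothetical result sets: the sort keys returned by one query() per partition key.
val sortKeysA: Set[String] = Set("SK_A", "SK_B", "SK_C") // query for PK_A
val sortKeysB: Set[String] = Set("SK_B", "SK_C", "SK_D") // query for PK_B

// DynamoDB cannot intersect them for you, so do it client-side.
val common: Set[String] = sortKeysA intersect sortKeysB
// common == Set("SK_B", "SK_C")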

Get column names in its own column to render pie chart

I am writing a query in Kusto to parse heartbeat data from a sensor. This is what I've written:
datatable(timestamp:datetime, healthycount:int, unhealthycount:int, origin:string)
[
datetime(1910-06-11), 10, 1, 'origin',
datetime(1910-05-11), 9, 2, 'origin'
]
| summarize latest = arg_max(timestamp, *) by origin
| project healthy = healthycount,
unhealthy = unhealthycount
This outputs data like this:
+--------------+----------------+
| healthy | unhealthy |
+--------------+----------------+
| 10 | 1 |
+--------------+----------------+
However, I want to represent this data as a pie chart, but to do that I need the data in the following format:
+----------------+-------+
| key | value |
+----------------+-------+
| healthy | 10 |
| unhealthy | 1 |
+----------------+-------+
Is it possible to do this? What terminology am I looking for?
Here is one way:
datatable(timestamp:datetime, healthycount:int, unhealthycount:int, origin:string)
[
datetime(1910-06-11), 10, 1, 'origin',
datetime(1910-05-11), 9, 2, 'origin'
]
| summarize arg_max(timestamp, *) by origin
| extend Pack = pack("healthycount", healthycount, "unhealthycount", unhealthycount)
| mv-expand kind=array Pack
| project key = tostring(Pack[0]), value = toint(Pack[1])

Cassandra filtering map and other collections

I am trying to do the following.
Connected to DevCluster
[cqlsh 5.0.1 | Cassandra 3.10.0.1695 | DSE 5.1.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
user@cqlsh:test> desc table del28;
CREATE TABLE test.del28 (
sno int PRIMARY KEY,
dob date,
name range_dates,
ssss_details map<text, date>,
ssss_range map<text, frozen<map<date, date>>> );
CREATE INDEX idx_ssss_range ON test.del28 (keys(ssss_range));
CREATE INDEX ssss_details_idx ON test.del28 (values(ssss_details));
CREATE INDEX ssss_range_idx ON test.del28 (values(ssss_range));
user@cqlsh:test> select * from del28;
sno | dob | name | ssss_details | ssss_range
-----+------+--------------------------------------+----------------------------------------------+---------------------------------
5 | null | {start: 2014-03-05, end: 2018-04-05} | {'hello': 2014-05-05} | {'1': {2018-04-05: 2012-02-05}}
8 | null | {start: 2018-03-04, end: 2018-08-02} | {'hello8': 2018-08-08} | {'8': {2018-08-08: 2012-02-08}}
2 | null | {start: 2018-03-04, end: 2018-05-05} | {'hello': 2018-05-05} | {'1': {2018-07-08: 2018-09-01}}
4 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello1': 2014-05-02} | {'1': {2018-04-08: 2012-02-04}}
7 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello4': 2014-05-03, 'hello5': 2014-05-02} | {'2': {2018-04-08: 2012-02-04}}
6 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello2': 2014-05-02, 'hello3': 2014-05-03} | {'2': {2018-04-08: 2012-02-04}}
9 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello7': 2014-05-02, 'hello8': 2014-05-03} | {'2': {2018-04-08: 2012-02-04}}
3 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello': 2014-05-02} | {'1': {2018-04-08: 2012-02-04}}
(8 rows)
My question is: can I use filters on ssss_range, and if so, how? If not, what is the best way to model this data? The idea is that a number or text key is followed by a pair of dates, e.g. house1: {2012-04-05: 2013-02-05}, house2: {2013-04-08: 2014-02-04}, ... for one particular user, where the dates describe the period that person stayed there. I tried splitting the dates into the 'name' column, but that did not work for me either. There is also a lot of other information in this record.
I should be able to query based on house1, house2, i.e. something like where aaa = 'house1'. I should also be able to query based on dates, i.e. something like where from_date > '' and to_date < ''.
I am okay with changing the way the data is modeled if that allows better querying. Any collections or data types are fine.
Please suggest the right approach.
Thanks

Find Interval Numbers with Quantile function on a Dataframe with Spark using Scala

I'm trying to port code from R to Scala to perform customer analysis. I have already computed the Recency, Frequency and Monetary factors in Spark, stored in a DataFrame.
Here is the schema of the DataFrame:
df.printSchema
root
|-- customerId: integer (nullable = false)
|-- recency: long (nullable = false)
|-- frequency: long (nullable = false)
|-- monetary: double (nullable = false)
And here is a data sample as well:
df.orderBy($"customerId").show
+----------+-------+---------+------------------+
|customerId|recency|frequency| monetary|
+----------+-------+---------+------------------+
| 1| 297| 114| 733.27|
| 2| 564| 11| 867.66|
| 3| 1304| 1| 35.89|
| 4| 287| 25| 153.08|
| 6| 290| 94| 316.772|
| 8| 1186| 3| 440.21|
| 11| 561| 5| 489.70|
| 14| 333| 57| 123.94|
I'm trying to find, for each column, the interval that each value falls into on a quantile vector, given a probability segment.
In other words, given a vector of non-decreasing breakpoints (in my case the quantile vector), find the interval containing each element of x;
i.e. (pseudo-code):
if i <- findInterval(x, v), then for each index j in x,
v[i[j]] ≤ x[j] < v[i[j] + 1], where v[0] := -Inf, v[N+1] := +Inf, and N <- length(v).
In R, this translates to the following code:
probSegment <- c(0.0, 0.25, 0.50, 0.75, 1.0)
RFM_table$Rsegment <- findInterval(RFM_table$Recency, quantile(RFM_table$Recency, probSegment))
RFM_table$Fsegment <- findInterval(RFM_table$Frequency, quantile(RFM_table$Frequency, probSegment))
RFM_table$Msegment <- findInterval(RFM_table$Monetary, quantile(RFM_table$Monetary, probSegment))
I'm kind of stuck with the quantile function, though.
In an earlier discussion with @zero323, he suggested that I use the percentRank window function, which can be used as a shortcut. I'm not sure that I can apply the percentRank function in this case.
How can I apply a quantile function on a Dataframe column with Scala Spark? If this is not possible, can I use the percentRank function instead?
Thanks.
Well, I still believe that percent_rank is good enough here. The percent_rank window function is computed as
    percent_rank = (rank - 1) / (n - 1)
where n is the number of rows in the partition. Let's call this value pr. Rearranging,
    rank = pr * (n - 1) + 1
gives a definition of a percentile used, according to Wikipedia, by Microsoft Excel.
So the only thing you really need is a findInterval UDF which will return the correct interval index. Alternatively, you can use rank directly and match on rank ranges.
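For illustration only, here is a rough Scala sketch of the percent_rank idea (it assumes the "recency" column and the probability breakpoints from the question; this is not a complete solution):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, percent_rank, udf}

// Probability breakpoints from the question.
val probs = Array(0.0, 0.25, 0.50, 0.75, 1.0)

// Window ordered by the column we want to segment (no partitioning).
val w = Window.orderBy(col("recency"))

// Map a percent_rank value to the index of the interval it falls into.
val interval = udf((pr: Double) =>
  scala.math.abs(java.util.Arrays.binarySearch(probs, pr) + 1))

val withSegment = df
  .withColumn("pr", percent_rank().over(w))
  .withColumn("Rsegment", interval(col("pr")))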
Edit
OK, it looks like percent_rank is not a good idea after all:
WARN Window: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation
I am not exactly sure what the point of moving all data to a single partition to call a non-aggregate function is, but it looks like we are back to square one. It is possible to use zipWithIndex on a plain RDD:
import org.apache.spark.sql.{Row, DataFrame, Column}
import org.apache.spark.sql.types.{StructType, StructField, LongType}
import org.apache.spark.sql.functions.udf
val df = sc.parallelize(Seq(
(1, 297, 114, 733.27),
(2, 564, 11, 867.66),
(3, 1304, 1, 35.89),
(4, 287, 25, 153.08),
(6, 290, 94, 316.772),
(8, 1186, 3, 440.21),
(11, 561, 5, 489.70),
(14, 333, 57, 123.94)
)).toDF("customerId", "recency", "frequency", "monetary")
df.registerTempTable("df")
sqlContext.cacheTable("df")
A small helper:
def addRowNumber(df: DataFrame): DataFrame = {
  // Prepare new schema
  val schema = StructType(
    StructField("row_number", LongType, false) +: df.schema.fields)
  // Add row number
  val rowsWithIndex = df.rdd.zipWithIndex
    .map{case (row: Row, idx: Long) => Row.fromSeq(idx +: row.toSeq)}
  // Create DataFrame
  sqlContext.createDataFrame(rowsWithIndex, schema)
}
and the actual function:
def findInterval(df: DataFrame, column: Column,
    probSegment: Array[Double], outname: String): DataFrame = {
  val n = df.count
  // Map quantiles to indices
  val breakIndices = probSegment.map(p => (p * (n - 1)).toLong)
  // Add row number
  val dfWithRowNumber = addRowNumber(df.orderBy(column))
  // Map indices to values
  val breaks = dfWithRowNumber
    .where($"row_number".isin(breakIndices:_*))
    .select(column.cast("double"))
    .map(_.getDouble(0))
    .collect
  // Get interval
  val f = udf((x: Double) =>
    scala.math.abs(java.util.Arrays.binarySearch(breaks, x) + 1))
  // Final result
  dfWithRowNumber
    .select($"*", f(column.cast("double")).alias(outname))
    .drop("row_number")
}
and example usage:
scala> val probs = Array(0.0, 0.25, 0.50, 0.75, 1.0)
probs: Array[Double] = Array(0.0, 0.25, 0.5, 0.75, 1.0)
scala> findInterval(df, $"recency", probs, "interval").show
+----------+-------+---------+--------+--------+
|customerId|recency|frequency|monetary|interval|
+----------+-------+---------+--------+--------+
| 4| 287| 25| 153.08| 1|
| 6| 290| 94| 316.772| 2|
| 1| 297| 114| 733.27| 2|
| 14| 333| 57| 123.94| 3|
| 11| 561| 5| 489.7| 3|
| 2| 564| 11| 867.66| 4|
| 8| 1186| 3| 440.21| 4|
| 3| 1304| 1| 35.89| 5|
+----------+-------+---------+--------+--------+
but I guess it is far from optimal.
Spark 2.0+:
You could replace manual rank computation with DataFrameStatFunctions.approxQuantile. This would allow for faster interval computation:
val relativeError: Double = ????
val breaks = df.stat.approxQuantile("recency", probs, relativeError)
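For example, a sketch that reuses the binarySearch-based UDF from above (assuming the "recency" column and, just for illustration, a relative error of 0.0):
import org.apache.spark.sql.functions.{col, udf}

val probs = Array(0.0, 0.25, 0.50, 0.75, 1.0)
val breaks = df.stat.approxQuantile("recency", probs, 0.0)

// Same interval lookup as in findInterval above, but the break values
// come from approxQuantile instead of row-number lookups.
val f = udf((x: Double) =>
  scala.math.abs(java.util.Arrays.binarySearch(breaks, x) + 1))

val withInterval = df.withColumn("interval", f(col("recency").cast("double")))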
This can be achieved with Bucketizer. Using the same data frame as in the example above:
import org.apache.spark.ml.feature.Bucketizer
val df = sc.parallelize(Seq(
(1, 297, 114, 733.27),
(2, 564, 11, 867.66),
(3, 1304, 1, 35.89),
(4, 287, 25, 153.08),
(6, 290, 94, 316.772),
(8, 1186, 3, 440.21),
(11, 561, 5, 489.70),
(14, 333, 57, 123.94)
)).toDF("customerId", "recency", "frequency", "monetary")
val targetVars = Array("recency", "frequency", "monetary")
val probs = Array(0.0, 0.25, 0.50, 0.75, 1.0)
val outputVars = for(varName <- targetVars) yield varName + "Segment"
val breaksArray = for (varName <- targetVars) yield
  df.stat.approxQuantile(varName, probs, 0.0)
val bucketizer = new Bucketizer()
.setInputCols(targetVars)
.setOutputCols(outputVars)
.setSplitsArray(breaksArray)
val df_e = bucketizer.transform(df)
df_e.show
Result:
targetVars: Array[String] = Array(recency, frequency, monetary)
outputVars: Array[String] = Array(recencySegment, frequencySegment, monetarySegment)
breaksArray: Array[Array[Double]] = Array(Array(287.0, 290.0, 333.0, 564.0, 1304.0), Array(1.0, 3.0, 11.0, 57.0, 114.0), Array(35.89, 123.94, 316.772, 489.7, 867.66))
+----------+-------+---------+--------+--------------+----------------+---------------+
|customerId|recency|frequency|monetary|recencySegment|frequencySegment|monetarySegment|
+----------+-------+---------+--------+--------------+----------------+---------------+
| 1| 297| 114| 733.27| 1.0| 3.0| 3.0|
| 2| 564| 11| 867.66| 3.0| 2.0| 3.0|
| 3| 1304| 1| 35.89| 3.0| 0.0| 0.0|
| 4| 287| 25| 153.08| 0.0| 2.0| 1.0|
| 6| 290| 94| 316.772| 1.0| 3.0| 2.0|
| 8| 1186| 3| 440.21| 3.0| 1.0| 2.0|
| 11| 561| 5| 489.7| 2.0| 1.0| 3.0|
| 14| 333| 57| 123.94| 2.0| 3.0| 1.0|
+----------+-------+---------+--------+--------------+----------------+---------------+

How to get a mixed object types resultset from Doctrine2 Single Table Inheritance repository?

For the following schema:
Animal
- age
- gender
- size
Cat extends Animal
- fur_color
Snake extends Animal
- scales_color
Elephant extends Animal
- tusks_size
When I do $em->getRepository('AcmeDemoBundle:Animal')->findAll() I will receive a collection of Animal objects without their subclass properties.
When I do $em->getRepository('AcmeDemoBundle:Cat')->findAll() I will receive the objects with their subclass (Cat) properties, however I will only get Cat objects (no snakes or elephants).
1) Is there a way to get all the animals, not as base Animal objects but as instances of their leaf subclass types?
Eg. for database like this:
Animals table:
ID | discr | age | gender | size | fur_color | scales_color | tusks_size
1 | snake | 2 | male | 20ft | NULL | green | NULL
2 | cat | 3 | female | 5ft | red | NULL | NULL
3 | eleph | 6 | male | 99ft | NULL | NULL | 40ft.
4 | cat | 2 | male | 6ft | grey | NULL | NULL
I'd like to recieve a Collection of:
Snake (id: 1, age: 2, gender: male, size: 20ft, scales_color: green)
Cat (id: 2, age: 3, gender: female, size: 5ft, fur_color: red)
Elephant (id: 3, age: 6, gender: male, size: 99ft, tusks_size: 40ft.)
Cat (id: 4, age: 2, gender: male, size: 6ft, fur_color: grey)
2) If it's not possible with STI... is it possible with Class Table Inheritance?
Indeed, it seems I had an error in my configuration. Recreating the bundle and writing the entities again fixed the problem, as @Bez and @Cerad suggested.
