How can I avoid collecting multiple instances of vertices and edges? - gremlin

I've created a query to return all "descendant" edges and vertices for a particular queried vertex.
The query works, but it duplicates edges and vertices with each iteration of the repeat step. At the end of the repeat step, the traverser takes the tail of these aggregated collections, tossing away all the previous iterations of aggregated data.
How can I avoid all these pointless collections and just create the tail collection that is returned at the end?
It is as if each iteration of repeat caches the complete result as a new collection. For example, if we track _item across iterations we get:
3__A.1.2
3__A.1.2 , 4__A.1.2.1
3__A.1.2 , 4__A.1.2.1 , 5__A.1.2.1.1
3__A.1.2 , 4__A.1.2.1 , 5__A.1.2.1.1 , 5__A.1.2.1.2
At the end of the query, the tail step returns iteration number 4.
Graph: (diagram not reproduced here)
Query:
g.V("3__A.1.2")
  .inE("hasFacilityItem") // steps to edge toward Item
  .repeat(
    hasLabel("hasFacilityItem")
      .sideEffect(
        fold().aggregate("_hasFacilityItem")
      )
      .sideEffect(
        outV().fold().aggregate("_item")
      )
      .inV().hasLabel("FacilityItem")
      .sideEffect(
        fold().aggregate("_facilityItem")
      )
      .sideEffect(
        coalesce(outE("containedByFacilityItem").not(outV().has("id", "3__A.1.2")), constant()) // returns [] if edge is first queried
          .fold().aggregate("_contained")
      )
      .sideEffect(
        coalesce(outE("containsFacilityItem"), constant()) // returns [] if edge is not extant
          .fold().aggregate("_contains")
      )
      .coalesce(outE("containsFacilityItem").inV().inE("hasFacilityItem"), constant()) // returns [] if last vertex was reached
  )
  .emit()
  .select("_item").tail().as("item")
  .select("_hasFacilityItem").tail().as("hasFacilityItem")
  .select("_facilityItem").tail().as("facilityItem")
  .select("_contained").tail().as("contained")
  .select("_contains").tail().as("contains")
  .select("item", "hasFacilityItem", "facilityItem", "contained", "contains")
Setup (note: I wrote this by hand, so it may contain typos):
g
.addV('Item').as('_3__A.1.2')
.addV('Item').as('_4__A.1.2.1')
.addV('Item').as('_5__A.1.2.1.1')
.addV('Item').as('_5__A.1.2.1.2')
.addV('FacilityItem').as('3__A.1.2')
.addV('FacilityItem').as('4__A.1.2.1')
.addV('FacilityItem').as('5__A.1.2.1.1')
.addV('FacilityItem').as('5__A.1.2.1.2')
.addE('hasFacilityItem').from('_3__A.1.2').to('3__A.1.2')
.addE('hasFacilityItem').from('_4__A.1.2.1').to('4__A.1.2.1')
.addE('hasFacilityItem').from('_5__A.1.2.1.1').to('5__A.1.2.1.1')
.addE('hasFacilityItem').from('_5__A.1.2.1.2').to('5__A.1.2.1.2')
.addE('hasItem').from('3__A.1.2').to('_3__A.1.2')
.addE('hasItem').from('4__A.1.2.1').to('_4__A.1.2.1')
.addE('hasItem').from('5__A.1.2.1.1').to('_5__A.1.2.1.1')
.addE('hasItem').from('5__A.1.2.1.2').to('_5__A.1.2.1.2')
.addE('containsFacilityItem').from('3__A.1.2').to('4__A.1.2.1')
.addE('containsFacilityItem').from('4__A.1.2.1').to('5__A.1.2.1.1')
.addE('containsFacilityItem').from('4__A.1.2.1').to('5__A.1.2.1.2')
.addE('containedByFacilityItem').from('5__A.1.2.1.2').to('4__A.1.2.1')
.addE('containedByFacilityItem').from('5__A.1.2.1.1').to('4__A.1.2.1')
.addE('containedByFacilityItem').from('4__A.1.2.1').to('3__A.1.2')
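The result the query is after (each descendant edge and vertex collected exactly once, with no per-iteration snapshots of the whole path) can be illustrated outside Gremlin with a plain breadth-first walk. This is only a sketch of the intended semantics, not a Gremlin fix; the vertex names and the containsFacilityItem adjacency come from the setup above.

```python
from collections import deque

# containsFacilityItem edges from the setup above
contains = {
    "3__A.1.2": ["4__A.1.2.1"],
    "4__A.1.2.1": ["5__A.1.2.1.1", "5__A.1.2.1.2"],
}

def descendants(root):
    """Collect each descendant exactly once, instead of re-aggregating
    the whole accumulated collection on every iteration."""
    seen, out, queue = {root}, [], deque([root])
    while queue:
        v = queue.popleft()
        for child in contains.get(v, []):
            if child not in seen:   # skip anything already collected
                seen.add(child)
                out.append(child)
                queue.append(child)
    return out

print(descendants("3__A.1.2"))
# ['4__A.1.2.1', '5__A.1.2.1.1', '5__A.1.2.1.2']
```

The `seen` set is the key difference from the repeat/aggregate pattern: membership is checked before collecting, so no per-iteration duplicates accumulate and no tail() step is needed afterwards.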

Related

How to Create an Adjacency List of the Graph Using ArangoDB AQL?

How can I create an adjacency list using just one AQL query?
To create an adjacency list of the graph, I am using two AQL queries.
The 1st query is for the nodes that have adjacent neighbours:
FOR v IN NTWK_feature
  FOR e IN NTWK_edge
    FILTER e._from == v._id || e._to == v._id
    RETURN DISTINCT {
      node: v._key,
      adjNode: (e._from == v._id) ? DOCUMENT(e._to)._key : DOCUMENT(e._from)._key
    }
The 2nd query is for the nodes that have no adjacent neighbours:
FOR node IN NTWK_feature
  LET set = (
    FOR edge IN NTWK_edge
      FILTER edge._to == node._id || edge._from == node._id
      RETURN edge
  )
  FILTER LENGTH(set) == 0
  RETURN DISTINCT {
    node: node._key,
    adjNode: ''
  }
How can I create an adjacency list using just one query?
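The logic the two queries split apart (nodes with neighbours, plus isolated nodes mapped to '') can be folded into one pass. As a language-neutral sketch in plain Python over toy node/edge lists (the lists and keys here are invented stand-ins for NTWK_feature and NTWK_edge, not AQL):

```python
nodes = ["a", "b", "c"]      # stand-in for NTWK_feature _keys
edges = [("a", "b")]         # stand-in for NTWK_edge (_from, _to) pairs

def adjacency(nodes, edges):
    """One pass: every node gets an entry; isolated nodes map to ''."""
    adj = {n: set() for n in nodes}      # seed ALL nodes, even isolated ones
    for f, t in edges:
        adj[f].add(t)                    # undirected: record both directions
        adj[t].add(f)
    return [{"node": n, "adjNode": a or ""}
            for n, nbrs in sorted(adj.items())
            for a in (sorted(nbrs) or [None])]   # [None] keeps isolated nodes

print(adjacency(nodes, edges))
# [{'node': 'a', 'adjNode': 'b'}, {'node': 'b', 'adjNode': 'a'},
#  {'node': 'c', 'adjNode': ''}]
```

The trick is seeding the map with every node up front, so the "no neighbours" case falls out of the same loop instead of needing a second query.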

order of search for Sqlite's "IN" operator guaranteed?

I'm performing a SQLite3 query similar to
SELECT * FROM nodes WHERE name IN ('name1', 'name2', 'name3', ...) LIMIT 1
Am I guaranteed that it will search for name1 first, name2 second, etc? Such that by limiting my output to 1 I know that I found the first hit according to my ordering of items in the IN clause?
Update: with some testing it seems to always return the first hit in the index regardless of the IN order. It's using the order of the index on name. Is there some way to enforce the search order?
The order of the returned rows is not guaranteed to match the order of the items inside the parenthesis after IN.
What you can do is use ORDER BY in your statement with the use of the function INSTR():
SELECT * FROM nodes
WHERE name IN ('name1', 'name2', 'name3')
ORDER BY INSTR(',name1,name2,name3,', ',' || name || ',')
LIMIT 1
This code uses the same list from the IN clause as a string, with the items in the same order, concatenated and separated by commas, assuming that the items themselves do not contain commas.
This way the results are ordered by their position in the list, and LIMIT 1 then returns the one closest to the start of the list.
Another way to achieve the same results is by using a CTE which returns the list along with an Id which serves as the desired ordering of the results, which will be joined to the table:
WITH list(id, item) AS (
SELECT 1, 'name1' UNION ALL
SELECT 2, 'name2' UNION ALL
SELECT 3, 'name3'
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
Or:
WITH list(id, item) AS (
SELECT * FROM (VALUES
(1, 'name1'), (2, 'name2'), (3, 'name3')
)
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
This way you don't have to repeat the list twice.
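The INSTR ordering can be checked quickly with Python's built-in sqlite3 module; the table and names below are the toy ones from the question, with rows deliberately inserted in a different order than the IN list:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nodes (name TEXT)")
# insert in an order that differs from the IN-list order
con.executemany("INSERT INTO nodes VALUES (?)",
                [("name3",), ("name1",), ("name2",)])

# Without ORDER BY, the row picked by LIMIT 1 is unspecified.
# With INSTR ordering, 'name1' wins because ',name1,' appears
# earliest in the concatenated list string.
row = con.execute("""
    SELECT name FROM nodes
    WHERE name IN ('name1', 'name2', 'name3')
    ORDER BY INSTR(',name1,name2,name3,', ',' || name || ',')
    LIMIT 1
""").fetchone()
print(row[0])  # name1
```

Wrapping both the haystack and the probe in commas is what prevents 'name1' from matching inside a longer value such as 'name10'.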

count total rows returned by query in doctrine 2 zf2

$countQuery = $qb->select('q.id,d.name,d.numbers')
    ->from('Application\Entity\quests', 'q')
    ->leftJoin('q.dots', 'd');
$query1 = $countQuery->getQuery()->getResult();
Now how would I get the total number of results returned?
**I don't want to write 2 queries** because it will increase the execution time.
I have tried:
$countQuery = $qb->select('count(q.id) as total_results,d.name,d.numbers')
    ->from('Application\Entity\quests', 'q')
    ->leftJoin('q.dots', 'd');
$query1 = $countQuery->getQuery()->getResult();
but it's not working.
The getResult() method returns an array of results. To count the total results returned by getResult(), simply count the array with the PHP function count():
$countQuery = $qb
    ->select('q.id,d.name,d.numbers')
    ->from('Application\Entity\quests', 'q')
    ->leftJoin('q.dots', 'd');
$query1 = $countQuery->getQuery()->getResult();
$totalResults = count($query1);
If you want to paginate your query, then for counting total rows you do need to execute two queries: one for the paginated results and another to count all rows in the database.
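The same pattern holds in any driver, not just Doctrine: counting an already-fetched result set is a client-side count, while pagination forces a separate COUNT query because the LIMITed page no longer reflects the full total. A minimal analogy using Python's sqlite3 (the table and values are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE quests (id INTEGER, name TEXT)")
con.executemany("INSERT INTO quests VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "c")])

# one query: fetch the rows, then count them client-side
# (the analogue of count($query1) in the answer above)
rows = con.execute("SELECT id, name FROM quests").fetchall()
total_results = len(rows)
print(total_results)  # 3

# pagination: a second COUNT(*) query is unavoidable, because the
# LIMITed page only tells you the page size, not the overall total
page = con.execute("SELECT id, name FROM quests LIMIT 2").fetchall()
total = con.execute("SELECT COUNT(*) FROM quests").fetchone()[0]
print(len(page), total)  # 2 3
```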

Undesired flattening occurring

I'm using BigQuery on exported GA data (see schema here)
Looking at the documentation, I see that when I selected a field that is inside a record it will automatically flatten that record and duplicate the surrounding columns.
So I tried to create a denormalized table that I could query with a more SQL-like mindset:
SELECT
CONCAT( date, " ", if (hits.hour < 10,
CONCAT("0", STRING(hits.hour)),
STRING(hits.hour)), ":", IF(hits.minute < 10, CONCAT("0", STRING(hits.minute)), STRING(hits.minute)) ) AS hits.date__STRING,
CONCAT(fullVisitorId, STRING(visitId)) AS session_id__STRING,
fullVisitorId AS google_identity__STRING,
MAX(IF(hits.customDimensions.index=7, hits.customDimensions.value,NULL)) WITHIN RECORD AS customer_id__LONG,
hits.hitNumber AS hit_number__INT,
hits.type AS hit_type__STRING,
hits.isInteraction AS hit_is_interaction__BOOLEAN,
hits.isEntrance AS hit_is_entrance__BOOLEAN,
hits.isExit AS hit_is_exit__BOOLEAN,
hits.promotion.promoId AS promotion_id__STRING,
hits.promotion.promoName AS promotion_name__STRING,
hits.promotion.promoCreative AS promotion_creative__STRING,
hits.promotion.promoPosition AS promotion_position__STRING,
hits.eventInfo.eventCategory AS event_category__STRING,
hits.eventInfo.eventAction AS event_action__STRING,
hits.eventInfo.eventLabel AS event_label__STRING,
hits.eventInfo.eventValue AS event_value__INT,
device.language AS device_language__STRING,
device.screenResolution AS device_resolution__STRING,
device.deviceCategory AS device_category__STRING,
device.operatingSystem AS device_os__STRING,
geoNetwork.country AS geo_country__STRING,
geoNetwork.region AS geo_region__STRING,
hits.page.searchKeyword AS hit_search_keyword__STRING,
hits.page.searchCategory AS hits_search_category__STRING,
hits.page.pageTitle AS hits_page_title__STRING,
hits.page.pagePath AS page_path__STRING,
hits.page.hostname AS page_hostname__STRING,
hits.eCommerceAction.action_type AS commerce_action_type__INT,
hits.eCommerceAction.step AS commerce_action_step__INT,
hits.eCommerceAction.option AS commerce_action_option__STRING,
hits.product.productSKU AS product_sku__STRING,
hits.product.v2ProductName AS product_name__STRING,
hits.product.productRevenue AS product_revenue__INT,
hits.product.productPrice AS product_price__INT,
hits.product.productQuantity AS product_quantity__INT,
hits.product.productRefundAmount AS product_refund_amount__INT,
hits.product.v2ProductCategory AS product_category__STRING,
hits.transaction.transactionId AS transaction_id__STRING,
hits.transaction.transactionCoupon AS transaction_coupon__STRING,
hits.transaction.transactionRevenue AS transaction_revenue__INT,
hits.transaction.transactionTax AS transaction_tax__INT,
hits.transaction.transactionShipping AS transaction_shipping__INT,
hits.transaction.affiliation AS transaction_affiliation__STRING,
hits.appInfo.screenName AS app_current_name__STRING,
hits.appInfo.screenDepth AS app_screen_depth__INT,
hits.appInfo.landingScreenName AS app_landing_screen__STRING,
hits.appInfo.exitScreenName AS app_exit_screen__STRING,
hits.exceptionInfo.description AS exception_description__STRING,
hits.exceptionInfo.isFatal AS exception_is_fatal__BOOLEAN
FROM
[98513938.ga_sessions_20151112]
HAVING
customer_id__LONG IS NOT NULL
AND customer_id__LONG != 'NA'
AND customer_id__LONG != ''
I wrote the result of this table into another table denorm (flatten on, large data set on).
I get different results when I query denorm with the clause
WHERE session_id__STRING = "100001897901013346771447300813"
versus wrapping the above query in (which yields desired results)
SELECT * FROM (_above query_) AS foo WHERE session_id__STRING = "100001897901013346771447300813"
I'm sure this is by design, but could someone explain the difference between these two methods? That would be very helpful.
I believe you are saying that you did check the box "Flatten Results" when you created the output table? And I assume from your question that session_id_STRING is a repeated field?
If those are correct assumptions, then what you are seeing is exactly the behavior you referenced from the documentation above. You asked BigQuery to "flatten results" so it turned your repeated field into an un-repeated field and duplicated all the fields around it so that you have a flat (i.e., no repeated data) table.
If the desired behavior is the one you see when querying over the subquery, then you should uncheck that box when creating your table.
"Looking at the documentation, I see that when I selected a field that is inside a record it will automatically flatten that record and duplicate the surrounding columns."
This is not correct. By the way, can you please point to that documentation? It needs to be improved.
Selecting a field does not flatten that record. So if you have a table T with a single record {a = 1, b = (2, 2, 3)}, then do
SELECT * FROM T WHERE b = 2
You still get a single record {a = 1, b = (2, 2)}. SELECT COUNT(a) from this subquery would return 1.
But once you write results of this query with flatten=on, you get two records: {a = 1, b = 2}, {a = 1, b = 2}. SELECT COUNT(a) from the flattened table would return 2.
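The record semantics described above can be mimicked in plain Python: filtering on a repeated field keeps one record, while flattening afterwards duplicates the surrounding fields. This is only an illustration of the behaviour with the toy {a, b} data from the answer, not BigQuery itself:

```python
# one record with a repeated field b
table = [{"a": 1, "b": [2, 2, 3]}]

# WHERE b = 2 on a repeated field: the filter applies *inside* the
# record, so there is still exactly one record -> COUNT(a) == 1
filtered = [{"a": r["a"], "b": [x for x in r["b"] if x == 2]}
            for r in table]
print(filtered)   # [{'a': 1, 'b': [2, 2]}]

# writing that result with flatten=on: one output row per repeated
# value, with the surrounding field a duplicated -> COUNT(a) == 2
flattened = [{"a": r["a"], "b": x} for r in filtered for x in r["b"]]
print(flattened)  # [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
```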

RMySQL resultset is not "complete" after all results have been fetched

In a script that processes a lot of rows in a MySQL server, I use dbSendQuery and fetch to throttle the fetching and processing of results.
When my fetch command retrieves exactly the number of rows available (or left) in the resultset, leaving 0 rows to be fetched, dbHasCompleted returns FALSE whereas I expected it to return TRUE.
query <- "select record_id, name
from big_table left join another_table using (record_id)
limit 500"
resultset <- dbSendQuery(con, query) # con: DB connection
while (!dbHasCompleted(resultset)) {
  input <- fetch(resultset, n = 500)
  print(paste("Rows fetched:", nrow(input)))
  # process input ...
}
I expected this loop to run once, but there is an extra run: after processing, print is called again:
Rows fetched: 500
...
Rows fetched: 0
Apparently, dbHasCompleted(resultset) is FALSE when exactly the number of available rows has been fetched (the same behaviour is observed for n = 1000, 2000, 3000). When n = 501 in this script, there is no second iteration.
Is this to be expected? Am I doing something wrong?
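This matches how most cursor APIs behave: after fetching exactly the number of available rows, the driver cannot know the result set is exhausted until one more (empty) fetch. Python's sqlite3 fetchmany shows the analogous loop (a sketch with an in-memory toy table, not RMySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big_table (record_id INTEGER)")
con.executemany("INSERT INTO big_table VALUES (?)",
                [(i,) for i in range(500)])

cur = con.execute("SELECT record_id FROM big_table LIMIT 500")
batches = []
while True:
    batch = cur.fetchmany(500)   # like fetch(resultset, n = 500)
    batches.append(len(batch))
    if not batch:                # exhaustion only visible via an empty batch
        break
print(batches)  # [500, 0] -- an extra, empty round, as with dbHasCompleted
```

So the safe pattern is to loop until a fetch returns zero rows and treat that empty batch as the termination signal, rather than relying on the completion flag flipping immediately after the last full batch.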