My simple database contains nodes of 'terms' and 'codes' linked to each other.
There are two types of relationships.
Relationships between 'terms' and 'codes' are called :CODE and are undirected (that is, read equally in both directions).
Relationships between 'terms' are called :NT (which means narrower term) and are directed.
I want to walk through all the 'terms' from top to bottom, collect all the unique codes, and count them.
This is my query:
MATCH (a)-[:NT*]->(b), (a)-[:CODE]-(c), (b)-[:CODE]-(d)
WHERE a.btqty = 0
RETURN a.termid AS termid, a.maxlen AS maxlen, COUNT(DISTINCT c.code) + COUNT(DISTINCT d.code) AS total, COLLECT(DISTINCT c.code) + COLLECT(DISTINCT d.code) AS codes
ORDER BY termid;
This is what I get:
termid maxlen total codes
22 2 3 ["S70","S43","S70"]
25 4 9 ["S20","S21","S54","S61","S63","S63","S21","S61","S54"]
26 2 9 ["S99","S98","S29","S13","S13","S20","S29","S14","S15"]
68 5 13 ["S38","S11","S12","S11","S12","S38","S37","S21","S36","S22","S98","S63","S58"]
123 2 3 ["S38","S12","S12"]
154 2 2 ["S58","S58"]
155 4 3 ["S63","S62","S63"]
159 2 2 ["S36","S36"]
...
I need to get rid of the duplicates in the collection and count the codes properly, like this:
termid maxlen total codes
22 2 2 ["S43","S70"]
25 4 5 ["S20","S21","S54","S61","S63"]
26 2 7 ["S99","S98","S29","S13","S20","S14","S15"]
68 5 10 ["S38","S11","S12","S37","S21","S36","S22","S98","S63","S58"]
123 2 2 ["S38","S12"]
154 2 1 ["S58"]
155 4 2 ["S63","S62"]
159 2 1 ["S36"]
...
I think the REDUCE function could be applied here, but I do not know how to use it.
Thank you for your help!
You're right, this can be solved using REDUCE. Inside the REDUCE you check whether the current element already exists in the accumulator and append it only if it does not:
MATCH (a)-[:NT*]->(b), (a)-[:CODE]-(c), (b)-[:CODE]-(d)
WHERE a.btqty = 0
WITH a.termid AS termid, a.maxlen AS maxlen,
     REDUCE(uniqueCodes = [],
            x IN COLLECT(DISTINCT c.code) + COLLECT(DISTINCT d.code) |
            CASE WHEN x IN uniqueCodes THEN uniqueCodes ELSE uniqueCodes + x END
     ) AS codes
ORDER BY termid
RETURN termid, maxlen, size(codes) AS total, codes
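To make the fold concrete, here is a minimal Python sketch of the same logic. Python is used only for illustration, and `unique_codes` is a hypothetical helper name, not part of Cypher or Neo4j. It walks the concatenated list, appends an element only if the accumulator does not already hold it, and takes the length of the result as the total.

```python
from functools import reduce


def unique_codes(codes):
    """Fold over codes, appending each one only if the accumulator lacks it."""
    return reduce(
        lambda acc, x: acc if x in acc else acc + [x],
        codes,
        [],  # the accumulator starts empty, like uniqueCodes = []
    )


# The concatenated list shown for termid 22 in the question.
combined = ["S70", "S43", "S70"]
codes = unique_codes(combined)
total = len(codes)  # the length of the list, not an aggregate count of rows
print(total, codes)
```

The order of first appearance is preserved, so the result may be ordered differently from the desired output in the question while containing the same codes.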
I have a large data frame with over 1 million observations. Two of my independent variables, A and B, have 18 and 72 numerically labelled categories respectively. For simplicity's sake, assume the categories are labelled 1-18 and 1-72. I'd like to partition all of my data into 36 groups using blocks of 6 categories (A 1-6 with B 1-6, A 1-6 with B 7-12, etc.).
Currently, I am using dplyr's mutate with 36 nested ifelse statements, such as mutate(partition = ifelse(A <= 6 & B <= 6, 1, ifelse(...))) but this is tedious and difficult to change should I want to make partitions of different sizes.
Another way of describing it is that there are 18 * 72 = 1296 unique combinations of parameter A and B, but I would like to partition these 1296 into 36 groups of 36 observations, with the flexibility to change the number of observations and groups.
I really feel like there should be a better way to partition my data, but nothing comes to mind immediately. The only other idea I have is to use expand.grid and use a join of sorts. What other methods exist that allow me to partition my data?
The below example is kind of how I would like my data to appear.
A B Partition
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
1 6 1
2 1 1
... ... ...
6 6 1
7 1 2
... ... ...
12 71 12
12 72 12
13 1 13
... ... ...
18 70 36
18 71 36
18 72 36
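One way to avoid the nested ifelse chain is to compute the partition number arithmetically with integer division. The sketch below is in Python purely for illustration, and `partition_id` and its parameters are hypothetical names. It assumes the numbering runs through the B blocks first and then moves to the next A block (A 1-6 with B 1-6 is 1, A 1-6 with B 7-12 is 2, and so on), which may not match every row of the sample table above.

```python
def partition_id(a, b, a_width=6, b_width=6, b_max=72):
    """Map a 1-based (A, B) pair to a 1-based partition number."""
    a_block = (a - 1) // a_width          # 0-based block index along A
    b_block = (b - 1) // b_width          # 0-based block index along B
    b_blocks = -(-b_max // b_width)       # number of B blocks (ceiling division)
    return a_block * b_blocks + b_block + 1


# Every (A, B) combination for A in 1..18 and B in 1..72.
ids = [partition_id(a, b) for a in range(1, 19) for b in range(1, 73)]
print(len(set(ids)))                                    # number of partitions
print(partition_id(1, 1), partition_id(1, 7), partition_id(18, 72))
```

Changing `a_width` and `b_width` changes the partition sizes without rewriting any conditions. R has the same integer-division operator (`%/%`), so the formula should carry over to a single mutate call.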
I have a data frame that looks as follows
> df[1:10,c("Uri","Latency")]
Uri
1 /filters/test_group_1/test_datasource%20with%20space/test_application_alias_100
2 /applications?includeDashboards&includeMappings
3 /applications/test_application_alias_1
4 /applications?includeDashboards&includeMappings
5 /applications/test_application_alias_200
6 /applications/test_application_alias_100
7 /filters/00000000-0000-0000-0000-000000000001/test_datasource%20with%20space/test_application_alias_0
8 /dashboards?dashboard=test_dashboard_alias_9&includeMappings
9 /filters/00000000-0000-0000-0000-000000000001/test_dataSource_1/test_application_alias_100
10 /filters/00000000-0000-0000-0000-000000000001/test_datasource%20with%20space/test_application_alias_100
Latency
1 296
2 1388
3 58
4 833
5 239
6 60
7 217
8 36
9 86
10 112
I want to select only those rows whose Uri starts with /applications. Note that the rest of the Uri could be anything, and is not important.
I can get exact matches by doing the following:
df[which(df$Uri == "/applications"),c("Uri","Latency")]
However, since I am looking for a substring, I understand that I may have to do some wildcard processing, which in SQL would look like this:
select * from <table_name> where Uri like '%/applications%'
How can I do the same in R?
Assuming that df$Uri is a character vector, I'd go with:
df[startsWith(df$Uri, "/applications"), ]
I'd use a regular expression:
df[ grepl( "^\\/applications" , df[, "Uri"] ) , c("Uri","Latency") ]
I have a table like this:
a b c
-- -- --
1 1 10
2 1 0
3 1 0
4 4 20
5 4 0
6 4 0
The b column 'points' to a, a bit as if a were the parent.
c has been computed for the parents. Now I need to propagate each parent's c value to its children.
The result would be:
a b c
-- -- --
1 1 10
2 1 10
3 1 10
4 4 20
5 4 20
6 4 20
I can't make an UPDATE/SELECT combo that works.
So far I have a SELECT that produces the c column I'd like to get:
select t1.c from t t1 join t t2 on t1.a=t2.b;
c
----------
10
10
10
20
20
20
But I don't know how to stuff that into c.
Thanx in advance
Cheers, phi
You have to look up the value with a correlated subquery:
UPDATE t
SET c = (SELECT c
         FROM t AS parent
         WHERE parent.a = t.b)
WHERE c = 0;
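For anyone who wants to try this out, here is a small sketch that runs the statement against the sample data. It assumes the table lives in SQLite (used here through Python's built-in sqlite3 module with an in-memory database); the column types are my guess, since the question does not show the table definition.

```python
import sqlite3

# In-memory database holding the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 0), (3, 1, 0), (4, 4, 20), (5, 4, 0), (6, 4, 0)],
)

# For each child row (c = 0), look up the c value of the row that b points to.
conn.execute(
    """
    UPDATE t
    SET c = (SELECT c
             FROM t AS parent
             WHERE parent.a = t.b)
    WHERE c = 0
    """
)

rows = conn.execute("SELECT a, b, c FROM t ORDER BY a").fetchall()
for row in rows:
    print(row)
```

The WHERE c = 0 clause restricts the update to the child rows, so the parent rows are never rewritten.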
I finally found a way to copy my initial 'temp' SELECT JOIN back into table 't'. Something like this:
create temp table u as select t1.c from t t1 join t t2 on t1.a=t2.b;
update t set c=(select * from u where rowid=t.rowid);
I'd like to know how the two solutions compare performance-wise: yours, with a single UPDATE and a correlated SELECT, and mine, with two statements, one of which also uses a correlated subquery. Mine seems heavier and less aesthetic, but I wonder about the performance.
On the algorithm side, yours takes care to update only the child rows, while mine also copies each parent's value onto itself. That's a no-op, but it still consumes some cycles :)
Cheers, Phi
I am hoping to get some help with a view that I think needs to be pivoted, though I am not sure.
View is in following format:
CASE CASE_ORDER MANAGER MONTHLY_CASES FISCAL_CASES
case_1 1 John 15 84
case_1 1 Jeff 10 80
case_2 2 John 20 90
case_2 2 Jeff 13 65
case_3 3 John 7 72
case_3 3 Jeff 17 70
My final chart should look like the following:
CASE CASE_ORDER JOHN_CURR_MONTH JOHN_FY JEFF_CURR_MONTH JEFF_FY
case_1 1 15 84 10 80
case_2 2 20 90 13 65
case_3 3 7 72 17 70
My problem is that the managers, and the number of managers, can change from month to month,
so I can't hard-code their names (i.e. 'mgr1' and 'mgr2') and use DECODE. It has to be dynamic...
Thanks for your suggestion, I figured it out. In fact there is a similar answer in Tom Kyte's blog (http://www.oracle.com/technetwork/issue-archive/2012/12-jul/o42asktom-1653097.html), which I modified for my purpose. Here it is:
CREATE OR REPLACE PROCEDURE dynamic_pivot_proc ( p_cursor IN OUT SYS_REFCURSOR )
AS
    l_query LONG := 'SELECT case_order, case';
BEGIN
    FOR x IN (SELECT DISTINCT manager FROM test_table ORDER BY 1 )
    LOOP
        l_query := l_query ||
            REPLACE( q'|, MAX(DECODE(manager,'$X$',monthly_total)) $X$_current_month|',
                     '$X$', dbms_assert.simple_sql_name(x.manager) ) ||
            REPLACE( q'|, MAX(DECODE(manager,'$X$',fiscal_total)) $X$_fy|',
                     '$X$', dbms_assert.simple_sql_name(x.manager) );
    END LOOP;
    l_query := l_query || ' FROM test_table
                            GROUP BY case_order, case
                            ORDER BY case_order ';
    OPEN p_cursor FOR l_query;
END;
SQL> variable x refcursor;
SQL> exec dynamic_pivot_proc( :x );
SQL> print x
CASE_ORDER CASE JEFF_CURRENT_MONTH JEFF_FY JOHN_CURRENT_MONTH JOHN_FY
1 case_1 10 80 15 84
2 case_2 13 65 20 90
3 case_3 17 70 7 72
Now, instead of printing the result, I want to store it in a view. How do I achieve that? I tried changing the line
l_query LONG := 'SELECT case_order, case';
with
l_query LONG := 'CREATE OR REPLACE VIEW SELECT case_order, case';
Needless to say, it did not work, because CREATE OR REPLACE VIEW is a DDL statement, so somehow I have to use EXECUTE IMMEDIATE.
Any suggestions? Thanks in advance.
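The general shape of what is being asked (build the pivot SELECT as text, then prefix it with a complete CREATE VIEW <name> AS and execute the whole string as one DDL statement) can be sketched outside Oracle. The example below uses Python's sqlite3 module with an in-memory SQLite database, so it is an analogy rather than PL/SQL. The view name `case_pivot`, the column name `case_name` (CASE is a reserved word in SQLite), and the `isidentifier()` check standing in for dbms_assert.simple_sql_name are all my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (case_name TEXT, case_order INTEGER, "
    "manager TEXT, monthly_cases INTEGER, fiscal_cases INTEGER)"
)
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?, ?, ?)",
    [
        ("case_1", 1, "John", 15, 84), ("case_1", 1, "Jeff", 10, 80),
        ("case_2", 2, "John", 20, 90), ("case_2", 2, "Jeff", 13, 65),
        ("case_3", 3, "John", 7, 72), ("case_3", 3, "Jeff", 17, 70),
    ],
)


def build_pivot_query(conn):
    """Build the pivot SELECT text from the managers currently in the table."""
    managers = [row[0] for row in conn.execute(
        "SELECT DISTINCT manager FROM test_table ORDER BY 1")]
    columns = ["case_order", "case_name"]
    for name in managers:
        # Reject anything that is not a plain identifier before splicing it in.
        if not name.isidentifier():
            raise ValueError(f"unsafe manager name: {name!r}")
        columns.append(
            f"MAX(CASE WHEN manager = '{name}' THEN monthly_cases END) "
            f"AS {name}_current_month")
        columns.append(
            f"MAX(CASE WHEN manager = '{name}' THEN fiscal_cases END) "
            f"AS {name}_fy")
    return ("SELECT " + ", ".join(columns) +
            " FROM test_table GROUP BY case_order, case_name")


# The DDL is the generated SELECT prefixed with a full CREATE VIEW ... AS.
conn.execute("DROP VIEW IF EXISTS case_pivot")
conn.execute("CREATE VIEW case_pivot AS " + build_pivot_query(conn))

cursor = conn.execute("SELECT * FROM case_pivot ORDER BY case_order")
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(header)
for row in rows:
    print(row)
```

Note that the attempted statement in the question has no view name and no AS keyword after VIEW. Also, a view created this way is a snapshot of the manager list at creation time, so it would have to be recreated whenever the managers change.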
I am using Oracle 11g and have written a stored procedure which stores values in a temporary table as follows:
id count hour age range
-------------------------------------
0 5 10 61 10-200
1 6 20 61 10-200
2 7 15 61 10-200
5 9 5 61 201-300
7 10 25 61 201-300
0 5 10 62 10-20
1 6 20 62 10-20
2 7 15 62 10-20
5 9 5 62 21-30
1 8 6 62 21-30
7 10 25 62 21-30
10 15 30 62 31-40
Using this temp table, I want to return two cursors: one for age 61 and one for age 62.
For each cursor, the distinct range values should become columns. For example, the cursor for age 62 should return the following dataset:
user 10-20 21-30 31-40
Count/hour count/hour count/hour
----------------------------------------------
0 5 10 - - - -
1 6 20 8 6 - -
2 7 15 - - - -
5 - - 9 5 - -
7 - - 10 25 - -
10 - - - - 15 30
The values of the range column in the temp table are not fixed; they are referenced from another table.
Edited: I am using PIVOT for this problem, but all the examples I found online use fixed column values (range, in my case). How can I supply the values dynamically? Here is an example query:
SELECT *
FROM (SELECT column_2, column_1
FROM test_table)
PIVOT (SUM(column_1) AS sum_values FOR (column_2) IN ('value1' AS a, 'value2' AS b, 'value3' AS c));
Instead of handwritten values, I am using the following query inside the IN clause:
SELECT * from(
with x as (
SELECT DISTINCT range
FROM test_table
WHERE age = 62 )
select ltrim( max( sys_connect_by_path(range, ','))
keep (dense_rank last order by curr),
',') range
from (select range,
row_number() over (order by range) as curr,
row_number() over (order by range) -1 as prev
from x)
connect by prev = PRIOR curr
start with curr = 1 )
This gives an error. With handwritten values, however, I get the right output:
select * from (select user_id, nvl(count,0) count, nvl(hour,0) hour,nvl(range,0) range,nvl(age,0)
age from test_table)
PIVOT (SUM(count) as sum_count, sum(hour) as sum_hour for (range) IN
(
'10-20','21-30','31-40'
)
) where age = 62 order by userid
How can I supply those values dynamically?
Cursors are slow; I would recommend trying to do this in a single query unless there's no alternative (or speed doesn't matter). You may want to look into PIVOT / UNPIVOT, which can rotate the values of a column (in this case "range") into columns.
Here's some PIVOT / UNPIVOT documentation and examples:
http://www.oracle-developer.net/display.php?id=506
Based on your last edit:
Pretty sure you have two options:
1. Build dynamic SQL based on the distinct values found in the "range" column. You'll probably be stuck using a cursor again to build the column names, but at least it will be limited to just the distinct ranges.
2. Oracle has a PIVOT XML option that you can use for this. See: http://www.oracle.com/technetwork/articles/sql/11g-pivot-097235.html and scroll down to the section "XML Type".
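As a rough illustration of the first option, the sketch below builds the statement text from a list of distinct range values. It is written in Python only to show the string assembly; `build_pivot_sql` is a hypothetical helper, the generated SQL mirrors the hand-written query from the question, and it is not executed here or checked against an Oracle instance. In PL/SQL the same text would be assembled in a loop over the distinct ranges and opened as a ref cursor.

```python
def build_pivot_sql(ranges, age):
    """Return PIVOT statement text with one quoted literal per distinct range."""
    # Double any single quote so each value stays a valid SQL string literal.
    in_list = ", ".join(
        "'" + value.replace("'", "''") + "'" for value in sorted(set(ranges))
    )
    return (
        "SELECT * FROM (SELECT user_id, count, hour, range, age FROM test_table) "
        "PIVOT (SUM(count) AS sum_count, SUM(hour) AS sum_hour "
        f"FOR (range) IN ({in_list})) "
        f"WHERE age = {int(age)} ORDER BY user_id"
    )


# Range values as they appear in the temp table rows for age 62.
ranges_for_62 = ["10-20", "10-20", "10-20", "21-30", "21-30", "21-30", "31-40"]
sql = build_pivot_sql(ranges_for_62, 62)
print(sql)
```

The values are sorted as plain strings here, which happens to give the expected order for this sample but would not for ranges with differing digit counts.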