How to get recommendations based on previous orders in neo4j? - graph

I am trying to work on neo4j for the first time. I have written the following:
LOAD CSV WITH HEADERS FROM "file:///restaurant_data.csv" AS data
MERGE(n1:Customer{Name:data.Name, Latitude:toFloat(data.Latitude),Longitude:toFloat(data.Longitude)})
MERGE(n2:Orders{OrderId:data.Order_ID,OrderTimestamp:data.Order_ts,FoodName:data.Food_Item})
MERGE(n3:Restaurant{RestaurantName:data.Restaurant, RestLat:toFloat(data.Rest_lat), RestLong:toFloat(data.Rest_long)})
MERGE (n1)-[r1:PLACES_ORDER]->(n2)
MERGE (n2)-[r2:BELONGS_TO]->(n3)
MERGE (n3)-[r3:SERVES]->(n2)
RETURN *;
I can share the CSV if needed.
I want to recommend restaurants to a customer based on their 5 most recent orders (by order timestamp), where the distance between the restaurant and the customer is less than 5 km.
MATCH(n1:Customer{Name:"Angy"})
MATCH(n1)-[:PLACES_ORDER]->(n2:Orders)<-[:SERVES]-(rec:Restaurant)
RETURN n2.FoodName, n2.OrderTimestamp, rec.RestaurantName
ORDER BY n2.OrderTimestamp
LIMIT 5
This gives me only the top 5 orders. How do I find the restaurants serving those orders?
My file link: https://docs.google.com/spreadsheets/d/e/2PACX-1vTc35TBanV3Uk5gbCeEFeJkm2YAhbJPLnpS0KzmYErVRulvbXCWdSZ7xiKUfnCZQQUt-1ArabgmAGmL/pubhtml

Your query did not work. This worked:
MATCH(:Customer{Name:"Angy"})-[:PLACES_ORDER]->(o:Orders)
WITH o ORDER BY o.OrderTimestamp DESC LIMIT 5
WITH [o.FoodName] as foods
MATCH (r:Restaurant)-[:SERVES]->(n:Orders)
WHERE (n.FoodName in foods)
RETURN distinct r.RestaurantName,foods
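
The query above matches on food names but does not apply the 5 km condition from the question. As a rough illustration of the whole logic outside the database, here is a minimal Python sketch. It assumes the timestamps are ISO-formatted strings (so that they sort chronologically) and uses the haversine formula for the distance. The function names and the sample rows are made up for illustration and do not come from the CSV.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def recommend(customer, orders, restaurants, top_n=5, max_km=5.0):
    """Restaurants within max_km that serve a food from the customer's latest orders."""
    own = [o for o in orders if o["Customer"] == customer["Name"]]
    # Most recent first; relies on ISO-formatted timestamp strings (assumption).
    recent = sorted(own, key=lambda o: o["OrderTimestamp"], reverse=True)[:top_n]
    foods = {o["FoodName"] for o in recent}
    found = set()
    for o in orders:
        if o["FoodName"] in foods:
            lat, lon = restaurants[o["Restaurant"]]
            if haversine_km(customer["Latitude"], customer["Longitude"], lat, lon) < max_km:
                found.add(o["Restaurant"])
    return sorted(found)


# Hypothetical sample data, only to exercise the functions.
angy = {"Name": "Angy", "Latitude": 12.97, "Longitude": 77.59}
restaurants = {"Near": (12.98, 77.60), "Far": (13.50, 77.59)}
orders = [
    {"Customer": "Angy", "FoodName": "Pizza", "OrderTimestamp": "2021-01-05 12:00:00", "Restaurant": "Near"},
    {"Customer": "Angy", "FoodName": "Pasta", "OrderTimestamp": "2021-01-04 12:00:00", "Restaurant": "Far"},
    {"Customer": "Bob", "FoodName": "Pizza", "OrderTimestamp": "2021-01-03 12:00:00", "Restaurant": "Far"},
]
nearby = recommend(angy, orders, restaurants)
```

In Cypher the same filter would go into a WHERE clause on the distance between the customer and restaurant coordinates; the exact distance function depends on the Neo4j version.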

Related

How to combine many records value into one record

As you can see from the below picture I was able to combine two deals (blocked red) but the output should have one result instead of two. If anyone has any solutions on this please advise.
The red blocked component has more than one record, and each record has an amount. The sum of all record amounts must be shown in a single row.
record1: Amount:100
record2: Amount:200
record3: Amount:500
Merging all records should give the following:
record: Amount:800
Is it possible to merge many rows into a single row in integromat?
Based on your screenshot, you are aggregating the wrong module. The Source module in your aggregator has to be set to a module that generates multiple bundles; in your case, that is module 10.
You aggregate module 14, which generates a single output bundle for every input bundle, so there is nothing to aggregate. Module 10 returns 2 bundles for a single input.
Your case:
        /---[6]---([14]---[11 aggregator])---
---[10]                                        multiple output bundles
        \---[6]---([14]---[11 aggregator])---
Solution:
          /---[6]---[14]---\
---([10]                    [11 aggregator])--- single output bundle
          \---[6]---[14]---/
In your scenario, the aggregator's Source module has to be set to module no. 10.
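
To make the difference concrete, here is a small Python sketch of the two setups. It is only an analogy, not Integromat code: the `process` function is a hypothetical stand-in for modules 6 and 14, and the amounts are the ones from the question.

```python
# Stand-ins for the bundles emitted by module 10 (amounts taken from the question).
bundles = [{"Amount": 100}, {"Amount": 200}, {"Amount": 500}]


def process(bundle):
    """Placeholder for modules 6 and 14: one bundle in, one bundle out."""
    return {"Amount": bundle["Amount"]}


# Source module = 14: the aggregator runs once per bundle, so each "sum"
# covers a single record and you still get one output per input.
per_bundle = [{"Amount": sum(b["Amount"] for b in [process(bundle)])} for bundle in bundles]

# Source module = 10: the aggregator collects everything produced for all
# bundles of module 10 and emits a single combined bundle.
combined = {"Amount": sum(process(bundle)["Amount"] for bundle in bundles)}

print(per_bundle)
print(combined)
```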

Neo4J and Cypher query

I am new to Neo4j and Cypher. In my data, each shop has 2 chillers, each chiller has 2 PLCs, and each PLC has 2 sensors.
The CREATE statements are as follows:
Create(:SHOP{name:"Shop1"})-[:hasChiller]->(:CHILLER{name:"Chiller1"})
Create(:SHOP{name:"Shop1"})-[:hasChiller]->(:CHILLER{name:"Chiller2"})
Create(:SHOP{name:"Shop2"})-[:hasChiller]->(:CHILLER{name:"Chiller3"})
Create(:SHOP{name:"Shop2"})-[:hasChiller]->(:CHILLER{name:"Chiller4"})
Create(:CHILLER{name:"Chiller1"})-[:hasPLC]->(:PLC{name:"Plc1"})
Create(:CHILLER{name:"Chiller1"})-[:hasPLC]->(:PLC{name:"Plc2"})
Create(:CHILLER{name:"Chiller2"})-[:hasPLC]->(:PLC{name:"Plc3"})
Create(:CHILLER{name:"Chiller2"})-[:hasPLC]->(:PLC{name:"Plc4"})
Create(:CHILLER{name:"Chiller3"})-[:hasPLC]->(:PLC{name:"Plc5"})
Create(:CHILLER{name:"Chiller3"})-[:hasPLC]->(:PLC{name:"Plc6"})
Create(:CHILLER{name:"Chiller4"})-[:hasPLC]->(:PLC{name:"Plc7"})
Create(:CHILLER{name:"Chiller4"})-[:hasPLC]->(:PLC{name:"Plc8"})
Create(:PLC{name:"Plc1"})-[:hasSensor]->(:SENSOR{name:"Sensor1"})
Create(:PLC{name:"Plc1"})-[:hasSensor]->(:SENSOR{name:"Sensor2"})
Create(:PLC{name:"Plc2"})-[:hasSensor]->(:SENSOR{name:"Sensor3"})
Create(:PLC{name:"Plc2"})-[:hasSensor]->(:SENSOR{name:"Sensor4"})
Create(:PLC{name:"Plc3"})-[:hasSensor]->(:SENSOR{name:"Sensor5"})
Create(:PLC{name:"Plc3"})-[:hasSensor]->(:SENSOR{name:"Sensor6"})
Create(:PLC{name:"Plc4"})-[:hasSensor]->(:SENSOR{name:"Sensor7"})
Create(:PLC{name:"Plc4"})-[:hasSensor]->(:SENSOR{name:"Sensor8"})
Create(:PLC{name:"Plc5"})-[:hasSensor]->(:SENSOR{name:"Sensor9"})
Create(:PLC{name:"Plc5"})-[:hasSensor]->(:SENSOR{name:"Sensor10"})
Create(:PLC{name:"Plc6"})-[:hasSensor]->(:SENSOR{name:"Sensor11"})
Create(:PLC{name:"Plc6"})-[:hasSensor]->(:SENSOR{name:"Sensor12"})
Create(:PLC{name:"Plc7"})-[:hasSensor]->(:SENSOR{name:"Sensor13"})
Create(:PLC{name:"Plc7"})-[:hasSensor]->(:SENSOR{name:"Sensor14"})
Create(:PLC{name:"Plc8"})-[:hasSensor]->(:SENSOR{name:"Sensor15"})
Create(:PLC{name:"Plc8"})-[:hasSensor]->(:SENSOR{name:"Sensor16"})
However, the MATCH to get the sensors under Shop1
MATCH(s:SHOP{name:"Shop1"})-[:hasChiller]->(cc:CHILLER)-[:hasPLC]->(pp:PLC)-[:hasSensor]->(ss:SENSOR) return ss.name
returns nothing. It reports no changes and no data.
I am trying this out in the Neo4j sandbox environment. I wrote it based on my understanding of the MATCH clause in SQL Server Graph 2019, where this works.
Can anyone point out where I am going wrong?
You are improperly creating multiple instances of the "same" node. You should create each node once, and then use its bound variable name later on when you need to create relationships involving that node.
Delete all your data and follow this pattern instead (you have to fill in the "..." parts):
CREATE
(sh1:SHOP{name:"Shop1"}), (sh2:SHOP{name:"Shop2"}),
(c1:CHILLER{name:"Chiller1"}), (c2:CHILLER{name:"Chiller2"}),(c3:CHILLER{name:"Chiller3"}), (c4:CHILLER{name:"Chiller4"}),
(p1:PLC{name:"Plc1"}), ..., (p8:PLC{name:"Plc8"}),
(se1:SENSOR{name:"Sensor1"}), ..., (se16:SENSOR{name:"Sensor16"}),
(sh1)-[:hasChiller]->(c1), (sh1)-[:hasChiller]->(c2),
... // create remaining relationships using bound variable names for nodes
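
To see why the original statements return nothing, it can help to model the behaviour outside Neo4j. The following Python sketch is a toy in-memory graph, not the Neo4j API: `Graph`, `create_node` and the other names are hypothetical. Its `create_node` mimics CREATE in that it always makes a new node, even when one with the same name already exists.

```python
from itertools import count


class Graph:
    """Toy graph: every create_node call yields a distinct node, like Cypher CREATE."""

    def __init__(self):
        self._ids = count()
        self.nodes = {}   # node id -> (label, name)
        self.edges = []   # (source id, relationship type, target id)

    def create_node(self, label, name):
        node_id = next(self._ids)
        self.nodes[node_id] = (label, name)
        return node_id

    def create_rel(self, src, rel_type, dst):
        self.edges.append((src, rel_type, dst))

    def follow(self, start_ids, rel_type):
        return [dst for src, t, dst in self.edges if t == rel_type and src in start_ids]

    def sensors_under(self, shop_name):
        shops = [i for i, (label, name) in self.nodes.items()
                 if label == "SHOP" and name == shop_name]
        plcs = self.follow(self.follow(shops, "hasChiller"), "hasPLC")
        return sorted(self.nodes[i][1] for i in self.follow(plcs, "hasSensor"))


# As in the question: each statement creates fresh nodes on both ends,
# so there are two unrelated "Chiller1" nodes and two unrelated "Plc1" nodes.
broken = Graph()
broken.create_rel(broken.create_node("SHOP", "Shop1"), "hasChiller", broken.create_node("CHILLER", "Chiller1"))
broken.create_rel(broken.create_node("CHILLER", "Chiller1"), "hasPLC", broken.create_node("PLC", "Plc1"))
broken.create_rel(broken.create_node("PLC", "Plc1"), "hasSensor", broken.create_node("SENSOR", "Sensor1"))

# As in the answer: create each node once and reuse its variable.
fixed = Graph()
sh1 = fixed.create_node("SHOP", "Shop1")
c1 = fixed.create_node("CHILLER", "Chiller1")
p1 = fixed.create_node("PLC", "Plc1")
se1 = fixed.create_node("SENSOR", "Sensor1")
fixed.create_rel(sh1, "hasChiller", c1)
fixed.create_rel(c1, "hasPLC", p1)
fixed.create_rel(p1, "hasSensor", se1)

print(broken.sensors_under("Shop1"))
print(fixed.sensors_under("Shop1"))
```

In the first graph the path stops at the chiller, because the chiller that has a PLC is a different node from the one attached to the shop.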

getRetweeters() returns one id whereas getRetweetCount() returns 2 -- in twitteR package

I use the twitteR package and I am trying to retrieve the account ids of retweeters.
The retweet count and the list of retweeters do not always appear to be consistent.
For example, I retrieved a status (tweet) using
st<-showStatus("1058168768009043969")
retweeters(st$getId()) # returns "260857015"
st$getRetweetCount() # however returns 2
st$getRetweeters() # returns a known error
Using twitteR's getRetweeters method
The Twitter site shows 2 retweets, as can be seen here:
https://twitter.com/ConsueloMack/status/1058168768009043969
To run this, one needs valid keys and has to set up OAuth as follows:
require('twitteR')
twapi<-read.csv("./coach_keys.json",sep=":",stringsAsFactors=F,header=F)
# in Linux you can obtain oauth as follows
setup_twitter_oauth(twapi[twapi$V1=="API_KEY",c("V2")],
twapi[twapi$V1=="API_SECRET_KEY",c("V2")],
twapi[twapi$V1=="ACCESS_TOKEN",c("V2")],
twapi[twapi$V1=="ACCESS_TOKEN_SECRET",c("V2")])
# then the above snippet can be run
I expected the retweeters method to return as many ids as indicated by getRetweetCount().
However, it does not. I am looking for pointers, especially if I am doing something wrong. Is this a common occurrence? Can someone show, for the ID I have, how to retrieve a count and a list that are consistent with each other?
Thank you very much.

Required Appropriate query to find out the result

I need to get the desired result with the shortest possible execution time.
I have a table with many rows (over 100k); one of its columns is notes varchar2(1800).
It contains values like the following:
notes
CASE Transfer
Surnames AAA : BBBB
Case Status ACCOUNT TXFERRED TO BORROWERS
Completed Date 25/09/2022
Task Group 16
Message sent at 12/10/2012 11:11:21
Sender : lynxfailures123#google.com
Recipient : LFRB568767#yahoo.com
Received : 21:31 12/12/2002
The query should return the rows whose notes contain the value ACCOUNT TXFERRED TO BORROWERS.
I have used the following queries, but they take a long time (72150436 sec) to execute:
Select * from cps_case_history where (dbms_lob.instr(notes, 'ACCOUNT TFR TO UFSS') > 1)
Select * from cps_case_history where notes like '%ACCOUNT TFR TO UFSS%'
Could you please share a query that takes less time to execute?
You can try parallel hints (optimizer hints):
Select /*+ PARALLEL(a,8) */ a.* from cps_case_history a
where INSTR(NOTES,'Text you want to search') > 0; -- your condition
Replace 8 with 16 and see if the performance improves further.
Avoid a leading % in the LIKE pattern, i.e. where notes like '%Account...', because a leading wildcard prevents the use of a normal index on the column.
Updated answer: Try partitioning the table. You can use range partitioning on the completed_date column.

Web scraping SEC Edgar 10-K and 10-Q filings

Is there anyone experienced with scraping SEC 10-K and 10-Q filings? I got stuck while trying to scrape monthly realised share repurchases from these filings. Specifically, I would like to get the following information: 1. Period; 2. Total Number of Shares Purchased; 3. Average Price Paid per Share; 4. Total Number of Shares Purchased as Part of Publicly Announced Plans or Programs; 5. Maximum Number (or Approximate Dollar Value) of Shares that May Yet Be Purchased Under the Plans or Programs, for each month from 2004 to 2014. I have 90,000+ forms to parse in total, so it won't be feasible to do it manually.
This information is usually reported under "Part 2 Item 5 Market for Registrant's Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities" in 10-Ks and under "Part 2 Item 2 Unregistered Sales of Equity Securities and Use of Proceeds" in 10-Qs.
Here is one example of the 10-Q filings that I need to parse:
https://www.sec.gov/Archives/edgar/data/12978/000104746909007169/a2193892z10-q.htm
If a firm has no share repurchases, this table can be missing from the quarterly report.
I have tried to parse the html files with Python BeautifulSoup, but the results are not satisfactory, mainly because these files are not written in a consistent format.
For example, the only way I can think of to parse these forms is
from bs4 import BeautifulSoup
import requests
import unicodedata
import re

url = 'https://www.sec.gov/Archives/edgar/data/12978/000104746909007169/a2193892z10-q.htm'

def parse_html(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html5lib')
    tables = soup.find_all('table')
    identifier = re.compile(r'Total.*Number.*of.*Shares.*\w*Purchased.*', re.UNICODE | re.IGNORECASE | re.DOTALL)
    rep_tables = []
    # Walk the tables from last to first and keep those that look like repurchase tables.
    for table in reversed(tables):
        remove_invalid_tags(table)
        # Normalize to plain ASCII; decode back to str so the regex also works on Python 3.
        table_text = unicodedata.normalize('NFKD', table.text).encode('ascii', 'ignore').decode('ascii')
        if identifier.search(table_text):
            rep_tables.append(table)
    return rep_tables

def remove_invalid_tags(soup, invalid_tags=('sup', 'br')):
    # Replace footnote markers and line breaks with a space so words do not run together.
    for tag in invalid_tags:
        for x in soup.find_all(tag):
            x.replace_with(' ')
The above code only returns the messy tables that may contain the repurchase information. However, 1) it is not reliable; 2) it is very slow; 3) the following steps to scrape the date/month, share price, number of shares etc. are much more painful to do. I am wondering if there are more feasible languages/approaches/applications/databases to get such information. Thanks a million!
I'm not sure about Python, but in R there is a nice solution using the 'finstr' package (https://github.com/bergant/finstr).
'finstr' automatically extracts the financial statements (income statement, balance sheet, cash flow etc.) from EDGAR using the XBRL format.
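
If you stay with Python, the step after locating a candidate table is turning its rows into clean values. Below is a minimal sketch of that step. It uses only the standard library's html.parser so that it runs without extra packages, and it does not handle nested tables. The class and function names and the sample table are made up for illustration; real filings vary a lot, so treat it as a starting point only.

```python
from html.parser import HTMLParser
import re


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            if text:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def to_number(text):
    """Parse '1,234', '$45.67' or '(1,000)' into a float; None for anything else."""
    cleaned = re.sub(r"[,$\s]", "", text)
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    cleaned = cleaned.strip("()")
    if not re.fullmatch(r"\d+(?:\.\d+)?", cleaned):
        return None
    value = float(cleaned)
    return -value if negative else value


# Made-up sample in the shape of a repurchase table, with "$" in its own cell.
sample = """
<table>
<tr><th>Period</th><th>Total Number of Shares Purchased</th><th>Average Price Paid per Share</th></tr>
<tr><td>July 1 - July 31</td><td>1,234,567</td><td>$</td><td>45.67</td></tr>
</table>
"""
parser = TableRows()
parser.feed(sample)
rows = parser.rows
numbers = [n for n in (to_number(cell) for cell in rows[1]) if n is not None]
```

Dropping the cells that do not parse as numbers is one simple way to cope with currency symbols and footnote markers that sit in their own cells.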
