How to add an existing table column to an existing projection's SEGMENTED BY HASH clause in Vertica?

I have created a table and one projection of that table. I need to add an existing table column to the projection's SEGMENTED BY hash clause in Vertica.
"I have to add the SBS_ALERT_ID column to the existing projection's segmented-by-hash clause without creating a new projection."
CREATE TABLE public.ALERT
(
AS_OF_DATE date,
ALERT_ID int,
LOAN_NUMBER varchar(20),
SERVICER_LOAN_NUMBER varchar(20),
SBS_LOAN_NUMBER varchar(20),
SBS_ALERT_ID int,
ALERT_TYPE_ID varchar(25)
);
CREATE PROJECTION public.ALERTTT_SEG /*+createtype(D)*/
(
AS_OF_DATE ENCODING RLE,
ALERT_ID ENCODING DELTARANGE_COMP,
LOAN_NUMBER ENCODING ZSTD_FAST_COMP,
SERVICER_LOAN_NUMBER,
SBS_LOAN_NUMBER ENCODING RLE,
SBS_ALERT_ID ENCODING DELTARANGE_COMP,
ALERT_TYPE_ID
)
AS
SELECT ALERT.AS_OF_DATE,
ALERT.ALERT_ID,
ALERT.LOAN_NUMBER,
ALERT.SERVICER_LOAN_NUMBER,
ALERT.SBS_LOAN_NUMBER,
ALERT.SBS_ALERT_ID,
ALERT.ALERT_TYPE_ID
FROM public.ALERT
ORDER BY ALERT.LOAN_NUMBER,
ALERT.SBS_LOAN_NUMBER
SEGMENTED BY hash(ALERT.LOAN_NUMBER, ALERT.SBS_LOAN_NUMBER) ALL NODES;

Do you mean something like this?
Initial situation:
CREATE TABLE segby (
fullname varchar(12),
dob date
);
CREATE PROJECTION segby_super /*+basename(segby),createtype(A)*/ (
fullname,
dob
)
AS
SELECT segby.fullname,
segby.dob
FROM segby
ORDER BY segby.fullname,
segby.dob
SEGMENTED BY hash(segby.dob, segby.fullname) ALL NODES OFFSET 0;
Then, you add a column and initialise it with a DEFAULT ...
ALTER TABLE segby ADD id INT NOT NULL DEFAULT HASH(dob,fullname);
And subsequently, you make sure the table is segmented by that new column, by creating a projection like this:
CREATE PROJECTION segby_id
AS SELECT
id
, fullname
, dob
FROM segby
ORDER BY id
SEGMENTED BY HASH(id) ALL NODES;
And finish by running a REFRESH on your table ...
SELECT REFRESH('segby');
-- out REFRESH
-- out -
-- out Refresh completed with the following outcomes:
-- out Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
-- out ----------------------------------------------------------------------------------------
-- out "dbadmin"."segby_id": [segby] [refreshed] [scratch] [0] [0]
For your specific question: It looks as if you would like to be able to fire a command like:
ALTER PROJECTION public.ALERTTT_SEG
SEGMENTED BY hash(ALERT.LOAN_NUMBER, ALERT.SBS_LOAN_NUMBER,ALERT.SBS_ALERT_ID) ALL NODES;
This simply does not work.
You will have to create a new projection, segmented the new way, refresh the table, and drop the original projection.
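A sketch of that sequence (the _V2 projection name is just an illustrative choice; MAKE_AHM_NOW() advances the ancient history mark so the old projection can be dropped):
CREATE PROJECTION public.ALERTTT_SEG_V2
AS
SELECT AS_OF_DATE,
ALERT_ID,
LOAN_NUMBER,
SERVICER_LOAN_NUMBER,
SBS_LOAN_NUMBER,
SBS_ALERT_ID,
ALERT_TYPE_ID
FROM public.ALERT
ORDER BY LOAN_NUMBER,
SBS_LOAN_NUMBER
SEGMENTED BY hash(LOAN_NUMBER, SBS_LOAN_NUMBER, SBS_ALERT_ID) ALL NODES;
SELECT REFRESH('public.ALERT');
SELECT MAKE_AHM_NOW();
DROP PROJECTION public.ALERTTT_SEG;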

Related

How to create multiple vertices in SAP HANA Graph

I'm trying to create 2 (multiple) vertices in SAP HANA like this:
Create two tables for the vertices ITEM and DATASET:
CREATE COLUMN TABLE "GREEK_MYTHOLOGY"."ITEM" (
"ITEM_ID" VARCHAR(100) PRIMARY KEY,
"ITEM_NAME" VARCHAR(100)
);
CREATE COLUMN TABLE "GREEK_MYTHOLOGY"."DATASET" (
"DATASET_ID" VARCHAR(100) PRIMARY KEY,
"DATASET_NAME" VARCHAR(100)
);
And create the edge table REFERENCES:
CREATE COLUMN TABLE "GREEK_MYTHOLOGY"."REFERENCES" (
"REF_ID" INT UNIQUE NOT NULL,
"SOURCE" VARCHAR(100) NOT NULL
REFERENCES "GREEK_MYTHOLOGY"."ITEM" ("ITEM_ID")
ON UPDATE CASCADE ON DELETE CASCADE,
"TARGET" VARCHAR(100) NOT NULL
REFERENCES "GREEK_MYTHOLOGY"."DATASET" ("DATASET_ID")
ON UPDATE CASCADE ON DELETE CASCADE,
"TYPE" VARCHAR(100)
);
Now I would like to connect both vertices (ITEM and DATASET) with the edge REFERENCES like below:
CREATE GRAPH WORKSPACE "GREEK_MYTHOLOGY"."GRAPH"
EDGE TABLE "GREEK_MYTHOLOGY"."DATASET"
SOURCE COLUMN "SOURCE"
TARGET COLUMN "TARGET"
VERTEX TABLE "GREEK_MYTHOLOGY"."ITEM" KEY COLUMN "ITEM_ID"
VERTEX TABLE "GREEK_MYTHOLOGY"."DATASET"KEY COLUMN "DATASET_ID"
KEY COLUMN "REF_ID";
But it throws this exception at line VERTEX TABLE "GREEK_MYTHOLOGY"."DATASET"KEY COLUMN "DATASET_ID":
sql syntax error: incorrect syntax near "VERTEX": line 6 col 1 (at pos 200)
Is it possible to create multiple vertices in a SAP HANA graph? If yes, what is the right way to do this?
There's a misunderstanding here. The REFERENCES clause in the CREATE TABLE statement has nothing to do with the graph structure you want to represent.
Instead, it defines a foreign key constraint between the two tables.
The CREATE GRAPH WORKSPACE command only accepts one EDGE TABLE and one VERTEX TABLE as parameters.
However, you can also pass in synonyms or views here.
That way, you could create a view "ALL_ITEMS" like this:
CREATE VIEW "GREEK_MYTHOLOGY"."ALL_ITEMS" as
SELECT "ITEM_ID" as "ID", "ITEM_NAME" as "NAME" FROM "GREEK_MYTHOLOGY"."ITEM"
UNION
SELECT "DATASET_ID" as "ID", "DATASET_NAME" as "NAME" FROM "GREEK_MYTHOLOGY"."DATASET";
and then reference this view:
CREATE GRAPH WORKSPACE "GREEK_MYTHOLOGY"."GRAPH"
EDGE TABLE "GREEK_MYTHOLOGY"."REFERENCES"
SOURCE COLUMN "SOURCE"
TARGET COLUMN "TARGET"
KEY COLUMN "REF_ID"
VERTEX TABLE "GREEK_MYTHOLOGY"."ALL_ITEMS"
KEY COLUMN "ID";
Using this approach is possible, but you now have to make sure that the "ID" values are unique and not NULL across both tables.
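A quick sanity check for that (a sketch; any rows returned are colliding keys):
SELECT "ID" FROM "GREEK_MYTHOLOGY"."ALL_ITEMS" GROUP BY "ID" HAVING COUNT(*) > 1;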

How to extract data from a CSV file and put it into a SQLite database?

I am trying to use this code to extract data from a big CSV file and insert it into a database. The schema of the database is provided in the code. However, I am doing something wrong in the last line, and the code is giving me a ValueError. I am using pandas to read the CSV file. Could someone help me point out where I am going wrong?
import pandas as pd
import sqlite3
conn = sqlite3.connect('newdb.sqlite')
cur = conn.cursor()
cur.executescript('''
DROP TABLE IF EXISTS Policy;
DROP TABLE IF EXISTS Statecode;
DROP TABLE IF EXISTS County;
DROP TABLE IF EXISTS Line;
DROP TABLE IF EXISTS Construction;
DROP TABLE IF EXISTS Point_Granularity;
CREATE TABLE Statecode (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
name TEXT UNIQUE
);
CREATE TABLE County (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
name TEXT UNIQUE
);
CREATE TABLE Line(
id INTEGER NOT NULL PRIMARY KEY
AUTOINCREMENT UNIQUE,
name TEXT UNIQUE);
CREATE TABLE Construction(
id INTEGER NOT NULL PRIMARY KEY
AUTOINCREMENT UNIQUE,
name TEXT UNIQUE);
CREATE TABLE Point_Granularity(
id INTEGER NOT NULL PRIMARY KEY
AUTOINCREMENT UNIQUE,
number INTEGER UNIQUE);
CREATE TABLE Policy (
id INTEGER NOT NULL PRIMARY KEY
AUTOINCREMENT UNIQUE,
policyID INTEGER ,
eq_site_line FLOAT,
hu_site_line INTEGER,
statecode_id INTEGER,
county_id INTEGER,
line_id INTEGER,
construction_id INTEGER,
point_granularity_id INTEGER
);
''')
df = pd.read_csv('FL_insurance_sample.csv')
for policy in df.policyID:
    cur.execute('INSERT INTO Policy (policyID) VALUES (?)', policy)
conn.commit()
I believe using policy.astype(int) should work for this case? Could anyone confirm?
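For what it's worth, astype(int) alone won't fix it: cur.execute() expects the query parameters as a sequence, so the direct fix for that line would be a one-element tuple:
cur.execute('INSERT INTO Policy (policyID) VALUES (?)', (int(policy),))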
Basically you don't want to loop through your data frame - you would lose all the pandas benefits, and it'll be too slow as well. You want to work with data frames:
df = pd.read_csv('FL_insurance_sample.csv')
df.to_sql('Policy', conn, if_exists='append', index=False)
PS: you might want to "massage" your data beforehand so it fits into your table structure.
PPS: if you want a working example, you should provide a sample of your input data - 5-10 rows would be enough.
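For instance, a minimal sketch (assuming the CSV has a policyID column, as in the question) that keeps only the columns present in the Policy table:
import pandas as pd
import sqlite3
conn = sqlite3.connect('newdb.sqlite')
df = pd.read_csv('FL_insurance_sample.csv')
# write only policyID; the autoincrement id is filled in by SQLite
df[['policyID']].to_sql('Policy', conn, if_exists='append', index=False)
conn.commit()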

SQLite Query plan

Is there a way to manipulate the query plan generated in SQLite?
I'll try to explain my problem:
I have 3 tables:
CREATE TABLE "index_term" (
"id" INT,
"term" VARCHAR(255) NOT NULL,
PRIMARY KEY("id"),
UNIQUE("term"));
CREATE TABLE "index_posting" (
"doc_id" INT NOT NULL,
"term_id" INT NOT NULL,
PRIMARY KEY("doc_id", "field_id", "term_id"),,
CONSTRAINT "index_posting_doc_id_fkey" FOREIGN KEY ("doc_id")
REFERENCES "document"("doc_id") ON DELETE CASCADE,
CONSTRAINT "index_posting_term_id_fkey" FOREIGN KEY ("term_id")
REFERENCES "index_term"("id") ON DELETE CASCADE);;
CREATE INDEX "index_posting_term_id_idx" ON "index_posting"("term_id");
CREATE TABLE "published_files" (
"doc_id" INTEGER NOT NULL,,
"uri_id" INTEGER,
"user_id" INTEGER NOT NULL,
"status" INTEGER NOT NULL,
"title" VARCHAR(1024),
PRIMARY KEY("uri_id"));
CREATE INDEX "published_files_doc_id_idx" ON "published_files"("doc_id");
There are about 600,000 entries in the index_term table, about 4 million in index_posting, and 300,000 in published_files.
Now when I want to find the number of unique doc_ids in index_posting which reference some terms, I use the following SQL:
select count(distinct index_posting.doc_id) from index_term, index_posting
where
index_posting.term_id = index_term.id and index_term.term like '%test%'
The result is displayed in reasonable time (0.3 secs). Asking EXPLAIN QUERY PLAN returns:
0|0|0|SCAN TABLE index_term
0|1|1|SEARCH TABLE index_posting USING INDEX index_posting_term_id_idx (term_id=?)
When I want to filter the count so that it only includes doc_ids of index_posting for which a published_files entry exists:
select count(distinct index_posting.doc_id) from index_term, index_posting,
published_files where
index_posting.term_id = index_term.id and index_posting.doc_id = published_files.doc_id and index_term.term like '%test%'
The query takes almost 10 times as long. Asking EXPLAIN QUERY PLAN returns:
0|0|1|SCAN TABLE index_posting
0|1|0|SEARCH TABLE index_term USING INDEX sqlite_autoindex_index_term_1 (id=?)
0|2|2|SEARCH TABLE published_files AS pf USING COVERING INDEX published_files_doc_id_idx (doc_id=?)
So as far as I understand, SQLite changed its query plan here, doing a full table scan of index_posting and a lookup in index_term instead of the other way around.
As a workaround I ran
analyze index_posting;
analyze index_term;
analyze published_files;
and now it seems correct:
0|0|0|SCAN TABLE index_term
0|1|1|SEARCH TABLE index_posting USING INDEX index_posting_term_id_idx (term_id=?)
0|2|2|SEARCH TABLE published_files USING COVERING INDEX published_files_doc_id_idx (doc_id=?)
but my question is: is there a way to force SQLite to always use the correct query plan?
TIA
ANALYZE is not a workaround; it's supposed to be used.
You can use CROSS JOIN to enforce a certain order of the nested loops, or use INDEXED BY to force a certain index to be used.
However, you asked for "the correct query plan", which might not be the same as the one enforced by these mechanisms.
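For illustration, a sketch of both mechanisms applied to the query above (SQLite keeps CROSS JOINed tables in the written order, outermost first, and INDEXED BY pins the index choice):
select count(distinct index_posting.doc_id)
from index_term
cross join index_posting indexed by index_posting_term_id_idx
cross join published_files
where index_posting.term_id = index_term.id
and index_posting.doc_id = published_files.doc_id
and index_term.term like '%test%';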

sqlite3 join-filter order-by performance

I'm trying to run a query like this on an sqlite3 database:
select node.loc, node.weight from node
inner join filt on (node.id = filt.node_id)
inner join filt T5 on (node.id = T5.node_id)
where (filt.word = 'aaa' and T5.word = 'aasvogel')
order by node.weight desc limit 10;
On MySQL, such a query works fine and fast (<0.2 s); on sqlite3, on the same data, it runs for ~2 s.
What could be the problem, and what can I do to improve its performance?
The files I made to test sqlite3 can be found here: https://github.com/HoverHell/sqlperftst1
The table definitions in particular:
CREATE TABLE "node" (
"id" integer PRIMARY KEY,
"loc" varchar(255) NOT NULL,
"weight" real
);
CREATE INDEX "node_loc" ON "node" ("loc");
CREATE INDEX "node_weight" on "node" ("weight");
CREATE TABLE "filt" (
"id" integer PRIMARY KEY,
"node_id" integer NOT NULL,
"word" varchar(120) NOT NULL
);
CREATE INDEX "filt_word" ON "filt" ("word");
CREATE INDEX "filt_node_id" ON "filt" ("node_id");
UPD: Performance comparison on realistic data and queries:
This query can be improved by creating an index on both columns used for lookups:
CREATE INDEX filt_word_node_id ON filt(word, node_id);
However, the shape of the data is unusual, which makes the query optimizer misestimate the selectivity of the word lookups.
Run ANALYZE to fix this.
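After creating the index, run ANALYZE and re-check the plan (a sketch):
ANALYZE;
EXPLAIN QUERY PLAN
select node.loc, node.weight from node
inner join filt on (node.id = filt.node_id)
inner join filt T5 on (node.id = T5.node_id)
where (filt.word = 'aaa' and T5.word = 'aasvogel')
order by node.weight desc limit 10;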

How to auto-generate the username with a specific string?

I am using ASP.NET 2008 and MySQL.
I want to auto-generate the value for the username field in the format
"SISI001", "SISI002",
etc. in SQL whenever a new record is inserted.
How can I do it?
What would the SQL query be?
Thanks.
Add a column with an auto-increment integer data type.
Then get the maximum value of that column in the table using the MAX() function and assign it to an integer variable (let the variable be x).
After that:
string userid = "SISI";
x=x+1;
string count = new string('0',6-x.ToString().length);
userid=userid+count+x.ToString();
Use userid as your username.
Hope it helps. Good luck.
Plan A:
You need to keep a table (keys) that contains the last numeric ID generated for various entities. In this case the entity is "user", so the table will contain two columns: entity varchar(100) and lastid int.
You can then have a function that receives the entity name and returns the incremented ID. Use this ID, concatenated with the string component "SISI", in the INSERT you pass to MySQL.
Following is the MySQL Table tblkeys:
CREATE TABLE `tblkeys` (
`entity` varchar(100) NOT NULL,
`lastid` int(11) NOT NULL,
PRIMARY KEY (`entity`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The MySQL Function:
DELIMITER $$
CREATE FUNCTION `getkey`( ps_entity VARCHAR(100)) RETURNS INT(11)
BEGIN
DECLARE ll_lastid INT;
UPDATE tblkeys SET lastid = lastid+1 WHERE tblkeys.entity = ps_entity;
SELECT tblkeys.lastid INTO ll_lastid FROM tblkeys WHERE tblkeys.entity = ps_entity;
RETURN ll_lastid;
END$$
DELIMITER ;
The sample function call:
SELECT getkey('user')
Sample Insert command:
insert into users(username, password) values (CONCAT('SISI', LPAD(getkey('user'), 3, '0')), '$password')
Plan B:
This way the ID will be a bit larger, but it will not require any extra table. Use the following SQL to get a new unique ID:
SELECT ROUND(NOW() + 0)
You can pass it as part of the INSERT command and concatenate it with the string component "SISI".
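For example (a sketch; in MySQL, NOW() + 0 yields the current timestamp as a number like 20240101120000):
insert into users(username, password) values (CONCAT('SISI', ROUND(NOW() + 0)), '$password');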
I am not an ASP.NET developer, but I can help you.
You can do something like this:
Create a sequence in your database (note: CREATE SEQUENCE is Oracle/MariaDB syntax; MySQL itself does not support sequences):
CREATE SEQUENCE "Database_name"."SEQUENCE1" MINVALUE 1 MAXVALUE 9999999999999999999999999999 INCREMENT BY 001 START WITH 21 CACHE 20 NOORDER NOCYCLE ;
and then, while inserting, use this query:
insert into testing (userName) values(concat('SISI', sequence1.nextval))
Hope it helps with your doubt.
Try this (note: this is SQL Server syntax, not MySQL):
CREATE TABLE Users (
IDs int NOT NULL IDENTITY (1, 1),
USERNAME AS 'SISI' + RIGHT('000' + CAST(IDs as varchar(10)), 3), -- pad the identity value to 3 digits, e.g. SISI001
Address varchar(150)
)
(not tested)
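If you need this in plain MySQL, a minimal sketch of the same idea using AUTO_INCREMENT and LPAD (table and column names are illustrative):
CREATE TABLE users (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(20)
);
INSERT INTO users () VALUES ();
-- derive the display name from the generated numeric id
UPDATE users SET username = CONCAT('SISI', LPAD(id, 3, '0')) WHERE id = LAST_INSERT_ID();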
