I constantly retrieve JSON data from some API and put that data into a MariaDB table.
The JSON ships with a timestamp which I'd like to place an index on, because this attribute is used for querying the table.
The JSON looks something like this (stripped):
{
"time": "2021-12-26T14:00:00.007294Z",
"some_measure": "0.10031"
}
I create a table:
CREATE TABLE some_table (
my_json JSON NOT NULL,
time TIMESTAMP AS (JSON_VALUE(my_json , '$.time')),
some_measure DOUBLE AS (JSON_VALUE(my_json , '$.some_measure'))
)
ENGINE=InnoDB
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
my_json holds the entire JSON snippet; time and some_measure are virtual columns that extract the corresponding JSON values on the fly.
Now, trying to add an index on the TIMESTAMP attribute:
CREATE INDEX some_index ON some_table (time);
This fails:
SQL Error [1292] [22007]: (conn=454) Incorrect datetime value:
'2021-12-26T14:00:00.007294Z' for column `some_db`.`some_table`.`time` at row 1
How can I add an index on that timestamp?
The issue here is that converting a string (the JSON timestamp) to a TIMESTAMP is non-deterministic, because the result depends on server-side settings (sql_mode) and the session time zone.
Indexing non-deterministic virtual columns is not supported.
You would want to use a VARCHAR data type instead and index that:
CREATE TABLE some_table (
my_json JSON NOT NULL,
time VARCHAR(100) AS (JSON_VALUE(my_json , '$.time')),
some_measure DOUBLE AS (JSON_VALUE(my_json , '$.some_measure'))
)
ENGINE=InnoDB
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
You should be able to create your index:
CREATE INDEX some_index ON some_table (`time`);
You can still query time with date values, because MariaDB implicitly converts a DATETIME when it is compared against a VARCHAR:
SELECT
*
FROM some_table
WHERE time > '2008-12-31 23:59:59' + INTERVAL 1 SECOND;
The query will use the index.
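To verify, you can inspect the execution plan with EXPLAIN; a minimal sketch using the table and index names from above:
EXPLAIN SELECT *
FROM some_table
WHERE time > '2008-12-31 23:59:59' + INTERVAL 1 SECOND;
-- if the optimizer chooses the index, the plan's key column shows some_index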
I finally came up with a solution that works for me.
Changes are:
use STR_TO_DATE() to create a valid DATETIME from the JSON timestamp
make the generated (virtual) column PERSISTENT
use data type DATETIME instead of TIMESTAMP
So the new code looks like this:
CREATE TABLE some_table (
my_json JSON NOT NULL,
time DATETIME AS (STR_TO_DATE((JSON_VALUE(my_json , '$.time')), '%Y-%m-%d%#%T%.%#%#')) PERSISTENT,
some_measure DOUBLE AS (JSON_VALUE(my_json , '$.some_measure'))
)
ENGINE=InnoDB
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
CREATE INDEX some_index ON some_table (`time`);
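As a quick sanity check, you can insert the sample JSON from the question and confirm that the persistent column is populated and the index is usable; a minimal sketch:
INSERT INTO some_table (my_json)
VALUES ('{"time": "2021-12-26T14:00:00.007294Z", "some_measure": "0.10031"}');
-- time is materialized as a real DATETIME, so the index applies:
SELECT time, some_measure
FROM some_table
WHERE time >= '2021-12-26 00:00:00';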
I created two tables in two different databases (TestDb1 and TestDb2) on the same server. I wrote an AFTER DELETE trigger on the "ERROR_MASTER" table: if I delete a record in "ERROR_MASTER" (in TestDb1), the trigger should insert a record into the "ERROR_MASTER_LOG" table, which exists in TestDb2.
My dblink->dblink('dbname=TestDb2 port=5432 host=192.168.0.48 user=postgres password=soft123')
DB->TestDb1
CREATE TABLE public."ERROR_MASTER"
(
"MARKERID" integer NOT NULL,
"FILENAME" character varying,
"RECNO" integer,
"ERRORCODE" character varying,
"USERID" character varying,
"ID" integer NOT NULL,
CONSTRAINT "ERR_MASTER_pkey" PRIMARY KEY ("ID")
)
WITH (
OIDS=FALSE
);
DB->TestDb2
CREATE TABLE public."ERROR_MASTER_LOG"
(
"MARKERID" integer NOT NULL,
"FILENAME" character varying,
"RECNO" integer,
"ERRORCODE" character varying,
"USERID" character varying,
"ID" integer NOT NULL,
CONSTRAINT "ERR_MASTER_Log_pkey" PRIMARY KEY ("ID"),
"Machine_IP" character varying,
"DELETED_AT" timestamp
)
WITH (
OIDS=FALSE
);
ALTER TABLE public."ERROR_MASTER_LOG"
OWNER TO postgres;
GRANT ALL ON TABLE public."ERROR_MASTER_LOG" TO postgres;
CREATE INDEX "IDX_ERROR_MASTER_LOG_MARKERID_RECNO"
ON public."ERROR_MASTER_LOG"
USING btree
("MARKERID" COLLATE pg_catalog."default", "RECNO" COLLATE pg_catalog."default", round("X1"::numeric, 2));
I tried the trigger below in TestDb1 to insert a record into a table that exists in the other database, TestDb2, using dblink. It fails with: schema "old" does not exist. Please help.
CREATE OR REPLACE FUNCTION mdp_error_master_after_delete()
RETURNS trigger AS
$BODY$
BEGIN
perform dblink_connect('host=localhost user=postgres password=postgres dbname=TestDB2');
perform dblink_exec('INSERT INTO "ERROR_MASTER_LOG"("MARKERID","ID")
values('||OLD."MARKERID"||','||OLD."ID"')');
perform dblink_disconnect();
RETURN OLD;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'insert_new_sessions SQL ERROR: %', SQLERRM;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION mdp_error_master_after_delete()
OWNER TO postgres;
CREATE TRIGGER ERROR_MASTER_CHANGES
AFTER DELETE
ON "ERROR_MASTER"
FOR EACH ROW
EXECUTE PROCEDURE mdp_error_master_after_delete();
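As an aside, the string building in the function body is also missing a || before the final ')'. Hand-assembling SQL like this is fragile; below is a minimal sketch of a safer variant that interpolates the OLD values with format() (error handling omitted, connection string as in the question):
CREATE OR REPLACE FUNCTION mdp_error_master_after_delete()
RETURNS trigger AS
$BODY$
BEGIN
perform dblink_connect('host=localhost user=postgres password=postgres dbname=TestDB2');
-- format() builds the remote statement; OLD is resolved by plpgsql here,
-- so it is never sent as literal text to the remote server
perform dblink_exec(format(
'INSERT INTO "ERROR_MASTER_LOG"("MARKERID","ID") VALUES (%s, %s)',
OLD."MARKERID", OLD."ID"));
perform dblink_disconnect();
RETURN OLD;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;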
The JSON received from the server has this form:
[
{
"id": 1103333,
"name": "James",
"tagA": [
"apple",
"orange",
"grape"
],
"tagB": [
"red",
"green",
"blue"
],
"tagC": null
},
{
"id": 1103336,
"name": "John",
"tagA": [
"apple",
"pinapple",
"melon"
],
"tagB": [
"black",
"white",
"blue"
],
"tagC": [
"London",
"New York"
]
}
]
An object can have multiple tags, and a tag can be associated with multiple objects.
In this list, I want to find the objects whose tagA contains apple or grape and whose tagB contains black.
This is the first table design I tried:
create table response(id integer primary key, name text not null, tagA text,
tagB text, tagC text);
select * from response where (tagA like '%apple%' or tagA like '%grape%') and (tagB like '%black%');
This table design has a problem: searching is very slow, because a LIKE '%...%' match cannot use an index, and the FTS virtual-table feature is not well supported when using an ORM library such as Room.
The next thing I thought about was to create a table for each tag.
create table response(id integer primary key, name text not null);
create table tagA(objectID integer, value text, primary key(objectID, value));
create table tagB(objectID integer, value text, primary key(objectID, value));
create table tagC(objectID integer, value text, primary key(objectID, value));
select * from response where id in ((select objectID from tagA where value in ('apple','grape'))
intersect
(select objectID from tagB where value in ('black')));
This greatly increases the insertion time and the size of the APK (roughly twice as much per additional table), and the search speed still lags far behind that of an FTS virtual table.
I want to avoid FTS tables as much as possible, because they leave me with more things to manage myself.
There are probably things I have missed (indexes etc.), but I cannot figure out what they are.
How can I optimize the database without using the FTS method?
You could use a reference table (aka mapping table along with a multitude of other names) to allow a many-many relationship between tags (single table for all) and objects (again single table).
So you have the objects table, each object having an id, and you have the tags table, again with an id for each tag. So something like :-
DROP TABLE IF EXISTS object_table;
CREATE TABLE IF NOT EXISTS object_table (id INTEGER PRIMARY KEY, object_name);
DROP TABLE IF EXISTS tag_table;
CREATE TABLE IF NOT EXISTS tag_table (id INTEGER PRIMARY KEY, tag_name);
You'd populate both e.g.
INSERT INTO object_table (object_name) VALUES
('Object1'),('Object2'),('Object3'),('Object4');
INSERT INTO tag_table (tag_name) VALUES
('Apple'),('Orange'),('Grape'),('Pineapple'),('Melon'),
('London'),('New York'),('Paris'),
('Red'),('Green'),('Blue'); -- and so on
Then you'd have the mapping table, something like :-
DROP TABLE IF EXISTS object_tag_mapping;
CREATE TABLE IF NOT EXISTS object_tag_mapping (object_reference INTEGER, tag_reference INTEGER);
Over time, as tags are assigned to objects or vice versa, you add the mappings, e.g. :-
INSERT INTO object_tag_mapping VALUES
(1,4), -- obj1 has tag Pineapple
(1,1), -- obj1 has Apple
(1,8), -- obj1 has Paris
(1,10), -- obj1 has green
(4,1),(4,3),(4,11), -- some tags for object 4
(2,8),(2,7),(2,4), -- some tags for object 2
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(3,7),(3,8),(3,9),(3,10),(3,11); -- all tags for object 3
You could then have queries such as :-
SELECT object_name,
group_concat(tag_name,' ~ ') AS tags_for_this_object
FROM object_tag_mapping
JOIN object_table ON object_reference = object_table.id
JOIN tag_table ON tag_reference = tag_table.id
GROUP BY object_name
;
group_concat is an aggregate function (applied per GROUP) that concatenates all values found for the specified column, with an (optional) separator.
The result is one row per object, with all of that object's tags concatenated in the tags_for_this_object column.
The following could be a search based upon tags (note that in practice you'd use either tag_name or a tag_reference, not likely both) :-
SELECT object_name, tag_name
FROM object_tag_mapping
JOIN object_table ON object_reference = object_table.id
JOIN tag_table ON tag_reference = tag_table.id
WHERE tag_name = 'Pineapple' OR tag_reference = 9
;
This would return the rows for objects tagged Pineapple or with tag_reference 9 (Red).
Note this is a simple overview; e.g. you may want to consider making the mapping table a WITHOUT ROWID table, and perhaps adding a composite UNIQUE constraint, as sketched below.
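A minimal sketch of that refinement; the composite PRIMARY KEY doubles as the UNIQUE constraint and allows the WITHOUT ROWID optimization:
DROP TABLE IF EXISTS object_tag_mapping;
CREATE TABLE IF NOT EXISTS object_tag_mapping (object_reference INTEGER, tag_reference INTEGER, PRIMARY KEY (object_reference, tag_reference)) WITHOUT ROWID; -- the key also prevents duplicate mappings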
Additional re comment :-
How do I implement a query that contains two or more tags at the same
time?
This is a little more complex if you want specific tags, but still doable. Here's an example using a CTE (Common Table Expression) along with a HAVING clause (a filter applied after the rows have been grouped, so it can reference aggregates) :-
WITH cte1(otm_oref,otm_tref,tt_id,tt_name, ot_id, ot_name) AS
(
SELECT * FROM object_tag_mapping
JOIN tag_table ON tag_reference = tag_table.id
JOIN object_table ON object_reference = object_table.id
WHERE tag_name = 'Pineapple' OR tag_name = 'Apple'
)
SELECT ot_name, group_concat(tt_name), count(*) AS cnt FROM cte1
GROUP BY otm_oref
HAVING cnt = 2
;
This returns Object1 and Object3, the two objects that carry both the Pineapple and the Apple tag.
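The pattern generalizes to any number of required tags: filter with IN, group per object, and compare the count against the number of requested tags. A sketch (count(DISTINCT ...) guards against duplicate mappings):
SELECT object_name
FROM object_tag_mapping
JOIN object_table ON object_reference = object_table.id
JOIN tag_table ON tag_reference = tag_table.id
WHERE tag_name IN ('Pineapple','Apple') -- the required tags
GROUP BY object_reference
HAVING count(DISTINCT tag_reference) = 2 -- must have all of them
;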
I have the following query working in pure SQL on SQLite but do not know how to convert it to pyDAL:
SELECT * FROM buy WHERE date('now','-2 days') < timestamp;
The buy table schema is:
CREATE TABLE "buy"(
"id" INTEGER PRIMARY KEY AUTOINCREMENT,
"order_id" CHAR(512),
"market" CHAR(512),
"purchase_price" DOUBLE,
"selling_price" DOUBLE,
"amount" DOUBLE
, "timestamp" TIMESTAMP, "config_file" CHAR(512));
Instead of using the SQLite date function, you can create the comparison date in Python:
import datetime
cutoff_time = datetime.datetime.now() - datetime.timedelta(days=2)
rows = db(db.buy.timestamp > cutoff_time).select()
Alternatively, you can also pass a raw SQL string as the query:
rows = db('buy.timestamp > date("now", "-2 days")').select(db.buy.ALL)
Note that in this case, because the query inside db() is just a raw SQL string, the DAL cannot infer which table is being selected, so you must explicitly specify the fields in the .select(). Alternatively, you could add a dummy query that selects all records, such as (db.buy.id != None).
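If you want to check what the DAL actually sends to SQLite, pyDAL keeps the last executed statement; a small sketch, assuming db and the buy table are already defined as above:
rows = db(db.buy.timestamp > cutoff_time).select()
print(db._lastsql)  # the last SQL statement the DAL executed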
I'm having an issue where my queries used to work fine with =, as in WHERE some_int_field = some_other_int_field. When I do that now, I get 0 results; however, if I do WHERE some_int_field LIKE some_other_int_field, I get my results. I have checked the fields for hidden characters/spaces and the lengths of the values are correct. They are both integer fields. Thoughts? The two table structures are below:
CREATE TABLE "languages"(
"language_id" Integer,
"name" Text,
"english" Text,
"spanish" Text,
"portuguese" Text,
"french" Text );
-- Create index languagesIdx
CREATE INDEX "languagesIdx" ON "languages"( "name" );
-------------
CREATE TABLE "drop_downs"(
"mode_data" Integer,
"text_index" Integer,
"language_id" Integer );
-- Create index drop_downsIdx
CREATE INDEX "drop_downsIdx" ON "drop_downs"( "mode_data", "language_id" );
SQLite uses dynamic typing and does not care about the declared type of the fields.
You have strings in your fields.
To check which rows have strings, use something like this:
SELECT * FROM drop_downs WHERE typeof(mode_data) = 'text'
To convert all values in a column into numbers, use something like this:
UPDATE drop_downs SET mode_data = CAST(mode_data AS integer)
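A quick way to see why = and LIKE behave differently here is to compare a text value against a number directly; a minimal sketch for any SQLite shell:
SELECT '1' = 1;                   -- 0: a TEXT value never compares equal to an INTEGER with =
SELECT '1' LIKE 1;                -- 1: LIKE converts both operands to TEXT first
SELECT CAST('1' AS integer) = 1;  -- 1: after the cast, both sides are INTEGER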