Update a field with a specific value inside a JSON object with MariaDB

I'm trying to update the data stored in a JSON column in MariaDB (libmysql version 5.6.43, server 10.3.34-MariaDB-cll-lve).
My data is structured like this:
ID | json_data
---+----------
1  | {....}
2  | {....}
where json_data is structured as follows:
{
  "company": {
    "id": "",
    "name": "",
    "address": ""
  },
  "info_company": {
    "diff_v": "1",
    "grav_v": "",
    "diff_s": "2",
    "grav_s": "",
    "diff_g": "3",
    "grav_g": "",
    "diff_ri": "4",
    "grav_ri": "2"
  }
}
I'm trying to update data inside info_company replacing:
"1" with "<50%"
"2" with "<50%"
"3" with ">50%"
"4" with ">50%"
so the result should be:
{
  "company": {
    "id": "",
    "name": "",
    "address": ""
  },
  "info_company": {
    "diff_v": "<50%",
    "grav_v": "",
    "diff_s": "<50%",
    "grav_s": "",
    "diff_g": ">50%",
    "grav_g": "",
    "diff_ri": ">50%",
    "grav_ri": "<50%"
  }
}
With the following query I can retrieve the info_company data, but I cannot then update each contained key to its new value:
SELECT new_t.id, JSON_EXTRACT(new_t.json_data, '$.info_company') FROM (SELECT * FROM `my_table` WHERE json_data LIKE '%info_company%') new_t
Output:
ID | json_data
---+----------
1  | {"diff_v": "1", "grav_v": "", "diff_s": "2", "grav_s": "", "diff_g": "3", "grav_g": "", "diff_ri": "4", "grav_ri": "2"}
Thank you for your help.

You can solve this problem by using a CTE to generate a regex to match the keys (and desired matching values) inside info_company and then using REGEXP_REPLACE to replace a 1 or 2 with <50% and a 3 or 4 with >50%:
UPDATE my_table
JOIN (
  WITH jkeys_table AS (
    SELECT id, JSON_KEYS(json_data, '$.info_company') AS jkeys
    FROM my_table
  )
  SELECT id,
         CONCAT('((?:',
                REPLACE(SUBSTRING(jkeys, 2, CHAR_LENGTH(jkeys)-2), ', ', '|'),
                ')\\s*:\\s*)"([12])"'
         ) AS regex12,
         CONCAT('((?:',
                REPLACE(SUBSTRING(jkeys, 2, CHAR_LENGTH(jkeys)-2), ', ', '|'),
                ')\\s*:\\s*)"([34])"'
         ) AS regex34
  FROM jkeys_table
) rt ON my_table.id = rt.id
SET json_data = REGEXP_REPLACE(REGEXP_REPLACE(json_data, regex12, '\\1"<50%"'), regex34, '\\1">50%"')
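For the sample row, JSON_KEYS returns ["diff_v", "grav_v", "diff_s", "grav_s", "diff_g", "grav_g", "diff_ri", "grav_ri"], so the derived regex12 works out to (regex34 is identical except its character class is [34]):

((?:"diff_v"|"grav_v"|"diff_s"|"grav_s"|"diff_g"|"grav_g"|"diff_ri"|"grav_ri")\s*:\s*)"([12])"

Group 1 captures the quoted key plus the separator, so the replacement '\\1"<50%"' keeps the key and rewrites only the value.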
Output (for your sample JSON):
id | json_data
1  | {
       "company": {
         "id": "",
         "name": "",
         "address": ""
       },
       "info_company": {
         "diff_v": "<50%",
         "grav_v": "",
         "diff_s": "<50%",
         "grav_s": "",
         "diff_g": ">50%",
         "grav_g": "",
         "diff_ri": ">50%",
         "grav_ri": "<50%"
       }
     }
Demo on dbfiddle
If it's possible that the keys in info_company might also exist elsewhere inside json_data, you need to localise the changes to the info_company element. You can do this by changing the SET clause of the UPDATE to:
SET json_data = JSON_REPLACE(json_data, '$.info_company',
    JSON_MERGE_PATCH(JSON_QUERY(json_data, '$.info_company'),
                     REGEXP_REPLACE(REGEXP_REPLACE(JSON_QUERY(json_data, '$.info_company'), regex12, '\\1"<50%"'), regex34, '\\1">50%"')
    )
)
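Here the REGEXP_REPLACE calls operate only on the object extracted by JSON_QUERY(json_data, '$.info_company'), and JSON_REPLACE writes the patched object back at $.info_company, so identically named keys elsewhere in json_data are left untouched.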
Demo on dbfiddle
If the keys in info_company are the same for every row, you can optimise the query by only computing the regex12 and regex34 values once, and then applying those values to all rows in my_table using a CROSS JOIN:
UPDATE my_table
CROSS JOIN (
  WITH jkeys_table AS (
    SELECT JSON_KEYS(json_data, '$.info_company') AS jkeys
    FROM my_table
    LIMIT 1
  )
  SELECT CONCAT('((?:',
                REPLACE(SUBSTRING(jkeys, 2, CHAR_LENGTH(jkeys)-2), ', ', '|'),
                ')\\s*:\\s*)"([12])"'
         ) AS regex12,
         CONCAT('((?:',
                REPLACE(SUBSTRING(jkeys, 2, CHAR_LENGTH(jkeys)-2), ', ', '|'),
                ')\\s*:\\s*)"([34])"'
         ) AS regex34
  FROM jkeys_table
) rt
SET json_data = REGEXP_REPLACE(REGEXP_REPLACE(json_data, regex12, '\\1"<50%"'), regex34, '\\1">50%"')
Demo on dbfiddle
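As a sanity check before running any of these UPDATEs, you can preview the rewritten JSON by recasting the statement as a SELECT (a sketch reusing the same derived table as the CROSS JOIN version above):
SELECT my_table.id,
       REGEXP_REPLACE(REGEXP_REPLACE(json_data, regex12, '\\1"<50%"'), regex34, '\\1">50%"') AS new_json
FROM my_table
CROSS JOIN (
  WITH jkeys_table AS (
    SELECT JSON_KEYS(json_data, '$.info_company') AS jkeys
    FROM my_table
    LIMIT 1
  )
  SELECT CONCAT('((?:', REPLACE(SUBSTRING(jkeys, 2, CHAR_LENGTH(jkeys)-2), ', ', '|'), ')\\s*:\\s*)"([12])"') AS regex12,
         CONCAT('((?:', REPLACE(SUBSTRING(jkeys, 2, CHAR_LENGTH(jkeys)-2), ', ', '|'), ')\\s*:\\s*)"([34])"') AS regex34
  FROM jkeys_table
) rt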

Tested on MariaDB 10.3.34 database server with your json_data:
DELIMITER //
CREATE PROCEDURE percentage()
BEGIN
  SELECT @info_keys := JSON_KEYS(json_data, "$.info_company") FROM my_table;
  SELECT @info_keys_num := JSON_LENGTH(@info_keys);
  WHILE @info_keys_num > 0 DO
    SET @info_keys_num = @info_keys_num - 1;
    SELECT @info_attr := JSON_EXTRACT(@info_keys, CONCAT("$[", @info_keys_num, "]"));
    UPDATE my_table SET json_data = JSON_REPLACE(json_data, CONCAT("$.info_company.", @info_attr), "<50%")
    WHERE CHAR_LENGTH(JSON_VALUE(json_data, CONCAT("$.info_company.", @info_attr))) = 1 AND
          JSON_VALUE(json_data, CONCAT("$.info_company.", @info_attr)) < 3;
    UPDATE my_table SET json_data = JSON_REPLACE(json_data, CONCAT("$.info_company.", @info_attr), ">50%")
    WHERE CHAR_LENGTH(JSON_VALUE(json_data, CONCAT("$.info_company.", @info_attr))) = 1 AND
          JSON_VALUE(json_data, CONCAT("$.info_company.", @info_attr)) > 2;
  END WHILE;
END;
//
DELIMITER ;
call percentage();
Example of output:
MariaDB [test]> call percentage();
+------------------------------------------------------------------------------------+
| @info_keys:=JSON_KEYS(json_data, "$.info_company")                                 |
+------------------------------------------------------------------------------------+
| ["diff_v", "grav_v", "diff_s", "grav_s", "diff_g", "grav_g", "diff_ri", "grav_ri"] |
+------------------------------------------------------------------------------------+
1 row in set (0.001 sec)
... [cut here] ...
Query OK, 5 rows affected (0.011 sec)

Related

Db2 on Cloud: Problem with column in querying from R

I created a connection between R and Db2 on Cloud:
library(RODBC)
dsn_driver <- "{IBM DB2 ODBC Driver}"
dsn_database <- "bludb"  # e.g. "bludb"
dsn_hostname <- "**"
dsn_port <- "***"        # e.g. "32733"
dsn_protocol <- "TCPIP"  # i.e. "TCPIP"
dsn_uid <- "**"
dsn_pwd <- "**"
dsn_security <- "ssl"
conn_path <- paste("DRIVER=", dsn_driver,
                   ";DATABASE=", dsn_database,
                   ";HOSTNAME=", dsn_hostname,
                   ";PORT=", dsn_port,
                   ";PROTOCOL=", dsn_protocol,
                   ";UID=", dsn_uid,
                   ";PWD=", dsn_pwd,
                   ";SECURITY=", dsn_security,
                   sep="")
conn <- odbcDriverConnect(conn_path)
conn
Then I created the table:
myschema <- "**"
tables <- c("Annual_Crop")
for (table in tables) {
  # Drop the table if it already exists
  out <- sqlTables(conn, tableType = "TABLE", schema = myschema, tableName = table)
  if (nrow(out) > 0) {
    err <- sqlDrop(conn, paste(myschema, ".", table, sep=""), errors=FALSE)
    if (err == -1) {
      cat("An error has occurred.\n")
      err.msg <- odbcGetErrMsg(conn)
      for (error in err.msg) {
        cat(error, "\n")
      }
    } else {
      cat("Table: ", myschema, ".", table, " was dropped\n")
    }
  } else {
    cat("Table: ", myschema, ".", table, " does not exist\n")
  }
}
df1 <- sqlQuery(conn, "CREATE TABLE Annual_Crop(
                         CD_ID CHAR(6) NOT NULL,
                         YEAR CHAR(20),
                         CROP_TYPE VARCHAR(50),
                         GEO VARCHAR(50),
                         SEEDED_AREA CHAR(50),
                         HARVESTED_AREA CHAR(50),
                         PRODUCTION CHAR(50),
                         AVG_YIELD CHAR(50),
                         PRIMARY KEY (CD_ID))",
                errors = FALSE)
if (df1 == -1) {
  cat("An error has occurred.\n")
  msg <- odbcGetErrMsg(conn)
  print(msg)
} else {
  cat("Table was created successfully.\n")
}
I loaded the dataset from a file into the table
anual_cropdf <- read.csv("/resources/labs/MYDATA/data1.csv")
sqlSave(conn, anual_cropdf, 'Annual_Crop', append=TRUE, fast=FALSE, rownames=FALSE, colnames=FALSE, verbose=FALSE)
Then I tried to fetch from the table and it works
FARMDB <- sqlFetch(conn, "Annual_Crop")
tail(FARMDB)
Finally, when I tried to perform a query, it did not work. The result was just the column names (a 0 x 8 result):
info <- paste('select * from Annual_Crop
              where Geo = 41600')
query <- sqlQuery(conn, info, believeNRows = FALSE)
query
Why?
Based on your table schema, the data type for Geo is VARCHAR. Have you tried a query like this?
select * from Annual_Crop where Geo = 'Alberta'
or
select * from Annual_Crop where Geo = '41600'
A varchar/string value needs to be wrapped in single quotes.
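Applied to the R code from the question, only the quoting of the value changes (a minimal sketch using the same conn as above):
info <- "select * from Annual_Crop where Geo = '41600'"
query <- sqlQuery(conn, info, believeNRows = FALSE)
query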

Filtering on an aggregate function

I am using Azure Cosmos DB and trying to write a query to filter documents by name and version. I am new to Cosmos, and it seems the way I'm doing it applies the filter per record rather than to the results themselves. Can anyone tell me the proper way to accomplish this:
select C.*
from c
JOIN (select MAX(c.version) from c where c.name = "test") maxVersion
where maxVersion = c.version
Sample data:
[{"name":"test","version":1},{"name":"test","version":2},{"name":"test","version":3}]
Results:
I get a record back for each version vs the max version. IE I only should get one record back and it's version number should be 3
When you run this SQL:
select c,maxVersion
from c
JOIN (select MAX(c.version) from c where c.name = "test") maxVersion
you will get these documents:
{
  "c": {
    "id": "1",
    "name": "test",
    "version": 1
  },
  "maxVersion": {
    "$1": 1
  }
},
{
  "c": {
    "id": "2",
    "name": "test",
    "version": 2
  },
  "maxVersion": {
    "$1": 2
  }
},
{
  "c": {
    "id": "3",
    "name": "test",
    "version": 3
  },
  "maxVersion": {
    "$1": 3
  }
}
Your maxVersion equals c.version in each document, so you will get multiple documents, not one.
According to your requirement, you can try something like this SQL:
SELECT TOP 1 *
FROM c
WHERE c.name = "test"
ORDER BY c.version DESC
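If you would rather get the bare document back instead of one nested under an alias, a VALUE projection should also work here (a sketch of standard Cosmos DB SQL, not tested against your container):
SELECT TOP 1 VALUE c
FROM c
WHERE c.name = "test"
ORDER BY c.version DESC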

Cosmos DB - Select root document based on child data

Sorry if this is a newbie question, but I am a newbie to Cosmos DB.
I am trying to select all the root documents from my collection where a child element matches specified (multiple) criteria.
Let's assume you have an ORDER document, which has ORDERITEMS as a sub-document array; what I need to do is query all the orders where a particular product has been ordered, and return the whole order document.
[
  {
    "order": {
      "id": "1",
      "orderitems": [
        { "partcode": "A", "qty": "4" },
        { "partcode": "B", "qty": "4" },
        { "partcode": "C", "qty": "4" }
      ]
    }
  },
  {
    "order": {
      "id": "2",
      "orderitems": [
        { "partcode": "A", "qty": "4" },
        { "partcode": "B", "qty": "4" },
        { "partcode": "A", "qty": "4" }
      ]
    }
  },
  {
    "order": {
      "id": "3",
      "orderitems": [
        { "partcode": "A", "qty": "1" }
      ]
    }
  }
]
My query is:
SELECT order
FROM order
JOIN item IN order.orderitems
WHERE item.partcode = '<mypartcode>'
AND item.qty > 1
Now, this sort of works and returns the orders, but it returns:
id: 1
id: 2
id: 2  << repeated
because id: 2 has two matching items; id: 3 is excluded because its qty is only 1.
In normal SQL Server SQL I would simply have:
SELECT *
FROM Orders o
WHERE EXISTS (SELECT 1
              FROM OrderItems oi
              WHERE oi.ordID = o.ID
                AND oi.partcode = 'A'
                AND oi.qty > 1)
How can I stop the duplication, please?
Please note that the above is a hand-crafted representation to simplify the problem, as the document model I am actually working on is extremely large.
Cosmos DB now supports the DISTINCT keyword and it will actually work on document use cases such as yours.
With the current version of the Azure Cosmos DB SQL API, you can use either of these:
SELECT DISTINCT VALUE order
FROM order
JOIN item IN order.orderitems
WHERE item.partcode = '<Partcode>'
AND item.qty > 1
Or:
SELECT order
FROM order
WHERE EXISTS (
    SELECT NULL
    FROM item IN order.orderitems
    WHERE item.partcode = '<Partcode>'
    AND item.qty > 1
)

R : Updating an entry in mongodb using mongolite

I have a mongo database with information that I am passing to some R scripts for analysis. I am currently using the mongolite package to pass the information from mongo to R.
I have a field in each mongo entry called checkedByR, which is a binary that indicates whether the entry has been analysed by the R scripts already. Specifically, I am collecting a mongo entry by its respective mongo ID, running the scripts on the entry, assigning the checkedByR field with a 1, and then moving on.
For completeness, I am querying the database with the following request:
library(mongolite)
mongoID <- "1234abcd1234abcd1234"
m <- mongolite::mongo(url = "mongodb://localhost:27017",
collection = "collection",
db = "database")
rawData <- m$find(query = paste0('{"_id": { "$oid" : "',mongoID,'" }}'),
fields = '{"_id" : 1,
"checkedByR" : 1,
"somethingToCheck" : 1}')
checkedByR <- 1
However, I am having trouble successfully updating the mongo entry with the new checkedByR field.
I realise that an update function exists in the mongolite package (see https://cran.r-project.org/web/packages/mongolite/mongolite.pdf), but I am having trouble finding relevant examples to help me complete the update.
Any help would be greatly appreciated.
The mongo$update() function takes a query and an update argument. You use the query to find the data you want to update, and the update to tell it which field to update.
Consider this example
library(mongolite)

## create some dummy data and insert into mongodb
df <- data.frame(id = 1:10,
                 value = letters[1:10])

mongo <- mongo(collection = "another_test",
               db = "test",
               url = "mongodb://localhost")
mongo$insert(df)

## the 'id' of the document I want to update
mongoID <- "575556825dabbf2aea1d7cc1"

## find some data
rawData <- mongo$find(query = paste0('{"_id": { "$oid" : "', mongoID, '" }}'),
                      fields = '{"_id" : 1,
                                 "id" : 1,
                                 "value" : 1}')
## ...
## do whatever you want to do in R...
## ...
## use update to query on your ID, then 'set' to set the 'checkedByR' value to 1
mongo$update(
query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'),
update = '{ "$set" : { "checkedByR" : 1} }'
)
## in my original data I didn't have a 'checkedByR' value, but it's added anyway
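To confirm the change, you can read the document back with the same query (reusing the mongoID from above):
mongo$find(query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'))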
Update
the rmongodb library is no longer on CRAN, so the code below won't work as-is
And for more complex structures & updates you can do things like
library(mongolite)
library(jsonlite)
library(rmongodb)   ## used to insert a non-data.frame into mongodb

## create some dummy data and insert into mongodb
lst <- list(id = 1,
            value_doc = data.frame(id = 1:5,
                                   value = letters[1:5],
                                   stringsAsFactors = FALSE),
            value_array = c(letters[6:10]))

## using rmongodb
mongo <- mongo.create(db = "test")
coll <- "test.another_test"
mongo.insert(mongo,
             ns = coll,
             b = mongo.bson.from.list(lst))
mongo.destroy(mongo)

## update document with specific ID
mongoID <- "5755f646ceeb7846c87afd90"

## using mongolite
mongo <- mongo(db = "test",
               collection = "another_test",
               url = "mongodb://localhost")

## to add a single value to an array
mongo$update(
  query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'),
  update = '{ "$addToSet" : { "value_array" : "checkedByR" } }'
)

## to add a document to the value_array
mongo$update(
  query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'),
  update = '{ "$addToSet" : { "value_array" : { "checkedByR" : 1 } } }'
)

## to add to a nested array
mongo$update(
  query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'),
  update = '{ "$addToSet" : { "value_doc.value" : "checkedByR" } }'
)

rm(mongo); gc()
See the MongoDB update documentation for further details.

How to get a list of column names

Is it possible to get a row with all the column names of a table, like this?
|id|foo|bar|age|street|address|
I'd rather not use PRAGMA table_info(bla).
SELECT sql FROM sqlite_master
WHERE tbl_name = 'table_name' AND type = 'table'
Then parse this value with a regex (it's easy), which could look similar to this: [(.*?)]
Alternatively you can use:
PRAGMA table_info(table_name)
If you are using the command-line shell to SQLite, run .headers on before you perform your query. You only need to do this once in a given session.
You can use PRAGMA-related commands in SQLite as below:
pragma table_info("table_name")
--Alternatively
select * from pragma_table_info("table_name")
If you want the column names in the form id|foo|bar|age|street|address, the query below is basically your answer:
select group_concat(name,'|') from pragma_table_info("table_name")
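For the example table above, this returns: id|foo|bar|age|street|address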
Yes, you can achieve this by using the following commands:
sqlite> .headers on
sqlite> .mode column
The result of a select on your table will then look like:
id foo bar age street address
---------- ---------- ---------- ---------- ---------- ----------
1 val1 val2 val3 val4 val5
2 val6 val7 val8 val9 val10
This helps for HTML5 SQLite:
tx.executeSql('SELECT name, sql FROM sqlite_master WHERE type="table" AND name = "your_table_name";', [], function (tx, results) {
var columnParts = results.rows.item(0).sql.replace(/^[^\(]+\(([^\)]+)\)/g, '$1').split(','); ///// RegEx
var columnNames = [];
for(i in columnParts) {
if(typeof columnParts[i] === 'string')
columnNames.push(columnParts[i].split(" ")[0]);
}
console.log(columnNames);
///// Your code which uses the columnNames;
});
You can reuse the regex in your language to get the column names.
Shorter Alternative:
tx.executeSql('SELECT name, sql FROM sqlite_master WHERE type="table" AND name = "your_table_name";', [], function (tx, results) {
var columnNames = results.rows.item(0).sql.replace(/^[^\(]+\(([^\)]+)\)/g, '$1').replace(/ [^,]+/g, '').split(',');
console.log(columnNames);
///// Your code which uses the columnNames;
});
Use a recursive query. Given
create table t (a int, b int, c int);
Run:
with recursive
a (cid, name) as (select cid, name from pragma_table_info('t')),
b (cid, name) as (
select cid, '|' || name || '|' from a where cid = 0
union all
select a.cid, b.name || a.name || '|' from a join b on a.cid = b.cid + 1
)
select name
from b
order by cid desc
limit 1;
Alternatively, just use group_concat:
select '|' || group_concat(name, '|') || '|' from pragma_table_info('t')
Both yield:
|a|b|c|
The result set of a query in PHP offers a couple of functions allowing just that:
numCols()
columnName(int $column_number )
Example
$db = new SQLite3('mysqlite.db');
$table = 'mytable';
$tableCol = getColName($db, $table);
for ($i = 0; $i < count($tableCol); $i++) {
    echo "Column $i = " . $tableCol[$i] . "\n";
}

function getColName($db, $table) {
    $qry = "SELECT * FROM $table LIMIT 1";
    $result = $db->query($qry);
    $nCols = $result->numCols();
    for ($i = 0; $i < $nCols; $i++) {
        $colName[$i] = $result->columnName($i);
    }
    return $colName;
}
<?php
$db = sqlite_open('mysqlitedb');
$cols = sqlite_fetch_column_types('form name', $db, SQLITE_ASSOC);
foreach ($cols as $column => $type) {
    echo "Column: $column Type: $type\n";
}
Using @Tarkus's answer, here are the regexes I used in R:
getColNames <- function(conn, tableName) {
x <- dbGetQuery( conn, paste0("SELECT sql FROM sqlite_master WHERE tbl_name = '",tableName,"' AND type = 'table'") )[1,1]
x <- str_split(x,"\\n")[[1]][-1]
x <- sub("[()]","",x)
res <- gsub( '"',"",str_extract( x[1], '".+"' ) )
x <- x[-1]
x <- x[-length(x)]
res <- c( res, gsub( "\\t", "", str_extract( x, "\\t[0-9a-zA-Z_]+" ) ) )
res
}
Code is somewhat sloppy, but it appears to work.
Try this SQLite table schema parser; I implemented it in PHP for parsing table definitions.
It returns the full definitions (unique, primary key, type, precision, not null, references, table constraints, etc.):
https://github.com/maghead/sqlite-parser
The easiest way to get the column names of the most recently executed SELECT is to use the cursor's description property. A Python example:
print_me = "("
for description in cursor.description:
print_me += description[0] + ", "
print(print_me[0:-2] + ')')
# Example output: (inp, output, reason, cond_cnt, loop_likely)
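For comparison, a more compact way to build the same string from cursor.description (same assumptions as above):
print('(' + ', '.join(col[0] for col in cursor.description) + ')')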
