JSON_SEARCH with a list / dynamic MySQL query in R

I have a MySQL table called user_activities in which one column (activities) is in JSON format:
id name activities
1 Peter ["football", "volley"]
2 Mary ["football", "hockey", "basketball"]
3 Jason ["volley", "hockey", "golf"]
I need to construct a query that, given a list of activities, returns all users who have at least one of the activities in that list.
Example 1:
Given a list
inputList <- list("football", "basketball")
the MySQL query should return:
id name activities
1 Peter ["football", "volley"]
2 Mary ["football", "hockey", "basketball"]
Example 2:
Given a list
inputList <- list("hockey", "golf", "basketball")
the MySQL query should return:
id name activities
2 Mary ["football", "hockey", "basketball"]
3 Jason ["volley", "hockey", "golf"]
I know it is possible to check for each activity individually, like this:
SELECT * FROM user_activities
WHERE JSON_SEARCH(`activities`, 'one', 'football') IS NOT NULL
OR JSON_SEARCH(`activities`, 'one', 'basketball') IS NOT NULL
OR JSON_SEARCH(`activities`, 'one', 'volley') IS NOT NULL
OR JSON_SEARCH(`activities`, 'one', 'hockey') IS NOT NULL
OR JSON_SEARCH(`activities`, 'one', 'golf') IS NOT NULL;
But if an activity is not in the specified list (inputList), I don't want to check for it in activities, and inputList changes every time I run the query.
So, is there any way to check the whole list against the content of activities at once? I tried:
SELECT * FROM user_activities
WHERE JSON_SEARCH(`activities`, 'all', (",paste(shQuote(inputList, type = "sh"), collapse = ','),")) IS NOT NULL;
but it 'obviously' returns an error:
`Error in .local(conn, statement, ...): could not run statement: Operand should contain 1 column(s)`
because JSON_SEARCH checks whether a single string exists in a JSON array or JSON document, and I'm not passing a single string to the function.
And JSON_CONTAINS
SELECT * FROM user_activities
WHERE JSON_CONTAINS(`activities`->'$[*]', JSON_ARRAY(", paste(shQuote(inputList, type = "sh"), collapse = ','), "))
returns a row only if all the elements of inputList exist in activities, whereas I want the rows where any of the elements exist in activities (not necessarily all).
How could I achieve this?
Edit
I found a solution (see the answer below) by building a dynamic query, as suggested in this question for PHP: MySQL Filter JSON_CONTAINS Any value from Array.

I found a solution myself, using the dynamic query option I mentioned in the Edit.
condition <- function(dbcolumn, inlist) {
  # One JSON_SEARCH test per list element, all joined with OR
  paste0("JSON_SEARCH(`", dbcolumn, "`, 'one', '", unlist(inlist), "') IS NOT NULL",
         collapse = " OR ")
}
So, if I call the function with Example 2 from my question:
condition("activities",inputList)
it returns:
"JSON_SEARCH(`activities`, 'one', 'hockey') IS NOT NULL OR JSON_SEARCH(`activities`, 'one', 'golf') IS NOT NULL OR JSON_SEARCH(`activities`, 'one', 'basketball') IS NOT NULL"
So the MySQL query in R will finally look like:
query <- paste0("SELECT * FROM user_activities
WHERE ", condition("activities", inputList),";")
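The condition() approach above pastes each activity straight into the SQL text, which breaks if a value contains a quote. Below is a minimal sketch of the same OR-of-JSON_SEARCH idea in Python that binds the values as parameters instead. It assumes a DB-API driver that uses the %s placeholder style, such as PyMySQL or mysql-connector-python. The helper name any_activity_condition is hypothetical. The column name cannot be bound as a parameter, so the sketch validates it against a conservative identifier pattern instead.

```python
import re

# Conservative pattern for an unquoted column name (assumption: plain ASCII names).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def any_activity_condition(column, activities):
    """Hypothetical helper: return (sql_fragment, params) that matches rows whose
    JSON array in `column` contains at least one of `activities`."""
    # Column names cannot be bound as parameters, so validate them instead.
    if not _IDENTIFIER.fullmatch(column):
        raise ValueError(f"unsafe column name: {column!r}")
    if not activities:
        # An empty list matches nothing instead of producing invalid SQL.
        return "FALSE", []
    one_test = f"JSON_SEARCH(`{column}`, 'one', %s) IS NOT NULL"
    # Parentheses keep the ORs grouped if the caller adds AND conditions later.
    sql = "(" + " OR ".join([one_test] * len(activities)) + ")"
    return sql, list(activities)


if __name__ == "__main__":
    where, params = any_activity_condition("activities", ["hockey", "golf", "basketball"])
    query = "SELECT * FROM user_activities WHERE " + where
    print(query)
    print(params)
    # With a driver such as PyMySQL this would then be: cursor.execute(query, params)
```

JSON_SEARCH treats % and _ in the search string as LIKE-style wildcards. Activity names that contain those characters would therefore still need escaping, even when they are passed as parameters.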

Related

Snowflake, Recursive CTE , Getting error String 'AAAA_50>BBBB_47>CCCC_92' is too long and would be truncated in 'CONCAT'

I am creating a recursive CTE in Snowflake to get the complete path, and I am getting the following error:
String 'AAAA_50>BBBB_47>CCCC_92' is too long and would be truncated in 'CONCAT'
My script is as follows (it works fine for 2 levels but starts failing at the 3rd level):
with recursive plant
(child_col,parent_col,val )
as
(
select child_col, '' parent_col , trim(child_col) from My_view
where condition1 = 'AAA'
union all
select A.child_col,A.parent_col,
concat(trim(A.child_col),'>')||trim(val)
from My_view A
JOIN plant as B ON trim(B.child_col) = trim(A.parent_col)
)
select distinct * from plant
Most likely child_col is defined as VARCHAR(N), and the type of the anchor column is passed on to every recursive level. As the CONCAT documentation says under Returns:
The data type of the returned value is the same as the data type of
the input value(s).
Try explicitly casting the anchor value to a string, like this: cast(trim(child_col) as string)
Full code:
with recursive plant (child_col,parent_col,val )
as (
select child_col, '' parent_col , cast(trim(child_col) as string)
from My_view
where condition1 = 'AAA'
union all
select A.child_col, A.parent_col, concat(trim(A.child_col),'>')||trim(val)
from My_view A
join plant as B ON trim(B.child_col) = trim(A.parent_col)
)
select distinct * from plant
Remember that recursion in Snowflake is limited to 100 loops by default.
If you want to increase the limit, you need to contact support.
References: CONCAT; Troubleshooting a Recursive CTE
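To make the path-building pattern concrete, here is a small sketch that runs the same kind of recursive CTE in SQLite through Python's sqlite3 module. The sketch makes several assumptions. SQLite does not enforce VARCHAR lengths, so it cannot reproduce the Snowflake truncation error; the CAST only mirrors where the fix goes. The table contents are hypothetical. The root is identified by a NULL parent rather than by condition1 = 'AAA'. The explicit depth counter is an added guard, not part of the original script.

```python
import sqlite3

# Hypothetical parent/child rows standing in for My_view; the root has no parent.
ROWS = [
    ("AAAA_50", None),
    ("BBBB_47", "AAAA_50"),
    ("CCCC_92", "BBBB_47"),
]

PATH_SQL = """
WITH RECURSIVE plant(child_col, val, depth) AS (
    -- Anchor: the CAST mirrors the Snowflake fix (SQLite itself never truncates).
    SELECT child_col, CAST(trim(child_col) AS TEXT), 1
    FROM my_view
    WHERE parent_col IS NULL
    UNION ALL
    -- Recursive step: prepend the child to its parent's accumulated path.
    SELECT a.child_col, trim(a.child_col) || '>' || b.val, b.depth + 1
    FROM my_view a
    JOIN plant b ON trim(b.child_col) = trim(a.parent_col)
    WHERE b.depth < ?  -- explicit depth guard against runaway recursion
)
SELECT child_col, val FROM plant ORDER BY depth
"""


def build_paths(rows, max_depth=100):
    """Return (child_col, path) pairs, at most max_depth levels deep."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE my_view (child_col TEXT, parent_col TEXT)")
        con.executemany("INSERT INTO my_view VALUES (?, ?)", rows)
        return con.execute(PATH_SQL, (max_depth,)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for child, path in build_paths(ROWS):
        print(child, path)
```

The deepest row's path has the form child>parent>root, which is the same order produced by concat(trim(A.child_col),'>')||trim(val) in the script above.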

R, ClickHouse: Expected: FixedString(34). Got: UInt64: While processing

I am trying to query data from a ClickHouse database from R, filtering on a subset of IDs.
Here is an example:
library(data.table)
library(RClickhouse)
library(DBI)
subset <- paste(traffic[,unique(IDs)][1:30], collapse = ',')
conClickHouse <- DBI::dbConnect('here is the connection')
DataX <- dbGetQuery(conClickHouse, paste0("select * from database
and IDs in (", subset, ") "))
As a result I get error:
DB::Exception: Type mismatch in IN or VALUES section. Expected: FixedString(34).
Got: UInt64: While processing (IDs IN ....
Any help is appreciated
Thanks to the comment by @DennyCrane, this query subsets properly:
"select * from database where toFixedString(IDs,34) in
(toFixedString(ID1, 34), toFixedString(ID2,34 ))"
https://clickhouse.tech/docs/en/sql-reference/functions/#strong-typing
Strong Typing
In contrast to standard SQL, ClickHouse has strong typing. In other words, it doesn’t make implicit conversions between types. Each function works for a specific set of types. This means that sometimes you need to use type conversion functions.
https://clickhouse.tech/docs/en/sql-reference/functions/type-conversion-functions/#tofixedstrings-n
select * from (select 'x' B ) where B in (select toFixedString('x',1))
DB::Exception: Types of column 1 in section IN don't match: String on the left, FixedString(1) on the right.
Use a cast with toString or toFixedString:
select * from (select 'x' B ) where toFixedString(B,1) in (select toFixedString('x',1))
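The error in the question ("Got: UInt64") suggests that the IDs were pasted into the IN list unquoted, so ClickHouse read them as numbers. Below is a minimal sketch, written in Python for illustration, of building the IN list in the form the accepted answer uses: each ID becomes a quoted literal wrapped in toFixedString(..., 34). The helper name fixed_string_in_list and the table name my_table are hypothetical. The width 34 comes from the error message. Quotes and backslashes are escaped with backslashes, as ClickHouse string literals require.

```python
def fixed_string_in_list(ids, width=34):
    """Hypothetical helper: render IDs as a ClickHouse IN-list of
    toFixedString('...', width) literals."""
    parts = []
    for raw in ids:
        s = str(raw)  # numeric-looking IDs become string literals, not UInt64
        if len(s.encode("utf-8")) > width:
            # toFixedString raises an exception for strings longer than N bytes.
            raise ValueError(f"{s!r} is longer than FixedString({width})")
        # Escape backslashes first, then single quotes.
        escaped = s.replace("\\", "\\\\").replace("'", "\\'")
        parts.append(f"toFixedString('{escaped}', {width})")
    return ", ".join(parts)


if __name__ == "__main__":
    ids = ["abc", "o'k"]
    print(f"SELECT * FROM my_table WHERE IDs IN ({fixed_string_in_list(ids)})")
```

The same quoting could be done in R before calling paste0; the important part is that every ID reaches ClickHouse as a string literal.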

order of search for Sqlite's "IN" operator guaranteed?

I'm performing an Sqlite3 query similar to
SELECT * FROM nodes WHERE name IN ('name1', 'name2', 'name3', ...) LIMIT 1
Am I guaranteed that it will search for name1 first, name2 second, etc? Such that by limiting my output to 1 I know that I found the first hit according to my ordering of items in the IN clause?
Update: with some testing it seems to always return the first hit in the index regardless of the IN order. It's using the order of the index on name. Is there some way to enforce the search order?
The order of the returned rows is not guaranteed to match the order of the items inside the parenthesis after IN.
What you can do is add an ORDER BY clause that uses the INSTR() function:
SELECT * FROM nodes
WHERE name IN ('name1', 'name2', 'name3')
ORDER BY INSTR(',name1,name2,name3,', ',' || name || ',')
LIMIT 1
This code writes the same list as the IN clause as a single string, with the items in the same order, concatenated and separated by commas (this assumes the items themselves do not contain commas).
This way the results are ordered by their position in the list, and LIMIT 1 returns the one closest to the start of the list.
Another way to achieve the same results is by using a CTE which returns the list along with an Id which serves as the desired ordering of the results, which will be joined to the table:
WITH list(id, item) AS (
SELECT 1, 'name1' UNION ALL
SELECT 2, 'name2' UNION ALL
SELECT 3, 'name3'
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
Or:
WITH list(id, item) AS (
SELECT * FROM (VALUES
(1, 'name1'), (2, 'name2'), (3, 'name3')
)
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
This way you don't have to write the list twice.
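Here is a runnable sketch of the CTE + VALUES approach above, using Python's sqlite3 module. The list is generated from a Python sequence, so each name's position becomes its id and the names are bound as parameters. The helper first_match and the demo table contents are hypothetical. The index on name is there to mimic the situation in the question, where SQLite returned rows in index order.

```python
import sqlite3


def first_match(con, names):
    """Hypothetical helper: return the first `nodes` row whose name is in
    `names`, preferring earlier positions in `names`."""
    if not names:
        return None
    # One "(?, ?)" row per name: (position, name).
    values = ", ".join("(?, ?)" for _ in names)
    params = [p for pos, name in enumerate(names) for p in (pos, name)]
    sql = f"""
        WITH list(id, item) AS (SELECT * FROM (VALUES {values}))
        SELECT n.*
        FROM nodes n INNER JOIN list l ON l.item = n.name
        ORDER BY l.id
        LIMIT 1
    """
    return con.execute(sql, params).fetchone()


def demo():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE nodes (name TEXT)")
    con.execute("CREATE INDEX nodes_name ON nodes(name)")
    con.executemany("INSERT INTO nodes VALUES (?)", [("name1",), ("name2",), ("name3",)])
    # "name3" wins over "name1" because it comes first in the list, not in the index.
    return first_match(con, ["name3", "name1"]), first_match(con, ["nope", "name2"])


if __name__ == "__main__":
    print(demo())
```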

Nesting "WHERE IN" (or AND, OR statements) SQLite

I'd like to have a SELECT query that gets 8 specific combinations of user1 & user2 (and that combination of user2 & user1 is also worth having and not redundant). The "IN" statement seemed worthwhile.
Directly related questions.
1. Can "WHERE IN" statements be nested?
If not, how can I most effectively (or most easily) structure a query to specify 4 pairs in which user1 and user2 can switch positions (8 combinations): (user1 = billy AND user2 = sue) OR (user1 = sue AND user2 = billy) OR (user1 = jack AND user2 = jill) OR (user1 = jill AND user2 = jack), etc.?
[At this point it seems easier to run the query and grep out the pertinent info.]
Current thought:
sqlite3 -header -separator , some.db "SELECT DISTINCT programquality.time, simscores.user1, simscores.user2, simscores.simscore, programquality.gf, programquality.ga, programquality.pq FROM simscores LEFT JOIN programquality ON programquality.time = simscores.time AND programquality.username = simscores.user1 WHERE programquality.pq IS NOT NULL WHERE simscores.user1 IN ("abraham","billy","carl","dave","ethan","frank","george","harry") WHERE simscores.user2 IN ("abraham","billy","carl","dave","ethan","frank","george","harry");"
I've used this, but some non-relevant data is displayed.
sqlite3 -header -separator , some.db 'SELECT DISTINCT programquality.time, simscores.user1, simscores.user2, simscores.simscore, programquality.gf, programquality.ga, programquality.pq FROM simscores LEFT JOIN programquality ON programquality.time = simscores.time AND programquality.username = simscores.user1 WHERE (simscores.user1 = "billy" OR simscores.user1 = "suzy" OR simscores.user1 = "john") AND (simscores.user2 = "billy" OR simscores.user2 = "suzy" OR simscores.user2 = "john") AND programquality.pq IS NOT NULL AND programquality.time IS NOT NULL;'
A query can have only one WHERE clause, but that expression can combine multiple conditions with AND or OR:
SELECT ...
FROM ...
WHERE programquality.pq IS NOT NULL AND
simscores.user1 IN ('abraham', 'billy', ...) AND
simscores.user2 IN ('abraham', 'billy', ...)
However, these IN expressions do not allow you to match specific values for user1 and user2.
You cannot avoid listing all combinations you want.
However, you can simplify the expression somewhat by checking the concatenated names against the valid combinations:
... WHERE ... AND
simscores.user1 || '+' || simscores.user2
IN ('billy+sue', 'sue+billy',
'jack+jill', 'jill+jack',
...)
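The list of combined names can be generated instead of typed out. Below is a minimal sketch in Python with sqlite3 that expands each pair into both orders and binds the resulting 'a+b' strings as parameters. The helper pair_condition and the sample simscores rows are hypothetical. The check for '+' in names is an added safeguard: with the answer's separator, a name containing '+' could produce false matches.

```python
import sqlite3


def pair_condition(pairs):
    """Hypothetical helper: build `user1 || '+' || user2 IN (...)` covering each
    (a, b) pair in both orders. Returns (sql_fragment, params)."""
    if not pairs:
        raise ValueError("need at least one pair")
    keys = []
    for a, b in pairs:
        if "+" in a or "+" in b:
            # '+' is the separator, so it must not occur inside a name.
            raise ValueError(f"name contains '+': {a!r}, {b!r}")
        for key in (f"{a}+{b}", f"{b}+{a}"):
            if key not in keys:  # a pair like (x, x) would otherwise repeat
                keys.append(key)
    placeholders = ", ".join("?" for _ in keys)
    sql = f"simscores.user1 || '+' || simscores.user2 IN ({placeholders})"
    return sql, keys


def demo():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE simscores (user1 TEXT, user2 TEXT, simscore REAL)")
    con.executemany(
        "INSERT INTO simscores VALUES (?, ?, ?)",
        [("billy", "sue", 0.9), ("sue", "billy", 0.8),
         ("billy", "jack", 0.1), ("jill", "jack", 0.5)],
    )
    where, params = pair_condition([("billy", "sue"), ("jack", "jill")])
    query = f"SELECT user1, user2 FROM simscores WHERE {where} ORDER BY simscore DESC"
    return con.execute(query, params).fetchall(), params


if __name__ == "__main__":
    print(demo())
```

The returned fragment can be combined with the other conditions using AND, for example with programquality.pq IS NOT NULL, in the single WHERE clause.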

Consolidating values from multiple tables

I have an application whose data is spread across 2 tables.
There is a main table, Main, with the columns Id, Name, Type.
There is also a SubMain table with the columns MainId (FK), StartDate, EndDate, City,
and this is a one-to-many relation (each Main row can have multiple entries in SubMain).
I want to display Main.Id, City (as a comma-separated list built from the SubMain rows for that Main item), the minimum StartDate (from SubMain for that Main item) and the maximum EndDate (from SubMain).
I thought of using a function, but that would slow things down since there will be 100k records. Is there another way of doing this? By the way, the application is in ASP.NET. Can this be done with an SQL query or something like LINQ?
This is off the top of my head, but first I would suggest you create a user-defined function in SQL that accepts @mainid and builds the comma-separated list of cities, doing the following:
DECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr + ',', '') + city
FROM submain
WHERE mainid = @mainid
... and then return @listStr, which will now be a comma-separated list of cities. Let's say you call your function MainIDCityStringGet().
Then, for your final result, you can simply execute the following:
select cts.mainid,
cts.cities,
sts.minstartdate,
sts.maxenddate
from ( select distinct mainid,
dbo.MainIDCityStringGet(mainid) as 'cities'
from submain) as cts
join
( select mainid,
min(startdate) as 'minstartdate',
max(enddate) as 'maxenddate'
from submain
group by mainid ) as sts on sts.mainid = cts.mainid
where startdate <is what you want it to be>
and enddate <is what you want it to be>
Depending on how exactly you would like to filter by startdate and enddate you may need to put the where filter within each subquery and in the second subquery in the join you may then need to use the HAVING grouped filter. You did not clearly state the nature of your filter.
I hope that helps.
This would of course go in a stored procedure, and it may need some debugging.
An alternative to creating a stored procedure is to perform the complex operations on the client side (untested):
var result = (from main in context.Main
join sub in context.SubMain on main.Id equals sub.MainId into subs
let StartDate = subs.Min(s => s.StartDate)
let EndDate = subs.Max(s => s.EndDate)
let Cities = subs.Select(s => s.City).Distinct()
select new { main.Id, main.Name, main.Type, StartDate, EndDate, Cities })
.ToList()
.Select(x => new
{
x.Id,
x.Name,
x.Type,
x.StartDate,
x.EndDate,
Cities = string.Join(", ", x.Cities.ToArray())
})
.ToList();
I am unsure how well this is supported in other implementations of SQL, but if you have SQL Server, this works like a charm for this type of scenario.
As a disclaimer, I am not the originator of this technique, but I immediately thought of this question when I came across it.
Example:
For a table
Item ID Item Value Item Text
----------- ----------------- ---------------
1 2 A
1 2 B
1 6 C
2 2 D
2 4 A
3 7 B
3 1 D
Suppose you want the following output, with the strings concatenated and the values summed:
Item ID Item Value Item Text
----------- ----------------- ---------------
1 10 A, B, C
2 6 D, A
3 8 B, D
The following avoids a multi-statement looping solution:
if object_id('Items') is not null
drop table Items
go
create table Items
( ItemId int identity(1,1),
ItemNo int not null,
ItemValue int not null,
ItemDesc nvarchar(500) )
insert Items
( ItemNo,
ItemValue,
ItemDesc )
values ( 1, 2, 'A'),
( 1, 2, 'B'),
( 1, 6, 'C'),
( 2, 2, 'D'),
( 2, 4, 'A'),
( 3, 7, 'B'),
( 3, 1, 'D')
select it1.ItemNo,
sum(it1.ItemValue) as ItemValues,
stuff((select ', ' + it2.ItemDesc --// Stuff is just used to remove the first 2 characters, instead of a substring.
from Items it2 with (nolock)
where it1.ItemNo = it2.ItemNo
for xml path(''), type).value('.','varchar(max)'), 1, 2, '') as ItemDescs --// Does the actual concatenation..
from Items it1 with (nolock)
group by it1.ItemNo
So all you need is a subquery in your SELECT that retrieves the set of values you want to concatenate, with FOR XML PATH used on that subquery to do the concatenation. It does not matter where the values you need to concatenate come from; you just need to retrieve them with the subquery.
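For comparison with the LINQ answer's client-side route, here is a minimal Python sketch that produces the same output from the example Items table above. It sums ItemValue and joins ItemDesc with ", " per ItemNo. Groups and descriptions keep their first-seen order because Python dicts preserve insertion order. The FOR XML PATH subquery does not guarantee any particular order unless it has its own ORDER BY. For the original Main/SubMain question, min(StartDate) and max(EndDate) could be tracked per group in the same loop.

```python
# Rows of the example Items table: (ItemNo, ItemValue, ItemDesc).
ITEMS = [
    (1, 2, "A"), (1, 2, "B"), (1, 6, "C"),
    (2, 2, "D"), (2, 4, "A"),
    (3, 7, "B"), (3, 1, "D"),
]


def consolidate(rows):
    """Group by ItemNo, summing ItemValue and joining ItemDesc with ', '."""
    groups = {}
    for item_no, value, desc in rows:
        entry = groups.setdefault(item_no, {"total": 0, "descs": []})
        entry["total"] += value
        entry["descs"].append(desc)
    return [(k, e["total"], ", ".join(e["descs"])) for k, e in groups.items()]


if __name__ == "__main__":
    for item_no, total, descs in consolidate(ITEMS):
        print(item_no, total, descs)
```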
