How to use dynamic values while executing SQL scripts in R

My R workflow now involves dealing with a lot of queries (via the RPostgreSQL library), and I really want to keep the code easy to maintain and manage in the future.
I started loading large queries from separate .sql files, which helped and worked great.
Then I started using interpolated values, which means that I can write
SELECT * FROM table WHERE value = ?my_value;
and (after loading it into R) interpolate it using sqlInterpolate(ANSI(), query, my_value = "stackoverflow").
What happens now is I want to use something like this
SELECT count(*) FROM ?my_table;
but how can I make it work? sqlInterpolate() only interpolates values safely by default, so the table name ends up quoted as a string literal. Is there a workaround?
Thanks

In ?DBI::SQL, you can read:
By default, any user supplied input to a query should be escaped using
either dbQuoteIdentifier() or dbQuoteString() depending on whether it
refers to a table or variable name, or is a literal string.
Also, from the documentation on running queries safely:
You may also need dbQuoteIdentifier() if you are creating tables or
relying on user input to choose which column to filter on.
So you can use:
sqlInterpolate(ANSI(),
               "SELECT count(*) FROM ?my_table",
               my_table = dbQuoteIdentifier(ANSI(), "table_name"))
# <SQL> SELECT count(*) FROM "table_name"
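Both kinds of interpolation can be combined in one statement; a minimal sketch, sticking with the placeholder names from the question:
library(DBI)
query <- "SELECT * FROM ?my_table WHERE value = ?my_value"
sqlInterpolate(ANSI(), query,
               my_table = dbQuoteIdentifier(ANSI(), "table_name"),
               my_value = "stackoverflow")
# <SQL> SELECT * FROM "table_name" WHERE value = 'stackoverflow'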

sqlInterpolate() is for substituting values only, not other query components such as table names. For those you could use a general-purpose templating framework such as brew or whisker.
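For example, a minimal sketch with whisker (the placeholder name is illustrative; plain template substitution does no SQL escaping, so quote the identifier yourself):
library(DBI)
library(whisker)
# Triple braces turn off whisker's HTML escaping, which would otherwise
# mangle the quotes around the identifier
template <- "SELECT count(*) FROM {{{my_table}}}"
sql <- whisker.render(template,
                      list(my_table = as.character(dbQuoteIdentifier(ANSI(), "table_name"))))
# sql is now: SELECT count(*) FROM "table_name"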

Related

Formatting SQL WHERE Conditional Values

I am looking to see if there is a way to format conditional values in batch instead of typing them manually. For example, I am filtering on 5-digit codes in SQL, and my source for the codes is a list in Excel. There can be hundreds of codes to add to a SQL WHERE clause to filter on, so is there a tool or formatting method that will take a list of values and format them with single quotes and comma separation?
From this:
30239
30240
30241
30242
To this:
'30239',
'30240',
'30241',
'30242',
...
Then, these formatted values can be pasted into the WHERE clause instead of manually typing all of this out. Again, this is for hundreds of values...
I used to use BrioQuery, which had functionality to import text files to be used in filtering, but my current query tool, TOAD Data Point, does not seem to have this.
Thank you
Look into SQL*Loader. Create a staging table to contain the imported values, use SQL*Loader to populate the staging table, then modify your query to reference the staging table; it becomes something like:
Select ...
where target_column_name in (select column_name from stage_table).
The structure "where ... in (select ...)" may not be the best for performance, but once the data is loaded you will have all the facilities SQL offers at your disposal.
It has been a few years since I've used TOAD, but as I remember it has import functionality. There are other tools for loading data from Excel into Oracle; SQL*Loader just happens to be the one Oracle supplies with the RDBMS.
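A minimal sketch of that approach (the staging table, target table, and file names here are illustrative):
-- Staging table for the imported codes
create table code_stage (code varchar2(5));

-- SQL*Loader control file (codes.ctl), loading one code per line from codes.txt:
--   LOAD DATA
--   INFILE 'codes.txt'
--   INTO TABLE code_stage
--   FIELDS TERMINATED BY ','
--   (code)
-- Run it with: sqlldr userid=... control=codes.ctl

-- The query then references the staged values:
select *
from target_table
where target_column_name in (select code from code_stage);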

Not able to have commands in User-Defined functions in Kusto

I am trying to create a function that will accept the name of a tag and a datetime value, drop an extent within a specific table which has that tag, and then ingest a new record into that table with the same tag and the input datetime value, a sort of 'update' simulation. I am not bothered about performance; it's just going to hold metadata, maybe 20-30 rows at most.
So this is how the create table looks:
.create table MyTable(sometext:string,somevalue:datetime)
And shown below is my function creation step, which is failing:
.create-or-alter function MyFunction(arg_sometext:string,arg_somedate:datetime)
{
.drop extents <| .show table MyTable extents where tags has arg_sometext;
.ingest inline into table MyTable with (tags="[arg_sometext]") <| arg_somedate
}
So you can see I am trying to do something simple. I suspect that Kusto won't allow commands in a function. Is there any workaround for achieving this?
Generally:
Kusto mandates that control commands start with a dot (.), and that this must be the first character in the text of the command. As queries, functions, etc. don't start with a dot, this precludes them from invoking control commands.
This is an intentional limitation that prevents a wide range of code injection attacks. By imposing this rule, Kusto makes it easy to guarantee that any query that does not begin with a dot will only have read access to the data and metadata, and never be able to alter them.
Specifically, with regard to your scenario:
I'm assuming this runs automatically (even if you did have the option to create such a function), which suggests you should be able to achieve your goal using Kusto's API / client libraries and a simple script/app that runs both control commands in sequence.
An alternative, and perhaps even better, approach would be to reconsider whether you actually need to delete or update specific records at all: you could use summarize arg_max() to query only the latest "versions" of the records (you could also create a function that encapsulates that logic and shadows the table, by naming the function with the table's name).
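A sketch of that alternative, reusing the table from the question (the function name is made up; naming it MyTable instead would shadow the table, per the above):
.create-or-alter function MyTableLatest()
{
    // keep only the newest record per tag, instead of dropping extents
    MyTable
    | summarize arg_max(somevalue, *) by sometext
}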

UNION of tables using bigquery LegacySQL

I'm trying, without luck, to write a query that retrieves the union of two tables of events using legacySQL, as standardSQL is not yet supported on Data Studio.
In standardSQL that would be something like:
SELECT
*
FROM
`com_myapp_ANDROID.app_events_*`,
`com_myapp_IOS.app_events_*`
However, in legacySQL I get an error when trying to reference app_events_*. How do I include all the tables of my events, so I can filter them afterwards in Data Studio, if I can't use the wildcard?
I've tried something like:
select * from (TABLE_QUERY(com_myapp_ANDROID, 'table_id CONTAINS "app_events_"'))
But I'm not sure if this is the right approach; I get:
Cannot output multiple independently repeated fields at the same time.
Found user_dim_user_properties_value_index and event_dim_date
Edit: in the end this is the result of the query, as you can't use FLATTEN directly with TABLE_QUERY:
select
*
from
FLATTEN((SELECT * FROM TABLE_QUERY(com_myapp_ANDROID, 'table_id CONTAINS "app_events"')),user_dim.user_properties),
FLATTEN((SELECT * FROM TABLE_QUERY(com_myapp_IOS, 'table_id CONTAINS "app_events"')),user_dim.user_properties)
Table wildcards don't work in legacy SQL, as you have guessed, so you have to use the TABLE_QUERY() function.
Your approach is right but the first parameter in the TABLE_QUERY function should be the dataset name not the first part of the table name. Assuming your dataset name is app_events that would look like this:
TABLE_QUERY(app_events,'table_id CONTAINS "app_events"')
In legacySQL the union table operator is the comma:
select * from [table1],[table2]
For TABLE_QUERY you would include the dataset name as the first param, and the expression as the second:
select * from (TABLE_QUERY([dataset], 'table_id CONTAINS "event"'))
To read more about how to debug TABLE_QUERY, see the linked answer.
The Web UI automatically flattens the results for you, but when there are independently repeated fields you need to flatten with the FLATTEN wrapper.
It takes two params, the table and the repeated field, e.g. FLATTEN(table, tags).
Also, if TABLE_QUERY is involved, you probably need to subselect, like:
select
*
from
FLATTEN((SELECT * FROM TABLE_QUERY(com_myapp_ANDROID, 'table_id CONTAINS "app_events"')),user_dim.user_properties)
That particular issue you are experiencing is not UNION related: you will see the same error message even with just one table, if that table has multiple independently repeated fields and you try to output them all at once. This scenario is specific to legacy SQL and can be resolved with the FLATTEN clause.
At the same time, you most likely don't actually mean to use SELECT *, which is what causes all those repeated fields to appear in the output at the same time. If you can narrow down your output list, you have a chance of addressing it; and if a few independently repeated fields still end up in the output, you can use the FLATTEN technique, as in the sketch below.
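For example, a sketch that flattens one repeated record and narrows the output (the field names are inferred from the error message above; adjust them to your schema):
SELECT
  event_dim.date,
  user_dim.user_properties.key,
  user_dim.user_properties.value.index
FROM
  FLATTEN((SELECT * FROM TABLE_QUERY(com_myapp_ANDROID, 'table_id CONTAINS "app_events"')), user_dim.user_properties)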

Determine flyway variables from earlier SQL step

I'd like to use Flyway for a DB update in a situation where the DB already exists with production data in it. The problem I'm looking at now (and I have not found a nice solution yet) is the following:
There is an existing DB table with numeric IDs, e.g.
create table objects ( obj_id number, ...)
There is a sequence "obj_seq" to allocate new obj_ids.
During my DB migration I need to introduce a few new objects, hence I need new object IDs. However, I do not know at development time what these ID numbers will be.
There is a DB trigger which later references these IDs. To improve performance I'd like to avoid determining the actual IDs every time the trigger runs, and rather put the IDs directly into the trigger.
Example (very simplified) of what I have in mind:
insert into objects (obj_id, ...) values (obj_seq.nextval, ...)
select obj_seq.currval from dual
-> store this in variable "newID"
create trigger on some_other_table
when new.id = newID
...
Now, is it possible to dynamically determine/use such variables? I have seen the flyway placeholders but my understanding is that I cannot set them dynamically as in the example above.
I could use a Java-based migration script and do whatever string magic I like, so that would be a way of doing it, but maybe there is a more elegant way using SQL?
Many thx!!
tge
If the table you are updating contains only reference data, get rid of the sequence and assign the IDs manually.
If it contains a mix of reference and user data, you need to select the id based on values in other columns.
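A sketch of both options in Oracle syntax (the obj_name column and the trigger body are hypothetical stand-ins for your actual schema and logic):
-- Option 1 (reference data only): assign the ID manually, so it is known
-- at development time and can be hard-coded into the trigger
insert into objects (obj_id, obj_name) values (1001, 'NEW_OBJECT_A');

create or replace trigger some_other_table_trg
before insert on some_other_table
for each row
when (new.id = 1001)  -- the manually assigned ID
begin
  null;  -- actual trigger logic goes here
end;
/

-- Option 2 (mixed data): have the trigger look the ID up by other columns,
-- e.g. with a scalar subquery such as:
--   select obj_id from objects where obj_name = 'NEW_OBJECT_A'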

SQL Server 2005 - Pass In Name of Table to be Queried via Parameter

Here's the situation: due to the design of the database I have to work with, I need to write a stored procedure in such a way that I can pass in the name of the table to be queried against, if at all possible. The program in question does its processing by jobs, and each job gets its own table created in the database, i.e. table-jobid1, table-jobid2, table-jobid3, etc. Unfortunately, there's nothing I can do about this design; I'm stuck with it.
However, now, I need to do data mining against these individualized tables. I'd like to avoid doing the SQL in the code files at all costs if possible. Ideally, I'd like to have a stored procedure similar to:
SELECT *
FROM @TableName AS tbl
WHERE @Filter
Is this even possible in SQL Server 2005? Any help or suggestions would be greatly appreciated. Alternate ways to keep the SQL out of the code behind would be welcome too, if this isn't possible.
Thanks for your time.
The best solution I can think of is to build your SQL in the stored proc, such as:
DECLARE @query nvarchar(max)
SET @query = 'SELECT * FROM ' + @TableName + ' as tbl WHERE ' + @Filter
EXEC(@query)
Not an ideal solution, probably, but it works.
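A slightly fuller sketch of the same idea (the procedure name is made up; QUOTENAME guards the table name, but the filter string is still concatenated in, so it must not come from untrusted input):
CREATE PROCEDURE dbo.QueryJobTable
    @TableName sysname,
    @Filter    nvarchar(500)
AS
BEGIN
    DECLARE @query nvarchar(max);
    -- QUOTENAME brackets the identifier, e.g. [table-jobid1]
    SET @query = N'SELECT * FROM ' + QUOTENAME(@TableName)
               + N' AS tbl WHERE ' + @Filter;
    EXEC sp_executesql @query;
END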
The best answer I can think of is to build a view that unions all the tables together, with an id column in the view telling you where the data in the view came from. Then you can simply pass that id into a stored proc which will go against the view. This is assuming that the tables you are looking at all have identical schema.
example:
create view test1 as
select *, 'tbl1' as src
from [job-1]
union all
select *, 'tbl2' as src
from [job-2]
union all
select *, 'tbl3' as src
from [job-3]
Now you can select * from test1 where src = 'tbl3', and you will only get records from the table job-3.
This would be a meaningless stored proc: select from some table using some parameters? You are basically defining the entire query again in whatever you are using to call this proc, so you may as well generate the SQL yourself.
The only reason I would write a dynamic SQL-building proc is if you want to do something that you can change without redeploying your codebase.
But in this case you are just SELECT *'ing. You can't define the columns, WHERE clause, or ORDER BY differently, since you are trying to use it for multiple tables, so there is no meaningful change you could make to it.
In short: it's not even worth doing. Just write your table-specific sprocs, or write your SQL in strings in your code (but make sure it's parameterized).
