Extract Java-style comments from a SQL statement in R

Trying to run a SQL statement in an RStudio environment, but I'm having difficulty removing Java-style block comments from the statement. I cannot edit the SQL statements / comments themselves, so I'm trying a sequence of gsub calls to strip the unwanted special characters so that I'm left with only the SQL statement in the R string.
I'm trying to use gsub to remove the special characters and the comment in between, but I'm struggling to find the right regex to do so (especially one that does not treat the division symbol in the SELECT statement as part of the comment).
SELECT
id
, metric
, SUM(numerator)/SUM(denominator) AS rate
/*
This is an example of the comment.
I want to remove this. */
FROM table
WHERE id = 2

You can remove anything between /* and */ using this regex:
gsub(pattern = "/\\*[^*]*\\*/", replacement = "", x = text)
Result:
"SELECT\n id\n, metric\n, SUM(numerator)/SUM(denominator) AS rate\n/\nFROM table\nWHERE id = 2"

Related

Shiny textInput to string used in SQL

I have a textInput in a Shiny app for the user to write three character product codes separated by a comma. For example: F03, F04, F05.
The output of the textInput is used in a function calling a sql script. It will be used as a filter in the sql statement, eg
sqlfunction <- function(text){
sqlQuery(conn, stri_paste("select .... where product_code in (", text, ");"))
}
To convert the textInput to a string that I can use in the sql statement, I have used
toString(sprintf("'%s'", unlist(strsplit(input$text_input, ","))))
This works and converts the textInput to 'F03', 'F04', 'F05'. However, when used in the SQL, only the first code, 'F03', is used in the search despite using product_code in (). The data returned is only rows with product code F03.
How do I get all three codes, if not more, written in textInput into a string to use in the sql clause?
Andrew
if your input is
F03, F04, F05
with whitespace between each comma and the next value, the statement gives:
select .... where product_code in ('F03', ' F04', ' F05');
Note the whitespaces. Then the values 'F04' and 'F05' are not found.
What if the input is 'F03,F04,F05' ? (without spaces)
I'll combine the important components of the other answers and discussion into one suggested answer. Feel free to accept one of them, they had the ideas first.
As Jrm_FRL said, it is possible that whitespace around the commas is being preserved in your script, and those padded values will then fail to match in the SQL.
toString(sQuote(unlist(strsplit("hello, world, again", ","))))
# [1] "'hello', ' world', ' again'"
### ^-- ^-- leading spaces on the strings
Some options:
If you think it is important for the user to be able to intentionally introduce whitespace around the commas (meaning: at the beginning or end of a string/pattern), then your only hope is to instruct the user to only use whitespace when intended.
Otherwise, you can use trimws:
toString(sQuote(trimws(unlist(strsplit("hello, world, again", ",")))))
# [1] "'hello', 'world', 'again'"
strsplit(..., ",") can be incorrect if the user has quotes to keep things together. You might consider using read.csv:
trimws(unlist(read.csv(text="hello, \"world, too\", again", header = FALSE, stringsAsFactors = FALSE)))
# V1 V2 V3
# "hello" "world, too" "again"
This is not perfectly compatible with option 1 above.
Second, as Tim Biegeleisen and Jrm_FRL both agreed, you are specifically prone to SQL injection here. A malformed (either accidentally or intentionally) search string from the user can at best corrupt this query's results, at worst (depending on connection permissions) corrupt or delete the data in the database. (I strongly suggest you read https://db.rstudio.com/best-practices/run-queries-safely/.)
Ways to safeguard this:
Don't manually add single-quotes around your data: if a string includes a single-quote, it will not be escaped and will cause at best a SQL error.
toString(sQuote(trimws(unlist(strsplit("hello'; drop table students; --", ",")))))
# [1] "'hello'; drop table students; --'"
### this query may delete the 'students' table
### notice that `sQuote` is not enough here, it is not escaping this correctly
Instead, use DBI::dbQuoteString. While I believe most DBMSes use the same convention of single-quotes (and your question suggests that yours does, too), it can be good practice to let the database driver determine how to deal with literal strings and those with embedded quotes.
toString(DBI::dbQuoteString(con, trimws(unlist(strsplit("hello'; drop table students; --", ",")))))
# [1] "'hello''; drop table students; --'"
### ^^ this is SQL's way of escaping an embedded single-quote
### this is now a single string, allegedly SQL-safe
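For the multi-value case from the question, a sketch of assembling the whole IN clause this way (assuming con is an open DBI connection; the names codes and quoted are just illustrative):
codes <- trimws(unlist(strsplit(input$text_input, ",")))
quoted <- DBI::dbQuoteString(con, codes)
qry <- paste0("select .... where product_code in (", paste(quoted, collapse = ", "), ");")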
Instead of including the strings in the query, use DBI::dbBind, though admittedly you'll need to include multiple bind placeholders (often but not always ?) based on the length of your values vector.
val_str <- "hello, world, again"
val_vec <- trimws(unlist(strsplit(val_str, ",")))
qmarks <- paste(rep("?", length(val_vec)), collapse = ",")
qmarks
# [1] "?,?,?"
qry <- paste("select ... where product_code in (", qmarks, ")")
out <- tryCatch({
  res <- NULL
  res <- DBI::dbSendStatement(con, qry)
  DBI::dbBind(res, as.list(val_vec))
  DBI::dbFetch(res)
}, finally = { if (!is.null(res)) suppressWarnings(DBI::dbClearResult(res)) })
The use of ? varies by DBMS, so you may need to do some research for your specific situation.
(While I used tryCatch here to "guarantee" that res will be cleared on exit, this pattern is a little more robust than doing it without. If part of the query or binding fails without the finally= part, then it could leave the connection in an imperfect state.)
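Another option in the same spirit, covered in the best-practices link above, is glue::glue_sql(); a sketch, assuming con is a DBI connection (glue_sql uses it to pick the right quoting rules):
vals <- trimws(unlist(strsplit(input$text_input, ",")))
qry <- glue::glue_sql("select .... where product_code in ({vals*})", .con = con)
# {vals*} collapses the vector into a quoted, comma-separated list
DBI::dbGetQuery(con, qry)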
Here is what I suspect is what you want:
text_input = "A,B,C"
in_clause <- paste0("'", unlist(strsplit(text_input, ",")), "'", collapse=",")
sql <- paste0("WHERE product_code IN (", in_clause, ")")
sql
[1] "WHERE product_code IN ('A','B','C')"
Here I am still using your combination of unlist and strsplit to generate a string vector of terms for the IN clause. But then I use paste0 with collapse to get the output you want.
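Note that if the user types spaces after the commas, as in the question, this answer has the same whitespace problem discussed in the longer answer above; adding trimws() fixes that here too, and the single-quote / injection caveats also still apply to this string-pasting approach:
in_clause <- paste0("'", trimws(unlist(strsplit(text_input, ","))), "'", collapse=",")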

Copy to without quotes

I have a large dataset in a dbf file and would like to export it to a csv-type file.
Thanks to SO, I already managed to do it smoothly.
However, when I try to import it into R (the environment I work in), it combines some characters together, making some rows much longer than they should be and consequently breaking the whole database. In the end, whenever I import the exported csv file I get only half of the db.
I think the main problem is with quotes in the character fields, but specifying quote="" in R didn't help (and it usually does).
I've searched for any question on how to deal with quotes when exporting from Visual FoxPro, but couldn't find the answer. I wanted to test this, but my computer throws an error stating that I don't have enough memory to complete the operation (probably due to the large db).
Any help will be highly appreciated. I've been stuck on this problem of exporting from the dbf into R for long enough, have searched everything I could, and am desperately looking for a simple way to import a large dbf into my R environment without any bugs.
In R: I checked whether the imported file has problems, and indeed most columns have much longer nchar values than they should, while the number of rows is halved. Reading the db with read.csv("file.csv", quote="") didn't help. Reading with data.table::fread() returns the error
Expected sep (',') but '0' ends field 88 on line 77980:
But according to verbose=TRUE this function reads the right number of rows (read.csv imports only about 1.5 million rows):
Count of eol after first data row: 2811729 Subtracted 1 for last eol
and any trailing empty lines, leaving 2811728 data rows
When exporting with TYPE DELIMITED you have some control on the VFP side over how the export formats the output file.
To change the field delimiter from quotation marks to, say, a pipe character you can do:
copy to myfile.csv type delimited with "|"
so that will produce something like:
|A001|,|Company 1 Ltd.|,|"Moorfields"|
You can also change the separator from a comma to another character:
copy to myfile.csv type delimited with "|" with character "#"
giving
|A001|#|Company 1 Ltd.|#|"Moorfields"|
That may help in parsing on the R side.
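On the R side you would then tell the reading function about the non-default delimiter. A sketch for the first form above (comma-separated fields wrapped in |, written to myfile.csv):
dat <- read.csv("myfile.csv", quote = "|", header = FALSE, stringsAsFactors = FALSE)
# for the second form, also pass sep = "#"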
There are three ways to delimit a string in VFP: the normal single quote, the double quote, and square brackets. So to strip quotes out of character fields myfield1 and myfield2 in your DBF file you could do this in the Command Window:
close all
use myfile
copy to mybackupfile
select myfile
replace all myfield1 with chrtran(myfield1,["'],"")
replace all myfield2 with chrtran(myfield2,["'],"")
and repeat for other fields and tables.
You might have to write code to do the export, rather than simply using the COPY TO ... DELIMITED command.
SELECT thedbf
mfld_cnt = AFIELDS(mflds)
fh = FOPEN(m.filename, 1)
SCAN
    FOR aa = 1 TO mfld_cnt
        mcurfld = 'thedbf.' + mflds[aa, 1]
        mvalue = &mcurfld
        ** Or you can use:
        mvalue = EVAL(mcurfld)
        ** manipulate the contents of mvalue, possibly based on the field type
        DO CASE
        CASE mflds[aa, 2] = 'D'
            mvalue = DTOC(mvalue)
        CASE mflds[aa, 2] $ 'CM'
            ** Replace characters that are giving you problems in R
            mvalue = STRTRAN(mvalue, ["], '')
        OTHERWISE
            ** Etc.
        ENDCASE
        = FWRITE(fh, mvalue)
        IF aa # mfld_cnt
            = FWRITE(fh, [,])
        ENDIF
    ENDFOR
    = FWRITE(fh, CHR(13) + CHR(10))
ENDSCAN
= FCLOSE(fh)
Note that I'm using [ ] characters to delimit strings that include commas and quotation marks. That helps readability.
*create a comma delimited file with no quotes around the character fields
copy to TYPE DELIMITED WITH "" (2 double quotes)

How to use a column name inside of a dynamic WHERE clause? TO_NUMBER(column_name)

I am currently trying to create a dynamic SELECT statement for when the user inputs a varying number of criteria to search by.
Currently, I have every part of the statement working except for the most important part.
I am attempting to do something like this:
selStmt := 'SELECT column_one, column_2, column_3
FROM nerf';
whereClause := ' WHERE TO_NUMBER('''|| column_one ||''') <= '''|| userInput ||'''';
However, in doing this the WHERE clause of my SELECT statement is not accurate, as shown by my output line:
WHERE TO_NUMBER('') <= '5';
I have tried various solutions with quote marks and I end up with either an ORA-00905: missing keyword error, or an ORA-00911: invalid character error.
At this point I'm not quite sure how to approach this issue.
Thanks in advance for any useful help.
For some reason, Oracle uses the single quote both to delimit strings and as the escape character, so '' inside a string literal is an instruction to Oracle to put a literal single quote inside your string. Example:
'This is a string with a quote here: '' and then it ends normally'
will be represented as
This is a string with a quote here: ' and then it ends normally
In your example, you are ending the string you're building up and then concatenating a PL/SQL variable called column_one:
' WHERE TO_NUMBER('''|| column_one ||''')
...and with a NULL value for the identifier column_one this is represented as
WHERE TO_NUMBER('')
Presumably you want to reference column_one from inside the query, and not from a PL/SQL variable of the same name, so you should remove the quotes around it like so:
whereClause := ' WHERE TO_NUMBER(column_one) <= TO_NUMBER('''|| userInput ||''')';
Escaping strings in Oracle is often infuriating - it helps a lot if you have a good IDE with decent syntax highlighting like TOAD or SQL*Developer.
This should work:
selStmt := 'SELECT column_one, column_2, column_3 FROM nerf';
whereClause := ' WHERE TO_NUMBER(column_one) <= TO_NUMBER('''|| userInput ||''')';
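For example, with userInput set to 5 (the value from the question's output), the clause now concatenates to:
WHERE TO_NUMBER(column_one) <= TO_NUMBER('5')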

SQLite: How to select part of string?

There is a table column containing file names: image1.jpg, image12.png, script.php, .htaccess, ...
I need to select the file extensions only. I would prefer to do it this way:
SELECT DISTINCT SUBSTR(column,INSTR('.',column)+1) FROM table
but INSTR isn't supported in my version of SQLite.
Is there way to realize it without using INSTR function?
Below is the query (tested and verified) for selecting the file extensions only. Your filename can contain any number of . characters and it will still work:
select distinct replace(column_name, rtrim(column_name,
replace(column_name, '.', '' ) ), '') from table_name;
It works by building a dot-free copy of the value (the inner replace), using rtrim to strip those characters from the right of the original value, which leaves everything up to and including the last dot, and then removing that prefix with the outer replace, leaving only the extension.
column_name is the name of the column that holds the file names (filenames can have multiple dots).
table_name is the name of your table.
Try the ltrim(X, Y) function; this is what the doc says:
The ltrim(X,Y) function returns a string formed by removing any and all characters that appear in Y from the left side of X.
List all the alphabet as the second argument, something like
SELECT ltrim(column, "abcd...xyz1234567890") From T
That should remove all characters from the left up to the first '.'. If you need the extension without the dot then use SUBSTR on it. Of course this means that filenames may not contain more than one dot.
But I think it is way easier and safer to extract the extension in the code which executes the query.
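If the code that executes the query happens to be R (as elsewhere on this page), a sketch of doing exactly that after fetching the file names:
files <- c("image1.jpg", "image12.png", "script.php", ".htaccess")
tools::file_ext(files)
# returns "jpg" "png" "php" "htaccess"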

Removing trailing Spaces from Long Datatype PL/SQL

I have a LONG with a couple of sentences in it; at the end there is a huge amount of blank spaces that need to be removed.
The problem is that I have written a function to convert this LONG to a VARCHAR2 and trim the spaces, but this has not worked.
I have used RTRIM, TRIM TRAILING, TRIM, and even tried replacing " " with "" (but that just removed all spaces, even between words).
Example:
SELECT TRIM(comment)
FROM p_comments
WHERE p_domain = 'SIGNATURE'
AND p_code = c_p_code;
This did not work as it cannot perform the trim on a "LONG".
SELECT RTRIM(f_get_varchar(get_p_code('JOHN'))) FROM dual
Did not work and just returned the same result.
Does anyone have any ideas?
I managed to find the answer. I used a regular expression:
SELECT regexp_substr(cis.acs_reports.f_get_varchar(:p_pfo_code), '.+[^[:space:]]') pfo_comment
FROM dual
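If the comment is ultimately fetched into R anyway (the environment used elsewhere on this page), an alternative sketch, assuming the converted VARCHAR2 ends up in a data frame column df$pfo_comment, is to trim after the fetch:
df$pfo_comment <- trimws(df$pfo_comment, which = "right")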
