Bracket-escaped table names with dplyr - r

I'm programmatically fetching a bunch of datasets, many of them having silly names that begin with numbers and have special characters like minus signs in them. Because none of the datasets are particularly large, and I wanted the benefit of R making its best guess about data types, I'm (ab)using dplyr to dump these tables into SQLite.
I am using square brackets to escape the horrible table names, but this doesn't seem to work. For example:
data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="[14m3-n4m3]")
This results in the error message:
Error in sqliteSendQuery(conn, statement, bind.data) : error in statement: no such table: 14m3-n4m3
This works if I choose a sensible name. However, due to a variety of reasons, I'd really like to keep the cumbersome names. I am also able to create such a badly-named table directly from sqlite:
sqlite> create table [14m3-n4m3](foo,bar,baz);
sqlite> .tables
14m3-n4m3
Without cracking into things too deeply, this looks like dplyr is handling the square brackets in some way that I cannot figure out. My suspicion is that this is a bug, but I wanted to check here first to make sure I wasn't missing something.
EDIT: I forgot to mention the case where I just pass the janky name directly to dplyr. This errors out as follows:
library(dplyr)
data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="14M3-N4M3")
Error in sqliteSendQuery(conn, statement, bind.data) :
error in statement: unrecognized token: "14M3"

This is a bug in dplyr. It's still there in the current GitHub master. As @hadley indicates, he has tried to escape things like table names in dplyr to prevent this issue. The problem you're having arises from a lack of escaping in two functions. Table creation works fine when the table name is provided unescaped (it is done with dplyr::db_create_table). However, the insertion of data into the table is done using DBI::dbWriteTable, which doesn't support odd table names. If the table name is provided to this function escaped, it fails to find it in the list of tables (the first error you report). If it is provided unescaped, the SQL generated for the insertion is not syntactically valid (the second error you report).
The second issue comes when the table is updated. The code that gets the field names, this time actually in dplyr, again fails to escape the table name because it uses paste0 rather than build_sql.
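To see the difference the escaping makes, here is a minimal sketch; it uses DBI's dbQuoteIdentifier() rather than dplyr's internal build_sql, but it illustrates the same idea:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# bare paste0(): the name is pasted in unescaped, so the parser
# stops at the leading digits (your "unrecognized token" error)
paste0("SELECT * FROM ", "14m3-n4m3")
# quoted identifier: the awkward name survives as a single token
paste0("SELECT * FROM ", dbQuoteIdentifier(con, "14m3-n4m3"))
dbDisconnect(con)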
I've fixed both errors at a fork of dplyr. I've also put in a pull request to @hadley and made a note on the issue https://github.com/hadley/dplyr/issues/926. In the meantime, if you wanted to you could use devtools::install_github("NikNakk/dplyr", ref = "sqlite-escape") and then revert to the master version once it's been fixed.
Incidentally, the correct SQL-99 way to escape table names (and other identifiers) in SQL is with double quotes (see SQL standard to escape column names?). MS Access uses square brackets, while MySQL defaults to backticks. dplyr uses double quotes, per the standard.
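To make that concrete, here is the same identifier quoted in the three styles (SQLite happens to accept all three, but only the first is the standard):
sql99  <- 'SELECT * FROM "14m3-n4m3"'   # SQL-99 double quotes, what dplyr emits
access <- 'SELECT * FROM [14m3-n4m3]'   # MS Access / T-SQL square brackets
mysql  <- 'SELECT * FROM `14m3-n4m3`'   # MySQL backticks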
Finally, the proposal from @RichardScriven wouldn't work universally. For example, select is a perfectly valid name in R, but is not a syntactically valid table name in SQL. The same would be true for other reserved words.

Related

I keep getting a syntax error on this code [duplicate]

I'm trying to insert some information to MySQL with Pascal, but when I run the program I get the error
unknown column 'mohsen' in field list
This is my code
procedure TForm1.Button1Click(Sender: TObject);
var
  aSQLText: string;
  aSQLCommand: string;
  namee: string;
  family: string;
begin
  namee := 'mohsen';
  family := 'dolatshah';
  aSQLText := 'INSERT INTO b_tbl(Name,Family) VALUES (%s,%s)';
  aSQLCommand := Format(aSQLText, [namee, family]);
  SQLConnector1.ExecuteDirect(aSQLCommand);
  SQLTransaction1.Commit;
end;
How can I solve this problem?
It's because your
VALUES (%s,%s)
isn't surrounding the namee and family variable contents with quotes. Therefore, your back-end SQL engine thinks mohsen is a column name, not a value.
Instead, use, e.g.
VALUES (''%s'',''%s'')
as in
Namee := 'mohsen';
Family := 'dolatshah';
aSQLText:= 'INSERT INTO b_tbl(Name,Family) VALUES (''%s'',''%s'')';
aSQLCommand := Format(aSQLText,[namee,family]);
In the original version of my answer, I explained how to fix your problem by "doubling up" single quotes in the Sql you were trying to build, because it seemed to me that you were having difficulty seeing (literally) what was wrong with what you were doing.
An alternative (and better) way to avoid your problem (and the one I always use in real life) is to use the QuotedStr() function. The same code would then become
aSQLText := 'INSERT INTO b_tbl (Name, Family) VALUES (%s, %s)';
aSQLCommand := Format(aSQLText, [QuotedStr(namee), QuotedStr(family)]);
According to the Online Help:
Use QuotedStr to convert the string S to a quoted string. A single quote character (') is inserted at the beginning and end of S, and each single quote character in the string is repeated.
What it means by "repeated" is what I've referred to as "doubling up". Why that matters, and the main reason I use QuotedStr, is to avoid the SQL db-engine throwing an error when the value you want to send contains a single quote character, as in O'Reilly.
Try adding a row containing that name to your table using MySql Workbench and you'll see what I mean.
So, not only does using QuotedStr make constructing SQL statements as strings in Delphi code less error-prone, it also avoids problems at the back end.
Just in case this helps anybody else: I had the same error when building a SQL statement in a Python variable that contained an IF() expression, i.e.
sql="select bob,steve, if(steve>50,'y','n') from table;"
Try as I might, it kept coming up with "unknown column y". I tried everything and was about to give it up as a bad job, until I thought I would swap the " for ' and the ' for "... and it works!
This is the statement that worked
sql='select bob,steve, if(steve>50,"y","n") from table;'
Hope it helps...
To avoid this sort of problem (and SQL injection), you should really look into using SQL parameters for this rather than building the statement with Pascal's Format function.
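(Since the rest of this page is R-centric, here is a rough sketch of the same parameter-binding idea with R's DBI; the connection details are made up, and only the table and values come from the question:)
library(DBI)
# hypothetical connection to the MySQL database from the question
con <- dbConnect(RMariaDB::MariaDB(), dbname = "mydb", user = "me", password = "secret")
# placeholders are bound by the driver, so quoting/escaping is handled for you
dbExecute(con,
          "INSERT INTO b_tbl (Name, Family) VALUES (?, ?)",
          params = list("mohsen", "dolatshah"))
dbDisconnect(con)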

teradata: to calculate cast as length of column

I need to use the cast function with the length of a column in Teradata.
Say I have a table with the following data:
id | name
1  | dhawal
2  | bhaskar
I need to use a cast operation, something like
select cast(name as CHAR(<length of column>)) from table
How can I do that?
Thanks,
Dhawal
You have to find the length by looking at the table definition - either manually (show table) or by writing dynamic SQL that queries dbc.ColumnsV.
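A hypothetical sketch of the dynamic-SQL route, driven from R over a DBI/odbc connection (the DSN, database, and table names here are made up):
library(DBI)
con <- dbConnect(odbc::odbc(), dsn = "my_teradata_dsn")   # hypothetical DSN
# look up the defined length of the column in the data dictionary
len <- dbGetQuery(con, "
  SELECT ColumnLength
  FROM dbc.ColumnsV
  WHERE DatabaseName = 'mydb'
    AND TableName    = 'mytable'
    AND ColumnName   = 'name'")$ColumnLength
# build the CAST with that length and run it
dat <- dbGetQuery(con, sprintf("SELECT CAST(name AS CHAR(%d)) FROM mydb.mytable",
                               as.integer(len)))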
Update
You can find the maximum length of the actual data using
select max(length(cast(... as varchar(<large enough value>)))) from TABLE
But if this is for FastExport, I think casting as varchar(large-enough-value) and post-processing to remove the 2-byte length info FastExport includes is a better solution (since exporting a CHAR() will result in a fixed-length output file with lots of spaces in it).
You may know this already, but just in case: Teradata usually recommends switching to TPT instead of the legacy fexp.

Is it possible to remove the binary values that prefix the output of teradata fexp?

I am trying to run teradata fexp with a simple sql script.
The select output column is a string expression and as such results in 2 extra length indicator bytes at the start of each row output.
I have searched for solutions online to the problem. I would like to avoid having to post-process if possible.
There is a thread suggesting the possibility of using an OUTMOD. I don't know what that is.
https://forums.teradata.com/forum/tools/fastexport-remove-binaryindicator-values-in-outmod
http://teradataforum.com/teradata/20100726_155313.htm
And yet another thread suggests casting to a fixed width string type but this would result in padding which I'd like to avoid.
https://forums.teradata.com/forum/tools/fexp-data-doubt
The desired output is actually a delimited plain text file. Is there a way to do it?

String continuation across multiple lines, no newline characters

I'm using the RODBC library to bring data into R. I have a long query that I want to pass a variable to, much like this SO user.
Problem is that R interprets the whitespace/carriage returns in my query as a newline '\n'.
The accepted solution for this question suggests to simply break up the text into chunks and then paste() together - which works, but ideally I'd like to keep the whitespace intact - makes it easier to test/verify the behavior of the query over in the database before pasting into R.
In other languages I'm familiar with there's a simple line continuation character - indeed, several of the comments on the accepted answer are looking for an approach similar to python's \.
I found an aside to a workaround using strwrap deep in the bowels of an R discussion list, so in the interest of making the internet better I will post it here. However, if someone can point the way toward a more elegant/straightforward solution, I will happily accept your answer.
I don't know if you will find this helpful or not, but I have eventually gravitated towards keeping my SQL separate from my R scripts. Except for very short queries, I find that keeping the query in the R script gets unreadable very quickly.
These days, I tend to keep queries that are more than a single line in their own separate .sql file. Then I can keep them nice and formatted and readable in a nice text editor, and read them into R as needed via something like this:
read_sql <- function(path) {
  stopifnot(file.exists(path))
  sql <- readChar(path, nchar = file.info(path)$size)
  sql
}
For binding parameters into the queries, I just keep a %s where the parameter will go in the .sql file, and then add in the parameters in R using sprintf.
I've been much happier this way, as I was finding that cluttering up my R scripts with really long paste statements and multi-line character objects was making my code really hard to read.
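As a minimal, hypothetical example of that workflow (the file path and its contents are assumptions; map_term and the XE connection are the ones used further down this page):
# queries/students_by_term.sql might contain, say:
#   SELECT * FROM map$comprehensive_with_growth WHERE termname = '%s'
sql   <- read_sql("queries/students_by_term.sql")
query <- sprintf(sql, map_term)          # drop the parameter into the %s slot
map_njask <- RODBC::sqlQuery(XE, query)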
R's strwrap will destroy whitespace, including newline characters, per the documentation.
Essentially, you can get the desired behavior by initially letting R introduce line breaks/newline \ns, and then immediately stripping them out.
#make query using PASTE
query_1 <- paste("SELECT map.ps_studentid
,students.first_name || ' ' || students.last_name AS full_name
,map.testritscore
,map.termname
,map.measurementscale
FROM map$comprehensive_with_growth map
JOIN students
ON map.ps_studentid = students.id
WHERE map.termname = '",map_term,"'", sep='')
#remove newline characters introduced above.
#width is an arbitrary big number-
#it just needs to be longer than your string.
query_1 <- strwrap(query_1, width=10000, simplify=TRUE)
#execute the query
map_njask <- sqlQuery(XE, query_1)
Try using sprintf to get variable substitution, and then replace the newlines and runs of whitespace with single spaces:
query <- gsub(pattern = '\\s+', replacement = " ", x = query)
See my answer to a similar question for details.

R data table issue

I'm having trouble working with a data table in R. This is probably something really simple but I can't find the solution anywhere.
Here is what I have:
Let's say t is the data table
colNames <- names(t)
for (col in colNames) {
  print(t$col)
}
When I do this, it prints NULL. However, if I do it manually, it works fine -- say a column name is "sample". If I type t$"sample" into the R prompt, it works fine. What am I doing wrong here?
You need t[[col]]; t$col does an odd form of evaluation.
edit: incorporating @joran's explanation:
t$col tries to find an element literally named 'col' in list t, not what you happen to have stored as a value in a variable named col.
$ is convenient for interactive use, because it is shorter and one can skip quotation marks (i.e. t$foo vs. t[["foo"]]). It also does partial matching, which is very convenient but can under unusual circumstances be dangerous or confusing: e.g., if a list contains an element foolicious (and no element named foo), then t$foo will retrieve it. For this reason it is not generally recommended for programming.
[[ can take either a literal string ("foo") or a string stored in a variable (col), and does not do partial matching. It is generally recommended for programming (although there's no harm in using it interactively).
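Putting that together, the loop from the question works once $ is swapped for [[:
colNames <- names(t)
for (col in colNames) {
  print(t[[col]])   # indexes by the string stored in col, not a column literally named "col"
}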
