Teradata EDITDISTANCE function

I am trying to use the Teradata EDITDISTANCE function, and it works nicely when I have static strings, e.g. select editdistance('Hello','Hella');
If I then introduce a column into the first string, that also works nicely, e.g.
select editdistance(First_Name1, 'Hello');
But I cannot work out how to do it when both inputs are columns, comparing the strings row by row, e.g.
select editdistance(First_Name1, First_Name2);
If I were using Python, a for loop would do the trick, but unfortunately that is not an option for this data.
First_Name1    First_Name2
Adam           Adom
Chris          Chriss
Jane           Ben
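A minimal sketch of the kind of query the question is after, assuming the two columns above live in a hypothetical table called name_pairs:
-- name_pairs is an assumed table name, used only for illustration
select First_Name1,
       First_Name2,
       editdistance(First_Name1, First_Name2) as name_distance
from name_pairs;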

Related

Why can't the fn:substring-after XQuery function be used inside a MarkLogic TDE?

In my MarkLogic database, we have documents with a distributor code like 'DIST:5012' (DIST:XXXX, where XXXX is a four-digit number).
Currently, in my TDE, the code below works well.
However, instead of concatenating all the raw distributor codes, I want to concatenate only the number part. I used the fn:substring-after XQuery function, but it won't work: the distributorCode column no longer shows up in the SQL View. (The code below does not work.)
What is wrong, and how do I fix it?
Both fn:substring-after and fn:string-join are listed on the TDE Dialect page:
https://docs.marklogic.com/9.0/guide/app-dev/TDE#id_99178
substring-after() expects a single string as input, not a sequence of strings.
To demonstrate, this will not work:
let $dist := ("DIST:5012", "DIST:5013")
return substring-after($dist, "DIST:")
This will:
for $dist in ("DIST:5012", "DIST:5013")
return substring-after($dist, "DIST:")
I would need to double-check which XPath expressions will work in a TDE, but you might be able to change it to apply the substring-after() function in the last step:
fn:string-join( distributors/distributor/urn/substring-after(., 'DIST:'), ';')
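An untested sketch of the alternative, combining the for-expression form with fn:string-join (assuming the same distributors/distributor/urn structure as in the expression above):
fn:string-join(
  (: iterate so that substring-after sees one string at a time :)
  for $urn in distributors/distributor/urn
  return fn:substring-after($urn, 'DIST:'),
  ';'
)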

How to set up custom automatic character replacement in Emacs ESS?

One of the useful features of ess-mode (Emacs Speaks Statistics) is that it automatically replaces the underscore _ with the assignment operator <-. Lately, I have been using a lot of pipes (written as %>%), and it would be great not to have to type three characters for each pipe.
Is it possible to define a custom key binding for the pipe, similar to the one converting _ into <-?
The simplest solution is to just bind a key to insert a string:
(define-key ess-mode-map (kbd "|") "%>%")
You can still insert | with C-q |. I'm not sure about the map's name; you'll almost certainly want to limit the key binding to ess-mode.
Check out yasnippet. You can use it to define something like: "if this sequence of characters is followed by this key (which you can set to whatever you like), then replace it with this other sequence of characters and leave the cursor in this place". There's more to yasnippet than this, but there's plenty of documentation online, and even ready-made recipes similar to the example above that you can try, such as yasnippet-ess-mode.
Alternatively, you can also try abbrev-mode and see if that works for you.
I, for one, like yasnippet better, since you can also specify where to leave the cursor after the expansion, but abbrev-mode seems to be easier to set up. As always in the Emacs world, try multiple solutions; don't settle for the first one you put your hands on. What works best for others might not work for you, and vice versa.
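For illustration, a minimal yasnippet snippet along those lines might look like this (the key >> is an arbitrary choice, and the file would need to live in a snippets/ess-mode/ directory on yas-snippet-dirs):
# -*- mode: snippet -*-
# name: magrittr pipe
# key: >>
# --
%>% $0
Typing >> followed by the yasnippet expansion key (TAB by default) then inserts %>% and leaves the cursor at the position marked by $0.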

AssertResultSetsHaveSameMetaData in TSQLT

I am using the tSQLt AssertResultSetsHaveSameMetaData assertion to compare metadata between two tables. The problem is that I cannot hardcode the table names, since I am passing them in as parameters at runtime. Is there any way to do that?
You use tSQLt.AssertResultSetsHaveSameMetaData by passing two select statements like this:
exec tSQLt.AssertResultSetsHaveSameMetaData
'SELECT TOP 1 * FROM mySchema.ThisTable;'
, 'SELECT TOP 1 * FROM mySchema.ThatTable;';
So it should be quite easy to parameterise the names of the tables you are comparing and build the SELECT statements based on those table name parameters.
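For example, a minimal sketch of such a parameterised test helper (the procedure name, schema and parameter names here are illustrative, not part of tSQLt):
create procedure MyTests.AssertTablesHaveSameMetaData
  @ThisTable nvarchar(257),
  @ThatTable nvarchar(257)
as
begin
  -- build the two SELECT statements from the table-name parameters
  declare @expected nvarchar(max) = N'SELECT TOP 1 * FROM ' + @ThisTable + N';';
  declare @actual nvarchar(max) = N'SELECT TOP 1 * FROM ' + @ThatTable + N';';
  exec tSQLt.AssertResultSetsHaveSameMetaData @expected, @actual;
end;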
However, if you are using the latest version of tSQLt, you can now also use tSQLt.AssertEqualsTableSchema to do the same thing. You would use this assertion like this:
exec tSQLt.AssertEqualsTableSchema
'mySchema.ThisTable'
, 'mySchema.ThatTable';
Once again, parameterising the table names would be easy since they are passed to AssertEqualsTableSchema as parameters.
If you explain the use case/context and provide sample code showing what you are trying to do, you stand a better chance of getting the help you need.

Bracket-escaped table names with dplyr

I'm programmatically fetching a bunch of datasets, many of which have silly names that begin with numbers and contain special characters like minus signs. Because none of the datasets are particularly large, and I wanted the benefit of R making its best guess about data types, I'm (ab)using dplyr to dump these tables into SQLite.
I am using square brackets to escape the horrible table names, but this doesn't seem to work. For example:
data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="[14m3-n4m3]")
This results in the error message:
Error in sqliteSendQuery(conn, statement, bind.data) : error in statement: no such table: 14m3-n4m3
This works if I choose a sensible name. However, for a variety of reasons, I'd really like to keep the cumbersome names. I am also able to create such a badly-named table directly from the sqlite shell:
sqlite> create table [14m3-n4m3](foo,bar,baz);
sqlite> .tables
14m3-n4m3
Without cracking into things too deeply, this looks like dplyr is handling the square brackets in some way that I cannot figure out. My suspicion is that this is a bug, but I wanted to check here first to make sure I wasn't missing something.
EDIT: I forgot to mention the case where I just pass the janky name directly to dplyr. This errors out as follows:
library(dplyr)
data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="14M3-N4M3")
Error in sqliteSendQuery(conn, statement, bind.data) :
error in statement: unrecognized token: "14M3"
This is a bug in dplyr. It's still there in the current GitHub master. As #hadley indicates, he has tried to escape things like table names in dplyr to prevent this issue. The problem you're having arises from a lack of escaping in two functions. Table creation works fine when the table name is provided unescaped (and is done with dplyr::db_create_table). However, the insertion of data into the table is done using DBI::dbWriteTable, which doesn't support odd table names. If the table name is provided to this function escaped, it fails to find it in the list of tables (the first error you report). If it is provided unescaped, then the SQL to do the insertion is not syntactically valid (the second error you report).
The second issue comes when the table is updated. The code to get the field names (this time actually in dplyr) again fails to escape the table name because it uses paste0 rather than build_sql.
I've fixed both errors in a fork of dplyr. I've also put in a pull request to #hadley and made a note on the issue https://github.com/hadley/dplyr/issues/926. In the meantime, if you wanted to, you could use devtools::install_github("NikNakk/dplyr", ref = "sqlite-escape") and then revert to the master version once it's been fixed.
Incidentally, the correct SQL-99 way to escape table names (and other identifiers) is with double quotes (see SQL standard to escape column names?). MS Access uses square brackets, while MySQL defaults to backticks. dplyr uses double quotes, per the standard.
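For example, the badly-named table from earlier can be created in the sqlite shell with a standard double-quoted identifier instead of brackets:
sqlite> create table "14m3-n4m3"(foo,bar,baz);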
Finally, the proposal from #RichardScriven wouldn't work universally. For example, select is a perfectly valid name in R, but is not a syntactically valid table name in SQL. The same would be true for other reserved words.

R data table issue

I'm having trouble working with a data table in R. This is probably something really simple but I can't find the solution anywhere.
Here is what I have:
Let's say t is the data table
colNames <- names(t)
for (col in colNames) {
print (t$col)
}
When I do this, it prints NULL. However, it works fine if I do it manually -- say a column is named "sample": if I type t$"sample" at the R prompt, it prints the column. What am I doing wrong here?
You need t[[col]]; t$col does an odd form of evaluation.
edit: incorporating #joran's explanation:
t$col tries to find an element literally named 'col' in list t, not what you happen to have stored as a value in a variable named col.
$ is convenient for interactive use, because it is shorter and lets you skip quotation marks (i.e. t$foo vs. t[["foo"]]). It also does partial matching, which is very convenient but can, under unusual circumstances, be dangerous or confusing: e.g. if a list contains an element foolicious, then t$foo will retrieve it. For this reason it is not generally recommended for programming.
[[ can take either a literal string ("foo") or a string stored in a variable (col), and does not do partial matching. It is generally recommended for programming (although there's no harm in using it interactively).
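So the loop from the question works once $ is replaced with [[ (reusing t and colNames from above):
colNames <- names(t)
for (col in colNames) {
  # [[ evaluates col, so each iteration prints the column whose name is stored in col
  print(t[[col]])
}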
