Join works in Azure SQL but fails with R DBI connection: The multi-part identifier could not be bound

I have a query that works perfectly in SSMS, but when I run it in R using the DBI package, I receive several multi-part identifier errors: the multi-part identifier "rt.secondary_id" could not be bound, "rt.third_id" could not be bound, and "t2.important" could not be bound.
select t1.[main_id]
,rt.secondary_id
,rt.third_id
,t1.[date_col]
,t2.important
from t1
inner join rt on t1.main_id = rt.main_id
inner join t2 on rt.main_id = t2.main_id
inner join (select t1.main_id, max(t1.date_col) as upload_time from t1 group by t1.main_id) AS ag ON t1.main_id = ag.main_id AND t1.date_col = ag.upload_time
The unique identifier in t1 is the combination of main_id and date_col, and this query finds the most recent entry in t1 for a given main_id.
I'm not exactly sure whether my query is structured poorly or whether this is an R issue. I've tried adding SET NOCOUNT ON to the query, based on what I thought might be related issues elsewhere on Stack Overflow, but no dice.

I found out what my issue was: a silly (but time-consuming) mistake on my part. Essentially, I was bringing my SQL query into R via paste(scan(...), collapse = " "). I had a line comment in my SQL query, introduced with --, which could not be read correctly once the newlines were collapsed. Deleting the comment OR switching to /* ... */ syntax fixes the problem.
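To see why, note that paste(scan(...), collapse = " ") flattens the file onto a single line, so everything after a -- marker becomes part of the comment; if the comment sat just before the joins, that would explain why exactly rt and t2 could not be bound. A minimal sketch, with a hypothetical comment placement:

select t1.main_id, rt.secondary_id
from t1
-- join in the reference table
inner join rt on t1.main_id = rt.main_id

collapses to

select t1.main_id, rt.secondary_id from t1 -- join in the reference table inner join rt on t1.main_id = rt.main_id

so the join is swallowed by the comment and rt is never defined. The block-comment form is explicitly terminated and survives the collapse:

select t1.main_id, rt.secondary_id from t1 /* join in the reference table */ inner join rt on t1.main_id = rt.main_id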


Sqoop trying to --split-by ROWID (Oracle) fails

(Be kind, this is my first question and I did extensive research here and on the net beforehand. The question Oracle ROWID for Sqoop Split-By Column did not really solve this issue, as the original asker resorted to using another column.)
I am using sqoop to copy data from an Oracle 11 DB.
Unfortunately, some tables have no index and no primary key, only partitions (date). These tables are very large: hundreds of millions, if not billions, of rows.
So far, I have decided to access the data in the source by explicitly addressing the partitions. That works well and speeds up the process nicely.
I need to do the splits by data that resides in each and every table, in order to avoid too many if-branches in my bash script (we're talking some 200+ tables here).
I noticed that a split into 8 tasks results in a very uneven spread of workload among the tasks. I considered using Oracle ROWID to define the split.
To do this, I must define a boundary-query. In a standard query 'select * from xyz', the ROWID is not part of the result set; therefore, it is not an option to let Sqoop derive the boundary-query from --query.
Now, when I run this, I get the error:
ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: Sqoop does not have the splitter for the given SQL
data type. Please use either different split column (argument --split-by)
or lower the number of mappers to 1. Unknown SQL data type: -8
Samples of ROWID:
AAJXFWAKPAAOqqKAAA
AAJXFWAKPAAOqqKAA+
AAJXFWAKPAAOqqKAA/
It is static and unique once it is created for any row.
So I cast this funny datatype into something else in my boundary-query:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:oracle:thin:@127.0.0.1:port:mydb --username $USER --P --m 8 \
  --split-by ROWID \
  --boundary-query "select cast(min(ROWID) as varchar(18)), cast(max(ROWID) as varchar(18)) from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')" \
  --query "select * from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD') and \$CONDITIONS" \
  --null-string '\\N' \
  --null-non-string '\\N'
But then I get ugly ROWIDs that are rejected by Oracle:
select * from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')
and ( ROWID >= 'AAJX6oAG聕聁AE聉N:' ) AND ( ROWID < 'AAJX6oAH⁖⁁AD䁔䀷' ) ,
Error Msg = ORA-01410: invalid ROWID
How can I resolve this properly?
I am a Linux embryo and have painfully chewed my way through the topics of bash shell scripting and Sqooping so far, but I would like to make better use of an evenly spread mapper-task workload. It would cut the sqoop time in half, I guess, saving some 5 to 8 hours.
TIA!
wahlium
You can try ROWNUM, but I think sqoop import does not work with pseudocolumns.
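If you do experiment with it, one option (a sketch only, not tested against Sqoop, and the alias rn is hypothetical) is to materialize the pseudocolumn as an ordinary NUMBER column inside --query, so the splitter sees a plain numeric split column:

select *
from (select ROWNUM as rn, t.*
      from table t
      where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')) x
where \$CONDITIONS

with --split-by rn and a boundary-query of select 1, count(*) from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD'), since ROWNUM numbers the filtered rows 1 to N. Be aware that ROWNUM is assigned in whatever order Oracle happens to return rows, so without a deterministic ORDER BY in the inner query the eight mappers are not guaranteed to carve out consistent, non-overlapping slices.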

Create a dataset with a stored procedure using two databases for an rdlc report in VS2015

I want to add a new dataset to my rdlc report in VS 2015, so I created a new data source with my stored procedure. Here it is:
CREATE PROCEDURE [dbo].[getAccidents]
(@p_anneeDebut date, @p_anneeFin date)
AS
SELECT *
FROM T_ACCIDENT
LEFT OUTER JOIN TJ_ACC_PAR ON TJ_ACC_PAR.ACC_id = T_ACCIDENT.ACC_id AND TJ_ACC_PAR.ACC_type = T_ACCIDENT.ACC_type AND TJ_ACC_PAR.ACC_annee = T_ACCIDENT.ACC_annee
LEFT OUTER JOIN TR_PARTIE_CORPS ON TJ_ACC_PAR.PAR_id = TR_PARTIE_CORPS.PAR_id
LEFT OUTER JOIN TR_BLESSURE ON TJ_ACC_PAR.BLE_id = TR_BLESSURE.BLE_id
LEFT OUTER JOIN ERP.dbo.TR_COST_CENTER ON TR_COST_CENTER.COS_id = T_ACCIDENT.ACC_lieuPrecis
WHERE ACC_date <= @p_anneeFin AND ACC_date >= @p_anneeDebut
But when I add this new data source, it does not appear in the "Data Source" list, so I can't select it for my report. (The data source itself is created successfully.)
I tested with other stored procedures and they work; the problem is this line (because without this line it works too):
LEFT OUTER JOIN ERP.dbo.TR_COST_CENTER ON TR_COST_CENTER.COS_id = T_ACCIDENT.ACC_lieuPrecis
This line calls another database, yet the query works in SQL Server.
How can I solve my problem?
Have you tried running your stored proc from SQL Server? Maybe the collation between the two databases is different, that is, if the join is on a string value. If you are calling a stored proc from SSRS, the best way to troubleshoot is to first get the results in SSMS; if an error comes up there, you can easily track it down. Also, try using aliases on your joins.
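If a collation mismatch does turn out to be the cause, the usual fix is to force a common collation on the string side of the join. A sketch, assuming ACC_lieuPrecis is the string column involved (COLLATE DATABASE_DEFAULT is one common choice; pick whatever collation both databases agree on):

LEFT OUTER JOIN ERP.dbo.TR_COST_CENTER
    ON TR_COST_CENTER.COS_id = T_ACCIDENT.ACC_lieuPrecis COLLATE DATABASE_DEFAULT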

How might I get detailed database error messages from dplyr::tbl?

I'm using R to plot some data I pull out of a database (the Stack Exchange data dump, to be specific):
dplyr::tbl(serverfault,
dbplyr::sql("
select year(p.CreationDate) year,
avg(p.AnswerCount*1.0) answers_per_question,
sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
order by year(p.CreationDate)
"))
The query works fine on SEDE, but I get this error in the R console:
Error: <SQL> 'SELECT *
FROM (
select year(p.CreationDate) year,
avg(p.AnswerCount*1.0) answers_per_question,
sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
order by year(p.CreationDate)
) "zzz11"
WHERE (0 = 1)'
nanodbc/nanodbc.cpp:1587: 42000: [FreeTDS][SQL Server]Statement(s) could not be prepared.
I reckoned "Statement(s) could not be prepared." meant that SQL Server didn't like the query for some reason. Unfortunately, it didn't give any hint about what went wrong. After fiddling with the query for a bit, I noticed it was wrapped in a subselect, according to the error message. When I copied and executed the full query as constructed by one of the libraries in the chain, SQL Server gave me this more informative error message:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
Now the solution is obvious: remove (or comment out) the order by clause. But where is the detailed error message in the R console? I'm using RStudio, if that matters. If I could get the full exception right next to the code I'm working on, it would help me fix bugs a lot quicker. (And just to be clear, I get cryptic errors from dplyr::tbl often and typically use binary-search debugging to fix them.)
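For reference, the workaround is just the same query with the ORDER BY removed, so it stays legal inside the derived table dbplyr wraps around it; the sorting can happen in R after collecting the result:

select year(p.CreationDate) year,
    avg(p.AnswerCount*1.0) answers_per_question,
    sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
-- order by year(p.CreationDate) is invalid inside the subselect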

Update with multiple tables and query optimization

I have this update statement:
UPDATE
pr
SET
pr.ult_prezzo_euro = ROUND(pr.ult_prezzo/fs.cambio,7)
FROM --error SQL COMMAND NOT PROPERLY ENDED
fin_prodotto prod INNER JOIN
fin_prodotto_linea fpl ON prod.prodotto_id=fpl.prodotto_id INNER JOIN
fin_att_fin faf ON fpl.attivita_fin_id=faf.attivita_fin_id INNER JOIN
fin_prezzo pr ON pr.attivita_fin_id=faf.attivita_fin_id INNER JOIN
flx_sec_posizione_dt_upd fs ON pr.attivita_fin_id=fs.attivita_fin_id
where
prod.prodotto_id=43
and faf.codice_titolo_cad_s is not null
and pr.ult_prezzo = pr.ult_prezzo_euro
and faf.divisa_quot_t<>'242'
and prod.gstd_esist_b='S'
and fpl.gstd_esist_b='S'
and faf.gstd_esist_b='S'
and pr.gstd_esist_b='S'
and pr.gstd_ult_user_s in ('AGGIORNAMENTO_POSIZIONE')
and pr.ult_prezzo>0
and fs.cambio>0;
It gives me "SQL COMMAND NOT PROPERLY ENDED", pointing at the FROM row.
I would also like to optimize this update statement, because it has to run on about 2 million records. How is that possible?
I can't be sure this will be the right answer without seeing your database structure, but first:
You have the wrong syntax; Oracle has no UPDATE ... FROM, so it should look like this:
UPDATE fin_prezzo pr SET pr.ult_prezzo_euro = ...
And then you will have to change the chain of JOINs to join adequately from the fin_prezzo table (as the first table mentioned).
If that is a problem, you can 'UPDATE' another table in the statement (without changing/adding/removing the updated columns).
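A hedged sketch of the fin_prezzo rewrite (untested, and assuming attivita_fin_id is the key linking fin_prezzo to the other tables, as in your joins): since Oracle cannot join inside UPDATE, the exchange rate comes from a correlated subquery and the remaining join conditions move into an EXISTS filter:

UPDATE fin_prezzo pr
SET pr.ult_prezzo_euro = (
        SELECT ROUND(pr.ult_prezzo / fs.cambio, 7)
        FROM flx_sec_posizione_dt_upd fs
        WHERE fs.attivita_fin_id = pr.attivita_fin_id
          AND fs.cambio > 0
    )
WHERE pr.gstd_esist_b = 'S'
  AND pr.gstd_ult_user_s = 'AGGIORNAMENTO_POSIZIONE'
  AND pr.ult_prezzo > 0
  AND pr.ult_prezzo = pr.ult_prezzo_euro
  AND EXISTS (
        SELECT 1
        FROM fin_att_fin faf
        JOIN fin_prodotto_linea fpl ON fpl.attivita_fin_id = faf.attivita_fin_id
        JOIN fin_prodotto prod ON prod.prodotto_id = fpl.prodotto_id
        WHERE faf.attivita_fin_id = pr.attivita_fin_id
          AND prod.prodotto_id = 43
          AND faf.codice_titolo_cad_s IS NOT NULL
          AND faf.divisa_quot_t <> '242'
          AND prod.gstd_esist_b = 'S'
          AND fpl.gstd_esist_b = 'S'
          AND faf.gstd_esist_b = 'S'
    )
  AND EXISTS (
        SELECT 1 FROM flx_sec_posizione_dt_upd fs
        WHERE fs.attivita_fin_id = pr.attivita_fin_id
          AND fs.cambio > 0
    );

The last EXISTS keeps rows without a matching exchange rate from being set to NULL; and if flx_sec_posizione_dt_upd can hold several rows per attivita_fin_id, the subquery in SET needs a rule (MAX, latest date, ...) to stay single-row.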
As for the optimization part:
It is good to start with a SELECT statement (instead of the UPDATE) to see how it is joining (maybe duplicating rows because of the joins) and to reduce the duplications or unnecessary joins.
Secondly, it is quicker to move 'rules' from the WHERE clause into the JOIN clause, for example:
INNER JOIN fin_att_fin faf ON fpl.attivita_fin_id = faf.attivita_fin_id AND faf.gstd_esist_b = 'S'
Another good technique is comparing numbers instead of strings ('S', '242', ...); it is simply quicker.
The rest is up to trying...
Hope I helped a bit ;)

Unexpected backwards incompatibility in sqlite

I have a dev environment running sqlite 3.7.16.2 and a production environment running sqlite 3.7.9, and I am running into some unexpected backwards incompatibility.
I have a table that looks like this:
sqlite> select * from calls;
ID|calldate|calltype
1|2013-10-01|monthly
1|2013-11-01|3 month
1|2013-12-01|monthly
2|2013-07-11|monthly
2|2013-08-11|monthly
2|2013-09-11|3 month
2|2013-10-11|monthly
2|2013-11-11|monthly
3|2013-04-22|monthly
3|2013-05-22|monthly
3|2013-06-22|3 month
3|2013-07-22|monthly
4|2013-10-04|monthly
4|2013-11-04|3 month
4|2013-12-04|monthly
5|2013-10-28|monthly
5|2013-11-28|monthly
With the newer version of sqlite (3.7.16.2) I can use this:
SELECT ID, MIN(calldate), calltype FROM calls WHERE calldate > date('NOW') GROUP BY ID;
which gives me:
ID|MIN(calldate)|calltype
1|2013-11-01|3 month
2|2013-11-11|monthly
4|2013-11-04|3 month
5|2013-10-28|monthly
However, when I run this same code on the older version of sqlite (3.7.9), I get this:
ID|MIN(calldate)|calltype
1|2013-11-01|monthly
2|2013-11-11|monthly
4|2013-11-04|monthly
5|2013-10-28|monthly
I looked through the changes here, but could not figure out why this is still happening. Any suggestions on how to work around this or how to rewrite my query?
You are using an extension that was added in SQLite 3.7.11.
In standard SQL, it is not allowed to use columns that neither appear in the GROUP BY clause nor are wrapped in an aggregate function.
(SQLite accepts this silently for compatibility with MySQL, but returns the data from some random record in the group.)
To get the other columns from the record with the minimum value, you have to find the minimum value for each group first, and then join these results with the original table:
SELECT calls.ID,
calls.calldate,
calls.calltype
FROM calls
JOIN (SELECT ID,
MIN(calldate) AS calldate
FROM calls
WHERE calldate > date('now')
GROUP BY ID
) AS earliest
ON calls.ID = earliest.ID AND
calls.calldate = earliest.calldate
