Identify incorrectly handled history records - Teradata

I have some data in a table where the history was handled incorrectly because of blanks/spaces coming in on one column, segment_value_cd.
I have to identify such records in the table.
I tried a few queries, but they're fetching the entire result set.
Is there any way to identify only such records?
Sel * from party_segment where party_id in(6303031,6824664,216502393,6916270)
id Segment_Type_Cd Segment_Value_Cd Segment_Start_Dt Segment_End_Dt
6,303,031 MB 3/20/2013 6/7/2015
6,303,031 MB ? 6/7/2015 ?
6,824,664 MB 3/20/2013 6/7/2015
6,824,664 MB ? 6/7/2015 ?
6,916,270 MB ? 9/28/2015 ?
6,916,270 MB 3/20/2013 9/28/2015
216,502,393 NR ? 6/7/2015 ?
216,502,393 NR 8/7/2010 6/7/2015
Thanks for your help!!
EDIT:
The query is also fetching this party. However, here the history was handled correctly because the segment_type_cd changed.
23,707 KA 7/11/2010 3/6/2011
23,707 NM 3/6/2011 6/29/2011
23,707 KA 6/29/2011 3/25/2014
23,707 MB 3/25/2014 5/29/2014
23,707 KA 5/29/2014 6/7/2015
23,707 MB LC 6/7/2015 9/28/2015
23,707 KA ? 9/28/2015 ?
My requirement is to fetch only those parties where the segment_type_cd remains the same and the history was handled based on a blank or NULL segment_value_cd,
and then merge those two records into one, like the pair below. I have to identify these and merge them into one.
1 6,824,664 MB 3/20/2013 6/7/2015
2 6,824,664 MB ? 6/7/2015 ?

This should return any two consecutive rows with an empty string and a NULL, in arbitrary order:
select *
from party_segment
qualify -- within a window of two consecutive rows there's both
   -- an empty string
   max(Segment_Value_Cd)
   over (partition by party_id, Segment_Type_Cd
         order by Segment_Start_Dt
         rows 1 preceding) = ''
and -- and a NULL
   min(case when Segment_Value_Cd is null then '*' end)
   over (partition by party_id, Segment_Type_Cd
         order by Segment_Start_Dt
         rows 1 preceding) is not null
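For the follow-up merge, here's a minimal sketch. It assumes the merged row should keep the earliest Segment_Start_Dt of the pair and stay open-ended whenever either row's Segment_End_Dt is NULL; flagged_pairs is a hypothetical staging of both rows of each flagged pair (e.g. a volatile table filled from the query above).
select party_id,
       Segment_Type_Cd,
       min(Segment_Start_Dt) as Segment_Start_Dt,
       -- keep the period open (NULL) if either row is still open,
       -- otherwise take the latest end date
       case when count(Segment_End_Dt) < count(*) then null
            else max(Segment_End_Dt)
       end as Segment_End_Dt
from flagged_pairs   -- hypothetical staging of the flagged pairs
group by party_id, Segment_Type_Cd;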

Related

dbGetQuery java.sql.SQLException: Bigger type length than Maximum

I am trying to fetch a fairly large result set (about 1-2M records) using RJDBC with the following code:
library(RJDBC)
drv <- JDBC("oracle.jdbc.driver.OracleDriver",
classPath="../oracle11g/ojdbc6.jar", " ")
con <- dbConnect(drv, "jdbc:oracle:thin:@hostname:1521/servname","user","pswd")
data <- dbGetQuery(con, "select * from largeTable where rownum < xxx")
The above works if xxx is less than 32768. Above 32800, I get the following exception
> data <- dbGetQuery(con, "select * from dba_objects where rownum < 32768")
> dim(data)
[1] 32767 15
> data <- dbGetQuery(con, "select * from dba_objects where rownum < 32989")
Error in .jcall(rp, "I", "fetch", stride) :
java.sql.SQLException: Bigger type length than Maximum
In https://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf, I see "fetch retrieves the content of the result set in the form of a data frame. If n is -1 then the current implementation fetches 32k rows first and then (if not sufficient) continues with chunks of 512k rows, appending them." followed by "Note that some databases (like Oracle) don’t support a fetch size of more than 32767."
Sorry for the newbie question, but I don't see how I can tell dbGetQuery to fetch the result set in chunks of 32K only. I believe my fetch is dying because it went on to fetch 512K records.
Would really appreciate any suggestions. Thanks in advance.
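Not an answer to the fetch-size question itself, but one workaround is to page the query so that no single call returns more than 32767 rows and run dbGetQuery once per page. A sketch, assuming largeTable has a unique numeric key (object_id here is illustrative, as in dba_objects) and that 12345 stands for the highest key returned by the previous page:
select *
from (
  select t.*
  from largeTable t
  where t.object_id > 12345   -- hypothetical: max key seen on the previous page
  order by t.object_id
)
where rownum <= 32767;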

HiveQL, Hive SQL select date range

It seems simple in SQL, but I'm having trouble using HiveQL with a date range.
I have a dataset like this:
hive> describe logs;
id string,
ts string,
app_id int
hive> select * from logs limit 5;
1389 2014-10-05 13:57:01 12
1656 2014-10-06 03:57:59 15
1746 2014-10-06 10:58:25 19
1389 2014-10-09 08:57:01 12
1656 2014-10-10 01:57:59 15
My goal is to get the distinct id values for the last 3 days. The best thing would be to read the current system time and get the unique ids of the last 3 days, but I'm not sure where I need to put unix_timestamp(). Given that the log is recorded in real time and today's date is present in ts, I tried this query (first approach):
hive > SELECT distinct id FROM logs HAVING to_date(ts) > date_sub(max(ts), 3) and to_date(ts) < max(ts);
FAILED: SemanticException [Error 10025]: Line 1:45 Expression not in GROUP BY key 'ts'
If I add group by 'ts' like below, it spits up this error:
hive> SELECT distinct ext FROM pas_api_logs group by ts HAVING to_date(ts) > date_sub(max(ts), 7) and to_date(ts) < max(ts);
FAILED: SemanticException 1:47 SELECT DISTINCT and GROUP BY can not be in the same query. Error encountered near token 'ts'
After numerous tries, the last approach, based on a similar topic, was this:
Select distinct id from (SELECT * FROM logs JOIN logs ON (max(logs.ts) = to_date(logs.ts))
UNION ALL
SELECT * FROM logs JOIN logs ON (to_date(logs.ts) = date_sub(max(logs.ts), 1))
UNION ALL
SELECT * FROM logs JOIN logs ON (to_date(logs.ts) = date_sub(max(logs.ts), 2)));
Apparently this doesn't work either. Can someone shed some light on this?
The required result can be obtained by using this statement:
select distinct id from logs where DATEDIFF(from_unixtime(unix_timestamp()),ts) <= 3;
Hope it helps!
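A variant that makes the day-level comparison explicit (just a sketch; to_date and datediff are standard Hive functions, and datediff already ignores the time part):
select distinct id
from logs
where datediff(to_date(from_unixtime(unix_timestamp())), to_date(ts)) <= 3;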

Remove anything above 900 rows in MariaDB

This works in MySQL, but it seems that the syntax for MariaDB is different. I am trying to remove everything beyond the first 900 returned rows (LIMIT 900):
DELETE FROM cronschedule NOT IN (SELECT * FROM cronschedule LIMIT 900);
Trying to do this in MariaDB, though, returns the following error:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that
corresponds to your MariaDB server version for the right syntax to use near
'NOT IN (SELECT * FROM cronschedule LIMIT 900)' at line 1
So how would I do this in MariaDB?
I'd expect this to be a bit more efficient than your answer with the LEFT JOIN / IS NULL construction:
DELETE cr.*
FROM cronschedule cr
JOIN
    ( SELECT id
      FROM cronschedule ii
      ORDER BY id ASC
      LIMIT 1 OFFSET 900
    ) i2
    ON cr.id >= i2.id;
This seems to work:
DELETE cr.*
FROM cronschedule cr
LEFT JOIN
    ( SELECT id
      FROM cronschedule ii
      ORDER BY id ASC
      LIMIT 900
    ) i2
    ON cr.id = i2.id
WHERE i2.id IS NULL;
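To check the effect before deleting, the same join can be run as a count first (a quick sketch reusing the id-ordered subquery from above):
SELECT COUNT(*)
FROM cronschedule cr
JOIN
    ( SELECT id
      FROM cronschedule ii
      ORDER BY id ASC
      LIMIT 1 OFFSET 900
    ) i2
    ON cr.id >= i2.id;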

I need to put amount '0' if the row exists but has no amount in my "where" conditions

I need to put amount '0' if the row exists but has no amount in my "where" conditions.
The original query is:
select t.aaa, count (t.bbb), sum (t.ccc)
from nrb t
where t.vvv IN ('3','4','5','6','D','E','F')
and t.ddd like '50%'
and t.eee >= TO_DATE('2012/03/21','YYYY/MM/DD')
and t.eee <= TO_DATE('2012/07/21','YYYY/MM/DD')
group by t.aaa
order by t.aaa
and the result is in the "result" tab of the Excel file.
I need the result shown in the "result 2" tab of the Excel file.
The file (full result): http://www.mediafire.com/?69cc4ay6cyt9cr9
How can I get this?
PL/SQL 7.0.2 unlimited user license
OCI: 9.2
Oracle DB: 11.1.0.6.0 Enterprise
OS: Win XP
You probably want:
select t1.aaa, coalesce(t2.bbb_count, 0) bbb_count,
coalesce(t2.ccc_sum, 0) ccc_sum
from (
select distinct aaa
from nrb
) t1
left join (
select t.aaa, count (t.bbb) bbb_count, sum (t.ccc) ccc_sum
from nrb t
where t.vvv IN ('3','4','5','6','D','E','F')
and t.ddd like '50%'
and t.eee >= TO_DATE('2012/03/21','YYYY/MM/DD')
and t.eee <= TO_DATE('2012/07/21','YYYY/MM/DD')
group by t.aaa
) t2 on t1.aaa = t2.aaa
order by t1.aaa;
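An alternative, sketched under the assumption that every aaa value present in nrb should appear in the output: conditional aggregation scans nrb once and still yields 0 for groups with no matching rows.
select t.aaa,
       -- count/sum only the rows that satisfy the original filters
       count(case when t.vvv in ('3','4','5','6','D','E','F')
                   and t.ddd like '50%'
                   and t.eee >= TO_DATE('2012/03/21','YYYY/MM/DD')
                   and t.eee <= TO_DATE('2012/07/21','YYYY/MM/DD')
                  then t.bbb end) bbb_count,
       coalesce(sum(case when t.vvv in ('3','4','5','6','D','E','F')
                          and t.ddd like '50%'
                          and t.eee >= TO_DATE('2012/03/21','YYYY/MM/DD')
                          and t.eee <= TO_DATE('2012/07/21','YYYY/MM/DD')
                         then t.ccc end), 0) ccc_sum
from nrb t
group by t.aaa
order by t.aaa;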

How to filter teradata help table

I'd like to create a table out of the dataset generated by Teradata's HELP TABLE command so I can add some more information about the table and be able to filter the rows by conditions. The table has 400+ columns, so this would be very convenient for management. I'd like to be able to do something similar to CREATE TABLE AS SELECT, but it doesn't work with the HELP TABLE syntax. Short of exporting the data to Excel, then manually creating the table schema and importing the table back in, does anyone know how to convert the output of a HELP TABLE query into a table in Teradata?
The output from the HELP TABLE command comes from the Data Dictionary.
If I understand correctly, you want to create a new table with the following output.
help table t1;
*** Help information returned. 4 rows.
*** Total elapsed time was 1 second.
Column Name Type Comment
------------------------------ ---- --------
a1 I ?
b1 CF ?
c1 D ?
d1 DA ?
You can get all three of those columns (and even more) from the table DBC.TVFields.
help table dbc.tvfields;
*** Help information returned. 37 rows.
*** Total elapsed time was 1 second.
Column Name Type Comment
------------------------------ ---- ----------------
TableId BF ?
FieldName CV ?
FieldId I2 ?
Nullable CF ?
FieldType CF ?
MaxLength I ?
DefaultValue CV ?
DefaultValueI BV ?
TotalDigits I2 ?
ImpliedPoint I2 ?
FieldFormat CV ?
FieldTitle CV ?
CommentString CV ?
CollationFlag CF ?
UpperCaseFlag CF ?
DatabaseId BF ?
Compressible CF ?
CompressValueList CV ?
FieldStatistics BV ?
ColumnCheck CV ?
CheckCount I2 ?
CreateUID BF ?
CreateTimeStamp TS ?
LastAlterUID BF ?
LastAlterTimeStamp TS ?
LastAccessTimeStamp TS ?
AccessCount I ?
SPParameterType CF ?
CharType I2 ?
LobSequenceNo I2 ?
IdColType CF ?
UDTypeId BF ?
UDTName CV ?
TimeDimension CF ?
VTCheckType CF ?
TTCheckType CF ?
ConstraintId BF ?
But first we need to find out the DatabaseId and TableId.
select databaseid
from dbc.dbase
where databasename='db1';
*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.
DatabaseId
----------
00000F04
select TVMId
from dbc.tables2
where databaseid='00000F04'xb
and TVMName='t1';
*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.
TVMId
------------
0000D8070000
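If you'd rather not copy the hex IDs by hand, both lookups can be combined into a single query against the same dictionary tables:
select d.DatabaseId, t.TVMId
from dbc.dbase d
join dbc.tables2 t
  on t.DatabaseId = d.DatabaseId
where d.DatabaseName = 'db1'
  and t.TVMName = 't1';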
Now you can list all the columns you need and store them accordingly.
select * from dbc.tvfields
where databaseid='00000F04'xb
and tableid='0000D8070000'xb;
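If your system exposes the DBC.ColumnsV view (an assumption; on older releases the equivalent is DBC.Columns), you can skip the ID lookups entirely and materialize the metadata straight into your own table (my_column_info is just an illustrative name):
create table my_column_info as
(
  select DatabaseName, TableName, ColumnName, ColumnType, CommentString
  from DBC.ColumnsV
  where DatabaseName = 'db1'
    and TableName = 't1'
) with data
primary index (ColumnName);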
