Unable to do conversion of datatype in U-sql - u-sql

I am facing issues with some transformations in U-SQL. One issue is changing the date format: the conversion works only when I skip the first row (skipFirstNRows:1). But I need the column names, so I cannot skip the first row. I also need to do other transformations, such as data type conversions and simple concatenations. Below is my sample code. Kindly help.
DECLARE @dir string = "/storefolder/Sourcefile/dwfile3.csv";
DECLARE @file_set_path string = "/BCBSvermot/Sample_output.csv";
@data =
EXTRACT
CHECK_DATE string
FROM @dir
USING Extractors.Csv(skipFirstNRows:1);
@result = SELECT
Convert.ToDateTime(CHECK_DATE).ToString("dd-MM-yyyy") AS CHECK_DATE
FROM @data;
OUTPUT @result
TO @file_set_path
USING Outputters.Csv();
Thanks,
Rav

You can declare a function similar to this:
DECLARE @func Func<string,string> =
(s) =>{
DateTime i;
var x = DateTime.TryParse(s, out i);
return x?((DateTime)i).ToString("dd-MM-yyyy",CultureInfo.CurrentCulture) : s;
};
Then you can use it in your queries:
@result =
SELECT @func(CHECK_DATE) AS CHECK_DATE
FROM @data;
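The reason this helps with the header row can be sketched outside U-SQL. The following is a rough Python illustration of the same parse-or-pass-through idea, not the U-SQL function itself. The name reformat_or_keep and the single input format are assumptions made for the example; .NET's DateTime.TryParse accepts many more formats.

```python
from datetime import datetime

# Assumed input format, for illustration only.
ASSUMED_INPUT_FORMAT = "%Y-%m-%d"


def reformat_or_keep(value: str) -> str:
    """Return the date as dd-MM-yyyy, or the input unchanged if it is not a date."""
    try:
        parsed = datetime.strptime(value, ASSUMED_INPUT_FORMAT)
    except ValueError:
        # Header cells such as "CHECK_DATE" land here and pass through untouched.
        return value
    return parsed.strftime("%d-%m-%Y")


# The first element plays the role of the header row that must not be skipped.
rows = ["CHECK_DATE", "2017-02-01", "2017-05-01"]
converted = [reformat_or_keep(r) for r in rows]
print(converted)
```

Because values that fail to parse are returned as-is, the header line survives and only real dates are reformatted.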

Related

How add only required fields from table to dynamic temp table? - PROGRESS 4GL

I am new to Progress 4GL. The code below adds all fields from a table to a dynamic temp-table except for a few fields, but I am not sure how to add only the required fields to a dynamic temp-table. Please help me modify the code I shared.
/* p-ttdyn2.p - a join of 2 tables */
DEFINE VARIABLE tth4 AS HANDLE.
DEFINE VARIABLE btth4 AS HANDLE.
DEFINE VARIABLE qh4 AS HANDLE.
DEFINE VARIABLE bCust AS HANDLE.
DEFINE VARIABLE bOrder AS HANDLE.
DEFINE VARIABLE i AS INTEGER.
DEFINE VARIABLE fldh AS HANDLE EXTENT 15.
bCust = BUFFER customer:HANDLE.
bOrder = BUFFER order:HANDLE.
CREATE TEMP-TABLE tth4.
tth4:ADD-FIELDS-FROM(bCust,"address,address2,phone,city,comments").
tth4:ADD-FIELDS-FROM(bOrder,"cust-num,carrier,instructions,PO,terms").
tth4:TEMP-TABLE-PREPARE("CustOrdJoinTT").
btth4 = tth4:DEFAULT-BUFFER-HANDLE.
FOR EACH customer WHERE cust.cust-num < 6, EACH order OF customer:
btth4:BUFFER-CREATE.
btth4:BUFFER-COPY(bCust).
btth4:BUFFER-COPY(bOrder).
END.
/* Create Query */
CREATE QUERY qh4.
qh4:SET-BUFFERS(btth4).
qh4:QUERY-PREPARE("for each CustOrdJoinTT").
qh4:QUERY-OPEN.
REPEAT WITH FRAME zz DOWN:
qh4:GET-NEXT.
IF qh4:QUERY-OFF-END THEN LEAVE.
REPEAT i = 1 TO 15:
fldh[i] = btth4:BUFFER-FIELD(i).
DISPLAY fldh[i]:NAME FORMAT "x(15)"
fldh[i]:BUFFER-VALUE FORMAT "x(20)".
END.
END.
btth4:BUFFER-RELEASE.
DELETE OBJECT tth4.
DELETE OBJECT qh4.
ADD-FIELDS-FROM only supports excluding fields that are not needed. Instead you can use ADD-LIKE-FIELD multiple times:
CREATE TEMP-TABLE tth4.
tth4:ADD-LIKE-FIELD("address", "customer.address").
tth4:ADD-LIKE-FIELD("address2", "customer.address2").
tth4:ADD-LIKE-FIELD("phone", "customer.phone").
...
tth4:ADD-LIKE-FIELD("cust-num", "Order.cust-num").
...
tth4:TEMP-TABLE-PREPARE("CustOrdJoinTT").
btth4 = tth4:DEFAULT-BUFFER-HANDLE.
Depending on your use case, you can also invert the required field list to an except field list:
var handle ht,hb.
var longchar lcjson.
function invertFields returns character (
i_hb as handle,
i_crequired as char
):
var char cexcept,cfield.
var int ic.
do ic = 1 to i_hb:num-fields:
cfield = i_hb:buffer-field( ic ):name.
if lookup( cfield, i_crequired ) = 0 then
cexcept = cexcept + ',' + cfield.
end.
return substring( cexcept, 2 ).
end function.
create temp-table ht.
ht:add-fields-from(
buffer customer:handle,
invertFields( buffer customer:handle, "CustNum,Name" )
).
ht:temp-table-prepare( 'tt' ).
hb = ht:default-buffer-handle.
hb:buffer-create().
assign
hb::CustNum = 1
hb::Name = 'test'
.
hb:write-json( 'longchar', lcjson, true ).
message string( lcjson ).
https://abldojo.services.progress.com/?shareId=624993253fb02369b25437c4
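The inversion step can also be sketched in Python to make the logic easier to follow. This is an illustration of the idea only, not ABL. The field names in customer_fields are hypothetical, and the comparison ignores case on the assumption that ABL's LOOKUP does too.

```python
def invert_fields(all_fields, required_csv):
    """Turn a comma-separated list of required fields into an except list."""
    required = {name.strip().lower() for name in required_csv.split(",")}
    # Keep every field that is NOT required; these are the ones to exclude.
    excluded = [f for f in all_fields if f.lower() not in required]
    return ",".join(excluded)


# Hypothetical field list standing in for the buffer's fields.
customer_fields = ["CustNum", "Name", "Address", "Phone"]
except_list = invert_fields(customer_fields, "CustNum,Name")
print(except_list)
```

The resulting string is what would be passed as the except list, so only the required fields remain in the temp-table.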

Select Top 1 From a Table For Each row in another Table

I am just starting to work with OpenEdge and I need to join information from two tables, but I only need the first row from the second one.
Basically I need to do a typical SQL CROSS APPLY, but in Progress. I looked in the documentation, and the FETCH FIRST 10 ROWS ONLY statement is only available in OpenEdge 11.
My query is:
SELECT * FROM la_of PUB.la_ofart ON la_of.empr_cod = la_ofart.empr_cod
AND la_of.Cod_Ordf = la_ofart.Cod_Ordf
AND la_of.Num_ordex = la_ofart.Num_ordex AND la_of.Num_partida = la_ofart.Num_partida
CROSS APPLY (
SELECT TOP 1 ofart.Cod_Ordf AS Cod_Ordf_ofart ,
ofart.Num_ordex AS Num_ordex_ofart
FROM la_ofart AS ofart
WHERE ofart.empr_cod = la_ofart.empr_cod
AND ofart.Num_partida = la_ofart.Num_partida
AND la_ofart.doc1_num = ofart.doc1_num
AND la_ofart.doc2_linha = ofart.doc2_linha
ORDER BY ofart.Cod_Ordf DESC) ofart
I am using SSMS to extract data from OE10 using an ODBC connector and querying to OE using OpenQuery.
Thanks for all help.
If I understood your question correctly, you can use something like this. It may not be the best solution for your problem, but it may suit your needs.
DEF BUFFER ofart FOR la_ofart.
DEF TEMP-TABLE tt-ofart NO-UNDO LIKE ofart
FIELD seq AS INT
INDEX ch-seq
seq.
DEF VAR i-count AS INT NO-UNDO.
EMPTY TEMP-TABLE tt-ofart.
blk:
FOR EACH la_ofart NO-LOCK,
EACH la_of NO-LOCK
WHERE la_of.empr_cod = la_ofart.empr_cod
AND la_of.Cod_Ordf = la_ofart.Cod_Ordf
AND la_of.Num_ordex = la_ofart.Num_ordex
AND la_of.Num_partida = la_ofart.Num_partida,
EACH ofart NO-LOCK
WHERE ofart.empr_cod = la_ofart.empr_cod
AND ofart.Num_partida = la_ofart.Num_partida
AND ofart.doc1_num = la_ofart.doc1_num
AND ofart.doc2_linha = la_ofart.doc2_linha
BREAK BY ofart.Cod_Ordf DESCENDING:
ASSIGN i-count = i-count + 1.
CREATE tt-ofart.
BUFFER-COPY ofart TO tt-ofart
ASSIGN tt-ofart.seq = i-count.
IF i-count >= 10 THEN
LEAVE blk.
END.
FOR EACH tt-ofart USE-INDEX ch-seq:
DISP tt-ofart WITH SCROLLABLE 1 COL 1 DOWN NO-ERROR.
END.

u-sql script can not obtain scalar value from dataset

In a U-SQL script I must extract a variable from a file into a dataset and then use it to form the name of the output file. How can I get the variable from the dataset?
In detail:
I have two input files: a CSV file with a set of fields, and a dictionary file. The first file has a name like ****ClientCode*****.csv. The second file, the dictionary, has two fields that map ClientCode to ClientCode2. My task is to extract the ClientCode value from the file name, get ClientCode2 from the dictionary, insert it as a field into the output file (implemented), and also form the name of the output file as ****ClientCode2****.csv.
Dictionary csv file has the content:
OldCode NewCode
6HAA Alfa
CCVV Beta
CVXX gamma
? Davis
The question is how to get ClientCode2 into scalar variable to write an expression for the output file?
DECLARE @inputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/ESP_FAD_GL_6HAA_20170930.txt"; // '6HAA' is ClientCode here that mapped to other code in ClientCode_KVP.csv
DECLARE @outputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/ClientCode_sftp_" + // 'ClientCode' should be replaced with ClientCode from mapping in ClientCode_KVP.csv
DateTime.Now.ToString("yyyyMMdd") + "_" +
DateTime.Now.ToString("HHmmss") + ".csv";
DECLARE @dictionaryFile string = "D:/DFS_SSC_Automation/ClientCode_KVP.csv";
@dict =
EXTRACT [OldCode] string,
[NewCode] string
FROM @dictionaryFile
USING Extractors.Text(skipFirstNRows : 1, delimiter : ',');
@theCode =
SELECT Path.GetFileNameWithoutExtension(@inputFile).IndexOf([OldCode]) >= 0 ? 1 : 3 AS [CodeExists],
[NewCode]
FROM @dict
UNION
SELECT *
FROM(
VALUES
(
2,
""
)) AS t([CodeExists],[NewCode]);
@code =
SELECT [NewCode]
FROM @theCode
ORDER BY [CodeExists]
FETCH 1 ROWS;
@GLdata =
EXTRACT [ASAT] string,
[ASOF] string,
[BASIS_INDICATOR] string,
[CALENDAR_DATE] string,
[CR_EOP_AMOUNT] string,
[DR_EOP_AMOUNT] string,
[FUND_ID] string,
[GL_ACCT_TYPE_IND] string,
[TRANS_CLIENT_FUND_NUM] string
FROM @inputFile
USING Extractors.Text(delimiter : '|', skipFirstNRows : 1);
// Prepare output dataset
@FundInfoGL =
SELECT "" AS [AccountPeriodEnd],
"" AS [ClientCode],
[FUND_ID] AS [FundCode],
SUM(GL_ACCT_TYPE_IND == "A"? System.Convert.ToDecimal(DR_EOP_AMOUNT) : 0) AS [NetValueOtherAssets],
SUM(GL_ACCT_TYPE_IND == "L"? System.Convert.ToDecimal(CR_EOP_AMOUNT) : 0) AS [NetValueOtherLiabilities],
0.0000 AS [NetAssetsOfSeries]
FROM @GLdata
GROUP BY FUND_ID;
// NetAssetsOfSeries calculation
@FundInfoGLOut =
SELECT [AccountPeriodEnd],
[NewCode] AS [ClientCode],
[FundCode],
Convert.ToString([NetValueOtherAssets]) AS [NetValueOtherAssets],
Convert.ToString([NetValueOtherLiabilities]) AS [NetValueOtherLiabilities],
Convert.ToString([NetValueOtherAssets] - [NetValueOtherLiabilities]) AS [NetAssetsOfSeries]
FROM @FundInfoGL
CROSS JOIN @code;
// Output
OUTPUT @FundInfoGLOut
TO @outputFile
USING Outputters.Text(outputHeader : true, delimiter : '|', quoting : false);
As David points out: You cannot assign query results to scalar variables.
However, we have a dynamic partitioned output feature in private preview right now that will give you the ability to generate file names based on column values. Please contact me if you want to try it out.
You can't. Please see Convert Rowset variables to scalar value.
You may still be able to achieve your ultimate goal in a different manner. Please consider re-writing your post with clear & concise language, small dataset, expected output, and a very minimal amount of code needed to repro - remove all details and nuances that aren't necessary to create a test case.

ROracle bind range of dates

I want to send a query to Oracle via ROracle with bind parameters that include a range of dates for a date column.
I tried to run:
idsample <- 123
strdate <- "TO_DATE('01/02/2017','DD/MM/YYYY')"
enddate <- "TO_DATE('01/05/2017','DD/MM/YYYY')"
res <- dbGetQuery(myconn,"SELECT * FROM MYTABLE WHERE MYID = :1 AND MYDATE BETWEEN :2 AND :3", data=data.frame(MYID =idsample , MYDATE=c(strdate,enddate )))
but I get the error:
"bind data does not match bind specification"
I could find no documentation which covers using more than one positional parameter, but if one parameter corresponds to a single column of a data frame, then by this logic three parameters should correspond to three columns:
idsample <- 123
strdate <- "01/02/2017"
enddate <- "01/05/2017"
res <- dbGetQuery(myconn,
paste0("SELECT * FROM MYTABLE WHERE MYID = :1 AND ",
"MYDATE BETWEEN TO_DATE(:2, 'DD/MM/YYYY') AND TO_DATE(:3, 'DD/MM/YYYY')"),
data=data.frame(idsample, strdate, enddate))
Note that there is nothing special about strdate and enddate from the point of view of the API that would require them to be passed as a vector.
Edit:
The problem with making TO_DATE a parameter is that it will probably end up being escaped as a string. In other words, with my first approach you would end up with the following in your WHERE clause:
WHERE MYDATE BETWEEN
'TO_DATE('01/02/2017','DD/MM/YYYY')' AND 'TO_DATE('01/05/2017','DD/MM/YYYY')'
In other words, each TO_DATE function call ends up being a string. Instead, bind the date strings only.
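The same effect can be observed with any driver that supports bound parameters. The sketch below uses Python's built-in sqlite3 module rather than ROracle or Oracle, purely as an illustration of the principle. SQLite's upper() function stands in for TO_DATE, which is an assumption of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First parameter: a string that merely looks like a function call.
# Second parameter: a plain value that the SQL text itself wraps in a function.
looks_like_call, plain_value = conn.execute(
    "SELECT ?, upper(?)", ("upper('abc')", "abc")
).fetchone()
conn.close()

print(looks_like_call)  # the bound text comes back verbatim, never evaluated
print(plain_value)      # the function in the SQL text was applied to the bound value
```

A bound value is always treated as data, so the function call has to live in the SQL text while only the raw value is bound.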

u-sql script to search for a string then Groupby that string and get the count of distinct files

I am quite new to u-sql, trying to solve
str1=\global\europe\Moscow\12345\File1.txt
str2=\global.bee.com\europe\Moscow\12345\File1.txt
str3=\global\europe\amsterdam\54321\File1.Rvt
str4=\global.bee.com\europe\amsterdam\12345\File1.Rvt
case1:
How do I get just "\europe\Moscow\12345\File1.txt" from the string variables str1 and str2? I want to take "\europe\Moscow\12345\File1.txt" from str1 and str2, then group by "\global\europe\Moscow\12345" and take the count of distinct files under the path "\europe\Moscow\12345\",
so the output would be something like this:
distinct_filesby_Location_Date
To solve the above case I tried the U-SQL code below, but I am not sure whether I am writing the right script:
@inArray = SELECT new SQL.ARRAY<string>(
filepath.Contains("\\europe")) AS path
FROM @t;
@filesbyloc =
SELECT [ID],
path.Trim() AS path1
FROM @inArray
CROSS APPLY
EXPLODE(path1) AS r(location);
OUTPUT @filesbyloc
TO "/Outputs/distinctfilesbylocation.tsv"
USING Outputters.Tsv();
Any help would be greatly appreciated.
One approach to this is to put all the strings you want to work with in a file, e.g. strings.txt, and save it in your U-SQL input folder. Also have a file with the cities you want to match, e.g. cities.txt. Then try the following U-SQL script:
@input =
EXTRACT filepath string
FROM "/input/strings.txt"
USING Extractors.Tsv();
// Give the strings a row-number
@input =
SELECT ROW_NUMBER() OVER() AS rn,
filepath
FROM @input;
// Get the cities
@cities =
EXTRACT city string
FROM "/input/cities.txt"
USING Extractors.Tsv();
// Ensure there is a lower-case version of city for matching / joining
@cities =
SELECT city,
city.ToLower() AS lowercase_city
FROM @cities;
// Split the filepath into an array of path elements
@working =
SELECT rn,
new SQL.ARRAY<string>(filepath.Split('\\')) AS pathElement
FROM @input AS i;
// Explode the array into one row per path element, also changing to lower case
@working =
SELECT rn,
x.pathElement.ToLower() AS pathElement
FROM @working AS i
CROSS APPLY
EXPLODE(pathElement) AS x(pathElement);
// Create the output query, joining on lower case city name, display, normal case name
@output =
SELECT c.city,
COUNT( * ) AS records
FROM @working AS w
INNER JOIN
@cities AS c
ON w.pathElement == c.lowercase_city
GROUP BY c.city;
// Output the result
OUTPUT @output TO "/output/output.txt"
USING Outputters.Tsv();
//OUTPUT @working TO "/output/output2.txt"
//USING Outputters.Tsv();
My results:
HTH
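The split, explode, and join steps of this script can be traced on the four sample strings from the question. The following Python sketch mirrors that logic for illustration only. The contents of the cities list are an assumption, since cities.txt is not shown.

```python
from collections import Counter

# The four sample paths from the question.
paths = [
    r"\global\europe\Moscow\12345\File1.txt",
    r"\global.bee.com\europe\Moscow\12345\File1.txt",
    r"\global\europe\amsterdam\54321\File1.Rvt",
    r"\global.bee.com\europe\amsterdam\12345\File1.Rvt",
]

# Assumed contents of cities.txt, keyed by lower-case name for matching.
cities = ["Moscow", "Amsterdam"]
by_lower = {c.lower(): c for c in cities}

records = Counter()
for path in paths:
    # Split on the backslash and test each path element against the city list.
    for element in path.split("\\"):
        city = by_lower.get(element.lower())
        if city is not None:
            records[city] += 1

print(dict(records))
```

Note that this counts matching records per city, as the script does. Counting distinct files would additionally require de-duplicating on the file name.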
Taking the liberty to format your input file as TSV file, and not knowing all the column semantics, here is a way to write your query. Please note that I made the assumptions as provided in the comments.
@d =
EXTRACT path string,
user string,
num1 int,
num2 int,
start_date string,
end_date string,
flag string,
year int,
s string,
another_date string
FROM @"\users\temp\citypaths.txt"
USING Extractors.Tsv(encoding: Encoding.Unicode);
// I assume that you have only one DateTime format culture in your file.
// If it becomes dependent on the region or city as expressed in the path, you need to add a lookup.
@d =
SELECT new SqlArray<string>(path.Split('\\')) AS steps,
DateTime.Parse(end_date, new CultureInfo("fr-FR", false)).Date.ToString("yyyy-MM-dd") AS end_date
FROM @d;
// This assumes your paths have a fixed formatting/mapping into the city
@d =
SELECT steps[4].ToLowerInvariant() AS city,
end_date
FROM @d;
@res =
SELECT city,
end_date,
COUNT( * ) AS count
FROM @d
GROUP BY city,
end_date;
OUTPUT @res
TO "/output/result.csv"
USING Outputters.Csv();
// Now let's pivot the date and count.
@res2 =
SELECT city, MAP_AGG(end_date, count) AS date_count
FROM @res
GROUP BY city;
// This assumes you know exactly which dates you are looking for. Otherwise keep it in the first file representation.
@res2 =
SELECT city,
date_count["2016-11-21"] AS [2016-11-21],
date_count["2016-11-22"] AS [2016-11-22]
FROM @res2;
OUTPUT @res2
TO "/output/res2.csv"
USING Outputters.Csv();
UPDATE AFTER RECEIVING SOME EXAMPLE DATA IN PRIVATE EMAIL:
Based on the data you sent me, you want to pivot the rowset city, count, date into the rowset date, city1, city2, ... where each row contains the date and the counts for each city. This comes after the extraction and counting of the cities, which you could do either with the join outlined in Bob's answer (where you need to know your cities in advance) or by taking the string from the city location in the path as in my example (where you do not need to know the cities in advance).
You could easily adjust my example above by changing the calculations of #res2 in the following way:
// Now let's pivot the city and count.
@res2 = SELECT end_date, MAP_AGG(city, count) AS city_count
FROM @res
GROUP BY end_date;
// This assumes you know exactly which cities you are looking for. Otherwise keep it in the first file representation or use a script generation (see below).
@res2 =
SELECT end_date,
city_count["istanbul"] AS istanbul,
city_count["midlands"] AS midlands,
city_count["belfast"] AS belfast,
city_count["acoustics"] AS acoustics,
city_count["amsterdam"] AS amsterdam
FROM @res2;
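To make the two-step pivot concrete, here is a small Python sketch of the same idea: first aggregate city/count pairs into a map per date, then look up a fixed list of cities in that map. It is an illustration only, not U-SQL. The sample rows and the known_cities list are made up for the example.

```python
from collections import defaultdict

# Made-up (city, end_date, count) rows standing in for the counted rowset.
rows = [
    ("istanbul", "2016-11-21", 3),
    ("belfast", "2016-11-21", 1),
    ("istanbul", "2016-11-22", 2),
]

# Step 1: one map of city -> count per date (the role MAP_AGG plays above).
city_count = defaultdict(dict)
for city, end_date, count in rows:
    city_count[end_date][city] = count

# Step 2: look up each known city in the map to form the pivoted columns.
known_cities = ["istanbul", "belfast"]  # must be enumerated in advance
pivot = [
    (end_date, *[counts.get(city) for city in known_cities])
    for end_date, counts in sorted(city_count.items())
]

print(pivot)
```

In this sketch a city with no rows for a given date yields None in its column. The need to enumerate known_cities up front is the same limitation that motivates generating the script when the cities are not known in advance.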
Note that, as in my example, you will need to enumerate all cities in the pivot statement by looking them up in the SQL.MAP column. If they are not known a priori, you will have to first submit a script that creates the script for you. For example, assuming your city, count, date rowset is in a file (or you could just duplicate the statements to generate the rowset in the generation script and the generated script), you could write it as the following script. Then take the result and submit it as the actual processing script.
// Get the rowset (could also be the actual calculation from the original file)
@in = EXTRACT city string, count int?, date string
FROM "/users/temp/Revit_Last2Months_Results.tsv"
USING Extractors.Tsv();
// Generate the statements for the preparation of the data before the pivot
@stmts = SELECT * FROM (VALUES
( "@s1", "EXTRACT city string, count int?, date string FROM \"/users/temp/Revit_Last2Months_Results.tsv\" USING Extractors.Tsv();"),
( "@s2", "SELECT date, MAP_AGG(city, count) AS city_count FROM @s1 GROUP BY date;" )
) AS T( stmt_name, stmt);
// Now generate the statement doing the pivot
@cities = SELECT DISTINCT city FROM @in;
@pivots =
SELECT "@s3" AS stmt_name, "SELECT date, "+String.Join(", ", ARRAY_AGG("city_count[\""+city+"\"] AS ["+city+"]"))+ " FROM @s2;" AS stmt
FROM @cities;
// Now generate the OUTPUT statement after the pivot. Note that the OUTPUT does not have a statement name.
@output =
SELECT "OUTPUT @s3 TO \"/output/pivot_gen.tsv\" USING Outputters.Tsv();" AS stmt
FROM (VALUES(1)) AS T(x);
// Now put the statements into one rowset. Note that nulls order high in U-SQL
@result =
SELECT stmt_name, "=" AS assign, stmt FROM @stmts
UNION ALL SELECT stmt_name, "=" AS assign, stmt FROM @pivots
UNION ALL SELECT (string) null AS stmt_name, (string) null AS assign, stmt FROM @output;
// Now output the statements in order of the stmt_name
OUTPUT @result
TO "/pivot.usql"
ORDER BY stmt_name
USING Outputters.Text(delimiter:' ', quoting:false);
Now download the file and submit it.

Resources