How to merge files for N number of days using U-SQL - u-sql

I want to merge the data from 7 days of files and perform some operation on them using U-SQL.
The folder structure on ADLS - /sample/data/YYYY/MM/DD.1.csv
For e.g. Today is 03/01/2018 (DD/MM/YYYY) then I want to pick the data from 27/12/2017 to 02/01/2018.
In U-SQL how to achieve this?

you should use virtual columns in your file path.
For example:
DECLARE #path = "/sample/data/{FileDate:yyyy}/{FileDate:MM}/{FileDate:dd}.1.csv
DECLARE #startDate = new DateTime(2017,12,27);
DECLARE #endDate = new DateTime(2018,01,02);
#data = EXTRACT
column1 string,
column2 string,
FileDate DateTime
from #path
USING Extractors.Csv(); //or which extractor you are using
OUTPUT(
SELECT * FROM #data
WHERE FileDate BETWEEN #startDate AND #endDate)
TO "/sample/data/appended.csv
USING Outputters.Text(delimiter : ';', outputHeader : true); //output this in your csv file
This code should extract you all data in your date range.
P.S. I am not sure just if i wrote good file name, check that.
Hope this will help you. Here you can find documentation about virtual columns.

Related

Splitting Columns in USQL

I am new to USQL and I am having a hard time splitting a column from the rest of my file. With my EXTRACTOR I declared 4 columns because my file is split into 4 pipes. However, I want to remove one of the columns I declared from the file. How do I do this?
The Json column of my file is what I want to split off and make you new object that does not include it. Basically splitting Date, Status, PriceNotification into the #result. This is what I have so far:
#input =
EXTRACT
Date string,
Condition string,
Price string,
Json string
FROM #in
USING Extractor.Cvs;
#result =
SELECT Json
FROM #input
OUTPUT #input
TO #out
USING Outputters.Cvs();
Maybe I have misunderstood your question, but you can simply list the columns you want in the SELECT statement, eg
#input =
EXTRACT
Date string,
Status string,
PriceNotification string,
Json string
FROM #in
USING Extractor.Text('|');
#result =
SELECT Date, Status, PriceNotification
FROM #input;
OUTPUT #result
TO #out
USING Outputters.Cvs();
NB I have switched the variable in your OUTPUT statement to be #result. If this does not answer your question, please post some sample data and expected results.

Writing SQL to U-SQL Query

Can anybody please guide me in writing this below SQL in U-SQL language used in Azure Data Lake
select tt.userId, count(tt.userId) from (SELECT userId,count(userId) as cou
FROM [dbo].[users]
where createdTime> DATEADD(wk,-1,GETDATE())
group by userId,DATEPART(minute,createdTime)/5) tt group by tt.userId
I don't find the DATEPART function in U-SQL . Azure Data Analytic job is giving me error.
U-SQL does not provide T-SQL intrinsic functions except for a few (like LIKE). See https://msdn.microsoft.com/en-us/library/azure/mt621343.aspx for a list.
So how do you do DateTime operations? You just use the C# functions and methods!
So DATEADD(wk, -1, GETDATE()) is something like DateTime.Now.AddDays(-7)
and
DATEPART(minute,createdTime)/5 (there is an extra ) in your line) is something like createdTime.Minute/5 (maybe you need to cast it to a double if you want non-integer value).
For anybody who is looking for the implementation mentioned by Michael. It's like below
#records =
EXTRACT userId string,
createdTime DateTime
FROM "/datalake/input/data.tsv"
USING Extractors.Tsv();
#result =
SELECT
userId,
COUNT(createdTime) AS userCount
FROM #records
WHERE createdTime > DateTime.Now.AddDays(-30)
GROUP BY userId,createdTime.Minute/5;
#result2= SELECT userId,COUNT(userId) AS TotalCount
FROM #result
GROUP BY userId;
OUTPUT #result2
TO "/datalake/output/data.csv"
USING Outputters.Csv();

Compare date part of datetime column with NamedQuery

I have a table containing entries with date and time. I try to create a NamedQuery which only compares the date part.
#NamedQuery(name = "Vote.findForDate", query = "SELECT v FROM Vote v WHERE v.createdAt = :date")
...
createNamedQuery("Vote.findForDate").setParameter("date", date, TemporalType.DATE).getResultList();
But it seems that it always tries to compare the whole datetime.
Is there no way without using date() function in SQL?
I try to be independent from the database. For example h2 has no date() function.
One possible solution is the usage of date as column type and reduce the information.
#Column(name = "voteDate")
#Temporal(value = TemporalType.DATE)
private Date voteDate;

Compare local DateTime to DateTimeOffset

I have a database with a DateTime column, and want to compare it to some DateTimeOffset. The DateTime column is in (server) local time. There is nothing I can do about that apart from rant about the WTF'yness of it, which I like to do frequently and vehemently, so I can skip it here for now.
I'd like to be able to select all rows from the table where DateTime x happens before my DateTimeOffset. I can live with assuming that if it is DST on the server now, the DateTime column is also in DST and the other way around. These times will be localised around 'now', with the DateTime (almost) always being in the past few hours. I'll take the DST change on the chin when it happens, but the solution shouldn't be always wrong in either DST or non-DST.
As a bonus requirement, the table is large, so performing any manipulation on the DateTime is out of the question. What it comes down to I suppose is I'm looking for the equivalent of
DECLARE #dto datetimeoffset(4) = '12-10-25 12:32:10 +06:00';
DECLARE #dt datetime = --this is where the magic happens?
SELECT * FROM MyTable WHERE datetimecol < #dt
could anyone help me making the magic happen?
You'll need a SQL CLR function to do this. Inspired by this post, I wrote the following:
using System;
using Microsoft.SqlServer.Server;
public class UserDefinedFunctions
{
[SqlFunction(IsDeterministic = false)]
public static DateTime? ConvertDateTimeOffsetToLocalDateTime(DateTimeOffset? dto)
{
if (!dto.HasValue)
return null;
return dto.Value.ToLocalTime().DateTime;
}
}
You can compile this yourself if you like, or just deploy it using the following, which contains the serialized binary and SQL deployment script:
EXEC sp_configure 'clr enabled', 1;
GO
RECONFIGURE
GO
CREATE ASSEMBLY [SqlClrDateTime]
FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C0103004F68E4520000000000000000E00002210B010B00000800000006000000000000EE260000002000000040000000000010002000000002000004000000000000000400000000000000008000000002000000000000030040850000100000100000000010000010000000000000100000000000000000000000942600005700000000400000C002000000000000000000000000000000000000006000000C0000005C2500001C0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000F4060000002000000008000000020000000000000000000000000000200000602E72737263000000C00200000040000000040000000A0000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000000E00000000000000000000000000004000004200000000000000000000000000000000D026000000000000480000000200050094200000C804000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001330010030000000010000110F00280600000A2D0A1200FE150200001B062A0F00280700000A0B1201280800000A0C1202280900000A730A00000A2A1E02280B00000A2A42534A4201000100000000000C00000076342E302E33303331390000000005006C00000070010000237E0000DC010000E001000023537472696E677300000000BC0300000800000023555300C4030000100000002347554944000000D4030000F400000023426C6F620000000000000002000001471502080900000000FA253300160000010000000A0000000200000002000000010000000B000000050000000100000002000000010000000200000000000A00010000000000060042003B00060049003B00060054003B0006005D003B000600B5009B000600E100CE001B00F50000000600240104010600440104010A0098017D010000000001000000000001000100010010001D00000005000100010050200000000096006C000A0001008C20000000008618910019000200000001009700290091001D003100910022004100910028004900910019005100910019000C00AD014C000C00BA0157002100C4015C002100D001610014009100660009009100190020002B002D002E001300C1002E000B0079002E001B00CA002E002300D3006C0045005000048000000000000000000000000000000000620100000400000000000000000000000100320000000000040000000000000000000000010071010000000000000000003C4D6F64756C653E0053716C436C724461746554696D652E646C6C0055736572446566696E656446756E6374696F6E73006D73636F726C69620053797374656D004F626A656374004E756C6C61626C656031004461746554696D65004461746554696D654F666673657400436F6E766572744461746554696D654F6666736574546F4C6F63616C4461746554696D65002E63746F720064746F0053797374656D2E52756E74696D652E56657273696F6E696E67005461726765744672616D65776F726B4174747269627574650053797374656D2E446961676E6F73746963730044656275676761626C6541747472696275746500446562756767696E674D6F6465730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300436F6D70696C6174696F6E52656C61786174696F6E734174747269627574650052756E74696D65436F6D7061746962696C6974794174747269627574650053716C436C724461746554696D650053797374656D2E44617461004D6963726F736F66742E53716C5365727665722E5365727665720053716C46756E6374696F6E417474726962757465006765745F48617356616C7565006765745F56616C756500546F4C6F63616C54696D65006765745F4461746554696D650000000000032000000000004BEC2CC7757F1E49B9400968748F57020008B77A5C561934E0890E000115110901110D15110901111103200001042001010E05200101111D0420010108170100010054020F497344657465726D696E69737469630006151109011111032000020615110901110D04200013000420001111042000110D0520010113000C070315110901110D111111114701001A2E4E45544672616D65776F726B2C56657273696F6E3D76342E300100540E144672616D65776F726B446973706C61794E616D65102E4E4554204672616D65776F726B20340801000200000000000801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010000000000004F68E45200000000020000001C010000782500007807000052534453DABD35B6971B914293249E1EC9335A4E01000000633A5C4465765C53716C436C724461746554696D655C53716C436C724461746554696D655C6F626A5C52656C656173655C53716C436C724461746554696D652E70646200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000BC2600000000000000000000DE260000002000000000000000000000000000000000000000000000D02600000000000000000000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF25002000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100100000001800008000000000000000000000000000000100010000003000008000000000000000000000000000000100000000004800000058400000640200000000000000000000640234000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000000000000000000000000000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B004C4010000010053007400720069006E006700460069006C00650049006E0066006F000000A001000001003000300030003000300034006200300000002C0002000100460069006C0065004400650073006300720069007000740069006F006E000000000020000000300008000100460069006C006500560065007200730069006F006E000000000030002E0030002E0030002E003000000048001300010049006E007400650072006E0061006C004E0061006D0065000000530071006C0043006C0072004400610074006500540069006D0065002E0064006C006C00000000002800020001004C006500670061006C0043006F0070007900720069006700680074000000200000005000130001004F0072006900670069006E0061006C00460069006C0065006E0061006D0065000000530071006C0043006C0072004400610074006500540069006D0065002E0064006C006C0000000000340008000100500072006F006400750063007400560065007200730069006F006E00000030002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000030002E0030002E0030002E0030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000F03600000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
WITH PERMISSION_SET = SAFE
GO
CREATE FUNCTION dbo.ConvertDateTimeOffsetToLocalDateTime(#dto datetimeoffset)
RETURNS datetime2
AS EXTERNAL NAME SqlClrDateTime.UserDefinedFunctions.ConvertDateTimeOffsetToLocalDateTime
GO
And here is how you can use it in your scenario:
DECLARE #dto datetimeoffset(4) = '12-10-25 12:32:10 +06:00'
DECLARE #dt datetime = dbo.ConvertDateTimeOffsetToLocalDateTime(#dto)
SELECT * FROM MyTable WHERE datetimecol < #dt

How to search my database by using an array of words?

I'm attempting to setup a search function from a string a user types. (ex: "John Doe" or "Doe, John")
I was thinking I would use Replace(SearchString, ",", "") to get rid of the commas the user might enter, and then use Split(SearchString, " ") to get all the words into an array. Once they're in the array I would execute a Stored Procedure on each of the terms and build a DataTable with the results.
Below is what I'm wanting to use for executing my stored procedure.
oCommand = DataAccess.GetSQLCommand("MyStoredProcedure", CommandType.StoredProcedure, SourceServer.ConnectionLocal)
oCommand.Parameters.AddWithValue("#MySearchString", SearchString)
oAdapter = New SqlDataAdapter(oCommand)
oAdapter.Fill(MyDataTable)
Now I'm thinking the "SearchString" I will assign while looping through my array of words... but this doesn't seem like the right way to do this. Maybe it is but I don't know how to append my next result to the previous DataTable either.
There are some great ideas for using arrays and Lists in SQL Server on this page - http://www.sommarskog.se/arrays-in-sql-2005.html
I personally find the XML method the most useful;
http://www.sommarskog.se/arrays-in-sql-2005.html#XML
An example of how I've used this in the past is;
DECLARE #indata nvarchar(max)
DECLARE #hDoc int
SET #indata = '
<ROOT>
<SearchTerm code="Test search term"></SearchTerm>
<SearchTerm code="Other search term"></SearchTerm>
<SearchTerm code="Next search term"></SearchTerm>
</ROOT>'
CREATE TABLE #searchTerm (
code varchar(40)
)
EXEC sp_xml_preparedocument #hDoc OUTPUT, #indata
INSERT Into #searchTerm
SELECT code
FROM OPENXML(#hDoc, '/ROOT/SearchTerm',1)
WITH (code varchar(50))
EXEC sp_xml_removedocument #hDoc
-- Use the data in #searchTerm as needed in your query
SELECT * FROM MyTable WHERE searchValue IN (SELECT code FROM #searchTerm)
DROP TABLE #searchTerm
Try passing in the comma separated values in as a single string of nvarchar. Then use the SELECT FROM WHERE IN structure within your stored procedure. Create the sql command wwithin your stored procedure by concatenation and then call EXEC #sql
Declare #sql =
'SELECT * FROM tbl
WHERE person IN(' + #Application + ')'
exec(#sql)
Beware, if your list of search criteria is large this may not be the best solution for you.
Note that the commas should be between the full names not the first name and last name.

Resources