Splitting Columns in USQL - u-sql

I am new to U-SQL and I am having a hard time splitting one column off from the rest of my file. In my EXTRACT statement I declared 4 columns because my file is pipe-delimited into 4 fields. However, I want to remove one of the declared columns from the output. How do I do this?
The Json column is the one I want to split off: I want a new rowset that does not include it, i.e. only Date, Status and PriceNotification should end up in the result. This is what I have so far:
@input =
EXTRACT
Date string,
Condition string,
Price string,
Json string
FROM @in
USING Extractor.Cvs;
@result =
SELECT Json
FROM @input
OUTPUT @input
TO @out
USING Outputters.Cvs();

Maybe I have misunderstood your question, but you can simply list the columns you want in the SELECT statement, e.g.:
@input =
EXTRACT
Date string,
Status string,
PriceNotification string,
Json string
FROM @in
USING Extractors.Text(delimiter: '|');
@result =
SELECT Date, Status, PriceNotification
FROM @input;
OUTPUT @result
TO @out
USING Outputters.Csv();
NB: I have switched the variable in your OUTPUT statement to @result, so the projected rowset is written out instead of the original four-column one. If this does not answer your question, please post some sample data and expected results.
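For readers who think more easily in a general-purpose language, the same projection can be sketched in Python. This is only an illustration of the idea, not U-SQL: the sample rows and the helper name `drop_column` are made up, and the column names are taken from the answer above.

```python
import csv
import io

def drop_column(text, header, unwanted, delimiter="|"):
    """Parse delimited text and return every row without the `unwanted` column."""
    # Positions of the columns we want to keep, in their original order.
    keep = [i for i, name in enumerate(header) if name != unwanted]
    # QUOTE_NONE: treat quote characters as plain data (the Json field contains them).
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return [[row[i] for i in keep] for row in reader]

# Hypothetical sample data: four pipe-delimited fields, the last one holding JSON.
sample = '2017-01-01|Open|10.5|{"a": 1}\n2017-01-02|Closed|11.0|{"a": 2}\n'
columns = ["Date", "Status", "PriceNotification", "Json"]

result = drop_column(sample, columns, "Json")
for row in result:
    print(",".join(row))
```

As in the U-SQL version, the input is still read with all four columns; the unwanted one is simply left out of what gets written.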

Related

How can I List unique characters in a dictionary and store them as a set?

I am trying to list unique characters in a dictionary and store them as a set. The dictionary has the following fields
ID, Name, Description, Type, Price.
I need to list the unique categories in the "Type" field.
content=("C:\\Users\\jon.welsh\\Desktop\\ebyayproducts.json", "r")
for item in ebayproducts:
values = set([i['Type'] for i in content])
# and then I get this Error
> TypeError: string indices must be integers
Based on your example, you never open the file: content is just a tuple containing two strings. Iterating over it yields those strings, and indexing a string with "Type" raises the TypeError.
To open the file and parse the JSON, you can do:
import json
with open("C:\\Users\\jon.welsh\\Desktop\\ebyayproducts.json", "r") as f_in:
    content = json.load(f_in)
values = set(i["Type"] for i in content)
print(values)
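A self-contained variant that can be run without the file is sketched below. The product records are invented for illustration, and it assumes the JSON document is a list of objects that each have a "Type" key.

```python
import json

# Hypothetical stand-in for the file contents: a JSON array of product objects.
raw = """
[
  {"ID": 1, "Name": "Lamp",  "Description": "Desk lamp",  "Type": "Home", "Price": 20},
  {"ID": 2, "Name": "Robot", "Description": "Toy robot",  "Type": "Toys", "Price": 15},
  {"ID": 3, "Name": "Rug",   "Description": "Small rug",  "Type": "Home", "Price": 35}
]
"""

products = json.loads(raw)

# A set comprehension keeps each distinct "Type" value once.
unique_types = {item["Type"] for item in products}
print(sorted(unique_types))

# The original mistake: a tuple of two strings instead of parsed JSON.
not_parsed = ("ebyayproducts.json", "r")
try:
    {i["Type"] for i in not_parsed}
    failed = False
except TypeError:
    # Each element is a str, and a str cannot be indexed with "Type".
    failed = True
```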

How do I split this array of strings into table rows in U-SQL?

I was trying this snippet to split my json array.
activity =
//to extract json object required, only "activity" field is to be parsed and not "nameOfWebsite"
EXTRACT activities : string
FROM @input
USING Extract.Json(rowPath: "[]");
activities_arr =
SELECT
//splitting into array based on delimiter
new ARRAY<string>(activities.Split(',')) AS activities
FROM activity
;
activities_output =
SELECT activities
FROM activities_arr AS ac
CROSS APPLY EXPLODE(ac.activities) AS activities //to split above array into rows
;
Input is like this
[
{
"nameOfWebsite": "StackOverflow", // this object is not required
"activities": [
"Python",
"U-SQL",
"JavaScript"
]
}
]
Currently my output has 5 columns: the first contains a seemingly random string that is not in the input, the next 3 are blank, and the 5th contains Python, U-SQL and JavaScript in separate rows.
Questions:
Is there any way to avoid the other 4 columns? I only require the column with the activity names.
Why are there blank columns in my current output when my delimiter is defined as ','?
Current output ("blank" denotes an empty value, not the string "blank"):
AB#### "blank" "blank" "blank" Python
AB#### "blank" "blank" "blank" U-SQL
AB#### "blank" "blank" "blank" JavaScript
Output expected
Python
U-SQL
JavaScript

Extract the numeric value from string in Kusto

This is my datatable:
datatable(Id:dynamic)
[
dynamic([987654321][Just Kusto Things]),
]
and I've extracted 1 field from a json using
| project ID=parse_json(Data).["CustomValue"]
The result looks something like [987654321][Just Kusto Things]. I want to extract the numeric value (987654321) inside the first pair of square brackets. What is the best way to retrieve it: split, parse, or extract?
The datatable in the sample is not valid. If the values are just an array, you can get the result by indexing into the array, like this:
datatable(Id:dynamic)
[
dynamic([987654321,"Just Kusto Things"]),
]
| extend Id = Id[0]
If it is something else, please provide a valid datatable with an example that is representative of the real data.
> the result is something like - [987654321][Just Kusto Things]. I wanted to extract the numbered value(987654321) within the 1st square brackets. How to best retrieve that value?
You can use the parse operator, for example:
print input = '[987654321][Just Kusto Things]'
| parse input with '[' output:long ']' *
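The same extraction can be sketched outside Kusto with a regular expression. This Python snippet is only a rough analogue of the parse pattern above, not a description of how Kusto evaluates it; the function name `first_bracketed_number` is made up for illustration.

```python
import re

def first_bracketed_number(text):
    """Return the first run of digits enclosed in [...] as an int, or None."""
    match = re.search(r"\[(\d+)\]", text)
    return int(match.group(1)) if match else None

value = first_bracketed_number("[987654321][Just Kusto Things]")
print(value)
```

Returning None when nothing matches keeps the caller in control of how to treat rows that do not follow the expected shape.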

Getting error with basic trim() function in U-SQL script

I want to apply the .trim() function to a column, but I am getting an error.
Sample data:
Product_ID,Product_Name
1, Office Supplies
2,Personal Care
I have to do some data manipulation but can't get the basic trim() function right.
@productlog =
EXTRACT Product_ID string,
Prduct_Name string
FROM "/Staging/Products.csv"
USING Extractors.Csv();
@output = Select Product_ID, Product_Name.trim() from @productlog;
OUTPUT @output
TO "/Output/Products.csv"
USING Outputters.Csv();
Error:
Activity U-SQL1 failed: Error Id: E_CSC_USER_SYNTAXERROR, Error Message: syntax error. Expected one of: '.' ALL ANTISEMIJOIN ANY AS BEGIN BROADCASTLEFT BROADCASTRIGHT CROSS DISTINCT EXCEPT FULL FULLCROSS GROUP HASH HAVING INDEXLOOKUP INNER INTERSECT JOIN LEFT LOOP MERGE ON OPTION ORDER OUTER OUTER UNION PAIR PIVOT PRESORT PRODUCE READONLY REQUIRED RIGHT SAMPLE SEMIJOIN SERIAL TO UNIFORM UNION UNIVERSE UNPIVOT USING WHERE WITH ';' '(' ')' ',' .
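Try the code below. As far as I know, you need to alias computed columns such as trimmed fields. Note also that U-SQL keywords must be uppercase and that expressions use C# methods, which are case-sensitive (Trim(), not trim()).
@productlog =
EXTRACT Product_ID string,
Product_Name string
FROM "/Staging/Products.csv"
USING Extractors.Csv();
@output = SELECT Product_ID, Product_Name.Trim() AS Trimmed_Product_Name FROM @productlog;
OUTPUT @output
TO "/Output/Products.csv"
USING Outputters.Csv();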
I finally got it right; posting it in case someone else faces the same issue. U-SQL expressions follow C# syntax, so it can be a bit tricky for people like me who come from a pure SQL background.
Code:
@productlog =
EXTRACT Product_ID string,
Prduct_Name string
FROM "/Staging/Products.csv"
USING Extractors.Csv();
@output =
SELECT
T.Product_ID,
T.Prduct_Name.ToUpper().Trim() AS Prduct_Name
FROM @productlog AS T;
OUTPUT @output
TO "/Output/Products.csv"
USING Outputters.Csv();

Sorting a datatable by a column that contains part numbers and part letters

I have a datatable and I need to be able to sort either asc or desc by the jobcode column. Unfortunately the column field values contain both numbers and letters like this.
HD1233
HD12333
PG2839
TP9383
I need to extract the numbers, sort numerically and then put it back. So the above would look like this in the output.
HD1233
PG2839
TP9383
HD12333
I have a piece of code which does some sort of sort which is like this ...
Dim dtOut As DataTable = Nothing
dt.DefaultView.Sort = Convert.ToString("jobcode" & Convert.ToString(" ")) & drpAscorDesc.SelectedItem.Text
dtOut = dt.DefaultView.ToTable()
I'm just unable to make it sort properly while ignoring the letters. Any advice would be greatly appreciated.
Add another column of type Integer to your DataTable.
Iterate through all DataRows and fill the new column from the job code in each row (read the value, remove all non-numeric characters, then call Int32.TryParse()).
Sort by new column.
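The steps above can be sketched in Python to show the idea independently of DataTable. The helper names `numeric_part` and `sort_job_codes` are made up, and treating a code without any digits as 0 is an assumption of this sketch, not something stated in the answer.

```python
import re

def numeric_part(job_code):
    """Remove every non-digit character and parse what is left as an integer."""
    digits = re.sub(r"\D", "", job_code)
    # Assumption: a code with no digits at all is treated as 0.
    return int(digits) if digits else 0

def sort_job_codes(codes, descending=False):
    """Sort job codes by their numeric part only, ignoring the letters."""
    return sorted(codes, key=numeric_part, reverse=descending)

codes = ["HD1233", "HD12333", "PG2839", "TP9383"]
ascending = sort_job_codes(codes)
print(ascending)  # ['HD1233', 'PG2839', 'TP9383', 'HD12333']
```

Using the numeric part as a sort key plays the role of the extra Integer column: the original job codes are never modified, only the ordering changes.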
