I am getting data from a single column in a datatable. I need it combined into a single string separated by a comma or any other delimiter.
The end result should be a string instead of the tabular data.
let words = datatable(word:string, code:string) [
"apple","A",
"orange","B",
"grapes","C"
];
words | project word;
I need the result combined into a string with a delimiter.
The result should be: "apple,orange,grapes"
try combining strcat_array() with summarize make_list() as follows:
let words = datatable(word:string, code:string) [
"apple","A",
"orange","B",
"grapes","C"
];
words
| summarize result = strcat_array(make_list(word), ",")
this returns a table with a single row and a single string column, whose value is: apple,orange,grapes
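if you then need the value as a scalar rather than as a one-row table (for example, to reuse it in a follow-up query), you can wrap the same expression in toscalar(). A minimal sketch using the same table:
let words = datatable(word:string, code:string) [
    "apple","A",
    "orange","B",
    "grapes","C"
];
print result = toscalar(words | summarize strcat_array(make_list(word), ","))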
The dataset (table) I'm querying has a column containing a JSON string array.
I have a fixed list of verbs which I need to check against each entry in the table, to find those where at least one of the items in the JSON list starts with one of the verbs from the fixed list.
// Verbs to look for (actual list is longer).
let verbs = datatable (verb : string) [
"discover",
"gain"
];
// Data. Second column is a JSON string.
let data = datatable(id : int, json: string) [
1, "[\"Discover me\", \"some text\"]",
2, "[\"All good\", \"no invalid verbs\"]",
3, "[\"first element fine\", \"gain power isn't ok\"]",
];
// Query: I need to know if at least one of the items in the "json" column starts
// with one of the verbs of the "verbs" list.
data
| extend parsedJson = parse_json(json)
| extend OneOrMoreListItemsHaveVerb = false
| project id, OneOrMoreListItemsHaveVerb
I tried to use mv_apply() but failed because I'm dealing with two lists/arrays compared against each other, not one array and one item.
For the example data above, I expect the items with IDs 1 and 3 to be returned. The first element of item 1 starts with "discover" and the 2nd element of item 3 starts with "gain".
you could create an array from your input table (e.g. using summarize make_set()), then loop over it using mv-apply for each of the inputs.
for example:
let verbs = datatable (verb: string) [
"discover", "gain"
]
;
let verbs_list = toscalar(verbs | summarize make_set(verb))
;
let data = datatable(id: int, json: string) [
1, "[\"Discover me\", \"some text\"]",
2, "[\"All good\", \"no invalid verbs\"]",
3, "[\"first element fine\", \"gain power isn't ok\"]",
]
;
data
| mv-apply verb = verbs_list on (
    mv-apply input = parse_json(json) on (
        where input startswith verb
    )
)
| project ['id'], json
id | json
1  | ["Discover me", "some text"]
3  | ["first element fine", "gain power isn't ok"]
alternatively, you can implement similar logic using the partition operator:
let verbs = datatable (verb: string) [
"discover", "gain"
]
;
let data = datatable(id: int, json: string) [
1, "[\"Discover me\", \"some text\"]",
2, "[\"All good\", \"no invalid verbs\"]",
3, "[\"first element fine\", \"gain power isn't ok\"]",
]
;
verbs
| partition by verb
{
    data
    | mv-apply input = parse_json(json) on (
        where input startswith verb
    )
    | project ['id'], json
}
id | json
1  | ["Discover me", "some text"]
3  | ["first element fine", "gain power isn't ok"]
I need to read a 10 GB fixed-width file into a data frame. How can I do it using Spark in R?
Suppose my text data is the following:
text <- c("0001BRAjonh ",
"0002USAmarina ",
"0003GBPcharles")
I want the first 4 characters to be associated with the column "ID" of a data frame; characters 5-7 would be associated with a column "Country"; and characters 8-14 with a column "Name".
I would use the function read.fwf if the dataset were small, but that is not the case.
I can read the file as a text file using the sparklyr::spark_read_text function, but I don't know how to assign the values of the file to a data frame properly.
EDIT: Forgot to say substring starts at 1 and array starts at 0, because reasons.
Going through and adding the code I talked about in the comment above.
The process is dynamic and is based on a Hive table called Input_Table. The table has 5 columns: Table_Name, Column_Name, Column_Ordinal_Position, Column_Start_Position, and Column_Length. It is external, so any user can change, drop, or remove any file in the folder location. I quickly rebuilt this from scratch so as not to use the actual code; does everything make sense?
// Load the input DataFrame and the Hive table. For the Hive table we make sure to take only the correct columns, in the correct order.
val inputDF = spark.read.format(recordFormat).option("header","false").load(folderLocation + "/" + tableName + "." + tableFormat).toDF("Odd_Long_Name")
val inputSchemaDF = spark.sql("select * from Input_Table where Table_Name = '" + tableName + "'").sort($"Column_Ordinal_Position")
// Build the arrays from the columns; rdd -> map -> collect turns a DataFrame column into an array of strings. In this format I can iterate through the column.
val columnNameArray = inputSchemaDF.selectExpr("Column_Name").rdd.map(x=>x.mkString).collect
val columnStartArray = inputSchemaDF.selectExpr("Column_Start_Position").rdd.map(x=>x.mkString).collect
val columnLengthArray = inputSchemaDF.selectExpr("Column_Length").rdd.map(x=>x.mkString).collect
// Create the iterator as well as the other variables that are meant to be overwritten
var columnAllocationIterator = 1
var localCommand = ""
var commandArray = Array("")
// Loop as many times as there are columns in the input table
while (columnAllocationIterator <= columnNameArray.length) {
// Overwrite the string command with the new command; thought "Odd_Long_Name" was too accurate not to put into the code
localCommand = "substring(Odd_Long_Name, " + columnStartArray(columnAllocationIterator-1) + ", " + columnLengthArray(columnAllocationIterator-1) + ") as " + columnNameArray(columnAllocationIterator-1)
// The first time through, overwrite the command array; otherwise just append to it
if (columnAllocationIterator==1) {
commandArray = Array(localCommand)
} else {
commandArray = commandArray ++ Array(localCommand)
}
// I really like iterating my iterators like this
columnAllocationIterator = columnAllocationIterator + 1
}
// Run all elements of the string array independently against the table
val finalDF = inputDF.selectExpr(commandArray:_*)
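Since the question itself asks about Spark in R, here is a minimal sparklyr sketch of the same positional slicing (assumptions: the file path is a placeholder, spark_read_text exposes each line in a column named line, and dplyr's substr() is translated by dbplyr into the equivalent Spark SQL substring expression):
library(sparklyr)
library(dplyr)

sc  <- spark_connect(master = "local")

# One row per line of the fixed-width file, in a single string column "line".
raw <- spark_read_text(sc, name = "fixed_width", path = "path/to/file.txt")

# Slice each line by character position (R-style start/stop positions).
df <- raw %>%
  mutate(
    ID      = substr(line, 1, 4),
    Country = substr(line, 5, 7),
    Name    = substr(line, 8, 14)
  ) %>%
  select(ID, Country, Name)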
I need to extract data from a file, see code below.
@rows =
    EXTRACT booking_date string,
            route string,
            channel string,
            pos string,
            venta string,
            flight_date string,
            ancillary string,
            paxs int?
    FROM "/ventas/ventas1.csv"
    USING Extractors.Text(delimiter:';', silent:true);
@output =
    SELECT booking_date,
           channel,
           Convert.ToDouble(venta.Replace(",", ".")) AS venta,
           paxs
    FROM @rows;
My problem is that the numbers are in Spanish format, meaning "100,234" instead of "100.234".
Does anyone know how to change the format in Extractors.Text, or how to transform strings into integers in U-SQL?
Import the column as a string and replace the full stops, e.g. something like this:
@input =
    EXTRACT venta string,
            paxs int
    FROM @inputFile
    USING Extractors.Text(delimiter : ';');
@output =
    SELECT Convert.ToInt32(venta.Replace(".", "")) AS venta
    FROM @input;
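If the values are really decimals written with a Spanish decimal comma (as in the question), another option is to parse the string with an explicit culture instead of doing string replacements. A sketch only, assuming the es-ES culture matches how the data was produced and that the inline constructor call is accepted in the expression (otherwise move the parsing into a code-behind helper), reusing the @input rowset from above:
@output =
    SELECT double.Parse(venta,
                        System.Globalization.NumberStyles.Any,
                        new System.Globalization.CultureInfo("es-ES")) AS venta,
           paxs
    FROM @input;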
In a U-SQL script I must extract a value from a file into a rowset and then use it to form the name of the output file. How can I get that value out of the rowset?
In details.
I have 2 input files: a CSV file with a set of fields, and a dictionary file. The 1st file has a file name like ****ClientCode****.csv. The 2nd file, the dictionary, has 2 fields with the mapping ClientCode - ClientCode2. My task is to extract the ClientCode value from the file name, get ClientCode2 from the dictionary, insert it as a field into the output file (implemented), and, moreover, form the name of the output file as ****ClientCode2****.csv.
Dictionary csv file has the content:
OldCode NewCode
6HAA Alfa
CCVV Beta
CVXX gamma
? Davis
The question is: how do I get ClientCode2 into a scalar variable so I can use it in the expression for the output file name?
DECLARE @inputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/ESP_FAD_GL_6HAA_20170930.txt"; // '6HAA' is the ClientCode here, which is mapped to another code in ClientCode_KVP.csv
DECLARE @outputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/ClientCode_sftp_" + // 'ClientCode' should be replaced with the ClientCode from the mapping in ClientCode_KVP.csv
    DateTime.Now.ToString("yyyyMMdd") + "_" +
    DateTime.Now.ToString("HHmmss") + ".csv";
DECLARE @dictionaryFile string = "D:/DFS_SSC_Automation/ClientCode_KVP.csv";
@dict =
    EXTRACT [OldCode] string,
            [NewCode] string
    FROM @dictionaryFile
    USING Extractors.Text(skipFirstNRows : 1, delimiter : ',');
@theCode =
    SELECT Path.GetFileNameWithoutExtension(@inputFile).IndexOf([OldCode]) >= 0 ? 1 : 3 AS [CodeExists],
           [NewCode]
    FROM @dict
    UNION
    SELECT *
    FROM (
        VALUES
        (
            2,
            ""
        )) AS t([CodeExists], [NewCode]);
@code =
    SELECT [NewCode]
    FROM @theCode
    ORDER BY [CodeExists]
    FETCH 1 ROWS;
@GLdata =
    EXTRACT [ASAT] string,
            [ASOF] string,
            [BASIS_INDICATOR] string,
            [CALENDAR_DATE] string,
            [CR_EOP_AMOUNT] string,
            [DR_EOP_AMOUNT] string,
            [FUND_ID] string,
            [GL_ACCT_TYPE_IND] string,
            [TRANS_CLIENT_FUND_NUM] string
    FROM @inputFile
    USING Extractors.Text(delimiter : '|', skipFirstNRows : 1);
// Prepare output dataset
@FundInfoGL =
    SELECT "" AS [AccountPeriodEnd],
           "" AS [ClientCode],
           [FUND_ID] AS [FundCode],
           SUM(GL_ACCT_TYPE_IND == "A" ? System.Convert.ToDecimal(DR_EOP_AMOUNT) : 0) AS [NetValueOtherAssets],
           SUM(GL_ACCT_TYPE_IND == "L" ? System.Convert.ToDecimal(CR_EOP_AMOUNT) : 0) AS [NetValueOtherLiabilities],
           0.0000 AS [NetAssetsOfSeries]
    FROM @GLdata
    GROUP BY FUND_ID;
// NetAssetsOfSeries calculation
@FundInfoGLOut =
    SELECT [AccountPeriodEnd],
           [NewCode] AS [ClientCode],
           [FundCode],
           Convert.ToString([NetValueOtherAssets]) AS [NetValueOtherAssets],
           Convert.ToString([NetValueOtherLiabilities]) AS [NetValueOtherLiabilities],
           Convert.ToString([NetValueOtherAssets] - [NetValueOtherLiabilities]) AS [NetAssetsOfSeries]
    FROM @FundInfoGL
    CROSS JOIN @code;
// Output
OUTPUT @FundInfoGLOut
TO @outputFile
USING Outputters.Text(outputHeader : true, delimiter : '|', quoting : false);
As David points out: You cannot assign query results to scalar variables.
However, we have a dynamic partitioned output feature in private preview right now that will give you the ability to generate file names based on column values. Please contact me if you want to try it out.
You can't. Please see Convert Rowset variables to scalar value.
You may still be able to achieve your ultimate goal in a different manner. Please consider re-writing your post with clear & concise language, small dataset, expected output, and a very minimal amount of code needed to repro - remove all details and nuances that aren't necessary to create a test case.
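One workaround, sketched under the assumption that whatever submits the job (PowerShell, Data Factory, etc.) can look up ClientCode2 from the dictionary itself: declare the output path as an external parameter, so the caller can inject the resolved name at submission time while the script keeps a placeholder default.
// "placeholder.csv" is only a default; the submitting tool can override an
// EXTERNAL variable with the resolved "<ClientCode2>_sftp_..." path.
DECLARE EXTERNAL @outputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/placeholder.csv";

OUTPUT @FundInfoGLOut
TO @outputFile
USING Outputters.Text(outputHeader : true, delimiter : '|', quoting : false);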
I am looking for a method to return the results of an SQLite query as a single string for use in an internal trigger.
Something like Python's 'somestring'.join() method.
Table: foo
id | name
1 | "foo"
2 | "bar"
3 | "bro"
Then a select statement:
MAGIC_STRING_CONCAT_FUNCTION(SELECT id FROM foo,",");
To return
"1,2,3"
You're looking for the group_concat function:
SELECT group_concat(id, ',') FROM foo;
The following is the description of the function group_concat from the documentation:
group_concat(X) group_concat(X,Y)
The group_concat() function returns a string which is the
concatenation of all non-NULL values of X. If parameter Y is present
then it is used as the separator between instances of X. A comma (",")
is used as the separator if Y is omitted. The order of the
concatenated elements is arbitrary.
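Because the documented order of the concatenated elements is arbitrary, a common trick is to order the rows in a subquery first; a sketch (SQLite still does not formally guarantee the order, but in practice it follows the subquery):
SELECT group_concat(id, ',') FROM (SELECT id FROM foo ORDER BY id);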