U-SQL get filename of input and use for output - u-sql

I have a filename of test.csv and I want the output to be test.txt.
I can extract the filename of the input but don't know how to use it for the output?
OUTPUT @result TO "/output/{filename}.txt"
USING Outputters.Text(outputHeader:false, quoting:false);
The filename is in @result.
This feature isn't supported as of yet.
Does anyone have a work around?
U-SQL How can I get the current filename being processed to add to my extract output?
Ideally I would like dd-mm-yy-test.text?
How do I append the day month and year?
I am using USQL for this.
Thanks

Let me address both issues you're laying out in this question:
To use the same output name as the input, there would have to be a way to read rowset values into U-SQL variables, which I'm fairly sure cannot be done, given that the language is built around the need to process many files at once.
To append a date into the output you would only need to declare the current datetime at some point and then use it to write the output file name like this:
DECLARE @now DateTime = DateTime.Now;
OUTPUT @output TO "/tests/output/" + @now.ToString("dd-MM-yyyy") + "-output.csv" USING Outputters.Csv();

Related

How to replace comma with a dot in GTM for JSON structured data?

I am noob with structured data implementation and don't have any code knowledge.
I have been looking for a week how to solve a warning with price in Google structured data testing tool.
My prices are with a comma which is not accepted by Google.
By checking the http://schema.org/price it tells me that "Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator."
I have a CSS element #PdtPrixRef stored in a variable named "Product-price", which holds a price with a comma ("12,5"), but I can't find how to replace it in my structured data with the value "12.5". Can someone help me?
Here is my current script:
[Image: My actual GTM script]
Should I add something to my script, or create a variable (Custom JavaScript)?
I think it's something like
value.replace(",", ".")
But I don't know how to write the full function properly from beginning to end...
Yes, you can just create a Custom JavaScript Variable.
Here is the code:
function() {
  // Read the GTM variable and swap the decimal comma for a dot
  var price = {{Product-price}};
  return price.replace(",", ".");
}
Then use this variable in your JSON-LD script.

Is there a way to parse out information from a xcom_pull in Airflow?

I have a DAG that passes specific information between tasks, and everything is working as it should. The file needs to be stored in a reports/ folder for the following tasks to work correctly. I get the name of the report through an xcom_pull, but I also want to parse that value to capture the unique filename itself for use in later tasks. A later task inserts this filename into the csv file, and it needs to match the actual filename so it's a 1:1 match.
I want to parse out information of a xcom_pull option and I'm having issues doing so. The example I have is below:
report_filename = "reports/{}_{}".format('report_example', str(uuid.uuid1()))
get_report = GoogleCampaignManagerDownloadReportOperator(
    task_id="get_report",
    profile_id=1234,
    api_version=1234,
    bucket_name=test_bucket,
    report_name=report_filename,
    report_id=report_id,
    file_id=file_id,
)
report_filename_test = xcom_pull(get_report, 'report_name')
sanitize_report = SanitizeReportOperator(
    task_id='sanitize_report',
    dest_bucket=test_bucket,
    dest_object=report_filename_test,
    shared_object=str(report_filename_test).replace('reports/', ''),
    append_timestamp=True,
    append_filename=True
)
As of right now the xcom_pull pulls down the following:
reports/report_example_b3413b62-cc8a-11ec-bded-52e9ae62e477.csv.gz
However, I want to have another xcom_pull that will only pull the following:
report_example_b3413b62-cc8a-11ec-bded-52e9ae62e477.csv.gz
I have tried converting report_filename_test to a string and using the replace function, so for example:
new_test = str(report_filename_test).replace('reports/', '')
But when I try this, new_test either ends up as NULL or the replacement is ignored completely, and the file is later saved into the reports/ folder anyway.
I have also tried putting report_filename into a list and grabbing the first item, but because of how Airflow works from task to task, it creates a new filename with a different uuid each time, which is not what I want. I have also tried using a PythonOperator with a function that names the file so it can be called later in the DAG, but have not had any luck with this either.
Is there a way to do this where you can parse out the information from a xcom_pull or another way to make this work? The end goal is to essentially have a file name with a specific uuid that I can pass through into the csv file and rename the file to the same specific uuid that is being built without the folder name in front.
I'm just looking to have a unique filename be passed through multiple tasks that is the exact same each time with a uuid format. I'm running out of ideas of how to make this work and have been stuck on this for almost two weeks now.
Any help with this would be greatly appreciated!
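One possible direction, sketched under assumptions rather than taken from this thread: an XCom value only exists while a task is running, so module-level code such as str(report_filename_test).replace(...) runs when the DAG file is parsed, before any value has been pushed. Likewise, uuid.uuid1() at module level produces a new value on every parse. The helpers below are hypothetical names (report_basename, stable_report_name), and deriving the UUID from the run id with uuid5 is a swapped-in technique, not something the code above does. Whether the operators accept these values depends on their templated fields, which are not shown in the question.

```python
import posixpath
import uuid


def report_basename(object_path: str) -> str:
    """Return only the last path component, dropping a prefix such as 'reports/'."""
    return posixpath.basename(object_path)


def stable_report_name(run_id: str, prefix: str = "report_example") -> str:
    """Build a report path whose UUID is the same every time for a given run_id.

    uuid5 is deterministic: the same namespace and name always give the same
    UUID, so re-parsing the DAG file does not change the filename.
    """
    return "reports/{}_{}".format(prefix, uuid.uuid5(uuid.NAMESPACE_URL, run_id))
```

Inside a running task (for example in a PythonOperator callable), the pulled value could then be trimmed with report_basename(ti.xcom_pull(task_ids='get_report', key='report_name')), assuming the operator pushes the name under that key as the question suggests.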

UPDATE variable and OUTPUT TO filepath

I'm trying to ask a user to insert a filepath and then output the result to that filepath:
DEFINE VARIABLE outputPath AS CHARACTER FORMAT "x(50)".
UPDATE outputPath.
OUTPUT TO outputPath.
This doesn't seem to be working. But when I do for example:
OUTPUT TO "C:\temp\test.txt".
It seems to work.
To use the value of a variable in an OUTPUT statement:
OUTPUT TO VALUE( outputPath ).
VALUE is also used with INPUT FROM, INPUT THROUGH and INPUT-OUTPUT THROUGH.
(A "naked" variable name will be treated as a file name, no quotes needed -- a result of one of those "makes a good demo" decisions 30 years ago...)

U-SQL How can I get the current filename being processed to add to my extract output?

I need to add metadata about the row being processed. I need the filename to be added as a column. I looked at the ambulance demos in the Git repo, but can't figure out how to implement this.
You use two features of U-SQL called 'file sets' and 'virtual columns'. In my simple example, I have two files in my input directory. I use file sets and refer to the virtual columns in the EXTRACT statement, e.g.:
// Filesets, file set with virtual column
@q =
    EXTRACT rowId int,
            filename string,
            extension string
    FROM "/input/filesets example/{filename}.{extension}"
    USING Extractors.Tsv();
@output =
    SELECT filename,
           extension,
           COUNT( * ) AS records
    FROM @q
    GROUP BY filename,
             extension;
OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv();
My results:
Read more about both features here:
https://msdn.microsoft.com/en-us/library/azure/mt621320.aspx

Exporting SAS DataSet on to UNIX as a text file....with delimiter '~|~'

I'm trying to export a SAS data set on to UNIX folder as a text file with delimiter as '~|~'.
Here is the code I'm using....
PROC EXPORT DATA=Exp_TXT
OUTFILE="/fbrms01/dev/projects/tadis003/Export_txt_OF_New.txt"
DBMS=DLM REPLACE;
DELIMITER="~|~";
PUTNAMES=YES;
RUN;
Here is the output I'm getting on UNIX. Part of the delimiter is missing in the data rows, but the whole delimiter appears between the variable names:
Num~|~Name~|~Age
1~A~10
2~B~11
3~C~12
Any idea why I'm getting part of delimiter in the data only????
Thanks,
Sam.
My guess is that PROC EXPORT does not support using multiple character delimiters. Normally, column delimiters are just a single character. So, you will probably need to write your own code to do this.
PROC EXPORT for delimited files generates plain old SAS code that is then executed. You should see the code in the SAS log, from where you can grab it and alter it as needed.
Please see my answer to this other question for a SAS macro that might help you. You cannot use it exactly as written, but it should help you create a version that meets your needs.
The problem is referenced on the SAS manual page for the FILE statement
http://support.sas.com/documentation/cdl/en/lestmtsref/63323/HTML/default/viewer.htm#n15o12lpyoe4gfn1y1vcp6xs6966.htm
Restriction: Even though a character string or character variable is accepted, only the first character of the string or variable is used as the output delimiter. The FILE DLM= processing differs from INFILE DELIMITER= processing.
However, there is (as of some version, anyhow) a newer option, DLMSTR=. Unfortunately you can't use DLMSTR= in PROC EXPORT, but if you can't easily write the variables out by hand, you can generate the log from a PROC EXPORT, paste the generated code into your program, and change DELIMITER to DLMSTR. You could even do so dynamically: use PROC PRINTTO to write the log to a file, then read in that file, strip the line numbers and the non-code, change DELIMITER to DLMSTR, and %include the code.
Since you are using unix, why not make use of unix tools to fix this?
You can call the unix command from your sas program with the X statement:
http://support.sas.com/documentation/cdl/en/hostunx/61879/HTML/default/viewer.htm#xcomm.htm
After your export, use sed to fix the file:
PROC EXPORT DATA=Exp_TXT
OUTFILE="/fbrms01/dev/projects/tadis003/Export_txt_OF_New.txt"
DBMS=DLM REPLACE;
DELIMITER="~";
PUTNAMES=YES;
RUN;
X sed 's/~/~|~/g' /fbrms01/dev/projects/tadis003/Export_txt_OF_New.txt > /fbrms01/dev/projects/tadis003/Export_txt_OF_New_v2.txt ;
It might take tweaking depending on your unix, but this works on AIX. Some versions of sed can use the -i flag to edit in place so you don't have to type out the filename twice.
It is a much simpler and easier single-line solution than a big macro.
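If sed behaves differently on a given system, the same substitution could be sketched in Python. This is only an illustration of the idea above, not part of the original answer: the function name widen_delimiter and its parameters are hypothetical, and like the sed command it also rewrites any "~" that occurs inside the data values.

```python
def widen_delimiter(src_path, dst_path, old="~", new="~|~"):
    """Copy src_path to dst_path, replacing every `old` with `new`.

    Caveat: a `old` character inside a data value is replaced too,
    exactly as with the sed one-liner.
    """
    # newline="" keeps the original line endings untouched on read and write
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line.replace(old, new))
```

It would be called with the two paths used in the sed example, e.g. widen_delimiter(".../Export_txt_OF_New.txt", ".../Export_txt_OF_New_v2.txt"), after exporting with DELIMITER="~".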

Resources