Apache Airflow - dynamically generate number of BatchOperators (AWS) based on the number of files on AWS S3 - airflow

I have a workflow that generates a number of *.tif files and saves them on S3. Then there is a function that queries the contents of the directory on S3 and returns them as arrays. Based on this return value, a matching number of BatchOperators should be created dynamically in the DAG, and each particular array should be assigned to a BatchOperator as an env variable.
Example:
Return value of the function: [[a.tif, b.tif], [c.tif, d.tif], [e.tif]]
According to this, 3 BatchOperators should be created dynamically, with the arrays passed to them as env variables:
BatchOperator1 - env var [a.tif, b.tif]
BatchOperator2 - env var [c.tif, d.tif]
BatchOperator3 - env var [e.tif]

You will want to use the .partial() and .expand() methods on the BatchOperator task. Pass the constant arguments to partial(), and the elements to map over to expand(), like so:
list_to_print = ["Print", "Each", "Of", "These"]

def print_item(item):
    print(item)

# One mapped task instance is created per element of op_args.
task_print_list = PythonOperator.partial(
    task_id='print_list',
    python_callable=print_item,
).expand(op_args=[[item] for item in list_to_print])
This creates one mapped task per element in the list. In your case, you would expand over the output of the function that builds the arrays you mentioned. More documentation can be seen here: https://airflow.apache.org/docs/apache-airflow/2.3.0/concepts/dynamic-task-mapping.html
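For the BatchOperator case specifically, a rough sketch could look like the following. This is only a sketch under assumptions: it assumes the Amazon provider's BatchOperator, and the job_name, job_definition, job_queue values and the TIF_FILES env var name are made up. The parameter that carries container overrides is called overrides in older provider versions and container_overrides in newer ones, so check your installed version. list_tif_groups is a stand-in for the function from the question that lists the files on S3.

from airflow.decorators import dag, task
from airflow.providers.amazon.aws.operators.batch import BatchOperator
import pendulum

@dag(start_date=pendulum.datetime(2023, 1, 1), schedule_interval=None, catchup=False)
def process_tifs():

    @task
    def list_tif_groups():
        # Stand-in for the function that queries S3 and groups the *.tif files.
        return [["a.tif", "b.tif"], ["c.tif", "d.tif"], ["e.tif"]]

    @task
    def to_overrides(group):
        # Turn one group of files into a container override carrying an env variable.
        return {"environment": [{"name": "TIF_FILES", "value": ",".join(group)}]}

    # One BatchOperator task instance is mapped per group returned above.
    BatchOperator.partial(
        task_id="process_tifs_batch",
        job_name="process-tifs",        # assumed
        job_definition="my-job-def",    # assumed
        job_queue="my-job-queue",       # assumed
    ).expand(overrides=to_overrides.expand(group=list_tif_groups()))

process_tifs()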

Related

Is there a way to input multiple environment variables into a url for testing on Postman?

I am testing an endpoint in Postman using a URL like this: {{api_url}}/stackoverflow/help/{{customer_id}}/{{client_id}}.
I have the api_url, customer_id, and client_id stored in my environment variables. I would like to test multiple customer_id and client_id values without having to change the environment variables manually each time. I created a CSV to store a list of customer_id and one to store client_id. When I go to run the collection, it only allows me to add one file. Is there another way to do this if I want to iterate through my tests to automate them?
You can add both customer_id and client_id to one CSV file. Postman will iterate n times (n = number of CSV rows, excluding the header).
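For example, a single data file could look like this; the column headers must match the variable names used in the URL, and the values below are just placeholders:

customer_id,client_id
101,alpha
102,beta
103,gamma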
Alternatively, you can use postman.setNextRequest to control the flow. The code below re-runs the request with a different value from the arrays on each iteration.
URL:
{{api_url}}/stackoverflow/help/{{customer_id}}/{{client_id}}
Now add this pre-request script:
// Read the arrays of values for the variables
const tempArraycustomer_id = pm.variables.get("tempArraycustomer_id")
const tempArrayclient_id = pm.variables.get("tempArrayclient_id")

// Modify the arrays to the values you want
const arrcustomer_id = tempArraycustomer_id ? tempArraycustomer_id : ["value1", "value2", "value3"]
const arrclient_id = tempArrayclient_id ? tempArrayclient_id : ["value1", "value2", "value3"]

// Assign each variable the next value from its array and send the request until all values are used
pm.variables.set("customer_id", arrcustomer_id.pop())
pm.variables.set("client_id", arrclient_id.pop())
pm.variables.set("tempArraycustomer_id", arrcustomer_id)
pm.variables.set("tempArrayclient_id", arrclient_id)

// End the iteration when no more elements are left
if (arrcustomer_id.length !== 0) {
    postman.setNextRequest(pm.info.requestName)
}

DoGet with multiple parameters not being recognized

I'm currently trying to connect a Lua script with a Google Apps Script web app. The connection is working, but due to my lack of knowledge of Apps Script I'm not sure why it isn't saving my data correctly.
On the Lua side I'm just passing in a hard-coded random name and a simple numerical userid.
local HttpService = game:GetService("HttpService")
local scriptID = scriptlink
local WebApp

local function updateSpreadSheet()
    local playerData = (scriptID .. "?userid=123&name:Jhon Smith")
    WebApp = HttpService:GetAsync(playerData)
end

do
    updateSpreadSheet()
end
On the Google Script side I'm only saving the data on the last row and then adding the values of the userid and the name.
function doGet(e) {
  console.log(e)
  // console.log(f)
  callName(e.parameter.userid, e.parameter.name);
}

function callName(userid, name) {
  // Get the last row and add the name provided
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.getRange(sheet.getLastRow() + 1, 1).setValues([userid], [name]);
}
However, the only data the script is saving is the name, bypassing the userid for reasons I have yet to discover.
setValues() requires a 2D array, and the range dimensions must correspond to that array. The script is only getting a 1 x 1 range, and the setValues() argument is not a 2D array. Fix the syntax, or use appendRow():
sheet.getRange(sheet.getLastRow() + 1, 1, 1, 2).setValues([[userid, name]]);
// or
sheet.appendRow([userid, name])
References:
appendRow

How do I get a value from a dictionary when the key is a value in another dictionary in Lua?

I am writing some code where I have multiple dictionaries for my data. The reason being: I have multiple core objects and multiple smaller assets, and the user must be able to choose a smaller asset and have some function elsewhere run the code with the parent noted.
An example of one of the dictionaries: (I'm working in ROBLOX Lua 5.1 but the syntax for the problem should be identical)
local data = {
    character = workspace.Stores.NPCs.Thom,
    name = "Thom",
    npcId = 9,
    npcDialog = workspace.Stores.NPCs.Thom.Dialog
}

local items = {
    item1 = {
        model = workspace.Stores.Items.Item1.Main,
        npcName = "Thom",
    }
}
This is my function:
local function function1(item)
    if not items[item] and data[items[item[npcName]]] then return false end
end
As you can see, I try to index the dictionary using a key from another dictionary. Usually this is no problem.
local thisIsAVariable = item[item1[npcName]]
but the method I use above tries to index the data dictionary for data that is in the items dictionary.
Without a ton of local variables and clutter, is there a way to do this? I had an idea to wrap the conflicting dictionary reference in a tostring() function to separate them - would that work?
Thank you.
As I see it, your issue is that:
data[items[item[npcName]]]
is looking for data["Thom"], but you do not have such a key in the data table. You have a "name" key whose value is "Thom". You could reverse the name key and value in the data table, i.e. use "Thom" as the key: ["Thom"] = name.

Frama-C-Plugin: Set value of variable in plugin

I am writing a Frama-C Plugin.
I want to develop a plugin that sets the value of a local variable. The idea is that I can then run the value analysis and afterwards use my second plugin to analyze reachability, paths, and other things.
Is it possible to set the value of a local variable within a plugin (at the start of a function where I know the name)?
EDIT
I have now found out how to create new local variables, how to get the varinfo of a variable, and how to create new varinfos. The only missing piece is setting the variable's value.
I started with a code like this:
match kf_cil_fun with
| Cil_types.Definition (a, b) ->
    let val_visitor = new value_set_visitor kf in
    Visitor.visitFramacFileSameGlobals
      (val_visitor :> Visitor.frama_c_visitor) (Ast.get ());
    let old_varinfo = Cil.makeLocalVar a "x" Cil.intType in
    let new_varinfo = Cil.makeVarinfo false false "x" Cil.intType in
    val_visitor#doStuff old_varinfo new_varinfo;
    ()
| _ -> ()
where the visitor is a simple visitor with a doStuff method, and the built-in methods vfile, vglob_aux and vstmt_aux simply return Cil.DoChildren:
method doStuff old_varinfo new_varinfo =
    Cil.set_varinfo self#behavior old_varinfo new_varinfo;
Does anyone have an idea of how to set the value of x to 1 (or another fixed value)? Am I doing things right?

How to modify the avro key/value schema in a RDD map transformation

I'm trying to migrate some Hadoop MapReduce code to Spark, and I have doubts about how to manage map and reduce transformations when the schema of either the key or the value changes from input to output.
I have Avro files with Indicator records that I want to process somehow. I already have this code, which works:
val myAvroJob = new Job()
myAvroJob.setInputFormatClass(classOf[AvroKeyInputFormat[Indicator]])
myAvroJob.setOutputFormatClass(classOf[AvroKeyOutputFormat[Indicator]])
myAvroJob.setOutputValueClass(classOf[NullWritable])
AvroJob.setInputValueSchema(myAvroJob, Schema.create(Schema.Type.NULL))
AvroJob.setInputKeySchema(myAvroJob, Indicator.SCHEMA$)
AvroJob.setOutputKeySchema(myAvroJob, Indicator.SCHEMA$)

val indicatorsRdd = sc.newAPIHadoopRDD(
  myAvroJob.getConfiguration,
  classOf[AvroKeyInputFormat[Indicator]],
  classOf[AvroKey[Indicator]],
  classOf[NullWritable])

val myRecordOnlyRdd = indicatorsRdd.map(x => (doSomethingWith(x._1), NullWritable.get))

val indicatorPairRDD = new PairRDDFunctions(myRecordOnlyRdd)
indicatorPairRDD.saveAsNewAPIHadoopDataset(myAvroJob.getConfiguration)
But this code only works because the schema of the input and output keys does not change; it is always Indicator. In Hadoop MapReduce you can define map or reduce functions and change the schema from input to output. In fact, I have map functions that process every Indicator record and generate a new SoporteCartera record. How can I do this in Spark? Is it possible within the same RDD, or do I have to define two different RDDs and pass from one to the other somehow?
Thanks for your help.
To answer my own question: the problem was that you cannot change the type of an RDD; you must define a new RDD. I solved it with the code below:
val myAvroJob = new Job()
myAvroJob.setInputFormatClass(classOf[AvroKeyInputFormat[SoporteCartera]])
myAvroJob.setOutputFormatClass(classOf[AvroKeyOutputFormat[Indicator]])
myAvroJob.setOutputValueClass(classOf[NullWritable])
AvroJob.setInputValueSchema(myAvroJob, Schema.create(Schema.Type.NULL))
AvroJob.setInputKeySchema(myAvroJob, SoporteCartera.SCHEMA$)
AvroJob.setOutputKeySchema(myAvroJob, Indicator.SCHEMA$)

val soporteCarteraRdd = sc.newAPIHadoopRDD(
  myAvroJob.getConfiguration,
  classOf[AvroKeyInputFormat[SoporteCartera]],
  classOf[AvroKey[SoporteCartera]],
  classOf[NullWritable])

val indicatorsRdd = soporteCarteraRdd.map(x => (fromSoporteCarteraToIndicator(x._1), NullWritable.get))

val indicatorPairRDD = new PairRDDFunctions(indicatorsRdd)
indicatorPairRDD.saveAsNewAPIHadoopDataset(myAvroJob.getConfiguration)
