TL;DR: Task Scheduler > Edit Action > Arguments: open Excel without the app-selection prompt, add a date-time to a cell, close Excel.
I'm new to Task Scheduler and want to add a date-time stamp to the Excel document it opens.
It can be very simple; the stamp just needs to go in the next cell/row, and then Excel should close.
There also seems to be a difference when an application is opened through Task Scheduler: it ALWAYS asks what to open the file with (yes, the default app for the extension is set in Settings and on the file).
So it seems that even with an argument, the task will still be stopped at that prompt? Is there a way around this as well? I have modified the permissions on the file, but that doesn't seem to help.
You can use a .txt file if that is easier.
Thank you, wonderful community!
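Since a .txt file is acceptable, here is a minimal sketch in Python. The question names no scripting language, so Python is an assumption, as is the default file name timestamps.txt. The script appends the current date-time as a new line and exits, so there is no application left to close afterwards.

In Task Scheduler the action could point "Program/script" at the Python interpreter and put the script path (and optionally the target file) under "Add arguments". Pass full paths, because the task's working directory may not be the script's folder. Because no document is opened through a file association, this may also avoid the "open with" prompt, though that is not verified here.

```python
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional


def append_timestamp(path: Path, now: Optional[datetime] = None) -> str:
    """Append one 'YYYY-MM-DD HH:MM:SS' line to path and return the stamp."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    # Mode "a" creates the file if needed and always writes at the end,
    # so each run lands on the next line.
    with path.open("a", encoding="utf-8") as f:
        f.write(stamp + "\n")
    return stamp


if __name__ == "__main__":
    # Optional first argument: target file; "timestamps.txt" is a hypothetical default.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("timestamps.txt")
    print(append_timestamp(target))
```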
Related
I am working with a Test Case in Tosca that takes its input from an Excel file. I'm using a Template Instance so that the data from the Excel file gets loaded and I can use it in my Test Case. I know I can re-instantiate the Instance by clicking the Reinstantiate button. However, I'm running the Test Case from an external source, so I can't go into Tosca and click the button every time I need to update the input (the Excel data is different on every run).
Is there a way to make Tosca automatically Reinstantiate the Template Instance every time I run it?
No, unfortunately there is NO way to do this automatically. It is not even possible to (re)instantiate several templates at once without considerable effort.
I am trying to solve the following problem with airflow:
I have a data pipeline where I want to run several processes on a large number of Excel documents (e.g. 5,000 Excel files a day). My idea for a DAG is below:
Task 1: take an Excel file and add a new sheet to it.
Task 2: convert the returned Excel file to a PDF.
Tasks 1 and 2 in the DAG would call a processing tool running outside Airflow via an API call (so the actual data processing isn't happening inside Airflow).
I seem to be going around in circles trying to figure out the best approach to this workflow. Some questions I keep coming back to are:
1. Should each DagRun handle one Excel file, or should each DagRun take in a batch of Excel files?
2. If taking in a batch (which I presume is the correct approach), what is the recommended batch size?
3. How would I pass the returned values from Task 1 to Task 2? Would it be an XCom dictionary with a reference to each newly saved Excel file? I read somewhere that the maximum size of an XCom should be 48 KB, so an XCom of 5,000 Excel file paths would probably be larger than that.
4. The last and trickiest question: I would obviously want to start processing Task 2 as soon as even one Excel file from Task 1 has completed, because I wouldn't want to wait for the entire batch of Task 1 to finish before starting Task 2. How can I run Task 2 multiple times within the same DagRun, once for each new result that Task 1 produces? Or should Task 2 be its own DAG?
Am I approaching this problem the right way? How should I be tackling this problem?
Assumptions
I made some assumptions since I don't know all the details of the Excel file processing:
You cannot merge the Excel files, since you need to keep them separate.
Excel files are accessible from Airflow DAG (same filesystem or similar).
If any of this is not true, please clarify.
Answers
That being said, I'll first answer your questions and then comment on some thoughts:
1. I think you can work in batches, since one run per file will be very slow, mostly because scheduler overhead adds delay between processing one Excel file and the next. With one file per run you're also not using all the available resources, so it is better to keep Airflow busier.
2. The batch size will depend on the processing load and the task design. From your question I assume you're thinking about handling the batch inside a single task. If the service that processes the Excel files handles parallelism well, I'd rather recommend one task per Excel file within a batch. Having 5,000 tasks in one run (one for each file) would be a bad idea, since that would be difficult to follow in the UI. The exact number of files per batch depends mostly on your resources and the service's SLA.
3. From experience, I recommend using one task for both steps. You can call the service in parallel and, right after the service completes for a file, directly convert that Excel file to PDF, so nothing large has to be passed between tasks.
4. This is solved by the answer to question 3: since each file is converted right after its own processing finishes, there is no need to wait for the whole batch.
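A minimal Python sketch of the "one task for both steps" idea. It assumes the external service is wrapped in two callables; the stand-ins in the demo are hypothetical, since the real API calls are not described in the question.

A thread pool processes several files at once. Each file goes straight from step 1 to step 2, so converting one file never waits on the rest of the batch. Threads fit here because the work is mostly waiting on the service's responses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def process_one(path: str,
                add_sheet: Callable[[str], str],
                to_pdf: Callable[[str], str]) -> str:
    """Run both steps for one file: add the sheet, then convert that result."""
    updated = add_sheet(path)  # step 1: returns the path of the updated workbook
    return to_pdf(updated)     # step 2 starts immediately for this file


def process_batch(paths: Iterable[str],
                  add_sheet: Callable[[str], str],
                  to_pdf: Callable[[str], str],
                  max_workers: int = 10) -> Dict[str, str]:
    """Process a batch in parallel and return {input path: PDF path}.

    An exception raised for any file is re-raised by fut.result().
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, p, add_sheet, to_pdf): p for p in paths}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for the external service calls.
    def fake_add_sheet(p: str) -> str:
        return p.replace(".xlsx", "_new.xlsx")

    def fake_to_pdf(p: str) -> str:
        return p.replace(".xlsx", ".pdf")

    print(process_batch(["a.xlsx", "b.xlsx"], fake_add_sheet, fake_to_pdf, max_workers=2))
```

In Airflow, a function like process_batch could be the body of a processing task. Set max_workers to whatever load the service can take.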
Solution overview
The solution I imagine is something like:
A first task that checks whether there are pending files. You can branch on the result using a BranchPythonOperator.
Then X parallel tasks, each processing an Excel file (calling the service) and converting it to PDF. Each could be a single PythonOperator task; if you use Airflow 2, the @task decorator simplifies the code. X could be, for example, anywhere from 10 to 100, depending on your resources and the service throughput.
A final task that triggers the DAG again to process more files. This could be implemented using a TriggerDagRunOperator.
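The pending-files check could look something like the sketch below. It rests on assumptions not stated in the answer:

- Input workbooks are *.xlsx files in one directory.
- A file counts as done once a PDF with the same stem exists in an output directory.
- The task ids "process_batch" and "nothing_to_do" are hypothetical.

A BranchPythonOperator callable returns the task_id of the branch to follow, and choose_branch shows that shape. Passing only the current batch (at most X paths) to the next task also keeps the XCom small, instead of carrying all 5,000 paths.

```python
from pathlib import Path
from typing import List


def pending_files(input_dir: Path, output_dir: Path, batch_size: int) -> List[Path]:
    """Return up to batch_size workbooks (sorted by path) that have no PDF yet."""
    done = {pdf.stem for pdf in output_dir.glob("*.pdf")}
    pending = sorted(p for p in input_dir.glob("*.xlsx") if p.stem not in done)
    return pending[:batch_size]


def choose_branch(pending: List[Path]) -> str:
    """Pick the downstream task id; both ids here are hypothetical."""
    return "process_batch" if pending else "nothing_to_do"
```

Sorting the paths gives a stable processing order. The final re-trigger step then simply starts another run until pending_files comes back empty.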
I am trying to determine how much time I have spent on a project, which has mainly been done in .R files. I know the file.info function will extract metadata for a file, but since I have opened it several times over several days, I don't know how to use that information to determine the total editing time. Is there a function to find this information, or a way to go through the file system to find it?
Just a thought: you could maintain a log file to which your R script writes the start time, the stop time and the R script file name.
You can add simple code to your script that does this. You would then need a separate script that analyses the logs and tells you how much time was spent running the script.
For a single user this would work.
Note: this catches script execution time and not the time spent on editing the files. The log would still have merit: you would have a record of when you were working on the script under the assumption that you run your scripts frequently when developing code.
How about using an old-fashioned time sheet for the purpose of recording development time? Tools such as JIRA are very suitable for that purpose.
For example at the start of the script:
# open = "a" appends, so earlier log entries are kept
logFile <- file("log.txt", open = "a")
writeLines(paste0("Scriptname start: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")), logFile)
close(logFile)
And at the end of the script:
# append the stop entry below the start entry
logFile <- file("log.txt", open = "a")
writeLines(paste0("Scriptname stop: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")), logFile)
close(logFile)
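The separate analysis script could be sketched in Python, chosen here for illustration; it could equally be written in R. It assumes each log line looks like "Scriptname start: 2024-01-01 10:00:00", as written by the snippets above.

The script pairs each start with the next stop for the same script name and sums the durations. Anything after the seconds, such as fractional seconds, is ignored. A start with no matching stop (for example, from a script that errored out) is not counted. As noted above, this measures run time rather than editing time.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable

# "<script name> start|stop: YYYY-MM-DD HH:MM:SS"; trailing text is ignored.
LINE = re.compile(
    r"^(?P<name>.+) (?P<event>start|stop): "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def total_time(lines: Iterable[str]) -> Dict[str, timedelta]:
    """Sum start-to-stop durations per script name."""
    totals: Dict[str, timedelta] = defaultdict(timedelta)
    open_starts: Dict[str, datetime] = {}
    for line in lines:
        m = LINE.match(line.strip())
        if m is None:
            continue  # blank or unrecognised line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        name = m["name"]
        if m["event"] == "start":
            open_starts[name] = ts  # an unmatched earlier start is overwritten
        elif name in open_starts:
            totals[name] += ts - open_starts.pop(name)
    return dict(totals)
```

Usage, assuming log.txt is in the working directory: with open("log.txt") as f: print(total_time(f)).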
I created a very small script (without saving it) in the R Commander (Rcmdr) top window, but I only saved the workspace.
When I reload the workspace, I can't see anything that was originally in the top window. My mistake, I know, but is there a way to get any hint from the workspace file of the functions etc. I may have called? I can see the objects, but not what created them.
If you open a new R session, try pressing the up-arrow key. The normally invisible .Rhistory file is usually loaded at the start of a new session if the prior session ended normally. If the session is open in a GUI, then you may be able to display the list of commands with a menu command. This may also display that file:
loadhistory(file = ".Rhistory")
The history is cumulative, so unless you had a really long session in between, you may still be able to recover code going back several sessions. I think it keeps the last 500 entries by default; actually, it turns out to be 512. See:
?history
I have an R script that creates multiple scripts and submits these simultaneously to a computer cluster, and after all of the multiple scripts have completed and the output has been written in the respective folders, I would like to automatically launch another R script that works on these outputs.
I haven't been able to figure out whether there is a way to do this in R: the function 'wait' is not what I want since the scripts are submitted as different jobs and each of them completes and writes its output file at different times, but I actually want to run the subsequent script after all of the outputs appear.
One way I thought of is to count the files that have been created and, if the correct number of output files is there, submit the next script. However, to do this I guess I would need to keep a script running that checks for the files every now and then. I am not sure that is a good idea, since it will probably take a day or more for the first scripts to complete.
Can you please help me find a solution?
Thank you very much for your help
-fra
I think you are looking at this the wrong way:
It's not an R problem at all; R just happens to be the client of your batch job.
This is an issue that the queue/batch processor on your cluster can address.
Worst case, you could just wait/sleep in a shell (or R script) until a 'final condition reached' file has been touched.
Inter-dependencies can also be expressed with make.
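The scheduler-native route is preferable where available; with SLURM, for instance, sbatch --dependency=afterok:<jobid> holds a job until the listed jobs succeed. If the scheduler cannot express the dependency, the worst-case polling approach can be sketched in Python as below.

The sketch assumes each job touches its own marker file only after its output is fully written, which avoids counting half-written outputs. The marker name done.flag and the script name combine_outputs.R are hypothetical. Sleeping between checks uses essentially no CPU, so leaving the watcher running for a day or more is not costly in itself.

```python
import glob
import subprocess
import time
from typing import List, Optional


def wait_for_outputs(pattern: str, expected: int,
                     poll_seconds: float = 300.0,
                     timeout_seconds: Optional[float] = None) -> bool:
    """Block until at least `expected` files match `pattern`.

    Returns True once they are present, or False if the timeout expires first.
    """
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while True:
        if len(glob.glob(pattern)) >= expected:
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)  # idle between checks


def run_when_complete(pattern: str, expected: int, next_cmd: List[str],
                      **wait_kwargs) -> bool:
    """Wait for the outputs, then run the follow-up command (e.g. an Rscript call)."""
    if not wait_for_outputs(pattern, expected, **wait_kwargs):
        return False
    subprocess.run(next_cmd, check=True)
    return True
```

For example: run_when_complete("results/*/done.flag", 50, ["Rscript", "combine_outputs.R"]). Here the pattern, the job count and the script name are placeholders.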