I've trawled the web for the last two hours trying to figure out how to do this, so I'm asking here. I've read the documentation, but I can't find a good example, and I think I may be implementing some sort of anti-pattern or have missed something very basic - but here goes.
I want to loop over an array and run this task on each iteration:
grunt.task.run('string-replace:dist');
However, every time around the loop I need a slightly different config: a different output folder, a different replacement array, and so on.
I have been changing the config like this:
grunt.config.set('string-replace.dist.options.replacements', translationsArray);
grunt.config.set('string-replace.dist.files.0.dest', 'dist/' + langCode + '/');
Here comes the problem: Grunt does not seem to execute a task, wait for it to finish, and then continue with your code. So the loop queues the task on each iteration but does not run it, and it changes the config on each iteration, overwriting the previous config every time. Then it runs all the queued tasks with the last config. I just want Grunt to execute the task, wait till it's done, change the config, rinse and repeat.
Am I doing something really stupid? The Grunt documentation seems to be desperately telling me it's synchronous, but this does not seem to be the case with grunt.task.run. I would also like to point out that I don't want to just register more tasks with the different options, because I don't know how many times I might have to run this loop. It is controlled from elsewhere, so it can be dynamic and data driven. Unless I'm supposed to register tasks dynamically?
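To make that last point concrete, this is roughly the kind of dynamic registration I mean - just a sketch, where languages stands in for the data-driven list I get from elsewhere, each entry with a made-up code and replacements field:
grunt.registerTask('replace-all-languages', function () {
  languages.forEach(function (lang) {
    // Register a tiny config task per language so the config change is
    // queued immediately before its own string-replace run, instead of
    // all the changes happening before any queued task executes.
    var configTask = 'configure-replace-' + lang.code;
    grunt.registerTask(configTask, function () {
      grunt.config.set('string-replace.dist.options.replacements', lang.replacements);
      grunt.config.set('string-replace.dist.files.0.dest', 'dist/' + lang.code + '/');
    });
    grunt.task.run([configTask, 'string-replace:dist']);
  });
});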
I'm producing an .exe file via PyInstaller, and I wonder whether there is a specific option or similar to speed things up while iterating on changes. For example, if I add a single print statement somewhere in the code, do I always have to go through the whole build of the .exe file again?
I'm asking because it takes a while just to try something out. In my case it takes between 5-10 min to create the .exe. In general that's OK, but when you do it a dozen or more times a day it adds up.
I am using AzureML pipelines, where the interface between pipeline steps is through a folder or file.
When I am passing data into the pipeline, I point directly to a single file. No problem at all. Very useful when passing in configuration files which all live in the same folder on my local computer.
However, when passing data between different steps of the pipeline, I can't provide the next step with a file path. All the steps get is a path to some folder that they can write to. Then that same path is passed to the next step.
The problem comes when the following step is then supposed to load something from the folder.
Which filename is it supposed to try to load?
Approaches I've considered:
Use a standardized filename for everything. The problem is that I want to be able to run the steps locally too, independent of any pipeline, and this makes for a very poor UX in that use case.
Check whether the path points to a file; if it doesn't, look at the files in the folder. If there is exactly one file, use it; otherwise throw an exception (a rough sketch of this is at the end of the question). This is maybe the most elegant solution from a UX perspective, but it sounds overengineered to me. We also don't structurally share any code between the steps at the moment, so either we will have repetition or we will need to find some way to share code, which is non-trivial.
Allow custom filenames to be passed in optionally, and otherwise use a standard filename. This helps with the UX, but often the filenames are supposed to be defined by the configuration files being passed in, so while we could do some bash scripting to get the filename into the command, it feels like a sub-par solution.
Ultimately it feels like none of the solutions I have come up with are any good.
It feels like we are making things more difficult for ourselves in the future if we assume some default filename. E.g. we work with multiple file types, so a default name would need to omit the extension.
But any way to do it without default filenames would also cause maintenance headaches down the line, or incur substantial upfront cost.
The question is: am I missing something? Any potential traps, better solutions, etc. would be appreciated. It definitely feels like I am somewhat under- and/or overthinking this.
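For reference, this is roughly what the second approach would look like - just a sketch, and resolve_input_path is only an illustrative name:
from pathlib import Path

def resolve_input_path(path: str) -> Path:
    # If the path is a file, use it directly; if it is a folder containing
    # exactly one file, use that file; otherwise fail loudly.
    p = Path(path)
    if p.is_file():
        return p
    files = [f for f in p.iterdir() if f.is_file()]
    if len(files) == 1:
        return files[0]
    raise ValueError(f"Expected a file or a folder with exactly one file, got: {path}")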
We have been using Flyway for years to maintain our DB scripts, and it does a wonderful job.
However there is one single situation where I am not really happy - possibly someone out there has a solution:
In order to reduce the number of scripts required (and also to keep an overview of "where" our procedures are defined), I'd like to implement our functions/procedures in one script. Every time a procedure changes (or a new one is developed), this script gets updated - repeatable scripts sound perfect for this purpose, but unfortunately they are not.
The drawback is that a new procedure cannot be accessed by versioned (non-repeatable) scripts: repeatable scripts are executed last, so the procedure does not exist yet when the versioned script executes.
I hoped I could control this by specifying different locations (e.g. loc_first containing the repeatables I want executed first, and loc_normal for the standard scripts and the repeatables to be executed last).
Unfortunately the order of locations has no impact on execution order ;-(
What's the proper way to deal with this situation? Right now I need to specify the corresponding procedures in non-repeatable scripts, but that's exactly what I'd like to avoid ....
I found a workaround on my own: I'm using Flyway directly with Maven (the same would work if you use the API, of course). Each stage of my Maven build has its own profile (specifying the URL etc.).
Now I create two profiles for every stage - so I have e.g. dev and devProcs.
The difference between these two Maven profiles is that the "[stage]Procs" profile operates on a different location (where only the repeatable scripts maintaining the procedures are kept). Then I need to execute Flyway twice - first with [stage]Procs, then with [stage].
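For illustration, the two runs then look roughly like this (the profile names are just my examples):
mvn -PdevProcs flyway:migrate    # location with only the repeatable procedure scripts
mvn -Pdev flyway:migrate         # the normal migrations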
To me this looks a bit messy, but at least I can maintain my procedures in a repeatable script this way.
According to the Flyway docs, repeatable migrations ALWAYS execute after versioned migrations.
But I guess you can use Flyway callbacks. It looks like the beforeMigrate.sql callback is exactly what you need.
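A callback file named beforeMigrate.sql placed in one of the configured locations runs at the start of every migrate, before the versioned scripts, so procedures maintained there already exist when the versioned scripts run. A minimal, PostgreSQL-flavoured sketch (refresh_totals is just a placeholder; the CREATE OR REPLACE syntax depends on your database):
-- beforeMigrate.sql: re-created on every migrate run
CREATE OR REPLACE FUNCTION refresh_totals() RETURNS void AS $$
BEGIN
  -- function/procedure body goes here
  RETURN;
END;
$$ LANGUAGE plpgsql;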
I have a list of ~90 zipped files. I have written a loop to unzip them (they become approx. 1 GB per file), do some computations, save the output for each file, and delete the unzipped file. One iteration of this process takes about 30-60 min per file [not all files are exactly the same size].
I am not too worried about the time, as I can leave it working over the weekend. However, R doesn't manage to get all the way through. I left it on Friday night, but it only ran for 12 hours, so it only processed 30 of the 90 files.
I don't deal with this type of heavy process often, but the same has happened in the past with analogous processes. Is there any command I need to insert in my loops to keep the computer from freezing during such intensive processing? I tried gc() at the end of the loop, to no avail.
Is there a list of “good practice” recommendations for this type of procedure?
If your session is freezing, you are likely running into a problem you need to isolate: it may be a single problematic file, or you may be running short on memory and leaning heavily on swap.
Regardless, here are some tips and ideas you could implement (a rough sketch combining several of them follows the list):
Write your code so that it processes a single file as one unit, e.g. a function like process_gz_folder(). Then loop over the file paths and invoke the function each time; this keeps the global environment clean.
As you have already tried, gc() can sometimes help, but it depends on the situation and on whether memory is actually being freed (after running rm(), for example). It could be called after invoking the function from the first point.
Are you keeping the results of each folder in memory? Does this set of results grow with each iteration? If so, it may be taking up memory you need - storing the results to disk in a suitable format will let you accumulate them after each file has been processed.
To add to the prior point, if files produce outputs, make sure their names are informative, and consider adding a timestamp (e.g. inputfile_results_YYYYMMDD).
The code could check whether a file has already been processed and skip to the next one; this helps when you have to restart, especially if your method for checking is the existence of an output (with timestamp!).
Use try() to make sure failures do not stop future iterations - however, it should produce warnings/output to notify you of a failure so that you can come back to it later.
A more abstract approach is to create a single script that processes a single file; it could just call the function from the first point, preceded by setTimeLimit() with a time after which the code stops if the file has not been processed. Iterate over the files with a bash script that invokes this R script via Rscript, which can be passed arguments (file paths, for example). This approach may help avoid freezes, but it depends on you knowing and setting an acceptable time limit.
Determine whether the files are too large for memory; if so, the code may need to be adjusted to be more memory efficient, or changed to process the data incrementally so as not to run out of memory.
Reduce other tasks on the computer that take resources and may contribute to a freeze.
These are just some ideas that spring to mind that could be things to consider in your example (given the info provided). It would help to see some code and understand what kind of processing you are doing on each file.
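In the meantime, here is a rough sketch combining a few of the ideas above - process_gz_folder() is a placeholder for your own logic, the paths are made up, and I'm assuming .zip archives (adjust the decompression to your format):
process_gz_folder <- function(zip_path, out_dir) {
  # Name the output after the input plus a date stamp, and skip files that
  # already have an output (helps when you have to restart the loop).
  out_file <- file.path(out_dir, paste0(basename(zip_path), "_results_",
                                        format(Sys.Date(), "%Y%m%d"), ".rds"))
  if (file.exists(out_file)) return(invisible(NULL))
  tmp <- file.path(tempdir(), basename(zip_path))
  dir.create(tmp, showWarnings = FALSE)
  unzip(zip_path, exdir = tmp)          # unzip into a throwaway folder
  results <- list()                     # ... your computations on the unzipped data ...
  saveRDS(results, out_file)            # persist results instead of growing an object in memory
  unlink(tmp, recursive = TRUE)         # delete the unzipped data again
  invisible(NULL)
}

zips <- list.files("data", pattern = "\\.zip$", full.names = TRUE)
for (z in zips) {
  res <- try(process_gz_folder(z, "output"))  # a failure should not stop later files
  if (inherits(res, "try-error")) message("Failed: ", z)
  gc()                                        # optional: release memory between files
}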
With as little information as you have provided, it is hard to tell what the problem really is.
If possible, I would first unzip and concatenate the files. Then preprocess the data and strip off all fields that are not required for analysis. The resulting file would then be used as input for R.
Also be aware that parsing the input strings as e.g. timestamps may be quite time consuming.
I've a bunch of subtasks for grunt watch (e.g. grunt watch:styles, grunt watch:sprites, etc.), and many other tasks run grunt watch. I would like to exclude one subtask - is there a way to specify that? Basically, run all grunt watch subtasks except grunt watch:dist.
I know I could create another task and only specify the subtasks I'm actually interested in. However, if I add another subtask later, I'd then have to remember to add it there too, so I would rather not do it that way.
Thanks
There might be a better way, but this is doing the trick for now:
grunt.registerTask('watch:basic', function () {
  // Drop the dist target from the watch config, then run the remaining targets.
  delete grunt.config.data.watch.dist;
  grunt.task.run('watch');
});
Fortunately, in this case, I don't have other tasks that might run grunt watch:dist, so it's safe to simply remove the config, but I can think of other cases where this approach would create conflicts.