Standardized filenames when passing folders between steps in pipeline architecture?

I am using AzureML pipelines, where the interface between pipeline steps is through a folder or file.
When I am passing data into the pipeline, I point directly to a single file. No problem at all. Very useful when passing in configuration files which all live in the same folder on my local computer.
However, when passing data between different steps of the pipeline, I can't provide the next step with a file path. All a step gets is a path to some folder that it can write to, and that same path is then passed to the next step.
The problem comes when the following step is then supposed to load something from the folder.
Which filename is it supposed to try to load?
Approaches I've considered:
1. Use a standardized filename for everything. The problem is that I want to be able to run the steps locally too, independent of any pipeline, and this makes for a very poor UX for that use case.
2. Check if the path points to a file; if it doesn't, check all the files in the folder. If there is only one file, use it; otherwise throw an exception (a sketch of this follows the list). This is maybe the most elegant solution from a UX perspective, but it sounds overengineered to me. We also don't structurally share any code between the steps at the moment, so either we will have repetition or we will need to find some way to share code, which is non-trivial.
3. Allow custom filenames to be passed in optionally, otherwise use a standard filename. This helps with the UX, but often the filenames are supposed to be defined by the configuration files being passed in, so while we could do some bash scripting to get the filename into the command, it feels like a sub-par solution.
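To illustrate approach 2, here is a minimal sketch in Python, assuming plain pathlib (resolve_input_path is a hypothetical helper name, not anything AzureML provides):

from pathlib import Path

def resolve_input_path(path: str) -> Path:
    # Accept either a direct file path or a folder containing exactly one file.
    p = Path(path)
    if p.is_file():
        return p
    files = [f for f in p.iterdir() if f.is_file()]
    if len(files) == 1:
        return files[0]
    raise ValueError(f"expected a file, or a folder with exactly one file, but found {len(files)} files in {p}")

Each step could call this on its input argument, so a local run can still point at a plain file.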
Ultimately it feels like none of the solutions I have come up with are any good.
It feels like we are making things more difficult for ourselves in the future if we assume some default filename. For example, we work with multiple file types, so a default name would need to omit the extension.
But any way to do it without default filenames would also cause maintenance headaches down the line, or incur substantial upfront cost.
The question is: am I missing something? Any potential traps, better solutions, etc. would be appreciated. It definitely feels like I am somewhat under- and/or overthinking this.

Related

Benefits of using pyuic vs uic.loadUi

I am currently working with Python and Qt, which is kind of new for me coming from the C++ version, and I realised that the official documentation says that a UI file can be loaded either directly from the .ui file or by transforming it into a Python class in a .py file.
I get the benefit of using .ui: it is dynamically loaded, so there is no need to transform it into a Python file with every change. But what are the benefits of doing the transformation? Do you get any improvements in run time? Is it something else?
Thanks
Well, this question is dangerously near to the "Opinion-based" flag, but it's also a common one and I believe it deserves at least a partial answer.
Conceptually, both the pyuic approach and the uic.loadUi() method are the same and behave in very similar ways, but with some slight differences.
To better explain all this, I'll use the documentation about using Designer as a reference.
pyuic approach, or the "python object" method
This is probably the most popular method, especially amongst beginners. It creates a python object that is used to build the ui and, if used following the "single inheritance" approach, it also behaves as an "interface" to the ui itself, since the ui object it sets up has all widgets available as its attributes: if you create a push button, it will be available as ui.pushButton, the first label will be ui.label, and so on.
In the first example of the documentation linked above, that ui object is stand-alone; that's a very basic example (I believe it was given just to demonstrate its usage, since it wouldn't provide a lot of interaction besides the connections created within Designer) and is not very useful, but it's very similar to the single inheritance method: the button would be self.ui.pushButton, etc.
If the "multiple inheritance" method is used, the ui object will coincide with the widget subclass. In that case, the button will be self.pushButton, the label self.label, etc.
This is very important from the python point of view, because it means that those attribute names will overwrite any other instance attribute that uses the same name: if you have a function named "saveFile" and you name the button "saveFile", you won't have any [direct] access to that instance method any more as soon as setupUi returns. In this case, using the single inheritance method might be helpful - but, in reality, you could just be more careful about function and object names.
Finally, if you don't know what the pyuic-generated file does and what it's for, you might be inclined to use it to create your program. That is wrong for a lot of reasons but, most importantly, because you will almost certainly realize at some point that you have to edit your ui, and merging the new changes with your modified code is clearly a PITA you don't want to face.
I recently answered a related question, trying to explain what happens when setupUi() is called in much more depth.
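For reference, a minimal sketch of the single inheritance pattern (PyQt5; mainwindow.ui and the generated ui_mainwindow module are hypothetical names):

from PyQt5.QtWidgets import QApplication, QMainWindow
from ui_mainwindow import Ui_MainWindow  # generated with: pyuic5 mainwindow.ui -o ui_mainwindow.py

class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.ui = Ui_MainWindow()  # the "python object" created by pyuic
        self.ui.setupUi(self)      # builds the widgets onto this window
        self.ui.pushButton.clicked.connect(self.save_file)  # widgets are attributes of self.ui

    def save_file(self):
        pass

app = QApplication([])
window = MainWindow()
window.show()
app.exec_()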
Using uic.loadUi
I'd say that this is a more "modular" approach, mostly because it's much more direct: as already pointed out in the question, you don't have to regenerate the python files each time the ui is modified.
But, there's a catch.
First of all: obviously, loading, parsing and building a UI from an XML file is not as fast as creating the ui directly from code (which is exactly what the pyuic file does within setupUi()).
Then, there is at least one relatively small bug about layout contents margins: when using loadUi, the default system/form margins might be completely ignored and set to 0 if not explicitly set. There is a workaround about that, explained in Size of verticalLayout is different in Qt Designer and PyQt program (thanks to eyllanesc).
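A minimal sketch of the loadUi approach for comparison (PyQt5; mainwindow.ui is a hypothetical file, and the path handling anticipates the deployment caveat listed below):

import os
from PyQt5 import uic
from PyQt5.QtWidgets import QApplication, QMainWindow

class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        # Resolve the .ui file relative to this module, so that the current
        # working directory doesn't matter.
        ui_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "mainwindow.ui")
        uic.loadUi(ui_path, self)  # widgets become attributes of self
        self.pushButton.clicked.connect(self.save_file)

    def save_file(self):
        pass

app = QApplication([])
window = MainWindow()
window.show()
app.exec_()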
A comparison
pyuic approach
Pros:
it's faster; in a very simple test with a hundred buttons and a tablewidget with more than 1200 items I measured the following bests:
pyuic loading: 33.2ms
loadUi loading: 51.8ms
this ratio is obviously not linear for a multitude of reasons, but you can get the idea (a rough way to reproduce the measurement is sketched after this comparison)
if used with the single inheritance method, it can prevent accidental instance attribute overwriting, and it also means a more "contained" object structure
using python imports ensures a more coherent project structure, especially in the deployment process (having non-python files is a common source of problems)
the contents of those files are actually instructive, especially for beginners
Cons:
you always must remember to regenerate the python files every time you update a ui; we all know how easy it is to forget an apparently meaningless step like this, especially after hours of coding: I've seen plenty of situations in which people were banging heads on desks (hopefully both theirs) for hours because of untraceable issues, before realizing that they just forgot to run pyuic or didn't run it on the right files; my own forehead still hurts ;-)
file tracking: you have to keep track of two files for each ui, and you might forget one of them along the way when migrating/forking/etc; if you lose a ui file, you may have to recreate it completely from scratch
n00b alert: beginners are commonly led to think that the generated python file is the one to be used to create their programs, which is obviously wrong; unfortunately, the # WARNING! message is not clear enough (I've been almost begging the head PyQt developer about this); while this is not a flaw of the approach itself, in practice it ends up being one
some of the contents of a pyuic-generated file are usually unnecessary (most importantly the object name, which is used only in specific cases), and that's pretty obvious, since it's automatically generated ("you might need that, so better safe than sorry"); also, related to the issue above, people might be led to think that everything pyuic creates is actually needed for a GUI, resulting in unnecessary code that decreases readability
loadUi method
Pros:
it's direct and immediate: you edit your ui in Designer, you save it (or, at least, you remember to do so...), and when you run your code it's already there; no fuss, no muss, and desks/foreheads are safe(r)
file tracking and deployment: it's just one file per ui; you can put all those ui files in a separate folder, you don't have to do anything else, and you don't risk forgetting something along the way
direct access to widgets (but this can be achieved using the multiple inheritance approach also)
Cons:
the layout issue mentioned above
possible instance attribute overwriting and no "ui" object "containment"
slightly slower loading
path and deployment: loading is done using relative paths and system separators, so if you put the ui in a directory different from the py file that loads it, you'll have to account for that; also, some package managers tend to compress everything, resulting in access errors unless paths are correctly managed
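For reference, a rough sketch of how such a timing could be reproduced (PyQt5; Ui_MainWindow and mainwindow.ui are hypothetical stand-ins for your own generated class and Designer file; numbers will vary):

import time
from PyQt5 import uic
from PyQt5.QtWidgets import QApplication, QMainWindow
from ui_mainwindow import Ui_MainWindow

app = QApplication([])

t0 = time.perf_counter()
w1 = QMainWindow()
Ui_MainWindow().setupUi(w1)       # ui built directly from code
t1 = time.perf_counter()
w2 = uic.loadUi("mainwindow.ui")  # XML parsed and built at runtime
t2 = time.perf_counter()

print(f"setupUi: {(t1 - t0) * 1000:.1f} ms, loadUi: {(t2 - t1) * 1000:.1f} ms")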
In my opinion, all things considered, the loadUi method is usually the better choice. It doesn't distract me, it allows better conceptual compartmentalization (which is usually good and also follows a pattern much closer to MVC, conceptually speaking), and I strongly believe it to be far less prone to programmer errors, for a multitude of reasons.
But that's clearly a matter of choice.
We should also always remember that, like every other choice we make, using ui files at all is an option.
There are people who avoid them completely (just as there are people who use them for literally everything), but, like everything else, it always depends on the context.
A big benefit of using pyuic is that code autocompletion will work.
This can make programming much easier and faster.
Then there's the fact that everything loads faster.
pyuic6-Tool can be used to automate the call of pyuic6 when the application is run and only convert .ui files when they change.
It takes a little longer to set up than just using uic.loadUi, but the autocompletion is well worth it if you use something like PyCharm.
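If you'd rather not add a dependency, a minimal sketch of the same idea (assumes pyuic6 is on PATH; the file names are hypothetical):

import os
import subprocess

def ensure_ui_module(ui_path: str, py_path: str) -> None:
    # Re-run pyuic6 only when the .ui file is newer than the generated module.
    if not os.path.exists(py_path) or os.path.getmtime(ui_path) > os.path.getmtime(py_path):
        subprocess.run(["pyuic6", "-o", py_path, ui_path], check=True)

ensure_ui_module("mainwindow.ui", "ui_mainwindow.py")
from ui_mainwindow import Ui_MainWindow  # safe to import once (re)generated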

Ada `Gprbuild` Shorter File Names, Organized into Directories

Over the past few weeks I have been getting into Ada, for various different reasons; my personal reasons for picking it up are out of scope for this question.
The other day I started using the gprbuild command that comes with the Windows version of GNAT, in order to get the benefits of a system for managing my applications in a project-related manner. That is, being able to define certain attributes on a per-project basis, rather than manually setting up the compile phase myself.
Currently I am naming my files based on what seems to be the standard for gprbuild, although I could very much be wrong: periods in the package structure become hyphens in the file name, while underscores stay as underscores. As such, a package by the name App.Test.File_Utils would live in the files app-test-file_utils.ads and app-test-file_utils.adb.
In the .gpr project file I have specified:
for Source_Dirs use ("app/src/**");
so that I am allowed to use multiple directories for storing my files, rather than needing to have them all in the same directory.
The Problem
The problem that arises, however, is that file names tend to get very long. As I am already putting the files in a directory based on the package name contained by the file, I was wondering if there is a way to somehow make the compiler understand that the package name can be retrieved from the file's directory name.
That is, rather than having to name the App.Test.File_Utils' file name app-test-file_utils, I would like it to reside under the app/test directory by the name file_utils.
Is this doable, or will I be stuck with the horrors of eventually having to name my files along the lines of app-test-some-then-one-has-more_files-another_package-knew-test-more-important_package.ads? That is, granted I have not missed something about how an Ada application should actually be structured.
What I have tried
I tried looking for answers in the package Naming configuration of gpr files in the documentation, but to no avail. Furthermore, I have been browsing the web for information, but decided it might be better to ask on Stack Overflow, so that other people who might struggle with this problem in the future (granted it is a problem in the first place) can also get help.
Any pointers in the right direction would be very helpful!
In the top-secret GNAT documentation there is a description of how to use non-default file names. It's a great deal of effort. You will probably give up, use the default names, and put them all in a single directory.
You can also save much of the effort by using GPS and letting it build your project file as you add files to your source directories.

Closure: --namespace Foo does not include Foo.Bar, and related issues

I have a rather big library with a significant set of APIs that I need to expose. In fact, I'd like to expose the whole thing. There is a lot of namespacing going on, like:
FooLibrary.Bar
FooLibrary.Qux.Rumps
FooLibrary.Qux.Scrooge
..
Basically, what I would like to do is make sure that the user can access that whole namespace. I have had a whole bunch of trouble with this, and I'm totally new to closure, so I thought I'd ask for some input.
First, I need closurebuilder.py to send the full list of files to the closure compiler. This doesn't seem supported: --namespace Foo does not include Foo.Bar, and --input only allows a single file, not a directory. Nor can I simply send my list of files to the closure compiler directly, because my code also requires things like goog.asserts, so I do need the resolver.
In fact, the only solution I can see is having a FooLibrary.ExposeAPI JS file that goog.require()'s everything. Surely that can't be right?
This is my main issue.
However, the closure compiler, with ADVANCED_OPTIMIZATIONS on, will later optimize all these names away. Now I can fix that by adding @export all over the place, which I am not happy about, but it should work. I suppose it would also be valid to use an extern here. Or I could simply disable advanced optimizations.
What I can't do, apparently, is say "export FooLibrary.*". Wouldn't that make sense?
Finally, for working in source mode, I need to do goog.require() for every namespace I am using. This is merely an inconvenience, though I mention it because it is sort of related to my trouble above. I would prefer to be able to do:
goog.requireRecursively('FooLibrary')
in order to pull in all the child namespaces as well, thus recreating with a single command the environment that I have when using the compiled version of my library.
I feel like I am possibly misunderstanding some things, or how Closure is supposed to be used. I'd be interested in looking at other Closure-based libraries to see how they solve this.
You are discovering that Closure-compiler is built more for the end consumer and not as much for the library author.
If you are exporting basically everything, then you would be better off with SIMPLE_OPTIMIZATIONS. I would still highly encourage you to maintain compatibility of your library with ADVANCED_OPTIMIZATIONS so that users can compile the library source with their project.
First, I need closurebuilder.py to send the full list of files to the closure compiler. ...
In fact, the only solution I can see is having a FooLibrary.ExposeAPI JS file that goog.require()'s everything. Surely that can't be right?
You would need to specify a --root of your source folder and specify the namespaces of the leaf nodes of your file dependency tree. You may have better luck with the now-deprecated CalcDeps.py script. I still use it for some projects.
What I can't do, apparently, is say "export FooLibrary.*". Wouldn't that make sense?
You can't do that because it only makes sense based on the final usage. You, as the library author, wish to export everything, but perhaps a consumer of your library wishes to include the source (uncompiled) version and get more dead-code elimination. Library authors are stuck in a kind of middle ground between the SIMPLE and ADVANCED optimization levels.
What I have done for this case is maintain a separate exports file for my namespace that exports everything. When compiling a standalone version of my library for distribution, the exports file is included in the compilation. However, I can still include the library source (without the exports) in a project and get full dead-code elimination. The work/payoff balance of this must be weighed against just using SIMPLE_OPTIMIZATIONS for the standalone library.
My GeolocationMarker library has an example of this strategy.

Mass Thunderbird folder to Gnus nnfolder conversions

I'm pondering the idea of importing a few thousand Thunderbird folders, each folder containing many emails of course, as a set of Emacs' Gnus mailgroups. Each mailgroup name would be derived from the folder hierarchy. Because of the quantity, the work is going to be fairly tedious, so I would automate this massive import if possible.
Among the available backends, nnfolder seems the most promising in this case. I presume it would be better to populate the mailgroups from within Gnus; otherwise, I would have to thoroughly understand the nnfolder format, and that might take many iterations before I really get it right. Moreover, as email continues to flow in, iterations may become difficult to organize properly without losing anything.
I guess I have to respool everything, under the constraint that the selected mailgroup is a function of the Thunderbird origin, overriding the standard Gnus selection mechanism. I did some Gnus coding in the past, but since I did not touch Emacs for a dozen years, it is all very rusty. I'm a bit lost about how to approach this task as efficiently and quickly as possible. So my question: how would you handle it? Or is there some clever Gnus hidden corner that I should explore more deeply? :-)
François
P.S. After I wrote this question, I found out that Gnus has a nice helper function for this goal. The idea is to first copy all Thunderbird folder files into the ~/Mail directory, keeping their contents as they are but renaming them properly. Once this is done, M-x nnfolder-generate-active-file does everything at once: for each copied folder, it edits the contents, leaves a ~ backup, generates NOV data, creates one mailgroup and, of course, adjusts the ~/Mail/active file.
To copy the folders underneath the ~/.thunderbird/LOGIN/Mail/Local Folders/ directory, I wrote a small Python script. It ignores all .msf files and recurses into .sbd directories. The folder path name, relative to Local Folders/, has all its .sbd/ strings turned into periods to produce the mailgroup name; it also lowers the case, turns spaces and underscores into dashes, and handles other special characters appropriately. In particular, non-ASCII characters are not handled properly, as nnfolder confuses UTF-8 and ISO-8859-1 here and there. The script also has to skip msgfilterrules.dat and likely drafts, junk and similar things.
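A minimal sketch of that renaming logic (the actual script is not shown here; LOGIN is a placeholder, and the special-character and encoding handling is elided):

import os

LOCAL_FOLDERS = os.path.expanduser("~/.thunderbird/LOGIN/Mail/Local Folders")

def mailgroup_name(folder_path: str) -> str:
    # Map a folder path under Local Folders/ to an nnfolder mailgroup name:
    # nesting directories end in .sbd, which becomes a period in the group name.
    rel = os.path.relpath(folder_path, LOCAL_FOLDERS)
    name = rel.replace(".sbd" + os.sep, ".").replace(os.sep, ".")
    return name.lower().replace(" ", "-").replace("_", "-")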
I noticed two details requiring attention:
Thunderbird itself can be used to compact folders before copying them; otherwise one might unwillingly recover messages which were already deleted.
(setq nnmail-use-long-file-names t) is needed in ~/.emacs prior to the whole operation.
The batch transformation aborted, saying it was not able to decrypt one of the messages. I moved the offending folder out of the way, and then the lengthy operation succeeded.

Protect.exe for AutoLISP code protection

I am developing an architectural LISP-based package for a member of the IntelliCAD consortium. Per recommendations I have found on websites, I have used the Kelvinator to deformat and disguise some of the code. Now I am attempting to use Protect.exe to encrypt the code. The exe seemed to work until I tried to use a folder name in the output file name, thus:
protect es.lsp L kelvinated\protected\es.lsp
First of all, can I do this? Will protect.exe work like this, or do the input and output files have to be in the same folder?
Also, one time I tried this and I got a "stack overflow" error. Therefore, I am here.
Kelvinator/protect et al. are pretty old utilities; do you know the last time they were updated? Subtext: they may expect old-school 8.3 file/folder names.
As for "will this work?", I cannot say, as I use different schemes to protect my work when writing lisp for others (vlx/fas, BricsCAD's encryptor, my own loaders/obfuscators...).
A stack overflow in this context suggests a recursion error, perhaps when it tries to reconcile the pathing you're providing.
Have you tried using the DOS short path? Putting the path in quotes? Using forward slashes? Using double backslashes?
What happens if you pass "/?" (and alternates) on the command line, does it provide any help?
Finally, if it refuses to process the files unless they share the same directory, you could always front-end it with a batch file that does the housekeeping for you.
Michael.
