I'm running an optimization in OpenMDAO. One of the components in the model writes a few files to a directory which is given a random name. I track the progress of the optimization using a SqliteRecorder. I would like to be able to correlate iterations in the sqlite database to the directories of each evaluation.
Is there a way to attach arbitrary information to a recorder - in this case, the directory name?
I suggest adding a string-typed output to the component and setting it to the folder name; the recorder will then capture it along with everything else.
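A minimal sketch of that idea, assuming a recent OpenMDAO version where a string-valued output is declared as a discrete output (the component, variable names and the tempfile-based directory below are made up for illustration, not taken from your model):

import tempfile
import openmdao.api as om

class WritesFiles(om.ExplicitComponent):
    # hypothetical stand-in for the component that writes into a randomly named directory
    def setup(self):
        self.add_input('x', val=0.0)
        self.add_output('y', val=0.0)
        # string-valued bookkeeping output: the directory used for this evaluation
        self.add_discrete_output('run_dir', val='')

    def compute(self, inputs, outputs, discrete_inputs, discrete_outputs):
        run_dir = tempfile.mkdtemp(prefix='eval_')  # stand-in for the random directory name
        # ... write this evaluation's files into run_dir here ...
        outputs['y'] = inputs['x'] ** 2
        discrete_outputs['run_dir'] = run_dir

prob = om.Problem()
prob.model.add_subsystem('comp', WritesFiles(), promotes=['*'])
prob.model.add_recorder(om.SqliteRecorder('cases.sql'))  # in an optimization you would typically attach this to the driver
prob.setup()
prob.run_model()
prob.cleanup()

When the database is read back with om.CaseReader('cases.sql'), each recorded case should then carry comp.run_dir alongside the numeric outputs, so an iteration can be matched to its directory.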
Context: We store historical data in Azure Data Lake as versioned parquet files from our existing Databricks pipeline where we write to different Delta tables. One particular log source is about 18 GB a day in parquet. I have read through the documentation and executed some queries using Kusto.Explorer on the external table I have defined for that log source. In the query summary window of Kusto.Explorer I see that I download the entire folder when I search it, even when using the project operator. The only exception to that seems to be when I use the take operator.
Question: Is it possible to prune columns to reduce the amount of data fetched from external storage, either during external table creation or with an operator at query time?
Background: The reason I ask is that in Databricks it is possible to use the SELECT statement to fetch only the columns I'm interested in, which reduces the query time significantly.
As David wrote above, the optimization does happen on the Kusto side, but there is a bug in the "Downloaded Size" metric - it reports the total data size regardless of the selected columns. We'll fix it. Thanks for reporting.
I have a DICOM study with 3 series and want to refresh its UIDs (StudyInstanceUID, SeriesInstanceUID, SOPInstanceUID) to do some tests. All the data is in a single directory, so it's not possible to tell which DICOM file belongs to which series.
What I have tried is using dcmodify (dcmtk) with its generate options:
dcmodify mydirectory/*.dcm -gst -gse -gin
but it turns every single file into its own study, so the original study/series structure is broken.
Is there a way to do this, or do I have to use other dcmtk tools to first identify which study and series each file belongs to?
-gst, -gse and -gin
create new Study, Series and SOP Instance UIDs for each individual image matching mydirectory/*.dcm, hence destroying the study/series structure, as you already observed.
The answer is two-fold:
To assign the same UID to all images, you would instead use
-m (0020,000D)=...
(this example sets the Study Instance UID)
But there is no command line tool in DCMTK that I am aware of which would completely solve your problem. storescp has an option to create subdirectories for each study (e.g. --sort-on-study-uid), but that does not solve the series-level problem.
With the means of DCMTK, I think you need to do some scripting work around it: use dcmdump to dump each file to text, extract the Study and Series Instance UIDs from that, and then move the file into an appropriate Study+Series folder.
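For example, a small Python sketch of that scripting approach (the directory names and the regular expression are assumptions; it shells out to dcmdump with +P to read just the two UIDs per file):

import re
import shutil
import subprocess
from pathlib import Path

SRC = Path("mydirectory")   # assumed location of the mixed files
DST = Path("sorted")        # files are moved into DST/<StudyInstanceUID>/<SeriesInstanceUID>/

def read_uid(dcm_file, tag):
    # dcmdump +P prints only the requested tag, e.g.
    # (0020,000d) UI [1.2.840.113619....]  #  26, 1 StudyInstanceUID
    dump = subprocess.run(["dcmdump", "+P", tag, str(dcm_file)],
                          capture_output=True, text=True, check=True).stdout
    match = re.search(r"\[(.+?)\]", dump)
    return match.group(1) if match else "unknown"

for dcm in sorted(SRC.glob("*.dcm")):
    study = read_uid(dcm, "0020,000d")   # StudyInstanceUID
    series = read_uid(dcm, "0020,000e")  # SeriesInstanceUID
    target = DST / study / series
    target.mkdir(parents=True, exist_ok=True)
    shutil.move(str(dcm), str(target / dcm.name))

Once the files are grouped this way, dcmodify -m can be run per folder to assign a fresh but consistent UID to each study and series.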
I am trying to use the SQLite4 LSM library as a separate entity.
How does one open more than one index in a given file/handle? SQLite3 stores all indexes and tables in a single file, so I assume the LSM library can support two or more indexes within a single file. If not, I just need to know so that I can open two LSM trees. There is nothing about this in the manuals and I can't find an example.
I have a bitbake configuration that generates two partitions: the BOOT (FAT) partition containing uBoot, uEnv.txt, etc., and a root file system that gets mounted read-only. There may be instances where the filesystem isn't a separate partition but rather a ramdisk, so I'm trying to enforce a design pattern that works in both instances.
What I'm trying to do is provide some of the files in the root filesystem as links to locations on the SD card. This way I can build a single SD card image, and the minor edits for node IDs or names can easily be tweaked by end-users. So, for example, if /etc/special_config.conf were a useful one, then rather than storing it on the read-only partition, a link would be created pointing back to the real file on the BOOT partition.
So far I've tried making a recipe that, for that case, does the following:
IMAGE_BOOT_FILES += "special_config.conf"

do_install () {
    # create the target directory and point the config at the copy on the BOOT partition
    install -d ${D}${sysconfdir}
    ln -s /media/BOOT/special_config.conf \
        ${D}${sysconfdir}/special_config.conf
}
This doesn't seem to do anything: IMAGE_BOOT_FILES doesn't collect the special_config.conf file into the BOOT partition, as if all of those changes get wiped out when the system image is populated.
Has anyone seen a clever way to enforce this kind of behavior in BitBake?
If I understand correctly, you get your ${sysconfdir}/special_config.conf symlink in the image (via a package built from the recipe mentioned), but you don't get the special_config.conf file on your BOOT partition when using the wic image fstype.
If that's the case, then the only problem is that you define IMAGE_BOOT_FILES in the package recipe rather than in the image recipe; this variable is only evaluated at image build time. So drop it from your config file recipe, add it to the image recipe, and it should work.
I'm using Hadoop and working with a map task that creates files that I want to keep. Currently I pass these files through the collector to the reduce task, and the reduce task then passes them on to its own collector; this is how I retain the files.
My question is: how do I reliably and efficiently keep the files created by map?
I know I can turn off the automatic deletion of map's output, but that is frowned upon. Are there any better approaches?
You could split it up into two jobs.
First, create a map-only job that outputs the sequence files you want to keep.
Then take your existing job (the map now does essentially nothing, though you could still do some crunching depending on your implementation and use cases) and reduce as you do now, using the output of the map-only job as the input to the second job.
You can wrap this all up in one jar that runs the two jobs in sequence, passing the first job's output path as an argument to the second job's input path.
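A rough sketch of that two-job flow, driven here from Python via Hadoop Streaming rather than a compiled jar (the streaming jar path, the HDFS directories and the mapper/reducer script names are assumptions):

import subprocess

STREAMING_JAR = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"  # assumed path
RAW_INPUT = "/data/raw"
STAGE1_OUT = "/data/stage1"   # kept permanently: these are the map outputs you want to retain
STAGE2_OUT = "/data/stage2"

# Job 1: map-only (zero reducers), writing the files you want to keep to STAGE1_OUT
subprocess.run(["hadoop", "jar", STREAMING_JAR,
                "-D", "mapreduce.job.reduces=0",
                "-input", RAW_INPUT,
                "-output", STAGE1_OUT,
                "-mapper", "extract_mapper.py",
                "-file", "extract_mapper.py"], check=True)

# Job 2: identity map plus the real reduce, reading job 1's output as its input
subprocess.run(["hadoop", "jar", STREAMING_JAR,
                "-input", STAGE1_OUT,
                "-output", STAGE2_OUT,
                "-mapper", "cat",
                "-reducer", "my_reducer.py",
                "-file", "my_reducer.py"], check=True)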