R file.copy: a faster solution when files already exist

In R, when copying files from one folder to another, we can use the function file.copy, and to avoid copying existing files we can pass the argument overwrite = FALSE.
I assumed this would be very fast when the files already exist: just check whether a file with the same name exists and, if so, skip the copy. In practice, however, it is not fast, so I am wondering what really happens when overwrite = FALSE.
And is there a simple way to make the process of detecting existing files (and thus skipping them) faster?
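A common workaround is to compute the set of missing files yourself with file.exists and call file.copy only on those. A minimal sketch, assuming both folders are flat (no subdirectories) and that `copy_missing` is just an illustrative name:

```r
# Copy only the files from `src` that are not already present in `dst`.
# Assumes flat folders; a sketch, not a benchmark.
copy_missing <- function(src, dst) {
  files <- list.files(src)
  missing <- files[!file.exists(file.path(dst, files))]
  file.copy(file.path(src, missing), dst)
}
```

Since file.exists is vectorized, this avoids per-file work on files that are already present.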


Standardized filenames when passing folders between steps in pipeline architecture?

I am using AzureML pipelines, where the interface between pipeline steps is through a folder or file.
When I am passing data into the pipeline, I point directly to a single file. No problem at all. Very useful when passing in configuration files which all live in the same folder on my local computer.
However, when passing data between different steps of the pipeline, I can't provide the next step with a file path. All the steps get is a path to some folder that they can write to. Then that same path is passed to the next step.
The problem comes when the following step is then supposed to load something from the folder.
Which filename is it supposed to try to load?
Approaches I've considered:
Use a standardized filename for everything. The problem is that I want to be able to run the steps locally too, independent of any pipeline, and this makes for a very poor UX in that case.
Check if the path is to a file, if it isn't, check all the files in the folder. If there is only one file, then use it. Otherwise throw an exception. This is maybe the most elegant solution from a UX perspective, but it sounds overengineered to me. We also don't structurally share any code between the steps at the moment, so either we will have repetition or we will need to find some way to share code, which is non-trivial.
Allow custom filenames to be passed in optionally, otherwise use a standard filename. This helps with the UX, but often the filenames are supposed to be defined by the configuration files being passed in, so while we could do some bash scripting to get the filename into the command, it feels like a sub-par solution.
Ultimately it feels like none of the solutions I have come up with are any good.
It feels like we are making things more difficult for ourselves in the future if we assume some default filename. For example, we work with multiple file types, so the default would need to omit an extension.
But any way to do it without default filenames would also cause maintenance headaches down the line, or incur substantial upfront cost.
The question is: am I missing something? Any potential traps, better solutions, etc. would be appreciated. It definitely feels like I am somewhat under- and/or over-thinking this.
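For what it's worth, the second approach above (accept either a file path or a folder containing exactly one file) is only a few lines. A sketch, where `resolve_input` is a hypothetical helper name, not part of any AzureML API:

```python
from pathlib import Path

def resolve_input(path):
    """Resolve a pipeline input that may be a file or a folder.

    If `path` is a file, use it directly; if it is a folder containing
    exactly one file, use that file; otherwise raise.
    """
    p = Path(path)
    if p.is_file():
        return p
    files = [f for f in p.iterdir() if f.is_file()]
    if len(files) == 1:
        return files[0]
    raise ValueError(
        f"Expected a file or a folder with exactly one file, "
        f"found {len(files)} files in {p}"
    )
```

Kept in a small shared module, this is one of the few pieces of code that may be worth sharing between steps, since it defines the contract of the step interface itself.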

Ada `Gprbuild` Shorter File Names, Organized into Directories

Over the past few weeks I have been getting into Ada for various reasons, though my personal reasons for using it are out of scope for this question.
As of the other day I started using the gprbuild command that comes with the Windows version of GNAT, in order to get the benefits of a system for managing my applications in a project-related manner. That is, being able to define certain attributes on a per-project basis, rather than manually setting up the compile-phase myself.
Currently my file names follow what seems to be the standard convention for gprbuild, although I could very well be wrong: each period in the package name becomes a - in the file name, and each underscore stays an _. As such, a package named App.Test.File_Utils would live in files named app-test-file_utils.ads and app-test-file_utils.adb.
In the .gpr project file I have specified:
for Source_Dirs use ("app/src/**");
so that I am allowed to use multiple directories for storing my files, rather than needing to have them all in the same directory.
The Problem
The problem that arises, however, is that file names tend to get very long. As I am already putting the files in a directory based on the package name contained by the file, I was wondering if there is a way to somehow make the compiler understand that the package name can be retrieved from the file's directory name.
That is, rather than having to name the App.Test.File_Utils' file name app-test-file_utils, I would like it to reside under the app/test directory by the name file_utils.
Is this doable, or will I be stuck with the horrors of eventually having to name my files along the lines of: app-test-some-then-one-has-more_files-another_package-knew-test-more-important_package.ads? That is assuming I have not missed something about how an Ada application should actually be structured.
What I have tried
I tried looking for answers in the package Naming configuration of the gpr files in the documentation, but to no avail. I have also been browsing the web for information, but decided it might be better to ask on Stack Overflow, so that other people who might struggle with this problem in the future (granted it is a problem in the first place) might also find help.
Any pointers in the right direction would be very helpful!
In the top-secret GNAT documentation there is a description of how to use non-default file names. It's a great deal of effort. You will probably give up, use the default names, and put them all in a single directory.
You can also simplify much of the effort by using GPS and letting it build your project file as you add files to your source directories.
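Concretely, the mechanism the documentation describes is the Naming package with per-unit exceptions: each unit whose file name deviates from the default convention must be listed individually. A sketch of what that looks like in the .gpr file (project and unit names taken from the question; note that base file names must still be unique across all source directories, so two packages cannot both use file_utils.ads):

```ada
project App is
   for Source_Dirs use ("app/src/**");

   package Naming is
      --  Naming exceptions: map this unit onto shorter file names.
      --  Every renamed unit needs its own Spec/Body entry.
      for Spec ("App.Test.File_Utils") use "file_utils.ads";
      for Body ("App.Test.File_Utils") use "file_utils.adb";
   end Naming;
end App;
```

This is exactly the "great deal of effort" the answer warns about: the exceptions do not scale, because the compiler cannot infer the unit name from the directory path.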

grunt replicate empty file structure

I would like to clone an existing file structure, without copying the files, during a grunt task. The new structure should be empty in the beginning and will be filled with processed data from the already existing file structure.
I've checked grunt-contrib-copy but there seems to be no filter for copying directories only.
This is true; a workaround is to place a .keep file in each (or each lowest-level) directory and run grunt-contrib-copy, and it will replicate your structure.
Updating this to cover the comments:
There is currently no option in grunt-contrib-copy or grunt-init to copy the structure only.
If, however, you put aside the option of starting from zero, you could simply allow the pre-processed data to fill the structure, and pass a `newer:` option to your copy task to only overwrite files that are older than those present.
If you absolutely need those folders to be empty, your best bet is to set up a boilerplate for yourself. I wish I could stumble on something better for this.

How to directly work with a data.frame in physical RData?

Do I have to
1) load a data.frame from the physical RData to the memory,
2) make changes,
3) save it back to the physical RData,
4) remove it from the memory to avoid conflicts?
Is there any way I can skip the load/save steps and make permanent changes to the physical RData directly? Is there a way to work with a data.frame the way one works with a SQLite/MySQL database? Or should I just use SQLite/MySQL (instead of a data.frame) as the data storage?
More thoughts: I think the major difference is that to work with SQLite/MySQL you establish a connection to the database, whereas to work with a data.frame from an RData file you make a copy in memory. The latter approach can create conflicts in complex programs; to avoid potential conflicts you have to save the data.frame and immediately remove it from memory every time you change it.
Thanks!
Instead of using load you may want to consider using attach. This can attach the saved data object to the search path without loading all the objects in it into the global environment. The data frame would then be available to use.
If you want to change the data frame then you would need to copy it to the global environment (will happen automatically for most editing) and then you would need to save it again (there would not be a simple way to save it into a .Rdata file that contains other objects).
When you are done you can use detach (but if you have made a copy in the global environment then you will still need to delete that copy).
If you don't like typing the load/save commands (or attach/detach) each time then you could write your own function that goes through all the steps for you (and if the copy is only in the environment of the function then you don't need to worry about deleting it).
You may also want to consider different ways of storing your data. The typical .Rdata file works well for an all or nothing approach. The saveRDS and readRDS functions will save and read a single object (and do not force you to use the same name when reading it back in). The interfacing with a database approach is probably the best if you are making frequent changes to tables and want them stored outside of R.
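The wrapper function suggested above can be sketched like this: load the objects into a private environment, modify the one requested, save everything back, and let the copies vanish with the function's environment. `update_rdata` is a hypothetical helper name, not a base R function:

```r
# Load `file`, apply `modify` to the object called `name`, save back.
# The loaded copies live only in `env`, so nothing has to be deleted
# from the global environment afterwards.
update_rdata <- function(file, name, modify) {
  env <- new.env()
  load(file, envir = env)                 # load into a private environment
  env[[name]] <- modify(env[[name]])      # edit only the requested object
  save(list = ls(env), file = file, envir = env)  # write everything back
  invisible(NULL)
}

# Usage: add a column to a data.frame `df` stored in data.Rdata
# update_rdata("data.Rdata", "df", function(d) { d$z <- d$x + d$y; d })
```

Note that save rewrites the whole file, so this is convenient rather than fast; for frequent in-place updates, the database approach from the last paragraph remains the better fit.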

Common Lisp: creating a directory

Is there a way to create a directory using Common Lisp? I want to first create a folder and then put my .txt and .png files in it. I know that I could create the folder externally and then create my files inside the directory using with-open-file and so on; what I want is a Common Lisp solution for this.
(ensure-directories-exist "/path/name/")
This page seems to be a nice writeup, explaining all the nuances of the file I/O issue that CL needs to address.
Imagine you have multiple processes running simultaneously that sometimes need to line up for a resource that cannot be used in parallel. Creating a directory can then be an efficient way to lock the resource, because mkdir either succeeds or fails as a single atomic operation. ensure-directories-exist is not good for that because it is not atomic. In this situation I use (in SBCL)
(sb-unix:unix-mkdir "/name/of/directory" #o777)
