Unable to ignore .DS_Store files in DVC - dvc

I use DVC to track my media files. I am on macOS and I want ".DS_Store" files to be ignored by DVC. According to the DVC documentation, I can achieve this with .dvcignore. I created a .dvcignore file with a ".DS_Store" rule. However, every time ".DS_Store" is created, dvc status still says that the content has changed.
Here is a small test to reproduce my issue:
$ git init
$ dvc init
# create directory to store data
# and track its content with DVC
$ mkdir data
$ dvc add data
# Ignore .DS_Store files created by MacOS
$ echo ".DS_Store" > .dvcignore
# create .DS_Store in data dir
$ touch "data/.DS_Store"
If I understand the DVC documentation correctly, dvc status should print something like "Pipeline is up to date. Nothing to reproduce". However, dvc status gives me:
data.dvc:
changed outs:
modified: data
How can I actually ignore ".DS_Store" files?
UPDATE: .dvcignore support has improved noticeably in recent versions, and the problem is no longer relevant.

The current implementation of .dvcignore is very limited. Read more on it here.
Please mention that you are interested in this feature here: https://github.com/iterative/dvc/issues/1876. That would help our team prioritize issues properly.
The possible workaround for now would be to use one of these approaches - How to stop creating .DS_Store on Mac?

It seems this has since been fixed/implemented!
$ dvc init --no-scm
...
$ mkdir data
$ dvc add data
WARNING: 'data' is empty.
Saving information to 'data.dvc'.
$ echo ".DS_Store" > .dvcignore
$ touch "data/.DS_Store"
$ tree -a data
data
└── .DS_Store
0 directories, 1 file
$ dvc status
Data and pipelines are up to date.
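To see whether a given path is matched by .dvcignore, a check along these lines may help. This is only a sketch: it assumes a DVC release recent enough to ship the dvc check-ignore command, it works in a throwaway temporary directory, and it skips the DVC calls entirely when dvc is not on the PATH.

```shell
# Work in a scratch directory so nothing in a real repo is touched.
work=$(mktemp -d)
cd "$work"
mkdir data

if command -v dvc >/dev/null 2>&1; then
    # Initialise DVC without Git, as in the transcript above.
    dvc init --no-scm --quiet
fi

# Append the rule (dvc init may already have created a .dvcignore).
echo ".DS_Store" >> .dvcignore
touch data/.DS_Store

if command -v dvc >/dev/null 2>&1; then
    # Prints the matching pattern and its source file if the path is ignored.
    dvc check-ignore --details data/.DS_Store
else
    echo "dvc not found; skipping the check-ignore step"
fi
```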


Undo 'dvc add' operation

I dvc add-ed a file I did not mean to add. I have not yet committed.
How do I undo this operation? In Git, you would do git rm --cached <filename>.
To be clear: I want to make DVC forget about the file, and I want the file to remain untouched in my working tree. This is the opposite of what dvc remove does.
One issue on the DVC issue tracker suggests that dvc unprotect is the right command. But reading the manual page suggests otherwise.
Is this possible with DVC?
As per mroutis on the DVC Discord server:
dvc unprotect the file; this won't be necessary if you don't use symlink or hardlink caching, but it can't hurt.
Remove the .dvc file
If you need to delete the cache entry itself, run dvc gc, or look up the MD5 in data.dvc and manually remove it from .dvc/cache.
Edit -- there is now an issue on their GitHub page to add this to the manual: https://github.com/iterative/dvc.org/issues/625
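For the last step, looking up the MD5 and mapping it to a cache entry can be sketched as follows. The .dvc file content here is a hand-written sample, and the cache layout (.dvc/cache/<first two hex digits>/<remaining digits>) is an assumption that matches older DVC releases; newer releases may store objects elsewhere, so check the path before deleting anything.

```shell
work=$(mktemp -d)
cd "$work"

# Hand-written sample of what a .dvc file for an empty file may look like.
cat > so-57966851.txt.dvc <<'EOF'
outs:
- md5: d41d8cd98f00b204e9800998ecf8427e
  path: so-57966851.txt
EOF

# Pull the first md5 value out of the .dvc file.
md5=$(sed -n 's/^[- ] *md5: *//p' so-57966851.txt.dvc | head -n 1)

# Assumed layout: first two hex digits form the directory, the rest the file name.
prefix=$(printf '%s' "$md5" | cut -c1-2)
rest=$(printf '%s' "$md5" | cut -c3-)
cache_path=".dvc/cache/$prefix/$rest"

# Only print the candidate path; removing it is left to the reader.
echo "$cache_path"
```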
dvc remove appears to do what you need for uncommitted files - at least for files that aren't in a pipeline. The key (which wasn't clear to me from the error or the docs) is to pass the ….dvc file name, otherwise it tries to find and remove it as a section from dvc.yaml.
# Precondition: DVC is configured for the repo. No dvc.yaml file (untested with it)
$ touch so-57966851.txt
$ dvc add so-57966851.txt
WARNING: 'so-57966851.txt' is empty.
100% Adding...|████████████████████████████████████████|1/1 [00:00, 49.98file/s]
To track the changes with git, run:
git add .gitignore so-57966851.txt.dvc
# Ooops! I did the wrong thing! I didn't mean to add that…
$ dvc remove so-57966851.txt.dvc
$ ll so-*.txt
-rw-r--r-- 1 ibboard users 0 Aug 23 20:27 so-57966851.txt
(Tested with v2.5.4)

Updating tracked dir in DVC

According to this tutorial, when I update a file I should first remove it from DVC control (i.e. execute dvc unprotect <myfile>.dvc or dvc remove <myfile>.dvc) and then add it again via dvc add <myfile>. However, it's not clear whether I should apply the same workflow to directories.
I have the directory under DVC control with the following structure:
data/
1.jpg
2.jpg
Should I run dvc unprotect data every time the directory content is updated?
More specifically I'm interested if I should run dvc unprotect data in the following use cases:
New file is added. For example if I put 3.jpg image in the data dir
File is deleted. For example if I delete 2.jpg image in the data dir
File is updated. For example if I edit 1.jpg image via graphic editor.
A combination of the previous use cases (i.e. some files are updated, other deleted and new files are added)
Only when a file is updated (i.e. you edit 1.jpg with your editor) AND only if the hardlink or symlink cache type is enabled.
Please, check this link:
updating tracked files has to be carried out with caution to avoid data corruption when the DVC config option cache.type is set to hardlink or/and symlink
I would strongly recommend reading this document: Performance Optimization for Large Files. It explains the benefits of using hardlinks/symlinks.
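The reason for the caution can be shown without DVC at all. In this sketch, cache/blob stands in for a cached object and 1.jpg for a workspace file hardlinked to it. The cp and mv step is only a rough approximation of what dvc unprotect does (replacing the link with an independent copy), not DVC's actual implementation.

```shell
work=$(mktemp -d)
cd "$work"
mkdir cache

# Stand-in for a cached object, with the workspace file hardlinked to it.
printf 'v1\n' > cache/blob
ln cache/blob 1.jpg

# Editing the workspace file in place also changes the "cache" copy.
printf 'v2\n' >> 1.jpg
corrupted=$(cat cache/blob)

# Restore the original content (this writes through the shared inode).
printf 'v1\n' > cache/blob

# "Unprotect": replace the hardlink with an independent copy, then edit.
cp 1.jpg 1.jpg.tmp && mv 1.jpg.tmp 1.jpg
printf 'v2\n' >> 1.jpg
safe=$(cat cache/blob)

printf 'cache after in-place edit:\n%s\n' "$corrupted"
printf 'cache after unprotect + edit:\n%s\n' "$safe"
```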
The links above no longer work. Here is the up-to-date link; I am also pasting the instructions here:
Modifying content
Unlink the file with dvc unprotect. This will make train.tsv safe to edit:
dvc unprotect train.tsv
Then edit the content of the file, for example with:
echo "new data item" >> train.tsv
Add the new version of the file back with DVC:
dvc add train.tsv
git add train.tsv.dvc
git commit -m "modify train data"
If you have remote storage and/or an upstream repo:
dvc push
git push
Replacing files
If you want to replace the file altogether, you can take the following steps.
First, stop tracking the file by using dvc remove on the .dvc file. This will remove train.tsv from the workspace (and unlink it from the cache):
dvc remove train.tsv.dvc
Next, replace the file with new content:
echo new > train.tsv
And start tracking it again:
dvc add train.tsv
git add train.tsv.dvc .gitignore
git commit -m "new train data"
If you have remote storage and/or an upstream repo:
dvc push
git push

"patch: **** Can't rename file" bash patch error

I am running these three commands.
cd "${folder1}"
diff -ruN "${folder1}" "${folder2}" > "${patchname}"
patch -f -s -d "${folder1}" --merge < "${patchname}"
When I run them, the files in folder1 are successfully changed to match folder2. However, I also get this output:
patch: **** Can't rename file ./update.patch.omMg8yG to update.patch : Operation not permitted
The problem is here:
cd "${folder1}"
diff -ruN "${folder1}" "${folder2}" > "${patchname}"
You're inside folder1 and creating a patch file that is also inside folder1. We know this because the error message names the file ./update.patch.omMg8yG, explicitly referring to the current directory. The patch contains the differences between folder1 and folder2, and those differences include the output file itself, which is being generated during the diff operation and read during the patch operation.
Consequently, patch is trying to change the patch file it's reading from. It's failing, hence the error, but you shouldn't be having it make the attempt, particularly since on most UNIX-like operating systems this attempt wouldn't fail (I'm assuming you're on Cygwin, or on a remote filesystem mount that doesn't support open unlinked files).
Modify your patchname variable to point to a location in a different directory, neither folder1 nor folder2.
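A minimal sketch of that fix, using throwaway directories created with mktemp so it can be run anywhere. The directory and file names are placeholders, and the --merge flag from the question is left out so the sketch does not depend on a particular patch version.

```shell
# Scratch area with two small trees standing in for folder1 and folder2.
work=$(mktemp -d)
folder1="$work/folder1"
folder2="$work/folder2"
mkdir -p "$folder1" "$folder2"
printf 'old line\n' > "$folder1/a.txt"
printf 'new line\n' > "$folder2/a.txt"

# Keep the patch file outside both trees so diff never sees it.
patchname="$work/update.patch"

# diff exits with status 1 when the trees differ; that is expected here.
diff -ruN "$folder1" "$folder2" > "$patchname"

# Apply the patch inside folder1, reading it from the outside location.
patch -f -s -d "$folder1" < "$patchname"

result=$(cat "$folder1/a.txt")
echo "$result"
```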

How to install gdb in unix without root access

I am trying to install gdb on Unix, but I don't have root access to create files and directories in the root folder. I can only create folders in my own directories. I have followed this link http://www.tutorialspoint.com/gnu_debugger/installing_gdb.htm but every time execution fails at step 4 because it needs to create files at root level. How do I fix it?
Step1:
$ build> gzip -d gdb-6.6.tar.gz
$ build> tar xfv gdb-6.6.tar
$ build> cd gdb-6.6
Step2:
$ gdb-6.6> ./configure
Step3:
$ gdb-6.6> make
Step4:
$ gdb-6.6> make install
**execution fails at this point.
Or is there any other way to install gdb on Unix without root-level access? Please help.
When you run ./configure, you can specify --prefix, which controls where the software is installed.
./configure --prefix=$HOME/gdb
make
make install
The above will install gdb under $HOME/gdb.
You need to specify $HOME/gdb/bin/gdb to run the program after installation, or adjust $PATH to include $HOME/gdb/bin:
export PATH=$PATH:$HOME/gdb/bin
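The PATH step can be tried out without building anything. In this sketch a temporary directory stands in for $HOME/gdb and a tiny script named mytool (a made-up name) stands in for the binary that make install would place under the prefix.

```shell
# Temporary prefix standing in for $HOME/gdb.
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"

# Hypothetical stand-in for the installed program.
printf '#!/bin/sh\necho "hello from prefix"\n' > "$prefix/bin/mytool"
chmod +x "$prefix/bin/mytool"

# Append the prefix's bin directory to PATH, as in the answer.
PATH="$PATH:$prefix/bin"
export PATH

# The shell can now find the program by name.
found=$(command -v mytool)
output=$(mytool)
echo "$found"
echo "$output"
```

To make the change permanent, the export line would normally go into a shell startup file such as ~/.profile.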

If condition inside the %Files section on a SPEC file

I'm kinda new to writing spec files and building RPMs. Currently I have one RPM that is supposed to deploy some files in 1 of 2 possible directories, depending on the OS.
How can I, within the %files section, verify them? I can't use a variable... I can't verify both paths because one will for sure fail... I tried to define a macro earlier in the %install section, but it will be defined just once and won't be redefined on every RPM installation...
What can I do here?
Thanks
I had a similar situation where additional files were included in the RPM in case of a DEBUG build over and above all files in the RELEASE build.
The trick is to pass a list of files to %files along with a regular list of files below it:
%install
# Create a temporary file containing the list of extra files.
# EXTRA_FILES has to be an RPM macro rather than a shell variable,
# because it is also referenced from the files section below.
%global EXTRA_FILES %{_builddir}/ExtraFiles.list
touch %{EXTRA_FILES}
# If building in DEBUG mode, then include additional test binaries in the package
%if "%{build_mode}" == "DEBUG"
# build_mode is a macro that is passed to the spec file when invoked by the build script
# Like: rpmbuild --define "build_mode DEBUG"
echo path/to/file1 > %{EXTRA_FILES}
echo path/to/file2 >> %{EXTRA_FILES}
%endif
%files -f %{EXTRA_FILES}
path/to/release/file1
path/to/release/file2
In your case, you can leverage the %if conditional in the %install section, use the OS as a spec variable passed to rpmbuild (or detect it in the RPM spec itself), and then pass the file containing the list to %files.
The %files section can have variables in it, but usually this would be something like a path that is defined once so you don't have to repeat it, e.g. %{long_path}/file_name, where long_path was defined earlier in the spec file. The %files section holds the information that goes into the RPM database and is created when you build the RPM, so you won't be able to change those values based on machine information at install time.
If you really want to do this, you could include a tar file inside of the main tarball that gets extracted depending on certain conditions (since the spec file is just bash). Now keep in mind this is an awful idea. The files won't be tracked by the RPM database, so when you remove the RPM these files will still exist.
In reality you should build two RPMs, this will allow for better support going forward into the future in the event you have to hand this off to someone, as well as preserving your own sanity a year from now when you need to update the RPM.
This is how I solved my problem
Step 1: in the build section, somewhere I wrote:
%build
.....
#check my condition here & if true define some macro
%define is_valid %( if [ -f /usr/bin/myfile ]; then echo "1" ; else echo "0"; fi )
#after this, continue as normal
.....
...
Step 2: in the install section
%install
......
#do something in that condition
%if %is_valid
install -m 0644 <file>
%endif
#rest all your stuff
................
Step 3: in the files section
%files
%if %is_valid
%{_dir}/<file>
%endif
That's it
It works.
PS: I cannot give you the full code, hence I am giving the useful snippets.
Forrest suggests the best solution, but if that is not practical, you can detect the OS version at runtime in the post-install section, move the script to the appropriate location, and then delete it post-uninstall, e.g.:
# rpm spec snippets
%define OS_version %(hacky os detection)
...
Source2: script.sh
...
%install
install %{_sourcedir}/script.sh %{buildroot}/some/known/location
...
%post
%if %{OS_version} == "..."
mv /some/known/location/script.sh /distro/specific/script.sh
%elif %{OS_version} == "..."
...
%preun
rm -rf /all/script/locations
Much more error prone than building different RPMs on different OSes, but will scale a little better if you need to support many different OSes.
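Since scriptlets such as %post are ordinary shell, the runtime detection itself could be sketched like this. It assumes the target systems provide an os-release file (normally /etc/os-release, which older distributions may lack); the helper name target_dir_for and all directory names are made up for illustration.

```shell
# Hypothetical helper: choose an install directory from an os-release style file.
target_dir_for() {
    # $1: path to the os-release file (normally /etc/os-release)
    os_id=$(sed -n 's/^ID=//p' "$1" | tr -d '"')
    case "$os_id" in
        rhel|centos|fedora) echo "/opt/redhat-style/dir" ;;
        sles|opensuse*)     echo "/opt/suse-style/dir" ;;
        *)                  echo "/opt/default/dir" ;;
    esac
}

# Try it against a sample file instead of the real system file.
sample=$(mktemp)
printf 'NAME="CentOS Linux"\nID="centos"\nVERSION_ID="7"\n' > "$sample"
chosen=$(target_dir_for "$sample")
echo "$chosen"
```

In a scriptlet, the result of target_dir_for /etc/os-release would decide where the script is moved.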
