dvc gc and files in remote cache - dvc

The DVC documentation for the dvc gc command states that the -r option indicates "Remote storage to collect garbage in", but I'm not sure I understand it correctly. For example, suppose I execute this command:
dvc gc -r myremote
What exactly happens if I execute this command? I see 2 possible answers:
1. dvc checks which files should be deleted, moves these files to "myremote", and then deletes them from the local cache but not from the remote.
2. dvc checks which files should be deleted and deletes them from both the local cache and "myremote".
Which one of them is correct?

One of the DVC maintainers here.
Short answer: 2. is correct.
A bit of additional information:
Please be careful when using dvc gc. It will clear from your cache all dependencies that are not mentioned in the current HEAD of your Git repository.
We are working on making dvc gc preserve the whole history by default.
So if you don't want to delete files that belong to your historical commits, it would be better to wait for the completion of this task.
[EDIT]
Please see comment below.
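To make the difference between the two readings concrete, here is a toy Python model of answer 2. It is only a sketch: the sets of hashes and the function name collect_garbage are made up for illustration and are not part of DVC's API.

```python
def collect_garbage(used, local_cache, remote_cache=None):
    """Toy model: drop every cached object that is not in `used`.

    `used` stands for the objects referenced by the current HEAD.
    When a remote is given (like `-r myremote`), the same unused objects
    are removed there as well; nothing is moved to the remote.
    """
    local_kept = local_cache & used
    remote_kept = None if remote_cache is None else remote_cache & used
    return local_kept, remote_kept


used = {"a1", "b2"}
local, remote = collect_garbage(used, {"a1", "b2", "c3"}, {"a1", "c3", "d4"})
print(sorted(local), sorted(remote))
```

In this model the unused objects c3 and d4 disappear from both places, which is why files needed only by older commits can be lost.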

Related

DVC experiment is restoring deleted files

I am using DVC to run experiments in my project using
dvc exp run
Now when I make changes to a file (for example train.py) and run "dvc exp run", everything goes well,
but when the change is deleting a file (for example train.py or an image in the data folder), the file is restored as soon as I run "dvc exp run".
How can I stop that from happening?
This is my dvc.yaml:
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
    metrics:
      - metrics.txt:
          cache: false
From the clarifications under the OP it seems that (both train.py and) the data files are controlled by Git.
DVC experiments have to be based on the Git HEAD, so dvc exp run may be doing git checkout HEAD internally before reproducing the pipeline (dvc.yaml). Any Git-tracked files will be restored.
UPDATE: Looks like this may be a bug. Being tracked in https://github.com/iterative/dvc/issues/6297. Should be fixed soon!

git-ignore dvc.lock in repositories where only the DVC pipelines are used

I want to use the pipeline functionality of dvc in a git repository. The data is managed otherwise and should not be versioned by dvc. The only functionality which is needed is that dvc reproduces the needed steps of the pipeline when dvc repro is called. Checking out the repository on a new system should lead to an 'empty' repository, where none of the pipeline steps are stored.
Thus, if I understand correctly, there is no need to track the dvc.lock file in the repository. However, adding dvc.lock to the .gitignore file leads to an error message:
ERROR: 'dvc.lock' is git-ignored.
Is there any way to disable the check for dvc.lock in .gitignore for this use case?
This is definitely possible, as DVC features are loosely coupled to one another. You can do pipelining by writing your dvc.yaml file(s), but avoid data management/versioning by using cache: false in the stage outputs (outs field). See also helper dvc stage add -O (big O, alias of --outs-no-cache).
The same goes for initial data dependencies: you can dvc add --no-commit them.
You do want to track dvc.lock in Git though, so that DVC can determine the latest stage of the pipeline associated with the Git commit in every repo copy or branch.
You'll be responsible for placing the right data files/dirs (matching .dvc files and dvc.lock) in the workspace for dvc repro or dvc exp run to behave as expected. dvc checkout won't be able to help you.

Do not re-create repositories after updating

We manage systems and thus manage repositories. We remove repositories which we do not use, present in /etc/yum.repos.d/<file>
Our problem is: after an update/upgrade of the system, CentOS automatically re-creates the repositories which were removed, which is an issue for us.
Question: Is there a command or method to ensure repositories are not re-created after an upgrade on CentOS 7 systems?
Those repositories are created by someone, the OS doesn't recreate them.
Either they are restored by an update of an RPM package such as centos-release, or by an automated script you set up and run (Ansible?).
I'm not aware of an automatic method to delete a repo; I see a couple of solutions:
1. Exclude centos-release from the upgradable packages, by adding
exclude=centos-release
to /etc/yum.conf (space separated list), but this could break some updates;
2. Disable them with:
# yum-config-manager --disable base,updates,extras,centosplus,epel,whatever
(this can be easily scripted and put in a cron or in your ansible playbook)
3. Write a small script and place it in /etc/cron.hourly/, e.g. /etc/cron.hourly/wipe_repos, containing:
#!/usr/bin/env bash
rm -f /etc/yum.repos.d/CentOS-Base.repo
or, better:
#!/usr/bin/env bash
yum-config-manager --disable base,updates,extras,centosplus,epel,whatever
I would suggest using solution 2, since the repo files aren't overwritten by updates; instead, the new versions are placed alongside the old ones as .rpmnew files.
This is guaranteed by the flag %config(noreplace) in the source rpm of centos-release, applied to all files in /etc/yum.repos.d/.
You can check this by downloading the .src.rpm and opening the centos-release.spec file.
$ mkdir test && cd test
$ yumdownloader --source centos-release
$ rpm2cpio centos-release*.rpm | cpio -idmv
$ cat centos-release.spec
(or search for the package online and download the src.rpm)
Then scroll down to section %files and you'll notice:
%config(noreplace) /etc/yum.repos.d/*
%config(noreplace) means that all those files are not replaced with new files from an update, but the files from the new rpm are saved with the extension .rpmnew, so you'll have:
$ ls /etc/yum.repos.d/
CentOS-Base.repo <-- here you set them as disabled
CentOS-Base.repo.rpmnew <-- this comes from the update, but yum will ignore it
For reference, see http://people.ds.cam.ac.uk/jw35/docs/rpm_config.html or https://serverfault.com/a/48819/.
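As far as I know, disabling a repository as in solution 2 comes down to having enabled=0 in its section of the .repo file. The following Python sketch shows that edit on an in-memory sample; the function name disable_repos and the SAMPLE text are made up for illustration, and note that configparser drops comments when rewriting, so yum-config-manager remains the safer tool on a real system.

```python
import configparser
import io


def disable_repos(repo_text, repo_ids):
    """Return the .repo file text with enabled=0 set for the given repo ids."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    for repo_id in repo_ids:
        if parser.has_section(repo_id):
            parser.set(repo_id, "enabled", "0")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical, shortened repo definitions used only for this demo.
SAMPLE = """[base]
name=CentOS-$releasever - Base
enabled=1

[updates]
name=CentOS-$releasever - Updates
enabled=1
"""

print(disable_repos(SAMPLE, ["base"]))
```

Because the file is edited rather than deleted, a later centos-release update should leave it alone and write its own copy as .rpmnew.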
As I already said in the comments below the question, the reason why those repositories keep reappearing after an update is quite simple: the files defining the system repositories are owned by the package centos-release and whenever this package gets updated or reinstalled, the repositories reappear.
The package centos-release is a very basic package, it provides the capabilities redhat-release and system-release, and a number of other basic packages depend on it.
[local ~]$ rpm -q --provides centos-release
centos-release = 7-6.1810.2.el7.centos
centos-release(upstream) = 7.6
centos-release(x86-64) = 7-6.1810.2.el7.centos
config(centos-release) = 7-6.1810.2.el7.centos
redhat-release = 7.6-1
system-release = 7.6-1
system-release(releasever) = 7
[local ~]$ rpm -q --whatrequires system-release
setup-2.8.71-10.el7.noarch
grubby-8.28-25.el7.x86_64
[local ~]$ rpm -q --whatrequires redhat-release
initscripts-9.49.46-1.el7.x86_64
systemd-219-62.el7_6.5.x86_64
There is no easy way out of this.
But one possible solution might be to create a customized RPM package to replace centos-release. It should contain the pointers to your own repositories and of course needs to provide the capabilities redhat-release and system-release.
Please be aware that I have no idea if this is actually going to work, it's just something that came to my mind while thinking about the problem. It might save you the work of creating a full custom distribution derived from CentOS, which is the only other way I can think of to achieve what you seem to want.
My solution doesn't exactly solve the problem you request ("how do I delete default repository config files forever?"), but it does stabilize your config changes. If you zero out the files instead of deleting them, then system updates will leave your 'edited' versions unchanged.
I do feel that this is a 'hack', leaving named ghost files, but it's one I can live with. No need to disable or customize redhat-release or system-release.
My problem was slightly different than yours - I maintained different configs for the same repositories for different situations, indicated by filename. On updates the original files would return, leaving me with redundant and incorrect definitions. Now they don't.

Git merge results in 400 rename/rename conflicts, how do I resolve them quickly?

So, I have a number of WordPress sites managed with a Git repository, all of which are branches off of a central upstream Git repository. I recently applied a bunch of updates to the parent repo, but one of the child website repos had a plugin updated to a different version and now throws up about 400 rename/rename conflicts. All of these conflicts are in an upstream plugin directory that would be safe to just resolve in favor of the upstream branch.
I want to do the following:
Ensure the upstream version of the files 'wins' the merge conflict (e.g. what the --theirs flag does with checkout)
Produce a mergeable history (If it's not safe for a coworker to type "git pull origin master" with an old repo, it's not an option. I'm religiously opposed to rebasing.)
Not restructure my Git repository (My hosting provider, Pantheon, will not install Composer dependencies at deploy time. Upstream plugins have to be part of the repo.)
Not get a repetitive stress injury (Has to be a reasonably small number of commands because I have to resolve these kinds of messes once a month or so.)
If I just type "git checkout wp-content/plugins/** --theirs", I get hit in the face with about 400 errors, and Git refuses to checkout the files. They look like this:
....400 or so errors omitted...
error: path 'wp-content/plugins/wordpress-seo/js/dist/wp-seo-quick-edit-handler-710.min.js' does not have their version
error: path 'wp-content/plugins/wordpress-seo/js/dist/wp-seo-quick-edit-handler-720.min.js' does not have their version
error: path 'wp-content/plugins/wordpress-seo/js/dist/wp-seo-recalculate-710.min.js' does not have their version
error: path 'wp-content/plugins/wordpress-seo/js/dist/wp-seo-recalculate-720.min.js' does not have their version
I categorically refuse to type 400 git rm/git add commands with each individual path included. git checkout --force is not an option, as --theirs and --force are mutually incompatible (for some reason). My current solution is to open Git GUI and manually right-click -> Use Remote Version and then click Yes... 400 times. I don't have to type the path at least but this is still time consuming.
How do I efficiently resolve a large number of rename/rename conflicts in favor of the remote repository?
Do you want to just resolve the conflicted files in favour of the remote, or just take a whole tree as it is in the remote?
For the latter, you could do this:
Just accept the files as-is with conflicts. git add . or similar
Commit the merge.
rm -Rf path/in/question
git checkout origin/branch -- path/in/question
git commit --amend -a
For the former, it's probably something pretty similar
Just accept the files as-is with conflicts. git add . or similar
Commit the merge.
Find files with conflicts. e.g. grep -r -l '>>>>' path/in/question > /tmp/conflicts.txt
Delete the files with conflicts, check out the desired versions, and amend the commit in the same way as above.
(If there are files/paths with spaces in them, small adjustments to the above commands may be necessary. I've given the simpler versions for clarity.)
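The "find files with conflicts" step can also be sketched in Python, which sidesteps the quoting problems with spaces in paths. This is only an illustration of the grep step above: the function name find_conflicted and the demo files are made up, and it looks for lines starting with the usual ">>>>>>> " marker.

```python
import os
import tempfile


def find_conflicted(root):
    """Return paths under `root` whose content has a '>>>>>>> ' marker line."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Read as bytes so minified or binary files do not raise errors.
            with open(path, "rb") as handle:
                if any(line.startswith(b">>>>>>> ") for line in handle):
                    hits.append(path)
    return sorted(hits)


# Demo on a throwaway directory with one clean and one conflicted file.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "clean file.js"), "w") as handle:
        handle.write("var a = 1;\n")
    with open(os.path.join(demo_root, "broken file.js"), "w") as handle:
        handle.write("<<<<<<< HEAD\na\n=======\nb\n>>>>>>> upstream/master\n")
    demo_hits = [os.path.basename(p) for p in find_conflicted(demo_root)]

print(demo_hits)
```

The resulting list can then be fed to the delete and checkout steps path by path.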

svn cleanup: sqlite: database disk image is malformed

I was trying to do a svn cleanup because I can't commit the changes in my working copy, and I got the following error:
sqlite: database disk image is malformed
What can I do right now?
First, open command/terminal at repository root (folder which has .svn as child folder):
cd /path/to/repository
Download sqlite3 and put the sqlite3 executable at the root of that folder.
You do an integrity check on the sqlite database that keeps track of the repository (/path/to/repository/.svn/wc.db):
sqlite3 .svn/wc.db "pragma integrity_check"
That should report some errors.
Then you might be able to clean them up by doing:
sqlite3 .svn/wc.db "reindex nodes"
sqlite3 .svn/wc.db "reindex pristine"
If there are still errors after that, you still have the option to check out a fresh copy of the repository to a temporary folder and copy the .svn folder from the fresh copy to the old one. Then the old copy should work again and you can delete the temporary folder.
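If the sqlite3 command-line shell is not at hand, the same check and reindex steps can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a tested SVN repair tool: the function name check_and_reindex is made up, the demo uses an in-memory database with stand-in nodes and pristine tables whose columns are assumptions, and on a real working copy it would be wise to run it against a copy of .svn/wc.db.

```python
import sqlite3


def check_and_reindex(conn, tables=("nodes", "pristine")):
    """Run integrity_check, REINDEX the given tables, then check again."""
    before = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    for table in tables:
        # Table names cannot be bound as parameters, so quote them by hand.
        conn.execute('REINDEX "{}"'.format(table.replace('"', '""')))
    conn.commit()
    after = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    return before, after


# Stand-in tables for the demo; a real wc.db has a much richer schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY)")
demo.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY)")
print(check_and_reindex(demo))
```

A healthy database reports a single "ok" row both before and after.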
Integrity check
sqlite3 .svn/wc.db "pragma integrity_check"
Clean up
sqlite3 .svn/wc.db "reindex nodes"
sqlite3 .svn/wc.db "reindex pristine"
Alternatively
You may be able to dump whatever contents of the database can still be read to a backup file, then slurp it back into a new database file:
sqlite3 .svn/wc.db
sqlite> .mode insert
sqlite> .output dump_all.sql
sqlite> .dump
sqlite> .exit
mv .svn/wc.db .svn/wc-corrupt.db
sqlite3 .svn/wc.db
sqlite> .read dump_all.sql
sqlite> .exit
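The dump-and-reload idea can also be sketched with Python's sqlite3 module, whose Connection.iterdump() produces SQL text similar to the shell's .dump. The function name rebuild and the demo table are made up for illustration; on a genuinely malformed file iterdump() may raise an error partway through, so treat this as a sketch of the technique rather than a guaranteed recovery.

```python
import sqlite3


def rebuild(source, target):
    """Replay everything readable from `source` into the empty `target`."""
    script = "\n".join(source.iterdump())
    target.executescript(script)
    target.commit()


# Demo with two in-memory databases standing in for the old and new files.
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
old.executemany("INSERT INTO demo (name) VALUES (?)", [("a",), ("b",)])
old.commit()

new = sqlite3.connect(":memory:")
rebuild(old, new)
print(new.execute("SELECT id, name FROM demo ORDER BY id").fetchall())
```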
The SVN cleanup didn't work. The SVN folder on my local system got corrupted. So I just deleted the folder, recreated a new one, and updated from SVN. That solved the problem!
After a power blackout, I ran into the database disk image is malformed error and the suggested reindex nodes command did not fix all issues due to violated constraints. Also the procedure described in http://mail-archives.apache.org/mod_mbox/subversion-users/201111.mbox/%3C874nybhpxi.fsf#stat.home.lan%3E did not resolve the problem.
Solution in my case:
Checkout the svn repository again into a temporary folder
Copy, i.e. replace, the file ".svn/wc.db" from the new checkout to the corrupt one
This may be useful, if your original svn checkout contains many modified or unversioned files and you don't want to switch to a fresh svn checkout.
I copied over .svn folder from my peer worker's directory and that fixed the issue.
Do not waste your time on checking integrity or deleting data from work queue table because these are temporary solutions and it will hit you back after a while.
Just do another checkout and replace the existing .svn folder with the new one. Do an update and then it should go smooth.
Check out this SVN repository in another place
Show the hidden .svn folder
Replace the wc.db file
this works for me!
This might be a solution:
right mouse click over project
team -> disconnect
Select: Also delete ...
Now, re-connect again:
right mouse click over project
team -> Share project
select your repository type: mine is SVN (in other cases: git, etc.)
select your repository folder
Note:
In my case, I made a backup of my files first. (Cover your back :P)
Edit:
I am talking about SVN plugin on Eclipse :)
Have you seen this post on the subversion site? You could also potentially try validating and "fixing" the database directly as described here. (Note that I'm no expert, I just did a quick google search. May not be related to your issues at all).
Personally, I'd try checking out the repo again and reapplying your changes. Not sure if this is possible though in your case?
In my research, I've found 2 viable solutions.
If you're using any type of connections, ssh, samba, mounting, disconnect/unmount and reconnect/remount. Try again, this often resolved the problem for me. After that you can do svn cleanup or just keep on working normally (depending on when the problem appeared). Rebooting my computer also fixed the problem once... yes it's dumb I know!
Sometimes all there is to do is to rm -rf your files (or, if you're not familiar with the term, just delete your svn folder) and check out your svn repository once again. Please note that this does not always solve the problem, and you might also have changes you don't want to lose, which is why I use it as the second option.
Hope this helps you guys!
I solved my problem of VisualSVN Server rep-cache.db corruption.
There are two solutions.
Stop the VisualSVN Server service.
Download the sqlite3.exe shell from the SQLite website and copy it into the repo's db folder.
Type the following commands at a command prompt in the repo's db folder.
-- First Solution --
sqlite3 rep-cache.db
.clone rep-cache-new.db
Press Ctrl+C to exit sqlite3.
ren rep-cache.db rep-cache-old.db
ren rep-cache-new.db rep-cache.db
-- 2nd Solution --
Delete rep-cache.db:
del rep-cache.db
It will be recreated automatically.
If you have TortoiseSVN installed, go to Task Manager and stop it.
Then try to delete the folder. It will work.
I fixed this for an instance of it happening to me by deleting the hidden .svn folder and then performing a checkout on the folder to the same URL.
This did not overwrite any of my modified files & just versioned all of the existing files instead of grabbing fresh copies from the server.
The accepted answer may well be the correct one for svn cleanup, but the error is a generic SQLite one, which is what led me to this question.
Our project has the dependency System.Data.SQLite and the error message was the same:
database disk image is malformed
In my case, I executed the following check script, and the statements after it, via SQLiteStudio 3.1.1.
pragma integrity_check
(I don't have any idea if these statistics would help, but I'm going to share them anyway...)
The database file had been in everyday use for 1.5 years, with the connection's journal mode set to Memory, and was about 750 MB in size. There were approximately 140K records per table, and 6 tables were this large.
The integrity check returned 11 rows after 30 minutes of execution time.
wrong # of entries in index sqlite_autoindex_MyTableName_1
wrong # of entries in index MyOtherTableAndOrIndexName_1
wrong # of entries in index sqlite_autoindex_MyOtherTableAndOrIndexName_2
etc...
All the results were about the indexes.
After rebuilding each of these indexes, my problem was resolved.
reindex sqlite_autoindex_MyTableName_1;
reindex MyOtherTableAndOrIndexName_1;
reindex sqlite_autoindex_MyOtherTableAndOrIndexName_2;
After re-indexing, the integrity check returned "ok".
I got this error last year as well; that time I restored the DB from a backup and then re-committed all the changes, which was a real nightmare...
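With many damaged indexes, writing the reindex statements by hand gets tedious. The sketch below turns integrity_check messages of the form shown above into REINDEX statements; the function name reindex_statements is made up, and it only recognizes the "wrong # of entries in index" wording, so other kinds of messages are skipped.

```python
import re

_PATTERN = re.compile(r"^wrong # of entries in index (\S+)$")


def reindex_statements(messages):
    """Turn 'wrong # of entries in index X' rows into REINDEX statements."""
    statements = []
    for message in messages:
        match = _PATTERN.match(message.strip())
        if match:
            # Double any embedded quotes so the identifier stays valid SQL.
            name = match.group(1).replace('"', '""')
            statements.append('REINDEX "{}";'.format(name))
    return statements


rows = [
    "wrong # of entries in index sqlite_autoindex_MyTableName_1",
    "wrong # of entries in index MyOtherTableAndOrIndexName_1",
    "some other message",
]
for statement in reindex_statements(rows):
    print(statement)
```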
Check the free space on the local machine where you are trying to check out the data. In my case my C drive didn't have enough space for a complete checkout, which is why the error appeared :)
No need to worry about a directory lock, guys.
All you need to do is this:
If sqlite3 is not installed, type the command below:
>sudo apt-get install sqlite3
Open the SVN database by typing this command:
>sqlite3 .svn/wc.db
Now all you need to do is remove the lock entries from the SVN DB.
sqlite> select * from wc_lock;
1|-1
sqlite> delete from wc_lock;
sqlite> select * from wc_lock;
sqlite> .q
Process completed. You can now work on your SVN repository and do commit, update, add, and remove operations without issue.
:-)
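The same lock cleanup can be sketched with Python's sqlite3 module. The function name clear_locks is made up, and the demo table is a simplified stand-in with two assumed columns to match the "1|-1" output above; the real wc_lock table in a working copy may have a different layout.

```python
import sqlite3


def clear_locks(conn):
    """Delete all rows from wc_lock and return how many were there."""
    count = conn.execute("SELECT COUNT(*) FROM wc_lock").fetchone()[0]
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM wc_lock")
    return count


# Simplified stand-in for the lock table, for the demo only.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE wc_lock (wc_id INTEGER, locked_levels INTEGER)")
demo.execute("INSERT INTO wc_lock VALUES (1, -1)")
demo.commit()

removed = clear_locks(demo)
remaining = demo.execute("SELECT COUNT(*) FROM wc_lock").fetchone()[0]
print(removed, remaining)
```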
During app development I found that these messages come from frequent and massive INSERT and UPDATE operations.
Make sure to INSERT and UPDATE multiple rows in one single operation.
// Build one batched statement instead of issuing many separate UPDATEs.
var updateStatementString = ""
for item in cardids {
    updateStatementString += "UPDATE \(TABLE_NAME) SET pendingImages = '\(pendingImage)' WHERE cardId = '\(item)';"
}
print(updateStatementString)
// Execute the whole batch in a single call.
let results = dbManager.sharedInstance.update(updateStatementString: updateStatementString)
return Int64(results)
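For comparison, here is the same "many updates in one operation" idea sketched in Python rather than Swift, using sqlite3's executemany inside a single transaction. The table name cards and the function name set_pending_images are hypothetical; bound parameters replace the string concatenation, which also avoids quoting problems in the values.

```python
import sqlite3


def set_pending_images(conn, pending_image, card_ids):
    """Update many rows in one transaction using bound parameters."""
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "UPDATE cards SET pendingImages = ? WHERE cardId = ?",
            [(pending_image, card_id) for card_id in card_ids],
        )


# Demo table with assumed column types.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE cards (cardId TEXT PRIMARY KEY, pendingImages TEXT)")
demo.executemany("INSERT INTO cards VALUES (?, ?)", [("c1", "0"), ("c2", "0"), ("c3", "0")])
demo.commit()

set_pending_images(demo, "2", ["c1", "c3"])
print(demo.execute("SELECT cardId, pendingImages FROM cards ORDER BY cardId").fetchall())
```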
cd to folder containing .svn
rm -rf .svn
svn co http://mon.svn/mondepot/ . --force

Resources