Is the default DVC behavior to store connection data in git?

I've recently started to play with DVC, and I was a bit surprised to see that the getting started docs suggest storing .dvc/config in git.
This seemed like a fine idea at first, but then I noticed that my Azure Blob Storage account (i.e. my Azure username) is also stored in .dvc/config, which means it would end up in git, making it less than ideal for team collaboration scenarios.
What's even less ideal (read: really scary) is that connection strings entered using dvc remote modify blah connection_string ... also end up in .dvc/config, which means they end up in git and, in the case of open source projects, in very interesting places.
Am I doing something obviously wrong? I wouldn't expect the getting started docs to go very deep into security issues, but I wouldn't expect them to store connection strings in source control either.
My base assumption is that I'm misunderstanding or misconfiguring something; I'd be curious to know what.

DVC has a few "levels" of config that can be controlled with the appropriate flag:
--local - repository level, ignored by git by default; intended for project-scoped, sensitive data
project (no flag) - same scope as above, but not ignored by git; intended for non-sensitive data (this is the default)
--global / --system - for config shared across multiple repositories.
More information can be found in the docs.
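For example, assuming a remote named "myazure" (the name and values below are placeholders), sensitive settings can be kept out of git by writing them to the local config:
# non-sensitive settings go to .dvc/config (tracked by git)
dvc remote modify myazure account_name mystorageaccount
# sensitive settings go to .dvc/config.local (ignored by git)
dvc remote modify --local myazure connection_string "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."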

Related

How do I reset Superset's db?

I've been testing stuff out with Superset and I think I corrupted my superset db. When I try to access any chart I get this error:
I found a workaround to this problem by searching with ag (the silver searcher) for the individual migration that dropped the dbs.perm table, and using the command
superset db downgrade <migration-id>
on the migration prior to that one.
It's still not very clear to me which steps I should take to completely reset the db safely.
I have the manual dev installation, since I'm working on customizing the code. Let's say I don't have anything too important in the db, so I'm not afraid to lose tables, users, perms, etc.
I've found I have a superset.db in ~/.superset, but I don't think deleting that will be enough, right?
How can I reset Superset's db so as to make a clean db and start over? Can I do this without losing my Superset installation, or do I need to start over completely? In any case, can you guide me through it?
You don't need to reinstall everything. Just remove the ~/.superset/superset.db file (take a backup of it first, in case you want to restore it) and then run the commands below. They will create a fresh database file.
Initialize the database:
$ superset db upgrade
Create an admin user (you will be prompted for a username, first and last name, and a password):
$ export FLASK_APP=superset
$ superset fab create-admin
Load some example data to play with:
$ superset load_examples
Create default roles and permissions:
$ superset init
Deleting superset.db in ~/.superset should be enough, and it's the cleaner way to start over. Note, however, that SQLite is not a recommended engine for the metadata database, and support for it may be removed entirely in the future.
I also recommend using the docker-compose setup provided in the Apache Superset repository for testing/development.
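If you do want to move the metadata database off SQLite, a minimal sketch of a superset_config.py pointing it at PostgreSQL might look like this (connection details are placeholders; the file must be importable, e.g. on your PYTHONPATH):
# superset_config.py
# use PostgreSQL instead of the default ~/.superset/superset.db SQLite file
SQLALCHEMY_DATABASE_URI = "postgresql+psycopg2://superset:secret@localhost:5432/superset"
After changing this, re-run superset db upgrade and superset init against the new database.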

When migrating from an old Artifactory instance to a new one, what is the point of copying $ARTIFACTORY_HOME/data/filestore?

Artifactory recommends the steps outlined here when moving from an old Artifactory server to a new one: https://jfrog.com/knowledge-base/what-is-the-best-way-to-migrate-a-large-artifactory-instance-with-minimal-downtime/
Under both methods it says you're supposed to copy over $ARTIFACTORY_HOME/data/filestore, but then you just go ahead and export the old data and import it into the new instance, and in the first method you also rsync the files. It seems like you're doing essentially the same thing three times in a row. JFrog doesn't really explain why each of these steps is necessary, and I don't understand what each does differently that cannot be done by the others.
When migrating an Artifactory instance, we need to take two things into consideration:
Artifactory Database - contains the information about the binaries, configuration, and security information (users, groups, permission targets, etc.)
Artifactory Filestore - contains all the binaries
Regardless of your questions, I would like to add that, from my experience, in the case of a big filestore (500GB+) it is recommended to use a skeleton export (export the database only, without the filestore; this can be done by checking "Exclude Content" in Export System) and to copy the filestore with the help of a 3rd-party tool such as rsync.
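A minimal rsync sketch for copying the filestore to the new server (the host and paths are hypothetical; the filestore lives under $ARTIFACTORY_HOME/data/filestore by default):
# initial bulk copy; re-run shortly before cutover so only the diffs are transferred
rsync -avh --progress /var/opt/jfrog/artifactory/data/filestore/ newserver:/var/opt/jfrog/artifactory/data/filestore/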
I hope this clarifies further.
The main purpose of this article is to provide a somewhat faster migration compared to a simple full export & import.
The idea of both methods is to select "Exclude Content". The content we exclude is exactly what is stored in $ARTIFACTORY_HOME/data/filestore/.
The difference between the methods is that method #1 involves some downtime, as you will have to shut down Artifactory at a certain point, sync the diffs, and start the new instance.
Method #2, on the other hand, involves a slightly more complex process that includes in-app replication to sync the diffs.
Hope that makes more sense.

Continuous deployment and db migration

This question is like "Which came first, the chicken or the egg?".
Let's imagine we have some source code, written using Symfony or Yii. It has migration code that handles some database changes.
Now we have some commits that update our code (for example, new classes) and some db changes (changing old columns or adding new tables).
When we're developing on localhost or updating our dev servers, it's fine to take the time to stop services and update the server. But when we try to do that on a production server, we will break everything for a while, and that is not an option.
Why does this happen? When we pull (git/mercurial), our code gets updated, but NOT the database, and when the code is executed it throws database exceptions. To fix this we have to run the framework's built-in migrations. So in the end our server is broken until the migrations have been run.
Code and migrations should be updated at the same time.
What is the best practice for handling this?
ADDED:
A solution like "run pull, then run migrations in one call" is not an option in a high-load project, because under high load even a one-second window can break some requests.
Stopping the server is not an option either.
Pulling off a zero downtime deployment can be a bit tricky and there are many ways to achieve this.
As for the database, it is recommended to make changes in a backwards-compatible fashion. For example, adding a nullable column or a new table will not affect your existing code base and can be done safely. If you want to add a new non-nullable column, you would do it in 3 steps:
Add new column as nullable
Populate with data to make sure there are no null-values
Make the column NOT NULL
You will need a new deployment for steps 1 & 3 at the very least. When modifying a column it's pretty much the same: you create a new column, transfer the data over, release the code that uses the new column (optionally with the old column as fallback), and then remove the old column (plus the fallback code) in a third deployment.
This way you make sure that your database changes will not cause a downtime in your existing application. This takes great care and obviously requires you to have a good deployment pipeline allowing for fast releases. If it takes hours to get a release out this method will not be fun.
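A minimal sketch of the three steps in MySQL-flavoured SQL (the table and column names are hypothetical), with each ALTER shipped in its own deployment:
-- deployment 1: add the column as nullable so the old code keeps working
ALTER TABLE users ADD COLUMN email VARCHAR(255) NULL;
-- between deployments: backfill so no NULL values remain
UPDATE users SET email = '' WHERE email IS NULL;
-- deployment 3 (once the code writing the column is live): tighten the constraint
ALTER TABLE users MODIFY COLUMN email VARCHAR(255) NOT NULL;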
You could copy the database (or even the whole system), do a migration and then switch to that instance, but in most applications this is not feasible because it will make it a pain to keep both instances in sync between deployments. I cannot recommend investing too much time in that, but I might be biased from my experience.
When it comes to switching the current version of your code to a newer one, you have multiple options. The fancy cloud-based solutions like Kubernetes make this kind of easy: you create a second cluster with your new version and then slowly route traffic from the old cluster to the new one. If you have a single server, it is quite common to deploy a new release to a separate folder, do all the management tasks like warming caches, and then, when the release is ready to be used, switch a symlink to the newest release.
Both methods require meticulous planning and tweaking if you really want them to be zero downtime. There are all kinds of things that can cause issues, from a shared cache being accidentally cleared to sessions not being transferred over correctly to the new release. Whenever something that's stored in a session changes, you have to take a similar approach as with the database and slowly move the state over while running the code, or keep a fallback that still handles the old data; otherwise you might get errors when reading the session, causing 500 pages for your customers.
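For the single-server symlink approach mentioned above, a minimal sketch (the paths, release name, and app-server service name are hypothetical):
# upload the new release next to the existing ones
rsync -a build/ /var/www/app/releases/20240115/
# warm caches and run other management tasks here, then switch atomically (GNU mv):
ln -s /var/www/app/releases/20240115 /var/www/app/current-new
mv -T /var/www/app/current-new /var/www/app/current
# reload the app server so workers pick up the new code path
sudo systemctl reload php-fpm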
The key to deploying with as few outages and glitches as possible is good monitoring of the systems and the application, so you can see where things go wrong during a deployment and make it more stable over time.
You can create a backup server with content that mirrors your current server. Then do some error detection.
If an error is detected on your primary server, update your DNS record to divert your traffic to your secondary server.
Once the primary is back up and running, traffic moves back to it, and you then sync the changes over to your secondary.
These are called failover servers.

SQLite: how to totally clear the shared-cache?

I'm experimenting with enabling the shared cache in a SQLite implementation I'm working on. In the actual app everything works fine, but my unit tests are now failing with "disk I/O error"s. I'm assuming this is because the shared cache is making assumptions about the file that are no longer valid once it's been deleted.
How can I clear out this shared cache data? I've tried running sqlite3_shutdown() followed by sqlite3_initialize() but the problems persist.
I've actually discovered that some of my tests weren't closing connections properly, and that was the source of my problem - shared-cache was just highlighting it (though I'm not 100% sure why).
That said, in my journey I did manage to find a way to control where SQLite puts its temporary files. The sqlite3_temp_directory global variable lets you define it - by default it's blank and defers to the OS, I think.
If you set that directory you can manually clear out any files whenever you wish.
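A minimal C sketch of setting that directory (the path is a placeholder; per the SQLite docs the string should come from an sqlite3_malloc-family allocator):
#include <sqlite3.h>

/* Point SQLite's temporary files at a directory we control,
   so stray temp files can be cleaned out manually between test runs. */
void set_sqlite_temp_dir(void) {
    sqlite3_temp_directory = sqlite3_mprintf("%s", "/var/tmp/my-sqlite-tmp");
}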

How to upgrade (merge) web.config with web deploy (msdeploy)?

I'm trying to set up a deployment chain for some of our ASP.NET applications. The tool of choice is Web Deploy (msdeploy) - for now. Unfortunately I'm stuck on a problem.
A high level overview of the chain is thus:
Web developer creates the code and checks it in SVN;
Buildserver sees the update and builds the msdeploy .zip package of the website;
The .zip package is automatically put inside our installer and sent to various clients;
The clients run the installer on their webserver(-s);
The installer uses msdeploy internally to deploy the .zip package and create a new website or upgrade an existing one.
Msdeploy makes it easy to deploy a new instance, but I'm stumped about how to perform an "upgrade" install. The main problem is the web.config file. Each client will most certainly have made some customizations there to suit their specific environment. The installer itself offers to set some more critical parameters at the first-time installation (achieved by msdeploy's parameter mechanism), but they can do others by hand.
On the other hand, we developers also occasionally make changes to web.config, adding some new settings or removing obsolete ones. So I can't just tell msdeploy to ignore the file entirely. I need some kind of advanced XML modification mechanism. It could be a script that the developers maintain, but then it needs to be run ONLY at upgrades, not new installs.
I've no idea how to accomplish this.
Besides that, sometimes there's also some completely weird upgrade logic. For example, the application comes with our company logo, but some clients have replaced that .png file to show their own logo. Recently we needed to update the logo - but only for clients that hadn't replaced it with their own.
Similarly, there might be some cache folders that might need to be cleaned at SOME upgrades but not at others. Or folders with user content that may not be touched (but come with default content at the initial installation). Etc.
How do you normally achieve this dual behavior for msdeploy packages? Do I really need to create 2 distinct packages for every application?
Suggestion from personal experience:
Isolate customisations
Your customers should have the ability to customise their setup, and the best way is to provide them with something like an override file. That way you install the new package and follow by superimposing your customer's customisations on top of your standard setup. If it's a brand new install, there will be nothing to superimpose.
top-level/
  standard files/          <- never touched or changed by the customer
    images
    settings.txt
  customer files/          <- customer hacks this to their heart's content
    images
    settings.txt_override
Yes, this does mean that some kind of merging process needs to happen, and there needs to be some script that does it, but this approach has several advantages.
For settings that suddenly become redundant just issue a warning to that effect
If a customer has their own logo provide the ability to specify this in the override file
The message is clear to customers. Stay off standard files.
If customers request more customisable settings, then during upgrades write the defaults into the override file if they do not already exist there.
Vilx, in answer to your question, the logic for knowing whether it is an upgrade or not must be contained in the script itself.
To run an upgrade script before installation
msdeploy -verb:sync -source:contentPath="C:\Test1" -dest:contentPath="C:\Test2" -preSync:runcommand="c:\UpgradeScript.bat"
Or to run an upgrade script after installation
msdeploy -verb:sync -source:contentPath="C:\Test1" -dest:contentPath="C:\Test2" -postSync:runcommand="c:\UpgradeScript.bat"
More info here
As to how you know it's an upgrade: your script could check for a text file called "version.txt", and if it exists, the upgrade bat script runs. The version number would be contained within the text file. A bit basic, but it should work.
This also has the added advantage of letting you merge the customer's custom settings between versions more elegantly, as you know which properties could be overridden for that particular version.
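A minimal batch sketch of that check (the file and script names are just the ones suggested above):
@echo off
setlocal EnableDelayedExpansion
REM UpgradeScript.bat - run upgrade steps only when a previous version is present
IF EXIST "%~dp0version.txt" (
    SET /P OLDVERSION=<"%~dp0version.txt"
    ECHO Upgrading from version !OLDVERSION!
    REM ... merge override settings, clean caches, etc. ...
) ELSE (
    ECHO Fresh install - skipping upgrade steps.
)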
These are some general suggestions (not specific to msdeploy), but I hope they help:
I think you'll need to provide several installers anyway: one for the initial setup and one for each version-to-version upgrade.
I would suggest letting your clients merge the config files themselves. You could provide them with a detailed description of what was added/changed/removed, and/or include a utility that simplifies the merge. Maybe this and this links will give you some pointers.
As for the replaced logos and other client customizations, I think the best approach would be to support branding your application; that is, move all branding details to a place that your new/upgrade installers won't touch.
As for the rest of the adjustments made by your clients, they make those at their own risk, so the only help you could provide is a detailed list of changes (maybe even a list of the files changed since the previous version) and a how-to article about merging the sources with tools like Araxis Merge or similar.
Or you could create a utility, included in the installer, that tries to do all the tricky merging on the client's machine. I would not recommend this approach, as it requires a lot of effort/resources to maintain.
One more thing: you could focus on backing up the previous client copy before the upgrade, so that even if a client has trouble upgrading, it is always possible to roll back. The only remaining thing is to provide a good feedback channel that your clients can use to report their troubles. This feedback will let you figure out what trouble your clients run into and how to make their upgrade process more comfortable.
I would build on what the above have said, but I would do it with transformations, and strict documentation about who configures what. The way you have it now relies on customer intervention against a config that is mission critical to the app deploy process.
Create three config file areas. One for development, one for the "production generic" build, and one that is an empty template for the customer to edit.
The development instance should be self-explanatory. This is the transform that takes the production-generic template and creates a web.config for your development server. (It sounds like you are shooting for a CI-type process here.)
The "production generic" transform should set the app up for a hypothetically perfect instance of the app. This is what the install would look like if the architect had his way.
The customer transform is used by the customers to set up the web config as required to meet their own needs. Write some documentation and see what happens. Edit the docs as you help customers through the process.
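A minimal sketch of what such a transform could look like using web.config transformation (XDT) syntax; the setting shown is purely hypothetical:
<!-- customer transform: applied on top of the production-generic web.config -->
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <appSettings>
    <!-- override a single setting without touching the rest of the file -->
    <add key="LogoPath" value="images/customer-logo.png"
         xdt:Transform="SetAttributes" xdt:Locator="Match(key)" />
  </appSettings>
</configuration>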
Is that what you were looking for? Thoughts?
