svn2git large repository migration

svn2git large repository migration - svn2git

I tried svn2git to test a migration from svn to git, like this:
svn2git https://my-svn.net/project/ --username $USERNAME
I execute this command where git is installed. I perform this in /home/myuser/svn2git/. After some waiting the execution of the commands end:
...
A src/xx
A src/xx
A src/xx
A src/xx
I do not see the repository getting downloaded to the location where i am running this command from. After the command runs fine, If i commit and push to GIT i do not see anything getting committed. I see .git folder getting created and on the display with verbose i can see all my history.
Can anyone tell me what is incorrect here?

For a one-time migration git-svn is not the right tool for conversions of repositories or parts of a repository. It is a great tool if you want to use Git as frontend for an existing SVN server, but for one-time conversions you should not use git-svn, but svn2git which is much more suited for this use-case.
There are plenty tools called svn2git, the probably best one is the KDE one from https://github.com/svn-all-fast-export/svn2git. I strongly recommend using that svn2git tool. It is the best I know available out there and it is very flexible in what you can do with its rules files.
The svn2git tool you used (the nirvdrum one) is based on git-svn and just tries to work around some of the quirks and drawbacks. In thus that svn2git tool suffers from most of the same drawbacks.
You will be easily able to configure svn2gits rule file to produce the result you want from your current SVN layout, including any complex histories that might exist and including producing several Git repos out of one SVN repo or combining different SVN repos into one Git repo cleanly in one run if you like and also excluding any paths or branches or tags you don't to have migrated, though I'm always crying a little if someone discards code history which is precious.
If you are not 100% about the history of your repository, svneverever from http://blog.hartwork.org/?p=763 is a great tool to investigate the history of an SVN repository when migrating it to Git.
Even though git-svn or the nirvdrum svn2git is easier to start with, here are some further reasons why using the KDE svn2git instead of git-svn is superior, besides its flexibility:
the history is rebuilt much better and cleaner by svn2git (if the correct one is used), this is especially the case for more complex histories with branches and merges and so on
the tags are real tags and not branches in Git
you can generate annotated tags instead of lightweight tags
with git-svn the tags contain an extra empty commit which also makes them not part of the branches, so a normal fetch will not get them until you give --tags to the command as by default only tags pointing to fetched branches are fetched also. With the proper svn2git tags are where they belong
if you changed layout in SVN you can easily configure this with svn2git, with git-svn you will loose history eventually
with svn2git you can also split one SVN repository into multiple Git repositories easily
or combine multiple SVN repositories in the same SVN root into one Git repository easily
the conversion is a gazillion times faster with the correct svn2git than with git-svn
You see, there are many reasons why git-svn is worse and the KDE svn2git is superior. :-)

Related

In GitAhead, is there a way to delete a remote branch when I already deleted the corresponding local branch?

I've already deleted a local branch without deleting its upstream branch on a GitHub.
Is there a way to delete the remote branch in a GitAhead?
In Sourcetree you just right click on the remote branch and choose delete.

Unfortunately no, GitAhead doesn't have an easy way to push a delete except for the little convenience checkmark when you delete the local branch. You would have to resort to the command line or doing it on your remote host.

This is a major design flaw in Git (in my opinion). The branching concept in mercurial is much more sane (no detached heads, named branches, no ability to delete branches - at least not if it was published once).
think about it: it is a versioning control system. What you want is to preserve and document development history. Someone has created and published a branch for purpose. So even if git allows manipulation of the repository much more than mercurial (and here most abilities are disabled by default!), just do not use it. leave the branch! Its OK. Its the way it should be. Anyway, as git is lean here, its just a pointer (not a real named branch like in hg), it does not take much space.

R Studio - Cloning local repository

I want to create a master repository on our server, from which I can clone a local version onto my computer.
I am using R Studio v0.98.994.
So far, this is what I have tried doing:
Create a folder for the master repository to live in. I do this using 'new project' in R studio, and tell it to make a git repository.
I can then open up another new project, located on my C drive, and use R studio to clone, by telling it to open an existing project and setting the URL as the location of the master project.
However, then when I make changes and commit to my local repository (which works fine) I cannot push to the master repository, I get an error exactly as described in this question: git push fails: `refusing to update checked out branch: refs/heads/master`
So it appears that R Studio creates non-bare repositories?
Now I thought, well okay, I will use git bash to initialise the repository and then connect to that within R studio.
I do so, but cannot then find a way to use that repository in R Studio.
I am very new to Git, so it is entirely probable that this is one of those 'read the instructions' questions, in which case I am very sorry - and could someone possibly point me towards some guidance for this situation? I have spent the better half of a day googling around this error and haven't yet managed to pull together the pieces :( I also apologise; this doesn't feel like a very reproducible question.

It sounds like you are using Windows Git, with a setup on a local Windows machine (C: drive) and a server of some kind, mounted as the S: drive. There's a few things you should be aware of when doing this.
Shared Repositories
If you are intending for multiple people to share the same repository, you want to initiate a shared repository. See the --shared option in git-init for more details. Note that I'm not sure how having your repository on a Windows machine affects the sharing options. If you are just trying to keep your repository in two places, that makes things a lot easier.
Bare Repositories
Separate from the discussion of sharing is the discussion of bare repositories. If you don't intend to ever work with files in the server (i.e. it's just going to be a place to push changes so they are safely stored), you could initialize a bare repository. A bare repository contains the database structure of Git, but does not have the actual files in the directory.
A standard Git repository is a directory with a hidden folder in it named .git. This .git folder contains all the various data structures that Git uses to track changes. A bare repository is essentially a folder containing only the contents of .git.
The good thing about a bare repository is that no one can work in the repository itself (since there is no working directory, just the database). This means that no one could log into S: and edit the repository themselves. Instead, they would have to clone the repository, then push their changes back to the origin. The GitGuys have a good article about why this is ideal.
Note that shared repos and bare repos are not dependent or mutually exclusive. As a general practice, if you are having a "server repo" from which you pull and to which you push, you should have it be bare, regardless of whether the project is shared.
A Non-Shared Workflow
Since it's not clear if you are sharing or not sharing and you're on a Windows environment, which I don't know about from a sharing standpoint, I'm going to give you a simple example. Using git-bash, you should be able to change directories to wherever on S: you have your repositories. Then, use git init with the bare options as described by the link above to initialize a bare repository. Navigate to where you want your repository to live on C:, and then do git clone to get a working copy.
Add a README file or something else so you can do your initial commit, and then commit and do git push origin master to push your changes to the S: repository. Once all that is done, THEN initialize the RStudio Git project. RStudio should defer to your existing configuration, and things should hopefully work.

Single git repo setup tracking multiple locations on hard drive

I'm very new to the world of git (done some svn in the past) and would like some advice on trying to accomplish the following.
My current workflow is that I setup the static html files using Middleman to get the base HTML structure and styles before porting over to a Wordpress template. These static files are located at C:/git/project-name/HTMLTemplates.
My wordpress setup uses Xampp so the theme files are kept in C:/Xampp/wordpress/wp-content/themes/project-theme.
What I would like to do is have a single git repo that tracks the changes of the two different locations (HTMLTemplates and project-theme)
Is this at all possible, or do I simply create two individual repos (eg: proect-static and project-wordpress)?

No, there is no mechanism in git for this. Git assumes that all files that it manages (the "working copy") live in a single directory (and subdirectories); there is no support for managing two separate directories in in repo.
So you'll have to somehow keep everything in one directory, probably as subdirectories HTMLTemplates and theme or similar.
You could use two git repos, but I'd strongly advise against this. A single repo should contain a whole "project", i.e. everything needed to build one piece of software (excluding things like external libraries). If you split your project across two repositories, you cannot usefully branch and merge (because you'd have to do it in both repos simultaneously), you cannot easily check out old versions etc..
To solve your problem, I see a few possible solutions:
Have some build / deployment script that copies everything to the right places. You probably alread have a script that invokes Middleman, and possibly tells Wordpress to refresh its cache, so you could add it there.
Set up a symbolic link for the wordpress directory. On UNIX-like systems this is easy and commonly done. On Windows, you can create "junction points", which I believe work similarly.
Configure Wordpress / Apache to read the directory directly from your git working copy. The path should be configurable.
I would prefer the first solution; this has the added advantage that it will decouple your development environment from the server configuration. This will make it easier if your setup later changes or your project needs to run in a different environment (development on a different machine, someone else also wants to work on your project, you want to deploy to a hosted server somewhere etc.).
Note: The problem is, I believe, that your are trying to use git as a deployment tool. While many people do this, git is not really suitable for this purpose. Deployment should usually be a separate step.

Release Process Improvements

The process of creating a new build and releasing it to production is a critical step in the SDLC but it is often left as an afterthought and varies greatly from one company to the next.
I'm hoping people will share improvements they have made to this process in their organisation so we can all takes steps to 'reduce the pain'.
So the question is, specify one painful/time consuming part of your release process and what did you do to improve it?
My example: at a previous employer all developers made database changes on one common development database. Then when it came to release time, we used Redgate's SQL Compare to generate a huge script from the differences between the Dev and QA databases.
This works reasonably well but the problems with this approach are:-
ALL changes in the Dev database are included, some of which may still be 'works in progress'.
Sometimes developers made conflicting changes (that were not noticed until the release was in production)
It was a time consuming and manual process to create and validate the script (by validate I mean, try to weed out issues like problem 1 and 2).
When there were problems with the script (eg the order in which things were run such as creating a record which relies on a foreign key record which is in the script but not yet run) it took time to 'tweak' it so it ran smoothly.
It's not an ideal scenario for Continuous Integration.
So the solution was:-
Enforce a policy of all changes to the database must be scripted.
A naming convention was important for ensuring the correct running order of the scripts.
Create/Use a tool to run the scripts at release time.
Developers had their own copy of the database do develop against (so there was no more 'stepping on each others toes')
The next release after we started this process was much faster with fewer problems, indeed the only problems found were due to people 'breaking the rules', eg not creating a script.
Once the issues with releasing to QA were fixed, when it came time to release to production it was very smooth.
We applied a few other changes (like introducing CI) but this was the most significant, overall we reduced release time from around 3 hours down to a max of 10-15 minutes.

We've done a few things over the past year or so to improve our build process.
Fully automated and complete build. We've always had a nightly "build" but we found that there are different definitions for what constitutes a build. Some would consider it compiling, usually people include unit tests, and sometimes other things. We clarified internally that our automated build literally does everything required to go from source control to what we deliver to the customer. The more we automated various parts, the better the process is and less we have to do manually when it's time to release (and less worries about forgetting something). For example, our build version stamps everything with svn revision number, compiles the various application parts done in a few different languages, runs unit tests, copies the compile outputs to appropriate directories for creating our installer, creates the actual installer, copies the installer to our test network, runs the installer on the test machines, and verifies the new version was properly installed.
Delay between code complete and release. Over time we've gradually increased the amount of delay between when we finish coding for a particular release and when that release gets to customers. This provides more dedicated time for testers to test a product that isn't changing much and produces more stable production releases. Source control branch/merge is very important here so the dev team can work on the next version while testers are still working on the last release.
Branch owner. Once we've branched our code to create a release branch and then continued working on trunk for the following release, we assign a single rotating release branch owner that is responsible for verifying all fixes applied to the branch. Every single check-in, regardless of size, must be reviewed by two devs.

We were already using TeamCity (an excellent continuous integration tool) to do our builds, which included unit tests. There were three big improvements were mentioning:
1) Install kit and one-click UAT deployments
We packaged our app as an install kit using NSIS (not an MSI, which was so much more complicated and unnecessary for our needs). This install kit did everything necessary, like stop IIS, copy the files, put configuration files in the right places, restart IIS, etc. We then created a TeamCity build configuration which ran that install kit remotely on the test server using psexec.
This allowed our testers to do UAT deployments themselves, as long as they didn't contain database changes - but those were much rarer than code changes.
Production deployments were, of course, more involved and we couldn't automate them this much, but we still used the same install kit, which helped to ensure consistency between UAT and production. If anything was missing or not copied to the right place it was usually picked up in UAT.
2) Automating database deployments
Deploying database changes was a big problem as well. We were already scripting all DB changes, but there were still problems in knowing which scripts were already run and which still needed to be run and in what order. We looked at several tools for this, but ended up rolling our own.
DB scripts were organised in a directory structure by the release number. In addition to the scripts developers were required to add the filename of a script to a text file, one filename per line, which specified the correct order. We wrote a command-line tool which processed this file and executed the scripts against a given DB. It also recorded which scripts it had run (and when) in a special table in the DB and next time it did not run those again. This means that a developer could simply add a DB script, add its name to the text file and run the tool against the UAT DB without running around asking others what scripts they last ran. We used the same tool in production, but of course it was only run once per release.
The extra step that really made this work well is running the DB deployment as part of the build. Our unit tests ran against a real DB (a very small one, with minimal data). The build script would restore a backup of the DB from the previous release and then run all the scripts for the current release and take a new backup. (In practice it was a little more complicated, because we also had patch releases and the backup was only done for full releases, but the tool was smart enough to handle that.) This ensured that the DB scripts were tested together at every build and if developers made conflicting schema changes it would be picked up quickly.
The only manual steps were at release time: we incremented the release number on the build server and copied the "current DB" backup to make it the "last release" backup. Apart from that we no longer had to worry about the DB used by the build. The UAT database still occasionally had to be restored from backup (eg. since the system couldn't undo the changes for a deleted DB script), but that was fairly rare.
3) Branching for a release
It sounds basic and almost not worth mentioning, yet we weren't doing this to begin with. Merging back changes can certainly be a pain, but not as much of a pain as having a single codebase for today's release and next month's! We also got the person who made the most changes on the release branches to do the merge, which served to remind everyone to keep their release branch commits to an absolute minimum.

Automate your release process whereever possible.
As others have hinted, use different levels of build "depth". For instance a developer build could make all binaries for runnning your product on the dev machine, directly from the repository while an installer build could assemble everything for installation on a new machine.
This could include
binaries,
JAR/WAR archives,
default configuration files,
database scheme installation scripts,
database migration scripts,
OS configuration scripts,
man/hlp pages,
HTML documentation,
PDF documentation
and so on. The installer build can stuff all this into an installable package (InstallShield, ZIP, RPM or whatever) and even build the CD ISOs for physical distribution.
The output of the installer build is what is typically handed over to the test department. Whatever is not included in the installation package (patch on top of the installation...) is a bug. Challenge your devs to deliver a fault free installation procedure.

Automated single step build. The ant build script edits all the installer configuration files, program files that need changed ( versioning) and then builds. No intervention required.
There is still a script run to generate the installers when it's done, but we will eliminate that.
The CD artwork is versioned manually; that needs fixed too.

Agree with previous comments.
Here is what has evolved where I work. This current process has eliminated the 'gotchas' that you've described in your question.
We use ant to pull code from svn (by tag version) and pull in dependencies and build the project (and at times, also to deploy).
Same ant script (passing params) is used for each env (dev, integration, test, prod).
Project process
Capturing requirements as user 'stories' (helps avoid quibbling over an interpretation of a requirement, when phrased as a meaningful user interaction with the product)
following an Agile principles so that each iteration of the project (2 wks) results in demo of current functionality and a releasable, if limited, product
manage release stories throughout the project to understand what is in and out of scope (and prevent confusion abut last minute fixes)
(repeat of previous response) Code freeze, then only test (no added features)
Dev process
unit tests
code checkins
scheduled automated builds (cruise control, for example)
complete a build/deploy to an integration environment, and runs smoke test
tag the code and communicate to team (for testing and release planning)
Test process
functional testing (selenium, for example)
executing test plans and functional scenarios
One person manages the release process, and ensures everyone complies. Additionally all releases are reviewed a week before launch. Releases are only approved if there are:
Release Process
Approve release for a specific date/time
Review release/rollback plan
run ant with 'production deployment' parameter
execute DB tasks (if any) (also, these scripts can be version and tagged for production)
execute other system changes / configs
communicate changes

I don't know or practice SDLC, but for me, these tools have been indispensible in achieving smooth releases:
Maven for build, with Nexus local repository manager
Hudson for continuous integration, release builds, SCM tagging and build promotion
Sonar for quality metrics.
Tracking database changes to development db schema and managing updates to qa and release via DbMaintain and LiquiBase

On a project where I work we were using Doctrine's (PHP ORM) migrations to upgrade and downgrade the database. We had all manner of problems as the generated models no longer matched with the database schema causing the migrations to completely fail half way.
In the end we decided to write our own super basic version of the same thing - nothing fancy, just up's and down's that execute SQL. Anyway it worked out great (so far - touch wood). Although we were reinventing the wheel slightly by writing our own, the fact that the focus was on keeping it simple meant that we have far less problems. Now a release is a cinch.
I guess the moral of the story here is that it is sometimes OK to reinvent the wheel some times as long as you are doing so for a good reason.

Improving Your Build Process

Or, actually establishing a build process when there isn't much of one in place to begin with.
Currently, that's pretty much the situation my group faces. We do web-app development primarily (but no desktop development at this time). Software deployments are ugly and unwieldy even with our modest apps, and we've had far too many issues crop up in the two years I have been a part of this team (and company). It's past time to do something about that, and the upshot is that we'll be able to kill two Joel Test birds with one stone (daily builds and one-step builds, neither of which exists in any form whatsoever).
What I'm after here is some general insight on the kinds of things I need to be doing or thinking about, from people who have been in software development for longer than I have and also have bigger brains. I'm confident that will be most of the people currently posting in the beta.
Relevant Tools:
Visual Build
Source Safe 6.0 (I know, but I can't do anything about whether or not we use Source Safe at this time. That might be the next battle I fight.)
Tentatively, I've got a Visual Build project that does this:
Get source and place in local directory, including necessary DLLs needed for project.
Get config files and rename as needed (we're storing them in a special sub directory that isn't part of the actual application, and they are named according to use).
Build using Visual Studio
Precompile using command line, copying into what will be a "build" directory
Copy to destination.
Get any necessary additional resources - mostly things like documents, images, and reports that are associated with the project (and put into directory from step 5). There's a lot of this stuff, and I didn't want to include it previously. However, I'm going to only copy changed items, so maybe it's irrelevant. I wasn't sure whether I really wanted to include this stuff in earlier steps.
I still need to coax some logging out of Visual Build for all of this, but I'm not at a point where I need to do that yet.
Does anyone have any advice or suggestions to make? We're not currently using a Deployment Project, I'll note. It would remove some of the steps necessary in this build I presume (like web.config swapping).

When taking on a project that has never had an automated build process, it is easier to take it in steps. Do not try to swallow to much at one time, otherwise it can feel overwhelming.
First get your code compiling with one step using an automated build program (i.e. nant/msbuild). I am not going to debate which one is better. Find one that feels comfortable to you and use it. Have the build scripts live with the project in source control.
Figure out how you want your automated build to be triggered. Whether it is hooking it up to CruiseControl or running a nightly build task using Scheduled Tasks. CruiseControl or TeamCity is probably the best choice for this, because they include a lot of tools you can use to make this step easier. CruiseControl is free and TeamCity is free to a point, where you might have to pay for it depending on how big the project is.
Ok, by this point you will be pretty comfortable with the tools. Now you are ready to add more tasks based on what you want to do for testing, deployment, and etc...
Hope this helps.

I have a set of Powershell scripts that do all of this for me.
Script 1: Build - this one is simple, it is mostly handled by a call to msbuild, and also it creates my database scripts.
Script 2: Package - This one takes various arguments to package a release for various environments, such as test, and subsets of the production environment, which consists of many machines.
Script 3: Deploy - This is run on each individual machine from within the folder created by the Package script (the Deploy script is copied in as a part of packaging)
From the deploy script, I do sanity checks on things like the machine name so things don't accidentally get deployed to the wrong place.
For web.config files, I use the
<appSettings file="Local.config">
feature to have overrides that are already on the production machines, and they are read-only so they don't accidentally get written over. The Local.config files are not checked in, and I don't have to do any file switching at build time.
[Edit] The equivalent of appSettings file= for a config section is configSource="Local.config"

We switched from using a perl script to MSBuild two years ago and haven't looked back.
Building visual studio solutions can be done by just specifying them in the main xml file.
For anything more complicated (getting your source code, executing unit tests, building install packages, deploying web sites) you can just create a new class in .net deriving from Task that overrides the Execute function, and then reference this from your build xml file.
There is a pretty good introduction here:
introduction

I've only worked on a couple of .Net projects (I've done mostly Java) but one thing I would recommend is using a tool like NAnt. I have a real problem with coupling my build to the IDE, it ends up making it a real pain to set up build servers down the road since you have to go do a full VS install on any box that you want to build from in the future.
That being said, any automated build is better than no automated build.

Our build process is a bunch of homegrown Perl scripts that have evolved over a decade or so, nothing fancy but it gets the job done. One script gets the latest source code, another builds it, a third stages it to a network location. We do desktop application development so our staging process also builds install packages for testing and eventually shipping to customers.
I suggest you break it down to individual steps because there will be times when you want to rebuild but not get latest, or maybe just need to re-stage. Our scripts can also handle building from different branches so consider that also with whatever solution you develop.
Finally we have a dedicated build machine that rebuilds the trunk and maintenance branches every night and sends out an email with any problems or if it completed successfully.

One thing I would suggest ensure your build script (and installer project, if relevant in your case) is in source control. I tend to have a very simple script that just checks out\gets latest the "main" build script then launches it.
I say this b/c I see teams just running the latest version of the build script on the server but either never putting it in source control or when they do they only check it in on a random basis. If you make the build process to "get" from source control it will force you to keep the latest and greatest build script in there.

Our build system is a makefile (or two). It has been rather fun getting it working as it needs to run on both windows (as a build task under VS) and under Linux (as a normal "make bla" task). The really fun thing is that the build gets the actual file list from a .csproj file, builds (another) makefile from that, and run that. In the processes the make file actually calls it's self.
If that thought doesn't scare the reader, then (either they are crazy or) they can probably get make + "your favorite string mangler" to work for them.

We use UppercuT.
UppercuT uses NAnt to build and it is extremely easy to use.
http://code.google.com/p/uppercut/
Some good explanations here: UppercuT

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex