Retaining svn copy history when converting to git using kde svn2git tool - svn2git

My question is very similar to Retaining svn copy history when converting to git except that I am using the free kde svn2git tool
To summarize:
I'm trying to convert a single SVN repo over to a single git repo.
After migration of the svn repository to git I have noticed that git does not seem to follow svn copy operations so the resulting history is much briefer than I expect. Suppose my SVN repo looked like this:
root
a
b
c
parent project
b
c
Projects b and c were recently copied under parent-proj as part of a restructuring effort with the intention of eventually deleting them from their old locations under root. After migration the resulting git repo is missing all of the history that originated from /b and /c before the move.
I am using the kde svn2git tool which can be found at https://github.com/svn-all-fast-export/svn2git
I tried to look at the sample rules files and documentation for this tool but could not find any information on how to achieve this using this tool.

In my experience the situation you describe is a frequent occurance in SVN. People create an 'arbitrary trunk'.
Ideally projects should follow the Subversion conventions. That means that each project will have a subfolder called 'trunk, branches, tags'. Then when migrating with svn2git all you need is a rule for those three folders:
create repository fw
end repository
match /Project/trunk/
repository fw
branch master
end match
match /Project/branches/([^/]+)/
repository fw
branch \1
end match
match /Project/tags/([^/]+)/
repository fw
branch refs/tags/\1
end match
Unfortunately, if you are migrating a repository which didn't strictly follow the convention, then you can have arbitrary trunks, which means a completely separate history. So you need a separate rule, for example you might have a rule like this:
match /Project/trunk/subdir1/subdir2/myproj/trunk
repository fw
branch master
end match
You may need to write a script examine the svn paths, which generate rules based on paths discovered.

Related

svn2git large repository migration

I tried svn2git to test a migration from svn to git, like this:
svn2git https://my-svn.net/project/ --username $USERNAME
I execute this command where git is installed. I perform this in /home/myuser/svn2git/. After some waiting the execution of the commands end:
...
A src/xx
A src/xx
A src/xx
A src/xx
I do not see the repository getting downloaded to the location where i am running this command from. After the command runs fine, If i commit and push to GIT i do not see anything getting committed. I see .git folder getting created and on the display with verbose i can see all my history.
Can anyone tell me what is incorrect here?
For a one-time migration git-svn is not the right tool for conversions of repositories or parts of a repository. It is a great tool if you want to use Git as frontend for an existing SVN server, but for one-time conversions you should not use git-svn, but svn2git which is much more suited for this use-case.
There are plenty tools called svn2git, the probably best one is the KDE one from https://github.com/svn-all-fast-export/svn2git. I strongly recommend using that svn2git tool. It is the best I know available out there and it is very flexible in what you can do with its rules files.
The svn2git tool you used (the nirvdrum one) is based on git-svn and just tries to work around some of the quirks and drawbacks. In thus that svn2git tool suffers from most of the same drawbacks.
You will be easily able to configure svn2gits rule file to produce the result you want from your current SVN layout, including any complex histories that might exist and including producing several Git repos out of one SVN repo or combining different SVN repos into one Git repo cleanly in one run if you like and also excluding any paths or branches or tags you don't to have migrated, though I'm always crying a little if someone discards code history which is precious.
If you are not 100% about the history of your repository, svneverever from http://blog.hartwork.org/?p=763 is a great tool to investigate the history of an SVN repository when migrating it to Git.
Even though git-svn or the nirvdrum svn2git is easier to start with, here are some further reasons why using the KDE svn2git instead of git-svn is superior, besides its flexibility:
the history is rebuilt much better and cleaner by svn2git (if the correct one is used), this is especially the case for more complex histories with branches and merges and so on
the tags are real tags and not branches in Git
you can generate annotated tags instead of lightweight tags
with git-svn the tags contain an extra empty commit which also makes them not part of the branches, so a normal fetch will not get them until you give --tags to the command as by default only tags pointing to fetched branches are fetched also. With the proper svn2git tags are where they belong
if you changed layout in SVN you can easily configure this with svn2git, with git-svn you will loose history eventually
with svn2git you can also split one SVN repository into multiple Git repositories easily
or combine multiple SVN repositories in the same SVN root into one Git repository easily
the conversion is a gazillion times faster with the correct svn2git than with git-svn
You see, there are many reasons why git-svn is worse and the KDE svn2git is superior. :-)

R: RStudio: How to check out an existing branch, modify it and commit back to GitHub (Windows machine)

I have followed every advice on http://r-pkgs.had.co.nz/git.html and on the subsection http://r-pkgs.had.co.nz/git.html#git-branch and I am still getting error.
The steps I need/did (different from what Hadley's page dictates).
grab URL of GitHub repo (e.g, https://github.com/OHDSI/Achilles.git )
create versioned project in RStudio with this URL
set up my global user names for git
select a dev branch here (for example devXYZ)
At this point I got "detached at origin/devXYZ) message.
Per instructions in Hadley book - I tried to do fix this using this command
git push --set-upstream origin devXYZ
but it fails. The error is: origin does not appear to be a git repository or src refspec devXYZ does not match any
I tried fixing it with doing this command (may be wrong)
git remote add origin https://github.com/OHDSI/Achilles.git
I am using windows, latest R, latest RStudio, latest git from https://git-scm.com/download/win
EDIT: I also tried making a new branch using the recommended mechanism but it also fails. The goal is to get instructions where there is not git init and the whole process starts with an URL and new project in RStudio.
The desired future steps to work would be 5. modify and commit into the devXYZ branch.
THIS ONLY APPLIES TO NON-MASTER BRANCHES:
If you are newbie to git - simply don't try to do the git part in R at all.
Instead, use GitHub Desktop or SourceTree.
Point that tool to the desired repo, switch to desired branch
Start RStudio and do any development
Close RStudio and use that external tool to perform any git steps.
FOR MASTER BRANCHES:
integrated RStudio git implementation works great.
I think I might know what the problem is. You're trying to push directly to the main repo. I'm guessing you're not one of the main contributors for that repo so it won't allow you to create a branch there directly. I'm guessing in that book he's probably using his own repository as an example rather than using an existing one
The reason you're getting that error is because that branch doesn't exist on the remote repo so it can't get the reference to it which is inferred from this src refspec devXYZ does not match any
The preferred workflow is to work on a fork of the main repo (basically its your own personal copy of the main repo that is stored on the server). Even if you end up as a contributor at some point I think this is a good workflow to follow
Here's a good explanation on how use the fork workflow. There's other information on stackoverflow as well
Once you've made updates you'd create what's called a pull request to the original repo (commonly referred to as upstream). This basically is a request to merge your changes from the fork into the main repo. This allows the repo owner to review the changes and decide whether to accept them or make changes
Since you're just going over a tutorial I'd say use your fork as the origin wherever its used in the book for now

R Studio - Cloning local repository

I want to create a master repository on our server, from which I can clone a local version onto my computer.
I am using R Studio v0.98.994.
So far, this is what I have tried doing:
Create a folder for the master repository to live in. I do this using 'new project' in R studio, and tell it to make a git repository.
I can then open up another new project, located on my C drive, and use R studio to clone, by telling it to open an existing project and setting the URL as the location of the master project.
However, then when I make changes and commit to my local repository (which works fine) I cannot push to the master repository, I get an error exactly as described in this question: git push fails: `refusing to update checked out branch: refs/heads/master`
So it appears that R Studio creates non-bare repositories?
Now I thought, well okay, I will use git bash to initialise the repository and then connect to that within R studio.
I do so, but cannot then find a way to use that repository in R Studio.
I am very new to Git, so it is entirely probable that this is one of those 'read the instructions' questions, in which case I am very sorry - and could someone possibly point me towards some guidance for this situation? I have spent the better half of a day googling around this error and haven't yet managed to pull together the pieces :( I also apologise; this doesn't feel like a very reproducible question.
It sounds like you are using Windows Git, with a setup on a local Windows machine (C: drive) and a server of some kind, mounted as the S: drive. There's a few things you should be aware of when doing this.
Shared Repositories
If you are intending for multiple people to share the same repository, you want to initiate a shared repository. See the --shared option in git-init for more details. Note that I'm not sure how having your repository on a Windows machine affects the sharing options. If you are just trying to keep your repository in two places, that makes things a lot easier.
Bare Repositories
Separate from the discussion of sharing is the discussion of bare repositories. If you don't intend to ever work with files in the server (i.e. it's just going to be a place to push changes so they are safely stored), you could initialize a bare repository. A bare repository contains the database structure of Git, but does not have the actual files in the directory.
A standard Git repository is a directory with a hidden folder in it named .git. This .git folder contains all the various data structures that Git uses to track changes. A bare repository is essentially a folder containing only the contents of .git.
The good thing about a bare repository is that no one can work in the repository itself (since there is no working directory, just the database). This means that no one could log into S: and edit the repository themselves. Instead, they would have to clone the repository, then push their changes back to the origin. The GitGuys have a good article about why this is ideal.
Note that shared repos and bare repos are not dependent or mutually exclusive. As a general practice, if you are having a "server repo" from which you pull and to which you push, you should have it be bare, regardless of whether the project is shared.
A Non-Shared Workflow
Since it's not clear if you are sharing or not sharing and you're on a Windows environment, which I don't know about from a sharing standpoint, I'm going to give you a simple example. Using git-bash, you should be able to change directories to wherever on S: you have your repositories. Then, use git init with the bare options as described by the link above to initialize a bare repository. Navigate to where you want your repository to live on C:, and then do git clone to get a working copy.
Add a README file or something else so you can do your initial commit, and then commit and do git push origin master to push your changes to the S: repository. Once all that is done, THEN initialize the RStudio Git project. RStudio should defer to your existing configuration, and things should hopefully work.

Use Fossil for system files?

As a new user of Fossil, I'm curious if there are any negative implications with using Fossil to store things like /etc/, /usr/local/etc files from Unix like systems like FreeBSD & OpenBSD. If I'm doing this for multiple systems, I think I'd create a branch with each hostname to track those files.
Q1: Have you done this? Do you prefer a different VCS to handle the system files?
Q2: Lots of changes have happened in Fossil over the years and I'm curious if it's possible to restrict who can merge branches with trunk. From reading earlier threads it wasn't possible but there are two workarounds:
a) tell people not to merge to trunk
b) have people clone and trunk maintainer pick up changes from their repo
System configuration files stored in /etc, /var or /usr/local/etc can generally only edited by the root user. But since root has complete access to the whole system, a mistaken command there can have dire consequences.
For that reason I generally use another location to keep edited configuration files, a directory in my home-directory that I call setup, which is under control of git. Since I have multiple machines running FreeBSD, each machine gets its own subdirectory. There is a special subdirectory of setup called shared for those configuration files that are used on multiple machines. Maintaining multiple copies of identical files in separate repositories or even branches can be a lot of extra work.
My workflow is the following;
Edit a configuration file in my repository.
Copy it to its proper location.
Test the changes. If problems occur, go back to step 1.
Commit the changes to the revision control system. Copy the
committed files to their proper location.
Initially I had a shell script (basically a list of install commands) to install the files for me. But I also wanted to see the differences between the working tree and the installed files.
So for my convenience, I wrote a script called deploy to help me with this. It can tell me which files in the repo are different from the installed files and can show me the differences. It can also install files to their proper locations.

Single git repo setup tracking multiple locations on hard drive

I'm very new to the world of git (done some svn in the past) and would like some advice on trying to accomplish the following.
My current workflow is that I setup the static html files using Middleman to get the base HTML structure and styles before porting over to a Wordpress template. These static files are located at C:/git/project-name/HTMLTemplates.
My wordpress setup uses Xampp so the theme files are kept in C:/Xampp/wordpress/wp-content/themes/project-theme.
What I would like to do is have a single git repo that tracks the changes of the two different locations (HTMLTemplates and project-theme)
Is this at all possible, or do I simply create two individual repos (eg: proect-static and project-wordpress)?
No, there is no mechanism in git for this. Git assumes that all files that it manages (the "working copy") live in a single directory (and subdirectories); there is no support for managing two separate directories in in repo.
So you'll have to somehow keep everything in one directory, probably as subdirectories HTMLTemplates and theme or similar.
You could use two git repos, but I'd strongly advise against this. A single repo should contain a whole "project", i.e. everything needed to build one piece of software (excluding things like external libraries). If you split your project across two repositories, you cannot usefully branch and merge (because you'd have to do it in both repos simultaneously), you cannot easily check out old versions etc..
To solve your problem, I see a few possible solutions:
Have some build / deployment script that copies everything to the right places. You probably alread have a script that invokes Middleman, and possibly tells Wordpress to refresh its cache, so you could add it there.
Set up a symbolic link for the wordpress directory. On UNIX-like systems this is easy and commonly done. On Windows, you can create "junction points", which I believe work similarly.
Configure Wordpress / Apache to read the directory directly from your git working copy. The path should be configurable.
I would prefer the first solution; this has the added advantage that it will decouple your development environment from the server configuration. This will make it easier if your setup later changes or your project needs to run in a different environment (development on a different machine, someone else also wants to work on your project, you want to deploy to a hosted server somewhere etc.).
Note: The problem is, I believe, that your are trying to use git as a deployment tool. While many people do this, git is not really suitable for this purpose. Deployment should usually be a separate step.

Resources