JGit bare commit / tree construction - jgit

I'm trying to commit a single blob directly to a repository using JGit. I know how to insert the blob and get its SHA-1, but I'm having difficulty constructing a tree for this scenario. I can't figure out how to use JGit's tree abstractions (TreeWalk and the like) to recursively construct a tree that is almost identical to the previous commit's tree and differs only in the parent trees of the blob.
What is the idiomatic way to do this in JGit?
I ask because I'm writing a program that is a kind of editor for documents living in git repositories. In my case, the whole point of using git is to have multiple versions of the documents (i.e. branches) available at the same time. Since it's an editor, I have to be able to commit changes. But because I want to see multiple versions of a document at once, checking out, modifying a file and committing through the JGit porcelain API is not an option; the program has to work directly with git objects.

The low-level API you can use for this is TreeFormatter together with CommitBuilder.
An example of using this can be seen here. In this case, it constructs one new tree object with multiple subtrees.
In your case, you probably have to recursively walk the tree and create the new tree objects on the path to the changed file and insert them bottom up. For the rest of the tree, you can use the existing tree IDs and don't have to descend into them. I recommend looking into TreeWalk#setRecursive and TreeWalk#setPostOrderTraversal.
Another option would be to create an in-core DirCache, fill it with DirCacheEntries from the commit and your updated entry, and then call DirCache#writeTree.
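A sketch of the in-core index route, written with git's plumbing commands rather than JGit so that the individual object-level steps are visible. The mapping to the JGit classes named above is my own reading, not something stated in the answer: `git hash-object -w` plays roughly the role of ObjectInserter, a private `GIT_INDEX_FILE` that of an in-core DirCache, `git write-tree` that of DirCache#writeTree, and `git commit-tree` that of CommitBuilder. The scratch repository, the paths `docs/a.txt` and `docs/b.txt`, the branch name `draft` and the identity are all hypothetical.

```shell
set -e
# Scratch bare repository; there is no work tree anywhere in this script.
work=$(mktemp -d)
git init --quiet --bare "$work/repo.git"
export GIT_DIR="$work/repo.git"
export GIT_AUTHOR_NAME=Editor GIT_AUTHOR_EMAIL=editor@example.invalid
export GIT_COMMITTER_NAME=Editor GIT_COMMITTER_EMAIL=editor@example.invalid

# Setup only: an initial commit with two files, so there is something to edit.
export GIT_INDEX_FILE="$work/setup-index"
old_a=$(printf 'first draft\n' | git hash-object -w --stdin)
old_b=$(printf 'untouched\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$old_a",docs/a.txt
git update-index --add --cacheinfo 100644,"$old_b",docs/b.txt
base=$(git commit-tree -m 'Initial' "$(git write-tree)")
git update-ref refs/heads/draft "$base"

# The actual steps: commit one changed blob on top of the branch tip.
parent=$(git rev-parse refs/heads/draft)
blob=$(printf 'second draft\n' | git hash-object -w --stdin)   # store the new content
export GIT_INDEX_FILE="$work/scratch-index"                      # private, throwaway index
git read-tree "$parent"                                          # load the parent's full tree
git update-index --add --cacheinfo 100644,"$blob",docs/a.txt     # replace the one changed entry
tree=$(git write-tree)                                           # unchanged subtrees keep their IDs
commit=$(git commit-tree -p "$parent" -m 'Edit docs/a.txt' "$tree")
git update-ref refs/heads/draft "$commit" "$parent"              # refuses if the branch moved meanwhile
```

Because unchanged subtrees hash to the same object IDs, only the trees on the path to the changed file end up as new objects, which is the same result as the bottom-up construction described above.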

Related

Using git for feedback from proof readers

I am currently writing a text with R bookdown and asked two friends to read my text and give comments, corrections and general feedback. My source files for the text are stored on GitHub and I would like my collaborators to make changes in the files (one for each chapter) with the help of git. However, none of us are really experts on git. This makes it hard to figure out what a suitable workflow is.
For now, we decided that each of them creates their own branch so that nobody pushes directly into the master branch. After I have read their changes, I would like to decide what I merge into master and what not. So far, it looks like each change needs to be in a separate commit, because I am not able to merge single lines from a specific commit (I am not sure whether that is possible at all). However, this means creating a lot of annoying and unnecessary commits. So I am looking for a way to avoid that, and/or general pointers towards a good workflow for this kind of project.
A useful command is git cherry-pick, which lets you apply specific commits from another branch.
A generally good practice is that commits should be self-contained (they make sense if applied alone) and target a specific feature (in the use case mentioned, that could be a paragraph, a section or a chapter).
In the end, if you want to apply only specific changes from a commit, that has to happen manually: someone has to decide which parts to apply and which not. A commit can be edited using git rebase -i <branch name> before being merged. This question might also be useful.
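A minimal sketch of the cherry-pick route, assuming the proofreader kept each correction in its own commit. The repository, the file names `ch1.Rmd` and `ch2.Rmd`, the branch name `review-bob` and the identity are made up for the illustration.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init --quiet book && cd book
git config user.name Author && git config user.email author@example.invalid
main=$(git symbolic-ref --short HEAD)   # 'master' or 'main', depending on git's configuration
printf 'chapter one\n' > ch1.Rmd
printf 'chapter two\n' > ch2.Rmd
git add . && git commit --quiet -m 'Draft'

# The proofreader makes two self-contained commits on their own branch.
git checkout --quiet -b review-bob
printf 'chapter one, corrected\n' > ch1.Rmd
git commit --quiet -am 'Fix typos in chapter 1'
wanted=$(git rev-parse HEAD)            # the commit the author wants to keep
printf 'chapter two, rewritten\n' > ch2.Rmd
git commit --quiet -am 'Rewrite chapter 2'

# The author takes only the first commit onto the main branch.
git checkout --quiet "$main"
git cherry-pick "$wanted" > /dev/null
```

After this, the main branch carries the chapter 1 fix as a new commit, and the chapter 2 rewrite stays on `review-bob` only.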
I finally found what worked for me here. Basically, on my master branch I had to use
git merge --no-commit --no-ff branch-to-merge
This merges all changes into my master branch but does not immediately commit them, so they can still be staged or unstaged. I can then decide which line changes to include by staging the ones I want to keep and discarding all the others. Finally, I commit all staged line changes et voilà, that's what I wanted to get.
Sidenote: I am using GitKraken, and as a beginner with git I enjoy using the GUI, but the merge with the "no-commit" and "no-fast-forward" options had to be done via the git console (at least I could not find a way to do that using the GUI). Choosing which lines to stage and which to discard is then an easy task via the GUI.
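The workflow above can be sketched end to end on the command line. This version selects at file granularity so that it runs non-interactively; for individual lines, the staging and discarding would be done in the GUI as described, or with git's interactive patch mode. The repository, the file names, the branch name `review-alice` and the identity are hypothetical.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init --quiet book && cd book
git config user.name Author && git config user.email author@example.invalid
main=$(git symbolic-ref --short HEAD)   # 'master' or 'main', depending on git's configuration
printf 'chapter one\n' > ch1.Rmd
printf 'chapter two\n' > ch2.Rmd
git add . && git commit --quiet -m 'Draft'

# A proofreader's branch that touches both chapters in a single commit.
git checkout --quiet -b review-alice
printf 'chapter one, corrected\n' > ch1.Rmd
printf 'chapter two, rewritten\n' > ch2.Rmd
git commit --quiet -am 'Proofreading pass'

# Back on the main branch: merge, but stop before the merge commit is created.
git checkout --quiet "$main"
git merge --no-commit --no-ff review-alice > /dev/null
# All of the proofreader's changes are staged now. Reject the chapter 2 change
# by restoring that file (index and work tree) to the pre-merge version.
git checkout HEAD -- ch2.Rmd
git commit --quiet -m 'Merge selected corrections from review-alice'
```

One thing to be aware of: the result is a regular merge commit, so git afterwards treats `review-alice` as fully merged, and the discarded changes will not be offered again by a later merge of the same branch.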

How do you use guards(1) with quilt(1)

One of the ancillary tools bundled with quilt is guards, which processes a list of guards and a configuration file matching guards and files, and outputs a list of files whose guard specifications match the provided guards.
However, I can't figure out how they're supposed to fit together: quilt(1) doesn't show any way to invoke a command to generate series files, I didn't find examples in the man pages or the working copy, and the internets are less than helpful (all the hits talk about bedding).
I feel like guards has to be invoked manually whenever its "dependencies" change, and the series file overwritten. Is that the case? If so, how is data fed back the other way around, e.g. when adding a new patch to the series, does it have to be manually synchronised to the guards file?
Background: a few years back I used mq quite a bit, but it integrates guards natively so the synchronisation back and forth is not an issue at all.

Is there any way to execute repeatable flyway scripts first?

We have used Flyway for years to maintain our DB scripts, and it does a wonderful job.
However there is one single situation where I am not really happy - possibly someone out there has a solution:
To reduce the number of scripts required (and to keep an overview of where our procedures are defined), I'd like to implement our functions/procedures in one script. Every time a procedure changes (or a new one is developed), this script is updated. Repeatable scripts sound perfect for this purpose, but unfortunately they are not.
The drawback is that a new procedure cannot be used by non-repeatable scripts: repeatable scripts are executed last, so the procedure does not exist yet when the non-repeatable script executes.
I had hoped I could control this by specifying different locations (e.g. loc_first containing the repeatables I want to be executed first, loc_normal for the standard scripts and the repeatables to be executed last).
Unfortunately the order of locations has no impact on execution order ;-(
What's the proper way to deal with this situation? Right now I need to specify the corresponding procedures in non-repeatable scripts, but that's exactly what I'd like to avoid ....
I found a workaround on my own: I'm using Flyway directly with Maven (the same would work if you use the API, of course). Each stage of my Maven script has its own profile (specifying URL etc.).
Now I create two profiles for every stage, so I have e.g. dev and devProcs.
The difference between these two Maven profiles is that the "[stage]Procs" profile operates on a different location (where only the repeatable scripts maintaining procedures are kept). Then I need to execute Flyway twice: first with [stage]Procs, then with [stage].
To me this looks a bit messy, but at least I can maintain my procedures in a repeatable script this way.
According to the Flyway docs, repeatable migrations ALWAYS execute after versioned migrations.
But I guess you can use Flyway callbacks. It looks like the beforeMigrate.sql callback is exactly what you need.
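A sketch of what that layout could look like, assuming a single filesystem location named `sql` and PostgreSQL syntax. The function `full_name`, the `person` table and the file `V2__Backfill_display_name.sql` are invented for the illustration, and the script only writes the files; it does not run Flyway.

```shell
set -e
proj=$(mktemp -d)
mkdir -p "$proj/sql"

# SQL callback: Flyway picks up a file with this exact name in a migration
# location and runs it before each migrate, so it must be safe to re-run.
cat > "$proj/sql/beforeMigrate.sql" <<'SQL'
CREATE OR REPLACE FUNCTION full_name(first_name text, last_name text)
RETURNS text
LANGUAGE sql
AS $$ SELECT first_name || ' ' || last_name $$;
SQL

# Versioned migration that relies on the function already being defined.
# Assumes an earlier migration (not shown) created the person table.
cat > "$proj/sql/V2__Backfill_display_name.sql" <<'SQL'
UPDATE person SET display_name = full_name(first_name, last_name);
SQL

ls "$proj/sql"
```

Since the callback runs before any pending versioned migration, a procedure defined this way probably should not depend on tables or columns that those pending migrations are about to create, at least on databases that validate the body at creation time.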

How to transfer a structure from one Plone to another

I have a Plone instance which contains some structures which I need to copy to a new Plone instance (but much more which should not be copied). Those structures are document trees ("books" of Archetypes folders and documents) which use resources (e.g. images and animations, by UID) outside those trees (in a separate structure which of course contains lots of resources not needed by the ones which need to be copied).
I tried already to copy the whole data and delete the unneeded parts, but this takes very (!) long, so I'm looking for a better way.
Thus, the idea is to traverse my little forest of document trees and transfer them and the resources they need (sparsely rebuilding that separate structure) to the new Plone instance. I have full access to both of them.
Is there a suggested way to accomplish this? Or should I export all of them, including the resources structure, and delete all unneeded stuff afterwards?
I found out that each time that I make this type of migration by hand, I make mistakes that force me to do it again.
OTOH, if migration is automated, I can run it, find out what I did wrong, fix the migration, and do it all over again until I am satisfied.
In this context, to automate your migration, I advise you to look at collective.transmogrifier.
I recommend jsonmigrator - which is a twist on collective.transmogrifier mentioned by Godefroid. See my blog on it here
You can even use it to migrate from Archetypes to Dexterity types (you just need matching field names and, roughly speaking, matching types).
Trying to select the resources to import will be tricky though. Perhaps you can find a way to iterate through your document trees & "touch" (in a unix sense) any resource that you are using. Then copy across only resources whose "timestamp" indicates that they have been touched.

How to avoid code duplication for non-data structures (views, stored procedures etc)

My project contains a lot of objects like views and stored procedures which change quite frequently. At the moment I have to create a new SQL script on every update, containing the complete source code of the changed objects even though I actually changed only a few lines. This leads to massive code duplication, and I also find these changes difficult to review.
I'd like to have only one current version of the SQL script for every object, such as a view or procedure, and recreate these objects every time I redeploy the database. As a result, I could change the existing source file (as in Java or C programming) instead of creating a new update script every time I need to alter a view or procedure.
Is there a possibility to execute some scripts every time I migrate the database with Flyway?
I'm not sure why that got so many downvotes, it's a perfectly understandable and valid question. Perhaps it's because it closely resembles this open question:
Migrating Stored Procedures with Flyway
We are actually starting to push against this issue now. We've been using flyway for development and testing (and love it). But we've come to a point where we're starting to have to use procs/triggers/views (p/t/v's) and the fundamental disconnect between how we did it before, and how we must use flyway, is starting to be a strain.
Before, for a given database object (let's say it's a procedure), there'd be one source file. And if you needed to change the proc 'n' times, there would be 'n' versions of the same file in your VCS. Diff tools work great, IDE's all understand this, merges detect when two developers working in separate branches make changes to the proc, etc, etc. You know, old school.
But with flyway, any one proc with 'n' changes is now scattered across 'n' files. Instead of "one object in one file with 'n' versions", you have "one object in 'n' files with one change each". I now need to do a text search in my IDE for any instance of "proc_name" if I want to know the history of changes to the proc. The VCS knows nothing about it. Devs can each make a migration in their own branches that succeeds when deployed on its own, but together they leave the proc with a missing update.
I'm not saying any of this to complain about flyway, and I fully realize it's not a simple area. I'd almost say it's unsolvable (by flyway).
We're scheming how to handle this problem, and I'd be very interested to know how others have handled it.
Flyway supports repeatable migrations as of version 4.0.
Just add SQL files whose names start with "R__" and carry no version number to your migration folder:
R__Blue_cars.sql
You have to make sure that the script can be applied repeatedly.
This is usually done by "CREATE OR REPLACE" clauses in your DDL statements.
https://flywaydb.org/documentation/migration/repeatable
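A sketch of such a migration folder, assuming a database that supports `CREATE OR REPLACE VIEW`. The `car` table, the `blue_cars` view and the file `V1__Create_car.sql` are invented for the illustration; only the `R__Blue_cars.sql` name comes from the answer above. The script just writes the files and does not run Flyway.

```shell
set -e
proj=$(mktemp -d)
mkdir -p "$proj/sql"

# Versioned migration: applied exactly once, in version order.
cat > "$proj/sql/V1__Create_car.sql" <<'SQL'
CREATE TABLE car (id INT PRIMARY KEY, colour VARCHAR(20) NOT NULL);
SQL

# Repeatable migration: no version, re-applied whenever its checksum changes.
# Edit this one file in place when the view changes; the VCS keeps the history.
cat > "$proj/sql/R__Blue_cars.sql" <<'SQL'
CREATE OR REPLACE VIEW blue_cars AS
    SELECT id, colour FROM car WHERE colour = 'blue';
SQL

ls "$proj/sql"
```

With this layout, each view or procedure lives in one file, and a normal diff of `R__Blue_cars.sql` shows exactly what changed between deployments.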
