DVCS for a small company of remote employees - dvcs

Here's the situation: at my small office, because we like to keep mobile and occasionally work from home, instead of having a central file server, we have all the office documents in an SVN repository, and each person keeps a checkout on their own laptops. A checkout weighs in at about 3GB, and the repo with revisions in it: about 6GB. This is all working great.
The problem is that soon we won't have a small office any more - all our 5 workers will be working remotely. I had considered purchasing a dedicated server and running our SVN repository from that, except two of our workers will be really remote and will be using wireless "broadband" with a 3GB/month limit, and I'm afraid that a few large updates will really rip through their monthly allowance, not to mention taking all day to complete.
Reading a few questions on Stack Overflow, it seems there's quite a community of distributed VCS aficionados who think git or mercurial is definitely the best for many situations. Given that all the employees would still be able to meet face-to-face at least once a fortnight (and hence be on a fast LAN), I'm wondering if a DVCS would work for us?

I don't know exactly what's in your repo, but unless you're changing all the files regularly, a DVCS should provide you a very desirable workflow.
You could do an svn -> git conversion, stick the repo on a DVD and mail it out to all the satellite offices, and then let them fetch from the office as things change at a fairly low incremental cost (should be smaller than the delta in general).

Checkout the Fossil DVCS, it may fit your bill. Fossil may be used like SVN or a DVCS. If you are concerned about it handling your current repository try it out. It also has a built in project wiki and bug tracking system that distribute with the repository as well. You could try it out and see if it would work for your small team.
The pain for you would be losing your revision history, at this time I don't beleive you can import a svn repository into Fossil.
Join the mailing list and you will get answers for any of your questions. The creator of SQLite is also the creator of this project as well. Hope this helps.

I can't see why not. With something like git, the repository is local to the machine, and so your remote employees can actually have a tracked changelog that can then be merged or rebased with the main repository--whatever you decide that to be--when they get the chance.
Also, git has really good compression compared to SVN, so the 3GB/mo quota may be more than enough for your remote employees.
Randal Schwartz actually gave a really good presentation on git at Google's Tech Talks: http://www.youtube.com/watch?v=8dhZ9BXQgc4

(It seems no one is answering this.) DVCS of course seems like it would work, but I have no experience with it. A centralized system like svn might also work if you are not expecting large changes daily. (to go up and back from the server) The initial get in that case would be the only real expensive issue.
Can you monitor your use now and see how much traffic goes back and forth?
The real problem here is the 3GB/mo bandwidth limitation. It's probably just better to come up with a better solution for connectivity...

Related

Getting started with Jupyter in the cloud

So far I've only used Jupyter on my local machine, which is way too slow. I'm completely new to using cloud services for Jupyter, or using cloud services at all for that matter. I know there are a million tutorials out there, but this is my problem: How to choose the right service from all those options (Amazon? Google? Cheaper options?)? What's the 'right way' to get started?
What I need:
I want a service where I can start up a Jupyter notebook in my browser as simply as possible. (I know next to nothing about setting up servers etc., and have very limited time to learn that if needed)
I currently have an old MacBook from 2014. The server should be at least 10x faster. (Which options do I need to pick?)
I want to do machine learning, so GPUs would be good.
My budget is about $50 per month, less would be great; a free tryout would be great too.
As I am completely new, I also need to know what pitfalls to look out for. (E.g.: Stop the machine to stop increasing the costs?)
If you could help me, or point me to a good tutorial or even a book, I'd be forever grateful.
(Sorry for the basic question. Of course I googled tutorials myself before posting this question, but as indicated above, I'm overwhelmed by the options - that's why I posted this question.)
AWS based tutorial:
https://aws.amazon.com/de/getting-started/hands-on/get-started-dlami/
GPU, CPU and pricing informations are gathered here:
https://docs.aws.amazon.com/dlami/latest/devguide/pricing.html
You can set up a budget for cost limitation:
https://aws.amazon.com/de/getting-started/hands-on/control-your-costs-free-tier-budgets/

Is luwak production ready?

I'm implementing a blob store over Riak KV to make an experiment storing mail attachment. Riak CS seems over reaching for this goal.
I already have a prototype implemented in Python and many ideas to keep working on it. Today I stumbled upon luwak, which has a similar design, however much more complete and consistent with the Merkle tree metadata.
Is luwak abandoned in favor of Riak CS? Is it production ready?
Looks like it has been abandoned:
Hi all,
I wanted to take a moment and let the community-at-large know
that we are going to end-of-life the Luwak functionality in Riak, as
part of our 1.1 release in February. This simply means that the Luwak
repo on Github will not be actively developed/supported by Basho, or
included as a default Riak dependency. In keeping with our commitment
to Open Source, the code will still be available if someone else wants
to develop on it. You can also continue to use it with Riak if you are
willing to edit the deps and compile from source -- the Luwak README
will contain information on how to do this.
While the idea of Luwak
was interesting, we ultimately decided that it wasn’t the
architectural path we wanted to pursue for storing larger values in
Riak. If Luwak is a piece of your Riak deployment, we’d love to hear
from you and incorporate your feedback into future directions.
Please don’t hesitate to reach out to myself or Mark Phillips.
Thanks, D.
The last commit on the open-source project was a little over 3 years ago, and it's had an outstanding pull request open for 10 months (the comments on there are not encouraging either). So yeah, I'd say it's pretty dead.

Forbid developer to commit code because of making weekly build

Our development team (about 40 developers) has a formal build every two weeks. We have a process that in the "build day", every developers are forbiden to commit code into SVN. I don't think this is a good idea because:
Build will take days (even weeks in bad time) to make and BVT.
People couldn't commit code as they will, they will not work.
People will commit all codes in a huge pack, so the common is hard to write.
I want know if your team has same policy, and if not how do you take this situation.
Thanks
Pick a revision.
Check out the code from that revision.
Build.
???
Profit.
Normally, a build is made from a labeled code.
If the label is defined (and do not move), every developer can commit as much as he/she wants: the build will go on from a fixed and defined set of code.
If fixes need to be make on that set of code being built, a branch can then be defined from that label, minor fixes can be made to achieve a correct build, before being merging back to the current development branch.
A "development effort" (like a build with its tweaks) should not ever block another development effort (the daily commits).
Step 1: svn copy /trunk/your/project/goes/here /temp/build
Step 2: Massage your sources in /temp/build
Step 3: Perform the build in /temp/build. If you encounter errors, fix them in /temp/build and build again
Step 4: If successful, svn move /temp/build /builds/product/buildnumber
This way, developers can check in whenever they want and are not disturbed by the daily/weekly/monthly/yearly build
Sounds frustrating. Is there a reason you guys are not doing Continuous Integration?
If that sounds too extreme for you, then definitely invest some time in learning how branching works in SVN. I think you could convince the team to either develop on branches and merge into trunk, or else commit the "formal build" to a particular tag/branch.
We create a branch for every ticket or new feature, even if the ticket is small (eg takes only 2 hours to fix).
At the end of each the coding part of each iteration we decide what tickets to include in the next release. We then merge those tickets into trunk and release software.
There are other steps within that process where testing is performed by another developer on each ticket branch before the ticket is merged to trunk.
Developers can always code by creating their own branch from trunk at any time. Note we are a small team with only 12 developers.
Both Kevin and VonC have well pointed out that the build should be made from a specific revision of the code and should not ever block the developers from committing in new code. If this is somehow a problem, then you should consider using another version management software which uses centralized AND local repositories. For example, in mercurial, there is a central repository just like in svn, but developers also have a local repository. This means that when a developer makes a commit, he only commits to his local repository and the changes will not be seen by other developers. Once he is ready to commit the code for other developers, then the developer just pushes the changes from his local repository to the centralized repository.
The advantage with this kind of an approach is that developers can commit smaller pieces of code, even if it would break a build, because the changes are only applied to the local repository. Once the changes are stable enough, they can be pushed to the centralized repository. This way a developer can have the advatange of source control even though the centralized repository would be down.
Oh, and you'll be looking at branches in a whole new way.
If you became interested in mercurial, check out this site: http://hginit.com
I have worked on projects with a similar policy. The reason we needed such a policy is that we were not using branches. If developers are allowed to create a branch, then they can make whatever commits they need to on that branch and not interrupt anyone else -- the policy becomes "don't merge to main" during the weekly-build period.
Another approach is to have the weekly-build split off onto a branch, so that regardless of what gets checked in (and possibly merged), the weekly build will not be affected.
Using labels, as VonC suggested, is also a good approach. However, you need to consider what happens when a labeled file needs a patch for the nightly build -- what if a developer has checked in a change to that file since it was labeled, and that developer's changes should not be included in the weekly build? In that case, you will need a branch anyway. But branching off a label can be a good approach too.
I have also worked on projects that make branches like crazy and it becomes a mess trying to figure out what's happening with any particular file. Changes may be committed to multiple branches in the same timeframe. Eventually the merge conflicts need to be resolved. This can be quite a headache. Regardless, my preference is to be able to use branches.
Wow, thats an awful way to develop.
Last time I worked in a really large team we had about 100 devs in 3 time zones: USA, UK, India, so we could effectively have 24 hour development.
Each dev would check the build tree and work on what they had to work on.
At the same time, there would be continuous builds happening. The build would make its copy from the submitted code and build it. Any failures would go back to the most recent submitter(s) for code for that build.
Result:
Lots of builds, most of which compiled OK. These builds then started automatic smoke testing scenarios to find any unexpected bugs not found during testing priot to commiting.
Build failures found early, fixed early.
Bugs found early, fixed early.
Developers only wait the minimum time to submit (they have to wait until any other dev that is submitting has finished submitting - this requirement made so that the build servers have a point at which they can grab the source tree for a new build).
Most devs had two machines so they could work on a second bug while running their tests on the other machine (the tests were very graphical and would cause all sorts of focus issues, so you really needed a different machine to do other work).
Highly productive, continuous development with no deadtime as in your scenario.
To be fair, I don't think I could work in place that you describe. It would be soul destroying to work in such an unproductive way.
I strongly believe that your organization would benefit from Continuous Integrations, where you build very often, perhaps for every checkin to your code base.
Don't know if i'll get shot for saying this, but you should really move to a decentralized solution like GIT. SVN is horrible about this and the fact that you can't commit basically stops people from working properly. At 40 people this is worth it because each can continue working on their own stuff and only push what they want. The build server can do what it wants and build without affecting everyone.
Yet another example why Linus was right when saying that in almost all cases, a decentralized solution like git works best in real life teams.

What are you using for Distributed Caching in web farms running ASP.NET?

I am curious as to what others are using in this situation. I know a couple of the options that are out there like a memcached port or ScaleOutSoftware. The memcached ports don't seem to be actively worked on (correct me if I'm wrong). ScaleOutSoftware is too expensive for me (I don't doubt it is worth it). This is not to say that I don't want to hear about people using memcached or ScaleOutSoftware. I'm just stating what I "know" at this point.
So my question is basically this: for those of you ACTIVELY using distributed caching, what are you using, are you happy with it, and what should I look out for?
I am moving to two servers very soon...both will be at the same location. I use caching fairly heavily (but carefully) to reduce the load on my database server.
Edit: I downloaded Scaleout Software's solution. I've coded for it and it seems to work real well. I just have to decide if my wallet will part with the cash for it. :) Anyone have experiences good or bad with ScaleoutSoftware?
Edit Again: It's been a little while since I asked this? Any more thoughts on it? We ended up buying the solution from ScaleOutSoftware and have been happy with it, but I'm curious what others are doing.
Microsoft has a product pending code-named Velocity. It's still in CTP, and is moving slowly, but looks like it will be pretty good. We'll be beating it up in the near future to see how it handles what we want it to do (> 2 million read/writes per hour). Will post back with results.
There is a 100% native .NET, well documented open source (LGPL) project called Shared Cache. Looks like it is not yet mentioned on SO, but it's promising and should be able to do what most people expect from a distributed cache. It even supports different strategies like distributed or replicated caching etc.
I will update this post with more details as soon as I had a chance to try it on a real project.
We're currently using an incredibly simple cache that I wrote in a couple of hours, based on re-hosting the ASP.NET cache in a Windows Service (more info and source code here). I won't pretend it's anywhere near as optimised as something like Memcached but we were just looking for something simple and free until Velocity came along, and it's held up extremely well even under fairly heavy load.
It comes down to our personal preference for core components - i.e. ones that affect whether the site is available or not - that they are either (a) supported by a vendor with a history of rapid and high quality support, or (b) written by us so that if something goes wrong we can fix it quickly. Open source is all well and good, and indeed we do use some OSS, but if your site is offline then unfortunately newsgroups et al don't have a 1 hour SLA, and just because it's OSS doesn't mean you have the necessary understanding or ability to fix it yourself.
We are using the memcached port for Windows and we are very pleased with it. The enyim.com memcached client API is great and easy to work with. It's also open source, which is a big advantage, if you ask me.
We are now using this setup in a production web-app and it has helped a lot in improving its performance.
There's a great .NET wrapper/port found here on Codeplex. Awesomesauce!
We use memcached with the enyim library in a production environment (www.funda.nl). Works fine, very pleased with it, but we did notice a substantial raise in CPU use on the clients. Presumably due to the serializing/deserializing going on. We do around 1000 reads per second.
One tried and tested product by 100's of customers worldwide is NCache. Its
a feature rich product that lets you store session state in a redundant and highly available manner, lets you share data
within the enterprise as well as bridging for WAN communication essentially acting as a data fabric and lastly it lets you build an elastic caching tier so that when
your application scales, you can add servers to the cache and actually boost performance further.

What do people think of the fossil DVCS? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
fossil http://www.fossil-scm.org
I found this recently and have started using it for my home projects. I want to hear what other people think of this VCS.
What is missing in my mind, is IDE support. Hopefully it will come, but I use the command line just fine.
My favorite things about fossil: single executable with built in web server wiki and bug tracking. The repository is just one SQLite (http://www.sqlite.org) database file, easy to do backups on. I also like that I can run fossil from and keep the repository on my thumb drive. This means my software development has become completely portable.
Tell me what you think....
Mr. Millikin, if you will take a few moments to review some of the documentation on fossil, I think your objections are addressed there. Storing a repository in an sQLite database is arguably safer than any other approach. See link text for some of the advantages of using a transactional database to store a repository. As for bloat: The entire thing is in a single self-contained executable which seems to disprove that concern.
Full disclosure: I am the author of fossil.
Note that I wrote fossil because no other DVCS met my needs. On the other hand, my needs are not your needs and so only you can judge whether or not fossil is right for you. But I do encourage you to at least have a look at the documentation and try to understand the problem that fossil is trying to solve before you dismiss it.
After having used Fossil for more than a year now on non-trivial development projects, I feel confident enough to weigh in on this topic.
Below is my experience so far. I'm comparing against git and svn at times, simply because I know those SCM's very well and comparing makes it easier for me to get the idea across.
I'm totally in love with this SCM, so it's mostly points on the plus side.
What I like about Fossil:
We have a bunch of machines (win/mac/a number of Linux distros), and the single-executable installation is just as beautiful as it sounds. No dependencies; it just works. Git is a messy pile of files and the dependency hell in Subversion makes it very nasty on some Linux distributions, especially if you must build it yourself.
The default Fossil workflow suits our projects perfectly, and more git'ish workflows are possible when needed.
We've found it extremely robust, even on large projects. I wouldn't expect anything else from the guys who wrote SQLite. No crashes, no corruption, no funny business.
I'm actually very, very happy with performance. Not as fast as git on huge trees, but not much slower either. I make up any lost time by not having to consult the documentation every other command, as is the case with git.
The fact that there's a tried'n'true transactional database behind every operation makes me sleep better at night. Yes, we've been through more than one horrible incident of stale and corrupt Subversion repositories (thankfully, a helpful community helped us fix them.) I can't imagine that happening in Fossil. Even Subversion 1.7.x use SQLite now for metadata storage. (Try turning off power in the midst of a git commit - it'll leave a corrupt repos!)
The integrated issue tracker and wiki are optional, obviously, but very handy as it's always there - no installation required. I wish the issue tracker had some more features though, but hey - it's an SCM.
The built-in server and web gui is simply brilliant and quite configurable through css.
We sometimes need to import to and from git and subversion repositories. This is a no-brainer in Fossil.
Single file repository. No '.svn' directories all over the place.
What I miss in / dislike about Fossil:
Someone please write TortoiseFossil for our non-technical Windows users :)
The community isn't that large yet, so it's probably hard for a lot of people to introduce it in their company. Hopefully this will change, gaining all the benefits of a large community (documentation, more testing of new releases, etc.)
I wish the local web ui had a search feature (including searching for file content).
Fewer merge options than in git (though the Fossil workflow makes merging less likely to occur in the first place.)
I hope everyone gives Fossil a run - the world is a better place with stuff that just works and which you don't need to be a rocket scientist to use.
Fossil is small, simple, yet powerful and robust, reminds me some principles of C Culture. Likable by those who develop independently and still collaborate.
Any great project should start with principles and continue them at its core as it gathers more layers (GUI, extra features).
I am impressed with Fossil and starting to use... take a look at fossil
cheers
I'm landing on this page after an year of the last post, recursive add that has been mentioned here is now taken care of.
Fossil mesmerizes me with simplicity especially after I struggled to get a bug-tracking system to work with mercurial. I need to see how to manage multiple projects, publish the repositories for multi-user access and how to do merging, manage patches etc. I get the feeling that it wont be disappointing going forward.
I'm not interested in using it for source-code version control, but I am interested in a distributed version-controlled personal wiki that I can sync between all the machines I use.
damian,
1/ yes, fossil doesn't support recursive add. However there are some fairly simply workarounds such as
for /r %i in (*.*) do fossil add "%i"
on Windows, and
find . -type f -print0 | xargs -0 fossil add --
on Unix.
2/ I saw the message about malformed manifest when you are adding a file with non-ASCII characters in the filename. The problem was corrected in the last build.
Regards,
Petr
I think fossil is really cool. The most important feature for me was easy installation, and developer friendly defaults. I currently use it to keep track of the local changes of my files. (Our project is hosted in sourceforge and kept track in CVS.) This way I can "commit" locally even if it would otherwise break the project, so smaller changes can be kept track as well.
Fossil is good. It is simple and easy to use. If fossil can provide GUI interface to check in and check out, then it would be better (prefer java gui to archive cross-platform GUI).
The main advantages of Fossil are "open source" and "use SQLite database", so somebody can compile fossil source code to make fossil work on google android platform (mobile and tablet devices).
I am trying your vcs right now.
I like the idea of having all integrated. After all, is all i want when i look for a system like this. I am an active user of Mercurial. And i couldn't find an integration with a issue tracker (I try unsuccessfully to set p Trac with mercurial in the past).
After some test i realize that:
1) "add" command is not recursively, or i can not found in the doc a way to do it
2) i write a bat (i work with windows) to add 750 files and i run it (it took a while). When a run commit it jumps with "manifest malformed"
i think you could address this issues and others making a survey like the Mercurial's one in https://www.mercurial-scm.org/wiki/UserSurvey.
you could write me at dnoseda at gmail
i am interested in you work. keep improve it.
regards
ps.: as an mayor improvement you could add something like gitstat
Perhaps an uneducated knee-jerk reaction, but the idea of storing a repository in a binary blob like an SQLite database terrifies me. I'm also dubious of the benefits of including wikis and bug trackers directly in the VCS -- either they're under-featured compared to full software like Trac, or the VCS is massively bloated compared to Subversion or Bazaar.

Resources