Best Practice Coding for R scripts running in production [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
We have a Linux production server and a number of scripts we are writing that we want to run on it to collect data, which will then be put into a Spark data lake.
My background is SQL Server / Fortran and there are very specific best practices that should be followed.
Production environments should be stable in terms of version control, both for the code and for the installed applications, operating system, etc.
Changes to code/applications/operating system should be done either in a separate environment or in a way that is controlled and can be backed out.
If a second environment exists, parallel execution can be used to test system changes.
Developers are (largely) restricted from changing the production environment.
In reviewing the R code, there are a number of things that I have questions on.
library(), install.packages(): I would like to exclude the possibility of installing newer versions of packages each time the scripts are run.
How is it best to call R packages that are scheduled through a cron job? There are a number of choices here.
When using RSelenium, what is the most efficient way to use a GUI web browser or a virtualised web browser?

In any case I would scratch any notion of updating the packages automatically. Expect the maintainers of the packages you rely on to introduce backward-incompatible changes. Your code will stop working out of the blue if you auto-update. Do not assume anything is sacred.
Past that, you need to ask yourself how hands-on your deployment is. If you're OK with manually setting up each deployment, then you can probably get away with using the packrat package to pull down and keep sources of the exact versions you are using. This way, reproducing your deployment is painful, but at least possible. If you want fully automated, reproducible deployments, I suggest you start building Docker images with your packages and tagging them with dates or versions.
If you make no provisions for reproducing your environment, you are asking for trouble. It may seem OK at first to simply fix any incompatibilities as they come up with updates, and that does indeed seem to be the official workflow from the powers that be, however misguided that is. But as your codebase grows, that will eventually be all you end up doing.
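The idea of refusing to run against unexpected package versions can be sketched independently of any particular tool. The following is a minimal illustration in Python, not packrat's actual mechanism: the function name `find_version_drift`, the package names, and the version numbers are all hypothetical, and a real script would read both maps from a lock file and from the installed library.

```python
def find_version_drift(pinned, installed):
    """Compare two name -> version maps and describe every mismatch."""
    problems = []
    for name, wanted in sorted(pinned.items()):
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: pinned {wanted}, not installed")
        elif found != wanted:
            problems.append(f"{name}: pinned {wanted}, installed {found}")
    return problems


# Hypothetical pins and installed versions, for illustration only.
pinned = {"RSelenium": "1.7.1", "httr": "1.2.0"}
installed = {"RSelenium": "1.7.1", "httr": "1.3.1"}

drift = find_version_drift(pinned, installed)
for line in drift:
    print(line)
```

A scheduled job could run a check like this first and exit with an error when the list is non-empty, rather than installing anything on the fly.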

Related

Deployment Process for DevOps/Agile [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I'm currently trying to implement a deployment process (I think that's what you call it?).
A former company I worked for had three environments and used some form of DevOps.
dev.url => Development for Devs
stage.url => Staging for QA
(live.)url
Finished features were pulled onto stage for QA. When QA gave the go-ahead, the release got a tag, and that tag was then pulled onto the live environment. All of this was done in combination with Agile.
So my questions are:
Do you know the name of that deployment process?
Do you know other popular deployment processes similar to the one I just described? Or what kind of process do you use?
I'm looking for something like:
Development process, deployment, GitHub
Thanks
It looks like what you are looking for is called a Deployment Pipeline, which, as mentioned by #prasanna, is the key part of Continuous Delivery. The keys to Continuous Delivery are Continuous Integration (which in turn requires automated tests) and automated deployment with configuration management tools.
Regarding the tool, you can use Jenkins along with its Build Pipeline Plugin.
Of course this is continuous delivery. But the devil is in the details.
What do you do when things move from Dev->QA->Staging->Prod?
What tests are run when the build crosses these stages?
How does the promotion between environments happen (automated/manual), etc.?
The key in CD is to automate all of these as deeply as possible, so that you can make faster decisions when builds get stuck in any of these environments.
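The questions above can be made concrete with a small sketch of stage-by-stage promotion. This is only an illustration in Python, not how Jenkins or any specific tool models a pipeline; the stage names follow the Dev->QA->Staging->Prod sequence mentioned above, while the `promote` function, the build fields, and the checks are hypothetical.

```python
STAGES = ["dev", "qa", "staging", "prod"]


def promote(build, checks):
    """Walk a build through STAGES and stop at the first failing check.

    Returns (stages_passed, stuck_stage); stuck_stage is None when the
    build reached the end of the pipeline.
    """
    passed = []
    for stage in STAGES:
        check = checks.get(stage)
        if check is not None and not check(build):
            return passed, stage
        passed.append(stage)
    return passed, None


# Hypothetical build record and gates, for illustration only.
build = {"tag": "v1.4.0", "unit_tests_ok": True, "qa_signed_off": False}
checks = {
    "dev": lambda b: b["unit_tests_ok"],   # automated gate
    "qa": lambda b: b["qa_signed_off"],    # manual sign-off recorded as a flag
}

passed, stuck = promote(build, checks)
print("passed:", passed, "stuck at:", stuck)
```

Returning the stage where a build got stuck is the piece that supports fast decisions: it says exactly which gate needs attention.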
As rightly mentioned in the two answers above, you are referring to Continuous Delivery. There can be multiple levels of maturity in Continuous Delivery. You start by putting a Continuous Integration process in place, which essentially means that the code is compiled frequently to check for possible failures.
Then you put some checks on the compiled code which get triggered automatically.
Then you go ahead and deploy this code.
The next step is to also provision, on the fly, the environment the code is deployed to.

Is node.js ready for production use? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Starting a new project. It's basically a blogging/commenting system.
We're considering node.js as the back end server. Is node.js ready for this sort of thing or is it too early and experimental?
We need HTTPS and gzip compression - perhaps a front end nginx server could provide this?
What's missing from node.js that would make developing a web app difficult?
From a production ready perspective, we're wondering if it is stable enough for building a commercial app on top of.
Thanks
UPDATE:
Almost a year has passed and now I'd definitely use node.js for live systems.
It's not ready. It sure is an awesome piece of software but it's not suitable for production use yet.
The developer of node.js himself stated in a talk, that it's probably full of bugs and security issues.
This is the talk: http://www.yuiblog.com/blog/2010/05/20/video-dahl/
He recommends that IF it is to be used in a production environment, you should place it behind a stable http proxy like nginx but he discourages doing that at all.
I'll wait for a production release and until then, play with it on my local machine.
Node.js is really great, but using it in production is complicated right now. The API changes several times with each version and may change many more times, so you need to pin to a particular version. Migration can be painful.
I'm using it for a production site. It's been live for a few months and I've had no issues with the node runtime. Stick with the latest stable release (currently 0.2.6).
The 3rd party modules written by the community are where you may run into issues. Some modules are more stable than others. The node community has standardized on github, so it's pretty easy to fork and fix things you run into. But be prepared to roll up your sleeves and hack -- it's probable that you'll need to fix a few bugs in the modules you use.
Overall I've been happy using node.js
It's just another tool, with different pros and cons. If your project is planned carefully you shouldn't run into major problems. Node.js is a very active project and it shouldn't be long before it reaches stable. If your team finally decides to use node.js please contribute any findings / solutions / code or any kind of valuable information back to the community while you're at it. That would really help. The more people active, the faster node.js will progress.
It's still got some rough edges, but I'd say it's ready to use (I'm about to launch a production site based on it). Here's an article describing how 3 companies are using it in production.
You may still find yourself finding/fixing the occasional bug, but that's where the community really shines.
(Updated answer) In June 2013 (version 0.10.12):
Node.js is ready for production, it's stable and really fast.
I am using it on live servers with Redis, using a SmartOS VM with dtrace and flamegraph for profiling (on a dev server). It has also replaced my Apache/PHP stack for creating websites quite well.
The best ways to find up-to-date modules are Nipster and npmjs.
As some modules are not mature enough, finding the right one is sometimes an iterative process.
--
(Old answer) In May 2012 (version 0.6.18):
Node.js and its API seem stable enough for production use.
However, its ecosystem isn't: most modules are not stable yet and a lot of them aren't maintained anymore (last commits 8 to 18 months ago; you can check on the modules' GitHub pages).
Currently, using a module often requires active participation: subscribing to its mailing list and patching it when needed.

Matching ASP.NET source code to a compiled web application [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
My client has a compiled ASP.NET 2.0 application that was compiled & deployed a year ago. They also have 4 versions of source code projects/solutions not under source control (stored on previous developer's workstation file system). None of the file dates appear to match one another.
Is there any way to determine which (if any) of those versions is the one actually deployed to the production web site?
I've done this on client projects multiple times and use Reflector, like other commenters. This kind of thing happens more often than it should, for instance when someone leaves the development team suddenly. On one project, my team of contractors was called in after the ENTIRE development team left, and we had to follow this procedure on every single piece of code running in production to be sure of what we actually had on our hands.
The way I deal with it is by copying EVERY available version of the compiled code into a separate area in the filesystem. This includes the version that's in source control or from the development workstation. This is important because Reflector sees the IL and not the actual original source, and you want to compare apples to apples.
I use the FileDisassembler for Reflector to decompile each of the binaries into a separate folder. I end up with a structure that looks something like this:
ProjectXyzReconciliation
|-production
|-staging
|-test
|-qa
|-devworkstation
|-sourcecontrol
|-reconciled (this is what will eventually go back in source control)
I then use WinMerge (but have used other merge/comparison tools equally well) to compare the directories and merge them into the "reconciled" folder. I usually populate that with what's running in production to start with and compare every other version against it.
The first pass is really just to see what's different and decompiling out to files lets you use tools like WinMerge to get reports of what's actually different for making decisions.
Sometimes, this process yields one or two changes that are easily traceable to bugs in the bug tracking database, emails, etc., and decisions can be made as to whether each should go in or stay out for further work.
When every difference is explained and either merged or rejected for later rework or removal, the newly reconciled code is used as the new base for future development and refactoring. This does lose any comments that were in the code, but whenever this whole procedure has been necessary, losing the comments hasn't been much of a loss, to be frank.
The first time through, this can seem daunting, but members of my teams that have gotten good at this have found that on later projects, they can often seem the hero for being able to seemingly accomplish the impossible when a nasty situation arises, making it worthwhile to get this into your toolbox.
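The first pass of the comparison described above can also be scripted. The sketch below is an illustration in Python rather than a replacement for WinMerge: it assumes each version has already been decompiled into its own folder, and it ranks candidate folders by how many files are byte-identical to the production folder. The function names `file_hashes` and `rank_candidates` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_hashes(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def rank_candidates(production_dir, candidate_dirs):
    """Return (identical_file_count, candidate) pairs, best match first."""
    prod = file_hashes(production_dir)
    scores = []
    for cand in candidate_dirs:
        hashes = file_hashes(cand)
        identical = sum(
            1 for rel, digest in prod.items() if hashes.get(rel) == digest
        )
        scores.append((identical, str(cand)))
    return sorted(scores, reverse=True)
```

Calling `rank_candidates("production", ["staging", "devworkstation", "sourcecontrol"])` on folders like those in the structure above would show which version is closest to production; the files that differ still need a manual diff.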
If I were in your situation, I would compile each of the 4 separate source projects one at a time... Then run the diff add-in for .NET Reflector to see if you have a match with the production assembly. If not, compile the next source project and try the diff again.
If your project directories contain build artifacts such as DLLs and EXEs, you could check the version numbers and compare with those in production. Even if you don't get an exact match, you'll see what might be closest.
.NET Reflector is a handy tool for seeing what code is in use on a given server.

Build Server Best Practices [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I heard Google has an automated process like this:
When you check in, your code is checked into a temporary location.
It is built.
Style checks run.
Tests run.
If there are no problems, code goes to actual repository.
You receive an e-mail containing test results, performance graphs, style check results and whether your code is checked in.
So if you want to learn if you broke something or the great performance gain you expected occurred, you just check in and receive an e-mail telling you what you need to know.
What are your favorite build server best practices?
What you described for Google is what every basic build process does. Specific projects may have additional needs. For example, this is how we deploy web applications from staging to production:
Build start
Live site is taken offline (Apache redirects to different directory holding an "Under construction" page)
SVN update is run for the production server
Database schema deltas are run
Tests are run against the updated source and schema
In case of failure: a rollback is run (SVN revert and database schema UNDO)
Site gets back online
Build ends
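The sequence above, with its rollback branch, can be sketched as a small driver. This is an illustration in Python under stated assumptions: the `deploy` function and the callables passed to it are hypothetical stand-ins for the real SVN, schema, and test commands, which are not shown here.

```python
def deploy(steps, rollback, take_offline, bring_online):
    """Run steps in order; roll back on the first failure.

    The site is always brought back online, whether or not the deploy worked.
    """
    take_offline()
    try:
        for step in steps:
            step()
        return True
    except Exception as exc:
        print(f"deploy failed: {exc}; rolling back")
        rollback()
        return False
    finally:
        bring_online()


# Hypothetical stand-ins that only record what would have been done.
log = []


def failing_tests():
    raise RuntimeError("tests failed")


ok = deploy(
    steps=[
        lambda: log.append("svn update"),
        lambda: log.append("schema deltas"),
        failing_tests,
    ],
    rollback=lambda: log.append("rollback"),
    take_offline=lambda: log.append("offline"),
    bring_online=lambda: log.append("online"),
)
print(ok, log)
```

Putting the "site gets back online" step in a `finally` clause is a design choice: it runs after both a successful deploy and a rollback, so the site cannot be left offline by a failed step.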
On the Java platform I have tried every single major CI system there is. My tip is that paying for a commercially supported solution has been the cheapest build system I've ever seen. These things take time to maintain, support and troubleshoot, especially with a heavy load of builds running all the time.
The example workflow you give is similar to the one proposed by TeamCity. The idea being:
Code
Check in to "pre-test"
CI server tests the "pre-commit"
If (and only if) tests pass, the CI server commits the code change to the main repo
It's a religious war but I prefer:
Code - Test - Refactor (loop)
Commit
CI server also validates your commit
Every responsible programmer should run all the tests before committing.
The main argument for the first way is that it guarantees that there is no broken code in SCM. However, I would argue that:
You should trust your developers to test before committing
If tests take too long, the problem is your slow tests, not the workflow
Developers are keen to keep tests fast
Relying on the CI server to run tests gives you a false sense of security

What Tools Do You Recommend To Auto-Build Your Application? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
As recently as several years ago, the developers actually made the builds that went to clients. This was obviously a disaster for reasons too numerous to list.
Then when we started to learn the errors of our ways, we looked for a way to auto-build the entire application on a dedicated build machine. The culture at that time was very averse to bringing in outside tools, so we built our own autobuild system by writing a VB app.
This worked fine for a while, until the project's structure started to change, new projects were added, and we needed to build the application in different ways. Then the weaknesses of our hand-rolled autobuilder became apparent and, over time, increasingly onerous. This disease has now progressed to the point where QA (who owns our build process) can't even maintain the autobuilder because it requires more and more programming skill. Every time we add a project or change something in an existing project, it consumes more developer time just to make it work. There have been days when we were unable to produce a build because the system was broken.
I'm now in a position where I can change this process, and I'm looking to scrap the entire system and put something else in its place. My goals are:
Have an autobuild system that can run with zero human interaction at a specific time every day. It should be able to gather all the source code, compile all the apps, create the setups, put the finished products on a network share, and possibly trigger the automated testing system to kick in (we use QTP).
The autobuild system should be flexible enough to easily adapt to changes in the project without requiring a major overhaul.
It should be simple enough so that QA can own the system and not require developer resources to make changes to how builds are made.
What are your experiences? Can you recommend an autobuild system? Should I have different goals?
I'm currently using CruiseControl integrated with Ant to control project builds. This allows flexibility of build schedules and means you can automate the entire build process fairly easily using Ant scripts. Also, during defect fixing periods you can have CruiseControl set up to watch for source control submissions instead of time periods and build when these occur. This allows developers very quick feedback on defect fixes.
I use FinalBuilder and FinalBuilder Server for nightly builds. It's a bit buggy at times, but if you think it through it's quite easy to create extensible projects that can build X project type, build its database from change scripts and deploy it to a testing server.
It can also handle all kinds of weird and wonderful things like zipping a nightly build and uploading it to an FTP server or creating ISO images automatically.
Definitely look into MSBuild if you're on the Microsoft stack.
Joel is always going on and on about how great FinalBuilder is, so that might be worth a look as well.
We just migrated from a hand-rolled set of Perl scripts to a Buildbot setup. I found it because that's what Google's using for Chrome.
You can do nightlies, or it can integrate with source control to do an isolated test build whenever anybody does a checkin, or a variety of other things. It's also parallel; you can have more than one machine in the build farm, either for specialized duties or just to handle more load.
The entire system is written in Python, so it's platform-agnostic, which is important if you need to do builds on more than one platform. It can do anything you can do from the command line; we have it calling MSBuild for user-mode components, a DDK build for kernel-mode pieces, and running products for unit test builds.
Out of the box it supports most OSS source control tools, but if you're using TFS or something else you may need to modify the package that you install on the slave machines.
I think you are on the right track here.
Whoever looks after your automated build process needs to have a fundamental understanding of how your solution fits together. This doesn't necessarily mean knowing how to write code or architect solutions, but they will require a solid understanding of how the solution compiles, packages itself etc.
You might need to share responsibility for builds between people or teams to accomplish this. I'd say that a daily build is a "team responsibility".
I'd look at establishing a baseline build configuration which can be extended for "special use" builds (besides just building a release version), e.g. internationalized releases, fxCop/Quality Tools config, build + run Unit Tests, continuous integration builds, a build config to run on developer workstations, etc.
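One way to picture a baseline configuration with "special use" extensions is a set of defaults plus explicit overrides. The sketch below is a Python illustration only; the option names in `BASELINE` and the `build_config` helper are hypothetical and do not correspond to any particular build tool's settings.

```python
# Hypothetical baseline options shared by every kind of build.
BASELINE = {
    "configuration": "Release",
    "run_unit_tests": False,
    "sign": True,
    "verbose_log": True,
}


def build_config(**overrides):
    """Return the baseline with overrides applied; reject unknown options."""
    unknown = set(overrides) - set(BASELINE)
    if unknown:
        raise KeyError(f"unknown build options: {sorted(unknown)}")
    return {**BASELINE, **overrides}


# A continuous integration build: same baseline, tests on, signing off.
ci = build_config(run_unit_tests=True, sign=False)
print(ci)
```

Rejecting unknown options keeps each special-use build a visible, small difference from the baseline instead of a separate configuration that drifts over time.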
Instead, I'd aim to achieve the following:
Automatic versioning, signing etc
Ability to produce verbose output (logging) to help debug build breaks
On that point - it should handle errors properly, capture as much information and log it properly
Consistency - It should work the same way each time to produce repeatable outcomes
Run in a clean, limited access environment
Well commented/documented so that it can be understood by new staff, etc.
Option to generate release notes, compile metrics, produce reports (if this option is available)
Ability to deploy to multiple environments
Support different ways to obtain source code from source control, e.g. by changeset, label, date, etc
As for tool recommendations, I've used FinalBuilder, Visual Build Pro, MSBuild/Team Build, nAnt, CruiseControl and CIFactory, plus good old-fashioned batch files.
Each has its pros and cons. I'm not going to make a recommendation except to say that the products with decent UI support were a little bit easier to work with, but at times were far less powerful. If you're working with Visual Studio, MSBuild is very powerful, but has a somewhat steep learning curve.
As for tools delivered with MS Visual Studio, you might want to use MSBuild. Additional community toolsets for MSBuild will even let you check out code from Subversion and zip the output.
We're using it successfully in our company. Our projects consist of several solutions with 100+ subprojects. Works like a charm.
Visual Build Pro is nice, if your build machines are Windows. I think this would fill the requirement you have about QA owning the system. But don't get me wrong, it's pretty powerful.
We use CruiseControl.NET and UppercuT (which uses NAnt) to do this. UppercuT uses conventions for building so it makes it really easy for someone to get started by answering three questions (What is the solution named? What is the path to source control? What is your company's name?) and you are building.
http://code.google.com/p/uppercut/
Some good explanations here: UppercuT
We use the Hudson buildbot for building a big Java web app from Ant build scripts. Hudson is pretty sweet for our purposes. It has a master/slave setup so builds can be done concurrently (on a timer or on demand). Slave nodes can be any OS/hardware combo, provided each has the needed build tools already on it and is on the network (and won't crash every 10 min).
Full web-based interface including live console output, change logs, artifacts from the build are available across the network including previous builds (if successful). Awesomesauce!

Resources