How to increase Oozie shell container memory globally

I am facing issues where the container runs out of physical memory.
When I add the config below to the workflow:
<property>
<name>oozie.mapreduce.map.memory.mb</name>
<value>6144</value> <!-- for example -->
</property>
it works fine. I just wanted to know whether there is any other place apart from the workflow where we can add this config, such as a global Hadoop configuration level.
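One option that at least avoids repeating the property in every action is the workflow's global section (available in workflow schema 0.4 and later). It is still part of the workflow definition rather than a cluster-wide Hadoop config, but a sketch like the following applies the setting to all actions (the workflow name here is just a placeholder):
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
<global>
<configuration>
<property>
<name>oozie.mapreduce.map.memory.mb</name>
<value>6144</value>
</property>
</configuration>
</global>
<!-- start node, actions and end node as usual -->
</workflow-app>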

Related

applicationHost.xdt in App Service Plan instances

The Issue
I am currently in the process of integrating a pre-rendering service for SEO optimization; however, we use an Azure App Service Plan to scale up or down when necessary.
One of the steps for setting up the proper configuration requires placing an applicationHost.xdt file in the /site/ directory, which is one level above the /site/wwwroot directory where the application itself gets deployed to.
What steps should I take in order to have the applicationHost.xdt file persist to new instances spawned by the scaling process?
Steps I have taken to solve the issue
So far I have been Googling a lot, but haven't succeeded in finding a lot of documentation on using an applicationHost.xdt file in combination with an Azure App Service Plan.
I am able to upload the file to an instance manually, however I have assumed that when we then scale up to more instances the manually uploaded file will not be present on the new instance(s).
Etcetera
We are using Prerender.io as pre-rendering service.
Should there be an easier to set-up & similarly priced service available, we would be open to suggestions as we are in an exploratory phase regarding pre-rendering.
This shouldn't be a problem, because all files under an Azure App Service are shared between all of your instances. You can check this in the Kudu wiki: Persisted files. In my test, all instances kept the file.
As for uploading the applicationHost.xdt, you don't have to do it manually; there is an IIS Manager Site Extension that lets you very easily create XDT files, and it provides some sample XDTs for you.
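For context, an applicationHost.xdt is just an XDT transform that gets applied to applicationHost.config when the site starts; a minimal skeleton (the actual transform content depends on what your pre-render setup needs) looks roughly like this:
<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
<system.webServer>
<!-- transform elements go here, e.g. handler or module entries required by the pre-render service -->
</system.webServer>
</configuration>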

Change storage directory in Artifactory

I have just installed Artifactory and I need to get a company-wide Ivy repository up and running.
For disaster-recovery purposes, I need Artifactory to store data on a RAID-1 file system mounted at /srv (where the MySQL data files are also stored). I would prefer not to use blob storage, so how can I tell Artifactory to store all of its data in a directory different from the standard one?
System info: I run SLES 11 and I have installed Artifactory from RPM.
If you have Artifactory 4.6 or greater, you can create a $ARTIFACTORY_HOME/etc/binarystore.xml config file, e.g. /var/opt/jfrog/artifactory/etc/binarystore.xml.
The following config would put the artifacts in the /data directory:
<config version="v1">
<chain template="file-system"> <!-- Use the "file-system" template -->
</chain>
<provider id="file-system" type="file-system"> <!-- Modify the "file-system" binary provider -->
<fileStoreDir>/data/binaries</fileStoreDir> <!-- Override the <fileStoreDir> attribute -->
</provider>
</config>
Checksum-based storage is one of the biggest advantages of Artifactory. It gives much better performance and deduplication, and it enables upload optimization, replication optimization, and free copy and move of artifacts. Blob storage is by far the right way to store blobs (binaries).
The location of the artifact storage can be changed according to your needs by mapping the storage to $ARTIFACTORY_HOME/data.
For disaster recovery we recommend setting up active/passive synchronization or an active/active cluster. Also, the Artifactory backup dumps the files in the standard directory structure format, and the location of the backup can be configured.

What's the best way to remotely configure a running asp.net web application?

I have an ASP.NET web application that I'm deploying and I'm trying to figure out the best way to manage which environment it should point to when it starts up and to make sure that I haven't overlooked any options.
First a little bit of background.
The application is web deployed automatically from a build server, using the artifacts generated by the continuous build. The deployment package contains the configuration settings for every available environment, so you end up with something like this:
/Config/Environments/Development.xml
/Config/Environments/UAT.xml
/Config/Environments/Production.xml
The question is, what's the best way to indicate to the application when it starts which environment configuration file it should load?
Ideally I'd like to be able to change the current environment of the running application if possible, but I'm happy to skip this for now as I can always do a redeploy if need be.
I'd also like to avoid changing any of the artifacts that are created by the build, especially because the web deploy package is a zip file and doing that would mean rebuilding the web deploy package.
I've come up with the following options:
Use an environment variable on the target machine to hint at which environment to start up with and, if not present, default to development. The main downside to this is that I wouldn't be able to run two instances of the application on the same machine that point to different environments, and because we typically deploy the UAT and staging environments to the same machine this might become a problem.
Remotely edit the web.config to indicate which environment to start up with. I'm not sure how to do this, but it might be the best option(?).
There might be something you can do with web deploy, for example telling it to set web.config values when it runs, but I don't know if this is possible(?).
Am I missing something obvious? Any help would be greatly appreciated!
If you are deploying multiple instances to the same machine, then I assume the file paths could be helpful, e.g. deploy to a file path with the environment name in the folder: C:\inetpub\dev.
This is probably the simplest approach.
If you go with option 1, I'd use the registry over an environment variable.
You can also probably look at the target environment in your build script depending on the build server you are using.
It turns out the third option is actually possible. What I've done is the following:
Create a parameters.xml file in the root of the web application project that contains the parameters I want to change:
<parameters>
<parameter name="Environment"
description="Please provide the environment name for the application."
defaultValue="Development"
tags="">
<parameterEntry
kind="XmlFile"
scope="\\web.config$"
match="/configuration/appSettings/add[#key='Environment']/#value" />
</parameter>
</parameters>
This will create a SetParameters.xml in the same place as your deployment package when it is created.
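For reference, the generated Project.SetParameters.xml contains entries of roughly this shape (the value shown is just the default from parameters.xml):
<parameters>
<setParameter name="Environment" value="Development" />
</parameters>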
You can update the SetParameters.xml with the value of the environment you want to deploy to when your deploy step is running, for example in msbuild this would look like this:
<XmlUpdate
XmlFileName="$(DeployFolder)\Project.SetParameters.xml"
Xpath="/parameters/setParameter[#name='Environment']/#value"
Value="$(Environment)" />
Now you can run your deploy.cmd and it'll set the parameters when it deploys on the remote machine.

Using Robocopy to deploy sites

I want to be able to quickly deploy updates to a site that is fairly busy. For smaller sites I would just FTP the new files over the old ones. This one, however, has a few large DLLs that regularly get updated, and while they are copying the site is effectively down (plus there is the hassle of making backups of them in case something goes wrong).
My plan is to use TortoiseHg to synchronise with a staging copy on the server over FTP (using netdrive or something similar). I can then check all is running smoothly and once that is complete I would like to run a .bat file (or something else) that will create a backup of the live site (preferably only the files that are about to change, but that is not critical) and then copy the newly changed files over to the live site.
Is it also possible to have the copy ignore certain directories (like user uploads) so that it won't overwrite those files on the live site?
I've heard RoboCopy is the way to go but I'm not sure where to start. Would I need to call 2 commands (1 for the initial backup and one for the copy)? Is there any way to restore the live site to its previous state should something go wrong?
The site is in ASP.NET and would be copied to Windows 2003 server.
EDIT: It gets a little tricky when web.config items have changed and need to be merged so that the staging servers settings (appsettings, connection strings, etc) don't get deployed to the live site. How does that get handled?
What we use is the following:
first build the website with MSBuild in CruiseControl.NET to build the binaries
archive the currently deployed files under a timestamped folder to avoid losing data in case of a problem
C:\DevTools\Robocopy\robocopy.exe /R:1 /W:10 /mir "D:\WebSite\Files" "D:\Webarchive\ArchivedFiles\Documents.%date:~0,-8%.%date:~3,-5%.%date:~6%.%time:~0,-9%.%time:~3,-6%.%time:~6,-3%" /XF *.scc
stop the website
deploy the website by copying everything except the files we archived (/XD is eXclude Directory)
C:\DevTools\Robocopy\robocopy.exe /R:1 /W:10 /mir "c:\dev\site" "D:\WebSite" /XF *.scc /XD "D:\WebSite\Files"
copy and rename (with xcopy, this time) a release.config with correct information to d:\Website\web.config (in fact, that's what we used to do, now we have a homebrew transformation engine to change parts of the dev web.config on the fly).
restart the website
(optional) delete the archive you made at step two
In your case, you'll have to add /XD flags for any directory you want to ignore, such as the users' uploads. And unless the production web.config file is complicated, I'd really recommend simply copying a release.config that you maintain as a part of the project, side by side with the web.config.
Is Robocopy a hard requirement? Why not use MSBuild? Everything you have listed can painlessly be done in MSBuild.
<!-- Attempt to build new code -->
<MSBuild Projects="$(BuildRootPath)\ThePhotoProject.sln" Properties="Configuration=$(Environment);WebProjectOutputDir=$(OutputFolder);OutDir=$(WebProjectOutputDir)\" />
<!-- Get temp file references -->
<PropertyGroup>
<TempConfigFile>$([System.IO.Path]::GetTempFileName())</TempConfigFile>
<TempEnvironmentFile>$([System.IO.Path]::GetTempFileName())</TempEnvironmentFile>
</PropertyGroup>
<!-- Copy current web configs to temp files -->
<Copy SourceFiles="$(OutputFolder)\web.config" DestinationFiles="$(TempConfigFile)"></Copy>
<Copy SourceFiles="$(OutputFolder)\web.$(Environment).config" DestinationFiles="$(TempEnvironmentFile)"></Copy>
<ItemGroup>
<DeleteConfigs Include="$(OutputFolder)\*.config" />
</ItemGroup>
<Delete Files="#(DeleteConfigs)" />
...
<!-- Copy app_offline file -->
<Copy SourceFiles="$(CCNetWorkingDirectory)\Builder\app_offline.htm" DestinationFiles="$(DeployPath)\app_offline.htm" Condition="Exists('$(CCNetWorkingDirectory)\Builder\app_offline.htm')" />
<ItemGroup>
<DeleteExisting Include="$(DeployPath)\**\*.*" Exclude="$(DeployPath)\app_offline.htm" />
</ItemGroup>
<!-- Delete Existing files from site -->
<Delete Files="#(DeleteExisting)" />
<ItemGroup>
<DeployFiles Include="$(OutputFolder)\**\*.*" />
</ItemGroup>
<!-- Deploy new files to deployment folder. -->
<Copy SourceFiles="#(DeployFiles)" DestinationFiles="#(DeployFiles->'$(DeployPath)\%(RecursiveDir)%(Filename)%(Extension)')" />
<!-- Delete app_offline file -->
<Delete Files="$(DeployPath)\app_offline.htm" Condition="Exists('$(DeployPath)\app_offline.htm')" />
On *nix-based servers I would use rsync, and I understand that on Windows you can use DeltaCopy, which is a port of rsync and is open source (I've never used DeltaCopy, so please check it carefully). Assuming it works like rsync, it is fast and only updates files that have been changed.
You can use various configuration options to delete files on the target that have been deleted on the source, and you can also add an exclude file listing the files or directories (e.g. the local config) that you do not want copied.
You should be able to fold it all into one script to run when required which means you can test and time it so you know what is happening.
Check out these links to see if they help:
ASP.NET website Continuous Integration+Deployment using CruiseControl.NET, Subversion, MSBuild and Robocopy
Deployment to multiple folders with Robocopy
You'll find that robocopy.exe /? is extremely helpful. In particular you'll want the /XF switch for excluding files, and /XD for excluding folders.
You will need to write a script (e.g. bat, powershell, cscript) to take care of the web.config issues though.
Microsoft themselves use robocopy to deploy updates to some sites.
I don't know if you have multiple servers, but our deployment script went something like: 1) Stop IIS (which would take the server out of load-balancer rotation), 2) RoboCopy /MIR from \\STAGING\path\to\webroot to \\WEB##\path\to\webroot where ## is the number of the server, 3) Start IIS. This was done after the site was smoke-tested on the staging server.
That doesn't much help with your config problem, but our staging and production config files were the same.
What you need (and I need) is a synchronization program with the ability to create a backup of the files on the server and make a quick copy of the files over FTP all at once, probably by copying them first to a temporary directory, or by partial updating.
This is one program that I found : http://www.superflexible.com/ftp.htm
WebDeploy is a much better way to handle deploys (see Scott H http://www.hanselman.com/blog/WebDeploymentMadeAwesomeIfYoureUsingXCopyYoureDoingItWrong.aspx)
But Robocopy is a great low-cost deploy tool that I still use on some sites (I haven't found the time to change them to Web Deploy). Robocopy is like xcopy but with a much richer set of options. So you would need 2 Robocopy commands (1 for backup and 1 for deploy). I normally run the backup command when the files are staged.
Managing config files is always tricky (and a big reason to use Web Deploy). One approach is to keep a copy of the config files for each environment checked into your source control (e.g. web.dev.config, web.uat.config, web.prod.config, etc.). The staging (or deploy) script would grab and rename the necessary config file.
You would probably need to use a combination of tools.
I would have a look at DFSR (File Server role) with a read-only folder on your live site (so it's one-way replication).
It is very easy to configure, has a nice GUI and the ability to exclude files based on location and/or masks, and with Volume Shadow Copy enabled you can have it run on schedules you set and update only those files that change (or run it manually). The beauty of this is that once it is configured, you don't have to touch it again.
Once you have the bulk of your files replicating you could then get assistance in automating the possible merge on web.config, assuming you want that automated.
MSBuild is great, except for one minor (or major depending on your point of view) flaw. It rebuilds the binaries every time you run a build. This means, for deploying from TEST to PRODUCTION, or STAGE to PRODUCTION (or whatever your pre-production environment is called), if you use MSBuild, you are not promoting existing binaries from one environment to the next, you are re-building them. This also means that you are relying, with certainty, that NOTHING has changed in the source code repository since you did an MSBuild to your pre-production environment. Allowing even the slightest chance of a change to anything, major or minor, means you will not be promoting a fully tested product into your production environment. In the places I work, that is not an acceptable risk.
Enter Robocopy. With Robocopy, you are copying a (hopefully) fully tested product to your production environment. You would then either need to manually modify your web.config/app.config to reflect the production environment, OR use a transformation tool to do that. I have been using the "Configuration Transformation Tool" available on SourceForge for that purpose - it works just like the MSBuild web/app.config transformations.
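To illustrate what such a transform looks like, a web.Release.config using the standard XDT syntax might replace a connection string like this (the connection name and server are placeholders):
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
<connectionStrings>
<add name="MainDb"
connectionString="Data Source=PRODSERVER;Initial Catalog=MainDb;Integrated Security=True"
xdt:Transform="SetAttributes" xdt:Locator="Match(name)" />
</connectionStrings>
</configuration>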

How do you deal with connection strings when deploying an ASP.NET site?

Right now our test and production databases are on the same server, but with different names. Deploying has meant editing Web.config to change all the connection strings for the correct database. A step which I forget all too frequently...
We've finally created a new database server for testing, and I'm moving the databases over... but now the server will be different and we'll still need to deal with connection string issues.
I was thinking of managing it via a hosts file, but the thought of switching that on my desktop machine whenever I need to test against production data seems cumbersome at best.
So I'm just wondering if there's a better way out there. Something that would build with a "production" web config for deployment would be ideal...
Use a Web Deployment Project and update the wdproj file (it's just an MSBuild file) with some post build tasks to output the correct .config file. I keep a web.config and web.release.config then use this in the wdproj file:
<Target Name="AfterBuild">
<Copy Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' " SourceFiles="$(SourceWebPhysicalPath)\web.release.config" DestinationFiles="$(OutputPath)\web.config" />
<Delete Files="$(OutputPath)\web.release.config" />
</Target>
More information
A simpler solution some like is using the configSource property of appSettings and connectionStrings, and then never overwriting those files on the production server.
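For example (the file name and connection name are placeholders), the connection strings section can point at a small external file that lives on the server and is excluded from deployments:
<!-- in web.config -->
<connectionStrings configSource="connections.config" />
<!-- connections.config, maintained per environment and never overwritten by a deploy -->
<connectionStrings>
<add name="MainDb" connectionString="Data Source=DBSERVER;Initial Catalog=MainDb;Integrated Security=True" providerName="System.Data.SqlClient" />
</connectionStrings>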
I usually have three separate web configs: one for my development machine, one for QA, and one for production. The development one connects to my local SQL database (which is firewalled from outside) and it is the default web.config. The others are named web-prod.config and web-qa.config. After publishing I delete the two that I don't need and rename the correct one to web.config. If I forget, the app breaks the first time it attempts to access the database, since the default config references one it can't get to.
Since IIS refuses to serve up a file whose name ends in .config, I make sure they all end in .config instead of, say, web.config-prod or web.config-qa.
Here's another thing you can try:
Using SQL Server Configuration Manager, make a db Alias for your development database so that the web.config file can be the same on both your development box and the production server.
I create a database alias on each server to point to the database. I then use this alias in my web.config files. If I need to change which database the application points to, then I change the alias and not the web.config.
For SQL Server, go to SQL Server Configuration Manager > SQL Native Client Configuration > Aliases > Create New Alias.
You can do the same thing with Oracle with the tnsnames file.
have environment folders with separate configs for each environment
deploy out the correct one for the environment
I did this so often, I made the web.config on the production server read-only.
I've been in a few places now that store them in the registry.
There's probably more elaborate ways to do it now but a lot of code I've worked on with a 1.0/1.1 heritage store the strings in the registry.
The registry has a few advantages
It keeps people from deploying the code to the wrong places since machines not configured properly will lack the keys
It eliminates the problem wherein a developer will accidentally package a web.config file with the development connection strings in it (followed by a frantic phone call in the middle of the night wherein it is revealed that the late night sysadmin did not back up the previous web.config and the developer does not know or recall the production strings)
It limits the possibility of a hacker being able to get the connection string by fetching the web.config off of the machine. Plus the registry has more levels of security than the filesystem.
We drive our deployments from our CI server. We usually have a separate file for each location and have the CI server switch to the appropriate config depending on the arguments passed to it. All the file editing is done in NAnt scripts, so developers can run the same build on their machine to get their own settings.
I'll put my connection strings in the machine.config on our QA and Production boxes. I'll keep them in the web.config on my dev box for flexibility, though. Then, I'll use a web deployment project to overwrite my dev connection strings with nothing (no connection strings) when deploying to QA. Therefore the QA site relies on the connection strings in machine.config. I still deploy to Production manually to make sure everything succeeds. I do this by manually copying everything from QA (except for web.config) to production.
This kind of task is exactly what build events are designed to address. Think of building as building for a specific target, any target specific configuration should be done there. (with the normal caveat that there are always some exceptions to the rule)
I've recently been leaning towards config manipulation on the continuous integration server. That's because we've had problems with multiple web.config, web.qa.config, web.production.config keeping the 95% of the file that should be the same in sync.
In a nutshell: there's only the one web.config in source control and it's the development configuration (debug friendly, local db, etc.). The build server does the compile, then a deploy to the canary site, then the package for release candidate.
We're using nant, so it's the .build file that has xmlpoke to set debug="false", alter connection strings, and whatever else needs to change in the canary copy and the packaging copy of the web.config.
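A rough sketch of what those xmlpoke calls look like in the NAnt .build file (the property names and the connection name are placeholders):
<xmlpoke file="${deploy.dir}/web.config" xpath="/configuration/system.web/compilation/@debug" value="false" />
<xmlpoke file="${deploy.dir}/web.config" xpath="/configuration/connectionStrings/add[@name='MainDb']/@connectionString" value="${prod.connectionstring}" />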
The build machine's deploy is called "canary" because it's the first thing to die if there's a problem.
