What is the best way to zip large files stored in Azure Blob Storage and let the user download them as a single archive (zip/rar)?
Can Azure Batch help?
Currently we implement this in a traditional way: we read the streams, generate the zip file and return the result, but this consumes a lot of server resources and makes users wait.
I'm asking about the best technical solution and technologies (preferably Microsoft ones).
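Here is a minimal sketch of one way to cut the memory cost of the read-then-zip approach described in the question. Instead of building the whole archive before returning it, it writes zip entries straight into the response stream as each blob's chunks arrive. It uses Python's standard `zipfile` module, which can write to non-seekable streams. `stream_zip` and the shape of `entries` are hypothetical. Where the chunks come from is also an assumption: with the `azure-storage-blob` SDK, for example, they could be the `chunks()` iterator of the downloader returned by `download_blob()`.

```python
import zipfile
from typing import BinaryIO, Iterable, Tuple


def stream_zip(entries: Iterable[Tuple[str, Iterable[bytes]]], sink: BinaryIO) -> None:
    """Write each (archive_name, chunks) pair as a zip entry directly into `sink`.

    `sink` may be non-seekable (e.g. an HTTP response body): zipfile then writes
    a data descriptor after each entry instead of seeking back to patch headers.
    """
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, chunks in entries:
            # Sizes are unknown up front, so allow ZIP64 in case a blob exceeds 4 GiB.
            with zf.open(name, mode="w", force_zip64=True) as member:
                for chunk in chunks:
                    member.write(chunk)  # only one chunk is held in memory at a time
```

The archive still passes through the server, so this does not save CPU time. It does keep memory roughly bounded by the chunk size. If the web framework does not buffer the response, the download can also start before the last blob has been read.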
There are a few ways you can do this **from an Azure Batch point of view only**. (For the initial part, your user code should own whichever zip API it uses to zip the files; once the archive is in blob storage and you want to use it on the nodes, there are the options below.)
For the initial part of your question I found this, which could come in handy: https://microsoft.github.io/AzureTipsAndTricks/blog/tip141.html (this is mainly for ideas; you will know better and will need to design your solution accordingly).
In options 1 and 3 below, you need to make sure your user code handles unzipping/unpacking the archive. Option 2 is a built-in Batch feature for *.zip files, available at both the pool and the task level.
Option 1: You could add your *.rar or *.zip file as an Azure Batch resource file and then unzip it in the start task once the resource file has been downloaded (see "Azure Batch Pool Start up task to download resource file from Blob FileShare").
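For Option 1, the unzip step in the start task could look roughly like this Python sketch. It assumes the resource files have already been downloaded into a source directory, such as the start task's working directory, and that the archives are `.zip`. Python's standard library cannot unpack `.rar`, so that would need a separate tool. The function name and the one-folder-per-archive layout are hypothetical choices.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_resource_files(src_dir: str, dest_dir: str) -> List[str]:
    """Extract every *.zip directly under src_dir into dest_dir/<archive name>/."""
    extracted = []
    for archive in sorted(Path(src_dir).glob("*.zip")):
        target = Path(dest_dir) / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            # extractall strips absolute paths and ".." parts from member names
            zf.extractall(target)
        extracted.append(str(target))
    return extracted
```

A natural destination is the folder in the `AZ_BATCH_NODE_SHARED_DIR` environment variable, because later tasks on the node can read it.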
Option 2: If you have zip (not rar) files in play, the best option is the feature called Azure Batch application packages: https://learn.microsoft.com/en-us/azure/batch/batch-application-packages
The application packages feature of Azure Batch provides easy management of task applications and their deployment to the compute nodes in your pool. With application packages, you can upload and manage multiple versions of the applications your tasks run, including their supporting files. You can then automatically deploy one or more of these applications to the compute nodes in your pool.
https://learn.microsoft.com/en-us/azure/batch/batch-application-packages#application-packages
An application package is a .zip file that contains the application binaries and supporting files that are required for your tasks to run the application. Each application package represents a specific version of the application.
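For Option 2, preparing an application package mostly means zipping the application folder so that the binaries sit at the root of the archive. Below is a minimal Python sketch that assumes the application lives in a local folder. `build_app_package` is a hypothetical helper. The sketch does not cover uploading the resulting file as a package version (through the portal, CLI or SDK).

```python
import zipfile
from pathlib import Path
from typing import List


def build_app_package(app_dir: str, package_path: str) -> List[str]:
    """Zip the *contents* of app_dir into package_path and return the entry names.

    Entries are stored relative to app_dir, so the binaries end up at the
    root of the package rather than inside an extra top-level folder.
    """
    root = Path(app_dir).resolve()
    target = Path(package_path).resolve()
    if root in target.parents:
        # Otherwise rglob below could pick up the half-written package itself.
        raise ValueError("write the package outside the application folder")
    names = []
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```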
Regarding size: refer to the maximum size allowed in blob storage, linked in the document above.
Option 3 (not sure this fits your scenario): It's a long shot for your specific case, but you could also mount a blob container as a virtual drive on the pool's nodes using the Azure Batch mount feature, and then write code in the start task (or similar) to unzip from the mounted location.
Hope this helps :)
Related
I am using the GGIR package for accelerometer data analysis. My data is in a OneDrive folder, which takes a long time to download. Is there a way I can access the OneDrive files directly without downloading them to my local machine?
My guess would be that this is not possible. If you're working with Azure there are tools available to connect to OneDrive and download/upload the data which is then processed on a separate instance. I'm guessing the same applies to your local machine, but I'm not intimately familiar with Microsoft's services to be sure.
For example:
By using Azure Logic Apps and the OneDrive connector, you can create automated tasks and workflows to manage your files, including upload, get, delete files, and more. With OneDrive, you can perform these tasks:
Build your workflow by storing files in OneDrive, or update existing files in OneDrive.
Use triggers to start your workflow when a file is created or updated within your OneDrive.
Use actions to create a file, delete a file, and more. For example, when a new Office 365 email is received with an attachment (a trigger), create a new file in OneDrive (an action).
https://learn.microsoft.com/en-us/azure/connectors/connectors-create-api-onedrive
Suppose we have a Hadoop MapReduce task to run. This MapReduce job needs to access some system resources on a local drive, i.e. on some node (in fact, we have to place those resources on all nodes).
The question is: which permissions should be given to that resource file?
I would like to make it readable by the user who runs Hadoop. But in fact the task will be executed as another user, 'yarn'. So if I want to place some resources in the home folder of the user who runs the Hadoop job (or a related Oozie job, etc.), I cannot, because the home folder of the user that actually runs the MapReduce task is /home/yarn/.
What is the best way to deal with this issue?
How do I control under which user MapReduce runs?
Where can I lookup that settings?
I guess all you need is to create the required folders for such resources in HDFS and set permissions on those folders and the files they contain using the 'hadoop fs -chmod ...' command.
Please refer to the link below:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html
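Staging a resource in HDFS as suggested above comes down to three `hadoop fs` calls. The Python sketch below builds the command lines so they can be inspected or passed to `subprocess.run`. The HDFS path and the `755` mode (read for everyone, plus execute so directories can be traversed) are assumptions, not requirements.

```python
import shlex
import subprocess
from typing import List


def hdfs_stage_commands(local_path: str, hdfs_dir: str, mode: str = "755") -> List[List[str]]:
    """Build the `hadoop fs` calls that copy a resource into HDFS and open up its permissions."""
    return [
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],            # create the folder (and parents)
        ["hadoop", "fs", "-put", "-f", local_path, hdfs_dir],  # upload, overwriting an old copy
        ["hadoop", "fs", "-chmod", "-R", mode, hdfs_dir],      # folder and files readable by all
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, echoing it first; stops on the first failure."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```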
First off, the statement "MapReduce needs to access some system resources on local drive" is not possible when running a MapReduce program in distributed mode. Whatever file you need should be moved to HDFS. Give the file read permission for all users and everything should be fine. If you need to read the file in the Mapper or Reducer rather than pass it as input to the MapReduce program, consider using the Distributed Cache mechanism provided by MapReduce.
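If the file is shipped through the Distributed Cache instead, each task gets a local copy in its working directory. Below is a sketch of a Hadoop Streaming mapper in Python. It assumes the job was launched with something like `-files hdfs:///apps/shared/resources/lookup.txt#lookup.txt`; the path and the tab-separated file format are hypothetical. The `#lookup.txt` fragment makes the cached file appear as `./lookup.txt` in the task's working directory.

```python
import sys
from typing import Dict, Iterable, Iterator


def load_lookup(path: str) -> Dict[str, str]:
    """Read a tab-separated key/value file (the cached resource) into a dict."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("\t")
            if key:
                table[key] = value
    return table


def map_lines(lines: Iterable[str], lookup: Dict[str, str]) -> Iterator[str]:
    """Emit "key<TAB>looked-up value" for each input record (key = first field)."""
    for line in lines:
        key = line.rstrip("\n").split("\t", 1)[0]
        yield f"{key}\t{lookup.get(key, 'UNKNOWN')}"


def main() -> None:
    # With "-files ...#lookup.txt", the cached copy is linked into the task's working directory.
    lookup = load_lookup("lookup.txt")
    for record in map_lines(sys.stdin, lookup):
        print(record)
```

When the file is used as the streaming `-mapper` script, it would end with a call to `main()`.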
I want to deploy a PHP application from a git repository to the AWS OpsWorks service.
I've set up an App and configured Chef cookbooks so it runs the database schema creation, dumps assets, etc.
But my application has some user-generated files in a subfolder under the web root. The git repository has a .gitignore file in that folder, so an empty folder is there when I run the deploy command.
My problem is: after some files have been generated in that folder (by using the site), if I run the 'deploy' command again, OpsWorks adds a new release under the 'site_name/releases/xxxx' folder and points the 'site_name/current' symlink at it.
This makes my previous user-generated content inaccessible. What is the best solution for this kind of situation?
Thanks in advance for your kind answers.
You have a few different options. Listed below in order of personal preference:
Use Simple Storage Service (S3) to store the files.
Add an Elastic Block Store (EBS) volume to your server and save files to the volume.
Save files to a database (this is something I would not do myself, but the option is there).
When using OpsWorks, think of replicable/disposable servers.
What I mean by this is that if you can create one server (call it server A) and then switch to a different one in the same stack (call it server B), the result of using server A or server B should not impact how your application works.
While it may seem like a good idea to save your user-generated files in a directory shared between different versions of your app (a new release directory is generated every time you deploy), you run the risk of losing those files when you destroy your server.
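For reference, the shared-directory idea mentioned above usually means keeping uploads outside the release folders and symlinking them into every new release. Below is a minimal Python sketch that assumes a `shared/uploads` folder next to `releases/`; the layout and `link_shared_dir` are hypothetical. As the answer points out, the files still live on that one server's disk and are lost along with it.

```python
from pathlib import Path


def link_shared_dir(release_dir: str, shared_dir: str, subdir: str = "uploads") -> Path:
    """Make <release_dir>/<subdir> a symlink to <shared_dir>/<subdir>."""
    shared = Path(shared_dir) / subdir
    shared.mkdir(parents=True, exist_ok=True)
    link = Path(release_dir) / subdir
    if link.is_symlink():
        link.unlink()  # re-point an existing link
    elif link.is_dir():
        # The checkout contains the placeholder folder kept by .gitignore;
        # only replace it if that placeholder is all it holds.
        leftovers = [p.name for p in link.iterdir() if p.name != ".gitignore"]
        if leftovers:
            raise RuntimeError(f"{link} is not empty: {leftovers}")
        gitignore = link / ".gitignore"
        if gitignore.exists():
            gitignore.unlink()
        link.rmdir()
    link.symlink_to(shared.resolve(), target_is_directory=True)
    return link
```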
Benefits and downsides of using S3?
Benefits:
S3 gives your files high redundancy and availability.
S3 is external to your application server, so if your server dies or you decide to move it to a different region, you can continue using the same S3 bucket.
Your application is easy to scale: you could add multiple application servers that read and write files to S3.
Downsides:
You need extra code in your application: you have to use the AWS API to store and retrieve the files. Using the S3 API is not hard, but it may take an extra step to get where you need to be. Take a look at the "Using an Amazon S3 Bucket" walkthrough for reference; it contains the code used to upload files to the S3 bucket in that example.
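The extra code for the S3 route can be small. This Python sketch assumes a boto3 S3 client (`boto3.client("s3")`) is passed in. The bucket, the `user-uploads/<user_id>/` key layout and `store_user_file` are hypothetical. The application would store the returned key instead of a local path. The question's app is PHP, and the AWS SDK for PHP offers an equivalent `putObject` call.

```python
import mimetypes
import posixpath
from typing import Any


def store_user_file(s3_client: Any, bucket: str, user_id: str, filename: str, data: bytes) -> str:
    """Upload one user-generated file to S3 and return its object key."""
    # Keep only the base name so "../" tricks cannot influence the key.
    safe_name = posixpath.basename(filename.replace("\\", "/"))
    if not safe_name:
        raise ValueError("empty file name")
    key = f"user-uploads/{user_id}/{safe_name}"
    content_type = mimetypes.guess_type(safe_name)[0] or "application/octet-stream"
    s3_client.put_object(Bucket=bucket, Key=key, Body=data, ContentType=content_type)
    return key
```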
Benefits and downsides of using EBS?
Benefits:
EBS is an "external hard drive" that you can easily mount to your machine using the OpsWorks Resource Manager.
EBS volumes can be backed-up and restored.
It may be the fastest option to implement and integrate into your application.
Downsides:
You need to assign it to an instance before the instance is running.
It could be time consuming to move from server A to server B (downtime may be required).
You cannot scale your application horizontally. While you can create copies of the EBS volume and assign them to different instances, the volume itself will not be shared.
Downsides of using a database?
Just do a Google search for "storing files in database".
Take a look at Storing Images in DB - Yea or Nay?
My preferred choice would be to use S3, but ultimately this is your decision.
Good luck!
EDIT:
Take a look at the opsworks-chef-cookbooks repository; it contains some recipes to deploy Symfony2 applications on OpsWorks. I have been using it for over a year and it works quite well.
Use Chef templates, and use them in a recipe in the OpsWorks deploy lifecycle event.
I have heard lots of strategies for deploying ASP.NET applications, but am not quite sure which strategy is best for my needs.
I have an ASP.NET 4 application. I have separate development/staging/production environments (different web.configs). I also need to manage SQL Server changes. I may have more than one DB server and more than one app server to push changes to. Ideally, I would like to hit a button that says "deploy to staging" or "deploy to production" and have it deploy the code/DB/config files to the correct servers. Ideally, I'd also like some process to roll back in case of a bad release.
I have heard of xcopy/robocopy strategies, MSDeploy (now called Web Deploy?) strategies, and building MSI packages to deploy.
Which of these seems like the best fit for this type of need?
Method #1
If you have some time to spend, I suggest using CruiseControl.NET. For a while at least, the Stack Overflow team used it for deployments.
Method #2
As far as copy strategies go, I recommend a combination of 7-Zip and FTP for the application and media. 7-Zip is nice because it lets you exclude specific files (such as web.config), folders and file types, and lets you compress different files differently; for example, there is no point in compressing a PNG. Note that this does a full deployment every time, so if you have large media folders, I'd handle them separately.
As for the database, I believe you will have the best luck with SQL Compare by Redgate. They are commercial tools, but they are very, very good, and they've been mentioned positively multiple times on the Stack Overflow podcasts.
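The "compress different files differently" idea can be shown with Python's standard `zipfile` module standing in for 7-Zip. Already-compressed formats are stored as-is, everything else is deflated, and `web.config` is left out so each server keeps its own. The extension list and `package_site` are assumptions for illustration.

```python
import zipfile
from pathlib import Path
from typing import Dict

# Hypothetical policy: formats that are already compressed, and files never shipped.
ALREADY_COMPRESSED = {".png", ".jpg", ".jpeg", ".gif", ".zip", ".7z", ".mp4"}
EXCLUDED_NAMES = {"web.config"}


def package_site(site_dir: str, archive_path: str) -> Dict[str, int]:
    """Zip site_dir into archive_path; return {entry name: compression method used}."""
    root = Path(site_dir)
    methods = {}
    with zipfile.ZipFile(archive_path, "w") as zf:
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.name.lower() in EXCLUDED_NAMES:
                continue  # skip folders and per-environment config files
            arcname = path.relative_to(root).as_posix()
            # Storing (no compression) avoids wasting CPU on files that will not shrink.
            method = zipfile.ZIP_STORED if path.suffix.lower() in ALREADY_COMPRESSED else zipfile.ZIP_DEFLATED
            zf.write(path, arcname, compress_type=method)
            methods[arcname] = method
    return methods
```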
Build a CMD file on the development/build server that generates the master 7-Zip file and FTPs it to a dedicated folder on the staging (or production) server. I end up with multiple calls to 7-Zip feeding files into a single 7-Zip file, using a different compression method for each batch.
Build a CMD file for each staging or production server. This file backs up the current files and extracts the 7-Zip file to the proper location.
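The per-server command file (back up, then extract) can also be sketched in Python, and the backup gives the simple rollback path the question asks for. The sketch assumes the package is a `.zip`; 7-Zip can also produce zip archives. `deploy_archive`, `rollback` and the backup layout are hypothetical. Extracting over the existing folder keeps any files the package leaves out, such as the server's own `web.config`.

```python
import shutil
import tempfile
import time
import zipfile
from pathlib import Path
from typing import Optional


def deploy_archive(archive: str, target_dir: str, backup_root: str) -> Optional[Path]:
    """Back up target_dir, then extract archive over it; returns the backup (None on first deploy)."""
    target = Path(target_dir)
    backup = None
    if target.exists():
        Path(backup_root).mkdir(parents=True, exist_ok=True)
        # Timestamped, collision-free folder such as backups/20240101-120000-<random>/site
        stamp_dir = tempfile.mkdtemp(prefix=time.strftime("%Y%m%d-%H%M%S-"), dir=backup_root)
        backup = Path(stamp_dir) / "site"
        shutil.copytree(target, backup)
    try:
        with zipfile.ZipFile(archive) as zf:
            # Overlay: files not in the package (e.g. this server's web.config) are kept.
            zf.extractall(target)
    except Exception:
        if backup is not None:
            rollback(backup, target_dir)  # restore the previous release, then re-raise
        raise
    return backup


def rollback(backup: Path, target_dir: str) -> None:
    """Replace target_dir with the contents of a previous backup."""
    target = Path(target_dir)
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(backup, target)
```

Because this extracts as an overlay, files deleted from the project stay on the server until they are cleaned up separately.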
A deployment to staging will go like this:
Execute your 7zip-prep command file which will trigger FTP upload to a dedicated FTP folder on the staging server
Execute DB changes against the staging database server via scripts generated by SQL Compare
Execute 7zip extraction command file on the staging server
This is the method that I use. I have not invested the time to master CruiseControl.NET, but when I do I will probably use it instead, at least for larger applications. It's not a one-click deployment, but it allows for multiple efficient deployments per day (as I've been doing off and on for a few years now). The 7-Zip method is nice because once you have your command files, you can copy them and reuse them for new projects very quickly.
I'm developing an application using Adobe Flex 4.5 SDK, in which the user would be able to export multiple files bundled in one zip file. I was thinking that I must need to take the following steps in order for performing this task:
Create a temporary folder on the server for the user who requested the download. Since the user is anonymous, I have to read State/Session information to identify them.
Copy all the requested files into the temporary folder on the server
Zip the copied files
Download the zip file from the server to the client machine
I was wondering if anybody knows any best practices or sample code for this task.
Thanks
The ByteArray class has some methods for compressing, but this is more for data transport, not for packaging up multiple files.
I don't like saying things are impossible, but I will say that this should be done on the server-side. Depending on your server architecture I would suggest sending the binary files to a server script which could package the files for you.
A quick google search for your preferred server-side language and zipping files should give you some sample scripts to get you started.
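On the server side, the temporary-folder steps from the question can often be skipped by building the archive in memory and returning it as the response body. This Python sketch assumes the exportable files live under one base folder and that the client sends relative file names. `zip_requested_files` is hypothetical. For very large exports, writing to the response stream instead of an in-memory buffer keeps memory bounded.

```python
import io
import zipfile
from pathlib import Path
from typing import Iterable


def zip_requested_files(base_dir: str, requested: Iterable[str]) -> bytes:
    """Bundle the requested files (relative to base_dir) into one in-memory zip."""
    root = Path(base_dir).resolve()
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in requested:
            path = (root / name).resolve()
            # Reject names like "../secret" that escape the export folder.
            if root not in path.parents or not path.is_file():
                raise ValueError(f"not an exportable file: {name}")
            zf.write(path, path.relative_to(root).as_posix())
    return buf.getvalue()
```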