Push to Azure Container Registry is painfully slow at unpredictable times

We're pushing Docker images from an Azure DevOps pipeline to a repository in ACR.
The build agent runs on a VM in Azure and isn't used by anything else.
Sometimes (often at night) the push command takes 3s; sometimes (mostly during the day, UTC) it can take up to 20min.
The tier of our ACR is Standard.
Have you encountered similar problems? Is there anything we can do to avoid these "lags"?
Since we're pushing many images per day, this can be a real nuisance.

There is an issue about this on GitHub: https://github.com/Azure/acr/issues/214
For us it has been especially bad yesterday and today.
In the GitHub issue they mentioned that they are looking into it: https://github.com/Azure/acr/issues/214#issuecomment-509672337

Related

How to configure Google Cloud Composer cost-effectively

After some research and testing, we have decided to start using Google Cloud Composer. Since our current DAGs and tasks are relatively small, and don't require the server to run continuously, I am looking at how to manage costs.
Two questions:
The option to use preemptible VMs seems logical. This saves costs considerably, and I'm thinking of going for 3x n1-standard-4. I expect each task to be quite short, so I don't think this will have a significant impact on our workloads. Is it possible to use preemptible VMs with Composer?
Schedule to turn the Composer environment on/off, as asked in this post. I can't find how to do this in the documentation, either by turning the whole environment down, or by shutting down the workers as proposed in the answer.
Help, anyone?
This is an interesting question.
One roadblock you may encounter is the nature of Airflow itself. Generally, Airflow is not intended for ephemeral use. Instead, I'd suspect that the vast majority of Airflow use, Cloud Composer or otherwise, is persistent. Ephemerality brings cost benefits but also risks with the Airflow architecture. For example, what happens if the scheduler that is supposed to restart your Airflow resources fails?
To answer your questions:
Preemptibles are not supported in Composer. While PVMs have a ton of awesome benefits, they could leave tasks in a very weird state, especially if you got preempted several times.
There is no formal documentation for this process because it's generally informal and not recommended if you must depend on your environment. The basic approach, though, would be to:
Create a very small GCE VM
Set up the Cloud SDK (gcloud) to connect to your project
Create a crontab that either does a fresh create/delete of an environment when you need it /or/ pauses the VMs in the Composer worker pool
In the long-term, I think Composer will better support ephemeral use of worker resources. In the short term, another option is to run a lightweight Airflow environment on a small(ish) GCE VM and then suspend/resume that VM when you need to use Airflow. You don't get Composer that way, but you do benefit from the team's work improving and expanding GCP support in core Airflow.
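The crontab approach described above could be sketched roughly as follows. This is a minimal sketch, not a recommended setup: the environment name `my-env`, the location `us-central1`, and the weekday 07:00-19:00 UTC window are hypothetical placeholders. The script only builds the `gcloud composer environments create` or `delete` command that a cron job on the small VM could then run. Keep in mind that deleting an environment discards it entirely, so a real setup would need to think about DAGs, variables, and connections.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical placeholders; replace with your own values.
ENV_NAME = "my-env"
LOCATION = "us-central1"
START_HOUR = 7   # assumed: environment should exist from 07:00 UTC...
STOP_HOUR = 19   # ...until 19:00 UTC, Monday to Friday


def should_be_running(now: datetime) -> bool:
    """Return True on weekdays inside the assumed working window."""
    return now.weekday() < 5 and START_HOUR <= now.hour < STOP_HOUR


def gcloud_command(now: datetime, exists: bool) -> Optional[List[str]]:
    """Return the gcloud command to run, or None if nothing needs to change."""
    wanted = should_be_running(now)
    if wanted and not exists:
        action = "create"
    elif not wanted and exists:
        action = "delete"
    else:
        return None
    cmd = ["gcloud", "composer", "environments", action, ENV_NAME,
           "--location", LOCATION]
    if action == "delete":
        cmd.append("--quiet")  # skip the interactive confirmation prompt
    return cmd


if __name__ == "__main__":
    # A real cron job would first look up whether the environment exists.
    cmd = gcloud_command(datetime.now(timezone.utc), exists=False)
    print(" ".join(cmd) if cmd else "nothing to do")
```

Separating the decision (`gcloud_command`) from the execution makes it easy to check the schedule logic without touching a real project.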

Next.js app takes 33 seconds to load. What could make it THAT slow?

I'm almost done with my first ever Next.js app.
Dev was always somewhat slow to load, but production is absolutely ridiculous.
On first load it takes 30+ seconds for the home page to render.
I've seen really slow sites take 5-10 seconds, but what could I possibly be doing wrong that would lead to 30 second load times?
Is prefetch a huge performance killer?
Finding the bottleneck:
First of all you need to find out what is making your site slow.
For that I recommend lighthouse or the network tab of your development tools.
Common developer mistakes:
A. Development instead of production mode:
Next.js has two modes. The development mode (next dev) is pretty slow, since a lot of development tools are executed and shipped.
To run your app in production mode first build it with next build and then start it with next start.
B. Included too much code:
Sometimes developers include gigantic npm modules, or even modules that have only been built for Node.js, in their Next.js app.
Finding these modules is actually pretty easy thanks to these examples:
https://github.com/zeit/next.js/tree/canary/examples/with-webpack-bundle-analyzer
https://github.com/zeit/next.js/tree/canary/examples/with-webpack-bundle-size-analyzer
C. Cold Serverless instance:
If you are running your Next.js instance on a serverless provider, keep in mind that it may take some time to start the serverless instance, especially if you are on a free plan.
Try to exclude your development directory from Windows Defender

Alternative to client side task runners in asp.net 5

I have been using Visual Studio Team Services (formerly Visual Studio Online) for version control. As a backup, I keep my project files on OneDrive and have mapped the root workspaces there. Soon after the launch of RC 1 and its support on Azure, we started migrating all our projects to ASP.NET 5, and things have been really great. I really love the GruntJS task runner for client-side development, but the issue with it is that the node modules create a deeply nested folder structure, which causes the OneDrive sync to fail.
The client-side development happens in TypeScript, and Grunt is used just for bundling and minifying the compiled JavaScript files. Web Optimizations is not the recommended method in ASP.NET 5, and I could not find any port of it for .NET Core.
While searching for solutions, I stumbled upon a link which said that OneDrive for Business does not have this limitation. We have an Office 365 subscription for the organization and tried syncing the projects there, but it failed, as even OneDrive for Business has this limitation.
There is a UserVoice suggestion for this, where the Microsoft representative says that they are thinking about it, but until it gets implemented, I wanted to ask for alternatives to GruntJS for ASP.NET 5.
Most of the applications we build are single-page enterprise apps where the user opens the app at the start of the workday and closes the tab at the end of the day, rarely refreshing it, so we can live without optimizations for now. But out of curiosity, for consumer-oriented applications, are there any alternatives to GruntJS for ASP.NET 5?
EDIT
Points to be noted
TFS Check in happens only when a code needs to be committed, but OneDrive sync happens continuously in the background automatically, in the event of a hardware crash or device getting lost (or any reason in the world stopping us from accessing the code), TFS won't be able to provide the code under development which was not committed. OneDrive saves us from those scenarios
Check in happens mostly when a significant portion of assigned task in backlog is completed. In 50% of the cases it's 2-6 working hrs but in the rest half it can happen around 2-3 business days, worst case scenario is 5 days in case holidays happen in between.
One drive allows us to stop syncing of selected folders available on live to local but there is no way we can stop sync of local folders to live
Possibility of hardware failure is assumed to be once a year resulting in a loss of 5 working days, and it's not an acceptable risk to lose the progress. (Just installing Visual Studio Enterprise with all features takes around 5-8 hrs on my development machine with a 4th Gen Intel i5 2.9GHz and 16GB of RAM. We have plans to upgrade our development machines with SSDs and 64 GB of RAM, but that's out of scope for this question.)
If we weigh the extra time spent on optimization tasks against the risk of lost files: optimization happens only when we push updates to the production server, which happens 1-3 times a month, and every round of optimization tasks takes no more than 10-15 minutes. So we can live with manually optimizing the files before production, or not optimizing at all.
Files like .gitignore and .csproj control which files sync to TFS, not OneDrive, and my problem is not with TFS at all. We are perfectly happy with the way check-ins are done and managed. Once code is committed, all the worries are resolved automatically; my worry is just the uncommitted code.
Summary
I agree the problem has a highly specific nature, but it might be of help to future readers either looking for similar solutions or just to gain some more knowledge.
Two things:
Team Foundation Services / Visual Studio online IS your source control solution, why would you need / want to back it up elsewhere?
I would probably exclude your node modules from your sync anyway. Your project configuration files will contain a list of the node modules you need, and if you wanted to move the app to a new machine or folder you'd just run npm install again to pull down the packages.
EDIT:
TFS Check in happens only when a code needs to be committed, but
OneDrive sync happens continuously in the background automatically, in
the event of a hardware crash or device getting lost (or any reason in
the world stopping us from accessing the code), TFS won't be able to
provide the code under development which was not committed. OneDrive
saves us from those scenarios
This indicates that your check ins are likely too large or not frequent enough.
How often do you check in? How often does your hardware fail to the extent where you lose all files? Is this an acceptable risk?
One drive allows us to stop syncing of selected folders available on live to local but there is no way we can stop sync of local folders
to live
And this is why you should be using TFS/VSO as the mechanism to control what files are stored in your source control repository. Systems like .gitignore or .csproj files exist for exactly this reason.
EDIT2:
I'm trying not to be too harsh and I agree this is a problem which could probably be solved in some specific way by OneDrive, but I'm trying to perform a SO equivalent of the 5 Whys.
You've said:
TFS won't be able to provide the code under development which was not committed. OneDrive saves us from those scenarios
and
Check in happens mostly when a significant portion of assigned task in backlog is completed. In 50% of the cases it's 2-6 working hrs but in the rest half it can happen around 2-3 business days, worst case scenario is 5 days in case holidays happen in between
Ok, so let's talk about that. You want to prevent loss of code if hardware fails; sure, we all do. Looking at your figures, 2-6 working hours seems reasonable, so you're looking at losing 0.25 to 1 day (ballpark). I think the bigger issue is if people aren't checking in for 5 days. You've mentioned 5 days could be the worst case and only if holidays happen, so that's not 5 working days, it's 2-3 business days worst case.
You've then said:
Possibility of hardware failure is assumed to be once a year resulting in a loss of 5 working days
So, above you've said 2-6 hours for check-ins (with exceptions) but you've based your decision to come up with a solution for this problem based on an assumption of 5 working days. Above you've said 5 days (not working) only if holidays occur. So really your figures for risk are exaggerated.
Let's be more realistic and say you have a hardware failure once a year, and you lose 4 hours worth of work.
From OneDrive, how long would it take you to restore this work? Let's guess at, say, a few minutes to connect and download it, and perhaps half an hour to an hour to sort out conflicts (#optimism). So let's say a net saving of 3 hours...once a year...do you think this is reason enough to back up all code to an offsite cloud provider?
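The back-of-the-envelope estimate above can be written out explicitly. This is only a sketch of the arithmetic: the one-failure-per-year rate, the 4 hours lost, and the roughly 1 hour to restore come from the figures above, while the function name and the 8-hour working day used for the pessimistic case are assumptions made for illustration.

```python
def net_hours_saved_per_year(failures_per_year: float,
                             hours_lost_per_failure: float,
                             hours_to_restore: float) -> float:
    """Hours of work a continuous backup saves per year, after restore effort."""
    return failures_per_year * (hours_lost_per_failure - hours_to_restore)


# Realistic case: one failure a year, 4 hours of uncommitted work lost,
# about 1 hour to download the files and sort out conflicts.
print(net_hours_saved_per_year(1, 4, 1))  # 3

# Pessimistic case from the question: 5 working days lost,
# assuming 8 working hours per day.
print(net_hours_saved_per_year(1, 5 * 8, 1))  # 39
```

The gap between the two results shows how much the conclusion depends on the assumed check-in interval rather than on the failure rate.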
You talk about the time taken to re-install VS, but that's not relevant to backing files up to OneDrive; you'd need to do that regardless.
To be honest, I think your fear of losing uncommitted code is blown out of proportion, and I think your reaction is too. TFS and VSO offer features like shelvesets, which can be used for just this kind of situation.
As an enterprise developer too, I do understand the concern, but normally you'd cover this by working on a machine with a RAID array or similar.
I think you should re-assess your check-in policy and your assessment of the potential for lost code.

When will Meteor.com Meteor hosting be reliable enough for production use?

I've had very mixed experiences trying to host on Meteor.com.
I often get "This site is down. Try again later.". Initially I couldn't figure out why, but then I suspected that the problems were caused by me accidentally restoring the "system.users" collection. I tried restoring without that, but the site went down a few days later. Today, it's magically back up again without me doing anything.
"meteor logs" shows nothing. It's a complete black box.
I've investigated other options (Heroku, demeteorizer, meteor bundle, etc), but they are clunky and unreliable too (problems installing fibers, doesn't seem to handle Meteor.call() properly, etc).
I would really like to host in production on Meteor.com, but I feel I can't trust it right now. Free is nice, but I need reliable, production-quality Meteor hosting. When will I be able to buy that from Meteor.com?
Thanks,
Graeme
From the documentation:
We provide this as a free service so you can try Meteor. It is also helpful for quickly putting up internal betas, demos, and so on.
So it's intended for trying things out, not for production. They offer it for free, and I think it would be bad manners to abuse it. And with so many people trying it out, don't be surprised if it's overloaded from time to time.
I'm not sure if and when the Meteor team will make this hosting production ready. At this moment, I'm happy they're focusing on making the framework mature.
For other options, Heroku works like a charm. I'm using it for several projects, including production ones, and have had no problems. Don't bother with demeteorizer and such. Just create a new app and run these commands (replacing appname with your app name):
heroku git:remote -a appname
heroku config:add ROOT_URL=appname.herokuapp.com
heroku config:set BUILDPACK_URL=https://github.com/oortcloud/heroku-buildpack-meteorite.git
git push -f heroku master
MDG (the Meteor Development Group, the core team) is working on a hosting solution called Galaxy. It will likely tie in to meteor in ways other solutions won't (such as meteor deploy). This will be (AFAIK) a paid service, and as such will likely offer analytics for better insight (my guess). And of course scaling will be taken care of for us.
More on the subject: http://www.meteor.com/blog/2013/10/01/geoff-schmidt-at-devshop-8-getting-meteor-to-10

Replicating load related crashes in non-production environments

We're running a custom application on our intranet and we have found a problem after upgrading it recently where IIS hangs with 100% CPU usage, requiring a reset.
Rather than subject users to the hangs, we've rolled back to the previous release while we determine a solution. The first step is to reproduce the problem -- but we can't.
Here's some background:
Prod has a single virtualized (vmware) web server with two CPUs and 2 GB of RAM. The database server has 4GB, and 2 CPUs as well. It's also on VMWare, but separate physical hardware.
During normal usage the application runs fine. The w3wp.exe process normally uses between 5-20% CPU and around 200MB of RAM. CPU and RAM fluctuate slightly under normal use, but nothing unusual.
However, when we start running into problems, the RAM climbs dramatically and the CPU pegs at 98% (or as much as it can get). The site becomes unresponsive, necessitating an IIS restart. Resetting the app pool does nothing in this situation; a full IIS restart is required.
It does not happen during the night (no usage). It happens more when the site is under load, but it has also happened during non-peak periods.
The first step to solving this problem is reproducing it. To simulate the load, we started using JMeter. Our load script is based on actual usage around the time of the crash. Using JMeter, we can ramp the usage up quite high (2-3 times the load during the crash), but the site behaves fine. CPU is high, and the site does become sluggish, but memory usage is reasonable and nothing is hanging.
Does anyone have any tips on how to reproduce a problem like this in a non-production environment? We'd really like to reproduce the error, determine a solution, then test again to make sure we've resolved it. During the process we've found a number of small things that we've improved that might solve the problem, but I'd really feel a lot more confident if we could reproduce the problem and test the improved version.
Any tools, techniques or theories much appreciated!
You can find some information about troubleshooting this kind of problem at this blog entry. Her blog is generally a good debugging resource.
I have an article about debugging ASP.NET in production which may provide some pointers.
Is your test environment really the same as live?
i.e.
2 separate VM instances on 2 physical servers, with the same network connection and account types?
Are there any other instances on the database?
Are there any other web applications in IIS?
Is the .NET config right?
Is the app pool config right for service accounts?
Try looking at this: MS article on IIS 6 optimising for performance
Lots of tricks.
