Google Cloud Stackdriver Debugger - production debugging?

How does Stackdriver debug applications that are running in production?
Will the server be down during this period?
How would the latency be affected?
Is there a way to debug an incident that has 'already happened'? For example, I have an application running in production and there was an issue: say, I wasn't able to add an item to the shopping cart. Can we go back and debug that issue, or does it only debug the live application?

Stackdriver Debugger's core functionality is rapidly taking a snapshot of your running application. This means your server is not down, but it also means that you can't go back in time.
Stackdriver Debugger has a quickstart and various other docs that can be useful in getting a basic understanding of what the product does.

Stackdriver Debugger is an always on, whole service debugger. You don't debug just a single server/VM but rather all of your servers belonging to the same service, at the same time. It captures the call stack and variables from a single server when the condition hits and then cancels the snapshot from all other servers.
The Stackdriver Debugger agent doesn't stop the process; it briefly pauses the thread that hits the snapshot line and condition. The thread is usually paused for about 3 ms to capture ~64K of information, though your timing may vary.
Stackdriver Debugger agents are written from scratch with the purpose of optimizing for application latency. They use all sorts of tricks to avoid pausing the running thread/server (e.g., serialization of the data happens after the thread is released).
Stackdriver Debugger is a realtime interactive debugger. There is really no way to debug something that happened in the past. However, since it's a production debugger, you can set your snapshot location in production and wait for the event to happen again.
One other feature of Stackdriver Debugger that you might find useful is logpoints. These are log statements that you can insert dynamically into your application with a specific case/condition in mind. You don't have to make code changes or redeploy your service. See the blog post.

Related

How to debug mobile QML applications in production?

If a crash happens on a mobile device, how can the developer team receive it?
What should be logged to reconstruct what happened? Just actions on objects and page transitions?
If my markup looks wrong on some devices, or the application behaves strangely or gets into a weird state, I want functionality to collect a screenshot and device info and send it. What are the best practices here?

The question is really about sending the crash stack trace and logs out. It is not about the QML app per se but about its C++ base, or about any C++ app. The app should have logging enabled and collect its activity info, maybe for a period of time or until the logs get large enough. We split the log into chunk files and removed the oldest one once we had accumulated, say, 5 chunks of 100 KB each.
Crash stack/minidump: both the call stack for all threads with the time of the crash and a minidump of the code with all variables visible can be collected.
How do you send the log and crash stack/minidump out? There are solutions like Breakpad that you are supposed to link with and enable in the app code. The app takes care of sending all the crash info out when it runs again after the crash.
That is quite a few things to implement, not to mention the web service that collects the crash info from client apps.
You also have to keep the "symbols" for the released app build in order to be able to trace the stack and see variable values at the time of a crash.
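The chunked log rotation mentioned above can be sketched roughly as follows. This is only an illustration in Java, not code from the app in question (which was C++). The class name `ChunkedLog`, the `log-N.txt` file naming, and the `inTempDir` helper are all made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: keeps at most maxChunks log files of roughly maxChunkBytes each.
class ChunkedLog {
    private final Path dir;
    private final long maxChunkBytes;
    private final int maxChunks;
    private final Deque<Path> chunks = new ArrayDeque<>(); // oldest chunk first
    private long currentSize = 0; // bytes written to the newest chunk
    private int nextIndex = 0;    // gives each chunk a unique file name

    ChunkedLog(Path dir, long maxChunkBytes, int maxChunks) {
        this.dir = dir;
        this.maxChunkBytes = maxChunkBytes;
        this.maxChunks = maxChunks;
    }

    // Convenience factory for experiments: logs into a fresh temporary directory.
    static ChunkedLog inTempDir(long maxChunkBytes, int maxChunks) {
        try {
            return new ChunkedLog(Files.createTempDirectory("applog"), maxChunkBytes, maxChunks);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized void append(String message) {
        byte[] bytes = (message + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            // Roll over when the newest chunk would exceed its size budget.
            if (chunks.isEmpty() || (currentSize > 0 && currentSize + bytes.length > maxChunkBytes)) {
                chunks.addLast(dir.resolve("log-" + (nextIndex++) + ".txt"));
                currentSize = 0;
                // Delete the oldest chunks so that at most maxChunks remain.
                while (chunks.size() > maxChunks) {
                    Files.deleteIfExists(chunks.removeFirst());
                }
            }
            Files.write(chunks.getLast(), bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            currentSize += bytes.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized int chunkCount() {
        return chunks.size();
    }

    synchronized String oldestChunkName() {
        return chunks.getFirst().getFileName().toString();
    }
}
```

With the numbers from the answer, this would be constructed as `new ChunkedLog(logDir, 100 * 1024, 5)`. The remaining chunk files are what the app would upload together with the crash report on its next start.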

OpenCensus Not Showing Traces On Google App Engine in Stack Driver

I am using OpenCensus as recommended by Google Cloud to run Stackdriver Trace (https://cloud.google.com/trace/docs/setup/java). My configuration is running on Google App Engine Standard Java 8. I have ensured the API is enabled on the project, used the initialization code, and created spans where I am trying to trace.
I simply create the span with
Span span = tracer.spanBuilder(spanName).startSpan();
and then finish it with
span.end();
It seems straightforward, but none of my custom traces were visible in the Google Cloud Trace console, only the default RPC calls traced by Google. I then tried using Scopes instead of Spans, and initializing StackdriverTraceExporter with and without the project name, but nothing resulted in the custom traces being created.
Any guidance or suggestion on where to look would be greatly appreciated as this is the first time I am using OpenCensus.

I found that OpenCensus has a 5-second delay before flushing its cache and writing to the exporter location. This means that to get the traces to show up, you have to keep the thread alive for at least 5 seconds. The issue I had was that in a multithreaded environment, the threads were dying too fast.
OpenCensus is proposing a change that will allow you to programmatically flush the cache, so developers can flush it prior to returning the response, which should ensure span data is written out reliably.
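To illustrate why a buffered exporter loses spans when work finishes too quickly, and why an explicit flush before returning helps, here is a toy model in plain Java. It is not the OpenCensus API: `BufferingExporter`, `onSpanEnd`, and `flush` are hypothetical names, and the timer that a real exporter would run every few seconds is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a batching span exporter. This is NOT the OpenCensus API.
class BufferingExporter {
    private final List<String> pending = new ArrayList<>();  // ended spans waiting for a flush
    private final List<String> exported = new ArrayList<>(); // spans that reached the backend

    // Called when a span ends: the span is only queued, not sent.
    synchronized void onSpanEnd(String spanName) {
        pending.add(spanName);
    }

    // A real exporter would call this from a timer; here it is only called explicitly.
    synchronized void flush() {
        exported.addAll(pending);
        pending.clear();
    }

    synchronized List<String> exported() {
        return new ArrayList<>(exported);
    }
}

class FlushDemo {
    // Handles one request and optionally flushes before returning the response.
    static void handleRequest(BufferingExporter exporter, String spanName, boolean flushBeforeReturn) {
        exporter.onSpanEnd(spanName);
        if (flushBeforeReturn) {
            exporter.flush();
        }
    }

    public static void main(String[] args) {
        BufferingExporter exporter = new BufferingExporter();

        // The request ends before any flush has happened: nothing is exported yet.
        handleRequest(exporter, "add-to-cart", false);
        System.out.println("exported without flush: " + exporter.exported());

        // Flushing before returning pushes out everything queued so far.
        handleRequest(exporter, "checkout", true);
        System.out.println("exported after flush: " + exporter.exported());
    }
}
```

In the model, a span that has ended but not yet been flushed is simply lost if the process or thread goes away, which matches the symptom described above.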

ASP.NET website takes forever to respond

My ASP.NET web application experiences downtime every day: it takes forever to respond. Once I stop and start the website in IIS (not an iisreset), it works again. Then, hours or a day later, it becomes unresponsive again. What could be the reason? I suspect an unclosed database connection, but those are hard to find. The code was written by the previous programmer.

Check the queue length, which is a setting on the application pool.
If it happens at a particular time of day, check resource utilization such as CPU/RAM consumed during that time.
There are APM tools like Application Insights that you can use to monitor response times for requests.
You can implement Google Analytics to see the number of users online or making requests, to see whether it's a threshold issue.

Look into the IIS logs for the time of the issue and check the time-taken field. If it's above normal, proceed to the following step.
During the time of the issue (before you restart the website), capture a manual hang dump of the w3wp process - https://blogs.msdn.microsoft.com/debugdiag/2013/03/15/debug-diagnostic-1-2-generate-a-manual-hang-dump-on-a-specific-process/
Run the DebugDiag report and share it if you can. It will tell you what is possibly going wrong.
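As a rough aid for the first step, the sketch below scans W3C-format IIS log lines and keeps those whose time-taken value is above a threshold. It assumes the default W3C extended format, where a `#Fields:` header names the space-separated columns and time-taken is in milliseconds. The class name, the threshold, and the sample lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SlowRequestFinder {
    // Returns the log entries whose time-taken value (milliseconds) exceeds thresholdMs.
    static List<String> findSlow(List<String> logLines, long thresholdMs) {
        List<String> slow = new ArrayList<>();
        int timeTakenIndex = -1; // column position, learned from the #Fields header
        for (String line : logLines) {
            if (line.startsWith("#Fields:")) {
                // The header names the columns of the entries that follow it.
                String[] names = line.substring("#Fields:".length()).trim().split("\\s+");
                timeTakenIndex = Arrays.asList(names).indexOf("time-taken");
                continue;
            }
            if (line.startsWith("#") || line.trim().isEmpty() || timeTakenIndex < 0) {
                continue; // other directives, blank lines, or time-taken not being logged
            }
            String[] cols = line.trim().split("\\s+");
            if (cols.length <= timeTakenIndex) {
                continue; // truncated entry
            }
            try {
                if (Long.parseLong(cols[timeTakenIndex]) > thresholdMs) {
                    slow.add(line);
                }
            } catch (NumberFormatException e) {
                // Not a number in the time-taken column: skip the malformed entry.
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Made-up sample data in the shape of an IIS W3C log.
        List<String> lines = Arrays.asList(
                "#Fields: date time cs-method cs-uri-stem sc-status time-taken",
                "2019-01-01 10:00:00 GET /cart 200 120",
                "2019-01-01 10:00:05 GET /checkout 200 45000");
        for (String entry : findSlow(lines, 5000)) {
            System.out.println(entry);
        }
    }
}
```

If the slow entries cluster around the times the site becomes unresponsive, that is the window in which to capture the hang dump.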

Use of AX 2012 debugger when other users also work on same ax client

I would like to debug code in the PreProduction environment, but I'm wondering whether it will bother other users who are using the same AX client. Will it affect others if I debug code?

It should not bother other users, since the debugger is a program separate from the client.
Add breakpoints to the code by pressing F9, selecting Toggle breakpoint in the debug menu, or clicking on the gray line next to the code. The breakpoint is then user-specific, and the client will only stop and start debugging for your user.
If you set a breakpoint in your X++ code by using the breakpoint statement, that will bother users: when their actions reach the breakpoint statement, the client will stop and start a debugging session.
So use this:
And not this:

It will most likely not disrupt users in the way you think it may. Like Jan said, their clients could hang, but their clients will not be frozen merely because your client is frozen during debugging. They open their own sessions and connect to the AOS independently of each other. They would be affected by locked transactions: these should normally take seconds, but if you have paused code execution in the middle of a transaction, the lock is held for as long as execution stays paused.
You can demonstrate this by debugging in a development environment, then opening a second client instance on the same local or on a remote machine, and you will see that you can continue working/testing. This is what you should do if you are very concerned about impact.
If they have administrative or AX debugger permissions on the machine, global breakpoints turned on, and the debugger installed in tandem with the client where they are working, then technically they could launch a debugger session...but the planets sort of have to align for that to happen in most installations. It would be very bad practice for that to happen.

Why would the Application_Start method be called a lot on my ASP.NET web service?

I have an ASP.NET Web Service (SOAP style) that is running in our production environment.
Our server guys have set things up such that things like starting and stopping of Windows services, etc., are sent via email to the appropriate parties.
Lately my boss has been getting emails about my ASP.NET web service:
The (My Web Service's Name) Application_Start method was called
Now I figure that what's happening here is that the service has gone so long since being called last that the server has unloaded it from memory and now it's being re-loaded again (the product that consumes this web service has declined in popularity, so this isn't too far fetched a theory).
However my boss tells me he's been getting this email "dozens" of times per day.
I suppose it's still possible that my theory above is accurate, especially given how it's spread out over 3-4 servers in our web tier, but is there any other explanation for why this might be happening so frequently?
At this point in time I don't know whether or not Application_End calls are being similarly emailed or not, or what the ratio is.

The application could be unloaded when some of the settings in <processModel> are exceeded. idleTimeout could be the one in your case, but requestLimit and memoryLimit are also candidates.
Also, and this is based on a true story, if you start any thread, run anything in a separate threadpool thread, or use the TPL, make sure that you catch any exception that might be thrown. Uncaught exceptions from those threads will kill the worker process. Check the application logs in the Windows event log. If this is the case, you should see the red icon application error signs around the same time that the emails go out.