While reading the Spotify blog, I found a reference to something called "synthetic testing":
Having synthetic tests reduces time to recover
After this work involving timelines, we got some signals on time to recover. One such signal was that TTR was all over the place and genuinely hard to correlate with any single aspect of our systems.
However, we got a hit. One of the more exciting things we learned through our incident study was that synthetic testing works. We spent a fair amount of time grading whether or not a synthetic test would have plausibly detected outages, and then looked at the TTR for those that were in fact detected by synthetic tests, versus those that were not because they were not covered by a synthetic test.
The results were even more striking than we thought. We found that incidents involving coverable features that did have a synthetic test saw a recovery time that was generally 10 times faster. No really, read it again!
This may seem obvious, but we never want to discount the power of data to drive decisions. This isn’t just a curiosity. We’ve adjusted our priorities to put a greater emphasis on synthetic testing, as we think it’s pretty important to get things back up and running as quickly as possible.
What is a synthetic test, and how it differs from the normal software testing (unit, integration, ...) that is running in a CI?
Related
I would like to do anomaly detection in R on real-time stream of sensor data. I would like to explore use of either the Twitter anomalyDetection or anomalous.
I am trying to think of the most efficient way to do this, as some online sources suggest R is not suitable for real-time anomaly detection. See https://anomaly.io/anomaly-detection-twitter-r. Should I use the stream package to implement my own data stream source? If I do so, is there any "rule-of-thumb" as to how much data I should stream in order to have a sufficient amount of data (perhaps that is what I need to experiment with)? Is there any way of doing the anomaly detection in-database rather than in-application to speed things up?
My experience is that if you want real time anomaly detection, you need to apply an online learning algorithm (rather than batch), ideally running on each sample as it is collected/generated. To do it, you would need to modify the existing open sources to run in online mode and adapt the model parameters for each sample that is processed.
I'm not aware of an open source package that does it though.
For example, if you're computing a very simple anomaly detector, using the normal distribution, all you need to do is update the mean and variance of each metric with each sample that arrives. If you want the model to be adaptive, you'll need to add a forgetting factor (e.g., exponential forgetting), and control the "memory" of the mean and variance.
Another algorithm which lends itself to online learning is Holt-Winters. There is a several R-implementations of it, though you still have to make it run in online mode to be real time.
I gave a talk on this topic at the Big Data, Analytics & Applied Machine Learning - Israeli Innovation Conference last May. The video is at:
https://www.youtube.com/watch?v=SrOM2z6h_RQ
(DISCLAIMER: I am the chief data scientist for Anodot, a commercial company doing real time anomaly detection).
How is a load test different from a Spike test, Considering the below scenarios.
Load test: Using an automation tool(JMeter in my case) I create a load of 1000 virtual users loaded in 1 sec(ramp up period).
Spike test: Using an automation tool(JMeter in my case) I create a continuous load of 400 virtual users loaded every 1 sec and a spike load of 600 virtual users loaded in 1 sec at a certain point in time.
When there is a spike load induced is it not the same as a load test described?
So my point is what is the need of a spike test if load tests can be carried out continuously at varied load conditions?
Test scenario:
Application tested : Website.
Automation tool : Jmeter.
Speed of internet used while testing: 3 MBPS.
I`m thanking you all in advance.
According to "Performance Testing Guidance for Web Applications", "spike testis a type of performance test focused on determining or validating performance characteristics of the product under test when subjected to workload models and load volumes that repeatedly increase beyond anticipated production operations for short periods of time.". So I think about analogy with Geometric or Algebraic progression, because volumes are repeatedly (and rapidly) increased. Also this and other definitions are paying attention to short period of time.
Load testing is more general term, without specified time (short or long) of testing or pattern to increase load volumes.
Load Testing: It helps us to know how much load a application/System can bear at a point of time.
Ex: Let a normal man can drink Maximum 3lt water at a time.
spike testing: It helps us to know the behaviour of a system by giving suddenly high amount of load.
Ex: For spike testing we try to know whether a normal man can drink 4lt or more at a time?
A spike test is a kind of load test, used to simulate bursty traffic patterns.
For example, you might want to support 1 million client requests an hour. That's an average of 277 requests/sec. However, that doesn't account for varying usage patterns, like a sudden burst of traffic followed by a lull period. A spike test would simulate these bursts, where the short-term request rate can be much higher or lower than the expected average.
Is it possible to estimate the heat generated by an individual process in runtime.
Temperature readings of the processor is easily accessible but what I need is process specific information.
Is it possible to map information such as cpu utilization, io, running time, memory usage etc to get some kind of an estimate?
I'm gonna say no. Because the overall temperature of your system components isn't a simple mathematical equation with everything that's moving and switching either.
Heat generated by and inside a computer is dependent on many external factors like hardware setup, ambient temperature of the room, possibly the age of the components, is there dust on them or in the fans, was the cooling paste correctly applied on the CPU or elsewhere, where heat sinks are present, how is heat being dissipated etc.etc.. In short, again, no.
Additionally, your computer runs a LOT of processes at any given time apart from the ones that you control (and "control" is a relative term). Even if it is possible to access certain sensory data for individual components (like you can see to some extent in the BIOS) then interpolating one single process' generated temperature in regard to the total is, well, impossible.
At the lowest levels (gate networks, control signalling etc.), an external individual no longer has any means to observe or measure what's going on but there as well, things are in a changing state, a variable amount of electricity is being used and thus a variable amount of heat generated.
Pertaining to your second question: that's basically what your task manager does. There are countless examples and articles on the internet on how to get that done in a plethora of programming languages.
That is, unless some of the actually smart people in this merry little community of keytappers and screengazers say that it IS actually possible, at which point I will be thoroughly amazed...
EDIT: Monitoring the processes is a first step in what you're looking for. take a look at How to detect a process start & end using c# in windows? and be sure to follow up on duplicates like the one mentioned by Hans.
You could take a look at PowerTOP or some other tool that monitors power usage. I am not sure how accurate it is across different systems but a power estimation should provide at least some relative information as the heat generated assuming the processes you are comparing are running in similar manners on hardware. In reality there are just too many factors to predict power, much less heat, effectively but you may be able to get an idea of the usage.
we have a single threaded application that simulates the interaction of a hundred of thousands of objects over time with the shared memory model.
obviously, it suffers from its inability to scale over multi CPU hardware.
after reading a little about agent based modeling and functional programming/actor model I was considering a rewrite with the message-passing paradigm.
the idea is very simple - each object will be an actor and their interactions will be messages so that the simulation could happen in parallel. given a configuration of objects at a certain time - its future consequences can be easily computed.
the question is how to model the time:
for example let's assume the the behavior of object X depends on A and B, as the actors and the messages calculations order is not guaranteed it could be that when X is to be computed A has already sent its message to X but B didn't.
how to make sure the computation happens correctly?
I hope the question is clear
thanks in advance.
Your approach of using message passing to parallelize a (discrete-event?) simulation is well-known and does not require a functional style per se (although, of course, this does not prevent you to implement it like that).
The basic problem you describe w.r.t. to the timing of events is also known as the local causality constraint (see, for example, this textbook). Basically, you need to use a synchronization protocol to ensure that each object (or agent) processes its messages in the right order. In the domain of parallel discrete-event simulation, such objects are called logical processes, and they communicate via events (i.e. time-stamped messages).
Correctly implementing a synchronization protocol for these events is challenging and the right choice of protocol is highly application-specific. For example, one important factor is the average amount of computation required per event: if there is little computation required, the communication costs dominate the overall execution time and it will be hard to scale the simulation.
I would therefore recommend to look for existing solutions/libraries on top of the actors framework you intend to use before starting from scratch.
I know there are some scheduling problems out there that are NP-hard/NP-complete ... however, none of them are stated in such a way to show this situation is also NP.
If you have a set of tasks constrained to a startAfter, startBy, and duration all trying to use a single resource ... can you resolve a schedule or identify that it cannot be resolved without an exhaustive search?
If the answer is "sorry pal, but this is NP-complete" what would be the best heuristic(s?) to use and are there ways to decrease the time it takes to a) resolve a schedule and b) to identify an unresolvable schedule.
I've implemented (in prolog) a basic conflict resolution goal through recursion that implements a "smallest window first" heuristic. This actually finds solutions rather quickly, but is exceptionally slow at finding invalid schedules. Is there a way to overcome this?
Yay for compound questions!
The hardest part of most scheduling problems in real life is getting hold of a reliability and complete set of constraints. If we take the example of creating a university timetable:
Professor A will not get up in the morning, he is on a lot of committees, but no-one will tell the timetable office about this sort of constraint
Department 1 needs the timetable by the start of term, however, Department 2 that uses the same rooms is unwilling to decide on the courses that will be run until after all the students have arrived
Etc
Then you need a schedule system that can cope with changes, so when one constraint is changed at the last minute you don’t have to change the complete timetable.
All of the above is normally ignored in research papers about scheduling systems. As to NP completeness of a given scheduling problem, in real life you don’t care as even if it is not NP complete you are unlikely to even be able to define what the “best solution” is, so good enough is good enough.
See http://www.asap.cs.nott.ac.uk/watt/resources/university.html for a list of papers that may help get you started; there are still many PHDs to be had in scheduling software.
There are often good approximation algorithms for NP-hard/complete optimization problems like scheduling. You might skim the course notes by Ahmed Abu Safia on Approximation Algorithms for scheduling or various papers.
In a sense, all public key cryptography is done with "less hard" problems like factoring partially because NP-hard problems offer up too many easy cases. It's the same NP-completeness that makes them "morally hard" which also gives them too many easy problems, which often fall within some error bound of optimal.
There is a deeper theory of hardness of approximation that discusses the limitations of approximation algorithms though.
You can use dynamic programming to solve some of these things. Greedy algorithms also come to mind. Scheduling theory is both deep and beautiful but those two I find will solve most of the problems I've faced. Perhaps I've been lucky.
What do you mean with startBy?
With startAfter and if there is only one resource, then a fast solution could be to use topological sorting. The example algorithm runs in linear time, but does not include the error case if the graph contains cycles.
Here's one that isn't.
Schedule a set of jobs i= 1,2...n on a single machine which each take time t(i) so that the average waiting time is minimized.
Solution: Sort in increasing order of t(i). O(n log n)
Good list here
Consider the scheduling problem that is in the class P:
Input: list of activities which include the start time and finish time.
Sort by finish time.
Select the first N elements of this sorted list to find the maximum amount of activities you can schedule in a given time.
You can add caveats like: all activities must end at 5pm, well in this case as you work through the list, stop once you reach an activity which ends after this time.