"Noise" in performance measurement - r

I have a large & complex shiny app which I am currently analyzing in terms of performance. I used profvis to get profiles of the app's performance and to identify possible bottlenecks. Identifying bottlenecks itself was successful since the relative amount of time spent shows a clear pattern. The problem which got me wondering is, even in exactly identical scenarios, the performance can vary very much in terms of absolute time. The same calculation can take 60 seconds on one run and 100 seconds on another run some time later. This makes it quite difficult for me to properly evaluate code changes which I try out for improving performance. And on the other hand this noise itself turned out to be a performance problem of my app, which I want to solve.
I already eliminated possible factors which could cause 'randomness' in performance inside and outside the app/code like random seeds, memory usage (gc(), rm, no other programs running), different laptops, internet connection, always using the same data and settings, etc. as far as possible.
It's worth mentioning that I am at most an advanced beginner and can easily have overlooked something. Could a highly modularized code with iterative function calls and nested functions be the source of the noise?
My main question is: Are there common sources of 'noisy' performance in R/Shiny for which I should check?
Apologizing for the open, quite unspecified question. I've already gone through several performance-related articles/guides for r/shiny performance covering caching, writing faster & more stable functions etc., but couldn't really find the problem of noisy performance.

Related

Memory virtualization with R on cluster

I don't know almost anything about parallel computing so this question might be very stupid and it is maybe impossible to do what I would like to.
I am using linux cluster with 40 nodes, however since I don't know how to write parallel code in R I am limited to using only one. On this node I am trying to analyse data which floods the memory (arround 64GB). So my problem isn't lack of computational power but rather memory limitation.
My question is, whether it is even possible to use some R package (like doSnow) for implicit parallelisation to use 2-3 nodes to increase the RAM limit or would I have to rewrite the script from ground to make it explicit parallelised ?
Sorry if my question is naive, any suggestions are welcomed.
Thanks,
Simon
I don't think there is such a package. The reason is that it would not make much sense to have one. Memory access is very fast, and accessing data from another computer over the network is very slow compared to that. So if such a package existed it would be almost useless, since the processor would need to wait for data over the network all the time, and this would make the computation very very slow.
This is true for common computing clusters, built from off-the-shelf hardware. If you happen to have a special cluster where remote memory access is fast, and is provided as a service of the operating system, then of course it might be not that bad.
Otherwise, what you need to do is to try to divide up the problem into multiple pieces, manually, and then parallelize, either using R, or another tool.
An alternative to this would be to keep some of the data on the disk, instead of loading all of it into the memory. You still need to (kind of) divide up the problem, to make sure that the part of the data in the memory is used for a reasonable amount of time for computation, before loading another part of the data.
Whether it is worth (or possible at all) doing either of these options, depends completely on your application.
Btw. a good list of high performance computing tools in R is here:
http://cran.r-project.org/web/views/HighPerformanceComputing.html
For future inquiry:
You may want to have a look at two packages "snow" and "parallel".
Library "snow" extends the functionality of apply/lapply/sapply... to work on more than one core and/or one node.
Of course, you can perform simple parallel computing using more than one core:
#SBATCH --cpus-per-task= (enter some number here)
You can also perform parallel computing using more than one node (preferably with the previously mentioned libraries) using:
#SBATCH --ntasks-per-node= (enter some number here)
However, for several implications, you may wanna think of using Python instead of R where parallelism can be much more efficient using "Dask" workers.
You might want to take a look at TidalScale, which can allow you to aggregate nodes on your cluster to run a single instance of Linux with the collective resources of the underlying nodes. www.tidalscale.com. Though the R application may be inherently single threaded, you'll be able to provide your R application with a single, simple coherent memory space across the nodes that will be transparent to your application.
Good luck with your project!

Is R+bigmemory package sufficient for column-oriented data management?

I have a collection of financial time series of various sorts. Most of my analysis is either a column or a row oriented, very rarely I have to do any sort of complex queries. Also, I am (by now) doing almost all analysis in R.
Because of this, I am seriously considering not deploying any sort of RDBMS and instead managing data in R directly (saving RDS files). This would save me the pain of installing an administering a DB as well as probably improve the data loading speeds.
Is there any reason I should consider otherwise? Do you know anyone who manages their data this way? I know this is vague, but I am looking for opinions, not answers.
If working in R is your comfort zone.. I'd keep your data management there as well, even if your analyses or runs are longer.
I've had a similar decision lately:
Should I go in the direction of learning and applying a new (language/dialect/system) to shave some milliseconds off execution time.
or...
Should I go forth with the same stodgy old tools I have used, even if they will run slower at execution time?
Is the product of your runs for you only? If so, I'd stick with data management in R only.. even if production runs are slower.
If you were designing something for a Bank, Cell Phone Service, or a similar transactional environment, I'd recommend finding the super solution.
But if your R production is for you.. I'd stay in R.
Consider the opportunity cost. Learning a new language/ecosystem - and something like PostgreSQL surely qualifies - will soak up far more time than you likely think. Those skills may be valuable, but will they generate a return on time invested that is as high as the return you would get from additional time spent on your existing analysis?
If it's for personal use and there is no pressing performance issue, stick with R. Given that it's generally easier to do foolish things with text and RDS files than it is with a fully-fledged DB, just make sure you back up everything. From being a huge skeptic about cloud-based storage I have over the past half-year become a huge convert and all but my most sensitive information is now stored there. I use Dropbox, which maintains previous versions of data if you do mess up badly.
Being able to check a document or script from the cafe on the corner on your smartphone is nice.
There is a column-by-column management package, colbycol in CRAN designed to provide DB-like functions for large datasets. I assume the author must have conducted the same sort of analysis.

Tuning Mathematical Parallel Codes

Assuming that I am interested in performance rather than portability of my linear algebra iterative multi-threaded solver and that I have the results of profiling my code in hand, how do I go about tuning my code to run optimally on that machine of my choice?
The algorithm involves Matrix-Vector multiplications, norms and dot-products. (FWIW, I am working on CG and GMRES).
I am working on codes which are of matrix size roughly equivalent to the full size of the RAM (~6GB). I'll be working on Intel i3 Laptop. I'll be linking my codes using Intel MKL.
Specifically,
Is there a good resource(PDF/Book/Paper) for learning manual tuning? There are numerous things that I learnt by doing for instance : Manual Unrolling isn't always optimal or about compiler flags but I would prefer a centralized resource.
I need something to translate profiler information to improved performance. For instance, my profiler tells me that my stacks of one processor are being accessed by another or that my mulpd ASM is taking too much time. I have no clue what these mean and how I could use this information for improving my code.
My intention is to spend as much time as needed to squeeze as much compute power as possible. Its more of a learning experience than for actual use or distribution as of now.
(I am concerned about manual tuning not auto-tuning)
Misc Details:
This differs from usual performance tuning since the major portions of the code are linked to Intel's proprietary MKL library.
Because of Memory Bandwidth issues in O(N^2) matrix-vector multiplications and dependencies, there is a limit to what I could manage on my own through simple observation.
I write in C and Fortran and I have tried both and as discussed a million times on SO, I found no difference in either if I tweak them appropriately.
Gosh, this still has no answers. After you've read this you'll still have no useful answers ...
You imply that you've already done all the obvious and generic things to make your codes fast. Specifically you have:
chosen the fastest algorithm for your problem (either that, or your problem is to optimise the implementation of an algorithm rather than to optimise the finding of a solution to a problem);
worked your compiler like a dog to squeeze out the last drop of execution speed;
linked in the best libraries you can find which are any use at all (and tested to ensure that they do in fact improve the performance of your program;
hand-crafted your memory access to optimise r/w performance;
done all the obvious little tricks that we all do (eg when comparing the norms of 2 vectors you don't need to take a square root to determine that one is 'larger' than another, ...);
hammered the parallel scalability of your program to within a gnat's whisker of the S==P line on your performance graphs;
always executed your program on the right size of job, for a given number of processors, to maximise some measure of performance;
and still you are not satisfied !
Now, unfortunately, you are close to the bleeding edge and the information you seek is not to be found easily in books or on web-sites. Not even here on SO. Part of the reason for this is that you are now engaged in optimising your code on your platform and you are in the best position to diagnose problems and to fix them. But these problems are likely to be very local indeed; you might conclude that no-one else outside your immediate research group would be interested in what you do, I know you wouldn't be interested in any of the micro-optimisations I do on my code on my platform.
The second reason is that you have stepped into an area that is still an active research front and the useful lessons (if any) are published in the academic literature. For that you need access to a good research library, if you don't have one nearby then both the ACM and IEEE-CS Digital Libraries are good places to start. (Post or comment if you don't know what these are.)
In your position I'd be looking at journals on 2 topics: peta- and exa-scale computing for science and engineering, and compiler developments. I trust that the former is obvious, the latter may be less obvious: but if your compiler already did all the (useful) cutting-edge optimisations you wouldn't be asking this question and compiler-writers are working hard so that your successors won't have to.
You're probably looking for optimisations which like, say, loop unrolling, were relatively difficult to find implemented in compilers 25 years ago and which were therefore bleeding-edge back then, and which themselves will be old and established in another 25 years.
EDIT
First, let me make explicit something that was originally only implicit in my 'answer': I am not prepared to spend long enough on SO to guide you through even a summary of the knowledge I have gained in 25+ years in scientific/engineering and high-performance computing. I am not given to writing books, but many are and Amazon will help you find them. This answer was way longer than most I care to post before I added this bit.
Now, to pick up on the points in your comment:
on 'hand-crafted memory access' start at the Wikipedia article on 'loop tiling' (see, you can't even rely on me to paste the URL here) and read out from there; you should be able to quickly pick up the terms you can use in further searches.
on 'working your compiler like a dog' I do indeed mean becoming familiar with its documentation and gaining a detailed understanding of the intentions and realities of the various options; ultimately you will have to do a lot of testing of compiler options to determine which are 'best' for your code on your platform(s).
on 'micro-optimisations', well here's a start: Performance Optimization of Numerically Intensive Codes. Don't run away with the idea that you will learn all (or even much) of what you want to learn from this book. It's now about 10 years old. The take away messages are:
performance optimisation requires intimacy with machine architecture;
performance optimisation is made up of 1001 individual steps and it's generally impossible to predict which ones will be most useful (and which ones actually harmful) without detailed understanding of a program and its run-time environment;
performance optimisation is a participation sport, you can't learn it without doing it;
performance optimisation requires obsessive attention to detail and good record-keeping.
Oh, and never write a clever piece of optimisation that you can't easily un-write when the next compiler release implements a better approach. I spend a fair amount of time removing clever tricks from 20-year old Fortran that was justified (if at all) on the grounds of boosting execution performance but which now just confuses the programmer (it annoys the hell out of me too) and gets in the way of the compiler doing its job.
Finally, one piece of wisdom I am prepared to share: these days I do very little optimisation that is not under one of the items in my first list above; I find that the cost/benefit ratio of micro-optimisations is unfavourable to my employers.

How to increase my Web Application's Performance?

I have a ASP.NET web application (.NET 2008) using MS SQL server 2005. I want to increase the performance of the web site. Does anyone know of an article containing steps to do that, step by step, in SQL (indexes, etc.), and in the code?
Performance tuning is a very specific process. I don't know of any articles that discuss directly how to achieve this, but I can give you a brief overview of the steps I follow when I need to improve performance of an application/website.
Profile.
Start by gathering performance data. At the end of the tuning process you will need some numbers to compare to actually prove you have made a difference. This means you need to choose some specific processes that you monitor and record their performance and throughput.
For example, on your site you might record how long a login takes. You need to keep this very narrow. Pick a specific action that you want to record and time it. (Use a tool to do the timing, or put some Stopwatch code in you app to report times. Also, don't just run it once. Run it multiple times. Try to ensure you know all the environment set up so you can duplicate this again at the end.
Try to make this as close to your production environment as possible. Make sure your code is compiled in release mode, and running on real separate servers, not just all on one box etc.
Instrument.
Now you know what action you want to improve, and you have a target time to beat, you can instrument your code. This means injecting (manually or automatically) extra code that times each method call, or each line and records times and or memory usage right down the call stack.
There are lots of tools out their that can help you with this and automate some of it. (Microsoft's CLR profiler (free), Redgate - Ants (commercial), the higher editions of visual studio have stuff built in, and loads more) But you don't have to use automatic tools, it's perfectly acceptable to just use the Stopwatch class to time each block of your code. What you are looking for is a bottle neck. The likely hood is that you will find a high proportion of the overall time is spent in a very small bit of code.
Tune.
Now you have some timing data, you can start tuning.
There are two approaches to consider here. Firstly, take an overall perspective. Consider if you need to re design the whole call stack. Are you repeating something unnecessarily? Or are you just doing something you don't need to?
Secondly, now you have an idea of where your bottle neck is you can try and figure out ways to improve this bit of code. I can't offer much advice here, because it depends on what your bottle neck is, but just look to optimise it. Perhaps you need to cache data so you don't have to loop over it twice. Or batch up SQL calls so you can do just one. Or tighten your query filters so you return less data.
Re-profile.
This is the most important step that people often miss out. Once you have tuned your code, you absolutely must re-profile it in the same environment that you ran your initial profiling in. It is very common to make minor tweaks that you think might improve performance and actually end up degrading it because of some unknown way that the CLR handles something. This is much more common in managed languages because you often don't know exactly what is going on under the covers.
Now just repeat as necessary.
If you are likely to be performance tuning often I find it good to have a whole batch of automated performance tests that I can run that check the performance and throughput of various different activities. This way I can run these with every release and record performance changes each release. It also means that I can check that after a performance tuning session I know I haven't made the performance of some other area any worse.
When you are profiling, don't always just think about the time to run a single action. Also consider profiling under load, with lots of users logged in. Sometimes apps perform great when there's just one user connected, but when they hit a certain number of users suddenly the whole thing grinds to a halt. Perhaps because suddnely they are spending more time context switching or swapping memory in and out to disk. If it's throughput you want to improve you need to be figuring out what is causing the limit on throughput.
Finally. Check out this huge MSDN article on Improving .NET Application Performance and Scalability. Specifically, you might want to look at chapter 6 and chapter 17.
I think the best we can do from here is give you some pointers:
query less data from the sql server (caching, appropriate query filters)
write better queries (indexing, joins, paging, etc)
minimise any inappropriate blockages such as locks between different requests
make sure session-state hasn't exploded in size
use bigger metal / more metal
use appropriate looping code etc
But to stress; from here anything is guesswork. You need to profile to find the general area for the suckage, and then profile more to isolate the specific area(s); but start by looking at:
sql trace between web-server and sql-server
network trace between web-server and client (both directions)
cache / state servers if appropriate
CPU / memory utilisation on the web-server
I think First of all you have to find your Bottlenecks and then try to improve those.
This helps you to perform exactly where you have serios problem.
An in addition you needto improve your Connection to DB. For exampleusing a Lazy , Singletone Pattern and also create Batch request instead of single requests.
It help you to decrease DB connection.
Check your cache and suitable loop structures.
another thing is to use appropriate types, forexample if you need int donot create a long and etc
at the end ypu can use some Profiler (specially in SQL) andcheckif your queries implemented as well as possible.

exploring mathematics of/in computer science [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 13 years ago.
I have been working for two years in software industry. Some things that have puzzled me are as follows:
There is lack of application of mathematics in current software industry.
e.g.: When a mechanical engineer designs an electricity pole , he computes the stress on the foundation by using stress analysis techniques(read mathematical equations) to determine exactly what kind and what grade of steel should be used, but when a software developer deploys a web server application he just guesses on the estimated load on his server and leaves the rest on luck and god, there is nothing that he can use to simulate mathematically to answer his problem (my observation).
Great softwares (wind tunnel simulators etc) and computing programs(like matlab etc) are there to simulate real world problems (because they have their mathematical equations) but we in software industry still are clueless about how much actual resources in terms of memory , computing resources, clock speed , RAM etc would be needed when our server side application would actually be deployed. we just keep on guessing about the solution and solve such problem's by more or less 'hit and trial' (my observation).
Programming is done on API's, whether in c, C#, java etc. We are never able to exactly check the complexity of our code and hence efficiency because somewhere we are using an abstraction written by someone else whose source code we either don't have or we didn't have the time to check it.
e.g. If I write a simple client server app in C# or java, I am never able to calculate beforehand how much the efficiency and complexity of this code is going to be or what would be the minimum this whole client server app will require (my observation).
Load balancing and scalability analysis are just too vague and are merely solved by adding more nodes if requests on the server are increasing (my observation).
Please post answers to any of my above puzzling observations.
Please post relevant references also.
I would be happy if someone proves me wrong and shows the right way.
Thanks in advance
Ashish
I think there are a few reasons for this. One is that in many cases, simply getting the job done is more important than making it perform as well as possible. A lot of software that I write is stuff that will only be run on occasion on small data sets, or stuff where the performance implications are pretty trivial (it's a loop that does a fixed computation on each element, so it's trivially O(n)). For most of this software, it would be silly to spend time analyzing the running time in detail.
Another reason is that software is very easy to change later on. Once you've built a bridge, any fixes can be incredibly expensive, so it's good to be very sure of your design before you do it. In software, unless you've made a horrible architectural choice early on, you can generally find and optimize performance hot spots once you have some more real-world data about how it performs. In order to avoid those horrible architectural choices, you can generally do approximate, back-of-the-envelope calculations (make sure you're not using an O(2^n) algorithm on a large data set, and estimate within a factor of 10 or so how many resources you'll need for the heaviest load you expect). These do require some analysis, but usually it can be pretty quick and off the cuff.
And then there are cases in which you really, really do need to squeeze the ultimate performance out of a system. In these case, people frequently do actually sit down, work out the performance characteristics of the systems they are working with, and do very detailed analyses. See, for instance, Ulrich Drepper's very impressive paper What Every Programmer Should Know About Memory (pdf).
Think about the engineering sciences, they all have very well defined laws that are applicable to the design, and building of physical items, things like gravity, strength of materials, etc. Whereas in Computer science, there are not many well defined laws when it comes to building an application against.
I can think of many different ways to write a simple hello world program that would satisfy the requirment. However, if I have to build an electricity pole, I am severely constrained by the physical world, and the requirements of the pole.
Point by point
An electricity pole has to withstand the weather, a load, corrosion etc and these can be quantified and modelled. I can't quantify my website launch success, or how my database will grow.
Premature optimisation? Good enough is exactly that, fix it when needed. If you're a vendor, you've no idea what will be running your code in real life or how it's configured. Again you can't quantify it.
Premature optimisation
See point 1. I can add as needed.
Carrying on... even engineers bollix up. Collapsing bridges, blackout, car safety recalls, "wrong kind of snow" etc etc. Shall we change the question to "why don't engineers use more empirical observations?"
The answer to most of these is in order to have meaningful measurements (and accepted equations, limits, tolerances etc) that you have in real-world engineering you first need a way of measuring what it is that you are looking at.
Most of these things simply can't be measured easily - Software complexity is a classic, what is "complex"? How do you look at source code and decide if it is complex or not? McCabe's Cyclomatic Complexity is the closest standard we have for this but it's still basically just counting branch instructions in methods.
There is little math in software programs because the programs themselves are the equation. It is not possible to figure out the equation before it is actually run. Engineers use simple (and very complex) programs to simulate what happens in the real world. It is very difficult to simulate a simulator. additionally, many problems in computer science don't even have an answer mathematically: see traveling salesman.
Much of the mathematics is also built into languages and libraries. If you use a hash table to store data, you know to find any element can be done in constant time O(1), no matter how many elements are in the hash table. If you store it in a binary tree, it will take longer depending on the number of elements [0(n^2) if i remember correctly].
The problem is that software talks with other software, written by humans. The engineering examples you describe deal with physical phenomenon, which are constant. If I develop an electrical simulator, everyone in the world can use it. If I develop a protocol X simulator for my server, it will help me, but probably won't be worth the work.
No one can design a system from scratch and people that write semi-common libraries generally have plenty of enhancements and extensions to work on rather than writing a simulator for their library.
If you want a network traffic simulator you can find one, but it will tell you little about your server load because the traffic won't be using the protocol your server understands. Every server is going to see completely different sets of traffic.
There is lack of application of mathematics in current software industry.
e.g.: When a mechanical engineer designs an electricity pole , he computes the stress on the foundation by using stress analysis techniques(read mathematical equations) to determine exactly what kind and what grade of steel should be used, but when a software developer deploys a web server application he just guesses on the estimated load on his server and leaves the rest on luck and god, there is nothing that he can use to simulate mathematically to answer his problem (my observation).
I wouldn't say that luck or god are always the basis for load estimation. Often realistic data can be had.
It's also not true that there are no mathematical techniques to answer the question. Operations research and queuing theory can be applied to good advantage.
The real problem is that mechanical engineering is based on laws of physics and a foundation of thousands of years worth of empirical and scientific investigation. Computer science is only as old as me. Computer science will be much further along by the time your children and grandchildren apply the best practices of their day.
An MIT EE grad would not have this problem ;)
My thoughts:
Some people do actually apply math to estimate server load. The equations are very complex for many applications and many people resort to rules of thumb, guess and adjust or similar strategies. Some applications (real time applications with a high penalty for failure... weapons systems, powerplant control applications, avionics) carefully compute the required resources and ensure that they will be available at runtime.
Same as 1.
Engineers also use components provided by others, with a published interface. Think of electrical engineering. You don't usually care about the internals of a transistor, just it's interface and operating specifications. If you wanted to examine every component you use in all of it's complexity, you would be limited to what one single person can accomplish.
I have written fairly complex algorithms that determine what to scale when based on various factors such as memory consumption, CPU load, and IO. However, the most efficient solution is sometimes to measure and adjust. This is especially true if the application is complex and evolves over time. The effort invested in modeling the application mathematically (and updating that model over time) may be more than the cost of lost efficiency by try and correct approaches. Eventually, I could envision a better understanding of the correlation between code and the environment it executes in could lead to systems that predict resource usage ahead of time. Since we don't have that today, many organizations load test code under a wide range of conditions to empirically gather that information.
Software engineering are very different from the typical fields of engineering. Where "normal" engineering are bound to the context of our physical universe and the laws in it we've identified, there's no such boundary in the software world.
Producing software are usually an attempt to mirror a subset of the real-life world into a virtual reality. Here we define the laws ourselves, by only picking the ones we need and by making them just as complex as we need. Because of this fundamental difference, you need to look at the problem-solving from a different perspective. We try to make abstractions to make complex parts less complex, just like we teach kids that yellow + blue = green, when it's really the wavelength of the light that bounces on the paper that changes.
Once in a while we are bound by different laws though. Stuff like Big-O, Test-coverage, complexity-measurements, UI-measurements and the likes are all models of mathematic laws. If you look into digital signal processing, realtime programming and functional programming, you'll often find that the programmers use equations to figure out a way to do what they want. - but these techniques aren't really (to some extend) useful to create a virtual domain, that can solve complex logic, branching and interact with a user.
The reasons why wind tunnels, simulations, etc.. are needed in the engineering world is that it's much cheaper to build a scaled down prototype, than to build the full thing and then test it. Also, a failed test on a full scale bridge is destructive - you have to build a new one for each test.
In software, once you have a prototype that passes the requirements, you have the full-blown solution. there is no need to build the full-scale version. You should be running load simulations against your server apps before going live with them, but since loads are variable and often unpredictable, you're better off building the app to be able to scale to any size by adding more hardware than to target a certain load. Bridge builders have a given target load they need to handle. If they had a predicted usage of 10 cars at any given time, and then a year later the bridge's popularity soared to 1,000,000 cars per day, nobody would be surprised if it failed. But with web applications, that's the kind of scaling that has to happen.
1) Most business logic is usually broken down into decision trees. This is the "equation" that should be proofed with unit tests. If you put in x then you should get y, I don't see any issue there.
2,3) Profiling can provide some insight as to where performance issues lie. For the most part you can't say that software will take x cycles because that will change over time (ie database becomes larger, OS starts going funky, etc). Bridges for instance require constant maintenance, you can't slap one up and expect it to last 50 years without spending time and money on it. Using libraries is like not trying to figure out pi every time you want to find the circumference of a circle. It has already been proven (and is cost effective) so there is no need to reinvent the wheel.
4) For the most part web applications scale well horizontally (multiple machines). Vertical (multithreading/multiprocess) scaling tends to be much more complex. Adding machines is usually relatively easy and cost effective and avoid some bottlenecks that become limited rather easily (disk I/O). Also load balancing can eliminate the possibility of one machine being a central point of failure.
It isn't exactly rocket science as you never know how many consumers will come to the serving line. Generally it is better to have too much capacity then to have errors, pissed of customers and someone (generally your boss) chewing your hide out.

Resources