Parallel.ForEach Set MaxDegreeOfParallelism globally - asp.net

Say I have a situation where I really want to use Parallel.ForEach instead of a regular foreach loop because it's much more performant (and way cooler), but having .NET determine the degree of parallelism for me leads to disastrous effects.
Now say that I've discovered the magic number that I need to place in the ParallelOptions.MaxDegreeOfParallelism parameter so that my application is super tasty fast and not super tasty broken.
Instead of having to remember to include a ParallelOptions parameter with the magic number every time I invoke Parallel.ForEach (or any other derivative of it), is there any way to specify the degree of parallelism at the web.config level? Or some other approach where the value can be set globally and invisibly, so that neither I nor my fellow developers have to remember it every time we want the benefits of parallelism?

The short answer is no: you can't configure this parameter through a config file. However, you can still have a single constant that all your calls refer to.

You could write your own wrapper function around Parallel.ForEach that reads the degree of parallelism from a config file or from a constant.
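As a rough sketch of that wrapper idea, assuming an appSettings key named "MaxDegreeOfParallelism" (both the key and the class name here are hypothetical):

using System;
using System.Collections.Generic;
using System.Configuration;
using System.Threading.Tasks;

public static class BoundedParallel
{
    // Read the magic number once; -1 tells .NET to decide for itself.
    private static readonly ParallelOptions Options = new ParallelOptions
    {
        MaxDegreeOfParallelism = ReadDegree()
    };

    private static int ReadDegree()
    {
        var raw = ConfigurationManager.AppSettings["MaxDegreeOfParallelism"];
        int degree;
        return int.TryParse(raw, out degree) ? degree : -1;
    }

    // Call this instead of Parallel.ForEach so the option can't be forgotten.
    public static ParallelLoopResult ForEach<T>(IEnumerable<T> source, Action<T> body)
    {
        return Parallel.ForEach(source, Options, body);
    }
}

Callers then write BoundedParallel.ForEach(items, ProcessItem) and never touch ParallelOptions directly.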

Related

How does functional programming avoid state when it seems unavoidable?

Let's say we define a function sum(a, b), functional-programming style, that returns the sum of its arguments. So far so good; all the nice things of FP without any problems.
Now let's say we run this in an environment with dynamic typing and a singleton, stateful error stream. Then let's say we pass a value of a and/or b that sum isn't designed to handle (i.e. not numbers), and it needs to indicate an error somehow.
But how? This function is supposed to be pure and side-effect-less. How does it insert an error into the global error stream without violating that?
No programming language that I know of has anything like a "singleton stateful error stream" built in, so you'd have to make one. And you simply wouldn't make such a thing if you were trying to write your program in a pure functional style.
You could, however, have a sum function that returns either the sum or an indication of an error. The type used to do this is in fact often known by the name Either. Then you could easily make a function that invokes a whole bunch of computations that could possibly return an error, and returns a list of all the errors that were encountered in the other computations. That's pretty close to what you were talking about; it's just explicitly returned rather than being global.
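To make that concrete, here is a minimal Either-style sketch in C# (the language used elsewhere on this page); the types and names are made up for illustration:

using System;

// Either holds exactly one of: an error (Left) or a value (Right).
public sealed class Either<TError, TValue>
{
    public TError Error { get; private set; }
    public TValue Value { get; private set; }
    public bool IsError { get; private set; }

    public static Either<TError, TValue> Left(TError error)
    {
        return new Either<TError, TValue> { Error = error, IsError = true };
    }

    public static Either<TError, TValue> Right(TValue value)
    {
        return new Either<TError, TValue> { Value = value, IsError = false };
    }
}

public static class Calculator
{
    // Sum stays pure: bad input produces a Left, never a global side effect.
    public static Either<string, double> Sum(object a, object b)
    {
        if (a is double && b is double)
            return Either<string, double>.Right((double)a + (double)b);
        return Either<string, double>.Left(
            string.Format("cannot add '{0}' and '{1}'", a, b));
    }
}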
Remember, the question when you're writing a functional program is "how do I make a program that has the behavior I want?", not "how would I duplicate one particular approach taken in another programming style?". A "global stateful error stream" is a means, not an end. You can't have a global stateful error stream in pure functional style, no. But ask yourself what you're using the global stateful error stream to achieve; whatever it is, you can achieve it in functional programming, just not with the same mechanism.
Asking whether pure functional programming can implement a particular technique that depends on side effects is like asking how you use techniques from assembly in object-oriented programming. OO provides different tools for you to use to solve problems; limiting yourself to using those tools to emulate a different toolset is not going to be an effective way to work with them.
In response to comments: If what you want to achieve with your error stream is logging error messages to a terminal, then yes, at some level the code is going to have to do IO to do that.¹
Printing to a terminal is just like any other IO; there's nothing particularly special about it that makes it worth singling out as a case where state seems especially unavoidable. So if this turns your question into "How do pure functional programs handle IO?", then there are no doubt many duplicate questions on SO, not to mention many blog posts and tutorials speaking precisely to that issue. It's not as if this came as a sudden surprise to implementors and users of pure programming languages; the question has been around for decades, and quite sophisticated thought has been put into the answers.
There are different approaches taken in different languages (IO monad in Haskell, unique modes in Mercury, lazy streams of requests and responses in historical versions of Haskell, and more). The basic idea is to come up with a model which can be manipulated by pure code, and hook up manipulations of the model to actual impure operations within the language implementation. This allows you to keep the benefits of purity (the proofs that apply to pure code but not to general impure code will still apply to code using the pure IO model).
The pure model has to be carefully designed so that you can't actually do anything with it that doesn't make sense in terms of actual IO. For example, Mercury does IO by having you write programs as if you're passing around the current state of the universe as an extra parameter. This pure model accurately represents the behaviour of operations that depend on and affect the universe outside the program, but only if there is exactly one state of the universe in the system at any one time, threaded through the entire program from start to finish. So some restrictions are put in place:
The type io is made abstract so that there's no way to construct a value of that type; the only way you can get one is to be passed one from your caller. An io value is passed into the main predicate by the language implementation to kick the whole thing off.
The mode of the io value passed in to main is declared to be unique. This means you can't do things that might cause it to be duplicated, such as putting it in a container or passing the same io value to multiple different invocations. The unique mode ensures that you can only pass the io value to a predicate that also uses the unique mode, and as soon as you pass it once, the value is "dead" and can't be passed anywhere else.
¹ Note that even in imperative programs, you gain a lot of flexibility if your error logging system returns a stream of error messages and the decision to actually print them is made only close to the outermost layer of the program. If your log calls write their output immediately, here are just a few things, off the top of my head, that become much harder to do:
Speculatively execute a computation and see whether it failed by checking whether it emitted any errors
Combine multiple high level systems into a single system, adding tags to the logs to distinguish each system
Emit debug and info log messages only if there is also an error message (so the output is clean when there are no errors to debug, and rich in detail when there are)
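Reusing the hypothetical Either/Calculator sketch from the previous answer, the "decide at the outermost layer" idea looks roughly like this:

using System;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        var inputs = new[]
        {
            Tuple.Create((object)1.0, (object)2.0),
            Tuple.Create((object)"x", (object)3.0)
        };

        // Pure part: run every computation and keep all the results.
        var results = inputs.Select(p => Calculator.Sum(p.Item1, p.Item2)).ToList();
        var errors = results.Where(r => r.IsError).Select(r => r.Error).ToList();

        // The IO decision lives here, at the edge, not inside Sum.
        foreach (var error in errors)
            Console.Error.WriteLine(error);
    }
}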

How to use non-blocking or asynchronous IO with Boost Spirit?

Does Spirit provide any capabilities for working with non-blocking IO?
To provide a more concrete example: I'd like to use Boost's Spirit parsing framework to parse data coming in from a network socket that's been placed in non-blocking mode. If the data is not completely available, I'd like to be able to use that thread to perform other work instead of blocking.
The trivial answer is to simply read all the data before invoking Spirit, but potentially gigabytes of data would need to be received and parsed from the socket.
It seems that in order to support non-blocking I/O while parsing, Spirit would need some ability to partially parse the data, pausing and saving its parse state when no more data is available, and resuming from that saved state when data does become available. Or maybe I'm making this too complicated?
TODO: I will post an example of a simple single-threaded 'event-based' parsing model. This is largely trivial, but might be just what you need.
For anything less trivial, please heed the following considerations/hints/tips:
How would you be consuming the result? You wouldn't have the synthesized attributes any earlier anyway, or are you intending to use semantic actions on the fly?
That doesn't usually work well due to backtracking. The caveats could be worked around by careful and judicious use of qi::hold and qi::locals, and by putting semantic actions with side effects only at points that will never be backtracked. In other words:
this is bound to be very error-prone
this naturally applies only to a limited set of grammars (grammars with rich contextual information will not lend themselves well to this treatment).
Now, everything can be forced, of course, but in general, experienced programmers should have learned to avoid swimming upstream.
Now, if you still want to do this:
You should be able to make the Spirit library thread-safe/reentrant by defining BOOST_SPIRIT_THREADSAFE and linking against libboost_thread. Note that this makes the globals used by Spirit thread-safe (at the cost of fine-grained locking), but not your parsers: you can't share your own parsers/rules/sub-grammars/expressions across threads. In fact, you can only share your own (Phoenix/Fusion) functors iff they are thread-safe, and any other extensions defined outside the core Spirit library should be audited for thread-safety.
If you manage the above, I think by far the best approach would be to:
use boost::spirit::istream_iterator (or, for binary/raw character streams, I'd prefer to define a similar boost::spirit::istreambuf_iterator using the boost::spirit::multi_pass<> template class) to consume the input. Note that depending on your grammar, quite a bit of memory could be used for buffering, and performance will be suboptimal
run the parser on its own thread (or logical thread, e.g. Boost Asio 'strands' or its famous 'stackless coroutines')
use coarse-grained semantic actions, as shown above, to pass messages to another logical thread that does the actual processing.
Some more loose pointers:
you can easily 'fuse' some functions to handle lazy evaluation of your semantic action handlers using BOOST_FUSION_ADAPT_FUNCTION and friends; this reduces the amount of cruft you have to write to get simple things working, like normal C++ overload resolution in semantic actions - especially when you're not using C++0x and BOOST_RESULT_OF_USE_DECLTYPE
Because you will want to avoid semantic actions with side effects, you should probably look at inherited attributes and qi::locals<> to coordinate state across rules in a 'pure functional fashion'.

Should I cache instances of frequently accessed classes

I'm new to .NET and was wondering if there is a performance gain to keeping an instance of, for example, a DAL object in scope.
Coming from the ColdFusion world, I would instantiate a component and store it in the application scope so that every time my code needed that component it would not have to be instantiated over and over again, affecting performance.
Is there any benefit to doing this in ASP.Net apps?
Unless you are actually experiencing a performance problem, you need not worry about optimizations like this.
Solve the business problems first, and use good design. As long as you have a decent abstraction layer for your data access code, then you can always implement a caching solution later down the road if it becomes a problem.
Remember that any caching solution increases complexity dramatically.
No. In the multi-tier world of ASP.NET this would be considered a case of "premature optimization". Once a site's suite of stubs, scripts and programs has scaled up and been running for a few months, you can look at logs and traces to see what might be cached, spawned or rewritten to improve performance. And as the infamous Jeff Atwood says, "Most code optimizations for web servers will benefit from money being spent on new and improved hardware rather than tweaking code for hours and hours".
Yes indeed you can and probably should. Oftentimes the storage for this is in the Session; you store data that you want for the user.
If it's a global thing, you may load it in the Application_Start event and place it somewhere, possibly the HttpCache.
And just a note, some people use "Premature Optimisation" to avoid optimising at all; this is nonsense. It is reasonable to cache in this case.
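As a rough sketch of the Application_Start route mentioned above (LookupTables stands in for whatever expensive object you build; all names are hypothetical):

// Global.asax.cs
using System;
using System.Web;
using System.Web.Caching;

// Hypothetical stand-in for an expensive-to-build shared object.
public static class LookupTables
{
    public static object LoadFromDatabase() { return new object(); }
}

public class Global : HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Build the expensive object once, when the application starts.
        var tables = LookupTables.LoadFromDatabase();

        // HttpRuntime.Cache is shared by every request in the application.
        HttpRuntime.Cache.Insert(
            "LookupTables",
            tables,
            null,                        // no cache dependency
            DateTime.UtcNow.AddHours(1), // absolute expiration
            Cache.NoSlidingExpiration);
    }
}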
It is very important to do a cost-benefit analysis before caching any object; one must consider all the factors, like
Performance advantage
Frequency of use
Hardware
Scalability
Maintainability
Time available for delivery (one of the most important factors)
Finally, it is always useful to cache objects that are very costly to create or that you use very frequently, e.g. table data (from the DB) or XML data
Does the class you are considering this for have state? If not (and DAL classes often do not have state, or do not need it), then you should make its methods static, and then you don't need to instantiate it at all. If the only state it holds is a connection string, you can also make that a static field or property and avoid the need to instantiate the class that way (a sketch follows below).
Otherwise, take a look at the design pattern called Flyweight.
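Here is a sketch of the stateless, static-method DAL described above; the class, connection-string key and query are made up for illustration:

using System.Configuration;
using System.Data.SqlClient;

public static class CustomerDal
{
    // The only "state" is the connection string, held statically.
    private static readonly string ConnectionString =
        ConfigurationManager.ConnectionStrings["Main"].ConnectionString;

    public static int CountCustomers()
    {
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand("SELECT COUNT(*) FROM Customers", connection))
        {
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}

Nothing is ever instantiated by the caller: it's just CustomerDal.CountCustomers().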

How to hand over variables to a function? With an array or variables?

When I try to refactor my functions for new needs, I stumble from time to time over the crucial question:
Shall I add another variable with a default value? Or shall I use a single array, where I'm able to add an additional variable without breaking the API?
Unless you need to support a flexible number of variables, I think it's best to explicitly identify each parameter. In most cases you can add an overloaded method that has a different signature to support the extra parameter while still supporting the original method signature. If you use an array for passing variables it just makes it too confusing for users of your API. Obviously there are some inputs that lend themselves to an array (a list of points in a polygon, a list of account IDs you wish to perform an action on, etc.) but if it's not a variable that you would reasonably expect to be an array or list, you should pass it into the method as a separate parameter.
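Following the overload suggestion, a hypothetical C# sketch (the names and the tax-rate example are made up):

public static class Pricing
{
    // The original signature keeps working for existing callers.
    public static decimal Price(decimal amount)
    {
        return Price(amount, 0.0m); // forward to the new overload
    }

    // The new overload carries the extra parameter explicitly.
    public static decimal Price(decimal amount, decimal taxRate)
    {
        return amount * (1 + taxRate);
    }
}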
Just like many questions in programming, the right answer is "it depends".
To take Javascript/jQuery as an example, one good rule of thumb is whether the parameter will be required each time the function is called or whether it is optional. For example, the main jQuery function itself requires an expression to determine what element(s) the operation will affect:
jQuery(expression)
It makes no sense to try to pass this parameter as part of an array as it will be required every time this function is called.
On the other hand, many jQuery plugins take several miscellaneous parameters that may be optional. By convention, these are passed via an 'options' object. As you said, this provides a nice interface, as new parameters can be added without affecting the existing API. It also keeps the API clean, since the user can ignore the options that are not applicable.
In general, when several parameters are involved and many of them are optional, passing them in a single options structure is a nice convention. This would have helped clean up many Win32 APIs, although it is more difficult to deal with such structures in C/C++ than in Javascript.
It depends on the programming language used.
If you have a run-of-the-mill OO language, you should use an object that you can easily extend, if you are really concerned about API consistency.
If that doesn't matter that much, there is the option of changing the method signature and overloading the method with more / different parameters.
If your language doesn't support either and you want the API to be binary stable, use an array.
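As a sketch of the "easily extensible object" option in C# (all names hypothetical):

// Optional settings travel together; adding a field later
// does not break any existing caller.
public class RenderOptions
{
    public int Width = 800;   // defaults chosen purely for illustration
    public int Height = 600;
    public bool AntiAlias = true;
}

public static class Renderer
{
    // The required input stays an explicit parameter.
    public static void Render(string scene, RenderOptions options)
    {
        options = options ?? new RenderOptions();
        // ... render using options.Width, options.Height, options.AntiAlias
    }

    public static void Render(string scene)
    {
        Render(scene, null);
    }
}

Callers are unaffected when new options appear: Renderer.Render("intro") keeps working, and Renderer.Render("intro", new RenderOptions { Width = 1920 }) opts in.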
There are several considerations that must be made.
Where is the function used? - Only in code you created? One place or hundreds of places? The amount of work needed to maintain existing code is important. Remember to include the time it will take to communicate the change to other programmers who may currently be using your function.
How critical is the new parameter? - Do you want to require it to be used? If it has a default value, will that default value break existing use of the function in any subtle ways?
Ease of comprehension - How many parameters are already passed into the function? The larger the number, the more confusing and error-prone it will be. Code Complete recommends that you restrict the number of parameters to 7 or fewer. If you need more than that, you should try to abstract some or all of the related parameters into one object.
Other special considerations - Do you want to optimize your efforts for any special conditions such as code speed or size? Are there any special considerations that must be taken into account for your execution environment? Keep in mind your goals for the project and make sure you aren't working against them with whatever design choice you make.
In his book Code Complete, Steve McConnell decrees that a function should never have more than 7 arguments, and rarely even that many. He presents compelling arguments, which I can't cite from memory, alas.
Clean Code, more recently, advocates even fewer arguments.
So unless the number of things to pass is really small, they should be passed in an enveloping structure. If they're homogenous, an array. If not, then a reasonably lightweight object should be built for the purpose.
You should do neither. Just add the parameter and change all callers to supply the proper value explicitly. The reason is that parameters with default values can only come at the end of the list, so you will not be able to add any more required parameters anywhere in the parameter list without risking misinterpretation.
These are the critical steps to disaster:
1. add one or two parameters with defaults
2. some callers supply them, and some rely on the defaults
[half a year passes]
3. add a required parameter (before them)
4. change all callers to accept the required parameter
5. get a phone call, or some other interruption, which makes you forget to change one of the callers from step 2
6. now your program compiles perfectly, but is invalid
Unfortunately, in function call semantics we usually don't have a chance to say, by name, which value goes where.
An array is also not a proper solution. An array should be used as a collection of similar objects upon which a uniform activity is performed. As they say here, if it's worth refactoring, it's worth refactoring now.

ASP.NET How expensive is it to call an Application Variable many times?

The short of it is: is it costly to check an Application variable such as Application("WebAppName") 10-20 times each time a page loads?
Background: (feel free to critique)
Some includes in my site contain many links and images which cannot use relative URLs, due to their inclusion in different paths.
Hence these includes contain frequent instances of
<img src="<%=Application("Webroot")%>images\image.gif">
Is it expensive to keep calling an Application variable like this?
Should I just put the Application value in some local variable to use where needed?
IMPORTANT NOTE:
I need my webapp to run fine on a server whether it be in the root web ("/") or in a virtual subweb ("/app").
Thanks in advance for any wisdom shared.
It's cheap - very, very cheap - just a dictionary lookup. Compared with almost anything else you'll do in the app (loading something from disk or the network) this will be statistical noise.
In general though, the best thing to do if you're worried about things like this is to measure it. Arbitrarily put 10,000 calls into a page, and see how that affects performance. See how it affects concurrency as well - can you still get the throughput you need when processing multiple concurrent requests?
Just for info, another option is:
<img src="<%=VirtualPathUtility.ToAbsolute("~/images/image.gif")%>">
This works especially well in MVC, where you might write an extension method to do the job, e.g.
<%=Html.Image("~/images/image.gif")%>
The Application object is a synchronized collection which uses ReadWriteObjectLock (an internal class that just uses the lock keyword), so if you are only reading from the collection it will be as fast as a hash-table lookup, as Jon mentioned. But if someone is writing to the collection at the same time, readers will block until the write completes. If you are that worried about performance, call the indexer once, store the result in a local variable, and use that variable in your views.
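If you do want to hoist the lookup, a sketch of the local-variable approach in the page itself (the image paths are just examples):

<%
    // one Application lookup per page, reused for every URL below
    string webroot = (string)Application["Webroot"];
%>
<img src="<%= webroot %>images/image.gif">
<img src="<%= webroot %>images/logo.gif">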
Use Request.ApplicationPath instead (only works if your app is set as a virtual directory in IIS)
Short answer: measure it and decide in your own environment. I would say it does not matter.
Longer answer: you should have the call wrapped in something anyway... like WebConfiguration.Root.
That will give you the option to do whatever optimization to it anytime in the future.
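A minimal sketch of such a wrapper, assuming the value lives in Application state under the hypothetical key "Webroot":

using System.Web;

public static class WebConfiguration
{
    // One place to change if this lookup ever needs optimizing
    // (for example, caching the value in a static field).
    public static string Root
    {
        get { return (string)HttpContext.Current.Application["Webroot"]; }
    }
}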
