various definitions of abstraction layers - abstraction

I see there are different definition of abstraction layers used in software engineering, OSAL (OS abstraction layer), PAl (platform abstraction) and SAL (system abstraction). I understand that OSAL deals mainly with OS specifics, such mutexes, memory allocation etc. but I don't see how PAL/SAL are different? Or these are just different names for the same thing?
My current understanding is that PAL traditionally/historically deals with platform, i.e. hardware, wand attempts to encapsulate CPU specific features, while SAL (aka OSAL?) deals with higher level, which is operating system and attempts to abstract OS calls etc.
I will appreciate any feedback and comments.
Thanks.

Related

Are there any functional languages that don't have garbage collection

Or even heavily functional styles in non functional/non memory managed languages.
What sort of techniques are there to deal with problems like intermediate garbage? Cleaning up after lazynizess/thunk allocated memory. Performance(since you can't easily share resources between immutable variables if you have to track its progress to deallocate it(smart pointers?)
You might be interested in programming languages with linear or uniqueness types, these can manage resources (and memory in particular). Recent examples: ATS and LinearML.
There have been attempts at "region-based memory management" (e.g. Cyclone), but they haven't lifted off just yet -- regions also allow for (earlier) memory reclamation, but they aren't enough (e.g., there are programs which, when run with region-based memory management, will exhibit unacceptable performance). The two schemes could be mixed, I think.
Back to your question, some ATS programs can run without garbage collection. (I won't say that such programs are written in "functional" style, such as in SML, but in a mix of imperative and first-order functional style.)
The only relevant thing I can think of is how Mlton is eliminating a significant part of garbage collection with a region analysis. It should be possible, in theory, to implement a compiler which will treat an unmanageable and un-annotated pointer leak as an error, and then one would be able to use many functional programming techniques in an entirely manual memory management setting.

mpi under the hood

I need to deliver a presentation on programming in MPI. I need to add a segment on how MPI works under the hood. For Example What happens when I call MPI_Init?
Do you know of any good source from where I can learn these details?
The MPI Spec contains the description of the knobs, sliders, and displays that are on the outside of the "black box" of each API.
The interior details of the black boxes will be implementation dependent...and will also depend on the interconnect (e.g. TCP, IBV, DAPL, etc), the OS (e.g. is the implementation using LSB, or native libraries, etc), and on many other factors to a lesser degree (e.g. message size thresholds will trigger different code paths, and so on). Using "strace" and "ltrace" on the a.out may provide some insight into the actual goings on inside the blackbox.
The best recommendation is to pick an open source implementation and examine the code to determine the internal details.
MPI is a specification, not a particular implementation. The observable behavior is given in the MPI spec. How it works under the hood depends on the particular implementation. If you'd like to take a look at an example implementation, you might be interested in looking at MPICH2 and browsing their source code.
Complement your study of the source code of an implementation of MPI with consideration of how you would implement MPI_Init on your platform of choice. MPI sits on top of already available O/S functionality. I don't mean to suggest that you can figure out how a particular version of MPI is implemented by this approach, but to suggest that you can learn better what is going on under the hood by tackling the problem from another angle.
MPI is only a spec. MPI spec is implemented by various groups and organizations. You will want to pick one implementation, say, MPICH, and you can find their design documentation. That will tell you how the MPI spec is implemented by that group.
If you just want to describe what happens when an application written in MPI is started, you can read about MPI and MPI programming. I highly recommend http://www.citutor.org

exploring mathematics of/in computer science [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 13 years ago.
I have been working for two years in software industry. Some things that have puzzled me are as follows:
There is lack of application of mathematics in current software industry.
e.g.: When a mechanical engineer designs an electricity pole , he computes the stress on the foundation by using stress analysis techniques(read mathematical equations) to determine exactly what kind and what grade of steel should be used, but when a software developer deploys a web server application he just guesses on the estimated load on his server and leaves the rest on luck and god, there is nothing that he can use to simulate mathematically to answer his problem (my observation).
Great softwares (wind tunnel simulators etc) and computing programs(like matlab etc) are there to simulate real world problems (because they have their mathematical equations) but we in software industry still are clueless about how much actual resources in terms of memory , computing resources, clock speed , RAM etc would be needed when our server side application would actually be deployed. we just keep on guessing about the solution and solve such problem's by more or less 'hit and trial' (my observation).
Programming is done on API's, whether in c, C#, java etc. We are never able to exactly check the complexity of our code and hence efficiency because somewhere we are using an abstraction written by someone else whose source code we either don't have or we didn't have the time to check it.
e.g. If I write a simple client server app in C# or java, I am never able to calculate beforehand how much the efficiency and complexity of this code is going to be or what would be the minimum this whole client server app will require (my observation).
Load balancing and scalability analysis are just too vague and are merely solved by adding more nodes if requests on the server are increasing (my observation).
Please post answers to any of my above puzzling observations.
Please post relevant references also.
I would be happy if someone proves me wrong and shows the right way.
Thanks in advance
Ashish
I think there are a few reasons for this. One is that in many cases, simply getting the job done is more important than making it perform as well as possible. A lot of software that I write is stuff that will only be run on occasion on small data sets, or stuff where the performance implications are pretty trivial (it's a loop that does a fixed computation on each element, so it's trivially O(n)). For most of this software, it would be silly to spend time analyzing the running time in detail.
Another reason is that software is very easy to change later on. Once you've built a bridge, any fixes can be incredibly expensive, so it's good to be very sure of your design before you do it. In software, unless you've made a horrible architectural choice early on, you can generally find and optimize performance hot spots once you have some more real-world data about how it performs. In order to avoid those horrible architectural choices, you can generally do approximate, back-of-the-envelope calculations (make sure you're not using an O(2^n) algorithm on a large data set, and estimate within a factor of 10 or so how many resources you'll need for the heaviest load you expect). These do require some analysis, but usually it can be pretty quick and off the cuff.
And then there are cases in which you really, really do need to squeeze the ultimate performance out of a system. In these case, people frequently do actually sit down, work out the performance characteristics of the systems they are working with, and do very detailed analyses. See, for instance, Ulrich Drepper's very impressive paper What Every Programmer Should Know About Memory (pdf).
Think about the engineering sciences, they all have very well defined laws that are applicable to the design, and building of physical items, things like gravity, strength of materials, etc. Whereas in Computer science, there are not many well defined laws when it comes to building an application against.
I can think of many different ways to write a simple hello world program that would satisfy the requirment. However, if I have to build an electricity pole, I am severely constrained by the physical world, and the requirements of the pole.
Point by point
An electricity pole has to withstand the weather, a load, corrosion etc and these can be quantified and modelled. I can't quantify my website launch success, or how my database will grow.
Premature optimisation? Good enough is exactly that, fix it when needed. If you're a vendor, you've no idea what will be running your code in real life or how it's configured. Again you can't quantify it.
Premature optimisation
See point 1. I can add as needed.
Carrying on... even engineers bollix up. Collapsing bridges, blackout, car safety recalls, "wrong kind of snow" etc etc. Shall we change the question to "why don't engineers use more empirical observations?"
The answer to most of these is in order to have meaningful measurements (and accepted equations, limits, tolerances etc) that you have in real-world engineering you first need a way of measuring what it is that you are looking at.
Most of these things simply can't be measured easily - Software complexity is a classic, what is "complex"? How do you look at source code and decide if it is complex or not? McCabe's Cyclomatic Complexity is the closest standard we have for this but it's still basically just counting branch instructions in methods.
There is little math in software programs because the programs themselves are the equation. It is not possible to figure out the equation before it is actually run. Engineers use simple (and very complex) programs to simulate what happens in the real world. It is very difficult to simulate a simulator. additionally, many problems in computer science don't even have an answer mathematically: see traveling salesman.
Much of the mathematics is also built into languages and libraries. If you use a hash table to store data, you know to find any element can be done in constant time O(1), no matter how many elements are in the hash table. If you store it in a binary tree, it will take longer depending on the number of elements [0(n^2) if i remember correctly].
The problem is that software talks with other software, written by humans. The engineering examples you describe deal with physical phenomenon, which are constant. If I develop an electrical simulator, everyone in the world can use it. If I develop a protocol X simulator for my server, it will help me, but probably won't be worth the work.
No one can design a system from scratch and people that write semi-common libraries generally have plenty of enhancements and extensions to work on rather than writing a simulator for their library.
If you want a network traffic simulator you can find one, but it will tell you little about your server load because the traffic won't be using the protocol your server understands. Every server is going to see completely different sets of traffic.
There is lack of application of mathematics in current software industry.
e.g.: When a mechanical engineer designs an electricity pole , he computes the stress on the foundation by using stress analysis techniques(read mathematical equations) to determine exactly what kind and what grade of steel should be used, but when a software developer deploys a web server application he just guesses on the estimated load on his server and leaves the rest on luck and god, there is nothing that he can use to simulate mathematically to answer his problem (my observation).
I wouldn't say that luck or god are always the basis for load estimation. Often realistic data can be had.
It's also not true that there are no mathematical techniques to answer the question. Operations research and queuing theory can be applied to good advantage.
The real problem is that mechanical engineering is based on laws of physics and a foundation of thousands of years worth of empirical and scientific investigation. Computer science is only as old as me. Computer science will be much further along by the time your children and grandchildren apply the best practices of their day.
An MIT EE grad would not have this problem ;)
My thoughts:
Some people do actually apply math to estimate server load. The equations are very complex for many applications and many people resort to rules of thumb, guess and adjust or similar strategies. Some applications (real time applications with a high penalty for failure... weapons systems, powerplant control applications, avionics) carefully compute the required resources and ensure that they will be available at runtime.
Same as 1.
Engineers also use components provided by others, with a published interface. Think of electrical engineering. You don't usually care about the internals of a transistor, just it's interface and operating specifications. If you wanted to examine every component you use in all of it's complexity, you would be limited to what one single person can accomplish.
I have written fairly complex algorithms that determine what to scale when based on various factors such as memory consumption, CPU load, and IO. However, the most efficient solution is sometimes to measure and adjust. This is especially true if the application is complex and evolves over time. The effort invested in modeling the application mathematically (and updating that model over time) may be more than the cost of lost efficiency by try and correct approaches. Eventually, I could envision a better understanding of the correlation between code and the environment it executes in could lead to systems that predict resource usage ahead of time. Since we don't have that today, many organizations load test code under a wide range of conditions to empirically gather that information.
Software engineering are very different from the typical fields of engineering. Where "normal" engineering are bound to the context of our physical universe and the laws in it we've identified, there's no such boundary in the software world.
Producing software are usually an attempt to mirror a subset of the real-life world into a virtual reality. Here we define the laws ourselves, by only picking the ones we need and by making them just as complex as we need. Because of this fundamental difference, you need to look at the problem-solving from a different perspective. We try to make abstractions to make complex parts less complex, just like we teach kids that yellow + blue = green, when it's really the wavelength of the light that bounces on the paper that changes.
Once in a while we are bound by different laws though. Stuff like Big-O, Test-coverage, complexity-measurements, UI-measurements and the likes are all models of mathematic laws. If you look into digital signal processing, realtime programming and functional programming, you'll often find that the programmers use equations to figure out a way to do what they want. - but these techniques aren't really (to some extend) useful to create a virtual domain, that can solve complex logic, branching and interact with a user.
The reasons why wind tunnels, simulations, etc.. are needed in the engineering world is that it's much cheaper to build a scaled down prototype, than to build the full thing and then test it. Also, a failed test on a full scale bridge is destructive - you have to build a new one for each test.
In software, once you have a prototype that passes the requirements, you have the full-blown solution. there is no need to build the full-scale version. You should be running load simulations against your server apps before going live with them, but since loads are variable and often unpredictable, you're better off building the app to be able to scale to any size by adding more hardware than to target a certain load. Bridge builders have a given target load they need to handle. If they had a predicted usage of 10 cars at any given time, and then a year later the bridge's popularity soared to 1,000,000 cars per day, nobody would be surprised if it failed. But with web applications, that's the kind of scaling that has to happen.
1) Most business logic is usually broken down into decision trees. This is the "equation" that should be proofed with unit tests. If you put in x then you should get y, I don't see any issue there.
2,3) Profiling can provide some insight as to where performance issues lie. For the most part you can't say that software will take x cycles because that will change over time (ie database becomes larger, OS starts going funky, etc). Bridges for instance require constant maintenance, you can't slap one up and expect it to last 50 years without spending time and money on it. Using libraries is like not trying to figure out pi every time you want to find the circumference of a circle. It has already been proven (and is cost effective) so there is no need to reinvent the wheel.
4) For the most part web applications scale well horizontally (multiple machines). Vertical (multithreading/multiprocess) scaling tends to be much more complex. Adding machines is usually relatively easy and cost effective and avoid some bottlenecks that become limited rather easily (disk I/O). Also load balancing can eliminate the possibility of one machine being a central point of failure.
It isn't exactly rocket science as you never know how many consumers will come to the serving line. Generally it is better to have too much capacity then to have errors, pissed of customers and someone (generally your boss) chewing your hide out.

Why has Ada no garbage collector?

I know GC wasn't popular in the days when Ada was developed and for the main use case of embedded programming it is still not a good choice.
But considering that Ada is a general purpose programming language why wasn't a partial and optional (traces only explicitly tagged memory objects) garbage collector introduced in later revisions of the language and the compiler implementations.
I simply can't think of developing a normal desktop application without a garbage collector anymore.
Ada was designed with military applications in mind. One of the big priorities in its design was determinism. i.e. one wanted an Ada program to consistently perform exactly the same way every time, in any environment, under all operating systems... that kinda thing.
A garbage collector turns one application into two, working against one another. Java programs develop hiccups at random intervals when the GC decides to go to work, and if it's too slow about it there's a chance that an application will run out of heap sometimes and not others.
Simplified: A garbage collector introduces some variability into a program that the designers didn't want. You make a mess - you clean it up! Same code, same behavior every time.
Not that Ada became a raging worldwide success, mind you.
Because Ada was designed for use in defense systems which control weapons in realtime, and garbage collection interferes with the timing of your application. This is dangerous which is why, for many years, Java came with a warning that it was not to be used for healthcare and military control systems.
I believe that the reason there is no longer such a disclaimer with Java is because the underlying hardware has become much faster as well as the fact that Java has better GC algorithms and better control over GC.
Remember that Ada was developed in the 1970's and 1980's at a time when computers were far less powerful than they are today, and in control applications timing issues were paramount.
First off, there is nothing in the language really that prohibits garbage collection.
Secondly some implementations do perform garbage collection. In particular, all the implementations that target the JVM garbage collect.
Thirdly, there is a way to get some amount of garbage collection with all compilers. You see, when an access type goes out of scope, if you specifially told the language to set aside a certian amount of space for storage of its objects, then that space will be destroyed at that point. I've used this in the past to get some modicum of garbage collection. The declaration voodo you use is:
type Foo is access Blah;
for Foo'storage_size use 100_000_000; --// 100K
If you do this, then all (100K of) memory allocated to Blah objects pointed to by Foo pointers will be cleaned up when the Foo type goes out of scope. Since Ada allows you to nest subroutines inside of other subroutines, this is particularly powerful.
To see more about what storage_size and storage pools can do for you, see LRM 13.11
Fourthly, well-written Ada programs don't tend to rely on dynamic memory allocation nearly as much as C programs do. C had a number of design holes that practicioners learned to use pointers to paint over. A lot of those idioms aren't nessecary in Ada.
the answer is more complicated: Ada does not require a garbage collector, because of real-time constraints and such. however, the language have been cleverly designed so as to allow the implementation of a garbage collector.
although, many (almost all) compilers do not include a garbage collector, there are some notable implementation:
a patch for GNAT
Ada compilers targeting the Java Virtual Machine (i don't know if those projects are still supported). It used the garbage collector of the JVM.
there are plenty other sources about garbage collection in Ada around the web. this subject has been discussed at length, mainly because of the fierce competition with Java in the mid '90s (have a look at this page: "Ada 95 is what the Java language should have been"), when Java was "The Next Big Thing" before Microsoft drew C#.
First off, I'd like to know who's using Ada these days. I actually like the language, and there's even a GUI library for Linux/Ada, but I haven't heard anything about active Ada development for years. Thanks to its military connections, I'm really not sure if it's ancient history or so wildly successful that all mention of its use is classified.
I think there's a couple of reason for no GC in Ada. First, and foremost, it dates back to an era where most compiled languages used primarily stack or static memory, or in a few cases, explicit heap allocate/free. GC as a general philosophy really only took off about 1990 or so, when OOP, improved memory management algorithms and processors powerful enough to spare the cycles to run it all came into their own. What simply compiling Ada could do to an IBM 4331 mainframe in 1989 was simply merciless. Now I have a cell phone that can outperform that machine's CPU.
Another good reason is that there are people who think that rigorous program design includes precise control over memory resources, and that there shouldn't be any tolerance for letting dynamically-acquired objects float. Sadly, far too many people ended up leaking memory as dynamic memory became more and more the rule. Plus, like the "efficiency" of assembly language over high-level languages, and the "efficiency" of raw JDBC over ORM systems, the "efficiency" of manual memory management tends to invert as it scales up (I've seen ORM benchmarks where the JDBC equivalent was only half as efficient). Counter-intuitive, I know, but these days systems are much better at globally optimizing large applications, plus they're able to make radical re-optimizations in response to superficially minor changes.Including dynamically re-balancing algorithms on the fly based on detected load.
I'm afraid I'm going to have to differ with those who say that real-time systems can't afford GC memory. GC is no longer something that freezes the whole system every couple of minutes. We have much more intelligent ways to reclaim memory these days.
Your question is incorrect. It does. See the package ada.finalization which handles GC for you.
I thought I'd share a really simple example of how to implement a Free() procedure (which would be used in a way familiar to all C programmers)...
with Ada.Integer_Text_IO, Ada.Unchecked_Deallocation;
use Ada.Integer_Text_IO;
procedure Leak is
type Int_Ptr is access Integer;
procedure Free is new Ada.Unchecked_Deallocation (Integer, Int_Ptr);
Ptr : Int_Ptr := null;
begin
Ptr := new Integer'(123);
Free (Ptr);
end Leak;
Calling Free at the end of the program will return the allocated Integer to the Storage Pool ("heap" in C parlance). You can use valgrind to demonstrate that this does in fact prevent 4 bytes of memory being leaked.
The Ada.Unchecked_Deallocation (a generically defined procedure) can be used on (I think) any type that may be allocated using the "new" keyword. The Ada Reference Manual ("13.11.2 Unchecked Storage Deallocation") has more details.

lowest level language until asp.net?

it's assembler right? can someone please point out the progression that we've had in programming languages since assembler to the days of asp.net, namely the chronological order of languages?
Here's a wiki timeline of all programming languages.
I would include a FTA table, but the list is very robust and extensive.
And also, the lowest language you ever get to is assembly (aside from straight up issuing machine instructions), regardless of what other language is built on top (including ASP.NET). Other languages are really just abstractions on top of assembly. In fact, ASP.NET gets compiled into IL (Intermediate Language) code, which then get's JITed into assembly. Assembly is as close to the metal as you're going to get.
To be pedantic, "assembler" is not actually a language (any more than "compiler" is;-) -- rather, it's a program that takes a source file in "assembly language" and emits binary machine code. The binary machine code can be said to be lower-level than the assembly language, since the latter allows use of some symbols and often includes a macro processing ability as well.
"Below" binary machine code, there may be other levels, known as "microcode" (but there might not be -- the CPU might be implemented entirely in real hardware, without any microprogramming aspect). That might be relevant only if the system's architecture allowed programmers to alter the microcode, especially by adding to it, etc -- there have been machines that did that, but I don't believe any currently commercialized CPU does. So you probably don't have to care about that (and the by-now-esoteric distinctions between vertical and horizontal microcode, etc, etc;-).
Programming languages are just ways to assemble solutions to computing problems.
The argument is "assembled out of what?"
From that point of view, I'd suggest the following evolutionary curve:
Napier's Bones
Babbage's difference engine
Jacquard (card) looms
(Conceptual) Abstract Turing machines/Post Systems/Church's calculus
Relay Computers (Aiken?)
Vacuum tubes as switching elements (Eniac)
Transistor-based computers
Microprogrammed machines
Integrated Circuits
Large Scale Circuits
with "assembler" being the programming language used to
put together solutions consisting of instructions for
real machines starting with the vacuum tube systems.
(I'm not sure the relay machines actually had assemblers).
Programming langauges are just ways to put together high
level commands that reduce in effect to assembler instructions.
There are two different dimensions to consider here, what I'd call vertical growth (languages build up over time from one generation to the next) and horizontal growth (syntactic improvements and reduction in complexity.)
A good explanation of vertical change is seen here: http://web.sxu.edu/rogers/sys/generations.html
And a nice, yet incomplete, illustration of horizontal change it here: http://oreilly.com/news/graphics/prog_lang_poster.pdf

Resources