So I'm having a hard time grasping the idea behind pointers and all that memory allocation.
With computers as powerful as they are nowadays, why do we have to use pointers at all?
Isn't there always a workaround to do things without the help of pointers?
Pointers are an indirection: instead of working with the data itself, you are working with something that points to the data. Depending on the semantics of the language, this allows many things: cheaply switching to another instance of data (by setting the pointer to point to another instance), passing pointers to give access to the original data without having to make a (possibly expensive) copy, etc.
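A minimal C++ sketch of both uses, built around a hypothetical `longest` helper: the pointer `best` is moved from one existing string to another without copying any of them, and the address it returns refers to the original element, not a copy.

```cpp
#include <string>
#include <vector>

// Finds the longest string without copying any of them: 'best' only ever
// holds an address, so switching it to another element is one assignment,
// however long the strings are.
const std::string* longest(const std::vector<std::string>& words) {
    const std::string* best = nullptr;  // points at nothing yet
    for (const std::string& w : words) {
        if (best == nullptr || w.size() > best->size()) {
            best = &w;  // re-point to another instance; no string is copied
        }
    }
    return best;  // nullptr when 'words' is empty
}
```

Because the result is an address, the caller can compare it with `&words[i]` to learn which original element won.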
Memory allocation is related to pointers, but separate: you can have pointers without allocating memory. The reason you need pointers for memory allocation is that the address at which the allocated block of memory resides is not known at compile time, so you can only access it via a level of indirection (i.e. pointers) -- the compiler statically allocates space for the pointer that will point to the dynamically allocated memory.
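To make the distinction concrete, here is a small C++ sketch with two hypothetical helpers: one pointer simply holds the address of an existing local variable (no allocation at all), while the other receives an address that only exists once `new` runs at run time.

```cpp
// No allocation: p just holds the address of an existing local variable.
int read_through_pointer() {
    int local = 42;
    int* p = &local;  // the space for p itself is reserved by the compiler
    return *p;
}

// Dynamic allocation: the object's address is only known once 'new' runs,
// so the only way to reach the object is through the pointer it returns.
int read_heap_object(int value) {
    int* p = new int(value);  // allocated at run time
    int result = *p;
    delete p;                 // manual deallocation pairs with 'new'
    return result;
}
```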
Pointers are incredibly powerful. The fact that computers process data faster nowadays is no reason to abandon something as essential as pointers. Passing around giant chunks of memory on the stack is inefficient at best, catastrophic at worst. With pointers, you only need to maintain a reference to where the data resides, rather than duplicating huge chunks of memory each time you call a function.
Also, if you're copying all the data every time, how do you modify the original data, other than by returning a modified copy of the structure from every call that touches it?
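The two options can be sketched in C++ with a deliberately large, hypothetical `Record` type: the copy-based version moves the whole structure in and out of the function, while the pointer-based version passes only an address and edits the caller's data in place.

```cpp
#include <array>

// A deliberately large, hypothetical record (1024 ints, typically 4 KB).
struct Record {
    std::array<int, 1024> values{};  // value-initialized to zeros
};

// Without pointers: the whole record is copied in, modified, and copied
// back out; the caller must remember to store the returned copy.
Record bump_copy(Record r) {
    r.values[0] += 1;
    return r;
}

// With a pointer: only an address is passed, and the caller's record is
// modified in place.
void bump_in_place(Record* r) {
    r->values[0] += 1;
}
```

Calling `bump_copy(r)` without assigning the result leaves `r` unchanged, which is exactly the pitfall the question points at.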
I remember reading somewhere that Dijkstra was assessing a student for a programming course; this student was quite intelligent but wasn't able to solve the problem because of a sort of mental block.
All the code was sort of ok, but what was needed was simply to use the expression
a[a[i+1]] = j;
and even though the student was so close to the solution, the goal still seemed to be miles away.
Languages "without pointers" already exist... e.g. BASIC. Without explicit pointers, that is. But the idea of indirection (the idea that some data can simply tell you where to find other data) is central to programming.
The very idea of an array is about being able to use computed values to find other values.
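The same idiom as `a[a[i+1]] = j`, values in an array used as indices into an array, is enough to build a linked list with no explicit pointers. A minimal C++ sketch, assuming a hypothetical `walk` helper and the convention that `-1` marks the end of the list:

```cpp
#include <vector>

// Hypothetical illustration: a linked list threaded through an array.
// next[i] stores the index of the element that follows i (or -1 at the end),
// so values in the array say where to find other values.
// Assumes the links form no cycle; otherwise the loop would not terminate.
std::vector<int> walk(const std::vector<int>& next, int head) {
    std::vector<int> order;
    for (int i = head; i != -1; i = next[i]) {
        order.push_back(i);  // visit i, then follow its "pointer"
    }
    return order;
}
```

For example, with `next = {2, -1, 1}` and `head = 0`, the walk visits 0, then 2, then 1.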
Trying to hide this idea is a horrible plan. According to Dijkstra, anyone who has been exposed to the BASIC language has already suffered such mental mutilation that it is impossible to recover as a good programmer (and probably the absence of explicit indirection was one of the problems).
I think he was exaggerating.
Just a bit.
Each Erlang process maintains its own private address space. All communication happens via copying without sharing (except big binaries). If each process is processing one message at a time with no concurrent access to its objects, I don't see why we need immutable/persistent data structures.
Erlang was initially implemented in Prolog, which doesn't really use mutable data structures either (though some dialects do). So it started off without them. This makes runtime implementation simpler and faster (garbage collection in particular).
So adding mutable data structures would require a lot of effort, could introduce bugs, and Erlang programmers are nearly by definition at least willing to live without them.
Many actually consider their absence to be a positive good: less concern about object identity, no need for defensive copying because you don't know whether some other piece of code is going to modify the data you passed (or might be changed later to modify it), etc.
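The defensive-copying point can be sketched in C++, with the caveat that C++ `const` is a much weaker guarantee than Erlang's language-level immutability: the sharing version below is only safe under the assumption that no owner keeps a non-const alias to the vector. Both holder classes are hypothetical.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Mutable data: the holder cannot know whether the caller will change the
// vector later, so it takes a defensive copy of every element.
class CopyingHolder {
public:
    explicit CopyingHolder(const std::vector<int>& d) : data_(d) {}
    const std::vector<int>* address() const { return &data_; }
private:
    std::vector<int> data_;  // private copy
};

// Immutable data: if every owner only ever sees the vector as const,
// holders can share one instance instead of copying it.
class SharingHolder {
public:
    explicit SharingHolder(std::shared_ptr<const std::vector<int>> d)
        : data_(std::move(d)) {}
    const std::vector<int>* address() const { return data_.get(); }
private:
    std::shared_ptr<const std::vector<int>> data_;
};
```

Two `CopyingHolder`s built from the same vector end up with two separate copies, while two `SharingHolder`s point at the same single instance.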
This absence does mean that Erlang is pretty unusable in some domains (e.g. high performance scientific computing), at least as the main language. But again, this means that nobody in these domains is going to use Erlang in the first place and so there's no particular incentive to make it usable at the cost of making existing users unhappy.
I remember seeing a mailing list post by Joe Armstrong quite a long time ago (which I couldn't find with a quick search now) saying that he initially planned to add mutable variables when he'd need them... except he never quite did, and performance was good enough for everything he was using Erlang for.
It is indeed the case that in Erlang immutability does not solve any "shared state" problems, as immutable data are "process local".
From the functional programming language perspective, however, immutability offers a number of benefits, summarized adequately in this Quora answer:
The simplest definition of functional programming is that it’s a programming
paradigm where you are transforming immutable data with functions.
The definition uses functions in the mathematical sense, where it’s
something that takes an input, and produces an output.
OO + mutability tends to violate that definition because when you want
to change a piece of data it generally will not return the output, it
will likely return void or unit, and that when you call a method on
the object the object itself isn’t input for the function.
As far as what advantages the paradigm has, composability, thread
safety, being able to track what went wrong where better, the ability
to sort of separate the data from the actual computation on it being
done, etc.
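The contrast the quote draws can be sketched in C++ with a hypothetical `Counter` type: the mutating method returns `void` and does not take the object as an explicit input, while the function-style version takes the old value as input and produces a new value as output.

```cpp
// Mutating style: increment() changes the object and returns nothing (void).
struct Counter {
    int n = 0;
    void increment() { ++n; }
};

// Functional style: the input is left untouched and a new value is returned,
// like a function in the mathematical sense.
Counter incremented(const Counter& c) {
    Counter next = c;  // copy the input
    next.n += 1;       // change only the copy
    return next;
}
```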
How would this work?
factorial(1) -> 1;
factorial(X) ->
    X*factorial(X-1).
If you run factorial(4), a single process will be running the same function repeatedly. Each call of the function has its own value of X; if X were scoped to the process rather than to the function call, recursive functions wouldn't work. So first we need to understand scope. If you want to say that you don't see why data needs to be immutable within the scope of a single function/block, you would have a point, but it would be a headache to keep track of where data is immutable and where it isn't.
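The scope argument can be demonstrated in C++ (used here instead of Erlang purely as an illustration): the normal version gives each call its own `x`, while a hypothetical variant keeps a single global `g_x`, standing in for "X in the scope of the process", which every recursive call overwrites.

```cpp
// Each call gets its own parameter x (its own stack frame), so recursion works.
long factorial(long x) {
    if (x <= 1) return 1;
    return x * factorial(x - 1);
}

// Hypothetical counter-example: one global g_x plays the role of
// "X in the scope of the process". Every recursive call overwrites it.
long g_x = 0;

long factorial_shared(long x) {
    g_x = x;
    if (g_x <= 1) return 1;
    long rest = factorial_shared(g_x - 1);  // leaves g_x == 1 behind
    return g_x * rest;                      // multiplies by 1, not by x
}
```

Here `factorial(4)` yields 24, but `factorial_shared(4)` yields 1, because by the time each call multiplies, the shared variable has already been overwritten by the deepest call.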
I'm learning C++ on my own. I'm an EE and learned it about 20 years ago, but over the course of my career I stopped programming and didn't take it up again until recently. I should add that I never took any classes in programming.
I have a theoretical question about pointers. From reading books about pointers, it seems they have an important role in C++. My problem is that I can't see what that role is. I see that pointers have a role in arrays, but I can't see their role in anything else.
I can see what they do, but I don't see why pointers are used in the situations where I see them. Either references or plain variables would work just as well. I have a feeling the answer lies in the area of memory (its optimal use), but I just don't know.
Any answers would be appreciated. Thanks.
Consider the following from cplusplus.com:
"[T]here may be cases where the memory needs of a program can only be
determined during runtime. For example, when the memory needed depends
on user input. On these cases, programs need to dynamically allocate
memory, for which the C++ language integrates the operators new and
delete."
If you could determine all your memory needs prior to run time and did not need to make use of any abstract data type like a linked list, then yes, it would be difficult to see their use. However, what if you want to store values in an array, but you don't yet know how big that array will need to be?
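A minimal C++ sketch of that situation, assuming the element count `n` arrives only at run time (for example, from user input). The first hypothetical helper manages the block by hand with `new[]`/`delete[]`; the second uses `std::vector`, which manages the same kind of heap block for you.

```cpp
#include <cstddef>
#include <vector>

// Raw C++: new[] returns a pointer to a block whose size is chosen at
// run time; delete[] must free it.
long long sum_of_squares_raw(std::size_t n) {
    long long* values = new long long[n];
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = static_cast<long long>(i) * static_cast<long long>(i);
    }
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    delete[] values;  // every new[] needs exactly one matching delete[]
    return total;
}

// Idiomatic C++: std::vector owns its heap block internally and releases
// it automatically when it goes out of scope.
long long sum_of_squares_vector(std::size_t n) {
    std::vector<long long> values(n);
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = static_cast<long long>(i) * static_cast<long long>(i);
    }
    long long total = 0;
    for (long long v : values) {
        total += v;
    }
    return total;
}
```

Both compute 0 + 1 + 4 + 9 = 14 for `n = 4`. In everyday C++ the vector version is usually preferred, but it still relies on a pointer internally.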
Another value of pointers arises when you consider passing values from function to function. You may find this thread of value regarding the differences between pointers and references in C++ and how/why to use each.
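One practical difference between the two, sketched in C++ with hypothetical helpers: a pointer parameter can be `nullptr`, which lets a caller express "no value", while a reference parameter must always be bound to a real object that the function can modify directly.

```cpp
// Pointer parameter: the caller can pass nullptr to mean "no bonus",
// so the function must check before dereferencing.
int add_bonus(int base, const int* bonus) {
    return bonus != nullptr ? base + *bonus : base;
}

// Reference parameter: the caller must supply a real object, and the
// function modifies that object directly; no null check is needed.
void add_in_place(int& total, int amount) {
    total += amount;
}
```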
We have been having several pedagogical conversations focused on pointers on the CSEducators.SE site. I'd encourage you to read those as well:
Simple Pointer Examples in C
Lesson Idea: Arrays, Pointers, and Syntactic Sugar
Pointers come from C, which C++ inherited them from and which had no concept of references.
Everything that can be done with a reference in C++ is done with a pointer in C.
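The classic example is swapping two variables. The hypothetical `swap_ptr` below is how it has to be written in C, by passing addresses; `swap_ref` is the C++ version using references. Both have the same effect on the caller's variables.

```cpp
// C style: there are no references, so the function receives the addresses
// of the caller's variables and writes through them.
void swap_ptr(int* a, int* b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

// C++ style: references give the same effect without explicit & and *
// at the call site and inside the body.
void swap_ref(int& a, int& b) {
    int tmp = a;
    a = b;
    b = tmp;
}
```

The call sites differ only in syntax: `swap_ptr(&x, &y)` versus `swap_ref(x, y)`.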
I find this question really great because it is pure.
A programming language is considered "safe" when the programs written in it can only call functions and access data that the program can name.
Now, the concept of a pointer was invented to break this sandbox of safety and provide developers with the freedom to think and act outside of the box.
Think of pointers as a poor man's tool to achieve something not provided by the programming language itself.
It is misleading to think you could achieve higher performance by programming some algorithm with pointers. Optimization is the privilege of the compiler and hardware, not of humans.
I'm wondering if there is any performance benchmark comparing raw objects with pointers to objects.
I'm aware that it doesn't make sense to use pointers to reference types (e.g. maps), so please don't mention it.
I'm aware that you "must" use pointers if the data needs to be updated, so please don't mention it.
Most of the answers/docs that I've found basically rephrase the guidelines from the official documentation:
... If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver.
My question is simply: what does "large"/"big" mean? Is a pointer to a string overkill? What about a struct with two strings, or a struct with three string fields?
I think we deal with this use case quite often, so it's a fair question to ask. Some advise not worrying about the performance issue, but maybe some people want to use the right notation whenever they have the chance, even if the performance gain is not significant. After all, a pointer is not that expensive (i.e. one additional keystroke).
An example where it doesn't make sense to use a pointer is reference types (slices, maps, and channels).
As mentioned in this thread:
The concept of a reference just means something that serves the purpose of referring you to something. It's not magical.
A pointer is a simple reference that tells you where to look.
A slice tells you where to start looking and how far.
Maps and channels also just tell you where to look, but the data they reference and the operations they support on it are more complex.
The point is that all the actual data is stored indirectly and all you're holding is information on how to access it.
As a result, in many cases you don't need to add another layer of indirection, unless you want a double indirection for some reason.
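As a rough C++ analogue of "where to start looking and how far", here is a hypothetical `IntSlice` holding only a pointer and a length. Go's real slice header also stores a capacity, which this sketch omits. Copying an `IntSlice` copies two small fields, yet writes through any copy reach the same underlying array, which is why adding another pointer on top is rarely useful.

```cpp
#include <cstddef>

// Hypothetical sketch of what a slice-like value holds: where to start
// looking (a pointer) and how far (a length). No capacity field.
struct IntSlice {
    int* data;
    std::size_t len;
    // Indexing goes through the pointer, so writes reach the shared array.
    int& operator[](std::size_t i) const { return data[i]; }
};

// Passing and returning slices by value copies only pointer + length,
// never the elements themselves.
IntSlice tail(IntSlice s) {
    return IntSlice{s.data + 1, s.len - 1};  // assumes s.len >= 1
}
```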
As twotwotwo details in "Pointers vs. values in parameters and return values", strings, interface values, and function values are also implemented with pointers.
As a consequence, you would rarely need to use a pointer to those objects.
To quote the official Go documentation:
...the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver.
It's very hard to give you exact conditions, since there can be different performance goals. As a rule of thumb, by default, all objects larger than 128 bits should be passed by pointer. Possible exceptions to the rule:
you are writing a latency-sensitive server, so you want to minimise garbage collection pressure. To achieve that, your Request struct has an [8]byte field instead of a pointer to a Data struct that holds an [8]byte: one allocation instead of two.
the algorithm you are writing is more readable when you pass the struct by value and make a copy
etc.
I was wondering if it is possible to create a programming language without explicit memory allocation/deallocation (like C, C++ ...) AND without garbage collection (like Java, C#...) by doing a full analysis at the end of each scope?
The obvious problem is that this would take some time at the end of each scope, but I was wondering if it has become feasible with all the processing power and multiple cores in current CPUs. Do such languages exist already?
I was also wondering whether a variant of C++ where smart pointers are the only pointers that can be used would be exactly such a language (or am I missing some problems with that?).
Edit:
Well, after some more research, apparently it's this: http://en.wikipedia.org/wiki/Reference_counting
I was wondering why this isn't more popular. The disadvantages listed there don't seem that serious; in my opinion, the overhead shouldn't be that large. A (non-interpreted, properly written from the ground up) language with C-family syntax and reference counting seems like a good idea to me.
The biggest problem with reference counting is that it is not a complete solution: it is not capable of collecting a cyclic structure. The overhead is also incurred every time you set a reference; for many kinds of problems this adds up quickly and can be worse than just waiting for a GC later. (Modern GC is quite advanced and awesome - don't discount it like that!!!)
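C++'s reference-counted `std::shared_ptr` shows the cycle problem directly. In the sketch below (the `Node` type and both functions are hypothetical), two nodes that hold strong references to each other are never freed, because neither count ever reaches zero; making the back edge a `std::weak_ptr`, the usual fix, lets both be destroyed.

```cpp
#include <memory>

// Hypothetical node type with one strong and one weak link.
struct Node {
    std::shared_ptr<Node> next;  // strong: keeps the target alive
    std::weak_ptr<Node> prev;    // weak: observes without owning
};

// Two nodes point at each other through strong references. When the local
// owners go out of scope each count drops to 1, never to 0, so neither node
// is destroyed: the cycle leaks (intentionally, for the demonstration).
bool strong_cycle_leaks() {
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;  // a -> b -> a
        observer = a;
    }
    return !observer.expired();  // true: 'a' is still alive
}

// With a weak back edge there is no strong cycle, so both nodes are
// destroyed when the last external owner goes away.
bool weak_back_edge_frees() {
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->prev = a;  // back edge does not add to a's count
        observer = a;
    }
    return observer.expired();  // true: both nodes were freed
}
```

The weak-pointer fix works only when the programmer knows in advance which edge closes the cycle, which is why the answer calls reference counting an incomplete solution.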
What you are talking about is nothing special, and it shows up all the time. The C or C++ variant you are looking for is just plain regular C or C++.
For example, write your program normally, but constrain yourself not to use any dynamic memory allocation (no new, delete, malloc, or free, or any of their friends, and make sure your libraries do the same), and then you have that kind of system. You figure out in advance how much memory you need for everything you could do, and declare that memory statically (either function-level static variables or global variables). The compiler takes care of all the accounting the normal way, nothing special happens at the end of each scope, and no extra computation is necessary.
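A minimal C++ sketch of that discipline, assuming a hypothetical sample buffer with an arbitrarily chosen capacity of 64: all storage is reserved statically, no `new`, `delete`, `malloc`, or `free` is ever called, and running out of the fixed budget is reported to the caller instead of triggering an allocation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed budget, decided in advance (64 is an arbitrary choice).
constexpr std::size_t kMaxSamples = 64;

// Static storage duration: reserved before main() runs, never freed,
// and no heap allocation is involved.
static std::array<int, kMaxSamples> g_samples{};
static std::size_t g_count = 0;

// Returns false instead of allocating when the fixed budget is exhausted.
bool push_sample(int value) {
    if (g_count == kMaxSamples) {
        return false;
    }
    g_samples[g_count++] = value;
    return true;
}
```

The worst-case memory use is known at build time, which is the property embedded and real-time systems care about.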
You can even configure your runtime environment to have a statically allocated stack space (this one isn't really under the compiler's control, more the linker and operating system environment). Just figure out how deep your function call chain goes and how much memory it uses (with a profiler or similar tool), and set it in your link options.
Without dynamic memory allocation (and thus no deallocation through either garbage collection or explicit management), you are limited to the memory you declared when you wrote the program. But that's ok, many programs don't need dynamic memory, and are already written that way. The real need for this shows up in embedded and real-time systems when you absolutely, positively need to know exactly how long an operation will take, how much memory (and other resources) it will use, and that the running time and the use of those resources can't ever change.
The great thing about C and C++ is that the language requires so little from the environment, and gives you the tools to do so much, that smart pointers, statically allocated memory, or even some special scheme that you dream up can all be implemented. Requiring their use, and the constraints you put on yourself, just becomes a policy decision. You can enforce that policy with code auditing (use scripts to scan the source or object files, and don't permit linking to the dynamic memory libraries).
Do you know whether there are pointers in Haskell?
If yes: how do you use them? Are there any problems with them? And why aren't they popular?
If no: is there any reason for it?
Yes, there are. Take a look at Foreign.Ptr or Data.IORef.
I suspect this isn't what you are asking about, though. As Haskell is for the most part without state, pointers don't fit into the language design. Having a pointer to memory outside the function would mean that the function is no longer pure, and only allowing pointers to values within the current function would be useless.
Haskell does provide pointers, via the foreign function interface extension. Look at, for example, Foreign.Storable.
Pointers are used for interoperating with C code, not for everyday Haskell programming.
If you're looking for references -- pointers to objects you wish to mutate -- there are STRef and IORef, which serve many of the same uses as pointers. However, you should rarely -- if ever -- need Refs.
If you simply wish to avoid copying large values, as sepp2k supposes, then you need do nothing: in most implementations, all non-trivial values are allocated separately on a heap and refer to one another by machine-level addresses (i.e. pointers). But again, you need do nothing about any of this; it is taken care of for you.
To answer your question about how values are passed, they are passed in whatever way the implementation sees fit: since you can't mutate the values anyway, it doesn't impact the meaning of the code (as long as the strictness is respected); usually this works out to by-need unless you're passing in e.g. Int values that the compiler can see have already been evaluated...
Pass-by-need is like pass-by-reference, except that any given reference could refer either to an actual evaluated value (which cannot be changed), or to a "thunk" for a not-yet-evaluated value. Wikipedia has more.