Does Haskell have pointers? - pointers

Do you know if are there pointers in Haskell?
If yes: how do you use them? Are there any problems with them? And why aren't they popular?
If no: is there any reason for it?

Yes there are. Take a look at Foreign.Ptr or Data.IORef
I suspect this wasn't what you are asking for though. As Haskell is for the most part without state, it means pointers don't fit into the language design. Having a pointer to memory outside the function would mean that a function is no longer pure and only allowing pointers to values within the current function is useless.

Haskell does provide pointers, via the foreign function interface extension. Look at, for example, Foreign.Storable.
Pointers are used for interoperating with C code. Not for every day Haskell programming.
If you're looking for references -- pointers to objects you wish to mutate -- there are STRef and IORef, which serve many of the same uses as pointers. However, you should rarely -- if ever -- need Refs.

If you simply wish to avoid copying large values, as sepp2k supposes, then you need do nothing: in most implementation, all non-trivial values are allocated separately on a heap and refer to one another by machine-level addresses (i.e. pointers). But again, you need do nothing about any of this, it is taken care of for you.
To answer your question about how values are passed, they are passed in whatever way the implementation sees fit: since you can't mutate the values anyway, it doesn't impact the meaning of the code (as long as the strictness is respected); usually this works out to by-need unless you're passing in e.g. Int values that the compiler can see have already been evaluated...
Pass-by-need is like pass-by-reference, except that any given reference could refer either to an actual evaluated value (which cannot be changed), or to a "thunk" for a not-yet-evaluated value. Wikipedia has more.

Related

Julia functions: making mutable types immutable

Coming from Wolfram Mathematica, I like the idea that whenever I pass a variable to a function I am effectively creating a copy of that variable. On the other hand, I am learning that in Julia there are the notions of mutable and immutable types, with the former passed by reference and the latter passed by value. Can somebody explain me the advantage of such a distinction? why arrays are passed by reference? Naively I see this as a bad aspect, since it creates side effects and ruins the possibility to write purely functional code. Where I am wrong in my reasoning? is there a way to make immutable an array, such that when it is passed to a function it is effectively passed by value?
here an example of code
#x is an in INT and so is immutable: it is passed by value
x = 10
function change_value(x)
x = 17
end
change_value(x)
println(x)
#arrays are mutable: they are passed by reference
arr = [1, 2, 3]
function change_array!(A)
A[1] = 20
end
change_array!(arr)
println(arr)
which indeed modifies the array arr
There is a fair bit to respond to here.
First, Julia does not pass-by-reference or pass-by-value. Rather it employs a paradigm known as pass-by-sharing. Quoting the docs:
Function arguments themselves act as new variable bindings (new
locations that can refer to values), but the values they refer to are
identical to the passed values.
Second, you appear to be asking why Julia does not copy arrays when passing them into functions. This is a simple one to answer: Performance. Julia is a performance oriented language. Making a copy every time you pass an array into a function is bad for performance. Every copy operation takes time.
This has some interesting side-effects. For example, you'll notice that a lot of the mature Julia packages (as well as the Base code) consists of many short functions. This code structure is a direct consequence of near-zero overhead to function calls. Languages like Mathematica and MatLab on the other hand tend towards long functions. I have no desire to start a flame war here, so I'll merely state that personally I prefer the Julia style of many short functions.
Third, you are wondering about the potential negative implications of pass-by-sharing. In theory you are correct that this can result in problems when users are unsure whether a function will modify its inputs. There were long discussions about this in the early days of the language, and based on your question, you appear to have worked out that the convention is that functions that modify their arguments have a trailing ! in the function name. Interestingly, this standard is not compulsory so yes, it is in theory possible to end up with a wild-west type scenario where users live in a constant state of uncertainty. In practice this has never been a problem (to my knowledge). The convention of using ! is enforced in Base Julia, and in fact I have never encountered a package that does not adhere to this convention. In summary, yes, it is possible to run into issues when pass-by-sharing, but in practice it has never been a problem, and the performance benefits far outweigh the cost.
Fourth (and finally), you ask whether there is a way to make an array immutable. First things first, I would strongly recommend against hacks to attempt to make native arrays immutable. For example, you could attempt to disable the setindex! function for arrays... but please don't do this. It will break so many things.
As was mentioned in the comments on the question, you could use StaticArrays. However, as Simeon notes in the comments on this answer, there are performance penalties for using static arrays for really big datasets. More than 100 elements and you can run into compilation issues. The main benefit of static arrays really is the optimizations that can be implemented for smaller static arrays.
Another package-based options suggested by phipsgabler in the comments below is FunctionalCollections. This appears to do what you want, although it looks to be only sporadically maintained. Of course, that isn't always a bad thing.
A simpler approach is just to copy arrays in your own code whenever you want to implement pass-by-value. For example:
f!(copy(x))
Just be sure you understand the difference between copy and deepcopy, and when you may need to use the latter. If you're only working with arrays of numbers, you'll never need the latter, and in fact using it will probably drastically slow down your code.
If you wanted to do a bit of work then you could also build your own array type in the spirit of static arrays, but without all the bells and whistles that static arrays entails. For example:
struct MyImmutableArray{T,N}
x::Array{T,N}
end
Base.getindex(y::MyImmutableArray, inds...) = getindex(y.x, inds...)
and similarly you could add any other functions you wanted to this type, while excluding functions like setindex!.

why use pointers

I'm learning C++ on my own. I'm an EE and learned it about 20 years ago, but in the progress of my career I stopped programming and didn't take it up again until recently. I should add that I never took any classes in programming.
I have a theoretical question about pointers. In reading the books about pointers it seems they have an important role in C++. My problem is that I can't see what that is. I see that pointers have a role in arrays, but I can't see their role in anything else.
I can see what they do, but I don't see why use pointers in the situations I see them in. Either references or straight variables would work just as well. I have a feeling the answer lies in the area of memory ( it's optimal use), but I just don't know.
Any answers would be appreciated. Thanks.
Consider the following from cplusplus.com:
"[T]here may be cases where the memory needs of a program can only be
determined during runtime. For example, when the memory needed depends
on user input. On these cases, programs need to dynamically allocate
memory, for which the C++ language integrates the operators new and
delete."
If you could determine all your memory needs prior to run time and did not need to make use of any abstract data type like a linked list, then yes, it would be difficult to see their use. However, what if you want to store values in an array, but you don't yet know how big that array will need to be?
Another value of pointers arises when you consider passing values from function to function. You may find this thread of value regarding the differences between pointers and references in C++ and how/why to use each.
We have been having several pedagogical conversations focused on pointers on the CSEducators.SE site. I'd encourage you to read those as well:
Simple Pointer Examples in C
Lesson Idea: Arrays, Pointers, and Syntactic Sugar
Pointers come from C, which had no concept of reference, and which C++ inherited from.
Everything that can be done with a reference in C++ is done with a pointer in C.
I find this question really great because it is pure.
A programming language is considered "safe" when the programs written in it can only call functions and access data that the program can name.
Now, the concept of pointer was invented to break this sandbox of safety and provide developer with freedom to think and act outside of the box.
Think of pointers as poor man's tool to achieve something not provided by the programming language itself.
It is misleading to think you could achieve higher performance if programmed some algorithm using pointers. Optimization is privilege of the compiler and hardware, not human.

Why should I use a pointer ( performance)?

I'm wondering if there is any perf benchmark on raw objects vs pointers to objects.
I'm aware that it doesn't make sense to use pointers on reference types (e.g. maps) so please don't mention it.
I'm aware that you "must" use pointers if the data needs to be updated so please don't mention it.
Most of the answers/ docs that I've found basically rephrase the guidelines from the official documentation:
... If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver.
My question is simply what means "large" / "big"? Is a pointer on a string overkill ? what about a struct with two strings, what about a struct 3 string fields??
I think we deal with this use case quite often so it's a fair question to ask. Some advise to don't mind the performance issue but maybe some people want to use the right notation whenever they have to chance even if the performance gain is not signifiant. After all a pointer is not that expensive (i.e. one additional keystroke).
An example where it doesn't make sense to use a pointer is for reference types (slices, maps, and channels)
As mentioned in this thread:
The concept of a reference just means something that serves the purpose of referring you to something. It's not magical.
A pointer is a simple reference that tells you where to look.
A slice tells you where to start looking and how far.
Maps and channels also just tell you where to look, but the data they reference and the operations they support on it are more complex.
The point is that all the actually data is stored indirectly and all you're holding is information on how to access it.
As a result, in many cases you don't need to add another layer of indirection, unless you want a double indirection for some reason.
As twotwotwo details in "Pointers vs. values in parameters and return values", strings, interface values, and function values are also implemented with pointers.
As a consequence, you would rarely need a to use a pointer on those objects.
To quote the official golang documentation
...the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver.
It's very hard to give you exact conditions since there can be different performance goals. As a rule of thumb, by default, all objects larger than 128 bits should be passed by pointer. Possible exceptions of the rule:
you are writing latency sensitive server, so you want to minimise garbage collection pressure. In order to achieve that your Request struct has byte[8] field instead of pointer to Data struct which holds byte[8]. One allocation instead of two.
algorithm you are writing is more readable when you pass the struct and make a copy
etc.

time.Time: pointer or value

The Go docs say (emphasis added):
Programs using times should typically store and pass them as values, not pointers. That is, time variables and struct fields should be of type time.Time, not *time.Time. A Time value can be used by multiple goroutines simultaneously.
Is the last sentence (about using a Time value in multiple goroutines simultaneously) the only reason that they should "typically" be stored and passed as a value, rather than a pointer? Is this common to other structs as well? I tried looking for any logic that specifically enables this in the time.Time declaration and methods, but didn't notice anything special there.
Update: I often have to serve JSON representations of my structs, and I'd rather omit empty/uninitialized times. The json:",omitempty" tag doesn't work with time.Time values, which appears to be the expected behavior, but the best workaround seems to be to use a pointer, which goes against the advice in the docs quoted above.
It's common for many kind of simple values.
In Go, when some value isn't bigger than one or two words, it's common to simply use it as a value instead of using a pointer. Simply because there's no reason to use a pointer if the object is small and you don't pass it to be changed.
You might have to unlearn the practice of languages where everything structured couldn't be handled as values. It's probably natural for you to use integers or floating point numbers as values, not pointers. Why not do the same for times ?
Regarding your precise problem with JSON and assuming you don't want to write a specific Marshaller just for this, there's no problem in using a *time.Time. In fact this issue was already mentioned in the golang-nuts list.

Do people use the Hungarian Naming Conventions in the real world? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
Is it worth learning the convention or is it a bane to readability and maintainability?
Considering that most people that use Hungarian Notation is following the misunderstood version of it, I'd say it's pretty pointless.
If you want to use the original definition of it, it might make more sense, but other than that it is mostly syntactic sugar.
If you read the Wikipedia article on the subject, you'll find two conflicting notations, Systems Hungarian Notation and Apps Hungarian Notation.
The original, good, definition is the Apps Hungarian Notation, but most people use the Systems Hungarian Notation.
As an example of the two, consider prefixing variables with l for length, a for area and v for volume.
With such notation, the following expression makes sense:
int vBox = aBottom * lVerticalSide;
but this doesn't:
int aBottom = lSide1;
If you're mixing the prefixes, they're to be considered part of the equation, and volume = area * length is fine for a box, but copying a length value into an area variable should raise some red flags.
Unfortunately, the other notation is less useful, where people prefix the variable names with the type of the value, like this:
int iLength;
int iVolume;
int iArea;
some people use n for number, or i for integer, f for float, s for string etc.
The original prefix was meant to be used to spot problems in equations, but has somehow devolved into making the code slightly easier to read since you don't have to go look for the variable declaration. With todays smart editors where you can simply hover over any variable to find the full type, and not just an abbreviation for it, this type of hungarian notation has lost a lot of its meaning.
But, you should make up your own mind. All I can say is that I don't use either.
Edit Just to add a short notice, while I don't use Hungarian Notation, I do use a prefix, and it's the underscore. I prefix all private fields of classes with a _ and otherwise spell their names as I would a property, titlecase with the first letter uppercase.
The Hungarian Naming Convention can be useful when used correctly, unfortunately it tends to be misused more often than not.
Read Joel Spolsky's article Making Wrong Code Look Wrong for appropriate perspective and justification.
Essentially, type based Hungarian notation, where variables are prefixed with information about their type (e.g. whether an object is a string, a handle, an int, etc.) is mostly useless and generally just adds overhead with very little benefit. This, sadly, is the Hungarian notation most people are familiar with. However, the intent of Hungarian notation as envisioned is to add information on the "kind" of data the variable contains. This allows you to partition kinds of data from other kinds of data which shouldn't be allowed to be mixed together except, possibly, through some conversion process. For example, pixel based coordinates vs. coordinates in other units, or unsafe user input versus data from safe sources, etc.
Look at it this way, if you find yourself spelunking through code to find out information on a variable then you probably need to adjust your naming scheme to contain that information, this is the essence of the Hungarian convention.
Note that an alternative to Hungarian notation is to use more classes to show the intent of variable usage rather than relying on primitive types everywhere. For example, instead of having variable prefixes for unsafe user input, you can have simple string wrapper class for unsafe user input, and a separate wrapper class for safe data. This has the advantage, in strongly typed languages, of having partitioning enforced by the compiler (even in less strongly typed languages you can usually add your own tripwire code) but adds a not insignificant amount of overhead.
I still use Hungarian Notation when it comes to UI elements, where several UI elements are related to a particular object/value, e.g.,
lblFirstName for the label object, txtFirstName for the text box. I definitely can't name them both "FirstName" even if that is the concern/responsibility of both objects.
How do others approach naming UI elements?
I think hungarian notation is an interesting footnote along the 'path' to more readable code, and if done properly, is preferable to not-doing it.
In saying that though, I'd rather do away with it, and instead of this:
int vBox = aBottom * lVerticalSide;
write this:
int boxVolume = bottomArea * verticalHeight;
It's 2008. We don't have 80 character fixed width screens anymore!
Also, if you're writing variable names which are much longer than that you should be looking at refactoring into objects or functions anyway.
It is pointless (and distracting) but is in relatively heavy use at my company, at least for types like ints, strings, booleans, and doubles.
Things like sValue, iCount, dAmount or fAmount, and bFlag are everywhere.
Once upon a time there was a good reason for this convention. Now, it is a cancer.
Sorry to follow up with a question, but does prefixing interfaces with "I" qualify as hungarian notation? If that is the case, then yes, a lot of people are using it in the real world. If not, ignore this.
I see Hungarian Notation as a way to circumvent the capacity of our short term memories. According to psychologists, we can store approximately 7 plus-or-minus 2 chunks of information. The extra information added by including a prefix helps us by providing more details about the meaning of an identifier even with no other context. In other words, we can guess what a variable is for without seeing how it is used or declared. This can be avoided by applying oo techniques such as encapsulation and the single responsibility principle.
I'm unaware of whether or not this has been studied empirically. I would hypothesize that the amount of effort increases dramatically when we try to understand classes with more than nine instance variables or methods with more than 9 local variables.
When I see Hungarian discussion, I'm glad to see people thinking hard about how to make their code clearer, and how to mistakes more visible. That's exactly what we should all be doing!
But don't forget that you have some powerful tools at your disposal besides naming.
Extract Method If your methods are getting so long that your variable declarations have scrolled off the top of the screen, consider making your methods smaller. (If you have too many methods, consider a new class.)
Strong typing If you find that you are taking zip codes stored in an integer variable and assigning them to a shoe size integer variable, consider making a class for zip codes and a class for shoe size. Then your bug will be caught at compile time, instead of requiring careful inspection by a human. When I do this, I usually find a bunch of zip code- and shoe size-specific logic that I've peppered around my code, which I can then move in to my new classes. Suddenly all my code gets clearer, simpler, and protected from certain classes of bugs. Wow.
To sum up: yes, think hard about how you use names in code to express your ideas clearly, but also look to the other powerful OO tools you can call on.
Isn't scope more important than type these days, e.g.
l for local
a for argument
m for member
g for global
etc
With modern techniques of refactoring old code, search and replace of a symbol because you changed its type is tedious, the compiler will catch type changes, but often will not catch incorrect use of scope, sensible naming conventions help here.
I don't use a very strict sense of hungarian notation, but I do find myself using it sparing for some common custom objects to help identify them, and also I tend to prefix gui control objects with the type of control that they are. For example, labelFirstName, textFirstName, and buttonSubmit.
I use Hungarian Naming for UI elements like buttons, textboxes and lables. The main benefit is grouping in the Visual Studio Intellisense Popup. If I want to access my lables, I simply start typing lbl.... and Visual Studio will suggest all my lables, nicley grouped together.
However, after doing more and more Silverlight and WPF stuff, leveraging data binding, I don't even name all my controls anymore, since I don't have to reference them from code-behind (since there really isn't any codebehind anymore ;)
What's wrong is mixing standards.
What's right is making sure that everyone does the same thing.
int Box = iBottom * nVerticleSide
The original prefix was meant to be
used to spot problems in equations,
but has somehow devolved into making
the code slightly easier to read since
you don't have to go look for the
variable declaration. With todays
smart editors where you can simply
hover over any variable to find the
full type, and not just an
abbreviation for it, this type of
hungarian notation has lost a lot of
its meaning.
I'm breaking the habit a little bit but prefixing with the type can be useful in JavaScript that doesn't have strong variable typing.
When using a dynamically typed language, I occasionally use Apps Hungarian. For statically typed languages I don't. See my explanation in the other thread.
Hungarian notation is pointless in type-safe languages. e.g. A common prefix you will see in old Microsoft code is "lpsz" which means "long pointer to a zero-terminated string". Since the early 1700's we haven't used segmented architectures where short and long pointers exist, the normal string representation in C++ is always zero-terminated, and the compiler is type-safe so won't let us apply non-string operations to the string. Therefore none of this information is of any real use to a programmer - it's just more typing.
However, I use a similar idea: prefixes that clarify the usage of a variable.
The main ones are:
m = member
c = const
s = static
v = volatile
p = pointer (and pp=pointer to pointer, etc)
i = index or iterator
These can be combined, so a static member variable which is a pointer would be "mspName".
Where are these useful?
Where the usage is important, it is a good idea to constantly remind the programmer that a variable is (e.g.) a volatile or a pointer
Pointer dereferencing used to do my head in until I used the p prefix. Now it's really easy to know when you have an object (Orange) a pointer to an object (pOrange) or a pointer to a pointer to an object (ppOrange). To dereference an object, just put an asterisk in front of it for each p in its name. Case solved, no more deref bugs!
In constructors I usually find that a parameter name is identical to a member variable's name (e.g. size). I prefer to use "mSize = size;" than "size = theSize" or "this.size = size". It is also much safer: I don't accidentally use "size = 1" (setting the parameter) when I meant to say "mSize = 1" (setting the member)
In loops, my iterator variables are all meaningful names. Most programmers use "i" or "index" and then have to make up new meaningless names ("j", "index2") when they want an inner loop. I use a meaningful name with an i prefix (iHospital, iWard, iPatient) so I always know what an iterator is iterating.
In loops, you can mix several related variables by using the same base name with different prefixes: Orange orange = pOrange[iOrange]; This also means you don't make array indexing errors (pApple[i] looks ok, but write it as pApple[iOrange] and the error is immediately obvious).
Many programmers will use my system without knowing it: by add a lengthy suffix like "Index" or "Ptr" - there isn't any good reason to use a longer form than a single character IMHO, so I use "i" and "p". Less typing, more consistent, easier to read.
This is a simple system which adds meaningful and useful information to code, and eliminates the possibility of many simple but common programming mistakes.
I've been working for IBM for the past 6 months and I haven't seen it anywhere (thank god because I hate it.) I see either camelCase or c_style.
thisMethodIsPrettyCool()
this_method_is_pretty_cool()
It depends on your language and environment. As a rule I wouldn't use it, unless the development environment you're in makes it hard to find the type of the variable.
There's also two different types of Hungarian notation. See Joel's article. I can't find it (his names don't exactly make them easy to find), anyone have a link to the one I mean?
Edit: Wedge has the article I mean in his post.
Original form (The Right Hungarian Notation :) ) where prefix means type (i.e. length, quantity) of value stored by variable is OK, but not necessary in all type of applications.
The popular form (The Wrong Hungarian Notation) where prefix means type (String, int) is useless in most of modern programming languages.
Especially with meaningless names like strA. I can't understand we people use meaningless names with long prefixes which gives nothing.
I use type based (Systems HN) for components (eg editFirstName, lblStatus etc) as it makes autocomplete work better.
I sometimes use App HN for variables where the type infomation is isufficient. Ie fpX indicates a fixed pointed variable (int type, but can't be mixed and matched with an int), rawInput for user strings that haven't been validated etc
Being a PHP programmer where it's very loosely typed, I don't make a point to use it. However I will occasionally identify something as an array or as an object depending on the size of the system and the scope of the variable.

Resources