I'm wondering if there is any perf benchmark on raw objects vs pointers to objects.
I'm aware that it doesn't make sense to use pointers on reference types (e.g. maps) so please don't mention it.
I'm aware that you "must" use pointers if the data needs to be updated so please don't mention it.
Most of the answers/docs that I've found basically rephrase the guidelines from the official documentation:
... If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver.
My question is simply: what does "large" / "big" mean? Is a pointer to a string overkill? What about a struct with two strings, or a struct with three string fields?
I think we deal with this use case quite often, so it's a fair question to ask. Some advise not to worry about the performance issue, but maybe some people want to use the right notation whenever they have the chance, even if the performance gain is not significant. After all, a pointer is not that expensive to write (i.e. one additional keystroke).
An example where it doesn't make sense to use a pointer is for reference types (slices, maps, and channels)
As mentioned in this thread:
The concept of a reference just means something that serves the purpose of referring you to something. It's not magical.
A pointer is a simple reference that tells you where to look.
A slice tells you where to start looking and how far.
Maps and channels also just tell you where to look, but the data they reference and the operations they support on it are more complex.
The point is that all the actual data is stored indirectly and all you're holding is information on how to access it.
As a result, in many cases you don't need to add another layer of indirection, unless you want a double indirection for some reason.
As twotwotwo details in "Pointers vs. values in parameters and return values", strings, interface values, and function values are also implemented with pointers.
As a consequence, you would rarely need to use a pointer to those objects.
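To make the point about indirection concrete, here is a minimal sketch (the function names are made up for illustration). It shows a map and a slice passed by value while the callee's writes still reach the caller's data, and also the one case where a slice passed by value does not propagate a change: growing it with append.

```go
package main

import "fmt"

// addEntry receives the map value itself, yet the write is visible to the
// caller because that value refers to the same underlying hash table.
func addEntry(m map[string]int) {
	m["added"] = 1
}

// setFirst receives a copy of the slice header (pointer, length, capacity);
// writing through it changes the caller's backing array.
func setFirst(s []int) {
	s[0] = 42
}

// appendOne shows the limit of this: the header updated by append is local
// to the function, so the caller's length does not change.
func appendOne(s []int) {
	s = append(s, 99)
}

func main() {
	m := map[string]int{}
	addEntry(m)

	s := []int{1, 2, 3}
	setFirst(s)
	appendOne(s)

	fmt.Println(len(m), s[0], len(s)) // prints "1 42 3"
}
```

If the callee has to grow the caller's slice, returning the new slice is usually simpler than passing a pointer to it.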
To quote the official golang documentation
...the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver.
It's very hard to give you exact conditions since there can be different performance goals. As a rule of thumb, by default, all objects larger than 128 bits should be passed by pointer. Possible exceptions of the rule:
You are writing a latency-sensitive server, so you want to minimise garbage collection pressure. To achieve that, your Request struct has a [8]byte field instead of a pointer to a Data struct which holds the [8]byte. One allocation instead of two.
The algorithm you are writing is more readable when you pass the struct by value and work on a copy.
etc.
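To put numbers on the cases from the question, one option is simply to ask the compiler how large each value is. The sketch below uses hypothetical TwoStrings and ThreeStrings structs and compares them with the size of a pointer. It relies on the usual implementation detail that a string value is a two-word header (data pointer plus length), which holds for the standard gc compiler.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical structs matching the cases asked about in the question.
type TwoStrings struct{ A, B string }
type ThreeStrings struct{ A, B, C string }

// sizes returns the number of bytes copied when each kind of value is passed
// by value, plus the size of a pointer for comparison.
func sizes() (str, two, three, ptr uintptr) {
	var s string
	return unsafe.Sizeof(s),
		unsafe.Sizeof(TwoStrings{}),
		unsafe.Sizeof(ThreeStrings{}),
		unsafe.Sizeof(uintptr(0))
}

func main() {
	str, two, three, ptr := sizes()
	fmt.Println("string:      ", str, "bytes")
	fmt.Println("TwoStrings:  ", two, "bytes")
	fmt.Println("ThreeStrings:", three, "bytes")
	fmt.Println("pointer:     ", ptr, "bytes")
}
```

On a typical 64-bit platform a string comes out at 16 bytes, i.e. exactly the 128-bit threshold from the rule of thumb above, and each extra string field adds another 16. The length of the text does not matter, because only the header is copied. Whether copying 32 or 48 bytes is measurably slower than copying a pointer in a particular program is something only a benchmark of that program can tell.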
There are lots of real-world reasons you'd want to do this. Ours is because we have a list of variable length data structures, and we want to be able to change the size of one of the elements without recopying them all.
Here are a few things I've tried:
1. Just have a lot of kernel arguments. Sure, sounds hacky, but works for small N. This is actually what we've been doing.
2. Do 1) with some sort of macro loop which extends the kernel args to the max size (which I think is device dependent). I don't really want to do this... it sounds bad.
3. Create some sort of list of structs which contain pointers, and fill it before your kernel invocation. I tried this, and I think it violates the spec. According to what I've seen on the NVIDIA forums, preserving the address of a device pointer beyond one kernel invocation is illegal. If anyone can point to where in the spec it says this, I'd love to know, because I can't find it. However, this definitely breaks on ATI hardware, as it moves the objects around.
4. Give up, store the variable sized objects in a big array, and write a clever algorithm to use empty space so the whole array must be reflowed less often. This will work, but is an inelegant, complicated design. Also, it requires lots of scary pointer arithmetic...
Does anyone else have other ideas? What about experiences trying to do this; is there a least hacky way? Why?
To 3:
OpenCL 1.1 spec page 193 says "Arguments to kernel functions in a program cannot be declared as a pointer to a pointer(s)."
A struct containing a pointer to a pointer (a pointer to a buffer object) might not violate a strict reading of this sentence, but it goes against its spirit: no pointers to buffer objects may be passed as arguments from host code to a kernel, even if they're hidden inside a user-defined struct.
I'd opt for option 5: Do not use variable size data structures. If you have any way of making them constant size by all means do it. It will make your life a whole lot easier. To be precise there is no 'variable size structure'. Every struct definition produces constant sized structs, so if the size has changed then the struct itself has changed and therefore requires another mem object. Every pointer passed to kernel function must have a single type.
In addition to sharpneli's answer (option 5):
If the objects have similar size you could use unions on the biggest possible object size. But make sure you use explicit alignment. Pass a second buffer identifying the union used in each object in your variable-sized-objects-in-static-size-union buffer.
I resorted to this when using OpenCL library code that only allowed one variable array of arbitrary type. I simply used cl_float2 to pass two floats. Since the cl_floatN types are implemented as unions, what works for the built-in types will work for you as well.
Do you know if there are pointers in Haskell?
If yes: how do you use them? Are there any problems with them? And why aren't they popular?
If no: is there any reason for it?
Yes there are. Take a look at Foreign.Ptr or Data.IORef
I suspect this wasn't what you are asking for though. As Haskell is for the most part without state, it means pointers don't fit into the language design. Having a pointer to memory outside the function would mean that a function is no longer pure and only allowing pointers to values within the current function is useless.
Haskell does provide pointers, via the foreign function interface extension. Look at, for example, Foreign.Storable.
Pointers are used for interoperating with C code. Not for every day Haskell programming.
If you're looking for references -- pointers to objects you wish to mutate -- there are STRef and IORef, which serve many of the same uses as pointers. However, you should rarely -- if ever -- need Refs.
If you simply wish to avoid copying large values, as sepp2k supposes, then you need do nothing: in most implementations, all non-trivial values are allocated separately on a heap and refer to one another by machine-level addresses (i.e. pointers). But again, you need do nothing about any of this; it is taken care of for you.
To answer your question about how values are passed, they are passed in whatever way the implementation sees fit: since you can't mutate the values anyway, it doesn't impact the meaning of the code (as long as the strictness is respected); usually this works out to by-need unless you're passing in e.g. Int values that the compiler can see have already been evaluated...
Pass-by-need is like pass-by-reference, except that any given reference could refer either to an actual evaluated value (which cannot be changed), or to a "thunk" for a not-yet-evaluated value. Wikipedia has more.
When I refactor my functions to meet new needs, I stumble from time to time over the crucial question:
Shall I add another variable with a default value? Or shall I use only one array, to which I can add an additional variable without breaking the API?
Unless you need to support a flexible number of variables, I think it's best to explicitly identify each parameter. In most cases you can add an overloaded method that has a different signature to support the extra parameter while still supporting the original method signature. If you use an array for passing variables it just makes it too confusing for users of your API. Obviously there are some inputs that lend themselves to an array (a list of points in a polygon, a list of account IDs you wish to perform an action on, etc.) but if it's not a variable that you would reasonably expect to be an array or list, you should pass it into the method as a separate parameter.
Just like many questions in programming, the right answer is "it depends".
To take Javascript/jQuery as an example, one good rule of thumb is whether the parameter will be required each time the function is called or whether it is optional. For example, the main jQuery function itself requires an expression to determine what element(s) the operation will affect:
jQuery(expression)
It makes no sense to try to pass this parameter as part of an array as it will be required every time this function is called.
On the other hand, many jQuery plugins require several miscellaneous parameters that may be optional. By convention, these are passed as parameters via an 'options' array. As you said, this provides a nice interface as new parameters can be added without affecting the existing API. This makes the API clean as well since the user can ignore those options that are not applicable.
In general, when several parameters are involved, passing them as an array is a nice convention, as many of them are likely to be optional. This would have helped clean up many Win32 APIs, although it is more difficult to deal with arrays in C/C++ than in JavaScript.
It depends on the programming language used.
If you have a run-of-the-mill OO language, you should use an object that you can easily extend, if you are really concerned about API consistency.
If that doesn't matter that much, there is the option of changing the method signature and overloading the method with more / different parameters.
If your language doesn't support either and you want the API to be binary stable, use an array.
There are several considerations that must be made.
Where is the function used? - Only in code you created? One place or hundreds of places? The amount of work that will need to be done to maintain existing code is important. Remember to include the amount of time it will take to communicate to other programmers that may currently be using your function.
How critical is the new parameter? - Do you want to require it to be used? If it has a default value, will that default value break existing use of the function in any subtle ways?
Ease of comprehension - How many parameters are already passed into the function? The larger the number, the more confusing and error prone it will be. Code Complete recommends that you restrict the number of parameters to 7 or less. If you need more than that, you should try to abstract some or all of the related parameters into one object.
Other special considerations - Do you want to optimize your efforts for any special conditions such as code speed or size? Are there any special considerations that must be taken into account for your execution environment? Keep in mind your goals for the project and make sure you aren't working against them with whatever design choice you make.
In his book Code Complete, Steve McConnell decrees that a function should never have more than 7 arguments, and rarely even that many. He presents compelling arguments - that I can't cite from memory, alas.
Clean Code, more recently, advocates even fewer arguments.
So unless the number of things to pass is really small, they should be passed in an enveloping structure. If they're homogeneous, an array. If not, then a reasonably lightweight object should be built for the purpose.
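As an illustration of the "lightweight object" approach, here is a minimal sketch in Go. ConnectOptions, Connect and the default values are all hypothetical names chosen for the example. The required argument stays positional, everything optional moves into a struct, and callers name the fields they set, so adding a field later does not break or silently reinterpret existing calls.

```go
package main

import (
	"fmt"
	"time"
)

// ConnectOptions is a hypothetical parameter object. New optional fields can
// be added later without touching existing callers, because an omitted field
// simply takes its zero value.
type ConnectOptions struct {
	Timeout time.Duration // zero means "use the default"
	Retries int           // zero means "use the default"
	Verbose bool
}

// withDefaults fills in defaults for fields the caller left unset.
func (o ConnectOptions) withDefaults() ConnectOptions {
	if o.Timeout == 0 {
		o.Timeout = 5 * time.Second
	}
	if o.Retries == 0 {
		o.Retries = 3
	}
	return o
}

// Connect takes the one required argument positionally and everything
// optional in the options struct. Here it only formats a description.
func Connect(host string, opts ConnectOptions) string {
	opts = opts.withDefaults()
	return fmt.Sprintf("%s timeout=%s retries=%d verbose=%t",
		host, opts.Timeout, opts.Retries, opts.Verbose)
}

func main() {
	fmt.Println(Connect("example.org", ConnectOptions{}))
	fmt.Println(Connect("example.org", ConnectOptions{Retries: 1, Verbose: true}))
}
```

One limitation of using the zero value as "unset" is that a caller cannot explicitly ask for zero retries; if that distinction matters, a pointer field or a separate flag is needed.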
You should do neither. Just add the parameter and change all callers to supply the proper default value. The reason is that parameters with default values can only be at the end, so you will not be able to add any more required parameters anywhere in the parameter list without risking misinterpretation.
These are the critical steps to disaster:
1. add one or two parameters with defaults
2. some callers will supply it, and some will rely on defaults.
[half a year passed]
3. add a required parameter (before them)
4. change all callers to supply the required parameter
5. get a phone call, or other event which will make you forget to change one of the instances from step 2
6. now your program compiles perfectly, but is invalid.
Unfortunately, in function call semantics we usually don't have a chance to say, by name, which value goes where.
An array is also not a proper solution. An array should be used as a collection of similar objects upon which a uniform activity is performed. As they say here, if it's worth refactoring, it's worth refactoring now.
Is it worth learning the convention or is it a bane to readability and maintainability?
Considering that most people who use Hungarian Notation are following the misunderstood version of it, I'd say it's pretty pointless.
If you want to use the original definition of it, it might make more sense, but other than that it is mostly syntactic sugar.
If you read the Wikipedia article on the subject, you'll find two conflicting notations, Systems Hungarian Notation and Apps Hungarian Notation.
The original, good, definition is the Apps Hungarian Notation, but most people use the Systems Hungarian Notation.
As an example of the two, consider prefixing variables with l for length, a for area and v for volume.
With such notation, the following expression makes sense:
int vBox = aBottom * lVerticalSide;
but this doesn't:
int aBottom = lSide1;
If you're mixing the prefixes, they're to be considered part of the equation, and volume = area * length is fine for a box, but copying a length value into an area variable should raise some red flags.
Unfortunately, the other notation is less useful, where people prefix the variable names with the type of the value, like this:
int iLength;
int iVolume;
int iArea;
Some people use n for number, or i for integer, f for float, s for string, etc.
The original prefix was meant to be used to spot problems in equations, but has somehow devolved into making the code slightly easier to read since you don't have to go look for the variable declaration. With today's smart editors, where you can simply hover over any variable to find the full type, and not just an abbreviation for it, this type of Hungarian notation has lost a lot of its meaning.
But, you should make up your own mind. All I can say is that I don't use either.
Edit: Just to add a short note, while I don't use Hungarian Notation, I do use a prefix, and it's the underscore. I prefix all private fields of classes with a _ and otherwise spell their names as I would a property, title case with the first letter uppercase.
The Hungarian Naming Convention can be useful when used correctly; unfortunately, it tends to be misused more often than not.
Read Joel Spolsky's article Making Wrong Code Look Wrong for appropriate perspective and justification.
Essentially, type based Hungarian notation, where variables are prefixed with information about their type (e.g. whether an object is a string, a handle, an int, etc.) is mostly useless and generally just adds overhead with very little benefit. This, sadly, is the Hungarian notation most people are familiar with. However, the intent of Hungarian notation as envisioned is to add information on the "kind" of data the variable contains. This allows you to partition kinds of data from other kinds of data which shouldn't be allowed to be mixed together except, possibly, through some conversion process. For example, pixel based coordinates vs. coordinates in other units, or unsafe user input versus data from safe sources, etc.
Look at it this way, if you find yourself spelunking through code to find out information on a variable then you probably need to adjust your naming scheme to contain that information, this is the essence of the Hungarian convention.
Note that an alternative to Hungarian notation is to use more classes to show the intent of variable usage rather than relying on primitive types everywhere. For example, instead of having variable prefixes for unsafe user input, you can have simple string wrapper class for unsafe user input, and a separate wrapper class for safe data. This has the advantage, in strongly typed languages, of having partitioning enforced by the compiler (even in less strongly typed languages you can usually add your own tripwire code) but adds a not insignificant amount of overhead.
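The wrapper-type alternative can be sketched in a few lines of Go. UnsafeString, SafeString, Escape and Render are hypothetical names for this example, and HTML escaping via the standard html package stands in for whatever "making it safe" means in a given application.

```go
package main

import (
	"fmt"
	"html"
)

// UnsafeString holds raw user input; SafeString holds text that has been
// escaped for HTML output. Both names are hypothetical.
type UnsafeString string
type SafeString string

// Escape is the only intended way to get from one kind to the other.
func Escape(u UnsafeString) SafeString {
	return SafeString(html.EscapeString(string(u)))
}

// Render accepts only SafeString, so passing an UnsafeString variable is a
// compile error rather than something a reviewer has to spot from a prefix.
func Render(s SafeString) string {
	return "<p>" + string(s) + "</p>"
}

func main() {
	input := UnsafeString("<script>alert(1)</script>")
	// Render(input) // does not compile: UnsafeString is not SafeString
	fmt.Println(Render(Escape(input)))
	// prints "<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>"
}
```

The partitioning is not airtight: an explicit conversion such as SafeString(input) still compiles, and so does passing an untyped string literal directly to Render. What the types do is make such shortcuts visible in the code, which is the same goal the naming convention pursues.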
I still use Hungarian Notation when it comes to UI elements, where several UI elements are related to a particular object/value, e.g.,
lblFirstName for the label object, txtFirstName for the text box. I definitely can't name them both "FirstName" even if that is the concern/responsibility of both objects.
How do others approach naming UI elements?
I think Hungarian notation is an interesting footnote along the 'path' to more readable code and, if done properly, is preferable to not doing it.
In saying that though, I'd rather do away with it, and instead of this:
int vBox = aBottom * lVerticalSide;
write this:
int boxVolume = bottomArea * verticalHeight;
It's 2008. We don't have 80 character fixed width screens anymore!
Also, if you're writing variable names which are much longer than that you should be looking at refactoring into objects or functions anyway.
It is pointless (and distracting) but is in relatively heavy use at my company, at least for types like ints, strings, booleans, and doubles.
Things like sValue, iCount, dAmount or fAmount, and bFlag are everywhere.
Once upon a time there was a good reason for this convention. Now, it is a cancer.
Sorry to follow up with a question, but does prefixing interfaces with "I" qualify as Hungarian notation? If that is the case, then yes, a lot of people are using it in the real world. If not, ignore this.
I see Hungarian Notation as a way to circumvent the capacity of our short term memories. According to psychologists, we can store approximately 7 plus-or-minus 2 chunks of information. The extra information added by including a prefix helps us by providing more details about the meaning of an identifier even with no other context. In other words, we can guess what a variable is for without seeing how it is used or declared. This can be avoided by applying oo techniques such as encapsulation and the single responsibility principle.
I'm unaware of whether or not this has been studied empirically. I would hypothesize that the amount of effort increases dramatically when we try to understand classes with more than nine instance variables or methods with more than 9 local variables.
When I see Hungarian discussion, I'm glad to see people thinking hard about how to make their code clearer, and how to make mistakes more visible. That's exactly what we should all be doing!
But don't forget that you have some powerful tools at your disposal besides naming.
Extract Method: If your methods are getting so long that your variable declarations have scrolled off the top of the screen, consider making your methods smaller. (If you have too many methods, consider a new class.)
Strong typing: If you find that you are taking zip codes stored in an integer variable and assigning them to a shoe size integer variable, consider making a class for zip codes and a class for shoe size. Then your bug will be caught at compile time, instead of requiring careful inspection by a human. When I do this, I usually find a bunch of zip code- and shoe size-specific logic that I've peppered around my code, which I can then move into my new classes. Suddenly all my code gets clearer, simpler, and protected from certain classes of bugs. Wow.
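Here is a minimal sketch of the zip code and shoe size idea, written in Go with named types instead of classes. The type names, the five-digit range check and the zero-padded formatting are illustrative assumptions, not real postal rules.

```go
package main

import "fmt"

// ZipCode and ShoeSize are both integers underneath, but as distinct named
// types they cannot be assigned to each other without an explicit conversion.
type ZipCode int
type ShoeSize int

// Valid is an example of zip-code-specific logic that can live on the type
// instead of being scattered around the code. The range is an assumption
// made for illustration.
func (z ZipCode) Valid() bool {
	return z >= 0 && z <= 99999
}

// String pads to five digits so that leading zeros survive printing.
func (z ZipCode) String() string {
	return fmt.Sprintf("%05d", int(z))
}

func main() {
	zip := ZipCode(2134)
	var size ShoeSize = 9
	// size = zip // does not compile: ZipCode is not ShoeSize
	fmt.Println(zip, zip.Valid(), size) // prints "02134 true 9"
}
```

Once the type exists, the formatting and validation rules have an obvious home, which is the "logic I can move into my new classes" effect described above.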
To sum up: yes, think hard about how you use names in code to express your ideas clearly, but also look to the other powerful OO tools you can call on.
Isn't scope more important than type these days, e.g.
l for local
a for argument
m for member
g for global
etc
With modern refactoring techniques, searching for and replacing a symbol because you changed its type is tedious. The compiler will catch type changes, but often will not catch incorrect use of scope; sensible naming conventions help here.
I don't use a very strict sense of Hungarian notation, but I do find myself using it sparingly for some common custom objects to help identify them, and I also tend to prefix GUI control objects with the type of control that they are. For example, labelFirstName, textFirstName, and buttonSubmit.
I use Hungarian Naming for UI elements like buttons, textboxes and labels. The main benefit is grouping in the Visual Studio Intellisense popup. If I want to access my labels, I simply start typing lbl... and Visual Studio will suggest all my labels, nicely grouped together.
However, after doing more and more Silverlight and WPF stuff, leveraging data binding, I don't even name all my controls anymore, since I don't have to reference them from code-behind (since there really isn't any codebehind anymore ;)
What's wrong is mixing standards.
What's right is making sure that everyone does the same thing.
int Box = iBottom * nVerticleSide
"The original prefix was meant to be used to spot problems in equations, but has somehow devolved into making the code slightly easier to read since you don't have to go look for the variable declaration. With today's smart editors, where you can simply hover over any variable to find the full type, and not just an abbreviation for it, this type of Hungarian notation has lost a lot of its meaning."
I'm breaking the habit a little bit, but prefixing with the type can be useful in JavaScript, which doesn't have static variable typing.
When using a dynamically typed language, I occasionally use Apps Hungarian. For statically typed languages I don't. See my explanation in the other thread.
Hungarian notation is pointless in type-safe languages. e.g. A common prefix you will see in old Microsoft code is "lpsz" which means "long pointer to a zero-terminated string". Since the early 1700's we haven't used segmented architectures where short and long pointers exist, the normal string representation in C++ is always zero-terminated, and the compiler is type-safe so won't let us apply non-string operations to the string. Therefore none of this information is of any real use to a programmer - it's just more typing.
However, I use a similar idea: prefixes that clarify the usage of a variable.
The main ones are:
m = member
c = const
s = static
v = volatile
p = pointer (and pp=pointer to pointer, etc)
i = index or iterator
These can be combined, so a static member variable which is a pointer would be "mspName".
Where are these useful?
Where the usage is important, it is a good idea to constantly remind the programmer that a variable is (e.g.) volatile or a pointer.
Pointer dereferencing used to do my head in until I used the p prefix. Now it's really easy to know when you have an object (Orange), a pointer to an object (pOrange) or a pointer to a pointer to an object (ppOrange). To get back to the object, just put an asterisk in front of the name for each p in it. Case solved, no more deref bugs!
In constructors I usually find that a parameter name is identical to a member variable's name (e.g. size). I prefer "mSize = size;" to "size = theSize" or "this.size = size". It is also much safer: I don't accidentally use "size = 1" (setting the parameter) when I meant to say "mSize = 1" (setting the member).
In loops, my iterator variables are all meaningful names. Most programmers use "i" or "index" and then have to make up new meaningless names ("j", "index2") when they want an inner loop. I use a meaningful name with an i prefix (iHospital, iWard, iPatient) so I always know what an iterator is iterating.
In loops, you can mix several related variables by using the same base name with different prefixes: Orange orange = pOrange[iOrange]; This also means you don't make array indexing errors (pApple[i] looks ok, but write it as pApple[iOrange] and the error is immediately obvious).
Many programmers will use my system without knowing it, by adding a lengthy suffix like "Index" or "Ptr". There isn't any good reason to use a longer form than a single character IMHO, so I use "i" and "p". Less typing, more consistent, easier to read.
This is a simple system which adds meaningful and useful information to code, and eliminates the possibility of many simple but common programming mistakes.
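The p and i prefixes described above can be tried out in a short sketch. It is written in Go only to have something runnable; the Orange type and the heaviest function are made up for the example, and this naming style is the answer's own system rather than common Go practice.

```go
package main

import "fmt"

type Orange struct{ Weight int }

// heaviest returns the index of the heaviest orange, or -1 for an empty
// slice. The i prefix marks iBest and iOrange as indexes into oranges.
func heaviest(oranges []Orange) int {
	iBest := -1
	for iOrange := range oranges {
		if iBest < 0 || oranges[iOrange].Weight > oranges[iBest].Weight {
			iBest = iOrange
		}
	}
	return iBest
}

func main() {
	oranges := []Orange{{120}, {180}, {150}}

	iOrange := heaviest(oranges)
	pOrange := &oranges[iOrange] // one p: pointer to an Orange
	ppOrange := &pOrange         // two p's: pointer to a pointer

	// One asterisk per p in the name gets back to the object itself.
	orange := **ppOrange    // a copy of the Orange, taken before the write
	(*pOrange).Weight = 200 // writes through to oranges[iOrange]

	fmt.Println(iOrange, orange.Weight, oranges[iOrange].Weight) // prints "1 180 200"
}
```

The copy in orange keeps the old weight while the write through pOrange changes the slice element, which is the object-versus-pointer distinction the prefixes are meant to keep in view.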
I've been working for IBM for the past 6 months and I haven't seen it anywhere (thank god because I hate it.) I see either camelCase or c_style.
thisMethodIsPrettyCool()
this_method_is_pretty_cool()
It depends on your language and environment. As a rule I wouldn't use it, unless the development environment you're in makes it hard to find the type of the variable.
There are also two different types of Hungarian notation. See Joel's article. I can't find it (his names don't exactly make them easy to find); does anyone have a link to the one I mean?
Edit: Wedge has the article I mean in his post.
The original form (The Right Hungarian Notation :) ), where the prefix denotes the kind (e.g. length, quantity) of value stored in the variable, is OK, but not necessary in all types of applications.
The popular form (The Wrong Hungarian Notation), where the prefix denotes the type (String, int), is useless in most modern programming languages.
Especially with meaningless names like strA. I can't understand why people use meaningless names with long prefixes which add nothing.
I use type based (Systems HN) for components (e.g. editFirstName, lblStatus, etc.) as it makes autocomplete work better.
I sometimes use Apps HN for variables where the type information is insufficient, e.g. fpX indicates a fixed-point variable (int type, but can't be mixed and matched with an int), rawInput for user strings that haven't been validated, etc.
Being a PHP programmer where it's very loosely typed, I don't make a point to use it. However I will occasionally identify something as an array or as an object depending on the size of the system and the scope of the variable.