Differences in ownership rules between PyDict_SetItem vs. PyList_SetItem or PyTuple_SetItem? - cpython

Two questions about ownership rules and reference counts using functions in CPython like PyDict_SetItem and PyList_SetItem:
I would appreciate it if someone could explain why PyDict_SetItem does not take ownership of the passed in value, while PyList_SetItem and PyTuple_SetItem do? Here's the passage from CPython docs that notes this difference:
When you pass an object reference into another function, in general,
the function borrows the reference from you — if it needs to store it,
it will use Py_INCREF() to become an independent owner. There are
exactly two important exceptions to this rule: PyTuple_SetItem() and
PyList_SetItem(). These functions take over ownership of the item
passed to them — even if they fail! (Note that PyDict_SetItem() and
friends don’t take over ownership — they are “normal.”)
Is this somehow logical (which would make it easier to internalize and remember) or is it mostly just a random historical artifact of a design decision made long ago?
Also, just to make sure I understand - the statement that PyDict_SetItem "does not take ownership of the reference" means that it calls Py_INCREF on the value that I pass in internally, right? ie- In most cases I would need to call Py_DECREF on the value soon after calling PyDict_SetItem (like what is done in the code in this stackoverflow question: Reference counting using PyDict_SetItemString)?
Thanks.

Related

What is the need for immutable/persistent data structures in erlang

Each Erlang process maintains its own private address space. All communication happens via copying without sharing (except big binaries). If each process is processing one message at a time with no concurrent access over its objects, I don't see why do we need immutable/persistent data structures.
Erlang was initially implemented in Prolog, which doesn't really use mutable data structures either (though some dialects do). So it started off without them. This makes runtime implementation simpler and faster (garbage collection in particular).
So adding mutable data structures would require a lot of effort, could introduce bugs, and Erlang programmers are nearly by definition at least willing to live without them.
Many actually consider their absence to be a positive good: less concern about object identity, no need for defensive copying because you don't know whether some other piece of code is going to modify the data you passed (or might be changed later to modify it), etc.
This absence does mean that Erlang is pretty unusable in some domains (e.g. high performance scientific computing), at least as the main language. But again, this means that nobody in these domains is going to use Erlang in the first place and so there's no particular incentive to make it usable at the cost of making existing users unhappy.
I remember seeing a mailing list post by Joe Armstrong quite a long time ago (which I couldn't find with a quick search now) saying that he initially planned to add mutable variables when he'd need them... except he never quite did, and performance was good enough for everything he was using Erlang for.
It is indeed the case that in Erlang immutability does not solve any "shared state" problems, as immutable data are "process local".
From the functional programming language perspective, however, immutability offers a number of benefits, summarized adequately in this Quora answer:
The simplest definition of functional programming is that it’s a programming
paradigm where you are transforming immutable data with functions.
The definition uses functions in the mathematical sense, where it’s
something that takes an input, and produces an output.
OO + mutability tends to violate that definition because when you want
to change a piece of data it generally will not return the output, it
will likely return void or unit, and that when you call a method on
the object the object itself isn’t input for the function.
As far as what advantages the paradigm has, composability, thread
safety, being able to track what went wrong where better, the ability
to sort of separate the data from the actual computation on it being
done, etc.
how would this work?
factorial(1) -> 1;
factorial(X) ->
X*factorial(X-1).
if you run factorial(4), a single process will be running the same function. Each time the function will have it's own value of X, if the value of X was in the scope of the process and not the function recursive functions wouldn't work. So first we need to understand scope. If you want to say that you don't see why data needs to be immutable within the scope of a single function/block you would have a point, but it would be a headache to think about where data is immutable and where it isn't.

If I use Google Closure, do I need to worry about absolute equality?

I was presented with this argument when fixing a bug related to implicit casting and speaking to the developer who told me:
If you use Closure, then you do not need absolute equality. Any value would be typed, therefor you don't need to worry about implicit casts.
My response was, that's dicey. You're making a lot of assumptions; Closure does not alter JavaScript, it's primarily a labyrinthine super layer (aside: probably a moot one, now that we have TypeScript).
Anyway, if one of these implicit things does slip by because the annotations don't resolve perfectly for some reason, you can end up with a tough bug (which is what happened; it got assigned to me because I guess the other dev didn't think it could be the problem).
I got responses of, "well if that dev had properly typed that object and this wasn't just an object and..."
Or...you could just protect against this sort of thing easily by using three equal signs instead of two. Use an assertion or console log to check the condition if necessary. Don't just leave it hanging out there.
Anyway what do you think; if you're using Closure, should you still observe the general best practice of using absolute equality in your JS code?
I know this leads to a wider conversation as well (e.g. Java 8's "optional" being "totally useless"), curious in the context of Closure though.
Question is a bit vague, code example would have helped. But yes, closure does not necessarily type every object and variable. I always use === unless there is a specific reason not to.

How does functional programming avoid state when it seems unavoidable?

Let's say we define a function c sum(a, b), functional programming -style, that returns the sum of its arguments. So far so good; all the nice things of FP without any problems.
Now let's say we run this in an environment with dynamic typing and a singleton, stateful error stream. Then let's say we pass a value of a and/or b that sum isn't designed to handle (i.e. not numbers), and it needs to indicate an error somehow.
But how? This function is supposed to be pure and side-effect-less. How does it insert an error into the global error stream without violating that?
No programming language that I know of has anything like a "singleton stateful error stream" built in, so you'd have to make one. And you simply wouldn't make such a thing if you were trying to write your program in a pure functional style.
You could, however, have a sum function that returns either the sum or an indication of an error. The type used to do this is in fact often known by the name Either. Then you could easily make a function that invokes a whole bunch of computations that could possibly return an error, and returns a list of all the errors that were encountered in the other computations. That's pretty close to what you were talking about; it's just explicitly returned rather than being global.
Remember, the question when you're writing a functional program is "how do I make a program that has the behavior I want?" not, "how would I duplicate one particular approach taken in another programming style?". A "global stateful error stream" is a means not an end. You can't have a global stateful error stream in pure function style, no. But ask yourself what you're using the global stateful error stream to achieve; whatever it is, you can achieve that in functional programming, just not with the same mechanism.
Asking whether pure functional programming can implement a particular technique that depends on side effects is like asking how you use techniques from assembly in object-oriented programming. OO provides different tools for you to use to solve problems; limiting yourself to using those tools to emulate a different toolset is not going to be an effective way to work with them.
In response to comments: If what you want to achieve with your error stream is logging error messages to a terminal, then yes, at some level the code is going to have to do IO to do that.1
Printing to terminal is just like any other IO, there's nothing particularly special about it that makes it worthy of singling out as a case where state seems especially unavoidable. So if this turns your question into "How do pure functional programs handle IO?", then there are no doubt many duplicate questions on SO, not to mention many many blog posts and tutorials speaking precisely to that issue. It's not like it's a sudden surprise to implementors and users of pure programming languages, the question has been around for decades, and there have been some quite sophisticated thought put into the answers.
There are different approaches taken in different languages (IO monad in Haskell, unique modes in Mercury, lazy streams of requests and responses in historical versions of Haskell, and more). The basic idea is to come up with a model which can be manipulated by pure code, and hook up manipulations of the model to actual impure operations within the language implementation. This allows you to keep the benefits of purity (the proofs that apply to pure code but not to general impure code will still apply to code using the pure IO model).
The pure model has to be carefully designed so that you can't actually do anything with it that doesn't make sense in terms of actual IO. For example, Mercury does IO by having you write programs as if you're passing around the current state of the universe as an extra parameter. This pure model accurately represents the behaviour of operations that depend on and affect the universe outside the program, but only when there is exactly one state of the universe in the system at any one time, which is threaded through the entire program from start to finish. So some restrictions are put in
The type io is made abstract so that there's no way to construct a value of that type; the only way you can get one is to be passed one from your caller. An io value is passed into the main predicate by the language implementation to kick the whole thing off.
The mode of the io value passed in to main is declared such that it is unique. This means you can't do things that might cause it to be duplicated, such as putting it in a container or passing the same io value to multiple different invocations. The unique mode ensures that you can only ass the io value to a predicate that also uses the unique mode, and as soon as you pass it once the value is "dead" and can't be passed anywhere else.
1 Note that even in imperative programs, you gain a lot of flexibility if you have your error logging system return a stream of error messages and then only actually make the decision to print them close to the outermost layer of the program. If your log calls are directly writing the output immediately, here's just a few things I can think of off the top of my head that become much harder to do with such a system:
Speculatively execute a computation and see whether it failed by checking whether it emitted any errors
Combine multiple high level systems into a single system, adding tags to the logs to distinguish each system
Emit debug and info log messages only if there is also an error message (so the output is clean when there are no errors to debug, and rich in detail when there are)

Does Haskell have pointers?

Do you know if are there pointers in Haskell?
If yes: how do you use them? Are there any problems with them? And why aren't they popular?
If no: is there any reason for it?
Yes there are. Take a look at Foreign.Ptr or Data.IORef
I suspect this wasn't what you are asking for though. As Haskell is for the most part without state, it means pointers don't fit into the language design. Having a pointer to memory outside the function would mean that a function is no longer pure and only allowing pointers to values within the current function is useless.
Haskell does provide pointers, via the foreign function interface extension. Look at, for example, Foreign.Storable.
Pointers are used for interoperating with C code. Not for every day Haskell programming.
If you're looking for references -- pointers to objects you wish to mutate -- there are STRef and IORef, which serve many of the same uses as pointers. However, you should rarely -- if ever -- need Refs.
If you simply wish to avoid copying large values, as sepp2k supposes, then you need do nothing: in most implementation, all non-trivial values are allocated separately on a heap and refer to one another by machine-level addresses (i.e. pointers). But again, you need do nothing about any of this, it is taken care of for you.
To answer your question about how values are passed, they are passed in whatever way the implementation sees fit: since you can't mutate the values anyway, it doesn't impact the meaning of the code (as long as the strictness is respected); usually this works out to by-need unless you're passing in e.g. Int values that the compiler can see have already been evaluated...
Pass-by-need is like pass-by-reference, except that any given reference could refer either to an actual evaluated value (which cannot be changed), or to a "thunk" for a not-yet-evaluated value. Wikipedia has more.

How to hand over variables to a function? With an array or variables?

When I try to refactor my functions, for new needs, I stumble from time to time about the crucial question:
Shall I add another variable with a default value? Or shall I use only one array, where I´m able to add an additional variable without breaking the API?
Unless you need to support a flexible number of variables, I think it's best to explicitly identify each parameter. In most cases you can add an overloaded method that has a different signature to support the extra parameter while still supporting the original method signature. If you use an array for passing variables it just makes it too confusing for users of your API. Obviously there are some inputs that lend themselves to an array (a list of points in a polygon, a list of account IDs you wish to perform an action on, etc.) but if it's not a variable that you would reasonably expect to be an array or list, you should pass it into the method as a separate parameter.
Just like many questions in programming, the right answer is "it depends".
To take Javascript/jQuery as an example, one good rule of thumb is whether the parameter will be required each time the function is called or whether it is optional. For example, the main jQuery function itself requires an expression to determine what element(s) the operation will affect:
jQuery(expresssion)
It makes no sense to try to pass this parameter as part of an array as it will be required every time this function is called.
On the other hand, many jQuery plugins require several miscellaneous parameters that may be optional. By convention, these are passed as parameters via an 'options' array. As you said, this provides a nice interface as new parameters can be added without affecting the existing API. This makes the API clean as well since the user can ignore those options that are not applicable.
In general, when several parameters are involved, passing them as an array is a nice convention as many of them are certainly going to be optional. This would have helped clean up many WIN32 API's, although it is more difficult to deal with arrays in C/C++ than in Javascript.
It depends on the programming language used.
If you have a run-of-the-mill OO language, you should use an object that you can easily extend, if you are really concerned about API consistency.
If that doesn't matter that much, there is the option of changing the method signature and overloading the method with more / different parameters.
If your language doesn't support either and you want the API to be binary stable, use an array.
There are several considerations that must be made.
Where is the function used? - Only in code you created? One place or hundreds of places? The amount of work that will need to be done to maintain existing code is important. Remember to include the amount of time it will take to communicate to other programmers that may currently be using your function.
How critical is the new parameter? - Do you want to require it to be used? If it has a default value, will that default value break existing use of the function in any subtle ways?
Ease of comprehension - How many parameters are already passed into the function? The larger the number, the more confusing and error prone it will be. Code Complete recommends that you restrict the number of parameters to 7 or less. If you need more than that, you should try to abstract some or all of the related parameters into one object.
Other special considerations - Do you want to optimize your efforts for any special conditions such as code speed or size? Are there any special considerations that must be taken into account for your execution environment? Keep in mind your goals for the project and make sure you aren't working against them with whatever design choice you make.
In his book Code Complete, Steve McConnell decrees that a function should never have more than 7 arguments, and rarely even that many. He presents compelling arguments - that I can't cite from memory, alas.
Clean Code, more recently, advocates even fewer arguments.
So unless the number of things to pass is really small, they should be passed in an enveloping structure. If they're homogenous, an array. If not, then a reasonably lightweight object should be built for the purpose.
You should do neither. Just add the parameter and change all callers to supply the proper default value. The reason is that parameters with default values can only be at the end, and will not be able to add any more required parameters anywhere in the parameters list, without having a risk of misinterpretation.
These are the critical steps to disaster:
1. add one or two parameters with defaults
2. some callers will supply it, and some will rely on defaults.
[half a year passed]
3. add a required parameter (before them)
4. change all callers to accept the required parameter
5. get a phone call, or other event which will make you forget to change one of the instances in part#2
6. now your program compiles perfectly, but is invalid.
Unfortunately, in function call semantics we usually don't have a chance to say, by name, which value goes where.
Array is also not a proper solution. Array should be used as a connection of similar objects, upon which there's a uniform activity performed. As they say here, if it's worth refactoring, it's worth refactoring now.

Resources