How does subtracting .as_ptr() values work?

Looking at a code snippet for parsing HTTP requests provided as part of the Tokio examples, I see the following code:
let toslice = |a: &[u8]| {
    let start = a.as_ptr() as usize - src.as_ptr() as usize;
    assert!(start < src.len());
    (start, start + a.len())
};
As I understand it, the above code snippet is getting the pointer location for the input slice a and the pointer location for a variable outside the scope of the closure and subtracting them. Then it returns a tuple containing this calculated value and the calculated value plus the length of the input slice.
What is this trying to accomplish? One could end up with a negative number and then panic because it wouldn't cast to usize. In fact, when I compile the example, this is exactly what happens when the input is the bytes for the string GET or POST, but not for other values. Is this a performance optimization for doing some sort of substring from a vector?

Yes, this is just subtracting pointers. The missing context is that the closure clearly intends a to be a subslice (substring) of the closed-over src slice. Thus toslice(a) ends up returning the start and end indices of a inside src. More explicitly, let (start, end) = toslice(a); means src[start] is a[0] (not just equal in value, they are the same address), and src[end - 1] is the last byte of a.
Violating this context assumption will likely produce panics. That's fine because this closure is a local variable that is only used locally and not exposed to any unknown users, so the only calls to the closure are in the example you linked and they evidently all satisfy the constraint.
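For illustration, here is a minimal self-contained sketch of the same technique (not the Tokio code itself; the byte string and the 4..15 range are made up):
fn main() {
    let src: &[u8] = b"GET /index.html HTTP/1.1";
    let path = &src[4..15]; // a subslice of `src`, as the closure assumes

    // Same arithmetic as the closure: recover the offsets of `path` within `src`.
    let start = path.as_ptr() as usize - src.as_ptr() as usize;
    let end = start + path.len();

    assert_eq!((start, end), (4, 15));
    assert_eq!(&src[start..end], path); // same bytes, same addresses
}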


Should I return Multiple Values with caution?

In Practical Common Lisp, Peter Seibel writes:
The mechanism by which multiple values are returned is implementation dependent just like the mechanism for passing arguments into functions is. Almost all language constructs that return the value of some subform will "pass through" multiple values, returning all the values returned by the subform. Thus, a function that returns the result of calling VALUES or VALUES-LIST will itself return multiple values--and so will another function whose result comes from calling the first function. And so on.
The "implementation dependent" part does worry me.
My understanding is that the following code might just return the primary value:
> (defun f ()
    (values 'a 'b))
> (defun g ()
    (f))
> (g) ; ==> a ? or a b ?
If so, does it mean that I should use this feature sparingly?
Any help is appreciated.
It's implementation-dependent in the sense that how multiple values are returned at the CPU level may vary from implementation to implementation. However, the semantics are well-specified at the language level and you generally do not need to be concerned about the low-level implementation.
See section 2.5, "Function result protocol", of The Movitz development platform for an example of how one implementation handles multiple return values:
The CPU’s carry flag (i.e. the CF bit in the eflags register) is used to signal whether anything other than precisely one value is being returned. Whenever CF is set, ecx holds the number of values returned. When CF is cleared, a single value in eax is implied. A function’s primary value is always returned in eax. That is, even when zero values are returned, eax is loaded with nil.
It's this kind of low-level detail that may vary from implementation to implementation.
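So in your example, (g) does return both values. A minimal sketch of the well-specified, portable behaviour (only the REPL's presentation of the values differs between implementations):
(defun f ()
  (values 'a 'b))

(defun g ()
  (f))                        ; the value of (F) passes through, so G also returns two values

(g)                           ; => A, B (two values)
(list (g))                    ; => (A) -- in an ordinary argument position only the primary value is used
(multiple-value-bind (x y) (g)
  (list x y))                 ; => (A B)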
One thing to be aware of: there is a limit on the number of values which can be returned on a specific Common Lisp implementation.
The variable MULTIPLE-VALUES-LIMIT has the implementation/machine specific value of the maximum numbers of values which can be returned. The standard says that it should not be smaller than 20. SBCL has a very large number on my computer, while LispWorks has only 51, ECL has 64 and CLISP has 128.
But I can't remember seeing Lisp code which wants to return more than 5 values.

What is the core difference between t=&T{} and t=new(T)

It seems that both are ways to create a new object pointer with all "0" member values, and both return a pointer:
type T struct{}
...
t1:=&T{}
t2:=new(T)
So what is the core difference between t1 and t2, or is there anything that "new" can do while &T{} cannot, or vice versa?
[…] is there anything that "new" can do while &T{} cannot, or vice versa?
I can think of three differences:
The "composite literal" syntax (the T{} part of &T{}) only works for "structs, arrays, slices, and maps" [link], whereas the new function works for any type [link].
For a struct or array type, the new function always generates zero values for its elements, whereas the composite literal syntax lets you initialize some of the elements to non-zero values if you like.
For a slice or map type, the new function always returns a pointer to nil, whereas the composite literal syntax always returns an initialized slice or map. (For maps this is very significant, because you can't add elements to nil.) Furthermore, the composite literal syntax can even create a non-empty slice or map.
(The second and third bullet-points are actually two aspects of the same thing — that the new function always creates zero values — but I list them separately because the implications are a bit different for the different types.)
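Here is a minimal sketch of those three differences; the struct and variable names are mine, not from the question:
package main

import "fmt"

func main() {
    // 1. new works for any type; there is no composite literal for e.g. int.
    n := new(int) // *int pointing at 0
    // n := &int{}   // does not compile: int is not a composite type

    // 2. A composite literal can initialize fields; new always gives zero values.
    type point struct{ X, Y int }
    p1 := &point{X: 3} // X = 3, Y = 0
    p2 := new(point)   // X = 0, Y = 0

    // 3. For maps, new gives a pointer to a nil map; a literal gives an initialized map.
    m1 := new(map[string]int) // *m1 is nil; inserting into *m1 would panic
    m2 := &map[string]int{}   // *m2 is an empty, usable map
    (*m2)["a"] = 1

    fmt.Println(*n, *p1, *p2, *m1 == nil, *m2)
}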
For structs and other composite types, both are the same.
t1:=&T{}
t2:=new(T)
//Both are same
You cannot return the address of an unnamed variable initialised to the zero value of other basic types like int without using new. You would need to create a named variable and then take its address.
func newInt() *int {
    return new(int)
}

func newInt() *int {
    // return &int{} --> invalid
    var dummy int
    return &dummy
}
See ruakh's answer. I want to point out some of the internal implementation details, though. You should not make use of them in production code, but they help illuminate what really happens behind the scenes, in the Go runtime.
Essentially, a slice is represented by three values. The reflect package exports a type, SliceHeader:
SliceHeader is the runtime representation of a slice. It cannot be used safely or portably and its representation may change in a later release. Moreover, the Data field is not sufficient to guarantee the data it references will not be garbage collected, so programs must keep a separate, correctly typed pointer to the underlying data.
type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
If we use this to inspect a variable of type []T (for any type T), we can see the three parts: the pointer to the underlying array, the length, and the capacity. Internally, a slice value v always has all three of these parts. There's a general condition that I think should hold, and if you don't use unsafe to break it, it seems by inspection that it will hold (based on limited testing anyway):
either the Data field is not zero (in which case Len and Cap can but need not be nonzero), or
the Data field is zero (in which case the Len and Cap should both be zero).
That slice value v is nil if the Data field is zero.
By using the unsafe package, we can break it deliberately (and then put it all back—and hopefully nothing goes wrong while we have it broken) and thus inspect the pieces. When this code on the Go Playground is run (there's a copy below as well), it prints:
via &literal: base of array is 0x1e52bc; len is 0; cap is 0.
Go calls this non-nil.
via new: base of array is 0x0; len is 0; cap is 0.
Go calls this nil even though we clobbered len() and cap()
Making it non-nil by unsafe hackery, we get [42] (with cap=1).
after setting *p1=nil: base of array is 0x0; len is 0; cap is 0.
Go calls this nil even though we clobbered len() and cap()
Making it non-nil by unsafe hackery, we get [42] (with cap=1).
The code itself is a bit long so I have left it to the end (or use the above link to the Playground). But it shows that the actual p == nil test in the source compiles to just an inspection of the Data field.
When you do:
p2 := new([]int)
the new function actually allocates only the slice header. It sets all three parts to zero and returns the pointer to the resulting header. So *p2 has three zero fields in it, which makes it a correct nil value.
On the other hand, when you do:
p1 := &[]int{}
the Go compiler builds an empty array (of size zero, holding zero ints) and then builds a slice header: the pointer part points to the empty array, and the length and capacity are set to zero. Then p1 points to this header, with the non-nil Data field. A later assignment, *p1 = nil, writes zeros into all three fields.
Let me repeat this with boldface: these are not promised by the language specification, they're just the actual implementation in action.
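The resulting nil-ness, however, is observable without any unsafe tricks; a minimal sketch:
package main

import "fmt"

func main() {
    p1 := &[]int{}   // header's Data points at a (zero-length) array
    p2 := new([]int) // header is all zeros
    fmt.Println(*p1 == nil, *p2 == nil) // prints: false true
    fmt.Println(len(*p1), len(*p2))     // prints: 0 0
}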
Maps work very similarly. A map variable is actually a pointer to a map header. The details of map headers are even less accessible than those of slice headers: there is no reflect type for them. The actual implementation is viewable here under type hmap (note that it is not exported).
What this means is that m2 := new(map[T1]T2) really only allocates one map variable (a single pointer) and sets that pointer to nil. There is no actual map! The new function returns a pointer to this nil map value, so *m2 is nil. Likewise, var m1 map[T1]T2 just sets a simple pointer value in m1 to nil. But m3 := map[T1]T2{} allocates an actual hmap structure, fills it in, and makes m3 point to it. We can once again peek behind the curtain on the Go Playground, with code that is not guaranteed to work tomorrow, to see this in effect.
As someone writing Go programs, you don't need to know any of this. But if you have worked with lower-level languages (assembly and C for instance), these explain a lot. In particular, these explain why you cannot insert into a nil map: the map variable itself holds a pointer value, and until the map variable itself has a non-nil pointer to a (possibly empty) map-header, there is no way to do the insertion. An insertion could allocate a new map and insert the data, but the map variable wouldn't point to the correct hmap header object.
(The language authors could have made this work by using a second level of indirection: a map variable could be a pointer pointing to the variable that points to the map header. Or they could have made map variables always point to a header, and made new actually allocate a header, the way make does; then there would never be a nil map. But they didn't do either of these, and we get what we get, which is fine: you just need to know to initialize the map.)
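To see this at the user level, a minimal sketch (the key and value types are arbitrary):
package main

import "fmt"

func main() {
    var m1 map[string]int      // nil map: the internal header pointer is nil
    m2 := map[string]int{}     // composite literal: points at a real, empty hmap
    m3 := make(map[string]int) // make: likewise points at a real hmap

    fmt.Println(m1 == nil, m2 == nil, m3 == nil) // true false false
    fmt.Println(m1["x"])                         // reading from a nil map is fine: 0

    m2["x"] = 1 // fine
    m3["x"] = 1 // fine
    // m1["x"] = 1 // would panic: assignment to entry in nil map
}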
Here's the slice inspector. (Use the playground link to view the map inspector: given that I had to copy hmap's definition out of the runtime, I expect it to be particularly fragile and not worth showing. The slice header's structure seems far less likely to change over time.)
package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    p1 := &[]int{}
    p2 := new([]int)
    show("via &literal", *p1)
    show("\nvia new", *p2)
    *p1 = nil
    show("\nafter setting *p1=nil", *p1)
}

// This demonstrates that given a slice (p), the test
//     if p == nil
// is really a test on p.Data. If it's zero (nil),
// the slice as a whole is nil. If it's nonzero, the
// slice as a whole is non-nil.
func show(what string, p []int) {
    pp := unsafe.Pointer(&p)
    sh := (*reflect.SliceHeader)(pp)
    fmt.Printf("%s: base of array is %#x; len is %d; cap is %d.\n",
        what, sh.Data, sh.Len, sh.Cap)
    olen, ocap := len(p), cap(p)
    sh.Len, sh.Cap = 1, 1 // evil
    if p == nil {
        fmt.Println(" Go calls this nil even though we clobbered len() and cap()")
        answer := 42
        sh.Data = uintptr(unsafe.Pointer(&answer))
        fmt.Printf(" Making it non-nil by unsafe hackery, we get %v (with cap=%d).\n",
            p, cap(p))
        sh.Data = 0 // restore nil-ness
    } else {
        fmt.Println("Go calls this non-nil.")
    }
    sh.Len, sh.Cap = olen, ocap // undo evil
}

How to safely remove item from a vector?

Let's say I have this vector:
let mut v = vec![1,2,3];
And I want to remove some item from it:
v.remove(3);
It panics. How can I catch/gracefully handle that panic? I tried to use panic::catch_unwind but it doesn't seem to work with vectors (std::vec::Vec<i32> may not be safely transferred across an unwind boundary). Should I manually check if item exists at an index before removing it?
In general, vector and slice methods consider it a programming error if they receive an index that is out of range, and the convention in Rust is to panic for programming errors. If your code panics, you generally need to fix the code to uphold the invariant that was disregarded.
Some of the slice methods have variants that don't panic for invalid indices. One example is the pair formed by the indexing operator [index], which panics on an out-of-bounds index, and the get() method, which returns None instead.
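A tiny sketch of that difference (not from the original answer):
fn main() {
    let v = vec![1, 2, 3];
    assert_eq!(v.get(3), None);     // out of bounds: returns None instead of panicking
    assert_eq!(v.get(1), Some(&2)); // in bounds: returns a reference to the element
    // let x = v[3];                // this would panic: index out of bounds
}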
The remove() method does not have an equivalent that does not panic. You should check the index manually before passing it in:
if index < v.len() {
    v.remove(index);
} else {
    // Handle error
}
In real applications, this should rarely be necessary, though. The code that generates the index to be deleted can usually be written in a way that it will only yield in-bounds indices.
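If you do find yourself needing this in several places, the check can be wrapped in a small helper; a minimal sketch (the name try_remove is made up, it is not a standard library method):
fn try_remove<T>(v: &mut Vec<T>, index: usize) -> Option<T> {
    if index < v.len() {
        Some(v.remove(index))
    } else {
        None
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    assert_eq!(try_remove(&mut v, 3), None); // out of bounds: no panic, just None
    assert_eq!(try_remove(&mut v, 0), Some(1));
    assert_eq!(v, vec![2, 3]);
}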

Slow execution of string comparison

My problem is why my program takes such a long time to execute. This program is supposed to check the user password. The approach used is:
take the password from the console into an array, and
compare it with the previously saved password.
The comparison is done by the function str_cmp(), which returns zero if the strings are equal and non-zero if they are not.
#include<stdio.h>
char str_cmp(char *,char *);

int main(void)
{
    int i=0;
    char c,cmp[10],org[10]="0123456789";
    printf("\nEnter your account password\ntype 0123456789\n");
    for(i=0;(c=getchar())!=EOF;i++)
        cmp[i]=c;
    if(!str_cmp(org,cmp))
    {
        printf("\nLogin Sucessful");
    }
    else
        printf("\nIncorrect Password");
    return 0;
}

char str_cmp(char *porg,char *pcmp)
{
    int i=0,l=0;
    for(i=0;*porg+i;i++)
    {
        if(!(*porg+i==*pcmp+i))
        {
            l++;
        }
    }
    return l;
}
There are libraries available to do this much more simply but I will assume that this is an assignment and either way it is a good learning experience. I think the problem is in your for loop in the str_cmp function. The condition you are using is "*porg+i". This is not really doing a comparison. What the compiler is going to do is go until the expression is equal to 0. That will happen once i is so large that *porg+i is larger than what an "int" can store and it gets reset to 0 (this is called overflowing the variable).
Instead, you should pass a size into the str_cmp function corresponding to the length of the strings. In the for loop condition you should make sure that i < str_size.
However, there is a built-in strncmp function (http://www.elook.org/programming/c/strncmp.html) that does this exact thing.
You also have a different problem. You are doing pointer addition like so:
*porg+i
This is going to take the value of the first element of the array and add i to it. Instead you want to do:
*(porg+i)
That will add to the pointer and then dereference it to get the value.
To clarify more fully with the comparison, because this is a very important concept for pointers: porg is defined as a char*. This means that you have a variable that holds the memory address of a char. When you use the dereference operator (*, for example *porg) on the variable, it returns the value stored at that piece of memory. However, you can add a number to the memory location to move to a different memory location. porg + 1 is going to return the memory location after porg. Therefore, when you do *porg + 1 you are getting the value at the memory address and adding 1 to it. On the other hand, when you do *(porg + 1) you are getting the value at the memory address one after where porg is pointing to. This is useful for arrays because arrays store their values one after another. However, a more understandable notation for doing this is: porg[1]. This says "get the value 1 after the beginning of the array" or in other words "get the second element of the array".
All conditions in C check whether a value is zero or non-zero. Zero means false, and every other value means true. When you use the expression *porg + i as a condition, it is going to do the calculation (the value at porg, plus i) and check whether the result is zero or not.
This leads me to the other very important concept for programming in C. An int can only hold values up to a certain size. If the variable is incremented past that maximum value, it wraps around to the other end of its range (for signed types this is strictly undefined behaviour, but wrapping is what you will typically observe). For illustration, suppose the maximum value an int could hold were 255: adding 1 to a variable holding 255 would wrap it around instead of producing 256. In reality an int is 32 bits on most compilers today, so its maximum value is 2,147,483,647, and this is why it is taking so long: the loop keeps running until *porg + i has wrapped all the way around and become zero again, which takes billions of iterations.
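Putting the two fixes together, a minimal sketch of a corrected comparison function might look like this (the function name and the explicit size parameter are mine, following the suggestion above):
#include <stddef.h>

/* A sketch of str_cmp with the two fixes applied:
 * the loop is bounded by an explicit size instead of the bogus *porg+i test,
 * and porg[i] / pcmp[i] index the arrays instead of adding i to the first value. */
int str_cmp_fixed(const char *porg, const char *pcmp, size_t size)
{
    size_t i;
    int mismatches = 0;
    for (i = 0; i < size && porg[i] != '\0'; i++) {
        if (porg[i] != pcmp[i])
            mismatches++;
    }
    return mismatches; /* zero means the strings matched, as in the original intent */
}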
Try including string.h:
#include <string.h>
Then use the built-in strcmp() function. The existing string functions have already been written to be as fast as possible in most situations.
Also, I think your for statement is messed up:
for(i=0;*porg+i;i++)
That's going to dereference the pointer, then add i to it. I'm surprised the for loop ever exits.
If you change it to this, it should work:
for(i=0;porg[i];i++)
Your original string is also one longer than you think it is. You allocate 10 bytes, but it's actually 11 bytes long. A string (in quotes) is always ended with a null character. You need to declare 11 bytes for your char array.
Another issue:
if(!(*porg+i==*pcmp+i))
should be changed to
if(!(porg[i]==pcmp[i]))
For the same reasons listed above.
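For completeness, a minimal sketch of the library approach suggested above, using fgets and strcmp (the buffer size and exact prompt are made up):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char org[] = "0123456789"; /* 11 bytes: 10 characters plus the terminating '\0' */
    char cmp[32];

    printf("Enter your account password\n");
    if (fgets(cmp, sizeof cmp, stdin) == NULL)
        return 1;
    cmp[strcspn(cmp, "\n")] = '\0'; /* strip the trailing newline, if any */

    if (strcmp(org, cmp) == 0)
        printf("Login Successful\n");
    else
        printf("Incorrect Password\n");
    return 0;
}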

Stack overflow with specialized Hashtbl (via Hashtbl.Make)

I am using this piece of code and a stack overflow is triggered; if I use Extlib's Hashtbl the error does not occur. Any hints on using a specialized Hashtbl without the stack overflow?
module ColorIdxHash = Hashtbl.Make(
  struct
    type t = Img_types.rgb_t
    let equal = (==)
    let hash = Hashtbl.hash
  end
)
(* .. *)
let (ctable: int ColorIdxHash.t) = ColorIdxHash.create 256 in
for x = 0 to width - 1 do
  for y = 0 to height - 1 do
    let c = Img.get img x y in
    let rgb = Color.rgb_of_color c in
    if not (ColorIdxHash.mem ctable rgb) then ColorIdxHash.add ctable rgb (ColorIdxHash.length ctable)
  done
done;
(* .. *)
(* .. *)
The backtrace points to hashtbl.ml:
Fatal error: exception Stack_overflow
Raised at file "hashtbl.ml", line 54, characters 16-40
Called from file "img/write_bmp.ml", line 150, characters 52-108
...
Any hints?
Well, you're using physical equality (==) to compare the colors in your hash table. If the colors are structured values (I can't tell from this code), none of them will be physically equal to each other. If all the colors are distinct objects, they will all go into the table, which could really be quite a large number of objects. On the other hand, the hash function is going to be based on the actual color R,G,B values, so there may well be a large number of duplicates. This will mean that your hash buckets will have very long chains. Perhaps some internal function isn't tail recursive, and so is overflowing the stack.
Normally the length of the longest chain will be 2 or 3, so it wouldn't be surprising that this error doesn't come up often.
Looking at my copy of hashtbl.ml (OCaml 3.12.1), I don't see anything non-tail-recursive on line 54. So my guess might be wrong. On line 54 a new internal array is allocated for the hash table. So another idea is just that your hashtable is just getting too big (perhaps due to the unwanted duplicates).
One thing to try is to use structural equality (=) and see if the problem goes away.
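A minimal sketch of that change, assuming Img_types.rgb_t is an ordinary, non-cyclic value (so that both (=) and Hashtbl.hash terminate on it):
module ColorIdxHash = Hashtbl.Make(
  struct
    type t = Img_types.rgb_t
    let equal = (=)           (* structural equality instead of (==) *)
    let hash = Hashtbl.hash
  end
)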
One reason you may have non-termination or stack overflows is if your type contains cyclic values. (==) will terminate on cyclic values (while (=) may not), but Hashtbl.hash is probably not cycle-safe. So if you manipulate cyclic values of type Img_types.rgb_t, you have to devise your own cycle-safe hash function -- typically, calling Hashtbl.hash on only one of the non-cyclic subfields/subcomponents of your values.
I've already been bitten by precisely this issue in the past. Not a fun bug to track down.
