Pin memory access type in exception handling


I am implementing an exception handling function using Pin. In my exception handling code, I specifically look for memory access errors, i.e. memory read errors and memory write errors. I wrote the code below:
BOOL catchSignalTest(THREADID tid, INT32 sig, CONTEXT *ctx, BOOL hasHandler, const EXCEPTION_INFO *pExceptInfo, VOID *v)
{
ADDRINT exptAddr = PIN_GetExceptionAddress(pExceptInfo);
FAULTY_ACCESS_TYPE ty = PIN_GetFaultyAccessType(pExceptInfo); // <----- ty is unknown type!!!
}
.....
PIN_InterceptSignal(SIGSEGV, catchSignalTest, 0);
What really confuses me is that, even for a typical memory read access error below:
mov eax, [ebx] ; ebx = 0x01, which makes the read operation fail
The FAULTY_ACCESS_TYPE of my code above is still UNKNOWN. Note that according to its definition, I suppose the access type should be FAULTY_ACCESS_READ.
Am I doing anything wrong here?

Before you call PIN_GetFaultyAccessType, you probably want to:
(1) call PIN_GetExceptionCode to get the EXCEPTION_CODE
(2) call PIN_GetExceptionClass to get the EXCEPTION_CLASS
as the fault access type may only be valid/useful if the class is EXCEPTCLASS_ACCESS_FAULT
At a guess, since you're accessing an odd location [with a 32 bit word fetch], the PIN library may set [probably sets] the x86's hardware "alignment check" (AC) flag, which enables the alignment-check exception (#AC).
Then, you'll get EXCEPTCODE_ACCESS_MISALIGNED which would explain the results you get for the type (e.g. the alignment gets checked first, before the access). Since it's an alignment exception, the other access type codes don't really fit.
IMO, if the PIN library does not set the AC flag, then EXCEPTCODE_ACCESS_MISALIGNED is kind of a pointless NOP.
You could try various ebx values like 4 against a known page that you changed the memory protections on (e.g. generate an access exception that you know is not also misaligned).


Custom errors in golang and pointer receivers

Reading about value receivers vs pointer receivers across the web and stackoverflow, I understand the basic rule to be: If you don't plan to modify the receiver, and the receiver is relatively small, there is no need for pointers.
Then, reading about implementing the error interface (eg. https://blog.golang.org/error-handling-and-go), I see that examples of the Error() function all use pointer receiver.
Yet, we are not modifying the receiver, and the struct is very small.
I feel like the code is much nicer without pointers (return &appError{} vs return appError{}).
Is there a reason why the examples are using pointers?

First, in the blog post you linked and took your example from, appError is not an error. It's a wrapper that carries an error value and other related info used by the implementation of the examples. It is not exposed, and neither appError nor *appError is ever used as an error value.
So the example you quoted has nothing to do with your actual question. But to answer the question in title:
In general, consistency may be the reason. If a type has many methods and some need pointer receiver (e.g. because they modify the value), often it's useful to declare all methods with pointer receiver, so there's no confusion about the method sets of the type and the pointer type.
Answering regarding error implementations: when you use a struct value to implement an error value, it's dangerous to use a non-pointer to implement the error interface. Why is it so?
Because error is an interface. And interface values are comparable. And they are compared by comparing the values they wrap. And you get different comparison results based on what values / types are wrapped inside them! Because if you store pointers in them, the error values will be equal if they store the same pointer. And if you store non-pointers (structs) in them, they are equal if the struct values are equal.
To elaborate on this and show an example:
The standard library has an errors package. You can create error values from string values using the errors.New() function. If you look at its implementation (errors/errors.go), it's simple:
// Package errors implements functions to manipulate errors.
package errors
// New returns an error that formats as the given text.
func New(text string) error {
return &errorString{text}
}
// errorString is a trivial implementation of error.
type errorString struct {
s string
}
func (e *errorString) Error() string {
return e.s
}
The implementation returns a pointer to a very simple struct value. This is so that if you create 2 error values with the same string value, they won't be equal:
e1 := errors.New("hey")
e2 := errors.New("hey")
fmt.Println(e1, e2, e1 == e2)
Output:
hey hey false
This is intentional.
Now if you would return a non-pointer:
func New(text string) error {
return errorString{text}
}
type errorString struct {
s string
}
func (e errorString) Error() string {
return e.s
}
2 error values with the same string would be equal:
e1 = New("hey")
e2 = New("hey")
fmt.Println(e1, e2, e1 == e2)
Output:
hey hey true
Try the examples on the Go Playground.
A shining example why this is important: Look at the error value stored in the variable io.EOF:
var EOF = errors.New("EOF")
It is expected that io.Reader implementations return this specific error value to signal end of input. So you can peacefully compare the error returned by Reader.Read() to io.EOF to tell if end of input is reached. You can be sure that if they occasionally return custom errors, they will never be equal to io.EOF, this is what errors.New() guarantees (because it returns a pointer to an unexported struct value).
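To make the io.EOF point concrete, here is a minimal sketch. The helper names readString and lookalikeIsEOF are made up for illustration; only io.EOF, errors.New and strings.NewReader come from the standard library. It drains a reader until it sees the io.EOF sentinel, then checks that a freshly created error with the same text does not compare equal to it:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// readString (hypothetical helper) drains a reader in small chunks and
// treats the io.EOF sentinel as the normal end-of-input signal.
func readString(s string) (string, error) {
	var r io.Reader = strings.NewReader(s)
	var out strings.Builder
	buf := make([]byte, 4)
	for {
		n, err := r.Read(buf)
		out.Write(buf[:n])
		if err == io.EOF {
			// Same pointer as the package-level io.EOF: end of input.
			return out.String(), nil
		}
		if err != nil {
			// Any other error is a real failure.
			return out.String(), err
		}
	}
}

// lookalikeIsEOF (hypothetical helper) reports whether a new error with the
// text "EOF" compares equal to io.EOF. errors.New returns a fresh pointer
// on every call, so the two interface values wrap different pointers.
func lookalikeIsEOF() bool {
	return errors.New("EOF") == io.EOF
}

func main() {
	s, err := readString("hello, world")
	fmt.Println(s, err)
	fmt.Println(lookalikeIsEOF())
}
```

If errors.New returned a struct value instead of a pointer, the second check would compare the strings and report true, which is exactly the situation described above.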

Errors in Go only need to satisfy the error interface, i.e. provide an Error() method. When you create custom errors, or dig through the Go source code, you will find that error types often carry much more behind the scenes. If a struct is being populated in your application, it is more efficient to pass it as a pointer, which avoids making copies in memory. Furthermore, as illustrated in The Go Programming Language book:
The fmt.Errorf function formats an error message using fmt.Sprintf and returns a new error value. We use it to build descriptive errors by successively prefixing additional context information to the original error message. When the error is ultimately handled by the program’s main function, it should provide a clear causal chain from the root problem to the overall failure, reminiscent of a NASA accident investigation:
genesis: crashed: no parachute: G-switch failed: bad relay orientation
Because error messages are frequently chained together, message strings should not be capitalized and newlines should be avoided. The resulting errors may be long, but they will be self-contained when found by tools like grep.
From this we can see that if a single 'error type' holds a wealth of information, and on top of this we are 'chaining' them together to create a detailed message, using pointers will be the best way to achieve this.
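As a rough illustration of the chaining described in the quote, the sketch below rebuilds the quoted message by prefixing context one layer at a time. The helpers annotate and chain are hypothetical names, and using %v (rather than the newer %w verb) is my own choice to stay close to the book's description:

```go
package main

import (
	"errors"
	"fmt"
)

// annotate (hypothetical helper) prefixes err with one more layer of context.
func annotate(context string, err error) error {
	return fmt.Errorf("%s: %v", context, err)
}

// chain builds the causal chain from the root problem outwards and
// returns the final message.
func chain() string {
	err := errors.New("bad relay orientation")
	for _, ctx := range []string{"G-switch failed", "no parachute", "crashed", "genesis"} {
		err = annotate(ctx, err)
	}
	return err.Error()
}

func main() {
	fmt.Println(chain())
}
```

Each call to annotate produces a new error value. The sketch only shows the message chaining and makes no claim about which receiver type performs better.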

We can look at this from the error handling's perspective, instead of the error creation.
Error Definition Side's Story
type ErrType1 struct {}
func (e *ErrType1) Error() string {
return "ErrType1"
}
type ErrType2 struct {}
func (e ErrType2) Error() string {
return "ErrType2"
}
Error Handler Side's Story
err := someFunc()
switch err.(type) {
case *ErrType1:
...
case ErrType2, *ErrType2:
...
default:
...
}
As you can see, if you implement an error type with a value receiver, then when you do the type assertion you need to worry about both cases.
For ErrType2, both &ErrType2{} and ErrType2{} satisfy the interface.
Because someFunc returns an error interface, you never know if it returns a struct value or a struct pointer, especially when someFunc isn't written by you.
Therefore, using a pointer receiver doesn't stop a user from returning a pointer as an error.
That being said, all other aspects, such as stack vs. heap (memory allocation, GC pressure), still apply.
Choose your implementation according to your use cases.
In general, I prefer a pointer receiver for the reason I demonstrated above. I prefer a friendly API over performance, and sometimes, when the error type carries a lot of information, a pointer is also more performant.
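Here is a compilable sketch of the type switch above. It reuses the ErrType1 and ErrType2 names from this answer, while classify is a hypothetical helper added for illustration. It shows that the value-receiver type has to be matched in two forms, while the pointer-receiver type only has one:

```go
package main

import "fmt"

type ErrType1 struct{}

// Pointer receiver: only *ErrType1 satisfies error.
func (e *ErrType1) Error() string { return "ErrType1" }

type ErrType2 struct{}

// Value receiver: both ErrType2 and *ErrType2 satisfy error.
func (e ErrType2) Error() string { return "ErrType2" }

// classify (hypothetical helper) reports which dynamic type an error holds.
func classify(err error) string {
	switch err.(type) {
	case *ErrType1:
		return "*ErrType1"
	case ErrType2:
		return "ErrType2 value"
	case *ErrType2:
		return "*ErrType2 pointer"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(classify(&ErrType1{}))
	fmt.Println(classify(ErrType2{}))
	fmt.Println(classify(&ErrType2{}))
	// classify(ErrType1{}) would not compile: ErrType1 (the value type)
	// has no Error method in its method set.
}
```

The commented-out call at the end is the useful part: with a pointer receiver, the compiler rejects the value form, so a handler only ever has to match *ErrType1.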

No :)
https://blog.golang.org/error-handling-and-go#TOC_2.
Go interfaces allow for anything that complies with the error interface to be handled by code expecting error
type error interface {
Error() string
}
Like you mentioned, if you don't plan to modify state there is little incentive to pass around pointers:
allocating to heap
GC pressure
Mutable state and concurrency, etc
As a side rant: anecdotally, I think that seeing examples like this one is why new Go programmers favor pointer receivers by default.

The tour of go explains the general reasons for pointer receivers pretty well:
https://tour.golang.org/methods/8
There are two reasons to use a pointer receiver.
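The first is so that the method can modify the value that its receiver points to.
The second is to avoid copying the value on each method call. This can be more efficient if the receiver is a large struct, for example.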
In general, all methods on a given type should have either value or pointer receivers, but not a mixture of both.
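A small sketch of the first reason, using a made-up Counter type. The two receiver kinds are mixed here only to compare them side by side, which is exactly what the last sentence of the quote advises against in real code:

```go
package main

import "fmt"

// Counter is a hypothetical type used only for this comparison.
type Counter struct{ n int }

// IncByValue has a value receiver: it increments a copy, so the
// caller's Counter is unchanged.
func (c Counter) IncByValue() { c.n++ }

// IncByPointer has a pointer receiver: it increments the caller's Counter.
func (c *Counter) IncByPointer() { c.n++ }

// counts calls each method once on a zero Counter and returns both results.
func counts() (byValue, byPointer int) {
	var a, b Counter
	a.IncByValue()
	b.IncByPointer() // Go takes &b automatically because b is addressable
	return a.n, b.n
}

func main() {
	v, p := counts()
	fmt.Println(v, p)
}
```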

When Qt-5 will fail the connect

Reading Qt signal & slots documentation, it seems that the only reason for a new style connection to fail is:
"If there is already a duplicate (exact same signal to the exact same slot on the same objects), the connection will fail and connect will return false"
This means that the connection already succeeded the first time, and that duplicate connections are not allowed when using Qt::UniqueConnection.
Does this mean that a Qt 5 style connection will always succeed? Are there any other reasons for failure?

The new-style connect can still fail at runtime for a variety of reasons:
Either sender or receiver is a null pointer. Obviously this requires a check that can only happen at runtime.
The PMF you specified for a signal is not actually a signal. Lacking proper C++ reflection capabilities, all you can do at compile time is checking that the signal is a non-static member function of the sender's class.
However, that's not enough to make it a signal: it also needs to be in a signals: section in your class definition. When moc sees your class definition, it will generate some metadata containing the information that that function is indeed a signal. So, at runtime, the pointer passed to connect is looked up in a table, and connect itself will fail if the pointer is not found (because you did not pass a signal).
The check on the previous point actually requires a comparison between pointers to member functions. It's a particularly tricky one, because it will typically involve different TUs:
1. One is the TU containing moc-generated data (typically a moc_class.cpp file). In this TU there's the aforementioned table containing, amongst other things, pointers to the signals (which are just ordinary member functions).
2. The other is the TU where you actually invoke connect(sender, &Sender::signal, ...), which generates the pointer that gets looked up in the table.
Now, the two TUs may be in the same application, or perhaps one is in a library and the other in your application, or maybe in two libraries, etc; your platform's ABI starts to get into play.
In theory, the pointers stored when doing 1. are identical to the pointers generated when doing 2.; in practice, we've found cases where this does not happen (cf. this bug report that I reported some time ago, where older versions of GNU ld on ARM generated code that failed the comparison).
For Qt this meant disabling certain optimizations and/or passing some extra flags to the places where we know this to happen and break user software. For instance, as of Qt 5.9, there is no support for -Bsymbolic* flags on GCC on anything but x86 and x86-64.
Of course, this does not mean we've found and fixed all the possible places. New compilers and more aggressive optimizations might trigger this bug again in the future, making connect return false, even when everything is supposed to work.

Yes it can fail if either sender or receiver are not valid objects (nullptr for example)
Example
QObject* obj1 = new QObject();
QObject* obj2 = new QObject();
// Will succeed
connect(obj1, &QObject::destroyed, obj2, &QObject::deleteLater);
delete obj1;
obj1 = nullptr;
// Will fail even if it compiles
connect(obj1, &QObject::destroyed, obj2, &QObject::deleteLater);

Do not try to register pointer type. I've used the macro
#define QT_REG_TYPE(T) qRegisterMetaType<T>(#T)
with pointer type CMyWidget*, that was the problem. Using the type directly worked.

No it's not always successful. The docs give an example here where connect would return false because the signal should not contain variable names.
// WRONG
QObject::connect(scrollBar, SIGNAL(valueChanged(int value)),
label, SLOT(setNum(int value)));

Passing a managed class member to a c++ method

I have a C++ dll with the following method:
//C++ dll method (external)
GetServerInterface(ServerInterface* ppIF /*[OUT]*/)
{
//The method will set ppIF
}
//ServerInterface is defined as:
typedef void * ServerInterface;
To access the dll from a C# project, I created a C++/CLI project and declared a managed class as follows:
public ref class ComWrapperManager
{
//
//
ServerInterface _serverInterface;
void Connect();
//
//
}
I use the Connect() method to call GetServerInterface as shown below. The first call works, the second doesn't. Can someone explain why? I need to persist that pointer as a member variable in the managed class. Any better way to do this?
void Connect()
{
ServerInterface localServerInterface;
GetServerInterface(&localServerInterface); //THIS WORKS
GetServerInterface(&_serverInterface); //THIS DOESNT
//Error 1 error C2664: 'ServerInterface ' :
//cannot convert parameter 1 from //'cli::interior_ptr<Type>'
//to 'ServerInterface *'
}

You are passing a pointer to a member of a managed object. Such pointers are special, known as interior pointers. They are tracked by the garbage collector, it will modify the pointer value when the managed object is moved when the GC compacts the heap.
Problem is, you are passing that pointer to unmanaged code. The GC is not capable of modifying the copy of the pointer value that the native code is using. Now disaster strikes when another thread triggers a garbage collection, just when the native code is executing and dereferences the pointer. The object no longer exists at the original address. Very, very bad. And extremely hard to diagnose since it is so unlikely to happen.
The compiler can see you making this mistake. And complains with C2664.
The workaround is to pass a pointer that's stored in a memory location that's not going to get moved by the GC. Such a location is very easy to come by, a local variable qualifies. It is stored on the stack, it isn't going to be moved. So make it look like this instead:
void Connect()
{
ServerInterface temp;
GetServerInterface(&temp);
this->_serverInterface = temp;
// etc..
}
Which you already discovered yourself, just don't forget to assign the class member.

Here's why you can't do the second one: _serverInterface is a void pointer that is part of a managed class. Think about what the garbage collector does... It's allowed to move the managed objects around in memory however it wants, so the address of the void pointer can change from moment to moment. Therefore, it's not valid to use that address.
There are two solutions to this:
As you noted, where you can pass the address of a stack variable to the unmanaged method. Unlike managed objects, the stack doesn't move when the garbage collector does its thing, so the address doesn't change. You can then take the data stored in the stack variable and copy it to the class field, and that works fine, because you're not dealing with the address of it.
As the other answerer noted, you can lock your managed object in memory. Once it can't move, you can take the address of the void pointer field without issue. (He's showing C# syntax where you're looking for C++/CLI syntax. I'm not at a compiler to check, but I believe that the C++/CLI syntax is not the same.)
Of the two solutions, I prefer #1, the one you already have implemented: Solution #2 introduces a block of unmovable memory in the middle of the space that the garbage collector wants to rearrange. Given a choice, I prefer not to hamstring the garbage collector.

Behaviour of non-const int pointer on a const int

#include<stdio.h>
int main()
{
const int sum=100;
int *p=(int *)&sum;
*p=101;
printf("%d, %d",*p,sum);
return 0;
}
/*
output
101, 101
*/
p points to a constant integer variable, then why/how does *p manage to change the value of sum?

It's undefined behavior - it's a bug in the code. The fact that the code 'appears to work' is meaningless. The compiler is allowed to make it so your program crashes, or it's allowed to let the program do something nonsensical (such as change the value of something that's supposed to be const). Or do something else altogether. It's meaningless to 'reason' about the behavior, since there is no requirement on the behavior.
Note that if the code is compiled as C++ you'll get an error since C++ won't implicitly cast away const. Hopefully, even when compiled as C you'll get a warning.

p contains the memory address of the variable sum. The syntax *p means the actual value of sum.
When you say
*p=101
you're saying: go to the address p (which is the address where the variable sum is stored) and change the value there. So you're actually changing sum.

You can see const as a compile-time flag that tells the compiler "I shouldn't modify this variable, tell me if I do." It does not enforce anything on whether you can actually modify the variable or not.
And since you are modifying that variable through a non-const pointer, the compiler is indeed going to tell you:
main.c: In function 'main':
main.c:6:16: warning: initialization discards qualifiers from pointer target type
You broke your own promise, the compiler warns you but will let you proceed happily.

The behavior is undefined, which means that it may produce different outcomes on different compiler implementations, architecture, compiler/optimizer/linker options.
For the sake of analysis, here it is:
(Disclaimer: I don't know compilers. This is just a logical guess at how the compiler may choose to handle this situation, from a naive assembly-language debugger perspective.)
When a constant integer is declared, the compiler has the choice of making it addressable or non-addressable.
Addressable means that the integer value will actually occupy a memory location, such that:
The lifetime will be static.
The value might be hard-coded into the binary, or initialized during program startup.
It can be accessed with a pointer.
It can be accessed from any binary code that knows of its address.
It can be placed in either read-only or writable memory section.
On everyday CPUs, non-writability is enforced by the memory management unit (MMU). Messing with the MMU is messy or impossible from user space, and it is not worth it for a mere const integer value.
Therefore, it will be placed into writable memory section, for simplicity's sake.
If the compiler chooses to place it in non-writable memory, your program will crash (access violation) when it tries to write to the non-writable memory.
Setting aside microcontrollers - you would not have asked this question if you were working on microcontrollers.
Non-addressable means that it does not occupy a memory address. Instead, every piece of code that references the variable (i.e. uses the value of that integer) will receive an r-value, as if you did a find-and-replace to change every instance of sum into a literal 100.
In some cases, the compiler cannot make the integer non-addressable: if the compiler knows that you're taking the address of it, then surely the compiler knows that it has to put that value in memory. Your code belongs to this case.
Yet, with some aggressively-optimizing compiler, it is entirely possible to make it non-addressable: the variable could have been eliminated and the printf will be turned into int main() { printf("%s, %s", (b1? "100" : "101"), (b2? "100" : "101")); return 0; } where b1 and b2 will depend on the mood of the compiler.
The compiler will sometimes take a split decision - it might do one of those, or even something entirely different:
Allocate a memory location, but replace every reference with a constant literal. When this happens, a debugger will tell you the value is zero but any code that uses that location will appear to contain a hard-coded value.
Some compilers may be able to detect that the cast causes undefined behavior and refuse to compile.

Shared XDR routines and pointer to .rodata section

I used rpcgen to generate the client and server stubs for a program I'm developing. The stubs use XDR to encapsulate data and send it over the network. When I execute this piece of code, a segmentation fault is thrown:
char *str = "Hello!";
my_remote_call(str, strlen(str));
Instead, no problems if I modify it in this way:
char *str = "Hello!";
char *str2 = (char*) malloc(strlen(str));
memcpy(str2, str, strlen(str));
my_remote_call(str2, strlen(str2));
With GDB I found the segmentation fault is generated in the xdr_u_char() function called by my_remote_call(). My question is:
In the first case the Hello string is placed in the .rodata section by the compiler, while in the second a part of the heap is used to store the string. Might the segmentation fault be generated because the xdr_u_char signature explicitly requires a char* and not a const char*, as you can see here? Does this mean that the xdr_u_char() function changes my data?

I believe it is changing data when it is receiving, and not sending, it. Are you sure your remote call is indeed using XDR with XDR_ENCODE mode?

To transmit string to XDR you should use xdr_string not xdr_u_char; show us the *.x file for rpcgen ...
