forcing stack w/i 32bit when -m64 -mcmodel=small - 32bit-64bit

have C sources that must compile in 32bit and 64bit for multiple platforms.
structure that takes the address of a buffer - need to fit address in a 32bit value.
obviously where possible these structures will use natural sized void * or char * pointers.
however for some parts an api specifies the size of these pointers as 32bit.
on x86_64 linux with -m64 -mcmodel=small tboth static data and malloc()'d data fit within the 2Gb range. data on the stack, however, still starts in high memory.
so given a small utility _to_32() such as:
int _to_32( long l ) {
int i = l & 0xffffffff;
assert( i == l );
return i;
}
then:
char *cp = malloc( 100 );
int a = _to_32( cp );
will work reliably, as would:
static char buff[ 100 ];
int a = _to_32( buff );
but:
char buff[ 100 ];
int a = _to_32( buff );
will fail the assert().
anyone have a solution for this without writing custom linker scripts?
or any ideas how to arrange the linker section for stack data, would appear it is being put in this section in the linker script:
.lbss :
{
*(.dynlbss)
*(.lbss .lbss.* .gnu.linkonce.lb.*)
*(LARGE_COMMON)
}
thanks!

The stack location is most likely specified by the operating system and has nothing to do with the linker.
I can't imagine why you are trying to force a pointer on a 64 bit machine into 32 bits. The memory layout of structures is mainly important when you are sharing the data with something which may run on another architecture and saving to a file or sending across a network, but there are almost no valid reasons that you would send a pointer from one computer to another. Debugging is the only valid reason that comes to mind.
Even storing a pointer to be used later by another run of your program on the same machine would almost certainly be wrong since where your program is loaded can differ. Making any use of such a pointer would be undefined abd unpredictable.

the short answer appears to be there is no easy answer. at least no easy way to reassign range/location of the stack pointer.
the loader 'ld-linux.so' at a very early stage in process activation gets the address in the hurd loader - in the glibc sources, elf/ and sysdeps/x86_64/ search out elf_machine_load_address() and elf_machine_runtime_setup().
this happens in the preamble of calling your _start() entry and related setup to call your main(), is not for the faint hearted, even i couldn't convince myself this was a safe route.
as it happens - the resolution presents itself in some other old school tricks... pointer deflations/inflation...
with -mcmodel=small then automatic variables, alloca() addresses, and things like argv[], and envp are assigned from high memory from where the stack will grow down. those addresses are verified in this example code:
#include <stdlib.h>
#include <stdio.h>
#include <alloca.h>
extern char etext, edata, end;
char global_buffer[128];
int main( int argc, const char *argv[], const char *envp )
{
char stack_buffer[128];
static char static_buffer[128];
char *cp = malloc( 128 );
char *ap = alloca( 128 );
char *xp = "STRING CONSTANT";
printf("argv[0] %p\n",argv[0]);
printf("envp %p\n",envp);
printf("stack %p\n",stack_buffer);
printf("global %p\n",global_buffer);
printf("static %p\n",static_buffer);
printf("malloc %p\n",cp);
printf("alloca %p\n",ap);
printf("const %p\n",xp);
printf("printf %p\n",printf);
printf("First address past:\n");
printf(" program text (etext) %p\n", &etext);
printf(" initialized data (edata) %p\n", &edata);
printf(" uninitialized data (end) %p\n", &end);
}
produces this output:
argv[0] 0x7fff1e5e7d99
envp 0x7fff1e5e6c18
stack 0x7fff1e5e6a80
global 0x6010e0
static 0x601060
malloc 0x602010
alloca 0x7fff1e5e69d0
const 0x400850
printf 0x4004b0
First address past:
program text (etext) 0x400846
initialized data (edata) 0x601030
uninitialized data (end) 0x601160
all access to/from the 32bit parts of structures must be wrapped with inflate() and deflate() routines, e.g.:
void *inflate( unsigned long );
unsigned int deflate( void *);
deflate() tests for bits set in the range 0x7fff00000000 and marks the pointer so that inflate() will recognize how to reconstitute the actual pointer.
hope that helps if anyone similarly must support structures with 32bit storage for 64bit pointers.

Related

OpenCL: Passing a pointer to local memory

I have the following example code:
int compute_stuff(int *array)
{
/* do stuff with array */
...
return x;
}
__kernel void my_kernel()
{
__local int local_mem_block[LENGTH*MY_LOCAL_WORK_SIZE];
int result;
/* do stuff with local memory block */
result = compute_stuff(local_mem_block + (LENGTH*get_local_id(0)));
...
}
The above example compiles and executes fine on my NVIDIA card (RTX 2080).
But when I try to compile on a Macbook with AMD card, I get the following error:
error: passing '__local int *' to parameter of type '__private int *' changes address space of pointer
OK, so then I change the "compute_stuff" function to the following:
int compute_stuff(__local int *array)
Now both NVIDIA and AMD compile it fine, no problem...
But then I have one more test, to compile it on the same Macbook using WINE (rather than boot to Windows in bootcamp), and it gives the following error:
error: parameter may not be qualified with an address space
So it seems as though one is not supposed to qualify a function parameter with an address space. Fair enough. But if I do not do that, then the AMD on native Windows thinks that I am trying to change the address space of the pointer to private (I guess because it assumes that all function arguments will be private?).
What is a good way to handle this so that all three environments are happy to compile it? As a last resort, I am thinking of simply having the program check to see if the build failed without qualifier, and if so, substitute in the "__local" qualifier and build a second time... Seems like a hack, but it could work.
I agree with ProjectPhysX that it appears to be a bug with the WINE implementation. I also found the following appears to satisfy all three environments:
int compute_stuff(__local int * __private array)
{
...
}
__kernel void my_kernel()
{
__local int local_mem_block[LENGTH*MY_LOCAL_WORK_SIZE];
__local int * __private samples;
samples = local_mem_block + (LENGTH*get_local_id(0));
result = compute_stuff(samples);
}
The above is explicitly stating that the pointer itself is private while the memory it is pointing to is kept in local address space. So this removes any ambiguity.
The int* in int compute_stuff(int *array) is __generic address space. The call result = compute_stuff(local_mem_block+...); implicitly converts it to __local, which is allowed according to the OpenCL 2.0 Khronos specification.
It could be that AMD defaults to OpenCL 1.2. Maybe explicitely set –cl-std=CL2.0 in clBuildProgram() or clCompileProgram().
To keep the code compatible with OpenCL 1.2, you can explicitly set the pointer in the function to __local: int compute_stuff(__local int *array). OpenCL allows to set function parameters to the address spaces __global and __local. WINE seems to have a bug here. Maybe inlining the function can solve it: int __attribute__((always_inline)) compute_stuff(__local int *array).
As a last resort, you can do your proposed method. You can detect if it runs on WINE system like this. With that, you could switch between the two code variants without compiling twice and detecting the error.

Finding pointer with 'find out what writes to this address' strange offset

I'm trying to find a base pointer for UrbanTerror42.
My setup is as followed, I have a server with 2 players.
cheat-engine runs on client a.
I climb a ladder with client b and then scan for incease/decrease.
When I have found the values, I use find out what writes to this address.
But the offset are very high and point to empty memory.
I don't really know how to proceed
For the sake of clarity, I have looked up several other values and they have the same problem
I've already looked at a number of tutorials and forums, but that's always about values where the offsets are between 0 and 100 and not 80614.
I would really appreciate it if someone could tell me why this happened and what I have to do/learn to proceed.
thanks in advance
Urban Terror uses the Quake Engine. Early versions of this engine use the Quake Virtual Machine and the game logic is implemented as bytecode which is compiled into assembly by the Quake Virtual Machine. Custom allocation routines are used to load these modules into memory, relative and hardcoded offsets/addresses are created at runtime to accommodate these relocations and do not use the normal relocation table method of the portable executable file format. This is why you see these seemingly strange numbers that change every time you run the game.
The Quake Virtual Machines are file format .qvm and these qvms in memory are tracked in the QVM table. You must find the QVM table to uncover this mystery. Once you find the 2-3 QVMs and record their addresses, finding the table is easy, as you're simply doing a scan for pointers that point to these addresses and narrowing down your results by finding those which are close in memory to each other.
The QVM is defined like:
struct vmTable_t
{
vm_t vm[3];
};
struct vm_s {
// DO NOT MOVE OR CHANGE THESE WITHOUT CHANGING THE VM_OFFSET_* DEFINES
// USED BY THE ASM CODE
int programStack; // the vm may be recursively entered
intptr_t(*systemCall)(intptr_t *parms);
//------------------------------------
char name[MAX_QPATH];
// for dynamic linked modules
void *dllHandle;
intptr_t entryPoint; //(QDECL *entryPoint)(int callNum, ...);
void(*destroy)(vm_s* self);
// for interpreted modules
qboolean currentlyInterpreting;
qboolean compiled;
byte *codeBase;
int codeLength;
int *instructionPointers;
int instructionCount;
byte *dataBase;
int dataMask;
int stackBottom; // if programStack < stackBottom, error
int numSymbols;
struct vmSymbol_s *symbols;
int callLevel; // counts recursive VM_Call
int breakFunction; // increment breakCount on function entry to this
int breakCount;
BYTE *jumpTableTargets;
int numJumpTableTargets;
};
typedef struct vm_s vm_t;
The value in EAX in your original screenshot should be the same as either the codeBase or dataBase member variable of the QVM structure. The offsets are just relative to these addresses. Similarly to how you deal with ASLR, you must calculate the addresses at runtime.
Here is a truncated version of my code that does exactly this and additionally grabs important structures from memory, as an example:
void OA_t::GetVM()
{
cg = nullptr;
cgs = nullptr;
cgents = nullptr;
bLocalGame = false;
cgame = nullptr;
for (auto &vm : vmTable->vm)
{
if (strstr(vm.name, "qagame")) { bLocalGame = true; continue; }
if (strstr(vm.name, "cgame"))
{
cgame = &vm;
gamestatus = GSTAT_GAME;
//char* gamestring = Cvar_VariableString("fs_game");
switch (cgame->instructionCount)
{
case 136054: //version 88
cgents = (cg_entities*)(cgame->dataBase + 0x1649c);
cg = (cg_t*)(cgame->dataBase + 0xCC49C);
cgs = (cgs_t*)(cgame->dataBase + 0xf2720);
return;
Full source code for reference available at OpenArena Aimbot Source Code, it even includes a video overview of the code.
Full disclosure: that is a link to my website and the only viable resource I know of that covers this topic.

Can I assign a #define value to Char*?

I have been writing a C - programm. Where I have a structure with char* members.
#define SS_Value_1 "Value for SS1"
#define SS_Value_1 "Value for SS2"
struct aSamplestruct {
char* s1;
char* s2;
}aSample;
aSample ss;
fun1( &aSample );
I am sending the structure point to a function and I know the best practice is to allocate memory to S1 and copy the string what ever we want and free the allocated memory after usage, As shown below
ss->s1 = (char*) MEM_alloc(sizeof(char) * (strlen(SS_Value_1) + 1);
strcpy(ss->s1, SS_Value_1);
use the variable ss.s1 in a report and do mem free.
MEM_free(ss.s1);
Its working fine no worries. but I have to write the same piece of code for some 10 char* members in 36 different conditions.
The other way is that, without allocating any memory I am able to directly assign the #define values to my structure members as below.
ss->s1 = SS_Value_1;
use this variable in a report and no need to free any memory.
this way also fine, no problems in a sample execution.
what I would like to know is
whether this will cause any memory leaks ?
will it stop executing for large data ?
Thanks in advance
Regards,
Sudhir
This is similar to
char *str;
str = "abc";
That is, you are declaring a char pointer, creating a string literal and assigning its address to the pointer. should work for any amount of data.

reading an array in a function

I am trying using the arduino IDE to write a sketch. I have data in progmem and want to move the data with a function to a memory address allocated using malloc. My code is below:
const uint8_t Data_6 [256] PROGMEM = { 0x11, 0x39......};
void setup() {
Serial.begin(57600);
oddBallData (Data_6, 0x00, 256);
}
void main() {
}
void oddBallData(const uint8_t *data, uint8_t mem, uint16_t bytes) {
uint8_t *buff1 = (uint8_t*)malloc(sizeof(bytes));
if (buff1 = 0) {
Serial.println(F("FATAL ERROR - NO MEMORY"));
}
else {
for (uint16_t x = 0; x < 6; x++ ) {
buff1[x] = data[x]; //edited from data[0] to [x] made a mistake in post
Serial.println(buff1[x],HEX);
}
}
buff1[0] = data[0];
Serial.println(buff1[0],HEX);
free(buff1);
}
I have some data saved in progmem and want to write that data to a second device using i2c protocol. I have multiple constant arrays of data saved to my progmem, with different sizes. So I have used malloc to reserve some memory from the heap, inside of the function.
I have not been able to write the data from the progmem so I have stripped things back to so that I am just trying to point to the progmem data using malloc and then print it.
This is where I found a the problem. If I print a single array entry from the data constant. It prints the correct value. If I use a loop I get mixed results, the loop works as long as the condition check value is below 3 or sometimes below 6!!!...?
If above this value the entire print is just garbage. Can anyone explain what I am seeing?
The culprit is probably
uint8_t *buff1 = (uint8_t*)malloc(sizeof(bytes));
sizeof(bytes) returns the size of the variable (which is probably 2 bytes) so you are just allocating 2 bytes of memory. You should use the value directly, eg:
uint8_t* buff1 = malloc(bytes);
Mind that the cast is not required in C since a void* is convertible to any other pointer type directly.
Again - AVR PROGMEM is not directly accessible from memory space, it needs different instruction than access into the RAM. If you are using it like this, you'll get RAM content on passed address, not the FLASH one. You have to use special functions for this. For example memcpy_P(ram_buff,flash_ptr); makes a copy from flash into the ram. Or you can read one byte by pgm_read_byte(flash_ptr + offset)
BTW: If you are using Data_6[0] and it's working, it's just because compiler sees it as a constant and constant can be replaced by its value compile time.
I Guess you just forgot to flush()
try to do Serial.flushI() after Serial.println(buff1[x],HEX);
you can also check flush documentation

Condition Variable in Shared Memory - is this code POSIX-conformant?

Does the POSIX standard allow a named shared memory block to contain a mutex and condition variable?
We've been trying to use a mutex and condition variable to synchronise access to named shared memory by two processes on a LynuxWorks LynxOS-SE system (POSIX-conformant).
One shared memory block is called "/sync" and contains the mutex and condition variable, the other is "/data" and contains the actual data we are syncing access to.
We're seeing failures from pthread_cond_signal() if both processes don't perform the mmap() calls in exactly the same order, or if one process mmaps in some other piece of shared memory before it mmaps the "/sync" memory.
This example code is about as short as I can make it:
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/file.h>
#include <stdlib.h>
#include <pthread.h>
#include <errno.h>
#include <iostream>
#include <string>
using namespace std;
static const string shm_name_sync("/sync");
static const string shm_name_data("/data");
struct shared_memory_sync
{
pthread_mutex_t mutex;
pthread_cond_t condition;
};
struct shared_memory_data
{
int a;
int b;
};
//Create 2 shared memory objects
// - sync contains 2 shared synchronisation objects (mutex and condition)
// - data not important
void create()
{
// Create and map 'sync' shared memory
int fd_sync = shm_open(shm_name_sync.c_str(), O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
ftruncate(fd_sync, sizeof(shared_memory_sync));
void* addr_sync = mmap(0, sizeof(shared_memory_sync), PROT_READ|PROT_WRITE, MAP_SHARED, fd_sync, 0);
shared_memory_sync* p_sync = static_cast<shared_memory_sync*> (addr_sync);
// init the cond and mutex
pthread_condattr_t cond_attr;
pthread_condattr_init(&cond_attr);
pthread_condattr_setpshared(&cond_attr, PTHREAD_PROCESS_SHARED);
pthread_cond_init(&(p_sync->condition), &cond_attr);
pthread_condattr_destroy(&cond_attr);
pthread_mutexattr_t m_attr;
pthread_mutexattr_init(&m_attr);
pthread_mutexattr_setpshared(&m_attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&(p_sync->mutex), &m_attr);
pthread_mutexattr_destroy(&m_attr);
// Create the 'data' shared memory
int fd_data = shm_open(shm_name_data.c_str(), O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
ftruncate(fd_data, sizeof(shared_memory_data));
void* addr_data = mmap(0, sizeof(shared_memory_data), PROT_READ|PROT_WRITE, MAP_SHARED, fd_data, 0);
shared_memory_data* p_data = static_cast<shared_memory_data*> (addr_data);
// Run the second process while it sleeps here.
sleep(10);
int res = pthread_cond_signal(&(p_sync->condition));
assert(res==0); // <--- !!!THIS ASSERT WILL FAIL ON LYNXOS!!!
munmap(addr_sync, sizeof(shared_memory_sync));
shm_unlink(shm_name_sync.c_str());
munmap(addr_data, sizeof(shared_memory_data));
shm_unlink(shm_name_data.c_str());
}
//Open the same 2 shared memory objects but in reverse order
// - data
// - sync
void open()
{
sleep(2);
int fd_data = shm_open(shm_name_data.c_str(), O_RDWR, S_IRUSR|S_IWUSR);
void* addr_data = mmap(0, sizeof(shared_memory_data), PROT_READ|PROT_WRITE, MAP_SHARED, fd_data, 0);
shared_memory_data* p_data = static_cast<shared_memory_data*> (addr_data);
int fd_sync = shm_open(shm_name_sync.c_str(), O_RDWR, S_IRUSR|S_IWUSR);
void* addr_sync = mmap(0, sizeof(shared_memory_sync), PROT_READ|PROT_WRITE, MAP_SHARED, fd_sync, 0);
shared_memory_sync* p_sync = static_cast<shared_memory_sync*> (addr_sync);
// Wait on the condvar
pthread_mutex_lock(&(p_sync->mutex));
pthread_cond_wait(&(p_sync->condition), &(p_sync->mutex));
pthread_mutex_unlock(&(p_sync->mutex));
munmap(addr_sync, sizeof(shared_memory_sync));
munmap(addr_data, sizeof(shared_memory_data));
}
int main(int argc, char** argv)
{
if(argc>1)
{
open();
}
else
{
create();
}
return (0);
}
Run this program with no args, then another copy with args, and the first one will fail at the assert checking the pthread_cond_signal().
But change the order of the open() function to mmap() the "/sync" memory before the "/data" and it will all work fine.
This seems like a major bug in LynxOS to me, but LynuxWorks claim that using mutex and condition variable within named shared memory in this way is not covered by the POSIX standard, so they are not interested.
Can anyone determine if this code does actually violate POSIX?
Or does anyone have any convincing documentation that it is POSIX compliant?
Edit: we know that PTHREAD_PROCESS_SHARED is POSIX and is supported by LynxOS. The area of contention is whether mutexes and semaphores can be used within named shared memory (as we have done) or if POSIX only allows them to be used when one process creates and mmaps the shared memory and then forks the second process.
The pthread_mutexattr_setpshared function may be used to allow a pthread mutex in shared memory to be accessed by any thread which has access to that memory, even threads in different processes. According to this link, pthread_mutex_setpshared conforms to POSIX P1003.1c. (Same thing goes for the condition variable, see pthread_condattr_setpshared.)
Related question: pthread condition variables on Linux, odd behaviour
I can easily see how PTHREAD_PROCESS_SHARED can be tricky to implement on the OS-level (e.g. MacOS doesn't, except for rwlocks it seems). But just from reading the standard, you seem to have a case.
For completeness, you might want to assert on sysconf(_SC_THREAD_PROCESS_SHARED) and the return value of the *_setpshared() function calls— maybe there's another "surprise" waiting for you (but I can see from the comments that you already checked that SHARED is actually supported).
#JesperE: you might want to refer to the API docs at the OpenGroup instead of the HP docs.
May be there is some pointers in pthread_cond_t (without pshared), so you must place it into the same addresses in both threads/processes. With same-ordered mmaps you may get a equal addresses for both processes.
In glibc the pointer in cond_t was to thread descriptor of thread, owned mutex/cond.
You can control addresses with non-NULL first parameter to mmap.

Resources