Finding pointer with 'find out what writes to this address' strange offset - pointers

I'm trying to find a base pointer for UrbanTerror42.
My setup is as followed, I have a server with 2 players.
cheat-engine runs on client a.
I climb a ladder with client b and then scan for incease/decrease.
When I have found the values, I use find out what writes to this address.
But the offset are very high and point to empty memory.
I don't really know how to proceed
For the sake of clarity, I have looked up several other values and they have the same problem
I've already looked at a number of tutorials and forums, but that's always about values where the offsets are between 0 and 100 and not 80614.
I would really appreciate it if someone could tell me why this happened and what I have to do/learn to proceed.
thanks in advance

Urban Terror uses the Quake Engine. Early versions of this engine use the Quake Virtual Machine and the game logic is implemented as bytecode which is compiled into assembly by the Quake Virtual Machine. Custom allocation routines are used to load these modules into memory, relative and hardcoded offsets/addresses are created at runtime to accommodate these relocations and do not use the normal relocation table method of the portable executable file format. This is why you see these seemingly strange numbers that change every time you run the game.
The Quake Virtual Machines are file format .qvm and these qvms in memory are tracked in the QVM table. You must find the QVM table to uncover this mystery. Once you find the 2-3 QVMs and record their addresses, finding the table is easy, as you're simply doing a scan for pointers that point to these addresses and narrowing down your results by finding those which are close in memory to each other.
The QVM is defined like:
struct vmTable_t
{
vm_t vm[3];
};
struct vm_s {
// DO NOT MOVE OR CHANGE THESE WITHOUT CHANGING THE VM_OFFSET_* DEFINES
// USED BY THE ASM CODE
int programStack; // the vm may be recursively entered
intptr_t(*systemCall)(intptr_t *parms);
//------------------------------------
char name[MAX_QPATH];
// for dynamic linked modules
void *dllHandle;
intptr_t entryPoint; //(QDECL *entryPoint)(int callNum, ...);
void(*destroy)(vm_s* self);
// for interpreted modules
qboolean currentlyInterpreting;
qboolean compiled;
byte *codeBase;
int codeLength;
int *instructionPointers;
int instructionCount;
byte *dataBase;
int dataMask;
int stackBottom; // if programStack < stackBottom, error
int numSymbols;
struct vmSymbol_s *symbols;
int callLevel; // counts recursive VM_Call
int breakFunction; // increment breakCount on function entry to this
int breakCount;
BYTE *jumpTableTargets;
int numJumpTableTargets;
};
typedef struct vm_s vm_t;
The value in EAX in your original screenshot should be the same as either the codeBase or dataBase member variable of the QVM structure. The offsets are just relative to these addresses. Similarly to how you deal with ASLR, you must calculate the addresses at runtime.
Here is a truncated version of my code that does exactly this and additionally grabs important structures from memory, as an example:
void OA_t::GetVM()
{
cg = nullptr;
cgs = nullptr;
cgents = nullptr;
bLocalGame = false;
cgame = nullptr;
for (auto &vm : vmTable->vm)
{
if (strstr(vm.name, "qagame")) { bLocalGame = true; continue; }
if (strstr(vm.name, "cgame"))
{
cgame = &vm;
gamestatus = GSTAT_GAME;
//char* gamestring = Cvar_VariableString("fs_game");
switch (cgame->instructionCount)
{
case 136054: //version 88
cgents = (cg_entities*)(cgame->dataBase + 0x1649c);
cg = (cg_t*)(cgame->dataBase + 0xCC49C);
cgs = (cgs_t*)(cgame->dataBase + 0xf2720);
return;
Full source code for reference available at OpenArena Aimbot Source Code, it even includes a video overview of the code.
Full disclosure: that is a link to my website and the only viable resource I know of that covers this topic.

Related

Understanding the method for OpenCL reduction on float

Following this link, I try to understand the operating of kernel code (there are 2 versions of this kernel code, one with volatile local float *source and the other with volatile global float *source, i.e local and global versions). Below I take local version :
float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
union {
unsigned int intVal;
float floatVal;
} newVal;
union {
unsigned int intVal;
float floatVal;
} prevVal;
do {
prevVal.floatVal = *source;
newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
Afterwards, if I take a work-item, the code will add operand value to newVal.floatVal variable. Then, after this operation, I call atomic_cmpxchg function which check if previous assignment (preVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand; ) has been done, i.e by comparing the value stored at address source with the preVal.intVal.
During this atomic operation (which is not uninterruptible by definition), as value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is coded on 4 bytes like integer).
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address.
But for each work-item thread, is there only one iteration into the while loop?
I think there will be one iteration because the comparison "*source== prevVal.int ? newVal.intVal : newVal.intVal" will always assign newVal.intVal value to value stored at source address, won't it?
I have not understood all the subtleties of this trick for this kernel code.
Update
Sorry, I almost understand all the subtleties, especially in the while loop :
First case : for a given single thread, before the call of atomic_cmpxchg, if prevVal.floatVal is still equal to *source, then atomic_cmpxchg will change the value contained in source pointer and return the value contained in old pointer, which is equal to prevVal.intVal, so we break from the while loop.
Second case : If between the prevVal.floatVal = *source; instruction and the call of atomic_cmpxchg, the value *source has changed (by another thread ??) then atomic_cmpxchg returns old value which is no more equal to prevVal.floatVal, so the condition into while loop is true and we stay in this loop until previous condition isn't checked any more.
Is my interpretation correct?
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, force a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).
do {
prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
newVal.floatVal = prevVal.floatVal + operand;
} while(atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal)
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
int old = *ptr;
if (old == expected) {
*ptr = new;
}
return old;
}
To summarize, each thread, individually, does:
Load current value of *source from memory into preVal.floatVal
Compute desired value of *source in newVal.floatVal
Execute the atomic compare-exchange described above (using the type-punned values)
If the result of atomic_cmpxchg == newVal.intVal, it means the compare-exchange was successful, break. Otherwise, the exchange didn't happen, go to 1 and try again.
The above loop eventually terminates, because eventually, each thread succeeds in doing their atomic_cmpxchg.
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address.
Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics) but this is not one.

variable in a header file shared between different projects

I have a solution which includes three projects. one is creating static library i.e .lib file. It contains one header file main.h and one main.cpp file. cpp file contains the definition of functions of header file.
second project is .exe project which includes the header file main.h and calls a function of header file.
third project is also a .exe project which includes the header file and uses a variable flag of header file.
Now both .exe projects are creating different instance of the variable. But I want to share same instance of the variable between the projects dynamically. as I have to map the value generated by one project into other project at the same instant.
Please help me as I am nearing my project deadline.
Thanks for the help.
Here are some part of the code.
main.cpp and main.h are files of .lib project
main.h
extern int flag;
extern int detect1(void);
main.cpp
#include<stdio.h>
#include"main.h"
#include <Windows.h>
#include <ShellAPI.h>
using namespace std;
using namespace cv;
int flag=0;
int detect1(void)
{
int Cx=0,Cy=0,Kx=20,Ky=20,Sx=0,Sy=0,j=0;
//create the cascade classifier object used for the face detection
CascadeClassifier face_cascade;
//use the haarcascade_frontalface_alt.xml library
face_cascade.load("E:\\haarcascade_frontalface_alt.xml");
//System::DateTime now = System::DateTime::Now;
//cout << now.Hour;
//WinExec("E:\\FallingBlock\\FallingBlock\\FallingBlock\\bin\\x86\\Debug\\FallingBlock.exe",SW_SHOW);
//setup video capture device and link it to the first capture device
VideoCapture captureDevice;
captureDevice.open(0);
//setup image files used in the capture process
Mat captureFrame;
Mat grayscaleFrame;
//create a window to present the results
namedWindow("capture", 1);
//create a loop to capture and find faces
while(true)
{
//capture a new image frame
captureDevice>>captureFrame;
//convert captured image to gray scale and equalize
cvtColor(captureFrame, grayscaleFrame, CV_BGR2GRAY);
equalizeHist(grayscaleFrame, grayscaleFrame);
//create a vector array to store the face found
std::vector<Rect> faces;
//find faces and store them in the vector array
face_cascade.detectMultiScale(grayscaleFrame, faces, 1.1, 3, CV_HAAR_FIND_BIGGEST_OBJECT|CV_HAAR_SCALE_IMAGE, Size(30,30));
//draw a rectangle for all found faces in the vector array on the original image
for(unsigned int i = 0; i < faces.size(); i++)
{
Point pt1(faces[i].x + faces[i].width, faces[i].y + faces[i].height);
Point pt2(faces[i].x, faces[i].y);
rectangle(captureFrame, pt1, pt2, cvScalar(0, 255, 0, 0), 1, 8, 0);
if(faces.size()>=1)
j++;
Cx = faces[i].x + (faces[i].width / 2);
Cy = faces[i].y + (faces[i].height / 2);
if(j==1)
{
Sx=Cx;
Sy=Cy;
flag=0;
}
}
if(Cx-Sx > Kx)
{
flag = 1;
printf("%d",flag);
}
else
{
if(Cx-Sx < -Kx)
{
flag = 2;
printf("%d",flag);
//update(2);
}
else
{
if(Cy-Sy > Ky)
{
flag = 3;
printf("%d",flag);
//update(3);
}
else
{
if(Cy-Sy < -Ky)
{
flag = 4;
printf("%d",flag);
//update(4);
}
else
if(abs(Cx-Sx) < Kx && abs(Cy-Sy)<Ky)
{
flag = 0;
printf("%d",flag);
//update(0);
}
}
}
}
2nd project's code
face.cpp
#include"main.h"
#include<stdio.h>
int main()
{
detect1();
}
3rd project's code
tetris.cpp
#include"main.h"
int key;
key = flag;
if(key==0)
{
MessageBox(hwnd,"Space2","TetRiX",0);
}
if(key==4)
{
tetris.handleInput(1);
tetris.drawScreen(2);
//MessageBox(hwnd,"Space2","TetRiX",0);
}
You need to look up how to do inter-process communication in the operating system under which your applications will run. (At this point I assume that the processes are running on the same computer.) It looks like you're using Windows (based on seeing a call to "MessageBox") so the simplest means would be for both processes to use RegisterWindowMessage create a commonly-understood message value, and then send the data via LPARAM using either PostMessage or SendMessage. (You'll need each of them to get the window handle of the other, which is reasonably easy.) You'll want to have some sort of exclusion mechanism (mutex or critical section) in both processes to ensure that the shared value can't be read and written at the same time. If both processes can do the "change and exchange" then you'll have an interesting problem to solve if both try to do that at the same time, because you'll have to deal with the possibility of deadlocks over that shared value.
You can also use shared memory, but that's a bit more involved.
If the processes are on different computers you'll need to do it via TCP/IP or a protocol on top of TCP/IP. You could use a pub-sub arrangement--or any number of things. Without an understanding of exactly what you're trying to accomplish, it's difficult to know what to recommend.
(For the record, there is almost no way in a multi-process/multi-threaded O/S to share something "at the same instant." You can get arbitrarily close, but computers don't work like that.)
Given the level of difficulty involved, is there some other design that might make this cleaner? Why do these processes have to exchange information this way? Must it be done using separate processes?

segmentation fault when using shared memory created by open_shm on Xeon Phi

I have written my code for single Xeon Phi node( with 61 cores on it). I have two files. I have called MPI_Init(2) before calling any other mpi calls. I have found ntasks, rank also using mpi calls. I have also included all the required libraries. Still i get an error. Can you please help me out with this?
In file 1:
int buffsize;
int *sendbuff,**recvbuff,buffsum;
int *shareRegion;
shareRegion = (int*)gInit(MPI_COMM_WORLD, buffsize, ntasks); /* gInit is in file 2 */
buffsize=atoi(argv[1]);
sendbuff=(int *)malloc(sizeof(int)*buffsize);
if( taskid == 0 ){
recvbuff=(int **)malloc(sizeof(int *)*ntasks);
recvbuff[0]=(int *)malloc(sizeof(int)*ntasks*buffsize);
for(i=1;i<ntasks;i++)recvbuff[i]=recvbuff[i-1]+buffsize;
}
else{
recvbuff=(int **)malloc(sizeof(int *)*1);
recvbuff[0]=(int *)malloc(sizeof(int)*1);
}
for(i=0;i<buffsize;i++){
sendbuff[i]=1;
MPI_Barrier(MPI_COMM_WORLD);
call(sendbuff, buffsize, shareRegion, recvbuff[0],buffsize,taskid,ntasks);
In file 2:
void* gInit( MPI_Comm comm, int size, int num_proc)
{
int share_mem = shm_open("share_region", O_CREAT|O_RDWR,0666 );
if( share_mem == -1)
return NULL;
int rank;
MPI_Comm_rank(comm,&rank);
if( ftruncate( share_mem, sizeof(int)*size*num_proc) == -1 )
return NULL;
int* shared = mmap(NULL, sizeof(int)*size*num_proc, PROT_WRITE | PROT_READ, MAP_SHARED, share_mem, 0);
if(shared == (void*)-1)
printf("error in mem allocation (mmap)\n");
*(shared+(rank)) = 0
MPI_Barrier(MPI_COMM_WORLD);
return shared;
}
void call(int *sendbuff, int sendcount, volatile int *sharedRegion, int **recvbuff, int recvcount, int rank, int size)
{
int i=0;
int k,j;
j=rank*sendcount;
for(i=0;i<sendcount;i++)
{
sharedRegion[j] = sendbuff[i];
j++;
}
if( rank == 0)
for(k=0;k<size;k++)
for(i=0;i<sendcount;i++)
{
j=0;
recvbuff[k][i] = sharedRegion[j];
j++;
}
}
Then i am doing some computation in file 1 on this recvbuff.
I get this segmentation fault while using sharedRegion variable.
MPI represents the Message Passing paradigm. That means, processes (ranks) are isolated and are generally running on a distributed machine. They communicate via explicit communication messages, recent versions allow also one-sideded, but still explicit, data transfer. You can not assume that shared memory is available for the processes. Have a look at any MPI tutorial to see how MPI is used.
Since you did not specify on what kind of machine you are running, any further suggestion is purely speculative. If you actually are on a shared memory machine, you may want to use a real shared memory paradigm instead, e.g. OpenMP.
While it's possible to restrict MPI to only use one machine and have shared memory (see the RMA chapter, especially in MPI-3), if you're only ever going to use one machine, it's easier to use some other paradigm.
However, if you're going to use multiple nodes and have multiple ranks on one node (multi-core processes for example), then it might be worth taking a look at MPI-3 RMA to see how it can help you with both locally shared memory and remote memory access. There are multiple papers out on the subject, but because they're so new, there's not a lot of good tutorials yet. You'll have to dig around a bit to find something useful to you.
The ordering of these two lines:
shareRegion = (int*)gInit(MPI_COMM_WORLD, buffsize, ntasks); /* gInit is in file 2 */
buffsize=atoi(argv[1]);
suggest that buffsize could possibly have different values before and after the call to gInit. If buffsize as passed in the first argument to the program is larger than its initial value while gInit is called, then out-of-bounds memory access would occur later and lead to a segmentation fault.
Hint: run your code as an MPI singleton (e.g. without mpirun) from inside a debugger (e.g. gdb) or change the limits so that cores would get dumped on error (e.g. with ulimit -c unlimited) and then examine the core file(s) with the debugger. Compiling with debug information (e.g. adding -g to the compiler options) helps a lot in such cases.

forcing stack w/i 32bit when -m64 -mcmodel=small

have C sources that must compile in 32bit and 64bit for multiple platforms.
structure that takes the address of a buffer - need to fit address in a 32bit value.
obviously where possible these structures will use natural sized void * or char * pointers.
however for some parts an api specifies the size of these pointers as 32bit.
on x86_64 linux with -m64 -mcmodel=small tboth static data and malloc()'d data fit within the 2Gb range. data on the stack, however, still starts in high memory.
so given a small utility _to_32() such as:
int _to_32( long l ) {
int i = l & 0xffffffff;
assert( i == l );
return i;
}
then:
char *cp = malloc( 100 );
int a = _to_32( cp );
will work reliably, as would:
static char buff[ 100 ];
int a = _to_32( buff );
but:
char buff[ 100 ];
int a = _to_32( buff );
will fail the assert().
anyone have a solution for this without writing custom linker scripts?
or any ideas how to arrange the linker section for stack data, would appear it is being put in this section in the linker script:
.lbss :
{
*(.dynlbss)
*(.lbss .lbss.* .gnu.linkonce.lb.*)
*(LARGE_COMMON)
}
thanks!
The stack location is most likely specified by the operating system and has nothing to do with the linker.
I can't imagine why you are trying to force a pointer on a 64 bit machine into 32 bits. The memory layout of structures is mainly important when you are sharing the data with something which may run on another architecture and saving to a file or sending across a network, but there are almost no valid reasons that you would send a pointer from one computer to another. Debugging is the only valid reason that comes to mind.
Even storing a pointer to be used later by another run of your program on the same machine would almost certainly be wrong since where your program is loaded can differ. Making any use of such a pointer would be undefined abd unpredictable.
the short answer appears to be there is no easy answer. at least no easy way to reassign range/location of the stack pointer.
the loader 'ld-linux.so' at a very early stage in process activation gets the address in the hurd loader - in the glibc sources, elf/ and sysdeps/x86_64/ search out elf_machine_load_address() and elf_machine_runtime_setup().
this happens in the preamble of calling your _start() entry and related setup to call your main(), is not for the faint hearted, even i couldn't convince myself this was a safe route.
as it happens - the resolution presents itself in some other old school tricks... pointer deflations/inflation...
with -mcmodel=small then automatic variables, alloca() addresses, and things like argv[], and envp are assigned from high memory from where the stack will grow down. those addresses are verified in this example code:
#include <stdlib.h>
#include <stdio.h>
#include <alloca.h>
extern char etext, edata, end;
char global_buffer[128];
int main( int argc, const char *argv[], const char *envp )
{
char stack_buffer[128];
static char static_buffer[128];
char *cp = malloc( 128 );
char *ap = alloca( 128 );
char *xp = "STRING CONSTANT";
printf("argv[0] %p\n",argv[0]);
printf("envp %p\n",envp);
printf("stack %p\n",stack_buffer);
printf("global %p\n",global_buffer);
printf("static %p\n",static_buffer);
printf("malloc %p\n",cp);
printf("alloca %p\n",ap);
printf("const %p\n",xp);
printf("printf %p\n",printf);
printf("First address past:\n");
printf(" program text (etext) %p\n", &etext);
printf(" initialized data (edata) %p\n", &edata);
printf(" uninitialized data (end) %p\n", &end);
}
produces this output:
argv[0] 0x7fff1e5e7d99
envp 0x7fff1e5e6c18
stack 0x7fff1e5e6a80
global 0x6010e0
static 0x601060
malloc 0x602010
alloca 0x7fff1e5e69d0
const 0x400850
printf 0x4004b0
First address past:
program text (etext) 0x400846
initialized data (edata) 0x601030
uninitialized data (end) 0x601160
all access to/from the 32bit parts of structures must be wrapped with inflate() and deflate() routines, e.g.:
void *inflate( unsigned long );
unsigned int deflate( void *);
deflate() tests for bits set in the range 0x7fff00000000 and marks the pointer so that inflate() will recognize how to reconstitute the actual pointer.
hope that helps if anyone similarly must support structures with 32bit storage for 64bit pointers.

JIT compilation and DEP

I was thinking of trying my hand at some jit compilataion (just for the sake of learning) and it would be nice to have it work cross platform since I run all the major three at home (windows, os x, linux).
With that in mind, I want to know if there is any way to get out of using the virtual memory windows functions to allocate memory with execution permissions. Would be nice to just use malloc or new and point the processor at such a block.
Any tips?
DEP is just turning off Execution permission from every non-code page of memory. The code of application is loaded to memory which has execution permission; and there are lot of JITs which works in Windows/Linux/MacOSX, even when DEP is active. This is because there is a way to dynamically allocate memory with needed permissions set.
Usually, plain malloc should not be used, because permissions are per-page. Aligning of malloced memory to pages is still possible at price of some overhead. If you will not use malloc, some custom memory management (only for executable code). Custom management is a common way of doing JIT.
There is a solution from Chromium project, which uses JIT for javascript V8 VM and which is cross-platform. To be cross-platform, the needed function is implemented in several files and they are selected at compile time.
Linux: (chromium src/v8/src/platform-linux.cc) flag is PROT_EXEC of mmap().
void* OS::Allocate(const size_t requested,
size_t* allocated,
bool is_executable) {
const size_t msize = RoundUp(requested, AllocateAlignment());
int prot = PROT_READ | PROT_WRITE | (is_executable ? PROT_EXEC : 0);
void* addr = OS::GetRandomMmapAddr();
void* mbase = mmap(addr, msize, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (mbase == MAP_FAILED) {
/** handle error */
return NULL;
}
*allocated = msize;
UpdateAllocatedSpaceLimits(mbase, msize);
return mbase;
}
Win32 (src/v8/src/platform-win32.cc): flag is PAGE_EXECUTE_READWRITE of VirtualAlloc
void* OS::Allocate(const size_t requested,
size_t* allocated,
bool is_executable) {
// The address range used to randomize RWX allocations in OS::Allocate
// Try not to map pages into the default range that windows loads DLLs
// Use a multiple of 64k to prevent committing unused memory.
// Note: This does not guarantee RWX regions will be within the
// range kAllocationRandomAddressMin to kAllocationRandomAddressMax
#ifdef V8_HOST_ARCH_64_BIT
static const intptr_t kAllocationRandomAddressMin = 0x0000000080000000;
static const intptr_t kAllocationRandomAddressMax = 0x000003FFFFFF0000;
#else
static const intptr_t kAllocationRandomAddressMin = 0x04000000;
static const intptr_t kAllocationRandomAddressMax = 0x3FFF0000;
#endif
// VirtualAlloc rounds allocated size to page size automatically.
size_t msize = RoundUp(requested, static_cast<int>(GetPageSize()));
intptr_t address = 0;
// Windows XP SP2 allows Data Excution Prevention (DEP).
int prot = is_executable ? PAGE_EXECUTE_READWRITE : PAGE_READWRITE;
// For exectutable pages try and randomize the allocation address
if (prot == PAGE_EXECUTE_READWRITE &&
msize >= static_cast<size_t>(Page::kPageSize)) {
address = (V8::RandomPrivate(Isolate::Current()) << kPageSizeBits)
| kAllocationRandomAddressMin;
address &= kAllocationRandomAddressMax;
}
LPVOID mbase = VirtualAlloc(reinterpret_cast<void *>(address),
msize,
MEM_COMMIT | MEM_RESERVE,
prot);
if (mbase == NULL && address != 0)
mbase = VirtualAlloc(NULL, msize, MEM_COMMIT | MEM_RESERVE, prot);
if (mbase == NULL) {
LOG(ISOLATE, StringEvent("OS::Allocate", "VirtualAlloc failed"));
return NULL;
}
ASSERT(IsAligned(reinterpret_cast<size_t>(mbase), OS::AllocateAlignment()));
*allocated = msize;
UpdateAllocatedSpaceLimits(mbase, static_cast<int>(msize));
return mbase;
}
MacOS (src/v8/src/platform-macos.cc): flag is PROT_EXEC of mmap, just like Linux or other posix.
void* OS::Allocate(const size_t requested,
size_t* allocated,
bool is_executable) {
const size_t msize = RoundUp(requested, getpagesize());
int prot = PROT_READ | PROT_WRITE | (is_executable ? PROT_EXEC : 0);
void* mbase = mmap(OS::GetRandomMmapAddr(),
msize,
prot,
MAP_PRIVATE | MAP_ANON,
kMmapFd,
kMmapFdOffset);
if (mbase == MAP_FAILED) {
LOG(Isolate::Current(), StringEvent("OS::Allocate", "mmap failed"));
return NULL;
}
*allocated = msize;
UpdateAllocatedSpaceLimits(mbase, msize);
return mbase;
}
And I also want note, that bcdedit.exe-like way should be used only for very old programs, which creates new executable code in memory, but not sets an Exec property on this page. For newer programs, like firefox or Chrome/Chromium, or any modern JIT, DEP should be active, and JIT will manage memory permissions in fine-grained manner.
One possibility is to make it a requirement that Windows installations running your program be either configured for DEP AlwaysOff (bad idea) or DEP OptOut (better idea).
This can be configured (under WinXp SP2+ and Win2k3 SP1+ at least) by changing the boot.ini file to have the setting:
/noexecute=OptOut
and then configuring your individual program to opt out by choosing (under XP):
Start button
Control Panel
System
Advanced tab
Performance Settings button
Data Execution Prevention tab
This should allow you to execute code from within your program that's created on the fly in malloc() blocks.
Keep in mind that this makes your program more susceptible to attacks that DEP was meant to prevent.
It looks like this is also possible in Windows 2008 with the command:
bcdedit.exe /set {current} nx OptOut
But, to be honest, if you just want to minimise platform-dependent code, that's easy to do just by isolating the code into a single function, something like:
void *MallocWithoutDep(size_t sz) {
#if defined _IS_WINDOWS
return VirtualMalloc(sz, OPT_DEP_OFF); // or whatever
#elif defined IS_LINUX
// Do linuxy thing
#elif defined IS_MACOS
// Do something almost certainly inexplicable
#endif
}
If you put all your platform dependent functions in their own files, the rest of your code is automatically platform-agnostic.

Resources