I've been studying C++ Game Hacking via tutorials for a week or two and I almost get it all.
However, there's on thing that keeps bothering me over and over again.
To customize a value (f.e. player's health) we must search the memory address with Cheat Engine (or such) and set the value to something else.
These memory addresses are obviously different every time we start the program, cause it wont always use the same spot in RAM.
To solve this problem, people will try to find a static pointer to the memory address which contains the value; how are the pointers static, how can they reserve a static address from RAM?
Actually, it's not the pointer to the game variable that is static, but the offset of the variable's address, in reference to the address of another data.
If the game you want to "hack" is always storing data in the same, solid structure, then it's possible to find these offsets. When you know the offsets, then the only thing you need to do when the game starts is to find the address which the offsets are referencing to, instead of performing multiple scans - one for each variable.
Edit:
Additionaly, programs are very likely to be given the same virtual address space every time you run them, so in practice it would look like the variables with static offsets have the same addresses every time you run the program (further reading here, and there).
Related
I work with Unity and C# - when making multiplayer games I've been told that when it comes to values like positions that are floats, I should use a bit shift operator on them before sending them and reverse the operation on receive. I have been told this not only allows for larger numbers values and is capable of maintaining floating point precision which may be lost. However, if I do not have to, I do not wish to run this operation every time I receive a packet unless I have to. Though the bottle necks seem to be the actual parsing of the bytes received. Especially without message framing and attempting to move from string to byte array. (But that's another story!)
My question are:
Are these valid reason to undergo the operation? Are they accurate statements?
If not should I be running bit shift ops on my floats?
If I should, what are the real reasons to do it?
Any additional information would be most appreciated.
One of the resourcesI'm referring to:
Main reasons for going back and forth to/from network byte order is to combat endianness caused problems, mainly to ensure each byte of multi byte values (long, int but also floats) is read and written in the way giving the same results regardless of architecture. This issue can be theoretically ignored if you are sure you are exchanging data between systems using the same endianness, but that's rather bad idea from very beginning as you are simply creating unneded technological debt and keep unjustified exceptions in the code ("It all works BUT on the same endianness only. What can go wrong?").
Depending on your app architecture you can rewrite the packet payload/data once you receive it and then use that version further in the code. Also note that you need to encode the data again prior sending it out.
I'm trying to work with 2D in vulkan along with 3D. So right now testing out updating a texture for every frame as whatever 2D is going on. I've gotten something of a texture updater working, the problem is that it's very slow and probably not the way it's supposed to be done. Is there any better way of getting this done? The code is based on the https://vulkan-tutorial.com/ code.
https://vulkan-tutorial.com/code/26_depth_buffering.cpp
void UpdateTexture()
{
vkDeviceWaitIdle(device);
vkFreeMemory(device, textureImageMemory, nullptr);
VkBuffer stagingBuffer;
VkDeviceMemory stagingBufferMemory;
createBuffer(imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);
void* data;
vkMapMemory(device, stagingBufferMemory, 0, imageSize, 0, &data);
memcpy(data, pixel2.data(), static_cast<size_t>(imageSize));
vkUnmapMemory(device, stagingBufferMemory);
createImage(texWidth, texHeight, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
vkDestroyBuffer(device, stagingBuffer, nullptr);
vkFreeMemory(device, stagingBufferMemory, nullptr);
createTextureImageView();
createDescriptorPool();
createDescriptorSets();
createCommandBuffers();
}
This code looks like a direct translation of some OpenGL code, and not particularly good/modern OpenGL code at that.
There's a lot wrong in this code, but most of it boils down to over-synchronization.
First, you should always view any call to vkDeviceWaitIdle as the wrong thing to do. The only exception would be when you are preparing to destroy the VkDevice itself. There is no other reason to do a full CPU/GPU sync like that.
Presumably, this synchronization exists so that you can be sure the GPU is finished using the image before modifying it. This is the wrong thing to do. You should instead employ multiple-buffering. That is, you should have two images that you use. One is currently being used in a rendering process, while the other is being transferred into.
Instead of doing a full device sync, you instead synchronize with the batch you sent two frames ago. That is, if you're wanting to transfer data for use by frame 10, then you must first do a fence-sync operation with the batch you sent in frame 8. Frame 9 is still being processed, but frame 8 is probably done by now. So the synchronization shouldn't hurt too much.
Second, never allocate memory in the middle of an operation like this. Memory gets allocated early in your application, and you leave it allocated until it's time to destroy your application. If you need a staging buffer, then keep it around and reuse it in subsequent frames. Make sure to allocate sufficient storage up-front.
Whatever your createBuffer call is doing, it seems very much like a bad idea. Vulkan is not OpenGL; Vulkan separated memory from buffers/textures that use it for a reason. Creating APIs that hide this separation basically throws all of that away.
Similarly, never unmap memory, unless you're about to destroy that memory object. There's no problem in Vulkan (or OpenGL) with leaving a piece of memory mapped indefinitely. Just map the entire memory's range and leave it mapped. Indeed, you could just pass the mapped pointer directly to your image loader, depending on how the memory get written by the image loading code (if it tries to read data from this pointer, they could be trouble).
Lastly, the commands doing the transfer need to be synchronized with the commands that consume the image. How this happens depends on which queues are being used to do the transfer.
And of course, if you want optimal performance, you may want to check to see if your implementation can read from linear images in your shader. If it can, then you may not need staging at all; you can just write the data directly to the memory in Vulkan's image format, and use it directly.
Employing all of the above is going to add a lot of complexity to your application. But that's how it's supposed to work.
A naive way consists in using the CPU to define the update depending on the time or data and then update the data for the shader, such as a MVP transformation matrix. But this is inefficient with lots of syncing and too low refresh rates, and also overloading the cpu in a loop.
So people recommend using many buffers sometimes mentioning old drivers. If someone can clarify it, that would be nice. I have a naive and probably wrong guess. If they know exactly the frame rate, then they can calculate the time for each frame and dispatch several frames in advance. But it confuses me because the frame rate is dynamic, especially for new screens with the FreeSync functionality that have dynamic refresh rates.
I have thought of a third possibility. One can use the clock directly in the shader. GL_EXT_shader_realtime_clock provides clockRealtimeEXT. It has no defined unit, and will wrap when exceeding the maximum value. But it is said "globally coherent by all invocations on the GPU". During initialization, you can measure its rate using a uniform buffer, and then assume the rate will be constant. And also manage the wrapping.
Then if you can write your shaders as a function of time, for example in a translation, that would be efficient. You just need the initial data. Remember that one must avoid if conditions in shaders.
I am trying to disassemble a code from a old radio containing a 68xx (68hc12 like) microcontroller. The problem is, I dont have the access to the interrupt vector of the micro in the top of the ROM, so I don't know where start to look. I only have the code below the top. There is some suggestion of where or how can I find meaningful routines in the code data?
You can't really disassemble reliably without knowing where the reset vector points. What you can do, however, is try to narrow down the possible reset addresses by eliminating all those other addresses that cannot possibly be a starting point.
So, given that any address in the memory map that contains a valid opcode is a potential reset point, you need to either eliminate it, or keep it for further analysis.
For the 68HC11 case, you could try to guess somewhat the entry point by looking for LDS instructions with legitimate operand value (i.e., pointing at or near the top of available RAM -- if multiple RAM banks, then to any of them).
It may help a bit if you know the device's full memory map, i.e., if external memory is used, its mapping and possible mapped peripherals (e.g., LCD). Do you also know CONFIG register contents?
The LDS instruction is usually either the very first instruction, or close thereafter (so look back a few instructions when you feel you have finally singled out your reset address). The problem here is some data may, by chance, appear as LDS instructions so you could end up with multiple potentially valid entry points. Only one of them is valid, of course.
You can eliminate further by disassembling a few instructions starting from each of these LDS instructions until you either hit an illegal opcode (i.e. obviously not a valid code sequence but an accidental data arrangement that looks like opcodes), or you see a series of instructions that are commonly used in 68HC11 initialization. These involve (usually) initialization of any one or more of the registers BPROT, OPTION, SCI, INIT ($103D in most parts, but for some $3D), etc.
You could write a relatively small script (e.g., in Lua) to do the basic scanning of the memory map and produce a (hopefully small) set of potential reset points to be examined further with a true disassembler for hints like the ones I mentioned.
Now, once you have the reset vector figured out the job becomes somewhat easier but you still need to figure out where any interrupt handlers are located. For this your hint is an RTI instruction and whatever preceding code that normally should acknowledge the specific interrupt it handles.
Hope this helps.
I'm doing a cousework for a distributed sytems module, and within it I neef to apply a variable clock incrementor; my tutor has gone over both Lamport and Vector clocks, but said "I cant hint at that" when I asked him about applying a variable length/size per clock.
I wish I knew what to do,
Andy
I suppose you mean vector clocks of variable size?
This is technically not possible due to the way vector clocks are defined and used, however it brings the problem, that you would need to know about all nodes which will communicate together and use a vector clock right in the beginning. This way you wouldn’t be allowed to expand your service, and also if you tear down a node, to never start it again, the time for it would be still sent around and waste resources.
One of my professors in distributed systems mentioned, that Amazon is/was using “dynamic” vector clocks for some services, and they had an algorithm which automatically removed “old” entries from the vector clcoks. They supposesdly concluded something like, this worked so far fine. However I never saw the paper about this.
I wrote a reasonably basic memory allocator using sbrk. I ask for a chunk of memory, say 65k and carve it up as needed for variables requesting dynamic memory. I free the memory by adding it back to the 65k block. The 65k block is derived from a union sizeof(16-bytes). Then I align the block along an even 16-byte boundary. But I'm getting unusual behavior.
Accessing the memory appears fine as I allocate and begin to populate my data structures accept that on one of my function calls, I pass a pointer to a member variable in a global structure but the address of the pointer argument doesn't map directly to the address of that member.
For example, the real address of this particular member happens to be: 0x100313d50 but when executing a particular function (nothing special) the address of the member is being represented as 0x100313d70. Inside the debugger I can query the real address and it appears correct when inside the function where this manifests. This isn't the first member being accessed either, it's the third so two prior memory accesses are fine, but during the third access I'm seeing this unusual shifting.
Is it possible that I'm accessing this memory via a misaligned block? It's possible but I'd expect the get a SIGBUS exception thrown (SPARC chip). I'm compiling using -memalign=16s so it ought to SIGBUS instead of trapping and fixing the misalignment.
All of my structures are padded on a multiple of 16-bytes: sizeof(structure)%16 = 0. Has anyone had experience with this type of behavior? Generally speaking, what type of things/stuff/etc. might cause a pointer to misrepresent a memory address?
Cheers,
Tracy.
Solaris 10, SunStudio-12, C language on modern SPARC processor (in case this helps).
I figure I should answer my own question in the event someone else out there has a similar problem.
The reason why the memory address was shifting is because a prior call to a utility function accidentally overwrote the meta-address of the global structure thusly rewriting the meta-address of that block so lookups on that block were shifted even though the actual data still resided in the original block.
In simple words, I wrote past my buffer. Since I hand out memory from the tail, overwriting would blow away my much needed meta-address for my global structure (or whatever). Now I know what undefined behavior looks like.