I've written an Ada program that encrypts files. It reads them block by block to conserve memory on the target machine. Unfortunately, Ada's Directories library reports file sizes in a Long_Integer, which limits reads to files of just under 2GB. When I try to read a file over 2GB, the program fails at runtime with a stack overflow error.
The documentation for it here is the origin of my understanding above. How can I read a file size into a type I define myself? One I could make require something like 25 bytes, to raise the cap to 100GB.
I just posted GCC bug 55119 on this.
While you're waiting (!), the code below works on Mac OS X Mountain Lion. On Windows, it's more complicated; see adainclude/adaint.{c,h}.
The Ada spec:
with Ada.Directories;

package Large_Files is
   function Size (Name : String) return Ada.Directories.File_Size;
end Large_Files;
and body (copied in part from Ada.Directories):
with GNAT.OS_Lib;
with System;

package body Large_Files is

   function Size (Name : String) return Ada.Directories.File_Size
   is
      C_Name : String (1 .. Name'Length + 1);
      function C_Size (Name : System.Address) return Long_Long_Integer;
      pragma Import (C, C_Size, "large_file_length");
   begin
      if not GNAT.OS_Lib.Is_Regular_File (Name) then
         raise Ada.Directories.Name_Error
           with "file """ & Name & """ does not exist";
      else
         C_Name (1 .. Name'Length) := Name;
         C_Name (C_Name'Last) := ASCII.NUL;
         return Ada.Directories.File_Size (C_Size (C_Name'Address));
      end if;
   end Size;

end Large_Files;
and the C interface:
/* large_files_interface.c */
#include <sys/stat.h>

long long large_file_length (const char *name)
{
    struct stat statbuf;
    if (stat(name, &statbuf) != 0) {
        return 0;
    } else {
        return (long long) statbuf.st_size;
    }
}
You might need to use struct stat64 and stat64() on other Unix systems.
Compile the C interface as normal, then add -largs large_files_interface.o to your gnatmake command line.
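For example, the whole build might look like this (the name of the Ada main program here is only a hypothetical placeholder):
$ gcc -c large_files_interface.c
$ gnatmake my_encryptor.adb -largs large_files_interface.o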
EDIT: on Mac OS X (and Debian), which are x86_64 systems, sizeof(long) is 8 bytes, so the comment in adaint.c is misleading and Ada.Directories.Size can in fact return values up to 2**63-1.
The question is about the performance of a protected type with an entry, using the GNAT compiler on Linux. A mutex is only used as an example (Ada does not need one).
I compared the performance of the Ada mutex implementation from Rosetta Code
(https://rosettacode.org/wiki/Mutex#Ada) to a very simple C/pthread implementation called from Ada through the C import interface.
It turned out the Ada protected type + entry was 36.8 times slower.
I know that GNAT may go through its run-time library and eventually end up calling OS primitives such as pthread. I expected some overhead, but not that much.
The question is: why?
Just in case - this is my simple pthread implementation:
-- cmutex.ads
package CMutex is

   procedure Lock with
     Import        => True,
     Convention    => C,
     External_Name => "mutex_lock";

   procedure Unlock with
     Import        => True,
     Convention    => C,
     External_Name => "mutex_unlock";

end CMutex;
// C code
#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void mutex_lock()
{
    pthread_mutex_lock(&mtx);
}

void mutex_unlock()
{
    pthread_mutex_unlock(&mtx);
}
==== EDIT ====
Adding a minimal reproducible example. It is single-threaded (just for the test); the dummy variable is there to keep the optimizer from optimizing the entire loop away.
The test code for the pthread (cmutex) above is:
with Text_IO; use Text_IO;
with CMutex;

procedure test is
   dummy : Integer := 0;
begin
   for i in 1 .. 100_000_000 loop
      CMutex.Lock;
      dummy := dummy + 1;
      CMutex.Unlock;
   end loop;
   Text_IO.Put_Line (Integer'Image (dummy));
end test;
And the test code for the protected type + entry example is:
with Text_IO; use Text_IO;
with Ada_Mutex;

procedure test1 is
   dummy : Integer := 0;
   mtx   : Ada_Mutex.Mutex;
begin
   for i in 1 .. 100_000_000 loop
      mtx.Seize;
      dummy := dummy + 1;
      mtx.Release;
   end loop;
   Text_IO.Put_Line (Integer'Image (dummy));
end test1;
Where Ada_Mutex is a package containing the example from Rosetta Code:
package Ada_Mutex is

   protected type Mutex is
      entry Seize;
      procedure Release;
   private
      Owned : Boolean := False;
   end Mutex;

end Ada_Mutex;
--------------------------------
package body Ada_Mutex is

   protected body Mutex is

      entry Seize when not Owned is
      begin
         Owned := True;
      end Seize;

      procedure Release is
      begin
         Owned := False;
      end Release;

   end Mutex;

end Ada_Mutex;
Running time of the code that uses the pthread mutex (on an Intel NUC i7):
$ time ./test
100000000
real 0m0.557s
user 0m0.553s
sys 0m0.005s
And the code that uses the protected type and entry:
$ time ./test1
100000000
real 0m19.009s
user 0m19.005s
sys 0m0.005s
With no optimization (-O0) times are:
real 0m0.746s
user 0m0.746s
sys 0m0.000s
and
real 0m20.173s
user 0m20.172s
sys 0m0.000s
For pthread and protected type + entry, respectively.
Note that user time ~= real time, which means the processor was busy (it did not idle or otherwise yield control).
I'm writing code to compute AES with CUDA, but I have some problems with realloc() on the CPU side of the code.
When I read the data I need to encrypt from disk, I use this piece of code:
puint4 *cipher_block;
u32 posizione = 0;
FILE *inputFile = fopen("input.in", "rb");

while (!feof(inputFile)) {
    if ((cipher_block = (puint4 *) realloc(cipher_block, sizeof(puint4) * (posizione + 1))) == NULL) {
        printf("\nERROR\n");
    }
    // read data
    .....
    posizione++;
} // end while
It works fine until I add the following piece of code, which I use to allocate memory on the GPU:
puint4 *round_key_GPU;
puint4 *cipher_block_GPU;

cudaMalloc((void **) &round_key_GPU, sizeof(puint4) * 11);
cudaMalloc((void **) &cipher_block_GPU, sizeof(puint4) * (posizione + 1));

// other instructions
....

/* Free the resources */
cudaFree(round_key_GPU);
cudaFree(cipher_block_GPU);
free(round_key_GPU);
free(cipher_block_GPU);
free(cipher_block);
free(round_key);
When I add this piece of code, the realloc() call fails with the error:
*** Error in `./test.x': realloc(): invalid pointer: 0x0000000000402338 ***
And if I delete
free(round_key_GPU);
free(cipher_block_GPU);
It returns the error:
test.x: malloc.c:2842: mremap_chunk: Assertion `((size + offset) & (_rtld_global_ro._dl_pagesize - 1)) == 0' failed.
So I think that maybe the round_key_GPU and cipher_block_GPU pointers are allocated before the while loop has finished reading all the data, and the realloc() then overwrites the memory holding those pointers. What do you think? If that is the case, how can I read the data from disk without knowing in advance how much data there is to read?
(P.S. puint4 is a typedef'd structure with 4 unsigned ints, like uint4.)
Thanks!
Davide
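A common pattern for growing a buffer with realloc() when the amount of input is not known in advance is sketched below. The important detail is that the growing pointer starts out as NULL, since realloc() must receive either NULL or a pointer previously returned by malloc()/realloc(). The element type and file name here are placeholders, not the types from the question:
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical element type standing in for puint4. */
typedef struct { unsigned int x, y, z, w; } elem_t;

int main(void)
{
    elem_t *blocks = NULL;   /* must start as NULL so the first realloc behaves like malloc */
    size_t count = 0;
    FILE *in = fopen("input.in", "rb");
    if (in == NULL)
        return 1;

    for (;;) {
        elem_t tmp;
        if (fread(&tmp, sizeof tmp, 1, in) != 1)
            break;           /* EOF or read error: stop before growing the buffer */

        elem_t *grown = realloc(blocks, (count + 1) * sizeof *blocks);
        if (grown == NULL) { /* keep the old pointer so it can still be freed */
            free(blocks);
            fclose(in);
            return 1;
        }
        blocks = grown;
        blocks[count++] = tmp;
    }

    printf("read %zu blocks\n", count);
    free(blocks);
    fclose(in);
    return 0;
}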
I'm trying to modify graphic assets of the software I'm using (for aesthetic purposes; I guess it's hard to do something harmful with graphic assets), but the developer encrypted them. I'm not sure why he decided to do that, since I have used and modified a bunch of similar software and those developers didn't bother (and I can see no reason why encrypting those assets would be necessary).
So anyway here are examples of those encrypted graphic assets:
http://www.mediafire.com/view/sx2yc0w5wkr9m2h/avatars_50-alpha.jpg
http://www.mediafire.com/download/i4fc52438hkp55l/avatars_80.png
Is there a way of decrypting those? If so, how should I go about it?
The header "CF10" seems to be a privately added signature signifying that the rest of the file is "encoded". This is a very simple XOR encoding: xor 8Dh was the first value I tried, and I got it right the first time too. The reasoning behind trying that value first is that 8D occurs very frequently in the first 100 or so bytes, where there would typically be lots of zeroes.
"Decrypting" is thus very straightforward: if a file starts with the four bytes CF10, remove them and apply xor 8Dh to the rest of the file. Decoding the files shows that the first "JPG" is in fact a tiny PNG image (and not a very interesting one to boot); the second is indeed a PNG file.
The file extension may or may not be the original file extension; the sample called ".jpg" is in fact also a PNG file, as can be seen from its header signature.
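For reference, every PNG file starts with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A, so a decoded file can be checked quickly. A minimal sketch (the file name is just a placeholder):
#include <stdio.h>
#include <string.h>

int main(void)
{
    static const unsigned char png_sig[8] =
        { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
    unsigned char buf[8];

    FILE *f = fopen("decoded.bin", "rb");   /* placeholder file name */
    if (f == NULL)
        return 1;

    if (fread(buf, 1, 8, f) == 8 && memcmp(buf, png_sig, 8) == 0)
        printf("looks like a PNG file\n");
    else
        printf("not a PNG file\n");

    fclose(f);
    return 0;
}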
The following quick-and-dirty C source will decode the images. The same program can be adjusted to encode them as well, because the xor operation is exactly the same. The only thing needed is to add a bit of logic:
1. read the first 4 bytes (at most) of the input file and test whether they form the string CF10
2. if not, the file is not encoded:
   a. write CF10 to the output file
   b. encode the image by applying xor 8Dh to each byte
3. if so:
   b. decode the image by applying xor 8Dh to each byte
As you can see, there is no "3a" and both "b" steps are the same.
#include <stdio.h>
#include <string.h>

#ifndef MAX_PATH
#define MAX_PATH 256
#endif

#define INPUTPATH "c:\\documents"
#define OUTPUTPATH ""

int main (int argc, char **argv)
{
    FILE *inp, *outp;
    int i, encode_flag = 0;
    char filename_buffer[MAX_PATH];
    char sig[] = "CF10", *ptr;

    if (argc != 3)
    {
        printf ("usage: decode [input] [output]\n");
        return -1;
    }

    filename_buffer[0] = 0;
    if (!strchr(argv[1], '/') && !strchr(argv[1], 92) && !strchr(argv[1], ':'))
        strcpy (filename_buffer, INPUTPATH);
    strcat (filename_buffer, argv[1]);

    inp = fopen (filename_buffer, "rb");
    if (inp == NULL)
    {
        printf ("bad input file '%s'\n", filename_buffer);
        return -2;
    }

    ptr = sig;
    while (*ptr)
    {
        i = fgetc (inp);
        if (*ptr != i)
        {
            encode_flag = 1;
            break;
        }
        ptr++;
    }

    if (encode_flag)
    {
        /* rewind file because we already read some bytes */
        fseek (inp, 0, SEEK_SET);
        printf ("encoding input file: '%s'\n", filename_buffer);
    } else
        printf ("decoding input file: '%s'\n", filename_buffer);

    filename_buffer[0] = 0;
    if (!strchr(argv[2], '/') && !strchr(argv[2], 92) && !strchr(argv[2], ':'))
        strcpy (filename_buffer, OUTPUTPATH);
    strcat (filename_buffer, argv[2]);

    outp = fopen (filename_buffer, "wb");
    if (outp == NULL)
    {
        printf ("bad output file '%s'\n", filename_buffer);
        return -2;
    }
    printf ("output file: '%s'\n", filename_buffer);

    if (encode_flag)
        fwrite (sig, 1, 4, outp);

    do
    {
        i = fgetc(inp);
        if (i != EOF)
            fputc (i ^ 0x8d, outp);
    } while (i != EOF);

    fclose (inp);
    fclose (outp);
    printf ("all done. bye bye\n");
    return 0;
}
OK, so the practical usage of the code provided by @Jongware was unclear to me - I figured it out with some help :)
I compiled the code using Visual Studio (you can find guides on how to do that; basically, create a new Visual C++ project and under Project -> Project Properties choose C/C++ -> All Options and set Compile As to C Code (/TC)).
Then I ran the program from the command prompt as: program encrypted_file decrypted_file.
Thanks a lot for the help, Jongware!
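The same source should also build from a plain command line with GCC or MinGW, assuming it is saved as decode.c:
$ gcc -o decode decode.c
$ ./decode encrypted_file decrypted_file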
I'm making a simple CGI server for a course. At some point I have to do a fork/exec to launch the CGI handler; the problem is that the exec keeps returning errno 14. I've tried the following code in a standalone version and it works, with and without the absolute path.
Here's the code:
static void _process_cgi(int fd, http_context_t* ctx)
{
    pid_t childProcess;
    int ret;
    char returnValue[1024];

    log(LOG, "calling cgi", &ctx->uri[1], 0);

    if ((childProcess = fork()) != 0)
    {
        ///
        /// Set the CGI standard output to the socket.
        ///
        dup2(fd, STANDARD_OUTPUT);

        //ctx->uri = "/simple.cgi"
        execl("/home/dvd/nwebdir/simple.cgi", &ctx->uri[1]);

        sprintf(returnValue, "%d", errno);
        log(LOG, "exec returned ", returnValue, 0);
        return -1;
    }

    ret = waitpid(childProcess, NULL, 0);
    sprintf(returnValue, "%d", ret);
    log(LOG, "cgi returned", returnValue, 0);
}
Here is the list of syscalls the server goes through before reaching my code (in order):
- chdir
- fork
- setpgrp
- fork
I don't know if this is relevant or not, but in my test program I have neither chdir nor setpgrp.
The test code is the following:
pid_t pid;

if ((pid = fork()) != 0)
{
    execl("simple.cgi", "simple");
    //execl("/home/dvd/nwebdir/simple.cgi", "simple");
    return 0;
}

printf("waiting\n");
waitpid(pid, NULL, 0);
printf("Parent exiting\n");
Note: I've tried both execl and execlp in the server code.
You can find the basic server implementation (without CGI) here; the only changes I made were in the web function:
http://www.ibm.com/developerworks/systems/library/es-nweb/index.html
Regards
execl("simple.cgi","simple", NULL);
The NULL is needed because execl() is a varargs function.
execl("/home/dvd/nwebdir/simple.cgi", &ctx->uri[1], (char *)0);
The last argument to execl() must be a null char *. You can usually get away with writing NULL instead of (char *)0, but it might not produce the correct result if you have #define NULL 0 and you are on a machine where sizeof(int) != sizeof(char *), such as a 64-bit system.
BTW, either you copied the code incorrectly or it has a logic error: fork() returns non-zero (the child's PID) in the parent process, not in the child, so the condition should be reversed. (There is no comment button here, so I'm making this an answer.)
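Putting the two corrections together (exec in the child branch, and a NULL-terminated execl() argument list), a minimal sketch of the fork/exec/wait pattern might look like the following; the CGI path is taken from the question and the original logging is reduced to plain stdio:
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();

    if (child == 0) {
        /* child: replace this process with the CGI program */
        execl("/home/dvd/nwebdir/simple.cgi", "simple.cgi", (char *) 0);

        /* only reached if execl failed */
        fprintf(stderr, "execl failed: errno %d\n", errno);
        _exit(127);
    } else if (child > 0) {
        /* parent: wait for the CGI program to finish */
        int status;
        waitpid(child, &status, 0);
        printf("child exited with status %d\n", WEXITSTATUS(status));
    } else {
        perror("fork");
        return 1;
    }
    return 0;
}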
I was thinking of trying my hand at some JIT compilation (just for the sake of learning), and it would be nice to have it work cross-platform, since I run all three major OSes at home (Windows, OS X, Linux).
With that in mind, I want to know if there is any way to avoid the Windows virtual-memory functions when allocating memory with execute permission. It would be nice to just use malloc or new and point the processor at such a block.
Any tips?
DEP just removes execute permission from every non-code page of memory. An application's code is loaded into memory that has execute permission, and there are plenty of JITs that work on Windows/Linux/Mac OS X even with DEP active, because there is a way to dynamically allocate memory with the needed permissions set.
Usually, plain malloc should not be used, because permissions are per-page. Aligning malloc'ed memory to page boundaries is still possible, at the price of some overhead. If you don't use malloc, you need some custom memory management (for the executable code only); custom management is the common way of doing JIT.
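As a rough illustration of that page-alignment idea (this is not how V8 does it; its mmap/VirtualAlloc code follows below), one might allocate page-aligned memory with posix_memalign and then change its protection with mprotect. Note the assumptions: POSIX only guarantees mprotect for mmap'ed memory, so this relies on behaviour Linux and Mac OS X happen to provide, and the tiny machine-code stub assumes an x86-64 host:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate 'size' bytes of page-aligned memory and mark it read/write/execute.
 * Returns NULL on failure. */
static void *alloc_executable(size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t rounded = (size + page - 1) & ~(page - 1);   /* round up to a whole page */
    void *mem = NULL;

    if (posix_memalign(&mem, page, rounded) != 0)
        return NULL;

    if (mprotect(mem, rounded, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        free(mem);
        return NULL;
    }
    return mem;
}

int main(void)
{
    /* x86-64 machine code for: mov eax, 42; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    void *buf = alloc_executable(sizeof code);
    if (buf == NULL)
        return 1;

    memcpy(buf, code, sizeof code);
    int (*fn)(void) = (int (*)(void)) buf;
    printf("jitted function returned %d\n", fn());

    free(buf);
    return 0;
}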
There is a solution in the Chromium project, which uses JIT for the V8 JavaScript VM and which is cross-platform. To be cross-platform, the needed function is implemented in several files, and the right one is selected at compile time.
Linux (chromium src/v8/src/platform-linux.cc): the flag is PROT_EXEC of mmap().
void* OS::Allocate(const size_t requested,
                   size_t* allocated,
                   bool is_executable) {
  const size_t msize = RoundUp(requested, AllocateAlignment());
  int prot = PROT_READ | PROT_WRITE | (is_executable ? PROT_EXEC : 0);
  void* addr = OS::GetRandomMmapAddr();
  void* mbase = mmap(addr, msize, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mbase == MAP_FAILED) {
    /** handle error */
    return NULL;
  }
  *allocated = msize;
  UpdateAllocatedSpaceLimits(mbase, msize);
  return mbase;
}
Win32 (src/v8/src/platform-win32.cc): the flag is PAGE_EXECUTE_READWRITE of VirtualAlloc.
void* OS::Allocate(const size_t requested,
                   size_t* allocated,
                   bool is_executable) {
  // The address range used to randomize RWX allocations in OS::Allocate
  // Try not to map pages into the default range that windows loads DLLs
  // Use a multiple of 64k to prevent committing unused memory.
  // Note: This does not guarantee RWX regions will be within the
  // range kAllocationRandomAddressMin to kAllocationRandomAddressMax
#ifdef V8_HOST_ARCH_64_BIT
  static const intptr_t kAllocationRandomAddressMin = 0x0000000080000000;
  static const intptr_t kAllocationRandomAddressMax = 0x000003FFFFFF0000;
#else
  static const intptr_t kAllocationRandomAddressMin = 0x04000000;
  static const intptr_t kAllocationRandomAddressMax = 0x3FFF0000;
#endif

  // VirtualAlloc rounds allocated size to page size automatically.
  size_t msize = RoundUp(requested, static_cast<int>(GetPageSize()));
  intptr_t address = 0;

  // Windows XP SP2 allows Data Execution Prevention (DEP).
  int prot = is_executable ? PAGE_EXECUTE_READWRITE : PAGE_READWRITE;

  // For executable pages try and randomize the allocation address
  if (prot == PAGE_EXECUTE_READWRITE &&
      msize >= static_cast<size_t>(Page::kPageSize)) {
    address = (V8::RandomPrivate(Isolate::Current()) << kPageSizeBits)
        | kAllocationRandomAddressMin;
    address &= kAllocationRandomAddressMax;
  }

  LPVOID mbase = VirtualAlloc(reinterpret_cast<void *>(address),
                              msize,
                              MEM_COMMIT | MEM_RESERVE,
                              prot);
  if (mbase == NULL && address != 0)
    mbase = VirtualAlloc(NULL, msize, MEM_COMMIT | MEM_RESERVE, prot);

  if (mbase == NULL) {
    LOG(ISOLATE, StringEvent("OS::Allocate", "VirtualAlloc failed"));
    return NULL;
  }

  ASSERT(IsAligned(reinterpret_cast<size_t>(mbase), OS::AllocateAlignment()));
  *allocated = msize;
  UpdateAllocatedSpaceLimits(mbase, static_cast<int>(msize));
  return mbase;
}
MacOS (src/v8/src/platform-macos.cc): the flag is PROT_EXEC of mmap, just like on Linux and other POSIX systems.
void* OS::Allocate(const size_t requested,
                   size_t* allocated,
                   bool is_executable) {
  const size_t msize = RoundUp(requested, getpagesize());
  int prot = PROT_READ | PROT_WRITE | (is_executable ? PROT_EXEC : 0);
  void* mbase = mmap(OS::GetRandomMmapAddr(),
                     msize,
                     prot,
                     MAP_PRIVATE | MAP_ANON,
                     kMmapFd,
                     kMmapFdOffset);
  if (mbase == MAP_FAILED) {
    LOG(Isolate::Current(), StringEvent("OS::Allocate", "mmap failed"));
    return NULL;
  }
  *allocated = msize;
  UpdateAllocatedSpaceLimits(mbase, msize);
  return mbase;
}
I also want to note that the bcdedit.exe-style approach should be used only for very old programs that create new executable code in memory without setting the Exec permission on the page. For newer programs such as Firefox, Chrome/Chromium, or any modern JIT, DEP should stay active, and the JIT will manage memory permissions in a fine-grained manner.
One possibility is to make it a requirement that Windows installations running your program be either configured for DEP AlwaysOff (bad idea) or DEP OptOut (better idea).
This can be configured (under WinXp SP2+ and Win2k3 SP1+ at least) by changing the boot.ini file to have the setting:
/noexecute=OptOut
and then configuring your individual program to opt out by choosing (under XP):
Start button
Control Panel
System
Advanced tab
Performance Settings button
Data Execution Prevention tab
This should allow you to execute code from within your program that's created on the fly in malloc() blocks.
Keep in mind that this makes your program more susceptible to attacks that DEP was meant to prevent.
It looks like this is also possible in Windows 2008 with the command:
bcdedit.exe /set {current} nx OptOut
But, to be honest, if you just want to minimise platform-dependent code, that's easy to do just by isolating the code into a single function, something like:
void *MallocWithoutDep(size_t sz) {
#if defined _IS_WINDOWS
return VirtualMalloc(sz, OPT_DEP_OFF); // or whatever
#elif defined IS_LINUX
// Do linuxy thing
#elif defined IS_MACOS
// Do something almost certainly inexplicable
#endif
}
If you put all your platform dependent functions in their own files, the rest of your code is automatically platform-agnostic.