32-bit pointer overflow in 64-bit gcc code - fails in compile - pointers

I am compiling a very large legacy Fortran 90 code (screamer) with gFortran on a Mac (2.2 GHz Intel Core i7) running Yosemite. (gFortran V5.1.0) I have 16 GB of RAM. The code is memory intensive and I am trying to increase array sizes to solve larger problems. I have maintained the code for >10 years and rewriting 200,000 lines of code right now is not an option. As I carefully increase the size of the 2-D matrix (am(max_nodes, max_nodes)) and several 1-D vectors (RHS(max_nodes) and a(max_nodes*2)) by varying the integer "max_nodes" I eventually get to a 32-bit pointer limit (4 byte unsigned integer limit) during compilation. See below.
final section layout:
__TEXT/__text addr=0x100001390, size=0x0006B9CB, fileOffset=0x00001390, type=1
__TEXT/__text_startup addr=0x10006CD60, size=0x00000041, fileOffset=0x0006CD60, type=1
__TEXT/__text_exit addr=0x10006CDB0, size=0x00000031, fileOffset=0x0006CDB0, type=1
__TEXT/__stubs addr=0x10006CDE2, size=0x00000252, fileOffset=0x0006CDE2, type=28
__TEXT/__stub_helper addr=0x10006D034, size=0x000003EE, fileOffset=0x0006D034, type=32
__TEXT/__cstring addr=0x10006D428, size=0x0000CFCB, fileOffset=0x0006D428, type=13
__TEXT/__const addr=0x10007A400, size=0x00008F00, fileOffset=0x0007A400, type=0
__TEXT/__eh_frame addr=0x100083300, size=0x0000DCF8, fileOffset=0x00083300, type=19
__DATA/__got addr=0x100091000, size=0x00000060, fileOffset=0x00091000, type=29
__DATA/__nl_symbol_ptr addr=0x100091060, size=0x00000010, fileOffset=0x00091060, type=29
__DATA/__la_symbol_ptr addr=0x100091070, size=0x00000318, fileOffset=0x00091070, type=27
__DATA/__mod_init_func addr=0x100091388, size=0x00000010, fileOffset=0x00091388, type=33
__DATA/__mod_term_func addr=0x100091398, size=0x00000008, fileOffset=0x00091398, type=34
__DATA/__const addr=0x1000913A0, size=0x000007C8, fileOffset=0x000913A0, type=0
__DATA/__static_data addr=0x100091B68, size=0x00000003, fileOffset=0x00091B68, type=0
__DATA/__data addr=0x100091B80, size=0x000003E0, fileOffset=0x00091B80, type=0
__DATA/__bss4 addr=0x100091F60, size=0x00000018, fileOffset=0x00000000, type=25
__DATA/__bss5 addr=0x100091F80, size=0x00020000, fileOffset=0x00000000, type=25
__DATA/__bss3 addr=0x1000B1F80, size=0x00000028, fileOffset=0x00000000, type=25
__DATA/__pu_bss2 addr=0x1000B1FA8, size=0x00000008, fileOffset=0x00000000, type=25
__DATA/__bss2 addr=0x1000B1FB0, size=0x00000024, fileOffset=0x00000000, type=25
__DATA/__pu_bss5 addr=0x1000B1FE0, size=0x0000024C, fileOffset=0x00000000, type=25
__DATA/__pu_bss4 addr=0x1000B2230, size=0x00000018, fileOffset=0x00000000, type=25
__DATA/__common addr=0x1000B2260, size=0x000020D8, fileOffset=0x00000000, type=25
__DATA/__zo_bss3 addr=0x1000B4338, size=0x00000021, fileOffset=0x00000000, type=25
__DATA/__huge addr=0x1000B4360, size=0x984EB80C, fileOffset=0x00000000, type=25
ld: 32-bit RIP relative reference out of range (2147639505 max is +/-4GB): from _main_loop_ (0x10000E120) to _a.4206 (0x180034380) in '_main_loop_' from screamer64.a(main_loop.o) for architecture x86_64
collect2: error: ld returned 1 exit status
In this error message main_loop is the core solver subroutine in screamer that populates and solves the large matrices. In this subroutine the large real*8 matrix and real*8 vectors are defined.
This RIP-relative addressing error (RIP is the x86-64 instruction pointer register) is noted many times on the web. So far the available information has not helped me solve my problem. Note: the signed 4-byte integer limit is 2,147,483,647, so the error seems to be directly related to the use of a 32-bit pointer.
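The arithmetic behind the linker message can be checked by hand. The sketch below is illustrative only (the helper names fits_rel32 and addr_distance are hypothetical, not part of any toolchain). It tests whether the distance between two addresses fits in a signed 32-bit displacement, using the two symbol addresses quoted in the error above. The figure printed by ld differs slightly from the symbol-to-symbol distance, presumably because the linker measures from the referencing instruction inside _main_loop_ rather than from the start of the routine.

```c
#include <stdint.h>

/* Signed distance in bytes from address `from` to address `to`. */
int64_t addr_distance(uint64_t from, uint64_t to)
{
    return (int64_t)(to - from);
}

/* Hypothetical helper: returns 1 if `to` can be reached from `from`
   with a signed 32-bit displacement, 0 otherwise. */
int fits_rel32(uint64_t from, uint64_t to)
{
    int64_t delta = addr_distance(from, to);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

With from = 0x10000E120 (_main_loop_) and to = 0x180034380 (_a.4206), the distance is 0x80026260 bytes, which is just past the 0x7FFFFFFF maximum, so the check fails. The start of the __huge section (0x1000B4360) is still within range.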
The gFortran compiler options include -mcmodel=medium that should take the pointers to 64 bits. -m64 has no effect. The total memory used by the primary matrix and vectors when the pointer limit is reached is greater than 2.4 GB. The confusing thing is that the code is fully 64 bit so I was not expecting 32-bit pointers. See below for 64-bit check.
rbspielman$ file screamer64
screamer64: Mach-O 64-bit executable x86_64
The primary matrix and vectors are all real*8 (64-bit). All large arrays are declared directly in this one subroutine and are not placed in common.
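For a sense of scale, the storage taken by these arrays can be estimated from max_nodes. This is a rough sketch, assuming only the three arrays named in the question (am, RHS and a) at 8 bytes per real*8 element. The function name is hypothetical, and any other static data in the program is ignored.

```c
#include <stdint.h>

/* Estimated bytes for am(n,n), RHS(n) and a(2*n), all real*8. */
uint64_t screamer_array_bytes(uint64_t max_nodes)
{
    const uint64_t elem = 8;                 /* bytes per real*8 element */
    uint64_t am  = max_nodes * max_nodes;    /* 2-D matrix */
    uint64_t rhs = max_nodes;                /* RHS vector */
    uint64_t a   = 2 * max_nodes;            /* a vector */
    return elem * (am + rhs + a);
}
```

Because the matrix term grows with the square of max_nodes, the total under these assumptions crosses 2^31 bytes somewhere between 16,000 and 17,000 nodes. For comparison, the __huge section size in the layout above, 0x984EB80C, is 2,555,295,756 bytes.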
All other variables in common are ordered by size. real*8, real, int, char.
Simple test programs demonstrate that there is no fundamental memory limit. I can easily define static arrays to > 10 GB without a problem. Larger arrays also work but end up using virtual memory and slow down as expected.
Clearly there is some sort of memory or pointer size limit but I just cannot figure it out. The code matrix solvers are massive and more realistic test programs would be tedious.
(I also compile screamer in Ubuntu LINUX without a problem up to the same array limit as the Mac. Compilations in Windows 8 fail at the usual 2 GB memory limit NOT at the pointer limit.)
Suggestions would be appreciated.

I just ran into the same problem with GNU Fortran (GCC) 5.1.0 in a Mac running 10.11.5 but the solution offered by the OP did not work for me.
However, I did find a solution: after systematically pruning my rather pedestrian legacy code, I found that every array has to be explicitly filled with something. You can't start filling it within your code. I know it sounds silly, but once I initialized every "real" array (32 bit, it is legacy code) with 0.0 before I did any I/O or other work, it linked without complaint.
And, yes, as with the OP, my code worked until I changed the size of an array.
The reason why this worked may be in the contents of this bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63793 but I am not good enough to tell you how to come up with a better workaround. My only guess is that initializing every array at the beginning favors the GOT instead of the RIP. (When will this be fixed? I just don't know how to push this up the line and the bug report is dated 2014-11-09)

Related

Why load-foreign-library does not work in ECL?

I compiled ECL 16.1.3 on Windows and I want to load the shared library but FFI does not work.
At first I used CFFI and then got the error 'unable to load'. Then I found the ECL limitation (On platforms where ECL’s dynamic FFI is not supported (ie. when :dffi is not present in features), cffi:load-foreign-library does not work and you must use ECL’s own ffi:load-foreign-library with a constant string argument) in the CFFI manual.
I do not have :dffi so I decided to use ffi:load-foreign-library instead of cffi:load-foreign-library, but ffi:load-foreign-library also does not work.
(ffi:load-foreign-library "С:/.../libglib-2.0-0.dll")
nil
So I have two questions:
1) How to make ffi:load-foreign-library work?
2) How to compile ECL with :dffi support?

how to increase buffer size in scilab

I'm using a variable that holds a very large symbol/string in Scilab, which gives the following error:
Too large string. at line 44 of exec file called by :
exec('/proj/shubhamj/shubhamj/scilab/final_add_from_script.sce', -1)
I've already used stacksize('max').
According to this thread on the mailing list for Scilab the error comes from the length of the command. You can get the same error without the exec() if you call a command that is too long even in your current script (where the exec() call currently is).
If we look at the documentation the default stacksize is approx. 76MB (megabytes) and that is a lot of characters which makes this issue 99.9% not related to the size of the stack.

Is there a working distribution of sqlite available for OpenVMS?

I am looking for a working distribution of SQLite for OpenVMS. I tried building SQLite 3.7.9 from the amalgamation file, using patches I found in a mailing list, but it does not quite work.
I am using HP C V7.1-015 on OpenVMS Alpha 7.3-2.
Since I cannot install python, which seems to include SQLite3, I have to build from sources.
I compile using the following commands:
$ CC /OPTIMIZE -
/DEFINE=(SQLITE_THREADSAFE=0, -
SQLITE_OMIT_LOAD_EXTENSION=1, -
SQLITE_OMIT_COMPILEOPTION_DIAGS=1, -
SQLITE_OMIT_MEMORYDB=1, -
SQLITE_OMIT_TEMPDB=1, -
SQLITE_OMIT_DEPRECATED=1, -
SQLITE_OMIT_SHARED_CACHE=1, -
_USE_STD_STAT=ENABLE) -
/FLOAT=IEEE_FLOAT -
sqlite3.c
$ CC /OPTIMIZE -
/DEFINE=(SQLITE_THREADSAFE=0, -
SQLITE_OMIT_LOAD_EXTENSION=1, -
SQLITE_OMIT_COMPILEOPTION_DIAGS=1, -
SQLITE_OMIT_MEMORYDB=1, -
SQLITE_OMIT_TEMPDB=1, -
SQLITE_OMIT_DEPRECATED=1, -
SQLITE_OMIT_SHARED_CACHE=1, -
_USE_STD_STAT=ENABLE) -
/FLOAT=IEEE_FLOAT -
shell.c
I copied the defines from the mailing list, and added /FLOAT=IEEE_FLOAT to get rid of most warnings regarding floating points (related to overflows due to exponent 308).
During compilation I got some informationals and warnings.
I get the following messages while linking:
$ LINK shell.obj,sqlite3.obj
...
%LINK-W-NUDFSYMS, 2 undefined symbols:
%LINK-I-UDFSYM, __STD_FSTAT
%LINK-I-UDFSYM, __STD_STAT
...
Since I am a little bit lost here, I would rather have SQLite3 sources which compile on OpenVMS.
The specific problem you're getting from the linker arises from the fact that you've requested capability at compile time that your system doesn't have. I believe the _USE_STD_STAT option first became available in OpenVMS v8.2, yet you're on 7.3-2. Your compiler and your headers know what to do when _USE_STD_STAT is defined, but the functions to process the X/Open-compliant stat structure do not exist in the C run-time (CRTL in VMS parlance) on your system, and your linker is telling you, "ain't got those functions."
Ideally you would be able to upgrade your operating system. Current as of this writing is v8.4. v7.3-2 was released eight and a half years ago and v8.2 over seven years ago. I understand that there are technical, budgetary, and even political reasons that upgrades aren't always possible. If it were me, and I were stuck on OpenVMS Alpha v7.3-2, I would try removing the _USE_STD_STAT=ENABLE from the compilation and see what blows up.
One of the side effects of enabling _USE_STD_STAT is that you also get _LARGEFILE along with it. If that's the only reason SQLite needs the option, you may be fine but limited to 4GB databases. I suspect there's more to it than that, i.e., SQLite very likely makes use of elements in the stat structure that do actually require the newer structure.
You can read up on the differences in the traditional and standards-compliant stat structures at http://h71000.www7.hp.com/doc/84final/5763/5763profile_062.html#index_x_1699.
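One way to keep a single source tree building on both older and newer systems is to request the standard stat structure only when the C run-time is new enough. The following is a sketch under stated assumptions, not a tested OpenVMS build: it assumes the compiler predefines __CRTL_VER in the VVRREPPTT encoding (so 80200000 would correspond to V8.2) and that defining _USE_STD_STAT before including the headers is sufficient. HAVE_STD_STAT and built_with_std_stat are hypothetical names introduced for illustration.

```c
/* Request the X/Open-compliant stat structure only on a new enough CRTL.
   Assumption: __CRTL_VER >= 80200000 means OpenVMS V8.2 or later. */
#if defined(__VMS) && defined(__CRTL_VER) && (__CRTL_VER >= 80200000)
#define _USE_STD_STAT 1
#define HAVE_STD_STAT 1   /* hypothetical project macro */
#else
#define HAVE_STD_STAT 0   /* fall back to the traditional stat structure */
#endif

#include <sys/stat.h>

/* Returns 1 if this build asked for the standard stat structure. */
int built_with_std_stat(void)
{
    return HAVE_STD_STAT;
}
```

With a guard like this, _USE_STD_STAT=ENABLE would be dropped from the /DEFINE list so that the source decides, and a v7.3-2 build would no longer reference __STD_STAT or __STD_FSTAT.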
I've recently improved my VMSish patch for SQLite and made it available for SQLite version 3.7.14.1: http://www.mail-archive.com/sqlite-users@sqlite.org/msg73570.html (or http://sqlite.1065341.n5.nabble.com/Building-SQLite-3-7-14-1-for-OpenVMS-td65277.html).
POSIX locking still doesn't work though, and I was unable to find out why.
Well, there was a message on the sqlite-users mailing list on getting SQLite 3.7.9 working on OpenVMS. I don't know how relevant that is to the version you've got (or if the patch was adopted by the SQLite developers; they're a bit picky for legal reasons IIRC) but it looks likely to be useful. Good luck.

Increase the stack size of Application in Xcode 4

I want to increase the stack size of an iPad application to 16 MB. I tried to do this in the Xcode build settings by adding "-WI-stack_size 1000000" to the Other Linker Flags field,
but I get the following build error:
i686-apple-darwin10-gcc-4.2.1: stack_size: No such file or directory
i686-apple-darwin10-gcc-4.2.1: 1000000: No such file or directory
How can I resolve that?
Do not know if you got it solved or not, but the correct way to indicate such option to the linker (according to this site) is to add -Wl,-stack_size,1000000 to the Other Linker Flags field of the Build Styles pane. You are missing the commas.
In case you are using clang or gcc the cli would be:
g++ -Wl,-stack_size -Wl,1000000
Hope it helps.
If you think you need to increase the stack size, you're almost certainly doing something very wrong in your code, like allocating objects that are far too large on the stack (e.g. a 16 MByte C array).
Instead, allocate the memory on the heap or use the proper Objective-C data structures (e.g. NSMutableArray).
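As a minimal sketch of that advice in plain C (the function name and the 16 MiB size are illustrative assumptions, not taken from the asker's code), a large working buffer can be taken from the heap with malloc instead of being declared as a local array:

```c
#include <stdlib.h>
#include <string.h>

#define BIG_BUFFER_BYTES (16u * 1024u * 1024u)   /* 16 MiB */

/* Allocates the buffer on the heap, uses it, and releases it.
   Returns 0 on success, -1 if the allocation or the check failed. */
int process_with_heap_buffer(void)
{
    unsigned char *buf = malloc(BIG_BUFFER_BYTES);
    if (buf == NULL) {
        return -1;                       /* allocation failed */
    }
    memset(buf, 0, BIG_BUFFER_BYTES);    /* stand-in for real work */
    buf[BIG_BUFFER_BYTES - 1] = 1;
    int ok = (buf[0] == 0 && buf[BIG_BUFFER_BYTES - 1] == 1);
    free(buf);
    return ok ? 0 : -1;
}
```

The same buffer declared as a local array (unsigned char buf[BIG_BUFFER_BYTES];) would have to fit in the thread's stack, which is what prompts the stack_size change in the first place.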

What's the equivalent of rdtsc opcode for PPC?

I have an assembly program that has the following code.
This code compiles fine for an Intel processor. But when I use a PPC (cross)compiler, I get an error that the opcode is not recognized. I am trying to find out whether there is an equivalent opcode for the PPC architecture.
.file "assembly.s"
.text
.globl func64
.type func64,#function
func64:
rdtsc
ret
.size func64,.Lfe1-func64
.globl func
.type func,#function
func:
rdtsc
ret
PowerPC includes a "time base" register which is incremented regularly (although perhaps not at each clock -- it depends on the actual hardware and the operating system). The TB register is a 64-bit value, read as two 32-bit halves with mftb (low half) and mftbu (high half). The four least significant bits of TB are somewhat unreliable (they increment monotonically, but not necessarily with a fixed rate).
Some of the older PowerPC processors do not have the TB register (but the OS might emulate it, probably with questionable accuracy); however, the 603e already has it, so it is a fair bet that most if not all PowerPC systems actually in production have it. There is also an "alternate time base register".
For details, see the Power ISA specification, available from the power.org Web site. At the time of writing this answer, the current version was 2.06B, and the TB register and opcodes were documented at pages 703 to 706.
When you need a 64-bit value on a 32-bit architecture (not sure how it works on 64-bit) and you read the TB register you can run into the problem of the lower half going from 0xffffffff to 0 - granted this doesn't happen often but you can be sure it will happen when it will do the most damage ;)
I recommend you read the upper half first, then the lower and finally the upper again. Compare the two uppers and if they are equal, no problemo. If they differ (the first should be one less than the last) you have to look at the lower to see which upper it should be paired with: if its highest bit is set it should be paired with the first, otherwise with the last.
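The pairing rule described above can be sketched as a small pure function. The three arguments stand for the upper, lower and upper readings taken in that order (for example with mftbu, mftb, mftbu). The function name is hypothetical and the hardware reads themselves are left out so that the logic can be checked on any machine.

```c
#include <stdint.h>

/* Pairs a lower-half reading with the matching upper half.
   upper_first, lower and upper_last are readings taken in that order. */
uint64_t combine_time_base(uint32_t upper_first, uint32_t lower,
                           uint32_t upper_last)
{
    uint32_t upper;
    if (upper_first == upper_last) {
        upper = upper_first;     /* no carry happened between the reads */
    } else if (lower & 0x80000000u) {
        upper = upper_first;     /* lower was read before it wrapped */
    } else {
        upper = upper_last;      /* lower was read after it wrapped */
    }
    return ((uint64_t)upper << 32) | lower;
}
```

For example, readings of 5, 0xFFFFFFF0, 6 are combined as 0x00000005FFFFFFF0, while readings of 5, 0x10, 6 are combined as 0x0000000600000010.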
Apple has three versions of mach_absolute_time() for the different types of code:
32-bit
64-bit kernel, 32-bit app
64-bit kernel, 64-bit app
Inspired by a comment from Peter Cordes and the disassembly of clang's __builtin_readcyclecounter:
mfspr 3, 268
blr
For gcc you can do the following:
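unsigned long long rdtsc(){
    unsigned long long rval;
    /* %0 lets the compiler pick the output register instead of hard-coding r3.
       Assumes a 64-bit target, where SPR 268 returns the full time base. */
    __asm__ __volatile__("mfspr %0, 268": "=r" (rval));
    return rval;
}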
Or for clang:
unsigned long long readTSC() {
// _mm_lfence(); // optionally wait for earlier insns to retire before reading the clock
return __builtin_readcyclecounter();
}
