Meaning of unix v6 assembly code generated by cc compiler? - unix

For example, below is a piece of C code and its assembly code generated by cc compiler.
// C code (pre K&R C)
foo(a, b) {
int c, d;
c = a;
d = b;
return c+d;
}
// corresponding assembly code generated by cc
.global _foo
.text
_foo:
~~foo:
~a=4
~b=6
~c=177770
~d=177766
jsr r5, csv
sub $4, sp
mov 4(r5), -10(r5)
mov 6(r5), -12(r5)
mov -10(r5), r0
add -12(r5), r0
jbr L1
L1: jmp cret
I can understand most of the code. But I don't know what does ~~foo: do. And where do the magic numbers come from in ~c=177770 and ~d=177766. The hardware is pdp-11/40.

The tildes look like data which determines the stack usage. You might find it helpful to recall that the pdp-11 used 16-bit integers, and that DEC preferred octal numbers over hexadecimal.
That
jsr r5, csv
is a way of making register 5 (r5) point to some data (perhaps the list of offsets).
The numbers correspond to offsets on the stack in octal. The caller is assumed to do something like
push a and b onto the stack (positive offsets)
push the return address onto the stack (offset=0)
possibly push other stuff in the csv function
c and d are local variables (negative offsets, hence the "17777x")
That line
~d=177776
looks odd - I'd expect
~d=177766
since it should be below c on the stack. The -10 and -12 offsets in the register operands look like they're also octal numbers. You should be able to match up the offsets with the variables, by context.
That's just an educated guess: I adapted the jsr+r5 idiom a while back in a text-editor.
The lines with tildes are symbol definitions. A clue for that is in the DECUS C Compiler Reference, found at
ftp://ftp.update.uu.se/pub/pdp11/rsx/lang/decusc/2.19/005003/CC.DOC
which says
3.3 Global Symbols Containing Radix-50 '$' and '.'
______ _______ __________ ________ ___
With this version of Decus C, it is possible to generate and
access global symbols which contain the Radix-50 '.' and '$'.
The compiler allows identifiers to contain the Ascii '$', which
becomes a Radix-50 '$' in the object code. The AS assembly code
shows this character as a tilde (~). The underscore character
(_) in a C program becomes a '.' in both the AS assembly
language and in the object code. This allows C programs to
access all global symbols:
extern int $dsw;
. . .
printf("Directive status = %06o\n", $dsw);
The above prints the current contents of the task's directive
status word.
So you could read
~a=4
as
$a=4
and see that $a is a (more or less) conventional symbol.

Related

Ada program to detect an end of line

I was assigned this task as my homework. I have a file which contains lines of text of varying lengths. The program is supposed to write the data onto the screen in precisely the same order in which it is written in the file, yet it fails to do so. To achieve the desired result I tried reading only one character per iteration so as to detect new line characters. What am I doing wrong?
WITH Ada.Text_IO;
WITH Ada.Characters.Latin_1;
USE Ada.Text_IO;
PROCEDURE ASCII_artwork IS
File : File_Type;
c : Character;
BEGIN
Open(File, In_File, "Winnie_The_Pooh.txt");
WHILE NOT End_Of_File(File) LOOP
Get(File, C);
IF (C = Ada.Characters.Latin_1.LF) THEN Put_Line(" "); ELSE
Put(C);
END IF;
END LOOP;
Close(File);
END ASCII_Artwork;
For each file, the Ada runtime maintains a fictitious "cursor". This is not the typical file position cursor (index), but one that indicates the position on a page, line, etc. (see also RM A.10 (7)). This is somewhat of an inheritance from the early versions of Ada.
Get stems from this same era and is expected to update the location of this cursor when some particular control characters are being read (e.g. an end-of-line mark). If Get reads such such a control character, it will only use it to update the cursor (internally) and then continue to read a next character (see also RM A.10.7 (3)). You'll therefore never detect an end-of-line mark when using Get.
This behavior, however, has some uncomfortable consequence: if a file ends with a sequence of control characters, then Get will keep reading those characters and hit the end of the file causing an End_Error exception.
You can, of course, catch this exception and handle it, but such a construct is dubious as having a sequence of control characters at the end of a file is actually not such an abnormal case (and hence dubious if worth an exception). As a programmer, however, you cannot change this behavior: it's defined by the language and the language will not be changed because it has been decided to keep Ada (highly) backwards compatible (which in itself is understandable given its field of application).
Hence, in your case, if you want stick to a character-by-character processing approach, I would suggest to move away from Get and instead use (for example) streams to perform I/O as in the example below.
main.adb
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Text_IO.Text_Streams; use Ada.Text_IO.Text_Streams;
procedure ASCII_artwork IS
File : File_Type;
Input : Stream_Access;
Output : Stream_Access;
C : Character;
begin
Open (File, In_File, "Winnie_The_Pooh.txt");
Input := Stream (File);
Output := Stream (Standard_Output);
while not End_Of_File (File) loop
Character'Read (Input, C);
Character'Write (Output, C);
end loop;
Close(File);
end ASCII_Artwork;
Output is as expected (i.e. the content of this the file at ascii-art.de).
NOTE: Check the source code of the GNAT runtime to actually see how Get works internally (focus on the loop at the end).
As explained by DeeDee, text inputs are buffered linewise in Ada. The idea is to be able to read two integers on the same line. For consistency sake (the designers of Ada are picky on that...), Get(File, C) does the same. It is not practical in your case. Fortunately, Ada 95 has introduced Get_Immediate, to solve precisely that issue.
Otherwise, as suggested by Frédéric, you could use the function Get_Line to absorb Winnie_The_Pooh.txt line by line seamlessly. By the way, the Get_Line method will convert the different end-of-line conventions automatically.
Line terminators in Ada.Text_IO are a concept, not a character or sequence of characters in the file. (Although most commonly used file systems implement them as characters or sequences of characters in the file, there exist file systems that do not.) Line terminators must therefore be manipulated using the operations in the package. For reading, End_Of_Line checks to see if the cursor is at a line terminator, Skip_Line skips the next line terminator, and Get_Line may skip a line terminator. For writing, New_Line and Put_Line write line terminators.
For your problem, the canonical solution is to use the Get_Line function to read lines, and Put_Line to output the lines read.

Is this use of character string pointers safe?

While implementing a string utility function, I came across a couple of character pointer expressions that I think may be unsafe. I googled, searched on SO, read my Fortran 95 language guide (Gehrke 1996) as well as various excerpts on display in Google books. However, I could not find any sources discussing this particular usage.
Both ifort and gfortran compile the following program without warning:
PROGRAM test_pointer
IMPLICIT NONE
CHARACTER(LEN=100), TARGET :: string = "A string variable"
CHARACTER(LEN=0), TARGET :: empty = ""
CHARACTER(LEN=:), POINTER :: ptr
ptr => NULL()
IF(ptr == "") PRINT *, 'Nullified pointer is equal to ""'
ptr => string(-2:-3)
IF(ptr == "") PRINT *, 'ptr equals "", but the (empty) sub string was out of bounds.'
ptr => empty(1:0)
IF(ptr == "") PRINT *, 'ptr equals "", it was not possible to specify subarray within bonds'
END PROGRAM
The output of the program is:
Nullified pointer is equal to ""
ptr equals "", but the (empty) sub string was out of bounds.
ptr equals "", it was not possible to specify subarray within bonds
So apparently, the evaluations of the pointer make sense to the compiler and the outcome is what you would expect. Can somebody explain why the above code did not result in at least one segmentation fault? Does the standard really allow out-of-bounds substrings? What about the use of a nullified character pointer?
edit : After reading Vladimir F's answer, I realized that I forgot to activate runtime checking. The nullified pointer actually does trigger a run time error.
Why they do not result in a segfault? Dereferencing a nullified pointer is not conforming to the standard (in C terms it is undefined behaviour). The standard does not say what a non-conforming program should do. The standard only applies to programs which conform to it! Anything can happen for non-conforming programs!
I get this (sunf90):
****** FORTRAN RUN-TIME SYSTEM ******
Attempting to use an unassociated POINTER 'PTR'
Location: line 8 column 6 of 'charptr.f90'
Aborted
and with another compiler (ifort):
forrtl: severe (408): fort: (7): Attempt to use pointer PTR when it is not associated with a target
Image PC Routine Line Source
a.out 0000000000402EB8 Unknown Unknown Unknown
a.out 0000000000402DE6 Unknown Unknown Unknown
libc.so.6 00007FA0AE123A15 Unknown Unknown Unknown
a.out 0000000000402CD9 Unknown Unknown Unknown
For the other two accesses, you are not accessing anything, you are creating a substring of length 0, there is no need to access the character variable, the result is just an empty string.
Specifically, the Fortran standard (F2008:6.4.1.3) says this about creating a substring:
Both the starting point and the ending point shall be within the
range 1, 2, ..., n unless the starting point exceeds the ending
point, in which case the substring has length zero.
For this reason the first part is not standard conforming, but the other ones are.

I don't understand the result of this little program

i've made this little program to test a little part of a bigger program.
int main()
{
char c[]="ddddddddddddd";
char *g= malloc(4*sizeof(char));
*g=NULL;
strcpy (g,c);
printf("Hello world %s!\n",g);
return 0;
}
I expected that the function would return "Hello World dddd" ,since the length of g is 4*sizeof(char), but it returns " Hello World ddddddddddddd ".Can you explain me Where I'm wrong ?
Don't do that, it's undefined behaviour.
The strcpy function will happily copy all those characters in c regardless of the size of g.
That's because it copies characters up to the first \0 in c. In this particular case it may corrupt your heap, or it may not, depending on the minimum size of things that get allocated in the heap (many have a "resolution" of sixteen bytes for example).
There are other functions you can use (though they're optional) if you want your code to be more robust, such as strncpy (provided you understand the limitations), or strcpy_s(), as detailed in Appendix K of the ISO C11 standard (and earlier iterations as well).
Or, if you can't use those for some reason, it's up to the developer to ensure they don't break the rules.

gfortran: why "-posinf" leads to Arithmetic overflow?

to get +-inf on 64 bit system i used the next code
double precision, parameter :: pinf = transfer(z'7FF0000000000000',1d0) ! 64 bit
double precision, parameter :: ninf = transfer(z'FFF0000000000000',1d0) ! 64 bit
and it works well.
On 32-bit
I've got an compilation error only(!) for ninf:
double precision, parameter :: ninf = transfer(z'FFF0000000000000',1d0
1
Error: Integer too big for integer kind 8 at (1)
assignment ninf = -pinf not helps and leads to compilation Arithmetic overflow error:
double precision, parameter :: ninf = -pinf
1
Error: Arithmetic overflow at (1)
I know about ieee_arithmetic module but gcc don't handle it.
Is there any multi-architecture way to set constants to positive/negative infinities?
Update
Gfortran option -fno-range-check suppress errors and successfully compile that code.
It's not important but I'm still interesting.
Why gfortran allows constant definition of +Infinity but yelling in loud about exactly the same thing with -Infinity?
In this case gfortran is internally representing your hexadecimal ("Z") literals as the largest unsigned integer size available. Since transfer is a Fortran intrinsic, and Fortran does not have unsigned integers, the first thing gfortran does is to assign the literal to a signed type, which causes your bit pattern for negative infinity to overflow. This happens in many other cases where you use BOZ literals, and I think that this is a bug in gfortran.
I think this only shows up on a 32 bit system because on your 64 bit system, gfortran probably has a 128 bit integer type available; a 128 bit signed integer will not "overflow" with that bit pattern.
But it is also the case that your code does not conform to the Fortran standard, which says that hex literals can only appear inside data statements or the functions int, real, or dble. However, putting a hex literal in dble does the same thing as transfer anyway. If gfortran did not have a bug in it, your program would work, but it would technically be incorrect.
Anyway, the following code works for me in gfortran, and I believe it will solve your problem in a way that's standard-compliant and avoids -fno-range-check:
integer, parameter :: i8 = selected_int_kind(13)
integer, parameter :: r8 = selected_real_kind(12)
integer(i8), parameter :: foo = int(Z'7FF0000000000000',i8)
integer(i8), parameter :: bar = ibset(foo,bit_size(foo)-1)
real(r8), parameter :: posinf = transfer(foo,1._r8)
real(r8), parameter :: neginf = transfer(bar,1._r8)
print *, foo, bar
print *, posinf, neginf
end
Output:
9218868437227405312 -4503599627370496
Infinity -Infinity
The key is to create the pattern for positive infinity first (since it works), and then create the pattern for negative infinity by simply setting the sign bit (the last one). The ibset intrinsic is only for integers, so you then have to use transfer on those integers to set your real positive/negative infinity.
(My use of i8/r8 is just habit, since I've worked with compilers where the kind parameter was not equal to the number of bytes. They are both equal to 8 in this case.)
I'm not using the same compiler as you are (I'm using g95 with compiler option -i4 set for 32-bit integers, and one workaround (if you're staunch about using transfer for that purpose) that I found was to specify the integer argument as a parameter like so:
Note: with my compiler, I was able to assign the number directly to the parameter. I'm not sure if it's the same on yours, but I'm pretty sure that you're only really supposed to use the transfer function when you're not really dealing with constants -- like if you're doing fancy stuff with floating point numbers and need like really nitty gritty control over the representation thereof.
Note the variables pdirect and ndirect.
program main
integer(8), parameter :: pinfx= z'7FF0000000000000'
integer(8), parameter :: ninfx= z'FFF0000000000000'
double precision, parameter :: pinf = transfer(pinfx, 1d0)
double precision, parameter :: ninf = transfer(ninfx, 1d0)
double precision, parameter :: pdirect = z'7FF0000000000000'
double precision, parameter :: ndirect = z'7FF0000000000000'
write (*,*) 'PINFX ', pinfx
write (*,*) 'NINFX ', ninfx
write (*,*) 'PINF ', pinf
write (*,*) 'NINF ', ninf
write (*,*) 'PDIRECT', pdirect
write (*,*) 'NDIRECT', ndirect
end program
This produces the output:
PINFX 9218868437227405312
NINFX -4503599627370496
PINF +Inf
NINF -Inf
PDIRECT +Inf
NDIRECT +Inf
I hope this helps!

Variable Names in SWI Prolog

I have been using the chr library along with the jpl interface. I have a general inquiry though. I send the constraints from SWI Prolog to an instance of a java class from within my CHR program. The thing is if the input constraint is leq(A,B) for example, the names of the variables are gone, and the variable names that appear start with _G. This happens even if I try to print leq(A,B) without using the interface at all. It appears that whenever the variable is processed the name is replaced with a fresh one. My question is whether there is a way to do the mapping back. For example whether there is a way to know that _G123 corresponds to A and so on.
Thank you very much.
(This question has nothing to do with CHR nor is it specific to SWI).
The variable names you use when writing a Prolog program are discarded completely by the Prolog system. The reason is that this information cannot be used to print variables accurately. There might be several independent instances of that variable. So one would need to add some unique identifier to the variable name. Also, maintaining that information at runtime would incur significant overheads.
To see this, consider a predicate mylist/1.
?- [user].
|: mylist([]).
|: mylist([_E|Es]) :- mylist(Es).
|: % user://2 compiled 0.00 sec, 4 clauses
true.
Here, we have used the variable _E for each element of the list. The toplevel now prints all those elements with a unique identifier:
?- mylist(Fs).
Fs = [] ;
Fs = [_G295] ;
Fs = [_G295, _G298] .
Fs = [_G295, _G298, _G301] .
The second answer might be printed as Fs = [_E] instead. But what about the third? It cannot be printed as Fs = [_E,_E] since the elements are different variables. So something like Fs = [_E_295,_E_298] is the best we could get. However, this would imply a lot of extra book keeping.
But there is also another reason, why associating source code variable names with runtime variables would lead to extreme complexities: In different places, that variable might have a different name. Here is an artificial example to illustrate this:
p1([_A,_B]).
p2([_B,_A]).
And the query:
?- p1(L), p2(L).
L = [_G337, _G340].
What names, would you like, these two elements should have? The first element might have the name _A or _B or maybe even better: _A_or_B. Or, even _Ap1_and_Bp2. For whom will this be a benefit?
Note that the variable names mentioned in the query at the toplevel are retained:
?- Fs = [_,F|_], mylist(Fs).
Fs = [_G231, F] ;
Fs = [_G231, F, _G375] ;
Fs = [_G231, F, _G375, _G378]
So there is a way to get that information. On how to obtain the names of variables in SWI and YAP while reading a term, please refer to this question.

Resources