What's the name of the popcount function in Julia?

What's the name of the function that tells you how many bits are set in some variable? This surely already exists in Base or maybe some standard library.

To quote Keno Fischer...
Try count_ones. As you can see it uses the popcnt instruction:
julia> code_native(count_ones,(Int64,))
.section __TEXT,__text,regular,pure_instructions
Filename: int.jl
Source line: 192
push RBP
mov RBP, RSP
Source line: 192
popcnt RAX, RDI
pop RBP
ret
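For readers who have not met popcount before, here is a minimal C sketch of what such a function computes. It is not Julia's implementation (the listing above shows that Julia compiles count_ones to a single popcnt instruction); the name popcount64 and the loop-based approach are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: counts the set bits of x in software.
   Each iteration clears the lowest set bit, so the loop runs
   once per set bit rather than once per bit position. */
unsigned popcount64(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    assert(count <= 64);  /* a 64-bit word has at most 64 set bits */
    return count;
}
```

On hardware with a popcnt instruction, a compiler builtin such as GCC's and Clang's __builtin_popcountll would normally be preferred over a loop like this.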
Is your question in any way related to the Hacker News buzz about "Replacing a 32-bit loop count variable with 64-bit introduces crazy performance deviations"?

Related

x86 assembly, moving data from an array to a register

I've been going over the book again and again and cannot understand why this gives me "improper operand type". It should work!
This is inline assembly in Visual Studio.
function(unsigned int* a){
unsigned int num;
_asm {
mov eax, a //This stores address (start of the array) in eax
mov num, dword ptr [eax*4] //This is the line I am having issues with.
In that last line, I am trying to load the 4-byte value that is in the array, but I get error C2415: improper operand type.
What am I doing wrong? How do I copy a 4-byte value from an array into a 32-bit register?
In Visual C++'s inline assembly, all variables are accessed as memory operands [1]; in other words, wherever you write num you can think of the compiler replacing it with dword ptr [ebp - something].
Now, this means that in the last mov you are effectively trying to perform a memory-memory mov, which isn't provided on x86. Use a temporary register instead:
mov eax, dword ptr [a] ; load value of 'a' (which is an address) in eax
mov eax, dword ptr [eax] ; dereference address, and load contents in eax
mov dword ptr [num], eax ; store value in 'num'
Notice that I removed the *4, as it doesn't really make sense to multiply a pointer by four - maybe you meant to use a as base plus some other index?
[1] Other compilers, such as gcc, provide means to control the interaction between inline assembly and compiler-generated code much more finely, which provides great flexibility and power but has quite a steep learning curve and requires great care to get everything right.
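As a rough C-level picture of the fix above, the sketch below loads one array element through a temporary, which is what the three mov instructions do with eax. The helper name load_element and the index parameter i are assumptions for illustration; the index stands in for the "base plus some other index" case the answer speculates about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns a[i]. In addressing terms this is
   "base address a plus index i scaled by the element size",
   i.e. something like [eax + ecx*4] for 4-byte elements. */
unsigned int load_element(const unsigned int *a, size_t i)
{
    assert(a != NULL);
    unsigned int tmp = a[i];  /* load into a temporary, playing the role of a register */
    return tmp;               /* the caller then stores it wherever it likes */
}
```

Note that the scaling by 4 applies to the index, never to the base pointer itself, which is why the *4 on eax alone had to go.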

x86 Assembly pointers

I am trying to wrap my mind around pointers in Assembly.
What exactly is the difference between:
mov eax, ebx
and
mov [eax], ebx
and when should dword ptr [eax] be used?
Also when I try to do mov eax, [ebx] I get a compile error, why is this?
As has already been stated, wrapping brackets around an operand means that that operand is to be dereferenced, as if it were a pointer in C. In other words, the brackets mean that you are reading a value from (or storing a value into) that memory location, rather than reading that value directly.
So, this:
mov eax, ebx
simply copies the value in ebx into eax. In a pseudo-C notation, this would be: eax = ebx.
Whereas this:
mov eax, [ebx]
dereferences the contents of ebx and stores the pointed-to value in eax. In a pseudo-C notation, this would be: eax = *ebx.
Finally, this:
mov [eax], ebx
stores the value in ebx into the memory location pointed to by eax. Again, in pseudo-C notation: *eax = ebx.
The registers here could also be replaced with memory operands, such as symbolic variable names. So this:
mov eax, [myVar]
dereferences the address of the variable myVar and stores the contents of that variable in eax, like eax = myVar.
By contrast, this:
mov eax, myVar
stores the address of the variable myVar into eax, like eax = &myVar.
At least, that's how most assemblers work. Microsoft's assembler (called MASM) and the Microsoft C/C++ compiler's inline assembly are a bit different. They treat the above two instructions as equivalent, essentially ignoring the brackets around memory operands.
To get the address of a variable in MASM, you would use the OFFSET keyword:
mov eax, OFFSET myVar
However, even though MASM has this forgiving syntax and allows you to be sloppy, you shouldn't. Always include the brackets when you want to dereference a variable and get its actual value. You will never get the wrong result if you write the code explicitly using the proper syntax, and it'll make it easier for others to understand. Plus, it'll force you to get into the habit of writing the code the way that other assemblers will expect it to be written, rather than relying on MASM's "do what I mean, not what I write" crutch.
Speaking of that "do what I mean, not what I write" crutch, MASM also generally allows you to get away with omitting the operand-size specifier, since it knows the size of the variable. But again, I recommend writing it for clarity and consistency. Therefore, if myVar is an int, you would do:
mov eax, DWORD PTR [myVar] ; eax = myVar
or
mov DWORD PTR [myVar], eax ; myVar = eax
This notation is necessary in other assemblers like NASM that are not strongly-typed and don't remember that myVar is a DWORD-sized memory location.
You don't need this at all when dereferencing register operands, since the name of the register indicates its size. al and ah are always BYTE-sized, ax is always WORD-sized, eax is always DWORD-sized, and rax is always QWORD-sized. But it doesn't hurt to include it anyway, if you like, for consistency with the way you notate memory operands.
Also when I try to do mov eax, [ebx] I get a compile error, why is this?
Um…you shouldn't. This assembles fine for me in MSVC's inline assembly. As we have already seen, it is equivalent to:
mov eax, DWORD PTR [ebx]
and means that the memory location pointed to by ebx will be dereferenced and that DWORD-sized value will be loaded into eax.
why I cant do mov a, [eax] Should that not make "a" a pointer to wherever eax is pointing?
No. This combination of operands is not allowed. As you can see from the documentation for the MOV instruction, there are essentially five possibilities (ignoring alternate encodings and segments):
mov register, register ; copy one register to another
mov register, memory ; load value from memory into register
mov memory, register ; store value from register into memory
mov register, immediate ; move immediate value (constant) into register
mov memory, immediate ; store immediate value (constant) in memory
Notice that there is no mov memory, memory, which is what you were trying.
However, you can make a point to the same location that eax points to by simply coding:
mov DWORD PTR [a], eax
Now a and eax have the same value. If eax was a pointer, then a is now a pointer to that same memory location.
If you want to set a to the value that eax is pointing to, then you will need to do:
mov eax, DWORD PTR [eax] ; eax = *eax
mov DWORD PTR [a], eax ; a = eax
Of course, this clobbers the pointer and replaces it with the dereferenced value. If you don't want to lose the pointer, then you will have to use a second "scratch" register; something like:
mov edx, DWORD PTR [eax] ; edx = *eax
mov DWORD PTR [a], edx ; a = edx
I realize this is all somewhat confusing. The mov instruction is overloaded with a large number of potential meanings in the x86 ISA. This is due to x86's roots as a CISC architecture. By contrast, modern RISC architectures do a better job of separating register-register moves, memory loads, and memory stores. x86 crams them all into a single mov instruction. It's too late to go back and fix it now; you just have to get comfortable with the syntax, and sometimes it takes a second glance.
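The pseudo-C notation used throughout this answer can be collected into a small C sketch. The function names are hypothetical and the parameters are named after the registers they stand in for; this only models the data movement, not how a compiler would actually allocate registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mov eax, ebx : copy the register value itself (eax = ebx) */
uint32_t reg_to_reg(uint32_t ebx)
{
    return ebx;
}

/* mov eax, [ebx] : load the value ebx points to (eax = *ebx) */
uint32_t load(const uint32_t *ebx)
{
    assert(ebx != NULL);
    return *ebx;
}

/* mov [eax], ebx : store ebx at the location eax points to (*eax = ebx) */
void store(uint32_t *eax, uint32_t ebx)
{
    assert(eax != NULL);
    *eax = ebx;
}

/* There is no mov memory, memory, so a memory-to-memory copy
   goes through a scratch "register" (edx here). */
void mem_to_mem(uint32_t *a, const uint32_t *eax)
{
    assert(a != NULL && eax != NULL);
    uint32_t edx = *eax;  /* mov edx, DWORD PTR [eax] */
    *a = edx;             /* mov DWORD PTR [a], edx   */
}
```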

How does interaction with computer hardware look in Lisp?

In C, it is easy to manipulate memory and hardware registers, because concepts such as "address" and "volatile" are built into the language. Consequently, most OSs are written in the C family of languages. For example, I can copy an arbitrary function to an arbitrary location in memory, then call that location as a function (assuming the hardware doesn't stop me from executing data, of course; this would work on certain microcontrollers).
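#include <stdio.h>
#include <string.h>
int hello_world()
{
printf("Hello, world!");
return 0;
}
int main()
{
unsigned char buf[1000];
/* copy the function's machine code into a data buffer */
memcpy(buf, (const void*)hello_world, sizeof buf);
/* treat the buffer as a function and call it */
int (*x)() = (int(*)())buf;
x();
}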
However, I have been reading about the Open Genera operating system for certain dedicated Lisp machines. Wikipedia says:
Genera is written completely in Lisp; even all the low-level system code is written in Lisp (device drivers, garbage collection, process scheduler, network stacks, etc.)
I am completely new to Lisp, but this seems like a difficult thing to do: Common Lisp, from what I've seen, doesn't have good abstractions for the hardware it's running on. How would Common Lisp operating systems do something basic such as compile the following trivial function, write its machine code representation to memory, then call it?
(defun hello () (format t "Hello, World!"))
Of course, Lisp can be easily implemented in itself, but in the words of Sam Hughes, "somewhere down the line, abstraction runs out and a machine has to execute an instruction."
The Lisp machine was computer hardware with a CPU, just like machines today, only the CPU had some special instructions that mapped better to Lisp. It was still a stack machine, and it compiled its source to CPU instructions just as modern Common Lisp implementations do today on more general CPUs.
On the Lisp machine Wikipedia page you can see how a function gets compiled:
(defun example-count (predicate list)
(let ((count 0))
(dolist (i list count)
(when (funcall predicate i)
(incf count)))))
(disassemble (compile #'example-count))
0 ENTRY: 2 REQUIRED, 0 OPTIONAL ;Creating PREDICATE and LIST
2 PUSH 0 ;Creating COUNT
3 PUSH FP|3 ;LIST
4 PUSH NIL ;Creating I
5 BRANCH 15
6 SET-TO-CDR-PUSH-CAR FP|5
7 SET-SP-TO-ADDRESS-SAVE-TOS SP|-1
10 START-CALL FP|2 ;PREDICATE
11 PUSH FP|6 ;I
12 FINISH-CALL-1-VALUE
13 BRANCH-FALSE 15
14 INCREMENT FP|4 ;COUNT
15 ENDP FP|5
16 BRANCH-FALSE 6
17 SET-SP-TO-ADDRESS SP|-2
20 RETURN-SINGLE-STACK
This is then stored somewhere in memory, and running the function simply means jumping to or calling that address. As with any machine code, the CPU is instructed to continue with some other code when it is done, and that may be the Lisp main loop itself (the REPL).
The same code compiled with SBCL:
; Size: 203 bytes
; 02CB9181: 48C745E800000000 MOV QWORD PTR [RBP-24], 0 ; no-arg-parsing entry point
; 189: 488B4DF0 MOV RCX, [RBP-16]
; 18D: 48894DE0 MOV [RBP-32], RCX
; 191: 660F1F840000000000 NOP
; 19A: 660F1F440000 NOP
; 1A0: L0: 488B4DE0 MOV RCX, [RBP-32]
; 1A4: 8D41F9 LEA EAX, [RCX-7]
; 1A7: A80F TEST AL, 15
; 1A9: 0F8598000000 JNE L2
; 1AF: 4881F917001020 CMP RCX, 537919511
; 1B6: 750A JNE L1
; 1B8: 488B55E8 MOV RDX, [RBP-24]
; 1BC: 488BE5 MOV RSP, RBP
; 1BF: F8 CLC
; 1C0: 5D POP RBP
; 1C1: C3 RET
; 1C2: L1: 488B45E0 MOV RAX, [RBP-32]
; 1C6: 488B40F9 MOV RAX, [RAX-7]
; 1CA: 488945D8 MOV [RBP-40], RAX
; 1CE: 488B45E0 MOV RAX, [RBP-32]
; 1D2: 488B4801 MOV RCX, [RAX+1]
; 1D6: 48894DE0 MOV [RBP-32], RCX
; 1DA: 488B55F8 MOV RDX, [RBP-8]
; 1DE: 4883EC18 SUB RSP, 24
; 1E2: 48896C2408 MOV [RSP+8], RBP
; 1E7: 488D6C2408 LEA RBP, [RSP+8]
; 1EC: B902000000 MOV ECX, 2
; 1F1: FF1425B80F1020 CALL QWORD PTR [#x20100FB8] ; %COERCE-CALLABLE-TO-FUN
; 1F8: 488BC2 MOV RAX, RDX
; 1FB: 488D5C24F0 LEA RBX, [RSP-16]
; 200: 4883EC18 SUB RSP, 24
; 204: 488B55D8 MOV RDX, [RBP-40]
; 208: B902000000 MOV ECX, 2
; 20D: 48892B MOV [RBX], RBP
; 210: 488BEB MOV RBP, RBX
; 213: FF50FD CALL QWORD PTR [RAX-3]
; 216: 480F42E3 CMOVB RSP, RBX
; 21A: 4881FA17001020 CMP RDX, 537919511
; 221: 0F8479FFFFFF JEQ L0
; 227: 488B55E8 MOV RDX, [RBP-24]
; 22B: BF02000000 MOV EDI, 2
; 230: 41BBF0010020 MOV R11D, 536871408 ; GENERIC-+
; 236: 41FFD3 CALL R11
; 239: 488955E8 MOV [RBP-24], RDX
; 23D: E95EFFFFFF JMP L0
; 242: CC0A BREAK 10 ; error trap
; 244: 02 BYTE #X02
; 245: 19 BYTE #X19 ; INVALID-ARG-COUNT-ERROR
; 246: 9A BYTE #X9A ; RCX
; 247: L2: CC0A BREAK 10 ; error trap
; 249: 02 BYTE #X02
; 24A: 02 BYTE #X02 ; OBJECT-NOT-LIST-ERROR
; 24B: 9B BYTE #X9B ; RCX
NIL
Not quite as few instructions, is it? When this function runs, that is the machine code that gets control, and it hands control back to the system, since the return address is perhaps the REPL or the next instruction, just like with compiled C.
A special thing about Lisps in general is that lexical closures need to be handled. In C, once a call returns, its local variables no longer exist, but a Lisp function may return or store a function that uses those variables later, after they have gone out of scope. This means that in compiled code such variables may need to be handled almost as inefficiently as in interpreted code, especially with an old compiler that doesn't do much optimization.
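To make the closure point concrete, here is a minimal C sketch of one common way to represent a closure: a function pointer paired with a heap-allocated environment that outlives the call that created it. All names (counter_env, closure, make_counter, and so on) are hypothetical, and this is not a description of how any particular Lisp compiler lays out closures.

```c
#include <assert.h>
#include <stdlib.h>

/* The captured variables live on the heap, not on the stack,
   so they survive after make_counter has returned. */
typedef struct counter_env {
    int count;
} counter_env;

/* A closure is code plus the environment it closes over. */
typedef struct closure {
    int (*fn)(counter_env *env);
    counter_env *env;
} closure;

static int increment(counter_env *env)
{
    env->count += 1;
    return env->count;
}

closure make_counter(int start)
{
    closure c;
    c.env = malloc(sizeof *c.env);
    assert(c.env != NULL);   /* keep the sketch simple: no recovery path */
    c.env->count = start;
    c.fn = increment;
    return c;
}

int call_closure(closure c)
{
    return c.fn(c.env);
}

void free_closure(closure c)
{
    free(c.env);
}
```

The extra indirection through env on every access is the kind of cost the paragraph above alludes to; a Lisp system would normally leave freeing the environment to the garbage collector rather than to an explicit call.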
A C compiler does a fair amount of translating as well; otherwise, what would be the reason for programming in C rather than in assembly? Intel x86 processors have no built-in notion of procedure arguments; that is emulated by the C compiler. The caller places values on the stack and cleans them up afterward. Looping constructs such as for and while don't exist either, only branches and jumps. Yes, in C you get more of a feel for the underlying hardware, but it really isn't the same as machine code. It only leaks more.
A Lisp implementation that serves as an OS can expose low-level assembly instructions as Lisp opcodes. Compilation would then translate everything to low-level Lisp, and from there it's a 1:1 mapping to machine bytes.
An operating system with a C library and a C compiler does the very same thing: it translates to machine code and can then run that code itself. This is how Lisp systems are meant to work too, so the only thing you need is an API to the hardware, which can be as low-level as memory-mapped I/O.
Even without abstraction lisp can emit assembler. See
Movitz's network code
An ARM assembler
But it can also be used to create a thin but powerful abstraction over machine code. See Henry Baker's Comfy Compiler.
Finally, check SBCL's VOPs (example); they allow you to control what assembly code is emitted, although with virtual registers, since this happens before register allocation.
You may find this post interesting, as it deals with how to emit assembly from SBCL.
Btw, even though you can write drivers and such in lisp, it is not a good idea to needlessly duplicate the effort, so even Lisp implementations in Lisp, like SBCL, have some C parts to allow interfacing with the OS.
These C header files, along with the C source and assembly files, are
then used (figure 2) to produce the sbcl executable itself. The
executable is as yet not useful; while it provides an interface to the
operating system services, and a garbage collector
Taken from section 3.2 of SBCL: a Sanely-Bootstrappable Common Lisp
I haven't checked out how Mezzano works, feel free to dig into it.
Lisp Machines have a few low-level internal functions that allow them to access memory and hardware registers directly. These are used in the guts of the operating system.
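For comparison with the C side of the question, low-level primitives of this kind usually amount to little more than reads and writes through an address. The sketch below shows what such helpers might look like in C; the names reg_read, reg_write and reg_set_bits are hypothetical, and nothing here is taken from Genera or any other Lisp system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* volatile tells the compiler that every access matters and must
   not be cached, reordered away, or merged with another access. */
uint32_t reg_read(const volatile uint32_t *reg)
{
    assert(reg != NULL);
    return *reg;
}

void reg_write(volatile uint32_t *reg, uint32_t value)
{
    assert(reg != NULL);
    *reg = value;
}

/* Read-modify-write: set the bits in mask, leave the others alone. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    reg_write(reg, reg_read(reg) | mask);
}
```

On real hardware the pointer would be a device address from the data sheet; an ordinary variable can stand in for it when experimenting. A Lisp system only needs to offer a handful of equivalents as built-in functions for the rest of the OS to be written on top of them.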

strcmp and strcmp_sse functions in libc

I've seen that in libc.so the actual type of strcmp_sse to call is decided by the function strcmp itself.
Here is the code:
strcmp:
.text:000000000007B9F0 cmp cs:__cpu_features.kind, 0
.text:000000000007B9F7 jnz short loc_7B9FE
.text:000000000007B9F9 call __init_cpu_features
.text:000000000007B9FE
.text:000000000007B9FE loc_7B9FE: ; CODE XREF: .text:000000000007B9F7j
.text:000000000007B9FE lea rax, __strcmp_sse2_unaligned
.text:000000000007BA05 test cs:__cpu_features.cpuid._eax, 10h
.text:000000000007BA0F jnz short locret_7BA2B
.text:000000000007BA11 lea rax, __strcmp_ssse3
.text:000000000007BA18 test cs:__cpu_features.cpuid._ecx, 200h
.text:000000000007BA22 jnz short locret_7BA2B
.text:000000000007BA24 lea rax, __strcmp_sse2
.text:000000000007BA2B
.text:000000000007BA2B locret_7BA2B: ; CODE XREF: .text:000000000007BA0Fj
.text:000000000007BA2B ; .text:000000000007BA22j
.text:000000000007BA2B retn
What I do not understand is that the address of the strcmp_sse function to call is placed in rax and never actually called. Therefore I am wondering: who is going to call *rax, and when?
The Linux dynamic linker supports a special symbol type called STT_GNU_IFUNC. strcmp is likely implemented as an IFUNC. 'Regular' symbols in a dynamic library are nothing more than a mapping from a name to an address. IFUNCs are a bit more complex than that: the address isn't readily available; in order to obtain it, the linker must execute a piece of code from the library itself. We are seeing an example of such a piece of code here. Note that in the x86_64 ABI a function returns its result in RAX.
This technique is typically used to pick the optimal implementation based on the CPU features. Please note that the selection logic runs only once; all but the first call to strcmp are fast.
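The select-once idea can be imitated in plain C with a cached function pointer. This is only an emulation under stated assumptions: with a real IFUNC the dynamic linker runs the resolver and patches the symbol's address, whereas here a wrapper does the lookup on first use. The names my_strcmp, resolve_strcmp, cpu_has_fast_path and resolver_runs are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*strcmp_fn)(const char *, const char *);

/* Two interchangeable implementations, standing in for variants
   such as __strcmp_sse2 and __strcmp_ssse3. */
static int strcmp_libc(const char *a, const char *b)
{
    return strcmp(a, b);
}

static int strcmp_bytewise(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Hypothetical feature probe; a real resolver would look at CPUID data. */
static int cpu_has_fast_path(void)
{
    return 0;
}

int resolver_runs = 0;  /* counts how often the selection logic executes */

/* Like the disassembled code above, the resolver does not call the
   chosen function; it only returns its address. */
static strcmp_fn resolve_strcmp(void)
{
    resolver_runs++;
    return cpu_has_fast_path() ? strcmp_libc : strcmp_bytewise;
}

int my_strcmp(const char *a, const char *b)
{
    static strcmp_fn cached = NULL;
    if (cached == NULL)
        cached = resolve_strcmp();  /* selection happens on the first call only */
    assert(cached != NULL);
    return cached(a, b);
}
```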

Indirect Register Addressing

I am trying to figure out how register indirect addressing works. I have a variable that stores the value of 5 as follows:
section .data
number db 53 ; ascii code for 5
section .bss
number2 resb 1
section .text
global _start
_start:
mov eax,number
mov number2,[eax]
In the last two lines of the code, what I am essentially trying to do is make eax act like a pointer to the data stored at number and then move this data into the number2 variable. I had thought indirect register addressing was done via [register], but my code does not seem to work. Any help with the syntax would be much appreciated.
Labels work as addresses in NASM, so your mov number2, [eax] would translate to something like mov 0x12345678, [eax], which is of course invalid because you cannot move data to an immediate operand. So you would need mov [number2], [eax], but that's also invalid because x86 has no memory-to-memory mov.
You can achieve this using some register to temporarily hold the value [eax]:
mov eax, number
mov dl, [eax]
mov [number2], dl
The problem here is that number and number2 are not numbers, i.e. immediate literals. Instead they are interpreted as absolute memory addresses, and the corresponding instructions, if they existed, would be e.g.
mov eax, [0x80000100] ;; vs
mov [0x80000104], [eax] ;; Invalid instruction
One has to pay attention to the instruction format as well, as answered by Mika Lammi -- is the instruction
mov src, dst ;; vs
mov dst, src
In addition, one should match the register size to the variable size; i.e.
.data
number db 1 ; this is a byte
.code
mov al, [number] ; al is a byte-sized register, matching the byte-sized variable
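Why the sizes should match can be seen in a small C sketch: a byte-sized load reads exactly the variable, while a dword-sized load from the same address also pulls in the three bytes that happen to follow it. The helper names load_byte and load_dword are assumptions for illustration, and the dword is assembled in little-endian order as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Like "mov al, [eax]": reads one byte. */
uint8_t load_byte(const uint8_t *p)
{
    assert(p != NULL);
    return p[0];
}

/* Like "mov eax, [eax]": reads four consecutive bytes and
   combines them little-endian, lowest address first. */
uint32_t load_dword(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

With memory holding the bytes 53, 1, 2, 3, the byte load yields 53 (the ASCII code for 5 from the question), while the dword load yields 0x03020135, which is the intended value polluted by its neighbours.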
