Using assembly code written for 32-bit in 64-bit application - encryption

Can I use the assembly routines for Serpent encryption in the link below written for 32-bit x86 from a 64-bit program on an x86-64 machine? That is, without launching a separate 32-bit process for it? If not, does anyone have a pointer to an optimized implementation of Serpent that works in both 32 and 64 bit (LGPL is OK but cannot use GPL since it's a commercial project)?
http://gladman.plushost.co.uk/oldsite/cryptography_technology/serpent/serpent.asm

You will need to convert the portions of the code that transfer data and results to/from memory to use 64-bit address registers. Stack manipulation code will also need to use the 64-bit stack pointer. Other than that, it's likely to work without major changes.

This code seems compatible, at least to me (generally, IA-32 assembly is pretty backward-compatible, as the sizes of the existing registers do not change; x86-64 just adds new ones). The best way is to check it yourself, though.

Related

Possible to link 64 bit library to 32 bit application?

Actually I want to link a 64-bit library to my 32-bit application.
I want to use a library which works faster under 64 bits under some circumstances, but I have to link that library to my 32-bit application. Is it possible or not?
In a word, no. The only way to get compiled 64-bit code talking to compiled 32-bit code is via some form of IPC (e.g. a pipe, named pipe, or network connection). That may well introduce performance bottlenecks of its own, so it probably isn't worth the bother.
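For illustration, a minimal sketch of the IPC route in C (hypothetical; it assumes a POSIX system and a separately built 64-bit helper executable named fast_helper):

#include <stdio.h>

int main(void) {
    /* The 32-bit app launches the 64-bit build as a child process
       and reads its result back over a pipe. */
    FILE *helper = popen("./fast_helper", "r");
    if (!helper) { perror("popen"); return 1; }

    char result[256];
    if (fgets(result, sizeof result, helper))
        printf("helper says: %s", result);

    pclose(helper);
    return 0;
}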
It is not easy, as @robthebloke mentioned. But the NVIDIA RTX Remix Runtime does it somehow, making older 32-bit games run on a 64-bit Vulkan driver (bypassing the 2-4 GB virtual memory limit).

From a programming point of view, what does it mean when a program is 32 or 64 bit?

I'm a beginner programmer in my first year of Computer Science.
I'm curious about the 32 bit and 64 bit systems, and how it affects developing software.
When I download software I need to choose between the two, while other software only has a 32 bit version.
Are there different ways of programming for a 64 bit system?
Is it compiled in the same way?
What are the main benefits of a separate 64 bit app?
Cheers
Are there different ways of programming for a 64 bit system?
Yes and no. No, in the sense that most of the time you should be able to write platform-independent code, even if you are coding in a language like C. Yes, in the sense that knowledge of the underlying architecture (not just the word size!) helps to speed up critical parts of your program. For instance, you may be able to use special instructions where they are available, as in the sketch below.
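As a minimal sketch of that idea (a hypothetical example; popcount_portable is a made-up name, and the __POPCNT__ check assumes a GCC/Clang toolchain): the portable C version works everywhere, while an architecture-specific fast path is selected at compile time.

#include <stdint.h>

/* Portable fallback: counts set bits on any architecture. */
static uint32_t popcount_portable(uint32_t x) {
    uint32_t n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

uint32_t popcount(uint32_t x) {
#if defined(__POPCNT__)   /* x86 POPCNT instruction available */
    return (uint32_t)__builtin_popcount(x);
#else
    return popcount_portable(x);
#endif
}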
Is it compiled in the same way?
Again, yes and no. Compilers for systems languages work in similar ways for all architectures, but of course, the details differ a bit. For instance, the compiler will use knowledge about your architecture to generate as efficient code as possible for it, but also has to take care of differences between architectures and other details, like calling conventions.
What are the main benefits of a separate 64 bit app?
I assume you are asking about the usual desktop CPUs, i.e. x86 architecture, but note that there are other architectures with word sizes ranging from 8-bit to 128-bit. Typically, people would compile a program targeting a single architecture (i.e. for a given machine), and that's about it.
However, x86 is a bit special, in that the CPU can operate in different modes, each with a different word size: 16-bit, 32-bit and 64-bit (among other differences). Effectively, they implement several ISAs (Instruction Set Architectures) in a single CPU.
This was done to preserve backwards compatibility, and it is key to their commercial success. Consider that, when people bought the first 64-bit capable CPUs, it was most likely that they were still using 32-bit operating systems and software, so they really needed the compatibility. The other options are emulating it (poor performance) or making sure all the popular customer software has been ported (hard to achieve in ecosystems like Windows with many independent, proprietary vendors).
There are several benefits of 64-bit x86 over 32-bit x86: more addressable memory, more integer registers, twice the XMM registers, a better calling convention, guaranteed SSE2... The only downside is using 64-bit pointers, which implies more memory and cache usage. In practice, many programs can expect to be slightly faster in x64 (e.g. 10%), but pointer-heavy programs may even see a decrease in performance.
Generally speaking, the main benefit of a 64-bit application is that it has access to more memory; with a 32-bit pointer you can address only 4 GB.
Most modern compilers have an option to compile either 32-bit or 64-bit code.
32/64-bit coding is otherwise the same, unless you are dealing with huge in-memory objects, where you would need 64 bits specifically.
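A quick way to see the pointer-size difference (a small sketch, assuming a GCC-style toolchain with multilib support):

#include <stdio.h>

int main(void) {
    /* Prints 4 when built with gcc -m32, 8 with gcc -m64. */
    printf("pointer size: %zu bytes\n", sizeof(void *));
    return 0;
}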
An interesting fact/example is that Unix time is stored as a single number, calculated as the number of seconds elapsed since January 1st, 1970. This number will soon exceed what a signed 32-bit integer can hold (in January 2038), so eventually we will have to upgrade all of our systems to store it in 64 bits.
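To illustrate (a small sketch; whether time_t is 32 or 64 bits depends on the platform and the C library):

#include <stdio.h>
#include <time.h>

int main(void) {
    time_t now = time(NULL);   /* seconds since 1970-01-01 00:00 UTC */
    printf("epoch seconds: %lld\n", (long long)now);
    printf("time_t size:   %zu bytes\n", sizeof(time_t));
    /* A signed 32-bit time_t overflows after 2^31 - 1 seconds,
       i.e. on January 19th, 2038 - the "Year 2038 problem". */
    return 0;
}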

Java JDK 32 bits vs 64 bits

I am creating a quite simple application which reads and displays text files and searches through them.
I am asking myself whether there is any point in offering the user both a 32-bit and a 64-bit version.
Is the difference only in having access to a larger heap with the 64-bit version, or is there any other benefit?
Will a program compiled for 32 bits work on a 64-bit JVM? (I assume yes.)
The only differences between 32-bit and 64-bit builds of any program are the sizes of machine words, the amount of addressable memory, and the operating system ABI in use. With Java, the language specification means that the differences in machine word size and OS ABI should not matter at all unless you're using native code as well. (Native code must be built for the same word size as the JVM that will load it; you can't mix 32-bit and 64-bit builds in the same process without very exotic coding indeed, and you shouldn't be doing that with Java anyway.)
The only times it has swung one way or the other for me have been when native libraries were involved that pushed it one way or the other. If you're purely in Java land then, realistically, unless you need >4 GB of heap, there's very little difference.
EDIT: The differences include things like the 64-bit JVM using slightly more memory than the 32-bit one, significantly more if you're on a version before 6u23 and aren't using -XX:+UseCompressedOops. There may also be a slight performance difference between the two, but again nothing huge.

Assembly language standard

Is there a standard that defines the syntax and semantics of assembly language, similar to how the C language has an ISO standard and C# has an ECMA standard? Is there only one standard, or are there several?
I'm asking because I noticed that assembly code looks different in Windows and Linux environments. I had hoped that assembly language is not dependent on the OS, that it's just a language with some defined standard which an assembler (a compiler for assembly language) translates into machine instructions for a particular processor.
Thank you for your answers
Yes, there is a standard.
People who built assemblers, even up until the 1980s, chose an incredible variety of syntax schemes.
The IEEE community reacted with a standard to try to avoid that problem:
694-1985 - IEEE Standard for Microprocessor Assembly Language
As with many things in the software world, it was and continues to be largely ignored.
The closest thing to a standard is that the vendor that created the processor/instruction set will have a document describing the language, and often that vendor will provide some sort of assembler (program). Some vendors are more detail- and standards-oriented than others, so you get what you get. Then things like the Intel/AT&T split happen and mess things up. Add to that that the GNU assembler loves to reinvent the assembly language for the chips it supports, so in general you have chaos.
If there were an assembly language whose use were comparable to C or C++, then you would expect an organization to try to come up with a standard. Part of the problem would still be that, with things like the C language, there is an interpretation step before it hits the hardware; with assembler there is little to none, so a chip vendor is going to make whatever they want due to market factors, and the standard would have to be dragged along to match the hardware instead of the other way around, where a standard drives the vendors.
An open-core processor might be one that could be standards-driven, since it is not vendor-specific; perhaps it already is.
With assembly, assume that each version of each assembler program/software/tool has its own syntax rules, within the same instruction set as well as across different instruction sets (which is actually what you get with C/C++, but that is another topic). Either choose your favorite tool and know only it, or try to memorize all the variations across all the tools; my preference is to avoid as much tool-specific syntax and nuance as possible, and to find the middle ground that works, or at least has a chance of working, across tools.
No, there is no standard.
There are even two different types of syntax: the Intel syntax, which is predominant on Windows platforms, and the AT&T syntax, which is dominant in the *nix world.
Regarding the differently looking code on Wikipedia: the Windows example uses the Win32 API and the Linux example uses a system call via the 0x80 interrupt.
Assembly languages differ from processor to processor, so no, there is no standard.
In general, the "standard" assembly language for a particular family of processors is whatever the processor designers say it is. For example, the "standard" syntax for x86 is whatever Intel says it is. However, that doesn't prevent other people from creating a variant of the assembly language that targets the processor with slightly different syntax or additional features (NASM is one example).
Well, I'm not sure if you are asking about syntax for x86 processors (I suppose yes, because you're mentioning NASM).
But there are two common standards:
Intel syntax that was originally used for documentation of the x86 platform
AT&T syntax which is common in Linux/Unix worlds.
NASM, which you mentioned, prefers the Intel syntax.
You can find some examples of the syntax differences in this article: http://www.ibm.com/developerworks/linux/library/l-gas-nasm/index.html.
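For a concrete taste of the difference: loading the constant 5 into the EAX register is written mov eax, 5 in Intel syntax but movl $5, %eax in AT&T syntax (reversed operand order, an explicit size suffix, and the $/% sigils).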
There's none, because there are many different CPUs with different instructions and other peculiarities, and it's entirely up to their designers what syntax to use and how to name things. And there's little need to standardize it, because assembly code is inherently unportable and needs to be rewritten for a different CPU anyway.
Assembly language is not OS-specific per se, it's CPU-specific, but for an assembly routine to access things that appear standard to you (e.g. some subroutine to print text to the console), OS-specific code is needed. For MS-DOS you'd use the BIOS and DOS interrupt service routines (invokable on the x86 CPU through the int 13h, int 10h, int 21h, int 33h, etc. instructions), for Windows you'd use Windows' (available through int 2eh and the sysenter/syscall instructions), for Linux you'd use Linux's (e.g. int 80h). All of them are implemented differently in different OSes and expect different numbers and kinds of parameters in different places (registers or memory). You can't standardize this part. The only thing you can do about it is build a compatibility/abstraction layer on top of the OS functionality so that it looks the same from your assembly routines' point of view, as sketched below.
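As a sketch of such a compatibility layer, here is a hypothetical C version (print_text is a made-up name; it assumes the usual Win32 and POSIX APIs):

#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
/* Windows: go through the Win32 API. */
static void print_text(const char *s, size_t len) {
    DWORD written;
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), s, (DWORD)len, &written, NULL);
}
#else
#include <unistd.h>
/* Unix-likes: write() wraps the OS system call. */
static void print_text(const char *s, size_t len) {
    write(STDOUT_FILENO, s, len);
}
#endif

int main(void) {
    print_text("Hello, world!\n", 14);
    return 0;
}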
Assembly syntax/language depends on the CPU rather than the OS. For the x86 CPU family there are, however, two syntaxes: AT&T (used by Unix-like operating systems by default) and Intel (used by Windows, DOS, etc.).
However, the two assembly examples on the wiki are doing different things. The Windows example uses the Win32 API to show a message box: all function arguments are pushed onto the stack in reverse order, and then MessageBox() is called, which in turn creates the message box.
The Linux example uses the write syscall to write a string to stdout. Here all 'arguments' are stored in registers, and then int 0x80 raises an interrupt; the OS enters kernel land and the kernel prints the string to stdout.
The Linux assembly could be rewritten like this:
section .data
msg:    db "Hello, world!", 10      ; the string plus a trailing newline
.len:   equ $ - msg                 ; its length, computed at assembly time

section .text
extern write                        ; write() from libc
extern exit                         ; exit() from libc
global _start

_start:
    ; cdecl: arguments are pushed right to left
    push msg.len                    ; count
    push msg                        ; buf
    push dword 1                    ; fd = 1 (stdout)
    call write                      ; write(1, msg, msg.len)
    push dword 0                    ; exit status
    call exit                       ; exit(0) never returns
The above assembly must be linked against libc; it will then call write in libc, which in turn executes exactly the same code as the example on the wiki.
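(With NASM and GCC, something like nasm -f elf32 hello.asm followed by gcc -m32 -nostartfiles hello.o -o hello should build it; this invocation is illustrative, and the exact flags depend on your toolchain.)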
Another thing to note is that Windows and Unix-like operating systems use different file formats for their libraries and applications.
Unix-like systems use ELF (http://en.wikipedia.org/wiki/Executable_and_Linkable_Format) and Windows uses PE (http://en.wikipedia.org/wiki/Portable_Executable).
This is why you see different sections in the assembly listings on the wiki page.

Can C/C++ software be compiled into bytecode for later execution? (Architecture independent unix software.)

I want to compile existing software into a representation that can later be run on different architectures (and OSes).
For that I need a (byte)code that can be easily run/emulated on another arch/OS (LLVM IR? Some RISC assembly?)
Some random ideas:
Compiling into JVM bytecode and running it with Java. Too restrictive? Are C compilers available?
MS CIL. Are C compilers available?
LLVM? Can the intermediate representation be run later?
Compiling into a RISC architecture such as MMIX. What about system calls?
Then there is the system-call mapping issue, but e.g. the BSDs have system call translation layers.
Are there any already working systems that compile C/C++ into something that can later be run with an interpreter on another architecture?
Edit
Could I compile existing Unix software into a not-so-low-level binary which could be "emulated" more easily than running a full x86 emulator? Something more like the JVM than a Xen HVM.
There are several C to JVM compilers listed on Wikipedia's JVM page. I've never tried any of them, but they sound like an interesting exercise to build.
Because of its close association with the Java language, the JVM performs the strict runtime checks mandated by the Java specification. That requires C to bytecode compilers to provide their own "lax machine abstraction", for instance producing compiled code that uses a Java array to represent main memory (so pointers can be compiled to integers), and linking the C library to a centralized Java class that emulates system calls. Most or all of the compilers listed below use a similar approach.
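A toy illustration of that "lax machine abstraction" in C (hypothetical; in a real C-to-JVM compiler the memory array would be a Java int[] or byte[], and C pointers compile down to plain integer offsets into it):

#include <stdint.h>
#include <stdio.h>

/* Emulated "main memory" of the lax machine. */
static uint8_t memory[1 << 16];

/* Loads and stores go through the array; a C pointer is just an index. */
static uint32_t load32(uint32_t addr) {
    return (uint32_t)memory[addr]
         | (uint32_t)memory[addr + 1] << 8
         | (uint32_t)memory[addr + 2] << 16
         | (uint32_t)memory[addr + 3] << 24;
}

static void store32(uint32_t addr, uint32_t value) {
    memory[addr]     = (uint8_t)value;
    memory[addr + 1] = (uint8_t)(value >> 8);
    memory[addr + 2] = (uint8_t)(value >> 16);
    memory[addr + 3] = (uint8_t)(value >> 24);
}

int main(void) {
    store32(0x100, 42);              /* plays the role of *(uint32_t *)p = 42 */
    printf("%u\n", load32(0x100));   /* prints 42 */
    return 0;
}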
C compiled to LLVM bitcode is not platform-independent. Have a look at Google's Portable Native Client; they are trying to address that.
Adobe has Alchemy, which will let you compile C to Flash.
There are C-to-Java and even C-to-JavaScript compilers. However, due to differences in memory management, they aren't very usable.
WebAssembly is trying to address this now by creating a standard bytecode format for the web. Unlike JVM bytecode, though, WebAssembly is more low-level, working at the abstraction level of C/C++ rather than Java, so it's closer to what's typically called an "assembly language", which is what C/C++ code is normally compiled to.
LLVM is not a good solution for this problem. As beautiful as LLVM IR is, it is by no means machine independent, nor was it intended to be. It is very easy, and indeed necessary in some languages, to generate target dependent LLVM IR: sizeof(void*), for example, will be 4 or 8 or whatever when compiled into IR.
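For example (the exact IR varies by Clang version and target triple), this fragment compiles to different IR per target because sizeof is folded into a constant:

#include <stddef.h>

size_t pointer_size(void) {
    /* Folded at compile time: the emitted IR contains the constant 4
       when targeting 32-bit, 8 when targeting 64-bit. */
    return sizeof(void *);
}

Compiling with clang -m32 -S -emit-llvm yields roughly ret i32 4, while -m64 yields ret i64 8, so the target leaks into the bitcode.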
LLVM also does nothing to provide OS independence.
One interesting possibility might be QEMU. You could compile a program for a particular architecture and then use QEMU's user-space emulation to run it on different architectures. Unfortunately, that might solve the target-machine problem, but it doesn't solve the OS problem: QEMU's Linux user-mode emulation only works on Linux systems.
JVM is probably your best bet for both target and OS independence if you want to distribute binaries.
As Ankur mentions, C++/CLI may be a solution. You can use Mono to run it on Linux, as long as it has no native bits. But unless you already have a code base you are trying to port at minimal cost, using it would probably be counterproductive. If it makes sense in your situation, you should go with Java or C#.
Most people who go with C++ do it for performance reasons, but unless you're playing with very low-level stuff, you'll be done coding earlier in a higher-level language. This in turn gives you time to optimize, so that by the time you would have been done in C++, you'll have an even faster version in whatever higher-level language you chose.
The real problem is that C and C++ are not architecture-independent languages. You can write things that are reasonably portable in them, but the compiler also bakes aspects of the machine into your code. Think about, for example, sizeof(long). Also, as Richard mentions, there's no OS independence. So unless the libraries you use happen to have the same conventions and exist on multiple platforms, you won't be able to run the application.
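Concretely (a small sketch; the data model is the platform's choice, not the language's):

#include <stdio.h>

int main(void) {
    /* LP64 (64-bit Linux/macOS): long is 8 bytes.
       LLP64 (64-bit Windows):    long is 4 bytes.
       Typical 32-bit targets:    long is 4 bytes. */
    printf("sizeof(long) = %zu bytes\n", sizeof(long));
    return 0;
}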
Your best bet would be to write your code in a more portable language, or provide binaries for the platforms you care about.
