AVR inline-assembly fsin operand constraints - Arduino

I am coding in Arduino, but I am also using assembly code within my C code. I want to calculate the sine of a value. So far I have this code:
void setup() {
  // put your setup code here, to run once:
}

void loop() {
  // put your main code here, to run repeatedly:
  Serial.begin(9600);
  float answer;
  float angle = 2;
  int a = 2;
  int b = 3;
  asm("ADD %0,%1" : "+r"(a) : "r"(b));
  asm("fsin" : "=t" (answer) : "0" (angle));
  Serial.print(answer);
}
The error I get for this is: inconsistent operand constraints in an 'asm'.
The funny thing is that I don't get this error when I remove the last line (Serial.print(answer)).
Also, I found that this code is for an 8086 assembler, not AVR; on the 8086, "=t" is specific to floats, but I cannot find anything similar for AVR.

Whatever 8-bit AVR you are using almost certainly does not have an fsin instruction. Since 8086 and AVR are two different architectures, they have different instruction sets, and even instructions with the same name might have different meanings. You cannot expect to copy assembly code from one architecture to another. The 8-bit AVRs have no native support for floating-point numbers at all; that gets added in software by your compiler.
What you are looking for is the sin function provided by avr-libc. This is just a normal C function that you can call by first adding #include <math.h> to the top of your program and then writing something like answer = sin(angle);.
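For completeness, a minimal sketch of the corrected program (note that Serial.begin belongs in setup(), not loop(); sin(2) is roughly 0.91):
#include <math.h>

void setup() {
  Serial.begin(9600); // initialize the serial port once
}

void loop() {
  float angle = 2.0;
  float answer = sin(angle);  // software floating-point sine from avr-libc
  Serial.println(answer);     // prints 0.91 (two decimal places by default)
  delay(1000);                // avoid flooding the serial monitor
}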

Related

What is the rule behind instruction count in Intel PIN?

I wanted to count the instructions in a simple recursive Fibonacci function, O(2^n). I succeeded in doing so with bubble sort and matrix multiplication, but in this case it seemed like the instruction count ignored my fibo function. Here is the code used for instrumentation:
// Insert a call at the entry point of a routine to increment the call count
RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)docount, IARG_PTR, &(rc->_rtnCount), IARG_END);
// For each instruction of the routine
for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins))
{
// Insert a call to docount to increment the instruction counter for this rtn
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_PTR, &(rc->_icount), IARG_END);
}
I started to wonder what the difference is between this program and the previous ones, and my first thought was: here I'm not using an array.
This is what I realised after some manual tests:
a = 5;       // instruction ignored by PIN, and
             // pretty much everything else not using an array
fibo[1] = 1; // instruction counted properly
a = fibo[1]; // instruction ignored by PIN
So it seems like the only instructions counted are writes to memory (that's what I assume). After I changed my fibo function to the following, it works:
long fibonacciNumber(int n, long *fiboNumbers)
{
if (n < 2) {
fiboNumbers[n] = n;
return n;
}
fiboNumbers[n] = fiboNumbers[n-1] + fiboNumbers[n-2];
return fibonacciNumber(n - 1, fiboNumbers) + fibonacciNumber(n - 2, fiboNumbers);
}
But I would like to count instructions also for programs that aren't written by me. Is there a way to count all types of instructions? Is there any particular reason why only these instructions are counted? Any help appreciated.
//Edit
I used the disassembly option in Visual Studio to check how it looks, and it still makes no sense to me. I can't find the reason why only an assignment to an array is interpreted by PIN as an instruction.
(screenshot: instruction_comparison)
This exceeded all my expectations: it was counted as 2 instructions, not one.
PIN, like other low-level profiling and analysis tools, measures individual instructions: low-level orders like "add these two registers" or "load a value from that memory address". The sequence of instructions that makes up a program is generally produced from a high-level language like C++ by a compiler. An individual line of C++ code might be translated into exactly one instruction, but it's also common for a line to translate to several instructions, or even to zero instructions; and the instructions for one line of code may be interleaved with those of other lines.
Your compiler can output an assembly-language file for your source code, showing what instructions were produced for which lines of code. (For GCC and Clang, this is done with the -S flag.) Note that reading the assembly code output from a compiler is not the best way to learn assembly. Also, I would point you to godbolt.org, a very convenient tool for analyzing assembly output.
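For example, assuming a hypothetical source file fibo.c:
gcc -S -O2 fibo.c   # writes the generated assembly to fibo.s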

Julia ccall outb - Problems with libc

I run the following ccall's:
status = ccall((:ioperm, "libc"), Int32, (Uint, Uint, Int32), 0x378, 5, 1)
ccall((:outb, "libc"), Void, (Uint8, Uint16), 0x00, 0x378)
After the second ccall I receive the following Error message:
ERROR: ccall: could not find function outb in library libc
in anonymous at no file
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
After some research and messing around I found the following information:
- ioperm is in libc, but outb is not
- However, both ioperm and outb are defined in the same header file <sys/io.h>
- An equivalent version of the code in C compiles and runs smoothly.
- outb is in glibc; however, on this system glibc is provided as libc
- The same problem occurs with full path names, e.g. /lib/x86_64-linux-gnu/libc.so.6
EDIT:
Thanks for the insight, @Employed Russian! I did not look closely enough to realize it was an extern declaration. Now all of my notes above make total sense!
Great, we found that ioperm is a libc function that is declared in <sys/io.h>, and that outb is not in libc, but is defined in <sys/io.h> as an inline function wrapping a volatile assembly instruction.
Which library, or file path should I use?
Implementation of ccall.
However, both ioperm and outb are defined in the same header file <sys/io.h>
By "defined" you actually mean "declared". They are different. On my system:
extern int ioperm (unsigned long int __from, unsigned long int __num,
                   int __turn_on) __attribute__ ((__nothrow__ , __leaf__));

static __inline void
outb (unsigned char __value, unsigned short int __port)
{
  __asm__ __volatile__ ("outb %b0,%w1": :"a" (__value), "Nd" (__port));
}
It should now be obvious why you can call ioperm but not outb.
Update 1
I am still lost as to how to correct the error.
You can't import outb from libc. You would have to provide your own native library, e.g.
#include <sys/io.h>  /* provides the inline outb() */

void my_outb(unsigned char value, unsigned short port) {
  outb(value, port);
}
and import my_outb from it. For symmetry, you should probably implement my_ioperm the same way, so you are importing both functions from the same native library.
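For illustration, a minimal sketch of that second wrapper (the wrapper and file names here are hypothetical):
#include <sys/io.h>  /* declares ioperm() and defines the inline outb() */

int my_ioperm(unsigned long from, unsigned long num, int turn_on) {
  return ioperm(from, num, turn_on);  /* forward straight to libc */
}
Build both wrappers into one shared object, e.g. gcc -O2 -fPIC -shared -o libmyport.so myport.c, and import them from that library in Julia.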
Update 2
Making a library worked, but in terms of performance it is horrible.
I guess that's why the original is implemented as an inline function: you are only executing a single outb instruction, so the overhead of a function call is significant.
Unoptimized Python does 5x better.
Probably by having that same outb instruction inlined into it.
Do you know if outb exists in some other library, not in libc?
That is not going to help: you will still have the function call overhead. I am guessing that when you call the imported function from Julia, you probably execute a dlopen and dlsym call, which would impose an overhead of several hundred additional instructions.
There is probably a way to "bind" the function dynamically once, and then use it repeatedly to make the call (thus avoiding repeated dlopen and dlsym). That should help.
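A sketch of that bind-once idea in C (using the hypothetical libmyport.so from above; the same pattern applies from Julia):
#include <dlfcn.h>

typedef void (*my_outb_fn)(unsigned char, unsigned short);

int main(void) {
  void *lib = dlopen("./libmyport.so", RTLD_NOW);    /* resolve the library once */
  if (!lib) return 1;
  my_outb_fn f = (my_outb_fn)dlsym(lib, "my_outb");  /* look the symbol up once */
  if (!f) return 1;
  for (int i = 0; i < 1000; ++i)
    f(0x00, 0x378);  /* repeated calls just reuse the cached pointer */
  dlclose(lib);
  return 0;
}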

Can one induce a user-defined error from an OpenCL kernel?

For debugging purposes, I would like to be able to induce an OpenCL error of my choosing from a kernel. My intended use case would be to use this capability like an assertion:
__kernel void myKernel(...)
{
...
if(i < j){
InduceOpenCLError(-9999);
}
...
}
Is this possible, and if not, is there any other useful way to include an "assertion" which will obviously induce a runtime error if a certain assumption is not true?
This question is related, but slightly different:
OpenCL: Manually throw an exception in kernel
Unfortunately, this is something missing from OpenCL. As the question you referenced suggests, you do have printf to report an error, but even that is kind of clunky and doesn't help you detect an error programmatically.
If you are really set on returning error codes, I can think of a couple of options (none easy).
First, you could pass a buffer to contain all the status values for each work item. After running the kernel, you'd need host code to go through and check the values. You could conditionally include this code as shown below just for debugging. (The following being totally untested.)
#ifdef RETURN_STATUSES
#define RETURN_STATUS(S) \
  do { \
    _kernel_status[get_global_id(0)] = (S); \
    return; \
  } while (0)
#else
#define RETURN_STATUS(S) return
#endif
kernel void myKernel(
    /* ... normal args ... */
#ifdef RETURN_STATUSES
    , global int *_kernel_status
#endif
    )
{
    // ...
    if (i < j) {
        RETURN_STATUS(-9999);
    }
}
Another option might be to atomically set a single value. Again, this has a significant performance impact and would be good for debugging only.
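A minimal sketch of that atomic variant (untested; _error_flag is a hypothetical single-int buffer that the host zero-initializes and reads back after the kernel finishes):
kernel void myKernel(/* ... normal args ... */, global int *_error_flag)
{
    // ...
    if (i < j) {
        atomic_cmpxchg(_error_flag, 0, -9999);  // record only the first error code
        return;
    }
}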
The lack of an efficient way to indicate error in OpenCL kernels is definitely a sore point for me too.

does Qt implement _countof or equivalent?

I really like using the _countof() macro in VS and I'm wondering if there is an OS-generic implementation of this in Qt.
For those unaware, _countof() gives you the number of elements in an array. So,
wchar_t buf[256];
_countof(buf) => 256 (characters)
sizeof(buf) => 512 (bytes)
It's very nice for use with, say, Unicode strings, where it gives you the character count.
I'm hoping Qt has a generic version.
_countof is probably defined like this:
#define _countof(arr) (sizeof(arr) / sizeof((arr)[0]))
You can use a definition like this with any compiler and OS.
If there is no such macro provided by Qt you can simply define a custom one yourself in one of your header files.
sth's code will work fine, but won't detect when you're trying to get the size of a pointer rather than an array. The MS solution does this (as danielweberdlc says), but it's possible to have this as a standard solution for C++:
#include <cstddef>  // std::size_t

#if defined(Q_OS_WIN)
#define ARRAYLENGTH(x) _countof(x)
#else // !Q_OS_WIN
template< typename T, std::size_t N >
inline std::size_t ARRAYLENGTH(T(&)[N]) { return N; }
#endif // !Q_OS_WIN
A more detailed description of this solution is given here.
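A quick usage sketch (buf and p are just for illustration):
wchar_t buf[256];
wchar_t *p = buf;
std::size_t n = ARRAYLENGTH(buf);  // 256
std::size_t m = ARRAYLENGTH(p);    // compile error: a pointer is not an array
The template overload only matches real arrays, so passing a pointer fails at compile time instead of silently returning a bogus size.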

How can I perform 64-bit division with a 32-bit divide instruction?

This is (AFAIK) a specific question within this general topic.
Here's the situation:
I have an embedded system (a video game console) based on a 32-bit RISC microcontroller (a variant of NEC's V810). I want to write a fixed-point math library. I read this article, but the accompanying source code is written in 386 assembly, so it's neither directly usable nor easily modifiable.
The V810 has built-in integer multiply/divide, but I want to use the 18.14 format mentioned in the above article. This requires dividing a 64-bit int by a 32-bit int, and the V810 only does (signed or unsigned) 32-bit/32-bit division (which produces a 32-bit quotient and a 32-bit remainder).
So, my question is: how do I simulate a 64-bit/32-bit divide with a 32-bit/32-bit one (to allow for the pre-shifting of the dividend)? Or, to look at the problem another way, what's the best way to divide an 18.14 fixed-point number by another using standard 32-bit arithmetic/logic operations? ("best" meaning fastest, smallest, or both).
Algebra, (V810) assembly, and pseudo-code are all fine. I will be calling the code from C.
Thanks in advance!
EDIT: Somehow I missed this question... However, it will still need some modification to be super-efficient (it has to be faster than the floating-point div provided by the v810, though it may already be...), so feel free to do my work for me in exchange for reputation points ;) (and credit in my library documentation, of course).
GCC has such a routine for many processors, named __divdi3 (usually implemented via a common divmod helper). Here's one. Some Unix kernels have an implementation too, e.g. FreeBSD.
If your dividend is an unsigned 64-bit integer, your divisor is an unsigned 32-bit integer, and the architecture is i386 (x86), the div instruction can help you, with some preparation:
#include <stdint.h>
/* Returns *a % b, and sets *a = *a / b (using the old value of *a). */
uint32_t UInt64DivAndGetMod(uint64_t *a, uint32_t b) {
#ifdef __i386__ /* u64 / u32 division with little i386 machine code. */
uint32_t upper = ((uint32_t*)a)[1], r;
((uint32_t*)a)[1] = 0;
if (upper >= b) {
((uint32_t*)a)[1] = upper / b;
upper %= b;
}
__asm__("divl %2" : "=a" (((uint32_t*)a)[0]), "=d" (r) :
"rm" (b), "0" (((uint32_t*)a)[0]), "1" (upper));
return r;
#else
const uint64_t q = *a / b; /* Calls __udivdi3 in libgcc. */
const uint32_t r = *a - b * q; /* `r = *a % b' would use __umoddi3. */
*a = q;
return r;
#endif
}
If the line above with __udivdi3 doesn't compile for you, use the __div64_32 function from the Linux kernel: https://github.com/torvalds/linux/blob/master/lib/div64.c
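Tying this back to the original 18.14 question, a minimal portable sketch (it relies on the compiler's 64/32 runtime helper rather than hand-tuned V810 assembly, so treat it as a starting point):
#include <stdint.h>

/* Divide two 18.14 fixed-point numbers: pre-shift the dividend into
   64 bits so no fraction bits are lost, then do a 64/32 division. */
int32_t fx1814_div(int32_t a, int32_t b) {
  int64_t wide = (int64_t)a * (1 << 14);  /* scale the dividend by 2^14 */
  return (int32_t)(wide / b);             /* compiler emits a __divdi3-style call */
}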
