executing SPI command in function in under 1ms - 8051 - microcontroller

I have an AT89C2051 microcontroller hooked up to an ISD sound chip through a multiplexer-demultiplexer setup. There are other devices too, but my focus is on making sound playback start as fast as possible. The chip currently runs at 3.6 MHz because another microcontroller is driving this microcontroller.
Based on documentation and experimentation, the sound chip requires 7 bytes to be sent to it to make it play the sound stored between any two memory addresses. The part that takes the time is transmitting the seven bytes.
This is the code I have so far that works:
FLUSH bit P3.7 ;Low=enable data reception
ENXMIT bit P3.5 ;High=Enable data transmission
GLOBALCLK bit P3.1 ;TXD: clock (connects to soundcard clock)
GLOBALDAT bit P3.0 ;RXD: I/O data line (connects to MISO and MOSI)
C_SND2 = address of soundcard 2
C_SND = address of soundcard 1
O_SND:
setb FLUSH ;disable reception
clr ENXMIT ;disable transmission
mov R7,A ;Parameter in: Accumulator = # bytes to transfer out.
mov A,#C_SND2 ;A=address of soundcard 2
mov R6,#C_SND ;R6=address of soundcard 1
jnb SS,nc1 ;Parameter in: SS = soundcard to use.
xch A,R6 ;Switch A + R6 if other soundcard is wanted.
nc1:
;NOTE: soundcard Slave select lines are connected together through an inverter.
mov P1,R6 ;Enable wrong soundcard (to disable the correct one)
mov R0,#BUFOUT ;Set data space pointer
mov P1,A ;Now enable only the correct soundcard
setb ENXMIT ;Enable data transmission
tx2:
mov A,@R0 ;Load a byte from our data space
;This fragment executes 8 times per byte but is shown only once here.
;I avoided loops. DJNZ takes two machine cycles (about 7 us) to execute.
clr FLUSH ;Enable data input **
setb GLOBALDAT ;Set data to high impedance so input can be captured **
clr GLOBALCLK ;Lower clock line to accept bit input **
mov C,GLOBALDAT ;Get incoming bit
setb FLUSH ;Disable data input
rrc A ;store incoming bit and load next output bit
mov GLOBALDAT,C ;set data line to bit
setb GLOBALCLK ;raise clock so soundcard accepts bit
;end of repeating fragment
mov @R0,A ;save what soundcard sent us to our data space
inc R0 ;increment pointer
djnz R7,tx2 ;Keep going until all bytes are processed
clr ENXMIT ;Disable further transmissions
setb GLOBALDAT ;Set data line to high
mov P1,R6 ;reset the SS line to tell soundcard we're done.
;Save audio statuses to RAM
mov AUDSTATL,BUFOUT
mov AUDSTAT,BUFOUT+2
ret
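For readers less familiar with the rotate-through-carry trick, here is a rough C model of what the repeated fragment does to one byte. It is only a sketch: the function name exchange_byte and the two bit arrays are made up for illustration and stand in for the shared data line, and no timing is modeled.

```c
#include <stdint.h>

/* Hypothetical model of the unrolled fragment: rx_bits[] holds the bits the
   slave presents (first bit first), tx_bits[] records the bits we drive. */
uint8_t exchange_byte(uint8_t out, const uint8_t rx_bits[8], uint8_t tx_bits[8])
{
    uint8_t a = out;                            /* plays the role of the accumulator */
    for (int i = 0; i < 8; i++) {
        uint8_t carry = rx_bits[i] & 1u;        /* mov C,GLOBALDAT */
        uint8_t lsb = a & 1u;                   /* bit that RRC moves into carry */
        a = (uint8_t)((a >> 1) | (carry << 7)); /* rrc A */
        tx_bits[i] = lsb;                       /* mov GLOBALDAT,C */
    }
    return a;   /* after 8 rotations the accumulator holds the received byte */
}
```

In this model both directions come out least significant bit first: the first received bit ends up in bit 0 of the result, and bit 0 of the outgoing byte is driven first.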
As you can see, the data line (RXD) from the microcontroller is shared across every data line in the system through multiplexers/demultiplexers. This means that I need to make the line only unidirectional (not bi-directional) by enabling reception and transmitting nothing when I want to receive data.
I called the receive enable "FLUSH" because it also flushes other output lines which are out of the scope of this question.
Now what I want to try to do is make this code fragment execute much faster.
So I'm looking at these lines:
clr FLUSH ;Enable data input **
setb GLOBALDAT ;Set data to high impedance so input can be captured **
clr GLOBALCLK ;Lower clock line to accept bit input **
and thought that, instead of consecutive CLR and SETB instructions on individual pins of the same port, I could use ANL or ORL. But if I did that directly on the port, the result might not update correctly due to the behaviour of the 8051.
Is there any other way I can modify the repetitive code to make the thing run faster?
I have already saved at least 380 microseconds by unrolling the inner loop: each DJNZ removed saves about 6.5 microseconds, it ran 8 times per byte, there was one more instruction to load the DJNZ counter plus the other loop overhead, and all of that is multiplied by the 7 bytes of a command.
But I want to save more than that.
Any ideas?
I don't plan to unroll the outer loop, though, because that would need substantially more ROM space and I don't have much free ROM left.

It is possible to use two different port pins for FLUSH and ENXMIT. By doing so, you can go for ANL or ORL on the port directly.
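As a rough sketch of what the combined masks would look like for the pin assignments in the question (FLUSH on P3.7, GLOBALCLK on P3.1, GLOBALDAT on P3.0), the C model below computes them and applies them to a pretend port latch. The macro and function names are made up for illustration. One caveat worth checking against the datasheet: in the standard 8051 timing tables, ANL direct,#data and ORL direct,#data take two machine cycles each while CLR bit and SETB bit take one, so one ANL plus one ORL is not automatically faster than three bit instructions.

```c
#include <stdint.h>

/* Bit positions taken from the question's pin definitions. */
#define FLUSH_BIT      7u  /* P3.7 */
#define GLOBALCLK_BIT  1u  /* P3.1 */
#define GLOBALDAT_BIT  0u  /* P3.0 */

/* Mask for a single ANL P3,#mask that clears FLUSH and GLOBALCLK together. */
#define CLEAR_MASK ((uint8_t)~((1u << FLUSH_BIT) | (1u << GLOBALCLK_BIT)))
/* Mask for a single ORL P3,#mask that sets GLOBALDAT. */
#define SET_MASK   ((uint8_t)(1u << GLOBALDAT_BIT))

/* Model of the port latch after the ANL followed by the ORL. */
uint8_t apply_masks(uint8_t latch)
{
    latch &= CLEAR_MASK;
    latch |= SET_MASK;
    return latch;
}
```

With these pins the masks work out to 7Dh for the ANL and 01h for the ORL.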

Related

scanning multiple serial data bits reliably - 8051

My hardware currently has four sets of sensors, wired to the lower 4 bits of port 0, that I treat as four separate receive-only serial ports. I have tried numerous times to retrieve the correct serial data (by aiming the laser directly at the sensor) without success. I then read that, for more reliability, a standard UART samples each bit 16 times (I found this 3/4 of the way down the page at https://www.allaboutcircuits.com/technical-articles/back-to-basics-the-universal-asynchronous-receiver-transmitter-uart/).
So I ended up rolling my own version of that; because of my timings, my count is more like 32 samples per bit, but that's OK.
I'm going to explain what I did first so everyone understands what is going on.
code explanation
I have four consecutive address locations set up to hold a counter for each bit. Four bits are read simultaneously from hardware, and the counter for each bit goes up or down depending on whether that bit is set (light detected on that group of sensors) or clear (light not detected). This loop executes frequently, at about a 9600bps rate.
The second loop only executes when a value is needed. This happens once every 16 times that the last loop executes (more like at a 600bps speed). It takes the counter value of each bit as if it was a signed number and uses the MSB value as the final value of that bit. Those MSB values get crammed together to form the official bit read from the sensors.
Is this approach OK to reliably determine whether the bit value is set or cleared?
Could I also rework this code so that it runs faster? Each loop consumes a large number of clock cycles (32 to 40), and I'd be happy if I could get it down to about 20.
Also, this code is executed on AT89S52 microcontroller so I'm using its extended memory addresses.
the code
;memory is preinitialized to nulls
LAZMAJ equ 0E0h ;majority counters start address (end address at 0E4h)
MAJT equ 20h ;Majority value at bit address
mov A,P0 ;get bit values from hardware
mov R1,#LAZMAJ ;go to start of pointer
;loop uses 40 clock cycles out of 192 available
countmaj:
rrc A ;get bit
jnc noincmaj
inc @R1 ;bit is set so add 1 to counter for that bit
noincmaj:
jc incmaj
dec @R1 ;bit is clear so subtract 1 from counter for that bit
incmaj:
inc R1 ;move pointer to next bit
cjne R1,#LAZMAJ+4,countmaj ;see if pointer is out of range
;it is so end loop
;loop uses about 32 clock cycles and executes when we want data
mov R1,#LAZMAJ+4 ;go to out of range position
chkmaj:
dec R1 ;decrement pointer first so we are within range
mov MAJT,@R1 ;load value to majority variable. treat it as signed
mov @R1,#0h ;clear value from memory space
mov C,MAJT.7 ;Take sign and use that as carry
rlc A ;and put it into our final variable
cjne R1,#LAZMAJ,chkmaj ;if pointer isn't in first address then keep going
;otherwise exit loop and A=value we want
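The two loops above can be modeled in C to check the voting logic away from the hardware. This is only a sketch with made-up names (vote_sample, vote_result), not a drop-in replacement. One thing it makes visible: bit 7 of a signed counter is 1 when the count is negative, that is, when "clear" samples dominated. The sketch therefore tests for a positive count instead of copying the sign bit, which is an assumption about the intended polarity.

```c
#include <stdint.h>

#define CHANNELS 4

static int8_t counters[CHANNELS];   /* one signed vote counter per sensor bit */

/* Fast loop: call once per hardware sample with the low 4 bits of the port. */
void vote_sample(uint8_t port_bits)
{
    for (int ch = 0; ch < CHANNELS; ch++) {
        if (port_bits & (1u << ch))
            counters[ch]++;   /* bit set: vote up */
        else
            counters[ch]--;   /* bit clear: vote down */
    }
}

/* Slow loop: collapse the votes into one 4-bit value and reset the counters. */
uint8_t vote_result(void)
{
    uint8_t result = 0;
    for (int ch = 0; ch < CHANNELS; ch++) {
        if (counters[ch] > 0)
            result |= (uint8_t)(1u << ch);  /* majority of samples were set */
        counters[ch] = 0;
    }
    return result;
}
```

Calling vote_sample 16 times and then vote_result once mirrors the 9600-to-600 ratio described in the question. A tie (count of exactly zero) is reported as 0 here.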

Serial point to point protocol but with 8 bytes instead of 16

I was looking at answers in Simple serial point-to-point communication protocol and it doesn't help me enough with my issue. I am also trying to communicate data between a computer and an 8-bit microcontroller at first, then eventually I want to communicate the one microcontroller to about 40 others via wireless radio modules. Basically one is designated as a master and the rest are slaves.
speed is an issue
The issue at hand is speed, because every packet needs to be exchanged at least 4 times a second between the master and each slave.
Let's assume baud rate for data is 9600bps. That's 960 bytes a second.
If I used 16-byte packets then: 40 (slaves) times 16 (bytes) times 2 (ways) = 1280 bytes per round. Divide that by 960 bytes a second and one round takes about 1.33 seconds. Not good.
If I used 8-byte packets then: 40 (slaves) times 8 (bytes) times 2 (ways) = 640 bytes per round. Divide that by 960 and one round takes about 0.67 seconds. It's so-so.
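The budget is easy to recompute for other baud rates and packet sizes with a few lines of C. This is a back-of-the-envelope sketch with made-up function names. It assumes 10 bits on the wire per byte (start bit, 8 data bits, stop bit, no parity) and ignores radio turnaround time and processing delays, so real figures will be worse.

```c
#include <stdint.h>

/* Bytes per second on the link, assuming 10 wire bits per byte (8N1). */
uint32_t bytes_per_second(uint32_t baud)
{
    return baud / 10u;
}

/* Bytes moved in one polling round: master to every slave and back. */
uint32_t bytes_per_round(uint32_t slaves, uint32_t packet_bytes)
{
    return slaves * packet_bytes * 2u;
}

/* Duration of one polling round in milliseconds, rounded down. */
uint32_t round_time_ms(uint32_t baud, uint32_t slaves, uint32_t packet_bytes)
{
    return bytes_per_round(slaves, packet_bytes) * 1000u / bytes_per_second(baud);
}
```

For example, round_time_ms(9600, 40, 16) and round_time_ms(9600, 40, 8) give the round times for the two packet sizes discussed here.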
But I need to watch my baud rate, because too high a baud rate might mean missed data at larger distances. Still, you can see the speed difference between an 8-byte and a 16-byte packet.
packet format idea
In my design, I may have a need to transmit a number in the low millions so that will use 24-bits which fits in my idea. But here's my initial idea:
Byte 1: Recipient address 0-255
Byte 2: Sender address 0-255
Byte 3: Command
Byte 4-6: Data
Byte 7-8: 16-bit Fletcher checksum of above data
I don't mind if the above format is adjusted, just as long as I have at least 6 bits to identify the sender and receiver (since I'll only deal with 40 units), and the data with command included should be at least 4 bytes total.
How should I modify my data packet idea so that even the device that just turned on in the middle of reception can be in sync with the next set of data? Is there a way without stripping a bit from each data byte?
Rely on the checksum! My packet would consist of:
Recipient's address (0..40) XORed with 0x55
Sender's address (0..40) XORed with 0xAA
Command Byte
Data Byte 0
Data Byte 1
Data Byte 2
CRC8 sum, as suggested by Vroomfondel
Every receiver should keep a sliding window of the last seven received bytes. Whenever a byte is shifted in, the window should be checked for validity:
Are the two addresses in the valid range?
Is it a valid command?
Is the CRC correct?
The last check in particular should safely reject packets that the receiver joined out of sync.
If you have less than 32 command codes, you may go down to six bytes per packet: 40[Senders] times 40[Receivers] times 32[Commands] evaluates to 51200, which would fit into 16 bits instead of 24.
Don't forget to turn off the parity bit!
Update 2017-12-09: Here is a receiving function:
#include <stdint.h>
typedef uint8_t U8;
void ByteReceived(U8 Byte)
{
    static U8 Buf[7];    //Bytes received so far
    static U8 BufBC=0;   //Number of bytes currently in Buf[]
    U8 Adr;
    Buf[BufBC++] = Byte;
    if (BufBC<7) return; //Msg incomplete
    /*** Seven Byte Message received ***/
    //Check Addresses
    Adr = Buf[0] ^ 0x55; if (Adr >= 40) goto Fail;
    Adr = Buf[1] ^ 0xAA; if (Adr >= 40) goto Fail;
    if (Buf[2] > ???) goto Fail; //Check Cmd
    if (CalcCRC8(Buf, 6) != Buf[6]) goto Fail;
    Evaluate(...);
    BufBC=0; //empty Buf[]
    return;
Fail:
    //Seven Byte Msg invalid -> chop off first byte, could use memmove()
    Buf[0] = Buf[1];
    Buf[1] = Buf[2];
    Buf[2] = Buf[3];
    Buf[3] = Buf[4];
    Buf[4] = Buf[5];
    Buf[5] = Buf[6];
    BufBC = 6;
}
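To try the resynchronisation idea on a PC, here is a self-contained sketch along the same lines. Several things in it are assumptions for illustration only: the CRC is a plain bitwise CRC-8 with polynomial 0x07 and initial value 0, MAX_CMD = 31 is a placeholder for the highest valid command, and the names crc8, build_packet and receive_byte are made up.

```c
#include <stdint.h>
#include <string.h>

#define PKT_LEN  7
#define MAX_ADDR 40   /* addresses 0..39 pass the range check */
#define MAX_CMD  31   /* hypothetical: highest valid command code */

/* Bitwise CRC-8, polynomial 0x07, initial value 0 (an assumed choice). */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                                : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Fill pkt[] with a packet in the seven-byte layout described above. */
void build_packet(uint8_t pkt[PKT_LEN], uint8_t to, uint8_t from,
                  uint8_t cmd, const uint8_t data[3])
{
    pkt[0] = to ^ 0x55u;
    pkt[1] = from ^ 0xAAu;
    pkt[2] = cmd;
    memcpy(&pkt[3], data, 3);
    pkt[6] = crc8(pkt, PKT_LEN - 1);
}

/* Feed one received byte. Returns 1 and copies the packet to out[] when the
   last seven bytes form a valid packet, otherwise returns 0. */
int receive_byte(uint8_t byte, uint8_t out[PKT_LEN])
{
    static uint8_t win[PKT_LEN];
    static uint8_t count = 0;

    win[count++] = byte;
    if (count < PKT_LEN)
        return 0;                       /* window not full yet */

    if ((uint8_t)(win[0] ^ 0x55u) < MAX_ADDR &&
        (uint8_t)(win[1] ^ 0xAAu) < MAX_ADDR &&
        win[2] <= MAX_CMD &&
        crc8(win, PKT_LEN - 1) == win[PKT_LEN - 1]) {
        memcpy(out, win, PKT_LEN);
        count = 0;                      /* start a fresh window */
        return 1;
    }
    memmove(win, win + 1, PKT_LEN - 1); /* drop the oldest byte, keep sliding */
    count = PKT_LEN - 1;
    return 0;
}
```

Feeding a few garbage bytes followed by a packet from build_packet should make receive_byte return 1 exactly on the packet's last byte. An 8-bit CRC still lets roughly one in 256 random windows through the CRC check alone, which is why the address and command range checks are worth keeping.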

Dereference pointers in XMM register (gather)

If I have some pointer or pointer-like values packed into an SSE or AVX register, is there any particularly efficient way to dereference them, into another such register? ("Particularly efficient" meaning "more efficient than just using memory for the values".) Is there any way to dereference them all without writing an intermediate copy of the register out to memory?
Edit for clarification: that means, assuming 32-bit pointers and SSE, to index into four arbitrary memory areas at once with the four sections of an XMM register and return four results at once to another register. Or as close to "at once" as possible. (/edit)
Edit2: thanks to PaulR's answer I guess the terminology I'm looking for is "gather", and the question therefore is "what's the best way to implement gather for systems pre-AVX2?".
I assume there isn't an instruction for this since ...well, one doesn't appear to exist as far as I can tell and anyway it doesn't seem to be what SSE is designed for at all.
("Pointer-like value" meaning something like an integer index into an array pretending to be the heap; mechanically very different but conceptually the same thing. If, say, one wanted to use 32-bit or even 16-bit values regardless of the native pointer size, to fit more values in a register.)
Two possible reasons I can think of why one might want to do this:
thought it might be interesting to explore using the SSE registers for general-purpose... stuff, perhaps to have four identical 'threads' processing potentially completely unrelated/non-contiguous data, slicing through the registers "vertically" rather than "horizontally" (i.e. instead of the way they were designed to be used).
to build something like romcc if for some reason (probably not a good one), one didn't want to write anything to memory, and therefore would need more register storage.
This might sound like an XY problem, but it isn't, it's just curiosity/stupidity. I'll go looking for nails once I have my hammer.
The question is not entirely clear, but if you want to dereference vector register elements then the only instructions which might help you here are AVX2's gathered loads, e.g. _mm256_i32gather_epi32 et al. See the AVX2 section of the Intel Intrinsics Guide.
SYNOPSIS
__m256i _mm256_i32gather_epi32 (int const* base_addr, __m256i vindex, const int scale)
#include "immintrin.h"
Instruction: vpgatherdd ymm, vm32x, ymm
CPUID Flag : AVX2
DESCRIPTION
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
OPERATION
FOR j := 0 to 7
i := j*32
dst[i+31:i] := MEM[base_addr + SignExtend(vindex[i+31:i])*scale]
ENDFOR
dst[MAX:256] := 0
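Without AVX2, the usual fallback is to pull each index out of the vector, do an ordinary scalar load, and put the results back together. A plain C model of the OPERATION pseudocode above shows the semantics being emulated. This is a sketch, not an optimized implementation: gather_i32x8 is a made-up name and the vectors are represented as ordinary arrays.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the gather: eight 32-bit loads from
   base_addr + sign_extend(vindex[j]) * scale. */
void gather_i32x8(int32_t dst[8], const void *base_addr,
                  const int32_t vindex[8], int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int j = 0; j < 8; j++) {
        /* pointer-width arithmetic so negative indices also work */
        const unsigned char *p = base + (intptr_t)vindex[j] * scale;
        memcpy(&dst[j], p, sizeof dst[j]);   /* alignment-safe 32-bit load */
    }
}
```

With scale set to 4 the indices behave like ordinary int array subscripts; with scale set to 1 they are byte offsets, which is the closest match to the "pointer-like values" in the question.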
So if I understood this correctly, your title is misleading and you really want to:
index into the concatenation of all XMM registers
with an index held in a part of an XMM register
Right?
That's hard. And a little weird, but I'm OK with that.
Assuming crazy tricks are allowed, I propose self-modifying code: (not tested)
pextrb eax, xmm?, ? // question marks are the position of the pointer
mov edx, eax
shr eax, 1
and eax, 0x38
add eax, 0xC0 // C0 makes "hack" put its result in eax
mov [hack+4], al // xmm{al}
and edx, 15
mov [hack+5], dl // byte [dl] of xmm reg
call hack
pinsrb xmm?, eax, ? // put value back somewhere
...
hack:
db 0x66, 0x0F, 0x3A, 0x14, 0x00, 0x00 // pextrb ?, ?, ?
ret
As far as I know, you can't do that with full ymm registers (yet?). With some more effort, you could extend it to xmm8-xmm15. It's easily adjustable to other "pointer" sizes and other element sizes.

how to use Base Pointer in Assembly 8086 to go through the stack?

I have an assignment in which I have to insert two-digit integers into the stack, search for a number in the stack and return its position, print all the numbers in the stack, and delete a number from the stack.
Right now I'm trying to print all the numbers in the stack by going through the stack using the base pointer, but my code doesn't work.
mov di,offset bp
mov ax, [di] ;trying to move the value stored at the address in di (in the stack) to ax
mov digito,ah
mov digito2,al
mov dl,digito
mov ah,02
int 21h
mov dl,digito2
mov ah,02
int 21h
mov ah,01
int 21h
So in this code I'm trying to print the two-digit number by getting bp into di (so later I can decrement it to go through the whole stack), and then passing the number stored at that address to ax. I'm a newbie in assembly so I don't know what I'm doing.
Thank you in advance for your time. (And sorry for my english)
Sorry for the delayed reply. First, bp doesn't really have an "offset", so you could remove that. Second, bp won't automatically point into the stack unless you have made it so (mov bp, sp).
You don't mention an OS, but int 21h identifies it as DOS... which is real mode, segmented memory model. mov ax, [di] defaults to mov ax, ds:[di]. If you've assembled this into a ".com" file, cs, ds, es, and ss are all the same. If you've assembled it into an ".exe" file, this is not so! You may want to write it as mov ax, ss:[di] to be sure. In contrast, mov ax, [bp] defaults to mov ax, ss:[bp], so you may want to use bp instead of di here. I suspect that's how you're "supposed" to do it. If you've got a ".com" file, you can forget about this part (in 32-bit code you can forget about it too, but that doesn't apply to you).
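To see why the segment register matters, recall how real mode forms a physical address: the segment is multiplied by 16 and added to the offset, wrapping at 20 bits on the 8086. A small C sketch (physical_address is a made-up helper name) makes the point that the same offset in di lands in different memory when ds and ss differ:

```c
#include <stdint.h>

/* Real-mode 8086 address translation: physical = segment * 16 + offset,
   wrapped to the 20-bit address bus. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For instance, with a hypothetical ds of 1000h and ss of 2000h, an offset of 0010h refers to 10010h through ds but 20010h through ss.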
Then... your attempt to print a number isn't really going to work properly. Look for "How do I print a number?" examples for more information on that - too much to get into here...
This is too hard an assignment for a beginner, IMO (but "the instructor is always right" :) ).

Pointers and Indexes in Intel 8086 Assembly

I have a pointer to an array, DI.
Is it possible to go to the value pointed to by both DI and another pointer?
e.g:
mov bl,1
mov bh,10
inc [di+bl]
inc [di+bh]
And, on a related note, is there a single line opcode to swap the values of two registers? (In my case, BX and BP?)
For 16-bit programs, the only supported addressing forms are:
[BX+SI]
[BX+DI]
[BP+SI]
[BP+DI]
[SI]
[DI]
[BP]
[BX]
Each of these may include either an 8- or 16-bit constant displacement.
(Source: Intel Developer's Manual volume 2A, page 38)
The problem with the example provided is that bl and bh are eight-bit registers and cannot be used as the base pointer. However, if you set bx to the desired value then inc [di+bx] (with a suitable size specifier for the pointer) is valid.
As for swapping "the high and low bits of a register," J-16 SDiZ's suggestion of ror bx, 8 is fine for exchanging bl and bh (and IIRC, it is the optimal way to do so). However, if you want to exchange bit 0 of (say) bl with bit 7 of bl, you'll need more logic than that.
DI is not a pointer, it is an index.
You can use ROR BX, 8 to swap the lower and higher bytes of a register.
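For anyone who wants to convince themselves that a rotate by 8 really swaps the two halves of a 16-bit register, here is the same operation modeled in C. The helper name ror16 is made up for illustration.

```c
#include <stdint.h>

/* Rotate a 16-bit value right by n bits, like ROR on a 16-bit register. */
uint16_t ror16(uint16_t value, unsigned n)
{
    n &= 15u;                 /* rotating by 16 is the same as not rotating */
    if (n == 0)
        return value;
    return (uint16_t)((value >> n) | ((uint32_t)value << (16u - n)));
}
```

With bh = 10 and bl = 1 as in the question, bx is 0A01h, and ror16(0x0A01, 8) gives 010Ah.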

Resources