Bank size in the 8051 (Intel)

I read a book about the Intel 8051 in which the author says the 8051 has four register banks from 00h to 1Fh, each bank has 8 registers, and each bank is 8 bytes.
Now I am confused: what does he mean by each bank being 8 bytes when each bank has 8 registers, each 8 bytes wide? Kindly guide me
Regards

bank is of 8 bytes when each bank has 8 registers each 8 bytes wide
A register is 8 bits wide, not 8 bytes.

Also, look at the Chapter 14 Figure 3 Memory Spaces chart here: (http://www.the8051microcontroller.com/select-figures)
Hopefully, it will make the picture clearer.

In the 8051, there are 4 register banks, B0 to B3. Their memory address locations are:
B0 - 00H - 07H
B1 - 08H - 0FH
B2 - 10H - 17H
B3 - 18H - 1FH
The default bank is B0.
Each bank is 8 bytes. In each bank there are 8 registers, R0 - R7, each of which is 1 byte (8 bits).
The banks are switched by setting the RS1 and RS0 bits (PSW.4 and PSW.3) of the PSW (Program Status Word) register.
To sum it up:
Each register is 8 bits (1 byte): R0 - R7
Each bank is 8 bytes: B0 - B3
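
To make the addressing concrete, here is a small sketch (plain Python, not 8051 code; the function name is my own) of how the bank-select bits map to the address of Rn:

# Sketch of 8051 register-bank addressing (illustration only, not 8051 code).
# The active bank is chosen by the PSW bits RS1 (PSW.4) and RS0 (PSW.3).

def register_address(rs1: int, rs0: int, n: int) -> int:
    """Return the internal RAM address of Rn for the selected bank."""
    bank = (rs1 << 1) | rs0      # bank number 0..3
    base = bank * 8              # bank 0 -> 00H, 1 -> 08H, 2 -> 10H, 3 -> 18H
    return base + n              # each Rn occupies one byte within the bank

# Example: with RS1=1, RS0=0 (bank 2), R3 lives at address 10H + 3 = 13H.
assert register_address(1, 0, 3) == 0x13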


What is the purpose of the payload in the ICMPv6 Packet Too Big message?

I have gone through RFC 4443 and RFC 8201. Perhaps I did not understand them fully, as I am new to some of the terminology described in these RFCs, but I want to understand the implication of the payload in the ICMPv6 Packet Too Big message.
As per RFC 4443 (link to RFC 4443), the payload will contain as much of the invoking packet as possible without the ICMPv6 packet exceeding the minimum IPv6 MTU.
I don't understand the use case of such a payload; even in RFC 8201 there is no mention of how the payload is used. The only related comment present is:
Added clarification in Section 4, "Protocol Requirements", that
nodes should validate the payload of ICMP PTB messages per RFC
4443, and that nodes should detect decreases in PMTU as fast as
possible.
What are the implications of validating the payload in the PTB message? How should we validate the payload, and based on what conditions?
Packet Too Big Message format as per RFC 4443:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             MTU                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    As much of invoking packet                 |
+               as possible without the ICMPv6 packet           +
|               exceeding the minimum IPv6 MTU [IPv6]           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
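
To illustrate one possible use of the payload (this is only a sketch of my understanding, not code from any RFC; the function and variable names are made up): the receiving node can parse the PTB message, extract the embedded invoking packet, and check that it really corresponds to traffic this node sent before trusting the reported MTU.

import struct
import ipaddress

def parse_packet_too_big(icmp6: bytes):
    """Parse an ICMPv6 Packet Too Big message (RFC 4443, Type 2)."""
    msg_type, code, checksum, mtu = struct.unpack_from("!BBHI", icmp6, 0)
    if msg_type != 2:
        raise ValueError("not a Packet Too Big message")
    invoking = icmp6[8:]          # as much of the invoking packet as fitted
    return mtu, invoking

def looks_like_our_packet(invoking: bytes, my_address: str) -> bool:
    """Rough validation sketch: the embedded packet must be an IPv6 packet
    whose source address is ours; a real stack would also match the
    destination address and transport ports against an existing flow."""
    if len(invoking) < 40:        # need at least a full IPv6 header
        return False
    if invoking[0] >> 4 != 6:     # IPv6 version nibble
        return False
    src = ipaddress.IPv6Address(invoking[8:24])
    return src == ipaddress.IPv6Address(my_address)

As I understand it, the point of carrying the invoking packet is exactly this kind of check: an off-path attacker cannot easily produce the contents of a packet you actually sent, so validating the payload against your own traffic is what makes it reasonable to lower the cached path MTU.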

What is the difference between a bank conflict and channel conflict on AMD hardware?

I am learning OpenCL programming and running some programs on an AMD GPU. I referred to the AMD OpenCL Programming Guide to read about global memory optimization for the GCN architecture, but I am not able to understand the difference between a bank conflict and a channel conflict.
Can someone explain the difference between them to me?
Thanks in advance.
If two memory access requests are directed to the same memory controller, the hardware serializes the accesses. This is called a channel conflict. In other words, each integrated memory controller can serve only one request at a time; if two work-items' addresses map to the same channel, they are served serially.
Similarly, if two memory access requests go to the same memory bank, the hardware serializes the accesses. This is called a bank conflict. If there are multiple memory chips, you should avoid access strides that are a multiple of the hardware's channel or bank interleave width.
Example with 4 channels and 2 banks (not a real-world example, since the number of banks is normally greater than or equal to the number of channels):
address  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
channel  1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4  1
bank     1  2  1  2  1  2  1  2  1  2  1  2  1  2  1  2  1
so you should not read like this:
address  1  3  5  7  9
channel  1  3  1  3  1     // 50% channel conflict
bank     1  1  1  1  1     // 100% bank conflict, serialized at the bank level
nor like this:
address  1  5  9 13
channel  1  1  1  1        // 100% channel conflict, serialized
bank     1  1  1  1        // 100% bank conflict, serialized
but this could be OK:
address  1  6 11 16
channel  1  2  3  4        // no conflict, 100% channel usage
bank     1  2  1  2        // no conflict, 100% bank usage
because the stride is not a multiple of the channel or bank interleave width.
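
A tiny Python sketch that reproduces the tables above (the modulo mapping is the illustrative one from this example, not the real GCN address decoding):

# Illustrative address -> channel/bank mapping used in the example above
# (4 channels, 2 banks, one address unit per channel); not real GCN decoding.
NUM_CHANNELS = 4
NUM_BANKS = 2

def channel_of(addr):
    return (addr - 1) % NUM_CHANNELS + 1

def bank_of(addr):
    return (addr - 1) % NUM_BANKS + 1

def access_pattern(addresses):
    return [channel_of(a) for a in addresses], [bank_of(a) for a in addresses]

print(access_pattern([1, 3, 5, 7, 9]))    # channels 1 3 1 3 1, banks all 1
print(access_pattern([1, 5, 9, 13]))      # channels all 1, banks all 1: serialized
print(access_pattern([1, 6, 11, 16]))     # channels 1 2 3 4, banks 1 2 1 2: no conflict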
Edit: If your algorithm is more local-storage oriented, then you should pay attention to local data share (LDS) bank conflicts as well. On top of this, some cards can use constant memory as an independent channel source to speed up read rates.
Edit: You can use multiple wavefronts to hide conflict-based latencies, or you can use instruction-level parallelism.
Edit: Local data share (LDS) channels are faster and far more numerous than global memory channels, so optimizing for LDS is very important. Gathering uniformly from global channels and then scattering into local channels should be less problematic than scattering on global channels and gathering uniformly from local channels.
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-472173
For an AMD APU with a decent mainboard, you should be able to select n-way channel interleaving or n-way bank interleaving in the firmware settings if your software cannot be altered.

Pipelining (GATE 2015)

Consider the sequence of machine instructions given below:
MUL R5, R0, R1
DIV R6, R2, R3
ADD R7, R5, R6
SUB R8, R7, R4
In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages:
Instruction Fetch and Decode (IF),
Operand Fetch (OF),
Perform Operation (PO) and
Write back the Result (WB).
The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is
Since it is clearly given that operand forwarding is used from the PO stage to the OF stage, the answer to the above should be 15 clock cycles.
But in many places the answer is given as 13 clock cycles; 13 is what you get when operand forwarding is done from PO to PO.
My answer (15 clock cycles):
      1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
MUL   IF  OF  PO  PO  PO  WB
DIV       IF  OF  --  --  PO  PO  PO  PO  PO  WB
ADD           IF  --  --  --  --  --  --  --  OF  PO  WB
SUB               IF  --  --  --  --  --  --  --  --  OF  PO  WB
Answer given in many places (13 clock cycles):
      1   2   3   4   5   6   7   8   9   10  11  12  13
MUL   IF  OF  PO  PO  PO  WB
DIV       IF  OF  --  --  PO  PO  PO  PO  PO  WB
ADD           IF  OF  --  --  --  --  --  --  PO  WB
SUB               IF  OF  --  --  --  --  --  --  PO  WB
Can anyone tell me which answer is correct?
The correct answer is (C), 13 clock cycles.
http://geeksquiz.com/gate-gate-cs-2015-set-2-question-54/
http://gateoverflow.in/8218/gate2015-2_44
Operand forwarding takes place immediately after the last PO cycle; we do not need to wait one more clock cycle.
So this is the correct sequence:
      1   2   3   4   5   6   7   8   9   10  11  12  13
MUL   IF  OF  PO  PO  PO  WB
DIV       IF  OF  --  --  PO  PO  PO  PO  PO  WB
ADD           IF  OF  --  --  --  --  --  --  PO  WB
SUB               IF  OF  --  --  --  --  --  --  PO  WB
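
To double-check the count, here is a small Python sketch (my own, not taken from the linked solutions) that computes the finishing cycle under the interpretation that a result is forwarded from the end of a PO stage straight into the next dependent PO:

# Cycle-count sketch for the GATE 2015 pipeline question.
# Instruction format: (dest, src1, src2, PO latency in cycles).
program = [
    ("R5", "R0", "R1", 3),  # MUL
    ("R6", "R2", "R3", 5),  # DIV
    ("R7", "R5", "R6", 1),  # ADD
    ("R8", "R7", "R4", 1),  # SUB
]

available = {}      # cycle in which a register's value is produced (end of PO)
po_free = 3         # the first instruction reaches PO in cycle 3 (after IF, OF)
last_wb = 0

for i, (dest, s1, s2, latency) in enumerate(program, start=1):
    ready = max(available.get(s1, 0), available.get(s2, 0)) + 1
    start = max(po_free, ready, i + 2)   # i+2: IF in cycle i, OF in cycle i+1
    available[dest] = start + latency - 1
    po_free = start + latency
    last_wb = po_free                    # WB follows the last PO cycle

print(last_wb)   # -> 13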

What is the right order of bytes in a TCP header?

I'm trying to encode a TCP header myself, but I can't understand the right order of bits/octets in it. This is what RFC 793 says:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
...
This means that the Source Port should take the first two octets and the lowest bit should be in the first octet. This means to me that in order to encode source port 180 I should start my TCP header with these two bytes:
B4 00 ...
However, all examples I can find tell me to do it the other way around:
00 B4 ...
Why?
This means that the Source Port should take the first two octets
Correct.
and the lowest bit should be in the first octet.
Incorrect. It doesn't mean that. It doesn't say anything about it.
All multi-byte integers in all IP headers are represented in network byte order, which is big-endian. This is specified in RFC 1700.
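
You can see this quickly in Python, where the '!' prefix of struct means network byte order (big-endian):

import struct

source_port = 180
# '!' = network byte order (big-endian), 'H' = unsigned 16-bit integer
print(struct.pack("!H", source_port).hex())   # '00b4' -> bytes 00 B4 on the wire

# Little-endian ('<') would give the B4 00 ordering from the question,
# which is not what goes on the wire.
print(struct.pack("<H", source_port).hex())   # 'b400'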

What is the correct name of this error correction method (it is similar to Hamming Code)

What is the correct name of this error correction method?
It is quite similar to Hamming code, but much simpler. I also cannot find it in the literature any more. The only internet sources I can still find that describe the method are these:
http://www.mathcs.emory.edu/~cheung/Courses/455/Syllabus/2-physical/errors-Hamming.html
And the German-language Wikipedia:
http://de.wikipedia.org/w/index.php?title=Fehlerkorrekturverfahren
In the Wikipedia article, the method is called the Hamming-ECC method, but I'm not 100% sure this is correct.
Here is an example that shows how the method works.
Payload: 10011010
Step 1: Determine the parity bit positions. Positions that are powers of 2 (1, 2, 4, 8, 16, etc.) are parity bits:
Position:                 1  2  3  4  5  6  7  8  9 10 11 12
Data to be transmitted:   ?  ?  1  ?  0  0  1  ?  1  0  1  0
Step 2: Calculate the parity bit values. Each bit in the transmission is assigned a position number. In this example the position number is a 4-digit binary number, because we have 4 parity bits. XOR the position numbers (in 4-digit binary) of all positions where the transmitted data bit is 1:
      0011   Position 3
      0111   Position 7
      1001   Position 9
XOR   1011   Position 11
------------------------
      0110   = parity bit values
Step 3: Insert parity bit values into the transmission:
Position:                 1  2  3  4  5  6  7  8  9 10 11 12
Data to be transmitted:   0  1  1  1  0  0  1  0  1  0  1  0
It is quite simple to verify whether a received message was transmitted correctly, and single-bit errors can be corrected. Here is an example. The receiver XORs the position numbers of all data positions that hold a 1 bit in the received message, and then XORs the result with the received parity bits. If the result is 0, the transmission was error-free; otherwise the result is the position of the bit with the wrong value.
Received message: 0001101100101101
Position:         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Received data:    0  0  0  1  1  0  1  1  0  0  1  0  1  1  0  1
Parity bits:      X  X     X           X                       X
      00101   Position 5
      00111   Position 7
      01011   Position 11
      01101   Position 13
XOR   01110   Position 14
-------------------------
      01010   Parity bits calculated
XOR   00111   Parity bits received
-------------------------
      01101   => Bit 13 is defective!
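
To make the two calculations above concrete, here is a rough Python sketch (my own illustration; it only mirrors the examples and is not a general implementation):

from functools import reduce

def xor_of_one_positions(bits):
    """XOR of the position numbers of all data positions that hold a 1 bit."""
    return reduce(lambda acc, pos: acc ^ pos,
                  (pos for pos, value in bits.items() if value), 0)

# Encoding example: payload 10011010 placed at the non-power-of-two positions.
data = {3: 1, 5: 0, 6: 0, 7: 1, 9: 1, 10: 0, 11: 1, 12: 0}
print(format(xor_of_one_positions(data), "04b"))   # '0110', as in Step 2

# Verification example (16-bit message, parity bits at positions 1, 2, 4, 8, 16).
received_data = {3: 0, 5: 1, 6: 0, 7: 1, 9: 0, 10: 0, 11: 1,
                 12: 0, 13: 1, 14: 1, 15: 0}
received_parity = 0b00111      # parity bits as read out of the received message
syndrome = xor_of_one_positions(received_data) ^ received_parity
print(syndrome)                # 13 -> bit 13 is defective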
I hope somebody here knows the correct name of the method.
Thanks for any help.
This looks like a complicated implementation of the Hamming(15,11) encoding & decoding algorithm.
Interleaving the parity bits with the information bits does not change the behaviour (or performance) of the code. Your description only uses 8 information bits, whereas Hamming(15,11) corrects all single-bit errors even with 11 information bits being transmitted.
Your description does not explain how the transmitted 12-bit message gets extended to a 16-bit message on the receive side.
