Pipelining gate 2015 - pipeline

Consider the sequence of machine instructions given below:
MUL R5, R0, R1
DIV R6, R2, R3
ADD R7, R5, R6
SUB R8, R7, R4
In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages:
Instruction Fetch and Decode (IF),
Operand Fetch (OF),
Perform Operation (PO) and
Write back the Result (WB).
The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is
Since it is clearly stated that operand forwarding is used from the PO stage to the OF stage, the answer to the above should be 15 clock cycles.
But many sources give the answer as 13 clock cycles. You get 13 if operands are forwarded from PO to PO instead.
My answer (- marks a stall cycle):
Cycle  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
MUL    IF OF PO PO PO WB
DIV       IF OF -  -  PO PO PO PO PO WB
ADD          IF -  -  -  -  -  -  -  OF PO WB
SUB             IF -  -  -  -  -  -  -  -  OF PO WB
Answer given in many places (- marks a stall cycle):
Cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
MUL    IF OF PO PO PO WB
DIV       IF OF -  -  PO PO PO PO PO WB
ADD          IF OF -  -  -  -  -  -  PO WB
SUB             IF OF -  -  -  -  -  -  PO WB
Can anyone tell me which answer is correct?

The correct answer is (C), 13 clock cycles.
http://geeksquiz.com/gate-gate-cs-2015-set-2-question-54/
http://gateoverflow.in/8218/gate2015-2_44
Operand forwarding takes place immediately after the last PO cycle, so we do not need to wait one more clock cycle.
So this is the correct sequence (- marks a stall cycle):
Cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
MUL    IF OF PO PO PO WB
DIV       IF OF -  -  PO PO PO PO PO WB
ADD          IF OF -  -  -  -  -  -  PO WB
SUB             IF OF -  -  -  -  -  -  PO WB
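For anyone who wants to check the two cycle counts discussed in this thread, here is a minimal Python sketch. It is not from the original posts. It assumes an in-order pipeline with a single unit per stage, and it models the two readings of "forwarding": a result becomes usable by a dependent instruction's PO stage in the next cycle (the 13-cycle reading), or only by its OF stage in the next cycle (the 15-cycle reading). The names `Instr`, `total_cycles` and `forward_to` are made up for illustration.

```python
from dataclasses import dataclass

# PO-stage latencies from the question; IF, OF and WB take 1 cycle each.
PO_LATENCY = {"MUL": 3, "DIV": 5, "ADD": 1, "SUB": 1}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: tuple


PROGRAM = [
    Instr("MUL", "R5", ("R0", "R1")),
    Instr("DIV", "R6", ("R2", "R3")),
    Instr("ADD", "R7", ("R5", "R6")),
    Instr("SUB", "R8", ("R7", "R4")),
]


def total_cycles(program, forward_to):
    """Return the cycle in which the last instruction finishes WB.

    forward_to="PO": a result is usable by a dependent PO in the next cycle.
    forward_to="OF": a result is usable by a dependent OF in the next cycle.
    """
    po_done = {}  # register -> cycle in which its producer finishes PO
    prev_of = prev_po_done = prev_wb = 0
    for i, ins in enumerate(program):
        if_cycle = i + 1  # one instruction fetched per cycle
        # Latest cycle in which any source operand is still being computed.
        ready = max(po_done.get(r, 0) for r in ins.srcs)
        of_cycle = max(if_cycle + 1, prev_of + 1)
        if forward_to == "OF":
            of_cycle = max(of_cycle, ready + 1)
        # The single PO unit must be free before this instruction can start.
        po_start = max(of_cycle + 1, prev_po_done + 1)
        if forward_to == "PO":
            po_start = max(po_start, ready + 1)
        po_finish = po_start + PO_LATENCY[ins.op] - 1
        wb = max(po_finish + 1, prev_wb + 1)
        po_done[ins.dest] = po_finish
        prev_of, prev_po_done, prev_wb = of_cycle, po_finish, wb
    return prev_wb


print(total_cycles(PROGRAM, "PO"))  # 13
print(total_cycles(PROGRAM, "OF"))  # 15
```

Under these assumptions the only difference between the two answers is whether the dependent instruction has to repeat its OF stage after the producer's last PO cycle.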

Is there an R function to help me plot the network connections for a single node?

This is my original dataset. R1, R2 and R3 are the word association responses for the cue word. tf and df are the total frequency and document frequency of the cue word, respectively.
[1]: https://i.stack.imgur.com/wpfZy.png [Image shows original dataframe]
I have cleaned up a dataset into a nodes list and an edge list. I have over a million rows in both lists. Plotting this as a network graph would take too long, and also be very dense, i.e. not understandable.
[2]: https://i.stack.imgur.com/mfSfN.png [Image shows node-list]
[3]: https://i.stack.imgur.com/l60Eu.png [Image shows edge-list]
I want to be able to make a network graph for the cue words, such that upon entering a cue word, I get a network of words that are either responses to it, or are words that the cue word is a response for.
For example, I want to see all the connections for the word 'money'. Using filter(nword == "money") only shows the node 'money' as an output, but I want all nodes connected to the cue word (in this case, 'money').
[4]: https://i.stack.imgur.com/1bKrr.png [Image shows filter()]
Is there a function or a chunk of code that would help me resolve this issue?
Edge list:
from to
1 1
1 6
1 8
1 17
1 18
1 22
1 23
1 38
1 67
1 80
2 82736
2 88035
2 103428
3 11
3 27
3 45
Node list:
node_id nword n
1 money 13633
2 food 12338
3 water 12276
4 car 8907
5 music 8351
6 green 7890
7 red 7623
8 love 7406
9 sex 6552
10 happy 6432
11 cold 6333
12 bad 6132
13 sad 5958
14 dog 5940
15 white 5910
16 school 5832
17 fun 5594
18 time 5467
19 black 5233
20 hair 5219

How-to go parallel

I have a dataset as below:
Custid Product
12 A
12 B
12 C
13 A
13 B
13 D
14 B
14 D
14 E
15 A
15 E
15 B
16 C
16 A
16 D
So I have 5 distinct products (A, B, C, D, E), and each customer gets 3 of them. Now I want 5 text files, one per product, each containing the custids for that product. For example,
the text file for A should contain the custids:
12
13
15
16
Similarly, each of the other products should have a text file with the custids that are assigned that product.
Is there a way to do this with parallel processing in R, as I have millions of records like this?
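# dat is the data frame from the question; write one file per product containing only its Custid column
by(dat, dat$Product, function(x) write.csv(x["Custid"], paste0(x$Product[1], ".txt"), row.names = FALSE))
Now go to your working directory and check that these files exist, or try reading one back from your console: read.csv("A.txt")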
To do it in parallel, use the parallel package.
library(parallel)
# Split the customer ids by product, then write each group to "<product>.txt" in a forked worker
lst <- split(dat$Custid, dat$Product)
mcmapply(function(ids, name) write(ids, paste0(name, ".txt"), ncolumns = 1, append = TRUE),
         lst, names(lst), mc.preschedule = TRUE)

What is the difference between a bank conflict and channel conflict on AMD hardware?

I am learning OpenCL programming and running some programs on an AMD GPU. I referred to the AMD OpenCL Programming Guide to read about global memory optimization for the GCN architecture. I am not able to understand the difference between a bank conflict and a channel conflict.
Can someone explain the difference between them?
Thanks in advance.
If two memory access requests are directed to the same controller, the hardware serializes the accesses. This is called a channel conflict. In other words, each integrated memory controller can serve only one request at a time, so if two requests map to addresses on the same channel, they are served one after the other.
Similarly, if two memory access requests go to the same memory bank, the hardware serializes the accesses. This is called a bank conflict. If there are multiple memory chips, you should avoid strides that are a multiple of the hardware's channel or bank width.
Example with 4 channels and 2 banks (not a real-world example, since the number of banks must be greater than or equal to the number of channels):
address  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
channel  1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4  1
bank     1  2  1  2  1  2  1  2  1  2  1  2  1  2  1  2  1
So you should not read like this:
address  1  3  5  7  9
channel  1  3  1  3  1   // 50% channel conflict
bank     1  1  1  1  1   // 100% bank conflict, serialized at the bank level
nor like this:
address  1  5  9 13
channel  1  1  1  1   // 100% channel conflict, serialized
bank     1  1  1  1   // 100% bank conflict, serialized
but this could be OK:
address  1  6 11 16
channel  1  2  3  4   // no conflict, 100% channel usage
bank     1  2  1  2   // no conflict, 100% bank usage
because the stride is not a multiple of the channel width or the bank width.
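The toy mapping above can be reproduced with a short Python sketch. It assumes the same simple round-robin interleaving as the table (4 channels, 2 banks, 1-based addresses), which is only an illustration and not how real GCN hardware maps addresses. The names `channel_of`, `bank_of` and `access_pattern` are hypothetical.

```python
NUM_CHANNELS = 4  # toy values taken from the example table
NUM_BANKS = 2


def channel_of(address):
    """Channel (1-based) that a 1-based address maps to in the toy model."""
    return (address - 1) % NUM_CHANNELS + 1


def bank_of(address):
    """Bank (1-based) that a 1-based address maps to in the toy model."""
    return (address - 1) % NUM_BANKS + 1


def access_pattern(start, stride, count):
    """Return (addresses, channels, banks) for a strided read."""
    addresses = [start + i * stride for i in range(count)]
    channels = [channel_of(a) for a in addresses]
    banks = [bank_of(a) for a in addresses]
    return addresses, channels, banks


for stride, count in [(2, 5), (4, 4), (5, 4)]:
    addresses, channels, banks = access_pattern(1, stride, count)
    print("stride", stride)
    print("  address", addresses)
    print("  channel", channels, "distinct:", len(set(channels)))
    print("  bank   ", banks, "distinct:", len(set(banks)))
```

The fewer distinct channels or banks a pattern touches, the more of its accesses end up queued behind each other in this model.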
Edit: If your algorithm relies mostly on local storage, pay attention to local data store channel conflicts. On top of this, some cards can use constant memory as an independent channel to speed up reads.
Edit: You can hide conflict-induced latency by running multiple wavefronts, or by using instruction-level parallelism.
Edit: Local data share (LDS) channels are much faster and more numerous than global channels, so optimizing for LDS is very important. Gathering uniformly on global channels and then scattering on local channels should therefore be less problematic than scattering on global channels and gathering uniformly on local channels.
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-472173
For an AMD APU with a decent mainboard, you should be able to select n-way channel interleaving or n-way bank interleaving as needed if you cannot change your software.

Extract specific observations of csv file in R

I've imported a csv file using read.csv.
It gives me a data frame with 18k observations of 1 variable, which looks like this:
V1
1 Energies (kJ/mol)
2 Bond Angle Proper Dih. Improper Dih. LJ-14
3 3.12912e+04 4.12307e+03 1.63677e+04 1.25619e+02 1.04394e+04
4 Coulomb-14 LJ (SR) Coulomb (SR) Potential Pressure (bar)
5 9.21339e+04 2.82339e+05 -1.15807e+06 -7.21252e+05 -7.25781e+03
6 Step Time Lambda
7 1 1.00000 0.00000
8 Energies (kJ/mol)
9 Bond Angle Proper Dih. Improper Dih. LJ-14
10 2.71553e+04 4.11858e+03 1.63855e+04 1.22226e+02 1.03903e+04
11 Coulomb-14 LJ (SR) Coulomb (SR) Potential Pressure (bar)
12 9.20926e+04 2.65253e+05 -1.15928e+06 -7.43766e+05 -7.27887e+03
13 Step Time Lambda
14 2 2.00000 0.00000
...
I want to extract the Potential energy in a vector. I've tried grep and readLines in multiple varieties and functions, but nothing works. Does anybody have an idea how to solve this problem?
Thanks! :)
So is this the right answer (from a former fizzsics major):
Lines <- readLines(textConnection("1 Energies (kJ/mol)
2 Bond Angle Proper Dih. Improper Dih. LJ-14
3 3.12912e+04 4.12307e+03 1.63677e+04 1.25619e+02 1.04394e+04
4 Coulomb-14 LJ (SR) Coulomb (SR) Potential Pressure (bar)
5 9.21339e+04 2.82339e+05 -1.15807e+06 -7.21252e+05 -7.25781e+03
6 Step Time Lambda
7 1 1.00000 0.00000
8 Energies (kJ/mol)
9 Bond Angle Proper Dih. Improper Dih. LJ-14
10 2.71553e+04 4.11858e+03 1.63855e+04 1.22226e+02 1.03903e+04
11 Coulomb-14 LJ (SR) Coulomb (SR) Potential Pressure (bar)
12 9.20926e+04 2.65253e+05 -1.15928e+06 -7.43766e+05 -7.27887e+03
13 Step Time Lambda
14 2 2.00000 0.00000"))
> grep("Potential", Lines) # identify the lines with "Potential"
[1] 4 11
Need to move to the next line and get the 5th item (the 5th rather than the 4th because each line here starts with its row number):
> read.table(text=Lines[ grep("Potential", Lines)+1])[ , 5]
[1] -721252 -743766

Banks size in 8051

I read a book about the Intel 8051 in which the author says that the 8051 has three banks from 00h to 1Fh, that each bank has 8 registers, and that each bank is 8 bytes.
Now I am confused: what does he mean by each bank being 8 bytes when each bank has 8 registers, each 8 bytes wide? Kindly guide me.
Regards
"bank is of 8 bytes when each bank has 8 registers each 8 bytes wide"
A register is 8 bits wide, and not 8 bytes.
Also, look at the Chapter 14 Figure 3 Memory Spaces chart here: (http://www.the8051microcontroller.com/select-figures)
Hopefully, it will make the picture clearer.
In the 8051, there are 4 register banks, B0 to B3. Their memory address ranges are
B0 - 00H - 07H
B1 - 08H - 0FH
B2 - 10H - 17H
B3 - 18H - 1FH
The default bank is B0.
Each bank is 8 bytes. Each bank holds 8 registers, R0 - R7, and each register is 1 byte (8 bits).
The active bank is selected with the RS1 and RS0 bits of the PSW (Program Status Word) register.
To sum it up,
Each register (R0 - R7) is 8 bits (1 byte)
Each bank (B0 - B3) is 8 bytes
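The layout summarized above can be written out as a small Python sketch. It is only an illustration of the arithmetic, assuming the standard 8051 layout in which bank n starts at address 8*n and the bank-select bits RS1 and RS0 sit at bits 4 and 3 of the PSW. The function names are made up for this example.

```python
def register_address(bank, reg):
    """Internal RAM address of register R<reg> in bank B<bank>."""
    if not 0 <= bank <= 3:
        raise ValueError("bank must be 0..3")
    if not 0 <= reg <= 7:
        raise ValueError("reg must be 0..7")
    # Each bank is 8 bytes long and each register is 1 byte.
    return bank * 8 + reg


def active_bank(psw):
    """Bank selected by a PSW value: RS1 is bit 4, RS0 is bit 3."""
    return (psw >> 3) & 0b11


for bank in range(4):
    first = register_address(bank, 0)
    last = register_address(bank, 7)
    print(f"B{bank} - {first:02X}H - {last:02X}H")
```

With the reset value PSW = 00H, `active_bank` returns 0, which matches B0 being the default bank.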
