What is the difference between a bank conflict and channel conflict on AMD hardware? - opencl

I am learning OpenCL programming and running some programs on an AMD GPU. I consulted the AMD OpenCL Programming Guide to read about global memory optimization for the GCN architecture, but I cannot understand the difference between a bank conflict and a channel conflict.
Can someone explain the difference between them?
Thanks in advance.

If two memory access requests are directed to the same controller, the hardware serializes the accesses. This is called a channel conflict. In other words, each integrated memory controller can serve only one request at a time, so if the addresses of two tasks map to the same channel, they are served one after the other.
Similarly, if two memory access requests go to the same memory bank, the hardware serializes the accesses. This is called a bank conflict. If there are multiple memory chips, you should avoid an access stride that matches (or is a multiple of) the hardware's channel or bank width.
Example with 4 channels and 2 banks: (not a real world example since banks must be more than or equal to channels)
address  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
channel  1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4  1
bank     1  2  1  2  1  2  1  2  1  2  1  2  1  2  1  2  1
so you should not read like this:
address  1  3  5  7  9
channel  1  3  1  3  1   // 50% channel conflict (only 2 of 4 channels used)
bank     1  1  1  1  1   // 100% bank conflict, serialized at the bank level
nor like this:
address  1  5  9 13
channel  1  1  1  1   // 100% channel conflict, serialized
bank     1  1  1  1   // 100% bank conflict, serialized
but this could be OK:
address  1  6 11 16
channel  1  2  3  4   // no conflict, 100% channel usage
bank     1  2  1  2   // no conflict, 100% bank usage
because the stride (5) is not a multiple of the number of channels (4) or the number of banks (2).
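The toy mapping above can be checked with a short Python sketch. It assumes the same simplified model as the example (1-based addresses, 4 channels, 2 banks, plain round-robin interleaving); real GCN hardware interleaves on larger address granularities, and the function names here are made up for illustration.

```python
from collections import Counter

NUM_CHANNELS = 4  # toy value from the example, not a real hardware figure
NUM_BANKS = 2     # toy value from the example, not a real hardware figure

def channel_of(address):
    """Channel for a 1-based address under round-robin interleaving."""
    return (address - 1) % NUM_CHANNELS + 1

def bank_of(address):
    """Bank for a 1-based address under round-robin interleaving."""
    return (address - 1) % NUM_BANKS + 1

def worst_case_serialization(addresses, unit_of):
    """Largest number of requests that land on the same channel or bank."""
    return max(Counter(unit_of(a) for a in addresses).values())

if __name__ == "__main__":
    for stride in (2, 4, 5):
        addresses = [1 + i * stride for i in range(4)]
        print(
            "stride", stride,
            "channels", [channel_of(a) for a in addresses],
            "banks", [bank_of(a) for a in addresses],
            "max per channel", worst_case_serialization(addresses, channel_of),
            "max per bank", worst_case_serialization(addresses, bank_of),
        )
```

A "max per channel" or "max per bank" value of 1 means no two requests in the group collide in this model; larger values mean that many requests would be served one after another.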
Edit: If your algorithm relies mostly on local storage, you should also pay attention to channel conflicts in the local data share (LDS). On top of this, some cards can use constant memory as an independent channel to speed up reads.
Edit: You can use multiple wavefronts to hide conflict-based latencies, or you can use instruction-level parallelism.
Edit: Local data share (LDS) channels are much faster and more numerous than global channels, so optimizing for LDS is very important. Gathering uniformly on global channels and then scattering on local channels should therefore be less problematic than scattering on global channels and gathering uniformly on local channels.
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-472173
For an AMD APU with a decent mainboard, you should be able to select n-way channel interleaving or n-way bank interleaving as you wish if your software cannot be altered.

Related

Hierarchy chart of 250+ shops and 35,000 employees

Are there any tips for making org charts for a 35,000-member organization?
I've attached an org chart for a single shop.
Scenario: We have 250+ shops. Each shop is made up of multiple sections. Each section has a unique section name. Each section is made up of a different number of managers, technicians, and supervisors. Each shop can be considered a child that reports to a parent. Each parent not only has that particular child shop, but can also have multiple other shops under it. That parent can also be a child of a different shop, which is making group_by a challenge. A is a child of parent B, but B is also a child of parent C, who is also a child of parent D, for example.
The source document is an Excel file with 35,000 rows and 50+ columns. Each shop is identified by a shop code, and each shop code reports to a parent with its own shop code.
group_by(parent id, child id) might not work because a parent of one shop can be a child of a different parent.
Unit ID  Reports To  Unit name  managers in unit  supervisors in unit  technicians in unit
10       11          i          2                 0                    4
9        11          h          2                 1                    0
8        9           g          4                 3                    2
6        7           f          2                 3                    4
5        7           e          1                 2                    3
4        5           d          2                 1                    0
3        4           c          4                 3                    2
2        4           b          2                 3                    4
1        2           a          1                 2                    3
You are looking for BALKAN OrgChartJS; it has the functionality you are asking for:
Code demo with chart and 100k nodes (rows)
Code demo for import from a CSV file and other formats
You can also read directly from the Excel (CSV) file with an HTTP request and load the chart, without any server-side code.
Disclaimer: I'm a developer in BALKAN App
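Independently of the charting library, the multi-level "parent of one shop is a child of another" structure can be handled by turning the Unit ID / Reports To columns into a tree and walking it, rather than grouping by a fixed pair of columns. The following is only a minimal Python sketch using the nine sample rows from the question; the function names are hypothetical, and it assumes the data contains no reporting cycles.

```python
from collections import defaultdict

# (unit_id, reports_to, name, managers, supervisors, technicians)
# Sample rows copied from the question's table.
ROWS = [
    (10, 11, "i", 2, 0, 4),
    (9, 11, "h", 2, 1, 0),
    (8, 9, "g", 4, 3, 2),
    (6, 7, "f", 2, 3, 4),
    (5, 7, "e", 1, 2, 3),
    (4, 5, "d", 2, 1, 0),
    (3, 4, "c", 4, 3, 2),
    (2, 4, "b", 2, 3, 4),
    (1, 2, "a", 1, 2, 3),
]

def build_children(rows):
    """Map each parent id to the list of unit ids that report to it."""
    children = defaultdict(list)
    for unit_id, reports_to, *_ in rows:
        children[reports_to].append(unit_id)
    return children

def find_roots(rows):
    """Parents that never appear as a unit themselves sit at the top."""
    unit_ids = {row[0] for row in rows}
    return sorted({row[1] for row in rows} - unit_ids)

def headcount(unit_id, children, staff):
    """Staff in this unit plus everything below it (assumes no cycles)."""
    own = sum(staff.get(unit_id, (0, 0, 0)))
    return own + sum(headcount(c, children, staff) for c in children.get(unit_id, []))

if __name__ == "__main__":
    children = build_children(ROWS)
    staff = {row[0]: row[3:] for row in ROWS}
    for root in find_roots(ROWS):
        print("top-level unit", root, "total staff below:", headcount(root, children, staff))
```

The same parent-to-children mapping can be exported as the flat id / parent-id list that most org chart tools accept as input.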

Are messages dropped on RAFT?

I'm reading about raft, but I got a bit confused when it comes to consensus after a network partition.
So, considering a cluster of 2 nodes, 1 Leader, 1 Follower.
Before partitioning, there were X messages written and successfully replicated. Then imagine that a network problem caused a partition, so there are 2 partitions, A (ex-leader) and B (ex-follower), which are now both leaders (receiving writes):
before partition | Messages |x| Partition | Messages
Leader | 0 1 2 3 4 |x| Partition A | 5 6 7 8 9
Follower | 0 1 2 3 4 |x| Partition B | 5' 6' 7' 8' 9'
After the partition has been resolved, what happens?
a) We elect 1 new leader and consider its log (dropping the messages of the new follower)?
e.g:
0 1 2 3 4 5 6 7 8 9 (total of 10 messages, 5 dropped)
or even:
0 1 2 3 4 5' 6' 7' 8' 9' (total of 10 messages, 5 dropped)
(depending on which node got to be leader)
b) We elect a new leader and find a way to make consensus of all the messages?
0 1 2 3 4 5 5' 6 6' 7 7' 8 8' 9 9' (total of 15 messages, 0 dropped)
If b, is there any specific way of doing that, or does it depend on the client implementation (e.g. message timestamps)?
The leader's log is taken to be "the log" once the leader has been elected and has successfully written its initial log entry for the term. In your case, however, the starting premise is not correct. Raft elects a leader by a majority vote of the whole cluster, so in a cluster of 2 nodes a node needs 2 votes to become leader, not 1. Given a network partition, neither node will be elected leader, and the writes in your example could not be committed on either side.
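The majority rule behind this answer is simple arithmetic, sketched here in Python. The function names are illustrative and not part of any Raft library.

```python
def votes_needed(cluster_size):
    """Smallest strict majority of the full cluster."""
    return cluster_size // 2 + 1

def can_elect_leader(partition_size, cluster_size):
    """A partition can elect a leader only if it holds a majority of ALL nodes."""
    return partition_size >= votes_needed(cluster_size)

if __name__ == "__main__":
    # The 2-node cluster from the question, split 1 / 1.
    print(votes_needed(2), can_elect_leader(1, 2))
    # A 3-node cluster split 2 / 1: only the 2-node side can make progress.
    print(votes_needed(3), can_elect_leader(2, 3), can_elect_leader(1, 3))
```

Because two disjoint partitions cannot both hold a majority, at most one side can elect a leader and commit entries, which is why conflicting committed logs such as 5..9 and 5'..9' do not arise.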

Is there an efficient algorithm to create this type of schedule? [closed]

I am creating a schedule for a sports league with several dozen teams. I already have all of the games in a set order and now I just need to assign one team to be the "home" team and one to be "away" for each game.
The problem has two constraints:
1. Each pair of teams must play an equal number of home and away games against each other. For example, if team A and team B play 4 games, then 2 must be hosted by A and 2 by B. Assume that each pair of teams plays an even number of games against each other.
2. No team should have more than three consecutive home games or three consecutive away games at any point in the schedule.
I have been trying to use brute force in R to solve this problem but I can't get any of my code blocks to solve the issue in a timely fashion. Does anyone have any advice on how to deal with either (or both) of the above constraints algorithmically?
You need to do more research on simple scheduling.
There are a lot of references online for these things.
Here are the basics for your application. Let's assume a league of 6 teams; the process is the same for any number.
Match 1: Simply write down the team numbers in order, in pairs, in a ring. Flatten the ring into two lines. Matches are upper (home) and lower (away).
1 2 3
6 5 4
Matches 2-5: Team 1 stays in place; the others rotate around the ring.
1 6 2
5 4 3
1 5 6
4 3 2
1 4 5
3 2 6
1 3 4
2 6 5
That's one full cycle. To balance the home-away schedule, simply invert the fixtures every other match:
1 2 3   5 4 3   1 5 6   3 2 6   1 3 4
6 5 4   1 6 2   4 3 2   1 4 5   2 6 5
There's your first full round. Simply replicate this, again switching home-away fixtures in alternate rounds. Thus, the second round would be:
6 5 4   1 6 2   4 3 2   1 4 5   2 6 5
1 2 3   5 4 3   1 5 6   3 2 6   1 3 4
Repeat this pair of rounds as many times as needed to get the length of schedule you need.
If you have an odd quantity of teams, simply declare one of the numbers to be the "bye" in the schedule. I find it easiest to follow if I use the non-rotating team -- team 1 in this example.
Note that this home-switching process guarantees that no team has three consecutive matches either home or away: a team gets at most two in a row, when it rounds the end of the row. The two-in-a-row streaks are not extended at the end of the round either: both of those teams break the streak in the first match of the next round.
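The rotation and home/away flipping described above can be sketched in Python as follows. This is a minimal illustration that assumes an even number of teams (add a "bye" team first for an odd count); the function names are made up, and the streak check only confirms the 6-team case rather than proving the general claim.

```python
def circle_rounds(num_teams):
    """Yield (top, bottom) rows per match day; team 1 stays, the rest rotate."""
    ring = list(range(1, num_teams + 1))
    half = num_teams // 2
    for _ in range(num_teams - 1):
        yield ring[:half], ring[half:][::-1]
        # Keep team 1 fixed and move the last team into the second slot.
        ring = [ring[0], ring[-1]] + ring[1:-1]

def schedule(num_teams, num_rounds):
    """Return a list of match days, each a list of (home, away) pairs."""
    days = []
    for rnd in range(num_rounds):
        for day, (top, bottom) in enumerate(circle_rounds(num_teams)):
            # Invert every other match day, and invert again in alternate rounds.
            flip = (day + rnd) % 2 == 1
            home, away = (bottom, top) if flip else (top, bottom)
            days.append(list(zip(home, away)))
    return days

def longest_streak(days, team):
    """Longest run of consecutive home (or consecutive away) games for a team."""
    best = run = 0
    last = None
    for games in days:
        for home, away in games:
            if team in (home, away):
                side = "H" if team == home else "A"
                run = run + 1 if side == last else 1
                last = side
                best = max(best, run)
    return best

if __name__ == "__main__":
    days = schedule(6, 2)
    for games in days:
        print(games)
    print("longest streak:", max(longest_streak(days, t) for t in range(1, 7)))
```

Each entry of the returned list is one match day, so the first five entries correspond to the first round and the next five to the mirrored second round.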
Unfortunately, for an arbitrary existing schedule, you are stuck with a brute-force search with backtracking. You can employ some limits and heuristics, such as balancing partial home-away fixtures as the first option at each juncture. Still, the better approach is to make your original schedule correct by design.
There's also a slight problem that you cannot guarantee that your existing schedule will fulfill the given requirements. For instance, given the 8-team fixtures in this order:
1 2 3 4
5 6 7 8
1 2 5 6
3 4 7 8
1 3 5 7
2 4 6 8
It is not possible to avoid having at least two teams playing three consecutive home or away matches.

Simple formula to ensure two teams get mixed up?

Say I have a number of users who are evenly split between two teams (if there is an odd count then one team will have one extra player).
I want to make sure everyone gets to be on the same team as everyone else over the course of 3 games with team changes after each game.
What's an easy mechanism for doing this for any number of players?
If it makes it easier for the purpose of explanation, I can give each player a number from 1 to N (where N is the number of players).
TIA
Let's assume we have 6 players, 1 - 6. You want to create different teams for 3 rounds of play.
For the first round, you deal the players.
1 2
3 4
5 6
For the second round, you put the winning team first, then the losing team. Let's assume that the team with player 1 won. Then the list of players would look like this.
1 3 5 2 4 6
And you would deal them like this.
1 3
5 2
4 6
For the third round, you do the same as the second round. This time, let's assume the team with player 3 won.
3 2 6 1 5 4
And you would deal them like this.
3 2
6 1
5 4
With only 3 rounds and many more than 6 players, not everybody will be able to play with everybody. But this shuffling algorithm gives a good mixture and is relatively simple to implement.
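A minimal Python sketch of this deal-and-reorder scheme, reproducing the 6-player walk-through above. The function names are illustrative, and which team wins each round is simply supplied by hand here.

```python
def deal(players):
    """Deal players alternately into two teams, like dealing cards."""
    return players[0::2], players[1::2]

def next_order(winners, losers):
    """Line up the winning team first, then the losing team, before re-dealing."""
    return list(winners) + list(losers)

if __name__ == "__main__":
    players = [1, 2, 3, 4, 5, 6]
    team_a, team_b = deal(players)                    # round 1
    print(team_a, team_b)
    team_a, team_b = deal(next_order(team_a, team_b)) # round 2: player 1's team won
    print(team_a, team_b)
    team_a, team_b = deal(next_order(team_b, team_a)) # round 3: player 3's team won
    print(team_a, team_b)
```

With an odd number of players the first team simply ends up with the extra player, which matches the question's requirement.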

How to get whether OS is 32 bit or 64 bit by UNIX command?

How can I find out whether the operating system is 32-bit or 64-bit? Thanks in advance.
On Linux, the answer to such a generic question is simply to use
uname -m
or even:
getconf LONG_BIT
In C you can use the uname(2) system call.
On Windows you can use:
systeminfo | find /I "System type"
or even examine the environment:
set | find "ProgramFiles(x86)"
(or with getenv() in C)
Original question:
How to know the bits of operating system in C or by some other way?
The correct way is to use some system API or command to get the architecture.
Comparing sizeof in C won't give you the system's pointer size, only the pointer size of the architecture the program was compiled for. That's because most architectures and OSes are backward compatible, so they can run older 16-bit or 32-bit programs without problems. A 32-bit program's pointers are still 32 bits long even on a 64-bit OS. And even on 64-bit architectures, some OSes may still use 32-bit pointers, as with the x32 ABI.
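The same distinction can be observed from Python, as a rough sketch: platform.machine() returns the machine type reported by the OS, while struct.calcsize("P") gives the pointer size of the running interpreter build, which can be 32 bits even on a 64-bit OS. The helper names below are illustrative.

```python
import platform
import struct

def process_pointer_bits():
    """Pointer width of this Python process in bits (a property of the build, not the OS)."""
    return struct.calcsize("P") * 8

def reported_machine():
    """Machine type as reported by the OS, similar to what `uname -m` prints on Linux."""
    return platform.machine()

if __name__ == "__main__":
    print("process pointer size:", process_pointer_bits(), "bits")
    print("machine reported by OS:", reported_machine())
```

If the two disagree (for example a 32-bit pointer size on a machine reported as x86_64), the process is a 32-bit build running on a 64-bit system.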
If you use C, you can check sizeof(void*) or sizeof(long): if it equals 8, it is 64-bit; otherwise it is 32-bit. It's the same on all architectures.
I'm sorry for my carelessness and mistake: this only holds on Linux. Section 11.1, "Use of Standard C Types", of Linux Device Drivers, 3rd edition, says:
The program can be used to show that long integers and pointers
feature a different size on 64-bit platforms, as demonstrated by
running the program on different Linux computers:
arch     Size:  char  short  int  long  ptr  long-long  u8  u16  u32  u64
i386            1     2      4    4     4    8          1   2    4    8
alpha           1     2      4    8     8    8          1   2    4    8
armv4l          1     2      4    4     4    8          1   2    4    8
ia64            1     2      4    8     8    8          1   2    4    8
m68k            1     2      4    4     4    8          1   2    4    8
mips            1     2      4    4     4    8          1   2    4    8
ppc             1     2      4    4     4    8          1   2    4    8
sparc           1     2      4    4     4    8          1   2    4    8
sparc64         1     2      4    4     4    8          1   2    4    8
x86_64          1     2      4    8     8    8          1   2    4    8
And there are some exceptions. For example:
It's interesting to note that the SPARC 64 architecture runs with a
32-bit user space, so pointers are 32 bits wide there, even though
they are 64 bits wide in kernel space. This can be verified by loading
the kdatasize module (available in the directory misc-modules within
the sample files). The module reports size information at load time
using printk and returns an error (so there's no need to unload it):
@user1437033 I guess Windows isn't compatible with the gcc conventions, so you may get a better answer from Windows programmers.
@Paul R We should consider it regular code, right? If you use cross-compilation tools, such as for ARM (it only has 32 bits), then you also can't get the answer this way.
P.S. I don't recommend the Dev-C++ compiler; it behaves oddly in many situations and isn't standard. Code::Blocks or VS 2010 may be a better choice.
I hope this helps.
