Boost.MPI equivalent of status.MPI_SOURCE

Is there a Boost.MPI equivalent of the following C MPI code? I'm trying to port the following standard MPI code, which is a basic master/slave template found here. According to the Boost.MPI documentation, there are only three parameters for send or recv: rank, tag, and buffer.
while (work != NULL) {
/* Receive results from a slave */
MPI_Recv(&result, /* message buffer */
1, /* one data item */
MPI_INT, /* of type int */
MPI_ANY_SOURCE, /* receive from any sender */
MPI_ANY_TAG, /* any type of message */
MPI_COMM_WORLD, /* default communicator */
&status); /* info about the received message */
/* Send the slave a new work unit */
MPI_Send(&work, /* message buffer */
1, /* one data item */
MPI_INT, /* data item is an integer */
status.MPI_SOURCE, /* to who we just received from */
WORKTAG, /* user chosen message tag */
MPI_COMM_WORLD); /* default communicator */
/* Get the next unit of work to be done */
work = get_next_work_item();
}

From the boost.MPI documentation:
MPI_ANY_SOURCE becomes any_source
MPI_ANY_TAG becomes any_tag
The communicator::recv() method returns an instance of the status class that provides all the information that you need:
status.MPI_SOURCE is returned by status::source()
status.MPI_TAG is returned by status::tag()
It also provides two cast operators to convert its content to an MPI_Status structure.

Berkeley Packet Filters for VLAN priority

I need to filter on the priority in the VLAN header to check for the voice priority value.
Using BPF filtering, is it possible to find which packets have the priority bits in the VLAN header equal to five?
Regards
Vincenzo
Yes, you can; the exact way to do it depends on the type of eBPF program.
For programs with __sk_buff contexts (TC, socket filter, cgroup SKB)
eBPF program types that get a __sk_buff as context can just access the vlan_tci field. This field should already be in host byte order, so you can just mask and bit-shift the value to get the PCP field.
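As a sketch of that mask-and-shift, here is plain C that can be checked outside the kernel; the helper names are hypothetical. The 16-bit TCI splits into a 3-bit PCP, a 1-bit DEI, and a 12-bit VLAN ID. In a TC or socket-filter program you would apply the same expression to skb->vlan_tci.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of the 16-bit 802.1Q TCI (host byte order):
 *   bits 15..13  PCP (priority code point)
 *   bit  12      DEI (drop eligible indicator)
 *   bits 11..0   VID (VLAN ID)
 * The values mirror the kernel's VLAN_PRIO_MASK / VLAN_PRIO_SHIFT / VLAN_VID_MASK,
 * redefined here under local names so the sketch stands alone. */
#define PCP_MASK  0xE000u
#define PCP_SHIFT 13
#define VID_MASK  0x0FFFu

/* Extract the 3-bit priority from a host-byte-order TCI. */
static inline unsigned vlan_tci_pcp(uint16_t tci)
{
    return (tci & PCP_MASK) >> PCP_SHIFT;
}

/* Extract the 12-bit VLAN ID from a host-byte-order TCI. */
static inline unsigned vlan_tci_vid(uint16_t tci)
{
    return tci & VID_MASK;
}

/* Filter decision from the question: is this the voice priority (5)? */
static inline int is_voice_priority(uint16_t tci)
{
    return vlan_tci_pcp(tci) == 5;
}
```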
For XDP programs
In XDP programs, we need to manually parse all network layers before we can access the TCI field.
The XDP tutorial has a few parsing functions that are a good base, including parse_ethhdr_vlan:
/* Notice, parse_ethhdr() will skip VLAN tags, by advancing nh->pos and returns
* next header EtherType, BUT the ethhdr pointer supplied still points to the
* Ethernet header. Thus, caller can look at eth->h_proto to see if this was a
* VLAN tagged packet.
*/
static __always_inline int parse_ethhdr_vlan(struct hdr_cursor *nh,
void *data_end,
struct ethhdr **ethhdr,
struct collect_vlans *vlans)
{
struct ethhdr *eth = nh->pos;
int hdrsize = sizeof(*eth);
struct vlan_hdr *vlh;
__u16 h_proto;
int i;
/* Byte-count bounds check; check if current pointer + size of header
* is after data_end.
*/
if (nh->pos + hdrsize > data_end)
return -1;
nh->pos += hdrsize;
*ethhdr = eth;
vlh = nh->pos;
h_proto = eth->h_proto;
/* Use loop unrolling to avoid the verifier restriction on loops;
* support up to VLAN_MAX_DEPTH layers of VLAN encapsulation.
*/
#pragma unroll
for (i = 0; i < VLAN_MAX_DEPTH; i++) {
if (!proto_is_vlan(h_proto))
break;
if (vlh + 1 > data_end)
break;
h_proto = vlh->h_vlan_encapsulated_proto;
if (vlans) /* collect VLAN ids */
vlans->id[i] =
(bpf_ntohs(vlh->h_vlan_TCI) & VLAN_VID_MASK);
vlh++;
}
nh->pos = vlh;
return h_proto; /* network-byte-order */
}
You will have to modify this function for your purposes, since it currently discards the PCP field you are after: the line vlans->id[i] = (bpf_ntohs(vlh->h_vlan_TCI) & VLAN_VID_MASK); keeps only the VLAN ID.
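One way to make that modification is to record the priority as well. In the eBPF code the extra line would be along the lines of vlans->pcp[i] = (bpf_ntohs(vlh->h_vlan_TCI) & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;, assuming you add a hypothetical pcp array to struct collect_vlans (you may need to define VLAN_PRIO_MASK as 0xe000 and VLAN_PRIO_SHIFT as 13 yourself). Because an XDP program needs a kernel toolchain to test, the sketch below mirrors the same parsing loop in plain user-space C over a raw frame buffer. parse_vlans() and struct vlan_info are hypothetical names, and the TPIDs 0x8100 (802.1Q) and 0x88A8 (802.1ad) are assumed to be what proto_is_vlan() checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_SIZE   14      /* destination MAC + source MAC + EtherType */
#define VLAN_HDR_SIZE  4       /* TCI + encapsulated EtherType */
#define TPID_8021Q     0x8100
#define TPID_8021AD    0x88A8
#define MAX_VLAN_DEPTH 2       /* plays the role of VLAN_MAX_DEPTH */

/* Hypothetical counterpart of struct collect_vlans that also keeps the PCP. */
struct vlan_info {
    int      depth;
    uint16_t id[MAX_VLAN_DEPTH];
    uint8_t  pcp[MAX_VLAN_DEPTH];
};

/* Read a network-order 16-bit field; stands in for bpf_ntohs(). */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static int is_vlan_tpid(uint16_t proto)
{
    return proto == TPID_8021Q || proto == TPID_8021AD;
}

/* Walk the Ethernet header and up to MAX_VLAN_DEPTH tags, recording VID and PCP.
 * Returns the next EtherType (host order), or -1 if the frame is too short. */
static int parse_vlans(const uint8_t *data, size_t len, struct vlan_info *out)
{
    size_t pos = ETH_HDR_SIZE;
    uint16_t proto;

    out->depth = 0;
    if (len < ETH_HDR_SIZE)
        return -1;
    proto = read_be16(data + 12);

    for (int i = 0; i < MAX_VLAN_DEPTH && is_vlan_tpid(proto); i++) {
        if (pos + VLAN_HDR_SIZE > len)   /* same idea as the vlh + 1 > data_end check */
            break;
        uint16_t tci = read_be16(data + pos);
        out->id[i]  = tci & 0x0FFF;                     /* VLAN ID */
        out->pcp[i] = (uint8_t)((tci & 0xE000) >> 13);  /* priority bits */
        out->depth++;
        proto = read_be16(data + pos + 2);
        pos += VLAN_HDR_SIZE;
    }
    return proto;
}
```

A filter for the question would then check depth > 0 and pcp[0] == 5.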

How can I use the enum tcp mib definitions in a kprobe program?

I tried to write a kprobe program that includes the TCP MIB enum, similar to @tcp_states in the bpftrace tools from the book BPF Performance Tools. The TCP MIB enum is defined in '/include/uapi/linux/snmp.h':
#!/usr/local/bin/bpftrace
#include <net/net_namespace.h>
#include <net/netns/mib.h>
#include <net/snmp.h>
#include <uapi/linux/snmp.h>
#define TCP_MIB_MAX __TCP_MIB_MAX
kprobe:sk_alloc
{
$net = (struct net *)arg0;
$mi = (struct netns_mib *)$net->mib;
$ib = (struct tcp_mib *)$mi;
@mib[1] = "TCP_MIB_NUM";
@mib[2] = "TCP_MIB_RTOALGORITHM";
@mib[3] = "TCP_MIB_RTOMIN";
@mib[4] = "TCP_MIB_RTOMAX";
@mib[5] = "TCP_MIB_MAXCONN";
@mib[6] = "TCP_MIB_ACTIVEOPENS";
@mib[7] = "TCP_MIB_PASSIVEOPENS";
@mib[8] = "TCP_MIB_ATTEMPTFAILS";
@mib[9] = "TCP_MIB_ESTABRESETS";
@mib[10] = "TCP_MIB_CURRESTAB";
@mib[11] = "TCP_MIB_INSEGS";
@mib[12] = "TCP_MIB_OUTSEGS";
@mib[13] = "TCP_MIB_RETRANSSEGS";
@mib[14] = "TCP_MIB_INERRS";
@mib[15] = "TCP_MIB_OUTRSTS";
@mib[16] = "TCP_MIB_CSUMERRORS";
printf("-------------------------------\n");
time();
printf("sk_alloc: %s pid: %d\n", comm, pid);
printf("\n");
printf("$ib: %u\n", $ib->mibs[6]);
$mib_s = $ib->mibs[TCP_MIB_MAX];
$mib_str = @mib[$mib_s];
printf("TCP mib is: %s\n", $mib_str);
clear(@mib);
}
And when I tried to run it the output was:
the index 94779518808448 is out of bounds for array of size 16
Then, instead of TCP_MIB_MAX, I tried a specific array index, e.g. 5, by modifying the code above:
$mib_s = $ib->mibs[5];
And when I tried to run it, the output was:
...
-----------------------------
21:40:15
sk_alloc: systemd-logind pid: 920
$ib: 1516359680
TCP mib is:
-----------------------------
21:40:15
sk_alloc: systemd-logind pid: 920
$ib: 1516359680
TCP mib is:
...
Why doesn't it show the TCP MIB name, and why is that part of the output empty?
How can I use the array properly to show @mib?
TCP_MIB_MAX and __TCP_MIB_MAX are equal to 16, which is the number of elements in the mibs array of struct tcp_mib in the kernel:
enum
{
TCP_MIB_NUM = 0,
TCP_MIB_RTOALGORITHM, /* RtoAlgorithm */
TCP_MIB_RTOMIN, /* RtoMin */
TCP_MIB_RTOMAX, /* RtoMax */
TCP_MIB_MAXCONN, /* MaxConn */
TCP_MIB_ACTIVEOPENS, /* ActiveOpens */
TCP_MIB_PASSIVEOPENS, /* PassiveOpens */
TCP_MIB_ATTEMPTFAILS, /* AttemptFails */
TCP_MIB_ESTABRESETS, /* EstabResets */
TCP_MIB_CURRESTAB, /* CurrEstab */
TCP_MIB_INSEGS, /* InSegs */
TCP_MIB_OUTSEGS, /* OutSegs */
TCP_MIB_RETRANSSEGS, /* RetransSegs */
TCP_MIB_INERRS, /* InErrs */
TCP_MIB_OUTRSTS, /* OutRsts */
TCP_MIB_CSUMERRORS, /* InCsumErrors */ // == 15
__TCP_MIB_MAX // == 16
};
and
#define TCP_MIB_MAX __TCP_MIB_MAX
struct tcp_mib {
unsigned long mibs[TCP_MIB_MAX];
};
(include/uapi/linux/snmp.h and include/net/snmp.h)
But because arrays are indexed from 0, you can only go up to TCP_MIB_MAX - 1 when indexing $ib->mibs. This is why you get a complaint about the out-of-bounds index.
When you select a smaller index, you can access the array item as expected. But I'm not sure what you are trying to do with:
$mib_s = $ib->mibs[5];
$mib_str = @mib[$mib_s];
To me it looks like you are reading the value from the MIB ($ib->mibs[TCP_MIB_ACTIVEOPENS]), which may hold any value, possibly large, and quite possibly zero (I suspect this is the case here). Then you use that value as... an index into @mib? So if the counter is at 10k, you try to take the 10,000th cell of a 16-element array? I suppose in your case the value is 0, so you are doing $mib_str = @mib[0], which is likely an empty string because you never set a value for @mib[0].
To fix all of this, I would start by using the correct indices (0 to 15) for the @mib array, to avoid any confusion. Then you probably need to rethink what exactly you are trying to print; I'm not sure the two lines above are what you want.
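To illustrate the distinction in plain C (this is not bpftrace, and struct tcp_mib_copy, tcp_mib_name(), and print_tcp_mibs() are hypothetical names), the sketch below indexes the names by the enum value itself, so the valid indices are 0 to TCP_MIB_MAX - 1, and it prints each counter value instead of using the value as an index.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* TCP entries of the MIB enum, copied from include/uapi/linux/snmp.h. */
enum {
    TCP_MIB_NUM = 0,
    TCP_MIB_RTOALGORITHM,
    TCP_MIB_RTOMIN,
    TCP_MIB_RTOMAX,
    TCP_MIB_MAXCONN,
    TCP_MIB_ACTIVEOPENS,
    TCP_MIB_PASSIVEOPENS,
    TCP_MIB_ATTEMPTFAILS,
    TCP_MIB_ESTABRESETS,
    TCP_MIB_CURRESTAB,
    TCP_MIB_INSEGS,
    TCP_MIB_OUTSEGS,
    TCP_MIB_RETRANSSEGS,
    TCP_MIB_INERRS,
    TCP_MIB_OUTRSTS,
    TCP_MIB_CSUMERRORS,
    __TCP_MIB_MAX
};
#define TCP_MIB_MAX __TCP_MIB_MAX

/* Names indexed by the enum value, exactly like mibs[]: 0 .. TCP_MIB_MAX - 1. */
static const char *const tcp_mib_names[TCP_MIB_MAX] = {
    [TCP_MIB_NUM] = "TCP_MIB_NUM",
    [TCP_MIB_RTOALGORITHM] = "TCP_MIB_RTOALGORITHM",
    [TCP_MIB_RTOMIN] = "TCP_MIB_RTOMIN",
    [TCP_MIB_RTOMAX] = "TCP_MIB_RTOMAX",
    [TCP_MIB_MAXCONN] = "TCP_MIB_MAXCONN",
    [TCP_MIB_ACTIVEOPENS] = "TCP_MIB_ACTIVEOPENS",
    [TCP_MIB_PASSIVEOPENS] = "TCP_MIB_PASSIVEOPENS",
    [TCP_MIB_ATTEMPTFAILS] = "TCP_MIB_ATTEMPTFAILS",
    [TCP_MIB_ESTABRESETS] = "TCP_MIB_ESTABRESETS",
    [TCP_MIB_CURRESTAB] = "TCP_MIB_CURRESTAB",
    [TCP_MIB_INSEGS] = "TCP_MIB_INSEGS",
    [TCP_MIB_OUTSEGS] = "TCP_MIB_OUTSEGS",
    [TCP_MIB_RETRANSSEGS] = "TCP_MIB_RETRANSSEGS",
    [TCP_MIB_INERRS] = "TCP_MIB_INERRS",
    [TCP_MIB_OUTRSTS] = "TCP_MIB_OUTRSTS",
    [TCP_MIB_CSUMERRORS] = "TCP_MIB_CSUMERRORS",
};

/* Hypothetical user-space stand-in for struct tcp_mib. */
struct tcp_mib_copy {
    unsigned long mibs[TCP_MIB_MAX];
};

/* Bounds-checked name lookup: NULL for an out-of-range index. */
static const char *tcp_mib_name(int idx)
{
    return (idx >= 0 && idx < TCP_MIB_MAX) ? tcp_mib_names[idx] : NULL;
}

/* The enum value selects both the name and the counter;
 * the counter value is printed, never used as an index. */
static void print_tcp_mibs(const struct tcp_mib_copy *m)
{
    for (int i = 0; i < TCP_MIB_MAX; i++)
        printf("%-22s %lu\n", tcp_mib_names[i], m->mibs[i]);
}
```

In the bpftrace script, the same idea means using the enum value (0 to 15) as the @mib index and printing $ib->mibs[i] next to @mib[i], rather than feeding the counter back in as an index.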

Why doesn't HAL_UART_Transmit_DMA() work for serial ports on a Nucleo F103RB?

I have the following code, much of it generated by STM32CubeMX. (I've elided the huge number of generated comments, to make it readable.)
volatile int txDoneFlag = 0;
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart){
txDoneFlag = 1;
}
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_USART2_UART_Init();
MX_USART1_UART_Init();
MX_DMA_Init();
MX_USART3_UART_Init();
while (1)
{
LD2_GPIO_Port->BSRR = (uint32_t)LD2_Pin;
HAL_UART_Transmit_DMA(&huart1, (uint8_t*)"1: on \n", 16);
while(!txDoneFlag);
txDoneFlag = 0;
HAL_UART_Transmit_DMA(&huart2, (uint8_t*)"2: on \n", 16);
while(!txDoneFlag);
txDoneFlag = 0;
HAL_UART_Transmit_DMA(&huart3, (uint8_t*)"3: on \n", 16);
while(!txDoneFlag);
txDoneFlag = 0;
HAL_Delay(100);
LD2_GPIO_Port->BSRR = (uint32_t)LD2_Pin << 16U;
HAL_UART_Transmit_DMA(&huart1, (uint8_t*)"1: off \n", 16);
while(!txDoneFlag);
txDoneFlag = 0;
HAL_UART_Transmit_DMA(&huart2, (uint8_t*)"2: off \n", 16);
while(!txDoneFlag);
txDoneFlag = 0;
HAL_UART_Transmit_DMA(&huart3, (uint8_t*)"3: off \n", 16);
while(!txDoneFlag);
txDoneFlag = 0;
HAL_Delay(100);
}
}
The DMA was set up in the STM32CubeMX generator, so it should be correct.
When I run this code, it gets stuck in an endless loop at the first while(!txDoneFlag);, implying that HAL_UART_TxCpltCallback() is never called.
This makes me think I need to do something further to enable DMA.
How can I make HAL_UART_Transmit_DMA() work?
I've already tried reordering the generated MX... calls, so that MX_DMA_Init() is called before the ...UART_Init()s.
--
Update: requested code. All three MX_USARTn_UART_Init() functions have identical bodies (except for the UART number).
/**
* @brief USART3 Initialization Function
* @param None
* @retval None
*/
static void MX_USART3_UART_Init(void)
{
/* USER CODE BEGIN USART3_Init 0 */
/* USER CODE END USART3_Init 0 */
/* USER CODE BEGIN USART3_Init 1 */
/* USER CODE END USART3_Init 1 */
huart3.Instance = USART3;
huart3.Init.BaudRate = 115200;
huart3.Init.WordLength = UART_WORDLENGTH_8B;
huart3.Init.StopBits = UART_STOPBITS_1;
huart3.Init.Parity = UART_PARITY_NONE;
huart3.Init.Mode = UART_MODE_TX_RX;
huart3.Init.HwFlowCtl = UART_HWCONTROL_NONE;
huart3.Init.OverSampling = UART_OVERSAMPLING_16;
if (HAL_UART_Init(&huart3) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN USART3_Init 2 */
/* USER CODE END USART3_Init 2 */
}
/**
* Enable DMA controller clock
*/
static void MX_DMA_Init(void)
{
/* DMA controller clock enable */
__HAL_RCC_DMA1_CLK_ENABLE();
/* DMA interrupt init */
/* DMA1_Channel2_IRQn interrupt configuration */
HAL_NVIC_SetPriority(DMA1_Channel2_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(DMA1_Channel2_IRQn);
/* DMA1_Channel3_IRQn interrupt configuration */
HAL_NVIC_SetPriority(DMA1_Channel3_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(DMA1_Channel3_IRQn);
/* DMA1_Channel4_IRQn interrupt configuration */
HAL_NVIC_SetPriority(DMA1_Channel4_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(DMA1_Channel4_IRQn);
/* DMA1_Channel5_IRQn interrupt configuration */
HAL_NVIC_SetPriority(DMA1_Channel5_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(DMA1_Channel5_IRQn);
/* DMA1_Channel6_IRQn interrupt configuration */
HAL_NVIC_SetPriority(DMA1_Channel6_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(DMA1_Channel6_IRQn);
/* DMA1_Channel7_IRQn interrupt configuration */
HAL_NVIC_SetPriority(DMA1_Channel7_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(DMA1_Channel7_IRQn);
}
In your STM32CubeMX .ioc file, make sure that the global interrupt is enabled (checked) for each of the three UART peripherals you are using; depending on the chip, some IRQs are combined. This matters because in normal (non-circular) DMA mode the HAL does not call HAL_UART_TxCpltCallback() from the DMA interrupt: when the DMA transfer finishes, it enables the UART's transmission-complete interrupt and calls the callback from the UART IRQ handler, so without the UART interrupt the callback never runs.
The interrupt handlers are located in your stm32f1xx_it.c file. If need be, set breakpoints and make sure those interrupts are firing. Inside the ISR you can see which callback is being called, if any (none will be if some configuration is missing). What does your configuration look like in your MX_DMA_Init() and your UART init functions? Can you share those? Otherwise, you can make sure everything is wired up yourself. That said, I would caution you against busy-waiting like this with DMA: the entire point of DMA is to let the CPU execute other instructions while the DMA controller handles the memory transfers.
With all three UART peripherals sending messages over DMA, you should be able to set a few flags in the callbacks and check them with if statements instead of blocking in a while loop.
Callbacks in HAL are weakly defined (marked __weak in the HAL sources), so you need to make sure the linker picks up your definition instead of the empty default: the signature must match exactly, the function must not be static, and it must be declared extern "C" if it lives in a C++ file. That way, when the callback is issued from the ISR, it goes to the callback in your main file.
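A minimal sketch of that flag-based approach, using mock types so it compiles off-target: MockUartHandle, the MOCK_USARTn constants, on_tx_complete(), uart_ready(), and mark_started() are all hypothetical names. On the board, on_tx_complete() would be the body of HAL_UART_TxCpltCallback(), and the instance number would come from comparing huart->Instance with USART1, USART2, and USART3. Each UART gets its own completion bit, which is what lets all three transfers run at once without one port's completion being mistaken for another's.

```c
#include <assert.h>
#include <stdint.h>

/* Mock stand-ins: on hardware these are UART_HandleTypeDef and USART1..3. */
typedef struct { int instance; } MockUartHandle;
enum { MOCK_USART1 = 1, MOCK_USART2 = 2, MOCK_USART3 = 3 };

/* One completion bit per UART instead of a single shared flag. */
static volatile uint32_t tx_done_mask;
/* Nonzero while a DMA transfer is in flight on that UART. */
static int tx_busy[4];

/* Callback side: runs in interrupt context, only sets a bit. */
static void on_tx_complete(MockUartHandle *huart)
{
    tx_done_mask |= 1u << huart->instance;
}

/* Main-loop side: test-and-clear the completion bit. On hardware, guard the
 * read-modify-write (e.g. briefly mask the UART IRQ). */
static int take_tx_done(int instance)
{
    uint32_t bit = 1u << instance;
    if (tx_done_mask & bit) {
        tx_done_mask &= ~bit;
        return 1;
    }
    return 0;
}

/* Returns 1 when the caller may start the next HAL_UART_Transmit_DMA()
 * on this UART, 0 while the previous transfer is still running. */
static int uart_ready(int instance)
{
    if (tx_busy[instance] && take_tx_done(instance))
        tx_busy[instance] = 0;
    return !tx_busy[instance];
}

/* Record that a transfer was just started on this UART. */
static void mark_started(int instance)
{
    tx_busy[instance] = 1;
}
```

In the while (1) loop, each pass could call uart_ready(n) for every port and, only when it returns 1, call HAL_UART_Transmit_DMA() followed by mark_started(n); the rest of the time the CPU is free for other work.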

Assessing how large a file (How much RAM it will take) will be in R before loading it

I was wondering if there's some function that, given a file name and path, can assess how much RAM R will need to use it. I want to know this before loading the file.
You can use fstat() (or stat(), which takes a path instead of an open file descriptor):
http://linux.die.net/man/2/fstat
It reports information about your file, such as its actual size in bytes (st_size):
struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* inode number */
mode_t st_mode; /* protection */
nlink_t st_nlink; /* number of hard links */
uid_t st_uid; /* user ID of owner */
gid_t st_gid; /* group ID of owner */
dev_t st_rdev; /* device ID (if special file) */
off_t st_size; /* total size, in bytes */
blksize_t st_blksize; /* blocksize for file system I/O */
blkcnt_t st_blocks; /* number of 512B blocks allocated */
time_t st_atime; /* time of last access */
time_t st_mtime; /* time of last modification */
time_t st_ctime; /* time of last status change */
};
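A small sketch of querying the size from a path with stat(); the helper name file_size_bytes() is hypothetical. Keep in mind this gives the size on disk, which is only a starting point for estimating how much RAM R will need once the data is parsed into memory.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the size in bytes of the file at `path`, or -1 if stat() fails.
 * stat() works on a path; fstat() does the same for an open descriptor. */
static long long file_size_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```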

Open Addressing vs. Separate Chaining

Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
I personally think the answer is open addressing with linear probing, because it doesn't need any additional storage space in case of collisions. Is this correct?
Answering the question: Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
Open addressing (probing) that allows a high fill factor, because, as you said yourself, it needs no extra space for collisions (only, possibly, extra time; this also assumes the hash function isn't perfect).
If you had not specified "load factor close to 1", or had included "cost" metrics in the question, the answer would be entirely different.
Happy coding.
A hashmap that is that full will degrade into a linear search, so you will want to keep them under 90% full.
You are right that open addressing uses less memory; chaining needs a pointer or offset field in each node.
I have created a hasharray data structure for when I need very lightweight hashtables that will not have a lot of inserts. To keep memory usage low, all data is embedded in the same block of memory, with the HashArray structure at the start, followed by two arrays for hashes and values. A hasharray can only be used when the lookup key is stored in the value.
#include <stdint.h>
typedef uint16_t HashType; /* this can be 32 bits if needed. */
typedef uint16_t HashSize; /* this can be made 32 bits if large hasharrays are needed. */
typedef struct HashArray {
    HashSize length;     /* hasharray length (number of slots). */
    HashSize count;      /* number of hash/value pairs contained in the hasharray. */
    uint16_t value_size; /* size of each value (maximum value size 64 KB). */
    /* The two arrays below follow the struct in the same memory block.
     * They are shown for illustration only; they are not real members:
     *   HashType hashs[length];               hash of each value, helps resolve bucket collisions
     *   uint8_t  values[length * value_size]; array holding all values
     */
} HashArray;
#define hasharray_get_hashs(array) ((HashType *)(((uint8_t *)(array)) + sizeof(HashArray)))
#define hasharray_get_values(array) (((uint8_t *)(array)) + sizeof(HashArray) + \
    ((array)->length * sizeof(HashType)))
#define hasharray_get_value(array, idx) (hasharray_get_values(array) + ((idx) * (array)->value_size))
The macros hasharray_get_hashs & hasharray_get_values are used to get the 'hashs' & 'values' arrays.
I have used this to add fast lookup of complex objects that are already stored in an array. The objects have a string 'name' field, which is used for the lookup. The names are hashed and inserted into the hasharray with the object's index. The values stored in the hasharray can be indexes, pointers, or whole objects (I only use small 16-bit index values).
If you want to pack the hasharray until it is almost full, you will want to use full 32-bit hashes instead of the 16-bit ones defined above. Larger 32-bit hashes help keep searches fast when the hasharray is more than 90% full.
The hasharray as defined above can hold a maximum of 65535 values, which is fine since I never use it for anything with more than a few hundred values. Anything that needs more than that should just use a normal hashtable. But if memory is really an issue, the HashSize type could be changed to 32 bits. Also, I use power-of-2 lengths to keep the hash lookup fast. Some people prefer prime bucket lengths, but that is only needed if the hash function has bad distribution.
#define hasharray_empty_hash 0xFFFF /* hash value to mark empty slots. */
void *hasharray_search(HashArray *array, HashType hash, uint32_t *next) {
HashType *hashs = hasharray_get_hashs(array);
uint32_t mask = array->length - 1;
uint32_t start_idx;
uint32_t idx;
hash = (hash == hasharray_empty_hash) ? 0 : hash; /* need one hash value to mark empty slots. */
start_idx = (hash & mask);
if(*next == 0) {
idx = start_idx; /* new search. */
} else {
idx = *next & mask; /* continuing search to next slot. */
}
/* find hash in hash array. */
do {
/* check for hash match. */
if(hashs[idx] == hash) goto found_hash;
/* check for end of chain. */
if(hashs[idx] == hasharray_empty_hash) break;
idx++;
idx &= mask;
} while(idx != start_idx);
/* maximum tries reached (i.e. did a linear search of whole array) or end of chain. */
return NULL;
found_hash:
*next = idx + 1; /* where to continue search at, if this is not the right value. */
return hasharray_get_values(array) + (idx * array->value_size);
}
Hash collisions will happen, so the code that calls hasharray_search() needs to compare the search key with the one stored in the value object. If they don't match, hasharray_search() is called again. Non-unique keys can also exist, since searching can continue until NULL is returned to find all values that match one key. The search function uses linear probing to be cache friendly.
typedef struct {
char *name; /* this is the lookup key. */
char *type;
/* other field info... */
} Field;
typedef struct {
Field *list; /* array of Field objects. */
HashArray *lookup; /* hasharray for fast lookup of Field objects by name. The values stored in this hasharray are 16bit indices. */
uint32_t field_count; /* number of Field objects in 'list'. */
} Fields;
extern Fields *fields_new(uint16_t count) {
Fields *fields;
fields = calloc(1, sizeof(Fields));
fields->list = calloc(count, sizeof(Field));
/* allocate hasharray to hold at most 'count' uint16_t values.
* The hasharray will round 'count' up to the next power-of-2.
* That power-of-2 length must be atleast (count+1), so that there will always be one empty slot.
*/
fields->lookup = hasharray_new(count, sizeof(uint16_t));
fields->field_count = count;
return fields;
}
extern Field *fields_lookup_by_name(Fields *fields, const char *name) {
HashType hash = str_to_hash(name);
Field *field;
uint32_t next = 0;
uint16_t *rc;
uint16_t idx;
do {
rc = hasharray_search(fields->lookup, hash, &next);
if(rc == NULL) break; /* field not found. */
/* found a possible match. */
idx = *rc;
assert(idx < fields->field_count);
field = &(fields->list[idx]);
/* compare lookup name with field's name. */
if(strcmp(name, field->name) == 0) {
/* found match. */
return field;
}
/* field didn't match continue search to next field. */
} while(1);
return NULL;
}
In the worst case, searching will degrade to a linear search of the whole array if it is 99% full and the key doesn't exist. If the keys are integers, a linear search shouldn't be too bad; also, only keys with the same hash value need to be compared. I try to keep the hasharrays sized so they are only about 70-80% full; the space wasted on empty slots isn't much if the values are only 16-bit values. With this design you only waste 4 bytes per empty slot when using 16-bit hashes and 16-bit index values. The array of objects (the Field structs in the example above) has no empty spots.
Also, most hashtable implementations that I have seen don't store the computed hashes and require full key compares to resolve bucket collisions. Comparing the hashes helps a lot, since only a small part of the hash value is used to look up the bucket.
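fields_new() above calls hasharray_new(), which the answer doesn't show, and there is no insert function either. The following is a hypothetical, self-contained sketch of both, written against the same memory layout and the same linear probing as hasharray_search() (repeated here, with start_idx used consistently, so the block compiles on its own). It assumes 16-bit hashes and copies values byte-for-byte into the values array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t HashType;
typedef uint16_t HashSize;
typedef struct HashArray {
    HashSize length;      /* number of slots (a power of 2) */
    HashSize count;       /* number of occupied slots */
    uint16_t value_size;  /* bytes per value */
} HashArray;

#define hasharray_empty_hash 0xFFFF
#define hasharray_get_hashs(a)  ((HashType *)(((uint8_t *)(a)) + sizeof(HashArray)))
#define hasharray_get_values(a) (((uint8_t *)(a)) + sizeof(HashArray) + \
                                 ((a)->length * sizeof(HashType)))

/* Hypothetical: round count + 1 up to a power of 2 so one slot always stays
 * empty, then allocate header, hashs[] and values[] as one block. */
static HashArray *hasharray_new(uint16_t count, uint16_t value_size)
{
    uint32_t length = 1;
    while (length < (uint32_t)count + 1)
        length <<= 1;
    if (length > 0x8000)               /* largest power of 2 that fits in HashSize */
        return NULL;
    HashArray *a = malloc(sizeof(HashArray) + length * sizeof(HashType) +
                          (size_t)length * value_size);
    if (a == NULL)
        return NULL;
    a->length = (HashSize)length;
    a->count = 0;
    a->value_size = value_size;
    HashType *hashs = hasharray_get_hashs(a);
    for (uint32_t i = 0; i < length; i++)
        hashs[i] = hasharray_empty_hash;   /* mark every slot empty */
    return a;
}

/* Hypothetical insert using the same linear probing as hasharray_search(). */
static void *hasharray_insert(HashArray *a, HashType hash, const void *value)
{
    HashType *hashs = hasharray_get_hashs(a);
    uint32_t mask = a->length - 1;
    uint32_t idx;
    if ((uint32_t)a->count + 1 >= a->length)   /* keep at least one empty slot */
        return NULL;
    hash = (hash == hasharray_empty_hash) ? 0 : hash;
    idx = hash & mask;
    while (hashs[idx] != hasharray_empty_hash)
        idx = (idx + 1) & mask;
    hashs[idx] = hash;
    a->count++;
    uint8_t *slot = hasharray_get_values(a) + (size_t)idx * a->value_size;
    memcpy(slot, value, a->value_size);
    return slot;
}

/* hasharray_search() from the answer, with start_idx used consistently. */
static void *hasharray_search(HashArray *array, HashType hash, uint32_t *next)
{
    HashType *hashs = hasharray_get_hashs(array);
    uint32_t mask = array->length - 1;
    uint32_t start_idx, idx;
    hash = (hash == hasharray_empty_hash) ? 0 : hash;
    start_idx = hash & mask;
    idx = (*next == 0) ? start_idx : (*next & mask);
    do {
        if (hashs[idx] == hash) {
            *next = idx + 1;   /* resume here if this value is not the right one */
            return hasharray_get_values(array) + (size_t)idx * array->value_size;
        }
        if (hashs[idx] == hasharray_empty_hash)
            break;             /* end of chain */
        idx = (idx + 1) & mask;
    } while (idx != start_idx);
    return NULL;
}
```

For example, with count = 3 the table gets 4 slots, so a fourth insert is refused; two values inserted under the same hash are both returned by successive hasharray_search() calls that share one next cursor.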
As others have said, with linear probing, as the load factor approaches 1 the time complexity approaches that of a linear search. (When the table is completely full, a naive probe loop for a missing key never terminates.) There is a memory/efficiency trade-off here, whereas separate chaining always gives us, in theory, constant expected time.
Normally, with linear probing, it's recommended to keep the load factor between 1/8 and 1/2: when the array becomes 1/2 full, we resize it to double its size, and when deletions bring it down to 1/8 full, we resize it to half its size (reference: Algorithms by Robert Sedgewick and Kevin Wayne). If you are really interested, that book is a good place to start.
In practice, a load factor of about 0.72 is often quoted as an empirical rule of thumb.
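A hypothetical C sketch of the linear-probing resize policy described above (not the book's code, which is in Java). It assumes non-negative int keys, uses -1 as the empty marker and key % m as the hash, and deletes by re-placing the rest of the cluster, which is how Sedgewick and Wayne handle deletion under linear probing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical linear-probing set of non-negative int keys (-1 = empty slot).
 * Resize policy: double before an insert that would push the table past 1/2
 * full; halve after a delete that leaves it 1/8 full or less. */
typedef struct {
    int   *keys;
    size_t n;   /* number of keys stored */
    size_t m;   /* number of slots */
} LPSet;

static int lp_init(LPSet *s, size_t m)
{
    s->keys = malloc(m * sizeof *s->keys);
    if (s->keys == NULL)
        return -1;
    for (size_t i = 0; i < m; i++)
        s->keys[i] = -1;
    s->n = 0;
    s->m = m;
    return 0;
}

/* Place a key known to be absent; relies on at least one empty slot. */
static void lp_place(LPSet *s, int key)
{
    size_t i = (size_t)key % s->m;
    while (s->keys[i] != -1)
        i = (i + 1) % s->m;          /* linear probing: try the next slot */
    s->keys[i] = key;
    s->n++;
}

static int lp_resize(LPSet *s, size_t m)
{
    LPSet t;
    if (lp_init(&t, m) != 0)
        return -1;
    for (size_t i = 0; i < s->m; i++)
        if (s->keys[i] != -1)
            lp_place(&t, s->keys[i]);
    free(s->keys);
    *s = t;
    return 0;
}

static int lp_contains(const LPSet *s, int key)
{
    for (size_t i = (size_t)key % s->m; s->keys[i] != -1; i = (i + 1) % s->m)
        if (s->keys[i] == key)
            return 1;
    return 0;
}

/* Returns 1 if inserted, 0 if already present, -1 on allocation failure. */
static int lp_insert(LPSet *s, int key)
{
    if (lp_contains(s, key))
        return 0;
    if (s->n >= s->m / 2 && lp_resize(s, 2 * s->m) != 0)
        return -1;
    lp_place(s, key);
    return 1;
}

/* Returns 1 if removed, 0 if absent. */
static int lp_delete(LPSet *s, int key)
{
    size_t i = (size_t)key % s->m;
    while (s->keys[i] != -1 && s->keys[i] != key)
        i = (i + 1) % s->m;
    if (s->keys[i] == -1)
        return 0;
    s->keys[i] = -1;
    s->n--;
    /* Re-place the rest of the cluster so later probes are not cut short. */
    for (i = (i + 1) % s->m; s->keys[i] != -1; i = (i + 1) % s->m) {
        int k = s->keys[i];
        s->keys[i] = -1;
        s->n--;
        lp_place(s, k);
    }
    if (s->n > 0 && s->n <= s->m / 8)
        (void)lp_resize(s, s->m / 2);  /* on failure, keep the larger table */
    return 1;
}
```

With this policy the load factor n/m stays between 1/8 and 1/2 after every operation once the table has grown, trading some empty slots for short probe sequences.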
