Only 14 PAPI events are available on Xeon Phi. Why so few? - intel

I'm trying to use PAPI to monitor the performance of my Xeon Phi code. I just compiled and installed a native version of PAPI following the documentation. The following list is what I get when I execute "papi_avail" on my Xeon Phi, which should display all available events. Surprisingly, only 14 are available.
Is this correct? Are only 14 events available on Xeon Phi?
Name Code Avail Deriv Description (Note)
PAPI_L1_DCM 0x80000000 Yes No Level 1 data cache misses
PAPI_L1_ICM 0x80000001 Yes No Level 1 instruction cache misses
PAPI_L2_DCM 0x80000002 No No Level 2 data cache misses
PAPI_L2_ICM 0x80000003 No No Level 2 instruction cache misses
PAPI_L3_DCM 0x80000004 No No Level 3 data cache misses
PAPI_L3_ICM 0x80000005 No No Level 3 instruction cache misses
PAPI_L1_TCM 0x80000006 No No Level 1 cache misses
PAPI_L2_TCM 0x80000007 No No Level 2 cache misses
PAPI_L3_TCM 0x80000008 No No Level 3 cache misses
PAPI_CA_SNP 0x80000009 No No Requests for a snoop
PAPI_CA_SHR 0x8000000a No No Requests for exclusive access to shared cache line
PAPI_CA_CLN 0x8000000b No No Requests for exclusive access to clean cache line
PAPI_CA_INV 0x8000000c No No Requests for cache line invalidation
PAPI_CA_ITV 0x8000000d No No Requests for cache line intervention
PAPI_L3_LDM 0x8000000e No No Level 3 load misses
PAPI_L3_STM 0x8000000f No No Level 3 store misses
PAPI_BRU_IDL 0x80000010 No No Cycles branch units are idle
PAPI_FXU_IDL 0x80000011 No No Cycles integer units are idle
PAPI_FPU_IDL 0x80000012 No No Cycles floating point units are idle
PAPI_LSU_IDL 0x80000013 No No Cycles load/store units are idle
PAPI_TLB_DM 0x80000014 Yes No Data translation lookaside buffer misses
PAPI_TLB_IM 0x80000015 Yes No Instruction translation lookaside buffer misses
PAPI_TLB_TL 0x80000016 No No Total translation lookaside buffer misses
PAPI_L1_LDM 0x80000017 No No Level 1 load misses
PAPI_L1_STM 0x80000018 No No Level 1 store misses
PAPI_L2_LDM 0x80000019 Yes No Level 2 load misses
PAPI_L2_STM 0x8000001a No No Level 2 store misses
PAPI_BTAC_M 0x8000001b No No Branch target address cache misses
PAPI_PRF_DM 0x8000001c No No Data prefetch cache misses
PAPI_L3_DCH 0x8000001d No No Level 3 data cache hits
PAPI_TLB_SD 0x8000001e No No Translation lookaside buffer shootdowns
PAPI_CSR_FAL 0x8000001f No No Failed store conditional instructions
PAPI_CSR_SUC 0x80000020 No No Successful store conditional instructions
PAPI_CSR_TOT 0x80000021 No No Total store conditional instructions
PAPI_MEM_SCY 0x80000022 No No Cycles Stalled Waiting for memory accesses
PAPI_MEM_RCY 0x80000023 No No Cycles Stalled Waiting for memory Reads
PAPI_MEM_WCY 0x80000024 No No Cycles Stalled Waiting for memory writes
PAPI_STL_ICY 0x80000025 No No Cycles with no instruction issue
PAPI_FUL_ICY 0x80000026 No No Cycles with maximum instruction issue
PAPI_STL_CCY 0x80000027 No No Cycles with no instructions completed
PAPI_FUL_CCY 0x80000028 No No Cycles with maximum instructions completed
PAPI_HW_INT 0x80000029 No No Hardware interrupts
PAPI_BR_UCN 0x8000002a No No Unconditional branch instructions
PAPI_BR_CN 0x8000002b No No Conditional branch instructions
PAPI_BR_TKN 0x8000002c No No Conditional branch instructions taken
PAPI_BR_NTK 0x8000002d No No Conditional branch instructions not taken
PAPI_BR_MSP 0x8000002e Yes No Conditional branch instructions mispredicted
PAPI_BR_PRC 0x8000002f No No Conditional branch instructions correctly predicted
PAPI_FMA_INS 0x80000030 No No FMA instructions completed
PAPI_TOT_IIS 0x80000031 No No Instructions issued
PAPI_TOT_INS 0x80000032 Yes No Instructions completed
PAPI_INT_INS 0x80000033 No No Integer instructions
PAPI_FP_INS 0x80000034 No No Floating point instructions
PAPI_LD_INS 0x80000035 Yes No Load instructions
PAPI_SR_INS 0x80000036 Yes No Store instructions
PAPI_BR_INS 0x80000037 Yes No Branch instructions
PAPI_VEC_INS 0x80000038 Yes No Vector/SIMD instructions (could include integer)
PAPI_RES_STL 0x80000039 No No Cycles stalled on any resource
PAPI_FP_STAL 0x8000003a No No Cycles the FP unit(s) are stalled
PAPI_TOT_CYC 0x8000003b Yes No Total cycles
PAPI_LST_INS 0x8000003c No No Load/store instructions completed
PAPI_SYC_INS 0x8000003d No No Synchronization instructions completed
PAPI_L1_DCH 0x8000003e No No Level 1 data cache hits
PAPI_L2_DCH 0x8000003f No No Level 2 data cache hits
PAPI_L1_DCA 0x80000040 Yes No Level 1 data cache accesses
PAPI_L2_DCA 0x80000041 No No Level 2 data cache accesses
PAPI_L3_DCA 0x80000042 No No Level 3 data cache accesses
PAPI_L1_DCR 0x80000043 No No Level 1 data cache reads
PAPI_L2_DCR 0x80000044 No No Level 2 data cache reads
PAPI_L3_DCR 0x80000045 No No Level 3 data cache reads
PAPI_L1_DCW 0x80000046 No No Level 1 data cache writes
PAPI_L2_DCW 0x80000047 No No Level 2 data cache writes
PAPI_L3_DCW 0x80000048 No No Level 3 data cache writes
PAPI_L1_ICH 0x80000049 No No Level 1 instruction cache hits
PAPI_L2_ICH 0x8000004a No No Level 2 instruction cache hits
PAPI_L3_ICH 0x8000004b No No Level 3 instruction cache hits
PAPI_L1_ICA 0x8000004c Yes No Level 1 instruction cache accesses
PAPI_L2_ICA 0x8000004d No No Level 2 instruction cache accesses
PAPI_L3_ICA 0x8000004e No No Level 3 instruction cache accesses
PAPI_L1_ICR 0x8000004f No No Level 1 instruction cache reads
PAPI_L2_ICR 0x80000050 No No Level 2 instruction cache reads
PAPI_L3_ICR 0x80000051 No No Level 3 instruction cache reads
PAPI_L1_ICW 0x80000052 No No Level 1 instruction cache writes
PAPI_L2_ICW 0x80000053 No No Level 2 instruction cache writes
PAPI_L3_ICW 0x80000054 No No Level 3 instruction cache writes
PAPI_L1_TCH 0x80000055 No No Level 1 total cache hits
PAPI_L2_TCH 0x80000056 No No Level 2 total cache hits
PAPI_L3_TCH 0x80000057 No No Level 3 total cache hits
PAPI_L1_TCA 0x80000058 No No Level 1 total cache accesses
PAPI_L2_TCA 0x80000059 No No Level 2 total cache accesses
PAPI_L3_TCA 0x8000005a No No Level 3 total cache accesses
PAPI_L1_TCR 0x8000005b No No Level 1 total cache reads
PAPI_L2_TCR 0x8000005c No No Level 2 total cache reads
PAPI_L3_TCR 0x8000005d No No Level 3 total cache reads
PAPI_L1_TCW 0x8000005e No No Level 1 total cache writes
PAPI_L2_TCW 0x8000005f No No Level 2 total cache writes
PAPI_L3_TCW 0x80000060 No No Level 3 total cache writes
PAPI_FML_INS 0x80000061 No No Floating point multiply instructions
PAPI_FAD_INS 0x80000062 No No Floating point add instructions
PAPI_FDV_INS 0x80000063 No No Floating point divide instructions
PAPI_FSQ_INS 0x80000064 No No Floating point square root instructions
PAPI_FNV_INS 0x80000065 No No Floating point inverse instructions
PAPI_FP_OPS 0x80000066 No No Floating point operations
PAPI_SP_OPS 0x80000067 No No Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS 0x80000068 No No Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP 0x80000069 No No Single precision vector/SIMD instructions
PAPI_VEC_DP 0x8000006a No No Double precision vector/SIMD instructions
PAPI_REF_CYC 0x8000006b No No Reference clock cycles
-------------------------------------------------------------------------
Of 108 possible events, 14 are available, of which 0 are derived.
avail.c PASSED

The list above covers only the PAPI preset events. I should use "papi_native_avail" to list the available native events; it reported 140 native events.
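
To double-check the summary line of such a listing, the rows can be tallied with a short script. This is only a minimal sketch in Python: it assumes the whitespace-separated column layout shown above (Name, Code, Avail, Deriv, Description), the sample rows are copied from the listing, and count_available is a hypothetical helper name, not part of PAPI.

```python
def count_available(listing):
    """Return (total, available) for rows that start with a PAPI_ preset name."""
    total = 0
    available = 0
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: Name, Code, Avail, Deriv, Description...
        if len(fields) < 4 or not fields[0].startswith("PAPI_"):
            continue  # skip headers, separators and summary lines
        total += 1
        if fields[2] == "Yes":
            available += 1
    return total, available


# Three rows copied from the listing above.
sample = """\
Name Code Avail Deriv Description (Note)
PAPI_L1_DCM 0x80000000 Yes No Level 1 data cache misses
PAPI_L2_DCM 0x80000002 No No Level 2 data cache misses
PAPI_TOT_CYC 0x8000003b Yes No Total cycles
"""

print(count_available(sample))  # (3, 2)
```

Run against the full listing, the same function should reproduce the "Of 108 possible events, 14 are available" line.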

Related

Getting data from a direct mapped cache in prolog

The predicate getDataFromCache(StringAddress,Cache,Data,HopsNum,directMap,BitsNum)
should succeed when the Data is successfully retrieved from the Cache (cache hit)
and the HopsNum represents the number of hops required to access the data from
the cache which can differ according to direct map cache mapping technique such
that:
• StringAddress is a string of the binary number that represents the address
of the data you are required to access; it is six binary bits long.
• Cache is the cache, using the representation discussed previously.
• Data is the data retrieved from the cache when a cache hit occurs.
• HopsNum is the number of hops required to access the data from the cache.
• BitsNum is the number of bits the index needs.
getDataFromCache always gives me false although everything seems to be working. Can someone help me fix it?
convertAddress(Binary,N,Tag,Idx,directMap):-
    Idx is mod(Binary,10**N),
    Tag is Binary // 10**N.
getDataFromCache(SA,[item(tag(T),data(D),V,_)|T],Data,HopsNum,directMap,BitsNum):-
    convertAddress(SA,BitsNum,Tag,Idx,directMap),
    number_string(Tag,Z),
    Z==T,
    V==1,
    Data is D.
getDataFromCache(SA,[item(tag(T),data(D),V,_)|T],Data,HopsNum,directMap,BitsNum):-
    convertAddress(SA,BitsNum,Tag,Idx,directMap),
    number_string(Tag,Z),
    (Z\=T;V==0),
    getDataFromCache(SA,T,Data,HopsNum,directMap,BitsNum).
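HopsNum is simply always zero, and you don't have to traverse the list, since the cache is direct mapped:
you can access the entry at the computed index directly using the nth0 predicate.
Also, you are using the variable T twice: once for the tag and once for the tail of the cache list, so the two have to unify, which fails.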
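
The idea in this answer (compute the index, jump straight to that entry, compare the tag, report zero hops) can be sketched outside Prolog as well. The following is a minimal Python sketch, not the assignment's solution: the cache representation as a list of (tag, data, valid) tuples and the name get_data_from_cache are assumptions made for illustration, and the cache is assumed to hold exactly 2**bits_num entries.

```python
from typing import List, Optional, Tuple

# Hypothetical cache representation: one (tag, data, valid) tuple per index.
CacheLine = Tuple[str, str, bool]


def get_data_from_cache(address: str, cache: List[CacheLine],
                        bits_num: int) -> Optional[Tuple[str, int]]:
    """Return (data, hops) on a cache hit, or None on a miss.

    The last bits_num bits of the binary address string are the index,
    the remaining leading bits are the tag.
    """
    split = len(address) - bits_num
    tag, index_bits = address[:split], address[split:]
    index = int(index_bits, 2) if index_bits else 0
    line_tag, data, valid = cache[index]  # direct access, no traversal
    if valid and line_tag == tag:
        return data, 0  # a direct-mapped lookup always takes zero hops
    return None


cache = [
    ("0000", "10000", True),
    ("0001", "11000", True),
    ("0000", "", False),
    ("0101", "11011", True),
]
print(get_data_from_cache("000101", cache, 2))  # ('11000', 0)
print(get_data_from_cache("000010", cache, 2))  # None (entry is invalid)
```

The direct indexing on cache[index] plays the role that nth0 would play in the Prolog version.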

HERE API: Toll cost is not optimized

I'm exploring the HERE API to evaluate its usefulness for our application. My interest is focused on the estimated truck transport cost (toll, vehicle, driver). I'm facing a problem generating a cost-optimized route between two points: from 50.893017,20.615645 to 52.055324,21.010707. Driver cost is set to 10, vehicle cost to 1.
When I use https://fleet.ls.hereapi.com/2/calculateroute.json, I get a total distance of 211 km at a cost of 260.17 PLN (including 0 toll cost).
When I use https://tce.cit.api.here.com/2/calculateroute.json, I get a distance of 159 km at 221.28 PLN (including 38.55 PLN toll cost).
As you can see, the first API didn't return the cost-optimized route. Moreover, it looks like the first API is trying to avoid toll gates, even though paying the toll is cheaper than going around.
Am I missing something? Why is there so much difference? The parameters for both queries look similar.
First api parameters (excluding api keys):
jsonAttributes:41
waypoint0:50.893017,20.615645
waypoint1:52.055324,21.010707
detail:1
routelegattributes:li
routeattributes:gr
maneuverattributes:none
linkattributes:none,rt,fl
legattributes:none,li,sm
currency:PLN
departure:
tollVehicleType:3
trailerType:0
vehicleNumberAxles:2
trailerNumberAxles:0
hybrid:0
emissionType:3
fuelType:petrol
trailerHeight:0
vehicleWeight:40t
disabledEquipped:0
hov:0
passengersCount:2
tiresCount:4
commercial:0
heightAbove1stAxle:1m
width:1.8
length:4.41
mode:fastest;truck;traffic:disabled
alternatives:2
driver_cost:10
vehicle_cost:1
Second api parameters (excluding api keys):
jsonAttributes:41
waypoint0:50.893017,20.615645
waypoint1:52.055324,21.010707
detail:1
routelegattributes:li
routeattributes:gr
maneuverattributes:none
linkattributes:none,rt,fl
legattributes:none,li,sm
currency:PLN
departure:
tollVehicleType:3
trailerType:0
vehicleNumberAxles:2
trailerNumberAxles:0
hybrid:0
emissionType:3
fuelType:petrol
trailerHeight:0
vehicleWeight:40t
disabledEquipped:0
hov:0
passengersCount:2
tiresCount:4
commercial:0
heightAbove1stAxle:1m
width:1.8
length:4.41
mode:fastest;truck;traffic:disabled
alternatives:2
driver_cost:10
vehicle_cost:1
Please don't use the domain names tce.cit.api.here.com or tce.api.here.com in request URLs, and don't use old parameter names with fleet.ls.hereapi.com either.
tce.api.here.com is legacy and will sometimes calculate the toll incorrectly.
Please read this documentation for the tollPass parameter:
Comma separated list of owned passes: Senior_Pass, transponder,
Annual, Nr_of_Days, Nr_of_Months, SunPass, E-Z Pass (last 2 are
examples for real toll transponders). Allows traversal of
'transponder-only' toll booths and allows cost free traversal of
certain toll sections.
Some toll booths and toll sections may only be used by drivers who hold a toll pass, e.g. tollPass=transponder; otherwise the driver has to avoid these booths/toll sections and take a longer way.
If you add e.g. tollPass=transponder to the request to fleet.ls.hereapi.com, you will see the route use the toll section.
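
For illustration, adding the parameter to the request could look roughly like the sketch below. It only builds the URL and sends nothing. The endpoint and the parameter values are taken from the question; the name of the key parameter (apiKey), the placeholder key, and the helper build_route_url are assumptions, and only a subset of the question's parameters is shown.

```python
from urllib.parse import urlencode

BASE_URL = "https://fleet.ls.hereapi.com/2/calculateroute.json"


def build_route_url(api_key: str, toll_pass: str = "transponder") -> str:
    """Build the calculateroute URL with a tollPass parameter added."""
    params = {
        "waypoint0": "50.893017,20.615645",
        "waypoint1": "52.055324,21.010707",
        "mode": "fastest;truck;traffic:disabled",
        "currency": "PLN",
        "tollVehicleType": "3",
        "driver_cost": "10",
        "vehicle_cost": "1",
        "tollPass": toll_pass,  # the parameter suggested in the answer
        "apiKey": api_key,      # assumed parameter name for the key
    }
    return BASE_URL + "?" + urlencode(params)


url = build_route_url("YOUR_API_KEY")  # placeholder, not a real key
print(url)
```

Comparing the response for this URL with the response for the same request without tollPass should show whether the toll section is now used.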

Units of Work and Backout in DataPower

I have set the configuration as below:
Units of Work: 1
Automatic Backout: on
Backout Threshold: 3
Backout Queue Name: a queue name is given
According to these settings, since the threshold value is 3, should there be 4 transactions in the probe in case of failure?
Can you please confirm?
Thanks
Vathsa
No, only one, as it is the same transaction in DP, just with three transport retries.

IPCS message passing related queries

I am dealing with the message-passing IPC method and have a few questions about it:
The KEY field in ipcs -q shows 0x00000000. What does this mean?
Can I see what message was passed using the msqid?
If two entries are present (for a particular user) after executing ipcs -q, does this mean that two messages were passed by this user?
If the used-bytes and messages fields are 0, what does this mean?
Is there a way to see whether a message queue is full?
How many queues can we have for one particular user?
I tried googling, but was not able to find answers to these questions.
Please help.
1. The "key" field is usually 0x00000000 when the IPC_PRIVATE key was specified during creation; this applies to message queues just as it does to shared memory segments. The manual of msgget() (or shmget() for shared memory) contains more details.
2. AFAIK, this cannot be done. If any msg is "de-queued" from the msgQ, then the intended receiver will not see it.
3. The 2 entries in the list of message queues indicate that there are currently 2 active message queues on the system, identified by their corresponding unique keys. They do not mean that two messages were passed.
Creating additional msgQ : ipcmk -Q
Deleting an existing msgQ : ipcrm -Q <unique-key>
4. The used-bytes and messages fields set to 0 indicate that no messages are currently waiting in that particular msgQ.
5. One way to do this is to obtain the number of msgs currently queued up in the msgQ programmatically, as shown in the following C snippet. This can then be compared with the size of the msgQ, as demonstrated in this answer.
struct msqid_ds buf; /* declared in <sys/msg.h> */
if (msgctl(msqid, IPC_STAT, &buf) == 0)
    printf("msgs in Q = %lu\n", (unsigned long)buf.msg_qnum);
6. There exists a limit on the total memory used by all the msgQs on the system combined. This can be obtained with ulimit -q. The number of bytes used in a msgQ is listed under the used-bytes column in the output of ipcs -q. The total number of msgQs is limited only by the amount of memory available to create a new msgQ from the msgQ memory pool limit seen above.
Also check out the latter part of this answer for a few sample operations on POSIX message queues.

CopyError - OFS.CopySupport.manage_pasteObjects limited to ~ <160 objects?

I'm using a view to archive old content by moving it into another folder
(catalog search for enddate more than N months ago, then pass the ids into the following command):
target.manage_pasteObjects( source.manage_cutObjects(idsToArchive) )
One or two years ago, moving about 800 or even more objects was no problem.
Today I need to limit the catalog search to around 80 items, otherwise I get a
Module OFS.CopySupport, line 193, in manage_pasteObjects
CopyError:
The data in the clipboard could not be read, possibly due to cookie data being truncated by your web browser. Try copying fewer objects.
I'm running Plone 4.1.6 / Zope2-2.13.15.
I already tried deactivating beaker-session-datamanager (still the same problem).
You installed the latest Plone hotfix, 20130618. It includes a DDoS-prevention measure limiting the size of the __cp cookie data to 8 KB (decompressed).
Future Zope versions will also include this fix.
To work around this temporarily, your only option is to increase the default maximum size. Doing this will allow other threads to use larger cookies as well until you restore the default:
from OFS.CopySupport import _cb_decode

_default_maxsize = _cb_decode.func_defaults[0]

def _increase_maxsize(newsize):
    # Patch the maxsize default
    _cb_decode.func_defaults = (newsize,)

def _restore_maxsize():
    # Restore the original maxsize default
    _cb_decode.func_defaults = (_default_maxsize,)
The cookie data consists almost entirely of object paths (absolute paths as tuples) serialized with marshal; you'll have to estimate a suitable maximum size from that.
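
As a rough illustration of both steps (estimating a size and patching the default only temporarily), here is a minimal sketch. It is not taken from Zope: it assumes the uncompressed clipboard payload is marshal.dumps((op, paths)) with paths given as tuples, and it uses a stand-in function fake_decode instead of the real _cb_decode so that it runs without Zope installed. The helper names are hypothetical. It uses __defaults__, the Python 3 spelling of func_defaults, which recent Python 2 versions should accept as well.

```python
import marshal
from contextlib import contextmanager


def estimate_clipboard_size(paths, op=1):
    """Rough size in bytes of the uncompressed clipboard payload.

    Assumption: the payload is marshal.dumps((op, list_of_path_tuples)).
    """
    return len(marshal.dumps((op, [tuple(p) for p in paths])))


@contextmanager
def patched_default(func, newsize):
    """Temporarily replace the single default argument of func."""
    original = func.__defaults__
    func.__defaults__ = (newsize,)
    try:
        yield
    finally:
        # Always restore, so other threads see the larger limit
        # only for as long as the paste operation runs.
        func.__defaults__ = original


# Stand-in for OFS.CopySupport._cb_decode, for illustration only.
def fake_decode(data, maxsize=8192):
    if len(data) > maxsize:
        raise ValueError("clipboard data too large")
    return data


# Hypothetical object paths for 800 items to archive.
paths = [("", "site", "news", "item-%d" % i) for i in range(800)]
needed = estimate_clipboard_size(paths)

with patched_default(fake_decode, needed * 2):
    fake_decode(b"x" * needed)  # accepted while the patch is active
```

With the real function, the with block would wrap the manage_pasteObjects call, which keeps the window in which other threads see the larger limit as short as possible.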

Resources