couldn't bring arm xen alive : Unable to allocate first memory bank - xen

I'm currently working on bring xen alive on an ARM64 platform.
The xen always pops up "Unable to allocate first memory bank" when it tries to allocate 128 MB for DOM0 (512MB on target platform).
I use pagealloc_info() to dump memory info, finding there is only small amount of heaps.
Does any one have idea??
- UART enabled -
- CPU 00000000 booting -
- Current EL 00000008 -
- Xen starting at EL2 -
- Zero BSS -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000000000000 - 000000001ffe0fff
(XEN)
(XEN) MODULE[0]: 0000000001ff2000 - 0000000001ff3000 Device Tree
(XEN) MODULE[1]: 0000000003000000 - 0000000003600000 Kernel earlyprintk console=ttyS0,115200 cma=16m#64m
(XEN) RESVD[0]: 0000000001ff2000 - 0000000001ff3000
(XEN)
(XEN) Command line: console=ttyS0,115200 earlyprintk loglevel=4
(XEN) Placing Xen at 0x000000001fc00000-0x000000001fe00000
(XEN) Update BOOTMOD_XEN from 0000000000200000-0000000000302d81 => 000000001fc00000-000000001fd02d81
(XEN) Domain heap initialisedPhysical memory information:
(XEN) Xen heap: 0kB free
(XEN) heap[01]: 8kB free
(XEN) heap[02]: 8kB free
(XEN) heap[03]: 16kB free
(XEN) heap[04]: 32kB free
(XEN) heap[05]: 64kB free
(XEN) heap[06]: 128kB free
(XEN) heap[07]: 256kB free
(XEN) heap[08]: 512kB free
(XEN) heap[09]: 1024kB free
(XEN) heap[10]: 2048kB free
(XEN) heap[11]: 4096kB free
(XEN) heap[12]: 8192kB free
(XEN) heap[13]: 16328kB free
(XEN) Dom heap: 32712kB free
(XEN)
(XEN) Bad console= option 'ttyS0'
(XEN) Bad console= option '115200'
Xen 4.6.0-rc
(XEN) Xen version 4.6.0-rc (tom_ting#(none)) (aarch64-linux-xgcc (Realtek ASDK64-4.9.3 Build 2180) 4.9.3 20150413 (prerelease)) debug=y Wed Sep 2 20:16:44 CST 2015
(XEN) Latest ChangeSet: Wed Sep 2 17:15:27 2015 +0800 git:3cad003-dirty
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN) Processor Features: 0000000000002222 0000000000000000
(XEN) Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN) Extensions: FloatingPoint AdvancedSIMD
(XEN) Debug Features: 0000000010305106 0000000000000000
(XEN) Auxiliary Features: 0000000000000000 0000000000000000
(XEN) Memory Model Features: 0000000000001122 0000000000000000
(XEN) ISA Features: 0000000000010000 0000000000000000
(XEN) 32-bit Execution:
(XEN) Processor Features: 00000131:00011011
(XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN) Extensions: GenericTimer Security
(XEN) Debug Features: 03010066
(XEN) Auxiliary Features: 00000000
(XEN) Memory Model Features: 10201105 40000000 01260000 02102211
(XEN) ISA Features: 02101110 13112111 21232042 01112131 00011142 00010001
(XEN) FIXME, temporary WA
(XEN) Using PSCI-0.2 for SMP bringup
(XEN) CPU0 has no enable method
(XEN) cpu0 init failed (hwid 0): -22
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 33000 KHz
(XEN) GICv2 initialization:
(XEN) gic_dist_addr=00000000ff011000
(XEN) gic_cpu_addr=00000000ff012000
(XEN) gic_hyp_addr=00000000ff014000
(XEN) gic_vcpu_addr=00000000ff016000
(XEN) gic_maintenance_irq=25
(XEN) GICv2: 128 lines, 4 cpus, secure (IID 0200143b).
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Allocated console ring of 16 KiB.
(XEN) Brought up 1 CPUs
(XEN) P2M: 40-bit IPA with 40-bit PA
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) I/O virtualisation disabled
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module # 0000000003000000
(XEN) Allocating 1:1 mappings totalling 128MB for dom0:
(XEN) Physical memory information:
(XEN) Xen heap: 0kB free
(XEN) heap[01]: 8kB free
(XEN) heap[02]: 8kB free
(XEN) heap[03]: 16kB free
(XEN) heap[04]: 32kB free
(XEN) heap[05]: 64kB free
(XEN) heap[06]: 128kB free
(XEN) heap[07]: 256kB free
(XEN) heap[08]: 512kB free
(XEN) heap[09]: 1024kB free
(XEN) heap[10]: 2048kB free
(XEN) heap[11]: 4096kB free
(XEN) heap[12]: 8192kB free
(XEN) heap[13]: 16016kB free
(XEN) Dom heap: 32400kB free
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Unable to allocate first memory bank
(XEN) ****************************************

Related

Unable to paralleled running VG bioinformatic tool in parallel computing (if possible with GUI), when working on server?

I am running a server at my workplace with specs as follows:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2683 v3 # 2.00GHz
Stepping: 2
CPU MHz: 2000.000
BogoMIPS: 4004.58
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
With the OS information:
Transient hostname: some-server-gs
Icon name: computer-server
Chassis: server
Operating System: CentOS Linux 7 (Core)
Kernel: Linux 3.10.0-514.el7.x86_64
Architecture: x86-64
I'm starting with the "Parallel" package but is there any GUI based way to use parallel computing?
I just tried running the command "parallel" but it did not work!

Having trouble stopping U-Boot autoboot

Background:
I have an old Seagate BlackArmor NAS 110 that I'm trying to install Debian on by following the instructions here: https://github.com/hn/seagate-blackarmor-nas.
I have a couple of USB to TTL serial adapters (one FTDI chipset and the other Prolific) that I've tried and have run into the same issue with both. I have made the connection to the serial port on the board of the NAS using a multimeter to make sure I've gotten the pinout correct.
Problem:
I'm not able to stop the autoboot process by pressing keys and any point during the boot process. The device also does not seem to respond to any keystrokes although they are echoed back.
What I've Tried So Far:
Using USB to TTL serial adapters with two different chipsets
Using the adapters on two different computers (MacBook Pro and a ThinkPad)
Using different operating systems (MacOS, Windows 10, Ubuntu 20.04)
Using different terminal programs (Screen, Minicom, Putty)
Turned off hardware and software flow control
Tested output of adapters by shorting RX and TX pins and seeing keystrokes echoed back
Commands seem to be sent to device as when I type I see my commands echoed back (not sure if this is supposed to happen)
I've been at this for a few days and can't figure it out. I've also recorded my screen while experiencing the issue: https://streamable.com/xl43br. Can anyone see where I'm going wrong?
Terminal output while experiencing the problem:
Welcome to minicom 2.7.1
OPTIONS:
Compiled on Nov 15 2020, 08:12:42.
Port /dev/tty.usbserial-AQ00KV6T, 16:51:31
Press Meta-Z for help on special keys
???
__ __ _ _
| \/ | __ _ _ ____ _____| | |
| |\/| |/ _` | '__\ \ / / _ \ | |
| | | | (_| | | \ V / __/ | |
|_| |_|\__,_|_| \_/ \___|_|_|
_ _ ____ _
| | | | | __ ) ___ ___ | |_
| | | |___| _ \ / _ \ / _ \| __|
| |_| |___| |_) | (_) | (_) | |_
\___/ |____/ \___/ \___/ \__| ** uboot_ver:v0.0.5 **
** MARVELL BOARD: MONO LE
U-Boot 1.1.4 (Nov 6 2009 - 11:15:26) Marvell version: 3.4.18
U-Boot code: 00600000 -> 0067FFF0 BSS: -> 006CDE60
Soc: 88F6192 A1 (DDR2)
CPU running # 800Mhz L2 running # 400Mhz
SysClock = 200Mhz , TClock = 166Mhz
DRAM CAS Latency = 3 tRP = 3 tRAS = 8 tRCD=3
DRAM CS[0] base 0x00000000 size 128MB
DRAM Total size 128MB 16bit width
Addresses 8M - 0M are saved for the U-Boot usage.
Mem malloc Initialization (8M - 7M): Done
NAND:d32 MB
Marvell Serial ATA Adapter
Integrated Sata device found
CPU : Marvell Feroceon (Rev 1)
Scanning partition header:
Found sign PrEr at c0000
Found sign KrNl at 2c0000
Found sign RoOt at 540000
Streaming disabled
Write allocate disabled
USB 0: host mode
PEX 0: interface detected no Link.
Net: egiga0 [PRIME]
0 any key to stop autoboot: 1
NAND read: device 0 offset 0xc4000, size 0x195200
Reading data from 0x259000 -- 100% complete.
1659392 bytes read: OK
Calculate CRC32:
crc32 checksum Pass
NAND read: device 0 offset 0x2c4000, size 0x21c000
Reading data from 0x4dfe00 -- 100% complete.
2211840 bytes read: OK
Calculate CRC32:
crc32 checksum Pass
## Booting image at 00040000 ...
Image Name: Linux-2.6.22.18
Created: 2009-11-06 3:38:29 UTC
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 2211388 Bytes = 2.1 MB
Load Address: 00008000
Entry Point: 00008000
Verifying Checksum ... OK
OK
Starting kernel ...
Uncompressing Linux.......................................................................................................................................... done, booting the kernel.
Linux version 2.6.22.18 (root#jasonDev.localdomain) (gcc version 4.2.1) #1 Fri Nov 6 11:38:22 CST 2009 v0.0.7
CPU: ARM926EJ-S [56251311] revision 1 (ARMv5TE), cr=00053977
Machine: Feroceon-KW
Using UBoot passing parameters structure
Memory policy: ECC disabled, Data cache writeback
CPU0: D VIVT write-back cache
CPU0: I cache: 16384 bytes, associativity 4, 32 byte lines, 128 sets
CPU0: D cache: 16384 bytes, associativity 4, 32 byte lines, 128 sets
Built 1 zonelists. Total pages: 32512
Kernel command line: console=ttyS0,115200 mtdparts=nand_mtd:0x000a0000#0x0(uboot),0x00010000#0x000a0000(param),0x00200000#0x000c0000(preroot),0x00280000#0x002c0000(uimage),0x01a000000
PID hash table entries: 512 (order: 9, 2048 bytes)
Console: colour dummy device 80x30
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 128MB 0MB 0MB 0MB = 128MB total
Memory: 109056KB available (4048K code, 289K data, 128K init)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
NET: Registered protocol family 16
CPU Interface
-------------
SDRAM_CS0 ....base 00000000, size 128MB
SDRAM_CS1 ....disable
SDRAM_CS2 ....disable
SDRAM_CS3 ....disable
PEX0_MEM ....base e8000000, size 128MB
PEX0_IO ....base f2000000, size 1MB
INTER_REGS ....base f1000000, size 1MB
NFLASH_CS ....base fa000000, size 2MB
SPI_CS ....base f4000000, size 16MB
BOOT_ROM_CS ....no such
DEV_BOOTCS ....no such
CRYPT_ENG ....base f0000000, size 2MB
Marvell Development Board (LSP Version KW_LSP_4.2.7_patch21_with_rx_desc_tuned)-- MONO Soc: 88F6192 A1 LE
Detected Tclk 166666667 and SysClk 200000000
MV Buttons Device Load
Marvell USB EHCI Host controller #0: c05b4600
PEX0 interface detected no Link.
PCI: bus0: Fast back to back transfers enabled
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
NET: Registered protocol family 2
Time: kw_clocksource clocksource has been installed.
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
checking if image is initramfs...it isn't (bad gzip magic numbers); looks like an initrd
Freeing initrd memory: 16384K
RTC registered
Use the XOR engines (acceleration) for enhancing the following functions:
o RAID 5 Xor calculation
o kernel memcpy
o kenrel memzero
Number of XOR engines to use: 4
cesadev_init(c00116c4)
mvCesaInit: sessions=640, queue=64, pSram=f0000000
MV Buttons Driver Load
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Installing knfsd (copyright (C) 1996 okir#monad.swb.de).
JFFS2 version 2.2. (NAND) ?Â?© 2001-2006 Red Hat, Inc.
fuse init (API version 7.8)
SGI XFS with large block numbers, no debug enabled
io scheduler noop registered
io scheduler anticipatory registered (default)
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250.0: ttyS0 at MMIO 0xf1012000 (irq = 33) is a 16550A
serial8250.0: ttyS1 at MMIO 0xf1012100 (irq = 34) is a 16550A
RAMDISK driver initialized: 2 RAM disks of 16384K size 1024 blocksize
loop: module loaded
Loading Marvell Ethernet Driver:
o Cached descriptors in DRAM
o DRAM SW cache-coherency
o Single RX Queue support - ETH_DEF_RXQ=0
o Single TX Queue support - ETH_DEF_TXQ=0
o TCP segmentation offload enabled
o Receive checksum offload enabled
o Transmit checksum offload enabled
o Network Fast Processing (Routing) supported
o Driver ERROR statistics enabled
o Driver INFO statistics enabled
o Proc tool API enabled
o Rx descripors: q0=256
o Tx descripors: q0=532
o Loading network interface(s):
o egiga0, ifindex = 1, GbE port = 0
Warning: Giga 1 is Powered Off
mvFpRuleDb (c73ab000): 1024 entries, 4096 bytes
e100: Intel(R) PRO/100 Network Driver, 3.5.17-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
Integrated Sata device found
scsi0 : Marvell SCSI to SATA adapter
scsi1 : Marvell SCSI to SATA adapter
NFTL driver: nftlcore.c $Revision: 1.98 $, nftlmount.c $Revision: 1.41 $
NAND device: Manufacturer ID: 0xec, Chip ID: 0x75 (Samsung NAND 32MiB 3,3V 8-bit)
Scanning device for bad blocks
7 cmdlinepart partitions found on MTD device nand_mtd
Using command line partition definition
Creating 7 MTD partitions on "nand_mtd":
0x00000000-0x000a0000 : "uboot"
0x000a0000-0x000b0000 : "param"
0x000c0000-0x002c0000 : "preroot"
0x002c0000-0x00540000 : "uimage"
0x00540000-0x01f40000 : "rootfs"
0x01f40000-0x02000000 : "misc"
0x00000000-0x02000000 : "flash"
ehci_marvell ehci_marvell.70059: Marvell Orion EHCI
ehci_marvell ehci_marvell.70059: new USB bus registered, assigned bus number 1
ehci_marvell ehci_marvell.70059: irq 19, io base 0xf1050100
ehci_marvell ehci_marvell.70059: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
USB Universal Host Controller Interface driver v3.0
usb 1-1: new high speed USB device using ehci_marvell and address 2
usb 1-1: configuration #1 chosen from 1 choice
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 4 ports detected
usbcore: registered new interface driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
mice: PS/2 mouse device common for all mice
i2c /dev entries driver
attach_adapter....
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
md: raid10 personality registered for level 10
raid6: int32x1 73 MB/s
raid6: int32x2 80 MB/s
raid6: int32x4 83 MB/s
raid6: int32x8 74 MB/s
raid6: using algorithm int32x4 (83 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: measuring checksumming speed
arm4regs : 722.800 MB/sec
8regs : 503.200 MB/sec
32regs : 600.000 MB/sec
raid5: using function: arm4regs (722.800 MB/sec)
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel#redhat.com
dm_crypt using the OCF package.
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
wix gpio_init
Advanced Linux Sound Architecture Driver Version 1.0.14 (Thu May 31 09:03:25 2007 UTC).
ALSA device list:
No soundcards found.
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 1620KiB [1 disk] into ram disk... done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing init memory: 128K
Enter Pre-Root FileSystem:
FW_UPDATE_FLAG_RES:1
BOARDTEST_FALG:0
DSK1_RES:1
DSK2_RES:1
DSK3_RES:1
DSK4_RES:1
DSK1_S_RES:
DSK2_S_RES:
DSK3_S_RES:
DSK4_S_RES:
CHK_RES:1
MD0CHK_RES:1
init started: BusyBox v1.1.1 (2008.10.08-08:58+0000) multi-call binary
Starting pid 396, console /dev/ttyS0: '/etc/init.d/rcS'
Starting network...
Starting inetd... OK
NOT_DEF_RES:0
EXT3-fs: unable to read superblock
FAT: unable to read boot sector
EXT3-fs: unable to read superblock
EXT2-fs: unable to read superblock
FAT: unable to read boot sector
FAT: unable to read boot sector
egiga0: started
admindasdas
So it turns out there is a short somewhere between the RX pin and the +3.3V pin which is not allowing me to send anything to the board. Thank you to those who have commented.

Total count of HugePages getting reduced from 6000 to 16 and Free pages to 0

I am testing a DPDK application with 2M Hugepages, so I changed the /proc/cmdline of my redhat VM to start with 6000 huge pages as shown below on my VM with total memory of 32GB.
grep Huge /proc/meminfo
AnonHugePages: 6144 kB
HugePages_Total: 6000
HugePages_Free: 6000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB*
But now when I start my application, it reports that application is asking for 5094 MB of memory but only 32 MB is available as shown below:
./build/app -l 4-7 -n 4 --socket-mem 5094,5094 --file-prefix dp -w 0000:13:00.0 -w 0000:1b:00.0
EAL: Detected 8 lcore(s)
EAL: Multi-process socket /var/run/.dp_unix
EAL: Probing VFIO support...
EAL: Not enough memory available on socket 0! Requested: 5094MB, available: 32MB
EAL: FATAL: Cannot init memory
EAL: Cannot init memory
EAL: Error - exiting with code: 1
Cause: Error with EAL initialization
And now when I check Huge pages again, it only shows 16 pages as below, please let me know why my Huge pages are getting reduced to 16 from initial 6000 due to which my application is not able to get memory.
grep Huge /proc/meminfo
AnonHugePages: 6144 kB
HugePages_Total: 16
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
./dpdk-devbind --status
Network devices using DPDK-compatible driver
============================================
0000:13:00.0 'VMXNET3 Ethernet Controller 07b0' drv=igb_uio unused=vmxnet3
0000:1b:00.0 'VMXNET3 Ethernet Controller 07b0' drv=igb_uio unused=vmxnet3
Network devices using kernel driver
===================================
0000:04:00.0 'VMXNET3 Ethernet Controller 07b0' if=ens161 drv=vmxnet3 unused=igb_uio *Active*
0000:0b:00.0 'VMXNET3 Ethernet Controller 07b0' if=ens192 drv=vmxnet3 unused=igb_uio *Active*
0000:0c:00.0 'VMXNET3 Ethernet Controller 07b0' if=ens193 drv=vmxnet3 unused=igb_uio *Active*
I also tried to increase the huge pages at run time but it doesn't help, it first increases but again on running the app, it reports that memory not available.
echo 6000 > /proc/sys/vm/nr_hugepages
echo "vm.nr_hugepages=6000" >> /etc/sysctl.conf
grep Huge /proc/meminfo
AnonHugePages: 6144 kB
HugePages_Total: 6000
HugePages_Free: 5984
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
./build/app -l 4-7 -n 4 --socket-mem 5094,5094 --file-prefix dp -w 0000:13:00.0 -w 0000:1b:00.0
EAL: Detected 8 lcore(s)
EAL: Multi-process socket /var/run/.dp_unix
EAL: Probing VFIO support...
EAL: Not enough memory available on socket 0! Requested: 5094MB, available: 32MB
EAL: FATAL: Cannot init memory
EAL: Cannot init memory
EAL: Error - exiting with code: 1
Cause: Error with EAL initialization
grep Huge /proc/meminfo
AnonHugePages: 6144 kB
HugePages_Total: 16
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Seems there was some issue with the Centos 7 VM as Huge pages count was not making any sense, so I recreated the VM which resolved the issue.
If the requirement of your application is to have 5094 pages of 2MB, can you re-run your application with --socket-mem 5094,1.
but if your requirement is to have 5094 * 2, can you build the hugepages during boot by editing grub.conf as ' default_hugepagesz=2M hugepagesz=2M hugepages=10188'
Note: there is huge difference between 17.11 LTS and 18.11 LTS how huge pages are mapped and used.

DPDK MLX5 PMD driver probe issue

I'm not able to use the mlx5 pmd driver with some Mellanox NICs I have installed on my server. The error I'm receiving during EAL initialization is:
et_mlx5: no Verbs device matches PCI device 0000:03:00.0, are kernel drivers loaded?
The DPDK version I'm currently using is: DPDK-STABLE-18.11
I have installed the OFED latest version:
mlnx-en-4.5-1.0.1.0-ubuntu16.04-x86_64
I have performed modprobe of the ib_uverbs kernel module
Here's the kernel version I'm using
moragalu#server:~$ uname -r
4.4.0-143-generic
Here are the NIC models:
moragalu#server:~$ lspci | grep Mell
03:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
03:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
The firmware version that the NICs are using:
moragalu#eridium03:~$ ethtool -i eridium25-03
driver: mlx5_core
version: 4.5-1.0.1
firmware-version: 14.24.1000 (MT_2420110034)
expansion-rom-version:
bus-info: 0000:06:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
The complete output is of the eal initialization is:
EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:03:00.0, are kernel drivers loaded?
EAL: Requested device 0000:03:00.0 cannot be used
EAL: PCI device 0000:03:00.1 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:03:00.1, are kernel drivers loaded?
EAL: Requested device 0000:03:00.1 cannot be used
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:06:00.0, are kernel drivers loaded?
EAL: Requested device 0000:06:00.0 cannot be used
EAL: PCI device 0000:06:00.1 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:06:00.1, are kernel drivers loaded?
EAL: Requested device 0000:06:00.1 cannot be used
EAL: PCI device 0000:03:00.1 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:03:00.1, are kernel drivers loaded?
EAL: Driver cannot attach the device (03:00.1)
EAL: Failed to attach device on primary process
Current modules loaded in the kernel:
moragalu#eridium03:~$ lsmod | grep ib
mlx5_ib 16384 0
mlx_compat 24576 4 mlx4_en,mlx5_ib,mlx4_core,mlx5_core
ib_uverbs 61440 0
ib_iser 49152 0
rdma_cm 49152 1 ib_iser
ib_cm 49152 1 rdma_cm
ib_sa 36864 2 rdma_cm,ib_cm
ib_mad 49152 2 ib_cm,ib_sa
ib_core 106496 7 rdma_cm,ib_cm,ib_sa,iw_cm,ib_mad,ib_iser,ib_uverbs
ib_addr 20480 2 rdma_cm,ib_core
I've had the same problems when using the RDMA-core libraries for the ibverbs dependency. In the past I've managed to find a bug in mlx5_core.c (hardcoded the no of queues to 8 in the probe function and it magically worked), but I'm not sure it's the same issue for you.
Either way, the problem went away when I installed the latest Mellanox OFED Drivers, so it's a good idea to try that. Just remember to install it using the command:
mlnxofedinstall --dpdk --upstream-libs
edit: Just noticed you have the drivers installed - make sure you did the installation as above. One more thing you can do: check the output of this (compiled with -libverbs):
#include <infiniband/verbs.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
int main() {
struct ibv_device ** devices;
int num;
struct ibv_context * ctx;
devices = ibv_get_device_list(&num);
int i;
if(devices[0] == NULL)
printf("devices is null\n");
printf("got %d devices\n", num);
for (i=0;i<num;i++) {
printf(ibv_get_device_name(devices[i]));
printf("\n");
ctx = ibv_open_device(devices[i]);
if (ctx == NULL)
printf("ctx is null \n");
else
printf("device opened\n");
}
if (errno)
printf("ERROR: %s\n", strerror(errno));
ibv_free_device_list(devices);
return 0;
}
If it lists no devices at least you'll know it's an issue with the verbs drivers and not DPDK itself.

Using all cores with Microsoft R Open and Google Compute Engine

I'm using Microsoft R Open on a GCE instance that has two vCPUs. Here are its specs.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU # 2.30GHz
Stepping: 0
CPU MHz: 2300.000
BogoMIPS: 4600.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc
eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hyp
ervisor lahf_lm abm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms xsaveopt
Even though I have two cores, Microsoft R Open seems to recognize only one of them, so I'm not taking full advantage of my computing capacity. I can't set the numbers of threads manually either.
Microsoft R Open 3.3.2
The enhanced R distribution from Microsoft
Microsoft packages Copyright (C) 2016 Microsoft Corporation
Using the Intel MKL for parallel mathematical computing(using 1 cores).
Default CRAN mirror snapshot taken on 2016-11-01.
See: https://mran.microsoft.com/.
> getMKLthreads()
[1] 1
> setMKLthreads(2)
Number of threads at maximum: no change has been made.
Here's a graph showing CPU usage. It never uses more than 50% of CPU power.
So, what should I do so I can use all my cores with MRO?
you are running Xeon which is hyper threaded . You have 1 cpu with hyper threading,the os treats it as 2 cpus but there is only one physical cpu. MRO uses the physical cores only(without hyper threading)
You can use this:
library(doParallel)
no_cores <- detectCores() - 1
registerDoParallel(cores=no_cores)
It will one core less than the actual cores you have. It leaves one core for OS operations. Try it out.

Resources