Total count of HugePages getting reduced from 6000 to 16 and Free pages to 0 - rhel7

I am testing a DPDK application with 2 MB hugepages. My Red Hat VM has 32 GB of memory in total, and I changed its kernel command line to reserve 6000 hugepages at boot, as shown below.
grep Huge /proc/meminfo
AnonHugePages: 6144 kB
HugePages_Total: 6000
HugePages_Free: 6000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
But when I start my application, EAL reports that it requested 5094 MB of memory and that only 32 MB is available, as shown below:
./build/app -l 4-7 -n 4 --socket-mem 5094,5094 --file-prefix dp -w 0000:13:00.0 -w 0000:1b:00.0
EAL: Detected 8 lcore(s)
EAL: Multi-process socket /var/run/.dp_unix
EAL: Probing VFIO support...
EAL: Not enough memory available on socket 0! Requested: 5094MB, available: 32MB
EAL: FATAL: Cannot init memory
EAL: Cannot init memory
EAL: Error - exiting with code: 1
Cause: Error with EAL initialization
When I check the hugepages again, only 16 are left, as shown below. Why does the hugepage count drop from the initial 6000 to 16, leaving my application unable to get memory?
grep Huge /proc/meminfo
AnonHugePages: 6144 kB
HugePages_Total: 16
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
./dpdk-devbind --status
Network devices using DPDK-compatible driver
============================================
0000:13:00.0 'VMXNET3 Ethernet Controller 07b0' drv=igb_uio unused=vmxnet3
0000:1b:00.0 'VMXNET3 Ethernet Controller 07b0' drv=igb_uio unused=vmxnet3
Network devices using kernel driver
===================================
0000:04:00.0 'VMXNET3 Ethernet Controller 07b0' if=ens161 drv=vmxnet3 unused=igb_uio *Active*
0000:0b:00.0 'VMXNET3 Ethernet Controller 07b0' if=ens192 drv=vmxnet3 unused=igb_uio *Active*
0000:0c:00.0 'VMXNET3 Ethernet Controller 07b0' if=ens193 drv=vmxnet3 unused=igb_uio *Active*
I also tried increasing the hugepages at run time, but it doesn't help: the count goes up at first, but when I run the app again it still reports that memory is not available.
echo 6000 > /proc/sys/vm/nr_hugepages
echo "vm.nr_hugepages=6000" >> /etc/sysctl.conf
grep Huge /proc/meminfo
AnonHugePages: 6144 kB
HugePages_Total: 6000
HugePages_Free: 5984
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
./build/app -l 4-7 -n 4 --socket-mem 5094,5094 --file-prefix dp -w 0000:13:00.0 -w 0000:1b:00.0
EAL: Detected 8 lcore(s)
EAL: Multi-process socket /var/run/.dp_unix
EAL: Probing VFIO support...
EAL: Not enough memory available on socket 0! Requested: 5094MB, available: 32MB
EAL: FATAL: Cannot init memory
EAL: Cannot init memory
EAL: Error - exiting with code: 1
Cause: Error with EAL initialization
grep Huge /proc/meminfo
AnonHugePages: 6144 kB
HugePages_Total: 16
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
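One way to read the two meminfo dumps side by side is to convert the page counts into megabytes. The sketch below is only an illustration (the helper name hugepage_summary is made up here); it parses the Huge* fields from text in the format shown above and multiplies the page count by the page size. Note that 16 pages of 2048 kB come to exactly the 32 MB that EAL reports as available.

```python
def hugepage_summary(meminfo_text):
    """Return (total_pages, free_pages, reserved_mb) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        # AnonHugePages is transparent hugepage usage, not the reserved pool.
        if key.startswith(("HugePages_", "Hugepagesize")):
            # Values look like "6000" or "2048 kB"; keep the leading integer.
            fields[key] = int(value.split()[0])
    total = fields["HugePages_Total"]
    free = fields["HugePages_Free"]
    reserved_mb = total * fields["Hugepagesize"] // 1024
    return total, free, reserved_mb


# Values copied from the dumps in the question.
before = "AnonHugePages: 6144 kB\nHugePages_Total: 6000\nHugePages_Free: 6000\nHugepagesize: 2048 kB"
after = "AnonHugePages: 6144 kB\nHugePages_Total: 16\nHugePages_Free: 0\nHugepagesize: 2048 kB"

print(hugepage_summary(before))  # (6000, 6000, 12000)
print(hugepage_summary(after))   # (16, 0, 32)
```

On a live system the same function can be fed the contents of /proc/meminfo, for example open("/proc/meminfo").read().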

There seems to have been some issue with the CentOS 7 VM, as the hugepage count was not making any sense. I recreated the VM, which resolved the issue.

If your application needs 5094 pages of 2 MB, can you re-run it with --socket-mem 5094,1?
If it needs 5094 * 2, can you reserve the hugepages at boot by adding 'default_hugepagesz=2M hugepagesz=2M hugepages=10188' to the kernel command line in grub.conf?
Note: there is a big difference between 17.11 LTS and 18.11 LTS in how hugepages are mapped and used.
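When sizing the pool it helps to keep megabytes and pages apart. The EAL log in the question prints the request as "5094MB", so the sketch below assumes that --socket-mem values are megabytes per socket and that pages are 2 MB each; the helper name pages_for_socket_mem is hypothetical. It is a rough calculation, not a statement of how any particular DPDK release maps memory.

```python
def pages_for_socket_mem(socket_mem_mb, page_size_mb=2):
    """Hugepages needed per socket, rounding each request up to whole pages."""
    return [-(-mb // page_size_mb) for mb in socket_mem_mb]


# --socket-mem 5094,5094 from the question.
per_socket = pages_for_socket_mem([5094, 5094])
print(per_socket, sum(per_socket))  # [2547, 2547] 5094

# --socket-mem 5094,1 as suggested in the answer.
print(pages_for_socket_mem([5094, 1]))  # [2547, 1]

# Memory reserved by hugepages=10188 with 2 MB pages, in MB.
print(10188 * 2)  # 20376
```

Under these assumptions, 6000 pages cover the request only if at least 2547 of them sit on each NUMA node, so the per-node counts are worth checking as well as the total.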


Having trouble stopping U-Boot autoboot

Background:
I have an old Seagate BlackArmor NAS 110 that I'm trying to install Debian on by following the instructions here: https://github.com/hn/seagate-blackarmor-nas.
I have a couple of USB to TTL serial adapters (one FTDI chipset and the other Prolific) that I've tried and have run into the same issue with both. I have made the connection to the serial port on the board of the NAS using a multimeter to make sure I've gotten the pinout correct.
Problem:
I'm not able to stop the autoboot process by pressing keys at any point during the boot process. The device also does not seem to respond to any keystrokes, although they are echoed back.
What I've Tried So Far:
Using USB to TTL serial adapters with two different chipsets
Using the adapters on two different computers (MacBook Pro and a ThinkPad)
Using different operating systems (MacOS, Windows 10, Ubuntu 20.04)
Using different terminal programs (Screen, Minicom, Putty)
Turned off hardware and software flow control
Tested output of adapters by shorting RX and TX pins and seeing keystrokes echoed back
Commands seem to be sent to the device, since I see what I type echoed back (not sure if this is supposed to happen)
I've been at this for a few days and can't figure it out. I've also recorded my screen while experiencing the issue: https://streamable.com/xl43br. Can anyone see where I'm going wrong?
Terminal output while experiencing the problem:
Welcome to minicom 2.7.1
OPTIONS:
Compiled on Nov 15 2020, 08:12:42.
Port /dev/tty.usbserial-AQ00KV6T, 16:51:31
Press Meta-Z for help on special keys
???
__ __ _ _
| \/ | __ _ _ ____ _____| | |
| |\/| |/ _` | '__\ \ / / _ \ | |
| | | | (_| | | \ V / __/ | |
|_| |_|\__,_|_| \_/ \___|_|_|
_ _ ____ _
| | | | | __ ) ___ ___ | |_
| | | |___| _ \ / _ \ / _ \| __|
| |_| |___| |_) | (_) | (_) | |_
\___/ |____/ \___/ \___/ \__| ** uboot_ver:v0.0.5 **
** MARVELL BOARD: MONO LE
U-Boot 1.1.4 (Nov 6 2009 - 11:15:26) Marvell version: 3.4.18
U-Boot code: 00600000 -> 0067FFF0 BSS: -> 006CDE60
Soc: 88F6192 A1 (DDR2)
CPU running @ 800Mhz L2 running @ 400Mhz
SysClock = 200Mhz , TClock = 166Mhz
DRAM CAS Latency = 3 tRP = 3 tRAS = 8 tRCD=3
DRAM CS[0] base 0x00000000 size 128MB
DRAM Total size 128MB 16bit width
Addresses 8M - 0M are saved for the U-Boot usage.
Mem malloc Initialization (8M - 7M): Done
NAND:d32 MB
Marvell Serial ATA Adapter
Integrated Sata device found
CPU : Marvell Feroceon (Rev 1)
Scanning partition header:
Found sign PrEr at c0000
Found sign KrNl at 2c0000
Found sign RoOt at 540000
Streaming disabled
Write allocate disabled
USB 0: host mode
PEX 0: interface detected no Link.
Net: egiga0 [PRIME]
0 any key to stop autoboot: 1
NAND read: device 0 offset 0xc4000, size 0x195200
Reading data from 0x259000 -- 100% complete.
1659392 bytes read: OK
Calculate CRC32:
crc32 checksum Pass
NAND read: device 0 offset 0x2c4000, size 0x21c000
Reading data from 0x4dfe00 -- 100% complete.
2211840 bytes read: OK
Calculate CRC32:
crc32 checksum Pass
## Booting image at 00040000 ...
Image Name: Linux-2.6.22.18
Created: 2009-11-06 3:38:29 UTC
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 2211388 Bytes = 2.1 MB
Load Address: 00008000
Entry Point: 00008000
Verifying Checksum ... OK
OK
Starting kernel ...
Uncompressing Linux.......................................................................................................................................... done, booting the kernel.
Linux version 2.6.22.18 (root@jasonDev.localdomain) (gcc version 4.2.1) #1 Fri Nov 6 11:38:22 CST 2009 v0.0.7
CPU: ARM926EJ-S [56251311] revision 1 (ARMv5TE), cr=00053977
Machine: Feroceon-KW
Using UBoot passing parameters structure
Memory policy: ECC disabled, Data cache writeback
CPU0: D VIVT write-back cache
CPU0: I cache: 16384 bytes, associativity 4, 32 byte lines, 128 sets
CPU0: D cache: 16384 bytes, associativity 4, 32 byte lines, 128 sets
Built 1 zonelists. Total pages: 32512
Kernel command line: console=ttyS0,115200 mtdparts=nand_mtd:0x000a0000@0x0(uboot),0x00010000@0x000a0000(param),0x00200000@0x000c0000(preroot),0x00280000@0x002c0000(uimage),0x01a000000
PID hash table entries: 512 (order: 9, 2048 bytes)
Console: colour dummy device 80x30
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 128MB 0MB 0MB 0MB = 128MB total
Memory: 109056KB available (4048K code, 289K data, 128K init)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
NET: Registered protocol family 16
CPU Interface
-------------
SDRAM_CS0 ....base 00000000, size 128MB
SDRAM_CS1 ....disable
SDRAM_CS2 ....disable
SDRAM_CS3 ....disable
PEX0_MEM ....base e8000000, size 128MB
PEX0_IO ....base f2000000, size 1MB
INTER_REGS ....base f1000000, size 1MB
NFLASH_CS ....base fa000000, size 2MB
SPI_CS ....base f4000000, size 16MB
BOOT_ROM_CS ....no such
DEV_BOOTCS ....no such
CRYPT_ENG ....base f0000000, size 2MB
Marvell Development Board (LSP Version KW_LSP_4.2.7_patch21_with_rx_desc_tuned)-- MONO Soc: 88F6192 A1 LE
Detected Tclk 166666667 and SysClk 200000000
MV Buttons Device Load
Marvell USB EHCI Host controller #0: c05b4600
PEX0 interface detected no Link.
PCI: bus0: Fast back to back transfers enabled
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
NET: Registered protocol family 2
Time: kw_clocksource clocksource has been installed.
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
checking if image is initramfs...it isn't (bad gzip magic numbers); looks like an initrd
Freeing initrd memory: 16384K
RTC registered
Use the XOR engines (acceleration) for enhancing the following functions:
o RAID 5 Xor calculation
o kernel memcpy
o kenrel memzero
Number of XOR engines to use: 4
cesadev_init(c00116c4)
mvCesaInit: sessions=640, queue=64, pSram=f0000000
MV Buttons Driver Load
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
JFFS2 version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
fuse init (API version 7.8)
SGI XFS with large block numbers, no debug enabled
io scheduler noop registered
io scheduler anticipatory registered (default)
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250.0: ttyS0 at MMIO 0xf1012000 (irq = 33) is a 16550A
serial8250.0: ttyS1 at MMIO 0xf1012100 (irq = 34) is a 16550A
RAMDISK driver initialized: 2 RAM disks of 16384K size 1024 blocksize
loop: module loaded
Loading Marvell Ethernet Driver:
o Cached descriptors in DRAM
o DRAM SW cache-coherency
o Single RX Queue support - ETH_DEF_RXQ=0
o Single TX Queue support - ETH_DEF_TXQ=0
o TCP segmentation offload enabled
o Receive checksum offload enabled
o Transmit checksum offload enabled
o Network Fast Processing (Routing) supported
o Driver ERROR statistics enabled
o Driver INFO statistics enabled
o Proc tool API enabled
o Rx descripors: q0=256
o Tx descripors: q0=532
o Loading network interface(s):
o egiga0, ifindex = 1, GbE port = 0
Warning: Giga 1 is Powered Off
mvFpRuleDb (c73ab000): 1024 entries, 4096 bytes
e100: Intel(R) PRO/100 Network Driver, 3.5.17-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
Integrated Sata device found
scsi0 : Marvell SCSI to SATA adapter
scsi1 : Marvell SCSI to SATA adapter
NFTL driver: nftlcore.c $Revision: 1.98 $, nftlmount.c $Revision: 1.41 $
NAND device: Manufacturer ID: 0xec, Chip ID: 0x75 (Samsung NAND 32MiB 3,3V 8-bit)
Scanning device for bad blocks
7 cmdlinepart partitions found on MTD device nand_mtd
Using command line partition definition
Creating 7 MTD partitions on "nand_mtd":
0x00000000-0x000a0000 : "uboot"
0x000a0000-0x000b0000 : "param"
0x000c0000-0x002c0000 : "preroot"
0x002c0000-0x00540000 : "uimage"
0x00540000-0x01f40000 : "rootfs"
0x01f40000-0x02000000 : "misc"
0x00000000-0x02000000 : "flash"
ehci_marvell ehci_marvell.70059: Marvell Orion EHCI
ehci_marvell ehci_marvell.70059: new USB bus registered, assigned bus number 1
ehci_marvell ehci_marvell.70059: irq 19, io base 0xf1050100
ehci_marvell ehci_marvell.70059: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
USB Universal Host Controller Interface driver v3.0
usb 1-1: new high speed USB device using ehci_marvell and address 2
usb 1-1: configuration #1 chosen from 1 choice
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 4 ports detected
usbcore: registered new interface driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
mice: PS/2 mouse device common for all mice
i2c /dev entries driver
attach_adapter....
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
md: raid10 personality registered for level 10
raid6: int32x1 73 MB/s
raid6: int32x2 80 MB/s
raid6: int32x4 83 MB/s
raid6: int32x8 74 MB/s
raid6: using algorithm int32x4 (83 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: measuring checksumming speed
arm4regs : 722.800 MB/sec
8regs : 503.200 MB/sec
32regs : 600.000 MB/sec
raid5: using function: arm4regs (722.800 MB/sec)
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
dm_crypt using the OCF package.
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
wix gpio_init
Advanced Linux Sound Architecture Driver Version 1.0.14 (Thu May 31 09:03:25 2007 UTC).
ALSA device list:
No soundcards found.
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 1620KiB [1 disk] into ram disk... done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing init memory: 128K
Enter Pre-Root FileSystem:
FW_UPDATE_FLAG_RES:1
BOARDTEST_FALG:0
DSK1_RES:1
DSK2_RES:1
DSK3_RES:1
DSK4_RES:1
DSK1_S_RES:
DSK2_S_RES:
DSK3_S_RES:
DSK4_S_RES:
CHK_RES:1
MD0CHK_RES:1
init started: BusyBox v1.1.1 (2008.10.08-08:58+0000) multi-call binary
Starting pid 396, console /dev/ttyS0: '/etc/init.d/rcS'
Starting network...
Starting inetd... OK
NOT_DEF_RES:0
EXT3-fs: unable to read superblock
FAT: unable to read boot sector
EXT3-fs: unable to read superblock
EXT2-fs: unable to read superblock
FAT: unable to read boot sector
FAT: unable to read boot sector
egiga0: started
admindasdas
So it turns out there is a short somewhere between the RX pin and the +3.3V pin which is not allowing me to send anything to the board. Thank you to those who have commented.

Data unpack would read past end of buffer in file util/show_help.c at line 501

I submitted a job via Slurm. The job ran for 12 hours and was working as expected. Then I got "Data unpack would read past end of buffer in file util/show_help.c at line 501". I often get errors like "ORTE has lost communication with a remote daemon", but usually at the beginning of the job. That is annoying, but it does not cost as much time as an error after 12 hours. Is there a quick fix for this? The Open MPI version is 4.0.1.
--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.
Local host: barbun40
Local adapter: mlx5_0
Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: barbun40
Local device: mlx5_0
--------------------------------------------------------------------------
[barbun21.yonetim:48390] [[15284,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in
file util/show_help.c at line 501
[barbun21.yonetim:48390] 127 more processes have sent help message help-mpi-btl-openib.txt / ib port
not selected
[barbun21.yonetim:48390] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error
messages
[barbun21.yonetim:48390] 126 more processes have sent help message help-mpi-btl-openib.txt / error in
device init
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An MPI communication peer process has unexpectedly disconnected. This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).
Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate. For
example, there may be a core file that you can examine. More
generally: such peer hangups are frequently caused by application bugs
or other external events.
Local host: barbun64
Local PID: 252415
Peer host: barbun39
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[15284,1],35]
Exit code: 9
--------------------------------------------------------------------------

DPDK MLX5 PMD driver probe issue

I'm not able to use the mlx5 pmd driver with some Mellanox NICs I have installed on my server. The error I'm receiving during EAL initialization is:
net_mlx5: no Verbs device matches PCI device 0000:03:00.0, are kernel drivers loaded?
The DPDK version I'm currently using is: DPDK-STABLE-18.11
I have installed the latest OFED version:
mlnx-en-4.5-1.0.1.0-ubuntu16.04-x86_64
I have performed modprobe of the ib_uverbs kernel module
Here's the kernel version I'm using
moragalu@server:~$ uname -r
4.4.0-143-generic
Here are the NIC models:
moragalu@server:~$ lspci | grep Mell
03:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
03:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
The firmware version that the NICs are using:
moragalu@eridium03:~$ ethtool -i eridium25-03
driver: mlx5_core
version: 4.5-1.0.1
firmware-version: 14.24.1000 (MT_2420110034)
expansion-rom-version:
bus-info: 0000:06:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
The complete output of the EAL initialization is:
EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:03:00.0, are kernel drivers loaded?
EAL: Requested device 0000:03:00.0 cannot be used
EAL: PCI device 0000:03:00.1 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:03:00.1, are kernel drivers loaded?
EAL: Requested device 0000:03:00.1 cannot be used
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:06:00.0, are kernel drivers loaded?
EAL: Requested device 0000:06:00.0 cannot be used
EAL: PCI device 0000:06:00.1 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:06:00.1, are kernel drivers loaded?
EAL: Requested device 0000:06:00.1 cannot be used
EAL: PCI device 0000:03:00.1 on NUMA socket 0
EAL: probe driver: 15b3:1015 net_mlx5
net_mlx5: no Verbs device matches PCI device 0000:03:00.1, are kernel drivers loaded?
EAL: Driver cannot attach the device (03:00.1)
EAL: Failed to attach device on primary process
Current modules loaded in the kernel:
moragalu@eridium03:~$ lsmod | grep ib
mlx5_ib 16384 0
mlx_compat 24576 4 mlx4_en,mlx5_ib,mlx4_core,mlx5_core
ib_uverbs 61440 0
ib_iser 49152 0
rdma_cm 49152 1 ib_iser
ib_cm 49152 1 rdma_cm
ib_sa 36864 2 rdma_cm,ib_cm
ib_mad 49152 2 ib_cm,ib_sa
ib_core 106496 7 rdma_cm,ib_cm,ib_sa,iw_cm,ib_mad,ib_iser,ib_uverbs
ib_addr 20480 2 rdma_cm,ib_core
I've had the same problems when using the rdma-core libraries for the ibverbs dependency. In the past I managed to work around a bug in mlx5_core.c (I hardcoded the number of queues to 8 in the probe function and it magically worked), but I'm not sure it's the same issue for you.
Either way, the problem went away when I installed the latest Mellanox OFED Drivers, so it's a good idea to try that. Just remember to install it using the command:
mlnxofedinstall --dpdk --upstream-libs
edit: Just noticed you have the drivers installed - make sure you did the installation as above. One more thing you can do: check the output of this (compiled with -libverbs):
#include <infiniband/verbs.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

int main(void) {
    int num = 0;
    errno = 0;
    /* Ask libibverbs for every RDMA device it can see. */
    struct ibv_device **devices = ibv_get_device_list(&num);
    if (devices == NULL) {
        /* On failure the list itself is NULL, so do not index into it. */
        printf("devices is null: %s\n", strerror(errno));
        return 1;
    }
    printf("got %d devices\n", num);
    for (int i = 0; i < num; i++) {
        /* Never pass the device name as the format string. */
        printf("%s\n", ibv_get_device_name(devices[i]));
        struct ibv_context *ctx = ibv_open_device(devices[i]);
        if (ctx == NULL) {
            printf("ctx is null\n");
        } else {
            printf("device opened\n");
            ibv_close_device(ctx);
        }
    }
    if (errno)
        printf("ERROR: %s\n", strerror(errno));
    ibv_free_device_list(devices);
    return 0;
}
If it lists no devices at least you'll know it's an issue with the verbs drivers and not DPDK itself.
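A quicker first check that needs no compiler is to look at sysfs. On Linux, verbs devices registered by the kernel normally appear under /sys/class/infiniband, each with a "device" link pointing at its PCI function. The sketch below relies on that layout and uses a hypothetical helper name, verbs_devices; it only reports what the kernel has registered and says nothing about the user-space libraries.

```python
import os


def verbs_devices(sysfs_root="/sys/class/infiniband"):
    """Map each verbs device name to the PCI address its 'device' link points at."""
    found = {}
    if not os.path.isdir(sysfs_root):
        return found  # no verbs devices registered (or not a Linux system)
    for name in sorted(os.listdir(sysfs_root)):
        link = os.path.join(sysfs_root, name, "device")
        # The last path component of the resolved link is the PCI address.
        found[name] = os.path.basename(os.path.realpath(link))
    return found


if __name__ == "__main__":
    devices = verbs_devices()
    if not devices:
        print("no verbs devices registered")
    for name, pci in devices.items():
        print(name, pci)
```

If 0000:03:00.0 does not show up in the output, the kernel side (mlx5_core and mlx5_ib) is the place to look before DPDK.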

Error establishing a database connection EC2 Amazon

I hope you can help me. I can not stand having to keep restarting my ec2 instance on Amazon.
I have two WordPress sites hosted there. My sites always worked well until two months ago, when one of them started having this problem. I tried everything to bring it back up, and the only solution was to reconfigure it.
Once both were working again, the second site started having the same problem. I think Amazon is clowning me.
I am using a free micro instance. If anyone knows what the problem is, please help me!
Your issue will be the limited memory that is allocated to the T1 Micro instances in EC2. I'm assuming you are using the Amazon Linux AMI; if you use another Linux distribution, your log and config files may be in different locations.
Make sure you are the root user.
Have a look at your MySQL logs in the following location:
/var/log/mysqld.log
If you see repeated instances of the following it's pretty certain that the 0.6GB of memory allocated to the micro instance is not cutting it.
150714 22:13:33 InnoDB: Initializing buffer pool, size = 12.0M
InnoDB: mmap(12877824 bytes) failed; errno 12
150714 22:13:33 InnoDB: Completed initialization of buffer pool
150714 22:13:33 InnoDB: Fatal error: cannot allocate memory for the buffer pool
150714 22:13:33 [ERROR] Plugin 'InnoDB' init function returned error.
150714 22:13:33 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
150714 22:13:33 [ERROR] Unknown/unsupported storage engine: InnoDB
150714 22:13:33 [ERROR] Aborting
You will notice in the log excerpt above that my buffer pool size is set to 12MB. This can be configured by adding the line innodb_buffer_pool_size = 12M to your MySQL config file /etc/my.cnf.
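If the log is long, a small script can pick out the relevant lines. The sketch below is only an illustration: the helper name innodb_out_of_memory is made up, and the two marker strings are taken directly from the excerpt above (errno 12 is ENOMEM, "Cannot allocate memory", on Linux).

```python
def innodb_out_of_memory(log_text):
    """Return the log lines that show InnoDB failing to allocate its buffer pool."""
    markers = ("failed; errno 12", "cannot allocate memory")
    return [line for line in log_text.splitlines()
            if "InnoDB" in line and any(m in line for m in markers)]


# Lines copied from the log excerpt above.
sample = """150714 22:13:33 InnoDB: Initializing buffer pool, size = 12.0M
InnoDB: mmap(12877824 bytes) failed; errno 12
150714 22:13:33 InnoDB: Completed initialization of buffer pool
150714 22:13:33 InnoDB: Fatal error: cannot allocate memory for the buffer pool
150714 22:13:33 [ERROR] Aborting"""

hits = innodb_out_of_memory(sample)
print(len(hits))  # 2
for line in hits:
    print(line)
```

To scan the real log, pass it the contents of /var/log/mysqld.log, for example open("/var/log/mysqld.log").read() when run as root.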
A pretty good way to deal with InnoDB chewing up your memory is to create a swap file.
Start by checking the status of your memory:
free -m
You will most probably see that your swap is not doing much:
total used free shared buffers cached
Mem: 592 574 17 0 15 235
-/+ buffers/cache: 323 268
Swap: 0 0 0
To start, ensure you are logged in as the root user and run the following command to create a 1 GB file:
dd if=/dev/zero of=/swapfile bs=1M count=1024
The command prints nothing while it runs, so wait for it to complete (it took about 30 seconds in the run below). You should then see the following response:
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 31.505 s, 34.1 MB/s
Next, set up the swap space with:
mkswap /swapfile
Now enable the swap file:
swapon /swapfile
If swapon warns about insecure permissions, you can ignore the warning or fix it by restricting the swap file to mode 600 with the chmod command:
chmod 600 /swapfile
Now add the following line to /etc/fstab so the swap file is enabled at boot:
/swapfile swap swap defaults 0 0
Restart your MySQL instance:
service mysqld restart
Finally check to see if your swap file is working correctly with the free -m command.
You should see something like:
total used free shared buffers cached
Mem: 592 575 16 0 16 235
-/+ buffers/cache: 323 269
Swap: 1023 0 1023
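The same check can be scripted, for instance as part of a boot-time sanity test. The sketch below assumes the column layout shown in the two free -m outputs above, where the first number on the "Swap:" line is the total in MB; the helper name swap_total_mb is hypothetical.

```python
def swap_total_mb(free_output):
    """Return the total swap in MB from `free -m` output, or None if absent."""
    for line in free_output.splitlines():
        if line.startswith("Swap:"):
            return int(line.split()[1])
    return None


# The "Swap:" lines before and after creating the swap file.
print(swap_total_mb("Swap: 0 0 0"))        # 0
print(swap_total_mb("Swap: 1023 0 1023"))  # 1023
```

A result of 0 after a reboot would suggest the /etc/fstab entry was not picked up.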
Hope this helps.

Scaling nginx with static files -- non-Persistent requests kill req/s

I'm working on a project where we need to serve a small static XML file at roughly 40k requests per second.
All incoming requests are sent to the server from HAProxy. However, none of the requests will be persistent.
The issue is that when benchmarking with non-Persistent requests, the nginx instance caps out at 19 114 req/s. When persistent connections are enabled, performance increases by nearly an order of magnitude, to 168 867 req/s. The results are similar with G-wan.
When benchmarking non-persistent requests, CPU usage is minimal.
What can I do to increase performance with non-persistent connections and nginx?
[root@spare01 lighttpd-weighttp-c24b505]# ./weighttp -n 1000000 -c 100 -t 16 "http://192.168.1.40/feed.txt"
finished in 52 sec, 315 millisec and 603 microsec, 19114 req/s, 5413 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 290000000 bytes total, 231000000 bytes http, 59000000 bytes data
[root@spare01 lighttpd-weighttp-c24b505]# ./weighttp -n 1000000 -c 100 -t 16 -k "http://192.168.1.40/feed.txt"
finished in 5 sec, 921 millisec and 791 microsec, 168867 req/s, 48640 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 294950245 bytes total, 235950245 bytes http, 59000000 bytes data
Your 2 tests are similar (except HTTP Keep-Alives):
./weighttp -n 1000000 -c 100 -t 16 "http://192.168.1.40/feed.txt"
./weighttp -n 1000000 -c 100 -t 16 -k "http://192.168.1.40/feed.txt"
And the one with HTTP Keep-Alives is almost 9x faster:
finished in 52 sec, 19114 req/s, 5413 kbyte/s
finished in 5 sec, 168867 req/s, 48640 kbyte/s
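The numbers in the two weighttp reports can be checked with a little arithmetic. The sketch below recomputes the request rates from the reported run times and estimates the average time per request as concurrency divided by throughput, which assumes all 100 connections stay busy for the whole run. The last line divides the reported "bytes data" by the request count, on the assumption that this figure counts response bodies.

```python
def summarize(requests, seconds, concurrency):
    """Return (requests per second, average milliseconds per request)."""
    rate = requests / seconds
    latency_ms = concurrency / rate * 1000
    return rate, latency_ms


# Run times taken from the two weighttp reports (-n 1000000 -c 100).
no_keepalive = summarize(1_000_000, 52.315603, 100)
keepalive = summarize(1_000_000, 5.921791, 100)

print(int(no_keepalive[0]), int(keepalive[0]))            # 19114 168867
print(round(keepalive[0] / no_keepalive[0], 1))           # 8.8
print(round(no_keepalive[1], 2), round(keepalive[1], 2))  # 5.23 0.59

# Body bytes per response, from "59000000 bytes data" over 1000000 requests.
print(59_000_000 // 1_000_000)  # 59
```

By this estimate each non-persistent request spends about 5.2 ms in flight against about 0.6 ms with Keep-Alive, and the response body is only 59 bytes, so most of the difference is per-connection cost rather than file transfer.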
First, HTTP Keep-Alives (persistent connections) make HTTP requests run faster because:
Without HTTP Keep-Alives, the client must establish a new CONNECTION for EACH request (this is slow because of the TCP handshake and teardown).
With HTTP Keep-Alives, the client can send many requests over the SAME CONNECTION. This is faster because there are fewer things to do per request.
Second, you say that the static file XML size is "small".
Is "small" nearer to 1 KB or 1 MB? We don't know. But that makes a huge difference in terms of the options available to speed things up.
Huge files are usually served through sendfile() because it works in the kernel, freeing the usermode server from the burden of reading from disk and buffering.
Small files can use more flexible options available for application developers in usermode, but here also, file size matters (bytes and kilobytes are different animals).
Third, you are using 16 threads with your test. Are you really enjoying 16 PHYSICAL CPU Cores on BOTH the client and the server machines?
If that's not the case, then you are simply slowing down the test to the point that you are no longer testing the web servers.
As you see, many factors have an influence on performance. And there are more with OS tuning (the TCP stack options, available file handles, system buffers, etc.).
To get the most out of a system, you need to examine all those parameters and pick the best settings for your particular workload.
