I am running a few Xen servers on my company network. Recently, on one of them, I have been trying to rsync (on the Dom0 console) a big server image from another machine, but every time I run into a system crash after somewhere between 30 and 100 GB. The syslog and kernel log show something like this:
Sep 12 16:41:19 ampxen1 kernel: [ 1730.917516] attempt to access beyond end of device
Sep 12 16:41:19 ampxen1 kernel: [ 1730.917518] dm-1: rw=1, want=8878402463988083936, limit=3759505408
Sep 12 16:41:19 ampxen1 kernel: [ 1730.917520] EXT4-fs warning (device dm-1): ext4_end_bio:323: I/O error 10 writing to inode 33030164 (offset 47354740736 size 5881856 starting block 1109800307998510491)
...continuing with several hundred thousand similar lines per second, eventually making the machine unreachable. The very high starting block number of the EXT4 write operation (on the order of 10^18, i.e. the exabyte range) is clearly what to look at, but I have been unable to find any mention of what could cause it.
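To put numbers on the log lines, here is a minimal Python sketch that parses the two messages and compares the request with the device size. It assumes that want and limit are counted in 512-byte sectors and that the filesystem uses the default 4096-byte ext4 block size; the names parse_request, SECTOR_BYTES and FS_BLOCK_BYTES are made up for illustration.

```python
import re

# Two lines copied from the kernel log above.
LOG = """\
dm-1: rw=1, want=8878402463988083936, limit=3759505408
EXT4-fs warning (device dm-1): ext4_end_bio:323: I/O error 10 writing to inode 33030164 (offset 47354740736 size 5881856 starting block 1109800307998510491)
"""

SECTOR_BYTES = 512      # unit of want/limit in the block layer
FS_BLOCK_BYTES = 4096   # assumption: default ext4 block size


def parse_request(log: str) -> dict:
    """Extract the requested sector, the device limit and the ext4 start block."""
    want = int(re.search(r"want=(\d+)", log).group(1))
    limit = int(re.search(r"limit=(\d+)", log).group(1))
    start_block = int(re.search(r"starting block (\d+)", log).group(1))
    return {"want": want, "limit": limit, "start_block": start_block}


req = parse_request(LOG)
device_bytes = req["limit"] * SECTOR_BYTES
# Convert the ext4 block number to 512-byte sectors for comparison with want.
start_sector = req["start_block"] * (FS_BLOCK_BYTES // SECTOR_BYTES)

print(f"device size          : {device_bytes / 2**40:.2f} TiB")
print(f"requested end sector : {req['want']}")
print(f"ratio want/limit     : {req['want'] / req['limit']:.1e}")
print(f"want - start sector  : {req['want'] - start_sector}")
```

Under those assumptions the ext4 start block and the block-layer want value describe the same request, so the bogus number already exists at the filesystem level before the device mapper sees it.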
The server is based on Ubuntu 18.04.3 with the standard Xen install from the repositories. Storage is two 2 TB disks in RAID1, configured as shown below, with an EXT4 filesystem on the large partition used for our server images. I have checked the disks with smartctl and the file system(s) with e2fsck, for what it's worth. It seems to be a file system issue, but I am wondering whether the Xen kernel could be involved. Any ideas of what to look for would be appreciated!
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 500G 0 loop
sda 8:0 0 1,8T 0 disk
├─sda1 8:1 0 476M 0 part /boot/efi
├─sda2 8:2 0 1,8T 0 part
│ └─md0 9:0 0 1,8T 0 raid1
│ ├─ampxen1.0-ampxen1.dom0 253:0 0 23,3G 0 lvm /
│ └─ampxen1.0-ampxen1.vms0 253:1 0 1,8T 0 lvm /srv/vms0
└─sda3 8:3 0 46,5G 0 part [SWAP]
sdb 8:16 0 1,8T 0 disk
├─sdb1 8:17 0 476M 0 part
├─sdb2 8:18 0 1,8T 0 part
│ └─md0 9:0 0 1,8T 0 raid1
│ ├─ampxen1.0-ampxen1.dom0 253:0 0 23,3G 0 lvm /
│ └─ampxen1.0-ampxen1.vms0 253:1 0 1,8T 0 lvm /srv/vms0
└─sdb3 8:19 0 46,5G 0 part [SWAP]
I finally figured out that the problem was something as trivial as a faulty RAM block – running a memtest showed lots of errors on one of the four 16GB blocks. It seems that memory was only maxed out exactly when copying large files, while my existing virtual servers on the machine were running just fine at all other times.
I have a very odd problem in a proxy cluster of four Squid proxies:
One of the machines is the master. The master is running ldirectord, which checks the availability of all four machines and distributes new client connections.
All of a sudden, after years of operation, I'm encountering this problem:
1) The machine serving the master role is not being assigned new connections, old connections are served until a new proxy is assigned to the clients.
2) The other machines are still processing requests, taking over the clients from the master (so far, so good)
3) "ipvsadm -L -n" shows ever-decreasing ActiveConn and InActConn values.
Once I migrate the master role to another machine, "ipvsadm -L -n" is showing lots of active and inactive connections, until after about an hour the same thing happens on the new master.
Datapoint: This happened again this afternoon, and now "ipvsadm -L -n" shows:
TCP 141.42.1.215:8080 wlc persistent 1800
-> 141.42.1.216:8080 Route 1 98 0
-> 141.42.1.217:8080 Route 1 135 0
-> 141.42.1.218:8080 Route 1 1 0
-> 141.42.1.219:8080 Route 1 2 0
No change in the numbers for quite some time now.
Some more stats (ipvsadm -L --stats -n):
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Conns InPkts OutPkts InBytes OutBytes
-> RemoteAddress:Port
TCP 141.42.1.215:8080 1990351 87945600 0 13781M 0
-> 141.42.1.216:8080 561980 21850870 0 2828M 0
-> 141.42.1.217:8080 467499 23407969 0 3960M 0
-> 141.42.1.218:8080 439794 19364749 0 2659M 0
-> 141.42.1.219:8080 521378 23340673 0 4335M 0
The value for "Conns" is now constant for all realservers and the virtual server. Traffic is still flowing (InPkts increasing).
I examined the output of "ipvsadm -L -n -c" and found:
25 FIN_WAIT
534 NONE
977 ESTABLISHED
Then I waited a minute and got:
21 FIN_WAIT
515 NONE
939 ESTABLISHED
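For anyone who wants to repeat this kind of tally, here is a minimal Python sketch that counts connection states. It assumes the six-column layout that "ipvsadm -L -n -c" prints (protocol, expire, state, source, virtual, destination); the sample rows and client addresses are made up for illustration, and count_states is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample in the layout printed by "ipvsadm -L -n -c";
# the client addresses and timers are invented for illustration.
SAMPLE = """\
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:52  ESTABLISHED 10.0.0.11:50412    141.42.1.215:8080  141.42.1.216:8080
TCP 01:03  FIN_WAIT    10.0.0.12:50977    141.42.1.215:8080  141.42.1.217:8080
TCP 27:40  NONE        10.0.0.13:0        141.42.1.215:8080  141.42.1.218:8080
TCP 14:10  ESTABLISHED 10.0.0.14:41230    141.42.1.215:8080  141.42.1.219:8080
"""


def count_states(text: str) -> Counter:
    """Count entries per connection state (third column of each data row)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six columns and start with the protocol name,
        # which skips the title line and the column header.
        if len(fields) == 6 and fields[0] in ("TCP", "UDP"):
            counts[fields[2]] += 1
    return counts


for state, n in sorted(count_states(SAMPLE).items()):
    print(f"{n:5d} {state}")
```

Running it twice a minute apart against real output would give the same kind of before/after comparison as the two listings above.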
It turns out that a local BIRD installation was injecting a route for the IP of the virtual server, which took precedence over ARP.
I have an old Sybase server whose database is acting up. I have tried rebuilding the file system and the database file, but the problem keeps returning. I want to replace the hard drive on which the database files and transaction files are stored, and because I am not familiar with Unix, I need help determining exactly which hard drive that is. I also want to find out whether those files are on the same hard drive as the operating system; if they are, I will need to reinstall the operating system as well as restore the database to the new hard drive. Obviously it would be better if the database files and the transaction files were not on the same hard drive as the operating system. Please help me determine these two things.
So far, I have found these:
(1) I use sp_helpdb command and find that the database files and the transaction files are stored in these logical devices:
sybdbs
syblogs
master
sybdbs2
(2) I use sp_helpdevice command to look into those 4 logical devices shown above, and find that those logical devices are in these physical devices:
/dev/rdsk/c0t0d0s1
/dev/rdsk/c0t3d0s4
d_master
/dev/rdsk/sybdbs2
(3) When I use sp_helpdevice to show all the physical devices, I see this:
device_name        physical_name                               description                                      status cntrltype device_number low      high
------------------ ------------------------------------------- ------------------------------------------------ ------ --------- ------------- -------- --------
historydump        /export/home/syb11.dump/history.dump        disk, dump device                                16     2         0             0        0
isproddump         /export/home/syb11.dump/isprod.dump         disk, dump device                                16     2         0             0        0
istestdump         /export/home/syb11.dump/istest.dump         disk, dump device                                16     2         0             0        0
master             d_master                                    special, physical disk, 100.00 MB                2      0         0             0        51199
masterdump         /export/home/syb11.dump/master.dump         disk, dump device                                16     2         0             0        0
modeldump          /export/home/syb11.dump/model.dump          disk, dump device                                16     2         0             0        0
prodtestdump       /export/home/syb11.dump/prodtest.dump       disk, dump device                                16     2         0             0        0
sybdbs             /dev/rdsk/c0t0d0s1                          special, default disk, physical disk, 2000.00 MB 3      0         3             50331648 51355647
sybdbs2            /dev/rdsk/sybdbs2                           special, physical disk, 1.00 MB                  2      0         5             83886080 83886591
syblogs            /dev/rdsk/c0t3d0s4                          special, physical disk, 850.00 MB                2      0         4             67108864 67544063
sybscurty          /dev/rdsk/c0t3d0s5                          special, physical disk, 100.00 MB                2      0         2             33554432 33605631
sybsecuritydump    /export/home/syb11.dump/sybsecurity.dump    disk, dump device                                16     2         0             0        0
sybsystemprocsdump /export/home/syb11.dump/sybsystemprocs.dump disk, dump device                                16     2         0             0        0
sysprocsdev        /dev/rdsk/c0t0d0s4                          special, physical disk, 100.00 MB                2      0         1             16777216 16828415
tapedump1          /dev/rmt4                                   tape, 625 MB, dump device                        16     3         0             0        20000
tapedump2          /dev/rst0                                   disk, dump device                                16     2         0             0        20000
uniface724dump     /export/home/syb11.dump/uniface724.dump     disk, dump device                                16     2         0             0        0
uniface7dump       /export/home/syb11.dump/uniface7.dump       disk, dump device                                16     2         0             0        0
(4) I want to know more about those physical devices. I use the df command to examine them:
df -k /dev/rdsk/c0t0d0s1
df -k /dev/rdsk/c0t3d0s4
df -k d_master
df -k /dev/rdsk/sybdbs2
The df command complains that the first three devices are “not a block device, directory or mounted resource”.
On the other hand, the df command shows the following info for the last device:
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t3d0s0 576558 371019 147889 71% /
In any case, this doesn’t tell me which drive(s) those devices are on.
(5) When I use the mount command, I see this:
/ on /dev/dsk/c0t3d0s0 read/write/setuid on Mon Jul 6 11:10:46 2015
/usr on /dev/dsk/c0t3d0s6 read/write/setuid on Mon Jul 6 11:10:46 2015
/proc on /proc read/write/setuid on Mon Jul 6 11:10:46 2015
/dev/fd on fd read/write/setuid on Mon Jul 6 11:10:46 2015
/tmp on swap read/write on Mon Jul 6 11:10:49 2015
/export on /dev/dsk/c0t3d0s7 setuid/read/write on Mon Jul 6 11:10:49 2015
/freespace on /dev/dsk/c0t0d0s5 setuid/read/write on Mon Jul 6 11:10:49 2015
/sybase on /dev/dsk/c0t0d0s0 setuid/read/write on Mon Jul 6 11:10:49 2015
/usr/openwin on /dev/dsk/c0t3d0s3 setuid/read/write on Mon Jul 6 11:10:49 2015
I cannot figure out the connection between the mounted devices above and the physical devices that hold the database files and the transaction files. I also cannot link the mounted devices above to the hard drives shown in the next section.
(6) When I use the cat /etc/vfstab command, I see these:
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
#/dev/dsk/c1d0s2 /dev/rdsk/c1d0s2 /usr ufs 1 yes -
/proc - /proc proc - no -
fd - /dev/fd fd - no -
swap - /tmp tmpfs - yes -
/dev/dsk/c0t3d0s0 /dev/rdsk/c0t3d0s0 / ufs 1 no -
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 1 no -
/dev/dsk/c0t3d0s7 /dev/rdsk/c0t3d0s7 /export ufs 2 yes -
/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /freespace ufs 2 yes -
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 /sybase ufs 2 yes -
/dev/dsk/c0t3d0s3 /dev/rdsk/c0t3d0s3 /usr/openwin ufs 2 yes -
/dev/dsk/c0t3d0s1 - - swap - no -
# The following lines have been commented-out to allow Sybase to access these
# partitions and Raw Partitions. Nov-24-1999
# /dev/dsk/c0t0d0s3 /dev/rdsk/c0t0d0s3 /master ufs 2 yes -
# /dev/dsk/c0t0d0s1 /dev/rdsk/c0t0d0s1 /sybdbs ufs 2 yes -
# /dev/dsk/c0t3d0s4 /dev/rdsk/c0t3d0s4 /syblogs ufs 2 yes -
# /dev/dsk/c0t3d0s5 /dev/rdsk/c0t3d0s5 /sybscurty ufs 2 yes -
# /dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /sybtemproc ufs 2 yes -
(7) When I use the format command, I see these two hard drives:
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <IBM-DNES-309170-SA30 cyl 11195 alt 2 hd 5 sec 320>
/iommu#f,e0000000/sbus#f,e0001000/espdma#f,400000/esp#f,800000/sd#0,0
1. c0t3d0 <SEAGATE-ST34520N-1206 cyl 9004 alt 2 hd 4 sec 246>
/iommu#f,e0000000/sbus#f,e0001000/espdma#f,400000/esp#f,800000/sd#3,0
(8) I don’t see any external device attached to the Sybase server. Having said this, there is a backup Sybase server, and the backup Sybase server has an external device attached to it (through a SCSI cable). At this point, I assume the database-files and transaction-files are all stored inside the Sybase server.
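One way to line up the listings in (3), (5) and (7): Solaris device names follow the pattern cXtYdZsN (controller, target, disk, slice), so the part before the "s" names the disk that the format command lists. The Python sketch below groups the device paths quoted above by disk. The dictionaries are copied from the listings; group_by_disk and the variable names are hypothetical, and entries that do not follow the pattern (d_master and /dev/rdsk/sybdbs2) are reported separately rather than guessed at.

```python
import re
from collections import defaultdict

# Sybase devices from sp_helpdevice in (3).
SYBASE_DEVICES = {
    "sybdbs": "/dev/rdsk/c0t0d0s1",
    "syblogs": "/dev/rdsk/c0t3d0s4",
    "sybscurty": "/dev/rdsk/c0t3d0s5",
    "sysprocsdev": "/dev/rdsk/c0t0d0s4",
    "master": "d_master",
    "sybdbs2": "/dev/rdsk/sybdbs2",
}
# Mounted filesystems from the mount output in (5).
MOUNTS = {
    "/": "/dev/dsk/c0t3d0s0",
    "/usr": "/dev/dsk/c0t3d0s6",
    "/export": "/dev/dsk/c0t3d0s7",
    "/freespace": "/dev/dsk/c0t0d0s5",
    "/sybase": "/dev/dsk/c0t0d0s0",
    "/usr/openwin": "/dev/dsk/c0t3d0s3",
}

# cXtYdZ is the disk, sN the slice; rdsk is the raw (character) device.
NAME = re.compile(r"/dev/r?dsk/(c\d+t\d+d\d+)s(\d+)$")


def group_by_disk(named_paths):
    """Return ({disk: [(label, slice)]}, [labels that do not match])."""
    disks, unmatched = defaultdict(list), []
    for label, path in named_paths.items():
        m = NAME.match(path)
        if m:
            disks[m.group(1)].append((label, int(m.group(2))))
        else:
            unmatched.append(label)
    return dict(disks), unmatched


disks, unmatched = group_by_disk({**SYBASE_DEVICES, **MOUNTS})
for disk, entries in sorted(disks.items()):
    print(disk, sorted(entries, key=lambda e: e[1]))
print("not resolvable from the name alone:", unmatched)
```

The grouping only reflects the names; it says nothing about which of the two disks is failing.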
By the way, the Sybase server uses this Unix operating system:
SunOS <my-server-name> 5.4 Generic_101945-62 sun4m sparc
And the Sybase version is:
SQL Server/11.0.3.2/P/Sun_svr4/OS 5.4/SWR 7578 Rollup/OPT/Mon Nov 3 22:19:21 PST 1997
By the way, this is what I have tried so far to repair the database:
• Tried dbcc checkalloc(, fix). Unfortunately, this command could neither fix the problem nor run to completion.
• Tried drop-db/add-new-db/restore-db-from-backup. Unfortunately, the restore failed to complete.
• Tried fsck to fix the devices. It could not complete and complained about “MAGIC NUMBER WRONG”.
• Tried the Analyze option in the format command to repair Disk 0, followed by add-new-db and restore-db-from-backup. This method seemed to work, but after a week or so I found that a table had an I/O error. Honestly, I don’t even know whether the databases are really on Disk 0.
Please help me determine which hard drive those database files and transaction files are stored on, and whether they are on the same hard drive as the Unix operating system.
Thanks in advance.
Jay Chan
I hope you can help me. I cannot stand having to keep restarting my EC2 instance on Amazon.
I have two WordPress sites hosted there. They always worked well until two months ago, when one of them started having this problem. I tried every way I could to bring it back up, and the only solution was to reconfigure it.
Once everything was right with both sites, the second one started having the same problem. I think Amazon is clowning me.
I am using a free micro instance. If anyone knows what the problem is, please help me!
Your issue is most likely the limited memory allocated to t1.micro instances in EC2. I'm assuming you are using the Amazon Linux AMI; if you use a different Linux distribution, your log and config files may be in different locations.
Make sure you are the root user.
Have a look at your MySQL logs in the following location:
/var/log/mysqld.log
If you see repeated instances of the following it's pretty certain that the 0.6GB of memory allocated to the micro instance is not cutting it.
150714 22:13:33 InnoDB: Initializing buffer pool, size = 12.0M
InnoDB: mmap(12877824 bytes) failed; errno 12
150714 22:13:33 InnoDB: Completed initialization of buffer pool
150714 22:13:33 InnoDB: Fatal error: cannot allocate memory for the buffer pool
150714 22:13:33 [ERROR] Plugin 'InnoDB' init function returned error.
150714 22:13:33 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
150714 22:13:33 [ERROR] Unknown/unsupported storage engine: InnoDB
150714 22:13:33 [ERROR] Aborting
You will notice in the log excerpt above that my buffer pool size is set to 12MB. This can be configured by adding the line innodb_buffer_pool_size = 12M to your MySQL config file /etc/my.cnf.
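To see what the "errno 12" in the mmap line means, here is a small Python sketch that looks the code up on the local system and converts the requested size to MiB. The two numbers are copied from the log; the variable names are just for illustration.

```python
import errno
import os

code = 12             # errno from the mmap() line in the log
requested = 12877824  # bytes InnoDB asked for, from the same line

print(errno.errorcode[code])  # symbolic name of the error code
print(os.strerror(code))      # human-readable message for the code
print(f"{requested / 2**20:.2f} MiB requested")
```

On Linux, code 12 is the out-of-memory error, which fits the idea that the instance cannot even spare the roughly 12 MB the buffer pool asks for.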
A pretty good way to deal with InnoDB chewing up your memory is to create a swap file.
Start by checking the status of your memory:
free -m
You will most probably see that your swap is not doing much:
total used free shared buffers cached
Mem: 592 574 17 0 15 235
-/+ buffers/cache: 323 268
Swap: 0 0 0
To start, ensure you are logged in as the root user and run the following command:
dd if=/dev/zero of=/swapfile bs=1M count=1024
Wait a bit, as the command is not verbose. When the process completes (about 30 seconds in the run shown here), you should see a response like this:
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 31.505 s, 34.1 MB/s
Next set up the swapspace with:
mkswap /swapfile
Now enable the swap file:
swapon /swapfile
If you get a warning about the file's permissions, you can ignore it or fix it by changing the swap file's permissions to 600 with the chmod command.
chmod 600 /swapfile
Now add the following line to /etc/fstab so the swap space is enabled at server start:
/swapfile swap swap defaults 0 0
Restart your MySQL instance:
service mysqld restart
Finally check to see if your swap file is working correctly with the free -m command.
You should see something like:
total used free shared buffers cached
Mem: 592 575 16 0 16 235
-/+ buffers/cache: 323 269
Swap: 1023 0 1023
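If you would rather check this from a script than read the free -m table, here is a minimal Python sketch that reads the SwapTotal field from /proc/meminfo. The function names and the sample text are assumptions for illustration; the sample values are chosen to match a 1024 MB swap file, not taken from a real machine.

```python
def swap_total_kib(meminfo_text: str) -> int:
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


def swap_enabled(path: str = "/proc/meminfo") -> bool:
    """True if the running system reports a non-zero amount of swap."""
    with open(path) as fh:
        return swap_total_kib(fh.read()) > 0


# Hypothetical excerpt, roughly what a 1024 MB swap file would produce.
SAMPLE = (
    "MemTotal:         606348 kB\n"
    "SwapTotal:       1048572 kB\n"
    "SwapFree:        1048572 kB\n"
)
print(swap_total_kib(SAMPLE) // 1024, "MB of swap configured")
```

Calling swap_enabled() on the instance itself should return True once swapon has succeeded.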
Hope this helps.
I've installed Qt for Embedded Linux (Qt 4.8.5) using the official installation guide. However, when I run any of the installed examples in QtCreator 3.0.1, I get the following error:
QWSSocket::connectToLocalFile could not connect:: Connection refused
No Qt for Embedded Linux server appears to be running.
If you want to run this program as a server,
add the "-qws" command-line option.
And if I run using the -qws option, I get:
QScreenLinuxFb::connect: Permission denied
Error opening framebuffer /dev/fb0
The program has unexpectedly finished.
From Google results, I can see that it's related to permission settings, probably on the framebuffer device (/dev/fb0). The following is the output of ls -al /dev/fb0 on my Ubuntu 12.04 LTS system:
$ ls -al /dev/fb0
crw-rw---- 1 root video 29, 0 Apr 21 22:43 /dev/fb0
I've added the currently logged in user to the video group that /dev/fb0 belongs to. Still I'm getting the permission denied error.
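One thing worth ruling out here: a newly added group membership only applies to sessions started after the change, so a process launched from an older login may still lack the video group. The Python sketch below checks whether the groups of the current process actually grant read/write access through the group bits. It is only a sketch; group_grants_rw and check are hypothetical helper names, and it looks at the group permission bits alone, not at ACLs or other access controls.

```python
import os
import stat


def group_grants_rw(mode: int, file_gid: int, process_gids) -> bool:
    """True if the file's group bits allow read+write and we are in that group."""
    rw = stat.S_IRGRP | stat.S_IWGRP
    return file_gid in process_gids and (mode & rw) == rw


def check(path: str = "/dev/fb0") -> bool:
    """Apply the test to a real device node using this process's groups."""
    st = os.stat(path)
    gids = set(os.getgroups()) | {os.getegid()}
    return group_grants_rw(st.st_mode, st.st_gid, gids)


if __name__ == "__main__":
    # Only query the device if it exists on this machine.
    if os.path.exists("/dev/fb0"):
        print("group access to /dev/fb0:", check())
```

If this prints False when run from the same environment that starts the example (for instance a terminal opened before the group change), logging out and back in would be the first thing to try.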
If I run the /examples/qws/framebuffer example using sudo, I'm getting the following output:
The framebuffer device was opened successfully.
Fixed screen info:
id: inteldrmfb
smem_start: 0xc0073000
smem_len: 5763072
type: 0
type_aux: 0
visual: 2
xpanstep: 1
ypanstep: 1
ywrapstep: 0
line_length: 6400
mmio_start: 0x0
mmio_len: 0
accel: 0
The framebuffer device was mapped to memory successfully.
Was in graphics mode already. Skipping
Variable screen info:
xres: 1600
yres: 900
xres_virtual: 1600
yres_virtual: 900
yoffset: 0
xoffset: 0
bits_per_pixel: 32
grayscale: 0
red: offset: 16, length: 8, msb_right: 0
green: offset: 8, length: 8, msb_right: 0
blue: offset: 0, length: 8, msb_right: 0
transp: offset: 0, length: 0, msb_right: 0
nonstd: 0
activate: 0
height: -1
width: -1
accel_flags: 0x1
pixclock: 0
left_margin: 0
right_margin: 0
upper_margin: 0
lower_margin: 0
hsync_len: 0
vsync_len: 0
sync: 0
vmode: 0
Frame Buffer Performance test...
Average: 916 usecs
Bandwidth: 6000.102 MByte/Sec
Max. FPS: 1091.703 fps
Will draw 3 rectangles on the screen,
they should be colored red, green and blue (in that order).
Done.
However, I do not see the 3 rectangles on screen.
Could someone please help outline how to run these Qt for Embedded Linux (Qt 4.8.5) examples on the Ubuntu desktop environment?
I am also on a 12.04 system, but with Qt 4.6.1.
I ran into the same issues as you and found that when I run with -qws from a console terminal (Ctrl + Alt + F1), the GUI can be seen.
When run from a GUI terminal or Qt Creator, the window is presented, but the OS's GUI (X11) overwrites the screen with its own contents during screen refresh.
If you want to be able to run Qt GUI apps from Qt Creator or from within a graphical desktop, qvfb needs to be running in the background.
This link has some information on this: http://qt-project.org/doc/qt-4.8/qvfb.html
Hope this helps !
Hello, I use the function "Domain.InterfaceStats" to get network I/O statistics, but I always get results like these:
vif1.0 rx_bytes 0
vif1.0 rx_packets 0
vif1.0 rx_errs 0
vif1.0 rx_drop 0
vif1.0 tx_bytes 0
vif1.0 tx_packets 0
vif1.0 tx_errs 0
vif1.0 tx_drop 0
The statistics always seem to be zero. How can I fix this?
In addition, I use HVM; the Xen version is 4.2, the libvirt version is 1.1.2, and the network uses the default configuration, just a bridge.
Thank you.
If the VM's operating system is Windows, installing the GPLPV drivers may help.