Buffered and Cache memory in Solaris - unix

How do I get the buffer memory, cache memory, and block in/out statistics in Solaris? For example, on Linux I can get them using vmstat. vmstat on Linux gives:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
Whereas vmstat on Solaris doesn't give buff and cache under ------memory----, and there is no -----io---- section either. How do I get these fields on Solaris?

Kernel memory:
kstat -p > /var/tmp/kstat-p
More detailed kernel memory statistics:
kstat -p -c kmem_cache
kstat -p -m vmem
kstat -p -c vmem
alternative:
echo "::kmastat" | mdb -k > /var/tmp/kmastat
Do not use iostat that way. To show busy disks with real-time sampling (you usually want this first):
iostat -xmz 2 4 # 2-second sampling interval, 4 samples
To show historical averages (since boot):
iostat -xm
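If you need these numbers over time rather than as a one-off, one simple approach is to sample the kstat and iostat commands above together in a loop. A minimal sketch, assuming a 2-second interval and /var/tmp as the destination (both arbitrary choices):
# capture kernel cache statistics and disk I/O side by side, roughly every 2 seconds
while true; do
    ts=$(date +%Y%m%d-%H%M%S)
    kstat -p -c kmem_cache > /var/tmp/kmem_cache.$ts
    kstat -p -m vmem > /var/tmp/vmem.$ts
    # the second iostat report covers the 2-second interval (the first is the since-boot average)
    iostat -xmz 2 2 > /var/tmp/iostat.$ts
done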

Related

Parallel execution of Unix command?

I wrote a shell script that divides the file into 4 parts automatically using csplit, then four shell scripts that run the same command in the background using nohup, and a while loop that waits for those four processes to finish and finally runs cat output1.txt ... output4.txt > finaloutput.txt.
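For reference, the manual approach described above looks roughly like this. It is only a sketch: wc -l stands in for the actual processing command, the file names are placeholders, and wait replaces the polling while loop:
# split data.txt1 into 4 roughly equal parts: part00 .. part03
lines=$(wc -l < data.txt1)
csplit -f part data.txt1 $(( lines / 4 + 1 )) "{2}"
# run the same processing on each part in the background
for i in 0 1 2 3; do
    nohup wc -l part0$i > output$i.txt &
done
wait    # block until all four background jobs have finished
cat output0.txt output1.txt output2.txt output3.txt > finaloutput.txt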
But then I came across the parallel command and tried it with a big file, but it does not seem to work as expected. The file is the output of the command below:
for i in $(seq 1 1000000);do cat /etc/passwd >> data.txt1;done
time wc -l data.txt1
10000000 data.txt1
real 0m0.507s
user 0m0.080s
sys 0m0.424s
With parallel:
time cat data.txt1 | parallel --pipe wc -l | awk '{s+=$1} END {print s}'
10000000
real 0m41.984s
user 0m1.122s
sys 0m36.251s
And when I tried this on a 2 GB file (~10 million records) it took more than 20 minutes.
Does this command only work on multi-core systems? (I am currently using a single-core system.)
nproc --all
1
--pipe is inefficient (though not at the scale you are measuring; something is very wrong on your system). It can deliver on the order of 1 GB/s (total).
--pipepart is, on the contrary, highly efficient. It can deliver on the order of 1 GB/s per core, provided your disk is fast enough. This should be the most efficient way of processing data.txt1. It will split data.txt1 into one block per CPU core and feed each block to a wc -l running on its own core:
parallel --block -1 --pipepart -a data.txt1 wc -l
You need GNU Parallel version 20161222 or later for --block -1 to work.
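If you are not sure which version you have, a quick check (the exact output format may vary between releases):
parallel --version | head -1    # should report 20161222 or later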
These are timings from my old dual-core laptop. seq 200000000 generates 1.8 GB of data.
$ time seq 200000000 | LANG=C wc -c
1888888898
real 0m7.072s
user 0m3.612s
sys 0m2.444s
$ time seq 200000000 | parallel --pipe LANG=C wc -c | awk '{s+=$1} END {print s}'
1888888898
real 1m28.101s
user 0m25.892s
sys 0m40.672s
The time here is mostly due to GNU Parallel spawning a new wc -c for each 1 MB block. Increasing the block size makes it faster:
$ time seq 200000000 | parallel --block 10m --pipe LANG=C wc -c | awk '{s+=$1} END {print s}'
1888888898
real 0m26.269s
user 0m8.988s
sys 0m11.920s
$ time seq 200000000 | parallel --block 30m --pipe LANG=C wc -c | awk '{s+=$1} END {print s}'
1888888898
real 0m21.628s
user 0m7.636s
sys 0m9.516s
As mentioned, --pipepart is much faster if you have the data in a file:
$ seq 200000000 > data.txt1
$ time parallel --block -1 --pipepart -a data.txt1 LANG=C wc -c | awk '{s+=$1} END {print s}'
1888888898
real 0m2.242s
user 0m0.424s
sys 0m2.880s
So on my old laptop I can process 1.8 GB in 2.2 seconds.
If you have only one core and your work is CPU-bound, then parallelizing will not help you. Parallelizing on a single-core machine can make sense if most of the time is spent waiting (e.g. waiting for the network).
However, the timings from your computer tell me something is very wrong there. I recommend you test your program on another computer.
In short, yes. You will need more physical cores on the machine to benefit from parallel. Just to understand your task, the following is what you intend to do:
file1 is a 10,000,000 line file
split into 4 files >
file1.1 > processing > output1
file1.2 > processing > output2
file1.3 > processing > output3
file1.4 > processing > output4
>> cat output* > output
________________________________
And you want to parallelize the middle part and run it on 4 cores simultaneously. Am I correct? I think you can use GNU parallel in a much better way: write the code for one of the files and use that command with (pseudocode warning):
parallel --jobs 4 "processing code on the file segments with sequence variable {}" ::: 1 2 3 4
Here --jobs (-j) sets the number of jobs to run simultaneously.
UPDATE
Why are you trying the parallel command for the sequential execution within your file1.1, file1.2, file1.3 and file1.4? Let that remain regular sequential processing, as you have coded:
parallel 'for i in $(seq 1 250000);do cat file1.{} >> output{}.txt;done' ::: 1 2 3 4
The above code will run your 4 segmented files from csplit in parallel on 4 cores; it expands to:
for i in $(seq 1 250000);do cat file1.1 >> output1.txt;done
for i in $(seq 1 250000);do cat file1.2 >> output2.txt;done
for i in $(seq 1 250000);do cat file1.3 >> output3.txt;done
for i in $(seq 1 250000);do cat file1.4 >> output4.txt;done
I am pretty sure that --pipepart, as suggested above by Ole, is the better way to do it, given that you have high-speed data access to the disk.

Multithreaded program only runs on a single processor after compiling, how do I troubleshoot?

I am trying to run a compiled program that is supposed to run on multiple processors. But with the same data, sometimes this program runs in parallel and sometimes it doesn't (with the identical PBS script file!). I suspect that something is wrong with some of the compute nodes that prevents it from running in parallel (I don't get to choose the compute node I want). How can I troubleshoot whether this is a bug in the program or a problem with the compute node?
As per the sysadmin's advice, I am using ulimit -s 100000, but this doesn't change anything. Also, this program is not an MPI program (it runs only on a single node, with multiple processors).
The code that I run is as follows:
quorum_error_correct_reads -q 68 \
--contaminant=/data004/software/GIF/packages/masurca/2.3.0rc1/bin/../share/adapter.jf \
-m 1 -s 1 -g 1 -a 3 --thread=32 -w 10 -e 3 \
quorum_mer_db.jf aa.renamed.fastq ab.renamed.fastq ac.renamed.fastq ad.renamed.fastq ae.renamed.fastq af.renamed.fastq ag.renamed.fastq \
--no-discard -o pe.cor --verbose
Thanks for any advice you can offer. I will greatly appreciate your help!
PS: I don't have sudo access.
EDIT: I know it is supposed to be using multiple processors because, when I SSH into the node and run top -c, I can see the above command sometimes running at around 3200% CPU (all the time) and sometimes at only 100% CPU all the time. This is the only step involved and there are no other sub-processes within this program. Also, I am using an HPC cluster, where I submit the job to a compute node, each with 32 processors and 512 GB RAM.
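One thing worth checking from the node itself (assuming it runs Linux, and replacing PID with the process ID you see in top) is whether the process is actually allowed to use all 32 cores and how many threads it has started; a rough sketch:
nproc                                      # CPUs visible on the node
grep Cpus_allowed_list /proc/PID/status    # which CPUs this process may run on
grep Threads /proc/PID/status              # how many threads it has actually spawned
top -H -p PID                              # per-thread CPU usage
If Cpus_allowed_list shows only a single CPU, the job's cpuset or CPU affinity is the problem rather than the program itself.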

Mounting VMDK disk image

I have a single VMware disk image file with a .vmdk extension.
I am trying to mount this and explore all of the partitions (including hidden ones).
I've tried to follow several guides, such as: http://forums.opensuse.org/showthread.php/469942-mounting-virtual-box-machine-images-host
I'm able to mount the image using vdfuse
vdfuse -w -f windows.vmdk /mnt/
After this I can see one partition and an entire disk exposed
# ll /mnt/
total 41942016
-r-------- 1 te users 21474836480 Feb 28 14:16 EntireDisk
-r-------- 1 te users 1569718272 Feb 28 14:16 Partition1
Continuing with the guide I try to mount either EntireDisk or Partition1 using
mount -o loop,ro /mnt/Partition1 mnt2/
But that gives me the error 'mount: you must specify a filesystem type'
In trying to find the correct type I tried
dd if=/mnt/EntireDisk | file -
which outputs a ton of information but of note is:
/dev/stdin: x86 boot sector; partition 1: ....... FATs ....
So I tried to mount it as vfat, but that gave me:
mount: wrong fs type, bad option, bad superblock ...etc
What am I doing wrong?
For newer Linux systems, you can use guestmount to mount the third partition within a VMDK image:
guestmount -a xyz.vmdk -m /dev/sda3 --ro /mnt/vmdk
Alternatively, to autodetect and mount an image (less reliable), you can try:
guestmount -a xyz.vmdk -i --ro /mnt/vmdk
Do note that the flag --ro simply mounts the image as read-only; to mount the image as read-write, just replace it with the flag --rw.
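When you are done, recent libguestfs versions ship a matching guestunmount command for cleanly detaching the FUSE mount (fusermount -u works as well):
guestunmount /mnt/vmdk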
Installation
guestmount is contained in following packages per distro:
Ubuntu: libguestfs-tools
OpenSuse: guestfs-tools
CentOS / Fedora: libguestfs-tools-c
Troubleshooting
error: could not create appliance through libvirt
$ guestmount -a file.vmdk -i --ro /mnt/guest
libguestfs: error: could not create appliance through libvirt.
Try running qemu directly without libvirt using this environment variable:
export LIBGUESTFS_BACKEND=direct
Original error from libvirt: Cannot access backing file '/path/to/file.vmdk' of storage file '/tmp/libguestfssF6WKX/overlay1.qcow2' (as uid:107, gid:107): Permission denied [code=38 int1=13]
Solution: use LIBGUESTFS_BACKEND=direct, as suggested:
LIBGUESTFS_BACKEND=direct guestmount -a file.vmdk -i --ro /mnt/guest
fusermount: user has no write access to mountpoint
LIBGUESTFS_BACKEND=direct guestmount -a file.vmdk -i --ro /mnt/guest/
fusermount: user has no write access to mountpoint /mnt/guest
libguestfs: error: fuse_mount failed: /mnt/guest/, see error messages above
Solution: use sudo, or change file permissions on the mountpoint
You can also use qemu:
For .vdi disks
sudo modprobe nbd
sudo qemu-nbd -c /dev/nbd1 ./linux_box/VM/image.vdi
If the qemu tools are not installed, you can install them (on Ubuntu) with:
sudo apt install qemu-utils
and then mount it with:
mount /dev/nbd1p1 /mnt
For .vmdk disks
sudo modprobe nbd
sudo qemu-nbd -r -c /dev/nbd1 ./linux_box/VM/image.vmdk
Notice that I use the -r option; that is because VMDK version 3 must be read-only to be mountable by qemu.
and then I mount it with
mount /dev/nbd1p1 /mnt
I use nbd1, because nbd0 sometimes gives: 'mount: special device /dev/nbd0p1 does not exist'
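When you are finished with the image, a typical cleanup for the qemu-nbd approach looks like this (same device name as above):
umount /mnt
sudo qemu-nbd -d /dev/nbd1    # disconnect the image from the NBD device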
For .ova disks
tar -tf image.ova
tar -xvf image.ova
The first command lists the contents of the .ova archive; the second extracts the .vmdk disk, which you can then mount as described above.
Install affuse, then mount using it.
affuse /path/file.vmdk /mnt/vmdk
The raw disk image is now found under /mnt/vmdk.
Check its sector size and partition layout:
fdisk -l /mnt/vmdk/file.vmdk.raw
# example
Disk file.vmdk.raw: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000da525
Device Boot Start End Sectors Size Id Type
/mnt/vmdk/file.vmdk.raw1 * 2048 41943039 41940992 20G 83 Linux
Multiply the sector size by the start sector. In the example it would be 2048*512:
echo '2048*512' | bc
1048576
Mount the raw file using that offset:
mount -o ro,loop,offset=1048576 /mnt/vmdk/file.vmdk.raw /mnt/vmdisk
The disk should now be mounted and readable on /mnt/vmdisk.
Here is an answer from commandlinefu.com that worked for me:
kpartx -av <image-flat.vmdk>; mount /dev/mapper/loop0p1 /mnt/vmdk
You can also activate LVM volumes in the image by running
vgchange -a y
and then you can mount the LV inside the image.
To unmount the image, umount the partition/LV, deactivate the VG for the image
vgchange -a n <volume_group>
then run
kpartx -dv <image-flat.vmdk>
to remove the partition mappings.
You can take a look in this article for a download link for VMware Virtual Disk Development Kit (VDDK). Once downloaded and installed:
vmware-mount -p path_to_vmdk will show the partitions inside the VMDK file. For example:
Nr Start Size Type Id System
-- ---------- ---------- ---- -- ------------------------
1 2048 461371392 BIOS 83 Linux
Then just do:
sudo vmware-mount path_to_vmdk 1 /mnt/mount_point
I tried guestmount, but it is very, very slow. Underneath it creates a virtual machine, uses KVM and so on. Crazy stuff, slow as hell.
Have you got the software package for ntfs?
Try
apt-get install ntfs-3g
on debian based systems.

Wget Hanging, Script Stops

Evening,
I am running a lot of wget commands using xargs
cat urls.txt | xargs -n 1 -P 10 wget -q -t 2 --timeout 10 --dns-timeout 10 --connect-timeout 10 --read-timeout 20
However, once the file has been parsed, some of the wget instances 'hang.' I can still see them in system monitor, and it can take about 2 minutes for them all to complete.
Is there any way I can specify that the instance should be killed after 10 seconds? I can re-download all the URLs that failed later.
In system monitor, the wget instances are shown as sk_wait_data when they hang. xargs is there as 'do_wait,' but wget seems to be the issue, as once I kill them, my script continues.
I believe this should do it:
wget -v -t 2 --timeout 10
According to the docs:
--timeout: Set the network timeout to seconds seconds. This is equivalent to specifying
--dns-timeout, --connect-timeout, and --read-timeout, all at the same time.
Check the verbose output too and see more of what it's doing.
Also, you can try:
timeout 10 wget -v -t 2
Or you can do what timeout does internally:
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec wget -v -t 2 )
(As seen in: BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
GNU Parallel can download in parallel, and retry the process after a timeout:
cat urls.txt | parallel -j10 --timeout 10 --retries 3 wget -q -t 2
If the time it takes to fetch a URL varies (e.g. due to a faster internet connection), you can let GNU Parallel figure out the timeout:
cat urls.txt | parallel -j10 --timeout 1000% --retries 3 wget -q -t 2
This will make GNU Parallel record the median time for a successful job and set the timeout dynamically to 10 times that.
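Since you plan to re-download the failed URLs later, you can also have GNU Parallel record each job in a job log and rerun only the failures. This is a sketch using the --joblog and --retry-failed options; check that your installed version supports --retry-failed, as it is relatively recent:
cat urls.txt | parallel -j10 --timeout 10 --joblog wget.log wget -q -t 2
# later: rerun only the jobs that failed or timed out, reusing the same log
parallel --retry-failed --joblog wget.log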

PBS scheduler assigning same processor for an MPI program of 3 processors

I am doing MPI programming on a cluster with 8 nodes, each with an Intel Xeon hex-core processor. I need three processors for my MPI code.
I submit the job using qsub. When I check on which processors the job is running using "qstat -n" it says something like cn004/0*3 .
So does this mean it is running on only one processor?
Because it is not speeding up compared to when I use a single processor (the domain size is the same in both cases).
The script I use for submitting is as follows:
#! /bin/bash
#PBS -o logfile.log
#PBS -e errorfile.err
#PBS -l cput=40:00:00
#PBS -lselect=1:ncpus=3:ngpus=3
#PBS -lplace=excl
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpicc -g -W -c -I /usr/local/cuda/include mpi1.c
mpicc -g -W mpi1.o -L /usr/local/cuda/lib64 -lOpenCL
mpirun -np 3 ./a.out
"qstat -n" it says something like cn004/0*3.
Q: So does this mean it is running it on only one processor ??
The short answer is "no". This does not mean that it runs on one processor.
"cn004/0*3" should be interpreted as "The job is allocated three cpu cores. And if we were to number the cores from 0 to 5 then the cores allocated would have numbers 0,1,and 2".
If another job were to run on the node it would receive the next three consecutive numbers "3,4, and 5". In the qstat -n output this would look like "cn004/3*3".
You use the directive place=excl to ensure that other jobs cannot get the node, so essentially all six cores are available to your job.
Now for your second question:
Q: It is not speeding up compared to when I use a single processor.
In order to answer this question we need to know if the algorithm is parallelized correctly.
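A quick, algorithm-independent sanity check is to confirm that three ranks really start on the allocated cores, for example by launching a trivial command with the same mpirun line in the job script:
# should print the node name three times, once per MPI rank
mpirun -np 3 hostname
# while the real job runs, top on the compute node should show three a.out
# processes, each close to 100% CPU, if the ranks are genuinely working in parallel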
