Graphite: some metrics lost, but seen in tcpdump

I've been using Graphite for quite a long time, and this is the first time I'm facing an issue with some metrics getting… lost?
Through tcpdump -nA dst port 2003 I can see that the metrics are delivered to the Graphite node.
Some of them also get created in the whisper database and show up in /var/log/carbon/updates.log.
But most of them do not appear anywhere.
So my question is: how do I debug this? How do I prove that Graphite really receives these metrics from eth0?
I couldn't find any carbon debug logs other than updates.log.
Log:
sudo tcpdump -An dst port 2003 | grep 172_31_00_01 | grep requests
backend.dev.172_31_00_01.requests.max 60554.34 1453734067
backend.dev.172_31_00_01.requests.mean 16714.87 1453734067
backend.dev.172_31_00_01.requests.min 2.93 1453734067
backend.dev.172_31_00_01.requests.stddev 12185.74 1453734067
backend.dev.172_31_00_01.requests.p50 16415.87 1453734067
backend.dev.172_31_00_01.requests.p75 20314.51 1453734067
backend.dev.172_31_00_01.requests.p95 41526.36 1453734067
backend.dev.172_31_00_01.requests.p98 54370.59 1453734067
backend.dev.172_31_00_01.requests.p99 60368.68 1453734067
backend.dev.172_31_00_01.requests.p999 60553.31 1453734067
backend.dev.172_31_00_01.requests.count 3141 1453734067
backend.dev.172_31_00_01.requests.m1_rate 2.02 1453734067
backend.dev.172_31_00_01.requests.m5_rate 1.95 1453734067
backend.dev.172_31_00_01.requests.m15_rate 1.20 1453734067
backend.dev.172_31_00_01.requests.mean_rate 0.66 1453734067
backend.dev.172_31_00_01.requests.mark_sessionid_active.max 152.59 1453734067
backend.dev.172_31_00_01.requests.mark_sessionid_active.mean 41.86 1453734067
backend.dev.172_31_00_01.requests.mark_sessionid_active.min 0.82 1453734067
backend.dev.172_31_00_01.requests.mark_sessionid_active.stddev 24.84 1453734067
backend.dev.172_31_00_01.requests.mark_sessionid_active.p75 57.51 1453734067
backend.dev.172_31_00_01.requests.mark_sessionid_active.p95 85.78 1453734067
$ pwd
/var/lib/graphite/whisper/backend/dev/172_31_00_01/requests
$ ls -Rl
.:
total 1796
drwxr-xr-x 2 _graphite _graphite 4096 Jan 25 14:25 mark_sessionid_active
-rw-r--r-- 1 _graphite _graphite 1831744 Jan 25 15:05 mean.wsp
./mark_sessionid_active:
total 3584
-rw-r--r-- 1 _graphite _graphite 1831744 Jan 25 15:05 min.wsp
-rw-r--r-- 1 _graphite _graphite 1831744 Jan 25 15:05 stddev.wsp
PS: It's not a new installation; it has been working for several months now, and no metrics were lost until today.

There is a MAX_CREATES_PER_MINUTE setting in carbon.conf: whisper files for new metrics beyond that limit are not created, which matches the pattern of a few metrics appearing while most do not. Setting it to a high value (like 1000) or inf solves this.
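A minimal sketch of that change, assuming a Debian-style layout (the config path and service name vary by install):
# raise (or disable) the create throttle; the path is an assumption
sudo sed -i 's/^MAX_CREATES_PER_MINUTE.*/MAX_CREATES_PER_MINUTE = inf/' /etc/carbon/carbon.conf
sudo service carbon-cache restart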

Check the LOG_DIR variable in carbon.conf. In my case it's /var/log/carbon/, and I can see a lot of logs in there, like console.log, creates.log, and listener.log. I believe creates.log is the one you are after.
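To watch creates as they happen (same LOG_DIR assumption as above), filtering for the host from the question:
# --line-buffered keeps grep from sitting on output while tailing
tail -f /var/log/carbon/creates.log | grep --line-buffered 172_31_00_01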
If the .wsp file is created but you cannot see it in Graphite directly, try rendering it via the URL API and see if that works.
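A hedged example of such a render call, with graphite.example.com standing in for your Graphite web host:
curl 'http://graphite.example.com/render?target=backend.dev.172_31_00_01.requests.mean&from=-1h&format=json'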

How do I synchronize my gluster replicated volumes?

I want to use a Gluster replicated volume for SQLite DB storage.
However, when the .db file is updated, Linux does not detect the change, so the bricks never synchronize.
Is there a way to force a sync?
It is not synchronized even if I use the gluster volume heal command.
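For reference, the standard heal invocations look like this (the 'full' variant crawls every file rather than only those already flagged as needing heal):
gluster volume heal sync_test full   # force a full crawl of the volume
gluster volume heal sync_test info   # list entries still pending heal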
< My Gluster volume status >
[root@be-k8s-worker-1 common]# gluster volume create sync_test replica 2 transport tcp 10.XX.XX.X1:/home/common/sync_test 10.XX.XX.X2:/home/common/sync_test
Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid this. See: http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/.
Do you still want to continue?
(y/n) y
volume create: sync_test: success: please start the volume to access data
[root@be-k8s-worker-1 common]# gluster volume start sync_test
volume start: sync_test: success
[root@be-k8s-worker-1 sync_test]# gluster volume status sync_test
Status of volume: sync_test
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.XX.XX.X1:/home/common/sync_test 49155 0 Y 1142
Brick 10.XX.XX.X2:/home/common/sync_test 49155 0 Y 2134
Self-heal Daemon on localhost N/A N/A Y 2612
Self-heal Daemon on 10.XX.XX.X1 N/A N/A Y 4257
Task Status of Volume sync_test
------------------------------------------------------------------------------
There are no active volume tasks
< Problem Case >
[root@be-k8s-worker-1 sync_test]# ls -al ## client 1
total 20
drwxrwxrwx. 4 root root 122 Oct 17 10:51 .
drwx------. 8 sbyun domain users 4096 Oct 17 10:50 ..
-rw-r--r--. 1 root root 0 Oct 17 10:35 test
-rwxr--r--. 1 sbyun domain users 16384 Oct 17 10:52 test.d
[root@be-k8s-worker-1 sync_test2]# ls -al ## client 2
total 20
drwxrwxrwx. 4 root root 122 Oct 17 10:51 .
drwx------. 8 sbyun domain users 4096 Oct 17 10:50 ..
-rw-r--r--. 1 root root 0 Oct 17 10:35 test
-rwxr--r--. 1 sbyun domain users 16384 Oct 17 10:52 test.db
## diff -> No result
[root@be-k8s-worker-1 user]# diff sync_test/test.db sync_test2/test.db
But if I compare the same file on Windows, the contents differ:
[screenshot: Windows file-comparison tool showing the two test.db copies differ]
My SQLite database was set to WAL mode, so the -wal file was being updated and the .db file itself was not immediately synced.
I turned off WAL mode with this command:
PRAGMA journal_mode=DELETE;
I confirmed that it was synced immediately.
According to the SQLite documentation, WAL does not work over a network filesystem:
All processes using a database must be on the same host computer; WAL does not work over a network filesystem.
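The same change from the command line, as a sketch (assumes the sqlite3 CLI and the test.db file on the mounted volume from above):
sqlite3 sync_test/test.db 'PRAGMA journal_mode=DELETE;'   # prints the new mode: delete
sqlite3 sync_test/test.db 'PRAGMA journal_mode;'          # verify it persisted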

Discrepancy in disk usage

I'm troubleshooting an issue with disk usage on a CentOS system (one of the partitions was growing too fast), and I noticed that one of my directories uses 3.1GB:
$ du -hs /var/log/mongodb/
3.1G /var/log/mongodb/
$ df -h /var/log/mongodb/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-log 4.0G 3.7G 324M 93% /var/log
However, when I analyse the directory contents, I realize it has only one file, and that file is not that large (2.1GB):
$ ls -larth /var/log/mongodb/
total 3.1G
drwxr-xr-x 2 mongod mongod 24 Jul 2 2019 .
drwxr-xr-x. 22 root root 4.0K May 1 03:50 ..
-rw-r----- 1 mongod mongod 2.1G May 1 08:41 mongod.log
How can this happen?
Stat command:
$ stat /var/log/mongodb/mongod.log
File: ‘/var/log/mongodb/mongod.log’
Size: 2448779949 Blocks: 4880912 IO Block: 4096 regular file
Device: fd08h/64776d Inode: 6291527 Links: 1
Access: (0640/-rw-r-----) Uid: ( 996/ mongod) Gid: ( 994/ mongod)
Access: 2020-05-01 10:02:57.136265481 +0000
Modify: 2020-05-04 10:05:37.409626901 +0000
Change: 2020-05-04 10:05:37.409626901 +0000
Birth: -
Another example in another host:
$ df -kh | grep var
/dev/dm-3 54G 52G 2.1G 97% /var
$ du -khs /var/
25G /var/
Is this somehow related to the difference between file size and the actual space occupied on disk (due to disk blocks)? If so, how can I perform a defragmentation/optimization?
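One way to narrow this down, as a sketch rather than a confirmed diagnosis for this host: compare apparent file sizes with allocated blocks, and check for deleted-but-still-open files, which du no longer sees but df still counts (lsof must be installed):
du -hs --apparent-size /var/log/mongodb/   # sum of file lengths
du -hs /var/log/mongodb/                   # allocated blocks; differs for sparse or preallocated files
lsof +aL1 /var/log                         # open files on this filesystem with link count 0 (deleted)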

fslmaths subtraction function gives error: output can't be found

The command will not create the output 'imdiff'; instead it says it can't find imdiff.
dyn896-105:intro sophiejacobs$ pwd
/Users/sophiejacobs/Downloads/preCourse/intro
dyn896-105:intro sophiejacobs$ ls
LThal_mask_func.nii.gz filtered_func_data.nii.gz
LThal_mask_std.nii.gz highres.nii.gz
bighead.nii.gz image0.nii.gz
bvals image1.nii.gz
cst2standard_73_46_26.nii.gz newfmri.nii.gz
diffdata.nii.gz standard.nii.gz
egepi.nii.gz structural.nii.gz
egfmri.nii.gz sub3m0.nii.gz
example_func.nii.gz thresh_zstat1.nii.gz
example_func2highres.mat wrapped.nii.gz
example_func2standard.mat
dyn896-105:intro sophiejacobs$ fslmaths image0 -sub image1 imdiff
libc++abi.dylib: terminating with uncaught exception of type NiftiIO::NiftiException: Error: cant open file imdiff.nii.gz
Abort trap: 6
I expected that image1 would be subtracted from image0 and that the new image would be called imdiff.
I think I had the same error and came across your post when trying to fix it:
NBrMBP:intro colette$ fslmaths image0 -sub image1 imdiff
libc++abi.dylib: terminating with uncaught exception of type NiftiException: Error: cant open file imdiff.nii.gz
Luckily my supervisor was able to help.
I did not have write permission on the directories/files that I downloaded from the FSL tutorial website, but it worked after I changed the permissions:
cd to the folder that has the files you are trying to use (for me this was a folder called preCourse)
Use the command: ls -la
This lists details of the files in the directory. The left-hand column shows the permissions, dr-xr-xr-x@, which means I only had permission to read the files.
For example:
NBrMBP:preCourse colette$ ls -la
total 16
dr-xr-xr-x@ 3 colette staff 102 21 Jul 2017 fmri
dr-xr-xr-x@ 23 colette staff 782 21 Jul 2017 intro
Use the command: chmod u+w <directoryname>
Run this from the folder above the one you need permissions for (u = user, w = write permission).
E.g.: NBrMBP:preCourse colette$ chmod u+w intro
Now use ls -la again; it should show drwxr-xr-x@ in the left-hand column.
For example:
NBrMBP:preCourse colette$ ls -la
total 16
drwx------@ 5 colette staff 170 8 Jan 11:55 .
drwxr-xr-x 4 colette staff 136 8 Jan 11:55 ..
-rw-r--r--@ 1 colette staff 6148 8 Jan 11:55 .DS_Store
drwxr-xr-x@ 3 colette staff 102 21 Jul 2017 fmri
drwxr-xr-x@ 26 colette staff 884 8 Jan 15:51 intro
*Note: you may need admin permissions on your computer to change the read/write/execute permissions of the files.
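If many subfolders are affected, a recursive chmod can save time (a sketch, assuming the tutorial data lives under ~/Downloads/preCourse as in the question):
chmod -R u+w ~/Downloads/preCourse   # add owner write permission to the whole tree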
Hope this helps anyone else who searched the error code!

How many websites should I host on an EC2 nano instance?

I am developing a couple of websites, but I have only paid for an EC2 nano instance on AWS. How many websites could I possibly host there, assuming the websites will only have minimal traffic? Most of the websites are for personal use only.
Only one way to find out ;)
No definite answer is possible, because it depends on a lot of factors.
But if traffic is really low, you will only be limited by the amount of disk space, and since the t2.nano runs on EBS storage this can be as big as you want. So you could fit a lot of websites!
The t2.nano has only 512MB of memory, so it's best to pick a not-so-memory-hungry web server such as nginx.
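To see where the 512MB goes once everything is running, something like this works on Linux (GNU ps syntax):
free -m                              # overall memory and swap usage
ps -eo rss,comm --sort=-rss | head   # processes by resident memory, largest first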
I run five very low traffic websites on my t2.nano: four of them WordPress, one custom PHP. I run Nginx, PHP 5.6, and MySQL 5.6 on the same instance. Traffic is extremely light, in the region of 2000 pages a day, which averages out to roughly a page every 45 seconds. If you include static resources it'll be higher. CloudFlare runs as the CDN, which reduces static resource consumption significantly, but doesn't cache pages.
I have MySQL on the instance configured to use very little memory, currently 141MB physical RAM. Nginx takes around 10MB RAM. I have four PHP workers, each taking 150MB RAM, but of that 130MB is shared, so it's really 20MB per worker after the first.
Here's the output of a quick performance test on the t2.nano. Note that the Nginx page cache will be serving all of the pages.
siege -c 50 -t10s https://www.example.com -i -q -b
Lifting the server siege... done.
Transactions: 2399 hits
Availability: 100.00 %
Elapsed time: 9.60 secs
Data transferred: 14.82 MB
Response time: 0.20 secs
Transaction rate: 249.90 trans/sec ***
Throughput: 1.54 MB/sec
Concurrency: 49.42
Successful transactions: 2399
Failed transactions: 0
Longest transaction: 0.36
Shortest transaction: 0.14
Here it is with nginx page caching turned off:
siege -c 5 -t10s https://www.example.com -i -q -b
Lifting the server siege... done.
Transactions: 113 hits
Availability: 100.00 %
Elapsed time: 9.99 secs
Data transferred: 0.70 MB
Response time: 0.44 secs
Transaction rate: 11.31 trans/sec ***
Throughput: 0.07 MB/sec
Concurrency: 4.95
Successful transactions: 113
Failed transactions: 0
Longest transaction: 0.70
Shortest transaction: 0.33

How to find the working directory of a running process (HP-UX/Solaris/Linux/AIX)

I'm trying to get the home directory of a running process.
For Linux, I learned that I could use the /proc/PID/exe information, but I think that information does not exist on the other OSes.
Assuming the executable's location cannot be inferred from $PATH, can you let me know how I can get the home directory of the running process?
I have to assume that utility availability on these OSes is very limited, meaning I should only use very common commands.
Condition:
No special utility such as lsof.
Added
The process I am referring to is one started by a third-party application.
Thanks in advance.
The first column of ps -ef (the most commonly useful options, specified by POSIX) gives you the process owner, usually as a name (sometimes only as a uid number). For example:
UID PID PPID C STIME TTY TIME CMD
statd 1935 1 0 04:00 ? 00:00:00 /sbin/rpc.statd
101 2329 1 0 04:00 ? 00:00:00 /usr/bin/dbus-daemon --system
daemon 2511 1 0 04:00 ? 00:00:00 /usr/sbin/atd
avahi 2540 1 0 04:01 ? 00:00:00 avahi-daemon: running [vmw-de>
avahi 2541 2540 0 04:01 ? 00:00:00 avahi-daemon: chroot helper
bind 2593 1 0 04:01 ? 00:00:00 /usr/sbin/named -u bind
kdm 2781 2780 0 04:01 ? 00:00:01 /usr/lib/kde4/libexec/kdm_gre>
www-data 2903 2782 0 04:01 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2904 2782 0 04:01 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2905 2782 0 04:01 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2906 2782 0 04:01 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2908 2782 0 04:01 ? 00:00:00 /usr/sbin/apache2 -k start
ntp 2989 1 0 04:01 ? 00:00:00 /usr/sbin/ntpd -p /var/run/nt>
postgres 3059 1 0 04:01 ? 00:00:00 /usr/lib/postgresql/9.1/bin/p>
postgres 3063 3059 0 04:01 ? 00:00:00 postgres: writer process >
postgres 3064 3059 0 04:01 ? 00:00:00 postgres: wal writer process >
postgres 3065 3059 0 04:01 ? 00:00:00 postgres: autovacuum launcher>
postgres 3066 3059 0 04:01 ? 00:00:00 postgres: stats collector pro>
104 3555 1 0 04:01 ? 00:00:00 /usr/sbin/exim4 -bd -q30m
gitlog 3677 3676 0 04:01 ? 00:00:00 svlogd -tt /var/log/git-daemon
116 3679 3676 0 04:01 ? 00:00:00 /usr/lib/git-core/git-daemon
The process owner's name appears in /etc/passwd as the first field, and the uid number as the third. Fields in /etc/passwd are delimited by colons (:). For example:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
irc:x:39:39:ircd:/var/run/ircd:/bin/sh
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
libuuid:x:100:101::/var/lib/libuuid:/bin/sh
messagebus:x:101:105::/var/run/dbus:/bin/false
colord:x:102:106:colord colour management daemon,,,:/var/lib/colord:/bin/false
usbmux:x:103:46:usbmux daemon,,,:/home/usbmux:/bin/false
Debian-exim:x:104:111::/var/spool/exim4:/bin/false
statd:x:105:65534::/var/lib/nfs:/bin/false
avahi:x:106:114:Avahi mDNS daemon,,,:/var/run/avahi-daemon:/bin/false
In this example, the entry for statd is:
statd:x:105:65534::/var/lib/nfs:/bin/false
The next-to-last field of /etc/passwd is the home directory of the process owner, e.g., /var/lib/nfs for the statd process.
Some system processes have no home directory of their own; you may see /usr/sbin on Linux systems, or some other directory that several processes share.
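Putting the two steps together, a hedged sketch using only POSIX ps and awk, with PID 1935 (the statd example above):
pid=1935
owner=$(ps -o user= -p "$pid" | awk '{print $1}')          # print only the owner, no header
awk -F: -v u="$owner" '$1 == u { print $6 }' /etc/passwd   # field 6 = home directory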
Further reading:
POSIX ps describes the standardized options, implemented on these systems:
HPUX ps
Solaris ps
Linux ps
AIX ps
passwd(5) shows the file-format of /etc/passwd
The OP amended the question to indicate that the current working directory (rather than the home directory) is wanted. Systems with a proc filesystem can provide this information: Solaris, AIX, and Linux.
However, HP-UX does not (see for example /proc on HP-UX?, which says the pstat system call can be used). Reading its manual page I did not see a way, but the link below says pstat_getpathname will work.
AIX supports it, according to IBM documentation.
On systems with a proc filesystem, look for the cwd "file" under /proc/PID to find the working directory of a given process.
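For example, on Linux the cwd symlink can be read directly; pwdx is common on Linux (procps) and Solaris, though it is not POSIX:
ls -l /proc/1935/cwd   # the symlink target is the working directory
pwdx 1935              # prints '1935: /some/dir' where available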
Further reading:
Find out current working directory of a running process?
current working directory of process
Get full path of executable of running process on HPUX
A sample of a program that can find itself (getcwd getenv)
