Graphite does not show old stats

We are using Graphite to store stats about our websites. Everything works fine when we look at the data for the last 24 hours or 7 days, but when we try to look at the last month, Graphite does not show any data.
We collect the data for one metric every 5 minutes and for the others once an hour.
When I use the GUI, this "query" works:
width=1188&height=580&target=identifierXYP.value&lineMode=connected&from=-8days
And this one does not return any data:
width=1188&height=580&target=identifierXYP.value&lineMode=connected&from=-9days
The only thing that changed was the "from" part.
I already ran
find ./ -type f -name '*.wsp' -exec whisper-resize.py --nobackup {} 5m:365d \;
but it did not help.
whisper-info.py value.wsp outputs:
maxRetention: 157680000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 2521504
Archive 0
retention: 691200
secondsPerPoint: 10
points: 69120
size: 829440
offset: 64
Archive 1
retention: 2678400
secondsPerPoint: 60
points: 44640
size: 535680
offset: 829504
Archive 2
retention: 31536000
secondsPerPoint: 600
points: 52560
size: 630720
offset: 1365184
Archive 3
retention: 157680000
secondsPerPoint: 3600
points: 43800
size: 525600
offset: 1995904
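A side note with my own arithmetic, not part of the original question: Archive 0 holds the 10-second points, and its retention of 691200 seconds works out to exactly 8 days:
$ echo '691200 / 86400' | bc
8
That matches the boundary where the graph goes blank. Requests reaching back further than 8 days are served from Archive 1's 60-second points, so if aggregation into that archive fails, for example because the 0.5 xFilesFactor is rarely met when a metric is only collected every 5 minutes against 10-second slots, anything older than 8 days comes back empty.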

Related

Graphite/Carbon not retaining data for some statistics

I have a Carbon/Graphite stack with some very basic retention schemas set up. These retention periods work fine, apart from a couple of statistics, which only appear to last for a week.
My storage-schemas.conf:
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[collectd]
pattern = ^collectd.*
retentions = 10s:2d,1m:14d,5m:1y
And my storage-aggregation.conf:
[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min
[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max
[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average
All stats arrive prefixed with collectd., so the retention patterns are correct. When viewing an affected dashboard in Grafana, I see the following in Graphite's cache.log:
Thu Oct 13 11:25:16 2016 :: CarbonLink cache-query request for collectd.host_domain_com.openstack-keystone-totals.gauge-users-count returned 0 datapoints
Using whisper-info.py on an affected .wsp shows the following:
maxRetention: 31536000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 1710772
Archive 0
retention: 172800
secondsPerPoint: 10
points: 17280
size: 207360
offset: 52
Archive 1
retention: 1209600
secondsPerPoint: 60
points: 20160
size: 241920
offset: 207412
Archive 2
retention: 31536000
secondsPerPoint: 300
points: 105120
size: 1261440
offset: 449332
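As a cross-check (my own arithmetic, not from the original post): these archives match the [collectd] schema 10s:2d,1m:14d,5m:1y exactly, since 17280 points x 10 s = 2 days, 20160 x 60 s = 14 days and 105120 x 300 s = 1 year, so the retention configuration itself was applied correctly and the problem must lie elsewhere.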
Can anyone suggest anything I may have missed?
So the answer to this comes from a couple of issues. Firstly, the data points are being submitted with -count on the end of the name instead of .count, so the [sum] pattern never matches and the [default_average] aggregation is applied to the data instead. Secondly, because we're not submitting data every 10 seconds, and because the default has an xFilesFactor of 0.5, the data is discarded when it hits the first retention boundary: with fewer than 50% of the expected data points present, a null value is stored instead.
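For anyone hitting the same thing, a possible cleanup, sketched under the assumption that the --xFilesFactor and --aggregationMethod flags of whisper-resize.py (from the whisper package) are available and that the files live under the usual carbon storage path, would be to rewrite the affected files with the intended settings:
find /var/lib/carbon/whisper/collectd -type f -name '*-count.wsp' \
    -exec whisper-resize.py --nobackup --xFilesFactor=0 \
    --aggregationMethod=sum {} 10s:2d 1m:14d 5m:1y \;
The cleaner long-term fix is the one implied by the answer above: submit the stats with .count endings (or extend the [sum] pattern to also match -count$) so that new files get the right settings from the start.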

How to calculate a moving/rolling weekly (7 days) sum

Please help me calculate a moving/rolling weekly sum of Amount ($4) per Distributor ($3), rolling by date.
I want to set variables like
RollingStartDate=01/05/2015, RollingInterval=7 and RollingEndDate=08/05/2015
For Example :
1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: 8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for 8th May 2015 Rolling 7 Days data set
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
I am able to obtain the above output from this command (run on the extracted 7-day subset):
awk -F, '{key=$3; b[key]+=$4} END {for(i in b) print i","b[i]}'
Kindly suggest how to derive the weekly split-up data sets and then sum them.
Desired Output:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
Edit#1
1.
The logic is to find the sum of the Amount billed to each distributor over a 7-day range. For example, to calculate the sum for 1st May I need to consider the line items from 1st May, 30th Apr, 29th Apr, 28th Apr, 27th Apr, 26th Apr and 25th Apr, i.e. 1st May minus 6 days back. Likewise, the 2nd May rolling window runs from 2nd May back to 26th April (2nd May minus 6 days back).
2.
Date format is DD/MM/YYYY, so 02/05/2015 is 2nd May.
Since the file contains 2 to 3 months of details, I don't want to select the first date (25/04/2015) from the file and then do the minus-6-days analysis; hence "RollingStartDate" says from which date onwards to consider the data, and "RollingInterval" allows rolling back 7 days, 14 days or 30 days (monthly) for the analysis.
"RollingEndDate" helps exclude any future-dated data the file may contain; in this case, line items dated 9th or 15th May would need to be excluded.
Here's a solution that just excludes dates that don't have 7 days before them instead of requiring a specific start/stop range:
$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }
{
    # Convert DD/MM/YYYY to epoch seconds.
    endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0","",$2))
    if (begSecs=="") {
        # First output date: first input date plus (window-1) days.
        begSecs = endSecs + ((window-1) * secsPerDay)
    }
    amount[endSecs][$3] += $4
    dists[$3]    # remember every distributor seen (value unused)
}
END {
    for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
        # Sum each distributor's amounts over the trailing window.
        for (dayNr=1; dayNr<=window; dayNr++) {
            rollSecs = currSecs - ((dayNr-1) * secsPerDay)
            for (dist in dists) {
                sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
            }
        }
        for (dist in dists) {
            print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
            delete sum[dist]
        }
    }
}
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145
To use a window size other than 7 days, just set it on the command line:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120
The above uses GNU awk for true 2D arrays and time functions. Hopefully it's clear enough that you can make any modifications you need to include/exclude specific date ranges.

How to list files with a timestamp in their name greater than a specific timestamp in Unix?

Can you please explain how I can accomplish the scenario below in Unix ksh?
I have a job J1 which completes by the time HH:MM. I would like to list all the files created by this job J1. Each file has a timestamp in its name, in the pattern YYYYMMDDHHMMSS_?,
where YYYYMMDD is the date and HHMMSS is the system time. I want to list the files if the job's timestamp is less than the file timestamp; or, since the job creates the files, would the job's timestamp be greater than the file timestamps?
Regards
Ben
You can use something like this (assuming the files listed below):
$ ls -la
total 44K
drwxr-xr-x 2 gp users 4.0K Oct 27 14:56 .
drwxr-xr-x 11 gp users 4.0K Oct 27 14:57 ..
-rw-r--r-- 1 gp users 0 Oct 23 14:45 logfile
-rw-r--r-- 1 gp users 137 Oct 27 15:09 t2t2
prw-r--r-- 1 gp users 0 Oct 23 12:34 testpipe
-rw-r--r-- 1 gp users 0 Oct 23 14:51 tmpfile
-rw-r--r-- 1 gp users 7 Oct 27 14:58 ttt
# Find newer files
$ find . -newer ttt -print
./t2t2
# Find files that are NOT newer
$ find . ! -newer ttt -print
.
./tmpfile
./testpipe
./logfile
./ttt
# You can eliminate the directories (all of them) from the output this way:
$ find . ! -newer ttt ! -type d -print
./tmpfile
./testpipe
./logfile
./ttt
# or this way
$ find . ! -newer ttt -type f -print
Note that the different forms of the "newer" option (like -anewer and -cnewer) do not compare the other files against the same timestamp, so you might have to do a few tests to see which variant suits you better.
If you must use the timestamp in the file name, and the different options of "find", including "-mmin", are not acceptable, then you will have to examine the embedded timestamp of each file name. I suggest checking into these commands:
# You have to escape the < or > signs you use.
$ expr "fabc" \< "cde"
0
$ expr "abc" \< "cde"
1
and this:
FILENAME="ABC_20141026101112.log" ; TIMESTAMP="`expr \"$FILENAME\" : \".*_20\([0-9]\{12\}\).*$\"`";echo $TIMESTAMP
So a "while read" loop, looking at all the file names and comparing their timestamps using the above "expr" compares should do the job. Ideally, I'd try to see if "find" can do the job because reading and examining each file will be slower. If you have thousands of files in that directory, then I would try some other solution. If you are interested in more options, let me know.

Unix: Increment date column by one day in csv file

Help needed. I want to increment the Date column (which is a string) in a csv file by one day.
e.g. (Date Format yyyy-MM-dd)
Col1,Col2,Col3
ABC,001,1900-01-01
XYZ,002,2000-01-01
Expected Output
Col1,Col2,Col3
ABC,001,1900-01-02
XYZ,002,2000-01-02
There's one standard Unix utility that has all the date magic from September 14, 1752 through December 31, 9999 built in: the calendar utility, cal. Instead of reinventing the wheel and doing messy date calculations, we will use its intelligence to our advantage. The basic problem is: given a date, is it the last day of a month? If not, simply increment the day. If yes, reset the day to 1 and increment the month (and possibly the year).
However, the output of cal is unspecified and it may look like this:
$ cal 2 1900
February 1900
Su Mo Tu We Th Fr Sa
1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28
What we would need is a list of days, 1 2 3 ... 28. We can do this by skipping everything up to the "1":
set -- $(cal 2 1900)
while test $1 != 1; do shift; done
Now the number of args gives us the number of days in February 1900:
$ echo $#
28
Putting it all together in a script:
#!/bin/sh
read -r header
printf "%s\n" "$header"
while IFS=,- read -r col1 col2 y m d; do
    case $m-$d in
    (12-31) y=$((y+1)) m=01 d=01;;
    (*)
        set -- $(cal $m $y)
        # Shift away the month and weekday names.
        while test $1 != 1; do shift; done
        # Is the day the last day of a month?
        if test ${d#0} -eq $#; then
            # Yes: increment m and reset d=01.
            m=$(printf %02d $((${m#0}+1)))
            d=01
        else
            # No: increment d.
            d=$(printf %02d $((${d#0}+1)))
        fi
        ;;
    esac
    printf "%s,%s,%s-%s-%s\n" "$col1" "$col2" $y $m $d
done
Running it on this input:
Col1,Col2,Col3
ABC,001,1900-01-01
ABC,001,1900-02-28
ABC,001,1900-12-31
XYZ,002,2000-01-01
XYZ,002,2000-02-28
XYZ,002,2000-02-29
yields
Col1,Col2,Col3
ABC,001,1900-01-02
ABC,001,1900-03-01
ABC,001,1901-01-01
XYZ,002,2000-01-02
XYZ,002,2000-02-29
XYZ,002,2000-03-01
I made one little assumption: The first two columns don't contain a - or escaped comma. If they do, the IFS=,- read will act up.
Using the date command, this can be done in awk (GNU date assumed; close() avoids leaking one open pipe per row):
awk 'BEGIN{FS=OFS=","} NR==1{print; next} {cmd="date -d \""$3" +1 day\" +%Y-%m-%d"; cmd | getline newdate; close(cmd); $3=newdate; print}' file.in
If you can extract the date from the file, you can also do the arithmetic on epoch seconds (GNU date again; @ introduces seconds since the epoch):
d="1900-01-01" # date from file
date --date @$(( $(date --date "$d" +%s) + 86400 )) +%F
Beware that adding 86400 seconds can be off by an hour across DST transitions; date --date "$d + 1 day" sidesteps that.
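To apply this to the whole CSV in plain shell, here is a minimal sketch (my own, assuming GNU date and the three-column layout shown above; invoke it as e.g. sh bump.sh < file.csv):
#!/bin/sh
# Pass the header through unchanged, then bump the date in column 3.
IFS= read -r header
printf '%s\n' "$header"
while IFS=, read -r col1 col2 d; do
    printf '%s,%s,%s\n' "$col1" "$col2" "$(date -d "$d + 1 day" +%Y-%m-%d)"
done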

"find command -mtime 0" not getting the file i expect

I am trying to find files that are 0 days old. Below are the steps I performed to test this:
$ ls
$ ls -ltr
total 0
$ touch tmp.txt
$ ls -ltr
total 0
-rw-r----- 1 tstUser tstUser 0 Feb 28 20:02 tmp.txt
$ find * -mtime 0
$
$ find * -mtime -1
tmp.txt
$
Why is '-mtime 0' not getting me the file?
What is the exact difference between '-mtime 0' and '-mtime -1'?
I'm sure there must be other ways to find files that are 0 days old in Unix, but I'm curious to understand how '-mtime' actually works.
This is a not-very-user-friendly aspect of find: you have to understand how the matching actually works to define your search criteria correctly. The following explanation is based on GNU find (findutils) 4.4.2.
The find tests -atime, -ctime and -mtime work in 24-hour periods, so let's define "file age" as
floor((current_timestamp - file_modification_timestamp) / 86400)
Given three files modified 1 hour ago, 25 hours ago and 49 hours ago
$ touch -t $(date -d "1 hour ago" +"%m%d%H%M") a.txt
$ touch -t $(date -d "25 hours ago" +"%m%d%H%M") b.txt
$ touch -t $(date -d "49 hours ago" +"%m%d%H%M") c.txt
file ages (as defined above) are
$ echo "($(date +"%s") - $(stat -c %Y a.txt)) / 86400" | bc
0
$ echo "($(date +"%s") - $(stat -c %Y b.txt)) / 86400" | bc
1
$ echo "($(date +"%s") - $(stat -c %Y c.txt)) / 86400" | bc
2
Given the above, here's what find does
$ find -type f -mtime 0 # find files with file age == 0, i.e. files modified less than 24 hours ago
./a.txt
$ find -type f -mtime -1 # find files with file age < 1, i.e. files modified less than 24 hours ago
./a.txt
$ find -mtime 1 # find files with file age == 1, i.e. files modified more than (or equal to) 24 hours ago, but less than 48 hours ago
./b.txt
$ find -mtime +1 # find files with file age > 1, i.e. files modified more than 48 hours ago
./c.txt
This shows that -mtime 0 and -mtime -1 give equivalent results.
-mmin gives the same test with finer granularity; its argument is minutes instead of 24-hour periods.
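For example (my own illustration, not from the original answer):
$ find . -type f -mmin -60    # files modified less than 60 minutes ago
$ find . -type f -mmin +60    # files modified more than 60 minutes ago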
I'm unable to reproduce your problem using the aforementioned version of find
$ touch tmp.txt
$ find * -mtime 0
tmp.txt
$ find * -mtime -1
tmp.txt
-mtime n
File's data was last modified n*24 hours ago. See the comments
for -atime to understand how rounding affects the interpretation
of file modification times.
So -mtime 0 would mean: "File's data was last modified less than 24 hours ago",
while -mtime 1 would mean: "File's data was last modified between 24 and 48 hours ago".
Edit:
Numeric arguments can be specified as
+n for greater than n,
-n for less than n,
n for exactly n.
So I guess -1 means modified within the last 24 hours, while 1 means exactly one day old.
The meaning of those three possibilities is as follows:
n: exactly n 24-hour periods (days) ago; 0 means today.
+n: more than n 24-hour periods (days) ago, i.e. older than n.
-n: less than n 24-hour periods (days) ago, i.e. younger than n. It's evident that -1 and 0 are the same, and both mean "today".
NOTE: If you use the -mtime parameter with find in scripts, be careful when it equals zero; some (earlier) versions of GNU find interpreted such expressions incorrectly.
Source: http://www.softpanorama.org/Tools/Find/index.shtml
