20% of lost values after statsd

20% of lost values after statsd - graph

I need to supervise an real time application. This application receives 60 connections per seconds and for each I use 53 metrics.
So my simulation client sent 3180 metrics personds.
I need the lower, upper, average, median and the count_ps values. Thats why I use the "timing" type.
When I look the count_ps at the end of statsd for one metrics, i have only 40 values and not 60.
I dont find information on statsd's capacity. Maybe I overload it ^^
So could you help me, what are my options ?
I can't reduce the nomber of metrics but i don't need all informations provided by the "timing" type. Can I limit the "timing" ?
Thank you !
my configuration :
1) cat storage-schemas.conf
# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#
# [name]
# pattern = regex
# retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...
# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[stats]
pattern = ^application.*
retentions = 60s:7d
2) cat dConfig.js
{
graphitePort: 2003
, graphiteHost: "127.0.0.1"
, port: 8125
, backends: [ "./backends/graphite", "./backends/console" ]
, flushInterval: 60000
, debug: true
, graphite: { legacyNamespace: false, globalPrefix: "", prefixGauge: "", prefixCounter: "", prefixTimer: "", prefixSet: ""}
}
3) cat storage-aggregation.conf
# Aggregation methods for whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds
#
# [name]
# pattern = <regex>
# xFilesFactor = <float between 0 and 1>
# aggregationMethod = <average|sum|last|max|min>
#
# name: Arbitrary unique name for the rule
# pattern: Regex pattern to match against the metric name
# xFilesFactor: Ratio of valid data points required for aggregation to the next retention to occur
# aggregationMethod: function to apply to data points for aggregation
#
[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min
[max]
pattern = \.upper$
xFilesFactor = 0.1
aggregationMethod = max
[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0.3
4) Client :
#!/usr/bin/env python
import time
import random
import statsd
import math
c = statsd.StatsClient('localhost',8125)
k = 0
nbData = 60
pause = 1
while True :
print k
k += pause
tps1 = time.clock()
for j in range (nbData):
digit = j%10 + k*10 + math.sin(j/500)
c.timing('TPS.global', digit)
c.timing('TPS.interne', digit)
c.timing('TPS.externe', digit)
for i in range(5):
c.timing('TPS.a.'+str(i), digit)
c.timing('TPS.b.'+str(i), digit)
c.timing('TPS.c.'+str(i), digit)
c.timing('TPS.d.'+str(i), digit)
c.timing('TPS.e.'+str(i), digit)
c.timing('CR.a.'+str(i), digit)
c.timing('CR.b.'+str(i), digit)
c.timing('CR.c.'+str(i), digit)
c.timing('CR.d.'+str(i), digit)
c.timing('CR.e.'+str(i), digit)
tps2 = time.clock()
print 'temps = ' + str(tps2 - tps1)
if k >= 60:
k = 0
if pause-tps2 + tps1 < 1:
time.sleep(pause-tps2 + tps1)
Edit : add client code

Without more context it's hard to say, what could be going on. Do you use sampling when sending data to StatsD? What hardware are you running StatsD on? Was your simulation all on localhost? Did you run it on a lossy connection?
At the moment there is no way of limiting timing metrics to only certain types.
Sorry to not be of more immediate help. If your problems persist, consider dropping into #statsd on Freenode IRC and ask there.

What is your CARBON_METRIC_INTERVAL set to? I suspect it needs to match the StatsD flushInterval.

Related

Simultaneously read and write to buffer

I am in the process of learning Julia and I'd like to do some buffer manipulation.
What I want to achieve is the following:
I've got a buffer that I can write to and read from at the same time, meaning that the speed with which I add a value to the Fifo buffer approximately equals the speed with which I read from the buffer. Reading and writing will happen in separate threads so it can occur simultaneously.
Additionally, I want to be able to control the values that I write into the buffer based on user input. For now, this is just a simple console prompt asking for a number, which I then want to write into the stream continously. The prompt refreshes and asks for a new number to write into the stream, but the prompt is non-blocking, meaning that in the background, the old number is written to the buffer until I enter a new number, which is then written to the buffer continuously.
This is my preliminary code for simulatenous reading and writing of the stream:
using Causal
CreateBuffer(size...) = Buffer{Fifo}(Float32, size...)
function writetobuffer(buf::Buffer, n::Float32)
while !isfull(buf)
write!(buf, fill(n, 2, 1))
end
end
function readfrombuffer(buf::Buffer)
while true
while !isempty(buf)
#show read(buf)
end
end
end
n_channels = 2
sampling_rate = 8192
duration = 2
n_frames = sampling_rate * duration
sbuffer = CreateBuffer(n_channels, n_frames)
print("Please enter a number: ")
n = parse(Float32, readline())
s1 = Threads.#spawn writetobuffer(sbuffer, n)
s2 = Threads.#spawn readfrombuffer(sbuffer)
s1 = fetch(s1)
s2 = fetch(s2)
I am not sure how to integrate the user input in a way that it keeps writing and reading the latest number the user put in. I looked at the documentation for channels, but didn't manage to get it working in a way that was non-blocking for the stream writing. I don't know that the correct approach is (channels, events, julia's multithreading) to enable this functionality.
How would I go on about to include this?

I managed to get it working, but I think it could be improved:
using Causal
CreateBuffer(size...) = Buffer{Fifo}(Float32, size...)
function writeToBuffer(buf::Buffer, n::Float32)
write!(buf, fill(n, 2, 1))
end
function readFromBuffer()
global soundbuffer
println("Starting")
sleep(0.5)
while true
while !isempty(soundbuffer)
read(soundbuffer)
end
end
println("Exiting...")
end
function askForInput()::Float32
print("Please enter a number: ")
a = parse(Float32, readline())
return(a)
end
function inputAndWrite()
global soundbuffer
old_num::Float32 = 440
new_num::Float32 = 440
while true
#async new_num = askForInput()
while (new_num == old_num)
writeToBuffer(soundbuffer, new_num)
end
old_num = new_num
println("Next iteration with number " * string(new_num))
end
end
n_channels = 2
sampling_rate = 8192
duration = 2
n_frames = sampling_rate * duration
soundbuffer = CreateBuffer(n_channels, n_frames)
s1 = Threads.#spawn inputAndWrite()
s2 = Threads.#spawn readFromBuffer()
s1 = fetch(s1)
s2 = fetch(s2)

How to find the filter coefficients for a DVBS2 shaping SRRC?

in the DVBS2 Standard the SRRC filter is defined as
How can i find the filter's time domain coefficients for implementation? The Inverse Fourier transform of this is not clear to me.

For DVBS2 signal you can use RRC match filter before timing recovery. For match filter, you can use this expression:
For example for n_ISI = 32 and Roll of factor = 0.25 with any sample per symbol you can use this Matlab code:
SPS = 4; %for example
n_ISI=32;
rolloff = 0.25;
n = linspace(-n_ISI/2,n_ISI/2,n_ISI*SPS+1) ;
rrcFilt = zeros(size(n)) ;
for iter = 1:length(n)
if n(iter) == 0
rrcFilt(iter) = 1 - rolloff + 4*rolloff/pi ;
elseif abs(n(iter)) == 1/4/rolloff
rrcFilt(iter) = rolloff/sqrt(2)*((1+2/pi)*sin(pi/4/rolloff)+(1-2/pi)*cos(pi/4/rolloff)) ;
else
rrcFilt(iter) = (4*rolloff/pi)/(1-(4*rolloff*n(iter)).^2) * (cos((1+rolloff)*pi*n(iter)) + sin((1-rolloff)*pi*n(iter))/(4*rolloff*n(iter))) ;
end
end
But if you want to use SRRC, there are two ways: 1. You can use its frequency representation form if you use filtering in the frequency domain. And for implementation, you can use the expression that you've noted. 2. For time-domain filtering, you should define the FIR filter with its time representation sequence. The time representation of such SRRC pulses is shown to adopt the following form:

Calculate RSI indicator according to tradingview?

I would like to calculate RSI 14 in line with the tradingview chart.
According to there wiki this should be the solution:
https://www.tradingview.com/wiki/Talk:Relative_Strength_Index_(RSI)
I implemented this is in a object called RSI:
Calling within object RSI:
self.df['rsi1'] = self.calculate_RSI_method_1(self.df, period=self.period)
Implementation of the code the calculation:
def calculate_RSI_method_1(self, ohlc: pd.DataFrame, period: int = 14) -> pd.Series:
delta = ohlc["close"].diff()
ohlc['up'] = delta.copy()
ohlc['down'] = delta.copy()
ohlc['up'] = pd.to_numeric(ohlc['up'])
ohlc['down'] = pd.to_numeric(ohlc['down'])
ohlc['up'][ohlc['up'] < 0] = 0
ohlc['down'][ohlc['down'] > 0] = 0
# This one below is not correct, but why?
ohlc['_gain'] = ohlc['up'].ewm(com=(period - 1), min_periods=period).mean()
ohlc['_loss'] = ohlc['down'].abs().ewm(com=(period - 1), min_periods=period).mean()
ohlc['RS`'] = ohlc['_gain']/ohlc['_loss']
ohlc['rsi'] = pd.Series(100 - (100 / (1 + ohlc['RS`'])))
self.currentvalue = round(self.df['rsi'].iloc[-1], 8)
print (self.currentvalue)
self.exportspreadsheetfordebugging(ohlc, 'calculate_RSI_method_1', self.symbol)
I tested several other solution like e.g but non return a good value:
https://github.com/peerchemist/finta
https://gist.github.com/jmoz/1f93b264650376131ed65875782df386
Therefore I created a unittest based on :
https://school.stockcharts.com/doku.php?id=technical_indicators:relative_strength_index_rsi
I created an input file: (See excel image below)
and a output file: (See excel image below)
Running the unittest (unittest code not included here) should result in but is only checking the last value.
if result == 37.77295211:
log.info("Unit test 001 - PASSED")
return True
else:
log.error("Unit test 001 - NOT PASSED")
return False
But again I cannot pass the test.
I checked all values by help with excel.
So now i'm a little bit lost.
If I'm following this question:
Calculate RSI indicator from pandas DataFrame?
But this will not give any value in the gain.
a) How should the calculation be in order to align the unittest?
b) How should the calculation be in order to align with tradingview?

Here is a Python implementation of the current RSI indicator version in TradingView:
https://github.com/lukaszbinden/rsi_tradingview/blob/main/rsi.py

I had same issue in calculating RSI and the result was different from TradingView,
I have found RSI Step 2 formula described in InvestoPedia and I changed the code as below:
N = 14
close_price0 = float(klines[0][4])
gain_avg0 = loss_avg0 = close_price0
for kline in klines[1:]:
close_price = float(kline[4])
if close_price > close_price0:
gain = close_price - close_price0
loss = 0
else:
gain = 0
loss = close_price0 - close_price
close_price0 = close_price
gain_avg = (gain_avg0 * (N - 1) + gain) / N
loss_avg = (loss_avg0 * (N - 1) + loss) / N
rsi = 100 - 100 / (1 + gain_avg / loss_avg)
gain_avg0 = gain_avg
loss_avg0 = loss_avg
N is the number of period for calculating RSI (by default = 14)
the code is put in a loop to calculate all RSI values for a series.

For those who are experience the same.
My raw data contained ticks where the volume is zero. Filtering this OLHCV rows will directly give the good results.

vectorize complex slicing with pandas dataframe

I'd like to be able to vectorize, for speed purposes, this piece of code. the purpose is to calculate a function, in this case a standard deviation, from a tuple of pair of dates that are cointained in two separate arrays.
import pandas as pd
import numpy as np
asd_1 = pd.Series(0.01 * np.random.randn(252), index=pd.date_range('2011-1-1', periods=252))
index_1 = pd.to_datetime(['2011-2-2', '2011-4-3', '2011-5-1',])
index_2 = pd.to_datetime(['2011-2-15', '2011-4-16', '2011-5-17',])
index_tot = list(zip(index_1,index_2))
aux_learning_std = pd.DataFrame([np.nanstd(asd_1.loc[i:j]) for i, j in index_tot], index=index_1)
the solution, that works, is performed through a loop but i'd rather be able to vectorize it through numpy/pandas, which is much faster. initially I though about using something like:
df_aux = pd.concat([asd_1 for _ in range(len(index_1))], axis=1)
results = df_aux.apply(lambda x: np.nanstd(x.loc[i,j]), axis = 0)
but here I fail to put together the vectors into one operation.
any and all advice is welcome.
p.s.: below there is an image for explanatory purposes

Vectorized standard deviation across ranges in an array
def get_ranges_arr(starts,ends):
# Taken from http://stackoverflow.com/a/37626057/3293881
counts = ends - starts
counts_csum = counts.cumsum()
id_arr = np.ones(counts_csum[-1],dtype=int)
id_arr[0] = starts[0]
id_arr[counts_csum[:-1]] = starts[1:] - ends[:-1] + 1
return id_arr.cumsum()
def ranged_std(arr,starts,ends):
# Get all indices and the IDs corresponding to same groups
idx = get_ranges_arr(starts,ends)
id_arr = np.repeat(np.arange(starts.size),ends-starts)
# Extract relevant data
slice_arr = arr[idx]
# Simulate standard deviation implementation for a number of groups
# using id_arr as the basis to perform various mathematical operations
# within each group. Since, std. deviation performs sum/mean reduction,
# we can simply use np.bincount for an efficient implementation.
# Std. deviation formula used :
#https://github.com/numpy/numpy/blob/v1.11.0/numpy/core/fromnumeric.py#L2939
grp_counts = np.bincount(id_arr)
mean_vals = np.bincount(id_arr,slice_arr)/grp_counts
abs_vals = np.abs(slice_arr - mean_vals[id_arr])**2
return np.sqrt(np.bincount(id_arr,abs_vals)/grp_counts)
Sample run (verify against a loopy version)
In [173]: arr = np.random.randint(0,9,(20))
In [174]: starts = np.array([2,6,11])
In [175]: ends = np.array([8,9,15])
In [176]: [np.std(arr[i:j]) for i,j in zip(starts,ends)]
Out[176]: [1.9720265943665387, 0.81649658092772603, 0.82915619758884995]
In [177]: ranged_std(arr,starts,ends)
Out[177]: array([ 1.97202659, 0.81649658, 0.8291562 ])
Runtime test
Case #1 : Very small number of ranges 3
In [21]: arr = np.random.randint(0,9,(20))
In [22]: starts = np.array([2,6,11])
In [23]: ends = np.array([8,9,15])
In [24]: %timeit [np.std(arr[i:j]) for i,j in zip(starts,ends)]
10000 loops, best of 3: 146 µs per loop
In [25]: %timeit ranged_std(arr,starts,ends)
10000 loops, best of 3: 45 µs per loop
Case #2 : Decent number of ranges 1000
In [32]: arr = np.random.randint(0,9,(1010))
In [33]: starts = np.random.randint(0,9,(1000))
In [34]: ends = starts + np.random.randint(0,9,(1000))
In [35]: %timeit [np.std(arr[i:j]) for i,j in zip(starts,ends)]
10 loops, best of 3: 47.5 ms per loop
In [36]: %timeit ranged_std(arr,starts,ends)
1000 loops, best of 3: 217 µs per loop
Case #3 : Large number of ranges 10000
In [60]: arr = np.random.randint(0,9,(1010))
In [61]: arr = np.random.randint(0,9,(10010))
In [62]: starts = np.random.randint(0,9,(10000))
In [63]: ends = starts + np.random.randint(0,9,(10000))
In [64]: %timeit [np.std(arr[i:j]) for i,j in zip(starts,ends)]
1 loops, best of 3: 474 ms per loop
In [65]: %timeit ranged_std(arr,starts,ends)
100 loops, best of 3: 2.17 ms per loop
Really amazing speedups of 200x+!
Using ranged_std to solve our case
# Get start, stop numeric indices as needed for getting ranges array later on
starts = asd_1.index.searchsorted(index_1)
ends = asd_1.index.searchsorted(index_2)
# Create final dataframe output using ranged_std func
df = pd.DataFrame(ranged_std(asd_1.values,starts,ends+1),index=index_1)
Sample run for verification -
In [17]: asd_1 = pd.Series(0.01 * np.random.randn(252), index=\
...: pd.date_range('2011-1-1', periods=252))
...:
...: index_1 = pd.to_datetime(['2011-2-2', '2011-4-3', '2011-5-1',])
...: index_2 = pd.to_datetime(['2011-2-15', '2011-4-16', '2011-5-17',])
...:
...: index_tot = list(zip(index_1,index_2))
...: aux_learning_std = pd.DataFrame([np.nanstd(asd_1.loc[i:j]) for i, j in \
...: index_tot], index=index_1)
...:
In [18]: starts = asd_1.index.searchsorted(index_1)
...: ends = asd_1.index.searchsorted(index_2)
...: df = pd.DataFrame(ranged_std(asd_1.values,starts,ends+1),index=index_1)
...:
In [19]: aux_learning_std
Out[19]:
0
2011-02-02 0.007244
2011-04-03 0.012862
2011-05-01 0.010155
In [20]: df
Out[20]:
0
2011-02-02 0.007244
2011-04-03 0.012862
2011-05-01 0.010155

Graphite will only display data for the past 24 hours

Here's the display for a stat for the past 24 hours (in Graphite Composer):
Here's the display for a stat for the "past 14 days":
Not much difference there. I cannot convince Graphite to display any data for any period past the past 24 hours.
Here are the relavent entries from storage-schemas.conf (I'm using StatsD):
[stats]
pattern = ^stats.*
retentions = 10:2160,60:10080,600:262974
[stats_counts]
pattern = ^stats_counts.*
retentions = 10:2160,60:10080,600:262974
and my storage-aggregation.conf:
[min]
pattern = \.min$
xFilesFactor = 0
aggregationMethod = min
[max]
pattern = \.max$
xFilesFactor = 0
aggregationMethod = max
[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0
aggregationMethod = average
I have five or so days of data captured so far. What am I missing?
EDITED to add:
I guess I should mention that I started out with the default storage-schemas.conf and only yesterday rebuilt my whisper database files to match the above configuration. I don't think this should be relevant, but there it is.
UPDATED:
I'm using 0.9.10 of graphite-web and whisper, from PyPI, released in May 2012.

Well, this is what I get for not pasting the entire configuration. Here's what it actually looked like:
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
[stats]
pattern = ^stats.*
retentions = 10:2160,60:10080,600:262974
[stats_counts]
pattern = ^stats_counts.*
retentions = 10:2160,60:10080,600:262974
Of course, the [default_1min_for_1day] section was matching first, ahead of the other two, and so I was only getting data for the past 24 hours. Moving the catch-all to the end of the file seems to have addressed the issue.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

20% of lost values after statsd - graph

What is your CARBON_METRIC_INTERVAL set to? I suspect it needs to match the StatsD flushInterval.

Related

Simultaneously read and write to buffer

How to find the filter coefficients for a DVBS2 shaping SRRC?

Calculate RSI indicator according to tradingview?

vectorize complex slicing with pandas dataframe

Graphite will only display data for the past 24 hours

Categories

Resources