Aggregate data for less than a second interval in Graphite

I am using Graphite to display our application transaction stats. The transactions are generated at around 20 per second and are processed and pushed to Graphite, so there is more than one data point per second.
My problem is: how can I aggregate this data in Graphite? Currently Graphite only plots one data point per minute.
My metric is named like this: servername.syspulse.alert (currently I have only one server).
This is my storage-schemas.conf file:
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[default_1min_for_1day]
pattern = .*
retentions = 1s:3d,1min:90d,10min:180d
And this is my storage-aggregation.conf file:
[syspulse]
pattern = \.syspulse\.alert$
xFilesFactor = 0
aggregationMethod = sum
The default entries in the file are below this.
Any pointers will be helpful.

pattern = .*
retentions = 1s:3d,1min:90d,10min:180d
should work to store data at 1-second resolution.
Have you defined this schema before creating the metrics?
Otherwise you should recreate them (or resize them), because the schema is applied only at creation time.
To confirm that your whisper files have the correct schema, you can use the whisper-info.py script:
whisper-info.py /opt/graphite/storage/whisper/your/metric/path.wsp
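If the metrics already exist and you want to keep their data, resizing the whisper files to the new schema should also work. A sketch, assuming the default whisper location shown above and the retentions from your schema (adjust the metric path to your own):
whisper-resize.py /opt/graphite/storage/whisper/servername/syspulse/alert.wsp 1s:3d 1min:90d 10min:180d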

Related

Telegraf drop a tag from a specific measurement

Is it possible in Telegraf, using a processor, to drop a tag from a measurement?
I am using the cisco_telemetry plugin, which takes in series, and within one of the measurements (not the whole plugin) I want to keep only one tag.
I tried using the tag_limit processor but it didn't work. The measurement "Cisco-IOS-XR-procfind-oper:proc-distribution/nodes/node/process/pid/filter-type" has two tags, "pid" and "proc_name", each containing around 10000 values. I only want to keep "proc_name" and drop "pid" from this measurement. Should the tag_limit processor work for this? Version 1.23.
[[processors.tag_limit]]
namepass = ["Cisco-IOS-XR-procfind-oper:proc-distribution/nodes/node/process/pid/filter-type"]
## Maximum number of tags to preserve
limit = 1
## List of tags to preferentially preserve
keep = ["proc_name"]
within one of the measurements
I would probably use a starlark processor then. Use namepass as you have done, and then remove the specific tag.
[[processors.starlark]]
namepass = ["Cisco-IOS-XR-procfind-oper:proc-distribution/nodes/node/process/pid/filter-type"]
source = '''
def apply(metric):
    metric.tags.pop("pid")
    return metric
'''
For users looking to do this to an entire measurement, tags can be dropped with metric modifiers. Specifically, you are looking for tagexclude, which removes tags matching the given patterns from a measurement. This way, you do not even need a processor and can add it directly to the end of your input:
[[inputs.cisco_telemetry]]
<connection details>
tagexclude = ["pid"]

GoogleAnalyticsR api - FilterExpression

I need to retrieve data from Google Analytics using R.
I wrote the following code with googleAnalyticsR:
df <- google_analytics(viewId = my_id,
                       date_range = c(start, end),
                       metrics = c("pageViews"),
                       dimensions = "pagePath",
                       anti_sample = TRUE,
                       filtersExpression = "ga:pagePath==RisultatoRicerca?nomeCasa",
                       max = 100000)
I need to set the filtersExpression parameter correctly.
I'd like to get data for pagePaths that contain RisultatoRicerca?nomeCasa. This code returns a data frame with 0 rows, which I know is impossible (it is data from an e-commerce site with more than ten thousand interactions per day), so I've begun to think that my filtersExpression is incorrect.
Thanks in advance
I managed to solve the problem using filtersExpression:
filtersExpression = "ga:pagePath=#RisultatoRicerca?nomeCasa"
This filter works on the pagePath dimension and keeps every path that contains RisultatoRicerca.

Splitting a file into different files

I have a file which has data in the below format
Col1
1,a,b,c
1,e,f,g,h,j
2,r,t,y,u,i.o
2,q,s,d,f
3,q,a,s,l
4,r,y,u,p,o
4,o,l,j,f,c,g,b,c
4,d,f,q,
.
.
.
97,w,e,r
3,f,g
100,q,a,x,c
Now I want to split this file into 100 different files so that each file has data based on the first column. Example: the first file should have only the rows whose first column is 1, the second file the rows whose first column is 2, and so on up to 100 files.
Please tell me the approaches in Informatica, Unix or Teradata.
Use a Transaction Control transformation to generate multiple files with respect to the column value (the input should be sorted on Col1 first).
In an Expression transformation, take variable ports (ports are evaluated top to bottom, so V_flag is computed before V_prev is updated with the current row's value):
V_curr = Col1
V_flag = IIF(V_curr = V_prev, 0, 1)
V_prev = Col1
Now add the Transaction Control transformation and pass the pipeline through it.
In its properties, set the Transaction Control Condition to:
IIF(V_flag = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
Once you execute the workflow, multiple files will be generated with respect to Col1.
For ref - https://kb.informatica.com/h2l/HowTo%20Library/1/0114-GeneratingMultipleTargetsFromOneTargetDefinition.pdf
Also check - https://etlinfromatica.wordpress.com/
Thank you
This seems simple enough: use the FileName port on the flat file target and add an Expression transformation with a port filename_out dynamically created as a derivative of the first column value, e.g. "FileOut" || Port1 || ".dat".
Then connect the output port filename_out to the FileName input port on the target.
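For the Unix approach mentioned in the question, a short awk sketch (input.txt is an illustrative file name): each row is appended to an output file named after its first column.
awk -F',' '{ out = "file_" $1 ".txt"; print >> out; close(out) }' input.txt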

Sequence number inside a txt file in UNIX

I want to generate a unique sequence number for each row in a file in Unix. I cannot make an identity column in the database, as the table has other sources that also insert data into it. I tried using the NR variable in awk, but since I have filters in my script it may skip rows in the file, so I may not get sequential numbers.
My requirements are: the sequence number needs to be persistent, since I receive this file every day and it should start from where I left off. Also, the number needs to be preceded by "EMP_" on each line of the file.
Please suggest.
Thanks in advance.
To obtain a unique id in UNIX you may use a file to store and read the value; however, this method is tedious and requires a file-locking mechanism. The easiest way is to use the date and time to obtain a unique id, for example:
#!/bin/sh
uniqueVal=`date '+%Y%m%d%H%M%S'`
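If the persistent, gap-free counter described in the question is needed, the file-based approach mentioned above could look roughly like this (a sketch only; seq.state, input.txt and output.txt are illustrative names, and file locking is left out):
#!/bin/sh
# Read the last used number (0 if the state file does not exist yet)
last=$(cat seq.state 2>/dev/null || echo 0)
# Prefix every row with EMP_<n>, continuing from the stored counter
awk -v start="$last" '{ printf "EMP_%d %s\n", start + NR, $0 }' input.txt > output.txt
# Persist the new high-water mark for the next daily run
echo $(( last + $(wc -l < input.txt) )) > seq.state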

Writing to Excel file containing formulas is extremely slow

We have an automatic process that opens a template Excel file, writes rows of data, and returns the file to the user. This process is usually fast; however, I was recently asked to add a summary page with some Excel formulas to one of the templates, and now the process takes forever.
It completes in a few minutes with about 5 records; however, this week's record set is almost 400 rows, and the longest I've let it run is about half an hour before cancelling it. Without the formulas, it only takes a few seconds to run.
Are there any known issues with writing rows to an Excel file that contains formulas? Or is there a way to tell Excel not to evaluate formulas until the file is opened by a user?
The formulas on the summary Sheet are these:
' Returns count of cells in column where data = Y
=COUNTIF(Sheet1!J15:Sheet1!J10000, "Y")
=COUNTIF(Sheet1!F15:Sheet1!F10000, "Y")
' Return sum of column where data is a number greater than 0
' Column contains formula calculating the difference in months between two dates
=SUMIF(Sheet1!I15:Sheet1!I10000,">0",Sheet1!I15:Sheet1!I10000)
' Returns a count of distinct values in a column
=SUMPRODUCT((Sheet1!D15:Sheet1!D10000<>"")/COUNTIF(Sheet1!D15:Sheet1!D10000,Sheet1!D15:Sheet1!D10000&""))
And the code that writes to excel looks something like this:
Dim xls as New Excel.Application()
Dim xlsBooks as Excel.Workbooks, xlsBook as Excel.Workbook
Dim xlsSheets as Excel.Sheets, xlsSheet as Excel.Worksheet
Dim xlsCells as Excel.Range
xls.Visible = False
xls.DisplayAlerts = False
xlsBooks = xls.Workbooks
xlsBooks.Open(templateFile)
xlsBook = xlsBooks.Item(1)
' Loop through Excel sheets. Some templates have multiple sheets.
For Each drSheet As DataRow In dtSheets.Rows
    xlsSheets = xlsBook.Worksheets
    xlsSheet = CType(xlsSheets.Item(drSheet("SheetName")), Excel.Worksheet)
    xlsCells = xlsSheet.Cells
    ' Loop through the column list from the database. Each template requires different columns.
    For Each drDataCols As DataRow In dtDataCols.Rows
        ' Loop through rows to get data
        For Each drData As DataRow In dtData.Rows
            xlsCells(drSheet("StartRow") + dtData.Rows.IndexOf(drData), drDataCols("DataColumn")) = drData("Col" + drDataCols("DataColumn").ToString).ToString
        Next
    Next
Next
xlsSheet.SaveAs(newFile)
xlsBook.Close
xls.Quit()
Every time you write to a cell, Excel recalculates the open workbooks and refreshes the screen. Both of these things are slow, so you need to set Application.ScreenUpdating = False and Application.Calculation = xlCalculationManual.
Also, there is a high overhead associated with each write to a cell, so it is much faster to accumulate the data in an array and then write the array to the range with a single call to the Excel object model.
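A minimal sketch of what that could look like for the posted code, replacing the two inner loops inside the per-sheet loop. It reuses the posted variable names; the assumption that the target columns are contiguous and start at column 1 is mine and may need adjusting per template:
' Set once, after the workbook has been opened
xls.ScreenUpdating = False
xls.Calculation = Excel.XlCalculation.xlCalculationManual
' Accumulate all values for this sheet in a 2D array
Dim rowCount As Integer = dtData.Rows.Count
Dim colCount As Integer = dtDataCols.Rows.Count
Dim buffer(rowCount - 1, colCount - 1) As Object
For r As Integer = 0 To rowCount - 1
    For c As Integer = 0 To colCount - 1
        buffer(r, c) = dtData.Rows(r)("Col" + dtDataCols.Rows(c)("DataColumn").ToString).ToString
    Next
Next
' Write the whole block in a single call instead of one write per cell
Dim startCell As Excel.Range = xlsCells(drSheet("StartRow"), 1)
Dim endCell As Excel.Range = xlsCells(drSheet("StartRow") + rowCount - 1, colCount)
xlsSheet.Range(startCell, endCell).Value2 = buffer
' Restore before saving so the workbook recalculates normally for the user
xls.Calculation = Excel.XlCalculation.xlCalculationAutomatic
xls.ScreenUpdating = True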
With automatic calculation mode, recalculation occurs after every data input or change. I had the same problem; it was solved by setting manual calculation mode (see the MSDN reference).
xls.Calculation = Excel.XlCalculation.xlCalculationManual
Also, this property can only be set after a Workbook has been opened or it will throw a run-time error.
One way that has saved me over the years is to add
Application.ScreenUpdating = False
directly before I execute a potentially lengthy method, and then
Application.ScreenUpdating = True
directly after, or at least at some later point in the code. This forces Excel not to redraw anything on the visible screen until the operation is complete. That redrawing is where I've often found lengthy operations to stem from.
