I have data in the following format:
Date A B
20150901 23.4 2.4
20150901 245 22
20150901 21 2.4
20150902 243 4.2
20150902 7.5 1.2
20150903 .54 8.4
What I want to do is SUM(colA)/SUM(colB) for each date. I am using Kibana for this, but I cannot find a way to do it. All it shows is SUM(colA), and I cannot save that result to use it for finding the ratio.
Can somebody help me with this?
You must use a scripted field: create a new field, and in that field compute the sum of A + B for each document. Then, when you discover data or build a graph, select that field wherever you need the sum.
This was a challenge I had too.
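For reference, Kibana scripted fields are written in Painless and evaluate per document; a minimal sketch of the A + B field described above (assuming numeric indexed fields named A and B) would be:

```painless
// Per-document sum of the two fields
doc['A'].value + doc['B'].value
```

Note that a scripted field runs per document, so it can combine values within one document but cannot divide one aggregated sum by another; for that you need an aggregation-level approach.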
Also have a look at this great Kibana plugin:
https://github.com/datasweet-fr/kibana-datasweet-formula
The original discussion can also be found here:
https://github.com/elastic/kibana/issues/2646
It supports several functions over aggregated metrics.
It worked for my case of ratios of aggregated sums over time, similar to yours.
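Outside Kibana, the same per-date ratio can be computed directly in Elasticsearch with a bucket_script pipeline aggregation; a sketch of the request body, assuming the field names from the question and a daily date field:

```json
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": { "field": "Date", "calendar_interval": "day" },
      "aggs": {
        "sum_a": { "sum": { "field": "A" } },
        "sum_b": { "sum": { "field": "B" } },
        "ratio": {
          "bucket_script": {
            "buckets_path": { "a": "sum_a", "b": "sum_b" },
            "script": "params.a / params.b"
          }
        }
      }
    }
  }
}
```

Each daily bucket then carries a `ratio` value equal to SUM(A)/SUM(B) for that date.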
I have a little workflow with fractures and I'm having trouble making the fracture apply only to a certain group.
The wrangle VEX is:
And then I try to make the Voronoi fracture use that group:
The problem is that the model appears grayed out, as if no group were found, even though the group shows up in the geometry spreadsheet:
I'm fairly new to Houdini, and in the lesson I've been following the fracture picks up the group correctly. The lesson was recorded in Houdini 16.5, though, and since I'm using 17.0 I'm unsure whether the behaviour changed or I'm doing something wrong.
Voronoi Fracture accepts primitive groups, but you have set a point group. Change your wrangle to run over primitives instead of points.
Just an aside: your Blast node is cooperating because, by default, it figures out the group type for you.
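As a sketch, a primitive wrangle (Run Over set to Primitives) that builds a group could look like the following; the group name "inside" and the random condition are just illustrations, not the setup from the question:

```vex
// Primitive wrangle: set Run Over = Primitives.
// Tag roughly half the primitives into the primitive group "inside".
if (rand(@primnum) > 0.5) {
    @group_inside = 1;
}
```

Because the wrangle runs over primitives, `@group_inside` is created as a primitive group, which is the type Voronoi Fracture expects.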
I'm looking to perform the equivalent of a COUNTIF on a data set similar to the one below. I found something similar here, but I'm not sure how to translate it into Enterprise Guide. I would like to create several new columns that count how many date occurrences there are for each primary key by year, so for example:
PrimKey Date
1 5/4/2014
2 3/1/2013
1 10/1/2014
3 9/10/2014
To be this:
PrimKey 2014 2013
1 2 0
2 0 1
3 1 0
I was hoping to use the advanced expression for calculated fields option in query builder, but if there is another better way I am completely open.
Here is what I tried (and failed):
CASE
WHEN Date(t1.DATE) BETWEEN Date(1/1/2014) and Date(12/31/2014)
THEN (COUNT(t1.DATE))
END
But that ended up just counting the total date occurrences without regard to my between statement.
Assuming you're using Query Builder you can use something like the following:
I don't think you need the CASE statement; instead use the YEAR() function to extract the year and test whether it equals 2014 or 2013. Each equality test returns 1 or 0, which can be summed to get the total per group. Make sure to include PrimKey in the GROUP BY section of Query Builder.
sum(year(t1.date)=2014) as Y2014,
sum(year(t1.date)=2013) as Y2013,
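In full PROC SQL form (outside Query Builder), this might look like the following sketch, assuming the input table is named t1 and the date column is Date:

```sas
proc sql;
  create table counts as
  select PrimKey,
         sum(year(Date) = 2014) as Y2014,
         sum(year(Date) = 2013) as Y2013
  from t1
  group by PrimKey;
quit;
```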
I don't like this type of solution because it's not dynamic: if your years change, you have to change your code, and nothing in the code will flag an error if that happens. A better solution is to run a Summary Task by Year/PrimKey and then a Transpose Task to get the data into the structure you want.
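The dynamic approach can also be sketched in code (dataset names here are illustrative, and the input is again assumed to be t1 with a Date column):

```sas
/* Derive the year, count occurrences, then pivot years into columns. */
data have2;
  set t1;
  Year = year(Date);
run;

proc freq data=have2 noprint;
  tables PrimKey*Year / out=long(drop=percent);
run;

proc transpose data=long out=wide(drop=_name_) prefix=Y;
  by PrimKey;
  id Year;
  var count;
run;
```

One caveat: PrimKey/Year combinations that never occur produce missing values rather than 0 in the transposed table, so a follow-up step may be needed to fill zeros.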
I want to know if anyone knows how to transform channel four (FLH 4) without using the standard transformations offered by the flowCore package.
The values of channel four are between 1 and 4096, and I need to convert them to values between 1 and 246 with the rule 10^(x/1024).
Thank you.
Better to use the flowTrans mclMultivArcSinh transformation.
trans <- flowTrans(flowData, "mclMultivArcSinh", colnames(flowData)[3:12], n2f=FALSE, parameters.only=FALSE)
You must not transform FSC-A, SSC-A and Time; that's why I use colnames(flowData)[3:12].
You could get a custom transform in by doing something like:
plot(transform(someFlowFrame, `FSC-H`=10^(`FSC-H`/1024), `SSC-H`=10^(`SSC-H`/1024)), c("FSC-H","SSC-H"))
However, since 10^(4096/1024) returns a maximum value of 10000 for your hypothetical example, the plot with your ranges,
plot(transform(someFlowFrame, `FSC-H`=10^(`FSC-H`/1024), `SSC-H`=10^(`SSC-H`/1024)), c("FSC-H","SSC-H"), xlim=c(0,256), ylim=c(0,256))
doesn't look good.
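Another way to apply a fully custom rule in flowCore is to wrap a plain R function in a transformList; a sketch follows, where the channel name "FL4-H" and the object name yourFlowFrame are assumptions — check colnames(yourFlowFrame) for the real channel name:

```r
library(flowCore)

# Custom rule from the question: x -> 10^(x / 1024)
myTrans <- function(x) 10^(x / 1024)

# Wrap it for the channel of interest and apply it to the flowFrame
tl <- transformList("FL4-H", myTrans)
transformed <- transform(yourFlowFrame, tl)
```

This keeps the transformation reusable: the same transformList can be applied to every frame in a flowSet.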
My coworkers and I enter data in turns. One day I do, the next week someone else does, and we always enter 50 observations at a time (into an Excel sheet). So I can be pretty sure that I entered cases 101 to 150 and 301 to 350. We then read the data into R to work with it. How can I select only the cases I entered?
I know I could do it by copying from the Excel sheet; however, I wonder if it is doable in R.
I checked several documents about subsetting data in R and tried things like
data <- data[101:150 & 301:350,]
but it didn't work. I'd appreciate it if someone could point me to a more comprehensive guide answering this question.
The answer to the specific example you gave is
data[c(101:150, 301:350),]
Can you be more specific about which cases you want? Is it the first 50 of each 100, or the first 50 of each 300, or ... ? To get the indices for the first n of each m cases you could use something like
c(outer(0:4,seq(1,100,by=10),"+"))
(here n=5, m=10); outer is a generalized outer product. An alternate (and possibly more intuitive) solution would use rep, e.g.
rep(0:4,10) + rep(seq(1,100,by=10),each=5)
Because R automatically recycles vectors where necessary you could actually shorten this to:
0:4 + rep(seq(1,100,by=10),each=5)
but I would recommend the slightly longer formulation as more understandable.
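Applied to the pattern in the question (two runs of 50 rows starting at 101 and 301), the outer form would be a sketch like this, where mydata is a stand-in for your data frame:

```r
# Offsets 0..49 within each run, added to each run's starting row
idx <- c(outer(0:49, c(101, 301), "+"))
# idx is 101:150 followed by 301:350
mydata[idx, ]
```

This scales naturally: to pick up a third week's block, just add its starting row to the `c(101, 301)` vector.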
I have 7 different variables in an Excel spreadsheet that I have imported into R. Each is a column of 3331 values. They are:
'Tribe' - there are 8 of them
'Month' - when the sampling was carried out
'Year' - the year when the sampling was carried out
'ID' - an identifier for each snail
'Weight' - weight of a snail in grams
'Length' - length of a snail shell in millimetres
'Width' - width of a snail shell in millimetres
This is a case where 8 different tribes have been asked to record data on a suspected endangered species of snail to see if they are getting rarer, or changing in size or weight.
This happened at different frequencies between 1993 and 1998.
I would like to know how to create new variables in the data, so that if I entered names(Snails) it would list the 7 given variables plus any variables I have added.
The dataset is limited, which is why I would like to add new variables, such as the count of snails in any given month.
This would rely on using Tribe, Month, Year and ID: if the IDs (snail identifiers) were listed per month, I would be able to sum them to see whether the counts change. I have tried:
count=c(Tribe,Year,Month,ID)
count
But after doing things like that, R just produces one long vector that is 4x the size of the dataset. I would like to create a new variable that is a column of length n=3331.
Or maybe I would like to create a simpler variable, so I can see whether a tribe collected in any given month. I don't know how to do this.
I have looked at other forums and searched, but there is nothing I can see that helps in my case. I appreciate any help. Thanks.
I'm guessing you need to organise your variables in a single structure, such as a data.frame.
See ?data.frame for the help file.
To get you started, you could do something like:
snails <- data.frame(Tribe,Year,Month,ID)
snails
# or for just the first few rows
head(snails)
Then this would have your data looking similar to your Excel file like:
Tribe Year Month ID
1 1 1 1 a
2 2 2 2 b
3 3 3 3 c
<<etc>>
Then if you do names(snails) it will list out your column names.
You could avoid some of this mucking about by importing your Excel file directly, or by saving it as a CSV (comma separated values) file first and then using read.csv("name_of_your_file.csv").
See http://www.statmethods.net/input/importingdata.html for some more specifics on this.
To tabulate your data, you can do things like...
table(snails$Tribe)
...to see the number of snail records collected by each tribe. Or...
table(snails$Tribe,snails$Year)
...to see the trends in each tribe by each year. The $ character will let you access the named variable (column) inside a data.frame in the same way you are currently using the free floating variables. This might seem like more work initially, but it will pay off greatly when you need to do some more involved analysis.
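To get a per-month count back as a full-length column (one value per row, so it shows up in names(snails) alongside the original 7 variables), one sketch, assuming the snails data.frame built above, is:

```r
# Count of rows in each Tribe/Year/Month cell, repeated onto every row,
# so the result has the same length (3331) as the other columns
snails$MonthlyCount <- ave(rep(1, nrow(snails)),
                           snails$Tribe, snails$Year, snails$Month,
                           FUN = length)
```

ave() returns a vector the same length as its input, which is exactly what you need for a new column of size n=3331.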
Take for example if you want to only analyse the weights from tribe "1", you could do:
snails$Weight[snails$Tribe==1]
# mean of these weights
mean(snails$Weight[snails$Tribe==1])
There are a lot more things I could explain but you would probably be better served by reading an excellent website like Quick-R here: http://www.statmethods.net/management/index.html to get you doing some more advanced analysis and plotting.