I have a data frame with 3 columns: subid, test, and day. For each subject, I want to identify which tests happened within a time frame of x days and calculate the max change in the test value. Please see the example below. For each subject and a given test, I want to identify which tests happened within 3 days. So if we look at the "Day" column, the value 1 won't belong to any group, since the subsequent test was done 6 days later. The values Day = 10, 7, 8, 9 should be identified as one group and the max change among them calculated. Similarly, Day = 12, 11, 10, 9 should be identified as another group and the max change among them calculated. How can I do this in R? Thank you in advance.
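Here is a minimal sketch of one possible approach, assuming the data sit in a data.table named df with columns subid, test (the numeric test value) and day, and interpreting each row as anchoring a window of the following 3 days (the object and column names are illustrative, not from the original post):
library(data.table)
# illustrative data: one subject tested on days 1, 7, 8, 9, 10, 11 and 12
df <- data.table(subid = 1,
                 test  = c(5, 9, 7, 12, 6, 8, 4),
                 day   = c(1, 7, 8, 9, 10, 11, 12))
# for each subject, anchor a window at every test day, collect the tests that fall
# within the next 3 days, and compute the max change (max - min) inside that window
df[, max_change := sapply(day, function(d) {
  win <- test[day >= d & day <= d + 3]
  if (length(win) > 1) max(win) - min(win) else NA_real_
}), by = subid]
Here day = 1 gets NA (no other test within 3 days of it), while the windows anchored at days 7 and 9 cover the groups 7, 8, 9, 10 and 9, 10, 11, 12 described above.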
I'm trying to visualize the median profit as a proportion of sales for each day of the week. My data looks like this:
Date Category Profit Sales State
1/1 Book 3 6 NY
1/1 Toys 12 30 CA
1/2 Games 9 20 NY
1/2 Books 5 10 WA
I've created a calculated field "Profit_Prop" as SUM([Profit])/SUM([Sales]). I want to display the median daily value of Profit_Prop for Mondays, Tuesdays, etc.
I can kind of do this as a boxplot by adding WEEKDAY(Date) to Columns and Profit_Prop to Rows, then adding Date to Detail and changing granularity to Exact Date. But I just want to display the median without displaying a data point for each day.
I tried making another calculated field with MEDIAN([Profit_Prop]), but I get "argument to MEDIAN is already an aggregation and cannot be further aggregated."
Remove date from the level of detail.
Create a calculated field like the one below and use it instead of Profit_Prop:
MEDIAN(
  { INCLUDE [Date] :
    [Profit_Prop]
  }
)
Let me know how it goes.
When you are doing a calculation on top of a calculated field, the normal MEDIAN function doesn't work; you need to use a table calculation instead.
Taking the data from your example, create a calculated field and paste the code below:
WINDOW_MEDIAN([Calculation1],FIRST(),LAST())
Set the calculation to compute using Table (down).
I want to graph the mean response time for the 10 API calls which are called the most.
I have:
api.<route>.count
api.<route>.mean
I want to graph the mean value for the series with the highest counts.
I can get the 10 highest counts using highestCount( api.*.count ), so how do I take that list and replace .count with .mean?
The useSeriesAbove function is very close to what I want... but I don't want to provide it with a static count.
useSeriesAbove(seriesList, value, search, replace): Compares the maximum of each series against the given value. If the series maximum is greater than value, the regular expression search and replace is applied against the series name to plot a related metric, e.g. given useSeriesAbove(ganglia.metric1.reqs, 10, 'reqs', 'time'), the response time metric will be plotted only when the maximum value of the corresponding request/s metric is > 10.
&target=useSeriesAbove(ganglia.metric1.reqs,10,"reqs","time")
Use limit(sortByMaxima(api.<route>.mean), 10) to get the top 10 results.
Also, the mean may not be what you want if you are measuring latency - use the 95th or 99.9th percentile instead - see https://news.ycombinator.com/item?id=10485804
I'm trying to add a new column to my data table that contains the average of some of the following rows. How many rows are selected for the average, however, depends on the time stamps of the rows.
Here is some test data:
DT<-data.table(Weekstart=c(1,2,2,3,3,4,5,5,6,6,7,7,8,8,9,9),Art=c("a","b","a","b","a","a","a","b","b","a","b","a","b","a","b","a"),Demand=c(1:16))
I want to add a column with the mean of all Demand values that occurred in the weeks ("Weekstart") up to three weeks before the respective week (grouped by Art, excluding the current week).
With rollapply from the zoo library, it works like this:
setorder(DT,-Weekstart)
DT[,RollMean:=rollapply(Demand,width=list(1:3),partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
The problem, however, is that some data is missing. In the example, the data for Art b lacks week no. 4; there is no Demand in week 4. As I want the average of the three prior weeks, not the three prior rows, the average is wrong. Instead, the result for Art b for week 6 should look like this:
DT[Art=="b"&Weekstart==6,RollMean:=6]
(6 instead of 14/3, because only Week 5 and Week 3 count: (8+4)/2)
Here is what I have tried so far:
It would be possible to loop over the minima of the weeks of the following rows in order to create a vector that defines, for each row, how wide 'width' should be (the new column 'rollwidth'):
i <- 3
DT[, rollwidth := Weekstart - rollapply(Weekstart, width = list(1:3), partial = TRUE, FUN = min, align = "left", fill = 1), .(Art)]
while (max(DT[, Weekstart - rollapply(Weekstart, width = list(1:i), partial = TRUE, FUN = min, align = "left", fill = NA), .(Art)][, V1], na.rm = TRUE) > 3) {
  i <- i - 1
  DT[rollwidth > 3, rollwidth := i]
}
But that seems very unprofessional (excuse my poor skills). And, unfortunately, rollapply with a width built from rollwidth doesn't work as intended (it produces warnings, as 'rollwidth' is taken to be all the rollwidths in the table):
DT[,RollMean2:=rollapply(Demand,width=list(1:rollwidth),partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
What does work is
DT[,RollMean3:=rollapply(Demand,width=rollwidth,partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
but then again, the average includes the actual week (not what I want).
Does anybody know how to apply a criterion (i.e. the difference in weeks shall be <= 3) to the width argument instead of a number of rows?
Any suggestions are appreciated!
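One way to express that criterion directly is a non-equi self-join instead of rollapply. Below is a minimal sketch of that idea (the window table win and the helper columns low/high are illustrative, not from the question); it averages, per Art, the Demand of the weeks that lie 1 to 3 weeks before the current week:
library(data.table)
DT <- data.table(Weekstart = c(1,2,2,3,3,4,5,5,6,6,7,7,8,8,9,9),
                 Art = c("a","b","a","b","a","a","a","b","b","a","b","a","b","a","b","a"),
                 Demand = 1:16)
# one window per row: from 3 weeks before up to 1 week before the current week
win <- DT[, .(Art, Weekstart, low = Weekstart - 3, high = Weekstart - 1)]
# non-equi self-join: for every window, average the Demand of the same Art whose
# Weekstart falls inside the window; rows with no prior week within 3 weeks get NA
DT[, RollMean := DT[win, on = .(Art, Weekstart >= low, Weekstart <= high),
                    mean(Demand), by = .EACHI]$V1]
With this, DT[Art == "b" & Weekstart == 6, RollMean] gives 6, as required above.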
I have a column with a few dozen grades that have been assigned the values Good, Average, or Poor. I have a different column with employment rates. I want the maximum employment rate associated with Good, Average, and Poor. I can pull the value for each one in three separate commands like the one below, but I need it written as a single command:
max(unHomework$Employment.Rate[unHomework$Job.Satisfaction.Category == 'Poor'])
We can use data.table
library(data.table)
setDT(unHomework)[, .(MaxER = max(Employment.Rate)), by = Job.Satisfaction.Category]
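For comparison, a base R equivalent of the same grouped maximum (a sketch using the column names assumed in the question):
aggregate(Employment.Rate ~ Job.Satisfaction.Category, data = unHomework, FUN = max)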