I simply don't know how to do this, and my coding experience is very limited. I have 90 reaction times per subject, and some of the trial times are too high (due to lapses in motivation or attention). How can I extract the trials whose reaction time is too high, not in comparison to other subjects but in comparison to the subject's own mean and SD?
print(d1$(Reaction.time < mean+2*SD))
As you can see, I have no clue what I'm doing, but eh, I'm trying.
d1 is the dataframe containing the data of one particular subject.
Reaction.time is one column of the dataframe containing (duh) all the reaction times of that subject.
mean (the mean reaction time) is a column that I added via mutate() and so on, and SD is that subject's standard deviation, which I also added via mutate(); both can be seen in the dataframe. But how can I take out (or print only the remaining) rows whose reaction times are more than 2 SD above the mean, as well as all rows that are more than 2 SD below the mean?
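In other words, I think I'm after something like the following dplyr-style filter, though I'm not sure it's the right way to do it (this assumes the mean and SD columns already hold my per-subject values repeated on every row of d1):
library(dplyr)
## keep only trials within 2 SD of this subject's own mean
d1_clean <- d1 %>%
  filter(Reaction.time < mean + 2 * SD,
         Reaction.time > mean - 2 * SD)
## or the other way round: print only the trials that would be removed
d1 %>%
  filter(Reaction.time > mean + 2 * SD | Reaction.time < mean - 2 * SD)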
I'm trying to calculate the cumulative quantile (10th percentile, 25th percentile, etc.) over a column in a large dataset (over 10 million rows).
I tried the function cumquant from the cumstats package, but it takes too long (longer than an hour; a toy test shows that it takes more than 40 seconds to obtain results for a vector with 100,000 values, e.g. cumquant(1:100000, p=0.1)).
Is there a more efficient way to calculate it using data.table (or others)?
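To be explicit about the quantity I mean, this naive version is essentially what cumquant computes (it may differ in the exact quantile type, and it is far too slow for my data, which is why I'm asking):
## naive reference: the p-quantile of every growing prefix x[1:i]
## (fine for checking results on small vectors, hopeless for 10 million rows)
cum_quantile_naive <- function(x, p = 0.1) {
  vapply(seq_along(x),
         function(i) quantile(x[1:i], probs = p, names = FALSE),
         numeric(1))
}
cum_quantile_naive(c(5, 1, 4, 2, 3), p = 0.5)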
We have a Metrics table in our Kusto Explorer with the columns
EventTime
ClusterName
MetricName
TimeGrain
Average
Minimum
Maximum
Count
Total
IngestionTime
We are trying to find the median value for CPU utilization from the metric values mentioned above. Is there a KQL query to calculate this?
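For reference, a sketch of the kind of query we have in mind (the MetricName value "CpuUtilization" is only a guess at how our CPU metric is tagged, and this returns the median of the per-interval Average values, since the raw samples cannot be recovered from pre-aggregated rows):
// median CPU per cluster, taken as the 50th percentile of the interval averages
Metrics
| where MetricName == "CpuUtilization"
| summarize MedianCpu = percentile(Average, 50) by ClusterName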
This question already has answers here:
Annual, monthly or daily mean for irregular time series (4 answers)
Closed 8 years ago.
I'm quite new to R and have the following problem:
I work with meteorological data (temperature and precipitation). The data was recorded every half hour over one year, so I have a dataframe with about 17,520 rows.
My first column contains the date in the form "year-month-day hour:minute:second".
Now I want to get only one value per day for each parameter, which means I need to average over each day.
I managed to split the dataframe by date with the following expression:
split(data, as.Date(data$DATE))
But now I have the problem that I don't have any clue how to work with that split. If I try to save it, I only get some kind of list.
Does anyone have an idea how I can work with my split data, i.e., how can I average the values for each day and merge the daily averages into a new dataframe containing only one row for each day of the year?
I hope I described the problem sufficiently.
Thanks for your answers in advance!
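One direct way to finish the split() approach from the question is aggregate(), which splits by day and averages in one step (a base-R sketch; the value column names temperature and precipitation are assumptions, replace them with the actual ones):
## split by calendar day and average the measurement columns in one call
daily <- aggregate(data[, c("temperature", "precipitation")],
                   by = list(day = as.Date(data$DATE)),
                   FUN = mean)
head(daily)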
Generally, as noted in the link, xts is a good package for dealing with time series. Another option is the data.table package. Here I compare the two packages' performance when computing the daily mean of a time series.
## needed package
library(data.table)
library(xts)
library(microbenchmark)
## create the data; here I use a year-long index at half-hour frequency
set.seed(1)
time = seq.POSIXt(from=as.POSIXct('2013-01-01',tz=''),
to=as.POSIXct('2013-12-31',tz=''),
by = as.difftime(0.5,units="hours"))
value= runif(n = length(time), min = 18, max = 54)
## the data.table object
DT <- data.table(time=time,value=value)
## the time series
xtt <- xts(x=value,time)
I compute the daily mean using data.table and xts. For data.table I use two methods: the first creates the grouping variable (day), the second uses a grouping variable that has already been created.
## compute daily mean using data.table
dt.m <- function()
DT[,day:=format(time,'%d-%m-%Y')][,mean(value),day]
## using data.table but the grouping variable is already created
dt.withday.m <- function()
DT[,mean(value),day]
## daily mean using time series
xts.m <- function()
xtt.d <- apply.daily(xtt,mean)
## benchmark
microbenchmark(dt.m(), xts.m(), dt.withday.m(), times=5, unit='ms')
Unit: milliseconds
          expr       min        lq     median         uq        max neval
        dt.m() 159.36342 160.71548 161.628732 162.527193 171.672999     5
       xts.m() 206.63565 207.90692 208.210708 214.023594 225.322446     5
  dt.withday.m   0.00038   0.00038   0.001138   0.001139   0.001518     5
So the two methods have nearly the same performance, but performance improves dramatically once the grouping variable has already been created (dt.withday.m). So if you have to compute other daily summaries, data.table is the best choice.
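For example, once the day column exists, several daily summaries come out of a single grouped call (a sketch reusing the DT object created above):
## several daily summaries in one grouped call (day was created by dt.m())
DT[, .(daily.mean = mean(value),
       daily.min  = min(value),
       daily.max  = max(value)), by = day]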
Another point is the rolling mean, i.e., computing the mean over a time window of a given width. As far as I know, xts is unbeatable for rolling means:
xtt.r <- rollapply(xtt,width=2,FUN=mean)
This question might be closed because it sounds vague, but I'm really asking because I have no idea and my math background is not sufficient.
I'm trying to implement a challenge, and part of it requires me to compute the min and max values of a matrix. I have no trouble with matrix implementations and their operations, but what are the min and max values of a matrix? For a 3x3 matrix, is the min simply the smallest of the 9 numbers and the max the greatest, or something else?
It really depends. The maximum could be the maximum entry, the entry with the maximum absolute value, or the row (or column) vector that is largest with respect to some norm.
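To make the distinction concrete, here is a small R sketch (the matrix is just an invented example, not from any particular challenge):
A <- matrix(c(3, -7, 2,
              1,  0, 5,
             -4,  6, -1), nrow = 3, byrow = TRUE)
max(A)                                               # largest entry: 6
A[which.max(abs(A))]                                 # entry with the largest absolute value: -7
which.max(apply(A, 1, function(r) sqrt(sum(r^2))))   # row with the largest Euclidean norm: row 1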