Finding the Min & Max Times for Multiple Individuals - datetime

For work I have a report where I compile the number of calls, emails, and texts a person makes each day. Along with this I need to pick out the earliest (min) and the latest (max) times for each of those actions. I'm wondering if there isn't an easier way for me to pull this data from the date column rather than scrolling down for each person and finding the information.

You are right, there is definitely an easier way. What we need to rely on is that Excel stores dates as the number of days since 0th January 1900 (so 1st January 1900 is day 1), with the time of day stored as a fraction of a day. Therefore, finding the earliest and latest times is simply a matter of finding the min and max values within a specific day.
I'm assuming that your data is set out as in the following. If it isn't, you can just edit my formula as appropriate.
    A       B        C      D
1   Person  Date     Time
2   Steve   Monday   14:00
3   Steve   Monday   14:05
4   Sharon  Monday   12:00
5   Steve   Tuesday  09:00
6   Sharon  Tuesday  15:00
What we need to do is find the minimum time for Steve, given that date = Monday. We need to use an array formula. Array formulas let us 'look up' against more than one cell at the same time. The formula I would use is:
=MIN(IF(A2:A6="Steve",IF(B2:B6="Monday",C2:C6)))
Instead of pressing Enter when you type in this formula, you need to press Ctrl+Shift+Enter. Excel will then show the formula wrapped in curly braces:
{=MIN(IF(A2:A6="Steve",IF(B2:B6="Monday",C2:C6)))}
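If you want to check the logic outside Excel, the nested IF inside MIN can be sketched in Python. This is only an illustration, not part of the Excel solution: the rows list and the time_range helper are made-up names, and the data simply mirrors the sample table above.

```python
from datetime import time

# Sample rows mirroring the table above: (person, day, time of action).
rows = [
    ("Steve", "Monday", time(14, 0)),
    ("Steve", "Monday", time(14, 5)),
    ("Sharon", "Monday", time(12, 0)),
    ("Steve", "Tuesday", time(9, 0)),
    ("Sharon", "Tuesday", time(15, 0)),
]

def time_range(rows, person, day):
    """Return (earliest, latest) time for one person on one day, or None if no rows match."""
    # Keep only the times whose row satisfies both conditions,
    # which is what the two nested IFs do in the array formula.
    matches = [t for p, d, t in rows if p == person and d == day]
    if not matches:
        return None
    return min(matches), max(matches)

earliest, latest = time_range(rows, "Steve", "Monday")
print(earliest, latest)
```

Adding a further constraint means adding one more condition to the filter, in the same way that another IF is nested in the formula.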
Can you see how to add more constraints to the look up? I've included an example screenshot below, where I've made a bigger table and also made the 'Steve' and 'Monday' references refer out to a cell, rather than just being hardcoded into the formula.

Related

How do I transform half-hourly data that does not span the whole day to a Time Series in R?

This is my first question on stackoverflow, sorry if the question is poorly put.
I am currently developing a project where I predict how much a person drinks each day. I currently have data that looks like this:
The menge column represents how much water a person has actually drunk in 30 minutes (so the first value is the amount from 8:00 until just before 8:30, and so on). This is a 1 day sample from 3 months of data. The day starts at 8 AM and ends at 8 PM.
I am trying to forecast the Time Series for each day. For example, given the first one or two time steps, we would predict the whole day and then we know how much in total the person has drunk until 8 PM.
I am trying to model this data as a Time Series object in R (Google Colab), in order to use Croston's Method for the forecasting. Using the ts() function, what should I set the frequency to knowing that:
The data is half-hourly
The data is from 8:00 till 20:00 each day (Does not span the whole day)
Would I need to make the data span the whole day by adding 0 values? Are there maybe better approaches for this? Thank you in advance.
When using the ts() function, the frequency defines the number of (usually regularly spaced) observations within a given time period. For your example, your observations are every 30 minutes between 8AM and 8PM, and your time period is 1 day. Choosing 1 day as the time period assumes that the pattern within each day is of most interest; you could also use 1 week.
Within each day of your data (8AM-8PM) you have 24 observations (12 hours, so 24 half hours). So a suitable frequency for this data would be 24.
You could also pad the data with 0 values, but this isn't necessary and would complicate the model. If you padded the data so that it has observations for all half-hours of the day, the frequency would then be 48.
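The arithmetic behind those two frequencies can be checked with a few lines of Python (used here only for illustration; observations_per_day is a made-up helper name, and the window is assumed to be half-open, so the 20:00 boundary itself is not an observation):

```python
from datetime import datetime, timedelta

def observations_per_day(start, end, step_minutes):
    """Count regularly spaced observations in the half-open window [start, end)."""
    fmt = "%H:%M"
    window = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # Dividing one timedelta by another with // gives a whole number of steps.
    return window // timedelta(minutes=step_minutes)

# Observed window only: 08:00 up to (not including) 20:00.
freq_observed = observations_per_day("08:00", "20:00", 30)

# Padded to the full day: every half hour in 24 hours.
freq_padded = 24 * 60 // 30

print(freq_observed, freq_padded)
```

Whichever value you pick is what you would pass as the frequency argument of ts().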

Count between months in Tableau

I need to count the months between collect dates, so that I can tell whether a test was run in the last 3 months. Below is the code I used, but it gives me a count of zero, even though I know the same test was run 3 times in a year because I can see the dates. I understand why the first one has a count of zero, because there is no test before it, but the counts for the other two should be 3 and 5 respectively.
DATEDIFF('month',[Collect Date],[Collect Date])
Dates of the Tests.
1/8/2015
4/23/2015
9/30/2015
What you are looking for is possible using the LOOKUP function in Tableau. Keep in mind that the result relies heavily on which data is displayed and how it is displayed (sorted, etc.).
You can create a calculated field like this:
DATEDIFF("month",LOOKUP(ATTR([Test Date]),-1),ATTR([Test Date]))
This calculates the number of months between the date in the current row and the date in the prior row.
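To see why the expected counts are 3 and 5, the same previous-row comparison can be sketched in Python. This is not Tableau code: months_between is a made-up helper, and it assumes that a month difference counts calendar-month boundaries crossed and ignores the day of the month.

```python
from datetime import date

def months_between(earlier, later):
    """Count month boundaries crossed, ignoring the day of the month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

# The three test dates from the question, already sorted.
test_dates = [date(2015, 1, 8), date(2015, 4, 23), date(2015, 9, 30)]

# Pair each date with the one before it; the first row has no predecessor.
gaps = [None] + [months_between(a, b) for a, b in zip(test_dates, test_dates[1:])]
print(gaps)  # [None, 3, 5]
```

Comparing a date with itself, as in the original DATEDIFF call, always gives 0, which is why the comparison has to reach back to the prior row.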
Your result will look something like this:

Directional statistics in R

I need to create a function for some work I'm doing on directional statistics. I want to show the distribution of flood events using a circle and calculate the mean direction and variance.
I need to calculate the angular value in radians by multiplying the Julian date by (360/365). I am having problems because I need a function that takes account of the leap years in the 40-year record I am considering, i.e. if it is a leap year, angular value = Julian date x (360/366).
The data I am using is peaks above threshold, so I do not have a data point for every year, and in some years I have more than one entry:
Date Time Flow
04/05/1973 00:00 44.67
22/06/1974 00:00 128.38
22/11/1974 23:45 129.15
26/09/1976 22:00 89.51
15/10/1976 00:00 139.35
24/02/1978 19:30 183.69
27/12/1978 04:00 229.65
18/03/1980 09:15 117.7
02/03/1981 22:00 262.39
Many thanks
Rich
There may be a more elegant way to do this, but try
df$Year<-format(df$Date,"%Y")
That should put just the year in a single column. Then make a new column to indicate whether it is a leap year:
df$Leap<-0
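df$Leap[df$Year == "1972" | df$Year == "1976" | df$Year == "1980"] <- 1
(Extend the list to cover every leap year in your record.) Depending on your data, you may find it easier to convert the year to a number and then use %% to see whether it divides evenly by 4. That simple test is correct for every year from 1901 to 2099; century years are leap years only when they are divisible by 400, so 2000 is a leap year but 1900 and 2100 are not.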
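Then you can use ifelse(), which works on the whole column at once (a plain if statement only checks a single value), to the effect of
# df$Julian (day of year) and df$Angle are placeholder column names
df$Angle <- ifelse(df$Leap == 0, df$Julian * 360/365, df$Julian * 360/366)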
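As a cross-check on the leap year handling, here is a rough Python sketch of the same conversion. It is not the R solution: flood_angle_degrees is a made-up name, and the dates are assumed to be dd/mm/yyyy strings as in the question. Note that the 360/365 factor gives degrees, so the sketch converts to radians as a separate step.

```python
import calendar
import math
from datetime import datetime

def flood_angle_degrees(date_text):
    """Map a dd/mm/yyyy date to an angle in degrees on the annual circle."""
    d = datetime.strptime(date_text, "%d/%m/%Y")
    day_of_year = d.timetuple().tm_yday
    # calendar.isleap applies the full rule, including century years.
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return day_of_year * 360 / days_in_year

# One ordinary year and one leap year from the sample data.
for text in ["04/05/1973", "26/09/1976"]:
    degrees = flood_angle_degrees(text)
    print(text, round(degrees, 2), round(math.radians(degrees), 4))
```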

Flexible calculations in data frames

I have a little problem with R and my skills are somehow limited.
I want to conduct two calculations in a data frame which are based on the previous row.
The first one is a count variable. Additionally, I want to calculate the difference between the current and the previous line.
I think the easiest way to clarify my problem is a small example:
Imagine the following table below, which consists of only two columns. user is a customer number and time is the time of a transaction of the particular user.
Now I want to create two new columns as specified in the example table:
The counter variable count, which simply counts the transactions of the user, indicating the actual number of the actual user's transaction.
The variable diff (time [s]), which is the time difference [in seconds] between the current transaction and the previous one. Thus something like: time [i] - time [i-1], but the calculation for each new user must start again from zero; obviously no time difference can be calculated for the first transaction of each user.
I've tried to solve this problem with a loop, however the table is very large and the calculation on the complete data set just didn't want to end.
user  time      count  diff(time[s])
A     10:00:00  1
A     10:30:00  2      1800
A     12:00:00  3      5400
A     13:00:00  4      3600
B     14:00:00  1
C     15:00:00  1
C     16:00:00  2      3600
C     17:00:00  3      3600
I would do it using the plyr package, which makes life a lot easier when it comes to data wrangling. There are ways to do this and other transformations in base R, but it's a mess of different functions with inconsistent interfaces.
library(plyr)
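# Assumes df$time is POSIXct; seq_along() numbers the rows within each user,
# and the first row of each user gets NA because it has no previous transaction.
ddply(df, .(user), transform, count = seq_along(time), diff = c(NA, diff(as.numeric(time))))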
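The per-user logic can also be written as a single pass over the rows, which avoids the slow nested loop. Below is a minimal Python sketch of that idea, not a replacement for the plyr call: add_count_and_diff is a made-up name, and the rows are assumed to be sorted by time within each user.

```python
from datetime import datetime

transactions = [
    ("A", "10:00:00"), ("A", "10:30:00"), ("A", "12:00:00"), ("A", "13:00:00"),
    ("B", "14:00:00"),
    ("C", "15:00:00"), ("C", "16:00:00"), ("C", "17:00:00"),
]

def add_count_and_diff(rows):
    """Return (user, time, count, diff_seconds) rows; diff is None for a user's first row."""
    last_seen = {}  # user -> (running count, previous timestamp)
    result = []
    for user, text in rows:
        stamp = datetime.strptime(text, "%H:%M:%S")
        count, previous = last_seen.get(user, (0, None))
        # No difference can be computed for the first transaction of a user.
        diff = None if previous is None else int((stamp - previous).total_seconds())
        count += 1
        last_seen[user] = (count, stamp)
        result.append((user, text, count, diff))
    return result

for row in add_count_and_diff(transactions):
    print(row)
```

Each row is visited once and the dictionary remembers the last transaction per user, so the work grows linearly with the size of the table.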

How to plot every second timestep? [r]

There must be a very easy way to do this but I don't know what it is...
As the title says, I would like to know how I can plot every second timestep of a time series in R? For example, I have half hourly data but I only want to plot the data on the hour e.g. I have
10:00 0
10:30 1
11:00 2
11:30 3
12:00 4
I just want to plot
10:00 0
11:00 2
12:00 4
Something like
plot(x[seq_along(x) %% 2 == 1])
?
Edit: I don't know how you are plotting your data set above, but however you're doing it, you can subset your data as follows
hourlydata <- fulldata[seq_len(nrow(fulldata)) %% 2 == 1, ]
If you give more details someone might tell you how to figure out which time values are hourly rather than relying (as here) on the fact that they are the odd-numbered rows ...
Slightly less verbose and not quite as clear as Ben's solution but you can use vector recycling and indexing using a boolean to achieve this (as long as you're just interested in every other observation).
# Extract the data you want (assuming you want to keep
# the first observation, skip the second, and so on)
newdat <- x[c(TRUE, FALSE)]
plot(newdat)
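For comparison, the same two ideas (taking every other row by position, or picking the on-the-hour rows by their time value) look like this in Python. It is only a sketch using the sample values from the question, with made-up variable names and no plotting.

```python
times = ["10:00", "10:30", "11:00", "11:30", "12:00"]
values = [0, 1, 2, 3, 4]

# A step of 2 keeps positions 0, 2, 4, ... (the on-the-hour readings here).
hourly_by_position = list(zip(times[::2], values[::2]))

# Selecting on the time itself does not depend on where the series starts.
hourly_by_time = [(t, v) for t, v in zip(times, values) if t.endswith(":00")]

print(hourly_by_position)  # [('10:00', 0), ('11:00', 2), ('12:00', 4)]
```

The positional version breaks if the series starts on a half hour or has a missing row, which is the caveat raised above about relying on odd-numbered rows.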

Resources