I have a very big dataframe with repeated measures, but no column is available to group by. The key to selecting the needed rows is to get the max(id) of each run, given that the repeated sequence goes from 0 to 7, like this:
temperature weekday id
32 monday 0
34 thursday 0
34 saturday 1
55 wednesday 2
43 friday 0
45 sunday 1
42 friday 0
desired output (max id from sequence):
temperature weekday id
32 monday 0
55 wednesday 2
45 sunday 1
42 friday 0
Sounds like you want to select every row where the next id isn't higher than the current id. With dplyr:
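library(dplyr)
your_data %>% filter(lead(id, default = 0) <= id)
(The default = 0 stands in for the missing "next id" after the final row, so the last row of the data is included as long as ids are never negative.)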
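For anyone who wants to check the rule outside R, here is a minimal Python sketch of the same idea: keep a row when the id of the following row is not larger. It uses only the standard library; the function name `last_of_each_run` and the tuple layout are my own illustrative choices, not part of the answer above.

```python
def last_of_each_run(rows, id_index=2, default_next=0):
    """Keep each row whose following row does not have a larger id.

    Mirrors lead(id, default = 0) <= id: the final row is compared
    against default_next, so it is kept whenever its id is >= 0.
    """
    kept = []
    for pos, row in enumerate(rows):
        if pos + 1 < len(rows):
            next_id = rows[pos + 1][id_index]
        else:
            next_id = default_next  # no next row: fall back to the default
        if next_id <= row[id_index]:
            kept.append(row)
    return kept


# Sample data from the question: (temperature, weekday, id)
rows = [
    (32, "monday", 0),
    (34, "thursday", 0),
    (34, "saturday", 1),
    (55, "wednesday", 2),
    (43, "friday", 0),
    (45, "sunday", 1),
    (42, "friday", 0),
]
result = last_of_each_run(rows)
```

On the sample data this keeps the monday, wednesday, sunday and final friday rows, which is the desired output in the question.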
Related
This probably seems straightforward, but I am pretty stumped.
I have a set of dates ~ August 1 of each year and need to sum sales by week number. The earliest date is 2008-12-08 (YYYY-MM-DD). I need to create a "week_id" field where week #1 begins on 2008-12-08. And the date 2011-09-03 is week 142. Note that this is different since the calculation of week number does not reset every year.
I am putting up a small example dataset here:
data <- data.frame(
dates = c("2008-12-08", "2009-08-10", "2010-03-31", "2011-10-16", "2008-06-03", "2009-11-14" , "2010-05-05", "2011-09-03"))
data$date = as.Date(data$dates)
Any help is appreciated
data$week_id = as.numeric(data$date - as.Date("2008-12-08")) %/% 7 + 1
This takes the difference in days between the two dates and uses integer division to count the whole 7-day periods elapsed. I add one so that dates less than a week after the start fall in week 1 instead of week 0.
dates date week_id
1 2008-12-07 2008-12-07 0 # added for testing
2 2008-12-08 2008-12-08 1
3 2008-12-09 2008-12-09 1 # added for testing
4 2008-12-14 2008-12-14 1 # added for testing
5 2008-12-15 2008-12-15 2 # added for testing
6 2009-08-10 2009-08-10 36
7 2010-03-31 2010-03-31 69
8 2011-10-16 2011-10-16 149
9 2008-06-03 2008-06-03 -26
10 2009-11-14 2009-11-14 49
11 2010-05-05 2010-05-05 74
12 2011-09-03 2011-09-03 143
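The same arithmetic can be cross-checked in Python, as a rough sketch using only the standard library. The names `WEEK_ONE_START` and `week_id` are illustrative assumptions; Python's `//` floors toward negative infinity like R's `%/%`, so dates before the start get ids of 0 or below, as in the table above.

```python
from datetime import date

# Assumed anchor: the first day of week 1, taken from the question.
WEEK_ONE_START = date(2008, 12, 8)


def week_id(d, start=WEEK_ONE_START):
    """Number of whole 7-day periods since `start`, counted from 1."""
    # Floor division matches R's %/%, including for negative differences.
    return (d - start).days // 7 + 1


samples = [
    "2008-12-07", "2008-12-08", "2008-12-14",
    "2008-12-15", "2011-09-03", "2008-06-03",
]
week_ids = {s: week_id(date.fromisoformat(s)) for s in samples}
```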
I have a folder with several files where the name of each file is the respective userID. Something like this:
Time Sms
1 2012-01-01 00:00:00 10
2 2012-01-01 00:30:00 11
3 2012-01-01 01:00:00 13
4 2012-01-01 01:30:00 10
How can I aggregate by month, day of week, hour and minute? Something like this:
Month DayofWeek hour min SMS
1 Mon 0 0 14 <-mean
1 Mon 0 30 12
1 Mon 1 0 17
1 Mon 1 30 21
.............................
12 Sunday 23 30 12
I had a similar issue aggregating hourly data into daily data. This is the code that worked for me.
fun <- function(s,i,j) { sum(s[i:(i+j-1)]) }
radday<-sapply(X=seq(1,24*nb_of_days,24),FUN=fun,s=your_time_series,j=24)
This sums the data over blocks of j consecutive values; j was 24 in my case because I was summing over 24 hours. By changing j (and the step in seq() to match) you can adapt it to other periods such as hour, day, week or month, assuming the period has a constant length.
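To make the fixed-window idea concrete, here is a small Python sketch of the same block summing. The function name `window_sums` and the sample data are made up for illustration, and dropping a trailing partial block is my assumption (the R call above only steps through whole days).

```python
def window_sums(series, window):
    """Sum consecutive, non-overlapping blocks of `window` values."""
    # Assumption: a trailing partial block is ignored rather than summed.
    full = len(series) - len(series) % window
    return [sum(series[i:i + window]) for i in range(0, full, window)]


hourly = list(range(48))          # made-up data: two days of hourly readings
daily = window_sums(hourly, 24)   # one total per 24-hour block
```

Note that this only works when every period has the same number of observations, so it suits hours to days better than days to calendar months.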
Thanks for the help. I solved my problem with this code (month(), hour() and minute() are not base R functions; lubridate provides functions with these names):
df<-aggregate(Sms~month(Time)+weekdays(Time)+hour(Time)+minute(Time),df,FUN='mean')
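For comparison, the same grouping can be sketched in plain Python: bucket each reading by (month, weekday, hour, minute) and average each bucket. The function name `mean_sms_by_slot` is illustrative, and the last record is a made-up extra row added so that one slot has more than one value.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_sms_by_slot(records):
    """Average Sms per (month, weekday name, hour, minute) slot."""
    groups = defaultdict(list)
    for stamp, sms in records:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # weekday() is 0 for Monday, so it indexes WEEKDAYS directly.
        slot = (t.month, WEEKDAYS[t.weekday()], t.hour, t.minute)
        groups[slot].append(sms)
    return {slot: mean(values) for slot, values in groups.items()}


records = [
    ("2012-01-01 00:00:00", 10),
    ("2012-01-01 00:30:00", 11),
    ("2012-01-01 01:00:00", 13),
    ("2012-01-01 01:30:00", 10),
    ("2012-01-08 00:00:00", 14),  # made-up extra row, one week later
]
slot_means = mean_sms_by_slot(records)
```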
Edited for clarity:
I'm using R and I have a set of data that consists of order days:
> orders <- data.frame(order.num=1:4,
+ day = c("Mon", "Mon", "Mon", "Tue"))
> orders
order.num day
1 1 Mon
2 2 Mon
3 3 Mon
4 4 Tue
...
Orders typically come in on a consistent day (Monday in example above), but sometimes they come in on an alternate day (Tuesday in example above).
Here is actual data, spread into columns using the tidyr::spread function
Outlet.number Sun Mon Tue Wed Thu Fri Sat
1 1 0 530 162 0 629 49 0
2 2 0 784 123 0 854 65 0
3 3 24 15 483 0 365 0 0
For Outlet 1, the "typical" order days are Monday and Thursday
For Outlet 2, the "typical" order days are Monday and Thursday
For Outlet 3, the "typical" order days are Tuesday and Thursday
I want to be able to predict whether an order on an atypical day (e.g. Tue for Outlet 1) is more likely to be associated with the first typical day (Monday) or the second typical day (Thursday).
None of these examples has any orders on Wednesday, so I was able to hard-code this small set, but for future outlets Wednesday may be either a typical or an atypical day.
Is there a way to ingest the data as shown above and then classify them?
There are a few similar queries to mine but I can't quite figure it out. In Access 2010 I have one table with three columns, day, week and number.
Day Week Number
Monday 1 12
Monday 2 24
Tuesday 2 10
Thursday 1 12
Monday 1 10
Tuesday 2 10
I want to be able to count (sum) the total "number" for Monday in Week 1, Monday in Week 2 etc.
Day Week Total
Monday 1 22
Monday 2 24
Tuesday 2 20
Thursday 1 12
You need a TOTALS Query.
SELECT
yourTable.dayFieldColumn,
yourTable.weekFieldColumn,
Sum(yourTable.numberColumnName) As TotalSum
FROM
yourTable
GROUP BY
yourTable.dayFieldColumn,
yourTable.weekFieldColumn;
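If you want to try the query without opening Access, here is a minimal sketch that runs the same GROUP BY against the sample rows using Python's built-in sqlite3 module. SQLite stands in for Access here, which is an assumption on my part; the table and column names are the placeholders from the query above, and the ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE yourTable ("
    "dayFieldColumn TEXT, weekFieldColumn INTEGER, numberColumnName INTEGER)"
)
# Sample rows from the question: Day, Week, Number
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("Monday", 1, 12),
        ("Monday", 2, 24),
        ("Tuesday", 2, 10),
        ("Thursday", 1, 12),
        ("Monday", 1, 10),
        ("Tuesday", 2, 10),
    ],
)
totals = conn.execute(
    """
    SELECT dayFieldColumn, weekFieldColumn,
           SUM(numberColumnName) AS TotalSum
    FROM yourTable
    GROUP BY dayFieldColumn, weekFieldColumn
    ORDER BY dayFieldColumn, weekFieldColumn
    """
).fetchall()
conn.close()
```

The totals match the desired table in the question (22, 24, 20 and 12), only sorted alphabetically by day.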
Any idea why the day is coming out wrong when the date is accurate?
I'm debugging and I can see the date variables which are correct but the day is wrong.
date Date (#9f14161)
date 26 [0x1a]
dateUTC 26 [0x1a]
day 5
dayUTC 5
fullYear 2010 [0x7da]
fullYearUTC 2010 [0x7da]
hours 17 [0x11]
hoursUTC 17 [0x11]
milliseconds 0
millisecondsUTC 0
minutes 0
minutesUTC 0
month 10 [0xa]
monthUTC 10 [0xa]
seconds 0
secondsUTC 0
time 1290790800000 [0x12c89208a80]
timezoneOffset 0
Those are my date variables. As you can see, the date is 26 (today), the month is 10 (this month) and the year is 2010 (this year), yet the day comes out as 5, which is a Friday.
Months are numbered from 0, so a month value of 10 is not October but November.
So Friday (day = 5) is correct in your example.
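You can confirm this from the `time` value in the dump. The sketch below decodes it in Python, where months run 1 to 12 and weekday() counts from Monday = 0, and then converts to the zero-based month and Sunday = 0 day numbering that the dump appears to use. The variable names are mine, for illustration only.

```python
from datetime import datetime, timezone

stamp_ms = 1290790800000  # the `time` value from the debugger dump
moment = datetime.fromtimestamp(stamp_ms / 1000, tz=timezone.utc)

# Python months run 1..12, so subtract 1 to get a zero-based month.
zero_based_month = moment.month - 1
# Python weekday() has Monday = 0; shift so that Sunday = 0.
sunday_based_day = (moment.weekday() + 1) % 7
```

The timestamp decodes to 26 November 2010, 17:00 UTC, so the zero-based month is 10 and the Sunday-based day is 5, matching the dump.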