I have a datafile containing ~60,000 observations from 70 individuals, with columns ID, Date, Time, and Value.
I wish to select the last 5 minutes of data for each individual. Each individual has a different number of observations. Is there a way to identify the last observation for each individual and select the preceding 5 minutes of data? I used the code below to identify the first 5 minutes but I am unsure how to do the same for the last 5 minutes.
# Set date and time format
library(dplyr)

df$DateTime <- paste(df$Date, df$Time)
df$DateTime <- as.POSIXct(df$DateTime, format = "%d/%m/%Y %H:%M:%S")
df$ID <- as.numeric(as.character(df$ID))
df$Value <- as.numeric(as.character(df$Value))

# Median Value per individual within 5-minute bins
extract <- df %>%
  group_by(ID, DateTime = cut(DateTime, breaks = "5 min")) %>%
  summarize(Value = median(Value))
Thanks in advance!
This should filter to the last 5 minutes of observations per individual.
df %>%
  group_by(ID) %>%
  mutate(last_time = max(DateTime)) %>%  # timestamp of each individual's last observation
  ungroup() %>%
  filter(DateTime >= last_time - 5*60)   # keep rows within 5 minutes (300 s) of it
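If you then want the same kind of summary you computed for the first 5 minutes, the filter and the median can go in one grouped pipeline. A minimal sketch reusing your df; extract_last is just an illustrative name:

library(dplyr)

# median Value over each individual's final 5 minutes
extract_last <- df %>%
  group_by(ID) %>%
  filter(DateTime >= max(DateTime) - 5*60) %>%
  summarize(Value = median(Value))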
I want to get the difference between consecutive rows of Time. When I run the code below on the example it works fine.
But when I apply it to my original dataset, 1 minute appears as 0.0006944444 (in terms of days).
How can I make it appear as 1 minute instead?
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:00","17:16:00", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)
library(chron)
library(dplyr)  # for mutate() and lag()
df1$Time <- chron(times = df1$Time)
sapply(df1, class)
df1 <- df1 %>% mutate(diff = Time - lag(Time))
Too long for a comment for the time being. When I run your code on a modified data set with a one-second difference between two time points, the result is exactly as expected: the diff column contains a one-second difference.
Can you show us some code that produces the error?
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:01","17:15:02", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)
library(chron)
library(dplyr)
df1$Time <- chron(times = df1$Time)
sapply(df1, class)
df1 <- df1 %>% mutate(diff = Time - lag(Time))
# Date Time diff
# 1 03/06/2019 17:15:01 <NA>
# 2 03/06/2019 17:15:02 00:00:01
# 3 03/06/2019 17:18:00 00:02:58
# 4 03/06/2019 17:21:00 00:03:00
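For what it's worth, 0.0006944444 is 1/1440, i.e. one minute expressed as a fraction of a day, which suggests that in your original dataset the Time column stayed numeric instead of becoming a chron "times" object. A minimal sketch of making such a value readable again (diff_days is just an illustrative name):

library(chron)

diff_days <- 1/1440       # prints as 0.0006944444: one minute as a fraction of a day

chron::times(diff_days)   # 00:01:00 -- wrap in a "times" object to display HH:MM:SS
diff_days * 24 * 60       # 1       -- or convert to minutes directly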
I wish to generate some Tidy data.
26 companies are observed everyday for 10 days.
Each day a value is recorded.
The first day is: 2020/1/1
How do I create a column of dates so that the first 26 rows of the date column of the data frame are "2020/1/1" (year/month/day), the next 26 rows are "2020/1/2", and so on?
Here is the data frame without the date column:
library(tidyverse)
set.seed(33)

date_chunk <- rep(as.Date("2020/1/1"), 26)

# Tidy data: 10 sequential days starting 2020/1/1
df <- tibble(
  company = rep(letters, 10),
  value = sample(0:5, 260, replace = TRUE),
  color = "grey"
)
You can try this:
rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), by = 1), each = 26)
This returns a vector of dates from 2020-01-01 to 2020-01-10 in which each date is repeated 26 times.
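For example, it can be assigned straight to the df built in the question (this relies on the rows being ordered companies a-z within each day, which is what rep(letters, 10) produces):

library(dplyr)

df <- df %>%
  mutate(date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), by = 1),
                    each = 26))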
For each company we can add row_number() to the first element of date_chunk to get an incremental sequence of dates.
library(dplyr)

df %>%
  group_by(company) %>%
  mutate(date = first(date_chunk) + row_number() - 1)  # days 1, 2, ..., 10 per company
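As a quick check, each company should end up with the full 10-day sequence; for example (the filter and pull() are just for inspection):

library(dplyr)

df %>%
  group_by(company) %>%
  mutate(date = first(date_chunk) + row_number() - 1) %>%
  filter(company == "a") %>%
  pull(date)
# "2020-01-01" "2020-01-02" ... "2020-01-10"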
I am an aspiring data scientist, and this will be my first ever question on StackOF.
I have this line of code to help wrangle my data. My date filter is static, and I would prefer not to have to go in and change this hardcoded value every year. What is the best alternative for my date filter to make it more dynamic? The date column is also difficult to work with because it is not a "date"; it is a "dbl".
library(dplyr)
library(lubridate)

# create a sample dataframe
df <- data.frame(
  DATE = c(20191230, 20191231, 20200122)
)
Tried so far:
df %>%
  filter(DATE >= 20191231)
# load packages (lubridate for dates)
library(dplyr)
library(lubridate)

# create a sample dataframe
df <- data.frame(
  DATE = c(20191230, 20191231, 20200122)
)
It looks like this:
DATE
1 20191230
2 20191231
3 20200122
# and now...
df %>%                          # take the dataframe
  mutate(DATE = ymd(DATE)) %>%  # turn the DATE column into an actual date
  filter(DATE >= floor_date(Sys.Date(), "year") - days(1))
...and keep rows where DATE is on or after one day before the first day of the current year (floor_date(Sys.Date(), "year")):
DATE
1 2019-12-31
2 2020-01-22
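If this cutoff is used in more than one place, one option is to wrap it in a small helper so it is computed from the current date rather than hardcoded. A sketch only; filter_since_year_end is a made-up name, not an established function:

library(dplyr)
library(lubridate)

# keep rows from the last day of the previous year onwards,
# relative to whatever "today" is when the code runs
filter_since_year_end <- function(data, today = Sys.Date()) {
  data %>%
    mutate(DATE = ymd(DATE)) %>%
    filter(DATE >= floor_date(today, "year") - days(1))
}

df %>% filter_since_year_end()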
How do I add one column price.wk.average that equals the average price over the week, and one column price.mo.average that equals the average price over the month? The price.wk.average will be the same for the entire week.
Dates      Price  Demand  Price.wk.average  Price.mo.average
2010-1-1   x      x
2010-1-2   x      x
......
2015-1-1   x      x
jkl,
try to post reproducible examples; it makes it easier to help you. You can use dplyr together with lubridate (for week() and month()):
library(dplyr)
library(lubridate)

df <- data.frame(date = seq(as.Date("2017-1-1"), by = "day", length.out = 100),
                 price = round(runif(100) * 100 + 50, 0))

df <- df %>%
  group_by(week = week(date)) %>%
  mutate(Price.wk.average = mean(price)) %>%
  ungroup() %>%
  group_by(month = month(date)) %>%
  mutate(Price.mo.average = mean(price))
(Since I don't have enough points to comment)
I wanted to point out that Eric's answer will not distinguish average weekly price by year. Therefore, if you are interested in unique weeks (Week 1 of 2012 != Week 1 of 2015), you will need to do extra work to group by unique weeks.
df <- data.frame(Dates = c("2010-1-1", "2010-1-2", "2015-01-3"),
                 Price = c(50, 20, 40))
       Dates Price
1   2010-1-1    50
2   2010-1-2    20
3  2015-01-3    40
Just to keep your data frame tidy, I suggest converting dates to POSIX format then sorting the data frame:
library(dplyr)
library(lubridate)

df <- df %>%
  mutate(Dates = lubridate::parse_date_time(Dates, "ymd")) %>%
  arrange(Dates)
To group by unique weeks:
df <- df %>%
  group_by(yw = paste(year(Dates), week(Dates)))
Then mutate and ungroup.
To group by unique months:
df <- df %>%
  group_by(ym = paste(year(Dates), month(Dates)))
and mutate and ungroup.
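Putting the pieces together, a minimal sketch of the full year-aware version (yw and ym as above):

library(dplyr)
library(lubridate)

df <- df %>%
  group_by(yw = paste(year(Dates), week(Dates))) %>%
  mutate(Price.wk.average = mean(Price)) %>%
  ungroup() %>%
  group_by(ym = paste(year(Dates), month(Dates))) %>%
  mutate(Price.mo.average = mean(Price)) %>%
  ungroup()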
I have a data frame which consists of dates and the temperatures of 34 different systems, each system in a different column. I need to calculate every system's average hourly temperature. I use the code below to calculate the average for one system, but to cover the other 33 systems I would have to repeat the code again and again. Is there a better way to find the hourly average in all columns at once?
dat$ut_ms <- dat$ut_ms / 1000                            # milliseconds -> seconds
dat[, 1] <- as.POSIXct(dat[, 1], origin = "1970-01-01")  # seconds -> timestamp
dat$ut_ms <- strptime(dat$ut_ms, "%Y-%m-%d %H:%M")
dat$ut_ms <- cut(dat$ut_ms, breaks = 'hour')             # bin timestamps by hour
meanNPWD2401 <- aggregate(NPWD2401 ~ ut_ms, dat, mean)   # hourly mean, one system
I added a picture of the data for a better understanding of what I want.
You can split your data per hour and iterate:
list1 <- split(dat, cut(strptime(dat$ut_ms, format = '%Y-%m-%d %H:%M'), 'hour'))
lapply(list1, function(x) colMeans(x[-1]))  # drop the timestamp column before averaging
When you rearrange the data into a long format, things get much easier.
n.system <- 34
n.time <- 100

# mock up a wide dataset: one timestamp column plus one temperature column per system
temp <- rnorm(n.time * n.system)
temp <- matrix(temp, ncol = n.system)
seconds <- runif(n.time, max = 3 * 3600)
time <- as.POSIXct(seconds, origin = "1970-01-01")
dataset <- data.frame(time, temp)

library(dplyr)
library(tidyr)

dataset %>%
  gather(key = "system", value = "temperature", -time) %>%  # wide -> long
  mutate(hour = cut(time, "hour")) %>%                      # bin timestamps by hour
  group_by(system, hour) %>%
  summarise(average = mean(temperature))
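If you want one column per system again afterwards, you can pivot the summary back to wide format; a small follow-up sketch on the same pipeline, using tidyr's spread() as the counterpart of gather():

library(dplyr)
library(tidyr)

hourly <- dataset %>%
  gather(key = "system", value = "temperature", -time) %>%
  mutate(hour = cut(time, "hour")) %>%
  group_by(system, hour) %>%
  summarise(average = mean(temperature)) %>%
  ungroup() %>%
  spread(key = system, value = average)  # one column per system, one row per hour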