I want to generate some tidy data.
26 companies are observed every day for 10 days.
Each day a value is recorded.
The first day is 2020/1/1.
How do I create a vector of dates so that the first 26 rows of the date column of the data frame are "2020/1/1" (year, month, day), the next 26 rows are "2020/1/2", and so on?
Here is the data frame without the date column:
library(tidyverse)
set.seed(33)
date_chunk <- rep(as.Date("2020/1/1"), 26)
# Tidy data: 10 sequential days starting 2020/1/1
df <- tibble(
company = rep(letters, 10),
value = sample(0:5, 260, replace = TRUE),
color = "grey"
)
You can try this:
rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), by = 1), each = 26)
This returns a vector of dates from 2020-01-01 to 2020-01-10 in which each date is repeated 26 times.
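As a quick check of the approach above, here is a minimal base R sketch that builds the date vector and attaches it to a data frame. The name `df_dates` is a hypothetical stand-in for the question's `df`, and the `value` and `color` columns are left out for brevity.

```r
# Ten consecutive days starting 2020-01-01, each repeated 26 times
dates <- rep(seq(as.Date("2020-01-01"), by = "day", length.out = 10), each = 26)

# Hypothetical stand-in for the question's data frame (value and color omitted)
df_dates <- data.frame(company = rep(letters, 10), date = dates)

# Each date should appear once per company
table(df_dates$date)
```

Because `rep(letters, 10)` cycles through all 26 companies before repeating, the `each = 26` layout of the dates lines up with it row by row.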
For each company, we can add row_number() - 1 to the first element of date_chunk to get an incremental sequence of dates.
library(dplyr)
df %>%
group_by(company) %>%
mutate(date = first(date_chunk) + row_number() - 1)
Related
I am really new to R and this is probably a really basic question: let's say I have a dataset with a column that holds date-time values of the format ("y-m-d H:M:S") stored as a factor.
How do I split the one column into 5?
Given example:
x <- as.factor(c("2018-01-03 12:34:32.92382", "2018-01-03 12:50:40.00040"))
x <- as_datetime(x) # convert to a date-time (POSIXct), not a Date
x <- x %>%
dplyr::mutate(year = lubridate::year(x),
month = lubridate::month(x),
day = lubridate::day(x),
hour = lubridate::hour(x),
minute = lubridate::minute(x),
second = lubridate::second(x))
I get an error saying that mutate cannot be applied to objects of class c('POSIXct', 'POSIXt').
mutate works on data frames, not on bare vectors, so convert x to a data frame first and then run the mutate part:
x %>%
as.data.frame() %>%
rename(x = '.') %>%
dplyr::mutate(year = lubridate::year(x),
month = lubridate::month(x),
day = lubridate::day(x),
hour = lubridate::hour(x),
minute = lubridate::minute(x),
second = lubridate::second(x))
x year month day hour minute second
1 2018-01-03 12:34:32 2018 1 3 12 34 32.92382
2 2018-01-03 12:50:40 2018 1 3 12 50 40.00040
You could also make your mutate a little cleaner by using across:
library(dplyr)
library(lubridate)
x %>%
  data.frame(date = .) %>%
  # funs() is deprecated; pass a named list of functions instead
  mutate(across(date,
                list(year = year, month = month, day = day,
                     hour = hour, minute = minute, second = second),
                .names = "{.fn}"))
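For comparison, the same split can be sketched in base R without any packages. This is only an illustration: the `tz = "UTC"` setting and the names `dt`, `lt` and `parts` are assumptions, not part of the original question.

```r
x <- as.factor(c("2018-01-03 12:34:32.92382", "2018-01-03 12:50:40.00040"))

# Go through character first so the factor labels (not the codes) are parsed
dt <- as.POSIXct(as.character(x), tz = "UTC")

# POSIXlt stores the components as list elements
lt <- as.POSIXlt(dt)

parts <- data.frame(
  x      = dt,
  year   = lt$year + 1900,  # years are stored as an offset from 1900
  month  = lt$mon + 1,      # months are stored as 0-11
  day    = lt$mday,
  hour   = lt$hour,
  minute = lt$min,
  second = lt$sec           # keeps the fractional seconds
)
parts
```

The offsets on `year` and `mon` are the main thing to watch for when reading components from a POSIXlt object directly.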
I have a data file containing ~60,000 observations from 70 individuals.
I wish to select the last 5 minutes of data for each individual. Each individual has a different number of observations. Is there a way to identify the last observation for each individual and select the preceding 5 minutes of data? I used the code below to identify the first 5 minutes, but I am unsure how to do the same for the last 5 minutes.
#Set date and time format
df$DateTime=paste(df$Date, df$Time)
df$DateTime <- as.POSIXct(df$DateTime, format="%d/%m/%Y %H:%M:%S")
df$ID <- as.numeric(as.character(df$ID))
df$Value <- as.numeric(as.character(df$Value))
extract=df %>%
group_by(ID, DateTime = cut(DateTime, breaks="5 min")) %>%
summarize(Value=median(Value))
Thanks in advance!
This should filter to the last 5 minutes of observations per individual.
df %>%
group_by(ID) %>%
mutate(last_time = max(DateTime)) %>%
ungroup() %>%
filter(DateTime >= last_time - 5*60)
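To see the idea on something small, here is a base R sketch of the same filter on a toy data set. The data frame `toy` and its values are made up for illustration; only the column names `ID` and `DateTime` come from the question.

```r
# Toy data: seconds elapsed since a common start, per individual
toy <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  DateTime = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") +
    c(0, 200, 400, 600, 0, 100, 1000)
)

# Latest timestamp per individual, repeated on every row of that individual
last_time <- ave(as.numeric(toy$DateTime), toy$ID, FUN = max)

# Keep rows that fall within 5 minutes (300 seconds) of that latest timestamp
last5 <- toy[as.numeric(toy$DateTime) >= last_time - 5 * 60, ]
last5
```

Individual 1 keeps the observations at 400 and 600 seconds, and individual 2 keeps only the one at 1000 seconds, since its other observations are more than 5 minutes older.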
let's say I have a list of dates from March 1st to July 15th:
daterange = as.data.frame(seq(as.Date("2020-3-1"), as.Date("2020-7-15"), "days"))
I want to group the dates by 1-15 and 16-30/31 for each month. So the dates in March will be separated into two groups: Mar 1-15 and Mar 16-31. Then keep doing this for every month.
I know the lubridate package can sort by week, but I don't know how to set a custom range.
Thanks
We can group by year-month (yearmon) and by an indicator, derived from the day of the month, for the first or second half of the month:
library(dplyr)
library(zoo)
library(lubridate)
library(stringr)
daterange2 <- daterange %>%
setNames('Date') %>% # base setNames(); set_names() is not exported by the packages loaded above
group_by(yearmon = as.yearmon(Date),
Daygroup = (day(Date) > 15) + 1) %>%
mutate(Label = str_c(format(Date, '%b'),
str_c(min(day(Date)), max(day(Date)), sep='-'), sep= ' '))
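Using base R, you can create two groups per month by pasting the month abbreviation from each date together with a value of 1 or 2, depending on whether the day falls in the first or second half of the month.
names(daterange) <- "date" # the column gets an unwieldy default name, so rename it first
newdaterange <- transform(daterange, group = paste0(format(date, "%b"), '-group-',
ifelse(as.integer(format(date, "%d")) > 15, 2, 1)))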
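A small self-contained sketch of the half-month grouping may help to check the group boundaries. The label format `"%Y-%m"` plus `-half-` is an arbitrary choice made here to keep the result independent of the locale; it is not taken from either answer.

```r
dates <- seq(as.Date("2020-03-01"), as.Date("2020-07-15"), by = "day")

# Day of month as an integer, then 1 for days 1-15 and 2 for days 16-end
day_of_month <- as.integer(format(dates, "%d"))
half <- ifelse(day_of_month > 15, 2L, 1L)

# Combine year-month and half into a single grouping label
group <- paste(format(dates, "%Y-%m"), half, sep = "-half-")

# Number of dates that fall into each group
table(group)
```

March to June each contribute two groups and July (which stops on the 15th) contributes one, so there are nine groups in total.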
I have a data set that looks something like the one below. Basically, I want to check that if a particular id is present at the beginning of the year (in this case Jan 1, 2003), it is present every day until the end of the year (Dec 31, 2003), and then start the check over again at the start of the next year, since people might change from year to year but should not change within a year. If an id is not present on a certain day, I would like to know which day and which id.
I first started with a for loop and checked every two days, but this is very inefficient since my data set spans roughly 50 years and will grow later on with new data.
dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates)))
df <- data.frame( dates = dates,id = id)
Edit: The chunk above contains every date, but if I delete, for example, id = 1 on the second day, the code should tell me it is missing, so the count shouldn't be the same. Below I added a line that deletes id = 1 on the second day.
df <- df[-4,]
The code below will make the same data set but delete id = 1 for jan 2, 2003 and jan 3, 2003. I am trying to get something that returns the id that is missing and the date.
dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates)))
df <- data.frame( dates = dates,id = id)
df <- df[-4,]
df <- df[-6,]
This code chunk counts the number of days on which each person appears in each year. If the count is 365 (or 366 in a leap year), the person was there every day of that year.
library(dplyr)
library(tidyr)
dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates)))
df <- data.frame( dates = dates,id = id)
dfx <- df %>%
mutate(yrs = lubridate::year(dates)) %>%
group_by(id, dates) %>%
filter(row_number()==1) %>%
group_by(id, yrs) %>%
tally
# remove original rows 4 and 6 (id 1 and id 3 on 2003-01-02)
dfa <- df[c(-4,-6),]
Then, in order to find the dates of the missing values, add an indicator column to the data set and fill in the missing dates by id. After this, the val column will have missing values. Filter the data to get the dates on which an id went missing.
dfx <- dfa %>%
mutate(val = 1) %>%
complete(nesting(id),
dates = seq(min(dates),max(dates),by = "day")) %>%
filter(is.na(val))
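The year-by-year check described in the question can also be sketched in base R. This is only one possible reading of the requirement: it assumes that the ids expected in a year are those present on the first observed day of that year, and that the year runs to its last observed day. The names `expected`, `present` and `missing` are illustrative.

```r
dates <- rep(seq(as.Date("2003-01-01"), as.Date("2004-12-31"), by = "day"), each = 3)
df <- data.frame(dates = dates, id = rep(1:3, times = length(unique(dates))))
df <- df[-c(4, 7), ]  # drop id 1 on 2003-01-02 and on 2003-01-03

df$yr <- format(df$dates, "%Y")

# For each year: every (date, id) pair that should exist, given the ids
# seen on the first observed day of that year (assumption, see above)
expected <- do.call(rbind, lapply(split(df, df$yr), function(d) {
  ids <- d$id[d$dates == min(d$dates)]
  expand.grid(dates = seq(min(d$dates), max(d$dates), by = "day"), id = ids)
}))

# Pairs that are expected but not observed
present <- paste(df$dates, df$id)
missing <- expected[!(paste(expected$dates, expected$id) %in% present), ]
missing
```

With the two rows removed above, `missing` should list id 1 on 2003-01-02 and on 2003-01-03.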
How do I add a column price.wk.average to the data such that it equals the average price of the last week, and a column price.mo.average such that it equals the average price of the last month? The price.wk.average will be the same for the entire week.
Dates Price Demand Price.wk.average Price.mo.average
2010-1-1 x x
2010-1-2 x x
......
2015-1-1 x x
jkl,
Try to post reproducible examples. It will make it easier to help you. You can use dplyr:
library(dplyr)
library(lubridate) # provides week() and month()
df <- data.frame(date = seq(as.Date("2017-1-1"),by="day",length.out = 100), price = round(runif(100)*100+50,0))
df <- df %>%
group_by(week = week(date)) %>%
mutate(Price.wk.average = mean(price)) %>%
ungroup() %>%
group_by(month = month(date)) %>%
mutate(Price.mo.average = mean(price))
(Since I don't have enough points to comment)
I wanted to point out that Eric's answer will not distinguish average weekly price by year. Therefore, if you are interested in unique weeks (Week 1 of 2012 != Week 1 of 2015 ), you will need to do extra work to group by unique weeks.
df <- data.frame( Dates = c("2010-1-1", "2010-1-2", "2015-01-3"),
Price = c(50, 20, 40) )
Dates Price
1 2010-1-1 50
2 2010-1-2 20
3 2015-01-3 40
Just to keep your data frame tidy, I suggest converting dates to POSIX format then sorting the data frame:
library(dplyr)
library(lubridate)
df <- df %>%
mutate(Dates = lubridate::parse_date_time(Dates,"ymd")) %>%
arrange( Dates )
To group by unique weeks:
df <- df %>%
group_by( yw = paste( year(Dates), week(Dates)))
Then mutate and ungroup.
To group by unique months:
df <- df %>%
group_by( ym = paste( year(Dates), month(Dates)))
and mutate and ungroup.
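The "group, then mutate" steps above can be illustrated with a base R sketch that uses ave() instead of dplyr. Note the assumption: the week key here is built with format(..., "%Y-%U") (weeks starting on Sunday), which is a stand-in and does not number weeks the same way as lubridate's week().

```r
df <- data.frame(
  Dates = as.Date(c("2010-01-01", "2010-01-02", "2015-01-03")),
  Price = c(50, 20, 40)
)

# Year-week and year-month keys, so that the same week or month number
# in different years ends up in different groups
yw <- format(df$Dates, "%Y-%U")
ym <- format(df$Dates, "%Y-%m")

# ave() returns the group mean repeated on every row of the group
df$Price.wk.average <- ave(df$Price, yw)
df$Price.mo.average <- ave(df$Price, ym)
df
```

The two 2010 rows share one weekly and one monthly average, while the 2015 row keeps its own, which is the distinction the year-qualified keys are meant to preserve.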