Adding hour and 0 count where it is missing from data [duplicate] - r

This question already has answers here:
How to format a pivot like table that includes records for all time and id values?
(2 answers)
Closed 4 years ago.
My dataframe looks like this. If there is no data for an hour, there isn't even a row for that hour of the day. The hours in the data go from 0-23, representing the 24 hours in the day. Is there a way to add the missing hours for each date with a zero count, maybe with a second dataframe as a lookup or something?
df
date hour count
2018-01-15 08 4682
2018-01-15 09 406
2018-01-16 05 3359
2018-01-16 06 11926
2018-01-16 07 42602
I would like the dataframe to look like this:
df
date hour count
2018-01-15 01 0
2018-01-15 02 0
2018-01-15 03 0
2018-01-15 04 0
2018-01-15 05 0
2018-01-15 06 0
2018-01-15 07 0
2018-01-15 08 4682
2018-01-15 09 406
2018-01-15 10 0
....
2018-01-16 05 3359
2018-01-16 06 11926
2018-01-16 07 42602
2018-01-16 08 0
2018-01-16 09 0
2018-01-16 10 0
2018-01-16 11 0
....

As mentioned by others, you could use dplyr and tidyr.
For your specific column names, this comes down to:
library(dplyr)
library(tidyr)
data = "date hour count
2018-01-15 08 4682
2018-01-15 09 406
2018-01-16 05 3359
2018-01-16 06 11926
2018-01-16 07 42602"
df <- read.table(text=data, header = T)
df
df %>%
group_by(date) %>%
complete(hour = full_seq(0:23, 1), fill = list(count = 0))
Which yields:
# A tibble: 48 x 3
# Groups: date [2]
date hour count
<fct> <dbl> <dbl>
1 2018-01-15 0. 0.
2 2018-01-15 1. 0.
3 2018-01-15 2. 0.
4 2018-01-15 3. 0.
5 2018-01-15 4. 0.
6 2018-01-15 5. 0.
7 2018-01-15 6. 0.
8 2018-01-15 7. 0.
9 2018-01-15 8. 4682.
10 2018-01-15 9. 406.
# ... with 38 more rows
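If you want the zero-padded two-digit hour strings from the question back (and an ungrouped result), a small follow-up sketch, assuming the completed result above has been saved as res:
res %>%
ungroup() %>%                                      # drop the date grouping
mutate(hour = sprintf("%02d", as.integer(hour)))   # 8 -> "08", matching the question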

You can use expand.grid to get the Cartesian product of the column values, and then use a join operation from the data.table package:
library('data.table')
df2 <- expand.grid(date = unique(df1$date), hour = 0:23, count = 0L, stringsAsFactors = FALSE)
setDT(df2)[df1, count := i.count, on = .(date, hour)]
Alternatively, use the cross join CJ in data.table to create the df2 data:
df2 <- CJ(date = unique(df1$date), hour = 0:23, count = 0L)
df2[df1, count := i.count, on = .(date, hour)]
Data:
df1 <- read.table(text='2018-01-15 08 4682
2018-01-15 09 406
2018-01-16 05 3359
2018-01-16 06 11926
2018-01-16 07 42602 ', stringsAsFactors = FALSE)
colnames(df1) <- c('date', 'hour', 'count')
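For reference, a base-R sketch of the same join (assuming df1 as defined above; grid and out are illustrative names): expand.grid builds the full date/hour grid and merge brings in the observed counts, with the remaining NAs set to zero.
grid <- expand.grid(date = unique(df1$date), hour = 0:23, stringsAsFactors = FALSE)
out <- merge(grid, df1, by = c("date", "hour"), all.x = TRUE)  # left join onto the full grid
out$count[is.na(out$count)] <- 0                               # hours with no data get a zero count
out <- out[order(out$date, out$hour), ]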

Related

Splitting created sequence of dates into separate columns

I created a dataframe (Dates) of dates/times every 3 hours from 1981-2010 as follows:
# Create dates and times
start <- as.POSIXct("1981-01-01")
interval <- 60
end <- start + as.difftime(10957, units="days")
Dates = data.frame(seq(from=start, by=interval*180, to=end))
colnames(Dates) = "Date"
I now want to split the data into four separate columns with year, month, day and hour. I tried to split the dates using the following code:
Date.split = strsplit(Dates, "-| ")
But I get the following error:
Error in strsplit(Dates, "-| ") : non-character argument
If I try to convert the Dates data to characters then it completely changes the dates, e.g.
Dates.char = as.character(Dates)
gives the following output:
Dates.char Large Character (993.5 kB)
chr "c(347155200, 347166000 ...
I'm getting lost with the conversion between character and numeric and don't know where to go from here. Any insights much appreciated.
One way is to use format.
head(
setNames(
cbind(Dates,
format(Dates, "%Y"), format(Dates, "%m"), format(Dates, "%d"),
format(Dates, "%H")),
c("dates", "year", "month", "day", "hour"))
)
dates year month day hour
1 1981-01-01 00:00:00 1981 01 01 00
2 1981-01-01 03:00:00 1981 01 01 03
3 1981-01-01 06:00:00 1981 01 01 06
4 1981-01-01 09:00:00 1981 01 01 09
5 1981-01-01 12:00:00 1981 01 01 12
6 1981-01-01 15:00:00 1981 01 01 15
A very concise way is to decompose the POSIXlt record:
Dates = cbind(Dates, do.call(rbind, lapply(Dates$Date, as.POSIXlt)))
or
Dates <- data.frame(Dates, unclass(as.POSIXlt(Dates$Date)))
It will return some additional columns, however; you can filter further:
#                  Date sec min hour mday mon year wday yday isdst zone gmtoff
# 1 1981-01-01 00:00:00   0   0    0    1   0   81    4    0     0  -03 -10800
# 2 1981-01-01 03:00:00   0   0    3    1   0   81    4    0     0  -03 -10800
# 3 1981-01-01 06:00:00   0   0    6    1   0   81    4    0     0  -03 -10800
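If only the four requested components are wanted, a minimal base-R sketch that keeps just those POSIXlt fields (Dates.split is an illustrative name):
lt <- as.POSIXlt(Dates$Date)
Dates.split <- data.frame(
  Date  = Dates$Date,
  year  = lt$year + 1900,  # POSIXlt stores years as years since 1900
  month = lt$mon + 1,      # and months as 0-11
  day   = lt$mday,
  hour  = lt$hour
)
head(Dates.split)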

Is it possible to convert year-week date format to the first day of the week?

I have a Year-Week format date. Is it possible to convert it to the first day of the week, i.e. 201553 is 2015-12-28 and 201601 is 2016-01-04?
I found here how to do it; however, it does not work correctly on my dates. Could you help me do it without the ISOweek package?
date<-c(201553L, 201601L, 201602L, 201603L, 201604L, 201605L, 201606L,
201607L, 201608L, 201609L)
as.POSIXct(paste(date, "0"),format="%Y%u %w")
Here's a way,
date<-data.frame(first = c(201553L, 201601L, 201602L, 201603L, 201604L, 201605L, 201606L,
201607L, 201608L, 201609L))
First separate the week and year from integer,
library(stringr)
library(dplyr)
date = date %>% mutate(week = str_sub(date$first,5,6))
date = date %>% mutate(year = str_sub(date$first,1,4))
Then use the aweek package to find the date:
library(aweek)
date = date %>% mutate(actual_date = get_date(week = date$week, year = date$year))
first week year actual_date
1 201553 53 2015 2015-12-28
2 201601 01 2016 2016-01-04
3 201602 02 2016 2016-01-11
4 201603 03 2016 2016-01-18
5 201604 04 2016 2016-01-25
6 201605 05 2016 2016-02-01
7 201606 06 2016 2016-02-08
8 201607 07 2016 2016-02-15
9 201608 08 2016 2016-02-22
10 201609 09 2016 2016-02-29
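Since the question asks for a solution without the ISOweek package, a base-R sketch is also possible (the helper name iso_week_start is just illustrative): the Monday of ISO week w in year y is the Monday of the week containing 4 January of y, shifted forward by w - 1 weeks.
iso_week_start <- function(yearweek) {
  y <- yearweek %/% 100
  w <- yearweek %% 100
  jan4 <- as.Date(paste0(y, "-01-04"))                       # 4 January always falls in ISO week 1
  monday_wk1 <- jan4 - (as.integer(format(jan4, "%u")) - 1)  # %u: 1 = Monday ... 7 = Sunday
  monday_wk1 + (w - 1) * 7
}
iso_week_start(c(201553L, 201601L))
# "2015-12-28" "2016-01-04"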

ggplot by group does not get expected outcomes

I have a data frame oz.sim.long with three columns; please see below. The Times column should be the x axis in ggplot, i.e. hours from 00:30-23:00. The Month column contains the groups (03 to 08). The Ozone column holds the values to plot.
> oz.sim.long
# A tibble: 144 x 3
Times Month Ozone
<chr> <chr> <fct>
1 00:30 03 44.45481
2 00:30 04 49.43994
3 00:30 05 50.86507
4 00:30 06 48.97589
5 00:30 07 46.31845
6 00:30 08 44.78662
7 01:30 03 44.47265
8 01:30 04 49.46492
9 01:30 05 50.83062
10 01:30 06 48.79744
# … with 134 more rows
Here is my code to plot, but I get an unexpected outcome. Any ideas?
library(ggplot2)
simul.plt <- ggplot(data = oz.sim.long, aes(x=Times, y=Ozone)) +
geom_point(aes(shape=Month,color=Month)) +
geom_smooth(aes(color=Month, linetype=Month), method = 'auto', se = F) +
labs(x='Times',y='Ozone (ppb)')
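No answer is reproduced here, but one likely culprit, judging only from the printout above, is that Ozone is a factor (<fct>), so geom_smooth has no numeric y values to fit, and Times is character, so the x axis is discrete. A minimal sketch of the conversion (an assumption, not a confirmed fix):
# Convert via character so the stored values, not the factor level codes, are kept
oz.sim.long$Ozone <- as.numeric(as.character(oz.sim.long$Ozone))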

Calculating first and last day of month from a yearmon object

I have a simple df with a column of dates in yearmon class:
df <- structure(list(year_mon = structure(c(2015.58333333333, 2015.66666666667,
2015.75, 2015.83333333333, 2015.91666666667, 2016, 2016.08333333333,
2016.16666666667, 2016.25, 2016.33333333333), class = "yearmon")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
I'd like a simple way, preferably using base R, lubridate or xts / zoo to calculate the first and last days of each month.
I've seen other packages that do this, but I'd like to stick with the aforementioned if possible.
We can use
library(dplyr)
library(lubridate)
library(zoo)
df %>%
mutate(firstday = day(year_mon), last = day(as.Date(year_mon, frac = 1)))
Using base R (plus zoo's as.Date method for yearmon), you could convert the yearmon object to a Date using as.Date, which gives you the first day of the month. For the last day, increment the yearmon by one month (1/12), convert to a Date, and subtract 1 day from it.
df$first_day <- as.Date(df$year_mon)
df$last_day <- as.Date(df$year_mon + 1/12) - 1
df
# year_mon first_day last_day
# <S3: yearmon> <date> <date>
# 1 Aug 2015 2015-08-01 2015-08-31
# 2 Sep 2015 2015-09-01 2015-09-30
# 3 Oct 2015 2015-10-01 2015-10-31
# 4 Nov 2015 2015-11-01 2015-11-30
# 5 Dec 2015 2015-12-01 2015-12-31
# 6 Jan 2016 2016-01-01 2016-01-31
# 7 Feb 2016 2016-02-01 2016-02-29
# 8 Mar 2016 2016-03-01 2016-03-31
# 9 Apr 2016 2016-04-01 2016-04-30
#10 May 2016 2016-05-01 2016-05-31
Use as.Date.yearmon from zoo as shown. frac specifies the fractional amount through the month to use so that 0 is beginning of the month and 1 is the end.
The default value of frac is 0.
You must already be using zoo if you are using yearmon (since that is where the yearmon methods are defined) so this does not involve using any additional packages beyond what you are already using.
If you are using dplyr, optionally replace transform with mutate.
transform(df, first = as.Date(year_mon), last = as.Date(year_mon, frac = 1))
gives:
year_mon first last
1 Aug 2015 2015-08-01 2015-08-31
2 Sep 2015 2015-09-01 2015-09-30
3 Oct 2015 2015-10-01 2015-10-31
4 Nov 2015 2015-11-01 2015-11-30
5 Dec 2015 2015-12-01 2015-12-31
6 Jan 2016 2016-01-01 2016-01-31
7 Feb 2016 2016-02-01 2016-02-29
8 Mar 2016 2016-03-01 2016-03-31
9 Apr 2016 2016-04-01 2016-04-30
10 May 2016 2016-05-01 2016-05-31
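Since lubridate is listed as acceptable in the question, a lubridate-flavoured sketch is also possible: days_in_month gives the length of each month, so the last day is the first day plus that length minus one.
library(zoo)       # yearmon class and as.Date.yearmon
library(lubridate) # days_in_month
first <- as.Date(df$year_mon)             # first day of each month
last  <- first + days_in_month(first) - 1 # handles 28/29/30/31-day months, including leap years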

Aggregate data to weekly level with every week starting from Monday

I have a data frame like this:
2015-01-30 1 Fri
2015-01-30 2 Sat
2015-02-01 3 Sun
2015-02-02 1 Mon
2015-02-03 1 Tue
2015-02-04 1 Wed
2015-02-05 1 Thu
2015-02-06 1 Fri
2015-02-07 1 Sat
2015-02-08 1 Sun
I want to aggregate it to weekly level such that every week starts on Monday and ends on Sunday. So, in the aggregated data for the above, the first week should end on 2015-02-01.
The output should look something like this for the data above:
firstweek 6
secondweek 7
I tried this,
library(xts)
data <- as.xts(data$value,order.by=as.Date(data$interval))
weekly <- apply.weekly(data,sum)
But here in the final result, every week is starting from Sunday.
This should work. I've called the dataframe m and named the columns possibly differently from yours.
library(dplyr) # install.packages("dplyr")
colnames(m) = c("Date", "count","Day")
start = as.Date("2015-01-26")
m$Week <- floor(unclass(as.Date(m$Date) - as.Date(start)) / 7) + 1
m$Week = as.numeric(m$Week)
m %>% group_by(Week) %>% summarise(count = sum(count))
The dplyr library is great for data manipulation, but this is just a rough hack to get the week number in.
Convert to Date and use the %W format to get a week number (%W counts weeks starting on Monday)...
df <- read.csv(textConnection("2015-01-30, 1, Fri,
2015-01-30, 2, Sat,
2015-02-01, 3, Sun,
2015-02-02, 1, Mon,
2015-02-03, 1, Tue,
2015-02-04, 1, Wed,
2015-02-05, 1, Thu,
2015-02-06, 1, Fri,
2015-02-07, 1, Sat,
2015-02-08, 1, Sun"), header=F, stringsAsFactors=F)
names(df) <- c("date", "something", "day")
df$date <- as.Date(df$date, format="%Y-%m-%d")
df$week <- format(df$date, "%W")
aggregate(df$something, list(df$week), sum)
With dplyr and lubridate this is really easy thanks to the function isoweek:
my.df <- read.table(header=FALSE, text=
'2015-01-30 1 Fri
2015-01-30 2 Sat
2015-02-01 3 Sun
2015-02-02 1 Mon
2015-02-03 1 Tue
2015-02-04 1 Wed
2015-02-05 1 Thu
2015-02-06 1 Fri
2015-02-07 1 Sat
2015-02-08 1 Sun')
library(dplyr)
library(lubridate)
my.df %>% mutate(week = isoweek(V1)) %>% group_by(week) %>% summarise(sum(V2))
or a bit shorter
my.df %>% group_by(isoweek(V1)) %>% summarise(sum(V2))
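One caveat worth adding (not part of the original answer): isoweek returns only the week number, so data spanning several years would merge same-numbered weeks from different years. Grouping by the ISO year as well avoids that:
my.df %>% group_by(isoyear(V1), isoweek(V1)) %>% summarise(sum(V2))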
