How to create a date with "as.Date" from 3 different vectors (day, month, year)?

I have 3 different vectors including data of days, months and years. I would like to merge these 3 into one and add it to a new column of my data frame. I tried to use "as.Date" to merge these 3 vectors but it won't work...
Could you help me? :)
Here is my code:
Day<- substr(x = my_meteo_charleroi$Local.Time, start = 1, stop =2 )
Month<- substr(x = my_meteo_charleroi$Local.Time, start = 4, stop =5 )
Year<- substr(x = my_meteo_charleroi$Local.Time, start = 7, stop =10 )
my_date<- as.Date(c(Day, Month, Year), format = c("%d, %m, %y"))

Does this work:
Day <- c(10,11,12)
Month <- c(11,11,12)
Year <- c(2019,2020,2020)
library(tibble)
library(dplyr)
tibble(Day, Month, Year) %>%
mutate(my_date = paste(Day, Month, Year, sep = '-')) %>%
mutate(my_date = as.Date(my_date, format = '%d-%m-%Y', origin = '1970-01-01')) %>%
pull(my_date)
[1] "2019-11-10" "2020-11-11" "2020-12-12"
Dataframe with my_date column looks like this:
tibble(Day, Month, Year) %>% mutate(my_date = paste(Day, Month, Year, sep = '-')) %>%
mutate(my_date = as.Date(my_date, format = '%d-%m-%Y', origin = '1970-01-01'))
# A tibble: 3 x 4
Day Month Year my_date
<dbl> <dbl> <dbl> <date>
1 10 11 2019 2019-11-10
2 11 11 2020 2020-11-11
3 12 12 2020 2020-12-12

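You are pretty close already; just two modifications are needed.
as.Date() expects one character vector with a single element per date, plus a single format string, so paste the parts together first instead of concatenating them with c(). Because Year has four digits, the format needs %Y (%y expects a two-digit year).
as.Date(paste(Day, Month, Year, sep = "-"), format = '%d-%m-%Y')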
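A minimal base R sketch of the same idea, assuming Day, Month and Year are character vectors like the ones substr() returns in the question (the values below are made up for illustration). It also shows ISOdate() as an alternative that skips the format string entirely.

```r
# Example values standing in for the substr() results from the question
Day   <- c("10", "11", "12")
Month <- c("11", "11", "12")
Year  <- c("2019", "2020", "2020")

# Option 1: build one string per date, then parse with a single format.
# %Y is the four-digit year; %y would only read two digits.
my_date <- as.Date(paste(Day, Month, Year, sep = "-"), format = "%d-%m-%Y")

# Option 2: ISOdate() takes year, month, day directly and returns POSIXct,
# which as.Date() converts to a Date.
my_date_alt <- as.Date(ISOdate(Year, Month, Day))

print(my_date)
```

Either result can then be assigned to a new column, for example my_meteo_charleroi$Date <- my_date.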

Related

Compute day of the month and monthly averages in R and add as column

I have a data frame stored with daily data within a year and I want to compute monthly averages as well as day of the week averages and add those values as additional columns.
Here is a MWE of my data frame
df <- tibble(Date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
Daily_sales = rnorm(365, 2, 1))
df <- df %>%
mutate(month = lubridate::month(Date), #Month
dow = lubridate::wday(Date, week_start = 1), #Day of the week
dom = lubridate::day(Date)) #Day of the month
My problem is as follows: I know how to compute the monthly averages, e.g.
df %>% group_by(month) %>% summarize(Monthly_avg = mean(Daily_sales))
but I don't know how to add this as an additional column where every value in January has the January average, and every value in February has the February average. E.g. if the average for January is 2.22, then the new column should contain 2.22 for all dates in January. I have the same problem for the day of the week average.
Instead of summarize()ing an entire group into one row, we can mutate() all rows to add the group mean:
result <- df %>%
group_by(month) %>% mutate(monthly_avg = mean(Daily_sales)) %>%
group_by(dow) %>% mutate(dow_avg = mean(Daily_sales)) %>%
group_by(dom) %>% mutate(dom_avg = mean(Daily_sales)) %>%
ungroup()
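If you would rather not depend on dplyr, base R's ave() does the same thing: it returns a vector as long as its input, with each element replaced by its group's mean. This is a sketch on simulated data; the seed is arbitrary and the column names simply follow the question.

```r
set.seed(42)  # arbitrary seed, only to make the example reproducible

df <- data.frame(Date = seq(as.Date("2020-01-01"), by = "1 day", length.out = 365))
df$Daily_sales <- rnorm(nrow(df), mean = 2, sd = 1)

# Grouping keys derived with format(): month number and ISO weekday (Monday = 1)
df$month <- as.integer(format(df$Date, "%m"))
df$dow   <- as.integer(format(df$Date, "%u"))

# ave() applies mean() within each group by default and keeps one value per row
df$monthly_avg <- ave(df$Daily_sales, df$month)
df$dow_avg     <- ave(df$Daily_sales, df$dow)

head(df)
```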

Assigning factor labels and levels within a function

I have the following data frame:
library(janitor)
library(lubridate)
library(tidyverse)
data <- data.frame(date = c("1/28/2022", "1/25/2022", "1/27/2022", "1/23/2022"),
y = c(100, 25, 35, 45))
I need to write a function that adds a new column that sorts the date column and assigns sequential day stage (i.e., Day 1, Day 2, etc.). So far I have tried the following with no luck.
day.assign <- function(df){
df2 <- clean_names(df)
len <- length(unique(df2$date))
levels.start <- as.character(sort(mdy(unique(df2$date))))
day.label <- paste("Day", seq(1, len, by = 1))
df <-
df %>%
mutate(Date = as.character(mdy(Date)),
Day = as.factor(Date,
levels = levels.start,
labels = day.label))
}
Future files will contain a varying number of dates, which must be accounted for when assigning the day column (i.e., one file may have 4 dates while the next may have 6).
You could do:
library(lubridate)
library(dplyr)
data <- data.frame(date = c("1/28/2022", "1/25/2022", "1/27/2022", "1/23/2022"),
y = c(100, 25, 35, 45))
day.assign <- function(df) {
df %>%
mutate(Date = mdy(date)) %>%
arrange(mdy(date)) %>%
mutate(Day = paste0("Day ", row_number()))
}
day.assign(data)
#> date y Date Day
#> 1 1/23/2022 45 2022-01-23 Day 1
#> 2 1/25/2022 25 2022-01-25 Day 2
#> 3 1/27/2022 35 2022-01-27 Day 3
#> 4 1/28/2022 100 2022-01-28 Day 4
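Note that row_number() gives every row its own label, so two rows with the same date would end up on different days. If repeated dates can occur, a sketch closer to the original attempt is to use factor() (not as.factor(), which has no levels or labels arguments) with the sorted unique dates as levels. The function name day_assign and the duplicated date in the sample data are assumptions for illustration.

```r
day_assign <- function(df) {
  dates <- as.Date(df$date, format = "%m/%d/%Y")
  lev   <- sort(unique(dates))            # one level per distinct date, in order

  df$Date <- dates
  df$Day  <- factor(format(dates),
                    levels = format(lev),
                    labels = paste("Day", seq_along(lev)))
  df[order(dates), ]                      # return rows sorted by date
}

# Sample data with one repeated date (hypothetical)
data <- data.frame(date = c("1/28/2022", "1/25/2022", "1/25/2022", "1/23/2022"),
                   y = c(100, 25, 35, 45))
res <- day_assign(data)
print(res)
```

Because the levels are derived from the data, the function adapts to files with any number of distinct dates.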

Plotting date intervals in ggplot2

I have a dataset which has a bunch of date intervals (i.e. POSIXct format start dates and end dates).
In the example provided, let's say each period corresponds to a time when someone was in school or out of school. I'm interested in plotting the data in ggplot2; each row is essentially data for one period. Currently the rows don't have a factor variable, but I've put one in the example as it may make things easier to plot. It's worth noting that in some cases the end date of one period and the beginning of the next overlap.
In the data, each row is a unique stint in school associated to a specific period. I'm interested in creating a sequence of weeks (from the first week to the last week in dataset) in the x axis and on the y axis I want just either a dot for each week to signify whether the person was in school (also identifying which stint) or out of school (even a gap perhaps would suffice). Thus perhaps an 8 level factor is needed in this case, one for each period, and a level for out of school (or perhaps no level is needed for when out of school)?
So in this case we could envisage having 7 rows of dots on the y axis, something (very loosely) like this (this example has many gaps in the lines, but I expect few or no gaps).
I envisaged the process to be something like: create a sequence from min(start_date) to max(end_date), join rows to this. Then somehow identify each period and create a factor variable for each period. Then plot the factor variable (e.g. period1, period2, period3) against the sequence of dates. I haven't been able to do this though as it's quite fiddly.
Looking at the lubridate package I was thinking that using interval() and %within% might be the solution but I wasn't sure.
library(tidyverse)
library(lubridate)
start_dates = ymd_hms(c("2019-05-08 00:00:00",
"2020-01-17 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00"), tz = "UTC")
end_dates = ymd_hms(c( "2019-10-24 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00",
"2022-01-19 00:00:00"), tz = "UTC")
df = data.frame(studying = paste0("period",seq(1:7),sep = ""),start_dates,end_dates)
You can try
df %>%
ggplot() +
geom_segment(aes(x = start_dates, xend = end_dates, y =studying, yend = studying, color = studying), size=3) +
geom_segment(aes(x = start_dates, xend = start_dates, y =0, yend = studying))+
geom_segment(aes(x = end_dates, xend = end_dates, y =0, yend = studying))
Per week, as you asked in the comments:
df %>%
as_tibble() %>%
mutate(start = week(start_dates),
end = week(end_dates)) %>%
mutate(gr = start>end,
start_2 = ifelse(gr, 0, NA),
end_2 = ifelse(gr, end, NA),
end = ifelse(gr, 52, end)) %>%
select(-2:-3, -gr) %>%
pivot_longer(-1) %>%
filter(!is.na(value)) %>%
separate(col = name, into = c("name", "index"), sep = "_", fill = "right") %>%
mutate(index = ifelse(is.na(index), 1, index)) %>%
pivot_wider(names_from = "name", values_from = "value") %>%
ggplot(aes(y=studying , yend=studying , x=start, xend=end, color=studying)) +
geom_segment(size = 2)
To get overlaps you can use the valr package. Since it was developed to find overlaps in DNA segments, the data needs some transformation. Start and end are calculated as cumulative week numbers counted from the first year. Chrom is set to "1".
library(valr)
df %>%
as_tibble() %>%
mutate(start = week(start_dates) + (year(start_dates)-min(year(start_dates)))*52,
end = week(end_dates) + (year(end_dates)-min(year(end_dates)))*52,
chrom="1",
index=1:n()) %>%
valr::bed_intersect(., .) %>%
filter(studying.x != studying.y) %>%
# filter duplicated intervals out
mutate(index = paste(index.x, index.y) %>% str_split(., " ") %>% map(sort) %>% map_chr(toString)) %>%
filter(duplicated(index))
# A tibble: 5 x 15
studying.x start_dates.x end_dates.x start.x end.x chrom index.x studying.y start_dates.y end_dates.y start.y end.y index.y .overlap index
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr>
1 period3 1583193600 1590624000 61 74 1 3 period2 1579219200 1583193600 55 61 2 0 2, 3
2 period4 1590624000 1607558400 74 102 1 4 period3 1583193600 1590624000 61 74 3 0 3, 4
3 period5 1607558400 1620345600 102 123 1 5 period4 1590624000 1607558400 74 102 4 0 4, 5
4 period6 1620345600 1641254400 123 157 1 6 period5 1607558400 1620345600 102 123 5 0 5, 6
5 period7 1641254400 1642550400 157 159 1 7 period6 1620345600 1641254400 123 157 6 0 6, 7
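The process sketched in the question (a weekly sequence from the first start date to the last end date, labelled with the period each week falls into) can also be written in base R. This is a sketch under two assumptions that are not in the original: intervals are treated as half-open, [start, end), so a shared boundary date belongs to the later period, and weeks outside every period get NA.

```r
start_dates <- as.Date(c("2019-05-08", "2020-01-17", "2020-03-03", "2020-05-28",
                         "2020-12-10", "2021-05-07", "2022-01-04"))
end_dates   <- as.Date(c("2019-10-24", "2020-03-03", "2020-05-28", "2020-12-10",
                         "2021-05-07", "2022-01-04", "2022-01-19"))
periods <- data.frame(studying = paste0("period", 1:7),
                      start_dates, end_dates,
                      stringsAsFactors = FALSE)

# One row per week between the first start and the last end
weeks <- data.frame(week_start = seq(min(start_dates), max(end_dates), by = "week"))

# For each week, look up the first period whose [start, end) interval contains it
weeks$studying <- vapply(seq_len(nrow(weeks)), function(i) {
  w   <- weeks$week_start[i]
  hit <- which(periods$start_dates <= w & w < periods$end_dates)
  if (length(hit) > 0) periods$studying[hit[1]] else NA_character_
}, character(1))

table(weeks$studying, useNA = "ifany")
```

The weeks data frame can then be plotted with one point per week, for instance with geom_point(aes(x = week_start, y = studying)) after dropping or relabelling the NA rows.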

How to put/save all elements of a List into one Excel sheet in R?

I have a list (bbb) with 5 elements in it, i.e., each element for a year, like 2010, 2011, ... , 2014:
The first one in the list is this:
> bbb[1]
$`2010`
Date Average
X2010.01.01 2010-01-01 2.079090e-03
X2010.01.02 2010-01-02 5.147627e-04
X2010.01.03 2010-01-03 2.997464e-04
X2010.01.04 2010-01-04 1.375538e-04
X2010.01.05 2010-01-05 1.332109e-04
The second one in the list is this:
> bbb[2]
$`2011`
Date Average
X2011.01.01 2011-01-01 1.546253e-03
X2011.01.02 2011-01-02 1.152864e-03
X2011.01.03 2011-01-03 1.752446e-03
X2011.01.04 2011-01-04 2.639658e-03
X2011.01.05 2011-01-05 5.231150e-03
X2011.01.06 2011-01-06 8.909878e-04
And so on.
Here is my question:
How can I save all of these list's elements in 1 sheet of an Excel file to have something like this:
Your help would be highly appreciated.
You can do this using dcast.
bbb <- list(`2010` = data.frame(date = as.Date("2010-01-01") + 0:4,
avg = 1:5),
`2011` = data.frame(date = as.Date("2011-01-01") + 0:5,
avg = 11:16),
`2012` = data.frame(date = as.Date("2012-01-01") + 0:9,
avg = 21:30),
`2013` = data.frame(date = as.Date("2013-01-01") + 0:7,
avg = 21:28))
df <- do.call("rbind", bbb)
df$year <- format(df$date, format = "%Y")
df$month_date <- format(df$date, format = "%b-%d")
library(data.table)
library(openxlsx)
df_dcast <- dcast(as.data.table(df), month_date ~ year, value.var = "avg") # data.table's dcast() is meant for data.tables, so convert first
write.xlsx(df_dcast, "example1.xlsx")
Or using spread
library(dplyr)
library(tidyr)
df2 <- df %>%
select(-date) %>%
spread(key = year, value = avg)
write.xlsx(df2, "example2.xlsx")
This isn't very pretty, but it's the best I could think of right now. But you could take the dataframes and loop through the list, joining them by date like this:
library(tidyverse)
library(lubridate)
bbb <- list(`2010` = tibble(date = c('01-01-2010', '01-02-2010', '01-03-2010', '01-04-2010', '01-05-2010'),
average = 11:15),
`2011` = tibble(date = c('01-01-2011', '01-02-2011', '01-03-2011', '01-04-2011', '01-05-2011'),
average = 1:5),
`2012` = tibble(date = c('01-01-2012', '01-02-2012', '01-03-2012', '01-04-2012', '01-05-2012'),
average = 6:10))
for (i in seq_along(bbb)) {
if(i == 1){
df <- bbb[[i]] %>%
mutate(
date = paste(day(as.Date(date, format = '%m-%d-%Y')),
month(as.Date(date, format = '%m-%d-%Y'), label = TRUE),
sep = '-')
)
colnames(df) <- c('date', names(bbb[i])) # Assuming your list of dataframes has just 2 columns: date and average
} else {
join_df <- bbb[[i]] %>%
mutate(
date = paste(day(as.Date(date, format = '%m-%d-%Y')),
month(as.Date(date, format = '%m-%d-%Y'), label = TRUE),
sep = '-')
)
colnames(join_df) <- c('date', names(bbb[i]))
df <- full_join(df, join_df, by = 'date')
}
}
This loops through the list of dataframes and reformats the dates to Day-Month.
# A tibble: 5 x 4
date `2010` `2011` `2012`
<chr> <int> <int> <int>
1 1-Jan 11 1 6
2 2-Jan 12 2 7
3 3-Jan 13 3 8
4 4-Jan 14 4 9
5 5-Jan 15 5 10
You could then write that out with the write_xlsx() function from the writexl package.
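The explicit loop can also be condensed with Map() and Reduce() in base R. This is only a sketch: the list below is made-up data in the shape used by the first answer (columns date and avg), and the key is formatted as "%m-%d" so that it sorts chronologically and does not depend on the locale's month names.

```r
bbb <- list(`2010` = data.frame(date = as.Date("2010-01-01") + 0:4, avg = 1:5),
            `2011` = data.frame(date = as.Date("2011-01-01") + 0:5, avg = 11:16),
            `2012` = data.frame(date = as.Date("2012-01-01") + 0:9, avg = 21:30))

# Reduce each element to a month-day key plus one value column named after the year
per_year <- Map(function(d, yr) {
  out <- data.frame(month_day = format(d$date, "%m-%d"),
                    value = d$avg,
                    stringsAsFactors = FALSE)
  names(out)[2] <- yr
  out
}, bbb, names(bbb))

# Full outer join of all years on the month-day key; missing days become NA
wide <- Reduce(function(a, b) merge(a, b, by = "month_day", all = TRUE), per_year)

print(wide)
```

The resulting wide data frame has one row per calendar day and one column per year, and can be passed to whichever Excel writer you prefer, such as the ones mentioned in the answers.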

My original data is weekly data, how do I plot it as weekly data in r?

My data are originally in weeks (example below). I find it difficult to do time series analysis on them, since time series data is usually in the form dd-mm-yy.
WEEK SALES
1: 29.2010 60.48
2: 30.2010 95.76
3: 31.2010 51.66
4: 32.2010 73.71
5: 33.2010 22.05
Thanks in advance!
We can convert the week number as date using functions from the lubridate package, and then plot the date on the x-axis and SALES on the y-axis.
library(tidyverse)
library(lubridate)
dat2 <- dat %>%
separate(WEEK, into = c("WEEK", "YEAR"), convert = TRUE) %>%
mutate(Date = ymd("2010-01-01") + weeks(WEEK - 1))
ggplot(dat2, aes(x = Date, y = SALES)) +
geom_line()
DATA
dat <- read.table(text = " WEEK SALES
1 '29.2010' 60.48
2 '30.2010' 95.76
3 '31.2010' 51.66
4 '32.2010' 73.71
5 '33.2010' 22.05",
header = TRUE, stringsAsFactors = FALSE,
colClasses = c("character", "character", "numeric"))
UPDATE
If the data are from different years, we can use the following code.
dat2 <- dat %>%
separate(WEEK, into = c("WEEK", "YEAR"), convert = TRUE) %>%
mutate(Date = ymd(paste(YEAR, "01", "01", sep = "-")) + weeks(WEEK - 1))
DATA
dat <- read.table(text = " WEEK SALES
1 '29.2010' 60.48
2 '30.2010' 95.76
3 '31.2010' 51.66
4 '32.2010' 73.71
5 '33.2010' 22.05
6 '1.2011' 37.5
7 '2.2011' 45.2
8 '3.2011' 62.9",
header = TRUE, stringsAsFactors = FALSE,
colClasses = c("character", "character", "numeric"))
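For reference, the same conversion can be sketched in base R. It follows the convention used above, where week 1 starts on 1 January of the given year (this is not the ISO 8601 week definition). The helper name week_to_date is hypothetical, and the WEEK values must be kept as character: read as numbers, "30.2010" would lose its trailing zero.

```r
week_to_date <- function(x) {
  parts <- strsplit(x, ".", fixed = TRUE)          # "29.2010" -> c("29", "2010")
  week  <- as.integer(vapply(parts, `[`, character(1), 1))
  year  <- as.integer(vapply(parts, `[`, character(1), 2))
  # 1 January of the year plus (week - 1) whole weeks
  as.Date(paste0(year, "-01-01")) + 7 * (week - 1)
}

dat <- data.frame(WEEK  = c("29.2010", "30.2010", "1.2011", "2.2011"),
                  SALES = c(60.48, 95.76, 37.5, 45.2),
                  stringsAsFactors = FALSE)
dat$Date <- week_to_date(dat$WEEK)

plot(dat$Date, dat$SALES, type = "l", xlab = "Date", ylab = "SALES")
```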
