Turning date into integers in R

I have a data frame in R with a month column and a day column containing strings such as "jan", "feb", or "mar" for months and "mon", "tue", or "wed" for days. I would like to convert both columns into integers ranging from 1 to 12 for months and 1 to 7 for days. I have tried the built-in constant month.abb, but when I use match() with the month column it just returns a vector of NA. Thank you very much for your help!

A general method would be to define a factor with the levels you want, and then turn it into an integer. See reprex underneath.
This would also work for weekdays.
months <- c(
"jan", "feb", "mar", "apr", "may", "jun",
"jul", "aug", "sep", "oct", "nov", "dec"
)
x <- sample(months, 10, replace = TRUE)
x
#> [1] "sep" "oct" "mar" "jun" "oct" "mar" "apr" "aug" "jul" "sep"
as.integer(factor(x, levels = months))
#> [1] 9 10 3 6 10 3 4 8 7 9

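Use match. Note that month.abb is capitalised ("Jan", "Feb", ...), so matching lowercase input against it directly returns NA; lowercase it first with tolower():
match(c("jan", "feb", "may"), tolower(month.abb))
match(c("mon", "tue", "thu"), c("mon", "tue", "wed", "thu", "fri", "sat", "sun"))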
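A minimal sketch of why the original attempt returned NA and how the same lookup applies to both columns. The data frame `df` and its values are made up for illustration, and the weekday vector assumes the week starts on Monday.

```r
# Hypothetical data mirroring the question: lowercase abbreviations.
df <- data.frame(
  month = c("jan", "mar", "dec"),
  day   = c("mon", "wed", "sun"),
  stringsAsFactors = FALSE
)

# month.abb is capitalised ("Jan", "Feb", ...), so lowercase input finds no match.
no_match <- match(df$month, month.abb)

# Lowercasing the lookup vector makes the comparison succeed.
df$month_num <- match(df$month, tolower(month.abb))

# Base R has no built-in weekday constant, so define one (assumes Monday = 1).
weekdays_abb <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
df$day_num <- match(df$day, weekdays_abb)

df
```

Any value that is not in the lookup vector (a typo, or a four-letter form such as "thur") comes back as NA, so checking `anyNA()` on the result is a cheap sanity test.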

Related

How to add an increasing index based on multiple columns in R

I have a data frame that contains the columns "hour", "day","month" and "count".
library(tidyverse)
set.seed(0)
df <- expand_grid(expand_grid(
hour = seq(0:23), # note: seq(0:23) returns 1:24, so hours run 1 to 24; 0:23 would give 0 to 23
day = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun")) %>%
mutate(count = sample(0:100, n(), replace = TRUE))
head(df)
# A tibble: 6 × 4
hour day month count
<int> <chr> <chr> <int>
1 1 Mon Jan 13
2 1 Mon Feb 67
3 1 Mon Mar 38
4 1 Mon Apr 0
5 1 Mon May 33
6 1 Mon Jun 86
I would like to add a new column named "id" that contains an increasing index which can be used to sort the data in chronological order. The solution I found is not particularly concise and requires me to set factor levels before calling arrange(). Is there another way to solve this issue that capitalises on the fact that I am working with (unformatted) dates?
This is my solution with arrange():
df2 <- df %>%
mutate(day = factor(day, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
month = factor(month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"))) %>%
arrange(month, day, hour) %>%
mutate(id = row_number())
head(df2)
# A tibble: 6 × 5
hour day month count id
<int> <fct> <fct> <int> <int>
1 1 Mon Jan 13 1
2 2 Mon Jan 43 2
3 3 Mon Jan 82 3
4 4 Mon Jan 66 4
5 5 Mon Jan 49 5
6 6 Mon Jan 79 6
Any suggestions are much appreciated. Thank you!
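One possible approach, sketched here under assumptions: use match() against month.abb and a Monday-first weekday vector as sort keys, which avoids converting the columns to factors. The small data frame below is made up and is not the df from the question; `day_levels` is a name introduced for illustration.

```r
# Small made-up data frame in shuffled order.
df <- data.frame(
  hour  = c(5, 0, 12, 3),
  day   = c("Tue", "Mon", "Mon", "Mon"),
  month = c("Jan", "Feb", "Jan", "Jan"),
  stringsAsFactors = FALSE
)

# Assumed weekday order: Monday first.
day_levels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# order() accepts several keys; match() turns each label into its position.
ord <- order(match(df$month, month.abb), match(df$day, day_levels), df$hour)
df <- df[ord, ]
rownames(df) <- NULL

# Increasing index in the sorted order.
df$id <- seq_len(nrow(df))
df
```

With dplyr the same keys can be passed straight to arrange(), for example `arrange(match(month, month.abb), match(day, day_levels), hour)`, followed by `mutate(id = row_number())`.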

Convert name of day to date format in R

I want to convert this data to date format and create a new column with the month-year value:
month : Factor w/ 10 levels "apr","aug","dec",..: 7 7 7 7 7 7 7 7 7 7 ...
day : chr [1:41188] "mon" "mon" "mon" "mon" ...
year : num [1:41188] 2008 2008 2008 2008 2008 ...
Here is the dput() output:
dput(head(df))
df <-
structure(list(month = structure(c(7L, 7L, 7L, 7L, 7L, 7L),
.Label = c("apr", "aug", "dec", "jul", "jun", "mar", "may",
"nov", "oct", "sep"), class = "factor"), day = c("mon", "mon",
"mon", "mon", "mon", "mon"), year = c(2008, 2008, 2008, 2008,
2008, 2008)), class = "data.frame", row.names = c(NA, -6L))
The main problem is the month and day columns, because they are stored as factor and character.
I tried the following statements:
as.integer(factor(df$month, levels=month.abb))
And this:
match(df$month, month.abb)
I also did this:
df$date<-paste(as.character(df$month), df$year)
This worked and returns:
$ date : chr [1:41188] "may 2008" "may 2008" "may 2008" "may 2008"
How can I change this to date format?
I'll arbitrarily pick the "first" day-of-month for each weekday that you've listed. To make it interesting, I'll change the weekdays so that we have some variability in the data.
df <-
structure(list(month = structure(c(7L, 7L, 7L, 7L, 7L, 7L),
.Label = c("apr", "aug", "dec", "jul", "jun", "mar", "may",
"nov", "oct", "sep"), class = "factor"), day = c("mon", "tue",
"wed", "fri", "sat", "sun"), year = c(2008, 2008, 2008, 2008,
2008, 2008)), class = "data.frame", row.names = c(NA, -6L))
df
# month day year
# 1 may mon 2008
# 2 may tue 2008
# 3 may wed 2008
# 4 may fri 2008
# 5 may sat 2008
# 6 may sun 2008
From here, we need to determine what the 1st day of each month is, and then find the first day-of-week that is at or after that day.
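firstdow <- as.POSIXlt(paste(df$year, df$month, "01", sep = "-"), format = "%Y-%b-%d")$wday
# $wday counts from Sunday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
datadow <- match(df$day, c("mon", "tue", "wed", "thu", "fri", "sat", "sun"))
# days to step forward from the 1st to reach the wanted weekday (0 to 6), then add 1;
# (firstdow + datadow - 1) %% 7 + 1 is only right when the month starts on a Thursday
datadom <- (datadow - firstdow) %% 7 + 1
df$date <- as.Date(paste(df$year, df$month, datadom, sep = "-"), format = "%Y-%b-%d")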
df
# month day year date
# 1 may mon 2008 2008-05-05
# 2 may tue 2008 2008-05-06
# 3 may wed 2008 2008-05-07
# 4 may fri 2008 2008-05-02
# 5 may sat 2008 2008-05-03
# 6 may sun 2008 2008-05-04
And proof that this came up with the correct day-of-month to get the first day-of-week:
format(df$date, format = "%a")
# [1] "Mon" "Tue" "Wed" "Fri" "Sat" "Sun"
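A sketch of the same idea wrapped in a function that takes a month number instead of an abbreviation, which sidesteps locale-dependent `%b` parsing. `first_weekday_date` is a hypothetical helper name, and the weekday coding (1 = Monday through 7 = Sunday) is an assumption chosen to line up with the match() lookup used above.

```r
# Hypothetical helper: date of the first given weekday in a month.
# year: numeric, month: 1-12, dow: 1 = Monday ... 7 = Sunday.
first_weekday_date <- function(year, month, dow) {
  first <- as.Date(sprintf("%04d-%02d-01", as.integer(year), as.integer(month)))
  # POSIXlt $wday counts 0 = Sunday ... 6 = Saturday.
  first_wday <- as.POSIXlt(first)$wday
  # Days to step forward from the 1st; %% keeps the offset in 0..6.
  offset <- (dow - first_wday) %% 7
  first + offset
}

may_monday  <- first_weekday_date(2008, 5, 1)  # first Monday of May 2008
june_monday <- first_weekday_date(2008, 6, 1)  # first Monday of June 2008
june_sunday <- first_weekday_date(2008, 6, 7)  # first Sunday of June 2008
```

May 2008 starts on a Thursday and June 2008 on a Sunday, so the two months exercise different offsets; a month number can be obtained from an abbreviation with `match(tolower(x), tolower(month.abb))`.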
We could do 2 things:
As month.abb is a system constant, we can use it to get numeric month
Use as.yearmon from zoo package to get month and year
library(dplyr)
library(zoo)
df %>%
mutate(month = match(month, tolower(month.abb))) %>%
mutate(new_date = as.yearmon(paste(year, month), "%Y %m"))
Output:
month day year new_date
1 5 mon 2008 Mai 2008
2 5 mon 2008 Mai 2008
3 5 mon 2008 Mai 2008
4 5 mon 2008 Mai 2008
5 5 mon 2008 Mai 2008
6 5 mon 2008 Mai 2008
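The month name in the printed output above follows the session locale. If a locale-independent value is preferred, a base R sketch along the following lines may help; the data frame is made up, and the column names `first_of_month` and `month_year` are introduced here for illustration only.

```r
# Made-up data in the same shape as the question (factor month, numeric year).
df <- data.frame(
  month = factor(c("may", "jun", "dec")),
  year  = c(2008, 2008, 2009)
)

# Convert the factor to character before the lookup to make the intent explicit.
month_num <- match(as.character(df$month), tolower(month.abb))

# Build a real Date on the first of each month, then a numeric year-month label.
df$first_of_month <- as.Date(sprintf("%d-%02d-01", as.integer(df$year), month_num))
df$month_year <- format(df$first_of_month, "%Y-%m")
df
```

Keeping the Date column alongside the label means the rows can still be sorted or differenced chronologically.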

Combining abbreviated months and year into one variable in R

I have time series data with a column for the month and a column for the year. The months are JAN, FEB, etc.
I'm trying to combine them into one month year variable in order to run time series analysis on it. I'm very new to R and could use any guidance.
Perhaps something like this?
library(dplyr)
c("JAN", "FEB", "MAR", "APR",
"MAY", "JUN", "JUL", "AUG",
"SEP", "OCT", "NOV", "DEC") %>%
rep(., times = 3) %>%
as.factor() -> months
c("2018", "2019", "2020") %>%
rep(., each = 12) %>%
as.factor() -> years
df1 <- cbind.data.frame(months, years)
paste(df1$months, df1$years, sep = ".") %>%
as.factor() -> merged.years.months
Start with your month/year df.
library(tidyverse)
library(lubridate)
events <- tibble(month = c("JAN", "MAR", "FEB", "NOV", "AUG"),
year = c(2018, 2019, 2018, 2020, 2019))
Let's say that each of your time periods start on the first of the month.
series <- events %>%
mutate(mo1 = dmy(paste(1, month, year)))
This is what you want
R > series
# A tibble: 5 x 3
month year mo1
<chr> <dbl> <date>
1 JAN 2018 2018-01-01
2 MAR 2019 2019-03-01
3 FEB 2018 2018-02-01
4 NOV 2020 2020-11-01
5 AUG 2019 2019-08-01
These are now dates; you can use them in other analyses.
Base R solution:
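events <- within(events, {
# look up each abbreviation's position in month.abb (upper-cased to match the data);
# as.integer(as.factor(sort(month))) would number the months alphabetically instead
month_no <- match(month, toupper(month.abb))
date <- as.Date(paste(year, sprintf("%02d", month_no), "01", sep = "-"), "%Y-%m-%d")
rm(month_no, month, year)
}
)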

a pair-case multiplication of variables/df (R)

I've a question about a pair-case multiplication of variables/df in R.
Consider the following problem:
I have data in vectors (or in a data frame) with labels and values as follows:
alpha_lab <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
alpha_num <- c(15.28767, 44.38356, 73.47945, 103.56164, 133.64384, 163.72603, 193.80822, 224.38356, 254.46575, 284.54795, 314.63014, 344.71233)
the alpha_num is a product of other calculations (irrelevant), the following values correspond to their labels in alpha_lab (so January = 15.28767, April = 103.56164...).
I also have a dataframe with "case", "month" (as int), "year" and "value":
> df_values
# A tibble: 1,173 x 4
# Groups: case, month
case month year value
<chr> <int> <int> <dbl>
1 A1 1 2009 121.
2 A1 1 2010 177.
3 A1 1 2011 220.
4 A1 1 2012 196.
5 A1 1 2013 161.
6 A1 1 2014 142.
7 A1 2 2009 82.3
8 A1 2 2010 169.
9 A1 2 2011 194.
10 A1 2 2012 169.
# ... with 1,163 more rows
what I am looking for, is a way to compute for each case (20 different) in each month-year a product of
value * alpha_num
where alpha_num is taken only for a calculated month, so for example:
row 1 (A1, January 2009 case): 121 * 15.28767
row 5 (A1, January 2013 case): 161 * 15.28767
row 7 (A1, February 2009 case): 82.3 * 44.38356
and so on for each case in each month in each year...
Is there a way to compute this without adding corresponding alpha_num value to df_values table one-by-one month case?
Thanks!
This should be helpful:
library(dplyr)
# original vectors
alpha_lab <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
alpha_num <- c(15.28767, 44.38356, 73.47945, 103.56164, 133.64384, 163.72603, 193.80822, 224.38356, 254.46575, 284.54795, 314.63014, 344.71233)
# example of your dataframe
df_values <- data.frame(case = c("A1", "A1"),
month = c(1, 2),
year = c(2009, 2009),
value = c(121, 82.3), stringsAsFactors = FALSE)
df_values %>% mutate(new_col = value * alpha_num[month])
# case month year value new_col
# 1 A1 1 2009 121.0 1849.808
# 2 A1 2 2009 82.3 3652.767
Note that this works because your alpha_lab vector has the months in the right order. i.e. Jan, Feb, ..., Dec represent the positions 1, 2, ..., 12.
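If the order of the coefficients cannot be relied on, a named-vector lookup is one way to make the pairing explicit. This is a sketch under assumptions: `alpha` and `alpha_rev` are names introduced here, and the two-row data frame is the same toy example as above.

```r
alpha_lab <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
alpha_num <- c(15.28767, 44.38356, 73.47945, 103.56164, 133.64384, 163.72603,
               193.80822, 224.38356, 254.46575, 284.54795, 314.63014, 344.71233)

# Name each coefficient by its label so lookups no longer depend on position.
alpha <- setNames(alpha_num, alpha_lab)

df_values <- data.frame(case = c("A1", "A1"),
                        month = c(1L, 2L),
                        year = c(2009L, 2009L),
                        value = c(121, 82.3))

# month.abb[month] turns 1, 2, ... into "Jan", "Feb", ..., which index alpha by name.
df_values$new_col <- df_values$value * unname(alpha[month.abb[df_values$month]])

# The result is the same even if the coefficient vector is stored in another order.
alpha_rev <- rev(alpha)
new_col_rev <- df_values$value * unname(alpha_rev[month.abb[df_values$month]])
```

This relies on the labels in alpha_lab being spelled exactly like month.abb; a mismatched label would produce NA rather than a silently wrong product.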
You can also try to work with a lookup table and dplyr::left_join.
library("magrittr")
sampleData <- tibble::tibble(
case = "A1",
month = rep(1:12, each = 6),
year = rep(2009:2014, 12),
value = runif(72, 10, 130)
)
lookup_table <- tibble::tibble(
alpha_lab = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
alpha_id = 1:12,
alpha_num = c(15.28767, 44.38356, 73.47945, 103.56164, 133.64384, 163.72603, 193.80822, 224.38356, 254.46575, 284.54795, 314.63014, 344.71233)
)
result <- dplyr::left_join(sampleData, lookup_table, by = c("month" = "alpha_id")) %>%
dplyr::mutate(new_col = alpha_num * value) %>%
dplyr::select(-alpha_num, -alpha_lab)

Creating season variable by month with dplyr in R

I have a dataset that has a variable called month, with each month stored as a character. Is there a way with dplyr to combine some months to create a season variable? I have tried the following but got an error:
data %>%
mutate(season = ifelse(month[1:3], "Winter", ifelse(month[4:6], "Spring",
ifelse(month[7:9], "Summer",
ifelse(month[10:12], "Fall", NA)))))
With error:
Error in mutate_impl(.data, dots) : Column `season` must be length 100798 (the number of rows) or one, not 3
I am new to R so any help is much appreciated!
The correct syntax should be (assuming month holds the numbers 1 to 12):
data %>% mutate(season = ifelse(month %in% 10:12, "Fall",
ifelse(month %in% 1:3, "Winter",
ifelse(month %in% 4:6, "Spring",
"Summer"))))
Edit: probably a better way to get the job done
Astronomical Seasons
temp_data %>%
mutate(
season = case_when(
month %in% 10:12 ~ "Fall",
month %in% 1:3 ~ "Winter",
month %in% 4:6 ~ "Spring",
TRUE ~ "Summer"))
Meteorological Seasons
temp_data %>%
mutate(
season = case_when(
month %in% 9:11 ~ "Fall",
month %in% c(12, 1, 2) ~ "Winter",
month %in% 3:5 ~ "Spring",
TRUE ~ "Summer"))
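The snippets above compare month against numbers, while the question describes month as a character column. A base R sketch of bridging the two, assuming the column holds English abbreviations such as "Jan"; `month_chr` and `season_by_month` are names made up for this example, and the grouping shown is the meteorological one.

```r
# Made-up input: abbreviated month names as characters.
month_chr <- c("Jan", "Apr", "Aug", "Dec")

# Convert abbreviations to month numbers 1-12.
month_num <- match(month_chr, month.abb)

# Season for each month number, January first (Dec-Feb = Winter).
season_by_month <- c("Winter", "Winter",
                     "Spring", "Spring", "Spring",
                     "Summer", "Summer", "Summer",
                     "Fall", "Fall", "Fall",
                     "Winter")

# Index the lookup vector by month number.
season <- season_by_month[month_num]
season
```

Inside a dplyr pipeline the same lookup could be written as `mutate(season = season_by_month[match(month, month.abb)])`.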
When there are multiple key/value, we can do a join with a key/val dataset
keyval <- data.frame(month = month.abb,
season = rep(c("Winter", "Spring", "Summer", "Fall"), each = 3),
stringsAsFactors = FALSE)
left_join(data, keyval, by = "month")
You can also try using dplyr::recode or functions from forcats. I think this is the simplest method here:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
data <- tibble(month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
data %>%
mutate(
season = fct_collapse(
.f = month,
Spring = c("Mar", "Apr", "May"),
Summer = c("Jun", "Jul", "Aug"),
Autumn = c("Sep", "Oct", "Nov"),
Winter = c("Dec", "Jan", "Feb")
)
)
#> # A tibble: 12 x 2
#> month season
#> <chr> <fct>
#> 1 Jan Winter
#> 2 Feb Winter
#> 3 Mar Spring
#> 4 Apr Spring
#> 5 May Spring
#> 6 Jun Summer
#> 7 Jul Summer
#> 8 Aug Summer
#> 9 Sep Autumn
#> 10 Oct Autumn
#> 11 Nov Autumn
#> 12 Dec Winter
Created on 2018-04-06 by the reprex package (v0.2.0).
