I have a dataframe with monthly data, one column containing the year and one column containing the month. I'd like to combine them into one column with Date format, going from this:
Year Month Data
2020 1 54
2020 2 58
2020 3 78
2020 4 59
To this:
Date Data
2020-01 54
2020-02 58
2020-03 78
2020-04 59
I think you can't represent a Date format in R without showing the day. If you want a character column, like in your example, you can do:
> x <- data.frame(Year = c(2020,2020,2020,2020), Month = c(1,2,3,4), Data = c(54,58,78,59))
> x$Month <- ifelse(nchar(x$Month) == 1, paste0(0, x$Month), x$Month) # pad single-digit months with a leading 0
> x$Date <- paste(x$Year, x$Month, sep = '-')
> x
Year Month Data Date
1 2020 01 54 2020-01
2 2020 02 58 2020-02
3 2020 03 78 2020-03
4 2020 04 59 2020-04
> class(x$Date)
[1] "character"
If you want a Date type column you will have to add:
x$Date <- paste0(x$Date, '-01')
x$Date <- as.Date(x$Date, format = '%Y-%m-%d')
x
class(x$Date)
Maybe the simplest way would be to arbitrarily assign a day (e.g. 01) to all your dates? That way the intervals between dates are preserved.
data<-data.frame(Year=c(2020,2020,2020,2020), Month=c(1,2,3,4), Data=c(54,58,78,59))
data$Date<-paste(data$Year,data$Month,"01",sep="-")
data$Date<-as.Date(data$Date,format="%Y-%m-%d")
You can use sprintf -
sprintf('%d-%02d', data$Year, data$Month)
#[1] "2020-01" "2020-02" "2020-03" "2020-04"
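If a true Date column is needed rather than a character one, the same sprintf call can build a full date string first. This is only a minimal sketch: it assumes the data frame from the question and that pinning every month to day 01 is acceptable. The YearMonth column name is made up for illustration.

```r
data <- data.frame(Year = c(2020, 2020, 2020, 2020),
                   Month = c(1, 2, 3, 4),
                   Data = c(54, 58, 78, 59))

# Build "YYYY-MM-01" strings; the day 01 is an arbitrary placeholder
data$Date <- as.Date(sprintf('%d-%02d-01', data$Year, data$Month))

# Format back to year-month for display only; this is character, not Date
data$YearMonth <- format(data$Date, '%Y-%m')
```

The Date column can be used for sorting and date arithmetic, while YearMonth is only a label.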
The following file names were used in a camera trap study. The S number represents the site, P is the plot within a site, C is the camera number within the plot, the first string of numbers is the YearMonthDay and the second string of numbers is the HourMinuteSecond.
file.names <- c( 'S123.P2.C10_20120621_213422.jpg',
'S10.P1.C1_20120622_050148.jpg',
'S187.P2.C2_20120702_023501.jpg')
file.names
Use a combination of str_sub() and str_split() to produce a data frame with columns corresponding to the site, plot, camera, year, month, days, hour, minute, and second for these three file names. So we want to produce code that will create the data frame:
Site Plot Camera Year Month Day Hour Minute Second
S123 P2   C10    2012 06    21  21   34     22
S10  P1   C1     2012 06    22  05   01     48
S187 P2   C2     2012 07    02  02   35     01
My code is below:
file.names %>%
str_sub(start = 1, end = -5) %>%
str_replace_all("_", ".") %>%
str_split(pattern = fixed("."), n = 5)
I have no idea how to split the date and time fields into their components.
nms <- c("Site", "Plot", "Camera", "Year", "Month", "Day", "Hour", "Minute", "Second")
library(tidyverse)
data.frame(file.names) %>%
extract(file.names, nms,
'(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})')
Site Plot Camera Year Month Day Hour Minute Second
1 S123 P2 C10 2012 06 21 21 34 22
2 S10 P1 C1 2012 06 22 05 01 48
3 S187 P2 C2 2012 07 02 02 35 01
In base R:
type.convert(strcapture('(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})',
file.names, as.list(setNames(character(length(nms)), nms))), as.is = TRUE)
Site Plot Camera Year Month Day Hour Minute Second
1 S123 P2 C10 2012 6 21 21 34 22
2 S10 P1 C1 2012 6 22 5 1 48
3 S187 P2 C2 2012 7 2 2 35 1
This is a specific case where your data is neatly formatted: fields are separated by either _ or ., and the date and time fields have a uniform character length. That means you can skip the long capture-group regex and instead just split on those delimiters, drop the substrings into a data frame, then separate the date components and the time components by their positions. As is often the case with a tidyverse solution, you write a little more code in exchange for something that is easy to follow and scale.
library(magrittr)
strsplit(file.names, split = "[._]") %>%
purrr::map_dfr(setNames, c("site", "plot", "camera", "date", "time", "ext")) %>%
tidyr::separate(date, into = c("year", "month", "day"), sep = c(4, 6)) %>%
tidyr::separate(time, into = c("hour", "minute", "second"), sep = c(2, 4)) %>%
dplyr::select(-ext)
#> # A tibble: 3 × 9
#> site plot camera year month day hour minute second
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 S123 P2 C10 2012 06 21 21 34 22
#> 2 S10 P1 C1 2012 06 22 05 01 48
#> 3 S187 P2 C2 2012 07 02 02 35 01
The ext column was leftover from the initial string splitting, so you can drop it.
I don't know anything about str_sub or str_split other than the fact that they may be efforts to adapt the sub and strsplit functions to an alternate universe. I just learned base R and have not really seen the need to learn a new syntax. Here's a base solution:
as.Date( sub( "(^[^_]+[_])(\\d{8})(.+$)", "\\2", file.names) , format="%Y%m%d")
[1] "2012-06-21" "2012-06-22" "2012-07-02"
You can read the sub pattern as:
1) beginning with the start of the string, collect all the non-underscore characters, plus the underscore that ends them, into the first capture group
2) Then get the next 8 digits (if they exist) in a second capture group
3) and everything that follows will be in a third capture group
The substitution is to just return the contents of the second capture group. The conversion to Date values is straightforward. I'm assuming that should be clear from the code, but if not then see ?as.Date.
Here's the rest, including the time:
as.POSIXct( sub( "([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names) ,
format="%Y%m%d %H%M%S")
[1] "2012-06-21 21:34:22 PDT" "2012-06-22 05:01:48 PDT" "2012-07-02 02:35:01 PDT"
If you want the individual components broken out, convert to POSIXlt and extract the fields from the resulting list.
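A minimal sketch of that POSIXlt breakout follows. The tz = "UTC" argument and the names stamp, lt and parts are assumptions added here so the result does not depend on the local time zone; they are not part of the answer above.

```r
file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# Same substitution as above: keep "YYYYMMDD HHMMSS"
stamp <- sub("([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names)

# tz = "UTC" is an assumption of this sketch
lt <- as.POSIXlt(stamp, format = "%Y%m%d %H%M%S", tz = "UTC")

parts <- data.frame(
  Year   = lt$year + 1900,  # POSIXlt stores years since 1900
  Month  = lt$mon + 1,      # POSIXlt months are 0-based
  Day    = lt$mday,
  Hour   = lt$hour,
  Minute = lt$min,
  Second = lt$sec
)
```

Note that the components come back as numbers, so leading zeros such as "06" are not kept.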
Given a date column in a data frame:
Date (note it's in year-month-day format):
2020-01-01
2020-02-01
2020-03-03
2020-04-04
How do I get the running total of the number of days between each date?
Count:
0
30
58
87
Just convert the character strings to a Date object.
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-03", "2020-04-04"))
dates - dates[1]
# Time differences in days
# [1] 0 31 62 94
You can convert your character strings to Date format using as.Date and then use the lag function from dplyr to get the number of days since the previous date:
df <- data.frame(date = c("2020-01-01", "2020-02-02", "2020-03-03", "2020-04-04"))
df$ndays <- as.numeric(as.Date(df$date) - dplyr::lag(as.Date(df$date), n = 1, default = as.Date(df$date)[1]))
> df
date ndays
1 2020-01-01 0
2 2020-02-02 32
3 2020-03-03 30
4 2020-04-04 32
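The ndays column above holds the gap to the previous date rather than a running total. If the running total is what is wanted, one base R option is to accumulate those gaps with cumsum. This is a sketch only, reusing the dates from this answer; the total column name is made up for illustration.

```r
df <- data.frame(date = as.Date(c("2020-01-01", "2020-02-02", "2020-03-03", "2020-04-04")))

# Days between consecutive dates, with 0 for the first row
df$ndays <- c(0, as.numeric(diff(df$date)))

# Running total of days elapsed since the first date
df$total <- cumsum(df$ndays)
```

The total column is equivalent to as.numeric(df$date - df$date[1]).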
Here's my data, which has 10 years in one column and the day of the year (1 to 365) in a second column:
dat <- data.frame(year = rep(1980:1989, each = 365), doy= rep(1:365, times = 10))
I am assuming all years are non-leap years i.e. they have 365 days.
I want to create another column month which is basically month of the year the day belongs to.
library(dplyr)
dat %>%
mutate(month = as.integer(ceiling(doy/31)))
However, this solution is wrong since it assigns the wrong months to some days. I am looking for a dplyr solution if possible.
We can convert it to a date-time class by using the appropriate format (i.e. %Y %j) and then extract the month with format
dat$month <- with(dat, format(strptime(paste(year, doy), format = "%Y %j"), '%m'))
Or use $mon to extract the month and add 1
dat$month <- with(dat, strptime(paste(year, doy), format = "%Y %j")$mon + 1)
tail(dat$month)
#[1] 12 12 12 12 12 12
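One caveat: 1980, 1984 and 1988 are leap years, so parsing with the real year and %j maps day 60 to February 29 in those years. If the question's assumption of 365-day years should hold for every row, one option is to map doy onto a fixed non-leap reference year instead. This is a sketch only; 1981 is an arbitrary choice made here.

```r
dat <- data.frame(year = rep(1980:1989, each = 365), doy = rep(1:365, times = 10))

# 1981 is an arbitrary non-leap reference year (an assumption of this sketch);
# doy 1 maps to 1981-01-01, doy 365 to 1981-12-31
ref <- as.Date(dat$doy - 1, origin = "1981-01-01")

dat$month <- as.integer(format(ref, "%m"))
```

With this mapping, day 59 falls in February and day 60 in March for every year in the data.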
This should give you an integer value for the months (month() is not in base R; it is available from lubridate, for example):
library(lubridate)
dat$month.num <- month(as.Date(paste(dat$year, dat$doy), '%Y %j'))
If you want the month names:
dat$month.names <- month.name[month(as.Date(paste(dat$year, dat$doy), '%Y %j'))]
The result (only showing a few rows):
> dat[29:33,]
year doy month.num month.names
29 1980 29 1 January
30 1980 30 1 January
31 1980 31 1 January
32 1980 32 2 February
33 1980 33 2 February
I have a column with date formatted as MM-DD-YYYY, in the Date format.
I want to add 2 columns: one that contains only YYYY and another that contains only MM.
How do I do this?
Once again base R gives you all you need, and you should not do this with sub-strings.
Here we first create a data.frame with a proper Date column. If your date is in text format, parse it first with as.Date() or my anytime::anydate() (which does not need formats).
Then given the date creating year and month is simple:
R> df <- data.frame(date=Sys.Date()+seq(1,by=30,len=10))
R> df[, "year"] <- format(df[,"date"], "%Y")
R> df[, "month"] <- format(df[,"date"], "%m")
R> df
date year month
1 2017-12-29 2017 12
2 2018-01-28 2018 01
3 2018-02-27 2018 02
4 2018-03-29 2018 03
5 2018-04-28 2018 04
6 2018-05-28 2018 05
7 2018-06-27 2018 06
8 2018-07-27 2018 07
9 2018-08-26 2018 08
10 2018-09-25 2018 09
R>
If you want year or month as integers, you can wrap as.integer() around the format() call.
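As a short sketch of that integer variant, using three of the dates from the output above as fixed sample data:

```r
df <- data.frame(date = as.Date(c("2017-12-29", "2018-01-28", "2018-02-27")))

# format() returns character; as.integer() drops the leading zero of the month
df$year  <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))
```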
A base R option would be to remove the substring with sub and then read with read.table
df1[c('month', 'year')] <- read.table(text=sub("-\\d{2}-", ",", df1$date), sep=",")
Or using tidyverse
library(tidyverse)
separate(df1, date, into = c('month', 'day', 'year')) %>%
select(-day)
Note: it may be better to convert to a Date class instead of using string formatting (mdy(), month() and year() come from lubridate).
library(lubridate)
df1 %>%
mutate(date = mdy(date), month = month(date), year = year(date))
data
df1 <- data.frame(date = c("05-21-2017", "06-25-2015"))
I've got some data that looks about like so:
demo <- read.table(text = "
date num
'12/31/2010' 35
'04/01/2013' 34
'06/02/2015' 34
'06/15/2015' 34
'01/30/2015' 33
'04/15/2014' 33
'05/28/2014' 33
'06/02/2014' 33
'06/17/2015' 33
'06/25/2015' 33
'06/24/2015' 32
'07/31/2013' 32
'08/31/2013' 32
'04/27/2015' 31
'05/07/2015' 31
'12/30/2013' 31
'11/21/2014' 30
'12/20/2013' 30
",header = TRUE, sep = "")
How do I group and count these by year?
2010 1
2013 5
etc.
I can use plyr to count each date: count(demo, vars = 'date'), but not group them.
I'd convert the dates to a date format first, rather than treating them as strings.
library(lubridate)
# Convert string to date format
demo$date <- as.Date(demo$date, "%m/%d/%Y")
# Table of counts by year
table(year(demo$date))
# 2010 2013 2014 2015
# 1 5 4 8
I like data.table for this. First we need to convert to "Date" class in the date column, then find the number of observations by year.
library(data.table)
demo$date <- as.Date(demo$date, "%m/%d/%Y")
as.data.table(demo)[, .N, keyby = year(date)]
# year N
# 1: 2010 1
# 2: 2013 5
# 3: 2014 4
# 4: 2015 8
We use keyby here so we get a nice ordered result. Alternatively, and to change your entire table to a data.table, you can use setDT() instead of as.data.table(). This is the preferred method.
setDT(demo)[, .N, keyby = year(date)]
table(substr(demo$date, 7,10))
2010 2013 2014 2015
1 5 4 8
substr allows you to isolate the year, and table tallies the counts.
demo$date <- as.Date(demo$date, format = "%m/%d/%Y")
demo$year <- format(demo$date, format = "%Y")
aggregate(num ~ year, demo, FUN = length)
## year num
## 1 2010 1
## 2 2013 5
## 3 2014 4
## 4 2015 8
Dates can be parsed and reformatted using the Date and POSIXct classes. This allows you to handle dates that look like '1/1/2010'.
dates <- as.Date(demo$date, format = "%m/%d/%Y")
head(dates)
# [1] "2010-12-31" "2013-04-01" "2015-06-02" "2015-06-15" "2015-01-30"
# [6] "2014-04-15"
table(format(dates, format = "%Y"))
#
# 2010 2013 2014 2015
# 1 5 4 8