R change integer to three integers - r

I have this number
20101213, which represents the date 2010 Dec 13th. I want to extract the year, month and day numbers from that number, so I should end up with three variables containing those values.
What I have tried:
value = 20101213
as.numeric(strsplit(as.character(value), "")[[1]])
The result is [1] 2 0 1 0 1 2 1 3
but I didn't know how to continue. Could you help me, please?

You probably want to get this into a date-time format anyways for future computing, so how about:
(x <- strptime(20101213, "%Y%m%d"))
# [1] "2010-12-13 EST"
This will enable you to do computations that you wouldn't have been able to with just the year, month number, and day number, such as grabbing the day of the week (0=Sunday, 1=Monday, ...) or day of the year:
x$wday
# [1] 1
x$yday
# [1] 346
Further, you could easily extract the year, month number, and day of month number:
c(x$year+1900, x$mon+1, x$mday)
# [1] 2010 12 13
Edit: As pointed out by @thelatemail, an alternative that doesn't involve remembering offsets is:
as.numeric(c(format(x, "%Y"), format(x, "%m"), format(x, "%d")))
# [1] 2010 12 13
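If you have a whole vector of such numbers, a minimal sketch along the same lines is to parse with as.Date() and pull the pieces out with format(). This assumes every value is an 8-digit yyyymmdd number; the names values, dates and parts are made up for illustration. Passing an explicit format means an impossible date comes back as NA instead of stopping with an error:

```r
# Illustrative input; the last value is deliberately invalid (month 13)
values <- c(20101213, 19990101, 20101341)

# Parse the digits as yyyymmdd; unparseable entries become NA
dates <- as.Date(as.character(values), format = "%Y%m%d")

# One column per component, kept as integers
parts <- data.frame(
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  day   = as.integer(format(dates, "%d"))
)
parts
```

Checking is.na(dates) afterwards is a cheap way to spot malformed input before it propagates.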

year <- as.numeric(substr(as.character(value),start = 1,stop = 4))
month <- as.numeric(substr(as.character(value),start = 5,stop = 6))
day <- as.numeric(substr(as.character(value),start = 7,stop = 8))

If you don't want to deal with a string representation, you could also just use the modulo operator (%%) like this:
# using mod
year = floor(value/10000)
month = floor((value %% 10000)/100)
day = value %% 100
This extracts the relevant parts of the number as expected.
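The same arithmetic can be wrapped in a small function, sketched here with integer division (%/%) in place of floor(). The helper name split_yyyymmdd is hypothetical, and the sketch assumes the input really is an 8-digit yyyymmdd number:

```r
# Hypothetical helper: split yyyymmdd integers into year, month and day
split_yyyymmdd <- function(value) {
  value <- as.integer(value)
  list(
    year  = value %/% 10000L,          # drop the last four digits
    month = (value %/% 100L) %% 100L,  # drop the day, keep two digits
    day   = value %% 100L              # keep the last two digits
  )
}

parts <- split_yyyymmdd(20101213)
str(parts)
```

Because every operation is vectorised, the function also works on a whole column of such numbers.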

Related

Converting filenames to date in year + weeks returns Error in charToDate (x): character string is not in a standard unambiguous format

For a time series analysis of over 1000 raster in a raster stack I need the date. The data is almost weekly in the structure of the files
"... 1981036 .... tif"
The zero separates year and week
I need something like: "1981-36"
but always get the error
Error in charToDate (x): character string is not in a standard unambiguous format
library(sp)
library(lubridate)
library(raster)
library(zoo)
raster_path <- ".../AVHRR_All"
all_raster <- list.files(raster_path,full.names = TRUE,pattern = ".tif$")
all_raster
brings me:
all_raster
".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif"
".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif"
".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif"
…
To get the year and the associated week, I have used the following code:
timeline <- data.frame(
year= as.numeric(substr(basename(all_raster), start = 17, stop = 17+3)),
week= as.numeric(substr(basename(all_raster), 21, 21+2))
)
timeline
brings me:
timeline
year week
1 1981 35
2 1981 36
3 1981 37
4 1981 38
…
But I need something like = "1981-35" to be able to plot my time series later
I tried that:
timeline$week <- as.Date(paste0(timeline$year, "%Y")) + week(timeline$week -1, "%U")
and get the error: Error in charToDate(x) : character string is not in a standard unambiguous format
or I tried that
fileDates <- as.POSIXct(substr((all_raster),17,23), format="%y0%U")
and get the same error
Until someone posts a better way to do this, you could try:
x <- c(".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif", ".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif",
".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif")
xx <- substr(x, 21, 27)
library(lubridate)
# Split by position instead of strsplit(xx, "0"), which breaks when the year or week itself contains a zero (e.g. 1990 or week 10)
dates <- lapply(xx, function(s) c(substr(s, 1, 4), substr(s, 6, 7)))
dates <- sapply(dates,function(x) {
year_week <- unlist(x)
year <- year_week[1]
week <- as.numeric(year_week[2]) # pass a number, not a string, to weeks()
start_date <- as.Date(paste0(year,'-01-01'))
date <- start_date+weeks(week)
#note here: OP asked for beginning of week.
#There's some ambiguity here, the above is end-of-week;
#uncomment here for beginning of week, just subtracted 6 days.
#I think this might yield inconsistent results, especially year-boundaries
#hence suggestion to use end of week. See below for possible solution
#date <- start_date+weeks(week)-days(6)
return (as.character(date))
})
newdates <- as.POSIXct(dates)
format(newdates, "%Y-%W")
Thanks to @Soren, who posted this answer here: Get the month from the week of the year
You can do it if you specify that Monday is weekday 1 with %u:
w <- c(35,36,37,38)
y <- c(1981,1981,1981,1981)
s <- c(1,1,1,1)
df <- data.frame(y,w,s)
df$d <- paste(as.character(df$y), as.character(df$w),as.character(df$s), sep=".")
df$date <- as.Date(df$d, "%Y.%U.%u")
# So here we have variable date as date if you need that for later.
class(df$date)
#[1] "Date"
# If you want it to look like Y-W, you can do the final formatting:
df$date <- format(df$date, "%Y-%U")
# y w s d date
# 1 1981 35 1 1981.35.1 1981-35
# 2 1981 36 1 1981.36.1 1981-36
# 3 1981 37 1 1981.37.1 1981-37
# 4 1981 38 1 1981.38.1 1981-38
# NB: though it looks correct, the resulting df$date is actually a character:
class(df$date)
#[1] "character"
Alternatively, you could do the same by setting the Sunday as 0 with %w.
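If all you need is the "1981-36" label plus something date-like to plot against, a different route is to skip strptime's week codes entirely. The sketch below is only one possible approach: it assumes the file names always contain ".P" followed by a 4-digit year and a 3-digit week, and it assumes week 1 starts on January 1st with every week exactly 7 days long (your data provider may define weeks differently). The names files, stamp, label and week_start are illustrative.

```r
files <- c(".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif",
           ".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif",
           ".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif")

# Capture the 7 digits that follow ".P": yyyy, then a 3-digit week number
stamp <- sub("^.*\\.P([0-9]{7})\\..*$", "\\1", basename(files))
year <- as.integer(substr(stamp, 1, 4))
week <- as.integer(substr(stamp, 5, 7))

# Character label in the requested "year-week" form
label <- sprintf("%d-%02d", year, week)

# Assumption: week 1 starts on January 1st and each week is 7 days long
week_start <- as.Date(sprintf("%d-01-01", year)) + 7L * (week - 1L)

data.frame(label, week_start)
```

Keeping week_start as a real Date gives the plot a proper time axis, while label stays available for display.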

Retrieving month number in fiscal year using lubridate

I'm hoping to retrieve the month number from a fiscal year that starts in November (i.e. the first day of the fiscal year is November 1st). The following code provides my desired output, borrowing the week_start syntax of lubridate::wday, where year_start is analogous to week_start:
library('lubridate')
dateToRetrieve = ymd('2017-11-05')
#output: [1] "2017-11-05"
monthFromDate = month(dateToRetrieve, year_start=11)
#output: [1] 1
Since this functionality doesn't yet exist, I'm looking for an alternative solution that provides the same output. Adding period(10, units="month") to each date does not work because the length of different months leads to issues translating between months (e.g. March 31st minus a month = February 31st, which doesn't make sense).
I checked a somewhat similar question on the lubridate github here, but didn't see any solutions. Does anyone have an idea that will provide my desired functionality?
Many thanks,
1) lubridate Below x can be a character vector or a Date vector:
x <- "2017-11-05" # test data
(month(x) - 11) %% 12 + 1
## [1] 1
2) Base R To do this with only base R first calculate the month number giving mx as shown and then perform the same computation:
mx <- as.POSIXlt(x)$mon + 1
(mx - 11) %% 12 + 1
## [1] 1
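For reuse, the base R computation can be wrapped in a function whose year_start argument mirrors the one wished for in the question. This is only a sketch: fiscal_month is a hypothetical name, not a lubridate function.

```r
# Hypothetical helper: month number within a fiscal year that begins
# in calendar month `year_start` (11 = November)
fiscal_month <- function(x, year_start = 11L) {
  m <- as.POSIXlt(as.Date(x))$mon + 1L  # calendar month, 1..12
  (m - year_start) %% 12L + 1L          # shift so year_start maps to 1
}

fiscal_month(c("2017-11-05", "2017-01-15", "2017-10-31"))
```

Because the calculation only touches the month number, it avoids the end-of-month problems that come with adding or subtracting month periods.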
This is not a pretty way, but you could create a vector of month names starting at November, get the full month name of the date object, then match the two to get the position in the vector.
suppressPackageStartupMessages(library('lubridate'))
x <- format(ISOdate(2004,1:12,1),"%B")[c(11,12,1:10)]
match(as.character(month(ymd('2017-11-05'), label = TRUE, abbr = FALSE)), x)
#> [1] 1
match(as.character(month(ymd('2017-01-15'), label = TRUE, abbr = FALSE)), x)
#> [1] 3
match(as.character(month(ymd('2017-05-01'), label = TRUE, abbr = FALSE)), x)
#> [1] 7

How can I merge three variables into one variable that represents the merged variables separated by a comma? [duplicate]

I have three variables: Year, Month, and Day. How can I merge them into one variable ("Date") so that the variable is represented as such:
yyyy-mm-dd
Thanks in advance and best regards!
How do you merge three variables into one variable?
Consider two methods: the old-school way, and the way with dplyr, lubridate, and data frames.
And consider the data types: the inputs can be number or character, and the final type can be Date or POSIXct.
Old School Method
The old school method is straightforward. I assume you are using vectors or lists and don't know data frames yet. Let's take your data, force it to a standardized, unambiguous format, and concatenate the data.
> y <- 2012:2015
> y
[1] 2012 2013 2014 2015
> m <- 1:4
> m
[1] 1 2 3 4
> d <- 10:13
> d
[1] 10 11 12 13
Use as.numeric if you want to be safe and convert everything to the same format before concatenation. If you get any NA values you will need to handle them with the is.na function and provide a default value.
Use paste with the sep separator value set to your delimiter, in this case, the hyphen.
> paste(y,m,d, sep = '-')
[1] "2012-1-10" "2013-2-11" "2014-3-12" "2015-4-13"
Dataframe / Dplyr / Lubridate Way
> df <- data.frame(year = y, mon = m, day = d)
> df
year mon day
1 2012 1 10
2 2013 2 11
3 2014 3 12
4 2015 4 13
Below I do the following:
Take the df object
Create a new variable named Date
Concatenate the numeric columns year, mon, and day with a - separator
Convert the resulting string into a Date with ymd()
> library(dplyr)
> library(lubridate)
> df %>%
mutate(Date = ymd(
paste(year, mon, day, sep = '-')
)
)
year mon day Date
1 2012 1 10 2012-01-10
2 2013 2 11 2013-02-11
3 2014 3 12 2014-03-12
4 2015 4 13 2015-04-13
Below we create year-month-day character strings, yyyy-mm-dd character strings (similar except one digit month and day are zero padded out to 2 digits) and Date class. The last one prints out as yyyy-mm-dd and can be manipulated in ways that character strings can't, for example adding one to a Date class object gives the next day.
First we set up some sample input:
year <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day <- c(15, 9, 25)
Convert to year-month-day character string. This is not quite yyyy-mm-dd since 1 digit months and days are not zero padded to 2 digits:
paste(year, month, day, sep = "-")
## [1] "2017-3-15" "2015-1-9" "2014-10-25"
Convert to Date class. It prints on the console as yyyy-mm-dd. Two alternatives:
as.Date(paste(year, month, day, sep = "-"))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
as.Date(ISOdate(year, month, day))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
Convert to character string yyyy-mm-dd. In this case 1 digit months and days are zero padded out to 2 characters. Two alternatives:
as.character(as.Date(paste(year, month, day, sep = "-")))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
sprintf("%d-%02d-%02d", year, month, day)
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
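One caveat is worth a sketch: as.Date() without a format stops with the "character string is not in a standard unambiguous format" error when the first non-missing string cannot be parsed, for example an impossible date. A hedged variant, assuming whole-number year, month and day vectors (build_date is a made-up name, not a base R or lubridate function), passes an explicit format so that impossible dates become NA instead:

```r
# Hypothetical helper: combine year/month/day into a Date, NA if impossible
build_date <- function(year, month, day) {
  txt <- sprintf("%04d-%02d-%02d",
                 as.integer(year), as.integer(month), as.integer(day))
  # An explicit format makes unparseable strings NA instead of an error
  as.Date(txt, format = "%Y-%m-%d")
}

# The third combination (February 30th) does not exist
res <- build_date(c(2017, 2015, 2015), c(3, 1, 2), c(15, 9, 30))
res
```

Checking is.na(res) then tells you which rows need attention.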

Extract Date in R

I struggle mightily with dates in R and could do this pretty easily in SPSS, but I would love to stay within R for my project.
I have a date column in my data frame and want to remove the year completely in order to leave the month and day. Here is a peek at my original data.
> head(ds$date)
[1] "2003-10-09" "2003-10-11" "2003-10-13" "2003-10-15" "2003-10-18" "2003-10-20"
> class((ds$date))
[1] "Date"
I "want" it to be.
> head(ds$date)
[1] "10-09" "10-11" "10-13" "10-15" "10-18" "10-20"
> class((ds$date))
[1] "Date"
If possible, I would love to set the first date to be October 1st instead of January 1st.
Any help you can provide will be greatly appreciated.
EDIT: I felt like I should add some context. I want to plot an NHL player's performance over the course of a season which starts in October and ends in April. To add to this, I would like to facet the plots by each season which is a separate column in my data frame. Because I want to compare cumulative performance over the course of the season, I believe that I need to remove the year portion, but maybe I don't; as I indicated, I struggle with dates in R. What I am looking to accomplish is a plot that compares cumulative performance over relative dates by season and have the x-axis start in October and end in April.
> d = as.Date("2003-10-09", format="%Y-%m-%d")
> format(d, "%m-%d")
[1] "10-09"
Is this what you are looking for?
library(ggplot2)
library(reshape2) ## provides melt()
## make up data for two seasons a and b
a = as.Date("2010/10/1")
b = as.Date("2011/10/1")
a.date <- seq(a, by='1 week', length=28)
b.date <- seq(b, by='1 week', length=28)
## make up some score data
a.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
b.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
## create a data frame
df <- data.frame(a.date, b.date, a.score, b.score)
df
## Since I am using ggplot, I had better create a "long formatted" data frame
df.molt <- melt(df, measure.vars = c("a.score", "b.score"))
levels(df.molt$variable) <- c("First season", "Second season")
df.molt
Then, I am using ggplot2 for plotting the data:
## plot it
ggplot(aes(y = value, x = a.date), data = df.molt) + geom_point() +
geom_line() + facet_wrap(~variable, ncol = 1) +
scale_x_date("Date", date_labels = "%m-%d")
If you want to modify the x-axis (e.g., display format), then you'll probably be interested in scale_x_date.
You have to remember that Date is a numeric format, representing the number of days passed since the "origin" of the internal date counting:
> str(Date)
Class 'Date' num [1:10] 14245 14360 14475 14590 14705 ...
This is the same as in Excel, if you want a reference. Hence the solution with format is perfectly valid.
Now if you want to set the first date of a year as October 1st, you can construct a year index like this:
redefine.year <- function(x,start="10-1"){
year <- as.numeric(strftime(x,"%Y"))
yearstart <- as.Date(paste(year,start,sep="-"))
year + (x >= yearstart) - min(year) + 1
}
Testing code :
Start <- as.Date("2009-1-1")
Stop <- as.Date("2011-11-1")
Date <- seq(Start,Stop,length.out=10)
data.frame( Date=as.character(Date),
year=redefine.year(Date))
gives
Date year
1 2009-01-01 1
2 2009-04-25 1
3 2009-08-18 1
4 2009-12-11 2
5 2010-04-05 2
6 2010-07-29 2
7 2010-11-21 3
8 2011-03-16 3
9 2011-07-09 3
10 2011-11-01 4
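To line seasons up on a common x-axis, one option is to plot the number of days since the season opened rather than the calendar date. The sketch below assumes every season starts on the 1st of start_month (October by default); season_day is a hypothetical helper name.

```r
# Hypothetical helper: days elapsed since the start of the season
# (the most recent 1st of `start_month` on or before each date)
season_day <- function(x, start_month = 10L) {
  x  <- as.Date(x)
  lt <- as.POSIXlt(x)
  yr <- lt$year + 1900L
  # Dates before start_month belong to the season that began last year
  start_year   <- ifelse(lt$mon + 1L >= start_month, yr, yr - 1L)
  season_start <- as.Date(sprintf("%d-%02d-01", start_year, start_month))
  as.integer(x - season_start)
}

season_day(c("2003-10-09", "2004-01-01", "2004-04-30"))
```

Using this value as the x aesthetic and faceting by season puts October at the left edge and April at the right for every season.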

Split date data (m/d/y) into 3 separate columns

I need to convert date (m/d/y format) into 3 separate columns on which I hope to run an algorithm (I'm trying to convert my dates into Julian Day Numbers). I saw this suggestion for another user for separating data out into multiple columns using Oracle. I'm using R and am thoroughly stuck about how to code this appropriately. Would A1, A2... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one out... any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetxt, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as string is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
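Note that julian() counts days from 1970-01-01 by default, which is not the astronomical Julian Day Number. If that is what the question is after, a minimal sketch is to add a constant offset, relying on the fact that 1970-01-01 corresponds to Julian Day Number 2440588. The helper name to_jdn is made up for illustration.

```r
# Hypothetical helper: astronomical Julian Day Number of a calendar date
to_jdn <- function(d) {
  # Days since 1970-01-01, shifted by the Julian Day Number of that date
  as.numeric(as.Date(d)) + 2440588
}

to_jdn(c("2000-01-01", "2010-01-02"))
# [1] 2451545 2455199
```

No splitting into year, month and day is needed for this; the Date class already stores the day count.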
To convert a date (m/d/y format) into 3 separate columns, consider this df:
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day:
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
> dfdate <- data.frame(date=realdate)
year=as.numeric (format(realdate,"%Y"))
month=as.numeric (format(realdate,"%m"))
day=as.numeric (format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.

Resources