Delete part of a value for the whole column - r

If I have a vector such as the following:
REF_YEAR
1994-01-01
1995-01-01
1996-01-01
how can I delete the part "-01-01", so that I only get the year for the whole column?

If your vector is formatted as Dates, you can do:
x <- as.Date("2001-01-01")
format(x, "%Y")
#[1] "2001"
And for your example data:
# Your sample data:
df <- read.table(header=TRUE, text = "REF_YEAR
1994-01-01
1995-01-01
1996-01-01", stringsAsFactors = FALSE)
Convert your data to Date format:
df$REF_YEAR <- as.Date(df$REF_YEAR) # skip this step if it's already formatted as Date
Now convert to year format:
df$REF_YEAR <- format(df$REF_YEAR, "%Y")
Or
transform(df, REF_YEAR = format(REF_YEAR, "%Y"))
Result in both cases:
df
# REF_YEAR
#1 1994
#2 1995
#3 1996
You only need to make sure your data is in Date format (use as.Date() for conversion).

This can be done with plain string manipulation. You can either keep the first four characters or drop the last six. Here is the second option, as you asked:
ref_year <- as.character("1994-01-01")
ref_year_only <- substr(ref_year, 1, nchar(ref_year) - 6)
ref_year_only
Also, please show some effort while asking questions on stack.
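As a minimal sketch of the two options mentioned in this answer, applied to a whole vector rather than a single value (the variable names year_keep and year_drop are illustrative, not from the question):

```r
# Sample values in the same shape as the question's REF_YEAR column
ref_year <- c("1994-01-01", "1995-01-01", "1996-01-01")

# Option 1: keep the first four characters
year_keep <- substr(ref_year, 1, 4)

# Option 2: drop the trailing "-mm-dd" part with a regular expression
year_drop <- sub("-[0-9]{2}-[0-9]{2}$", "", ref_year)

year_keep
#[1] "1994" "1995" "1996"
identical(year_keep, year_drop)
#[1] TRUE
```

Both results are character vectors; wrap them in as.integer() if numeric years are needed.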

Without converting to Date, you could also try:
library(stringr)
df$YEAR <- str_extract(df$REF_YEAR, "\\d+(?=-)") # older stringr versions needed perl() around the pattern
df$YEAR
#[1] "1994" "1995" "1996"

Related

Change date format from YYYYQQ or YYYY to mm/dd/yyyy

I have a column of data with two different formats: yyyyqq and yyyy. I want to reformat the column to mmddyyyy.
Whenever I use the following command as.Date(as.character(x), format = "%y") the output is yyyy-12-03. I cannot get any other combination of as.Date to work.
I'm sure this is a simple fix, but how do I do this?
Using the following assumptions:
2021 <- 2021-01-01
2021Q1 <- 2021-01-01
2021Q2 <- 2021-04-01
2021Q3 <- 2021-07-01
2021Q4 <- 2021-10-01
You can use the following:
as.Date(paste(substr(x, 1, 4), 3*as.numeric(max(substr(x, 6, 6),1))-2, "1", sep = "-"))
Edit: You can wrap this in a format(..., "%m%d%Y") but, as already said in the comments, I would not recommend it.
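Note that max() collapses its arguments to a single value, so the one-liner above is only safe for one element at a time. A possible vectorised sketch, assuming the inputs look like "2021" or "2021Q1" as in the assumptions listed above (the sample vector is illustrative):

```r
# Illustrative input covering both formats
x <- c("2021", "2021Q1", "2021Q2", "2021Q3", "2021Q4")

year <- substr(x, 1, 4)
qtr  <- substr(x, 6, 6)              # "" when there is no quarter part
qtr[qtr == ""] <- "1"                # treat a bare year as its first quarter
month <- 3L * as.integer(qtr) - 2L   # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10

dates <- as.Date(sprintf("%s-%02d-01", year, month))
format(dates, "%m/%d/%Y")
#[1] "01/01/2021" "01/01/2021" "04/01/2021" "07/01/2021" "10/01/2021"
```

Keeping dates as a Date vector and formatting only for display avoids losing the ability to sort or do date arithmetic.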
Here is a function which translates to the first (if frac=0) or last (if frac=1) date of the period. First append a 01 (first of the period) or 04 (last of the period) to the end of the input. That puts them all in yyyyqq format possibly with junk at the end. Then yearqtr will convert to a yearqtr object ignoring any junk. Then convert that to a Date object. as.Date.yearqtr uses the same meaning for frac. Finally format it as a character string in mm/dd/yyyy format.
(One alternative is to replace the format(...) line with chron::as.chron() in which case it will render in the same manner, since the format specified is the default for chron, but be a chron dates object which can be manipulated more conveniently, e.g. it sorts chronologically, than a character string.)
library(zoo)
to_date <- function(x, frac = 1) x |>
  paste0(if (frac == 1) "04" else "01") |>
  as.yearqtr("%Y%q") |>
  as.Date(frac = frac) |>
  format("%m/%d/%Y")
# test data
dd <- data.frame(x = c(2001, 2003, 200202, 200503))
transform(dd, first = to_date(x, frac = 0), last = to_date(x, frac = 1))
giving:
       x      first       last
1   2001 01/01/2001 12/31/2001
2   2003 01/01/2003 12/31/2003
3 200202 04/01/2002 06/30/2002
4 200503 07/01/2005 09/30/2005

Find pattern and replace

A similar question was probably asked, but here goes:
Suppose I have the following erroneous dates in my df (in numeric format, yyyymmdd): 20169904, 20179999, 20161099. These dates are from my date column, where many dates are wrong: there is no such thing as day = 99 or month = 99.
Now I wish to ONLY change the 99 in dd to 01. In other words, I need to find ONLY the dates that are yyyymm99 and change them to yyyymm01. I am not having trouble with str_sub(df$date,7,8) <- 01. However, this changes all dd in the column to 01. I only need to change those that are yyyymm99.
Using pipes or multi-step solutions are both ok with me.
Thanks in advance!
Here is a solution with gsub():
gsub("99$", "01", df$date)
The $ in regular expressions means "end of line" or "end of string". With "99$", gsub() only matches "99" at the end of the string.
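As a quick check of the anchored pattern on the three values from the question (stored here as character strings, which is an assumption; the question's column is numeric and would need as.character() first):

```r
# The three example values from the question, as strings
dates <- as.character(c(20169904L, 20179999L, 20161099L))

# Only a "99" at the very end of the string is replaced
fixed <- sub("99$", "01", dates)
fixed
#[1] "20169904" "20179901" "20161001"
```

The first value is unchanged because its day part is 04, and the 99 in the month position of the second value is left alone. sub() is enough here since the anchored pattern can match at most once per string.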
Here's a base R solution that will replace 99s in the mm part of the string if you need that as well, and will work if there are 99s in both the mm and dd portions.
df <- data.frame(date = c("19990104", "20160399", "19901003", "20199904", "20169999"), stringsAsFactors = FALSE)
df$new_date <- sapply(df$date, function(x) {
  # Valid dates are returned unchanged
  if(!is.na(as.Date(x, format = "%Y%m%d"))) {
    return(x)
  }
  new_date <- x
  # Replace a trailing 99 (day part) with 01
  if(grepl(".99$", x)) {
    new_date <- paste0(substr(x, 1, 6), "01")
  }
  # Replace a 99 in the month part with 01
  if(grepl("^\\d{4}99\\d{2}", new_date)) {
    new_date <- paste0(substr(new_date, 1, 4), "01", substr(new_date, 7, 8))
  }
  return(new_date)
})
And here's the result.
      date new_date
1 19990104 19990104
2 20160399 20160301
3 19901003 19901003
4 20199904 20190104
5 20169999 20160101

How can I delete numbers and characters from my date and convert a character column to a date column

I have a dataframe with the column name perioden. This column contains the date but it is written in this format: 2010JJ00, 2011JJ00, 2012JJ00, 2013JJ00 etc..
This column is also a character when I look at the structure. I've tried multiple solutions but so far am still stuck. My question is: how can I convert this column to a date, and how do I remove the JJ00 part so that only the year is shown in the column?
You can try this approach: use gsub() to remove the undesired text (as said by @AllanCameron), then paste0() to add the month and day, and as.Date() for the date conversion:
#Data
df <- data.frame(Date=c('2010JJ00', '2011JJ00', '2012JJ00', '2013JJ00'), stringsAsFactors = FALSE)
#Remove string
df$Date <- gsub('JJ00','',df$Date)
#Format to date, you will need a day and month
df$Date2 <- as.Date(paste0(df$Date,'-01-01'))
Output:
  Date      Date2
1 2010 2010-01-01
2 2011 2011-01-01
3 2012 2012-01-01
4 2013 2013-01-01
We can use ymd with truncated option
library(lubridate)
library(stringr)
ymd(str_remove(df$Date, 'JJ\\d+'), truncated = 2)
#[1] "2010-01-01" "2011-01-01" "2012-01-01" "2013-01-01"
data
df <- data.frame(Date=c('2010JJ00', '2011JJ00', '2012JJ00', '2013JJ00'), stringsAsFactors = FALSE)
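If only the year itself is needed, a base R sketch without extra packages could look like this. It assumes every value is a four-digit year followed by "JJ" and digits, as in the question; the names perioden, year and as_date are illustrative:

```r
# Values in the format described in the question
perioden <- c("2010JJ00", "2011JJ00", "2012JJ00", "2013JJ00")

# Strip the trailing "JJ" plus digits and keep the year as an integer
year <- as.integer(sub("JJ[0-9]+$", "", perioden))
year
#[1] 2010 2011 2012 2013

# A Date needs a month and day, so January 1st is assumed here
as_date <- as.Date(paste0(year, "-01-01"))
format(as_date, "%Y")
#[1] "2010" "2011" "2012" "2013"
```

An integer year column sorts and plots correctly on its own, so the Date conversion is only worth doing when real date arithmetic is needed.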

Converting filenames to date in year + weeks returns Error in charToDate (x): character string is not in a standard unambiguous format

For a time series analysis of over 1000 rasters in a raster stack, I need the dates. The data are almost weekly, and the date is encoded in the file names like this:
"... 1981036 .... tif"
The zero separates year and week
I need something like: "1981-36"
but always get the error
Error in charToDate (x): character string is not in a standard unambiguous format
library(sp)
library(lubridate)
library(raster)
library(zoo)
raster_path <- ".../AVHRR_All"
all_raster <- list.files(raster_path,full.names = TRUE,pattern = ".tif$")
all_raster
brings me:
all_raster
".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif"
".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif"
".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif"
…
To get the year and the associated week, I have used the following code:
timeline <- data.frame(
  year = as.numeric(substr(basename(all_raster), start = 17, stop = 17+3)),
  week = as.numeric(substr(basename(all_raster), 21, 21+2))
)
timeline
brings me:
timeline
  year week
1 1981   35
2 1981   36
3 1981   37
4 1981   38
…
But I need something like = "1981-35" to be able to plot my time series later
I tried that:
timeline$week <- as.Date(paste0(timeline$year, "%Y")) + week(timeline$week -1, "%U")
and get the error: Error in charToDate(x) : character string is not in a standard unambiguous format
or I tried that
fileDates <- as.POSIXct(substr((all_raster),17,23), format="%y0%U")
and get the same error
Until someone posts a better way to do this, you could try:
x <- c(".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif", ".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif",
".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif")
xx <- substr(x, 21, 27)
library(lubridate)
# Take year and week by position; splitting on "0" would break for
# weeks that contain a zero themselves (e.g. 05, 10, 40).
dates <- sapply(xx, function(s) {
  year <- substr(s, 1, 4)
  week <- as.numeric(substr(s, 6, 7))
  start_date <- as.Date(paste0(year, '-01-01'))
  date <- start_date + weeks(week)
  # Note: OP asked for beginning of week.
  # There's some ambiguity here; the above is end-of-week.
  # Uncomment the line below for beginning of week (just subtracts 6 days).
  # I think this might yield inconsistent results, especially at year boundaries,
  # hence the suggestion to use end of week. See below for a possible solution.
  # date <- start_date + weeks(week) - days(6)
  return(as.character(date))
}, USE.NAMES = FALSE)
newdates <- as.POSIXct(dates)
format(newdates, "%Y-%W")
Thanks to @Soren who posted this answer here: Get the month from the week of the year
You can do it if you specify that Monday is a Weekday 1 with %u:
w <- c(35,36,37,38)
y <- c(1981,1981,1981,1981)
s <- c(1,1,1,1)
df <- data.frame(y,w,s)
df$d <- paste(as.character(df$y), as.character(df$w),as.character(df$s), sep=".")
df$date <- as.Date(df$d, "%Y.%U.%u")
# So here we have variable date as date if you need that for later.
class(df$date)
#[1] "Date"
# If you want it to look like Y-W, you can do the final formatting:
df$date <- format(df$date, "%Y-%U")
#      y  w s         d    date
# 1 1981 35 1 1981.35.1 1981-35
# 2 1981 36 1 1981.36.1 1981-36
# 3 1981 37 1 1981.37.1 1981-37
# 4 1981 38 1 1981.38.1 1981-38
# NB: though it looks correct, the resulting df$date is actually a character:
class(df$date)
#[1] "character"
Alternatively, you could do the same by setting the Sunday as 0 with %w.
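If only a "year-week" label is needed and no real Date, the label can also be pulled straight out of the file name. This is a sketch that assumes every file name contains ".P", then a four-digit year, a "0" and a two-digit week, followed by a dot, as in the examples from the question; the names files and labels are illustrative:

```r
# File names copied from the question (the ".../" prefix is left elided)
files <- c(".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif",
           ".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif",
           ".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif")

# Capture the year and the two-digit week around the separating "0"
labels <- sub("^.*\\.P([0-9]{4})0([0-9]{2})\\..*$", "\\1-\\2", basename(files))
labels
#[1] "1981-36" "1981-37" "1981-38"
```

Capturing by pattern rather than by fixed character positions keeps working if the length of the directory prefix changes.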

Extracting year from two different date format

I have column say x which has two different date formats 12/31/1998 and 12/--/98. As you can see, in the second format date is missing and year is in 2 digits.
I need to extract the year from all the dates in my column. When I use Year <- data.frame(format(df$x, "%Y")), it returns the year for the first format but NA for the second.
I would appreciate all the help. Thanks.
You could get a bit creative and specify an ugly format for the missing data, and then just keep one of the valid responses:
vals <- c("12/31/1998", "12/--/98")
out <- pmax(
  as.Date(vals, "%m/%d/%Y"),
  as.Date(paste0("01",vals), "%d%m/--/%y"),
  na.rm=TRUE
)
format(out, "%Y")
#[1] "1998" "1998"
If they are all in a format where the year is the last number after "/", you can use basename. Then you just need to convert the two-digit years to four digits:
vals <- c("12/31/1998", "12/--/98", "68", "69")
yrs <- basename(vals)
yrs <- ifelse(nchar(yrs) == 2, format(as.Date(yrs, format = "%y"), "%Y"), yrs)
yrs
# [1] "1998" "1998" "2068" "1969"
The issue is that this does not work for years before 1969: with %y, R reads the two-digit values 00 to 68 as 2000 to 2068 and 69 to 99 as 1969 to 1999.
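One way around that limitation is to skip %y and add the century by hand. The sketch below assumes that every two-digit year belongs to the 1900s, which is an assumption about the data rather than something stated in the question; the value "01/--/45" is a hypothetical addition to show the pre-1969 case:

```r
# Two values from the question plus one hypothetical pre-1969 value
vals <- c("12/31/1998", "12/--/98", "01/--/45")

# Keep only the text after the last "/" and read it as a number
yr <- as.integer(sub("^.*/", "", vals))

# Assumption: any year below 100 is a two-digit year from the 1900s
two_digit <- yr < 100
yr[two_digit] <- yr[two_digit] + 1900L
yr
#[1] 1998 1998 1945
```

If the data can also contain two-digit years from the 2000s, a cutoff would have to be chosen explicitly instead of always adding 1900.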
