Extract Month from a YYYYMM column in R

I have dates stored as YYYYMM and am trying to extract just the month. I have tried, but the methods I found seem to work only for YYYY-MM.
Ultimately, I would like it to look like this:
ID Date Month
1 200402 2
2 200603 3
3 200707 7
I am doing this in hopes of plotting monthly mean values.

You can simply do it using:
library(stringr)
str_sub(df$Date,-2,-1)
Or, if the data is in a pandas DataFrame with a string column (Python rather than R):
df['Date'].str[-2:]
Hope this helps!

Assuming your Date column is numeric, you could just use the modulus:
df$Month <- df$Date %% 100
df
ID Date Month
1 1 200402 2
2 2 200603 3
3 3 200707 7
Data:
df <- data.frame(ID=c(1,2,3), Date=c(200402, 200603, 200707))
To make the above work when Date is character, just convert it to numeric first with as.numeric().
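A minimal sketch of that conversion, assuming Date arrived as character (or as a factor, which is why as.character() is applied first). The sample values are the ones from the question:

```r
# Date stored as text rather than as a number
df <- data.frame(ID = 1:3,
                 Date = c("200402", "200603", "200707"),
                 stringsAsFactors = FALSE)

# as.character() makes this safe for factors too; %% 100 keeps the last two digits
df$Month <- as.numeric(as.character(df$Date)) %% 100
df
```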

You can extract last two characters of Date Column.
sub('.*(..)$', '\\1', df$Date)
#Or without capture groups, as suggested by @Tim Biegeleisen
#sub("^.*(?=..$)", "", df$Date, perl = TRUE)
#[1] "02" "03" "07"
However, ideally you should avoid parsing information from date-time using regex. Convert it to date and then extract the month.
format(as.Date(paste0(df$Date, '01'), "%Y%m%d"), '%m')
#Or with zoo::yearmon
#format(zoo::as.yearmon(as.character(df$Date), "%Y%m"), '%m')
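Since the stated goal is monthly mean values, here is a rough sketch of the next step using base R's aggregate(). The Value column and its numbers are hypothetical, invented only to illustrate the grouping:

```r
# Hypothetical data: 'Value' stands in for whatever measurement is being averaged
df <- data.frame(
  Date  = c(200402, 200502, 200603, 200703),
  Value = c(10, 20, 5, 7)
)

# Month as a number, then one mean per month across all years
df$Month <- df$Date %% 100
monthly_means <- aggregate(Value ~ Month, data = df, FUN = mean)
monthly_means
```

The resulting data frame has one row per month, which can be passed directly to a plotting function.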

Related

How to change syntax of column in R?

I have df1:
ID Time
1 16:00:00
2 14:30:00
3 9:23:00
4 10:00:00
5 23:59:00
and would like to change the current 'character' column 'Time' into an 'integer' as below:
ID Time
1 1600
2 1430
3 923
4 1000
5 2359
We could replace the :'s, make numeric, divide by 100, and convert to integer like this:
df1$Time = as.integer(as.numeric(gsub(':', '', df1$Time))/100)
You want to use as.POSIXct().
Functions to manipulate objects of classes "POSIXlt" and "POSIXct" representing calendar dates and times.
R Documents as.POSIXct()
So in the case of row 1: as.POSIXct("16:00:00", format = "%H:%M:%S")
Then reformat the result with format(..., "%H%M") and wrap it in as.integer() if you need it to truly be an int.
Converts a character matrix to a numeric matrix.
R Docs as.numeric()
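A minimal sketch of the as.POSIXct() route applied to the whole column, assuming the times are always written as H:MM:SS or HH:MM:SS. The tz = "UTC" argument is an assumption added here to keep the result independent of local daylight-saving rules:

```r
df1 <- data.frame(ID = 1:5,
                  Time = c("16:00:00", "14:30:00", "9:23:00", "10:00:00", "23:59:00"),
                  stringsAsFactors = FALSE)

# Parse the clock time (today's date is filled in automatically)
parsed <- as.POSIXct(df1$Time, format = "%H:%M:%S", tz = "UTC")

# Print hours and minutes only, then convert the text to integer
df1$Time <- as.integer(format(parsed, "%H%M"))
df1
```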
df1 <- data.frame(Time = "16:00:00")
df1[, "Time"] <- as.numeric(paste0(substr(df1[, "Time"], 1, 2), substr(df1[, "Time"], 4, 5)))
print(df1)
# Time
# 1 1600
There are many ways to process this, but here's one example:
library(dplyr)
df1 <- mutate(df1, Time = gsub(":", "", Time)) # remove the colons
df1 <- mutate(df1, Time = as.numeric(Time)/100) # coerce to numeric type, divide by 100

Using value in 1 column to fill in values in 2 other columns

When entering behavior data in a different system, I wrote the subjects in a form such as 3-2 (to mean rank 3 to rank 2). I exported these to Excel, which took these entries as dates (so 2-Mar for this example).
I now have thousands of entries in this format. I have added two columns ("Actor" and "Recipient") and would like to fill in the rank numbers for these, based on what is in the "Subject" column.
A couple of lines of what I'm hoping my R output will give me:
Subject Actor Recipient
2-Mar 3 2
5-Jun 6 5
6-Feb 2 6
etc.
So I already have the "Subject" columns and need help figuring out code to fill in the "Actor" and "Recipient" columns. Rank numbers only go up to 6.
I've tried a couple of things but just keep getting error messages... Any help with this would be GREATLY appreciated!
Here you can use tstrsplit() after converting to date format
# Recreate your data
x <- data.frame("Subject" = c("2-Mar", "5-Jun", "6-Feb"))
# Change the format of your Subject column
x[, "Subject"] <- format(as.POSIXct(x[, "Subject"], format = "%d-%b"), "%m %d")
# Split into the two strings
library(data.table) # to get tstrsplit() function
x[, c("Actor", "Recipient")] <- tstrsplit(x[, "Subject"], " ")
# Convert to numeric
x[, "Actor"] <- as.numeric(x[, "Actor"])
x[, "Recipient"] <- as.numeric(x[, "Recipient"])
This returns
> x
Subject Actor Recipient
1 02 03 3 2
2 05 06 6 5
3 06 02 2 6
And if you want Subject in its original format
# Return Subject to original format
x[, "Subject"] <- format(as.POSIXct(x[, "Subject"], format = "%m %d"), "%d-%b")
Giving
> x
Subject Actor Recipient
1 02-Mar 3 2
2 05-Jun 6 5
3 06-Feb 2 6
Explained:
Your vector/variable "Subject" was imported as a character-type atomic vector (atomic vectors are one-dimensional structures of one or more elements, where all elements must be the same type). The solution was to convert it to something that R would interpret as a date using the as.POSIXct(..., format = "...") function, where format tells R how the string is formatted (see ?strptime for the codes). I then wrapped that in the format() function, telling it to output the numeric month and day. That was then split into two columns using the tstrsplit() function, but R interpreted those as character-type data, so I converted them to double-type data using the as.numeric() function.
You could convert Subject to a date and extract the month and day from it.
temp <- as.Date(df$Subject, "%d-%b")
df$Actor <- as.integer(format(temp, "%m"))
df$Recipient <- as.integer(format(temp, "%d"))
df
# Subject Actor Recipient
#1 2-Mar 3 2
#2 5-Jun 6 5
#3 6-Feb 2 6
This can also be done using lubridate functions.
library(lubridate)
df$Actor <- month(temp)
df$Recipient <- day(temp)
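The date-based answers above rely on "%b", which matches month abbreviations in the current locale. As a hedged alternative that avoids date parsing altogether, the sketch below splits the string on "-" and looks the month up in the built-in month.abb vector. It assumes every Subject value has the form day-Mon with English abbreviations, as in the question:

```r
x <- data.frame(Subject = c("2-Mar", "5-Jun", "6-Feb"), stringsAsFactors = FALSE)

# Split "2-Mar" into c("2", "Mar")
parts <- strsplit(as.character(x$Subject), "-", fixed = TRUE)

# The month abbreviation encodes the first rank, the day number the second
x$Actor     <- match(sapply(parts, `[`, 2), month.abb)
x$Recipient <- as.integer(sapply(parts, `[`, 1))
x
```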

Convert correct date format in R

This is what my raw data looks like:
Extraction BORN
1 30/06/06 31/01/48
2 30/06/06 20/05/74
3 30/06/06 20/02/49
4 30/06/06 06/07/53
5 30/06/06 26/05/63
6 30/06/06 20/05/74
I want to use the as.Date function to convert the date format. For example, I want to change 30/06/06 into 2006-06-30 and 31/01/48 into 1948-01-31, so my code is:
data$Extraction<-as.Date(data$Extraction, "%d/%m/%y")
data$BORN<-as.Date(data$BORN, "%d/%m/%y")
But they all convert into NA as a result. Does anyone know how to solve this problem?
Since the variables are factors, this should work:
data$Extraction<-as.Date(as.character(data$Extraction), "%d/%m/%y")
data$BORN<-as.Date(as.character(data$BORN), "%d/%m/%y")
EDIT:
I tried it out but your code should work on factors as well.
> x <- data.frame(date = as.factor("30/06/06"))
> x
date
1 30/06/06
> as.Date(x$date, "%d/%m/%y")
[1] "2006-06-30"
> as.Date(as.character(x$date), "%d/%m/%y")
[1] "2006-06-30"
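One thing to watch for with the BORN column: on input, R's "%y" reads two-digit years 00 to 68 as 20xx and 69 to 99 as 19xx, so 31/01/48 parses as 2048 rather than 1948. The sketch below is one possible workaround, not part of the original answer. It assumes that no birth year is later than the extraction year (2006), so any two-digit year above 6 belongs to the 1900s:

```r
born <- c("31/01/48", "20/05/74")

# Default behaviour: "48" becomes 2048, "74" becomes 1974
parsed <- as.Date(born, "%d/%m/%y")

# Rebuild a four-digit year using the assumed cutoff, then parse with %Y
yy <- as.integer(substr(born, 7, 8))
full_year <- ifelse(yy > 6, 1900 + yy, 2000 + yy)
fixed <- as.Date(paste0(substr(born, 1, 6), full_year), "%d/%m/%Y")
fixed
```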

Conditional subsetting of data frame based on HH:MM:SS formatted column

So I have a large df with a column called "session" that is in the format
HH:MM:SS (e.g. 0:35:24 for 35 mins and 24 secs).
I want to create a subset of the df based on a condition like > 2 mins or < 90 mins from the "session" column.
I tried to first convert the column format into Date:
df$session <- as.Date(df$session, "%h/%m/%s")
I was going to then use the subset() to create my conditional subset but the above code generates a column of NAs.
subset.morethan2min <-subset(df, CONDITION)
where CONDITION is df$session >2 mins?
How should I manipulate the "session" column in order to be able to subset on a condition as described?
Sorry very new to R so welcome any suggestions.
Thanks!
UPDATE:
I converted the session column to POSIXct, then used the minute() function from the lubridate package to get a numerical value for the minute component. Not a neat solution, but it seems to work for my needs right now. I would still welcome a neater solution though.
df$sessionPOSIX <- as.POSIXct(strptime(df$session, "%H:%M:%S"))
df$minute <- minute(df$sessionPOSIX)
subset.morethan2min <- subset(df, minute > 2)
A date is not the same as a period. The easiest way to handle periods is to use the lubridate package:
library(lubridate)
df$session <- hms(df$session)
df.morethan2min <- subset(df, df$session > period(2, 'minute'))
hms() converts your duration stamps into period objects, and period() creates a period object of the specified length for comparison.
As an aside, there are numerous other ways to subset data frames, including the [ operator and functions like filter() in the dplyr package, but that's beyond what you need for your current purposes.
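For comparison, here is a base R sketch that needs no extra packages. It assumes every session value has exactly three colon-separated fields (hours, minutes, seconds). The sample data and the minutes column are made up for illustration:

```r
df <- data.frame(session = c("0:01:30", "0:35:24", "1:45:00"),
                 stringsAsFactors = FALSE)

# Split each value into hours, minutes, seconds and express the total in minutes
parts <- strsplit(df$session, ":", fixed = TRUE)
df$minutes <- sapply(parts, function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)))

# Keep sessions longer than 2 minutes and shorter than 90 minutes
df.subset <- subset(df, minutes > 2 & minutes < 90)
df.subset
```

Unlike extracting only the minute component, this counts the hours as well, so a session of 1:45:00 is treated as 105 minutes.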
Probably simpler ways to do this, but here's one solution:
set.seed(1234)
tDF <- data.frame(
  Val = rnorm(100),
  Session = paste0(
    sample(0:23,100,replace=TRUE),
    ":",
    sample(0:59,100,replace=TRUE),
    ":",
    sample(0:59,100,replace=TRUE),
    sep="",collapse=NULL),
  stringsAsFactors=FALSE
)
##
toSec <- function(hms){
  Long <- as.POSIXct(
    paste0(
      "2013-01-01 ",
      hms),
    format="%Y-%m-%d %H:%M:%S",
    tz="America/New_York")
  3600*as.numeric(substr(Long,12,13))+
    60*as.numeric(substr(Long,15,16))+
    as.numeric(substr(Long,18,19))
}
##
tDF <- cbind(
  tDF,
  Seconds = toSec(tDF$Session),
  Minutes = toSec(tDF$Session)/60
)
##
> head(tDF)
Val Session Seconds Minutes
1 -1.2070657 15:21:41 55301 921.6833
2 0.2774292 12:58:24 46704 778.4000
3 1.0844412 7:32:45 27165 452.7500
4 -2.3456977 18:26:46 66406 1106.7667
5 0.4291247 12:56:34 46594 776.5667
6 0.5060559 17:27:11 62831 1047.1833
Then you can just subset your data easily by doing subset(tDF, Minutes > some_number).

Convert months mmm to numeric

I have been given a csv with a column called month as a char variable with the first three letters of the month. E.g.:
"Jan", "Feb","Mar",..."Dec"
Is there any way to convert this to a numeric representation of the month, 1 to 12, or even a type that is in a date format?
Use match and the predefined vector month.abb:
tst <- c("Jan","Mar","Dec")
match(tst,month.abb)
[1] 1 3 12
You can use the built-in vector month.abb to check against when converting to a number, eg :
mm <- c("Jan","Dec","jan","Mar","Apr")
sapply(mm,function(x) grep(paste("(?i)",x,sep=""),month.abb))
Jan Dec jan Mar Apr
1 12 1 3 4
The grep construct takes care of differences in capitalization. If that's not needed,
match(mm,month.abb)
works just as well.
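If capitalization varies, a simpler sketch than the grep construct is to lower-case both sides before calling match(). This assumes the inputs are three-letter English abbreviations. Unlike the sapply() version, it returns NA for anything that does not match instead of an empty result:

```r
mm <- c("Jan", "Dec", "jan", "Mar", "Apr")

# Compare in lower case so "jan" and "Jan" both map to 1
month_num <- match(tolower(mm), tolower(month.abb))
month_num
```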
If you also have a day and a year column, you can use any of the conversion functions, using the appropriate codes (see also ?strftime)
eg
mm <- c("Jan","Dec","jan","Mar","Apr")
year <- c(1998,1998,1999,1999,1999)
day <- c(4,10,3,16,25)
dates <- paste(year,mm,day,sep="-")
strptime(dates,format="%Y-%b-%d")
[1] "1998-01-04" "1998-12-10" "1999-01-03" "1999-03-16" "1999-04-25"
Just adding to the existing answers and the comment in the question:
readr::parse_date("20/DEZEMBRO/18","%d/%B/%y",locale=readr::locale("pt"))
This returns the date "2018-12-20". locale("pt") selects Portuguese, as used in Brazil; you can use "es" for Spanish, "fr" for French, etc.
A couple of options using:
vec <- c("Jan","Dec","Jan","Apr")
are
> Months <- 1:12
> names(Months) <- month.abb
> unname(Months[vec])
[1] 1 12 1 4
and/or
> match(vec, month.abb)
[1] 1 12 1 4
