This is what my raw data looks like:
Extraction BORN
1 30/06/06 31/01/48
2 30/06/06 20/05/74
3 30/06/06 20/02/49
4 30/06/06 06/07/53
5 30/06/06 26/05/63
6 30/06/06 20/05/74
I want to use the as.Date function to convert the date format. For example, I want to change 30/06/06 into 2006-06-30 and 31/01/48 into 1948-01-31, so my code is:
data$Extraction<-as.Date(data$Extraction, "%d/%m/%y")
data$BORN<-as.Date(data$BORN, "%d/%m/%y")
But they are all converted to NA. Does anyone know how to solve this problem?
Since the variables are factors, this should work:
data$Extraction<-as.Date(as.character(data$Extraction), "%d/%m/%y")
data$BORN<-as.Date(as.character(data$BORN), "%d/%m/%y")
EDIT:
I tried it out, and your original code should work on factors as well:
> x <- data.frame(date = as.factor("30/06/06"))
> x
date
1 30/06/06
> as.Date(x$date, "%d/%m/%y")
[1] "2006-06-30"
> as.Date(as.character(x$date), "%d/%m/%y")
[1] "2006-06-30"
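A side note on these particular values, separate from the answer above: R's %y conversion maps two-digit years 00 to 68 to 20xx and 69 to 99 to 19xx, so 31/01/48 parses as 2048-01-31 rather than 1948-01-31. Below is a minimal sketch of one workaround, assuming that a birth date can never fall after the extraction date. The two-row vectors are illustrative, not the asker's full data.

```r
# Illustrative values taken from the first two rows of the question
extraction <- as.Date(c("30/06/06", "30/06/06"), format = "%d/%m/%y")
born       <- as.Date(c("31/01/48", "20/05/74"), format = "%d/%m/%y")

# Assumption: a birth date later than the extraction date is a century off.
# which() skips NA comparisons, so unparsed dates are left alone.
too_late <- which(born > extraction)
born[too_late] <- as.Date(format(born[too_late], "19%y-%m-%d"))

format(born)
```

The same rule would need adjusting if the data could contain people born before 1900 or after 1999.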
I have df1:
ID Time
1 16:00:00
2 14:30:00
3 9:23:00
4 10:00:00
5 23:59:00
and would like to change the current 'character' column 'Time' into an 'integer', as below:
ID Time
1 1600
2 1430
3 923
4 1000
5 2359
We could replace the :'s, make numeric, divide by 100, and convert to integer like this:
df1$Time = as.integer(as.numeric(gsub(':', '', df1$Time))/100)
You want to use as.POSIXct().
Functions to manipulate objects of classes "POSIXlt" and "POSIXct" representing calendar dates and times.
R Documents as.POSIXct()
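So in the case of row 1: as.POSIXct("16:00:00", format = "%H:%M:%S")
Then reformat the result with format(..., "%H%M") and wrap it in as.integer() if you need it to truly be an int. Calling as.numeric() directly on the POSIXct value would return seconds since the epoch, not 1600.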
Converts a character matrix to a numeric matrix.
R Docs as.Numeric()
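A minimal sketch of the as.POSIXct() route applied to the whole column, assuming the times are always written as hours, minutes and seconds separated by colons. The variable names are illustrative, and tz = "UTC" is chosen here only to keep daylight-saving changes out of the picture.

```r
times <- c("16:00:00", "14:30:00", "9:23:00", "10:00:00", "23:59:00")

# Parse as date-times (today's date is filled in, which does not matter here)
parsed <- as.POSIXct(times, format = "%H:%M:%S", tz = "UTC")

# Re-format as HHMM text, then convert; the leading zero of "0923" is dropped
time_int <- as.integer(format(parsed, "%H%M"))
time_int
```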
df1 <- data.frame(Time = "16:00:00")
df1[, "Time"] <- as.numeric(paste0(substr(df1[, "Time"], 1, 2), substr(df1[, "Time"], 4, 5)))
print(df1)
# Time
# 1 1600
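The substr() positions above assume a two-digit hour, so a value such as "9:23:00" from the question would not convert cleanly. A possible variant that splits on the colon instead is sketched below. It assumes every entry has the form H:MM:SS or HH:MM:SS.

```r
df1 <- data.frame(Time = c("16:00:00", "9:23:00"), stringsAsFactors = FALSE)

# Split each time into its hour, minute and second pieces
parts <- strsplit(df1$Time, ":", fixed = TRUE)
hours   <- as.integer(vapply(parts, `[`, character(1), 1))
minutes <- as.integer(vapply(parts, `[`, character(1), 2))

# Combine into an HHMM-style integer
df1$Time <- hours * 100L + minutes
print(df1)
```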
There are many ways to process this, but here's one example:
library(dplyr)
df1 <- mutate(df1, Time = gsub(":", "", Time)) # remove the colons
df1 <- mutate(df1, Time = as.numeric(Time)/100) # coerce to numeric type, divide by 100
I have data in terms of a date (YYYYMM) and am trying to get just the month. I have tried to extract it, but the methods I found seem to only work for YYYY-MM.
Ultimately, I would like it to look like this:
ID Date Month
1 200402 2
2 200603 3
3 200707 7
I am doing this in hopes of plotting monthly mean values.
You can simply do it using:
library(stringr)
str_sub(df$Date,-2,-1)
Or, if you are working in pandas (Python) rather than R:
df['Date'].str[-2:]
Hope this helps!
Assuming your Date column is numeric, you could just use the modulus:
df$Month <- df$Date %% 100
df
ID Date Month
1 1 200402 2
2 2 200603 3
3 3 200707 7
Data:
df <- data.frame(ID=c(1,2,3), Date=c(200402, 200603, 200707))
To make the above work when Date is character, just cast it to numeric first.
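A short sketch of that cast, assuming Date is stored as character. If it were a factor, as.character() would be needed first so that the level codes are not used by mistake.

```r
df <- data.frame(ID = 1:3,
                 Date = c("200402", "200603", "200707"),
                 stringsAsFactors = FALSE)

# Cast the YYYYMM strings to numbers, then keep the last two digits
df$Month <- as.numeric(df$Date) %% 100
df
```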
You can extract last two characters of Date Column.
sub('.*(..)$', '\\1', df$Date)
#Or without capture groups, as suggested by @Tim Biegeleisen
#sub("^.*(?=..$)", "", df$Date, perl = TRUE)
#[1] "02" "03" "07"
However, ideally you should avoid parsing information from date-time using regex. Convert it to date and then extract the month.
format(as.Date(paste0(df$Date, '01'), "%Y%m%d"), '%m')
#Or with zoo::yearmon
#format(zoo::as.yearmon(as.character(df$Date), "%Y%m"), '%m')
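Since the stated goal is monthly means, here is a rough sketch of the next step in base R. The Value column and its numbers are hypothetical stand-ins for whatever measurement is being averaged.

```r
df <- data.frame(Date  = c(200402, 200502, 200603, 200707),
                 Value = c(10, 20, 5, 8))  # Value is a hypothetical column

# Month from the numeric YYYYMM code
df$Month <- df$Date %% 100

# Mean of Value within each month, pooled across years
monthly <- aggregate(Value ~ Month, data = df, FUN = mean)
monthly
```

The resulting monthly data frame has one row per month and can be passed straight to plot() or a plotting package.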
When entering behavior data in a different system, I wrote the subjects in a form such as 3-2 (to mean rank 3 to rank 2). I exported these to Excel, which took these entries as dates (so 2-Mar for this example).
I now have thousands of entries in this format. I have added two columns ("Actor" and "Recipient") and would like to fill in the rank numbers for these, based on what is in the "Subject" column.
A couple of lines of what I'm hoping my R output will give me:
Subject Actor Recipient
2-Mar 3 2
5-Jun 6 5
6-Feb 2 6
etc.
So I already have the "Subject" columns and need help figuring out code to fill in the "Actor" and "Recipient" columns. Rank numbers only go up to 6.
I've tried a couple of things but just keep getting error messages... Any help with this would be GREATLY appreciated!
Here you can use tstrsplit() after converting to date format
# Recreate your data
x <- data.frame("Subject" = c("2-Mar", "5-Jun", "6-Feb"))
# Change the format of your Subject column
x[, "Subject"] <- format(as.POSIXct(x[, "Subject"], format = "%d-%b"), "%m %d")
# Split into the two strings
library(data.table) # to get tstrsplit() function
x[, c("Actor", "Recipient")] <- tstrsplit(x[, "Subject"], " ")
# Convert to numeric
x[, "Actor"] <- as.numeric(x[, "Actor"])
x[, "Recipient"] <- as.numeric(x[, "Recipient"])
This returns
> x
Subject Actor Recipient
1 02 03 3 2
2 05 06 6 5
3 06 02 2 6
And if you want Subject in its original format
# Return Subject to original format
x[, "Subject"] <- format(as.POSIXct(x[, "Subject"], format = "%m %d"), "%d-%b")
Giving
> x
Subject Actor Recipient
1 02-Mar 3 2
2 05-Jun 6 5
3 06-Feb 2 6
Explained:
Your vector/variable "Subject" was imported as a character-type atomic vector (atomic vectors are one-dimensional structures of one or more elements, all of the same type). The solution was to convert it to something that R would interpret as a date using the as.POSIXct(..., format = "...") function, where format tells R how the string is formatted (see ?strptime for the codes). I then wrapped that in the format() function, telling it to change the format to numeric month and day. That was then split into two columns using the tstrsplit() function, but R interpreted those as character-type data, so I converted them to double-type data using the as.numeric() function.
You could convert Subject to a date and extract the month and day from it.
temp <- as.Date(df$Subject, "%d-%b")
df$Actor <- as.integer(format(temp, "%m"))
df$Recipient <- as.integer(format(temp, "%d"))
df
# Subject Actor Recipient
#1 2-Mar 3 2
#2 5-Jun 6 5
#3 6-Feb 2 6
This can also be done using lubridate functions.
library(lubridate)
df$Actor <- month(temp)
df$Recipient <- day(temp)
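Both date-based approaches rely on %b, which matches month abbreviations in the current locale. As an alternative sketch that avoids date parsing altogether, the string can be split on the hyphen and the month looked up in the built-in month.abb vector. This assumes every Subject has the form day-Mon with English abbreviations, as in the question.

```r
df <- data.frame(Subject = c("2-Mar", "5-Jun", "6-Feb"),
                 stringsAsFactors = FALSE)

# Split "2-Mar" into c("2", "Mar")
parts <- strsplit(df$Subject, "-", fixed = TRUE)

# The month abbreviation is the actor's rank, the day is the recipient's rank
df$Actor     <- match(vapply(parts, `[`, character(1), 2), month.abb)
df$Recipient <- as.integer(vapply(parts, `[`, character(1), 1))
df
```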
Here is an easy example. I have a data frame with three dates in it:
Data <- as.data.frame(as.Date(c('1970/01/01', '1970/01/02', '1970/01/03')))
names(Data) <- "date"
Now I add a column consisting of the same entries:
for(i in 1:3){
Data[i, "date2"] <- Data[i, "date"]
}
Output looks like this:
date date2
1 1970-01-01 0
2 1970-01-02 1
3 1970-01-03 2
For unknown reasons the class of column date2 is numeric instead of Date, which was the class of date. Curiously, if you tell R explicitly to use the Date format:
for(i in 1:3){
Data[i, "date3"] <- as.Date(Data[i, "date"])
}
it doesn't make any difference.
date date2 date3
1 1970-01-01 0 0
2 1970-01-02 1 1
3 1970-01-03 2 2
The problem seems to be in the use of subsetting with []. The same happens in more interesting examples, where you have two columns of dates and want to create a third one that picks a date from one of the two depending on some factor.
Of course we can fix everything in retrospect by doing something like:
Data$date4 <- as.Date(Data$date2, origin = "1970-01-01")
but I'm still wondering: why? Why is this happening? Why can't my dates just stay dates when being transferred to another column??
This is not a complete solution, but I think it helps to understand what is going on.
Here is your data:
Data <- data.frame(date =
as.Date(c('2000/01/01', '2012/01/02', '2013/01/03')))
Take these two vectors, one typed by default as numeric and the second as Date.
vv <- vector("numeric",3)
vv.Date <- vector("numeric",3)
class(vv.Date) <- 'Date'
vv
[1] 0 0 0
> vv.Date
[1] "1970-01-01" "1970-01-01" "1970-01-01" ## type dates is initialized by the origin 01-01-1970
Now if I try to assign the first element of each vector as you do in the first step of your loop:
vv[1] <- Data$date[1]
vv.Date[1] <- Data$date[1]
vv
[1] 10957 0 0
> vv.Date
[1] "2000-01-01" "1970-01-01" "1970-01-01"
As you can see, the typed vector is filled in correctly. What happens is that when you assign a scalar value into an existing vector, R internally tries to convert the value to the type of that vector.
To return to your example: when you run the loop below, you are creating a numeric vector (like vv) and then trying to assign dates to it:
for(i in 1:3){
Data[i, "date3"] <- as.Date(Data[i, "date"])
}
If you first give date3 the Date type, for example:
Data$date3 <- vv.Date
and then try again:
for(i in 1:3){
Data[i, "date3"] <- as.Date(Data[i, "date"])
}
you will get the expected result:
date date3
1 2000-01-01 2000-01-01
2 2012-01-02 2012-01-02
3 2013-01-03 2013-01-03
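Following the same reasoning, one way to sidestep the issue is to create the new column from a complete Date vector and only then overwrite selected elements, so the column already has class Date before any element-wise assignment. A minimal sketch, in which the other column and the use_other selector are hypothetical stand-ins for the "pick from one of two columns" case in the question:

```r
Data <- data.frame(date = as.Date(c("1970-01-01", "1970-01-02", "1970-01-03")))
Data$other <- Data$date + 10          # hypothetical second date column
use_other  <- c(TRUE, FALSE, TRUE)    # hypothetical selection rule

# Whole-column assignment keeps the Date class
Data$picked <- Data$date

# Element-wise replacement into a column that is already a Date
Data$picked[use_other] <- Data$other[use_other]

class(Data$picked)
Data
```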
I have been given a csv with a column called month as a char variable with the first three letters of the month. E.g.:
"Jan", "Feb","Mar",..."Dec"
Is there any way to convert this to a numeric representation of the month, 1 to 12, or even a type that is in a date format?
Use match and the predefined vector month.abb:
tst <- c("Jan","Mar","Dec")
match(tst,month.abb)
[1] 1 3 12
You can use the built-in vector month.abb to check against when converting to a number, e.g.:
mm <- c("Jan","Dec","jan","Mar","Apr")
sapply(mm,function(x) grep(paste("(?i)",x,sep=""),month.abb))
Jan Dec jan Mar Apr
1 12 1 3 4
The grep construct takes care of differences in capitalization. If that's not needed,
match(mm,month.abb)
works just as well.
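If capitalization varies, another option is to lower-case both sides before calling match(). This is a small sketch of that idea, offered as an alternative to the grep() construct. Like match() itself, it returns NA for anything that is not an exact three-letter abbreviation.

```r
mm <- c("Jan", "Dec", "jan", "Mar", "Apr")

# Compare in lower case so "jan" and "Jan" both map to 1
month_num <- match(tolower(mm), tolower(month.abb))
month_num
```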
If you also have a day and a year column, you can use any of the conversion functions with the appropriate codes (see also ?strftime), e.g.:
mm <- c("Jan","Dec","jan","Mar","Apr")
year <- c(1998,1998,1999,1999,1999)
day <- c(4,10,3,16,25)
dates <- paste(year,mm,day,sep="-")
strptime(dates,format="%Y-%b-%d")
[1] "1998-01-04" "1998-12-10" "1999-01-03" "1999-03-16" "1999-04-25"
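Note that strptime() returns a POSIXlt object, and %b is matched against the month abbreviations of the current locale. If a plain Date is wanted regardless of locale, one possible sketch is to look the month up in month.abb and assemble an ISO-style string, reusing the same illustrative vectors:

```r
mm   <- c("Jan", "Dec", "jan", "Mar", "Apr")
year <- c(1998, 1998, 1999, 1999, 1999)
day  <- c(4, 10, 3, 16, 25)

# Month number from the English abbreviations, ignoring case
month_num <- match(tolower(mm), tolower(month.abb))

# Build "YYYY-MM-DD" strings and convert to Date
dates <- as.Date(sprintf("%04d-%02d-%02d", year, month_num, day))
dates
```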
Just adding to the existing answers and the comment in the question:
readr::parse_date("20/DEZEMBRO/18", "%d/%B/%y", locale = readr::locale("pt"))
This returns the date "2018-12-20". locale("pt") is for Portuguese, as used in Brazil; you can use "es" for Spanish, "fr" for French, etc.
A couple of options using:
vec <- c("Jan","Dec","Jan","Apr")
are
> Months <- 1:12
> names(Months) <- month.abb
> unname(Months[vec])
[1] 1 12 1 4
and/or
> match(vec, month.abb)
[1] 1 12 1 4