My data is currently in this format:
head(month)
[1] "192512" "192601" "192602" "192603" "192604" "192605"
which means Dec 1925, Jan 1926, etc.
How do I convert these values to the form "Dec1925"?
Thanks in advance.
You can use the as.POSIXct function for all kinds of date manipulations; it is well worth investigating. Unfortunately, without a day in the string, parsing returns NA. So first append a day such as "01" to the end of each string; when you format the result back to character, the day can be dropped again.
as.character(as.POSIXct(paste0(month,'01'),format='%Y%m%d'),format = '%b%Y')
You can use ?as.POSIXct to see more about the as.POSIXct function.
?strptime will give you a listing of all the format options.
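For what it's worth, here is a minimal base R sketch of the same idea using as.Date and format. It assumes the values are always six-digit "YYYYMM" strings; the names dates, labels and labels_abb are only for illustration. Since %b produces month names in the current locale, the second approach indexes the built-in English month.abb vector instead.

```r
month <- c("192512", "192601", "192602")

# Append a placeholder day so each string is a complete date.
dates <- as.Date(paste0(month, "01"), format = "%Y%m%d")

# %b uses the current locale's month abbreviations.
labels <- format(dates, "%b%Y")

# Locale-independent alternative: index the built-in English month.abb.
labels_abb <- paste0(month.abb[as.integer(substr(month, 5, 6))],
                     substr(month, 1, 4))
```

The month.abb version never goes through a date class, so it cannot catch an invalid month such as "192513"; the as.Date version would return NA for it.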
In R, it seems like this should be obvious, but I'm having trouble. I have dates formatted as 1/1/00, 12/31/00, etc., where each part is abbreviated.
When I try to convert it to a date, I get this error:
> headlines$Date <- as.Date(headlines$Date)
Error in charToDate(x) :
character string is not in a standard unambiguous format
I've also tried the below, but get all NAs:
> headlines$Date <- as.Date(headlines$Date,format="%b/%d/%y")
How should I convert this column to dates?
You were on the right track by adding the format argument. I'm guessing you realized that the first error ("character string is not in a standard unambiguous format") happened because R doesn't know which of the numbers is the day, month, or the year. Say one of the values was "01/02/03"; there's no way of knowing whether it's 2 January 2003, or 1 February 2003, and so on.
In this case I think you just need to fix what you're passing to the format argument. %b is the symbol for abbreviated month in text form, not number form (e.g. "Jan" instead of "01"). You need to use %m instead for months stored as numbers. Try this:
headlines$Date <- as.Date(headlines$Date,format="%m/%d/%y")
See this page for more info about date formats in R.
Replace format = "%b/%d/%y" with format = "%m/%d/%y".
%b means month as in Jan, Feb, Mar and so on.
%m is the numeric equivalent (01, 02, 03, etc.; the leading zero is optional when parsing).
Further reading: https://www.stat.berkeley.edu/~s133/dates.html
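A small sketch of the difference, using the two sample values from the question (the names raw, bad and good are just for illustration):

```r
raw <- c("1/1/00", "12/31/00")

# %b expects a month name such as "Jan", so numeric months fail to parse.
bad <- as.Date(raw, format = "%b/%d/%y")

# %m expects a month number, with or without a leading zero.
good <- as.Date(raw, format = "%m/%d/%y")

# Per ?strptime, two-digit years 00-68 are read as 20xx and 69-99 as 19xx.
format(good)
```

If the headlines go back before 1969, the two-digit year rule in the last comment is worth checking against the data.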
I have 2 Date variables in a .csv file with formats of "07-JUL-16 06.05.54.000000 AM". I want to use these in a regression model. Should I be reading these into a data frame as factors or characters? How can I take a difference of the 2 dates in each case?
Read them in as characters (e.g. stringsAsFactors=FALSE or tidyverse functions), then use as.POSIXct, e.g.
as.POSIXct("07-JUL-16 06.05.54.000000 AM",format="%d-%b-%y %I.%M.%OS %p")
## [1] "2016-07-07 06:05:54 EDT"
(I'm assuming that you intend a day-month-year format rather than a year-month-day format, but I don't actually have any evidence to support that!)
Once you've done this, subtracting the values should just work and give you an object of class difftime, but be careful with units when converting to numeric!
For what it's worth, lubridate::ymd_hms thinks it can guess the format, but guesses wrong (assuming I guessed right above). With a two-digit year, and without any year values greater than 31, there's really nothing to distinguish years from days.
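To make the units caveat concrete, here is a minimal sketch. The second timestamp is made up for illustration, tz = "UTC" is an assumption to keep the result machine-independent, and %b and %p are matched against the current locale, so this assumes an English one.

```r
start <- "07-JUL-16 06.05.54.000000 AM"
end   <- "08-JUL-16 09.35.54.000000 PM"   # made-up second timestamp
fmt   <- "%d-%b-%y %I.%M.%OS %p"

t1 <- as.POSIXct(start, format = fmt, tz = "UTC")
t2 <- as.POSIXct(end,   format = fmt, tz = "UTC")

# Subtraction returns a difftime whose units R chooses automatically.
elapsed <- t2 - t1

# Ask for explicit units before converting to a plain number.
hours <- as.numeric(difftime(t2, t1, units = "hours"))
```

Here as.numeric(elapsed) would return the value in whatever units R picked (days in this case), which is why requesting the units explicitly is safer before the number goes into a model.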
What could be the problem? I don't seem to get why this had to be NA.
as.Date("jan2012", format="%b%Y")
[1] NA
I have also tried the strptime function, with the same result. I have been using these functions for a while, but I don't know why they are not working this morning. Any insight into why would be useful.
A Date also includes a day, so we need to paste in a day, e.g. 01:
as.Date(paste("jan2012", "01"), format="%b%Y%d")
#[1] "2012-01-01"
"jan2012" isn't a date, it's a month. You need to prefix the day you want, e.g.
as.Date(paste0("01", "jan2012"), format = "%d%b%Y")
I want to split up a Timestamp from Excel into year and julian day. I know, duplicate question, but combining everything I have found from other questions is not helping me.
The timestamp is formatted as 1/13/2011 13:55. So, I wanted to tell R to recognize this as a time variable. I have hours and minutes, so I tried as.POSIXct and as.POSIXlt. These didn't work. I tried adding strptime:
as.POSIXct(strptime(df$TIMESTAMP, "%d/%m/%Y %H:%M%S"))
I just got NAs.
Once I got R to recognize it as a date, I was going to use lubridate like day(df$Date).
It seems as though you have the month and day reversed:
1/13/2011 13:55
with
"%d/%m/%Y %H:%M%S"
corresponds to the 1st day of the 13th month, which is probably why you're getting NAs. The format also ends in %M%S, although the timestamp has no seconds field. This seems to work for me:
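a <- "01/13/2011 13:55"
# Month comes first, and there is no seconds field.
# Named ts rather than t to avoid masking base::t().
ts <- strptime(a, "%m/%d/%Y %H:%M")
ts
"2011-01-13 13:55:00"
ts$year + 1900  # calendar year (POSIXlt stores years since 1900)
ts$yday + 1     # day of the year (POSIXlt counts from 0)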
In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to convert this into a number (e.g., the Julian time), perhaps something like the number of seconds elapsed between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). See http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit, the answer is clearly what #joran wrote, except that you would first need to convert to a date-time class (POSIXct):
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a Unix timestamp: the number of seconds since the epoch (1970-01-01 00:00:00 UTC), with the string interpreted in the local time zone.
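Because the result depends on the time zone, it can help to state the zone explicitly. A minimal sketch, where the choice of "UTC" and "America/New_York" is only an assumption for illustration:

```r
begin <- "2001-03-13 10:31:00"

# Interpret the string as UTC so the result does not depend on the machine.
secs_utc <- as.numeric(as.POSIXct(begin, tz = "UTC"))

# The same wall-clock time in another zone is a different instant.
secs_ny <- as.numeric(as.POSIXct(begin, tz = "America/New_York"))

# Round trip: seconds since the epoch back to a date-time.
back <- as.POSIXct(secs_utc, origin = "1970-01-01", tz = "UTC")
```

The UTC value is 984479460. The 984497460 shown in the earlier answer is 18000 seconds (5 hours) larger, which is what a machine set to a UTC-5 zone would give.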
Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() parses the string into a lubridate period object, and as.numeric() then converts that to seconds. See the lubridate documentation for details.
I tried this because I had trouble with data in that format that ran over 24 hours.
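If you would rather not depend on lubridate, the same conversion can be sketched in base R. to_seconds is a hypothetical helper name, and it assumes every value has exactly three colon-separated fields:

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds.
# Hours may exceed 23, since the fields are treated as plain numbers.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

durations <- to_seconds(c("24:00:00", "30:15:00", "00:01:30"))
```

This treats the strings as durations rather than clock times, which is why values over 24 hours are not a problem.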
The example from ?as.POSIX help gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))