I am reading an Excel file that has a time column. The column contains values like
23:29:04
23:04:31
21:55:37
21:52:27
21:49:53
When I read this column into R, it comes in as numeric values:
0.961469907
0.913622685
0.911423611
0.907094907
0.906250000
0.899490741
The Excel and R values above don't correspond to each other; they are just samples.
I tried using
strptime(TimeStamp, format="%H:%M:%S")
It gives all values as NA.
Please suggest how to read time correctly in R.
These numbers are fractions of a day corresponding to times. Time objects are, e.g., implemented in package chron:
library(chron)
x <- c(0.961469907, 0.913622685, 0.911423611, 0.907094907, 0.906250000, 0.899490741)
x <- times(x)
print(x)
#[1] 23:04:31 21:55:37 21:52:27 21:46:13 21:45:00 21:35:16
Read the column in as a string and wrap your strptime call in as.POSIXct:
as.POSIXct(strptime(TimeStamp,format="%H:%M:%S"))
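Alternatively, if the column has already come in as day fractions, you can convert those directly in base R without re-reading the file. A sketch, where `x` is the numeric vector from the question:

```r
# Excel stores times as fractions of a day; multiply by 86400 seconds
x <- c(0.961469907, 0.913622685, 0.911423611)

secs <- round(x * 86400)                    # seconds since midnight
hms  <- sprintf("%02d:%02d:%02d",
                secs %/% 3600,              # hours
                (secs %% 3600) %/% 60,      # minutes
                secs %% 60)                 # seconds
hms
# [1] "23:04:31" "21:55:37" "21:52:27"
```

The round() guards against floating-point error before the integer division.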
Related
If I have multiple spreadsheets with a date column in different formats, is it possible to convert the dates when some of the rows are in the numeric format?
If I import the columns as character, I still need to convert them to dates. I believe parse_date_time will handle anything except the numeric format.
The following converts the first two but not the numeric version. I don't think this function handles numeric input.
Is there a function that can process both Text and Numeric dates?
x<- c("2019-12-05","8-Dec-19","43787")
lubridate::parse_date_time(x, c("ymd", "d-b-y"))
It's a bit clunky, but you can take a second pass through the data with janitor::excel_numeric_to_date() for any values that are numeric and failed to parse via parse_date_time. (If you're going to use this often, you can write a wrapper function; you might also want to suppress the warning messages from parse_date_time and as.numeric.)
x <- c("2019-12-05","8-Dec-19","43787")
y <- lubridate::parse_date_time(x, c("ymd", "d-b-y"))
exceld <- is.na(y) & !is.na(as.numeric(x))
y[exceld] <- janitor::excel_numeric_to_date(as.numeric(x[exceld]))
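The wrapper the answer alludes to might look like this. A sketch in base R so it also runs without lubridate or janitor; the name `parse_mixed_dates` is mine, and it assumes only these two text formats plus five-digit Excel serials:

```r
parse_mixed_dates <- function(x) {
  out <- rep(as.Date(NA), length(x))
  # Pass 1: anything fully numeric is treated as an Excel serial date
  num <- !is.na(suppressWarnings(as.numeric(x)))
  out[num] <- as.Date(as.numeric(x[num]), origin = "1899-12-30")
  # Pass 2: try each known text format on whatever is still NA
  for (fmt in c("%Y-%m-%d", "%d-%b-%y")) {
    miss <- is.na(out)
    out[miss] <- as.Date(x[miss], format = fmt)
  }
  out
}

parse_mixed_dates(c("2019-12-05", "8-Dec-19", "43787"))
# [1] "2019-12-05" "2019-12-08" "2019-11-18"
```

Note that %b depends on the locale's month abbreviations, so "8-Dec-19" only parses under an English LC_TIME.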
Perhaps there is a better approach, but I was able to create a function that handles both the numeric and text versions of the date format using tryFormats.
dfix(c("2019-12-05","8-Dec-19","43787"))
[1] "2019-12-05" "2019-12-08" "2019-11-18"
dfix<-function(x1){
dout<-c()
for (x in x1){
if(grepl('\\d{5}',x)){ #Check for numeric date (5 digits)
n<-as.numeric(x)
d<-as.Date(n, origin = "1899-12-30")
}
else if (grepl('-',x)){ # TryFormats for dates with "-"
d<-as.Date(x,tryFormats = c("%Y-%m-%d","%d-%b-%y"))
}
dout<-c(dout,d)
}
return (as.Date(dout,origin = '1970-01-01'))
}
I have a tibble in R with about 2,000 rows. It was imported from Excel using read_excel. One of the fields is a date field: dob. It imported as a string, and has dates in three formats:
"YYYY-MM-DD"
"DD-MM-YYYY"
"XXXXX" (ie, a five-digit Excel-style date)
Let's say I treat the column as a vector.
dob <- c("1969-02-02", "1986-05-02", "34486", "1995-09-05", "1983-06-05",
"1981-02-01", "30621", "01-05-1986")
I can see that I probably need a solution that uses both parse_date_time and as.Date.
If I use parse_date_time:
dob_fixed <- parse_date_time(dob, c("ymd", "dmy"))
This fixes them all, except the five-digit one, which returns NA.
I can fix the five-digit one, by using as.integer and as.Date:
dob_fixed2 <- as.Date(as.integer(dob), origin = "1899-12-30")
Ideally I would run one and then the other, but because each returns NA on the strings it can't parse, that doesn't work.
Any suggestions for doing all? I could simply change them in Excel and re-import, but I feel like that's cheating!
We create a logical index after the first run based on the NA values and use that to index the second run:
i1 <- is.na(dob_fixed)
dob_fixed[i1] <- as.Date(as.integer(dob[i1]), origin = "1899-12-30")
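Put together on the example vector, the whole fix is (assuming lubridate is loaded):

```r
library(lubridate)

dob <- c("1969-02-02", "1986-05-02", "34486", "1995-09-05", "1983-06-05",
         "1981-02-01", "30621", "01-05-1986")

# Pass 1: the two text formats; the five-digit serials come back NA
dob_fixed <- as.Date(parse_date_time(dob, c("ymd", "dmy")))

# Pass 2: fill those NAs from the Excel serial numbers
i1 <- is.na(dob_fixed)
dob_fixed[i1] <- as.Date(as.integer(dob[i1]), origin = "1899-12-30")

dob_fixed
```

Wrapping parse_date_time in as.Date keeps everything in one Date vector rather than mixing POSIXct and Date.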
The following vector of Dates is given in form of a string sequence:
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
I would like to exemplary lag this vector by 1 month, that means to produce the following structure:
d <- c("01/08/1991","01/09/1991","01/10/1991","01/11/1991")
My data is much larger and I must impose higher lags as well, but this seems to be the basis I need to know.
In the end, I would like to have the same format again ("%d/%m/%Y"). How can this be done in R? I found a couple of packages (e.g. lubridate), but I always have to convert between formats (strings, dates and more), so it's a bit messy and seems prone to mistakes.
edit: some more info on why I want to do this: I am using this vector as rownames of a matrix, so I would prefer a solution where the final outcome is a string vector again.
This does not use any packages. We convert to "POSIXlt" class, subtract one from the month component and convert back:
fmt <- "%d/%m/%Y"
lt <- as.POSIXlt(d, format = fmt)
lt$mon <- lt$mon - 1
format(lt, format = fmt)
## [1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
My solution uses lubridate, but it does return what you want in the specified format:
require(lubridate)
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
format(as.Date(d,format="%d/%m/%Y")-months(1),'%d/%m/%Y')
[1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
You can then change the lag and (if you want) the output format (this part: '%d/%m/%Y') by specifying what you want.
I have a data set in which I want to pad zeroes in front of a set of dates that don't have six characters. For example, I have a date that reads 91003 (October 3rd, 2009) and I want it to read 091003, as well as any other date that is missing a zero in front. When I use the sprintf function, the code is:
Data1$entrydate <- sprintf("%06d", data1$entrydate)
But what it spits out is something like 000127, or some other random number, for all the other dates in the column. I don't understand what's going on, and I would appreciate some help on the issue. Thanks.
PS. I sometimes also get an error message that sprintf is only for character values; I don't know if there is any code for numerical values.
I guess you got different results than expected because the column class was factor. You can convert the column to numeric either by as.numeric(as.character(datacolumn)) or as.numeric(levels(datacolumn)). According to ?factor
To transform a factor ‘f’ to approximately its
original numeric values, ‘as.numeric(levels(f))[f]’ is recommended
and slightly more efficient than ‘as.numeric(as.character(f))’.
So, you can use
levels(data1$entrydate) <- sprintf('%06d', as.numeric(levels(data1$entrydate)))
Example
Here is an example that shows the problem
v1 <- factor(c(91003, 91104,90103))
sprintf('%06d', v1)
#[1] "000002" "000003" "000001"
Or, it is equivalent to
sprintf('%06d', as.numeric(v1)) #the formatted numbers are
# the numeric index of factor levels.
#[1] "000002" "000003" "000001"
When you convert it back to numeric, it works as expected:
sprintf('%06d', as.numeric(levels(v1)))
#[1] "090103" "091003" "091104"
One of the columns in my data frame is a character which has the following format (an example):
2013-02-05 08:00:00
Some of the rows in this column are NULL. I want to convert the column to a date class, but I am getting NA for all rows.
Could you please tell me what should I do to make it work?
You should install Hadley Wickham's lubridate package, and use:
> ymd_hms("2013-02-05 08:00:00")
The package includes many other functions that'll help you (safely) manipulate datetime and interval objects.
Based on your comment, assuming your data frame is DF, and your date column (as character) DATE.STR, I would do the following:
DF$DATE=as.Date(DF$DATE.STR)
Of course, using lubridate would give you more options, but I think you can use base R for this.
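One thing to check: if the "NULL" rows contain the literal string "NULL" rather than true missing values, replace them with NA before converting. A sketch with hypothetical data (the frame and column names mirror the answer above):

```r
DF <- data.frame(DATE.STR = c("2013-02-05 08:00:00", "NULL", "2013-02-07 14:30:00"),
                 stringsAsFactors = FALSE)

# Literal "NULL" strings become real missing values
DF$DATE.STR[DF$DATE.STR == "NULL"] <- NA

# Keep the time of day by converting to POSIXct rather than Date
DF$DATE <- as.POSIXct(DF$DATE.STR, format = "%Y-%m-%d %H:%M:%S")
```

Using as.Date instead would also work but silently drops the time component.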