dates transfer from spreadsheet to R - r

I have 451 dates in the format "2002-06-18" (YYYY-MM-DD) in the spreadsheet program LibreOffice Calc. I would like to transfer these dates into R as a column with the name "Date_Sale".
I copied this column of dates to a text file and then read the text file into R with the command
Date_Sale <- read.csv("Date_Sale.txt", header=FALSE,stringsAsFactors=FALSE)
> str(Date_Sale)
'data.frame': 451 obs. of 1 variable:
$ V1: chr "2002-06-18" "2002-05-22" "2002-05-23" "2002-10-23" ...
The output of str above shows that the data was read into R as a data frame with a single character (chr) column. I then tried the command
Date_Sale <- strptime(Date_Sale, "%Y-%m-%d")
This produces the error message
Error in strptime(Date_Sale, "%Y-%m-%d") :
input string is too long
If I pass a single element to the command above, it works.
firstday <- strptime("2002-06-18", "%Y-%m-%d")
[1] "2002-06-18 CEST"

Here is one approach
library(tidyverse)
df <- tribble(~my_date,
"2002-06-18",
"2002-05-22",
"2002-05-23",
"2002-10-23")
df %>%
mutate(my_date = lubridate::ymd(my_date))
or
df %>%
mutate(my_date = as.Date(my_date, format = '%Y-%m-%d'))
Be careful with time zones when converting data. strptime uses your current time zone by default, which may be on summer time (daylight saving time). See ?strptime.
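The error in the question appears to come from passing the whole data frame to strptime instead of the character column inside it. A minimal base R sketch, assuming the column is called V1 as in the str output above (the four sample values stand in for the 451 dates read from the file):

```r
# Stand-in for the data frame returned by read.csv(..., header = FALSE)
Date_Sale <- data.frame(
  V1 = c("2002-06-18", "2002-05-22", "2002-05-23", "2002-10-23"),
  stringsAsFactors = FALSE
)

# Rename the column, then convert the column itself (not the data frame)
names(Date_Sale) <- "Date_Sale"
Date_Sale$Date_Sale <- as.Date(Date_Sale$Date_Sale, format = "%Y-%m-%d")

str(Date_Sale)
```

as.Date returns plain calendar dates, so no time zone is attached to the result.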

Related

Error by changing all the date format in R

I'm trying to change the format of all the dates in one column. I tried strftime, as.Date and parse_date, and each gave me some sort of issue. The column contains 2200 dates currently written like Feb-03-2022; they should be expressed as "%B %d %Y". How can I modify all the dates?
ethereum <- read_csv('ethereum_2022-01-04_2022-02-03.csv')
head(ethereum)
# Changing date format in the dataset
ethereum$Date <- parse_date(ethereum$`Date`, "%d-%b-%y")
head(ethereum$Date)
# Naming the datatype and the timeseries
ds<- ethereum$Date
y<- ethereum$`Close`
df<- data.frame(ds,y)
View(df)
When I run this code, I get the following warning:
Warning: 2200 parsing failures.
row col expected actual
1 -- date like %d-%b-%y Feb-03-2022
2 -- date like %d-%b-%y Feb-02-2022
3 -- date like %d-%b-%y Feb-01-2022
4 -- date like %d-%b-%y Jan-31-2022
5 -- date like %d-%b-%y Jan-30-2022
... ... .................. ...........
See problems(...) for more details.
You can convert the vector to the date format and then apply any desired formatting.
vec_str <- c("Feb-03-2022", "Feb-02-2022", "Jan-31-2022", "Jan-30-2022")
vec_dates <- as.Date(x = vec_str, format = "%b-%d-%Y")
vec_dates_str <- format(vec_dates, "%B %d %Y")
vec_dates_str
# [1] "February 03 2022" "February 02 2022" "January 31 2022" "January 30 2022"
For convenience when applying this to a data frame, you can wrap this behaviour in a function:
my_date_transform <- function(x, date_in_format = "%b-%d-%Y",
                              date_out_format = "%B-%d-%Y") {
  # parse the input strings, then format the parsed dates (not a global vector)
  x_dates <- as.Date(x = x, format = date_in_format)
  format(x = x_dates, date_out_format)
}
my_date_transform(x = vec_str)
Example
sample_data <- data.frame(original_date_str = vec_str)
sample_data$new_date_format <- my_date_transform(sample_data$original_date_str)
sample_data
# >> sample_data
# original_date_str new_date_format
# 1 Feb-03-2022 February-03-2022
# 2 Feb-02-2022 February-02-2022
# 3 Jan-31-2022 January-31-2022
# 4 Jan-30-2022 January-30-2022
You can then apply your function to a data frame
I assume you are having a packages problem.
Try this example:
library(parsedate)
## Calling the function
parse_date("Feb-03-2022")
## Specifying the package to avoid masked functions
parsedate::parse_date("Feb-03-2022")
In your code it would look like this:
ethereum$Date <- parse_date(ethereum$Date)
The "parsedate" package is meant to parse many different date formats. I don't know if all 2,200 values are in the same format, so my suggestion is to use that. If you are 100% sure they are, you can use many other date parsing functions. You can even write one yourself using string processing techniques.
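The parsing failures in the question can be reproduced in base R: the format string has to match the order of the fields in the data. A small sketch, assuming English month abbreviations (the C locale is forced for the session because %b is locale-dependent):

```r
# %b matches abbreviated month names in the current locale; use the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

vec_str <- c("Feb-03-2022", "Jan-31-2022")

# Format from the question: day first, 2-digit year. Does not match the data.
bad <- as.Date(vec_str, format = "%d-%b-%y")

# Month abbreviation first, then day, then 4-digit year. Matches the data.
good <- as.Date(vec_str, format = "%b-%d-%Y")

print(bad)
print(format(good, "%B %d %Y"))
```

The first call yields NA for every element, which corresponds to the "parsing failures" warning reported by parse_date.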

conditional conversion from character to date for a dataframe column in r

I have a data frame that I read from an Excel file, like the one below. The Date column turned out to contain either 5-digit numbers or date strings.
df = data.frame(Date = c('42195', '3/31/2016', '42198'), Value = c(123, 445, 222))
Date Value
42195 123
3/31/2016 445
42198 222
I want to clean up the column and convert everything into date format.
I did the following.
df %>%
mutate(Date = ifelse(length(Date)==5,as.Date(Date, origin = '1899-12-30'), as.Date(Date) ))
I got an error like this:
Error in charToDate(x) : character string is not in a standard unambiguous format
What did I do wrong? I could not figure out why after many attempts to fix it.
Thanks a lot.
There are a few changes that you need to make:
1) Instead of length(Date)==5, I guess you are looking for nchar.
2) To change Excel dates to R dates, they need to be numeric. Currently they are factors.
3) You need to provide a specific format to as.Date when the date is not in a standard format.
4) Use if_else instead of ifelse, since the latter would change the dates to numbers.
library(dplyr)
df %>%
mutate(Date = as.character(Date),
Date = if_else(nchar(Date) ==5,
as.Date(as.numeric(Date), origin = '1899-12-30'),
as.Date(Date, "%m/%d/%Y")))
# Date Value
#1 2015-07-10 123
#2 2016-03-31 445
#3 2015-07-13 222
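The same idea can be sketched in base R without dplyr by converting the two kinds of values separately with logical indexing. This is only an illustration; the helper name is_serial is made up here, and the regular expression assumes every Excel serial number has exactly five digits:

```r
df <- data.frame(Date = c("42195", "3/31/2016", "42198"),
                 Value = c(123, 445, 222),
                 stringsAsFactors = FALSE)

# TRUE for values that look like 5-digit Excel serial numbers
is_serial <- grepl("^[0-9]{5}$", df$Date)

# Start with an all-NA Date vector and fill each group with its own rule
out <- rep(as.Date(NA), nrow(df))
out[is_serial]  <- as.Date(as.numeric(df$Date[is_serial]), origin = "1899-12-30")
out[!is_serial] <- as.Date(df$Date[!is_serial], format = "%m/%d/%Y")

df$Date <- out
print(df)
```

Because each conversion only sees the values it can handle, this avoids the coercion warning that as.numeric would otherwise raise for "3/31/2016".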

Reformatting date and timestamp with r [duplicate]

In order to prevent an error in uploading xls data into a sql database, I am trying to reformat a date type of "08/22/2019 02:05 PM CDT" and want only the date, not the time or the timezone. Many efforts to use the default, POSIX and lubridate actions have failed. The xls file formats the date column as general.
I have a column of data to convert, not a single cell. This is a part of a loop for multiple files in a folder.
Failures:
#mydata_r11_Date2 <- strptime(as.character(mydata_r11_Date$Date), "%d/%m/%Y")
# parse_date_time(x = mydata_r11_Date$Date,
# orders = c("d m y", "d B Y", "m/d/y"),
# locale = "eng")
#
#
# mydata_r11_Date <- as.character(mydata_r11_Date)
mydata_r11_Date <- gsub('^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]+[+-][0-9]{2}):([0-9]{2})$',
'\\1\\2',
mydata_r11_Date$Date)
ymd_hms(mydata_r11_Date$Date)
mydata_r11_Date <- as_date (mydata_r11_Date$Date,format = "%Y-%m-%d")
mydata_r11_Date2 <- format(as.Date(mydata_r11_Date,"%Y-%m-%d"),"%Y-%m-%d")
Errors include:
Warning message:
All formats failed to parse. No formats found.
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
Error in as.Date.default(mydata_r11_Date$Date, format = "%Y-%m-%d") :
do not know how to convert 'mydata_r11_Date$Date' to class “Date”
Error: unexpected ',' in " mydata_r11_Date <- as.Date(mydata_r11_Date$Date),"
Error in as_date(x) : object 'x' not found
library(readxl)
library(reshape2)
library(lubridate)
Import xls
mydata_r11 <- read_excel("C:/FOLDER/FOLDER/FOLDER/OUTPUT/WADUJONOKO_student_assessment_results.xls",1,skip = 1, col_types = "list")
Isolate date column
mydata_r11_Date <- mydata_r11[,c(8)]
Convert date
mydata_r11_Date 2 <-
Have "08/22/2019 02:05 PM CDT"
Want "08/22/2019"
I don't understand why you are resorting to complex regex here when you seem to only want the date component, which is the first 10 characters of the timestamps. Just take the substring and then call as.Date with an appropriate format mask:
x <- "08/22/2019 02:05 PM CDT"
y <- substr(x, 1, 10)
as.Date(y, format = "%m/%d/%Y")
[1] "2019-08-22"
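Since the question concerns a whole column and asks for the "08/22/2019" style as output, the same substring approach can be sketched on a vector. The second timestamp below is invented for illustration:

```r
# Example column; the second value is a made-up sample in the same layout
x <- c("08/22/2019 02:05 PM CDT", "12/01/2019 11:30 AM CST")

# substr and as.Date are vectorised, so no loop is needed
d <- as.Date(substr(x, 1, 10), format = "%m/%d/%Y")

# Back to text in the original month/day/year layout, if a string is required
d_str <- format(d, "%m/%d/%Y")
print(d_str)
```

Keeping d as a Date is usually the safer choice for a database upload; d_str is only needed when the target column expects text.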

load data from xlsx file as date and time

An xlsx file contains a date column like this:
Date
2019-3-1 0:15
2019-3-1 19:15
2019-3-1 23:15
How can I load it into a data.frame as a real date and time datatype? I am using the openxlsx package and tried:
df <- readWorkbook(xlsxFile = '0301-0314.xlsx',sheet=1)
First read the data set using any library. Then you can use as.POSIXlt or as.POSIXct to parse the date-time format. These also let you provide time zone information along with the date-time format.
Example:
> sampledf <- data.frame(DateTime = c("2019-3-1 0:15",
+ "2019-3-1 19:15",
+ "2019-3-1 23:15")
+ )
> str(sampledf$DateTime)
Factor w/ 3 levels "2019-3-1 0:15",..: 1 2 3
> sampledf$DateTime <- as.POSIXlt(sampledf$DateTime ,"GMT",format = "%Y-%m-%d %H:%M")
> str(sampledf$DateTime)
POSIXlt[1:3], format: "2019-03-01 00:15:00" "2019-03-01 19:15:00" ...
The time zone "GMT" can be replaced with any other time zone.
More about the different time formatting options in R is available in ?strptime.
This will work:
# Create example dataset:
df <- data.frame(Date = c("2019-3-1 0:15",
"2019-3-1 19:15",
"2019-3-1 23:15")
)
df$Date <- as.character(df$Date)
# Format "Date" as date and time:
df$time <- strptime(as.character(df$Date), "%Y-%m-%d %H:%M")
# Check:
str(df)
# If then you would like to count time, for example in number of hours, from a certain initial time (e.g. 2019-3-1 0:15) try:
df$timestep <- as.numeric(difftime(time2="2019-3-1 0:15", time1=df$time, units="hours"))
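A variant of the same steps, sketched with as.POSIXct and an explicit time zone. The choice of "UTC" is an assumption made here so that the elapsed hours do not depend on the local daylight saving rules:

```r
df <- data.frame(Date = c("2019-3-1 0:15", "2019-3-1 19:15", "2019-3-1 23:15"),
                 stringsAsFactors = FALSE)

# POSIXct stores one number per timestamp, which suits a data frame column
df$time <- as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Hours elapsed since the first timestamp
df$timestep <- as.numeric(difftime(df$time, df$time[1], units = "hours"))

str(df)
```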

Converting dates from excel to R

I have difficulty converting dates from excel (reading from csv) to R. Help is much appreciated.
Here is what I'm doing:
df$date = as.Date(df$excel.date, format = "%d/%m/%Y")
However, some dates get converted and some do not. Here is the output of:
head(df$date)
[1] NA NA NA "0006-01-05" NA NA
The first 5 entries imported from the csv file are as follows:
7/28/05
7/28/05
12/16/05
5/1/06
4/21/05
and here is the output of:
head(df$excel.date)
[1] 7/28/05 7/28/05 12/16/05 5/1/06 4/21/05 1/25/07
1079 Levels: 1/1/00 1/1/02 1/1/97 1/10/96 1/10/99 1/11/04 1/11/94 1/11/96 1/11/97 1/11/98 ... 9/9/99
str(df)
.
.
$ excel.date : Factor w/ 1079 levels "1/1/00","1/1/02",..: 869 869 288 618 561 48 710 1022 172 241 ...
First of all, make sure the dates in your file are in an unambiguous format, using full years (not just the last 2 digits). %Y is for "year with century" (see ?strptime), but you don't seem to have the century. So you can use %y (at your own risk, see ?strptime again) or reformat the dates in Excel.
It is also a good idea to use as.is=TRUE with read.csv when reading in these data; otherwise character vectors are converted to factors, which can lead to unexpected results.
And on Windows it may be easier to use RODBC to read in dates directly from an xls or xlsx file.
(edit)
The following may give a hint:
> as.Date("13/04/2014", format= "%d/%m/%Y")
[1] "2014-04-13"
> as.Date(factor("13/04/2014"), format= "%d/%m/%Y")
[1] "2014-04-13"
> as.Date(factor("13/04/14"), format= "%d/%m/%Y")
[1] "14-04-13"
> as.Date(factor("13/04/14"), format= "%d/%m/%y")
[1] "2014-04-13"
(So as.Date can actually take care of factors. The magic happens in the as.Date.factor method, defined as:
function (x, ...) as.Date(as.character(x), ...)
It is not a good idea to represent dates as factors, but in this case it is not a problem either. I think the problem is Excel, which saves your years as 2-digit numbers in a CSV file without asking you.)
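The as.is = TRUE suggestion can be tried without a file by feeding read.csv a small inline text. This is only a sketch; the three rows and the variable name csv_text are made up to mirror the question's data:

```r
# Inline stand-in for the CSV file exported from Excel
csv_text <- "excel.date\n7/28/05\n12/16/05\n5/1/06"

# as.is = TRUE keeps the column as character instead of converting to factor
df <- read.csv(text = csv_text, as.is = TRUE)
str(df$excel.date)

# Two-digit years need %y; how the century is chosen is described in ?strptime
df$date <- as.Date(df$excel.date, format = "%m/%d/%y")
print(df)
```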
-
The ?strptime help file says that using %y is platform specific: you can get different results on different machines. So if there's no way of going back to the source and saving the csv in a better way, you might use something like the following:
x <- c("7/28/05", "7/28/05", "12/16/05", "5/1/06", "4/21/05", "1/25/07")
repairExcelDates <- function(x, yearcol=3, fmt="%m/%d/%Y") {
  # split each date into its numeric parts (one row per date)
  x <- do.call(rbind, lapply(strsplit(x, "/"), as.numeric))
  year <- x[,yearcol]
  if(any(year>99)) stop("don't know what to do")
  # if the 2-digit year <= current 2-digit year then add 2000, otherwise add 1900
  x[,yearcol] <- ifelse(year <= as.numeric(format(Sys.Date(), "%y")), year+2000, year + 1900)
  x <- apply(x, 1, paste, collapse="/")
  as.Date(x, format=fmt)
}
repairExcelDates(x)
# [1] "2005-07-28" "2005-07-28" "2005-12-16" "2006-05-01" "2005-04-21"
# [6] "2007-01-25"
Your data is formatted as Month/Day/Year with two-digit years, so
df$date = as.Date(df$excel.date, format = "%d/%m/%Y")
should be
df$date = as.Date(df$excel.date, format = "%m/%d/%y")
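The pattern of NA values in the question is what a swapped day and month produce, and this can be checked by counting NAs after each conversion. A short sketch on the five sample values from the question:

```r
x <- c("7/28/05", "7/28/05", "12/16/05", "5/1/06", "4/21/05")

# Day and month swapped: any "month" above 12 cannot be parsed and becomes NA
swapped <- as.Date(x, format = "%d/%m/%y")

# Month first, then day, then two-digit year
fixed <- as.Date(x, format = "%m/%d/%y")

# Counting NAs is a quick sanity check after any date conversion
print(sum(is.na(swapped)))
print(anyNA(fixed))
print(fixed)
```

Only "5/1/06" survives the swapped format, because 1 is a valid month number; it is silently read as 5 January instead of 1 May, which is why checking for NA alone is not always enough.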

Resources