Hi :) I have a column in my data.frame that contains dates in two formats. Here is a short minimal example:
D = data.frame(dates = c("3/31/2016", "01.12.2015"))
dates
1 3/31/2016
2 01.12.2015
With the nice function strptime I can easily get date-times for each format:
D$date1 <- strptime(D$dates, format = "%m/%d/%Y")
D$date2 <- strptime(D$dates, format = "%d.%m.%Y")
I already managed a workaround with:
D$date12 <- do.call(pmin, c(D[c("date1","date2")], na.rm=TRUE) )
To achieve this:
dates date1 date2 date12
1 3/31/2016 2016-03-31 <NA> 2016-03-31
2 01.12.2015 <NA> 2015-12-01 2015-12-01
Is there a more sophisticated way to do this transformation (from dates to date12) in one step?
Regards
You can use the anytime package.
library(anytime)
anytime::addFormats("%d.%m.%Y")
anydate(D$dates)
Note that the argument to anydate() has to be a vector, so select the dates column rather than passing the whole data frame.
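Or use lubridate:
library(lubridate)
parse_date_time(D$dates, c("mdy", "dmy"))
Note that a string such as 01.12.2015 is valid under both orders, so check that it comes out as 1 December rather than 12 January.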
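If you would rather stay in base R, one way to do the conversion in a single step is to try each format in turn and keep the first one that parses. This is a sketch assuming only the two formats from the question; parse_first_match is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: parse with each format in order, keeping the first
# non-NA result for every element
parse_first_match <- function(x, formats) {
  out <- as.Date(x, format = formats[1])
  for (f in formats[-1]) {
    miss <- is.na(out)                         # elements not parsed yet
    out[miss] <- as.Date(x[miss], format = f)  # try the next format on those only
  }
  out
}

D <- data.frame(dates = c("3/31/2016", "01.12.2015"), stringsAsFactors = FALSE)
D$date12 <- parse_first_match(D$dates, c("%m/%d/%Y", "%d.%m.%Y"))
D
```

Because the separators (/ vs .) are part of each format string, each value here only matches the format it was written in. For strings that fit several formats, the order of the formats vector decides which one wins.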
I'm trying to change the format of the incident_timestamp column so I can query the data correctly with sqldf. Here is the dataset I have for reference. Column #21 is the column I want to convert to a date format.
Assuming you want to convert incident_timestamp from a character to a date/time value you can use ymd_hms() from the lubridate package:
library(lubridate)
dt <- data.frame(
incident_timestamp = c(
"2017/01/04 01:25:00+00",
"2017/03/23 13:15:00+00",
"2017/03/25 13:45:00+00"
)
)
dt$converted_incident_timestamp <-
ymd_hms(dt$incident_timestamp)
dt
Output:
incident_timestamp converted_incident_timestamp
1 2017/01/04 01:25:00+00 2017-01-04 01:25:00
2 2017/03/23 13:15:00+00 2017-03-23 13:15:00
3 2017/03/25 13:45:00+00 2017-03-25 13:45:00
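A base R alternative, assuming all timestamps are in UTC as the +00 suffix suggests: strptime() (which as.POSIXct() uses when given a format) ignores trailing characters, so the offset can be left out of the format string. Note that this does not actually read the offset; it only gives the right answer because the offset is zero here.

```r
dt <- data.frame(
  incident_timestamp = c(
    "2017/01/04 01:25:00+00",
    "2017/03/23 13:15:00+00",
    "2017/03/25 13:45:00+00"
  ),
  stringsAsFactors = FALSE
)

# The trailing "+00" is not in the format, so strptime() ignores it;
# tz = "UTC" assumes the offset is always zero
dt$converted_incident_timestamp <- as.POSIXct(
  dt$incident_timestamp, format = "%Y/%m/%d %H:%M:%S", tz = "UTC"
)
dt
```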
I am working with large datasets in which one column is stored as character instead of a date-time type. I tried to convert it, but without success.
Could you please suggest a solution? It would be very helpful.
Thanks in advance
Code I am using right now:
c_data$dt_1 <- lubridate::parse_date_time(c_data$started_at,"ymd HMS")
Output I am getting:
2027-05-20 20:10:03
but the desired output is
2020-05-20 10:03
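Here is another way using lubridate. The input has hours and minutes but no seconds, so dmy_hm() is the matching parser; dmy_hms() expects a seconds field and misreads the input, giving times like 20:10:03.
library(dplyr)
library(lubridate)
df <- tibble(start_at = c("27/05/2020 10:03", "25/05/2020 10:47"))
df %>%
mutate(start_at = dmy_hm(start_at))
# A tibble: 2 x 1
start_at
<dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00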
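In R, date-time values are always printed in one standard format. You can change how they are displayed, but the result is then of type character.
If you want to show the data as year-month-day followed by minute:second, you can use format(). Here %M:%S works because the parse in the question shifted the original hour and minute into the minute and second fields; for a correctly parsed column (for example the dmy_hm() result) use '%Y-%m-%d %H:%M' instead: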
format(Sys.time(), '%Y-%m-%d %M:%S')
#[1] "2021-08-27 17:54"
To apply this to the entire column:
c_data$dt_2 <- format(c_data$dt_1, '%Y-%m-%d %M:%S')
Read ?strptime for different formatting options.
Using anytime
library(dplyr)
library(anytime)
addFormats("%d/%m/%Y %H:%M")
df %>%
mutate(start_at = anytime(start_at))
Output:
# A tibble: 2 x 1
start_at
<dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00
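For reference, the same thing in base R: parse with a format that describes the input exactly (day/month/year, then hours and minutes, no seconds), and only call format() at the end if you need a text representation. The example assumes UTC; substitute your own time zone as needed.

```r
start_at <- c("27/05/2020 10:03", "25/05/2020 10:47")

# Parse: the format must match the input (no seconds here)
parsed <- as.POSIXct(start_at, format = "%d/%m/%Y %H:%M", tz = "UTC")
class(parsed)   # still a date-time, so comparisons and arithmetic work

# Display: format() returns a character vector, not a date-time
shown <- format(parsed, "%Y-%m-%d %H:%M")
shown
#[1] "2020-05-27 10:03" "2020-05-25 10:47"
class(shown)
```

Keeping the POSIXct column for calculations and producing the formatted text only for display avoids losing the date-time type.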
I have a data frame with a column named perioden. This column contains dates written in this format: 2010JJ00, 2011JJ00, 2012JJ00, 2013JJ00, etc.
The column is of type character when I look at the structure. I've tried multiple solutions but am still stuck. My question is: how can I convert this column to a date, and how do I remove the JJ00 part so that only the year is shown?
You can try this approach: use gsub() to remove the unwanted text (as @AllanCameron suggested), then paste0() to add a month and day, and as.Date() to convert to a date:
#Data
df <- data.frame(Date=c('2010JJ00', '2011JJ00', '2012JJ00', '2013JJ00'),stringsAsFactors = FALSE)
#Remove string
df$Date <- gsub('JJ00','',df$Date)
#Convert to date; as.Date() needs a month and day, so append '-01-01'
df$Date2 <- as.Date(paste0(df$Date,'-01-01'))
Output:
Date Date2
1 2010 2010-01-01
2 2011 2011-01-01
3 2012 2012-01-01
4 2013 2013-01-01
We can use ymd() with the truncated argument, which allows the month and day to be missing:
library(lubridate)
library(stringr)
ymd(str_remove(df$Date, 'JJ\\d+'), truncated = 2)
#[1] "2010-01-01" "2011-01-01" "2012-01-01" "2013-01-01"
Data:
df <- data.frame(Date=c('2010JJ00', '2011JJ00', '2012JJ00', '2013JJ00'), stringsAsFactors = FALSE)
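Since the goal is to see only the year, it may be simpler to skip the Date class and store the year as an integer. A base R sketch, assuming every value has the form <year>JJ<digits>; the Year column name is just for illustration:

```r
df <- data.frame(Date = c('2010JJ00', '2011JJ00', '2012JJ00', '2013JJ00'),
                 stringsAsFactors = FALSE)

# Drop "JJ" and the digits after it, then convert what is left to integer
df$Year <- as.integer(sub("JJ\\d+$", "", df$Date))
df$Year
#[1] 2010 2011 2012 2013
```

An integer year still sorts and compares correctly, and can be turned into a Date later with as.Date(paste0(df$Year, '-01-01')) if needed.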
I have this kind of data frame:
dat <- read.table(text = " count date
0 10/05/2012
1 10/05/2013
1 10/05/2014
",header = TRUE)
I would like a new variable that contains the day and month in the following format:
> dat
count date DayMonth
1 0 10/05/2012 10-05
2 1 10/05/2013 10-05
3 1 10/05/2014 10-05
I tried some versions of the strptime function, like dat$DayMonth <- strptime(dat$date, "%d/%m"), but got strange results.
How can I get the desired result?
The same solution using the lubridate or anytime packages:
library(lubridate)
dat$DayMonth <- format(dmy(dat$date), "%d-%m")
# dmy stands for day-month-year; use ymd(), mdy(), etc. for other orders
library(anytime)
# caution: anytime may read 10/05/2012 in US month/day order, so check the result
dat$DayMonth <- format(anydate(dat$date), "%d-%m")
We can do this with as.Date and format
dat$DayMonth <- format(as.Date(dat$date, "%d/%m/%Y"), "%d-%m")
dat$DayMonth
#[1] "10-05" "10-05" "10-05"
strptime() works the same way: it converts to the POSIXlt date-time class, which can then be reformatted with format(), e.g. format(strptime(dat$date, "%d/%m/%Y"), "%d-%m").
NOTE: No external packages used
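If the dates are always written as dd/mm/yyyy, the day-month part can also be extracted with a regular expression, without parsing to a date at all. This is a sketch assuming that fixed layout; unlike the as.Date() route, it does not check that the values are valid dates.

```r
dat <- read.table(text = " count date
0 10/05/2012
1 10/05/2013
1 10/05/2014
", header = TRUE, stringsAsFactors = FALSE)

# Capture day and month, drop the year, and join them with "-"
dat$DayMonth <- sub("^(\\d{2})/(\\d{2})/\\d{4}$", "\\1-\\2", dat$date)
dat$DayMonth
#[1] "10-05" "10-05" "10-05"
```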
The format of my excel data file is:
day value
01-01-2000 00:00:00 4
01-01-2000 00:01:00 3
01-01-2000 00:02:00 1
01-01-2000 00:04:00 1
I open my file with this:
ts = read.csv(file=pathfile, header=TRUE, sep=",")
How can I add the missing rows, with 0 in the “value” column, to the data frame? Expected output:
day value
01-01-2000 00:00:00 4
01-01-2000 00:01:00 3
01-01-2000 00:02:00 1
01-01-2000 00:03:00 0
01-01-2000 00:04:00 1
This is now completely automated in the padr package and takes only one line of code.
original <- data.frame(
day = as.POSIXct(c("01-01-2000 00:00:00",
"01-01-2000 00:01:00",
"01-01-2000 00:02:00",
"01-01-2000 00:04:00"), format="%m-%d-%Y %H:%M:%S"),
value = c(4, 3, 1, 1))
library(padr)
library(dplyr) # for the pipe operator
original %>% pad %>% fill_by_value(value)
See vignette("padr") or this blog post for details on how it works.
I think this is a more general solution: create a sequence of all timestamps, use it as the basis for a new data frame, and then fill in your original values where they exist.
# convert original `day` to POSIX
ts$day <- as.POSIXct(ts$day, format="%m-%d-%Y %H:%M:%S", tz="GMT")
# generate a sequence of all minutes of the first day (946684800 = 2000-01-01 00:00:00 GMT)
minAsNumeric <- 946684800 + seq(0, 60*60*24 - 60, by=60)
minAsPOSIX <- as.POSIXct(minAsNumeric, origin="1970-01-01", tz="GMT") # convert those minutes to POSIX
# build complete data frame
newdata <- as.data.frame(minAsPOSIX)
newdata$value <- ts$value[match(newdata$minAsPOSIX, ts$day)] # fill in original `value`s where present (exact matching)
newdata$value[is.na(newdata$value)] <- 0 # replace NAs with 0
Try:
ts = read.csv(file=pathfile, header=TRUE, sep=",", stringsAsFactors=FALSE)
ts.tmp = rbind(ts,list("01-01-2000 00:03:00",0))
ts.out = ts.tmp[order(ts.tmp$day),]
Note that you need to read the strings in the first column as character rather than factors; otherwise rbind will cause problems. This adds only the one missing timestamp from the example, hard-coded. To turn the day column back into a factor afterwards, just do:
ts.out$day = as.factor(ts.out$day)
tidyr offers the handy complete() function to generate rows for implicitly missing data. In a second step, replace_na() turns the resulting NA values into 0. This assumes day has already been converted to POSIXct:
library(dplyr) # for the pipe operator
ts %>%
tidyr::complete(day = seq.POSIXt(min(day), max(day), by = "min")) %>%
dplyr::mutate(value = tidyr::replace_na(value, 0))
Notice that I set the granularity of the dates to minutes since your dataset expects a row every minute.
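The same idea as the complete() answer, in base R: build the full minute-by-minute sequence between the first and last timestamps, left-join the observed values onto it with merge(), and set the gaps to 0. This sketch assumes month-day-year timestamps in UTC; ts_data is a hypothetical name standing in for the question's ts.

```r
# Hypothetical `ts_data` stands in for the question's `ts`;
# timestamps are assumed to be month-day-year and in UTC
ts_data <- data.frame(
  day = as.POSIXct(c("01-01-2000 00:00:00", "01-01-2000 00:01:00",
                     "01-01-2000 00:02:00", "01-01-2000 00:04:00"),
                   format = "%m-%d-%Y %H:%M:%S", tz = "UTC"),
  value = c(4, 3, 1, 1)
)

# One row for every minute from the first to the last timestamp
all_minutes <- data.frame(day = seq(min(ts_data$day), max(ts_data$day), by = "min"))

# Left join keeps every minute; value is NA where no row existed
filled <- merge(all_minutes, ts_data, by = "day", all.x = TRUE)
filled$value[is.na(filled$value)] <- 0
filled
```

Unlike a hard-coded sequence, the range here comes from the data itself, so it adapts to whatever period the file covers.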