I merged several other dataframes together. However, the dates are now no longer in chronological order (see photo). How do I order the dataframe based on the values of the 'Date' column?
R dataframe output which I want to change
I first tried to set the 'Date' column as index, but since the 'Date' column does not only have unique values, I can't.
Whenever I do:
new_df <- new_df[order(new_df$Date),]
it only sorts the dates based on their first value.
In addition, the 'Date' column sometimes contains several identical values. How can I give rows the same index whenever they have exactly the same 'Date' value?
The ordering should be based on the column after it has been converted to Date class
new_df$Date1 <- as.Date(new_df$Date, "%A, %d %b %Y, %H:%M")
If we want the time of day to count in the ordering as well, use as.POSIXct
new_df$Date1 <- as.POSIXct(new_df$Date,format = "%A, %d %b %Y, %H:%M")
and then do
new_df <- new_df[order(new_df$Date1),]
If we want to create a time series object, use xts
library(xts)
xts(new_df["Income"], order.by = new_df$Date1)
As a reproducible example
> str1 <- "Saturday, 12 Apr 2014, 18:00"
> as.Date(str1, "%A, %d %b %Y, %H:%M")
[1] "2014-04-12"
> as.POSIXct(str1, format = "%A, %d %b %Y, %H:%M")
[1] "2014-04-12 18:00:00 EDT"
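The second part of the question (a shared index for identical dates) can be handled after sorting. The following is only a sketch under assumptions: the sample rows are made up, DateId is a hypothetical column name, and the locale is switched to "C" so that the English day and month names parse.

```r
# Made-up rows in the question's format (not the asker's real data)
new_df <- data.frame(
  Date = c("Sunday, 13 Apr 2014, 09:30",
           "Saturday, 12 Apr 2014, 18:00",
           "Sunday, 13 Apr 2014, 09:30"),
  Income = c(30, 10, 20),
  stringsAsFactors = FALSE
)

# Use the "C" locale so English weekday/month names are recognised
orig_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
new_df$Date1 <- as.POSIXct(new_df$Date, format = "%A, %d %b %Y, %H:%M", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", orig_locale))

# Sort chronologically, then number the distinct timestamps 1, 2, ...
new_df <- new_df[order(new_df$Date1), ]
new_df$DateId <- match(new_df$Date1, unique(new_df$Date1))
```

Rows that share a timestamp get the same DateId, so the column can serve as a grouping key even though it is not unique.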
Related
I need to calculate the time difference in minutes, hours, days, etc. between two date-time columns of two dataframes. The details are below:
df1 <- data.frame (Name = c("Aks","Bob","Caty","David"),
timestamp = c("Mon Apr 1 14:23:09 1980", "Sun Jun 12 12:10:21 1975", "Fri Jan 5 18:45:10 1985", "Thu Feb 19 02:26:19 1990"))
df2 <- data.frame (Name = c("Aks","Bob","Caty","David"),
timestamp = c("Apr-01-1980 14:28:00","Jun-12-1975 12:45:10","Jan-05-1985 17:50:30","Feb-19-1990 02:28:00"))
I am having trouble converting df1$timestamp and df2$timestamp: POSIXct and as.Date are not working, and I get the error "non-numeric argument to binary operator".
I need to calculate the time difference in minutes, hours or days.
One approach is strptime and indicate the appropriate directives in the datetime format:
df1$timestamp2 <- strptime(df1$timestamp, "%a %b %d %H:%M:%S %Y")
df2$timestamp2 <- strptime(df2$timestamp, "%b-%d-%Y %H:%M:%S")
In this case, you have:
%a abbreviated weekday name
%b abbreviated month name
%d day of the month
%H hour, 24-hour clock
%M minute
%S second
%Y year including century
Then you can use difftime to get the difference, and specify the units (in this case, difference expressed in hours):
difftime(df1$timestamp2, df2$timestamp2, units = "hours")
Output
Time differences in hours
[1] -0.08083333 -0.58027778 0.91111111 -0.02805556
If the locale setting prevents the day and month names from being read correctly, try:
# Store current locale
orig_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
# Convert to posix-timestamp
df1$timestamp <- as.POSIXct( df1$timestamp, format = "%a %b %d %H:%M:%S %Y")
df2$timestamp <- as.POSIXct( df2$timestamp, format = "%b-%d-%Y %H:%M:%S")
# Restore locale
Sys.setlocale("LC_TIME", orig_locale)
# Calculate difference
df2$timestamp - df1$timestamp
# Time differences in mins
# [1] 4.850000 34.816667 -54.666667 1.683333
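Plain subtraction lets R pick the units automatically. As a small sketch with one pair of values taken from the data above (the names t1, t2 and d are just for illustration), difftime fixes the units explicitly and as.numeric turns the result into an ordinary number:

```r
# One timestamp pair from the example, already in ISO format
t1 <- as.POSIXct("1980-04-01 14:23:09", tz = "UTC")
t2 <- as.POSIXct("1980-04-01 14:28:00", tz = "UTC")

# Ask for minutes explicitly instead of relying on the automatic choice
d <- difftime(t2, t1, units = "mins")

diff_mins <- as.numeric(d)                  # 4.85
diff_secs <- as.numeric(d, units = "secs")  # 291
```

Converting to numeric is useful when the difference is stored in a column and later compared or summed, since the units are then no longer attached to the values.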
I imported Excel data into R and I have a problem converting the dates.
In R, my data are character strings and look like this:
date<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
I would like to convert these character strings into dates (MM/YYYY), but the '00' used for the day causes a problem and NA is returned every time.
It works when I manually replace '00' with '01' and then use as.yearmon, ymd and format. But I have lots of dates to change and I don't know how to change all my '00' into '01' in R.
# example data
date1<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
# removing time -> doesn't work because of the '00' day
date1c<-format(strptime(date1, format = "%Y-%m-%d"), "%Y/%m/%d")
date1c<-format(strptime(date1, format = '%Y-%m'), '%Y/%m')
# trying to convert character into date -> doesn't work either
date1c<-ymd(date1)
date1c<-strptime(date1, format = "%Y-%m-%d %H:%M:%S")
date1c<-as.Date(date1, format="%Y-%m-%d %H:%M:%S")
date1c<-as.yearmon(date1, format='%Y%m')
# everything works if days are '01'
date2<-c('1971-02-01 00:00:00', '1979-06-01 00:00:00')
date2c<-as.yearmon(ymd(format(strptime(date2, format = "%Y-%m-%d"), "%Y/%m/%d")))
date2c
If you know how to do this, or have another idea for solving my problem, I would be thankful!
Use gsub to replace -00 with -01.
date1<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
date1 <- gsub("-00", "-01", date1)
date1c <-format(strptime(date1, format = "%Y-%m-%d"), "%Y/%m/%d")
> date1c
[1] "1971/02/01" "1979/06/01"
Another possibility could be:
as.Date(paste0(substr(date1, 1, 9), "1"), format = "%Y-%m-%d")
[1] "1971-02-01" "1979-06-01"
Here it extracts the first nine characters, pastes them together with "1" and then converts the result into a Date object.
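If the column might also contain complete dates, a slightly stricter pattern limits the replacement to the day field. This is a sketch, not a tested solution for the real data; the third value and the name fixed are made up for illustration.

```r
# Two values from the question plus one made-up complete date
date1 <- c("1971-02-00 00:00:00", "1979-06-00 00:00:00", "2000-11-15 00:00:00")

# Anchor at the start: only a "00" right after YYYY-MM becomes "01"
fixed <- sub("^([0-9]{4}-[0-9]{2})-00", "\\1-01", date1)

# as.Date reads the leading YYYY-MM-DD and ignores the trailing time
res <- as.Date(fixed)
```

Because the pattern is anchored to the start of the string, a value such as "2000-11-15 00:00:00" is left unchanged.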
These alternatives each accept a vector input and produce a vector as output.
Date output
These all will accept a vector as input and produce a Date vector as the output.
# 1. replace first occurrence of '00 ' with '01 ' and then convert to Date
as.Date(sub("00 ", "01 ", date1))
## [1] "1971-02-01" "1979-06-01"
# 2. convert to yearmon class and then to Date
library(zoo)
as.Date(as.yearmon(date1, "%Y-%m"))
## [1] "1971-02-01" "1979-06-01"
# 3. insert a 1 and then convert to Date
as.Date(paste(1, date1), "%d %Y-%m")
## [1] "1971-02-01" "1979-06-01"
yearmon output
Note that if you really are trying to represent just months and years then yearmon class directly represents such objects without the kludge of using an unused day of the month. Such objects are internally represented as a year plus a fraction of a year, i.e. year + 0 for January, year + 1/12 for February, etc. They display in a meaningful way, they sort in the expected manner and can be manipulated, e.g. take the difference between two such objects or add 1/12 to get the next month, etc. As with the others it takes a vector in and produces a vector out.
library(zoo)
as.yearmon(date1, "%Y-%m")
## [1] "Feb 1971" "Jun 1979"
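The internal representation and the arithmetic described above can be checked directly. This is a small sketch that assumes the zoo package is installed; the variable names are only for illustration.

```r
library(zoo)

date1 <- c("1971-02-00 00:00:00", "1979-06-00 00:00:00")
ym <- as.yearmon(date1, "%Y-%m")

# Stored as year + (month - 1) / 12, so February 1971 is 1971 + 1/12
internal <- as.numeric(ym)

# Adding 1/12 moves one month forward
next_month <- format(ym + 1/12, "%Y-%m")

# The difference of two yearmon values is in years; times 12 gives months
months_between <- round(12 * as.numeric(ym[2] - ym[1]))
```

Here months_between counts the months from February 1971 to June 1979, which is 100.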
character output
If you want character output rather than Date or yearmon output then these variations work and again accept a vector as input and produce a vector as output:
# 1. replace -00 and everything after that with a string having 0 characters
sub("-00.*", "", date1)
## [1] "1971-02" "1979-06"
# 2. convert to yearmon and then format that
library(zoo)
format(as.yearmon(date1, "%Y-%m"), "%Y-%m")
## [1] "1971-02" "1979-06"
# 3. convert to Date class and then format that
format(as.Date(paste(1, date1), "%d %Y-%m"), "%Y-%m")
## [1] "1971-02" "1979-06"
# 4. pick off the first 7 characters
substring(date1, 1, 7)
## [1] "1971-02" "1979-06"
I am trying to convert the following character value to a date:
January 2016
I want to convert it to the following format:
201601
I am using the following code:
df$date <- as.Date(df$date,"%B %Y")
But it returns NA values. I have even set the locale as follows:
lct<- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME",lct)
but it still gives me NA values. How can I fix this?
We can do this easily with as.yearmon and format
library(zoo)
format(as.yearmon(str1), "%Y%m")
#[1] "201601"
If we are going by the as.Date route, then 'Date' requires day also, so, paste a day and then use format after converting to 'Date'
format(as.Date(paste(str1, '01'), "%B %Y %d") , "%Y%m")
data
str1 <- "January 2016"
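If the locale is what stands in the way (month names in %B are parsed according to LC_TIME), one way around it is to skip date parsing altogether. The sketch below is an alternative to the answers above, not a replacement for them: it assumes every value has the form "<English month name> <year>" and looks the month up in base R's month.name constant.

```r
# Example input; the second value is made up for illustration
x <- c("January 2016", "March 2017")

# Split "Month Year" into its two parts
parts <- strsplit(x, " ")
month_txt <- sapply(parts, `[`, 1)
year_txt  <- sapply(parts, `[`, 2)

# month.name is always English, so this does not depend on the locale
month_num <- match(month_txt, month.name)

# Zero-pad the month and glue it to the year
yyyymm <- sprintf("%s%02d", year_txt, month_num)
```

A misspelled or non-English month name yields NA in month_num, which makes bad input easy to spot.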
I have a set of dates covering roughly 10 years, from April 2006 to August 2016, i.e. 125 months. I want to identify each month with a sequential number from 1 to 125 in a corresponding new column.
Example:
All dates in Apr'2006 will be identified as 1...May'2006 as 2 ...... Aug'2016 as 125.
Dates in the data set is in format type.
Requesting guidance on how to achieve this.
Assume that you start with a vector of dates in factor format:
x<- as.factor(c("8/7/2006", "12/13/2006", "12/14/2006"))
First you should convert this vector to Date format. In your case this can be done like this
x<- as.Date(x, format= "%m/%d/%Y")
Using the format command you can delete the day of a specific date:
format(x, "%Y %m")
> "2006 08" "2006 12" "2006 12"
This way you get rid of the day and just keep year and month.
Next you define a reference vector which contains all months from April 2006 to August 2016. Use by = "month" so that every element is the first day of a month; spacing the dates evenly with length.out = 125 gives steps of about 30.4 days, which skips some months and repeats others:
ref<- seq(from= as.Date("04/01/2006", format= "%m/%d/%Y"), to= as.Date("08/01/2016", format= "%m/%d/%Y"), by = "month")
ref<- format(ref, "%Y %m")
Finally you compare the entries from x with the entries from ref. This can be done with the sapply function, which applies a function to each component of x. Here, the function it applies is:
myfun<-function(z) {
which(ref == format(z, "%Y %m"))
}
But since you do not need the function myfun elsewhere, you can plug it directly into the sapply call. Wrapping the call in unlist makes sure you get a plain vector.
unlist(sapply(x, function(z) which(ref == format(z, "%Y %m"))))
> 5 9 9
should do the trick.
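The month number can also be computed arithmetically, without a lookup vector. This is a minimal sketch under the question's own convention that April 2006 is month 1; the sample dates and the name month_id are made up for illustration.

```r
# Sample dates in the same m/d/Y format
x <- as.Date(c("4/15/2006", "5/1/2006", "8/28/2016"), format = "%m/%d/%Y")

# Pull out year and month as integers
yr <- as.integer(format(x, "%Y"))
mo <- as.integer(format(x, "%m"))

# Months elapsed since April 2006, plus 1 so that April 2006 is month 1
month_id <- (yr - 2006L) * 12L + (mo - 4L) + 1L
```

For these dates month_id is 1, 2 and 125, matching the numbering asked for in the question. Dates outside the range simply get numbers below 1 or above 125 instead of failing to match.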
Using lubridate to format the dates:
library(lubridate)
# Create a data frame from the string below, as a factor variable
dat <- '8/7/2006 12/13/2006 12/14/2006 12/15/2006 12/16/2006 8/28/2007 8/29/2007 4/22/2008 4/23/2008 4/24/2008 4/25/2008 4/28/2008 4/29/2008 4/30/2008 5/1/2008 5/2/2008 5/7/2016 5/7/2016 5/7/2016 5/7/2016 6/26/2016 7/4/2016 7/31/2016 8/28/2016'
test_df <- data.frame(original=as.factor(strsplit(dat, ' ')[[1]]))
# We will need to convert the dates to strings in the right format
test_df$converted_string <- as.character(floor_date(mdy(test_df$original), unit="month"))
# Create a lookup table
my_months <- seq(125)
names(my_months) <- seq(as.Date('2006-04-01'), by='month', length.out=125)
# Do the lookup
test_df$converted_int <- my_months[test_df$converted_string]
I am not getting the right conversion when I try to convert 12-hour times to 24-hour times. My script (with sample data) is below.
library(lubridate)
library(data.table)
# Read Sample Data
df <- data.frame(c("April 22 2016 10:49:15 AM","April 22 2016 10:01:21 AM","April 22 2016 09:06:40 AM","April 21 2016 09:50:49 PM","April 21 2016 06:07:18 PM"))
colnames(df) <- c("Date") # Set Column name
dt <- setDT(df) # Convert to data.table
ff <- function(x) as.POSIXlt(strptime(x,"%B %d %Y %H:%M:%S %p"))
dt[,dates := as.Date(ff(Date))]
When I try creating a new variable called TOD, I get the output in H:M:S format, but it is not converted to 24-hour format. What I mean is that for the 4th row, instead of getting 21:50:49 I get 09:50:49. I tried two different ways to do this: one uses as.ITime from data.table and the other uses strptime. The code I use to calculate TOD is below.
dt[,TOD1 := as.ITime(ff(Date))]
dt$TOD2 <- format(strptime(dt$Date, "%B %d %Y %H:%M:%S %p"), format="%I:%M:%S")
I thought of trying it using dataframe instead of data.table to eliminate any issues with using strptime in data.table and still got the same answer.
df$TOD <- format(strptime(df$Date, "%B %d %Y %H:%M:%S %p"), format="%I:%M:%S") # Using dataframe instead of data.table
Any insights on how to get the right answer?
As @lmo commented, you need to use the %I conversion specification instead of %H when parsing. From ?strptime:
%H
Hours as decimal number (00–23). As a special exception strings
such as 24:00:00 are accepted for input, since ISO 8601 allows these.
%I
Hours as decimal number (01–12).
strptime("April 21 2016 09:50:49 PM", "%B %d %Y %I:%M:%S %p")
# [1] "2016-04-21 21:50:49 EDT"
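Putting the two steps together: parse with %I and %p, then format with %H to get the 24-hour time of day. This is a sketch that assumes English month names and AM/PM markers, so it switches LC_TIME to "C" while parsing; the third value is made up to show the midnight case.

```r
# Two values from the question plus a made-up time just after midnight
x <- c("April 22 2016 10:49:15 AM",
       "April 21 2016 09:50:49 PM",
       "April 21 2016 12:05:00 AM")

# Use the "C" locale so "April" and "AM"/"PM" are recognised
orig_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
parsed <- as.POSIXct(x, format = "%B %d %Y %I:%M:%S %p", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", orig_locale))

# %H writes the hour on the 24-hour clock
tod <- format(parsed, "%H:%M:%S")
```

Note that the output format also has to use %H; formatting with %I, as in the question, writes the hour on the 12-hour clock again.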
Here you go:
library(lubridate)
df$Date <- mdy_hms(df$Date)
Note that while mdy_hms is extremely convenient and takes care of the 12 / 24 hour time for you, it will automatically assign UTC as a time zone. You can specify a different one if you need. You can then convert df to a data.table if you like.