I need to sort a data frame by date in R. The dates are all in the form of "dd/mm/yyyy". The dates are in the 3rd column. The column header is V3. I have seen how to sort a data frame by column and I have seen how to convert the string into a date value. I can't combine the two in order to sort the data frame by date.
Assuming your data frame is named d,
d[order(as.Date(d$V3, format="%d/%m/%Y")),]
Read my blog post, Sorting a data frame by the contents of a column, if that doesn't make sense.
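A small worked example may make the difference clearer. This is only a sketch on a made-up data frame (the values in d are invented for illustration): it contrasts ordering by the raw strings with ordering by the parsed dates.

```r
# Hypothetical data frame with dd/mm/yyyy strings in column V3
d <- data.frame(
  V1 = c("a", "b", "c"),
  V2 = c(10, 20, 30),
  V3 = c("15/03/2012", "01/12/2011", "28/02/2012"),
  stringsAsFactors = FALSE
)

# Ordering the raw strings compares characters, so the day comes first
by_string <- d[order(d$V3), ]

# Parsing to Date first gives chronological order
by_date <- d[order(as.Date(d$V3, format = "%d/%m/%Y")), ]

print(by_date)
```

Here by_string puts 15/03/2012 before 28/02/2012, while by_date lists the rows from December 2011 to March 2012.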
Nowadays, the most convenient approach is to use the lubridate and dplyr packages.
lubridate contains a number of functions that make parsing dates into POSIXct or Date objects easy. Here we use dmy which automatically parses dates in Day, Month, Year formats. Once your data is in a date format, you can sort it with dplyr::arrange (or any other ordering function) as desired:
d$V3 <- lubridate::dmy(d$V3)
dplyr::arrange(d, V3)
If you want to sort dates in descending order, the minus sign doesn't work with Dates.
However, you can get the same effect with a general-purpose function: rev(). Combine rev and order like this:
#init data
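DF <- data.frame(ID=c('ID3', 'ID2','ID1'), end=c('4/1/09 12:00', '6/1/10 14:20', '1/1/11 11:10'))
#change order (the format is an assumption: day/month/two-digit year; without one, as.Date tries year/month/day)
out <- DF[rev(order(as.Date(DF$end, format="%d/%m/%y"))),]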
Hope it helped.
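For comparison, here is a sketch of two other base R ways to get descending order. It reuses the example data above and assumes the strings are day/month/two-digit-year, which the original does not state.

```r
DF <- data.frame(
  ID  = c("ID3", "ID2", "ID1"),
  end = c("4/1/09 12:00", "6/1/10 14:20", "1/1/11 11:10"),
  stringsAsFactors = FALSE
)

# Assumption: day/month/two-digit year; the trailing time is ignored by as.Date
end_date <- as.Date(DF$end, format = "%d/%m/%y")

# Option 1: let order() do the reversal
out_decreasing <- DF[order(end_date, decreasing = TRUE), ]

# Option 2: a Date is a number of days underneath, so negate that number
out_negated <- DF[order(-as.numeric(end_date)), ]

print(out_decreasing)
```

Unlike rev(order(...)), both options leave tied dates in their original relative order.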
You can use order() to sort date data.
# Sort date ascending order
d[order(as.Date(d$V3, format = "%d/%m/%Y")),]
# Sort date descending order
d[rev(order(as.Date(d$V3, format = "%d/%m/%Y"))),]
Hope this helps,
Link to my quora answer https://qr.ae/TWngCe
Thanks
If the rows are already in date order and you just want to flip them (for example, from newest-first to oldest-first) in R, you can always do:
dataframe <- dataframe[nrow(dataframe):1,]
It has saved me from exporting to and from Excel just to sort Yahoo Finance data.
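Reversing the rows only gives a correct result when the data is already sorted newest-first. A minimal sketch of a guard, on an invented prices data frame, that falls back to a real sort otherwise:

```r
# Hypothetical data, newest row first
prices <- data.frame(
  date  = as.Date(c("2020-01-03", "2020-01-02", "2020-01-01")),
  close = c(103, 102, 101)
)

# TRUE when the dates, read bottom to top, are in ascending order
newest_first <- !is.unsorted(rev(as.numeric(prices$date)))

if (newest_first) {
  sorted <- prices[nrow(prices):1, ]     # cheap flip
} else {
  sorted <- prices[order(prices$date), ] # real sort
}

print(sorted)
```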
The only way I found to sort by time of day as well, given a US-format source (mm/dd/yyyy HH:MM:SS AM/PM):
df_dataSet$time <- as.POSIXct( df_dataSet$time , format = "%m/%d/%Y %I:%M:%S %p" , tz = "GMT")
class(df_dataSet$time)
df_dataSet <- df_dataSet[order(df_dataSet$time), ]
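A self-contained sketch of the same idea on invented timestamps. Note that %p matches the AM/PM indicator of the current locale, so this assumes an English or C locale.

```r
# Hypothetical US-style timestamps with a 12-hour clock
raw <- data.frame(
  time = c("03/15/2012 02:30:00 PM",
           "03/15/2012 09:05:00 AM",
           "03/14/2012 11:59:59 PM"),
  stringsAsFactors = FALSE
)

# %I is the 12-hour clock hour, %p the AM/PM indicator
raw$time <- as.POSIXct(raw$time, format = "%m/%d/%Y %I:%M:%S %p", tz = "GMT")

# Order by the parsed column only
sorted <- raw[order(raw$time), , drop = FALSE]
sorted_labels <- format(sorted$time, "%Y-%m-%d %H:%M")
print(sorted_labels)
```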
You could also use arrange from the dplyr library.
The following snippet will modify your original date string to a date object, and order by it. This is a good approach, as you store a date as a date, not just a string of characters.
library(dplyr)
dates <- dates %>%
  mutate(date = as.Date(date, "%d/%m/%Y")) %>%
  arrange(date)
If you want to leave the column as a string and only order by the parsed date (usually an inferior option), you can do this:
dates <- dates %>%
  arrange(as.Date(date, "%d/%m/%Y"))
If you have a dataset named daily_data:
daily_data <- daily_data[order(as.Date(daily_data$date, format="%d/%m/%Y")),]
Related
I have a csv file with a column values like "20140929120000" which gives the date and time.
After importing it to R, I want to format this as a date variable while keeping the time part as well(if that is possible).
So, the output should be a date variable '2014-09-29'.
How would I get the time part as a separate column with value "12:00:00"?
How about:
install.packages('lubridate')
library(lubridate)
library(dplyr)
y <- as.numeric(20140929120000)
df %>%
  mutate(Date = as.Date(ymd_hms(y), tz = Sys.timezone()),
         Time = format(lubridate::ymd_hms(y), "%H:%M:%S"))
Just change the y in the mutate to your date column.
We can use the POSIXct type here:
val <- "20140929120000"
mask <- "%Y%m%d%H%M%S"
as.POSIXct(strptime(val, mask))
[1] "2014-09-29 12:00:00 UTC"
To see the various components of the timestamp, try:
unclass(strptime(val, mask))
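To get the date and the time as two separate columns, as the question asks, the POSIXct value can be split with as.Date and format. A minimal base R sketch (the second timestamp is invented, and the time zone is assumed to be UTC):

```r
val  <- c("20140929120000", "20141001083015")
mask <- "%Y%m%d%H%M%S"

# Parse once, with an explicit time zone so the result is reproducible
stamp <- as.POSIXct(val, format = mask, tz = "UTC")

parts <- data.frame(
  Date = as.Date(stamp),            # Date class
  Time = format(stamp, "%H:%M:%S"), # character, e.g. "12:00:00"
  stringsAsFactors = FALSE
)
print(parts)
```

The Time column is plain text; base R has no dedicated time-of-day class.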
My current setup
How do I filter the end_time column for data only after 12/01/2018 and then sum these data after this date?
Below is what I have already tried.
setwd("/Users/jackbell/Desktop")
bookings<- read.csv("bookings_data_data_analyst_test.csv", header= TRUE)
end_time<- bookings %>%select(end_time)
end_time
new_date <- filter(end_time< as.Date("12/01/2018"))
We need to convert the column to Date class. Based on the image and the OP's code, 'end_time' is the column name, and there is also an object with the same name. The last step is incorrect because filter was never given the data object ('end_time') to work on. Secondly, the dates are in day/month/year format. By default, as.Date only parses strings in year-month-day order (YYYY-MM-DD or YYYY/MM/DD). For any other format, specify the format argument:
library(tidyverse)
library(lubridate)  # provides dmy(); not attached by older versions of tidyverse
end_time %>%
filter(dmy(end_time) < dmy("12/01/2018"))
In the above code, we used dmy from lubridate package. If we use as.Date, it would be
end_time %>%
filter(as.Date(end_time, format = "%d/%m/%Y") < as.Date("2018-01-12"))
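The question also asks for rows after the cutoff and for a sum, which the snippets above do not cover. A base R sketch under stated assumptions: the bookings data and the amount column are invented, since the original does not say which column should be summed.

```r
# Hypothetical bookings data; 'amount' is an assumed column name
bookings <- data.frame(
  end_time = c("05/01/2018", "12/01/2018", "20/01/2018", "03/02/2018"),
  amount   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

end_date <- as.Date(bookings$end_time, format = "%d/%m/%Y")
cutoff   <- as.Date("2018-01-12")  # 12 January 2018

# Strictly after the cutoff; which() drops rows whose date failed to parse
after_cutoff <- bookings[which(end_date > cutoff), ]
total_after  <- sum(after_cutoff$amount)
print(total_after)
```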
How can I transform a value from Factor to time ? I've tried using lubridate package but had no success.
I have a dataframe with a column "time" with 08:00:00 like values. Then used
phsb1 <- phsb %>%
dplyr::mutate(time = lubridate::hm(time))
which resulted in a class with 6 slots
data year month day hour and minutes
Any help to be able to obtain 08:00 like values would be very much appreciated.
Any further information or advice on how to handle "time" would be fantastic. I've found a lot about "dates" but almost nothing related to "time".
I think OP wants to convert a column containing data in H:M:S format to H:M format as character.
Option #1: Simply get substring containing part of hour and min using sub as:
library(dplyr)
phsb1 <- phsb %>%
mutate(time = sub("(\\d{2}:\\d{2}):\\d{2}","\\1", as.character(time)))
Option #2: Use parse_date_time from lubridate as.
library(lubridate)
library(dplyr)
phsb1 <- phsb %>%
mutate(time = format(parse_date_time(as.character(time), "HMS"), format = "%H:%M"))
#Example
format(parse_date_time("08:05:00", "HMS"), format = "%H:%M")
#"08:05"
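For working with times of day beyond display, one option in base R is to keep a numeric "minutes since midnight" value next to the H:M label. This is only a sketch on invented values, and it assumes the times are valid H:M:S strings (or a factor of them).

```r
# Hypothetical factor column, as in the question
time_fct <- factor(c("08:00:00", "13:45:30"))

# Convert the factor to character before parsing
parsed <- strptime(as.character(time_fct), format = "%H:%M:%S", tz = "UTC")

hhmm    <- format(parsed, "%H:%M")        # labels for display
minutes <- parsed$hour * 60 + parsed$min  # numeric, easy to sort or subtract

print(data.frame(hhmm, minutes))
```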
I have a dataframe of dates and numeric values in R. The dates are all the first of the month and the values are a number associated with that month
library(DT)
library(dplyr)
df <- data.frame(date = as.Date(c("2017-01-01","2017-02-01","2017-03-01","2017-04-01")),
val = c(-5600,7000,4200,-2000))
I'd like to stick this through DT::datatable(), which is my new favourite thing. However, I'd like to have the output formatted nicely, thousand separators, nice dates etc.
df <- df %>% mutate(val = formatC(val, big.mark=","))
datatable(df)
This turns val into a character vector, although datatable() is apparently able to recognise that it's really a number and sort appropriately using the arrows in the header. So far so good.
However the issue comes when I try to format the date as MMM YY.
df <- df %>% mutate(date = format(date, "%b %y"))
datatable(df)
This turns date into a character vector as well - the values look like "Jan 17" etc. Everything looks fine, only trouble is when I go to sort by date, it doesn't recognise the values as months and puts them in alphabetical rather than chronological order.
Is there any way of reformatting the dates, either prior to or whilst passing them to datatable(), to keep the "date-ness" of the variable and allow it to be sorted appropriately? Failing that, is there another package that outputs interactive tables and is better at sorting?
Thanks in advance,
James
you can take help of lubridate package.
And do the stuff using this function.
What you need to do is take the month and the year into account separately.
library(lubridate)
date_conversion<-function(df){
months<-month(df$date,label = T)
years<-year(df$date)
months_years<-paste(months, years, sep = " ")
df[1]<-months_years
df[order(row.names(df),decreasing = F),]
}
hope this helps you .... :)
DataTables as integrated in R by the DT package has options to format numeric and date variables while maintaining the proper sort order.
Below, I will discuss three different options:
library(DT)
df <- data.frame(date = as.Date(c("2017-01-01","2017-02-01","2017-03-01","2017-04-01")),
val = c(-5600,7000,4200,-12000))
Please note that I've deliberately chosen to change the last value in column val to demonstrate a pitfall in using formatC().
# OP's own formatting
df$val_chr <- formatC(df$val, big.mark=",")
df$date_chr <- format(df$date, "%b %y")
# copy columns to demonstrate DT formatting
df$val_dt <- df$val
df$date_dt <- df$date
# ISO 8601 year-month format as alternative
df$date_iso <- format(df$date, "%Y-%m")
# create DT object and apply DT formatting
datatable(df) %>% formatCurrency("val_dt", "") %>% formatDate("date_dt", "toDateString")
Note that val_dt has been formatted nicely as expected and is right-justified. In contrast, val_chr is left-justified, with the thousands separators not aligned. In addition, formatC() has recognized that val is of type double and has used the "g" format by default. According to the description of the format parameter in ?formatC: Default is "d" for integers, "g" for reals. So, we do get
formatC(12000L, big.mark=",")
#[1] "12,000"
but
formatC(12000, big.mark=",")
#[1] "1.2e+04"
Sorting by date_dt within the datatables object by clicking on the small arrows symbols at the right side of the column headers works as expected in contrast to date_chr. Unfortunately, the number of available methods for formatDate() is limited and doesn't include the desired month-year format. (There is a datetime plugin which converts date / time source data into one suitable for display but I haven't explored that in detail.)
Column date_iso shows the abbreviated ISO 8601 format YYYY-MM as a third option. This is my favoured format (which I also use a lot for aggregating by month) because
it always sorts correctly, even for several years,
it doesn't depend on the current locale, so it works in any language,
it is short while being unambiguous,
and it is an international standard.
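The first point in the list can be checked without DT at all. The sketch below uses invented dates spanning two years and shows that plain text sorting of the YYYY-MM labels matches chronological order. As an aside, it also shows that passing format = "d" to formatC() avoids the scientific notation pitfall mentioned earlier.

```r
# Hypothetical dates spanning a year boundary
dates <- as.Date(c("2016-12-01", "2017-01-01", "2017-02-01", "2017-04-01"))

# ISO 8601 year-month labels are plain text
iso <- format(dates, "%Y-%m")

# Sorting the text from a scrambled start recovers chronological order
iso_sorted <- sort(rev(iso))

# Forcing the integer format keeps the thousands separator for doubles
val_chr <- formatC(c(-5600, 7000, 4200, -12000), format = "d", big.mark = ",")

print(iso_sorted)
print(val_chr)
```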
Addendum
The formattable package does also have various formatter functions and can create DataTables:
library(formattable)
as.datatable(formattable(df))
I have a problem. I downloaded data and transformed the dates into POSIXlt format
df<-read.csv("007.csv", header=T, sep=";")
df$transaction_date<-strptime(df$transaction_date, "%d.%m.%Y")
df$install_date<-strptime(df$install_date, "%d.%m.%Y")
df$days<- as.numeric(difftime(df$transaction_date,df$install_date, units = "days"))
The data frame is about transactions in an online game. It contains value (the payment), transaction_date, install_date and ID. I added a new column, which shows days after installation. I tried to summarise the data using dplyr
df2<-df %>%
group_by(days) %>%
summarise(sum=sum(value))
And I've got an error:
Error: column 'transaction_date' has unsupported type : POSIXlt, POSIXt
How can I fix it?
UPD. I changed the classes of the date columns to character. It solved the problem. But can I use dplyr without changing classes in my dataset?
You could use as.POSIXct as recommended in the comments but if the hours, minutes, and seconds don't matter then you should just use as.Date
df <- read.csv("007.csv", header=T, sep=";")
df2 <- df %>%
mutate(
transaction_date = as.Date(transaction_date, "%d.%m.%Y")
,install_date = as.Date(install_date, "%d.%m.%Y")
) %>%
group_by(days = transaction_date - install_date) %>%
summarise(sum=sum(value))
As noted here, this is a "feature" of the tidyverse. They don't want to handle POSIXlt object because it is some kind of list within a vector. However, using as.POSIXct isn't always an option. In my case I really needed the POSIXlt class to handle some uncleaned data. In that case, just go back to good old stable base R. In your case:
df2 <- aggregate(df$value, by=list(df$days), sum)
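For reference, here is a base R sketch of the whole pipeline on invented data (the three rows below are made up, and the column names follow the question). It uses Date columns rather than POSIXlt and the formula interface of aggregate().

```r
# Hypothetical transactions in dd.mm.yyyy format
df <- data.frame(
  transaction_date = c("03.01.2018", "05.01.2018", "05.01.2018"),
  install_date     = c("01.01.2018", "03.01.2018", "01.01.2018"),
  value            = c(5, 7, 2),
  stringsAsFactors = FALSE
)

df$transaction_date <- as.Date(df$transaction_date, "%d.%m.%Y")
df$install_date     <- as.Date(df$install_date, "%d.%m.%Y")

# Subtracting two Dates gives a difftime in days
df$days <- as.numeric(df$transaction_date - df$install_date)

# Sum of value for each number of days since install
totals <- aggregate(value ~ days, data = df, FUN = sum)
print(totals)
```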
One trick I use often is the following:
1. Convert POSIXt columns (eventDate in the example below) to character.
2. Perform the dplyr operations you need (in the example below we bind the rows of two data frames).
3. Convert back from character to POSIXt, not forgetting to set the right format (format) and timezone (tz) as they were before step 1.
Example:
# step 1
df1$eventDate <- as.character.POSIXt(df1$eventDate)
df2$eventDate <- as.character.POSIXt(df2$eventDate)
#step 2
merged_df <- bind_rows(df1, df2)
#step 3
merged_df$eventDate <- strptime(merged_df$eventDate, format = "%Y-%m-%d", tz = "UTC")