Goal: Plot a time series.
Problem: The x-axis data is stored as character, and I'm having trouble converting it to a date.
new.df <- df %>%
group_by(Month, Year) %>%
summarise(n = n())
new.df <- new.df %>%
unite(Date, Month, Year, sep = "/") %>%
mutate(Total = cumsum(n))
So, I end up with a data frame looking like this:
Date n Group Total
8/2010 1 1 1
9/2010 414 1 415
etc
I'm trying to convert the Date column into a Date format. The column is of class character. So, I tried doing
new.df$Date <- as.Date(New.Patients$Date, "%m/%Y")
However, when I do that, it replaces the entire Date column with NAs.
I'm not sure if this is because my single-digit month dates do not have 0's in front or not. I did the unite() function just because I thought it may make it easier, but it might not.
I originally created the Year/Month variable with the lubridate package but I wasn't sure I could incorporate that here. Bonus points if someone can show me how.
I would appreciate any help or guidance. I'm sure it's not that hard; I am just having a major brain fart at the moment.
You can try like this:
library(zoo) # for yearmon
new.df$Date <- as.yearmon(New.Patients$Date, format="%m/%Y")
But if you really need it to be a Date, then I guess you have to define a day (e.g. 01), as @lukeA suggested in the comments.
My issue, as pointed out by lukeA in the comments, is that the as.Date function requires a day to be somewhere within the character string.
Therefore, simply pasting "01" (or, I think, virtually any other valid two-digit day) to the front of each date fixed the issue.
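A minimal sketch of that fix in base R, using made-up month/year strings shaped like those in the question (the vector name month_year is only for illustration):

```r
# Hypothetical month/year strings in the same "m/YYYY" shape as the Date column
month_year <- c("8/2010", "9/2010", "10/2011")

# as.Date() needs a day, so prepend "01/" and include %d in the format
full_dates <- as.Date(paste0("01/", month_year), format = "%d/%m/%Y")

print(full_dates)
```

Single-digit months parse without a leading zero here, so they are unlikely to be the cause of the NAs.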
I'm writing an r program which lists sales prices for various items. I have a column called InvoiceDate, which lists date and time as follows: '12/1/2009 7:45'. I'm trying to isolate the date only in a separate field called date, and then arrange the dates sequentially. The code I'm using is as follows:
library(dplyr)
library(ggplot2)
setwd("C:/Users/cshor/OneDrive/Environment/Restoration_Ecology/Udemy/Stat_Thinking_&_Data_Sci_with_R/Assignments/Sect_5")
retail_clean <- read.csv("C:/Users/cshor/OneDrive/Environment/Restoration_Ecology/Udemy/Stat_Thinking_&_Data_Sci_with_R/Data/retail_clean.csv")
retail_clean$date <- as.Date(retail_clean$InvoiceDate)#, format = "%d/%m/%Y")
total_sales = sum(retail_clean$Quantity, na.rm=TRUE) %>%
arrange(retail_clean$date) %>% ggplot(aes(x=date, y=total_sales)) + geom_line()
Initially, everything works fine, and the date field is created. However, I get the following error for the arrange() function:
Error in UseMethod("arrange") : no applicable method for 'arrange' applied to an object of class "c('integer', 'numeric')"
I've searched for over a week for a solution to this problem, but have found nothing that specifically addresses this issue. I've also used as.POSIXct instead of as.Date, with similar results. Any help as to why the program interprets Date data as numeric, and how I can correct the problem, would be greatly appreciated.
First, the error message is not about dates or times.
Let's look at the code you provided:
total_sales = sum(retail_clean$Quantity, na.rm=TRUE) %>%
arrange(retail_clean$date) %>% ggplot(aes(x=date, y=total_sales)) + geom_line()
The result of the expression sum(retail_clean$Quantity, na.rm=TRUE) is an integer in your case, and it is piped into the first argument of the dplyr::arrange function, which calls UseMethod("arrange").
Then, the piped argument is inspected and found to be an object of class integer and numeric, and arrange does not have a method for these classes; that is, neither arrange.integer nor arrange.numeric is defined. Hence the error message. There is nothing wrong with your date conversion, except that you do need the format term you commented out in the code sample.
The solution is also simple. Change sum to something that returns a data.frame or other classes that arrange is aware of. You can check what methods are available for arrange:
> methods(dplyr::arrange)
[1] arrange.data.frame*
In this R instance, you can only put a data.frame object through arrange, but you can always define specific methods for other classes.
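To see the dispatch mechanism in isolation, here is a small base R sketch. The generic arrange_demo and its method are hypothetical stand-ins, not part of dplyr; they only mimic how UseMethod picks a method by class.

```r
# Hypothetical generic that dispatches on the class of its first argument
arrange_demo <- function(x, ...) UseMethod("arrange_demo")

# Only a data.frame method is defined: sort rows by the first column
arrange_demo.data.frame <- function(x, ...) x[order(x[[1]]), , drop = FALSE]

# A data frame finds its method
sorted <- arrange_demo(data.frame(a = c(3, 1, 2)))

# An integer has no method, so dispatch fails; capture the error message
msg <- tryCatch(arrange_demo(42L), error = function(e) conditionMessage(e))
print(msg)
```

The captured message has the same shape as the error in the question, which is why piping a plain number into arrange fails regardless of how the dates were parsed.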
Looks like this is a Udemy course assignment. Maybe here you need to calculate a sum for each day or each month, whichever your assignment is asking you to do, but sum is definitely not the right answer.
By the way, welcome to SO!
Update:
An example
n <- 100
data <- data.frame(sales = runif(n), day = sample(1:30, n, replace = TRUE))
data$date_ <- paste0(data$day, "/1/2009 7:45")
head(data$date_) # This is the original date string
data$date <- as.Date(data$date_, format = "%d/%m/%Y")
head(data$date) # Check here to see the formatted date
library(dplyr)
library(ggplot2)
data %>%
group_by(date) %>%
summarise(totalSale = sum(sales, na.rm=TRUE)) %>%
arrange(date) %>%
ggplot(aes(x = date, y = totalSale)) +
geom_line()
Here is the plot
It looks fine, doesn't it? The sales are all ordered by date now.
How can I transform a value from Factor to time ? I've tried using lubridate package but had no success.
I have a dataframe with a column "time" with 08:00:00 like values. Then used
phsb1 <- phsb %>%
dplyr::mutate(time = lubridate::hm(time))
which resulted in an object with 6 slots:
.Data, year, month, day, hour and minute
Any help to be able to obtain 08:00 like values would be very much appreciated.
Any further information or advice on how to handle "time" would be fantastic. I've found a lot about "dates" but almost nothing related to "time".
I think the OP wants to convert a column containing data in H:M:S format to H:M format as character.
Option #1: Simply get substring containing part of hour and min using sub as:
library(dplyr)
phsb1 <- phsb %>%
mutate(time = sub("(\\d{2}:\\d{2}):\\d{2}","\\1", as.character(time)))
Option #2: Use parse_date_time from lubridate:
library(lubridate)
library(dplyr)
phsb1 <- phsb %>%
mutate(time = format(parse_date_time(as.character(time), "HMS"), format = "%H:%M"))
#Example
format(parse_date_time("08:05:00", "HMS"), format = "%H:%M")
#"08:05"
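If lubridate is not available, the same conversion can be sketched in base R. The factor below is made-up sample data, and tz = "UTC" is an assumption used only to keep the parse independent of local daylight-saving rules.

```r
# Hypothetical factor column holding H:M:S values
time_factor <- factor(c("08:00:00", "13:45:30"))

# Convert the factor to character first, then parse the time of day
parsed <- strptime(as.character(time_factor), format = "%H:%M:%S", tz = "UTC")

# Keep only hours and minutes as a character vector
hm_strings <- format(parsed, "%H:%M")
print(hm_strings)
```

Converting with as.character() first matters because a factor is stored as integer codes, not as the text it displays.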
I'll preface this by saying I'm very much a self taught beginner with R.
I have a very large data set looking at biological data.
I want to find the average of a variable "shoot.density" split by year, but my date data is entered as "%d/%m/%y". This means using the normal way I would achieve this splits by each individual date, rather than by year only, eg.
tapply(df$Shoot.Density, list(df$Date), mean)
Any help would be much appreciated. I am also happy to paste in a section of my data, but I'm not sure how.
If your date column is of class Date, you can use format to transform it to a year variable:
tapply(df$Shoot.Density, list(format(df$Date, '%Y')), mean)
If it is a character string in the format %d/%m/%y, you can extract the two-digit year with the substr function:
tapply(df$Shoot.Density, list(substr(df$Date,7,8)), mean)
You can also do this with dplyr:
library(dplyr)
df %>%
group_by(years = format(Date, '%Y')) %>%
summarise(means = mean(Shoot.Density))
Another way to do this is with the year function of the data.table package:
library(data.table)
setDT(df)[, mean(Shoot.Density), by = year(Date)]
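For a self-contained illustration, here is a minimal base R sketch that first converts character dates to class Date and then averages by year. The data frame is made up for the example; only the column names follow the question.

```r
# Made-up sample data with dates stored as "%d/%m/%y" strings
df <- data.frame(
  Date = c("01/03/15", "15/07/15", "20/02/16"),
  Shoot.Density = c(10, 20, 30)
)

# Parse the strings into real Date objects
df$Date <- as.Date(df$Date, format = "%d/%m/%y")

# Group by the four-digit year and take the mean of each group
yearly_means <- tapply(df$Shoot.Density, format(df$Date, "%Y"), mean)
print(yearly_means)
```

Parsing the dates first avoids relying on fixed character positions, which would break if days or months were written without leading zeros.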
I need to sort a data frame by date in R. The dates are all in the form of "dd/mm/yyyy". The dates are in the 3rd column. The column header is V3. I have seen how to sort a data frame by column and I have seen how to convert the string into a date value. I can't combine the two in order to sort the data frame by date.
Assuming your data frame is named d,
d[order(as.Date(d$V3, format="%d/%m/%Y")),]
Read my blog post, Sorting a data frame by the contents of a column, if that doesn't make sense.
Nowadays, it is most convenient to use the lubridate and dplyr packages.
lubridate contains a number of functions that make parsing dates into POSIXct or Date objects easy. Here we use dmy which automatically parses dates in Day, Month, Year formats. Once your data is in a date format, you can sort it with dplyr::arrange (or any other ordering function) as desired:
d$V3 <- lubridate::dmy(d$V3)
dplyr::arrange(d, V3)
If you want to sort dates in descending order, the minus sign doesn't work with Dates. However, you can get the same effect with a general-purpose function: rev(). Therefore, you combine rev and order like this:
#init data
DF <- data.frame(ID=c('ID3', 'ID2','ID1'), end=c('4/1/09 12:00', '6/1/10 14:20', '1/1/11 11:10'))
#change order (the format assumes day/month/two-digit year; adjust it to your data)
out <- DF[rev(order(as.Date(DF$end, format = "%d/%m/%y"))),]
Hope it helped.
You can use order() to sort date data.
# Sort date ascending order
d[order(as.Date(d$V3, format = "%d/%m/%Y")),]
# Sort date descending order
d[rev(order(as.Date(d$V3, format = "%d/%m/%Y"))),]
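As a runnable sketch of both directions, the made-up data frame below uses the same V3 column name as the question. It uses the decreasing argument of order() as an alternative to rev(); this is one possible variant, not the only way to do it.

```r
# Made-up data frame with dd/mm/yyyy strings in the third column
d <- data.frame(
  V1 = c("a", "b", "c"),
  V2 = 1:3,
  V3 = c("15/03/2017", "01/12/2016", "20/01/2017"),
  stringsAsFactors = FALSE
)

# Parse once, then reuse the parsed dates for both orderings
parsed <- as.Date(d$V3, format = "%d/%m/%Y")

ascending  <- d[order(parsed), ]
descending <- d[order(parsed, decreasing = TRUE), ]

print(ascending)
print(descending)
```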
Hope this helps,
Link to my quora answer https://qr.ae/TWngCe
Thanks
If your data is already sorted from newest to oldest and you just want to flip it to oldest to newest in R, you can simply reverse the rows:
dataframe <- dataframe[nrow(dataframe):1,]
It's saved me exporting in and out of Excel just to sort Yahoo Finance data.
The only way I found to work with hours, given a US format in the source (mm/dd/yyyy HH:MM:SS AM/PM)...
df_dataSet$time <- as.POSIXct( df_dataSet$time , format = "%m/%d/%Y %I:%M:%S %p" , tz = "GMT")
class(df_dataSet$time)
# order the rows by the parsed time column
df_dataSet <- df_dataSet[order(df_dataSet$time), ]
You could also use arrange from the dplyr library.
The following snippet will modify your original date string to a date object, and order by it. This is a good approach, as you store a date as a date, not just a string of characters.
dates <- dates %>%
mutate(date = as.Date(date, "%d/%m/%Y")) %>%
arrange(date)
If you want to keep the column as a string and only order by the parsed date (usually an inferior option), you can do this:
dates <- dates %>%
arrange(as.Date(date, "%d/%m/%Y"))
If you have a dataset named daily_data:
daily_data <- daily_data[order(as.Date(daily_data$date, format="%d/%m/%Y")),]
I have a dataframe of dates and numeric values in R. The dates are all the first of the month and the values are a number associated with that month
library(DT)
library(dplyr)
df <- data.frame(date = as.Date(c("2017-01-01","2017-02-01","2017-03-01","2017-04-01")),
val = c(-5600,7000,4200,-2000))
I'd like to stick this through DT::datatable(), which is my new favourite thing. However, I'd like to have the output formatted nicely, thousand separators, nice dates etc.
df <- df %>% mutate(val = formatC(val, big.mark=","))
datatable(df)
This turns val into a character vector, although datatable() is apparently able to recognise that it's really a number and sort appropriately using the arrows in the header. So far so good.
However the issue comes when I try to format the date as MMM YY.
df <- df %>% mutate(date = format(date, "%b %y"))
datatable(df)
This turns date into a character vector as well - the values look like "Jan 17" etc. Everything looks fine, only trouble is when I go to sort by date, it doesn't recognise the values as months and puts them in alphabetical rather than chronological order.
Is there any way of reformatting the dates, either prior to or whilst passing them to datatable(), to keep the "date-ness" of the variable and allow it to be sorted appropriately? Failing that, is there another package that outputs interactive tables and is better at sorting?
Thanks in advance,
James
You can use the lubridate package
and do this with the function below.
What you need to do is handle the month and the year separately.
library(lubridate)
date_conversion<-function(df){
months<-month(df$date,label = T)
years<-year(df$date)
months_years<-paste(months, years, sep = " ")
df[1]<-months_years
df[order(row.names(df),decreasing = F),]
}
hope this helps you .... :)
DataTables as integrated in R by the DT package has options to format numeric and date variables while maintaining the proper sort order.
Below, I will discuss three different options:
library(DT)
df <- data.frame(date = as.Date(c("2017-01-01","2017-02-01","2017-03-01","2017-04-01")),
val = c(-5600,7000,4200,-12000))
Please note that I've deliberately chosen to change the last value in column val to demonstrate a pitfall in using formatC().
# OP's own formatting
df$val_chr <- formatC(df$val, big.mark=",")
df$date_chr <- format(df$date, "%b %y")
# copy columns to demonstrate DT formatting
df$val_dt <- df$val
df$date_dt <- df$date
# ISO 8601 year-month format as alternative
df$date_iso <- format(df$date, "%Y-%m")
# create DT object and apply DT formatting
datatable(df) %>% formatCurrency("val_dt", "") %>% formatDate("date_dt", "toDateString")
Note that val_dt has been formatted nicely as expected and is right-justified. In contrast, val_chr is left-justified with the thousands separators not aligned. In addition, formatC() has recognized that val is of type double and has used the "g" format by default. According to the description of the format parameter in ?formatC: Default is "d" for integers, "g" for reals. So, we do get
formatC(12000L, big.mark=",")
#[1] "12,000"
but
formatC(12000, big.mark=",")
#[1] "1.2e+04"
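One way around this pitfall, sketched here with the sample values from above, is to ask formatC() for the "d" format explicitly. The variable name val is only for illustration.

```r
# Sample values stored as doubles, as in the data frame above
val <- c(-5600, 7000, 4200, -12000)

# Requesting the integer format avoids the default "g" (scientific) output
val_fixed <- formatC(val, format = "d", big.mark = ",")
print(val_fixed)
```

The result is still a character vector, so it shares the alignment drawback of val_chr; it only removes the scientific notation.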
Sorting by date_dt within the datatables object by clicking on the small arrow symbols at the right side of the column headers works as expected, in contrast to date_chr. Unfortunately, the number of available methods for formatDate() is limited and doesn't include the desired month-year format. (There is a datetime plugin which converts date / time source data into one suitable for display, but I haven't explored that in detail.)
Column date_iso shows the abbreviated ISO 8601 format YYYY-MM as a third option. This is my favoured format (which I also use a lot for aggregating by month) because
it always sorts correctly, even for several years,
it doesn't depend on the current locale, so it works in any language,
it is short while being unambiguous,
and it is an international standard.
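The sorting property can be checked with a short base R sketch. The dates below are made up and deliberately span two years; the variable names are only for illustration.

```r
# Made-up month starts spanning two years, in non-chronological order
dates <- as.Date(c("2017-04-01", "2016-12-01", "2017-01-01"))

# ISO 8601 year-month strings
iso <- format(dates, "%Y-%m")

# Sorting the strings gives the same order as sorting the real dates
iso_sorted <- sort(iso)
same_order <- identical(order(iso), order(dates))

print(iso_sorted)
print(same_order)
```

Because the year comes first and the month is zero-padded, plain text sorting and chronological sorting agree.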
Addendum
The formattable package does also have various formatter functions and can create DataTables:
library(formattable)
as.datatable(formattable(df))