I have this dataframe
df=data.frame("temp"=c(60.80,46.04,26.96,24.98),"humid"=c(93.79,53.33,50.34,54.65),"wind_speed"=c(40.27,39.12,14.96, 13.81), "date" =c("2013-01-31","2013-01-31","2013-02-02", "2013-02-02"))
df$date <- as.Date(df$date, "%Y-%m-%d")
View(df)
# After the code above, I need to do something else with my data, then I transform the dataframe.
df_mod<- cbind(df[4], stack(df[1:3]))
df_mod$date<- as.data.frame(df_mod$date) %>%
mutate(Month = lubridate::month(df_mod$date, label = TRUE))
View(df_mod)
I want to create a column named 'Month'. The code above does what I want; however, it renames my 'date' column to 'date$df_mod$date', and the new month column ends up named 'date$Month'.
Is there a better way to do it?
Instead of as.data.frame(df_mod$date), use select(date).
library(dplyr)
library(lubridate)
df_mod %>%
select(date) %>%
mutate(Month = month(date, label = TRUE))
Or combine the select and mutate steps using transmute():
df_mod %>%
transmute(
date,
Month = month(date, label = TRUE)
)
Or use mutate() with .keep = "used":
df_mod %>%
mutate(
Month = month(date, label = TRUE),
.keep = "used"
)
Output from all three approaches:
date Month
1 2013-01-31 Jan
2 2013-01-31 Jan
3 2013-02-02 Feb
4 2013-02-02 Feb
5 2013-01-31 Jan
6 2013-01-31 Jan
7 2013-02-02 Feb
8 2013-02-02 Feb
9 2013-01-31 Jan
10 2013-01-31 Jan
11 2013-02-02 Feb
12 2013-02-02 Feb
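All three approaches above return only the date and Month columns. If the stacked values should be kept as well, a plain mutate() on df_mod is enough. The following is a minimal sketch that rebuilds df_mod from the question; the name df_month is made up for illustration.

```r
library(dplyr)
library(lubridate)

# Rebuild the example data from the question
df <- data.frame(
  temp       = c(60.80, 46.04, 26.96, 24.98),
  humid      = c(93.79, 53.33, 50.34, 54.65),
  wind_speed = c(40.27, 39.12, 14.96, 13.81),
  date       = as.Date(c("2013-01-31", "2013-01-31", "2013-02-02", "2013-02-02"))
)

# Long format: the date column is recycled against the 12 stacked values
df_mod <- cbind(df["date"], stack(df[c("temp", "humid", "wind_speed")]))

# mutate() adds Month next to the existing columns without renaming anything
df_month <- df_mod %>%
  mutate(Month = month(date, label = TRUE))

names(df_month)
```

The result keeps the columns date, values and ind, and adds Month as an ordered factor.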
I am fairly new to R and I am trying to do the following task:
I have the following dataset:
df1 <- data.frame(ITEM = c("A","A","A","A","A","B","B","B","B","B"),
Date = c("Jan-2020","Feb-2020","May-2020","Jun-2020","Jul-2020","Jan-2020","Apr-2020","Jun-2020","Jul-2020","Aug-2020"))
I have used the library "zoo" to change the date column into yearmon and I am trying to create rows for the missing "yearmon" dates. So something like this:
Does anyone have any idea how I can do this?
Thank you
You can create a sequence of yearmon objects for each ITEM and use it in complete.
library(dplyr)
library(zoo)
library(tidyr)
df1 %>%
mutate(Date = as.yearmon(Date, '%b-%Y')) %>%
group_by(ITEM) %>%
complete(Date = seq(min(Date), max(Date), 1/12)) %>%
ungroup
# ITEM Date
# <chr> <yearmon>
# 1 A Jan 2020
# 2 A Feb 2020
# 3 A Mar 2020
# 4 A Apr 2020
# 5 A May 2020
# 6 A Jun 2020
# 7 A Jul 2020
# 8 B Jan 2020
# 9 B Feb 2020
#10 B Mar 2020
#11 B Apr 2020
#12 B May 2020
#13 B Jun 2020
#14 B Jul 2020
#15 B Aug 2020
If you want a sequence of Date objects instead, you can use:
df1 %>%
mutate(Date = as.Date(as.yearmon(Date, '%b-%Y'))) %>%
group_by(ITEM) %>%
complete(Date = seq(min(Date), max(Date), 'month')) %>%
ungroup()
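To see what complete() is filling in, it may help to look at a single ITEM in base R. This is only a sketch: it assumes the dates for item A have already been converted to the first day of each month, and the names dates_a, full_seq and missing_months are made up for illustration.

```r
# Months present for item A, as first-of-month Date values
dates_a <- as.Date(c("2020-01-01", "2020-02-01", "2020-05-01",
                     "2020-06-01", "2020-07-01"))

# Full monthly sequence between the earliest and latest date
full_seq <- seq(min(dates_a), max(dates_a), by = "month")

# Months that complete() would add as new rows
missing_months <- full_seq[!full_seq %in% dates_a]
missing_months
```

Here full_seq has seven entries and missing_months holds March and April 2020, which matches the two rows added for item A in the output above.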
I would like to format my date variable to %d %b %Y (e.g. 05 May 2020). However, once it has been formatted, it becomes a character variable and sorting the variable from the earliest date to the latest date would not be possible (e.g. 05 May 2020 is sorted before 26 Apr 2020).
Data:
df <- structure(list(Date = structure(c(1588204800, 1587945600, 1588464000, 1588032000,
1588291200, 1588377600, 1588118400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, -7L))
# > df
# Date
# 1 2020-04-30
# 2 2020-04-27
# 3 2020-05-03
# 4 2020-04-28
# 5 2020-05-01
# 6 2020-05-02
# 7 2020-04-29
Here is what sorting the formatted date variable looks like:
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(Date)
# Date
# 1 01 May 2020
# 2 02 May 2020
# 3 03 May 2020
# 4 27 Apr 2020
# 5 28 Apr 2020
# 6 29 Apr 2020
# 7 30 Apr 2020
So, this is what I have done, which works, but I would like to know if this is really correct or if there are alternatives to solve this.
df %>%
mutate(Date = factor(Date, labels = format(sort(unique(Date)), "%d %b %Y"), ordered = TRUE)) %>%
arrange(Date)
# Date
# 1 27 Apr 2020
# 2 28 Apr 2020
# 3 29 Apr 2020
# 4 30 Apr 2020
# 5 01 May 2020
# 6 02 May 2020
# 7 03 May 2020
Edit:
Actually, the reason I want to format and arrange the dates is so that I have direct access to more readable date formats when building a dashboard for my users.
When it comes to ggplot(), even after you arrange and then mutate with format, the faceted plots are always laid out in sorted character order. Example below:
df %>%
arrange(Date) %>%
mutate(n = 1:n(),
Date = format(Date, "%d %b %Y")) %>%
ggplot() +
geom_bar(aes(x = n)) +
facet_wrap(~Date)
If you want to use dates in plots, the main idea is to set the factor levels in the order in which you want to show the data. Arrange the dates first, then assign factor levels based on the order in which the dates occur.
library(dplyr)
library(ggplot2)
df %>%
arrange(Date) %>%
mutate(n = row_number(),
Date = format(Date, "%d %b %Y"),
Date = factor(Date, levels = unique(Date))) %>%
ggplot() + geom_bar(aes(x = n)) + facet_wrap(~Date)
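The effect of the factor levels can be checked without drawing a plot. The sketch below uses three of the dates from the question; the names dates, labels, as_chr and as_fct are made up for illustration, and the month abbreviations depend on the locale.

```r
dates <- as.POSIXct(c("2020-04-30", "2020-04-27", "2020-05-03"), tz = "UTC")

# Sort while the values are still date-times, then format for display
labels <- format(sort(dates), "%d %b %Y")

# Plain character sorting compares the day digits first, so 03 May comes first
as_chr <- sort(labels)

# Levels taken in order of appearance keep the chronological order
as_fct <- factor(labels, levels = unique(labels))

as_chr
levels(as_fct)
```

facet_wrap() follows levels(as_fct), so the panels appear in date order rather than in the alphabetical order shown by as_chr.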
My original solution is below, but the better solution is so simple it hurts a little that I didn't spot it immediately - do your arrange() before your mutate() - at that point it is a date-type variable so will sort the way you want it to:
df %>%
arrange(Date) %>%
mutate(Date = format(Date, "%d %b %Y"))
Giving:
Date
1 27 Apr 2020
2 28 Apr 2020
3 29 Apr 2020
4 30 Apr 2020
5 01 May 2020
6 02 May 2020
7 03 May 2020
Alternatively, you could add an as.Date(..., format = "%d %b %Y") to your arrange():
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(as.Date(Date, format = "%d %b %Y"))
Personally, I prefer the tidyverse solution for dates - lubridate. Here:
library(lubridate)
df %>%
mutate(Date = ymd(Date)) %>%
arrange(Date)
In short, you can parse your dates by combining d for day, m for month and y for year. You can add time, too. For example,
ymd_hms("20150102 12:23:01")
As the example shows, we do not have to bother about the separator. If you have access, this is a nice paper on that package. Otherwise, there are many tutorials out there on lubridate.
I have a data frame called data which looks like this:
Day Month Year
2 April 2015
5 May 2014
23 December 2017
The code is:
data <- data.frame(Day = c(2,5,23),
                   Month = c("April", "May", "December"),
                   Year = c(2015, 2014, 2017))
I want to create a new column that looks like this:
Day Month Year Date
2 April 2015 2/4/2015
5 May 2014 5/5/2014
23 December 2017 23/12/2017
To do this, I tried:
data <- data %>%
mutate(Date = as.Date(paste(Day, Month, Year, sep = "/"))) %>%
dmy()
But I got an error which says:
Error in charToDate(x) :
character string is not in a standard unambiguous format
Is there an obvious error that I'm not seeing?
Thank you so much.
We need to pass an appropriate format to as.Date. Using base R, we can do
transform(data, Date = as.Date(paste(Day, Month, Year, sep = "/"), "%d/%B/%Y"))
# Day Month Year Date
#1 2 April 2015 2015-04-02
#2 5 May 2014 2014-05-05
#3 23 December 2017 2017-12-23
Or with dplyr and lubridate
library(dplyr)
library(lubridate)
data %>% mutate(Date = dmy(paste(Day, Month, Year, sep = "/")))
You can add format(Date, "%d/%m/%Y") if you need to change the display format.
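Note that format(Date, "%d/%m/%Y") pads with zeros (02/04/2015), while the question asks for 2/4/2015. One possible base R sketch that avoids both the padding and locale-dependent month parsing is to look the month up in the built-in month.name vector. The column name Label is made up for illustration.

```r
data <- data.frame(Day = c(2, 5, 23),
                   Month = c("April", "May", "December"),
                   Year = c(2015, 2014, 2017))

# Month number from the English month name (month.name is built into R)
month_num <- match(data$Month, month.name)

# A real Date column, useful for sorting and arithmetic
data$Date <- as.Date(ISOdate(data$Year, month_num, data$Day))

# A character label without leading zeros, for display only
data$Label <- paste(data$Day, month_num, data$Year, sep = "/")

data
```

Keeping Date as a Date and Label as text means sorting can still be done on Date.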
I have a dataset over some departments (dep. number), and in which timeframe a certain survey was made in that department. It looks like this
dep type inDate outDate
1 14 AA 2015-01-16 2015-04-25
2 10 AB 2014-05-01 2017-01-01
3 14 BA 2013-01-04 2015-04-06
4 11 CA 2016-09-10 2017-12-01
5 10 DD 2013-01-01 2013-12-01
...
I also have a startYear = 2013
and an endYear = 2017
for when the surveys started and ended globally.
I want a plot for each of the departments. These plots should show how many surveys were active in the period between startYear and endYear. So for department 14, the plot should look like this
Can someone point me in the right direction? I don't even know where to start.
df = read.table(text = "
dep type inDate outDate
1 14 AA 2015-01-16 2015-04-25
2 10 AB 2014-05-01 2017-01-01
3 14 BA 2013-01-04 2015-04-06
4 11 CA 2016-09-10 2017-12-01
5 10 DD 2013-01-01 2013-12-01
", header=T, stringsAsFactors=F)
library(tidyverse)
library(lubridate)
df %>%
mutate_at(vars(inDate, outDate), ymd) %>% # update date columns to date format (if needed)
mutate(dep = factor(dep)) %>% # update dep to factor (if it is not)
group_by(dep, id = row_number()) %>% # for every row
nest() %>% # nest data
mutate(dates = map(data, ~seq(.x$inDate, .x$outDate, "1 day"))) %>% # create a sequence of dates
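unnest(dates) %>% # add that sequence of dates as column
ungroup() %>% # drop the per-row grouping (kept by nest() in tidyr >= 1.0) so count() works per department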
count(dep, dates) %>% # count live projects each day
complete(dep, dates, fill = list(n = 0L)) %>% # add zeros to days that surveys weren't live
ggplot(aes(dates, n, group=dep, col=dep))+ # plot
geom_line()+ # add line
facet_wrap(~dep) # one plot for each department
You can remove +facet_wrap(~dep) if you want all departments in the same plot.
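The counting step can also be sketched in base R for a single department, which makes it easy to check the numbers before plotting. This is only an illustration, not a replacement for the pipeline above: it uses the startYear and endYear values from the question, and the names surveys, days, dep14 and active are made up.

```r
surveys <- data.frame(
  dep     = c(14, 10, 14, 11, 10),
  inDate  = as.Date(c("2015-01-16", "2014-05-01", "2013-01-04",
                      "2016-09-10", "2013-01-01")),
  outDate = as.Date(c("2015-04-25", "2017-01-01", "2015-04-06",
                      "2017-12-01", "2013-12-01"))
)

startYear <- 2013
endYear   <- 2017

# Every day of the global survey period
days <- seq(as.Date(paste0(startYear, "-01-01")),
            as.Date(paste0(endYear, "-12-31")), by = "day")

# For each day, count the department 14 surveys whose interval contains it
dep14  <- surveys[surveys$dep == 14, ]
active <- vapply(seq_along(days), function(i) {
  sum(dep14$inDate <= days[i] & dep14$outDate >= days[i])
}, integer(1))

# Quick look at the result; plot(days, active, type = "s") would draw it
table(active)
```

For department 14 the count is 2 while both surveys overlap (16 January to 6 April 2015), 1 when only one of them is running, and 0 elsewhere.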
I have a column with date formatted as MM-DD-YYYY, in the Date format.
I want to add 2 columns one which only contains YYYY and the other only contains MM.
How do I do this?
Once again base R gives you all you need, and you should not do this with sub-strings.
Here we first create a data.frame with a proper Date column. If your date is in text format, parse it first with as.Date() or my anytime::anydate() (which does not need formats).
Then given the date creating year and month is simple:
R> df <- data.frame(date=Sys.Date()+seq(1,by=30,len=10))
R> df[, "year"] <- format(df[,"date"], "%Y")
R> df[, "month"] <- format(df[,"date"], "%m")
R> df
date year month
1 2017-12-29 2017 12
2 2018-01-28 2018 01
3 2018-02-27 2018 02
4 2018-03-29 2018 03
5 2018-04-28 2018 04
6 2018-05-28 2018 05
7 2018-06-27 2018 06
8 2018-07-27 2018 07
9 2018-08-26 2018 08
10 2018-09-25 2018 09
R>
If you want year or month as integers, you can wrap as.integer() around the format() call.
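As a small sketch of that integer variant, using two fixed dates instead of Sys.Date() so the result is reproducible:

```r
df <- data.frame(date = as.Date(c("2017-12-29", "2018-01-28")))

# format() returns character; as.integer() turns "01" into 1L
df$year  <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))

str(df)
```

Integer columns sort and compare numerically, which is usually what is wanted for grouping by year or month.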
A base R option would be to remove the substring with sub and then read with read.table
df1[c('month', 'year')] <- read.table(text=sub("-\\d{2}-", ",", df1$date), sep=",")
Or using tidyverse
library(tidyverse)
separate(df1, date, into = c('month', 'day', 'year')) %>%
select(-day)
Note: it may be better to convert to datetime class instead of using the string formatting.
library(lubridate) # provides mdy(), month() and year(); not attached by older tidyverse versions
df1 %>%
mutate(date = mdy(date), month = month(date), year = year(date))
data
df1 <- data.frame(date = c("05-21-2017", "06-25-2015"))