How to format dates in data.frame and ggplot? [closed] - r

I have the following data frame:
df <- data.frame(A=c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"),
B=c(12.5, 24.5, 23.4, 45.0, 12.0))
## > df
## A B
## 1 2019-01 12.5
## 2 2019-02 24.5
## 3 2019-03 23.4
## 4 2019-04 45.0
## 5 2019-05 12.0
Where column A contains dates (YYYY-MM) and column B the observations corresponding to those dates.
I want to plot the data with ggplot2, and I need the dates along the X axis to show the abbreviated month and a two-digit year, e.g. Jan-19.

Please consider giving us a reproducible example next time and stating your problem more precisely.
Nonetheless, with the information you provide, you could do something like the following.
First, build a vector of complete dates to avoid problems later. I'm assuming your observations are taken on the first day of each month, so your vector A would look like this:
A <- paste0(c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"), "-01")
If we create the data.frame directly, it would look like this:
df <- data.frame(A=paste0(c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"), "-01"),
B=c(12.5, 24.5, 23.4, 45.0, 12.0), stringsAsFactors = FALSE)
Or using magrittr:
library(magrittr)
df <- paste0(c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"), "-01") %>%
data.frame(A=.,B=c(12.5, 24.5, 23.4, 45.0, 12.0), stringsAsFactors = FALSE)
Then we format A as date:
df$A <- as.Date(df$A, format="%Y-%m-%d")
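Before plotting, it can help to check the conversion and preview the labels the axis will show. The following is a minimal base R sketch, assuming the same data as above; `to_month_start` is a hypothetical helper name introduced only for this illustration.

```r
# Hypothetical helper (not part of the answer above): turn "YYYY-MM" strings
# into Date objects anchored at the first day of each month.
to_month_start <- function(year_month) {
  as.Date(paste0(year_month, "-01"), format = "%Y-%m-%d")
}

df <- data.frame(A = c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"),
                 B = c(12.5, 24.5, 23.4, 45.0, 12.0),
                 stringsAsFactors = FALSE)
df$A <- to_month_start(df$A)

# Confirm the column is now a real Date column.
class(df$A)

# Preview the axis labels; "%b" gives the abbreviated month name,
# which depends on the current locale.
axis_labels <- format(df$A, "%b-%y")
print(axis_labels)
```

The same format string, "%b-%y", is what `date_labels` receives in `scale_x_date()`.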
To plot it you should do something like:
library(ggplot2)
ggplot(data = df, aes(x=A, y=B)) +
geom_line() +
scale_x_date(date_labels = "%b-%y", date_breaks = "1 month") +
theme_light() +
labs(x="time") +
theme(legend.position = "bottom")
Hope it helps
PS: check out this post regarding date formats

Related

Time Series plot.ts and x label

I have a .csv file with quarters in the first column (like 200901, 200902, etc.), or I can have them as row names. In the other columns I have some common statistical data (like inflation rates: 102.5, 101.5, etc.).
The problem is that the function plot.ts doesn't show the quarters on the x axis, although I do get 7 nice plots on one page.
My code is simple:
require(ggplot2)
plot.ts(abc, xlab = abc$quarters)
abc is my data and abc$quarters is the column with the quarter numbers.
Maybe another function would be better here, but it's annoying because this feels very close to an easy solution.
As comments say, plot.ts isn't a ggplot2 function. Here's an example of what you may be looking for in ggplot2:
library(quantmod)  # provides getSymbols()
library(tidyverse)
getSymbols("AMZN", src="yahoo", from="2016-07-01")
data.frame(AMZN) %>%
rownames_to_column() %>%
mutate(
rowname = as.Date(rowname, format="%Y-%m-%d")
) %>%
ggplot() +
geom_line(aes(rowname, AMZN.Close)) +
scale_x_date(expand = expansion(0), minor_breaks = NULL,  # expansion() replaces the deprecated expand_scale()
date_breaks = "3 months", date_labels = "%m-%Y") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
I'd suggest working with ggplot2 functions for plotting time series over something like plot.ts since it'd likely be more flexible.
Thanks for answers, they lead me to the following solution:
dane_2$abc2 dane_2$abc3 dane_2$abc4
[1,] 103.5 19.9 37.3
[2,] 103.4 19.5 35.2
[3,] 103.0 25.1 34.7
View(dane_2)
# In dane_2$abc1 I have own numbers of quarters like 20094, so I skip it.
tseries <- ts(dane_2[ ,-1], start = c(2009, 4), frequency = 4)
par(mfrow=c(1,3))
plot(tseries)
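To see what `start = c(2009, 4)` and `frequency = 4` do, the time index of the series can be turned back into quarter labels. This is a minimal sketch that uses the three rows printed above; the object names `values` and `quarter_labels` are assumptions made for the example.

```r
# The three rows shown above, as a plain numeric matrix.
values <- matrix(c(103.5, 103.4, 103.0,
                   19.9, 19.5, 25.1,
                   37.3, 35.2, 34.7),
                 ncol = 3,
                 dimnames = list(NULL, c("abc2", "abc3", "abc4")))

# Quarterly series whose first observation is the 4th quarter of 2009.
tseries <- ts(values, start = c(2009, 4), frequency = 4)

# time() gives fractional years (2009.75, 2010.00, ...); cycle() gives the
# position within the year (4, 1, 2, ...).
years <- floor(as.numeric(time(tseries)) + 1e-8)
quarters_in_year <- as.integer(cycle(tseries))
quarter_labels <- paste0(years, " Q", quarters_in_year)
print(quarter_labels)
```

Because the ts object carries its own time index, plot(tseries) can draw the x axis in years without a separate column of quarter codes.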

Parsing complicated date text in R [closed]

I would like to extract all the dates from some text content. The content consists of date text like this:
21, 17, 16, 12, 10, 6, 5, 3 June 2019, 30 and 28, 27 May 2019
I expect to keep all the dates in a list() like this:
c("2019-06-21", "2019-06-17", "2019-06-16", "2019-06-12", "2019-06-10", "2019-06-06", "2019-06-05", "2019-06-03", "2019-05-30", "2019-05-28", "2019-05-27")
Is that possible to do that? Thanks.
To complement @Oliver's answer, here is a solution that uses the stringr and lubridate packages and fairly simple regular expressions.
First of all, find the month-year blocks (like "June 2019"):
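library(stringr)  # str_locate_all(), str_sub(), str_extract_all()
# date_string is assumed to hold the input text as a single character string
mny_loc_list <- str_locate_all(date_string,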
paste0("\\b(", paste(month.name, collapse = "|"), ")", "\\s*\\d{4}"))
print(mny_loc_list)
[[1]]
start end
[1,] 29 38
[2,] 55 62
[3,] 72 81
Note that the built-in month.name vector must match the month names used in your original date string. Inconsistencies can be fixed by setting the locale properly or by supplying a vector of month names manually.
Then, create a function to transform the dates corresponding to each month-year block to the calendar dates:
ExtractForMonth <- function(list_entry, string_entry) {
# define the end of a previous month-year block
if (string_entry %in% 1) {
block_begin <- 1
} else {
# take the end of a previous entry if it is not the first block
block_begin <- list_entry[(string_entry - 1), 2] + 1
}
n_day <- str_sub(date_string, block_begin, list_entry[string_entry, 1] - 1)
month_year <- str_sub(date_string,
list_entry[string_entry, 1], list_entry[string_entry, 2])
day_date <- str_extract_all(n_day, "\\b\\d+?\\b")
date_final <- paste0(unlist(day_date), " ", month_year)
return(lubridate::dmy(date_final))
}
Finally, apply this function to each pair of the month-year block locations:
dates_list <- lapply(
function(i) ExtractForMonth(list_entry = mny_loc_list[[1]],
string_entry = i),
X = seq(to = nrow(mny_loc_list[[1]])))
print(dates_list)
[[1]]
[1] "2019-06-21" "2019-06-17" "2019-06-16" "2019-06-12" "2019-06-10"
[6] "2019-06-06" "2019-06-05" "2019-06-03"
[[2]]
[1] "2019-05-30" "2019-05-28" "2019-05-27"
As pointed out in the comments, the simple answer is to clean the data into a format that R understands. If the data is exported from other software, it is often (if not always) easier to do the cleaning there rather than in R.
That said, the translation is always possible, although for tasks like this it has to be done by hand. Below is an illustration of how this could be achieved in R, using only the base package.
dates <- '21, 17, 16, 12, 10, 6, 5, 3 June 2019, 30 and 28, 27 May 2019'
#split on ', ' and ' and '
split_dates <- strsplit(dates, ", | and ", perl = TRUE)[[1]]
#Find the entries which contain a month and a year
long_dates <- which(nchar(split_dates) > 2)
#Function to format dates
make_dates <- function(string){
string <- unlist(strsplit(string, " "))
nString <- length(string)
year <- string[nString]
month <- string[nString - 1]
as.Date(paste0(year, month, string[seq(nString - 2)]), format = '%Y%B%d')
}
#Date vector for output
output_Dates <- integer(length(split_dates))
class(output_Dates) <- "Date"
j <- 1  # first index of the current block (R vectors are 1-based)
for(i in long_dates){
output_Dates[j:i] <- make_dates(split_dates[j:i])
j <- i + 1
}
output_Dates
[1] "2019-06-21" "2019-06-17" "2019-06-16" "2019-06-12" "2019-06-10" "2019-06-06" "2019-06-05" "2019-06-03" "2019-05-30" "2019-05-28" "2019-05-27"
Note that you seem to be lacking 2019-05-30 in your expected output for it to be consistent.
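For comparison, the same idea (find each month-year block, then attach the day numbers that precede it) can be sketched with base R regular expressions only. This is an illustrative sketch, not either answerer's code; `parse_block` and the other names are made up for the example. The month is looked up in month.name, so the input is assumed to use English month names.

```r
dates <- "21, 17, 16, 12, 10, 6, 5, 3 June 2019, 30 and 28, 27 May 2019"

# Locate every "<Month> <YYYY>" block.
block_pattern <- paste0("(", paste(month.name, collapse = "|"), ")\\s+\\d{4}")
hits <- gregexpr(block_pattern, dates)
month_years <- regmatches(dates, hits)[[1]]

# invert = TRUE returns the text between the blocks; the chunk in front of
# each block holds the day numbers that belong to it.
day_chunks <- regmatches(dates, hits, invert = TRUE)[[1]][seq_along(month_years)]

# Hypothetical helper: combine the days of one chunk with its month and year.
parse_block <- function(chunk, month_year) {
  days <- as.integer(regmatches(chunk, gregexpr("\\d+", chunk))[[1]])
  parts <- strsplit(month_year, "\\s+")[[1]]
  month_num <- match(parts[1], month.name)
  as.Date(sprintf("%s-%02d-%02d", parts[2], month_num, days))
}

parsed_dates <- do.call(c, unname(Map(parse_block, day_chunks, month_years)))
print(parsed_dates)
```

Converting the month name to a number with match() avoids the locale dependence of the "%B" format used with as.Date().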

R: aggregate by date - (every 30min mean) [closed]

I have been struggling with this for a while now:
I have a data frame that contains 5-minute measurements (for around 6 months) of different parameters. I want to aggregate them and get the mean of every parameter every 30 min. Here is a short example:
TIMESTAMP <- c("2015-12-31 0:30", "2015-12-31 0:35","2015-12-31 0:40", "2015-12-31 0:45", "2015-12-31 0:50", "2015-12-31 0:55", "2015-12-31 1:00", "2015-12-31 1:05", "2015-12-31 1:10", "2015-12-31 1:15", "2015-12-31 1:20", "2015-12-31 1:25", "2015-12-31 1:30")
value1 <- c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
mymet <- data.frame(TIMESTAMP, value1, stringsAsFactors = FALSE)  # data.frame(), not as.data.frame(), builds both columns
mymet$TIMESTAMP <- as.POSIXct(mymet$TIMESTAMP, format = "%Y-%m-%d %H:%M")
halfhour <- aggregate(mymet, list(TIME = cut(mymet$TIMESTAMP, breaks = "30 mins")),
mean, na.rm = TRUE)
What I want to get is the average between 00:35 and 1:00 and call this DATE-1:00AM, however, what I get is: average between 00:30 and 00:55 and this is called DATE-12:30am.
How can I change the function to give me the values that I want?
The trick (I think) is to look at when your first observation starts. If the first observation is at 00:35 and you cut into 30-minute intervals, the intervals follow the logic you want. As for the names of the breaks, cut labels each interval by its start time, so adding 25 minutes to the label gives the name you want. Here is an example for 6 months of 2015:
require(lubridate)
require(dplyr)
TIMESTAMP <- seq(ymd_hm('2015-01-01 00:00'),ymd_hm('2015-06-01 23:55'), by = '5 min')
TIMESTAMP <- data.frame(obs=1:length(TIMESTAMP),TS=TIMESTAMP)
TIMESTAMP <- TIMESTAMP[-(1:7),] # drop the first 7 rows so the data start at 00:35
TIMESTAMP$Breaks <- cut(TIMESTAMP$TS, breaks = "30 mins")
TIMESTAMP$Breaks <- ymd_hms(as.character(TIMESTAMP$Breaks)) + (25*60)
Averages <- TIMESTAMP %>% group_by(Breaks) %>% summarise(MeanObs=mean(obs,na.rm = TRUE))
If you get mymet constructed properly, you can cut TIMESTAMP into bins (which you can do with cut.POSIXt) so you can aggregate:
mymet$half_hour <- cut(mymet$TIMESTAMP, breaks = "30 min")
aggregate(value1 ~ half_hour, mymet, mean)
## half_hour value1
## 1 2015-12-31 00:30:00 73.33333
## 2 2015-12-31 01:00:00 80.16667
## 3 2015-12-31 01:30:00 33.00000
Data
mymet <- structure(list(TIMESTAMP = structure(c(1451539800, 1451540100,
1451540400, 1451540700, 1451541000, 1451541300, 1451541600, 1451541900,
1451542200, 1451542500, 1451542800, 1451543100, 1451543400), class = c("POSIXct",
"POSIXt"), tzone = ""), value1 = c(45, 50, 68, 78, 99, 100, 5,
9, 344, 10, 45, 68, 33)), .Names = c("TIMESTAMP", "value1"), row.names = c(NA,
-13L), class = "data.frame")
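One way to get bins that are named by their end time, as the question asks, is to round every timestamp up to the next half hour and aggregate on that. The following is a minimal base R sketch on the question's 13 values, assuming UTC timestamps; the column name `bin_end` is made up for the illustration.

```r
# The 13 five-minute observations from the question, in UTC for reproducibility.
mymet <- data.frame(
  TIMESTAMP = seq(as.POSIXct("2015-12-31 00:30", tz = "UTC"),
                  by = "5 min", length.out = 13),
  value1 = c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
)

# Round each timestamp UP to the next multiple of 30 minutes, so that
# 00:35 ... 01:00 all map to 01:00 (and 00:30 stays at 00:30).
bin_seconds <- 30 * 60
bin_end <- as.POSIXct(ceiling(as.numeric(mymet$TIMESTAMP) / bin_seconds) * bin_seconds,
                      origin = "1970-01-01", tz = "UTC")
mymet$bin_end <- format(bin_end, "%Y-%m-%d %H:%M")

# Mean of value1 per half-hour bin, labelled by the end of the bin.
halfhour <- aggregate(value1 ~ bin_end, data = mymet, FUN = mean)
print(halfhour)
```

With this labelling the lone 00:30 observation forms its own bin, because it is the end point of the interval that starts at 00:05.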

Extracting a point from ggplot and plot it

I am initially having the dataset as shown below:
ID A B Type Time Date
1 12 13 R 23:20 1-1-01
1 13 12 F 23:40 1-1-01
1 13 11 F 00:00 2-1-01
1 15 10 R 00:20 2-1-01
1 12 06 W 00:40 2-1-01
1 11 09 F 01:00 2-1-01
1 12 10 R 01:20 2-1-01
so on...
I tried to make the ggplot of the above dataset for A and B.
ggplot(data=dataframe, aes(x=A, y=B, colour = Type)) +geom_point()+geom_path()
Problem:
How do I add a subsetting variable that covers the first 24 hours after every 'F' point?
For the time being I have posted a data set that is continuous with respect to time, but my original data set is not. How can I make my data set continuous at intervals of 10 minutes? I have used the interpolation function xspline() on A and B, but I don't know how to make my data set continuous with respect to time.
The highlighted part shown below is what I am looking for, I want to extract this dataset and then plot a new ggplot:
From MarkusN plots this is what I am looking for:
Taking first point as 'F' point and traveling 24hrs from that point (Since there is no 24 hrs data set available here so it should produce like this) :
I've tried the following, maybe you can get an idea from here. I recommend you to first have a variable with the time ordered (either in minutes or hours, in this example I've used hours). Let's see if it helps
#a data set is built as an example
N = 100
set.seed(1)
dataframe = data.frame(A = cumsum(rnorm(N)),
B = cumsum(rnorm(N)),
Type = sample(c('R','F','W'), size = N,
prob = c(5/7,1/7,1/7), replace=T),
time.h = seq(0,240,length.out = N))
# here, a list with dataframes is built with the sequences
l_dfs = lapply(which(dataframe$Type == 'F'), function(i, .data){
transform(subset(.data[i:nrow(.data),], (time.h - time.h[1]) <= 24),
t0 = sprintf('t0=%4.2f', time.h[1]))
}, dataframe)
ggplot(data=do.call('rbind', l_dfs), aes(x=A, y=B, colour=Type)) +
geom_point() + geom_path(colour='black') + facet_wrap(~t0)
First I created sample data. Hope it's similar to your problem:
df = data.frame(id=rep(1:9), A=c(12,13,13,14,12,11,12,11,10),
B=c(13,12,10,12,6,9,10,11,12),
Type=c("F","R","F","R","W","F","R","F","R"),
datetime=as.POSIXct(c("2015-01-01 01:00:00","2015-01-01 22:50:00",
"2015-01-02 08:30:00","2015-01-02 23:00:00",
"2015-01-03 14:10:00","2015-01-05 16:30:00",
"2015-01-05 23:00:00","2015-01-06 17:00:00",
"2015-01-07 23:00:00")),
stringsAsFactors = F)
Your first question is to plot the data, highlighting the first 24h after an F-point. I used dplyr and ggplot for this task.
library(dplyr)
library(ggplot2)
df %>%
mutate(nf = cumsum(Type=="F")) %>% # build F-to-F groups
group_by(nf) %>%
mutate(first24h = as.numeric((datetime-min(datetime)) < (24*3600))) %>% # find the first 24h of each F-group
mutate(lbl=paste0(row_number(),"-",Type)) %>%
ggplot(aes(x=A, y=B, label=lbl)) +
geom_path(aes(colour=first24h)) + scale_size(range = c(1, 2)) +
geom_text()
The problem here is that the colour only changes at some points. One thing I'm not happy with is the use of different line colours for path sections: if first24h is a discrete variable, geom_path draws two separate paths. That's why I defined the variable as numeric. Maybe someone can improve this?
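The grouping logic used in the pipeline above (count the F points, then measure the time since the first row of each group) can also be written in base R, which makes it easy to inspect the flags before plotting. This is a minimal sketch on the sample data, assuming UTC timestamps; `first_day` is a name chosen for the example.

```r
df <- data.frame(
  Type = c("F", "R", "F", "R", "W", "F", "R", "F", "R"),
  datetime = as.POSIXct(c("2015-01-01 01:00:00", "2015-01-01 22:50:00",
                          "2015-01-02 08:30:00", "2015-01-02 23:00:00",
                          "2015-01-03 14:10:00", "2015-01-05 16:30:00",
                          "2015-01-05 23:00:00", "2015-01-06 17:00:00",
                          "2015-01-07 23:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Every "F" row starts a new group; rows before the first "F" would get group 0.
df$nf <- cumsum(df$Type == "F")

# Seconds elapsed since the first row of each group, then the 24-hour flag.
group_start <- ave(as.numeric(df$datetime), df$nf, FUN = min)
df$first24h <- (as.numeric(df$datetime) - group_start) < 24 * 3600

# Keep only rows that fall within 24 hours after an F point.
first_day <- df[df$nf > 0 & df$first24h, ]
print(first_day)
```

Rows 5 and 9 are dropped here because they lie more than 24 hours after the F point that opens their group.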
Your second question about an interpolation can easily be solved with the zoo package:
library(zoo)
full.time = seq(df$datetime[1], tail(df$datetime, 1), by=600) # new timeline with point at every 10 min
d.zoo = zoo(df[,2:3], df$datetime) # convert to zoo object
d.full = as.data.frame(na.approx(d.zoo, xout=full.time)) # interpolate; result is also a zoo object
d.full$datetime = as.POSIXct(rownames(d.full))
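If zoo is not available, linear interpolation onto a 10-minute grid can be sketched with base R's approx(). The example below interpolates column A of the sample data only; the timestamps are assumed to be in UTC, and `grid` and `A_full` are names made up for the illustration.

```r
datetime <- as.POSIXct(c("2015-01-01 01:00:00", "2015-01-01 22:50:00",
                         "2015-01-02 08:30:00", "2015-01-02 23:00:00",
                         "2015-01-03 14:10:00", "2015-01-05 16:30:00",
                         "2015-01-05 23:00:00", "2015-01-06 17:00:00",
                         "2015-01-07 23:00:00"), tz = "UTC")
A <- c(12, 13, 13, 14, 12, 11, 12, 11, 10)

# Regular timeline with one point every 600 seconds (10 minutes).
grid <- seq(datetime[1], datetime[length(datetime)], by = 600)

# approx() works on numbers, so interpolate over seconds since the epoch.
interp <- approx(x = as.numeric(datetime), y = A, xout = as.numeric(grid))
A_full <- data.frame(datetime = grid, A = interp$y)

head(A_full)
```

Column B can be handled the same way with a second call to approx().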
With these two data frames combined, you get the solution. Every F-to-F section is drawn in a separate plot, and only the points that lie within 24 hours after the F point are shown.
df %>%
select(Type, datetime) %>%
right_join(d.full, by="datetime") %>%
mutate(Type = ifelse(is.na(Type),"",Type)) %>%
mutate(nf = cumsum(Type=="F")) %>%
group_by(nf) %>%
mutate(first24h = (datetime-min(datetime)) < (24*3600)) %>%
filter(first24h) %>%  # keep only rows within 24h of the F point
mutate(lbl=paste0(row_number(),"-",Type)) %>%
ggplot(aes(x=A, y=B, label=Type)) +
geom_path() + geom_text() + facet_wrap(~ nf)

R Need to extract month and assign season [duplicate]

This question already has answers here:
Find which season a particular date belongs to
(11 answers)
Closed 8 years ago.
I am using R, and I need to set up a loop (I think) where I extract the month from the date and assign a season. I would like to assign winter to months 12, 1, 2; spring to 3, 4, 5; summer to 6, 7, 8; and fall to 9, 10, 11. I have a subset of the data below. I am awful with loops and couldn't figure it out. Also for the date, I wasn't sure how packages like lubridate would work
"","UT_TDS_ID_2011.Monitoring.Location.ID","UT_TDS_ID_2011.Activity.Start.Date","UT_TDS_ID_2011.Value","UT_TDS_ID_2011.Season"
"1",4930585,"7/28/2010 0:00",196,""
"2",4933115,"4/21/2011 0:00",402,""
"3",4933115,"7/23/2010 0:00",506,""
"4",4933115,"6/14/2011 0:00",204,""
"8",4933115,"12/3/2010 0:00",556,""
"9",4933157,"11/18/2010 0:00",318,""
"10",4933157,"11/6/2010 0:00",328,""
"11",4933157,"7/23/2010 0:00",290,""
"12",4933157,"6/14/2011 0:00",250,""
Regarding the subject/title of the question, it's actually possible to do this without extracting the month. The first two solutions below do not extract the month. There is also a third solution which does extract the month, but only to increment it.
1) as.yearqtr/as.yearmon Convert the dates to year/month and add one month (1/12). Then the calendar quarters correspond to the seasons so convert to year/quarter, yq, and label the quarters as shown:
library(zoo)
yq <- as.yearqtr(as.yearmon(DF$dates, "%m/%d/%Y") + 1/12)
DF$Season <- factor(format(yq, "%q"), levels = 1:4,
labels = c("winter", "spring", "summer", "fall"))
giving:
dates Season
1 7/28/2010 summer
2 4/21/2011 spring
3 7/23/2010 summer
4 6/14/2011 summer
5 12/3/2010 winter
6 11/18/2010 fall
7 11/6/2010 fall
8 7/23/2010 summer
9 6/14/2011 summer
1a) A variation of this is to use chron's quarters which produces a factor so that levels=1:4 does not have to be specified. To use chron replace the last line in (1) with:
library(chron)
DF$Season <- factor(quarters(as.chron(yq)),
labels = c("winter", "spring", "summer", "fall"))
chron could also be used in conjunction with the remaining solutions.
2) cut. This solution only uses base R. First convert the dates to the first of the month using cut and add 32 to get a date in the next month, d. The quarters corresponding to d are the seasons, so compute the quarters using quarters and construct the labels in the same fashion as in the first solution:
d <- as.Date(cut(as.Date(DF$dates, "%m/%d/%Y"), "month")) + 32
DF$Season <- factor(quarters(d), levels = c("Q1", "Q2", "Q3", "Q4"),
labels = c("winter", "spring", "summer", "fall"))
giving the same answer.
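Why adding 32 works: the first of any month plus 32 days always lands in the following month, since no month is longer than 31 days and no two consecutive months together are shorter than 33. A small sketch on four of the dates makes the shift visible; the intermediate names `month_start` and `shifted` are assumptions for this example.

```r
dates <- as.Date(c("7/28/2010", "4/21/2011", "12/3/2010", "11/18/2010"),
                 format = "%m/%d/%Y")

# cut() by "month" labels each date with the first day of its month.
month_start <- as.Date(cut(dates, "month"))

# First of the month + 32 days is always a day early in the NEXT month.
shifted <- month_start + 32
print(data.frame(dates, month_start, shifted))

# Calendar quarters of the shifted dates line up with the seasons.
seasons <- factor(quarters(shifted), levels = c("Q1", "Q2", "Q3", "Q4"),
                  labels = c("winter", "spring", "summer", "fall"))
print(seasons)
```

December 2010 is shifted into January 2011, so it falls in Q1 and is labelled winter together with January and February.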
3) POSIXlt This solution also uses only the base of R:
p <- as.POSIXlt(as.Date(DF$dates, "%m/%d/%Y"))
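p$mday <- 1                 # day of month is stored in the "mday" field
p$mon <- (p$mon + 1) %% 12  # "mon" is 0-based; shift one month and wrap December to January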
DF$Season <- factor(quarters(p), levels = c("Q1", "Q2", "Q3", "Q4"),
labels = c("winter", "spring", "summer", "fall"))
Note 1: We could optionally omit levels= in all these solutions if we knew that every season appears.
Note 2: We used this data frame:
DF <- data.frame(dates = c('7/28/2010', '4/21/2011', '7/23/2010',
'6/14/2011', '12/3/2010', '11/18/2010', '11/6/2010', '7/23/2010',
'6/14/2011'))
Using only base R, you can convert the "datetime" column to "Date" class (as.Date(..)), extract the month (format(..., '%m')) and change the character value to numeric (as.numeric(..)). Create an "indx" vector that has the values "1" to "12" as names, with each name pointing to its season (setNames(..)), and use this to look up the corresponding "Season" for the "months" vector.
months <- as.numeric(format(as.Date(df$datetime, '%m/%d/%Y'), '%m'))
indx <- setNames( rep(c('winter', 'spring', 'summer',
'fall'),each=3), c(12,1:11))
df$Season <- unname(indx[as.character(months)])
df
# datetime Season
#1 7/28/2010 0:00 summer
#2 4/21/2011 0:00 spring
#3 7/23/2010 0:00 summer
#4 6/14/2011 0:00 summer
#5 12/3/2010 0:00 winter
#6 11/18/2010 0:00 fall
#7 11/6/2010 0:00 fall
#8 7/23/2010 0:00 summer
#9 6/14/2011 0:00 summer
Or, as @Roland mentioned in the comments, you can use strptime to convert the "datetime" to "POSIXlt" and extract the month ($mon):
months <- strptime(df$datetime, format='%m/%d/%Y %H:%M')$mon +1
and use the same method as above
data
df <- data.frame(datetime = c('7/28/2010 0:00', '4/21/2011 0:00',
'7/23/2010 0:00', '6/14/2011 0:00', '12/3/2010 0:00', '11/18/2010 0:00',
'11/6/2010 0:00', '7/23/2010 0:00', '6/14/2011 0:00'),stringsAsFactors=FALSE)
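The named lookup vector can also be replaced by a little integer arithmetic, since each season covers three consecutive months once December is wrapped around to position 0. The following is a hedged sketch on the same data; `season_of` is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: map month numbers 1..12 to seasons.
# month %% 12 sends December to 0, so 0-2 = winter, 3-5 = spring,
# 6-8 = summer and 9-11 = fall after integer division by 3.
season_of <- function(month) {
  c("winter", "spring", "summer", "fall")[(month %% 12) %/% 3 + 1]
}

datetime <- c("7/28/2010 0:00", "4/21/2011 0:00", "7/23/2010 0:00",
              "6/14/2011 0:00", "12/3/2010 0:00", "11/18/2010 0:00",
              "11/6/2010 0:00", "7/23/2010 0:00", "6/14/2011 0:00")

# $mon is 0-based in POSIXlt, hence the + 1.
months <- strptime(datetime, format = "%m/%d/%Y %H:%M", tz = "UTC")$mon + 1
seasons <- season_of(months)
print(data.frame(datetime, seasons))
```

Both approaches are vectorised, so no explicit loop over the rows is needed.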
