R ggplot: how to expand the interval on the horizontal axis

I have a dataset with 100 timestamp points. When I plot the chart, the horizontal axis shows every time point, so the labels all overlap. How can I show only some regularly spaced time points on the horizontal axis rather than all of them?
EU
T_DCEP DCEP
1 05/02/2016 1:28 1.14596
2 05/02/2016 1:39 1.14684
3 05/02/2016 2:04 1.14488
4 05/02/2016 3:15 1.14820
5 05/02/2016 3:34 1.14750
6 05/02/2016 4:40 1.14915
7 05/02/2016 4:56 1.14849
8 05/02/2016 5:22 1.14913
9 05/02/2016 5:55 1.14761
10 05/02/2016 6:07 1.14821
. ... ..
My code:
ggplot(EU,aes(T_DCEP,DCEP, group = 1)) + geom_line()+geom_point()

The class of the variables matters when plotting. Convert the time column to a valid date-time class to solve the problem:
#Example data
set.seed(451)
dates <- seq(Sys.time()-99, Sys.time(), length.out=100)
df <- data.frame(x=dates, y=rnorm(100))
head(df)
# x y
# 1 2016-11-04 09:49:09 -0.9431540
# 2 2016-11-04 09:49:10 0.7257408
# 3 2016-11-04 09:49:11 2.5257787
# 4 2016-11-04 09:49:12 1.1916054
# 5 2016-11-04 09:49:13 3.1091791
# 6 2016-11-04 09:49:14 0.2848636
class(df$x)
# [1] "POSIXct" "POSIXt"
This example will plot correctly because it is a proper date-time class.
ggplot(df, aes(x, y, group=1)) + geom_point() + geom_line()
But if I did not have a proper date-time class, it would look like your example.
df$x <- as.character(dates)
ggplot(df, aes(x, y, group=1)) + geom_point() + geom_line()
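Applied to the data in the question, a minimal sketch could look like the following. It assumes the T_DCEP strings are in day/month/year order (the sample is ambiguous, so swap %d and %m if they are month/day/year), uses UTC as a placeholder time zone, and picks 1-hour breaks purely for illustration:

```r
library(ggplot2)

# First rows from the question, re-typed as character strings
EU <- data.frame(
  T_DCEP = c("05/02/2016 1:28", "05/02/2016 1:39", "05/02/2016 2:04",
             "05/02/2016 3:15", "05/02/2016 3:34", "05/02/2016 4:40"),
  DCEP = c(1.14596, 1.14684, 1.14488, 1.14820, 1.14750, 1.14915),
  stringsAsFactors = FALSE
)

# Assumption: day/month/year order; UTC is a placeholder time zone
EU$T_DCEP <- as.POSIXct(EU$T_DCEP, format = "%d/%m/%Y %H:%M", tz = "UTC")

p <- ggplot(EU, aes(T_DCEP, DCEP)) +
  geom_line() +
  geom_point() +
  # One labelled tick per hour instead of one per observation
  scale_x_datetime(date_breaks = "1 hour", date_labels = "%H:%M")
print(p)
```

Once the column is POSIXct, the group = 1 workaround is no longer needed, and date_breaks can be changed to any interval such as "30 min" or "2 hours".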

Related

plot a categorical variable based on two other variables

So I have a data.frame which contains the columns date, price and a categorical variable.
> head(join)
date e5 near_motorway
1 2019-01-01 05:00:12 1.449 1
2 2019-01-01 05:00:12 1.439 1
3 2019-01-01 05:03:06 1.439 0
4 2019-01-01 05:03:06 1.439 1
5 2019-01-01 05:03:06 1.449 0
6 2019-01-01 05:03:06 1.449 1
I want to draw two lines in one plot based on the categorical variable, with the hour of the date on the x axis and the price on the y axis.
Does anybody have a solution?
This should work:
library(ggplot2)
ggplot(data = join,
aes(x = date, y = e5, col = near_motorway, group = near_motorway)) +
geom_line()
I am assuming that date is in a date-time format, e5 is numeric and near_motorway is a factor, and that e5 is the price.
To customize the graph you can play with scale_x_datetime and scale_colour_manual, and with your preferred theme.
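If the x axis should show the hour rather than the full timestamp, one possible sketch is to extract the hour and average the price per hour and group first. The six rows are re-typed from the question; the hour column, the averaging step and the UTC time zone are assumptions made for illustration:

```r
library(ggplot2)

# Sample rows from the question
join <- data.frame(
  date = as.POSIXct(c("2019-01-01 05:00:12", "2019-01-01 05:00:12",
                      "2019-01-01 05:03:06", "2019-01-01 05:03:06",
                      "2019-01-01 05:03:06", "2019-01-01 05:03:06"),
                    tz = "UTC"),
  e5 = c(1.449, 1.439, 1.439, 1.439, 1.449, 1.449),
  near_motorway = c(1, 1, 0, 1, 0, 1)
)

# Hour of day as an integer, grouping variable as a factor
join$hour <- as.integer(format(join$date, "%H"))
join$near_motorway <- factor(join$near_motorway)

# Mean price per hour and group (assumed aggregation)
hourly <- aggregate(e5 ~ hour + near_motorway, data = join, FUN = mean)

p <- ggplot(hourly, aes(x = hour, y = e5,
                        colour = near_motorway, group = near_motorway)) +
  geom_line() +
  geom_point()
print(p)
```

With only one hour in the sample, each group has a single point, so the lines only appear once the full data set spans several hours.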

How do I only have x-axis labels that specify when the year is changed in R?

I've downloaded a couple of .csv files, and they look something like this, just a lot longer and the date continues until 2020-03-13.
Date Open High Low Close Adj.Close Volume
1 2015-03-13 2064.56 2064.56 2041.17 2053.40 2053.40 3498560000
2 2015-03-16 2055.35 2081.41 2055.35 2081.19 2081.19 3295600000
3 2015-03-17 2080.59 2080.59 2065.08 2074.28 2074.28 3221840000
4 2015-03-18 2072.84 2106.85 2061.23 2099.50 2099.50 4128210000
5 2015-03-19 2098.69 2098.69 2085.56 2089.27 2089.27 3305220000
6 2015-03-20 2090.32 2113.92 2090.32 2108.10 2108.10 5554120000
I've created a data frame that looks like this based on the data
Date t SandP AMD
1 0 1 0.000000000 0.000000000
2 2015-03-16 2 0.013442909 0.003629768
3 2015-03-17 3 -0.003325698 0.003616640
4 2015-03-18 4 0.012085102 -0.007246409
5 2015-03-19 5 -0.004884489 -0.003642991
6 2015-03-20 6 0.008972382 0.021661497
I am trying to graph the SandP and AMD columns on the same axis, however I only want the axis labels to show each year (when each year changes). Therefore I would only want the 6 ticks on the axis (2015,2016,2017,2018,2019,2020).
If it helps, the .csv files were downloaded from Yahoo Finance data for S&P500.
This is my code up to now:
SPdata <- read.csv("^GSPC.csv")
AMDdata <- read.csv("AMD.csv")
head(SPdata)
R_t <- function(t){
S=log(SPdata[t,6])-log(SPdata[t-1,6])
return(S)
}
S_t <- function(t){
S=log(AMDdata[t,6])-log(AMDdata[t-1,6])
return(S)
}
comparedata <- data.frame(0,1,0,0)
names(comparedata)[1]<-"Date"
names(comparedata)[2]<-"t"
names(comparedata)[3]<-"SandP"
names(comparedata)[4]<-"AMD"
t<-2
while(t<1260){
comparedata <-rbind(comparedata, list(AMDdata[t,1],t,R_t(t),S_t(t)))
t=t+1
}
# install.packages("ggplot2")
library("ggplot2")
ggplot() +
geom_line(data=comparedata, aes(x=Date,y=SandP),color="red",group=1)+
geom_line(data=comparedata, aes(x=Date,y=AMD), color="blue",group=1)+
labs(x="Date",y="Returns")
I think you need to use scale_x_date and set the arguments date_breaks and date_labels (see the official documentation: https://ggplot2.tidyverse.org/reference/scale_date.html).
Here, I recreate an example using the small portion of the data you provided:
library(lubridate)
date <- seq(ymd("2015-03-16"), ymd("2020-03-13"), by = "day")
df <- data.frame(date = date,
t = 1:1825,
SandP = rnorm(1825),
AMD = rnorm(1825))
Starting from this example, I reshape the data frame into a longer format using the pivot_longer function from tidyr:
library(tidyr)
DF <- df %>% pivot_longer(cols = c(SandP, AMD), names_to = "indices", values_to = "values")
# A tibble: 3,650 x 4
date t indices values
<date> <int> <chr> <dbl>
1 2015-03-16 1 SandP 0.566
2 2015-03-16 1 AMD -0.185
3 2015-03-17 2 SandP -1.59
4 2015-03-17 2 AMD 0.236
5 2015-03-18 3 SandP 1.11
6 2015-03-18 3 AMD -1.52
7 2015-03-19 4 SandP -1.02
8 2015-03-19 4 AMD 0.0833
9 2015-03-20 5 SandP 2.78
10 2015-03-20 5 AMD -0.173
# … with 3,640 more rows
Then, I plot both indices according to the date using ggplot2:
library(ggplot2)
ggplot(DF, aes(x = date, y = values, color = indices))+
geom_line()+
labs(x="Date",y="Returns")+
scale_x_date(date_breaks = "year", date_labels = "%Y")
Does this look like what you are trying to achieve?
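Note that scale_x_date only works when the x column has class Date. As a rough sketch, the log returns from the question can also be built without the while loop, which keeps the Date column intact. The prices below are the first Adj.Close values from the question's S&P sample; the data frame names are hypothetical:

```r
# Stand-in for SPdata: first Adj.Close values shown in the question
prices <- data.frame(
  Date = c("2015-03-13", "2015-03-16", "2015-03-17", "2015-03-18"),
  Adj.Close = c(2053.40, 2081.19, 2074.28, 2099.50),
  stringsAsFactors = FALSE
)

# diff(log(x)) gives log(x[t]) - log(x[t-1]) for every t in one call
returns <- data.frame(
  Date = as.Date(prices$Date[-1]),  # a return exists from the second day on
  SandP = diff(log(prices$Adj.Close))
)

class(returns$Date)
returns
```

The same diff(log(...)) call on the AMD prices would give the second column, and the resulting data frame can be passed to pivot_longer and scale_x_date as above.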

Extract day and month from date

I'm trying to extract only the day and the month from as.POSIXct entries in a dataframe to overlay multiple years of data from the same months in a ggplot.
I have the data as time-series objects ts.
data.ts<-read.zoo(data, format = "%Y-%m-%d")
ts<-SMA(data.ts[,2], n=10)
df<-data.frame(date=as.POSIXct(time(ts)), value=ts)
ggplot(df, aes(x=date, y=value),
group=factor(year(date)), colour=factor(year(date))) +
geom_line() +
labs(x="Month", colour="Year") +
theme_classic()
Now, obviously if I only use "date" in aes, it'll plot the normal time-series as a consecutive sequence across the years. If I do "day(date)", it'll group by day on the x-axis. How do I pull out day AND month from the date? I only found yearmon(). If I try as.Date(df$date, format="%d %m"), it's not doing anything and if I show the results of the command, it would still include the year.
data:
> data
Date V1
1 2017-02-04 113.26240
2 2017-02-05 113.89059
3 2017-02-06 114.82531
4 2017-02-07 115.63410
5 2017-02-08 113.68569
6 2017-02-09 115.72382
7 2017-02-10 114.48750
8 2017-02-11 114.32556
9 2017-02-12 113.77024
10 2017-02-13 113.17396
11 2017-02-14 111.96292
12 2017-02-15 113.20875
13 2017-02-16 115.79344
14 2017-02-17 114.51451
15 2017-02-18 113.83330
16 2017-02-19 114.13128
17 2017-02-20 113.43267
18 2017-02-21 115.85417
19 2017-02-22 114.13271
20 2017-02-23 113.65309
21 2017-02-24 115.69795
22 2017-02-25 115.37587
23 2017-02-26 114.64885
24 2017-02-27 115.05736
25 2017-02-28 116.25590
If I create a new column with only day and month
df$day<-format(df$date, "%m/%d")
ggplot(df, aes(x=day, y=value),
group=factor(year(date)), colour=factor(year(date))) +
geom_line() +
labs(x="Month", colour="Year") +
theme_classic()
I get such a graph for the two years.
I want it to look like this, only with daily data instead of monthly.
ggplot: Multiple years on same plot by month
You are almost there. As you want to overlay day and month based on every year, we need a continuous variable. "Day of the year" does the trick for you.
data <-data.frame(Date=c(Sys.Date()-7,Sys.Date()-372,Sys.Date()-6,Sys.Date()-371,
Sys.Date()-5,Sys.Date()-370,Sys.Date()-4,Sys.Date()-369,
Sys.Date()-3,Sys.Date()-368),V1=c(113.23,123.23,121.44,111.98,113.5,114.57,113.44, 121.23, 122.23, 110.33))
data$year = format(as.Date(data$Date), "%Y")
data$Date = as.numeric(format(as.Date(data$Date), "%j"))
ggplot(data=data, mapping=aes(x=Date, y=V1, shape = year, color = year)) + geom_point() + geom_line() +
theme_bw()
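With day of the year on the x axis, the tick labels are plain numbers. One way to show them as month-day is to convert each number back to a date in a reference year. This is only a sketch: the sample values are made up, doy_label is a hypothetical helper, and the non-leap reference year 2001 is an assumption (days after 29 February shift by one in leap years):

```r
library(ggplot2)

# Hypothetical sample covering the same days in two years
data <- data.frame(
  Date = as.Date(c("2016-02-04", "2016-02-05", "2016-02-06",
                   "2017-02-04", "2017-02-05", "2017-02-06")),
  V1 = c(113.3, 113.9, 114.8, 115.6, 113.7, 115.7)
)
data$year <- format(data$Date, "%Y")
data$doy <- as.numeric(format(data$Date, "%j"))  # day of the year, 1-366

# Map a day-of-year number to "month-day" using a non-leap reference year
doy_label <- function(d) {
  format(as.Date(d - 1, origin = "2001-01-01"), "%m-%d")
}

p <- ggplot(data, aes(x = doy, y = V1, colour = year, group = year)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(labels = doy_label) +
  labs(x = "Month-day", colour = "Year") +
  theme_bw()
print(p)
```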

Issue to have correct scale with date ggplot

I have two dataframes tur_e and tur_w. Below you can see the data frame:
tur_e:
Time_f turbidity_E
1 2014-12-12 00:00:00 87
2 2014-12-12 00:15:00 87
3 2014-12-12 00:30:00 91
4 2014-12-12 00:45:00 84
5 2014-12-12 01:00:00 92
6 2014-12-12 01:15:00 89
tur_w:
Time_f turbidity_w
47 2015-06-04 11:45:00 8.4
48 2015-06-04 12:00:00 10.5
49 2015-06-04 12:15:00 9.2
50 2015-06-04 12:30:00 9.1
51 2015-06-04 12:45:00 8.7
52 2015-06-04 13:00:00 8.4
I then create a unique dataframe combining turbidity_E and turbidity_w. I match with the date (time_f) and use melt to reshape data:
dplr <- left_join(tur_e, tur_w, by=c("Time_f"))
dt.df <- melt(dplr, measure.vars = c("turbidity_E", "turbidity_w"))
I plotted series of box plot over time. The code is below:
dt.df%>% mutate(Time_f = ymd_hms(Time_f)) %>%
ggplot(aes(x = cut(Time_f, breaks="month"), y = value)) +
geom_boxplot(outlier.size = 0.3) + facet_wrap(~variable, ncol=1)+labs(x = "time")
I obtain the following graph:
I would like to reduce the number of dates that appear in my x-axis. I add this line of code:
scale_x_date(breaks = date_breaks("6 months"),labels = date_format("%b"))
I got this following error:
Error: Invalid input: date_trans works with objects of class Date only
I tried a lot of different solutions but none of them worked. Any help would be appreciated! Thanks!
Two things. First, you need to use scale_x_datetime (you don't have only dates, but also time!). Secondly, when you cut x, it actually just becomes a factor, losing any sense of time altogether. If you want a boxplot of each month, you can group by that cut instead:
dt.df %>% mutate(Time_f = lubridate::ymd_hms(Time_f)) %>%
ggplot(aes(x = Time_f, y = value, group = cut(Time_f, breaks="month"))) +
geom_boxplot(outlier.size = 0.3) +
facet_wrap(~variable, ncol = 1) +
labs(x = "time") +
scale_x_datetime(date_breaks = '1 month')
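The second point can be checked directly in base R. A small sketch with made-up timestamps (UTC is an assumed time zone):

```r
# Three timestamps in two different months
times <- as.POSIXct(c("2014-12-12 00:00:00", "2014-12-12 00:15:00",
                      "2015-01-03 08:30:00"), tz = "UTC")

by_month <- cut(times, breaks = "month")

class(times)     # a date-time class
class(by_month)  # a factor: the date-time information is gone
levels(by_month)
```

Because by_month is a factor, a date or datetime scale cannot be applied to it, which is why the cut result is used only for grouping while Time_f itself stays on the x axis.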

How to cut yearly time-based data into 36 parts with R?

I have a df like the following, covering 30 years up to 2015. I want to cut every month into three parts (days 1-10, 11-20 and 21-31) and average the values in each part (ten or fewer values). Thus, each month yields three values. How can I do it?
1993-01-29 28.92189
1993-02-01 29.12760
1993-02-02 29.18927
1993-02-03 29.49786
1993-02-04 29.62128
1993-02-05 29.60068
1993-02-08 29.60068
1993-02-09 29.39498
------
------
2015-08-18 209.92999
2015-08-19 208.28000
2015-08-20 204.01000
2015-08-21 197.63001
2015-08-24 189.55000
2015-08-25 187.23000
2015-08-26 194.67999
2015-08-27 199.16000
2015-08-28 199.24000
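The tryCatch calls guard against months with too few observations (for example, a series that starts late in the month); in that case the whole month is averaged into a single value. I will provide more info when I have time.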
library(xts)
dates<-seq(as.Date("1993-01-29"),as.Date("2015-08-25"),"days")
sample<-rnorm(length(dates))
tmpxts<-split.xts(xts(x = sample,order.by = dates),f = "months")
mxts<-lapply(tmpxts,function(x) {
tmp<-data.frame(val=tryCatch(c(mean(x[1:10]),mean(x[11:20]),mean(x[21:length(x)])),
error=function(e) matrix(mean(x),1)))
row.names(tmp)<-tryCatch(index(x[c(1,11,21)]),error=function(e) index(x[1]))
tmp
})
do.call(rbind,mxts)
This is a base solution that builds the cuts from an increasing sequence that cycles through years, months and your cut points at the 1st, 11th and 21st of each month. Note that cut.Date defaults to right=FALSE (intervals closed on the left), which starts each interval on the 1st, 11th and 21st. I used right=TRUE here, which closes each interval on the right, so a value dated exactly on a break (for example the 1st) is counted in the preceding interval; drop right=TRUE if you want exactly 1-10, 11-20 and 21-end:
tapply(dat$V2, cut.Date(dat$V1,
breaks=as.Date(
apply( expand.grid( c(1,11,21), 1:12, 1993:2015), 1,
function( x) paste(rev(x), collapse="-")) ), right=TRUE), FUN=mean)
1993-01-01 1993-01-11 1993-01-21 1993-02-01 1993-02-11 1993-02-21 1993-03-01
NA NA 29.02475 29.48412 NA NA NA
snipped many empty intervals
And the bottom of results included:
2015-07-21 2015-08-01 2015-08-11 2015-08-21 2015-09-01 2015-09-11 2015-09-21
NA NA 204.96250 193.97200 NA NA NA
2015-10-01 2015-10-11 2015-10-21 2015-11-01 2015-11-11 2015-11-21 2015-12-01
NA NA NA NA NA NA NA
2015-12-11
NA
The code below cuts each month separately into thirds, based on the number of days in each month.
library(dplyr)
library(lubridate)
library(ggplot2)
# Fake data
df = data.frame(date=seq.Date(as.Date("2013-01-01"),
as.Date("2013-03-31"), by="day"))
set.seed(394)
df$value = rnorm(nrow(df), sqrt(1:nrow(df)), 2)
# Cut months into thirds
df = df %>%
# Create a new column to group by Year-Month
mutate(yr_mon = paste0(year(date) , "_", month(date, label=TRUE, abbr=TRUE))) %>%
group_by(yr_mon) %>%
# Cut each month into thirds
mutate(cutMonth = cut(day(date),
breaks=c(0, round(1/3*n()), round(2/3*n()), n()),
labels=c("1st third","2nd third","3rd third")),
# Add yr_mon to cutMonth so that we have a unique group label for
# each third of each month
cutMonth = paste0(yr_mon, "\n", cutMonth)) %>%
ungroup() %>%
# Turn cutMonth into a factor with correct date ordering
mutate(cutMonth = factor(cutMonth, levels=unique(cutMonth)))
And here is the result:
# Show number of observations in each group
as.data.frame(table(df$cutMonth))
Var1 Freq
1 2013_Jan\n1st third 10
2 2013_Jan\n2nd third 11
3 2013_Jan\n3rd third 10
4 2013_Feb\n1st third 9
5 2013_Feb\n2nd third 10
6 2013_Feb\n3rd third 9
7 2013_Mar\n1st third 10
8 2013_Mar\n2nd third 11
9 2013_Mar\n3rd third 10
# Plot means by group (just to visualize the result of the date grouping operations)
ggplot(df, aes(cutMonth, value)) +
stat_summary(fun=mean, geom='point', size=4, colour="red") + # fun.y is deprecated since ggplot2 3.3.0
coord_cartesian(ylim=c(-0.2,10.2)) +
theme(axis.text.x = element_text(size=14))
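If the parts should be the fixed day ranges from the question (1-10, 11-20, 21 to end of month) rather than equal thirds, integer division on the day of the month is enough. A base R sketch on made-up data; the column names part and yr_mon are chosen here for illustration:

```r
# Fake daily data for three months
set.seed(1)
dat <- data.frame(date = seq(as.Date("2013-01-01"), as.Date("2013-03-31"),
                             by = "day"))
dat$value <- rnorm(nrow(dat))

# Day of month -> part 1 (days 1-10), 2 (days 11-20) or 3 (day 21 onwards);
# pmin() folds day 31 into part 3
day <- as.integer(format(dat$date, "%d"))
dat$part <- pmin((day - 1L) %/% 10L, 2L) + 1L
dat$yr_mon <- format(dat$date, "%Y-%m")

# Mean of each part within each month
res <- aggregate(value ~ yr_mon + part, data = dat, FUN = mean)
res <- res[order(res$yr_mon, res$part), ]
res
```

Months with missing trading days simply contribute fewer values to the affected part, so no special handling of short months is needed.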
