I'm new to R. I have daily time series data on sap flux and want to plot a line graph in R with the x-axis formatted as dates. My data file looks like this:
Date G1T0 G1T1 G1T2 G1T3
19-Jul-14 0.271081377 0.342416929 0.216215197 0.414495265
20-Jul-14 0.849117059 0.778333568 0.555856888 0.375737302
21-Jul-14 0.742855108 0.756373483 0.536025029 0.255169809
22-Jul-14 0.728504928 0.627172734 0.506561041 0.244863511
23-Jul-14 0.730702865 0.558290192 0.452253842 0.223213402
24-Jul-14 0.62732916 0.461480279 0.377567279 0.180328992
25-Jul-14 0.751401513 0.5404663 0.517567416 0.204342317
Please help me with a sample R script.
You can try this:
# install.packages("ggplot2")
# install.packages("scales")
library(ggplot2)
library(scales)
data$Date <- as.Date(data$Date, format = "%d-%b-%y")
ggplot(data = data, aes(x = Date)) +
geom_line(aes(x = Date, y = G1T0, col = "G1T0")) +
geom_line(aes(x = Date, y = G1T1, col = "G1T1")) +
geom_line(aes(x = Date, y = G1T2, col = "G1T2")) +
geom_line(aes(x = Date, y = G1T3, col = "G1T3")) +
scale_colour_discrete(name = "Group") +
labs(y = "Measurement", x = "Date")
This loads the packages needed for the plot (install them first if you don't have them), converts your Date column so that R treats the values as dates (the format argument matches your particular string pattern), and then calls ggplot to map your data.
Does this work for you?
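If you also want control over how the dates are printed on the x-axis, one option is scale_x_date. The sketch below is only an illustration: it uses the first three rows of the sample data, assumes an English locale (so that "%b" parses "Jul"), and reshapes the data to long format with tidyr::pivot_longer so a single geom_line covers all groups. The names sap, sap_long, Group and Flux are made up for this example.

```r
library(ggplot2)
library(tidyr)

# First three rows of the sample data from the question
sap <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Date G1T0 G1T1 G1T2 G1T3
19-Jul-14 0.271081377 0.342416929 0.216215197 0.414495265
20-Jul-14 0.849117059 0.778333568 0.555856888 0.375737302
21-Jul-14 0.742855108 0.756373483 0.536025029 0.255169809
")

# Parse the day-month-year strings (assumes an English locale for %b)
sap$Date <- as.Date(sap$Date, format = "%d-%b-%y")

# One row per date and group, so colour can be mapped to the group
sap_long <- pivot_longer(sap, cols = -Date,
                         names_to = "Group", values_to = "Flux")

p <- ggplot(sap_long, aes(x = Date, y = Flux, colour = Group)) +
  geom_line() +
  # date_breaks sets the tick spacing, date_labels the printed format
  scale_x_date(date_breaks = "1 day", date_labels = "%d %b") +
  labs(x = "Date", y = "Measurement")
p
```

Changing date_labels (for example to "%d-%m-%Y") changes only how the ticks are printed, not the underlying data.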
I am currently trying to generate NOAA tide prediction charts (x = datetime, y = water level) with the dawn/sunrise/dusk/sunset times as vertical lines along the x axis timeline.
The rnoaa package calls the data and gives me the prediction date times in POSIXct. The suncalc library provides me a data frame with each date in the range's sunrise, sunset, etc. in POSIXct format as well.
library(rnoaa)
library(tidyverse)
library(ggplot2)
library(suncalc)
march.tides <- as.data.frame(coops_search(station_name = 8551762,
begin_date = 20200301, end_date = 20200331,
datum = "mtl", product = "predictions"))
march.tides <- march.tides %>%
mutate(yyyy.mm.dd = as.Date(predictions.t))
dates <- unique(march.tides$yyyy.mm.dd)
sunlight.times <- getSunlightTimes(date = seq.Date(as.Date("2020/3/1"), as.Date("2020/3/31"), by = 1),
lat = 39.5817, lon = -75.5883, tz = "EST")
I then have a loop that spits out separate plots for each calendar date - which works hunky dory. The vertical lines are drawing on the graph without an error, but are definitely in the wrong spot (sunrise is being drawn around 11am when it should be 06:30).
for (i in seq_along(dates)) {
plot <- ggplot(subset(march.tides, march.tides$yyyy.mm.dd==dates[i])) +
aes(x = predictions.t, y = predictions.v) +
geom_line(size = 1L, colour = "#0c4c8a") +
theme_bw() +
geom_vline(xintercept = sunlight.times$sunrise) +
geom_vline(xintercept = sunlight.times$sunset) +
geom_vline(xintercept = sunlight.times$dawn, linetype="dotted") +
geom_vline(xintercept = sunlight.times$dusk, linetype="dotted") +
ggtitle(dates[i])
print(plot)
}
I could alternatively facet the separate dates instead of this looping approach. Even when I subset the data to a single date, the vertical lines still did not draw correctly.
I wondered if maybe the issue was a time zone one. If I try to stick a time zone argument onto the tide prediction data call, I get the error:
Error in if (!repeated && grepl("%[[:xdigit:]]{2}", URL, useBytes = TRUE)) return(URL) :
missing value where TRUE/FALSE needed
It looks like you want to use EST as your time zone, so you could pass it when converting predictions.t to a date.
I would also be explicit about what gets labelled on the x-axis by using scale_x_datetime, including the time zone.
library(rnoaa)
library(tidyverse)
library(ggplot2)
library(suncalc)
library(scales)
march.tides <- as.data.frame(coops_search(station_name = 8551762,
begin_date = 20200301, end_date = 20200331,
datum = "mtl", product = "predictions"))
march.tides <- march.tides %>%
mutate(yyyy.mm.dd = as.Date(predictions.t, tz = "EST"))
dates <- unique(march.tides$yyyy.mm.dd)
sunlight.times <- getSunlightTimes(date = seq.Date(as.Date("2020/3/1"), as.Date("2020/3/31"), by = 1),
lat = 39.5817, lon = -75.5883, tz = "EST")
for (i in seq_along(dates)) {
plot <- ggplot(subset(march.tides, march.tides$yyyy.mm.dd==dates[i])) +
aes(x = predictions.t, y = predictions.v) +
geom_line(size = 1L, colour = "#0c4c8a") +
theme_bw() +
geom_vline(xintercept = sunlight.times$sunrise) +
geom_vline(xintercept = sunlight.times$sunset) +
geom_vline(xintercept = sunlight.times$dawn, linetype="dotted") +
geom_vline(xintercept = sunlight.times$dusk, linetype="dotted") +
ggtitle(dates[i]) +
scale_x_datetime(labels = date_format("%b %d %H:%M", tz = "EST"))
print(plot)
}
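To see why the time zone matters, here is a small base R sketch. It assumes the offset in the question (sunrise drawn near 11:00 instead of 06:30) comes from the same instant being displayed in UTC rather than EST; the sunrise value is made up for illustration.

```r
# A hypothetical sunrise at 06:30 EST
sunrise <- as.POSIXct("2020-03-01 06:30:00", tz = "EST")

# The stored instant is the same; only the displayed clock time changes
est_label <- format(sunrise, "%H:%M", tz = "EST")
utc_label <- format(sunrise, "%H:%M", tz = "UTC")

print(est_label)  # "06:30"
print(utc_label)  # "11:30"
```

The five-hour shift matches the symptom described in the question, which is why setting the time zone in the axis labels lines things up.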
I'm trying to generate a stacked line/area graph using ggplot and geom_area. As far as I can tell, my data is loaded into R correctly. Every time I generate the plot, the graph is empty (although the axes look correct, except that the months are sorted alphabetically).
I've tried utilizing the data.frame function to define my variables but was unable to generate my plot. I've also looked around Stack Overflow and other websites, but no one seems to have the issue of no errors but still an empty plot.
Here's my data set:
Here's the code I'm using currently:
ggplot(OHV, aes(x=Month)) +
geom_area(aes(y=A+B+Unknown, fill="A")) +
geom_area(aes(y=B, fill="B")) +
geom_area(aes(y=Unknown, fill="Unknown"))
Here's the output at the end:
I have zero error messages, simply just no data being plotted on my graph.
Your dates are being interpreted as a factor. You must transform them.
library(tidyverse)
set.seed(1)
df <- data.frame(Month = seq(lubridate::ymd('2018-01-01'),
lubridate::ymd('2018-12-01'), by = '1 month'),
Unknown = sample(17, replace = TRUE, size = 12),
V1 = floor(runif(12, min = 35, max = 127)),
V2 = floor(runif(12, min = 75, max = 275)))
df <- df %>%
dplyr::mutate(Month = format(Month, '%b')) %>%
tidyr::gather(key = "Variable", value = "Value", -Month)
ggplot2::ggplot(df) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack')
Note that I used tidyr::gather to be able to stack the areas in an easier way.
Now, assuming your year of analysis is 2018, you need to convert the Month column of your data frame to something that R interprets as continuous, such as a Date.
df2 <- df %>%
dplyr::mutate(Month = paste0("2018-", Month, "-01"),
Month = lubridate::parse_date_time(Month,"y-b-d"),
Month = as.Date(Month))
library(scales)
ggplot2::ggplot(df2) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack') +
scale_x_date(labels = scales::date_format("%b"))
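The conversion from month abbreviations to dates can also be done in base R without relying on the locale. This is a sketch only: it assumes English three-letter abbreviations and the year 2018 mentioned above, and the variable names are made up for the example.

```r
# Hypothetical month labels as they might appear in the data
months <- c("Mar", "Jan", "Dec")

# month.abb is a built-in constant holding "Jan" ... "Dec";
# match() gives the position of each label, i.e. the month number
month_number <- match(months, month.abb)

# Build first-of-month dates for the assumed year 2018
dates <- as.Date(sprintf("2018-%02d-01", month_number))
print(dates)
```

Because the result is a Date, ggplot treats the x-axis as continuous and orders the months chronologically rather than alphabetically.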
I'm wondering if there is any easy way to change the name in a legend (given using the colour aesthetic) on a ggplot after the plot is created. I know this feels a bit hacky and would normally be changed in the data or when the plot is created, but I want to change the label on a plot that is created by another package, and there's no option in the package to change it.
I could obviously copy the function and save my own version and change it, but I just want to change one thing so it seems neater if I can just do it afterwards.
Here is an example with some dummy data, basically I want to relabel the Mean and Median timeseries that come out of fasstr's plot_daily_stats to "Modelled Mean" and "Modelled Median" so they cannot be confused with the observed mean which I am manually adding.
library(fasstr)
library(tibble)
library(ggplot2)
#create some fake data
df <- tibble(Date = seq.Date(from = as.Date("1991-01-01"), as.Date("1997-12-31"),
by = "day"),
DayOfYear = as.numeric(format(Date, "%j")),
Value = runif(2557,0,1) + 50 + (cos((1/60)*DayOfYear)+4))
obsdf <- tibble(Date = seq.Date(from = as.Date("1900-01-01"), as.Date("1900-12-31"),
by = "day"),
DayOfYear = as.numeric(format(Date, "%j")),
Value = runif(365,0,1) + 51 + (cos((1/60)*DayOfYear)+4))
# create plot using fasstr package
plt1<- fasstr::plot_daily_stats(df)
# add my own trace. I also want to rename the trace "Mean" to
# "Modelled Mean" to avoid confusion (and same with Median)
plt1$Daily_Statistics +
geom_line(data = obsdf, aes(x = Date, y = Value, colour = "Observed Mean"))+
scale_colour_manual(values = c("red", "black","blue"))
The names are given in fasstr as hard coded names:
daily_plots <- ... +
ggplot2::geom_line(ggplot2::aes(y = Median, colour = "Median")) +
ggplot2::geom_line(ggplot2::aes(y = Mean, colour = "Mean"))
No hacking needed, just add labels to your manual scale.
plt1$Daily_Statistics +
geom_line(data = obsdf, aes(x = Date, y = Value, colour = "Observed Mean"))+
scale_colour_manual(labels = c("Modelled Mean","Modelled Median","Observed Mean"),
values = c("red", "black","blue"))
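The labels above are matched to the legend entries by position, so they depend on the order of the breaks. A slightly more defensive sketch is to give breaks, labels and named colours explicitly. The plot below is a stand-in built from random data, not the fasstr output, and the object names are made up for the example.

```r
library(ggplot2)

set.seed(1)
d   <- data.frame(x = 1:10, Mean = runif(10), Median = runif(10))
obs <- data.frame(x = 1:10, Value = runif(10))

# Stand-in for a plot returned by another package with hard-coded names
base_plot <- ggplot(d, aes(x = x)) +
  geom_line(aes(y = Median, colour = "Median")) +
  geom_line(aes(y = Mean, colour = "Mean"))

# Old name -> new label, and old name -> colour
new_labels  <- c("Mean" = "Modelled Mean",
                 "Median" = "Modelled Median",
                 "Observed Mean" = "Observed Mean")
new_colours <- c("Mean" = "red", "Median" = "black", "Observed Mean" = "blue")

p <- base_plot +
  geom_line(data = obs, aes(y = Value, colour = "Observed Mean")) +
  scale_colour_manual(breaks = names(new_labels),
                      labels = unname(new_labels),
                      values = new_colours)
p
```

If the original plot already defines a colour scale, ggplot2 prints a message that the existing scale is being replaced; that is expected here.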
I am trying to make a graph with "time markers". These time markers are vertical lines for certain dates. Time data are POSIXct format. I would like to use the awesome interactive interface of Plotly and use my ggplot objects in it.
The problem is that these "time markers" don't show up after calling ggplotly(). I have already tried plotly::add_segments(), but it does not work.
Here are two reproducible examples:
1. With non-POSIXct data it works fine
# dummy dataset
df2 = data.frame(id = 1:10, measure = runif(10, 0, 20))
events2 = data.frame(number = c(2,3,8))
# ggplot graph
p2 = ggplot() + geom_line(data = df2, aes(x = id, y = measure)) +
geom_vline(data = events2, aes(xintercept = events2$number), color = "red")
p2
# plotly graph that displays the geom_vline properly
ggplotly(p2)
2. With POSIXct data it doesn't display the correct result
# dummy dataset
df = data.frame(date = seq(as.POSIXct("2017-07-01", tz = "UTC", format = "%Y-%m-%d"),
as.POSIXct("2018-04-15", tz = "UTC", format = "%Y-%m-%d"),
"1 month"),
measure = runif(10, 0, 20))
events = data.frame(date_envents = as.POSIXct(c("2017-10-12", "2017-11-12", "2018-03-15"), tz = "UTC", format = "%Y-%m-%d"))
# ggplot graph
p = ggplot() + geom_line(data = df, aes(x = date, y = measure)) +
geom_vline(data = events, aes(xintercept = events$date), color = "red")
p
# plotly graph that does not display the geom_vline properly
ggplotly(p)
I have seen a workaround (like this one: Add vertical line to ggplotly plot), but it is "complicated". Is there a simpler way to solve this problem?
I am using Windows 10 with R version 3.5.0, RStudio and the following packages :
library(tidyverse) and library(plotly)
A simple workaround is to convert the xintercept of the geom_vline to numeric.
sample data
df = data.frame(date = seq(as.POSIXct("2017-07-01", tz = "UTC", format = "%Y-%m-%d"),
as.POSIXct("2018-04-15", tz = "UTC", format = "%Y-%m-%d"),
"1 month"),
measure = runif(10, 0, 20))
events = data.frame(date_envents = as.POSIXct(c("2017-10-12", "2017-11-12", "2018-03-15"), tz = "UTC", format = "%Y-%m-%d"))
code
p = ggplot() + geom_line(data = df, aes(x = date, y = measure)) +
geom_vline(data = events, aes(xintercept = as.numeric(date_envents)), color = "red")
result
ggplotly(p)
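For context, as.numeric() on a POSIXct value returns the number of seconds since 1970-01-01 UTC, which is the representation a datetime axis works with internally. A minimal base R sketch of the round trip, using one of the event dates from the example:

```r
# One of the event dates from the example above
event <- as.POSIXct("2017-10-12", tz = "UTC")

# Seconds since the Unix epoch (1970-01-01 00:00:00 UTC)
event_num <- as.numeric(event)

# Converting back gives the original instant
back <- as.POSIXct(event_num, origin = "1970-01-01", tz = "UTC")

print(event_num)
print(back)
```

So the numeric xintercept points at the same position on the axis as the original date; only its class changes.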
I've the following dataset:
https://app.box.com/s/au58xaw60r1hyeek5cua6q20byumgvmj
I want to create a density plot based on the time of the day. Here is what I've done so far:
library("ggplot2")
library("scales")
library("lubridate")
timestamp_df$timestamp_time <- format(ymd_hms(hn_tweets$timestamp), "%H:%M:%S")
ggplot(timestamp_df, aes(timestamp_time)) +
geom_density(aes(fill = ..count..)) +
scale_x_datetime(breaks = date_breaks("2 hours"),labels=date_format("%H:%M"))
It gives the following error:
Error: Invalid input: time_trans works with objects of class POSIXct only
If I convert that to POSIXct, it adds dates to the data.
Update 1
The following converted data to 'NA'
timestamp_df$timestamp_time <- as.POSIXct(timestamp_df$timestamp_time, format = "%H:%M%:%S", tz = "UTC"
Update 2
Following is what I want to achieve:
One problem with the solutions posted here is that they ignore the fact that this data is circular/polar (i.e. 00hrs == 24hrs). You can see in the plots in the other answer that the ends of the charts don't match up with each other. This won't make much of a difference with this particular dataset, but for events that happen near midnight, this could be an extremely biased estimator of density. Here's my solution, taking into account the circular nature of time data:
# modified code from https://freakonometrics.hypotheses.org/2239
library(dplyr)
library(ggplot2)
library(lubridate)
library(circular)
df = read.csv("data.csv")
datetimes = df$timestamp %>%
lubridate::parse_date_time("%m/%d/%Y %H:%M")
times_in_decimal = lubridate::hour(datetimes) + lubridate::minute(datetimes) / 60
times_in_radians = 2 * pi * (times_in_decimal / 24)
# Doing this just for bandwidth estimation:
basic_dens = density(times_in_radians, from = 0, to = 2 * pi)
res = circular::density.circular(circular::circular(times_in_radians,
type = "angle",
units = "radians",
rotation = "clock"),
kernel = "wrappednormal",
bw = basic_dens$bw)
time_pdf = data.frame(time = as.numeric(24 * (2 * pi + res$x) / (2 * pi)), # Convert from radians back to 24h clock
likelihood = res$y)
p = ggplot(time_pdf) +
geom_area(aes(x = time, y = likelihood), fill = "#619CFF") +
scale_x_continuous("Hour of Day", labels = 0:24, breaks = 0:24) +
scale_y_continuous("Likelihood of Data") +
theme_classic()
Note that the values and slopes of the density plot match up at the 00h and 24h points.
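The hour-to-angle conversion used above can be illustrated on its own in base R. This is a sketch with made-up helper names (hours_to_radians, radians_to_hours) and two hypothetical events either side of midnight. It shows why treating clock time as a straight line misleads near 00h.

```r
# Map hours on a 24h clock to angles on a circle, and back
hours_to_radians <- function(h) 2 * pi * (h %% 24) / 24
radians_to_hours <- function(r) 24 * (r %% (2 * pi)) / (2 * pi)

# Hypothetical events at 23:00 and 01:00
late_events <- c(23, 1)

# Arithmetic mean treats the clock as a line and lands on noon
naive_mean <- mean(late_events)

# Circular mean averages the unit vectors instead
angles <- hours_to_radians(late_events)
circular_mean <- radians_to_hours(atan2(mean(sin(angles)), mean(cos(angles))))

print(naive_mean)     # 12
print(circular_mean)  # close to midnight (0 and 24 are the same point)
```

The same wrap-around is what lets the circular density estimate join up at the 00h and 24h ends.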
Here is one approach:
library(ggplot2)
library(lubridate)
library(scales)
df <- read.csv("data.csv") #given in OP
Convert character to POSIXct:
df$timestamp <- as.POSIXct(strptime(df$timestamp, "%m/%d/%Y %H:%M", tz = "UTC"))
library(hms)
Extract hour, minute and second:
df$time <- hms::hms(second(df$timestamp), minute(df$timestamp), hour(df$timestamp))
Convert to POSIXct again, since ggplot does not work with class hms:
df$time <- as.POSIXct(df$time)
ggplot(df, aes(time)) +
geom_density(fill = "red", alpha = 0.5) + #also play with adjust such as adjust = 0.5
scale_x_datetime(breaks = date_breaks("2 hours"), labels=date_format("%H:%M"))
To plot it scaled to 1:
ggplot(df) +
geom_density( aes(x = time, y = ..scaled..), fill = "red", alpha = 0.5) +
scale_x_datetime(breaks = date_breaks("2 hours"), labels=date_format("%H:%M"))
where ..scaled.. is a variable computed by stat_density during plot creation. In ggplot2 3.3.0 and later, after_stat(scaled) is the preferred spelling.
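For readers without the original data file, here is a self-contained sketch of the same idea using simulated timestamps. It assumes ggplot2 3.3.0 or later (for after_stat) and keeps everything in UTC; the variable names and the simulated data are made up for the example.

```r
library(ggplot2)

set.seed(42)
# Simulated timestamps spread over 30 days (stand-in for the real data)
stamps <- as.POSIXct("2020-01-01", tz = "UTC") + runif(200, 0, 30 * 86400)

# Seconds since midnight (valid because everything is in UTC)
secs <- as.numeric(stamps) %% 86400

# Anchor every time of day to the same dummy date so the axis spans one day
time_of_day <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
df_sim <- data.frame(time = time_of_day)

p <- ggplot(df_sim, aes(x = time)) +
  geom_density(aes(y = after_stat(scaled)), fill = "red", alpha = 0.5) +
  scale_x_datetime(date_breaks = "2 hours", date_labels = "%H:%M",
                   timezone = "UTC")
p
```

The dummy date never appears on the plot because date_labels prints only hours and minutes.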