ggplot time series plotting: group by dates

I would like to plot several time series on the same panel, instead of in separate panels. I took the R code below from another Stack Overflow post.
Note how the 3 time series are in 3 different panels. How can I layer the 3 time series on 1 panel, with each line in a different color?
Time = Sys.time()+(seq(1,100)*60+c(rep(1,100)*3600*24, rep(2, 100)*3600*24, rep(3, 100)*3600*24))
Value = rnorm(length(Time))
Group = c(0, cumsum(diff(Time) > 1))
library(ggplot2)
g <- ggplot(data.frame(Time, Value, Group)) +
geom_line (aes(x=Time, y=Value, color=Group)) +
facet_grid(~ Group, scales = "free_x")
If you run the above code, you get this:
When the facet_grid() part is eliminated, I get a graph that looks like this:
Basically, I would like ggplot to ignore the differences in the dates, and only consider the times. And then use group to identify the differing dates.
This problem could potentially be solved by creating a new column that contains only the times (e.g. 22:01, format="%H:%M"). However, when the as.POSIXct() function is used, I get a variable that contains both date and time. I can't seem to escape the date part.

Since the data file has different days for each group's time, one way to get all the groups onto the same plot is to just create a new variable, giving all groups the same "dummy" date but using the actual times collected.
experiment <- data.frame(Time, Value, Group) #creates a data frame
experiment$hms <- as.POSIXct(paste("2015-01-01", substr(experiment$Time, 12, 19))) # pastes dummy date 2015-01-01 onto the HMS of Time
Now that you have the times with all the same date, you then can plot them easily.
experiment$Grouping <- as.factor(experiment$Group) # ggplot needs Group to be a factor to give each line its own color
ggplot(experiment, aes(x=hms, y=Value, color=Grouping)) + geom_line(size=2)
Below is the resulting image (you can change/modify the basic plot as you see fit):
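A slightly more explicit variant of the same idea, as a sketch: it extracts the clock time with format() rather than substr(), and fixes the start time and time zone so the result is reproducible. The start value, the UTC time zone and the Day column name are assumptions made for this illustration, not part of the original data.

```r
library(ggplot2)

set.seed(42)

# Three 100-minute bursts on three consecutive days (hypothetical fixed start)
start <- as.POSIXct("2015-03-01 22:00:00", tz = "UTC")
Time  <- start + seq(1, 100) * 60 + rep(1:3, each = 100) * 86400

experiment <- data.frame(Time = Time, Value = rnorm(length(Time)))

# One factor level per calendar day, used to color the lines
experiment$Day <- factor(as.Date(Time, tz = "UTC"))

# Same clock time, but every row gets the same dummy date
experiment$hms <- as.POSIXct(
  paste("2015-01-01", format(Time, "%H:%M:%S")),
  tz = "UTC"
)

p <- ggplot(experiment, aes(x = hms, y = Value, color = Day)) +
  geom_line() +
  scale_x_datetime(date_labels = "%H:%M")  # hide the dummy date on the axis
p
```

Using format() keeps the extraction independent of how the date-time happens to be printed, and the axis labels show only hours and minutes, so the dummy date never appears on the plot.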

Related

How to properly plot a histogram with dates using ggplot?

I would like to create an interactive histogram with dates on the x-axis.
I have used ggplot+ggplotly.
I've read that I need to pass the proper information using the "text=as.character(mydates)" aesthetic and sometimes "tooltip=mytext".
This trick works for other kinds of plots, but there is a problem with histograms: instead of getting a single bar with a single value, I get many sub-bars stacked.
I guess the reason is that passing "text=as.character(fechas)" produces many values instead of just the class value defining that bar.
How can I solve this problem?
I have tried filtering the data myself, but I don't know how to make my parameters match the ones used by the histogram, such as where the dates start for each bar.
library(lubridate)
library(ggplot2)
library(plotly) # ggplotly() is provided by the plotly package
Ejemplo <- data.frame(fechas = dmy("1-1-20")+sample(1:100,100, replace=T),
valores=runif(100))
dibujo <- ggplot(Ejemplo, aes(x=fechas, text=as.character(fechas))) +
theme_bw() + geom_histogram(binwidth=7, fill="darkblue",color="black") +
labs(x="Fecha", y="Nº casos") +
theme(axis.text.x=element_text(angle=60, hjust=1)) +
scale_x_date(date_breaks = "weeks", date_labels = "%d-%m-%Y",
limits=c(dmy("1-1-20"), dmy("1-4-20")))
ggplotly(dibujo)
ggplotly(dibujo, tooltip = "text")
As you can see, the bars are not regular histogram bars but something complex.
Using just ggplot instead of ggplotly shows the same problem, though then you wouldn't need to use the extra "text" parameter.

Presently, feeding as.character(fechas) to the text = ... argument inside of aes() will display the relative counts of distinct dates within each bin. Note the height of the first bar is simply a count of the total number of dates between 6th of January and the 13th of January.
After a thorough reading of your question, it appears you want the maximum date within each weekly interval. In other words, one date should hover over each bar. If you're partial to converting ggplot objects into plotly objects, then I would advise pre-processing the data frame before feeding it to the ggplot() function. First, group by week. Second, pull the desired date by each weekly interval to show as text (i.e., end date). Next, feed this new data frame to ggplot(), but now layer on geom_col(). This will achieve similar output since you're grouping by weekly intervals.
library(dplyr)
library(lubridate)
library(ggplot2)
library(plotly)
set.seed(13)
Ejemplo <- data.frame(fechas = dmy("1-1-20") + sample(1:100, 100, replace = T),
valores = runif(100))
Ejemplo_stat <- Ejemplo %>%
arrange(fechas) %>%
filter(fechas >= ymd("2020-01-01"), fechas <= ymd("2020-04-01")) %>% # specify the limits manually
mutate(week = week(fechas)) %>% # create a week variable
group_by(week) %>% # group by week
summarize(total_days = n(), # number of rows in the week (use n_distinct(fechas) for distinct days)
last_date = max(fechas)) # pull the maximum date within each weekly interval
dibujo <- ggplot(Ejemplo_stat, aes(x = factor(week), y = total_days, text = as.character(last_date))) +
geom_col(fill = "darkblue", color = "black") +
labs(x = "Fecha", y = "Nº casos") +
theme_bw() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
scale_x_discrete(label = function(x) paste("Week", x))
ggplotly(dibujo) # add more text (e.g., week id, total unique dates, and end date)
ggplotly(dibujo, tooltip = "text") # only the end date is revealed
The "end date" is displayed once you hover over each bar, as requested. Note, the value "2020-01-12" is not the last day of the second week. It is the last date observed in the second weekly interval.
The benefit of the preprocessing approach is your ability to modify your grouped data frame, as needed. For example, feel free to limit the date range to a smaller (or larger) subset of weeks, or start your weeks on a different day of the week (e.g., Sunday). Furthermore, if you want more textual options to display, you could also display your total number of unique dates next to each bar, or even display the date ranges for each week.
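As a sketch of the last two suggestions, the version below starts each week on Monday and puts the week's date range and row count into the tooltip text. It assumes lubridate's floor_date() with its week_start argument; the names Ejemplo_rango, n_obs and label are made up for this illustration.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(13)
Ejemplo <- data.frame(
  fechas = dmy("1-1-20") + sample(1:100, 100, replace = TRUE)
)

Ejemplo_rango <- Ejemplo %>%
  # Snap every date to the Monday that opens its week
  mutate(week_start = floor_date(fechas, "week", week_start = 1)) %>%
  # One row per week with the number of observations in it
  count(week_start, name = "n_obs") %>%
  # Tooltip text: the week's date range plus the row count
  mutate(label = paste0(format(week_start), " to ",
                        format(week_start + days(6)),
                        " (", n_obs, " rows)"))

dibujo <- ggplot(Ejemplo_rango,
                 aes(x = week_start, y = n_obs, text = label)) +
  geom_col(fill = "darkblue", color = "black") +
  labs(x = "Fecha", y = "Nº casos") +
  theme_bw()

# Uncomment for the interactive version with one tooltip per bar:
# plotly::ggplotly(dibujo, tooltip = "text")
```

Because the x aesthetic is now a real date rather than a week number, the bars stay correctly ordered across a year boundary.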

Graphing different variables in the same graph R- ggplot2

I have several datasets, and my end goal is to make a graph from them, with each line representing the yearly variation of one item. I have joined and combined my data (originally in a per-month structure) into a table that contains only the yearly means for each item I want to graph (a column for the year, followed by the yearly variation for 4 different elements).
I have one factor, the year, and 4 different variables that record yearly variations, so I would like to graph them in the same space. I had the idea of joining the 4 columns into one, keyed by the factor (collapsing to one observation per row, with the year or factor alongside it), but I seem unable to do that. My thought is that this would give structure to my y axis. I would like some advice, and to know whether my approach to the problem is effective. I am trying ggplot2, but it does not seem to work without a defined (or predefined range for the) y axis. Thanks

I would suggest the following approach. You have to reshape your data from wide to long, as in the example below. That way all the variables can be shown together. As no data was provided, this solution is sketched using dummy data. You can also replace the lines with another geom, such as points:
library(tidyverse)
set.seed(123)
#Data
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))
#Plot
df %>% pivot_longer(-year) %>%
ggplot(aes(x=factor(year),y=value,group=name,color=name))+
geom_line()+
theme_bw()
Output:

We could use melt from reshape2 without loading multiple other packages
library(reshape2)
library(ggplot2)
ggplot(melt(df, id.var = 'year'), aes(x = factor(year), y = value,
group = variable, color = variable)) +
geom_line()
-output plot
Or with matplot from base R
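matplot(as.matrix(df[-1]), type = 'l', xaxt = 'n')
axis(1, at = seq_len(nrow(df)), labels = df$year) # label the x-axis with the years, since xaxt = 'n' suppresses it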
data
set.seed(123)
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))

Stop graph touching zero in ggplot geom_freqpoly function

I am creating a frequency plot using the geom_freqpoly function in ggplot2. I have a large data set of social media comments across 14 months and am plotting the number of comments for each week of that data. I am using this code, first converting the UTC timestamps to POSIXct and then doing the frequency plot:
ggplot(data = TRP) +
geom_freqpoly(mapping = aes(x = created_utc), binwidth = 604800)
This is creating a plot that looks like this:
However, I want to top and tail the plot, as it touches 'zero' at both the start and end, making it look like there was rapid growth and rapid decline. This is not the case, as this is simply a snapshot of the data, which exists before and after my analysis. The data begins at the 4,000 mark and ends at around 2,000, and I want it represented like that. I have checked the 'pad' argument and have ensured it is set to FALSE.
Any help as to why this may be occurring would be greatly appreciated.
Thanks!

Rather than adjusting the geom_freqpoly to work differently than intended, it might be simpler to calculate the weekly totals yourself and use geom_line:
library(lubridate); library(dplyr); library(ggplot2)
set.seed(1)
df <- data.frame(
datetime = ymd_h(2018010101) + dhours(runif(1000, 0, 14*30*24))
)
df %>%
count(week_count = floor_date(datetime, "1 week")) %>%
ggplot(aes(week_count, n)) +
geom_line()
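If the first and last weekly bins only partly overlap the sampling window, their counts are low simply because those weeks are incomplete. One possible refinement, sketched here on the same kind of dummy data, is to drop those two edge bins before plotting. Whether the edge weeks really are partial is an assumption you would need to check against your own data; the names weekly and trimmed are made up for this illustration.

```r
library(lubridate)
library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(
  datetime = ymd_h("2018-01-01 01") + dhours(runif(1000, 0, 14 * 30 * 24))
)

# Count observations per calendar week
weekly <- df %>%
  count(week_start = floor_date(datetime, "week"))

# Drop the first and last bins, which may cover only part of a week
trimmed <- weekly %>%
  slice(-c(1, n()))

ggplot(trimmed, aes(week_start, n)) +
  geom_line()
```

The line then starts and ends at the counts of the first and last complete weeks instead of dipping toward zero.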

Generating R ggplot line graph with color/type conditional on different variables

I'm struggling to get the exact output needed for a ggplot line graph. As an example, see the code below. Overall, I have two conditions (A/B), and two treatments (C/D). So four total series, but in a factorial way. The lines can be viewed as a time series but with ordinal markings (rather than numeric).
I'd like to generate a connected line graph for the four types, where the color depends on the condition, and the line type depends on the treatment. Thus two different colors and two line types. To make things a bit more complicated, one condition (B) does not have data for the third time period.
I cannot seem to generate the graph needed for these constraints. The closest I got is shown below. What am I doing wrong? I tried removing the group=condition code, but that doesn't help either.
library(ggplot2)
set.seed(1)
example_df <- data.frame(time = c('time1','time2','time3','time1','time2','time3','time1','time2','time1','time2'),
time_order = c(1,2,3,1,2,3,1,2,1,2),
condition = c('A','A','A','A','A','A','B','B','B','B'),
treatment = c('C','C','C','D','D','D','C','C','D','D'),
value = runif(10))
ggplot(example_df, aes(x=reorder(time,time_order), y=value, color=condition , line_type=treatment, group=condition)) +
geom_line()

You've got 3 problems, from what I can tell.
linetype doesn't have an underscore in it.
With a categorical axis, you need to use the group aesthetic to set which lines get connected. You've made a start with group = condition, but this would imply one line for each condition type (2 lines), but you want one line for each condition:treatment interaction (2 * 2 = 4 lines), so you need group = interaction(condition, treatment).
Your sample data doesn't quite make sense. Your condition B values have two treatment Cs at time 1 and two Ds at time 2, so there is no connection between times 1 and 2. This doesn't much matter, and your real data is probably fine.
This should work:
ggplot(
example_df,
aes(
x = reorder(time, time_order),
y = value,
color = condition,
linetype = treatment,
group = interaction(condition, treatment)
)
) +
geom_line()
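To see what the grouping variable looks like, here is a small sketch that builds the interaction outside of ggplot and counts the rows in each group. It reuses the condition and treatment columns from the question; the name grp is made up for this illustration.

```r
# Same condition/treatment layout as the example data in the question
example_df <- data.frame(
  condition = c('A','A','A','A','A','A','B','B','B','B'),
  treatment = c('C','C','C','D','D','D','C','C','D','D')
)

# interaction() builds one factor level per condition/treatment combination
grp <- interaction(example_df$condition, example_df$treatment)

nlevels(grp)  # number of distinct lines geom_line() will draw
table(grp)    # number of points on each line
```

Each level of grp corresponds to one line in the plot, so checking the table is a quick way to confirm that every line has at least two points to connect.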

Differentiate missing values from main data in a plot using R

I create a dummy timeseries xts object with missing data on date 2-09-2015 as:
library(xts)
library(ggplot2)
library(scales)
set.seed(123)
seq <- seq(as.POSIXct("2015-09-01"),as.POSIXct("2015-09-02"), by = "1 hour")
ob1 <- xts(rnorm(length(seq),150,5),seq)
seq2 <- seq(as.POSIXct("2015-09-03"),as.POSIXct("2015-09-05"), by = "1 hour")
ob2 <- xts(rnorm(length(seq2),170,5),seq2)
final_ob <- rbind(ob1,ob2)
plot(final_ob)
# with ggplot
df <- data.frame(time = index(final_ob), val = coredata(final_ob) )
ggplot(df, aes(time, val)) + geom_line()+ scale_x_datetime(labels = date_format("%Y-%m-%d"))
After plotting my data looks like this:
The red coloured rectangular portion represents the date on which data is missing. How should I show that data was missing on this day in the main plot?
I think I should show this missing data with a different colour. But, I don't know how should I process data to reflect the missing data behaviour in the main plot.

Thanks for the great reproducible example.
I think you are best off to omit that line in your "missing" portion. If you have a straight line (even in a different colour) it suggests that data was gathered in that interval, that happened to fall on that straight line. If you omit the line in that interval then it is clear that there is no data there.
The problem is that you want the hourly data to be connected by lines, and then no lines in the "missing data section" - so you need some way to detect that missing data section.
You have not given a criterion for this in your question, so based on your example I will say that each line on the plot should consist of data at hourly intervals; if there's a break of more than an hour then there should be a new line. You will have to adjust this criterion to your specific problem. All we're doing is splitting up your dataframe into bits that get plotted by the same line.
So first create a variable that says which "group" (ie line) each data is in:
df$grp <- factor(c(0, cumsum(diff(df$time) > 1)))
Then you can use the group= aesthetic which geom_line uses to split up lines:
ggplot(df, aes(time, val)) + geom_line(aes(group=grp)) + # <-- only change
scale_x_datetime(labels = date_format("%Y-%m-%d"))
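diff() on date-times returns a difftime whose units are chosen automatically, so the comparison with 1 only means "one hour" while the differences happen to be reported in hours. A sketch that pins the units explicitly, using the same timestamps as the example (the UTC time zone and the names time and gap_hours are assumptions for this illustration):

```r
# Hourly timestamps with a one-day hole, as in the question
time <- c(
  seq(as.POSIXct("2015-09-01", tz = "UTC"),
      as.POSIXct("2015-09-02", tz = "UTC"), by = "1 hour"),
  seq(as.POSIXct("2015-09-03", tz = "UTC"),
      as.POSIXct("2015-09-05", tz = "UTC"), by = "1 hour")
)

# Convert the gaps to hours explicitly instead of relying on the automatic units
gap_hours <- as.numeric(diff(time), units = "hours")

# Start a new group whenever the gap exceeds one hour
grp <- factor(c(0, cumsum(gap_hours > 1)))

table(grp)  # points per continuous segment
```

With the units fixed, the same threshold keeps working if the data is later resampled to minutes or days.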
