I am, in R and using ggplot2, plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial sample (using long data suitable for plotting) is this:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
which gives
In this example, the first variable -- shown in the left column -- has unlimited range, while the second variable -- shown in the right column -- is weakly positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns in this plot, i.e. set Y axis limits separately for the two variables plotted. However, in order to allow for easy visual comparison of the different groups for each of the two variables, I would also like to have the identical Y axes within each column.
I've looked at the scales option to facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_x" allows the X axes to vary across columns, while
passing scales = "free_y" allows the Y axes to vary across rows, but
there is no option to allow the Y axes to vary across columns (nor, presumably, the X axes across rows).
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest approach would be to create one plot per facet column and bind them with something like {patchwork}. To keep the facet look, you can still add a faceting layer to each plot.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
group_split(groups1) %>%
map({
~ggplot(.x, aes(x = x, y = values)) +
geom_line() +
facet_grid(groups2 ~ groups1)
}) %>%
wrap_plots()
Created on 2023-01-11 with reprex v2.0.2
This is longitudinal data: the values for each ID repeat 4 times within every tick of 20 time steps, and then the experiment repeats. For the data frame below, I want to bin the values of x at every time step, separately for each category of Land.
There can be 3 bins per time interval for each land type (small, medium and large).
I want to see a timeline of the bins of x by category of Land.
Any help will be appreciated. I have added a picture of how the data may look for ggplot and how the plot, as bins or dots, may look.
set.seed(123)
ID = 1:5
Time = rep (c(1,2,3,4,5), each = 20)
Type = 1:25
data <- data.frame( IDn = rep(ID,20), Time, Land = rep(Type, 40), y = rnorm(100,0,1), x = runif(100,0,1))
data$Land= ifelse (data$Land > 15,"large farmers", ifelse(data$Land <=5, "small farmers", "medium-farmers"))
Edit: Question for labeling the faceting variable and dot plots.
Maybe something like this would help -
library(dplyr)
library(ggplot2)
data %>%
group_by(Time, Land) %>%
mutate(x = cut(x, c(0, 0.25, 0.75, 1))) %>%
ungroup %>%
count(Time, Land, x) %>%
ggplot() + aes(Time, n, fill = Land) + geom_col(position = 'dodge')
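To make the cut() step above more concrete, here is a minimal base R sketch of what it returns for a few values. The sample values, the labels "low", "mid" and "high", and the use of include.lowest = TRUE are assumptions for illustration and are not part of the answer.

```r
# Hypothetical values of x in [0, 1]
x_demo <- c(0, 0.10, 0.25, 0.50, 0.80, 1)

# Default behaviour: intervals are right-closed, so an exact 0 falls
# outside (0, 0.25] and becomes NA
bins_default <- cut(x_demo, breaks = c(0, 0.25, 0.75, 1))

# Assumed variant: include the lower bound and use readable labels
bins_labeled <- cut(x_demo, breaks = c(0, 0.25, 0.75, 1),
                    labels = c("low", "mid", "high"),
                    include.lowest = TRUE)

# Number of values per bin
table(bins_labeled)
```

With runif() data an exact 0 is very unlikely, so the default is usually fine; include.lowest only matters if boundary values can occur.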
I'm trying to do a levelplot using ggplot2 for a (meteorological) variable. The variable is measured continuously in time (my x-axis), but in non-continuous heights (y-axis) at every time step.
The produced plot therefore shows data at the heights (y-coordinates) specified, but nothing in between.
Here's an example:
library(ggplot2)
data <- runif(400, min=0, max=10)
index <- c(1:20)
heights <- c(1,2,3,4,5,7,9,12,15,19,23,28,33,39,45,52,59,67,75,83)
dat <- as.data.frame(cbind(expand.grid(X=index,Y=heights),data))
ggplot(dat, aes(x=dat[,1], y=dat[,2],z=dat[,3])) +geom_tile(aes(fill=dat[,3]))
This produces the following plot:
Is there an easy way to fill the plot fully, i.e. make the lines in the upper part of the plot broader?
Thank you!
One more solution: you could interpolate between the measured heights using the approx function, although 2D kriging might be more appropriate for your application.
library(purrr)
dat2<- dat %>%
split(.$X) %>%
map_dfr(~ approx(.$Y, .$data, xout =1:83), .id = "X")
ggplot(dat2, aes(x=as.integer(dat2$X), y=dat2$x, z=dat2$y)) +geom_tile(aes(fill=dat2$y))
That will give you :
You can use the height and width aesthetics of geom_tile or, alternatively, geom_rect:
library(tidyverse)
data <- runif(400, min=0, max=10)
index <- c(1:20)
heights <- c(1,2,3,4,5,7,9,12,15,19,23,28,33,39,45,52,59,67,75,83)
dat <- crossing(index = index, heights = heights) %>%
mutate(
Z = data,
index0 = index - 1) %>%
left_join(tibble(heights, heights0 = c(0, heights[-length(heights)])), by = "heights") # tibble() replaces the deprecated data_frame()
ggplot(dat, aes(xmin = index0, xmax = index, ymin = heights0, ymax = heights, fill = Z)) +
geom_rect()
This assumes that your heights are the top of each level and that they start at zero.
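The assumption stated above can be checked with a small base R sketch that derives the lower edge and thickness of every layer. The column names ymin, ymax and thickness are chosen here for illustration only.

```r
heights <- c(1, 2, 3, 4, 5, 7, 9, 12, 15, 19, 23, 28, 33, 39, 45, 52, 59, 67, 75, 83)

# Assumption: each height marks the top of its layer and the first layer starts at 0
lower <- c(0, head(heights, -1))
thickness <- heights - lower

edges <- data.frame(ymin = lower, ymax = heights, thickness = thickness)
head(edges)

# Sanity check: consecutive layers touch, so the rectangles tile the
# y axis without gaps or overlaps
stopifnot(all(edges$ymin[-1] == edges$ymax[-nrow(edges)]))
```

If the heights were layer midpoints rather than tops, the edges would have to be derived differently (for example from the midpoints between neighbouring heights), so this is only one possible reading of the data.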
You could convert the y axis to be factor in order to eliminate the dead space. This will, however, not make the upper lines broader.
ggplot(dat, aes(x=dat[,1], y=factor(dat[,2]),z=dat[,3])) +geom_tile(aes(fill=dat[,3]))
I have the following dataset:
time tta
08:20:00 1
21:30:00 5
22:00:00 1
22:30:00 1
00:25:00 1
17:00:00 5
I would like to plot a bar chart using ggplot so that the x-axis has a tick every 2 hours (00:00:00, 02:00:00, 04:00:00 and so on) and the y-axis shows the frequency of each level of the factor tta (1 and 5).
x-axis should be 00-01,01-02,... so on
I first approached this using the xts package, but then found that it does not offer a way to floor the time. I therefore find lubridate more practical here, also because ggplot does not understand xts objects right away. Both packages help you transform time data in many ways.
Use xts::align.time or lubridate::floor_date to shift your times to the next/previous full hour/day/etc.
Either way, you aggregate the data before you pass it to ggplot. You can use sum to sum up tta, or just use length to count the number of occurrences, but in the latter case you could also use geom_histogram on the time series only. You can carefully shift the bars in ggplot with position_nudge so that each bar represents a period rather than sitting centered on a point in time. You should specify scale_x_time(labels = ..., breaks = ...) in the plot.
Data:
time <- c(
"08:20:00",
"21:30:00",
"22:00:00",
"22:30:00",
"00:25:00",
"17:00:00"
)
time <- as.POSIXct(time, format = "%H:%M:%S")
tta <- c(1, 5, 1, 1, 1, 5)
Using xts:
library(xts)
myxts <- xts(tta, order.by = time)
myxts_aligned <- align.time(myxts, n = 60*60*2) # shifts all times to the next full
# 2 hours
myxts_agg <- period.apply(myxts_aligned,
INDEX = endpoints(myxts, "hours", 2),
FUN = sum) # sums up every two hours
require(ggplot2)
ggplot(mapping = aes(x = index(myxts_agg), y = myxts_agg[, 1])) +
geom_bar(stat = "identity",
width = 60*60*2, # one bar to be 2 hours wide
position = position_nudge(x = -60*60), # shift one hour to the left
# so that the bar represents the actual period
colour = "black") +
scale_x_time(labels = function(x) strftime(x, "%H:%M"),
breaks = index(myxts_agg)) + # add more breaks manually if you like
scale_y_continuous() # to escape the warning of ggplot not knowing
# how to deal with xts object
Using lubridate:
require(lubridate)
require(tidyverse)
mydf <- data.frame(time = time, tta = tta)
mydf_agg <-
mydf %>%
group_by(time = floor_date(time, "2 hours")) %>%
summarise(tta_sum = sum(tta), tta_freq = n())
ggplot(mydf_agg, aes(x = time, y = tta_sum)) +
geom_bar(stat = "identity",
width = 60*60*2, # one bar to be 2 hours wide
position = position_nudge(x = 60*60), # shift one hour to the *right*
# so that the bar represents the actual period
colour = "black") +
scale_x_time(labels = function(x) strftime(x, "%H:%M"),
breaks = mydf_agg$time) # add more breaks manually if you like
In the end, the two results are almost the same:
use the floor_date function from lubridate
library(tidyverse)
library(lubridate)
your_df %>% group_by(floor_date(time,"2 hours")) %>% count(tta)
and then ggplot with geom_col from there
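As a rough illustration of what flooring to two hours and counting does, here is a base R sketch using the data from the question. It works on the time strings directly instead of calling floor_date, and the "HH-HH" bin labels are an assumption about how the axis could be labelled.

```r
time_chr <- c("08:20:00", "21:30:00", "22:00:00", "22:30:00", "00:25:00", "17:00:00")
tta <- c(1, 5, 1, 1, 1, 5)

# Hour of day as an integer, floored to the start of its 2-hour bin
hour <- as.integer(substr(time_chr, 1, 2))
bin_start <- (hour %/% 2) * 2

# Assumed label format "HH-HH"; all 12 bins are kept as factor levels
# so that empty bins still appear in the result
all_starts <- seq(0, 22, by = 2)
bin <- factor(sprintf("%02d-%02d", bin_start, bin_start + 2),
              levels = sprintf("%02d-%02d", all_starts, all_starts + 2))

# Frequency of each tta value per 2-hour bin
counts <- table(bin, tta)
counts
```

Calling as.data.frame(counts) gives a long data frame with columns bin, tta and Freq, which is the shape geom_col expects.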
library(lubridate)
library(ggplot2)
Make sure the class of your timestamp is POSIXct:
> class(df$timestamp)
[1] "POSIXct" "POSIXt"
Then use the scale_x_datetime function as follows.
gg +
scale_x_datetime(expand = c(0, 0), breaks = scales::date_breaks("1 hour"), labels = scales::date_format("%H:%M")) # date_breaks() and date_format() come from the scales package
In this case, it will space the breaks on the x axis every hour, and the labels will look like 09:00, for example.
I have data in the following format:
Date Year Month Day Flow
1 1953-10-01 1953 10 1 530
2 1953-10-02 1953 10 2 530
3 1953-10-03 1953 10 3 530
I would like to create a graph like this:
Here is my current image and code:
library(ggplot2)
library(plyr)
library(reshape2)
library(scales)
library(lubridate) # provides month() and day(), which are used below
## Read Data
df <- read.csv("Salt River Flow.csv")
## Convert Date column to R-recognized dates
df$Date <- as.Date(df$Date, "%m/%d/%Y")
## Finds Water Years (Oct - Sept)
df$WY <- as.POSIXlt(as.POSIXlt(df$Date)+7948800)$year+1900
## Normalizes Water Years so stats can be applied to just months and days
df$w <- ifelse(month(df$Date) %in% c(10,11,12), 1903, 1904)
##Creates New Date (dat) Column
df$dat <- as.Date(paste(df$w,month(df$Date),day(df$Date), sep = "-"))
## Creates new data frame with summarised data by MonthDay
PlotData <- ddply(df, .(dat), summarise,
  Min = min(Flow),
  Tenth = quantile(Flow, probs = 0.10), # was 0.05, which is the 5th percentile
  TwentyFifth = quantile(Flow, probs = 0.25),
  Median = quantile(Flow, probs = 0.50),
  Mean = mean(Flow),
  SeventyFifth = quantile(Flow, probs = 0.75),
  Ninetieth = quantile(Flow, probs = 0.90),
  Max = max(Flow))
## Melts data so it can be plotted with ggplot
m <- melt(PlotData, id="dat")
## Plots
p <- ggplot(m, aes(x = dat)) +
geom_ribbon(aes(ymin = TwentyFifth, ymax = Median), data = PlotData, fill = alpha("black", 0.1), color = NA) +
geom_ribbon(aes(ymin = Median, ymax = SeventyFifth), data = PlotData, fill = alpha("black", 0.5), color = NA) +
scale_x_date(labels = date_format("%b"), breaks = date_breaks("month"), expand = c(0,0)) +
geom_line(data = subset(m, variable == "Mean"), aes(y = value), size = 1.2) +
theme_bw() +
geom_line(data = subset(m, variable %in% c("Min","Max")), aes(y = value, group = variable)) +
geom_line(data = subset(m, variable %in% c("Ninetieth","Tenth")), aes(y = value, group = variable), linetype = 2) +
labs(x = "Water Year", y = "Flow (cfs)")
p
I am very close but there are some issues I'm having. First, if you can see a way to improve my code, please let me know. The main problem I ran into was that I needed two dataframes to make this graph: one melted, and one not. The unmelted dataframe was necessary (I think) to create the ribbons. I tried many ways to use the melted dataframe for the ribbons, but there was always a problem with the aesthetic length.
Second, I know that to have a legend (and I want one), I need to map something in the aesthetics of each line/ribbon, but I am having trouble getting that to work. I think it would involve scale_fill_manual.
Third, and I don't know if this is possible, I would like to have each month label in between the tick marks, not on them (like in the above image).
Any help is greatly appreciated (especially with creating more efficient code).
Thank you.
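The water-year step in the question's code adds 7948800 seconds (92 days, the combined length of October to December) before reading off the year. A more explicit way to express the same idea, as a minimal base R sketch with hypothetical sample dates, is to add one to the calendar year for October to December:

```r
# Hypothetical dates around a water-year boundary
dates <- as.Date(c("1953-09-30", "1953-10-01", "1953-12-31", "1954-01-01", "1954-09-30"))

yr <- as.integer(format(dates, "%Y"))
mo <- as.integer(format(dates, "%m"))

# Water year (Oct - Sept): October to December count toward the following year
wy <- yr + (mo >= 10)

data.frame(dates, wy)
```

This avoids the time-zone and leap-year subtleties that can come with shifting dates by a fixed number of seconds.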
Something along these lines might get you close with base:
library(lubridate)
library(reshape2)
# simulating data...
Date <- seq(as.Date("1953-10-01"),as.Date("2010-10-01"),by="day")
Year <- year(Date)
Month <- month(Date)
Day <- day(Date)
set.seed(1)
Flow <- rpois(length(Date), 2000)
Data <- data.frame(Date=Date,Year=Year,Month=Month,Day=Day,Flow=Flow)
# use acast to get it in a convenient shape:
PlotData <- acast(Data,Year~Month+Day,value.var="Flow")
# apply for quantiles
Quantiles <- apply(PlotData,2,function(x){
quantile(x,probs=c(1,.9,.75,.5,.25,.1,0),na.rm=TRUE)
})
Mean <- colMeans(PlotData, na.rm=TRUE)
# ugly way to get month tick separators
MonthTicks <- cumsum(table(as.integer(unlist(lapply(strsplit(names(Mean),split="_"),"[[",1))))) # as.integer keeps months in calendar order rather than "1","10","11",...
# and finally your question:
plot(1:366,seq(0,max(Flow),length=366),type="n",xlab = "Water Year",ylab="Discharge",axes=FALSE)
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["75%",])),border=NA,col=gray(.6))
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["25%",])),border=NA,col=gray(.4))
lines(1:366,Quantiles["90%",], col = gray(.5), lty=4)
lines(1:366,Quantiles["10%",], col = gray(.5))
lines(1:366,Quantiles["100%",], col = gray(.7))
lines(1:366,Quantiles["0%",], col = gray(.7), lty=4)
lines(1:366,Mean,lwd=3)
axis(1,at=MonthTicks, labels=NA)
text(MonthTicks-15,-100,1:12,pos=1,xpd=TRUE)
axis(2)
The plotting code really isn't that tricky. You'll need to clean up the aesthetics, but polygon() is usually my strategy for shaded regions in plots (confidence bands, whatever).
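The polygon() strategy mentioned above can be reduced to a minimal sketch: trace the lower edge of the band from left to right, then the upper edge from right to left. The data here are made up for illustration.

```r
# Made-up central line with a band of +/- 0.3 around it
x <- 1:50
center <- sin(x / 8)
lower <- center - 0.3
upper <- center + 0.3

# Outline of the band: along the lower edge, then back along the upper edge
band_x <- c(x, rev(x))
band_y <- c(lower, rev(upper))

pdf(NULL)  # null device so the sketch runs without opening a window; remove to view the plot
plot(x, center, type = "n", ylim = range(band_y), xlab = "x", ylab = "value")
polygon(band_x, band_y, border = NA, col = gray(0.8))
lines(x, center, lwd = 2)
invisible(dev.off())
```

Drawing the polygon before the line keeps the line visible on top of the shaded region.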
Perhaps this will get you closer to what you're looking for, using ggplot2 and plyr:
library(ggplot2)
library(plyr)
library(lubridate)
library(scales)
df$MonthDay <- df$Date - years( year(df$Date) + 100 ) #Normalize points to same year
df <- ddply(df, .(Month, Day), mutate, MaxDayFlow = max(Flow) ) #Max flow on day
df <- ddply(df, .(Month, Day), mutate, MinDayFlow = min(Flow) ) #Min flow on day
p <- ggplot(df, aes(x=MonthDay) ) +
geom_smooth(size=2,level=.8,color="black",aes(y=Flow)) + #80% conf. interval
geom_smooth(size=2,level=.5,color="black",aes(y=Flow)) + #50% conf. interval
geom_line( linetype="longdash", aes(y=MaxDayFlow) ) +
geom_line( linetype="longdash", aes(y=MinDayFlow) ) +
labs(x="Month",y="Flow") +
scale_x_date( labels = date_format("%b") ) +
theme_bw()
Edit: Fixed X scale and X scale label
(Partial answer using base plotting functions, not including the min, max, or mean.) I suspect you will need to construct a dataset before passing it to ggplot, since that is typical for that function. I already do something similar and then pass the resulting matrix to matplot. (It doesn't do that kewl highlighting, but maybe ggplot can do it.)
HDL.mon.mat <- aggregate(dfrm$Flow,
list( dfrm$Year + dfrm$Month/12),
quantile, prob=c(0.1,0.25,0.5,0.75, 0.9), na.rm=TRUE)
matplot(HDL.mon.mat[,1], HDL.mon.mat$x, type="pl")