I want to display two dimensions of my data: (1) the reporting entity, in separate facets, and (2) the country associated with each data point, on the x-axis. The problem is that the country dimension includes a "total" that is much higher than any of the individual values, so it needs its own axis limits.
My solution was to add another faceting dimension, but I could not get it to work and look good at the same time. Consider the following dummy data:
id <- c(1,1,1,1,1,1,2,2,2,2,2,2)
country <- c("US","US","UK","World","World","World","US","US","UK","World","World","World")
value <- c(150,40,100,1000,1100,1500,5,10,20,150,200,120)
# + some other dimensions
mydat <- data.frame(id,country,value)
id country value
1 1 US 150
2 1 US 40
3 1 UK 100
4 1 World 1000
5 1 World 1100
6 1 World 1500
7 2 US 5
8 2 US 10
9 2 UK 20
10 2 World 150
11 2 World 200
12 2 World 120
If I use a facet grid to display the world total, its axis limit is forced onto the other countries as well:
library(ggplot2)
# flag the "World" total so it gets its own facet column
mydat$breakdown <- mydat$country == "World"
ggplot(mydat) + aes(x = country, y = value) + geom_point() +
  facet_grid(id ~ breakdown, scales = "free", space = "free_x") +
  theme(strip.text.x = element_blank(), strip.background = element_blank(),
        plot.margin = unit(c(0, 0, 0, 0), units = "lines"))
(The theme() call at the end just removes the additional strip.)
If I use a facet wrap, each panel does get its own axis limits, but I cannot pass the space = "free_x" argument. The single column for the total then takes up as much space as the entire country overview, which looks ugly for data sets with many countries:
ggplot(mydat) + aes(x=country,y=value) + geom_point() +
facet_wrap(id ~ breakdown,scales = "free")
There are several threads here which ask similar questions, but none of the answers helped me to achieve this yet.
Different axis limits per facet in ggplot2
Is it yet possible to have different axis breaks / limits for individual facets in ggplot with free scale?
Setting individual axis limits with facet_wrap and scales = "free" in ggplot2
Maybe try gridExtra::grid.arrange or cowplot::plot_grid:
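# one data frame per breakdown/id combination
lst <- split(mydat, list(mydat$breakdown, mydat$id))
# one plot per data frame, titled with the name of the split
plots <- lapply(seq(lst), function(x) {
  ggplot(lst[[x]]) +
    aes(x = country, y = value) +
    geom_point() +
    ggtitle(names(lst)[x]) + labs(x = NULL, y = NULL)
})
# arrange the plots in two columns with unequal widths
do.call(gridExtra::grid.arrange,
        c(plots, list(ncol = 2, widths = c(2/3, 1/3)),
          left = "Value", bottom = "country"))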
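If the number of countries changes, the hard-coded widths above could be derived from the data instead. A minimal sketch, assuming the mydat data and the breakdown column from the question; the names n_levels and panel_widths are only illustrative and not part of the original answer.

```r
# Dummy data from the question
id <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
country <- c("US", "US", "UK", "World", "World", "World",
             "US", "US", "UK", "World", "World", "World")
value <- c(150, 40, 100, 1000, 1100, 1500, 5, 10, 20, 150, 200, 120)
mydat <- data.frame(id, country, value)
mydat$breakdown <- mydat$country == "World"

# Count the distinct x-axis categories in each column of panels
# (breakdown == FALSE comes first, then breakdown == TRUE)
n_levels <- tapply(mydat$country, mydat$breakdown,
                   function(x) length(unique(x)))

# Relative widths proportional to the number of categories per column
panel_widths <- as.numeric(n_levels / sum(n_levels))
panel_widths
```

The resulting vector could then be passed as widths = panel_widths in the grid.arrange call, so that each country gets roughly the same horizontal space in both columns.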
I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph like the one in an answer to this question. Note that the dates are in Julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=T))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there are a few issues with this plot:
Because the bars are stacked, the values on the x-axis are summed and end up too high, outside the 1-365 scale that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such a graph?
I don't think this is something you can do with geom_bar or geom_col. A more general approach is to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr)  # also provides the %>% pipe
plotdata <- mydat %>%
  dplyr::mutate(Crop = factor(Crop)) %>%
  # one row per Region/Crop/period
  tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to = "period") %>%
  # split e.g. "Planting.Begin" into Type = "Planting", Event = "Begin"
  tidyr::separate(period, c("Type", "Event")) %>%
  # one column each for Begin and End
  tidyr::pivot_wider(names_from = Event, values_from = value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that we have one row per rectangle we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The part that's a bit messy here is that Crop is discrete but geom_rect needs numeric positions. Since Crop is now a factor, we use its underlying numeric codes to create the ymin and ymax positions. Then we need to relabel the y axis with the names of the factor levels.
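To make the factor-to-position step concrete, here is a small sketch in base R. The crop vector is made up for illustration; it only mirrors the Crop column used above.

```r
# A made-up crop vector; factor levels are sorted alphabetically by default
crop <- factor(c("Soybean", "Corn", "Soybean"))
levels(crop)

# The numeric codes give each level a position on a continuous axis
pos <- as.numeric(crop)

# Rectangle edges: 0.45 below and above each position, leaving a small gap
ymin <- pos - 0.45
ymax <- pos + 0.45

# Breaks and labels that put the level names back on the axis
axis_breaks <- seq_along(levels(crop))
axis_labels <- levels(crop)
data.frame(crop, pos, ymin, ymax)
```

Because the positions come from the level order, reordering the factor levels would also reorder the rows in the plot.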
If you also wanted to get the month names on the x axis you could do something like
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),by="month")
# then add this to your plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))
I am trying to produce a bar graph from thousands of data points.
I have a problem with the size of the ggplot output.
Code :
ggplot(data = df, aes(x=extension, y=duration)) +
geom_bar(stat="identity", width=10,fill="steelblue")+
ggtitle("Chart") +
xlab("Number") +
ylab("Duration") +
theme(legend.position = "none")+
theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))+
coord_flip()
Output:
[image: chart output]
The data frame, loaded from MongoDB, looks like this:
1 36952 7158803
2 36110 7068360
3 36080 4736043
4 36509 4726630
5 36890 4699026
6 36051 4698594
7 36783 4677233
8 36402 4672623
9 36880 4672093
10 36513 4655583
11 36522 4630962
12 36116 4628046
13 36746 4593291
....
From your sample chart, I infer that extension (the x-axis before the flip) is probably a factor. If it were numeric, ggplot would scale the axis correctly.
I would recommend checking the class of both columns in your dataset and making sure they are numeric.
Alternatively, you would have to come up with an appropriate scale for the x-axis yourself.
Here's the plot where your flipped x-axis is a factor; ggplot tries to render every separate level of the factor and they overlap as there are so many. I created some fake data quickly to mimic yours.
Here's the plot where extension is numeric and ggplot neatly scales this correctly.
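The class check and conversion could look roughly like the sketch below. The data frame here is a small stand-in built from the first rows shown in the question, and it assumes extension arrived as a factor.

```r
# Stand-in for the data frame loaded from MongoDB (assumed to hold a factor)
df <- data.frame(
  extension = factor(c("36952", "36110", "36080")),
  duration  = c(7158803, 7068360, 4736043)
)

# Inspect the class of every column
sapply(df, class)

# Convert through character first: as.numeric() applied directly to a factor
# returns the internal level codes, not the original numbers
df$extension <- as.numeric(as.character(df$extension))

sapply(df, class)
```

If the column turns out to be character rather than factor, as.numeric(df$extension) alone is enough.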
I have seen different variations of this question, but none that straightforwardly answer a problem I come across all the time. I often have large datasets like the one described in this link:
Make multiple separate plots from single data frame in R
Example provided:
head(data)
Park_name Zone Year Height_mm
1 Park1 Zone1 2011 380
2 Park1 Zone1 2011 510
3 Park1 Zone1 2011 270
4 Park1 Zone2 2011 270
5 Park1 Zone2 2011 230
6 Park1 Zone2 2011 330
# load packages
require(ggplot2)
require(plyr)
# read data
Y <- read.table("C:/data.csv", sep=",", header=TRUE)
# define the theme
th <- theme_bw() +
theme(axis.text.x=element_text(),
axis.line=element_line(colour="black"),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
panel.background=element_blank(),
legend.justification=c(10,10), legend.position=c(10,10),
legend.title = element_text(),
legend.key = element_blank()
)
# determine park levels
parks <- levels(Y[,"Park_name"])
# apply separately for each park
p <- lapply(parks, function(park) {
ggplot(Y[Y[, "Park_name"]==park,], aes(x=as.factor(Year), y=Height_mm)) +
facet_grid(Zone~.) + # show each zone in a separate facet
geom_point() + # plot the actual heights (if desired)
# plot the mean and confidence interval
stat_summary(fun.data="mean_cl_boot", color="red")
})
# finally print your plots
lapply(p, function(x) print(x+th))
I want to create an individual plot of year against height for each Park's Zone, to put in a report appendix. Sometimes this totals over 100 plots. I do not want to facet wrap. I want each plot to stand on its own, and it would be great to save them as JPEGs automatically to a designated folder. I also want each plot to have:
1. A unique y-axis title (let's say the height column had values in both feet and meters and you needed the figures to identify which one).
2. A unique main-title based off the Park Name and Zone.
This is a huge challenge for me but may be an easy coding problem for someone who uses code so often. I would be eternally grateful for help, since I need this type of loop all of the time. Thank you!
I think the main problem with the example you provided is that the loop runs over the "parks" vector, which only contains the levels of "Park_name". I think a better approach is to loop over the unique "Park_name" values and subset the data for each one.
I am also assuming that you have a column with the units (I used it in the plot as "Units"); if that is not the case, you may be able to create it using tidyr::separate. I hope you find this code useful!
# stat_summary(fun.data = "mean_cl_boot") needs the Hmisc package installed
library(ggplot2)
# folder where the plots are saved (adjust to your own location)
folder <- "C:/myplots/"
# determine the unique park names
parks <- unique(data[, "Park_name"])
# build and save one plot per park
p <- lapply(parks, function(park) {
  # subset the data to the current park
  subdata <- subset(data, Park_name == park)
  # collapse the zone names into a single string for the title
  zones <- paste(unique(subdata[, "Zone"]), collapse = " ")
  # "Units" is an assumed column holding the unit of the height values
  units_label <- paste(unique(subdata$Units), collapse = "/")
  plt <- ggplot(subdata, aes(x = as.factor(Year), y = Height_mm)) +
    facet_grid(Zone ~ .) +
    geom_point() +
    # title and y label are built from the park name, zones and units
    labs(title = paste(park, zones, sep = " "),
         y = paste0("Height (", units_label, ")"),
         x = "Year") +
    stat_summary(fun.data = "mean_cl_boot", color = "red")
  # save the plot explicitly instead of relying on the last plot drawn
  ggsave(filename = paste0(folder, park, ".jpeg"),
         plot = plt,
         device = "jpeg",
         width = 15,
         height = 10,
         units = "cm",
         dpi = 200)
  # return the plot so that p holds one plot per park
  plt
})
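Since the question asks for one separate plot per Park and Zone rather than facets, a variation that splits on both columns might look like the sketch below. The heights data frame, its Units column, and the out_dir folder are made-up placeholders, and the summary layer is left out to keep the example free of the Hmisc dependency.

```r
library(ggplot2)

# Made-up data in the shape shown in the question, plus an assumed Units column
heights <- data.frame(
  Park_name = rep(c("Park1", "Park2"), each = 6),
  Zone      = rep(c("Zone1", "Zone2"), each = 3, times = 2),
  Year      = rep(2011:2013, times = 4),
  Height_mm = c(380, 510, 270, 270, 230, 330, 410, 290, 300, 350, 260, 310),
  Units     = "mm"
)

# Placeholder output folder; replace with your own report folder
out_dir <- file.path(tempdir(), "park_plots")
dir.create(out_dir, showWarnings = FALSE)

# One data frame per Park/Zone combination; drop = TRUE skips empty combinations
groups <- split(heights, list(heights$Park_name, heights$Zone), drop = TRUE)

# Build, title and save one plot per combination; returns the file paths
saved <- vapply(names(groups), function(key) {
  d <- groups[[key]]
  plt <- ggplot(d, aes(x = factor(Year), y = Height_mm)) +
    geom_point() +
    labs(title = paste(d$Park_name[1], d$Zone[1]),
         x = "Year",
         y = paste0("Height (", d$Units[1], ")"))
  path <- file.path(out_dir, paste0(key, ".jpeg"))
  ggsave(path, plot = plt, device = "jpeg",
         width = 15, height = 10, units = "cm", dpi = 200)
  path
}, character(1))
```

Splitting on both columns keeps the loop body identical for every plot, and the file name doubles as a record of which Park and Zone each JPEG belongs to.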
I'm trying to create a stacked bar plot to indicate when requests were made for resources on a website. I would like to use a gradient legend to describe when the requests were made. How can I create a gradient legend, and would that be the right way to visualize this time domain data?
> head(livePostHit)
path date hits
1 /2017/06/27/goog-fit-cal.html 2018/04/01 1
2 /2015/05/24/sqlite-tutorial.html 2018/04/01 1
3 /2016/11/07/coin-freq.html 2018/04/01 1
4 /2017/03/30/alpine-linux.html 2018/04/01 2
5 /2018/03/09/querySelectorAll.html 2018/04/01 1
6 /2017/11/24/fedora-27-rv.html 2018/04/01 1
> ggplot(livePostHit, aes(x = path, y = hits, fill = date)) +
geom_bar(stat='identity') +
theme(axis.text = element_text(angle=75, hjust = 1),
legend.position = 'none')
I turned off the legend because there were too many groups for it to render correctly, but I would like a gradient legend running from the latest to the earliest date.
I just needed to make date an actual Date type rather than a factor:
livePostHit$date <- as.Date(livePostHit$date)
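A self-contained sketch of that conversion, using a few made-up rows in the shape of livePostHit. The explicit format argument is an addition for clarity; the plot call mirrors the one in the question, with geom_col standing in for geom_bar(stat = 'identity').

```r
library(ggplot2)

# Made-up rows in the shape of the question's data; date starts as a factor
livePostHit <- data.frame(
  path = c("/a.html", "/b.html", "/a.html"),
  date = factor(c("2018/04/01", "2018/04/01", "2018/04/02")),
  hits = c(1, 2, 1)
)

# Convert the factor to Date, stating the year/month/day layout explicitly
livePostHit$date <- as.Date(livePostHit$date, format = "%Y/%m/%d")
class(livePostHit$date)

# With a Date fill, ggplot2 treats the aesthetic as continuous, so the legend
# is drawn as a gradient instead of one key per distinct date
p <- ggplot(livePostHit, aes(x = path, y = hits, fill = date)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
```

Printing p should then show a colour bar for date, so the legend no longer needs to be hidden.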
I'm trying to generate a series of bar charts, one for each of 7 provinces, based off a master data table. However, the software only plots data from one of the provinces -- province 4. When I export to PDF I get 7 of the same bar chart (with different titles).
The data is in this format (abbreviated for clarity)
province travelcat pc_pop
60 1 0 to 4 hours 0.6807
21 1 4 to 8 hours 0.1093
28 2 4 to 8 hours 0.0969
44 2 36 to 48 hours 0.0014
31 3 48 to 72 hours 0.0016
49 3 > 72 hours 0.0007
Weirdly, when I generate a filtered table prov_filter and print that, it shows the data exactly as I'd expect it, specific to each province. Similarly the province title province_number is assigned correctly in the resulting PDF printouts. So the filtering is happening...but the data isn't going into the plot.
province_list=list()
for (i in unique(slim_prov_TCR$province)) {
province_number <- paste("Province",i)
prov_filter <- filter(slim_prov_TCR, province == i)
print(prov_filter)
plot <- ggplot(prov_filter, aes(x = prov_filter$travelcat, y = prov_filter$pc_pop))
+ theme(axis.text.x = element_text(angle=45, hjust=1))
+ scale_y_continuous(limits=c(0,1),labels = scales::percent)
+ ylab("% of provincial population") + xlab("Travel time to nearest medical facility")
+ ggtitle(province_number)
+ stat_summary(fun.y="identity",geom="bar")
filename=paste(province_number,".pdf",sep="")
province_list[[i]] = plot
print(plot)
}
I've done this before using similar code with no problems, but this time I've had serial problems, despite revising the filter code using multiple methods. I'm relatively new to R and statistics land in general so I'm probably mucking up something on the grammar side. Any and all help appreciated.
For reference purposes the final printout code is below
for (i in unique(slim_prov_TCR$province)) { # Another for loop, this time to save out the bar charts in province_list as PDFs
province_number <- paste("Province",i)
filename=paste(province_number,".pdf",sep="") # Make the file name for each PDF. The paste builds the name from the province, so each chart is named after its province
pdf(filename,width=3.5,height=3.5) # PDF basic specifications. Modify the width and height here.
print(province_list[[i]])
dev.off()
}
As highlighted by alistaire and Gregor, using $ inside aes() and putting the + at the beginning of lines was confusing R. Fixing these two points did the trick. See the corrected code below.
province_list=list()
for (i in unique(slim_prov_TCR$province)) {
province_number <- paste("Province",i)
prov_filter <- filter(slim_prov_TCR, province == i)
print(prov_filter)
plot <- ggplot(prov_filter, aes(x = travelcat, y = pc_pop)) +
theme(axis.text.x = element_text(angle=45, hjust=1)) +
scale_y_continuous(limits=c(0,1),labels = scales::percent) +
ylab("% of provincial population") + xlab("Travel time to nearest medical facility") +
ggtitle(province_number) +
stat_summary(fun="identity",geom="bar") # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
filename=paste(province_number,".pdf",sep="")
province_list[[i]] = plot
print(plot)
}
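One way to see why $ inside aes() leads to identical charts: as far as I understand it, aes() stores the expression and ggplot2 evaluates it only when the plot is built, so prov_filter$travelcat picks up whatever prov_filter holds at that time, which is the last province of the loop. A small sketch with made-up data (toy, sub, bad and good are illustrative names, not from the original code):

```r
library(ggplot2)

# Made-up data: two groups with clearly different values
toy <- data.frame(g = c(1, 1, 2, 2),
                  x = c("a", "b", "a", "b"),
                  y = c(1, 2, 30, 40))

bad <- list()
good <- list()
for (i in unique(toy$g)) {
  sub <- toy[toy$g == i, ]
  # sub$y is looked up when the plot is built, not when it is created
  bad[[i]] <- ggplot(sub, aes(x = sub$x, y = sub$y)) + geom_col()
  # bare column names are looked up in the data stored with the plot
  good[[i]] <- ggplot(sub, aes(x = x, y = y)) + geom_col()
}

# Compare what the first plot of each kind would actually draw;
# ggplot2 also warns that using $ inside aes() is discouraged
bad_first <- suppressWarnings(layer_data(bad[[1]])$y)
good_first <- layer_data(good[[1]])$y
bad_first
good_first
```

In this sketch the first "bad" plot ends up with the values of the last group, while the first "good" plot keeps its own, which matches the symptom of seven identical charts with different titles.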