I'm trying to recreate a circular plot from here (the first plot on this page), but the output I got seems incorrect. The 'last' bar (between 23 and 0) is missing and the 'first' one (between 0 and 1) is disproportionately high. What's more, the bars appear shifted one unit to the left, while on the website above the plot looks fine.
Here is the code I copied from that site. The only change I made was to remove "width=2" from geom_histogram(), because otherwise it raised an error saying that the width argument was deprecated.
library(lubridate)
library(ggplot2)
set.seed(44)
N=500
events <- as.POSIXct("2011-01-01", tz="GMT") +
days(floor(365*runif(N))) +
hours(floor(24*rnorm(N))) +
minutes(floor(60*runif(N))) +
seconds(floor(60*runif(N)))
hour_of_event <- hour(events)
eventdata <- data.frame(datetime = events, eventhour = hour_of_event)
# determine if event is in business hours
eventdata$Workday <- eventdata$eventhour %in% seq(9, 17)
ggplot(eventdata, aes(x = eventhour, fill = Workday)) +
geom_histogram(breaks = seq(0, 24), colour = "grey") +
coord_polar(start = 0) + theme_minimal() +
scale_fill_brewer() + ylab("Count") +
ggtitle("Events by Time of day") +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0, 24))
Here is what I got:
Here is a table of the data. You can see that hour 23 should have a value of 17 instead of 0 as in my plot.
table(eventdata$eventhour)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
23 22 18 26 28 20 19 21 16 17 20 16 18 22 16 21 24 21 22 27 25 18 23 17
Do you have an idea why my plot doesn't show correct values and how I can fix this?
I propose this solution, based on this post:
library(lubridate)
library(ggplot2)
set.seed(44)
N=500
events <- as.POSIXct("2011-01-01", tz="GMT") +
days(floor(365*runif(N))) +
hours(floor(24*rnorm(N))) +
minutes(floor(60*runif(N))) +
seconds(floor(60*runif(N)))
hour_of_event <- hour(events)
eventdata <- data.frame(datetime = events, eventhour = hour_of_event)
# determine if event is in business hours
eventdata$Workday <- eventdata$eventhour %in% seq(9, 17)
df <- data.frame(table(eventdata$eventhour),
business_hour = 0:23 %in% seq(9, 17))
colnames(df)[1:2] <- c("hour", "value")
ggplot(df, aes(hour, value, fill = business_hour)) +
coord_polar(theta = "x", start = 0) +
geom_bar(stat = "identity", width = .9)
I hope it helps. It doesn't tell you why you have a problem in your case but it gives you a viable solution.
It seems that the issue was caused by the arguments passed to geom_histogram and scale_x_continuous.
Instead of this:
geom_histogram(breaks = seq(0, 24), colour = "grey") +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0, 24))
it should be:
geom_histogram(bins = 24, colour = "grey") +
scale_x_continuous(breaks = seq(-0.5, 23.5), labels = seq(0, 24))
It's still a bit confusing to me why it works only this way, but it finally works...
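One likely explanation, offered as a sketch rather than a confirmed diagnosis: histogram bins are right-closed by default, so with breaks = seq(0, 24) an integer hour h lands in the bin (h-1, h], and the lowest break is folded into the first bin. Hours 0 and 1 then share the first bar, every other bar moves one slot to the left, and the last bin (23, 24] stays empty. Base R's hist() follows the same convention, so it can reproduce the effect from the counts in the question's table. The vector hour_counts below is copied from that table; the other names are made up for illustration.

```r
# Counts per hour (0..23), copied from the table in the question
hour_counts <- c(23, 22, 18, 26, 28, 20, 19, 21, 16, 17, 20, 16,
                 18, 22, 16, 21, 24, 21, 22, 27, 25, 18, 23, 17)
eventhour <- rep(0:23, times = hour_counts)

# Right-closed bins (the default): [0,1], (1,2], ..., (23,24]
right_closed <- hist(eventhour, breaks = 0:24, right = TRUE, plot = FALSE)$counts

# Left-closed bins: [0,1), [1,2), ..., [23,24]
left_closed <- hist(eventhour, breaks = 0:24, right = FALSE, plot = FALSE)$counts

right_closed[1]   # hours 0 and 1 merged into the first bar
right_closed[24]  # last bar is empty
all(left_closed == hour_counts)
```

If this is indeed the cause, keeping breaks = seq(0, 24) and adding closed = "left" to geom_histogram() should give the expected bars without shifting the axis breaks by half a unit.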
Using ggplot2 and scale_size_area(), how do I make the point size for area = 0.5 correspond to the size of a default point (size = 0.5)?
Here is a simple reprex showing that this is not the default behavior. I would like the black and red points to have the same size at the middle point (where area = 0.5):
ggplot(data.frame(area = seq(from = 0, to = 1, length.out = 17), y = 1), aes(x = area, y = y)) +
geom_point(aes(size = area), color = "red") + # Area point
geom_point() + # Default point
scale_size_area("size_area")
I have tried and failed with area = area / 2 and scale_size_area(rescaler = NULL).
You can play around with the range and limits arguments within scale_size to get something closer to what you're looking for:
ggplot(data.frame(area = seq(from = 0, to = 1, length.out = 17), y = 1), aes(x = area, y = y)) +
geom_point(aes(size = area), color = "red") + # Area point
geom_point() + # Default point
scale_size("size_area", range = c(-20, 10))
EDIT:
Since that's a little hacky and not scalable, the better way to do this is to first figure out what the default point size is:
default_size <- ggplot2:::check_subclass("point", "Geom")$default_aes$size
default_size
[1] 1.5
It should be 1.5, unless you've manually changed the defaults. Now we can rebuild the plot and figure out how the size aesthetic is currently being mapped to area:
df <- data.frame(area = seq(from = 0, to = 1, length.out = 17), y = 1)
g <- ggplot(df, aes(x = area, y = y)) +
geom_point(aes(size = area), color = "red") + # Area point
geom_point() +
scale_size_area()
g2 <- ggplot_build(g)
library(dplyr) # provides %>% and select()
g2$data[[1]] %>%
select(x, size)
x size
1 0.0000 0.000000
2 0.0625 1.500000
3 0.1250 2.121320
4 0.1875 2.598076
5 0.2500 3.000000
6 0.3125 3.354102
7 0.3750 3.674235
8 0.4375 3.968627
9 0.5000 4.242641
10 0.5625 4.500000
11 0.6250 4.743416
12 0.6875 4.974937
13 0.7500 5.196152
14 0.8125 5.408327
15 0.8750 5.612486
16 0.9375 5.809475
17 1.0000 6.000000
The relationship is size = 6*sqrt(x). Why 6? Because scale_size_area has a default max_size of 6, and the largest x in this data is 1. So, to make the x-value of 0.5 map to a size of 1.5, we solve max_size*sqrt(0.5) = 1.5 for a new max_size, which gives 1.5/sqrt(0.5).
To automate this, we can do the following:
default_size_val <- 0.5 # the area value that should be drawn at the default point size
max_size <- default_size/(sqrt(default_size_val))
ggplot(df, aes(x = area, y = y)) +
geom_point(aes(size = area), color = "red") + # Area point
geom_point() +
scale_size_area(max_size = max_size)
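As a quick numeric check of the derivation above, the mapping can be written out by hand. This is only a sketch: area_to_size is a hypothetical helper, not a ggplot2 function, and it assumes the largest area in the data is 1, as in this example.

```r
# Hypothetical helper mirroring size = max_size * sqrt(area / max_area)
area_to_size <- function(area, max_size, max_area = 1) {
  max_size * sqrt(area / max_area)
}

default_size <- 1.5   # default point size reported above
target_area  <- 0.5   # area value that should match the default point

# Solve max_size * sqrt(target_area) = default_size for max_size
max_size <- default_size / sqrt(target_area)

area_to_size(target_area, max_size)  # should equal default_size
area_to_size(0.0625, 6)              # reproduces the second row of the table above
```

If the data's maximum area were not 1, the max_area argument would have to change accordingly, and max_size would need to be recomputed.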
I am trying to do a bar chart of an aggregate, by the hour.
hourly <- data.frame(
hour = 0:23,
N = 7+0:23,
hour.mod = c(18:23, 0:17))
The day is from 6am to 6am, so I added an offset, hour.mod, and then:
ggplot(hourly, aes(x = hour.mod, y = N)) +
geom_col() +
labs(x = "6am to 6am", y = "Count")
Except that the x-axis scale starting at 0 contradicts the label. While tinkering with scales, I found that scale_x_discrete(breaks = c(6, 10, 14, 18, 22)) made the axis labels disappear altogether, which works for now but is sub-optimal.
How do I make the x axis start at an hour other than 0 or 23? Is there a way to do so without creating an offset column? I am a novice, so please assume you are explaining to the village idiot.
You don't say what you want to see, but it's fairly clear that you should be using scale_x_continuous and shifting your labels somehow, either "by hand" or with some simple math:
ggplot(hourly, aes(x = hour.mod, y = N)) +
geom_col() +
labs(x = "6am to 6am", y = "Count") +
scale_x_continuous(breaks= c(0,4,8,12,16), labels = c(6, 10, 14, 18, 22) )
Or perhaps:
ggplot(hourly, aes(x = hour.mod, y = N)) +
geom_col() +
labs(x = "6am to 6am", y = "Count") +
scale_x_continuous(breaks= c(6, 10, 14, 18, 22)-6, # shifts all values lower
labels = c(6, 10, 14, 18, 22) )
It's possible you need to use modulo arithmetic, which in R involves the use of %% and %/%:
1:24 %% 12
[1] 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 0
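For instance, both the offset column and the axis labels can be derived with %% instead of being typed by hand. This is a minimal sketch; day_start, axis_breaks and axis_labels are names introduced here for illustration.

```r
hour      <- 0:23
day_start <- 6                        # the "day" runs from 6am to 6am

# Position of each clock hour on the shifted axis (6am -> 0, 5am -> 23)
hour.mod <- (hour - day_start) %% 24

# Axis positions every 4 hours, and the clock hours they correspond to
axis_breaks <- seq(0, 20, by = 4)
axis_labels <- (axis_breaks + day_start) %% 24

hour.mod
axis_labels
```

The two vectors can then be passed to scale_x_continuous(breaks = axis_breaks, labels = axis_labels), which also labels the hours after midnight.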
I have the following generated data frame called Raw_Data:
Time Velocity Type
1 10 1 a
2 20 2 a
3 30 3 a
4 40 4 a
5 50 5 a
6 10 2 b
7 20 4 b
8 30 6 b
9 40 8 b
10 50 9 b
11 10 3 c
12 20 6 c
13 30 9 c
14 40 11 c
15 50 13 c
I plotted this data with ggplot2:
ggplot(Raw_Data, aes(x=Time, y=Velocity))+geom_point() + facet_grid(Type ~.)
I have the objects: Regression_a, Regression_b, Regression_c. These are the linear regression equations for each plot. Each plot should display the corresponding equation.
Using annotate displays the same equation on every plot:
annotate("text", x = 1.78, y = 5, label = Regression_a, color="black", size = 5, parse=FALSE)
I tried to overcome the issue with the following code:
Regression_a_eq <- data.frame(x = 1.78, y = 1,label = Regression_a,
Type = "a")
p <- x + geom_text(data = Raw_Data,label = Regression_a)
This did not solve the problem. Each plot still showed Regression_a, rather than just plot a.
You can put the expressions as character values in a new data frame with the same unique Type values as in your data frame and add them with geom_text:
regrDF <- data.frame(Type = c('a','b','c'), lbl = c('Regression_a', 'Regression_b', 'Regression_c'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type ~.)
which gives:
You can replace the text values in regrDF$lbl with the appropriate expressions.
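If the equations are not already available as strings, one way to build regrDF is to fit a model per Type and format the coefficients. This is only a sketch under assumptions: eq_label is a hypothetical helper, the "y = a + b x" format is arbitrary, and the question's own Regression_a, Regression_b and Regression_c objects may be formatted differently.

```r
# Data frame rebuilt from the table in the question
Raw_Data <- data.frame(
  Time     = rep(c(10, 20, 30, 40, 50), times = 3),
  Velocity = c(1, 2, 3, 4, 5,  2, 4, 6, 8, 9,  3, 6, 9, 11, 13),
  Type     = rep(c("a", "b", "c"), each = 5)
)

# Hypothetical helper: fit Velocity ~ Time and format the fitted line as text
eq_label <- function(d) {
  cf <- coef(lm(Velocity ~ Time, data = d))
  sprintf("y = %.2f + %.2f x", cf[[1]], cf[[2]])
}

# One label per facet; the Type column is what ties each label to its panel
pieces <- split(Raw_Data, Raw_Data$Type)
regrDF <- data.frame(
  Type = names(pieces),
  lbl  = vapply(pieces, eq_label, character(1)),
  row.names = NULL
)
regrDF
```

The resulting regrDF can be passed to geom_text(data = regrDF, ...) exactly as in the answer above.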
Just a supplement to the accepted answer, for the case where there are facets in both the horizontal and vertical directions (this assumes the data has two faceting columns, here called Type1 and Type2):
regrDF <- data.frame(Type1 = c('a','a','b','b'),
Type2 = c('c','d','c','d'),
lbl = c('Regression_ac', 'Regression_ad', 'Regression_bc', 'Regression_bd'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type1 ~ Type2)
The answer is good but still imperfect, as I do not know how to combine math expressions and newlines in the same label (see "Adding a newline in a substitute() expression").
I am trying to have 2 "shadows" on the background of the below plot. These shadows should represent the density of the orange and blue points separately. Does it make sense?
Here is the ggplot to improve:
Here is the code and the data (matrix df) I used to create this plot:
PC1 PC2 aa
A_akallopisos 0.043272525 0.0151023307 2
A_akindynos -0.020707141 -0.0158198405 1
A_allardi -0.020277664 -0.0221016281 2
A_barberi -0.023165596 0.0389906701 2
A_bicinctus -0.025354572 -0.0059122384 2
A_chrysogaster 0.012608835 -0.0339330213 2
A_chrysopterus -0.022402365 -0.0092476009 1
A_clarkii -0.014474658 -0.0127024469 1
A_ephippium -0.016859412 0.0320034231 2
A_frenatus -0.024190876 0.0238499714 2
A_latezonatus -0.010718845 -0.0289904165 1
A_latifasciatus -0.005645811 -0.0183202248 2
A_mccullochi -0.031664307 -0.0096059126 2
A_melanopus -0.026915545 0.0308399009 2
A_nigripes 0.023420045 0.0293801537 2
A_ocellaris 0.052042539 0.0126144250 2
A_omanensis -0.020387101 0.0010944998 2
A_pacificus 0.042406273 -0.0260308092 2
A_percula 0.034591721 0.0071153133 2
A_perideraion 0.052830132 0.0064495142 2
A_polymnus 0.030902254 -0.0005091421 2
A_rubrocinctus -0.033318659 0.0474995722 2
A_sandaracinos 0.055839755 0.0093724082 2
A_sebae 0.021767793 -0.0218640814 2
A_tricinctus -0.016230301 -0.0018526482 1
P_biaculeatus -0.014466403 0.0024864574 2
ggplot(data=df, aes(x=PC1, y=PC2, color=factor(aa), label=rownames(df))) +
ggtitle(paste('Site n° ',Sites_names[j],sep='')) +
geom_smooth(se=F, method='lm') +
geom_point() +
scale_color_manual(name='mutation', values = c("darkorange2","cornflowerblue"), labels = c("A","S")) +
geom_text(hjust=0.5, vjust=-1 ,size=3) +
xlim(-0.05,0.07)
Here are some possible approaches using stat_density2d() with geom="polygon" and mapping or setting alpha transparency for the density fill regions. If you are willing to experiment with some of the parameters, I think you can get some very useful plots. Specifically, you may want to adjust the following:
n controls the smoothness of the density polygon.
h is the bandwidth of the density estimation.
bins controls the number of density levels.
df = read.table(header=TRUE, text=
" PC1 PC2 aa
A_akallopisos 0.043272525 0.0151023307 2
A_akindynos -0.020707141 -0.0158198405 1
A_allardi -0.020277664 -0.0221016281 2
A_barberi -0.023165596 0.0389906701 2
A_bicinctus -0.025354572 -0.0059122384 2
A_chrysogaster 0.012608835 -0.0339330213 2
A_chrysopterus -0.022402365 -0.0092476009 1
A_clarkii -0.014474658 -0.0127024469 1
A_ephippium -0.016859412 0.0320034231 2
A_frenatus -0.024190876 0.0238499714 2
A_latezonatus -0.010718845 -0.0289904165 1
A_latifasciatus -0.005645811 -0.0183202248 2
A_mccullochi -0.031664307 -0.0096059126 2
A_melanopus -0.026915545 0.0308399009 2
A_nigripes 0.023420045 0.0293801537 2
A_ocellaris 0.052042539 0.0126144250 2
A_omanensis -0.020387101 0.0010944998 2
A_pacificus 0.042406273 -0.0260308092 2
A_percula 0.034591721 0.0071153133 2
A_perideraion 0.052830132 0.0064495142 2
A_polymnus 0.030902254 -0.0005091421 2
A_rubrocinctus -0.033318659 0.0474995722 2
A_sandaracinos 0.055839755 0.0093724082 2
A_sebae 0.021767793 -0.0218640814 2
A_tricinctus -0.016230301 -0.0018526482 1
P_biaculeatus -0.014466403 0.0024864574 2")
library(ggplot2)
p1 = ggplot(data=df, aes(x=PC1, y=PC2, color=factor(aa), label=rownames(df))) +
ggtitle(paste('Site n° ',sep='')) +
stat_density2d(aes(fill=factor(aa), alpha = ..level..),
geom="polygon", color=NA, n=200, h=0.03, bins=4) +
geom_smooth(se=F, method='lm') +
geom_point() +
scale_color_manual(name='mutation',
values = c("darkorange2","cornflowerblue"),
labels = c("A","S")) +
scale_fill_manual( name='mutation',
values = c("darkorange2","cornflowerblue"),
labels = c("A","S")) +
geom_text(hjust=0.5, vjust=-1 ,size=3, color="black") +
scale_x_continuous(expand=c(0.3, 0)) + # Zooms out so that density polygons
scale_y_continuous(expand=c(0.3, 0)) + # don't reach edges of plot.
coord_cartesian(xlim=c(-0.05, 0.07),
ylim=c(-0.04, 0.05)) # Zooms back in for the final plot.
p2 = ggplot(data=df, aes(x=PC1, y=PC2, color=factor(aa), label=rownames(df))) +
ggtitle(paste('Site n° ',sep='')) +
stat_density2d(aes(fill=factor(aa)), alpha=0.2,
geom="polygon", color=NA, n=200, h=0.045, bins=2) +
geom_smooth(se=F, method='lm', size=1) +
geom_point(size=2) +
scale_color_manual(name='mutation',
values = c("darkorange2","cornflowerblue"),
labels = c("A","S")) +
scale_fill_manual( name='mutation',
values = c("darkorange2","cornflowerblue"),
labels = c("A","S")) +
geom_text(hjust=0.5, vjust=-1 ,size=3) +
scale_x_continuous(expand=c(0.3, 0)) + # Zooms out so that density polygons
scale_y_continuous(expand=c(0.3, 0)) + # don't reach edges of plot.
coord_cartesian(xlim=c(-0.05, 0.07),
ylim=c(-0.04, 0.05)) # Zooms back in for the final plot.
library(gridExtra)
ggsave("plots.png", plot=arrangeGrob(p1, p2, ncol=1), width=8, height=11, dpi=120)
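To get a feel for what n and h do before plotting, the density grid can be computed directly with MASS::kde2d(), which is a 2D kernel density estimator of the kind stat_density2d() relies on. This is a rough sketch: it uses only the five points with aa = 1 from the data above, and the names coarse, fine and narrow are made up for illustration.

```r
library(MASS)

# The five points with aa == 1 from the data above
pc1 <- c(-0.020707141, -0.022402365, -0.014474658, -0.010718845, -0.016230301)
pc2 <- c(-0.0158198405, -0.0092476009, -0.0127024469, -0.0289904165, -0.0018526482)

# Same evaluation window as the coord_cartesian() limits in the plots
lims <- c(-0.05, 0.07, -0.04, 0.05)

coarse <- kde2d(pc1, pc2, h = 0.03, n = 25,  lims = lims)  # few grid points
fine   <- kde2d(pc1, pc2, h = 0.03, n = 200, lims = lims)  # smoother outlines
narrow <- kde2d(pc1, pc2, h = 0.01, n = 200, lims = lims)  # tighter bandwidth

dim(coarse$z)  # n x n grid of density values
dim(fine$z)
```

A larger n only refines the grid on which the density is evaluated, while h changes the estimate itself: a smaller h hugs the individual points more closely and a larger h spreads the "shadow" out.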
Here's my suggestion. Using shadows or polygons is going to get pretty ugly when you overlay two colors and densities. A contour plot could be nicer to look at and is certainly easier to work with.
I've created some random data as a reproducible example and used a simple density function that uses the average distance of the nearest 5 points.
df <- data.frame(PC1 = runif(20),
PC2 = runif(20),
aa = rbinom(20,1,0.5))
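point.density <- function(row){
# points of the same type as this grid cell (third column)
points <- df[df$aa == row[[3]],]
# Euclidean distance from the grid cell to each of those points
dist <- sqrt((points$PC1 - row[[1]])^2 + (points$PC2 - row[[2]])^2)
# density = inverse of the mean distance to the 5 nearest points
1/mean(sort(dist)[1:5])
}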
# you need to calculate the density for the whole grid.
res <- c(1:100)/100 # this is the resolution, so gives a 100x100 grid
plot.data0 <- data.frame(x.val = rep(res,each = length(res)),
y.val = rep(res, length(res)),
type = rep(0,length(res)^2))
plot.data1 <- data.frame(x.val = rep(res,each = length(res)),
y.val = rep(res, length(res)),
type = rep(1,length(res)^2))
plot.data <- rbind(plot.data0,plot.data1)
# we need a density value for each point type, so 2 grids
densities <- apply(plot.data,1,point.density)
plot.data <- cbind(plot.data, z.val = densities)
library(ggplot2)
# use stat_contour to draw the densities. Be careful to specify which dataset you're using
ggplot() +
stat_contour(data = plot.data, aes(x=x.val, y=y.val, z=z.val, colour = factor(type)), bins = 20, alpha = 0.4) +
geom_point(data = df, aes(x=PC1,y=PC2,colour = factor(aa)))
contour plot http://img34.imageshack.us/img34/6215/1yvb.png