I am going crazy with this. I know I am just doing something simple wrong.
All I want is to put the bars of this simple plot side by side so I can compare paired data. position = "dodge" is not working.
require(tidyverse)
mine = tibble(
x = seq(1,36,1),
y = rnorm(36),
z = rexp(36)
)
ggplot(data = mine,aes(x,y)) +
geom_col(colour = "red") +
geom_col(aes(x,z),colour="white")
I am either putting it in the wrong place, or my data is not set up correctly, but this should be simple!!
You need to prepare the data in a tidy way. Then you can use fill to separate the variables:
require(tidyr)
dp <- gather(mine, Var,Value,-x)
ggplot(data = dp,aes(x,Value, fill=Var)) +
geom_col( position="dodge") +
scale_fill_manual(values=c("red","white"))
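In current versions of tidyr, gather() is documented as superseded by pivot_longer(). The following is a minimal sketch of the same reshaping with pivot_longer(), assuming tidyr 1.0.0 or later. The set.seed() call and the object name p are additions for illustration and are not part of the original answer.

```r
library(tidyr)
library(tibble)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
mine <- tibble(
  x = seq(1, 36, 1),
  y = rnorm(36),
  z = rexp(36)
)

# Stack y and z into a single Value column; Var records the source column
dp <- pivot_longer(mine, cols = c(y, z), names_to = "Var", values_to = "Value")

# One bar per Var at each x, drawn side by side
p <- ggplot(dp, aes(x, Value, fill = Var)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("red", "white"))
```

Printing p draws the plot. The long table has two rows per original x, one for each of y and z.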
I'm trying to make a time series plot of weekly Covid cases using a bar chart. However when using ggplot, the plot has really inconsistent (though strangely regular?) spacing between bars and it looks messy. E.g.
Date <- seq(as.Date("2017-01-01"), as.Date("2020-01-05"), by = 7)
Cases <- seq(1,20,length.out = 158)
data <- as.data.frame(cbind(Date, Cases))
ggplot(data = data, aes(x = Date, y = Cases)) +
geom_bar(stat = "identity") +
ggtitle("Why the inconsistent spacing?")
Does anyone have any idea why this might be? Or know of a way I could fix it? Or a different plotting method that would achieve the same thing but without the inconsistent spacing?
Thanks,
Debbie
This is perhaps a remedy rather than a solution: try plotting with the outline colour and the fill set to the same colour:
geom_bar(stat = "identity", col="darkgrey", fill="darkgrey") +
You can solve this with geom_segment - it's likely an issue with pixel sizes, as suggested by Richard Telford in this thread.
library(ggplot2)
Date <- seq(as.Date("2017-01-01"), as.Date("2020-01-05"), by = 7)
Cases <- seq(1,20,length.out = 158)
data <- as.data.frame(cbind(Date, Cases))
ggplot(data = data, aes(x = Date, y = Cases)) +
geom_segment(aes(xend = Date, y = 0, yend = Cases)) +
ggtitle("This is a pixel issue")
Created on 2021-04-14 by the reprex package (v1.0.0)
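One side note on the data setup, offered as a sketch and not as a fix for the spacing itself: cbind() builds a numeric matrix, so the Date column in the code above ends up stored as a day count instead of a date. Building the data frame with data.frame() keeps the Date class, which lets ggplot2 draw a date axis. The names coerced and weekly are made up for this example.

```r
library(ggplot2)

Date <- seq(as.Date("2017-01-01"), as.Date("2020-01-05"), by = 7)
Cases <- seq(1, 20, length.out = 158)

# cbind() drops the Date class: the column becomes days since 1970-01-01
coerced <- as.data.frame(cbind(Date, Cases))
class(coerced$Date)

# data.frame() keeps the Date class
weekly <- data.frame(Date, Cases)
class(weekly$Date)

# Same geom_segment approach as above, now on a date axis
p <- ggplot(weekly, aes(x = Date, y = Cases)) +
  geom_segment(aes(xend = Date, y = 0, yend = Cases))
```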
Recently I ran into a puzzling problem with ggplot2. I build a first plot, "pic1", and it looks fine. I then build a second plot, "pic2", which also looks fine. But when I go back and check "pic1", the regression line has become a vertical line. For example:
"pic1"
p <- ggplot()
p <- p + geom_line(data = MyData, aes(x = otherCrop, y = eta ))
p <- p+ geom_point(data = dat,aes(x =otherCrop,
y = dat$sumEnemies, colour = YEAR ),position = position_jitter(width = .01),size = 1)
p <- p+labs(colour = "年份\nYear") + theme_classic(base_size=18) +
theme(axis.title.x=element_text( vjust=0))
p=p + theme(text=element_text(family="Times", size=18))
pic1=p
"pic2"
p <- ggplot()
p <- p + geom_line(data = MyData, aes(x = SHDI, y = eta ))
p <- p+ geom_point(data = dat,aes(x = dat$SHDI,
y = eta,colour = YEAR ),position = position_jitter(width = .01),size = 1)
p <- p+labs(colour = "年份\nYear") + theme_classic(base_size=18) +
theme(axis.title.x=element_text( vjust=0))
p=p + theme(text=element_text(family="Times", size=18))
pic2=p
But when I went back to review "pic1" at this point, the regression line had become a strange, short vertical line. This is a problem because I cannot put both plots on the same page. Does anybody know what is going wrong?
I think this is a great example of why using the dataframe$column syntax inside an aes call is discouraged: it makes your plot vulnerable to subsequent changes in your data. Here's a simple example. Start with a data frame with columns x and y:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10)
Now make a ggplot, but instead of using aes(x = x, y = y), we make the mistake of doing aes(x = df$x, y = df$y):
vulnerable_plot <- ggplot()
vulnerable_plot <- vulnerable_plot + geom_line(data = df, aes(x = df$x, y = df$y))
pic1 <- vulnerable_plot
Now we review our plot. Sure, ggplot nags us to say we shouldn't use this syntax, but the plot looks fine, so who cares, right?
pic1
#> Warning: Use of `df$x` is discouraged. Use `x` instead.
#> Warning: Use of `df$y` is discouraged. Use `y` instead.
Now, let's make pic2 identical to pic1 except we use the correct syntax:
invulnerable_plot <- ggplot()
invulnerable_plot <- invulnerable_plot + geom_line(data = df, aes(x = x, y = y))
pic2 <- invulnerable_plot
Now we don't get any warning, but the plot looks the same.
pic2
So there's no difference between pic1 and pic2. Or is there? What happens when we change our data frame?
df$y <- 10:1
vulnerable_plot
Oh dear. Our first plot has changed because the plot object has a reference to an external variable that it relies on to build the plot. That's not what we wanted.
However, with the version where we used the correct syntax, a copy of the data was taken and is kept with the plot data, so it remains unaffected by subsequent changes to df:
invulnerable_plot
Created on 2020-08-23 by the reprex package (v0.3.0)
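The same effect can be checked without looking at the pictures. This is a small sketch that uses layer_data() from ggplot2 to pull out the values each plot would actually draw. The names p_dollar, p_bare, y_dollar and y_bare are made up for this example.

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10)

# Plot whose aesthetics refer to df from outside via df$...
p_dollar <- ggplot() + geom_line(data = df, aes(x = df$x, y = df$y))

# Plot whose aesthetics refer to columns of the data stored in the layer
p_bare <- ggplot() + geom_line(data = df, aes(x = x, y = y))

# Change the data frame after both plots have been created
df$y <- 10:1

# layer_data() builds the plot and returns the computed data for one layer
y_dollar <- layer_data(p_dollar, 1)$y  # follows the modified df
y_bare <- layer_data(p_bare, 1)$y      # still the original values
```

Building p_dollar also emits the "Use of `df$x` is discouraged" warnings shown earlier.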
I find some of the Stata functions for quickly getting an idea of what your panel is doing extremely useful. xtline is one. It gives you an overview of your variable as a set of line plots, one for each country, all over time and all in one window.
It looks like this, and while it isn't fast, it's very useful for checking whether some operation did what you think it did. Does anyone know if a package that does something like that exists? If not, what are your tricks?
Simples:
dfr <- data.frame(id = rep(1:5, each = 20), time = rep(1991:2010, 5),
variable = rnorm(100))
Equivalent of xtline:
library(ggplot2)
ggplot(dfr, aes(x = time, y = variable)) + geom_line() + facet_wrap(~id)
Equivalent of xtline with overlay option:
ggplot(dfr, aes(x = time, y = variable, group = id, color = id)) + geom_line()
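Two optional tweaks, sketched here under the assumption that id is a numeric identifier as in the example data: converting id to a factor gives the overlay a discrete colour legend instead of a continuous gradient, and scales = "free_y" lets each panel use its own y range. The column name id_f and the seed are additions for this example.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
dfr <- data.frame(id = rep(1:5, each = 20), time = rep(1991:2010, 5),
                  variable = rnorm(100))

# Treat the panel identifier as a category, not a number
dfr$id_f <- factor(dfr$id)

# One panel per id, each with its own y axis range
p_panels <- ggplot(dfr, aes(x = time, y = variable)) +
  geom_line() +
  facet_wrap(~id_f, scales = "free_y")

# Overlay version with one discrete colour per id
p_overlay <- ggplot(dfr, aes(x = time, y = variable, colour = id_f)) +
  geom_line()
```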
I'm having trouble creating a figure with ggplot2.
In this plot, I'm using geom_bar to plot three factors: for each "time" and "dose" I'm plotting two bars (two genotypes).
To be more specific, this is what I mean:
This is my code so far (I actually changed some settings, but I'm presenting just what is needed):
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")
Question: I want to add the mean for each time as a point, placed in the middle of the bars for that time. How can I proceed?
I tried to add these points using geom_dotplot and geom_point, but I did not succeed.
library(dplyr)
library(ggplot2)
# name the summary column so it can be referred to later
time_data = data %>% group_by(time) %>% summarize(mean_b = mean(b))
data <- inner_join(data,time_data,by = "time")
This gives you data with the means attached in a column called mean_b. Now make the plot:
ggplot(data=data, aes(x=interaction(dose,time), y=b,fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")+
geom_text(aes(label = round(mean_b, 1)),vjust = 0)
You might need to fiddle around with the arguments hjust and vjust in the geom_text statement. Maybe the aes one too; I didn't run the program, so I don't know.
It generally helps if you can give a reproducible example. Here, I made some of my own data.
sampleData <-
data.frame(
dose = 1:3
, time = rep(1:3, each = 3)
, genotype = rep(c("AA","aa"), each = 9)
, b = rnorm(18, 20, 5)
)
You need to calculate the means somewhere, and I chose to do that on the fly. Note that, instead of using points, I used a line to show that the mean is for all of those values. I also sorted somewhat differently, and used facet_wrap to cluster things together. Points would be a fair bit harder to place, particularly when using position_dodge, but you could likely modify this code to accomplish that.
library(ggplot2)
library(dplyr)  # provides %>%, group_by() and summarise()
ggplot(
sampleData
, aes(x = dose
, y = b
, fill = genotype)
) +
geom_bar(position = "dodge", stat = "identity") +
geom_hline(data =
sampleData %>%
group_by(time) %>%
summarise(meanB = mean(b)
, dose = NA, genotype = NA)
, aes(yintercept = meanB)
, col = "black"
) +
facet_wrap(~time)
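If points are preferred over a line, one possible modification of the faceted plot is sketched below. It assumes the point should sit at the middle of the dose axis within each time panel. The name time_means and the seed are additions for this example.

```r
library(ggplot2)
library(dplyr)

set.seed(123)  # assumption: fixed seed so the example is reproducible
sampleData <- data.frame(
  dose = 1:3,
  time = rep(1:3, each = 3),
  genotype = rep(c("AA", "aa"), each = 9),
  b = rnorm(18, 20, 5)
)

# One row per time: the mean of b, and the midpoint of the dose axis
time_means <- sampleData %>%
  group_by(time) %>%
  summarise(meanB = mean(b), dose = mean(range(dose)))

# fill is mapped only in the bar layer, so the points do not need a genotype
p <- ggplot(sampleData, aes(x = dose, y = b)) +
  geom_col(aes(fill = genotype), position = "dodge") +
  geom_point(data = time_means, aes(y = meanB), size = 3) +
  facet_wrap(~time)
```

Each panel then shows one point at the centre of its x axis, at the height of that panel's mean.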
I have a sample data frame like this:
Measurement <- c("Length","Breadth","Length","Breadth","Height",
"Height","Breadth","Length","Height","Breadth",
"Length","Height","Height","Breadth","Length")
Value <- c(45,43,45,100,62,62,43,74,74,74,12,17,17,44,12)
data <- data.frame(Measurement, Value)
I am trying to visualize this data to see how the values are distributed for each measurement, and also for all measurements combined. I am using a basic histogram to do this, but it is not visually appealing:
hist(data$Value)
Could someone help me view this data better with ggplot2 or another more advanced visualization? I would like to group by Measurement, and to see whether density plots can show something meaningful here. Any help would be appreciated.
Here are a couple interesting options:
library(ggplot2)
ggplot(data, aes(factor(Measurement), Value)) + geom_violin(aes(fill = factor(Measurement)))
ggplot(data, aes(Value, colour = Measurement, group = Measurement)) + geom_density(fill=NA)
They produce the following:
Hope this helps!
Here is another possibility using geom_histogram. To get the best looking, most informative histogram, it is important to set the binwidth manually for every new data set.
library(ggplot2)
p = ggplot(data=data, aes(x=Value, fill=Measurement)) +
geom_histogram(binwidth=1, colour="grey40", drop=TRUE) +
facet_grid(Measurement ~ ., margins=TRUE) +
theme_bw()
ggsave("hist.png", plot=p, width=8, height=4, dpi=150)
Not sure if I understood the question. Do you want to separate the values?
For that, you can do something like this:
ValueLength <- data.frame(Value = Value[which(Measurement == "Length")], Measurement = "Length")
ValueBreadth <- data.frame(Value = Value[which(Measurement == "Breadth")], Measurement = "Breadth")
ValueHeight <- data.frame(Value = Value[which(Measurement == "Height")], Measurement = "Height")
Then you can combine them in one data frame again:
Values <- rbind(ValueLength, ValueBreadth, ValueHeight)
And plot with ggplot:
ggplot(Values, aes(Value, fill = Measurement)) + geom_density(alpha = 0.2)
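Since the data frame from the question already has a Measurement column, the split and rbind steps may not be needed. As a sketch, the same density plot can be drawn from data directly. The names p and counts are additions for this example.

```r
library(ggplot2)

Measurement <- c("Length","Breadth","Length","Breadth","Height",
                 "Height","Breadth","Length","Height","Breadth",
                 "Length","Height","Height","Breadth","Length")
Value <- c(45,43,45,100,62,62,43,74,74,74,12,17,17,44,12)
data <- data.frame(Measurement, Value)

# fill = Measurement makes ggplot2 compute one density per measurement
p <- ggplot(data, aes(Value, fill = Measurement)) +
  geom_density(alpha = 0.2)

# Number of observations behind each density curve
counts <- table(data$Measurement)
```

With only five observations per group, the density curves are rough estimates at best.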