I am building a vertical profile plot of water columns. My issue is that the dots are connected in order of the x observations rather than the y observations. In ggplot, I know geom_path can do this, but I can't use ggplot because I want to add several x axes. Therefore I am using plot().
So here is what I tried:
Storfjorden <- read.delim("C:/Users/carvi/Desktop/Storfjorden.txt")
smooF=smooth.spline(Storfjorden$Fluorescence,Storfjorden$Depth,spar=0.50)
plot(Storfjorden$Fluorescence,Storfjorden$Depth,ylim=c(80,0),type="n")
lines(smooF)
Resulting plot
As you can see, the dots are connected in order of the x observations. To show a vertical profile, I would like them connected in order of the y observations. I tried ordering them by depth (using order()) and it didn't affect the result. Does anyone have a clue?
If, as an alternative, someone has an idea how to plot different lines with different axes on a single plot (temperature, salinity, fluorescence), then I could use geom_path(). Thank you!
A follow-up question: is there a way in ggplot to use geom_smooth(), but with the observations connected in the order they appear instead of along the x axis?
ggplot(melteddf, aes(y = Depth, x = value)) + geom_path() +
facet_wrap(~variable, nrow = 1, scales = "free_x") + scale_y_reverse() +
geom_smooth(span = 0.5, se = FALSE)
I tried using smooth.spline, but geom_path didn't recognize the resulting object. Thanks!
There is a reason that ggplot2 makes it difficult to plot multiple x-axes on a single plot -- it generally leads to graphs that are difficult to read (or worse, misleading). If you have a motivating example showing why your plot will not fall into one of those categories, sharing more details would let us help you further. Below, however, are two workarounds that might help.
Here is a quick MWE to address the question -- it might be more helpful if you gave us something that looks like your actual data, but this at least gets things on very different scales (though, with no structure, the plots are rather messy).
Note that I am using dplyr for several manipulations and reshape2 to melt the data into a long format for easier plotting.
library(dplyr)
library(reshape2)
library(ggplot2)
df <-
data.frame(
depth = runif(20, 0, 100) %>% round %>% sort
, measureA = rnorm(20, 10, 3)
, measureB = rnorm(20, 50, 10)
, measureC = rnorm(20, 1000, 30)
)
meltedDF <-
df %>%
melt(id.vars = "depth")
The first option is to simply use facets to plot the data next to each other:
meltedDF %>%
ggplot(aes(y = depth
, x = value)) +
geom_path() +
facet_wrap(~variable
, nrow = 1
, scales = "free_x") +
scale_y_reverse()
The second is to standardize the data, then plot that. Here, I am using the z-score, though if you have a reason to use something else (e.g. scaled to center at the "appropriate" amount of whatever variable you are using) you could change that formula:
meltedDF %>%
group_by(variable) %>%
mutate(Standardized = (value - mean(value)) / sd(value) ) %>%
ggplot(aes(y = depth
, x = Standardized
, col = variable)) +
geom_path() +
scale_y_reverse()
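As a rough illustration of swapping in a different scaling, the sketch below computes both the z-score and a min-max rescaling to [0, 1] within each variable, using base R only. The data frame `long` and the column names `z` and `scaled01` are made up for this example and are not part of the answer's data.

```r
set.seed(1)

# Hypothetical long-format data with two variables on very different scales
long <- data.frame(
  variable = rep(c("measureA", "measureB"), each = 20),
  value    = c(rnorm(20, 10, 3), rnorm(20, 50, 10))
)

# z-score within each variable (same formula as in the mutate() call above)
long$z <- ave(long$value, long$variable,
              FUN = function(v) (v - mean(v)) / sd(v))

# Alternative: rescale each variable to the range [0, 1]
long$scaled01 <- ave(long$value, long$variable,
                     FUN = function(v) (v - min(v)) / (max(v) - min(v)))

# Per-variable checks: z has mean 0 and sd 1, scaled01 spans 0 to 1
tapply(long$z, long$variable, mean)
tapply(long$z, long$variable, sd)
tapply(long$scaled01, long$variable, range)
```

Either column could then be mapped to x in place of Standardized; which scaling is appropriate depends on what you want readers to compare.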
If you need to plot multiple sites, here is some sample data with sites:
df <-
data.frame(
depth = runif(60, 0, 100) %>% round %>% sort
, measureA = rnorm(60, 10, 3)
, measureB = rnorm(60, 50, 10)
, measureC = rnorm(60, 1000, 30)
, site = sample(LETTERS[1:3], 60, TRUE)
)
meltedDF <-
df %>%
melt(id.vars = c("site", "depth"))
You can either use facet_grid (my preference):
meltedDF %>%
ggplot(aes(y = depth
, x = value)) +
geom_path() +
facet_grid(site~variable
, scales = "free_x") +
scale_y_reverse()
or add facet_wrap to the standardized plot:
meltedDF %>%
group_by(variable) %>%
mutate(Standardized = (value - mean(value)) / sd(value) ) %>%
ggplot(aes(y = depth
, x = Standardized
, col = variable)) +
geom_path() +
facet_wrap(~site) +
scale_y_reverse()
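For the base-graphics attempt in the question, one possible fix is to fit the spline with depth as the predictor and then draw the fitted values with the axes swapped. This is only a sketch: the data frame `profile` below is simulated as a stand-in, since the Storfjorden file is not available here.

```r
set.seed(42)

# Hypothetical stand-in for the Storfjorden profile (not the real data)
profile <- data.frame(Depth = seq(0, 80, by = 2))
profile$Fluorescence <- exp(-((profile$Depth - 30) / 12)^2) +
  rnorm(nrow(profile), sd = 0.05)

# Smooth fluorescence as a function of depth, so the fit is ordered by depth
smooF <- smooth.spline(x = profile$Depth, y = profile$Fluorescence, spar = 0.5)

# Empty plot with fluorescence on x and a reversed depth axis on y
plot(profile$Fluorescence, profile$Depth, ylim = c(80, 0), type = "n",
     xlab = "Fluorescence", ylab = "Depth")

# smooF$x holds the depths and smooF$y the fitted fluorescence; swap them when drawing
lines(smooF$y, smooF$x)
```

Because smooth.spline() returns its fitted values sorted by the predictor, using depth as the predictor makes lines() connect the points from the surface downward.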
Consider the following example:
library(ggplot2)
library(dplyr)
set.seed(1e3)
n <- 1e3
dt <- data.frame(
age = 50*rbeta(n, 5, 1),
value = 1000*rbeta(n, 1, 3)
)
And let's assume that you are interested in the relative behavior of value within each band of age.
dt %>% ggplot(aes(x = age, y = value)) + geom_bin2d() would provide an "absolute" map of the data (even when using geom_bin2d(aes(fill = ..density..)), which divides all the data by the total count). Is there a way to achieve the initial goal, i.e. to rescale the counts within each "column" (each age group created by geom_bin2d()) so that comparisons are not biased by the sample size in each group?
I would like to stick with "maps" since they are quite relevant when there is a lot of underlying data, but other approaches are welcome.
When you are trying to do something a bit different from what the standard ggplot summary functions are used for, you often find it is easier to just manipulate the data yourself. For example, you can easily bin the data yourself using findInterval, then normalize each age band using standard dplyr functions. Then you are free to plot however you like, using a plain geom_tile without trying to coax a more complex calculation out of ggplot.
library(ggplot2)
library(dplyr)
dt %>%
mutate(age = seq(10, 50, 2)[findInterval(dt$age, seq(10, 50, 2))]) %>%
mutate(value = seq(0, 1000, 45)[findInterval(dt$value, seq(0, 1000, 45))]) %>%
count(age, value) %>%
group_by(age) %>%
mutate(n = n/sum(n)) %>%
ggplot(aes(age, value, fill = n)) +
geom_tile() +
scale_fill_viridis_c(name = 'normalized counts\nby age band') +
theme_minimal(base_size = 16)
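The same per-band normalization can also be sketched in base R, which makes it easy to check that every age band sums to 1. The break points below are illustrative choices, not the ones used in the pipeline above.

```r
set.seed(1e3)
n <- 1e3
dt <- data.frame(
  age   = 50 * rbeta(n, 5, 1),
  value = 1000 * rbeta(n, 1, 3)
)

# Bin both variables (break points are assumptions made for this sketch)
age_band   <- cut(dt$age,   breaks = seq(0, 50, 2),    include.lowest = TRUE)
value_band <- cut(dt$value, breaks = seq(0, 1000, 50), include.lowest = TRUE)

# Cross-tabulate, then keep only the age bands that contain data
counts <- table(age_band, value_band)
counts <- counts[rowSums(counts) > 0, , drop = FALSE]

# Divide each row (age band) by its own total
shares <- prop.table(counts, margin = 1)

# Every remaining age band now sums to 1
rowSums(shares)
```

Each row of `shares` is the distribution of value within one age band, so bands with few observations are no longer dwarfed by bands with many.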
While the question appears similar to others, there's a key difference in my mind.
I want to be able to calculate and/or print the peak value of a density curve for EACH SUB-CONDITION BY FACET (graphing it would be the ultimate goal, but calculating it in the data frame is the primary goal). The density graph looks like this:
So, ideally, I would be able to know the intensity (x-axis value) corresponding to the highest peak of the density curves for each condition.
Here's some dummy data:
set.seed(1234)
library(tidyverse)
library(fs)
n <- 100000
silence <- factor(c("sil1", "sil2", "sil3", "sil4", "sil5"))
treat <- factor(c("con", "uos", "uos+wnt5a", "wnt5a"))
# Recycle the factor levels to n rows; sample with replacement because n exceeds the 6001 possible intensities
df <- data.frame(
silence = rep(silence, length.out = n),
treat = rep(treat, length.out = n),
intensity = sample(4000:10000, n, replace = TRUE)
)
What I've tried:
Subsetting the primary DF and going through and calculating the density of each condition, but this could take days
Something close to this answer: Calculating peaks in histograms or density functions but not quite. I think the data look better as a histogram personally, but that constructs an arbitrary number of bins for intensity data (a continuous measure). The histogram looks like this:
Again, it would be sufficient to get the peak values for each of these groups (i.e., treatments by silencing subdistributions) just in the console, but adding them as a vertical line in the graphs would be a sweet cherry on top (it could also make it hella busy, so I will see about that piece later)
Thank you!!
Depending on the way you're producing the density plots, there may be a more direct way to recreate the density calculation before it goes into ggplot. That'll be the easiest way to get the peak values and keep them in the format of your data.
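A minimal sketch of that more direct route, assuming the curves come from the default kernel density estimate: compute density() per group and take the x value at the maximum. The helper name `density_peak` and the column name `peak_wt` are made up for this example, and mtcars stands in for the real data.

```r
# Peak of a kernel density estimate for one numeric vector:
# returns the x position where the estimated density is highest
density_peak <- function(x) {
  d <- density(x)  # default Gaussian kernel with the "nrd0" bandwidth
  d$x[which.max(d$y)]
}

# One peak per gear-by-am combination that actually occurs in the data
peaks <- aggregate(wt ~ gear + am, data = mtcars, FUN = density_peak)
names(peaks)[names(peaks) == "wt"] <- "peak_wt"
peaks
```

The peaks should be close to what the plot shows, though not necessarily identical, since ggplot2 evaluates the density on its own grid over the range of each group.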
Without that, here's a hack that should work in general, but requires some kludging to fit the extracted points back into the form of your original data.
Here's a plot like yours:
mtcars %>%
mutate(gear = as.character(gear)) %>%
ggplot(aes(wt, fill = gear, group = gear)) +
geom_density(alpha = 0.2) +
facet_wrap(~am) ->my_plot
Here are the components that make up that plot:
ggplot_build(my_plot) -> my_plot_innards
With some ugly hacking we can extract the points that make up the curves and make them look kind of like our original data. Some info is destroyed, e.g. the gear values 3/4/5 become group 1/2/3. There might be a cool way to convert back, but I don't know it yet.
extracted_points <- tibble(
wt = my_plot_innards[["data"]][[1]][["x"]],
y = my_plot_innards[["data"]][[1]][["y"]],
gear = (my_plot_innards[["data"]][[1]][["group"]] + 2) %>% as.character, # HACK
am = (my_plot_innards[["data"]][[1]][["PANEL"]] %>% as.numeric) - 1 # HACK
)
ggplot(extracted_points, aes(wt, y, color = gear)) +
geom_point(size = 0.3) +
facet_wrap(~am)
extracted_points_notes <- extracted_points %>%
group_by(gear, am) %>%
slice_max(y)
my_plot +
geom_point(data = extracted_points_notes,
aes(y = y), color = "red", size = 3, show.legend = FALSE) +
geom_text(data = extracted_points_notes, hjust = -0.5,
aes(y = y, label = scales::comma(y)), color = "red", size = 3, show.legend = FALSE)
I need to overlay normal density curves on 3 histograms sharing the same y-axis. The curves need to be separate for each histogram.
My dataframe (example):
height <- seq(140,189, length.out = 50)
weight <- seq(67,86, length.out = 50)
fev <- seq(71,91, length.out = 50)
df <- as.data.frame(cbind(height, weight, fev))
I created the histograms for the data as:
library(ggplot2)
library(tidyr)
df %>%
gather(key=Type, value=Value) %>%
ggplot(aes(x=Value,fill=Type)) +
geom_histogram(binwidth = 8, position="dodge")
I am now stuck at how to overlay normal density curves for the 3 variables (separate curve for each histogram) on the histograms that I have generated. I won't mind the final figure showing either count or density on the y-axis.
Any thoughts on how to proceed from here?
Thanks in advance.
I believe that the code in the question is almost right; the code below just uses the answer in the link provided by @akrun.
Note that I have commented out the call to facet_wrap by placing a comment char before the last plus sign.
library(ggplot2)
library(tidyr)
df %>%
gather(key = Type, value = Value) %>%
ggplot(aes(x = Value, color = Type, fill = Type)) +
geom_histogram(aes(y = ..density..),
binwidth = 8, position = "dodge") +
geom_density(alpha = 0.25) #+
# facet_wrap(~ Type)
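Note that geom_density draws a kernel density estimate rather than a fitted normal curve. If a normal curve per variable is what is needed, one option is to evaluate dnorm() with each variable's own mean and standard deviation and plot the result as lines. The sketch below only builds that data; the helper name `normal_curve` and the 200-point grid are assumptions made for illustration.

```r
height <- seq(140, 189, length.out = 50)
weight <- seq(67, 86, length.out = 50)
fev    <- seq(71, 91, length.out = 50)
df <- data.frame(height, weight, fev)

# Evaluate a normal density over the observed range of one variable,
# using that variable's own mean and standard deviation
normal_curve <- function(values, type, n_points = 200) {
  grid <- seq(min(values), max(values), length.out = n_points)
  data.frame(
    Type    = type,
    Value   = grid,
    density = dnorm(grid, mean = mean(values), sd = sd(values))
  )
}

# One curve per column, stacked into a single long data frame
curves <- do.call(rbind, Map(normal_curve, df, names(df)))
head(curves)
```

The resulting `curves` data frame could be added to the density-scaled histogram with something like geom_line(data = curves, aes(x = Value, y = density, color = Type), inherit.aes = FALSE).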
I need help with a ggplot that plots group averages on the y axis as a line plot with points, plus a text label for each point (using ggplot functionality), all color coded according to the "color" aesthetic. As far as possible, I don't want to create an intermediate data frame from the original data to summarise the y means. I tried using fun.y as shown in the code snippet. An Excel chart is also attached.
Sample data
set.seed(1)
age_range = sample(c("ar2-15", "ar16-29", "ar30-44"), 20, replace = TRUE)
gender = sample(c("M", "F"), 20, replace = TRUE)
region = sample(c("A", "B", "C"), 20, replace = TRUE)
physi = sample(c("Poor", "Average", "Good"), 20, replace = TRUE)
height = sample(c(4,5,6), 20, replace = TRUE)
survey = data.frame(age_range, gender, region,physi,height)
ggplot code I tried
ggplot(survey, aes(x=age_range, y=height, color=gender)) + stat_summary(fun.y=mean, geom = "point")+geom_line()
Output I am getting
Output I am looking for
Following up on @Sandy's comment, you can also add the labels in a similar fashion, though here I am using the package ggrepel to make sure they don't overlap (without having to manually code the location). For the location, you can read the result of the call to mean, which is returned as y, by using ..y.. in the aesthetics.
ggplot(survey, aes(x=age_range, y=height, color=gender, group = gender)) +
stat_summary(fun.y=mean, geom = "point") +
stat_summary(fun.y=mean, geom = "line") +
stat_summary(aes(label = round(..y.., 2)), fun.y=mean, geom = "label_repel", segment.size = 0)
Gives
(Note that segment.size = 0 is to ensure that there is not an additional line drawn from the point to the label.)
As of now, it does not appear that ggrepel offers text displacement in only one axis (see here), so you may have to manually position labels if you want more precision.
If you want to set the label locations manually, here is an approach that uses dplyr and the %>% pipe to avoid having to save any intermediate data.frames
The basic idea is described here. To see the result after any step, just highlight up to just before the %>% at the end of a line and run. First, group_by the x location and the grouping that you want to plot. Get the average of each using summarise. The data are still grouped by age_range (summarise only rolls up one grouping level at a time), so you can determine which of the groups has the higher mean at that point by subtracting the mean. I used sign just to pull out whether it was positive or negative, then multiplied/divided by a factor to get the spacing I wanted (in this case, divided by ten to get a spacing of 0.1). Add that adjustment to the mean to set where you want the label to land. Then, pass all of that into ggplot and proceed as you would with any other data.frame.
survey %>%
group_by(age_range, gender) %>%
summarise(height = mean(height)) %>%
mutate(myAdj = sign(height - mean(height)) / 10
, labelLoc = height + myAdj) %>%
ungroup() %>%
ggplot(aes(x = age_range
, y = height
, label = round(height, 2)
, color = gender
, group = gender
)) +
geom_point() +
geom_line() +
geom_label(aes(y = labelLoc)
, show.legend = FALSE)
Gives:
Which seems to accomplish your base goals, though you may want to play around with spacing etc. for your actual use case.
I'm having trouble creating a figure with ggplot2.
In this plot, I'm using geom_bar to plot three factors. I mean, for each "time" and "dose" I'm plotting two bars (two genotypes).
To be more specific, this is what I mean:
This is my code so far (I actually changed some settings, but I'm presenting just what is needed):
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")
Question: I want to add the mean for each time as a point, placed in the middle of the bars belonging to that time. How can I proceed?
I tried to add these points using geom_dotplot and geom_point but I did not succeed.
library(dplyr)
# Name the summary column so it can be referenced after the join
time_data <- data %>% group_by(time) %>% summarize(mean_b = mean(b))
data <- inner_join(data, time_data, by = "time")
This gives you data with the means attached. Now make the plot:
ggplot(data=data, aes(x=interaction(dose,time), y=b,fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")+
geom_text(aes(y = mean_b, label = round(mean_b, 1)), vjust = 0)
You might need to fiddle around with the argument hjust and vjust in the geom_text statement. Maybe the aes one too, I didn't run the program so I don't know.
It generally helps if you can give a reproducible example. Here, I made some of my own data.
library(ggplot2)
library(dplyr)
sampleData <-
data.frame(
dose = 1:3
, time = rep(1:3, each = 3)
, genotype = rep(c("AA","aa"), each = 9)
, b = rnorm(18, 20, 5)
)
You need to calculate the means somewhere, and I chose to do that on the fly. Note that, instead of using points, I used a line to show that the mean is for all of those values. I also sorted somewhat differently, and used facet_wrap to cluster things together. Points would be a fair bit harder to place, particularly when using position_dodge, but you could likely modify this code to accomplish that.
ggplot(
sampleData
, aes(x = dose
, y = b
, fill = genotype)
) +
geom_bar(position = "dodge", stat = "identity") +
geom_hline(data =
sampleData %>%
group_by(time) %>%
summarise(meanB = mean(b)
, dose = NA, genotype = NA)
, aes(yintercept = meanB)
, col = "black"
) +
facet_wrap(~time)