Accessing the values of the reduced dataset after facet_grid in ggplot - r

Building on this earlier question, let's say the data table has the columns ID, factor, SimulationID, and Data. We want to draw a separate plot for each (factor, SimulationID) tuple using facet_grid(). Within each plot, the ID variable connects the Data points into lines. However, the set of unique ID values differs from one (factor, SimulationID) tuple to the next.
What I want is to highlight one of the curves in each of the plots separated by facet_grid().
ggplot(d) +
  facet_grid(factor ~ SimulationID) +
  geom_line(aes(idx, value, colour = type)) +
  gghighlight(ID == <choose a valid ID randomly>)
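One possible approach, sketched under assumptions: pick one ID at random within each (factor, SimulationID) panel before plotting, store the result in a logical column, and map that column to colour. The data below are made up, the names `panel`, `chosen` and `highlight` are hypothetical, and the sketch uses plain ggplot2 rather than gghighlight.

```r
library(ggplot2)

set.seed(1)
# Made-up data: 2 x 2 panels, 3 lines per panel, 10 points per line
d <- expand.grid(idx = 1:10, ID = 1:3,
                 factor = c("a", "b"), SimulationID = c("s1", "s2"))
# Make the ID values differ between panels, as described in the question
d$ID <- paste(d$factor, d$SimulationID, d$ID, sep = "_")
d$Data <- rnorm(nrow(d))

# One level per (factor, SimulationID) panel
panel <- interaction(d$factor, d$SimulationID, drop = TRUE)
# Draw one valid ID at random from each panel
chosen <- tapply(d$ID, panel, function(ids) sample(unique(ids), 1))
# Flag the rows that belong to the chosen ID of their own panel
d$highlight <- d$ID == unname(chosen[as.character(panel)])

p <- ggplot(d, aes(idx, Data, group = ID)) +
  geom_line(aes(colour = highlight)) +
  scale_colour_manual(values = c("FALSE" = "grey70", "TRUE" = "red"),
                      guide = "none") +
  facet_grid(factor ~ SimulationID)
```

Printing `p` should show one red line per facet, with the remaining lines in grey.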

Related

How do I create a grouped boxplot in R?

I have a data frame whose variables are 5 probes: cg02823866, cg13474877, cg14305799, cg15837913 and cg19724470. I want to create a boxplot that puts cg02823866 and cg14305799 into a group called 'GeneBody' and cg13474877, cg14305799 and cg19724470 into a group called 'Promoter'. I then want to colour code the boxplots by probe name. I can't figure out how to assign those variables to groups for the plot.
I created an ungrouped boxplot of the five probes and it looked like this.
I want the titles 'Promoter' and 'GeneBody' on the x axis. Above the 'GeneBody' title there should be the 2 boxplots for the cg02823866 and cg14305799 probes, and above the 'Promoter' title the boxplots for cg13474877, cg14305799 and cg19724470. Each boxplot should be colour coded by probe.
My data frame that I imported into RStudio looks like this: https://i.stack.imgur.com/r4gEC.png
Assuming you have some data with variable names Beta (your y axis), Probe (your current x axis), and group (either "GeneBody" or "Promoter"), you can do something like the following:
library(ggplot2)
ggplot(data, aes(x = group, y = Beta, fill = Probe)) +
  geom_boxplot()
If you provide a reproducible set of data, I can probably do better.
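To get from one-column-per-probe data to the Beta/Probe/group layout assumed above, something like the following sketch might work. The beta values are random placeholders, and the probe-to-group assignment is an assumption: the question lists cg14305799 under both groups, so here it is placed in 'GeneBody' and every other probe in 'Promoter'. Adjust the `gene_body` vector (a hypothetical name) to the real annotation.

```r
library(ggplot2)

set.seed(42)
probes <- c("cg02823866", "cg13474877", "cg14305799",
            "cg15837913", "cg19724470")
# Placeholder wide data: 20 samples, one column of beta values per probe
wide <- as.data.frame(matrix(runif(20 * 5), ncol = 5,
                             dimnames = list(NULL, probes)))

# Reshape to long form: one row per (sample, probe) measurement
long <- stack(wide)
names(long) <- c("Beta", "Probe")

# Assumed mapping of probes to groups (not confirmed by the question)
gene_body <- c("cg02823866", "cg14305799")
long$group <- ifelse(long$Probe %in% gene_body, "GeneBody", "Promoter")

p <- ggplot(long, aes(x = group, y = Beta, fill = Probe)) +
  geom_boxplot()
```

With real data, `wide` would be the imported data frame restricted to the probe columns.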
Adding to Ben's answer the traditional iris data frame example, which you can easily load with data(iris):
ggplot(iris) +
  aes(x = "", y = Sepal.Length, group = Species) +
  geom_boxplot(shape = "circle", fill = "#112446") +
  theme_minimal()
So you just need a column that indicates group membership.
It of course gets more difficult with uncleaned data, where you might need to reshape the data first, etc. But those are follow-up questions, I guess.
Also, if you want to make your life easier, use the esquisse RStudio add-in.

Different colours in geomline and geomplot from same vector

I am currently trying to plot some data with dots and lines. My data frame has its own column (FarbDots) in which I specify the colours I want. When I plot the data, geom_point() takes the colours in the wanted order, while geom_line() creates a total mess (see image).
I was not able to recreate the same effect in a sample data set. Any idea how to get my colours in order while still specifying them within geom_line()/geom_point()?
This is the code I used for plotting (with b specifying the dataset, x, y, and groups):
b +
  geom_line(colour = Data_Biol_long$FarbDots) +
  geom_point(colour = Data_Biol_long$FarbDots) +
  scale_y_log10() +
  facet_grid(Analysis ~ ., scales = 'free')
dots and lines should receive colour from same vector?!
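A possible explanation, offered as an assumption since the original data are not shown: a colour vector passed outside aes() is matched to rows by position, while geom_line() sorts the data by group and x before drawing, so the two can fall out of step. A sketch of one way around this is to map the colour column inside aes() and add scale_colour_identity(), so each row carries its own colour through any reordering. The data frame `dat` and its values are made up; only the FarbDots and Analysis column names come from the question.

```r
library(ggplot2)

# Made-up data: two groups, one facet row each, colours stored per row
dat <- data.frame(
  x = rep(1:5, times = 2),
  y = c(1, 10, 100, 10, 1, 2, 20, 200, 20, 2),
  grp = rep(c("g1", "g2"), each = 5),
  Analysis = rep(c("A", "B"), each = 5),
  stringsAsFactors = FALSE
)
dat$FarbDots <- ifelse(dat$grp == "g1", "steelblue", "firebrick")

p <- ggplot(dat, aes(x, y, group = grp, colour = FarbDots)) +
  geom_line() +
  geom_point() +
  scale_colour_identity() +  # use the values in FarbDots as literal colours
  scale_y_log10() +
  facet_grid(Analysis ~ ., scales = "free")
```

Because the colour travels with each row, points and lines should end up with the same colours regardless of how each geom orders the data.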

visualize relationship between categorical variable and frequency of other variable in one graph?

How, in R, can I draw a histogram with a categorical variable on the x-axis and the frequency of a continuous variable on the y-axis?
Is this correct?
There are a couple of ways one could interpret "one graph" in the title of the question. That said, with the ggplot2 package there are at least a couple of ways to render histograms by group on a single page of results.
First, we'll create a data frame that contains a normally distributed random variable with a mean of 100 and a standard deviation of 20. We also include a group variable that takes one of four values: A, B, C, or D.
set.seed(950141237) # for reproducibility of results
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))
The resulting data frame has 800 rows of randomly generated values from a normal distribution, assigned into 4 groups of 200 observations.
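As a quick check of that description, the group sizes can be tabulated. This sketch simply rebuilds the same data frame and counts rows per group; `group_sizes` is a name introduced here for illustration.

```r
set.seed(950141237) # same seed as above
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# Count the observations in each group; each of A-D has 200 rows
group_sizes <- table(df$group)
group_sizes
```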
Next, we will render this in ggplot2::ggplot() as a histogram, where the color of the bars is based on the value of group.
library(ggplot2)
ggplot(data = df, aes(x = y_value, fill = group)) + geom_histogram()
...and the resulting chart looks like this:
In this style of histogram the values from each group are stacked atop each other (i.e. the frequency of group A is added to that of B, and so on, before the chart is rendered), which might not be what the original poster intended.
We can verify the "stacking" behavior by removing the fill = group argument from aes().
# verify the stacking behavior
ggplot(data = df,aes(x = y_value)) + geom_histogram()
...and the output, which looks just like the first chart, but drawn in a single color.
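If stacking is not wanted, one alternative (a sketch, not part of the original answer) is to draw the groups on top of each other with position = "identity" and some transparency, so each group's bars start at zero. The alpha and bins values are arbitrary choices.

```r
library(ggplot2)

set.seed(950141237)
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# Overlay the four histograms instead of stacking them
p_overlay <- ggplot(df, aes(x = y_value, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30)
```

With four overlapping groups the bars can be hard to tell apart, which is one reason the faceted version may read better.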
Another way to render the data is to use group with facet_wrap(), where each distribution appears in a different facet on one chart.
ggplot(data = df,aes(x = y_value)) + geom_histogram() + facet_wrap(~group)
The resulting chart looks like this:
The facet approach makes it easier to see differences in frequency of y values between the groups.

Using ggplot in R to display dataframe with two different colours depending on value of data. Issue with Data jumping

I am using the below code to plot a data frame on the same plot:
ggplot(df) + geom_line(aes(x = date, y = values, colour = X > 5))
The plot works and looks great, except that when the values are bigger than 5, geom_line starts connecting the points that are above the threshold to each other, as shown below. I do not want the lines connecting the blue data.
How do I stop this from happening?
Here's an example using the economics dataset included in ggplot2. You see the same thing if we highlight the line based on values above 8000:
ggplot(economics, aes(date, unemploy)) +
  geom_line(aes(color = unemploy > 8000))
When you map a discrete variable to an aesthetic such as colour, ggplot2 by default also groups your data by that variable. This makes total sense if your data are in long form and you want a separate line for each value in a column. In cases like this one, you want ggplot2 to change the colour of the line based on the data, but not to group by colour. This is why you need to override the group= aesthetic.
To override the grouping that happens when you map colour on your line geom, you can just say group=1, or really group= any constant value. This puts every observation in the same group, so the line connects all your points but is coloured differently along its length:
ggplot(economics, aes(date, unemploy)) +
  geom_line(aes(color = unemploy > 8000, group = 1))
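Applied to the column names in the question, the same idea might look like the sketch below. The data frame is made up, and the threshold is assumed to apply to the `values` column (the question's code compares a column named X, which is not shown).

```r
library(ggplot2)

# Made-up stand-in for the asker's data
df <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 10),
  values = c(1, 3, 6, 8, 4, 2, 7, 9, 3, 1)
)

# Colour by threshold, but keep all points in a single group
p <- ggplot(df) +
  geom_line(aes(x = date, y = values, colour = values > 5, group = 1))
```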

geom_density(aes(y=..count..)) plot for multiple groups show a wrong x-axis count

My data frame (df) consists of 5 columns with 2,000 numerical values each.
Using reshape, I reformatted my data frame to two columns: the first contains the values (df$Values; 10,000 in total) and the second contains the name of the column (df$Labels) that each value came from.
I will use the 2nd column as a grouping factor.
I generated mycolor and myshapes to set the colour and line type of the lines.
With ggplot, I tried to generate a single plot containing the density curves for the five groups.
The problem is that the x-axis shows the counts, with a maximum of 10,000. This value does not make any sense, because the maximum possible count for each curve must be 2,000. Does anyone know what is going on? What code do I need to correct the x-axis?
ggplot2, geom_density() plot:
Here is the code:
ggplot(df, aes(x = Values, colour = Labels, linetype = Labels)) +
  geom_density(aes(y = ..count..)) +
  theme_classic() +
  scale_colour_manual(values = mycolor) +
  scale_linetype_manual(values = myshapes) +
  ggtitle("Title") +
  scale_x_continuous(limits = c(0.5, 1.5))
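A likely explanation, sketched with made-up data: the count variable computed for geom_density() is the density multiplied by the number of points in the group, so it is the area under each curve that equals the group size, not the height. When the values span a narrow range, the density (and hence the curve height) can be well above 1 per unit of x, and the plotted axis can exceed 2,000. The sketch below uses base R's density() to show the arithmetic, and after_stat(count), the newer spelling of ..count.. in recent ggplot2 versions. The simulated values (mean 1, sd 0.1) are an assumption chosen to fall inside the 0.5 to 1.5 range.

```r
library(ggplot2)

set.seed(1)
# Made-up long data: 5 groups of 2,000 values concentrated around 1
df <- data.frame(
  Values = rnorm(10000, mean = 1, sd = 0.1),
  Labels = rep(paste0("col", 1:5), each = 2000)
)

# Kernel density estimate for one group
x <- df$Values[df$Labels == "col1"]
dens <- density(x)

# Height of the "count" curve at its peak: density times group size
peak_count <- max(dens$y) * length(x)
# Approximate area under the "count" curve: close to the group size
area <- sum(dens$y) * diff(dens$x[1:2]) * length(x)

p <- ggplot(df, aes(x = Values, colour = Labels)) +
  geom_density(aes(y = after_stat(count)))
```

Here `peak_count` comes out well above 2,000 while `area` stays close to 2,000, which is consistent with the axis values in the question being a scaled density rather than a count of observations.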
