I've written some R code using ggplot2 and don't understand why it behaves the way it does.
When I plot my data with geom_point and geom_line, I expect lines to be drawn through the points. Instead of connecting all the points, it only connects those that lie on a horizontal line. I don't know how to fix this.
This is a simple version of the code:
date<-c("2014-07-01","2014-07-02","2014-07-03",
"2014-07-04","2014-07-05","2014-07-06",
"2014-07-07")
mbR<- c(160,163,169,169,169,169,169)
mbL<- c(166,166,166,166,NA, NA, NA)
mb<-data.frame(mbR,mbL)
mb<-data.frame(t(as.Date(date)),mb)
colnames(mb)<-c("Datum","R","L")
mb$Datum<-date
plot1<-ggplot(mb,aes(x=mb$Datum,y=mb$R))+
geom_point(data=mb,aes(x=mb$Datum,y=mb$R,color="R",size=2),
group=mb$R,position="dodge")+
geom_line(data=mb,aes(y=mb$R,color="R",group=mb$R))+
geom_point(aes(y=mb$L,color="L",size=2),position="dodge")
plot1
I used group because otherwise no lines were drawn at all, but it still doesn't do what I intended.
I hope you can help me out a little. :) It may be a minor mistake.
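Your code sets group=mb$R, so only points that share the same R value end up in the same group, which is why only horizontally aligned points are connected. To fix this, first melt your data to long format and then plot it. The column called variable in the melted data is the category (R or L). The column called value stores the data values for each instance of R and L. We group and color the data by variable in the call to ggplot, which gives us separate lines/points for R and L.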
Also, you only need to provide the data frame and column mappings in the initial call to ggplot. They will carry through to geom_point and geom_line. Furthermore, when you provide the column names, you don't need to (and shouldn't) include the name of the data frame, because you've already specified the data frame in the data argument to ggplot.
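library(ggplot2)
library(reshape2)
# Reshape to long format: one row per date and side (R or L)
mb.l = melt(mb, id.vars="Datum")
# Data and mappings are set once in ggplot() and inherited by both geoms
ggplot(data=mb.l, aes(x=Datum, y=value, group=variable, color=variable)) +
  geom_point(size=2) +
  geom_line()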
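As a self-contained sketch of the approach above, the following rebuilds the example data with Datum stored as a real Date column (an assumption; the question's code ends up storing it as character) and checks the shape of the melted data before plotting. The names mb_long and p are only illustrative.

```r
library(ggplot2)
library(reshape2)

# Rebuild the example data; Datum is converted to Date here (an assumption,
# the question's code stores it as a character vector)
mb <- data.frame(
  Datum = as.Date(c("2014-07-01", "2014-07-02", "2014-07-03", "2014-07-04",
                    "2014-07-05", "2014-07-06", "2014-07-07")),
  R = c(160, 163, 169, 169, 169, 169, 169),
  L = c(166, 166, 166, 166, NA, NA, NA)
)

# Long format: one row per date and side, with columns Datum, variable, value
mb_long <- melt(mb, id.vars = "Datum")
str(mb_long)

# One line and one set of points per level of variable (R and L);
# na.rm = TRUE silently drops the missing L values
p <- ggplot(mb_long, aes(x = Datum, y = value, group = variable, color = variable)) +
  geom_point(size = 2, na.rm = TRUE) +
  geom_line(na.rm = TRUE)
p
```

With a Date column on the x-axis, ggplot2 uses a continuous date scale, so the spacing between points reflects the actual time between measurements.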
I am using the code below to plot a data frame:
ggplot(df) + geom_line(aes(x = date, y = values, colour = X > 5))
The plot works and looks great, except for one thing: because I am using geom_line, when the values are bigger than 5 it starts connecting the points that are above the threshold to each other. I do not want the lines connecting the blue data.
How do I stop this from happening?
Here's an example using the economics dataset included in ggplot2. You see the same thing if we highlight the line based on values above 8000:
ggplot(economics, aes(date, unemploy)) +
geom_line(aes(color=unemploy > 8000))
When you map a discrete variable to an aesthetic such as color, ggplot2 by default also groups your data by that variable. This makes total sense if you have data in long form and want to draw a separate line for each different value in a column. In a case like this one, you want ggplot2 to change the color of the line based on the data, but you do not want it to group based on color. This is why you need to override the group= aesthetic.
To override the default grouping for your line geom, you can set group=1, or really group= any constant value. This puts every observation in the same group, so the line connects all your points but is still colored differently:
ggplot(economics, aes(date, unemploy)) +
geom_line(aes(color=unemploy > 8000, group=1))
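One way to see the effect of group=1 is to inspect the data that ggplot2 builds for the layer. The sketch below uses layer_data() from ggplot2 on the same economics example; p_default, p_single, n_default and n_single are just illustrative names.

```r
library(ggplot2)

# Default: mapping a logical to color also splits the data into groups
p_default <- ggplot(economics, aes(date, unemploy)) +
  geom_line(aes(color = unemploy > 8000))

# Override: a constant group puts every observation on one line
p_single <- ggplot(economics, aes(date, unemploy)) +
  geom_line(aes(color = unemploy > 8000, group = 1))

# Number of distinct groups the line geom sees in each plot
n_default <- length(unique(layer_data(p_default)$group))
n_single  <- length(unique(layer_data(p_single)$group))
c(default = n_default, single = n_single)
```

In the default plot the line geom sees one group per value of the logical condition, whereas the overridden plot has a single group, so all points are joined in one path.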
I'm trying to solve the following exercise:
Make a scatter plot of the relationship between the variables 'K1' and 'K2' with "faceting" based on the parameters 'diam' and 'na' (subdivide the canvas by these two variables). Finally, assign different colors to the points depending on the 'thickness' of the ring (don't forget to convert it to a factor first). The graph should be similar to this one ("grosor" stands for "thickness"):
Now, the last code I tried is the following (the dataset is called "qerat"):
ggplot(qerat, aes(K1,K2, fill=factor(grosor))) + geom_point() + facet_wrap(vars(diam,na))
Could somebody give me a hand and point out where the mistake is? Many thanks in advance!
Maybe you are looking for a facet_grid() approach. Here is the code using data similar to yours:
library(ggplot2)
#Data
data("diamonds")
#Plot
ggplot(diamonds,aes(x=carat,y=price,color=factor(cut)))+
geom_point()+
facet_grid(color~clarity)
For your own code, since no data was provided, I would suggest the following changes (color instead of fill, and facet_grid instead of facet_wrap):
#Code
ggplot(qerat, aes(K1,K2, color=factor(grosor)))+
geom_point() +
facet_grid(diam~na)
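A likely reason the original attempt showed no colors is that the default point shape ignores fill and only responds to color. A minimal sketch with the built-in mtcars data (standing in for qerat, which is not available here) shows the color mapping together with facet_grid(); the choice of columns is purely illustrative.

```r
library(ggplot2)

# mtcars stands in for the unavailable qerat data:
# wt/mpg play the role of K1/K2, cyl of grosor, am and gear of diam and na
p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  facet_grid(am ~ gear)
p

# The built layer holds one distinct colour per level of factor(cyl)
n_colours <- length(unique(layer_data(p)$colour))
n_colours
```

If fill is really needed for points, a fillable shape such as shape = 21 can be set in geom_point(); the default shape only uses color.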
I'm trying to make a gene expression profile plot in R. My input data is a data frame where column 1 has the gene names and columns 2:18 are multiple cancer types. Here is a small set of data.
What I want is a graph that has the samples on the x-axis and, on the y-axis, an expression line for each gene.
Something that looks like this.
But instead of time points on the x-axis, it should have the cancer types (columns).
So far I've tried ggplot() and geneprofiler(), but I failed over and over.
Any help will be greatly appreciated.
Data Format
The current format of the data is referred to as wide format, but ggplot requires long format data. The tidyr package (part of the tidyverse) has functions for converting between wide and long formats. In this case, you want the function tidyr::pivot_longer. For example, if you have the data in a data.frame (or tibble) called df_gene_expr, the pivot would go something like
library(tidyverse)
df_gene_expr %>%
pivot_longer(cols=2:18, names_to="cancer_type", values_to="gene_expr") %>%
filter(ID == "ABCA8") %>%
ggplot(aes(x=cancer_type, y=gene_expr)) +
geom_point()
where we single out the one gene "ABCA8". Change geom_point() to whatever geometry you actually want (perhaps geom_bar(stat='identity')).
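To make the reshaping step concrete, here is a small sketch on made-up data. The gene names other than ABCA8, the three cancer-type columns and all values are invented for illustration; only the ID column name and the pivot_longer call follow the text above.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Made-up wide data: one row per gene, one column per cancer type
df_gene_expr <- data.frame(
  ID   = c("ABCA8", "GENE2", "GENE3"),
  BRCA = c(-1.2, 0.4, 2.1),
  LUAD = c(-0.8, 0.1, 1.7),
  COAD = c(-1.5, 0.9, 1.2)
)

# Long format: one row per gene and cancer type
df_long <- df_gene_expr %>%
  pivot_longer(cols = -ID, names_to = "cancer_type", values_to = "gene_expr")
df_long

# One line per gene across the cancer types
ggplot(df_long, aes(x = cancer_type, y = gene_expr, group = ID, color = ID)) +
  geom_line() +
  geom_point()
```

Here cols = -ID selects every column except the gene name, which is equivalent to cols = 2:18 when the data has exactly those 17 cancer-type columns after the ID column.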
Discrete Trendline
I'm not sure that geom_smooth is entirely appropriate - it is designed with continuous-continuous data in mind. Instead, I'd recommend stat_summary.
There's a slight trick to this because cancer_type on the x-axis is discrete. Namely, the cancer_type variable should be a factor, but we will use its underlying integer codes for the x-values in stat_summary. Otherwise, it would complain that using geom='line' doesn't make sense.
Something along these lines:
# df_long is the pivoted (long) data, with cancer_type converted to a factor
ggplot(df_long, aes(x=cancer_type, y=gene_expr)) +
  geom_hline(yintercept=0, linetype=4, color="red") +
  geom_line(aes(group=ID), size=0.5, alpha=0.3, color="black") +
  stat_summary(aes(x=as.numeric(cancer_type)), fun=mean, geom='line',
               size=2, color='orange')
Technically, this same trick (aes(x=as.numeric(cancer_type))) could be applied equally well to geom_smooth, but I think it still makes more sense to use stat_summary, which lets one explicitly pick the statistic to be computed. For example, median instead of mean might be more appropriate in this context for the summary function.
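As an alternative sketch that avoids the numeric conversion, stat_summary can also be given a constant group, which lets the line geometry span a discrete x-axis. This is a variant, not the approach described above, and the data below is randomly generated purely for illustration (gene and cancer-type names are hypothetical).

```r
library(ggplot2)

set.seed(1)
# Fake long-format data: 20 hypothetical genes across 4 hypothetical cancer types
df_long <- expand.grid(
  ID = paste0("gene", 1:20),
  cancer_type = factor(c("A", "B", "C", "D"))
)
df_long$gene_expr <- rnorm(nrow(df_long))

p <- ggplot(df_long, aes(x = cancer_type, y = gene_expr)) +
  # One faint line per gene
  geom_line(aes(group = ID), alpha = 0.3, color = "black") +
  # group = 1 puts all genes in one group, so one mean is computed per cancer type
  # (the fun argument assumes ggplot2 3.3.0 or later)
  stat_summary(aes(group = 1), fun = mean, geom = "line", color = "orange")
p

# The summary layer has one row per cancer type
n_summary_rows <- nrow(layer_data(p, 2))
n_summary_rows
```

Swapping fun = mean for fun = median changes the summary statistic without touching the rest of the plot.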
This is my first post, so go easy. Up until now (the past ~5 years?) I've been able to either tweak my R code the right way or find an answer on this or various other sites. Trust me when I say that I've looked for an answer!
I have a working script to create the attached boxplot in base R.
http://i.stack.imgur.com/NaATo.jpg
This is fine, but I really just want to "jazz" it up in ggplot, for vain reasons.
I've looked at the following questions and they are close, but not complete:
Why does a boxplot in ggplot requires axis x and y?
How do you draw a boxplot without specifying x axis?
My data is basically like "mtcars" if all the numerical variables were on the same scale.
All I want to do is plot each variable on the same boxplot, like the basic R boxplot I made above. My y axis is the same continuous scale (0 to 1) for each box and the x axis simply labels each month plus a yearly average (think all the mtcars values the same on the y axis and the x axis is each vehicle model). Each box of my data represents 75 observations (kind of like if mtcars had 75 different vehicle models), again all the boxes are on the same scale.
What am I missing?
Though I don't think mtcars makes a great example for this, here it is:
First, we make the data (hopefully) more similar to yours by using a column instead of rownames.
mt = mtcars
mt$car = row.names(mtcars)
Then we reshape to long format:
mt_long = reshape2::melt(mt, id.vars = "car")
Then the plot is easy:
library(ggplot2)
ggplot(mt_long, aes(x = variable, y = value)) +
geom_boxplot()
Using ggplot all but requires data in "long" format rather than "wide" format. If you want something to be mapped to a graphical dimension (x-axis, y-axis, color, shape, etc.), then it should be a column in your data. Luckily, it's usually quite easy to get data in the right format with reshape2::melt or tidyr::gather (or tidyr::pivot_longer, which supersedes gather in newer versions of tidyr). I'd recommend reading the Tidy Data paper for more on this topic.
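For comparison, here is a sketch of the same reshape using tidyr::pivot_longer instead of reshape2::melt. The column names variable and value are chosen here to match melt's defaults; they are not required by pivot_longer.

```r
library(tidyr)
library(ggplot2)

mt <- mtcars
mt$car <- row.names(mtcars)

# Every column except car becomes a (variable, value) pair
mt_long <- pivot_longer(mt, cols = -car, names_to = "variable", values_to = "value")

ggplot(mt_long, aes(x = variable, y = value)) +
  geom_boxplot()
```

One difference to be aware of: pivot_longer returns variable as a character column, so the boxes are ordered alphabetically, whereas melt returns a factor that keeps the original column order.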
I am working my way through The R Graphics Cookbook and ran into this set of code:
library(gcookbook)
library(ggplot2)
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
geom_point() +
stat_density2d(aes(alpha=..density.., fill=..density..), geom="tile", contour=FALSE)
It runs fine, but I don't understand what the .. before and after density is referring to. I can't seem to find it mentioned in the book either.
Variable names beginning with .. are possible in R, and are treated in the same way as any other variable. Try creating one of your own.
..x.. <- 1:5
ggplot2 often computes extra columns from your data in order to draw the plot. These are produced by the stat (here stat_density2d) and are known as computed variables. ggplot2 uses the naming convention ..something.. to refer to these computed columns inside aes().
This is partly because using ..something.. is unlikely to clash with existing variables in your dataset. Take that as a hint that you shouldn't name the columns in your dataset using that pattern.
The stat_density* functions use ..density.. to represent the estimated density (for stat_density2d, the 2D density over x and y). Other computed variable names include ..count.. . Since ggplot2 3.3.0, the same variables can also be written as after_stat(density) and after_stat(count).
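To see which computed variables a stat provides, one option is to inspect the built layer with layer_data(). The sketch below uses a histogram of faithful$eruptions as an assumed stand-in for the 2D density example, and the after_stat() spelling, which needs ggplot2 3.3.0 or later; p and built are illustrative names.

```r
library(ggplot2)

# Histogram whose bar height uses the computed variable density
# (after_stat(density) is the newer spelling of ..density..)
p <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20)

# Columns computed by the stat, alongside the plotted aesthetics
built <- layer_data(p)
names(built)

# count is also computed; the bin counts add up to the number of observations
counts_match <- sum(built$count) == nrow(faithful)
counts_match
```

The names printed by names(built) include the computed variables that can be referenced in aes() for that stat.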