I made an ordination of a time series of vegetation data using the vegan package. Since ordination diagrams are often cluttered with many data points, I extracted the site scores on the first two ordination axes and took the mean of each group. Now I have only one point per site (11 sites total). To still show some of the variation, I added ellipses showing the standard deviation and the 95% confidence interval.
The last thing I want to do is connect points of the same group (A, B, C or D) with arrows indicating the direction of change over time. All movement is from right to left.
I initially wanted to use the ordiarrows() function in vegan, but it seems to work only when the object is of class decorana. My class is a factor.
Using ggplot2 does not seem like a valid option, as ordiellipse() (which creates the ellipses) does not work there.
Code for plotting the data:
install.packages("vegan") # only needed once
library(vegan)
plot(Ord_KIKKER, type = "n", main = "Kikkervalleien",
xlab = "DCA1 Eigenvalue = 0.62", ylab = "DCA2 Eigenvalue = 0.39")
points(Ord_KIKKER, cex = 2, pch = 19,
col = c("black", "black", "black", "red","red", "green", "green", "green", "blue", "blue", "blue"))
The resulting plot looks a bit different since I posted a reduced dataset here.
My data (Ord_KIKKER):
structure(list(DCA1 = c(2.676616032, 0.361181861, -1.363464067,
3.176862449, -0.087190269, 2.059548542, 0.167440366, -0.459090096,
1.571536367, 0.309623788, -0.25787459), DCA2 = c(0.276788721,
0.422077659, 0.181723453, 0.221610649, 0.940063655, -0.116083905,
-0.539375059, -0.545053063, -0.06120542, -0.367148924, -1.679257818
), Unique = structure(c(1L, 5L, 8L, 2L, 9L, 3L, 6L, 10L, 4L,
7L, 11L), .Label = c("2001A", "2001B", "2001C", "2001D", "2008A",
"2008C", "2008D", "2018A", "2018B", "2018C", "2018D"), class = "factor"),
BLOCK = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L,
4L), .Label = c("A", "B", "C", "D"), class = "factor")), .Names = c("DCA1",
"DCA2", "Unique", "BLOCK"), class = "data.frame", row.names = c("2001A",
"2008A", "2018A", "2001B", "2018B", "2001C", "2008C", "2018C",
"2001D", "2008D", "2018D"))
vegan::ordiarrows() will work if you give it only the columns that contain the scores:
ordiarrows(Ord_KIKKER[,1:2], Ord_KIKKER$BLOCK) # one way
However, you should also remember to set asp = 1 in the initial plot to force an equal aspect ratio on the axes.
I cannot do full testing, because the graph cannot be reproduced with the data you posted: if you call plot(Ord_KIKKER, ...) on a data frame, you will not get an ordinary scatter plot but a panel plot of all variables against each other (a pairs() plot), and the type = "n" argument will also give an error. It seems that you used some non-standard graphics tools instead, and I am not sure that the standard R graphics of vegan::ordiarrows() can be combined with those.
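For comparison, here is a minimal base-R sketch of the arrow idea without vegan, assuming the group means are plotted with standard graphics and that each block's points should be joined in year order. The YEAR column is derived by hand from the Unique labels (it is not a column in the posted data), and arrow_segments() is a hypothetical helper. Since ordiellipse() and ordiarrows() also draw on the base graphics device, they should layer onto a plot like this one.

```r
# Site scores from the question's dput(); YEAR is derived from the Unique labels
ord <- data.frame(
  DCA1 = c(2.676616032, 0.361181861, -1.363464067, 3.176862449, -0.087190269,
           2.059548542, 0.167440366, -0.459090096, 1.571536367, 0.309623788,
           -0.25787459),
  DCA2 = c(0.276788721, 0.422077659, 0.181723453, 0.221610649, 0.940063655,
           -0.116083905, -0.539375059, -0.545053063, -0.06120542, -0.367148924,
           -1.679257818),
  BLOCK = factor(c("A", "A", "A", "B", "B", "C", "C", "C", "D", "D", "D")),
  YEAR  = c(2001, 2008, 2018, 2001, 2018, 2001, 2008, 2018, 2001, 2008, 2018)
)

# Hypothetical helper: one row per arrow, joining consecutive years in a block
arrow_segments <- function(df) {
  df <- df[order(df$BLOCK, df$YEAR), ]
  pieces <- lapply(split(df, df$BLOCK), function(g) {
    n <- nrow(g)
    if (n < 2) return(NULL)
    data.frame(BLOCK = g$BLOCK[-1],
               x0 = g$DCA1[-n], y0 = g$DCA2[-n],
               x1 = g$DCA1[-1], y1 = g$DCA2[-1])
  })
  do.call(rbind, pieces)
}

segs <- arrow_segments(ord)
cols <- c(A = "black", B = "red", C = "green", D = "blue")

# asp = 1 keeps the two ordination axes on the same scale
plot(ord$DCA1, ord$DCA2, type = "n", asp = 1,
     xlab = "DCA1", ylab = "DCA2", main = "Kikkervalleien")
points(ord$DCA1, ord$DCA2, pch = 19, cex = 2,
       col = cols[as.character(ord$BLOCK)])
arrows(segs$x0, segs$y0, segs$x1, segs$y1, length = 0.1,
       col = cols[as.character(segs$BLOCK)])
```

Building the segments explicitly makes it easy to check that every arrow really points from right to left (x1 < x0), as described in the question.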
Related
Hi, I am relatively new to R / ggplot2, and I would like some advice on how to create a plot that looks like this:
Explanation: a diverging bar plot showing biological functions, with genes that have increased expression (yellow) pointing towards the right and genes with reduced expression (purple) pointing towards the left. The length of each bar represents the number of differentially expressed genes, and the color intensity varies according to their p-values.
Note that the x-axis must be 'positive' in both directions.
(In published literature on gene expression studies, bars pointing left represent genes with reduced expression, and bars pointing right represent genes with increased expression. The purpose of the graph is not to show the "magnitude" of change (which would give rise to positive and negative values). Instead, we are trying to plot the NUMBER of genes whose expression changes, which therefore cannot be negative.)
I have tried ggplot2 but completely failed to reproduce the graph shown.
Here is the data I am trying to plot:
> dput(sample)
structure(list(Name = structure(c(15L, 19L, 5L, 11L, 8L, 6L,
16L, 13L, 17L, 1L, 3L, 2L, 14L, 18L, 7L, 12L, 10L, 9L, 4L, 20L
), .Label = c("Actin synthesis", "Adaptive immunity", "Antigen presentation",
"Autophagy", "Cell cycle", "Cell division", "Cell polarity",
"DNA repair", "Eye development", "Lipid metabolism", "Phosphorylation",
"Protein metabolism", "Protein translation", "Proteolysis", "Replication",
"Signaling", "Sumoylation", "Trafficking", "Transcription", "Translational initiation"
), class = "factor"), Trend_in_AE = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Down", "Up"), class = "factor"), Count = c(171L,
201L, 38L, 63L, 63L, 47L, 22L, 33L, 20L, 16L, 16L, 7L, 10L, 4L,
13L, 15L, 5L, 7L, 9L, 7L), PValue = c(1.38e-08, 1.22e-06, 1.79e-06,
2.89e-06, 0.000122, 0.000123, 0.00036, 0.000682, 0.001030253,
0.001623939, 7.76e-05, 0.000149, 0.000734, 0.001307039, 0.00292414,
0.003347556, 0.00360096, 0.004006781, 0.007330264, 0.010083734
)), .Names = c("Name", "Trend_in_AE", "Count", "PValue"), class = "data.frame", row.names = c(NA,
-20L))
Thank you very much for your help and suggestions; this really helps with my learning.
My own humble attempt was this:
table <- read.delim("file.txt", header = T, sep = "\t")
library(ggplot2)
ggplot(aes(x=Number, y=Names)) +
geom_bar(stat="identity",position="identity") +
xlab("number of genes") +
ylab("Name"))
The result was an error message about aes.
Although this is not exactly what you are looking for, the following should get you started. @Genoa, as the expression goes, "there are no free lunches". So in this spirit, as @dww has rightly pointed out, show "some effort"!
# create dummy data
df <- data.frame(x = letters,y = runif(26))
# compute normalized occurence for letter
df$normalize_occurence <- round((df$y - mean(df$y))/sd(df$y), 2)
# categorise the occurence
df$category<- ifelse(df$normalize_occurence >0, "high","low")
# check summary statistic
summary(df)
x y normalize_occurence
a : 1 Min. :0.00394 Min. :-1.8000000
b : 1 1st Qu.:0.31010 1st Qu.:-0.6900000
c : 1 Median :0.47881 Median :-0.0800000
d : 1 Mean :0.50126 Mean : 0.0007692
e : 1 3rd Qu.:0.70286 3rd Qu.: 0.7325000
f : 1 Max. :0.93091 Max. : 1.5600000
(Other):20
category
Length:26
Class :character
Mode :character
ggplot(df,aes(x = x,y = normalize_occurence)) +
geom_bar(aes(fill = category),stat = "identity") +
labs(title= "Diverging Bars")+
coord_flip()
@dww and @Ashish are right: there's a lot in this question. It's also not clear how ggplot "failed" to reproduce the figure, and knowing that would help us understand what you're struggling with.
The key to ggplot is that pretty much everything you want to include in the plot should be included in the data. Adding a few variables to your table to put the bars in the right direction will get you a long way toward what you want. Make the values that are actually negative (the "Down" values) negative, and they'll plot that way:
r_sample$Count2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$Count*-1,r_sample$Count)
r_sample$PValue2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$PValue*-1,r_sample$PValue)
Then reorder your "Name" so that it plots according to the new PValue2 variable:
r_sample$Name <- factor(r_sample$Name, r_sample$Name[order(r_sample$PValue2)], ordered=T)
Lastly, you'll want to left-justify some labels and right-justify others, so make that a variable now:
r_sample$just <- ifelse(r_sample$Trend_in_AE=="Down",0,1)
Then some fairly minimal plot code gets you quite close to what you want:
ggplot(r_sample, aes(x=Name, y=Count2, fill=PValue2)) +
geom_bar(stat="identity") +
scale_y_continuous("Number of Differentially Regulated Genes", position="top", limits=c(-100,225), labels=abs) +
scale_x_discrete("", labels=NULL) +
scale_fill_gradient2(low="blue", mid="light grey", high="yellow", midpoint=0) +
coord_flip() +
theme_minimal() +
geom_text(aes(x=Name, y=0, label=Name), hjust=r_sample$just)
You can explore the theme() options in the ggplot2 documentation to figure out the rest of the formatting.
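The same sign trick also works in base R. The sketch below is an illustration, not a reproduction of the target figure: it re-enters only six rows from dput(sample), negates the "Down" counts so those bars point left, and draws the x-axis by hand with abs() tick labels so both directions read as positive gene counts. The p-value colour gradient is not reproduced; the purple/gold fill is a placeholder.

```r
# A few rows from dput(sample), re-entered by hand
genes <- data.frame(
  Name  = c("Replication", "Transcription", "Cell cycle",
            "Antigen presentation", "Adaptive immunity", "Proteolysis"),
  Trend = c("Up", "Up", "Up", "Down", "Down", "Down"),
  Count = c(171, 201, 38, 16, 7, 10),
  stringsAsFactors = FALSE
)

# Signed counts: "Down" bars point left, "Up" bars point right
genes$Signed <- ifelse(genes$Trend == "Down", -genes$Count, genes$Count)
genes <- genes[order(genes$Signed), ]

# Tick positions cover both directions; their labels are shown as abs() values
ticks <- pretty(c(0, genes$Signed))

op <- par(mar = c(5, 11, 2, 1))  # wide left margin for the long names
barplot(genes$Signed, horiz = TRUE, names.arg = genes$Name, las = 1,
        col = ifelse(genes$Trend == "Down", "purple", "gold"),
        xlim = range(ticks), axes = FALSE, xlab = "Number of genes")
axis(1, at = ticks, labels = abs(ticks))
par(op)
```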
I regularly produce ggplot() graphics and have struggled to come across a way to scale the plot window without affecting other elements such as titles, axis labels, axes text, legends, etc. I typically use ggsave() and have pulled from here a bit but due to scaling and resizing issues they never get standardized. I'm trying to get better at streamlining everything but have come to a roadblock. Here is a similar question asking how to do the same thing with pdf() though the central problem appears to be the same.
Question
Is there any way to adjust the scaling of only the plot window and (edit: non-text) geom elements without adjusting text or image size?
Example:
Here's some sample data I'm working with. Percent of jobs filled and offered by day of week:
dput(day_data)
structure(list(day = structure(c(2L, 6L, 7L, 3L, 1L, 2L, 6L,
7L, 3L, 1L), .Label = c("F", "M", "R", "Sa", "Su", "T", "W"), class = "factor"),
order = c(2L, 3L, 4L, 5L, 6L, 2L, 3L, 4L, 5L, 6L), Status = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("Jobs Filled",
"Jobs Offered"), class = "factor"), total = c(13724496, 15298119,
15293656, 16272599, 17652393, 16252141, 17590028, 17549470,
18875899, 21441775), percent = c(17.5, 19.5, 19.5, 20.8,
22.6, 17.7, 19.2, 19.1, 20.6, 23.4)), .Names = c("day", "order",
"Status", "total", "percent"), row.names = c(2L, 3L, 4L, 5L,
6L, 9L, 10L, 11L, 12L, 13L), class = "data.frame")
Here's some sample code to produce a graph. It doesn't look quite like my attached images but that's because I dropped unnecessary theme and other tidy-up elements to shoot for a minimally reproducible example.
library(ggplot2)
ggplot(data=day_data, aes(x=reorder(day, order), y=percent, fill=Status)) +
geom_bar(position="dodge", stat="identity") +
geom_text(aes(label = percent, y=percent+0.2),
position=position_dodge(width=0.9), # match geom_bar's default dodge width
family="serif") +
labs(x="Day of Week",
y="Percent") +
theme(text = element_text(size=12, family="serif"))
Now, here's the nitty-gritty of the problem. When using ggsave(), we can specify a given height and width, and once we do that in combination with setting element_text(size=12), everything should be set. But sometimes the plot doesn't look quite the same. Attached are two examples showing the difference in scaling. The first was done with scale=1, the default, and the second with scale=4, an extreme example. One might think scale could be used to adjust only the plot window and not other (text) elements, but all it actually does is act as a multiplier on the height and width arguments. So if height/width/scale = 4/4/1 we get a 4"x4" image, but if height/width/scale = 4/4/4 we get a 16"x16" image.
ggsave(paste0(getwd(), "/example.jpg"), width=4, height=4, dpi=300)
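The scale behaviour described above can be checked with a little arithmetic. The sketch below uses two hypothetical helpers (they are not part of ggplot2): the saved size is width and height multiplied by scale, while text set with element_text(size = 12) stays 12 pt (1/6 inch) tall, so its share of the image shrinks as the canvas grows. This would explain why the scale = 4 export looks as if the text got smaller.

```r
# Hypothetical helper: output size for ggsave(width, height, dpi, scale);
# ggsave() multiplies width and height by scale before opening the device
saved_size <- function(width, height, dpi = 300, scale = 1) {
  inches <- c(width = width, height = height) * scale
  list(inches = inches, pixels = inches * dpi)
}

# Hypothetical helper: fraction of the image height taken by text of a given
# point size (72 pt = 1 inch); the point size itself is not rescaled
text_share <- function(height_in, size_pt = 12) (size_pt / 72) / height_in

saved_size(4, 4)$pixels             # 1200 x 1200 px, i.e. 4" x 4"
saved_size(4, 4, scale = 4)$pixels  # 4800 x 4800 px, i.e. 16" x 16"
text_share(4)                       # about 0.042 of the height
text_share(16)                      # about 0.010 of the height
```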
So the question again: is there any way to adjust the scaling of only the plot window and bar/line/etc. elements without adjusting text or image size? Can it be done within the original ggplot() call, within ggsave(), or with some other export function?
I have the following problem: I want to create a plot with ggplot showing the relationship between two variables (microplastic quantification in mussels, denoted MP, and lipofuscin accumulation, denoted Lip) for different treatment groups as a function of exposure time.
My data look like this (see the dput() output further below), and here is my code:
ggplot(Catrv_all,aes(Lip,MP,color=treatment))+
geom_smooth(method="lm", se=FALSE)+
geom_point(size = 2)+
theme(legend.position = "bottom")+
theme(plot.title = element_text(hjust = 0.5))+
labs(x = "Lipofuscin accumulation [% area]",
y = "Microplastic quantification [% area]",
title = "Lipofuscin accumulation vs. Microplastic quantification")
The plot looks like this:
I noticed that ggplot did not order the values by exposure time: for example, the sequence does not start with the value for 0 h.
My question is: how can I tell ggplot to order the values of MP and Lip by exposure time? Should I create a second x-axis? If so, how can I do that in ggplot?
I have seen many discussions on SO saying that it is difficult to create a second x/y axis in ggplot, but I don't know how else to visualize my data.
Update: I followed sconfluentus's advice and found a very interesting answer by Ben Bolker in the following post:
How can I plot with 2 different y-axes?
I adapted the provided code:
## add extra space to right margin of plot within frame
par(mar=c(5, 4, 4, 6) + 0.1)
## split data set for treatment groups
MP_Ko<-Catrv_all$MP[1:8]
exp<-Catrv_all$expTime[1:8]
Lip_Ko<-Catrv_all$Lip[1:8]
## Plot first set of data and draw its axis
plot(exp, MP_Ko, pch=16, axes=FALSE, xlab="", ylab="",
type="b",col="black", main="Microplastic quantification vs. Lipofuscin accumulation in Controls")
axis(2,col="black",las=1) ## las=1 makes horizontal labels
mtext("Microplastic quantification [% area]",side=2,line=2.5)
box()
## Allow a second plot on the same graph
par(new=TRUE)
## Plot the second plot and put axis scale on right
plot(exp,Lip_Ko, pch=15, xlab="", ylab="",
axes=FALSE, type="b", col="red")
## a little farther out (line=4) to make room for labels
mtext("Lipofuscin accumulation [% area]",side=4,col="red",line=4)
axis(4, col="red",col.axis="red",las=1)
## Draw the time axis
axis(1,pretty(range(Catrv_all$expTime, 672)))
mtext("Time (Hours)",side=1,col="black",line=2.5)
## Add Legend
legend("topright",legend=c("Microplastic quantification","Lipofuscin accumulation"),
text.col=c("black","red"),pch=c(16,15),col=c("black","red"))
... and got the following plot:
Time consuming, but this approach was very helpful.
Thank you all for your advice! I have now used dput(Catrv_all), and here is the output for my data:
structure(list(treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Co", "CoP", "HDPE"), class = "factor"),
expTime = c(0L, 3L, 6L, 24L, 96L, 168L, 336L, 672L, 0L, 3L,
6L, 24L, 96L, 168L, 336L, 672L, 0L, 3L, 6L, 24L, 96L, 168L,
336L, 672L), MP = c(0.056481655, 0.098508038, 0.097108112,
0.056848278, 0.082198187, 0.052261369, 0.022911461, 0.023901656,
0.056481655, 0.124866733, 0.125732967, 0.07986102, 0.071233133,
0.128376543, 0.331948, 0.121689155, 0.056481655, 0.186735799,
0.137477095, 0.41251914, 0.093364945, 0.085760245, 0.249371764,
0.187693319), Lip = c(9.848221569, 11.62875399, 9.530378924,
12.67745734, 14.14610784, 11.44140636, 11.55310567, 12.37321851,
9.848221569, 8.889567938, 12.5142123, 13.79770638, 11.26698845,
14.67064904, 14.56027915, 15.24772977, 9.848221569, 12.22424265,
13.05104725, 12.96830215, 12.10175574, 14.66505958, 13.67550035,
11.65168387), Cat = c(6.681571728, 7.321681629, 4.939885929,
7.73812502, 6.85066487, 9.317238053, 8.309505248, 9.33338377,
6.681571728, 7.517468479, 7.151607966, 9.074518192, 6.350614893,
9.749092742, 9.335634354, 11.43658695, 6.681571728, 6.164473371,
9.416062149, 9.19813927, 8.041328941, 8.736550013, 9.788258534,
10.55471537), CI = c(120.5252336, 110.1709456, 112.9077575,
110.9032308, 101.0274926, 101.1970679, 107.1464111, 97.42950278,
120.5252336, 101.7284063, 132.6162567, 108.7251954, 107.2199383,
102.9096767, 100.9637646, 101.6655302, 120.5252336, 102.1888777,
111.9139996, 113.7840225, 104.4767637, 103.1984161, 96.67797683,
95.59369834)), .Names = c("treatment", "expTime", "MP", "Lip",
"Cat", "CI"), class = "data.frame", row.names = c(NA, -24L))
Hopefully this helps to reproduce my code.
Again, to my question: yes, I would like to show exposure time on one of the axes as well (if this is possible). Secondly, I want to show a kind of "time series" (from 0 h to 672 h) of the behaviour of both MP and Lip for all treatment groups. So my first idea was: y-axis: MP; bottom x-axis: Lip; top x-axis: exposure time. Then plot the values for all treatment groups in the right order of exposure time (from 0 to 672 h) and try to add a trend line. In short, I want visual evidence that MP behaviour (over time) led to changes in lipofuscin accumulation for the different treatment groups.
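One way to show the time order without a second axis, sketched in base R to match the update above: sort each treatment's rows by expTime, connect the points in that order, and label each point with its exposure time. Only the Co and HDPE groups are re-entered here, with MP and Lip rounded to three decimals from the dput() output. In ggplot2, the corresponding layer would be geom_path(), which connects observations in the order they appear in the data, whereas geom_line() connects them in order of the x value; facet_wrap(~ treatment) would split the groups into panels.

```r
# Co and HDPE rows from dput(Catrv_all); MP and Lip rounded to 3 decimals
catrv <- data.frame(
  treatment = rep(c("Co", "HDPE"), each = 8),
  expTime = rep(c(0, 3, 6, 24, 96, 168, 336, 672), times = 2),
  MP = c(0.056, 0.099, 0.097, 0.057, 0.082, 0.052, 0.023, 0.024,
         0.056, 0.125, 0.126, 0.080, 0.071, 0.128, 0.332, 0.122),
  Lip = c(9.848, 11.629, 9.530, 12.677, 14.146, 11.441, 11.553, 12.373,
          9.848, 8.890, 12.514, 13.798, 11.267, 14.671, 14.560, 15.248),
  stringsAsFactors = FALSE
)

# Make sure every treatment runs from 0 h to 672 h before drawing paths
catrv <- catrv[order(catrv$treatment, catrv$expTime), ]

cols <- c(Co = "black", HDPE = "red")
plot(catrv$Lip, catrv$MP, type = "n",
     xlab = "Lipofuscin accumulation [% area]",
     ylab = "Microplastic quantification [% area]")
for (tr in unique(catrv$treatment)) {
  d <- catrv[catrv$treatment == tr, ]
  # type = "b" joins the points in row order, i.e. in time order here
  lines(d$Lip, d$MP, type = "b", pch = 16, col = cols[tr])
  # Label each point with its exposure time instead of adding a second axis
  text(d$Lip, d$MP, labels = paste0(d$expTime, " h"),
       pos = 3, cex = 0.7, col = cols[tr])
}
legend("topleft", legend = names(cols), col = cols, pch = 16, lty = 1)
```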
@Jake Kaupp: I am not sure how to use facet_wrap in ggplot. Could you explain that a bit more, please?
I am trying to make a network from example data like the one below (my real data is much bigger):
data <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 5L, 5L, 6L, 2L,
4L, 4L, 3L, 7L), .Label = c("A", "AB", "AD", "AN", "B", "D",
"GDH"), class = "factor"), V2 = structure(c(4L, 5L, 6L, 5L, 6L,
5L, 5L, 1L, 2L, 3L, 3L, 7L), .Label = c("AC", "AD", "AG", "B",
"C", "D", "THG"), class = "factor")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-12L))
library(igraph)
g <- graph_from_data_frame(data) # graph.data.frame() in older igraph versions
plot(g)
which gives me this
What I want is to:
1- change the color and shape for each cluster (e.g. THG and GDH become blue), preferably assigned automatically, because with huge data setting them one by one is not feasible;
2- remove the names inside the vertices and instead put a number next to each cluster, based on its order in the data. For example,
A B
A C
A D
A C
B D
B C
D C
is cluster 1,
AB AC
is cluster 2,
and so on.
This is a 90% solution. It does everything that you want except for the component label.
I think that what you are trying to do is get at the connected components of your graph. For your example graph these coincide with the biconnected components, which are available through the biconnected_components function in {igraph}. I use that to build a vector which indicates which component each vertex is in. You can modify the vertices with the vertex.xxx group of parameters. Below, I change the color, shape and label using the component number.
(Updated to include shape)
Components = biconnected_components(g)$components
VComponent = rep(0, length(V(g)))
for(i in 1:length(Components)) { VComponent[Components[[i]]] = i }
plot(g, vertex.color=VComponent, vertex.label=VComponent,
vertex.shape = shapes()[c(1:3,8)][VComponent])
The default shapes do not allow a lot of variation, although it is possible to add other shapes; the examples on igraph's shapes help page (?shapes) show how to add a triangle and a star. I am labeling the vertices with the cluster number. You can remove the vertex labels by specifying vertex.label = NA. But you wanted to place a text label near each component, and I do not see how to get the locations of the components to place the text, so I labeled the nodes instead.
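A sketch of the per-component label idea, assuming igraph 1.x function names (graph_from_data_frame(), components(), layout_with_fr(), norm_coords()). It uses components() rather than biconnected_components(), because in a general graph a vertex can belong to several biconnected components. The layout is computed and normalized to [-1, 1] up front and passed with rescale = FALSE, so the positions of each component's vertices are known and a number can be written next to it. Colours come from rainbow(), so they are assigned automatically however many components there are; the shape pool is an arbitrary choice.

```r
library(igraph)

# Edge list from the question's dput(), re-entered as plain character columns
data <- data.frame(
  V1 = c("A", "A", "A", "A", "B", "B", "D", "AB", "AN", "AN", "AD", "GDH"),
  V2 = c("B", "C", "D", "C", "D", "C", "C", "AC", "AD", "AG", "AG", "THG"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(data)

# Connected components, numbered in the order they are first met in the data
memb <- components(g)$membership
n_comp <- max(memb)

# Colours and shapes are assigned per component automatically
V(g)$color <- rainbow(n_comp)[memb]
shape_pool <- c("circle", "square", "csquare", "sphere")  # arbitrary choice
V(g)$shape <- shape_pool[(memb - 1) %% length(shape_pool) + 1]

# Fix the layout ourselves (scaled to [-1, 1]) so component positions are known
set.seed(1)
lay <- norm_coords(layout_with_fr(g))
plot(g, layout = lay, rescale = FALSE, vertex.label = NA)

# Write each component's number just above its topmost vertex
cx <- tapply(lay[, 1], memb, mean)
cy <- tapply(lay[, 2], memb, max) + 0.15
text(as.numeric(cx), as.numeric(cy), labels = names(cx), xpd = TRUE)
```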
I have created a short random data frame to practice on before moving to big data frames. It has the same variables as the original but is much shorter.
The problem I'm having is that Excel stores the dates month first (m/d/Y), and R is reading them as text (a factor), so it sorts them alphabetically and puts 10/1/2015 first, when it is supposed to be last.
What can I do so R correctly orders the dates?
I also want to calculate, for example, the total amount of money (Data$TOTAL) that I made in one month.
What would be the script for that?
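A minimal sketch for the date ordering and the monthly total, using only the JOB, DATE and TOTAL columns re-entered from dput(Data). The key step is converting the dates to Date with the month-first format "%m/%d/%Y"; after that, sorting and grouping by month work as expected. The MONTH column is a helper introduced here, not part of the original data.

```r
Data <- data.frame(
  JOB   = c("PLAY", "RUGBY", "JAGER", "RUGBY", "RUGBY"),
  DATE  = c("9/24/2015", "9/26/2015", "10/1/2015", "10/3/2015", "10/9/2015"),
  TOTAL = c(720, 540, 400, 540, 360)
)

# Parse month-first dates; as.character() also handles factor columns
Data$DATE <- as.Date(as.character(Data$DATE), format = "%m/%d/%Y")
Data <- Data[order(Data$DATE), ]   # now truly chronological

# Sum TOTAL per calendar month
Data$MONTH <- format(Data$DATE, "%Y-%m")
monthly <- aggregate(TOTAL ~ MONTH, data = Data, FUN = sum)
monthly
```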
Also, while I'm here, I could kill two birds with one stone. I know there is already an answer for this, but the answer I saw uses the directlabels package, which completely messes up the whole graphic.
What would you advise to prevent the labels from going over the plot margin?
DPUT()
dput(Data)
structure(list(JOB = structure(c(2L, 3L, 1L, 3L, 3L), .Label = c("JAGER",
"PLAY", "RUGBY"), class = "factor"), AGENCY = structure(c(1L,
1L, 2L, 1L, 1L), .Label = c("LONDON", "WILHEL"), class = "factor"),
DATE = structure(c(4L, 5L, 1L, 2L, 3L), .Label = c("10/1/2015",
"10/3/2015", "10/9/2015", "9/24/2015", "9/26/2015"), class = "factor"),
RATE = c(90L, 90L, 100L, 90L, 90L), HS = c(8L, 6L, 4L, 6L,
4L), TOTAL = c(720L, 540L, 400L, 540L, 360L)), .Names = c("JOB",
"AGENCY", "DATE", "RATE", "HS", "TOTAL"), class = "data.frame", row.names = c(NA,
-5L))
Here is how I went about what you're after:
rugger is the dataset I constructed from your dput()
dates <- as.Date(rugger$DATE, "%m/%d/%Y")
xpos <- rank(as.numeric(dates)) # chronological position of each row
plot(xpos, rugger$TOTAL, xaxt="n", xlab="", ylab="Total")
axis(side = 1, at = xpos, labels = FALSE)
text(cex=1, x=xpos+0.1, y=min(rugger$TOTAL)-25, labels=format(dates, "%m/%d/%Y"), xpd=TRUE, srt=45, pos=2)
The text() call gives you much more control over the labels; srt sets the rotation, and xpd=TRUE lets the labels be drawn outside the plot region. I used rank() on the parsed dates to place the days in chronological order; this also turns them into plain position numbers, which makes the axis easy to set up. (order() would instead return the row permutation, which only matches the chronological positions when the rows are already sorted.)
If you don't want dots, look at the pch argument of plot(); the ?points help page lists the available symbol types.