I am working on finalizing an NMDS plot that I created with vegan and ggplot2, but I cannot figure out how to add the envfit species-loading vectors to the plot. When I try, I get the error "invalid graphics state".
The example below is slightly modified from another question (Plotting ordiellipse function from vegan package onto NMDS plot created in ggplot2), but it is exactly the kind of example I wanted to include, since I used that question to get metaMDS into ggplot2 in the first place:
library(vegan)
library(ggplot2)
data(dune)
# calculate distance for NMDS
NMDS.log<-log(dune+1)
sol <- metaMDS(NMDS.log)
# Create meta data for grouping
MyMeta = data.frame(
sites = c(2,13,4,16,6,1,8,5,17,15,10,11,9,18,3,20,14,19,12,7),
amt = c("hi", "hi", "hi", "md", "lo", "hi", "hi", "lo", "md", "md", "lo",
"lo", "hi", "lo", "hi", "md", "md", "lo", "hi", "lo"),
row.names = "sites")
# plot NMDS using basic plot function and color points by "amt" from MyMeta
plot(sol$points, col = MyMeta$amt)
# same in ggplot2
NMDS = data.frame(MDS1 = sol$points[,1], MDS2 = sol$points[,2])
ggplot(data = NMDS, aes(MDS1, MDS2)) +
geom_point(aes(data = MyMeta, color = MyMeta$amt))
#Add species loadings
vec.sp<-envfit(sol$points, NMDS.log, perm=1000)
plot(vec.sp, p.max=0.1, col="blue")
The problem with the (otherwise excellent) accepted answer, and the reason the vectors are all the same length in the included figure [note that the accepted answer has since been edited to scale the arrows in the manner described below, to avoid confusing users coming across this Q&A], is that what is stored in the $vectors$arrows component of the object returned by envfit() are the direction cosines of the fitted vectors. These are all of unit length, and hence the arrows in @Didzis Elferts' plot are all the same length.
This differs from the output of plot(envfit(sol, NMDS.log)) because vegan scales the vector arrow coordinates by the correlation with the ordination configuration ("axes"). That way, species that show a weak relationship with the ordination configuration get shorter arrows. The scaling is done by multiplying the direction cosines by sqrt(r2), where r2 are the values shown in the table of printed output.
When adding the vectors to an existing plot, vegan also tries to scale the set of vectors such that they fill the available plot space whilst maintaining the relative lengths of the arrows. How this is done is discussed in the Details section of ?envfit and requires the use of the un-exported function vegan:::ordiArrowMul(result_of_envfit).
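To make that scaling concrete, here is a minimal sketch (assuming the sol and NMDS.log objects from the question): it compares the stored direction cosines with the sqrt(r2)-scaled coordinates that scores() should return for an envfit object, and then applies the plot-filling multiplier. Depending on your vegan version, ordiArrowMul() may be exported or may still need the vegan::: prefix.
library(vegan)
vf <- envfit(sol, NMDS.log, perm = 999)
# unit-length direction cosines as stored by envfit()
head(vf$vectors$arrows)
# scaling by sqrt(r2) should reproduce what scores() returns for an envfit object
all.equal(vf$vectors$arrows * sqrt(vf$vectors$r),
          scores(vf, display = "vectors"),
          check.attributes = FALSE)
# extra multiplier vegan applies so that the arrows fill the plot region
mul <- vegan:::ordiArrowMul(vf)
arrows_for_plotting <- scores(vf, display = "vectors") * mul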
Here is a full working example that replicates the behaviour of plot.envfit using ggplot2:
library(vegan)
library(ggplot2)
library(grid)
data(dune)
# calculate distance for NMDS
NMDS.log<-log1p(dune)
set.seed(42)
sol <- metaMDS(NMDS.log)
scrs <- as.data.frame(scores(sol, display = "sites"))
scrs <- cbind(scrs, Group = c("hi","hi","hi","md","lo","hi","hi","lo","md","md",
"lo","lo","hi","lo","hi","md","md","lo","hi","lo"))
set.seed(123)
vf <- envfit(sol, NMDS.log, perm = 999)
If we stop at this point and look at vf:
> vf
***VECTORS
NMDS1 NMDS2 r2 Pr(>r)
Belper -0.78061195 -0.62501598 0.1942 0.174
Empnig -0.01315693 0.99991344 0.2501 0.054 .
Junbuf 0.22941001 -0.97332987 0.1397 0.293
Junart 0.99999981 -0.00062172 0.3647 0.022 *
Airpra -0.20995196 0.97771170 0.5376 0.002 **
Elepal 0.98959723 0.14386566 0.6634 0.001 ***
Rumace -0.87985767 -0.47523728 0.0948 0.429
.... <truncated>
So the r2 data is used to scale the values in columns NMDS1 and NMDS2. The final plot is produced with:
spp.scrs <- as.data.frame(scores(vf, display = "vectors"))
spp.scrs <- cbind(spp.scrs, Species = rownames(spp.scrs))
p <- ggplot(scrs) +
geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = Group)) +
coord_fixed() + ## need aspect ratio of 1!
geom_segment(data = spp.scrs,
aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
arrow = arrow(length = unit(0.25, "cm")), colour = "grey") +
geom_text(data = spp.scrs, aes(x = NMDS1, y = NMDS2, label = Species),
size = 3)
This produces:
Start by loading the libraries; in addition to ggplot2 and vegan, the grid library is necessary.
library(ggplot2)
library(vegan)
library(grid)
data(dune)
Do the metaMDS analysis and save the results in a data frame.
NMDS.log<-log(dune+1)
sol <- metaMDS(NMDS.log)
NMDS = data.frame(MDS1 = sol$points[,1], MDS2 = sol$points[,2])
Add the species loadings and save them as a data frame. The direction cosines of the arrows are stored in the list vectors as the matrix arrows. To get the coordinates of the arrows, those direction values should be multiplied by the square root of the r2 values stored in vectors$r. A more straightforward way is to use the function scores(), as shown in @Gavin Simpson's answer. Then add a new column containing the species names.
vec.sp<-envfit(sol$points, NMDS.log, perm=1000)
vec.sp.df<-as.data.frame(vec.sp$vectors$arrows*sqrt(vec.sp$vectors$r))
vec.sp.df$species<-rownames(vec.sp.df)
Arrows are added with geom_segment() and species names with geom_text(); both use the data frame vec.sp.df.
ggplot(data = NMDS, aes(MDS1, MDS2)) +
  geom_point(aes(colour = MyMeta$amt)) +
  geom_segment(data = vec.sp.df, aes(x = 0, xend = MDS1, y = 0, yend = MDS2),
               arrow = arrow(length = unit(0.5, "cm")), colour = "grey",
               inherit.aes = FALSE) +
  geom_text(data = vec.sp.df, aes(x = MDS1, y = MDS2, label = species), size = 5) +
  coord_fixed()
May I add something late?
envfit() provides p-values, and sometimes you want to plot only the significant parameters (something vegan can do for you with p.max = 0.05 in the plot command). I struggled to do that with ggplot2. Here is my solution; maybe you will find a more elegant one?
Starting from Didzis' answer above:
ef<-envfit(sol$points, NMDS.log, perm=1000)
ef.df<-as.data.frame(ef$vectors$arrows*sqrt(ef$vectors$r))
ef.df$species<-rownames(ef.df)
#only significant pvalues
#shortcutting ef$vectors
A <- as.list(ef$vectors)
#creating the dataframe
pvals<-as.data.frame(A$pvals)
arrows<-as.data.frame(A$arrows*sqrt(A$r))
C<-cbind(arrows, pvals)
#subset
Cred<-subset(C,pvals<0.05)
Cred <- cbind(Cred, Species = rownames(Cred))
"Cred "can now be implemented in the geom_segment-argument as discussed above.
Short addition: To get a full representation of the plot.envfit functionality within ggplot2 aka "arrow lengths make full use of plot area" a factor needs to be applied. I don't know if it was intentionally left out in the answers above, as it was even specifically mentioned by Gavin? Just extract the required scaling factor using arrow_factor <- ordiArrowMul(vf) and then you can either apply it to both NMDS columns in spp.scrs or you can do this manually like
arrow_factor <- ordiArrowMul(vf)
spp.scrs <- as.data.frame(scores(vf, display = "vectors")) * arrow_factor
spp.scrs <- cbind(spp.scrs, Species = rownames(spp.scrs), Pvalues = vf$vectors$pvals, R_squared = vf$vectors$r)
# select significance similarly to `plot(vf, p.max = 0.01)`
spp.scrs <- subset(spp.scrs, Pvalues < 0.01)
# you can also add the arrow factor in here (don't do both!)
ggplot(scrs) +
geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = Group)) +
coord_fixed() + ## need aspect ratio of 1!
geom_segment(data = spp.scrs,
aes(x = 0, xend = NMDS1 * arrow_factor, y = 0, yend = NMDS2 * arrow_factor),
arrow = arrow(length = unit(0.25, "cm")), colour = "grey") +
geom_text(data = spp.scrs, aes(x = NMDS1 * arrow_factor, y = NMDS2 * arrow_factor, label = Species),
size = 3)
I'm looking to replicate this correlation plot, or at least get as close as possible to it.
Specifically, I want:
the correlation values in the lower half, with values varying on a greyscale based on absolute value
the circles in the top half, with varying diameter and on the colour scale.
I want to be able to edit the axis scale labels so that full descriptions are on the y-axis, and numeric references on the x-axis
I have gotten relatively close, but have not managed a sufficiently precise replication. I describe my closest attempts below with reproducible code. The corrplot package has gotten me closest.
# general preparation
library(car)
correlations = cor(mtcars)
corrplot package
library(corrplot)
corrplot.mixed(correlations,
upper = "number", #upper.col = ???
lower = "circle", #lower.col = ???
tl.pos = "lt", tl.col = "black", tl.cex = 0.5)
Notes:
there is a way to make the coefficients greyscale, but I don't understand it: https://rdrr.io/cran/corrplot/man/COL1.html (see the sketch after these notes)
For some bizarre reason, when I use my own data (as opposed to mtcars), the coefficient colours don't match the actual correlation values. I cannot give a reproducible code example here, because it works fine with the mtcars data.
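Regarding the first note: here is a hedged sketch using corrplot's COL1()/COL2() palette helpers (available in recent corrplot versions). Note that upper.col recolours the printed coefficients along a grey ramp over the signed range [-1, 1], so this only approximates "greyscale based on absolute value".
library(corrplot)
correlations = cor(mtcars)
corrplot.mixed(correlations,
               upper = "number",
               upper.col = COL1("Greys", 10),  # sequential greys for the coefficients
               lower = "circle",
               lower.col = COL2("RdBu", 10),   # diverging palette for the circles
               tl.pos = "lt", tl.col = "black", tl.cex = 0.5)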
cormat package
source("http://www.sthda.com/upload/rquery_cormat.r")
rquery.cormat(mtcars)
ggcorrplot
library("ggcorrplot")
# circles separate
ggcorrplot(correlations,           # correlation matrix
           method = "circle",      # circles instead of squares
           type = "upper",         # show only the upper triangle
           show.diag = F,          # don't show the diagonal values (1)
           lab = F,                # don't show correlation coefficients
           outline.col = "white",  # white outline around the circles
           ggtheme = theme_bw,     # theme
           colors = c("#440154FF", "#238A8DFF", "#FDE725FF"))
# coefs separate
ggcorrplot(correlations,           # correlation matrix
           method = "circle",      # circles instead of squares
           type = "upper",         # show only the upper triangle
           show.diag = F,          # don't show the diagonal values (1)
           lab = T,                # show correlation coefficients
           outline.col = NA,       # no outline around the circles
           ggtheme = theme_bw,     # theme
           colors = c("#440154FF", "#238A8DFF", "#FDE725FF"))
# can't combine both plots?
corrgram package
library(corrgram)
corrgram(correlations,
labels = colnames(correlations),
lower.panel = "panel.fill",
upper.panel = "panel.cor")
Some other notes:
It seems the halves of the plots tend to run along the opposite diagonal compared with the example plot, but I guess that's not a big concern.
Out-of-the-box options are quick and nice. However, when it comes to customizing, IMHO it may be worthwhile to build the plot up from scratch using ggplot2. As a first step this involves some data wrangling to get your correlation matrix into the right shape. In this step I also convert the categories to factors and a numeric id. Based on the ids I split the data into the upper and lower triangle values, which can then be plotted separately using a geom_point and a geom_text. Besides that, it's important to add drop = FALSE to the x and y scales to keep all factor levels and the right order. I also use some functions to get the desired axis labels:
EDIT: Following the suggestion by @AllanCameron I added coord_equal as the "final" touch to get a nice square, matrix-like look. And thanks to @RichtieSacramento, the code now maps the absolute value onto the size aesthetic.
library(dplyr)
library(tidyr)
library(ggplot2)
correlations = cor(mtcars)
levels <- colnames(mtcars)
corr_long <- correlations %>%
data.frame() %>%
mutate(row = factor(rownames(.), levels = levels),
rowid = as.numeric(row)) %>%
pivot_longer(-c(row, rowid), names_to = "col") %>%
mutate(col = factor(col, levels = levels),
colid = as.numeric(col))
ggplot(corr_long, aes(col, row)) +
geom_point(aes(size = abs(value), fill = value),
data = ~filter(.x, rowid > colid), shape = 21) +
geom_text(aes(label = scales::number(value, accuracy = .01), color = abs(value)),
data = ~filter(.x, rowid < colid), size = 8 / .pt) +
scale_x_discrete(labels = ~ attr(.x, "pos"), drop = FALSE) +
scale_y_discrete(labels = ~ paste0(.x, " (", attr(.x, "pos"), ")"), drop = FALSE) +
scale_fill_viridis_c(limits = c(-1, 1)) +
scale_color_gradient(low = grey(.8), high = grey(.2)) +
coord_equal() +
guides(size = "none", color = "none") +
theme(legend.position = "bottom",
panel.grid = element_blank(),
axis.ticks = element_blank()) +
labs(x = NULL, y = NULL, fill = NULL)
Hi, how can I calculate the overlapping area between 2 columns (or 2 subsets of a column) in R?
Please see the example data below:
set.seed(1234)
df <- data.frame(
Data=factor(rep(c("D1", "D2"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5),
rnorm(200, mean=65, sd=5)))
)
library(ggplot2)
plot <- ggplot(df, aes(weight,fill = Data))+
geom_density()
plot
This results in the plot below. I am wondering how to colour the overlapping area and calculate the overlapping coefficient (OVL), similar to what is done here using Monte Carlo integration?
Please note that the linked example uses a parametric distribution, whereas I am asking about a dataset of observed values.
I normally find it easier to work directly with the densities and plot them as geom_area. If you get the x-axis sampling points to match between the two distributions, you can find the overlap area using pmin; the sum of its values divided by the sum of the values for the two curves then gives the proportion of the total area that is overlapped.
d1dens <- with(df, density(weight[Data == "D1"],
from = min(weight),
to = max(weight)))
d2dens <- with(df, density(weight[Data == "D2"],
from = min(weight),
to = max(weight)))
joint <- pmin(d1dens$y, d2dens$y)
df2 <- data.frame(x = rep(d1dens$x, 3),
y = c(d1dens$y, d2dens$y, joint),
Data = rep(c("D1", "D2", "overlap"), each = length(d1dens$x)))
ggplot(df2, aes(x, y, fill = Data)) +
geom_area(position = position_identity(), color = "black") +
scale_fill_brewer(palette = "Pastel2") +
theme_bw()
sum(joint) / sum(d1dens$y, d2dens$y)
#> [1] 0.1480701
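The ratio above gives the overlap as a share of the two curves' combined area. If you want the overlapping coefficient (OVL) itself, i.e. the integral of the pointwise minimum of the two densities, a small extension of the same objects gives an estimate (a sketch; it relies on d1dens and d2dens sharing the same equally spaced x grid):
dx <- diff(d1dens$x[1:2])  # spacing of the shared x grid
ovl <- sum(joint) * dx     # Riemann-sum estimate of the integral of pmin(f1, f2)
ovl
# Each density integrates to roughly 1, so ovl lies between 0 and 1 and is
# approximately twice the proportion computed above (which divides by the
# combined area, ~2, of the two curves).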
I'm struggling with the following problem:
I use pam to cluster my dataset v in 7 clusters:
x <- pam(v,7)
I know that there is a vector clustering in x which contains the according numbers of clusters.
I would like to get a subset of x which only contains cluster 1.
Is this possible?
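For the subsetting itself, a minimal sketch (assuming x <- pam(v, 7) as above, where v is your data set): the clustering vector can be used to index the rows directly.
library(cluster)
x <- pam(v, 7)
cluster1 <- v[x$clustering == 1, ]  # rows of v assigned to cluster 1
x$medoids[1, ]                      # the medoid of cluster 1, if needed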
Edit:
Here is an example. Cluster iris in three clusters and plot them.
library(ggfortify)
library(cluster)
v <- iris[-5]
x <- pam(v,3)
autoplot(x, frame = TRUE, frame.type = 'norm')
The question: How can I plot only the first cluster? It should look like the first plot without cluster 2 and 3.
Edit: I think I found a solution. Instead of using autoplot, I calculate the convex hull of every cluster and plot it.
library(cluster)
library(plyr)
library(ggplot2)
library(ggrepel)
find_hull <- function(df) df[chull(df$x, df$y),]
v<-iris[-5]
pp <- pam(v,3)
n<-princomp(pp$data, scores = TRUE, cor = ncol(pp$data) != 2)$scores
df<-data.frame(n[,1],n[,2],pp$clustering)
colnames(df)<-c("x","y","z")
hulls <- ddply(df, "z", find_hull)
p<-qplot(x,y,data=df,color=as.factor(z))+
geom_polygon(data=hulls, alpha=1, fill=NA)+
geom_text_repel(aes(label = rownames(df)),arrow = arrow(length = unit(0.00, 'inches'), angle = 0.00),size=5.5,colour="grey55")+
theme_classic(base_size = 16)+
theme(axis.line=element_blank(),axis.text.x=element_blank(),axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),axis.title.y=element_blank(),legend.position="none",
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
p
df2<-df[df$z==1,]
hulls <- ddply(df2, "z", find_hull)
p1<-qplot(x,y,data=df2,color=as.factor(z))+
geom_polygon(data=hulls, alpha=0.8, fill=NA)+
geom_text_repel(aes(label = rownames(df2)),arrow = arrow(length = unit(0.00, 'inches'), angle = 0.00),size=5.5,colour="grey25")+
theme_classic(base_size = 16)+
theme(axis.line=element_blank(),axis.text.x=element_blank(),axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),axis.title.y=element_blank(),legend.position="none",
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
p1
Now I want to plot both figures in one device. I have already tried the multiplot from cookbook-r but it gives the error
Error: Aesthetics must be either length 1 or the same as the data (26): label, x, y
It must be because of the labels I guess.
I also tried
grid.arrange(p,p1, ncol=1)
from the gridExtra package but it gives the same error.
Is there any other option to arrange multiple figures with labels in one figure?
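One workaround that often resolves this class of error (a sketch, not a verified diagnosis of the exact failure here) is to store the labels as a column of the data frames instead of computing rownames() inside aes(), so that every aesthetic is evaluated against the layer's own data, and then arrange the two plots with gridExtra::grid.arrange():
library(ggplot2)
library(plyr)
library(ggrepel)
library(gridExtra)
# keep the labels inside the data frames built earlier
df$label  <- rownames(df)
df2$label <- rownames(df2)
hulls1 <- ddply(df,  "z", find_hull)
hulls2 <- ddply(df2, "z", find_hull)
p_all <- ggplot(df, aes(x, y, colour = as.factor(z))) +
  geom_point() +
  geom_polygon(data = hulls1, fill = NA) +
  geom_text_repel(aes(label = label), colour = "grey55", size = 5.5) +
  theme_void() + theme(legend.position = "none")
p_one <- ggplot(df2, aes(x, y, colour = as.factor(z))) +
  geom_point() +
  geom_polygon(data = hulls2, fill = NA) +
  geom_text_repel(aes(label = label), colour = "grey25", size = 5.5) +
  theme_void() + theme(legend.position = "none")
grid.arrange(p_all, p_one, ncol = 1)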
My initial goal was to plot a population of individual points and then draw a convex hull enclosing 80% of that population centered on the mass of the population.
After trying a number of ideas, the best solution I came up with was to use ggplot's stat_density2d. While this works great for a qualitative analysis, I still need to indicate an 80% boundary. I started out looking for a way to outline the 80th percentile population boundary, but I can work with an 80% probability density boundary instead.
Here's where I'm looking for help. The bin parameter for kde2d (used by stat_density2d) is not clearly documented. If I set bin = 4 in the example below, am I correct in interpreting the central (green) region as containing a 25% probability mass and the combined yellow, red, and green areas as representing a 75% probability mass? If so, by changing the bin to = 5, would the area inscribed then equal an 80% probability mass?
set.seed(1)
n=100
df <- data.frame(x=rnorm(n, 0, 1), y=rnorm(n, 0, 1))
TestData <- ggplot (data = df) +
stat_density2d(aes(x = x, y = y, fill = as.factor(..level..)),
bins=4, geom = "polygon", ) +
geom_point(aes(x = x, y = y)) +
scale_fill_manual(values = c("yellow","red","green","royalblue", "black"))
TestData
I repeated a number of test cases and manually counted the excluded points [would love to find a way to count them based on what ..level.. they were contained within] but given the random nature of the data (both my real data and the test data) the number of points outside of the stat_density2d area varied enough to warrant asking for help.
Summarizing, is there a practical means of drawing a polygon around the central 80% of the population of points in the data frame? Or, baring that, am I safe to use stat_density2d and set bin equal to 5 to produce an 80% probability mass?
Excellent answer from Bryan Hanson dispelling the fuzzy notion that I could pass an undocumented bin parameter in stat_density2d. The results looked close at values for bin around 4 to 6, but as he stated, the actual function is unknown and therefore not usable.
I used HPDregionplot as provided in the accepted answer by DWin to solve my problem. To that, I added a center of gravity (COGravity) and point-in-polygon (pnt.in.poly) from the SDMTools package to complete the analysis.
library(MASS)
library(coda)
library(SDMTools)
library(emdbook)
library(ggplot2)
theme_set(theme_bw(16))
set.seed(1)
n=100
df <- data.frame(x=rnorm(n, 0, 1), y=rnorm(n, 0, 1))
HPDregionplot(mcmc(data.matrix(df)), prob=0.8)
with(df, points(x,y))
ContourLines <- as.data.frame(HPDregionplot(mcmc(data.matrix(df)), prob=0.8))
df$inpoly <- pnt.in.poly(df, ContourLines[, c("x", "y")])$pip
dp <- df[df$inpoly == 1,]
COG100 <- as.data.frame(t(COGravity(df$x, df$y)))
COG80 <- as.data.frame(t(COGravity(dp$x, dp$y)))
TestData <- ggplot (data = df) +
stat_density2d(aes(x = x, y = y, fill = as.factor(..level..)),
bins=5, geom = "polygon", ) +
geom_point(aes(x = x, y = y, colour = as.factor(inpoly)), alpha = 1) +
geom_point(data=COG100, aes(COGx, COGy),colour="white",size=2, shape = 4) +
geom_point(data=COG80, aes(COGx, COGy),colour="green",size=4, shape = 3) +
geom_polygon(data = ContourLines, aes(x = x, y = y), color = "blue", fill = NA) +
scale_fill_manual(values = c("yellow","red","green","royalblue", "brown", "black", "white", "black", "white","black")) +
scale_colour_manual(values = c("red", "black"))
TestData
nrow(dp)/nrow(df) # actual proportion of population members inscribed within the 80% probability polygon
Alright, let me start by saying I'm not entirely sure of this answer, and it's only a partial answer! There is no bin parameter for MASS::kde2d, which is the function used by stat_density2d. Looking at the help page for kde2d and its code (seen simply by typing the function name in the console), I think the bin parameter corresponds to h (how these functions would pass bin to h is not clear, however). Following the help page, we see that if h is not provided, it is computed by MASS::bandwidth.nrd. The help page for that function shows this:
# The function is currently defined as
function(x)
{
r <- quantile(x, c(0.25, 0.75))
h <- (r[2] - r[1])/1.34
4 * 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
}
Based on this, I think the answer to your last question ("Am I safe...") is definitely no. r in the above function is what you need for your assumption to be safe, but it is clearly modified, so you are not safe. HTH.
Additional thought: Do you have any evidence that your code is using your bins argument? I'm wondering if it is being ignored. If so, try passing h in place of bins and see if it listens.
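Following up on that suggestion, a quick way to check is to draw the same data with the defaults, with bins changed, and with an explicit h, and see which call actually alters the contours (a sketch; h is the documented stat_density2d parameter forwarded to MASS::kde2d, whereas support for a bins parameter has varied across ggplot2 versions):
library(ggplot2)
set.seed(1)
dat <- data.frame(x = rnorm(100), y = rnorm(100))
base <- ggplot(dat, aes(x, y)) + geom_point()
p_default <- base + stat_density2d()             # default bandwidth from bandwidth.nrd()
p_bins    <- base + stat_density2d(bins = 4)     # does this change anything?
p_h       <- base + stat_density2d(h = c(1, 1))  # explicit bandwidth passed to kde2d
# If p_bins looks identical to p_default but p_h differs, bins is being ignored
# in your ggplot2 version and h is the knob that actually reaches kde2d.
print(p_default); print(p_bins); print(p_h)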
HPDregionplot in package:emdbook is supposed to do that. It does use MASS::kde2d but it normalizes the result. It has the disadvantage to my mind that it requires an mcmc object.
library(MASS)
library(coda)
library(emdbook)  # provides HPDregionplot()
HPDregionplot(mcmc(data.matrix(df)), prob=0.8)
with(df, points(x,y))
Building on the answer by 42, I've simplified HPDregionplot() to reduce dependencies and remove the requirement to work with mcmc objects. The function works on a two-column data.frame and creates no intermediate plots. Note, however, that this approach breaks as soon as grDevices::contourLines() returns multiple contours.
hpd_contour <- function (x, n = 50, prob = 0.95, ...) {
post1 <- MASS::kde2d(x[[1]], x[[2]], n = n, ...)
dx <- diff(post1$x[1:2])
dy <- diff(post1$y[1:2])
sz <- sort(post1$z)
c1 <- cumsum(sz) * dx * dy
levels <- sapply(prob, function(x) {
approx(c1, sz, xout = 1 - x)$y
})
as.data.frame(grDevices::contourLines(post1$x, post1$y, post1$z, levels = levels))
}
theme_set(theme_bw(16))
set.seed(1)
n=100
df <- data.frame(x=rnorm(n, 0, 1), y=rnorm(n, 0, 1))
ContourLines <- hpd_contour(df, prob=0.8)
ggplot(df, aes(x = x, y = y)) +
stat_density2d(aes(fill = as.factor(..level..)), bins=5, geom = "polygon") +
geom_point() +
geom_polygon(data = ContourLines, color = "blue", fill = NA) +
scale_fill_manual(values = c("yellow","red","green","royalblue", "brown", "black", "white", "black", "white","black")) +
scale_colour_manual(values = c("red", "black"))
Moreover, the workflow now easily extends to grouped data.
ContourLines <- iris[, c("Species", "Sepal.Length", "Sepal.Width")] %>%
group_by(Species) %>%
do(hpd_contour(.[, c("Sepal.Length", "Sepal.Width")], prob=0.8))
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point(size = 3, alpha = 0.6) +
geom_polygon(data = ContourLines, fill = NA) +
guides(color = FALSE) +
theme(plot.margin = margin())
I am trying to plot a simple scatter plot for 3 groups, overlaying a segment indicating the mean for each group and labelling the groups.
I have managed to get a scatter plot with error bars, but I would only like a segment indicating where the mean is. I also cannot seem to get the group labelling right.
To get the summary statistics I am using the function "summarySE" from this page. [EDIT: note this function is also provided in the Rmisc package]
Is there any simpler way to do this, and to get a segment instead of a point for the mean?
I really appreciate your help!
library(ggplot2)
library(plyr)
df <- data.frame(tt = rep(1:3, each = 40),
val = round(rnorm(120, m = rep(c(4, 5, 7), each = 40))))
# After loading the summarySE function:
dfc <- summarySE(df, measurevar="val", groupvars="tt")
ggplot(dfc, aes(tt, val), main="Scatter plot with mean bars",
xlab="Groups", ylab="Values", names=c("Group1", "Group2", "Group3"))+
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin=val-sd, ymax=val+sd), width = 0.01, size = 1)
You can use geom_crossbar() with val as the y, ymin and ymax values. With scale_x_continuous() you can change the x-axis labels for the original data, or use @agstudy's solution of changing the original data, in which case the labels appear automatically.
ggplot()+
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05)) +
geom_crossbar(data=dfc,aes(x=tt,ymin=val, ymax=val,y=val,group=tt), width = 0.5)+
scale_x_continuous(breaks=c(1,2,3),labels=c("Group1", "Group2", "Group3"))
To get the group labelling, you can convert the continuous tt into a factor like this:
dfc$tt <- factor(dfc$tt,labels=c("Group1", "Group2", "Group3"))
Do this, of course, before calling summarySE and creating dfc.
Using geom_crossbar() as in the other solution, you get:
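For completeness, a minimal sketch of that factor-first workflow (assuming summarySE() from the Rmisc package mentioned in the question): convert tt in the raw data before summarising, and the group labels then carry through to dfc and the x axis without scale_x_continuous():
library(ggplot2)
library(Rmisc)  # provides summarySE()
df <- data.frame(tt = rep(1:3, each = 40),
                 val = round(rnorm(120, m = rep(c(4, 5, 7), each = 40))))
df$tt <- factor(df$tt, labels = c("Group1", "Group2", "Group3"))
dfc <- summarySE(df, measurevar = "val", groupvars = "tt")
ggplot() +
  geom_jitter(aes(tt, val), data = df, colour = I("red"),
              position = position_jitter(width = 0.05)) +
  geom_crossbar(data = dfc, aes(x = tt, y = val, ymin = val, ymax = val),
                width = 0.5)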