How to modify ggplot2 scatter plot - r

Here is my sample dataset:
df1 = data.frame(Count.amp = c(8,8,1,2,2,5,8), Count.amp.1 = c(4,4,2,3,2,5,4))
I tried
library(ggplot2)
qplot(Count.amp,Count.amp.1, data=df1)
Is there any way to plot this so that the size of each dot is proportional to the number of elements at that dot?

Yes. Broadly speaking, you are looking at creating a bubble plot. This code:
df1 = data.frame(Count.amp = c(8,8,1,2,2,5,8), Count.amp.1 = c(4,4,2,3,2,5,4))
df1$sum <- df1$Count.amp + df1$Count.amp.1
ggplot(df1, aes(x=Count.amp, y=Count.amp.1, size=sum))+
geom_point(colour="white", fill="red", shape=21)+ scale_size_area(max_size = 15, guide="none")+
theme_bw()
would give you something like this:
It wasn't immediately clear to me what you mean by the number of elements, but in principle you can map any numeric column to size= to get the desired result.
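If "number of elements" means the number of rows that share the same (x, y) position, one possible reading of the question, then ggplot2's geom_count() may be closer to what is wanted. It counts the observations at each location and maps that count to point size. A minimal sketch under that assumption; the legend title "Points" and the breaks are illustrative choices:

```r
library(ggplot2)

df1 <- data.frame(Count.amp   = c(8, 8, 1, 2, 2, 5, 8),
                  Count.amp.1 = c(4, 4, 2, 3, 2, 5, 4))

# geom_count() counts the rows at each (x, y) location and
# maps that count (the computed variable n) to the size aesthetic
p <- ggplot(df1, aes(x = Count.amp, y = Count.amp.1)) +
  geom_count(colour = "white", fill = "red", shape = 21) +
  scale_size_area(max_size = 15, breaks = 1:3, name = "Points") +
  theme_bw()
p
```

With this data the point at (8, 4) occurs three times, so it is drawn largest, while every other location occurs once.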

Related

How to draw bar plot including different groups in R with ggplot2?

I want to draw a combined bar plot so that I can make comparisons among different score types.
compare_data = data.frame(model=c(lr,rf,gbm,xgboost),
precision=c(0.6593,0.7588,0.6510,0.7344),
recall=c(0.5808,0.6306,0.4897,0.6416),f1=c(0.6176,0.6888,0.5589,0.6848),
acuracy=c(0.6766,0.7393,0.6453,0.7328))
compare1 <- ggplot(evaluation_4model, aes(x=Model, y=Precision)) +
geom_bar(aes(fill = Model), stat="identity")
compare1 <- compare+labs(title = "Precision")
Here is one of the bar plots I drew; it shows the "precision" score. However, I want to make a wide bar plot with all the models under the 4 score types sharing the same y-axis, with a subtitle if possible.
Your code throws an error, because evaluation_4model is not defined.
However, the answer to your problem is likely to make a faceted plot and hence melt the data to a long format. To do this, I usually make use of the reshape2 package. Tweaking your code looks like this:
library(ggplot2)
library(reshape2)
compare_data = data.frame(model=c("lr","rf","gbm","xgboost"),
precision=c(0.6593,0.7588,0.6510,0.7344),
recall=c(0.5808,0.6306,0.4897,0.6416),
f1=c(0.6176,0.6888,0.5589,0.6848),
acuracy=c(0.6766,0.7393,0.6453,0.7328))
plotdata <- melt(compare_data,id.vars = "model")
compare2 <- ggplot(plotdata, aes(x=model, y=value)) +
geom_bar(aes(fill = model), stat="identity")+
facet_grid(~variable)
compare2
Does that help?
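For the subtitle part of the question, here is a rough sketch of the same idea with a title and subtitle added through labs(). It assumes the tidyr package is available and uses pivot_longer() in place of melt(); the column names score and value, the object name compare3, and the title text are illustrative, not from the original post.

```r
library(ggplot2)
library(tidyr)  # assumption: tidyr is installed; reshape2::melt() works equally well

compare_data <- data.frame(
  model     = c("lr", "rf", "gbm", "xgboost"),
  precision = c(0.6593, 0.7588, 0.6510, 0.7344),
  recall    = c(0.5808, 0.6306, 0.4897, 0.6416),
  f1        = c(0.6176, 0.6888, 0.5589, 0.6848),
  acuracy   = c(0.6766, 0.7393, 0.6453, 0.7328)  # spelling kept from the question
)

# Long format: one row per (model, score type) pair
plotdata <- pivot_longer(compare_data, cols = -model,
                         names_to = "score", values_to = "value")

# facet_wrap() uses fixed scales by default, so all panels share one y-axis
compare3 <- ggplot(plotdata, aes(x = model, y = value, fill = model)) +
  geom_col() +
  facet_wrap(~ score, nrow = 1) +
  labs(title = "Model comparison",
       subtitle = "Four score types on a shared y-axis",
       x = NULL, y = "Score")
compare3
```

geom_col() is shorthand for geom_bar(stat = "identity"), so the bars are the same as in the melt() version.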

3-variables plotting heatmap ggplot2

I'm currently working on a very simple data.frame, containing three columns:
x contains x-coordinates of a set of points,
y contains y-coordinates of the set of points, and
weight contains a value associated to each point;
Now, working in ggplot2 I seem to be able to plot contour levels for these data, but I can't manage to find a way to fill the plot according to the variable weight. Here's the code that I used:
ggplot(df, aes(x,y, fill=weight)) +
geom_density_2d() +
coord_fixed(ratio = 1)
You can see that there's no filling whatsoever, sadly.
I've been trying for three days now, and I'm starting to get depressed.
Specifying fill=weight and/or color = weight in the general ggplot call, resulted in nothing. I've tried to use different geoms (tile, raster, polygon...), still nothing. Tried to specify the aes directly into the geom layer, also didn't work.
Tried to convert the object as a ppp but ggplot can't handle them, and also using base-R plotting didn't work. I have honestly no idea of what's wrong!
I'm attaching the first 10 points' data, which is spaced on an irregular grid:
x = c(-0.13397460,-0.31698730,-0.13397460,0.13397460,-0.28867513,-0.13397460,-0.31698730,-0.13397460,-0.28867513,-0.26794919)
y = c(-0.5000000,-0.6830127,-0.5000000,-0.2320508,-0.6547005,-0.5000000,-0.6830127,-0.5000000,-0.6547005,0.0000000)
weight = c(4.799250e-01,5.500250e-01,4.799250e-01,-2.130287e+12,5.798250e-01,4.799250e-01,5.500250e-01,4.799250e-01,5.798250e-01,6.618956e-01)
Any advice? The desired output would be something along these lines (linked image).
Thank you in advance.
From your description geom_density doesn't sound right.
You could try geom_raster:
ggplot(df, aes(x,y, fill = weight)) +
geom_raster() +
coord_fixed(ratio = 1) +
scale_fill_gradientn(colours = rev(rainbow(7))) # colourmap
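Here is a second-best option using fill=..level.., which fills the contours by the estimated density of the points rather than by weight. ..level.. is the contour level computed by stat_density_2d; in ggplot2 3.3.0 and later the same variable is written after_stat(level).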
# load libraries
library(ggplot2)
library(RColorBrewer)
library(ggthemes)
# build your data.frame
df <- data.frame(x=x, y=y, weight=weight)
# build color Palette
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")), space="Lab")
# Plot
ggplot(df, aes(x,y, fill=..level..) ) +
stat_density_2d( bins=11, geom = "polygon") +
scale_fill_gradientn(colours = myPalette(11)) +
theme_minimal() +
coord_fixed(ratio = 1)
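Neither answer above fills by weight on an irregular grid: geom_raster() expects evenly spaced points, and the density version ignores weight. One possible alternative, offered as a sketch rather than a definitive fix, is stat_summary_2d(), which bins the x-y plane and fills each bin with a summary of z. The choice of mean and of bins = 10 is an assumption to tune for the full data.

```r
library(ggplot2)

x <- c(-0.13397460, -0.31698730, -0.13397460, 0.13397460, -0.28867513,
       -0.13397460, -0.31698730, -0.13397460, -0.28867513, -0.26794919)
y <- c(-0.5000000, -0.6830127, -0.5000000, -0.2320508, -0.6547005,
       -0.5000000, -0.6830127, -0.5000000, -0.6547005, 0.0000000)
# Note: the fourth weight is many orders of magnitude away from the others
# and will dominate the colour scale unless it is filtered or the scale limited
weight <- c(4.799250e-01, 5.500250e-01, 4.799250e-01, -2.130287e+12, 5.798250e-01,
            4.799250e-01, 5.500250e-01, 4.799250e-01, 5.798250e-01, 6.618956e-01)
df <- data.frame(x = x, y = y, weight = weight)

# Each rectangular bin is filled with the mean weight of the points inside it
p <- ggplot(df, aes(x, y, z = weight)) +
  stat_summary_2d(fun = mean, bins = 10) +
  coord_fixed(ratio = 1) +
  scale_fill_gradientn(colours = rev(rainbow(7)))
p
```

Bins that contain no points are left empty, so with sparse data the result looks like scattered tiles rather than a continuous surface; a smooth surface would need interpolation first, which is outside what this sketch covers.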

How to change origin line position in ggplot bar graph?

Say I'm measuring 10 personality traits and I know the population baseline. I would like to create a chart for individual test-takers to show them their individual percentile ranking on each trait. Thus, the numbers go from 1 (percentile) to 99 (percentile). Given that a 50 is perfectly average, I'd like the graph to show bars going to the left or right from 50 as the origin line. In bar graphs in ggplot, it seems that the origin line defaults to 0. Is there a way to change the origin line to be at 50?
Here's some fake data and default graphing:
df <- data.frame(
names = LETTERS[1:10],
factor = round(rnorm(10, mean = 50, sd = 20), 1)
)
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor)) +
geom_bar(stat="identity") +
coord_flip()
Picking up on @nongkrong's comment, here's some code that will do what I think you want while relabeling the ticks to match the original range and relabeling the axis to avoid showing the math:
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(breaks=seq(-50,50,10), labels=seq(0,100,10)) + ylab("Percentile") +
coord_flip()
This post was really helpful for me - thanks @ulfelder and @nongkrong. However, I wanted to re-use the code on different data without having to manually adjust the tick labels to fit the new data. To do this in a way that retained ggplot's tick placement, I defined a tiny function and passed it to the labels argument:
fix.labels <- function(x){
x + 50
}
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(labels = fix.labels) + ylab("Percentile") +
coord_flip()
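If the baseline might not always be 50, the same idea can be generalised with a small function factory. This is only a sketch: shift_labels and baseline are hypothetical names introduced here, and the seed is fixed purely so the fake data is reproducible.

```r
library(ggplot2)

# Hypothetical helper: returns a labelling function that adds the
# baseline back onto the shifted axis values
shift_labels <- function(baseline) {
  function(x) x + baseline
}

set.seed(1)  # assumption: fixed seed for a reproducible example
df <- data.frame(
  names = LETTERS[1:10],
  factor = round(rnorm(10, mean = 50, sd = 20), 1)
)

baseline <- 50
p <- ggplot(data = df, aes(x = names, y = factor - baseline)) +
  geom_col() +
  scale_y_continuous(labels = shift_labels(baseline)) +
  ylab("Percentile") +
  coord_flip()
p
```

The bars are drawn from 0 on the shifted scale, and the labelling function turns each tick back into the original percentile, so changing baseline in one place keeps the bars and labels in sync.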

R ggplot with two series: points and errorbars with legends

If I have a dataframe like this:
obs<-rnorm(20)
d<-data.frame(year=2000:2019,obs=obs,pred=obs+rnorm(20,.1))
d$pup<-d$pred+.5
d$plow<-d$pred-.5
d$obs[20]<-NA
d
And I want the observation and model prediction error bars to look something like:
(p1<-ggplot(data=d)+aes(x=year)
+geom_point(aes(y=obs),color='red',shape=19)
+geom_point(aes(y=pred),color='blue',shape=3)
+geom_errorbar(aes(ymin=plow,ymax=pup))
)
How do I add a legend/scale/key identifying the red points as observations and the blue plusses with error bars as point predictions with ranges?
Here is one solution melting pred/obs into one column. Can't post image due to rep.
library(ggplot2)
obs <- rnorm(20)
d <- data.frame(dat=c(obs,obs+rnorm(20,.1)))
d$pup <- d$dat+.5
d$plow <- d$dat-.5
d$year <- rep(2000:2019,2)
d$lab <- c(rep("Obs", 20), rep("Pred", 20))
p1<-ggplot(data=d, aes(x=year)) +
geom_point(aes(y = dat, colour = factor(lab), shape = factor(lab))) +
geom_errorbar(data = d[21:40,], aes(ymin=plow,ymax=pup), colour = "blue") +
scale_shape_manual(name = "Legend Title", values=c(6,1)) +
scale_colour_manual(name = "Legend Title", values=c("red", "blue"))
p1
edit: Thanks for the rep. Image added
Here is a ggplot solution that does not require melting and grouping.
set.seed(1) # for reproducible example
obs <- rnorm(20)
d <- data.frame(year=2000:2019,obs,pred=obs+rnorm(20,.1))
d$obs[20]<-NA
library(ggplot2)
ggplot(d,aes(x=year))+
geom_point(aes(y=obs,color="obs",shape="obs"))+
geom_point(aes(y=pred,color="pred",shape="pred"))+
geom_errorbar(aes(ymin=pred-0.5,ymax=pred+0.5))+
scale_color_manual("Legend",values=c(obs="red",pred="blue"))+
scale_shape_manual("Legend",values=c(obs=19,pred=3))
This creates a color and shape scale with two components each ("obs" and "pred"). Then it uses scale_*_manual(...) to set the values for those scales: ("red","blue") for color and (19,3) for shape.
Generally, if you have only two categories, like "obs" and "pred", then this is a reasonable way to use ggplot, and it avoids merging everything into one data frame. If you have more than two categories, or if they are integral to the dataset (e.g., actual categorical variables), then you are much better off doing this as in the other answer.
Note that your example left out the column year so your code does not run.

How do I create a categorical scatterplot in R like boxplots?

Does anyone know how to create a scatterplot in R to create plots like these in PRISM's graphpad:
I tried using boxplots but they don't display the data the way I want it. These column scatterplots that graphpad can generate show the data better for me.
Any suggestions would be appreciated.
As @smillig mentioned, you can achieve this using ggplot2. The code below reproduces the plot that you are after pretty well, but be warned that it is quite tricky. First load the ggplot2 package and generate some data:
library(ggplot2)
dd = data.frame(values=runif(21), type = c("Control", "Treated", "Treated + A"))
Next change the default theme:
theme_set(theme_bw())
Now we build the plot.
Construct a base object - nothing is plotted:
g = ggplot(dd, aes(type, values))
Add on the points: adjust the default jitter and change glyph according to type:
g = g + geom_jitter(aes(pch=type), position=position_jitter(width=0.1))
Add on the "box": calculate where the box ends. In this case, I've chosen the average value. If you don't want the box, just omit this step.
g = g + stat_summary(fun = mean,
geom="bar", fill="white", colour="black")
Add on some error bars: calculate the upper/lower bounds and adjust the bar width:
g = g + stat_summary(
# 95% t-interval: the standard error is sd/sqrt(n), with n - 1 degrees of freedom
fun.max=function(i) mean(i) + qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
fun.min=function(i) mean(i) - qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
geom="errorbar", width=0.2)
Display the plot
g
In my R code above I used stat_summary to calculate the values needed on the fly. You could also create separate data frames and use geom_errorbar and geom_bar.
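As a rough sketch of that separate-data-frame route: the summary is computed up front with base R and then passed to geom_col() and geom_errorbar(). The helper summarise_group, the column names avg, lower and upper, and the fixed seed are all illustrative assumptions, and the interval is a t-based 95% interval on the mean.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
dd <- data.frame(values = runif(21),
                 type = rep(c("Control", "Treated", "Treated + A"), times = 7))

# Hypothetical helper: mean and t-based 95% confidence limits for one group
summarise_group <- function(v) {
  n <- length(v)
  half_width <- qt(0.975, df = n - 1) * sd(v) / sqrt(n)
  c(avg = mean(v), lower = mean(v) - half_width, upper = mean(v) + half_width)
}

# One row per group
stats <- do.call(rbind, lapply(split(dd$values, dd$type), summarise_group))
dd_summary <- data.frame(type = rownames(stats), stats, row.names = NULL)

# Bars and error bars come from the summary; points come from the raw data
g <- ggplot(dd, aes(type, values)) +
  geom_col(data = dd_summary, aes(y = avg), fill = "white", colour = "black") +
  geom_errorbar(data = dd_summary, aes(x = type, ymin = lower, ymax = upper),
                width = 0.2, inherit.aes = FALSE) +
  geom_jitter(aes(shape = type), width = 0.1, height = 0) +
  theme_bw()
g
```

Drawing the bars before the points keeps the white fill from hiding observations that fall below the mean.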
To use base R, have a look at my answer to this question.
If you don't mind using the ggplot2 package, there's an easy way to make similar graphics with geom_boxplot and geom_jitter. Using the mtcars example data:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter() + theme_bw()
which produces the following graphic:
The documentation can be seen here: http://had.co.nz/ggplot2/geom_boxplot.html
I recently faced the same problem and found my own solution, using ggplot2.
As an example, I created a subset of the chickwts dataset.
library(ggplot2)
library(dplyr)
data(chickwts)
Dataset <- chickwts %>%
filter(feed == "sunflower" | feed == "soybean")
Since it is not possible to change the dots to symbols in geom_dotplot(), I used geom_jitter() as follows:
Dataset %>%
ggplot(aes(feed, weight, fill = feed)) +
geom_jitter(aes(shape = feed, col = feed), size = 2.5, width = 0.1)+
stat_summary(fun = mean, geom = "crossbar", width = 0.7,
col = c("#9E0142","#3288BD")) +
scale_fill_manual(values = c("#9E0142","#3288BD")) +
scale_colour_manual(values = c("#9E0142","#3288BD")) +
theme_bw()
This is the final plot:
For more details, you can have a look at this post:
http://withheadintheclouds1.blogspot.com/2021/04/building-dot-plot-in-r-similar-to-those.html?m=1

Resources