R-ggplot plot median with ranked values - r

I'm trying to make a plot of the median of my ranked data. And under this plot I'm trying to plot the ranked value.
example data:
test=data.frame(a=rep(seq(-5,5,by=0.1),each=1,length.out=101),b=runif(101, min=-5, max=5))
test$range=rep(seq(1, 101, by=1), each=1,length.out=length(test[,1]))
So I'm trying to plot only the median.
I tried :
ggplot(data= test) + stat_summary(
mapping = aes(x = range, y = b),
fun.y = median)
But I got a Warning message : Removed 101 rows containing missing values (geom_pointrange).
I got it with this command :
ggplot(test, aes(x = range, y = b, color = b )) +
geom_line(size = 0.5) +
geom_smooth(aes(color=..y..), size=1.5, method = "loess", se=FALSE) +
scale_colour_gradient2(low = "green", mid = "yellow" , high = "red",
midpoint=median(test$b))
but it's not exactly what I want, I want only the median.
Also I want to plot the value of test$a under this plot, but I have no idea how to do this.
Thank you !

So I'm confused by some things. First, the first plot you show has b on the y-axis, yet the code implies you're plotting a on the y-axis. So do you want the median of a or b? Also, I don't understand what the range variable is supposed to do.
That said, maybe this will be of some help. I assumed the variable of interest was b. We can make something resembling your min-to-max illustration by plotting b as a function of its rank. Next we can add a horizontal line at the height of the median.
ggplot(test, aes(rank(b), b)) +
geom_line() +
geom_hline(yintercept = median(test$b))
Which gave me a plot like this:
Hope this was of some help!
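One way to also show a beneath the ranked b values, sketched under the assumption that "under this plot" means a second panel sharing the same x-axis: stack both variables into a long data frame and facet them. The names rank_b, long and med are made up for this illustration, and set.seed() is only there to make the random data repeatable.

```r
library(ggplot2)

set.seed(1)
test <- data.frame(a = seq(-5, 5, by = 0.1),
                   b = runif(101, min = -5, max = 5))
test$rank_b <- rank(test$b)  # hypothetical helper column: rank of b

# Long format: one row per (rank, variable) pair
long <- rbind(
  data.frame(rank_b = test$rank_b, variable = "b", value = test$b),
  data.frame(rank_b = test$rank_b, variable = "a", value = test$a)
)
long$variable <- factor(long$variable, levels = c("b", "a"))  # b on top, a below

# Median of b, restricted to the "b" panel
med <- data.frame(variable = factor("b", levels = c("b", "a")),
                  value = median(test$b))

p <- ggplot(long, aes(rank_b, value)) +
  geom_line() +
  geom_hline(data = med, aes(yintercept = value), linetype = "dashed") +
  facet_wrap(~ variable, ncol = 1, scales = "free_y")
```

Printing p should draw b against its rank with a dashed median line in the top panel and a at the same rank positions in the bottom panel.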

Related

Grouping 2 categorical variables with geom_boxplot

I have tried some examples I found here but I always get an error or a different graph from what I need (e.g. lines instead of the boxplot, or only 2 boxes instead of 4).
I want to plot the following
Condition  Time  mean       sem
A          I     0.5578552  0.05294356
A          II    0.6957565  0.09149457
P          I     0.7078374  0.08142464
P          II    0.7762761  0.10945771
I need "Condition" in the x axis and I need to group "Time".
The idea is to get a similar visual representation to this:
My attempt was:
ggplot(data = means.sem, aes(x = Condition, y = mean, fill = Time, ymin = mean - sem, ymax = mean + sem)) +
  geom_boxplot() +
  stat_boxplot(geom = 'errorbar', width = 0.5) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 0.85)) +
  scale_fill_manual(values = c("black", "grey")) +
  labs(y = "Mean", x = "") +
  theme_classic()
Thank you!
What do you want your y-axis to be? Assuming it is, for example, the sem variable, I would use the following code (column names are case-sensitive, so they are spelled as in your table):
boxplot <- ggplot(data = means.sem, aes(x = Condition, y = sem, fill = Time)) + geom_boxplot(position = "dodge2")
Obviously you can alter the colours, etc as you need to.
EDIT: changed the position to dodge2 as this creates a pleasing small gap between each boxplot within a group.
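Since the table in the question holds a single mean and sem per Condition/Time pair, a boxplot has no distribution to summarise. A possible alternative, sketched here as an assumption about what the asker is after rather than as the answer's method, is dodged bars with error bars. The data frame is typed in from the question; the dodge helper object is a name chosen for this example.

```r
library(ggplot2)

# Values copied from the table in the question
means.sem <- data.frame(
  Condition = c("A", "A", "P", "P"),
  Time      = c("I", "II", "I", "II"),
  mean      = c(0.5578552, 0.6957565, 0.7078374, 0.7762761),
  sem       = c(0.05294356, 0.09149457, 0.08142464, 0.10945771)
)

# Share one dodge so bars and error bars line up within each Condition
dodge <- position_dodge(width = 0.9)

p <- ggplot(means.sem, aes(x = Condition, y = mean, fill = Time)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = mean - sem, ymax = mean + sem),
                position = dodge, width = 0.3) +
  scale_fill_manual(values = c("black", "grey")) +
  labs(y = "Mean", x = "") +
  theme_classic()
```

The bars are grouped by Condition on the x-axis and split by Time through the fill aesthetic, which is the grouping the question describes.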

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I understood your question correctly, I think you want something like this.
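library(dplyr)    # %>%, group_by(), summarise()
library(ggplot2)
library(forcats)  # as_factor()
# mean ASB and mean YOI for each Race
g_Antisocial <- Antisocial %>%
  group_by(Race) %>%
  summarise(ASB = mean(ASB),
            YOI = mean(YOI))
# raw points, with the group means overlaid as larger points
Antisocial %>%
  ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
  geom_point(alpha = .4) +
  geom_point(data = g_Antisocial, size = 4) +
  theme_bw() +
  guides(color = guide_legend("Race"), shape = guide_legend("Race"))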
and this is the output:
@Maninder: there are a few things you need to look at.
First of all: the grammar of graphics behind ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
Your code does not work because you mix up the layer calls and never clearly specify which data should go into the scatter layer and which into the line layer.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting the Antisocial data set
ggplot() +
  geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set using the scatter layer, i.e. geom_point().
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I will skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before relying on aesthetic inheritance, I recommend keeping the data = and aes() calls inside each geom_xxxx(). This avoids confusion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
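A minimal sketch of the geom_jitter() idea, using a made-up stand-in for the Antisocial data (only the column names YOI, Race and ASB come from the question; the values, and the jitter widths, are assumptions for illustration):

```r
library(ggplot2)

# Hypothetical stand-in for Antisocial.csv: integer-valued ASB scores that overplot heavily
set.seed(3)
Antisocial <- data.frame(
  YOI  = rep(c(90, 92, 94), each = 40),
  Race = rep(c(0, 1), times = 60),
  ASB  = rpois(120, lambda = 1.6)
)

p <- ggplot() +
  # small random offsets in x and y so coincident points become visible
  geom_jitter(data = Antisocial,
              aes(x = YOI, y = ASB, colour = as.factor(Race)),
              width = 0.3, height = 0.15, alpha = 0.5) +
  labs(colour = "Race")
```

The width and height arguments control how far points may be moved; keeping them small avoids suggesting values that are not in the data.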
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note that for the grouped boxplot you also have to coerce your integer variable YOI into a categorical factor. Boxplots use fill for the body (colour only sets the outline). In this setup, you also need to supply a group value to geom_line() (I just set it to 1, which is arbitrary; in other contexts you can map another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!
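If "grouped by Race" instead means one mean per YOI and Race combination, a sketch could look like the following. The data frame is a made-up stand-in for Antisocial.csv (only the column names YOI, Race and ASB come from the question), and Table_2 is a hypothetical name.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for Antisocial.csv
set.seed(1)
Antisocial <- data.frame(
  YOI  = rep(c(90, 92, 94), each = 20),
  Race = rep(c(0, 1), times = 30),
  ASB  = rpois(60, lambda = 1.6)
)

# One mean per YOI and Race combination
Table_2 <- Antisocial %>%
  group_by(YOI, Race) %>%
  summarise(ASB_mean = mean(ASB), .groups = "drop")

# One line per Race; group keeps the lines separate, colour labels them
p <- ggplot(Table_2, aes(x = YOI, y = ASB_mean,
                         colour = as.factor(Race), group = Race)) +
  geom_point(size = 2) +
  geom_line() +
  labs(colour = "Race", y = "Mean ASB")
```

Compared with the question's code, the grouping happens in the summary step (group_by(YOI, Race)) and is then mapped inside aes(), rather than calling group_by() within aes().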

overlay the grand mean and se into a scatter dot

I have a dot plot created by ggplot, in which I plot every subject's individual responses. The subjects are organized into 3 groups in the plot, and I have also estimated and plotted the mean and SE for each subject. Now I want to add the grand mean and SE for each group to the same plot.
This is how I created the first plot:
mazeSRDataS1_Errorplot<-ggplot(mazeSRDataS1, aes(Errorfixed, GroupSub,
colour=as.factor(Group)))+geom_point() +
mytheme3+ ggtitle("mazeSR-S1 Error plot")+ labs(y="Subject ID", x = "Error (degrees)", colour =
"Group")+ scale_colour_manual(values = c("brown4", "slategray3", "tan1"))
mazeSRDataS1_Errorplot + stat_summary(fun = mean, position = 'dodge', shape=1, size=0.5,
colour='black') + stat_summary(fun.data = mean_cl_normal, geom = 'errorbar', colour='black')
This is how I plotted the grand mean and SE for each group (I first aggregated the data and computed the mean and SE for each group).
ggplot(meanSEErrorMazeSR1, aes(x=Error, y=Group, colour=Group)) +
geom_errorbar(aes(xmin=Error-se, xmax=Error+se), width=.1, position='dodge') +
geom_line(position='dodge') + geom_point(position='dodge')
But, how do I merge these plots and overlay the one over the other?
Thank you in advance!!
You can add y-axis positions to the aggregated data you've made to specify where on the first plot you want them plotted, and then add another geom_errorbar(data = ...) where you specify to use the aggregated data e.g.:
library(dplyr)
meanSEErrorMazeSR1 <-
  meanSEErrorMazeSR1 %>%
  # since you didn't provide a reproducible example, you'll need to figure out the best positions yourself here
  mutate(y_position = c(30, 90, 150))
mazeSRDataS1_Errorplot +
  # x = Error overrides the inherited x = Errorfixed, a column the aggregated data does not have
  geom_errorbar(data = meanSEErrorMazeSR1, aes(x = Error, y = y_position, xmin = Error - se, xmax = Error + se), width = .1)
You can toy around with different y-values to use for the positioning of the error bars. In your case, because the y-axis is discrete due to being based on Subject IDs, the y-values will correspond to the order of the subject on the plot - the y_position = c(30, 90, 150) above corresponds to the 30th, 90th, and 150th subject, respectively.
Note also that the argument position='dodge' is not needed because you're not using a group aesthetic!
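Because the question has no reproducible data, here is a self-contained sketch of the same overlay on made-up data. Everything in the data frame (six subjects, three groups, the values) is invented for illustration, agg is a hypothetical name for the aggregated table, and inherit.aes = FALSE is used here so the overlay layers do not look for the raw-data columns.

```r
library(ggplot2)

# Hypothetical data: 6 subjects in 3 groups, 5 trials each
set.seed(2)
mazeSRDataS1 <- data.frame(
  GroupSub   = factor(rep(sprintf("S%02d", 1:6), each = 5)),
  Group      = factor(rep(c("G1", "G2", "G3"), each = 10)),
  Errorfixed = rnorm(30, mean = 20, sd = 5)
)

# Grand mean and standard error per group
agg <- do.call(rbind, lapply(split(mazeSRDataS1, mazeSRDataS1$Group), function(d) {
  data.frame(Group = d$Group[1],
             Error = mean(d$Errorfixed),
             se    = sd(d$Errorfixed) / sqrt(nrow(d)))
}))
# Continuous positions on the discrete axis: midway between each group's two subjects
agg$y_position <- c(1.5, 3.5, 5.5)

p <- ggplot(mazeSRDataS1, aes(Errorfixed, GroupSub, colour = Group)) +
  geom_point() +
  geom_errorbar(data = agg,
                aes(y = y_position, xmin = Error - se, xmax = Error + se),
                width = 0.2, inherit.aes = FALSE) +
  geom_point(data = agg, aes(x = Error, y = y_position),
             colour = "black", size = 3, inherit.aes = FALSE)
```

The discrete y-axis is set up by the first layer, so later layers can place marks at numeric positions such as 1.5 to sit between two subjects.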

ggplot: why is the y-scale larger than the actual values for each response?

Likely a dumb question, but I cannot seem to find a solution: I am trying to graph a categorical variable on the x-axis (3 groups) and a continuous variable (% of 0 - 100) on the y-axis. When I do so, I have to clarify that the geom_bar is stat = "identity" or use the geom_col.
However, the values still show up at 4000 on the y-axis, even after following the comments from Y-scale issue in ggplot and from Why is the value of y bar larger than the actual range of y in stacked bar plot?.
Here is how the graph keeps coming out:
I also double checked that the x variable is a factor and the y variable is numeric. Why would this still be coming out at 4000 instead of 100, like a percentage?
EDIT:
The y-values are simply responses from participants. I have a large dataset (N = 600) and each y-value is a percentage from 0-100 given by one participant. So, in each group (N = 200 per group), I have one percentage per participant. I wanted to visually compare the three groups based on the percentages they gave.
This is the code I used to plot the graph.
df$group <- as.factor(df$group)
df$confid<- as.numeric(df$confid)
library(ggplot2)
plot <-ggplot(df, aes(group, confid))+
geom_col()+
ylab("confid %") +
xlab("group")
Are you perhaps trying to plot the mean percentage in each group? Otherwise, it is not clear how a bar plot could easily represent what you are looking for. You could perhaps add error bars to give an idea of the spread of responses.
Suppose your data looks like this:
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
confid = sample(40, 600, TRUE))
Using your plotting code, we get very similar results to yours:
library(ggplot2)
plot <-ggplot(df, aes(group, confid))+
geom_col()+
ylab("confid %") +
xlab("group")
plot
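The reason for the large y-values is that geom_col() draws one rectangle per row and stacks them, so each bar's height is the sum of the 200 responses in that group rather than a typical response. A quick check on the simulated data above (group_sums and group_means are names chosen for this example):

```r
set.seed(4)
df <- data.frame(group  = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

# Bar heights produced by geom_col(): the sum of all responses per group
group_sums <- tapply(df$confid, df$group, sum)

# What a "percentage per group" plot more likely needs: the mean response
group_means <- tapply(df$confid, df$group, mean)

print(group_sums)
print(group_means)
```

With 200 rows per group, each sum is 200 times the corresponding mean, which is how responses bounded by 100 end up in the thousands.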
However, if we use stat_summary, we can instead plot the mean and standard error for each group:
ggplot(df, aes(group, confid)) +
stat_summary(geom = "bar", fun = mean, width = 0.6,
fill = "deepskyblue", color = "gray50") +
geom_errorbar(stat = "summary", width = 0.5) +
geom_point(stat = "summary") +
ylab("confid %") +
xlab("group")

Plot median values on top of a density distribution

I'm trying to plot the median values of some data on a density distribution using the ggplot2 R library. I would like to print the median values as text on top of the density plot.
You'll see what I mean with an example (using the "diamonds" default dataframe):
I'm printing three items: the density plot itself, a vertical line showing the median price of each cut, and a text label with that value. But, as you can see, the median prices overlap on the "y" axis (this aesthetic is mandatory in the geom_text() function).
Is there any way to dynamically assign a "y" value to each median price, so as to print them at different heights? For example, at the maximum density value of each "cut".
So far I've got this
# input dataframe
dia <- diamonds
# calculate median values of each numerical variable:
library(plyr)
dia_me <- ddply(dia, .(cut), numcolwise(median))
ggplot(dia, aes(x=price, y=..density.., color = cut, fill = cut), legend=TRUE) +
labs(title="diamond price per cut") +
geom_density(alpha = 0.2) +
geom_vline(data=dia_me, aes(xintercept=price, colour=cut),
linetype="dashed", size=0.5) +
scale_x_log10() +
geom_text(data = dia_me, aes(label = price, y=1, x=price))
(I'm assigning a constant value to the y aesthetics in the geom_text function because it's mandatory)
This might be a start (though it's not very readable because of the colours). My idea was to add a y-position to the data used to draw the median lines. It's a bit arbitrary, but I wanted the y-positions to lie between 0.2 and 1 so that they fit nicely on the plot; I generated them with seq(). I then tried ordering them by median price (which didn't help much); that choice is arbitrary too.
#scatter y-pos over plot
dia_me$y_pos <- seq(0.2,1,length.out=nrow(dia_me))[order(dia_me$price,decreasing = T)]
ggplot(dia, aes(x=price, y=..density.., color = cut, fill = cut), legend=TRUE) +
labs(title="diamond price per cut") +
geom_density(alpha = 0.2) +
geom_vline(data=dia_me, aes(xintercept=price, colour=cut),
linetype="dashed", size=0.5) +
scale_x_log10() +
geom_text(data = dia_me, aes(label = price, y=y_pos, x=price))
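To place each label at the peak of its own density curve, as the question suggests, one option is to estimate the density per cut outside ggplot and take its maximum. This is only a sketch: it assumes the default bandwidth of density() is close enough to what geom_density() uses, and it computes the density on log10(price) because scale_x_log10() transforms the data before the density is estimated. The labels data frame and its y_pos column are names made up for this example.

```r
library(ggplot2)

dia <- ggplot2::diamonds

# Median price and peak height of the log10-price density, per cut
labels <- do.call(rbind, lapply(split(dia, dia$cut), function(d) {
  dens <- density(log10(d$price))
  data.frame(cut = d$cut[1],
             price = median(d$price),
             y_pos = max(dens$y))
}))

p <- ggplot(dia, aes(x = price, colour = cut, fill = cut)) +
  geom_density(alpha = 0.2) +
  geom_vline(data = labels, aes(xintercept = price, colour = cut),
             linetype = "dashed") +
  geom_text(data = labels, aes(x = price, y = y_pos, label = price),
            show.legend = FALSE) +
  scale_x_log10() +
  labs(title = "diamond price per cut")
```

Labels can still collide when two cuts have similar medians and peak heights, so some manual nudging may remain necessary.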
