R: why is my ggplot geom_point() symbol not visible?

I am trying to place a symbol on the lowest point in a certain time series, which I have plotted with ggplot's geom_line. However, the geom_point is not showing up on the plot. I have myself successfully used geom_point for this kind of thing before by following hadley's example here (search for 'highest <- subset' to get the relevant assignment) so I know very well that it can be done. I'm just at a loss to spot what I have done differently here that is causing it not to display. I'm guessing it's something straightforward like a missing argument or similar - easy points for a pair of fresh eyes, I think.
Minimal example follows:
require(ggplot2)
fstartdate <- as.Date('2009-06-01')
set.seed(12345)
x <- data.frame(mydate=seq(as.Date("2003-06-01"), by="month", length.out=103),myval=runif(103, min=180, max=800))
lowest <- subset(x, myval == min(x[x$mydate >= fstartdate,]$myval))
thisplot <- ggplot() +
geom_line(data = x, aes(mydate, myval), colour = "blue", size = 0.7) +
geom_point(data = lowest, size = 5, colour = "red")
print(thisplot)

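The geom_point() layer has no x/y mapping: the aesthetics given inside geom_line() belong to that layer only, and ggplot() was called without any default mapping to inherit. The point appears if you add the aesthetic: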
thisplot + geom_point(
data = lowest,
aes(mydate, myval),
size = 5, colour = "red"
)
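An alternative, sketched here under the assumption that both layers should share the same x and y variables, is to declare the data and mapping once in ggplot() so that every layer inherits them. The name p_inherit is only illustrative.

```r
library(ggplot2)

set.seed(12345)
fstartdate <- as.Date("2009-06-01")
x <- data.frame(
  mydate = seq(as.Date("2003-06-01"), by = "month", length.out = 103),
  myval  = runif(103, min = 180, max = 800)
)
# Row(s) holding the minimum value on or after fstartdate
lowest <- subset(x, myval == min(x[x$mydate >= fstartdate, ]$myval))

# The mapping is declared once here, so geom_point() inherits x and y
p_inherit <- ggplot(x, aes(mydate, myval)) +
  geom_line(colour = "blue") +
  geom_point(data = lowest, size = 5, colour = "red")
print(p_inherit)
```

Only the data differs for the point layer; the mapping comes from the top-level call.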

Related

How do you fix warning message: colourbar guide needs continuous scales?

I would like to produce multiple contour plots using ggplot2 and
geom_contour_filled()
but the range of the z values is too large. To give you an idea, they range from -2.71 to -157.28. So I thought I should change the breaks so that they cover all of these values.
The code below is not the data I work with, but it should represent the problem I have:
The data
h_axis <- 10^(seq(log10(0.1), log10(1000),
length.out = 20))
a_axis <- 10^(seq(log10(0.1), log10(1000),
length.out = 20))
comb <- expand.grid(h_axis, a_axis)
h_val <- comb$Var2
a_val <- comb$Var1
values <- seq(-2, -150, length.out = 400)
dt <- data.frame(h = h_val, a = a_val, values)
First, let's say I don't change the breaks. Then, using this code
ggplot(dt, aes(x = log10(h_val), y = log10(a_val), z = values)) +
geom_contour_filled() +
# geom_contour(color = "black", size = 0.1) +
xlab(expression(log[10](h))) +
ylab(expression(log[10](a))) +
guides(fill = guide_colorbar(title = expression('E ||'*g - hat(g)*'||'[2]*'')))
will produce the following figure:
So a lot of the area will be covered by the same colour, which is a problem since my data consists of multiple factors. Factor 1 is covered by the yellow, Factor 2 is covered by the green, and so on.
My second approach is to add
bar <- 10^(seq(log10(-min(values)), log10(-max(values)),
length.out = 100))
and put bar in the geom_contour_filled() like this
geom_contour_filled(breaks = -bar)
Then I get
which is nice! But, in both cases I get the following warning
Warning message:
colourbar guide needs continuous scales.
Also, the legend is not shown on the right side. What do I need to do to fix the warning and how can I make sure that the legend is shown?
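Try guide_legend instead of guide_colorbar. geom_contour_filled() bins the z values into discrete bands, so the fill scale is discrete, and a colourbar can only be drawn for a continuous scale.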
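A minimal sketch of that change, rebuilding the example data from the question. The column names in expand.grid(), the object name p_contour, and the plain-text legend title are my own choices for illustration.

```r
library(ggplot2)

# Same grid as in the question, with named columns
axis_vals <- 10^seq(log10(0.1), log10(1000), length.out = 20)
comb <- expand.grid(a = axis_vals, h = axis_vals)
dt <- data.frame(h = comb$h, a = comb$a,
                 values = seq(-2, -150, length.out = 400))

p_contour <- ggplot(dt, aes(x = log10(h), y = log10(a), z = values)) +
  geom_contour_filled() +
  xlab(expression(log[10](h))) +
  ylab(expression(log[10](a))) +
  # The fill bands are discrete, so a legend guide is used, not a colourbar
  guides(fill = guide_legend(title = "level"))
print(p_contour)
```

The same guides() call should also work together with custom breaks passed to geom_contour_filled().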

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making a mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I have understood your question correctly, I think you want something like this.
library(dplyr)    # %>%, group_by(), summarise()
library(forcats)  # as_factor()
library(ggplot2)
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
@Maninder: there are a few things you need to look at.
First of all, ggplot's grammar of graphics works with layers. You can add layers with different data frames for the different geoms you want to plot.
Your code does not work because it mixes up the layer calls and never clearly specifies which data and aesthetics belong to the scatter layer and which to the line layer.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting the Antisocial data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set using the scatter layer, i.e. geom_point().
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I will skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend keeping the data= and aes() calls inside each geom_xxxx. This avoids confusion.
You may want to experiment with geom_jitter() instead of geom_point() to get a better presentation of your dataset. The "few" points plotted are the result of many data points sharing the same position (and being overplotted).
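To illustrate the jitter suggestion, here is a small sketch. The data frame toy is a made-up stand-in for the Antisocial data (the real file is not reproduced here), and the width, height and alpha values are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the Antisocial data: many rows share the same
# (YOI, ASB) position, which is what causes the overplotting
toy <- data.frame(
  YOI  = rep(c(90, 92, 94), each = 50),
  ASB  = sample(0:6, 150, replace = TRUE),
  Race = rep(0:1, times = 75)
)

p_jitter <- ggplot() +
  geom_jitter(data = toy,
              aes(x = YOI, y = ASB, colour = as.factor(Race)),
              width = 0.3, height = 0.2, alpha = 0.5) +
  labs(colour = "Race")
print(p_jitter)
```

Each observation is nudged by a small random offset, so stacked points become visible as a cloud around their true position.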
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully understand what you mean by that.
I take it that the mean ASB you calculated over the whole population (your Table_1) is your reference, and you would like to see how the Race groups compare with this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and changing its linetype. You can play around with the colours, etc. to get the look you are aiming for.
Please note that for the grouped boxplot you also have to treat your integer variable YOI as categorical, so I coerced it into a factor. geom_boxplot() uses fill for the body (colour sets only the outline). In this setup you also need to supply a group value to geom_line() (I just set it to 1, which is arbitrary; in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
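If the goal is instead one mean line per Race, a possible sketch is to compute the mean for every YOI and Race combination and map Race to colour. This is only one reading of the question; the data frame toy is again a made-up stand-in, and aggregate() from base R is used here in place of the ddply() call in the question.

```r
library(ggplot2)

set.seed(2)
# Hypothetical stand-in for the Antisocial data
toy <- data.frame(
  YOI  = rep(c(90, 92, 94), each = 40),
  Race = rep(0:1, times = 60),
  ASB  = rpois(120, lambda = 1.6)
)

# One mean per (YOI, Race) combination
means <- aggregate(ASB ~ YOI + Race, data = toy, FUN = mean)

p_means <- ggplot(means, aes(x = YOI, y = ASB,
                             colour = as.factor(Race), group = Race)) +
  geom_line() +
  geom_point(size = 2) +
  labs(y = "mean ASB", colour = "Race")
print(p_means)
```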
Hope this gets you going!

feature visualization in tsne plot

I have a table (table 1) in which each row is the feature vector of a gene in a particular patient. The patient IDs are in the first column (label) and the gene index is in the second column (geneIndex). The remaining columns hold the feature values (128 dimensions overall).
I was able to perform the tsne reduction on these data to 2D and label clusters according to patient IDs. Here is the code:
library(Rtsne)
experiment<- read.table("test.txt", header=TRUE, sep= "\t")
metadata <- data.frame(sample_id = rownames(experiment),
colour = experiment$label)
data <- as.matrix(experiment[,2:129])
set.seed(1)
tsne <- Rtsne(data)
df <- data.frame(x = tsne$Y[,1],
y = tsne$Y[,2],
colour = metadata$colour)
library(ggplot2)
ggplot(df, aes(x, y, colour = colour)) +
geom_point()
However, my goal is to visualize feature vectors related to geneIndex. For example, I would like to pinpoint geneIndex "3" in red color, while the rest of the points on the plot will have grey color.
I would appreciate any suggestions!
Thank you!
Looking at the data, there are not many 3's, so if you simply plot the other points in a transparent grey and the selected ones in red, I think they are hard to see:
df$geneIndex = experiment$geneIndex
plotIndex = function(data,selectedGene){
data$Gene = ifelse(data$geneIndex == selectedGene,selectedGene,"others")
ggplot(data, aes(x, y, colour = Gene))+
geom_point(alpha=0.3,size=1)+
scale_color_manual(values=c("#FF0000E6","#BEBEBE1A"))+
theme_bw()
}
plotIndex(df,3)
Maybe try circling the points by plotting them again, in combination with a new legend:
library(ggnewscale)
plotIndex = function(data,selectedGene){
subdf = subset(data,geneIndex == selectedGene)
ggplot(data, aes(x, y, colour = colour)) +
geom_point(alpha=0.3,size=2,shape=20)+
new_scale_color()+
geom_point(data=subdf,
aes(col=factor(geneIndex)),
shape=1,stroke=0.8,size=2.1)+
scale_color_manual("geneIndex",values="red")+
theme_bw()
}
plotIndex(df,3)
You can drop the ggnewscale library if you don't need a legend. This package might be able to do the above too; you would need to check.
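A sketch of the no-legend variant mentioned above: draw the other points in grey first, then the selected gene on top so it is not hidden. The coordinates in toy are random stand-ins for the t-SNE output, and highlight_gene() is a hypothetical helper name.

```r
library(ggplot2)

set.seed(1)
# Random stand-in for the t-SNE coordinates plus a gene index column
toy <- data.frame(
  x = rnorm(200),
  y = rnorm(200),
  geneIndex = sample(1:20, 200, replace = TRUE)
)

highlight_gene <- function(data, selected_gene) {
  is_sel <- data$geneIndex == selected_gene
  ggplot(data, aes(x, y)) +
    # background points first ...
    geom_point(data = data[!is_sel, ], colour = "grey80", size = 1) +
    # ... then the selected gene, drawn last so it sits on top
    geom_point(data = data[is_sel, ], colour = "red", size = 2) +
    theme_bw()
}

p_gene <- highlight_gene(toy, 3)
print(p_gene)
```

Because the colours are fixed constants rather than mapped aesthetics, no legend is produced.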

Highlight specific dot in ggplot2

I am trying to plot a scatterplot using ggplot2 in R. I have data as follows in csv format
A B
-4.051587034 -2.388276692
-4.389339837 -3.742321425
-4.047207557 -3.460923901
-4.458420756 -2.462180905
-2.12090412 -2.251811973
I want to highlight the two specific dots that correspond to -2.462180905 and -3.742321425 in the plot, using colours different from the plot's default colours. I tried the following code
library(ggplot2)
library(reshape2)
library(methods)
library(RSvgDevice)
Data<-read.csv("table.csv",header=TRUE,sep=",")
data1<-Data[,-3]
plot2<-ggplot(data1,aes(x = A, y = B)) + geom_point(aes(size=2,color=ifelse(y=-2.462180905,'red')))
graph<-plot2 + theme_bw()+opts(axis.line = theme_segment(colour = "black"),panel.grid.major=theme_blank(),panel.grid.minor=theme_blank(),panel.border = theme_blank())
ggsave(graph,file="figure.svg",height=6,width=7)
It is not working the way I want. It draws all dots in the same colour. Can anybody help?
Another way, which may be more or less efficient depending on your requirements, would be to add another geom_point():
x <- c(-4.051587034, -4.389339837, -4.047207557, -4.458420756, -2.12090412)
y <- c(-2.388276692, -3.742321425, -3.460923901, -2.462180905, -2.251811973)
d <- data.frame(x, y)
require("ggplot2")
h <- c(2, 4) # put row numbers in here or use condition
ggplot() +
geom_point(data = d, aes(x, y), colour = "red", size = 5) +
geom_point(data = d[h, ], aes(x, y), colour = "blue", size = 5)
# notice the colour is outside the aesthetic arguments
Which gives you this:
Add a different column with the same value for all points except the highlighted point, assign the colour aesthetic to that column, then change the colours manually.
data1$highlight <- data1$B == -2.462180905 # FALSE except for the one you want
ggplot(data1, aes(x = A, y = B)) +
geom_point(aes(colour = highlight), size = 2) +
scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red"))
Note that the condition in the first line will have to be exact in order to get TRUE at the right row. Either ensure the value is exact or use a condition that will match the desired row.
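Also note that opts() is deprecated (and was removed in later ggplot2 releases). Use theme() instead, with element_line() and element_blank() in place of theme_segment() and theme_blank(). But that's another question.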
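Since exact floating-point comparison is fragile, one possible sketch is to match against the target values within a small tolerance, which also handles both points from the question at once. The tolerance tol and the names targets and p_highlight are assumptions for illustration; the theme() call is a rough equivalent of the opts() settings in the question.

```r
library(ggplot2)

data1 <- data.frame(
  A = c(-4.051587034, -4.389339837, -4.047207557, -4.458420756, -2.12090412),
  B = c(-2.388276692, -3.742321425, -3.460923901, -2.462180905, -2.251811973)
)

targets <- c(-2.462180905, -3.742321425)  # values of B to highlight
tol <- 1e-8                               # assumed tolerance; adjust to your data

# TRUE where B is within tol of any target value
data1$highlight <- sapply(data1$B, function(b) any(abs(b - targets) < tol))

p_highlight <- ggplot(data1, aes(x = A, y = B)) +
  geom_point(aes(colour = highlight), size = 2) +
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red")) +
  theme_bw() +
  theme(axis.line = element_line(colour = "black"),
        panel.grid = element_blank(),
        panel.border = element_blank())
print(p_highlight)
```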

Adding points to GGPLOT2 Histogram

I'm trying to produce a histogram that illustrates observed points (a subset) on a histogram of all observations. To make it meaningful, I need to color each point differently and place a legend on the plot. My problem is, I can't seem to get a scale to show up on the plot. Below is an example of what I've tried.
subset <-1:8
results = data.frame(x_data = rnorm(5000),TestID=1:5000)
m <- ggplot(results,aes(x=x_data))
m+stat_bin(aes(y=..density..))+
stat_density(colour="blue", fill=NA)+
geom_point(data = results[results$TestID %in% subset,],
aes(x = x_data, y = 0),
colour = as.factor(results$TestID[results$TestID %in% subset]),
size = 5)+
scale_colour_brewer(type="seq", palette=3)
Ideally, I'd like the points to be positioned on the density line (but I'm really unsure of how to make that work, so I'll settle for positioning them at y = 0). What I need most urgently is a legend which indicates the TestID that corresponds to each of the points in subset.
Thanks a lot to anyone who can help.
This addresses your second point: if you want a legend, the variable has to be mapped to an aesthetic (colour in this case) inside aes() rather than set as a constant outside it. So all you really need to do is move colour = as.factor(results$TestID[results$TestID %in% subset]) inside the call to aes() like so:
ggplot(results,aes(x=x_data)) +
stat_bin(aes(y=..density..))+
stat_density(colour="blue", fill=NA)+
geom_point(data = results[results$TestID %in% subset,],
aes(x = x_data,
y = 0,
colour = as.factor(results$TestID[results$TestID %in% subset])
),
size = 5) +
scale_colour_brewer("Fancy title", type="seq", palette=3)
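For the first point (placing the points on the density line), one possible sketch is to estimate the density with base R's density() and interpolate it at the selected x values with approx(). This assumes the default bandwidth and kernel, which should match the curve drawn by the density stat closely but is not guaranteed to be identical. The names picked, dens_y and p_density are illustrative, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(42)
results <- data.frame(x_data = rnorm(5000), TestID = 1:5000)
picked  <- results[results$TestID %in% 1:8, ]

# Kernel density estimate over all observations (default bandwidth)
dens <- density(results$x_data)
# Height of the density curve at each selected point, by linear interpolation
picked$dens_y <- approx(dens$x, dens$y, xout = picked$x_data)$y

p_density <- ggplot(results, aes(x = x_data)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_line(stat = "density", colour = "blue") +
  geom_point(data = picked,
             aes(y = dens_y, colour = factor(TestID)), size = 5) +
  scale_colour_brewer("TestID", palette = "Set1")
print(p_density)
```

Mapping factor(TestID) to colour inside aes() is what produces the legend; the y values only move the points up onto the curve.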
