I have a table (table 1) where each row is the feature vector of a gene in a particular patient. The patient IDs are in the first column (label) and the gene index is in the second column (geneIndex). The remaining columns hold the feature values (128 dimensions in total).
I was able to reduce these data to 2D with t-SNE and color the points by patient ID. Here is the code:
library(Rtsne)
experiment<- read.table("test.txt", header=TRUE, sep= "\t")
metadata <- data.frame(sample_id = rownames(experiment),
colour = experiment$label)
# drop the label and geneIndex columns, keeping only the 128 feature columns
data <- as.matrix(experiment[, -(1:2)])
set.seed(1)
tsne <- Rtsne(data)
df <- data.frame(x = tsne$Y[,1],
y = tsne$Y[,2],
colour = metadata$colour)
library(ggplot2)
ggplot(df, aes(x, y, colour = colour)) +
geom_point()
However, my goal is to visualize the feature vectors by geneIndex. For example, I would like to highlight geneIndex 3 in red, while the rest of the points on the plot are grey.
I would appreciate any suggestions!
Thank you!
Looking at the data, there don't seem to be many 3s, so if you just plot the others in transparent grey and the selected gene in red, I think it's hard to see:
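df$geneIndex = experiment$geneIndex
plotIndex = function(data,selectedGene){
  # label points of the selected gene by its index, everything else "others"
  data$Gene = ifelse(data$geneIndex == selectedGene, as.character(selectedGene), "others")
  # put "others" first so the selected points are drawn on top
  data = data[order(data$Gene != "others"), ]
  # no alpha= here: it would override the alpha already encoded in the hex colours
  ggplot(data, aes(x, y, colour = Gene))+
    geom_point(size=1)+
    scale_color_manual(values=setNames(c("#FF0000E6","#BEBEBE1A"),
                                       c(as.character(selectedGene), "others")))+
    theme_bw()
}
plotIndex(df,3)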
Maybe try circling the selected points by plotting them a second time on top, combined with a new legend:
library(ggnewscale)
plotIndex = function(data,selectedGene){
subdf = subset(data,geneIndex == selectedGene)
ggplot(data, aes(x, y, colour = colour)) +
geom_point(alpha=0.3,size=2,shape=20)+
new_scale_color()+
geom_point(data=subdf,
aes(col=factor(geneIndex)),
shape=1,stroke=0.8,size=2.1)+
scale_color_manual("geneIndex",values="red")+
theme_bw()
}
plotIndex(df,3)
You can drop the ggnewscale library if you don't need a legend. There may also be a package that does the above directly, but you would need to check.
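If you don't need a legend, a minimal sketch without ggnewscale is to draw every point in grey and then redraw only the selected gene's points in red on top. The random `df` below is a hypothetical stand-in for the t-SNE output (columns `x`, `y`, `geneIndex`), and `highlight_gene()` is an illustrative helper name, not part of any package.

```r
library(ggplot2)

# Hypothetical stand-in for the t-SNE result: x, y and geneIndex per point
set.seed(1)
df <- data.frame(x = rnorm(300), y = rnorm(300),
                 geneIndex = sample(1:20, 300, replace = TRUE))

# highlight_gene() is an illustrative helper, not a package function
highlight_gene <- function(data, selectedGene) {
  hit <- data$geneIndex == selectedGene
  ggplot(data, aes(x, y)) +
    # layer 1: every point in light grey, as a backdrop
    geom_point(colour = "grey80", size = 1) +
    # layer 2: the selected gene only, drawn last so it sits on top
    geom_point(data = data[hit, ], colour = "red", size = 2) +
    theme_bw()
}

p <- highlight_gene(df, 3)
p
```

Because the colours are set as fixed parameters rather than mapped inside `aes()`, no colour scale and hence no legend is created.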
I'm trying to create a graph in R using the ggplot2 package. I can make the graph without any issues as per the simplified example below:
library(tidyverse)
A <- c(10,9,8,7,6,5,4,3,2,1,2,3,4,5,6,7,8,9,10)
B <- c(15,14,13,12,11,10,9,8,7,6,7,8,9,10,11,12,13,14,15)
C <- rep(5,19)
D <- c(1:19)
data1 <- tibble(A, B, C, D)
data1 <- gather(data1, Type, Value, -D)
plot <- ggplot(data = data1, aes(x = D, y = Value))+
theme_light()+
geom_line(aes(colour = Type), size=0.75)+
theme(legend.title = element_blank())
The original graphs look like this but they are made in pretty much the same way as in the example:
The issue is that I want to crop A (red in example; solid blue line in the original graphs) when it hits C (blue in example; solid red line in original graph) without affecting B (green in example; everything else in original graph).
The complication is that I don't really want to change the structure of my original dataset so I'm hoping that there's a way of doing this while I'm creating the plot?
Many thanks,
Carolina
Yes, you can do it within the ggplot call by passing a filtered version of the data frame to the data argument of geom_line:
ggplot(data = data1, aes(x = D, y = Value))+
theme_light() +
geom_line(aes(colour = Type),
data = data1 %>%
filter(!(Type == "A" & Value > mean(Value[Type == "C"]))),
size = 0.75)+
theme(legend.title = element_blank())
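The filter above uses a single threshold because C is constant. A minimal sketch for the case where C varies along D is to look up C at the same D before filtering; `crop_A()` and `C_value` are hypothetical names. It passes the function itself as the layer's `data` argument, which ggplot2 calls with the plot's data, so `data1` is never modified.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

A <- c(10,9,8,7,6,5,4,3,2,1,2,3,4,5,6,7,8,9,10)
B <- c(15,14,13,12,11,10,9,8,7,6,7,8,9,10,11,12,13,14,15)
C <- rep(5,19)
D <- c(1:19)
data1 <- gather(tibble(A, B, C, D), Type, Value, -D)

# crop_A() is a hypothetical helper: drop A wherever it lies above C at the same D
crop_A <- function(df) {
  c_vals <- df %>% filter(Type == "C") %>% select(D, C_value = Value)
  df %>%
    left_join(c_vals, by = "D") %>%
    filter(!(Type == "A" & Value > C_value)) %>%
    select(-C_value)
}

# data = crop_A: ggplot2 calls the function on data1 when building the layer
p <- ggplot(data1, aes(x = D, y = Value)) +
  theme_light() +
  geom_line(aes(colour = Type), data = crop_A) +
  theme(legend.title = element_blank())
p
```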
However, from looking at your original plot, I think the tidiest way to do this would be to narrow the x axis limits to "chop off" either end of the lower blue line where it meets the horizontal red line, since this will not affect the dashed upper blue line at all. You can cut off the data with lims(x = c(0.25, 12.5)) but retain the lower limit of 0 on your axis by setting coord_cartesian(xlim = c(0, 12.5)).
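A sketch of the `lims()` + `coord_cartesian()` combination on the example data follows. The limits 3 and 17 are arbitrary illustration values, and note that here, unlike in the original plot, the cut also trims B, because `lims()` sets the x-scale limits for every layer: values outside them become `NA`, while `coord_cartesian()` only changes the visible range.

```r
library(ggplot2)
library(tidyr)

A <- c(10,9,8,7,6,5,4,3,2,1,2,3,4,5,6,7,8,9,10)
B <- c(15,14,13,12,11,10,9,8,7,6,7,8,9,10,11,12,13,14,15)
C <- rep(5,19)
D <- c(1:19)
data1 <- gather(data.frame(A, B, C, D), Type, Value, -D)

p <- ggplot(data1, aes(x = D, y = Value)) +
  theme_light() +
  geom_line(aes(colour = Type)) +
  # scale limits: points with D outside [3, 17] are set to NA (dropped)
  lims(x = c(3, 17)) +
  # view limits: the axis still spans 1..19, without bringing data back
  coord_cartesian(xlim = c(1, 19)) +
  theme(legend.title = element_blank())
p
```

ggplot2 will warn about the removed rows when drawing; that is expected with this approach.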
If I have a dataframe like this:
obs<-rnorm(20)
d<-data.frame(year=2000:2019,obs=obs,pred=obs+rnorm(20,.1))
d$pup<-d$pred+.5
d$plow<-d$pred-.5
d$obs[20]<-NA
d
And I want the observation and model prediction error bars to look something like:
(p1<-ggplot(data=d)+aes(x=year)
+geom_point(aes(y=obs),color='red',shape=19)
+geom_point(aes(y=pred),color='blue',shape=3)
+geom_errorbar(aes(ymin=plow,ymax=pup))
)
How do I add a legend/scale/key identifying the red points as observations and the blue plusses with error bars as point predictions with ranges?
Here is one solution that melts pred and obs into a single column:
library(ggplot2)
obs <- rnorm(20)
d <- data.frame(dat=c(obs,obs+rnorm(20,.1)))
d$pup <- d$dat+.5
d$plow <- d$dat-.5
d$year <- rep(2000:2019,2)
d$lab <- c(rep("Obs", 20), rep("Pred", 20))
p1<-ggplot(data=d, aes(x=year)) +
geom_point(aes(y = dat, colour = factor(lab), shape = factor(lab))) +
geom_errorbar(data = d[21:40,], aes(ymin=plow,ymax=pup), colour = "blue") +
scale_shape_manual(name = "Legend Title", values=c(6,1)) +
scale_colour_manual(name = "Legend Title", values=c("red", "blue"))
p1
Here is a ggplot solution that does not require melting and grouping.
set.seed(1) # for reproducible example
obs <- rnorm(20)
d <- data.frame(year=2000:2019,obs,pred=obs+rnorm(20,.1))
d$obs[20]<-NA
library(ggplot2)
ggplot(d,aes(x=year))+
geom_point(aes(y=obs,color="obs",shape="obs"))+
geom_point(aes(y=pred,color="pred",shape="pred"))+
geom_errorbar(aes(ymin=pred-0.5,ymax=pred+0.5))+
scale_color_manual("Legend",values=c(obs="red",pred="blue"))+
scale_shape_manual("Legend",values=c(obs=19,pred=3))
This creates a color scale and a shape scale, each with two levels ("obs" and "pred"). The scale_*_manual(...) calls then set the values for those scales: ("red", "blue") for color and (19, 3) for shape. Because both scales have the same title ("Legend"), ggplot2 merges them into a single legend.
Generally, if you have only two categories, like "obs" and "pred", this is a reasonable way to use ggplot, and it avoids melting everything into one column. If you have more than two categories, or if they are integral to the dataset (e.g., actual categorical variables), you are much better off doing it as in the other answer.
Note that your example left out the column year so your code does not run.
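For the many-categories case, a minimal sketch of the long-format approach uses `tidyr::pivot_longer()` instead of building the combined data frame by hand; the seed and the choice to keep the error bars out of the legend are assumptions. Adding a third series then only means adding a column to `d`.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
obs <- rnorm(20)
d <- data.frame(year = 2000:2019, obs = obs, pred = obs + rnorm(20, .1))
d$obs[20] <- NA

# one row per (year, type) instead of one column per type
long <- pivot_longer(d, c(obs, pred), names_to = "type", values_to = "value")

p <- ggplot(long, aes(year, value, colour = type, shape = type)) +
  geom_point(na.rm = TRUE) +
  # error bars only for the predictions; kept out of the legend so the
  # "obs" key does not show a (red) error bar
  geom_errorbar(data = subset(long, type == "pred"),
                aes(ymin = value - 0.5, ymax = value + 0.5),
                show.legend = FALSE) +
  scale_colour_manual("Legend", values = c(obs = "red", pred = "blue")) +
  scale_shape_manual("Legend", values = c(obs = 19, pred = 3))
p
```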
I'm trying to produce a histogram of all observations that also shows a subset of observed points. To make it meaningful, I need to color each point differently and place a legend on the plot. My problem is that I can't get a scale to show up on the plot. Below is an example of what I've tried.
subset <-1:8
results = data.frame(x_data = rnorm(5000),TestID=1:5000)
m <- ggplot(results,aes(x=x_data))
m+stat_bin(aes(y=..density..))+
stat_density(colour="blue", fill=NA)+
geom_point(data = results[results$TestID %in% subset,],
aes(x = x_data, y = 0),
colour = as.factor(results$TestID[results$TestID %in% subset]),
size = 5)+
scale_colour_brewer(type="seq", palette=3)
Ideally, I'd like the points to be positioned on the density line (but I'm really unsure how to make that work, so I'll settle for positioning them at y = 0). What I need most urgently is a legend that indicates the TestID corresponding to each of the points in subset.
Thanks a lot to anyone who can help.
This addresses your second point: if you want a legend, you need to map the variable to an aesthetic (colour in this case) inside aes(). So all you really need to do is move the colour mapping inside the call to aes(). Because the point layer's data is already the subset, you can refer to the TestID column directly:
library(ggplot2)
ggplot(results, aes(x = x_data)) +
  stat_bin(aes(y = after_stat(density))) +
  stat_density(colour = "blue", fill = NA) +
  geom_point(data = results[results$TestID %in% subset, ],
             # mapped inside aes(), so it gets a colour scale and a legend
             aes(x = x_data, y = 0, colour = factor(TestID)),
             size = 5) +
  scale_colour_brewer("Fancy title", type = "seq", palette = 3)
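For the first point (putting the markers on the density curve), a minimal sketch is to estimate the density once with `stats::density()`, draw that curve yourself, and use `approx()` to read its height at each highlighted x. The seed, the name `subset_ids` (chosen so it doesn't mask `base::subset`) and `bins = 30` are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
subset_ids <- 1:8
results <- data.frame(x_data = rnorm(5000), TestID = 1:5000)
pts <- results[results$TestID %in% subset_ids, ]

# estimate the density once, then interpolate its height at each chosen x
dens <- density(results$x_data)
dens_df <- data.frame(x_data = dens$x, y = dens$y)
pts$y <- approx(dens$x, dens$y, xout = pts$x_data)$y

p <- ggplot(results, aes(x = x_data)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  # draw the same curve the points were placed on
  geom_line(data = dens_df, aes(y = y), colour = "blue") +
  geom_point(data = pts, aes(y = y, colour = factor(TestID)), size = 5) +
  scale_colour_brewer("TestID", type = "seq", palette = 3)
p
```

Drawing the curve from `dens` rather than with `stat_density()` guarantees the points sit exactly on the line that is shown.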
I am trying to place a symbol on the lowest point in a certain time series, which I have plotted with ggplot's geom_line. However, the geom_point is not showing up on the plot. I have successfully used geom_point for this kind of thing before by following Hadley's example here (search for 'highest <- subset' to find the relevant assignment), so I know it can be done. I just can't spot what I have done differently this time that stops it from displaying. I'm guessing it's something straightforward, like a missing argument; easy points for a pair of fresh eyes, I think.
Minimal example follows:
require(ggplot2)
fstartdate <- as.Date('2009-06-01')
set.seed(12345)
x <- data.frame(mydate=seq(as.Date("2003-06-01"), by="month", length.out=103),myval=runif(103, min=180, max=800))
lowest <- subset(x, myval == min(x[x$mydate >= fstartdate,]$myval))
thisplot <- ggplot() +
geom_line(data = x, aes(mydate, myval), colour = "blue", size = 0.7) +
geom_point(data = lowest, size = 5, colour = "red")
print(thisplot)
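The point appears once you give geom_point an x/y mapping. Because ggplot() is called without aes(), there is no plot-level mapping for the point layer to inherit; the aes() inside geom_line applies to that layer only: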
thisplot + geom_point(
data = lowest,
aes(mydate, myval),
size = 5, colour = "red"
)
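Alternatively, declare the mapping once in `ggplot()` so that every layer inherits it. This sketch also swaps the `subset()`/`min()` lookup for `which.min()` on the rows from `fstartdate` onward, which guarantees the marked row lies in that window; that swap (and the helper name `after`) is an illustrative choice, not from the question.

```r
library(ggplot2)

fstartdate <- as.Date("2009-06-01")
set.seed(12345)
x <- data.frame(mydate = seq(as.Date("2003-06-01"), by = "month", length.out = 103),
                myval = runif(103, min = 180, max = 800))

# restrict to dates on/after fstartdate, then take the row with the minimum
after <- x[x$mydate >= fstartdate, ]
lowest <- after[which.min(after$myval), ]

# mapping declared once; geom_line and geom_point both inherit mydate/myval
thisplot <- ggplot(x, aes(mydate, myval)) +
  geom_line(colour = "blue") +
  geom_point(data = lowest, size = 5, colour = "red")
thisplot
```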
I have a data frame that contains x and y coordinates for a random walk that moves in discrete steps (1 step up, down, left, or right). I'd like to plot the path, i.e. the points connected by a line. This is easy, of course. The difficulty is that the path crosses over itself and becomes difficult to interpret. I added jitter to the points to avoid overplotting, but it doesn't help distinguish the ordering of the walk.
I'd like to connect the points using a line that changes color over "time" (steps) according to a thermometer-like color scale.
My random walk is stored in its own class and I'm writing a specific plot method for it, so if you have suggestions for how I can do this using plot, that would be great. Thanks!
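This is pretty easy to do in ggplot2:
library(ggplot2)
so <- data.frame(x = 1:10, y = 1:10, col = 1:10)
# geom_path() joins points in row (time) order; geom_line() would re-sort
# them by x, which scrambles a walk that doubles back on itself
ggplot(so, aes(x = x, y = y)) +
  geom_path(aes(group = 1, colour = col))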
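Applied to a 2D lattice walk like the one in the question, a minimal sketch looks like this. The walk is simulated here (seed, length and the blue-to-red "thermometer" colours are assumptions); `geom_path()` colours each segment by its starting point, so the colour runs from the start of the walk to its end.

```r
library(ggplot2)

# hypothetical 2D lattice walk: each step is one unit up, down, left or right
set.seed(1)
n <- 200
dir <- sample(1:4, n, replace = TRUE)
walk <- data.frame(
  x = cumsum(c(0, c(1, -1, 0, 0)[dir])),
  y = cumsum(c(0, c(0, 0, 1, -1)[dir])),
  step = 0:n
)

# geom_path keeps row (time) order; step drives a cold-to-hot gradient
p <- ggplot(walk, aes(x, y, colour = step)) +
  geom_path() +
  scale_colour_gradient(low = "blue", high = "red") +
  coord_equal()
p
```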
If you prefer not to use ggplot, then segments() (see ?segments) will do what you want. I'm assuming here that x and y are both functions of time, as implied in your example.
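Since the question mentions writing a `plot` method, here is a minimal base-graphics sketch with `segments()`: one segment per step, each with its own colour from a blue-to-red ramp built by `colorRampPalette()`. The simulated walk and the two ramp colours are assumptions; inside a real method you would take `x` and `y` from your walk object instead.

```r
# hypothetical 2D lattice walk standing in for the walk object's coordinates
set.seed(1)
n <- 200
dir <- sample(1:4, n, replace = TRUE)
walk <- data.frame(x = cumsum(c(0, c(1, -1, 0, 0)[dir])),
                   y = cumsum(c(0, c(0, 0, 1, -1)[dir])))

# one colour per segment, from cold (first step) to hot (last step)
seg_cols <- colorRampPalette(c("blue", "red"))(nrow(walk) - 1)

# empty plot with the right ranges, then draw each step as its own segment
plot(walk$x, walk$y, type = "n", asp = 1, xlab = "x", ylab = "y")
segments(x0 = head(walk$x, -1), y0 = head(walk$y, -1),
         x1 = tail(walk$x, -1), y1 = tail(walk$y, -1),
         col = seg_cols, lwd = 2)
```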
If you use ggplot, you can map the colour aesthetic to the step index:
library(ggplot2)
walk <-cumsum(rnorm(n=100, mean=0))
dat <- data.frame(x = seq_len(length(walk)), y = walk)
ggplot(dat, aes(x,y, colour = x)) + geom_line()