I have a data frame that contains x and y coordinates for a random walk that moves in discrete steps (1 step up, down, left, or right). I'd like to plot the path---the points connected by a line. This is easy, of course. The difficulty is that the path crosses over itself and becomes difficult to interpret. I add jitter to the points to avoid overplotting, but it doesn't help distinguish the ordering of the walk.
I'd like to connect the points using a line that changes color over "time" (steps) according to a thermometer-like color scale.
My random walk is stored in its own class and I'm writing a specific plot method for it, so if you have suggestions for how I can do this using plot, that would be great. Thanks!
This is pretty easy to do in ggplot2:
library(ggplot2)
so <- data.frame(x = 1:10,y = 1:10,col = 1:10)
ggplot(so,aes(x = x, y = y)) +
geom_line(aes(group = 1,colour = col))
If you prefer not to use ggplot, then ?segments will do what you want. -- I'm assuming here that x and y are both functions of time, as implied in your example.
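For a base-graphics plot method, the segments() idea might look roughly like the sketch below. The simulated walk, the object names (walk, pal) and the blue-to-red palette are assumptions made for illustration, not taken from the question.

```r
set.seed(42)
n <- 200

# Each step moves one unit up, down, left, or right
moves <- matrix(c(0, 1,  0, -1,  -1, 0,  1, 0), ncol = 2, byrow = TRUE)
steps <- moves[sample(4, n, replace = TRUE), ]
walk  <- data.frame(x = c(0, cumsum(steps[, 1])),
                    y = c(0, cumsum(steps[, 2])))

# One colour per segment, running from blue (early) to red (late)
pal <- colorRampPalette(c("blue", "red"))(n)

# Set up empty axes, then draw each step as its own coloured segment
plot(walk$x, walk$y, type = "n", xlab = "x", ylab = "y", asp = 1)
segments(x0 = head(walk$x, -1), y0 = head(walk$y, -1),
         x1 = tail(walk$x, -1), y1 = tail(walk$y, -1),
         col = pal, lwd = 2)
```

Because segments() is vectorised over its colour argument, each step gets its own colour without a loop.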
If you use ggplot, you can set the colour aesthetic:
library(ggplot2)
walk <-cumsum(rnorm(n=100, mean=0))
dat <- data.frame(x = seq_len(length(walk)), y = walk)
ggplot(dat, aes(x,y, colour = x)) + geom_line()
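One caveat for a walk that moves in both x and y: geom_line() connects observations in order of x, whereas geom_path() connects them in the order they appear in the data. A minimal sketch with a simulated 2D walk follows; the column name step and the colour choices are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
n <- 200
direction <- sample(4, n, replace = TRUE)
dx <- c(0, 0, -1, 1)[direction]   # left/right component of each step
dy <- c(1, -1, 0, 0)[direction]   # up/down component of each step

walk2d <- data.frame(step = 0:n,
                     x = c(0, cumsum(dx)),
                     y = c(0, cumsum(dy)))

# geom_path() keeps the row order, so colour tracks "time" along the walk
p <- ggplot(walk2d, aes(x, y, colour = step)) +
  geom_path() +
  scale_colour_gradient(low = "blue", high = "red") +
  coord_fixed()
print(p)
```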
I have a table 1, where each row corresponds to the feature vector of gene in particular patient. The patient IDs located in the first column (label), while gene index located in the second column (geneIndex). The rest of the columns have feature values in various dimensions (128 overall).
I was able to perform the tsne reduction on these data to 2D and label clusters according to patient IDs. Here is the code:
library(Rtsne)
experiment<- read.table("test.txt", header=TRUE, sep= "\t")
metadata <- data.frame(sample_id = rownames(experiment),
colour = experiment$label)
data <- as.matrix(experiment[,2:129])
set.seed(1)
tsne <- Rtsne(data)
df <- data.frame(x = tsne$Y[,1],
y = tsne$Y[,2],
colour = metadata$colour)
library(ggplot2)
ggplot(df, aes(x, y, colour = colour)) +
geom_point()
However, my goal is to visualize feature vectors related to geneIndex. For example, I would like to pinpoint geneIndex "3" in red color, while the rest of the points on the plot will have grey color.
I would appreciate any suggestions!
Thank you!
Looking at the data, there don't seem to be many 3s, so if you simply plot the other points in a transparent grey and the selected gene in red, I think it's hard to see:
df$geneIndex = experiment$geneIndex
plotIndex = function(data,selectedGene){
data$Gene = ifelse(data$geneIndex == selectedGene,selectedGene,"others")
ggplot(data, aes(x, y, colour = Gene))+
geom_point(alpha=0.3,size=1)+
scale_color_manual(values=c("#FF0000E6","#BEBEBE1A"))+
theme_bw()
}
plotIndex(df,3)
Maybe try circling the points by plotting them again, in combination with a new legend:
library(ggnewscale)
plotIndex = function(data,selectedGene){
subdf = subset(data,geneIndex == selectedGene)
ggplot(data, aes(x, y, colour = colour)) +
geom_point(alpha=0.3,size=2,shape=20)+
new_scale_color()+
geom_point(data=subdf,
aes(col=factor(geneIndex)),
shape=1,stroke=0.8,size=2.1)+
scale_color_manual("geneIndex",values="red")+
theme_bw()
}
plotIndex(df,3)
You can drop the ggnewscale library if you don't need a legend. This package might be able to do the above too; you'd need to check.
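If the goal is simply "selected gene in red, everything else in grey", another option is to draw the grey points first and the red ones on top, so the few highlighted points are not buried. This is only a sketch on made-up data; toy and highlight_gene are hypothetical names, and the sizes and grey shade are arbitrary.

```r
library(ggplot2)

set.seed(1)
# Toy stand-in for the t-SNE coordinates plus a geneIndex column
toy <- data.frame(x = rnorm(500),
                  y = rnorm(500),
                  geneIndex = sample(1:20, 500, replace = TRUE))

highlight_gene <- function(data, selected) {
  hit <- data$geneIndex == selected
  ggplot(mapping = aes(x, y)) +
    # background points first ...
    geom_point(data = data[!hit, ], colour = "grey80", size = 1) +
    # ... then the selected gene on top, slightly larger
    geom_point(data = data[hit, ], colour = "red", size = 2) +
    theme_bw()
}

p <- highlight_gene(toy, 3)
print(p)
```

Layers are drawn in the order they are added, which is why the red layer comes last.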
I'm currently working on a very simple data.frame, containing three columns:
x contains x-coordinates of a set of points,
y contains y-coordinates of the set of points, and
weight contains a value associated to each point;
Now, working in ggplot2, I seem to be able to plot contour levels for these data, but I can't find a way to fill the plot according to the variable weight. Here's the code that I used:
ggplot(df, aes(x,y, fill=weight)) +
geom_density_2d() +
coord_fixed(ratio = 1)
You can see that there's no filling whatsoever, sadly.
I've been trying for three days now, and I'm starting to get depressed.
Specifying fill=weight and/or color = weight in the general ggplot call, resulted in nothing. I've tried to use different geoms (tile, raster, polygon...), still nothing. Tried to specify the aes directly into the geom layer, also didn't work.
Tried to convert the object as a ppp but ggplot can't handle them, and also using base-R plotting didn't work. I have honestly no idea of what's wrong!
I'm attaching the first 10 points' data, which is spaced on an irregular grid:
x = c(-0.13397460,-0.31698730,-0.13397460,0.13397460,-0.28867513,-0.13397460,-0.31698730,-0.13397460,-0.28867513,-0.26794919)
y = c(-0.5000000,-0.6830127,-0.5000000,-0.2320508,-0.6547005,-0.5000000,-0.6830127,-0.5000000,-0.6547005,0.0000000)
weight = c(4.799250e-01,5.500250e-01,4.799250e-01,-2.130287e+12,5.798250e-01,4.799250e-01,5.500250e-01,4.799250e-01,5.798250e-01,6.618956e-01)
Any advice? The desired output would be something along these lines:
click
Thank you in advance.
From your description geom_density doesn't sound right.
You could try geom_raster:
ggplot(df, aes(x,y, fill = weight)) +
geom_raster() +
coord_fixed(ratio = 1) +
scale_fill_gradientn(colours = rev(rainbow(7))) # colourmap
Here is a second-best option using fill=..level.. (note that this colours by point density, not by weight). There is a good explanation of ..level.. here.
# load libraries
library(ggplot2)
library(RColorBrewer)
library(ggthemes)
# build your data.frame
df <- data.frame(x=x, y=y, weight=weight)
# build color Palette
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")), space="Lab")
# Plot
ggplot(df, aes(x,y, fill=..level..) ) +
stat_density_2d( bins=11, geom = "polygon") +
scale_fill_gradientn(colours = myPalette(11)) +
theme_minimal() +
coord_fixed(ratio = 1)
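Since the points sit on an irregular grid, another possibility is to bin them and colour each bin by the mean weight with stat_summary_2d(), which takes the value to summarise through the z aesthetic. The sketch below uses simulated data; pts, the bin count, and the weight function are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
# Simulated irregularly spaced points with a smooth weight surface
pts <- data.frame(x = runif(2000, -1, 1),
                  y = runif(2000, -1, 1))
pts$weight <- exp(-(pts$x^2 + pts$y^2))

# Each rectangular bin is filled with the mean weight of the points inside it
p <- ggplot(pts, aes(x, y, z = weight)) +
  stat_summary_2d(fun = mean, bins = 25) +
  scale_fill_gradientn(colours = rev(rainbow(7))) +
  coord_fixed(ratio = 1)
print(p)
```

Unlike the density-based plot, the fill here reflects weight itself; empty bins are simply left blank.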
Consider the following example of plotting 100 overlapping points:
ggplot(data.frame(x=rnorm(100), y=rnorm(100)), aes(x=x, y=y)) +
geom_point(size=100) +
xlim(-10, 10) +
ylim(-10, 10)
I now want to save the image as vector graphics, e.g. in PDF. This is not a problem with the above example, but once I've got over a million points (e.g. from a volcano plot), the file size can exceed 100 MB for one page and it takes ages to display or edit.
In the above example the same shape could still be represented by either
converting the points to a shape outline, or
keeping a couple of points and discarding the rest.
Is there any way (or preferably tool that already does this) to remove points from a plot that will never be visible? (ideally supporting transparency)
The best approach I have heard so far is to round the position of the dots and remove grid points that have > N points, then use the original positions of the remaining ones. Is there anything better?
Note that this should work with an arbitrary structure of points, and only remove those that are not visible.
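For reference, the rounding idea described above could be sketched as follows. It is only an approximation: it caps the number of points per grid cell rather than testing true visibility, and cell_size and max_per_cell are hypothetical tuning values.

```r
set.seed(123)
pts <- data.frame(x = rnorm(20000), y = rnorm(20000))

cell_size    <- 0.05  # grid resolution (assumed; tune to the plotted point size)
max_per_cell <- 3     # keep at most this many points per grid cell (assumed)

# Label each point with the grid cell it rounds to
cell <- paste(round(pts$x / cell_size), round(pts$y / cell_size))

# Rank points within their cell in order of appearance
rank_in_cell <- ave(seq_along(cell), cell, FUN = seq_along)

# Keep the original (unrounded) coordinates of the first few points per cell
thinned <- pts[rank_in_cell <= max_per_cell, ]
nrow(thinned)
```

Sparse regions keep every point, while dense regions are cut down to max_per_cell points per cell.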
You could do something with the convex hull, like this, filling in the polygon that makes up the convex hull:
library(ggplot2)
set.seed(123)
df <- data.frame(x = rnorm(100), y = rnorm(100))
idx <- chull(df)
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 100,color="darkgrey") +
geom_polygon(data=df[idx,],color="blue") +
geom_point(color = "red", size = 2) +
xlim(-10, 10) +
ylim(-10, 10)
yielding:
(Note that I pulled this chull-idea out of Hadley's "Extending ggplot2" guide https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html.)
In your case you would drop the geom_point calls and set transparency on the geom_polygon. Also not sure how fast chull is for millions of points, though it will clearly be faster than plotting them all.
And I am not really sure what you are after. If you really want the 100 pixel radius, then you could probably just do it for the points on the convex hull, plus fill in the middle with geom_polygon.
So using this code:
ggplot(df[idx,], aes(x = x, y = y)) +
geom_point(size = 100, color = "black") +
geom_polygon(fill = "black") +
xlim(-10, 10) +
ylim(-10, 10)
to make this:
I have some data I want to graph on a semi-log scale, however I get some artifacts when there is a large jump between points. On linear scale, a straight line is drawn between subsequent points, which is a fine approximation for visualization. However, the exact same thing is done when using the log scale (either by using scale_x_log10 or scale_x_continuous with a log transformation). A line between two points on the semi-log scale should show up curved. In other words, this:
df <- data.frame(x = c(0, 1), y = c(0, 1))
ggplot(data = df, aes(x, y)) + geom_line() + scale_x_log10(limits = c(10^-3, 10^0))
produces this:
when I would expect something more like this:
generated by this code:
df <- data.frame(x = seq(0, 1, 0.01), y = seq(0, 1, 0.01))
ggplot(data = df, aes(x, y)) + geom_line() + scale_x_log10(limits = c(10^-3, 10^0))
It's clear what's happening, but I'm not sure what the best way to fix the interpolation is. In the actual data I'm plotting there are a few jumps at various points, which makes the plots very misleading when trying to compare two lines. (They're ROC curves in this instance.)
One thought is I can search the data for jumps and fill in some interpolated points myself, but I'm hoping for a cleaner way that doesn't involve me adding in a bunch of fake data points.
What you describe is a transformation of the coordinate system, not a transformation of the scales. The distinction is that scale transformations take place before any statistical transformations, and coordinate transformations take place afterward. In this case, the "statistical transformation" is "draw a straight line between the points". With a transformed scale, the line is straight in the transformed (log) space; with a transformed coordinate, it is straight in the original (linear) space and therefore curved in log space.
# don't include 0 in the data because log 0 is -Inf
DF <- data.frame(x = c(0.1, 1), y = c(0.1, 1))
ggplot(data = DF, aes(x = x, y = y)) +
geom_line() +
coord_trans(x="log10")
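If you would rather keep scale_x_log10, the manual interpolation mentioned in the question can be done in one step with approx(), which interpolates linearly in the original (untransformed) space. This is a sketch with made-up points; roc and dense are hypothetical names, and n = 200 is an arbitrary density.

```r
library(ggplot2)

# A few sparse points with large jumps in x (made-up values)
roc <- data.frame(x = c(0.001, 0.01, 1),
                  y = c(0,     0.4,  1))

# Linear interpolation in the original space at 200 evenly spaced x values
dense <- as.data.frame(approx(roc$x, roc$y, n = 200))

# On the log scale the densified line now appears curved between the jumps
p <- ggplot(dense, aes(x, y)) +
  geom_line() +
  scale_x_log10(limits = c(10^-3, 10^0))
print(p)
```

The interpolated points are only used for drawing, so the original data frame stays untouched.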
I am trying to create a graph where because there are so many points on the graph, at the edges of the green it starts to fade to black while the center stays green. The code I am currently using to create this graph is:
plot(snb$px,snb$pz,col=snb$event_type,xlim=c(-2,2),ylim=c(1,6))
I looked into contour plotting but that did not work for this. The coloring variable is a factor variable.
Thanks!
This is a great problem for ggplot2.
First, read the data in:
snb <- read.csv('MLB.csv')
With your data frame you could try plotting points that are partly transparent, and setting them to be colored according to the factor event_type:
require(ggplot2)
p1 <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
geom_point(alpha = 0.5)
print(p1)
and then you get this:
Or, you might want to think about plotting this as a heatmap using geom_bin2d(), and plotting facets (subplots) for each different event_type, like this:
p2 <- ggplot(data = snb, aes(x = px, y = pz)) +
geom_bin2d(binwidth = c(0.25, 0.25)) +
facet_wrap(~ event_type)
print(p2)
which makes a plot for each level of the factor, where the color shows the number of data points in each bin, and each bin is 0.25 on each side. But if you have more than about 5 or 6 levels, this might look pretty bad. From the small data sample you supplied, I got this:
If the levels of the factors don't matter, there are some nice examples here of plots with too many points. You could also try looking at some of the examples on the ggplot website or the R cookbook.
Transparency could help, which is easily achieved, as @BenBolker points out, with adjustcolor:
colvec = adjustcolor(c("black", "green"), alpha = 0.2)
plot(snb$px, snb$pz,
col = colvec[snb$event_type],
xlim = c(-2,2),
ylim = c(1,6))
It's built in to ggplot:
require(ggplot2)
p <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
geom_point(alpha = 0.2)
print(p)