Related
I am attempting to place individual points on a plot using ggplot2, however as there are many points, it is difficult to gauge how densely packed the points are. Here, there are two factors being compared against a continuous variable, and I want to change the color of the points to reflect how closely packed they are with their neighbors. I am using the geom_point function in ggplot2 to plot the points, but I don't know how to feed it the right information on color.
Here is the code I am using:
s1 = rnorm(1000, 1, 10)
s2 = rnorm(1000, 1, 10)
data = data.frame(task_number = as.factor(c(replicate(100, 1),
replicate(100, 2))),
S = c(s1, s2))
ggplot(data, aes(x = task_number, y = S)) + geom_point()
Which generates this plot:
However, I want it to look more like this image, but with one dimension rather than two (which I borrowed from this website: https://slowkow.com/notes/ggplot2-color-by-density/):
How do I change the colors of the first plot so it resembles that of the second plot?
I think the tricky thing about this is you want to show the original values, and evaluate the density at those values. I borrowed ideas from here to achieve that.
library(dplyr)
data = data %>%
group_by(task_number) %>%
# Use approxfun to interpolate the density back to
# the original points
mutate(dens = approxfun(density(S))(S))
ggplot(data, aes(x = task_number, y = S, colour = dens)) +
geom_point() +
scale_colour_viridis_c()
Result:
One could, of course come up with a meausure of proximity to neighbouring values for each value... However, wouldn't adjusting the transparency basically achieve the same goal (gauging how densely packed the points are)?
geom_point(alpha=0.03)
I've got a dataset of different energies (eV) and related counts. I changed the detection wavelength throughout the measurement which resulted in having a first column with all wavelength and than further columns. There the different rows are filled with NAs because no data was measured at the specific wavelength.
I would like to plot the spectra in R, but it doesn't work because the length of X and y values differs for each column.
It would be great, if someone could help me.
Thank you very much.
It would be better if we could work with (simulated) data you provided. Here's my attempt at trying to visualize your problem the way I see it.
library(ggplot2)
library(tidyr)
# create and fudge the data
xy <- data.frame(measurement = 1:20, red = rnorm(20), green = rnorm(20, mean = 10), uv = NA)
xy[16:20, "green"] <- NA
xy[16:20, "uv"] <- rnorm(5, mean = -3)
# flow it into "long" format
xy <- gather(xy, key = color, value = value, - measurement)
# plot
ggplot(xy, aes(x = measurement, y = value, group = color)) +
theme_bw() +
geom_line()
I started learning R for data analysis and, most importantly, for data visualisation.
Since I am still in the switching process, I am trying to reproduce the activities I was doing with Graphpad Prism or Origin Pro in R. In most of the cases everything was smooth, but I could not find a smart solution for plotting multiple y columns in a single graph.
What I usually get from the softwares I use for data visualisations look like this:
Each single black trace is a measurement, and I would like to obtain the same plot in R. In Prism or Origin, this will take a single copy-paste in a XY graph.
I exported the matrix of data (one X, which indicates the time, and multiple Y values, which are the traces you see in the image).
I imported my data in R with the following commands:
library(ggplot2) #loaded ggplot2
Data <- read.csv("Directory/File.txt", header=F, sep="") #imported data
DF <- data.frame(Data) #transformed data into data frame
If I plot my data now, I obtain a series of columns, where the first one (called V1) is the X axis and all the others (V2 to V140) are the traces I want to put on the same graph.
To plot the data, I tried different solutions:
ggplot(data=DF, aes(x=DF$V1, y=DF[V2:V140]))+geom_line()+theme_bw() #did not work
plot(DF, xy.coords(x=DF$V1, y=DF$V2:V140)) #gives me an error
plot(DF, xy.coords(x=V1, y=c(V2:V10))) #gives me an error
I tried the matplot, without success, following the EZH guide:
The code I used is the following: matplot(x=DF$V1, type="l", lty = 2:100)
The only solution I found would be to individually plot a command for each single column, but it is a crazy solution. The number of columns varies among my data, and manually enter commands for 140 columns is insane.
What would you suggest?
Thank you in advance.
Here there are also some data attached.Data: single X, multiple Y
I tried using the matplot(). I used a very sample data which has no trend at all. so th eoutput from my code shall look terrible, but my main focus is on the code. Since you have already tried matplot() ,just recheck with below solution if you had done it right!
set.seed(100)
df = matrix(sample(1:685765,50000,replace = T),ncol = 100)
colnames(df)=c("x",paste0("y", 1:99))
dt=as.data.frame(df)
matplot(dt[["x"]], y = dt[,c(paste0("y",1:99))], type = "l")
If you want to plot in base R, you have to make a plot and add lines one at a time, however that isn't hard to do.
we start by making some sample data. Since the data in the link seemed to all be on the same scale, I will assume your data frame only has y values and the x value is stored separately.
plotData <- as.data.frame(matrix(sort(rnorm(500)),ncol = 5))
xval <- sort(sample(200, 100))
Now we can initialize a plot with the first column.
plot(xval, plotData[[1]], type = "l",
ylim = c(min(plotData), max(plotData)))
type = "l" makes a line plot instead of a scatter plot
ylim = c(min(plotData), max(plotData)) makes sure the y-axis will fit all the data.
Now we can add the rest of the values.
apply(plotData[-1], 2, lines, x = xval)
plotData[-1] removes the column we already plotted,
apply function with 2 as the second parameter means we want to execute a function on every column,
lines defines the function we are applying to the columns. lines adds a new line to the current plot.
x = xval passes an extra parameter (x) to the lines function.
if you wat to plot the data using ggplot2, the data should be transformed to long format;
library(ggplot2)
library(reshape2)
dat <- read.delim('AP.txt', header = F)
# plotting only first 9 traces
# my rstudio will crach if I plot the full data;
df <- melt(dat[1:10], id.vars = 'V1')
ggplot(df, aes(x = V1, y = value, color = variable)) + geom_line()
# if you want all traces to be in same colour, you can use
ggplot(df, aes(x = V1, y = value, group = variable)) + geom_line()
I have data in R with overlapping points.
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
plot(x,y)
How can I plot these points so that the points that are overlapped are proportionally larger than the points that are not. For example, if 3 points lie at (4,5), then the dot at position (4,5) should be three times as large as a dot with only one point.
Here's one way using ggplot2:
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
df <- data.frame(x = x,y = y)
ggplot(data = df,aes(x = x,y = y)) + stat_sum()
By default, stat_sum uses the proportion of instances. You can use raw counts instead by doing something like:
ggplot(data = df,aes(x = x,y = y)) + stat_sum(aes(size = ..n..))
Here's a simpler (I think) solution:
x <- c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y <- c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
size <- sapply(1:length(x), function(i) { sum(x==x[i] & y==y[i]) })
plot(x,y, cex=size)
## Tabulate the number of occurrences of each cooordinate
df <- data.frame(x, y)
df2 <- cbind(unique(df), value = with(df, tapply(x, paste(x,y), length)))
## Use cex to set point size to some function of coordinate count
## (By using sqrt(value), the _area_ of each point will be proportional
## to the number of observations it represents)
plot(y ~ x, cex = sqrt(value), data = df2, pch = 16)
You didn't really ask for this approach but alpha may be another way to address this:
library(ggplot2)
ggplot(data.frame(x=x, y=y), aes(x, y)) + geom_point(alpha=.3, size = 3)
You need to add the parameter cex to your plot function. First what I would do is use the function as.data.frame and table to reduce your data to unique (x,y) pairs and their frequencies:
new.data = as.data.frame(table(x,y))
new.data = new.data[new.data$Freq != 0,] # Remove points with zero frequency
The only downside to this is that it converts numeric data to factors. So convert back to numeric, and plot!
plot(as.numeric(new.data$x), as.numeric(new.data$y), cex = as.numeric(new.data$Freq))
You may also want to try sunflowerplot.
sunflowerplot(x,y)
Let me propose alternatives to adjusting the size of the points. One of the drawbacks of using size (radius? area?) is that the reader's evaluation of spot size vs. the underlying numeric value is subjective.
So, option 1: plot each point with transparency --- ninja'd by Tyler!
option 2: use jitter to push your data around slightly so the plotted points don't overlap.
A solution using lattice and table ( similar to #R_User but no need to remove 0 since lattice do the job)
dt <- as.data.frame(table(x,y))
xyplot(dt$y~dt$x, cex = dt$Freq^2, col =dt$Freq)
I am a ggplot2 newbie. I am making a scatter plot where the points are colored based on a third continuous variable. However, for some of the points, that continuous variable has either an Inf value or a NaN. How can I generate a continuous scale that has a special, separate color for Inf and another separate color for NaN?
One way to get this behavior is to subset the data, and make a separate layer for the special points, where the color is set. But I'd like the special colors to enter the legend as well, and think it would be cleaner to eliminate the need to subset the data.
Thanks!
Uri
I'm sure this can be made more efficient, but here's one approach. Essentially, we follow your advice of subsetting the data into the different parts, divide the continuous data into discrete bins, then patch everything back together and use a scale of our own choosing.
library(ggplot2)
library(RColorBrewer)
#Sample data
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
dat[sample(nrow(dat), 5), 3] <- NA
dat[sample(nrow(dat), 5), 3] <- Inf
#Subset out the real values
dat.good <- dat[!(is.na(dat$z)) & is.finite(dat$z) ,]
#Create 6 breaks for them
dat.good$col <- cut(dat.good$z, 6)
#Grab the bad ones
dat.bad <- dat[is.na(dat$z) | is.infinite(dat$z) ,]
dat.bad$col <- as.character(dat.bad$z)
#Rbind them back together
dat.plot <- rbind(dat.good, dat.bad)
#Make your own scale with RColorBrewer
yourScale <- c(brewer.pal(6, "Blues"), "red","green")
ggplot(dat.plot, aes(x,y, colour = col)) +
geom_point() +
scale_colour_manual("Intensity", values = yourScale)