I am drawing several plots in a Shiny app that I have developed. All the data points in my raw datasets are in meters; for example, one of my raw datasets looks like this:
df <- data.frame(X = c(0.000000000000,4.99961330240E-005,9.99922660480E-005,0.000149988399072,0.00019998453209, 0.000249980665120,0.000299976798144,0.000349972931168,0.000399969064192,0.000449965197216,0.000499961330240,0.000549957463264,0.000599953596288,0.000649949729312,0.000699945862336,0.000749941995360,0.000799938128384,0.000849934261408,0.000899930394432,0.000949926527456,0.000999922660480,0.00104991879350,0.00109991492653,0.00114991105955),
Y = c(0.00120303964354,0.00119632557146,0.00119907223731,0.00120059816279,0.00119785149693,0.00119876705222,0.00119327372051,0.00118900112918,0.00118930631428,0.00119174779504,0.00119113742485,0.00119541001617,0.00119815668203,0.00119052705466,0.00119205298013,0.00118930631428,0.00119174779504,0.00119388409070,0.00118778038881,0.00122287667470,0.00122684408094,0.00122623371075,0.00122867519150,0.00122379222999))
My attempt to plot:
g <- ggplot(data = df) + theme_bw() +
geom_point(aes_string(x= df[,1], y= df[,2]), colour= "red", size = 0.1)
ggplotly(g)
And the plot looks like this:
What I want:
The data in the data file is in meters, but on the plot I need the Y-axis values shown in micrometers and the X-axis values shown in millimeters. The data frame illustrated above is just a small part of my actual data frame, which is very large.
Is there any way to do this automatically, without requiring the user to change the units manually?
In the end, I want 'Y' values to be multiplied by 10^6 and 'X' value to be multiplied by 10^3 in order to convert them into micrometers and millimeters respectively.
I got two possible answers to my question:
1st is:
g <- ggplot(data = df) + theme_bw() +
geom_point(aes_string(x= df[,1]*10^3, y= df[,2]*10^6), colour= "red", size = 0.1)
ggplotly(g)
2nd is:
M <- data.frame(x= df[,1]* 10^3, y= df[,2]* 10^6)
g <- ggplot(data = M) + theme_bw() +
geom_point(aes_string(x= M[,1], y= M[,2]), colour= "red", size = 0.1)
ggplotly(g)
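Both answers work, but the conversion can also be written inside aes() by column name, with the units stated on the axis labels. A minimal sketch, assuming the columns are named X and Y as in the example data (only the first four rows are reproduced here); the axis label text is an assumption:

```r
library(ggplot2)

# First four rows of the example data; values are in meters.
df <- data.frame(
  X = c(0.000000000000, 4.99961330240E-005, 9.99922660480E-005, 0.000149988399072),
  Y = c(0.00120303964354, 0.00119632557146, 0.00119907223731, 0.00120059816279)
)

# Convert inside aes(): X to millimeters (times 10^3), Y to micrometers (times 10^6).
g <- ggplot(df, aes(x = X * 1e3, y = Y * 1e6)) +
  theme_bw() +
  geom_point(colour = "red", size = 0.1) +
  labs(x = "X (mm)", y = "Y (micrometers)")

# Values as they will be drawn, after conversion.
plotted <- layer_data(g)
```

The resulting g can be passed to ggplotly(g) as before.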
My data consists of three numeric variables. Something like this:
set.seed(1)
df <- data.frame(x= rnorm(10000), y= rnorm(10000))
df$col= df$x + df$y + df$x*df$y
Plotting this as a heatplot looks good:
ggplot(df, aes(x, y, col= col)) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
But real variables can have some skewness or outliers, and this totally changes the plot. After setting df$col[nrow(df)] <- 100, the same ggplot code as above returns this plot:
Clearly, the problem is that this one point changes the scale and we get a plot with little information. My solution is to rank the data with rank(), which gives a reasonable color progression for any variable I've tried so far. See here:
ggplot(df, aes(x, y, col= rank(col))) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
The problem with this solution is that the new scale (2,500 to 10,000) is shown as the color label. I want the original scale to be shown as the color label (0 to 10). In other words, I want the color progression to correspond to the ranked data while the legend shows the original values; i.e. I need to somehow map the original values to the ranked color values. Is that possible? I tried changing the limits argument to limits = c(0, 10) inside scale_color_distiller(), but this does not help.
Sidenotes: I do not want to remove the outlier. Ranking works well. I want to use scale_color_distiller(). If possible, I do not want to use any packages other than ggplot2.
Rescale the rank to the range of your original df$col.
library(tidyverse)
set.seed(1)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
df %>%
mutate(
col = x + y + x * y,
scaled_rank = scales::rescale(rank(col), range(col))
) %>%
ggplot(aes(x, y, col = scaled_rank)) +
geom_point(size = 2) +
scale_color_distiller(palette = "Spectral")
Created on 2021-11-17 by the reprex package (v2.0.1)
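One caveat with rescaling the ranks: the legend then spans the original range, but, as far as I can tell, intermediate legend values only line up with the data at the extremes. A possible alternative, sketched here under the assumption that col has no ties, keeps rank(col) as the color variable and relabels the legend breaks with the original values found at the same quantiles. The choice of quantiles (probs) and the rounding are illustrative.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
df$col <- df$x + df$y + df$x * df$y
df$col[nrow(df)] <- 100  # the outlier from the question

# Legend breaks are placed at rank quantiles and labelled with the
# original values found at the same quantiles.
probs <- c(0, 0.25, 0.5, 0.75, 1)
rank_breaks <- unname(quantile(rank(df$col), probs))
value_labels <- as.character(round(unname(quantile(df$col, probs)), 2))

p <- ggplot(df, aes(x, y, col = rank(col))) +
  geom_point(size = 2) +
  scale_color_distiller(
    palette = "Spectral",
    breaks = rank_breaks,
    labels = value_labels
  )
```

This uses only ggplot2 and base R; the colors still follow the ranks, and only the legend text changes.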
I've been trying to generate a Manhattan plot using ggplot, which I finally got to work. However, I cannot get the points to be colored by chromosome, despite having tried several different examples I've seen online. I'm attaching my code and the resulting plot below. Can anyone see why the code is failing to color points by chromosome?
library(tidyverse)
library(vroom)
# threshold to drop really small -log10 p values so I don't have to plot millions of uninformative points. Just setting to 0 since I'm running for a small subset
min_p <- 0.0
# reading in data to brassica_df2, converting to data frame, removing characters from AvsDD p value column, converting to numeric, filtering by AvsDD (p value)
brassica_df2 <- vroom("manhattan_practice_data.txt", col_names = c("chromosome", "position", "num_SNPs", "prop_SNPs_coverage", "min_coverage", "AvsDD", "AvsWD", "DDvsWD"))
brassica_df2 <- as.data.frame(brassica_df2)
brassica_df2$AvsDD <- gsub("1:2=","",as.character(brassica_df2$AvsDD))
brassica_df2$AvsDD <- as.numeric(brassica_df2$AvsDD)
brassica_df2 <- filter(brassica_df2, AvsDD > min_p)
# setting significance threshold
sig_cut <- -log10(1)
# setting ylim for graph
ylim <- (max(brassica_df2$AvsDD) + 2)
# setting up labels for x axis
axisdf <- as.data.frame(brassica_df2 %>% group_by(chromosome) %>% summarize(center=( max(position) + min(position) ) / 2 ))
# making manhattan plot of statistically significant SNP shifts
manhplot <- ggplot(data = filter(brassica_df2, AvsDD > sig_cut), aes(x=position, y=AvsDD), color=as.factor(chromosome)) +
geom_point(alpha = 0.8) +
scale_x_continuous(label = axisdf$chromosome, breaks= axisdf$center) +
scale_color_manual(values = rep(c("#276FBF", "#183059"), unique(length(axisdf$chromosome)))) +
geom_hline(yintercept = sig_cut, lty = 2) +
ylab("-log10 p value") +
ylim(c(0,ylim)) +
theme_classic() +
theme(legend.position = "n")
print(manhplot)
I think you just need to move your color=... argument inside the call to aes():
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD),
color=as.factor(chromosome))
becomes...
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD, color=as.factor(chromosome)))
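The reason this matters: arguments inside aes() are mapped from columns of the data, while arguments outside it are not treated as aesthetics. A small self-contained sketch with made-up data (the toy data frame and its values are hypothetical; only the column names follow the question):

```r
library(ggplot2)

# Hypothetical stand-in for brassica_df2: three chromosomes, four SNPs each.
toy <- data.frame(
  chromosome = rep(c("C1", "C2", "C3"), each = 4),
  position   = 1:12,
  AvsDD      = c(1.2, 0.4, 2.1, 0.9, 1.7, 0.3, 0.8, 2.5, 1.1, 0.6, 1.9, 0.2)
)
n_chr <- length(unique(toy$chromosome))

# color is mapped inside aes(), so the points are coloured by chromosome.
p <- ggplot(toy, aes(x = position, y = AvsDD, color = as.factor(chromosome))) +
  geom_point(alpha = 0.8) +
  # Alternate the two colours from the question across chromosomes.
  scale_color_manual(values = rep(c("#276FBF", "#183059"), length.out = n_chr)) +
  theme_classic() +
  theme(legend.position = "none")

# Colours actually assigned to the points.
point_colours <- layer_data(p)$colour
```

Here rep(..., length.out = n_chr) supplies exactly one colour per chromosome, alternating between the two shades.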
Suppose I make a violin plot, with say 10 violins, using the following code:
library(ggplot2)
library(reshape2)
df <- melt(data.frame(matrix(rnorm(500),ncol=10)))
p <- ggplot(df, aes(x = variable, y = value)) +
geom_violin()
p
I can add a dot representing the mean of each variable as follows:
p + stat_summary(fun.y=mean, geom="point", size=2, color="red")
How can I do something similar but for arbitrary points?
For example, if I generate 10 new points, one drawn from each distribution, how could I plot those as dots on the violins?
You can pass any function to stat_summary, provided it returns a single value, so the function sample works here. Put extra arguments, such as size, in fun.args:
p + stat_summary(fun.y = "sample", geom = "point", fun.args = list(size = 1))
Assuming your points are qualified using the same group names (i.e., variable), you should be able to define them manually with:
library(dplyr)  # provides group_by(), sample_n() and %>%
newdf <- group_by(df, variable) %>% sample_n(10)  # 10 random rows per violin
p + geom_point(data=newdf)
The points can be anything, including static numbers:
newdf <- data.frame(variable = unique(df$variable), value = seq(-2, 2, len=10))
p + geom_point(data=newdf)
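If dplyr is not available, the same idea can be sketched in base R: pick one row per group and pass the result to geom_point(). The data here is generated directly in long format rather than with melt(), and the name picked and the X1 to X10 labels are placeholders.

```r
library(ggplot2)

set.seed(42)
groups <- paste0("X", 1:10)
df <- data.frame(
  variable = factor(rep(groups, each = 50), levels = groups),
  value    = rnorm(500)
)

# One randomly chosen row per violin.
picked <- do.call(
  rbind,
  lapply(split(df, df$variable), function(d) d[sample(nrow(d), 1), ])
)

p <- ggplot(df, aes(x = variable, y = value)) +
  geom_violin() +
  # The extra layer inherits the x/y mapping and only swaps the data.
  geom_point(data = picked, colour = "red", size = 2)
```

Because picked has the same column names as df, the point layer needs no mapping of its own.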
I had a similar problem. The code below illustrates the toy problem (how does one add arbitrary points to a violin plot?) and a solution.
## Visualize data set that comes in base R
head(ToothGrowth)
## Make a violin plot with dose variable on x-axis, len variable on y-axis
# Convert dose variable to factor - Important!
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1)
# Suppose you want to add 3 blue points
# [0.5, 10], [1,20], [2, 30] to the plot.
# Make a new data frame with these points
# and add them to the plot with geom_point().
TrueVals <- ToothGrowth[1:3,]
TrueVals$len <- c(10,20,30)
# Make dose variable a factor - Important for positioning points correctly!
TrueVals$dose <- as.factor(c(0.5, 1, 2))
# Plot with 3 added blue points
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1) +
geom_point(data = TrueVals, color = "blue")
I am using hexbin() to bin data into hexagon objects, and ggplot() to plot the results. I notice that, sometimes, the binning data frame contains a different number of hexagons than the plot that results from plotting that same binning data frame. Below is an example.
library(hexbin)
library(ggplot2)
set.seed(1)
data <- data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100), D=rnorm(100), E=rnorm(100))
maxVal = max(abs(data))
maxRange = c(-1*maxVal, maxVal)
x = data[,c("A")]
y = data[,c("E")]
h <- hexbin(x=x, y=y, xbins=5, shape=1, IDs=TRUE, xbnds=maxRange, ybnds=maxRange)
hexdf <- data.frame(hcell2xy(h), hexID = h@cell, counts = h@count)
# Both objects below indicate there are 17 hexagons
# hexdf
# table(h@cID)
# However, plotting only shows 16 hexagons
ggplot(hexdf, aes(x=x, y=y, fill = counts, hexID=hexID)) + geom_hex(stat="identity") + scale_x_continuous(limits = maxRange) + scale_y_continuous(limits = maxRange)
In this example, the hexdf data frame contains 17 hexagons. However, the ggplot(hexdf) resulting plot only shows 16 hexagons, as is shown below.
Note: Syntax in the above example may seem cumbersome, but some of it is because this is a MWE for a more complex goal and I am intentionally keeping those components so that any possible solution might extend to my more complex goal. For instance, I want to maintain the capability to allow for the maxRange variable to be computed from the original data frame called data (which contains additional columns "B", "C", and "D"). At the same time, there may be parts of my syntax that are unnecessarily cumbersome and may be causing the problem - so I am happy to try to fix them to see.
Any ideas what might be causing this discrepancy and how to fix it? Thank you!
The last hexagon is missing as it's (partly) outside the limits you set. It's included if you change the limits, e.g. like so:
ggplot(hexdf, aes(x = x, y = y, fill = counts, hexID = hexID)) +
geom_hex(stat = "identity") +
scale_x_continuous(limits = maxRange * 1.5) +
scale_y_continuous(limits = maxRange * 1.5)
or by using coord_cartesian instead:
ggplot(hexdf, aes(x = x, y = y, fill = counts, hexID = hexID)) +
geom_hex(stat = "identity") +
coord_cartesian(xlim = c(maxRange[1], maxRange[2]), ylim = c(maxRange[1], maxRange[2]))
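The difference between the two fixes is that limits set on a scale discard data outside them before drawing, while coord_cartesian() only zooms the view. A minimal sketch of that behaviour, using points instead of hexagons and made-up coordinates:

```r
library(ggplot2)

# Made-up points; the last one lies outside the x limits used below.
pts <- data.frame(x = c(-1, 0, 1, 3), y = c(0, 1, 0, 1))

# Scale limits: out-of-range values are replaced by NA and later dropped.
p_scale <- ggplot(pts, aes(x, y)) +
  geom_point() +
  scale_x_continuous(limits = c(-2, 2))

# coord_cartesian: the data is untouched, only the visible window changes.
p_coord <- ggplot(pts, aes(x, y)) +
  geom_point() +
  coord_cartesian(xlim = c(-2, 2))

# Count the points that keep a usable x value in each built plot.
n_scale <- sum(!is.na(suppressWarnings(layer_data(p_scale))$x))
n_coord <- sum(!is.na(layer_data(p_coord)$x))
```

Comparing n_scale with n_coord shows whether a layer lost data to the limits, which is a quick check to run on hexdf as well.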
In the following, selecting free_y makes the maximum value of each scale adjust as expected. However, how can I get the minimum values to adjust as well? At the moment, both facets start at 0, when I really want the upper facet to start at about 99 and go to 100, and the lower facet to start at around 900 and go to 1000.
library(ggplot2)
n = 100
df = rbind(data.frame(x = 1:n,y = runif(n,min=99,max=100),variable="First"),
data.frame(x = 1:n,y = runif(n,min=900,max=1000),variable="Second"))
ggplot(data=df,aes(x,y,fill=variable)) +
geom_bar(stat='identity') +
facet_grid(variable~.,scales='free')
You could use geom_linerange rather than geom_bar. A general way to do this is to first find the min of y for each value of variable and then merge the minimums with the original data. Code would look like:
library(ggplot2)
min_y <- aggregate(y ~ variable, data=df, min)
sp <- ggplot(data=merge(df, min_y, by="variable", suffixes = c("","min")),
aes(x, colour=variable)) +
geom_linerange(aes(ymin=ymin, ymax=y), size=1.3) +
facet_grid(variable ~ .,scales='free')
plot(sp)
Plot looks like: