Use ggplot2 to generate histogram from each row of data properly - r

I have a set of data like below:
pos A C G T
0 0.291398 0.190061 0.315722 0.202818
1 0.315597 0.227511 0.175448 0.281445
2 0.252149 0.194597 0.222815 0.330438
Then I imported the table:
library(ggplot2)
d = read.table(tablename, sep = '\t', header = T)
d = d[2:5]
d = data.frame(t(d))
And I got a reformatted table as below:
X1 X2 X3
A 0.291398 0.315597 0.252149
C 0.190061 0.227511 0.194597
G 0.315722 0.175448 0.222815
T 0.202818 0.281445 0.330438
However, when I tried to plot it:
qplot(X1, data = d, geom = 'histogram')
It gives the image below:
What I want is something like this (I made it in LibreOffice, so the colors, widths, and other parameters do not matter):
May I know how to correct my code to make this shape?
Any help is appreciated. Sorry but I am really new to R and ggplot2.

You aren't telling the plot what to use as the y value. With only X1 mapped to x, the histogram counts how often each value of X1 occurs; every value occurs exactly once, so all bars have height 1.
You want X1 as your y and the base as your x.
To fix your plot, from d:
d$base<-rownames(d)
ggplot(d,aes(x=base,y=X1))+geom_bar(stat="identity")
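or using qplot nomenclature (qplot is deprecated in current ggplot2 and its stat argument is no longer supported, so geom = "col" is used here to draw the values as they are):
d$base<-rownames(d)
qplot(x = base, y = X1, data = d, geom = "col")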
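Edit: Here's how I would plot it for all rows. This starts from the table as read in, with the pos column still present (before d = d[2:5] and the transpose):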
library(reshape2)
d1 <- melt(d, id = "pos")
ggplot(d1, aes(x = variable, y = value, fill = factor(pos))) +
geom_bar(stat = "identity", position = "dodge")
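For anyone who wants to run this without the input file, here is a minimal self-contained sketch with the question's three rows typed inline. It swaps melt() for tidyr::pivot_longer() and geom_bar(stat = "identity") for geom_col(); the names d_long, base and freq are my own choices for illustration, not something from the original post.

```r
library(ggplot2)
library(tidyr)

# The question's table, typed inline instead of read from a file
d <- data.frame(
  pos = 0:2,
  A = c(0.291398, 0.315597, 0.252149),
  C = c(0.190061, 0.227511, 0.194597),
  G = c(0.315722, 0.175448, 0.222815),
  T = c(0.202818, 0.281445, 0.330438)
)

# Long format: one row per (pos, base) pair
# (d_long, base and freq are illustrative names)
d_long <- pivot_longer(d, cols = -pos, names_to = "base", values_to = "freq")

# One group of dodged bars per base, one bar per position
p <- ggplot(d_long, aes(x = base, y = freq, fill = factor(pos))) +
  geom_col(position = "dodge") +
  labs(fill = "pos")
p
```

geom_col() plots the values as given, which is the same thing geom_bar(stat = "identity") does.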

Related

R control jitter function - avoid overplotting / non-random jitter

My problem seems simple: I am using ggplot2 with geom_jitter() to plot a variable (take my picture as an example).
Jitter now adds some random noise to the variable (the variable is just called "1" in this example) to prevent overplotting. So I have now random noise in the y-direction and clearly what otherwise would be completely overplotted is now better visible.
But here is my question:
As you can see, there are still some points, that overplot each other. In my example here, this could be easily prevented, if it wouldn't be random noise in y-direction... but somehow more strategically placed offsets.
Can I somehow alter the geom_jitter() behavior or is there a similar function in ggplot2 that does exactly this?
Not really a minimal example, but also not too long:
library("imputeTS")
library("ggplot2")
data <- tsAirgap
# 2.1 Create required data
# Get all indices of the data that comes directly before and after an NA
na_indx_after <- which(is.na(data[1:(length(data) - 1)])) + 1
# starting from index 2 moves all indexes one in front, so no -1 needed for before
na_indx_before <- which(is.na(data[2:length(data)]))
# Get the actual values to the indices and put them in a data frame with a label
before <- data.frame(id = "1", type = "before", input = na_remove(data[na_indx_before]))
after <- data.frame(id = "1", type = "after", input = na_remove(data[na_indx_after]))
all <- data.frame(id = "1", type = "source", input = na_remove(data))
# Get n values for the plot labels
n_before <- length(before$input)
n_all <- length(all$input)
n_after <- length(after$input)
# 2.4 Create dataframe for ggplot2
# join the data together in one dataframe
df <- rbind(before, after, all)
# Create the plot
gg <- ggplot(data = df) +
geom_jitter(mapping = aes(x = id, y = input, color = type, alpha = type), width = 0.5 , height = 0.5)
gg <- gg + ggplot2::scale_color_manual(
  values = c("before" = "skyblue1", "after" = "yellowgreen", "source" = "gray66")
)
gg <- gg + ggplot2::scale_alpha_manual(
  values = c("before" = 1, "after" = 1, "source" = 0.3)
)
gg + ggplot2::theme_linedraw() + theme(aspect.ratio = 0.5) + ggplot2::coord_flip()
So many good suggestions... here is what Ben's suggestion would look like for my example:
I changed parts of my code to:
gg <- ggplot(data = df, aes(x = input, color = type, fill = type, alpha = type)) +
geom_dotplot(binwidth = 15)
This would also work as intended for me. ggbeeswarm, as suggested by Jon, also worked great for my purpose.
I thought of a hack I really like, using ggrepel. It's normally used for labels, but nothing prevents you from making the label into a point.
df <- data.frame(x = rnorm(200),
col = sample(LETTERS[1:3], 200, replace = TRUE),
y = 1)
ggplot(df, aes(x, y, label = "●", color = col)) + # using unicode black circle
ggrepel::geom_text_repel(segment.color = NA,
box.padding = 0.01, key_glyph = "point")
A downside of this method is that ggrepel can take a lot of time for a large number of points, and it will recalculate the positions differently each time you change the plot size. A faster alternative would be to use ggbeeswarm::geom_quasirandom, which uses a deterministic process to define jitter that looks random.
ggplot(df, aes(x,y, color = col)) +
ggbeeswarm::geom_quasirandom(groupOnX = FALSE)
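If you would rather not pull in an extra package, the "strategically placed offsets" idea can also be sketched by hand: bin the values, then spread the points of each bin evenly around zero. This is only a rough sketch, not what ggbeeswarm or geom_dotplot do internally; bin_width and offset are names I made up, and the bin width is an assumption you would need to tune to your point size.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x = rnorm(200),
                 col = sample(LETTERS[1:3], 200, replace = TRUE))

bin_width <- 0.2                    # assumed bin width, tune to the point size
bin <- round(df$x / bin_width)      # bin index for every point

# Within each bin, give the points evenly spaced offsets centred on 0,
# e.g. -1, 0, 1 for a bin holding three points
df$offset <- ave(df$x, bin,
                 FUN = function(v) seq_along(v) - (length(v) + 1) / 2)

p <- ggplot(df, aes(x, offset, color = col)) +
  geom_point()
p
```

Points in the same bin can never share an offset, so they no longer sit on top of each other. Points in neighbouring bins can still touch, which is the price of keeping it this simple.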

Creating and labelling points on a geom_line() graph [duplicate]

Basically, I want to make a graph that shows where different "analysts" chose a certain point on the graph.
This is what the base graph looks like.
This is what I want to produce.
I have a separate dataframe called sum_data that summarizes the time choices made by each analyst. It looks like this. The following is the code used to create the plot:
gqplot <- ggplot(Qdata,
                 aes(x = date,
                     y = cfs)) +
  labs(# title = paste(watershedID, "_", event),
       x = "Date",
       y = "Flow [cfs]") +
  geom_line(colour = "#000099")
# Show plot
gqplot
Hey, what you need is a data.frame that contains the choices of those two people (like your sum_data), which you then draw on top of the line with geom_point().
Here is an example. I made up data and code etc because you didn't provide a completely reproducible example.
# Library
library(ggplot2)
# Seed for exact reproducibility
set.seed(20210307)
# Main data frame
main_data <- data.frame(x = 1:10, y = rnorm(10))
# Analyst data frame
analyst_choice <- data.frame(x = c(2, 3), y = main_data[2:3, 'y'], analyst = c('John', 'Paul'))
# Create plot
ggplot(main_data, aes(x = x, y = y)) +
geom_line() +
geom_point(data = analyst_choice, aes(x = x, y = y, colour = analyst), size = 10, shape = 4)
That's what this code produces:

matching of shape, color and legend in bubble plot with subset of variable

I have some data
library(data.table)
wide <- data.table(id=c("A","C","B"), var1=c(1,6,1), var2=c(2,6,5), size1=c(11,12,13), size2=c(10,12,10), flag=c(FALSE,TRUE,FALSE))
> wide
id var1 var2 size1 size2 flag
1: A 1 2 11 10 FALSE
2: C 6 6 12 12 TRUE
3: B 1 5 13 10 FALSE
which I would like to plot as bubble plots where id is ordered by var2, and bubbles are as follows:
ID A and B: var1 is plotted in size1 and "empty bubbles" and var2 is plotted in size2 with "filled" bubbles.
ID C is flagged because there is only one value (this is why var1=var2) and it should have a "filled bubble" of a different color.
I have tried this as follows:
cols <- c("v1"="blue", "v2"="red", "flags"="green")
shapes <- c("v1"=16, "v2"=21, "flags"=16)
p1 <- ggplot(data = wide, aes(x = reorder(id,var2))) + scale_size_continuous(range=c(5,15))
p1 <- p1 + geom_point(aes(size=size1, y = var1, color = "v1", shape = "v1"))
p1 <- p1 + geom_point(aes(size=size2, y = var2, color = "v2", shape = "v2", stroke=1.5))
p1 <- p1 + geom_point(data=subset(wide,flag), aes(size=size2[flag], y=var2[flag], color= "flags", shape="flags"))
p1 <- p1 + scale_color_manual(name = "test",
values = cols,
labels = c("v1", "v2", "flags"))
p1 <- p1 + scale_shape_manual(name = "test",
values = shapes,
labels = c("v1", "v2", "flags"))
which gives (in my theme)
but two questions remain:
What happened to the order in the legend? I have followed the recipe of the bottom solution in Two geom_points add a legend but somehow the order does not match.
How to get rid of the stroke around the green bubble and why is it there?
Overall, something appears to go wrong in matching shape and color.
I admit, it took me a while to understand your slightly convoluted plot. Forgive me, but I have allowed myself to change the way to plot, and make (better?) use of ggplot.
The data shape is less than ideal. ggplot works extremely well with long data.
It was a bit of a guesswork to reshape your data, and I decided to go the quick and dirty way to simply bind the rows from selected columns.
Now you can see, that you can achieve the new plot with a single call to geom_point. The rest is "scale_aesthetic" magic...
In order to combine the shape and color legend, safest is to use override.aes. But beware! It does not take named vectors, so the order of the values needs to be in the exact order given by your legend keys - which is usually alphabetic, if you don't have the factor levels defined.
Update re: request to order x labels
This hugely depends on the actual data structure. If it is originally as you have presented, I'd first make id a factor with the levels ordered based on your var2. Then, do the data shaping.
library(tidyverse)
# data reshape
wide <- data.frame(id=c("C","B","A"), var1=c(1,6,1), var2=c(2,6,5), size1=c(11,12,13), size2=c(10,12,10), flag=c(FALSE,TRUE,FALSE))
wide <- wide %>% mutate(id = reorder(id, var2))
wide1 <- wide %>% filter(!flag) %>% select(id, var = var1, size = size1)
wide2 <- wide %>% filter(!flag) %>% select(id, var = var2, size = size2)
wide3 <- wide %>% filter(flag) %>% select(id, var = flag, size = size2) %>%
mutate(var = 6)
long <- bind_rows(list(v1 = wide1, v2 = wide2, flag = wide3), .id = "var_id")
# rearrange the vectors for scales aesthetic
cols <- c(flag="green", v1 ="blue", v2="red" )
shapes <- c(flag=16, v1=16, v2 =21 )
ggplot(data = long, aes(x = id, y = var)) +
geom_point(aes(size=size, shape = var_id, color = var_id), stroke=1.5) +
scale_size_continuous(limits = c(5,15),breaks = seq(5,15,5)) +
scale_shape_manual(name = "test", values = shapes) +
scale_color_manual(values = cols, guide = "none") + # guide = FALSE is deprecated in newer ggplot2
guides(shape = guide_legend(override.aes = list(color = cols)))
P.S. the reason for the red stroke around the green bubble in your plot is that you also plotted the 'var2' behind your flag.
Created on 2020-04-08 by the reprex package (v0.3.0)

ggplot facet_wrap with italics

I have a dataset I'm plotting, with facets by variables (in the toy dataset - densities of 2 species). I need to use the actual variable names to do 2 things: 1) italicize species names, and 2) have the 2 in n/m2 properly superscripted (or ASCII-ed, whichever easier).
It's similar to this, but I can't seem to make it work for my case.
toy data
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10,
z = rep(c("Species1 density (n/m2)", "Species2 density (m/m2)"), each = 5),
z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2))
ggplot(df) + geom_point(aes(x = x, y = y)) + facet_grid(z1 ~ z)
I get an error (variable z not found) when I try to use the code in the answer naively. How do I get around having 2 variables in the facetting?
A little modification gets the code from your link to work. I've built the data with tibble() (the current replacement for the deprecated data_frame()) so the character vector is not converted to a factor, and taken the common information out of the z values so it can be added via the labeller (otherwise it would be a pain to make half the text italic).
library(tidyverse)
df <- tibble(
x = 1:10,
y = 1:10,
z = rep(c("Species1", "Species2"), each = 5),
z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2)
)
ggplot(df) +
geom_point(aes(x = x, y = y)) +
  facet_grid(z1 ~ z, labeller = label_bquote(cols = italic(.(z))~density~m^2))
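If you want the strip to read like the original "density (n/m2)" label with a real superscript, the plotmath expression can be extended. A small sketch, assuming the unit is n/m^2 for every column facet (the toy data in the question has m/m2 for the second species, which would need separate handling); it uses a plain data.frame so it runs with ggplot2 alone:

```r
library(ggplot2)

df <- data.frame(
  x = 1:10,
  y = 1:10,
  z = rep(c("Species1", "Species2"), each = 5),
  z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2)
)

# Column strips: italic species name, then "density (n/m^2)" with the 2
# superscripted. Row strips keep the default labels.
p <- ggplot(df) +
  geom_point(aes(x = x, y = y)) +
  facet_grid(z1 ~ z,
             labeller = label_bquote(cols = italic(.(z)) ~ density ~ (n/m^2)))
p
```

Inside label_bquote(), .(z) is replaced by the facet value, ~ inserts a space, and ^ produces the superscript.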

plotting with ggplot2. Error

I am trying to plot data using the ggplot2 package, but I am running into an error.
The data are a set of columns, each representing one day's values (the values change with altitude):
V1 V2.... V500
2E-15.....3E-14
3e-14.....3E-21
1.3E-15....NA
I want to plot all the data on two axes, with the fill showing the values.
Code:
a<-read.csv("/../vertical_value.csv",sep=",",header=F)
am<-melt(t(a))
dataset<-expand.grid(X = 1:500, H = seq(1,25,by=1))
dataset$axp<-am$value
g<-ggplot(dataset, aes(x = X, y = H, fill = axp)) + geom_tile()
error:
Error: Casting formula contains variables not found in molten data: XHaxp
Looking at this again, I think that you should be able to bypass this just by dropping NA rows after you melt.
a<-read.csv("/../vertical_value.csv",sep=",",header=F)
am<-melt(t(a))
am <- na.omit(am) ## ADD THIS LINE
dataset<-expand.grid(X = 1:500, H = seq(1,25,by=1))
dataset$axp<-am$value
g<-ggplot(dataset, aes(x = X, y = H, fill = axp)) + geom_tile()
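One caveat with the snippet above: once na.omit() has dropped rows, am is shorter than the 500 x 25 grid whenever any value is missing, so dataset$axp <- am$value would stop with a length mismatch. A possible way around it, sketched here on made-up numbers, is to skip expand.grid() and plot the melted data directly. I'm assuming the file has one row per altitude level and one column per day, and that reshape2 is the melt() in use; the matrix below is an invented stand-in for the CSV.

```r
library(ggplot2)
library(reshape2)

# Invented stand-in for the CSV: 3 altitude levels (rows) x 4 days (columns)
a <- matrix(c(2e-15, 3e-14, 1.3e-15,
              3e-14, 3e-21, NA,
              1e-15, 2e-15, 4e-15,
              5e-16, NA,    6e-15), nrow = 3)

# melt() on a matrix returns the row index, the column index and the value.
# After t(), rows are days and columns are altitude levels.
am <- melt(t(a))
names(am) <- c("X", "H", "axp")

# Drop the missing cells; the coordinates stay attached to each value
am <- na.omit(am)

p <- ggplot(am, aes(x = X, y = H, fill = axp)) + geom_tile()
p
```

Because X and H come from the melt itself, every remaining value keeps its own day and altitude, and the missing cells simply show up as gaps in the tiles.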
