I have a file that looks like this:
2 3 LOGIC:A
2 5 LOGIC:A
3 4 LOGIC:Z
I plotted column 1 on the x-axis against column 2 on the y-axis, with column 3 acting as the legend:
ggplot(Data, aes(V1, V2, col = V3)) + geom_point()
However, is it possible, within ggplot itself, to subtract column 1 from column 2 and label the 10 rows with the highest absolute difference with their column 3 values on each scatter point? I don't want to label the entire dataset, just the top 10 highest deltas.
You can try this (assuming your original data frame is Data):
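library(dplyr)
library(ggplot2)
# absolute difference between column 2 and column 1
Data$sub <- abs(Data$V2 - Data$V1)
# keep the 10 rows with the largest difference (top_n() keeps ties, so this can return more than 10 rows)
Data2 <- Data %>%
  top_n(10, sub)
ggplot() +
  geom_point(data = Data, aes(V1, V2)) +
  # label only the selected rows, shifted slightly below each point
  geom_text(data = Data2, aes(V1, V2 - 0.1, label = V3))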
The dplyr library lets you filter the top values of a data frame.
You can change "0.1" to whatever offset works best for your plot.
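If you'd rather not depend on dplyr, the same selection can be done in base R with order(). This is an illustrative sketch, not part of the original answer: top_by_delta is a hypothetical helper, and the column names V1, V2, V3 assume the file was read without a header, as in the question. Unlike top_n(), it returns exactly n rows even when differences are tied.

```r
# Hypothetical helper: return the n rows of `d` with the largest |V2 - V1|
top_by_delta <- function(d, n = 10) {
  delta <- abs(d$V2 - d$V1)
  keep <- order(delta, decreasing = TRUE)[seq_len(min(n, nrow(d)))]
  d[keep, , drop = FALSE]
}

# Example rows from the question (assumed to be read with header = FALSE)
Data <- data.frame(
  V1 = c(2, 2, 3),
  V2 = c(3, 5, 4),
  V3 = c("LOGIC:A", "LOGIC:A", "LOGIC:Z"),
  stringsAsFactors = FALSE
)

top <- top_by_delta(Data, n = 10)

# Base R plot: draw every point, but add text labels only for the selected rows
plot(Data$V1, Data$V2, xlab = "V1", ylab = "V2", pch = 19)
text(top$V1, top$V2, labels = top$V3, pos = 1)  # pos = 1 places labels below the points
```

With only three rows, n = 10 simply returns all of them; on the real file it keeps the ten largest deltas.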
I am currently stuck on formatting a grouped bar chart.
I have a data frame that I would like to visualize:
iteration position value
1 1 eEP_SRO 20346
2 1 eEP_drift 22410
3 1 eEP_hole 29626
4 2 eEP_SRO 35884
5 2 eEP_drift 39424
6 2 eEP_hole 51491
7 3 eEP_SRO 51516
8 3 eEP_drift 55523
9 3 eEP_hole 74403
The position should be shown as color and the value should be represented in the height of the bar.
My code is:
fig <- ggplot(df_eEP_Location_plot, aes(fill=position, y=value, x=iteration, order=position)) +
geom_bar(stat="identity")
which gives me this result:
I would like correct y-axis labels, and I would also like to sort my bars from largest to smallest (ignoring the iteration number). How can I achieve this?
Thank you very much for your help!
I would recommend using fct_reorder from the forcats package to reorder your iterations by value before plotting with ggplot. Here it is with the sample data you've provided:
library(ggplot2)
library(forcats)
iteration <- factor(c(1,1,1,2,2,2,3,3,3))
# one bar segment per position within each iteration
position <- factor(rep(c("eEP_SRO","eEP_drift","eEP_hole"), times = 3))
value <- c(20346,22410,29626,35884,39424,51491,51516,55523,74403)
df_eEP_Location_plot <- data.frame(iteration, position, value)
# order iterations by their total (stacked) height, largest first
df_eEP_Location_plot$iteration <- fct_reorder(df_eEP_Location_plot$iteration,
                                              df_eEP_Location_plot$value,
                                              .fun = sum, .desc = TRUE)
fig <- ggplot(df_eEP_Location_plot, aes(y=value, x=iteration, fill=position)) +
  geom_bar(stat="identity")
fig
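The reordering step can also be sketched in base R without forcats; this is an alternative, not the answer's method. It computes each iteration's total with tapply(), then rebuilds the factor with its levels sorted by that total, largest first. The data are the sample values from the question, and the plot uses base barplot() on an xtabs() table purely as an illustration.

```r
df <- data.frame(
  iteration = factor(rep(1:3, each = 3)),
  position  = factor(rep(c("eEP_SRO", "eEP_drift", "eEP_hole"), times = 3)),
  value     = c(20346, 22410, 29626, 35884, 39424, 51491, 51516, 55523, 74403)
)

# total stacked height of each iteration
totals <- tapply(df$value, df$iteration, sum)

# reorder the factor levels so the tallest stack comes first
df$iteration <- factor(df$iteration, levels = names(sort(totals, decreasing = TRUE)))

# stacked bars: the positions (table rows) are stacked within each iteration (table columns)
barplot(xtabs(value ~ position + iteration, data = df), legend.text = TRUE)
```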
I have a data set containing gene expression data for various genes across 24 different samples. In my current data frame, each row is a gene and each column is a sample.
I want to create a dot plot where each dot is a gene, the y-axis represents the expression of that gene in sample A, and the x-axis represents the expression of the same gene in sample B.
I have tried to search for this but don't know what such a plot is called or how I can find it. Most of my other plots are plotted with ggplot2, but it does not matter what package is used to solve the problem.
Example data:
sample_A<-c(2,3,1)
sample_B<-c(-1,4,-3)
genes <- c("gene1","gene2","gene3")
df<-data.frame(sample_A,sample_B,row.names = genes)
Data frame:
sample_A sample_B
gene1 2 -1
gene2 3 4
gene3 1 -3
This is a scatter plot, so geom_point from ggplot2 is probably what you're looking for. The dots can also be labelled using geom_label.
library(ggplot2)
p <- ggplot(df, aes(x = sample_B, y = sample_A)) +
  geom_point() +
  # label each dot with its gene name (the row names of df)
  geom_label(aes(label = rownames(df)))
p
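Since the question says any package is fine, here is a base R sketch of the same scatter plot. plot_gene_pair is a hypothetical helper that takes any two sample columns by name, which may be handy for comparing different pairs among the 24 samples; it returns the plotted coordinates invisibly.

```r
# Hypothetical helper: scatter plot of two sample columns, one dot per gene
plot_gene_pair <- function(df, x_col, y_col) {
  x <- df[[x_col]]
  y <- df[[y_col]]
  plot(x, y, xlab = x_col, ylab = y_col, pch = 19)
  text(x, y, labels = rownames(df), pos = 3)  # gene names above each dot
  invisible(data.frame(gene = rownames(df), x = x, y = y, stringsAsFactors = FALSE))
}

sample_A <- c(2, 3, 1)
sample_B <- c(-1, 4, -3)
genes <- c("gene1", "gene2", "gene3")
df <- data.frame(sample_A, sample_B, row.names = genes)

# sample_B on the x-axis, sample_A on the y-axis, as in the question
pts <- plot_gene_pair(df, "sample_B", "sample_A")
```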
I want to make a simple histogram from two vectors:
values <- c(1,2,3,4,5,6,7,8)
freq <- c(4,6,4,4,3,2,1,1)
df <- data.frame(values,freq)
Now the data frame df contains the following values:
values freq
1 4
2 6
3 4
4 4
5 3
6 2
7 1
8 1
Now I want to draw a simple histogram with values on the x-axis and freq on the y-axis. I tried the hist function, but I can't pass it two variables. How can I make a simple histogram from this data?
Using ggplot2:
library(ggplot2)
ggplot(df, aes(x = values, y = freq)) +
geom_bar(stat="identity")
Since you have the frequencies already, what you really want is a bar plot:
barplot(df$freq,names.arg=df$values)
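If you've got your heart set on using hist, expand the frequencies back into individual observations with rep(), and give hist one bin per integer value so that neighbouring values are not merged into the same bar:
hist(rep(df$values, df$freq), breaks = seq(min(df$values) - 0.5, max(df$values) + 0.5, by = 1))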
Please read ?barplot and ?hist for further plotting options.
Also, because I'm somewhat of a zealot, I think the code looks cleaner if you use data.table:
library(data.table)
setDT(df) #convert df to a data.table by reference
df[,barplot(freq,names.arg=values)]
and
df[, hist(rep(values, freq), breaks = seq(min(values) - 0.5, max(values) + 0.5, by = 1))]
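To check that the rep() approach reproduces the original frequencies, hist() can be called with plot = FALSE, which returns the bin counts instead of drawing. Breaks at half-integers put each integer value in its own bin; with the default breaks, hist() picks the bin edges itself and may combine neighbouring values, in which case the bars would not match freq.

```r
values <- c(1, 2, 3, 4, 5, 6, 7, 8)
freq   <- c(4, 6, 4, 4, 3, 2, 1, 1)

# expand the frequency table back into individual observations
x <- rep(values, freq)

# one bin per integer: (0.5, 1.5], (1.5, 2.5], ...; plot = FALSE only computes the counts
h <- hist(x, breaks = seq(min(values) - 0.5, max(values) + 0.5, by = 1), plot = FALSE)
h$counts
```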
I'm a bit out of my depth with this one here. I have the following code that generates two equally sized matrices:
MAX<-100
m<-5
n<-40
success<-matrix(runif(m*n,0,1),m,n)
samples<-floor(MAX*matrix(runif(m*n),m))+1
The success matrix holds the probability of success, and the samples matrix holds the corresponding number of samples observed in each case. I'd like to make a bar graph that groups each column together, with the bar heights taken from the success matrix. Each bar's color should be scaled from 1 to MAX according to the number of observations (e.g., small samples would be more red, whereas large samples might be green).
Any ideas?
Here is an example with ggplot. First, get the data into long format with melt:
library(reshape2)
data.long <- cbind(melt(success), melt(samples)[3])
names(data.long) <- c("group", "x", "success", "count")
head(data.long)
# group x success count
# 1 1 1 0.48513473 8
# 2 2 1 0.56583802 58
# 3 3 1 0.34541582 40
# 4 4 1 0.55829073 64
# 5 5 1 0.06455401 37
# 6 1 2 0.88928606 78
Note that melt iterates through the row/column combinations of both matrices in the same order, so we can simply cbind the resulting molten data frames. The [3] after the second melt keeps only its value column, so we don't end up with duplicate group and x columns (we only need the counts from the second melt). Now let ggplot do its thing:
library(ggplot2)
ggplot(data.long, aes(x=x, y=success, group=group, fill=count)) +
geom_bar(position="stack", stat="identity") +
scale_fill_gradient2(
low="red", mid="yellow", high="green",
midpoint=mean(data.long$count)
)
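If you'd rather not load reshape2, the same long format can be built in base R. row() and col() give each cell's row and column index, and as.vector() flattens a matrix column by column, which is the same order melt() uses for a matrix, so all four columns line up. The set.seed() call is an addition that only makes this sketch reproducible.

```r
MAX <- 100
m <- 5
n <- 40
set.seed(1)  # added for reproducibility; not in the original question
success <- matrix(runif(m * n, 0, 1), m, n)
samples <- floor(MAX * matrix(runif(m * n), m)) + 1

# one row per cell; row()/col() give each cell's indices in column-major order
data.long <- data.frame(
  group   = as.vector(row(success)),
  x       = as.vector(col(success)),
  success = as.vector(success),
  count   = as.vector(samples)
)
head(data.long)
```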
Using @BrodieG's data.long, this plot might be a little easier to interpret:
library(ggplot2)
library(RColorBrewer) # for brewer.pal(...)
ggplot(data.long) +
geom_bar(aes(x=x, y=success, fill=count),colour="grey70",stat="identity")+
scale_fill_gradientn(colours=brewer.pal(9,"RdYlGn")) +
facet_grid(group~.)
Note that your actual values will differ because your example uses random numbers. In future, consider calling set.seed(n) first so that your random samples are reproducible.
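A quick illustration of what set.seed() buys you: resetting the seed to the same value before drawing gives exactly the same random numbers, so a plot built from them can be regenerated. The seed 42 here is arbitrary.

```r
set.seed(42)
first <- runif(5)

set.seed(42)
second <- runif(5)

# same seed, same draws
identical(first, second)
```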
Edit [Response to OP's comment]
You get numbers for the x-axis and facet labels because you start with matrices instead of data frames. So convert success and samples to data frames, set the column names to whatever your test names are, and prepend a group column holding the "list of factors". Converting to long format is a little different now, because the first column holds the group names.
library(reshape2)
set.seed(1)
success <- data.frame(matrix(runif(m*n,0,1),m,n))
success <- cbind(group=paste("Factor",1:nrow(success),sep="."),success)
samples <- data.frame(floor(MAX*matrix(runif(m*n),m))+1)
samples <- cbind(group=success$group,samples)
data.long <- cbind(melt(success, id.vars = 1), melt(samples, id.vars = 1)[3])
names(data.long) <- c("group", "x", "success", "count")
One way to set a threshold color is to add a column to data.long and use that for fill:
threshold <- 25
data.long$fill <- with(data.long,ifelse(count>threshold,max(count),count))
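To see what the fill column does, here it is applied to the five counts shown in the head(data.long) output above, used as a stand-in for the full column (so max() here is the maximum of these five only). Every count above the threshold is replaced by the largest count, so those bars all get the top color. pmin() is shown for comparison as an alternative that is not part of the answer: it caps counts at the threshold instead, so the color gradient spans only 1 to threshold.

```r
# stand-in for data.long$count: the first five counts from head(data.long)
count <- c(8, 58, 40, 64, 37)
threshold <- 25

# the answer's approach: counts above the threshold are set to the maximum count
fill_max <- ifelse(count > threshold, max(count), count)

# alternative (not from the answer): cap counts at the threshold
fill_cap <- pmin(count, threshold)

fill_max
fill_cap
```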
Putting it all together:
library(ggplot2)
library(RColorBrewer)
ggplot(data.long) +
geom_bar(aes(x=x, y=success, fill=fill),colour="grey70",stat="identity")+
scale_fill_gradientn(colours=brewer.pal(9,"RdYlGn")) +
facet_grid(group~.)+
theme(axis.text.x=element_text(angle=-90,hjust=0,vjust=0.4))
Finally, when the x-axis labels are names, they tend to get jammed together, so I rotated them by -90°.
I have 3-column data. The first column, Depth, should be on the x-axis. The other two columns are r and nr. I need to plot the data as a stacked barplot with r on the bottom and nr on top. The data set is very large (i.e., the read depth goes from 0 to 1022), so I can't type everything out by hand in R or on here. Here's an example of what the data look like:
Depth r nr
6 2395 2904
8 0 3095
9 2689 0
12 3894 3578
15 5 4739
The r and nr values have to be on the y-axis, and Depth has to be on the x-axis. I've tried everything I can think of and can't work out what to use as the bar 'height', or even get a basic version of the plot working.
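Work in long format:
# using reshape2::melt
library(reshape2)
library(ggplot2)
# assuming your original data.frame is called `D`
longD <- melt(D, id.vars = 1)
ggplot(longD, aes(x = Depth, y = value, colour = variable, fill = variable)) +
  # reverse = TRUE puts the first variable (r) at the bottom of each stack
  geom_bar(stat = 'identity', position = position_stack(reverse = TRUE))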
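For reference, the reshaping that melt() performs here can be sketched by hand in base R; this is an illustration rather than a replacement for reshape2. It assumes the columns are named Depth, r and nr as in the question, and uses the sample rows shown there.

```r
D <- data.frame(
  Depth = c(6, 8, 9, 12, 15),
  r     = c(2395, 0, 2689, 3894, 5),
  nr    = c(2904, 3095, 0, 3578, 4739)
)

# one row per (Depth, variable) pair: all r values first, then all nr values
longD <- data.frame(
  Depth    = rep(D$Depth, times = 2),
  variable = factor(rep(c("r", "nr"), each = nrow(D)), levels = c("r", "nr")),
  value    = c(D$r, D$nr)
)
longD
```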
Using barchart from lattice, you can work with the wide format directly (here the data frame is called dt):
library(lattice)
barchart(r+nr~factor(Depth),data=dt,stack=TRUE,auto.key=TRUE)
This is equivalent to the following, using the long format from @mnel's answer:
barchart(value~factor(Depth),data=longD,
groups=variable,stack=TRUE,auto.key=TRUE)
Just to show that base R graphics can match it as well, assuming your data frame is called dat:
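barplot(
  t(dat)[2:3,],              # rows r and nr; the first row (r) is stacked at the bottom
  names.arg=t(dat)[1,],      # Depth values as bar labels
  # space is measured in bar widths, so subtract 1 to make the gaps match the depth steps
  space=c(0,diff(t(dat)[1,]) - 1),
  axis.lty=1
)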
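barplot() returns the x positions of the bar midpoints, and with plot = FALSE it only computes them, which makes it easy to check how space lays the bars out. With the default width of 1, space is measured in bar widths, so a gap of diff(Depth) - 1 before each bar puts every midpoint at Depth - Depth[1] + 0.5, keeping the gaps proportional to the missing depths. The data below are the sample rows from the question.

```r
dat <- data.frame(
  Depth = c(6, 8, 9, 12, 15),
  r     = c(2395, 0, 2689, 3894, 5),
  nr    = c(2904, 3095, 0, 3578, 4739)
)

depth <- dat$Depth
heights <- t(dat)[2:3, ]  # rows r and nr, one column per depth

# compute bar midpoints without drawing anything
mids <- barplot(heights, space = c(0, diff(depth) - 1), plot = FALSE)
mids
```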