Gnuplot bar chart with custom intervals on the x-axis

I'm new to gnuplot and I would like to replicate this plot: https://images.app.goo.gl/DqygL2gfk3jZ7jsK6
I have a file.dat with continuous values between 0 and 100, and I would like to plot them subdivided into intervals (pident > 98, 90 < pident < 100, etc.), with the total number of occurrences on the y-axis.
I have looked everywhere for a way to do this, but I still cannot manage it.
Thank you!
Sample of the data, with the value and the count:
33.18 5
43.296 1
33.19 1
27.168 5
71.429 11
30.698 9
47.934 1
43.299 3
30.699 3
37.092 2
24.492 2
24.493 2
24.494 7
47.938 1
24.497 1
37.097 8
37.099 2
33.824 7
51.111 15
59.025 2
62.553 2
62.554 2
57.867 2
33.826 2
62.555 1
33.827 5
62.556 2
33.828 1
59.028 1
46.429 11
51.117 1
75.158 2
27.621 1
27.623 1
27.624 2
37.5 113
37.6 2
32.313 8
27.626 3
37.7 3
32.314 1
67.797 3
27.628 2
32.316 2
37.9 1
61.044 1
43.81 5
32.317 8
32.318 2
43.82 4
32.319 2
43.83 2
37.551 3
61.048 1
48.993 6
29.43 2
This is the code I have tried so far (where I also calculate the mean):
#!/usr/bin/gnuplot -persist
set noytics
# Find the mean
mean= system("awk '{sum+=$1*$2; tot+=$2} END{print sum/tot}' hist.dat")
set arrow 1 from mean,0 to mean, graph 1 nohead ls 1 lc rgb "blue"
set label 1 sprintf(" Mean: %s", mean) at mean, screen 0.1
# Histogram
binwidth=10
bin(x,width)=width*floor(x/width)
plot 'hist.dat' using (bin($1,binwidth)):(1.0) smooth freq with boxes
This is the result:

The following script takes your data and sums up the second column within the defined bins.
If you have values equal to 100 in the first column, those values will end up in the bin 100-<110.
With Bin(x) = floor(x/BinWidth)*BinWidth + BinWidth*0.5, the bins are shifted by half a bin width so that each box spans the x-axis from the beginning of its bin to the end (rather than being centered on the beginning of the bin).
If you explicitly want xtic labels like in the example graph you've shown, i.e. 10-<20, 20-<30, etc., you would have to fiddle around with the xtic labels.
Edit: I forgot the mean value. There is no need to call awk; gnuplot can do this for you as well, check help stats.
Code:
### create histogram
reset session
$Data <<EOD
33.18 5
43.296 1
33.19 1
27.168 5
71.429 11
30.698 9
47.934 1
43.299 3
30.699 3
37.092 2
24.492 2
24.493 2
24.494 7
47.938 1
24.497 1
37.097 8
37.099 2
33.824 7
51.111 15
59.025 2
62.553 2
62.554 2
57.867 2
33.826 2
62.555 1
33.827 5
62.556 2
33.828 1
59.028 1
46.429 11
51.117 1
75.158 2
27.621 1
27.623 1
27.624 2
37.5 113
37.6 2
32.313 8
27.626 3
37.7 3
32.314 1
67.797 3
27.628 2
32.316 2
37.9 1
61.044 1
43.81 5
32.317 8
32.318 2
43.82 4
32.319 2
43.83 2
37.551 3
61.048 1
48.993 6
29.43 2
EOD
# Histogram
BinWidth = 10
Bin(x) = floor(x/BinWidth)*BinWidth + BinWidth*0.5
# Mean
stats $Data u ($1*$2):2 nooutput
mean = STATS_sum_x/STATS_sum_y
set arrow 1 from mean, graph 0 to mean, graph 1 nohead lw 2 lc rgb "red" front
set label 1 sprintf("Mean: %.1f", mean) at mean, graph 1 offset 1,-0.7
set xlabel "Identity / %"
set xrange [0:100]
set xtics 10 out
set ylabel "The number of blast hits"
set style fill solid 0.3
set boxwidth BinWidth
set key noautotitle
set grid x,y
plot $Data using (Bin($1)):2 smooth freq with boxes lc "blue"
### end of code
Result:
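The binning and the weighted mean used in the gnuplot script above can be checked outside gnuplot. The following is a minimal Python sketch of the same arithmetic, not part of the original answer: the function names (bin_center, bin_label, histogram, weighted_mean) are made up for illustration, and only the first five rows of the sample data are used. It also shows one way to build labels of the form 10-<20 for the xtics.

```python
import math
from collections import defaultdict

def bin_center(x, width):
    """Centre of the bin [k*width, (k+1)*width) that contains x."""
    return math.floor(x / width) * width + width * 0.5

def bin_label(center, width):
    """Label such as '30-<40' for the bin centred at `center`."""
    lo = center - width * 0.5
    return f"{lo:g}-<{lo + width:g}"

def histogram(rows, width):
    """Sum the counts (second column) per bin, as `smooth freq` does."""
    totals = defaultdict(float)
    for value, count in rows:
        totals[bin_center(value, width)] += count
    return dict(sorted(totals.items()))

def weighted_mean(rows):
    """Mean of the values weighted by their counts: sum(v*c) / sum(c)."""
    return sum(v * c for v, c in rows) / sum(c for _, c in rows)

# First five rows of the sample data: (value, count)
rows = [(33.18, 5), (43.296, 1), (33.19, 1), (27.168, 5), (71.429, 11)]

for center, total in histogram(rows, 10).items():
    print(bin_label(center, 10), total)
print("mean:", round(weighted_mean(rows), 1))
```

Note that a value of exactly 100 gets the centre 105, i.e. the bin 100-<110, which matches the remark in the answer.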

Related

gnuplot: stacked newhistogram, starting with same fillstyle pattern, not appearing in shared key

I am trying to create multiple columnstacked histograms in gnuplot using the newhistogram command.
The data share the same structure so I want the boxes to share the same key like in this question, but with a pattern fillstyle (not only color).
My script looks like this:
set style data histogram
set style histogram columnstacked
set style fill pattern
set boxwidth 0.75
set xtics ("fff" 0, "ggg" 1, "hhh" 3, "iii" 4)
set grid y
set border 3 # remove top and right plot-border
set key outside title 'key' width -30
plot \
newhistogram "data A" fs pattern 2 lt 1, \
"plot_test_data.out" u 2, '' u 3, \
newhistogram "data B" fs pattern 2 lt 1, \
"plot_test_data_2.out" u 2:key(1), '' u 3
which produces the following output:
Could anybody explain why the key doesn't show the correct pattern? (I tried adding fs pattern to each using directive and removing the linetype, but nothing gives the desired output.)
I am using gnuplot version 4.6, patchlevel 3 (last modified 2013-04-12, build system: Linux x86_64), with two test data sets.
plot_test_data.out:
0 1 3 2 2
1 1 6 4 4
2 2 5 4 5
3 1 1 3 4
4 1 2 4 5
plot_test_data_2.out:
0 1 4 3 1
1 3 6 4 2
2 2 5 6 4
3 2 5 5 3
4 1 3 4 2
Thank you!

ggplot stacked bar plot with x value and no y/fill value

I am trying to make a stacked bar plot with the X axis being time, Y axis being amount, and fill color being certain features.
My data looks something like this:
> base
Number Mut Time Percent
1 117 22:A->G 2 81.81
2 13 24:G->A 2 9.09
3 10 22:A->G 24:G->A 108:G->A 158:G->A 162:G->A 2 6.99
4 1 22:A->G 24:G->A 2 0.69
5 32 24:G->A 3 94.11
6 1 24:G->A 162:G->T 3 2.94
7 1 24:G->A 82:G->T 3 2.94
When I do a stacked bar graph in ggplot using the code:
ggplot(base,aes(x = Time, fill = Mut, y = Percent)) + geom_bar(stat='identity') + theme(legend.key.size = unit(.5, "cm")) + ylab("Number")
I get a graph that looks like this:
http://imgur.com/32XCkTm,yfCAJsx#0
My problem is I want there to be zero values for time = 1 and time = 4.
Something similar to this:
http://imgur.com/32XCkTm,yfCAJsx#1
Is there a way I can do this? Right now I just added 0 values to the data for times 1 and 4 and added my fill feature(Mut) to be one that already showed up in the data:
> base
Reads Mut Time Percent
1 0 22:A->G 1 0.00
2 117 22:A->G 2 81.81
3 13 24:G->A 2 9.09
4 10 22:A->G 24:G->A 108:G->A 158:G->A 162:G->A 2 6.99
5 1 22:A->G 24:G->A 2 0.69
6 32 24:G->A 3 94.11
7 1 24:G->A 162:G->T 3 2.94
8 1 24:G->A 82:G->T 3 2.94
9 0 22:A->G 4 0.00
My problem is that I don't want to keep searching for a feature (Mut) that is already in the data. Is there a way to have ggplot automatically show x values for time = 1 and time = 4 with no bars, without having to add rows to the data? I have been searching for hours and can't find any answers.
Thanks.
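Add + scale_x_discrete(limits = 1:4) to the plot, so that the x-axis covers times 1 to 4 even though the data has no rows for times 1 and 4.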

How to plot using multiple criteria in R?

Following are first 15 rows of my data:
> head(df,15)
frame.group class lane veh.count mean.speed
1 [22,319] 2 5 9 23.40345
2 [22,319] 2 4 9 24.10870
3 [22,319] 2 1 11 14.70857
4 [22,319] 2 3 8 20.88783
5 [22,319] 2 2 6 16.75327
6 (319,616] 2 5 15 22.21671
7 (319,616] 2 2 16 23.55468
8 (319,616] 2 3 12 22.84703
9 (319,616] 2 4 14 17.55428
10 (319,616] 2 1 13 16.45327
11 (319,616] 1 1 1 42.80160
12 (319,616] 1 2 1 42.34750
13 (616,913] 2 5 18 30.86468
14 (319,616] 3 3 2 26.78177
15 (616,913] 2 4 14 32.34548
'frame.group' contains time intervals, 'class' is the vehicle class (1 = motorcycles, 2 = cars, 3 = trucks) and 'lane' contains lane numbers. I want to create 3 scatter plots with frame.group on the x-axis and mean.speed on the y-axis, one for each class. Within the scatter plot for one vehicle class, e.g. cars, I want 5 plots, i.e. one for each lane. I tried the following:
cars <- subset(df, class==2)
by(cars, lane, FUN = plot(frame.group, mean.speed))
There are two problems:
1) R does not plot as expected, i.e. 5 plots for the 5 different lanes.
2) Only one plot is drawn, and it is a box plot, probably because I used intervals instead of numbers on the x-axis.
How can I fix the above issues? Please help.
Each time a new plot command is issued, R replaces the existing plot with the new one. You can create a grid of plots by calling par(mfrow=c(1,5)), which gives 1 row with 5 plots (other values give other grid layouts). If you want a scatterplot instead of a boxplot, you can use plot.default.
It is easier to do all this with the ggplot2 library instead of base graphics, and the resulting plot will look much nicer:
library(ggplot2)
ggplot(cars,aes(x=frame.group,y=mean.speed))+geom_point()+facet_wrap(~lane)
See the ggplot2 documentation for more details: http://docs.ggplot2.org/current/

R stacked percentage bar plot with percentage of binary factor and labels (with ggplot)

I want to produce a graphic that looks something like this:
My original data set looks something like this:
> bb[sample(nrow(bb), 20), ]
IMG QUANT FIX
25663 1 1 0
7936 2 2 0
23586 3 2 0
23017 2 2 1
31363 1 3 1
7886 2 2 0
23819 3 3 1
29838 2 2 1
8169 2 3 1
9870 2 3 0
31440 2 1 0
35564 3 1 0
24066 1 2 0
12020 3 2 0
6742 3 2 0
6189 2 3 0
26692 2 3 0
1387 3 2 0
31839 2 3 1
28637 3 2 0
So the idea is that the bars display where FIX = 1 per factor QUANT and per
factor IMG.
I've aggregated my data set into percentages using plyr
library(plyr)
bb.perc <- ddply(bb,.(QUANT,IMG),summarise,FIX.PROP = sum(FIX) / length(FIX))
It does almost the right thing:
QUANT IMG FIX.PROP
1 1 1 0.52439024
2 1 2 0.19085366
3 1 3 0.13658537
4 2 1 0.20414201
5 2 2 0.53964497
6 2 3 0.09585799
7 3 1 0.29000000
8 3 2 0.13000000
9 3 3 0.40705882
But now if I make a graph, it doesn't account for the FIX==0 cases, i.e. all bars have the same height, namely 100%, which isn't what I want. Note how the individual QUANT subframes don't add up to 100%:
> sum(bb.perc[1:3,]$FIX.PROP)
[1] 0.8518293
> sum(bb.perc[4:6,]$FIX.PROP)
[1] 0.839645
> sum(bb.perc[7:9,]$FIX.PROP)
[1] 0.8270588
The best I could do with R is to display counts:
# Take only the positive samples
bb.pos <- bb[bb$FIX == 1,]
# Plot the counts
ggplot(bb,aes(factor(QUANT),fill=factor(IMG))) + geom_bar() +
scale_y_continuous(labels=percent)
And results in:
This is also not what I want:
The percentage scale is way off. I need a way to pass the 100% point to the
percent function, but I have no idea how.
It lacks the labels.
There are a great deal of similar questions on SO already, but I seem to lack
the sufficient amount of intelligence (or understanding of R) to extrapolate
from them to a solution to my particular problem.
Thanks for any pointers!
EDIT: Sven Hohenstein provided an answer already, but here's how I ended up doing it myself as well:
> ggplot(bb.perc,aes(x=factor(QUANT),y=FIX.PROP,label=paste(round(FIX.PROP*100),
"%"),fill=factor(IMG)))+ geom_bar(stat="identity") + geom_text(position="stack",
aes(ymax=1),vjust=5) + scale_y_continuous(labels = percent)
Using the bb.perc that I defined further up using plyr. This one has the
advantage that the percentages are computed locally per column, and not
globally.
Thanks everyone for the help. The following two questions and their respective
answers helped me greatly in getting it right:
Stacked Bar Graph Labels with ggplot2
Adding labels to ggplot bar chart
What I did wrong initially, was pass the position = "fill" parameter to
geom_bar(), which for some reason made all the bars have the same height!
This is a way to generate the plot:
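library(ggplot2)
library(scales)  # provides percent() for the axis labels
# Note: written for older ggplot2; from ggplot2 2.0 on, stat_bin() needs a
# continuous x, so use stat_count() in its place.
ggplot(bb[bb$FIX == 1, ], aes(x = factor(QUANT), fill = factor(IMG),
                              y = (..count..)/sum(..count..))) +
  geom_bar() +
  stat_bin(geom = "text",
           aes(label = paste(round((..count..)/sum(..count..)*100), "%")),
           vjust = 5) +
  scale_y_continuous(labels = percent)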
Change the value of the vjust parameter to adjust the vertical position of the labels.
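Two different normalisations appear in this thread, and a small sketch may make the difference concrete. The Python below is only an illustration with hypothetical rows and made-up function names, not code from the question or the answer. global_share mirrors (..count..)/sum(..count..) on the FIX == 1 subset, where each cell count is divided by the number of all FIX == 1 rows. cell_proportion mirrors the ddply summary sum(FIX) / length(FIX), which divides by the number of rows in that (QUANT, IMG) cell only.

```python
from collections import Counter

def global_share(rows):
    """Count of FIX == 1 rows per (QUANT, IMG), divided by all FIX == 1 rows."""
    hits = Counter((quant, img) for quant, img, fix in rows if fix == 1)
    total = sum(hits.values())
    return {cell: n / total for cell, n in hits.items()}

def cell_proportion(rows):
    """sum(FIX) / length(FIX) within each (QUANT, IMG) cell."""
    hits, sizes = Counter(), Counter()
    for quant, img, fix in rows:
        sizes[(quant, img)] += 1
        hits[(quant, img)] += fix
    return {cell: hits[cell] / sizes[cell] for cell in sizes}

# Hypothetical rows: (QUANT, IMG, FIX)
rows = [(1, 1, 1), (1, 1, 0), (1, 2, 1), (2, 2, 1), (2, 2, 1), (2, 3, 0)]

print(global_share(rows))
print(cell_proportion(rows))
```

The global shares always add up to 1 over the whole plot, while the per-cell proportions are independent of each other, which is why the FIX.PROP values for one QUANT do not have to add up to 100%.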

How can I sort the X axis in a Barplot in R?

I have binned data that looks like this:
(8.048,18.05] (-21.95,-11.95] (-31.95,-21.95] (18.05,28.05] (-41.95,-31.95]
81 76 18 18 12
(-132,-122] (-122,-112] (-112,-102] (-162,-152] (-102,-91.95]
6 6 6 5 5
(-91.95,-81.95] (-192,-182] (28.05,38.05] (38.05,48.05] (58.05,68.05]
5 4 4 4 4
(78.05,88.05] (98.05,108] (-562,-552] (-512,-502] (-482,-472]
4 4 3 3 3
(-452,-442] (-412,-402] (-282,-272] (-152,-142] (48.05,58.05]
3 3 3 3 3
(68.05,78.05] (118,128] (128,138] (-582,-572] (-552,-542]
3 3 3 2 2
(-532,-522] (-422,-412] (-392,-382] (-362,-352] (-262,-252]
2 2 2 2 2
(-252,-242] (-142,-132] (-81.95,-71.95] (148,158] (-1402,-1392]
2 2 2 2 1
(-1372,-1362] (-1342,-1332] (-942,-932] (-862,-852] (-822,-812]
1 1 1 1 1
(-712,-702] (-682,-672] (-672,-662] (-632,-622] (-542,-532]
1 1 1 1 1
(-502,-492] (-492,-482] (-472,-462] (-462,-452] (-442,-432]
1 1 1 1 1
(-432,-422] (-352,-342] (-332,-322] (-312,-302] (-302,-292]
1 1 1 1 1
(-202,-192] (-182,-172] (-172,-162] (-51.95,-41.95] (88.05,98.05]
1 1 1 1 1
(108,118] (158,168] (168,178] (178,188] (298,308]
1 1 1 1 1
(318,328] (328,338] (338,348] (368,378] (458,468]
1 1 1 1 1
How can I plot this data so that the bin is sorted from most negative on the left to most positive on the right? Currently my graph looks like this. Notice that it is not sorted at all. In particular the second bar (value = 76) is placed to the right of the first:
(8.048,18.05] (-21.95,-11.95]
81 76
This is the command I use to plot:
barplot(x,ylab="Number of Unique Tags", xlab="Expected - Observed")
I really want to help answer your question, but I have to tell you, I can't make heads or tails of your data. I see a lot of opening parentheses but no closing ones. The data looks sorted in descending order by whatever the values on the bottom of each row are. I have no idea what to make of a value like "(8.048,18.05]".
Am I missing something obvious? Can you make a simpler example where your data structure is not a factor?
I would generally expect a data frame or a matrix with two columns, one for the X and one for the Y.
See if this example of sorting helps (I'm sort of shooting in the dark here):
tN <- table(Ni <- rpois(100, lambda=5))
r <- barplot(tN)
#stop here and examine the plot
#the next bit converts the matrix to a data frame,
# sorts it, and plots it again
df<-data.frame(tN)
df2<-df[order(df$Freq),]
barplot(df2$Freq)
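The example above sorts by frequency, whereas the question asks for the bins to run from most negative to most positive. Labels such as "(8.048,18.05]" appear to be half-open intervals of the kind R's cut() produces, so one option is to sort on the numeric lower edge of each label. The Python sketch below only illustrates that idea; the function names are made up, it uses the first five bins from the question, and it assumes plain decimal bounds (no scientific notation).

```python
import re

def lower_bound(label):
    """Numeric lower edge of an interval label such as '(-21.95,-11.95]'."""
    match = re.match(r"[\(\[]\s*(-?\d+(?:\.\d+)?)\s*,", label)
    if match is None:
        raise ValueError(f"not an interval label: {label!r}")
    return float(match.group(1))

def sort_bins(counts):
    """Return (label, count) pairs ordered from most negative to most positive."""
    return sorted(counts.items(), key=lambda item: lower_bound(item[0]))

# First five bins from the question, in their original (by-count) order
counts = {
    "(8.048,18.05]": 81,
    "(-21.95,-11.95]": 76,
    "(-31.95,-21.95]": 18,
    "(18.05,28.05]": 18,
    "(-41.95,-31.95]": 12,
}

for label, count in sort_bins(counts):
    print(label, count)
```

The same idea carries over to R: extract the lower bound from each name, order the vector by it, and pass the reordered vector to barplot.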

Resources