I am trying to work out the best way to add error bars, given the format of my data.
The standard way of adding error bars is discussed here, for instance.
My original data is in ranges
Model Decreasing Constant Increasing
2025 73-78 80-85 87-92
2035 63-68 80-85 97-107
2050 42-57 75-90 104.5-119.5
where the values are ranges.
I cannot plot this directly in Gnuplot, so I have to split it into averages and error values in two files:
Averages:
Model Decreasing Constant Increasing
2025 75.5 82.5 89.5
2035 65.5 82.5 102
2050 49.5 82.5 112
and the error values (half the width of each range) for the y error bars:
Model Decreasing Constant Increasing
2025 2.5 2.5 2.5
2035 2.5 2.5 5
2050 7.5 7.5 7.5
I normally plot data like this from a single file
plot for [i=2:4] 'data.dat' using 1:i w linespoints
but now I have to read from two files at the same time while plotting.
The normal syntax of plotting errorbars is
plot 'data' using 1:2:0:($1+$3):4:5 with yerrorlines
as described in the manual.
How can I plot with error bars from two files in Gnuplot?
Feel free to suggest a better way of adding these error bars in gnuplot if you know one.
Output of Cristoph's answer, where the error bars are missing at the first and third points.
Gnuplot 5 lets you specify several characters as data file separators.
So, if you are sure you'll never get negative values (which I hope, given the format of your data), you can use your original data file and set both white space and hyphen as the datafile separator:
set datafile separator " -"
plot for [i=2:6:2] "data" using 1:(0.5*(column(i)+column(i+1))):(0.5*(column(i+1)-column(i))) with yerrorlines
First of all, I wonder about your columns used for plotting with yerrorlines. If your first data for 2025 is 75.5+/-2.5, you usually plot it with
plot "datafile" using <xcolumn>:<ycolumn>:<yerrorcolumn>
Your six columns are for the case of xy error bars and specify the point itself and the lower and upper absolute values in x and y. But maybe you are just doing it the way you need it...
Now back to your question:
Gnuplot cannot handle data from two files simultaneously, i.e. it cannot take xy-values from one file and y-errors from another.
If you're running Linux, the command line tool join can help.
With your averages stored in file A and the errors in file B, join A B will concatenate lines with the same value in the first column like this:
2025 75.5 82.5 89.5 2.5 2.5 2.5
So,
plot "<join A B" using 1:2:5 with yerrorlines
should do the job. ("<join A B" will call the join command in the background and read its output like a data file)
I have plotted a curve and have looked both in my book and on Stack, but cannot seem to find any code that makes R tell me the value of y on the curve at x = 70.
curve(
20*1.05^x,
from=0, to=140,
xlab='Time passed since 1890',
ylab='Population of Salmon',
main='Growth of Salmon since 1890'
)
So in short, I would like to know how to command R to give me the number of salmon at 70 years, and at other times.
Edit:
To clarify, I was curious how to make R show multiple y values for x at increments of 5.
salmon <- data.frame(curve(
20*1.05^x,
from=0, to=140,
xlab='Time passed since 1890',
ylab='Population of Salmon',
main='Growth of Salmon since 1890'
))
salmon$y[salmon$x==70]
[1] 608.5285
This salmon data.frame gives you all of the data.
head(salmon)
x y
1 0.0 20.00000
2 1.4 21.41386
3 2.8 22.92768
4 4.2 24.54851
5 5.6 26.28392
6 7.0 28.14201
You can also use inequalities to check the number of salmon in given ranges using the syntax above.
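As a rough sketch of the inequality idea, the snippet below rebuilds the same grid that curve() uses by default (101 points from 0 to 140) without plotting, then filters it. The names in_range and first_x_over_1000 are made up for illustration, and the tolerance-based lookup is just one cautious alternative to ==, since exact equality on floating-point grid values can be fragile.

```r
# Rebuild the curve's grid without plotting (curve() defaults to n = 101 points)
x <- seq(0, 140, length.out = 101)
salmon <- data.frame(x = x, y = 20 * 1.05^x)

# Rows where x lies between 60 and 80 (inclusive)
in_range <- salmon[salmon$x >= 60 & salmon$x <= 80, ]
print(in_range)

# Smallest grid value of x at which the population exceeds 1000
first_x_over_1000 <- min(salmon$x[salmon$y > 1000])

# Tolerance-based lookup at x = 70, instead of exact equality
y_at_70 <- salmon$y[abs(salmon$x - 70) < 1e-9]
```

Note that the grid is spaced 1.4 apart, so values of x that are not on the grid will not be found this way.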
It's also simple to answer the 2nd part of your question using this object:
salmon$z <- salmon$y*5 # I am using * instead of + to make the plot more clear
plot(x=salmon$x,y=salmon$z, xlab='Time passed since 1890', ylab='Population of Salmon',type="l")
lines(salmon$x,salmon$y, col="blue")
curve is plotting the function 20*1.05^x
so just plug any value you want in that function instead of x, e.g.
> 20*1.05^70
[1] 608.5285
20*1.05^(seq(from=0, to=70, by=10))
was all I had to do. Until Ed posted his reply, I had forgotten that I could type a function directly into R.
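For the increments of 5 mentioned in the edit to the question, the same idea might look like the sketch below. The function name salmon_pop and the data frame name result are made up here, and the upper limit of 140 is simply taken from the range of the original curve() call.

```r
# Population model from the question: 20 * 1.05^x
salmon_pop <- function(years) 20 * 1.05^years

# Evaluate at every 5th year over the plotted range
years <- seq(from = 0, to = 140, by = 5)
result <- data.frame(years = years, population = salmon_pop(years))
print(head(result))

# A single value, e.g. at 70 years
salmon_pop(70)
```

Wrapping the expression in a function means the formula is written once and can be reused for any set of x values.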
I've been struggling to get a plot that shows my data accurately, and spent a while getting gap.plot up and running. After doing so, I have an issue with labelling the points.
Just plotting my data ends up with this:
Plot of abundance data, basically two different tiers of data at ~38,000, and between 1 - 50
As you can see, that doesn't clearly show either the top or the bottom sections of my plots well enough to distinguish anything.
Using gap plot, I managed to get:
gap.plot of abundance data, 100 - 37000 missed, labels only appearing on the lower tier
The code for my two plots is pretty simple:
plot(counts.abund1,pch=".",main= "Repeat 1")
text(counts.abund1, labels=row.names(counts.abund1), cex= 1.5)
gap.plot(counts.abund1[,1],counts.abund1[,2],gap=c(100,38000),gap.axis="y",xlim=c(0,60),ylim=c(0,39000))
text(counts.abund1, labels=row.names(counts.abund1), cex= 1.5)
But I can't figure out why the labels (which are just the letters that the points denote) are not being applied in the same way in the two plots.
I'm kind of out of my depth with this bit. I have very little idea how to plot things like this nicely, and never had data like it when learning.
The data originally comes from a large matrix (10,000 x 10,000) that contains a random assortment of the letters a to z and then undergoes replacements and "speciation" or "immigration", which results in the first lot of letters at ~38,000 and the second lot normally below 50.
The code I run after getting that matrix to get the rank abundance is:
##Abundance 1
counts1 <- as.data.frame(as.list(table(neutral.v1)))
counts.abund1<-rankabundance(counts1)
With neutral.v1 being the matrix.
The data frame for counts.abund1 looks like (extremely poorly formatted, sorry):
  rank abundance proportion plower pupper accumfreq logabun rankfreq
a    1     38795        3.9    NaN    NaN       3.9     4.6      1.9
x    2     38759        3.9    NaN    NaN       7.8     4.6      3.8
j    3     38649        3.9    NaN    NaN      11.6     4.6      5.7
m    4     38639        3.9    NaN    NaN      15.5     4.6      7.5
and continues for all the variables. I only use rank and abundance right now. The row names a, x, j, m are the variables each row applies to, and they are what I want to use as the labels on the plot.
Any advice would be really appreciated. I can't really shorten the code too much or provide the matrix because the type of data is quite specific, as are the quantities in a sense.
As I mentioned, I've been using gap.plot to just create a break in the axis, but if there are better solutions to plotting this type of data I'd be absolutely all ears.
Really sorry that this is a mess of a question, bit frazzled on the whole thing right now.
gap.plot() doesn't draw two plots. It draws a single plot by shifting the upper section's values down, drawing an additional box, and rewriting the axis tick labels. So the y-coordinate in the upper region matches neither the original value nor the axis tick labels. The real y-coordinate in the upper region is "original value" - diff(gap).
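library(plotrix)  # provides gap.plot()
gap.plot(counts.abund1[,1], counts.abund1[,2], gap=c(100,38000), gap.axis="y",
xlim=c(0,60), ylim=c(0,39000))
# labels for the lower tier, at their original coordinates
text(counts.abund1, labels=row.names(counts.abund1), cex= 1.5)
# labels for the upper tier, shifted down by the size of the gap
text(counts.abund1[,1], counts.abund1[,2] - diff(c(100, 38000)), labels=row.names(counts.abund1), cex=1.5)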
# the example data I used
set.seed(1)
counts.abund1 <- data.frame(rank = 1:50,
abundance = c(rnorm(25, 38500, 100), rnorm(25, 30, 20)))
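The coordinate shift described in this answer can be wrapped in a small helper so that one text() call covers both tiers. This is only a sketch: shift_for_gap is a made-up name, it assumes a single gap given as c(lower, upper), and it relies on the statement above that upper-tier points are drawn at "original value" - diff(gap).

```r
# Map original y values to the coordinates used on a plot with one y-axis gap.
# Assumption (from the answer above): points at or above the gap are drawn
# at y - diff(gap); points below the gap keep their original value.
shift_for_gap <- function(y, gap) {
  ifelse(y >= gap[2], y - diff(gap), y)
}

gap <- c(100, 38000)
y <- c(38795, 38759, 42, 7)   # two upper-tier and two lower-tier values
shifted <- shift_for_gap(y, gap)
print(shifted)

# Possible usage after gap.plot() (needs the plotrix package, not run here):
# text(counts.abund1[,1], shift_for_gap(counts.abund1[,2], gap),
#      labels = row.names(counts.abund1), cex = 1.5)
```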
Using a for loop, I need to make three separate plots with regression lines on the same figure, one for each unique value found in the Group column (3 plots with different data and regression lines).
Group Height Weight
A 5.6 59.5
A 5.9 68.7
B 4.8 57.1
B 5.0 42.9
C 7.3 43.3
C 7.1 39.7
I tried this code which only gives me the points for group C for some reason? I don't think this is what I really want though, as I need the points with a regression line.
for(i in unique(df$Group)){
example<-subset(df,df$Group==i)
plot(example$Height,example$Weight)
}
I need one set of data points for each Group A, B, and C on one figure with a regression line for each, using a for loop.
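One possible approach, sketched under a few assumptions: the data frame is called df as in the code above, "on the same figure" means three panels side by side, and an ordinary least-squares fit from lm() is what is wanted. Each call to plot() starts a new plot, which is likely why only group C remained visible. Setting par(mfrow=...) first gives each group its own panel. The list fits is a made-up name used to keep the fitted models.

```r
# Example data copied from the question
df <- data.frame(
  Group  = c("A", "A", "B", "B", "C", "C"),
  Height = c(5.6, 5.9, 4.8, 5.0, 7.3, 7.1),
  Weight = c(59.5, 68.7, 57.1, 42.9, 43.3, 39.7)
)

groups <- unique(df$Group)
fits <- list()                               # keep each fitted model
op <- par(mfrow = c(1, length(groups)))      # one panel per group

for (g in groups) {
  sub <- df[df$Group == g, ]
  fit <- lm(Weight ~ Height, data = sub)     # regression of Weight on Height
  plot(sub$Height, sub$Weight, main = paste("Group", g),
       xlab = "Height", ylab = "Weight")
  abline(fit)                                # add the regression line
  fits[[g]] <- fit
}

par(op)                                      # restore the previous layout
```

If all three groups should share one set of axes instead, a single plot() of the full data followed by abline() inside the loop (with a different col per group) would be the analogous variant.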
I'm fairly new to R, but I am trying to create line graphs that monitor the growth of bacteria over time. I can do this successfully, but the resulting graph isn't to my satisfaction. This is because my time increments are not evenly spaced, yet R plots them at equal intervals. Here is some sample data to give you an idea of what I'm talking about.
x=c(.1,.5,.6,.7,.7)
plot(x,type="o",xaxt="n",xlab="Time (hours)",ylab="Growth")
axis(1,at=1:5,lab=c(0,24,72,96,120))
As you can see, there are 48 hours between 24 and 72, but the points are evenly spaced on the graph. Is there any way I can adjust the scale to display my data more accurately?
It's always best in R to use data structures that exhibit the relationships between your data. Instead of defining growth and time as two separate vectors, use a data frame:
growth <- c(.1,.5,.6,.7,.7)
time <- c(0,24,72,96,120)
df <- data.frame(time,growth)
print(df)
time growth
1 0 0.1
2 24 0.5
3 72 0.6
4 96 0.7
5 120 0.7
plot(df, type="o")
Not sure if this produces the exact x-axis labels that you want, but you should be free to edit the graph now without changing the relationship between the growth and time variables.
x=data.frame(x=c(.1,.5,.6,.7,.7), y=c(0,24,72,96,120))
plot(x$y, x$x,type="o",xaxt="n",xlab="Time (hours)",ylab="Growth")
axis(1,at=x$y)  # xaxt="n" suppresses the axis, so draw ticks at the actual time points
I have data in tab-delimited format with nearly 400 columns filled with values, e.g.
X Y Z A B C
2.34 .89 1.4 .92 9.40 .82
6.45 .04 2.55 .14 1.55 .04
1.09 .91 4.19 .16 3.19 .56
5.87 .70 3.47 .80 2.47 .90
Now I want to visualize the data using box plots. Since it is difficult to view 400 boxes in a single PDF, I want to split them into groups of 50 each (i.e. 50 x 8). Here is the code I used:
boxplot(data[1:50],xlab="Samples",xlim=c(0.001,70),log="xy",
pch='.',col=rainbow(ncol(data[1:50])))
but I got the following error:
In plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs)
: nonfinite axis limits [GScale(-inf,4.4591,2, .); log=1]
I want to view the box plots for the 400 samples with 50 each in 8 different PDFs. Please help me get a better visualization.
Others have already pointed out that actual boxplots are not going to work well. However, there is a very efficient way to visually scan all of your variables: Simply plot their distributions as an image (i.e. heatmap). Here is an example showing how it is really quite easy to get the gist of 400 variables and 80,000 individual data points!
# Simulate some data
set.seed(12345)
n.var = 400
n.obs = 200
data = matrix(rnorm(n.var*n.obs), nrow=n.obs)
# Summarize data
breaks = seq(min(data), max(data), length.out=51)
histdata = apply(data, 2, function(x) hist(x, plot=F, breaks=breaks)$counts)
# Plot
dev.new(width=4, height=4)
image(1:n.var, breaks, t(histdata), xlab='Variable Index', ylab='Histogram Bin')
This will be most useful if all your variables are comparable, or are at least sorted into rational groups. hclust and the heatmap functions can also be helpful here for more complicated displays. Good luck!
I agree that you will have to do something a bit drastic to distinguish 400 boxes in the same graph. The code below uses two tricks: (1) reverse the usual x-y order so that it's easier to read the labels (plotted on the y axis); (2) send the output to a tall, skinny PDF file so that you can scroll through it at your leisure. I also opted to sort the variables by mean, to make the plot easier to interpret -- that would be optional, but I suspect you'd have a hard time looking up a particular category in a 400-box plot in any case ...
nc <- 400
z <- as.data.frame(matrix(rnorm(nc*100),ncol=nc))
library(ggplot2)
library(reshape2)  # provides melt()
m <- melt(z)
m <- transform(m,variable=reorder(variable,value))
pdf(width=10,height=50,file="boxplot.pdf")
print(ggplot(m,aes(x=variable,y=value))+geom_boxplot()+coord_flip())
dev.off()
Considering that you are plotting 400 boxes in your box plot, I am not surprised that you are having trouble seeing them. Suppose that you have a monitor that is 1024 pixels wide. Your application will only be able to display the boxes as two pixels wide. Even with larger screens you will not increase the number of pixels by much (a screen with 2000 pixels will at most show you boxes that are 5 pixels wide).
I would suggest plotting your boxes on two or more separate plots.
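A minimal sketch of that suggestion, splitting 400 columns into 8 PDF files of 50 boxes each as the question asks. The data here are simulated stand-ins, and the file names, page size, and output directory (a temporary folder) are arbitrary choices for illustration. Log axes are left out on purpose: the warning quoted in the question looks like what happens when zero or negative values meet a log scale, but that is a guess rather than something the thread confirms.

```r
# Simulated stand-in for the real data: 20 rows, 400 columns of positive values
set.seed(1)
data <- as.data.frame(matrix(rexp(20 * 400), ncol = 400))

chunk_size <- 50
starts <- seq(1, ncol(data), by = chunk_size)   # first column of each chunk
out_dir <- tempdir()                            # arbitrary output location
files <- character(0)

for (k in seq_along(starts)) {
  cols <- starts[k]:min(starts[k] + chunk_size - 1, ncol(data))
  f <- file.path(out_dir, sprintf("boxplot_%02d.pdf", k))
  pdf(f, width = 12, height = 6)
  boxplot(data[cols], las = 2, cex.axis = 0.6, pch = ".",
          col = rainbow(length(cols)), ylab = "Value")
  dev.off()                                     # close the file before the next one
  files <- c(files, f)
}

print(files)
```

Opening one pdf() device per chunk gives separate files. Opening a single pdf() before the loop and closing it afterwards would instead give one file with 8 pages.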