How to plot minimum, maximum, and mean in R

I've been reading about how to plot points in R, but can't find anything that matches my problem. My data is a matrix; each row starts with a column called 'site', followed by three columns containing the parameters: minimum, mean, and maximum. There are four rows in the matrix, corresponding to 4 sites.
What I want is a graph that has the 4 sites on the x-axis and the three data points (min, mean, max) above each site, connected by a line. The mean would be represented by a circle, and the min and max by crossbars. The means would also be connected to each other by a line. The output would look like a boxplot without the boxes and with a line connecting the means.
Can anyone help me? It seems like a simple problem but I'm stumped.

Define a random matrix:
set.seed(1)
n_sites <- 4
myMatrix <- cbind(t(replicate(n_sites,sort(rnorm(3)))),1:n_sites)
dimnames(myMatrix) <- list(paste("Site",1:n_sites),c("Min","Mean","Max","n"))
Plot:
plot(c(1,n_sites),range(myMatrix),type="n",xlab="",ylab="",xaxt="n",las=1)
axis(1,1:n_sites,rownames(myMatrix))
arrows(x0=1:n_sites,y0=myMatrix[,"Min"],x1=1:n_sites,y1=myMatrix[,"Max"],angle=90,code=3,length=0.1)
points(1:n_sites,myMatrix[,"Mean"],bg="white",pch=21,type="o")
text(1:n_sites,myMatrix[,"Max"],myMatrix[,"n"],pos=3)
I like using arrows() in cases like this.

Related

Correlation between 3 continuous variables

I am pretty new to statistics, and I am stuck with this.
I have the data containing birth weight, length of baby and head circumference.
I need to explain how they are related.
How can I do that?
Thank you very much :)
I was thinking of doing a Pearson test between each pair.
You could:
First, start by computing the correlations (Pearson or another kind) between each pair of variables.
Then you would probably want to draw a scatter plot for each pair to visualize the relationship. The plot shows whether higher values of one variable are associated with higher values of the other.
Lastly, you could also check each variable's distribution using a box plot. This shows the median, the quartiles, and any outliers; the mean and standard deviation are not drawn by a box plot and need to be computed separately.
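The three steps above can be sketched in base R as follows. The data here is simulated purely for illustration; the column names weight, length_cm, and head_cm and all the numbers are assumptions, not the asker's data.

```r
# Hypothetical stand-in data: birth weight (g), length (cm), head circumference (cm).
set.seed(42)
n <- 100
weight    <- rnorm(n, mean = 3400, sd = 450)
length_cm <- 50 + 0.004 * (weight - 3400) + rnorm(n, sd = 1.5)
head_cm   <- 35 + 0.002 * (weight - 3400) + rnorm(n, sd = 1)
babies <- data.frame(weight, length_cm, head_cm)

# Step 1: pairwise Pearson correlations, plus a significance test for one pair.
cor_mat <- cor(babies, method = "pearson")
print(round(cor_mat, 2))
print(cor.test(babies$weight, babies$length_cm))

# Step 2: scatter plot for every pair of variables.
pairs(babies)

# Step 3: one box plot per variable (separate panels because the units differ).
old_par <- par(mfrow = c(1, 3))
for (v in names(babies)) boxplot(babies[[v]], main = v)
par(old_par)

# Mean and standard deviation are computed separately, not read off the box plot.
print(sapply(babies, function(col) c(mean = mean(col), sd = sd(col))))
```

With real data, replace the simulated data frame with your own three columns; method = "spearman" is an option in cor() if the relationships look monotonic but not linear.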

How to create a simplex (ternary) plot with color-coded triangles in R?

I have a matrix with 4 variables, of which 3 are parameters and the 4th gives the mean sum of squares of the simulation results for the corresponding parameter values. Now I'd like to create a ternary plot in R where the triangle corresponding to the 3 parameter values is colored by the mean sum of squares value. Alternatively, I'd like to plot the interpolated mean sum of squares over the whole simplex triangle.
I have already looked for functions or code that do this, but without success.
Nevertheless, here's example code showing what my data set looks like (for which I'd like to create the ternary plot):
grid <- as.matrix(expand.grid(seq(0,0.5,0.025), seq(0,0.5,0.025), seq(-0.25,0.25,0.025)))
data <- cbind (grid, runif(9261,0,2))
I'd be very thankful if you'd provide R code that can create the plot I'd like to get. Maybe there's even a pre-implemented function in a package that I haven't found?!
Thanks a lot in advance for your help!

R - hexbinplot of two datasets

How can I visualize overlapping values between two datasets in R? Preferably, I'd like to use a hexbin plot (http://www.everydayanalytics.ca/2014/09/5-ways-to-do-2d-histograms-in-r.html).
Here I have a dataset with two variables.
Variable A: http://pastebin.com/0ayrgU9C
Variable B: http://pastebin.com/9WZQWXsA
In R you can load the data via
data1 <- read.table("http://pastebin.com/raw.php?i=0ayrgU9C", header=TRUE)
data2 <- read.table("http://pastebin.com/raw.php?i=9WZQWXsA", header=TRUE)
The values in the variables range from 0.1 to a maximum of 1.0. The two sets have different sizes (numbers of rows). Now, how can I visualize in which area the two sets overlap?
It should be red where the most values appear in both datasets. I assume that equal bins have to be created in order to see overlap within certain ranges, but I'm not sure how to do this either. I know that a kernel density plot is an alternative, but I want to find out how a hexbin plot can solve it, too.
It would be great to see a solution with the provided dataset.

Is there a way to plot a frequency histogram from a continuous variable?

I have DNA segment lengths (relative to chromosome arm, 251296 entries), as such:
0.24592963
0.08555043
0.02128725
...
The range goes from 0 to 2, and I would like to make a continuous relative frequency plot. I know that I could bin the values and use a histogram, but I would like to show continuity. Is there a simple strategy? If not, I'll use binning. Thank you!
EDIT:
I have created a binning vector with 40 equally spaced values between 0 and 2 (both included). For simplicity's sake, is there a way to round each of the 251296 entries to the closest value within the binning vector? Thank you!
Given that most of your values are not duplicated and thus don't have an easy way to derive a value for plotting on the y-axis, I'd probably go for a density plot. This will highlight dense segment lengths i.e. where you have lots of segment lengths occurring near each other.
d <- c(0.24592963, 0.08555043, 0.02128725)
plot(density(d), xlab="DNA Segment Length", xlim=c(0,2))
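A slightly fuller sketch of the same idea, which also covers the rounding asked about in the EDIT. The vector seg_len is simulated stand-in data (an assumption, not the real segment lengths); the binning vector matches the one described in the question (40 equally spaced values from 0 to 2).

```r
# Hypothetical stand-in for the segment lengths, restricted to [0, 2].
set.seed(1)
seg_len <- pmin(rexp(1000, rate = 4), 2)

# Continuous view: kernel density estimate evaluated on [0, 2].
dens <- density(seg_len, from = 0, to = 2)
plot(dens, xlab = "DNA Segment Length", main = "Density of segment lengths")

# Binned view: snap every value to the nearest of 40 equally spaced points.
bins <- seq(0, 2, length.out = 40)
step <- bins[2] - bins[1]
snapped <- bins[round(seg_len / step) + 1]

# Relative frequency of each bin value (bins with no data get 0).
rel_freq <- table(factor(snapped, levels = bins)) / length(seg_len)
plot(bins, as.numeric(rel_freq), type = "h",
     xlab = "DNA Segment Length", ylab = "Relative frequency")
```

The index trick round(seg_len / step) + 1 only works because the bins are equally spaced and start at 0; for irregular bins, something like findInterval() on the bin midpoints would be needed instead.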

R question about plotting probability/density histogram the right way

I have the following matrix [500,2], i.e. 500 rows and 2 columns. The left column gives the index (x value) of the observations, and the right column gives the probability with which that x occurs, so it is a typical probability density relationship.
So my question is how to plot the histogram the right way, so that the x-axis is the x index and the y-axis is the density (0.01-1.00). The bandwidth of the estimator is 0.33.
Thanks in advance!
The end of the data looks like this, just for orientation:
[490,] 2.338260830 0.04858685
[491,] 2.347839477 0.04797310
[492,] 2.357418125 0.04736149
[493,] 2.366996772 0.04675206
[494,] 2.376575419 0.04614482
[495,] 2.386154067 0.04553980
[496,] 2.395732714 0.04493702
[497,] 2.405311361 0.04433653
[498,] 2.414890008 0.04373835
[499,] 2.424468656 0.04314252
[500,] 2.434047303 0.04254907
@everyone,
Yes, I have already done the estimation, and the bandwidth is what I mentioned. The data is ordered from low to high values, so the probability is 0.22 at the beginning, about 0.48 at the peak, and 0.15 at the end.
The density line plots like a charm, but in addition I have to plot a histogram! So how can I do this and order the blocks properly (how should the data be split into bins, etc.)?
Any suggestions?
Here is a part of the data AFTER the estimation. All values are discrete, so I assume a histogram can be created, hopefully.
[491,] 4.956164 0.2618131
[492,] 4.963014 0.2608723
[493,] 4.969863 0.2599309
[494,] 4.976712 0.2589889
[495,] 4.983562 0.2580464
[496,] 4.990411 0.2571034
[497,] 4.997260 0.2561599
[498,] 5.004110 0.2552159
[499,] 5.010959 0.2542716
[500,] 5.017808 0.2533268
[501,] 5.024658 0.2523817
Best regards,
I appreciate the fast responses! (bow)
Would it do the job to create a histogram just for the indexes, grouping them into blocks of 25 or 50 each, for instance, and computing the average probability for each block (25, 50, 100, 150, 200, 250, etc.) as the bars?
Assuming the rows are in order from lowest to highest value of x, as they appear to be, you can use the default plot command, the only change you need is the type:
plot(your.data, type = 'l')
EDIT:
Ok, I'm not sure this is better than the density plot, but it can be done:
x = dnorm(seq(-1, 1, length = 500))
x.bins = rep(1:50, each = 10)
bars = aggregate(x, by = list(x.bins), FUN = sum)[,2]
barplot(bars)
In your case, replace x with the probabilities from the second column of your matrix.
EDIT2:
On second thought, this only makes sense if your 500 rows represent discrete events. If they are instead points along a continuous distribution function, adding them together as I have done is incorrect. Mathematically, I don't think you can produce the binned probability for a range using only a few points from within that range.
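If the rows are points on a continuous density curve, one alternative (along the lines of the averaging suggested in the question) is to use the mean density within each block as the bar height instead of the sum, so the bars stay on the density scale. This is only a sketch under assumptions: M is a made-up stand-in for the 500 x 2 matrix, the x values are taken to be equally spaced, and a bin size of 10 rows is an arbitrary choice.

```r
# Hypothetical stand-in for the 500 x 2 matrix: x values and an already-estimated density.
x_vals <- seq(-3, 3, length.out = 500)
M <- cbind(x_vals, dnorm(x_vals))

# Group consecutive rows into 50 bins of 10 rows each.
bin_id <- rep(1:50, each = 10)

# Bar height = mean density within the bin (not the sum).
bar_height <- tapply(M[, 2], bin_id, mean)
bin_mid    <- tapply(M[, 1], bin_id, mean)

# Histogram-like bars, labelled with the bin midpoints.
barplot(bar_height, names.arg = round(bin_mid, 2), space = 0, las = 2,
        xlab = "x", ylab = "Density")

# Rough check: bar height times bin width should sum to roughly 1
# when the x range covers most of the distribution.
bin_width <- 10 * (x_vals[2] - x_vals[1])
print(sum(bar_height * bin_width))
```

The bars are only an approximation of the curve that was already estimated, so they add no information beyond the density line itself.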
Assuming M is the matrix, wouldn't this just be:
plot(x=M[ , 1], y = M[ , 2] )
You have already done the density estimation, since this is not the original data.
