Suppose you have two vectors: the first holds the values a variable takes, and the second holds the probability (frequency) of each value. They look like this:
a <- c(2.1,7.5,5.2,6,5.8)
b <- c(1/8,1/8,1/3,5/24,5/24)
I have many such pairs of vectors. Suppose the values in all a vectors lie in the range from 1 to 10.
What I need is two different graphs:
The first is a barplot/histogram of the values of a based on the frequencies b (so the heights of the bars equal the frequencies), with the x axis showing only the ordered values of a.
The second is a similar graph, but the x axis should cover all possible values of the a vectors, so the axis should run from 1 to 10 and the bars should appear only above the relevant values of a.
In both cases the bars should have the same width. My question is: how do I plot these graphs?
I think what I want is called a histogram, but when I researched this on this site before posting, I became very confused about the difference between bar plots and histograms.
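One possible way to draw both graphs in base R, offered as a sketch rather than a definitive answer: barplot() covers the first graph once the values are sorted, and plot() with type = "h" (vertical lines of a fixed thickness) gives equal-width bars at their true positions for the second. The line width of 8 and the axis labels are arbitrary choices made for illustration.

```r
a <- c(2.1, 7.5, 5.2, 6, 5.8)
b <- c(1/8, 1/8, 1/3, 5/24, 5/24)

# Graph 1: one bar per value of a, in increasing order of a
ord <- order(a)
barplot(b[ord], names.arg = a[ord], xlab = "a", ylab = "Frequency")

# Graph 2: x axis runs from 1 to 10, bars only above the observed values.
# type = "h" draws vertical lines; a thick lwd with square ends makes them
# look like bars of identical width.
plot(a, b, type = "h", lwd = 8, lend = "butt",
     xlim = c(1, 10), ylim = c(0, max(b)),
     xlab = "a", ylab = "Frequency", xaxt = "n")
axis(1, at = 1:10)
```

Strictly speaking these are bar plots: a histogram bins raw observations, whereas here the frequencies are already given.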
Hi, I'm trying to make a graph with the sample mean on the x-axis and the relative frequency on the y-axis.
Let me give an example.
If I pick a sample of size 1 from c(1,2,3,4,5), the possible results are 1, 2, 3, 4 and 5.
In that case the relative frequency of each is 1/5.
So the graph will show 1, 2, 3, 4, 5 on the x-axis and 0.2 on the y-axis (because each has relative frequency 1/5).
If I pick a sample of size 2 from c(1,2,3,4,5), the cases are (1,2), (1,3), (1,4), (1,5), (2,3), ... and so on (10 cases in total).
The sample means are (1+2)/2 = 1.5, (1+3)/2 = 2, ... etc.
So in this case the x values will be 1.5, 2, ... etc. and the y values will be 1/10, 1/10, ...
So my question is: is a histogram appropriate for this graph?
I want a plot with the sample mean on the x-axis and the relative frequency on the y-axis, with a line connecting the dots.
Sorry for the long question.
Thanks for reading!
Yes, it's entirely appropriate to plot a histogram of sample means. This is an example of a sampling distribution.
To do this, you would create an object that contains the sample means, and then just plot a histogram of that object as you would with any other histogram. The value of the sample means would be on the x axis, and frequency or relative frequency on the y axis. You would have to choose an appropriate bin number and breaks vector for your purpose, but it's the same as any other histogram.
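As a rough sketch of that recipe for the size-2 example from the question, assuming sampling without replacement and base R only: combn() enumerates the 10 pairs and computes each mean, and table() turns them into relative frequencies. Note that several pairs share a mean (for instance (1,4) and (2,3) both give 2.5), so the relative frequencies are not all 1/10. Dots joined by a line are used here, as the question asks, instead of hist().

```r
x <- c(1, 2, 3, 4, 5)

# Mean of every possible sample of size 2 (10 pairs)
sample_means <- combn(x, 2, FUN = mean)

# Relative frequency of each distinct sample mean
rel_freq <- table(sample_means) / length(sample_means)
means <- as.numeric(names(rel_freq))

# Dots connected by a line: sample mean on x, relative frequency on y
plot(means, as.numeric(rel_freq), type = "b", pch = 16,
     ylim = c(0, max(rel_freq)),
     xlab = "Sample mean", ylab = "Relative frequency")
```

Calling hist(sample_means, freq = FALSE) on the same object would give the histogram version, with the bin breaks left to choose.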
To illustrate:
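library(scatterplot3d)  # provides scatterplot3d()
x <- rep(c(rep(1, 4), rep(3, 4), rep(6, 4)), 4)  # months: 1, 3, 6
y <- rep(c(1, 10, 100, 1000), 12)                # four parameter values
z <- runif(48)
mydata <- cbind(x, y, z)
scatterplot3d(mydata, pch = 16, xlab = "months",
              ylab = "parameter", zlab = "values")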
Here, x is the number of months, of which I have only three categories, and y holds the values of the four parameters used in my study. I want the x axis to show only 1, 3, 6 and the y axis to show only 1, 10, 100, 1000, equally spaced along the axis. The present plot labels the y axis 0, 200, 400, 600, 800, 1000, so the data points for y values of 1, 10 and 100 are squeezed into a narrow zone.
set(gca,'yscale','log')
A log scale on the y-axis should give you what you're looking for in this instance. More generally, if you want axis points to be equidistant regardless of their values, you might consider storing the values as strings and seeing what that does for you.
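The set(gca, ...) call above is MATLAB syntax, while the question's code is R. A minimal R sketch of the same two ideas, assuming base graphics and shown in 2D to stay self-contained: transforming with log10() makes 1, 10, 100, 1000 equally spaced, and treating the values as categories (a factor) does the same regardless of their magnitude. The variable names y_log, y_cat and y_pos are made up for this illustration.

```r
y <- rep(c(1, 10, 100, 1000), 12)
z <- runif(48)

# Idea 1: log scale. log10 maps 1, 10, 100, 1000 to 0, 1, 2, 3.
y_log <- log10(y)

# Idea 2: categories. The factor codes are 1, 2, 3, 4 whatever the values.
y_cat <- factor(y)
y_pos <- as.integer(y_cat)

# Plot on the transformed scale, but label the ticks with the original values
plot(y_log, z, xaxt = "n", pch = 16, xlab = "parameter", ylab = "values")
axis(1, at = 0:3, labels = 10^(0:3))
```

The transformed column could then be passed to a 3D plotting function in place of y, with the tick labels replaced in the same spirit.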
I have DNA segment lengths (relative to chromosome arm, 251296 entries), as such:
0.24592963
0.08555043
0.02128725
...
The range goes from 0 to 2, and I would like to make a continuous relative frequency plot. I know that I could bin the values and use a histogram, but I would like to show continuity. Is there a simple strategy? If not, I'll use binning. Thank you!
EDIT:
I have created a binning vector with 40 equally spaced values between 0 and 2 (both included). For simplicity's sake, is there a way to round each of the 251296 entries to the closest value within the binning vector? Thank you!
Given that most of your values are not duplicated and thus don't have an easy way to derive a value for plotting on the y-axis, I'd probably go for a density plot. This will highlight dense segment lengths i.e. where you have lots of segment lengths occurring near each other.
d <- c(0.24592963, 0.08555043, 0.02128725)
plot(density(d), xlab="DNA Segment Length", xlim=c(0,2))
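For the follow-up in the EDIT, one way to snap each entry to the closest of 40 equally spaced values is plain arithmetic on the grid spacing. This is a sketch that assumes all values lie within [0, 2]; the names bins, step, idx and nearest are chosen for illustration, and only the three values shown in the question are used.

```r
d <- c(0.24592963, 0.08555043, 0.02128725)

# 40 equally spaced values between 0 and 2, both included
bins <- seq(0, 2, length.out = 40)
step <- bins[2] - bins[1]

# Index of the closest bin value for every entry, then the value itself
idx <- round((d - bins[1]) / step) + 1
nearest <- bins[idx]

# Relative frequency per bin value, drawn as a line instead of bars
rel_freq <- table(factor(idx, levels = seq_along(bins))) / length(d)
plot(bins, as.numeric(rel_freq), type = "l",
     xlab = "DNA Segment Length", ylab = "Relative frequency")
```

Because the operations are vectorised, the same lines should scale to the full 251296 entries without a loop.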
Is it possible to generate a heatmap taking into consideration both the color and the transparency, with these two parameters given from two different matrices (matrix 1 defines color, matrix 2 defines alpha)?
A little more information on what I'm after:
I have successfully used R and the heatmap.2 function in the gplots package to generate heatmaps - in this case to visualize miRNA interactions. Here, what I want to show is the probability of a particular nucleotide along the typical 20-24 nucleotides of the miRNA in being engaged in target pairing. My heatmap matrix consists of miRNAs (rows) and positions 1-24 (columns) with numeric paring probability in each cell. An example would be changing the alpha parameter of the color determined by the matrix values, such that white=no pairing and dark red=high pairing.
The heatmap.2 function works great for a single such plot, but I would now like to take in overlap information from two different species. Thus, I would need my heatmap to basically consider two matrices:
1) A matrix with the degree of species overlap, e.g. ranging from red-purple-blue for species1-only to species1+2 to species2-only.
2) A matrix with the average degree of pairing, e.g. visualized by the alpha parameter going from a weak-to-strong average pairing (whatever the color) at a given position in matrix 1.
I have tried to use the principles from this post:
Place 1 heatmap on another with transparency in R
But haven't been able to apply its suggestions to my own question.
Thanks in advance!
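One possible approach, sketched in base R instead of heatmap.2 and not a tested solution for the real data: build one colour string per cell, taking the hue from the first matrix via colorRamp() and the alpha channel from the second via rgb(), then draw the cells with rect(). The two matrices below are random placeholders scaled to [0, 1]; the names overlap, pairing and col_mat are hypothetical.

```r
set.seed(1)
n_mirna <- 5
n_pos <- 24

# Placeholder inputs, both scaled to [0, 1]:
# overlap: 0 = species 1 only, 0.5 = both, 1 = species 2 only
# pairing: 0 = weak average pairing, 1 = strong average pairing
overlap <- matrix(runif(n_mirna * n_pos), n_mirna, n_pos)
pairing <- matrix(runif(n_mirna * n_pos), n_mirna, n_pos)

# Matrix 1 sets the colour (red - purple - blue), matrix 2 sets the alpha
ramp <- colorRamp(c("red", "purple", "blue"))
rgb_vals <- ramp(as.vector(overlap))  # one row of R, G, B (0-255) per cell
cols <- rgb(rgb_vals[, 1], rgb_vals[, 2], rgb_vals[, 3],
            alpha = as.vector(pairing) * 255, maxColorValue = 255)
col_mat <- matrix(cols, n_mirna, n_pos)

# Draw one unit square per cell: columns are positions, rows are miRNAs
plot(NA, xlim = c(0.5, n_pos + 0.5), ylim = c(0.5, n_mirna + 0.5),
     xlab = "Position", ylab = "miRNA", xaxs = "i", yaxs = "i")
rect(col(col_mat) - 0.5, row(col_mat) - 0.5,
     col(col_mat) + 0.5, row(col_mat) + 0.5,
     col = col_mat, border = NA)
```

This drops the clustering and dendrograms that heatmap.2 provides, so row order would have to be fixed beforehand.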
I have two data sets that I am comparing using a kde2d contour plot on a log10 scale.
Here I will use the following example data sets:
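library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n = 100)  # 2D kernel density estimate on a 100 x 100 grid
filled.contour(density, color.palette = colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred')))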
This produces the following plot,
Now my question is: what do the z values on the legend actually mean? I know they show where most of the data lies, but the 0-15 range confuses me. I thought it could be a percentage, but without the log10 scale I get values ranging from 0 to 1. I have also produced plots with scales of 1-1.2 and 1-2 using my real data.
The colors represent the values of the estimated density function, apparently ranging from 0 to 15. Just like with your other question about the odd-looking linear regression, I can relate to your confusion.
You just have to understand that a density's integral over the full domain has to be 1, so you can use it to calculate the probability of an observation falling into a specific region. The density values themselves are not probabilities and can exceed 1 when the data are concentrated in a small area, as they are on the log10 scale.
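A rough numerical check of this, assuming the MASS package is available: summing the estimated density over the grid and multiplying by the area of one grid cell approximates the integral. The grid limits are widened by one bandwidth on each side (an assumption of this sketch, not part of the question's code) so that the tails are not cut off, and the region bounds are arbitrary.

```r
library(MASS)  # kde2d(), bandwidth.nrd()
set.seed(42)

b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))

# Extend the grid one bandwidth beyond the data so the tails are included
ha <- bandwidth.nrd(a)
hb <- bandwidth.nrd(b)
lims <- c(range(a) + c(-1, 1) * ha, range(b) + c(-1, 1) * hb)
dens <- kde2d(a, b, n = 200, lims = lims)

# Riemann sum: density height times the area of one grid cell
cell_area <- diff(dens$x[1:2]) * diff(dens$y[1:2])
total <- sum(dens$z) * cell_area  # should be close to 1

# Probability of falling in a region: integrate the density over that region
region <- outer(dens$x > 0.2 & dens$x < 0.3,
                dens$y > 0.2 & dens$y < 0.4, "&")
p_region <- sum(dens$z[region]) * cell_area

max(dens$z)  # peak density, well above 1 for these data
```

The peak exceeds 1 because the points occupy an area much smaller than one square unit, so the surface has to be tall for its volume to equal 1.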