In Excel, when we plot a histogram we can define bins, and values greater than the last bin are shown as "More" in the histogram. Can we do something similar in R (using the base plotting system)?
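As Roman said, you can use the cut function:
r <- cut(x, breaks = c(0, 50, Inf), labels = c("lev1", "lev2"))
will partition x into two levels, with everything above 50 in "lev2". Because r is a factor, draw it with barplot(table(r)) (or plot(r)) rather than hist, which only accepts numeric data.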
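A minimal sketch of an Excel-style "More" bin built this way, assuming some hypothetical positive sample data `x` and bins of width 10 up to 100. The last break is `Inf`, so every value above 100 falls into one extra interval, which is then labelled "More":

```r
set.seed(1)
x <- rexp(200, rate = 1 / 30)  # hypothetical skewed, positive data

# Breaks 0, 10, ..., 100, then Inf: the final interval (100, Inf] acts as Excel's "More" bin
breaks <- c(seq(0, 100, by = 10), Inf)
lower <- seq(0, 90, by = 10)
labels <- c(paste(lower, lower + 10, sep = "-"), "More")

# cut() maps each value to a bin label; include.lowest = TRUE keeps an exact 0 in the first bin
bins <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)

# table() counts every level in level order, including empty bins
counts <- table(bins)
barplot(counts, space = 0, las = 2, ylab = "Frequency")
```

The bin width and the cut-off of 100 are arbitrary choices for the illustration; any increasing break vector ending in `Inf` gives the same open-ended last bin.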
Yes, but this means you have to pre-calculate the bins yourself. You can use the cut function to define breaks in your data. The result is a factor whose level names show where each split was made. You can then merge factor levels and plot the result.
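A short sketch of the "merge factor levels" step, using hypothetical uniform data and eight equal-width bins. Assigning the same new name to several levels of a factor merges them into a single level:

```r
set.seed(2)
x <- runif(100, min = 0, max = 200)  # hypothetical data

# Eight equal-width bins: (0,25], (25,50], ..., (175,200]
f <- cut(x, breaks = seq(0, 200, by = 25))

# Renaming levels 5 to 8 to the same label merges everything above 100 into "More"
levels(f)[5:8] <- "More"

barplot(table(f), space = 0, ylab = "Frequency")
```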
I'm trying to make a plot in R. My x-axis is a week number converted to a factor, and my y-axis is an amount.
When I run plot(), I get horizontal lines instead of dots.
Why does this happen?
Here is a sample dataset:
df <- data.frame(fin_week=as.factor(seq(1,20, by =1)), amount=(rnorm(20)^2)*100)
plot(df)
It's because the first column is a factor. plot(df) calls plot.data.frame, which, for a two-column data frame, calls plot() on the two columns and therefore dispatches on the class of the first one, here a factor. plot.factor() with a numeric y draws a box plot for each level; since each week has only one value, each box collapses to a single horizontal line.
Try plot.default(df) instead, and you should get the scatter plot.
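A reproducible sketch of the fix, with an added `set.seed()` for the question's sample data. The second option (converting the factor back to numbers) is an alternative not given in the answer above:

```r
set.seed(42)  # reproducible version of the sample data from the question
df <- data.frame(fin_week = as.factor(seq(1, 20, by = 1)),
                 amount = (rnorm(20)^2) * 100)

# plot(df) would dispatch on the factor and draw one flat box plot per week.
# plot.default() skips that and uses the factor's integer codes as x.
plot.default(df)

# Alternative: convert the factor back to numbers explicitly.
# as.character() comes first because as.numeric() on a factor returns level codes, not labels.
week <- as.numeric(as.character(df$fin_week))
plot(week, df$amount, xlab = "Week", ylab = "Amount", pch = 19)
```

Here the level codes and the week numbers happen to coincide, but the `as.character()` route stays correct when they do not (e.g. weeks starting at 10).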
I have a matrix with approximate dimensions 20000 x 1. I would like to plot the values in a histogram with bins of width 0.01 from -0.05 to +0.15. However, the values in the matrix are fairly random, e.g. 0.0123421, 0.0124523, 0.124523, -0.011234, etc. So I first need to count the number of values that fall into each bin, and then plot a histogram. For the numbers above, I'd have 2 values between 0.01 and 0.02, 1 between -0.02 and -0.01, and so on. Is there an easy way to do this? I'm relatively new to R, so any help is appreciated!
As an example illustrating breaks (content summarized from an excellent post on R-bloggers), let's assume that you start with some normally distributed data. In R, you can generate normal data using the rnorm() function:
data <- rnorm(n=1000, mean=24.2, sd=2.2)
We can then generate a simple histogram using the following call:
hist(data)
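Now, let's assume that you want coarser or finer groups for your bins. There are a number of ways to do this. You could, for example, use the breaks argument. When breaks is a single number it is only a suggestion: R picks "pretty" breakpoints that give roughly that many bins. Below is a tidy example illustrating this: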
hist(data, breaks=20, main="Breaks=20")
hist(data, breaks=5, main="Breaks=5")
Now, if you want more control over exactly where the breakpoints between bins fall, you can be more precise and give the breaks argument a vector of breakpoints, like this:
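hist(data, breaks=c(14,17,20,23,26,29,32,35), main="Breaks is vector of breakpoints")
This dictates exactly the start and end point of each bin. The breakpoints must span the full range of the data, otherwise hist() stops with an error, so the outer breaks here leave room for the tails of the sample. Of course, you could give the breaks vector as a sequence like this to cut down on the messiness of the code:
hist(data, breaks=seq(14,35,by=3), main="Breaks is vector of breakpoints")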
Note that when you give breakpoints, R's default is that histogram cells are right-closed (left-open) intervals of the form (a,b]. You can change this with right=FALSE, which makes the intervals left-closed, of the form [a,b). This matters if many points lie exactly on a breakpoint.
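A tiny sketch of the difference, using hypothetical data with two values sitting exactly on a breakpoint and `plot = FALSE` so only the counts are computed:

```r
x <- c(1, 2, 2, 3, 4)  # hypothetical data; the two 2s sit exactly on a break
breaks <- c(0, 2, 4)

# Default right = TRUE: bins are [0,2] and (2,4], so both 2s land in the first bin
right_closed <- hist(x, breaks = breaks, plot = FALSE)$counts

# right = FALSE: bins are [0,2) and [2,4], so the 2s move to the second bin
# (include.lowest = TRUE, the default, closes the last bin on the right so 4 is counted)
left_closed <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)$counts

right_closed  # 3 2
left_closed   # 1 4
```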
hist(x, breaks = seq(-.05, .15, .01))
See ?hist
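If you also want the per-bin counts, hist() can return them without drawing anything. A sketch using the four example values from the question (with the real 20000 x 1 matrix you would pass its column instead, e.g. a hypothetical `m[, 1]`):

```r
# The four example values from the question
vals <- c(0.0123421, 0.0124523, 0.124523, -0.011234)

# 20 bins of width 0.01 from -0.05 to 0.15; every value must lie inside this range,
# otherwise hist() stops with an error
h <- hist(vals, breaks = seq(-0.05, 0.15, by = 0.01), plot = FALSE)

# h$counts holds the number of values per bin, h$mids the bin centres
data.frame(mid = h$mids, count = h$counts)[h$counts > 0, ]

plot(h)  # draws the histogram from the precomputed counts
```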
Let's say I have the following dataset
bodysize=rnorm(20,30,2)
bodysize=sort(bodysize)
survive=c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat=as.data.frame(cbind(bodysize,survive))
I'm aware that the plot() method for glm objects produces several nice plots of the fit,
but I'd nevertheless like to create an initial plot with:
1) the raw data points
2) the logistic curve
3) the predicted points
4) aggregate points for a number of predictor levels
library(Hmisc)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
All fine up to here.
Now I want to plot the observed survival rates for given levels of body size:
dat$bd<-cut2(dat$bodysize,g=5,levels.mean=T)
AggBd<-aggregate(dat$survive,by=list(dat$bd),data=dat,FUN=mean)
plot(AggBd,add=TRUE)
#Doesn't work
I've tried matching AggBd to the dataset used for the model, and all sorts of other things, but I simply can't plot the two together. Is there a way around this?
I basically want to superimpose the last plot on the same axes.
Beyond this specific task, I often wonder how to superimpose plots of different variables that share a similar scale/range on the same two-dimensional plot. I would really appreciate your help.
The first column of AggBd is a factor, so you need to convert its levels to numeric before you can add the points to the plot:
AggBd$size <- as.numeric (levels (AggBd$Group.1))[AggBd$Group.1]
To add the points to the existing plot, use points:
points (AggBd$size, AggBd$x, pch = 3)
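A self-contained sketch of the whole figure. It swaps in base R (quantile-based cut() plus tapply()) as a stand-in for Hmisc::cut2(g = 5) and aggregate(), and adds a hypothetical seed, so the grouping may differ slightly from cut2's:

```r
set.seed(1)  # hypothetical seed; the question's data are random
bodysize <- sort(rnorm(20, 30, 2))
survive <- c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat <- data.frame(bodysize, survive)

g <- glm(survive ~ bodysize, family = binomial, data = dat)

# Raw data, fitted logistic curve and fitted points, as in the question
plot(survive ~ bodysize, data = dat, xlab = "Body size", ylab = "Probability of survival")
curve(predict(g, data.frame(bodysize = x), type = "response"), add = TRUE)
points(dat$bodysize, fitted(g), pch = 20)

# Base-R stand-in for Hmisc::cut2(g = 5): five quantile groups of roughly equal size
grp <- cut(dat$bodysize,
           breaks = quantile(dat$bodysize, probs = seq(0, 1, by = 0.2)),
           include.lowest = TRUE)

# Mean body size (x) and observed survival rate (y) per group, both numeric
agg_x <- tapply(dat$bodysize, grp, mean)
agg_y <- tapply(dat$survive, grp, mean)

# points() draws on the existing plot using its current axis scales
points(agg_x, agg_y, pch = 3, col = "red", cex = 1.5)
```

Using the group mean as the x position mirrors what `levels.mean = TRUE` does in cut2.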
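You are better off giving both plots the same axis ranges. One way is par(new=TRUE), which draws the next plot on top of the current one. Set xlim and ylim identically in both calls so the scales line up, and convert the factor to numeric so you get points rather than box plots:
xr <- range(bodysize)
plot(bodysize,survive,xlim=xr,ylim=c(0,1),xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="response"),add=TRUE)
points(bodysize,fitted(g),pch=20)
#then overlay the aggregated points on the same scale
par(new=TRUE)
size <- as.numeric(levels(AggBd$Group.1))[AggBd$Group.1]
plot(size,AggBd$x,xlim=xr,ylim=c(0,1),pch=3,xaxt="n",yaxt="n",xlab="",ylab="")
The xaxt="n", yaxt="n" and empty labels stop the second plot from drawing its own axes and labels over the first.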
My problem is very simple.
I have to plot a data series in R, using bars. Data are contained in a vector vet.
I've used barplot(), which plots my data from first to last:
barplot(vet), and everything was fine.
Now, instead, I would like to plot not all of my data but just part of it: from 10% to the end.
How could I do this with barplot()?
How could I do this with plot()?
Thanks
You need to subset your data before plotting:
##Work out the 10% quantile and subset
v = vet[vet > quantile(vet, 0.1)]
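If "from 10% to the end" means by position (skip the first 10% of elements) rather than by value, a sketch with a hypothetical `vet` looks like this; it also answers the plot() part of the question:

```r
set.seed(3)
vet <- runif(50, 0, 10)  # hypothetical data standing in for the question's vector

n <- length(vet)
start <- floor(0.1 * n) + 1  # first element after the first 10% of positions
idx <- start:n

# barplot(): label each bar with its original position in vet
barplot(vet[idx], names.arg = idx)

# plot(): type = "h" draws vertical bar-like lines at the original positions
plot(idx, vet[idx], type = "h", lwd = 4, xlab = "Index", ylab = "vet")
```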
It is not clear exactly what you want to do.
If you want to plot only a subset of the bars (but the whole bars) then you could just subset the data before passing it to barplot.
If you want to plot all the bars, but only the part above the 10% level (not including 0), you can set the ylim argument, together with xpd = FALSE so the bars are clipped to the plotting region. However, barplots that do not include 0 are strongly discouraged; if 0 is not meaningful, you may be better off with a dot plot (e.g. dotchart()) instead.
If you want a regular plot but want to exclude plotting outside a given window within it, the clip function may be what you want.
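A sketch of both options with hypothetical data whose interesting range (80 to 100) is far from 0:

```r
set.seed(4)
vet <- runif(12, 80, 100)  # hypothetical data far from 0

# Restrict the y-axis to 80-100; xpd = FALSE clips the bars to the plot region
# (with barplot's default xpd = TRUE they would extend below the axis)
mids <- barplot(vet, ylim = c(80, 100), xpd = FALSE)
box()

# A dot plot has no implied zero baseline, so a truncated axis is less misleading
dotchart(vet, labels = as.character(seq_along(vet)), xlab = "vet")
```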
The gap.barplot function from the plotrix package may also be what you want.
I have a bunch of histograms to plot for data that is still coming in. As the sample sizes vary, I need to plot the histograms with percentages rather than counts in order to compare them.
qplot (field, data=mydata, geom="histogram", binwidth=10)
The above qplot displays the counts. The density option does not help either: it divides by the bin width as well as by the total, whereas I only want to divide by the total number of samples.
I can precalculate a column containing the percentage, but it's cumbersome (I have many data sets).
Is there a better way to tell qplot to plot the histogram with percentages directly (ideally with the axis labelled in percent, e.g. 69% rather than 0.69)?
Thanks!
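Try this (current ggplot2 syntax: formatter= is no longer accepted by scale_y_continuous, ..count.. is superseded by after_stat(), and the movies data set now lives in the ggplot2movies package):
library(ggplot2); library(ggplot2movies)
ggplot(movies, aes(x = rating)) +
  geom_histogram(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)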
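For comparison, a base-R sketch of the same idea (not ggplot2, and not part of the answer above): compute the histogram without drawing it, rescale the counts to percentages of the sample size, then plot the histogram object. The data are hypothetical; the bin width of 10 follows the question:

```r
set.seed(5)
x <- rnorm(500, mean = 50, sd = 15)  # hypothetical sample; sizes differ between data sets

# Bins of width 10 covering the whole range of x, computed without plotting
h <- hist(x, breaks = seq(floor(min(x) / 10) * 10, ceiling(max(x) / 10) * 10, by = 10),
          plot = FALSE)

# Replace the counts with percentages of the total sample size
h$counts <- 100 * h$counts / sum(h$counts)

# With equal-width bins, plot() on a histogram object uses h$counts as bar heights
plot(h, ylab = "Percent of observations", main = "Histogram in percent")
```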