R barplot label size of each sample
I would like to label each of the boxes in a barplot with its size (i.e. the number of observations in the data frame that fall in that group).
E.g. if the first variable has 3 levels and the second variable has 4 levels, I would like 12 labels.
(Also, is it possible to control the size or position of these labels?)
Thank you for any help.
Here's one way to do it, using the data VADeaths as an example (it will be in your R workspace by default, or if not, use library(datasets)).
bar <- barplot(VADeaths)
text(rep(bar,each=nrow(VADeaths)), as.vector(apply(VADeaths,2,cumsum)),
labels=as.vector(apply(VADeaths,2,cumsum)),pos=3)
The result is the stacked barplot with each cumulative total printed just above the top of its segment.
To change the size of the font, use text(..., cex=2), which makes the labels twice their default size.
Now, let's explain this code so you know how to do it yourself!
First, let's look at VADeaths: it's a tally of deaths in each age group by category:
> VADeaths
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0
Now, to do the text on the barplot, we basically draw the barplot, and then draw the text on top using R command text (see ?text).
text requires x,y coordinates and corresponding pieces of text to draw on the bar plot. We will give it the coordinates of each line in the bar plot to draw the text on.
To do this, see the "Value" section of ?barplot. This function not only plots your bar plot, but also returns the x coordinate (the midpoint) of each bar. Score!
> bar <- barplot(VADeaths)
> bar
[1] 0.7 1.9 3.1 4.3
Now all we need is y coordinates to go with our x coordinates.
Well, a stacked bar plot just tallies up the frequencies in VADeaths as you go along.
For example, in the 'Rural Male' group, the first line is drawn at 11.7, and the second is drawn at 11.7 + 18.1 = 29.8, the third at 11.7 + 18.1 + 26.9 = 56.7, and so on (see the values in VADeaths).
So, our y coordinates need to be cumulative sums going down the columns.
To calculate these for each column, we can use cumsum. For example
> cumsum(c(1,2,3,4,5))
[1] 1 3 6 10 15
Since we want to do this for each column in VADeaths, we have to use the function apply.
> apply(VADeaths,2,cumsum)
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       29.8         20.4       39.7         22.0
60-64       56.7         40.7       76.7         41.3
65-69       97.7         71.6      131.3         76.4
70-74      163.7        125.9      202.4        126.4
apply(VADeaths,2,cumsum) means: "For each column in VADeaths, calculate the cumsum of that".
This gives us the y values for each line of the bar plot.
Let's save these y values for further use:
> yvals <- as.vector(apply(VADeaths,2,cumsum))
The reason I use as.vector is just to flatten the matrix into a vector of values -- it makes the plotting easier.
One last thing -- my x values (that I stored in bar) only have one value per bar, but I need to expand it out so there's one x value per line on each bar. To do this:
> xvals <- rep(bar,each=nrow(VADeaths))
This turns my previous x1,x2,x3,x4 into x1,x1,x1,x1,x1, x2,x2,x2,x2,x2, ..., x4,x4,x4,x4,x4.
Now my xvals match my yvals.
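As a quick sanity check of the two steps above, here is a small sketch that rebuilds the coordinates without drawing anything. It hard-codes the midpoints 0.7, 1.9, 3.1, 4.3 that barplot(VADeaths) returned earlier as a stand-in (the name mids is mine, not part of the original code).

```r
# Stand-in for the midpoints returned by barplot(VADeaths) above
mids <- c(0.7, 1.9, 3.1, 4.3)

# One x value per segment: each midpoint repeated once per row (age group)
xvals <- rep(mids, each = nrow(VADeaths))

# Segment tops: cumulative sums down each column, flattened column by column
yvals <- as.vector(apply(VADeaths, 2, cumsum))

# Both vectors hold one entry per segment (5 rows x 4 columns = 20)
length(xvals) == length(yvals)

# The first five pairs all belong to the 'Rural Male' bar
head(data.frame(x = xvals, y = yvals), 5)
```

Because as.vector flattens a matrix column by column, the each= form of rep is what keeps the two vectors in the same order.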
After this it's simply a case of using text.
> text( xvals, yvals, labels=yvals, pos=3 )
The labels argument tells text what text to put at the x/y positions.
The pos=3 means "draw each bit of text just above my specified x/y value". Otherwise, the numbers would be drawn over the lines of the barplot which would be hard to read.
Now, there are many options for customising the position and size of text, and I suggest you read ?text to see them.
All this code condenses down to the two-liner I gave at the beginning of the answer, but this version might be a little more understandable:
bar <- barplot(VADeaths)
xvals <- rep(bar,each=nrow(VADeaths))
yvals <- as.vector(apply(VADeaths,2,cumsum))
text( xvals, yvals, labels=yvals, pos=3 )
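The code above labels each segment with the running total. If you would rather label each segment with its own value, which is closer to what the question asks, one possible variation is to put the text in the vertical middle of each segment. This is only a sketch: the names tops and mids_y are mine, and cex = 0.8 is an arbitrary choice.

```r
bar    <- barplot(VADeaths)               # x midpoints of the four bars
tops   <- apply(VADeaths, 2, cumsum)      # top edge of every segment
mids_y <- tops - VADeaths / 2             # vertical centre of every segment

text(rep(bar, each = nrow(VADeaths)),     # one x per segment
     as.vector(mids_y),                   # one y per segment
     labels = as.vector(VADeaths),        # the segment's own value
     cex = 0.8)                           # slightly smaller text
```

No pos argument is needed here, because the text is meant to sit inside the segment rather than above a line. For counts taken from a data frame, a matrix built with table() on the two grouping variables could take the place of VADeaths.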
Related
Manage Circles size in plot using symbols
I am using the symbols function in R to draw circles on a map, which has been imported as a plot. According to the documentation, the circle radii are scaled based on the maximum value in the data set. I am plotting the same map for different time periods (different data sets) and I want the maps to be comparable, meaning that a given circle radius refers to the same value in all the maps. Is there a way to control the circle scaling? Thanks.
This is my code:
# for the first map, 2010
plot(my_map)
symbols(data2010$Lon, data2010$Lat, circles = data2010$number, inches = 0.25, add = TRUE)
# then the map for 2011
plot(my_map)
symbols(data2011$Lon, data2011$Lat, circles = data2011$number, inches = 0.25, add = TRUE)
The manual page suggests that setting inches=FALSE will accomplish what you want. Since you did not provide a sample of your data, we have to use data that is already available. This data set is used in the Examples on the manual page for the symbols() function:
data(trees)
str(trees)
# 'data.frame': 31 obs. of 3 variables:
# $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
# $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
# $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
Since we only have one sample, we can plot the symbols with and without the 31st row, which has the largest girth.
with(trees, symbols(Height, Volume, circles = Girth/24, inches = FALSE))
Now add the data without row 31:
with(trees[-31, ], symbols(Height, Volume, circles = Girth/24, fg="red", inches = FALSE, add=TRUE))
We can tell that the scaling is the same because the red circles match the black circles even though the largest girth is missing from the second plot. For this to work you will have to specify the same values for xlim= and ylim= in each plot. Run this code again replacing inches=FALSE with inches=.5 to see the difference.
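Applied to the two yearly maps from the question, the same idea might look like the sketch below. The data frames are made-up stand-ins, and dividing by a single shared maximum (scale_by, my name) is just one way to pick a common scale. With inches = FALSE the radii are in x-axis units, so equal counts get equal radii in both panels as long as the axis limits match.

```r
# Made-up stand-ins for the two yearly data sets (values are illustrative)
data2010 <- data.frame(Lon = c(21, 23, 25), Lat = c(38, 39, 40),
                       number = c(10, 40, 90))
data2011 <- data.frame(Lon = c(21, 23, 25), Lat = c(38, 39, 40),
                       number = c(20, 40, 400))

# One scale factor shared by both maps: the largest count overall gets radius 1
scale_by <- max(data2010$number, data2011$number)

op <- par(mfrow = c(1, 2))                # two panels side by side
for (d in list(data2010, data2011)) {
  symbols(d$Lon, d$Lat, circles = d$number / scale_by, inches = FALSE,
          xlim = c(19, 27), ylim = c(36, 42), xlab = "Lon", ylab = "Lat")
}
par(op)                                   # restore the previous layout
```

On a real map you would keep plot(my_map) and add = TRUE as in the question, and choose the divisor so that the largest circle has a sensible size in map units.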
How to sort vector into bins in R?
I have a vector that consists of numbers that can take on any value between 1 and 100. I want to sort that vector into bins of a certain size. My logic:
1.) Divide the range (in this case, 1:100) into the number of bins you want (let's say 10 for this example). Result: (1,10.9], (10.9,20.8], (20.8,30.7], (30.7,40.6], (40.6,50.5], (50.5,60.4], (60.4,70.3], (70.3,80.2], (80.2,90.1], (90.1,100]
2.) Then sort my vector.
I found a handy function that almost does all this in one fell swoop: cut(). Here is my code:
> table(cut(vector, breaks = 10))
(0.959,10.9] (10.9,20.8] (20.8,30.7] (30.7,40.5] (40.5,50.4] (50.4,60.3] (60.3,70.1] (70.1,80] (80,89.9] (89.9,99.8]
         175         171         117         103          82          67          54        46        39          31
Unfortunately, the intervals are different from the bins we calculated from the possible range (1:100). So I tried fixing this by adding that range to the vector:
> table(cut(c(1,100,vector), breaks = 10))
(0.901,10.9] (10.9,20.8] (20.8,30.7] (30.7,40.6] (40.6,50.5] (50.5,60.4] (60.4,70.3] (70.3,80.2] (80.2,90.1] (90.1,100]
         176         171         117         104          82          66          54          48          38         31
This almost worked perfectly, except for the left-most interval, which starts from 0.901 for some reason. My questions:
1.) Is there a way to do this (using cut or another function/package) without having to insert artificial data points to get the specified bin ranges?
2.) If not, why does the lower bin start from 0.901 and not 1?
Based on your response to @Allan Cameron, I understand that you want to divide your vector into 10 bins of the same size. But when you give cut() only the number of breaks, the intervals it calculates differ in size across the groups. As @akrun said, this is a consequence of how the function computes the breaks when you supply only their number. I do not know whether there is a way to avoid this within the function, but I think it will be easier to define the bins yourself, as @Gregor Thomas suggested. Here is an example of how I would approach it:
vec <- sample(1:100, size = 500, replace = TRUE)
# Here I assume that you want to divide the data into
# intervals of the same length
breaks <- seq(min(vec), max(vec), by = 9.9)
# include.lowest = TRUE keeps the minimum value in the first bin
cut(vec, breaks = breaks, include.lowest = TRUE)
Another option is the cut_interval() function from the ggplot2 package, which cuts the vector into n groups of equal length.
library(ggplot2)
cut_interval(vec, n = 10)
"Why does the lower bin start from 0.901 and not 1?"
The answer is in the first part of the Details section of the ?cut help page:
"When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals."
That 0.1% adjustment is the reason your lower bound is 0.901. The upper bound is moved outward as well (to 100.099), but the label is rounded to three significant digits, so it still prints as 100.
If you'd like to use other breaks, you can specify exact breaks however you want. Perhaps this:
my_breaks = seq(1, 100, length.out = 11) ## for n bins, you need n+1 breaks
my_breaks
#  [1]   1.0  10.9  20.8  30.7  40.6  50.5  60.4  70.3  80.2  90.1 100.0
cut(vector, breaks = my_breaks, include.lowest = TRUE)
But I actually think Allan's suggestion of 0:10 * 10 might be what you really want. I wouldn't dismiss it too quickly:
table(cut(1:100, breaks = 0:10*10))
# (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (70,80] (80,90] (90,100]
#     10      10      10      10      10      10      10      10      10       10
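The arithmetic behind the 0.901 can be checked by hand. The sketch below uses a small made-up vector v (my name) that spans 1 to 100, then repeats the explicit-breaks approach from the answer.

```r
v <- c(1, 100, 37, 58, 5)            # made-up data spanning 1 to 100

rng   <- range(v)                    # 1 and 100
shift <- diff(rng) / 1000            # 0.1% of the range = 0.099
rng[1] - shift                       # 0.901, the lower limit that cut() reports

# Explicit breaks avoid the adjustment; include.lowest keeps the value 1
my_breaks <- seq(1, 100, length.out = 11)
binned <- cut(v, breaks = my_breaks, include.lowest = TRUE)
levels(binned)[1]                    # "[1,10.9]"
anyNA(binned)                        # FALSE: every value landed in a bin
```

Without include.lowest = TRUE the first interval would be (1,10.9], and the value 1 itself would come back as NA.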
R - Changing the Magnitude of the x-axis with ggplot2
I am a student teaching myself R. I have a simple question about ggplot2. Currently, I have plotted a graph with the following x-axis:
36000 36200 36400 36600 36800
I would like to change its magnitude to
36.0 36.2 36.4 36.6 36.8
without changing the graph. How would I accomplish this?
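No answer is recorded for this question here. One common approach, offered as a sketch rather than a definitive answer, is to leave the data alone and change only the tick labels through the labels argument of scale_x_continuous(). The helper in_thousands and the data frame df are hypothetical names introduced for illustration.

```r
# Label function: show 36000 as "36.0" without altering the data
in_thousands <- function(x) sprintf("%.1f", x / 1000)
in_thousands(c(36000, 36200, 36400))   # "36.0" "36.2" "36.4"

# Hypothetical data; only the axis labels change, the plotted points do not
df <- data.frame(x = seq(36000, 36800, by = 200), y = c(2, 5, 3, 6, 4))

# The plot is only drawn if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x, y)) +
    geom_line() +
    scale_x_continuous(labels = in_thousands)
  print(p)
}
```

Since the labels are now in thousands, it is worth saying so in the axis title, for example with xlab().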
Creating multiple plots on one figure with a for loop using unique in R
Using a for loop, I need to make three separate plots with regression lines on the same figure, one for each unique value found in the Group column (3 plots with different data and regression lines).
Group Height Weight
A     5.6    59.5
A     5.9    68.7
B     4.8    57.1
B     5.0    42.9
C     7.3    43.3
C     7.1    39.7
I tried this code, which only gives me the points for group C for some reason. I don't think this is what I really want though, as I need the points with a regression line.
for(i in unique(df$Group)){
  example <- subset(df, df$Group == i)
  plot(example$Height, example$Weight)
}
I need one set of data points for each of Groups A, B, and C on one figure, with a regression line for each, using a for loop.
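No answer is recorded for this question here. A likely reason that only group C shows up is that every plot() call starts a fresh plot, so the last iteration replaces the earlier ones. One way to get three panels on a single figure, sketched with the data from the question, is to split the device with par(mfrow) and draw each fit with abline(). The names grp and fits are mine.

```r
df <- data.frame(
  Group  = c("A", "A", "B", "B", "C", "C"),
  Height = c(5.6, 5.9, 4.8, 5.0, 7.3, 7.1),
  Weight = c(59.5, 68.7, 57.1, 42.9, 43.3, 39.7)
)

op <- par(mfrow = c(1, 3))                # three panels on one figure
fits <- list()
for (g in unique(df$Group)) {
  grp <- subset(df, Group == g)           # rows for this group only
  fits[[g]] <- lm(Weight ~ Height, data = grp)
  plot(grp$Height, grp$Weight, main = paste("Group", g),
       xlab = "Height", ylab = "Weight")
  abline(fits[[g]])                       # regression line for this group
}
par(op)                                   # restore the previous layout
```

If all three groups should share one set of axes instead, the first iteration could call plot() with common xlim and ylim and the later ones points(), but that is a different layout from the one sketched here.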
plotting degrees in IDL
I'm using IDL 8.2. I have a list of positions (RA and Dec) of stars and I want to plot them on a figure, e.g.
37.9 ~ 37 54' 0"
37.7 ~ 37 42' 0"
I read the positions (degrees) in as strings and extract the degrees, minutes and seconds into separate arrays. These are then used to convert the values to decimal degrees for plotting. I would also like to have the alternate axis labelled with degrees, i.e.
37.9 ~ 37 54' 0"
37.7 ~ 37 42' 0"
Is there a way to do this other than using something like PowerPoint? Also, is there a better way to force the plot to be square with the plot procedure than scaling both axes the same?
A good solution was posted here: https://groups.google.com/forum/#!starred/comp.lang.idl-pvwave/EsbGiqZnhRw
In short, it involves writing a function that generates user-defined tick marks.