mydat <- c(-0.668273585044951, -0.668349982287781, -0.668310321831499,
-0.668340631422403, -0.668289165486385, -0.668291483851571, -0.668248500627415,
-0.668268688870632, -0.668268351574953, -0.668257756588694, -0.668248478450076,
-0.668328152612909, -0.668290376266347, -0.668259884873199, -0.668336672777167,
-0.66825928480685, -0.66824973313942, -0.668267624202257, -0.668257723530567,
-0.66824923987732, -0.66827032896674, -0.668296341784685, -0.668269768188303,
-0.668317368549213, -0.668252896596478, -0.668259320312454, -0.668358156147855,
-0.66825880651865, -0.668266956160338, -0.668256850457009, -0.668248611381841,
-0.668254104957217, -0.668261414353884, -0.668287765972581, -0.668298912305036,
-0.668249596880931, -0.668248527406481, -0.668248522170991, -0.668257508692103,
-0.668249113734722, -0.668290173078401, -0.668249175742675, -0.668320493819355,
-0.668281236636089, -0.66825021985808, -0.668337981787248, -0.668252181034097,
-0.668251699984127, -0.668328704629881, -0.668297709662365, -0.668343189322021,
-0.668259570734301, -0.668264001643703, -0.668253905874522, -0.668289644511903,
-0.668264044020724, -0.66826754890621, -0.668358431773946, -0.668248494061212,
-0.668319635038578, -0.668250207208146, -0.668332762567503, -0.668250151279497,
-0.66825077635007, -0.668259787977316, -0.668249253537775, -0.668299308339268,
-0.668252348743164, -0.668314437014813, -0.668356526573943, -0.668250417392189,
-0.668249626422078, -0.66825978956337, -0.668262741366315, -0.668251621645908,
-0.668396319087419, -0.668267793877638, -0.668323245408643, -0.668272920904538,
-0.668258917039086, -0.668250552455484, -0.668365803143064, -0.668249334840926,
-0.668274369881539, -0.668268739577508, -0.66827144082708, -0.668312908978672,
-0.668328199044459, -0.668300853703848, -0.668265592444912, -0.668390761110759,
-0.668270647634556, -0.668357480805416, -0.668248487606074, -0.668248495246697,
-0.668259841639527, -0.668348041989136, -0.668249924780612, -0.6683072047334,
-0.668250109800024)
hist(mydat, breaks = 10)
abline(v = max(mydat), col = "blue")
In my plot, I'm adding a blue vertical line to indicate where the maximum value of mydat is. In this example, it is at -0.6682485. However, because the bars in hist appear to be centered at the values in my data set, I have a bar that is centered at exactly -0.6682485. Therefore, even when I add the blue line, which is supposed to represent the maximum value of mydat, it may not look like it, because half of the last bar is to the right of the blue line. So it may convey the misleading message that the blue line is not the maximum.
Is there a way to not center the bars at the values in the data set? I just want to make sure that when I plot the blue line, all the bars are to the left of it, signifying that the blue line indicates the maximum value.
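Histograms do not work the way you expect. The bars are not centered on your data values: hist() splits the range of the data into bins, and each bar spans the interval between two consecutive break points. With breaks = 10, R treats the number only as a suggestion and picks round ("pretty") break points, so the right edge of the last bar is generally not max(mydat). As the number of bins grows, that right edge gets closer to the maximum of your data.
If you run the code,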
my_hist <- hist(mydat, breaks = 10, plot = FALSE)
my_hist
$breaks
[1] -0.66840 -0.66838 -0.66836 -0.66834 -0.66832 -0.66830 -0.66828 -0.66826 -0.66824
you get the break points. As you can see, the largest break point is -0.66824, which lies slightly to the right of max(mydat) (about -0.6682485). If you really want the line at the right-hand edge of the histogram, you can run
hist(mydat, breaks = 10)
abline(v = max(my_hist$breaks), col = "blue")
This places the blue line at the right-hand edge of the last bar. Note that this is the largest break point, which depends on the binning, not the maximum of your data.
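If the line should sit at the true data maximum and still close off the last bar, one option is to pass explicit break points whose last value is max(x). The sketch below is only an illustration under assumptions: the vector x is made-up stand-in data, and ten equal-width bins are an arbitrary choice. It also shows that a single number passed as breaks ends up in pretty().

```r
# Stand-in data (hypothetical values); substitute your own vector, e.g. mydat.
set.seed(1)
x <- -0.6684 + runif(100, min = 0, max = 0.00015)

# A single number for `breaks` is only a suggestion: hist() hands it to pretty().
auto_breaks   <- hist(x, breaks = 10, plot = FALSE)$breaks
pretty_breaks <- pretty(range(x), n = 10, min.n = 1)

# Explicit break points: ten equal-width bins running from min(x) to max(x).
brks <- seq(min(x), max(x), length.out = 11)
brks[length(brks)] <- max(x)  # make the last edge exactly max(x) despite rounding

h <- hist(x, breaks = brks)
abline(v = max(x), col = "blue")  # coincides with the right edge of the last bar
```

With these breaks no bar extends past the blue line, at the cost of less tidy axis values than the default binning.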
I generated a random vector from normal distribution and plotted a histogram.
I modified the count of each bin, and I want to plot another histogram with the same breaks (break_vector) and the new bin count vector (new_counts).
How can I do that?
I tried barplot(), but the way it displays the bin labels is different.
x = rnorm(500,1,6)
delta = 1
break_vector = seq(min(x)-delta,max(x)+delta,by=delta)
hist_info = hist(x,breaks=break_vector)
new_counts = hist_info$counts+5
Try
new_hist <- hist_info
new_hist$counts <- hist_info$counts + 5
plot(new_hist)
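Editing counts changes what plot() draws for frequencies, but the other components of the histogram object, such as density, keep their old values. A minimal sketch, assuming you also want density plots (freq = FALSE) to reflect the new counts; the seed is only there to make the example reproducible:

```r
set.seed(42)
x <- rnorm(500, mean = 1, sd = 6)
delta <- 1
break_vector <- seq(min(x) - delta, max(x) + delta, by = delta)
hist_info <- hist(x, breaks = break_vector, plot = FALSE)

new_hist <- hist_info
new_hist$counts <- hist_info$counts + 5

# Recompute density so it stays consistent with the modified counts:
# density = count / (total count * bin width).
new_hist$density <- new_hist$counts /
  (sum(new_hist$counts) * diff(new_hist$breaks))

plot(new_hist)                # frequencies, same breaks as the original
plot(new_hist, freq = FALSE)  # densities; the bar areas sum to 1
```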
I have some skewed data and want to create a histogram with custom breaks, but want it to actually look readable w/ constant widths for the bins (which would throw off the scale of the x axis, but that's fine). Does anyone know how to do this in ggplot/R?
This is what I don't want, but I don't know how to make breaks not override the width argument:
library(ggplot2)
test_data = rep(c(1,2,3,4,5,8,9,14,20,42,98,101,175), c(50,40,30,20,10,6,6,7,9,5,6,4,1))
buckets = c(-.5,.5,1.5,2.5,3.5,4.5,5.5,10.5,99.5,200)
q1 = qplot(test_data,geom="histogram",breaks=buckets)
print(q1)
Not the histogram I want :(
As ulfelder suggested, use cut():
library(ggplot2)
test_data = rep(c(1,2,3,4,5,8,9,14,20,42,98,101,175),
c(50,40,30,20,10,6,6,7,9,5,6,4,1))
buckets = c(-.5,.5,1.5,2.5,3.5,4.5,5.5,10.5,99.5,200)
q1 = qplot(cut(test_data, buckets), geom="histogram")
print(q1)
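In recent ggplot2 releases, as far as I know, qplot() is deprecated and the histogram stat expects a continuous variable, so the cut() result may need geom_bar() instead. The following is a sketch of the same idea under that assumption; the data frame and column names (binned, bucket) are made up for the example.

```r
library(ggplot2)

test_data <- rep(c(1, 2, 3, 4, 5, 8, 9, 14, 20, 42, 98, 101, 175),
                 c(50, 40, 30, 20, 10, 6, 6, 7, 9, 5, 6, 4, 1))
buckets <- c(-.5, .5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 99.5, 200)

# cut() assigns each value to one of the intervals defined by `buckets`.
binned <- data.frame(bucket = cut(test_data, buckets))

# geom_bar() counts the rows in each interval and draws equal-width bars.
q1 <- ggplot(binned, aes(x = bucket)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE) +  # keep empty buckets on the axis
  labs(x = "bucket", y = "count")
print(q1)
```

Because the x axis is now discrete, every bucket gets the same width regardless of how wide the underlying interval is, which is the trade-off the question accepts.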
I have a data.matrix that is approximately 4000 rows and 100 columns. I am doing a heatmap of the data like:
data<-heatmap(data_matrix,Rowv=NA,Colv=NA,col=cm.colors(256),scale="column",margins=c(5,10))
The problem is that the labels on the columns are packed too closely together, so it is impossible to read them. How can I resize the heatmap so that I can see the column labels? I tried printing it to PDF, but only a black stripe appears.
Thanks
I am including a figure of the heatmap, the portion that I want to see are the labels that are in the right part, but they are too close together.
First of all, it is better to send the output directly to a PDF file. Other image formats work too, but PDF is the best choice because it is vector output and you can zoom in as far as you want:
pdf("Your-file.pdf", paper="a4", width=8, height=8)
Then it's better to use the pheatmap (= pretty heatmap) package. It makes much nicer heatmaps, with a color key beside the heatmap. Finally, although pheatmap() tries to reduce the label size when you have many rows, it fails for a really large number of rows. So I use the code below for a high, but not too high, number of rows:
library(pheatmap)
library(gplots)
if (nrow(table) > 100) stop("Too many rows for heatmap, who can read?!")
fontsize_row = 10 - nrow(table) / 15
pheatmap(table, color=greenred(256), main="My Heatmap", cluster_cols=FALSE,
         fontsize_row=fontsize_row, border_color=NA)
You may change fontsize_col for the column labels. You have many interesting options like display_numbers to have the values inside the cells of your heatmap. Just read ?pheatmap.
This is an example generated by the default parameters of pheatmap() command:
Finally note that too many rows are easy to read on a display, but useless for print.
In RStudio you can easily resize the graphics window; the same holds for Rgui. Alternatively, if you save the plot to a file, you can use a bigger size for your graphics, e.g. a larger width and height when calling pdf or png.
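As a rough sketch of the save-to-file route, the page height can be scaled with the number of rows so that each label gets its own space. The matrix m, the output path, and the figure of 0.12 inch per row are all assumptions picked for illustration, not recommended values.

```r
# Hypothetical stand-in for data_matrix: 200 labelled rows, 10 columns.
set.seed(1)
m <- matrix(rnorm(200 * 10), nrow = 200,
            dimnames = list(paste0("row", 1:200), paste0("col", 1:10)))

out_file <- file.path(tempdir(), "heatmap.pdf")  # hypothetical output path

# Allow roughly 0.12 inch of page height per row, plus room for the margins.
pdf(out_file, width = 8, height = 4 + 0.12 * nrow(m))
heatmap(m, Rowv = NA, Colv = NA, col = cm.colors(256), scale = "column",
        margins = c(5, 10), cexRow = 0.6)
dev.off()  # close the device so the file is actually written
```

For a matrix with thousands of rows the page becomes very tall, so plotting a subset or aggregated rows may be more practical.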
You can use cexRow = and cexCol = to change the size of the row and column labels.
You can find more information with ??heatmap.2 (heatmap.2 is in the gplots package):
# Row/Column Labeling
margins = c(5, 5),
ColSideColors,
RowSideColors,
cexRow = 0.2 + 1/log10(nr),
cexCol = 0.2 + 1/log10(nc),
labRow = NULL,
labCol = NULL,
srtRow = NULL,
srtCol = NULL,
adjRow = c(0,NA),
adjCol = c(NA,0),
offsetRow = 0.5,
offsetCol = 0.5,
colRow = NULL,
colCol = NULL
If you use pheatmap (https://www.rdocumentation.org/packages/COMPASS/versions/1.10.2/topics/pheatmap) you can spread out those labels by adjusting the cellheight parameter.
If you are doing this in an R notebook, the entire heatmap may not be displayed in the output window when you run the code. However, when you save the heatmap using the filename parameter, pheatmap automatically calculates the size of the output file so that the entire heatmap is displayed in it. If this size is not to your liking, you can adjust it with the width and height parameters, but you are unlikely to need to.
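A minimal sketch of that approach, assuming the pheatmap package is installed. The matrix m, the output path, and the values chosen for cellheight and fontsize_row are placeholders for illustration.

```r
library(pheatmap)

# Hypothetical stand-in data: 150 labelled rows, 8 columns.
set.seed(1)
m <- matrix(rnorm(150 * 8), nrow = 150,
            dimnames = list(paste0("row", 1:150), paste0("col", 1:8)))

out_file <- file.path(tempdir(), "pheatmap.pdf")  # hypothetical output path

pheatmap(m,
         cluster_rows = FALSE, cluster_cols = FALSE,
         cellheight = 10,      # height of each cell, in points
         fontsize_row = 8,     # keep the labels smaller than the cells
         filename = out_file)  # written to file; size follows the cells
```

Keeping fontsize_row a little below cellheight leaves a small gap between neighbouring labels.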
I want to achieve the following outcomes:
Rescale the size of the bubbles such that the largest bubble has a diameter of 1 (on whichever has the more compressed scale of the x and y axes).
Rescale the size of the bubbles such that the smallest bubble has a diameter of 1 mm
Have a legend with the first and last points the minimum non-zero frequency and the maximum frequency.
The best I have been able to do is as follows, but I need a more general solution where the value of maxSize is computed rather than hard-coded. If I was doing it in the traditional R plots I would use par("pin") to work out the size of plot area and work backwards, but I cannot figure out how to access this information with ggplot2. Any suggestions?
library(ggplot2)
agData = data.frame(
class=rep(1:7,3),
drv = rep(1:3,rep(7,3)),
freq = as.numeric(xtabs(~class+drv,data = mpg))
)
agData = agData[agData$freq != 0,]
rng = range(agData$freq)
mn = rng[1]
mx = rng[2]
minimumArea = mx - mn
maxSize = 20
minSize = max(1,maxSize * sqrt(mn/mx))
qplot(class,drv,data = agData, size = freq) + theme_bw() +
scale_area(range = c(minSize,maxSize),
breaks = seq(mn,mx,minimumArea/4), limits = rng)
Here is what it looks like so far:
When no ggplot, lattice, or other high-level package seems to do the job without hours of fine-tuning, I always revert to base graphics. The following code gets you what you want, and after it I have another example based on how I would have plotted it.
Note, however, that I have set the maximum radius to 1 cm; just use size.range/2 if you want the values to be diameters instead. I just thought radius gave me nicer plots, and you'll probably want to adjust things anyway.
size.range <- c(.1, 1) # Min and max radius of circles, in cm
# Calculate the relative radius of each circle
radii <- sqrt(agData$freq)
radii <- diff(size.range)*(radii - min(radii))/diff(range(radii)) + size.range[1]
# Plot in two panels
mar0 <- par("mar")
layout(t(1:2), widths=c(4,1))
# Panel 1: The circles
par(mar=c(mar0[1:3],.5))
symbols(agData$class, agData$drv, radii, inches=size.range[2]/cm(1), bg="black")
# Panel 2: The legend
par(mar=c(mar0[1],.5,mar0[3:4]))
symbols(c(0,0), 1:2, size.range, xlim=c(-4, 4), ylim=c(-2,4),
inches=1/cm(1), bg="black", axes=FALSE, xlab="", ylab="")
text(0, 3, "Freq")
text(c(2,0), 1:2, range(agData$freq), col=c("black", "white"))
# Reset par settings
par(mar=mar0)
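The radius calculation at the top of that code can be checked in isolation. The sketch below wraps it in a helper, rescale_radii(), which is a name made up for this illustration, and applies it to hypothetical frequencies rather than agData$freq.

```r
# Hypothetical frequencies standing in for agData$freq.
freq <- c(2, 5, 9, 12, 20, 33, 50)
size.range <- c(0.1, 1)  # min and max radius of the circles, in cm

# Map sqrt(freq) linearly onto the interval given by size.range.
rescale_radii <- function(freq, size.range) {
  r <- sqrt(freq)
  diff(size.range) * (r - min(r)) / diff(range(r)) + size.range[1]
}

radii <- rescale_radii(freq, size.range)
range(radii)  # smallest and largest radius after rescaling
```

The smallest and largest radii come out at size.range[1] and size.range[2]. Because of the offset, the circle areas are no longer strictly proportional to the frequencies, which is the price of forcing a minimum size.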
Now follows my suggestion. The largest circle has a radius of 1 cm and the areas of the circles are proportional to agData$freq, without forcing a size for the smallest circle. Personally, I think this is easier to read (both code and figure) and looks nicer.
with(agData, symbols(class, drv, sqrt(freq),
inches=size.range[2]/cm(1), bg="black"))
with(agData, text(class, drv, freq, col="white"))