Suppose I need to plot a dataset like the one below:
set.seed(1)
dataset <- sample(1:7, 1000, replace=TRUE)
hist(dataset)
As you can see in the plot below, the two leftmost bins have no space between them, unlike the rest of the bins.
I tried changing xlim, but it didn't work. Basically, I would like each number (1 to 7) to be represented as its own bin, and I would also like a space between any two adjacent bins. Thanks!
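To see what hist() is doing here, it can return the bins it chose without drawing anything (plot = FALSE). A sketch using the question's data: with 1000 values the default Sturges rule asks pretty() for about 11 bins, which should give break points every 0.5 from 1 to 7. Bins are closed on the right and the lowest one includes its left edge, so the 1s land in [1, 1.5] and the 2s in (1.5, 2] with nothing between them, whereas every later value is preceded by an empty bin such as (2, 2.5].

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# Compute the histogram without plotting, to inspect the bins hist() chose
h <- hist(dataset, plot = FALSE)
print(h$breaks)   # default break points (Sturges rule, then pretty())
print(h$counts)   # count per bin; the zero bins are the visible gaps

# The first two bins hold the 1s and the 2s, with no empty bin between them
stopifnot(h$counts[1] == sum(dataset == 1),
          h$counts[2] == sum(dataset == 2))
```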
The best way is to set the breaks argument manually. Using the data from your code,
hist(dataset,breaks=rep(1:7,each=2)+c(-.4,.4))
gives the following plot:
The first part, rep(1:7,each=2), is what numbers you want the bars centered around. The second part controls how wide the bars are; if you change it to c(-.49,.49) they'll almost touch, if you change it to c(-.3,.3) you get narrower bars. If you set it to c(-.5,.5) then R yells at you because you aren't allowed to have the same number in your breaks vector twice.
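The same idea can be wrapped in a small function so the bar width becomes a single parameter. This is a sketch: centered_breaks and half_width are hypothetical names, and the range check assumes consecutive integer centres (spacing 1), as in the question.

```r
# Hypothetical helper (not part of base R or the original answer): breaks that
# give each centre a bar of width 2 * half_width, separated by empty gap bins.
centered_breaks <- function(centres, half_width = 0.4) {
  # With integer centres, half_width must stay below 0.5; at 0.5 neighbouring
  # breaks coincide (e.g. 1.5 would appear twice), which R does not allow.
  stopifnot(half_width > 0, half_width < 0.5)
  rep(centres, each = 2) + c(-half_width, half_width)
}

print(centered_breaks(1:7))        # same as rep(1:7, each = 2) + c(-.4, .4)
print(centered_breaks(1:7, 0.3))   # narrower bars, wider gaps
```

With that, hist(dataset, breaks = centered_breaks(1:7, 0.3)) would draw the narrower bars.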
Why does this work?
If you split up the breaks vector, you get one part that looks like this:
> rep(1:7,each=2)
[1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7
and a second part that looks like this:
> c(-.4,.4)
[1] -0.4 0.4
When you add them together, R loops through the second vector as many times as needed to make it as long as the first vector. So you end up with
1-0.4 1+0.4 2-0.4 2+0.4 3-0.4 3+0.4 [etc.]
= 0.6 1.4 1.6 2.4 2.6 3.4 [etc.]
Thus you have one bar from 0.6 to 1.4 (centered around 1, with width 2*.4), another bar from 1.6 to 2.4 centered around 2 with width 2*.4, and so on. If you had data in between (e.g. 2.5), the histogram would look kind of silly, because it would create a bar from 2.4 to 2.6, and the bar widths would not be even (that bar would be only .2 wide, while all the others are .8). But with only integer values that's not a problem.
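To check the bins numerically rather than by eye, hist() can return them without drawing (plot = FALSE). A small sketch with the question's data; bar_counts and gap_counts are just illustrative names. One side effect worth knowing: because these breaks are not equally spaced, hist() defaults to freq = FALSE, so the plotted y-axis shows densities, while the raw counts are still available in $counts.

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)
brks <- rep(1:7, each = 2) + c(-0.4, 0.4)   # 14 breaks -> 13 bins

h <- hist(dataset, breaks = brks, plot = FALSE)
bar_bins <- seq(1, length(h$counts), by = 2)  # bins centred on 1, 2, ..., 7
gap_bins <- seq(2, length(h$counts), by = 2)  # the narrow bins between bars
bar_counts <- h$counts[bar_bins]
gap_counts <- h$counts[gap_bins]
print(bar_counts)   # should match table(dataset)
print(gap_counts)   # all zero for integer data
print(h$equidist)   # FALSE: unequal widths, so hist() plots densities by default

# A non-integer value such as 2.5 falls into the 0.2-wide (2.4, 2.6] bin
h2 <- hist(c(1, 2, 2.5, 3), breaks = brks, plot = FALSE)
print(h2$counts)
```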
You need six bars NOT seven bars; that is what your histogram has space for. But then you end up generating seven bars. That is the bug.
Do sample(1:6, 1000, replace=TRUE) instead of sample(1:7, 1000, replace=TRUE).
If you do need seven bars, then seed with 0
I have data that contains information about sub-plots with different numbers and their corresponding species types (more than 3 species within each subplot). Every species has X and Y coordinates.
> df
subplot species X Y
1 1 Apiaceae 268675 4487472
2 1 Ceyperaceae 268672 4487470
3 1 Vitaceae 268669 4487469
4 2 Ceyperaceae 268665 4487466
5 2 Apiaceae 268662 4487453
6 2 Magnoliaceae 268664 4487453
7 3 Magnoliaceae 268664 4487453
8 3 Apiaceae 268664 4487456
9 3 Vitaceae 268664 4487458
With these data, I have created a ppp object for the points of each subplot, within a window covering the general (big) plot.
library(spatstat)
grp <- factor(df$subplot)
win <- ripras(df$X, df$Y)
p.patt <- ppp(df$X, df$Y, window = win, marks = grp)
Now I want to divide the plot into 3 x 3 equal sub-plots, because there are 9 subplots. The general plot is not rectangular; it looks like a rhombus when I plot it.
I could use the quadrats() function as below, but it divided my plot into unequal subplots: some are squares, others are triangles, and so on, which I don't want. I want all the subplots to be equal-sized quadrats (divided by lines parallel to each side). Can anyone guide me on this?
divide <-quadrats(p.patt,3,3)
plot(divide)
Thank you!
Could you break up the plot canvas into 3x3, then run each plot?
> par(mfrow=c(3,3))
> # run code for plot 1
> # run code for plot 2
...
> # run code for plot 9
To return to a single plot on the canvas, type
> par(mfrow=c(1,1))
This is a question about the spatstat package.
You can use the function quantess to divide the window into tiles of equal area. If you want the tile boundaries to be vertical lines, and you want 7 tiles, use
B <- quantess(Window(p.patt), "x", 7)
where p.patt is your point pattern.
I've been struggling to get a plot that shows my data accurately, and spent a while getting gap.plot up and running. After doing so, I have an issue with labelling the points.
Just plotting my data ends up with this:
(Figure: plot of the abundance data, showing two tiers of values, one at about 38,000 and one between 1 and 50)
As you can see, that doesn't show either the top or the bottom section of my plot well enough to distinguish anything.
Using gap plot, I managed to get:
(Figure: gap.plot of the abundance data with the range 100 to 37,000 cut out; labels appear only on the lower tier)
The code for my two plots is pretty simple:
plot(counts.abund1,pch=".",main= "Repeat 1")
text(counts.abund1, labels=row.names(counts.abund1), cex= 1.5)
library(plotrix)  # provides gap.plot()
gap.plot(counts.abund1[,1],counts.abund1[,2],gap=c(100,38000),gap.axis="y",xlim=c(0,60),ylim=c(0,39000))
text(counts.abund1, labels=row.names(counts.abund1), cex= 1.5)
But I can't figure out why the labels (which are just the letters that the points denote) are not applied the same way in the two plots.
I'm somewhat out of my depth here and have very little idea how to plot data like this nicely; I never had data like it when learning.
The data originally come from a large (10,000 x 10,000) matrix containing a random assortment of the letters a to z, which then undergoes replacements and "speciation" or "immigration". This results in the first group of letters at ~38,000, and the second group normally below 50.
The code I run after getting that matrix to get the rank abundance is:
##Abundance 1
counts1 <- as.data.frame(as.list(table(neutral.v1)))
counts.abund1<-rankabundance(counts1)
With neutral.v1 being the matrix.
The data frame for counts.abund1 looks like (extremely poorly formatted, sorry):
rank abundance proportion plower pupper accumfreq logabun rankfreq
a 1 38795 3.9 NaN NaN 3.9 4.6 1.9
x 2 38759 3.9 NaN NaN 7.8 4.6 3.8
j 3 38649 3.9 NaN NaN 11.6 4.6 5.7
m 4 38639 3.9 NaN NaN 15.5 4.6 7.5
and continues for all the variables. I only use rank and abundance right now; the row names (a, x, j, m, ...) are the letters each row applies to, and they are what I want to use as labels on the plot.
Any advice would be really appreciated. I can't really shorten the code too much or provide the matrix because the type of data is quite specific, as are the quantities in a sense.
As I mentioned, I've been using gap.plot to just create a break in the axis, but if there are better solutions to plotting this type of data I'd be absolutely all ears.
Really sorry that this is a mess of a question, bit frazzled on the whole thing right now.
gap.plot() doesn't draw two plots; it draws one plot, shifting the values in the upper section down, drawing an additional box and rewriting the axis tick labels. So the y-coordinate in the upper region matches neither the original value nor the axis tick labels. The real y-coordinate in the upper region is "original value" - diff(gap).
# the example data I used
set.seed(1)
counts.abund1 <- data.frame(rank = 1:50,
                            abundance = c(rnorm(25, 38500, 100), rnorm(25, 30, 20)))

library(plotrix)  # provides gap.plot()
gap.plot(counts.abund1[,1], counts.abund1[,2], gap=c(100,38000), gap.axis="y",
         xlim=c(0,60), ylim=c(0,39000))
# lower tier: these points are drawn at their original y values
text(counts.abund1, labels=row.names(counts.abund1), cex= 1.5)
# upper tier: these points are drawn at y - diff(gap)
text(counts.abund1[,1], counts.abund1[,2] - diff(c(100, 38000)), labels=row.names(counts.abund1), cex=1.5)
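Following that explanation, the shift can be wrapped in a small function so that a single text() call labels both tiers. gap_y is a hypothetical name, and the sketch assumes, as described above, that gap.plot() draws points above gap[2] at y - diff(gap) and leaves points below gap[1] unchanged; values inside the gap are simply returned as-is.

```r
# Hypothetical helper: map original y values to the coordinates gap.plot()
# uses, assuming points above the gap are shifted down by diff(gap).
gap_y <- function(y, gap) {
  ifelse(y > gap[2], y - diff(gap), y)
}

gap <- c(100, 38000)
y <- c(12, 45, 38500, 38795)
print(gap_y(y, gap))
```

After the gap.plot() call, text(counts.abund1[,1], gap_y(counts.abund1[,2], c(100, 38000)), labels = row.names(counts.abund1), cex = 1.5) would then replace the two separate text() calls.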
I have some measured data, experiment.dat which goes like this:
1 2
2 3
Now I want to plot them with a command like
plot "experiment.dat" using 1:2 title "experiment" with lines lw 3
Is there a way to scale the lines by some factor, such as -1?
Yes, you can do any kind of calculations inside the using statement. To scale the y-value (the second column) with -1, use
plot "experiment.dat" using 1:(-1*$2)
You don't need to multiply the column by minus one, you can simply use:
p "experiment.dat" u 1:(-$2)
This works at least with version 5.4.
You can also abbreviate many commands and keywords, e.g. p for plot and u for using.
Not sure whether this should go on Cross Validated or not, but we'll see. I recently obtained data from an instrument (masses of compounds from 0 to 630), which I binned into bins of width 0.025 before plotting the histogram below:
I want to identify the bins that have high frequency and stand out from the background noise (the background noise increases as you move from right to left on the x-axis). Imagine drawing a curve on top of the points that have almost blurred together into a black lump, then selecting the bins that lie above that curve for further investigation; that's what I'm trying to do. I plotted a kernel density estimate to see whether I could overlay it on my histogram and use it to identify points above it. However, this made no headway, because the density values are far too low (see the second plot). Does anyone have recommendations for how I can solve this problem? The blue line is the overlaid density plot and the red line is the ideal solution (I need a way to automate this in R).
The data below is only part of my dataset, so it's not a good representation of my plot (which contains about 300,000 points). Since my bins are very small (0.025), the data are spread across a huge number of them (about 25,000 bins in total).
df <- read.table(header = TRUE, text = "
values
1 323.881306
2 1.003373
3 14.982121
4 27.995091
5 28.998639
6 95.983138
7 2.0117459
8 1.9095478
9 1.0072853
10 0.9038475
11 0.0055748
12 7.0964916
13 8.0725191
14 9.0765316
15 14.0102531
16 15.0137390
17 19.7887675
18 25.1072689
19 25.8338140
20 30.0151683
21 34.0635308
22 42.0393751
23 42.0504938
")
bin <- seq(0, 324, by = 0.025)
hist(df$values, breaks = bin, prob=TRUE, col = "grey")
lines(density(df$values), col = "blue")
Assuming you're dealing with a vector bin.densities that holds the density of each bin, a simple way to find outliers would be:
Look at a window around each bin, say +/- 50 bins, clamped at the ends of the vector (note the parentheses: : binds more tightly than + and -):
current.bin <- 1
window.size <- 50
window <- bin.densities[max(1, current.bin - window.size):min(length(bin.densities), current.bin + window.size)]
Find the 5% and 95% quantiles of that window (or whatever cut-offs you think work):
lower.quant <- quantile(window, 0.05)
upper.quant <- quantile(window, 0.95)
Then say that the current bin is an outlier if it falls outside that quantile range:
this.is.too.high <- (bin.densities[current.bin] > upper.quant)
this.is.too.low <- (bin.densities[current.bin] < lower.quant)
#final result
this.is.outlier <- this.is.too.high | this.is.too.low
I haven't actually tested this code, but this is the general approach I would take. You can play around with window size and the quantile percentages until the results look reasonable. Again, not exactly super complex math but hopefully it helps.
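Applying those steps to every bin might look like the sketch below. flag_outlier_bins and its arguments are hypothetical names; it keeps the answer's +/- 50-bin window and 5%/95% quantile choices as defaults, and the toy input is made up purely to exercise the function.

```r
# Hypothetical helper: for every bin, compare its density with the lower and
# upper quantiles of the densities in the surrounding +/- window.size bins.
flag_outlier_bins <- function(bin.densities, window.size = 50,
                              lower = 0.05, upper = 0.95) {
  n <- length(bin.densities)
  vapply(seq_len(n), function(i) {
    # clamp the window to the ends of the vector
    idx <- max(1, i - window.size):min(n, i + window.size)
    q <- quantile(bin.densities[idx], c(lower, upper), names = FALSE)
    bin.densities[i] > q[2] || bin.densities[i] < q[1]
  }, logical(1))
}

# Toy input: a flat background with a single spike at bin 100
dens <- rep(1, 200)
dens[100] <- 50
print(which(flag_outlier_bins(dens)))
```

With the question's data, the densities could come from hist(df$values, breaks = bin, plot = FALSE)$density. Since only peaks above the background matter here, keeping just the upper-quantile test may be enough.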
I have a question about the gplots package. I want to use the function heatmap.2, and I want to move the symmetric point of the color key from 0 to 1. Normally, when symkey=TRUE and you use col=redgreen(), a colorbar is created where the colors are assigned like this:
red = -2 to -0.5
black=-0.5 to 0.5
green= 0.5 to 2
Now I want to create a colorbar like this:
red= -1 to 0.8
black= 0.8 to 1.2
green= 1.2 to 3
Is something like this possible?
Thank you!
If you look at the heatmap.2 help file, it looks like you want the breaks argument. From the help file:
breaks (optional) Either a numeric vector indicating the splitting points for binning x into colors, or a integer number of break points to be used, in which case the break points will be spaced equally between min(x) and max(x)
So, you use breaks to specify the cutoff points for each colour. e.g.:
library(gplots)
# make up a bunch of random data from -1, -.9, -.8, ..., 2.9, 3
# 10x10
x = matrix(sample(seq(-1,3,by=.1),100,replace=TRUE),ncol=10)
# plot. We want -1 to 0.8 being red, 0.8 to 1.2 being black, 1.2 to 3 being green.
heatmap.2(x, col=redgreen, breaks=c(-1,0.8,1.2,3))
The crucial bit is the breaks=c(-1,0.8,1.2,3) being your cutoffs.
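To check which colour a particular value ends up with, base R's cut() can bin values the same way. This sketch assumes heatmap.2 hands the breaks on to image(), which bins values into intervals closed on the right with the lowest break included; the colour names below are only labels. With four break points there are three intervals, so heatmap.2 should ask redgreen for three colours (red, black, green).

```r
# Which colour a value gets with breaks = c(-1, 0.8, 1.2, 3) and three colours.
# Assumes right-closed intervals with the lowest break included.
brks <- c(-1, 0.8, 1.2, 3)
vals <- c(-1, 0.5, 0.8, 1.0, 1.2, 2.5, 3)
bins <- cut(vals, breaks = brks, right = TRUE, include.lowest = TRUE,
            labels = c("red", "black", "green"))
print(data.frame(value = vals, colour = bins))
```

Note that the boundary values 0.8 and 1.2 fall into the lower of their two neighbouring intervals.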