I have 2 sets of depth point measurements, for example:
> a
depth value
1 2 2
2 4 3
3 6 4
4 8 5
5 16 40
6 18 45
7 20 58
> b
depth value
1 10 10
2 12 20
3 14 35
I want to show both groups in one figure, plotted against depth with different symbols, as you can see here:
plot(a$value, a$depth, type='b', col='green', pch=15)
points(b$value, b$depth, type='b', col='red', pch=14)
The plot seems okay, but the annoying part is that all the green symbols are connected. I do want connecting lines, but only between consecutive points of the same group at the 2 m interval: the green symbols should be connected from 2 to 8 m, then the red group B symbols from 10 to 14 m, and then the green group A symbols again from 16 to 20 m. In other words, I do NOT want to see a line joining the 8 m and 16 m samples of group A.
An easy solution would be to divide group A into two parts (say, A-shallow and A-deep) and then plot A-shallow, B, and A-deep separately. But this is completely impractical because I have thousands of data points in hundreds of groups, i.e. I have to produce many depth profiles. Therefore, there has to be a way to program it so that points are NOT connected across more than a prescribed depth interval (e.g. 2 m in this case) within a particular group of samples. Any ideas?
If plot or lines encounters an NA value, it will automatically break the line. Using that, we can insert NA values for the missing measurements in your data, which fixes the problem. One way is this:
# common depth grid at the 2 m spacing, covering both groups
rng <- range(a$depth, b$depth)
rng <- seq(rng[1], rng[2], by = 2)
# one entry per grid depth; NA where the group has no measurement
aa <- rep(NA, length(rng))
aa[match(a$depth, rng)] <- a$value
bb <- rep(NA, length(rng))
bb[match(b$depth, rng)] <- b$value
plot(aa, rng, type = 'b', col = 'green', pch = 15)
points(bb, rng, type = 'b', col = 'red', pch = 14)
This produces a plot in which each group's line is broken wherever a 2 m step has no measurement.
Note that this code assumes that all depth measurements are evenly divisible by 2.
I'm not sure whether you really have separate data.frames for all of your groups, but there may be better ways to fill in the missing values depending on your real data structure.
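For the real case with hundreds of groups, a gap-based variant of the same NA trick may scale better: instead of matching onto a fixed 2 m grid, sort each group by depth and insert an NA row wherever consecutive depths differ by more than a threshold. This is a sketch, assuming each group is a data frame with `depth` and `value` columns as in the question; `break_gaps` and its `max_gap` argument are hypothetical names, and unlike the grid version it does not require depths to be divisible by 2.

```r
# Insert an NA row wherever consecutive depths differ by more than max_gap,
# so that plot()/points() with type = "b" break the line at that gap.
# break_gaps() is a hypothetical helper, not part of base R.
break_gaps <- function(d, max_gap = 2) {
  d <- d[order(d$depth), c("depth", "value")]
  # run id: a new run starts after every gap larger than max_gap
  run <- cumsum(c(TRUE, diff(d$depth) > max_gap))
  na_row <- data.frame(depth = NA, value = NA)
  # append an NA row to each run, then drop the trailing one
  out <- do.call(rbind, lapply(split(d, run), function(p) rbind(p, na_row)))
  out <- out[-nrow(out), ]
  rownames(out) <- NULL
  out
}

# The two groups from the question
a <- data.frame(depth = c(2, 4, 6, 8, 16, 18, 20), value = c(2, 3, 4, 5, 40, 45, 58))
b <- data.frame(depth = c(10, 12, 14), value = c(10, 20, 35))
aa <- break_gaps(a)
bb <- break_gaps(b)

# take the axis ranges from all groups so later groups are not clipped
plot(aa$value, aa$depth, type = "b", col = "green", pch = 15,
     xlim = range(a$value, b$value), ylim = range(a$depth, b$depth),
     xlab = "value", ylab = "depth")
points(bb$value, bb$depth, type = "b", col = "red", pch = 14)
```

With a single long-format table that has a grouping column (assumed here to be called `group`), the same helper could be applied per group with `lapply(split(df, df$group), break_gaps)`.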
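We can use the fact that lines will put breaks in wherever there is an NA, as MrFlick suggests. There might be a simpler way, though:
#Merge the two sets together; all = TRUE keeps every depth from either set, with NA where a set has no value
all <- merge(a, b, by = 'depth', all = TRUE)
#Plot the lines
plot(all$value.x, all$depth, type = 'b', col = 'green', pch = 15)
points(all$value.y, all$depth, type = 'b', col = 'red', pch = 14)
This only breaks a line where the other group has a measurement: a depth that is missing from both sets gets no NA row, so the points on either side of it are still joined.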
Still learning R, and have been struggling with plotting. Below is part of my data, and I will try to explain the type of plot:
> head(bees.net.counts)
Month Block Treatment Flower Bee_Richness Bee_Abundance
1 May 1 UB POSI 1 1
2 May 2 DS ERST 4 38
3 May 2 UB RUBU 2 2
4 May 3 DS ERST 3 4
5 May 3 DS TROH 1 10
6 May 3 GS ERST 1 1
I want to make a plot where Flower is on the x-axis (there are 54 different ones), Bee_Richness or Bee_Abundance is on the y-axis, Block (n=4) is shown by differently colored symbols, and Treatment (n=3) by the amount of shading in each of those symbols (i.e., Block 1 Treatment UB is an unfilled red circle, Block 1 Treatment DS is a half-shaded red circle, and Block 1 Treatment GS is a fully shaded red circle).
The problem I have is that each row is plotted separately instead of every point being placed above its specific flower species (there are multiple rows with, say, CHFA, but those represent different Blocks and Treatments).
I have also tried this by month, where I separated the four months to make different graphs (to limit the length of the x-axis). There are 10 records in May, with 4 different flower species. I still can't figure out a way to do this.
Thank you for your help!!
Edit: here is a sketch of what I hope to get (plot idea).
This uses the idea of @d.b's solution, but improves the axis labels.
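# df is the bees.net.counts data frame
plot(x = as.numeric(as.factor(df$Flower)), y = df$Bee_Richness,
     pch = as.numeric(as.factor(df$Block)),
     col = as.numeric(as.factor(df$Treatment)),
     xaxt = "n", xlab = "Flower", ylab = "Richness")
# levels() of a character column is NULL, so take the levels from the factor
flower.levels <- levels(as.factor(df$Flower))
axis(1, at = seq_along(flower.levels), labels = flower.levels)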
Some added explanation
In this version, the plotting character is based on Block and the color is based on Treatment. Let's look at color/Treatment first. The trick is that when you make Treatment a factor, each value is internally represented as an integer (the position of its level, with levels sorted alphabetically), so as.numeric on the factor translates DS to 1, GS to 2 and UB to 3. That makes the argument
col = as.numeric(as.factor(df$Treatment))
give DS color 1, and so on. R uses the numbers 1-8 as easy-to-access colors from the default palette; since you only need 3, this works fine.
Similarly,
pch = as.numeric(as.factor(df$Block))
picks plotting characters 1 through 3 for the three Block values in the small test data.
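To see the factor-to-integer mapping that explanation relies on, here is a small check with made-up Treatment values (illustrative only, not the real data):

```r
# Made-up Treatment values, only to illustrate the mapping
treatment <- c("UB", "DS", "DS", "GS", "UB")
f <- as.factor(treatment)
levels(f)                 # "DS" "GS" "UB": levels are sorted alphabetically
codes <- as.numeric(f)    # 3 1 1 2 3: each value's position among the levels
palette()[codes]          # the default-palette colors that col = codes would use
```

The same mapping is what turns Block into plotting characters via pch.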
Not sure whether this should go on Cross Validated or not, but we'll see. Basically, I recently obtained data from an instrument (masses of compounds from 0 to 630), which I binned into 0.025 bins before plotting a histogram, as seen below:
I want to identify the bins that have a high frequency and stand out against the background noise (the background noise increases as you move from right to left along the x-axis). Imagine drawing a curved line on top of the points that have almost blurred together into a black lump, and then selecting the bins that lie above that curve for further investigation; that's what I'm trying to do. I plotted a kernel density to see whether I could overlay it on my histogram and use it to identify points that lie above the curve. However, the density plot doesn't help at all, because the density values are too low (see the second plot). Does anyone have any recommendations for how I can go about solving this problem? The blue line is the overlaid density plot and the red line is the ideal solution (I need a way of automating this in R).
The data below is only part of my dataset, so it's not really a good representation of my plot (which contains about 300,000 points), and because my bins are quite small (0.025) the data are spread very widely (about 25,000 bins in total).
df <- read.table(header = TRUE, text = "
values
1 323.881306
2 1.003373
3 14.982121
4 27.995091
5 28.998639
6 95.983138
7 2.0117459
8 1.9095478
9 1.0072853
10 0.9038475
11 0.0055748
12 7.0964916
13 8.0725191
14 9.0765316
15 14.0102531
16 15.0137390
17 19.7887675
18 25.1072689
19 25.8338140
20 30.0151683
21 34.0635308
22 42.0393751
23 42.0504938
")
bin <- seq(0, 324, by = 0.025)
hist(df$values, breaks = bin, prob=TRUE, col = "grey")
lines(density(df$values), col = "blue")
Assuming you're dealing with a vector bin.densities that has the densities for each bin, a simple way to find outliers would be:
look at a window around each bin, say +- 50 bins
current.bin <- 1
window.size <- 50
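# parenthesise the bounds (`:` binds tighter than + and -) and clamp them to valid indices
window <- bin.densities[max(1, current.bin - window.size):min(length(bin.densities), current.bin + window.size)]
find the 5% lower and 95% upper quantiles of the window (or really any values you think work)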
lower.quant <- quantile(window, 0.05)
upper.quant <- quantile(window, 0.95)
then say that the current bin is an outlier if it falls outside your quantile range.
this.is.too.high <- (bin.densities[current.bin] > upper.quant)
this.is.too.low <- (bin.densities[current.bin] < lower.quant)
#final result
this.is.outlier <- this.is.too.high | this.is.too.low
I haven't actually tested this code, but this is the general approach I would take. You can play around with window size and the quantile percentages until the results look reasonable. Again, not exactly super complex math but hopefully it helps.
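The same steps can be wrapped in a function and applied to every bin at once. This is a sketch that only flags bins above the upper quantile (the peaks you are after); `flag_peaks` is a hypothetical name, and the window size and quantile are the same tunable guesses as above. With the question's code, the bin densities and bin centres would come from `h <- hist(df$values, breaks = bin, plot = FALSE)` as `h$density` and `h$mids`.

```r
# Flag bins whose density is above the `upper` quantile of the surrounding
# +/- window.size bins. Windows are clipped at both ends of the vector.
flag_peaks <- function(bin.densities, window.size = 50, upper = 0.95) {
  n <- length(bin.densities)
  vapply(seq_len(n), function(i) {
    win <- bin.densities[max(1, i - window.size):min(n, i + window.size)]
    bin.densities[i] > quantile(win, upper, names = FALSE)
  }, logical(1))
}

# Toy example: a flat background of 1s with a single spike in the middle
d <- c(rep(1, 20), 10, rep(1, 20))
which(flag_peaks(d, window.size = 5))   # 21
```

Applied to real data, `h$mids[flag_peaks(h$density)]` would give the masses of the flagged bins. If many bins in a window are empty, the upper quantile can be 0 and every non-empty bin there gets flagged, so a wider window or higher quantile may be needed.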
I have a data frame (mappedUn) with the structure:
C1 C2 C3 C4 C5 C6
1 1 1 3 1 1
3 3 3 16 3 3
10 NA 10 NA 6 6
11 NA 11 NA 10 11
NA NA NA NA 11 NA
NA NA NA NA 12 NA
Note: I have trimmed the entries in the example above to fit them here, and I have also replaced the column names to make it simpler.
I was wondering if there is a way to color-code scatter plots in R. I am using the pairs method to plot the different scatter plots; the call I run is:
pairs(mappedUn[1:6])
Here is what I get:
Notice that some plots have two points, some have 3, and so on. Is there a way to give each plot in the above graph a different background color based on how many points it has, for instance 4 points red, 3 yellow, 2 green, etc.?
My ultimate goal is to visually distinguish the plots with a high number of common points.
The key here is to customize the parameter panel inside pairs(). Try the following to see whether it meets your requirement.
n.notNA <- function(x){
# define the function that returns the number of non-NA values
return(length(x) - sum(is.na(x)))
}
myscatterplot <- function(x, y){
# ll is used for storing the parameters for plotting region
ll <- par("usr")
# bg is used for storing the color (an integer) of the background of current panel, which depends on the number of points. When x and y have different numbers of non-NA values, use the smaller one as the value of bg.
bg <- min(n.notNA(x), n.notNA(y))
# plot a rectangle framework whose dimension and background color are given by ll and bg
rect(ll[1], ll[3], ll[2], ll[4], col = bg)
# fill the rectangle with points
points(x, y)
}
# "panel = myscatterplot" means in each panel, the plot is given by "myscatterplot()" using appropriate combination of variables
pairs(mappedUn, panel = myscatterplot)
A related question: R: How to colorize the diagonal panels in a pairs() plot?
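To get the specific colors asked for (4 points red, 3 yellow, 2 green), a variant of the panel function could look the color up by count. This sketch counts the points actually drawn, i.e. rows where both x and y are non-NA, instead of the smaller of the two non-NA counts (the two agree when NAs only occur at the bottom, as in the sample). The data frame is the small sample from the question; `panel_count`, `count_colors` and `count_panel` are hypothetical names.

```r
# The small sample from the question
mappedUn <- data.frame(
  C1 = c(1, 3, 10, 11, NA, NA),
  C2 = c(1, 3, NA, NA, NA, NA),
  C3 = c(1, 3, 10, 11, NA, NA),
  C4 = c(3, 16, NA, NA, NA, NA),
  C5 = c(1, 3, 6, 10, 11, 12),
  C6 = c(1, 3, 6, 11, NA, NA)
)

# Number of points drawn in a panel: rows where both x and y are present
panel_count <- function(x, y) sum(complete.cases(x, y))

# Background color per point count; counts not listed fall back to white
count_colors <- c("2" = "green", "3" = "yellow", "4" = "red")

count_panel <- function(x, y, ...) {
  fill <- count_colors[as.character(panel_count(x, y))]
  if (is.na(fill)) fill <- "white"
  usr <- par("usr")
  rect(usr[1], usr[3], usr[2], usr[4], col = fill)
  points(x, y, ...)   # pass on any extra graphical arguments given to pairs()
}

pairs(mappedUn, panel = count_panel)
```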
I have a number of coordinates and I want to plot them in a gridded interface by using R.
The problem is that the relative distances between observations are large. The coordinates are in a geographic coordinate system and the study area is Switzerland. Moreover, the id of each point needs to be plotted.
The problem is that two clusters of points are dense while some other points are separated by large distances. How can I plot them in a proper way to get a readable presentation? Any suggestions for plotting the data?
Preferably without ggplot, as I used it before and it did not give proper results.
Data:
id x y
2 7.1735 45.86880001
3 7.17254 45.86887001
4 7.171636 45.86923601
5 7.18018 45.87158001
6 7.17807 45.87014001
7 7.177229 45.86923001
8 7.17524 45.86808001
9 7.181409 45.87177001
10 7.179299 45.87020001
11 7.178359 45.87070001
12 7.175189 45.86974001
13 7.179379 45.87081001
14 7.175509 45.86932001
15 7.176839 45.86939001
17 7.18099 45.87262001
18 7.18015 45.87248001
19 7.18122 45.87355001
20 7.17491 45.86922001
25 7.15497 45.87058001
28 7.153399 45.86954001
29 7.152649 45.86992001
31 7.154419 45.87004001
32 7.156099 45.86983001
GSBi_1 7.184 45.896
GSBi__1 7.36 45.901
GSBj__1 7.268 45.961
GSBj_1 7.276 45.836
GSB 7.272 45.899
GSB_r 7.166667 45.866667
Location of points:
As you can see in the plot, the point ids are not readable, either in the dense parts or elsewhere.
Practically, it is not always possible to ensure that all points are visually separable on the screen when plotting a set of points that contains very close and very far points at the same time.
Think of a 1000x800 pixel screen. Let's say we have three points A, B and C that lie on the same horizontal line such that the distance between A and B is 1 unit and the distance between A and C is 4000 units.
If you map this maximum distance (4000 units) to the width of the screen (1000 px), then one pixel corresponds to 4 units horizontally. That means A and B fall into the same pixel, since the distance between them is only 1 unit, so they will not be visually separable on the screen.
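The arithmetic of that example, written out (the screen width and distances are the illustrative numbers from the text, not properties of the Swiss data):

```r
screen_px <- 1000                      # screen width in pixels
max_dist  <- 4000                      # A to C, in data units
units_per_px <- max_dist / screen_px   # 4 units per pixel
ab_px <- 1 / units_per_px              # A to B spans 0.25 px, less than one pixel
```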
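Your points are far too close together to do much with, but one idea might be spread.labels from plotrix:
library(plotrix)
# dat holds the id/x/y table from the question
# xpd = TRUE lets labels extend outside the plot region; par() returns the old setting for restoring
opar <- par(xpd = TRUE)
plot(dat$x, dat$y)
spread.labels(dat$x, dat$y, dat$id)
par(opar)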
You may want to consider omitting all the numerical labels and placing them in a different graph.
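One way to follow that suggestion is an overview panel plus one zoomed panel per dense cluster, each with its own labels. The sketch below uses a subset of the question's points. Grouping the points by single-linkage clustering with a cut height of 0.01 degrees is an assumption to tune, and distances are plain Euclidean in degrees rather than true ground distances.

```r
# A subset of the question's points (copied from the data above)
dat <- data.frame(
  id = c("2", "3", "4", "25", "28", "29", "GSB", "GSBj_1"),
  x  = c(7.1735, 7.17254, 7.171636, 7.15497, 7.153399, 7.152649, 7.272, 7.276),
  y  = c(45.86880001, 45.86887001, 45.86923601, 45.87058001,
         45.86954001, 45.86992001, 45.899, 45.836)
)

# Points chained together by gaps below h (in degrees) share a cluster
cl <- cutree(hclust(dist(dat[, c("x", "y")]), method = "single"), h = 0.01)
sizes <- tabulate(cl)
lone <- cl %in% which(sizes == 1)   # isolated points

op <- par(mfrow = c(1, 1 + sum(sizes > 1)))
# Panel 1: overview of everything, labelling only the isolated points
plot(dat$x, dat$y, xlab = "x", ylab = "y", main = "Overview")
text(dat$x[lone], dat$y[lone], dat$id[lone], pos = 3, cex = 0.7)
# One zoomed panel per multi-point cluster, with every point labelled
for (k in which(sizes > 1)) {
  s <- dat[cl == k, ]
  plot(s$x, s$y, xlab = "x", ylab = "y", main = paste("Cluster", k))
  text(s$x, s$y, s$id, pos = 3, cex = 0.7)
}
par(op)
```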
I have a text file containing numbers (floats) that represent times in seconds. I want to show the number of occurrences in every 15-minute interval. A sample of my file is:
0.128766
2.888977
25.087900
102.787657
400.654768
879.090874
903.786754
1367.098789
1456.678567
1786.564569
1909.567567
For the first 900 seconds (15 minutes), there are 6 occurrences, so I want to plot 6 on the y-axis first. Then from 900 to 1800 seconds (the next 15 minutes), there are 4 occurrences, so I want to plot 4 next. This should go on...
I know the basic plot() function, but I don't know how to plot the counts for every 15 minutes. If there is a relevant link, please point me to it.
Use findInterval():
# x holds the times in seconds, e.g. read from the text file with scan()
counts <- table(findInterval(x, seq(0, max(x), 900)))
counts
1 2 3
6 4 1
It's easy to plot:
plot(counts)
To build on Andrie's answer: you can use plot(counts, type = 'p') to plot points or plot(counts, type = 'l') to plot a connected line. If you want to plot a fitted curve for the counts, you would need to model them using ?lm or ?nls.
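One caveat with table(): a 15-minute interval with no events is simply missing from the result, so later points shift left. A sketch that keeps empty intervals as zeros uses tabulate() with an explicit number of bins; `count_per_interval` is a hypothetical helper, and the times below are the sample from the question.

```r
# Count events per interval of `width` seconds, keeping empty intervals as 0.
count_per_interval <- function(x, width = 900) {
  # breaks 0, width, 2*width, ... always extending past max(x)
  breaks <- seq(0, width * (floor(max(x) / width) + 1), by = width)
  tabulate(findInterval(x, breaks), nbins = length(breaks) - 1)
}

x <- c(0.128766, 2.888977, 25.087900, 102.787657, 400.654768, 879.090874,
       903.786754, 1367.098789, 1456.678567, 1786.564569, 1909.567567)
counts <- count_per_interval(x)   # 6 4 1
plot(seq_along(counts), counts, type = "b",
     xlab = "15-minute interval", ylab = "Occurrences")
```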