I have a number of coordinates and I want to plot them in a gridded interface using R.
The problem is that the relative distances between observations are large. The coordinates are in a geographic coordinate system and the study area is Switzerland. Moreover, the id of each point needs to be plotted.
Two clusters of points are dense, while some other points are separated by large distances. How can I plot them so that the presentation is readable? Any suggestion for plotting the data?
Preferably without ggplot, as I have used it before and it did not produce proper results.
Data:
id x y
2 7.1735 45.86880001
3 7.17254 45.86887001
4 7.171636 45.86923601
5 7.18018 45.87158001
6 7.17807 45.87014001
7 7.177229 45.86923001
8 7.17524 45.86808001
9 7.181409 45.87177001
10 7.179299 45.87020001
11 7.178359 45.87070001
12 7.175189 45.86974001
13 7.179379 45.87081001
14 7.175509 45.86932001
15 7.176839 45.86939001
17 7.18099 45.87262001
18 7.18015 45.87248001
19 7.18122 45.87355001
20 7.17491 45.86922001
25 7.15497 45.87058001
28 7.153399 45.86954001
29 7.152649 45.86992001
31 7.154419 45.87004001
32 7.156099 45.86983001
GSBi_1 7.184 45.896
GSBi__1 7.36 45.901
GSBj__1 7.268 45.961
GSBj_1 7.276 45.836
GSB 7.272 45.899
GSB_r 7.166667 45.866667
Location of points: [plot of the coordinates omitted]
As you can see in the plot, the points' ids are not readable, neither in the dense clusters nor elsewhere.
Practically, it is not always possible to ensure that all points are visually separable on the screen when the set contains both very close and very distant points.
Think of a 1000x800 pixel screen. Say we have three points A, B and C on the same horizontal line, such that the distance between A and B is 1 unit and the distance between A and C is 4000 units.
If you map this maximum distance (4000 units) to the width of the screen (1000 px), then one pixel corresponds to 4 units horizontally. That means A and B will fall within the same pixel, since the distance between them is only 1 unit, so they will not be visually separable on the screen.
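To make the arithmetic concrete, here is a tiny sketch of that mapping in R (the positions are just the hypothetical numbers from the example above):
xs <- c(A = 0, B = 1, C = 4000)  # positions in data units
px <- round(xs / 4000 * 1000)    # map the full 4000-unit extent onto 1000 pixels
px
#    A    B    C
#    0    0 1000   <- A and B land on the same pixel column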
Your points are far too close to really do too much with, but an idea might be spread.labels from plotrix:
library(plotrix)
opar <- par(no.readonly = TRUE)  # save settings so they can be restored cleanly
par(xpd = TRUE)                  # let labels draw outside the plot region
plot(dat$x, dat$y)
spread.labels(dat$x, dat$y, dat$id)
par(opar)                        # restore settings
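Here dat is assumed to be a data frame holding the posted coordinates; for example, it could be read in like this (first and last rows shown, the rest follow the same pattern):
dat <- read.table(header = TRUE, text = "
id x y
2 7.1735 45.86880001
3 7.17254 45.86887001
GSB 7.272 45.899
GSB_r 7.166667 45.866667
")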
You may want to consider omitting all the numerical labels and placing them in a different graph.
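One way to do that (a sketch, using base graphics and a hypothetical short index in place of the full ids): draw the points in one panel, labelled with a compact index, and put the index-to-id key in a second panel:
layout(matrix(1:2, ncol = 2), widths = c(2, 1))
plot(dat$x, dat$y, pch = 16)
text(dat$x, dat$y, labels = seq_len(nrow(dat)), pos = 3, cex = 0.7)  # short index
plot.new()
legend("topleft", legend = paste(seq_len(nrow(dat)), dat$id, sep = " = "),
       bty = "n", cex = 0.7)
layout(1)  # reset the layout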
I have a file with 52,000 points distributed in Brazil and a map of forest remnants (in polygon format).
What I want to do is calculate the distance from each point to each forest fragment that is within a buffer of, for example, 500 m. So, if I have 3 fragments within a 500 m buffer, I want all three distances (Euclidean) calculated from the centroid (focal point) to those fragments.
In the end I would like to take the mean distance from each focal point to its respective fragments.
I tried the function gWithinDistance, from the package "rgeos", like below:
near_frag_500 <- gWithinDistance (points, veg_natural, 500, byid=T)
where the argument "points" contains my focal points and "veg_natural" my forest remnant polygons. The number 500 refers to the 500 m buffer within which I want to calculate the distances. However, the output of this function is a matrix of TRUE or FALSE values: TRUE for those polygons which fall within the 500 m buffer and FALSE for those which fall outside it. It doesn't give me the actual values of the calculated distances. I guess what I am looking for is an equivalent of the "Generate Near Table" function in ArcGIS.
I would really appreciate it if someone could help me with that! I also have my forest remnant polygons in raster format, if there is a solution using a raster file.
I have made a simple test set with 7 points and 8 polygons. Everything has to be projected to a Cartesian system in metres, so not lat-long; use a local UTM zone if nothing else.
I compute the distance matrix from points to polygons:
> dmat = gDistance(points, veg_natural,byid=TRUE)
Then mask out anything over 500, and compute the row means:
> dmat[dmat>500]=NA
> apply(dmat, 1, mean, na.rm=TRUE)
0 1 2 3 4 5 6 7
331.5823 262.7129 380.2073 187.2068 111.9961 NaN 224.6962 360.7995
and that is the mean of the distances from each point to the nearest features within 500 m. Note the NaN for point 5, which appears because that point is not within 500 m of any polygon feature.
If this matrix is too big for your case with 52,000 points (and ?? polygons?), then just do it for 1000 points at a time in a loop, or whatever your computer can cope with; a sketch follows below. I think mine would fall over with 52,000.
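A minimal sketch of that chunked loop, repeating the gDistance / mask / apply steps above for 1000 points at a time (points and veg_natural are the object names from the question, and the apply margin simply mirrors the call above):
library(rgeos)
chunk.size <- 1000
res <- list()
for (start in seq(1, length(points), by = chunk.size)) {
  idx  <- start:min(start + chunk.size - 1, length(points))
  dmat <- gDistance(points[idx, ], veg_natural, byid = TRUE)  # distances for this chunk
  dmat[dmat > 500] <- NA                                      # mask anything over 500 m
  res[[length(res) + 1]] <- apply(dmat, 1, mean, na.rm = TRUE)
}
mean.dist <- unlist(res)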
If you want to know which of the polygons are the ones within 500m of each point, then something like:
> apply(dmat,1, function(r){which(!is.na(r))})
$`0`
5 6
5 6
$`1`
4 5 7
4 5 7
shows my first point (labelled 0) is near to polygons 5 and 6.
I have a data frame data and I would like to draw a proper density plot for it. When I draw the plot, the interval shown covers a wider range than my data.
input:
X Y
1 0.4078791 0.471845
2 0.2892282 0.205871
3 0.4254774 0.407548
4 0.4749196 0.396765
5 0.2763627 0.142572
6 0.3942402 0.457668
7 0.2427948 0.248003
8 0.3117754 0.322484
9 0.4350599 0.450679
10 0.4459200 0.338858
That's how kernel density estimation works. The result must cover a wider range than your data. You can try different kernels and bandwidth algorithms, or fiddle with the adjust parameter (see the sketch below), but you actually want the density estimate to cover a wider range than your data; otherwise it wouldn't be a proper density estimate.
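For example, a quick way to see the effect of the kernel and bandwidth choices, assuming your data frame is called data as in the question:
d1 <- density(data$X)                # defaults: Gaussian kernel, bw = "nrd0"
d2 <- density(data$X, adjust = 0.5)  # halve the bandwidth: hugs the data more tightly
d3 <- density(data$X, kernel = "epanechnikov")
plot(d1, main = "Kernel density estimates of X")
lines(d2, col = "red")
lines(d3, col = "blue")
A smaller adjust narrows the tails, but some spill beyond the data range always remains.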
Not sure whether this should go on Cross Validated or not, but we'll see. Basically I obtained data from an instrument just recently (masses of compounds from 0 to 630) which I binned into 0.025 bins before plotting a histogram, as seen below:
I want to identify the bins that are of high frequency and that stand out against the background noise (the background noise increases as you move from right to left on the x-axis). Imagine drawing a curved line on top of the points that have almost blurred together into a black lump, and then selecting the bins that exist above that curve for further investigation; that's what I'm trying to do. I plotted a kernel density estimate to see if I could overlay it on top of my histogram and use it to identify points that sit above the curve. However, the density plot makes no headway with this, as the densities are too low in value (see the second plot). Does anyone have any recommendations as to how I can go about solving this problem? The blue line represents the overlaid density function and the red line represents the ideal solution (I need a way of somehow automating this in R).
The data below is only part of my dataset, so it's not really a good representation of my plot (which contains about 300,000 points), and as my bin sizes are quite small (0.025) there's just a huge spread of data (in total there are 25,000 or so bins).
df <- read.table(header = TRUE, text = "
values
1 323.881306
2 1.003373
3 14.982121
4 27.995091
5 28.998639
6 95.983138
7 2.0117459
8 1.9095478
9 1.0072853
10 0.9038475
11 0.0055748
12 7.0964916
13 8.0725191
14 9.0765316
15 14.0102531
16 15.0137390
17 19.7887675
18 25.1072689
19 25.8338140
20 30.0151683
21 34.0635308
22 42.0393751
23 42.0504938
")
bin <- seq(0, 324, by = 0.025)
hist(df$values, breaks = bin, prob=TRUE, col = "grey")
lines(density(df$values), col = "blue")
Assuming you're dealing with a vector bin.densities that has the densities for each bin, a simple way to find outliers would be:
look at a window around each bin, say +/- 50 bins:
current.bin <- 1
window.size <- 50
# parentheses matter here, since `:` binds tighter than `-` and `+`; the
# indices are also clamped so the window cannot run off the ends of the vector
window <- bin.densities[max(1, current.bin - window.size):
                          min(length(bin.densities), current.bin + window.size)]
find the 95% upper and 5% lower quantile values (or really any values you think work):
lower.quant <- quantile(window, 0.05)
upper.quant <- quantile(window, 0.95)
then say that the current bin is an outlier if it falls outside your quantile range.
this.is.too.high <- (bin.densities[current.bin] > upper.quant)
this.is.too.low  <- (bin.densities[current.bin] < lower.quant)
# final result
this.is.outlier <- this.is.too.high | this.is.too.low
I haven't actually tested this code, but it is the general approach I would take; a loop over all the bins is sketched below. You can play around with the window size and the quantile percentages until the results look reasonable. Again, not exactly super complex math, but hopefully it helps.
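For completeness, the same test wrapped in a loop over every bin (a sketch, assuming bin.densities is the full vector of per-bin densities as above):
window.size <- 50
n <- length(bin.densities)
is.outlier <- vapply(seq_len(n), function(current.bin) {
  window <- bin.densities[max(1, current.bin - window.size):
                            min(n, current.bin + window.size)]
  bin.densities[current.bin] > quantile(window, 0.95) ||
    bin.densities[current.bin] < quantile(window, 0.05)
}, logical(1))
which(is.outlier)  # indices of the bins that stand out from their neighbourhood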
I have 2 sets of depth point measurements, for example:
> a
depth value
1 2 2
2 4 3
3 6 4
4 8 5
5 16 40
6 18 45
7 20 58
> b
depth value
1 10 10
2 12 20
3 14 35
I want to show both groups in one figure, plotted against depth and with different symbols, as you can see here:
plot(a$value, a$depth, type='b', col='green', pch=15)
points(b$value, b$depth, type='b', col='red', pch=14)
The plot looks okay, but the annoying part is that the green symbols are all connected (though I do want connecting lines). I want a connection only where a group has continuous data points at the 2 m interval, i.e. the symbols should be connected with a line from 2 to 8 m (green), then group B symbols should be connected from 10 to 14 m (red), and then group A symbols should be connected again (green). In other words, I do NOT want to see a connection between the 8 m sample and the 16 m sample for group A.
An easy solution might be dividing group A into two parts (say, A-shallow and A-deep) and then plotting A-shallow, B, and A-deep separately. But this is completely impractical because I have thousands of data points with hundreds of groups, i.e. I have to produce many depth profiles. Therefore, there has to be a way to program it so that dots are NOT connected beyond a prescribed depth interval (e.g. 2 m in this case) for a particular group of samples. Any idea?
If plot or lines encounters an NA value, it will automatically break the line. Using that, we can insert NA values for the missing measurements in your data, and that fixes the problem. One way is this:
# common depth grid across both groups, at the 2 m spacing
rng <- range(range(a$depth), range(b$depth))
rng <- seq(rng[1], rng[2], by = 2)
# expand each group onto that grid, leaving NA where a depth was not measured
aa <- rep(NA, length(rng))
aa[match(a$depth, rng)] <- a$value
bb <- rep(NA, length(rng))
bb[match(b$depth, rng)] <- b$value
plot(aa, rng, type = 'b', col = 'green', pch = 15)
points(bb, rng, type = 'b', col = 'red', pch = 14)
which produces the desired plot, with the lines broken at the unmeasured depths.
Note that this code assumes that all depth measurements are evenly divisible by 2.
I'm not sure if you really have separate data.frames for all of your groups, but there may be better ways to fill in missing values depending on your real data structure.
We can use the fact that lines will put breaks in when there is an NA, as MrFlick suggests. There might be a simpler way, though:
# Merge the two sets together; all = TRUE keeps every depth, filling NA elsewhere
all <- merge(a, b, by = 'depth', all = TRUE)
# Plot the lines
plot(all$value.x, all$depth, type = 'b', col = 'green', pch = 15)
points(all$value.y, all$depth, type = 'b', col = 'red', pch = 14)
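For reference, with the posted a and b the merged frame looks like this; the NAs in value.x at depths 10-14 are exactly what break the green line:
> all
   depth value.x value.y
1      2       2      NA
2      4       3      NA
3      6       4      NA
4      8       5      NA
5     10      NA      10
6     12      NA      20
7     14      NA      35
8     16      40      NA
9     18      45      NA
10    20      58      NA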
I'm fairly new to R, but I am trying to create line graphs that monitor the growth of bacteria over the course of time. I can do this successfully, but the resulting graph isn't to my satisfaction. This is because I'm not using evenly spaced time increments, although R plots the increments equally. Here is some sample data to give you an idea of what I'm talking about:
x=c(.1,.5,.6,.7,.7)
plot(x,type="o",xaxt="n",xlab="Time (hours)",ylab="Growth")
axis(1,at=1:5,lab=c(0,24,72,96,120))
As you can see, there are 48 hours between 24 and 72, but this is evenly distributed on the graph. Is there any way I can adjust the scale to display my data more accurately?
It's always best in R to use data structures that exhibit the relationships between your data. Instead of defining growth and time as two separate vectors, use a data frame:
growth <- c(.1,.5,.6,.7,.7)
time <- c(0,24,72,96,120)
df <- data.frame(time,growth)
print(df)
time growth
1 0 0.1
2 24 0.5
3 72 0.6
4 96 0.7
5 120 0.7
plot(df, type="o")
Not sure if this produces the exact x-axis labels that you want, but you are now free to edit the graph without changing the relationship between the growth and time variables; for instance, you can put tick marks exactly at the measured times, as sketched below.
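A small sketch using the df built above:
plot(df, type = "o", xaxt = "n")  # suppress the default x-axis
axis(1, at = df$time)             # ticks at 0, 24, 72, 96, 120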
x <- data.frame(x = c(.1, .5, .6, .7, .7), y = c(0, 24, 72, 96, 120))
# plotting against the actual times spaces the points to scale
plot(x$y, x$x, type = "o", xlab = "Time (hours)", ylab = "Growth")