I feel like this is an easy question...
How do you identify coordinates in a figure? I plotted some data, used unireg (the uniReg package) to make a spline curve, and want to pull out the data from a point.
library(uniReg)
P0mM <- read.table(text="
Time FeuM
0.04 138.8181818
7 1258.636364
14 1320.545455
21 2110.37037
28 13730.37037
35 1550.909091",header=TRUE)
z=seq(min(P0mM$Time),max(P0mM$Time),length=201)
uf=with(P0mM,unireg(Time,FeuM,g=5,sigma=1))
plot(FeuM~Time,P0mM,ylim=c(0,16000),ylab="Fe2+ uM", xlab="Time", main="0mM P")
lines(z,uf$unimod.func(z))
I was able to find the max y value of the curve (which is 14444)
max((uf$unimod.func(z)))
I want to identify where on the x axis this happens. (Should be around 30, but I want to be exact).
How do you do this?
Thanks!
Seems like a case for optimise or optimize (depending on your affinity with British or American English):
optimise(uf$unimod.func, maximum=TRUE, interval=range(P0mM$Time))
#$maximum
#[1] 29.27168
#
#$objective
# [,1]
#[1,] 14444.85
Related
I want to identify a couple of points with high leverage on the plot below, but unfortunately, their row number is illegible, because there must be a couple of such points, and their id is printed out one on top of the other. They are all the way to the right of the plot:
How can the print out of these labels on the plot be resized and spread out so that they can be legible?
The easiest way to find the cooks distance is the built in function:
LM = lm(speed ~ dist, cars)
cooks.distance(LM)
You can pick out whatever values you want:
> which(cooks.distance(LM) > 0.05)
1 2 23 35 39 49
1 2 23 35 39 49
Not sure whether this should go on cross validated or not but we'll see. Basically I obtained data from an instrument just recently (masses of compounds from 0 to 630) which I binned into 0.025 bins before plotting a histogram as seen below:-
I want to identify the bins that are of high frequency and that stands out from against the background noise (the background noise increases as you move from right to left on the a-xis). Imagine drawing a curve line ontop of the points that have almost blurred together into a black lump and then selecting the bins that exists above that curve to further investigate, that's what I'm trying to do. I just plotted a kernel density plot to see if I could over lay that ontop of my histogram and use that to identify points that exist above the plot. However, the density plot in no way makes any headway with this as the densities are too low a value (see the second plot). Does anyone have any recommendations as to how I Can go about solving this problem? The blue line represents the density function plot overlayed and the red line represents the ideal solution (need a way of somehow automating this in R)
The data below is only part of my dataset so its not really a good representation of my plot (which contains just about 300,000 points) and as my bin sizes are quite small (0.025) there's just a huge spread of data (in total there's 25,000 or so bins).
df <- read.table(header = TRUE, text = "
values
1 323.881306
2 1.003373
3 14.982121
4 27.995091
5 28.998639
6 95.983138
7 2.0117459
8 1.9095478
9 1.0072853
10 0.9038475
11 0.0055748
12 7.0964916
13 8.0725191
14 9.0765316
15 14.0102531
16 15.0137390
17 19.7887675
18 25.1072689
19 25.8338140
20 30.0151683
21 34.0635308
22 42.0393751
23 42.0504938
")
bin <- seq(0, 324, by = 0.025)
hist(df$values, breaks = bin, prob=TRUE, col = "grey")
lines(density(df$values), col = "blue")
Assuming you're dealing with a vector bin.densities that has the densities for each bin, a simple way to find outliers would be:
look at a window around each bin, say +- 50 bins
current.bin <- 1
window.size <- 50
window <- bin.densities[current.bin-window.size : current.bin+window.size]
find the 95% upper and lower quantile value (or really any value you think works)
lower.quant <- quantile(window, 0.05)
upper.quant <- quantile(window, 0.95)
then say that the current bin is an outlier if it falls outside your quantile range.
this.is.too.high <- (bin.densities[current.bin] > upper.quant
this.is.too.low <- (bin.densities[current.bin] < lower.quant)
#final result
this.is.outlier <- this.is.too.high | this.is.too.low
I haven't actually tested this code, but this is the general approach I would take. You can play around with window size and the quantile percentages until the results look reasonable. Again, not exactly super complex math but hopefully it helps.
I'm fairly new to R but I am trying to create line graphs that monitor growth of bacteria over the course of time. I can successfully do this but the resulting graph isn't to my satisfaction. This is because I'm not using evenly spaced time increments although R plots these increments equally. Here is some sample data to give you and idea of what I'm talking about.
x=c(.1,.5,.6,.7,.7)
plot(x,type="o",xaxt="n",xlab="Time (hours)",ylab="Growth")
axis(1,at=1:5,lab=c(0,24,72,96,120))
As you can see there are 48 hours between 24 and 72 but this is evenly distributed on the graph, is there anyway I can adjust the scale to more accurately display my data?
It's always best in R to use data structures that exhibit the relationships between your data. Instead of defining growth and time as two separate vectors, use a data frame:
growth <- c(.1,.5,.6,.7,.7)
time <- c(0,24,72,96,120)
df <- data.frame(time,growth)
print(df)
time growth
1 0 0.1
2 24 0.5
3 72 0.6
4 96 0.7
5 120 0.7
plot(df, type="o")
Not sure if this produces the exact x-axis labels that you want, but you should be free to edit the graph now without changing the relationship between the growth and time variables.
x=data.frame(x=c(.1,.5,.6,.7,.7), y=c(0,24,72,96,120))
plot(x$y, x$x,type="o",xaxt="n",xlab="Time (hours)",ylab="Growth")
I have a number of coordinates and I want to plot them in a gridded interface by using R.
The problem is that the relative distance between observations is large. Coordinates are in a geographic coordinate system and the study area is Switzerland. Moreover, id of the points is required to be plotted.
The problem is that two clusters of the points are dense and some other points are separated with a large distance. How I can plot them in a proper way to have readable presentation? Any suggestion for plotting the data?
Preferably, do not use ggplot as I used it before and it did not present proper results.
Data:
id x y
2 7.1735 45.86880001
3 7.17254 45.86887001
4 7.171636 45.86923601
5 7.18018 45.87158001
6 7.17807 45.87014001
7 7.177229 45.86923001
8 7.17524 45.86808001
9 7.181409 45.87177001
10 7.179299 45.87020001
11 7.178359 45.87070001
12 7.175189 45.86974001
13 7.179379 45.87081001
14 7.175509 45.86932001
15 7.176839 45.86939001
17 7.18099 45.87262001
18 7.18015 45.87248001
19 7.18122 45.87355001
20 7.17491 45.86922001
25 7.15497 45.87058001
28 7.153399 45.86954001
29 7.152649 45.86992001
31 7.154419 45.87004001
32 7.156099 45.86983001
GSBi_1 7.184 45.896
GSBi__1 7.36 45.901
GSBj__1 7.268 45.961
GSBj_1 7.276 45.836
GSB 7.272 45.899
GSB_r 7.166667 45.866667
Location of points:
As you can see in the plot, the points' ids are not readable both for the dense parts and others.
Practically, it is not always possible to ensure that all points are visually separable on the screen when plotting a set of points that contains very close and very far points at the same time.
Think of a 1000x800 pixel screen. Let's say we have three points A, B and C that are located respectively on the same horizontal line such that: the distance between A and B is 1 unit and the distance between A and C is 4000 unit.
If you map this maximum distance (4000 unit) to the width of the screen (1000px). Then a pixel will correspond to 4 units in horizontal. That means A and B will fit into one pixel since the distance between them is only 1 unit. So, they will not be visually separable on the screen.
Your points are far too close to really do too much with, but an idea might be spread.labels from plotrix:
opar <- par()
par(xpd=TRUE)
plot(dat$x, dat$y)
spread.labels(dat$x,dat$y,dat$id)
par(opar)
You may want to consider omitting all the numerical labels and placing them in a different graph.
I am trying to visualize 3-tuple points that are NOT in a grid, and x and y are NOT equally spaced. Thus I can not make a matrix as mostly required, nor can I meet the requirements of the lattice contourplot, which accepts vectors, but they have to be in a pretty restrictive form. (x,y must form a grid and be equally spaced...)
I don't care, whether the result is a 3d surface or a 2D contourplot. But in some way I'ld like to visualize a (probably interpolated) surface of my 3-tuples.
Data will look like this:
myX myY myZ
1 458 4 0.54
2 101 5 0.46
3 390 0 0.45
4 186 2 0.84
5 241 3 0.50
6 495 2 0.67
I have tried several plotting functions from graphics, rgl and lattice packages.
I understand that the connecting of x,y pairs at arbitrary positions is everything but trivial - but is there any plotting function in any package, which can handle this? Or can I fill (interpolate) my data beforehand easily in order to have a full matrix? (I have fitted models visualized, but I want to see the raw data...)
Any help or hint is appreciated!
Cheers,
Niko
I bit hard to understand the question, but I will try to show how one interpolates to a full matrix. I usually use the interp function from the akima package:
set.seed(1)
x <- runif(20)
y <- runif(20)
z <- x^3 + sin(y)
require(akima)
F <- interp(x,y,z)
image(F)
points(x,y)
Here's an example of extrapolation:
F <- interp(x,y,z, linear=FALSE, extrap=TRUE)
image(F)
points(x,y)