Plot smooth graph from user defined points - r

I would like to draw a smooth graph by specifying just a few points. An example would be the following graph:
[desired graph]
I would like to draw it with the following points:
x <- c(7, 8, 9, 11, 12, 13, 16, 17, 18)
y <- c(0.05, 0.95, 0.3, 0.3, 0.7, 0.3, 0.3, 0.95, 0.2)
How can I estimate the missing points, so that it results in a smooth graph that looks similar to the figure?

A good way to do this is with a spline function. That will give you not only the curve, but a function to estimate y for any x.
SF = splinefun(x,y)
curve(SF, xlim=c(7,18))
points(x,y, pch=16, col="red")
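
Because splinefun returns an ordinary R function, the missing points can be estimated by calling it on any x values. A minimal sketch, reusing the x and y from the question; the names xs and est are illustrative only and not part of the original answer:

```r
# Points from the question
x <- c(7, 8, 9, 11, 12, 13, 16, 17, 18)
y <- c(0.05, 0.95, 0.3, 0.3, 0.7, 0.3, 0.3, 0.95, 0.2)

# splinefun() returns a function that interpolates the points
SF <- splinefun(x, y)

# Estimate y at x values that were not supplied
SF(c(10, 14, 15))

# Evaluate on a fine grid to obtain the points of the smooth curve
xs  <- seq(min(x), max(x), length.out = 200)
est <- data.frame(x = xs, y = SF(xs))
head(est)
```

The curve passes through the supplied points, but between them a cubic spline can overshoot, so estimates may fall outside the range of the original y values.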

With 9 data points you can fit an 8th-order polynomial to obtain something like it; a polynomial of that degree has just enough coefficients to interpolate every point. This can be fit directly to your data with stat_smooth (or the equivalent geom_smooth) from ggplot2 in the following way:
x <- c(7, 8, 9, 11, 12, 13, 16, 17, 18)
y <- c(0.05, 0.95, 0.3, 0.3, 0.7, 0.3, 0.3, 0.95, 0.2)
df <- data.frame(x,y)
library(ggplot2)
ggplot(df, aes(x,y)) +
  geom_point() +
  stat_smooth(method="lm",
              formula=y ~ poly(x, 8, raw=TRUE),
              se=FALSE,  # no residual degrees of freedom, so no confidence band
              colour="red")

Related

How to plot discrete density function in R

How can I graph a "density function" if I have, for example, X = 12, 13, 14 with probabilities of 0.25, 0.25, 0.50, all on the same graph? Each number has the stated probability.
Yes. Many ways to do this. barplot() is one:
dat <- data.frame(
  x = 12:14,
  p = c(0.25, 0.25, 0.5)
)
barplot(p~x, data=dat)
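
Another of those ways, sketched here under the assumption that a spike plot is an acceptable picture of the discrete distribution, is base plot() with type = "h", which draws one vertical line per value of x. The axis labels and line width are illustrative choices.

```r
dat <- data.frame(
  x = 12:14,
  p = c(0.25, 0.25, 0.5)
)

# The probabilities of a discrete distribution should sum to 1
stopifnot(isTRUE(all.equal(sum(dat$p), 1)))

# One vertical spike per value of x, with height equal to its probability
plot(dat$x, dat$p, type = "h", lwd = 3,
     xlab = "x", ylab = "P(X = x)",
     ylim = c(0, max(dat$p)), xaxt = "n")
axis(1, at = dat$x)             # tick marks only at the observed values
points(dat$x, dat$p, pch = 16)  # mark the top of each spike
```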

x-y Plot based on 2 column table

I'm really new to R and I'm trying to convert a 2 column table into an xy-Plot.
Here's my .csv:
x [cm];y [cm]
0.5;0
2.6;9
0.5;1
0.6;2
0.7;3
0.8;4
1;5
1.2;6
1.5;7
1.9;8
I then call plot(data$`x [cm]`,data$`y [cm]`, type="b").
However, I get this result:
I'm not quite sure why (0.5/y) and (2.6/y) are connected.
What I want is a simple line connecting all the dots since they are representing electric field lines. Is there an easy way of doing that?
Sort your data first:
data <- data[order(data[,1]),]
plot(data[,1], data[,2], type="b", xlab="x [cm]", ylab="y [cm]")
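
For completeness, a sketch of the whole route from file to sorted data frame. It assumes the file is semicolon-separated with "." as the decimal mark, as in the listing in the question. The data is read from an inline string (the hypothetical variable csv_text) so the example is self-contained; with a real file, the text argument would be replaced by the file path. check.names = FALSE keeps the column names `x [cm]` and `y [cm]` unchanged.

```r
# Inline copy of the .csv from the question
csv_text <- "x [cm];y [cm]
0.5;0
2.6;9
0.5;1
0.6;2
0.7;3
0.8;4
1;5
1.2;6
1.5;7
1.9;8"

# sep = ";" matches the file; check.names = FALSE preserves "x [cm]" and "y [cm]"
data <- read.csv(text = csv_text, sep = ";", check.names = FALSE)

# Reorder whole rows by the first column so each y stays paired with its x
data <- data[order(data[, 1]), ]
data
```

After this step the (2.6, 9) row is last, so a plot with type="b" joins the points from left to right.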
The points are connected like this because the connection is created based on their order in the matrix.
m <- matrix(c(
  0.5, 0,
  0.5, 1,
  0.6, 2,
  0.7, 3,
  0.8, 4,
  1, 5,
  1.2, 6,
  1.5, 7,
  1.9, 8,
  2.6, 9), ncol = 2, byrow = TRUE)
colnames(m) <- c("x", "y")
plot(m, type = "b")
Simply reordering the rows of the matrix solves your problem.
You can use
library(ggplot2)
ggplot(data, aes(x=`x [cm]`, y=`y [cm]`)) + geom_point() + geom_line()
Or using base R plot. Reorder both columns by x so that each y stays paired with its x:
o <- order(data$`x [cm]`)
plot(data$`x [cm]`, data$`y [cm]`,
     xlim=range(data$`x [cm]`), ylim=range(data$`y [cm]`),
     xlab="x [cm]", ylab="y [cm]")
lines(data$`x [cm]`[o], data$`y [cm]`[o])

How to get quantile from category count in R?

For example, I have sample data of human heights in a data frame:
df <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9), number = c(20, 30, 50, 30, 20))
How can I calculate the 90% quantile of this sample?
I know ggplot2 has a function that can plot the ECDF of the sample:
ggplot(df, aes(x = height, y = number)) + stat_ecdf()
but I only need a specific quantile, not the plot.
I could repeat each height number times to make a vector and use the quantile function on that vector, but as the counts get larger, this method becomes very inefficient.
EDIT:
It seems stat_ecdf is not supposed to be used in this way, and when the data distribution is skewed:
df <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9), number = c(100, 2, 3, 4, 5))
only the quantile of the repeated vector gives the desired result:
quantile(c(rep(1.5,100), rep(1.6,2), rep(1.7,3), rep(1.8,4), rep(1.9,5)))
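
The repeated vector can be avoided by working with cumulative counts directly. The sketch below is one possible approach, not something from the question: it returns the smallest height whose cumulative count reaches the fraction p of the total. That corresponds to quantile(..., type = 1), the inverse of the empirical CDF, rather than to the interpolating default (type = 7) used above, so results can differ from the default. The helper name count_quantile is hypothetical.

```r
# Quantile of values given with counts, without expanding them.
# value: distinct observed values; count: how often each occurs;
# p: a single probability between 0 and 1.
count_quantile <- function(value, count, p) {
  o <- order(value)            # make sure values are ascending
  value <- value[o]
  count <- count[o]
  cum <- cumsum(count)         # cumulative number of observations
  # first value whose cumulative count reaches p * total
  value[which(cum >= p * sum(count))[1]]
}

height <- c(1.5, 1.6, 1.7, 1.8, 1.9)

q_sym  <- count_quantile(height, c(20, 30, 50, 30, 20), 0.9)
q_skew <- count_quantile(height, c(100, 2, 3, 4, 5), 0.9)

# Cross-check against the repeated vector, using the matching quantile type
q_ref <- quantile(rep(height, c(100, 2, 3, 4, 5)), 0.9, type = 1)
```

The work is proportional to the number of distinct heights rather than to the total count, which is the point of avoiding rep().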

Multiple Layers in ggplot2

I want to overlay a plot of an empirical cdf with a cdf of a normal distribution. I can only get the code to work without using ggplot.
rnd_nv1 <- rnorm(1000, 1.5, 0.5)
plot(ecdf(rnd_nv1))
lines(seq(0, 3, by=.1), pnorm(seq(0, 3, by=.1), 1.5, 0.5), col=2)
For ggplot to work I would need a single data frame, for example one joining rnd_nv1 and pnorm(seq(0, 3, by=.1), 1.5, 0.5). This is a problem, because the function rnorm gives me just the function values without values on the domain. I don't even know how rnorm creates these; if I view the table I just see function values. But then again, magically, the plot of rnd_nv1 works.
The following plots the two lines but they overlap, since they are almost equal.
set.seed(1856)
x <- seq(0, 3, by = 0.1)
rnd_nv1 <- rnorm(1000, 1.5, 0.5)
dat <- data.frame(x = x, ecdf = ecdf(rnd_nv1)(x), norm = pnorm(x, 1.5, 0.5))
library(ggplot2)
long <- reshape2::melt(dat, id.vars = "x")
ggplot(long, aes(x = x, y = value, colour = variable)) +
  geom_line()
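
A single data frame holding both curves is not strictly required. rnorm returns a random sample rather than function values, and ecdf() turns that sample into a step function. As an alternative sketch (not the approach of the answer above), ggplot2 can compute both curves itself: stat_ecdf() builds the empirical CDF from the raw sample and stat_function() draws pnorm over the same x range. The column name value and the colour are illustrative choices.

```r
library(ggplot2)

set.seed(1856)
rnd_nv1 <- rnorm(1000, 1.5, 0.5)

# Only the raw sample goes into the data frame
p <- ggplot(data.frame(value = rnd_nv1), aes(x = value)) +
  # layer 1: empirical CDF computed from the sample
  stat_ecdf(geom = "step") +
  # layer 2: theoretical normal CDF with the same mean and sd
  stat_function(fun = pnorm, args = list(mean = 1.5, sd = 0.5),
                colour = "red")
```

Calling print(p), or typing p at the console, draws the two layers on one set of axes.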

R: Bar plot on a continuous x-axis (time-scaled)

I'm fairly new to R so please comment on anything you see.
I have data taken at different timepoints, under two conditions (for one timepoint), and I want to plot this as a bar plot with error bars and with the bars at the appropriate timepoint.
I currently have this (stolen from another question on this site):
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
ggplot(example, aes(x = tp, y = means)) +
geom_bar(position = position_dodge()) +
geom_errorbar(aes(ymin=means-std, ymax=means+std))
Now my timepoints are a factor, but the unequal spacing of the measurements across time makes the plot less nice.
This is how I imagine the graph:
I find the ggplot2 package can give you very nice graphs, but I have a lot more difficulty understanding it than I have with other R stuff.
Before we get into R, you have to realize that even in a bar plot the x axis needs a numeric value. If you treat them as factors then the software assumes equal spacing between the bars by default. What would be the x-values for each of the bars in this case? It can be (0, 14, 14, 24, 48, 72) but then it will plot two bars at point 14 which you don't seem to want. So you have to come up with the x-values.
joran provides an elegant solution by modifying the width of the bars at position 14. Modifying joran's code to make the bars fall at the right position on the x-axis, the final solution is:
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
example$tp1 <- gsub("a|b","",example$tp)
example$grp <- c('a','a','b','a','a','a')
example$tp2 <- as.numeric(example$tp1)
ggplot(example, aes(x = tp2, y = means, fill = grp)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_errorbar(aes(ymin=means-std, ymax=means+std), position = "dodge")
