Extracting a specific number of data points from a smooth curve in R

I have a dataset df that I am plotting as a scatterplot with a fitted spline curve. This is the code:
g <- ggplot() + theme_bw() +
  geom_point(data = df, aes(x = df[, 1], y = df[, 2]), color = 'red') +
  geom_smooth(data = df, aes(x = df[, 1], y = df[, 2]), formula = y ~ splines::bs(x, df = input$degree_1), method = "lm", color = "green3", level = 1, size = 0.5)
input$degree_1 is the slider that controls the flexibility of the spline fit (it is passed as df, the degrees of freedom of splines::bs).
Next, I extract the data points of the smoothed curve like this:
r <- ggplot_build(g)$data[[2]]
Now I want to cut the smoothed curve with two vertical lines and extract the points of the curve that lie between those two lines:
v_f1 <- subset(r, x > input$Vert1 & x < input$Vert2, select = c(x, y))
input$Vert1 and input$Vert2 are the sliders that set the positions of the vertical lines.
What I want:
I want to control the number of points that the subset command above extracts between those vertical lines.
At the moment it extracts an arbitrary number of points, and I want the user to be able to control that, e.g. to cut the profile and extract 100 points in one case, 120 points in another case, and so on. Alternatively, I could just set a fixed number for all cases.
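One possible approach, sketched under assumptions: geom_smooth evaluates the smoother at n points (80 by default) spread over the whole x range, so the number that falls between the two cut positions changes with the sliders. Instead of subsetting the plot data, you can fit the same lm/bs model directly and predict on a grid of exactly the requested length. In the sketch below, dat, degree_1, vert1, vert2 and n_points are hypothetical stand-ins for df, input$degree_1, input$Vert1, input$Vert2 and a new user control.

```r
library(splines)

set.seed(1)
# Hypothetical stand-in for the asker's data frame (two numeric columns)
dat <- data.frame(x = seq(0, 10, length.out = 200))
dat$y <- sin(dat$x) + rnorm(200, sd = 0.2)

degree_1 <- 5    # stands in for input$degree_1
vert1    <- 2    # stands in for input$Vert1
vert2    <- 6    # stands in for input$Vert2
n_points <- 100  # hypothetical new control: number of points to extract

# Same model that geom_smooth(method = "lm", formula = y ~ bs(x, df = ...)) fits
fit <- lm(y ~ bs(x, df = degree_1), data = dat)

# Evaluate the fitted curve at exactly n_points equally spaced x values
# between the two cut positions (end points included)
v_f1 <- data.frame(x = seq(vert1, vert2, length.out = n_points))
v_f1$y <- predict(fit, newdata = v_f1)

nrow(v_f1)
```

Unlike the original subset call, which uses strict inequalities, this grid includes both cut positions. If the end points should be excluded, one option is to build the grid with n_points + 2 values and drop the first and last rows.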

Related

Extend line length with geom_line

I want to draw three lines on a graph, overlaid with the data points that I used in a discriminant function analysis. From my analysis, I have two points that fall on each line. The lines represent the probability contours of the classification scheme; exactly how I got the points on each line is not relevant to my question here. However, I want the lines to extend further than the points that define them.
df <-
data.frame(Prob = rep(c("5", "50", "95"), each=2),
Wing = rep(c(107,116), 3),
Bill = c(36.92055, 36.12167, 31.66012, 30.86124, 26.39968, 25.6008))
ggplot()+
geom_line(data=df, aes(x=Bill, y=Wing, group=Prob, color=Prob))
The above df is a dataframe for my points from which the three lines are constructed. I want the lines to extend from y=105 to y=125.
Thanks!
There are probably more idiomatic ways of doing this, but here is one way to get it done.
In short, you calculate the linear equation of each line, i.e. y = mx + c, from its two points:
library(dplyr)
library(ggplot2)
df_withFormula <- df |>
  group_by(Prob) |>
  # This mutate call creates the slope and intercept that geom_abline needs in the plotting stage.
  mutate(increaseBill = Bill - lag(Bill),
         increaseWing = Wing - lag(Wing),
         slope = increaseWing / increaseBill,
         intercept = Wing - slope * Bill)
# increaseBill, increaseWing and slope could all be combined into one calculation, but I thought it was easier to understand this way.
ggplot(df_withFormula, aes(Bill, Wing, color = Prob)) +
  # Added only so that there is something to plot on top of. You could remove this and define all the limits manually instead (expand_limits would work).
  geom_point() +
  # This plots the three lines. The rows with NA are automatically ignored. More explicit handling of the NA could be done in the data prep stage.
  geom_abline(aes(slope = slope, intercept = intercept, color = Prob)) +
  # This is the crucial part: it lets you define the range of the plot window. As ablines are infinite, you can choose whatever limits you want.
  expand_limits(y = c(105, 125))
Hope this helps you get the graph you want.
This depends very much on the structure of your data, but it could be adapted to fit different shapes.
Similar to the approach by @James in that I compute the slopes and intercepts from the given data and use geom_abline to plot the lines, but this version uses summarise instead of mutate to get rid of the NA values, and geom_blank instead of geom_point so that only the lines are displayed and not the points. (Note: having another geom is crucial to set the scale, i.e. the range of the data, so that the lines show up.)
library(dplyr)
library(ggplot2)
df_line <- df |>
group_by(Prob) |>
summarise(slope = diff(Wing) / diff(Bill),
intercept = first(Wing) - slope * first(Bill))
ggplot(df, aes(x = Bill, y = Wing)) +
geom_blank() +
geom_abline(data = df_line, aes(slope = slope, intercept = intercept, color = Prob)) +
scale_y_continuous(limits = c(105, 125))
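If you would rather keep geom_line than switch to geom_abline, the same two-point arithmetic can be used to compute the end points of each line at Wing = 105 and Wing = 125. This is only a base R sketch of that idea; target_wing and extended are names made up for the illustration.

```r
# Data from the question
df <- data.frame(
  Prob = rep(c("5", "50", "95"), each = 2),
  Wing = rep(c(107, 116), 3),
  Bill = c(36.92055, 36.12167, 31.66012, 30.86124, 26.39968, 25.6008)
)

# Wing values at which each extended line should start and end
target_wing <- c(105, 125)

# For each Prob group, extrapolate Bill linearly to the target Wing values
extended <- do.call(rbind, lapply(split(df, df$Prob), function(d) {
  rate <- diff(d$Bill) / diff(d$Wing)  # change in Bill per unit of Wing
  data.frame(
    Prob = d$Prob[1],
    Wing = target_wing,
    Bill = d$Bill[1] + rate * (target_wing - d$Wing[1])
  )
}))
rownames(extended) <- NULL

extended
```

Passing extended instead of df to the original geom_line call should then draw each line from y = 105 to y = 125.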

How to draw a radar chart with a lot of rows (or dimensions) using R?

Let's say I have such data:
a <- tibble(id=c(1,1.1,1.2,1.7,2,2.1,2.6,4,4.6,4.68),
x=c(0.3,0.5,0.2,0.7,0.1,0.5,0.43,0.6,0.3,0.65),
y=c(0.2,0.1,0.22,0.1,0.5,0.2,0.3,0.2,0.14,0.3))
This is just a sample; my real data is much larger and has more columns, with x+y+... = 1. I want to draw two lines: one line for x and one line for x+y:
ggplot(a) +
geom_line(aes(x=id,y=x),color='red') +
geom_line(aes(x=id,y=x+y),color='blue')
But what I really want is something like a radar chart:
You can see there is a circle of radius 1. x and x+y (maybe more series in my real data) are the red and blue curves respectively. So x+y must be larger than x but always stay inside the circle, because x+y+...=1. My data has a lot of ids, so it is not a traditional radar chart with only a few dimensions.
You can create radar charts with coord_polar() - e.g.
library(tidyverse)
ggplot(a) +
geom_smooth(aes(x=id,y=x),color='red', se = FALSE) +
geom_smooth(aes(x=id,y=x+y),color='blue', se = FALSE) +
geom_line(aes(x = id, y = 1)) +
coord_polar()
Note that I used geom_smooth to get closer to your intended result.

How can I draw a line through a group of points in R?

I have a table qb that contains the columns Total_Half_PPR and Final.Tier. With the code below, I added a column containing the normal density of Total_Half_PPR within each tier:
for(tier in unique(qb$Final.Tier)){
qb$Norm[qb$Final.Tier == tier] <- dnorm(qb$Total_Half_PPR[qb$Final.Tier == tier],
mean = mean(qb$Total_Half_PPR[qb$Final.Tier == tier]),
sd = sd(qb$Total_Half_PPR[qb$Final.Tier == tier]))
}
Once that's done, I plotted each of the normal distributions on one plot, by adding a color to them. (I created a subset that only contained certain tiers of interest)
subset.data <- subset(qb, Final.Tier %in% c(1,2,3,4))
p <- ggplot(data = subset.data, aes(x = Total_Half_PPR, y = Norm)) +
geom_point(aes(color = Final.Tier))
Here is the resulting plot: [image: normal distribution plot for each tier]
Right now, I am just using a different color to indicate the curve of a given tier. How do I go about plotting a line for each of the normal distributions shown in the picture, instead of the scatter plot?
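A minimal sketch of one way to do this, assuming the goal is one connected curve per tier: swap geom_point for geom_line and map both color and group to the tier, so that each tier's points are joined in order of x. The qb data below is a simulated stand-in for the real table, ave() is used as a compact equivalent of the loop above, and converting Final.Tier to a factor for a discrete legend is my own choice rather than something from the question.

```r
set.seed(42)
# Simulated stand-in for the asker's `qb` table (values are made up)
qb <- data.frame(
  Final.Tier = rep(1:4, each = 25),
  Total_Half_PPR = rnorm(100, mean = rep(c(300, 250, 200, 150), each = 25), sd = 20)
)

# Per-tier normal density; gives the same values as the for loop in the question
qb$Norm <- ave(qb$Total_Half_PPR, qb$Final.Tier,
               FUN = function(v) dnorm(v, mean = mean(v), sd = sd(v)))

subset.data <- subset(qb, Final.Tier %in% c(1, 2, 3, 4))

# The plotting part runs only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(subset.data,
              aes(x = Total_Half_PPR, y = Norm,
                  color = factor(Final.Tier), group = Final.Tier)) +
    geom_line() +  # connects the points of each tier in order of x
    labs(color = "Tier")
}
```

Because geom_line joins the observed points, a tier with only a few observations will look angular. Evaluating dnorm on a dense grid of x values per tier would give smoother curves.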

How to plot density of points in one dimension with different factors in ggplot2

I am attempting to place individual points on a plot using ggplot2, however as there are many points, it is difficult to gauge how densely packed the points are. Here, there are two factors being compared against a continuous variable, and I want to change the color of the points to reflect how closely packed they are with their neighbors. I am using the geom_point function in ggplot2 to plot the points, but I don't know how to feed it the right information on color.
Here is the code I am using:
s1 = rnorm(1000, 1, 10)
s2 = rnorm(1000, 1, 10)
# Each task needs 1000 labels so that task_number matches the length of c(s1, s2)
data = data.frame(task_number = factor(rep(c(1, 2), each = 1000)),
                  S = c(s1, s2))
ggplot(data, aes(x = task_number, y = S)) + geom_point()
Which generates this plot:
However, I want it to look more like this image, but with one dimension rather than two (which I borrowed from this website: https://slowkow.com/notes/ggplot2-color-by-density/):
How do I change the colors of the first plot so it resembles that of the second plot?
I think the tricky thing about this is you want to show the original values, and evaluate the density at those values. I borrowed ideas from here to achieve that.
library(dplyr)
data = data %>%
group_by(task_number) %>%
# Use approxfun to interpolate the density back to
# the original points
mutate(dens = approxfun(density(S))(S))
ggplot(data, aes(x = task_number, y = S, colour = dens)) +
geom_point() +
scale_colour_viridis_c()
Result:
One could, of course, come up with a measure of proximity to neighbouring values for each value. However, wouldn't adjusting the transparency achieve basically the same goal (gauging how densely packed the points are)?
geom_point(alpha=0.03)

Plotting error while using ggplot faceting function in R

I am trying to compare my observed and modeled data sets for two stations. One station is called "red" and the other is called "blue". I was able to create the facets, but when I tried to add two series to each facet, only one facet got updated while the other didn't.
This means that only one series is plotted for blue, while two series are plotted for red.
The code I used is as follows:
# install.packages("RCurl", dependencies = TRUE)
require(RCurl)
out <- postForm("https://dl.dropbox.com/s/ainioj2nn47sis4/watersurf1.csv?dl=1", format="csv")
watersurf <- read.csv(textConnection(out))
watersurf[1:100,]
watersurf$coupleid <- factor(rep(unlist(by(watersurf$id,watersurf$group1,
function(x) {ave(as.numeric(unique(x)),FUN=seq_along)}
)),each=6239))
p <- ggplot(data=watersurf,aes(x=time,y=data,group=id))+geom_line(aes(linetype=group1),size=1)+facet_wrap(~coupleid)
p
Is it also possible to add a third series to the graph that has a different length (i.e. not the same interval)?
The output is
I followed the example on this page to create the graphs.
http://www.ats.ucla.edu/stat/r/faq/growth.htm
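Is this what you are looking for?
library(ggplot2)
# In R the + must end each line; a line that starts with + is parsed as a separate expression
ggplot(data = watersurf, aes(x = time, y = data)) +
  geom_line(aes(linetype = group1, colour = group1), size = 0.2) +
  facet_wrap(~ id)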
