Plotting level plot in R - r

I have 12 variables, M1, M2, ..., M12, for which I compute a certain statistic x.
df = data.frame(model = paste("M", 1:28, sep = ""), x = runif(28, 1, 1.05))
levels = seq(0.8, 1.2, 0.05)
I would like to plot this data as follows:
Each circle (contour) represents the a level of that statistic "x". The three blue lines simply represent three different scenarios.
The dataframe included in this example represents one scenario. The blue line would simply join the values of all the models M1 to M28 for that specific scenario.
Is there any tool in R that allow for such a plot? I tried contour() from library(MASS) but the contours are not drawn as perfect circles.
Any help would be appreciated. Thanks!

Here is a ggplot solution:
library(ggplot2)
ggplot(data=df, aes(x=model, y=x, group=1)) +
geom_line() + coord_polar() +
scale_y_continuous(limits=range(levels), breaks=levels, labels=levels)
Note this is a little confusing because of the names in your data frame. x is really the y variable here, and model the real x, so the graph scale label seems odd.
EDIT: I had to set your factor levels for model in the data frame so they plot in the correct order.

Related

Extend line length with geom_line

I want to represent three lines on a graph overlain with datapoints that I used in a discriminant function analysis. From my analysis, I have two points that fall on each line and I want to represent these three lines. The lines represent the probability contours of the classification scheme and exactly how I got the points on the line are not relevant to my question here. However, I want the lines to extend further than the points that define them.
df <-
data.frame(Prob = rep(c("5", "50", "95"), each=2),
Wing = rep(c(107,116), 3),
Bill = c(36.92055, 36.12167, 31.66012, 30.86124, 26.39968, 25.6008))
ggplot()+
geom_line(data=df, aes(x=Bill, y=Wing, group=Prob, color=Prob))
The above df is a dataframe for my points from which the three lines are constructed. I want the lines to extend from y=105 to y=125.
Thanks!
There are probably more idiomatic ways of doing it but this is one way to get it done.
In short you quickly calculate the linear formula that will connect the lines i.e y = mx+c
df_withFormula <- df |>
group_by(Prob) |>
#This mutate command will create the needed slope and intercept for the geom_abline command in the plotting stage.
mutate(increaseBill = Bill - lag(Bill),
increaseWing = Wing - lag(Wing),
slope = increaseWing/increaseBill,
intercept = Wing - slope*Bill)
# The increaseBill, increaseWing and slope could all be combined into one calculation but I thought it was easier to understand this way.
ggplot(df_withFormula, aes(Bill, Wing, color = Prob)) +
#Add in this just so it has something to plot ontop of. You could remove this and instead manually define all the limits (expand_limits would work).
geom_point() +
#This plots the three lines. The rows with NA are automatically ignored. More explicit handling of the NA could be done in the data prep stage
geom_abline(aes(slope = slope, intercept = intercept, color = Prob)) +
#This is the crucial part it lets you define what the range is for the plot window. As ablines are infite you can define whatever limits you want.
expand_limits(y = c(105,125))
Hope this helps you get the graph you want.
This is very much dependent on the structure of your data it could though be changed to fit different shapes.
Similar to the approach by #James in that I compute the slopes and the intercepts from the given data and use a geom_abline to plot the lines but uses
summarise instead of mutate to get rid of the NA values
and a geom_blank instead of a geom_point so that only the lines are displayed but not the points (Note: Having another geom is crucial to set the scale or the range of the data and for the lines to show up).
library(dplyr)
library(ggplot2)
df_line <- df |>
group_by(Prob) |>
summarise(slope = diff(Wing) / diff(Bill),
intercept = first(Wing) - slope * first(Bill))
ggplot(df, aes(x = Bill, y = Wing)) +
geom_blank() +
geom_abline(data = df_line, aes(slope = slope, intercept = intercept, color = Prob)) +
scale_y_continuous(limits = c(105, 125))

ggplot2 does not plot multiple groups of a variable, only plots one line

I would like to make a plot with multiple lines corresponding to different groups of variable "Prob" (0.1, 0.5 and 0.9) using ggplot. Although that, when I run the code, it only plots one line instead of 3. Thanks for the help :)
Here my code:
Prob <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9)
nit <- c(0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999,0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999,0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999)
greek <- log((1-Prob)/Prob)/-10
italian <- ((0.997-nit)/(0.997-0.97))^3
Temp<-c(rep(25,111))
GT <- ((30-Temp)/(30-3.3))^3
GH <- 1-GT-italian
acid <- (-1*(((sign(GH)*(abs(GH)^(1/3)))*(7-5))-7))
Species<-c(rep("Case",111))
data <- as.data.frame(cbind(Prob,greek,GT,GH,italian, Temp,acid,nit, Species))
ggplot() +
geom_line(data = data, aes_string(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
The answer seems to be kind of two parts:
In your data frame data, the columns that should be numeric are not numeric.
The reason why you only see one line.
Fixing the Data Frame and Using aes() in place of aes_string()
I noticed something was odd when you had as.data.frame(cbind(... to make your data frame and are using aes_string(.. within the ggplot portion. If you do a quick check on data via str(data), you'll see all of your columns in data are characters, whereas in the environment the data prepared in the code for their respective columns are numeric. Ex. acid is numeric, yet data$acid is a character.
The reason for this is that you're binding the columns into a data frame by using as.data.frame(cbind(.... This results in all data being coerced into a character, so you loose the numeric nature of the data. This is also why you have to use aes_string(...) to make it work instead of aes(). To bind vectors together into a data frame, use data.frame(..., not as.data.frame(cbind(....
To fix all this, bind your columns together like this + the ggplot code:
data <- data.frame(Prob,greek,GT,GH,italian, Temp,acid,nit, Species)
# data <- as.data.frame(cbind(Prob,greek,GT,GH,italian, Temp,acid,nit, Species))
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
Why is there only one line?
The simple answer to why you only see one line is that the line for each of the values of data$Prob is equal. What you see is the effect of overplotting. It means that the line for data$Prob == 0.1 is the same line when data$Prob == 0.5 and data$Prob = 0.9.
To demonstrate this, let's separate each. I'm going to do this realizing that Prob could be created by repeating 0.1, 0.5, and 0.9 each 37 times in a row. I'll create a factor that I'll use as multiplication factor for data$nit that will result in separating our our lines:
my_factor <- rep(c(1,1.1,1.5), each=37) # our multiplication fractor
data$nit <- data$nit * my_factor # new nit column
# same plot code
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
There ya go. We have all lines there, you just could not see them due to overplotting. You can convince yourself of this without the multiplication business and the original data by comparing the plots for each data$Prob:
# use original dataset as above
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8) +
facet_wrap(~Prob)

Can't get Geom_smooth working for my dataset

I am currently working with a big biological dataset with many datapoint. The Head() function in R gives me the following column names:
intensity - Sample - Acession - Study - Dx
Intensity is the only data that is numeric. The others are character.
First, I have unfactorized all data into the following df: unfactordata. Next, I am interested in making a scatterplot of a specific subset of data which I do with the following piece of code where after I try to scatterplot it with a geom_smooth line in between. I use the following code:
scatplotprot <- function(name){
proteinname <- subset(unfactordata, Acession == name)
p <- ggplot(data = proteinname, aes(x = Dx, y = intensity, color = Study)) +
geom_point() +
geom_smooth(method = 'lm', aes(group = Dx))
return(p)
}
This does gives me a scatterplot with all the intensity values between 2 groups (Dx), as well as being coloured depending on which Study the datapoint originates from. However, it will not show me a line between the two groups (Dx). Depending on which Acession I call I expect to see between 3 to 8 lines.
Hope anyone can help me clear this hopefully small problem.
Warmest,
Patrick

How to plot density of points in one dimension with different factors in ggplot2

I am attempting to place individual points on a plot using ggplot2, however as there are many points, it is difficult to gauge how densely packed the points are. Here, there are two factors being compared against a continuous variable, and I want to change the color of the points to reflect how closely packed they are with their neighbors. I am using the geom_point function in ggplot2 to plot the points, but I don't know how to feed it the right information on color.
Here is the code I am using:
s1 = rnorm(1000, 1, 10)
s2 = rnorm(1000, 1, 10)
data = data.frame(task_number = as.factor(c(replicate(100, 1),
replicate(100, 2))),
S = c(s1, s2))
ggplot(data, aes(x = task_number, y = S)) + geom_point()
Which generates this plot:
However, I want it to look more like this image, but with one dimension rather than two (which I borrowed from this website: https://slowkow.com/notes/ggplot2-color-by-density/):
How do I change the colors of the first plot so it resembles that of the second plot?
I think the tricky thing about this is you want to show the original values, and evaluate the density at those values. I borrowed ideas from here to achieve that.
library(dplyr)
data = data %>%
group_by(task_number) %>%
# Use approxfun to interpolate the density back to
# the original points
mutate(dens = approxfun(density(S))(S))
ggplot(data, aes(x = task_number, y = S, colour = dens)) +
geom_point() +
scale_colour_viridis_c()
Result:
One could, of course come up with a meausure of proximity to neighbouring values for each value... However, wouldn't adjusting the transparency basically achieve the same goal (gauging how densely packed the points are)?
geom_point(alpha=0.03)

Excluding Descriptive Info from Ordination, while keeping it associated for Plot Interpretation?

Totally new to R, so sorry if the question is phrased poorly or super basic, but I'm not really understanding any of the guides I can find on my own.
I am trying to run a Principle Coordinate Analysis (PCO) on a dataset I have. This dataset includes non-numerical information like taxon and age, as well as numerical data.
I want to run a PCO of the numerical data, but keep the non-numerical data associated so that I can color-code/label specimens by those categories in the resulting plot.
Currently, I'm loading the data with read.table(), putting it into a matrix with dist(), running the pco on the matrix with pco() (synonymous with cmdscale()), and plotting the pco with plot().
There's a much higher chance for getting an answer if you provide some data to play with. Here's some:
data(iris)
pca = prcomp(iris[,-5])
with gg plot: combine the pca coordinates with the category columns
pca_ploting=data.frame(pca$x, Species = iris[,5])
library(ggplot2)
ggplot(data=pca_ploting)+
geom_point(aes(x = PC1, y=PC2, color= Species))+
stat_ellipse(aes(x = PC1, y=PC2, color= Species))
# an approach I like is to scale the coords by the variance explained
coord_fixed(ratio = pca$sdev[2]^2/pca$sdev[1]^2)
# here it does not look so nice because PC1 explains over 90%, while PC2 only about 5%
with ggord:
library(ggord)
ggord(pca, axes = c("1", "2"), iris[,5], xlims = NULL,
ylims = NULL, ext = 1.1, vec_ext = 2, size = 1)

Resources