Plot a density differential semantic dataset - r

I'm coming here because after hours of research and failed trials, i dont know what to do next.
I've a database (opened via dyplr with the open_excel command) that looks like the one below (but more complexe with more variables) :
> dput(open)
structure(list(Subject = c(1, 2, 3, 4, 5), `Happy - Before` = c(4,
4, 2, 1, 7), `Courageous - Before` = c(5, 2, 1, 3, 4), `Strange - Before` = c(1,
2, 1, 4, 6), `Happy - After` = c(4, 2, 6, 2, 2), `Courageous - After` = c(7,
1, 5, 1, 2), `Strange - After` = c(3, 7, 4, 5, 4)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 5 x 7
Subject `Happy - Before` `Courageous - B… `Strange - Befo… `Happy - After`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 5 1 4
2 2 4 2 2 2
3 3 2 1 1 6
4 4 1 3 4 2
5 5 7 4 6 2
# … with 2 more variables: `Courageous - After` <dbl>, `Strange - After` <dbl>
My goal here is to plot a density graph with some specificity :
Density of scores obtained by all the subjects for each ability trait on a scale from 1 to 7
As you can see on my (awful) graph, i'm trying to display the responses of all my subjects on a scale from 1 to 7 (x axis) via a density plot, but for each trait i have (y - axis), and thus, with a separation from responses made before [a test] and after a test. And i need to get the same kind of legend (courageous shown to the left, not courageous shown to the right). The more a participant is close to 7 when he answer to the scale, the more he is [happy, courageous, depressed, anxious....] and the more he get close to 1, the more he is [not happy, not courageous, not depressed...]
I tried my best (using ggplot2 templates, trying to melt things and all but i fairly new to R and language programming :/)
All my variables have thos kind of name : [trait1]_before, [trait2]_before, [trait1]_After, [trait2]_After
I hope this post is clear. If not, i will be glad to add informations !
Thank you everyone (sorry for my pooor english)

A density plot assumes a continuous variable along the x axis, whereas your example only goes from 1 to 7. This means you can plot the density where you will have tails that go past 0 and 7 or force a cut-off at those values.
data <- structure(list(Subject = c(1, 2, 3, 4, 5),
`Happy - Before` = c(4, 4, 2, 1, 7),
`Courageous - Before` = c(5, 2, 1, 3, 4),
`Strange - Before` = c(1, 2, 1, 4, 6),
`Happy - After` = c(4, 2, 6, 2, 2),
`Courageous - After` = c(7, 1, 5, 1, 2),
`Strange - After` = c(3, 7, 4, 5, 4)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyverse)
library(ggplot2)
library(ggridges)
library(grid)
library(gtable)
dataPivot <- data %>%
pivot_longer(-Subject, names_to = "measure", values_to = "score") %>%
mutate(status = sub(".* - ", "", measure),
feature = sub(" - .*", "", measure),
featureOpposite = paste('Not', feature)) %>%
mutate_if(is.character, as.factor)
If you want to not truncate the plot:
# Create the first plot with the axis on the left
p1 <- ggplot(dataTest, aes(x = score, y = feature)) +
geom_density_ridges2(aes(fill = status), scale = 0.7, alpha = .3) +
scale_x_continuous(breaks = c(1, 7)) +
labs(y = NULL) +
theme_ridges() +
theme(legend.position="bottom")
# Create a second plot with the legend on the right
p2 <- ggplot(dataTest, aes(x = score, y = featureOpposite)) +
geom_density_ridges2(aes(fill = status), scale = 0.7, alpha = .3) +
scale_y_discrete(position = "right") +
theme_ridges() +
theme(legend.position="bottom")
# Convert both plots to gtables
g1 <- ggplot_gtable(ggplot_build(p1))
g2 <- ggplot_gtable(ggplot_build(p2))
# Add an empty column to the left side of the first plot to make room for the right
# axis
g1 <- gtable_add_cols(g1, widths = unit(0.2, "null"), pos = -1)
# overlap the panel of the 2nd plot on that of the 1st plot
pp <- c(subset(g1$layout, name == "axis-r", se = t:r))
# Add the y-axis from the second plot
g <- gtable_add_grob(g1, g2$grobs[[which(g2$layout$name == "axis-r")]], pp$t, pp$r,
pp$b, pp$r)
grid.draw(g)
If you want to truncate the ends:
# Here we are just adding in a height variable, changing stat to density and adding
# trim = T
p1 <- ggplot(dataTest, aes(x = score, y = feature, height = ..density..)) +
geom_density_ridges2(aes(fill = status), scale = 0.7, alpha = .3, stat = "density",
trim = TRUE) +
scale_x_continuous(breaks = c(1, 7)) +
labs(y = NULL) +
theme_ridges() +
theme(legend.position="bottom")
p2 <- ggplot(dataTest, aes(x = score, y = featureOpposite, height = ..density..)) +
geom_density_ridges2(aes(fill = status), scale = 0.7, alpha = .3, stat = "density",
trim = TRUE) +
scale_y_discrete(position = "right") +
theme_ridges() +
theme(legend.position="bottom")
g1 <- ggplot_gtable(ggplot_build(p1))
g2 <- ggplot_gtable(ggplot_build(p2))
g1 <- gtable_add_cols(g1, widths = unit(0.2, "null"), pos = -1)
## overlap the panel of the 2nd plot on that of the 1st plot
pp <- c(subset(g1$layout, name=="axis-r", se=t:r))
g <- gtable_add_grob(g1, g2$grobs[[which(g2$layout$name=="axis-r")]], pp$t, pp$r,
pp$b, pp$r)
grid.draw(g)

I wonder (as also hinted by #Amanda) whether a density plot is appropriate for this kind of data, but leaving this to the side, here's a shorter version that does not attempt to massage the data like #Amanda.
You can play around with the bw for different bandwidth selection methods (the default, nrd0 is generally not recommended for non-Gaussian distributions as it tends to over-smooth).
library(tidyverse)
open_long <- open %>%
pivot_longer(-Subject, names_to = c("state", "time"), names_pattern = "([A-Za-z]+) - ([A-Za-z]+)")
ggplot(open_long, aes(x = value, fill = time)) +
geom_density(alpha = 0.5, bw = "SJ") +
facet_wrap(~state, ncol = 1) +
theme_bw()
Created on 2019-12-20 by the reprex package (v0.3.0)

Related

ggplot: Mask Circles inside a non Geographic Shape

Is there a way within ggplot, to plot circles within a defined, non geographic shape, defined through a series of points, or alternatively an imported SVG?
The circles would be placed in rows and columns, similar to the simple example below. But then any circles, either with their circumference, or centre if that is more achievable, outside the shape would be excluded from the plot. So a kind of mask.
I know I could do this by through comparing the coordinates, but I'm interested to know if there is a more sophisticated masking function.
library(tidyverse)
maxX <- 12
maxY <- 9
circles <- data.frame(circleNo = seq(1, maxX * maxY, 1) - 1) %>%
mutate(x = circleNo %% maxX, y = floor(circleNo / maxX))
# Set line end to coordinates for next point
shape <- data.frame(x = c(1, 1, 7, 7, 11, 11, 6, 5, 3, 2, 1), y = c(1, 8, 7, 5, 5, 1, 3, 3, 3, 1, 1)) %>%
mutate(xend = lead(x), yend = lead(y))
# Set line end for last point to the first
shape[nrow(shape),3] = shape[1,1]
shape[nrow(shape),4] = shape[1,2]
ggplot(circles, aes(x = x, y = y)) +
geom_point(shape = 1, size = 9, fill = NA) +
geom_segment(data = shape, aes(x = x, xend = xend, y = y, yend = yend)) +
theme_void() +
coord_fixed(ratio = 1)
Here's one approach that is based on manipulating the pixels as a last step. It is not sophisticated enough to identify which circles are entirely within the polygon, though. For that, the sf package and this approach sound like what you want:
How to mark points by whether or not they are within a polygon
library(ggfx)
ggplot(circles, aes(x = x, y = y)) +
as_reference(
geom_polygon(data = shape),
id = "mask_layer"
) +
with_mask(
geom_point(shape = 1, size = 9, fill = NA),
mask = "mask_layer"
) +
theme_void() +
coord_fixed(ratio = 1)
My thanks to #Jon above for the pointers. This is what I came up. Note that I added a hole in the middle of the polygon for good measure.
library(tidyverse)
library(ggplot)
# Create grid of circles
maxX <- 24
maxY <- 18
circles <- data.frame(circleNo = seq(1, maxX * maxY, 1) - 1)
circles <- circles %>%
mutate(x = circleNo %% maxX, y = floor(circleNo / maxX))
# Create polygon
shape <- data.frame(x = c(2, 2, 14, 14, 22, 22, 12, 10, 6, 4, 2), y = c(2, 16, 14, 10, 10, 2, 6, 6, 6, 2, 2)) %>%
# With line ends equal to the next point
mutate(xend = lead(x), yend = lead(y))
# Except for the last, where it needs to equal the first
shape[nrow(shape),3] = shape[1,1]
shape[nrow(shape),4] = shape[1,2]
# Plot the circles and polygon without any masking
ggplot(circles, aes(x = x, y = y)) +
geom_point(shape = 1, size = 5, fill = NA) +
geom_segment(data = shape, aes(x = x, xend = xend, y = y, yend = yend)) +
theme_void() +
coord_fixed(ratio = 1)
# Now do similar with SF which allows masking using the helpful posts below
# Create simple feature from a numeric vector, matrix or list
# https://r-spatial.github.io/sf/reference/st.html
# How to mark points by whether or not they are within a polygon
# https://stackoverflow.com/questions/50144222/how-to-mark-points-by-whether-or-not-they-are-within-a-polygon
library(sf)
# Create outer polygon
outer = matrix(c(2,2, 2,16, 14,14, 14,10, 22,10, 22,2, 12,6, 10,6, 6,6, 4,2, 2,2), ncol=2, byrow=TRUE)
# And for good measure, lets put a hole in it
hole1 = matrix(c(10,10, 10,12, 12,12, 12,10, 10,10),ncol=2, byrow=TRUE)
polygonList= list(outer, hole1)
# Convert to simple feature
combinedPoints = lapply(polygonList, function(x) cbind(x, 0))
polygons = st_polygon(combinedPoints)
# Plot these new polygons
ggplot(polygons) +
geom_sf(aes())
# Not entirely sure why we need these two lines
polygonCast <- polygons %>% st_cast("POLYGON")
circlesSF <- st_as_sf(circles, coords = c("x", "y"))
# Detect which ones are inside the outer polygon and outside the inner one
circlesSF <- circlesSF %>% mutate(outside = lengths(st_within(circlesSF, polygonCast)))
# Convert to a data frame, extract out the coordinates and filter out the ones outside
circleCoords <- as.data.frame(st_coordinates(circlesSF))
circles2 <- circlesSF %>%
as.data.frame() %>%
cbind(circleCoords) %>%
select(-geometry) %>%
filter(outside > 0)
ggplot(circles2, aes(x = X, y = Y)) +
geom_point(shape = 1, size = 5, fill = NA) +
geom_segment(data = shape, aes(x = x, xend = xend, y = y, yend = yend)) +
theme_void() +
coord_fixed(ratio = 1)

Create small squares on ggplot

I'm trying to find a solution in order to create small squares figure that I will incorporate into a bigger figer. So for instance if we take this dataframe;
a b c d
1 1 1 1 1
2 1 0 1 1
I would like to create 2 plots :
for row1
and
for row2
Here is the data
data<-structure(list(a = c(1, 1),
b = c(1, 0), c = c(1,
1), d = c(1, 1
)), class = "data.frame", row.names = c(NA, -2L))
does someone have an idea please ?
The first step is to convert the data into a tidy (long) format:
library(tidyverse)
df1<-structure(list(a = c(1, 1),
b = c(1, 0), c = c(1,
1), d = c(1, 1
)), class = "data.frame", row.names = c(NA, -2L))
df1_tidy <- df1 %>%
rowid_to_column("rowID") %>%
pivot_longer(names_to = "colID", values_to= "value", -rowID) %>%
mutate(rowID = factor(rowID, levels = sort(unique(rowID), decreasing = T)))
Then to plot you can use geom_tile with coord_fixed to make sure you end up with squares.
ggplot(df1_tidy, aes(x = colID, y = rowID)) +
geom_tile(aes(width = value*0.75,height = value *0.75 , fill = colID)) +
coord_fixed() +
scale_fill_manual(values = c("#53ae32", "#2e76b5", "#f2d355", "#d23c28")) +
theme_void() +
theme(legend.position = "none")
Or in base R:
#genrate blank plot with correct dimensions
plot(1, type="n", xlab="", ylab="", xlim=c(0,5), ylim=c(-5,0), bty="n", xaxt='n',yaxt='n', ann=FALSE, )
clrs <- c("#53ae32", "#2e76b5", "#f2d355", "#d23c28")
for(i in 1:nrow(df1)){
for(j in 1:ncol(df1)){
cat(i,j, v, "\n")
v <- df1[i,j]
rect(xleft = j, xright = j+0.9, ybottom = -i, ytop = -i-0.9, col = ifelse(v==0, "white", clrs[j]), border = NA)
}
}
As to "is there a way without transforming to a long format" - well, not really for ggplot2. If you want to use ggplot, you will want to map your variables to an aesthetic, which will take columns, not rows. But there is no real issue here - you can just create a new data frame and add this to the other plot.
Another option instead of tiles is to draw points in the shape of squares. I take from your question that you want to indeed create separate plots. I am using patchwork here for the convenience of showing the output, but you can of course leave them separate. The advantage of this approach - you would not be constricted to use coord_fixed.
Apparently you also want to draw nothing when the value is 0. Best to replace with NA, so ggplot will ignore this point.
Thanks user GordonShumway for the colors!
library(ggplot2)
library(patchwork)
df<-structure(list(a = c(1, 1), b = c(1, 0), c = c(1, 1), d = c(1, 1 )), class = "data.frame", row.names = c(NA, -2L))
df_long <- data.frame(t(df))
df_long$colors <- c("#53ae32", "#2e76b5", "#f2d355", "#d23c28")
df_long[df_long == 0] <- NA
p_ls <-
purrr::map(1:nrow(df), function(i){
ggplot(df_long, aes_string(1:4, paste0("X", i))) +
geom_point(aes(color = colors), shape = 15, size = 20, show.legend = FALSE) +
scale_color_identity() +
coord_cartesian(clip = "off", expand = FALSE) +
theme_void() +
theme(plot.margin = margin(r = 1, l = 1, unit = "in")) +
ggtitle(paste("Plot", i))
})
wrap_plots(p_ls) +plot_layout(nrow = 2)
#> Warning: Removed 1 rows containing missing values (geom_point).
Created on 2021-03-12 by the reprex package (v1.0.0)

R - Overlay multiple least squares plots with colour coding

I'm trying to visualize some data that looks like this
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12 ,24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18), y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24), y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))
What I'm trying to do is plot the least squares line (and points) for each group on the same plot. And I'd like the colour of the lines and points to correspond to the group.
All I've been able to do is plot the points according to their group
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10))
But I have no idea how to add in the lines as well and make their colours correspond to the points that they are fitting.
I'd really appreciate any help with this. It's turning out to be so much harder than I though it would be.
You can simply add a geom_smooth layer to your plot
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10)) +
geom_smooth(method="lm",se=FALSE)
method="lm" specifies that you want a linear model
se=FALSE to avoid plotting confidence intervals

How to combine ggplot and plotly graph?

I prepare a data.frame as follow;
#create dataframe
df <-data.frame(x = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)), # create random data
y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
z = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
group = c(rep('A', 300), rep('B', 150))) # add two groups
The relationship between y and x is like below, when showing by ggplot2;
#for y-x correlation by group with fit curve
gg <- ggplot(df, aes(x=x, y=y)) +
stat_density_2d(geom = "polygon", aes(alpha = ..level..,fill=group))+
geom_smooth(method = 'loess')
print(gg)
Then, I created plot_ly 3D figure as follows;
#plot_ly 3D plot
s = interp(x = df$x, y = df$y, z = df$z,duplicate = "mean") # prepare for plot_ly plot
p <- plot_ly(x = s$x, y = s$y, z = s$z,colorscale = 'Jet')%>% # plot_ly
add_surface()
, which created a graph as below;
Then, here is the question.
I would like to add the first ggplot2 figure at the bottom of the second plot_ly figure, like as below;
Is there any way (function or package) to accomplish this with R?

Can't plot circular points in R using ggplot2

After experimenting with different point sizes and shapes when plotting with ggplot2, I found that I was no longer able to plot circular points. These simple examples illustrate the problem:
# Plot 1 - square points (symbol #15) appear correctly
#
df = data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
g1 <- ggplot(df, aes(x = x, y = y))
g1 <- g1 + geom_point(size = 3, shape = 15)
g1
Plot 1 output:
# Plot 2 - circular points (symbol #16) appear as diamonds
#
df = data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
g1 <- ggplot(df, aes(x = x, y = y))
g1 <- g1 + geom_point(size = 3, shape = 16)
g1
Plot 2 output:
# Plot 3 - triangular points (symbol #17) appear correctly
#
df = data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
g1 <- ggplot(df, aes(x = x, y = y))
g1 <- g1 + geom_point(size = 3, shape = 17)
g1
Plot 3 output:
# Plot 4 - diamond points (symbol #18) appear correctly
#
df = data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
g1 <- ggplot(df, aes(x = x, y = y))
g1 <- g1 + geom_point(size = 3, shape = 18)
g1
Plot 4 output:
What do I have to do to plot circular points again?
(I'm running R 3.1.3 and RStudio 0.98.1103 in Windows 7.)
It looks like it has to do with the limited resolution of the RStudioGD() graphics device. It becomes a non-issue by avoiding the RStudio interface:
g1 <- ggplot(df, aes(x = x, y = y))
g1 <- g1 + geom_point(size = 3)
g1
(from RStudio interface via save image)
ggsave(g1, filename = "image.png")
ggsave gives you more finely-tuned control over graphics parameters, including the height/width, dpi (for raster images, eg. png), and file format. See the ?ggsave documentation for details.
Or alternatively, bump the geom_point up to size = 4.

Resources