Create an image filled chart in R using ggplot [duplicate] - r

This question already has an answer here:
Use a custom icon in plotly's pie chart in R
(1 answer)
Closed 5 years ago.
I'm trying to create a chart like the image below; essentially it's a filled shape denoting a percentage (most likely an image of a human, but this could theoretically be anything).
I've managed to do this in Excel, albeit with a lot of botching of bar charts. But is there a way to do this in R, preferably using ggplot?
I've read the similar question Use an image as area fill in an R plot which doesn't quite do the same thing, but I cannot envisage a solution using this method.
Any help is appreciated!
EDIT:
As pointed out, this has been answered using plotly: Use a custom icon in plotly's pie chart. Is this possible to do with ggplot?
Note I stole this image from a google image search for infographics.

You really only need to modify the plotting command from the answer that @PoGibas linked:
library(png)
library(ggplot2)
genderselection <- read.table(text="
Gender Freq
F 70
M 30
", header=T)
pcts <- round(prop.table(genderselection$Freq)*100)
# Load png file from imgur as binary
con <- url("https://i.imgur.com/vFDSFYX.png",
open='rb')
rawpng <- readBin(con, what='raw', n=50000)
close(con)
img <- readPNG(rawpng)
h <- dim(img)[1]
w <- dim(img)[2]
# Find the rows where feet starts and head ends
pos1 <- which(apply(img[,,1], 1, function(y) any(y==1)))
mn1 <- min(pos1)
mx1 <- max(pos1)
pospctM <- round((mx1-mn1)*pcts[2]/100+mn1)
pospctF <- round((mx1-mn1)*pcts[1]/100+mn1)
# Fill bodies with a different color according to percentages
# Note that this relies on the fact that the png is a bitmap.
# The png is expressed as a matrix with a cell for each pixel
# and 3 layers for r,g,b.
dim(img)
#> [1] 360 360 3
# Create a 2d matrix by just taking the red values
# Image is black and white so black corresponds to 0
# white corresponds to 1. Then change the values of
# the cells to correspond to one of three categories.
imgmtx <- img[h:1,,1]
whitemtx <- (imgmtx==1)
colmtx <- matrix(rep(FALSE,h*w),nrow=h)
midpt <- round(w/2)-10
colmtx[mx1:pospctM,1:midpt] <- TRUE
colmtx[mx1:pospctF,(midpt+1):w] <- TRUE
imgmtx[whitemtx & colmtx] <- 0.5
# Need to melt the matrix into a data.frame that ggplot can understand
df <- reshape2::melt(imgmtx)
head(df)
#> Var1 Var2 value
#> 1 1 1 0
#> 2 2 1 0
#> 3 3 1 0
#> 4 4 1 0
#> 5 5 1 0
#> 6 6 1 0
cols <- c(rgb(255,255,255,maxColorValue = 255),
rgb(209,230,244,maxColorValue = 255),
rgb(42,128,183,maxColorValue = 255))
# Then use a heatmap with 3 colours for background, and percentage fills
# Converting the fill value to a factor causes a discrete scale.
# geom_tile takes three columns: x, y, fill corresponding to
# x-coord, y-coord, and colour of the cell.
ggplot(df, aes(x = Var2, y = Var1, fill = factor(value)))+
geom_tile() +
scale_fill_manual(values = cols) +
theme(legend.position = "none")
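The row arithmetic above is easier to follow on a tiny matrix. What follows is only a minimal sketch, assuming a made-up 6 x 4 "image" and a hypothetical pct value; it is not the code from the answer and it skips the image flip and the left/right split:

```r
# Toy 6 x 4 "image": 1 = inside the shape (white), 0 = background.
# The matrix and pct are made up for illustration.
img <- matrix(0, nrow = 6, ncol = 4)
img[2:5, 2:3] <- 1

pct <- 50  # hypothetical share of the shape to fill, in percent

# Rows that contain any part of the shape
rows <- which(apply(img, 1, function(r) any(r == 1)))
top <- min(rows)
bottom <- max(rows)

# Number of shape rows to fill, counted upwards from the bottom row
n_fill <- round((bottom - top + 1) * pct / 100)

# Logical mask of the band of rows to fill (keeps the matrix shape)
band <- row(img) > bottom - n_fill & row(img) <= bottom

# Recode shape pixels inside the band to a third value, as in the answer
filled <- img
filled[img == 1 & band] <- 0.5
filled
```

After this step the matrix holds three distinct values (0, 0.5, 1), which is what lets a discrete three-colour fill scale work once the matrix is melted.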

Related

Can you change the shape of the points in ggeffects or Sjplot?

I can't figure out how to change the shape of the points in sjPlot or ggeffects.
In sjPlot I have:
p<-plot_model(my.lm, type = "pred", terms = c("Var1", "Var2", "Var3"))
This makes the points of Var2 different colors, but I need them to be different shapes.
I can use ggeffects for this, but I still can't figure out how to make a graph equivalent to plot_model in sjPlot with different point shapes.
g<-ggpredict(my.lm, terms=c("Var1","Var2", "Var3"))
Thank you
There doesn't seem to be a simple way to do this (like an argument within plot_model()). So, it seems the only option here is the "nuclear option" - which is changing the values in the $data layer of the plot object. It gets a bit "hacky", but it does work.
I'll demonstrate based on an adaptation of an example in this vignette.
library(ggplot2)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
data(efc)
y <- ifelse(efc$neg_c_7 < median(na.omit(efc$neg_c_7)), 0, 1)
df <- data.frame(
y = to_factor(y),
sex = to_factor(efc$c161sex),
dep = to_factor(efc$e42dep),
barthel = efc$barthtot,
education = to_factor(efc$c172code)
)
set_label(df$y) <- "High Negative Impact"
fit <- glm(y ~., data = df, family = binomial(link = "logit"))
p <- plot_model(fit, colors = "black")
p
Our goal is to change the shape of those points to something else. First step is to use ggplot_build() to pull the information from the object we just created, p.
qq <- ggplot_build(p)
The object, qq, is a list containing 3 layers: data, layout, and plot. Herein, we want to change the information contained in qq$data, which has the aesthetics after they have been mapped. qq$data is a list itself, containing 3 tables. Each of these tables corresponds to a particular geom in the underlying ggplot2 call. The first table qq$data[[1]], is for the vline geom. The third table qq$data[[3]] is for all the line segments behind the points (you'll note they have xmin, xmax, ymin, and ymax columns). The second table is the one we want, which is for the point geoms:
> qq$data[[2]]
colour fill x y PANEL group shape size alpha stroke
1 black black 7 0.28191638 1 7 19 2.5 NA 0.5
2 black black 6 0.21074532 1 6 19 2.5 NA 0.5
3 black black 5 0.48863767 1 5 19 2.5 NA 0.5
4 black black 4 0.39529220 1 4 19 2.5 NA 0.5
5 black black 3 -0.01294270 1 1 19 2.5 NA 0.5
6 black black 2 0.09837842 1 3 19 2.5 NA 0.5
7 black black 1 0.12316583 1 2 19 2.5 NA 0.5
You can see all points are made with shape=19. We can change that, then rebuild the plot from our modified qq object like so:
qq$data[[2]]$shape <- 5
plot(ggplot_gtable(qq))
You can change all sorts of things in the plot manually if you want:
qq$data[[1]]$colour <- "blue" # change color of vertical line
qq$data[[2]]$shape[3:5] <- 19 # points 3-5 are now circles again
qq$data[[2]]$colour[4:7] <- "red" # last 4 points are red (the column is named "colour")
qq$data[[2]]$size[c(2,4,6)] <- 4 # some points are bigger
plot(ggplot_gtable(qq))
A word on saving these plots to file, if you like to use ggsave(...): you cannot save the output of plot(ggplot_gtable(qq)) directly using ggsave(), since the default value for plot= in ggsave() is last_plot(), which does not work here when using plot(...). Therefore, if you want to save with ggsave(), you can do the following:
p <- ggplot_gtable(qq) # save to a large gtable
ggsave("name.png", plot = p) # reference the gtable object

How to draw a basic histogram with X and Y axis in R

I want to make a simple histogram which involves two vectors:
values <- c(1,2,3,4,5,6,7,8)
freq <- c(4,6,4,4,3,2,1,1)
df <- data.frame(values,freq)
Now the data.frame df contains the following values:
values freq
1 4
2 6
3 4
4 4
5 3
6 2
7 1
8 1
Now I want to draw a simple histogram, in which values are on the x axis and freq is on y axis. I am trying to use the hist function, but I am not able to give two variables. How can I make a simple histogram from this data?
Using ggplot2:
library(ggplot2)
ggplot(df, aes(x = values, y = freq)) +
geom_bar(stat="identity")
Since you have the frequencies already, what you really want is a bar plot:
barplot(df$freq,names.arg=df$values)
If you've got your heart set on using hist, you should do:
hist(rep(df$values,df$freq))
Please read ?barplot and ?hist for further plotting options.
Also, because I'm somewhat of a zealot, I think the code looks cleaner if you use data.table:
library(data.table)
setDT(df) #convert df to a data.table by reference
df[,barplot(freq,names.arg=values)]
and
df[,hist(rep(values,freq))]
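To see what rep(values, freq) actually hands to hist(), here is a small sketch using the vectors from the question. The explicit breaks are my own assumption, chosen so that each value gets its own bin; with default breaks, neighbouring values may end up lumped together in the first bar.

```r
values <- c(1, 2, 3, 4, 5, 6, 7, 8)
freq <- c(4, 6, 4, 4, 3, 2, 1, 1)

# Repeat each value as many times as it was observed
expanded <- rep(values, freq)
length(expanded)  # equals sum(freq)

# Tabulating the expanded vector recovers the original frequencies
counts <- table(expanded)

# Breaks centred on the integers (an assumption, not from the answers)
# give one bin per value; plot = FALSE only computes the bins
h <- hist(expanded, breaks = seq(0.5, 8.5, by = 1), plot = FALSE)
h$counts
```

Drop plot = FALSE to draw the histogram itself.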

How can I have different color for each bar of stack barplots? in R

My question may be very simple, but I couldn't find the answer!
I have a matrix with 12 entries and I made a stacked barplot with the barplot function in R.
With this code:
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
barplot(mydata, xlim=c(0,25),horiz=T,
legend.text = c("A","B","C","D","E","F"),
col=c("blue","green"),axisnames = T, main="Stack barplot")
Here is the image from the code:
What I want to do is give each of the groups (A:F, only the blue part) a different color, but I couldn't add more than two colors.
I would also like to know how I can start the plot from x=2 instead of 0.
I know it's possible to choose the range of x by using xlim=c(2,25), but when I do that, parts of my bars fall out of range and I get a picture like this:
What I want is to ignore the part of each bar that is smaller than 2, start the x-axis from two, and show the rest of each bar instead of having the bars run out of range.
Thank you in advance,
As already mentioned in the other post, your desired output is not entirely clear. Here is another option using ggplot2. I think the difficulty here is reshaping the data; after that, the plotting step is straightforward.
library(reshape2)
library(ggplot2)
## Set a seed to make your data reproducible
set.seed(1)
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
## transform your matrix into a named data.frame
myData <- setNames(as.data.frame(mydata),LETTERS[1:6])
## put the data in the long format
dd <- melt(t(myData))
## transform the fill variable to the desired behavior.
## I used cumsum to be sure each Var2==1 row gets a unique value
## (Var2==2 rows keep their value).
## maybe you should change this step if you want a different behavior
## (see other solution)
dd <- transform(dd,Var2 =ifelse(Var2==1,cumsum(Var2)+2,Var2))
## a simple bar plot
ggplot(dd) +
## use stat identity since you want to set the y aes
geom_bar(aes(x=Var1,fill=factor(Var2),y=value),stat='identity') +
## horizontal rotation and zooming
coord_flip(ylim = c(2, max(dd$value)*2)) +
theme_bw()
Another option using lattice package
I like the formula notation in lattice and its flexibility for flipping coordinates for example:
library(lattice)
barchart(Var1~value,groups=Var2,data=dd,stack=TRUE,
auto.key = list(space = "right"),
prepanel = function(x,y, ...) {
list(xlim = c(2, 2*max(x, na.rm = TRUE)))
})
You do this by using the "add" and "offset" arguments to barplot(), along with setting axes and axisnames FALSE to avoid double-plotting: (I'm throwing in my color-blind color palette, as I'm red-green color-blind)
# Conservative 8-color palette adapted for color blindness, with first color = "black".
# Wong, Bang. "Points of view: Color blindness." nature methods 8.6 (2011): 441-441.
colorBlind.8 <- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",
yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
cols <- colorBlind.8[1:ncol(mydata)]
bar2col <- colorBlind.8[8]
barplot(mydata[1,], xlim=c(0,25), horiz=T, col=cols, axisnames=T,
legend.text=c("A","B","C","D","E","F"), main="Stack barplot")
barplot(mydata[2,], offset=mydata[1,], add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
For the second part of your question, the "offset" argument is used for the first set of bars also, and you change xlim and use xaxp to adjust the x-axis numbering, and of course you must also adjust the height of the first row of bars to remove the excess offset:
offset <- 2
h <- mydata[1,] - offset
h[h < 0] <- 0
barplot(h, offset=offset, xlim=c(offset,25), xaxp=c(offset,24,11), horiz=T,
legend.text=c("A","B","C","D","E","F"),
col=cols, axisnames=T, main="Stack barplot")
barplot(mydata[2,], offset=offset+h, add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
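The height adjustment in the last snippet is plain arithmetic and can be checked without plotting. A minimal sketch, assuming some made-up first-row bar lengths (row1 is hypothetical, not the sampled mydata):

```r
offset <- 2
row1 <- c(1, 5, 3, 8, 2, 6)  # hypothetical first-row bar lengths

# Remove the part of each bar that lies left of the offset;
# bars shorter than the offset collapse to zero height
h <- pmax(row1 - offset, 0)
h

# Drawn with offset = 2, each bar now ends at offset + h,
# which is the original length wherever that exceeds the offset
ends <- offset + h
ends
```

pmax(row1 - offset, 0) gives the same result as the two-step h <- mydata[1,] - offset; h[h < 0] <- 0 used above.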
I'm not entirely sure if this is what you're looking for: 'A' has two values (x1 and x2), but your legend seems to hint otherwise.
Here is a way to approach what you want with ggplot. First we set up the data.frame (required for ggplot):
set.seed(1)
df <- data.frame(
name = letters[1:6],
x1=sample(1:6, replace=T),
x2=sample(1:6, replace=T))
name x1 x2
1 a 5 3
2 b 3 5
3 c 5 6
4 d 3 2
5 e 5 4
6 f 6 1
Next, ggplot requires it to be in a long format:
# Make it into ggplot format
require(dplyr); require(reshape2)
df <- df %>%
melt(id.vars="name")
name variable value
1 a x1 5
2 b x1 3
3 c x1 5
4 d x1 3
5 e x1 5
6 f x1 6
...
Now, as you want some bars to be a different colour, we need to give them an alternate name so that we can assign their colour manually.
df <- df %>%
mutate(variable=ifelse(
name %in% c("b", "d", "f") & variable == "x1",
"highlight_x1",
as.character(variable)))
name variable value
1 a x1 2
2 b highlight_x1 3
3 c x1 4
4 d highlight_x1 6
5 e x1 2
6 f highlight_x1 6
7 a x2 6
8 b x2 4
...
Next, we build the plot. This uses the standard colours:
require(ggplot2)
p <- ggplot(data=df, aes(y=value, x=name, fill=factor(variable))) +
geom_bar(stat="identity", colour="black") +
theme_bw() +
coord_flip(ylim=c(1,10)) # Zooms in on y = c(1,10)
Note that I use coord_flip (which in turn calls coord_cartesian) with the ylim=c(1,10) parameter to 'zoom in' on the data. It doesn't remove the data, it just ignores it (unlike setting the limits in the scale). Now, if you manually specify the colours:
p + scale_fill_manual(values = c(
"x1"="coral3",
"x2"="chartreuse3",
"highlight_x1"="cornflowerblue"))
I would like to simplify the solution proposed by @tedtoal, which was the best one for me.
I wanted to create a barplot with a different color for each bar, without needing ggplot or lattice.
color_range<- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
barplot(c(1,6,2,6,1), col= color_range[1:length(c(1,6,2,6,1))])

Calculate the run length of a variable and plot with ggplot

I'm using ggplot to plot an ordered sequence of numbers that is colored by a factor. For example, given this fake data:
# Generate fake data
library(dplyr)
set.seed(12345)
plot.data <- data.frame(fitted = rnorm(20),
actual = sample(0:1, 20, replace=TRUE)) %>%
arrange(fitted)
head(plot.data)
fitted actual
1 -1.8179560 0
2 -0.9193220 1
3 -0.8863575 1
4 -0.7505320 1
5 -0.4534972 1
6 -0.3315776 0
I can easily plot the actual column from rows 1–20 as colored lines:
# Plot with lines
ggplot(plot.data, aes(x=seq(length.out = length(actual)), colour=factor(actual))) +
geom_linerange(aes(ymin=0, ymax=1))
The gist of this plot is to show how often the actual numbers appear sequentially across the range of fitted values. As you can see in the image, sequential 0s and 1s are readily seen as sequential blue and red vertical lines.
However, I'd like to move away from the lines and use geom_rect instead to create bands for the sequential number. I can fake this with really thick lineranges:
# Fake rectangular regions with thick lines
ggplot(plot.data, aes(x=seq(length.out = length(actual)), colour=factor(actual))) +
geom_linerange(aes(ymin=0, ymax=1), size=10)
But the size of these lines is dependent on the number of observations—if they're too thick, they'll overlap. Additionally, doing this means that there are a bunch of extraneous graphical elements that are plotted (i.e. sequential rectangular sections are really just a bunch of line segments that bleed into each other). It would be better to use geom_rect instead.
However, geom_rect requires that data include minimum and maximum values for x, meaning that I need to reshape actual to look something like this instead:
xmin xmax colour
0 1 red
1 5 blue
I need to programmatically calculate the run length of each color to mark the beginning and end of that color. I know that R has the rle() function, which is likely the best option for calculating the run length, but I'm unsure about how to split the run length into two columns (xmin and xmax).
What's the best way to calculate the run length of a variable so that geom_rect can plot it correctly?
Thanks to @baptiste, it seems that the best way to go about this is to condense the data into just those rows that see a change in actual:
condensed <- plot.data %>%
mutate(x = seq_along(actual), change = c(0, diff(actual))) %>%
subset(change != 0 ) %>% select(-change)
first.row <- plot.data[1,] %>% mutate(x = 0)
condensed.plot.data <- rbind(first.row, condensed) %>%
mutate(xmax = lead(x),
xmax = ifelse(is.na(xmax), max(x) + 1, xmax)) %>%
rename(xmin = x)
condensed.plot.data
# fitted actual xmin xmax
# 1 -1.8179560 0 0 2
# 2 -0.9193220 1 2 6
# 3 -0.3315776 0 6 9
# 4 -0.1162478 1 9 11
# 5 0.2987237 0 11 14
# 6 0.5855288 1 14 15
# 7 0.6058875 0 15 20
# 8 1.8173120 1 20 21
ggplot(condensed.plot.data) +
geom_rect(aes(xmin=xmin, xmax=xmax, ymin=0, ymax=1, fill=factor(actual)))
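Since the question mentions rle(), here is an alternative sketch of the same idea in base R. It is not the approach used above, the actual vector is made up for illustration, and the edges follow the 0-based convention of the desired table in the question rather than the output shown above:

```r
# Hypothetical 0/1 sequence, already sorted by fitted value
actual <- c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1)

# rle() returns the length and value of each run of equal values
runs <- rle(actual)

# The right edge of each run is the cumulative run length;
# the left edge is the right edge minus the run's own length
xmax <- cumsum(runs$lengths)
xmin <- xmax - runs$lengths

bands <- data.frame(xmin = xmin, xmax = xmax, actual = runs$values)
bands
```

A data frame shaped like bands could then be passed to geom_rect() with the same aes() mapping as in the plot call above.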

R how to bin weighted data

Hi, I'm trying to draw a histogram in ggplot, but my data doesn't list every observation; it only has the values and their number of occurrences.
value=c(1,2,3,4,5,6,7,8,9,10)
weight<-c(8976,10857,10770,14075,18075,20757,24770,14556,11235,8042)
df <- data.frame(value,weight)
df
value weight
1 1 8976
2 2 10857
3 3 10770
4 4 14075
5 5 18075
6 6 20757
7 7 24770
8 8 14556
9 9 11235
10 10 8042
Would anybody know either how to bin the values or how to plot a histogram of the binned values?
I want to get something that would look like
bin weight
1 1-2 19833
2 3-4 24845
...
I would add another variable that designates the binning and then
df$group <- rep(c("1-2", "3-4", "5-6", "7-8", "9-10"), each = 2)
draw it using ggplot.
ggplot(df, aes(y = weight, x = group)) + stat_summary(fun = "sum", geom = "bar") # ggplot2 >= 3.3.0; older versions use fun.y
Here's one method for binning the data up:
df$bin <- findInterval(df$value,seq(1,max(df$value),2))
result <- aggregate(df["weight"],df["bin"],sum)
# get your named bins automatically without specifying them individually
result$bin <- tapply(df$value,df$bin,function(x) paste0(x,collapse="-"))
# result
bin weight
1 1-2 19833
2 3-4 24845
3 5-6 38832
4 7-8 39326
5 9-10 19277
# barplot it (base example since Roman has covered ggplot)
with(result,barplot(weight,names.arg=bin))
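Another way to get the same bins is cut() plus tapply(). This is only a sketch: the break points and labels are my own assumptions, picked to reproduce the pairs 1-2, 3-4 and so on from the question.

```r
value <- 1:10
weight <- c(8976, 10857, 10770, 14075, 18075, 20757, 24770, 14556, 11235, 8042)

# Breaks at 0, 2, 4, ... give the intervals (0,2], (2,4], ...,
# so values 1 and 2 share a bin, 3 and 4 share the next, etc.
bin <- cut(value,
           breaks = seq(0, 10, by = 2),
           labels = c("1-2", "3-4", "5-6", "7-8", "9-10"))

# Sum the weights within each bin
binned <- tapply(weight, bin, sum)
binned
```

The result is a named vector, so barplot(binned) labels the bars with the bin names.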
Just expand your data:
value=c(1,2,3,4,5,6,7,8,9,10)
weight<-c(8976,10857,10770,14075,18075,20757,24770,14556,11235,8042)
dat = rep(value,weight)
# plot result
histres = hist(dat)
And histres contains some potentially useful information if you want details of the histogram data.
