Grouped bar charts for modal split values - r

I want to plot a data frame containing modal split values (% of car usage, % of bike usage, ...) for different path lengths (under 5 km, 5-10 km, 10-30 km, ...).
Each element in my data frame contains the share of one vehicle mode for one path-length group.
My goal is to plot all values in one plot.
I want to create a bar chart with one bar per path length, showing all vehicle percentages (the modal split).
The first column of my data frame contains the vehicle modes (car, bike, ...), and columns 2-10 contain the percentages for each path-length group.
I tried:
testtest <- ggplot() + geom_col(data = ms_gruppen_d,
aes(x = colnames(ms_gruppen_d)[2:9],
y = ms_gruppen_d[,2:9],
fill = ms_gruppen_d[,1]))
My values are not categorical, so I cannot use the "count" function.
Can someone help?
Thanks
ms_gruppen_d <- structure(list(VM = c("Fußverkehr", "Fahrrad", "Motorrad/Moped/Mofa",
"Privater_pkw", "Gewerb_pkw", "Lkw_bis_3_5_", "Lkw_ab_3_5_",
"Sattelzug", "ÖPNV"), `Laenge unter 5km` = c(0.218428835651906,
0.208360071967382, 0, 0.337471470224058, 0.195785602540656, 0.0103830833919553,
0.0123737357892543, 0, 0.0171972004347874), `Laenge 5 - 10km` = c(0.138928420064367,
0.140725324716725, 0.00988051174398964, 0.289334484453904, 0.308718256514345,
0.0356902893023975, 0.00988051174398964, 0.0222528559093808,
0.044589345550901), `Laenge 10-20km` = c(0.0667063809168976,
0.172327489225668, 0, 0.271668790053295, 0.346741728107974, 0.0573103622018356,
0.0292526145926873, 0.0149058863164426, 0.0410867485852005),
`Laenge 20-30km` = c(0.0405426428226048, 0.1463357744637,
0.0236972749606593, 0.271246395715663, 0.354248166536575,
0.0855256681459516, 0.0173953892663395, 0.0432292937973128,
0.0177793942911947), `Laenge 30-50km` = c(0.0213163894963155,
0.0503758065644924, 0.0159090254544127, 0.178916279908378,
0.485985672387571, 0.148087763700495, 0.0378558845704386,
0.026693520571143, 0.0348596573467541), `Laenge 50-100km` = c(0.00652604845092996,
0.0123285212525124, 0, 0.177307097376991, 0.380919125770432,
0.154233838933756, 0.213479807823156, 0.0441531204824327,
0.0110524399097905), `Laenge 100-200km` = c(0, 0.00431357399129567,
0, 0.087013827371374, 0.173016082279325, 0.203265193001196,
0.399659385606215, 0.0655495360275712, 0.0671824017230226
), `Laenge 200-300km` = c(0, 0, 0, 0.00953852353026925, 0.147233787704061,
0.130598939323796, 0.518334554408677, 0.146338992010429,
0.0479552030227669), `Laenge 300km+` = c(0, 0, 0, 0.0333890118493603,
0.0876659311982381, 0.0979219742771943, 0.420951006142259,
0.297349051156633, 0.062723025376315)), row.names = c(NA,
-9L), class = "data.frame")

I think the main problem is that your data is in wide format rather than long format. You can reshape it with tidyr::pivot_longer(). Here is how to use that function to make a grouped bar chart:
library(ggplot2)
# Reshape data, excluding column 1
df <- tidyr::pivot_longer(ms_gruppen_d, -1, names_to = "Laenge")
# Keep the original column order and drop the "Laenge " prefix for prettier axis labels
df$Laenge <- factor(df$Laenge, levels = colnames(ms_gruppen_d)[-1])
levels(df$Laenge) <- gsub("Laenge ", "", levels(df$Laenge))
# A grouped bar chart
ggplot(df, aes(Laenge, value, fill = VM)) +
geom_col(position = "dodge")
However, I think a stacked bar chart might make more sense in this case, as all fractions should add up to 1.
ggplot(df, aes(Laenge, value, fill = VM)) +
geom_col(position = "stack")
Created on 2021-09-10 by the reprex package (v2.0.1)
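The remark that the shares should add up to 1 can be checked before choosing between dodged and stacked bars. The following is a minimal sketch, not part of the original answer: `toy` is a made-up data frame with the same wide layout as `ms_gruppen_d` (mode in the first column, one share column per path length), and its values are invented so that one column is deliberately off.

```r
# Hypothetical toy data in the same wide layout as ms_gruppen_d (values are made up)
toy <- data.frame(
  VM = c("walk", "bike", "car"),
  `under 5km` = c(0.5, 0.3, 0.2),
  `5 - 10km` = c(0.2, 0.3, 0.6),
  check.names = FALSE
)

# Shares as a numeric matrix, one column per path-length group
shares <- as.matrix(toy[-1])

# Sum of shares per column; a complete modal split gives 1
col_totals <- colSums(shares)
print(col_totals)

# Names of the columns whose total differs from 1 by more than rounding error
off <- abs(col_totals - 1) > 1e-8
print(names(col_totals)[off])

# Rescale every column so that it sums to 1 again
shares_norm <- sweep(shares, 2, col_totals, "/")
print(colSums(shares_norm))
```

If the totals are only approximately 1, geom_col(position = "fill") does the same rescaling at draw time and may be a convenient alternative to position = "stack".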

Related

ggplot2: heatmap customize legend

I am trying to plot a heatmap (colored by odds ratios) using ggplot2. The odds ratio values range from 0-200. I would like my heatmap legend to show markings corresponding to certain values (0.1, 1, 10, 50, 100, 200). This is the code I am using but my legend does not label all the values (see below)
Code below:
map is a sample data frame with columns: segments, OR, tissue type
library(dplyr)   # %>% and mutate()
library(ggplot2)
library(scales)  # rescale()
segments <- c("TssA", "TssBiv", "BivFlnk", "EnhBiv","ReprPC", "ReprPCWk", "Quies", "TssAFlnk", "TxFlnk", "Tx", "TxWk", "EnhG", "Enh", "ZNF/Rpts", "Het")
OR <- c(1.4787622, 46.99886002, 11.74417278, 4.49223136, 204.975818, 1.85228517, 0.85762414, 0.67926846, 0.33696213, 0.06532777, 0.10478027, 0.07462983, 0.06501252, 1.32922162, 0.32638438)
df <- data.frame(segments, OR)
map <- df %>% mutate(tissue = 'colon')
ggplot(map, aes(tissue,segments, fill = OR))+ geom_tile(colour="gray80")+
theme_bw()+coord_equal()+
scale_fill_gradientn(colours=c("lightskyblue1", "white","navajowhite","lightsalmon", "orangered2", "indianred1"),
values=rescale(c(0.1, 1, 10, 50, 100, 200)), guide="colorbar", breaks=c(0.1, 1, 10, 50, 150, 200))
I am looking for my legend to look something similar to this (using the values I specified):
With your map data, first transform OR to log(OR), so that the colour scale is spread evenly over orders of magnitude.
You might also want to assign white to OR = 1. If so, this approach can achieve that, although you may need to try different limits values with real data.
# Work on the log scale
map_1 <- map %>% mutate(OR = log(OR))
OR_max <- max(map$OR, na.rm = TRUE)
# Legend breaks on the original OR scale; the labels are derived from the
# same vector so that breaks and labels cannot disagree
or_breaks <- c(0.1, 1, 10, 50, 200)
log_list <- log(or_breaks)
ggplot(map_1, aes(tissue, segments, fill = OR)) +
  geom_tile(colour = "gray80") +
  theme_bw() + coord_equal() +
  scale_fill_gradientn(
    colours = c("red3", "white", "navy"),
    values = rescale(log_list),
    guide = "colorbar",
    breaks = log_list,
    limits = log(c(1 / OR_max, OR_max)),
    labels = as.character(or_breaks)
  )
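To see why the log transform helps, it can be useful to compare where the requested legend values land on the colour bar with and without it. This is only an illustrative sketch: `to_unit` is a hypothetical helper that repeats the default arithmetic of scales::rescale(), and the break values are the ones listed in the question.

```r
# Legend values requested in the question, on the odds-ratio scale
or_breaks <- c(0.1, 1, 10, 50, 100, 200)

# Hypothetical helper: map a numeric vector onto [0, 1],
# the same arithmetic as scales::rescale() with its defaults
to_unit <- function(v) (v - min(v)) / (max(v) - min(v))

raw_stops <- to_unit(or_breaks)       # positions on the raw scale
log_stops <- to_unit(log(or_breaks))  # positions on the log scale

# Side-by-side comparison of the two sets of positions
print(round(data.frame(or_breaks, raw_stops, log_stops), 3))
```

On the raw scale the breaks up to 10 are crowded into the first few percent of the bar, which is consistent with the legend not showing them; on the log scale they are spread out.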

Interactively identify 3D object in rgl plot

I want to identify 3D cylinders in an rgl plot to obtain one attribute of the nearest / selected cylinder. I tried using labels to simply spell out the attribute, but I work on data with more than 10,000 cylinders. Therefore, it gets so crowded that the labels are unreadable and it takes ages to render.
I tried to understand the documentation of rgl and I guess the solution to my issue is selecting the cylinder in the plot manually. I believe the function selectpoints3d() is probably the way to go. I believe it returns all vertices within the drawn rectangle, but I don't know how to go back to the cylinder data? I could calculate which cylinder is closest to the mean of the selected vertices, but this seems like a "quick & dirty" way to do the job.
Is there a better way to go? I noticed the argument value=FALSE to get the indices only, but I don't know how to go back to the cylinders.
Here is some dummy data and my code:
library(rgl)
# dummy data
cylinder <- data.frame(
start_X = rep(1:3, 2)*2,
start_Y = rep(1:2, each = 3)*2,
start_Z = 0,
end_X = rep(1:3, 2)*2 + round(runif(6, -1, 1), 2),
end_Y = rep(1:2, each = 3)*2 + round(runif(6, -1, 1), 2),
end_Z = 0.5,
radius = 0.25,
attribute = sample(letters[1:6], 6)
)
# calculate centers
cylinder$center_X <- rowMeans(cylinder[,c("start_X", "end_X")])
cylinder$center_Y <- rowMeans(cylinder[,c("start_Y", "end_Y")])
cylinder$center_Z <- rowMeans(cylinder[,c("start_Z", "end_Z")])
# create cylinders
cylinder_list <- list()
for (i in 1:nrow(cylinder)) {
cylinder_list[[i]] <- cylinder3d(
center = cbind(
c(cylinder$start_X[i], cylinder$end_X[i]),
c(cylinder$start_Y[i], cylinder$end_Y[i]),
c(cylinder$start_Z[i], cylinder$end_Z[i])),
radius = cylinder$radius[i],
closed = -2)
}
# plot cylinders
open3d()
par3d()
shade3d(shapelist3d(cylinder_list, plot = FALSE), col = "blue")
text3d(cylinder$center_X+0.5, cylinder$center_Y+0.5, cylinder$center_Z+0.5, cylinder$attribute, color="red")
# get attribute
nearby <- selectpoints3d(value=TRUE, button = "right")
nearby <- colMeans(nearby)
cylinder$dist <- sqrt(
(nearby["x"]-cylinder$center_X)**2 +
(nearby["y"]-cylinder$center_Y)**2 +
(nearby["z"]-cylinder$center_Z)**2)
cylinder$attribute[which.min(cylinder$dist)]
If you call selectpoints3d(value = FALSE), you get two columns. The first column is the id of the object that was found. Your cylinders get two ids each. One way to mark the cylinders is to use "tags". For example, this modification of your code:
library(rgl)
# dummy data
cylinder <- data.frame(
start_X = rep(1:3, 2)*2,
start_Y = rep(1:2, each = 3)*2,
start_Z = 0,
end_X = rep(1:3, 2)*2 + round(runif(6, -1, 1), 2),
end_Y = rep(1:2, each = 3)*2 + round(runif(6, -1, 1), 2),
end_Z = 0.5,
radius = 0.25,
attribute = sample(letters[1:6], 6)
)
# calculate centers
cylinder$center_X <- rowMeans(cylinder[,c("start_X", "end_X")])
cylinder$center_Y <- rowMeans(cylinder[,c("start_Y", "end_Y")])
cylinder$center_Z <- rowMeans(cylinder[,c("start_Z", "end_Z")])
# create cylinders
cylinder_list <- list()
for (i in 1:nrow(cylinder)) {
cylinder_list[[i]] <- cylinder3d(
center = cbind(
c(cylinder$start_X[i], cylinder$end_X[i]),
c(cylinder$start_Y[i], cylinder$end_Y[i]),
c(cylinder$start_Z[i], cylinder$end_Z[i])),
radius = cylinder$radius[i],
closed = -2)
# Add tag here:
cylinder_list[[i]]$material$tag <- cylinder$attribute[i]
}
# plot cylinders
open3d()
par3d()
shade3d(shapelist3d(cylinder_list, plot = FALSE), col = "blue")
text3d(cylinder$center_X+0.5, cylinder$center_Y+0.5, cylinder$center_Z+0.5, cylinder$attribute, color="red")
# Don't get values, get the ids
nearby <- selectpoints3d(value=FALSE, button = "right", closest = FALSE)
ids <- nearby[, "id"]
# Convert them to tags. If you select one of the labels, you'll get
# a blank in the list of tags, because we didn't tag the text.
unique(tagged3d(id = ids))
When I was trying this, I found that using closest = TRUE in selectpoints3d seemed to get too many ids; there may be a bug there.
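Once the tags are available, getting back to the cylinder data is an ordinary table lookup. The sketch below assumes the tags are the values of the attribute column, as in the code above; `selected_tags` is a hypothetical stand-in for what tagged3d() might return after a selection, and the small `cylinder` table is made up so that the example runs without an rgl window.

```r
# Hypothetical stand-in for the cylinder table; attribute doubles as the tag
cylinder <- data.frame(
  attribute = c("a", "b", "c", "d"),
  radius = c(0.25, 0.30, 0.25, 0.40)
)

# Hypothetical selection result: a repeated tag for the two ids of one
# cylinder, and "" for an untagged object such as a text label
selected_tags <- c("b", "b", "", "d")

# Drop blanks and duplicates, then look the tags up in the table
tags <- unique(selected_tags[nzchar(selected_tags)])
hits <- cylinder[match(tags, cylinder$attribute), ]
print(hits)
```

If the attribute is not unique across cylinders, tagging each cylinder with its row number (converted to a character string) instead would probably keep the lookup unambiguous.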

Scatter plot with ggplot2 colored by specific dates interval in r

I'm trying to assign different colors to the scatterplot based on their dates, more specifically the year.
This is what my dataset looks like:
> dput(head(CORt_r100_stack_join_fspec,10))
structure(list(Date = structure(c(16779, 16834, 16884, 16924,
16973, 16997, 17031, 17184, 17214, 17254), class = "Date"), meanNDVIN_int = c(0.677501157246889,
0.632728796482024, 0.578636981692124, 0.547002029242488, 0.632635423362751,
NA, 0.699596252720458, 0.670059391804396, 0.643347941166436,
0.674034259709311), meanNDVIW_int = c(0.784142418592418, 0.652437451242156,
0.648319814752948, 0.593432266488189, 0.767890365415717, NA,
0.779249089832163, 0.71974944410843, 0.715777992826006, 0.685045115352089
), meanNDVIE_int = c(0.703614512017928, 0.701963337684803, 0.488628353756438,
0.631309466083632, 0.781589421376217, NA, 0.799663418920722,
0.78910564747191, 0.710962969930836, 0.715644011856453), meanNDVINr_int_f = c(0.677501157246889,
0.632728796482024, 0.578636981692124, 0.547002029242488, 0.632635423362751,
0.687343078509066, 0.699596252720458, 0.670059391804396, 0.643347941166436,
0.674034259709311), meanNDVIWr_int_f = c(0.784142418592418, 0.652437451242156,
0.648319814752948, 0.593432266488189, 0.767890365415717, 0.749505859407419,
0.779249089832163, 0.71974944410843, 0.715777992826006, 0.685045115352089
), meanNDVIEr_int_f = c(0.703614512017928, 0.701963337684803,
0.488628353756438, 0.631309466083632, 0.781589421376217, 0.625916155640988,
0.799663418920722, 0.78910564747191, 0.710962969930836, 0.715644011856453
), NDVI_N = c(0.17221248, 0.644239685, 0.57222623, 0.558666635,
0.51654034, 0.42053949, 0.396706695, 0.641767447, 0.641008268,
0.662841949), NDVI_W = c(0.08182944, 0.69112807, 0.637699375,
0.629429605, 0.658829525, 0.60621678, 0.57186129, 0.72636742,
0.724193596, 0.738424976), NDVI_E = c(0.17135712, 0.659222803,
0.58665977, 0.573081253, 0.533498035, 0.437643585, 0.412841468,
0.652057206, 0.651854988, 0.670345511), NDVI_U = c(0.40520304,
0.578414833, 0.455746833, 0.428289893, 0.208847548, 0, 0, 0.475193691,
0.478691084, 0.505043773)), row.names = c(NA, 10L), class = "data.frame")
I've been plotting meanNDVIN_int against NDVI_N using this code:
ggplot(CORt_r100_join_fspec_2NDVIday,aes(x=NDVI_N)) +
geom_point(aes(y=meanNDVIN_int), colour="red") +
theme_bw()+
ylab("meanNDVIN_int")+
xlab("NDVI_N")
Now I want to color each point differently (no matter the color) based on their year, 2015, 2016, and 2017.
I've used the scale_color_manual function to introduce the dates but no success so far.
Any help will be much appreciated.
Here is an alternative that uses the first four characters of Date (the year) as the color variable; df stands for your data frame:
ggplot(df,aes(x=NDVI_N)) +
geom_point(aes(y=meanNDVIN_int, color=substring(Date,1,4))) +
labs(color="Year")+
theme_bw()+
ylab("meanNDVIN_int")+
xlab("NDVI_N")
I created a year variable with lubridate and stored it as a factor for discrete colouring. The only thing missing from your code was moving color inside aes() so that it is mapped to the year.
library(dplyr)  # %>% and mutate()
# Add year variable;
CORt_r100_stack_join_fspec <- CORt_r100_stack_join_fspec %>% mutate(
year = as.factor(lubridate::year(Date))
)
# Plot;
ggplot(CORt_r100_stack_join_fspec,aes(x=NDVI_N)) +
geom_point(aes(y=meanNDVIN_int, color = year)) +
theme_bw() +
ylab("meanNDVIN_int")+
xlab("NDVI_N")
Note: The name of the data you provided is not the same as the one in your plot call, so I changed CORt_r100_join_fspec_2NDVIday to CORt_r100_stack_join_fspec to make the plot and mutate() work properly.
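Both answers come down to turning each Date into a discrete year label. A minimal base R sketch of that step, using three of the Date values from the dput() output in the question; the palette `year_cols` is a made-up example, not something from the original thread.

```r
# First, second and eighth Date values from the question's dput() output
dates <- as.Date(c(16779, 16834, 17184), origin = "1970-01-01")

# Year as a factor, so that it is treated as a discrete variable
year <- factor(format(dates, "%Y"))
print(levels(year))

# Hypothetical named palette; the names must match the factor levels
year_cols <- c("2015" = "red", "2016" = "blue", "2017" = "darkgreen")

# Colour that each observation would receive
print(unname(year_cols[as.character(year)]))
```

A named vector such as year_cols is the form that scale_color_manual(values = ...) accepts, once color is mapped to the year inside aes().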

Is there a faster way than flextable to add tables to ggplot objects?

I'm looking for a good way to add a table to a ggplot object. So far, I've tried ggplot2::annotate, ggpmisc::geom_table, and flextable::flextable.
As discussed in another question, flextable will do the job, but it is very slow (~5-10 seconds per table). This makes it less than ideal for adding multiple tables to a ggplot object.
I tried ggplot2::annotate, thinking that I could just lay out the table using tab characters (\t) and newlines (\n). It turns out that the annotate function ignores the tab characters, so I don't get a table.
Next I tried ggpmisc::geom_table. This gives a layout pretty close to what I want, but for some reason, when the text is left justified, it always cuts off the right hand edge of the longest line.
Is there a better way to add tables to ggplot objects? Or a way to repair one of my failed attempts?
I've also had suggestions from reddit to try ggtable or tableGrob, but so far I've made even less progress getting either of those to produce a ggplot object.
Here's the code I used to generate the tables in the images:
library(ggplot2)
library(ggpmisc)
library(tibble)
library(flextable)
# ggplot2::annotate
annotate_table <- paste0(
0, '\t', 'ggplot2::annotate ignores tabs', '\n',
1, '\t', 'one', '\n',
2, '\t', 'two', '\n',
7777, '\t', 'seven thousand seven hundred and seventy seven', '\n'
)
t_annotate <- ggplot() +
annotate('text', x = 0, y = 0, label = annotate_table) +
theme_void()
t_annotate
# ggpmisc::geom_table
my_df <- data.frame(
a = c(0, 1, 2, 7777),
b = c('ggpmisc::geom_table cuts off text', 'one', 'two', 'seven thousand seven hundred and seventy seven')
)
my_tibble <- tibble(x = 0, y = 0, tbl = list(my_df))
t_geom_table <- ggplot(my_df) +
geom_table(
data = my_tibble,
mapping = aes(x = 0, y = 0, label = tbl),
table.hjust = 'left',
table.colnames = FALSE,
table.theme = ttheme_gtminimal
) +
theme_void()
t_geom_table
# flextable::flextable
my_df <- data.frame(
a = c(0, 1, 2, 7777),
b = c('flextable::flextable is slow', 'one', 'two', 'seven thousand seven hundred and seventy seven')
)
my_ft <- flextable(my_df)
my_ft <- delete_part(my_ft, 'header')
my_ft <- border_remove(my_ft)
my_ft <- autofit(my_ft)
my_ft <- as_raster(my_ft) # This line is very slow
t_flextable <- ggplot() +
annotation_custom(grid::rasterGrob(my_ft)) +
theme_void()
t_flextable
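For the tableGrob suggestion mentioned above, one possible route to a ggplot object is to pass the grob to annotation_custom(), in the same way the flextable attempt passes a rasterGrob. This is a hedged sketch rather than a tested answer: it assumes the gridExtra package is installed, the theme settings are one guess at a left-justified layout, and it makes no claim about speed relative to flextable.

```r
library(ggplot2)
library(gridExtra)

# Same shape of data as in the attempts above (first label is made up)
my_df <- data.frame(
  a = c(0, 1, 2, 7777),
  b = c("tableGrob sketch", "one", "two",
        "seven thousand seven hundred and seventy seven")
)

# Left-justify the cell text; hjust and x are passed on to the text grobs
left_theme <- ttheme_minimal(core = list(fg_params = list(hjust = 0, x = 0.05)))

# rows = NULL drops the row names, cols = NULL drops the header row
tbl_grob <- tableGrob(my_df, rows = NULL, cols = NULL, theme = left_theme)

# Wrap the grob in an otherwise empty ggplot object
t_grob <- ggplot() +
  annotation_custom(tbl_grob) +
  theme_void()
t_grob
```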

how to make a merged heatmap between each two columns of values

How can I put two columns in one heatmap?
Let's say I have the following data:
data<- structure(list(names = structure(c(5L, 1L, 10L, 2L, 6L, 4L, 9L,
7L, 11L, 3L, 8L), .Label = c("Bin", "Dari", "Down", "How", "India",
"Karachi", "Left", "middle", "Right", "Trash", "Up"), class = "factor"),
X1Huor = c(1.555555556, 5.2555556, 2.256544, 2.3654225, 1.2665545,
0, 1.889822365, 2.37232101, -1, -1.885618083, 1.128576187
), X2Hour = c(1.36558854, 2.254887, 2.3333333, 0.22255444,
2.256588, 5.66666, -0.377964473, 0.107211253, -1, 0, 0),
X3Hour = c(0, 1.222222222, 5.336666, 1.179323788, 0.832050294,
-0.397359707, 0.185695338, 1.393746295, -1, -2.121320344,
1.523019248), X4Hour = c(3.988620176, 3.544745039, -2.365555,
2.366666, 1.000000225, -0.662266179, -0.557086015, 0.862662186,
0, -1.305459824, 1.929157714), X5Hour = c(2.366666, 2.333365,
4.22222, 0.823333333, 0.980196059, -2.516611478, 2.267786838,
0.32163376, 0, -2.592724864, 0.816496581)), .Names = c("names",
"X1Huor", "X2Hour", "X3Hour", "X4Hour", "X5Hour"), class = "data.frame", row.names = c(NA,
-11L))
This data has 5 columns of values. I want to make a heatmap in which one half of each cell shows the value from the first column and the other half shows the value from the second column.
The same for the third and fourth columns.
The same for the fifth and sixth (there is no sixth, but I can leave it empty).
This is just an example to show what I am looking for. I have searched a lot but could not find anything like this.
The colors range from red to green: if the value is higher than 2 the color is red, and if the value is lower than -2 the color is green.
Any thoughts on how to do this?
This is a somewhat hacky solution, but it might work for you, so check this out.
The idea is to use geom_polygon to create the triangles and stack them. To do that, we first need to generate the triangle coordinates:
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)
# The following two lines create the triangle coordinates:
# six vertices (two triangles) per row, shifted up by one unit for each row
x = rep(c(1,2,2, 1, 1, 2),nrow(data))
y = rep(c(1,1,2, 1, 2, 2),nrow(data)) + rep(seq_len(nrow(data)) - 1, each=6)
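The coordinate vectors are easier to read when built one cell at a time. The sketch below is only an illustration of the same layout: `cell_triangles` is a hypothetical helper, not part of the original answer, and it keeps the vertex order used in x and y.

```r
# Hypothetical helper: vertices of the two triangles that split the unit
# cell of row i along its diagonal (i = 1 is the bottom row)
cell_triangles <- function(i) {
  data.frame(
    row = i,
    half = rep(c("lower_right", "upper_left"), each = 3),
    x = c(1, 2, 2, 1, 1, 2),
    y = c(1, 1, 2, 1, 2, 2) + (i - 1)
  )
}

# Coordinates for the first three rows, six vertices per row
coords <- do.call(rbind, lapply(1:3, cell_triangles))
print(coords)
```

The first three vertices of each cell form the lower-right triangle and the last three form the upper-left one, so each pair of value columns can be mapped to one half.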
Now that we have our coordinates we need to generate their ids, which are the names. But because we want each triangle to be unique, we need to create two unique versions of each name:
names <- data %>%
select(names, X1Huor, X2Hour) %>%
gather("key", "value", X1Huor, X2Hour) %>%
arrange(names, key) %>%
mutate(name = str_c(names, key)) %>%
.$name %>%
rep(each = 3)
And now we do the same with the hours:
hour <- data %>%
select(names, X1Huor, X2Hour) %>%
gather("key", "value", X1Huor, X2Hour) %>%
arrange(names, key) %>%
.$value %>%
rep(each = 3)
datapoly <- data.frame(x = x, y = y , hour = hour, names = names)
Since there are no proper labels for the plot in our datapoly df, we need to create one:
name_labels <- data %>%
select(names) %>%
arrange(names) %>%
.$names
The scene is now set for our graph:
ggplot(datapoly, aes(x = x, y = y)) +
geom_polygon(aes(group = names, fill = hour), color = "black") +
scale_fill_continuous(low = "green", high = "red") +
scale_y_continuous(breaks = 1:nrow(data), labels = name_labels) +
theme(axis.text.y = element_text(vjust = -2),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
axis.title = element_blank())
The output looks like this:
Several points to keep in mind: Is this really a plot you want to be creating and using? Is this really useful for your purposes? Perhaps other, more traditional visualization methods are more suitable. Also, I didn't bother doing the same for the other hour columns as these are quite tedious, but the method on how to achieve them should be clear enough (I hope).
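The question also asks for everything above 2 to be red and everything below -2 to be green, which the plot above does not enforce. One way to get that behaviour, sketched here in base R under the assumption that the fill values are clamped before plotting, is the following; the sample values are taken from the question's data.

```r
# A few values from the question's data, including some beyond +/- 2
hour <- c(1.555555556, 5.2555556, -1.885618083, -2.516611478, 0)

# Clamp to [-2, 2]: anything above 2 becomes 2, anything below -2 becomes -2
hour_clamped <- pmin(pmax(hour, -2), 2)
print(hour_clamped)
```

Using the clamped values as the fill, or setting limits = c(-2, 2) together with oob = scales::squish on a scale_fill_gradient() call, should make every value beyond the limits take the end colour.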

Resources