Plot percentages in R as blocks - r

I have the table to the left
table <- cbind(c("x1","x2", "x3"), c("0.4173","0.9211","0.0109"))
and am trying to make the plot to the right.
Are there any packages in R that can do what I'm trying to achieve?

A base R option would be to use barplot on a named vector (v1, defined under "data" below)
barplot(v1)
Or convert it to a two-column data.frame with stack and use the formula method
barplot(values ~ ind, stack(v1))
Or we can use the tidyverse with ggplot2
library(dplyr)
library(ggplot2)
library(tidyr)
library(tibble)
enframe(v1, name = "id", value = 'block') %>%
mutate(non_block = 1 - block) %>%
pivot_longer(cols = -id) %>%
ggplot(aes(x = id, y = value, fill = name)) +
geom_col() +
coord_flip() +
theme_bw()
data
v1 <- setNames(c(0.4173, 0.9211, 0.0109), paste0("x", 1:3))
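The table in the question is a character matrix, whereas the answer starts from the named numeric vector v1. A minimal base R sketch of getting from one to the other and drawing the blocks as stacked horizontal bars follows; the name tbl and the colours are assumptions for illustration, not part of the original post.

```r
# The character matrix from the question (renamed `tbl` here to avoid masking base::table)
tbl <- cbind(c("x1", "x2", "x3"), c("0.4173", "0.9211", "0.0109"))

# Named numeric vector: values from column 2, names from column 1
v1 <- setNames(as.numeric(tbl[, 2]), tbl[, 1])

# One column per item: the filled share first, the remainder up to 1 second
blocks <- rbind(block = v1, non_block = 1 - v1)

# Stacked horizontal bars, one per item
barplot(blocks, horiz = TRUE, las = 1, xlim = c(0, 1),
        col = c("steelblue", "grey90"), legend.text = TRUE)
```

Because every column of blocks sums to 1, each bar spans the full width and the coloured part shows the percentage.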

Related

How to derive a relationship between the variables in R by using ggplot()

I have tried to determine the relationship between the variable "RainTomorrow" and the others with the code below, but it does not produce any output. How do I determine the relationship between RainTomorrow and all the other variables?
rattle::weatherAUS # to load the dataset into R
str(weather)
weather$Date <- as.Date(weather$Date)
weather$RainTomorrow <- as.factor(weather$RainTomorrow)
# exploring all the variables
weather %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
rattle::weatherAUS merely prints the data to console. You need to run weather <- rattle::weatherAUS
After that everything will work fine.
I use facet_grid() to show RainTomorrow in each row and other numeric variables in each column.
library(tidyverse)
library(rattle)
weather <- rattle::weatherAUS # assign the dataset; the bare name only prints it
# exploring all the variables
weather %>%
mutate(RainTomorrow = as.integer(RainTomorrow)) %>%
keep(is.numeric) %>%
mutate(RainTomorrow = weather$RainTomorrow) %>%
pivot_longer(-RainTomorrow, names_to = "name", values_to = "value") %>%
ggplot(aes(value)) +
geom_histogram() +
facet_grid(vars(RainTomorrow), vars(name), scales = "free") +
theme_test()

bicolor heatmap with factor levels

I have this dataframe:
set.seed(0)
df <- data.frame(id = factor(sample(1:100, 10000, replace=TRUE), levels=1:100),
year = factor(sample(1950:2019, 10000, replace=TRUE), levels=1950:2019)) %>% unique() %>% arrange(id, year)
And I'm looking to plot a heatmap where the ids are on the x-axis, the years are on the y-axis, and a tile is blue when the data point exists and red when it doesn't. I'm almost there, but I can't figure out how to change the fill argument to get the two colors:
ggplot(df, aes(id, year, fill= year)) +
geom_tile()
The reason for plotting both variables as factors is to show every level even when some year doesn't have any id (in which case its whole row should be red).
EDIT:
Two things I forgot to add (hope it's not too late):
How do I add alpha transparency to geom_tile() without messing it up?
I need to sort the ids from most missing values to fewest.
The complete() function from the tidyr package is useful for filling in missing combinations. First, set a flag variable to indicate that the data is present, then expand the data frame with the missing combinations and fill the new flag variable with FALSE:
df <- df %>%
mutate(flag = TRUE) %>%
complete(id, year, fill = list(flag = FALSE))
ggplot(df, aes(id, year, fill = flag)) +
geom_tile()
EDIT1: To add transparency, add alpha = 0.x within geom_tile(), where x is a value indicating the transparency. The lower the value, the more transparent.
EDIT2: To sort by missingness add the following code prior to the ggplot code:
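# Determine the order of the IDs. Sorting by descending count of present years puts
# the IDs with the fewest missing years first; use arrange(sum) to reverse that.
df_order <- df %>%
group_by(id) %>%
summarize(sum = sum(flag)) %>%
arrange(desc(sum)) %>%
mutate(order = row_number()) %>%
select(id, order)
# Set the IDs in order on the chart
df <- df %>%
left_join(df_order, by = "id") %>%
mutate(id = fct_reorder(id, order))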
I think you need to do some pre-processing before plotting. Create a temporary variable (data_exist) which denotes data is present for that id and year. Then use complete to fill the missing years for each id and plot it.
library(tidyverse)
df %>%
mutate_all(~as.integer(as.character(.))) %>%
mutate(data_exist = 1) %>%
complete(id, year = min(year):max(year), fill = list(data_exist = 0)) %>%
mutate(data_exist = factor(data_exist)) %>%
ggplot() + aes(id, year, fill= data_exist) + geom_tile()
With expand.grid you can create a data frame with all combinations of ids and years, then left join df onto these combinations to see which ones are present:
all <- expand.grid(id = levels(df$id), year = levels(df$year)) %>%
left_join(mutate(df, present = 1), by = c("id", "year")) %>% # flag the observed rows before joining
mutate(present = ifelse(is.na(present), '0', '1'))
ggplot(all, aes(as.numeric(id), as.numeric(year), fill= present)) +
geom_tile() +
scale_fill_manual(values=c('0'='red','1'='blue')) + # change default colors
theme(legend.position="None") # hide legend
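The answers above share one idea: build every id/year combination, flag which ones occur, and order the ids by how many years they are missing. A dependency-free sketch of that logic in base R follows; the toy data frame obs and the names grid, n_missing and ord are made up for illustration.

```r
# Toy data (made up): 3 ids, 4 years, some id/year pairs absent
obs <- data.frame(
  id   = factor(c(1, 1, 1, 2, 3, 3), levels = 1:3),
  year = factor(c(2000, 2001, 2003, 2001, 2000, 2002), levels = 2000:2003)
)

# Every id/year combination, including levels that never occur in the data
grid <- expand.grid(id = levels(obs$id), year = levels(obs$year))

# Flag the combinations that are actually observed
grid$present <- paste(grid$id, grid$year) %in% paste(obs$id, obs$year)

# Number of missing years per id
n_missing <- sapply(split(!grid$present, grid$id), sum)

# Ids ordered from most missing to fewest missing
ord <- names(n_missing)[order(n_missing, decreasing = TRUE)]
grid$id <- factor(grid$id, levels = ord)
```

The resulting grid can be passed to ggplot(grid, aes(id, year, fill = present)) + geom_tile(alpha = 0.7), with scale_fill_manual(values = c("TRUE" = "blue", "FALSE" = "red")) to get the two colors.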

R - ggplot2 geom_bar() doesn't plot correctly column's values

I am new to R.
I would like to plot the following using ggplot2's geom_bar():
top_r_cuisine <- r_cuisine %>%
group_by(Rcuisine) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
top_n(10)
But when I try to plot this result by:
ggplot(top_r_cuisine, aes(x = Rcuisine)) +
geom_bar()
I get this:
which doesn't represent the values in top_r_cuisine. Why?
EDIT:
I have tried:
c_count=c(23,45,67,43,54)
country=c("america","india","germany","france","italy")
# sample Data frame #
finaldata = data.frame(country,c_count)
ggplot(finaldata, aes(x=country)) +
geom_bar(aes(weight = c_count))
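You need to assign the weights in geom_bar(): by default it counts the rows in each group, so a column of pre-computed counts has to be passed as weight (or plotted with geom_col(), which uses the values directly as bar heights).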
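A short sketch comparing the default row counting with the two ways of using a pre-computed count column. The data frame reuses the made-up counts from the snippet above, and the plot object names are hypothetical.

```r
library(ggplot2)

# Made-up counts standing in for a pre-summarised table such as top_r_cuisine
finaldata <- data.frame(
  country = c("america", "india", "germany", "france", "italy"),
  c_count = c(23, 45, 67, 43, 54)
)

# Default: geom_bar() counts rows, so every bar has height 1 here
p_rows <- ggplot(finaldata, aes(x = country)) +
  geom_bar()

# Option 1: weight each row by its count
p_weight <- ggplot(finaldata, aes(x = country)) +
  geom_bar(aes(weight = c_count))

# Option 2: geom_col() takes the bar heights from the y aesthetic
p_col <- ggplot(finaldata, aes(x = country, y = c_count)) +
  geom_col()
```

For the summarised table in the question, the second option would read ggplot(top_r_cuisine, aes(x = Rcuisine, y = count)) + geom_col().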

r dplyr non standard evaluation - ordering bar plot in a function

I have read http://dplyr.tidyverse.org/articles/programming.html about non standard evaluation in dplyr but still can't get things to work.
plot_column <- "columnA"
raw_data %>%
group_by(.dots = plot_column) %>%
summarise (percentage = mean(columnB)) %>%
filter(percentage > 0) %>%
arrange(percentage) %>%
# mutate(!!plot_column := factor(!!plot_column, !!plot_column))%>%
ggplot() + aes_string(x=plot_column, y="percentage") +
geom_bar(stat="identity", width = 0.5) +
coord_flip()
This works fine when the mutate statement is commented out. However, when I enable it to order the bars by height, only a single bar is returned.
How can I convert the statement above into a function that takes the column as a variable but still plots multiple bars ordered by their size?
An example Dataset could be:
columnA,columnB
a, 1
a, 0.4
a, 0.3
b, 0.5
edit
a sample:
mtcars %>%
group_by(mpg) %>%
summarise (mean_col = mean(cyl)) %>%
filter(mean_col > 0) %>%
arrange(mean_col) %>%
mutate(mpg = factor(mpg, mpg)) %>%
ggplot() + aes(x=mpg, y=mean_col) +
geom_bar(stat="identity") +
coord_flip()
will output an ordered bar chart.
How can I wrap this into a function where the column can be replaced and I get multiple bars?
This works with dplyr 0.7.0 and ggplot2 2.2.1:
rm(list = ls())
library(ggplot2)
library(dplyr)
raw_data <- tibble(columnA = c("a", "a", "b", "b"), columnB = c(1, 0.4, 0.3, 0.5))
plot_col <- function(df, plot_column, val_column){
pc <- enquo(plot_column)
vc <- enquo(val_column)
pc_name <- quo_name(pc) # generate a name from the enquoted statement!
df <- df %>%
group_by(!!pc) %>%
summarise (percentage = mean(!!vc)) %>%
filter(percentage > 0) %>%
arrange(percentage) %>%
mutate(!!pc_name := factor(!!pc, !!pc)) # insert pc_name here!
ggplot(df) + aes_(y = ~percentage, x = substitute(plot_column)) +
geom_bar(stat="identity", width = 0.5) +
coord_flip()
}
plot_col(raw_data, columnA, columnB)
plot_col(mtcars, mpg, cyl)
The problem I ran into was that ggplot2 and dplyr use different kinds of non-standard evaluation. I found the answer in this question: Creating a function using ggplot2.
EDIT: parameterized the value column (e.g. columnB/cyl) and added mtcars example.
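In more recent releases, aes() itself understands tidy evaluation, so the embrace operator {{ }} can replace the enquo()/!! pair and aes_(). A sketch of the same function under that assumption (dplyr 1.0 or later and ggplot2 3.3 or later are assumed; plot_col2 is a hypothetical name, and reorder() is swapped in for the factor() call used above):

```r
library(dplyr)
library(ggplot2)

raw_data <- tibble(columnA = c("a", "a", "b", "b"),
                   columnB = c(1, 0.4, 0.3, 0.5))

plot_col2 <- function(df, plot_column, val_column) {
  # Mean of the value column per group, keeping only positive means
  summarised <- df %>%
    group_by({{ plot_column }}) %>%
    summarise(percentage = mean({{ val_column }})) %>%
    filter(percentage > 0)

  # reorder() sorts the groups by their mean so the bars are ordered by size
  ggplot(summarised,
         aes(x = reorder({{ plot_column }}, percentage), y = percentage)) +
    geom_col(width = 0.5) +
    coord_flip() +
    labs(x = NULL)
}

p <- plot_col2(raw_data, columnA, columnB)
p_cars <- plot_col2(mtcars, mpg, cyl)
```

Column names are passed unquoted, as in the original function, and print(p) draws the chart.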

Plot MNIST digits with ggplot2

I want to plot the MNIST digits using ggplot2.
I tried this but I'm getting the numbers rotated 90 degrees. The code below is to plot the 2nd number in the dataset which corresponds to a 2.
trainData = read.csv(file = url("https://drive.google.com/uc?export=download&id=0B4Tqe9kUUfrBSllGY29pWmdGQUE"))
df = expand.grid(y = 0:27, x = 0:27)
df$col = unlist(trainData[2, -c(1,2)])
ggplot(df, aes(x, y)) + geom_tile(aes(fill = col))
If possible, please take into account that I plan to expand this to plotting a matrix of numbers using facet_grid or facet_wrap. I want to end up with a function that takes a vector of row numbers, gets those rows from the dataset, and creates a matrix of plots (one for each number).
Thanks!
mnist is a built-in dataset in the keras package.
Here is an example plot using ggplot2 and tidyverse functions.
To make geom_tile work, we need to reshape the data into long format first.
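library(keras)
library(dplyr)
library(tibble)
library(tidyr)
library(stringr)
library(ggplot2)
mnist <- keras::dataset_mnist()
# pick one random image from the first 100 test digits (a 28 x 28 matrix)
mnist$test$x[sample(1:100,1), 1:28, 1:28] %>%
as.data.frame() %>% # columns are named V1..V28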
rownames_to_column(var = 'y') %>%
gather(x, val, V1:V28) %>%
mutate(x = str_replace(x, 'V', '')) %>%
mutate(x = as.numeric(x),
y = as.numeric(y)) %>%
mutate(y = 28-y) %>%
ggplot(aes(x, y))+
geom_tile(aes(fill = val+1))+
coord_fixed()+
theme_void()+
theme(legend.position="none")
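The step that keeps the digit upright is the flip of the row index: row 1 of the pixel matrix is the top of the image, while the y axis in ggplot2 increases upward. A small base R sketch of the same reshaping on a made-up 3 x 3 "T" shape follows; the matrix m and the name long are illustrative, and nrow(m) + 1 - row is used here so that y runs from 1 to the number of rows.

```r
# Made-up 3 x 3 image: a letter "T" (row 1 is the top of the image)
m <- matrix(c(1, 1, 1,
              0, 1, 0,
              0, 1, 0), nrow = 3, byrow = TRUE)

# Long format: one row per pixel, with the row index flipped so that
# the first matrix row gets the largest y value
long <- data.frame(
  x   = as.vector(col(m)),
  y   = nrow(m) + 1 - as.vector(row(m)),
  val = as.vector(m)
)
```

Passing long to ggplot(long, aes(x, y)) + geom_tile(aes(fill = val)) + coord_fixed() then draws the bar of the "T" at the top.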
