I have data that looks like this:

Accounts        Income    Expense   Benefit
selling food     338.96     43.18    295.78
selling books   2757.70   2341.66    416.04
selling bikes   1369.00   1157.00    212.00
and I would like to get a combined bar plot such as this:
To do that, I wrote this R script:
## Takes a data set and produces a bar chart
## Get rid of the column containing account names to only keep values:
values <- as.matrix(data)[,-1]
## Convert strings to numbers:
values <- apply(values, 2, as.numeric)
## Transpose the matrix:
values <- t(values)
## Vertical axis labels are taken from the first column:
accountNames <- data$Accounts
## The legend is the first row except the first cell:
legend <- tail(names(data), -1)
## Colors are taken from the rainbow palette, one per legend entry:
colors <- rainbow(length(legend))
## Increase left margin to fit horizontal axis labels:
par(mar=c(5,8,4,2)+.1)
## Axis labels are drawn horizontal:
par(las=1)
barplot(
values,
names.arg=accountNames,
col=colors,
beside = TRUE,
legend = legend,
horiz = TRUE
)
I would like to modernize this bar chart with ggplot2, which I use for other graphs in the same document. The documentation I found always assumes data in a very different shape, and I don't know R well enough to work out what to do by myself.
Here are the basics; you can then customize the plot the way you want.
Libraries
library(tidyverse)
Data
data <-
tibble::tribble(
~Accounts, ~Income, ~Expense, ~Benefit,
"selling food", 338.96, 43.18, 295.78,
"selling books", 2757.7, 2341.66, 416.04,
"selling bikes", 1369, 1157, 212
)
Code
data %>%
#Pivot Income, Expense and Benefit
pivot_longer(cols = -Accounts) %>%
#Define each aesthetic
ggplot(aes(x = value, y = Accounts, fill = name))+
# Add geometry column
geom_col(position = position_dodge())
Results
1. Bring your data into long format with pivot_longer.
2. Then plot with geom_bar and use coord_flip.
library(tidyverse)
df %>%
pivot_longer(
cols= -Accounts,
names_to = "Category",
values_to = "values"
) %>%
ggplot(aes(Accounts, y=values, fill = Category)) +
geom_bar(stat="identity", position = "dodge")+
coord_flip() +
theme_classic()
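Both answers rely on the same reshaping step, so it can help to look at the long table on its own before plotting. A minimal sketch, assuming the question's data is rebuilt by hand as a data frame (the object names accounts and accounts_long are made up for this illustration):

```r
library(tidyr)

# The question's table, retyped by hand for illustration
accounts <- data.frame(
  Accounts = c("selling food", "selling books", "selling bikes"),
  Income   = c(338.96, 2757.70, 1369.00),
  Expense  = c(43.18, 2341.66, 1157.00),
  Benefit  = c(295.78, 416.04, 212.00)
)

# One row per (account, category) pair instead of one column per category
accounts_long <- pivot_longer(
  accounts,
  cols      = -Accounts,
  names_to  = "Category",
  values_to = "values"
)

print(accounts_long)
```

Each of the three accounts now appears three times, once per category, which is the shape that fill = Category expects.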
I'm not able to share my data, so sorry for that. Most of my data are either dummy or ordinal or unordered discrete variables. Only age is numeric.
I used this code to see which values are outliers
boxplot(df$var1, plot = TRUE)$out
And this code to count how many outliers there are:
length(boxplot(dataDK$sclmeet)$out)
I replaced the outliers with NAs using the sapply function.
I now want to create either a boxplot or a table that shows how many outliers there are and what their values are. How can I do this?
If you can help with the boxplot method, I can make multiple boxplots and combine them into one figure using par(mfrow = c(,))
The boxplot could look like this, where 1 (blue) is the value of the outlier and 4 (blue) is the number of times that value occurs:
Edit:
I forgot to mention that I know this method:
out <- boxplot.stats(df$var1)$out
boxplot(df$var1,
ylab = "var1",
main = "Boxplot for var1"
)
mtext(paste("Outliers: ", paste(out, collapse = ", ")))
This gives a plot similar to the one below. However, it does not work well when there are many different outliers
(taken from boxplot outlier labels):
These kinds of plots are easier with ggplot, not base R. Have you considered adding a table of your outliers next to your plot? There may be cases where you have different kinds of outliers (and thus your text would be cumbersome). However, if you already know how many outliers you have, you can use annotate to add simple text.
library(tidyverse)
library(cowplot) # to plot stuff side by side
library(gridExtra)
data(iris)
boxplot(iris$Sepal.Width, plot = TRUE)$out
length(boxplot(iris$Sepal.Width)$out)
# https://stackoverflow.com/questions/54993511/how-to-replace-outliers-with-na-in-r-from-vector-created-with-boxplotout
iris$is_outlier <- ifelse(iris$Sepal.Width %in% boxplot.stats(iris$Sepal.Width)$out, 1, 0)
iris <- iris %>%
select(Sepal.Width, is_outlier) %>%
mutate(Sepal.Width_NA = ifelse(is_outlier == 1, NA, Sepal.Width))
t <- iris %>%
filter(is_outlier == 1) %>%
select(Sepal.Width) %>% table() %>% as.data.frame() %>% tableGrob(rows = NULL)
p <- ggplot(iris, aes(y = Sepal.Width_NA)) +
geom_boxplot()
# plot with table side-by-side
plot_grid(p, t, rel_widths = c(2, 1))
# close to your original desired plot
ggplot(iris, aes(y = Sepal.Width_NA)) +
geom_boxplot() +
annotate("label", color = "blue",
size = 4,
x = 0, y = 2,
label = "1 (4)")
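If a table of outliers is the main goal, it can also be built without any plotting. The following is only a sketch, assuming the default boxplot.stats rule (points beyond 1.5 times the interquartile range from the box) is what defines an outlier; the names count_outliers and outlier_table are invented here:

```r
# Reload the untouched built-in data set
data(iris)

# Hypothetical helper: one row per distinct outlier value of one column
count_outliers <- function(df, col) {
  out <- boxplot.stats(df[[col]])$out
  if (length(out) == 0) return(NULL)  # nothing to report for this column
  counts <- table(out)
  data.frame(
    variable = col,
    value    = as.numeric(names(counts)),
    count    = as.integer(counts)
  )
}

# Apply the helper to every numeric column and stack the non-empty results
num_cols <- names(iris)[sapply(iris, is.numeric)]
rows <- lapply(num_cols, function(col) count_outliers(iris, col))
outlier_table <- do.call(rbind, Filter(Negate(is.null), rows))

print(outlier_table)
```

The resulting data frame could then be passed to tableGrob and placed next to the plot in the same way as the table above.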
I have already drawn three plots (p_min, p_max and p_mean) using ggplot with geom_line, geom_ribbon, etc.
I want to merge these plots into a single layout.
p_min, p_max and p_mean must all be placed on the y axis.
The x axis is number (1, 2).
How can I draw several y-axis series, each built from several variables, in one layout?
I think the crux here is that you should combine your data so that each geom only needs to refer to one table, with the characteristic of the source table (e.g. min vs. max vs. mean) made explicit as a variable in that combined table.
Here's a quick function to make some fake data and save it to three tables:
make_fake <- function(a, b, label) {
df <- data.frame(name = "apple", number = 1:5, value = a - b*sqrt(1:5), level = 2)
df$lower = df$value - 0.5; df$upper = df$value + 0.5; df$label = label
df
}
fake_min <- make_fake(3,.1, "min")
fake_max <- make_fake(7,1.5, "max")
fake_mean <- make_fake(5,0.8, "mean")
Plotting them together is simpler once they are combined into one table, which is why the fake data above carries a label variable.
If we use base::rbind, we can append the tables to each other and plot them, distinguishing series by having the color aesthetic connect to the label variable:
library(ggplot2)
ggplot(data = rbind(fake_min, fake_max, fake_mean),
aes(x=number, y=value, group=label))+
geom_line(aes(color=label))+
geom_ribbon(aes(ymin=lower, ymax=upper, fill=label, group=label), alpha=0.3)
Maybe you want a single combined ribbon spanning the highest upper and the lowest lower value. In that case you could precompute those, here using dplyr:
library(dplyr)
fake_ribbon <- rbind(fake_min, fake_max, fake_mean) %>%
group_by(number) %>%
summarize(upper = max(upper),
lower = min(lower))
rbind(fake_min, fake_max, fake_mean) %>%
ggplot(aes(x=number)) +
geom_line(aes(color=label, y=value))+
geom_ribbon(data = fake_ribbon, aes(ymin=lower, ymax=upper), alpha=0.2)
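The fake tables above already contain a label column. When real tables do not, one possible way to add it while combining them is the .id argument of dplyr::bind_rows. A small sketch with made-up tables (tbl_min, tbl_max and tbl_mean are placeholders, not the asker's data):

```r
library(dplyr)

# Made-up stand-ins for three result tables that lack a label column
tbl_min  <- data.frame(number = 1:5, value = 3 - 0.1 * sqrt(1:5))
tbl_max  <- data.frame(number = 1:5, value = 7 - 1.5 * sqrt(1:5))
tbl_mean <- data.frame(number = 1:5, value = 5 - 0.8 * sqrt(1:5))

# The list names become the values of the new "label" column
combined <- bind_rows(
  list(min = tbl_min, max = tbl_max, mean = tbl_mean),
  .id = "label"
)

print(combined)
```

The combined table can then be mapped with color = label or fill = label exactly as in the plots above.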
I have been trying to create a stacked bar chart using the code below, but the resulting plot is wrong. Here are the code and the problem for your reference:
#required packages
require(ggplot2)
require(dplyr)
require(tidyr)
#the data frame
myData <- data.frame(
a = c(70,113),
b = c(243, 238),
c = c(353, 219),
d = c(266, 148),
Gender = c("Male","Female"))
myData <- gather(myData,Age,Value,a:d)
myData <- group_by(myData,Gender) %>% mutate(pos = cumsum(Value) - (0.5 * Value))
# plot bars and add text
p <- ggplot(myData, aes(x = Gender, y = Value)) + geom_bar(aes(fill = Age),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 4)
p
This code produces the following plot:
In this figure the "Female" bar is fine. In the "Male" bar, however, the two values 70 and 243 sit in the same box, and the topmost segment is empty. The order of the four groups is correct.
Why am I getting this, and how can I correct the figure?
Notice how the numbers aren't in the matching colors? By default the bars are stacked from top to bottom, in the order of the factor levels of the fill variable, whereas your cumsum accumulates from the bottom up. To change the order in which Age is drawn, reverse the levels of Age:
myData <- gather(myData,Age,Value,a:d)
myData <- group_by(myData,Gender) %>%
mutate(pos = cumsum(Value) - (0.5 * Value),
Age=forcats::fct_rev(factor(Age)))
Then you will get the ordering of your bars that matches the cumsum that you calculated.
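Another option is to let ggplot2 place the labels, so the label positions cannot drift out of step with the stacking order. This is a sketch rather than the method of the answer above: it assumes a ggplot2 version that provides position_stack(vjust = ...), and it starts again from the wide data frame in the question (named wide here to avoid overwriting myData):

```r
library(ggplot2)
library(tidyr)

# The question's data in its original wide shape
wide <- data.frame(
  a = c(70, 113), b = c(243, 238), c = c(353, 219), d = c(266, 148),
  Gender = c("Male", "Female")
)
long <- pivot_longer(wide, cols = a:d, names_to = "Age", values_to = "Value")

# fill is set globally so bars and labels are grouped and stacked identically;
# vjust = 0.5 centres each label inside its own segment
p <- ggplot(long, aes(x = Gender, y = Value, fill = Age)) +
  geom_col() +
  geom_text(aes(label = Value),
            position = position_stack(vjust = 0.5), size = 4)
p
```

With this approach no pos column is needed, and reordering the levels of Age moves the bars and their labels together.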
I am new to using R and ggplot2 for my data analysis, and I am trying to figure out how to get my data into the format ggplot2 expects. The data is a set of values for 5 different categories, and I want to make a stacked bar graph in which each bar is split into 3 sections based on the value, e.g. small, medium, and large values based on arbitrary cutoffs. This is similar to the 100% stacked bar graph in Excel, where the proportions of all the values add up to 1 (on the y axis). There is a fair amount of data (~1500 observations), in case that matters.
Here is a sample of what the data looks like (it has approx. 1000 observations in each column). I included an Excel screenshot because I don't know whether the output below worked.
dput(sample-data)
This sort of problem is usually a data reformatting problem. See "Reshaping data.frame from wide to long format".
The following code uses built-in data set iris, with 4 numeric columns, to plot a bar graph with the data values cut into levels after reshaping the data.
I have chosen cutoff points 0.2 and 0.7, but any other numbers in (0, 1) will do. The cutoff vector is brks and the level names are in labls.
library(tidyverse)
data(iris)
brks <- c(0, 0.2, 0.7, 1)
labls <- c('Small', 'Medium', 'Large')
iris[-5] %>%
pivot_longer(
cols = everything(),
names_to = 'Category',
values_to = 'Value'
) %>%
group_by(Category) %>%
mutate(Value = (Value - min(Value))/diff(range(Value)),
Level = cut(Value, breaks = brks, labels = labls,
include.lowest = TRUE, ordered_result = TRUE)) %>%
ggplot(aes(Category, fill = Level)) +
geom_bar(stat = 'count', position = position_fill()) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
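To see what the rescaling and cut steps do to individual values, here is a small base R sketch on a made-up vector (x is invented for the illustration; brks and labls are the ones used above):

```r
brks  <- c(0, 0.2, 0.7, 1)
labls <- c("Small", "Medium", "Large")

# Made-up raw values; min-max rescaling maps them onto [0, 1]
x <- c(10, 11, 15, 18, 20)
scaled <- (x - min(x)) / diff(range(x))

# Intervals are closed on the right; include.lowest = TRUE keeps the
# minimum (which rescales to 0) inside the first level
level <- cut(scaled, breaks = brks, labels = labls,
             include.lowest = TRUE, ordered_result = TRUE)

print(data.frame(x, scaled, level))
```

In the plot, position_fill then turns the counts per level into proportions within each category.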
Here's a solution requiring no data reformatting.
The diamonds dataset comes with ggplot2. Column "color" is categorical, column "price" is numeric:
library(ggplot2)
ggplot(diamonds) +
geom_bar(aes(x = color, fill = cut(price, 3, labels = c("low", "mid", "high"))),
position = "fill") +
labs(fill = "price")
I need your help.
I am trying to make a stacked bar plot in R and have not succeeded so far. I have read several posts, but without success.
As I am a newbie, this is the chart I want (I made it in Excel):
And this is how my data is laid out:
Thank you in advance.
I would use the ggplot2 package to create this plot, as it makes positioning text labels easier than the base graphics package does:
# First we create a dataframe using the data taken from your excel sheet:
myData <- data.frame(
Q_students = c(1000,1100),
Students_with_activity = c(950, 10000),
Average_debt_per_student = c(800, 850),
Week = c(1,2))
# The data in the dataframe above is in 'wide' format, to use ggplot
# we need to use the tidyr package to convert it to 'long' format.
library(tidyr)
myData <- gather(myData,
Condition,
Value,
Q_students:Average_debt_per_student)
# To add the text labels we calculate the midpoint of each bar and
# add this as a column to our dataframe using the package dplyr:
library(dplyr)
myData <- group_by(myData,Week) %>%
mutate(pos = cumsum(Value) - (0.5 * Value))
#We pass the dataframe to ggplot2 and then add the text labels using the positions which
#we calculated above to place the labels correctly halfway down each
#column using geom_text.
library(ggplot2)
# plot bars and add text
p <- ggplot(myData, aes(x = Week, y = Value)) +
geom_bar(aes(fill = Condition),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 3)
#Add title
p <- p + ggtitle("My Plot")
#Plot p
p
so <- data.frame(week1 = c(1000,950,800), week2 = c(1100,10000,850), row.names = c("Q students","students with Activity","average debt per student"))
barplot(as.matrix(so))