I have the following:
> ArkHouse2014 <- read.csv(file="C:/Rwork/ar14.csv", header=TRUE, sep=",")
> ArkHouse2014
DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349
What I would like to do is make a barplot (or series of barplots) to compare the totals in the second and third columns on the y-axis while the x-axis would display the information in the first column.
It seems like this should be very easy to do, but most of the information on making barplots that I can find has you make a table from the data and then barplot that, e.g.,
> table(ArkHouse2014$GOP)
2,936 3,258 3,508 3,573 3,581 3,588 3,638 3,830 3,899 3,951 4,133 4,166 4,319 4,330 4,345 4,391 4,396 4,588
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4,969 5,130 5,177 5,343 5,425 5,466 5,710 5,991 6,070 6,100 6,234 6,490 6,550 6,980 7,847 8,846
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I don't want the counts of how many have each total, I'd like to just represent the quantities visually. I feel pretty stupid not being able to figure this out, so thanks in advance for any advice you have to offer me.
Here's an option using libraries reshape2 and ggplot2:
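I first read your data. The commas in the vote totals are thousands separators, so I strip them and convert the columns to numeric (dec = "," would instead read 3,951 as 3.951):
df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349")
df[c("GOP", "DEM")] <- lapply(df[c("GOP", "DEM")], function(x) as.numeric(gsub(",", "", x)))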
Then reshape it to long format:
library(reshape2)
df_long <- melt(df, id.vars = "DISTRICT")
Then create a barplot using ggplot:
library(ggplot2)
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge")
or if you want the bars stacked:
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity")
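If you would rather stay in base R, as the barplot attempt in the question suggests, a minimal sketch could look like the following. It assumes the vote totals arrive as text with comma thousands separators, so they are stripped with gsub first; the object names votes and vote_matrix are made up for this illustration.

```r
# Sample data from the question; the comma-formatted totals are read as text
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349")

# Drop the thousands separators and convert to numbers (7 x 2 matrix)
votes <- sapply(df[c("GOP", "DEM")], function(x) as.numeric(gsub(",", "", x)))

# barplot() wants one column per group of bars, so transpose to 2 x 7
vote_matrix <- t(votes)
colnames(vote_matrix) <- df$DISTRICT

# beside = TRUE draws GOP and DEM side by side for each district
barplot(vote_matrix, beside = TRUE, legend.text = TRUE, las = 2, ylab = "Votes")
```

Setting beside = FALSE (the default) would stack the two parties instead.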
I have a data frame with multiple columns. I want to create two boxplots, one for each of the professions "secretary" and "driver", but the result is not satisfying, as the picture shows. This is my data and my code:
profession ve.count.descrition euse.count.description Qualitative.result
secretary 0 1 -0.5
secretary 0 2 1
driver 1 1 -1
driver 0 2 0.3
data %>%
mutate(Qualitative.result = factor(Qualitative.result)) %>%
ggplot(aes(x = Profession , fill = Qualitative.result)) +
geom_boxplot()
You should not convert Qualitative.result to a factor; a boxplot needs a numeric variable on the y-axis. Maybe you want something like this:
library(tidyverse)
data %>%
ggplot(aes(x = Profession, y = Qualitative.result, fill = Profession)) +
geom_boxplot()
Output: (image not shown)
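A self-contained sketch of the same idea, using only the four sample rows from the question (so each box is drawn from just two values). The column is named Profession here to match the plotting code; that capitalization is an assumption, since the printed data shows profession.

```r
library(ggplot2)

# Sample rows from the question; only the two columns used in the plot
data <- data.frame(
  Profession = c("secretary", "secretary", "driver", "driver"),
  Qualitative.result = c(-0.5, 1, -1, 0.3)
)

# Qualitative.result stays numeric, so it can be mapped to the y aesthetic
p <- ggplot(data, aes(x = Profession, y = Qualitative.result, fill = Profession)) +
  geom_boxplot()
print(p)
```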
I have a dataset about accidents in the UK. Among other variables it contains the month of the accident and the severity (ranging from 1 to 3). Thus, you can imagine the dataset like this:
ID Month Accident_Severity
1 01 3
2 01 2
3 04 1
4 07 2
I would like to produce a bar chart with the months on the x-axis and, on the y-axis, the share of accidents of a given severity class that happened in that month. This means each month should have three bars, let's say red, blue and green. The shares shown by all bars of one color should sum to 100%. For example, if blue means Accident_Severity = 2 and the blue bar indicates 10% for January, then 10% of all accidents with a severity of 2 happened in January.
I managed to get these numbers as a table doing the following:
pivot_rel <- df %>%
select(month, Accident_Severity) %>%
group_by(month) %>%
table()
for (i in c(1,2,3)) {
for (j in seq(1,12)) {
pivot_rel[j,i] <- round(pivot_rel[j,i]/sum_severity[i],3)
}
}
pivot_rel
However, I cannot use the object with ggplot. When I try, I get the error: "Error: data must be a data frame, or other object coercible by fortify(), not an S3 object with class table"
How do I visualize this table or is there an easier way to do what I try to achieve? Many Thanks!
Use xtabs to table the data and colSums to get the proportions. Then, with packages ggplot2 and scales, plot the graph.
library(ggplot2)
library(scales)
tbl <- xtabs( ~ Month + Accident_Severity, df1)
t(tbl)/colSums(tbl)
# Month
#Accident_Severity 1 4 7
# 1 0.0 1.0 0.0
# 2 0.5 0.0 0.5
# 3 1.0 0.0 0.0
as.data.frame(t(tbl)/colSums(tbl)) |>
ggplot(aes(factor(Month), Freq, fill = factor(Accident_Severity))) +
geom_col(position = position_dodge()) +
scale_fill_manual(values = c("red", "green", "blue")) +
scale_y_continuous(labels = percent_format()) +
xlab("Month") +
guides(fill = guide_legend(title = "Accident Severity"))
Data
df1 <- read.table(text = "
ID Month Accident_Severity
1 01 3
2 01 2
3 04 1
4 07 2
", header = TRUE)
A simple fix would be to convert the table to a data frame, which can be used with ggplot.
pivot_rel <- as.data.frame.matrix(pivot_rel)
However, you might also go a step back and use count instead of table to generate the frequency counts of month and Accident_Severity.
library(dplyr)
pivot_rel <- df %>% count(month, Accident_Severity)
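count alone returns raw frequencies, so the shares within each severity class still have to be computed. One way to sketch the remaining steps, assuming the sample data df1 from the other answer and its column name Month (the question's own code uses month); the names shares and share are made up here:

```r
library(dplyr)
library(ggplot2)

# Sample data, as posted in the other answer
df1 <- read.table(text = "
ID Month Accident_Severity
1 01 3
2 01 2
3 04 1
4 07 2
", header = TRUE)

shares <- df1 %>%
  count(Month, Accident_Severity) %>%   # frequency per month and severity
  group_by(Accident_Severity) %>%       # normalise within each severity class
  mutate(share = n / sum(n)) %>%
  ungroup()

p <- ggplot(shares, aes(factor(Month), share, fill = factor(Accident_Severity))) +
  geom_col(position = position_dodge()) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(x = "Month", y = "Share within severity class", fill = "Accident severity")
print(p)
```

Because the grouping is by Accident_Severity, the bars of one color add up to 100% across months, which is what the question asks for.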
Using proportions on xtabs and base barplot.
proportions(xtabs( ~ Month + Accident_Severity, d), margin=2) |>
as.data.frame() |>
with(barplot(Freq ~ Accident_Severity + Month, beside=T, col=2:4,
main='Relative Frequencies',
legend.text=sort(unique(d$Accident_Severity)),
args.legend=list(title='Accident_Severity')))
Data:
d <- read.table(header=T, text='
ID Month Accident_Severity
1 01 3
2 01 2
3 04 1
4 07 2')
I have an existing ggplot with geom_col and some observations from a dataframe. The dataframe looks something like :
over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0
The geom_col represents the runs data column and now I want to represent the wickets column using geom_point in a way that the number of points represents the wickets.
I want my graph to look something like this :
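As far as I know, we'll need to transform your data to have one row per point. This method requires dplyr version 1.0 or later, which allows summarize to return more than one row per group (from dplyr 1.1 on, reframe is the recommended function for this and summarize gives a deprecation warning).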
You can adjust the spacing of the wickets by multiplying seq(wickets), though with your sample data a spacing of 1 unit looks pretty good to me.
library(dplyr)
library(ggplot2)
wicket_data = dd %>%
filter(wickets > 0) %>%
group_by(over) %>%
summarize(wicket_y = runs + seq(wickets))
ggplot(dd, aes(x = over)) +
geom_col(aes(y = runs), fill = "#A6C6FF") +
geom_point(data = wicket_data, aes(y = wicket_y), color = "firebrick4") +
theme_bw()
Using this sample data:
dd = read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = T)
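If you prefer not to depend on a particular dplyr version, the same one-row-per-wicket table can be sketched in base R with rep and sequence. This is an alternative to the summarize step, not part of the original answer; it produces the same over and wicket_y columns, so it can be passed to geom_point in the same way.

```r
dd <- read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = TRUE)

wicket_data <- data.frame(
  # Repeat each over once per wicket; overs with 0 wickets drop out
  over = rep(dd$over, times = dd$wickets),
  # Stack the points 1, 2, ... units above the top of the bar
  wicket_y = rep(dd$runs, times = dd$wickets) + sequence(dd$wickets)
)
wicket_data
```

As in the answer, multiplying the sequence(dd$wickets) term by a constant changes the vertical spacing between the points.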
I am working with longitudinal data and assessing the utilization of a policy over 13 months. In order to get some barplots with the different months on my x-axis, I converted my data from wide format to long format.
So now, my dataset looks like this
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
I thought that after reshaping I could easily use my newly created "month" variable as a factor and plot some graphs. However, it does not work, and it tells me it's a list or an atomic vector. Transforming it into a factor did not work either, and I really need it as a factor.
Does anybody know how to turn it into a factor?
Thank you very much for your help!
EDIT.
The OP's graph code was posted in a comment. Here it is.
library(ggplot2)
ggplot(data, aes(x = hours, y = month)) + geom_density() + labs(title = 'Distribution of hours')
# Loading ggplot2
library(ggplot2)
# Placing example in dataframe
data <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
# Converting month to factor (13 levels, since the study covers 13 months)
data$month <- factor(data$month, levels = 1:13)
# Plotting grouping by id
ggplot(data, aes(x = month, y = hours, group = id, color = factor(id))) + geom_line()
# Plotting hour density by month
ggplot(data, aes(hours, color = month)) + geom_density()
The problem seems to be in the aes. geom_density only needs an x value; a y value doesn't make sense here. You want the density of the x values, so the vertical axis shows that density, not some other variable from the dataset.
First, read in the data.
Indirekte_long <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
Now graph it.
library(ggplot2)
g <- ggplot(Indirekte_long, aes(hours))
g + geom_density() + labs(title = 'Distribution of hours')
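Since the original goal was a bar plot with the months on the x-axis, here is a minimal sketch of that, assuming the mean hours per month is the quantity of interest (the question does not say which summary is wanted). It reuses the column names from the sample data; monthly_mean is a name made up for this example.

```r
library(ggplot2)

Indirekte_long <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)

# Treat month as a categorical variable
Indirekte_long$month <- factor(Indirekte_long$month)

# Mean hours per month (an assumed summary, see above)
monthly_mean <- aggregate(hours ~ month, data = Indirekte_long, FUN = mean)

p <- ggplot(monthly_mean, aes(x = month, y = hours)) +
  geom_col() +
  labs(title = "Mean hours per month")
print(p)
```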
I want to compare the pageviews and sessions in each month.
So, I have a data frame like the one below:
month time_interval sessions pageviews
1 < 10s 564622 577686
1 11 ~ 30 36575 84314
1 31~60 46547 127134
1 61~180 106056 408649
1 181~600 125891 839148
1 601~1800 99293 1143019
1 >1801 38534 1014548
2 < 10s 553552 566598
2 11 ~ 30 35440 82011
2 31~60 45558 124921
2 61~180 101529 390493
2 181~600 123027 820094
2 601~1800 98427 1137857
2 >1801 39178 1068057
3 < 10s 690598 706859
3 11 ~ 30 44409 102951
3 31~60 56585 156536
3 61~180 126382 492019
3 181~600 150267 1011472
3 601~1800 118928 1351807
3 >1801 45465 1195310
....
Now I want to draw a dodged bar chart. Here is my code:
ggplot(data=mydata, aes(x=month,y=pageviews,fill=time_interval)) + geom_bar(stat="identity",position="dodge", colour="black")
I don't know why I get a weird graph (image not shown).
Your pageviews data is a factor. It will work if you transform it to a numeric variable. Furthermore, you should reorder the levels of the factor time_interval:
mydata <- transform(mydata,
time_interval = factor(time_interval, levels =
c('< 10s','11 ~ 30','31~60', '61~180','181~600', '601~1800','>1801')),
pageviews = as.numeric(as.character(pageviews)))
The plot:
library(ggplot2)
ggplot(data = mydata, aes(x = month, y = pageviews, fill = time_interval)) +
geom_bar(stat = "identity", position = "dodge", colour = "black")
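The as.character step in the conversion matters: calling as.numeric directly on a factor returns the internal level codes, not the numbers printed on screen. A small illustration with three of the pageviews values from the question (the names f, codes and values are arbitrary):

```r
# Three pageviews values stored as a factor, as in the question's data
f <- factor(c("577686", "84314", "127134"))

# as.numeric() on a factor gives the integer level codes (1, 2, 3 in some order)
codes <- as.numeric(f)

# Going through character first recovers the actual numbers
values <- as.numeric(as.character(f))

codes
values
```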