How to plot using ggplot2 - r

I have a task where I need to plot a graph using ggplot2.
I have a vector of ratings (Samsung S4 ratings from its users).
I generate the data like this:
TestRate <- data.frame(rating = sample(x = 1:5, size = 100, replace = TRUE), month = sample(x = 1:12, size = 100, replace = TRUE))
Now I need to plot a graph with dates on the X axis (months in this example data) and 5 different lines, one per rating (1, 2, 3, 4, 5). Each line should show the count of that rating for the corresponding month.
How can I plot this in ggplot2?

First you need to count the number of rows for each (month, rating) pair:
library(data.table)
library(ggplot2)
setDT(TestRate)[, count := .N, by = list(month, rating)]
Then you can plot the result:
ggplot(TestRate, aes(month, count, color = as.factor(rating))) + geom_line()

If your data.table is not set (so to speak), you can use dplyr (and rename the legend while you are at it).
library(dplyr)
library(ggplot2)
df <- TestRate %>% group_by(rating, month) %>% summarise(count = n())
ggplot(df, aes(x = month, y = count, color = as.factor(rating))) + geom_line() + labs(color = "Rating")
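Both answers above draw a line only between (rating, month) pairs that actually occur in the sample, so a month with no ratings of a given value is skipped rather than shown as zero. Here is a minimal sketch of one way to fill those gaps, assuming tidyr is available; the fixed seed and the `counts` name are assumptions added for illustration and are not part of the original question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
TestRate <- data.frame(
  rating = sample(1:5, size = 100, replace = TRUE),
  month  = sample(1:12, size = 100, replace = TRUE)
)

# Count rows per (rating, month), then add the missing pairs with a count of 0
counts <- TestRate %>%
  count(rating, month, name = "count") %>%
  complete(rating = 1:5, month = 1:12, fill = list(count = 0))

p <- ggplot(counts, aes(x = month, y = count, colour = factor(rating))) +
  geom_line() +
  scale_x_continuous(breaks = 1:12) +
  labs(colour = "Rating")
p
```

Whether a missing pair should be drawn as zero or left as a gap depends on the data; for counts, zero is usually the honest value.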

Related

geom_bar not showing counts properly? (R)

I'm trying to make a bar graph with ten variables, and when I run my code I get a weird graph that just shows every bar with a value of 1.00. I'm not looking for frequencies, I'm looking for the counts that are already in my data frame. Here is my code so far.
library(dplyr)
library(tidyverse)
path <- file.path("~", "Desktop", "Police_Use_of_Force.csv")
invisible(Force <- read.csv(path, stringsAsFactors = FALSE))
invisible(ProblemDf <- Force %>%
select(Problem))
ProblemDf[ProblemDf==""] <- NA
hi <- tibble(ProblemDf[rowSums(is.na(ProblemDf)) != ncol(ProblemDf), ])
names(hi) = "Problem"
topTen <- hi %>%
count(Problem) %>%
arrange(desc(n)) %>%
top_n(10, n)
ggplot(topTen, aes(y = Problem)) + geom_bar()
And here is the graph that it produces:
[Image: bar graph]
geom_bar() is essentially a univariate plot: it automatically counts the number of times each value appears for you. For example:
ggplot(data.frame(vals=c("a","a","a","z","z")), aes(y=vals)) + geom_bar()
However, in your case you have already calculated the counts, so you are really making a bivariate plot. The correct geom for that is geom_col(), and you need to tell ggplot which column contains the counts. Use:
ggplot(topTen, aes(y = Problem, x=n)) + geom_col()
ggplot(data.frame(vals=c("a","z"), n=c(3,2)), aes(y=vals, x=n)) + geom_col()
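To make the difference concrete, here is a small sketch that builds the same bar chart both ways from the toy data used above. The names `raw`, `counted`, `p_bar`, and `p_col` are made up for this illustration.

```r
library(ggplot2)

# Raw observations: one row per occurrence
raw <- data.frame(vals = c("a", "a", "a", "z", "z"))

# Pre-computed counts: one row per distinct value
counted <- as.data.frame(table(vals = raw$vals), responseName = "n")

# geom_bar() counts the rows itself, so it only needs the category
p_bar <- ggplot(raw, aes(y = vals)) + geom_bar()

# geom_col() draws the values it is given, so it needs the count column too
p_col <- ggplot(counted, aes(y = vals, x = n)) + geom_col()

p_bar
p_col
```

Both plots should show the same bar lengths. Passing the pre-counted table to geom_bar() instead would count one row per category, which is why every bar in the question came out as 1.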

Plot multicolor vertical lines by using ggplot to show average time taken for each type as facet. Each type will have different vertical lines

I want to plot a chart in R that shows vertical lines for each type in a facet.
df is a data frame in which each row gives the time in minutes that a person (X, Y, or Z) takes to get from A to B, and so on.
I have tried the code below but was not able to get the result.
df<-data.frame(type =c("X","Y","Z"), "A_to_B"= c(20,56,57), "B_to_C"= c(10,35,50), "C_to_D"= c(53,20,58))
ggplot(df, aes(x = 1,y = df$type)) + geom_line() + facet_grid(type~.)
I have attached an image from Excel showing the desired output, but I need only vertical lines where the segments join instead of the entire horizontal bar.
I would not use facets in your case, because there are only 3 variables.
So, to get a similar plot in R using ggplot2, you first need to reshape the data frame using gather() from tidyr (part of the tidyverse). Then it's in long or tidy format.
To my knowledge, there is no geom that does what you want in standard ggplot2, so some fiddling is necessary.
However, it's possible to produce the plot using geom_segment() and cumsum():
library(tidyverse)
# First reformat and calculate cumulative sums by type.
# This works because the route names begin with A, B, C
# and are thus ordered correctly.
df <- df %>%
  gather(-type, key = "route", value = "time") %>%
  group_by(type) %>%
  mutate(cummulative_time = cumsum(time)) %>%
  ungroup()
segment_length <- 0.2
df %>%
  mutate(route = fct_rev(route)) %>%
  ggplot(aes(color = route)) +
  # factor() first: as.numeric() on a character column would return NA
  geom_segment(aes(x = as.numeric(factor(type)) + segment_length, xend = as.numeric(factor(type)) - segment_length, y = cummulative_time, yend = cummulative_time)) +
  # positions 1, 2, 3 correspond to the factor levels X, Y, Z
  scale_x_discrete(limits = c("1", "2", "3"), labels = c("X", "Y", "Z")) +
  coord_flip() +
  ylim(0, max(df$cummulative_time)) +
  labs(x = "type")
EDIT
This solution works because it assigns the labels X, Y, Z manually in scale_x_discrete. Be careful to assign the correct labels!
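As a variation on the answer above, the manual label assignment can be avoided by mapping type directly to a discrete axis and drawing each join as a "|" point character. This is a sketch of a different technique (geom_point with a character shape), not the answer's geom_segment approach; the names `long` and `cumulative_time` and the point size are assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  type   = c("X", "Y", "Z"),
  A_to_B = c(20, 56, 57),
  B_to_C = c(10, 35, 50),
  C_to_D = c(53, 20, 58)
)

# Long format, with the running total of minutes within each type
long <- df %>%
  pivot_longer(-type, names_to = "route", values_to = "time") %>%
  group_by(type) %>%
  mutate(cumulative_time = cumsum(time)) %>%
  ungroup()

# type stays a discrete variable, so ggplot labels the axis itself
p <- ggplot(long, aes(x = cumulative_time, y = type, colour = route)) +
  geom_point(shape = "|", size = 10) +
  expand_limits(x = 0) +
  labs(x = "cumulative time (minutes)", y = "type")
p
```

The marks are sized in points rather than in data units, so their height does not scale with the plot the way a segment would.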

Reordering categories in stacked bar chart based on count

I want to produce a stacked bar chart in ggplot2 where the bars in the stack are ordered according to the count of that category. When I attempt this using the code below, ggplot2 appears to arrange the bars in the stack in alphabetical order. Other answers on Stack Overflow suggest that ggplot2 orders the bars according to the order in which R reads the data. However, in the 'a' data frame the Appliance column is in the order 'Radio', 'Laptop', 'TV', 'Fridge' (the first 4 rows), which isn't how it is shown in the graph either.
library(ggplot2)
library(tidyr)
#some data
SalesData<-data.frame(Appliance=c("Radio", "Laptop", "TV", "Fridge"), ThisYear=c(5,25,5,8), LastYear=c(6,20,5,8))
#transform the data into 'long format' for ggplot2
a<- gather(SalesData, Sales, Total, ThisYear, LastYear)
#Produce the bar chart
p<-ggplot(a, aes(fill=Appliance, y=Total, x=Sales)) +
geom_bar( stat="identity")
p
What I want to happen is for the largest counts to be at the bottom of the graph, so I need a way to order the data in this way. So in this example it would be 'Laptop' at the bottom, then 'Fridge', 'Radio' and 'TV' and for the legend to match this order.
Does anyone have any suggestions?
You need to reorder the factor levels before you plot the stacked bar chart. For this, there are several possibilities:
With base R
order_appliance <- unique(a$Appliance[order(a$Total)])
a$Appliance <- factor(a$Appliance, levels = order_appliance)
With dplyr
library(dplyr)
a <- a %>%
arrange(Total) %>%
mutate(Appliance = factor(Appliance, levels = unique(Appliance)))
With forcats
library(forcats)
a$Appliance <- fct_reorder(a$Appliance, a$Total)
For the plot you can use `geom_col` instead of `geom_bar(stat = "identity")`:
ggplot(a, aes(fill = Appliance, y = Total, x = Sales)) +
geom_col()
geom_bar uses factor levels to order the stacks. You can see the levels present in your data with levels(factor(a$Appliance)). By default, these levels are sorted in alphabetical order. However, you can manually set the order of the levels as follows:
a$Appliance = factor(a$Appliance, levels=c("TV", "Radio", "Fridge", "Laptop"))
If you do this before creating your ggplot, you will have your desired order.
We could reorder the factor levels by each appliance's total across both years, then plot:
library(dplyr)
# order appliance names by row sums (smallest total first)
myFac <- SalesData$Appliance[order(rowSums(SalesData[, 2:3]))]
# wide-to-long, then set the factor levels in that order
a <- gather(SalesData, Sales, Total, ThisYear, LastYear) %>%
  mutate(Appliance = factor(Appliance, levels = myFac))
# then plot
ggplot(a, aes(fill = Appliance, y = Total, x = Sales)) +
geom_bar(stat = "identity")
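The forcats approach can also order by the total across both years rather than the default median, which combines the ideas from the answers above. A minimal sketch, assuming forcats is installed; it relies on the `.fun` argument of fct_reorder().

```r
library(tidyr)
library(forcats)
library(ggplot2)

SalesData <- data.frame(
  Appliance = c("Radio", "Laptop", "TV", "Fridge"),
  ThisYear  = c(5, 25, 5, 8),
  LastYear  = c(6, 20, 5, 8)
)
a <- gather(SalesData, Sales, Total, ThisYear, LastYear)

# Reorder levels by each appliance's summed Total (smallest first)
a$Appliance <- fct_reorder(a$Appliance, a$Total, .fun = sum)
levels(a$Appliance)
# "TV" "Radio" "Fridge" "Laptop"

p <- ggplot(a, aes(fill = Appliance, y = Total, x = Sales)) +
  geom_col()
p
```

With current ggplot2 stacking, the last level ends up at the bottom of the stack, so 'Laptop' sits at the bottom and the legend follows the same order.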

How to use ggplot to create facets with two factors?

I'm trying to do a plot with facets with some data from a previous model. As a simple example:
t=1:10;
x1=t^2;
x2=sqrt(t);
y1=sin(t);
y2=cos(t);
How can I plot this data in a 2x2 grid, where the rows are one factor (levels x and y, plotted with different colors) and the columns are another factor (levels 1 and 2, plotted with different linetypes)?
Note: t is the common variable for the X axis of all subplots.
ggplot will be more helpful if the data can be first put into tidy form. df is your data, df_tidy is that data in tidy form, where the series is identified in one column that can be mapped in ggplot -- in this case to the facet.
library(tidyverse)
df <- tibble(
t=1:10,
x1=t^2,
x2=sqrt(t),
y1=sin(t),
y2=cos(t),
)
df_tidy <- df %>%
gather(series, value, -t)
ggplot(df_tidy, aes(t, value)) +
geom_line() +
facet_wrap(~series, scales = "free_y")
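The facet_wrap() call above lays the four series out by name. To get rows and columns driven by two separate factors, as the question asks, the series name can be split into its letter and its digit and passed to facet_grid(). This is a sketch under the assumption that every series name is exactly one letter followed by one digit; the column names `variable` and `index` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- tibble::tibble(
  t  = 1:10,
  x1 = t^2,
  x2 = sqrt(t),
  y1 = sin(t),
  y2 = cos(t)
)

df_tidy <- df %>%
  pivot_longer(-t, names_to = "series", values_to = "value") %>%
  # split "x1" into variable = "x" and index = "1" after the first character
  separate(series, into = c("variable", "index"), sep = 1)

p <- ggplot(df_tidy, aes(t, value, colour = variable, linetype = index)) +
  geom_line() +
  facet_grid(variable ~ index, scales = "free_y")
p
```

Note that with facet_grid(), `scales = "free_y"` frees the y axis per row, not per panel, so x1 and x2 share one y scale.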

Plotting columns as series

I have a data frame which has 12 columns (one for each month of the year) and an id. Each record in this data frame corresponds to the transaction amounts (in dollars) a customer has made over the course of the last twelve months. I want to plot these columns as series, and I want to plot all the customers in the data frame. The x-axis will be the month index and the y-axis will be the dollar value. So basically for each customer I need a line or series chart on the same graph.
Code for generating random data
a <- data.frame(id = seq(1,1000,1))
b <- data.frame(replicate(12,sample(1000:100000,1000,rep=TRUE)))
df <- cbind(a,b)
This is what I tried, but it's not what I want:
library(reshape2)
library(ggplot2)
df_lg <- melt(df, id = 'id') # convert from wide to tall
ggplot(data=df_lg,
aes(x=variable, y=value, colour=variable)) +
geom_line()
Any ideas how to do this?
Just add group to your aesthetics, so the colour and group should be the id variable you want in the legend.
ggplot(data=df_lg,
aes(x=variable, y=value, colour=id, group = id)) +
geom_line()
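With 1000 customers the lines overlap heavily, and the x axis shows the column names X1 to X12 rather than a month index. Below is a sketch of one way to handle both, using tidyr instead of reshape2. The reduced customer count, the seed, the `amount` and `month` column names, and the alpha value are assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)        # assumption: fixed seed for reproducibility
n_customers <- 50  # assumption: fewer customers than the question, for readability

a  <- data.frame(id = seq_len(n_customers))
b  <- data.frame(replicate(12, sample(1000:100000, n_customers, replace = TRUE)))
df <- cbind(a, b)  # columns: id, X1, ..., X12

df_long <- df %>%
  pivot_longer(-id, names_to = "month", names_prefix = "X", values_to = "amount") %>%
  mutate(month = as.integer(month))  # "1".."12" -> numeric month index

# group = id draws one line per customer; alpha keeps overlaps readable
p <- ggplot(df_long, aes(x = month, y = amount, group = id)) +
  geom_line(alpha = 0.3) +
  scale_x_continuous(breaks = 1:12)
p
```

Mapping colour to id, as in the answer above, also works, but because id is numeric it produces a continuous colour gradient rather than one legend entry per customer.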
