Why does my line plot (ggplot2) look vertical?

I am new to coding in R. When I used ggplot2 to make a line graph, I got vertical lines. This is my code:
all_trips_v2 %>%
group_by(Month_Name, member_casual) %>%
summarise(average_duration = mean(length_of_ride))%>%
ggplot(aes(x = Month_Name, y = average_duration)) + geom_line()
And I'm getting something like this:
This is a sample of my data:
(Not all the cells in Month_Name are August; the sample is just sorted.)
Any help will be greatly appreciated! Thank you.

I added a bit more code just as an example. The data I chose is probably not the best choice for displaying a proper time series.
I hope the ggplot features shown here will be beneficial for you in the future.
library(tidyverse)
library(lubridate)
mydat <- sample_frac(storms, .4)
# setting the month of interest as the current system's month
month_of_interest <- month(Sys.Date(), label = TRUE)
mydat %>% group_by(year, month) %>%
  summarise(avg_pressure = mean(pressure)) %>%
  # the mutate code is just for my example.
  mutate(month = month(month, label = TRUE),
         current_month = month == month_of_interest) %>%
  ggplot(aes(x = year, y = avg_pressure,
             color = current_month,
             group = month,
             linewidth = current_month
  )) + geom_line(show.legend = FALSE) +
  ## From here it's not really important,
  ## just ideas for your next plots
  scale_color_manual(values = c("grey", "red")) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  scale_linewidth_manual(values = c(.4, 1)) +
  ggtitle(paste("Average yearly pressure,\nwith special interest in",
                month_of_interest)) +
  theme_minimal()
## Most important is that you notice the group argument. Also,
# in most cases you will want to color your different lines.
# I added a logical variable so only the month of interest (the current
# month) will be colored, but that is not mandatory.

You should add a grouping argument.
See further info here:
https://ggplot2.tidyverse.org/reference/aes_group_order.html
library(ggplot2)
# Multiple groups with one aesthetic
# (Oxboys comes from the nlme package, which must be installed)
p <- ggplot(nlme::Oxboys, aes(age, height))
# The default is not sufficient here. A single line tries to connect all
# the observations.
p + geom_line()
# To fix this, use the group aesthetic to map a different line for each
# subject.
p + geom_line(aes(group = Subject))
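Applied to the code in the question, a minimal sketch could look like the following. Because Month_Name is discrete, ggplot2 by default treats each month as its own group, so geom_line() can only join the member and casual averages within a single month, which produces the vertical lines. Mapping group (and colour) to member_casual gives one line per rider type across the months. The all_trips_v2 table below is a hypothetical, simulated stand-in, because the real data was not preserved; its values carry no meaning.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for all_trips_v2 (simulated values, not the real data)
set.seed(42)
all_trips_v2 <- data.frame(
  Month_Name = rep(c("January", "February", "March"), each = 20),
  member_casual = rep(c("member", "casual"), times = 30),
  length_of_ride = runif(60, min = 5, max = 40)
)

monthly <- all_trips_v2 %>%
  group_by(Month_Name, member_casual) %>%
  summarise(average_duration = mean(length_of_ride), .groups = "drop") %>%
  # Keep months in calendar order instead of alphabetical order
  mutate(Month_Name = factor(Month_Name, levels = month.name))

# Without group: every month is its own group, so each "line" is vertical
p_vertical <- ggplot(monthly, aes(x = Month_Name, y = average_duration)) +
  geom_line()

# With group: one line per rider type, connected across months
p_grouped <- ggplot(monthly, aes(x = Month_Name, y = average_duration,
                                 group = member_casual,
                                 colour = member_casual)) +
  geom_line()
```

Printing p_grouped draws two coloured lines, one for members and one for casual riders; layer_data() can be used to confirm how many groups each plot ends up with.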

Related

geom_bar not showing counts properly? (R)

I'm trying to make a bar graph with ten variables and when I enter in my code, I seem to get a weird graph that just shows the frequencies as 1.00. I'm not looking for frequencies, I'm looking for the counts that are already in my data frame. Here is my code so far.
library(dplyr)
library(tidyverse)
path <- file.path("~", "Desktop", "Police_Use_of_Force.csv")
invisible(Force <- read.csv(path, stringsAsFactors = FALSE))
invisible(ProblemDf <- Force %>%
select(Problem))
ProblemDf[ProblemDf==""] <- NA
hi <- tibble(ProblemDf[rowSums(is.na(ProblemDf)) != ncol(ProblemDf), ])
names(hi) = "Problem"
topTen <- hi %>%
count(Problem) %>%
arrange(desc(n)) %>%
top_n(10, n)
ggplot(topTen, aes(y = Problem)) + geom_bar()
and here is the graph that it produces.
Bar Graph
geom_bar() is essentially a univariate plot: it automatically counts the number of times each value appears for you. For example:
ggplot(data.frame(vals=c("a","a","a","z","z")), aes(y=vals)) + geom_bar()
However, in your case you have already calculated the counts, so you are really making a bivariate plot. The correct geom for that is geom_col(), and you need to tell ggplot which column contains the counts. Use:
ggplot(topTen, aes(y = Problem, x=n)) + geom_col()
ggplot(data.frame(vals=c("a","z"), n=c(3,2)), aes(y=vals, x=n)) + geom_col()
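As a quick check of that difference, here is a small sketch on the same toy values: geom_bar() on the raw rows and geom_col() on pre-counted data give bars of the same lengths. geom_bar(stat = "identity") is an equivalent way of writing geom_col(). Wrapping the category in reorder() is an optional extra that sorts the bars by count, which may suit a top-ten chart.

```r
library(dplyr)
library(ggplot2)

raw <- data.frame(vals = c("a", "a", "a", "z", "z"))
# Pre-count the values, as the question does with count()
counted <- raw %>% count(vals)

# geom_bar() counts the rows itself
p_bar <- ggplot(raw, aes(y = vals)) + geom_bar()
# geom_col() uses the n column as the bar length
p_col <- ggplot(counted, aes(y = vals, x = n)) + geom_col()
# Same bars written with geom_bar(stat = "identity"), sorted by count
p_sorted <- ggplot(counted, aes(y = reorder(vals, n), x = n)) +
  geom_bar(stat = "identity")
```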

Plot multicolor vertical lines by using ggplot to show average time taken for each type as facet. Each type will have different vertical lines

I want to plot a chart in R that shows vertical lines for each type in a facet.
df is the data frame giving the time in minutes that each person (X, Y, Z) takes to get from A to B, from B to C, and so on.
I have tried the code below but have not been able to get the result.
df<-data.frame(type =c("X","Y","Z"), "A_to_B"= c(20,56,57), "B_to_C"= c(10,35,50), "C_to_D"= c(53,20,58))
ggplot(df, aes(x = 1,y = df$type)) + geom_line() + facet_grid(type~.)
I have attached an image from Excel showing the desired output, but I need only vertical lines where the segments join instead of the entire horizontal bar.
I would not use facets in your case, because there are only 3 variables.
So, to get a similar plot in R using ggplot2, you first need to reshape the data frame into long (tidy) format using gather() from the tidyr package, which is part of the tidyverse.
To my knowledge, there is no geom that does what you want in standard ggplot2, so some fiddling is necessary.
However, it's possible to produce the plot using geom_segment() and cumsum():
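library(tidyverse)
# Make type an explicit factor: since R 4.0, data.frame() keeps strings as
# characters, and as.numeric() on a character vector would give NA.
# Then reformat to long format and calculate cumulative sums by type.
# cumsum() follows the row order, which matches the column order
# A_to_B, B_to_C, C_to_D, so the routes are summed in travel order.
df <- df %>%
  mutate(type = factor(type)) %>%
  gather(-type, key = "route", value = "time") %>%
  group_by(type) %>%
  mutate(cumulative_time = cumsum(time))
segment_length <- 0.2
df %>%
  mutate(route = fct_rev(route)) %>%
  ggplot(aes(color = route)) +
  # one short segment per route, drawn at the cumulative time where it ends
  geom_segment(aes(x = as.numeric(type) + segment_length,
                   xend = as.numeric(type) - segment_length,
                   y = cumulative_time, yend = cumulative_time)) +
  # label the numeric positions 1, 2, 3 with the matching factor levels
  scale_x_continuous(breaks = seq_along(levels(df$type)),
                     labels = levels(df$type)) +
  coord_flip() +
  ylim(0, max(df$cumulative_time)) +
  labs(x = "type")
EDIT
This solution works because scale_x_continuous() assigns the labels X, Y, Z to the numeric positions of the factor levels. Be careful to assign the correct labels! Also compare this answer.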
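In current versions of tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping and cumulative-sum step with pivot_longer(), using the df from the question, might look like this (plotting is left out). As with gather(), cumsum() relies on the row order, which follows the column order A_to_B, B_to_C, C_to_D.

```r
library(dplyr)
library(tidyr)

df <- data.frame(type = c("X", "Y", "Z"),
                 A_to_B = c(20, 56, 57),
                 B_to_C = c(10, 35, 50),
                 C_to_D = c(53, 20, 58))

long <- df %>%
  # one row per type and route, with the minutes in a single column
  pivot_longer(-type, names_to = "route", values_to = "time") %>%
  group_by(type) %>%
  # running total of minutes along the route for each type
  mutate(cumulative_time = cumsum(time)) %>%
  ungroup()
```

For type X this yields cumulative times of 20, 30 and 83 minutes, which are the points where the vertical segments are drawn.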

Plotting a vertical line in R if a condition is met

I'm trying to plot vertical lines if a condition is met.
Example dataframe:
require(ggplot2)
require(dplyr)
example <- data.frame(
X = c (1:5),
Y = c(8,15,3,1,4),
indicator = c(1,0,1,0,0)
)
example %>% ggplot(aes(x=X,y=Y)) + geom_line() + geom_vline(xintercept=X)
where the X in xintercept should be the value of X whenever the indicator value is 1. So in this case, I would only want vertical lines where the indicator value is 1; in this example, that would create vertical lines at X=1 and X=3. Does anyone have any ideas on how to tackle this? Thanks!
The following should do what you want
library(ggplot2)
library(dplyr)
example <- data.frame(
X = c (1:5),
Y = c(8,15,3,1,4),
indicator = c(1,0,1,0,0)
)
example %>%
ggplot(aes(x=X,y=Y)) +
geom_line() +
geom_vline(aes(xintercept = X),
data = example %>% filter(indicator == 1))
Here is the resulting image.
Note: In the example above, the data.frame named example was used in the call to geom_vline, but this can be any other data.frame that contains the desired values to use as an intercept.
Minor tweak from above:
example %>% ggplot(aes(x=X,y=Y)) + geom_line() +
geom_vline(aes(xintercept=X), data=. %>% filter(indicator == 1))
data can also be a function, so you don't need to hard-code example in the geom_vline layer. Since you are using dplyr anyway, it's easy to convert a pipe into a function by starting the pipeline with a dot.
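Because the layer's data argument accepts any function of the plot data, the magrittr dot is not strictly required. A sketch with an ordinary anonymous function and base R subsetting, so this layer does not depend on dplyr:

```r
library(ggplot2)

example <- data.frame(
  X = 1:5,
  Y = c(8, 15, 3, 1, 4),
  indicator = c(1, 0, 1, 0, 0)
)

p <- ggplot(example, aes(x = X, y = Y)) +
  geom_line() +
  # the function receives the plot data and keeps only the flagged rows
  geom_vline(aes(xintercept = X),
             data = function(d) d[d$indicator == 1, ])
```

layer_data(p, 2) returns the built data for the vertical-line layer, which is a convenient way to check that only X = 1 and X = 3 were kept.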

Plotting a line graph with multiple lines

I am trying to plot a line graph with multiple lines in different colors, but not having much luck. My data set consists of 10 states and the voting turnout rates for each state from 9 elections (so the states are listed in the left column, and each subsequent column is an election year from 1980-2012 with the voting turnout rate for each of the 10 states). I would like to have a graph with the year on the X axis and the voting turnout rate on the Y axis, with a line for each state.
I found this previous answer (Plotting multiple lines from a data frame in R) to a similar question but cannot seem to replicate it using my data. Any ideas/suggestions would be immensely appreciated!
Use tidyr::gather or reshape::melt to transform the data to a long form.
## Simulate data
d <- data.frame(state=letters[1:10],
'1980'=runif(10,0,100),
'1981'=runif(10,0,100),
'1982'=runif(10,0,100))
library(dplyr)
library(tidyr)
library(ggplot2)
## Transform to a long df
e <- d %>% gather(., key, value, -state) %>%
mutate(year = as.numeric(substr(as.character(key), 2, 5))) %>%
select(-key)
## Plot
ggplot(data=e,aes(x=year,y=value,color=state)) +
geom_point() +
geom_line()
Please include your data, or sample data, in your question so that we can answer your question directly and help you get to the root of the problem. Pasting your data is simplified by using dput().
Here's another solution to your problem, using scoa's sample data and the reshape2 package instead of the tidyr package:
# Sample data
d <- data.frame(state = letters[1:10],
'1980' = runif(10,0,100),
'1981' = runif(10,0,100),
'1982' = runif(10,0,100))
library(reshape2)
library(ggplot2)
# Melt data and remove X introduced into year name
melt.d <- melt(d, id.vars = "state")
melt.d[["variable"]] <- gsub("X", "", melt.d[["variable"]])
# Plot melted data
ggplot(data = melt.d,
aes(x = variable,
y = value,
group = state,
color = state)) +
geom_point() +
geom_line()
Produces:
Note that I left out the as.numeric() conversion for year from scoa's example, and this is why the graph above does not include the extra x-axis ticks that scoa's does.
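With a recent tidyr (names_transform needs version 1.1.0 or later), pivot_longer() can strip the X prefix and convert the year to a number in one step. A sketch using the same kind of simulated data as above; the turnout values are random, not real election data.

```r
library(tidyr)
library(ggplot2)

# Simulated data: data.frame() turns the column names into X1980, X1981, ...
set.seed(1)
d <- data.frame(state = letters[1:10],
                '1980' = runif(10, 0, 100),
                '1981' = runif(10, 0, 100),
                '1982' = runif(10, 0, 100))

long <- pivot_longer(d, -state,
                     names_to = "year",
                     names_prefix = "X",  # drop the X added by data.frame()
                     names_transform = list(year = as.integer),
                     values_to = "turnout")

# A numeric year gives a continuous x axis; colour = state also sets the groups
p <- ggplot(long, aes(x = year, y = turnout, colour = state)) +
  geom_point() +
  geom_line()
```

Because year is numeric here and state is mapped to colour, an explicit group = state is not needed; with a character year on the x axis it would be, as in the reshape2 version.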

How to melt a dataframe into multiple factors

I have been trying to plot a line plot with ggplot.
My data looks something like this:
I04 F04 I05 F05 I06 F06
CAT 3 12 2 6 6 20
DOG 0 0 0 0 0 0
BIEBER 1 0 0 1 0 0
and can be found here.
Basically, we have a certain number of CATs (or other creatures) initially in a year (this is I04), and a certain number of CATs at the end of the year (this is F04). This goes on for some time.
I can plot something like this fairly simply using the code below, and get this:
This is fantastic, but it doesn't work very well for me. After all, I have these starting and ending inventories for each year, so I am interested in seeing how the initial values (I04, I05, I06) change over time. So, for each animal, I would like to create two different lines, one for the initial quantity and one for the final quantity (F04, F05, F06). It seems to me that I now have to consider two factors.
This is really difficult given the way my data is set up. I'm not sure how to tell ggplot that all the I prefixed years are one factor, and all the F prefixed years are another factor. When the dataframe gets melted, it's too late. I'm not sure how to control this situation.
Any advice on how I can separate these values or perhaps another, better way to tackle this situation?
Here is the code I have:
library(ggplot2)
library(reshape2)
DF <- read.csv("mydata.csv", stringsAsFactors=FALSE)
## cleaning up, converting factors to numeric, etc
text_names <- data.frame(as.character(DF$animals))
names(text_names) <- c("animals")
numeric_cols <- DF[, -c(1)]
numeric_cols <- sapply(numeric_cols, as.numeric)
plot_me <- data.frame(cbind(text_names, numeric_cols))
plot_me$animals <- as.factor(plot_me$animals)
meltedDF <- melt(plot_me)
p <- ggplot()
p <- p + geom_line(aes(seq(1:36), meltedDF$value, group=meltedDF$animals, color=meltedDF$animals))
p
Using your original data from the link:
nd <- reshape(mydata, idvar = "animals", direction = "long", varying = names(mydata)[-1], sep = "")
ggplot(nd, aes(x = time, y = I, group = animals, colour = animals)) + geom_line() + ggtitle("Development of initial inventories")
ggplot(nd, aes(x = time, y = F, group = animals, colour = animals)) + geom_line() + ggtitle("Development of final inventories")
From a data analyst's perspective, I think the following approach might provide better insight.
For each animal, we visualize the initial and the final quantity in a separate panel. Moreover, each subplot has its own y scale because the values of the different animal types are radically different. This way, differences within and across animal types are easier to spot.
Given the current structure of your data, we do not need two different factors. After the gather() call, the indicator column contains values like I04, F04, etc. We just need to separate the first character from the rest, resulting in two columns, type and time. We can use type as the color argument in the ggplot call, and time provides a unified x-axis across all animal types.
library(tidyr)
library(dplyr)
library(ggplot2)
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep = 1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = type)) +
geom_line() +
facet_grid(animals ~ ., scales = "free_y")
Of course, you might also do it the other way round, namely using a subplot for the initial and the final quantities like this:
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep=1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = animals)) +
geom_line() +
facet_grid(type ~ ., scales = "free_y")
But as described above, I would not recommend that because the y scale varies too much across animal types.
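The gather() plus separate() steps can also be written as a single pivot_longer() call (tidyr 1.1.0 or later), where names_pattern splits each column name into its type letter and its year digits. A sketch using the sample rows shown in the question, typed in by hand under the hypothetical name inventory, since the linked file is not reproduced here:

```r
library(tidyr)
library(ggplot2)

# The sample rows from the question, typed in by hand
inventory <- data.frame(
  animals = c("CAT", "DOG", "BIEBER"),
  I04 = c(3, 0, 1), F04 = c(12, 0, 0),
  I05 = c(2, 0, 0), F05 = c(6, 0, 1),
  I06 = c(6, 0, 0), F06 = c(20, 0, 0)
)

long <- pivot_longer(inventory, -animals,
                     # "I04" -> type "I", time 4
                     names_to = c("type", "time"),
                     names_pattern = "([IF])(\\d+)",
                     names_transform = list(time = as.integer),
                     values_to = "value")

p <- ggplot(long, aes(time, value, color = type)) +
  geom_line() +
  facet_grid(animals ~ ., scales = "free_y")
```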
