I am trying to plot a line graph with multiple lines, grouped by a categorical value (a factor). Based on what I have done in the past and what I can find online, the easiest way to do this is to assign the categorical value to the group aesthetic, but this isn't working for me: I only get one line on the graph. I am 100% sure I am doing something super silly, but I can't for the life of me work it out. Thanks in advance :)
#dummy data for example
test <- data.frame(x = sample(seq(as.Date('2015/01/01'), as.Date('2020/01/01'), by="day"), 20),
y = sample(10:300, 10),
Origin_Station = as.factor(rep(1, 10)),
Neighbour_station = as.factor(rep(1:5, each = 20)))
#plot - what I want to see is a line for each of the 5 Neighbour_station categories (1:5) but what I get is just one line
ggplot(test, aes(x=x, y=y, group = Neighbour_station))+
geom_line()
I have also tried this:
ggplot(test, aes(x=x, y=y, group = factor(Neighbour_station), colour = Neighbour_station))+
geom_line()
Hi Rhetta also from Aus here, big ups Australian useRs:
library(ggplot2)
ggplot(test, aes(x = x, y = y, group = Neighbour_station, colour = Neighbour_station))+
geom_line()
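Note that the reason you can't see distinct lines is that your data is exactly the same for each factor level (Neighbour_station 1:5): x and y are shorter than Neighbour_station, so they are recycled and the five lines are drawn on top of each other.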
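To make the point concrete, here is a minimal sketch with dummy data in which each station gets its own y values. The name test2 and the seed are only illustrative; the idea is simply to sample as many y values as there are rows, so nothing is recycled.

```r
library(ggplot2)

set.seed(42)                      # illustrative seed, only for reproducibility
n_per_station <- 20

# 20 dates shared by all stations, but 100 distinct y values (no recycling)
dates <- sort(sample(seq(as.Date("2015-01-01"), as.Date("2020-01-01"), by = "day"),
                     n_per_station))
test2 <- data.frame(
  x = rep(dates, times = 5),
  y = sample(10:300, 5 * n_per_station),
  Neighbour_station = factor(rep(1:5, each = n_per_station))
)

# One line per station; colour makes the five lines easy to tell apart
p <- ggplot(test2, aes(x = x, y = y,
                       group = Neighbour_station,
                       colour = Neighbour_station)) +
  geom_line()
p
```

With data like this the same group/colour mapping from the answer should show five separate lines.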
I want to create an area plot with a line above it (I'm showing that some components of my data don't sum up to the total and want to discuss that). Here is the code that throws an error:
library(ggplot2)
plotdata <- data.frame(x= rep(1:5,2), y = abs(rnorm(10)), id = rep(c("a","b"), each =5))
plotdata_total <- data.frame(x = 1:5,
y = plotdata[plotdata$id =="a", "y"]+
plotdata[plotdata$id =="b", "y"]+1:5/0.2)
ggplot(plotdata,
aes(x=x, y=y, group = id, fill = id)) +
geom_area() +
geom_line(plotdata_total, aes(x=x, y=y))
The error is "mapping must be created by aes()". So there is something wrong in the mapping, but even if I manually add an id variable to plotdata_total, I get this error. It also doesn't help to specify color, group and fill in both aes() calls. What am I missing? Comment out the last geom to see that the area plot works.
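One likely cause, offered as a sketch rather than a confirmed answer: in geom_*() layers the first positional argument is mapping, not data, so a data frame passed first is taken for the mapping. Naming the argument (data = plotdata_total) avoids that. The version below also moves group and fill into geom_area(), an assumption on my part, so that the line layer does not inherit aesthetics that refer to the missing id column.

```r
library(ggplot2)

set.seed(1)  # illustrative seed
plotdata <- data.frame(x = rep(1:5, 2),
                       y = abs(rnorm(10)),
                       id = rep(c("a", "b"), each = 5))
plotdata_total <- data.frame(
  x = 1:5,
  y = plotdata[plotdata$id == "a", "y"] +
      plotdata[plotdata$id == "b", "y"] + 1:5 / 0.2
)

p <- ggplot(plotdata, aes(x = x, y = y)) +
  # group/fill only apply to the area layer
  geom_area(aes(group = id, fill = id)) +
  # data must be passed by name; x and y are inherited from ggplot()
  geom_line(data = plotdata_total)
p
```

An alternative would be to keep the original ggplot() call and add inherit.aes = FALSE together with an explicit aes(x = x, y = y) to the geom_line() layer.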
I am trying to get a boxplot with 3 different tools in each dataset size like the one below:
ggplot(data1, aes(x = dataset, y = time, color = tool)) + geom_boxplot() +
labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_y_log10() + theme_bw()
But I need to transform x-axis to log scale. For that, I need to numericize each dataset to be able to transform them to log scale. Even without transforming them, they look like the one below:
ggplot(data2, aes(x = dataset, y = time, color = tool)) + geom_boxplot() +
labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_y_log10() + theme_bw()
I checked boxplot parameters and grouping parameters of aes, but could not resolve my problem. At first, I thought this problem is caused by scaling to log, but removing those elements did not resolve the problem.
What am I missing exactly? Thanks...
Files are in this link. "data2" is the numericized version of "data1".
Your question was a tough cookie, but I learned something new from it!
Just using group = dataset is not sufficient because you also have the tool variable to look out for. After digging around a bit, I found this post which made use of the interaction() function.
This is the trick that was missing. You want to use group because you are not using a factor for the x values, but you need to include tool in the separation of your data (hence using interaction() which will compute the possible crosses between the 2 variables).
# This is for pretty-printing the axis labels
my_labs <- function(x){
paste0(x/1000, "k")
}
levs <- unique(data2$dataset)
ggplot(data2, aes(x = dataset, y = time, color = tool,
group = interaction(dataset, tool))) +
geom_boxplot() + labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_x_log10(breaks = levs, labels = my_labs) + # define a log scale with your axis ticks
scale_y_log10() + theme_bw()
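Since the original files are not reproduced here, the following sketch uses made-up data (toy, with hypothetical dataset sizes and tool names) to show what interaction() produces and that it yields one box per dataset/tool combination.

```r
library(ggplot2)

set.seed(123)  # illustrative seed
# Hypothetical data: 3 numeric dataset sizes x 2 tools, 10 timings each
toy <- data.frame(
  dataset = rep(c(1000, 10000, 100000), each = 20),
  tool    = rep(c("A", "B"), times = 30),
  time    = rlnorm(60)
)

# interaction() builds one factor level per dataset/tool combination
combo <- interaction(toy$dataset, toy$tool)
nlevels(combo)

p <- ggplot(toy, aes(x = dataset, y = time, color = tool,
                     group = interaction(dataset, tool))) +
  geom_boxplot() +
  scale_x_log10(breaks = unique(toy$dataset)) +
  scale_y_log10() +
  theme_bw()
p
```

Grouping by dataset alone would pool both tools into a single box at each x position, which is why the tool variable has to be part of the grouping.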
I have a time series in which each point has a time, a value and a group it belongs to. I am trying to plot it with time on the x axis and value on the y axis, with the line drawn in a different color depending on the group.
I tried using geom_path and geom_line, but they end up linking points to points within groups. I found out that when I use a continuous variable for the groups, I have a normal line; however when I use a factor or a categorical variable, I have the link problem.
Here is a reproducible example that is what I would like:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c(1,2,2,2,1,1,2,2,2,2))
ggplot(df, aes(time, value, color = group)) + geom_line()
And here is a reproducible example that is what I have:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c("apple","pear","pear","pear","apple","apple","pear","pear","pear","pear"))
ggplot(df, aes(time, value, color = group)) + geom_line()
So the first example works well, but 1/ it adds a few lines to change the legend to have the labels I want, 2/ out of curiosity I would like to know if I missed something.
Is there any option in ggplot I could use to have the behavior I expect, or is it an internal constraint?
As pointed out by Richard Telford and Carles Sans Fuentes, adding group = 1 inside the ggplot aesthetics does the job. So the working code is:
ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()
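To see why group = 1 helps, it can be useful to look at the groups ggplot2 actually computes. The sketch below reuses the data from the question and compares the two plots with layer_data(); the names p_default and p_single are only illustrative.

```r
library(ggplot2)

df <- data.frame(
  time  = 1:10,
  value = c(5, 4, 9, 3, 8, 2, 5, 8, 7, 1),
  group = c("apple", "pear", "pear", "pear", "apple",
            "apple", "pear", "pear", "pear", "pear")
)

# A discrete colour variable implicitly splits the data into one group per level
p_default <- ggplot(df, aes(time, value, color = group)) + geom_line()

# group = 1 overrides that, so all points are joined as a single line
p_single <- ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()

unique(layer_data(p_default)$group)  # one id per level of `group`
unique(layer_data(p_single)$group)   # a single id
```

With a numeric group column no implicit split happens, which is why the first example in the question already behaved as expected.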
I am currently working on R, and I have some troubles with the boxplot from the package ggplot2.
What I want to do is to plot the NO2 concentration depending on the speed of the vehicles on the road. So I have a continuous x-axis and a continuous y-axis. When I use geom_boxplot, I get those graphs
ggplot(df, aes(x=Speed, y=Concentration)) +
geom_boxplot() +
scale_x_continuous(limits = c(0, 100)) +
scale_y_continuous(limits = c(0,500))
We can see that the boxes are placed seemingly at random on this graph. What I want is a separate boxplot for every 20 km/h between 0 and 100 km/h.
I have tried different things seen in other topics on the forum, like:
aes(group = cut_width(Speed, 20))
but nothing is changing and my boxes won't be positioned every 20 km/h.
I am not sure that my explanations are very clear, please do not hesitate to ask if you don't understand something.
It's been a few days that I'm trying to solve this problem, and I would be very grateful if someone could help me on that issue.
Thank you,
Valentine
Edit : Here is a code to create a dataset, and a picture of the result.
df = data.frame(matrix(ncol = 2, nrow = 20))
colnames(df) = c("Speed", "Concentrations")
df$Speed = runif(20, 0,100)
df$Concentrations = runif(20,0,500)
ggplot(df, aes(x = Speed, y = Concentrations)) + geom_boxplot(aes(group = cut_width(Speed, 20)))
The result is here. What I want is to have a box at Speed 20, 40, 60, 80.
Consider adding the following discrete variable to your data instead of applying cut_width() in your ggplot commands:
df$Speed_Cat = cut_width(df$Speed, 20)
Then your plot will be constructed via:
ggplot(df, aes(x = Speed_Cat, y = Concentrations)) +
geom_boxplot() +
scale_x_discrete(labels=seq(0,100,20))
Just be clear about what you want your cuts to represent! By default cut_width() centers the bins on multiples of the width, so the buckets become [-10,10], (10,30], ..., but you can always adjust these when you create the variable in your data.
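As one way of adjusting the buckets, cut_width() accepts a boundary argument. The sketch below uses a few hand-picked speeds (an assumption, chosen so that every bin is populated) to show bins that start at 0 instead of being centered on it.

```r
library(ggplot2)

# Hand-picked illustrative speeds, one per intended 20 km/h bin
speed <- c(5, 25, 45, 65, 85)

# boundary = 0 aligns the bin edges with 0, 20, 40, ... instead of -10, 10, 30, ...
speed_cat <- cut_width(speed, width = 20, boundary = 0)
levels(speed_cat)
```

With real data it is worth checking levels(speed_cat) before relabelling the axis, because labels passed to scale_x_discrete() are matched to the levels by position.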
Say I'm measuring 10 personality traits and I know the population baseline. I would like to create a chart for individual test-takers to show them their individual percentile ranking on each trait. Thus, the numbers go from 1 (percentile) to 99 (percentile). Given that a 50 is perfectly average, I'd like the graph to show bars going to the left or right from 50 as the origin line. In bar graphs in ggplot, it seems that the origin line defaults to 0. Is there a way to change the origin line to be at 50?
Here's some fake data and default graphing:
df <- data.frame(
names = LETTERS[1:10],
factor = round(rnorm(10, mean = 50, sd = 20), 1)
)
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor)) +
geom_bar(stat="identity") +
coord_flip()
Picking up on @nongkrong's comment, here's some code that will do what I think you want while relabeling the ticks to match the original range and relabeling the axis to avoid showing the math:
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(breaks=seq(-50,50,10), labels=seq(0,100,10)) + ylab("Percentile") +
coord_flip()
This post was really helpful for me, thanks @ulfelder and @nongkrong. However, I wanted to re-use the code on different data without having to manually adjust the tick labels to fit the new data. To do this in a way that retains ggplot's tick placement, I defined a tiny function and passed it to the labels argument:
fix.labels <- function(x){
x + 50
}
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(labels = fix.labels) + ylab("Percentile") +
coord_flip()
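If the origin itself may change between datasets, the same idea can be generalised a little. The following is only a sketch and not part of the original answers: origin_labels is a hypothetical helper that returns a labelling function for any origin, so the value is written in one place.

```r
library(ggplot2)

# Hypothetical helper: returns a function that shifts tick labels back by `origin`
origin_labels <- function(origin) {
  function(x) x + origin
}

set.seed(1)  # illustrative seed
df <- data.frame(
  names  = LETTERS[1:10],
  factor = round(rnorm(10, mean = 50, sd = 20), 1)
)

origin <- 50
p <- ggplot(data = df, aes(x = names, y = factor - origin)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = origin_labels(origin)) +
  ylab("Percentile") +
  coord_flip()
p
```

Changing origin then moves both the bars and the tick labels together, without editing the labelling function.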