How to remove dots and extend boxplots in ggplot2 [duplicate] - r

This question already has answers here:
ggplot2 - Boxplot Whiskers at Min/Max
(2 answers)
Closed 7 years ago.
I have some data that I'm trying to build boxplots with, but I'm getting this warning:
Warning message: Removed 1631 rows containing non-finite values (stat_boxplot).
There are no NA values and all the data seems fine. How can I fix this? These are certainly valuable points in my data, and the whiskers should extend to cover them.
Data
The data is fairly large, and I couldn't get a smaller subsample to produce the errors, so I'll just post the original data.
dat.rds
ggplot2
dat <- readRDS("./dat.rds")
ggplot(dat, aes(x = factor(year), y = dev)) + geom_boxplot() + ylim(-40, 260)
Edit
I was able to get it to work in boxplot with `range = 6`. Is there a way to do this in ggplot?
boxplot(dev~year, data = dat, range = 6)

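The warning comes from ylim(): it sets scale limits, and values outside those limits are dropped before the boxplot statistics are computed. Remove the ylim restriction and use the coef argument of geom_boxplot, then it works fine: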
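library(ggplot2)
# Download the asker's data to a temporary file (mode = "wb" keeps the binary file intact on Windows)
tf <- tempfile(fileext = ".rds")
download.file(url = "https://www.dropbox.com/s/5mgogyclhim6hom/dat.rds?dl=1", destfile = tf, mode = "wb")
dat <- readRDS(tf)
# coef = 6 lets the whiskers extend up to 6 * IQR beyond the hinges
ggplot(dat, aes(x = factor(year), y = dev)) +
  geom_boxplot(coef = 6)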
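If you still want to restrict the visible y range, coord_cartesian() zooms the view without discarding any data. Here is a minimal sketch on made-up data; the data frame `toy` and its values are assumptions for illustration, not the asker's dat.rds:

```r
library(ggplot2)

# Hypothetical data: 20 regular values plus one far-away point per year
toy <- data.frame(
  year = rep(c(2001, 2002), each = 21),
  dev  = rep(c(1:20, 60), times = 2)
)

# Default coef = 1.5: the value 60 lies beyond the whiskers and is drawn as a dot
p_default <- ggplot(toy, aes(x = factor(year), y = dev)) +
  geom_boxplot()

# coef = 6: whiskers may reach 6 * IQR beyond the hinges, so 60 is covered
p_wide <- ggplot(toy, aes(x = factor(year), y = dev)) +
  geom_boxplot(coef = 6) +
  coord_cartesian(ylim = c(-40, 260))  # zooms the view, keeps all rows

# Inspect the statistics that each version computes
stats_default <- ggplot_build(p_default)$data[[1]]
stats_wide    <- ggplot_build(p_wide)$data[[1]]
```

Comparing the `ymax` and `outliers` columns of the two results shows how the whisker end and the outlier list change with coef.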

Related

Adding legend to ggplot curves plotted on the same axis [duplicate]

This question already has answers here:
Add legend to ggplot2 line plot
(4 answers)
Closed 4 months ago.
I have a graph that I'm trying to add a legend to but I can't find any answers.
Here's what the graph looks like
I made a data frame with my x-axis as one column and several other columns containing y values that I graphed against x (fixed) in order to get these curves. I want a legend to appear on the side saying column 1, ..., column 11, with each entry matching the color of its curve.
How do I do this? I feel like I'm missing something obvious
Here's what my code looks like: (sorry for the pic. I keep getting errors that my code is not formatted correctly even though I'm using the code button)
interval is just 2:100 and aaaa etc... is a vector the same length as interval.
As Peter says, you will need to convert your data into "long" format. Here is an example using reshape2::melt:
library(reshape2)
library(ggplot2)
n <- 20
df <- data.frame(x = seq(n))
tmp <- as.data.frame(do.call("cbind", lapply(seq(5), FUN = function(x){rnorm(n)})))
names(tmp) <- paste0("aaaa", letters[1:5])
df <- cbind(df, tmp)
head(df)
df2 <- melt(df, id.vars = "x")
head(df2)
ggplot(data = df2, aes(x = x, y = value, color = variable)) +
  geom_point() +
  geom_line()
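The same reshaping can also be done with tidyr::pivot_longer(), which is not the method used in the answer above but produces an equivalent long table. A minimal sketch, assuming the tidyr package is installed and using made-up columns named like the ones in the example:

```r
library(tidyr)
library(ggplot2)

set.seed(1)
n <- 20
# Hypothetical wide data: one x column and three y columns
df <- data.frame(
  x     = seq_len(n),
  aaaaa = rnorm(n),
  aaaab = rnorm(n),
  aaaac = rnorm(n)
)

# Every column except x becomes a (variable, value) pair
df_long <- pivot_longer(df, cols = -x, names_to = "variable", values_to = "value")

# Mapping color to the variable column is what produces the legend
p <- ggplot(df_long, aes(x = x, y = value, color = variable)) +
  geom_line()
```

Because color is mapped inside aes(), ggplot2 builds the legend automatically, with one entry per original column.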

How to connect lines when x are strings? [duplicate]

This question already has answers here:
ggplot2 line chart gives "geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
(6 answers)
Closed 12 months ago.
library(ggplot2)
x=letters[1:3]
y=1:3
qplot(x, y)
qplot(x, y, geom=c('point', 'line'))
geom_path: Each group consists of only one observation. Do you need to adjust
the group aesthetic?
I want to connect lines between the points. But when the x is a string, the above commands won't work. It works when the x is numeric. I'd think qplot should be made more user-friendly in this case.
How to make it connect the points with lines when x is a string?
One solution is provided by @stefan. Another one could be the following.
Sample data:
x=letters[1:3]
y=1:3
Sample code:
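library(dplyr)
library(ggplot2)
# Turn x into a factor with explicit levels so the original order is kept
d <- data.frame(x, y) %>%
  mutate(x = factor(x, levels = x))
# group = 1 puts all points in one group so geom_line() can connect them
ggplot(data = d, aes(x = x, y = y, group = 1)) +
  geom_line() +
  scale_x_discrete(labels = x, breaks = x)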
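The message in the question appears because, with a discrete x, ggplot2 puts each x value into its own group by default, so every line would consist of a single point. A minimal sketch of the most direct fix, using only ggplot2 and the sample data from the question, is to set group = 1:

```r
library(ggplot2)

x <- letters[1:3]
y <- 1:3

# group = 1 overrides the default one-group-per-x-value behaviour
p <- ggplot(data.frame(x, y), aes(x = x, y = y, group = 1)) +
  geom_point() +
  geom_line()

# The line layer now sees a single group containing all three points
line_data <- ggplot_build(p)$data[[2]]
```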

ggplot2 does not plot multiple groups of a variable, only plots one line

I would like to make a plot with multiple lines corresponding to the different groups of the variable "Prob" (0.1, 0.5 and 0.9) using ggplot. However, when I run the code, it only plots one line instead of 3. Thanks for the help :)
Here is my code:
Prob <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9)
nit <- c(0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999,0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999,0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999)
greek <- log((1-Prob)/Prob)/-10
italian <- ((0.997-nit)/(0.997-0.97))^3
Temp<-c(rep(25,111))
GT <- ((30-Temp)/(30-3.3))^3
GH <- 1-GT-italian
acid <- (-1*(((sign(GH)*(abs(GH)^(1/3)))*(7-5))-7))
Species<-c(rep("Case",111))
data <- as.data.frame(cbind(Prob,greek,GT,GH,italian, Temp,acid,nit, Species))
ggplot() +
geom_line(data = data, aes_string(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
The answer has two parts:
1. In your data frame data, the columns that should be numeric are not numeric.
2. The reason why you only see one line.
Fixing the Data Frame and Using aes() in place of aes_string()
Something looked odd when I saw as.data.frame(cbind(...)) used to build the data frame together with aes_string(...) in the ggplot call. A quick check with str(data) shows that every column in data is a character, whereas the vectors prepared in the environment are numeric. For example, acid is numeric, yet data$acid is a character.
The reason is that cbind() builds a matrix first, and a matrix can hold only one type. Because Species is a character vector, every column is coerced to character, so you lose the numeric nature of the data. This is also why you have to use aes_string(...) to make it work instead of aes(). To bind vectors together into a data frame, use data.frame(...), not as.data.frame(cbind(...)).
To fix all this, build the data frame with data.frame() and use aes() in the ggplot code:
data <- data.frame(Prob,greek,GT,GH,italian, Temp,acid,nit, Species)
# data <- as.data.frame(cbind(Prob,greek,GT,GH,italian, Temp,acid,nit, Species))
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
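The coercion is easy to reproduce on a tiny example. This is a minimal sketch with made-up vectors (`num` and `lab` are hypothetical names, not columns from the question):

```r
# One numeric vector and one character vector
num <- c(0.1, 0.5)
lab <- c("Case", "Case")

# cbind() creates a character matrix first, so num loses its numeric type
via_cbind <- as.data.frame(cbind(num, lab))

# data.frame() keeps each column's own type
direct <- data.frame(num, lab)

str(via_cbind)  # num is no longer numeric here
str(direct)     # num is numeric here
```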
Why is there only one line?
The simple answer is that the lines for all values of data$Prob are identical: neither acid nor nit is computed from Prob (only greek is), so each group traces the same curve. What you see is the effect of overplotting: the line for data$Prob == 0.1 is the same as the line for data$Prob == 0.5 and data$Prob == 0.9.
To demonstrate this, let's separate the lines. Prob could be created by repeating 0.1, 0.5, and 0.9 each 37 times in a row, so I'll create a vector with the same layout and use it as a multiplication factor for data$nit, which will separate our lines:
my_factor <- rep(c(1,1.1,1.5), each=37) # our multiplication factor
data$nit <- data$nit * my_factor # new nit column
# same plot code
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
There ya go. All the lines are there; you just could not see them due to overplotting. You can convince yourself of this without the multiplication business by faceting the original data on data$Prob:
# use original dataset as above
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8) +
facet_wrap(~Prob)
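The overplotting can also be checked numerically rather than visually. The sketch below rebuilds acid and nit with the formulas from the question; generating nit with seq() instead of typing out the 37 values is an assumption made here for brevity:

```r
# Rebuild the relevant columns; nit is the same 37-value sequence for each Prob
Prob <- rep(c(0.1, 0.5, 0.9), each = 37)
nit  <- rep(seq(0.9, 0.9999999, length.out = 37), times = 3)
Temp <- rep(25, 111)

italian <- ((0.997 - nit) / (0.997 - 0.97))^3
GT      <- ((30 - Temp) / (30 - 3.3))^3
GH      <- 1 - GT - italian
acid    <- -1 * ((sign(GH) * abs(GH)^(1 / 3)) * (7 - 5) - 7)

data <- data.frame(Prob, acid, nit)

# Compare the curve for each Prob value with the curve for Prob == 0.1
ref  <- data[data$Prob == 0.1, ]
same <- vapply(c(0.5, 0.9), function(p) {
  cur <- data[data$Prob == p, ]
  identical(ref$acid, cur$acid) && identical(ref$nit, cur$nit)
}, logical(1))
same
```

If every element of `same` is TRUE, the three groups trace exactly the same curve, which is why only one line is visible.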

How to add ggplots into one if data come from different data sets? And how to add geom ribbon to discrete data?

I have a ggplot with the mean IMDb movie rating per year plotted, and I want to add a ribbon-like layer that shows the standard error for each point but is continuous (if that's even possible).
ggplot(data = avg_imdb_movie_year, aes( x = startYear, y = avg_rating)) +
geom_point() +
geom_ribbon(aes(x = start_Year, y = standard_error, xmin = min(xx), xmax = max(xx)))
The xx is a sequence corresponding to the years of the movies. The standard_error is simply calculated as sd(average_rating) [that is, the difference to the mean for each data point].
I think I'm doing something completely wrong. If my data is discrete, is there a way to draw a ribbon-like standard error around the mean points?
In addition, I have a question about adding layers that use a different data frame. For example, I want to add another geom_point() layer to this ggplot where the data is the average rating of awarded movies per year. But I run into an error:
ggplot(data = avg_imdb_movie_year, aes( x = startYear, y = avg_rating)) +
geom_point() +
geom_point(aes(x = avg_awarded_moves_year$year_film,
y = avg_awarded_moves_year$average_per_year))
Error message: Error: Aesthetics must be either length 1 or the same as the data (138): x and y
I realise that it's because there are fewer years (rows) in the awarded_movies table, but I don't know how to add another layer from a different dataset to an existing ggplot. Does anyone have any ideas?
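One possible approach, sketched here under assumptions: geom_ribbon() takes ymin and ymax aesthetics rather than a y and an x range, and any layer can be given its own data frame through its data argument. The two data frames below are made up; only their names and column names follow the question:

```r
library(ggplot2)

# Hypothetical stand-ins for the asker's tables
avg_imdb_movie_year <- data.frame(
  startYear      = 2000:2009,
  avg_rating     = c(6.1, 6.3, 6.2, 6.5, 6.4, 6.6, 6.8, 6.7, 6.9, 7.0),
  standard_error = 0.2
)
avg_awarded_moves_year <- data.frame(
  year_film        = c(2001, 2004, 2007),
  average_per_year = c(7.5, 7.8, 8.0)
)

p <- ggplot(avg_imdb_movie_year, aes(x = startYear, y = avg_rating)) +
  # Band from mean - SE to mean + SE, drawn across the years
  geom_ribbon(aes(ymin = avg_rating - standard_error,
                  ymax = avg_rating + standard_error),
              alpha = 0.3) +
  geom_point() +
  # A layer with its own data and its own aesthetics
  geom_point(data = avg_awarded_moves_year,
             aes(x = year_film, y = average_per_year),
             colour = "red")

layers <- ggplot_build(p)$data
```

Passing the second data frame through data = avoids the length mismatch, because the layer's aesthetics are then evaluated against that data frame instead of the plot's default one.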

Plot Frequency of Data in ggplot2 [duplicate]

This question already has answers here:
ggplot side by side geom_bar()
(2 answers)
Closed 4 years ago.
I have a data frame that looks like the following:
threshold <- c("thresh1","thresh3","thresh10","thresh3","thresh3", "thresh10")
expression <- c("expressed", "expressed", "expressed", "depleted", "expressed", "depleted")
data.frame("Threshold" = threshold, "Expression" = expression)
I would like to generate a histogram of counts of the different thresholds, bucketed by the expression.
I have attempted to do so using geom_bar(), but I do not want the data stacked. Rather, I want the different categories (depleted, enriched etc...) to be represented in their own bars.
ggplot(final_nonexpressed, aes(x = threshold, fill = expression))+geom_bar(width = 0.5)
Any help would be appreciated!
Check out the help page ?geom_bar, specifically the position argument and position = "dodge". For example:
library(ggplot2)
g <- ggplot(mpg, aes(class, fill = factor(drv)))
g + geom_bar()
g + geom_bar(position = "dodge")
Created on 2019-01-15 by the reprex package (v0.2.1)
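Applied to the sample data from the question, a minimal sketch could look like this. The data frame name `df` is an assumption, since the question does not assign the result of data.frame() to a name:

```r
library(ggplot2)

threshold  <- c("thresh1", "thresh3", "thresh10", "thresh3", "thresh3", "thresh10")
expression <- c("expressed", "expressed", "expressed", "depleted", "expressed", "depleted")
df <- data.frame(Threshold = threshold, Expression = expression)

# position = "dodge" places the bars for each Expression value side by side
p <- ggplot(df, aes(x = Threshold, fill = Expression)) +
  geom_bar(position = "dodge", width = 0.5)

bars <- ggplot_build(p)$data[[1]]
```

Each Threshold/Expression combination that occurs in the data gets its own bar, and the bar height is the number of rows in that combination.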
