I hope this question isn't a duplicate. I tried to find answers per the site's requirements before posting, but since I am so new, the help forums are too foreign to me.
Following Wickham's R for data visualization, I easily used geom_point for an integrated data set, mpg:
simple reference code:
ggplot(data = mpg)+
geom_smooth(mapping = aes(x=displ, y=hwy))+
geom_point(mapping = aes(x=displ, y=hwy))
Excited by this cool plot, I tried to do the same for some personal research data, which describes interferon-beta production over five time points (labelled A, b, c, d, e rather than numeric values).
I used the same code, essentially:
ggplot(data = ifnonly)+
geom_smooth(mapping = aes(x=HOURS, y=IFNB))+
geom_point(mapping = aes(x=HOURS, y=IFNB))
Unfortunately, the line does not display. In fact, nothing displays until I add the geom_point function. What am I missing here? Is there more complex code required or is there some subtlety that I can apply to future uses of this function and ggplot?
I think you can get the output you want with a single line of code:
library(ggplot2)
# data is mtcars, with disp on the x-axis and mpg on the y-axis
ggplot(mtcars, aes(disp, mpg)) + geom_smooth()
[Figure: output without geom_point]
library(ggplot2)
ggplot(mtcars, aes(disp, mpg)) + geom_smooth() + geom_point()
[Figure: output with geom_point]
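For the data in the question, one likely cause (an assumption, since ifnonly was not posted) is that HOURS is categorical. With a discrete x, ggplot2 puts each x value in its own group, so a smoother or line has only one x position per group and draws nothing. Below is a minimal sketch with invented values; the data frame contents and the choice of a mean line via stat_summary() instead of a smoother are illustrative, and the fun argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: five labelled time points,
# three replicate measurements each (all values are invented).
ifnonly <- data.frame(
  HOURS = rep(c("A", "B", "C", "D", "E"), each = 3),
  IFNB  = c(1, 2, 1.5, 4, 5, 4.5, 9, 8, 10, 6, 7, 6.5, 3, 2, 2.5)
)

# group = 1 tells ggplot2 to treat all rows as a single series,
# so the mean at each time point can be joined by one line.
p <- ggplot(ifnonly, aes(x = HOURS, y = IFNB)) +
  geom_point() +
  stat_summary(aes(group = 1), fun = mean, geom = "line")

print(p)
```

The same aes(group = 1) can be tried inside geom_smooth(), although a smoother fitted to only five distinct x positions may produce warnings.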
I am just starting to work with R, so apologies if my question is too basic.
I have an Excel sheet (here's the link: https://file.io/LfsAOdDCVnFq)
from which I am trying to draw a simple bar plot as follows:
X = my sample names, the column called OTU ID in the file
Y = the sum of my variables for each sample, the column called Sum ZOTUs in the file
So far, I have installed and loaded ggplot2 and tried to plot my data frame, but the plot shows only one bar and I don't know what is wrong.
install.packages("readxl")
install.packages("ggplot2")
library(readxl)
library(ggplot2)
ZOTU <- read_excel(file.choose())
ggplot(data=ZOTU, aes(x="OTU ID")) + geom_bar ()
It shows the plot below:
Can anyone help me fix this?
Thanks
I can't see your uploaded image with the excel sheet screenshot.
My guess is that the problem is the quotation marks: "OTU ID" in quotes is treated as a constant string, not as a column name. Use backticks instead. Try running this code:
ggplot(data = ZOTU, aes(x = `OTU ID`)) + geom_bar()
First
Your question could be better formulated. Please read how to ask a good question and how to create a minimal example to understand the basics of a workable question.
In R, you have a very good tool for creating reproducible examples: the reprex package
Also, I would not download anything from a given link in a random question in StackOverflow, and neither should you.
Try
Run this code on your computer, and see if it helps you understand how ggplot works:
library(ggplot2) # load ggplot
mpg # let's look at the 'mpg' data set included in the ggplot2 package
# Now, a simple bar plot
ggplot(mpg, aes(x = fl)) +
geom_bar()
We use the mpg data as the data for our figure, and we set the x-axis to be the fl column of that data. Finally, we "add" a bar plot to the figure.
By default, the bar plot will plot the count of the different values present in the column you passed as x-axis.
After comments
Following our discussion in the comment section, maybe this is what you want.
If you have the names (discrete variable) for the x-axis in a column, and another column with the variable you want to sum and plot in y for each name, try:
ggplot(data = mpg) +
geom_col(aes(x = manufacturer, y = hwy))
You can compute the summed values with this code:
library(tidyverse)
mpg %>% group_by(manufacturer) %>% summarize(total = sum(hwy))
So for your case, if you have a column with the names you want in the x-axis, and another with the values you want the code to sum for each name, use
ggplot(data = your_data_frame) +
geom_col(aes(x = your_names, y = values_to_be_summed_for_each_name))
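Applied to the column names from the question, that template might look like the sketch below. The data frame here is a hypothetical stand-in for the Excel sheet: only the two column names come from the question, and the sample names and values are invented.

```r
library(ggplot2)

# Hypothetical stand-in for the Excel sheet; only the two column names
# come from the question, the values are invented.
ZOTU <- data.frame(
  `OTU ID`    = c("S1", "S2", "S3"),
  `Sum ZOTUs` = c(120, 85, 240),
  check.names = FALSE  # keep the spaces in the column names
)

# Backticks refer to columns whose names contain spaces;
# geom_col() uses the y values as bar heights.
p <- ggplot(ZOTU, aes(x = `OTU ID`, y = `Sum ZOTUs`)) +
  geom_col()

print(p)
```

A data frame returned by read_excel() keeps the spaces in column names, so the same backtick syntax should carry over.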
I want to make an area plot with ggplot(mpg, aes(x=year,y=hwy, fill=manufacturer)) + geom_area(), but I get this:
I'm really new to the R world. Can anyone explain why it does not fill the area between the lines? Thanks!
First of all, there's nothing wrong with your code. It's working as intended and you are correct in the syntax required to do what you are looking to do.
Why don't you get the area geom to plot correctly, then? The simple answer is that you don't have enough points to draw a proper line between your x values for all of the aesthetics (manufacturers). Try the geom_point plot and you'll see what I mean:
ggplot(mpg, aes(x=year,y=hwy)) + geom_point(aes(color=manufacturer))
You need a different dataset. Here's a dummy one that is simply two lines with different slopes. It works as expected because each of the aesthetics has y values which span the x labels:
# dummy dataset
df <- data.frame(
x=rep(1:10,2),
y=c(seq(1,10,length.out=10), seq(1,5,length.out=10)),
z=c(rep('A',10), rep('B', 10))
)
# plot
ggplot(df, aes(x,y)) + geom_area(aes(fill=z))
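If the mpg data itself has to be used, one option (my suggestion, not part of the answer above) is to collapse it to a single hwy value per manufacturer and year before plotting. The sketch below uses the mean, which is an arbitrary choice of summary; the name avg is made up for the example.

```r
library(ggplot2)

# One mean hwy value per year and manufacturer, so each manufacturer
# contributes exactly one y value at each x position.
avg <- aggregate(hwy ~ year + manufacturer, data = mpg, FUN = mean)

p <- ggplot(avg, aes(x = year, y = hwy, fill = manufacturer)) +
  geom_area()

print(p)
```

Because mpg only contains two model years, the result is a set of stacked bands between two x positions, not a detailed area chart.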
I am an R beginner (first semester; we use this programme for univariate statistics) and am currently struggling with plotting the outcome of my glm(). I have read quite a few threads and help files on the internet, but I have two problems: 1) I don't understand the advice because it is too advanced, or 2) I understand the advice, but when I replicate the code it doesn't work.
I think I am close to the solution, but my curve doesn't work how it is supposed to. Can anyone tell me what I am doing wrong?
new.data<-data.frame(x=rnorm(50,0,1), y=c("yes", "no"))
mock_model<-glm(y~x, data=new.data, family=binomial)
x1<-seq(min(new.data$x), max(new.data$x), 0.01)
y1<-predict(mock_model, list(x=x1), type="response")
plot(new.data$x, new.data$y, xlab="numeric var", ylab="binary var")
points(x1, y1)
I am new to coding and this platform, so apologies in advance if the information I have provided is not sufficient.
Any advice would be greatly appreciated.
Here's an example using mtcars and the ggplot2 package. The syntax of ggplot2 works roughly like this: You begin a plot with the ggplot() command, within which you can (but don't have to) define aesthetics (the aes() option), which include selection of axis variables, but can also contain options to change the visuals, like colors, linewidths etc. If you define the axis variables within ggplot(), don't forget to put the data assignment (see example below) outside of aes().
Afterwards, you add layers of geoms to plot specific things, like data points with geom_point(), lines with geom_line() or a lot of other fun things. When you want to use the variables and data assigned in the ggplot() command, just leave the geom empty (apart from any visual aes() options you want to use for that specific geom). However, you can define new data and variables for a geom, for example to use different data sources in the same plot.
data(mtcars)
model_shift <- glm(am ~ mpg, data = mtcars, family = 'binomial')
x <- seq(min(mtcars$mpg), max(mtcars$mpg), .1)
y <- predict(model_shift, list(mpg = x), type = 'response')
plot_data <- data.frame(mpg = x, am = y)
library(ggplot2)
ggplot(aes(x = mpg, y = am), data = plot_data) +
geom_point()
Or with a line instead of points:
ggplot(aes(x = mpg, y = am), data = plot_data) +
geom_line()
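The same pattern can be tried on the mock data from the question. This is a sketch under a few assumptions of mine: the character column y is recoded to 0/1 before fitting (the new column name y01 is made up), a seed is set so the random data is repeatable, and the raw points and the fitted curve are drawn in one plot.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random data is repeatable
new.data <- data.frame(x = rnorm(50, 0, 1), y = c("yes", "no"))

# glm() with family = binomial needs a 0/1 or factor response,
# so recode the character column (this recoding is an addition).
new.data$y01 <- as.numeric(new.data$y == "yes")
mock_model <- glm(y01 ~ x, data = new.data, family = binomial)

# Predicted probabilities on a fine grid of x values
curve_data <- data.frame(x = seq(min(new.data$x), max(new.data$x), 0.01))
curve_data$y01 <- predict(mock_model, newdata = curve_data, type = "response")

# Raw 0/1 observations as points, fitted probabilities as a line
p <- ggplot(new.data, aes(x = x, y = y01)) +
  geom_point() +
  geom_line(data = curve_data) +
  labs(x = "numeric var", y = "binary var")

print(p)
```

Since y here is unrelated to x, the fitted curve is expected to be nearly flat; with real data it would take the familiar S shape.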
To get a glimpse of the seemingly endless possibilities of ggplot2, have a look at these 'Top 50' ggplot2 visualizations. To learn the package-specific language, see this tutorial or check your university's library for Hadley Wickham's book ggplot2: elegant graphics for data analysis.
I feel like I am asking a totally silly question, but I can't force ggplot to show the legend for line colours.
The thing is that I have two data frames with the same data: the first data.frame represents new data (plus additional numbers) and the second represents the old data. I am trying to compare new and old data, so to understand which is which I need to see the legend. I have tried to use scale_colour_manual, but the legend still doesn't appear.
I have read a number of answers to similar questions and none of them worked or led to anything better. You can see a simple example of my problem below:
rm(list = ls())
library(ggplot2)
xnew<-3:10
y<-5:12
xold<-4:11
years<-2000:2007
xfact<-rep("x", times=8)
yfact<-rep("y", times=8)
Newdata<-data.frame(indicator=c(xfact,yfact),Years=c(years,years), data=c(xnew,y))
Olddata<-data.frame(indicator=xfact,Years=c(years), data=xold)
graph<-ggplot(mapping=aes(Years, data, group=1)) +
geom_line(,Newdata[Newdata=="x",], size=1.5, colour="lightblue")+
geom_line(,Olddata[Olddata=="x",], size=1.5, colour="orange")+
ggtitle("OLD vs NEW")+
scale_colour_manual(name="Legend", values=c("New"="lightblue", "Old"="orange"))
the result is without the legend.
Thanks for all the help I have already found on this website and thank you in advance for helping to solve this problem.
Legends are created in ggplot by mapping aesthetics to a single variable. Your mistake is that you're trying to set colors manually in each layer.
Newdata$type <- "New"
Olddata$type <- "Old"
all_data <- rbind(Newdata,Olddata)
ggplot(data = all_data[all_data$indicator == "x", ], aes(x = Years, y = data, colour = type)) +
geom_line() +
ggtitle("OLD vs NEW") +
scale_colour_manual(name="Legend", values=c("New"="lightblue", "Old"="orange"))
There are countless examples illustrating this basic technique in ggplot here.
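If the two data frames need to stay separate, a commonly used alternative is to map colour to a constant label inside aes() in each layer. The sketch below rebuilds simplified versions of the question's data (only the "x" indicator rows) and is meant as an illustration, not as the only correct approach.

```r
library(ggplot2)

# Simplified versions of the question's data frames ("x" indicator only)
years   <- 2000:2007
Newdata <- data.frame(Years = years, data = 3:10)
Olddata <- data.frame(Years = years, data = 4:11)

# A constant string inside aes() is mapped rather than set, so it gets
# a legend entry that scale_colour_manual() can then colour.
graph <- ggplot(mapping = aes(x = Years, y = data)) +
  geom_line(data = Newdata, aes(colour = "New")) +
  geom_line(data = Olddata, aes(colour = "Old")) +
  ggtitle("OLD vs NEW") +
  scale_colour_manual(name = "Legend",
                      values = c("New" = "lightblue", "Old" = "orange"))

print(graph)
```

The key difference from the question's code is that colour sits inside aes(); a colour given outside aes() is a fixed setting and never produces a legend.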
I am trying to make a bar plot of a time-series dataset with ggplot2, but I get the following error message (I have run the same code on a similar dataset and it works):
Error in if (!is.null(data$ymin) && !all(data$ymin == 0)) warning("Stacking not well defined when ymin != 0", : missing value where TRUE/FALSE needed
For this I used the following code:
p <- ggplot(dataset, aes(x=date, y=value)) + geom_bar(stat="identity")
If I use geom_point() instead of geom_bar() it works fine.
You haven't provided a reproducible example, so I'm just guessing, but your syntax doesn't look right to me. Check here: http://docs.ggplot2.org/current/geom_bar.html
Bar charts by default produce tabulations of counts:
p <- ggplot( dataset, aes( factor(date) ) ) + geom_bar()
If you want it to do something different, you'll need to tell it what statistic to use. See the link above (towards the bottom) for an example using the mean. Alternatively, see here for a hybrid point/scatterplot (very bottom of the page):
http://docs.ggplot2.org/current/position_jitter.html
But fundamentally you have two continuous variables and it's not clear to me why you'd want anything but a scatterplot.
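One thing that can be checked without the original data: the condition in the error message evaluates to NA when the values it compares include a missing value, and if() cannot branch on NA. Whether missing values in value are the actual cause here is only a guess, since the dataset was not posted. The sketch below uses an invented data frame and assumes a ggplot2 version that provides geom_col().

```r
library(ggplot2)

# all() gives NA when every known comparison is TRUE but one is missing,
# which is what makes if() fail with "missing value where TRUE/FALSE needed".
all(c(0, NA) == 0)

# Hypothetical data with one missing value (values are invented)
dataset <- data.frame(
  date  = as.Date("2020-01-01") + 0:4,
  value = c(3, 5, NA, 4, 6)
)

# Drop rows with a missing value before drawing the bars
clean <- dataset[!is.na(dataset$value), ]

p <- ggplot(clean, aes(x = date, y = value)) +
  geom_col()

print(p)
```

Running sum(is.na(dataset$value)) on the real data would show quickly whether this explanation applies.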