labels right next points in gg plot - r

Ive tried to google my way to the answere to the question, but none seems to give the answer to what im trying to do.
My goal is to add legends right besides the observations within the plot to show the name of the observation. Name of observations are located in the first column of my data frame.
ggplot(data = coef.vec)+aes(x = coef.x, y = variable.mean) +
geom_point()

You can use labels with geom_text() in next style. I have used simulated data:
library(tidyverse)
#Code
data <- data.frame(group=paste0('Obs',1:10),
coef.x=rnorm(10,0,1),
variable.mean=runif(10,0.015,0.05),stringsAsFactors = F)
#Plot
ggplot(data,aes(x=coef.x,y=variable.mean))+
geom_point()+
geom_text(aes(label=group),hjust=-0.15)
Output:

Related

Making a geom_bar from a dataframe in R

Background
I have a dataframe, df, of athlete injuries:
df <- data.frame(number_of_injuries = c(1,2,3,4,5,6),
number_of_people = c(73,52,43,12,7,2),
stringsAsFactors=FALSE)
The Problem
I'd like to use ggplot2 to make a bar chart or histogram of this simple data using geom_bar or geom_histogram. Important point: I'm pretty novice with ggplot2.
I'd like something where the x-axis shows bins of the number of injuries (number_of_injuries), and the y-axis shows the counts in number_of_people. Like this (from Excel):
What I've tried
I know this is the most trivial dang ggplot issue, but I keep getting errors or weird results, like so:
ggplot(df, aes(number_of_injuries)) +
geom_bar(stat = "count")
Which yields:
I've been in the tidyverse reference website for an hour at this and I can't crack the code.
It can cause confusion from time to time. If you already have "count" statistics, then do not count data using geom_bar(stats = "count") again, otherwise you simply get 1 in all categories. You want to plot those values as they are with geom_col:
ggplot(df, aes(x = number_of_injuries, y = number_of_people)) + geom_col()

R ggplot only shows one bar

I am just starting to work with R, so apologies if my question is too basic,
I have an excel sheet , here's the link: https://file.io/LfsAOdDCVnFq
where I am trying to plot a simple bar plot as follows:
X = I want it to be my sample names , the column called OTU ID in the file
Y = I want it to be the sum of my variables for each sample, column called Sum ZOTUs in the file
so far, I have installed and called library of ggplot2 and tried to plot my data frame but when I do that it only shows one bar, and I don't know what is wrong
install.packages("readxl")
install.packages("ggplot2")
library(readxl)
library(ggplot2)
ZOTU <- read_excel(file.choose())
ggplot(data=ZOTU, aes(x="OTU ID")) + geom_bar ()
and it shows the plot below:
can anyone help how to fix this?
Thanks
I can't see your uploaded image with the excel sheet screenshot.
My guess would be using quotation marks instead of backticks. Try running this code:
ggplot(data = ZOTU, aes(x = `OTU ID`)) + geom_bar()
First
Your question can be better formulated, please read how to ask a good question and how to create a minimal example to understand the basics of a workable question.
In R, you have a very good tool for creating reproducible examples: the reprex package
Also, I would not download anything from a given link in a random question in StackOverflow, and neither should you.
Try
Execute this code in your computer, and see if it helps you understand how ggplot works:
library(ggplot2) # load ggplot
mpg # let's look at a 'mpg' data included in the ggplot package
# Now, a simple bar plot
ggplot(mpg, aes(x = fl)) +
geom_bar()
We use the mpg data as the data for our figure, and we set the x-axis to be the fl column of that data. Finally, we "add" a bar plot to the figure.
By default, the bar plot will plot the count of the different values present in the column you passed as x-axis.
After comments
Following our discussion in the comment section, maybe this is what you want.
If you have the names (discrete variable) for the x-axis in a column, and another column with the variable you want to sum and plot in y for each name, try:
ggplot(data = mpg) +
geom_col(aes(x = manufacturer, y = hwy))
You can have the values with the code
library(tidyverse)
mpg %>% group_by(manufacturer) %>% summarize(total = sum(hwy))
So for your case, if you have a column with the names you want in the x-axis, and another with the values you want the code to sum for each name, use
ggplot(data = your_data_frame) +
geom_col(aes(x = your_names, y = values_to_be_summed_for_each_name))

How to incorporate data into plot which was constructed in ggplot2 using data from another file (R)?

Using a dataset, I have created the following plot:
I'm trying to create the following plot:
Specifically, I am trying to incorporate Twitter names over the first image. To do this, I have a dataset with each name in and a value that corresponds to a point on the axes. A snippet looks something like:
Name Score
#tedcruz 0.108
#RealBenCarson 0.119
Does anyone know how I can plot this data (from one CSV file) over my original graph (which is constructed from data in a different CSV file)? The reason that I am confused is because in ggplot2, you specify the data you want to use at the start, so I am not sure how to incorporate other data.
Thank you.
The question you ask about ggplot combining source of data to plot different element is answered in this post here
Now, I don't know for sure how this is going to apply to your specific data. Here I want to show you an example that might help you to go forward.
Imagine we have two data.frames (see bellow) and we want to obtain a plot similar to the one you presented.
data1 <- data.frame(list(
x=seq(-4, 4, 0.1),
y=dnorm(x = seq(-4, 4, 0.1))))
data2 <- data.frame(list(
"name"=c("name1", "name2"),
"Score" = c(-1, 1)))
The first step is to find the "y" coordinates of the names in the second data.frame (data2). To do this I added a y column to data2. y is defined here as a range of points from the may value of y to the min value of y with some space for aesthetics.
range_y = max(data1$y) - min(data1$y)
space_y = range_y * 0.05
data2$y <- seq(from = max(data1$y)-space, to = min(data1$y)+space, length.out = nrow(data2))
Then we can use ggplot() to plot data1 and data2 following some plot designs. For the current example I did this:
library(ggplot2)
p <- ggplot(data=data1, aes(x=x, y=y)) +
geom_point() + # for the data1 just plot the points
geom_pointrange(data=data2, aes(x=Score, y=y, xmin=Score-0.5, xmax=Score+0.5)) +
geom_text(data = data2, aes(x = Score, y = y+(range_y*0.05), label=name))
p
which gave this following plot:

Reordering data based on a column in [r] to order x-value items from lowest to highest y-values in ggplot

I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
which produces this:
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had succesfully reorganised the data then the y-values would go up as the plot moves along the x-value. So maybe i'm focussing on the wrong part of the code to achieve this in a ggplot figure?
You could do something like this?
library(tidyverse);
cor.data %>%
mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
Rather than try to order the data before creating the plot, I can reorder the data at the time of writing the plot:
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- This line controls order points drawn created to make (slightly) more readible plot
gplot(cor.data.sorted,aes(x=reorder(pic,r.val),y=r.val,size=df.val,color=exp)) + geom_point()
to create

Is it possible to create 3 series (2 lines and one point) faceted plot in ggplot?

I am trying to write a code that I wrote with a basic graphics package in R to ggplot.
The graph I obtained using the basic graphics package is as follows:
I was wondering whether this type of graph is possible to create in ggplot2. I think we could create this kind of graph by using panels but I was wondering is it possible to use faceting for this kind of plot. The major difficulty I encountered is that maximum and minimum have common lengths whereas the observed data is not continuous data and the interval is quite different.
Any thoughts on arranging the data for this type of plot would be very helpful. Thank you so much.
Jdbaba,
From your comments, you mentioned that you'd like for the geom_point to have just the . in the legend. This is a feature that is yet to be implemented to be used directly in ggplot2 (if I am right). However, there's a fix/work-around that is given by #Aniko in this post. Its a bit tricky but brilliant! And it works great. Here's a version that I tried out. Hope it is what you expected.
# bind both your data.frames
df <- rbind(tempcal, tempobs)
p <- ggplot(data = df, aes(x = time, y = data, colour = group1,
linetype = group1, shape = group1))
p <- p + geom_line() + geom_point()
p <- p + scale_shape_manual("", values=c(NA, NA, 19))
p <- p + scale_linetype_manual("", values=c(1,1,0))
p <- p + scale_colour_manual("", values=c("#F0E442", "#0072B2", "#D55E00"))
p <- p + facet_wrap(~ id, ncol = 1)
p
The idea is to first create a plot with all necessary attributes set in the aesthetics section, plot what you want and then change settings manually later using scale_._manual. You can unset lines by a 0 in scale_linetype_manual for example. Similarly you can unset points for lines using NA in scale_shape_manual. Here, the first two values are for group1=maximum and minimum and the last is for observed. So, we set NA to the first two for maximum and minimum and set 0 to linetype for observed.
And this is the plot:
Solution found:
Thanks to Arun and Andrie
Just in case somebody needs the solution of this sort of problem.
The code I used was as follows:
library(ggplot2)
tempcal <- read.csv("temp data ggplot.csv",header=T, sep=",")
tempobs <- read.csv("temp data observed ggplot.csv",header=T, sep=",")
p <- ggplot(tempcal,aes(x=time,y=data))+geom_line(aes(x=time,y=data,color=group1))+geom_point(data=tempobs,aes(x=time,y=data,colour=group1))+facet_wrap(~id)
p
The dataset used were https://www.dropbox.com/s/95sdo0n3gvk71o7/temp%20data%20observed%20ggplot.csv
https://www.dropbox.com/s/4opftofvvsueh5c/temp%20data%20ggplot.csv
The plot obtained was as follows:
Jdbaba

Resources