How to make a single histogram from 3 columns in R? [duplicate] - r

This question already has answers here:
Plotting two variables as lines using ggplot2 on the same graph
(5 answers)
Closed 5 years ago.
This is my first time using ggplot2. I have a table with 3 columns and I want to plot the frequency distribution of all three columns in one figure. I have only used hist() before, so I am a little lost with ggplot2. Here is an example of my table: tab-separated, with the column headers A, B, C.
A B C
1.38502 1.38502 -nan
0.637291 0.753084 1.55556
0.0155242 0.0164394 -nan
3.29355 1.15757 -nan
1.00254 1.10108 0.132039
0.0155424 0.0155424 nan
0.760261 0.681639 0.298851
1.21365 1.21365 -nan
1.216 1.22541 -nan
0.61317 0.738528 0.585657
0.618276 0.940312 0.820591
1.96779 1.31051 1.58609
0.725413 2.29621 1.78989
0.684681 0.67331 0.290221
I used the following code, based on similar posts, but I end up with an error.
library(ggplot2)
dnds <- read.table('dNdS_plotfile', header =TRUE)
ggplot(data=dnds, melt(dnds), aes_(value, fill = L1))+
geom_histogram()
ERROR:No id variables; using all as measure variables
Error: Mapping should be created with aes() or aes_().
I am really lost on how to solve this error. I want one figure with three different colored histograms that do not overlap in my final figure. Please help me achieve this. Thank you.

This should accomplish what you're looking for. I like to load the package tidyverse, which loads a bunch of helpful packages like ggplot2 and dplyr.
In geom_histogram(), you can specify the bin width of the histograms with the argument binwidth, or the number of bins with the argument bins. If you also want the bars not to be stacked, you can use the argument position = "dodge".
See the documentation here: http://ggplot2.tidyverse.org/reference/geom_histogram.html
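library(tidyverse)
# Read the table; the first row holds the column names
data <- read.table("YOUR_DATA", header = TRUE)
# Reshape from wide to long: one row per (category, value) pair
graph <- data %>%
  gather(category, value)
# Fill colour identifies the column each value came from
ggplot(graph, aes(x = value, fill = category)) +
  geom_histogram(binwidth = 0.5, color = "black")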
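Since the goal is histograms that do not overlap, one option is to give each column its own panel. The sketch below is not part of the answer above: it inlines only the first five rows of the sample table, assumes the nan and -nan entries should be treated as missing (via na.strings), and reshapes with base R's stack() rather than gather(). The names dnds and long are illustrative.

```r
# First five rows of the sample table from the question, inlined for illustration
dnds <- read.table(header = TRUE, na.strings = c("NA", "nan", "-nan"), text = "
A B C
1.38502 1.38502 -nan
0.637291 0.753084 1.55556
0.0155242 0.0164394 -nan
3.29355 1.15757 -nan
1.00254 1.10108 0.132039
")

# Wide to long: 'values' holds the numbers, 'ind' the source column (A, B or C)
long <- stack(dnds)
long <- long[!is.na(long$values), ]  # drop the missing entries

# Plot only if ggplot2 is installed; one panel per column, so nothing overlaps
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = values, fill = ind)) +
    geom_histogram(binwidth = 0.5, color = "black") +
    facet_wrap(~ ind, ncol = 1)
  print(p)
}
```

Replacing facet_wrap() with position = "dodge" inside geom_histogram() would instead place the three sets of bars side by side in a single panel.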

Related

Issues creating a line chart in R. Group aesthetic error [duplicate]

This question already has an answer here:
ggplot each group consists of only one observation
(1 answer)
Closed 8 months ago.
Below is the sample data. I am trying to draw two lines with different colors. It seems pretty simple, but I am running into the error below. I have two questions. First, how do I get around this error? Second, how would I edit the legend so that it says "Hires" instead of "HI"?
"geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?"
library(ggplot2)
measure <- c("HI","HI","HI","HI","HI","JO","JO","JO","JO","JO")
date <- c("2002-01","2002-02","2002-03","2002-04","2002-05","2002-01","2002-02","2002-03","2002-04","2002-05")
value <- c(100,105,95,145,110,25,35,82,75,90)
df <- data.frame(measure,date,value)
graph <- df %>% ggplot (aes(x=date, y= value, color = measure)) + geom_line () + theme (legend.position = "bottom",legend.title = element_blank())
print(graph)
It's asking for a group, so you can give it one:
ggplot(df, aes(x = date, y = value, group = measure, color = measure)) + geom_line()
It's a bit surprising that it's not already grouped, and I'm not exactly sure why, but the above change appears to produce the result you want.
If you're interested in why it's asking for a group, I'd recommend simplifying and reformatting your example, and then asking as a separate question.
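A likely reason for the message, offered as an explanation rather than something stated in the answer: date is stored as text, so ggplot2 treats it as discrete, and by default it groups by the combination of all discrete variables in the plot, which leaves one point per group. The sketch below shows an alternative that avoids the explicit group: convert date to a real Date (assuming each "YYYY-MM" string stands for the first day of that month). It also relabels HI as "Hires" in the data before plotting, which changes the legend text. JO is left as is because the question does not say what it stands for.

```r
measure <- c("HI", "HI", "HI", "HI", "HI", "JO", "JO", "JO", "JO", "JO")
date <- c("2002-01", "2002-02", "2002-03", "2002-04", "2002-05",
          "2002-01", "2002-02", "2002-03", "2002-04", "2002-05")
value <- c(100, 105, 95, 145, 110, 25, 35, 82, 75, 90)
df <- data.frame(measure, date, value)

# Assumption: each "YYYY-MM" string stands for the first day of that month
df$date <- as.Date(paste0(df$date, "-01"))

# Relabel HI as "Hires"; JO keeps its code
df$measure <- factor(df$measure, levels = c("HI", "JO"), labels = c("Hires", "JO"))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # With a continuous x axis, color = measure is enough to get one line per measure
  graph <- ggplot(df, aes(x = date, y = value, color = measure)) +
    geom_line() +
    theme(legend.position = "bottom", legend.title = element_blank())
  print(graph)
}
```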

R: creating a likert scale barplot

I'm new to R and feeling a bit lost ... I'm working on a dataset which contains 7 point-likert-scale answers.
My data looks like this for example:
My goal is to create a barplot which displays the Likert scale on the x-axis and the frequency on the y-axis.
What I understood so far is that I first have to transform my data into a frequency table. For this I used a code that I found in another post on this site:
data <- factor(data, levels = c(1:7))
table(data)
However I always get this output:
data
1 2 3 4 5 6 7
0 0 0 0 0 0 0
Any ideas what went wrong or other ideas how I could realize my plan?
Thanks a lot!
Lorena
This is a very simple way of handling your question, only using base-R
## your data
my_obs <- c(4,5,3,4,5,5,3,3,3,6)
## use a factor for class data
## you could consider making it ordered (ordinal data)
## which makes sense for Likert data
## type "?factor" in the console to see the documentation
my_factor <- factor(my_obs, levels = 1:7)
## calculate the frequencies
my_table <- table(my_factor)
## print my_table
my_table
# my_factor
# 1 2 3 4 5 6 7
# 0 0 4 2 3 1 0
## plot
barplot(my_table)
yielding the following simple barplot:
Please, let me know whether this is what you want
Lorena!
First, there's no need to apply either factor() or table() to the dataset you showed. From what I gather, it looks fine.
R comes with some interesting plotting options; hist() is one of them.
Histogram with hist()
In the following example, I'll use the "Valenz" variable, as named in your dataset.
To get the frequency without needing to beautify it, you can simply ask:
hist(dataset$Valenz)
hist() takes a numeric vector as its first argument; dataset$Valenz extracts the Valenz column from dataset, so this tells hist() both where the values are and which ones to use.
If you only want to know the frequency, without having to inform it in some elegant way, that oughta do it (:
Histogram with ggplot()
If you want to make it prettier, you can style your plot with the ggplot2 package, one of the most used packages in R.
First, install and then load the package.
install.packages("ggplot2")
library(ggplot2)
Then, create a histogram with the score on the x-axis and the number of times each score occurred on the y-axis.
ggplot(dataset, aes(x = Valenz)) +
  geom_histogram(bins = 7, color = "Black", fill = "White") +
  labs(title = NULL, x = "Name of my variable", y = "Count of 'Variable'") +
  theme_minimal()
ggplot() takes your data frame, then aes() specifies that you want Valenz on the x-axis.
geom_histogram() gives you a histogram with bins = 7 (one bin per option on a 7-point Likert scale), with black bar outlines (color = "Black") and white bar fill (fill = "White").
labs() sets the axis labels: x = "Name of my variable" appears beneath the x-axis and y = "Count of 'Variable'" beside the y-axis.
theme_minimal() makes the plot look cooler.
I hope I helped you in some way, Lorena. (:
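One plausible cause of the all-zero table in the question, assuming data there was a whole data frame rather than a single column: factor() then turns each column into one long string, which matches none of the levels 1 to 7. The sketch below uses a hypothetical data frame named dataset with a Valenz column (the values are made up for illustration) and contrasts the two calls.

```r
# Hypothetical data: 'dataset' and its values are made up for illustration
dataset <- data.frame(Valenz = c(4, 5, 3, 4, 5, 5, 3, 3, 3, 6))

# Passing the whole data frame: nothing matches the levels 1..7, so every count is 0
wrong <- table(factor(dataset, levels = 1:7))

# Passing the column: each answer is matched against the levels 1..7
right <- table(factor(dataset$Valenz, levels = 1:7))
print(right)

# Bar chart with all seven categories on the x-axis, including the empty ones
barplot(right, xlab = "Likert score", ylab = "Frequency")
```

Unlike hist(), this keeps one bar per answer category, so scores nobody chose still show up as empty positions on the axis.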

Reordering variables in stacked bar chart according to value ggplot2 [duplicate]

This question already has answers here:
Stacked barchart, independent fill order for each stack
(3 answers)
Closed 3 years ago.
Updated with sample data:
mydata <- read.table(header=TRUE, text="
Item Value Site
A 96 site1
B 1 site1
C 2 site1
A 1 site2
B 62 site2
A 19 site3
B 1 site3
C 11 site3
D 9 site3
")
What I'm trying to do is plot a stacked bar chart and reorder the item variable differently for each site, sorting it by the value column. So for each site, the item with the biggest percentage value is at the bottom of the stacked bar, followed by the next largest percentage value and so on. However, I've tried different methods and had difficulty arranging by the stacked bars using the value column.
Edit with solution:
Follow the link marked as the answer above, plot each bar individually using geom_bar, and add in the reorder function: aes(fill=reorder(Item, +Value))
Typically, when I need to reorder a character variable for display, it's easiest to convert the variable to a factor. Factors are a bit notorious in R for causing problems, but many of these can be easily overcome with help from the forcats package. In fact, the entire purpose of the package is to make working with factors in R easier.
The function forcats::fct_reorder is for reordering a factor by another variable (based on median values by default). In this particular instance, we are converting the substrate column into a factor, and reordering the substrate factor based on percentage...all in a single call.
library(ggplot2)
library(forcats)
ggplot(data = mydata,
       aes(x = site,
           y = percentage,
           fill = fct_reorder(substrate, percentage))) +
  geom_bar(stat = "identity") +
  guides(fill = guide_legend(title = "substrate"))
Which gives the following:
I prefer to call fct_reorder within my ggplot call, as this does not require changing the underlying data frame.
If you would like to read more about forcats, I suggest this tidyverse site, or if you want to learn more about factors within R in general, start with the chapter on factors in R for Data Science.
Note
This question has changed since my original response. I believe an adequate answer can be found in the linked question.
I think that forcats is a great way, but before we had it, or when you want to reorder the factor with base R only, this is the more traditional approach:
# Set the factor levels in order of increasing percentage
mydata$substrate <- factor(mydata$substrate, levels = unique(mydata$substrate[order(mydata$percentage)]))
ggplot(data = mydata, aes(x = site, y = percentage)) +
  geom_bar(stat = "identity", aes(fill = substrate))
This will work for you:
ggplot(data = mydata, aes(x = site, y = percentage, fill = reorder(substrate, percentage))) +
  geom_bar(stat = "identity", position = "stack")
It uses the reorder command in the fill argument.
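To see what reorder() actually does to the level order, the sketch below applies it to the updated sample data from the question, which uses the columns Item, Value and Site rather than the substrate, percentage and site names in the answers. Base R's reorder() ranks levels by the mean of Value across all rows, so it yields one global order; a per-site order has to be computed on one site's rows at a time, which is consistent with the question's edit about plotting each bar individually. The names global_order and site3 are illustrative.

```r
mydata <- read.table(header = TRUE, text = "
Item Value Site
A 96 site1
B 1 site1
C 2 site1
A 1 site2
B 62 site2
A 19 site3
B 1 site3
C 11 site3
D 9 site3
")

# Mean Value per Item over all sites: A = 38.67, B = 21.33, C = 6.5, D = 9
global_order <- reorder(mydata$Item, mydata$Value)
print(levels(global_order))  # levels sorted by increasing mean: C, D, B, A

# Per-site order, largest Value first, computed on one site's rows only
site3 <- mydata[mydata$Site == "site3", ]
print(site3$Item[order(site3$Value, decreasing = TRUE)])  # A, C, D, B
```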

same bar width in ggplot2? [duplicate]

This question already has answers here:
A way to always dodge a histogram? [duplicate]
(2 answers)
Closed 8 years ago.
In this example:
library(ggplot2)
dat <- data.frame(a=factor(c(1,1,1,2,2,2,3,3,3,4)), b=c("A","B","D","A","B","C","A","B","D",NA), c=c(1,4,3,5,5,1,2,2,8,6))
plot <- ggplot(dat,aes(fill=b,x=a,y=c))
plot + geom_bar(width=.7, position=position_dodge(width=.7), stat = "identity")
factor 4 is wider than the other bars. Is there a way to make them all the same width?
Ideally you should have data for every combination, even if the value is zero. That means that for level 1 of a you should have a row for each of A, B, C and D, and likewise for the other levels. Try modifying your data frame like this and plotting it. The NA category is called "other" here.
library(ggplot2)
dat <- data.frame(a = factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)),
                  b = c("A","B","C","D","other","A","B","C","D","other","A","B","C","D","other","A","B","C","D","other"),
                  c = c(1,4,0,3,0,5,5,1,0,0,2,2,0,8,0,0,0,0,0,6))
plot <- ggplot(dat, aes(fill = b, x = a, y = c))
plot + geom_bar(width = .7, position = position_dodge(width = .7), stat = "identity")
View this data frame and you will see the difference. You will obviously have gaps where the zero-valued bars are, which doesn't look good, but I'm afraid this might be the only solution.
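Typing out every combination by hand gets tedious, so the padding described above can also be generated. This is a sketch in base R and not part of the original answer; the names grid and full are illustrative, and NA in b is relabelled "other" as in the answer.

```r
dat <- data.frame(a = as.character(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4)),
                  b = c("A", "B", "D", "A", "B", "C", "A", "B", "D", NA),
                  c = c(1, 4, 3, 5, 5, 1, 2, 2, 8, 6),
                  stringsAsFactors = FALSE)
dat$b[is.na(dat$b)] <- "other"

# Every combination of a and b: 4 x 5 = 20 rows
grid <- expand.grid(a = unique(dat$a), b = sort(unique(dat$b)),
                    stringsAsFactors = FALSE)

# Left join onto the observed data; combinations that were absent get c = 0
full <- merge(grid, dat, by = c("a", "b"), all.x = TRUE)
full$c[is.na(full$c)] <- 0

# Plot only if ggplot2 is installed, using the same call as in the answer
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(full, aes(fill = b, x = a, y = c)) +
    geom_bar(width = .7, position = position_dodge(width = .7), stat = "identity")
  print(p)
}
```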

Plot multiple histograms in one using ggplot2 in R

I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.
My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).
Date A B C D
01-01-2011 11 0 11 1
08-01-2011 12 0 3 3
15-01-2011 9 0 2 6
I want the Dates as the x axis and the counts plotted as different colors according to a generic y axis.
I am able to plot just one of the categories at a time, but am not able to find an example like mine.
This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.
ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") +
labs(title = "Number in Category A") +
ylab("Number") +
xlab("Date") +
theme(axis.text.x = element_text(angle = 90))
Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.
I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data.
This is some of the help I looked at:
Creating a histogram with multiple data series using multhist in R
http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/
I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.
Any help would be appreciated.
Thanks in advance!
You're not really plotting histograms, you're just plotting a bar chart that looks kind of like a histogram. I personally think this is a good case for faceting:
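library(ggplot2)
library(reshape2)  # for melt()
# Long format: one row per Date and category
melt_df <- melt(df, id.vars = "Date")
head(melt_df)  # so you can see it
# The counts are already computed, so the bar heights come from stat = "identity"
ggplot(melt_df, aes(Date, value, fill = Date)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ variable)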
However, I think that, in general, changes over time are much better represented by a line chart:
ggplot(melt_df, aes(Date, value, group = variable, color = variable)) + geom_line()
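If the four categories should sit side by side in one panel, as the question asks, a dodged bar chart is another option. The sketch below rebuilds the three sample rows from the question, reshapes them with base R's stack() instead of melt(), and uses position = "dodge". The data frame name long is illustrative. Setting width = 1 is one way that should reduce the gaps between the date groups, since the bars sit on a discrete axis.

```r
df <- data.frame(Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
                 A = c(11, 12, 9), B = c(0, 0, 0), C = c(11, 3, 2), D = c(1, 3, 6))

# Wide to long: 'values' holds the counts, 'ind' the category (A to D)
long <- data.frame(Date = rep(df$Date, times = 4),
                   stack(df[c("A", "B", "C", "D")]))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Date, y = values, fill = ind)) +
    geom_bar(stat = "identity", position = "dodge", width = 1) +
    labs(x = "Date", y = "Number", fill = "Category") +
    theme(axis.text.x = element_text(angle = 90))
  print(p)
}
```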
