Improve the autoBinning Histogram - r

I'm making an auto-binning histogram for the second time, but it looks elementary, and I'd like help improving it.
Here is what I have tried:
DAta <- read.table(text="Species DNA LINE LTR SINE Helitron Unclassified Unmasked
darius 2.68 10.37 18.00 1.52 3.64 0.03 63.79
Derian 2.74 10.59 16.61 1.56 4.24 0.03 64.23
rats 2.77 10.97 15.20 1.57 4.69 0.03 64.77
Mouos 2.53 10.42 17.33 1.42 3.68 0.02 64.6", header=TRUE)
library(reshape2)
DF1 <- melt(DAta, id.var="Species")
library(ggplot2)
ggplot(DF1, aes(x = Species, y = value, fill = variable)) +
geom_bar(stat = "identity")
Output:
How can I make the species names italic?
How can I keep the bars in the same order as the input, from left to right (darius, Derian, rats and Mouos)?
How can I make the colours and style look better?

There are 3 questions here:
To change the axis labels to italics, adjust the axis.text.x theme element; see the question referenced at the bottom.
To change the ordering of the axis labels, convert Species to a factor whose levels are listed in the desired order.
Finally, to change the color scheme, use one of the scale_fill_ functions. I like the ColorBrewer palettes available through scale_fill_brewer, which include several good color schemes. There are a few other predefined scale_fill_ options as well.
Note: this is a bar chart, not a histogram.
See the comments for additional details:
DAta <- read.table(text="Species DNA LINE LTR SINE Helitron Unclassified Unmasked
darius 2.68 10.37 18.00 1.52 3.64 0.03 63.79
Derian 2.74 10.59 16.61 1.56 4.24 0.03 64.23
rats 2.77 10.97 15.20 1.57 4.69 0.03 64.77
Mouos 2.53 10.42 17.33 1.42 3.68 0.02 64.6", header=TRUE)
# Updated method to reshape the data: tidyr supersedes reshape2
library(tidyr)
DF1 <- pivot_longer(DAta, cols=-1, names_to = "Classification", values_to = "Value" )
#Set Species as factors defining the order of the labels
DF1$Species<-factor(DF1$Species, levels=c("darius", "Derian", "rats", "Mouos"))
library(ggplot2)
ggplot(DF1, aes(x = Species, y = Value, fill = Classification)) +
geom_bar(stat = "identity") +
scale_fill_brewer(palette = "Pastel1") +
theme(axis.text.x = element_text(face="italic"))
Option: if the number of species or their names can change, here is a way to keep the species in their input order without typing the names:
# Take the level order from the Species column of the original data frame;
# unique() keeps the order of first appearance
DF1$Species <- factor(DF1$Species, levels = unique(DAta$Species))
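The ordering fix relies on factor() using the levels argument, in the order given, as the order of the categories; without it the levels are sorted alphabetically. A minimal base-R sketch using the species names from the question (no plotting, so it runs without ggplot2):

```r
# Species names in the order they appear in the input table
species <- c("darius", "Derian", "rats", "Mouos")

# Default: levels are sorted alphabetically (the exact order depends on the locale)
default_levels <- levels(factor(species))

# Explicit: levels follow the input order, because unique() keeps first appearance
input_levels <- levels(factor(species, levels = unique(species)))
print(input_levels)
```

Since ggplot2 draws a discrete axis in level order, the same levels = unique(...) idea applied to DF1$Species keeps the bars in input order.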
A good reference for adjusting axis labels is the question "Changing font size and direction of axes text in ggplot2".

Related

ggplot2 - include one level of a factor in all facets

I have some time series data that is facet-wrapped by a variable 'treatment'. One of the levels of this 'treatment' factor is a negative control, and I want to include it in every facet.
For example, using the R dataset 'Theoph':
data("Theoph")
head(Theoph)
Subject Wt Dose Time conc
1 1 79.6 4.02 0.00 0.74
2 1 79.6 4.02 0.25 2.84
3 1 79.6 4.02 0.57 6.57
4 1 79.6 4.02 1.12 10.50
5 1 79.6 4.02 2.02 9.66
6 1 79.6 4.02 3.82 8.58
Theoph$Subject <- factor(Theoph$Subject, levels = unique(Theoph$Subject)) # set factor order
ggplot(Theoph, aes(x=Time, y=conc, colour=Subject)) +
geom_line() +
geom_point() +
facet_wrap(~ Subject)
How could I include the data for Subject '1' (the control) in each facet? (Ideally, I would also remove the facet that contains only Subject 1's data.)
Thank you!
To have a certain subject appear in every facet, we need to replicate its data for every facet. We'll create a new column called facet: for Subject != 1, set facet equal to Subject, and replicate the Subject 1 data once for each of the other facet values.
every_facet_data = subset(Theoph, Subject == 1)
individual_facet_data = subset(Theoph, Subject != 1)
individual_facet_data$facet = individual_facet_data$Subject
every_facet_data = merge(every_facet_data,
data.frame(Subject = 1, facet = unique(individual_facet_data$facet)))
plot_data = rbind(every_facet_data, individual_facet_data)
library(ggplot2)
ggplot(plot_data, aes(x=Time, y=conc, colour=Subject)) +
geom_line() +
geom_point() +
facet_wrap(~ facet)
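The same replication can be sketched with lapply() and rbind() instead of merge(), which some readers may find easier to follow. This is a base-R sketch under the assumption that Subject is converted to character first; it only builds the plot data and does not draw anything.

```r
data("Theoph")
df <- as.data.frame(Theoph)             # drop the groupedData classes
df$Subject <- as.character(df$Subject)  # plain labels "1", "2", ...

control <- df[df$Subject == "1", ]
others  <- df[df$Subject != "1", ]
others$facet <- others$Subject          # each other subject gets its own facet

# One copy of the control rows per remaining facet value
facets <- unique(others$facet)
control_copies <- do.call(rbind, lapply(facets, function(f) {
  transform(control, facet = f)
}))

plot_data <- rbind(control_copies, others)
```

plot_data can then go into the same ggplot() call with facet_wrap(~ facet). Because facet is character here, the panels sort as text ("10" before "2"); setting explicit levels, for example factor(plot_data$facet, levels = as.character(2:12)), controls the panel order.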

Plot data from different csv files in one graph using R + ggplot

I have multiple .csv files, each of which has a column (called Data) that I want to compare across files. But first, I have to group the values in a column of each file. In the end I want to have multiple colored "lines" with the mean value of each group in one graph. Below I describe the process I use to get the graph I want. This works for a single file, but I don't know how to add lines from multiple files to one graph using ggplot.
This is what I got so far:
data = read.csv(file="my01data.csv", header=FALSE, sep=",", col.names=c("ID", "Data", "Range"))
A single .csv file looks like the following, but without the header line:
ID Data Range
1,63,5.01
2,61,5.02
3,65,5.00
4,62,4.99
5,62,4.98
6,64,5.01
7,71,4.90
8,72,4.93
9,82,4.89
10,82,4.80
11,83,4.82
10,85,4.79
11,81,4.80
After getting the data I group it with the following lines:
data["Group"] <- NA
data[(data$Range>4.95), "Group"] <- 5.0
data[(data$Range>4.85 & data$Range<4.95), "Group"] <- 4.9
data[(data$Range>4.75 & data$Range<4.85), "Group"] <- 4.8
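The manual thresholds above can also be expressed with cut(). This is a sketch, not the asker's method: cut() intervals are right-closed by default, so a value of exactly 4.85 or 4.95 lands in the lower bin, whereas the strict > and < comparisons above leave such values as NA.

```r
# Some Range values from the question's sample file
Range <- c(5.01, 5.02, 4.90, 4.93, 4.89, 4.80, 4.82, 4.79)

# Intervals: (4.75, 4.85] -> 4.8, (4.85, 4.95] -> 4.9, (4.95, Inf) -> 5.0
bins <- cut(Range, breaks = c(4.75, 4.85, 4.95, Inf),
            labels = c("4.8", "4.9", "5.0"))

# Convert the factor labels back to numbers
Group <- as.numeric(as.character(bins))
print(data.frame(Range, Group))
```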
The final data looks like this:
myTable <- "ID Data Range Group
1 63 5.01 5.00
2 61 5.02 5.00
3 65 5.00 5.00
4 62 4.99 5.00
5 62 4.98 5.00
6 64 5.01 5.00
7 71 4.90 4.90
8 72 4.93 4.90
9 72 4.89 4.90
10 82 4.80 4.80
11 83 4.82 4.80
10 85 4.79 4.80
11 81 4.80 4.80"
myData <- read.table(text=myTable, header = TRUE)
To plot this dataframe I use the following lines:
(pplot <- ggplot(data=myData, aes(x=Group, y=Data)) +
stat_summary(fun = mean, geom = "line", color='red') +
xlab("Group") +
ylab("Data"))
Which results in a graph like this:
I assume you have the names of your .csv-files stored in a vector named file_names. Then you can run the following code and should get a different line for each file:
library(ggplot2)
# The files have no header line, so supply the column names explicitly
data_list <- lapply(file_names, read.csv, header=FALSE, sep=",",
col.names=c("ID", "Data", "Range"))
data_list <- lapply(seq_along(data_list), function(i){
df <- data_list[[i]]
df$Group <- round(df$Range, 1)
df$DataNumber <- i
df
})
finalTable <- do.call(rbind, data_list)
finalTable$DataNumber <- factor(finalTable$DataNumber)
ggplot(finalTable, aes(x=Group, y=Data, group = DataNumber, color = DataNumber)) +
stat_summary(fun = mean, geom = "line") +
xlab("Group") +
ylab("Data")
How it works
First the different datasets are read with read.csv into a list data_list. Then each data.frame in that list is assigned a Group.
I used round here with digits = 1 (its second argument), which rounds to one decimal place (I figured that's what you are doing).
Then also a unique number (in this case simply the index of the list) is assigned to each data.frame. After that the list is combined to one data.frame with rbind and then DataNumber is turned into a factor (prettier for plotting). Finally I added DataNumber as a group and color variable to the plot.
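The steps described above can be checked without any files. This sketch stands in two hypothetical in-memory CSV texts for the files in file_names and replaces the plot with aggregate(), which computes the same per-group means that stat_summary(fun = mean) draws.

```r
# Two hypothetical header-less CSV "files" (stand-ins for file_names)
csv_texts <- c(
  "1,63,5.01\n2,61,5.02\n7,71,4.90\n10,82,4.80",
  "1,60,5.00\n2,70,4.93\n3,80,4.79"
)

data_list <- lapply(seq_along(csv_texts), function(i) {
  df <- read.csv(text = csv_texts[i], header = FALSE,
                 col.names = c("ID", "Data", "Range"))
  df$Group <- round(df$Range, 1)  # one decimal place, e.g. 4.93 -> 4.9
  df$DataNumber <- i              # which file the rows came from
  df
})

finalTable <- do.call(rbind, data_list)
finalTable$DataNumber <- factor(finalTable$DataNumber)

# Mean of Data per Group and file: the points each plotted line connects
group_means <- aggregate(Data ~ Group + DataNumber, data = finalTable, FUN = mean)
print(group_means)
```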
You can add another line by using stat_summary again; set its data and aes arguments to any other dataset:
#some pseudo data for testing
my_other_data <- myData
my_other_data$Data <- my_other_data$Data * 0.5
pplot <- ggplot(data=myData, aes(x=Group, y=Data)) +
stat_summary(fun = mean, geom = "line", color='red') +
stat_summary(data=my_other_data, aes(x=Group, y=Data),
fun = mean, geom = "line", color='green') +
xlab("Group") +
ylab("Data")
pplot
Why not create a classifying column ("Class")?
myTable1$Class <- "table1"
myTable1
#  ID Data Range Group  Class
#1  1   63  5.01  5.00 table1
#2  2   61  5.02  5.00 table1
#3  3   65  5.00  5.00 table1
myTable2$Class <- "table2"
myTable2
#  ID Data Range Group  Class
#1  1   63  5.01  5.00 table2
#2  2   61  5.02  5.00 table2
#3  3   65  5.00  5.00 table2
Then bind the data frames together:
dfBIND <- rbind(myTable1, myTable2)
so that you can plot with a grouping or coloring variable:
pplot <- ggplot(data=dfBIND, aes(x=Group, y=Data, group=Class, color=Class)) +
stat_summary(fun = mean, geom = "line") +
xlab("Group") +
ylab("Data")
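When there are many tables, the Class column can be filled from the names of a list instead of by hand. A base-R sketch; the two small tables are hypothetical stand-ins for myTable1 and myTable2.

```r
# Hypothetical tables standing in for myTable1 and myTable2
tables <- list(
  table1 = data.frame(ID = 1:3, Data = c(63, 61, 65), Group = c(5.0, 5.0, 4.9)),
  table2 = data.frame(ID = 1:3, Data = c(50, 52, 54), Group = c(5.0, 4.9, 4.9))
)

# Tag each table with its list name, then stack them
tagged <- Map(function(df, label) { df$Class <- label; df }, tables, names(tables))
dfBIND <- do.call(rbind, tagged)
rownames(dfBIND) <- NULL  # drop the "table1.1"-style row names
print(table(dfBIND$Class))
```

With Class mapped to color in aes(), each table then gets its own colored line.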

R: Stacked bar graph with two orders of grouping and three columns of data

My colleague and I are trying to create a stacked bar graph that is first grouped by RIL (on the x-axis), then by Trt, where the Trt (treatments) are clumped together and distinguished by colour. We also wish to label each of the bar graphs within each cluster by the treatment Trt.
The stacked bar graphs represent the calculated mean of SW_Before and SW_After (note that in the sample data one RIL, number 206, has more than one row of data).
I originally thought of combining the two columns SW_Before and SW_After; however, the control treatments in Trt have no data for SW_Before and SW_After but must nevertheless be included in the graph. Thus, a third column, SW_Total, is graphed for each of the control clusters by RIL.
I am relatively new to R as well as the realm of data organization so please excuse my amateur capabilities.
Below is a reproducible sample of my data:
data1 <- read.table (text= "Plant RIL Trt SW_Before SW_After SW_Total
1 85 206 Early 0.380 2.27 2.65
2 88 166 Early 0 0.311 0.311
3 92 Lindo Early 0 0.663 0.633
4 94 158 Early 0.0738 0.596 0.669
5 95 23 Early 0.0252 0.543 0.795
6 97 Lica Early 0 0.646 0.646
7 104 166 Peak 0.227 0.261 0.488
8 108 Lica Peak 0.0705 0.816 0.887
9 113 Lindo Late 0.628 0 0.628
10 115 206 Late 0.544 1.05 1.60
11 115 206 Control NA NA 1.50", sep="", header=T)
I realize this graph is more difficult to create than I imagined so any assistance/direction will be most appreciated.
EDIT:
I am now trying to graph the average variable (which includes SW_Total, SW_Before and SW_After) by RIL and Trt. This is my code:
melted1 <- melt(data.baSW, id=c("Plant", "RIL", "Trt"))
melted1 <- subset(melted1, RIL %in% c("158", "166", "206", "23", "Licalla", "Lindo"))
melted1 %>%
group_by(Trt, RIL, variable) %>%
mutate(mean.SW_Total = mean(value)) %>%
ggplot(aes(x = RIL, y = mean.SW_Total, fill = variable)) +
geom_bar(stat = 'identity', position = 'stack') + facet_grid(~ Trt)
EDIT 2
I have updated my code in response to my first EDIT. I believe this is the correct code, but verification would be nice.
melted1 %>%
ggplot(aes(x = RIL, y = value, fill = variable)) +
geom_bar(stat = 'summary', position = 'stack', fun = "mean") + facet_grid(~ Trt)
I am not 100% sure I have interpreted your question correctly, but I think this is close to what you want, adapted from here.
library(reshape2) # for melt
library(tidyverse)
# convert all total values to 0 except that for the control ...
data1 <- data1 %>%
mutate(SW_Total = ifelse(Trt != "Control", 0, SW_Total))
#convert to long format
melted <- melt(data1, id=c("Plant","RIL","Trt"))
ggplot(melted, aes(x = RIL, y = value, fill = variable)) +
geom_bar(stat = 'identity', position = 'stack') + facet_grid(~ Trt)
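To inspect the averaged values before plotting, the melt-and-average step can be sketched in base R. This assumes the sample data from the question; aggregate() drops rows with NA values by default, which here removes the control's empty SW_Before and SW_After entries.

```r
# Sample data from the question (the leading row numbers become row names)
data1 <- read.table(text = "Plant RIL Trt SW_Before SW_After SW_Total
1 85 206 Early 0.380 2.27 2.65
2 88 166 Early 0 0.311 0.311
3 92 Lindo Early 0 0.663 0.633
4 94 158 Early 0.0738 0.596 0.669
5 95 23 Early 0.0252 0.543 0.795
6 97 Lica Early 0 0.646 0.646
7 104 166 Peak 0.227 0.261 0.488
8 108 Lica Peak 0.0705 0.816 0.887
9 113 Lindo Late 0.628 0 0.628
10 115 206 Late 0.544 1.05 1.60
11 115 206 Control NA NA 1.50", header = TRUE)

# Long format without reshape2: one row per plant and measurement
measures <- c("SW_Before", "SW_After", "SW_Total")
long <- do.call(rbind, lapply(measures, function(v) {
  data.frame(RIL = data1$RIL, Trt = data1$Trt, variable = v, value = data1[[v]])
}))

# Mean per RIL, treatment and measurement (NA rows dropped by na.action)
means <- aggregate(value ~ RIL + Trt + variable, data = long, FUN = mean)
print(means)
```

Since SW_Total is roughly SW_Before + SW_After in the sample rows, stacking all three outside the control would count the same weight twice; dropping or zeroing SW_Total there, as the answer does, avoids that.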

How to generate summary information and error bars in R

I have a set of data:
COL1 COL2
1 3.45
2 8.48
1 2.53
2 9.42
2 2.56
etc.
COL1 specifies a category, whereas COL2 is data. For each distinct value in COL1, I'd like to generate the mean, standard deviation, min and max. So in the end I'd have something like this (not real numbers):
COL1VAL MEAN STDDEV
1 4.59 1.24
2 4.75 1.20
I'd also then like to generate a bar chart with error bars, with X axis being the COL1VAL and bar height being the mean.
Can one do this in R, and if so, how?
Here's how you could do those things using packages dplyr and ggplot2, assuming your data frame is called df.
library(dplyr)
dfsummary <- df %>%
group_by(COL1) %>%
summarise(mean = mean(COL2), sd = sd(COL2), min = min(COL2), max = max(COL2))
dfsummary
# COL1 mean sd min max
#1 1 2.99 0.6505382 2.53 3.45
#2 2 6.82 3.7190859 2.56 9.42
library(ggplot2)
ggplot(dfsummary, aes(x = factor(COL1), y = mean)) +
geom_bar(stat = "identity", fill = "lightblue") +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd))
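To reproduce the same four statistics without dplyr, a base-R sketch on the five sample rows shown in the question (the "etc." rows are unknown, so only those five are used):

```r
# The sample rows from the question
df <- data.frame(COL1 = c(1, 2, 1, 2, 2),
                 COL2 = c(3.45, 8.48, 2.53, 9.42, 2.56))

# split() groups COL2 by COL1; sapply() computes the summary for each group
stats <- t(sapply(split(df$COL2, df$COL1), function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}))
dfsummary <- data.frame(COL1 = as.numeric(rownames(stats)), stats)
print(dfsummary)
```

The resulting columns have the same names (mean, sd, min, max) as the dplyr version, so this dfsummary can be passed to the same ggplot() call.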
If you prefer to stay in base R, you could use tapply and arrows:
head(chickwts, 15) # chicken weights by feed type
means <- tapply(X=chickwts$weight, INDEX=chickwts$feed, FUN=mean)
sds <- tapply(X=chickwts$weight, INDEX=chickwts$feed, FUN=sd )
or <- order(means)
bp <- barplot(means[or], ylim=c(0, 390), las=2)
arrows(x0=bp, y0=(means+sds)[or], y1=(means-sds)[or],
code=3, angle=90, length=0.1)
Regards,
Berry

Plot many histograms using a for loop in R

I have a .csv file with data like this:
RI Na Mg Al Si K Ca Ba Fe Type
1 1.51793 12.79 3.50 1.12 73.03 0.64 8.77 0.00 0.00 BWF
2 1.51643 12.16 3.52 1.35 72.89 0.57 8.53 0.00 0.00 VWF
3 1.51793 13.21 3.48 1.41 72.64 0.59 8.43 0.00 0.00 BWF
4 1.51299 14.40 1.74 1.54 74.55 0.00 7.59 0.00 0.00 TBL
5 1.53393 12.30 0.00 1.00 70.16 0.12 16.19 0.00 0.24 BWNF
6 1.51655 12.75 2.85 1.44 73.27 0.57 8.79 0.11 0.22 BWNF
I want to create histograms for the distribution of each of the columns.
I've tried this:
data<-read.csv("glass.csv")
names<-(attributes(data)$names)
for(name in names)
{
dev.new()
hist(data$name)
}
But i keep getting this error: Error in hist.default(data$name) : 'x' must be numeric
I'm assuming that this error is because attributes(data)$names returns a set of strings, "RI" "Na" "Mg" "Al" "Si" "K" "Ca" "Ba" "Fe" "Type"
But I'm unable to convert them to the necessary format.
Any help is appreciated!
You were close. Note also that Type, the last column, is not numeric, so it cannot be passed to hist().
data<-read.csv("glass.csv")
# names(data) is simpler than attributes(data)$names
col_names <- names(data)
is_num <- sapply(data, is.numeric) # TRUE for double and integer columns
for(name in col_names[is_num])
{
dev.new()
hist(data[,name]) # subset with [] not $
}
You could also just loop through the numeric columns directly:
for (column in data[sapply(data, is.numeric)]) {
dev.new()
hist(column)
}
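The same column selection can be sketched with Filter(), and hist(plot = FALSE) computes each histogram without opening a device, which is handy for checking the loop. This uses a small hypothetical subset of the glass data (three rows, three numeric columns plus Type).

```r
# A small subset of the glass data from the question
data <- read.table(text = "RI Na Mg Type
1.51793 12.79 3.50 BWF
1.51643 12.16 3.52 VWF
1.51299 14.40 1.74 TBL", header = TRUE)

# Keep only numeric columns (Type is character and is dropped)
numeric_data <- Filter(is.numeric, data)

# One histogram object per column, computed but not drawn
hists <- lapply(numeric_data, hist, plot = FALSE)
print(names(hists))
```

To draw them instead, loop over names(numeric_data) and call hist(numeric_data[[name]], main = name) inside the loop.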
But ggplot2 is designed to draw many panels at once. Try it like this:
library(ggplot2)
library(reshape2)
ggplot(melt(data),aes(x=value)) + geom_histogram() + facet_wrap(~variable)
Rather than drawing lots of histograms, a better solution is to draw one plot with histograms in panels.
For this, you'll need the reshape2 and ggplot2 packages.
library(reshape2)
library(ggplot2)
First, you'll need to convert your data from wide to long form.
long_data <- melt(data, id.vars = "Type", variable.name = "Element")
Then create a ggplot of the value column (you can change its name by passing value.name = "whatever" in the call to melt above) with histograms in each panel, split by element.
(histograms <- ggplot(long_data, aes(value)) +
geom_histogram() +
facet_wrap(~ Element)
)
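If reshape2 isn't available, base R's stack() gives a similar long format. A sketch on a small hypothetical subset of the glass data; the non-numeric Type column is filtered out first, since stack() would otherwise coerce all values to character.

```r
# A small subset of the glass data from the question
data <- read.table(text = "RI Na Mg Type
1.51793 12.79 3.50 BWF
1.51643 12.16 3.52 VWF
1.51299 14.40 1.74 TBL", header = TRUE)

# Wide to long: column "values" holds the numbers, "ind" the column name
long_data <- stack(Filter(is.numeric, data))
print(long_data)
```

stack() names its columns values and ind, so the faceted plot would use aes(values) and facet_wrap(~ ind).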
hist(data$name) looks for a column literally called "name", which isn't there, so it passes NULL to hist(). Use hist(data[,name]) instead.
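A minimal sketch of the difference, using a tiny hypothetical data frame: $ takes the text after it literally, while [[ ]] (like [, name]) evaluates the variable.

```r
data <- data.frame(Na = c(12.79, 12.16), Mg = c(3.50, 3.52))
name <- "Na"

# `$` looks for a column called "name", finds none and returns NULL
by_dollar <- data$name

# `[[` uses the value of the variable, so this returns the Na column
by_brackets <- data[[name]]
print(by_brackets)
```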
