Problem displaying a grouped boxplot with ggplot2 - r

I have a problem with my grouped boxplot: the groups are not drawn as boxes, and I can't understand why.
This is what I did:
library(ggplot2)
library(reshape2)
args = commandArgs(trailingOnly=TRUE)
data <- read.table(args[1], header=TRUE, sep="\t")
data.m <- melt(data, id.vars = "dijarn")
ggplot(data = data.m, aes(x=dijarn, y=value)) + geom_boxplot(aes(fill=variable))
Example of my data:
> head(data.m)
dijarn variable value
1 dijarn043 ATP5PB 2230.746
2 dijarn044 ATP5PB 2501.788
3 dijarn045 ATP5PB 2067.263
4 dijarn046 ATP5PB 4060.777
5 dijarn047 ATP5PB 3075.087
6 dijarn048 ATP5PB 2892.501
I have 37 dijarn values and, here, 5 variables.
That's a lot, but I think ggplot2 can handle it?
I tried to change the size of my image, but it didn't change anything. Did I forget some option?
Thanks for your help.

Are you trying to create 5 boxplots, one for each level of variable, or 37 boxplots, one for each level of dijarn? You can't create a boxplot from a single observation: a boxplot summarises a distribution (median, quartiles and IQR), so with one value per dijarn and variable each box collapses to a flat line.
Try converting the variable you want a boxplot on to a factor:
data$dijarn <- as.factor(data$dijarn)
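To make the point about single observations concrete, here is a minimal sketch in base R. The data is simulated to have the same shape as data.m in the question (37 samples, 5 variables); every gene name other than ATP5PB and all the values are made up for illustration.

```r
# Hypothetical data shaped like data.m in the question:
# 37 samples x 5 genes, one value per combination.
set.seed(1)
data.m <- expand.grid(
  dijarn   = sprintf("dijarn%03d", 43:79),
  variable = c("ATP5PB", paste0("gene", 2:5))  # names other than ATP5PB are invented
)
data.m$value <- runif(nrow(data.m), min = 1000, max = 5000)

# Each dijarn/variable cell holds exactly one observation,
# so a box drawn per cell has nothing to summarise.
counts <- table(data.m$dijarn, data.m$variable)
all(counts == 1)

# One box per gene instead, each summarising the 37 samples.
boxplot(value ~ variable, data = data.m, xlab = "variable", ylab = "value")
```

With ggplot2 the same idea would be to map variable, not dijarn, to the x aesthetic, for example aes(x = variable, y = value) followed by geom_boxplot().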

Related

R: creating a likert scale barplot

I'm new to R and feeling a bit lost ... I'm working on a dataset which contains answers on a 7-point Likert scale.
My data looks like this for example:
My goal is to create a barplot which displays the Likert scale on the x-axis and the frequency on the y-axis.
What I understood so far is that I first have to transform my data into a frequency table. For this I used some code that I found in another post on this site:
data <- factor(data, levels = c(1:7))
table(data)
However I always get this output:
data
1 2 3 4 5 6 7
0 0 0 0 0 0 0
Any ideas what went wrong or other ideas how I could realize my plan?
Thanks a lot!
Lorena
This is a very simple way of handling your question, using only base R:
## your data
my_obs <- c(4,5,3,4,5,5,3,3,3,6)
## use a factor for class data
## you could consider making it ordered (ordinal data)
## which makes sense for Likert data
## type "?factor" in the console to see the documentation
my_factor <- factor(my_obs, levels = 1:7)
## calculate the frequencies
my_table <- table(my_factor)
## print my_table
my_table
# my_factor
# 1 2 3 4 5 6 7
# 0 0 4 2 3 1 0
## plot
barplot(my_table)
yielding the following simple barplot:
Please, let me know whether this is what you want
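One plausible cause of the all-zero table in the question (an assumption, since the original data is not shown) is that factor() was applied to a whole data frame instead of to a single column. The sketch below uses a hypothetical data frame named dataset with a column Valenz, the name a later answer uses, filled with the same scores as my_obs.

```r
# Hypothetical data frame; only the column name Valenz comes from the thread.
dataset <- data.frame(Valenz = c(4, 5, 3, 4, 5, 5, 3, 3, 3, 6))

# factor() on the whole data frame: nothing matches the levels 1..7,
# so every entry becomes NA and all counts are 0.
wrong <- table(factor(dataset, levels = 1:7))
wrong

# factor() on the column itself counts the scores as intended.
right <- table(factor(dataset$Valenz, levels = 1:7))
right

barplot(right, xlab = "Score", ylab = "Frequency")
```

If this is what happened, selecting the column with $ before calling factor() should be enough to get the non-zero counts.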
Lorena!
First, there's no need to apply either factor() or table() to the dataset you showed. From what I gather, it looks fine.
R comes with some interesting plotting options; hist() is one of them.
Histogram with hist()
In the following example, I'll use the "Valenz" variable, as named in your dataset.
To get the frequency without needing to beautify it, you can simply ask:
hist(dataset$Valenz)
hist() takes a numeric vector, so dataset$Valenz pulls the Valenz column out of dataset.
If you only want to see the frequencies, without presenting them in some elegant way, that ought to do it (:
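For integer scores such as Likert answers, it can help to set the bin edges explicitly so that each score gets its own bar. This is a sketch, assuming a hypothetical data frame dataset with the scores in a column called Valenz; the half-integer breaks are a choice made here, not something from the question.

```r
# Hypothetical data; only the column name Valenz comes from the thread.
dataset <- data.frame(Valenz = c(4, 5, 3, 4, 5, 5, 3, 3, 3, 6))

# Breaks at 0.5, 1.5, ..., 7.5 centre one bin on each score from 1 to 7.
h <- hist(dataset$Valenz,
          breaks = seq(0.5, 7.5, by = 1),
          main = "", xlab = "Valenz")

# Counts per score, in order 1..7.
h$counts
```

Scores that nobody chose still get an (empty) bin, which keeps the x-axis covering the full scale.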
Histogram with ggplot()
If you want to make it prettier, you can style your plot with the ggplot2 package, one of the most used packages in R.
First, install and then load the package.
install.packages("ggplot2")
library(ggplot2)
Then, create a histogram with the scores on the x-axis and the number of times each score occurred on the y-axis.
ggplot(dataset, aes(x = Valenz)) +
geom_histogram(bins = 7, color = "Black", fill = "White") +
labs(title = NULL, x = "Name of my variable", y = "Count of 'Variable'") +
theme_minimal()
ggplot() takes your data frame, and aes() specifies that you want Valenz on the x-axis.
geom_histogram() gives you a histogram with bins = 7 (7 options, since it's a 7-point Likert scale), with bar outlines in color = "Black" and fill = "White".
labs() specifies the labels that appear along the x-axis (x = "Name of my variable") and the y-axis (y = "Count of 'Variable'").
theme_minimal() gives the plot a cleaner look.
I hope I helped you in some way, Lorena. (:

How to draw a basic histogram with X and Y axis in R

I want to make a simple histogram which involves two vectors:
values <- c(1,2,3,4,5,6,7,8)
freq <- c(4,6,4,4,3,2,1,1)
df <- data.frame(values,freq)
Now the data.frame df contains the following values:
values freq
1 4
2 6
3 4
4 4
5 3
6 2
7 1
8 1
Now I want to draw a simple histogram, in which values are on the x axis and freq is on y axis. I am trying to use the hist function, but I am not able to give two variables. How can I make a simple histogram from this data?
Using ggplot2:
library(ggplot2)
ggplot(df, aes(x = values, y = freq)) +
geom_bar(stat="identity")
Since you have the frequencies already, what you really want is a bar plot:
barplot(df$freq,names.arg=df$values)
If you've got your heart set on using hist, you should do:
hist(rep(df$values,df$freq))
Please read ?barplot and ?hist for further plotting options.
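The rep() trick above can be unpacked as follows. This sketch uses the vectors from the question; the explicit half-integer breaks are an addition made here so that each value lands in its own bin, since the default breaks may merge neighbouring values.

```r
values <- c(1, 2, 3, 4, 5, 6, 7, 8)
freq <- c(4, 6, 4, 4, 3, 2, 1, 1)
df <- data.frame(values, freq)

# rep() repeats each value as many times as its frequency says,
# turning the summary back into one entry per observation.
expanded <- rep(df$values, times = df$freq)
length(expanded)  # equals sum(df$freq)

# Explicit breaks (a choice made here, not in the answer)
# give exactly one bin per value.
h <- hist(expanded, breaks = seq(0.5, 8.5, by = 1), main = "", xlab = "values")
h$counts
```

When the frequencies are large, expanding them like this gets wasteful, which is one more reason to prefer barplot() for pre-counted data.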
Also, because I'm somewhat of a zealot, I think the code looks cleaner if you use data.table:
library(data.table)
setDT(df) #convert df to a data.table by reference
df[,barplot(freq,names.arg=values)]
and
df[,hist(rep(values,freq))]

Barplot with continuous x axis using base r graphics

I am looking to scale the x axis on my barplot to time, so as to accurately represent when measurements were taken.
I have these data frames:
> Botcv
Date Average SE
1 2014-09-01 4.0 1.711307
2 2014-10-02 5.5 1.500000
> Botc1
Date Average SE
1 2014-10-15 2.125 0.7180703
2 2014-11-12 1.000 0.4629100
3 2014-12-11 0.500 0.2672612
> Botc2
Date Average SE
1 2014-10-15 3.375 1.3354708
2 2014-11-12 1.750 0.4531635
3 2014-12-11 0.625 0.1829813
I use this code to produce a grouped barplot:
covaverage <- c(Botcv$Average,NA,NA,NA)
c1average <- c(NA,NA, Botc1$Average)
c2average <- c(NA,NA, Botc2$Average)
date <- c(Botcv$Date, Botc1$Date)
averagematrix <- matrix(c(covaverage,c1average, c2average), nrow=3, ncol=5, byrow=TRUE)
barplot(averagematrix,date, xlab="Date", ylab="Average", axis.lty=1, space=NULL,width=3,beside=T, ylim=c(0.00,6.00))
R plots the bars at equal distances apart by default, and I have been trying to find a workaround for this. I have seen several other solutions that use ggplot2, but I am producing plots for my master's thesis and would like to keep the appearance of my barplots in line with other graphs that I have created using base R graphics. I also want to add error bars to the plot. If anyone could provide a solution, I would be very grateful! Thanks!
Perhaps you can use this as a start. It is probably easier to use boxplots, as they can be placed at a given x position with the at argument. Base barplot() cannot do this, but you can use rect() instead to replicate the barplot look. Error bars can be added using arrows() or segments().
bar_w = 1 # width of bars
offset = c(-1,1) # offset to avoid overlapping
cols = grey.colors(2) # colors for different types
# combine into a single data frame
d = data.frame(rbind(Botc1, Botc2), 'type' = c(1,1,1,2,2,2))
# set up empty plot with sensible x and y lims
plot(as.Date(d$Date), d$Average, type='n', ylim=c(0,4))
# draw data of data frame 1 and 2
for (i in unique(d$type)){
dd = d[d$type==i, ]
x = as.Date(dd$Date)
y = dd$Average
# rectangles
rect(xleft=x-bar_w+offset[i], ybottom=0, xright=x+bar_w+offset[i], ytop=y, col=cols[i])
# error bars (as written, they extend half an SE on each side of the mean)
arrows(x0=x+offset[i], y0=y-0.5*dd$SE, x1=x+offset[i], y1=y+0.5*dd$SE, col=1, angle=90, code=3, length = 0.1)
}
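The error bars in the code above extend half an SE on either side of each bar top. If the intent is the more common mean ± SE, a sketch for a single series could look like this. It reuses the Botc1 values from the question; the bar half-width of 2 days is an arbitrary choice made here.

```r
# Botc1 from the question, with Date stored as a Date object.
Botc1 <- data.frame(
  Date    = as.Date(c("2014-10-15", "2014-11-12", "2014-12-11")),
  Average = c(2.125, 1.000, 0.500),
  SE      = c(0.7180703, 0.4629100, 0.2672612)
)

# Full mean +/- SE limits.
lower <- Botc1$Average - Botc1$SE
upper <- Botc1$Average + Botc1$SE

half_w <- 2  # bar half-width in days (arbitrary)

# Empty plot on a true date axis, then bars and error bars on top.
plot(Botc1$Date, Botc1$Average, type = "n",
     ylim = c(0, max(upper)), xlab = "Date", ylab = "Average")
rect(Botc1$Date - half_w, 0, Botc1$Date + half_w, Botc1$Average, col = "grey")
arrows(Botc1$Date, lower, Botc1$Date, upper, angle = 90, code = 3, length = 0.1)
```

Because the x positions are real dates, the gaps between bars reflect the actual time between measurements.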
If all you want is a ggplot2 theme that resembles base graphics, adding + theme_bw() will get you close:
data(mtcars)
require(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
theme_bw()
Alternative, using base graphics:
boxplot(mpg~cyl,data=mtcars)
If, as you said, the only thing you want is a similar look and you already have a working plot in ggplot2, theme_bw() should produce plots that look close to what the standard plotting mechanism gives. If you feel so inclined, you can tweak details such as font sizes, the thickness of the plot borders or the display of outliers.

How to plot histogram with means calculated by factor levels from multiple columns

I am new to R, and maybe my question looks silly, but I spent half of the day trying to solve it on my own with no luck. I've found no tutorial which illustrates how to do it; if you know of one, pointers are welcome. I want to plot a histogram with means calculated by factor levels from columns. My initial data looks like this (simplified version):
code_group scale1 scale2
1 5 3
2 3 2
3 5 2
So I need a histogram where each bar is colored by code_group and its height is the mean for that level of code_group, with the x-axis labelled scale1 and scale2. Each label has three bars (one for each of the three levels of the factor code_group). I've managed to calculate the means for each level on my own; they look like this:
code_group scale1 scale2
1 -1.0270270 0.05405405
2 -1.0882353 0.14705882
3 -0.7931034 -0.34482759
but I have no idea how to plot it as a histogram! Thanks in advance!
Assuming you mean bar chart and not histogram (please clarify your question if this isn't the case), you can melt your data and plot it with ggplot like this:
library(ggplot2)
library(reshape2)
##
mdf <- melt(
df,
id.vars="code_group",
variable.name="scale_type",
value.name="mean_value")
##
ggplot(
mdf,
aes(x=scale_type,
y=mean_value,
fill=factor(code_group)))+
geom_bar(stat="identity",position="dodge")
Data:
df <- read.table(
text="code_group scale1 scale2
1 -1.0270270 0.05405405
2 -1.0882353 0.14705882
3 -0.7931034 -0.34482759",
header=TRUE)
Edit:
You could just make the modifications to the data itself (or a copy of it) like below:
mdf2 <- mdf
mdf2$code_group <- factor(
mdf2$code_group,
levels=1:3,
labels=c("neutral",
"likers",
"lovers"))
names(mdf2)[1] <- "group"
##
ggplot(
mdf2,
aes(x=scale_type,
y=mean_value,
fill=group))+
geom_bar(stat="identity",position="dodge")
##
Given the mean values you provided, you could do something like this:
To recreate your simplified dataset:
d=data.frame(code_group=c(1,2,3),scale1=c(-1.02,-1.08,-0.79),scale2=c(0.05,.15,-0.34))
To create your graph:
barplot(c(d[,'scale1'],d[,'scale2']),col=d[,'code_group'],names.arg=c(paste('scale1',unique(d[,'code_group']),sep='_'),paste('scale2',unique(d[,'code_group']),sep='_')))
This will give you the following graph:
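A slightly shorter base R variant of the same idea is to hand barplot() a matrix and let beside = TRUE do the grouping. This is a sketch that assumes the rounded means from the data frame d above; the legend labels are made up here.

```r
# Rounded means, as in the data frame d above.
d <- data.frame(code_group = c(1, 2, 3),
                scale1 = c(-1.02, -1.08, -0.79),
                scale2 = c(0.05, 0.15, -0.34))

# Rows = groups, columns = scales; barplot() draws one cluster per column.
m <- as.matrix(d[, c("scale1", "scale2")])
rownames(m) <- d$code_group

barplot(m, beside = TRUE, col = 1:3,
        legend.text = paste("group", rownames(m)),  # hypothetical labels
        ylab = "mean")
```

The column names of the matrix become the x-axis labels, so there is no need to build them by hand with paste().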

working with 3 columns of data in ggplot2: x, y1, and y2 into a stacked bar plot

I have three columns of data. The first column, Depth, should be on the x-axis. The other two columns are nr and r. I need to plot the data in a stacked barplot with A on the bottom and B on the top of nr. The data is very large (i.e. the read depth goes from 0 to 1022), so I can't type everything out in R or on here. Here's an example of what the data looks like:
Depth r nr
6 2395 2904
8 0 3095
9 2689 0
12 3894 3578
15 5 4739
The r and nr values have to be on the y-axis, and the depth has to be on the x-axis. I've tried everything I can think of and am unable to get a 'height' to use or to just get the basic equation.
Work in long format
# using reshape2::melt
library(reshape2)
library(ggplot2)
# assuming your original data.frame is called `D`
longD <- melt(D, id.vars = 1)
ggplot(longD, aes(x = Depth, y = value, colour = variable, fill = variable)) +
geom_bar(stat = 'identity')
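To see what the long format looks like without installing anything, here is a base R stand-in for the melt() call, built from the five example rows in the question. It is only a sketch of the reshaping step, not what reshape2 does internally.

```r
# The five example rows from the question.
D <- data.frame(Depth = c(6, 8, 9, 12, 15),
                r     = c(2395, 0, 2689, 3894, 5),
                nr    = c(2904, 3095, 0, 3578, 4739))

# Stack r and nr into a single value column, with a label column
# saying which original column each value came from.
longD <- data.frame(
  Depth    = rep(D$Depth, times = 2),
  variable = factor(rep(c("r", "nr"), each = nrow(D)), levels = c("r", "nr")),
  value    = c(D$r, D$nr)
)
longD
```

Each Depth now appears once per variable, which is the shape the fill = variable mapping in the ggplot call relies on to stack the bars.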
Using barchart from lattice, you can work with the wide format directly:
library(lattice)
barchart(r+nr~factor(Depth),data=dt,stack=TRUE,auto.key=TRUE)
This is equivalent to the following, using the long format from @mnel's answer:
barchart(value~factor(Depth),data=longD,
groups=variable,stack=TRUE,auto.key=TRUE)
Just to show base R graphics can match it as well, and assuming your data.frame is called dat:
barplot(
t(dat)[2:3,],
names.arg=t(dat)[1,],
space=c(0,diff(t(dat)[1,])),
axis.lty=1
)
