Create Lollipop-like plot with R - r

I have a .csv file that looks like that:
Pos,ReadsME_016,ReadsME_017,ReadsME_018,ReadsME_019,ReadsME_020,ReadsME_021
95952794,62.36,62.06,55.56,51,60.35,44.27
95952795,100,100,100,100,100,100
95952833,0,0,-,0,-,-
95952846,0,0,-,0,0,-
95952876,0,-,0,0,0,0
95952877,38.89,28.98,25.67,36.99,37.91,16.86
95952878,100,100,100,100,100,100
95952884,0,-,0,-,-,0
95952897,18.7,20.52,20.94,16.43,22.68,12.55
95952898,100,100,75,80,-,100
95952902,10.88,8.93,10.22,10.63,13.51,6.06
95952903,100,100,100,75,-,100
95952915,10.75,8.7,7.91,8.35,15.12,8.88
What I want is to create a plot that is similar to this one:
http://www.scfbm.org/content/9/1/11/figure/F2
However, all my attempts failed. Unfortunately, the tool is yet not available and I cannot read the source code.
I've thought of ggplot and melt, but I do not come close to this graph. How can I achieve that all read samples (ReadsME_016,ReadsME_017,..) are listed on the x-axes and the positions are listed on the y-axes? I don’t know how to deal with both x- & y-axes being categorical while the plotted values should show percentages?
dataset <- melt(dataset, id.vars="Pos")
ggplot(dataset, aes(x=value, y=Pos, colour=variable)) + geom_point()
Here is the complete .csv file:
Pos,ReadsME_016,ReadsME_017,ReadsME_018,ReadsME_019,ReadsME_020,ReadsME_021,ReadsME_022,ReadsME_023,ReadsME_024,ReadsME_025,ReadsME_026,ReadsME_027,ReadsME_028,ReadsME_030,ReadsME_031,ReadsME_032
95952794,62.36,62.06,55.56,51.0,60.35,44.27,53.73,61.69,57.04,64.16,61.48,59.42,66.93,49.71,55.23,66.67
95952795,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-
95952833,0.0,0.0,-,0.0,-,-,100.0,-,-,-,-,0.0,-,-,0.0,-
95952846,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,-,-,0.0,-,-,-,-
95952876,0.0,-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-
95952877,38.89,28.98,25.67,36.99,37.91,16.86,29.65,35.38,35.43,36.87,34.04,33.91,35.04,19.09,38.35,0.0
95952878,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-
95952884,0.0,-,0.0,-,-,0.0,-,-,100.0,-,-,0.0,-,-,-,-
95952897,18.7,20.52,20.94,16.43,22.68,12.55,18.3,22.28,21.05,22.55,24.81,20.63,22.05,13.06,22.8,0.0
95952898,100.0,100.0,75.0,80.0,-,100.0,80.0,100.0,100.0,-,-,-,100.0,-,100.0,-
95952902,10.88,8.93,10.22,10.63,13.51,6.06,9.62,15.73,14.08,18.65,13.28,16.44,15.02,8.92,11.11,100.0
95952903,100.0,100.0,100.0,75.0,-,100.0,100.0,100.0,100.0,-,-,100.0,100.0,100.0,100.0,-
95952915,10.75,8.7,7.91,8.35,15.12,8.88,7.32,9.76,11.45,8.99,10.57,14.07,10.36,6.35,10.04,0.0
95952916,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-,-,100.0,100.0,-,100.0,-
95952925,10.39,8.33,8.59,10.51,14.19,10.99,6.98,11.56,13.93,15.0,14.29,16.26,9.76,5.86,12.96,0.0
95952926,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-,-,-,100.0,-,100.0,-
95952937,19.53,14.97,11.97,14.43,19.26,17.18,19.48,12.31,21.17,21.57,23.08,26.24,16.38,13.47,21.82,0.0
95952938,100.0,100.0,100.0,100.0,-,100.0,100.0,-,-,-,-,-,-,-,100.0,-
95952825,-,0.0,-,-,-,-,-,-,-,-,0.0,-,-,0.0,0.0,-
95952975,-,0.0,-,-,-,-,-,-,0.0,-,-,-,-,-,-,-
95952669,-,-,0.0,-,-,0.0,0.0,-,-,-,-,-,-,-,0.0,-
95952718,-,-,0.0,0.0,0.0,-,0.0,-,-,-,0.0,-,-,0.0,0.0,-
95952868,-,-,0.0,-,0.0,-,-,0.0,-,-,0.0,-,-,-,-,-
95952957,-,-,0.0,-,-,-,-,0.0,0.0,0.0,-,0.0,-,-,-,-
95952976,-,-,0.0,-,0.0,0.0,0.0,100.0,-,0.0,-,-,-,-,0.0,-
95952681,-,-,-,0.0,-,0.0,-,0.0,-,-,-,-,-,0.0,-,-
95952779,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952811,-,-,-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-,-,-,0.0,-
95952821,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952823,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952859,-,-,-,0.0,0.0,-,-,0.0,0.0,-,0.0,-,-,0.0,0.0,-
95952882,-,-,-,0.0,-,-,-,-,-,-,0.0,-,-,-,-,-
95953023,-,-,-,0.0,-,0.0,-,-,-,-,-,-,-,-,-,-
95953058,-,-,-,0.0,-,0.0,-,-,-,-,-,-,-,-,-,-
95952664,-,-,-,-,-,0.0,0.0,-,-,0.0,-,-,-,-,0.0,-
95952801,-,-,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-
95952968,-,-,-,-,-,-,0.0,-,-,0.0,-,-,-,-,-,-
95952797,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-,-,-
95952851,-,-,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-
95952894,-,-,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-
95952807,-,-,-,-,-,-,-,-,-,-,-,-,-,0.0,-,-
95952712,-,-,-,-,-,-,-,-,-,-,-,-,-,-,0.0,-

First, you want to make sure you are reading in your data properly. You have non-numeric values (specifically "-") mixed in with numeric values. I'm assuming those are missing values. Make sure you let R know that with na.strings="-". Then, to get something more consistent with the example plot, i changed your variables around
library(reshape2) # for melt()
library(ggplot2) # for ggplot()
dataset <- read.table("file.txt", header=TRUE, sep=",", na.strings="-")
ggplot(melt(dataset, id.vars="Pos"),
aes(x=Pos, y=variable, colour=cut(value, breaks=5))) +
geom_point()

Related

Creating a multi-panel plot of a data set grouped by two grouping variables in R

I'm trying to solve the following exercise:
Make a scatter plot of the relationship between the variables 'K1' and 'K2' with "faceting" based on the parameters 'diam' and 'na' (subdivide the canvas by these two variables). Finally, assign different colors to the points depending on the 'thickness' of the ring (don't forget to factor it before). The graph should be similar to this one ("grosor" stands by "thickness"):
Now, the last code I tried with is the following one (the dataset is called "qerat"):
ggplot(qerat, aes(K1,K2, fill=factor(grosor))) + geom_point() + facet_wrap(vars(diam,na))
¿Could somebody give me a hand pointing out where the mistake is? ¡Many thanks in advance!
Maybe you are looking for a facet_grid() approach. Here the code using a data similar to yours:
library(ggplot2)
#Data
data("diamonds")
#Plot
ggplot(diamonds,aes(x=carat,y=price,color=factor(cut)))+
geom_point()+
facet_grid(color~clarity)
Output:
In the case of your code, as no data is present, I would suggest next changes:
#Code
ggplot(qerat, aes(K1,K2, color=factor(grosor)))+
geom_point() +
facet_grid(diam~na)

graphing multiple data series in R ggplot

I am trying to plot (on the same graph) two sets of data versus date from two different data frames. Both data frames have the same exact dates for each of the two measurements. I would like to plot these two sets of data on the same graph, with different colors. However, I can't get them on the same graph at all. R is already reading the date as date. I tried this:
qplot( date , NO3, data=qual.arn)
+ qplot( qual.arn$date , qual.arn$DIS.O2, "O2(aq)" , add=T)
and received this error.
Error in add_ggplot(e1, e2, e2name) :
argument "e2" is missing, with no default
I tried using the ggplot function instead of qplot, but I couldn't even plot one graph this way.
ggplot(date=qual.no3.s, aes(date,NO3))
Error: ggplot2 doesn't know how to deal with data of class uneval
PLEASE HELP. Thank you!
Since you didn't provide any data (please do so in future), here's a made up dataset for demonstrate a solution. There are (at least) two ways to do this: the right way and the wrong way. Both yield equivalent results in this very simple case.
# set up minimum reproducible example
set.seed(1) # for reproducible example
dates <- seq(as.Date("2015-01-01"),as.Date("2015-06-01"), by=1)
df1 <- data.frame(date=dates, NO3=rpois(length(dates),25))
df2 <- data.frame(date=dates, DIS.O2=rnorm(length(dates),50,10))
ggplot is designed to use data in "long" format. This means that all the y-values (the concentrations) are in a single column, and there is separate column which identifies the corresponding category ("NO3" or "DIS.O2" in your case). So first we merge the two data-sets based on date, then use melt(...) to convert from "wide" (categories in separate columns) to "long" format. Then we let ggplot worry about legends, colors, etc.
library(ggplot2)
library(reshape2) # for melt(...)
# The right way: combine the data-sets, then plot
df.mrg <- merge(df1,df2, by="date", all=TRUE)
gg.df <- melt(df.mrg, id="date", variable.name="Component", value.name="Concentration")
ggplot(gg.df, aes(x=date, y=Concentration, color=Component)) +
geom_point() + labs(x=NULL)
The "wrong" way to do this is by making separate calls to geom_point(...) for each layer. In your particular case this might be simpler, but in the long run it's better to use the other method.
# The wrong way: plot two sets of points
ggplot() +
geom_point(data=df1, aes(x=date, y=NO3, color="NO2")) +
geom_point(data=df2, aes(x=date, y=DIS.O2, color="DIS.O2")) +
scale_color_manual(name="Component",values=c("red", "blue")) +
labs(x=NULL, y="Concentration")

selecting only some facets to print in facet_wrap, ggplot2

my question is very simple, but I have failed to solve it after many attempts. I just want to print some facets of a facetted plot (made with facet_wrap in ggplot2), and remove the ones I am no interested in.
I have facet_wrap with ggplot2, as follows:
#anomalies linear trends
an.trends <- ggplot()+
geom_smooth(method="lm", data=tndvilong.anomalies, aes(x=year, y=NDVIan, colour=TenureZone,
group=TenureZone))+
scale_color_manual(values=miscol) +
ggtitle("anomalies' trends")
#anomalies linear trends by VEG
an.trendsVEG <- an.trends + facet_wrap(~VEG,ncol=2)
print(an.trendsVEG)
And I get the plot as I expected (you can see it in te link below):
anomalies' trends by VEG
The question is: how do I get printed only the facest I am interested on?
I only want to print "CenKal_ShWoodl", "HlShl_ShDens", "NKal_ShWoodl", and "ThShl_ShDens"
Thanks
I suggest the easiest way to do that is to simply give ggplot() an appropriate subset. In this case:
facets <- c("CenKal_ShWoodl", "HlShl_ShDens", "NKal_ShWoodl", "ThShl_ShDens")
an.trends.sub <- ggplot(tndvilong.anomalies[tndvilong.anomalies$VEG %in% facets,])+
geom_smooth(method="lm" aes(x=year, y=NDVIan, colour=TenureZone,
group=TenureZone))+
scale_color_manual(values=miscol) +
ggtitle("anomalies' trends") +
facet_wrap(~VEG,ncol=2)
Obviously without your data I can't be sure this will give you what you want, but based on your description, it should work. I find that with ggplot, it is generally best to pass it the data you want plotted, rather than finding ways of changing the plot itself.

introducing a gap in continuous x axis using ggplot

This is kinda a build-on on my previous post creating an stacked area/bar plot with missing values (all the script I run can be found there). In this post, however, Im asking if its possible to leave a gap in an continuous x axis? I have a time-serie (month-by-month) over a year, but for one sample one month is missing and I would like to show this month as a complete gap in the plot. Almost like plotting a graph for Jan-Aug (Sep is missing) and one for Oct-Dec and merging these with a gap for Sep.
The only things I have come up trying are treating the missing month as zero or NA, creating a hugh drop in the area chart for Sep or excluding it but with an x axis ranging from 1-11, respectively (see plots in dropbox folder).
The data set Im working on can be found in my dropbox folder and it's named r_class.txt and you can also see the two different plots (Rplots1 and 2).
Any ideas would really be appreciated!
Plot the series as two separate data frames:
#Load libraries
require(ggplot2)
require(reshape)
#Code copied from your linked post:
wa=read.table('wa_class.txt', sep="", header=F, na.string="0")
names(wa)=c("Class","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
wam=melt(wa)
wam$variablen=as.numeric(wam$variable)
#For readability, split the melted data frame into two separate data frames
wam1 <- wam[wam$variablen %in% 1:6,]
wam2 <- wam[wam$variablen %in% 8:12, ]
ggplot() +
geom_area(data=wam1, aes(x=variablen, y=value, fill=Class)) +
geom_area(data=wam2, aes(x=variablen, y=value, fill=Class))
#and add lineranges, etc., accordingly

R: Plot multiple box plots using columns from data frame

I would like to plot an INDIVIDUAL box plot for each unrelated column in a data frame. I thought I was on the right track with boxplot.matrix from the sfsmsic package, but it seems to do the same as boxplot(as.matrix(plotdata) which is to plot everything in a shared boxplot with a shared scale on the axis. I want (say) 5 individual plots.
I could do this by hand like:
par(mfrow=c(2,2))
boxplot(data$var1
boxplot(data$var2)
boxplot(data$var3)
boxplot(data$var4)
But there must be a way to use the data frame columns?
EDIT: I used iterations, see my answer.
You could use the reshape package to simplify things
data <- data.frame(v1=rnorm(100),v2=rnorm(100),v3=rnorm(100), v4=rnorm(100))
library(reshape)
meltData <- melt(data)
boxplot(data=meltData, value~variable)
or even then use ggplot2 package to make things nicer
library(ggplot2)
p <- ggplot(meltData, aes(factor(variable), value))
p + geom_boxplot() + facet_wrap(~variable, scale="free")
From ?boxplot we see that we have the option to pass multiple vectors of data as elements of a list, and we will get multiple boxplots, one for each vector in our list.
So all we need to do is convert the columns of our matrix to a list:
m <- matrix(1:25,5,5)
boxplot(x = as.list(as.data.frame(m)))
If you really want separate panels each with a single boxplot (although, frankly, I don't see why you would want to do that), I would instead turn to ggplot and faceting:
m1 <- melt(as.data.frame(m))
library(ggplot2)
ggplot(m1,aes(x = variable,y = value)) + facet_wrap(~variable) + geom_boxplot()
I used iteration to do this. I think perhaps I wasn't clear in the original question. Thanks for the responses none the less.
par(mfrow=c(2,5))
for (i in 1:length(plotdata)) {
boxplot(plotdata[,i], main=names(plotdata[i]), type="l")
}

Resources