Trying to plot ranges of dates - R

I have 19 tags which were deployed and reported at different times throughout the summer and fall. I am trying to create a plot that displays the deployment and reporting times so that I can see where data collection overlaps. I have tried several plotting functions, including plot(), boxplot(), and ggplot(). I have gotten close to what I want with boxplot(), but I would like each box to extend from the start date to the end date and to eliminate the whiskers entirely. Is there a way to do this, or should I use a different function or package? Here is my code; it probably isn't the most efficient, since I'm somewhat new to R.
Note: each t<number> object is named after its tag number. The dates were all taken from different data sets.
dep.dates=boxplot(t62104[,8],t40636[,8],t84337[,8],t84353[,8],t62103[,8],
t110289[,8],t62102[,8],t62105[,8],t62101[,8],t84360[,8],
t117641[,8],t40643[,8],t110291[,8],t84338[,8],t110290[,8],
t84363[,8],t117639[,8],t117640[,8],t117638[,8],horizontal=T,
main='Tag deployment and pop-up dates',xlab='Month',
ylab='Tag number',names=c('62104','40636','84337','84353',
'62103','110289','62102','62105','62101','84360','117641',
'40643','110291','84338','110290','84363','117639','117640',
'117638'),las=1)
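In base graphics, which the question already uses, a whisker-free box spanning each tag's start and end date can be drawn directly with rect() instead of boxplot(). A minimal sketch, assuming the dates have been collected into one data frame with hypothetical columns tag, start and end (the three tags and their dates below are made up for illustration):

```r
# Hypothetical deployment and pop-up dates for three tags (not the real data)
tags <- data.frame(
  tag   = c("62104", "40636", "84337"),
  start = as.Date(c("2013-06-01", "2013-07-15", "2013-08-10")),
  end   = as.Date(c("2013-09-20", "2013-10-05", "2013-11-30"))
)
n <- nrow(tags)

# Empty plot covering every date, with one row per tag
plot(NA, xlim = as.numeric(range(tags$start, tags$end)), ylim = c(0.5, n + 0.5),
     xaxt = "n", yaxt = "n", xlab = "Month", ylab = "",
     main = "Tag deployment and pop-up dates")

# One box per tag, from its start date to its end date (no whiskers)
rect(xleft = as.numeric(tags$start), ybottom = seq_len(n) - 0.3,
     xright = as.numeric(tags$end), ytop = seq_len(n) + 0.3, col = "grey80")

# Month ticks on the x-axis and tag numbers on the y-axis
month_ticks <- seq(as.Date(cut(min(tags$start), "month")), max(tags$end), by = "month")
axis(1, at = as.numeric(month_ticks), labels = format(month_ticks, "%b"))
axis(2, at = seq_len(n), labels = tags$tag, las = 1)
```

If column 8 of each tag's data holds its dates (as the boxplot() call suggests), start and end could presumably come from range() of that column.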

Something like this will work if all you care about is ranges.
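library(ggplot2)
library(SpatioTemporal)   # provides the mesa.data.raw example data
library(data.table)
data(mesa.data.raw)
# For each monitor (column), take the first and last dates (row names)
# that have a non-missing observation
out <- as.data.table(t(apply(mesa.data.raw$obs, 2, function(.v){
  names(.v)[range(which(!is.na(.v)))]
})), keep.rownames=TRUE)
setnames(out, "rn", "monitors")
# Horizontal bars (crossbars) from V1 to V2
ggplot(out, aes(x=monitors, y=V1, ymin=V1, ymax=V2)) + geom_crossbar() + coord_flip()
# Horizontal lines from V1 to V2
ggplot(out, aes(x=monitors, ymin=V1, ymax=V2)) + geom_linerange() + coord_flip()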
The first ggplot call creates horizontal bars, but I couldn't figure out how to get rid of the center line, so I just put it at the start date (y=V1).
The second plot creates horizontal lines, which I think looks better anyway.
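An alternative that sidesteps the crossbar's center line is geom_segment(), drawn thick enough to read as a bar. A minimal sketch on a small hypothetical data frame (the columns tag, start and end and their values are assumptions, not the asker's data); linewidth requires ggplot2 3.4.0 or later, where older versions use size:

```r
library(ggplot2)

# Hypothetical deployment and pop-up dates for three tags
tags <- data.frame(
  tag   = c("62104", "40636", "84337"),
  start = as.Date(c("2013-06-01", "2013-07-15", "2013-08-10")),
  end   = as.Date(c("2013-09-20", "2013-10-05", "2013-11-30"))
)

# One thick horizontal segment per tag; no coord_flip() is needed because
# the dates go directly on the x-axis
p <- ggplot(tags) +
  geom_segment(aes(x = start, xend = end, y = tag, yend = tag), linewidth = 6) +
  scale_x_date(date_labels = "%b", date_breaks = "1 month") +
  labs(title = "Tag deployment and pop-up dates", x = "Month", y = "Tag number")
print(p)
```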

Related

How to set the height of grid rows in line graphs with ggplot2 (R)?

I'm trying to plot a line graph using the ggplot2 library in R. The plot looks good, but I need to reduce the space (height) between the grid rows, because the lines end up very far apart.
This is my R script:
library(ggplot2)
library(reshape2)
data <- read.csv('/Users/keepo/Desktop/G.Con/Int18/input-int18.csv')
# Reshape from wide to long: one row per (NRO, series) pair
chart_data <- melt(data, id.vars='NRO')
names(chart_data) <- c('NRO', 'leyenda', 'DTF')
# linewidth replaced size for lines in ggplot2 3.4.0
ggplot() +
geom_line(data = chart_data, aes(x = NRO, y = DTF, color = leyenda), linewidth = 1) +
xlab("iteraciones") +
ylab("valores")
And this is my current graph:
The first line is very far from the second. How can I reduce the height?
Regards.
The lines are far apart because the values of the variable plotted on the y-axis are far apart. If you need them closer together, you fundamentally have 3 options:
1. Change the scale (e.g. convert the plot to a log scale), although this can make the numbers harder for people to interpret. It can also change the shape of each line, not just the space between the lines. I'm guessing this isn't what you will want, ultimately.
2. Normalize the data. If the actual value of the variable on the y-axis isn't important, just standardize the data (separately for each value of leyenda).
3. Graph each line separately. The main drawback here is that you need 3 graphs where 1 might do.
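Options 2 and 3 can be sketched as follows on a made-up stand-in for chart_data (same NRO, leyenda and DTF columns; the values are invented to reproduce two far-apart lines). Standardizing uses base R's ave() to compute a z-score within each leyenda group, and the separate-graphs option uses facet_wrap() with free y scales. linewidth assumes ggplot2 3.4.0 or later:

```r
library(ggplot2)

# Hypothetical long-format data shaped like chart_data: two series far apart
chart_data <- data.frame(
  NRO     = rep(1:10, times = 2),
  leyenda = rep(c("serie_1", "serie_2"), each = 10),
  DTF     = c(800 + 5 * (1:10), 3100 - 5 * (1:10))
)

# Option 2: z-score DTF separately within each leyenda group
chart_data$DTF_z <- ave(chart_data$DTF, chart_data$leyenda,
                        FUN = function(v) (v - mean(v)) / sd(v))
p_norm <- ggplot(chart_data, aes(x = NRO, y = DTF_z, color = leyenda)) +
  geom_line(linewidth = 1) +
  labs(x = "iteraciones", y = "valores (standardized)")

# Option 3: one panel per series, each with its own y range
p_facet <- ggplot(chart_data, aes(x = NRO, y = DTF, color = leyenda)) +
  geom_line(linewidth = 1) +
  facet_wrap(~ leyenda, ncol = 1, scales = "free_y") +
  labs(x = "iteraciones", y = "valores")

print(p_norm)
print(p_facet)
```

Option 1 would be the original plot plus scale_y_log10().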
Not recommended:
I know that some graphs use a "squiggle" on the axis to change scales or skip space. Generally, this is considered poor practice (and I doubt it's an option in ggplot2), because it masks the true separation between the data points. If you really do want a gap, I would look at this post: "axis.break and ggplot2 or gap.plot? plot may be too complexe".
In a nutshell, the answer here depends on what your numbers mean. What is the story you are trying to tell? Is the important feature of your plots the change between them (in which case, normalizing might be your best option), or the actual numbers themselves (in which case, the space is relevant).
You could also use an axis transformation that maps your data to the screen non-linearly:
# scale_y_continuous(trans = "fun") looks up a function named fun_trans()
fun_trans <- function(x){
  # Control points chosen for this data: 800 -> 800, 2500 -> 1950, 3100 -> 3100
  d <- data.frame(x=c(800, 2500, 3100), y=c(800,1950, 3100))
  model1 <- lm(y~poly(x,2), data=d)   # forward map: data -> screen
  model2 <- lm(x~poly(y,2), data=d)   # approximate inverse: screen -> data
  scales::trans_new("fun",
    function(x) as.vector(predict(model1,data.frame(x=x))),
    function(x) as.vector(predict(model2,data.frame(y=x))))
}
last_plot() + scale_y_continuous(trans = "fun")

Text annotation to a graph in ggplot

I am drawing a principal component (PC) plot using ggplot2.
I know this question has been answered in some previous posts, but I still could not solve my problem.
I have a data set called tab, which is the output of a PCA:
sample.id pop EV1 EV2
HT185_MK8-2.sort.bam HA_27 -0.03796869 0.046369552
HT48_SD1A-37.sort.bam HA_14 0.04208393 0.032961404
HT53_IA1A-10.sort.bam HA_1 -0.02580365 0.005262476
HT260_MK1-4.sort.bam HA_20 -0.06090545 0.005578504
HT170_SD2W-14.sort.bam HA_17 0.01288395 0.012117833
Q093_MK7-13.sort.bam HA_26 0.06310162 0.188558067
I want to add a label to each dot in the plot. The dots are individuals from several populations, so I want to label them with their population ID (the pop column in the data set).
I am using something like this:
ggplot(data=tab, aes(EV1, EV2, label=pop)) + geom_point(aes(color=as.factor(pop))) + ylab("Principal component 2") + xlab("Principal component 1")
But I do not get my desired output.
This is my PC plot:
Could anyone help me add the population label to each dot in the plot?
Thanks
Try geom_text:
geom_text(aes(label=as.character(pop)),hjust=0,vjust=0)
Also consider looking into plotly, or setting a threshold on which points get labels, because labeling every point will lead to a very crowded plot and probably very little additional useful information.
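Put together with the six rows shown in the question, the geom_text() suggestion might look like the sketch below. check_overlap = TRUE is one built-in way to thin crowded labels; the second plot applies a threshold instead, labeling only points with |EV2| > 0.03 (an arbitrary cutoff chosen for illustration):

```r
library(ggplot2)

# The six rows of tab shown in the question
tab <- data.frame(
  sample.id = c("HT185_MK8-2.sort.bam", "HT48_SD1A-37.sort.bam", "HT53_IA1A-10.sort.bam",
                "HT260_MK1-4.sort.bam", "HT170_SD2W-14.sort.bam", "Q093_MK7-13.sort.bam"),
  pop = c("HA_27", "HA_14", "HA_1", "HA_20", "HA_17", "HA_26"),
  EV1 = c(-0.03796869, 0.04208393, -0.02580365, -0.06090545, 0.01288395, 0.06310162),
  EV2 = c(0.046369552, 0.032961404, 0.005262476, 0.005578504, 0.012117833, 0.188558067)
)

# Label every point; hjust = 0, vjust = 0 puts each label's lower-left corner
# on its point, and check_overlap = TRUE drops labels that would overlap
p_all <- ggplot(tab, aes(EV1, EV2, label = pop)) +
  geom_point(aes(color = pop)) +
  geom_text(hjust = 0, vjust = 0, check_overlap = TRUE) +
  xlab("Principal component 1") + ylab("Principal component 2")

# Label only points past a hypothetical threshold on PC2
labelled <- subset(tab, abs(EV2) > 0.03)
p_some <- ggplot(tab, aes(EV1, EV2)) +
  geom_point(aes(color = pop)) +
  geom_text(data = labelled, aes(label = pop), hjust = 0, vjust = 0) +
  xlab("Principal component 1") + ylab("Principal component 2")

print(p_all)
print(p_some)
```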

How to not graph the extreme outliers in a boxplot?

I have an R script that uses a CSV file as its source data to create sixteen separate boxplots. The sixteen boxplots have varying y-axis scales, which makes it difficult to apply a single ylim statement to the script. I tried using the coord_cartesian function with a ylim argument, as well as the scale_y_continuous function, but again, that was too general to apply across sixteen boxplots with different y-axis scales (I do not want to normalize the scales across all sixteen boxplots, only the plots with 'extreme' outliers).
Below is the snippet of code I used to create the sixteen boxplots. 'SE_Data' is the CSV source file I noted above. I should also mention that the sixteen boxplots are exported as a single PDF file (I don't know if this level of detail is needed or not).
library(ggplot2)
# Enter csv input file:
SE_Data <- read.csv("SE_DATA.csv", header=TRUE)
# Enter output file name:
pdf(file="SE_Box_Plots.pdf", onefile=TRUE)
x <- c("A","B","C","D","E","F","G","H")
SE_Data$ACO_Desc <- factor(SE_Data$ACO_Desc, x) #Ensures x-axis is ordered from A through H
#Creates sixteen individual boxplots
for (i in 5:ncol(SE_Data)) {
  col <- colnames(SE_Data)[i]
  # .data[[col]] refers to the column by name inside aes()
  p <- ggplot(SE_Data, aes(x=Group_Desc, y=.data[[col]])) + geom_boxplot() +
    ylab(gsub("_", " ", col)) +
    xlab("") +
    theme(axis.text.x=element_text(angle = 0))
  print(p)
}
dev.off()
dev.list()
I wasn't sure whether I would need an if/else statement to solve this problem; however, as someone who is still fairly new to R, this appears to be well above my skill level. Below, I included two of the sixteen boxplots to illustrate how their y-axis scales differ from each other.
Box Plot 1:
Box Plot 2:
As you can see, the two boxplots have very different y-axis scales. In my opinion 'boxplot 2' looks fine; 'boxplot 1', however, contains extreme outliers. I would like to develop a piece of code that removes these extreme values to reduce the amount of 'dead space' on the boxplot, thus lowering the scale of the y-axis and making it more appealing to the eye.
It's important to stress that I still want outliers to be included in my boxplots, however, I want to remove only the extreme outliers. If you need any more information from my end please be sure to let me know.
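I can't reproduce your graph without the data, but including
geom_boxplot(outlier.shape=NA)
should hide the outliers (note that this hides all of them, not only the extreme ones). You can manually adjust the y scale with
scale_y_continuous(limits=c(-5, 1)) # or whatever values you want to use.
Be aware that scale limits drop observations outside the range before the box statistics are computed, whereas coord_cartesian(ylim=c(-5, 1)) only zooms in and leaves the boxes unchanged.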
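To hide only the extreme outliers, one option is to compute y-limits per plot and pass them to coord_cartesian(), so ordinary outliers stay visible and the box statistics are unchanged. A sketch, assuming "extreme" means more than 3 × IQR beyond the quartiles (Tukey's outer fence; the factor k is an assumption you can tune). extreme_free_limits() is a hypothetical helper and the demo data are invented:

```r
library(ggplot2)

# Hypothetical helper: y-range covering every value except "extreme" outliers,
# defined here as points more than k * IQR below Q1 or above Q3
extreme_free_limits <- function(y, k = 3) {
  q <- quantile(y, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  keep <- y[!is.na(y) & y >= q[1] - k * iqr & y <= q[2] + k * iqr]
  range(keep)
}

# Invented stand-in for one column of SE_Data, with one extreme value
set.seed(1)
demo <- data.frame(
  Group_Desc = rep(c("A", "B"), each = 20),
  value      = c(rnorm(39, mean = 10, sd = 2), 500)
)

lims <- extreme_free_limits(demo$value)
p <- ggplot(demo, aes(x = Group_Desc, y = value)) +
  geom_boxplot() +
  # Zoom only: the 500 point is off-screen, but the boxes still use all data
  coord_cartesian(ylim = lims)
print(p)
```

Inside the loop from the question, the same idea would be coord_cartesian(ylim = extreme_free_limits(SE_Data[[i]])); for plots without extreme values, the limits simply cover all the data.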

R: how to make multiple plots from one CSV, grouping by a column

I'd like to put multiple plots onto a single visual output in R, based on data that I have in a CSV that looks something like this:
user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852
I can read this just fine, using, e.g.:
data = read.csv('data.csv')
For the moment, a fairly simple plot is fine, so I'm just trying plot() without much to it (setting type='o' to get lines and points), and, from solving a past problem, I know that I can do, e.g., the following to get data for just fred:
plot(data$time[which(data$user == 'fred')], data$size[which(data$user == 'fred')], type='o')
What I'd like, though, is to have the data for each user all showing up on one set of axes, with color coding (and a legend to match users to colors) to identify different user data.
And if another user shows up, I'd like another line to show up, with another color (perhaps recycling if I have too many users at once).
However, just this doesn't do it:
plot(data$size, data$time, type='o',col=c("red", "blue", "green"))
Because it doesn't seem to group by the user.
And just this:
plot(data, type='o')
gives me an error:
Error in plot.default(...) :
formal argument "type" matched by multiple actual arguments
This:
plot(data)
does do something, but not what I want.
I've poked around, but I'm new enough to R that I'm not quite sure how best to search for this, nor where to look for examples that would hit a use-case like this.
I even got somewhat closer with this:
plot(data$size[which(data$user == 'wilma')], data$time[which(data$user == 'wilma')], type='o', col=c('red'))
lines(data$size[which(data$user == 'fred')], data$time[which(data$user == 'fred')], type='o', col=c('green'))
lines(data$size[which(data$user == 'barney')], data$time[which(data$user == 'barney')], type='o', col=c('blue'))
This gives me a plot (which I'd post inline, but as a new user, I'm not allowed to yet):
not-quite-right plot
which is kind of close to what I want, except that it:
doesn't have a legend
has ugly axis labels, instead of just time and size
is scaled to the first plot, and thus is missing data from some of the others
isn't sorted by x-axis, which I could do externally, though I'm guessing I could do it fairly easily in R.
So, the question, ultimately, is this:
What's an easy way to plot data like this which:
has multiple lines based on the labels in the first column of the CSV
uses the same set of axes for the data in columns 2 and 3, regardless of the label
has a legend and color-coding for which label is being used for a particular line (or set of points)
will adapt to adding new labels to the data file, hopefully without change to the R code.
Thanks in advance for any help or pointers on this.
P.S. I looked around for similar questions, and found one that's sort of close, but it's not quite the same, and I failed to figure out how to adapt it to what I'm trying to do.
Good question. This is doable in base plot, but it's even easier and more intuitive using ggplot2. Below is an example of how to do this with random data in ggplot2.
First, download and install the package:
install.packages("ggplot2",repos='http://cran.us.r-project.org')
library(ggplot2)
Next, generate the data:
a <- c(rep('a',3),rep('b',3),rep('c',3))
b <- rnorm(9,50,30)
c <- rep(seq(1,3),3)
dat <- data.frame(a,b,c)
Finally, make the plot
ggplot(data=dat, aes(x=c, y=b , group=a, colour=a)) + geom_line() + geom_point()
Basically, you are telling ggplot that your x axis corresponds to the c column (dat$c), your y axis corresponds to the b column (dat$b), and that it should group (draw separate lines) by the a column (dat$a). colour specifies that you want to color by the a column as well.
The resulting graph looks like this:
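Applied to the CSV from the question, the same pattern might look like the sketch below (read.csv(text = ...) stands in for the real data.csv file). geom_line() connects points in order of the x variable, so the data do not need to be sorted first, and any new user in the file gets its own line, color and legend entry without code changes:

```r
library(ggplot2)

# The sample rows from the question; replace with read.csv("data.csv")
data <- read.csv(text = "user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852")

# One line per user on shared axes; colour also produces the legend
p <- ggplot(data, aes(x = size, y = time, colour = user)) +
  geom_line() +
  geom_point() +
  labs(x = "size", y = "time")
print(p)
```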

How to plot one column vs the rest in R

I have a data set where column [,1] is time and the next 14 columns are magnitudes. I would like to scatter plot all the magnitudes vs. time on one graph, with each column gridded (layered on top of one another).
I want to use the raw data to make these graphs. I can make them separately, but would like to only have to do this process once.
The data set is called A; the only independent variable is time (the first column).
df<-data.frame(time=A[,1],V11=A[,2],V08=A[,3],
V21=A[,4],V04=A[,5],V22=A[,6],V23=A[,7],
V24=A[,8],V25=A[,9],V07=A[,10],xxx=A[,11],
V26=A[,12],PV2=A[,13],V27=A[,14],V28=A[,15],
NV1=A[,16])
I tried the code suggested by @VlooO, but it scrunched the graphs, making them too hard to decipher, and each had its own axes. All my graphs can be on the same axes, just separated by their headings.
Looking at ggplot2, I think it would be perfect for what I want.
ggplot(data=df.melt,aes(x=time,y=???))
I'm confused about what my y should be, since I want to reference each different column.
Thanks R community
I hope I understand you correctly:
df<-data.frame(time=rnorm(10),A=rnorm(10),B=rnorm(10),C=rnorm(10))
par(mfrow=c(length(df)-1,1))
sapply(2:length(df), function(x){
plot(df[,c(1,x)])
})
The result would be
Here are some hints, since you don't provide a reproducible example or show what you have tried:
Use list.files to go through all your documents
Use lapply to loop over the result of the previous step and read your data
Put your data in the long format using melt from reshape2 and the variable time as id.
Use ggplot2 to plot using the variable as aes color/group.
library(ggplot2)
library(reshape2)
invisible(lapply(list.files(pattern=...), function(x){
  # assumes each file has a header row that includes a 'time' column
  dt = read.table(x, header=TRUE)
  dt.l = melt(dt, id.vars='time')
  print(ggplot(dt.l) + geom_line(aes(x=time, y=value, color=variable)))
}))
If you don't need ggplot2, then the matplot function for base graphics can be used to do what you want in one command.
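A minimal matplot() sketch, assuming the data frame has time in the first column and one magnitude per remaining column (the three columns and their values below are invented): every column except time is drawn against time on one set of axes, with a legend keyed to the column names.

```r
# Invented example: a time column followed by three magnitude columns
set.seed(42)
df <- data.frame(time = 1:20,
                 V11 = cumsum(rnorm(20)),
                 V08 = cumsum(rnorm(20)),
                 V21 = cumsum(rnorm(20)))

mags <- as.matrix(df[, -1])      # every column except time
cols <- seq_len(ncol(mags))      # one color per column

# type = "p" gives a scatter plot; type = "l" would draw lines instead
matplot(df$time, mags, type = "p", pch = 16, col = cols,
        xlab = "time", ylab = "magnitude")
legend("topleft", legend = colnames(mags), col = cols, pch = 16)
```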
SOLUTION:
After looking through a bunch more problems and playing around a bit more with ggplot2, I found code that works pretty well. After I made my data frame (stated above), here is what I did:
library(reshape2)
library(ggplot2)
df.m <- melt(df, "time")
ggplot(df.m, aes(time, value, colour = variable)) + geom_line() +
  facet_wrap(~ variable, ncol = 2)
I would post the image but I don't have enough reputation points yet.
I still don't really understand why "value" is placed in the y position in aes(time, value, ...). If anyone could provide an explanation, that would be greatly appreciated. My last question: does anyone know how to make the subplot titles smaller?
Can I use cex.lab=, cex.main= in ggplot2?
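A sketch of what melt() produces, on an invented three-column example, shows why value ends up on the y-axis: melt() stacks every non-id column into a variable column (the old column name) and a value column (the number), so value is the only numeric column left to plot against time. ggplot2 does not use the base-graphics cex.* parameters; text sizes are set through theme(), where strip.text controls the facet titles (size = 8 is an arbitrary choice):

```r
library(ggplot2)
library(reshape2)

# Invented wide data: time plus three magnitude columns
set.seed(7)
df <- data.frame(time = 1:10, V11 = rnorm(10), V08 = rnorm(10), V21 = rnorm(10))

# Wide -> long: columns become time, variable (old column name), value
df.m <- melt(df, id.vars = "time")
print(head(df.m))

# aes() matches unnamed arguments in order: x = time, y = value
p <- ggplot(df.m, aes(time, value, colour = variable)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 2) +
  theme(strip.text = element_text(size = 8))   # smaller facet titles
print(p)
```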
