Decrease number of x-axis ticks (labels) in barchart

Decrease number of x-axis ticks (labels) in barchart - r

I am using barchart from the lattice package. I have time series data going back 10 years, and I would like the x-axis to be displayed in the format %b-%Y, in six month intervals. This is trivially accomplished in xyplot (given vector of dates DateVector) with syntax such as:
scales=list(x=list(format = "%b-%Y",tick.number = length(DateVector)/2))
barchart ignores the tick.number option for factors by design, however, so the x axis becomes unreadable for large number of data labels. How can I reduce the number of ticks and/or tick labels?
Here is a simple example that reproduces my problem, with data following (save the data as Testrr.csv to run, apologies if this is the wrong format, it's my first time ;)
library(lattice)
inptTrans <- read.csv("Testrr.csv")
inptTrans$NotherTime <- as.Date(as.character(inptTrans$TransDateS),"%m/%d/%Y")
xyp2 <- barchart (NumE+NumF~ NotherTime, data=inptTrans, main = sprintf("Total")
,type='r',xlab = '',ylab='',col=c('red','black')
,horizontal=FALSE
,scales=list(x=list( rot=45,cex=1.0 ,
labels=format(inptTrans$NotherTime,"%b-%Y"),tick.number=2)
)
,key=list(text = list(c("Num F","Num E"))
,rectangle=list(col=c('black','red') ),columns = 2
,corner = c(0.05,-0.11),lty = c(1),lwd=3)
,stack=TRUE
)
png(sprintf('Testrr.png'),width = 900, height = 750)
print(xyp2)
dev.off()
Testrr.csv:
TransDateS,NumTot,NumF,NumE
01/15/2003,339486,18478,293879
02/15/2003,343761,16430,295272
03/15/2003,413700,17924,356004
04/15/2003,432741,18260,377046
05/15/2003,465439,18721,406632
06/15/2003,490699,18276,424773
07/15/2003,507818,18354,440237
08/15/2003,506530,17391,437386
09/15/2003,481039,17809,411568
10/15/2003,483364,19507,412902
11/15/2003,392189,16308,329675
12/15/2003,444249,18342,367489
01/15/2004,364662,20009,308456
02/15/2004,389718,16894,329946
03/15/2004,511386,20207,434052
04/15/2004,525563,19867,452646
05/15/2004,538570,17557,463289
06/15/2004,620868,20386,535523
07/15/2004,581368,20844,500511
08/15/2004,580773,19687,495930
09/15/2004,523992,21322,442304
10/15/2004,507288,18907,427265
11/15/2004,489571,19135,409818
12/15/2004,510904,21213,419328
01/15/2005,419417,21930,351257
02/15/2005,446402,19600,374778
03/15/2005,581109,22655,486727
04/15/2005,574275,21263,487661
05/15/2005,617399,21033,524466
06/15/2005,688876,26999,584782
07/15/2005,610606,20712,516127
08/15/2005,670746,23171,562919
09/15/2005,601594,23769,499257
10/15/2005,537107,21057,446620
11/15/2005,517340,22378,421351
12/15/2005,507610,22002,403392
01/15/2006,409802,25032,334607
02/15/2006,429196,23383,350498
03/15/2006,567056,27528,461193
04/15/2006,502403,24994,415797
05/15/2006,578793,27765,480468
06/15/2006,603134,31641,501012
07/15/2006,507408,25796,423143
08/15/2006,557500,31435,460225
09/15/2006,475884,30448,386443
10/15/2006,469092,31648,385751
11/15/2006,432720,36363,349832
12/15/2006,422369,30619,335706
01/15/2007,365297,38993,302488
02/15/2007,372276,34855,308599
03/15/2007,455525,39038,377113
04/15/2007,431043,36892,363764
05/15/2007,473539,42371,401959
06/15/2007,483341,44540,408415
07/15/2007,441046,43735,373058
08/15/2007,446111,48509,375242
09/15/2007,341554,45252,283456
10/15/2007,365869,55746,304820
11/15/2007,333946,57399,274292
12/15/2007,309551,51137,252391
01/15/2008,270806,70554,229161
02/15/2008,289606,66395,245970
03/15/2008,328369,65904,279960
04/15/2008,353531,78510,308293
05/15/2008,375080,81284,328119
06/15/2008,390034,80175,342170
07/15/2008,378648,85689,334255
08/15/2008,363756,83357,321556
09/15/2008,350942,79301,309385
10/15/2008,330164,75568,293867
11/15/2008,252408,67388,222510
12/15/2008,296037,65057,261619
01/15/2009,222048,69380,201462
02/15/2009,247591,77532,224236
03/15/2009,305516,57897,277544
04/15/2009,322308,65778,295160
05/15/2009,335134,78854,305599
06/15/2009,386702,93204,353285
07/15/2009,391393,91153,358731
08/15/2009,361150,77043,329656
09/15/2009,360568,79893,327923
10/15/2009,374886,90766,339969
11/15/2009,342502,77895,305321
12/15/2009,330770,81255,300899
01/15/2010,238132,94309,220064
02/15/2010,261558,81167,241123
03/15/2010,363432,103052,335172
04/15/2010,385535,100984,355852
05/15/2010,381303,95408,350883
06/15/2010,416893,94812,376190
07/15/2010,297810,94574,278054
08/15/2010,313494,98536,291501
09/15/2010,301782,110154,280312
10/15/2010,279117,80204,260114
11/15/2010,270178,67700,250713
12/15/2010,301323,67770,279877
01/15/2011,232283,78998,219762
02/15/2011,236917,64257,223213
03/15/2011,312669,80275,295648
04/15/2011,317720,77881,301463
05/15/2011,332220,76017,315955
06/15/2011,364962,79413,347253
07/15/2011,320199,67812,305785
08/15/2011,352555,74725,337283
09/15/2011,316858,82664,303357
10/15/2011,293425,69684,281407
11/15/2011,279320,73735,267671
12/15/2011,295369,70498,282722
01/15/2012,238417,73051,229405
02/15/2012,267105,65015,256719
03/15/2012,328104,63709,315019
04/15/2012,330508,58175,318258
05/15/2012,369418,65886,356174
06/15/2012,361304,59515,348708
07/15/2012,305613,49700,295448
08/15/2012,227541,35801,219883

The argument tick.number is really only a suggestion about placing tick marks. In this case, using the at argument is likely to produce more consistent results.
If you want to plot every sixth label, first create a vector to index this sequence:
okLabs <- seq(1, nrow(inptTrans), by = 6)
okLabs
[1] 1 7 13 19 25 31 37 43 49 55 61 67 73 79 85 91 97 103 109 115
Then you can pass okLabs to the at argument and use it to subset your labels (here the scales list is assigned to an object that you can pass to the scales argument):
scalesList <- list(x = list(rot = 45, cex = 1.0,
labels = format(inptTrans$NotherTime, "%b-%Y")[okLabs], at = okLabs))
barchart (NumE+NumF~ NotherTime, data=inptTrans, main = sprintf("Total")
,type='r',xlab = '',ylab='',col=c('red','black')
,horizontal=FALSE
,scales= scalesList
,key=list(text = list(c("Num F","Num E"))
,rectangle=list(col=c('black','red') ),columns = 2
,corner = c(0.05,-0.11),lty = c(1),lwd=3)
,stack=TRUE
)
NOTE that this code won't work well when there are missing (NA) values in the data you're trying to plot. For that, you might try omitting the rows with missing data and creating a new data.frame from that subset.

Related

Continuous data binning based on observation distribution/frequency to decide bin range r dplyr

I have now for days without luck scanned the internet for help on this issue. Any suggestions would be highly appreciated! (especially in a tidyverse-friendly syntax)
I have a tibble with approx. 4300 rows/obs and 320 columns. One column is my dependent variable, a continuous numeric column called "RR" (Response Ratios). My goal is to bin the RR values into 10 factor levels. Later for Machine Learning classification.
I have experimented with the cut() function with this code:
df <- era.af.Al_noNaN %>%
rationalize() %>%
drop_na(RR) %>%
mutate(RR_MyQuantile = cut(RR,
breaks = unique(quantile(RR, probs = seq.int(0,1, by = 1 / numbers_of_bins))),
include.lowest = TRUE))
But I have no luck, because my bins come out with equal n in each, however, that does not reflect the distribution of the data.. I have studied a bit here https://towardsdatascience.com/understanding-feature-engineering-part-1-continuous-numeric-data-da4e47099a7b but I simply cannot achieve the same in R.
Here is the distribution of my RR data values grouped into classes *not what I want

You can try hist() to get the breaks. It's for plotting histograms but it also provides other associated data as side effect. In the example below, the plot is suppressed by plot = FALSE to expose the breaks data. Then, use that in cut(). This should give you the cutoffs, maintaining the distribution of the variable.
hist(iris$Sepal.Length, breaks = 5, plot = FALSE)
# $breaks
# [1] 4 5 6 7 8
#
# $counts
# [1] 32 57 49 12
#
# ...<omitted>
breaks <- hist(iris$Sepal.Length, breaks = 5, plot = FALSE)$breaks
dat <- iris %>%
mutate(sepal_length_group = cut(Sepal.Length, breaks = breaks))
dat %>%
count(sepal_length_group)
# sepal_length_group n
# 1 (4,5] 32
# 2 (5,6] 57
# 3 (6,7] 49
# 4 (7,8] 12

Thank you!
I also experimented using cut() and then count(). Then I use the labels=FALSE to give labels that can be used in a new mutate for a new column with character names of the intervals groups..
numbers_of_bins = 10
df <- era.af.Al_noNaN %>%
rationalize() %>%
drop_na(RR) %>%
mutate(RR_MyQuantile = cut(RR,
breaks = unique(quantile(RR, probs = seq.int(0,1, by = 1 / numbers_of_bins))),
include.lowest = TRUE))
head(df$RR_MyQuantile,10)
df %>%
group_by(RR_MyQuantile) %>%
count()

Median and Boxplot (R)

I am writing to your forum because I do not find solution for my problem. I am trying to represent graphically the Median catching time (MCT) of mosquito that we (my team and I) have collected (I am currently in an internship to study the malaria in Ivory Coast). The MCT represents the time for which 50% of the total malaria vectors were caught on humans.
For example, we collected this sample:
Hour of collection / Mosquitoes number:
20H-21H = 1
21H-22H = 1
22H-23H = 2
23H-00H = 2
00H-01H = 13
01H-02H = 10
02H-03H = 15
03H-04H = 15
04H-05H = 8
05H-06H = 10
06H-07H = 6
Here the effective cumulated is 83 mosquitoes. And I am assuming that the median of this mosquito serie is 83+1/2 = 42 (And I don't even find this number on R), inducing a Median catching time at 2 am (02).
Therefore, I have tried to use the function "boxplot" with different parameters, but I cannot have what I want to represent. Indeed, I have boxes for each hour of collection when I want the representation of the effective cumulated over the time of collection. And the time use in R is "20H-21H" = 20, "21H-22H" = 21 etc.
I have found an article (Nicolas Moiroux, 2012) who presents the Median Catching Time and a boxplot that I should like to have. I copy the image of the cited boxplot:
Boxplot_Moiroux2012
Thank you in advance for your help, and I hope that my grammar is fine (I speak and write mainly in French, my mother tongue).
Kind Regards,
Edouard
PS : And regarding the code I have used with this set of data, here I am (with "Eff" = Number of mosquito and "Heure" = time of collection):
sum(Eff)
as.factor(Heure)
tapply(Eff,Heure,median)
tapply(Heure,Eff,median)
boxplot(Eff,horizontal=T)
boxplot(Heure~Eff)
boxplot(Eff~Heur))
(My skills on R are not very sharp...)

You need to use a trick since you already have counts and not the time data for each catch.
First, you convert your time values to a more continuous variable, then you generate a vector with all the time values and then you boxplot (with a custom axis).
txt <- "20H-21H = 1
21H-22H = 1
22H-23H = 2
23H-00H = 2
00H-01H = 13
01H-02H = 10
02H-03H = 15
03H-04H = 15
04H-05H = 8
05H-06H = 10
06H-07H = 6"
dat <- read.table(text = txt, sep = "=", h = F)
colnames(dat) <- c("collect_time", "nb_mosquito")
# make a continuous numerical proxy for time
dat$collect_time_num <- 1:nrow(dat)
# get values of proxy according to your data
tvals <- rep(dat$collect_time_num, dat$nb_mosquito)
# plot
boxplot(tvals, horizontal = T, xaxt = "n")
axis(1, labels = as.character(dat$collect_time), at = dat$collect_time_num)
outputs the following plot :

Coloring Rarefaction curve lines by metadata (vegan package) (phyloseq package)

First time question asker here. I wasn't able to find an answer to this question in other posts (love stackexchange, btw).
Anyway...
I'm creating a rarefaction curve via the vegan package and I'm getting a very messy plot that has a very thick black bar at the bottom of the plot which is obscuring some low diversity sample lines.
Ideally, I would like to generate a plot with all of my lines (169; I could reduce this to 144) but make a composite graph, coloring by Sample Year and making different types of lines for each Pond (i.e: 2 sample years: 2016, 2017 and 3 ponds: 1,2,5). I've used phyloseq to create an object with all my data, then separated my OTU abundance table from my metadata into distinct objects (jt = OTU table and sampledata = metadata). My current code:
jt <- as.data.frame(t(j)) # transform it to make it compatible with the proceeding commands
rarecurve(jt
, step = 100
, sample = 6000
, main = "Alpha Rarefaction Curve"
, cex = 0.2
, color = sampledata$PondYear)
# A very small subset of the sample metadata
Pond Year
F16.5.d.1.1.R2 5 2016
F17.1.D.6.1.R1 1 2017
F16.1.D15.1.R3 1 2016
F17.2.D00.1.R2 2 2017
enter image description here

Here is an example of how to plot a rarefaction curve with ggplot. I used data available in the phyloseq package available from bioconductor.
to install phyloseq:
source('http://bioconductor.org/biocLite.R')
biocLite('phyloseq')
library(phyloseq)
other libraries needed
library(tidyverse)
library(vegan)
data:
mothlist <- system.file("extdata", "esophagus.fn.list.gz", package = "phyloseq")
mothgroup <- system.file("extdata", "esophagus.good.groups.gz", package = "phyloseq")
mothtree <- system.file("extdata", "esophagus.tree.gz", package = "phyloseq")
cutoff <- "0.10"
esophman <- import_mothur(mothlist, mothgroup, mothtree, cutoff)
extract OTU table, transpose and convert to data frame
otu <- otu_table(esophman)
otu <- as.data.frame(t(otu))
sample_names <- rownames(otu)
out <- rarecurve(otu, step = 5, sample = 6000, label = T)
Now you have a list each element corresponds to one sample:
Clean the list up a bit:
rare <- lapply(out, function(x){
b <- as.data.frame(x)
b <- data.frame(OTU = b[,1], raw.read = rownames(b))
b$raw.read <- as.numeric(gsub("N", "", b$raw.read))
return(b)
})
label list
names(rare) <- sample_names
convert to data frame:
rare <- map_dfr(rare, function(x){
z <- data.frame(x)
return(z)
}, .id = "sample")
Lets see how it looks:
head(rare)
sample OTU raw.read
1 B 1.000000 1
2 B 5.977595 6
3 B 10.919090 11
4 B 15.826125 16
5 B 20.700279 21
6 B 25.543070 26
plot with ggplot2
ggplot(data = rare)+
geom_line(aes(x = raw.read, y = OTU, color = sample))+
scale_x_continuous(labels = scales::scientific_format())
vegan plot:
rarecurve(otu, step = 5, sample = 6000, label = T) #low step size because of low abundance
One can make an additional column of groupings and color according to that.
Here is an example how to add another grouping. Lets assume you have a table of the form:
groupings <- data.frame(sample = c("B", "C", "D"),
location = c("one", "one", "two"), stringsAsFactors = F)
groupings
sample location
1 B one
2 C one
3 D two
where samples are grouped according to another feature. You could use lapply or map_dfr to go over groupings$sample and label rare$location.
rare <- map_dfr(groupings$sample, function(x){ #loop over samples
z <- rare[rare$sample == x,] #subset rare according to sample
loc <- groupings$location[groupings$sample == x] #subset groupings according to sample, if more than one grouping repeat for all
z <- data.frame(z, loc) #make a new data frame with the subsets
return(z)
})
head(rare)
sample OTU raw.read loc
1 B 1.000000 1 one
2 B 5.977595 6 one
3 B 10.919090 11 one
4 B 15.826125 16 one
5 B 20.700279 21 one
6 B 25.543070 26 one
Lets make a decent plot out of this
ggplot(data = rare)+
geom_line(aes(x = raw.read, y = OTU, group = sample, color = loc))+
geom_text(data = rare %>% #here we need coordinates of the labels
group_by(sample) %>% #first group by samples
summarise(max_OTU = max(OTU), #find max OTU
max_raw = max(raw.read)), #find max raw read
aes(x = max_raw, y = max_OTU, label = sample), check_overlap = T, hjust = 0)+
scale_x_continuous(labels = scales::scientific_format())+
theme_bw()

I know this is an older question but I originally came here for the same reason and along the way found out that in a recent (2021) update vegan has made this a LOT easier.
This is an absolutely bare-bones example.
Ultimately we're going to be plotting the final result in ggplot so you'll have full customization options, and this is a tidyverse solution with dplyr.
library(vegan)
library(dplyr)
library(ggplot2)
I'm going to use the dune data within vegan and generate a column of random metadata for the site.
data(dune)
metadata <- data.frame("Site" = as.factor(1:20),
"Vegetation" = rep(c("Cactus", "None")))
Now we will run rarecurve, but provide the argument tidy = TRUE which will export a dataframe rather than a plot.
One thing to note here is that I have also used the step argument. The default step is 1, and this means by default you will get one row per individual per sample in your dataset, which can make the resulting dataframe huge. Step = 1 for dune gave me over 600 rows. Reducing the step too much will make your curves blocky, so it will be a balance between step and resolution for a nice plot.
Then I piped a left join right into the rarecurve call
dune_rare <- rarecurve(dune,
step = 2,
tidy = TRUE) %>%
left_join(metadata)
Now it will be plottable in ggplot, with a color/colour call to whatever metadata you attached.
From here you can customize other aspects of the plot as well.
ggplot(dune_rare) +
geom_line(aes(x = Sample, y = Species, group = Site, colour = Vegetation)) +
theme_bw()
dune-output
(Sorry it says I'm not allowed to embed the image yet :( )

R: combine mpg trans columns into new dataframe containing two columns

I am working my way through the R for Data Science Manual, currently finishing chapter 3. I am trying to find a way to produce a plot combining the different types of automatic and manual transmission into two plots, instead of what I have currently:
# Install necessary packages
install.packages("tidyverse")
library(tidyverse)
# Create the plot
fuelbytrans <- ggplot(data = mpg) +
geom_jitter(
mapping = aes(x = displ, y = hwy, colour = fl),
size = 0.75) +
# Change labels for title and x and y axes
labs(
title = "Drivstofforbruk iht. datasettet «mpg» fordelt på girkasse og motorvolum",
x = "Motorvolum",
y = "Am. mil per gallon")
# Run it
fuelbytrans
# Set colours and labels for fuel legend and position it on the bottom
# e (etanol), d (diesel), r (regular [bensin, lavoktan]), p (premium [bensin, høyoktan]),
# c (CNG)
cols <- c( #kilde: http://colorbrewer2.org/#type=diverging&scheme=PRGn&n=5
"c" = "yellow",
"d" = "red",
"e" = "black",
"p" = "blue",
"r" = "darkgreen"
)
labels_fuel <- fuelbytrans +
scale_colour_manual(
name = "Drivstoff",
values = cols,
breaks = c("c", "d", "e", "p", "r"),
labels = c("CNG",
"diesel",
"etanol",
"bensin,\nhøyoktan",
"bensin,\nlavoktan")) +
theme(legend.position = "bottom",
legend.background = element_rect(
fill = "gray90",
size = 2,
linetype = "dotted"
))
# Run it
labels_fuel
# Wrap by transmission type
labels_fuel + facet_wrap(~ trans, nrow = 1)
As you can see, what I get is 8 columns for automatic transmission, and two for manual; what I would like is just two columns, one for automatic and one for manual, concatenating the plots. I have presently no idea how to do this, and would appreciate all help.
If any information is missing, should have been written differently, or could otherwise be improved, please advise.
I am running RStudio 0.99.902. I am quite new to R.

You have more than 2 types of transmission in your data:
table(mpg$trans)
# auto(av) auto(l3) auto(l4) auto(l5) auto(l6)
# 5 2 83 39 6
# auto(s4) auto(s5) auto(s6) manual(m5) manual(m6)
# 3 3 16 58 19
You need to group them into 2 groups first, here is one option:
mpg = mpg %>%
mutate(trans2 = if_else(grepl("auto", trans), "auto", "manual"))
table(mpg$trans2)
# auto manual
# 157 77
Then, use the new trans2 variable for facetting (you need to rerun the plot).
Two more comments:
If you want to know more about an R function, call ?function_name in R. This will bring up the help page for that function. It usually includes examples that you can run from R to see what it does in action. (Plus here we are using grepl, so it would also be useful to Google the term "regular expressions", if you are not familiar with them).
Since you are reading r4ds, you need to get familiar with the "pipe operator" used in dplyr, tidyr and other tidyverse packages sooner rather than later. It can chain multiple function calls together in an easily readable way. Google it or take a look here. The call could also be written without the pipe like this:
mpg = mutate(mpg, trans2 = if_else(grepl("auto", trans), "auto", "manual"))
In this particular case, the pipe operator is actually not that useful. I am just so used to it I went for it automatically.

Scatter Plot Matrices

I have a matrix mat[n,m] and I'd like to use splom to plot the scatterplots of mat[,"col4"] as a function of all other column values. Also I'd like to add different colors to points of certain row numbers which are stored in rownID[]. I've seen examples using splom but they plot all variables against all variables and use group of columns to change the color of points. Is it possible to do what I want using splom (or other R function)?
Example:
set.seed(1)
mat <- matrix(sample(0:100, 16), ncol=4)
dimnames(mat) <- list(rownames(mat, do.NULL = FALSE, prefix = "row"),
colnames(mat, do.NULL = FALSE, prefix = "col"))
mat
col1 col2 col3 col4
row1 26 19 58 61
row2 37 86 5 33
row3 56 97 18 66
row4 89 62 15 42
rowID <- matrix(c(1,3), ncol=1, nrow=2)
Thanks to https://stackoverflow.com/a/16033003/1262767
I've been using featurePlot function of caret package but I don't know how to change color of some specific points (that's why I'm interested in splom):
featurePlot(mat, mat$col4, plot = "scatter",
## Add some space between the panels
between = list(x = 1, y = 1), main = "testSet",
## Add a background grid ('g') and a smoother ('smooth')
type = c("g", "p", "s"))

This doesn't really seem like a good fit for splom. I think you're be better off reshaping your data and using a standard xyplot. For example
library(reshape2)
mm<-melt(cbind(data.frame(mat), high=1:nrow(mat) %in% rowID), c("col4","high"))
xyplot(col4~value|variable, mm, groups=high)
which gives

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Decrease number of x-axis ticks (labels) in barchart - r

Related

Continuous data binning based on observation distribution/frequency to decide bin range r dplyr

Median and Boxplot (R)

Coloring Rarefaction curve lines by metadata (vegan package) (phyloseq package)

R: combine mpg trans columns into new dataframe containing two columns

Scatter Plot Matrices

Categories

Resources