How do you print a table in knitr? - r

I am trying to use knitr to print a data frame in table format using xtable:
```{r xtable,fig.width=10, fig.height=8, message=FALSE, results = 'asis', echo=FALSE, warning=FALSE, fig.cap='long caption', fig.scap='short',tidy=FALSE}
print(xtable(d),format="markdown")
```
This is the data frame d:
d <- structure(list(Hostname = structure(c(8L, 8L, 9L, 5L, 6L, 7L,
1L, 2L, 3L, 4L), .Label = c("db01", "db02", "farm01", "farm02",
"tom01", "tom02", "tom03", "web01", "web03"), class = "factor"),
Date = structure(c(6L, 10L, 5L, 3L, 2L, 1L, 8L, 9L, 7L, 4L
), .Label = c("10/5/2015 1:15", "10/5/2015 1:30", "10/5/2015 2:15",
"10/5/2015 4:30", "10/5/2015 8:30", "10/5/2015 8:45", "10/6/2015 8:15",
"10/6/2015 8:30", "9/11/2015 5:00", "9/11/2015 6:00"), class = "factor"),
Cpubusy = c(31L, 20L, 30L, 20L, 18L, 20L, 41L, 21L, 29L,
24L), UsedPercentMemory = c(99L, 98L, 95L, 99L, 99L, 99L,
99L, 98L, 63L, 99L)), .Names = c("Hostname", "Date", "Cpubusy",
"UsedPercentMemory"), class = "data.frame", row.names = c(NA,
-10L))
Any ideas what I am missing here?

Try kable from knitr. It will format the table nicely.
If you would like to use xtable try:
print(xtable(d), type="latex", comment=FALSE)
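As a minimal sketch of the kable suggestion, assuming the knitr package is installed: the small data frame `d_small` below is a hypothetical stand-in built from two rows of d, and `format = "markdown"` is chosen only to produce a plain pipe table.

```r
# Assumes the knitr package is installed
library(knitr)

# Hypothetical stand-in: two rows taken from the data frame d in the question
d_small <- data.frame(
  Hostname = c("web01", "db01"),
  Cpubusy = c(31L, 41L),
  UsedPercentMemory = c(99L, 99L)
)

# kable() returns the table as a character vector of markdown lines
tbl <- kable(d_small, format = "markdown")
print(tbl)
```

Inside a knitr chunk, calling `kable(d)` as the last expression should be enough; the table is emitted as markup rather than as console output.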

While Pierre’s solution works, this should ideally happen automatically. Luckily, you can use knitr hooks to make this work.
That is, given this code:
```{r}
d
```
We want knitr to automatically produce a nicely formatted table, without having to invoke a formatting function manually.
Here’s some code I’m using for that. You need to put this at the beginning of your knitr document, or in the code that’s compiling your document:
opts_chunk$set(render = function (object, ...) {
    if (pander_supported(object))
        pander(object, style = 'rmarkdown')
    else if (isS4(object))
        show(object)
    else
        print(object)
})
This uses pander and additionally requires a helper function, pander_supported:
library(pander)
pander_supported = function (object)
    UseMethod('pander_supported')
pander_supported.default = function (object)
    any(class(object) %in% sub('^pander\\.', '', methods('pander')))
pander.table = function (x, ...)
    pander(`rownames<-`(rbind(x), NULL), ...)
For nicer formatting, I also use these defaults:
panderOptions('table.split.table', Inf)
panderOptions('table.alignment.default',
              function (df) ifelse(sapply(df, is.numeric), 'right', 'left'))
panderOptions('table.alignment.rownames', 'left')

If you are rendering your knitr/rmarkdown report to HTML, you can use the function rmarkdown::paged_table().
For example:
---
title: "My report"
output: html_document
---
```{r}
library(rmarkdown)
f <- function() {
paged_table(mtcars)
}
f()
```
Knitting this .Rmd produces an HTML page in which the data frame is shown as a paged table.
Also, consider using the gt package via gt().

Related

Plotly heatmap with different cell widths

I would like to plot an interactive heatmap, where the column widths are different.
Although I managed to get different cell widths, the widths do not correspond to the values and the ordering is not correct.
The order of the x-axis should remain the same as the segments column in the df data.frame.
If the heatmap doesn't work, I would also be fine with a stacked barchart.
df <- structure(list(
segments = c(101493L, 101493L, 101493L, 101492L, 101492L, 101492L, 101494L, 101494L, 101494L, 102018L, 102018L,
102018L, 102018L, 102018L, 102019L, 102019L, 102019L, 102019L, 102019L),
timestamp = structure(c(1579233600, 1579240800, 1579248000,
1579233600, 1579240800, 1579248000, 1579233600, 1579240800, 1579248000,
1579219200, 1579226400, 1579233600, 1579240800, 1579248000, 1579219200,
1579226400, 1579233600, 1579240800, 1579248000), class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin"),
value = c(91.772, 91.923, 96.968, 104.307, 101.435, 105.539, 104.879, 104.197, 103.038,
96.403, 90.926, 111.807, 115.931, 111.729, 100.129, 86.903, 108.22, 117.841, 112.293),
width = c(5L, 5L, 5L, 2L, 2L, 2L, 3L, 3L, 3L, 10L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L)),
row.names = c(1L, 2L, 3L, 11L, 12L, 13L, 21L, 22L, 23L, 31L, 32L, 33L, 34L, 35L,43L, 44L, 45L, 46L, 47L),
class = "data.frame")
library(plotly)
plot_ly(data = df) %>%
add_trace(type="heatmap",
x = ~as.character(width),
y = ~timestamp,
z = ~value,
xgap = 0.2, ygap = 0.2) %>%
plotly::layout(xaxis = list(rangemode = "nonnegative",
tickmode = "array",
tickvals=as.character(unique(df$width)),
ticktext=as.character(unique(df$segments)),
zeroline = FALSE))
By giving Plotly a matrix for the z-values it seems to work and the widths are respected.
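# Right edge of each segment on the x-axis: cumulative sum of the segment widths
df$newx <- rep(cumsum(df[!duplicated(df$segments),]$width), rle(df$segments)$lengths)
# Full timestamp x segment grid; combinations missing from df become NA
mappdf <- expand.grid(timestamp=unique(df$timestamp), newx=unique(df$newx))
mappdf <- merge(mappdf, df[,c("timestamp","value","newx")], all.x = TRUE, all.y = FALSE, sort = FALSE)
mappdf <- mappdf[order(mappdf$newx, mappdf$timestamp),]
# One row per timestamp, one column per segment
zvals <- matrix(data = mappdf$value,
                nrow = length(unique(df$timestamp)),
                ncol = length(unique(df$newx)))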
plot_ly() %>%
add_heatmap(y = sort(unique(df$timestamp)),
x = c(0,unique(df$newx)),
z = zvals) %>%
plotly::layout(xaxis = list(
title = "",
tickvals=unique(df$newx),
ticktext=paste(unique(df$segments), "-", unique(df$width))
))
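The key step in this answer is turning the segment widths into cell boundaries on the x-axis. A small base R sketch of just that arithmetic, using the five widths from df (the variable names `widths`, `right_edges` and `boundaries` are made up for illustration):

```r
# Widths of the five segments, in the order they appear in df
widths <- c(5, 2, 3, 10, 9)

# The right edge of each cell is the running total of the widths
right_edges <- cumsum(widths)

# Prepending 0 gives the full set of cell boundaries passed as x
boundaries <- c(0, right_edges)

# The gaps between consecutive boundaries recover the original widths
stopifnot(all(diff(boundaries) == widths))
print(boundaries)
```

Because the heatmap receives one more x value than there are columns in z, the x values appear to be treated as cell edges rather than cell centres, which would explain why the widths are respected.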

Using geom_errorbar in ggplot2 results in "Error: geom_errorbar requires the following missing aesthetics: ymin, ymax"

I wanted to create a visualisation for some data I had collected using ggplot2. Everything works fine except that I cannot add error bars for some reason. The code I used is the following:
graph2 <- ggplot(enth_comb, aes(saturated, eocv, color=oil))
graph2 <- graph2 + geom_point()
This worked fine and resulted in the graph I expected. Then I added the following:
graph2 <- graph2 + geom_errorbar(aes(ymin = v_lowlim, ymax = v_highlim))
This gives me the error "Error: geom_errorbar requires the following missing aesthetics: ymin, ymax" despite having provided ymin and ymax. I also tried adding an x value and removing 'aes' but it resulted in the same error.
The data is the following
I appreciate any help or suggestions.
Edit: Added output of dput(enth_comb)
structure(list(oil = structure(c(4L, 6L, 3L, 5L, 2L, 1L), .Label = c("coconut",
"palm", "peanut", "rapeseed", "rice", "sunflower"), class = "factor"),
saturated = c(8L, 11L, 17L, 25L, 82L, 88L), sonounsaturated = c(64L,
20L, 46L, 38L, 7L, 12L), Polyunsaturated = c(28L, 69L, 32L,
37L, 11L, 0L), eocv = c(26991L, 26746L, 28817L, 30056L, 20635L,
29497L), eocm = c(31204L, 30892L, 32964L, 34436L, 22979L,
33233L), eocv_error = c(2073L, 602L, 1932L, 5578L, 2128L,
1267L), eocm_error = c(2396L, 695L, 2210L, 6391L, 2369L,
1427L), v_highlim = c(29064L, 27348L, 30749L, 35634L, 22763L,
30764L), v_lowlim = c(24918L, 26144L, 26885L, 24478L, 18507L,
28230L), m_highlim = c(33600L, 31587L, 35174L, 40827L, 25348L,
34660L), m_lowlim = c(28808L, 30197L, 30754L, 28045L, 20610L,
31806L)), class = "data.frame", row.names = c(NA, -6L))
The full solution is to chain all the layers together in one call:
ggplot(enth_comb, aes(saturated, eocv, color=oil))+
geom_point()+
geom_errorbar(aes(ymin = v_lowlim, ymax = v_highlim))

What is the best way to use agricolae to do ANOVAs on a split plot design?

I'm trying to run some ANOVAs on data from a split plot experiment, ideally using the agricolae package. It's been a while since I've taken a stats class and I wanted to be sure I'm analyzing this data correctly, so I did some searching online and couldn't really find consistency in the way people were analyzing their split plot experiments. What is the best way for me to do this?
Here's the head of my data:
dput(head(rawData))
structure(list(ï..Plot = 2111:2116, Variety = structure(c(5L,
4L, 3L, 6L, 1L, 2L), .Label = c("Burbank", "Hodag", "Lamoka",
"Norkotah", "Silverton", "Snowden"), class = "factor"), Rate = c(4L,
4L, 4L, 4L, 4L, 4L), Rep = c(1L, 1L, 1L, 1L, 1L, 1L), totalTubers = c(594L,
605L, 656L, 729L, 694L, 548L), totalOzNoCulls = c(2544.18, 2382.07,
2140.69, 2401.56, 2440.56, 2503.5), totalCWTacNoCulls = c(461.76867,
432.345705, 388.535235, 435.88314, 442.96164, 454.38525), avgLWratio = c(1.260615419,
1.287949374, 1.111981583, 1.08647584, 1.350686661, 1.107173509
), Hollow = c(14L, 15L, 22L, 25L, 14L, 13L), Double = c(10L,
13L, 15L, 22L, 11L, 9L), Knob = c(86L, 80L, 139L, 156L, 77L,
126L), Researcher = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Wang", class = "factor"),
CullsPounds = c(1.75, 1.15, 4.7, 1.85, 0.8, 5.55), CullsOz = c(28,
18.4, 75.2, 29.6, 12.8, 88.8), totalOz = c(2572.18, 2400.47,
2215.89, 2431.16, 2453.36, 2592.3), totalCWTacCulls = c(466.85067,
435.685305, 402.184035, 441.25554, 445.28484, 470.50245)), row.names = c(NA,
6L), class = "data.frame")
For these data, the whole plot is Rate, the split plot is Variety, the block is Rep, and for discussion's sake here, we can look at totalCWTacNoCulls as the response.
Any help would be very much appreciated! I am still getting the hang of Stack Overflow, so if I have made any mistakes or shared my data wrong, please let me know and I'll change it. Thank you!
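You can do this using the agricolae package as follows:
library(agricolae)
# Convert the design variables to factors in the data frame itself
rawData$Rate <- factor(rawData$Rate)
rawData$Variety <- factor(rawData$Variety)
rawData$Rep <- factor(rawData$Rep)
# with() avoids attach() and the masking problems it can cause
with(rawData, sp.plot(Rep, Rate, Variety, totalCWTacNoCulls))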
The usage, according to the agricolae documentation, is
sp.plot(block, pplot, splot, Y)
where block is the replication, pplot is the main-plot factor, splot is the sub-plot factor, and Y is the response variable.
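For comparison, a split-plot layout can also be sketched with base R's aov() and an Error() term. This is not the agricolae implementation, only a common textbook formulation; the data below are synthetic (random numbers, a made-up second Rate level of 2, and a hypothetical response column `yield`) and serve purely to make the example runnable.

```r
# Synthetic data for illustration only: 3 blocks x 2 rates x 3 varieties
set.seed(1)
dat <- expand.grid(
  Rep = factor(1:3),
  Rate = factor(c(2, 4)),
  Variety = factor(c("Burbank", "Hodag", "Lamoka"))
)
dat$yield <- rnorm(nrow(dat), mean = 440, sd = 20)

# Rate (whole plot) is tested against the Rep:Rate error stratum;
# Variety and Rate:Variety (sub plot) against the within stratum
fit <- aov(yield ~ Rate * Variety + Error(Rep / Rate), data = dat)
print(summary(fit))
```

The point of the Error() term is that the whole-plot factor gets its own, coarser error stratum instead of being tested against the sub-plot residual.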

How to run a function against several dataframes and output dataframes with the same name as input in R

I have several data frames that I am applying a function to.
The function works, but I would like to lapply it over all of them and name each output after its input.
Here is an example of one of the dataframes
structure(list(chr = structure(c(1L, 1L, 1L), .Label = c("chr1",
"chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
"chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22",
"chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX",
"chrY"), class = "factor"), leftPos = c(100260254L, 100735342L,
100805662L), strand.x = structure(c(1L, 1L, 2L), .Label = c("-",
"+"), class = "factor"), X50CellJ_SLX.9395.FSeqJ.fq.gz = c(7L,
295L, 132L), Cytospongex10_SLX.9395.FSeqK.fq.gz = c(72L, 256L,
148L), FFPE20X_SLX.9395.fq.gz = c(5L, 74L, 36L), Tumour10_SMACCO_AH_088_SLX.9396.FSeqH.fq.gz = c(13L,
154L, 65L), Tumour11_SMACCO_SH_020_SLX.9396.FSeqI.fq.gz = c(1L,
0L, 0L), Tumour12_SMACCO_ED_008_SLX.9396.FSeqJ.fq.gz = c(3L,
25L, 8L), Tumour13_SMACCO_AH_086_SLX.9396.FSeqK.fq.gz = c(7L,
120L, 28L), Tumour1_SMACCO_AH_100_SLX.9396.FSeqA.fq.gz = c(0L,
0L, 0L), Tumour2_SMACCO_AH_058_SLX.9396.FSeqB.fq.gz = c(24L,
98L, 42L), Tumour3_SMACCO_SH_051_SLX.9396.FSeqC.fq.gz = c(29L,
92L, 29L), Tumour4_SMACCO_ED_031_SLX.9396.FSeqD.fq.gz = c(18L,
53L, 14L), Tumour5_SMACCO_RS_027_SLX.9396.FSeqE.fq.gz = c(8L,
93L, 17L), Tumour7_SMACCO_AH_026_SLX.9396.FSeqF.fq.gz = c(30L,
205L, 60L), Tumour9_SMACCO_ST_024_SLX.9396.FSeqG.fq.gz = c(15L,
129L, 17L), strand.y = structure(c(1L, 1L, 2L), .Label = c("-",
"+"), class = "factor"), Tumour14_SMACCO_AH_094_SLX.9394.FSeqA.fq.gz = c(0L,
7L, 3L), Tumour15_SMACCO_WG_006_SLX.9394.FSeqB..fq.gz = c(3L,
19L, 4L), Tumour16_SMACCO_ST_035_SLX.9394.FSeqC.fq.gz = c(1L,
23L, 8L), Tumour17_SMACCO_ST_034_SLX.9394.fq.gz = c(7L, 26L,
5L), Control19_SLX.9394.FSeqE.fq.gz = c(51L, 256L, 36L), Control20_SLX.9394.FSeqF.fq.gz = c(23L,
110L, 34L), Control21_SLX.9394.FSeqG..fq.gz = c(30L, 56L,
11L), Control22_SLX.9394.FSeqH.fq.gz = c(22L, 72L, 24L), Control23_SLX.9394.FSeqI.fq.gz = c(10L,
23L, 2L), Control25_SLX.9394.FSeqJ.fq.gz = c(17L, 72L, 8L),
Control27_SLX.9394.FSeqK.fq.gz = c(10L, 21L, 9L), Control28_SLX.9395.FSeqA.fq.gz = c(13L,
40L, 4L), Control29_SLX.9395.FSeqB.fq.gz = c(14L, 39L,
6L), Control30_SLX.9395.FSeqC.fq.gz = c(5L, 32L, 5L),
Control31_SLX.9395.FSeqD.fq.gz = c(7L, 11L, 5L), Control32_SLX.9395.FSeqE.fq.gz = c(5L,
32L, 4L), Control33_SLX.9395.FSeqF.fq.gz = c(10L, 25L,
6L), Control34_SLX.9395.FSeqG.fq.gz = c(3L, 32L, 1L),
Control35_SLX.9395.FSeqH.fq.gz = c(10L, 33L, 0L), Controls = c(0L,
0L, 0L), Samples = c(0L, 0L, 0L)), .Names = c("chr", "leftPos",
"strand.x", "X50CellJ_SLX.9395.FSeqJ.fq.gz", "Cytospongex10_SLX.9395.FSeqK.fq.gz",
"FFPE20X_SLX.9395.fq.gz", "Tumour10_SMACCO_AH_088_SLX.9396.FSeqH.fq.gz",
"Tumour11_SMACCO_SH_020_SLX.9396.FSeqI.fq.gz", "Tumour12_SMACCO_ED_008_SLX.9396.FSeqJ.fq.gz",
"Tumour13_SMACCO_AH_086_SLX.9396.FSeqK.fq.gz", "Tumour1_SMACCO_AH_100_SLX.9396.FSeqA.fq.gz",
"Tumour2_SMACCO_AH_058_SLX.9396.FSeqB.fq.gz", "Tumour3_SMACCO_SH_051_SLX.9396.FSeqC.fq.gz",
"Tumour4_SMACCO_ED_031_SLX.9396.FSeqD.fq.gz", "Tumour5_SMACCO_RS_027_SLX.9396.FSeqE.fq.gz",
"Tumour7_SMACCO_AH_026_SLX.9396.FSeqF.fq.gz", "Tumour9_SMACCO_ST_024_SLX.9396.FSeqG.fq.gz",
"strand.y", "Tumour14_SMACCO_AH_094_SLX.9394.FSeqA.fq.gz",
"Tumour15_SMACCO_WG_006_SLX.9394.FSeqB..fq.gz", "Tumour16_SMACCO_ST_035_SLX.9394.FSeqC.fq.gz",
"Tumour17_SMACCO_ST_034_SLX.9394.fq.gz", "Control19_SLX.9394.FSeqE.fq.gz",
"Control20_SLX.9394.FSeqF.fq.gz", "Control21_SLX.9394.FSeqG..fq.gz",
"Control22_SLX.9394.FSeqH.fq.gz", "Control23_SLX.9394.FSeqI.fq.gz",
"Control25_SLX.9394.FSeqJ.fq.gz", "Control27_SLX.9394.FSeqK.fq.gz",
"Control28_SLX.9395.FSeqA.fq.gz", "Control29_SLX.9395.FSeqB.fq.gz",
"Control30_SLX.9395.FSeqC.fq.gz", "Control31_SLX.9395.FSeqD.fq.gz",
"Control32_SLX.9395.FSeqE.fq.gz", "Control33_SLX.9395.FSeqF.fq.gz",
"Control34_SLX.9395.FSeqG.fq.gz", "Control35_SLX.9395.FSeqH.fq.gz",
"Controls", "Samples"), row.names = c(NA, 3L), class = "data.frame")
Here is what I have so far
mylist <- list(A = OriginalMeta , B = SLX9392 , C = SLX9393, D = SLX9397, E = Gastric, F = Dysplasia, G = GoodDysplasia, H = Cholangio, I = LCM_PS14_1105_1F)
sortIt <- function(df1) {
df1$strand.x<- NULL
df1$strand.y<- NULL
df1$strand<-NULL
df1$X.<-NULL
names(df1)[1] <- c("chr")
#Get rid of X and Y chromosomes
df1 <- df1[!grepl("chrX", df1$chr), ]
df1 <- df1[!grepl("chrY", df1$chr), ]
xyAss3<-df1
return(xyAss3)
}
lapply(names(mylist),
sortIt(x)write.csv(mylist[x],
file =paste0(x,'.csv')))
The thing is, I just don't know how to feed mylist into the function. Should I call x in the lapply df1? I'm a bit confused as to how to tie it all together.
I think you'll do better to fold the creation of the .csv into your function and then use a for loop to apply that function to each object in your list in turn. So something like this, where df is the sample data frame you posted:
mylist <- list(A = df, B = df)
sortIt <- function(i) {
  df = mylist[[i]]
  # Drop the strand columns if present
  df[,"strand.x"] <- NULL
  df[,"strand.y"] <- NULL
  df[,"strand"] <- NULL
  df[,"X."] <- NULL
  names(df) <- c("chr", names(df)[2:length(names(df))])
  # Get rid of the X and Y chromosomes
  df <- df[!grepl("chrX", df$chr), ]
  df <- df[!grepl("chrY", df$chr), ]
  write.csv(df, file = paste0(names(mylist)[i], ".csv"), row.names=FALSE)
}
# seq_along() is safe even if mylist is empty
for (i in seq_along(mylist)) {sortIt(i)}
If you were trying to create a new object in your workspace, then one of the apply functions would be a better bet. But when you're trying to output files, I think you need to use a for loop instead.
Not really sure what you are trying to achieve, but guessing that you want to save the transformed data frame to a file with a name taken from the list, this could do the job (it should work with the rest of your code - note the [[1]]):
lapply(names(mylist),
function(x) write.csv(sortIt(mylist[x][[1]]),
file = paste0(x,'.csv')))
Another option is to use mapply, here I'm attaching a complete example:
# create the data
dframes <- lapply(1:3, function(x) data.frame(x=rnorm(10), y=runif(10)))
names(dframes) <- LETTERS[1:3]
# the transformation function
sortdf <- function(df) df[order(df$x),]
# two variants of apply
lapply(names(dframes),
function(name) write.csv(sortdf(dframes[name][[1]]),
file=paste0(name, '.csv')))
# mapply does not have the ugly [[1]] syntax bit, I'd prefer it myself
mapply(function(name, df) write.csv(sortdf(df), file=paste0(name, '.csv')),
names(dframes),
dframes)
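If the goal is also to keep the transformed data frames in the workspace under their input names, one option is to lapply over the list itself rather than over its names, since lapply preserves the names of a named list. A minimal sketch with made-up data; `drop_xy` is a hypothetical simplified stand-in for sortIt, and the files go to a temporary directory only to keep the example self-contained:

```r
# Made-up data frames standing in for the real ones
dframes <- list(
  A = data.frame(chr = c("chr1", "chrX"), n = c(7L, 3L)),
  B = data.frame(chr = c("chrY", "chr2"), n = c(1L, 5L))
)

# Hypothetical helper: drop rows on the X and Y chromosomes
drop_xy <- function(df) df[!df$chr %in% c("chrX", "chrY"), , drop = FALSE]

# lapply over the list itself keeps the element names A and B
cleaned <- lapply(dframes, drop_xy)

# Write each element to <name>.csv in a temporary directory
out_dir <- tempdir()
for (nm in names(cleaned)) {
  write.csv(cleaned[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

Afterwards `cleaned$A` and `cleaned$B` hold the filtered data frames, and the same names are reused for the files.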

how do you subset a data frame based on a variable name

my data frame called d:
dput(d)
structure(list(Hostname = structure(c(8L, 8L, 9L, 5L, 6L, 7L,
1L, 2L, 3L, 4L), .Label = c("db01", "db02", "farm01", "farm02",
"tom01", "tom02", "tom03", "web01", "web03"), class = "factor"),
Date = structure(c(6L, 10L, 5L, 3L, 2L, 1L, 8L, 9L, 7L, 4L
), .Label = c("10/5/2015 1:15", "10/5/2015 1:30", "10/5/2015 2:15",
"10/5/2015 4:30", "10/5/2015 8:30", "10/5/2015 8:45", "10/6/2015 8:15",
"10/6/2015 8:30", "9/11/2015 5:00", "9/11/2015 6:00"), class = "factor"),
Cpubusy = c(31L, 20L, 30L, 20L, 18L, 20L, 41L, 21L, 29L,
24L), UsedPercentMemory = c(99L, 98L, 95L, 99L, 99L, 99L,
99L, 98L, 63L, 99L)), .Names = c("Hostname", "Date", "Cpubusy",
"UsedPercentMemory"), class = "data.frame", row.names = c(NA,
-10L))
In a loop, I need to go through this data frame based on the metrics variable and create a subset data frame for summarization:
metrics<-as.vector(unique(colnames(d[,c(3:4)])))
for (m in metrics){
sub<-dd[,c(1,m)]
}
I cannot use m in this subset line. Any ideas how I could subset a data frame based on a variable name?
In your subsetting call you are mixing column indexes and column names. c(1, m) coerces the index 1 to the string "1", so R looks for a column named "1" and cannot find it.
Either use column names:
for (m in metrics) {
sub <- d[, c(colnames(d)[1], m)]
}
Or indexes:
for (i in 3:4) {
sub <- d[, c(1, i)]
}
Having said that, for loops in R are usually for cases where dynamic assignments are needed, for calling functions with side effects, or for some other relatively unusual case. Creating a summary by slicing and dicing data in for loops is almost never the proper way to do it in R. If the usual functional tools are not enough, there are fantastic packages like plyr, dplyr, etc. that let you split-apply-combine your data in very convenient and idiomatic ways.
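As a rough illustration of the loop-free approach in base R, the sketch below summarises every metric per host in one aggregate() call. `d_small` is a hypothetical stand-in made of the first four rows of d, and the choice of mean as the summary is an assumption, since the question does not say which summary is wanted.

```r
# Hypothetical stand-in: the first four rows of d, without the Date column
d_small <- data.frame(
  Hostname = c("web01", "web01", "web03", "tom01"),
  Cpubusy = c(31L, 20L, 30L, 20L),
  UsedPercentMemory = c(99L, 98L, 95L, 99L)
)
metrics <- c("Cpubusy", "UsedPercentMemory")

# Mean of every metric per host, selecting columns by name
per_host <- aggregate(d_small[metrics], by = d_small["Hostname"], FUN = mean)
print(per_host)
```

Selecting with `d_small[metrics]` uses the column names directly, so no index arithmetic or per-metric loop is needed.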

Resources