Table format of ANOVA output - r

I have a large dataset on which I am performing ANOVA analysis. I'm not sure how to get the output of the analysis into a table that I can use in a Word document (without retyping all of the values manually).
Here is an example of what I'm trying to do:
var1 <- c("Red", "Green", "Blue", "Blue", "Red","Red", "Green", "Blue", "Blue", "Red",
"Red", "Blue", "Green", "Blue", "Red","Red", "Green", "Blue", "Blue", "Red")
var2 <- c(10, 20, 15, 32, 10, 20, 15, 32, 10, 20, 15, 32, 10, 20, 15, 32, 10, 20, 15, 32)
df <- data.frame(var1, var2)
TukeyHSD(aov(var2 ~ var1))
This produces an output that looks like this:
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = var2 ~ var1)

$var1
             diff        lwr       upr     p adj
Green-Blue  -8.25 -21.183389  4.683389 0.2580147
Red-Blue    -2.75 -13.310068  7.810068 0.7848043
Red-Green    5.50  -7.433389 18.433389 0.5323260
I would like the output to be in a format that is easy to cut and paste into Word that includes the headings, "Variable", "Difference" and "p value". Any help would be appreciated.
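A minimal sketch in base R. TukeyHSD() returns a list with one matrix per term (here tk$var1, with columns diff, lwr, upr and p adj), so the wanted columns can be copied into a data frame with the requested headings. The rounding, the factor() conversion, the data = df argument and the output file name are assumptions for illustration. The tab-separated text can be pasted into Word and turned into a table (in Word, something like Insert > Table > Convert Text to Table).

```r
var1 <- c("Red", "Green", "Blue", "Blue", "Red", "Red", "Green", "Blue", "Blue", "Red",
          "Red", "Blue", "Green", "Blue", "Red", "Red", "Green", "Blue", "Blue", "Red")
var2 <- c(10, 20, 15, 32, 10, 20, 15, 32, 10, 20, 15, 32, 10, 20, 15, 32, 10, 20, 15, 32)
df <- data.frame(var1 = factor(var1), var2 = var2)

tk  <- TukeyHSD(aov(var2 ~ var1, data = df))
res <- tk$var1                      # matrix with columns diff, lwr, upr, p adj

# Keep only the comparison label, the difference and the adjusted p-value
tab <- data.frame(
  Variable    = rownames(res),
  Difference  = round(res[, "diff"], 2),
  `p value`   = signif(res[, "p adj"], 3),
  check.names = FALSE,              # keep the space in "p value"
  row.names   = NULL                # plain 1, 2, 3 row names
)
print(tab)

# Tab-separated text in the console, ready to copy and paste into Word
write.table(tab, sep = "\t", quote = FALSE, row.names = FALSE)

# Or save a CSV file that Word or Excel can open (hypothetical path)
out_file <- file.path(tempdir(), "tukey_table.csv")
write.csv(tab, out_file, row.names = FALSE)
```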


How to change the position of text of factors in plot

After running a constrained PCoA and plotting the results, the text labels are not nicely positioned relative to their points. Is there a way to adjust this?
Code:
plot(ord, type = "n") # plot an empty ordination frame
with(env, levels(field)) # list the 3 field levels
with(env, points(ord, display = "sites", col = colvec[field], pch = 19, cex = 1)) # draw the sites coloured by field
text(ord, display = "species", select = c(19, 50, 47, 13, 29, 12, 45), cex = 0.8, col = "lightblue4") # label only the selected species
with(env, legend("topright", legend = levels(field), bty = "n", col = colvec, pch = 21, pt.bg = colvec)) # add the legend
plot(dbDRA, col = "black") # plot the fitted vectors
plot(dbDRA, p.max = 0.05, col = "red") # redraw the significant factors in red
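A base-R sketch of the usual fix: text() accepts pos (1 = below, 2 = left, 3 = above, 4 = right of the point) and offset (distance in character widths). vegan's text() method for ordination objects passes extra arguments on to text(), so pos and offset should also work in text(ord, display = "species", ...). Since your ord and env objects are not available, the example below assumes a classical PCoA computed with cmdscale() on part of the built-in mtcars data as a stand-in. vegan also provides orditorp() and ordipointlabel(), which try to avoid overlapping labels automatically.

```r
# Classical PCoA on a built-in dataset (stand-in for the constrained ordination)
d <- dist(scale(mtcars[1:10, c("mpg", "hp", "wt", "qsec")]))
pcoa <- cmdscale(d, k = 2)

# Widen the x range so labels near the edges are not clipped
plot(pcoa, pch = 19, xlab = "PCoA 1", ylab = "PCoA 2",
     xlim = range(pcoa[, 1]) * 1.3)

# Place each label on the side facing the centre of the plot:
# points on the right get labels to their left (2), and vice versa (4)
label_pos <- ifelse(pcoa[, 1] > 0, 2, 4)
text(pcoa, labels = rownames(pcoa), pos = label_pos, offset = 0.4, cex = 0.8)
```

A per-point pos vector like this is the simplest manual control; for many labels, the automatic vegan helpers mentioned above are usually less work.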

ggplot2 - geom_histogram / scale_fill_manual

I am working with the following data frame (df):
df <- data.frame(
GP = c(0,0,0,1,1,2,3,3,3,3,4,4,9,15,18,18,19,19,20,20,21,22,22,23),
colour = c("g","g","g","g","g","g","g","g","g","g","g","g","t","t","g","g","g","g","g","g","g","g","g","g")
)
I want a histogram of GP, but with a different fill for colour == "g" and colour == "t".
However, when I run the following code, the bars for colour == "t" are scaled up to 1, whereas they should stay at 0.25, as they do in the histogram without the fill.
ggplot(data=df,aes(x=GP,y=..ndensity..))+geom_histogram(bins=25,aes(fill=colour))+scale_fill_manual(values=c("black","grey"))
Do you have any idea of how this could be achieved?
Thank you very much for your help with this one!
I stored the data in a tibble with different variable names. Because ..ndensity.. is rescaled separately within each fill group, the two lone "t" bars get stretched to 1; dividing the counts by the tallest bin over the whole data keeps them at 0.25:
library(ggplot2)
library(tibble)
tb <- tibble(
tbx = c(0, 0, 0, 1, 1, 2, 3, 3, 3, 3, 4, 4, 9, 15, 18, 18, 19, 19, 20, 20, 21, 22, 22, 23),
tby = c("g","g","g","g","g","g","g","g","g","g","g","g","t","t","g","g","g","g","g","g","g","g","g","g")
)
ggplot(tb, aes(x = tbx, y = after_stat(count / max(count)))) +
geom_histogram(bins = 25, aes(fill = tby)) +
scale_fill_manual(values = c("red", "grey"))
I hope this addresses your question
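To see why ..ndensity.. misbehaves once a fill is added: ggplot2 computes stat_bin() separately for each group, and ndensity is the density divided by its maximum within that group, which is the same as count / max(count) per group. The base-R sketch below reproduces that arithmetic on the question's data. The one-bin-per-integer breaks are an illustrative assumption, not ggplot2's bins = 25 grid, but with this data each value still lands in its own bin.

```r
gp <- c(0,0,0,1,1,2,3,3,3,3,4,4,9,15,18,18,19,19,20,20,21,22,22,23)
colour <- c(rep("g", 12), "t", "t", rep("g", 10))

# One bin per integer value (illustrative choice)
breaks <- seq(-0.5, 23.5, by = 1)
counts <- sapply(split(gp, colour),
                 function(v) hist(v, breaks = breaks, plot = FALSE)$counts)

# Rescaled within each colour, like ..ndensity.. with fill = colour
per_group <- apply(counts, 2, function(cn) cn / max(cn))
# Rescaled by the tallest bin overall, like after_stat(count / max(count))
overall <- counts / max(counts)

max(per_group[, "t"])  # the "t" bars reach 1
max(overall[, "t"])    # the "t" bars stay at 0.25 (1 out of a maximum of 4)
```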

Which is the more appropriate predictive model to use in R for the following scenario?

I have x-axis values ranging from 300 mm down to 0.075 mm, and y-axis values from 0 to 100. I need to predict the y value for x = 0.002, and the result has to be shown on a semilog plot. I tried using the lm function as follows:
f2 <- data.frame(sievesize = c(0.075, 1.18, 2.36, 4.75), weight = c(55, 66.9, 67.69, 75))
f3 <- data.frame(sievesize = 0.002)
model1 <- lm(weight ~ log10(sievesize), data = f2)
pred3 <- predict(model1, f3)
Is there any better way to predict the values for 0.002?
There is not much you can do with this data beyond calculating the prediction interval, which tells you the margin of error of your prediction (as shown below, it is about 38.5 +/- 21 in the units of weight). The reasons:
- there are just four data points, which is very little data;
- a sieve size of 0.002 mm is outside your data range [0.075, 4.75], and extrapolating any model this far leads to a huge prediction error;
- the logarithmic relation you are fitting in the lin-log plot has a singularity at zero: log10(x) goes to minus infinity as the sieve size approaches 0;
- the data are spread over a very narrow range for an exponential (logarithmic) dependence.
Here is the code:
f2 <- data.frame(sievesize = c(0.075, 1.18, 2.36, 4.75), weight = c(55, 66.9, 67.69, 75))
f3 <- data.frame(sievesize = c(0.002))
m_lm <- lm(weight ~ log10(sievesize), data = f2)
fit_lm <- predict(m_lm, f3, interval = "prediction")
fit_lm
#        fit      lwr      upr
# 1 38.46763 17.73941 59.19586
pred_x <- data.frame(sievesize = seq(0.001, 5, .01))
fit_conf <- predict(m_lm, pred_x, interval = "prediction")
plot(log10(f2$sievesize), f2$weight, ylim = c(0, 85), pch = 16, xlim = c(-3, 1))
points(log10(f3$sievesize), fit_lm[, 1], col = "red", pch = 16)
lines(log10(pred_x$sievesize), fit_conf[, 1])
lines(log10(pred_x$sievesize), fit_conf[, 2], col = "blue")
lines(log10(pred_x$sievesize), fit_conf[, 3], col = "blue")
legend("bottomright",
legend = c("experiment", "fitted line", "prediction interval", "forecasted"),
lty = c(NA, 1, 1, NA),
lwd = c(NA, 1, 1, NA),
pch = c(16, NA, NA, 16),
col = c("black", "black", "blue", "red"))
The resulting plot shows the experimental points, the fitted line, the prediction interval and the forecast point, which illustrates the issues listed above.
So more advanced techniques such as a nonlinear fit, a GLM or Bayesian regression will not bring additional insight, because the data set is extremely small and spread over a very narrow range.
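Why the interval is so wide can be seen from the standard prediction-interval formula for simple linear regression, here with x = log10(sieve size): fit +/- qt(0.975, n - 2) * s * sqrt(1 + 1/n + (x0 - mean(x))^2 / sum((x - mean(x))^2)), where s is the residual standard error. For x0 = log10(0.002), about -2.7, the last term is roughly 3.9, larger than 1 + 1/n = 1.25 combined, because x0 lies far from the mean of the observed log sizes (close to 0). The sketch below computes this by hand and checks it against predict(); the variable names are illustrative.

```r
f2 <- data.frame(sievesize = c(0.075, 1.18, 2.36, 4.75),
                 weight = c(55, 66.9, 67.69, 75))
m <- lm(weight ~ log10(sievesize), data = f2)

x  <- log10(f2$sievesize)
x0 <- log10(0.002)                # extrapolation point on the log scale
n  <- nrow(f2)
s  <- summary(m)$sigma            # residual standard error

# Standard error for a new observation at x0; the last term grows with
# the squared distance of x0 from the mean of the observed x values
se_new <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))

fit  <- unname(coef(m)[1] + coef(m)[2] * x0)
half <- qt(0.975, df = n - 2) * se_new   # 95% interval, n - 2 residual df
manual <- c(fit = fit, lwr = fit - half, upr = fit + half)
manual

# The same numbers from predict()
predict(m, data.frame(sievesize = 0.002), interval = "prediction")
```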

legend in a forest plot

I am having a hard time with the forestplot package in R. Here is my code; everything works well except the legend.
For the legend, I would like a blue circle, a red square and a green diamond (lozenge) instead of three squares.
Any idea?
Thanks in advance.
Peter
library(forestplot)
library(grid) # for gpar() and unit()
test_data <- data.frame(coef1=c(0.54,0.72,0.57),
coef2=c(0.59,0.79,0.58),
coef3=c(0.49,0.60,0.48),
low1=c(0.41,0.46,0.42),
low2=c(0.44,0.49,0.42),
low3=c(0.37,0.37,0.35),
high1=c(0.72,1.12,0.77),
high2=c(0.78,1.26,0.80),
high3=c(0.65,0.99,0.66))
col_no <- grep("coef", colnames(test_data))
row_names <- list(
list("Behavioral CVH","Biological CVH","Total CVH"))
coef <- with(test_data, cbind(coef1, coef2, coef3))
low <- with(test_data, cbind(low1, low2, low3))
high <- with(test_data, cbind(high1, high2, high3))
forestplot(row_names, coef, low, high,
title="Paris Prospective Study 3",
fn.ci_norm=matrix(c("fpDrawCircleCI", "fpDrawNormalCI","fpDrawDiamondCI"),
nrow = 3, ncol = 3, byrow = TRUE),
zero = c(1), boxsize=0.05,
col=fpColors(box=c("royalblue", "gold", "black"),
line=c("darkblue", "orange", "black"),
summary=c("darkblue", "red", "black"),
hrz_lines = "#444444"),
xlab="Odds ratio & 95% Confidence intervals",
vertices = TRUE,
new_page = TRUE,
legend=c("Q2 vs. Q1","Q3 vs. Q1","Q4 vs. Q1"),
legend_args = fpLegend(pos = list("topright"),
title="Legend",
r = unit(0, "snpc"),
gp = gpar(col="#CCCCCC", lwd=1.5)))
One thing you can do is use the "regular" base-graphics legend() function outside the forestplot() call.
To do that, you first have to call plot.new():
plot.new()
forestplot(...) # without the legend arguments; you may also need new_page = FALSE so the page opened by plot.new() is kept
legend("topright", c("Q2 vs. Q1","Q3 vs. Q1","Q4 vs. Q1"), title="Legend", box.col="#CCCCCC", box.lwd=1.5,
col=c("blue", "red", "green"), pch=c(16, 15, 18)) # pch 16 = circle, 15 = square, 18 = diamond
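If you would rather not mix base and grid graphics, one alternative is to draw the forest plot itself in base R, so the same col and pch vectors drive both the points and the legend and they cannot get out of sync. This is a swapped-in approach, not the forestplot package: it uses the question's estimates but does not reproduce forestplot's table layout, and the vertical staggering of the three comparisons is an illustrative choice.

```r
# Question's estimates: rows = outcomes, columns = comparisons
est  <- cbind(c(0.54, 0.72, 0.57), c(0.59, 0.79, 0.58), c(0.49, 0.60, 0.48))
low  <- cbind(c(0.41, 0.46, 0.42), c(0.44, 0.49, 0.42), c(0.37, 0.37, 0.35))
high <- cbind(c(0.72, 1.12, 0.77), c(0.78, 1.26, 0.80), c(0.65, 0.99, 0.66))

labels <- c("Behavioral CVH", "Biological CVH", "Total CVH")
series <- c("Q2 vs. Q1", "Q3 vs. Q1", "Q4 vs. Q1")
cols   <- c("blue", "red", "green")
pchs   <- c(16, 15, 18)               # filled circle, square, diamond
shift  <- c(0.2, 0, -0.2)             # stagger the three comparisons within a row
y      <- rev(seq_along(labels))      # first outcome at the top

old <- par(mar = c(5, 9, 4, 1))       # wide left margin for the row labels
plot(NA, xlim = range(low, high, 1), ylim = c(0.5, length(labels) + 0.5),
     yaxt = "n", ylab = "", xlab = "Odds ratio & 95% Confidence intervals",
     main = "Paris Prospective Study 3")
axis(2, at = y, labels = labels, las = 1)
abline(v = 1, lty = 2)                # reference line at OR = 1

for (j in seq_along(series)) {
  yy <- y + shift[j]
  segments(low[, j], yy, high[, j], yy, col = cols[j])   # confidence intervals
  points(est[, j], yy, pch = pchs[j], col = cols[j], cex = 1.3)
}

legend("topright", legend = series, title = "Legend", col = cols, pch = pchs,
       box.col = "#CCCCCC", box.lwd = 1.5)
par(old)
```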

Split barplot by grouping by days

I have the following bar chart produced using this code:
# MD1<-read.csv("MD_qual_OTU_sorted.csv") # original file; the next line recreates the data inline
MD1<-data.frame(Samples=c("A","B","C","D","E","F","G","H","I","J","K","L","M", "N","O","P","Q", "R"), Number.of.OTUs=c(13,10,9,9,15,11,7,7,9,9,5,10,10,7,15,17,8,9))
par(las=1)
barplot(MD1[,2],names.arg=MD1[,1], ylab='OTU Count', yaxt='n', xlab='MD samples', main='Total OTU count/Sample',density=c(90,90, 90, 90, 90, 90, 10, 10, 10, 10, 10, 10, 40, 40, 40, 40, 40, 40), col=c("yellow","yellow","pink", "pink","green","green","red","red", "purple", "purple", "blue", "blue", "orange", "orange","cyan", "cyan","chartreuse4", "chartreuse4" ))
usr <- par("usr")
par(usr=c(usr[1:2], 0, 20))
axis(2, at=seq(0,20,5))
I want to split the samples into groups: A-F (Day 3), G-L (Day 5) and M-R (Day 15).
There are similar questions posted, but I am not sure how to restructure the way I have entered my data so that I can use those solutions.
You could consider using ggplot2; separate panels are very easy to make with facet_wrap or facet_grid.
library(ggplot2)
#create a grouping variable
MD1$Day <- rep(c("Day 03","Day 05","Day 15"),
each=6)
p1 <- ggplot(MD1, aes(x=Samples,y=Number.of.OTUs)) +
geom_bar(stat="identity") + facet_wrap(~Day,
scales="free_x")
p1
Or, if you want to use base-R and approach your original image:
#add colors/densities
MD1$col <- c("yellow","yellow","pink", "pink","green","green","red","red",
"purple", "purple", "blue", "blue", "orange", "orange","cyan", "cyan","chartreuse4", "chartreuse4" )
MD1$density <- c(90,90, 90, 90, 90, 90, 10, 10, 10, 10, 10, 10, 40, 40, 40, 40, 40, 40)
#set 1 row three cols for plotting
par(mfrow=c(1,3))
#split and plot
#invisible() stops the list returned by lapply from being printed
invisible(lapply(split(MD1, MD1$Day), function(x){
barplot(x[,2],
names.arg=x[,1],
ylab='OTU Count',
ylim=c(0,20),
main=unique(x$Day),
col=x$col,
density=x$density)
}))
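If you prefer a single base-R barplot instead of three panels, a sketch: barplot()'s space argument accepts one value per bar, so a wider gap before the first sample of each day separates the groups, and the bar midpoints returned by barplot() can be used to centre a day label under each group. The gap sizes and label positions here are illustrative choices.

```r
MD1 <- data.frame(Samples = LETTERS[1:18],
                  Number.of.OTUs = c(13,10,9,9,15,11,7,7,9,9,5,10,10,7,15,17,8,9))
MD1$Day <- rep(c("Day 3", "Day 5", "Day 15"), each = 6)

# 0.2 bar widths between bars, 1 bar width before the first bar of each new day
gap <- ifelse(!duplicated(MD1$Day) & seq_len(nrow(MD1)) > 1, 1, 0.2)

mids <- barplot(MD1$Number.of.OTUs, names.arg = MD1$Samples, space = gap,
                ylim = c(0, 20), ylab = "OTU Count", xlab = "MD samples",
                main = "Total OTU count/Sample", las = 1)

# Centre each day label under its six bars, between the sample names and xlab
day_centres <- tapply(as.vector(mids),
                      factor(MD1$Day, levels = unique(MD1$Day)), mean)
mtext(names(day_centres), side = 1, line = 2, at = day_centres)
```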
