R works, but gives me random results - r

For a scientific analysis, I am following this tutorial:
https://uw-madison-microbiome-hub.github.io/Microbiome_analysis_in-_R/
Specifically, I would like to use this script to run a PERMANOVA:
library(vegan)
library(qiime2R)
library(phyloseq)
library(ggplot2)
# import data
ASVs <- read_qza("/path/table.qza")
metadata <- read.table("/path/metadata16S.tsv", sep='\t', header=T, row.names=1, comment="")
metadata <- metadata[-1,]
tree <- read_qza("/path/rooted-tree.qza")
taxonomy <- read_qza("/path/taxonomy.qza")
tax_table <- do.call(rbind, strsplit(as.character(taxonomy$data$Taxon), ";"))
colnames(tax_table) <- c("Kingdom","Phylum","Class","Order","Family","Genus","Species")
rownames(tax_table) <- taxonomy$data$Feature.ID
# phyloseq object
physeq <- phyloseq(
otu_table(ASVs$data, taxa_are_rows = T),
phy_tree(tree$data),
tax_table(tax_table),
sample_data(metadata)
)
# convert to relative abundance
physeq_rel <- microbiome::transform(physeq, "compositional")
physeq.ord.wuni <- ordinate(physeq_rel, "PCoA", "unifrac", weighted=T)
b.div.wuni <- plot_ordination(physeq_rel, physeq.ord.wuni, type= "samples", color= "MyParameter") + geom_point(size=3)
b.div.wuni <- b.div.wuni + stat_ellipse() + ggtitle("Weighted Unifrac") + theme_classic()
print(b.div.wuni)
The problem is that every time I run this script I get what look like purely random results, but only on my Linux machine; if I run it on other computers running Windows, the results are correct!
What could be causing the problem?
The only clue I have is that during the installation of ggplot2 I had problems regarding libmkl_rt.so.1, already present in conda, but which R could not find. I solved the problem by renaming the files as suggested in https://github.com/igraph/rigraph/issues/275#issuecomment-947940126
I have no idea what to do ...
Thank you for your help
Andrea

Related

Extract CMIP6 information with R

I'm doing research on climate change and I want to extract CMIP6 data using R, but I don't get the results I want. This is the code:
library(raster)
library(ncdf4)
long_lat <- read.csv("C:/Users/USUARIO/Desktop/Tesis/Coordenadas - copia.csv", header = T)
raster_pp <- raster::brick("pr_day_ACCESS-CM2_historical_r1i1p1f1_gn_2000.nc")
sp::coordinates(long_lat) <- ~XX+YY
raster::projection(long_lat) <- raster::projection(raster_pp)
points_long_lat <- raster::extract(raster_pp[[1]], long_lat, cellnumbers = T)[,1]
data_long_lat <- t(raster_pp[points_long_lat])
The coordinates I've used are the following: -77.7754,-9.5352 (latitude, longitude), but I've also tried 102.2246,-9.5352 and it still doesn't work. Thanks for your response.
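One thing that may be worth checking, purely as an assumption about this file: many climate model grids store longitude on a 0 to 360 scale rather than -180 to 180. If that is the case here, the first value would be converted by taking it modulo 360, which does not give 102.2246. A minimal sketch of the arithmetic (the helper name to_lon360 is hypothetical):

```r
# Convert a longitude from the -180..180 convention to 0..360.
# Whether the NetCDF file really uses 0..360 is an assumption; inspect the
# extent of the raster (for example with raster::extent) before relying on it.
to_lon360 <- function(lon) lon %% 360

lon_west <- -77.7754          # first value quoted in the question
lon_360 <- to_lon360(lon_west)
print(lon_360)                # 360 - 77.7754
```

Values that are already in the 0 to 360 range are left unchanged by the modulo.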

NMDS plot with vegan not coloured by groups

I am trying to use a script to plot an NMDS that worked perfectly before, but I changed R version (now R 4.1.2 on Ubuntu 20.04) and I can no longer get coloured NMDS graphs.
I get this error:
"species scores not available"
I get the correct NMDS representation, but I cannot get it coloured according to "TypeV" (see the code below), which worked before.
My data: TESTNMDS.csv
data <- read.table('TESTNMDS.csv',header = T, sep = ",")
coldit=c( "#FFFF33","#E141B9", "#33FF66", "#3333FF")
abundance.matrix <- data[,3:50]
for (j in ncol(abundance.matrix):1) if (colSums(abundance.matrix[j]) < 1)
abundance.matrix <-abundance.matrix[ , -j]
dist_data<-vegdist(abundance.matrix^0.5, method='bray')
nmds <- metaMDS(dist_data, trace =TRUE, try=500)
plot(nmds)
> plot(nmds)
species scores not available
If I try to plot with various representations (that worked before), I cannot get colours:
orditorp(nmds,display="sites",col=coldit[data$TypeV])
ordipointlabel(nmds, display = "sites", cex=1, pch=16, col=coldit[data$TypeV])
ordispider(nmds, groups=data$TypeV,label=TRUE)
ordihull(nmds, groups=data$TypeV, lty="dotted")
THANKS A LOT !
The main problem is that, since R 4.0.0, read.table() no longer converts character columns to factors by default (stringsAsFactors now defaults to FALSE), so you have to do that explicitly. Here is your code with a few simplifications:
library(vegan)
data <- read.table('TESTNMDS.csv',header = T, sep = ",")
data$TypeV <- factor(data$TypeV) # make TypeV a factor
coldit=c( "#FFFF33","#E141B9", "#33FF66", "#3333FF")
abundance.matrix <- data[,3:50]
idx <- colSums(abundance.matrix) > 0
abundance.matrix <- abundance.matrix[ , idx]
dist_data<-vegdist(abundance.matrix^0.5, method='bray')
nmds <- metaMDS(dist_data, trace =TRUE, try=500)
plot(nmds)
orditorp(nmds,display="sites",col=coldit[data$TypeV])
You do not need a loop to eliminate columns since R is vectorized.
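To see why the factor conversion matters for the colours, here is a small sketch with made-up group labels (type_chr and its values are hypothetical, not taken from TESTNMDS.csv). Indexing an unnamed colour vector with character strings finds no matching names and returns NA, while indexing with a factor uses its integer codes:

```r
coldit <- c("#FFFF33", "#E141B9", "#33FF66", "#3333FF")

# Hypothetical group labels, as read.table() returns them in R >= 4.0.0
type_chr <- c("a", "b", "a", "d")

# Character index on an unnamed vector: no names match, so every entry is NA
cols_chr <- coldit[type_chr]

# Factor index: the integer codes (a = 1, b = 2, d = 3) select the colours
type_fac <- factor(type_chr)
cols_fac <- coldit[type_fac]

print(cols_chr)
print(cols_fac)
```

With NA colours the points or labels are simply not coloured, which matches the symptom described in the question.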

Saving output plot in R with grid.grab() doesn't work

I've been trying to save multiple plots generated with the meta package in R, which is used to conduct meta-analyses, but I am having some trouble. I need to save these plots so I can arrange them in a multi-panel figure.
Example data:
s <- data.frame(Study = paste0("Study", 1:15),
event.e = sample(1:100, 15),
n.e = sample(100:300, 15))
meta1 <- meta::metaprop(event = event.e,
n= n.e,
data=s,
studlab = Study)
Here is the code:
meta::funnel(meta1)
funnelplot <- grid::grid.grab()
I can see the figure in the "Plots" tab in RStudio. However, if I look for the funnelplot object in the environment, it is listed as type "NULL", and obviously trying to recall it doesn't work.
How can I fix it?

Save plots as R objects and displaying in grid

In the following reproducible example I try to create a function for a ggplot distribution plot and saving it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine until now.
Then I want to make a second plot (note mean=100 instead of the previous 10):
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then I try to replot the first distribution, with mean 10:
output1[1]
I get the same distribution as in the previous plot (mean 100), not the first one?
If, however, I take the information contained inside the function, return it, and reset it as global variables, it works:
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens, I would like to know. It seems that in order to draw the intended distribution I have to reset the data frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to do this? Luckily, I can replot the first distribution,
but I can't plot them both at the same time:
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
In the answer to "Grid of multiple ggplot2 plots which have been made in a for loop", it is suggested to use a list to contain the plots:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
Will produce the distribution with mean=100 again instead of mean=10
and:
grid.arrange(p1,p2)
will produce the same Error
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt, I tried to use recordPlot() to record everything about the plot into an object. The following is now inside the function:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function produces the same errors as before: it depends on resetting the dat and var1 variables to what is needed for drawing the distribution, and similarly the result can't be put inside a grid.
I've tried similar things, like arrangeGrob() in the question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid", but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time. I would also like to understand why this is happening, as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot as a png file, save it somewhere, and then have the function return the path so that it can be reused. Is that what other people are doing?
Thanks for reading, and sorry for the long question.
I found a solution in the question
"How can I reference the local environment within a function, in R?"
by inserting
localenv <- environment()
and referencing that in the ggplot() call:
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
That made it all work, even with grid.arrange()!
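As an alternative sketch, not the original poster's method: the dependence on outside variables comes from aes(dat[,var1]), which is evaluated lazily when the plot is drawn. Referring to the column through ggplot2's .data pronoun keeps the plot tied to its own data, so no environment argument is needed. Note also that output1[1] returns a one-element list rather than the plot itself, which is what triggers the "only 'grobs' allowed" error; returning the plot directly (or using [[1]]) avoids it. This assumes a reasonably recent ggplot2 (for .data and after_stat) and the gridExtra package.

```r
library(ggplot2)
library(gridExtra)

# Return the ggplot object itself; the column is looked up by name in the
# plot's own data via the .data pronoun, not in the calling environment.
ggplothist <- function(dat, var1) {
  if (is.numeric(var1)) {
    var1 <- names(dat)[var1]   # allow a column index as well as a name
  }
  ggplot(dat, aes(x = .data[[var1]])) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                   colour = "black", fill = "white")
}

set.seed(100)
df  <- data.frame(x = rnorm(100, mean = 10), y = rep(1, 100))
df2 <- data.frame(x = rep(1, 1000), y = rnorm(1000, mean = 100))

p1 <- ggplothist(df, "x")
p2 <- ggplothist(df2, "y")

# Both plots are self-contained and can be drawn together.
grid.arrange(p1, p2)
```

Printing p1 again after creating p2 should still show the distribution centred on 10, because each plot carries its own data frame.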

Using foreach() in R to speed up loop for ggplot2

I would like to create a PDF file containing hundreds of plots in a certain order.
My strategy was using foreach() and storing each ggplot2 object into the output list, and then printing each ggplot2 object to the output file.
For example, I would like to plot a histogram of prices for every factor "carat" in the diamonds dataset:
library(ggplot2)
library(plyr)
library(foreach) # for parallelization
library(doParallel) # for parallelization
#setup parallel backend to use 4 processors
cl<-makeCluster(4)
registerDoParallel(cl)
# use diamonds dataset
carats.summary <- ddply(diamonds, .(carat), summarise, count = length(carat))
m.list <- foreach(i = 1:length(carats.summary$carat),
.packages = "ggplot2") %dopar% {
jcarat = carats.summary$carat[i]
m <- ggplot(subset(diamonds, carat == jcarat), aes(x = price)) +
geom_histogram()
print(m)
}
With this code, I am hoping to create a list of ggplot2 objects which I can then save into a single pdf file (for example using pdf()) in an ordered manner (for example, in ascending carats).
However, running this results in an error message:
Error in serialize(data, node$con) : error writing to connection
I suspect this is due to the fact that if I tried to append the ggplot2 object to a list, I would get a warning message like this:
lst <- vector(mode = "list")
lst[1] <- m
Warning message:
In lst[1] <- m :
number of items to replace is not a multiple of replacement length
Although this is pure speculation and I could be wrong.
Does anybody have an idea how to use foreach() to save ggplot2 objects onto a list? Or some way to parallelize for loops involving ggplot2?
Thanks in advance.
You shouldn't be printing the object inside the loop; just create the ggplot object. Only print once you have opened the graphics device you want to draw to.
m.list <- foreach(i = 1:length(carats.summary$carat),
.packages = "ggplot2") %dopar% {
jcarat = carats.summary$carat[i]
ggplot(subset(diamonds, carat == jcarat), aes(x = price)) +
geom_histogram()
}
then you can get at them with
m.list[[1]]
etc...
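To finish the job described in the question, the stored plots can then be printed in order to a single PDF. The sketch below is self-contained, so it builds a short list with lapply() over the first three carat values instead of the parallel foreach() call; the m.list from the answer above could be used in its place. The output file name is an assumption.

```r
library(ggplot2)

# Build a small ordered list of plots (first three carat values only).
carats <- sort(unique(diamonds$carat))[1:3]
plots <- lapply(carats, function(jcarat) {
  ggplot(subset(diamonds, carat == jcarat), aes(x = price)) +
    geom_histogram(bins = 30)
})

# Open the PDF device, print each plot on its own page, then close the device.
out <- file.path(tempdir(), "carat_histograms.pdf")  # hypothetical file name
pdf(out)
for (p in plots) {
  print(p)
}
dev.off()
```

Because the list keeps the order of the carat values, the pages come out in ascending carat order.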
