I have made ordination plots of microbiome data using the R phyloseq functions ordinate and plot_ordination, with a phyloseq object and a previously calculated distance matrix (unweighted UniFrac distances) as inputs. I would like to add arrows indicating which species' relative abundances mainly drive the distances between samples along the axes.
Adding scores using the function scores or specscores.dbrda has not worked. Is there another way to add these arrows/vectors?
Here's the code I used:
library(phyloseq)  # ordinate(), plot_ordination()
library(vegan)     # scores()
MDS <- ordinate(physeqobject, "MDS", distance=unifracmatrix)
plot <- plot_ordination(physeqobject, MDS, color = "variable1")
speciesscores <- scores(MDS, display = "species")
The first part works fine, but the scores function returns this error:
Error in scores.default(MDS, display = "species") :
cannot find scores
I also tried it without a separate distance matrix, using the default Bray-Curtis distances calculated within ordinate, but I still get the same error:
MDS <- ordinate(physeqobject, "MDS")
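The error is expected: an MDS/PCoA ordination computed from a distance matrix contains only sample coordinates, so there are no species scores for scores() to find. One workaround is to compute species vectors after the fact, for example by correlating each species' relative abundance with the sample coordinates on each axis (similar in spirit to fitting vectors with vegan's envfit()). The sketch below does this in base R on simulated counts, with a hand-written Bray-Curtis function standing in for the UniFrac matrix; all object names are hypothetical. For a phyloseq MDS result I believe the sample coordinates are stored in MDS$vectors (an ape pcoa object), but check with str(MDS) before relying on that.

```r
# Simulated abundance table (illustration only): 8 samples x 5 species
set.seed(1)
counts <- matrix(rpois(40, lambda = 20), nrow = 8,
                 dimnames = list(paste0("S", 1:8), paste0("sp", 1:5)))
rel <- counts / rowSums(counts)   # relative abundance per sample

# Base-R Bray-Curtis dissimilarity (stand-in for a UniFrac matrix;
# any symmetric distance matrix converted with as.dist() would do)
bray_curtis <- function(m) {
  num <- as.matrix(dist(m, method = "manhattan"))  # sum |x_i - x_j|
  den <- outer(rowSums(m), rowSums(m), "+")         # sum (x_i + x_j)
  as.dist(num / den)
}

# Principal coordinates analysis (classical MDS) of the samples
pcoa <- cmdscale(bray_curtis(counts), k = 2)
colnames(pcoa) <- c("Axis.1", "Axis.2")

# Species vectors: correlation of each species' relative abundance
# with the sample coordinates on each axis (values in [-1, 1])
sp_vec <- cor(rel, pcoa)

# Plot samples, then arrows scaled to the spread of the sample cloud
mult <- 0.8 * max(abs(pcoa))
plot(pcoa, asp = 1, pch = 19)
arrows(0, 0, sp_vec[, 1] * mult, sp_vec[, 2] * mult, length = 0.08, col = "red")
text(sp_vec * mult * 1.1, labels = rownames(sp_vec), col = "red")
```

Because the signs of PCoA axes are arbitrary, an arrow's direction is only meaningful relative to the sample points in the same plot. The correlations describe association with the axes; with UniFrac they do not show which species caused the distances, since UniFrac also depends on the phylogeny.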
I used some raw output files from our flow cytometer; each .csv lists the intensity measured at each wavelength for every event/cell.
This resulted in a .csv with around 25000 cells and around 240 measuring points.
Importing the .csv file into RStudio and removing some measurements yielded a matrix of 25000 observations x 73 variables.
Then I used Rphenograph to calculate the neighborhoods, which worked well.
But now the result seems to be a data frame or something similar, and I genuinely have no idea how to plot it.
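library(readr)     # read_csv()
library(dplyr)     # select(), ends_with()
library(cytofkit)  # Rphenograph()
Data1 <- read_csv("CD4_3.csv", skip = 17)
Data_selected <- select(Data1, ends_with(".A"))
rpheno_out <- Rphenograph(Data_selected)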
I hoped to get a plot that resembles a t-SNE plot.
Instead, I only got an error telling me that ggplot can't handle the object:
ggplot(rpheno_out) + geom_point()
Error: data must be a data frame, or other object coercible by fortify(), not an S3 object with class communities
I think you've misunderstood what the Rphenograph() function returns; the documentation states:
A simple R implementation of the PhenoGraph algorithm, which is a clustering method designed for high-dimensional single-cell data analysis. It works by creating a graph ("network") representing phenotypic similarities between cells by calculating the Jaccard coefficient between nearest-neighbor sets, and then identifying communities using the well known Louvain method in this graph.
This only clusters your data based on the information you provide; the output (an object of class communities, hence the ggplot error) contains no dimensionally reduced version of your input.
If you want to see what your clustering looks like, apply your favorite dimension-reduction method and color-code the plot with the cluster assignments from Rphenograph().
To give you an example, I've done this with the example code from the function's documentation:
library(cytofkit)
## Example from Rphenograph's doc
iris_unique <- unique(iris) # Remove duplicates
data <- as.matrix(iris_unique[,1:4])
Rphenograph_out <- Rphenograph(data, k = 45)
## Added bit to see the results
pca <- prcomp(iris_unique[,1:4], retx = TRUE, rank. = 2)
par(mfrow=c(1,2))
plot(pca$x, col=Rphenograph_out$membership, lwd=3,
main="Color by Rphenograph cluster")
plot(pca$x, col=iris_unique$Species, lwd=3,
main="Color by Species")
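This results in two side-by-side PCA scatter plots, titled "Color by Rphenograph cluster" and "Color by Species".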
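If you specifically want a ggplot, put the reduced coordinates and the cluster labels into one data frame, since ggplot() needs a data frame rather than the communities object. The sketch below reuses the iris example; to keep it runnable without cytofkit, kmeans() stands in for Rphenograph() (an assumption for illustration only), so swap in Rphenograph_out$membership for the real labels.

```r
# Same input as the documentation example above
iris_unique <- unique(iris)
X <- as.matrix(iris_unique[, 1:4])

# Stand-in cluster labels (assumption): replace with Rphenograph_out$membership
set.seed(42)
membership <- kmeans(X, centers = 3)$cluster

# 2D coordinates from PCA, combined with the labels in one data frame
pca <- prcomp(X, rank. = 2)
plot_df <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                      cluster = factor(membership))

# ggplot() accepts this data frame; plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(PC1, PC2, colour = cluster)) + geom_point()
  print(p)
}
```

For a t-SNE-like picture, replace the prcomp() coordinates with a 2D embedding, for example Rtsne::Rtsne(X)$Y from the Rtsne package, which by default refuses duplicated rows (hence unique()).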
I'm trying to create a good heat map, using kriging to fill in missing values.
I have the following data, which contains all the measured values of RLevel.
I followed this tutorial on how to use kriging: https://rpubs.com/nabilabd/118172
This is the code I wrote. Before these steps, I removed from my DieData all the rows whose values still need to be predicted; those rows are referred to as die.data.NAValues in my code.
library(sp)        # coordinates()
library(automap)   # autoKrige()
library(gstat)     # gstat(), predict()
library(ggplot2)
library(magrittr)  # %>%
# Step 3: Convert to SpatialPointsDataFrame objects
coordinates(die.data) <- ~X+Y
# Step 4: Get the prediction grid (the dies that still need values)
coordinates(die.data.NAValues) <- ~X+Y
# Use autoKrige; its first argument must be a formula
kr <- autoKrige(RLevel ~ 1, die.data, die.data.NAValues, nmax = 20)
predicted_die_values <- kr$krige_output  # contains var1.pred, var1.var, var1.stdev
predicted_die_model <- kr$var_model
# Get predictions from the fitted variogram model
# (nmax = 1 uses only the nearest observation)
g <- gstat(NULL, "RLevel", RLevel ~ 1, die.data, model = predicted_die_model, nmax = 1)
predictedSet <- predict(g, newdata = die.data, BLUE = TRUE)
# Plot the kriged values as a heat map
predicted_die_values %>%
  as.data.frame() %>%
  ggplot(aes(x = X, y = Y)) +
  geom_tile(aes(fill = var1.pred)) +
  coord_equal() +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme_bw()
When I plot the graph, I get the following image of the values predicted by kriging.
My question is: how can I show a good heat map that combines the points predicted by kriging with the points I already have? I want my graph to look like the one in the tutorial linked above.
About my dataset: my original dataset, including the untested NA values, contains around 55057 points. When I take out the NA values and use them as my prediction grid, I get 390 points. The majority of RLevel values are in the 30s, except for around 100-200 points that are above 100.
Can anyone help me out or give me guidance on how to produce a good heat map?
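Plotting kr$krige_output alone shows only the dies that were predicted. One way to show measured and kriged dies together is to stack both into a single table with a common value column and draw that as a full grid. The sketch below uses simulated die coordinates and values (hypothetical, illustration only); with your objects, observed would presumably be as.data.frame(die.data) and predicted as.data.frame(kr$krige_output). It uses base image() so it runs without extra packages; with ggplot2, the same combined table works with geom_tile(aes(fill = value)).

```r
# Simulated die grid (illustration only): 20 x 20 dies, 40 of them untested
set.seed(7)
grid <- expand.grid(X = 1:20, Y = 1:20)
na_rows <- sample(nrow(grid), 40)
observed  <- data.frame(grid[-na_rows, ], RLevel = rnorm(360, mean = 35, sd = 3))
predicted <- data.frame(grid[na_rows, ], var1.pred = rnorm(40, mean = 35, sd = 3))

# One table with a shared value column and a label for the data source
combined <- rbind(
  data.frame(X = observed$X,  Y = observed$Y,
             value = observed$RLevel, source = "measured"),
  data.frame(X = predicted$X, Y = predicted$Y,
             value = predicted$var1.pred, source = "kriged")
)

# Fill a matrix indexed by die position, then draw it as a heat map
z <- matrix(NA_real_, nrow = max(combined$X), ncol = max(combined$Y))
z[cbind(combined$X, combined$Y)] <- combined$value
pal <- colorRampPalette(c("yellow", "red"))(20)
image(seq_len(nrow(z)), seq_len(ncol(z)), z, col = pal,
      xlab = "X", ylab = "Y", asp = 1)
points(predicted$X, predicted$Y, pch = 4)  # mark the kriged dies
```

Since most RLevel values are in the 30s and a few exceed 100, a linear color scale spends most of its range on the outliers; a log-transformed scale or capped breaks may show the structure better.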
I have mixed-type data containing numeric and categorical attributes, to which I plan to apply clustering algorithms.
As a first step, I produced a distance matrix using the daisy() function with the Gower distance measure. I have displayed the distance matrix using the heatmap() and levelplot() functions in R.
There seems to be strong similarity between some of the objects in my data, and I want to check some of the similar and dissimilar objects to satisfy myself that the measure works well on my data.
How do I select the similar/dissimilar objects from the heatmap and link them back to the original data set so I can evaluate them?
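This is how I plot my heatmap in R. IDX is my distance matrix.
library(cluster)  # daisy()
library(lattice)  # levelplot()
IDX_as <- as.matrix(IDX)  # assumption: IDX_as is the full square-matrix form of IDX
new.palette=colorRampPalette(c("black","yellow","#007FFF","white"),space="rgb")
levelplot(IDX_as[1:ncol(IDX_as),ncol(IDX_as):1],col.regions=new.palette(20))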
quartz(width=7,height=6) #make a new quartz window of a given size (macOS only; use dev.new(width=7,height=6) elsewhere)
par(mar=c(2,3,2,1)) #set the margins of the figures to be smaller than default
layout(matrix(c(1,2),1,2,byrow=TRUE),widths=c(7,1)) #set the layout of the quartz window. This will create two plotting regions, with width ratio of 7 to 1
image(IDX_as[1:ncol(IDX_as),ncol(IDX_as):1],col=new.palette(20),xaxt="n",yaxt="n") #plot a heat map matrix with no tick marks or axis labels
axis(1,at=seq(0,1,length=20),labels=rep("",20)) #draw in tick marks
axis(2,at=seq(0,1,length=20),labels=rep("",20))
#adding a color legend
s=seq(min(IDX_as),max(IDX_as),length=20) #20 values between the minimum and maximum of IDX_as
l=matrix(s,ncol=length(s),byrow=TRUE) #coerce it into a horizontal matrix
image(y=s,z=l,col=new.palette(20),ylim=c(min(IDX_as),max(IDX_as)),xaxt="n",las=1) #plot a one-column heat map
heatmap(IDX_as,symm=TRUE,col=new.palette(20))
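Rather than picking cells off the heatmap by eye, one option is to list every pair of objects with its distance, sort the list, and index the original data with the pair positions. The sketch below uses a small iris subset (four numeric columns plus a factor) as stand-in mixed data; the object names are hypothetical.

```r
library(cluster)  # daisy(); 'cluster' ships with standard R installations

# Small mixed-type example: 4 numeric columns + 1 factor (iris subset)
df <- iris[c(1:5, 51:55, 101:105), ]
d  <- daisy(df, metric = "gower")
m  <- as.matrix(d)

# Every unordered pair once: upper triangle without the diagonal
idx   <- unname(which(upper.tri(m), arr.ind = TRUE))
pairs <- data.frame(row_i = rownames(m)[idx[, 1]],
                    row_j = rownames(m)[idx[, 2]],
                    i = idx[, 1], j = idx[, 2],
                    dist = m[idx])
pairs <- pairs[order(pairs$dist), ]

most_similar    <- head(pairs, 3)  # smallest Gower distances
most_dissimilar <- tail(pairs, 3)  # largest Gower distances

# Link a pair back to the original data (positions match the rows of df)
df[c(most_similar$i[1], most_similar$j[1]), ]
df[c(most_dissimilar$i[3], most_dissimilar$j[3]), ]
```

Note that heatmap() reorders rows and columns by clustering; it invisibly returns a list whose rowInd element gives the original row indices in the order used for the plot, so hm <- heatmap(IDX_as, symm = TRUE) followed by rownames(IDX_as)[hm$rowInd] maps plot positions back to your objects.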
I'm trying to estimate the area of the 95% contour of a kde object from the ks package in R.
If I use the example data set from the ks package, I would create the kernel object as follows:
library(ks)
data(unicef)
H.scv <- Hscv(x=unicef)
fhat <- kde(x=unicef, H=H.scv)
I can easily plot the 25%, 50% and 75% contours using the plot function:
plot(fhat)
But I want to estimate the area within the contour.
I saw a similar question here, but the answer proposed does not solve the problem.
In my real application, my dataset is a time series of coordinates of an animal, and I want to measure the home range size of this animal using a bivariate normal kernel. I'm using the ks package because it allows estimating the bandwidth of a kernel distribution with methods such as plug-in and smoothed cross-validation.
Any help would be really appreciated!
Here are two ways to do it. They are both fairly complex conceptually, but actually very simple in code.
fhat <- kde(x=unicef, H=H.scv,compute.cont=TRUE)
contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]],
z=estimate,levels=cont["95%"])[[1]])
library(pracma)
with(contour.95,polyarea(x,y))
# [1] -113.677
library(sp)
library(rgeos)
poly <- with(contour.95,data.frame(x,y))
poly <- rbind(poly,poly[1,]) # polygon needs to be closed...
spPoly <- SpatialPolygons(list(Polygons(list(Polygon(poly)),ID=1)))
gArea(spPoly)
# [1] 113.677
Explanation
First, the kde(...) function returns a kde object, which is a list with 9 elements. You can read about this in the documentation, or you can type str(fhat) at the command line, or, if you're using RStudio (highly recommended), you can see this by expanding the fhat object in the Environment tab.
One of the elements is $eval.points, the points at which the kernel density estimates are evaluated. The default is to evaluate at 151 equally spaced points. $eval.points is itself a list of vectors, two in your case: fhat$eval.points[[1]] represents the points along "Under-5" and fhat$eval.points[[2]] represents the points along "Ave life exp".
Another element is $estimate, which has the z-values for the kernel density, evaluated at every combination of x and y. So $estimate is a 151 x 151 matrix.
If you call kde(...) with compute.cont=TRUE, you get an additional element in the result: $cont, which contains the z-value in $estimate corresponding to every percentile from 1% to 99%.
So, you need to extract the x- and y-values corresponding to the 95% contour, and use that to calculate the area. You would do that as follows:
fhat <- kde(x=unicef, H=H.scv,compute.cont=TRUE)
contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]],
z=estimate,levels=cont["95%"])[[1]])
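Now, contour.95 has the x- and y-values corresponding to the 95% contour of fhat. (Strictly, of its first closed curve: contourLines() returns one list element per curve and [[1]] keeps only the first, so a 95% region made of several separate patches needs every piece included.) There are (at least) two ways to get the area. One uses the pracma package and calculates it directly.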
library(pracma)
with(contour.95,polyarea(x,y))
# [1] -113.677
The negative value comes from the ordering of the vertices: polyarea(...) returns a signed area, which is negative when the vertices run clockwise (the orientation conventionally used for "holes"). Taking abs() gives the area.
An alternative uses the area calculation routines in rgeos (a GIS package). Unfortunately, this requires you to first turn your coordinates into a "SpatialPolygons" object, which is a bit of a chore, though still straightforward. (rgeos has since been retired from CRAN; sf::st_area() is a current alternative.)
library(sp)
library(rgeos)
poly <- with(contour.95,data.frame(x,y))
poly <- rbind(poly,poly[1,]) # polygon needs to be closed...
spPoly <- SpatialPolygons(list(Polygons(list(Polygon(poly)),ID=1)))
gArea(spPoly)
# [1] 113.677
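The shoelace formula behind polyarea() is short enough to write in base R, which also makes it easy to handle a 95% region made of several separate patches: contourLines() returns one list element per closed curve, and the sketch below sums the absolute area of each. It is checked on a standard bivariate normal, whose 95% highest-density region is a disc of area pi * qchisq(0.95, 2) bounded by the density level 0.05 / (2 * pi); the function names are hypothetical.

```r
# Shoelace formula: signed polygon area (positive if counterclockwise)
shoelace_area <- function(x, y) {
  j <- c(seq_along(x)[-1], 1)          # index of the next vertex, wrapping around
  sum(x * y[j] - x[j] * y) / 2
}

# Total area enclosed by all curves contourLines() finds at one level
contour_area <- function(x, y, z, level) {
  pieces <- contourLines(x = x, y = y, z = z, levels = level)
  sum(vapply(pieces, function(p) abs(shoelace_area(p$x, p$y)), numeric(1)))
}

# Check on a density with a known answer: standard bivariate normal
g <- seq(-4, 4, length.out = 201)
z <- outer(g, g, function(a, b) exp(-(a^2 + b^2) / 2) / (2 * pi))
level95 <- 0.05 / (2 * pi)       # density on the boundary of the 95% region
area95 <- contour_area(g, g, z, level95)
area95                           # close to pi * qchisq(0.95, 2), about 18.82
```

With the kde object from the question this would presumably be contour_area(fhat$eval.points[[1]], fhat$eval.points[[2]], fhat$estimate, fhat$cont["95%"]). Summing absolute areas assumes no curve lies inside another; a ring-shaped region (one with a hole) would need the inner curve's area subtracted.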
Another method would be to use the contourSizes() function in the ks package. I've also been interested in using this package to compare 2D and 3D space use in ecology, but I wasn't sure how to extract the 2D density estimates. I tested this method by estimating the area used by an "animal" whose locations were confined to a circle of known radius. Below is the code:
set.seed(123)
require(GEOmap)  # provides the inpoly function
require(ks)      # provides Hpi, kde and contourSizes
# Create a data frame centered at coordinates 0,0
data = data.frame(x=0,y=0)
# Create a vector of radians from 0 to 2*pi for making a circle to
# test the area
circle = seq(0,2*pi,length=100)
# Select a radius for your circle
radius = 10
# Create a buffer for when you simulate points (this will be more clear below)
buffer = radius+2
# Simulate x and y coordinates from uniform distribution and combine
# values into a dataframe
createPointsX = runif(1000,min = data$x-buffer, max = data$x+buffer)
createPointsY = runif(1000,min = data$y-buffer, max = data$y+buffer)
data1 = data.frame(x=createPointsX,y=createPointsY)
# Plot the raw data
plot(data1$x,data1$y)
# Calculate the coordinates used to create a circle with center 0,0 and
# with radius specified above
coords = as.data.frame(t(rbind(data$x+sin(circle)*radius,
data$y+cos(circle)*radius)))
names(coords) = c("x","y")
# Add circle to plot with red line
lines(coords$x,coords$y,col=2,lwd=2)
# Use the inpoly function to calculate whether points lie within
# the circle or not.
inp = inpoly(data1$x, data1$y, coords)
data1 = data1[inp == 1,]
# Finally add points that lie with the circle as blue filled dots
points(data1$x,data1$y,pch=19,col="blue")
# Known area of the circle
pi * radius^2
#[1] 314.1593
# Sub in your own data here to calculate 95% homerange or 50% core area usage
H.pi = Hpi(data1,binned=TRUE)
fhat = kde(data1,H=H.pi)
ct1 = contourSizes(fhat, cont = 95, approx=TRUE)
# Compare the known area of the circle to the 95% contour size
ct1
# 5%
# 291.466
I've also tried creating two unconnected circles and testing the contourSizes() function, and it seems to work really well on disjoint distributions.
I am using the ks package in R to estimate 2D space utilization from distance and depth information. I would like to use the 95% contour output to get the maximum vertical and horizontal distances; essentially, I want the dimensions of the resulting 95% contour.
Here is a piece of code as an example:
require(ks)
dist<-c(1650,1300,3713,3718)
depth<-c(22,19.5,20.5,8.60)
dd<-data.frame(cbind(dist,depth))
## auto bandwidth selection
H.pi2<-Hpi(dd,binned=TRUE)*1
ddhat<-kde(dd,H=H.pi2)
plot(ddhat,cont=c(95),lwd=1.5,display="filled.contour2",col=c(NA,"palegreen"),
xlab="",ylab="",las=1,ann=FALSE,bty="l",xaxs="i",yaxs="i",
xlim=c(0,max(dd[,1]+dd[,1]*0.4)),ylim=c(60,-3))
Any information about how to do this would be very helpful. Thanks in advance,
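To create a 95% contour polygon from your 'kde' object:
library(ks)        # contourLevels()
library(maptools)  # image2Grid(); maptools has since been retired from CRAN
library(raster)
im.kde <- image2Grid(list(x = ddhat$eval.points[[1]], y = ddhat$eval.points[[2]], z = ddhat$estimate))
kr <- raster(im.kde)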
You will probably want to resample this raster to a higher resolution before constructing polygons; if so, run the following two lines before creating the polygon object:
new.rast <- raster(extent(im.kde),res = c(50,50))
kr <- resample(kr, new.rast)
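bin.kr <- kr
bin.kr[bin.kr < contourLevels(ddhat, prob = 0.05)] <- NA  # drop cells below the contour level
bin.kr[bin.kr > 0] <- 1                                  # mark the remaining cells with 1
k.poly <- rasterToPolygons(bin.kr, dissolve = TRUE)       # merge the cells into a polygon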
Note that the results are similar, but not identical, to Hawthorne Beyer's GME function 'kde'. He also uses the kde function from ks, so he presumably does something slightly different when building the output polygon.
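For the maximum horizontal and vertical extents, a polygon may not be needed at all: the vertices returned by contourLines() at the 95% level already give the ranges. A minimal base-R sketch, checked on a standard bivariate normal whose 95% contour is a circle of radius sqrt(qchisq(0.95, 2)); the function name is hypothetical.

```r
# Bounding box of all curves that contourLines() finds at one level
contour_extent <- function(x, y, z, level) {
  pieces <- contourLines(x = x, y = y, z = z, levels = level)
  xs <- unlist(lapply(pieces, `[[`, "x"))
  ys <- unlist(lapply(pieces, `[[`, "y"))
  c(width = diff(range(xs)), height = diff(range(ys)),
    xmin = min(xs), xmax = max(xs), ymin = min(ys), ymax = max(ys))
}

# Check on a standard bivariate normal evaluated on a grid
g <- seq(-4, 4, length.out = 201)
z <- outer(g, g, function(a, b) exp(-(a^2 + b^2) / 2) / (2 * pi))
ext <- contour_extent(g, g, z, level = 0.05 / (2 * pi))
ext[["width"]]  # close to 2 * sqrt(qchisq(0.95, 2)), about 4.90
```

For your data, x and y would be ddhat$eval.points[[1]] and ddhat$eval.points[[2]], z would be ddhat$estimate, and the level could come from ddhat$cont["95%"] (computed when kde() is called with compute.cont = TRUE); check that it matches the contour you plotted. The reversed depth axis in the plot does not affect the result. If you use the raster route instead, extent(k.poly) from raster should report the bounding box of the polygon.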
At the moment I'm going for the "any information" prize rather than attempting a final answer. The ks:::plot.kde function dispatches to ks:::plotkde.2d in this case. It works its magic through side effects and I cannot get these functions to return values that can be inspected in code. You would need to hack the plotkde.2d function to return the values used to plot the contour lines. You can visualize what is in ddhat$estimate with:
persp(ddhat$estimate)
It appears that contourLevels() examines the estimate matrix and finds the density value above which the specified percentage of the total probability lies.
> contourLevels(ddhat, 0.95)
95%
1.891981e-05
It then draws the contour based on which values exceed that level. (I just haven't found the code that does that yet.)
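The idea described above, finding the density value above which a chosen share of the probability mass lies, can be sketched on the evaluation grid: sort the grid densities in decreasing order and stop when their cumulative share reaches the target. This is a grid-based approximation with a hypothetical function name; ks may compute its levels differently (for example from density values at the data points), so small differences from contourLevels() are to be expected.

```r
# Grid-based highest-density threshold: the level whose superset holds `prob`
hdr_level <- function(z, prob = 0.95) {
  dens <- sort(as.vector(z), decreasing = TRUE)
  # On a regular grid every cell has the same area, so it cancels here
  mass <- cumsum(dens) / sum(dens)
  dens[which(mass >= prob)[1]]
}

# Check on a standard bivariate normal, where the exact level is 0.05 / (2 * pi)
g <- seq(-4, 4, length.out = 201)
z <- outer(g, g, function(a, b) exp(-(a^2 + b^2) / 2) / (2 * pi))
lev <- hdr_level(z, 0.95)
lev  # close to 0.05 / (2 * pi), about 0.00796
```

Applied to the question's object, hdr_level(ddhat$estimate, 0.95) gives a value that can be compared with contourLevels(ddhat, 0.95) and passed to contourLines() to recover the curve itself.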