How to fix this PCA in R

I am creating a PCA plot from data:
label <- read.table('label_clusters.tsv')
mydata <- read.table('raw_clusters.tsv')
GP.svd = svd(mydata)
dat = data.frame("pc1"= GP.svd$u[,1],
"pc2"= GP.svd$u[,2],
"Data"= c(rep("my", nsamples(our.obj2)), rep("zeller", nsamples(z.obj))))
GP.svd is a large list in the form of:
[,97] [,98] [,99] [,100] [,101] [,102]
[1,] -9.616173e-02 -0.0779788701 -0.1087899396 -0.0653396699 -0.140911786 -5.064931e-02
[2,] 1.101038e-01 0.0465664554 0.0237686772 0.1344639223 0.035536326 2.715842e-02
[3,] -3.247248e-02 0.0295960109 0.0148926826 0.0021550661 -0.003509716 -1.887659e-02
When I run the code thus far, I get this error:
Error in data.frame(pc1 = GP.svd$u[, 1], pc2 = GP.svd$u[, 2], Data = c(rep("my", :
could not find function "nsamples"
I am not sure why this is happening; any help is appreciated.

Your code cannot find the nsamples function. To fix this, you can:
load a package that provides nsamples (for example phyloseq, via library(phyloseq)), or
write an nsamples function yourself that works correctly on our.obj2, or
use a different function, for example nrow if our.obj2 is a data.frame.
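A minimal sketch of the third option, assuming both data sets are plain matrices or data frames with samples in rows, stacked into mydata in that order. my_data and zeller_data are hypothetical stand-ins for our.obj2 and z.obj. The labels must follow the row order of mydata, because row i of GP.svd$u belongs to row i of mydata (if your samples are in columns instead, use ncol() and GP.svd$v).

```r
# Hypothetical stand-ins for the two data sets: samples in rows, features in columns
set.seed(1)
my_data     <- matrix(rnorm(5 * 10), nrow = 5)   # 5 samples from "my" data
zeller_data <- matrix(rnorm(7 * 10), nrow = 7)   # 7 samples from the Zeller data

# Stack the samples so that row i of mydata is sample i
mydata <- rbind(my_data, zeller_data)
GP.svd <- svd(mydata)

# Row i of GP.svd$u belongs to row i of mydata, so the labels use the same order
dat <- data.frame(
  pc1  = GP.svd$u[, 1],
  pc2  = GP.svd$u[, 2],
  Data = c(rep("my", nrow(my_data)), rep("zeller", nrow(zeller_data)))
)
head(dat)
```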

Related

Decomposed variance ill-defined in Analysis of heterogeneity (ANOHE)

I am trying to run a meta-analysis using the package "gemtc", and the code works well on my test data.
Here is the code:
library(gemtc)
data <- read.csv("input.txt", sep=",", header=TRUE)
network <- mtc.network(data, description="Example")
result.anohe <- mtc.anohe(network, n.adapt=10000, n.iter=50000)
#The file (problem.txt) is also attached.
However, when I use my real data, I get the following error:
Error in decompose.study(study.samples[, colIndexes, drop = FALSE], studies[i]) :
Decomposed variance ill-defined for 1. Most likely the USE did not converge:
[,1] [,2] [,3] [,4]
[1,] 0.000 2478.307 2491.482 2485.044
[2,] 2478.307 0.000 1106288.727 -440067.825
[3,] 2491.482 1106288.727 0.000 -1459996.199
[4,] 2485.044 -440067.825 -1459996.199 0.000
Thanks very much in advance!
The input file that causes the problem is attached.

Merging Polygons in Shape Files with Common Tag IDs: unionSpatialPolygons

I am trying to read from a shape file and merge the polygons with a common tag ID.
library(rgdal)
library(maptools)
if (!require(gpclib)) install.packages("gpclib", type="source")
gpclibPermit()
usa <- readOGR(dsn = "./path_to_data/", layer="the_name_of_shape_file")
usaIDs <- usa$segment_ID
isTRUE(gpclibPermitStatus())
usaUnion <- unionSpatialPolygons(usa, usaIDs)
When I try to plot the merged polygons:
for(i in c(1:length(names(usaUnion)))){
print(i)
myPol <- usaUnion@polygons[[i]]@Polygons[[1]]@coords
polygon(myPol, pch = 2, cex = 0.3, col = i)
}
All the merged segments look fine except the one around Michigan, where the merge behaves strangely: the resulting area for this segment is only a small polygon, shown below.
i = 10
usaUnion@polygons[[i]]@Polygons[[1]]@coords
output:
[,1] [,2]
[1,] -88.62533 48.03317
[2,] -88.90155 47.96025
[3,] -89.02862 47.85066
[4,] -89.13988 47.82408
[5,] -89.19292 47.84461
[6,] -89.20179 47.88386
[7,] -89.15610 47.93923
[8,] -88.49753 48.17380
[9,] -88.62533 48.03317
which turned out to be a small northern island.
I suspect that, for some reason, unionSpatialPolygons does not handle geographically separated polygons (the left and right sides of Michigan), but I have not found a solution yet.
Here is a link to the input data so you can reproduce the problem.
I think the problem is not with unionSpatialPolygons but with your plot. Specifically, you are plotting only the first 'sub-polygon' for each ID. Run the following to verify what went wrong:
for(i in 1:length(names(usaUnion))){
print(length(usaUnion@polygons[[i]]@Polygons))
}
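Several IDs consist of more than one Polygon (separated parts), but your loop takes only the first one ([[1]]) for each ID, so only that part gets plotted.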
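If you prefer to keep your own plotting loop, a minimal sketch is to iterate over every Polygon inside each Polygons object instead of taking only [[1]]. The squares below are hypothetical toy data standing in for usaUnion; ID "a" has two separated parts, like the Michigan segment.

```r
library(sp)

# Hypothetical toy data: a clockwise unit square with lower-left corner (x0, y0)
sq <- function(x0, y0) {
  cbind(c(x0, x0, x0 + 1, x0 + 1, x0),
        c(y0, y0 + 1, y0 + 1, y0, y0))
}
toy_union <- SpatialPolygons(list(
  Polygons(list(Polygon(sq(0, 0), hole = FALSE),
                Polygon(sq(5, 0), hole = FALSE)), ID = "a"),  # two separated parts
  Polygons(list(Polygon(sq(0, 5), hole = FALSE)), ID = "b")   # one part
))

# Number of parts per ID; indexing with [[1]] only ever reaches the first one
n_parts <- sapply(toy_union@polygons, function(p) length(p@Polygons))
print(n_parts)

# Draw every part of every ID, coloured by ID
plot(toy_union)  # sets up the plot region and draws the outlines
for (i in seq_along(toy_union@polygons)) {
  for (part in toy_union@polygons[[i]]@Polygons) {
    polygon(part@coords, col = i)
  }
}
```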
I got a correct polygon join/plot with the following code:
library(rgdal)
library(maptools)
library(plyr)
usa <- readOGR(dsn = "INSERT_YOUR_PATH", layer="light_shape")
# remove NAs
usa <- usa[!is.na(usa$segment_ID), ]
usaIDs <- usa$segment_ID
#get unique colors
set.seed(666)
unique_colors <- sample(grDevices::colors()[grep('gr(a|e)y|white', grDevices::colors(), invert = TRUE)], length(unique(usaIDs))) # one color per segment ID
colors <- plyr::mapvalues(
usaIDs,
from = as.numeric(sort(as.character(unique(usaIDs)))), #workaround to get correct color order
to = unique_colors
)
plot(usa, col = colors, main = "Original Map")
usaUnion <- unionSpatialPolygons(usa, usaIDs)
plot(usaUnion, col = unique_colors, main = "Joined Polygons")
Here is an example that makes this plot with sf. It shows how well sf works with dplyr, and with summarise() in particular, which makes this operation expressive and succinct. I filter out the missing IDs, group_by the ID, summarise (which unions the geometries by default), and plot with geom_sf.
library(tidyverse)
library(sf)
# Substitute wherever you are reading the file from
light_shape <- read_sf(here::here("data", "light_shape.shp"))
light_shape %>%
filter(!is.na(segment_ID)) %>%
group_by(segment_ID) %>%
summarise() %>%
ggplot() +
geom_sf(aes(fill = factor(segment_ID)))
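To see why the summarise() approach shows every part, here is a minimal sketch on hypothetical toy data, written without dplyr: unioning two separated squares that share a segment_ID yields one MULTIPOLYGON, and plot() or geom_sf() draws all of its parts.

```r
library(sf)

# Hypothetical unit square with lower-left corner (x0, y0), as an sf POLYGON
sq <- function(x0, y0) {
  st_polygon(list(cbind(c(x0, x0 + 1, x0 + 1, x0, x0),
                        c(y0, y0, y0 + 1, y0 + 1, y0))))
}
shapes <- st_sf(segment_ID = c(1, 1, 2),
                geometry   = st_sfc(sq(0, 0), sq(5, 0), sq(0, 5)))

# Union the geometries of each segment_ID (what summarise() does per group)
ids <- sort(unique(shapes$segment_ID))
merged <- st_sf(
  segment_ID = ids,
  geometry   = do.call(c, lapply(ids, function(id) {
    st_union(shapes[shapes$segment_ID == id, ])
  }))
)

print(st_geometry_type(merged))  # ID 1 becomes a MULTIPOLYGON with two parts
plot(st_geometry(merged), col = c("tomato", "steelblue"))
```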

Creating spatialpolygons dataframe from list of polygons

I am currently trying to create a polygon shapefile from a list of polygons (study areas for biodiversity research).
Currently these polygons are stored in a list in this format:
$SEW22
[,1] [,2]
[1,] 427260.4 5879458
[2,] 427161.4 5879472
[3,] 427175.0 5879571
[4,] 427273.9 5879557
[5,] 427260.4 5879458
$SEW23
[,1] [,2]
[1,] 418011.0 5867216
[2,] 417912.0 5867230
[3,] 417925.5 5867329
[4,] 418024.5 5867315
[5,] 418011.0 5867216
I tried to write them directly to a shapefile with writeOGR, but the following error occurs:
> #write polygons to shp
> filenameshp <- paste('Forestplots')
> layername <- paste('Forestplots')
> writeOGR(obj=forest, dsn = filenameshp,
+ layer=layername, driver="ESRI Shapefile", overwrite_layer = TRUE)
Error in writeOGR(obj = forest, dsn = filenameshp, layer = layername, :
inherits(obj, "Spatial") is not TRUE
I read a tutorial by Barry Rowlingson on creating SpatialPolygons and thought I should probably create a data frame first, so I did this:
forestm<-do.call(rbind,forest)
But, as you can imagine, this returned nothing useful, and it also lost the names of the plots.
As I am still new to R, I also tried many other approaches whose merits I could not fully judge, but none returned what I hoped for, so I will spare you the details.
I look forward to your suggestions.
Many thanks
P.S. I also tried the following, as described in the SpatialPolygons documentation of the sp package:
> Polygons(forest, ID)
Error in Polygons(forest, ID) : srl not a list of Polygon objects
You can follow the approach described in this answer: https://gis.stackexchange.com/questions/18311/instantiating-spatial-polygon-without-using-a-shapefile-in-r.
Here's how to apply the approach to your case. First, I create a list of matrices as in your sample data:
forest <- list(
"SEW22" = matrix(c(427260.4, 5879458, 427161.4, 5879472, 427175.0, 5879571, 427273.9, 5879557, 427260.4, 5879458),
ncol = 2, byrow = TRUE),
"SEW23" = matrix(c(418011.0, 5867216, 417912.0, 5867230, 417925.5, 5867329, 418024.5, 5867315, 418011.0, 5867216),
ncol = 2, byrow = TRUE)
)
Now build the spatial objects:
library(sp)
p <- lapply(forest, Polygon)
ps <- lapply(seq_along(p), function(i) Polygons(list(p[[i]]), ID = names(p)[i]))
sps <- SpatialPolygons(ps)
sps_df <- SpatialPolygonsDataFrame(sps, data.frame(x = rep(NA, length(p)), row.names = names(p)))
In the first step, we iterate through the list of matrices and apply the Polygon function to each matrix to create a list of Polygon objects. In the second step, we iterate through this list to create a Polygons object, setting the ID of each element in this object to the corresponding name in the original list (e.g. "SEW22", "SEW23"). The third step creates a SpatialPolygons object. Finally, we create a SpatialPolygonsDataFrame object. Here I have a dummy dataframe populated with NAs (note that the row names must correspond to the polygon IDs).
Then write the data to a shapefile:
rgdal::writeOGR(obj = sps_df,
dsn = "Forestplots",
layer = "Forestplots",
driver = "ESRI Shapefile",
overwrite_layer = TRUE)
This creates a new folder in your working directory:
list.files()
# [1] "Forestplots"
list.files("Forestplots")
# [1] "Forestplots.dbf" "Forestplots.shp" "Forestplots.shx"
Consult the linked answer for more details.
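rgdal (like maptools) was retired from CRAN in 2023, so as an alternative here is a minimal sketch that builds the same two plots with sf instead of sp and writes them with st_write(). The column name plot_id and the folder Forestplots_sf are hypothetical choices. No coordinate reference system is set because the question does not state one; if you know it, add it with st_set_crs().

```r
library(sf)

forest <- list(
  "SEW22" = matrix(c(427260.4, 5879458, 427161.4, 5879472, 427175.0, 5879571,
                     427273.9, 5879557, 427260.4, 5879458), ncol = 2, byrow = TRUE),
  "SEW23" = matrix(c(418011.0, 5867216, 417912.0, 5867230, 417925.5, 5867329,
                     418024.5, 5867315, 418011.0, 5867216), ncol = 2, byrow = TRUE)
)

# One POLYGON per closed coordinate matrix; st_polygon() takes a list of rings
geoms <- st_sfc(lapply(forest, function(ring) st_polygon(list(ring))))
forest_sf <- st_sf(plot_id = names(forest), geometry = geoms)

# Write to a folder used only for this example (it is recreated on each run)
out_dir <- "Forestplots_sf"
unlink(out_dir, recursive = TRUE)
dir.create(out_dir)
st_write(forest_sf, file.path(out_dir, "Forestplots.shp"), quiet = TRUE)
```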

Why does not the 'outer' function work properly for some argument values in R?

When I run the R command:
outer(37:42, 37:42, complex, 1)
I get an error
"Error in dim(robj) <- c(dX, dY) : dims [product 36] do not match the length of object [37]"
in my R session. But when I run
outer(36:42, 36:42, complex, 1)
I get a valid matrix as a result. The problem persists for all values greater than 36, and there is no problem for any values less than 37.
Is this a bug?
My system: Microsoft R Open 3.4.4 / RStudio 1.1.447 / Ubuntu 16.04
More specifically, with arguments m:n, m:n, outer() expands both vectors to length (n - m + 1)^2 and calls complex(X, Y, 1). Because these arguments are unnamed, they are matched by position: X goes to length.out, Y to real, and 1 to imaginary. complex() uses only the first element of length.out (here m) and returns max(m, (n - m + 1)^2) values, so the call fails whenever m > (n - m + 1)^2, because outer() then cannot reshape the result into an (n - m + 1) x (n - m + 1) matrix. Try, for example, outer(20:23, 20:23, complex, 1) and outer(20:24, 20:24, complex, 1): the first fails because 20 > 4^2 = 16, while the second works because 20 <= 5^2 = 25. Even when there is no error, the result is not what you want, since the real parts come from the second argument and every imaginary part is 1. I think what you want is the following:
outer(37:42, 37:42, function(x,y) {complex(1, real = x, imaginary = y)})
Which outputs:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 37+37i 37+38i 37+39i 37+40i 37+41i 37+42i
[2,] 38+37i 38+38i 38+39i 38+40i 38+41i 38+42i
[3,] 39+37i 39+38i 39+39i 39+40i 39+41i 39+42i
[4,] 40+37i 40+38i 40+39i 40+40i 40+41i 40+42i
[5,] 41+37i 41+38i 41+39i 41+40i 41+41i 41+42i
[6,] 42+37i 42+38i 42+39i 42+40i 42+41i 42+42i
Hope this helps.
The problem is in the 4th argument: it should be named:
outer(37:42, 37:42, complex, length.out = 1)
works fine!
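A short base-R sketch of the mechanism described above: the first call reproduces the positional matching that outer() triggers, and the last line is the named-argument fix.

```r
# complex(length.out, real, imaginary): unnamed arguments are matched by position.
# The result has max(length.out, length(real), length(imaginary)) elements.
z <- complex(37, 1:36, 1)   # length.out = 37, real = 1:36, imaginary = 1
length(z)                   # 37, so it cannot be reshaped into a 6 x 6 matrix

# outer() passes rep(37:42, times = 6) as the first argument; complex() uses
# only its first element (37) as length.out, hence the 37-element result.

# Naming length.out sends the two expanded vectors to real and imaginary instead
good <- outer(37:42, 37:42, complex, length.out = 1)
good[1, 2]                  # 37+38i
```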

Clustering function R Hclust Loop and develop a table

I'm working on a text mining/clustering project and am trying to create a table that contains the number of clusters as rows and 6 columns representing the following 6 metrics:
max.diameter, min.separation, average.within, average.between, avg.silwidth, dunn.
I need to create the tables for 3 methods - kmeans, pam and hclust.
I was able to create something for kmeans
dtm0.90Dist = dist(dtm0.90)
foreachcluster = function(k) {
kmeans.result = kmeans(dtm0.90, k);
kmeans.stats = cluster.stats(dtm0.90Dist,kmeans.result$cluster);
c(kmeans.stats$min.separation, kmeans.stats$max.diameter,
kmeans.stats$average.within, kmeans.stats$avearge.between,
kmeans.stats$avg.silwidth, kmeans.stats$dunn)
}
rbind(foreachcluster(2), foreachcluster(3), foreachcluster(4), foreachcluster(5),
foreachcluster(6), foreachcluster(7),foreachcluster(8))
and I get the following output
[,1] [,2] [,3] [,4] [,5]
[1,] 3.162278 30.19934 5.831550 0.5403872 0.10471348
[2,] 2.236068 28.37252 5.006058 0.3923446 0.07881104
[3,] 1.000000 28.37252 4.995478 0.2496066 0.03524537
[4,] 1.000000 26.40076 4.387212 0.2633338 0.03787770
[5,] 1.000000 26.40076 4.353248 0.2681947 0.03787770
[6,] 1.000000 26.40076 4.163757 0.1633954 0.03787770
[7,] 1.000000 26.40076 4.128927 0.2676423 0.03787770
I need similar output for the hclust and pam methods, but for the life of me I can't get the same function to work for either of them.
OK, so I was able to write the function for hclust:
forhclust=function(k){dfDist = dist(dtm0.90);
hclust.result = hclust(dfDist);
hclust.cluster = (cutree(hclust.result, k));
cluster.stats(dfDist,hclust.cluster);c(cluster.stats$min.separation)}
But I get an error when I run this:
Error in cluster.stats$min.separation :
object of type 'closure' is not subsettable
What I need is for it to print the min.separation value.
I would really appreciate any help, and perhaps some guidance on why my approach fails for hclust.
Also, is there a good resource that explains, step by step and in detail, how these methods work and how to apply them?
Thank You
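The error occurs because the result of cluster.stats() is never stored: cluster.stats$min.separation then tries to subset the function cluster.stats itself (a "closure"), which is not possible. Assign the result to a variable and extract the fields from that:
library(fpc)  # cluster.stats() comes from the fpc package
# mDist is the distance matrix of your data, e.g. mDist = dist(m)
foreachcluster2 = function(k) {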
hc = hclust(mDist, method = "ave")
hresult = cutree(hc, k)
h.stats = cluster.stats(mDist,hresult);
c( max.dia=h.stats$max.diameter,
min.sep=h.stats$min.separation,
avg.wi=h.stats$average.within,
avg.bw=h.stats$average.between,
silwidth=h.stats$avg.silwidth,
dunn=h.stats$dunn)
}
t2 = do.call(rbind, lapply(2:14, foreachcluster2))
rownames(t2) = 2:14
t2
This should work. For pam():
library(cluster)  # pam() comes from the cluster package
pamC <- pam(x=m, k=2)
pamC
pamC$clustering
Use $clustering instead of $cluster; the rest is the same.
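For completeness, a minimal sketch that fills the tables for all three methods with one helper. It assumes the fpc and cluster packages are installed and uses a small simulated matrix m as a hypothetical stand-in for dtm0.90. It also spells average.between correctly: in the kmeans function in the question, kmeans.stats$avearge.between is misspelled, which returns NULL and is why that table has only five columns.

```r
library(fpc)      # cluster.stats()
library(cluster)  # pam()

# Hypothetical stand-in for dtm0.90: three well-separated groups in 2D
set.seed(42)
m <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2),
           matrix(rnorm(40, mean = 8), ncol = 2))
mDist <- dist(m)
hc <- hclust(mDist, method = "average")  # build the tree once, cut it for each k

# One table row: the six metrics for a given clustering vector
stats_row <- function(clustering) {
  s <- cluster.stats(mDist, clustering)
  c(max.dia  = s$max.diameter,
    min.sep  = s$min.separation,
    avg.wi   = s$average.within,
    avg.bw   = s$average.between,
    silwidth = s$avg.silwidth,
    dunn     = s$dunn)
}

ks <- 2:8
tables <- list(
  kmeans = t(sapply(ks, function(k) stats_row(kmeans(m, k, nstart = 10)$cluster))),
  pam    = t(sapply(ks, function(k) stats_row(pam(mDist, k)$clustering))),
  hclust = t(sapply(ks, function(k) stats_row(cutree(hc, k))))
)
for (nm in names(tables)) rownames(tables[[nm]]) <- ks
tables$pam
```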
