I am trying to export biometric data from an analysis using the ROCR package. Here is the code I have so far:
pred = prediction(Matching.Score, Distribution)
perf = performance(pred, "fnr", "fpr")
An object of class “performance”
Slot "x.name":
[1] "False positive rate"
Slot "y.name":
[1] "False negative rate"
Slot "alpha.name":
[1] "Cutoff"
Slot "x.values":
[[1]]
[1] 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
[15] 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
......
Slot "y.values":
[[1]]
[1] 1.00000 0.99999 0.99998 0.99997 0.99996 0.99995
[15] 0.99986 0.99985 0.99984 0.99983 0.99982 0.99981
......
Slot "alpha.values":
[[1]]
[1] Inf 1.0427800 1.0221150 1.0056240 1.0032630 0.9999599
[12] 0.9644779 0.9633058 0.9628996 0.9626501 0.9607665 0.9605930
.......
This results in several slots. I would like to export the resulting values to a text file so I can work with them in Excel, using:
write(pred, "filename")
However, when I try to write the file, I get an error stating:
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'S4') cannot be handled by 'cat'
Is there any way around this?
I'd appreciate any advice. Thank you!
Matt Peterson
Check the class structure of the resulting S4 objects with str(), extract the relevant slots (accessed with @) to build a data frame, and use write.table() or write.csv() to export the results. For instance, for the prediction object pred:
R> library("ROCR")
R> data(ROCR.simple)
R> pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
R> perf <- performance(pred, "fnr", "fpr")
R> str(pred)
Formal class 'prediction' [package "ROCR"] with 11 slots
..@ predictions:List of 1
.. ..$ : num [1:200] 0.613 0.364 0.432 0.14 0.385 ...
..@ labels :List of 1
.. ..$ : Ord.factor w/ 2 levels "0"<"1": 2 2 1 1 1 2 2 2 2 1 ...
..@ cutoffs :List of 1
.. ..$ : num [1:201] Inf 0.991 0.985 0.985 0.983 ...
..@ fp :List of 1
.. ..$ : num [1:201] 0 0 0 0 1 1 2 3 3 3 ...
..@ tp :List of 1
.. ..$ : num [1:201] 0 1 2 3 3 4 4 4 5 6 ...
..@ tn :List of 1
.. ..$ : num [1:201] 107 107 107 107 106 106 105 104 104 104 ...
..@ fn :List of 1
.. ..$ : num [1:201] 93 92 91 90 90 89 89 89 88 87 ...
..@ n.pos :List of 1
.. ..$ : int 93
..@ n.neg :List of 1
.. ..$ : int 107
..@ n.pos.pred :List of 1
.. ..$ : num [1:201] 0 1 2 3 4 5 6 7 8 9 ...
..@ n.neg.pred :List of 1
.. ..$ : num [1:201] 200 199 198 197 196 195 194 193 192 191 ...
R> write.csv(data.frame(fp = pred@fp[[1]], fn = pred@fn[[1]]), file = "result_pred.csv")
and for the performance object perf:
R> str(perf)
Formal class 'performance' [package "ROCR"] with 6 slots
..@ x.name : chr "False positive rate"
..@ y.name : chr "False negative rate"
..@ alpha.name : chr "Cutoff"
..@ x.values :List of 1
.. ..$ : num [1:201] 0 0 0 0 0.00935 ...
..@ y.values :List of 1
.. ..$ : num [1:201] 1 0.989 0.978 0.968 0.968 ...
..@ alpha.values:List of 1
.. ..$ : num [1:201] Inf 0.991 0.985 0.985 0.983 ...
R> write.csv(data.frame(fpr = perf@x.values[[1]],
                       fnr = perf@y.values[[1]],
                       alpha.values = perf@alpha.values[[1]]),
             file = "result_perf.csv")
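To avoid hard-coding slot names each time, the conversion can be wrapped in a small function. The sketch below is self-contained so it runs without ROCR: it defines a hypothetical S4 class, `toyPerformance`, that only mimics the slots str(perf) shows above (it is not ROCR's class), plus a hypothetical helper `perfToDataFrame()` that takes one run from each *.values list and uses the object's own axis labels as column names.

```r
library(methods)

# Hypothetical S4 class mimicking the slots str(perf) shows for ROCR's
# "performance" class: each *.values slot is a list with one vector per run.
setClass("toyPerformance",
         slots = c(x.name = "character", y.name = "character",
                   alpha.name = "character", x.values = "list",
                   y.values = "list", alpha.values = "list"))

# Hypothetical helper: one run of a performance-like object as a data frame,
# with the object's own axis labels as column names.
perfToDataFrame <- function(perf, run = 1) {
  out <- data.frame(perf@x.values[[run]],
                    perf@y.values[[run]],
                    perf@alpha.values[[run]])
  names(out) <- c(perf@x.name, perf@y.name, perf@alpha.name)
  out
}

# Made-up values, only to exercise the helper
perf <- new("toyPerformance",
            x.name = "False positive rate", y.name = "False negative rate",
            alpha.name = "Cutoff",
            x.values = list(c(0, 0, 0.5, 1)),
            y.values = list(c(1, 0.5, 0, 0)),
            alpha.values = list(c(Inf, 0.9, 0.4, 0.1)))

# Write to a temporary CSV file that Excel can open
csv <- tempfile(fileext = ".csv")
write.csv(perfToDataFrame(perf), csv, row.names = FALSE)
```

With a real ROCR object the helper should behave the same way, since it only touches the slot names listed by str(perf); objects holding several runs (e.g. from cross-validation) need one call per `run` index.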
Related
I am using the ConsensusClusterPlus package in R to cluster my omics data, and I want to use the clusters in a regression. Is there a way to create composite scores if, say, I reduce 1000 genes to 7 clusters and use those 7 clusters for regression?
I tried to look at the structure of the clustering result in R:
results = ConsensusClusterPlus(d1,maxK=maxK,reps=1000,pItem=0.8,pFeature=1, title=title,clusterAlg="hc",distance="pearson",seed=1262118388.71279,plot="png")
icl = calcICL(results,title=title,plot="png")
str(results[[7]])
List of 5
$ consensusMatrix: num [1:40, 1:40] 1 0.689 0.976 1 1 ...
$ consensusTree :List of 7
..$ merge : int [1:39, 1:2] -1 -5 -7 -8 -9 -10 -11 -12 -13 -14 ...
..$ height : num [1:39] 0 0 0 0 0 0 0 0 0 0 ...
..$ order : int [1:40] 40 34 35 28 6 32 22 18 21 19 ...
..$ labels : NULL
..$ method : chr "average"
..$ call : language hclust(d = as.dist(1 - fm), method = finalLinkage)
..$ dist.method: NULL
..- attr(*, "class")= chr "hclust"
$ consensusClass : Named int [1:40] 1 1 1 1 1 2 1 1 1 1 ...
..- attr(*, "names")= chr [1:40] "CAR 12:0" "CAR 12:1" "CAR 13:0" "CAR 14:0" ...
$ ml : num [1:40, 1:40] 1 0.689 0.976 1 1 ...
$ clrs :List of 3
..$ : chr [1:40] "#A6CEE3" "#A6CEE3" "#A6CEE3" "#A6CEE3" ...
..$ : num 8
..$ : chr [1:7] "#A6CEE3" "#FB9A99" "#FF7F00" "#FDBF6F" ...
How can I compute composite scores?
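One common way to build such scores (an assumption here, not something ConsensusClusterPlus provides) is to standardize each feature and average the standardized features within each cluster. The sketch assumes `expr` is a numeric matrix with samples in rows and the clustered items in columns, with column names matching `names(cl)`; the feature-like names in consensusClass above suggest that is how `d1` is laid out. `cl` has the same shape as `results[[7]]$consensusClass`, and `compositeScores()` is a hypothetical helper.

```r
# Hypothetical helper: one composite score per cluster and sample.
# expr: numeric matrix, samples in rows, clustered features in columns.
# cl:   named integer vector mapping feature name -> cluster id
#       (same shape as results[[k]]$consensusClass).
compositeScores <- function(expr, cl) {
  stopifnot(all(names(cl) %in% colnames(expr)))
  z <- scale(expr)  # center and scale each feature so none dominates
  scores <- sapply(split(names(cl), cl), function(feats) {
    rowMeans(z[, feats, drop = FALSE])  # mean of the standardized members
  })
  colnames(scores) <- paste0("cluster", colnames(scores))
  scores
}

# Toy example with made-up data: 20 samples, 6 features in 3 clusters
set.seed(1)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(NULL, paste0("g", 1:6)))
cl <- c(g1 = 1L, g2 = 1L, g3 = 2L, g4 = 2L, g5 = 3L, g6 = 3L)
scores <- compositeScores(expr, cl)
```

The resulting samples x clusters matrix can go straight into a model, e.g. `lm(outcome ~ ., data = data.frame(outcome = y, scores))` for a hypothetical response `y`. Another common choice is the first principal component of each cluster's features (for example via prcomp()).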
I'm unable to plot a randomForest object (i.e. call plot.randomForest()) after loading it from an .RData file:
library("randomForest")
load("rf.RData")
plot(rf)
I get the error:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
I get the same error when I call randomForest:::plot.randomForest(rf).
Other functions work fine on rf.
EDIT:
Here is the output of str(rf):
str(rf)
List of 15
$ call : language randomForest(x = data[, match("feat1", names(data)):match("feat_n", names(data))], y = data[, match("my_y", n| __truncated__ ...
$ type : chr "regression"
$ predicted : Named num [1:723012] -1141 -1767 -1577 NA -1399 ...
..- attr(*, "names")= chr [1:723012] "1" "2" "3" "4" ...
$ oob.times : int [1:723012] 3 4 6 3 2 3 2 6 7 5 ...
$ importance : num [1:150, 1:2] 6172 928 6367 5754 1013 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
.. ..$ : chr [1:2] "%IncMSE" "IncNodePurity"
$ importanceSD : Named num [1:150] 400.9 96.7 500.1 428.9 194.8 ...
..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
$ localImportance: NULL
$ proximity : NULL
$ ntree : num 60
$ mtry : num 10
$ forest :List of 11
..$ ndbigtree : int [1:60] 392021 392219 392563 392845 393321 392853 392157 392709 393223 392679 ...
..$ nodestatus : num [1:393623, 1:60] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
..$ leftDaughter : num [1:393623, 1:60] 2 4 6 8 10 12 14 16 18 20 ...
..$ rightDaughter: num [1:393623, 1:60] 3 5 7 9 11 13 15 17 19 21 ...
..$ nodepred : num [1:393623, 1:60] -8.15 -31.38 5.62 -59.87 -16.06 ...
..$ bestvar : num [1:393623, 1:60] 118 57 82 77 65 148 39 39 12 77 ...
..$ xbestsplit : num [1:393623, 1:60] 1.08e+02 -8.26e+08 -2.50 8.55e+03 1.20e+04 ...
..$ ncat : Named int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
..$ nrnodes : int 393623
..$ ntree : num 60
..$ xlevels :List of 150
.. ..$ feat1 : num 0
.. ..$ feat2 : num 0
.. ..$ feat3 : num 0
.. ..$ feat4 : num 0
.. ..$ featn : num 0
.. .. [list output truncated]
$ coefs : NULL
$ y : num [1:723012] -1885 -1918 -1585 -1838 -2035 ...
$ test : NULL
$ inbag : NULL
- attr(*, "class")= chr "randomForest"
I have created a spatial points data frame from a .las file, which has the following structure and headings:
head(Rnorm@data)
X Y Z gpstime Intensity ReturnNumber NumberOfReturns Classification ScanAngle pulseID treeID
1: 390385.3 5847998 23.35 8194.459 9 1 2 5 -14 1 291
2: 390385.9 5847998 0.52 8194.479 76 1 1 4 -13 2 291
3: 390385.1 5847998 1.06 8194.483 72 1 1 4 -13 3 291
4: 390385.7 5847999 0.81 8194.483 75 1 1 4 -13 4 291
5: 390386.1 5848001 0.41 8194.503 15 2 2 3 -13 5 241
6: 390385.4 5848000 0.26 8194.503 78 1 1 3 -13 6 241
Converting this data to a SpatialPointsDataFrame:
df <- as.spatial(Rnorm)
The structure of the SpatialPointsDataFrame looks like this:
str(df)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
..@ data :'data.frame': 71164 obs. of 9 variables:
.. ..$ Z : num [1:71164] 23.35 0.52 1.06 0.81 0.41 ...
.. ..$ gpstime : num [1:71164] 8194 8194 8194 8194 8195 ...
.. ..$ Intensity : int [1:71164] 9 76 72 75 15 78 54 76 55 79 ...
.. ..$ ReturnNumber : int [1:71164] 1 1 1 1 2 1 1 1 2 1 ...
.. ..$ NumberOfReturns: int [1:71164] 2 1 1 1 2 1 1 1 2 1 ...
.. ..$ Classification : int [1:71164] 5 4 4 4 3 3 3 3 3 3 ...
.. ..$ ScanAngle : int [1:71164] -14 -13 -13 -13 -13 -13 -13 -14 -14 -14
.. ..$ pulseID : int [1:71164] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ treeID : int [1:71164] 291 291 291 291 241 241 241 291 NA NA
..@ coords.nrs : num(0)
..@ coords : num [1:71164, 1:2] 390385 390386 390385 390386 390386 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "X" "Y"
..@ bbox : num [1:2, 1:2] 390281 5847998 390386 5848126
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "X" "Y"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr NA
Now I want to create a polygon for each treeID that contains more than one point. How can I do this?
Thanks in advance for your help.
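The lidR package has the functionality you need to create individual tree polygons from point cloud data. The following example (written for an older lidR API; later releases renamed functions such as lasground() and lastrees()) shows how to segment the trees and convert the resulting crowns to polygons: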
library(lidR)
# read file
las = readLAS("Example.las")
# ground classification
lasground(las, MaxWinSize = 10, InitDist = 0.05, CellSize = 7)
# normalization
lasnormalize(las, method = "knnidw", k = 10L)
# compute a canopy image
chm = grid_canopy(las, res = 0.5, subcircle = 0.2, na.fill = "knnidw", k = 4)
chm = as.raster(chm)
# smooth the canopy height model twice with a 3x3 mean filter
kernel = matrix(1,3,3)
chm = raster::focal(chm, w = kernel, fun = mean)
chm = raster::focal(chm, w = kernel, fun = mean)
# tree segmentation
crowns = lastrees(las, "watershed", chm, th = 4, extra = TRUE)
# display
tree = lasfilter(las, !is.na(treeID))
plot(tree, color = "treeID", colorPalette = pastel.colors(100), size = 1)
# convert the crown raster returned by lastrees() into one polygon per crown
library(raster)
contour = rasterToPolygons(crowns, dissolve = TRUE)
plot(chm, col = height.colors(50))
plot(contour, add = TRUE)
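The answer above works from a rasterized canopy model. If every point already has a treeID, as in the question, a simpler alternative (a different technique from the answer's watershed approach) is to take the convex hull of each tree's points with base R's chull(). This sketch assumes a plain data frame with X, Y and treeID columns, which could be built from the SpatialPointsDataFrame with something like `data.frame(coordinates(df), treeID = df$treeID)`; `treeHulls()` is a hypothetical helper. A polygon needs at least three points, so smaller trees are dropped.

```r
# Hypothetical helper: one convex-hull outline per treeID.
# pts: data frame with numeric columns X, Y and a treeID column (NA = no tree).
treeHulls <- function(pts, min_points = 3) {
  pts <- pts[!is.na(pts$treeID), ]
  groups <- split(pts[, c("X", "Y")], pts$treeID)
  # A polygon needs at least three points, so skip smaller trees
  groups <- groups[vapply(groups, nrow, integer(1)) >= min_points]
  lapply(groups, function(g) {
    idx <- chull(g$X, g$Y)                  # indices of the hull vertices
    ring <- as.matrix(g[c(idx, idx[1]), ])  # repeat first vertex to close ring
    rownames(ring) <- NULL
    ring
  })
}

# Made-up points: tree 1 has 5 points, tree 2 only 2, tree 3 has 3,
# and the last point belongs to no tree
pts <- data.frame(
  X = c(0, 1, 1, 0, 0.5, 5, 6, 10, 11, 10, 3),
  Y = c(0, 0, 1, 1, 0.5, 5, 5, 0, 0, 1, 3),
  treeID = c(1, 1, 1, 1, 1, 2, 2, 3, 3, 3, NA)
)
hulls <- treeHulls(pts)
names(hulls)  # "1" "3"
```

Each element is a closed coordinate matrix, so it can be passed to sp::Polygon() if sp objects are needed. A convex hull will overestimate crowns with concave outlines, which is one reason the raster-based approach above may be preferable.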
I'm following up on an old question addressed here:
calculate x-value of curve maximum of a smooth line in R and ggplot2
How can I calculate the y-value of the curve maximum?
Cheers
It seems to me that changing "x" to "y", "vline" to "hline", and "xintercept" to "yintercept" is all that is needed:
gb <- ggplot_build(p1)
exact_y_value_of_the_curve_maximum <- gb$data[[1]]$y[which(diff(sign(diff(gb$data[[1]]$y)))==-2)+1]
p1 + geom_hline( yintercept =exact_y_value_of_the_curve_maximum)
exact_y_value_of_the_curve_maximum
I don't think I would call these "exact", since they are only numerical estimates. Another way to get that value is:
max(gb$data[[1]]$y)
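Note that max() returns only the global maximum, whereas the diff(sign(diff(...))) approach returns every local maximum. The $data element of the build object can be examined with str():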
> str(gb$data)
List of 2
$ :'data.frame': 80 obs. of 7 variables:
..$ x : num [1:80] 1 1.19 1.38 1.57 1.76 ...
..$ y : num [1:80] -123.3 -116.6 -109.9 -103.3 -96.6 ...
..$ ymin : num [1:80] -187 -177 -166 -156 -146 ...
..$ ymax : num [1:80] -59.4 -56.5 -53.5 -50.3 -46.9 ...
..$ se : num [1:80] 29.3 27.6 25.9 24.3 22.8 ...
..$ PANEL: int [1:80] 1 1 1 1 1 1 1 1 1 1 ...
..$ group: int [1:80] 1 1 1 1 1 1 1 1 1 1 ...
$ :'data.frame': 16 obs. of 4 variables:
..$ x : num [1:16] 1 2 3 4 5 6 7 8 9 10 ...
..$ y : num [1:16] -79.6 -84.7 -88.4 -74.1 -29.6 ...
..$ PANEL: int [1:16] 1 1 1 1 1 1 1 1 1 1 ...
..$ group: int [1:16] 1 1 1 1 1 1 1 1 1 1 ...
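The `diff(sign(diff(y))) == -2` test used above marks points where the slope switches from rising to falling. A minimal base-R sketch on a made-up vector instead of the ggplot_build() output (`localMaxIdx()` is a hypothetical helper):

```r
# Hypothetical helper: indices of strict local maxima in a numeric vector.
localMaxIdx <- function(y) {
  # sign(diff(y)) is +1 while rising and -1 while falling; a step of -2
  # between consecutive signs means "rising, then falling", i.e. a peak
  which(diff(sign(diff(y))) == -2) + 1
}

y <- c(1, 3, 2, 5, 4)
peaks <- localMaxIdx(y)  # 2 4
y[peaks]                 # 3 5: every local maximum
max(y)                   # 5: the global maximum only
```

A flat-topped peak (two equal neighbouring values, e.g. `c(1, 3, 3, 1)`) is not detected, because the sign then changes in two steps of -1 rather than one step of -2.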
One of the best ways to make a question reproducible is to use one of the built-in data sets. Using data(), however, is frustrating because it gives no information about the structure of each data set.
How can I quickly view the structure of available data sets?
The following function may help:
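dataStr <- function(fun = function(x) TRUE) {
  # Look up every data set listed by data(); items that cannot be found
  # (e.g. names such as "beaver1 (beavers)") come back as NULL
  sets <- mget(data()$results[, "Item"], inherits = TRUE, ifnotfound = list(NULL))
  # Drop the NULLs, keep the sets for which fun() is TRUE, print their structure
  str(Filter(fun, Filter(Negate(is.null), sets)))
}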
It accepts a filtering function, applies it to all the data sets, and prints out the structure of the matching data sets. For example, if we're looking for matrices:
> dataStr(is.matrix)
List of 8
$ WorldPhones : num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
.. ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...
$ occupationalStatus : 'table' int [1:8, 1:8] 50 16 12 11 2 12 0 0 19 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ origin : chr [1:8] "1" "2" "3" "4" ...
.. ..$ destination: chr [1:8] "1" "2" "3" "4" ...
$ volcano : num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
--- 5 entries omitted ---
Or for data frames (also omitting entries):
> dataStr(is.data.frame)
List of 42
$ BOD :'data.frame': 6 obs. of 2 variables:
..$ Time : num [1:6] 1 2 3 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..- attr(*, "reference")= chr "A1.4, p. 270"
$ CO2 :Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 84 obs. of 5 variables:
..$ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
..$ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
..$ conc : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
..$ uptake : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
--- 40 entries omitted ---
Or even for simple vectors:
> dataStr(function(x) is.atomic(x) && is.vector(x) && !is.ts(x))
List of 4
$ euro : Named num [1:11] 13.76 40.34 1.96 166.39 5.95 ...
..- attr(*, "names")= chr [1:11] "ATS" "BEF" "DEM" "ESP" ...
$ islands: Named num [1:48] 11506 5500 16988 2968 16 ...
..- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
$ precip : Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
..- attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
$ rivers : num [1:141] 735 320 325 392 524 ...
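If a compact overview is enough, the same lookup can return a table instead of printing str(). A sketch, assuming the data sets of interest live in an attached package such as datasets; `dataSummary()` is a hypothetical helper, and items listed as "beaver1 (beavers)" are reduced to the object name.

```r
# Hypothetical helper: one row per data set with its class and dimensions.
dataSummary <- function(pkg = "datasets") {
  # data() lists items such as "beaver1 (beavers)"; keep only the object name
  items <- sub(" .*$", "", data(package = pkg)$results[, "Item"])
  # Look the objects up in the attached package; missing ones become NULL
  env <- as.environment(paste0("package:", pkg))
  objs <- Filter(Negate(is.null),
                 mget(items, envir = env, ifnotfound = list(NULL)))
  dims <- vapply(objs, function(x) {
    d <- dim(x)
    if (is.null(d)) d <- length(x)  # vectors and univariate ts have no dim()
    paste(d, collapse = " x ")
  }, character(1))
  data.frame(name = names(objs),
             class = vapply(objs, function(x) class(x)[1], character(1)),
             dims = dims, row.names = NULL, stringsAsFactors = FALSE)
}

s <- dataSummary()
head(s[s$class == "data.frame", ])
```

The table can be filtered like any data frame, which covers the same use case as passing a predicate to dataStr().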