I want to perform kernel density estimation on a 5-dimensional data set (x, y, z, time, size) using the "kde" function in the "ks" package for R. Its manual says it can do kernel density estimation for 1- to 6-dimensional data (page 24 of the manual: http://cran.r-project.org/web/packages/ks/ks.pdf).
My problem is that, for more than 3 dimensions, it says I need to specify eval.points. I don't know how to specify the evaluation points because there is no example for more than 3 dimensions. For example, if I want to generate a regular grid of points spanning the space of the problem and use them as the evaluation points, what should I do?
Here is my data:
422.697323 164.19886 2.457419 8.083796636 0.83367586
423.008236 163.32434 0.5551326 37.58477455 0.893893903
204.733908 218.36365 1.9397874 37.88324312 0.912809449
203.963056 218.4808 0.3723791 43.21775903 0.926406005
100.727581 46.60876 1.4022341 49.41510519 0.782807523
453.335182 244.25521 1.6292517 51.73779175 0.903910803
134.909462 210.96333 2.2389119 53.13433521 0.896529401
135.300562 212.02055 0.6739541 67.55073745 0.748783521
258.237117 134.29735 2.1205291 76.34032587 0.735699304
341.305271 149.26953 3.718958 94.33975483 0.849509216
307.138925 59.60571 0.6311074 106.9636715 0.987923188
307.76875 58.91453 2.6496741 113.8515307 0.802115718
415.025535 217.17398 1.7155688 115.7464603 0.875580325
414.977687 216.73327 1.7107369 115.9776948 0.767143582
311.006135 173.24378 2.7819572 120.8079566 0.925380118
310.116929 174.28122 4.3318722 129.2648401 0.776528535
347.260911 37.34946 3.5155427 136.7851291 0.851787115
351.317624 33.65703 0.5806926 138.7349284 0.909723017
4.471892 59.42068 1.4062959 139.0543783 0.967270976
5.480223 59.72857 2.7326106 139.2114277 0.987787428
199.513023 21.53302 2.5163259 143.5895625 0.864164659
198.718031 23.50163 0.4801849 147.2280466 0.741587333
26.650517 35.2019 0.8246514 150.4876506 0.744788202
25.089379 90.47825 0.8700944 152.1944046 0.777252476
26.307439 88.41552 2.4422487 155.9090026 0.952215177
234.282901 236.11422 1.8115261 155.9658144 0.776284654
235.052948 236.77437 1.9644963 156.6900297 0.944285448
23.048202 98.6261 3.4573048 159.7700912 0.773057491
21.516695 98.05431 2.5029284 160.8202997 0.978779087
213.936324 151.87013 3.1042192 161.0612489 0.80499513
277.887935 197.25753 1.3659279 163.673142 0.758978575
277.239746 197.54001 2.2109361 166.2629868 0.775325157
And this is the code that I am using:
library(ks)
library(rgl)
kern <- read.table(file.choose(), sep=",")
hat <- kde(kern)
It works for up to 3 dimensions, but for 4 and 5 dimensions it says: "need to specify eval.points for more than 3 dimensions".
Also, how can I plot these kernel density estimates? For example, I would like to use z as the conditioning variable, plot x, y and time in a 3D scatterplot, and use different colors for different ranges of size.
Like you, I wasn't initially able to find a worked example, and the documentation doesn't really describe what sort of object is expected. For your 5-d data set I tried setting up a 5-d grid of points constructed from the 10th, 25th, 50th, 75th and 90th percentiles of each dimension. My data set was named "dat":
evpts <- do.call(expand.grid, lapply(dat, quantile, probs = c(0.1, 0.25, 0.5, 0.75, 0.9)))
I then passed that to kde(), and it seemed to satisfy the function. Whether this is "correct" still needs checking; no guarantees.
> hat <- kde(dat, eval.points= evpts)
> str(hat)
List of 8
$ x : num [1:31, 1:5] 423 423 205 204 101 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "V1" "V2" "V3" "V4" ...
$ eval.points:'data.frame': 3125 obs. of 5 variables:
..$ V1: Named num [1:3125] 23 118 234 326 415 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "25%" "50%" "75%" ...
..$ V2: Named num [1:3125] 35.2 35.2 35.2 35.2 35.2 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V3: Named num [1:3125] 0.581 0.581 0.581 0.581 0.581 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V4: Named num [1:3125] 43.2 43.2 43.2 43.2 43.2 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V5: Named num [1:3125] 0.749 0.749 0.749 0.749 0.749 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..- attr(*, "out.attrs")=List of 2
.. ..$ dim : Named int [1:5] 5 5 5 5 5
.. .. ..- attr(*, "names")= chr [1:5] "V1" "V2" "V3" "V4" ...
.. ..$ dimnames:List of 5
.. .. ..$ V1: chr [1:5] "V1= 23.0482" "V1=117.8185" "V1=234.2829" "V1=326.1557" ...
.. .. ..$ V2: chr [1:5] "V2= 35.20190" "V2= 59.51319" "V2=149.26953" "V2=211.49194" ...
.. .. ..$ V3: chr [1:5] "V3=0.5806926" "V3=1.1180112" "V3=1.9397874" "V3=2.5830000" ...
.. .. ..$ V4: chr [1:5] "V4= 43.21776" "V4= 71.94553" "V4=129.26484" "V4=151.34103" ...
.. .. ..$ V5: chr [1:5] "V5=0.7487835" "V5=0.7764066" "V5=0.8517871" "V5=0.9190948" ...
$ estimate : Named num [1:3125] 3.23e-08 5.70e-08 1.01e-08 4.07e-10 6.20e-12 ...
..- attr(*, "names")= chr [1:3125] "1" "2" "3" "4" ...
$ H : num [1:5, 1:5] 5073.879 1010.815 1.211 -651.089 -0.223 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:5] "V1" "V2" "V3" "V4" ...
$ w : num [1:31] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "kde"
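I did find an earlier version of the package documentation that offered this as a worked example of a 4-d estimate, so I think my effort is essentially the same, modulo the different number of dimensions:
library(ks)
data(iris)
ir <- iris[,1:4][iris[,5]=="setosa",]   # 4-d data: the setosa measurements
H.scv <- Hscv(ir)                        # smoothed cross-validation bandwidth matrix
fhat <- kde(ir, H.scv, eval.points=ir)   # evaluate the density at the data points themselves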
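On the "regular grid" part of the question: judging from the working examples above, eval.points just needs one column per dimension, so a regular grid can be built with seq() and expand.grid(). The sketch below is base R only and uses the first six rows of the posted data; the column names x, y, z, time, size and the choice of 8 points per axis are my assumptions. It also rebuilds the quantile grid from the answer, whose 5^5 = 3125 rows match the eval.points in the str() output.

```r
# First six rows of the posted data (column names assumed from the question)
dat <- read.table(text = "
422.697323 164.19886 2.457419 8.083796636 0.83367586
423.008236 163.32434 0.5551326 37.58477455 0.893893903
204.733908 218.36365 1.9397874 37.88324312 0.912809449
203.963056 218.4808 0.3723791 43.21775903 0.926406005
100.727581 46.60876 1.4022341 49.41510519 0.782807523
453.335182 244.25521 1.6292517 51.73779175 0.903910803
", col.names = c("x", "y", "z", "time", "size"))

# Regular grid: n equally spaced values from the min to the max of each column.
# The number of grid points grows as n^5, so keep n small in 5 dimensions.
n <- 8
axes <- lapply(dat, function(col) seq(min(col), max(col), length.out = n))
evpts_grid <- do.call(expand.grid, axes)

# Quantile grid as in the answer: 5 quantiles per column, 5^5 rows in total
evpts_q <- do.call(expand.grid,
                   lapply(dat, quantile, probs = c(0.1, 0.25, 0.5, 0.75, 0.9)))

dim(evpts_grid)  # 32768 rows, 5 columns
dim(evpts_q)     # 3125 rows, 5 columns

# With ks installed and the full data set, either grid could then be passed in:
# hat <- ks::kde(dat, eval.points = evpts_grid)
```

A regular grid covers the bounding box evenly, including empty corners where the estimate will be close to zero; the quantile grid concentrates points where the data actually are.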
I'm relatively new to R and I'm running network and behavior coevolution models using the R package RSiena.
My data set consists of around 100 networks, and for each of these networks I run one RSiena model.
ans.1 <- siena07(myalgorithm, data=mydata.1, effects=myeff.1, batch=TRUE)
...
ans.100 <- siena07(myalgorithm, data=mydata.100, effects=myeff.100, batch=TRUE)
Now I want to test the goodness of fit of each of these network models. I already know how to check the goodness of fit for a single model:
gof <- sienaGOF(ans.1, verbose=TRUE, varName="Friend", IndegreeDistribution)
plot(gof)
But I don't know how to combine the GOF results of all 100 models to get an overall impression. How can I get a table with the model number and the p-value? Or can I plot the results for all models in one plot? Or is there a better way?
So far I have tried putting the GOF results in a list:
goftest <-list()
goftest[[1]] <- sienaGOF(ans.1, verbose=TRUE, varName="Friend", IndegreeDistribution)
...
goftest[[100]] <- sienaGOF(ans.100, verbose=TRUE, varName="Friend", IndegreeDistribution)
plot(goftest)
goftest[[1]] #Output:
"Siena Goodness of Fit ( IndegreeDistribution ), all periods
=====
Monte Carlo Mahalanobis distance test p-value: 0.941
-----
One tailed test used (i.e. estimated probability of greater distance than observation).
-----
Calculated joint MHD = ( 14.4 ) for current model."
str(goftest[[1]])#Output:
"List of 1
$ Joint:List of 8
..$ p : num 0.941
..$ SimulatedTestStat: Named num [1:2000] 9.97 16.02 6.83 10.14 8.65 ...
.. ..- attr(*, "names")= chr [1:2000] "1" "2" "3" "4" ...
..$ ObservedTestStat : num 2.09
..$ TwoTailed : logi FALSE
..$ Simulations : int [1:2000, 1:9] 21 22 22 21 19 26 30 23 25 26 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2000] "1" "2" "3" "4" ...
.. .. ..$ : NULL
..$ Observations : int [1, 1:9] 26 48 63 73 76 78 78 78 78
..$ InvCovSimStats : num [1:9, 1:9] 13.2509 4.9587 1.2948 0.231 0.0895 ...
..$ Rank : int 9
.. ..- attr(*, "method")= chr "tolNorm2"
.. ..- attr(*, "useGrad")= logi FALSE
.. ..- attr(*, "tol")= num 2e-15
..- attr(*, "class")= chr "sienaGofTest"
..- attr(*, "sienaFitName")= chr "sienaFitObject"
..- attr(*, "auxiliaryStatisticName")= chr "IndegreeDistribution"
..- attr(*, "key")= chr [1:9] "0" "1" "2" "3" ...
- attr(*, "class")= chr "sienaGOF"
- attr(*, "scoreTest")= logi FALSE
- attr(*, "originalMahalanobisDistances")= num [1:3] 2.15 3.51 8.74
- attr(*, "oneStepMahalanobisDistances")=List of 3
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
- attr(*, "joinedOneStepMahalanobisDistances")= Named num(0)
..- attr(*, "names")= chr(0)
- attr(*, "oneStepMahalanobisDistances_old")=List of 3
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
- attr(*, "joinedOneStepMahalanobisDistances_old")= Named num(0)
..- attr(*, "names")= chr(0)
- attr(*, "oneStepSpecs")= num[1:20, 0 ]
- attr(*, "auxiliaryStatisticName")= chr "IndegreeDistribution"
- attr(*, "simTime")= 'proc_time' Named num [1:5] 39.61 0.28 40.21 NA NA
..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
- attr(*, "twoTailed")= logi FALSE
- attr(*, "joined")= logi TRUE"
But I don't know how to extract the p-values and build a table that contains just the network number and the associated p-value.
Furthermore, plot(goftest) only produces error messages and no output so far.
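Judging from the str() output above, each sienaGOF result stores the joint p-value at $Joint$p, so sapply() over the list gives the table directly. The sketch below uses a hypothetical stand-in for goftest so that it runs without RSiena: only $Joint$p is mimicked, the first value (0.941) is taken from the output above and the other two are made up. With the real list, only the lines building pvals are needed. As for plotting, plot(goftest) hands plot() a plain list rather than a sienaGOF object, which is probably why it fails; plot(goftest[[i]]) for each model should behave like the single-model case.

```r
# Hypothetical stand-in for goftest: only the $Joint$p component is mimicked.
# 0.941 comes from the output above; 0.312 and 0.027 are made up.
goftest <- list(
  list(Joint = list(p = 0.941)),
  list(Joint = list(p = 0.312)),
  list(Joint = list(p = 0.027))
)

# One row per model: model number and joint Monte Carlo Mahalanobis p-value
pvals <- data.frame(
  model = seq_along(goftest),
  p     = sapply(goftest, function(g) g$Joint$p)
)
print(pvals)

# Overall impression: how many models have p < 0.05 (a low p suggests poor fit)?
sum(pvals$p < 0.05)

# Overview plot of all p-values, drawn only in an interactive session
if (interactive()) {
  plot(pvals$model, pvals$p, xlab = "Model", ylab = "GOF p-value", ylim = c(0, 1))
  abline(h = 0.05, lty = 2)
}
```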
It seems that, in R, I can refer to a list component using only part of its name, but I am confused about why this works.
Take the following code as an example:
library(car)
scatterplot(housing ~ total)
house.lm <- lm(housing ~ total)
summary(house.lm)
str(summary(house.lm))
summary(house.lm)$coefficients[2,2]
summary(house.lm)$coe[2,2]
When I print the structure of summary(house.lm), I get the following output:
> str(summary(house.lm))
List of 11
$ call : language lm(formula = housing ~ total)
$ terms :Classes 'terms', 'formula' language housing ~ total
.. ..- attr(*, "variables")= language list(housing, total)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "housing" "total"
.. .. .. ..$ : chr "total"
.. ..- attr(*, "term.labels")= chr "total"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(housing, total)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "housing" "total"
$ residuals : Named num [1:162] -8.96 -11.43 3.08 8.45 2.2 ...
..- attr(*, "names")= chr [1:162] "1" "2" "3" "4" ...
$ coefficients : num [1:2, 1:4] 28.4523 0.0488 10.2117 0.0103 2.7862 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "total"
.. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
$ aliased : Named logi [1:2] FALSE FALSE
..- attr(*, "names")= chr [1:2] "(Intercept)" "total"
$ sigma : num 53.8
$ df : int [1:3] 2 160 2
$ r.squared : num 0.123
$ adj.r.squared: num 0.118
$ fstatistic : Named num [1:3] 22.5 1 160
..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
$ cov.unscaled : num [1:2, 1:2] 3.61e-02 -3.31e-05 -3.31e-05 3.67e-08
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "total"
.. ..$ : chr [1:2] "(Intercept)" "total"
- attr(*, "class")= chr "summary.lm"
However, it seems that I can refer to the coefficients component with any of the following commands:
summary(house.lm)$coe[2,2]
summary(house.lm)$coef[2,2]
summary(house.lm)$coeff[2,2]
summary(house.lm)$coeffi[2,2]
summary(house.lm)$coeffic[2,2]
summary(house.lm)$coeffici[2,2]
summary(house.lm)$coefficie[2,2]
summary(house.lm)$coefficien[2,2]
summary(house.lm)$coefficient[2,2]
summary(house.lm)$coefficients[2,2]
They all give the same result: 0.01029709
So when can I refer to a component by only part of its name in R?
You can do it when the partial name is unambiguous, i.e. it is the beginning of exactly one name. For example:
df <- data.frame(abcd = c(1,2,3), xyz = c(4,5,6), abc = c(5,6,7))
> df$xy
[1] 4 5 6
> df$ab
NULL
> df$x
[1] 4 5 6
df$xy and even df$x give the right data, but df$ab returns NULL because it could refer to both df$abc and df$abcd. It's similar to typing df$xy in RStudio and pressing Ctrl + Space: autocompletion finds the right column name, so the partial name is enough to identify it.
From Advanced R (http://adv-r.had.co.nz/Functions.html#lexical-scoping), on calling functions:
"When calling a function you can specify arguments by position, by complete name, or by partial name. Arguments are matched first by exact name (perfect matching), then by prefix matching, and finally by position."
When you are coding quickly to analyse some data, using partial names is not a problem, but I tend to agree that it's not good practice when writing code for reuse. In a package you shouldn't rely on it: R CMD check reports partial argument matching in package code.
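A small base-R sketch of these rules on an ordinary list: `$` accepts a unique prefix, returns NULL for an ambiguous one and prefers an exact match; `[[` is exact unless you pass exact = FALSE; and function arguments are matched by prefix too. Setting the option warnPartialMatchDollar makes R warn whenever `$` relies on a partial match, which helps find such code. The function f is a hypothetical example.

```r
l <- list(abcd = 1, xyz = 2, abc = 3)

l$xy                      # unique prefix of "xyz" -> 2
l$ab                      # prefix of both "abc" and "abcd" -> NULL
l$abc                     # exact match wins over the longer "abcd" -> 3
l[["xy"]]                 # [[ matches exactly by default -> NULL
l[["xy", exact = FALSE]]  # opt-in partial matching -> 2

# Function arguments: exact name first, then unique prefix, then position
f <- function(value, verbose = FALSE) value   # hypothetical function
f(val = 10)               # "val" partially matches "value" -> 10

# Ask R to warn whenever $ relies on partial matching
old <- options(warnPartialMatchDollar = TRUE)
l$xy                      # still 2, but now R also warns about the partial match
options(old)
```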
I have been following an online example for R Kohonen self-organising maps (SOM) which suggested that the data should be centred and scaled before computing the SOM.
However, I've noticed that the object created seems to have attributes for centre and scale. In that case, am I really applying a redundant step by centring and scaling first? Example script below:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
# Prepare SOM
set.seed(590507)
som1 <- som(dt,
somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
str(som1)
The output from the last line of the script is:
List of 13
$ data :List of 1
..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ unit.classif : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
$ distances : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
$ grid :List of 6
..$ pts : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..$ xdim : num 6
..$ ydim : num 6
..$ topo : chr "hexagonal"
..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
..$ toroidal : logi FALSE
..- attr(*, "class")= chr "somgrid"
$ codes :List of 1
..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ changes : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
$ alpha : num [1:2] 0.05 0.01
$ radius : Named num [1:2] 3.61 0
..- attr(*, "names")= chr [1:2] "66.66667%" ""
$ user.weights : num 1
$ distance.weights: num 1
$ whatmap : int 1
$ maxNA.fraction : int 0
$ dist.fcts : chr "sumofsquares"
- attr(*, "class")= chr "kohonen"
Notice that the output contains references to centre and scale (the "scaled:center" and "scaled:scale" attributes of $data). I would appreciate an explanation of what is happening here.
Your scaling step is not redundant: som() itself does no scaling in its source code, and the "scaled:center" and "scaled:scale" attributes you see are simply attributes of the training data set produced by scale(), which is stored in the result because keep.data=TRUE.
To check this, run the following code and compare the results:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))
# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#without scaling (same seed, so the only difference is the input data)
set.seed(590507)
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#compare results of som function
str(som1)
str(som2)
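The attributes can also be checked without kohonen at all: scale() itself attaches "scaled:center" and "scaled:scale" (the column means and, when centring, the column standard deviations), and the unscaled matrix has neither. These attributes are also handy for mapping scaled values back to the original units; the sketch below does that with sweep() on the scaled data itself. Applying the same two sweep() steps to the codebook matrix (som1$codes[[1]] in the str() output above) is my suggestion, not something the original answer covers.

```r
data(iris)
raw <- as.matrix(iris[, 1:4])
dt  <- scale(raw, center = TRUE)   # scale = TRUE is the default

# scale() records what it did as attributes on its result
ctr <- attr(dt, "scaled:center")   # column means
scl <- attr(dt, "scaled:scale")    # column standard deviations
print(ctr)
print(scl)

# The unscaled matrix carries no such attribute
is.null(attr(raw, "scaled:center"))

# Undo the scaling column-wise: multiply by the sd, then add back the mean.
# The same two steps could be applied to SOM codebook vectors, e.g. som1$codes[[1]].
back <- sweep(sweep(dt, 2, scl, "*"), 2, ctr, "+")
```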
I have an indexed list containing several objects, each of which contains 3 matrices ($tab, $nobs and $other). There are a hundred such objects in the list. The objective is to access only the $tab matrix of each object and transpose it.
genfreqT <- lapply(genfreq[[1:100]]$tab, function(x) t(x))
This does not seem to work.
Here is how the genfreq object is structured. It was created with the R package adegenet.
> str(genfreq[[1]])
List of 3
$ tab : num [1:30, 1:1974] 0.6 0.5 0.325 0.675 0.6 0.5 0.5 0.375 0.55 0.475 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : Named chr [1:30] "1" "2" "3" "4" ...
.. .. ..- attr(*, "names")= chr [1:30] "01" "02" "03" "04" ...
.. ..$ : chr [1:1974] "L0001.1" "L0001.2" "L0002.1" "L0002.2" ...
$ nobs: num [1:30, 1:1000] 40 40 40 40 40 40 40 40 40 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : Named chr [1:30] "1" "2" "3" "4" ...
.. .. ..- attr(*, "names")= chr [1:30] "01" "02" "03" "04" ...
.. ..$ : Named chr [1:1000] "L0001" "L0002" "L0003" "L0004" ...
.. .. ..- attr(*, "names")= chr [1:1000] "L0001" "L0002" "L0003" "L0004" ...
$ call: language makefreq(x = x, truenames = TRUE)
One option is to extract the tab components first and then transpose each one:
genfreqT <- lapply(lapply(genfreq, "[[", "tab"), t)
The package developer of adegenet provided this solution:
> genfreqT <- lapply(genfreq, function(e) t(e$tab))
> summary(genfreqT)
Length Class Mode
data1.str 59220 -none- numeric
data2.str 59220 -none- numeric
data3.str 59220 -none- numeric
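Why the original attempt fails: when [[ is given a vector such as 1:100, it does recursive indexing (genfreq[[1:100]] means genfreq[[1]][[2]][[3]]...), not a selection of 100 elements, so lapply() never sees the list of tab matrices. Looping with lapply() over genfreq itself avoids that. The sketch below runs on a small hypothetical stand-in for genfreq (three elements with random $tab and constant $nobs matrices, names borrowed from the summary output) rather than real adegenet output.

```r
set.seed(1)
# Hypothetical stand-in for genfreq: each element has $tab and $nobs matrices
genfreq <- lapply(1:3, function(i) list(
  tab  = matrix(runif(30 * 4), nrow = 30),
  nobs = matrix(40, nrow = 30, ncol = 2)
))
names(genfreq) <- paste0("data", 1:3, ".str")

# [[ with a vector index recurses into nested lists
nested <- list(list(a = "first", b = "second"))
nested[[c(1, 2)]]        # same as nested[[1]][[2]] -> "second"

# Transpose the tab matrix of every element (the developer's solution)
genfreqT <- lapply(genfreq, function(e) t(e$tab))
sapply(genfreqT, dim)    # every tab is now 4 x 30
```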
I have the following object M, from which I need to extract the fstatistic. It is the summary produced by summaryC() for a model fitted with aovp(), both functions from the lmPerm package. I have tried the usual ways of extracting values from ordinary linear models, as well as attr(), extract and getElement(), but without success.
Could anybody give me a hint?
> str(M)
List of 2
$ Error: vegetation: NULL
$ Error: Within :List of 11
..$ NA : NULL
..$ terms :Classes 'terms', 'formula' length 3 Temp ~ depth
.. .. ..- attr(*, "variables")= language list(Temp, depth)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "Temp" "depth"
.. .. .. .. ..$ : chr "depth"
.. .. ..- attr(*, "term.labels")= chr "depth"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..$ residuals : Named num [1:498] -46.9 -43.9 -46.9 -38.9 -41.9 ...
.. ..- attr(*, "names")= chr [1:498] "3" "4" "5" "6" ...
..$ coefficients : num [1:4, 1:4] -2.00 -1.00 -1.35e-14 1.00 2.59 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
.. .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
..$ aliased : Named logi [1:4] FALSE FALSE FALSE FALSE
.. ..- attr(*, "names")= chr [1:4] "depth1" "depth2" "depth3" "depth4"
..$ sigma : num 29
..$ df : int [1:3] 4 494 4
..$ r.squared : num 0.00239
..$ adj.r.squared: num -0.00367
..$ fstatistic : Named num [1:3] 0.395 3 494
.. ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
..$ cov.unscaled : num [1:4, 1:4] 0.008 -0.002 -0.002 -0.002 -0.002 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
..- attr(*, "class")= chr "summary.lmp"
- attr(*, "class")= chr "listof"
Here is a reproducible example to play with:
library(lmPerm)
Temp <- 1:100
depth <- rep(c("1","2","3","4","5"), 100)
vegetation <- rep(c("1","2"), 50)
df <- data.frame(Temp, depth, vegetation)
M <- summaryC(aovp(Temp ~ depth + Error(vegetation), df, perm = ""))
As the str() output from your example shows, M is a list with two elements: the first is NULL and the second is a list that contains what you want. Hence list extraction via [[ does the trick:
> M[[2]][["fstatistic"]]
value numdf dendf
0.3946 3.0000 494.0000
If this is not what you want, please comment.
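The components can also be reached by name, which keeps working if the position of the error stratum changes. Because names such as "Error: Within" contain a space and a colon, they need quoting inside [[ ]] or backticks after $. The sketch below uses a hypothetical minimal stand-in for M that mimics only the parts of the str() output above (values from the printed fstatistic), so it runs without lmPerm.

```r
# Hypothetical stand-in mimicking the structure of M from str(M)
M <- structure(
  list(NULL,
       list(fstatistic = c(value = 0.3946, numdf = 3, dendf = 494))),
  names = c("Error: vegetation", "Error: Within"),
  class = "listof"
)

M[["Error: Within"]][["fstatistic"]]     # the whole named vector
M$`Error: Within`$fstatistic[["value"]]  # just the F value
M[[2]][["fstatistic"]]["numdf"]          # by position, as in the answer

# Strata without a fitted model are NULL; keep the non-NULL ones and pull fstatistic
fstats <- lapply(Filter(Negate(is.null), M), `[[`, "fstatistic")
```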