I would like to know if there is any way (I'm sure there is) to get the elements of the
additive relationship matrix A in R.
I already have the pedigree, and I was successful in getting the A matrix in two different ways:
by using the function makeA from the pedigree package:
library(pedigree)
makeA(pedigree_renum, which = pedigree_renum$ID=="1-2372") #for all the animals
#> [1] TRUE
but I cannot get the elements from the matrix
by using the function getA from the pedigreemm package. In this case I get the 2372 x 2372 A matrix:
class(pedigree_general)
#> [1] "pedigree"
#> attr(,"package")
#> [1] "pedigreemm"
matrizA<-getA(pedigree_general)
class(matrizA)
#> [1] "dsCMatrix"
#> attr(,"package")
#> [1] "Matrix"
But I can't figure out how to save certain elements of the matrix, such as the elements above the diagonal.
Hope some of you can help me figure this out!
Different approaches to obtain the same result are welcome :)
Greetings from Buenos Aires.
From the pedigree package documentation on makeA:
Makes the A matrix for a part of a pedigree and stores it in a file called A.txt
What you have missed, if I'm not mistaken, is that the matrix you are searching for should be loaded from the file A.txt, which is the output file of the command. Example:
library(pedigree)
id <- 1:6
dam <- c(0,0,1,1,4,4)
sire <- c(0,0,2,2,3,5)
ped <- data.frame(id,dam,sire)
makeA(ped,which = c(rep(FALSE,4),rep(TRUE,2)))
A <- read.table("A.txt")
After printing A, here is the matrix:
Let me know if I'm missing something.
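For the second route in the question (the dsCMatrix returned by getA), one possible way to pull out the elements above the diagonal is to convert to a dense matrix and index with upper.tri(). This is a minimal sketch, assuming the Matrix package is installed; A_small is a made-up 3 x 3 stand-in for matrizA, not real pedigree output:

```r
library(Matrix)

# Hypothetical 3 x 3 symmetric matrix standing in for matrizA
A_small <- forceSymmetric(Matrix(c(1.00, 0.50, 0.25,
                                   0.50, 1.00, 0.50,
                                   0.25, 0.50, 1.00),
                                 nrow = 3, sparse = TRUE))

# A dense copy is cheap at this size (and still manageable at 2372 x 2372)
A_dense <- as.matrix(A_small)

# Elements strictly above the diagonal, in column-major order
upper_vals <- A_dense[upper.tri(A_dense)]

# Long format: one row per (row, col) pair above the diagonal
idx <- which(upper.tri(A_dense), arr.ind = TRUE)
upper_df <- data.frame(row = idx[, "row"], col = idx[, "col"],
                       value = A_dense[idx])
```

Use upper.tri(A_dense, diag = TRUE) to keep the diagonal as well. The long-format data frame can then be saved with write.csv() or similar.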
I need to create a neighbours list from a spatial polygon. At the moment I am using the function poly2nb, but unfortunately it is not very accurate, and some polygons with no common points are considered neighbours. I have tried changing the snap argument, but with no luck.
I have, however, tried the function gTouches from the rgeos package, and it works much better. The only problem is that it creates a list object that cannot be used in spdep. Is there any way to convert it into an nb object?
Thank you in advance! :)
Looking at the source code for the tri2nb function https://rdrr.io/cran/spdep/src/R/tri2nb.R, which I know is a different function than the one you mentioned, you can change the list to nb type:
class(yourlist) <- "nb"
An example:
mylist <- list(A = c(1,2,5,6), B = c(2,4,6,5))
class(mylist) #check the class of mylist
Initial Result:
> class(mylist)
[1] "list"
Assign the class of mylist as nb
class(mylist) <- "nb"
class(mylist) #check the class of mylist
Final Result:
> class(mylist)
[1] "nb"
From this, you can continue to use functions in spdep to select optimal spatial weighting matrices.
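Note that changing the class alone does not check the contents. As far as I know, spdep expects each element of an nb object to be an integer vector of neighbour indices, with 0L marking a region that has no neighbours, plus a region.id attribute. Here is a base-R sketch of building that shape from a logical adjacency matrix such as the one gTouches(x, byid = TRUE) returns; the touches matrix is made up for illustration:

```r
# Hypothetical 3 x 3 adjacency matrix: regions 1 and 2 touch, region 3 is isolated
touches <- matrix(c(FALSE, TRUE,  FALSE,
                    TRUE,  FALSE, FALSE,
                    FALSE, FALSE, FALSE),
                  nrow = 3, byrow = TRUE)

# One integer vector of neighbour indices per region; 0L means "no neighbours"
nb_list <- lapply(seq_len(nrow(touches)), function(i) {
  idx <- which(touches[i, ])
  if (length(idx) == 0L) 0L else idx
})

# Mark the list as an nb object and label the regions
class(nb_list) <- "nb"
attr(nb_list, "region.id") <- as.character(seq_len(nrow(touches)))
```

With spdep loaded, summary(nb_list) is a quick way to check that the result looks sensible before building weights from it.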
Problem
I need to apply the terms of one text corpus to another, so that I get a document-term matrix with the same columns for both.
What I am attempting is to classify different text corpora into 2 groups using a logistic regression, but I need both corpora to have the same variables from the function DocumentTermMatrix().
Current attempt and explanation with code
I can't wrap my head around how to approach the issue. For example, this gives me a first matrix of terms with their frequencies:
data("crude")
crude_1 <- crude[1:10]
dtm_1 <- DocumentTermMatrix(crude_1)
dtm_1$dimnames$Terms
# [1] "..." "\"(it)" "\"demand" "\"for"
# [5] "\"growth" "\"if" "\"is" "\"may" ...
How can I use the same terms for the second part of the crude dataset? So far the terms are different:
crude_2 <- crude[11:20]
dtm_2 <- DocumentTermMatrix(crude_2)
dtm_2$dimnames$Terms
# [1] "..." "\"expansion" "\"is" "\"may"
# [5] "\"none" "\"this" "\"we" "\"will" ...
I could try to count the same terms in crude_2 by hand. However, that would be computationally expensive, and you might know a more practical solution to this problem.
Question
I would like to coerce dtm_2 to have the same terms as dtm_1, but with the frequencies from the crude_2 dataset. Is there a practical way to do this in R?
Or, as a simpler example: say I want to find out how many times zebra or girafe appears in these texts, and I want to do it explicitly. How can I proceed?
Libraries used: library(tm)
OK, so I found a solution to my question using the function tm_term_score() from the package tm. It works well, I'd say, though any different implementation would be welcome.
Normal solution
This is the solution: first we capture the terms in a separate variable, and then we apply the term list to the other document-term matrix to build a matrix of word frequencies:
library(tm)
data("crude")
# First dataset and term list
crude_1 <- crude[1:10]
dtm_1 <- DocumentTermMatrix(crude_1)
term_list <- dtm_1$dimnames$Terms
# Second dataset
crude_2 <- crude[11:20]
dtm_2 <- DocumentTermMatrix(crude_2)
# Creating a dummy column to remove at the end
X <- data.frame(dummy_col = 1:dtm_2$nrow)
for (term in term_list) {
  temp_col <- tm_term_score(dtm_2, term)
  # Attaching the column to the DF
  X$temp_col <- temp_col
  names(X)[length(names(X))] <- term
}
# Removing the dummy column
X$dummy_col <- NULL
# X now contains the frequencies, in the second dataset, of the terms from the first
Specific terms
And if you are looking for specific terms (like zebra or girafe, of which there are none here), you can do as follows:
term_list <- c("zebra", "girafe")
X <- data.frame(dummy_col = 1:dtm_2$nrow)
for (term in term_list) {
  temp_col <- tm_term_score(dtm_2, term)
  # Attaching the column to the DF
  X$temp_col <- temp_col
  names(X)[length(names(X))] <- term
}
X$dummy_col <- NULL
X
# zebra girafe
# 1 0 0
# 2 0 0
# 3 0 0
# ...
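A different implementation that avoids the loop: as far as I know, DocumentTermMatrix() accepts a dictionary entry in its control list, which restricts the tabulation to a given vocabulary. The sketch below assumes the tm package and its bundled crude data; dtm_2_aligned and X_aligned are names made up for this example:

```r
library(tm)
data("crude")

crude_1 <- crude[1:10]
crude_2 <- crude[11:20]

dtm_1 <- DocumentTermMatrix(crude_1)

# Tabulate crude_2 against the vocabulary of dtm_1 only
dtm_2_aligned <- DocumentTermMatrix(
  crude_2,
  control = list(dictionary = Terms(dtm_1))
)

# Dense matrix of counts, e.g. to feed into glm()
X_aligned <- as.matrix(dtm_2_aligned)
```

The same call with dictionary = c("zebra", "girafe") should cover the specific-terms case.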
I'm having some trouble extracting information from the ICtab() function of the bbmle package. Essentially what I'm trying to do is run this function on a series of glm models, then add that output to a master data.frame object. However, while I can extract the $dqAIC and $df parameters from the ICtab() output, I cannot figure out a way to extract the row names themselves (i.e. the names of the models that are being input into ICtab). This is an issue because the ICtab() output is ordered in ascending order of $dqAIC - as such, I cannot pre-label a list or data.frame or matrix with the correct order, as the resulting $dqAIC values are not known ahead of time. To compound the problem, the ICtab() object class does not seem to be able to be coerced into a data.frame or any other object where I might be able to extract row.names() or anything similar.
What I'm looking for is a way to extract all the information from the ICtab() function, as a whole or in 3 pieces (row names, dqAIC values, and df values), and then append it to a master table along with some other information.
Below is a sample of the code I'm trying, along with some test data.
library(bbmle)
library(visreg)
library(splines)
library(foreign)
library(survival)
library(lubridate)
dfun<- function(object) {with(object, sum((weights*residuals^2)[weights>0])/df.residual)}
test.data.1 <- seq(1, 1000, by = 10)
num.days <- seq(1, 100, by = 1)
disp.global <- glm(test.data.1 ~ num.days, family=poisson(link="log"), na.action=na.exclude)
model.1 <- glm(test.data.1 ~ ns(num.days, df = 3), family=poisson(link="log"), na.action=na.exclude)
model.2 <- glm(test.data.1 ~ ns(num.days, df = 6), family=poisson(link="log"), na.action=na.exclude)
testIC <- ICtab(model.1, model.2, dispersion=dfun(disp.global),type="qAIC")
Which gives the result:
> testIC
dqAIC df
model.2 0 7
model.1 5 4
I can pull the dqAIC and df values:
> testIC$dqAIC
[1] 0.000000 5.018875
> testIC$df
[1] 7 4
But I cannot figure out a way to get the "model.2" and "model.1" row names; row.names(testIC) returns nothing, and rownames(testIC) simply returns a NULL:
> row.names(testIC)
> rownames(testIC)
NULL
And as far as I can tell, there is no way to change this output using list(), as.data.frame(), data.frame(), or any other object type to get these row names.
> as.data.frame(testIC)
Error in as.data.frame.default(testIC) :
cannot coerce class ""ICtab"" to a data.frame
As a side note, in the documentation for the bbmle package, there appears to be a function called get.mnames() that should do exactly this - list the model names - however, it does not appear to be included in the bbmle package that is installed (my version matches the version of the documentation, 1.0.18):
> ls("package:bbmle")
[1] "AIC" "AICc" "AICctab" "AICtab" "anova" "BICtab" "call.to.char" "coef" "confint" "deviance"
[11] "formula" "ICtab" "logLik" "mle2" "namedrop" "parnames" "parnames<-" "plot" "predict" "profile"
[21] "qAIC" "qAICc" "relist2" "residuals" "sbeta" "sbetabinom" "sbinom" "simulate" "slice" "slice1D"
[31] "slice2D" "sliceOld" "snbinom" "snorm" "spois" "stdEr" "summary" "update" "vcov"
Any help getting these row names out of the ICtab() result would be greatly appreciated. The above code is simply a sample - what I'm actually doing is running multiple models, with a series of datasets, through the ICtab() function, and I want to put all of that information together in one data.frame object as the result.
Thanks in advance,
Nate
I had the same problem as you, and I can see that nobody replied to your post.
I am not proud of my solution. It is not very elegant, but it works:
class(testIC) <- "data.frame"
rownames(testIC)
I hope it helps someone, someday.
trantsyx's solution actually works fine. It can be combined with the convenient table2office command from the {export} package. Works perfectly for me.
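The class trick works because the model names appear to be stored in the object's row.names attribute, so they can also be read without changing the class. Here is a sketch under that assumption; mock_ic is a hand-built stand-in that mimics the printed testIC above, not a real bbmle result:

```r
# Hypothetical stand-in with the same shape as the testIC object in the question
mock_ic <- structure(
  list(dqAIC = c(0, 5.018875), df = c(7, 4)),
  row.names = c("model.2", "model.1"),
  class = "ICtab"
)

# Read the model names straight from the attribute
model_names <- attr(mock_ic, "row.names")

# Assemble an ordinary data.frame that can be appended to a master table
ic_df <- data.frame(model = model_names,
                    dqAIC = mock_ic$dqAIC,
                    df    = mock_ic$df,
                    stringsAsFactors = FALSE)
```

With a real ICtab result, replacing mock_ic by testIC should give one row per model, already in the sorted order, ready for rbind() onto the master data.frame.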
In the pvclust package in R, there is the pvclust() function. In the example provided in the function help file, there's the function:
boston.pp <- pvpick(boston.pv)
This is supposed to print out the clusters with high p-values. The output of this function is:
$clusters
$clusters[[1]]
[1] "rm" "medv"
$clusters[[2]]
[1] "zn" "dis"
$clusters[[3]]
[1] "crim" "indus" "nox" "age" "rad" "tax" "ptratio" "lstat"
$edges
[1] 3 5 9
I have a lot of trouble understanding what the output means, especially since I have very limited technical background on cluster analysis. In particular, I don't understand the meaning of the vector of names under each cluster. Can someone explain this for me? Thanks!
https://cran.r-project.org/web/packages/pvclust/pvclust.pdf
describes pvclust:
For data expressed as (n x p) matrix or data frame, we assume that the data is n observations of p objects, which are to be clustered. The i’th row vector corresponds to the i’th observation of these objects and the j’th column vector corresponds to a sample of j’th object with size n
Output of pvpick:
cluster - a list of character string vectors. Each vector corresponds to the names of objects in each cluster.
Have you plotted the dendrogram of the pvclust output? The clusters element of the pvpick output just lists the points (pvclust treats each column of the Boston dataset as a point) that belong to each selected cluster, which you will see in the dendrogram if you plot it.
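To see why each cluster is a vector of column names, it may help to run plain hierarchical clustering on the columns of a data set. The sketch below uses base R only, with the built-in mtcars data as a stand-in for Boston. It assumes the correlation distance 1 - cor(x) and average linkage, which I believe are the pvclust defaults, and it does no bootstrapping, so there are no p-values:

```r
# Cluster the columns (variables) of mtcars, not the rows
col_dist <- as.dist(1 - cor(mtcars))
hc <- hclust(col_dist, method = "average")

# Cut the tree into 3 groups and list the variable names in each group
groups <- split(colnames(mtcars), cutree(hc, k = 3))

# plot(hc) draws the dendrogram whose branches these groups correspond to
```

Each element of groups is a character vector of variable names, which is the same shape as $clusters in the pvpick output. pvpick keeps only the branches whose bootstrap p-value is high enough, and $edges gives the numbers of those branches in the dendrogram.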
I would like to know if there is a 'proper' way to subset big.matrix objects in R. It is simple to subset a matrix, but the class always reverts to 'matrix'. This isn't a problem when working with small datasets like this one, but with extremely large datasets the subset could still benefit from the 'big.matrix' class.
require(bigmemory)
data(iris)
# I am aware of the warning about factors, but it is not important for this example
big <- as.big.matrix(iris)
class(big)
[1] "big.matrix"
attr(,"package")
[1] "bigmemory"
class(big[,c("Sepal.Length", "Sepal.Width")])
[1] "matrix"
class(big[,1:2])
[1] "matrix"
I have since learned that the 'proper' way to subset a big.matrix is to use sub.big.matrix although this is only for contiguous columns and/or rows. Non-contiguous subsetting is not currently implemented.
sm <- sub.big.matrix(big, firstCol=1, lastCol=2)
It doesn't seem to be possible without calling as.big.matrix on the subset.
From the big.matrix documentation,
If x is a big.matrix, then x[1:5,] is returned as an R matrix containing the first five rows of x.
I presume this applies to columns as well. So it seems you would need to call
a <- as.big.matrix(big[,1:2])
in order for the subset to also be a big.matrix object.
class(a)
# [1] "big.matrix"
# attr(,"package")
# [1] "bigmemory"
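For non-contiguous columns, another option may be deepcopy(), which, as far as I recall, accepts rows and cols arguments and returns a new big.matrix. Unlike sub.big.matrix, it copies the data instead of referencing it. Here is a sketch, assuming the bigmemory package is installed and using only the numeric columns of iris; sub_big is a name made up for this example:

```r
library(bigmemory)

# Numeric columns only, to avoid the factor warning
big <- as.big.matrix(as.matrix(iris[, 1:4]))

# Copy columns 1 and 3 (non-contiguous) into a new big.matrix
sub_big <- deepcopy(big, cols = c(1, 3))

class(sub_big)
dim(sub_big)
```

Because the subset stays a big.matrix, it never has to pass through an ordinary R matrix, which is the step that as.big.matrix(big[, 1:2]) requires.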