Looking for a read.hclust function - r

The function write.hclust for hclust objects is available in the RFLPtools package. However, I can't find a corresponding read.*** function despite Googling. Does anyone know of such a function?

If I am understanding your question correctly, you should be able to use read.rflp.
library(RFLPtools)
data(RFLPdata)
res <- RFLPdist(RFLPdata, nrBands = 4)
cl <- hclust(res)
write.hclust(cl, file = "Test.txt", prefix = "Bd4", h = 50)
read.rflp("Test.txt")
Returns:
Sample Cluster Cluster.ID Gel
Ni_25_B2 Ni_25 1 Bd4_H50_01 B2
Ni_25_B5 Ni_25 2 Bd4_H50_02 B5
Ni_28_A2 Ni_28 3 Bd4_H50_03 A2
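As the output above shows, read.rflp gives back the cluster assignment table rather than an hclust object. If the aim is to save and restore the tree itself, one option is base R serialization. This is a minimal sketch that does not use RFLPtools at all; the USArrests data and the temporary file path are stand-ins chosen for illustration.

```r
# Cluster a built-in dataset (USArrests ships with R's datasets package).
hc <- hclust(dist(USArrests))

# Serialize the whole hclust object to a temporary file and read it back.
path <- tempfile(fileext = ".rds")  # hypothetical file location
saveRDS(hc, path)
hc_restored <- readRDS(path)

# The restored object is a regular hclust object again.
class(hc_restored)
identical(hc$merge, hc_restored$merge)
```

The .rds file is a binary R format, so unlike the text file written by write.hclust it is not meant to be read by other tools.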

Related

Question about getting counts in the R survey package

I'm using the 2018 CBECS data set from the Energy Information Administration (available here: https://www.eia.gov/consumption/commercial/data/2018/xls/2018_public_use_data.csv) and I've set up the sample design according to their user guide. I'm noticing a discrepancy when I use the svyby function as opposed to just the svytotal function and I'm hoping somebody can explain what it is I'm seeing and/or what I'm doing wrong.
Here is the set up for the sample design:
library(survey)
library(spatstat)
library(tidyverse)
cbecs2018 <- read_csv(paste0(getwd(), "/2018_public_use_data.csv"))
samp_wts <- cbecs2018$FINALWT
rep_wts <- cbecs2018[, grepl("^FINALWT", names(cbecs2018))]
rep_wts$FINALWT <- NULL
samp_design <- svrepdesign(weights=samp_wts, repweights=rep_wts,
type="JK2", mse=TRUE, data=cbecs2018)
sqftc <- factor(cbecs2018$SQFTC) #this is categorical variable classifying buildings by size
When I run svytotal to get a count of buildings by each category in sqftc, I get the output below, which is consistent with what EIA has:
svytotal(~sqftc, samp_design)
total SE
sqftc2 2836939.2 138709.13
sqftc3 1358439.0 78632.96
sqftc4 966092.8 55503.86
sqftc5 396595.4 23727.58
sqftc6 218416.8 11718.72
sqftc7 93085.9 5179.07
sqftc8 39865.5 1993.62
sqftc9 6664.8 620.07
sqftc10 2111.8 255.25
However, when I try to break it out by census region, I get completely different counts by category. For example, instead of showing 2,836,939 buildings in the second sqftc group, the table below makes it look like there are 3,605,529 buildings in the group.
x <- svyby(~sqftc, ~region, samp_design, svytotal)
> sum(x$sqftc2)
[1] 3605529
print(x)
region sqftc2 sqftc3 sqftc4 sqftc5 sqftc6 sqftc7 sqftc8 sqftc9 sqftc10 se1 se2 se3 se4 se5 se6 se7 se8
1 1 679858.4 382470.2 466330.8 383649.9 638936.3 777312.6 918361.9 220786.7 97105.4 70972.33 58987.22 57377.8 41027.49 79224.73 100678.28 104811.7 26387.60
2 2 1142179.1 634697.1 752421.8 762969.8 929830.8 1107860.2 1382698.4 369059.3 149810.3 131036.12 88954.07 102800.3 120901.81 88769.62 118328.83 146119.8 56056.48
3 3 859228.7 456788.7 521518.6 540952.1 779310.4 912930.2 1062321.1 285638.1 100881.7 86845.98 50065.79 56198.4 53630.90 66850.76 68490.26 87545.5 34443.43
4 4 924262.5 499895.4 541658.9 555604.6 820252.5 927657.6 1205995.5 298595.7 96787.1 96106.38 51019.41 58771.1 58782.50 60113.72 85934.54 134417.5 41790.27
se9
1 14502.07
2 39303.04
3 21410.55
4 13725.39
I feel like whatever I'm doing wrong is probably pretty straightforward, but any pointers would be greatly appreciated.
Maybe review your minimal reproducible example? :-) When I run this, the numbers match:
library(survey)
cbecs2018 <- read.csv("https://www.eia.gov/consumption/commercial/data/2018/xls/2018_public_use_data.csv")
samp_design <-
  svrepdesign(
    weights = ~ FINALWT ,
    repweights = "^FINALWT[0-9]" ,
    type = 'JK2' ,
    mse = TRUE ,
    data = cbecs2018
  )
samp_design <- update( samp_design , SQFTC = factor( SQFTC ) )
svytotal(~SQFTC, samp_design)
svyby(~SQFTC,~REGION,samp_design,svytotal)
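One visible difference from the question's code is that SQFTC is turned into a factor inside the design with update(), whereas the question built sqftc as a standalone vector outside the design. Whether that alone accounts for the discrepancy is not confirmed here. The sketch below is a small consistency check on the api example data that ships with the survey package (not the CBECS file): with the factor stored in the design, the by-group totals should add up to the overall totals. The design dclus1 and the grouping variable sch.wide are chosen for illustration only.

```r
library(survey)

# Example data shipped with the survey package (not the CBECS file).
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Keep the categorical variable inside the design object.
dclus1 <- update(dclus1, stype = factor(stype))

overall  <- svytotal(~stype, dclus1)
by_group <- svyby(~stype, ~sch.wide, dclus1, svytotal)

# Sum each category over the by-groups and compare with the overall totals.
summed <- sapply(names(coef(overall)), function(nm) sum(by_group[[nm]]))
summed
coef(overall)
```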

Import data vector from julia to R using RCall

Assume I have a Julia data array like this:
Any[Any[1,missing], Any[2,5], Any[3,6]]
I want to import it to R using RCall so I have an output equivalent to this:
data <- cbind(c(1,NA), c(2,5), c(3,6))
Note: the length of data is dynamic and it may be not 3!
Could anyone help me with how to do this? Thank you.
You can just interpolate a matrix into R:
a = [ 1 2 3
missing 5 6 ]
R"data <- $a"
To reorganize your "array of arrays" into a matrix, you need to concatenate them:
b = Any[Any[1,missing], Any[2,5], Any[3,6]]
a = hcat(b...)
R"data <- $a"
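For comparison, the same "list of columns of unknown count" step can be done on the R side. This is a small sketch in plain R, assuming the columns already exist there as a list; do.call plays the role that splatting into hcat plays in the Julia code.

```r
# A list of columns whose count is not known in advance (values from the question).
cols <- list(c(1, NA), c(2, 5), c(3, 6))

# do.call() passes each list element as a separate argument to cbind().
data <- do.call(cbind, cols)
data
```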

Using cpquery function for several pairs from dataset

I am a relative beginner in R and am trying to figure out how to use the cpquery function from the bnlearn package for all edges of a DAG.
First of all, I created a bn object, a network of bn and a table with all strengths.
library(bnlearn)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
Then I tried to create a new variable in sttbl dataset, which was the result of cpquery function.
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
sttbl[1,4] = cpquery(fit, `A` == 1, `D` == 1)
It looks pretty good (especially on bigger data), but when I am trying to automate this process somehow, I am struggling with errors, such as:
Error in sampling(fitted = fitted, event = event, evidence = evidence, :
logical vector for evidence is of length 1 instead of 10000.
Ideally, I need a function that fills the generated prob variable of the sttbl dataset regardless of its size. I tried to do it with a for loop too, but stumbled over the error above again and again. Unfortunately, I deleted the failed attempts, but they were something like this:
for (i in 1:nrow(sttbl)) {
j = sttbl[i,1]
k = sttbl[i,2]
sttbl[i,4]=cpquery(fit, fit$j %in% sttbl[i,1]==1, fit$k %in% sttbl[i,2]==1)
}
or this:
for (i in 1:nrow(sttbl)) {
sttbl[i,4]=cpquery(fit, sttbl[i,1] == 1, sttbl[i,2] == 1)
}
Now I think I have misunderstood something in R or the bnlearn package.
Could you please tell me how to fill the column with the results of multiple cpquery calls? That would help me a lot with my research!
cpquery is quite difficult to work with programmatically. If you look at the examples in the help page you can see the author uses eval(parse(...)) to build the queries. I have added two approaches below, one using the methods from the help page and one using cpdist to draw samples and reweighting to get the probabilities.
Your example
library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
This uses cpquery and the much maligned eval(parse(...)); this is the approach the bnlearn author takes to do this programmatically in the ?cpquery examples.
# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "\'a\'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)
# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for(i in 1:nrow(sttbl)) {
qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
sttbl$prob[i] = eval(parse(text=qtxt))
}
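The error in the question ("logical vector for evidence is of length 1 instead of 10000") suggests that the event and evidence conditions are evaluated against a set of generated samples, so they need to be expressions over node names rather than ordinary values. The sketch below illustrates that difference in plain R, without bnlearn; the samples data frame is a made-up stand-in for whatever cpquery generates internally.

```r
# Stand-in for internally generated samples (made-up values).
samples <- data.frame(A = c("a", "b", "a"), D = c("a", "a", "c"))

node <- "A"

# An ordinary comparison on a length-1 character value gives a single
# logical, which is roughly what the failing loops ended up passing.
node == "a"

# Building the condition as text and parsing it gives an unevaluated
# expression that refers to the column named A.
cond <- parse(text = paste(node, "== 'a'"))[[1]]
cond

# Evaluated against the samples, it yields one logical per row.
eval(cond, samples)
```

This is why the loop above pastes the node names into a query string before evaluating it.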
I find it preferable to work with cpdist, which is used to generate random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method="lw") it is slightly easier to do this programmatically (and without evil(parse(...))).
The evidence is added in a named list i.e. list(A='a').
# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)
# Now loop through the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for(i in 1:nrow(sttbl)) {
temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
w = attr(temp, "weights")
sttbl$prob2[i] = sum(w[temp=='a'])/ sum(w)
}
sttbl
# from to strength prob prob2
# 1 A D -1938.9499 0.6186238 0.6233387
# 2 A B -1153.8796 0.6050552 0.6133448
# 3 C D -823.7605 0.7027782 0.7067417
# 4 B E -720.8266 0.7332107 0.7328657
# 5 F E -549.2300 0.5850828 0.5895373
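The reweighting step in the cpdist loop is a weighted proportion: the weights of the draws that match the state, divided by the total weight. A tiny worked example with made-up draws and weights (not output from bnlearn) shows the arithmetic.

```r
# Made-up draws of one node and their likelihood weights.
draws   <- c("a", "b", "a", "c")
weights <- c(0.5, 0.2, 0.1, 0.2)

# Weighted share of draws equal to "a": (0.5 + 0.1) / (0.5 + 0.2 + 0.1 + 0.2)
p_a <- sum(weights[draws == "a"]) / sum(weights)
p_a
```

An unweighted mean(draws == "a") would give 0.5 here instead of 0.6, which is why the weights cannot be ignored when method="lw" is used.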

Selecting features from a feature set using mRMRe package

I am a new user of R and trying to use mRMRe R package (mRMR is one of the good and well known feature selection approaches) to obtain feature subset from a feature set. Please excuse if my question is simple as I really want to know how I can fix an error. Below is the detail.
Suppose, I have a csv file (gene.csv) having feature set of 6 attributes ([G1.1.1.1], [G1.1.1.2], [G1.1.1.3], [G1.1.1.4], [G1.1.1.5], [G1.1.1.6]) and a target class variable [Output] ('1' indicates positive class and '-1' stands for negative class). Here's a sample gene.csv file:
[G1.1.1.1] [G1.1.1.2] [G1.1.1.3] [G1.1.1.4] [G1.1.1.5] [G1.1.1.6] [Output]
11.688312 0.974026 4.87013 7.142857 3.571429 10.064935 -1
12.538226 1.223242 3.669725 6.116208 3.363914 9.174312 1
10.791367 0.719424 6.115108 6.47482 3.597122 10.791367 -1
13.533835 0.37594 6.766917 7.142857 2.631579 10.902256 1
9.737828 2.247191 5.992509 5.992509 2.996255 8.614232 -1
11.864407 0.564972 7.344633 4.519774 3.389831 7.909605 -1
11.931818 0 7.386364 5.113636 3.409091 6.818182 1
16.666667 0.333333 7.333333 4.333333 2 8.333333 -1
I am trying to get best feature subset of 2 attributes (out of above 6 attributes) and wrote following R code.
library(mRMRe)
file_n<-paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
f_data <- mRMR.data(data = data.frame(df))
featureData(f_data)
mRMR.ensemble(data = f_data, target_indices = 7,
feature_count = 2, solution_count = 1)
When I run this code, I am getting following error for the statement f_data <- mRMR.data(data = data.frame(df)):
Error in .local(.Object, ...) :
data columns must be either of numeric, ordered factor or Surv type
However, the data in each column of the csv file are real numbers. So, how can I change the R code to fix this problem? Also, I am not sure what the value of target_indices should be in the statement mRMR.ensemble(data = f_data, target_indices = 7,feature_count = 2, solution_count = 1), as my target class variable name is "[Output]" in the gene.csv file.
I would much appreciate it if anyone could help me obtain the best feature subset based on the gene.csv file using the mRMRe R package.
I solved the problem by modifying my code as follows.
library(mRMRe)
file_n<-paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
df[[7]] <- as.numeric(df[[7]])
f_data <- mRMR.data(data = data.frame(df))
results <- mRMR.classic("mRMRe.Filter", data = f_data, target_indices = 7,
feature_count = 2)
solutions(results)
It worked fine. The output of the code gives the indices of the selected 2 features.
I think it has to do with your Output column which is probably of class integer. You can check that using class(df[[7]]).
To convert it to numeric, as the error message requires, just type:
df[[7]] <- as.numeric(df[[7]])
That worked for me.
As for the other question, after reading the documentation, setting target_indices = 7 seems the right choice.
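The class check can be tried without the original file. The sketch below reads a few rows in the same shape from inline text, with shortened, hypothetical column names (G1, G2, Output), and shows that a column holding only -1 and 1 is read as integer while the measurement columns are numeric.

```r
# A few rows in the same shape as gene.csv, with shortened column names.
csv_text <- "G1,G2,Output
11.688312,0.974026,-1
12.538226,1.223242,1
10.791367,0.719424,-1"

df <- read.csv(text = csv_text, header = TRUE)

# Output contains only whole numbers, so read.csv stores it as integer.
before <- sapply(df, class)
before

# Convert every column to numeric (double) in one step.
df[] <- lapply(df, as.numeric)
after <- sapply(df, class)
after
```

Converting all columns at once is a convenience here; converting only column 7, as in the answers above, has the same effect on this data.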

Convert ashape3d class to mesh3d

Can somebody help me convert an 'ashape3d' class object to class 'mesh3d'?
In ashape3d, the triangle and tetrahedron faces are stored in different fields. As I don't think there's a function that can create a mesh3d object from triangles and tetrahedrons simultaneously, I tried the following (pseudocode):
model <- ashape3d(rtorus(1000, 0.5, 2),alpha=0.25)
vert <- model$x[model$vert[,2]==1,]
vert <- cbind(vert,rep(1,nrow(vert)))
tria <- model$triang[model$triang[,4]==1,1:3]
tetr <- model$tetra[model$tetra[,6]==1,1:4]
m3dTria <- tmesh3d(vertices=vert , indices=tria)
m3dTetr <- qmesh3d(vertices=vert , indices=tetr)
m3d <- mergeMeshes(m3dTria,m3dTetr)
plot.ashape3d(model) # works fine
plot3d(m3d) # Error in x$vb[1, x$it] : subscript out of bounds
Does anybody have a better way?
I needed to do this recently and found this unanswered question. The easiest way to figure out what is going on is to look at plot.ashape3d and read the docs for ashape3d. plot.ashape3d only plots triangles.
The rgl package has a generic as.mesh3d function. The code below defines a method for that generic function.
as.mesh3d.ashape3d <- function(x, ...) {
  if (length(x$alpha) > 1)
    stop("I don't know how to handle ashape3d objects with >1 alpha value")
  iAlpha = 1
  # from help for ashape3d
  # for each alpha, a value (0, 1, 2 or 3) indicating, respectively, that the
  # triangle is not in the alpha-shape or it is interior, regular or singular
  # (columns 9 to last)
  # Pick the rows for which the triangle is regular or singular
  selrows = x$triang[, 8 + iAlpha] >= 2
  tr <- x$triang[selrows, c("tr1", "tr2", "tr3")]
  rgl::tmesh3d(
    vertices = t(x$x),
    indices = t(tr),
    homogeneous = FALSE
  )
}
You can try it out on the data above
library(alphashape3d)
model <- ashape3d(rtorus(1000, 0.5, 2),alpha=0.25)
plot(model, edges=FALSE, vertices=FALSE)
library(rgl)
model2=as.mesh3d(model)
open3d()
shade3d(model2, col='red')
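One plausible contributor to the "subscript out of bounds" error in the question is that the vertex matrix was subset before building the mesh, while the triangle indices still referred to the original row numbers. The method above sidesteps this by passing all points, t(x$x). The sketch below shows the effect in plain R with made-up points and triangles; it does not use rgl or alphashape3d.

```r
# Five made-up points and two triangles that index into them by row number.
pts <- cbind(x = c(0, 1, 0, 1, 0.5),
             y = c(0, 0, 1, 1, 0.5),
             z = c(0, 0, 0, 0, 1))
tri <- rbind(c(1, 2, 5), c(2, 4, 5))

# Dropping a point (row 3) shifts every later row up by one ...
keep <- c(TRUE, TRUE, FALSE, TRUE, TRUE)
pts_sub <- pts[keep, ]

# ... so the old index 5 now points past the end of the subset.
max(tri) > nrow(pts_sub)

# If a subset is really needed, the indices have to be remapped.
new_id <- rep(NA_integer_, length(keep))
new_id[keep] <- seq_len(sum(keep))
tri_sub <- matrix(new_id[tri], ncol = 3)
tri_sub
```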
