This post is related to a previous question.
The basic problem in that post was how to connect nodes by common retweeted users. The code suggested by #ThomasIsCoding does work.
My follow up question is how to make this function store edge attributes. I provide a toy example below:
My initial dataframe is of the form:
author.id<-c("A","B","D")
rt_user<-c("C","C","C")
id<-c(1,2,3)
example<-data.frame(author.id,rt_user,id)
Where nodes A,B,D retweet from node C and the id column is a numeric classifier for the tweets. Using this in a network structure and applying the aforementioned function leads:
g <- graph.data.frame(example, directed = T)
edge.attributes(g)
gres <- do.call(
graph.union,
lapply(
names(V(g))[degree(g, mode = "out") == 0],
function(x) {
nbs <- names(V(g))[distances(g, v = x, mode = "in") == 1]
disjoint_union(
set_vertex_attr(make_full_graph(length(nbs)), name = "name", value = nbs),
set_vertex_attr(make_full_graph(1), name = "name", value = x)
)
}
)
)
plot(gres)
edge.attributes(gres)
My goal is to "preserve" the edge attributes of g in the final transformation of gres. I want to know that A used tweet 1, B used tweet 2 and D tweet 3. I believe now they should be transformed from edge to vertex attributes so that one still knows who is the user and which is the tweet but I am not sure on how to incorporate this into the code.
You can try the code below
gres <- set_vertex_attr(gres,
name = "id",
value = E(g)$id[match(names(V(gres)), names(tail_of(g, E(g))))]
)
where the edge attributes are characterized vertices from tail_of, and you will see
> V(gres)$id
[1] 1 2 3 NA
Related
As part of research I'm doing this semester, I'm adding onto a network that was already built by previous students but I'm running into errors I never had before. This code was not written by me, I have only modified it very slightly as I move along.
So I have this list daily_networks_df that contains 85953 data frames with each data frame looking something like this:
And the function & code here that generates different sections of the network (home, work, and other):
generate_el <- function(net_df){
networki = net_df[,1]
if(length(networki)>1){
# add edge in all possible pairs with prob_interact[i]
prob_interact=1/sqrt(length(networki))
ppl_pairs = combn(networki, 2)
tmp1 = rbinom(ncol(ppl_pairs), 1, prob_interact)
el_tmp = t(ppl_pairs[,tmp1==1])
return(el_tmp)
}
}
This function above is modified a bit for the 3 different network types but the other 2 are very similar so I didn't include. And here is the function calling:
# DEMAND NETWORK
work_start = 2
list_el1 = lapply(daily_networks_df[1:work_start], generate_el)
gr_el1 = do.call('rbind', list_el1)
# WORK NETWORK
prob_interact = .4
list_el2 = lapply(daily_networks_df[(work_start+1):homes_start], generate_el_work)
gr_el2 = do.call('rbind', list_el2)
# HOME NETWORK
prob_interact = .8
list_el3 = lapply(daily_networks_df[(homes_start+1):length(daily_networks_df)], generate_el_homes)
gr_el3 = do.call('rbind', list_el3)
# FULL NETWORK
gr_el=rbind(gr_el1, gr_el2, gr_el3)
gr = graph_from_data_frame(gr_el, directed=FALSE, vertices=total_pop)
I'm getting this error when I try to execute and have no idea why. Any help?
> gr = graph_from_data_frame(gr_el, directed=FALSE, vertices=total_pop)
Error in make_empty_graph(n = 0, directed = directed) :
VECTOR_ELT() can only be applied to a 'list', not a 'closure'
I am relatively beginner in R and trying to figure out how to use cpquery function for bnlearn package for all edges of DAG.
First of all, I created a bn object, a network of bn and a table with all strengths.
library(bnlearn)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
Then I tried to create a new variable in sttbl dataset, which was the result of cpquery function.
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
sttbl[1,4] = cpquery(fit, `A` == 1, `D` == 1)
It looks pretty good (especially on bigger data), but when I am trying to automate this process somehow, I am struggling with errors, such as:
Error in sampling(fitted = fitted, event = event, evidence = evidence, :
logical vector for evidence is of length 1 instead of 10000.
In perfect situation, I need to create a function that fills the prob generated variable of sttbl dataset regardless it's size. I tried to do it with for loop to, but stumbled over the error above again and again. Unfortunately, I am deleting failed attempts, but they were smt like this:
for (i in 1:nrow(sttbl)) {
j = sttbl[i,1]
k = sttbl[i,2]
sttbl[i,4]=cpquery(fit, fit$j %in% sttbl[i,1]==1, fit$k %in% sttbl[i,2]==1)
}
or this:
for (i in 1:nrow(sttbl)) {
sttbl[i,4]=cpquery(fit, sttbl[i,1] == 1, sttbl[i,2] == 1)
}
Now I think I misunderstood something in R or bnlearn package.
Could you please tell me how to realize this task with filling the column by multiple cpqueries? That would help me a lot with my research!
cpquery is quite difficult to work with programmatically. If you look at the examples in the help page you can see the author uses eval(parse(...)) to build the queries. I have added two approaches below, one using the methods from the help page and one using cpdist to draw samples and reweighting to get the probabilities.
Your example
library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
This uses cpquery and the much maligned eval(parse(...)) -- this is the
approach the the bnlearn author takes to do this programmatically in the ?cpquery examples. Anyway,
# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "\'a\'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)
# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for(i in 1:nrow(sttbl)) {
qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
sttbl$prob[i] = eval(parse(text=qtxt))
}
I find it preferable to work with cpdist which is used to generate random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method="lw") it is slightly easier to do this programatically (and without evil(parse(...))).
The evidence is added in a named list i.e. list(A='a').
# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)
# Now loop though the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for(i in 1:nrow(sttbl)) {
temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
w = attr(temp, "weights")
sttbl$prob2[i] = sum(w[temp=='a'])/ sum(w)
}
sttbl
# from to strength prob prob2
# 1 A D -1938.9499 0.6186238 0.6233387
# 2 A B -1153.8796 0.6050552 0.6133448
# 3 C D -823.7605 0.7027782 0.7067417
# 4 B E -720.8266 0.7332107 0.7328657
# 5 F E -549.2300 0.5850828 0.5895373
I am making a neural network in R so I can predict future data.
Firstly, I made a function that makes the layers:
add_layer <- function(x, in_size, out_size, act_function){
w = tf$V=variable(tf$random_normal(shape(in_size, out_size)))
b = tf$variable(tf$random_normal(shape(1, out_size)))
wxb = tf$matmul(x,w)+ b
y = act_function(wxb)
return(y)
}
Then, I create the layers. For now, I create 2 layers:
x = tf$placeholder(tf$float32, shape(NULL,31))
ty = tf$placeholder(tf$float32, shape(NULL, 2))
#First layer
l1 = add_layer(x, 31, 10, tf$nn$relu)
#Second layer, result is 0(false) or 1(true)
l = add_layer(l1, 10,2, tf$nn$sotfmax)
But then there is an error when I make layer l1 and layer l:
AttributeError: module 'tensorflow' has no attribute 'variable'
The problem is, when I remove in_size or out_size, it gives me the error that these are missing. Do I add these two then it gives me this error. After filling all the parameters(in_size, out_size, x and the activation function) it still gives me variable missing as seen above.
Any suggestions how to solve this?
Edit: Changed capital letter v, but result is still the same
I work with regularly refreshed XML reports and I would like to automate the munging process using R & xml2.
Here's a link to an entire example file.
Here's a sample of the XML:
<?xml version="1.0" ?>
<riDetailEnrolleeReport xmlns="http://vo.edge.fm.cms.hhs.gov">
<includedFileHeader>
<outboundFileIdentifier>f2e55625-e70e-4f9d-8278-fc5de7c04d47</outboundFileIdentifier>
<cmsBatchIdentifier>RIP-2015-00096</cmsBatchIdentifier>
<cmsJobIdentifier>16220</cmsJobIdentifier>
<snapShotFileName>25032.BACKUP.D03152016T032051.dat</snapShotFileName>
<snapShotFileHash>20d887c9a71fa920dbb91edc3d171eb64a784dd6</snapShotFileHash>
<outboundFileGenerationDateTime>2016-03-15T15:20:54</outboundFileGenerationDateTime>
<interfaceControlReleaseNumber>04.03.01</interfaceControlReleaseNumber>
<edgeServerVersion>EDGEServer_14.09_01_b0186</edgeServerVersion>
<edgeServerProcessIdentifier>8</edgeServerProcessIdentifier>
<outboundFileTypeCode>RIDE</outboundFileTypeCode>
<edgeServerIdentifier>2800273</edgeServerIdentifier>
<issuerIdentifier>25032</issuerIdentifier>
</includedFileHeader>
<calendarYear>2015</calendarYear>
<executionType>P</executionType>
<includedInsuredMemberIdentifier>
<insuredMemberIdentifier>ARS001</insuredMemberIdentifier>
<memberMonths>12.13</memberMonths>
<totalAllowedClaims>1000.00</totalAllowedClaims>
<totalPaidClaims>100.00</totalPaidClaims>
<moopAdjustedPaidClaims>100.00</moopAdjustedPaidClaims>
<cSRMOOPAdjustment>0.00</cSRMOOPAdjustment>
<estimatedRIPayment>0.00</estimatedRIPayment>
<coinsurancePercentPayments>0.00</coinsurancePercentPayments>
<includedPlanIdentifier>
<planIdentifier>25032VA013000101</planIdentifier>
<includedClaimIdentifier>
<claimIdentifier>CADULT4SM00101</claimIdentifier>
<claimPaidAmount>100.00</claimPaidAmount>
<crossYearClaimIndicator>N</crossYearClaimIndicator>
</includedClaimIdentifier>
</includedPlanIdentifier>
</includedInsuredMemberIdentifier>
<includedInsuredMemberIdentifier>
<insuredMemberIdentifier>ARS002</insuredMemberIdentifier>
<memberMonths>9.17</memberMonths>
<totalAllowedClaims>0.00</totalAllowedClaims>
<totalPaidClaims>0.00</totalPaidClaims>
<moopAdjustedPaidClaims>0.00</moopAdjustedPaidClaims>
<cSRMOOPAdjustment>0.00</cSRMOOPAdjustment>
<estimatedRIPayment>0.00</estimatedRIPayment>
<coinsurancePercentPayments>0.00</coinsurancePercentPayments>
<includedPlanIdentifier>
<planIdentifier>25032VA013000101</planIdentifier>
<includedClaimIdentifier>
<claimIdentifier></claimIdentifier>
<claimPaidAmount>0</claimPaidAmount>
<crossYearClaimIndicator>N</crossYearClaimIndicator>
</includedClaimIdentifier>
</includedPlanIdentifier>
</includedInsuredMemberIdentifier>
</riDetailEnrolleeReport>
I would like to:
Read in the XML into R
Locate a specific insuredMemberIdentifier
Extract the planIdentifier and all claimIdentifier data associated with the member ID in (2)
Store all text and values for insuredMemberIdentifier, planIdentifier, claimIdentifier, and claimPaidAmount in a data.frame with a row for each unique claim ID (member ID to claim ID is a 1 to many)
So far, I have accomplished 1 and I'm in the ballpark on 2:
## Step 1 ##
ride <- read_xml("/Users/temp/Desktop/RIDetailEnrolleeReport.xml")
## Step 2 -- assume the insuredMemberIdentifier of interest is 'ARS001' ##
memID <- xml_find_all(ride, "//d1:insuredMemberIdentifier[text()='ARS001']", xml_ns(ride))
[I know that I can then use xml_text() to extract the text of the element.]
After the code in Step 2 above, I've tried using xml_parent() to locate the parent node of the insuredMemberIdentifier, saving that as a variable, and then repeating Step 2 for claim info on that saved variable node.
node <- xml_parent(memID)
xml_find_all(node, "//d1:claimIdentifier", xml_ns(ride))
But this just results in pulling all claimIdentifiers in the global file.
Any help/information on how to get to step 4, above, would be greatly appreciated. Thank you in advance.
Apologies for the late response, but for posterity, import data as above using xml2, then parse the xml file by ID, as hinted by har07.
# output object to collect all claims
res <- data.frame(
insuredMemberIdentifier = rep(NA, 1),
planIdentifier = NA,
claimIdentifier = NA,
claimPaidAmount = NA)
# vector of ids of interest
ids <- c('ARS001')
# indexing counter
starti <- 1
# loop through all ids
for (ii in seq_along(ids)) {
# find ii-th id
## Step 2 -- assume the insuredMemberIdentifier of interest is 'ARS001' ##
memID <- xml_find_all(x = ride,
xpath = paste0("//d1:insuredMemberIdentifier[text()='", ids[ii], "']"))
# find node for
node <- xml_parent(memID)
# as har07's comment find claim id within this node
cid <- xml_find_all(node, ".//d1:claimIdentifier", xml_ns(ride))
pid <- xml_find_all(node, ".//d1:planIdentifier", xml_ns(ride))
cpa <- xml_find_all(node, ".//d1:claimPaidAmount", xml_ns(ride))
# add invalid data handling if necessary
if (length(cid) != length(cpa)) {
warning(paste("cid and cpa do not match for", ids[ii]))
next
}
# collect outputs
res[seq_along(cid) + starti - 1, ] <- list(
ids[ii],
xml_text(pid),
xml_text(cid),
xml_text(cpa))
# adjust counter to add next id into correct row
starti <- starti + length(cid)
}
res
# insuredMemberIdentifier planIdentifier claimIdentifier claimPaidAmount
# 1 ARS001 25032VA013000101 CADULT4SM00101 100.00
Can somebody help me convert an 'ashape3d' class object to class 'mesh3d'?
In ashape3d, the triangle en tetrahedron faces are are stored in different fields. As I don't think there's a function that can create a mesh3d object from triangles&tetrahedrons simultaneously, I tried the following (pseudocode):
model <- ashape3d(rtorus(1000, 0.5, 2),alpha=0.25)
vert <- model$x[model$vert[,2]==1,]
vert <- cbind(vert,rep(1,nrow(vert)))
tria <- model$triang[model$triang[,4]==1,1:3]
tetr <- model$tetra[model$tetra[,6]==1,1:4]
m3dTria <- tmesh3d(vertices=vert , indices=tria)
m3dTetr <- qmesh3d(vertices=vert , indices=tetr)
m3d <- mergeMeshes(m3dTria,m3dTetr)
plot.ashape3d(model) # works fine
plot3d(m3d) # Error in x$vb[1, x$it] : subscript out of bounds
Does anybody have a better way?
I needed to do this recently and found this unanswered question. The easiest way to figure out what is going on is to look at plot.ashape3d and read the docs for ashape3d. plot.ashape3d only plots triangles.
The rgl package has a generic as.mesh3d function. This defines a method for that generic function.
as.mesh3d.ashape3d <- function(x, ...) {
if (length(x$alpha) > 1)
stop("I don't know how to handle ashape3d objects with >1 alpha value")
iAlpha = 1
# from help for ashape3d
# for each alpha, a value (0, 1, 2 or 3) indicating, respectively, that the
# triangle is not in the alpha-shape or it is interior, regular or singular
# (columns 9 to last)
# Pick the rows for which the triangle is regular or singular
selrows = x$triang[, 8 + iAlpha] >= 2
tr <- x$triang[selrows, c("tr1", "tr2", "tr3")]
rgl::tmesh3d(
vertices = t(x$x),
indices = t(tr),
homogeneous = FALSE
)
}
You can try it out on the data above
model <- ashape3d(rtorus(1000, 0.5, 2),alpha=0.25)
plot(model, edges=F, vertices=F)
library(rgl)
model2=as.mesh3d(model)
open3d()
shade3d(model2, col='red')