Weighted edges in R/igraph

I'm using R and the igraph package to plot a graph stored as GraphML, and I want to use the weight attribute that the file attaches to each edge:
<edge id="e389" source="w4" target="w0">
<data key="d1">0.166666666667</data>
</edge>
I can get the values with
weight = E(f)$weight  # f is the graph
but I don't know how to take the weights into account when calculating df = degree(f).
For further information: all nodes are connected to each other and each weight is 1 / (number_of_nodes - 1), so the weighted degree of each node should be 1.
graphml file
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="d0" for="node" attr.name="label" attr.type="string"/>
<key id="d1" for="edge" attr.name="weight" attr.type="float"/>
<key id="d2" for="node" attr.name="type" attr.type="string"/>
<key id="d3" for="node" attr.name="tweet" attr.type="int"/>
<key id="d4" for="node" attr.name="color" attr.type="string"/>
<graph id="G" edgedefault="undirected">
<node id="w4">
<data key="d0">value1</data>
<data key="d2">word</data>
<data key="d1">0.166666666667</data>
<data key="d4">green</data>
</node>
.
.
.
<node id="w2">
<data key="d0">value2</data>
<data key="d2">word</data>
<data key="d1">0.166666666667</data>
<data key="d4">green</data>
</node>
<edge id="e389" source="w4" target="w0">
<data key="d1">0.166666666667</data>
</edge>

Most likely degree() is not what you want, because it ignores edge weights. You are probably looking for graph.strength(), which sums the weights of the edges incident to each node (in current igraph releases the same function is called strength()):
# create fully connected graph
g <- graph.full(10)
# assign weights such that every weight is 1 / (number_of_nodes - 1)
E(g)$weight <- 1/( length( V(g) ) -1 )
# calculate the "weighted degree"
graph.strength(g)
[1] 1 1 1 1 1 1 1 1 1 1
Alternatively, you may be looking for the normalized degree, which divides each node's degree by number_of_nodes - 1:
degree( g, normalized = TRUE )
[1] 1 1 1 1 1 1 1 1 1 1
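To make the difference between the two measures concrete, here is a minimal sketch in plain Python (not igraph) that recomputes both by hand for the same fully connected graph. The `strength` helper is a hypothetical name used only for illustration; it simply adds up the weights of the edges touching each node.

```python
from itertools import combinations


def strength(n_nodes, weighted_edges):
    """Sum the weights of the edges incident to each node (the weighted degree)."""
    totals = [0.0] * n_nodes
    for u, v, w in weighted_edges:
        totals[u] += w
        totals[v] += w
    return totals


n = 10
weight = 1 / (n - 1)  # same weighting as in the question
# fully connected undirected graph: one edge per unordered pair of nodes
edges = [(u, v, weight) for u, v in combinations(range(n), 2)]

strengths = strength(n, edges)
# the plain degree only counts incident edges and ignores the weights
degrees = [sum(1 for u, v, _ in edges if node in (u, v)) for node in range(n)]

print([round(s, 6) for s in strengths])
print(degrees)
```

Each node has 9 incident edges of weight 1/9, so the weighted degree comes out as 1 while the unweighted degree stays at 9.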


XQuery tumbling window: group by start item of first window

Using BaseX 9.7.3, I have a sorted list of names that has been produced using a tumbling window clause.
A snippet of the data looks like this:
<data>
<group>
<key id="0c7b0bca-0349-489c-b45f-2612f3134a76">ovid</key>
<key id="f77ab9c2-0be3-4348-809d-ab245e630f81">ovid 43 b c-17 or 18 a d</key>
</group>
<group>
<key id="39b9d6c2-85a5-4c72-a83e-2a52e548fc3b">ovid 43 bc</key>
<key id="acf5b3c0-8fd4-4e0c-950b-a40683bab431">ovid 43 bc-17 ad</key>
<key id="cc57be53-9ca8-4b5e-97cf-1aeca798cded">ovid 43 bc-17 ad or 18 a</key>
<key id="8395e750-1e52-4152-9d37-8c8f4e389fd3">ovid 43 bc-17 ad or 18 ad</key>
</group>
<group>
<key id="0be07fc6-d9bf-4d56-8352-1885b4dd6574">ovid 43 bc-17 or 18</key>
<key id="e3aafc69-56b0-4632-a96c-26ca448c6c2d">ovid 43 bc-17 or 18 ad</key>
</group>
<group>
<key id="f9615365-4a32-442b-9e20-9c5abb0e6fa0">ovide</key>
<key id="c7b45a8d-79a3-4e79-b32b-8d918f67a7b0">ovide 0043 av j-c-0017</key>
</group>
</data>
I would like to further group the data so that, in this example, a group would begin with "ovid" and end with "ovid 43 bc-17 or 18 ad."
Desired output:
<data>
<group>
<key id="0c7b0bca-0349-489c-b45f-2612f3134a76">ovid</key>
<key id="f77ab9c2-0be3-4348-809d-ab245e630f81">ovid 43 b c-17 or 18 a d</key>
<key id="39b9d6c2-85a5-4c72-a83e-2a52e548fc3b">ovid 43 bc</key>
<key id="acf5b3c0-8fd4-4e0c-950b-a40683bab431">ovid 43 bc-17 ad</key>
<key id="cc57be53-9ca8-4b5e-97cf-1aeca798cded">ovid 43 bc-17 ad or 18 a</key>
<key id="8395e750-1e52-4152-9d37-8c8f4e389fd3">ovid 43 bc-17 ad or 18 ad</key>
<key id="0be07fc6-d9bf-4d56-8352-1885b4dd6574">ovid 43 bc-17 or 18</key>
<key id="e3aafc69-56b0-4632-a96c-26ca448c6c2d">ovid 43 bc-17 or 18 ad</key>
</group>
<group>
<key id="f9615365-4a32-442b-9e20-9c5abb0e6fa0">ovide</key>
<key id="c7b45a8d-79a3-4e79-b32b-8d918f67a7b0">ovide 0043 av j-c-0017</key>
</group>
</data>
I have the following query, but it simply reproduces the input document:
<data>{
for tumbling window $entry in /*/group/key
start $s at $sp previous $sprev next $snext when starts-with($snext, $s)
end $e at $ep next $enext when not(starts-with($enext, $e))
return
<group>{
for $k in $entry
return (
<key id="{$k/@id}">{data($k)}</key>
)
}</group>
}</data>
Is it possible to compare the start item of the first group ("ovid") to subsequent entries that start with that token? I want to exclude "ovide," even though it starts with "ovid."
With extended (Java-like) regular expressions, as supported in Saxon, I think
for tumbling window $w in /data/group/key
start $s when true()
end next $n when not(matches($n, '^' || $s || '\b', ';j'))
return
<group>{$w}</group>
gives the two groups you want.
I have now also checked that the ';j' flag works with BaseX 9.7.2 as well.
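The logic of that window clause can be sketched outside XQuery as well. The following Python version is only an illustration of the idea: a group stays open while each following key matches the group's first key followed by a word boundary. The function name `group_by_leading_name` is made up for this sketch, and unlike the query above it escapes the start key before using it as a pattern, which is an added assumption.

```python
import re


def group_by_leading_name(keys):
    """Group consecutive keys that begin with the group's first key as a whole word."""
    groups = []
    for key in keys:
        if groups:
            start = groups[-1][0]
            # \b rejects "ovide" for the start key "ovid": no word boundary after "ovid"
            if re.match(re.escape(start) + r"\b", key):
                groups[-1].append(key)
                continue
        groups.append([key])
    return groups


keys = [
    "ovid",
    "ovid 43 b c-17 or 18 a d",
    "ovid 43 bc",
    "ovid 43 bc-17 ad",
    "ovid 43 bc-17 ad or 18 a",
    "ovid 43 bc-17 ad or 18 ad",
    "ovid 43 bc-17 or 18",
    "ovid 43 bc-17 or 18 ad",
    "ovide",
    "ovide 0043 av j-c-0017",
]

groups = group_by_leading_name(keys)
for group in groups:
    print(group[0], "->", len(group), "keys")
```

Like the tumbling window, this relies on the keys already being sorted, so that all variants of a name are adjacent.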

Nesting in XPATH

I have the following XML data, for which I would like to create an XPath expression that I think might need a nested count().
Here is the XML data for 5 CD rentals:
<?xml version="1.0"?>
<DataBase>
<!-- CD Rental 1 -->
<Rental>
<cd>
<title>title1</title>
</cd>
<person uniqueID = "1">
<name>name1</name>
</person>
</Rental>
<!-- CD Rental 2 -->
<Rental>
<cd>
<title>title2</title>
</cd>
<person uniqueID = "2">
<name>name2</name>
</person>
</Rental>
<!-- CD Rental 3 -->
<Rental>
<cd>
<title>title3</title>
</cd>
<person uniqueID = "1">
<name>name1</name>
</person>
</Rental>
<!-- CD Rental 4 -->
<Rental>
<cd>
<title>title4</title>
</cd>
<person uniqueID = "3">
<name>name3</name>
</person>
</Rental>
<!-- CD Rental 5 -->
<Rental>
<cd>
<title>title5</title>
</cd>
<person uniqueID = "2">
<name>name2</name>
</person>
</Rental>
</DataBase>
The XPath I have in mind should count the number of persons who rented more than one CD.
In the above XML data, name1 and name2 each rented 2 CDs while name3 rented only 1, so the answer I am expecting is 2. What would be a possible XPath for this?
One possible XPath expression would be:
count(//name[.=preceding::name][not(. = following::name)])
Brief explanation of the expression inside count():
//name[.=preceding::name]: finds every name element that has a preceding name element with the same value, in other words every name that repeats an earlier one
[not(. = following::name)]: filters those name elements further, keeping only the last occurrence of each duplicated name, so every repeat renter is counted exactly once
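As a cross-check of the expected answer, the same count can be computed procedurally. This is a rough Python sketch, not XPath: it rebuilds the five rentals from the question in compact form and tallies rentals per person. Grouping by the uniqueID attribute rather than by the name text is an assumption on my part; with this data both give the same result.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# (title, uniqueID, name) for the five rentals shown in the question
rentals = [
    ("title1", "1", "name1"),
    ("title2", "2", "name2"),
    ("title3", "1", "name1"),
    ("title4", "3", "name3"),
    ("title5", "2", "name2"),
]
xml_text = "<DataBase>" + "".join(
    f'<Rental><cd><title>{title}</title></cd>'
    f'<person uniqueID="{uid}"><name>{name}</name></person></Rental>'
    for title, uid, name in rentals
) + "</DataBase>"

root = ET.fromstring(xml_text)
# number of rentals per person
rentals_per_person = Counter(p.get("uniqueID") for p in root.iter("person"))
# persons who appear in more than one rental
repeat_renters = sum(1 for n in rentals_per_person.values() if n > 1)
print(repeat_renters)
```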

Build a table from XML data file using R language

I am new to R programming. I have a sample XML file, shown below:
<Attribute ID="GroupSEO" MultiValued="false" ProductMode="Property" FullTextIndexed="false" ExternallyMaintained="false" Derived="false" Mandatory="false">
<Name>Group SEO Name</Name>
<Validation BaseType="text" MinValue="" MaxValue="" MaxLength="1024" InputMask=""/>
<DimensionLink DimensionID="Language"/>
<MetaData>
<Value AttributeID="Attribute-Group-Order">1</Value>
<Value AttributeID="Enterprise-Label">NAV-GR-SEONAME</Value>
<Value ID="#NAMED" AttributeID="Attribute-Group-Name">#NAMED</Value>
<Value AttributeID="Enterprise-Description">Navigation Group SEO Name</Value>
<Value AttributeID="Attribute-Order">3</Value>
</MetaData>
<AttributeGroupLink AttributeGroupID="HTCategorizationsNavigation"/>
<AttributeGroupLink AttributeGroupID="HTDigitalServicesModifyClassifications"/>
<UserTypeLink UserTypeID="ENT-Group"/>
<UserTypeLink UserTypeID="NAVGRP"/>
<UserTypeLink UserTypeID="ENT-SubCategory"/>
<UserTypeLink UserTypeID="ENT-Category"/>
I want to convert this into a data frame using R. My expected output is:
## FullTextIndexed MultiValued ProductMode ExternallyMaintained Derived Mandatory Attribute-Group-Order Enterprise-Description UserTypeID
1 false false Property false false false 1 Navigation group seo name ENT-Group,ENT-Category,..
I have searched the internet but couldn't find a solution to my problem.
I found this code online:
library("XML")
library("methods")
setwd("E:/Project")
xmldata<-xmlToDataFrame("Sample.xml")
print(xmldata)
but when I execute the code I get the error below:
Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c(Name = "You YoutubeLink7 (URL)", :
duplicate subscripts for columns
In addition: Warning message:
In names(x) == varNames :
longer object length is not a multiple of shorter object length
> print(xmldata)
Error in print(xmldata) : object 'xmldata' not found
Could anyone explain what the error means and suggest a solution? Sorry for the formatting issues.
Thanks in advance.
First make the XML well formed by adding the closing Attribute tag at the end of the file:
<?xml version="1.0" encoding="UTF-8"?>
<Attribute ID="GroupSEO" MultiValued="false" ProductMode="Property" FullTextIndexed="false" ExternallyMaintained="false" Derived="false" Mandatory="false">
<Name>Group SEO Name</Name>
<Validation BaseType="text" MinValue="" MaxValue="" MaxLength="1024" InputMask=""/>
<DimensionLink DimensionID="Language"/>
<MetaData>
<Value AttributeID="Attribute-Group-Order">1</Value>
<Value AttributeID="Enterprise-Label">NAV-GR-SEONAME</Value>
<Value ID="#NAMED" AttributeID="Attribute-Group-Name">#NAMED</Value>
<Value AttributeID="Enterprise-Description">Navigation Group SEO Name</Value>
<Value AttributeID="Attribute-Order">3</Value>
</MetaData>
<AttributeGroupLink AttributeGroupID="HTCategorizationsNavigation"/>
<AttributeGroupLink AttributeGroupID="HTDigitalServicesModifyClassifications"/>
<UserTypeLink UserTypeID="ENT-Group"/>
<UserTypeLink UserTypeID="NAVGRP"/>
<UserTypeLink UserTypeID="ENT-SubCategory"/>
<UserTypeLink UserTypeID="ENT-Category"/>
</Attribute>
Then we use XPath to extract everything we need. Change the path in the htmlParse step to point at your XML file. Because htmlParse runs the file through the HTML parser, element and attribute names are lowercased, which is why the XPath expressions below are all lowercase.
library(XML)
data=htmlParse("C:/Users/.../yourxmlfile.xml")
# attributes of the root Attribute element
fulltextindexed=xpathSApply(data,"normalize-space(//attribute/@fulltextindexed)")
multivalued=xpathSApply(data,"normalize-space(//attribute/@multivalued)")
productmode=xpathSApply(data,"normalize-space(//attribute/@productmode)")
externallymaintained=xpathSApply(data,"normalize-space(//attribute/@externallymaintained)")
derived=xpathSApply(data,"normalize-space(//attribute/@derived)")
mandatory=xpathSApply(data,"normalize-space(//attribute/@mandatory)")
# text of the Value nodes selected by their AttributeID
attribute.group.order=xpathSApply(data,"//value[@attributeid='Attribute-Group-Order']",xmlValue)
enterprise.description=xpathSApply(data,"//value[@attributeid='Enterprise-Description']",xmlValue)
# all UserTypeID values collapsed into a single string
user.type.id=paste(xpathSApply(data,"//usertypelink/@usertypeid"),collapse = "|")
# assemble the one-row data frame
df=data.frame(fulltextindexed,multivalued,productmode,externallymaintained,derived,mandatory,attribute.group.order,enterprise.description,user.type.id)
Using tidyverse and xml2
library(tidyverse)
library(xml2)
DATA
data <- read_xml('<Attribute ID="GroupSEO" MultiValued="false" ProductMode="Property" FullTextIndexed="false" ExternallyMaintained="false" Derived="false" Mandatory="false">
<Name>Group SEO Name</Name>
<Validation BaseType="text" MinValue="" MaxValue="" MaxLength="1024" InputMask=""/>
<DimensionLink DimensionID="Language"/>
<MetaData>
<Value AttributeID="Attribute-Group-Order">1</Value>
<Value AttributeID="Enterprise-Label">NAV-GR-SEONAME</Value>
<Value ID="#NAMED" AttributeID="Attribute-Group-Name">#NAMED</Value>
<Value AttributeID="Enterprise-Description">Navigation Group SEO Name</Value>
<Value AttributeID="Attribute-Order">3</Value>
</MetaData>
<AttributeGroupLink AttributeGroupID="HTCategorizationsNavigation"/>
<AttributeGroupLink AttributeGroupID="HTDigitalServicesModifyClassifications"/>
<UserTypeLink UserTypeID="ENT-Group"/>
<UserTypeLink UserTypeID="NAVGRP"/>
<UserTypeLink UserTypeID="ENT-SubCategory"/>
<UserTypeLink UserTypeID="ENT-Category"/>
</Attribute>')
CODE
#For attribute tag
Attributes <- xml_find_all(data, "//Attribute")
Attributes <- Attributes %>%
map(xml_attrs) %>%
map_df(~as.list(.))
#find AttributeID nodes
nodes <- xml_find_all(data, "//Value")
AGO <- nodes[xml_attr(nodes, "AttributeID")=="Attribute-Group-Order"]
Attributes["Attribute-Group-Order"] <- xml_text(AGO)
ED <- nodes[xml_attr(nodes, "AttributeID")=="Enterprise-Description"]
Attributes["Enterprise-Description"] <- xml_text(ED)
#UserTypeLink tags
UserTypeLink <- xml_find_all(data, "//UserTypeLink")
UserTypeLink <- UserTypeLink %>%
map(xml_attrs) %>%
map_df(~as.list(.)) %>%
# collapse all IDs into one comma-separated string, leaving a single row
summarise(UserTypeID = toString(UserTypeID))
#Final output
do.call("cbind", list(Attributes,UserTypeLink))
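For comparison, the same flattening can be sketched with nothing but a standard XML parser. The Python snippet below is an illustration rather than a translation of the R code: it uses a shortened copy of the sample XML, takes the root element's attributes as columns, adds the two requested Value entries, and joins the UserTypeID values with commas as in the expected output. The `wanted` set is a name introduced for this sketch.

```python
import xml.etree.ElementTree as ET

# shortened copy of the sample document from the question
xml_text = """<Attribute ID="GroupSEO" MultiValued="false" ProductMode="Property" FullTextIndexed="false" ExternallyMaintained="false" Derived="false" Mandatory="false">
  <Name>Group SEO Name</Name>
  <MetaData>
    <Value AttributeID="Attribute-Group-Order">1</Value>
    <Value AttributeID="Enterprise-Label">NAV-GR-SEONAME</Value>
    <Value AttributeID="Enterprise-Description">Navigation Group SEO Name</Value>
  </MetaData>
  <UserTypeLink UserTypeID="ENT-Group"/>
  <UserTypeLink UserTypeID="NAVGRP"/>
  <UserTypeLink UserTypeID="ENT-SubCategory"/>
  <UserTypeLink UserTypeID="ENT-Category"/>
</Attribute>"""

root = ET.fromstring(xml_text)

# start the row with the attributes of the root element
row = dict(root.attrib)

# add only the Value entries that should become columns
wanted = {"Attribute-Group-Order", "Enterprise-Description"}
for value in root.iter("Value"):
    if value.get("AttributeID") in wanted:
        row[value.get("AttributeID")] = value.text

# collapse the repeated UserTypeLink elements into one comma-separated cell
row["UserTypeID"] = ",".join(link.get("UserTypeID") for link in root.iter("UserTypeLink"))

print(row)
```

The repeated UserTypeLink elements are what make a naive one-column-per-child conversion fail, so they are collapsed into a single cell before the row is built.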

How to parse a complex XML file in R into a data frame?

I want to parse a nested XML file with the layout below in R and load it into a data frame. I tried several ways, including the XML and xml2 packages, but could not get it to work.
<?xml version="1.0" encoding="UTF-8"?>
<Targets>
<Target TYPE="myserver.mgmt.Metric" NAME="metric1">
<Attribute NAME="name" VALUE="metric1"></Attribute>
<Attribute NAME="Value" VALUE="2.4"></Attribute>
<Attribute NAME="collectionTime" VALUE="1525118288000"></Attribute>
<Attribute NAME="State" VALUE="normal"></Attribute>
<Attribute NAME="ObjectName" VALUE="obj1"></Attribute>
<Attribute NAME="ValueHistory" VALUE="5072"></Attribute>
</Target>
...
<Target TYPE="myserver.mgmt.Metric" NAME="metric999">
<Attribute NAME="name" VALUE="metric999"></Attribute>
<Attribute NAME="Value" VALUE="60.35"></Attribute>
<Attribute NAME="collectionTime" VALUE="1525118288000"></Attribute>
<Attribute NAME="State" VALUE="normal"></Attribute>
<Attribute NAME="ObjectName" VALUE="obj1"></Attribute>
<Attribute NAME="ValueHistory" VALUE="9550"></Attribute>
</Target>
</Targets>
The final outcome I am looking to get is:
name Value collectionTime State ObjectName ValueHistory
metric1 2.4 1525118288000 normal obj1 5072
metric2 60.35 1525118288000 normal obj2 9550
Any help is appreciated.
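We can use the XML package together with the tidyverse. Note that XML:::xmlAttrsToDataFrame is an unexported internal function, so it may change between package versions.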
library(XML)
library(tidyverse)
lst1 <- getNodeSet(xml1, path = "//Target")
map_df(seq_along(lst1), ~
XML:::xmlAttrsToDataFrame(lst1[[.x]]) %>%
mutate_all(as.character) %>%
deframe %>%
as.list %>%
as_tibble) %>%
mutate_all(type.convert, as.is = TRUE)
# A tibble: 2 x 6
# name Value collectionTime State ObjectName ValueHistory
# <chr> <dbl> <dbl> <chr> <chr> <int>
#1 metric1 2.4 1525118288000 normal obj1 5072
#2 metric999 60.4 1525118288000 normal obj1 9550
data
xml1 <- xmlParse('<?xml version="1.0" encoding="UTF-8"?>
<Targets>
<Target TYPE="myserver.mgmt.Metric" NAME="metric1">
<Attribute NAME="name" VALUE="metric1"></Attribute>
<Attribute NAME="Value" VALUE="2.4"></Attribute>
<Attribute NAME="collectionTime" VALUE="1525118288000"></Attribute>
<Attribute NAME="State" VALUE="normal"></Attribute>
<Attribute NAME="ObjectName" VALUE="obj1"></Attribute>
<Attribute NAME="ValueHistory" VALUE="5072"></Attribute>
</Target>
<Target TYPE="myserver.mgmt.Metric" NAME="metric999">
<Attribute NAME="name" VALUE="metric999"></Attribute>
<Attribute NAME="Value" VALUE="60.35"></Attribute>
<Attribute NAME="collectionTime" VALUE="1525118288000"></Attribute>
<Attribute NAME="State" VALUE="normal"></Attribute>
<Attribute NAME="ObjectName" VALUE="obj1"></Attribute>
<Attribute NAME="ValueHistory" VALUE="9550"></Attribute>
</Target>
</Targets>
')
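The underlying reshaping step is a pivot: within each Target, every Attribute contributes one column whose name is its NAME and whose cell is its VALUE. Here is a minimal Python sketch of just that step, using the two Target elements from the sample data; the values stay as strings, with no type conversion.

```python
import xml.etree.ElementTree as ET

xml_text = """<Targets>
  <Target TYPE="myserver.mgmt.Metric" NAME="metric1">
    <Attribute NAME="name" VALUE="metric1"></Attribute>
    <Attribute NAME="Value" VALUE="2.4"></Attribute>
    <Attribute NAME="collectionTime" VALUE="1525118288000"></Attribute>
    <Attribute NAME="State" VALUE="normal"></Attribute>
    <Attribute NAME="ObjectName" VALUE="obj1"></Attribute>
    <Attribute NAME="ValueHistory" VALUE="5072"></Attribute>
  </Target>
  <Target TYPE="myserver.mgmt.Metric" NAME="metric999">
    <Attribute NAME="name" VALUE="metric999"></Attribute>
    <Attribute NAME="Value" VALUE="60.35"></Attribute>
    <Attribute NAME="collectionTime" VALUE="1525118288000"></Attribute>
    <Attribute NAME="State" VALUE="normal"></Attribute>
    <Attribute NAME="ObjectName" VALUE="obj1"></Attribute>
    <Attribute NAME="ValueHistory" VALUE="9550"></Attribute>
  </Target>
</Targets>"""

root = ET.fromstring(xml_text)

# one dict per Target: NAME becomes the column name, VALUE the cell
rows = [
    {attr.get("NAME"): attr.get("VALUE") for attr in target.iter("Attribute")}
    for target in root.iter("Target")
]

columns = list(rows[0])
print("\t".join(columns))
for row in rows:
    print("\t".join(row[col] for col in columns))
```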

Outputting clustering information (dendrogram) in R in a human-readable format like XML or JSON

I want to export the clustering information visualized in a dendrogram in a human-readable format (XML, JSON), including the distances, the nodes and their child nodes.
Have a look at the pmml package for an XML-based representation.
require(pmml)
# a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, 2)
plot(x, col = cl$cluster)
pmml(cl, centers = cl$centers)
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_1 http://www.dmg.org/v4-1/pmml-4-1.xsd">
<Header copyright="Copyright (c) 2013 edisz" description="KMeans cluster model">
<Extension name="user" value="edisz" extender="Rattle/PMML"/>
<Application name="Rattle/PMML" version="1.4"/>
<Timestamp>2013-11-11 15:02:46</Timestamp>
</Header>
<DataDictionary numberOfFields="2">
<DataField name="x" optype="continuous" dataType="double"/>
<DataField name="y" optype="continuous" dataType="double"/>
</DataDictionary>
<ClusteringModel modelName="KMeans_Model" functionName="clustering" algorithmName="KMeans: Hartigan and Wong" modelClass="centerBased" numberOfClusters="2">
<MiningSchema>
<MiningField name="x"/>
<MiningField name="y"/>
</MiningSchema>
<Output>
<OutputField name="predictedValue" feature="predictedValue"/>
<OutputField name="clusterAffinity_1" feature="clusterAffinity" value="1"/>
<OutputField name="clusterAffinity_2" feature="clusterAffinity" value="2"/>
</Output>
<ComparisonMeasure kind="distance">
<squaredEuclidean/>
</ComparisonMeasure>
<ClusteringField field="x" compareFunction="absDiff"/>
<ClusteringField field="y" compareFunction="absDiff"/>
<Cluster name="1" size="49" id="1">
<Array n="2" type="real">1.08242766097448 0.970387920586825</Array>
</Cluster>
<Cluster name="2" size="51" id="2">
<Array n="2" type="real">0.0261601744749776 0.0786776972701963</Array>
</Cluster>
</ClusteringModel>
</PMML>
But I don't know if this is more human-readable...
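The PMML above describes a k-means model, which has no tree structure. If the goal is the dendrogram itself, one option is to export the merge table and heights of the hierarchical clustering and nest them into JSON. The sketch below shows that nesting step in Python, assuming the input follows the layout of R's hclust result (in `merge`, a negative entry -j is observation j and a positive entry k is the cluster formed at step k). The function name `merge_to_tree` and the toy values are made up for illustration and do not come from a real clustering.

```python
import json


def merge_to_tree(merge, height, labels):
    """Nest an hclust-style merge table into a tree of dicts."""
    clusters = []  # clusters[k - 1] is the cluster formed at merge step k

    def node_for(index):
        if index < 0:
            # negative index: a single observation (leaf)
            return {"name": labels[-index - 1]}
        # positive index: a cluster built at an earlier step
        return clusters[index - 1]

    for (left, right), h in zip(merge, height):
        clusters.append({"height": h, "children": [node_for(left), node_for(right)]})
    return clusters[-1]  # the last merge is the root


# toy input: a and b merge first at distance 1.0, then c joins at 2.5
merge = [(-1, -2), (-3, 1)]
height = [1.0, 2.5]
labels = ["a", "b", "c"]

tree = merge_to_tree(merge, height, labels)
print(json.dumps(tree, indent=2))
```

Each inner node carries the distance at which its two children were merged, and each leaf carries its label.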
