How to parse xml/sbml with R package xml? - r

I'm trying to parse information from the sbml/xml file below
https://dl.dropboxusercontent.com/u/10712588/file.xml
from this code
http://search.bioconductor.jp/codes/11172
It seems that I can import the file normally by
doc <- xmlTreeParse(filename,ignoreBlanks = TRUE)
but I can't recover node attributes by
atrr <- xpathApply(doc, "//species[@id]", xmlGetAttr, "id")
or
xpathApply(doc, "//species", function(n) xmlValue(n[[2]]))
A node of the file follows...
<species id="M_10fthf_m" initialConcentration="1" constant="false" hasOnlySubstanceUnits="false"
name="10-formyltetrahydrofolate(2-)" metaid="_metaM_10fthf_m" boundaryCondition="false"
sboTerm="SBO:0000247" compartment="m">
<notes>
<body xmlns="http://www.w3.org/1999/xhtml">
<p>FORMULA: C20H21N7O7</p>
<p>CHARGE: -2</p>
<p>INCHI: InChI=1S/C20H23N7O7/c21-20-25-16-15(18(32)26-20)23-11(7-22-16)8-27(9-28)12-3-1-10(2-4-12)17(31)24-13(19(33)34)5-6-14(29)30/h1-4,9,11,13,23H,5-8H2,(H,24,31)(H,29,30)(H,33,34)(H4,21,22,25,26,32)/p-2/t11-,13+/m1/s1</p>
<p>HEPATONET_1.0_ABBREVIATION: HC00212</p>
<p>EHMN_ABBREVIATION: C00234</p>
</body>
</notes>
<annotation>
...
I would like to retrieve all the information inside each species node. Does anyone know how to do that?

There exists an SBML parsing library libSBML (http://sbml.org/Software/libSBML).
This includes a binding to R that would allow access to the SBML objects directly within R using code similar to
document = readSBML(filename);
errors = SBMLErrorLog_getNumFailsWithSeverity(
SBMLDocument_getErrorLog(document),
enumToInteger("LIBSBML_SEV_ERROR", "_XMLErrorSeverity_t")
);
if (errors > 0) {
cat("Encountered the following SBML errors:\n");
SBMLDocument_printErrors(document);
q(status=1);
}
model = SBMLDocument_getModel(document);
if (is.null(model)) {
cat("No model present.\n");
q(status=1);
}
species = Model_getSpecies(model, index_of_species);
id = Species_getId(species);
conc = Species_getInitialConcentration(species)
There is a Species_get(NameOfAttribute) function for each possible attribute (for example, Species_getId above), together with Species_isSet(NameOfAttribute), Species_set(NameOfAttribute) and Species_unset(NameOfAttribute).
The API is similar for interacting with any SBML element.
The libSBML releases include R installers that are available from
http://sourceforge.net/projects/sbml/files/libsbml/5.8.0/stable
navigating to the R_interface subdirectory for the OS and architecture of your choice.
The source code distribution of libSBML contains an examples/r directory with many examples of using libSBML to interact with SBML in the R environment.

I guess it depends on what you mean when you say you want to "retrieve" all the information in the species nodes, because that retrieved data could be coerced to any number of different formats. The following assumes you want it all in a data frame, where each row is a species node from your XML file and the columns represent different pieces of information.
When just trying to extract information, I generally find it easier to work with lists than with XML.
doc <- xmlTreeParse(xml_file, ignoreBlanks = TRUE)
doc_list <- xmlToList(doc)
Once it's in a list, you can figure out where the species data is stored:
sapply(doc_list, function(x) unique(names(x)))
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
[1] "species"
[[5]]
[1] "reaction"
[[6]]
[1] "metaid"
$.attrs
[1] "level" "version"
So you really only want the information in doc_list[[4]]. Take a look at just the first component of doc_list[[4]]:
str(doc_list[[4]][[1]])
List of 9
$ : chr "FORMULA: C20H21N7O7"
$ : chr "CHARGE: -2"
$ : chr "HEPATONET_1.0_ABBREVIATION: HC00212"
$ : chr "EHMN_ABBREVIATION: C00234"
$ : chr "http://identifiers.org/obo.chebi/CHEBI:57454"
$ : chr "http://identifiers.org/pubchem.compound/C00234"
$ : chr "http://identifiers.org/hmdb/HMDB00972"
$ : Named chr "#_metaM_10fthf_c"
..- attr(*, "names")= chr "about"
$ .attrs: Named chr [1:9] "M_10fthf_c" "1" "false" "false" ...
..- attr(*, "names")= chr [1:9] "id" "initialConcentration" "constant" "hasOnlySubstanceUnits" ...
So you have the information contained in the first eight lists, plus the information contained in the attributes.
Getting the attributes information is easy because it's already named. The following formats the attributes information into a data frame for each node:
doc_attrs <- lapply(doc_list[[4]], function(x) {
x <- unlist(x[names(x) == ".attrs"])
col_names <- gsub(".attrs.", "", names(x))
x <- data.frame(matrix(x, nrow = 1), stringsAsFactors = FALSE)
colnames(x) <- col_names
x
})
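As a rough sketch of the same idea in base R only: assuming each element of the species list carries its attributes in a `.attrs` named character vector (as in the str() output above), the rows can be built without going through matrix(). The toy `species_list` and the helper name `attrs_to_df` are made up for illustration, and real xmlToList() output may be nested differently. Elements without attributes become a row of NAs.

```r
# Toy stand-in for doc_list[[4]]; the ids and compartments are invented.
species_list <- list(
  species = list("FORMULA: C20H21N7O7", .attrs = c(id = "M_a_c", compartment = "c")),
  species = list("CHARGE: -2", .attrs = c(id = "M_b_m", compartment = "m")),
  species = list("CHARGE: 0")  # no attributes at all
)

# Hypothetical helper: one data frame row per element, NA where attributes are missing.
attrs_to_df <- function(species_list) {
  # Union of all attribute names seen across the elements
  cols <- unique(unlist(lapply(species_list, function(x) names(x[[".attrs"]]))))
  rows <- lapply(species_list, function(x) {
    a <- x[[".attrs"]]
    row <- setNames(rep(NA_character_, length(cols)), cols)
    if (length(a) > 0) row[names(a)] <- a
    as.data.frame(as.list(row), stringsAsFactors = FALSE)
  })
  # unname() keeps the duplicated "species" names out of the row names
  do.call(rbind, unname(rows))
}

attrs_df <- attrs_to_df(species_list)
```

Because every row is created with the full set of columns, a plain rbind() is enough here and no NA patching is needed afterwards.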
Some nodes didn't appear to have attributes information and so returned empty data frames. That caused problems later so I created data frames of NAs in their place:
doc_attrs_cols <- unique(unlist(sapply(doc_attrs, colnames)))
doc_attrs[sapply(doc_attrs, length) == 0] <-
lapply(doc_attrs[sapply(doc_attrs, length) == 0], function(x) {
df <- data.frame(matrix(rep(NA, length(doc_attrs_cols)), nrow = 1))
colnames(df) <- doc_attrs_cols
df
})
When it came to pulling non-attribute data, the names and values of the variables were generally contained within the same string. I originally tried to come up with a regular expression to extract the names, but they're all formatted so differently that I gave up and just identified all the possibilities in this particular data set:
flags <- c("FORMULA:", "CHARGE:", "HEPATONET_1.0_ABBREVIATION:",
"EHMN_ABBREVIATION:", "obo.chebi/CHEBI:", "pubchem.compound/", "hmdb/HMDB",
"INCHI: ", "kegg.compound/", "kegg.genes/", "uniprot/", "drugbank/")
Also, sometimes the non-attribute information was kept as just a list of values, as in the node I showed above, while other times it was contained in "notes" and "annotation" sublists, so I had to include an if else statement to make things more consistent.
doc_info <- lapply(doc_list[[4]], function(x) {
if(any(names(x) != ".attrs" & names(x) != "")) {
names(x)[names(x) != ".attrs"] <- ""
x <- unlist(do.call("c", as.list(x[names(x) != ".attrs"])))
} else {
x <- unlist(x[names(x) != ".attrs"])
}
x <- gsub("http://identifiers.org/", "", x)
need_names <- names(x) == ""
names(x)[need_names] <- gsub(paste0("(", paste0(flags, collapse = "|"), ").+"), "\\1", x[need_names], perl = TRUE)
#names(x) <- gsub("\\s+", "", names(x))
x[need_names] <- gsub(paste0("(", paste0(flags, collapse = "|"), ")(.+)"), "\\2", x[need_names], perl = TRUE)
col_names <- names(x)
x <- data.frame(matrix(x, nrow = 1), stringsAsFactors = FALSE)
colnames(x) <- col_names
x
})
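For the plain "KEY: value" note strings (not the identifiers.org URLs), a single regular expression that splits at the first colon may be enough. This is only a sketch on three strings copied from the node above; it assumes the key never contains a colon.

```r
notes <- c("FORMULA: C20H21N7O7", "CHARGE: -2", "HEPATONET_1.0_ABBREVIATION: HC00212")

# Everything before the first colon is the key, everything after it is the value.
pattern <- "^([^:]+):[[:space:]]*(.*)$"
keys <- sub(pattern, "\\1", notes)
values <- sub(pattern, "\\2", notes)

# Named character vector: names are the keys, elements are the values
parsed <- setNames(values, keys)
```

The URL-style entries would still need something like the flags lookup, since their "name" is a path prefix rather than a colon-terminated key.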
To get everything together into a data frame, I suggest the plyr package's rbind.fill.
require(plyr)
doc_info <- do.call("rbind.fill", doc_info)
doc_attrs <- do.call("rbind.fill", doc_attrs)
doc_all <- cbind(doc_info, doc_attrs)
dim(doc_all)
[1] 3972 22
colnames(doc_all)
[1] "FORMULA:" "CHARGE:" "HEPATONET_1.0_ABBREVIATION:" "EHMN_ABBREVIATION:"
[5] "obo.chebi/CHEBI:" "pubchem.compound/" "hmdb/HMDB" "about"
[9] "INCHI: " "kegg.compound/" "kegg.genes/" "uniprot/"
[13] "drugbank/" "id" "initialConcentration" "constant"
[17] "hasOnlySubstanceUnits" "name" "metaid" "boundaryCondition"
[21] "sboTerm" "compartment"
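To see what rbind.fill does on a small scale, here is a minimal sketch with two made-up one-row data frames that share only the id column. Columns missing from one input are filled with NA.

```r
library(plyr)

# Two toy rows with partly different columns (values invented for illustration)
df1 <- data.frame(id = "M_a_c", FORMULA = "C20H21N7O7", stringsAsFactors = FALSE)
df2 <- data.frame(id = "M_b_m", CHARGE = "-2", stringsAsFactors = FALSE)

# The result has the union of the columns; gaps become NA
combined <- rbind.fill(df1, df2)
```

This is why the per-node data frames above can have different columns and still be stacked into one table.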

As a partial answer: the document uses namespaces, and 'species' belongs to the document's default namespace, which is bound here to the prefix 'id'. So
> xpathSApply(doc, "//id:species", xmlGetAttr, "id", namespaces="id")
[1] "M_10fthf_c" "M_10fthf_m" "M_13dampp_c" "M_h2o_c" "M_o2_c"
[6] "M_bamppald_c" "M_h2o2_c" "M_nh4_c" "M_h_m" "M_nadph_m"
...
with id:species and namespaces="id" being different from what you illustrate above.
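A self-contained sketch of the namespace point, using a tiny made-up SBML-like string instead of the real file. It assumes the document is parsed with xmlParse (an internal document) and binds the hypothetical prefix `s` explicitly to the namespace URI, rather than passing a bare prefix as above.

```r
library(XML)

# Toy document with a default namespace, loosely modelled on the SBML file
sbml_text <- '<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="M_a_c" compartment="c"/>
      <species id="M_b_m" compartment="m"/>
    </listOfSpecies>
  </model>
</sbml>'

doc <- xmlParse(sbml_text, asText = TRUE)

# Map a prefix of our choosing to the default namespace URI
ns <- c(s = "http://www.sbml.org/sbml/level2")

# The prefix must appear in the XPath expression for the nodes to match
ids <- xpathSApply(doc, "//s:species", xmlGetAttr, "id", namespaces = ns)
```

The prefix name itself is arbitrary; what matters is that it maps to the same URI the document declares.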

Related

Generate a xml from a R list

I'm new to xml and processing it in R.
I've been able to read and retrieve info from xml files using the xml2 package, but creating xml files from R objects has proven to be more challenging.
In particular, I'd like to generate a xml file from a R list. Consider the example below:
library(reprex)
library(xml2)
r_list <- list(person1 = list(starts = letters[1:3], ends = letters[4:6]), person2 = list(starts = LETTERS[1:4], ends = LETTERS[5:8]))
str(r_list)
#> List of 2
#> $ person1:List of 2
#> ..$ starts: chr [1:3] "a" "b" "c"
#> ..$ ends : chr [1:3] "d" "e" "f"
#> $ person2:List of 2
#> ..$ starts: chr [1:4] "A" "B" "C" "D"
#> ..$ ends : chr [1:4] "E" "F" "G" "H"
test1 <- xml2::as_xml_document((r_list))
#> Error: Root nodes must be of length 1
new_xml <- xml_new_root(.value = "category", name = "personList")
for(person in names(r_list)){
xml_add_child(new_xml, as_xml_document(r_list[person]))
}
new_xml
#> {xml_document}
#> <category name="personList">
#> [1] <person1>ad</person1>
#> [2] <person2>AE</person2>
Created on 2021-11-25 by the reprex package (v2.0.1)
I tried to directly coerce the list to xml using the as_xml_document function, but I get the error Root nodes must be of length 1.
Following the idea on this question, I tried to create the xml document with a root node and xml_add_child() to this document, but I did not get the expected result (see code output). In that question, they transform from an R data frame and not a list.
I'd also like to use custom tag names and add attributes to these tags. The desired output would be:
<category name="personList">
<pers name="person1">
<starts>
<value>a</value>
<value>b</value>
<value>c</value>
</starts>
<ends>
<value>d</value>
<value>e</value>
<value>f</value>
</ends>
</pers>
<pers name="person2">
<starts>
<value>A</value>
<value>B</value>
<value>C</value>
<value>D</value>
</starts>
<ends>
<value>E</value>
<value>F</value>
<value>G</value>
<value>H</value>
</ends>
</pers>
</category>
Thanks for your help and have a nice day
R list attributes can be mapped to XML attributes:
library(xml2)
library(tidyverse)
r_list <- list(person1 = list(starts = letters[1:3], ends = letters[4:6]), person2 = list(starts = LETTERS[1:4], ends = LETTERS[5:8]))
r_list
new_xml <- xml_new_root(.value = "category", name = "personList")
for (person in names(r_list)) {
p <- list()
p[["pers"]] <- list(
starts = r_list[[person]]$starts %>% map(~list(value = list(.x))),
ends = r_list[[person]]$ends %>% map(~list(value = list(.x)))
)
attr(p[["pers"]], "name") <- person
xml_add_child(new_xml, as_xml_document(p))
}
write_xml(new_xml, "foo.xml")
output:
<?xml version="1.0" encoding="UTF-8"?>
<category name="personList">
<pers name="person1">
<starts>
<value>a</value>
<value>b</value>
<value>c</value>
</starts>
<ends>
<value>d</value>
<value>e</value>
<value>f</value>
</ends>
</pers>
<pers name="person2">
<starts>
<value>A</value>
<value>B</value>
<value>C</value>
<value>D</value>
</starts>
<ends>
<value>E</value>
<value>F</value>
<value>G</value>
<value>H</value>
</ends>
</pers>
</category>
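The mapping used above can be sketched without tidyverse. In this minimal example, list names become tag names, R attributes on a list become XML attributes, and character leaves become text nodes. The helper `to_values` is a made-up name, and only one person with two values is built.

```r
library(xml2)

# Hypothetical helper: c("a", "b") -> list(value = list("a"), value = list("b"))
to_values <- function(x) {
  setNames(lapply(x, function(v) list(v)), rep("value", length(x)))
}

pers <- list(starts = to_values(c("a", "b")))
attr(pers, "name") <- "person1"  # becomes name="person1" on <pers>

# as_xml_document() needs a list of length 1: the single root node
doc <- as_xml_document(list(pers = pers))

person_name <- xml_attr(xml_find_first(doc, "/pers"), "name")
start_values <- xml_text(xml_find_all(doc, "//value"))
```

Repeating the name "value" in the list is what produces several sibling value tags under starts.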
Following the comment by @Limey (pointing to this question), I could generate the desired output with the following code (posted as an answer just for completeness, as @danlooo's answer also produces the same output).
library(XML)
r_list <- list(person1 = list(starts = letters[1:3], ends = letters[4:6]), person2 = list(starts = LETTERS[1:4], ends = LETTERS[5:8]))
str(r_list)
category = newXMLNode("category", attrs = c(name="personList"))
for(person in names(r_list)){
pers <- newXMLNode("pers", attrs = c(name = person), parent = category)
startsn <- newXMLNode("starts", parent = pers)
for(value in seq_along(r_list[[person]][["starts"]])){
svalue <- newXMLNode("value", r_list[[person]][["starts"]][[value]], parent = startsn)
}
endsn <- newXMLNode("ends", parent = pers)
for(value in seq_along(r_list[[person]][["ends"]])){
evalue <- newXMLNode("value", r_list[[person]][["ends"]][[value]], parent = endsn)
}
}
category

Create a named list element if it doesn't already exist in R

I have a list, masterList, to which I want to append two other lists, contentList and contentList2. The content lists are the results belonging to "01-response", the first element of masterList.
masterList <- list("01-response" = NULL, "02-response" = NULL,"03-response" = NULL,"04-response" = NULL,"05-response" = NULL)
contentList <- list(item1 = "text", item2 = "text2")
contentList2 <- list(item1 = "moretext", item2 = "moretext2")
I want to append contentList and contentList2 to masterList[["01-response"]]. However, I want all the contents to be stored in a named list inside masterList, masterList[["01-response"]][["content"]], like:
masterList
$`01-response`
$`01-response`$content
$`01-response`$content$item1
[1] "text"
$`01-response`$content$item2
[1] "text2"
$`01-response`$content$item1
[1] "moretext"
$`01-response`$content$item2
[1] "moretext2"
$`02-response`
NULL
$`03-response`
NULL
$`04-response`
NULL
$`05-response`
NULL
The problem is with the appending. I need to check whether masterList[["01-response"]][["content"]] exists before I append. If it exists, I simply append. If it doesn't exist, I need to create it first.
Let's specify the element as a variable: listElement <- "01-response". If I were to add a third list, contentList3 <- list(item1 = "moretext3", item2 = "moretext4"), I would simply run:
listElement <- "01-response"
if(exists("content", where = masterList[[listElement]])){
masterList[[listElement]][["content"]] <- append(masterList[[listElement]][["content"]],
contentList3)
}else{
masterList[[listElement]] <- append(masterList[[listElement]],
list(content = contentList3))
}
However, this code breaks if the masterList is empty:
masterList
$`01-response`
NULL
$`02-response`
NULL
$`03-response`
NULL
$`04-response`
NULL
$`05-response`
NULL
exists("content", where = masterList[[listElement]])
Error in as.environment(where) : using 'as.environment(NULL)' is defunct
How can i check if "content" exists at the level of masterList[[listElement]]?
Note: this happens inside a function, therefore I want to remain flexible and avoid hard-coding masterList[["01-response"]]. I use masterList[[listElement]] instead, where listElement <- "01-response".
I think you can work without knowing if content exists already.
masterList[["01-response"]]$content <- c(masterList[["01-response"]]$content, contentList, contentList2)
str(masterList)
# List of 5
# $ 01-response:List of 1
# ..$ content:List of 4
# .. ..$ item1: chr "text"
# .. ..$ item2: chr "text2"
# .. ..$ item1: chr "moretext"
# .. ..$ item2: chr "moretext2"
# $ 02-response: NULL
# $ 03-response: NULL
# $ 04-response: NULL
# $ 05-response: NULL
The trick is that c(x, y) simply returns y when x is NULL, and appends y to x when x is a pre-existing list.
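Since this runs inside a function with the element name held in a variable, the same idea can be wrapped as below. This is a sketch only: `append_content` is a made-up helper name, and the list is shortened to two elements.

```r
# Hypothetical helper: append new_content to master[[element]]$content,
# creating the "content" entry if it does not exist yet.
append_content <- function(master, element, new_content) {
  master[[element]]$content <- c(master[[element]]$content, new_content)
  master
}

masterList <- list("01-response" = NULL, "02-response" = NULL)
listElement <- "01-response"

# First call creates "content", second call appends to it
masterList <- append_content(masterList, listElement,
                             list(item1 = "text", item2 = "text2"))
masterList <- append_content(masterList, listElement,
                             list(item1 = "moretext", item2 = "moretext2"))

has_content <- "content" %in% names(masterList[[listElement]])
```

No existence check is needed before the assignment, because c() treats a NULL first argument as empty.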
To answer one of your questions:
How can i check if "content" exists at the level of masterList[[listElement]]?
"content" %in% names(masterList[["01-response"]])

LIST to data.frame in XML file

I am working on XML files and I am trying to transform them into data.frame. However, during the transformation process the file is “LIST”, as seen below:
My Code:
require(tidyverse)
require(xml2)
page<-read_xml('<?xml version="1.0" encoding="ISO-8859-1" ?>
<test2:TASS xmlns="http://www.vvv.com/schemas"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.vvv.com/schemas http://www.vvv.com/schemas/testV2_02_03.xsd"
xmlns:test2="http://www.vvv.com/schemas" >
<test2:billing>
<test2:proceduresummary>
<test2:guidenumber>Z4088</test2:guidenumber>
<test2:diagnosis>
<test2:table>ICD-10</test2:table>
<test2:diagnosiscod>G93</test2:diagnosiscod>
<test2:description>DISORDER OF BRAIN, UNSPECIFIED</test2:description>
</test2:diagnosis>
<test2:procedure>
<test2:procedure>
<test2:description>HOSPITAL</test2:description>
</test2:procedure>
<test2:amount>15</test2:amount>
</test2:procedure>
</test2:proceduresummary>
</test2:billing>
</test2:TASS>')
t1<-if ("test2" %in% names(xml_ns(page))) {
ns<-xml_ns_rename(xml_ns(page), test2 = "test")
} else {
ns<- xml_ns(page)
}
MYFILE<- ifelse(names(xml_ns(page)) %in% "d1",
page %>% xml_find_all(".//d1:billing"),
page %>% xml_find_all(".//test:billing", ns))
MYFILE<-xml2::as_list(MYFILE) %>% jsonlite::toJSON() %>% jsonlite::fromJSON()
My "LIST"
List of 1
$ :List of 2
..$ node:<externalptr>
..$ doc :<externalptr>
..- attr(*, "class")= chr "xml_node"
I'm using the code below to transform it, but it's giving an error:
MYFILE <- xml2 :: as_list (MYFILE)%>% jsonlite :: toJSON ()%>% jsonlite :: fromJSON ()
This is the error.
Error in UseMethod("as_list") :
no applicable method for 'as_list' applied to an object of class "list"
How do I turn it into data.frame/tibble?
It looks like the ifelse statement is causing the file to be parsed three times, and this is causing the problem: ifelse is vectorized over its test, and names(xml_ns(page)) %in% "d1" has one element per namespace. If you need this line, try this instead: ifelse("d1" %in% names(xml_ns(page)), . . .
This script works on the sample above. If there is more than one billing node, part of the script below will need modification; I highlighted that in the comments.
t1<-if ("test2" %in% names(xml_ns(page))) {
ns<-xml_ns_rename(xml_ns(page), test2 = "test")
} else {
ns<- xml_ns(page)
}
MYFILE<- ifelse(names(xml_ns(page)) %in% "d1",
page %>% xml_find_all(".//d1:billing"),
page %>% xml_find_all(".//test:billing", ns))
#To prevent repeating reading the file multiple times
# MYFILE<- if ("d1" %in% names(xml_ns(page))) {
# page %>% xml_find_all(".//d1:billing")
# } else {
# page %>% xml_find_all(".//test:billing", ns)
# }
OUTPUT<-lapply(MYFILE, function(MYFILE){
#convert all of the nodes to named vector
output<-as_list(MYFILE) %>% unlist()
#Shorten the names
names(output) <- gsub("^(.+?\\.)", "", names(output))
#your next steps will determine the desired output format
#create a long format dataframe
# long_answer<-data.frame(Name=names(output), output, row.names = NULL)
#create a wide format dataframe
wide_answer<-data.frame( t(output))
})
bind_rows(OUTPUT)
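When there are several record nodes with the same flat set of child tags, one row per node can also be built directly from the child names and texts. This sketch uses a made-up, namespace-free document (the tags `bills`, `bill`, `id` and `amount` are invented) and assumes every record has identical fields.

```r
library(xml2)

doc <- read_xml("<bills>
  <bill><id>Z4088</id><amount>15</amount></bill>
  <bill><id>Z4089</id><amount>20</amount></bill>
</bills>")

bills <- xml_find_all(doc, ".//bill")

# One single-row data frame per <bill>: tag names become column names
rows <- lapply(bills, function(bill) {
  fields <- xml_children(bill)
  as.data.frame(setNames(as.list(xml_text(fields)), xml_name(fields)),
                stringsAsFactors = FALSE)
})

billing_df <- do.call(rbind, rows)
```

All values come back as character, so numeric columns such as amount would still need as.numeric() afterwards.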

How to store a "complex" data structure in R (not "complex numbers")

I need to train, store, and use a list/array/whatever of several ksvm SVM models, which once I get a set of sensor readings, I can call predict() on each of the models in turn. I want to store these models and metadata about them in some sort of data structure, but I'm not very familiar with R, and getting a handle on its data structures has been a challenge. My familiarity is with C++, C, and C#.
I envision some sort of array or list that contains both the ksvm models as well as the metadata about them. (The metadata is necessary, among other things, for knowing how to select & organize the input data presented to each model when I call predict() on it.)
The data I want to store in this data structure includes the following for each entry of the data structure:
The ksvm model itself
A character string saying who trained the model & when they trained it
An array of numbers indicating which sensors' data should be presented to this model
A single number between 1 and 100 that represents how much I, the trainer, trust this model
Some "other stuff"
So in tinkering with how to do this, I tried the following....
First I tried what I thought would be really simple & crude, hoping to build on it later if this worked: A (list of (list of different data types))...
>
> uname = Sys.getenv("USERNAME", unset="UNKNOWN_USER")
> cname = Sys.getenv("COMPUTERNAME", unset="UNKNOWN_COMPUTER")
> trainedAt = paste("Trained at", Sys.time(), "by", uname, "on", cname)
> trainedAt
[1] "Trained at 2015-04-22 20:54:54 by mminich on MMINICH1"
> sensorsToUse = c(12,14,15,16,24,26)
> sensorsToUse
[1] 12 14 15 16 24 26
> trustFactor = 88
>
> TestModels = list()
> TestModels[1] = list(trainedAt, sensorsToUse, trustFactor)
Warning message:
In TestModels[1] = list(trainedAt, sensorsToUse, trustFactor) :
number of items to replace is not a multiple of replacement length
>
> TestModels
[[1]]
[1] "Trained at 2015-04-22 20:54:54 by mminich on MMINICH1"
>
...wha? What did it think I was trying to replace? I was just trying to populate element 1 of TestModels. Later I would add an element [2], [3], etc... but this didn't work and I don't know why. Maybe I need to define TestModels as a list of lists right up front...
> TestModels = list(list())
> TestModels[1] = list(trainedAt, sensorsToUse, trustFactor)
Warning message:
In TestModels[1] = list(trainedAt, sensorsToUse, trustFactor) :
number of items to replace is not a multiple of replacement length
>
Hmm. That no workie either. Let's try something else...
> TestModels = list(list())
> TestModels[1][1] = list(trainedAt, sensorsToUse, trustFactor)
Warning message:
In TestModels[1][1] = list(trainedAt, sensorsToUse, trustFactor) :
number of items to replace is not a multiple of replacement length
>
Drat. Still no workie.
Please clue me in on how I can do this. And I'd really like to be able to access the fields of my data structure by name, perhaps something along the lines of...
> print(TestModels[1]["TrainedAt"])
Thank you very much!
You were very close. To avoid the warning, you shouldn't use
TestModels[1] = list(trainedAt, sensorsToUse, trustFactor)
but instead
TestModels[[1]] = list(trainedAt, sensorsToUse, trustFactor)
To access a single list element, use [[ ]]. Using [ ] on a list returns a list containing the elements selected inside the single brackets. The warning is shown because TestModels[1] addresses exactly one slot, while the right-hand side is a list containing 3 elements; R stores only the first of them and warns about the rest. The same happens with any single index inside [ ], whether or not the element already exists:
TestModels[2] = list(trainedAt, sensorsToUse, trustFactor) # Still warns: one slot, three replacement values
To understand list subsetting better, take a look at this:
item1 <- list("a", 1:10, c(T, F, T))
item2 <- list("b", 11:20, c(F, F, F))
mylist <- list(item1=item1, item2=item2)
mylist[1] #This returns a list containing the item 1.
#$item1 #Note the item name of the container list
#$item1[[1]]
#[1] "a"
#
#$item1[[2]]
# [1] 1 2 3 4 5 6 7 8 9 10
#
#$item1[[3]]
#[1] TRUE FALSE TRUE
#
mylist[[1]] #This returns item1
#[[1]] #Note this is the same as item1
#[1] "a"
#
#[[2]]
# [1] 1 2 3 4 5 6 7 8 9 10
#
#[[3]]
#[1] TRUE FALSE TRUE
To access the list items by name, just name them when creating the list:
mylist <- list(var1 = "a", var2 = 1:10, var3 = c(T, F, T))
mylist$var1 #Or mylist[["var1"]]
# [1] "a"
You can nest these operators as you suggested. So you could use
containerlist <- list(mylist)
containerlist[[1]]$var1
#[1] "a"
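Putting this together for the model registry described in the question, a minimal sketch could look like the following. The constructor name `make_model_record` is made up, and lm() fits on the built-in cars data stand in for the ksvm models so that the example runs without extra packages.

```r
# Hypothetical constructor: one named list per stored model
make_model_record <- function(model, trained_at, sensors, trust) {
  list(model = model, trainedAt = trained_at,
       sensorsToUse = sensors, trustFactor = trust)
}

TestModels <- list()
TestModels[[1]] <- make_model_record(
  model = lm(dist ~ speed, data = cars),  # stand-in for a ksvm model
  trained_at = paste("Trained at", Sys.time()),
  sensors = c(12, 14, 15, 16, 24, 26),
  trust = 88
)
TestModels[[2]] <- make_model_record(
  lm(dist ~ 1, data = cars), "Trained earlier", c(1, 2), 40
)

# Access fields by name, and pick the most trusted record
trust <- sapply(TestModels, function(m) m$trustFactor)
best <- TestModels[[which.max(trust)]]
```

Each record is an ordinary named list, so best$model can be passed to predict() and best$sensorsToUse can drive the selection of input columns.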

How to change value if error occurs in for loop?

I have a loop that reads HTML table data from ~ 440 web pages. The code on each page is not exactly the same, so sometimes I need table node 1 and sometime I need node 2. Right now I've just been setting the node number manually in a list and feeding it into the loop. My problem is that the page nodes have started changing and updating the node # list is getting to be a hassle.
If the loop encounters the wrong node # (ie: 1 instead of 2, or reverse) it gives an error and shuts down. Is there a way to have the loop replace the erroneous node number to the correct one if it encounters an error, and then keep running the loop as if nothing happened?
Here's the readHTML portion of the code in my loop with an example url:
url <- "http://espn.go.com/nba/player/gamelog/_/id/2991280/year/2013/"
html.page <- htmlParse(url)
tableNodes <- getNodeSet(html.page, "//table")
x <- as.numeric(Players$Nodes[s])
tbl = readHTMLTable(tableNodes[[x]], colClasses = c("character"),stringsAsFactors = FALSE)
Here's the error I get when the node # is wrong:
"Error in readHTMLTable(tableNodes[[x]], colClasses = c("character"), stringsAsFactors = FALSE) : error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[x]] : subscript out of bounds"
Example code:
A <- c("dog", "cat")
Nodes <- as.data.frame(1:1)
#Nodes <- as.data.frame(1:2) <-- This works without errors
colnames(Nodes)[1] <- "Col1"
Nodes2 <- 2
url <-c("http://espn.go.com/nba/player/gamelog/_/id/6639/year/2013/","http://espn.go.com/nba/player/gamelog/_/id/6630/year/2013/")
for (i in 1:length(A))
{
html.page <- htmlParse(url[i])
tableNodes <- getNodeSet(html.page, "//table")
x <- as.numeric(Nodes$Col1[i])
df = readHTMLTable(tableNodes[[x]], colClasses = c("character"),stringsAsFactors = FALSE)
#tryCatch(df) here.....no clue
assign(paste0("", A[i]), df)
}
If you get a "subscript out of bounds" error message, then you should definitely retry with a lower x. Here is a general demo with tryCatch based on the code you posted in the original question (I have replaced x with 2, as I have no idea what Players and s are):
> msg <- tryCatch(readHTMLTable(tableNodes[[2]], colClasses = c("character"),stringsAsFactors = FALSE), error = function(e)e)
> str(msg)
List of 2
$ message: chr "error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[2]] : subscript"| __truncated__
$ call : language readHTMLTable(tableNodes[[2]], colClasses = c("character"), stringsAsFactors = FALSE)
- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
> msg$message
[1] "error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[2]] : subscript out of bounds\n"
> grepl('subscript out of bounds', msg$message)
[1] TRUE
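Building on that, the retry itself can be automated inside the loop. The sketch below uses a made-up helper, `read_with_fallback`, which tries the preferred index first and then every other index until one works. A plain list and identity() stand in for tableNodes and readHTMLTable so the example runs without network access.

```r
# Hypothetical helper: try nodes[[preferred]] first, then the remaining indices.
# 'reader' stands in for readHTMLTable(); any error moves on to the next index.
read_with_fallback <- function(nodes, preferred, reader = identity) {
  candidates <- unique(c(preferred, seq_along(nodes)))
  for (i in candidates) {
    result <- tryCatch(reader(nodes[[i]]), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(index = i, value = result))
    }
  }
  stop("no usable table node found")
}

# Only one node is available, but the stored node number says 2
tableNodes <- list("tableA")
res <- read_with_fallback(tableNodes, preferred = 2)
```

Returning the index that actually worked makes it possible to write the corrected node number back into the lookup list.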
