getNodeSet by attribute values - r

I'm parsing xml with R's getNodeSet function by attribute value with the following code:
getNodeSet(doc, "/body//*[@attribution='HM'][@*='checkmark'][@*='underline']")
The code above returns node content that includes all three of the above values (effectively, 'HM' And 'checkmark' And 'underline').
I'd like the function to return nodes for which the first value stays fixed but the additional values are either/or (effectively, 'HM' AND ('checkmark' OR 'underline')).
Grateful for any help.

The solution is to put the attribute tests to be OR'd inside a single pair of square brackets and join them with the XPath operator or, written without quotes:
getNodeSet(doc, "/body//*[@attribution='HM'][@*='underline' or @*='checkmark']")
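
As a rough check of how the combined predicate behaves, here is a minimal sketch using the XML package. The sample document, its element name p, and the attribute name type are made up for illustration; only the predicate structure follows the answer above.

```r
library(XML)

# Hypothetical sample document: four nodes with different attribute combinations.
xml_text <- paste0(
  '<body>',
  '<p attribution="HM" type="underline">a</p>',
  '<p attribution="HM" type="checkmark">b</p>',
  '<p attribution="HM" type="bold">c</p>',
  '<p attribution="XX" type="underline">d</p>',
  '</body>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# The first predicate must hold; the second accepts either value on any attribute.
hits <- getNodeSet(doc, "/body//*[@attribution='HM'][@*='underline' or @*='checkmark']")
matched <- vapply(hits, xmlValue, character(1))
print(matched)
```

With this sample input, only the first two nodes satisfy both predicates.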


Custom sorting issue in MarkLogic?

xquery version "1.0-ml";
declare function local:sortit(){
for $i in ('a','e','f','b','d','c')
order by $i
return
element Result{
element N{1},
element File{$i}
}
};
local:sortit()
The code above is a sample; I need the data in this format. This sorting function is used in multiple places, and I need only the N element data in some places and only the File element data in others.
But the moment I use local:sortit()//File, the sort order is lost and the output comes back in a seemingly random order. Please let me know the best way to handle this.
All the data in the File element is calculated and comes from multiple files; after all the joins and calculations, it is formed into XML with many elements in it. So sorting with an index is not possible here; only an order by clause can be used.
The results of an XPath path expression are always returned in document order.
You lose the sorting when you apply an XPath to the sequence returned from that function call.
If you want to select only the File in sorted order, try using the simple mapping operator !, plucking the File element from each item as you map over the sequence:
local:sortit() ! File
Or, if you like typing, you can use a FLWOR to iterate over the sequence and return the File:
for $result in local:sortit()
return $result/File

Problems obtaining the correct object class. R

I created a small function to process a data frame so that I can use the function:
preprocessCore::normalize.quantiles()
Since normalize.quantiles() only accepts a matrix object, and I need to rearrange my data, I wrote a small function that takes a specific column (variable) of a specific data frame and does the following:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1$boco,df_p2$boco)
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
However, "mat" should be a matrix, but it seems that cbind() is not doing its job, since I get the following error:
normal(antitrombina_FI,Six_Plex_IID)
Error in preprocessCore::normalize.quantiles(mat) :
Matrix expected in normalize.quantiles
So, it is clear that the cbind() is not creating a matrix. I don't understand why this is happening.
Most likely you are binding two NULL objects together, yielding NULL, which is not a matrix. If your df objects are data.frame, then df_p1$boco is interpreted as "extract the variable named boco", not "extract the variable whose name is the value of an object having the symbol boco". I suspect that your data does not contain a variable literally named "boco", so df_p1$boco is evaluated as NULL.
If you want to extract the column that is given as the value to the formal argument boco in function normal() then you should use [[, not $:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1[[boco]],df_p2[[boco]])
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
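
The difference between $ and [[ can be checked in isolation with base R. The sketch below uses a small made-up data frame as a stand-in for Six_Plex_IID; the helper names by_dollar and by_brackets are hypothetical. Note that it assumes the column name is passed as a character string.

```r
# Hypothetical stand-in for Six_Plex_IID; the values are made up.
df <- data.frame(
  Plate = c(1, 1, 2, 2),
  well = c("A1", "A2", "A1", "A2"),
  antitrombina_FI = c(10, 20, 30, 40)
)

# $ looks for a column literally named "boco", whatever the argument holds.
by_dollar <- function(boco, df) df$boco
# [[ looks up the column whose name is the value stored in boco.
by_brackets <- function(boco, df) df[[boco]]

res_dollar <- by_dollar("antitrombina_FI", df)      # NULL: no column called "boco"
res_brackets <- by_brackets("antitrombina_FI", df)  # the numeric column
res_quoted <- df[["boco"]]                          # also NULL: quoted literal name
res_bind <- cbind(NULL, NULL)                       # NULL, which is not a matrix

print(is.null(res_dollar))
print(res_brackets)
print(is.matrix(res_bind))
```

Under this reading, the function would be called with the column name in quotes, for example normal("antitrombina_FI", Six_Plex_IID), and the body would keep df_p1[[boco]] without quotes around boco.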
Thanks for your help bcarlsen. However I have found some errors:
First, I believe you need to introduce quotes in
mat<-cbind(df_p1[["boco"]],df_p2[["boco"]])
If I run this script outside of a function, it works perfectly:
df_p1<-subset(Six_Plex_IID,Six_Plex_IID$Plate==1)
df_p2<-subset(Six_Plex_IID,Six_Plex_IID$Plate==2)
mat<-cbind(df_p1[["antitrombina_FI"]],df_p2[["antitrombina_FI"]])
norm<-preprocessCore::normalize.quantiles(mat)
However, if I now put this in a function and call it:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1[["boco"]],df_p2[["boco"]])
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
normal(antitrombina_FI,Six_Plex_IID)
I get the same error message:
Error in preprocessCore::normalize.quantiles(mat) :
Matrix expected in normalize.quantiles
I'm completely clueless about why this is happening: why do I get a matrix outside the function but not inside it?
Thanks

Writing a function with text arguments

I have a matrix (colormatrix) whose first column (called NCS.code) holds an identifier and whose second column (called colour.assignment) holds color names.
I would like to create a function (changecolor) that changes the color name (the value of colour.assignment) based on the identifier. I wrote the following, but it does not work.
changecolor<-function(id,color){
locdat<-match(id, colormatrix$NCS.code)
#finds the location where id matches the identifier
colormatrix_lab$colour.assignment[locdat]<-color}
#changes the color name at location locdat into the name that was given as input
changecolor("S0300-N","white")
#test if the code works -- which is not the case.
Thanks for all the help.
When you want to modify global variables from within a function call, you cannot use <-, because the assignment is made in the function's local scope and the global object is left unchanged. To do what you want, try using:
<<- when modifying global data structures from within function calls.
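
A minimal sketch of that suggestion follows. It assumes colormatrix is a data frame (the question indexes it with $) and that the same object is both read and written; the question's code reads colormatrix but writes to colormatrix_lab, which may be a separate problem. The second row and the starting colors are made up.

```r
# Hypothetical data; only the column names come from the question.
colormatrix <- data.frame(
  NCS.code = c("S0300-N", "S0500-N"),
  colour.assignment = c("grey", "grey"),
  stringsAsFactors = FALSE
)

changecolor <- function(id, color) {
  # Find the row where id matches the identifier.
  locdat <- match(id, colormatrix$NCS.code)
  # <<- assigns into the enclosing (here: global) colormatrix, not a local copy.
  colormatrix$colour.assignment[locdat] <<- color
  invisible(NULL)
}

changecolor("S0300-N", "white")
print(colormatrix$colour.assignment)
```

An alternative design is to have the function take the data frame as an argument and return the modified copy, which avoids depending on a global object.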

How to use variable character strings with 'substitute' function in R

I need to be able to fill an expression with the values of an unknown number of variables. The shape of the expression depends on the number of variables.
Example:
Expression1: "italic(y)==a*italic(x)*b"
to become: "y=1.2 x+4.3"
Expression2: "italic(y)==a*italic(x)*b~c"
to become: "y=1.2 x+4.3 -5.3"
Currently I am using the substitute function, but it does not work along with the expression function:
substitute(expression("italic(y)==a*italic(x)*b"),list(a=1.23,b=2.3))
My expression needs to grow as the number of variables (i.e. length of the list) increases. So, next step would be to add the variable c:
substitute(expression("space1*italic(y)==a*italic(x)*b*c"),list(a=1.23,b=2.3,c=3.2))
But I need to change the expression in code, without any manual intervention, and these calls do not read the variable values from the list unless I change them to the following (in which case the expression is no longer expandable, because it is not a string):
substitute(italic(y)==a*italic(x)*b*c,list(a=1.23,b=2.3,c=3.2))
How can I do this?
Here is a script which might be along the lines of what you want. We can iterate the list of replacements using a for loop, and then make a regex replacement of the placeholder in the expression with the corresponding value from the list.
lst <- list(a=1.23,b=2.3)
expression <- "italic(y)==a*italic(x)*b"
for (name in names(lst)) {
expression <- gsub(paste0("\\b", name, "\\b"), lst[[name]], expression)
}
print(expression)
[1] "italic(y)==1.23*italic(x)*2.3"
Note carefully that I search for the variable name surrounded by word boundaries on both sides. If your placeholder would ever be surrounded by other word characters, then my solution would fail, and we would need to change the replacement logic.
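
The loop above can be wrapped in a small helper, and the filled-in string can then be turned back into an expression with parse(text = ...). This is only a sketch: the helper name fill_template is hypothetical, and it assumes every placeholder is a standalone name and every value is a number.

```r
# Replace each placeholder name in the template with its value from the list.
fill_template <- function(template, values) {
  for (name in names(values)) {
    pattern <- paste0("\\b", name, "\\b")
    template <- gsub(pattern, as.character(values[[name]]), template)
  }
  template
}

filled <- fill_template(
  "italic(y)==a*italic(x)*b*c",
  list(a = 1.23, b = 2.3, c = 3.2)
)
print(filled)

# Convert the string into an expression object, e.g. for use as a plot label.
label <- parse(text = filled)
print(is.expression(label))
```

Because the template stays a string until the final parse() call, it can be extended programmatically as the list of values grows.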

Index in xpath expression

In a related post,
How to select specified node within Xpath node sets by index with Selenium?,
it is mentioned that there is "no index i in xpath".
I am trying to use an index in an R loop within an XPath expression such as
getNodeSet(xmlfile, '//first[i]/second/third')
Clearly, according to the above post it works perfectly when replacing 'i' with '1', but not e.g. for i <- 1.
However, the workaround in the above post (i.e. using ['+i+']) does not seem to work.
Any ideas on how to make indices work in XPath expressions?
'//first[i]/second/third' is just a string. Therefore you can use the R string building function paste0() to make your own (R doesn't use + for string concatenation).
getNodeSet(xmlfile, paste0('//first[', i, ']/second/third'))
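
To see what the loop would actually pass to getNodeSet(), the strings can be built on their own in base R. The sketch below only constructs the XPath strings; the range 1:3 is an arbitrary assumption, and sprintf() is shown as an equivalent alternative to paste0().

```r
# Build one XPath string per index value.
xpaths <- character(0)
for (i in 1:3) {
  xpaths <- c(xpaths, paste0('//first[', i, ']/second/third'))
}
print(xpaths)

# sprintf() is vectorised, so the same strings can be built without a loop.
xpaths_sprintf <- sprintf('//first[%d]/second/third', 1:3)
print(identical(xpaths, xpaths_sprintf))
```

Each resulting string can then be passed to getNodeSet(xmlfile, ...) in turn.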
