How to convert a "list" to "EUtilsSummary" class in R

I am trying to search for several different terms with the RISmed package in R, as shown below:
library(RISmed)
library(rentrez)
library(dplyr)
search_topic<-c("KRAS AND MEK inhibitor","BRAF AND BRAF inhibitor")
search_query <- lapply((search_topic),EUtilsSummary, retmax=50,
mindate=2000, maxdate=2017)
search_query is a list, and my next step is to get the PubMed IDs returned for each search term. However, when I try to get them using
QueryId(search_query)
I get the error:
unable to find an inherited method for function ‘QueryId’ for signature ‘"list"’
I understand that QueryId works on the EUtilsSummary class, which is why it fails on a list. I tried converting it using
as(search_query, "EUtilisSummary", strict=TRUE, ext)
but this fails and the error is:
no method or default for coercing “list” to “EUtilisSummary”.
How do I convert this list object into the EUtilsSummary class? Thanks in advance!

lapply returns a list of objects whose classes are determined by the function called:
library(RISmed)
library(rentrez)
library(dplyr)
search_topic <- c("KRAS AND MEK inhibitor","BRAF AND BRAF inhibitor")
search_query <- lapply((search_topic),EUtilsSummary, retmax=50,
mindate=2000, maxdate=2017)
In this case search_query is a list (see class(search_query)) containing objects of class EUtilsSummary (see class(search_query[[1]])).
To manipulate such objects in a list, you can use another lapply with a function that accepts them as arguments:
lapply(search_query, QueryId)
#output:
[[1]]
[1] "29079711" "29067643" "28982179" "28982154" "28957417" "28938614" "28866094" "28807001" "28797845" "28783173"
[11] "28775144" "28746882" "28619758" "28581516" "28574828" "28554329" "28551618" "28492898" "28459468" "28372922"
[21] "28301591" "28248226" "28215705" "28178529" "28167505" "28154798" "28152546" "28062115" "28060183" "28002807"
[31] "27997540" "27922010" "27876675" "27846317" "27834733" "27822414" "27821484" "27803104" "27793696" "27167191"
[41] "27733477" "27699948" "27670374" "27496137" "27484466" "27469379" "27467210" "27441499" "27422710" "27399335"
[[2]]
[1] "29100459" "29098416" "29096034" "29094484" "29085667" "29084636" "29079332" "29074209" "29072975" "29070145"
[11] "29066909" "29059158" "29054724" "29050517" "29050239" "29050218" "29048432" "29043205" "29040023" "29028954"
[21] "29028788" "28994264" "28991513" "28986666" "28984520" "28984291" "28984141" "28982601" "28982154" "28981385"
[31] "28979142" "28978720" "28976960" "28973166" "28963969" "28963614" "28961465" "28960564" "28959611" "28951457"
[41] "28947956" "28939558" "28936920" "28931905" "28923537" "28923400" "28919996" "28918496" "28915798" "28893027"
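The same pattern can be tried without network access. The sketch below uses a toy S4 class and generic (ToySummary and toyIds are hypothetical stand-ins for EUtilsSummary and QueryId, not part of RISmed) to show why the generic must be applied per element, and how the results could be labelled by search term:

```r
library(methods)

# Hypothetical stand-ins for EUtilsSummary and QueryId (not part of RISmed)
setClass("ToySummary", representation(ids = "character"))
setGeneric("toyIds", function(object) standardGeneric("toyIds"))
setMethod("toyIds", "ToySummary", function(object) object@ids)

topics <- c("topic A", "topic B")

# lapply returns a plain list; each element keeps the class of the constructor's result
results <- lapply(topics, function(tp) new("ToySummary", ids = paste0(tp, "-", 1:2)))

# toyIds(results) would fail because there is no method for signature "list",
# so the generic is applied to each element instead
ids_by_topic <- setNames(lapply(results, toyIds), topics)
```

Naming the result with setNames is optional, but it makes it easier to tell which IDs belong to which search term.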

Related

How to turn each value in a list into a list of lists, made of values from a function

I have a list of >1000 values. dput(head(list)):
list("WP100", "WP106", "WP107", "WP111", "WP117", "WP12")
Function getXrefList from rWikiPathways package produces a new list of associated genes for each WP value.
input: (H parameter gives genes in the external gene format)
library(rWikiPathways)
getXrefList(list[[1]], 'H')
output:
[1] "ANPEP" "G6PD" "GCLC" "GCLM" "GGT1" "GGT5" "GPX1" "GPX2" "GPX3" "GPX4" "GSR" "GSS" "GSTA1"
[14] "GSTA5" "GSTM1" "GSTM2" "GSTT2" "IDH1" "OPLAH"
I would like to turn the original list into a list of lists, where each WP value labels the ith element and that element holds the list of genes returned by getXrefList.
Current progress:
new_list <- lapply(list, function(x){getXrefList(x, 'H')})
new_list[[1]]
output:
"ANPEP" "G6PD" "GCLC" "GCLM" "GGT1" "GGT5" "GPX1" "GPX2" "GPX3" "GPX4" "GSR" "GSS" "GSTA1" "GSTA5" "GSTM1" "GSTM2" "GSTT2" "IDH1" "OPLAH"
Whereas the ideal output would look something like this:
List
WP100
"ANPEP" "G6PD" "GCLC" "GCLM" "GGT1" "GGT5" "GPX1" "GPX2" "GPX3" "GPX4" "GSR" "GSS" "GSTA1" "GSTA5" "GSTM1" "GSTM2" "GSTT2" "IDH1" "OPLAH"
WP106
"ABAT" "AGXT" "ASL" "ASPA" "ASS1" "DARS" "GAD1" "GAD2" "GOT1" "GOT2" "GPT" "PC"
And so forth in this fashion. Any help would be appreciated.
Solved. Easy fix.
names(new_list) <- list
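A minimal sketch of the same fix on made-up data, in case it helps: fake_lookup is a hypothetical stand-in for getXrefList (so the example runs without rWikiPathways), and ids plays the role of the original list. It also shows a one-step alternative with sapply that names the result directly.

```r
ids <- list("WP100", "WP106")

# Hypothetical stand-in for getXrefList(id, 'H'); returns two fake gene labels per ID
fake_lookup <- function(id, code) paste(id, code, 1:2, sep = "_")

# Two steps: build the list, then name it after the IDs
new_list <- lapply(ids, fake_lookup, "H")
names(new_list) <- unlist(ids)

# One step: sapply over a character vector uses its values as names
new_list2 <- sapply(unlist(ids), fake_lookup, "H", simplify = FALSE, USE.NAMES = TRUE)

new_list[["WP106"]]  # look up the genes for one pathway by name
```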

Specify order of import for multiple tables in R

I'm trying to read in 360 data files in text format. I can do so using this code:
temp = list.files(pattern="*.txt")
myfiles = lapply(temp, read.table)
The problem I have is that the files are named as "DO_1, DO_2,...DO_360" and when I try to import the files into a list, they do not maintain this order. Instead I get DO_1, DO_10, etc. Is there a way to specify the order in which the files are imported and stored? I didn't see anything in the help pages for list.files or read.table. Any suggestions are greatly appreciated.
lapply will process the files in the order you have them stored in temp. So your goal is to sort them the way you actually think about them. Luckily there is the mixedsort function from the gtools package that does just the kind of sorting you're looking for. Here is a quick demo.
> library(gtools)
> vals <- paste("DO", 1:20, sep = "_")
> vals
[1] "DO_1" "DO_2" "DO_3" "DO_4" "DO_5" "DO_6" "DO_7" "DO_8" "DO_9"
[10] "DO_10" "DO_11" "DO_12" "DO_13" "DO_14" "DO_15" "DO_16" "DO_17" "DO_18"
[19] "DO_19" "DO_20"
> vals <- sample(vals)
> sort(vals) # doesn't give us what we want
[1] "DO_1" "DO_10" "DO_11" "DO_12" "DO_13" "DO_14" "DO_15" "DO_16" "DO_17"
[10] "DO_18" "DO_19" "DO_2" "DO_20" "DO_3" "DO_4" "DO_5" "DO_6" "DO_7"
[19] "DO_8" "DO_9"
> mixedsort(vals) # this is the sorting we're looking for.
[1] "DO_1" "DO_2" "DO_3" "DO_4" "DO_5" "DO_6" "DO_7" "DO_8" "DO_9"
[10] "DO_10" "DO_11" "DO_12" "DO_13" "DO_14" "DO_15" "DO_16" "DO_17" "DO_18"
[19] "DO_19" "DO_20"
So in your case you just want to do
library(gtools)
temp <- mixedsort(temp)
before your call to lapply that calls read.table.
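If adding a dependency is not desirable, a base R sketch is to extract the number from each file name and order by it. This assumes the names follow the pattern DO_<number>.txt; the file names below are made up for illustration.

```r
# Example file names in the order list.files() might return them
temp <- c("DO_1.txt", "DO_10.txt", "DO_2.txt", "DO_360.txt")

# Pull out the numeric part and sort the names by it
file_num <- as.integer(sub("^DO_([0-9]+)\\.txt$", "\\1", temp))
temp_sorted <- temp[order(file_num)]

# temp_sorted can then be passed to lapply(temp_sorted, read.table)
```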

Is there a way to read a non-standard table built from organizing tags without a loop?

As far as we know, parsing libraries such as XML and xml2 can read standard tables on a web page perfectly. But some tables have no table grid and are organized with tags such as “<span>” and “<div>” instead.
Now I am dealing with a table like this.
The structure of the table is marked up with “<span>”, and every four “<span>” tags make up one record. I used a loop to solve this problem and succeeded, but I want to process it without a loop. I heard that the purrr library may help with this problem, but I don’t know how to use it in this situation.
I did my analysis with both “XML” and “xml2”:
Analysis with the “XML” package
pg<-"http://www.irgrid.ac.cn/simple-search?fq=eperson.unique.id%3A311007%5C-000920"
library(XML)
tableNodes <- getNodeSet(htmlParse(pg), "//table[@class='miscTable2']")
itemlines <- xpathApply(tableNodes[[1]], "//tr[@class='itemLine']/td[@width='750']")
ispan <- xmlElementsByTagName(itemlines[[2]], "span")
title <- xmlValue(ispan$span)
isuedate <- xmlValue(ispan$span[1,2])
author <- xmlValue(ispan$span[3])
In this case, “XML” returned a list of spans. The list looks strange, although its names are what I expected:
> attributes(ispan)
$names
[1] "span" "span" "span" "span"
It seems to have only one row with four columns, but it does not. The second to fourth “span” elements cannot be selected by column. The first “span” occupies two columns, and the other “span” elements cannot be retrieved.
> val <- xmlValue(ispan$span[[1]])
> val
[1] "超高周疲劳裂纹萌生与初始扩展的特征尺度"
> isuedate <- xmlValue(ispan$span[[2]])
> isuedate
[1] " \r\n [科普文章]"
> isuedate <- xmlValue(ispan$span[[3]])
> isuedate
[1] NA
> author <- xmlValue(ispan$span[[4]])
> author
[1] NA
None of the selection methods used for lists works:
> title <- xmlValue(ispan$span[1,1])
Error in UseMethod("xmlValue") :
no applicable method for 'xmlValue' applied to an object of class "c('XMLInternalNodeList', 'XMLNodeList')"
title <- xmlValue(ispan$span[1,])
Error in UseMethod("xmlValue") :
no applicable method for 'xmlValue' applied to an object of class "c('XMLInternalNodeList', 'XMLNodeList')"
author <- xmlValue(ispan[1,3])
Error in ispan[1, 3] : incorrect number of dimensions
Analysis with “xml2”
With “xml2”, the “span” tags cause the same problem:
pg<-"http://www.irgrid.ac.cn/simple-search?fq=eperson.unique.id%3A311007%5C-000920"
library(xml2)
itemspantab <- xml_find_all(read_html(pg, encoding = "UTF-8"), "//table[@class='miscTable2']")
itemspan <- xml_child(itemspantab, "span")
It could not gather any of these “span” tags:
> itemspan
{xml_nodeset (1)}
[1] <NA>
If we go a step further and locate the “span” tags directly, it still returns nothing:
> itemspanl <- xml_find_all(itemspantab, '//tr[@class="itemLine"]/td/span')
> itemspan <- xml_child(itemspanl, "span")
> itemspan
{xml_nodeset (40)}
[1] <NA>
[2] <NA>
[3] <NA>
...
A suggestion told me to use library(purrr) for this, but as far as I can tell “purrr” processes data frames only, so the “list” prepared by “xml2” could not be analyzed.
I would like to avoid a loop and get a result like the one below. Can this be done? I hope that people with experience in “XML” and “xml2” can give me some advice on how to deal with this non-standard table. Thanks a lot.
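One loop-free way to group every four values into a record, sketched in base R under the assumption that each record really consists of exactly four “span” texts in a fixed order. The span_text vector and the column names are made up for illustration; in practice the vector might come from something like xml_text(xml_find_all(doc, "//tr[@class='itemLine']/td/span")), which has not been checked against the actual page.

```r
# Made-up span texts, four per record, in document order
span_text <- c("Title A", "[Type A]", "2016", "Author A",
               "Title B", "[Type B]", "2017", "Author B")

# Fill a 4-column matrix row by row, so each row is one record
records <- matrix(span_text, ncol = 4, byrow = TRUE,
                  dimnames = list(NULL, c("title", "type", "date", "author")))
records <- as.data.frame(records, stringsAsFactors = FALSE)

records$author  # one author per record
```

If some records have fewer than four spans, this reshaping would misalign the columns, so it would be worth checking that length(span_text) is a multiple of four first.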

Plot lists of data after assigning them in a function

I started using R for a course on Computational Fluid Dynamics, and in one of the first lessons we have to create a function that outputs two lists of data. So I wrote this function:
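Green.Ampt <- function(param) {
  # Unpack the parameter vector
  k <- param[1]
  Psi <- param[2]
  DTheta <- param[3]
  h <- param[4]          # number of time steps
  F1 <- 0.65             # initial guess for the iteration
  vector.F2 <- 1:h
  vector.f <- 1:h
  for (tempo in 1:h) {
    DeltaF <- 1
    # Iterate until successive estimates differ by at most 0.01
    while (DeltaF > 0.01) {
      F2 <- k * tempo + Psi * DTheta * log(F1 / (Psi * DTheta) + 1)
      DeltaF <- abs(F1 - F2)
      F1 <- F2
    }
    vector.F2[tempo] <- F2
    vector.f[tempo] <- k * (Psi * DTheta / F2 + 1)
  }
  OUT <- list(vector.F2, vector.f)
  return(OUT)
}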
I ran the function with Green.Ampt(c(0.65,16.7,0.34,10)) and then checked the console, where I received the following output:
[[1]]
[1] 3.152985 4.745484 6.077012 7.284812 8.404389 9.469498
[7] 10.490538 11.474561 12.434380 13.371189
[[2]]
[1] 1.8205417 1.4277289 1.2573215 1.1566294 1.0891396 1.0397461
[7] 1.0018123 0.9716419 0.9468141 0.9260188
I want to give these two series of data names because I need to plot them, but I have not managed to do so.
Save the return value of your function as an object, which in this case will be a list. You can then extract the components of the list using [[ notation in a call to plot:
x <- Green.Ampt(c(0.65,16.7,0.34,10))
plot(x[[1]], x[[2]])
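To address the naming part of the question, the list elements can also be given names and then accessed with $. This is only a sketch: the numbers are the first three values of each series from the output above, and the names F2 and f are an assumption taken from the vector.F2 and vector.f variables inside the function.

```r
# First three values of each series, copied from the printed output
x <- list(c(3.152985, 4.745484, 6.077012),
          c(1.8205417, 1.4277289, 1.2573215))

# Name the two components (names chosen here for illustration)
names(x) <- c("F2", "f")

# Components can now be accessed by name instead of position
plot(x$F2, x$f, type = "b", xlab = "F2", ylab = "f")
```

The same names could be set inside the function by returning list(F2 = vector.F2, f = vector.f).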

Iterating an R Script as a function of sequential survey questions

The function below works perfectly for my purpose. The display is wonderful. Now my problem is I need to be able to do it again, many times, on other variables that fit other patterns.
In this example, I've output results for "q4a", I would like to be able to do it for sequences of questions that follow patterns like: q4 < a - z > or q < 4 - 10 >< a - z >, automagically.
Is there some way to iterate this such that the specified variable (in this case q4a) changes each time?
Here's my function:
require(reshape) # Using it for melt
require(foreign) # Using it for read.spss
d1 <- read.spss(...) ## Read in SPSS file
attach(d1,warn.conflicts=F) ## Attach SPSS data
q4a_08 <- d1[,grep("q4a_",colnames(d1))] ## Pull in everything matching q4a_X
q4a_08 <- melt(q4a_08) ## restructure data for post-hoc
detach(d1)
q4aaov <- aov(formula=value~variable,data=q4a) ## anova
Thanks in advance!
Not sure if this is what you are looking for, but to generate the list of questions:
> gsub('^', 'q', gsub(' ', '',
apply(expand.grid(1:10,letters),1,
function(r) paste(r, sep='', collapse='')
)))
[1] "q1a" "q2a" "q3a" "q4a" "q5a" "q6a" "q7a" "q8a" "q9a" "q10a"
[11] "q1b" "q2b" "q3b" "q4b" "q5b" "q6b" "q7b" "q8b" "q9b" "q10b"
[21] "q1c" "q2c" "q3c" "q4c" "q5c" "q6c" "q7c" "q8c" "q9c" "q10c"
[31] "q1d" "q2d" "q3d" "q4d" "q5d" "q6d" "q7d" "q8d" "q9d" "q10d"
[41] "q1e" "q2e" "q3e" "q4e" "q5e" "q6e" "q7e" "q8e" "q9e" "q10e"
[51] "q1f" "q2f" "q3f" "q4f" "q5f" "q6f" "q7f" "q8f" "q9f" "q10f"
[61] "q1g" "q2g" "q3g" "q4g" "q5g" "q6g" "q7g" "q8g" "q9g" "q10g"
[71] "q1h" "q2h" "q3h" "q4h" "q5h" "q6h" "q7h" "q8h" "q9h" "q10h"
[81] "q1i" "q2i" "q3i" "q4i" "q5i" "q6i" "q7i" "q8i" "q9i" "q10i"
[91] "q1j" "q2j" "q3j" "q4j" "q5j" "q6j" "q7j" "q8j" "q9j" "q10j"
...
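A shorter way to build the same kind of vector, restricted to the q<4-10><a-z> pattern mentioned in the question, might look like the sketch below. The column names num and letter are chosen here for illustration.

```r
# All combinations of question number and letter; the first column varies fastest
grid <- expand.grid(num = 4:10, letter = letters, stringsAsFactors = FALSE)

# Paste the pieces together without intermediate gsub calls
prefixes <- paste0("q", grid$num, grid$letter)

head(prefixes, 3)
```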
And then you turn your inner part of the analysis into a function that takes the question prefix as a parameter:
analyzeQuestion <- function (prefix)
{
  q <- d1[,grep(prefix,colnames(d1))] ## Pull in everything matching the prefix
  q <- melt(q) ## restructure data for post-hoc
  qaaov <- aov(formula=value~variable,data=q4a) ## anova
  return (LTukey(qaaov,which="",conf.level=0.95)) ## Tukey's post-hoc
}
Now, I'm not sure where your 'q4a' variable is coming from (as used in aov(..., data=q4a)), so I'm not sure what to do about that bit. But hopefully this helps.
To put the two together you can use sapply() to apply the analyzeQuestion function to each of the prefixes that we automagically generated.
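A self-contained sketch of that last step on simulated data. Several things here are assumptions rather than the original code: the data frame d1 is random, base R's stack() is swapped in for reshape's melt() to avoid the dependency, the ANOVA is fitted on the reshaped data itself, and the Tukey step is left out.

```r
set.seed(1)

# Simulated survey data: two questions with two items each (made up)
d1 <- data.frame(q4a_1 = rnorm(20), q4a_2 = rnorm(20, mean = 1),
                 q4b_1 = rnorm(20), q4b_2 = rnorm(20))

analyze_question <- function(prefix, data) {
  # Select the columns that start with e.g. "q4a_"
  cols <- grep(paste0("^", prefix, "_"), colnames(data), value = TRUE)
  # stack() returns a long data frame with columns 'values' and 'ind'
  long <- stack(data[, cols, drop = FALSE])
  aov(values ~ ind, data = long)
}

prefixes <- c("q4a", "q4b")

# simplify = FALSE keeps a list of model objects, named by prefix
fits <- sapply(prefixes, analyze_question, data = d1, simplify = FALSE)

summary(fits[["q4a"]])
```

Passing the data frame as an argument keeps the function from depending on a global d1 or on attach().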
I would recommend melting the entire dataset and then splitting variable into its component pieces. Then you can more easily use subset to look at (e.g.) just question four: subset(molten, q == 4).
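A rough sketch of that idea in base R, assuming column names of the form q<number><letter>_<item>. The data are made up, stack() stands in for melt(), and the new column names q, part, and item are chosen here for illustration.

```r
# Made-up wide data: question 4a has two items, question 5b has one
d1 <- data.frame(q4a_1 = c(1, 2), q4a_2 = c(3, 4), q5b_1 = c(5, 6))

# Long format: one row per value, with the source column name in 'ind'
molten <- stack(d1)
v <- as.character(molten$ind)

# Split the column name into question number, letter, and item number
molten$q    <- as.integer(sub("^q([0-9]+)[a-z]_[0-9]+$", "\\1", v))
molten$part <- sub("^q[0-9]+([a-z])_[0-9]+$", "\\1", v)
molten$item <- as.integer(sub("^q[0-9]+[a-z]_([0-9]+)$", "\\1", v))

# Just question four
q4 <- subset(molten, q == 4)
```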
