Extract tree-like data into list in R

I would like to read JSON data from the PubChem API on Paracetamol and extract the ChEBI Ontology information (section 18.1.2) that is stored therein (see screenshot).
That is, I want to get all the entries for each role (i.e. application, biological role and chemical role) into a list structure in R.
For this I get the data via the API and convert it into an R object (chebi). So far so good.
require(httr)
require(jsonlite)
require(data.tree)
# from JSON to R list
qurl = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/classification/JSON?classification_type=simple'
cid = 1983
post = POST(qurl, body = list("cid" = paste(cid, collapse = ',')))
cont_json = try(content(post, type = 'text', encoding = 'UTF-8'), silent = FALSE)
cont = fromJSON(cont_json, simplifyDataFrame = FALSE)
# subset list (i.e. get CHEBI data)
cont_l = cont$Hierarchies$Hierarchy
idx = which(sapply(cont_l, function(x) x$SourceName == 'ChEBI'))
chebi = cont_l[[idx]]
Then from the chebi object I want to retrieve the information which entries each role (i.e. application, biological role, chemical role) contains.
(1) My first idea was to simply extract the Name information. However, I then lose the tree-like structure of the data and don't know which entry belongs to which role.
ch_node = chebi$Node
sapply(ch_node, function(x) x$Information$Name)
(2) Secondly I saw that there's the data.tree package. However I don't know how to convert the chebi object properly.
chebi_tree = as.Node(ch_node) #?
Question: How can I get the role information from the chebi object into a list in R without losing the tree-like structure?
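One possible approach, sketched here under assumptions: PubChem classification nodes typically carry a NodeID, a ParentID and an Information$Name field, so the grouping can be rebuilt by matching children to their parent role node. The field names below are assumptions and should be checked against str(chebi) before use.

```r
# Sketch only: assumes each element of chebi$Node has NodeID, ParentID and
# Information$Name fields (verify with str(chebi); names may differ).
ch_node <- chebi$Node
ids     <- sapply(ch_node, function(x) x$NodeID)
nms     <- sapply(ch_node, function(x) x$Information$Name)
parents <- sapply(ch_node, function(x) if (is.null(x$ParentID)) NA else x$ParentID[1])

# all entries directly below a given node
children_of <- function(id) nms[!is.na(parents) & parents == id]

roles    <- c("application", "biological role", "chemical role")
role_ids <- ids[nms %in% roles]
res      <- setNames(lapply(role_ids, children_of), nms[match(role_ids, ids)])
```

This keeps one list element per role, each holding the names of its child entries, so the tree's first level survives the extraction.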

Related

How to nest output dataframe from for loop in existing dataframe in R

I have a dataframe on members of the US Congress and I'm collecting additional data from Google Trends using the gtrendsR package. Given that Google Trends won't allow me to search all members of Congress in a single query, I've decided to create a for loop that collects the data for one politician at a time.
for (i in df$name_google){
user <- i
res <- gtrends(c("obama", user), geo = c("US"), time = "all")
}
However, the google trends output file (res) is itself a list with a number of dataframes.
I would like to use the for loop to save this list to a new column in df, with each iteration of the loop adding the new res file to the row of the user it just searched. I don't know how to do this, but it would be something like the line I added to the loop below. Let me know if I'm failing to include necessary info.
for (i in df$name_google){
user <- i
res <- gtrends(c("obama", user), geo = c("US"), time = "all")
#df$newlistvariable <- res if df$name == i
}
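One common pattern, sketched here as a starting point: collect each result in a named list keyed by user, then attach the whole list to df as a list-column. This assumes df$name_google values are unique and that "obama" is the intended baseline search term.

```r
# Sketch: store each gtrends() result in a named list, then attach the
# list to df as a list-column (assumes df$name_google values are unique).
results <- setNames(vector("list", nrow(df)), df$name_google)
for (user in df$name_google) {
  results[[user]] <- gtrends(c("obama", user), geo = "US", time = "all")
}
df$gtrends <- results[df$name_google]  # one gtrends result per row
```

Each row's result can then be retrieved as df$gtrends[[i]], which is itself the list of dataframes gtrends() returns.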

Using filters with a list in R with Google Analytics package

I would like to extract a specific set of product SKUs from Google Analytics with several metrics included. The set of SKUs that I would like to extract is stored in a list. I cannot seem to get Analytics to do what I need it to do.
I have been trying to figure out how to filter on a list. The most common answer that I am able to find on how to use dim_filter() is from this website:
https://www.rdocumentation.org/packages/googleAnalyticsR/versions/0.7.0/topics/dim_filter
I have tried multiple ways to get my answer and am always getting errors at many different parts of the code.
stDate <- "2019-07-18"
endDate <- "2019-09-30"
x <- list(BC$sku)
#Get all of the info from Analytics for products
b <- google_analytics(ga_id,
date_range = c(stDate, endDate),
metrics = c("itemQuantity", "itemRevenue", "productDetailViews"),
dimensions = c("productSku"),
dim_filter = x,
anti_sample = TRUE)
The above code gives me the following error:
Error in as(dim_filters, ".filter_clauses_ga4") :
no method or default for coercing “list” to “.filter_clauses_ga4”
I am not able to get any output from this code as the filter is not working.
I can of course, query the entire dataset, but that becomes cumbersome very fast as I would like to be able to query the Google Analytics API with a specific set of skus anytime that I would like.
You need to construct your filter object properly to handle the list of SKUs. As specified in the official googleAnalyticsR documentation, dim_filters are created with the dim_filter() function and combined with filter_clause_ga4().
You can send in a list of dim_filter() objects, or try using the "IN_LIST" operator with your character vector (if it's not too big).
In the latter case, the final code would look something like this:
stDate <- "2019-07-18"
endDate <- "2019-09-30"
# if small enough list
dim_filters <- list(dim_filter("productSku", "IN_LIST", BC$sku))
#Get all of the info from Analytics for products
b <- google_analytics(ga_id,
date_range = c(stDate, endDate),
metrics = c("itemQuantity", "itemRevenue", "productDetailViews"),
dimensions = c("productSku"),
dim_filters = filter_clause_ga4(dim_filters),
anti_sample = TRUE)
# this may work as well to construct the filter
dim_filters <- lapply(BC$sku, function(x) {
  dim_filter("productSku", operator = "EXACT", expressions = x)
})
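If you go with the lapply() variant, the individual filters still need to be combined into a single clause before being passed to google_analytics(); filter_clause_ga4() takes an operator argument for that. A sketch, assuming ga_id and the date variables from above:

```r
# Combine the per-SKU filters with OR so a row matching any SKU passes
dim_filters_clause <- filter_clause_ga4(dim_filters, operator = "OR")
b <- google_analytics(ga_id,
                      date_range = c(stDate, endDate),
                      metrics = c("itemQuantity", "itemRevenue", "productDetailViews"),
                      dimensions = c("productSku"),
                      dim_filters = dim_filters_clause,
                      anti_sample = TRUE)
```

"OR" is the right operator here because a product SKU can only ever equal one of the expressions; "AND" would match nothing.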

R - How would I create a structure to hold data

I am using R, and I have some genes which have metadata (fields) such as name, type, and other data.
What I want to do is create some object or dictionary that will allow me to store the data in fields, look up a gene via a key, and then access its fields. How can I do this in R?
Here is an example:
gene{
name = "geneName",
type = "some type",
...
...
}
I can access the gene as such:
gene <- gene("geneName")
type <- gene.type
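A minimal sketch of one way to get this shape in base R: a named list acts as the record, and an environment (which is hash-backed) acts as the dictionary keyed by gene name. The field names below are illustrative only.

```r
# A named list works as a record; an environment works as a dictionary.
gene <- function(name, type, ...) list(name = name, type = type, ...)

genes <- new.env(hash = TRUE)
genes[["geneName"]] <- gene("geneName", "some type", chromosome = "7q31.2")

g <- genes[["geneName"]]
g$type  # "some type"
```

A plain named list of lists works too; the environment mainly buys constant-time lookup and reference semantics. For many genes with the same fields, a data.frame (one row per gene) is often the more idiomatic choice.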

How to Create Nodes in RNeo4j using Vectors or Dataframes

The popular graph database Neo4j can be used within R thanks to the package/driver RNeo4j (https://github.com/nicolewhite/Rneo4j).
The package author, @NicoleWhite, provides several great examples of its usage on GitHub.
Unfortunately for me, the examples given by @NicoleWhite and the documentation are a bit oversimplistic, in that they manually create each graph node and its associated labels and properties, such as:
mugshots = createNode(graph, "Bar", name = "Mugshots", location = "Downtown")
parlor = createNode(graph, "Bar", name = "The Parlor", location = "Hyde Park")
nicole = createNode(graph, name = "Nicole", status = "Student")
addLabel(nicole, "Person")
That's all good and fine when you're dealing with a tiny example dataset, but this approach isn't feasible for something like a large social graph with thousands of users, where each user is a node (such graphs might not utilize every node in every query, but they still need to be input to Neo4j).
I'm trying to figure out how to do this using vectors or dataframes. Is there a solution, perhaps involving an apply statement or a for loop?
This basic attempt:
for (i in 1:length(df$user_id)){
paste(df$user_id[i]) = createNode(graph, "user", name = df$name[i], email = df$email[i])
}
Leads to Error: 400 Bad Request
As a first attempt, you should look at the functionality I just added for the transactional endpoint:
http://nicolewhite.github.io/RNeo4j/docs/transactions.html
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
clear(graph)
data = data.frame(Origin = c("SFO", "AUS", "MCI"),
FlightNum = c(1, 2, 3),
Destination = c("PDX", "MCI", "LGA"))
query = "
MERGE (origin:Airport {name:{origin_name}})
MERGE (destination:Airport {name:{dest_name}})
CREATE (origin)<-[:ORIGIN]-(:Flight {number:{flight_num}})-[:DESTINATION]->(destination)
"
t = newTransaction(graph)
for (i in 1:nrow(data)) {
origin_name = data[i, ]$Origin
dest_name = data[i, ]$Destination
flight_num = data[i, ]$FlightNum
appendCypher(t,
query,
origin_name = origin_name,
dest_name = dest_name,
flight_num = flight_num)
}
commit(t)
cypher(graph, "MATCH (o:Airport)<-[:ORIGIN]-(f:Flight)-[:DESTINATION]->(d:Airport)
RETURN o.name, f.number, d.name")
Here, I form a Cypher query and then loop through a data frame and pass the values as parameters to the Cypher query. Your attempts right now will be slow, because you're sending a separate HTTP request for each node created. By using the transactional endpoint, you create several things under a single transaction. If your data frame is very large, I would split it up into roughly 1000 rows per transaction.
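The 1000-rows-per-transaction suggestion can be sketched like this (untested, reusing the data, query and graph objects from above):

```r
# Sketch: commit one transaction per chunk of ~1000 rows
chunk_size <- 1000
for (s in seq(1, nrow(data), by = chunk_size)) {
  rows <- s:min(s + chunk_size - 1, nrow(data))
  t <- newTransaction(graph)
  for (i in rows) {
    appendCypher(t, query,
                 origin_name = data[i, ]$Origin,
                 dest_name   = data[i, ]$Destination,
                 flight_num  = data[i, ]$FlightNum)
  }
  commit(t)
}
```

This bounds how much work is buffered in any single transaction while still avoiding one HTTP round-trip per node.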
As a second attempt, you should consider using LOAD CSV in the neo4j-shell.

How to access data frames in list by name

I'm pulling large sets of data for multiple sites and views from Google Analytics for processing in R. To streamline the process, I've added my most common queries to a function (so I only have to pass the profile ID and date range). Each query is stored as a local variable in the function, and is assigned to a dynamically-named global variable:
# R version 3.1.1
library(rga)
library(stargazer)
# I would add a dataset, but my data is locked down by client agreements and I don't currently have any test sites configured.
profiles <- ga$getProfiles()
website1 <- profiles[1,]
start <- "2013-01-01"
end <- "2013-12-31"
# profiles are objects containing all the ID's, accounts #, etc.; start and end specify date range as strings (e.g. "2014-01-01")
reporting <- function(profile, start, end){
id <- profile[,1] #sets profile number from profile object
#rga function for building and submitting query to API
general <- ga$getData(id,
start.date = start,
end.date = end,
metrics = "ga:sessions")
... # additional queries, structured similarly to the example above (e.g. countries, cities, etc.)
#transforms name of profile object to string
profileName <- deparse(substitute(profile))
#appends "Data" to profile object name
temp <- paste(profileName, "Data", sep="")
#stores query results as list
temp2 <- list(general,countries,cities,devices,sources,keywords,pages,events)
#assigns list of query results and stores it globally
assign(temp, temp2, envir=.GlobalEnv)
}
#call reporting function at head of report or relevant section
reporting(website1,start,end)
#returns list of data frames returned by the ga$getData(...), but within the list they are named "data.frame" instead of their original query name.
#generate simple summary table with stargazer package for display within the report
stargazer(website1[1])
I'm able to access these results through website1Data[1], but I'm handing the data off to collaborators. Ideally, they should be able to access the data by name (e.g. website1Data$countries).
Is there an easier/better way to store these results, and to make accessing them easier from within an .Rmd report?
There's no real reason to do the deparse inside the function just to assign a variable in the parent environment. If you have to call the reporting() function, just have that function return a value, and assign the result at the call site:
reporting <- function(profile, start, end){
#... all the other code
#return results
list(general=general,countries=countries,cities=cities,
devices=devices,sources=sources,keywords=keywords,
pages=pages,events=events)
}
#store results
websiteResults <- reporting(website1,start,end)
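Because the returned list is named, collaborators can then pull each query result by name rather than by position, e.g.:

```r
# Access individual query results by name from the returned list
head(websiteResults$countries)
stargazer(websiteResults$general)
```

This also keeps the .Rmd report self-contained: nothing depends on a dynamically named global variable existing in the session.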
