Using a for loop to create dynamic objects in R

I'm trying to use a for loop to create a set of dynamic objects in R. These will contain a list of organisations and values against a certain metric--each output will be the values of an individual metric.
In practice, this will be used to create chart objects using ggplot2, which I'll then use in RMarkdown. For the example below, it's just a sample using a head() function for each metric.
I tried using the paste function to create this name, but it gives the following error:
Error in paste("organisation_short", "_", MetricIDs[x]) <-
head(organisationdata_Jan2021) : target of assignment expands to
non-language object
I understand that the assign function might help, but I'm not sure how to use it (my attempts also produced errors). I found a similar question, linked below, but it is set up in a way that pipes data directly into assign, and I'm not clear what "value = ." is doing there:
dynamically name objects in R
I believe "value = ." refers to the data being piped into the assign function. I created an alternative version, shown in the code below, but it fails with:
Error in assign(x = organisationdata_Jan2021, value = paste0("sampledata", :
invalid first argument
The idea is to create output files along the lines of: organisation_short_ABC123, organisation_short_ABC323, organisation_short_KJM088
I would be grateful for any guidance you might have!
MetricIDs <- c('ABC123','ABC323','KJM088')
# Attempt using paste
for (x in 1:3)
{
organisationdata_Jan2021 <- organisationdata_CM0040_Jan2021 %>% filter(Metric_ID==MetricIDs[x]) # Filter data to specific Metric ID
paste("organisation_short","_", MetricIDs[x]) <- head(organisationdata_Jan2021) # Goal: Create object that includes the Metric ID.
}
# Attempt using assign
for (x in 1:3)
{
organisationdata_Jan2021 <- organisationdata_CM0040_Jan2021 %>% filter(Metric_ID==MetricIDs[x]) # Filter data to specific Metric ID
assign(x=organisationdata_Jan2021, value=paste0("sampledata",MetricIDs[x]))
}
# Expected object names: organisation_short_ABC123, organisation_short_ABC323, organisation_short_KJM088
# This will be used to create chart objects using ggplot2, and those objects will be used in an R MarkDown document.
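One possible way to get the object names listed above: assign() expects the object name (a string) as its first argument and the data as value, whereas the second attempt has the two swapped. A minimal sketch, assuming a small made-up data frame in place of organisationdata_CM0040_Jan2021 and base subsetting in place of dplyr's filter so that it runs without extra packages:

```r
# Toy stand-in for organisationdata_CM0040_Jan2021 (hypothetical values)
organisationdata_CM0040_Jan2021 <- data.frame(
  Organisation = c("Org A", "Org B", "Org A", "Org B", "Org A", "Org B"),
  Metric_ID    = c("ABC123", "ABC123", "ABC323", "ABC323", "KJM088", "KJM088"),
  Value        = c(10, 12, 7, 9, 3, 4),
  stringsAsFactors = FALSE
)

MetricIDs <- c("ABC123", "ABC323", "KJM088")

# Option 1: assign() takes the name string first, then the value
for (id in MetricIDs) {
  metric_data <- organisationdata_CM0040_Jan2021[
    organisationdata_CM0040_Jan2021$Metric_ID == id, ]
  assign(paste0("organisation_short_", id), head(metric_data))
}

# Option 2: keep the results in one named list instead of separate objects
organisation_short <- lapply(
  setNames(MetricIDs, MetricIDs),
  function(id) {
    head(organisationdata_CM0040_Jan2021[
      organisationdata_CM0040_Jan2021$Metric_ID == id, ])
  }
)
```

The named list in option 2 is often easier to loop over later, for example when building one ggplot2 chart per metric, because the elements can be reached with organisation_short[["ABC123"]] rather than by reconstructing object names.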

Related

Create empty data tables in R

Let's say I need to create empty data tables with the following names (created only if they don't already exist in my environment):
# names of datatables that should be created
dt_list <- c("results_1",
"results_2",
"final_results",
"model_results")
That is, I need to get empty (no columns) data tables: results_1, results_2, final_results, model_results (in reality, I have a much longer list of data tables that should be created if they don't exist).
I read the thread but didn't find a suitable solution.
I tried something like this, but it doesn't work:
# create an empty data.table if not exists
for(dt in 1:length(dt_list)){
if(!exists(dt_list[dt])){
dt_list[dt] <- data.table()
}
}
Error in dt_list[dt] <- data.table() : replacement has length zero
I would be grateful for any help!
Try this:
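library(data.table)

# Create an empty data.table for each name that does not exist yet
for (dt in seq_along(dt_list)) {
  if (!exists(dt_list[dt])) {
    # assign() binds the value to the name held in the string
    assign(dt_list[dt], data.table())
  }
}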
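The same pattern can be tried without data.table installed. A minimal sketch, assuming base data.frame() as a stand-in for data.table() and a pre-existing results_1 object (made up here) to show that existing objects are left untouched:

```r
dt_list <- c("results_1", "results_2", "final_results", "model_results")

# Pretend results_1 already exists and holds data
results_1 <- data.frame(x = 1:3)

for (nm in dt_list) {
  # inherits = FALSE restricts the check to the current environment,
  # so objects of the same name in attached packages are ignored
  if (!exists(nm, inherits = FALSE)) {
    assign(nm, data.frame())
  }
}
```

After the loop, results_1 still has its three rows, while the other three names point to empty data frames.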

How to retrieve data using the rentrez package by giving a list of query names instead of a single one?

So I'm trying to use the rentrez package to retrieve DNA sequence data from GenBank, giving as input a list of species.
What I've done is create a vector of the species I want to query, build a term that specifies the types of sequence data I want to retrieve, run a search that returns all the records matching my query, and finally fetch the actual sequence data in FASTA format into data.
library(rentrez)
species<-c("Ablennes hians","Centrophryne spinulosa","Doratonotus megalepis","Entomacrodus cadenati","Katsuwonus pelamis","Lutjanus fulgens","Pagellus erythrinus")
for (x in species){
term<-paste(x,"[Organism] AND (((COI[Gene] OR CO1[Gene] OR COXI[Gene] OR COX1[Gene]) AND (500[SLEN]:3000[SLEN])) OR complete genome[All Fields] OR mitochondrial genome[All Fields])",sep='',collapse = NULL)
search<-entrez_search(db="nuccore",term=term,retmax=99999)
data<-entrez_fetch(db="nuccore",id=search$ids,rettype="fasta")
}
Basically, what I'm trying to do is concatenate the results of the queries for each species into a single variable. I began with a for loop, but I see it makes no sense in this form because the data for each newly queried species just replaces the previous contents of data.
For some elements of species, there will be no data to retrieve and R shows this error:
Error: Vector of IDs to send to NCBI is empty, perhaps entrez_search or entrez_link found no hits?
In the cases where this error is shown and therefore there is no data for that particular species, I wanted the code to just keep going and ignore that.
My output would be a variable data that includes the sequence data retrieved for all the names in species.
library(rentrez)
species<-c("Ablennes hians","Centrophryne spinulosa","Doratonotus megalepis","Entomacrodus cadenati","Katsuwonus pelamis","Lutjanus fulgens","Pagellus erythrinus")
data <- list()
for (x in species){
term<-paste(x,"[Organism] AND (((COI[Gene] OR CO1[Gene] OR COXI[Gene] OR COX1[Gene]) AND (500[SLEN]:3000[SLEN])) OR complete genome[All Fields] OR mitochondrial genome[All Fields])",sep='',collapse = NULL)
search<-entrez_search(db="nuccore",term=term,retmax=99999)
data[x] <- tryCatch({entrez_fetch(db="nuccore",id=search$ids,rettype="fasta")},
error = function(e){NA})
}
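The tryCatch pattern above can be tried offline. A minimal sketch, where fetch_sequences is a hypothetical stand-in for the entrez_search/entrez_fetch pair (it is not part of rentrez) that fails for one species, and the returned sequences are made up:

```r
# Hypothetical stand-in for the NCBI calls: errors when there are no hits
fetch_sequences <- function(species_name) {
  if (species_name == "Centrophryne spinulosa") {
    stop("Vector of IDs to send to NCBI is empty")
  }
  paste0(">", species_name, "\nACGT\n")
}

species <- c("Ablennes hians", "Centrophryne spinulosa", "Katsuwonus pelamis")

data <- list()
for (x in species) {
  # On error, store NA and carry on with the next species
  data[[x]] <- tryCatch(fetch_sequences(x), error = function(e) NA_character_)
}

# Drop the species with no data and join the rest into one FASTA string
found <- Filter(function(s) !is.na(s), data)
all_sequences <- paste(unlist(found), collapse = "")
```

Keeping the per-species list as well as the concatenated string makes it easy to see afterwards which species returned nothing.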

How to use a variable to select a function within a package? [duplicate]

This question already has an answer here:
Get a function from a string of the form "package::function"
(1 answer)
Closed 3 years ago.
I've created a variable and want to use that variable to select my desired function in a package (e.g. package::function), however, the variable name is interpreted literally instead of being evaluated.
Here's the approach:
library(GSEABase)
library(tidyverse)
### SET ONTOLOGY GROUP (e.g. Biological Process = BP, Molecular Function = MF, Cellular Component = CC)
ontology <- "BP"
### Set GOOFFSPRING database, based on ontology group set above
go_offspring <- paste("GO", ontology, "OFFSPRING", sep = "")
## Need to know the 'offspring' of each term in the ontology, and this is given by the data in:
GO.db::go_offspring
## Create function to parse out GO terms assigned to each GOslim
## Courtesy Bioconductor Support: https://support.bioconductor.org/p/128407/
mappedIds <-
function(df, collection, OFFSPRING)
{
map <- as.list(OFFSPRING[rownames(df)])
mapped <- lapply(map, intersect, ids(collection))
df[["go_terms"]] <- vapply(unname(mapped), paste, collapse = ";", character(1L))
df
}
## Run the function
slimsdf <- mappedIds(slimsdf, myCollection, go_offspring)
This spits out the error:
Error: 'go_offspring' is not an exported object from 'namespace:GO.db'
When playing around in the RStudio console, I also notice that when I type
GO.db::
the autocomplete feature does not list my go_offspring variable as an option; it only lists the available functions within the GO.db package.
Seeing this behavior suggests there's a scope issue, in that the package cannot see variable definitions defined outside of the package.
Is there any way around this?
I've looked at this http://adv-r.had.co.nz/Environments.html, but I'm not entirely sure I follow all of it, nor do I see how to manipulate my environments to allow passing go_offspring to GO.db::.
You can use getFromNamespace to get the function via its character name from the namespace.
slimsdf <- mappedIds(slimsdf, myCollection, getFromNamespace(go_offspring, "GO.db"))
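The lookup can be checked with a package that ships with R. A minimal sketch, assuming the stats package and its median function purely as an illustration (GO.db is not needed to run it):

```r
# The object name is only known as a string at run time
fn_name <- "median"

# utils::getFromNamespace looks the name up inside the package namespace
fn <- utils::getFromNamespace(fn_name, "stats")
result <- fn(c(1, 5, 3))

# getExportedValue is a base alternative limited to exported objects
fn2 <- getExportedValue("stats", fn_name)
```

Both calls return the object itself, so the result can be passed on as an argument in the same way as getFromNamespace(go_offspring, "GO.db") is passed to mappedIds above.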

R Convert egor object to igraph/ use the visualization app (error: Duplicate vertex names)

I want to analyse egocentric networks using R's egor package. This package includes egor's Network Visualization App (egor_vis_app for short), which uses igraph. I managed to create an egor object, but I can't use this app to visualize the networks (error: Duplicate vertex names) or create an igraph object (as_igraph(), same error). What am I doing wrong?
What I tried so far:
I used the pre-existing egor object (data("egor32")) and the visualization app worked.
Then, I used this pre-existing data to create an egor object:
data("alters32")
data("egos32")
data("edges32")
e <- egor(alters.df = alters32,
egos.df = egos32,
aaties = edges32,
ID.vars = list(
ego = "egoID",
alter = "alterID",
source = "Source",
target = "Target"))
and neither the app nor the as_igraph(e) function works (I followed this tutorial when creating the egor object with this data).
And this is my sample code (based on this):
df_new <- read.csv(textConnection('"id","numgiven","sex",
"sex1","sex2","sex3","sex4","sex5","close12",
"close13","close14","close15","close23","close24",
"close25","close34","close35","close45"
10,6,1,2,2,1,2,2,0,0,0,0,0,0,0,1,1,1
36,6,2,2,2,2,1,1,0,0,0,1,0,0,0,1,0,0'
), as.is=TRUE)
e1 <- with(df_new, onefile_to_egor(egos = df_new, pmin(numgiven,5),
ID.vars=list(ego="id"),
attr.start.col="sex1",
attr.end.col="sex5",
max.alters=5,
aa.first.var="close12",
aa.regex="^(?<attr>[[:alpha:]]+)(?<src>[[:digit:]])(?<tgt>[[:digit:]])$"))
Doesn't work either.
I am one of the developers of egor, and while I am aware of this problem and there are plans to fix it in upcoming versions, there's a way to fix this yourself, by reordering the alter data columns. The problem is that igraph expects the alter IDs or vertex names to be the first column in the dataframe listing the alters/vertices.
Here is how you can fix your egor object:
library(dplyr)
e1$.alts <- lapply(e1$.alts, function(x) select(x, .altID, everything()))
With this fix in place the as_igraph() function and the visualization app should work.
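To see what the select() call does, here is a base R sketch on a made-up list of alter data frames. The alts object is hypothetical and only mimics the shape described above; the code moves .altID to the first column of every element:

```r
# Hypothetical alter tables with the ID column in the wrong position
alts <- list(
  data.frame(sex = c(2, 2), .altID = c(1, 2)),
  data.frame(sex = c(1, 1), .altID = c(1, 2))
)

# Put .altID first and keep the remaining columns in their original order
alts_fixed <- lapply(alts, function(x) {
  x[, c(".altID", setdiff(names(x), ".altID")), drop = FALSE]
})
```

This is the base equivalent of select(x, .altID, everything()) and avoids the dplyr dependency if that matters.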

R spCbind error

I have successfully added information to shapefiles before (see my post on http://rusergroup.swansea.ac.uk/Healthmap.ashx?HL=map ).
However, I just tried to do it again with a slightly different shapefile (new local health boards for Wales) and the code fails at spCbind with a "row names not identical" error.
o <- match(wales.lonlat$NEW_LABEL, wds$HB_CD)
wds.xtra <- wds[o,]
wales.ncchd <- spCbind(wales.lonlat, wds.xtra)
My rows did have different names before and that didn't cause any problems. I relabeled the column in wds.xtra to match "NEW_LABEL" and that doesn't help.
The labels and order of labels do match exactly between wales.lonlat and wds.xtra.
(I'm using Revolution R 5.0, which is built on R 2.13.2)
I use match to merge data to the sp data slot based on rownames (or any other common ID). This avoids the necessity of maptools for the spCbind function.
# Based on rownames
sdata@data=data.frame(sdata@data, new.df[match(rownames(sdata@data), rownames(new.df)),])
# Based on common ID
sdata@data=data.frame(sdata@data, new.df[match(sdata@data$ID, new.df$ID),])
# where sdata is your sp object and new.df is a data.frame object that you want to merge to sdata.
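The match-based merge can be tried on plain data frames. A minimal sketch, assuming attr_table as a stand-in for the data slot of the sp object, with made-up IDs and values, and a new.df that is in a different order and has one extra row:

```r
# Stand-in for the attribute table of the spatial object (hypothetical values)
attr_table <- data.frame(ID = c("HB3", "HB1", "HB2"),
                         area = c(30, 10, 20),
                         stringsAsFactors = FALSE)

# Data to attach: different row order, plus an ID with no matching polygon
new.df <- data.frame(ID = c("HB1", "HB2", "HB3", "HB4"),
                     rate = c(0.1, 0.2, 0.3, 0.4),
                     stringsAsFactors = FALSE)

# match() gives, for each row of attr_table, the matching row index in new.df
o <- match(attr_table$ID, new.df$ID)

# Reorder new.df to the attribute table's order, then bind the new column
merged <- data.frame(attr_table, new.df[o, "rate", drop = FALSE])
```

Because the rows of new.df are reordered to follow the attribute table, the order of the polygons is never changed, which is what keeps the attributes aligned with the geometry.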
I had the same error and could resolve it by dropping all the other columns that were not actually meant to be added. I suppose they confused spCbind because the matching tried to match all row elements, not only the one given. In my example, I used
xtra2 <- data.frame(xtra$ID_3, xtra$COMPANY)
to extract the relevant fields and fed them to spCbind afterwards
gadm <- spCbind(gadm, xtra2)
