Pass arguments to the R Phyloseq subset_taxa wrapper - r

I'll explain the end goal, and what I'm trying as a test first. (Because I'm likely going about it the wrong way.)
I am using the phyloseq package to visualize microbiome data. I want to "automate" it to an extent by having users choose levels of analysis and have my script generate the visualizations without someone hand typing in each combination.
The issue is passing variables into the subset function. I get these errors primarily (depending on what combinations of paste0, eval, parse, as.logical, expression, noquote....etc that i've tried):
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
A user would set the levels of analysis. So lets say for now there are two levels, and selecting the second level automatically means you want the first level as well. (I haven't worked on that part yet, but I wanted to explain it upfront.
#Set lineage level
lin_level <- 1
lin_list <- c("k__Kingdom", "p__Phylum","c__Class", "o__Order","f__Family")
lin_select <- lin_list[lin_level]
sub_lin <- lin_list[(lin_level +1)]
#Kingdom
king_list <- "k__Bacteria"
#set Phylum list
if (lin_select == "p__Phylum"){
phylum_list <- c("p__Firmicutes","p__Proteobacteria","p__Bacteroidetes","p__Actinobacteria","p__Tenericutes")
}
subgroup <- "All"
From here, the script would ultimately get to the graphing section. If lin_level is set to 1, it would look like this:
FIXED
gphic = subset_taxa(physeq1, Kingdom=="k__Bacteria")
title = paste0(subgroup," ", "Bacteria-only")
plot_bar(gpsfb, "Phylum", "Abundance", "Phylum",
title=title, facet_grid="Type~.")
AUTOMATED
gphic = subset_taxa(physeq1, (substring(lin_select,4)) == king_list)
title = paste0(subgroup," ", (substring(king_list,4)),"-only")
plot_bar(gpsfb, (substring(sub_lin,4)), "Abundance", (substring(sub_lin,4)),
title=title, facet_grid="Type~.")
But, trying to pass (substring(lin_select,4)) == king_list as an argument results in errors.
I've searched through the various threads on this issue, but haven't been able to get the different answers to work. Ultimately I need to run the graphing section once for Kingdom, and then again each time for each item in the Phylum list. But before i can get there, I need to be able to pass the arguments into the subset function.
Things I've tried:
test <- paste0(substring(lin_select,4),"==","\"","p__Bacteroidetes","\"")
noquote(test)
[1] Phylum=="p__Bacteroidetes"
gphic = subset_taxa(physeq1, noquote(test))
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
gphic = subset_taxa(physeq1, paste0(substring(lin_select,4),"==","\"","p__Bacteroidetes","\""))
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
gphic = subset_taxa(physeq1, as.logical(test))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
as.logical(noquote(test))
[1] NA
gphic = subset_taxa(physeq1, as.logical(noquote(test)))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
noquote(test)
[1] Phylum=="p__Bacteroidetes"
as.logical(noquote(test))
[1] NA
as.logical(as.character(noquote(test)))
[1] NA
test2 <- eval(parse(text= test))
Error in eval(parse(text = test)) : object 'Phylum' not found
test2 <- eval(test)
gphic = subset_taxa(physeq1, as.logical(test2))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
as.logical(test2)
[1] NA
And a lot of other permutations trying to sub in different things, but you get the idea.

gphic = subset_taxa(physeq1, eval(as.name(level_tax)) == king_list)
Here , level_tax is the variable in a loop. Say level_tax = "Order", then we convert the string "Order" into variable name by as.name(level_tax) or as.symbol(level_tax). Then we use eval(), which takes an expression and evaluates in the specified environment

Related

SMOTE: Error in names(x) <- value : 'names' attribute [510] must be the same length as the vector [509]

A <- read.csv("RNAexpression.csv")
data <- A[, c(1-511)]
data$FIELD1 <- factor(ifelse(data$FIELD1 == "control","rare","common"))
table(data$FIELD1)
newData0 <- SMOTE(FIELD1~., data, perc.over = 600,perc.under=100)
table(newData0$FIELD1)
Im trying to use SMOTE for oversampling but i got stuck in this error "Error during wrapup: 'names' attribute [510] must be the same length as the vector [509]" here my data set have 510 attributes but once it converted to vector it is becoming 509 anyone please help me with this.
thank you

How to filter data in R object so it only shows certain species?

Here is the code I am currently using after importing my OTU table, metadata, taxonomy and tree into R as one object. The issue I have is in the last line, I keep getting the error "Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent"
physeq = phyloseq(OTU, TAX, META, phy_tree)
library(phyloseq)
tps = transform_sample_counts(physeq, function(x) x / sum(x) )
ftps = filter_taxa(physeq, function(x) {sum(x >0)>1}, TRUE)
psfirm = subset_taxa(physeq, Phylum==“Firmicutes”)

Error code for tool subset_taxa; Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent

I load .biom data into R using library phyloseq.
Data2020 <- import_biom("XXX")
mapfile2020 <- import_qiime_sample_data("XXX")
tree <- read_tree("XXX")
mydata <- merge_phyloseq(Data2020, mapfile2020, tree)
colnames(tax_table(mydata)) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
ps <- mydata
ps1 <- subset_taxa(ps, Phylum != "NA")
Subset to the data I want.
WRB_data <- subset_samples(ps1, Site=="WRB")
Then Try to Subset by a taxa.
WRB_data2 <- subset_taxa(WRB_data, Phylum=="Acidobacteria")
Always results in this error. Please Help!?
Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent
Hard to verify without data, but it looks like there is no taxa of the phylum Acidobacteria in WRB_data.
I faced a similar problem, and overcame it by making sure this part is correctly labeled
(Phylum=="Acidobacteria")

replacing a value in column X based on columns Y with R

i've gone through several answers and tried the following but each either yields an error or an un-wanted result:
here's the data:
Network Campaign
Moburst_Chartboost Test Campaign
Moburst_Chartboost Test Campaign
Moburst_Appnext unknown
Moburst_Appnext 1065
i'd like to replace "Test Campaign" with "1055" whenever "Network" == "Moburst_Chartboost". i realize this should be very simple but trying out these:
dataset = read.csv('C:/Users/User/Downloads/example.csv')
for( i in 1:nrow(dataset)){
if(dataset$Network == 'Moburst_Chartboost') dataset$Campaign <- '1055'
}
this yields an error: Warning messages:
1: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
2: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
etc.
then i tried:
within(dataset, {
dataset$Campaign <- ifelse(dataset$Network == 'Moburst_Chartboost', '1055', dataset$Campaign)
})
this turned ALL 4 values in row "Campaign" into "1055" over running what was there even when condition isn't met
also this:
dataset$Campaign[which(dataset$Network == 'Moburst_Chartboost')] <- 1055
yields this error, and replaced the values in the two first rows of "Campaign" with NA:
Warning message:
In `[<-.factor`(`*tmp*`, which(dataset$Network == "Moburst_Chartboost"), :
invalid factor level, NA generated
scratching my head here. new to R but this shouldn't be so hard :(
In your first attempt, you're trying to iterate over all the columns, when you only want to change the 2nd column.
In your second, you're trying to assign the value "1055" to all of the 2nd column.
The way to think about it is as an if else, where if the condition in col 1 is met, col 2 is changed, otherwise it remains the same.
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
"Moburst_Appnext", "Moburst_Appnext"),
Campaign = c("Test Campaign", "Test Campaign",
"unknown", "1065"))
dataset$Campaign <- ifelse(dataset$Network == "Moburst_Chartboost",
"1055",
dataset$Campaign)
head(dataset)
Network Campaign
1 Moburst_Chartboost 1055
2 Moburst_Chartboost 1055
3 Moburst_Appnext unknown
4 Moburst_Appnext 1065
You may also try dataset$Campaign[dataset$Campaign=="Test Campaign"]<-1055 to avoid the use of loops and ifelse statements.
Where dataset
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
"Moburst_Appnext", "Moburst_Appnext"),
Campaign = c("Test Campaign", "Test Campaign",
"unknown", 1065))
Try the following
dataset = read.csv('C:/Users/User/Downloads/example.csv', stringsAsFactors = F)
for( i in 1:nrow(dataset)){
if(dataset$Network[i] == 'Moburst_Chartboost') dataset$Campaign[i] <- '1055'
}
It seems your forgot the index variable. Without [i] you work on the whole vector of the data frame, resulting in the error/warning you mentioned.
Note that I added stringsAsFactors = F to the read.csv() function to make sure the strings are indeed interpreted as strings and not factors. Using factors this would result in an error like this
In `[<-.factor`(`*tmp*`, i, value = c(NA, 2L, 3L, 1L)) :
invalid factor level, NA generated
Alternatively you can do the following without using a for loop:
idx <- which(dataset$Network == 'Moburst_Chartboost')
dataset$Campaign[idx] <- '1055'
Here, idx is a vector containing the positions where Network has the value 'Moburst_Chartboost'
thank you for the help! not elegant, but since this lingered with me when going to sleep last night i decided to try to bludgeon this with some ugly code but it worked too - just as a workaround...separated to two data frames, replaced all values and then binded back...
# subsetting only chartboost
chartboost <- subset(dataset, dataset$Network=='Moburst_Chartboost')
# replace all values in Campaign
chartboost$Campaign <-sub("^.*", "1055",chartboost$Campaign)
#subsetting only "not chartboost"
notChartboost <-subset(dataset, dataset$Network!='Moburst_Chartboost')
# binding back to single dataframe
newSet <- rbind(chartboost, notChartboost)
Ugly as a duckling but worked :)

How to change value if error occurs in for loop?

I have a loop that reads HTML table data from ~ 440 web pages. The code on each page is not exactly the same, so sometimes I need table node 1 and sometime I need node 2. Right now I've just been setting the node number manually in a list and feeding it into the loop. My problem is that the page nodes have started changing and updating the node # list is getting to be a hassle.
If the loop encounters the wrong node # (ie: 1 instead of 2, or reverse) it gives an error and shuts down. Is there a way to have the loop replace the erroneous node number to the correct one if it encounters an error, and then keep running the loop as if nothing happened?
Here's the readHTML portion of the code in my loop with an example url:
url <- "http://espn.go.com/nba/player/gamelog/_/id/2991280/year/2013/"
html.page <- htmlParse(url)
tableNodes <- getNodeSet(html.page, "//table")
x <- as.numeric(Players$Nodes[s])
tbl = readHTMLTable(tableNodes[[x]], colClasses = c("character"),stringsAsFactors = FALSE)
Here's the error I get when the node # is wrong:
"Error in readHTMLTable(tableNodes[[x]], colClasses = c("character"), stringsAsFactors = FALSE) : error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[x]] : subscript out of bounds"
Example code:
A <- c("dog", "cat")
Nodes <- as.data.frame(1:1)
#)Nodes <- as.data.frame(1:2) <-- This works without errors
colnames(Nodes)[1] <- "Col1"
Nodes2 <- 2
url <-c("http://espn.go.com/nba/player/gamelog/_/id/6639/year/2013/","http://espn.go.com/nba/player/gamelog/_/id/6630/year/2013/")
for (i in 1:length(A))
{
html.page <- htmlParse(url[i])
tableNodes <- getNodeSet(html.page, "//table")
x <- as.numeric(Nodes$Col1[i])
df = readHTMLTable(tableNodes[[x]], colClasses = c("character"),stringsAsFactors = FALSE)
#tryCatch(df) here.....no clue
assign(paste0("", A[i]), df)
}
If you get subscript out of bounds error msg, then you should try to with a lower x for sure. General demo with tryCatch based on the demo code you posted in the original question (although I have replaced x with 2 as I have no idea what is Players and s):
> msg <- tryCatch(readHTMLTable(tableNodes[[2]], colClasses = c("character"),stringsAsFactors = FALSE), error = function(e)e)
> str(msg)
List of 2
$ message: chr "error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[2]] : subscript"| __truncated__
$ call : language readHTMLTable(tableNodes[[2]], colClasses = c("character"), stringsAsFactors = FALSE)
- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
> msg$message
[1] "error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[2]] : subscript out of bounds\n"
> grepl('subscript out of bounds', msg$message)
[1] TRUE

Resources