Renaming values in Seurat object metadata - r

I have a Seurat object "gunion.data".
The metadata column gunion.data@meta.data$orig.ident is either "control", "ischemia", "synIRI" or "alloIRI". I would like to change "synIRI" and "alloIRI" into "other".
I tried this:
gunion.data@meta.data$orig.ident["alloIRI"] <- "other"
but it gave me an error:
Error in `$<-.data.frame`(`*tmp*`, orig.ident, value = c("control", "control", : replacement has 26933 rows, data has 26932
How should I format the code to change all "alloIRI" and "synIRI" in the data into "other"?

What you want to do is rename an identity class (Ident).
The following should work; its first line sets the active identities to 'orig.ident'.
Idents(gunion.data) <- 'orig.ident'
gunion.data <- RenameIdents(object = gunion.data, `synIRI` = "other", `alloIRI` = "other")
Idents(gunion.data) #to confirm the change has happened.
I would also suggest having a look at the Seurat "Essential commands" guide.
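Why the original attempt failed, with a minimal base-R sketch: orig.ident["alloIRI"] indexes the vector by the name "alloIRI", and because no element has that name, R appends a new element. The column becomes one element longer than the data frame (26933 vs 26932), which is exactly what the error reports. Replacing by value with %in% avoids this. The data frame meta below is a hypothetical stand-in for gunion.data@meta.data; the column is converted to character first in case it is a factor, since assigning a value that is not an existing level to a factor produces NA.

```r
# Hypothetical stand-in for gunion.data@meta.data
meta <- data.frame(orig.ident = factor(c("control", "ischemia", "synIRI",
                                         "alloIRI", "synIRI")))

# Why the original attempt fails: indexing by the name "alloIRI" appends a
# new element instead of matching values, so the vector grows by one
x <- as.character(meta$orig.ident)
x["alloIRI"] <- "other"
grown_length <- length(x)   # nrow(meta) + 1

# Fix: work on a character column, then overwrite the matching values in place
meta$orig.ident <- as.character(meta$orig.ident)
meta$orig.ident[meta$orig.ident %in% c("synIRI", "alloIRI")] <- "other"
print(table(meta$orig.ident))
```

On the real object the same assignment would target gunion.data@meta.data$orig.ident. Note also that RenameIdents() changes the active identities, not the orig.ident column; if you want the renamed labels stored in the metadata, one option is to copy them back, e.g. gunion.data$group <- Idents(gunion.data), where group is a hypothetical column name.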

Related

Consistent error message while running grouping analysis in 'plspm' package

I am looking for some help in resolving an error using the partial least squares path modeling package ('plspm').
I can get results running a basic PLS-PM analysis but run into issues when using the grouping function, receiving the error message:
Error in if (w_dif < specs$tol || iter == specs$maxiter) break : missing value where TRUE/FALSE needed
I have no missing values, and all variables have the proper class. Elsewhere I read that there is a problem with processing observations that have exactly the same values across all variables; I have deleted those and still face this issue. The issue only seems to occur when I run the group comparison with the "bootstrap" method.
library(dplyr)   # provides %>% and slice()
library(plspm)   # provides plspm() and plspm.groups()
farmwood = read.csv("farmwood_groups(distance).csv", header = TRUE) %>%
  slice(-c(119:123))
Control = c(0,0,0,0,0,0)
Normative = c(0,0,0,0,0,0)
B_beliefs = c(0,0,0,0,0,0)
P_control = c(1,0,0,0,0,0)
S_norm = c(0,1,0,0,0,0)
Behavior = c(0,0,1,1,1,0)
farmwood_path = rbind(Control, Normative, B_beliefs, P_control, S_norm, Behavior)
colnames(farmwood_path) = rownames(farmwood_path)
farmwood_blocks = list(14:18,20:23,8:13,24:27,19,4:7)
farmwood_modes = rep("A", 6)
farmwood_pls = plspm(farmwood, farmwood_path, farmwood_blocks, modes = farmwood_modes)
names(farmwood)[names(farmwood) == "QB3"] <- "Distance"
farmwood$Distance <- as.factor(farmwood$Distance)
distance_boot = plspm.groups(farmwood_pls, farmwood$Distance, method = "bootstrap")
distance_perm = plspm.groups(farmwood_pls, farmwood$Distance, method = "permutation")
The data is contained here:
https://www.dropbox.com/s/8vewuupywpi1jkt/farmwood_groups%28distance%29.csv?dl=0
Any help would be appreciated. Thank you in advance.
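This question has no answer in the thread. One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis, is an indicator that is constant (zero variance) within one of the Distance groups: standardizing such a column divides by zero and yields NaN, which could make a convergence test like w_dif < specs$tol evaluate to NA. Bootstrap resampling can produce constant columns in small groups even when the full group has none. The hypothetical helper constant_cols_by_group() below, written in base R, lists such columns per group; it is shown on a made-up data frame.

```r
# Hypothetical helper: for each level of `group`, list the columns of `data`
# that take only a single value within that group (zero variance)
constant_cols_by_group <- function(data, group) {
  lapply(split(data, group), function(d) {
    is_constant <- vapply(d, function(col) length(unique(col)) < 2, logical(1))
    names(d)[is_constant]
  })
}

# Toy data, made up for illustration
toy <- data.frame(a = c(1, 1, 2, 3), b = c(5, 5, 5, 6))
grp <- factor(c("near", "near", "far", "far"))
res <- constant_cols_by_group(toy, grp)
print(res)
```

On the real data this might be called as constant_cols_by_group(farmwood[, 4:27], farmwood$Distance), since columns 4 to 27 cover all the blocks in farmwood_blocks; table(farmwood$Distance) would also show whether any group is small enough for resampling to produce constant columns.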

Error in is.single.string(object) : argument "object" is missing, with no default

I want to parse the AAChange.refGene column and then use the biomaRt R package to extract information. My code raises Error in is.single.string(object) : argument "object" is missing, with no default, even though the getSequence() function is meant to accept multiple arguments.
library(tidyr)
variant_calls = read.delim("variant_calls.txt")
info = tidyr::separate(variant_calls["AAChange.refGene"], AAChange.refGene, c("Refseq ID", "cDNA level change", "Protein level change"), ":")
df = cbind(variant_calls["Gene.refGene"],info)
library(biomaRt)
ensembl <- useMart(biomart="ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="https://grch37.ensembl.org", path="/biomart/martservice")
pep <- vector()
for(i in 1:length(df$`Refseq ID`)){
temp <- getSequence(id=df$`Refseq ID`[i],type='refseq_mrna',seqType='peptide', mart=ensembl)
temp <- sapply(temp$peptide, nchar)
temp <- sort(temp, decreasing = TRUE)
temp <- names(temp[1])
pep[i] <- temp
}
df$Sequence <- pep
Traceback:
Error in is.single.string(object) :
argument "object" is missing, with no default
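I got the same error and found out (using ?getSequence) that it was a name conflict between packages, a classic R problem: both biomaRt and seqinr export a function called getSequence(). seqinr is used to handle FASTA files, so the two are probably often loaded together, and whichever package is attached last masks the other's getSequence().
My solution was to call the function with an explicit namespace, e.g. inside the loop:
temp <- biomaRt::getSequence(id=df$`Refseq ID`[i], type='refseq_mrna', seqType='peptide', mart=ensembl)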
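The masking behaviour described above can be reproduced without either package. In this base-R sketch, a function defined in the global environment masks base::nchar() in the same way an attached seqinr can mask biomaRt's getSequence(); find() reports every place on the search path that defines the name, and the :: form always reaches the intended package.

```r
# Define a function that masks base::nchar(), mimicking a package conflict
nchar <- function(x) "masked!"

masked_result    <- nchar("abc")        # the masking definition is found first
qualified_result <- base::nchar("abc")  # namespace-qualified: always base's nchar()
where_defined    <- find("nchar")       # search-path entries that define "nchar"

print(where_defined)
rm(nchar)                               # remove the masking definition again
```

In a session with both packages attached, find("getSequence") should list both package:seqinr and package:biomaRt, and conflicts(detail = TRUE) lists masked names more generally.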

R GWASTools createDataFile() error: "Error ... %in% names(...) is not TRUE"

I'm trying to create an intensity GDS file from existing Illumina files using the createDataFile() function of GWASTools.
I tried this:
col.nums <- as.integer(c(1,11,12,13,14))
names(col.nums) <- c("snp", "BAlleleFreq", "LogRRatio", "a1", "a2")
variables <- c("genotype","BAlleleFreq","LogRRatio")
intens <- createDataFile(path="/pathexample/", "/pathexample/IntensityGDS", file.type="gds",
    variables=variables, snp.annotation=snpAnnot, scan.annotation=scanAnnot,
    sep.type=",", skip.num=12, col.total=14, col.nums=col.nums, scan.name.in.file=-1,
    allele.coding="nucleotide", precision="single", compress="LZMA_RA:1M",
    compress.geno="", compress.annot="LZMA_RA", array.name=NULL, genome.build=NULL,
    diagnostics.filename="createDataFile.diagnostics.RData", verbose=TRUE)
The error I'm getting is:
Error: all(c("snpID", "chromosome", "position", "snpName") %in% names(snp.annotation)) is not TRUE
However, I know those column names are present both in the SnpAnnotationDataFrame I pass as snp.annotation (snpAnnot) and in the underlying data frame I used to create it. For example:
varLabels(snpAnnot)
yields
"snpName" "chromosome" "position" "rsID_real" "snpID"
Thanks!!
Apparently the problem was that createDataFile() expects regular R data frames in the snp.annotation and scan.annotation arguments, not annotation objects such as SnpAnnotationDataFrame. In other words, there is no need to run SnpAnnotationDataFrame() on your data frame; just pass the data frame itself.
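The error message is a plain base-R check: it requires names(snp.annotation) to contain snpID, chromosome, position and snpName. For an ordinary data frame, names() returns the column names, so the check passes; for an annotation object it presumably does not return the labels shown by varLabels(), which would explain the failure (an inference, not confirmed in the thread). The sketch below reproduces the check on a made-up data frame and uses setdiff() to report exactly which required columns are missing.

```r
# Made-up SNP annotation as a plain data frame
snp_df <- data.frame(snpID = 1:3,
                     chromosome = c(1L, 1L, 2L),
                     position = c(1000L, 2000L, 500L),
                     snpName = c("rs1", "rs2", "rs3"))

required <- c("snpID", "chromosome", "position", "snpName")

# The same test that createDataFile() reports in its error message
check_passes <- all(required %in% names(snp_df))

# More informative: which required columns are absent (character(0) if none)
missing_cols <- setdiff(required, names(snp_df))
print(check_passes)
print(missing_cols)
```

If you already have a SnpAnnotationDataFrame, the Biobase accessor pData(snpAnnot) should return the underlying data frame, which could then be passed as snp.annotation instead.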

How to prevent coercion to list in R

I am trying to remove all NA values from two columns of a data frame and make sure that neither column keeps a value for a row where the other one is NA.
code:
data <- dget(file)
dependent <- data[,"chroma"]
independent <- data[,"mass..Pantheria."]
names(independent) <- names(dependent) <- rownames(data)
for (name in rownames(data)) {
if(is.na(dependent[name])) {
independent$name <- NULL
dependent$name <- NULL
}
if(is.na(independent[name])) {
independent$name <- NULL
dependent$name <- NULL
}
}
print(dput(independent))
print(dput(dependent))
I am brand new to R and am trying to perform this task with a for loop. However, when I try to delete an element by assigning NULL, I receive the following warnings:
1: In independent$Aeretes_melanopterus <- NULL : Coercing LHS to a list
2: In dependent$name <- NULL : Coercing LHS to a list
No elements are deleted and independent and dependent retain all their original rows.
file (input):
structure(list(chroma = c(7.443501276, 10.96156313, 13.2987235,
17.58110922, 13.4991105), mass..Pantheria. = c(NA, 126.57, NA,
160.42, 250.57)), .Names = c("chroma", "mass..Pantheria."), class = "data.frame", row.names = c("Aeretes_melanopterus",
"Ammospermophilus_harrisii", "Ammospermophilus_insularis", "Ammospermophilus_nelsoni",
"Atlantoxerus_getulus"))
chroma mass..Pantheria.
Aeretes_melanopterus 7.443501 NA
Ammospermophilus_harrisii 10.961563 126.57
Ammospermophilus_insularis 13.298723 NA
Ammospermophilus_nelsoni 17.581109 160.42
Atlantoxerus_getulus 13.499111 250.57
desired output:
structure(list(chroma = c(10.96156313, 17.58110922, 13.4991105
), mass..Pantheria. = c(126.57, 160.42, 250.57)), .Names = c("chroma",
"mass..Pantheria."), class = "data.frame", row.names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
chroma mass..Pantheria.
Ammospermophilus_harrisii 10.96156 126.57
Ammospermophilus_nelsoni 17.58111 160.42
Atlantoxerus_getulus 13.49911 250.57
structure(c(126.57, 160.42, 250.57), .Names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
Ammospermophilus_harrisii Ammospermophilus_nelsoni Atlantoxerus_getulus
126.57 160.42 250.57
structure(c(10.96156313, 17.58110922, 13.4991105), .Names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
Ammospermophilus_harrisii Ammospermophilus_nelsoni Atlantoxerus_getulus
10.96156 17.58111 13.49911
It looks like you want to omit rows of your data where chroma or mass..Pantheria. is NA. Here's a quick way to do it:
data = data[!is.na(data$chroma) & !is.na(data$mass..Pantheria.), ]
I'm not sure why you are splitting independent and dependent out separately, but if you need them, the right time to do it is after filtering out the incomplete observations.
Since those are your only two columns, this is equivalent to omitting rows from your data frame that have any NA values, so you can use a shortcut like this:
data = na.omit(data)
If you want to keep a "pristine" copy of your raw data, simply change the name of the result:
data_no_na = na.omit(data)
# or
data_no_na = data[!is.na(data$chroma) & !is.na(data$mass..Pantheria.), ]
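A runnable version of this filtering on the data from the question, rebuilt from its dput() output. It uses complete.cases(), a base-R alternative not mentioned in the answer that flags rows with no NA in any column (equivalent to na.omit() here), and only then splits out the two named vectors, which reproduces the desired output.

```r
# The question's input data, rebuilt from its dput() output
data <- structure(list(chroma = c(7.443501276, 10.96156313, 13.2987235,
  17.58110922, 13.4991105), mass..Pantheria. = c(NA, 126.57, NA,
  160.42, 250.57)), class = "data.frame", row.names = c("Aeretes_melanopterus",
  "Ammospermophilus_harrisii", "Ammospermophilus_insularis",
  "Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))

# complete.cases() is TRUE for rows with no NA in any column
data_no_na <- data[complete.cases(data), ]

# Split out the two columns after filtering, keeping row names as element names
independent <- setNames(data_no_na[, "mass..Pantheria."], rownames(data_no_na))
dependent   <- setNames(data_no_na[, "chroma"], rownames(data_no_na))
print(independent)
print(dependent)
```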
As to what's wrong with your code: $ is used for extracting columns from a data frame, but you're using it on a named vector (you've already extracted the columns), which doesn't work. And even on a data frame, $ only works with a literal name; you can't use it with a variable. To extract a column whose name is stored in a variable, you need to use brackets. For example, the built-in mtcars data has a column called "mpg":
# these work:
mtcars$mpg
mtcars[, "mpg"]
my_col = "mpg"
mtcars[, my_col]
mtcars$my_col ## does not work, need to use brackets!
You can never use $ with row names in a data frame, only column names.
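Two related idioms, sketched in base R: [[ also accepts a column name stored in a variable, and to drop an element from a named vector by name (what the loop in the question was attempting), you can index with a logical comparison on names() instead of assigning NULL with $.

```r
my_col <- "mpg"
mpg_values <- mtcars[[my_col]]   # [[ works with a variable holding the name

# Dropping an element from a named vector by name
v <- c(a = 1, b = NA, c = 3)
name <- "b"
v <- v[names(v) != name]         # keep every element except the one named "b"
print(v)
```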

R HTS package: combinef and aggts not working with gts object

I'm trying to apply the combinef and aggts functions from the R hts package to a time series matrix in order to obtain an optimized set of forecasts across a hierarchy. I've run the same code every month without issue, and am now seeing errors after upgrading to hts package v4.5.
Reproducible example (I can share the data file offline if needed):
#Read in forecast data for all levels of hierarchy#
fcast<-read.csv("SampleHierarchyForecast.csv", header = TRUE, check.names = FALSE)
#Convert to time series#
fcast<-ts(fcast, start = as.numeric(2010.25) + (64)/12, end = as.numeric(2010.25) + (75)/12, f= 12)
#Create time series of only the bottom level of the hierarchy#
index<-c()
fcastBottom<-fcast
for (i in 1:length(fcastBottom [1,]))
{
if(nchar(colnames(fcastBottom)[i])!=28)
index[i]<-i
else
index[i]<-0
}
fcastBottom<-fcastBottom[,-index]
#Create grouped time series from the bottom level forecast #
GtsForecast <- gts(fcastBottom, characters = list(c(12,12), c(4)), gnames = c("Category", "Item", "Customer", "Category-Customer"))
#Use combinef function to optimally combine the full hierarchy forecast using the groups from the full hierarchy gts#
combo <- combinef(fcast, groups = GtsForecast$groups)
Warning message:
In mapply(rep, as.list(gnames), times, SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
traceback()
2: stop("Argument fcasts requires all the forecasts.")
1: combinef(fcast, groups = GtsForecast$groups)
There's a small bug in how combinef() calls gts(), which I've now fixed on GitHub. So your code above should run without any trouble once you update to the development version.
Alternatively, if you don't want to install the newest version, you need to tweak your code a bit:
combo <- combinef(fcast, groups = GtsForecast$groups, keep = "bottom")
combo <- ts(combo, start = as.numeric(2010.25) + (64)/12,
end = as.numeric(2010.25) + (75)/12, f = 12)
colnames(combo) <- colnames(fcastBottom)
newGtsForecast <- gts(combo, characters = list(c(12,12), c(4)),
gnames = c("Category", "Item", "Customer",
"Category-Customer"))
Aggregate <- aggts(newGtsForecast)
Hope it helps.
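Separately from the combinef() issue, the loop in the question that keeps only the bottom-level series (column names of exactly 28 characters) can be written as a single logical index. The sketch below uses a small made-up forecast matrix with hypothetical column names; subsetting only the columns of a multivariate ts keeps its time-series attributes.

```r
# Hypothetical bottom-level names: 12 + 12 + 4 = 28 characters
bottom1 <- paste0(strrep("A", 12), strrep("B", 12), "C001")
bottom2 <- paste0(strrep("A", 12), strrep("D", 12), "C002")

# Made-up forecasts: an aggregate "Total", a 12-character category, two bottom series
fcast <- ts(matrix(1:24, ncol = 4,
                   dimnames = list(NULL, c("Total", strrep("A", 12), bottom1, bottom2))),
            start = c(2015, 9), frequency = 12)

# Keep only the columns whose name is exactly 28 characters long
fcastBottom <- fcast[, nchar(colnames(fcast)) == 28]
print(colnames(fcastBottom))
```

A logical index also avoids an edge case of the -index approach: if every column were bottom-level, index would be all zeros, and fcast[, -index] would then select no columns at all.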
