Subsetting a Cell Data Set within Monocle - r

If anyone has any experience using the monocle package in R:
I am trying to subset my data based on a vector of sample names, but I cannot accomplish it.
I have tried:
x@phenoData$sampleNames <- example.cells
but I am getting this error:
replacement has 661 rows, data has 5809
The object I am trying to subset is a Cell Data Set (CDS) created from a Seurat object by the importCDS function.
I have also assigned a Cell Type to every sample that is called "CellType" which is part of the meta.data of the Seurat object and is listed under the varLabels slot of the phenoData after it is converted to a CDS.
I would like help subsetting based on either of these variables, thank you.

According to the monocle tutorial, low-quality cells are filtered with this code (HSMM is the monocle object):
valid_cells <- row.names(subset(pData(HSMM),
Cells.in.Well == 1 &
Control == FALSE &
Clump == FALSE &
Debris == FALSE &
Mapped.Fragments > 1000000))
HSMM <- HSMM[,valid_cells]
So for your example this should work:
x = x[,example.cells]
or (directly from Seurat):
x = x[,rownames(data.seurat@meta.data[data.seurat@meta.data$CellType == "interesting_cell",])]

This: x@phenoData$sampleNames <- example.cells assigns new values to a column of the data frame that describes your samples instead of subsetting it. That is why R complains that the replacement has 661 rows while the data has 5809.
Try using x@phenoData$sampleNames %in% example.cells to get a logical vector (TRUE, FALSE) and filter with it:
x@phenoData[x@phenoData$sampleNames %in% example.cells,]
One caveat: this may mess up your CDS data structure, so be careful. It may be better to filter before generating the CDS, or to generate a new one from the old data.
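The idea behind both answers can be sketched in base R without monocle installed. The toy matrix and data frame below are hypothetical stand-ins for the expression data and phenoData of a CDS; the point is only that the cells to keep are chosen from the metadata and then the same names are used to subset everything together. The commented last line is the assumed equivalent for a real CDS and is not run here.

```r
# Hypothetical toy data: genes in rows, cells in columns,
# plus one metadata row per cell (as in the phenoData of a CDS).
expr <- matrix(1:12, nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
pheno <- data.frame(CellType = c("A", "B", "A", "C"),
                    row.names = colnames(expr))

# Pick the cell names from the metadata ...
keep <- rownames(pheno)[pheno$CellType == "A"]

# ... then subset the columns and the metadata with the same names,
# so the two stay in sync.
expr_sub  <- expr[, keep, drop = FALSE]
pheno_sub <- pheno[keep, , drop = FALSE]

# Assumed equivalent on a real CDS (requires monocle, not run here):
# x_sub <- x[, pData(x)$CellType == "A"]
```

Subsetting by a character vector of cell names, as in x[, example.cells], follows the same pattern.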

Related

Create a histogram of specific columns and rows from a `data.frame` in R

## my data frame
crime = read.csv("url")
## specific columns that need to be represented
property_crime = crime$Burglary + crime$Theft + crime$`Motor Vehical Theft`
## the rows that I am looking for have the name "harris" within the column named "county_name"
## my attempt
with(crime, hist(harris))
## Error in hist(harris) : object 'harris' not found
Not sure why I am getting object 'harris' not found as that is the name under the county_name column. I'm new to R, could someone walk me through the process of displaying a histogram only including the values of specific columns and specific rows?
the rows that I am looking for have the name "harris" within the column named "county_name"
You have to tell R the same logic that you are telling us.
There are several ways of making this in R but I am going to put here the base R way.
We can access the desired rows and columns of crime by indexing like data.frame[rows, columns]. So, in your case, crime[harris_rows, columns] should work. To get harris_rows, we can build a logical index like so: crime$county_name == "harris" (note the quotes, since harris is a value and not an object). Because hist() needs numeric values, apply that index to the numbers you want to plot rather than to county_name itself, for example the property_crime vector from your question:
hist(property_crime[crime$county_name == "harris"])
You don't provide a reproducible example, but you can check the same logic with the mtcars dataset. Here, I am making the histogram of mpg for the cars with mpg > 15:
hist(mtcars[mtcars$mpg >15, "mpg"])
# this is another option that produces the same result
# hist(mtcars$mpg[mtcars$mpg >15])
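Applied to the crime data, the same logic might look like the sketch below. The data frame here is a made-up stand-in, since the original CSV is not available; only the column names are taken from the question.

```r
# Hypothetical stand-in for the crime data; column names follow the question.
crime <- data.frame(
  county_name = c("harris", "dallas", "harris", "harris", "travis"),
  Burglary = c(10, 4, 7, 12, 3),
  Theft = c(30, 11, 25, 40, 9),
  `Motor Vehical Theft` = c(5, 2, 4, 6, 1),
  check.names = FALSE  # keep the column name with spaces as-is
)

# One numeric value per row: the sum of the three columns.
property_crime <- crime$Burglary + crime$Theft + crime$`Motor Vehical Theft`

# Logical index: TRUE for rows whose county_name is "harris".
is_harris <- crime$county_name == "harris"

# Keep only the Harris values and plot them.
harris_property_crime <- property_crime[is_harris]
hist(harris_property_crime,
     main = "Property crime, Harris county",
     xlab = "Property crime")
```

The row filter and the choice of values are kept as two separate steps, which makes it easy to swap in another county or another column.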

Can't get 'plotweb' in the bipartite package to work (R)

I am trying to visualise a bipartite network using the bipartite package in R. My data consists of 4 columns in a spreadsheet. The columns contain 1) plant species names, 2) bee species names, 3) site, and 4) interaction frequency. I first read the data into R from a CSV file, then convert it to a web using the helper function frame2webs. When I then try to visualise the network with plotweb() I get the error message:
Error in web[rind, cind, drop = FALSE] : incorrect number of dimensions
My code looks like this:
library(bipartite)
bee <- read.csv('TestFile.csv')
bees <- as.data.frame(bee)
BeeWeb <- frame2webs(bees, type.out = "array")
plotweb(BeeWeb)
I've also tried:
BeeWeb <- frame2webs(bees,
varnames = c("higher","lower","webID","freq"),
type.out = "array")
Please help! I am new to R and am struggling to make this work. Cheers!
Not sure what your data look like, but this happens to me when I have a single factor level in either the "higher" or "lower" column, type.out is "list", and emptylist is TRUE.
This is due to a problem in empty, a function that frame2webs only calls when type.out is "list" and emptylist is TRUE. empty finds the dimensions of your data using NROW and NCOL, which interpret a single row of input as a vertical vector. When there's only one factor level in "lower" or "higher", the input to empty is a one-row array. empty interprets this row as a column, hence the 'incorrect number of dimensions' error.
Two simple workarounds:
Set type.out to "array"
Set emptylist to FALSE
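The mechanism described above can be reproduced in base R without the bipartite package. This is only a sketch of one plausible way the situation arises (a one-row matrix losing its dimensions); the object names are made up and the internals of empty are not shown.

```r
# A web with a single "lower" level: one row, three columns.
web <- matrix(c(2, 0, 5), nrow = 1,
              dimnames = list("plantA", c("bee1", "bee2", "bee3")))
dims_matrix <- c(NROW(web), NCOL(web))   # still a 1 x 3 matrix

# Extracting the row without drop = FALSE turns it into a plain vector.
v <- web[1, ]

# NROW/NCOL treat a vector as a single column.
dims_vector <- c(NROW(v), NCOL(v))

# Two-index subsetting on the vector then fails with a dimension error.
err <- tryCatch(v[1, 1:3, drop = FALSE],
                error = function(e) conditionMessage(e))

# Keeping the dimensions avoids the problem.
kept <- web[1, , drop = FALSE]
```

Here dims_vector describes 3 rows and 1 column even though the data started as a single row, which matches the row-read-as-column behaviour the answer describes.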

Can't complete cases of a data.frame

I'm not asking for help with the exercise itself, only with an error that I can't fix.
This is the subject:
In R the more appropriate indicator for missing data is “NA” (not available). Therefore, replace each occurrence of “?” with “NA”.
a. For this exercise, create an R data frame for the mammographic data using only datapoints that have no missing values. This can be done using the complete.cases function, which takes a data frame and returns a Boolean vector v, where v[i] equals TRUE iff the i-th data-frame sample is complete (meaning it does not possess an NA). For example, if the data frame is stored in mammogram.frame, then mammogram2.frame = mammogram.frame[complete.cases(mammogram.frame),] creates a new data frame called mammogram2.frame that has all the complete mammogram data samples.
So I coded that:
mammogram = read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/mammographic-masses/mammographic_masses.data",
sep=",",
col.names=c("Birads","Age","Shape","Margin","Density","Severity"),
fill=TRUE,
strip.white=TRUE)
#Keep only the complete cases
mammogram2.frame = mammogram.frame[complete.cases(mammogram.frame),]
#Display data frame
mammogram2
However I get this error:
> mammogram2.frame = mammogram.frame[complete.cases(mammogram.frame),]
Error: object 'mammogram.frame' not found
I can't find any solution on the internet. I have tried a lot of things, but the missing values are still '?'.
Thanks
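The error message itself points at the cause: the data was read into an object called mammogram, while the next line refers to mammogram.frame, which was never created. A minimal sketch of one way to handle both issues follows. It uses a few made-up rows in the same comma-separated format so that it runs without a download, and it relies on the na.strings argument of read.table to turn "?" into NA while reading.

```r
# Hypothetical sample rows in the same format as the UCI file.
raw <- "5,67,3,5,3,1
4,43,1,1,?,1
5,58,4,5,3,1
4,28,1,1,3,0
5,?,2,?,3,1"

# Read into the name that the later code expects, treating "?" as NA.
mammogram.frame <- read.table(
  text = raw,
  sep = ",",
  col.names = c("Birads", "Age", "Shape", "Margin", "Density", "Severity"),
  na.strings = "?",
  strip.white = TRUE
)

# Keep only the rows without any NA.
mammogram2.frame <- mammogram.frame[complete.cases(mammogram.frame), ]

# Display the data frame
mammogram2.frame
```

With the real file, replacing text = raw by the URL from the question should work the same way, assuming the file layout is as described there.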

Unable to subset data in a shapefile

I would like to carry out a subsetting in my shapefile without specifying the name of the first column in the .dbf file.
To be more precise I would like to select all the rows with value 1 in the first column of the .dbf, but I don't want to specify the name of this column.
For example this script works because I specify the name of the column (as columnName)
library(rgdal) # readOGR
shapeIn <- readOGR(nomeFile)
shapeOut <- subset(shapeIn, columnName == 1)
whereas this doesn't work:
shapeOut <- (shapeIn[,1] == 1)
and I get an error message:
comparison (1) is possible only for atomic and list types
shapeOut and shapeIn are ESRI vector files.
This is the header of my shapeIn
coordinates mask_1000_
1 (54000, 1218000) 0
2 (55000, 1218000) 0
3 (56000, 1218000) 0
Can you help me? Thank you
This
shapeOut <- (shapeIn[,1] == 1)
doesn't work because SpatialPolygonsDataFrames contain more than just the data, so "common" data.frame subsetting doesn't work in the same way. To make it work, you must do the "logical check" for subsetting on the @data slot. This should work (either using subset or "direct" indexing):
shapeOut <- subset(shapeIn, shapeIn@data[,1] == 1)
OR
shapeOut <- shapeIn[shapeIn@data[,1] == 1,]
(however, in my recent experience, referencing data by column number is seldom a good idea... ;-) )
ciao Giacomo !!!

Creating a data frame using a single line code

I need to select data for 3 variables and place them in a new data frame using a single line of code. The data frame I'm pulling from is Dance, the 3 variables are Lindy, Blues and Contra.
I have this:
Dance$new<-subset(Dance$Type==Lindy, Dance$Type==Blues, Dance$Type==Contra)
Can you tell what I'm doing wrong?
There are a number of ways you can do this, but I'd forget the subset part
danceNew <- Dance[Dance$Type=="Lindy"|Dance$Type=="Blues"|Dance$Type=="Contra",]
If you only want specific columns
danceNew <- Dance[Dance$Type=="Lindy"|Dance$Type=="Blues"|Dance$Type=="Contra",c("Col1", "Col2")]
Alternatively
danceNew <- Dance[Dance$Type %in% c("Blues", "Contra", "Lindy"),]
Again, if you only want specific columns do the same. The advantage with the final options is you can pass the values in as a variable, thereby making it more dynamic, e.g
danceNames <- c("Lindy", "Blues", "Contra")
danceNew <- Dance[Dance$Type %in% danceNames,]
You're mixing up the variables and the data frames.
This should do the trick.
If your initial data frame is called "Dance" and the new data frame is called "Dance.new":
Dance.new <- subset(Dance, Dance$Type=="Lindy" | Dance$Type=="Blues" | Dance$Type=="Contra"); row.names(Dance.new) <- NULL
Note that the conditions are joined with | (or), since a row's Type cannot equal all three values at once. I like adding the "row.names(Dance.new) <- NULL" line so that the new data frame gets fresh sequential row names instead of keeping the old ones.
Thanks for your help everyone. This is what ended up working for me.
dancenew<-subset(Dance, Type=="Lindy" | Type== "Blues" | Type=="Contra")
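The approaches in this thread can be compared on a small made-up Dance data frame (the values and the Count column are hypothetical). The sketch shows that chaining == with |, using %in%, and using subset() select the same rows, while combining the tests with & selects nothing, because a single Type would have to equal several names at once.

```r
# Hypothetical data frame with a Type column.
Dance <- data.frame(
  Type  = c("Lindy", "Salsa", "Blues", "Contra", "Tango", "Lindy"),
  Count = c(12, 8, 5, 9, 4, 7)
)
wanted <- c("Lindy", "Blues", "Contra")

# Three ways to keep the wanted types.
by_or     <- Dance[Dance$Type == "Lindy" | Dance$Type == "Blues" |
                     Dance$Type == "Contra", ]
by_in     <- Dance[Dance$Type %in% wanted, ]
by_subset <- subset(Dance, Type %in% wanted)

# With & no row can satisfy the condition, so the result is empty.
by_and <- Dance[Dance$Type == "Lindy" & Dance$Type == "Blues", ]
```

The %in% form scales best when the list of types grows or comes from a variable.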
