I usually import the filtered feature-barcode matrix (the barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz files) into R with the Read10X function and convert the data to a Seurat object with the CreateSeuratObject function.
However, I found that some publicly available processed scRNA-seq data is shared only as a counts.csv.gz file.
So I tried to convert the counts.csv.gz file to a Seurat object with the following commands:
countsData<-read.delim(file = "~path/TUMOR1_counts.csv.gz", header = TRUE, sep = ",")
Tumor2 <- CreateSeuratObject(counts = countsData, project = "Tumor2", min.cells = 3, min.features = 200)
However, the following error occurred:
Error in CreateAssayObject(counts = counts, min.cells = min.cells, min.features = min.features) :
No feature names (rownames) names present in the input matrix
This is what the counts.csv file looks like.
How can I solve this problem?
First, the count matrix passed to CreateSeuratObject() should have cells in columns and features in rows. It looks like you need to read the first column in as row names and then use t() to transpose the imported counts.
I recommend doing it like this:
countsData <- read.csv(file = "~path/TUMOR1_counts.csv", header = TRUE, row.names = 1)
Tumor2 <- CreateSeuratObject(counts = t(countsData), project = "Tumor2", min.cells = 3, min.features = 200)
I think you have empty cells. You should fill them with zeros.
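The two suggestions above (read the first column as row names and transpose, then fill empty cells with zeros) can be tried on a tiny inline table before running them on the real file. This is only a sketch: the gene and barcode names are made up, and it assumes the CSV has cells in rows and genes in columns, which may not match your file.

```r
# Made-up stand-in for counts.csv: cells in rows, genes in columns,
# with one empty cell (assumption: the real file has this orientation).
csv_text <- "cell,GeneA,GeneB,GeneC
AAAC-1,3,0,1
AAAG-1,,2,5"

# row.names = 1 turns the first column into row names.
counts_df <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# Empty cells are read as NA; replace them with zeros.
counts_df[is.na(counts_df)] <- 0

# Transpose so that features are rows and cells are columns.
counts_mat <- t(as.matrix(counts_df))

dim(counts_mat)        # 3 2  (3 genes, 2 cells)
rownames(counts_mat)   # "GeneA" "GeneB" "GeneC"
```

A matrix shaped like counts_mat is what the counts argument of CreateSeuratObject() expects. If your file already has genes in rows, skip the t() step.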
Related
I am trying to extract data using the rgbif package for multiple species (once the code works I'll be running a list of about 200 species, so it is important for me to implement a list).
I have tried to adapt the code from the following link:
https://github.com/ropensci/rgbif/issues/377
This is what my input file looks like: a CSV file with a single column headed 'taxon' (image not shown).
And my code looks as follows:
library("rgbif")
#input <- read.csv("C:/Users/omi30wk/Desktop/TESTsampledata_udi.csv", header = TRUE, fill = TRUE, sep = ",")
#since you guys don't have my csv file here are three samples species I'm using:
# Acanthorrhynchium papillatum, Acrolejeunea sandvicensis, Acromastigum cavifolium
#'taxon' as header, see image posted above of my csv file for clarity
allpts <- vector('list', length(input))
names(allpts) = input
for (taxon in input){
  cat(taxon, "\n")
  allpts[[taxon]] <- occ_data(scientificName = taxon, limit = 2) #error here
  df <- allpts[[taxon]]$data
  df$networkKeys = NULL
  if (!is.null(df)) {
    df <- df[, !apply(df, 2, function(z)
      is.null(unlist(z)))]
    write.csv(df, paste("/Users/user/Desktop/DATA Bats/allpts_30sept/", gsub(" ", "_", taxon), ".csv", sep = ""))
  }
}
However, I currently get the following error message:
Error in `[[<-`(`*tmp*`, taxon, value = list(`Acanthorrhynchium papillatum` = list( :
no such index at level 1
I'm also happy to try different code to extract data for multiple species. I've already tried many approaches (e.g. loops), but they kept giving me error messages that I haven't been able to solve.
Any help is greatly appreciated!
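One likely cause of the "no such index at level 1" error is that input is a data frame, so for (taxon in input) loops over its columns and taxon becomes the whole vector of names rather than a single species. The sketch below shows this with base R only. The data frame is a stand-in for the CSV, and the list assigned inside the loop is a placeholder for the real occ_data() call, so nothing is downloaded.

```r
# Stand-in for read.csv(...): one column named 'taxon' (assumed layout).
input <- data.frame(
  taxon = c("Acanthorrhynchium papillatum",
            "Acrolejeunea sandvicensis",
            "Acromastigum cavifolium"),
  stringsAsFactors = FALSE
)

# Looping over a data frame visits its COLUMNS, so x is all three names at once.
for (x in input) n_per_iteration <- length(x)
n_per_iteration   # 3

# Looping over the column itself gives one species name per iteration.
species <- input$taxon
allpts <- vector("list", length(species))
names(allpts) <- species
for (taxon in species) {
  # Placeholder for: occ_data(scientificName = taxon, limit = 2)
  allpts[[taxon]] <- list(data = data.frame(name = taxon))
}
names(allpts)
```

With the real package, the placeholder line would be replaced by the occ_data() call from the question. The rest of the loop body can stay as it is.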
I want to take the data from specific cells spread over two worksheets in one workbook, write it into one line on a new "consolidated" spreadsheet, and then repeat for all workbooks in said folder.
I'm struggling with pulling the specific cells and writing them to one line.
I need cells D1, D4 and D7 from sheet 1, along with the rectangle B4:F6 from sheet 2.
So far I can identify the correct folder and pull the data I need, but only for one named file at a time.
What I am unable to do is use read_xlsx across multiple sheets across multiple workbooks at once.
Grateful for any advice.
Below is some of the code I am (unsuccessfully) using.
The following finds the workbooks in the folder:
file.list <- list.files(path="FILE PATH", pattern="*.xlsx", full.names=TRUE, recursive=FALSE)
The following I can only get to work for one workbook at a time.
Ideally I could use the file.list described above. Besides, I can only pull a rectangle, not three specific cells, without repeating the code three times (which is not a problem if that's my only solution).
Info <- read_xlsx("FILE PATH", sheet = 1, range = "G6:G12", col_names = FALSE,
                  col_types = "guess", na = "", trim_ws = TRUE, skip = 0,
                  # n_max = Inf, guess_max = min(1000, n_max),
                  progress = readxl_progress(), .name_repair = "unique")
Amount <- read_xlsx("FILE PATH", sheet = 2, range = "D4:G6", col_names = FALSE,
                    col_types = "numeric", na = "", trim_ws = TRUE, skip = 0,
                    # n_max = Inf, guess_max = min(1000, n_max),
                    progress = readxl_progress(), .name_repair = "unique")
and I'm having mixed success with lapply/sapply.
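One possible way to structure this is a small function that turns a single workbook into a one-row data frame, applied to every path in file.list and stacked with rbind. The sketch below is untested against real workbooks. The function name read_one_workbook and the column names amount_1, amount_2, ... are made up for illustration. The cell addresses are the ones named in the text (D1, D4, D7 and B4:F6), not the ranges in the code above. It assumes none of the three single cells is empty. The read_fn argument exists only so the function can be tried with a stand-in reader; by default it is readxl::read_xlsx.

```r
# Hypothetical helper: one workbook in, one row out.
read_one_workbook <- function(path, read_fn = readxl::read_xlsx) {
  # Three single cells from sheet 1, read one at a time.
  cells <- c("D1", "D4", "D7")
  info <- vapply(cells, function(cell) {
    as.character(read_fn(path, sheet = 1, range = cell, col_names = FALSE)[[1]])
  }, character(1))

  # Rectangle from sheet 2, flattened row by row into one vector.
  block <- read_fn(path, sheet = 2, range = "B4:F6", col_names = FALSE)
  amounts <- as.vector(t(as.matrix(block)))
  names(amounts) <- paste0("amount_", seq_along(amounts))

  data.frame(file = basename(path), as.list(info), as.list(amounts),
             stringsAsFactors = FALSE)
}

# Usage with the file.list from the question (needs readxl and real files):
# consolidated <- do.call(rbind, lapply(file.list, read_one_workbook))
# write.csv(consolidated, "consolidated.csv", row.names = FALSE)
```

Reading the three cells in a small loop avoids repeating the read_xlsx call three times by hand, and keeping everything for one workbook in one function makes it easy to see which file causes trouble when lapply fails.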
I would like to download and open the following Excel-file with monthly and annual consumer price indices directly from within R.
https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master
(the link can be found on this site: https://www.bfs.admin.ch/bfs/de/home/statistiken/preise/landesindex-konsumentenpreise/lik-resultate.assetdetail.7066959.html)
I used to download this file manually using the browser, save it locally on my computer, then open the xlsx-file with R and work with the data without any problems.
I have now tried to read the file directly from within R, but without luck so far. As you can see from the URL above, there is no .xlsx extension or the like, so I figured the file is zipped somehow. Here is what I've tried so far and where I am stuck.
library(foreign)
library(xlsx)
# in a browser, this link opens or downloads an xlsx file
likurl <- "https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master"
temp <- tempfile()
download.file(likurl, temp)
list.files <- unzip(temp,list=TRUE)
data <- read.xlsx(unz(temp,
+ list.files$Name[8]), sheetIndex=2)
The result from the last step is
Error in +list.files$Name[8] : invalid argument to unary operator
I do not really understand the unz function, but can see this is somehow wrong when reading the help file for unz (I found this suggested solution somewhere online).
I also tried the following, different approach:
library(XLConnect)
likurl <- "https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master"
tmp = tempfile(fileext = ".xlsx")
download.file(likurl, tmp)
readWorksheetFromFile(tmp, sheet = 2, startRow = 4,
colNames = TRUE, rowNames = FALSE)
with the last line returning as result:
Error: ZipException (Java): invalid entry size (expected 1644 but got 1668 bytes)
I would greatly appreciate any help on how I can open this data and work with it as usual when reading in data from excel into R.
Thanks a lot in advance!
Here's my solution, thanks to the hint by @Johnny. Reading the data from Excel worked better with read.xlsx from the xlsx package (instead of read_excel, as suggested in the link above).
Some ugly details remain: the column names are not passed on correctly (except for the first and 11th columns), and new columns are created from some of the options passed to read.xlsx (e.g., a column named colNames with all entries TRUE; for details, see the structure with str(LIK.m)). A likely cause is that colNames, rowNames, and similar options are arguments of openxlsx::read.xlsx rather than xlsx::read.xlsx, which hands unrecognised arguments on to data.frame(). However, that would be a topic for another question, and for the moment these issues can be fixed in a quick and dirty way :-).
library(httr)
library(foreign)
library(xlsx)
# in a browser, this link opens or downloads an xlsx file
likurl<-'https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master'
p1f <- tempfile()
download.file(likurl, p1f, mode="wb")
GET(likurl, write_disk(tf <- tempfile(fileext = ".xlsx")))
# annual CPI
LIK.y <- read.xlsx(tf,
sheetIndex = 2, startRow = 4,
colNames = TRUE, rowNames = FALSE, stringsAsFactors = FALSE,
detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE ,
na.strings = "NA", check.names = TRUE, fillMergedCells = FALSE)
LIK.y$X. <- as.numeric(LIK.y$X.)
str(LIK.y)
# monthly CPI
LIK.m <- read.xlsx(tf,
sheetIndex = 1, startRow = 4,
colNames = TRUE, rowNames = FALSE, stringsAsFactors = FALSE,
detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE ,
na.strings = "NA", check.names = TRUE, fillMergedCells = FALSE)
LIK.m$X. <- as.numeric(LIK.m$X.)
str(LIK.m)
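Since an xlsx file is a zip container, a quick sanity check after downloading is to look at the first two bytes, which should be "PK". The helper below is a hypothetical illustration in base R (the name looks_like_xlsx is made up). It only tells you whether the download is a zip archive at all, for example rather than an HTML error page. It will not detect a file damaged by a text-mode download, which is what mode = "wb" guards against.

```r
# Hypothetical helper: TRUE if the file starts with the zip signature "PK".
looks_like_xlsx <- function(path) {
  sig <- readBin(path, what = "raw", n = 2L)
  identical(sig, charToRaw("PK"))
}

# Self-contained demonstration with two small temporary files.
good <- tempfile()
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04)), good)   # bytes "PK\003\004"
bad <- tempfile()
writeLines("<html>not a workbook</html>", bad)

looks_like_xlsx(good)   # TRUE
looks_like_xlsx(bad)    # FALSE

# With the URL from the question (needs network access):
# tf <- tempfile(fileext = ".xlsx")
# download.file(likurl, tf, mode = "wb")
# looks_like_xlsx(tf)
```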
I am trying to use the specan function from the warbleR package. I want to pass my own wav file as an argument to the function. I have seen only one example in the docs, and it is not very self-explanatory.
wave_file <- readWave("C:/Users/ABC/Downloads/file_example_WAV_1MG.wav", from = 1, to = Inf, units = c("seconds"), header = FALSE, toWaveMC = NULL)
head(wave_file)
mono_file <- mono(wave_file, which = c("both"))
head(mono_file)
auto_file <- autodetec(X = "C:/Users/ABC/Downloads/file_example_WAV_1MG.wav")
head(auto_file)
dataframe <- data.frame(list = c("sound.files", "selec", "start", "end"))
dataframe <- data.frame(wave_file, "abc", 1, Inf)
dataframe
# Existing Example found in R docs
#setwd('C:/Users/ABC/Downloads')
#data1 <- data(list = c("Phae.long1", "Phae.long2", "Phae.long3", "Phae.long4", "selec.table"))
#writeWave(Phae.long1,"Phae.long1.wav")
#writeWave(Phae.long2,"Phae.long2.wav")
#writeWave(Phae.long3,"Phae.long3.wav")
#writeWave(Phae.long4,"Phae.long4.wav")
#writeWave(Phae.long1,"file_example_WAV_1MG.wav")
#writeWave(Phae.long2," ")
#writeWave(Phae.long3,"1")
#writeWave(Phae.long4,"Inf")
getwd()
#file <- specan(X = selec.table, bp = c(0, 22))
#head(file)
file <- specan(X = dataframe, bp = c(0,22))
How can I pass my own .wav file as an argument to the specan function?
Instead of passing the actual wav object to the data frame, pass the name of the file. Your code should look like this:
# One row per selection: file name, selection number, start and end time
dataframe <- data.frame(sound.files = "file_example_WAV_1MG.wav", selec = 2, start = 1, end = 20)
a <- specan(X = dataframe, bp = c(0, 22))
You can then view a, the data frame that holds the extracted features. Make sure the file is stored in your working directory.
I am trying to load an example dataset from here: http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv to run an example PCA.
The correctly loaded data frame can be replicated with this line of code:
decathlon = read.csv('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
header = TRUE, row.names = 1, check.names = FALSE,
dec = '.', sep = ';')
However, I was wondering whether this can be replicated with function(s) from the readr package. A suitable function seems to be read_csv2; however, the row.names argument is not available:
dplyrtlon = read_csv2('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
col_names = TRUE, col_types = NULL, skip = 0)
Any suggestion on how to do this within readr?
readr returns tibbles instead of plain data frames. Tibbles are a stricter variant of data frames that deliberately do not support row names.
Depending on what you want to do with your data after reading it in, you could either give the first column a name (it looks like it holds last names):
dplyrtlon <- read_csv2('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
col_types = NULL, skip = 0)
names(dplyrtlon)[1] <- "last_name"
or you could convert the result to a data frame and use the contents of the first column as row names:
r <- as.data.frame(dplyrtlon)
rownames(r) <- r[, 1]
r <- r[, -1]