Generating fastQC report - r

I am using the fastqcr R package to generate multi-QC and single-QC reports of FASTQ files for RNA-seq analysis. While my multi-QC report works fine, I get the following error when trying to generate a single-QC report from the zipped FastQC results file.
Error in switch(status, PASS = "#00AFBB", WARN = "#E7B800", FAIL =
"#FC4E07") : EXPR must be a length 1 vector
The code I am using is
# Step 6 - Building the final report
# It creates an HTML file containing FastQC reports of one or multiple samples.
#for multi-qc
qc_report(qc.dir, result.file = "F:/SUDI#UCSF01/COURSES/RNA seq Analysis/scRNA seq by R/My Tutorials/Made by Sudi/Trial Analysis files/FastQC/fastqc_results/multi_qc_report",
experiment = "Exome sequencing of colon cancer cell lines", interpret = TRUE)
# For single-qc
qc.file1 <- "F:/SUDI#UCSF01/COURSES/RNA seq Analysis/scRNA seq by R/My Tutorials/Made by Sudi/Trial Analysis files/FastQC/fastqc_results/ERR522959_2_fastqc.zip"
qc.file1
qc_report(qc.file1, result.file = "F:/SUDI#UCSF01/COURSES/RNA seq Analysis/scRNA seq by R/My Tutorials/Made by Sudi/Trial Analysis files/FastQC/fastqc_results/single_qc_report", interpret = TRUE, preview = TRUE)
Can somebody help me troubleshoot this?
Thank you

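I guess it is a version problem, meaning that fastqcr's parsing of the FastQC output was written for older FastQC versions and can't handle newer ones. This is just a guess, but it is consistent with the fact that the example file bundled with the package runs without error: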
qc.file <- system.file("fastqc_results", "S1_fastqc.zip", package = "fastqcr")
qc_report(qc.file, result.file = "~/Desktop/result", interpret = TRUE)
This will work.
I am wondering what the advantages of fastqcr are, as MultiQC already has a very clear presentation of the data. In newer versions of MultiQC you can also see an overview of failed and passed modules.
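One way to check the guess above is to look at the module statuses stored in the zip itself. The switch() call in the error only handles a single PASS, WARN or FAIL value, so a missing or unexpected status would trigger it. The sketch below uses base R only and assumes the zip contains FastQC's usual tab-separated summary.txt (status, module, file name). The helper names parse_fastqc_summary, read_fastqc_summary and unexpected_status are made up for illustration and are not part of fastqcr.

```r
# Parse the lines of a FastQC summary.txt (tab-separated: status, module, file).
parse_fastqc_summary <- function(lines) {
  fields <- strsplit(lines[nzchar(lines)], "\t", fixed = TRUE)
  data.frame(
    status = vapply(fields, `[`, character(1), 1),
    module = vapply(fields, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

# Read summary.txt straight out of a *_fastqc.zip without extracting it.
# Assumption: the archive holds exactly one file whose name ends in summary.txt.
read_fastqc_summary <- function(zip_path) {
  entries <- utils::unzip(zip_path, list = TRUE)$Name
  inner <- grep("summary\\.txt$", entries, value = TRUE)
  stopifnot(length(inner) == 1)
  con <- unz(zip_path, inner)
  on.exit(close(con))
  parse_fastqc_summary(readLines(con))
}

# Rows whose status is not one of the three values the report colours.
unexpected_status <- function(summary_df) {
  summary_df[!summary_df$status %in% c("PASS", "WARN", "FAIL"), , drop = FALSE]
}
```

Calling unexpected_status(read_fastqc_summary(qc.file1)) should return zero rows if every status is one the report can handle. If it does, the cause is probably elsewhere.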

Related

R - [DESeq2] - Making DESeq Dataset object from csv of already normalized counts

I'm trying to use DESeq2's plotPCA function in a meta-analysis of data.
Most of the files I have received are raw counts, pre-normalization. I normalize them with DESeq2 and then run plotPCA.
One of the files I received does not have raw counts or even the FASTQ files, just the data that has already been normalized by DESeq2.
How could I go about importing this data (non-integers) as a DESeqDataSet object after it has already been normalized?
Consensus in vignettes and other comments seems to be that objects can only be constructed from matrices of integers.
I was mostly concerned with getting the format the same between plots. Ultimately, I just used a workaround to get the plots looking the same via ggfortify.
If anyone is curious, this is what I ended up doing. Note that the "names" file is organized like the metadata file used as colData when building a DESeq object with DESeqDataSetFromMatrix, but I renamed the design column from "conditions" to "group" so it would match the output of plotPCA. The plots should look identical.
library(ggfortify)
data<-read.csv('COUNTS.csv',sep = ",", header = TRUE, row.names = 1)
names<-read.csv("NAMES.csv")
PCA<-prcomp(t(data))
autoplot(PCA, data = names, colour = "group", size=3)
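One difference worth keeping in mind: by default DESeq2's plotPCA works on transformed data and keeps only the 500 most variable genes (ntop = 500), while the prcomp call above uses every row. A minimal base R sketch of that gene selection is below. It assumes the matrix has genes in rows and samples in columns and is already on a log-like scale. The function name pca_top_variable is made up for illustration.

```r
# PCA on the `ntop` most variable rows (genes) of a genes-by-samples matrix.
# Assumption: `mat` is already normalized and on a log-like scale.
pca_top_variable <- function(mat, ntop = 500) {
  mat <- as.matrix(mat)
  row_var <- apply(mat, 1, var)                       # per-gene variance
  keep <- order(row_var, decreasing = TRUE)[seq_len(min(ntop, length(row_var)))]
  pca <- prcomp(t(mat[keep, , drop = FALSE]))         # samples become rows
  percent_var <- pca$sdev^2 / sum(pca$sdev^2)         # share of variance per PC
  list(pca = pca, percent_var = percent_var)
}
```

The pca element can be handed to autoplot() in the same way as PCA above, and percent_var can be used to label the axes.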

Exported file from R using `haven` cannot be opened by SAS

When exporting data from R using haven::write_sas(), the resulting sas7bdat file is not recognized (i.e. cannot be loaded) by SAS EG/9.4. Although there are several other packages such as foreign that provide alternative approaches, I was hoping to find a relatively automated way to push a dataset from my R session directly into SAS.
When using haven, the file is created but cannot be opened by either SAS EG or SAS 9.4:
# Load package
library(haven)
# Save data
write_sas(mtcars, "mtcars.sas7bdat")
Using foreign as alternative to haven:
library(foreign)
write.foreign(df = mtcars,
datafile = 'mtcars.txt',
codefile = 'mtcars.sas',
dataname = 'libraryname.tablename', # Destination in SAS to save the data
package = 'SAS')
Running the SAS code output from foreign is successful.
* Written by R;
* write.foreign(df = mtcars, datafile = "mtcars.txt", codefile = "mtcars.sas", ;
DATA libraryname.tablename ;
INFILE "mtcars.txt"
DSD
LRECL= 43 ;
INPUT
mpg
cyl
disp
hp
drat
wt
qsec
vs
am
gear
carb
;
RUN;
However, neither of these methods helps with automatically pushing the data directly from R into a SAS library, which would be preferable.
There is a lengthy discussion on GitHub describing some of the challenges when exporting data from R for use in SAS via haven. In addition to providing a solution on how to automate data transfer from R to SAS, I hope this can serve as an answer to some related questions.
If one wants to use tools designed by SAS for interoperability with R, RSWAT on GitHub is likely a more robust option. However, this will assume that you have access to SAS Cloud Analytics Services configured for this purpose.
If you are working with SAS 9.4 on your machine and perhaps also connect to SAS servers (i.e. using rsubmit; commands), it should be relatively straightforward to pass a dataset directly from R into a SAS library. There are three steps:
Format the dataset for SAS. Although foreign will do a lot of the formatting changes, I prefer converting factors back to characters and replacing NA with "". I find this ensures that colleagues need no special formatting to open the final table in SAS.
# Example data
data <- data.frame(ID = c(123, NA, 125),
disease = factor(c('syphilis', 'gonorrhea', NA)),
AdmitDate = as.Date(c("2014-04-05", NA, "2016-02-03")),
DOB = as.Date(c("1990-01-01", NA, NA)))
# Function defined for converting factors and blanks (needs dplyr for %>% and tidyr for replace_na)
library(dplyr)
convert_format_r2sas <- function(data){
data <- data %>%
dplyr::mutate_if(is.factor, as.character) %>%
dplyr::mutate_if(is.character, tidyr::replace_na, replace = "")
return(data)
}
# Convert some formatting
data <- convert_format_r2sas(data)
Use foreign to export the data and associated code
library(foreign)
# Ensure the data and code files are saved in an easily accessible location (ideally in or downstream of your R project directory)
write.foreign(df = data ,
datafile = 'data.txt',
codefile = 'data.sas',
dataname = 'libraryname.tablename', # Destination in SAS to save the data
package = 'SAS')
Pass the code to the local SAS installation using a custom function. You may need to adjust the location of sas.exe as well as the configuration file. The function works both when given a list of SAS files and when given SAS code written directly in R as a character vector.
# Define function for passing the code to SAS and upload data (may require tweaking the local SAS installation location and configuration file)
pass_code_to_sas <- function(sas_file_list = NULL, inputstring = NULL,
sas_path = "C:/LocationTo/SASHome/SASFoundation/9.4/sas.exe",
configFile = "C:/LocationTo/SASHome/SASFoundation/9.4/SASV9.CFG") {
# If provided list of scripts, check they are all valid
if(!is.null(sas_file_list)){
if(!is.list(sas_file_list) || !all(purrr::map_lgl(sas_file_list, file.exists))){
stop("You entered an invalid file location or did not provide the locations as a list of characters")
}
}
sink(file.path(R.home(), "temp_codePass.sas"))
if(!is.null(sas_file_list)){
for(i in 1:length(sas_file_list)){
cat(readLines(sas_file_list[[i]]), sep = "\n")
}
}
cat(inputstring)
sink()
# Output message to view what code was sent...
message(paste0("The above info was passed to SAS: ",
if(!is.null(sas_file_list)){for(i in 1:length(sas_file_list)){cat(readLines(sas_file_list[[i]]), sep = "\n")}},
print(inputstring)))
# Run SAS
system2(sas_path,
args = paste0(
"\"", file.path(R.home(), "temp_codePass.sas"), "\"",
if(!is.null(configFile)) { paste0(" -config \"", configFile, "\"")}
)
)
# Delete the SAS file
file.remove(file.path(R.home(), "temp_codePass.sas"))
}
# Pass data to SAS
pass_code_to_sas(sas_file_list = list('path2codefile/data.sas'))

R: use single file while running a for loop on list of files

I am trying to create a loop where I select one file name from a list of file names and use that one file to run read.capthist, followed by discretize, secr.fit and derived, and then store the outputs using save. The list contains 10 files with identical rows and columns; the only difference between them is the geographical coordinates in each row.
The issue I am running into is that capt needs to be a single file (in the secr package they are 'captfile' types), but I don't know how to select a single file from this list and get my loop to recognize it as a single entity.
This is the error I get when I try and select only one file:
Error in read.capthist(female[[i]], simtraps, fmt = "XY", detector = "polygon") :
requires single 'captfile'
I am not a programmer by training. I've learned R on my own and used Stack Overflow a lot to solve my issues, but I haven't been able to figure this out. Here is the code I've come up with so far:
library(secr)
setwd("./")
files = list.files(pattern = "female*")
lst <- vector("list", length(files))
names(lst) <- files
for (i in 1:length(lst)) {
capt <- lst[i]
femsimCH <- read.capthist(capt, simtraps, fmt = 'XY', detector = "polygon")
femsimdiscCH <- discretize(femsimCH, spacing = 2500, outputdetector = 'proximity')
fit <- secr.fit(femsimdiscCH, buffer = 15000, detectfn = 'HEX', method = 'BFGS', trace = FALSE, CL = TRUE)
save(fit, file="C:/temp/fit.Rdata")
D.fit <- derived(fit)
save(D.fit, file="C:/temp/D.fit.Rdata")
}
simtraps is a list of coordinates.
Ideally I would also like my outputs to have unique identifiers. Since I am simulating data and will have to compare all the results, I don't want each iteration to overwrite the previous output.
I know I can use this code by bringing in each file and running this separately (this code works for non-simulation runs of a couple data sets), but as I'm hoping to run 100 simulations, this would be laborious and prone to mistakes.
Any tips would be greatly appreciated for an R novice!
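A likely cause, judging from the code above: lst is a list of empty (NULL) slots that merely carries the file names as names, and lst[i] returns a one-element list rather than a file name, whereas read.capthist expects a single file name. Indexing the character vector files directly avoids that. The sketch below shows the looping and unique-naming pattern in base R only. The helper run_per_file and the "_fit.rds" suffix are made up for illustration, and the secr calls appear only as comments copied from the question.

```r
# Apply `process_one` to each file name and save each result under a unique name.
run_per_file <- function(files, process_one, out_dir) {
  out_paths <- character(length(files))
  for (i in seq_along(files)) {
    # files[[i]] is a single character string, not a list
    result <- process_one(files[[i]])
    stem <- tools::file_path_sans_ext(basename(files[[i]]))
    out_paths[i] <- file.path(out_dir, paste0(stem, "_fit.rds"))
    saveRDS(result, out_paths[i])   # one output file per input file
  }
  invisible(out_paths)
}

# Hypothetical usage with the calls from the question (not run here):
# files <- list.files(pattern = "^female")   # `pattern` is a regular expression
# run_per_file(files, function(f) {
#   ch   <- read.capthist(f, simtraps, fmt = "XY", detector = "polygon")
#   disc <- discretize(ch, spacing = 2500, outputdetector = "proximity")
#   fit  <- secr.fit(disc, buffer = 15000, detectfn = "HEX", method = "BFGS",
#                    trace = FALSE, CL = TRUE)
#   list(fit = fit, derived = derived(fit))
# }, out_dir = "C:/temp")
```

Each result can later be reloaded with readRDS() for comparison across simulations.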

Reading binary data from accelerometer device into R

Essentially I want to know if there is a practical way to read a particular kind of binary file into R. I have some MATLAB code which does what I want, but ideally I want to be able to do this in R.
The Matlab code is:
fid = fopen('filename');
A(:) = fread(fid, size*2, '2*uint8=>uint8',510,'ieee-le');
and so far in R I've been using:
to.read = file("filename", "rb")
bin = readBin(to.read, integer(), n = 76288, endian = "little")
My confusion is with the 3rd and 5th arguments of the MATLAB function fread(). I don't understand exactly what '2*uint8=>uint8' or 'ieee-le' mean in terms of interpreting the binary data. This is what is holding me back from implementing it in R.
Also, the file extension is .cwa, apparently this is a very efficient format to have high frequency (100Hz) activity data recorded in.
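For reference, in MATLAB's fread the precision '2*uint8=>uint8' means "read two unsigned 8-bit values at a time and keep them as uint8". The skip argument 510 means 510 bytes are skipped after each such pair. 'ieee-le' asks for little-endian byte order, which makes no difference for single-byte values. The readBin call above instead reads consecutive 4-byte signed integers, so it is not equivalent. A minimal base R sketch of the strided read is below. The function name read_strided_bytes is made up for illustration, and it assumes the file is small enough to load into memory.

```r
# Read `n_values` unsigned bytes, taking `block` bytes and then skipping `skip`
# bytes, repeatedly. This mirrors fread(fid, n, '2*uint8=>uint8', 510) in MATLAB.
read_strided_bytes <- function(path, n_values, block = 2L, skip = 510L) {
  raw_all <- readBin(path, what = "raw", n = file.size(path))  # whole file as bytes
  stride <- block + skip
  n_blocks <- ceiling(n_values / block)
  starts <- (seq_len(n_blocks) - 1L) * stride        # 0-based offset of each block
  idx <- as.vector(outer(seq_len(block), starts, `+`))  # 1-based byte positions
  idx <- idx[idx <= length(raw_all)]                 # stop at end of file
  idx <- head(idx, n_values)
  as.integer(raw_all[idx])                           # values in 0..255
}
```

With the defaults this returns bytes 1, 2, 513, 514, 1025, 1026 and so on, i.e. the first two bytes of every 512-byte block.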

Extract census data using ACS R package for all Zip codes and obtain future projections

I have a couple questions about how I can best use the acs R package. Thanks in advance for your help.
I would like to build up a comprehensive data frame that is a lookup table with all census data I can get from their API for each Zip code. Currently I just look up several individual tables using R code like the below example. Is there a better way of finding all available tables and build up the data table dataset automatically with the column names populated? I am aware of the acs.lookup function, but I would like to load all the tables and get the data for their zip codes. Is there a way to get a list of all the tables from the acs.lookup output, or maybe a complete list of the tables that are available?
I would also like to get future projection data for as many variables as I can get. I think I can calculate the projections that I found using the above methods using multiple years (2014, 2013, 2012, 2011) and using acs14lite R package for 2014. Before I do this I am wondering if the US census itself has future projections using this ACS data or something else?
library(acs)
# Create user specified geographies
# use all zip codes
zip_geo = geo.make(zip.code = "*")
# Create race data frame
# get race data
race.data = acs.fetch(geography=zip_geo, table.number = "B03002",
col.names = "pretty", endyear = 2013, span = 5)
# create data frame of the demographics
zip_demographics = data.frame(region = as.character(geography(race.data)$zipcodetabulationarea),
total_population = as.numeric(estimate(race.data[,1])))
zip_demographics$region = as.character(zip_demographics$region)
# convert to a data.frame
race_df = data.frame(white_alone_not_hispanic = as.numeric(estimate(race.data[,3])),
black_alone_not_hispanic = as.numeric(estimate(race.data[,4])),
asian_alone_not_hispanic = as.numeric(estimate(race.data[,6])),
hispanic_all_races = as.numeric(estimate(race.data[,12])))
zip_demographics$percent_white = (race_df$white_alone_not_hispanic / zip_demographics$total_population * 100)
zip_demographics$percent_black = (race_df$black_alone_not_hispanic / zip_demographics$total_population * 100)
zip_demographics$percent_asian = (race_df$asian_alone_not_hispanic / zip_demographics$total_population * 100)
zip_demographics$percent_hispanic = (race_df$hispanic_all_races / zip_demographics$total_population * 100)
You can download a copy of all of the codes in the 2010 table shells at the following link. Clicking it starts the download of an Excel file.
ACS Table Shells Download
What I did was load this document as a data frame, format the column with the appropriate codes, and then use cell addresses to pull in a variable, for example: povertyNumerator <- acsTable[781,2].
You cannot fully automate the process, because you need to decide how to break out 'categories' of responses and do your own math, but outside of that you can work pretty quickly with this table and some acs package skills.
When I try to run the code above I get the following error messages:
> race.data = acs.fetch(geography=zip_geo, table.number = "B03002", col.names = "pretty", endyear = 2013, span = 5)
trying URL 'http://web.mit.edu/eglenn/www/acs/acs-variables/acs_5yr_2013_var.xml.gz'
Content type 'application/xml' length 720299 bytes (703 KB)
downloaded 703 KB
Error in .subset2(x, i, exact = exact) : subscript out of bounds
In addition: Warning message:
In (function (endyear, span = 5, dataset = "acs", keyword, table.name, :
temporarily downloading and using archived XML variable lookup files;
since this is *much* slower, recommend running
acs.tables.install()
And the output race.data is not produced. Any idea why this is happening?

Resources