Normalization of microarray data - R

I want to normalize data using RMA in R, but there is a problem: it does not read .txt files. What should I do to normalize data from a .txt file?

Basically, all normalization methods in Bioconductor are based on the AffyBatch class. Therefore, you have to read your text file (probably a matrix) and create an AffyBatch manually:
AB <- new("AffyBatch", exprs = exprs, cdfName = cdfname, phenoData = phenoData, ...)
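For example, a minimal sketch of that approach, assuming "expr.txt" is a tab-delimited file of probe-level intensities with probe IDs in the first column and one column per array (the file name and CDF name are illustrative, and the matching CDF package must be installed for rma() to run):
library(affy)
# Read the tab-delimited intensity matrix; the first column becomes row names.
dat <- read.table("expr.txt", header = TRUE, row.names = 1, sep = "\t")
AB <- new("AffyBatch", exprs = as.matrix(dat), cdfName = "HG-U133A")
eset <- rma(AB) # background correction, quantile normalization, summarization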

RMA needs an ExpressionSet structure. After reading the file with read.table() and cleaning up the colnames and row.names, convert the data to a matrix and use:
a <- ExpressionSet(assayData = m) # m is your numeric matrix
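Spelled out, a minimal sketch of those steps (the file name is illustrative):
library(Biobase)
# Read the tab-delimited file; the first column holds the feature IDs.
dat <- read.table("expr.txt", header = TRUE, row.names = 1, sep = "\t")
m <- as.matrix(dat)               # ExpressionSet wants a numeric matrix
a <- ExpressionSet(assayData = m)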
If that doesn't work, import your *.txt data into the FlexArray software, which can read it and run RMA. This may work.

I use the normalizeQuantiles() function from the limma R package:
library(limma)
mydata <- read.table("RDotPsoriasisLogNatTranformedmanuallyTABExport.tab", sep = "\t", header = TRUE) # read from file
b <- as.matrix(mydata[, 2:11]) # the numeric data columns (2:5 and 6:11 are adjacent, so one range suffices)
m <- normalizeQuantiles(b, ties = TRUE) # normalize
mydata_t <- t(m) # transpose if you need it (m is already a matrix)


R read Excel by column names

So I have a bunch of Excel files I want to loop through, reading specific, discontinuous columns into a data frame. Using the readxl package works for the basic stuff, like this:
library(readxl)
library(plyr)
wb <- list.files(pattern = "*.xls")
dflist <- list()
for (i in wb) {
  dflist[[i]] <- data.frame(read_excel(i, sheet = "SheetName", skip = 3, col_names = TRUE))
}
# now put them into a data frame
data <- ldply(dflist, data.frame, .id = NULL)
This works (barely), but the problem is that my Excel files have about 114 columns and I only want specific ones. Also, I do not want to let R guess the col_types, because it gets some of them wrong (e.g. for a string column, if the first value starts with a number, it tries to interpret the whole column as numeric, and crashes). So my question is: how do I specify specific, discontinuous columns to read? The range argument uses the cellranger package, which does not allow reading discontinuous columns. Any alternative?
.xlsx >>> you can use the openxlsx package
The read.xlsx function from openxlsx has an optional parameter cols that takes a numeric index, specifying which columns to read.
It seems it reads all columns as characters if at least one column contains characters.
openxlsx::read.xlsx("test.xlsx", cols = c(2,3,6))
.xls >>> you can use the XLConnect package
The potential problem is that XLConnect requires rJava, which can be tricky to install on some systems. If you can get it running, the keep and drop parameters of readWorksheet() accept both column names and indices, and the colTypes parameter deals with column types. This works for me:
options(java.home = "C:\\Program Files\\Java\\jdk1.8.0_74\\") #path to jdk
library(rJava)
library(XLConnect)
workbook <- loadWorkbook("test.xls")
readWorksheet(workbook, sheet = "Sheet0", keep = c(1,2,5))
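If type guessing is a concern here too, the colTypes parameter mentioned above can pin the types explicitly; a sketch, assuming the three kept columns should be read as string, numeric, numeric:
readWorksheet(workbook, sheet = "Sheet0", keep = c(1, 2, 5),
              colTypes = c(XLC$DATA_TYPE.STRING, XLC$DATA_TYPE.NUMERIC, XLC$DATA_TYPE.NUMERIC))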
Edit:
The readxl package works well for both .xls and .xlsx if you want to read a range (rectangle) from your Excel file. E.g.
readxl::read_xls("test.xls", range = "B3:D8")
readxl::read_xls("test.xls", sheet = "Sheet1", range = cell_cols("B:E"))
readxl::read_xlsx("test.xlsx", sheet = 2, range = cell_cols(2:5))
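readxl still cannot select discontinuous columns directly, but one workaround is to read every column as text (which also sidesteps the type-guessing problem from the question) and subset afterwards; a sketch, with illustrative column indices:
library(readxl)
all_cols <- read_xlsx("test.xlsx", col_types = "text") # force text, so nothing is guessed
wanted <- all_cols[, c(2, 3, 6)]                       # keep only the columns you need
wanted[[2]] <- as.numeric(wanted[[2]])                 # convert columns back as appropriate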

Data transpose function in R not working properly

I am using R to do some work but I'm having difficulty transposing data.
My data has observations in rows and variables in columns. The author of the phyDat function indicates that a transpose is needed, because imported data are stored in columns.
So I use the following code to finish this process:
# Read the file from local disk in CSV format (this format can be generated with Excel's Save As).
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
origin <- t(origin)
events <- phyDat(origin, type="USER", levels=c(0,1))
When I check the data shown in RStudio, it is transposed, but the phyDat result is not. So I went back and modified the code as follows:
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
events <- phyDat(origin, type="USER", levels=c(0,1))
This time the data are not transposed, and the result is consistent with that.
My current workaround is to transpose the data in the CSV file before importing it into R. Is there something I can do to fix this problem in R?
I had the same problem and solved it with one extra step: t() returns a matrix, so the result has to be converted back to a data frame before it goes into phyDat:
# Read the file from local disk in CSV format (this format can be generated with Excel's Save As).
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
origin <- as.data.frame(t(origin))
events <- phyDat(origin, type="USER", levels=c(0,1))
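A quick illustration of why the extra step matters: t() always returns a matrix, even when its input is a data frame, which appears to be what trips up phyDat here:
origin <- data.frame(a = c(0, 1), b = c(1, 0))
class(t(origin))                # "matrix" "array" - no longer a data frame
class(as.data.frame(t(origin))) # "data.frame"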
Maybe it is too late, but I hope it helps other users with the same problem.

Export SpectraObjects to csv in ChemoSpec

I am using ChemoSpec to analyse FTIR spectra in R.
I was able to import several CSV files using files2SpectraObject, apply some pre-processing procedures such as normalization and binning, and generate new SpectraObjects with the results.
Is it possible to export the data back to CSV format from the generated SpectraObjects?
So far I tried this
write.table(ftirbin, "E:/ftirbin.txt", sep="\t")
and got this:
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ""Spectra"" to a data.frame
Thanks in advance!
G
If you look at ?Spectra you'll see how a Spectra object is stored. The intensity values are in your_object$data, and the frequency values are in your_object$freq. So you can't export the whole object (it's not a data frame, but rather a list), but you can export the pieces. To export the frequencies in the first column and the samples in the following columns, you can do this (the example uses a built-in data set, SrE.IR):
tmp <- cbind(SrE.IR$freq, t(SrE.IR$data))
colnames(tmp) <- c("freq", SrE.IR$names)
tmp <- as.data.frame(tmp) # it was a matrix
Then you can write it out using write.csv or write.table (check the arguments to avoid row numbers).
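For example (the output file name is just an illustration):
write.csv(tmp, file = "SrE_IR.csv", row.names = FALSE) # row.names = FALSE suppresses row numbers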

R write to file / append at start of file

I am trying to write an input file that requires a single line in the first row indicating whether the file is sparse and, if so, how many variable levels there are. I know how to append a single line to the end of a file, but I can't find a way to prepend one to the start. Any suggestions?
library(e1071)
library(caret)
library(Matrix)
library(SparseM)
iris2 <- iris
iris2$sepalOver5 <- ifelse(iris2$Sepal.Length >= 5, 1, -1)
head(iris2)
summary(iris2)
trainRows <- sample(1:nrow(iris2), nrow(iris2) * .66, replace = FALSE)
testRows <- which(!(1:nrow(iris2) %in% trainRows))
sum(testRows %in% trainRows)
sum(trainRows %in% testRows)
vtu1 <- c('Sepal.Width','Petal.Length','Petal.Width','Species')
dv1 <- dummyVars( ~., data = iris2[,vtu1], sparse = T)
train <- iris2[trainRows,]
test <- iris2[testRows,]
trainX <- as.matrix.csr(predict(dv1, train))
testX <- as.matrix.csr(predict(dv1, test))
trainY <- train[,'sepalOver5']
testY <- test[,'sepalOver5']
write.matrix.csr( as(trainX , "matrix.csr"), file= "amz.train" , fac = TRUE)
headString <- paste('sparse ', max(trainX@ja), sep = '')
I'd basically like to insert/append headString into amz.train in the first row. Any suggestions?
It is generally not possible to prepend to the start of a file: file systems do not support inserting bytes at the beginning, so any way to do it amounts to rewriting the whole file, which is inefficient. This holds for any programming language, not just R.
Three options come to mind:
1. Read in the file, write the meta information first, then write the file's original content after it (see the sketch at the end of this answer; this might be inefficient for large files).
2. Write the information you want to prepend first, before writing the data.
3. If you have a writer that cannot append (write.matrix, for instance, has no append option), you could try to merge this meta information with the data frame, and then write it as a whole.
Since you are using a specialized format, I wouldn't recommend storing this meta-information this way.
Your file would look like:
sparse 6
1:3 2:5.2 3:2 6:1
1:3.7 2:1.5 3:0.2 4:1
1:3.2 2:6 3:1.8 6:1
And then there is option 4: consider keeping a separate meta file which contains information such as the file name, whether it is sparse, and the number of levels. There you could append, and if you repeat this process that would be preferable, since it avoids problems with reading in weirdly formatted files.
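For completeness, a minimal sketch of option 1 in base R, rewriting the file with the header line first (fine for modestly sized files; file name as in the question):
write.matrix.csr(as(trainX, "matrix.csr"), file = "amz.train", fac = TRUE) # write the data first
body <- readLines("amz.train")               # read the data lines back in
writeLines(c(headString, body), "amz.train") # rewrite with the header prepended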

R : How to write an XYZ file from a SpatialPointsDataFrame?

I have a SpatialPointsDataFrame which has one attribute (let's call it z for convenience) as well as lat/long coordinates.
I want to write this out to an XYZ file (i.e. an ASCII file with three columns).
Initially I tried
write.table(spdf, filename, row.names=FALSE)
but this wrote the z value first, followed by the coordinates, on each row. So it was ZXY format rather than XYZ. Not a big deal, perhaps, but annoying for other people who have to use the file.
At present I am using what feels like a really horrible bodge to do this (given below), but my question is: is there a good and straightforward way to write a SPDF out as XYZ, with the columns in the right order? It seems as though it ought to be easy!
Thanks for any advice.
Bodge:
dfOutput <- data.frame(x = coordinates(spdf)[,1], y = coordinates(spdf)[,2])
dfOutput$z <- data.frame(spdf)[,1]
write.table(dfOutput, filename, row.names=FALSE)
Why not just
library(sp)
spdf <- SpatialPointsDataFrame(coords = matrix(rnorm(30), ncol = 2),
                               data = data.frame(z = rnorm(15)))
write.csv(cbind(coordinates(spdf), spdf@data), file = "example.csv",
          row.names = FALSE)
You can write to a .shp file using writeOGR from the rgdal package. Alternatively, you could fortify (from ggplot2) your data and write that as a CSV file.
Following up on Noah's comment about a method like coordinates() but for data values: the raster package has the getValues() method for returning the values of a Raster object, so the same cbind pattern works:
library(raster)
r <- raster('raster.sdat') # read the gridded data as a RasterLayer
write.table(
  cbind(coordinates(r), getValues(r)), # XY coordinates plus cell values
  file = output_file,
  col.names = c("X", "Y", "ZVALUE"),
  row.names = FALSE,
  quote = FALSE
)
