I have an Agilent dataset. When I read it in R, I create a file called the targets file: a .txt file that lists all of the per-sample .txt files. How can I read all the samples in MATLAB the way I do in R?
The code in MATLAB is:
AGFEData = agferead(File)
% Example:
agfeStruct = agferead('fe_sample.txt')
We use quantile normalization in R. How do we normalize in MATLAB? The normalization code in MATLAB for a microarray is XNorm = manorm(X).
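For reference, here is a minimal sketch of the R side, assuming the limma package is used and that the targets file has a FileName column listing the per-sample Agilent .txt files (both are assumptions, since the question does not show the R code). Note that manorm performs a global scaling of each array rather than quantile normalization; if a quantile method is needed, the Bioinformatics Toolbox also provides quantilenorm.

library(limma)
# Read the targets file; assumed to have a FileName column pointing at each
# sample's Agilent .txt export.
targets <- readTargets("targets.txt")
# Read every sample listed in the targets file in one call.
RG <- read.maimages(targets, source = "agilent")
# Quantile normalization between arrays, as done in R.
MA <- normalizeBetweenArrays(RG, method = "quantile")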
I want to write a correlation matrix to an SPSS file. The file format is .sav, and the required correlation matrix is a mixture of character string columns and numeric columns (mostly numeric). SPSS automatically recognises strings as nominal data and numbers as scale data. If SPSS does not recognise the columns as nominal or scale, then I cannot load the SPSS file directly into AMOS for modelling. When I write a data frame as a .sav file using the write.sav(df, "filename", digits = x) function from the misty package, SPSS recognises the numbers as scale data but limits the decimal places to 2; in the case of 0.999, SPSS rounds the value up to 1.00. I need SPSS to recognise my data frame numbers to 5 or 8 decimal places. If anyone can help me solve this, I would be very grateful!
I've ensured that the data frame column types are correct before writing the .sav file, and I have tried converting the data frame to a tibble and ensuring the columns are the correct class before writing the .sav file. In both cases, SPSS only loads the .sav files with two decimal places. I have checked the settings in SPSS, and of course SPSS will display .sav files generated by itself (and .csv files) to any number of decimal places. Writing a .csv file from R and then importing it into SPSS is a work-around, but sub-optimal: I need to write a workable file directly in the .sav format that SPSS and AMOS will read correctly to many decimal places. Note that SPSS will display more decimal places if the numeric values are written to the .sav file as strings; however, the .sav file will then not load into AMOS, because the columns are recognised by SPSS as nominal data (strings) and not scale data (numbers). Ultimately, I need a .sav file that causes SPSS to recognise numeric data as scale data with more than 2 decimal places. AMOS just follows whatever SPSS does.
Some test code could be:
install.packages("misty")
library(misty)
col1 = c(0.111,0.222,0.333)
col2 = c(0.444,0.555,0.666)
col3 = c(0.777,0.888,0.999)
df = data.frame(col1, col2, col3)
df
str(df)
write.sav(df,"test.sav", digits = 5)
Thanks in advance!
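One alternative worth testing (my suggestion, not part of the original question): the .sav format stores numeric values as full-precision doubles, and a two-decimal display is typically just the variable's display format. The haven package's write_sav() also writes .sav files; a minimal sketch with the same test data frame:

library(haven)
# Numeric columns are stored as doubles; if SPSS still shows two decimals,
# that is the display format, which can be widened in Variable View.
write_sav(df, "test_haven.sav")

If haven shows the same symptom, changing the variable format in SPSS (e.g. from F8.2 to F10.5) should reveal the full stored precision.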
I've just started using R and I have a question regarding cluster analysis in R.
I use the agnes function to run a cluster analysis on my dataset, but I realized that the cluster results and the pltrees differ depending on whether I use the .txt file or the .csv file.
Maybe it is easier to explain my problem with images.
My dataset in .txt format:
I used the following code to read the data in R:
data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = T)
and everything seems fine:
I apply the cluster analysis,
complete1 <- agnes(data01, stand = FALSE, method = 'complete')
plot(complete1, which.plots=2, main='Complete-Linkage')
And here is the pltree:
I followed the same steps with the .csv file, which contains exactly the same dataset. Here is the dataset in .csv format:
Again, the cluster analysis for the .csv file:
data02 <- read.csv("D:/CLUSTER_ANALYSIS/NumericData3.csv", header = T)
complete2 <- agnes(data02, stand = FALSE, method = 'complete')
plot(complete2, which.plots=2, main='Complete-Linkage')
And the pltree is completely different.
So, the DECIMAL SEPARATOR for the .txt file is a COMMA and for the .csv file it is a DOT. Which of these results is correct? Does R expect a comma or a dot as the decimal separator in a numeric dataset?
From the R manual on read.table (and read.csv) you can see the default decimal separators: they are dot for both of the functions you used, so the comma decimals in your .txt file were read incorrectly. You can set the separator to whatever you like with the dec parameter, e.g.:
data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = T, dec=",")
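A quick way to confirm the fix (a sketch that assumes both files really contain the same dataset) is to read each file with its correct decimal separator and check that the two data frames match before clustering:

# .txt uses comma decimals; .csv uses the default dot.
data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = T, dec = ",")
data02 <- read.csv("D:/CLUSTER_ANALYSIS/NumericData3.csv", header = T)
str(data01)                # columns should now be numeric, not factors
all.equal(data01, data02)  # TRUE if the two files hold the same numbers

With the comma decimals read correctly, both agnes() calls operate on the same numeric data and the pltrees should agree.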
The following code is what I came up with; it is quite slow. Any suggestions? Thank you!
The details: first create a dataset in PROC IML using R code, then pass it into a regular SAS PROC MIXED step for analysis, then use PROC APPEND to store the results, and iterate the process 100,000 times.
proc iml;
do i = 1 to 100000;
submit / R;
library(mvtnorm)
library(dplyr)
library(tidyr)
# 10 subjects x 5 time points; each row of beta is one subject's mean vector
beta <- matrix(1:50, byrow = TRUE, 10, 5)
# Build a symmetric covariance matrix
sigma <- matrix(1:25, 5)
sigma[lower.tri(sigma)] <- t(sigma)[lower.tri(sigma)]
# One multivariate-normal draw per subject
sample <- t(apply(beta, 1, function(m) rmvnorm(1, mean = m, sigma = sigma)))
Group <- rep(factor(LETTERS[1:2]), each = 5)
sample <- cbind(sample, Group, c(1:5))
concat <- function(x) paste0('Visit', x[, 2], 'Time', x[, 1])  # defined but unused below
cnames <- c(paste0("Time", 1:5), "Group", "ID")
colnames(sample) <- cnames
sample <- data.frame(sample)
# Reshape wide -> long for the repeated-measures analysis
sample <- gather(sample, Visit, Response, paste0("Time", 1:5), factor_key = TRUE)
endsubmit;
/* Pull the simulated sample from R into a SAS data set */
call ImportDataSetFromR("rdata", "sample");
submit;
/* Fit the repeated-measures model to this sample */
proc mixed data=rdata;
ods select none;
class Group Visit ID;
model Response = Visit|Group;
repeated Visit / subject=ID type=un;
ods output Tests3=Test;
run;
/* Accumulate the Type 3 tests across iterations */
proc append data=Test base=result force;
run;
endsubmit;
end;
quit;
proc print data=result;
run;
The ideal approach would be to do the full simulation in SAS/IML, because that would minimize the transfer of data between SAS and R. You can use the RANDNORMAL function to simulate multivariate normal data. Use the CREATE/APPEND statements to save the simulated samples to a SAS data set. Then call PROC MIXED and use a BY statement to analyze all the samples. See "Simulation in SAS" for the general ideas. No SUBMIT blocks are required. If you run into programming issues, consult the "Simulation" posts on The DO Loop blog, or, if you intend to do a lot of simulation in SAS, you might want to find a copy of Simulating Data with SAS (Wicklin, 2013).
If you don't know SAS/IML well enough to run the simulation, then generate all 100,000 samples in R (vectorize, if possible) and manufacture a SampleID variable to identify each sample. Then import the entire data into SAS and use the BY statement trick to do the analysis.
I don't know exactly what you are doing, so this has to be general.
Move the loop inside the R code. Stay inside R to generate one big data frame and then import that into SAS; a sketch follows below. Looping over those submits will be slower: there is unavoidable overhead to call R, to import the data from R (which is another R call), and then to run your SAS append. Putting the loop into R eliminates that overhead.
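A hedged sketch of that one-big-data-frame idea, reusing the setup from the question (the helper name simulate_one and the SampleID variable are mine, added for illustration):

library(mvtnorm)
library(tidyr)

beta  <- matrix(1:50, byrow = TRUE, 10, 5)
sigma <- matrix(1:25, 5)
sigma[lower.tri(sigma)] <- t(sigma)[lower.tri(sigma)]  # symmetric, as in the question

# Simulate one sample and tag every row with an id for the BY-statement trick.
simulate_one <- function(id) {
  s <- t(apply(beta, 1, function(m) rmvnorm(1, mean = m, sigma = sigma)))
  s <- data.frame(s, Group = rep(LETTERS[1:2], each = 5), ID = 1:5, SampleID = id)
  names(s)[1:5] <- paste0("Time", 1:5)
  gather(s, Visit, Response, paste0("Time", 1:5), factor_key = TRUE)
}

# One big data frame (data.table::rbindlist(lapply(...)) is faster at this scale).
big <- do.call(rbind, lapply(seq_len(100000), simulate_one))

On the SAS side, call ImportDataSetFromR once, PROC SORT by SampleID, and give PROC MIXED a BY SampleID statement, so a single PROC MIXED call analyzes all samples instead of 100,000 separate calls.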
Essentially, I want to know whether there is a practical way to read a particular kind of binary file into R. I have some MATLAB code which does what I want, but ideally I want to be able to do this in R.
The MATLAB code is:
fid = fopen('filename');
A(:) = fread(fid, size*2, '2*uint8=>uint8',510,'ieee-le');
and so far in R I've been using:
to.read = file("filename", "rb")
bin = readBin(to.read, integer(), n = 76288, endian = "little")
The confusion I'm having is with the 3rd and 5th arguments of the MATLAB function fread(): I don't understand exactly what '2*uint8=>uint8' or 'ieee-le' mean in terms of interpreting the binary data. This is what is holding me back from implementing it in R.
Also, the file extension is .cwa; apparently this is a very efficient format for recording high-frequency (100 Hz) activity data.
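For what it's worth, a hedged reading of the MATLAB call: '2*uint8=>uint8' means "read values in groups of 2, as uint8, and store them as uint8"; the 4th argument, 510, is the number of bytes to skip after each group, so the file is walked in 512-byte blocks; and 'ieee-le' selects little-endian byte order, which has no effect on single-byte values. Under that interpretation, an equivalent R sketch with readBin() and seek() would be (n_blocks is a hypothetical count standing in for the size variable in the MATLAB code):

con <- file("filename", "rb")
n_blocks <- 100  # hypothetical; plays the role of 'size' in the MATLAB code
A <- integer(0)
for (b in seq_len(n_blocks)) {
  # two unsigned single-byte integers, mirroring '2*uint8=>uint8'
  A <- c(A, readBin(con, integer(), n = 2, size = 1, signed = FALSE))
  # skip the remaining 510 bytes of the 512-byte block
  seek(con, where = 510, origin = "current")
}
close(con)

Growing A in a loop is slow for large files; reading the whole file at once with readBin(con, "raw", n = file.size("filename")) and indexing out the first two bytes of each 512-byte block would be the vectorized version.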
Is there any way to replace NA with a blank (or nothing) without converting to character?
I used
library(data.table)
# sapply() returns a character matrix, so every column ends up as character
data_model <- sapply(data_model, as.character)
data_model[is.na(data_model)] <- " "
data_model <- data.table(data_model)
However, it changes all the columns' types to character.
I want to save the dataset and use it in SAS, which does not understand NA.
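One option, offered as a suggestion rather than something from the question: if a text file handed to SAS is acceptable, the NA rendering can be changed in the output alone, leaving the column types untouched, because write.csv() has an na argument:

# Columns stay numeric in R; each NA becomes an empty field in the file.
write.csv(data_model, "data_model.csv", row.names = FALSE, na = "")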
Here's a somewhat belated answer (and shameless self-promotion) from The R Primer on how to export a data frame to SAS. It should automatically handle your NAs correctly:
First, you can use the foreign package to export the data frame as a SAS xport dataset. Here, I'll just export the trees data frame.
library(foreign)
data(trees)
write.foreign(trees, datafile = "toSAS.dat",
              codefile = "toSAS.sas", package = "SAS")
This gives you two files, toSAS.dat and toSAS.sas. It is easy to get the data into SAS, since the code file toSAS.sas contains a SAS script that SAS can read and interpret directly and that reads the data stored in toSAS.dat.