Converting a (probable) ENVI file to decimals using R or excel - r

I got the output file from a spectrometer which is supposed to be a series of decimals numbers. The file looks like this:
™pQH1JHxþFH$ÏFH÷~EHa×BHäBHßdBH.#H²Ï=HL=HŒÚ<Hê‰:H­P:Hoõ9H¢Ž6Hº7H¨Y5H ?1H½¶.Hø²0HøŽ2H8æ.H.î,HŒt/H&1H͸0Hí.Hvî,H$ª+HµX+HCý*H·W+H!º+HP+HfØ(Hû'H†Ù'H|U(HQ`)Hn*H
})H'Hó%HÂ%H¶¨&H&H|•&H\
I have been reading a lot without getting to the solution. My silly question is: is that a ENVI or ASCII file? Or? How can I see the numbers I need do to use? I tried some online converters without being successful.
The starting point would be to get these numbers to develop a R code to make graphs. Thanks a lot for your time.

This looks like you opened the binary file of the mass spectrometer. Almost all vendors keep their format a secret. The only way to do this is to export it to an open format. Most vendors supply some kind of data analysis software and there are often export functions present. Most general open data formats are mzXML and mzML.
For converting have a look at the msconvert program from ProteoWizard.
If you have converted the data one of the packages in R where you can start with is XCMS.

Related

Writing .xlsx in R, importing into PowerBi error

I'm experiencing an odd error. I have a large dataframe in R (75000 rows, 97 columns) and I need to save it out and then import it into Power Bi.
At first I just did the simple:
library(tidyverse)
write_csv(Visits,"Visits.csv")
and while it seems to export and looks fine in excel, the csv itself is all messed up when I look at the contents in Power Bi. Here's an example of what I mean:
The 'phase.x' column should only have "follow-up" or "treatment" in that column. In excel, looks great:
but that exact same file gets screwed up in Power Bi:
I figured that being a 'comma separated variable' file, there must be some extra comma somewhere, and I saved it as an .xlsx instead.
So, while in excel, I saved that .csv as an .xlsx and it opened great in Power Bi!
Jump forward a moment and instead of write_csv() in R, I use write.xlsx(). But now I get this error:
If I simply go to that file, open it in excel, save it and hit close, that error goes away and it can load into Power Bi just fine. I figure it has something to do with this question on here.
Any ideas on what I might be screwing up as I save it out of R? Somehow I can fix it in R and not have to open and save it every time?
In power BI check that your source has ignore quoted line breaks enabled. I've found this is often an issue with .csv files in PowerBI.

Excel text data into R

I've been researching into this question, which I assume should be easy to fix, but am not having any luck. I have an excel file where each cell is some text of variable length. I'm wanting to read this into R so I can eventually do some text classification, but am failing. I get errors when using read.table and am struggling with all other alternatives. I've never worked with text data before so perhaps this is my issue. Having problems finding good examples of importing text data into R when it isn't in a corpus format.
There are special packages for reading data from the excel format. I mostly use readxl when I need to do this, but I know that there are several (a lot of them are described in this tutorial by datacamp, in the section Importing Excel files into R).
Another possibility (assuming you are using windows) is to copy the cells to the clipboard and use
read.table("clipboard")
for macOS and Linux there are similar commands, but I don't know them by heart.

Turn off scientific notation before writing to file

My data.frame uses the scientific notation, when parsing files like 3.007530e+07.
I definitely like to use it in R, however, for this analysis I have to transfer my data to csv and open it in excel(German Version), which cannot handle this notation.
My df looks sth like that:
df <- c(6.402000e+05,9.312903e+05,1.007800e+06,1.142000e+06,1.298500e+06,1.511700e+06,1.749000e+06,1.869357e+06)
I tried changing my global options such as options(scipen=999), which does not work, because then I have problems with my fread function.
Therefore, my question:
How to change the notation in a data.frame before, using write.csv()?
I appreciate your replies!
As an alternative to altering the R format (since you want to keep scientific notation in R), could you change how Excel imports your file?
For example, naming your csv file with a non-standard extension to trigger the manual importing process (import wizard), instead of automatically opening the file in the wrong format?
I tried a simple test with a csv formatted file of numbers in scientific notation, saved with a ".sci" filename. My version of Excel launched the wizard, then imported the file and handled the scientific notation correctly [MS Excel Starter 2010, English version].
Edit: I found the reference to using an unrecognized file extension to trigger Excel's import wizard: http://excelribbon.tips.net/T012201_Avoiding_Scientific_Notation_on_File_Imports.html
[The article suggests using .DAT, which I wouldn't use for an ASCII file, but I wanted to give credit where it's due for the idea.]

How to export a dataset to SPSS?

I want to export a dataset in the MASS package to SPSS for further investigation. I'm looking for the EuStockMarkets data set in the package.
As described in http://www.statmethods.net/input/exportingdata.html, I did:
library(foreign)
write.foreign(EuStockMarkets, "c:/mydata.txt", "c:/mydata.sps", package="SPSS")
I got a text file but the sps file is not a valid SPSS file. I'm really looking for a way to export the dataset to something that a SPSS can open.
As Thomas has mentioned in the comments, write.foreign doesn't generate native SPSS datafiles (.sav). What it does generate is the data in a comma delimited format (the .txt file) and a basic syntax file for reading that data into SPSS (the .sps file). The EuStockMarkets data object class is multivariate time series (mts) so when it's exported the metadata is lost and the resulting .sps file, lacking variable names, throws an error when you try to run it in SPSS. To get around this you can export it as a data frame instead:
write.foreign(as.data.frame(EuStockMarkets), "c:/mydata.txt", "c:/mydata.sps", package="SPSS")
Now you just need to open mydata.sps as a syntax file (NOT as a datafile) in SPSS and run it to read in the datafile.
Rather than exporting it, use the STATS GET R extension command. It will take a specified data frame from an R workspace/dataset and convert it into a Statistics dataset. You need the R Essentials for Statistics and the extension command, which are available via the SPSS Community site (www.ibm.com/developerworks/spssdevcentral)
I'm not trying to answer a question that has been answered. I just think there is something else to complement for other users looking for this.
On your SPSS window, you just need to find the first line of code and edit it. It should be something like this:
"file-name.txt"
You need to find the folder path where you're keeping your file:
"C:\Users\DELL\Google Drive\Folder-With-Your-File"
Then you just need to add this path to your file's name:
"C:\Users\DELL\Google Drive\Folder-With-Your-File\file-name.txt"
Otherwise SPSS will not recognize the .txt file.
Sorry if I'm repeating some information here, I just wanted to make it easier to understand.
I suppose that EuStockMarkets is a (labelled) data frame.
This should work and even keep the variable and value labels:
require(sjlabelled)
write_spss(EuStockMarkets, "mydata.sav")
Or you try rio:
rio::export(EuStockMarkets, "mydata.sav")

Can SAS still read or create the combination of {.dat fixed-column ascii data file, .sas syntax file}, or is it obsolete?

In the past I have used the excellent SAScii package in R to read in this type of data: {.dat fixed-column data file + the corresponding .sas "syntax" file}. I want to be quite precise about that because there is no end of ambiguity surrounding phrases like "SAS file". These .dat files contain only integers, and the .sas files specify both the way to parse the columns and the way the integers represent the values in the actual data (this feature is sometimes called the "codebook".) I have found very good data in that format (i.e. in the form of the pair of files {.dat, .sas}) from places like Minnesota Population Center's IPUMS https://usa.ipums.org/usa/, and built up a lot of tools to analyze it using R and SAScii.
Now I have access to SAS itself, and but would still like to re-use some of my tools and techniques. However I can find no reference in SAS to data like that {fixed-column data in .dat, syntax file in .sas}. Has that format been entirely superseded within SAS (perhaps by the SAS7BDAT format)? Or perhaps the {.dat,.sas} format was never used within SAS?? The reason I ask is, now that I have access to SAS and so much data in SAS7BDAT format, I would like to be able to export some of it in {.dat, .sas} format for use with my own tools.
Thanks very much, and cheers - Ed
I don't think this is something built into SAS. You could, however, write such a program pretty easily.
First off, Chris Hemidinger has written something that basically does this (it creates datalines, not .dat file, but that shouldn't be too hard to modify if you know .NET and/or to modify the R module to accept). That is discussed and available here. The title of the post is "Turn your data set into a data step program". This is roughly equivalent to the SQL Server task that creates "Create Table" code out of a table. This would only work in Enterprise Guide, although you should be able to do roughly the same thing in a standalone .NET program.
Second, you can easily write something like this in Base SAS. Creating the datalines is easy, numerous ways to write out to a file.
For a CSV, for example, you can do this.
ods csv file="c:\temp\mydata.csv";
proc print data=mydata;
run;
ods csv close;
If you're going to write a flat file, you might as well make the input/output .sas first - after all it can be almost the same code. You can query dictionary.columns to generate the code, both the input and output code. Create a table with the variable names, lengths, and formats for each variable, then process it in a data step advancing the start variable by the length of each variable (so it moves to the next position after the last one finished). If you need formats for your R project, then proc format cntlout=<datasetname> will generate a dataset that contains those formatted value translations, and you can write that out in whatever format you need as well.

Resources