Export Spectra objects to CSV in ChemoSpec - R

I am using ChemoSpec to analyse FTIR spectra in R.
I was able to import several csv files using files2SpectraObject, applied some of the data pre-processing procedures, such as normalization and binning, and generated new SpectraObjects with the results.
Is it possible to export data back to csv format from the generated SpectraObjects?
So far I tried this
write.table(ftirbin, "E:/ftirbin.txt", sep="\t")
and got this:
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ""Spectra"" to a data.frame
Thanks in advance!
G

If you look at ?Spectra you'll see how a Spectra object is stored. The intensity values are in your_object$data, and the frequency values are in your_object$freq. So you can't export the whole object (it's not a data frame, but rather a list), but you can export the pieces. To export the frequencies in the first column and the samples in the following columns, you can do this (the example uses a built-in data set, SrE.IR):
library(ChemoSpec)
data(SrE.IR) # built-in example Spectra object
tmp <- cbind(SrE.IR$freq, t(SrE.IR$data)) # frequencies in column 1, one sample per following column
colnames(tmp) <- c("freq", SrE.IR$names)
tmp <- as.data.frame(tmp) # it was a matrix
Then you can write it out using write.csv or write.table (check the arguments to avoid row numbers).
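For example (a minimal sketch; adjust the path to yours):
write.csv(tmp, "E:/ftirbin.csv", row.names = FALSE) # row.names = FALSE avoids the row-number column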

Related

How do I export a textstat_simil document without losing observations or variables?

I'm new to quanteda and I am having issues exporting my documents. I am comparing two objects: "dfm_latam", with more than 27k observations, and "dfm_cosines", which consists of two corpora with texts to be compared with each of the 27k observations in the dfm_latam database.
corpus_cosine_2 <- corpus(cosine_2_pdf)
corpus_cosines <- corpus_cosine_1 + corpus_cosine_2
dfm_cosines <- dfm(corpus_cosines, case_insensitive = TRUE)
corpus_latam <- corpus(latam_review)
docvars(corpus_latam, "Text") <- names(corpus_latam$text)
dfm_latam <- dfm(corpus_latam, case_insensitive = TRUE)
simil_latam <- textstat_simil(dfm_latam, dfm_cosines, method = "cosine", margin = "documents", case_insensitive = TRUE)
view(simil_latam)
The view() function in R provides me with the first 1000 rows and everything is fine. Both numeric variables from the dfm_cosines are showing up. But when I try to export it as an Excel document, the output looks completely different from the view() 1000-row preview. One of the variables is missing, and the .xlsx output only shows "corpus_cosine_1"'s results. The dfm "dfm_cosines" is built from both "corpus_cosine_1" and "corpus_cosine_2". Why does this happen when I export it?
openxlsx::write.xlsx(simil_latam, file = "F:\\path\\simil_latam.xlsx")
So, I tried exporting along with the view() function:
openxlsx::write.xlsx(view(simil_latam), file = "F:\\path\\simil_latam.xlsx")
For this write.xlsx(view()) combination, the variables showing up are just right, but I only export 1,000 observations out of the 27,000+ I have. How do I automatically export all of the observations of the table with all variables showing up?
You need to convert the textstat_simil object to something more spreadsheet-like. Try
as.matrix(simil_latam)
before you call write.xlsx(), or, if you prefer a long (one row per document pair) format,
as.data.frame(simil_latam)
I suggest you inspect both coerced objects before exporting them, and see the help pages for these coercion methods (found in the quanteda.textstats package).
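Putting it together (a sketch, using the objects from the question):
sim_df <- as.data.frame(simil_latam) # one row per document pair
openxlsx::write.xlsx(sim_df, file = "F:\\path\\simil_latam.xlsx")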

Different results for same dataset in cluster analyses with RStudio?

I just started using R and I have a question regarding cluster analysis.
I use the agnes function to run a cluster analysis on my dataset, but I realized that the cluster results and the pltrees differ depending on whether I use the .txt file or the .csv file.
Maybe it would be better to explain my problem with the images:
My dataset in .txt format;
I used the following code to see the data in R;
data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = T)
and everything looks fine;
I apply the cluster analysis,
library(cluster) # agnes() lives here
complete1 <- agnes(data01, stand = FALSE, method = 'complete')
plot(complete1, which.plots = 2, main = 'Complete-Linkage')
And here is the pltree:
I took the same steps with the .csv file, which contains exactly the same dataset. Here is the dataset in .csv format:
Again, the cluster analysis for the .csv file:
data02 <- read.csv("D:/CLUSTER_ANALYSIS/NumericData3.csv", header = T)
complete2 <- agnes(data02, stand = FALSE, method = 'complete')
plot(complete2, which.plots=2, main='Complete-Linkage')
And the pltree is completely different,
So, the decimal separator in the .txt file is a comma, and in the .csv file it is a dot. Which of these results is correct? Does R expect a comma or a dot as the decimal separator for numeric data?
From the R manual on read.table (and read.csv) you can see the default separators: the decimal separator is a dot for both of the functions you used, so the comma-decimal .txt file was read as text rather than numbers, which is why the clusterings differ. You can set the separator to whatever you like with the dec parameter, e.g.:
data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = T, dec=",")
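After fixing the separator, a quick sanity check (both objects should now match, and str() should show numeric columns):
str(data01) # all columns numeric, not character/factor
all.equal(data01, data02)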

Convert XML Doc to CSV using R - Or search items in XML file using R

I have a large, unorganized XML file that I need to search to determine whether certain ID numbers are in it. I would like to use R to do so, but because of the format I am having trouble converting the file to a data frame, or even to a list, to extract to a CSV. I figured I can search easily if it is in CSV format. So I need help understanding how to convert and extract it properly, or how to search the document for values using R. Below is the code I have used to try and convert the doc, but several errors occur with my various attempts.
## Method 1. I tried to convert to a data frame, but the columns are not all the same length.
require(XML)
require(plyr)
file<-"EJ.XML"
doc <- xmlParse(file,useInternalNodes = TRUE)
xL <- xmlToList(doc)
data <- ldply(xL, data.frame)
datanew <- read.table(data, header = FALSE, fill = TRUE)
## Method 2. I tried to convert it to a list; the file extracts, but the result only lists 2 words from the file.
data<- xmlParse("EJ.XML")
print(data)
head(data)
xml_data<- xmlToList(data)
class(data)
topxml <- xmlRoot(data)
topxml <- xmlSApply(topxml,function(x) xmlSApply(x, xmlValue))
xml_df <- data.frame(t(topxml),
row.names=NULL)
write.csv(xml_df, file = "MyData.csv",row.names=FALSE)
I am going to do some research on how to search within R as well, but I assume the file needs to be in a data frame or a list either way. Any help is appreciated! Attached is a screenshot of the data. I am interested in finding entity ID numbers that match a list I have in an Excel doc.
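For just searching, you may not need any conversion at all; XPath can pull the IDs directly. A sketch using the XML package already loaded above (the node and attribute names here are hypothetical, since the file structure isn't shown, so adjust them to your actual XML):
library(XML)
doc <- xmlParse("EJ.XML")
# Hypothetical structure: <entity id="..."> nodes; change "//entity" and "id"
# to match the real element and attribute names in EJ.XML.
ids <- xpathSApply(doc, "//entity", xmlGetAttr, "id")
wanted <- c("12345", "67890") # the ID numbers from your Excel list
intersect(ids, wanted) # which of your IDs appear in the file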

What is the best way to import spss file in R with value labels?

I have an SPSS file which contains variables and value labels. I saw the foreign package with its read.spss function:
data <- read.spss("2017.sav", to.data.frame = TRUE, use.value.labels = TRUE)
If I use use.value.labels = TRUE, all string variables change to factors, and I don't want that because not all of them are factors.
I found one solution, but I don't know if it is the best way to do it:
1. First read the SPSS file with the previous sentence.
2. Select which variables are not factors and change them back to character with:
cols <- c("x", "ab")
data[cols] <- lapply(data[cols], as.character)
If I don't use use.value.labels = TRUE I will not have the value labels at all, and I cannot export the file correctly.
You can also use the memisc package:
library(memisc)
sav <- spss.system.file("file.sav")
df <- as.data.set(sav)
My company regularly deals with SAV files and we extract the metadata separately. With the foreign package, you can get the metadata out in a few different ways (after you have loaded the file in):
data.label.table <- attr(sav, "label.table")
missings <- attr(sav, "missings")
The other bits require various lapply and sapply functions to get them out. The script I have is quite long, so I will not share it here. If you read the data in with read.spss(sav, to.data.frame = TRUE) you can get:
VariableLabels <- unname(attr(sav, "variable.labels"))
I don't know why, but I can't install the "foreign" package.
Here is what I did instead to import a dataset from SPSS to R (through Excel):
1. Open your data in SPSS.
2. Export the dataset from SPSS to Excel, but make sure to choose the "Save value labels where defined instead of data values" option at the very bottom.
3. Open R.
4. Import the dataset from Excel.
Now you have a dataset in R with value labels.
Use the haven package:
library(haven)
data <- read_sav("2017.sav")
The labels are shown in the RStudio viewer.
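If you then want factors only where the SPSS file actually defines value labels, haven's as_factor() can do that (only_labelled = TRUE is its default for data frames):
data <- as_factor(data, only_labelled = TRUE) # labelled columns become factors, plain strings stay character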

Imported a CSV dataset to R but the values become factors

I am very new to R and I am having trouble accessing a dataset I've imported. I'm using RStudio; I used the Import Dataset function when importing my CSV file and pasted the line from the console window to the source window. The code looks as follows:
setwd("c:/kalle/R")
stuckey <- read.csv("C:/kalle/R/stuckey.csv")
point <- stuckey$PTS
time <- stuckey$MP
However, the data aren't integer or numeric as I am used to, but factors, so when I try to plot the variables I only get histograms, not the usual plot. When checking the data everything seems to be in order; I'm just unable to use it since it's in factor form.
Both the import function (here: read.csv()) and a global option let you set stringsAsFactors=FALSE, which should fix this.
By default, read.csv checks each column of your data to see whether it can be treated as numeric. If it finds any non-numeric values, it treats the column as character data, and character variables are converted to factors.
It looks like the PTS and MP variables in your dataset contain non-numerics, which is why you're getting unexpected results. You can force these variables to numeric with
point <- as.numeric(as.character(point))
time <- as.numeric(as.character(time))
But any values that can't be converted will become missing. (The R FAQ gives a slightly different method for factor -> numeric conversion but I can never remember what it is.)
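For reference, the FAQ method (R FAQ 7.10) indexes the numeric levels directly, avoiding the character round-trip:
point <- as.numeric(levels(point))[point]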
You can set this globally for all read.csv/read.* commands with
options(stringsAsFactors = FALSE)
Then read the file as follows:
my.tab <- read.table("filename.csv", header = TRUE, sep = ",", as.is = TRUE)
When importing CSV data files, the import command should reflect both the column separator (here ";") and the decimal separator for your numeric values (for a numeric value written as 2,5 this would be ",").
The command for importing such a CSV therefore has to be a bit more explicit:
stuckey <- read.csv2("C:/kalle/R/stuckey.csv", header=TRUE, sep=";", dec=",")
This should import all variables as either integers or numeric.
None of these answers mention the colClasses argument which is another way to specify the variable classes in read.csv.
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "numeric") # all variables to numeric
or you can specify which columns to convert:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = c(PTS = "numeric", MP = "numeric")) # specific columns to numeric
Note that, by default, a variable that can't be converted to numeric is converted to factor instead, which makes it more difficult to convert to a number later. It can therefore be advisable to read all variables in as character (colClasses = "character") and then convert the specific columns to numeric once the CSV is read in:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "character")
point <- as.numeric(stuckey$PTS)
time <- as.numeric(stuckey$MP)
I'm new to R as well and faced the exact same problem. But then I looked at my data and noticed that it was caused by my CSV file using a comma as the thousands separator in all numeric columns (e.g. 1,233,444.56 instead of 1233444.56).
I removed the thousands separators in my CSV file and then reloaded it into R. My data frame now recognises all columns as numbers.
I'm sure there's a way to handle this at read time as well.
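If you'd rather not edit the file, readr can parse such numbers directly: its col_number() parser drops grouping marks (a sketch, assuming the PTS and MP columns from the question):
library(readr)
stuckey <- read_csv("C:/kalle/R/stuckey.csv", col_types = cols(PTS = col_number(), MP = col_number()))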
The read.csv() approach only worked right for me when I also included strip.white = TRUE in the call.
(I found the solution here.)
For me, the solution was to include the skip argument (the number of rows to skip at the top of the file, 0 by default; it can be set > 0):
mydata <- read.csv(file = "file.csv", header = TRUE, sep = ",", skip = 22)
