How to read a .sav SPSS file in R?

I've tried read.spss(), but I get an encoding error:
library(foreign)
read.spss('persona.sav')
#>re-encoding from CP1252
Error in iconv(names(rval), cp, "") :
unsupported conversion from 'CP1252' to ''
In addition: Warning message:
In read.spss("persona.sav") :
persona.sav: Unrecognized record type 7, subtype 18 encountered in system file

Try telling read.spss() to treat the file as UTF-8:
library(foreign)
read.spss('persona.sav', reencode='utf-8')

You can also try adding to.data.frame = TRUE to the read.spss() call, for instance:
df <- read.spss("data.sav", to.data.frame = TRUE)
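The two suggestions above can be combined in one call. This is a minimal sketch, assuming the file really is UTF-8 encoded; read_sav_df is a hypothetical helper name, not part of the foreign package.

```r
library(foreign)

# Hypothetical helper: read an SPSS .sav file into a data frame.
# 'encoding' is the encoding the file is ASSUMED to use (an assumption,
# adjust it if your file was saved with a different one).
read_sav_df <- function(path, encoding = "UTF-8") {
  if (!file.exists(path)) {
    stop("file not found: ", path)
  }
  read.spss(path, to.data.frame = TRUE, reencode = encoding)
}

# Usage (assumes 'persona.sav' is in the working directory):
# df <- read_sav_df("persona.sav")
# str(df)
```

If the re-encoding still fails, passing a different encoding name to the helper is the first thing to try.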

Related

ShortRead package error in function for writing large files in fastq format

Hello,
I am practicing with the ShortRead package for analyzing FASTQ files. A textbook I am reading gives the following function for writing large FASTQ files:
trim.file <- function (f1,
                       destination=sprintf("%s_filtered.fastq", fq)){
  stream <- open(FastqStreamer(f1))
  on.exit(close(stream))
  repeat {
    fq <- yield(stream)
    if (length(fq) == 0){break}
    fq <- fq[nFilter()(fq)]
    fq <- trimTails(fq, 5, "A", successive = T)
    fq <- fq[width(fq) > 80]
    writeFastq(fq, destination, "a", compress = F)
  }
}
trim.file(FastqFiles[1])
Here FastqFiles[1] is the path and file name of the original FASTQ file. The vector is created with FastqFiles <- list.files(path=fasrQDir, pattern = "*.fastq", full.names = T), where fasrQDir is the directory containing my original FASTQ file.
When I run this code, I get this error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'file' in selecting a method for function 'writeFastq': no method for coercing this S4 class to a vector
In addition: Warning message:
In open.connection(con$con) :
Please let me know what the solution is.
Best wishes
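A likely cause is the default value of destination. It is built from fq rather than f1, and R evaluates default arguments lazily, inside the function, the first time they are used. By the time writeFastq() asks for destination, fq is the object returned by yield(stream), which sprintf() cannot turn into a string. The base R sketch below shows the same mechanism without ShortRead; the functions bad and good and the character vector standing in for the reads are illustrative assumptions.

```r
# Default arguments are evaluated lazily, inside the function, on first use.
bad <- function(f1, destination = sprintf("%s_filtered.fastq", fq)) {
  fq <- c("read1", "read2")  # stands in for the object returned by yield(stream)
  destination                # first use: the default now sees the local 'fq'
}

good <- function(f1, destination = sprintf("%s_filtered.fastq", f1)) {
  fq <- c("read1", "read2")
  destination                # the default depends only on the input path
}

bad("reads.fastq")   # built from the local 'fq', not from the file name
good("reads.fastq")  # built from the file name, as intended
```

Under that assumption, changing fq to f1 in the default of trim.file() should give writeFastq() a plain file name.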

Error in h(simpleError(msg, call)) with 'as.matrix': cannot open the connection

This is my first question here. I'm learning how to use R in RStudio, and when I tried to read data in matrix form, I got an error. I tried this code:
ModelName = 'new_file' # the file is in the same directory as the .r file
FileName = paste(ModelName, '.txt', sep = '') # as far as I understand, this tells the program that the file is a .txt file
### Read Time series
d = as.matrix(read.table(FileName, header= T))
And then the program writes this:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix':
cannot open the connection
And I don't understand why it's not working.
The file for analysis is in the txt form, the example of data is below:
decy Temp CTD_S OxFix Pro Syn Piceu Naneu
2011.74221 27.60333 36.20700 27.26667 58638.33333 13107.00000 799.66667 117.66667
2011.74401 26.97950 36.13400 27.05000 71392.50000 13228.50000 1149.00000 116.50000
2011.74617 24.99750 35.34450 24.80000 264292.00000 27514.00000 2434.50000 132.50000
2011.74692 24.78400 35.25800 25.82500 208996.50000 39284.00000 3761.75000 220.75000
My RStudio version is 4.2.0.
I would be very grateful for an explanation.
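"cannot open the connection" from read.table() usually means R could not find the file at the given path. A relative name such as new_file.txt is resolved against the working directory, which in RStudio is not necessarily the folder containing the .r file. The sketch below checks for the file first and reports where R is looking; read_matrix is a hypothetical helper, and the temporary file only makes the example self-contained.

```r
# Hypothetical helper: read a whitespace-separated table as a matrix,
# failing with a clearer message when the file is not where R is looking.
read_matrix <- function(file_name) {
  if (!file.exists(file_name)) {
    stop("cannot find '", file_name, "' in working directory: ", getwd())
  }
  as.matrix(read.table(file_name, header = TRUE))
}

# Self-contained demonstration with a temporary file holding
# the first three columns of the sample data from the question.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "decy Temp CTD_S",
  "2011.74221 27.60333 36.20700",
  "2011.74401 26.97950 36.13400"
), tmp)

d <- read_matrix(tmp)
dim(d)
```

If the helper reports a missing file, either call setwd() with the folder that holds the data or pass the full path to the file.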

Error in file(file, "rt") : invalid 'description' argument when running R script

I am trying to reproduce this protocol for DNA sequencing data analysis. It requires running this bash script, which calls an R script. However, I am getting the error shown at the bottom, and I can't seem to solve it.
#!/bin/bash
Project_dir=~/base
cd /${Project_dir}
SCRTP=~/scRepliseq-Pipeline
OUTNAME="bam/G1_F121_A1.adapter_filtered2"
genome_name="mm10"
bamfile=${OUTNAME}.${genome_name}.clean_srt_markdup.bam
rscript=${SCRTP}/util/Step3_R-Aneu-Fragment-bins.R
out_dir="Aneu_analysis"
Name=‘$bamfile’
Name=${name%.adapter_filtered2.${genome_name}.clean_srt_markdup.bam}
blacklist=~/blacklist/mm10-blacklist-v1_id.bed
genome_file=~/reference/UCSC_hg19_female.fa.fai
mkdir -p ${out_dir}
Rscript --vanilla $rscript ${bamfile} ${out_dir} ${name} ${blacklist} ${genome_file}
It calls this R script:
args = commandArgs(TRUE)
bamfile=args[1]
out_dir=args[2]
name=args[3]
blacklist=args[4]
genome_file=args[5]
options(scipen=100)
##Extension of file name##
ext="_mapq10_blacklist_fragment.Rdata"
ext2="_mapq10_blacklist_bin.Rdata"
library(AneuFinder)
##loading black list and genome Info##
genome_tmp <- read.table(genome_file,sep="\t") #UCSC_mm9.woYwR.fa.fai
genome=data.frame(UCSC_seqlevel=genome_tmp$V1,UCSC_seqlength=genome_tmp$V2)
chromosomes=as.character(genome$UCSC_seqlevel)
##setup output directories##
out_dir_f=paste0(out_dir,"/fragment")
out_dir_b=paste0(out_dir,"/bins")
dir.create(out_dir,showWarnings = FALSE)
dir.create(out_dir_f,showWarnings = FALSE)
dir.create(out_dir_b,showWarnings = FALSE)
##save the fragment file (>10 MAPQ), filtering out the blacklist regions##
raw_reads=bam2GRanges(bamfile,remove.duplicate.reads = TRUE,min.mapq = 10,blacklist = blacklist)
save(raw_reads,file = paste0(out_dir_f,"/",name,ext))
##save the bin data file ##
bins_reads=binReads(raw_reads,
assembly=genome,
chromosomes=chromosomes,
binsizes=c(40000,80000,100000,200000,500000))
rpm=1000000/length(raw_reads)
bins_reads[["rpm"]]=rpm
save(bins_reads,file=paste(out_dir_b,"/",name,ext2,sep=""))
It shows this error:
Error in file(file, "rt") : invalid 'description' argument
Calls: read.table -> file
Execution halted
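A likely cause is in the shell script: the variable is assigned as Name but expanded as ${name}. Shell variable names are case-sensitive, so ${name} is empty, and because it is unquoted it is dropped from the Rscript command line. The R script then receives four arguments instead of five, args[5] is NA, and read.table() passes that NA to file(), which rejects it. The sketch below shows a guard that makes this failure explicit; check_args is a hypothetical helper and the file names are placeholders.

```r
# Hypothetical helper: validate positional arguments before they are used.
check_args <- function(args,
                       expected = c("bamfile", "out_dir", "name",
                                    "blacklist", "genome_file")) {
  if (length(args) != length(expected)) {
    stop("expected ", length(expected), " arguments (",
         paste(expected, collapse = ", "), "), got ", length(args))
  }
  if (any(is.na(args) | !nzchar(args))) {
    stop("empty or missing argument: ",
         paste(expected[is.na(args) | !nzchar(args)], collapse = ", "))
  }
  out <- as.list(args)
  names(out) <- expected
  out
}

# In the script this would be: a <- check_args(commandArgs(TRUE))
# Simulate what happens when the empty ${name} is dropped by the shell:
four <- c("x.bam", "Aneu_analysis", "blacklist.bed", "genome.fa.fai")
four[5]  # NA, which is what genome_file becomes in the original script
msg <- tryCatch(check_args(four), error = function(e) conditionMessage(e))
msg
```

On the shell side, assigning and expanding the same variable name and quoting each argument (for example "${name}") should keep all five positions intact.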

Using read.arff() function in R and importing .arff files

I am trying to import this dataset of .arff type
file_location <- file.path("/Users","supreet","Downloads","Chronic_Kidney_Disease1/")
Chronic_Kidney_Disease <- read.arff(paste(file_location,"chronic_kidney_disease.arff",sep=""))
But it is throwing the following error
Error in file(arff_file, "rb") : cannot open the connection In
addition: Warning message: In file(arff_file, "rb") : cannot open
file
'/Users/supreet/Downloads/Chronic_Kidney_Disease1/chronic_kidney_disease.arff.arff':
No such file or directory
Also, if I remove the .arff extension, since it is appended automatically:
file_location <- file.path("/Users","supreet","Downloads","Chronic_Kidney_Disease1/")
Chronic_Kidney_Disease <- read.arff(paste(file_location,"chronic_kidney_disease",sep=""))
I get this error:
Error: XML content does not seem to be XML:
'/Users/supreet/Downloads/Chronic_Kidney_Disease1/chronic_kidney_disease.xml'
In addition: Warning message: In matrix(unlist(strsplit(arff_data,
",", fixed = T)), ncol = num_attrs, : data length [10001] is not a
sub-multiple or multiple of the number of rows [401]

Error trying to read a PDF using readPDF from the tm package

(Windows 7 / R version 3.0.1)
Below the commands and the resulting error:
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp
\RtmpS8Uql1\pdfinfo167c2bc159f8': No such file or directory
How do I solve this issue?
EDIT I
(As suggested by Ben and described here)
I downloaded Xpdf and copied the 32-bit version to
C:\Program Files (x86)\xpdf32
and the 64-bit version to
C:\Program Files\xpdf64
The environment variables pdfinfo and pdftotext point to the respective executables, either 32-bit (tested with 32-bit R) or 64-bit (tested with 64-bit R).
EDIT II
One very confusing observation is that starting from a fresh session (tm not loaded) the last command alone will produce the error:
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpKi5GnL
\pdfinfode8283c422f': No such file or directory
I don't understand this at all, because at that point pdf has not yet been defined by tm's readPDF. Below is the function that pdf refers to "naturally", followed by what readPDF returns:
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0674bd8c>
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0c3d7364>
Apparently there is no difference - then why use readPDF at all?
EDIT III
The pdf file is located here: C:\Users\Raffael\Documents
> getwd()
[1] "C:/Users/Raffael/Documents"
EDIT IV
The first instruction in pdf() is a call to tm:::pdfinfo(), and the error occurs within its first few lines:
> outfile <- tempfile("pdfinfo")
> on.exit(unlink(outfile))
> status <- system2("pdfinfo", shQuote(normalizePath("C:/Users/Raffael/Documents/17214.pdf")),
+ stdout = outfile)
> tags <- c("Title", "Subject", "Keywords", "Author", "Creator",
+ "Producer", "CreationDate", "ModDate", "Tagged", "Form",
+ "Pages", "Encrypted", "Page size", "File size", "Optimized",
+ "PDF version")
> re <- sprintf("^(%s)", paste(sprintf("%-16s", sprintf("%s:",
+ tags)), collapse = "|"))
> lines <- readLines(outfile, warn = FALSE)
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6\pdfinfo8d419174450': No such file or direc
Apparently tempfile() simply doesn't create a file.
> outfile <- tempfile("pdfinfo")
> outfile
[1] "C:\\Users\\Raffael\\AppData\\Local\\Temp\\RtmpquRYX6\\pdfinfo8d437bd65d9"
The folder C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6 exists and holds some files but none is named pdfinfo8d437bd65d9.
Interesting: on my machine, after a fresh start, pdf is a function to convert an image to a PDF:
getAnywhere(pdf)
A single object matching ‘pdf’ was found
It was found in the following places
package:grDevices
namespace:grDevices [etc.]
But back to the problem of reading PDF files as text. Fiddling with the PATH is a bit hit-and-miss (and annoying if you work across several different computers), so I think the simplest and safest method is to call pdftotext using system(), as Tony Breyal describes here.
In your case it would be (note the two sets of quotes):
system(paste('"C:/Program Files/xpdf64/pdftotext.exe"',
'"C:/Users/Raffael/Documents/17214.pdf"'), wait=FALSE)
This could easily be extended with an *apply function or loop if you have many PDF files.
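A minimal sketch of that extension, assuming the executable path and the document folder from the example above; build_cmd is a hypothetical helper, and the commands are only run if the executable is actually present.

```r
# Assumed location of the Xpdf executable, taken from the example above.
pdftotext <- "C:/Program Files/xpdf64/pdftotext.exe"

# Hypothetical helper: build one command line with both paths
# wrapped in double quotes, so spaces in the paths are safe.
build_cmd <- function(pdf_file, exe = pdftotext) {
  paste(shQuote(exe, type = "cmd"), shQuote(pdf_file, type = "cmd"))
}

# Collect every PDF in the folder (an empty vector if the folder does not exist).
pdf_files <- list.files("C:/Users/Raffael/Documents",
                        pattern = "\\.pdf$", full.names = TRUE)
cmds <- vapply(pdf_files, build_cmd, character(1))

# Run the conversions one after another, only if the executable exists.
if (file.exists(pdftotext)) {
  invisible(lapply(cmds, system, wait = TRUE))
}
```

This sketch uses wait = TRUE so that the conversions run one at a time instead of all being launched at once.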
