R Studio not recognizing files when compiling raw db - r

I have long-read RNA sequencing data from PacBio IsoSeq and trying to use IsoPops software (https://kellycochran.github.io/IsoPops/site_files/walkthrough.html) to generate graphs to interpret the data. I am new to R-studio so have limited proficiency. There is a walkthrough for the program which seems quite intuitive. I have successsfully installed the package on R Studio, and have opened the library and assigned variables to the files I have (which include a fasta file, gff file and abundance files). Once I've successfully assigned variables to the files, I'm supposed to generate a rawdb using the function compile_raw_db. However, when I do this function I get the error:
Error in system(command = paste("head -n1 ", filename, step = ""), intern = T) :
'head' not found
I've used the head -n1 function in linux command line, and can see that there definitely is a header and contents in the file. The error occurs for any file that I give to R studio, so my reasoning is that the programme is not recognizing the files.
Thought the files might have been empty, opened on Ubuntu and head -n2 function worked fine for each file, so I don't think it's an issue with the files.
My data was originally stored on an online shared network drive, moved files to local computer to eliminate any possibility for path errors. Still have the error message.
Have tried the same function with a number of different files, and I always have the same error message. So I think the issue is with how R studio is reading the files, and not the files themselves.
Would really appreciate some support with this, at the end of my PhD trying to analyse the last piece of my data and really struggling with this. Thanks in advance for any feedback!
> library(IsoPops)
Welcome to IsoPops version 0.3.1.
> transcript_AD3 <- "C:/R_Package_for_IsoSeq/IsoPops/IsoPops_Data/AD3-hq_transcripts.fasta"
> abundance_AD3 <- "C:/R_Package_for_IsoSeq/IsoPops/IsoPops_Data/AD3_Collapsed_Filtered_Isoform_Counts.abundance.txt"
> GFF_AD3 <- "C:/R_Package_for_IsoSeq/IsoPops/IsoPops_Data/AD3-collapse_isoforms.gff"
> rawDB <- compile_raw_db(transcript_AD3, abundance_AD3, GFF_AD3)
[1] "Loading sequences..."
Error in system(command = paste("head -n1 ", filename, step = ""), intern = T) :
'head' not found

Related

Reference GPSBabel in PATH variable to use in R

I have read this question and answer here Read multiple .gpx files on reading GPX files in R. At the end of the question, the question setter notes "To run readGPS you will need the open source GPSBabel program installed and referenced in your PATH variable."
I have installed GPSBabel, but am unsure how to link it to R so that the supplied code works.
library(maptools)
gpx.raw <- readGPS(i = "gpx", f = "Data/Waypoints_11-MAY-18.gpx", type="w")
Running the code at the head of the question (above) with my own filename returns the error
Error in readGPS(i = "gpx", f = "Data/Waypoints_11-MAY-18.gpx", type = "w") :
gpsbabel not found
I had this problem, and I downloaded and installed GPSBabel, copied of the path for where it installed, and then:
setwd("C:/Program Files (x86)/GPSBabel")
That took care of it for me, not sure if there is a simpler way to do this.

TreeTagger in R

I have downloaded TreeTaggerv3.2 for Windows and have configured it per the install.txt. I am trying to use it in R with koRpus package. I have set the kRp.env as -
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en",
preset="en", treetagger="manual", format="file",
TT.tknz=TRUE, encoding="UTF-8" )
.My data to be tagged is in a file and trying to use it as treetag("myfile.txt") but it is throwing the error-
Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE, :
'data' must be of a vector type, was 'NULL'
In addition: Warning message:
running command 'C:\windows\system32\cmd.exe /c C:\TreeTagger\bin\tag-english.bat
C:\Users\vivsingh\Desktop\NLP\tree_tag_ex.txt' had status 255
The standalone TreeTagger is working on by windows.Any idea on how it works?
I had the exact same error and warning while trying lemmatization on R word vector following Bernhard Learns blog using windows 7 and R 3.4.1 (x64). The issue was also appearing using textstem package but TreeTagger was running properly in cmd window.
I mixed several answers I found on this post and here is my steps and code running properly:
get into R win_library (~\Documents\R\win-library\3.4\rJava\jri\x64\jri.dll) and copy jri.dll (thanks kravi!) to replace it the parent folder.
close and restart R
library(koRpus)
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
lemma_tagged <- treetag(lemma_unique$word_clean, treetagger="manual", format="obj", TT.tknz=FALSE , lang="en", TT.options=list(path="c:/TreeTagger", preset="en"))
lemma_tagged_tbl <- tbl_df(lemma_tagged#TT.res)
Hope it helps.
I am posting this answer to keep a record. I also faced the same issue due to incorrect specification of the location of jri.dll on 64-Bit processor and windows 8.1. If we call
set.kRp.env(TT.cmd="manual", lang="en", TT.options=list(path="/path/to/tree-tagger-windows-x.x/TreeTagger", preset="en")) and we follow either of following two steps, we can resolve this error:
While installing R, if we install only 64 Bit version of R, and
specify the proper path for these variables
LD_LIBRARY_PATH = /path/to/rJava/jri
JAVA_HOME = /path/to/jdk1.x.x
java.library.path = /path/to/rJava/jri/jri.dll
CLASSPATH = /path/to/rJava/jri
If we already installed both versions viz. 32 bit and 64 bit of R on your computer then just copy jri.dll from /path/to/rJava/jri/x64/jri.dll and replace at path/to/rJava/jri/jri.dll. Further, we need to set the path of above mentioned four variables.
I've got this issue (very similar I guess) and posted query to GitHub.
https://github.com/unDocUMeantIt/koRpus/issues/7
The current working solution for me for this case was easier than I could expect, just downgrading the koRpus package. This can change with time but this version should remain appropriate.
library("devtools")
install_github("unDocUMeantIt/koRpus", ref="0.06-5")
This package is not Java related they said.
You can face the same error while setting up the korpus environment and getting the result from treetagger. For example, when you use:
tagged.text <- treetag(
"C:/temp/sample_text.txt",
treetagger = "manual",
lang = "en",
TT.options = list(
path = "c:/Treetagger",
preset = "en"
),
doc_id = "sample"
)
You would receive a similar error
Error: Awww, this should not happen: TreeTagger didn't return any useful data.
This can happen if the local TreeTagger setup is incomplete or different from what presets expected.
You should re-run your command with the option 'debug=TRUE'. That will print all relevant configuration.
Look for a line starting with 'sys.tt.call:' and try to execute the full command following it in a command line terminal. Do not close this R session in the meantime, as 'debug=TRUE' will keep temporary files that might be needed.
If running the command after 'sys.tt.call:' does fail, you'll need to fix the TreeTagger setup.
If it does not fail but produce a table with proper results, please contact the author!
Here you need to change the value of treetagger, from
treetagger = "manual"
to
treetagger = "kRp.env"
However, before that remember to set the kRp.env as #Xochitl C. suggested in their answer
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
Once you do this, you'll get the desired result.

Error in R. Error in gsub("(?<=\n)(?=.|\n)", continue, x, perl = TRUE) :

I am encountering an error in R that I cannot seem to figure out. I am creating an R markdown document where I read in an a csv table using this code.
iati <- read.csv(file="/filepath/IATI_NGOS.csv",head=TRUE,sep=",")
and then using ggplot2 I create a plot using the following code.
figure_one <- ggplot(iati, aes(iati$reporting.org))+
geom_bar(fill="blue")+
ylab("Total Activities")+
xlab("NGO Reporting Organizations in IATI")+
ggtitle("Total Number of Activities compared to each NGO Reporting Organization in IATI")+
coord_flip()
When I try to call figure_one in the R markdown I get the following error:
Quitting from lines 44-55 (NGO_IATI.Rmd)
Error in gsub("(?<=\n)(?=.|\n)", continue, x, perl = TRUE) :
input string 1 is invalid UTF-8
Calls: <Anonymous> ... paste -> comment_out -> line_prompt -> paste -> gsub
In addition: Warning message:
In grep("\n", message) : input string 1 is invalid in this locale
Execution halted
When I run this code in a regular R script I have absolutely no issues. I have search for some answers but can't figure it out.
Thanks!
I ended solving my issue by just doing a fresh install of R and Rstudio on my local machine. I think the recent update to Yosemite on my local environment created a lot of issues with the TeX plugin I had installed for R markdown.
I get the same question when I knit my rmarkdown document and find **encoding is the cause.
When you use functions like read.csv, fread or read_csv, you will read the column name.
If column names are in other languages, like Chinese, the problem will easily happen.
Or you rmarkdown works on Windows, but the encoding bug happens on Mac, a different environment.
The temporal solution is to rename the column name in English and resave the data files.
Here is the pseudocode in R to show my idea.
library(data.table)
library(tidyverse)
fread('yourfile.csv',encoding = 'UTF-8') %>%
purrr::set_names(c('x1','x2','x3')) %>%
write_excel_csv('yourfile_2.csv')
Here the new file yourfile_2.csv is fine to rmarkdown knit without encoding problems happening.

Error with tex2docx function from reports R package

I'm trying to reproduce the example for tex2docx function in reports R package and getting the following error.
DOC <- system.file("extdata/doc_library/apa6.qual_tex/doc.tex",
package = "reports")
BIB <- system.file("extdata/docs/example.bib", package = "reports")
tex2docx(DOC, file.path(getwd(), "test.docx"), path = NULL, bib.loc = BIB)
Error Message
pandoc.exe: Error reading bibliography `C:/Users/Muhammad'
citeproc: the format of the bibliographic database could not be recognized
using the file extension.
docx file generated!
Warning message:
running command 'C:\Users\MUHAMM~1\AppData\Local\Pandoc\pandoc.exe -s C:/Users/Muhammad Yaseen/R/win-library/3.0/reports/extdata/doc_library/apa6.qual_tex/doc.tex -o C:/Users/Muhammad Yaseen/Documents/test.docx --bibliography=C:/Users/Muhammad Yaseen/R/win-library/3.0/reports/extdata/docs/example.bib' had status 23
I wonder how to get tex2docx function in reports R package working properly.
As described in the above comments, the error is caused by passing a filename/path including some spaces that are nor escaped, nor quoted. A workaround could be wrapping all file paths and names inside of shQuote before passing to the command line with system.
Code: https://github.com/trinker/reports/pull/31
Demo:
Loading package
library(reports)
Creating a dummy dir with a space in the name that would hold the bib file
dir.create('foo bar')
file.copy(system.file("extdata/docs/example.bib", package = "reports"), 'foo bar/example.bib')
Specifying the source and the copied bib file:
DOC <- system.file("extdata/doc_library/apa6.qual_tex/doc.tex", package = "reports")
BIB <- 'foo bar/example.bib'
Running the test:
tex2docx(DOC, file.path(getwd(), "test2.docx"), path = NULL, bib.loc = BIB)
Disclaimer: I tried to test this pull request, but I could not setup an environment with all the needed tools to run R CMD check with vignettes and everything else after all in 5 mins (sorry but being on vacation right now and just enjoying the siesta after lunch), so please consider this pull request as "untested" -- although it should work.

error in reading R file while submitting a r job to condor

I have a R Job that is submitted to the condor, The R file(one.R) which is submitted to the condor is reading another R file(two.R), when I submit the job to the condor its is failed and the reason for that is the submitted R(one.R) file is not reading the called R file(two.R)
Error in text file is :
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file 'C:/Users/pcname/Desktop/test_case/two.R': Permission denied
Execution halted
and my submit file is
#test_input.condor
#
executable = C:\R\R-2.10.1\bin\Rscript.exe
arguments = one.R
universe = vanilla
getenv = true
#requirements = ARCH == "INTEL" && OPSYS == "WINNT60"
input = one.R
should_transfer_files = yes
transfer_executable = false
when_to_transfer_output = ON_EXIT
transfer_input_files = C:/Users/OmegaAdmin/Desktop/test_case/two.R
log = test_input.log
output = test_input.out
error = test_input.err
queue
Appreciate any ideas on this.
Thanks,
This is not an R-related problem, but a problem of accessibility. The error message seems rather clear to me: the server has no reading rights for that file. Make sure you share the file or folder you want to read in. I don't know what the setup is of the network and clusters wherever you're located, but you better contact the admins to ask how you get your files to the right places.
Also make sure that if you transfer files to the servers/cluster, you adapt your R script so it points to the right directories. Which is probably not your own harddrive...
When you say
transfer_input_files = C:/Users/OmegaAdmin/Desktop/test_case/two.R
That tells Condor to copy two.R into the current working directory when the job starts. The current working directory is a specially created workspace, not (usually) a home directory. Thus I would expect the full path to look something like
C:/condor/execute/dir_28412/two.R
However, R is actually looking in
C:/Users/pcname/Desktop/test_case/two.R
Why is R looking there? Does one.R potentially say "Find two.R in $HOME/Desktop/test_case"? Does it perhaps say, "Look in Desktop/testcase/two.R" and R has configuration that wants to look relative to the user's home directory?
The solution is almost certainly to modify one.R or your R configuration to look for two.R in the current working directory. If for some reason R changes its current working directory, the environment variable _CONDOR_SCRATCH_DIR should contain it.
On a related note, you said:
arguments = one.R
input = one.R
The first is an argument passed to Rscript.exe, which I'm guessing tells R to load and run a file called one.R. Except that script isn't present! If you want that to work, you'll need to add it to transfer_input_files. But it obviously appears to work; why? Because "input=one.R" means "take the contents of one.R and pipe it in as standard input to Rscript.exe; the same as if you'd typed those contents in." I'm guessing you can remove the arguments, or remove the input and add one.R to your transfer_input_files, removing the ambiguity.

Resources