unable to open .dat files on R even with haven installed - r

I use SGA Tools to process my images, and it returns the results in .dat files. To work with this data in R, I tried to import a .dat file using the haven package. I installed haven and loaded it, but the import still fails with this error message:
Error: Failed to parse C:/Users/QuRana/Desktop/SGA Tools/Plate_Image_Example (1).dat: This version of the file format is not supported.
Running install.packages("haven") installs haven, but when I then load it with library(haven), nothing appears on my console except this:
> library(haven)
Then when I use this code:
datatrial1 <- read_dta("C:/Users/QuRana/Desktop/SGA Tools/Plate_Image_Example (1).dat")
It gives me the error mentioned above. When I instead convert my .dat file to a .csv file and load that, the imported data shows a literal "\t" before every value except those in the first column, like this:
Flags: S - Colony spill or edge interference C - Low colony circularity
# row\tcol\tsize\tcircularity\tflags
1\t1\t4355\t0.9053\t
1\t2\t4456\t0.8401\t
1\t3\t3439\t0.8219\t
1\t4\t3215\t0.8707\t
All the t's before the numeric values are not what I want. Another issue that I am facing is I cannot install the gitter package on my R version which is R 4.2.2.

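read_dta() is meant for Stata .dta files, which is most likely why haven rejects this file. Your .dat file is plain tab-separated text, so you can read it like so: `read.delim("file_path", header = TRUE, sep = "\t")`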
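As a minimal sketch of that approach, the example below builds a small stand-in file from the rows shown in the question and reads it back. It assumes that the description and header lines start with "#" (the exact layout of real SGA Tools output may differ), so those lines are skipped as comments and the column names are supplied by hand.

```r
# Stand-in for an SGA Tools .dat file, built from the rows in the question.
# Assumption: the flag description and the header line both start with "#".
path <- tempfile(fileext = ".dat")
writeLines(c(
  "# Flags: S - Colony spill or edge interference C - Low colony circularity",
  "# row\tcol\tsize\tcircularity\tflags",
  "1\t1\t4355\t0.9053\t",
  "1\t2\t4456\t0.8401\t"
), path)

# Treat "#" lines as comments and name the columns explicitly,
# because the commented header line is skipped along with the description.
plate <- read.delim(
  path,
  header = FALSE,
  sep = "\t",
  comment.char = "#",
  col.names = c("row", "col", "size", "circularity", "flags")
)

str(plate)
```

If the real header line is not commented out, `header = TRUE` without `col.names` should be enough.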

Related

How does lazydata loading work in R package installation?

I want to expose data that is already published in my data/ directory of my R package skeleton. See this link for "External data" sharing basics: http://r-pkgs.had.co.nz/data.html.
My data is stored in .txt format. If you didn't want to rely on lazy loading (where the data becomes available by loading the package with require(myRpackage) and then calling data(datasetName)), you can read the file directly with base R functions such as read.table() or read.csv2().
My dataset is called "publishedData.txt" in this example, and can be loaded as below, which works beautifully:
tmp = read.table("/dir/to/R/package/data/publishedData.txt", sep="\t", header=TRUE)
However, when I go to re-install my R package with this new shiny & wonderful data, I get the following fail message, over and over (see pasted below).
Downloading GitHub repo myGitRepo/myRpackage#master
from URL https://api.github.com/repos/myGitRepo/myRpackage/zipball/master
Installing myRpackage
library='/Library/Frameworks/R.framework/Versions/3.5/Resources/library' --
install-tests
* installing *source* package ‘myRpackage’ ...
** R
** data
*** moving datasets to lazyload DB
Error in scan(file = file, what = what, sep = sep, quote = quote, dec =
dec, :
line 1 did not have 215 elements
ERROR: lazydata failed for package ‘myRpackage’
* removing
‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/myRpackage’
Installation failed: Command failed (1)
Note, the above Github repo isn't real. I'm writing a generic post, so don't try to install this fake R package yourself.
My question: How do I debug lazydata load, when I don't know how lazydata load is performed? i.e., what code decides if the data in my publishedData.txt in my data/ folder is "A-OK", versus "Not okay"? I know they are using scan(), yet it should know that sep="\t" in a .txt file, and other than that, I'm not sure what's tripping it up?
Things I've tried:
I've scrubbed my header names as best as I can (e.g., removing non-alphabetical characters from column or rownames strings).
I've also removed any other column besides the rownames column that has string data in it instead of numerical data, just in case stringsAsFactors is set to default of TRUE in lazydata loading (which would slow down things by a lot).
Also, I've restarted R after each re-install attempt...
Okay, so I figured out a way to get this to work, without having to actually understand what was tripping it up.
Say your dataset loads using read.table(), but doesn't reinstall with lazydata load as described above. Chances are, your headers / rownames are off. A quick solution is just to do this:
# Load your data into R the way it works
tmp = read.table("/dir/to/R/package/data/publishedData.txt", sep="\t", header=TRUE)
# Write data to same file with these arguments
write.table(tmp, file="/dir/to/R/package/data/publishedData.txt", sep="\t", row.names = TRUE, col.names = TRUE)
Then push the change to your GitHub repo and try to reinstall the R package. It will work this time around! The difference in the .txt file is the header line: with row.names = TRUE, write.table() writes no label for the row-names column, so the header starts directly with the name of the first data column. Every following line starts with its row name, and then all the data comes next. So the header line has one fewer element than the data lines, which is the layout read.table() uses to recognize that the first column holds row names.
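The header layout described here can be checked on a toy data frame. This is only a sketch: the data frame and its row names are made up, and the final read relies on the documented behaviour of data(), which reads .txt files in data/ with read.table(..., header = TRUE) and therefore does not use sep = "\t".

```r
# Toy data frame with row names (hypothetical values, for illustration only)
df <- data.frame(a = c(1.5, 2.5), b = c(3, 4),
                 row.names = c("geneA", "geneB"))

path <- tempfile(fileext = ".txt")
write.table(df, file = path, sep = "\t", row.names = TRUE, col.names = TRUE)

# Number of tab-separated fields per line: the header has one fewer
fields <- count.fields(path, sep = "\t")
fields

# Read the file with default whitespace separation, as data() would
# for a .txt file; the first column is taken as row names
back <- read.table(path, header = TRUE)
rownames(back)
```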
Hope it helps someone else. :-)

dataset cannot be loaded although data() lists it

We have an R package which has the dataset plaq.sample in the file data/plaq.sample.Rdata. One of the examples of a package function uses it. The example works just fine on my laptop (R 3.5.1) and on my colleague's laptops (R 3.1.3 and 3.4.4), but it fails on Travis CI (R 3.5.1 as well). The output is the following:
> data(plaq.sample)
Warning in data(plaq.sample) : data set ‘plaq.sample’ not found
> plaq.boot <- bootstrap.analysis(plaq.sample, pl=TRUE)
Error in bootstrap.analysis(plaq.sample, pl = TRUE) :
object 'plaq.sample' not found
Execution halted
This is bewildering. I have called data() at the beginning of the example and the output clearly states that this dataset is available:
Data sets in package ‘hadron’:
correlatormatrix
plaq.sample Sample plaquette time series
pscor.sample
samplecf Sample cf data
We are lost and do not understand why the R CMD check . does work on our laptops but not on Travis CI. What could be the issue for not finding the dataset?
The check had printed a warning about a hidden file .Rdata, so I had added .Rdata to the .Rbuildignore file to exclude it. However, each line of that file is interpreted as a regular expression, so the pattern also matched data/plaq.sample.Rdata and excluded the dataset from the build. Removing this line makes the dataset available again.
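A quick way to see why the pattern caught the dataset is to try it with grepl(). The sketch below assumes that each .Rbuildignore line is matched as a case-insensitive Perl-style regular expression against paths relative to the package root, as described in "Writing R Extensions". The anchored pattern is a suggested alternative, not something taken from the original post.

```r
# Paths as seen relative to the package root
paths <- c(".Rdata", "data/plaq.sample.Rdata")

# The unanchored pattern: "." matches any character, and the match
# may occur anywhere in the path, so both files are hit
loose <- grepl(".Rdata", paths, perl = TRUE, ignore.case = TRUE)

# Anchored and escaped: only the hidden file in the package root is hit.
# In .Rbuildignore itself this would be written as ^\.Rdata$
anchored <- grepl("^\\.Rdata$", paths, perl = TRUE, ignore.case = TRUE)

data.frame(path = paths, loose = loose, anchored = anchored)
```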

how to import .rec files in R

I have a .rec file that I want to import into R. I have saved the .rec file to my working directory. This is what I have tried.
library(foreign)
library(RODBC)
data.test <- read.epiinfo("data_in.rec")
I get this error:
Error in if (headerlength <= 0L)
stop("file has zero or fewer variables: probably not an EpiInfo file") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1:
In readLines(file, 1L, ok = TRUE) :
line 1 appears to contain an embedded nul
2:
In strsplit(line, " ") : input string 1 is invalid in this locale
I have looked online and at the read.epiinfo help page in R. The help page says
Some later versions of Epi Info use the Microsoft Access file format
to store data. That may be readable with the RODBC package.
I have two questions.
1. Is the error I am getting because the .rec file I have is from an Epi Info version later than 6?
2. How do I use the RODBC library to open the .rec file?
The .rec (or .REC) file turned out to be a .EDF (European Data Format) file type. It was easily opened in R using the library edfReader. The edfReader library help file is very useful for opening the file and extracting the time series data. See code below for what I used. Code was adapted from the help file.
# One-time install, then load the package and open its help page
install.packages('edfReader')
library(edfReader)
?edfReader
# Read the header first, then the signals described by that header
CHdr <- readEdfHeader("data_in.rec")
CSignals <- readEdfSignals(CHdr)
summary(CSignals)

Trouble providing data sets with package

I have two data sets full and raw that I placed in the data/ directory of my package. However, when I load my package, they are not available. I tried looking for them using the data function, but did not see them.
data(raw, package = "pkg")
Warning message:
In data(raw, package = "pkg") : data set 'raw' not found
Do I have to export them somehow?
I noticed when I tried to open the file using load from another computer, it read in as a string. Maybe I'm not writing the data frame properly? I used:
save(df.full, file = "full.RData")
save(df.raw, file = "raw.RData")

Using SRTM tif file in R

I'm trying to import an SRTM dataset into R. I've downloaded the data as a tif file, but I am having trouble reading it into R.
I've tried using the following code:
t = readTIFF("srtm_56_06/srtm_56_06.tif", as.is=TRUE)
load('srtm_56_06/srtm_56_06.tif')
read_file<-as.matrix(raster("srtm_56_06/srtm_56_06.tif")
However I am still getting error messages:
load('srtm_56_06/srtm_56_06.tif')
# Error: bad restore file magic number (file may be corrupted) -- no data loaded
# In addition: Warning message:
# file ‘srtm_56_06.tif’ has magic number 'II*'
# Use of save versions prior to 2 is deprecated
library(raster)
t = readTIFF("srtm_56_06/srtm_56_06.tif", as.is=TRUE)
# Error: could not find function "readTIFF"
read_file<-as.matrix(raster("srtm_56_06/srtm_56_06.tif")
min(read_file)
# Error: unexpected symbol in:
# "read_file<-as.matrix(raster("srtm_56_06/srtm_56_06.tif")
# min"
Can anyone help me with the commands to import this data. I'm a novice at "R" and a little lost.
Just read it with raster, but note you depend on rgdal being installed as well to read a .tif.
library(raster)
library(rgdal)
r <- raster("srtm_56_06/srtm_56_06.tif")
If that works, try
plot(r)
r
If it's really a "TIFF" then that should be fine, if it's really a GeoTIFF then you'll have a sensible map as well. (If it's something else that GDAL can read you might get a good result anyway, remember the extension of a file is not a reliable indicator of its contents).
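Since the extension cannot be trusted, one option is to look at the first few bytes of the file. The sketch below uses a hypothetical helper, peek_bytes(), and a stand-in file that contains only the little-endian TIFF signature (the 'II*' that load() complained about, followed by a zero byte). It is not a full format check, and a big-endian TIFF would start with different bytes.

```r
# Hypothetical helper: return the first n bytes of a file as a raw vector
peek_bytes <- function(path, n = 4L) {
  con <- file(path, "rb")
  on.exit(close(con))
  readBin(con, what = "raw", n = n)
}

# Stand-in file holding only a little-endian TIFF signature: "II*" plus 0x00
tiff_le <- as.raw(c(0x49, 0x49, 0x2a, 0x00))
demo <- tempfile(fileext = ".tif")
writeBin(tiff_le, demo)

# Compare the leading bytes of the file with the signature
sig <- peek_bytes(demo)
is_tiff_le <- identical(sig, tiff_le)
is_tiff_le
```

On a real download, the call would be peek_bytes("srtm_56_06/srtm_56_06.tif").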
The SRTM clue suggests that this is a single band DEM file from the tiled global SRTM data set. If it's somehow a "multi-band image" then you could read that with brick and plot with plotRGB (but I really doubt that is the case here). Note that there is a native binary format for SRTM that raster/rgdal could read as well but either they distributed .tif as well or someone else converted it.
There are a number of misconceptions in your code:
load is only for files created by R's save() (not these .tifs)
readTIFF is not in package raster (it comes from the tiff package)
read_file would be a sensible matrix once the missing closing parenthesis is added, if you have rgdal installed (which raster must use to load a .tif), but why throw away the spatial metadata?

Resources