Trouble providing data sets with package


I have two data sets, full and raw, that I placed in the data/ directory of my package. However, when I load my package, they are not available. I tried looking for them with the data() function, but did not see them.
data(raw, package = "pkg")
Warning message:
In data(raw, package = "pkg") : data set 'raw' not found
Do I have to export them somehow?
I noticed that when I tried to open the file with load() on another computer, it read in as a string. Maybe I'm not writing the data frame properly? I used:
save(df.full, file = "full.RData")
save(df.raw, file = "raw.RData")
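A possible explanation, offered as a sketch rather than a confirmed diagnosis: for files in data/, the name you pass to data() is the file's base name, but the objects it restores keep the names they were saved under. So raw.RData created with save(df.raw, ...) would provide an object called df.raw, not raw. The "string" from load() is expected: load() restores the objects into an environment and returns their *names* as a character vector. The warning itself may also mean the installed copy of the package does not yet contain the files (for example, it was not rebuilt and reinstalled after they were added). The data frames below are made-up stand-ins; in a real package the files would go in data/, and if the usethis package is available, usethis::use_data(full, raw) saves them there under the object names. Adding `LazyData: true` to DESCRIPTION makes them available without calling data().

```r
# Made-up stand-ins for the asker's df.full / df.raw
full <- data.frame(x = 1:3, y = c(2.5, 3.1, 4.7))
raw  <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Save each object under the name users should see; in a package source
# tree these files would live in data/ (a temp dir is used here instead)
tmp_dir <- tempdir()
save(full, file = file.path(tmp_dir, "full.RData"))
save(raw,  file = file.path(tmp_dir, "raw.RData"))

# load() puts the objects into an environment and returns their names
env <- new.env()
loaded_names <- load(file.path(tmp_dir, "raw.RData"), envir = env)
print(loaded_names)             # the object names, here "raw"
str(get("raw", envir = env))    # the actual data frame
```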

Related

unable to open .dat files on R even with haven installed

I use SGA Tools to process my images, and it returns the results in .dat files. To work with this data in R, I tried to import the .dat file with the haven package. I installed haven and loaded it, but I still cannot import the data; it gives this error message:
Error: Failed to parse C:/Users/QuRana/Desktop/SGA Tools/Plate_Image_Example (1).dat: This version of the file format is not supported.
When I run install.packages("haven"), haven installs, but when I load it with library(haven), nothing appears on my console except this:
> library(haven)
Then when I use this code:
datatrial1 <- read_dta("C:/Users/QuRana/Desktop/SGA Tools/Plate_Image_Example (1).dat")
It gives me the error mentioned above. When I convert my .dat file to a .csv file and load it, the imported data has an extra "\t" before the values in every column except the first, like this:
Flags: S - Colony spill or edge interference C - Low colony circularity
# row\tcol\tsize\tcircularity\tflags
1\t1\t4355\t0.9053\t
1\t2\t4456\t0.8401\t
1\t3\t3439\t0.8219\t
1\t4\t3215\t0.8707\t
I don't want the \t's before the numeric values. Another issue I am facing is that I cannot install the gitter package on my R version, R 4.2.2.

You can read your tab-separated file like so: read.delim("file_path", header = TRUE, sep = "\t")
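For context, haven's read_dta() reads Stata .dta files, so it cannot parse a plain-text, tab-separated .dat file like this one. The sketch below extends the read.delim() suggestion: it recreates the excerpt from the question in a temporary file, skips the two descriptive lines, and supplies the column names itself (the header line starts with "# "). The skip = 2 value and the column types are assumptions based only on the excerpt shown; the real SGA Tools output may have more header lines, in which case skip needs adjusting.

```r
# Recreate the excerpt from the question in a temporary file.
# In R string literals "\t" is a real tab character.
sample_lines <- c(
  "Flags: S - Colony spill or edge interference C - Low colony circularity",
  "# row\tcol\tsize\tcircularity\tflags",
  "1\t1\t4355\t0.9053\t",
  "1\t2\t4456\t0.8401\t",
  "1\t3\t3439\t0.8219\t",
  "1\t4\t3215\t0.8707\t"
)
dat_file <- tempfile(fileext = ".dat")
writeLines(sample_lines, dat_file)

# Skip the two descriptive lines (assumed count) and name the columns ourselves
colonies <- read.delim(
  dat_file,
  header = FALSE,
  skip = 2,
  col.names = c("row", "col", "size", "circularity", "flags"),
  colClasses = c("integer", "integer", "numeric", "numeric", "character")
)
str(colonies)
```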

Altering internal data in R package

I'm working on a package (let's call it myPackage) that needs to refer to external data, which is too big to incorporate into the package itself (it's a lot of netCDF files).
Because of this, I have an internal PATH variable (which I initialize in /data-raw/ and save to sysdata.rda; I've also turned off lazy loading). When asked to get data, any function can then use this PATH to find the data.
I want the user to be able to specify their path, so I wrote a function:
setpath <- function(path) {
  myPackage:::PATH = path
}
Doing setpath("C:/") doesn't work. I get the error: Error in setpath("C:/") : object 'myPackage' not found.
I've also tried the following alternative function:
setpath <- function(path) {
  PATH = path
}
This way, the variable myPackage:::PATH never changes.
How should I be doing this? Is internal data read-only?

You can use options(). Create an option with
options(myPackageRepositoryPath = "some/path")
and retrieve it with
path <- getOption("myPackageRepositoryPath")
You can overwrite an option the same way you set it:
setpath <- function(path) {
  options(myPackageRepositoryPath = path)
}
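Bindings inside an installed package's namespace are locked, so an option like this is a simpler place to keep a user-settable value. A slightly fuller sketch of the same idea, with a getter that fails clearly when no path has been set. The option name comes from the answer above; getpath() and the file name example.nc are hypothetical.

```r
# Store the user's data path in a global option (name from the answer above)
setpath <- function(path) {
  # options() returns the previous values, so callers can restore them
  old <- options(myPackageRepositoryPath = path)
  invisible(old)
}

# Hypothetical getter used by the package's data-reading functions
getpath <- function() {
  path <- getOption("myPackageRepositoryPath")
  if (is.null(path)) {
    stop("No data path set; call setpath() first.", call. = FALSE)
  }
  path
}

# Usage
old <- setpath(tempdir())
nc_file <- file.path(getpath(), "example.nc")  # hypothetical netCDF file name
options(old)  # restore the previous setting (removes it if it was unset)
```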

Error when exporting R data frame using openxlsx ("Error in zipr")

I usually use the write.xlsx function from the openxlsx package to export R data frames to .xlsx files. Since yesterday (probably after I used the XLConnect package), something got messed up and write.xlsx no longer works. This is the error I get:
Error in zipr(zipfile = tmpFile, include_directories = FALSE, files = list.files(path = tmpDir, :
unused argument (include_directories = FALSE)
Unfortunately, I don't understand what this error means. Thanks for any helpful advice.
Edit: The function works when I use an older openxlsx version (4.1.0).

I was getting the same error.
I think the problem is with the dependencies of openxlsx. There is a "zipR" package that might be picked up when you install openxlsx, while the actual dependency is the zip package:
https://cran.r-project.org/web/packages/zip/index.html
https://cran.r-project.org/web/packages/zipR/zipR.pdf
I installed "zip" along with openxlsx and I don't get the error anymore.
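One way to check whether the installed zip package matches what openxlsx expects, assuming the error comes from openxlsx passing include_directories to a zip::zipr() that does not accept it: inspect zipr()'s formal arguments. The helper name zip_supports_openxlsx() is made up for this sketch; if it returns FALSE, reinstalling zip and openxlsx as described above is the next step.

```r
# Hypothetical helper: TRUE if the installed 'zip' package's zipr() accepts
# the include_directories argument named in the error message
zip_supports_openxlsx <- function() {
  if (!requireNamespace("zip", quietly = TRUE)) {
    return(FALSE)  # the zip package is not installed at all
  }
  "include_directories" %in% names(formals(zip::zipr))
}

if (!zip_supports_openxlsx()) {
  message("Consider running install.packages(c(\"zip\", \"openxlsx\")) and restarting R.")
}
```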

I do not really understand the error message here. My computer does not allow me to save files to "c:/", so if I remove the "c:/" part, it works fine and saves the file to the current working directory.
library(openxlsx)
df <- data.frame('x' = c(1,2,3),
'y' = c(3,2,1))
openxlsx::write.xlsx(df, "test.xlsx")
You could also try another package, writexl:
writexl::write_xlsx(df, "text5.xlsx")
This works on my machine.

How does lazydata loading work in R package installation?

I want to expose data that is already published in the data/ directory of my R package skeleton. See this link for "External data" sharing basics: http://r-pkgs.had.co.nz/data.html.
My data is stored in .txt format. If you don't want to load the data via lazy loading (i.e., by loading the package with require(myRpackage) and then calling data(datasetName)), you can read it in directly with base R functions such as read.table() or read.csv2().
My dataset is called "publishedData.txt" in this example, and can be loaded as below, which works beautifully:
tmp = read.table("/dir/to/R/package/data/publishedData.txt", sep="\t", header=TRUE)
However, when I reinstall my R package with this shiny new data, I get the following failure message every time:
Downloading GitHub repo myGitRepo/myRpackage#master
from URL https://api.github.com/repos/myGitRepo/myRpackage/zipball/master
Installing myRpackage
library='/Library/Frameworks/R.framework/Versions/3.5/Resources/library' --install-tests
* installing *source* package ‘myRpackage’ ...
** R
** data
*** moving datasets to lazyload DB
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 215 elements
ERROR: lazydata failed for package ‘myRpackage’
* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/myRpackage’
Installation failed: Command failed (1)
Note, the above Github repo isn't real. I'm writing a generic post, so don't try to install this fake R package yourself.
My question: how do I debug lazydata loading when I don't know how it is performed? That is, what code decides whether the data in publishedData.txt in my data/ folder is "A-OK" versus "not okay"? I know it uses scan(), and I'd expect it to know that sep="\t" for a .txt file; other than that, I'm not sure what's tripping it up.
Things I've tried:
I've scrubbed my header names as best as I can (e.g., removing non-alphabetical characters from column or rownames strings).
I've also removed every column other than the rownames column that contains string data instead of numeric data, in case stringsAsFactors defaults to TRUE during lazydata loading (which would slow things down a lot).
Also, I've restarted R after each re-install attempt...

Okay, so I figured out a way to get this to work, without having to actually understand what was tripping it up.
Say your dataset loads using read.table(), but doesn't reinstall with lazydata load as described above. Chances are, your headers / rownames are off. A quick solution is just to do this:
# Load your data into R the way it works
tmp = read.table("/dir/to/R/package/data/publishedData.txt", sep="\t", header=TRUE)
# Write data to same file with these arguments
write.table(tmp, file="/dir/to/R/package/data/publishedData.txt", sep="\t", row.names = TRUE, col.names = TRUE)
Then push the change to your GitHub repo and try reinstalling the R package. It will work this time around! The difference in the rewritten .txt file is the header line: the first "column" (the row names) has no label, so the header starts directly with the name of column 1 of your data matrix. Each following line starts with its row name, followed by the data. So technically, the header line has one fewer element than the data lines, which is the layout read.table(header = TRUE) treats as "the first column holds row names".
Hope it helps someone else. :-)
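As for how the .txt file is read in the first place: according to the ?data help page, .txt files in data/ are read with read.table(..., header = TRUE, as.is = FALSE). That uses the default whitespace separator, not sep = "\t", so unquoted names or values containing spaces could make a line's field count disagree with the header and trigger a "did not have N elements" error. A small diagnostic sketch, using a made-up matrix in place of publishedData.txt: write it the way the answer above does, check the fields per line with count.fields(), then read it back the way data() is documented to.

```r
# Made-up stand-in for data/publishedData.txt
m <- matrix(c(1.5, 2.25, 3, 4), nrow = 2,
            dimnames = list(c("geneA", "geneB"), c("sample_1", "sample_2")))
txt_file <- file.path(tempdir(), "publishedData.txt")

# Same arguments as the answer above: the header line gets no label
# over the row names, so it has one fewer field than the data lines
write.table(m, file = txt_file, sep = "\t", row.names = TRUE, col.names = TRUE)

# Diagnose: fields per line as whitespace-separated reading sees them
fields <- count.fields(txt_file, sep = "")
print(fields)

# Approximate what data() does with a .txt file in data/ (per ?data)
published <- read.table(txt_file, header = TRUE, as.is = FALSE)
str(published)
```

If count.fields() reports different counts on your real file (other than the header being one short), the lines it flags are a good place to look for stray spaces.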

Difficulty opening a package data file of unknown type

I am trying to load the state map from the maps package into an R object. I am hoping it is a SpatialPolygonsDataFrame, or something I can turn into one after I have inspected it. However, I am failing at the first step: getting it into an R object. I do not know the file type.
I first tried to assign the map() output to an R object directly:
st_m <- maps::map(database = "state")
This draws the map, but str(st_m) appears to do nothing, unless it is redrawing the same map.
Then I tried loading it as a dataset: st_m <- data("stateMapEnv", package="maps") but this just returns a string:
> str(stateMapEnv)
chr "R_MAP_DATA_DIR"
I opened the maps directory win-library/3.4/maps/mapdata/ and found what I think is the map file, “state.L”.
I tried reading it with scan and got an error message I do not understand:
scan(file = "D:/Documents/R/win-library/3.4/maps/mapdata/state.L")
Error in scan(file = "D:/Documents/R/win-library/3.4/maps/mapdata/state.L") :
scan() expected 'a real', got '#'
I then opened the file with Notepad++. It appears to be a binary or compressed file.
So I thought it might be an R data file with an unusual extension. But my attempt to load it returned a “bad magic number” error:
st_m <- load("D:/Documents/R/win-library/3.4/maps/mapdata/state.L")
Error in load("D:/Documents/R/win-library/3.4/maps/mapdata/state.L") :
bad restore file magic number (file may be corrupted) -- no data loaded
Observing that these responses have progressed from the unhelpful through the incomprehensible to the occult, I thought it best to seek assistance from the wizards of stackoverflow.

This should export the 'state' map (or any other maps dataset) as a data frame for you:
library(ggplot2)
state_dataset <- map_data("state")
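For the original goal of getting the map itself into an R object, maps::map() can return its data without drawing it: with plot = FALSE it returns a list of class "map", and fill = TRUE asks for closed polygons. A sketch; the conversion step assumes the sf package is installed, and an sf object can then be coerced with as(x, "Spatial") if the sp package is available, which would get closer to the SpatialPolygonsDataFrame mentioned in the question.

```r
library(maps)

# plot = FALSE returns the map object instead of drawing it;
# fill = TRUE returns closed polygons rather than outline segments
st_m <- maps::map(database = "state", plot = FALSE, fill = TRUE)
class(st_m)
str(st_m)          # a list with x, y, range and names components
head(st_m$names)   # polygon names, one per state piece

# Optional: convert to an sf object (assumes sf is installed)
if (requireNamespace("sf", quietly = TRUE)) {
  st_sf <- sf::st_as_sf(st_m)
  print(st_sf)
}
```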
