Identifying a potential bug here. When calling writeRaster with overwrite=TRUE, the values in the written file remain unchanged. I originally wrote the wrong raster object, then corrected the code and wrote a new raster to the same file name. The values in the attribute table of the written file are the same as the original, even though the raster object I am writing has the correct attributes when viewed in R.
The workaround was to give the new raster a different file name (or to manually delete the old file first); see the sketch below.
R 3.0.0, Windows 7 64-bit.
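For illustration, a minimal sketch of the workaround described above, using the raster package; the raster object and file names (elev.tif, elev_v2.tif) are placeholders, not the poster's actual data:

library(raster)

r <- raster(nrows = 10, ncols = 10)   # stand-in for the corrected raster object
values(r) <- runif(ncell(r))

out <- "elev.tif"                     # hypothetical output file

# Option 1: remove the stale file explicitly before writing
if (file.exists(out)) file.remove(out)
writeRaster(r, filename = out)

# Option 2: write to a fresh file name instead of reusing the old one
writeRaster(r, filename = "elev_v2.tif", overwrite = TRUE)

If another program (ArcCatalog/ArcMap in this case) holds a lock on the file, the explicit file.remove() should at least fail with a warning rather than silently leaving the old values in place.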
Apologies to Brian, with whom I share our modeling workstation. This was my post.
Josh O'Brien - looks like you were right, something was holding a write lock on the file. I think ArcCatalog was locking it up.
This tool has performed as expected many times since this incident.
I found the same issue.
I confirm that if you have ArcMap open, writeRaster with overwrite=TRUE doesn't work. What's more, it fails without any warning message.
Hope this helps other R users managing raster files.
We are creating an R package whose main function is to geocode addresses in Belgium (i.e. transform a "number-street-postcode" string into X-Y spatial coordinates). To do this, we need several data files, namely all buildings in Belgium with their geographical coordinates, as well as municipal boundary data containing geometries.
We face two problems in creating this package:
The files take up space: about 300-400 MB in total. This size is a problem, because we eventually want to put this package on CRAN. A solution we found on Stack Overflow is to create a separate package just for the data and to host that package elsewhere. But then a second problem arises (see next point).
Some of the files we use are produced by public authorities. They are publicly available for download and are updated weekly. We have written a function that downloads the data if the copy on the computer is more than one week old and transforms it for the geocoding function (we created a specific structure to optimize the processing). We are new to package creation, but from what we understand, it is not possible to update data every week if it is shipped inside the package (maybe we are wrong?). It would be possible to release a weekly update of the package, but this would be very tedious for us. Instead, we want the user to be able to update the data whenever they want, and for the data to persist.
So we are wondering what is the best solution regarding this data issue for our package. In summary, here is what we want:
Find a user-friendly way to download the data and use it with the package.
The user should be able to update the data whenever they want with a function of the package, and this data should persist on the computer.
We found an example that could work: the Rpostal package, which also relies on large external data (https://github.com/Datactuariat/Rpostal). The author's solution was to install the data outside the package and to specify the directory where it is located each time a function is used; libpostal_path must therefore be passed as an argument to the functions for them to work.
However, we wonder whether there is a solution that stores the files in a directory internal to R or to our package, so that we don't have to pass this directory to the functions. Would it be possible, for example, to download these files into the package directory, without giving the user a choice, so that we always know their path without the user having to specify it?
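One possibility (a sketch, not a definitive recommendation) is a per-user data directory obtained from tools::R_user_dir(), available since R 4.0.0. Writing into the installed package directory itself is generally discouraged for CRAN packages, but a user-level directory keyed to the package name gives a stable path the package always knows, without asking the user for one. The package name, URL and file name below are placeholders:

# Returns (and creates if needed) a persistent per-user data directory
# for a hypothetical package "begeocoder".
data_dir <- function() {
  dir <- tools::R_user_dir("begeocoder", which = "data")
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  dir
}

# Downloads the source file only if the local copy is missing or older
# than `max_age_days`; the URL stands in for the official open-data link.
update_data <- function(max_age_days = 7,
                        url = "https://example.org/buildings.csv") {
  dest <- file.path(data_dir(), "buildings.csv")
  age <- if (file.exists(dest)) {
    difftime(Sys.time(), file.mtime(dest), units = "days")
  } else {
    Inf
  }
  if (age > max_age_days) download.file(url, dest, mode = "wb")
  invisible(dest)
}

Every exported function can then call data_dir() internally to locate the files, so the user never has to pass a path, and the data persists across sessions and package reinstalls.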
Are we on the right track or do you think there is a better solution for us?
I have a large population survey dataset for a project, and the first step is to make exclusions and produce a final dataset for analyses. To keep my work organized, I need to continue in a new file where I derive the survey variables correctly. Is there a command to continue working that saves all of the previous data and code to the new file?
I don't think I understand the problem you have. You can always create multiple .R files and split the code among them as you wish, and you can also arrange those files as you see fit in the file system (group them in the same folder with informative names and comments, etc.).
As for the data side of the problem, you can load your data into R, make any changes/filters needed, and then save it to another file with one of the many functions for writing data to disk: write.table() from base R, fwrite() from data.table (which can be MUCH faster), etc.
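For example, a minimal sketch of that workflow using data.table; the file and column names are made up:

library(data.table)

survey <- fread("survey_raw.csv")                 # load the full survey data

# Apply the exclusions for the analysis dataset (hypothetical criteria)
analysis <- survey[age >= 18 & !is.na(weight)]

fwrite(analysis, "survey_analysis.csv")           # plain-text copy
saveRDS(analysis, "survey_analysis.rds")          # exact R object for the next script

The next script can then start with analysis <- readRDS("survey_analysis.rds") and contain only the variable-derivation code.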
I feel that my answer is way too obvious. When you say "project", do you mean "something I have to get done" or an actual project that you can create in RStudio? If it's the first, then I think I have covered it. If it's the second, I never got to use that feature, so I'm not going to be able to help :(
Maybe you can elaborate a bit more.
I have a question about R. I think I forgot to save one of the scripts I was working on and I'm trying to get it back somehow. The script involved commands to create plots.
If I use the command:
print(nameoftheplot)
I am able to print the plot. Does this mean that R still has the script somewhere in the working memory? And how can I get it back?
Thanks for your help!
With luck, your commands are saved in R’s history; you should immediately perform
savehistory('history.r')
This usually contains all the last commands you executed.
I am able to print the plot. Does this mean that R still has the script somewhere in the working memory?
Unfortunately, no. However, it still has the plot object in memory, and you can dump that to retrieve some information:
dput(nameoftheplot)
Whether this is useful depends on how exactly the plot was created.
Apart from that, the following two things can give you information about the last state of your script:
ls()
will show you all the objects you defined in the global environment. You can look at their values for clues. In particular, if you defined functions, their code will be available in its entirety.
search()
will show you which packages your script loaded and attached.
From this you may be able to reconstruct large parts of your code.
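If ls() shows that useful objects survived, one further step (a sketch; the file names are arbitrary) is to deparse everything in the global environment back to source code with base R's dump(), which is especially helpful for recovering function definitions:

# Write every surviving object in the global environment out as R source.
dump(list = ls(envir = globalenv()), file = "recovered_objects.R",
     envir = globalenv())

# Keep the command history too, if it is still available in this session.
savehistory("recovered_history.R")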
I am using Stat/Transfer to convert a dataset from SAS format to R format. The file in SAS is ~489 MB; when converted to .RData it is 520 MB. Given that the file is a data frame with 4090222 x 11 cells, I suppose the difference can be explained to some extent.
But when I open the converted dataset and ask R to save it, the 520 MB goes down to some 120 MB. I really don't understand how and why this is happening. I suspect data is being dropped (because the size reduction is so notable), but as far as I can see, this is not happening.
I have tried all.equal(), which returns TRUE. In fact, everything I try tells me that the datasets are indeed equal... but it does not add up?
Am I making some huge mistake?
EDIT: See Gregor's point below; "problem" solved!
Just turning my comments into an answer:
R compresses data when it saves it as .RData, and actually does an impressive job of it as compared to other statistical programming languages, as demonstrated in this blog entry.
So the answer is no, you shouldn't be worried.
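As an illustration (a sketch with a made-up data frame, not the poster's SAS data), you can see the effect of save()'s compression by writing the same object with compression disabled and enabled and comparing the file sizes:

# Toy data frame standing in for the survey data
df <- data.frame(id = 1:1e6,
                 group = sample(letters[1:5], 1e6, replace = TRUE),
                 value = rnorm(1e6))

save(df, file = "df_uncompressed.RData", compress = FALSE)
save(df, file = "df_gzip.RData")                    # gzip is the default
save(df, file = "df_xz.RData", compress = "xz")     # slower, usually smaller

file.size(c("df_uncompressed.RData", "df_gzip.RData", "df_xz.RData"))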
So I've been trying to read this particular .mat file into R. I don't know too much about Matlab, but I know enough to know that the R.matlab package can only read uncompressed data into R, and that to save the data uncompressed I need to save it as such in Matlab using
save new.mat -v6
Okay, so I did that, but when I used readMat("new.mat") in R, it just got stuck loading forever. I also tried the hdf5 package via:
> hdf5load("new.mat", load=FALSE)->g
Error in hdf5load("new.mat", load = FALSE) :
can't handle hdf type 201331051
I'm not sure what this problem could be, but if anyone wants to try to figure this out the file is located at http://dibernardo.tigem.it/MANTRA/MANTRA_online/Matlab_Code%26Data.html and is called inventory.mat (the first file).
Thanks for your help!
This particular file has one object, inventory, which is a struct with a lot of different things inside it. Some are cell arrays, others are vectors of doubles or logicals, and a couple are matrices of doubles. It looks like R.matlab does not like cell arrays within structs, but I'm not sure exactly what is causing R to hang on this file. For reasons like this, I'd generally recommend avoiding mapping Matlab structs to R objects. A struct is similar to a list, and this one can be transformed into a list, but it's not always a good idea.
I recommend creating a new variable for each field, e.g. ids = inventory.instance_ids, and saving each one to a separate .mat file, or saving all of them, except for the inventory object itself, into one file. Even better, go to text, e.g. via csvwrite, so that you can see exactly what is being created.
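On the R side, once the fields have been written out from Matlab (e.g. with csvwrite), reading them back needs nothing beyond base R. A sketch with hypothetical file and field names (only instance_ids is mentioned above; the rest are placeholders):

# Hypothetical CSVs, one per struct field, exported from Matlab
instance_ids <- read.csv("instance_ids.csv", header = FALSE)
other_field  <- read.csv("other_field.csv",  header = FALSE)

# Reassemble the pieces into a list playing the role of the original struct
inventory <- list(instance_ids = instance_ids, other_field = other_field)
str(inventory)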
I realize that works around the use of a Matlab-to-R reader, but having things in a common, universal format is much more useful for reproducibility than acquiring a bunch of different readers for a proprietary format.
Alternatively, you can pass objects in memory via R.matlab, or this set of functions + the R/DCOM interface (on Windows).
Although this doesn't address how to use R.matlab, I've done a lot of transferring of data between R and Matlab, in both directions, and I find that it's best to avoid .mat files (and, similarly, .rdat files). I like to pass objects in memory, so that I can inspect them on each side, or via standard text files. Dealing with application specific file formats, especially those that change quite a bit and are inefficient (I'm looking at you MathWorks), is not a good use of time. I appreciate the folks who work on readers, but having a lot more control over the data structures used in the target language is very much worth the space overhead of using a simple output file format. In-memory data transfer is very nice because you can interface programs, but that may be a distraction if your only goal is to move data.
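For completeness, a rough sketch of the in-memory route using R.matlab's Matlab client/server interface. This assumes Matlab is installed and launchable on the same machine; the method names are those documented for R.matlab's Matlab class, but treat the details as approximate rather than a tested recipe:

library(R.matlab)

Matlab$startServer()            # launch a Matlab process that listens for connections
matlab <- Matlab()
if (!open(matlab)) stop("Could not connect to the Matlab server")

setVariable(matlab, x = matrix(1:6, nrow = 2))   # push an R matrix into Matlab
evaluate(matlab, "y = x * 2;")                   # run Matlab code on it
y <- getVariable(matlab, "y")$y                  # pull the result back into R

close(matlab)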
Have you run the examples in http://cran.r-project.org/web/packages/R.matlab/R.matlab.pdf on pages 22 to 24? They will test your ability to read files from versions 4 and 5. I'm not sure that R cannot read compressed files. There is an Rcompression package from Omegahat.