I'm testing out a simple piece of code to create directories that don't exist for my monthly SAS analysis, so I don't have to do the pre-prep manually, so to speak:
options dlcreatedir;
libname newdir "C:\Folder\&YYYYMM.\Inputs";
However, I've read that this option essentially stays on for all subsequent code, and I am worried that it might perform some messy writes on my disk if I make a mistake and leave it on.
Is there a way to turn it off once I am done creating directories?
If you review the documentation, you'll notice it lists two settings, DLCREATEDIR and NODLCREATEDIR. Use the second to turn the option off:
options nodlcreatedir;
http://support.sas.com/documentation/cdl/en/lesysoptsref/64892/HTML/default/viewer.htm#n1pihdnfpj4b32n1t62lx0zdsmdn.htm
Another option is to use the DCREATE() function to create a folder explicitly, without changing any system options, rather than relying on the LIBNAME statement to create it:
data _null_;
    /* DCREATE returns the new directory's full path, or blank if creation fails */
    folder = dcreate("Inputs", "C:\Folder\&YYYYMM.\");
run;
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002986745.htm
Sir, I am a student learning R. I have a question about how to store data in R, and how to retrieve data that has been erased.
Sir,
Using RStudio is not much different from using, say, Word or Notepad, though there are some differences.
First the similarities:
If you do not save your R script or data, it might not be available after you restart RStudio, or if you overwrite/erase your data.
The advantage of using R and RStudio is that you can script how you load and manipulate your data, and hence recreate the data at any time, provided you use a script and do not rely only on the interactive console.
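For example, a minimal script like this (the file and column names here are placeholders) is enough to recreate a dataset from its raw source at any time:
# load_data.R -- hypothetical example: rerunning this recreates 'dat' from scratch
dat <- read.csv("my_data.csv")   # reload the raw data
dat$total <- dat$a + dat$b       # redo the manipulation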
For the differences, RStudio can be set to save your current workspace, which is where all loaded data and variables reside. To change the settings, go to "Tools" --> "Global Options" and look for the workspace options there.
However, if you erase your data by overwriting it with other values or by removing it with rm(), the data is lost. Your only recourse is to retrace how it was loaded/modified, using either your script or the "History" pane.
For saving data, see e.g. http://www.sthda.com/english/wiki/saving-data-into-r-data-format-rds-and-rdata. Note the difference between save and saveRDS: the former saves objects together with their variable names, whereas saveRDS saves the data without a name, so it must be assigned to a variable when read back.
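A minimal sketch of the difference (file names are placeholders):
x <- data.frame(a = 1:3, b = letters[1:3])
# save() stores the object together with its name;
# load() restores it into the workspace as 'x'
save(x, file = "x.RData")
rm(x)
load("x.RData")        # 'x' is back
# saveRDS() stores the object without its name;
# you must assign the result of readRDS() yourself
saveRDS(x, "x.rds")
y <- readRDS("x.rds")  # same data, any name you like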
Background
I'm doing some data manipulation (joins, etc.) on a very large dataset in R, so I decided to use a local installation of Apache Spark and sparklyr to be able to use my dplyr code to manipulate it all. (I'm running Windows 10 Pro; R is 64-bit.) I've done the work needed, and now want to output the sparklyr table to a .csv file.
The Problem
Here's the code I'm using to output a .csv file to a folder on my hard drive:
spark_write_csv(d1, "C:/d1.csv")
When I navigate to the directory in question, though, I don't see a single .csv file named d1.csv. Instead I see a newly created folder called d1, and when I click inside it I see ~10 .csv files, all beginning with "part".
The folder also contains the same number of .csv.crc files, which I see from Googling are "used to store CRC code for a split file archive".
What's going on here? Is there a way to put these files back together, or to get spark_write_csv to output a single file like write.csv?
Edit
A user below suggested that this post may answer the question, and it nearly does, but it seems like the asker is looking for Scala code that does what I want, while I'm looking for R code that does what I want.
I had the exact same issue.
In simple terms, the partitions exist for computational efficiency. If you have multiple partitions, multiple workers/executors can write the table in parallel, one per partition. In contrast, if you have only one partition, the file can be written by only a single worker/executor, making the task much slower. The same principle applies not only to writing tables but to parallel computation in general.
For more details on partitioning, you can check this link.
Suppose I want to save the table as a single file at the path path/to/table.csv. I would do it as follows:
# assign the repartitioned table back, then write it out
table <- table %>% sdf_repartition(partitions = 1)
spark_write_csv(table, "path/to/table.csv")
You can check full details of sdf_repartition in the official documentation.
The data will be divided into multiple partitions, and when you save the dataframe to CSV you will get one file per partition. Before calling spark_write_csv, you need to bring all the data into a single partition to get a single file.
You can use Spark's coalesce operation, exposed in sparklyr as sdf_coalesce(), to achieve this:
df <- sdf_coalesce(df, 1)
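For context, a rough sketch of how this fits into the full write step (the connection and table names are placeholders; mtcars stands in for your large table). Note that Spark still writes a folder at the given path; after coalescing to one partition it contains a single part-*.csv, which you can rename if you need a bare file:
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
d1 <- sdf_copy_to(sc, mtcars, "d1")   # stand-in for your large table
# merge all partitions into one, then write a single part file
d1 %>%
  sdf_coalesce(1) %>%
  spark_write_csv("C:/d1")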
I have a few R files that contain functions imported and used by several other R files. I import these functions with the source function. Naturally, the scope of a particular file might change over time, and recently I wanted to rename a file I had already sourced in many other places.
I'm using RStudio, and I have been unable to find a way to do this except for either manually updating each dependent file, or creating some external code to scan through the files.
Is there no way to do consistent renaming in RStudio? Alternatively, am I doing something wrong by using source to add functions?
You may or may not find this satisfactory. Create a parent script with the old name that sources the script with the new name.
Extending this, you could just create a general preamble script, called something like "preamble.R", that sources all general utility scripts you have. Such an approach is common (I believe) with TeX. Then you only have one place to update file names.
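A sketch of both ideas (all file names here are made up for illustration):
# old_name.R -- kept only as a forwarding stub after the rename
source("new_name.R")
# preamble.R -- one place that sources every utility script
source("utils/io_helpers.R")
source("utils/plot_helpers.R")
# any dependent file then needs just one line:
source("preamble.R")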
Situation
I wrote an R program which I split up into multiple R-files for the sake of keeping a good code structure.
There is a Main.R file which references all the other R-files with the 'source()' command, like this:
source(paste(getwd(), dirname1, 'otherfile1.R', sep="/"))
source(paste(getwd(), dirname3, 'otherfile2.R', sep="/"))
...
As you can see, the working directory needs to be set correctly in advance, otherwise, this could go wrong.
Now, if I want to share this R program with someone else, I have to pass all the R files and folders in relative order of each other for things to work. Hence my next question.
Question
Is there a way to replace all the 'source' commands with the actual R script code which it refers to? That way, I have a SINGLE R script file, which I can simply pass along without having to worry about setting the working directory.
I'm not looking for a solution which is an 'R package' (which, by the way, is one single directory, so I would lose my own directory structure). I am simply wondering if there is an easy way to combine these self-referencing R files into one single file.
Thanks,
OK, I think you could do something like scanning all the files and then writing them out again into one new file. This can be done using readLines and sink:
sink("mynewRfile.R")
for(i in Nfiles){
current_file = readLines(filedir[i])
cat("\n\n#### Current file:",filedir[i],"\n\n")
cat(current_file, sep ="\n")
}
sink()
Here I have assumed all your file paths are stored in a vector filedir of length Nfiles; I guess you can adapt that.
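For instance, building on the paths from the question, the setup could look like this (the directory and file names are the question's own placeholders):
filedir <- c(file.path(getwd(), "dirname1", "otherfile1.R"),
             file.path(getwd(), "dirname3", "otherfile2.R"))
Nfiles  <- length(filedir)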
Given a function, how to save it to an R script (.R)?
save works well with data, but apparently cannot create .R files.
Copy pasting from the console to a new script file appears to introduce characters that cause errors.
Take a look at the dump function. It writes files containing R code that can be read back in with source or used in some other way:
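A minimal sketch (the file name is a placeholder):
f <- function(x) x^2
dump("f", file = "f.R")  # writes the R code that defines 'f'
rm(f)
source("f.R")            # 'f' exists again
f(3)                     # 9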
I have to ask: why are you writing your functions in the console in the first place? Any number of editors support a "source" call, so you can update the function as you edit. Copy/pasting from the console will carry prompt characters along, if nothing else, so it's a bad idea to begin with.