How to import an external dataset into a Moodle question? - r-exams

I would like to import an external dataset using read.table() (or any other function for reading files) and then randomize or sample over it. The file is stored in a subfolder of the parent folder that contains the exercise *.Rmd files. I am working within an RStudio project. I tried placing the dataset at different levels of the folder structure: relative paths did not work, but absolute paths did.
My folder structure is:
$home/project_name/exercises # It contains the RMD files
$home/project_name/exercises/data # It contains data files that I want to process
$home/project_name/datasets # this folder could eventually contain the dataset I want to process
To make this code more portable, I would like to know how to manage relative paths within the *.Rmd files during the knitting process.

The exercises are copied to a temporary directory and processed there. Hence, the easiest option is to copy these files to the temporary directory using include_supplement("file.csv"). By default this assumes that file.csv is in the same directory as the exercise itself. If it is in a subdirectory, you can use include_supplement("file.csv", recursive = TRUE), and then subdirectories are searched recursively for file.csv.
After using include_supplement(), the copied file is available locally and can be processed with read.table() or also included in the exercise as a supplementary file. See http://www.R-exams.org/templates/Rlogo/ for a worked example. However, note that the Rlogo template explicitly specifies the directory from which the file should be copied. This is not necessary if that directory is the same as the exercise or a subdirectory.
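For illustration, a minimal sketch of the data chunk of such an exercise, assuming the data file is called file.csv and sits next to the exercise (or in a subdirectory such as data/):

library("exams")
# copy file.csv from the exercise's own directory (searching
# subdirectories too) into the temporary processing directory
include_supplement("file.csv", recursive = TRUE)
# the file is now available locally
dat <- read.table("file.csv", header = TRUE, sep = ",")
# e.g., draw a random subsample to randomize the exercise
sub <- dat[sample(nrow(dat), 5), ]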

Related

Is there a way to reference files in a folder within the working directory in R?

I have already finished my R Markdown document and I'm trying to clean up the workspace a little. This isn't strictly necessary, more of an organizational practice (which I'm not even sure is good practice), so that I can keep the data separate from the scripts and other R- and git-related files.
I have a bunch of .csv data files that I used. Previously they were in (for example)
C:/Users/Documents/Project
which is what I set as my working directory. But now I want them in
C:/Users/Documents/Project/Data
The problem is that moving them breaks the following code, because the files are no longer in the working directory.
#create one big dataframe by unioning all the data
bigfile <- vroom(list.files(pattern = "*.csv"))
I've tried adding a full path to list.files() to where the csvs are but no luck.
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv"))
Error: 'data1.csv' does not exist in current working directory ('C:/Users/Documents/Project').
Is there a way to only access the /Data folder once for creating my dataframe with vroom() instead of changing the working directory multiple times?
You can list files, including those in all subdirectories (Data in particular), using list.files(pattern = "*.csv", recursive = TRUE).
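For example, a short sketch combining this with vroom(), run from the project root (C:/Users/Documents/Project in the question):

library(vroom)
# recursive = TRUE returns relative paths such as "Data/data1.csv",
# which vroom() resolves against the working directory
csvs <- list.files(pattern = "\\.csv$", recursive = TRUE)
bigfile <- vroom(csvs)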
Best practices
Have one directory of raw and only raw data (the stuff you measured).
Have another directory of external data (e.g. reference databases). This is something you can remove afterwards and redownload if required.
Have another directory for the source code.
Put only the source code directory under version control, plus one other file containing checksums of the raw and external data to prove integrity.
Everything else must be reproducible from the raw data and the source code, and can be removed after the project. You may want to keep small result files (e.g. tables) that take a long time to reproduce.
You can list the files and capture the full file path, right?
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv", full.names = TRUE))
and that should read the files in the directory without reference to your wd.
Try one of these:
# list all csv files within Data within current directory
Sys.glob("Data/*.csv")
# list all csv files within immediate subdirectories of current directory
Sys.glob("*/*.csv")
If you only have csv files then the following would also work, though they seem less desirable; they might be useful if you quickly want to review which files and directories are there. (I would be very careful not to use the second one within statements that delete files: if you are not in the directory you think you are in, you can wind up deleting files you did not intend to. The first one could too, but it is a bit safer, since it would only delete the wrong files if your current directory happens to contain a Data subdirectory.)
# list all files & directories within Data within current directory
Sys.glob("Data/*")
# list all files & directories within immediate subdirectories of current directory
Sys.glob("*/*")
If the subfolder always has the same name (or the same number of characters), you should be able to do it with substring(). In your example, "Data" has 4 characters (5 with the /), so the following strips it from the working directory path:
Repository <- substring(getwd(), 1, nchar(getwd()) - 5)
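A sturdier equivalent, for what it's worth, is dirname(), which drops the last path component regardless of the subfolder's name or length:

Repository <- dirname(getwd())  # parent of the current working directory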

How can I avoid hardcoding a file path?

I am using RStudio to knit an .Rnw file to .pdf. This .Rnw file is stored in directory that is under git version control. This directory also contains a .RProj file for the project.
I collaborate with colleagues who don't know the first thing about .Rnw files and git. These colleagues want to open a Word file and track change their hearts out. So I give the people what they want.
Everyone needs access, so storing the Word file on a cloud service like Box makes sense. In the past I created a subfolder in my repo that I shared—keeping everything within the root directory—but this time around I needed to store the file in a shared folder that someone else created. So my solution was to copy the Word file from this shared directory to my repository.
Technical Approach
I don't know how to make this a reproducible problem, but hopefully you will give me some latitude since I'm trying to make my work fully reproducible ;)
Let's say that my .Rnw file is stored in repoRoot/subfolder. Since knitr changes the working directory to subfolder where this .Rnw file is located, the first chunk sets the root.dir one level up at the project root.
<<knitr, include=FALSE>>=
library(knitr)
opts_knit$set(root.dir=normalizePath('../')) # go up 1 level
@
The next chunk copies the Word file from the shared folder to my git repo and runs the analysis file. The shared directory path is hard-coded to my machine, which is the problem I'm asking for your help to solve.
file.copy(from = '/Users/ericpgreen/Box Sync/Project/Paper/draft.docx',
          to = 'subfolder/draft.docx', # my repo
          overwrite = TRUE)
source('scripts/analysis.R') # generates objects we reference in the .docx file
After adding \begin{document}, I include a chunk where I convert the .docx file to .txt and then rename it to .Rnw.
# convert docx to txt
system("textutil -convert txt 'subfolder/draft.docx'")
# rename txt to .Rnw
file.rename('subfolder/draft.txt',
            'subfolder/draft.Rnw')
The next child chunk calls this .Rnw file that contains the text of the Word file with references to R objects included through \Sexpr{}:
<<include-draft, child='draft.Rnw', include=FALSE>>=
@
This works just fine for me. Whenever I knit the .Rnw file it grabs the latest version of the .docx file that my colleagues have edited (complete with track changes and comments) and, in a later step not shown here, returns the .pdf file to the shared folder.
Problem to Solve
This setup meets almost every need for me, except that the initial file.copy() command is hard coded to my machine. So if someone in my group clones my repo (e.g., research assistants who DO use version control), it won't run out of the box. Is there a workaround to hard coding in this type of case?
Ultimately you won’t get around hard-coding paths that are outside your control, such as paths to network shares. What you can and should avoid is hard-coding these paths in your documents.
Instead, relegate them to configuration files and/or environment variables (which, again, will be controlled by configuration files, to wit .bashrc and similar). The simplest approach is then:
network_share_path <- Sys.getenv('NETWORK_SHARE_PATH')
if (network_share_path == '') stop('no network share path configured')
file.copy(from = network_share_path, to = 'subfolder/draft.docx', overwrite = TRUE)
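For instance, each collaborator can define the variable once in a user- or project-level .Renviron file, which R reads at startup; the path below is a placeholder for each machine's own copy of the share:

NETWORK_SHARE_PATH=/path/to/Box Sync/Project/Paper/draft.docx

After restarting R, Sys.getenv('NETWORK_SHARE_PATH') returns the machine-specific path, and no path needs to be hard-coded in the document itself.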

Import multiple csv files into R from a zip folder

I know that this question has been asked exhaustively on this site, but I cannot find any question which addresses my problem.
I am trying to import multiple .csv files into R which are located in nested .zip files on my PC. The other questions seem to relate to importing a single file from a URL, which is not my issue.
I have set my working directory to the folder which contains the first .zip file, but there is another one inside of it, which then contains normal folders, and finally hundreds of .csv files which I am looking to access.
Up to now I have always manually extracted the data since I have no idea where to begin with unzipping code, but considering this folder contains around 20GB of data, I'm going to need to try something else.
Any help would be appreciated!
EDIT - CODE:
setwd("C:/docs/data/241115")
temp <- tempfile()
unzip("C:/docs/data/241115/Requested.zip",exdir=temp)
l = list.files(temp)
unzip("C:/docs/data/241115/Requested/Data Requested.zip",exdir=temp)
> error 1 in extracting from zip file
Without a minimal reproducible example it's difficult to know exactly where the problem lies. My best guess is that using a tempfile() is causing problems.
I would create a folder within your working directory to unzip the files to. You can do this from within R if you like:
# Create the folder 'temp' in your wd
dir.create("temp")
Now, assuming your zip file is in the working directory, I would unzip the first .zip into temp in one step:
unzip("Requested.zip", exdir = "temp")
Finally, unzip the final .zip:
unzip("temp/Data Requested.zip", exdir = "temp")

Specifying folder paths in R

I have a folder structure with an Assignment folder containing Data, Results, and Log subfolders. A file in the Data folder needs to be processed; after processing I'll be writing two files, one to the Results folder and one to the Log folder. (I use R in a Windows environment.)
Below is my R code for setting the folder paths for files:
setwd("E:/Assignment")
myData = read.csv("E:/Assignment/Data/accident_rates.csv")
myResultsFileLoc = "E:/Assignment/Results/accident_rates_accumilated.csv"
myLogFileLoc = "E:/Assignment/Results/accident_rates_accumilated_Log.csv"
write.table(results, myResultsFileLoc,...)
write.table(results, myLogFileLoc ,...)
As I have already set the working directory to "E:/Assignment", I don't want to repeat this part when setting the paths for files in the subfolders. Is there any way to tell R that a folder is a subfolder of the set working directory (without specifying the complete path)? I tried the methods below, but they give errors.
myResultsFileLoc = "/Results/accident_rates_accumilated.csv"
myResultsFileLoc = "~/Results/accident_rates_accumilated.csv"
I'm looking for this solution because my folder structure might expand and it would be bit difficult to maintain the folder path for each and every file in the sub folder.
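For what it's worth, relative paths in R are resolved against the working directory and must not start with "/" (root of the drive) or "~" (home directory). A sketch with the folders from the question:

setwd("E:/Assignment")
myData <- read.csv("Data/accident_rates.csv")  # note: no leading slash
myResultsFileLoc <- file.path("Results", "accident_rates_accumulated.csv")
myLogFileLoc <- file.path("Log", "accident_rates_accumulated_Log.csv")

file.path() keeps the subfolder names in one place, so the structure can grow without repeating the full path for every file.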

Unix: merging multiple file types in multiple folders into one pdf

I have a parent folder with around 30 subfolders, each of which contains .pdf, .doc, .docx, and .jpg files. I need to combine all files into one large pdf. I want the order in which the files are appended to the 'master pdf' to reflect my current folder and file order (alphabetic for the subfolder names and numeric for the files within each subfolder).
I am fairly new to Unix and am a bit stuck on this... I would be most grateful for any advice you may have on how to approach this problem. Thank you.
There are three problems here:
Traverse the directory tree to find all documents
Convert each file into PDF
Merge the PDFs
For the first part you could use the find command to get the list of files or script the directory traversal.
For the second part you could use OpenOffice/LibreOffice command line driver to convert .doc and .docx files and ghostscript to convert .jpg files.
For the third part, probably ghostscript again.
Alternatively there are good PDF APIs available for some programming languages, such as iText from Lowagie for Java.
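A rough sketch of that pipeline driven from R via system2(), assuming LibreOffice and Ghostscript are on the PATH; the parent folder name, the conversion fidelity, and the exact flags are all things to verify on your own system:

# collect everything in subfolder/file order (sort() is alphabetic)
files <- sort(list.files("parent", recursive = TRUE, full.names = TRUE,
                         pattern = "\\.(pdf|docx?|jpg)$", ignore.case = TRUE))
# convert each non-pdf file to pdf next to the original
for (f in files[!grepl("\\.pdf$", files, ignore.case = TRUE)]) {
  system2("libreoffice", c("--headless", "--convert-to", "pdf",
                           "--outdir", shQuote(dirname(f)), shQuote(f)))
}
# merge the resulting pdfs with Ghostscript, preserving the order
pdfs <- sort(list.files("parent", recursive = TRUE, full.names = TRUE,
                        pattern = "\\.pdf$", ignore.case = TRUE))
system2("gs", c("-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
                "-sOutputFile=master.pdf", shQuote(pdfs)))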
