I am writing an R script that reads in a template .R file and a list of dates, then creates a folder for each date containing a copy of the .R file in which text substitution has been performed to customize the script for that date.
I'm stuck on the part where I write out the .R file though, because the formatting and/or character representation keeps getting screwed up.
Here's a minimal, reproducible example:
RMapsDemo <- readLines("https://raw.githubusercontent.com/hack-r/RMapsDemo/master/RMapsDemo.R")
RMapsDemo <- gsub("## File: RMapsDemo.R", "## File: RMapsDemo.R ####", RMapsDemo)
save(RMapsDemo, file = "RMapsDemo.R") # Doesn't work right
save(RMapsDemo, file = "RMapsDemo.R", ascii = T) # Doesn't work right
dput(RMapsDemo, file = "RMapsDemo.R") # Close, but no cigar
dput(RMapsDemo, file = "RMapsDemo.R", control = c("keepNA", "keepInteger")) # Close, but no cigar
Ricardo Saporta pointed out the solution in the comments -- use writeLines.
I feel stupid for not thinking of this myself. It works beautifully.
writeLines(RMapsDemo, con = "RMapsDemo.R")
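For the larger task described in the question (one customized copy of the template per date), the same readLines/gsub/writeLines pattern can be wrapped in a loop. This is only a minimal sketch: the template text, the "{{DATE}}" placeholder, the dates, and the output folder are all made up for illustration and are not from the original script.

```r
# Sketch: write one customised copy of a template script per date.
# `template`, `dates`, the "{{DATE}}" placeholder and `out_root` are
# illustrative assumptions, not names from the original script.
template <- c("## File: report.R",
              'run_date <- as.Date("{{DATE}}")',
              "print(run_date)")
dates <- as.Date(c("2016-05-20", "2016-05-21"))
out_root <- file.path(tempdir(), "by_date")

written <- vapply(format(dates), function(d) {
  # One folder per date, created on demand
  dir_d <- file.path(out_root, d)
  dir.create(dir_d, recursive = TRUE, showWarnings = FALSE)
  # fixed = TRUE treats the placeholder as literal text, not a regex
  script <- gsub("{{DATE}}", d, template, fixed = TRUE)
  out_file <- file.path(dir_d, "report.R")
  # writeLines keeps the script as plain text, one element per line
  writeLines(script, con = out_file)
  out_file
}, character(1))
```

Here `written` holds the paths of the generated scripts, so they can be checked or sourced afterwards.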
I am trying to create one object per file in my working directory, each named after the original file. I tried the following approach, but couldn't solve the problems that came up.
# - SETTING WD
getwd()
setwd("PATH TO THE FILE")
library(readxl)
# - CREATING OBJECTS
file_objects <- list.files()
xlsx_objects <- unlist(grep(".xlsx",file_objects,value = T))
for (i in xlsx_objects) {
xlsx_objects[i] <- read_xlsx(xlsx_objects[i], header = T)
}
I tried pasting each item [i] of "xlsx_objects" onto the path to the WD, but that only created a list of the file names in the WD.
I also found information that read.csv can read only one file at a time, but I guess that should be fine with a for loop, right? It reads only one file at a time.
Using lapply (as described in this forum) I was able to get the data into the environment, but the header argument didn't work, and I lost the names of my documents in the resulting object, which does not have the desired structure. What I am looking for is to have these files in separate objects without calling every document individually.
IIUC, you could do something like:
files = list.files("PATH TO THE FILE", full.names = TRUE, pattern = "\\.xlsx$")
list_files = purrr::map(files, readxl::read_excel)
(You can't use read.csv to read excel files)
Also I recommend reading about R Projects so you don't have to use setwd() ever again, which makes your code harder to reproduce down the pipeline
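If you also want to keep the file names, one option is to name the list elements after the files. The sketch below uses a hypothetical helper, read_all(), which is not part of any package; its reader argument defaults to readxl::read_excel (so readxl must be installed for that default), but any function that takes a file path can be passed instead.

```r
# Sketch: read every matching file into a list named after the files.
# `read_all()` is a hypothetical helper, not part of any package.
read_all <- function(dir, pattern = "\\.xlsx$", reader = readxl::read_excel) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  out <- lapply(files, reader)
  # Name each element after its file, without directory or extension
  names(out) <- tools::file_path_sans_ext(basename(files))
  out
}

# Optional: turn each list element into its own object
# list2env(read_all("PATH TO THE FILE"), envir = globalenv())
```

Keeping everything in one named list is usually easier to work with than many separate objects, but list2env() covers the case where separate objects are really wanted.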
I have a lot of different scripts in R that source one another with source(). I'm looking for a way to create an overview diagram that links the scripts visually, so I can easily see the "source hierarchy" of my code.
The result could look something like:
I hope there is a solution that doesn't require a software license.
Hope it makes sense! :)
I can suggest you use Knime. It has the kind of diagram you are looking for. It comes with some scripts already written to clean and visualize data and write output, and it integrates with R and Python.
https://docs.knime.com/?category=integrations&release=2019-12
https://www.knime.com/
Good luck.
For purposes of example change directory to an empty directory and run the code in the Note at the end to create some sample .R files.
In the first two lines of the code below we set the files variable to be a character vector containing the paths to the R files of interest. We also set st to the path to the main source file. Here it is a.R but it can be changed appropriately.
The code first inserts the line contained in variable insert at the beginning of each such file.
Then it instruments source using the trace command shown so that each time source is run a log record is produced. We then source the top level R file.
Finally we read in the log and use the igraph package to produce a tree of source files. (Any other package that can produce suitable graphics could be used instead.)
# change the next two lines of code appropriately.
# Settings shown are for the files generated in the Note at the end
# assuming they are in the current directory and no other R files are.
files <- Sys.glob("*.R")
st <- "a.R"
# inserts indicated line at top of each file unless already inserted
insert <- "this.file <- normalizePath(sys.frames()[[1]]$ofile)"
for (f in files) {
  inp <- readLines(f)
  ok <- !any(grepl(insert, inp, fixed = TRUE)) # TRUE if insert not in f
  if (ok) writeLines(c(insert, inp), f)
}
# instrument source and run to produce log file
if (file.exists("log")) file.remove("log")
this.file <- "root"
trace(source, quote(cat("parent:", basename(this.file),
"file:", file, "\n", file = "log", append = TRUE)))
source(st) # assuming a.R is the top level program
untrace(source)
# read log and display graph
DF <- read.table("log")[c(2, 4)]
library(igraph)
g <- graph_from_data_frame(DF) # current name for the older graph.data.frame()
plot(g, layout = layout_as_tree(g))
For example, if we have the files generated in the Note at the end then the code above generates this diagram:
Note
cat('
source("b.R")
source("c.R")
', file = "a.R")
cat("\n", file = "b.R")
cat("\n", file = "c.R")
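If modifying and running the scripts is not desirable, a different (static) technique is to scan the files for literal source("...") calls and build the same parent/file table from that. This is only a rough sketch: source_edges() is a hypothetical helper, it finds at most one call per line, it cannot see file names that are built at run time, and it will also pick up calls that are commented out.

```r
# Sketch: build a parent -> file edge list by scanning for source("...") calls.
# `source_edges()` is a hypothetical helper; it only sees literal file names.
source_edges <- function(files) {
  pat <- "source\\([[:space:]]*[\"']([^\"']+)[\"']"
  edges <- lapply(files, function(f) {
    txt <- readLines(f, warn = FALSE)
    # regexec() returns the full match plus the captured file name per line
    hits <- regmatches(txt, regexec(pat, txt))
    # Keep only lines that matched and take the captured group
    child <- vapply(Filter(length, hits), `[`, character(1), 2)
    data.frame(parent = rep(basename(f), length(child)),
               file = basename(child),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, edges)
}
```

The resulting data frame has the same two-column shape as DF above, so it should be usable with the same igraph plotting calls.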
I am reading ISL at the moment which is related to machine learning in R
I really like how the book is laid out specifically where the authors reference code inline or libraries for example library(MASS).
Does anyone know if the same effect can be achieved using R Markdown, i.e. making the MASS keyword above brown when I reference it in a paper? I want to color code columns in data frames when I talk about them in the R Markdown document. When you knit it to an HTML document it provides pretty good formatting, but when I knit it to MS Word it seems to just change the font type.
Thanks
I've come up with a solution that I think might address your issue. Essentially, because inline source code gets the same style label as code chunks, any change you make to SourceCode will be applied to both chunks, which I don't think is what you want. Instead, there needs to be a way to target just the inline code, which doesn't seem to be possible from within rmarkdown. Instead, what I've opted to do is take the .docx file that is produced, convert it to a .zip file, and then modify the .xml file inside that has all the data. It applies a new style to the inline source code text, which can then be modified in your MS Word template. Here is the code:
format_inline_code = function(fpath) {
if (!tools::file_ext(fpath) == "docx") stop("File must be a .docx file...")
cur_dir = getwd()
.dir = dirname(fpath)
setwd(.dir)
out = gsub("docx$", "zip", fpath)
# Convert to zip file
file.rename(fpath, out)
# Extract files
unzip(out, exdir=".")
# Read in document.xml
xml = readr::read_lines("word/document.xml")
# Replace styling
# VerbatimChar didn't appear to be the style that was applied in Word, nor was
# it available to be styled. VerbatimStringTok was, though.
xml = sapply(xml, function(line) gsub("VerbatimChar", "VerbatimStringTok", line))
# Save document.xml
readr::write_lines(xml, "word/document.xml")
# Zip files
.files = c("_rels", "docProps", "word", "[Content_Types].xml")
zip(zipfile=out, files=.files)
# Convert to docx
file.rename(out, fpath)
# Remove the folders extracted from zip
sapply(.files, unlink, recursive=TRUE)
setwd(cur_dir)
}
The style that you'll want to modify in your MS Word template is VerbatimStringTok. Hope that helps!
I'm writing a loop script which involves reading a file from a workbook (using the package XLConnect). The challenge is that the file names contain characters (representing time) that I want to ignore.
For example, here are 3 paths to those files:
G://User//Documents//daily_data//Op_Schedule_20160520_132025.xlsx
G://User//Documents//daily_data//Op_Schedule_20160521_142805.xlsx
G://User//Documents//daily_data//Op_Schedule_20160522_103052.xlsx
I need to import hundreds of those files. I can easily account for the character string representing the date (e.g. 20160522), but not the time.
Is there a way to tell R to ignore some characters located in the file path? Here is how I was thinking of writing my script (the "???" is where I need help). I know a loop is probably not the most efficient way, but I'm open to suggestions, should you have any:
require(XLConnect)
path= "G://User//Documents//daily_data//Op_Schedule_"
wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")
scheduleList = rep(list(matrix(1,1,1)),length(wd.seq))
for(i in 1:length(wd.seq)) {
wb = loadWorkbook(file= paste0(path,wd.seq[i],"???",".xlsx"))
scheduleList[[i]] = readWorksheet(wb,sheet='=SCHEDULE', header = TRUE)
}
Thanks for reading and suggestions, if any.
Mathieu
I don't know if this is helpful, but if you want to read all the files in a certain directory (which it seems to me is what you're after), you can read all the filenames into a list using the list.files() function, for example
fileList <- list.files("G://User//Documents//daily_data//")
And then load the xlsx files looping through the list with a for loop
for(i in fileList) {
loadWorkbook(file = i)
}
I haven't used the XLConnect package before, so that exact code probably doesn't work, but the loop will iterate through all the files in that directory, so you can construct your loading call using the i variable for the file name. (It won't be an absolute path, so you might need to use paste to add the first part of the file path, or pass full.names = TRUE to list.files.)
I realize there might be other files in the directory that are not Excel files. You could use grepl to select only files containing "Op_Schedule_":
fileListClean <- fileList[grepl("Op_Schedule_",fileList)]
or perhaps only selecting .xlsx files in the directory:
fileListClean <- fileList[grepl("\\.xlsx$", fileList)]
Edit to fit your reply:
Since you need to fit it to a sequence, you can do it as you did earlier:
wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")
wd.seq2 <- paste("Op_Schedule_", wd.seq, sep = "")
And then use grepl to pick only the files whose names contain one of those prefixes:
fileListClean <- fileList[grepl(paste(wd.seq2, collapse = "|"), fileList)]
Full disclosure: I got the last part from this SO answer: grep using a character vector with multiple patterns
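Another way to ignore the time part is to pull the date out of each file name with a regular expression and filter on real dates. The sketch below is just an illustration: fileList is a made-up stand-in for the output of list.files(), and the pattern assumes names of the form Op_Schedule_YYYYMMDD_HHMMSS.xlsx as in the question.

```r
# Sketch: extract the date from each file name and ignore the time stamp.
# `fileList` here is a made-up stand-in for the output of list.files().
fileList <- c("Op_Schedule_20160520_132025.xlsx",
              "Op_Schedule_20160521_142805.xlsx",
              "Op_Schedule_20131231_090000.xlsx",
              "notes.txt")

# 8 digits for the date (captured), 6 digits for the time (ignored)
pat <- "^Op_Schedule_([0-9]{8})_[0-9]{6}\\.xlsx$"
is_schedule <- grepl(pat, fileList)
file_date <- as.Date(sub(pat, "\\1", fileList[is_schedule]), format = "%Y%m%d")

# Keep only the schedule files inside the date range from the question
in_range <- file_date >= as.Date("2014-01-01") &
  file_date <= as.Date("2016-12-31")
wanted <- fileList[is_schedule][in_range]
```

With these sample names, `wanted` keeps the two 2016 files and drops the 2013 file and notes.txt.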
I have some data that I am trying to load into R. It is in .csv files, and I can view the data in both Excel and OpenOffice. (If you are curious, it is the 2011 poll results data from Elections Canada, available here.)
The data is coded in an unusual manner. A typical line is:
12002,Central Nova","Nova-Centre"," 1","River John",N,N,"",1,299,"Chisholm","","Matthew","Green Party","Parti Vert",N,N,11
There is a " at the end of Central Nova but not at the beginning. So, in order to read in the data, I suppressed the quotes, which worked fine for the first few files, i.e.:
test<-read.csv("pollresults_resultatsbureau11001.csv",header = TRUE,sep=",",fileEncoding="latin1",as.is=TRUE,quote="")
Now here is the problem: in another file (eg. pollresults_resultatsbureau12002.csv), there is a line of data like this:
12002,Central Nova","Nova-Centre"," 6-1","Pictou, Subd. A",N,N,"",0,168,"Parker","","David K.","NDP-New Democratic Party","NPD-Nouveau Parti democratique",N,N,28
Because I need to suppress the quotes, the entry "Pictou, Subd. A" makes R want to split this into 2 variables. The data can't be read in, since R wants to add a column halfway through constructing the data frame.
Excel and OpenOffice both can open these files no problem. Somehow, Excel and OpenOffice know that quotation marks only matter if they are at the beginning of a variable entry.
Do you know what option I need to enable on R to get this data in? I have >300 files that I need to load (each with ~1000 rows each) so a manual fix is not an option...
I have looked all over the place for a solution but can't find one.
Building on my comments, here is a solution that would read all the CSV files into a single list.
# Deal with French properly
options(encoding="latin1")
# Set your working directory to where you have
# unzipped all of your 308 CSV files
setwd("path/to/unzipped/files")
# Get the file names
temp <- list.files()
# Extract the 5-digit code which we can use as names
Codes <- gsub("pollresults_resultatsbureau|\\.csv$", "", temp)
# Read all the files into a single list named "pollResults"
pollResults <- lapply(seq_along(temp), function(x) {
T0 <- readLines(temp[x])
T0[-1] <- gsub('^(.{6})(.*)$', '\\1\\"\\2', T0[-1])
final <- read.csv(text = T0, header = TRUE)
final
})
names(pollResults) <- Codes
You can easily work with this list in different ways. If you wanted to just see the 90th data.frame you can access it by using pollResults[[90]] or by using pollResults[["24058"]] (in other words, either by index number or by district number).
Having the data in this format means you can also do a lot of other convenient things. For instance, if you wanted to fix all 308 of the CSVs in one go, you can use the following code, which will create new CSVs with the file name prefixed with "Corrected_".
invisible(lapply(seq_along(pollResults), function(x) {
NewFilename <- paste("Corrected", temp[x], sep = "_")
write.csv(pollResults[[x]], file = NewFilename,
quote = TRUE, row.names = FALSE)
}))
Hope this helps!
This answer is mainly to @AnandaMahto (see comments to the original question).
First, it helps to set some options globally because of the french accents in the data:
options(encoding="latin1")
Next, read in the data verbatim using readLines():
temp <- readLines("pollresults_resultatsbureau13001.csv")
Following this, simply replace the first comma in each line of data with a comma+quotation. This works because the first field is always 5 characters long. Note that it leaves the header untouched.
temp[-1] <- gsub('^(.{6})(.*)$', '\\1\\"\\2', temp[-1])
Penultimately, write over the original file.
fileConn<-file("pollresults_resultatsbureau13001.csv")
writeLines(temp,fileConn)
close(fileConn)
Finally, simply read the data back into R:
data<-read.csv(file="pollresults_resultatsbureau13001.csv",header = TRUE,sep=",")
There is probably a more parsimonious way to do this (and one that can be iterated more easily) but this process made sense to me.
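As a quick check of the quote repair, here is a small in-memory sketch that applies the same idea to the two sample rows from the question, without overwriting any file. It uses sub() with a slightly shorter pattern than the gsub() call above, and header = FALSE because the sample has no header line; the default column names V1, V2, ... are therefore just what read.csv assigns.

```r
# Sketch: test the quote repair on the two sample rows from the question,
# entirely in memory (no file is overwritten).
raw <- c(
  '12002,Central Nova","Nova-Centre"," 1","River John",N,N,"",1,299,"Chisholm","","Matthew","Green Party","Parti Vert",N,N,11',
  '12002,Central Nova","Nova-Centre"," 6-1","Pictou, Subd. A",N,N,"",0,168,"Parker","","David K.","NDP-New Democratic Party","NPD-Nouveau Parti democratique",N,N,28'
)

# Insert the missing opening quote after the first 6 characters ("12002,")
fixed <- sub("^(.{6})", "\\1\"", raw)

# With balanced quotes, the comma inside "Pictou, Subd. A" no longer splits the field
polls <- read.csv(text = fixed, header = FALSE, stringsAsFactors = FALSE)
```

Both rows then parse into the same number of columns, and the "Pictou, Subd. A" entry stays in a single field.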