Let's say that we assign some variables:
variable_name <- runif(100)
letter <- "a"
listed <- list("a", "b", "C")
I would love to have a function assignment_code(object) that outputs the assignment code of these objects, i.e.
>assignment_code(variable_name)
"variable_name <- runif(100)"
>assignment_code(letter)
"letter <- "a""
>assignment_code(listed)
"listed <- list("a", "b", "C")"
I tried to do it but wasn't sure how it can be done. I tried some magic with ls(), but I wasn't sure of a proper algorithm for picking elements from ls(). Do you know how it can be done?
In general, no: it is not possible to find out from an object how it was created. For example, the assignments x <- 1 and x <- 3-2 would leave x looking the same, with no clue as to which was used to create it.
Some possible solutions though are:
Accessing the history of R to see how variables were created, by using the up arrow in Rgui or the 'History' pane in RStudio. The history can also be saved to a file called .Rhistory.
Saving all your code as R scripts, so that you have a record of how each variable was created.
Saving your work as R Markdown (well integrated with RStudio), where the code and resulting output can be combined.
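To illustrate the point about x <- 1 and x <- 3-2, and one possible workaround: the sketch below first shows that the two results are indistinguishable, then records the code at assignment time. The helper assign_logged() and the assignment_log environment are hypothetical names made up for this example, and assignment_code() here takes the variable name as a string. It only works for variables created through the helper, not for ordinary <- assignments.

```r
# The two assignments from the answer leave identical objects behind
x1 <- 1
x2 <- 3 - 2
identical(x1, x2)  # TRUE

# Hypothetical registry that stores the code used for each assignment
assignment_log <- new.env()

# Hypothetical helper: assigns the value and remembers the expression
assign_logged <- function(name, expr, envir = parent.frame()) {
  expr_code <- substitute(expr)                      # capture the unevaluated code
  assign(name, eval(expr_code, envir), envir = envir)
  code_text <- paste(deparse(expr_code), collapse = " ")
  assign(name, paste(name, "<-", code_text), envir = assignment_log)
  invisible(NULL)
}

# Look up the recorded code by variable name
assignment_code <- function(name) get(name, envir = assignment_log)

assign_logged("letter", "a")
assign_logged("listed", list("a", "b", "C"))
cat(assignment_code("letter"), "\n")
cat(assignment_code("listed"), "\n")
```

This is a design trade-off: the record is exact, but only because the code is captured before it is evaluated.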
I have list data for which I used split:
x <- split(A, f = A$Col_1)
It works beautifully. But now I need to write each chunk of the split to an individual .csv. There are 2100 chunks of 140 rows each. Let's call them "1:2100". I would like to create something that wrote "1" to "~/full_path_name/A1.csv" then go to "2" and write to "~/full_path_name/A2.csv", then "3" to "~/full_path_name/A3.csv", etc.
I included "~/full_path_name/" because down the road this path name will change for other data using the same code, and for my own understanding I need to see it in the code. I don't know how to write a small sample of what I am asking for for someone to correct because I don't know how to write it at all.
Can someone make a suggestion on how to do this? Thank you.
I have only been coding for a month and am entirely self-taught. I do not have a background in other programming languages. I have no one to ask for help but here. I struggle with the terminology, so please understand if I am not asking in the proper way and I will try to correct it if need be.
EDIT, AFTER DOING SOME FURTHER RESEARCH --
This is what I have found elsewhere on SO from #RichPaloo, and my adaptations below that:
library(readr) # provides write_csv()
#example data.frame
df <- data.frame(x = 1:4, y = c("a", "a", "b", "b"))
#split into a list by the y column
l <- split(df, df$y)
#the names of the list are the unique values of the y column
nam <- names(l)
#iterate over the list and the vector of list names and write csvs
for(i in seq_along(l)) {
  write_csv(l[[i]], paste0(nam[i], ".csv"))
}
This is my version:
bcc4.5_WINTER <- split(bcc4.5_FinalWinterRO, f = bcc4.5_FinalWinterRO$HUC8)
nam <- names(bcc4.5_WINTER)
for(i in 1:length(bcc4.5_WINTER)) {
write_csv(bcc4.5_WINTER[[i]], paste0(“~/Rprojects/BCC_CSM1_1_RCP_45/Winter/”, nam[i], “.csv”))
}
I appear to have a problem with the folder within my home folder "/BCC_CSM1_1_RCP_45/Winter/” It says "unexpected token" at both ends, but not at the "~Rprojects". Can I not send something to a folder within my home folder?
It also shows redlines under the quotes around ".csv" near the end. I don't know what to make of this because it's exactly what the person used successfully, apparently, in another post. Thank you.
So, the code example above (#Paul) worked except the df[l] was not being iterated, so I removed the _i from each l instance. The final problem I had (in comments above) was because the path name was not complete.
I used fwrite() rather than write.csv because it gave me better feedback as I struggled with mistakes. This gave me what I needed:
library(data.table) # provides fwrite()
#split the data frame into chunks by the values in a column, in this case column "BBB"
df <- split(old_df, f = old_df$BBB)
#write those chunks to individual .csv files with the name being the name of each chunk
save_fun <- function(df, name_i) {
  fwrite(df, file = paste0("~/Desktop/projects_folder/", name_i, ".csv"))
}
#save the file on your computer
mapply(FUN = save_fun, df, name_i = names(df), SIMPLIFY = FALSE)
Much thanks to Paul.
Investigating the potential typo problem
Please see the two lines below:
write.csv(l[[1]], file = paste0("./a_folder/", names(l)[1], ".csv"))
write.csv(l[[1]], file = paste0(“./a_folder/”, names(l)[1], “.csv”))
Line 1 will save the file. Note that "./a_folder/" and ".csv" use straight quotes and are recognized as text.
In line 2, “./a_folder/” and “.csv” use curly quotes and are not recognized as text. Line 2 produces an error: unexpected input in " write.csv(l[[1]], file = paste0(“"
RStudio colors your code to help you spot this problem.
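If the curly quotes came from copying code out of a word processor or a web page, one way to see and repair the difference is to treat the pasted code as a string. This is only a sketch: the variable names pasted and straight are made up for the example, and \u201C and \u201D are the Unicode escapes for the left and right curly double quotes.

```r
# Code as it might arrive after copy-pasting, with curly quotes
pasted <- "paste0(\u201C./a_folder/\u201D, \"a\", \u201C.csv\u201D)"

# Replace both curly quote characters with a plain straight quote
straight <- gsub("\u201C", "\"", pasted, fixed = TRUE)
straight <- gsub("\u201D", "\"", straight, fixed = TRUE)

# The repaired text now parses and evaluates like normal R code
eval(parse(text = straight))  # "./a_folder/a.csv"
```

In practice it is usually simpler to retype the quotes in the editor, but the same replacement can be done with find-and-replace.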
Thoughts about not using a for loop.
I think a better way to go (especially when you have a large dataset) is to use lapply or mapply. These functions take each "chunk" of a list and apply a function to it.
lapply loses the name of each chunk while processing it, which can be annoying when you want to use the name of the chunk to name the file on your computer. mapply() comes in handy to deal with this situation.
Here is an example using the provided example.
# example data.frame
df <- data.frame(x = 1:4, y = c("a", "a", "b", "b"))
# split df
l <- split(df, df$y)
# save each "chunk" of l as a .csv file on a hard drive
# 1st, create a function that takes a "chunk" of your list and its name as inputs
save_fun <- function(l_i, name_i) {
  print(l_i) # print the output in console
  write.csv(l_i, file = paste0("./a_folder/", name_i, ".csv")) # save the file on your computer
}
# 2nd, use mapply (and not a list) to use the previous function on each pair chunk/name
mapply(FUN = save_fun, l_i = l, name_i = names(l), SIMPLIFY = FALSE) # see ?mapply for how to use mapply()
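Another option, sketched below, is to iterate over the names of the list rather than over the chunks, so the name is always available inside the function. To keep the example self-contained it writes into a folder under tempdir(); out_dir and written are names chosen for this illustration, and you would swap in your own path.

```r
# example data.frame, split by the y column as before
df <- data.frame(x = 1:4, y = c("a", "a", "b", "b"))
l <- split(df, df$y)

# output folder (a temporary one here; replace with your own path)
out_dir <- file.path(tempdir(), "a_folder")
dir.create(out_dir, showWarnings = FALSE)

# loop over the chunk names and look each chunk up by name
written <- vapply(names(l), function(name_i) {
  path <- file.path(out_dir, paste0(name_i, ".csv"))
  write.csv(l[[name_i]], file = path, row.names = FALSE)
  path  # return the path so we can check what was written
}, character(1))

list.files(out_dir)  # includes "a.csv" and "b.csv"
```

file.path() takes care of the separator between folder and file name, which avoids the incomplete-path problem mentioned in the question.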
I'm trying to generate multiple reports automatically with R Markdown. I have an MS Word file that I import into R with the officer library. In this MS Word file I want to substitute the word alpha with the names A, B, and C that I defined in a varNames vector. I then want to generate a report for each of the varNames in a separate MS Word file. I tried the code below:
library(officer)
library(magrittr)
library(rmarkdown)
my_doc <- read_docx('Word_file.docx')
varNames <- c("A", "B", "C")
for (i in 1:length(varNames)) {
doc2 <- body_replace_all_text(my_doc, old_value = "alpha",new_value = varNames[i], only_at_cursor=FALSE,ignore.case =FALSE);
}
doc2 <- cursor_backward(doc2)
docx_show_chunk(doc2)
my_doc2 <- print(doc2, target ="/Users/majerus/Desktop/R/auto_reporting/my_doc2.docx")
But the code only generates one report, for the name A. Could you please help me figure out what is wrong with the code? Generating the reports in .pdf or .html format would also be fine. Thanks!
Ok, so I think the best solution would be this:
# Remember to set your working directory with setwd() accordingly - all reads
# and writes will be in that dir - or specify the path to the file every time,
# if you prefer it that way.
# setwd("xyz")
library(officer)
# I think the pipes %>% are very useful, ESPECIALLY with officer, so:
library(dplyr)
# To make it a fully reproducible example, let's create a Word file
# with only "alpha" text in it.
read_docx() %>%
body_add_par("alpha") %>%
print("Word_file.docx")
# now, let's create the vector for the loop to take in
varNames <- c("A", "B", "C")
### Whole docx creation script should be inside the for loop
for (i in seq_along(varNames)) {
  # first, read the file
  read_docx("Word_file.docx") %>%
    # then, replace the text according to varNames
    body_replace_all_text(old_value = "alpha",
                          new_value = varNames[i],
                          only_at_cursor = FALSE,
                          ignore.case = FALSE) %>%
    # then, print the outputs. Output name should be generated dynamically:
    # every report (for every i) needs to have a different name.
    print(target = paste0("Output_", i, ".docx"))
}
# After running your script, your working directory should contain 4 files:
# Word_file.docx "alpha"
# Output_1.docx "A"
# Output_2.docx "B"
# Output_3.docx "C"
Your whole bit with cursor_backward() and docx_show_chunk() seems to be pointless. In my experience, it's best not to use the cursor functionality too much.
Best practice may be to mark in the template the specific places where text should be replaced (as in your example and my solution), or just to build the whole document dynamically in R (you can first load an empty template with predefined styles if you want to use custom ones).
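As a rough sketch of the second option, building each document from scratch, something like the following could work. It assumes the officer package is installed and relies on officer's default template, which provides the "heading 1" style; the report text and the out_dir folder are placeholders invented for this example.

```r
library(officer)

varNames <- c("A", "B", "C")
out_dir <- tempdir()  # placeholder output folder; use your own path

for (v in varNames) {
  # start from officer's default empty template each time
  doc <- read_docx()
  # add a heading and one paragraph that mention the current name
  doc <- body_add_par(doc, paste("Report for", v), style = "heading 1")
  doc <- body_add_par(doc, paste("This report covers", v))
  # one output file per name
  print(doc, target = file.path(out_dir, paste0("Output_", v, ".docx")))
}
```

To use custom styles, you could pass the path of your own template to read_docx() instead of calling it with no arguments.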
See below for my reprex of my issues with source, <-, <<-, environments, etc.
There are 3 files: testrun.R, which calls inputs.R and CODE.R.
# testrun.R (file 1)
today <<- "abcdef"
source("inputs.R")
for (DC in c("a", "b")) {
usedlater_3 <- paste("X", DC, used_later2)
print(usedlater_3)
source("CODE.R", local = TRUE)
}
final_output <- paste(OD_output, used_later2, usedlater_3)
print(final_output)
# #---- file 2
# # inputs.R
# used_later1 <- paste(today, "_later")
# used_later2 <- "l2"
#
# #---- file 3
# # CODE.R
# OD_output <- paste(DC, today, used_later1, usedlater_2, usedlater_3)
I'm afraid I didn't learn R or CS in a proper way so I'm trying to catch up now. Any bigger picture lessons would be helpful. Previously, I've been relying on a global environment where I keep everything (and save/keep between sessions), but now I'm trying to make everything reproducible, so I'm using RStudio to run local jobs that start from scratch.
I've been trying different combinations of <-, <<-, and source(local = TRUE) (instead of local = FALSE). I do use functions for pieces of code where I know the inputs I need and outputs I want, but as you can see, CODE.R uses variables from testrun.R, from the loop inside testrun.R, and from inputs.R. Converting some of the code into functions might help, but I'd like to know of alternatives as well given this case.
Finally you can see my own troubleshooting log to see my thought process:
First run: variable today wasn't found, so I made it a double-arrow assignment, today <<- "abcdef".
Second run: DC not found, so I will switch to local = TRUE.
Third run: but now usedlater_2 not found, so I will change usedlater_2 to <<-. (What about usedlater_1? Why didn't this show up as an error? We'll see...)
Result of third run: usedlater_2 still not found when CODE.R needs it. Out of ideas. Note: used_later2 was found to create used_later3 in the for loop in testrun.R.
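Two things may be worth checking here. First, inputs.R defines used_later2 while CODE.R refers to usedlater_2, so that name would not be found no matter which environment is used. Second, source() evaluates the file in the global environment by default, while source(local = TRUE) evaluates it in the environment it is called from. The sketch below shows the second point with a temporary file standing in for CODE.R; code_file, run_one, and result are made-up names for this illustration.

```r
# A stand-in for CODE.R, written to a temporary file
code_file <- tempfile(fileext = ".R")
writeLines("result <- paste(DC, today)", code_file)

today <- "abcdef"  # ordinary <- at the top level is enough here

# Wrapping the sourced code in a function makes its inputs explicit
run_one <- function(DC) {
  # local = TRUE: the file is evaluated inside this function call,
  # so it can see DC, and `result` does not leak into the workspace
  source(code_file, local = TRUE)
  result
}

outputs <- vapply(c("a", "b"), run_one, character(1))
print(outputs)
```

Here today is found because the function was defined at the top level, where today lives; <<- is not needed for that.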
Is there a way to identify all the workspace objects created, modified or referenced in a sourced script? I have hundreds of randomly-named objects in my workspace and am 'cleaning house'. I would like to be able to be more proactive about this in the future and just rm() the heck out of the sourced script at the end.
The simplest way is to store the names of your environment objects before sourcing, source the script, then compare the new environment objects with the old list.
Here is some pseudo-code.
old_objects <- ls()
source(file)
new_objects <- setdiff(ls(), c(old_objects, "old_objects"))
This will identify the created objects. To identify whether an object was modified, I don't see another solution than to store all your objects in a list beforehand and then run identical() on each of them afterwards.
# rm(list = ls(all = TRUE))
a <- 1
b <- 1
old_obj_names <- ls()
old_objects <- lapply(old_obj_names, get)
names(old_objects) <- old_obj_names
# source should go here
a <- 2
c <- 3
# I add "old_obj_names" and "old_objects" in the setdiff as these
# were created after the call to ls but before the source
new_objects <- setdiff(ls(), c(old_obj_names, "old_obj_names", "old_objects"))
modified_objects <- sapply(old_obj_names, function(x) !identical(old_objects[[x]], get(x)),
USE.NAMES = TRUE)
modified_objects <- names(modified_objects[modified_objects])
new_objects is indeed "c" and modified_objects is indeed "a" in this example. Obviously, for this to work, you need to ensure that neither old_objects nor old_obj_names are in any way created or modified in the sourced file!
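A different approach, sketched here as an alternative to the comparison above, is to source the script into its own environment so that everything it creates ends up in one place. The script is simulated with a temporary file, and sandbox is a name chosen for this example. Note that this only captures ordinary <- assignments; a script that uses <<- or assign() on other environments can still change objects outside the sandbox.

```r
# Stand-in for the script you would normally source
script <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "c <- 3"), script)

# Evaluate the script in a dedicated environment instead of the workspace
sandbox <- new.env()
source(script, local = sandbox)

# Everything the script assigned with <- now lives in `sandbox`
created <- ls(sandbox)
created  # "a" "c"

# Cleaning up is a single call
rm(sandbox)
```

Objects you want to keep can be copied out first, for example with get("a", envir = sandbox).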
I have something like 700,000 files in a folder in which I need to find and replace multiple strings with different other strings (all 4-character codes). A given string may or may not be present in a file. I'm trying to use gsub but I can't figure out how to do it with regular expressions. Can someone tell me a good and efficient way to handle this task?
This is the code I've used so far. It worked well with only one y <- gsub(...) instruction but doesn't work for my purpose, obviously because only the last gsub instruction is taken into account for defining the y variable...
chm_files <- list.files(getwd(), pattern=("^[[:digit:]]*.chm$"), full.names=F)
for(chm_file in chm_files) {
x <- readLines(chm_file)
y <- gsub("AG02|AG07|AG05|AG18|AG19|AG08|AG09|AG17", "AGRL", x)
y <- gsub("SB28|SB42|SB43|SB33|SB41|SB34|SB39|SB35", "SWHT", x)
y <- gsub("WB28|WB42|WB43|WB32|WB09|WB33|WB41|WB26", "BARL", x)
y <- gsub("WW02|WW25|WW08|WW31|WW05|WW28|WW19|WW42", "WWHT", x)
cat(y, file=chm_file, sep="\n")
}
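The reason only the last gsub() seems to take effect is that every call starts again from x, so each result overwrites the previous one. Feeding each result into the next call fixes that. The sketch below does this with a loop over a named vector of patterns; replacements and replace_codes are names invented for this example, and it works on an in-memory character vector rather than on files.

```r
# Pattern (regex alternation) -> replacement code
replacements <- c(
  "AG02|AG07|AG05|AG18|AG19|AG08|AG09|AG17" = "AGRL",
  "SB28|SB42|SB43|SB33|SB41|SB34|SB39|SB35" = "SWHT",
  "WB28|WB42|WB43|WB32|WB09|WB33|WB41|WB26" = "BARL",
  "WW02|WW25|WW08|WW31|WW05|WW28|WW19|WW42" = "WWHT"
)

# Apply every replacement in turn, each one on the output of the previous
replace_codes <- function(lines, replacements) {
  for (pattern in names(replacements)) {
    lines <- gsub(pattern, replacements[[pattern]], lines)
  }
  lines
}

x <- c("AG02 SB28", "WB09 none", "WW31")
replace_codes(x, replacements)  # "AGRL SWHT" "BARL none" "WWHT"
```

Inside the original file loop, this would replace the four y <- gsub(...) lines with y <- replace_codes(x, replacements).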
I am sure there are already numerous pre-built functions for this task in various R packages, but I cooked this one up for myself and others to use or modify. Apart from the task requested above, the function, multi_replace, also prints out a tracking log with the count of all changes made across files.
Here is some example code of how it should be run
# local directory with files you want to work with
setwd("C:/Users/DW/Desktop/New folder")
# get a list of files based on a pattern of interest e.g. .html, .txt, .php
filer = list.files(pattern=".php")
# f - list of original string values you want to change
f <- c("localhost","dbtest","root","oldpassword")
# r - list of values to replace the above values with
# make sure the order of f & r matches
r <- c("newhost", "newdb", "newroot", "newpassword")
# Run the function and watch all your changes take place ;)
tracking_sheet <- multi_replace(filer, f, r)
tracking_sheet
setwd("D:/R Training Material Kathmandu/File renaming procedures")
filer = list.files(pattern="2016")
f <- c("DATA,","$")
r <- c("","")
tracking_sheet <- multi_replace(filer, f, r)
tracking_sheet
I used the above script, but the code failed to replace the $ sign in any of the files.
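A likely cause, assuming multi_replace passes the search strings to gsub() as regular expressions (its source is not shown here, so this is a guess): in a regex, $ means "end of the string", not a literal dollar sign. The sketch below shows the difference on a made-up string.

```r
x <- "DATA,2016$report"

# As a regex, "$" matches the (empty) end of the string: nothing changes
gsub("$", "", x)                 # "DATA,2016$report"

# fixed = TRUE treats the pattern as literal text
gsub("$", "", x, fixed = TRUE)   # "DATA,2016report"

# Alternatively, escape the special character in the regex
gsub("\\$", "", x)               # "DATA,2016report"
```

If the function cannot be changed to use fixed = TRUE, passing "\\$" instead of "$" in f may be enough.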