Getting example codes of R functions into knitr using helpExtract function - r

I want to get the example codes of R functions to use in knitr. There might be an easy way but tried the following code using helpExtract function which can be obtained from here (written by #AnandaMahto). With my approach I have to look whether a function has Examples or not and have to include only those functions which have Examples.
This is very inefficient and naive approach. Now I'm trying to include only those functions which have Examples. I tried the following code but it is not working as desired. How can I to extract Examples codes from an R package?
\documentclass{book}
\usepackage[T1]{fontenc}
\begin{document}
<< label=packages, echo=FALSE>>=
library(ggplot2)
library(devtools)
source_gist("https://gist.github.com/mrdwab/7586769")
library(noamtools) # install_github("noamtools", "noamross")
#
\chapter{Linear Model}
<< label = NewTest1, results="asis">>=
tryCatch(
{helpExtract(lm, section="Examples", type = "s_text");
cat(
"\\Sexpr{
knit_child(
textConnection(helpExtract(lm, section=\"Examples\", type = \"s_text\"))
, options = list(tidy = FALSE, eval = TRUE)
)
}", "\n"
)
}
, error=function(e) FALSE
)
#
\chapter{Modify properties of an element in a theme object}
<< label = NewTest2, results="asis">>=
tryCatch(
{helpExtract(add_theme , section="Examples", type = "s_text");
cat(
"\\Sexpr{
knit_child(
textConnection(helpExtract(add_theme , section=\"Examples\", type = \"s_text\"))
, options = list(tidy = FALSE, eval = TRUE)
)
}", "\n"
)
}
, error=function(e) FALSE
)
#
\end{document}

I've done some quick work modifying the function (which I've included at this Gist). The Gist also includes a sample Rnw file (I haven't had a chance to check an Rmd file yet).
The function now looks like this:
helpExtract <- function(Function, section = "Usage", type = "m_code", sectionHead = NULL) {
A <- deparse(substitute(Function))
x <- capture.output(tools:::Rd2txt(utils:::.getHelpFile(utils::help(A)),
options = list(sectionIndent = 0)))
B <- grep("^_", x) ## section start lines
x <- gsub("_\b", "", x, fixed = TRUE) ## remove "_\b"
X <- rep(FALSE, length(x)) ## Create a FALSE vector
X[B] <- 1 ## Initialize
out <- split(x, cumsum(X)) ## Create a list of sections
sectionID <- vapply(out, function(x) ## Identify where the section starts
grepl(section, x[1], fixed = TRUE), logical(1L))
if (!any(sectionID)) { ## If the section is missing...
"" ## ... just return an empty character
} else { ## Else, get that list item
out <- out[[which(sectionID)]][-c(1, 2)]
while(TRUE) { ## Remove the extra empty lines
out <- out[-length(out)] ## from the end of the file
if (out[length(out)] != "") { break }
}
switch( ## Determine the output type
type,
m_code = {
before <- "```r"
after <- "```"
c(sectionHead, before, out, after)
},
s_code = {
before <- "<<eval = FALSE>>="
after <- "#"
c(sectionHead, before, out, after)
},
m_text = {
c(sectionHead, paste(" ", out, collapse = "\n"))
},
s_text = {
before <- "\\begin{verbatim}"
after <- "\\end{verbatim}"
c(sectionHead, before, out, after)
},
stop("`type` must be either `m_code`, `s_code`, `m_text`, or `s_text`")
)
}
}
What has changed?
A new argument sectionHead has been added. This is used to be able to specify the section title in the call to the helpExtract function.
The function checks to see whether the relevant section is available in the parsed document. If it is not, it simply returns a "" (which doesn't get printed).
Example use would be:
<<echo = FALSE>>=
mySectionHeading <- "\\section{Some cool section title}"
#
\Sexpr{knit_child(textConnection(
helpExtract(cor, section = "Examples", type = "s_code",
sectionHead = mySectionHeading)),
options = list(tidy = FALSE, eval = FALSE))}
Note: Since Sexpr doesn't allow curly brackets to be used ({), we need to specify the title outside of the Sexpr step, which I have done in a hidden code chunk.

This is not a complete answer so I'm marking it as community wiki. Here are two simple lines to get the examples out of the Rd file for a named function (in this case lm). The code is much simpler than Ananda's gist in my opinion:
x <- utils:::.getHelpFile(utils::help(lm))
sapply(x[sapply(x, function(z) attr(z, "Rd_tag") == "\\examples")][[1]], `[[`, 1)
The result is a simple vector of all of the text in the Rd "examples" section, which should be easy to parse, evaluate, or include in a knitr doc.
[1] "\n"
[2] "require(graphics)\n"
[3] "\n"
[4] "## Annette Dobson (1990) \"An Introduction to Generalized Linear Models\".\n"
[5] "## Page 9: Plant Weight Data.\n"
[6] "ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)\n"
[7] "trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)\n"
[8] "group <- gl(2, 10, 20, labels = c(\"Ctl\",\"Trt\"))\n"
[9] "weight <- c(ctl, trt)\n"
[10] "lm.D9 <- lm(weight ~ group)\n"
[11] "lm.D90 <- lm(weight ~ group - 1) # omitting intercept\n"
[12] "\n"
[13] "\n"
[14] "opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))\n"
[15] "plot(lm.D9, las = 1) # Residuals, Fitted, ...\n"
[16] "par(opar)\n"
[17] "\n"
[18] "\n"
[19] "### less simple examples in \"See Also\" above\n"

Perhaps the following might be useful.
get.examples <- function(pkg=NULL) {
suppressWarnings(f <- unique(utils:::index.search(TRUE, find.package(pkg))))
out <- setNames(sapply(f, function(x) {
tf <- tempfile("Rex")
tools::Rd2ex(utils:::.getHelpFile(x), tf)
if (!file.exists(tf)) return(invisible())
readLines(tf)
}), basename(f))
out[!sapply(out, is.null)]
}
ex.base <- get.examples('base')
This returns the examples for all functions (that have documentation containing examples) within the specified vector of packages. If pkg=NULL, it returns the examples for all functions within loaded packages.
For example:
ex.base['scan']
# $scan
# [1] "### Name: scan"
# [2] "### Title: Read Data Values"
# [3] "### Aliases: scan"
# [4] "### Keywords: file connection"
# [5] ""
# [6] "### ** Examples"
# [7] ""
# [8] "cat(\"TITLE extra line\", \"2 3 5 7\", \"11 13 17\", file = \"ex.data\", sep = \"\\n\")"
# [9] "pp <- scan(\"ex.data\", skip = 1, quiet = TRUE)"
# [10] "scan(\"ex.data\", skip = 1)"
# [11] "scan(\"ex.data\", skip = 1, nlines = 1) # only 1 line after the skipped one"
# [12] "scan(\"ex.data\", what = list(\"\",\"\",\"\")) # flush is F -> read \"7\""
# [13] "scan(\"ex.data\", what = list(\"\",\"\",\"\"), flush = TRUE)"
# [14] "unlink(\"ex.data\") # tidy up"
# [15] ""
# [16] "## \"inline\" usage"
# [17] "scan(text = \"1 2 3\")"
# [18] ""
# [19] ""
# [20] ""
# [21] ""

Related

Create Recipes and passing column names dynamically

I have a function that simply creates a couple of recipe objects. The issue is that inside of the function I have to rename the columns of the data.frame/tibble passed so that I can make the recipes.
I don't want to do this for obvious reasons, the main being, that the column names will have to be what is in the data.frame itself otherwise down the line they are not going to work.
Simple example:
library(tidyverse)
data_tbl <- tibble(
visit_date = seq(
from = as.Date("2021-01-01"),
to = as.Date("2021-10-15"),
by = 7,
),
visits = rnbinom(
n = 42,
size = 100,
mu = 66
)
)
ts_auto_recipe <- function(.data, .date_col, .pred_col){
# * Tidyeval ----
date_col_var <- rlang::enquo(.date_col)
pred_col_var <- rlang::enquo(.pred_col)
# * Checks ----
if(!is.data.frame(.data)){
stop(call. = FALSE, "You must supply a data.frame/tibble.")
}
if(rlang::quo_is_missing(date_col_var)){
stop(call. = FALSE, "The (.date_col) must be supplied.")
}
if(rlang::quo_is_missing(pred_col_var)){
stop(call. = FALSE, "The (.pred_col) must be supplied.")
}
# * Data ----
data_tbl <- tibble::as_tibble(.data)
data_tbl <- data_tbl %>%
dplyr::select(
{{ date_col_var }}, {{ pred_col_var }}, dplyr::everything()
) %>%
dplyr::rename(
date_col = {{ date_col_var }}
, value_col = {{ pred_col_var }}
)
# * Recipe Objects ----
# ** Base recipe ----
rec_base_obj <- recipes::recipe(
formula = date_col ~ . # I have to do the above so I can do this, which I don't like
, data = data_tbl
)
# * Add Steps ----
# ** ts signature and normalize ----
rec_date_obj <- rec_base_obj %>%
timetk::step_timeseries_signature(date_col) %>%
recipes::step_normalize(
dplyr::contains("index.num")
, dplyr::contains("date_col_year")
)
# * Recipe List ----
rec_lst <- list(
rec_base = rec_base_obj,
rec_date = rec_date_obj
)
# * Return ----
return(rec_lst)
}
rec_objs <- ts_auto_recipe(data_tbl, visit_date, visits)
The reason I am doing this is because I cannot use dynamic names inside of the recipe function itself, so something like rlang::sym(names(data_tbl)[[1]]) will not work, nor would something like data_tbl[[1]]. I was thinking of using something like step_rename() but that would require you to know the name ahead of time and it cannot be a variable inside of the recipe step. However you can pass a variable to something like timetk::step_time_series_signature
The only other thing I could think of was to force users to use specific column name like in the Facebook Prophet R library of ds and y
I also notice I get some funky output to the terminal when I run rec_objs I get the following:
> rec_objs
$rec_base
Recipe
Inputs:
role #variables
outcome 1
predictor 1
$rec_date
Recipe
Inputs:
role #variables
outcome 1
predictor 1
Operations:
Timeseries signature features from date_col
Centering and scaling for dplyr::contains("ÿþindex.numÿþ"), dplyr::contains("ÿþdate_col...
Yet when I do:
> rec_objs[[2]]
Recipe
Inputs:
role #variables
outcome 1
predictor 1
Operations:
Timeseries signature features from date_col
Centering and scaling for dplyr::contains("index.num"), dplyr::contains("date_col_year")
It does not happen.
Thank you,
I think I have found a solution to this problem, see the following custom function:
ts_auto_recipe_b <- function(.data
, .date_col
, .pred_col
, .step_ts_sig = TRUE
, .step_ts_rm_misc = TRUE
, .step_ts_dummy = TRUE
, .step_ts_fourier = TRUE
, .step_ts_fourier_period = 1
, .K = 1
, .step_ts_yeo = TRUE
, .step_ts_nzv = TRUE) {
# * Tidyeval ----
date_col_var_expr <- rlang::enquo(.date_col)
pred_col_var_expr <- rlang::enquo(.pred_col)
step_ts_sig <- .step_ts_sig
step_ts_rm_misc <- .step_ts_rm_misc
step_ts_dummy <- .step_ts_dummy
step_ts_fourier <- .step_ts_fourier
step_ts_fourier_k <- .K
step_ts_fourier_period <- .step_ts_fourier_period
step_ts_yeo <- .step_ts_yeo
step_ts_nzv <- .step_ts_nzv
# * Checks ----
if(!is.data.frame(.data)){
stop(call. = FALSE, "You must supply a data.frame/tibble.")
}
if(rlang::quo_is_missing(date_col_var_expr)){
stop(call. = FALSE, "The (.date_col) must be supplied.")
}
if(rlang::quo_is_missing(pred_col_var_expr)){
stop(call. = FALSE, "The (.pred_col) must be supplied.")
}
# * Data ----
data_tbl <- tibble::as_tibble(.data)
data_tbl <- data_tbl %>%
dplyr::select(
{{ date_col_var_expr }}
, {{ pred_col_var_expr }}
, dplyr::everything()
)
# %>%
# dplyr::rename(
# date_col = {{ date_col_var_expr }}
# , value_col = {{ pred_col_var_expr }}
# )
# Original Col names ----
ds <- rlang::sym(names(data_tbl)[[1]])
v <- rlang::sym(names(data_tbl)[[2]])
f <- as.formula(paste(v, " ~ ."))
# * Recipe Objects ----
# ** Base recipe ----
rec_base_obj <- recipes::recipe(
formula = f
, data = data_tbl
)
# * Add Steps ----
# ** ts signature and normalize ----
if(step_ts_sig){
rec_date_obj <- rec_base_obj %>%
timetk::step_timeseries_signature(ds) %>%
recipes::step_normalize(
dplyr::contains("index.num")
, dplyr::contains("date_col_year")
)
}
# ** Step rm ----
if(step_ts_rm_misc){
rec_date_obj <- rec_date_obj %>%
recipes::step_rm(dplyr::matches("(iso$)|(xts$)|(hour)|(min)|(sec)|(am.pm)"))
}
# ** Step Dummy ----
if(step_ts_dummy){
rec_date_obj <- rec_date_obj %>%
recipes::step_dummy(recipes::all_nominal_predictors(), one_hot = TRUE)
}
# ** Step Fourier ----
if(step_ts_fourier){
rec_date_fourier_obj <- rec_date_obj %>%
timetk::step_fourier(
ds
, period = 1#step_ts_fourier_period
, K = 1#step_ts_fourier_k
)
}
# ** Step YeoJohnson ----
if(step_ts_yeo){
rec_date_fourier_obj <- rec_date_fourier_obj %>%
recipes::step_YeoJohnson(!!v, limits = c(0, 1))
}
# ** Step NZV ----
if(step_ts_nzv){
rec_date_fourier_nzv_obj <- rec_date_fourier_obj %>%
recipes::step_nzv(recipes::all_predictors())
}
# * Recipe List ----
rec_lst <- list(
rec_base = rec_base_obj,
rec_date = rec_date_obj,
rec_date_fourier = rec_date_fourier_obj,
rec_date_fourier_nzv = rec_date_fourier_nzv_obj
)
# * Return ----
return(rec_lst)
}
Then running the following:
> rec_objs <- ts_auto_recipe_b(.data = data_tbl, .date_col = visit_date, .pred_col = visits)
> rec_objs[[1]] %>% prep() %>% juice() %>% names()
[1] "visit_date" "visits"
> rec_objs[[2]] %>% prep() %>% juice() %>% names()
[1] "visit_date" "visits" "visit_date_index.num"
[4] "visit_date_year" "visit_date_half" "visit_date_quarter"
[7] "visit_date_month" "visit_date_day" "visit_date_wday"
[10] "visit_date_mday" "visit_date_qday" "visit_date_yday"
[13] "visit_date_mweek" "visit_date_week" "visit_date_week2"
[16] "visit_date_week3" "visit_date_week4" "visit_date_mday7"
[19] "visit_date_month.lbl_01" "visit_date_month.lbl_02" "visit_date_month.lbl_03"
[22] "visit_date_month.lbl_04" "visit_date_month.lbl_05" "visit_date_month.lbl_06"
[25] "visit_date_month.lbl_07" "visit_date_month.lbl_08" "visit_date_month.lbl_09"
[28] "visit_date_month.lbl_10" "visit_date_month.lbl_11" "visit_date_month.lbl_12"
[31] "visit_date_wday.lbl_1" "visit_date_wday.lbl_2" "visit_date_wday.lbl_3"
[34] "visit_date_wday.lbl_4" "visit_date_wday.lbl_5" "visit_date_wday.lbl_6"
[37] "visit_date_wday.lbl_7"
> rec_objs[[3]] %>% prep() %>% juice() %>% names()
[1] "visit_date" "visits" "visit_date_index.num"
[4] "visit_date_year" "visit_date_half" "visit_date_quarter"
[7] "visit_date_month" "visit_date_day" "visit_date_wday"
[10] "visit_date_mday" "visit_date_qday" "visit_date_yday"
[13] "visit_date_mweek" "visit_date_week" "visit_date_week2"
[16] "visit_date_week3" "visit_date_week4" "visit_date_mday7"
[19] "visit_date_month.lbl_01" "visit_date_month.lbl_02" "visit_date_month.lbl_03"
[22] "visit_date_month.lbl_04" "visit_date_month.lbl_05" "visit_date_month.lbl_06"
[25] "visit_date_month.lbl_07" "visit_date_month.lbl_08" "visit_date_month.lbl_09"
[28] "visit_date_month.lbl_10" "visit_date_month.lbl_11" "visit_date_month.lbl_12"
[31] "visit_date_wday.lbl_1" "visit_date_wday.lbl_2" "visit_date_wday.lbl_3"
[34] "visit_date_wday.lbl_4" "visit_date_wday.lbl_5" "visit_date_wday.lbl_6"
[37] "visit_date_wday.lbl_7" "visit_date_sin1_K1" "visit_date_cos1_K1"
> rec_objs[[4]] %>% prep() %>% juice() %>% names()
[1] "visit_date" "visits" "visit_date_index.num"
[4] "visit_date_half" "visit_date_quarter" "visit_date_month"
[7] "visit_date_day" "visit_date_mday" "visit_date_qday"
[10] "visit_date_yday" "visit_date_mweek" "visit_date_week"
[13] "visit_date_week2" "visit_date_week3" "visit_date_week4"
[16] "visit_date_mday7" "visit_date_month.lbl_01" "visit_date_month.lbl_02"
[19] "visit_date_month.lbl_03" "visit_date_month.lbl_04" "visit_date_month.lbl_05"
[22] "visit_date_month.lbl_06" "visit_date_month.lbl_07" "visit_date_month.lbl_08"
[25] "visit_date_month.lbl_09" "visit_date_month.lbl_10" "visit_date_sin1_K1"
[28] "visit_date_cos1_K1"
Will show that visit_date and visits were passed as desired to the functions by making use of !!v for recipes functions, where as timetk allows for passing variables.

Find data in folders and give feedback about missing data

I have a R-script to create several small parts of a big dataset (actually a dataset of Europe). We need these small parts (tiles) to edit these tiles more easily than it would be with one big dataset.
Now I have 1 windows folder and in this folder I have 966 auto-generated folders - each one with 4 datasets (I hope at least it is 4). We need to know if there are exactly these 4 datasets in the folders and if some dataset is missing we need to know which one. The code you can see below is creating the folders. Its posted just to let you know the structures.
in_file <- "P:/High_Resolution_Layers/Forest... .tif/2015/TCD_2015_020m_eu_03035_d04_full/TCD_2015_020m_eu_03035_d04_full.tif"
for (t in 1:length(tiles)){
tileID <- tiles[t]
out_dir <- file.path(output_dir,tileID)
# out_dir_tmp <- file.path(out_dir, "tmp")
if(!exists(out_dir)) {dir.create(out_dir, recursive = T)}
# if(!exists(out_dir)) {dir.create(out_dir_tmp, recursive = T)}
# tmp_file <- file.path(out_dir_tmp, paste0(tileID, "_HRL_Forest.tif")) ## <- ändern ("_HRL_Forest.tif", _clc_2012.tif, _clc_2018.tif, _slope.tif)
out_file <- file.path(out_dir, paste0(tileID, "_HRL_Forest.tif")) ## <- ändern ("_HRL_Forest.tif", _clc_2012.tif, _clc_2018.tif, _slope.tif)
cmd <- paste("gdalwarp",
"-overwrite",
"-s_srs EPSG:3035",
"-t_srs EPSG:3035",
"-r near",
"-q",
"-tr 20 20",
"-te ", tile_list[t,3],tile_list[t,4],tile_list[t,3]+100000, tile_list[t,4]+100000,
"-tap",
"-of GTiff",
in_file,
out_file)
system(osgeo, input=cmd)
# cmd <- sprintf('gdal_translate -ot Byte -a_nodata 255 -co "COMPRESS=LZW" %s %s', tmp_file, out_file)
# system(osgeo, input=cmd)
# unlink(out_dir_tmp,recursive=T)
}
I'm going to make up a structure and list of files.
directories A through D
each directory must have files a.tif though c.tif
Since all dirs must have the same files within them, we can do a cartesian/outer join of them:
dirs <- LETTERS[1:4]
files_each_dir <- paste0(letters[1:3], ".tif")
(all_files <- outer(dirs, files_each_dir, file.path))
# [,1] [,2] [,3]
# [1,] "A/a.tif" "A/b.tif" "A/c.tif"
# [2,] "B/a.tif" "B/b.tif" "B/c.tif"
# [3,] "C/a.tif" "C/b.tif" "C/c.tif"
# [4,] "D/a.tif" "D/b.tif" "D/c.tif"
Since we don't need a matrix, I'll unlist them and then create the dirs/files:
c(all_files)
# [1] "A/a.tif" "B/a.tif" "C/a.tif" "D/a.tif" "A/b.tif" "B/b.tif" "C/b.tif"
# [8] "D/b.tif" "A/c.tif" "B/c.tif" "C/c.tif" "D/c.tif"
for (d in dirs) dir.create(d)
for (p in all_files) writeLines(p, p)
All expected files exist
(files_found <- list.files(pattern = "*.tif", recursive = TRUE, full.names = TRUE))
# [1] "./A/a.tif" "./A/b.tif" "./A/c.tif" "./B/a.tif" "./B/b.tif" "./B/c.tif"
# [7] "./C/a.tif" "./C/b.tif" "./C/c.tif" "./D/a.tif" "./D/b.tif" "./D/c.tif"
### remove the leading "./"
(files_found <- gsub("^\\./", "", files_found))
# [1] "A/a.tif" "A/b.tif" "A/c.tif" "B/a.tif" "B/b.tif" "B/c.tif" "C/a.tif"
# [8] "C/b.tif" "C/c.tif" "D/a.tif" "D/b.tif" "D/c.tif"
all(all_files %in% files_found)
# [1] TRUE
all_files[!all_files %in% files_found]
# character(0)
Test a missing file:
file.remove("B/c.tif")
# [1] TRUE
files_found <- list.files(pattern = "*.tif", recursive = TRUE, full.names = TRUE)
files_found <- gsub("^\\./", "", files_found)
all_files[!all_files %in% files_found]
# [1] "B/c.tif"
Note: we do not use files_each_dir for any of the follow-on tests. It is only needed if we expect a fixed-set of filenames.
Count files within each dir
If the filenames might be different, then we can count the number of files in each directory, irrespective of the actual names.
(len3 <- lengths(split(files_found, sapply(strsplit(files_found, "[/\\]"), `[[`, 1))) == 3)
# A B C D
# TRUE FALSE TRUE TRUE
names(len3)[ !len3 ]
# [1] "B"
File contents
If you need to test the contents such that some condition is true, try something like this. Here, I'm using simple shell command grep, but any function (R or shell) that takes a path and returns something you need (size, property, etc) should work.
func <- function(path) length(system2("grep", c("-lE", "'[a-z]'", path), stdout = TRUE)) > 0
(proper_contents <- sapply(files_found, func))
# A/a.tif A/b.tif A/c.tif B/a.tif B/b.tif C/a.tif C/b.tif C/c.tif D/a.tif D/b.tif
# TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# D/c.tif
# TRUE
Let's change one file's contents to test:
writeLines("123", "D/a.tif")
proper_contents <- sapply(files_found, func)
# Warning in system2("grep", c("-lE", "'[a-z]'", path), stdout = TRUE) :
# running command '"grep" -lE '[a-z]' D/a.tif' had status 1
names(proper_contents)[ !proper_contents ]
# [1] "D/a.tif"

stack files of various folders using R

I stacked certain tif files of a Landsat image, as is shown:
setwd("C:/Users/Landsat/L5__002072-09MAY-2006")
may2006<-list.files(".",pattern="*B[123457]\\.tif$", ignore.case=TRUE)
[1] "LT05_L1TP_002072_20060509_20161121_01_T1_B1.TIF"
[2] "LT05_L1TP_002072_20060509_20161121_01_T1_B2.TIF"
[3] "LT05_L1TP_002072_20060509_20161121_01_T1_B3.TIF"
[4] "LT05_L1TP_002072_20060509_20161121_01_T1_B4.TIF"
[5] "LT05_L1TP_002072_20060509_20161121_01_T1_B5.TIF"
[7] "LT05_L1TP_002072_20060509_20161121_01_T1_B7.TIF"
landsat_stack <- stack(may2006)
I want to do the same, but for all the images of the folder Landsat (each folder been a separate stack )
setwd("C:/Users/Landsat")
foldersList <- normalizePath(list.dirs(full.names = TRUE, recursive = FALSE))
[1] "C:\\Users\\Landsat\\L5__002072-09MAY-2006"
[2] "C:\\Users\\Landsat\\L5_001073_02MAY-2006"
[3] "C:\\Users\\Landsat\\L5_001073_14MAY-1987"
[4] "C:\\Users\\Landsat\\L8__002072-7MAY-2017"
Is it possible to do this simultaneously for all the images?
I thought in first do one list with all tif files(no matter the folder), and then with a loop stacks only the files that have a match in the name (condition 1), but finish with this pattern "B[123457]"(condition 2)
all_Landsat<-list.files(".",pattern="*B[123457]\\.tif$", ignore.case=TRUE, recursive= TRUE)
all_Landsat
[1] "L5__002072-09MAY-2006/LT05_L1TP_002072_20060509_20161121_01_T1_B1.TIF"
[2] "L5__002072-09MAY-2006/LT05_L1TP_002072_20060509_20161121_01_T1_B2.TIF"
[3] "L5__002072-09MAY-2006/LT05_L1TP_002072_20060509_20161121_01_T1_B3.TIF"
[4] "L5__002072-09MAY-2006/LT05_L1TP_002072_20060509_20161121_01_T1_B4.TIF"
[5] "L5__002072-09MAY-2006/LT05_L1TP_002072_20060509_20161121_01_T1_B5.TIF"
[6] "L5__002072-09MAY-2006/LT05_L1TP_002072_20060509_20161121_01_T1_B7.TIF"
[7] "L5_001073_02MAY-2006/LT05_L1TP_001073_20060502_20161122_01_T1_B1.TIF"
[8] "L5_001073_02MAY-2006/LT05_L1TP_001073_20060502_20161122_01_T1_B2.TIF"
[9] "L5_001073_02MAY-2006/LT05_L1TP_001073_20060502_20161122_01_T1_B3.TIF"
[10]"L5_001073_02MAY-2006/LT05_L1TP_001073_20060502_20161122_01_T1_B4.TIF"
[11]"L5_001073_02MAY-2006/LT05_L1TP_001073_20060502_20161122_01_T1_B5.TIF"
[12]"L5_001073_02MAY-2006/LT05_L1TP_001073_20060502_20161122_01_T1_B7.TIF"
[13]"L5_001073_14MAY-1987/LM50010731987134AAA03_B1.TIF"
[14]"L5_001073_14MAY-1987/LM50010731987134AAA03_B2.TIF"
[15]"L5_001073_14MAY-1987/LM50010731987134AAA03_B3.TIF"
[16]"L5_001073_14MAY-1987/LM50010731987134AAA03_B4.TIF"
[17]"L8__002072-7MAY-2017/LC08_L1TP_002072_20170507_20170515_01_T1_B1.TIF"
[18]"L8__002072-7MAY-2017/LC08_L1TP_002072_20170507_20170515_01_T1_B2.TIF"
[19]"L8__002072-7MAY-2017/LC08_L1TP_002072_20170507_20170515_01_T1_B3.TIF"
[20]"L8__002072-7MAY-2017/LC08_L1TP_002072_20170507_20170515_01_T1_B4.TIF"
[21]"L8__002072-7MAY-2017/LC08_L1TP_002072_20170507_20170515_01_T1_B5.TIF"
[22]"L8__002072-7MAY-2017/LC08_L1TP_002072_20170507_20170515_01_T1_B7.TIF"
But I can´t find the right code for the 2 conditions:
for (i in all_Landsat){
if (grep(pattern="+B[123457]\\.tif$", ignore.case=FALSE)){
stack(i)
}
}
I haven't checked this but hope this works:
setwd("C:/Users/Landsat")
a<-list.dirs(getwd(),recursive = FALSE )
flist <- list()
stackfile <- list()
for (i in 1:length(a)){
flist[[i]] <- list.files(a[i], recursive = TRUE, full.names = TRUE, pattern = "tif$")
stackfile[[i]] <- stack(flist[[i]])
}
Thanks #anup. I finally solved the problem with this code. It returns a list of the TIF images stacked by folder.
setwd("C:/Users/Landsat")
a<-list.dirs(getwd(),recursive = FALSE )
landsat<- apply(a,function (dir){
img<-stack(list.files(path=dir,ignore.case= TRUE,
pattern="*B[123457]\\.tif$", full.names= TRUE))
})

Avoiding for loop, Naming Example

I would like to avoid using for loop in following example. Goal is to repeat string vector multiple times with different second part which changes each repetition. Is that possible?
str2D = mtcars
Vector = c(10,20)
Dimen = dim( str2D )
nn = c()
for ( i in Dimen[2]*(1:length(Vector)) ){
nn[ (i+1-Dimen[2]): i ] = rep(paste("|d",Vector[i/Dimen[2]],sep=""), Dimen[2] )
}
Name = paste( rep(names(str2D) , length(Vector) ),nn,sep="")
Correct result for "Name" vector is following:
"mpg|d10" "cyl|d10" "disp|d10" "hp|d10" "drat|d10" "wt|d10" "qsec|d10" "vs|d10" "am|d10" "gear|d10" "carb|d10" "mpg|d20" "cyl|d20" "disp|d20" "hp|d20" "drat|d20" "wt|d20" "qsec|d20" "vs|d20" "am|d20" "gear|d20" "carb|d20"
Thank you
I don't quite understand the end goal here but at least this achieves your desired output without a loop:
Name <- paste0(paste(names(mtcars)), "|d", rep(1:2, each = length(names(mtcars))), "0")
> Name
[1] "mpg|d10" "cyl|d10" "disp|d10" "hp|d10" "drat|d10" "wt|d10" "qsec|d10"
[8] "vs|d10" "am|d10" "gear|d10" "carb|d10" "mpg|d20" "cyl|d20" "disp|d20"
[15] "hp|d20" "drat|d20" "wt|d20" "qsec|d20" "vs|d20" "am|d20" "gear|d20"
[22] "carb|d20"

Read a text file in R line by line

I would like to read a text file in R, line by line, using a for loop and with the length of the file. The problem is that it only prints character(0). This is the code:
fileName="up_down.txt"
con=file(fileName,open="r")
line=readLines(con)
long=length(line)
for (i in 1:long){
linn=readLines(con,1)
print(linn)
}
close(con)
You should take care with readLines(...) and big files. Reading all lines at memory can be risky. Below is a example of how to read file and process just one line at time:
processFile = function(filepath) {
con = file(filepath, "r")
while ( TRUE ) {
line = readLines(con, n = 1)
if ( length(line) == 0 ) {
break
}
print(line)
}
close(con)
}
Understand the risk of reading a line at memory too. Big files without line breaks can fill your memory too.
Just use readLines on your file:
R> res <- readLines(system.file("DESCRIPTION", package="MASS"))
R> length(res)
[1] 27
R> res
[1] "Package: MASS"
[2] "Priority: recommended"
[3] "Version: 7.3-18"
[4] "Date: 2012-05-28"
[5] "Revision: $Rev: 3167 $"
[6] "Depends: R (>= 2.14.0), grDevices, graphics, stats, utils"
[7] "Suggests: lattice, nlme, nnet, survival"
[8] "Authors#R: c(person(\"Brian\", \"Ripley\", role = c(\"aut\", \"cre\", \"cph\"),"
[9] " email = \"ripley#stats.ox.ac.uk\"), person(\"Kurt\", \"Hornik\", role"
[10] " = \"trl\", comment = \"partial port ca 1998\"), person(\"Albrecht\","
[11] " \"Gebhardt\", role = \"trl\", comment = \"partial port ca 1998\"),"
[12] " person(\"David\", \"Firth\", role = \"ctb\"))"
[13] "Description: Functions and datasets to support Venables and Ripley,"
[14] " 'Modern Applied Statistics with S' (4th edition, 2002)."
[15] "Title: Support Functions and Datasets for Venables and Ripley's MASS"
[16] "License: GPL-2 | GPL-3"
[17] "URL: http://www.stats.ox.ac.uk/pub/MASS4/"
[18] "LazyData: yes"
[19] "Packaged: 2012-05-28 08:47:38 UTC; ripley"
[20] "Author: Brian Ripley [aut, cre, cph], Kurt Hornik [trl] (partial port"
[21] " ca 1998), Albrecht Gebhardt [trl] (partial port ca 1998), David"
[22] " Firth [ctb]"
[23] "Maintainer: Brian Ripley <ripley#stats.ox.ac.uk>"
[24] "Repository: CRAN"
[25] "Date/Publication: 2012-05-28 08:53:03"
[26] "Built: R 2.15.1; x86_64-pc-mingw32; 2012-06-22 14:16:09 UTC; windows"
[27] "Archs: i386, x64"
R>
There is an entire manual devoted to this.
Here is the solution with a for loop. Importantly, it takes the one call to readLines out of the for loop so that it is not improperly called again and again. Here it is:
fileName <- "up_down.txt"
conn <- file(fileName,open="r")
linn <-readLines(conn)
for (i in 1:length(linn)){
print(linn[i])
}
close(conn)
I write a code to read file line by line to meet my demand which different line have different data type follow articles: read-line-by-line-of-a-file-in-r and determining-number-of-linesrecords. And it should be a better solution for big file, I think. My R version (3.3.2).
con = file("pathtotargetfile", "r")
readsizeof<-2 # read size for one step to caculate number of lines in file
nooflines<-0 # number of lines
while((linesread<-length(readLines(con,readsizeof)))>0) # calculate number of lines. Also a better solution for big file
nooflines<-nooflines+linesread
con = file("pathtotargetfile", "r") # open file again to variable con, since the cursor have went to the end of the file after caculating number of lines
typelist = list(0,'c',0,'c',0,0,'c',0) # a list to specific the lines data type, which means the first line has same type with 0 (e.g. numeric)and second line has same type with 'c' (e.g. character). This meet my demand.
for(i in 1:nooflines) {
tmp <- scan(file=con, nlines=1, what=typelist[[i]], quiet=TRUE)
print(is.vector(tmp))
print(tmp)
}
close(con)
I suggest you check out chunked and disk.frame. They both have functions for reading in CSVs chunk-by-chunk.
In particular, disk.frame::csv_to_disk.frame may be the function you are after?
fileName = "up_down.txt"
### code to get the line count of the file
length_connection = pipe(paste("cat ", fileName, " | wc -l", sep = "")) # "cat fileName | wc -l" because that returns just the line count, and NOT the name of the file with it
long = as.numeric(trimws(readLines(con = length_connection, n = 1)))
close(length_connection) # make sure to close the connection
###
for (i in 1:long){
### code to extract a single line at row i from the file
linn_connection_cmd = paste("head -n", format(x = i, scientific = FALSE, big.mark = ""), fileName, "| tail -n 1", sep = " ") # extracts one line from fileName at the desired line number (i)
linn_connection = pipe(linn_connection_cmd)
linn = readLines(con = linn_connection, n = 1)
close(linn_connection) # make sure to close the conection
###
# the line is now loaded into R and anything can be done with it
print(linn)
}
close(con)
By using R's pipe() command, and using shell commands to extract what we want, the full file is never loaded into R, and is read in line by line.
paste("head -n", format(x = i, scientific = FALSE, big.mark = ""), fileName, "| tail -n 1", sep = " ")
It is this command that does all the work; it extracts one line from the desired file.
Edit: R's default behavior is for i to return as normal number when less than 100,000, but begins returning i in scientific notation when it is greater than or equal to 100,000 (1e+05). Thus, format(x = i, scientific = FALSE, big.mark = "") is used in our pipe command to make sure that the pipe() command always receives a number in normal form, which is all that the command can understand. If the pipe() command is given any number like 1e+05, it will not be able to comprehend it and will result in the following error:
head: 1e+05: invalid number of lines

Resources