I have a PDF document with 300 pages. I need to split this file into 150 files of 2 pages each. For example, the 1st document would contain pages 1 & 2 of the original file, the 2nd document pages 3 & 4, and so on.
Maybe I can use the "pdftools" package, but I don't know how.
1) pdftools Assuming that the input PDF is in the current directory and the outputs are to go into the same directory, change the inputs below, get the number of pages num, compute the st and en vectors of start and end page numbers, and repeatedly call pdf_subset. Note that the pdf_length and pdf_subset functions come from the qpdf R package but are also made available by the pdftools R package, which imports and re-exports them.
library(pdftools)
# inputs
infile <- "a.pdf" # input pdf
prefix <- "out_" # output pdf's will begin with this prefix
num <- pdf_length(infile)
st <- seq(1, num, 2)
en <- pmin(st + 1, num)
for (i in seq_along(st)) {
  outfile <- sprintf("%s%0*d.pdf", prefix, nchar(num), i)
  pdf_subset(infile, pages = st[i]:en[i], output = outfile)
}
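For the 300-page example this produces out_001.pdf through out_150.pdf; a quick sketch of the page bookkeeping the loop relies on:
num <- 300
st <- seq(1, num, 2)     # start pages: 1, 3, ..., 299
en <- pmin(st + 1, num)  # end pages: 2, 4, ..., 300 (pmin caps the last file for odd page counts)
head(cbind(st, en))
#      st en
# [1,]  1  2
# [2,]  3  4
# [3,]  5  6
# [4,]  7  8
# [5,]  9 10
# [6,] 11 12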
2) pdfbox The Apache PDFBox utility can split a PDF into files of 2 pages each. Download the command-line utilities .jar file from the PDFBox site and be sure you have Java installed. Then run the following, assuming that your input file a.pdf is in the current directory (or run the quoted part directly from the command line, without the quotes and without R). The jar file name below may need to be changed if a later version is used; the one named below is the latest at the time of writing (not counting alpha versions).
system("java -jar pdfbox-app-2.0.26.jar PDFSplit -split 2 a.pdf")
3) animation/pdftk Another option is to install the pdftk program, change the inputs at the top of the script below, and run it. This gets the number of pages in the input, num, using pdftk, then computes the start and end page numbers, st and en, and then invokes pdftk repeatedly, once for each st/en pair, to extract those pages into a separate file.
library(animation)
# inputs
PDFTK <- "~/../bin/pdftk.exe" # path to pdftk
infile <- "a.pdf" # input pdf
prefix <- "out_" # output pdf's will begin with this prefix
ani.options(pdftk = Sys.glob(PDFTK))
tmp <- tempfile()
dump_data <- pdftk(infile, "dump_data", tmp)
g <- grep("NumberOfPages", readLines(tmp), value = TRUE)
num <- as.numeric(sub(".* ", "", g))
st <- seq(1, num, 2)
en <- pmin(st + 1, num)
for (i in seq_along(st)) {
  outfile <- sprintf("%s%0*d.pdf", prefix, nchar(num), i)
  pdftk(infile, sprintf("cat %d-%d", st[i], en[i]), outfile)
}
Neither pdftools nor qpdf (on which the former depends) supports splitting PDF files other than into one page per file ("every page"). You will likely need to rely on an external program; I'm confident you can get pdftk to do it by calling it once for each 2-page output.
I have a 36-page PDF here named quux.pdf in the current working directory.
str(pdftools::pdf_info("quux.pdf"))
# List of 11
# $ version : chr "1.5"
# $ pages : int 36
# $ encrypted : logi FALSE
# $ linearized : logi FALSE
# $ keys :List of 8
# ..$ Producer : chr "pdfTeX-1.40.24"
# ..$ Author : chr ""
# ..$ Title : chr ""
# ..$ Subject : chr ""
# ..$ Creator : chr "LaTeX via pandoc"
# ..$ Keywords : chr ""
# ..$ Trapped : chr ""
# ..$ PTEX.Fullbanner: chr "This is pdfTeX, Version 3.141592653-2.6-1.40.24 (TeX Live 2022) kpathsea version 6.3.4"
# $ created : POSIXct[1:1], format: "2022-05-17 22:54:40"
# $ modified : POSIXct[1:1], format: "2022-05-17 22:54:40"
# $ metadata : chr ""
# $ locked : logi FALSE
# $ attachments: logi FALSE
# $ layout : chr "no_layout"
I also have pdftk installed and available on the PATH:
Sys.which("pdftk")
# pdftk
# "C:\\PROGRA~2\\PDFtk Server\\bin\\pdftk.exe"
With this, I can run an external command via system() to create 2-page PDFs:
list.files(pattern = "pdf$")
# [1] "quux.pdf"
pages <- seq(pdftools::pdf_info("quux.pdf")$pages)
pages <- split(pages, (pages - 1) %/% 2)
pages[1:3]
# $`0`
# [1] 1 2
# $`1`
# [1] 3 4
# $`2`
# [1] 5 6
for (pg in pages) {
  system(sprintf("pdftk quux.pdf cat %s-%s output out_%02i-%02i.pdf",
                 min(pg), max(pg), min(pg), max(pg)))
}
list.files(pattern = "pdf$")
# [1] "out_01-02.pdf" "out_03-04.pdf" "out_05-06.pdf" "out_07-08.pdf"
# [5] "out_09-10.pdf" "out_11-12.pdf" "out_13-14.pdf" "out_15-16.pdf"
# [9] "out_17-18.pdf" "out_19-20.pdf" "out_21-22.pdf" "out_23-24.pdf"
# [13] "out_25-26.pdf" "out_27-28.pdf" "out_29-30.pdf" "out_31-32.pdf"
# [17] "out_33-34.pdf" "out_35-36.pdf" "quux.pdf"
str(pdftools::pdf_info("out_01-02.pdf"))
# List of 11
# $ version : chr "1.5"
# $ pages : int 2
# $ encrypted : logi FALSE
# $ linearized : logi FALSE
# $ keys :List of 2
# ..$ Creator : chr "pdftk 2.02 - www.pdftk.com"
# ..$ Producer: chr "itext-paulo-155 (itextpdf.sf.net-lowagie.com)"
# $ created : POSIXct[1:1], format: "2022-05-18 09:37:56"
# $ modified : POSIXct[1:1], format: "2022-05-18 09:37:56"
# $ metadata : chr ""
# $ locked : logi FALSE
# $ attachments: logi FALSE
# $ layout : chr "no_layout"
I'd like to create a BigQuery table from geoJSON files. Even though geoJSON is an accepted format in BQ (NEWLINE_DELIMITED_JSON) and bq_table_create() accepts a bq_fields specification, or something coercible to it (like a data frame), the function bq_table_create() of the bigrquery package doesn't work. In my example below the output error is Erro: Unsupported type: list:
library(sf)
library(bigrquery)
library(DBI)
library(googleAuthR)
library(geojsonsf)
library(geojsonR)
# Convert shapefile to geoJSON
stands_sel <- st_read(
  "D:/Dropbox/Stinkbug_Ml_detection_CMPC/dashboard/v_08_CMPC/sel_stands_CMPC.shp")
# Open as geoJSON
geo <- sf_geojson(stands_sel)
# Convert geoJSON to data frame
geo_js_df <- as.data.frame(geojson_wkt(geo))
str(geo_js_df)
# 'data.frame': 2 obs. of 17 variables:
# $ SISTEMA_PR: chr "MACRO ESTACA - EUCALIPTO" "SEMENTE - EUCALIPTO"
# $ ESPECIE : chr "SALIGNA" "DUNNI"
# $ ID_UNIQUE : chr "BARBANEGRA159A" "CAMPOSECO016A"
# $ CICLO : num 2 1
# $ LOCALIDADE: chr "BARRA DO RIBEIRO" "DOM FELICIANO"
# $ ROTACAO : num 1 1
# $ CARACTER_1: chr "Produtivo" "Produtivo"
# $ VLR_AREA : num 8.53 28.07
# $ ID_REGIAO : num 11 11
# $ CD_USO_SOL: num 2433 9053
# $ DATA_PLANT: chr "2008/04/15" "2010/04/15"
# $ ID_PROJETO: chr "002" "344"
# $ CARACTERIS: chr "Plantio Comercial" "Plantio Comercial"
# $ PROJETO : chr "BARBA NEGRA" "CAMPO SECO"
# $ ESPACAMENT: chr "3.00 x 2.50" "3.5 x 2.14"
# $ CD_TALHAO : chr "159A" "016A"
# $ geometry :List of 2
# ..$ : 'wkt' chr "MULTIPOLYGON (((-51.2142 -30.3517,-51.2143 -30.3518,-51.2143 -30.3518,-51.2143 -30.3519,-51.2143 -30.3519,-51.2"| __truncated__
# ..$ : 'wkt' chr "MULTIPOLYGON (((-52.3214 -30.4271,-52.3214 -30.4272,-52.3214 -30.4272,-52.3215 -30.4272,-52.3215 -30.4272,-52.3"| __truncated__
# - attr(*, "wkt_column")= chr "geometry"
# Insert information inside BQ
bq_conn <- dbConnect(bigquery(),
                     project = "my-project",
                     use_legacy_sql = FALSE)
# First create the table
players_table = bq_table(project = "my-project", dataset = "stands_ROI_2021", table = "CF_2021")
bq_table_create(x = players_table, fields = as_bq_fields(geo_js_df))
Erro: Unsupported type: list
You can upload a data frame with a list-type column to BigQuery by using bq_table_upload(). Try this in your script instead of bq_table_create():
bq_table_upload(players_table, geo_js_df)
For your reference, I tried this on my end using this sample data with a list-type column:
d <- data.frame(id = 1:2,
                name = c("Jon", "Mark"),
                children = I(list(c("Mary", "James"),
                                  c("Greta", "Sally"))))
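A sketch of the upload with that sample data (the project, dataset, and table names below are placeholders):
tbl <- bq_table(project = "my-project", dataset = "my-dataset", table = "children_test")
bq_table_upload(tbl, d)
bq_table_download(tbl)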
Run in the R console, this created the BQ table as expected (console output and table screenshots omitted).
EDIT:
As per this documentation, FeatureCollection is not yet supported in BigQuery; however, there is an ongoing feature request you can find here. The workaround is to convert the GeoJSON file to BigQuery newline-delimited JSON before converting it to a data frame.
To convert GeoJson file to BigQuery new-line-delimited JSON, follow these steps:
Install node.js.
Add packages:
npm install fs JSONStream line-input-stream yargs
Clone the github repository:
git clone https://github.com/mentin/geoscripts.git
Change directory:
cd geoscripts/geojson2bq/
Convert GeoJson file to BigQuery new-line-delimited JSON:
node geojson2bqjson.js sel_stands.geojson > out.json
Using the newline-delimited JSON file, convert it to a data frame in the R console, then use bq_table_upload() to upload the data to BigQuery:
library(bigrquery)
library(dplyr)
library(tidyverse)
library(jsonlite)
library(DBI)  # dbConnect() below comes from DBI
out <- stream_in(file('out.json'))
projectid<-"my-project"
datasetid<-"my-dataset"
bq_conn <- dbConnect(bigquery(),
                     project = projectid,
                     dataset = datasetid,
                     use_legacy_sql = FALSE)
players_table = bq_table(project = "my-project", dataset = "my-dataset", table = "CF_2021_test5")
bq_table_upload(players_table, out)
bq_table_download(players_table)
This uploaded the data and created the BigQuery table (console output and table screenshots omitted).
I am reading several SAS files from a server and loading them all into a list in R. I removed one of the datasets because I didn't need it in the final analysis (dataset #31):
mylist <- list.files("path", pattern = ".sas7bdat")
mylist <- mylist[-31]
Then I used lapply to read all the datasets in the list (mylist) at the same time:
read.all <- lapply(mylist, read_sas)
The code works well. However, when I run View(read.all) to see the datasets, I can only see a number (e.g. 1, 2, etc.) instead of the names of the original datasets.
Does anyone know how I can keep the names of the datasets in the final list?
Also, can anyone tell me how I can work with this list in R?
Is it an object? Can I read one of the datasets in the list? And how can I join some of the datasets in the list?
Use basename and tools::file_path_sans_ext:
filenames <- head(list.files("~/StackOverflow", pattern = "^[^#].*\\.R", recursive = TRUE, full.names = TRUE))
filenames
# [1] "C:\\Users\\r2/StackOverflow/1000343/61469332.R" "C:\\Users\\r2/StackOverflow/10087004/61857346.R"
# [3] "C:\\Users\\r2/StackOverflow/10097832/60589834.R" "C:\\Users\\r2/StackOverflow/10214507/60837843.R"
# [5] "C:\\Users\\r2/StackOverflow/10215127/61720149.R" "C:\\Users\\r2/StackOverflow/10226369/60778116.R"
basename(filenames)
# [1] "61469332.R" "61857346.R" "60589834.R" "60837843.R" "61720149.R" "60778116.R"
tools::file_path_sans_ext(basename(filenames))
# [1] "61469332" "61857346" "60589834" "60837843" "61720149" "60778116"
somedat <- setNames(lapply(filenames, readLines, n=2),
                    tools::file_path_sans_ext(basename(filenames)))
names(somedat)
# [1] "61469332" "61857346" "60589834" "60837843" "61720149" "60778116"
str(somedat)
# List of 6
# $ 61469332: chr [1:2] "# https://stackoverflow.com/questions/61469332/determine-function-name-within-that-function/61469380" ""
# $ 61857346: chr [1:2] "# https://stackoverflow.com/questions/61857346/how-to-use-apply-family-instead-of-nested-for-loop-for-my-problem?noredirect=1" ""
# $ 60589834: chr [1:2] "# https://stackoverflow.com/questions/60589834/add-columns-to-data-frame-based-on-function-argument" ""
# $ 60837843: chr [1:2] "# https://stackoverflow.com/questions/60837843/how-to-remove-all-parentheses-from-a-vector-of-string-except-whe"| __truncated__ ""
# $ 61720149: chr [1:2] "# https://stackoverflow.com/questions/61720149/extracting-the-original-data-based-on-filtering-criteria" ""
# $ 60778116: chr [1:2] "# https://stackoverflow.com/questions/60778116/how-to-shift-data-by-a-factor-of-two-months-in-r" ""
Each "name" is the character representation of (in this case) the Stack Overflow question number, with the ".R" removed. (And since I typically include the question's URL as the first line followed by an empty line in the files I use to test/play and answer SO questions, all of these files look similar in their top two lines.)
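Applied to the SAS files from the question, a minimal sketch (assuming haven is installed and "path" is the directory from the question):
library(haven)
mylist <- list.files("path", pattern = "\\.sas7bdat$", full.names = TRUE)
mylist <- mylist[-31]
read.all <- setNames(lapply(mylist, read_sas),
                     tools::file_path_sans_ext(basename(mylist)))
names(read.all)  # dataset names instead of 1, 2, ...
The result is an ordinary named list, so a single dataset can be pulled out with, e.g., read.all[["somedataset"]] (a hypothetical name), and datasets with matching columns can be stacked with rbind() or dplyr::bind_rows().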
I have ASCII files with data separated by $ signs.
There are 23 columns in the data and the first row contains the column names, but there is an inconsistency between the line endings which causes R to import the data improperly, shifting the data left with respect to its columns.
Header line:
ISR$CASE$I_F_COD$FOLL_SEQ$IMAGE$EVENT_DT$MFR_DT$FDA_DT$REPT_COD$MFR_NUM$MFR_SNDR$AGE$AGE_COD$GNDR_COD$E_SUB$WT$WT_COD$REPT_DT$OCCP_COD$DEATH_DT$TO_MFR$CONFID$REPORTER_COUNTRY
which does not end with a $ sign.
First row line:
7215577$8135839$I$$7215577-0$20101011$$20110104$DIR$$$67$YR$F$N$220$LBS$20110102$CN$$N$Y$UNITED STATES$
Which does end with a $ sign.
My import command:
read.table(filename, header=TRUE, sep="$", comment.char="", quote="")
My guess is that the inconsistency between the line endings causes R to think that the records have one column more than the header, and thus to treat the first column as a row.names column, which is not correct. Adding the specification row.names=NULL does not fix the issue.
If I manually add a $ sign in the file, the problem is solved, but this is infeasible as the issue occurs in hundreds of files. Is there a way to specify how to read the header line? Do I have any alternative?
Additional info: the headers change across different files, so I cannot set my own vector of column names.
Create a dummy test file:
cat("ISR$CASE$I_F_COD$FOLL_SEQ$IMAGE$EVENT_DT$MFR_DT$FDA_DT$REPT_COD$MFR_NUM$MFR_SNDR$AGE$AGE_COD$GNDR_COD$E_SUB$WT$WT_COD$REPT_DT$OCCP_COD$DEATH_DT$TO_MFR$CONFID$REPORTER_COUNTRY\n7215577$8135839$I$$7215577-0$20101011$$20110104$DIR$$$67$YR$F$N$220$LBS$20110102$CN$$N$Y$UNITED STATES$",
file="deleteme.txt",
"\n")
Solution using gsub:
First read the file as text and then edit its content:
file_path <- "deleteme.txt"
fh <- file(file_path)
file_content <- readLines(fh)
close(fh)
Either add a $ at the end of the header row:
file_content[1] <- paste0(file_content[1], "$")
Or remove $ from the end of all rows:
file_content <- gsub("\\$$", "", file_content)
Then we write the fixed file back to disk:
cat(paste0(file_content, collapse="\n"), file=paste0("fixed_", file_path), "\n")
Now we can read the file:
df <- read.table(paste0("fixed_", file_path), header=TRUE, sep="$", comment.char="", quote="", stringsAsFactors=FALSE)
And get the desired structure:
str(df)
'data.frame': 1 obs. of 23 variables:
$ ISR : int 7215577
$ CASE : int 8135839
$ I_F_COD : chr "I"
$ FOLL_SEQ : logi NA
$ IMAGE : chr "7215577-0"
$ EVENT_DT : int 20101011
$ MFR_DT : logi NA
$ FDA_DT : int 20110104
$ REPT_COD : chr "DIR"
$ MFR_NUM : logi NA
$ MFR_SNDR : logi NA
$ AGE : int 67
$ AGE_COD : chr "YR"
$ GNDR_COD : logi FALSE
$ E_SUB : chr "N"
$ WT : int 220
$ WT_COD : chr "LBS"
$ REPT_DT : int 20110102
$ OCCP_COD : chr "CN"
$ DEATH_DT : logi NA
$ TO_MFR : chr "N"
$ CONFID : chr "Y"
$ REPORTER_COUNTRY: chr "UNITED STATES "
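Since the issue occurs in hundreds of files, the same fix can be wrapped in a small helper that feeds the repaired lines straight to read.table() via its text= argument, avoiding the intermediate fixed_* copies (a sketch; the file pattern is an assumption):
fix_and_read <- function(file_path) {
  file_content <- readLines(file_path)
  # strip the trailing $ from the data rows so every line has 23 fields, like the header
  file_content <- gsub("\\$$", "", file_content)
  read.table(text = file_content, header = TRUE, sep = "$",
             comment.char = "", quote = "", stringsAsFactors = FALSE)
}
dfs <- lapply(list.files(pattern = "\\.txt$"), fix_and_read)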
I have the following problem: according to different sources it should be possible to read WFS layers in R using rgdal.
dsn<-"WFS:http://geomap.reteunitaria.piemonte.it/ws/gsareprot/rp-01/areeprotwfs/wfs_gsareprot_1?service=WFS&request=getCapabilities"
ogrListLayers(dsn)
readOGR(dsn,"SIC")
The result of that code should be 1) a list of the available WFS layers and 2) a specific layer (SIC) read into R as a Spatial(Points)DataFrame.
I tried several other WFS servers but it does not work.
I always get the warning:
Cannot open data source
Checking for the WFS driver i get the following result:
> "WFS" %in% ogrDrivers()$name
[1] FALSE
Well, it looks like the WFS driver is not implemented in rgdal (anymore?).
Or why are there so many examples "claiming" the opposite?
I also tried the gdalUtils package, and it works, but it gives out the whole console message of ogrinfo.exe and not only the available layers. (I guess it "just" calls ogrinfo.exe and sends the result back to R, like using the R shell or the system command.)
Does anyone know what I'm doing wrong, or if something like that is even possible with rgdal or any similar package?
You can combine the two packages to accomplish your task.
First, convert the layer you need into a local shapefile using gdalUtils. Then use rgdal as normal. NOTE: you'll see a warning message after the ogr2ogr call, but it performed the conversion fine for me. Also, ogr2ogr won't overwrite local files unless the overwrite parameter is TRUE (there are other parameters that may be of use as well).
library(gdalUtils)
library(rgdal)
dsn <- "WFS:http://geomap.reteunitaria.piemonte.it/ws/gsareprot/rp-01/areeprotwfs/wfs_gsareprot_1?service=WFS&request=getCapabilities"
ogrinfo(dsn, so=TRUE)
## [1] "Had to open data source read only."
## [2] "INFO: Open of `WFS:http://geomap.reteunitaria.piemonte.it/ws/gsareprot/rp-01/areeprotwfs/wfs_gsareprot_1?service=WFS&request=getCapabilities'"
## [3] " using driver `WFS' successful."
## [4] "1: AreeProtette"
## [5] "2: ZPS"
## [6] "3: SIC"
ogr2ogr(dsn, "sic.shp", "SIC")
sic <- readOGR("sic.shp", "sic", stringsAsFactors=FALSE)
## OGR data source with driver: ESRI Shapefile
## Source: "sic.shp", layer: "sic"
## with 128 features
## It has 23 fields
plot(sic)
str(sic@data)
## 'data.frame': 128 obs. of 23 variables:
## $ gml_id : chr "SIC.510" "SIC.472" "SIC.470" "SIC.508" ...
## $ objectid : chr "510" "472" "470" "508" ...
## $ inspire_id: chr NA NA NA NA ...
## $ codice : chr "IT1160026" "IT1160017" "IT1160018" "IT1160020" ...
## $ nome : chr "Faggete di Pamparato, Tana del Forno, Grotta delle Turbiglie e Grotte di Bossea" "Stazione di Linum narbonense" "Sorgenti del T.te Maira, Bosco di Saretto, Rocca Provenzale" "Bosco di Bagnasco" ...
## $ cod_tipo : chr "B" "B" "B" "B" ...
## $ tipo : chr "SIC" "SIC" "SIC" "SIC" ...
## $ cod_reg_bi: chr "1" "1" "1" "1" ...
## $ des_reg_bi: chr "Alpina" "Alpina" "Alpina" "Alpina" ...
## $ mese_istit: chr "11" "11" "11" "11" ...
## $ anno_istit: chr "1996" "1996" "1996" "1996" ...
## $ mese_ultmo: chr "2" NA NA NA ...
## $ anno_ultmo: chr "2002" NA NA NA ...
## $ sup_sito : chr "29396102.9972" "82819.1127" "7272687.002" "3797600.3563" ...
## $ perim_sito: chr "29261.8758" "1227.8846" "17650.289" "9081.4963" ...
## $ url1 : chr "http://gis.csi.it/parchi/schede/IT1160026.pdf" "http://gis.csi.it/parchi/schede/IT1160017.pdf" "http://gis.csi.it/parchi/schede/IT1160018.pdf" "http://gis.csi.it/parchi/schede/IT1160020.pdf" ...
## $ url2 : chr "http://gis.csi.it/parchi/carte/IT1160026.djvu" "http://gis.csi.it/parchi/carte/IT1160017.djvu" "http://gis.csi.it/parchi/carte/IT1160018.djvu" "http://gis.csi.it/parchi/carte/IT1160020.djvu" ...
## $ fk_ente : chr NA NA NA NA ...
## $ nome_ente : chr NA NA NA NA ...
## $ url3 : chr NA NA NA NA ...
## $ url4 : chr NA NA NA NA ...
## $ tipo_geome: chr "poligono" "poligono" "poligono" "poligono" ...
## $ schema : chr "Natura2000" "Natura2000" "Natura2000" "Natura2000" ...
Neither the questioner nor the answerer says how rgdal was installed. If it is a CRAN binary for Windows or OS X, it may well have a smaller set of drivers than an independent installation of GDAL underlying gdalUtils. Always state your platform and whether rgdal was installed as a binary or from source, and always provide the messages displayed as rgdal loads, as well as the output of sessionInfo(), to show the platform on which you are running.
Given the possible difference in sets of drivers, the advice given seems reasonable.
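One quick way to see that difference is to compare the drivers in the GDAL build rgdal links against with those of the standalone GDAL that gdalUtils shells out to (a sketch, assuming an ogrinfo binary is on the PATH):
# drivers compiled into the GDAL that rgdal links against
"WFS" %in% rgdal::ogrDrivers()$name
# drivers in the standalone GDAL that gdalUtils calls
system("ogrinfo --formats")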
I have read an ASCII (.spe) file into R. This file contains one column of, mostly, integers. However, R is interpreting these integers incorrectly, probably because I am not specifying the correct format or something like that. The file was generated in Ortec Maestro software. Here is the code:
library(SDMTools)
strontium<-read.table("C:/Users/Hal 2/Desktop/beta_spec/strontium 90 spectrum.spe",header=F,skip=2)
str_spc<-vector(mode="numeric")
for (i in 1:2037)
{
  str_spc[i] <- as.numeric(strontium$V1[i+13])
}
Here, for example, strontium$V1[14] has the value 0, but R is interpreting it as a 10. I think I may have to convert the data to some other format, or something like that, but I'm not sure and I'm probably googling the wrong search terms.
Here are the first few lines from the file:
$SPEC_ID:
No sample description was entered.
$SPEC_REM:
DET# 1
DETDESC# MCB 129
AP# Maestro Version 6.08
$DATE_MEA:
10/14/2014 15:13:16
$MEAS_TIM:
1516 1540
$DATA:
0 2047
Here is a link to the file: https://www.dropbox.com/sh/y5x68jen487qnmt/AABBZyC6iXBY3e6XH0XZzc5ba?dl=0
Any help appreciated.
I saw someone had made a parser for SPE spectra files in Python, and I can't let that stand without there being at least a minimally functioning R version, so here's one that parses some of the fields but gets you your data:
library(stringr)
library(gdata)
library(lubridate)
read.spe <- function(file) {
  # read the whole file and split it into its $-prefixed records
  tmp <- readLines(file)
  tmp <- paste(tmp, collapse="\n")
  records <- strsplit(tmp, "\\$")[[1]]
  records <- records[records != ""]
  # pull each field out of its record with a regex capture
  spe <- list()
  spe[["SPEC_ID"]] <- str_match(records[which(startsWith(records, "SPEC_ID"))],
                                "^SPEC_ID:[[:space:]]*([[:print:]]+)[[:space:]]+")[2]
  spe[["SPEC_REM"]] <- strsplit(str_match(records[which(startsWith(records, "SPEC_REM"))],
                                          "^SPEC_REM:[[:space:]]*(.*)")[2], "\n")
  spe[["DATE_MEA"]] <- mdy_hms(str_match(records[which(startsWith(records, "DATE_MEA"))],
                                         "^DATE_MEA:[[:space:]]*(.*)[[:space:]]$")[2])
  spe[["MEAS_TIM"]] <- strsplit(str_match(records[which(startsWith(records, "MEAS_TIM"))],
                                          "^MEAS_TIM:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
  spe[["ROI"]] <- str_match(records[which(startsWith(records, "ROI"))],
                            "^ROI:[[:space:]]*(.*)[[:space:]]$")[2]
  spe[["PRESETS"]] <- strsplit(str_match(records[which(startsWith(records, "PRESETS"))],
                                         "^PRESETS:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
  spe[["ENER_FIT"]] <- strsplit(str_match(records[which(startsWith(records, "ENER_FIT"))],
                                          "^ENER_FIT:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
  spe[["MCA_CAL"]] <- strsplit(str_match(records[which(startsWith(records, "MCA_CAL"))],
                                         "^MCA_CAL:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
  spe[["SHAPE_CAL"]] <- str_match(records[which(startsWith(records, "SHAPE_CAL"))],
                                  "^SHAPE_CAL:[[:space:]]*(.*)[[:space:]]*$")[2]
  # the data block: drop the leading "0 2047" channel-range entry, keep the counts
  spe_dat <- strsplit(str_match(records[which(startsWith(records, "DATA"))],
                                "^DATA:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
  spe[["SPE_DAT"]] <- as.numeric(gsub("[[:space:]]", "", spe_dat)[-1])
  return(spe)
}
dat <- read.spe("strontium 90 spectrum.Spe")
str(dat)
## List of 10
## $ SPEC_ID : chr "No sample description was entered."
## $ SPEC_REM :List of 1
## ..$ : chr [1:3] "DET# 1" "DETDESC# MCB 129" "AP# Maestro Version 6.08"
## $ DATE_MEA : POSIXct[1:1], format: "2014-10-14 15:13:16"
## $ MEAS_TIM : chr "1516 1540"
## $ ROI : chr "0"
## $ PRESETS : chr [1:3] "None" "0" "0"
## $ ENER_FIT : chr "0.000000 0.002529"
## $ MCA_CAL : chr [1:2] "3" "0.000000E+000 2.529013E-003 0.000000E+000 keV"
## $ SHAPE_CAL: chr "3\n3.100262E+001 0.000000E+000 0.000000E+000"
## $ SPE_DAT : num [1:2048] 0 0 0 0 0 0 0 0 0 0 ...
head(dat$SPE_DAT)
## [1] 0 0 0 0 0 0
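To sanity-check the parsed counts, the spectrum can be plotted channel by channel (a minimal sketch using base graphics):
plot(dat$SPE_DAT, type = "l", xlab = "channel", ylab = "counts")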
It needs some polish and there's absolutely no error checking (e.g. for missing fields), but there's no time today to deal with that. I'll finish the parsing and make a minimal package wrapper for it over the next couple of days.