R - low-level file IO

R - low-level file IO - r

I am trying to read in a file in "flexible data format" using R.
I got the number of bytes I should be reading in (counting from EOF, e.g., I should be reading EOF-32 to EOF bytes in as my data).
I am seeking the equivalences to the fseek and fread from MATLAB in R.

I think you would do better with a different approach (if I've got the right "flexible data format" file format here). You can deal with much of these (horrible) files with basic string functions in R:
library(stringr)
# read in fdf file
l <- readLines("http://rud.is/dl/Fe.fdf")
# some basic cleanup
l <- sub("#.*$", "", l) # remove comments
l <- sub("^=.*$", "", l) # remove comments
l <- gsub("\ +", " ", l) # compress spaces
l <- str_trim(l) # beg/end space trim
l <- grep("^$", l, value=TRUE, invert=TRUE) # ignore blank lines
# start of data blocks
blocks <- which(grepl("^%block", l))
# all "easy"/simple lines
simple <- str_split_fixed(grep("^[[:digit:]%]", l, value=TRUE, invert=TRUE),
"[[:space:]]+", 2)
# "simple" name/val [unit] conversions
convert_vals <- function(simple) {
vals <- simple[,2]
names(vals) <- simple[,1]
lapply(vals, function(v) {
# if logical
if (tolower(v) %in% c("t", "true", ".true.", "f", "false", ".false.")) {
return(as.logical(gsub("\\.", "", v)))
}
# if it's just a number
# i may be missing a numeric fmt char in this horrible format
if (grepl("^[[:digit:]\\.\\+\\-]+$", v)) {
return(as.numeric(v))
}
# if value and unit convert to an actual number with a unit attribute
# or convert it here from the table starting on line 927 of fdf.f
if (grepl("^[[:digit:]]", v) & (!any(is.na(str_locate(v, " "))))) {
vu <- str_split_fixed(v, " ", 2)
x <- as.numeric(vu[,1])
attr(x, "unit") <- vu[,2]
return(x)
}
# handle "1.d-3" and other vals with other if's
# anything not handled is returned
return(v)
})
}
# handle begin/end block "complex" data conversion
convert_blocks <- function(lines) {
block_names <- sub("^%block ", "", grep("^%block", lines, value=TRUE))
lapply(blocks, function(blk_start) {
blk <- lines[blk_start]
blk_info <- str_split_fixed(blk, " ", 2)
blk_end <- which(grepl(sprintf("^%%endblock %s", blk_info[,2]), lines))
# this is overly simplistic since you have to do some conversions, but you know the line
# range of the data values now so you can process them however you need to
read.table(text=lines[(blk_start+1):(blk_end-1)],
header=FALSE, stringsAsFactors=FALSE, fill=TRUE)
}) -> blks
names(blks) <- block_names
return(blks)
}
fdf <- c(convert_vals(simple),
convert_blocks(l))
str(fdf)
Output of the str:
List of 32
$ SystemName : chr "bcc Fe ferro GGA"
$ SystemLabel : chr "Fe"
$ WriteCoorStep : chr ""
$ WriteMullikenPop : num 1
$ NumberOfSpecies : num 1
$ NumberOfAtoms : num 1
$ PAO.EnergyShift : atomic [1:1] 50
..- attr(*, "unit")= chr "meV"
$ PAO.BasisSize : chr "DZP"
$ Fe : num 2
$ LatticeConstant : atomic [1:1] 2.87
..- attr(*, "unit")= chr "Ang"
$ KgridCutoff : atomic [1:1] 15
..- attr(*, "unit")= chr "Ang"
$ xc.functional : chr "GGA"
$ xc.authors : chr "PBE"
$ SpinPolarized : logi TRUE
$ MeshCutoff : atomic [1:1] 150
..- attr(*, "unit")= chr "Ry"
$ MaxSCFIterations : num 40
$ DM.MixingWeight : num 0.1
$ DM.Tolerance : chr "1.d-3"
$ DM.UseSaveDM : logi TRUE
$ DM.NumberPulay : num 3
$ SolutionMethod : chr "diagon"
$ ElectronicTemperature : atomic [1:1] 25
..- attr(*, "unit")= chr "meV"
$ MD.TypeOfRun : chr "cg"
$ MD.NumCGsteps : num 0
$ MD.MaxCGDispl : atomic [1:1] 0.1
..- attr(*, "unit")= chr "Ang"
$ MD.MaxForceTol : atomic [1:1] 0.04
..- attr(*, "unit")= chr "eV/Ang"
$ AtomicCoordinatesFormat : chr "Fractional"
$ ChemicalSpeciesLabel :'data.frame': 1 obs. of 3 variables:
..$ V1: int 1
..$ V2: int 26
..$ V3: chr "Fe"
$ PAO.Basis :'data.frame': 5 obs. of 3 variables:
..$ V1: chr [1:5] "Fe" "0" "6." "2" ...
..$ V2: num [1:5] 2 2 0 2 0
..$ V3: chr [1:5] "" "P" "" "" ...
$ LatticeVectors :'data.frame': 3 obs. of 3 variables:
..$ V1: num [1:3] 0.5 0.5 0.5
..$ V2: num [1:3] 0.5 -0.5 0.5
..$ V3: num [1:3] 0.5 0.5 -0.5
$ BandLines :'data.frame': 5 obs. of 5 variables:
..$ V1: int [1:5] 1 40 28 28 34
..$ V2: num [1:5] 0 2 1 0 1
..$ V3: num [1:5] 0 0 1 0 1
..$ V4: num [1:5] 0 0 0 0 1
..$ V5: chr [1:5] "\\Gamma" "H" "N" "\\Gamma" ...
$ AtomicCoordinatesAndAtomicSpecies:'data.frame': 1 obs. of 4 variables:
..$ V1: num 0
..$ V2: num 0
..$ V3: num 0
..$ V4: int 1
You can see the output (and the file and this code) in this gist since it's easier to copy/past/clone a gist.
You still need to:
deal with unit conversion (but with this grid::unit-like structure that shld be far more straightforward)
swap out the naive read.table with a better "block reader"
deal with file includes (pretty simple, tho, if you add a function or two)
With a bit of tweaking/polish this cld be a new R package, not that I'd ever want a data file in this format ever.

Related

Group a list of elements into nested list based on pattern

I have a list, List_A, which contains elements with a similar pattern. I am trying to nest the similar items based on the pattern. Example below
List_A <- list("Q_2020", "Q_2021", "C_2019", "C_2020", "K_2020")
Output:
# List of 5
# $ : chr "Q_2020"
# $ : chr "Q_2021"
# $ : chr "C_2019"
# $ : chr "C_2020"
# $ : chr "K_2020"
Desired Output:
# List of 3
# $ Q:List of 2
# ..$ : chr "Q_2020"
# ..$ : chr "Q_2021"
# $ C:List of 2
# ..$ : chr "C_2019"
# ..$ : chr "C_2020"
# $ K: chr "K_2020"
I know the pattern would be before "_" so I am using sub() to get the pattern - but I would like some input on how I can create the nested lists. My attempt is a bit too noisy: 1) extract pattern i.e. Q, K, C, 2) create nested list skeleton, 3) Loop an if statement.
Any input is appreciated.

We could split by substring
v1 <- unlist(List_A)
subv1 <- trimws(v1, whitespace = "_.*")
List_A_new <- lapply(split(v1, factor(subv1, levels = unique(subv1))), as.list)
-output structure
str(List_A_new)
List of 3
$ Q:List of 2
..$ : chr "Q_2020"
..$ : chr "Q_2021"
$ C:List of 2
..$ : chr "C_2019"
..$ : chr "C_2020"
$ K:List of 1
..$ : chr "K_2020"

Do you need this?
> str(split(List_A, factor(u <- substr(unlist(List_A), 1, 1), levels = unique(u))))
List of 3
$ Q:List of 2
..$ : chr "Q_2020"
..$ : chr "Q_2021"
$ C:List of 2
..$ : chr "C_2019"
..$ : chr "C_2020"
$ K:List of 1
..$ : chr "K_2020"

How can I pass a list of names to list.select?

Assume I want to use list.select function from rlist package to select two fields.
x <- list(p1 = list(type='A',score=list(c1=10,c2=8)),
p2 = list(type='B',score=list(c1=9,c2=9)),
p3 = list(type='B',score=list(c1=9,c2=7)))
rather than using this syntax:
list.select(x, type, score)
I want to use something list this, but it doesn't work:
param <- c("type", "score")
list.select(x, param)

Not sure how to do it using list.select, but here is a purrr solution:
library(purrr)
param <- c("type", "score")
map(x, `[`, param)
this obviously also works with lapply:
lapply(x, `[`, param)
but if you have a deeper nested list of lists, use modify_depth:
modify_depth(x, 1, `[`, param)
the .depth argument can be adjusted to go deeper down the hierarchy.
Output:
$p1
$p1$type
[1] "A"
$p1$score
$p1$score$c1
[1] 10
$p1$score$c2
[1] 8
$p2
$p2$type
[1] "B"
$p2$score
$p2$score$c1
[1] 9
$p2$score$c2
[1] 9
$p3
$p3$type
[1] "B"
$p3$score
$p3$score$c1
[1] 9
$p3$score$c2
[1] 7

This is a hackish way using eval(parse(.)) but the result is not identical to your solution. The pieces are there, though.
> str(list.select(x, do.call(c, sapply(param, FUN = function(x) eval(parse(text = x))))))
List of 3
$ p1:List of 1
..$ :List of 3
.. ..$ type : chr "A"
.. ..$ score.c1: num 10
.. ..$ score.c2: num 8
$ p2:List of 1
..$ :List of 3
.. ..$ type : chr "B"
.. ..$ score.c1: num 9
.. ..$ score.c2: num 9
$ p3:List of 1
..$ :List of 3
.. ..$ type : chr "B"
.. ..$ score.c1: num 9
.. ..$ score.c2: num 7
> str(list.select(x, type, score))
List of 3
$ p1:List of 2
..$ type : chr "A"
..$ score:List of 2
.. ..$ c1: num 10
.. ..$ c2: num 8
$ p2:List of 2
..$ type : chr "B"
..$ score:List of 2
.. ..$ c1: num 9
.. ..$ c2: num 9
$ p3:List of 2
..$ type : chr "B"
..$ score:List of 2
.. ..$ c1: num 9
.. ..$ c2: num 7

R - Register the Errors as a new Variable/Vector/Column

I am learning the basics of R and I am currently using tryCatch to continue a loop even when an error is encountered. It basically looks like this:
for (variableloop in (1:10000)){
tryCatch({
My function/ formula goes here
},error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
I was wondering if there is a command to save up the list of cases where the loop provided an error.
Thank you very much for your time.

What you want is to for each call to your function to return both the result and the error, where exactly one of the two is empty. Something like this (using base R):
# bigger loop than this ...
input <- 1:5
myfunc <- function(ign) if ( (x <- runif(1)) < 0.2) stop(paste0("some error: ", x)) else x
set.seed(2)
ret <- lapply(input, function(i) {
tryCatch(list(result = myfunc(i), error = NA),
error = function(e) list(result = NA, error = e))
})
str(ret)
# List of 5
# $ :List of 2
# ..$ result: logi NA
# ..$ error :List of 2
# .. ..$ message: chr "some error: 0.18488225992769"
# .. ..$ call : language myfunc(i)
# .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ :List of 2
# ..$ result: num 0.702
# ..$ error : logi NA
# $ :List of 2
# ..$ result: num 0.573
# ..$ error : logi NA
# $ :List of 2
# ..$ result: logi NA
# ..$ error :List of 2
# .. ..$ message: chr "some error: 0.168051920365542"
# .. ..$ call : language myfunc(i)
# .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ :List of 2
# ..$ result: num 0.944
# ..$ error : logi NA
You can access just the (possibly empty) errors with:
str(lapply(ret, `[[`, "error"))
# List of 5
# $ :List of 2
# ..$ message: chr "some error: 0.18488225992769"
# ..$ call : language myfunc(i)
# ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ : logi NA
# $ : logi NA
# $ :List of 2
# ..$ message: chr "some error: 0.168051920365542"
# ..$ call : language myfunc(i)
# ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ : logi NA
You can also use the purrr package:
set.seed(2)
ret <- lapply(input, function(i) {
purrr::safely(myfunc)(i)
})
str(lapply(ret, `[[`, "error"))
# List of 5
# $ :List of 2
# ..$ message: chr "some error: 0.18488225992769"
# ..$ call : language .f(...)
# ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ : NULL
# $ : NULL
# $ :List of 2
# ..$ message: chr "some error: 0.168051920365542"
# ..$ call : language .f(...)
# ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ : NULL

How extract complete dataset from Amelia package

In mice package for extract complete dataset you can use complete() command as follow :
install.packages("mice")
library ("mice")
imp1=mice(nhanes,10)
fill1=complete(imp,1)
fill2=complete(imp,2)
fillall=complete(imp,"long")
But can some one tell me how to extract complete dataset in Amelia package??
install.packages("Amelia")
library ("Amelia")
imp2= amelia(freetrade, m = 5, ts = "year", cs = "country")

The str() function is always helpful here. You'll see that the complete datasets are stored in the imputations element of the object returned by amelia():
> str(imp2, 1)
List of 12
$ imputations:List of 5
..- attr(*, "class")= chr [1:2] "mi" "list"
$ m : num 5
$ missMatrix : logi [1:171, 1:10] FALSE FALSE FALSE FALSE FALSE FALSE ...
..- attr(*, "dimnames")=List of 2
$ overvalues : NULL
$ theta : num [1:9, 1:9, 1:5] -1 -0.0161 0.199 -0.0368 -0.0868 ...
$ mu : num [1:8, 1:5] -0.0161 0.199 -0.0368 -0.0868 -0.0658 ...
$ covMatrices: num [1:8, 1:8, 1:5] 0.8997 -0.3077 0.0926 0.2206 -0.1115 ...
$ code : num 1
$ message : chr "Normal EM convergence."
$ iterHist :List of 5
$ arguments :List of 23
..- attr(*, "class")= chr [1:2] "ameliaArgs" "list"
$ orig.vars : chr [1:10] "year" "country" "tariff" "polity" ...
- attr(*, "class")= chr "amelia"
To get each imputation alone, just do imp2$imputations[[1]], etc. up through all imputations that you requested. In your example, there are five:
> str(imp2$imputations, 1)
List of 5
$ imp1:'data.frame': 171 obs. of 10 variables:
$ imp2:'data.frame': 171 obs. of 10 variables:
$ imp3:'data.frame': 171 obs. of 10 variables:
$ imp4:'data.frame': 171 obs. of 10 variables:
$ imp5:'data.frame': 171 obs. of 10 variables:
- attr(*, "class")= chr [1:2] "mi" "list"

Rename subtitles of a list using a loop

I do have to rename sublist titles within a main matrix list called l1. Each Name(n) is related to a value as a character string. Here is my code :
names(l1)[1] <- Name1
names(l1)[2] <- Name2
names(l1)[3] <- Name3
names(l1)[4] <- Name4
## ...
names(l1)[43] <- Name43
As you can see, I have 43 sublists. Is there a way do do that using an automated loop like for (i in 1:43) or something ? I tried to perform a loop but I am a beginner and that's very hard for now.
Edit : I would like to rename the elements of my list without having to type 43 lines manually. Here is the first three elements of my list :
str(l1)
List of 43
$ XXX : num [1:640, 1:3] -0.83 -0.925 -0.623 -0.191 0.155 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "EV_BICYCLE" "HW_DISTANCE" "NO_ASSETS"
$ XXX : num [1:640, 1:2] -0.159 0.485 -0.686 -0.245 -3.361 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "HOME_OWN" "METRO_DISTANCE"
$ XXX : num [1:640, 1:3] -0.79 1.15 0.224 0.388 -1.571 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "BICYCLE" "HOME_OWN_SC" "POP_SC"
That is to say, I would like to replace the 43 XXX by Name1, Name2 ... to Name43

Try
names(l1) <- unlist(mget(ls(pattern="^Nom_F")))
str(l1, list.len=2)
#List of 3
# $ Accessibility : int [1:5, 1:5] 10 10 3 9 7 6 8 2 7 8 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:5] "A" "B" "C" "D" ...
# $ Access : int [1:5, 1:5] 6 4 10 5 9 8 9 4 7 1 ...
#..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:5] "A" "B" "C" "D" ...
Instead of creating separate objects, you could create a vector of real titles. For example
v1 <- LETTERS[1:3]
names(l1) <- v1
data
set.seed(42)
l1 <- setNames(lapply(1:3, function(x)
matrix(sample(1:10, 5*5, replace=TRUE), ncol=5,
dimnames=list(NULL, LETTERS[1:5]))), rep('XXX',3))
Nom_F1 <- "Accessibility"
Nom_F2 <- "Access"
Nom_F3 <- "Poverty_and_SC"

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R - low-level file IO - r

I am trying to read in a file in "flexible data format" using R. I got the number of bytes I should be reading in (counting from EOF, e.g., I should be reading EOF-32 to EOF bytes in as my data). I am seeking the equivalences to the fseek and fread from MATLAB in R.

Related

Group a list of elements into nested list based on pattern

How can I pass a list of names to list.select?

R - Register the Errors as a new Variable/Vector/Column

How extract complete dataset from Amelia package

Rename subtitles of a list using a loop

Categories

Resources