Extracting ancillary data from array in R

I have an object called data (representing a field of wind strengths at geographical locations) obtained by using some code to read in a .grib file:
> data
ecmf : u-component of wind
Time:
2020/07/09 z00:00 0-0 h
Domain summary:
601 x 351 domain
Projection summary:
proj= latlong
NE = ( 50 , 75 )
SW = ( -10 , 40 )
Data summary:
-89.06099 -50.08242 -41.13694 -43.42623 -34.77617 -25.03278
data is a 601 x 351 array of doubles:
> typeof(data)
[1] "double"
> is.array(data)
[1] TRUE
> dim(data)
[1] 601 351
but, as shown above, it also has extra information attached beyond the numerical values of the array elements (Time:, Projection summary etc). How do I extract these? Attempts such as data$time do not seem to work.

As suggested in the comments to the question, I was able to access the values I wanted using attributes(). attributes(data) returns a list of all the relevant elements.
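As a minimal sketch of that approach: the object below is a made-up stand-in for the GRIB-derived array, and the attribute names ("time", "domain") are assumptions. The real names depend on the package that read the .grib file, so check names(attributes(data)) first.

```r
# Toy stand-in for the GRIB-derived object: a numeric array carrying extra
# attributes. The attribute names "time" and "domain" are hypothetical.
wind <- array(rnorm(6), dim = c(3, 2))
attr(wind, "time") <- "2020/07/09 z00:00"
attr(wind, "domain") <- list(NE = c(50, 75), SW = c(-10, 40))

# List the names of every attribute attached to the object (includes "dim")
attr_names <- names(attributes(wind))
print(attr_names)

# Pull out a single attribute by name
wind_time <- attr(wind, "time")

# Attributes that are themselves lists can be indexed with $ or [[
north_east <- attr(wind, "domain")$NE
```

wind$time fails because $ is not defined for atomic vectors; attributes are metadata attached to the array rather than list elements, which is why attr() or attributes() is needed.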

Related

R: How to get column names for columns that contain a certain word AND their associated index number?

I want to create a list of column names that contain the word "arrest" AND their associated index number. I do not want all the columns, so I DO NOT want to subset the arrest columns into a new data frame. I merely want to see the list of names and their index numbers so I can delete the ones I don't want from the original data frame.
I tried getting the column names and their associated index numbers by using the below codes, but they only gave one or the other.
This gives me their names only
colnames(x2009_2014)[grepl("arrest",colnames(x2009_2014))]
[1] "poss_cannabis_tot_arrests" "poss_drug_total_tot_arrests"
[3] "poss_heroin_coke_tot_arrests" "poss_other_drug_tot_arrests"
[5] "poss_synth_narc_tot_arrests" "sale_cannabis_tot_arrests"
[7] "sale_drug_total_tot_arrests" "sale_heroin_coke_tot_arrests"
[9] "sale_other_drug_tot_arrests" "sale_synth_narc_tot_arrests"
[11] "total_drug_tot_arrests"
This gives me their index numbers only
grep("arrest", colnames(x2009_2014))
[1] 93 168 243 318 393 468 543 618 693 768 843
But I want their name AND index number so that it looks something like this
[93] "poss_cannabis_tot_arrests"
[168] "poss_drug_total_tot_arrests"
[243] "poss_heroin_coke_tot_arrests"
[318] "poss_other_drug_tot_arrests"
[393] "poss_synth_narc_tot_arrests"
[468] "sale_cannabis_tot_arrests"
[543] "sale_drug_total_tot_arrests"
[618] "sale_heroin_coke_tot_arrests"
[693] "sale_other_drug_tot_arrests"
[768] "sale_synth_narc_tot_arrests"
[843] "total_drug_tot_arrests"
Lastly, using advice here, I used the below code, but it did not work.
K=sapply(x2009_2014,function(x)any(grepl("arrest",x)))
which(K)
named integer(0)
The person who provided the advice in the above link used
K=sapply(df,function(x)any(grepl("\\D+",x)))
names(df)[K]
Zo.A Zo.B
which(K)
Zo.A Zo.B
2 4
I'd prefer the list I showed in the third block of code, but the code this person used provides a structure I can work with. It just did not work for me when I tried using it.
Hacky as a one-liner because I really dislike using <- inside a function call, but this should work:
setNames(
nm = matches <- grep("arrest", colnames(x2009_2014)),
colnames(x2009_2014)[matches]
)
Reproducible example:
setNames(nm = x <- grep("b|c", letters), letters[x])
# 2 3
# "b" "c"
Or write your own function that does it. Here I put it in a data frame, which seems nicer than a named vector:
grep_ind_value = function(pattern, x, ...) {
index = grep(pattern, x, ...)
value = x[index]
data.frame(index, value)
}
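A self-contained sketch of the same idea on a toy data frame. The data frame df and its columns are invented for illustration; they only stand in for x2009_2014.

```r
# Hypothetical data frame standing in for x2009_2014
df <- data.frame(
  county = c("A", "B"),
  poss_cannabis_tot_arrests = c(10, 20),
  population = c(1000, 2000),
  total_drug_tot_arrests = c(15, 30)
)

# Positions of the matching columns, then a lookup table of index and name
idx <- grep("arrest", names(df))
lookup <- data.frame(index = idx, name = names(df)[idx])
print(lookup)

# Drop an unwanted column using the index read off the lookup table
drop_idx <- lookup$index[lookup$name == "total_drug_tot_arrests"]
df_trimmed <- df[, -drop_idx, drop = FALSE]
print(names(df_trimmed))
```

Keeping the index and the name side by side in a data frame makes it easy to pick the columns to delete without subsetting the original data first.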

Character conversion from raw in R giving unwanted result

I have a web response being returned in raw format which I'm unable to properly encode. It contains the following values:
ef bc 86
The character is meant to be a Fullwidth Ampersand (to illustrate below):
> as.character("\uFF06")
[1] "＆"
> charToRaw("\uFF02")
[1] ef bc 82
However, no matter what I've tried it gets converted to ï¼‚. To illustrate:
> rawToChar(charToRaw("\uFF02"))
[1] "ï¼‚"
Because of the equivalence of the raw values, I don't think there's anything I can do in my web call to influence the problem I'm having (happy to be corrected). I believe I need to work out how to properly do the character encoding.
I also took an extreme approach of trying all other encodings as follows but none converted to the fullwidth ampersand:
> x_raw <- charToRaw("\uFF02")
> x_raw
[1] ef bc 82
> sapply(
+ stringi::stri_enc_list()
+ ,function(encoding) stringi::stri_encode(str = x_raw, encoding)
+ ) |> # R's new native pipe
+ tibble::enframe(name = "encoding")
# A tibble: 1,203 x 2
encoding value
<chr> <chr>
1 037 "Õ¯b"
2 273 "Õ¯b"
3 277 "Õ¯b"
4 278 "Õ¯b"
5 280 "Õ¯b"
6 284 "Õ¯b"
7 285 "Õ~b"
8 297 "Õ¯b"
9 420 "\u001a\u001ab"
10 424 "\u001a\u001ab"
# ... with 1,193 more rows
My work around at the moment is to replace the strings after the encoding, but this character is just one example of many, and hard-coding every instance doesn't seem practical.
> rawToChar(x_raw)
[1] "ï¼‚"
> stringr::str_replace_all(rawToChar(x_raw), c("ï¼‚" = "\uFF06"))
[1] "＆"
The substitution workaround is further complicated by characters like the HYPHEN (not HYPHEN-MINUS), where the last two raw values are converted to what appear to be octal escapes in the resulting string:
> as.character("\u2010") # HYPHEN
[1] "‐"
> as.character("\u2010") |> charToRaw() # As raw
[1] e2 80 90
> as.character("\u2010") |> charToRaw() |> rawToChar() # Converted back to string
[1] "â€\u0090"
> charToRaw("â\200\220") # string with equivalent raw
[1] e2 80 90
Any help appreciated.
I'm not totally clear on exactly what you are trying to do, but the problem with getting back your original character is that R cannot determine the encoding automatically from the raw bytes. I assume you are on Windows. If you do
val <- rawToChar(charToRaw("\uFF06"))
val
# [1] "ï¼†"
Encoding(val)
# [1] "unknown"
Encoding(val) <- "UTF-8"
val
# [1] "＆"
Just make sure to set the encoding properly.
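The same fix can be checked without relying on how the console renders the character. This is a small sketch using the three bytes from the question; the variable names are arbitrary.

```r
# Bytes as they might arrive in a raw web response (values from the question)
x_raw <- as.raw(c(0xef, 0xbc, 0x86))

x <- rawToChar(x_raw)
enc_before <- Encoding(x)   # "unknown": R has not been told how to read the bytes

# Declare the encoding. This marks the string; it does not convert any bytes.
Encoding(x) <- "UTF-8"

code_point <- utf8ToInt(x)          # 65286, i.e. U+FF06 FULLWIDTH AMPERSAND
same_bytes <- identical(charToRaw(x), x_raw)
```

Because Encoding<- only marks the string, it is the right tool when the bytes are already valid UTF-8, which is what the raw values here suggest. If the response were in some other encoding, iconv() would be needed to convert it instead.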

How to create specific columns out of text in R

Here is an example I hope you can help me with. The input is a line from a txt file; I want to transform it into a table (see the desired output) and save it as a csv or tsv file.
I have tried with separate functions but could not get it right.
Input
"PR7 - Autres produits d'exploitation 6.9 371 667 1 389"
Desired output
Variable                              note  2020  2019  2018
PR7 - Autres produits d'exploitation  6.9   371   667   1389
I'm assuming that this badly delimited data-set is the only place where you can read your data.
I created for the purpose of this answer an example file (that I called PR.txt) that contains only the two following lines.
PR6 - Blabla 10 156 3920 245
PR7 - Autres produits d'exploitation 6.9 371 667 1389
First I create a function to parse each line of this data-set. I'm assuming here that the original file does not contain the names of the columns. In reality it probably does, but the function could easily be adapted to take a first "header" line into account.
readBadlyDelimitedData <- function(x) {
# Read the data
dat <- read.table(text = x)
# Get the type of each column
whatIsIt <- sapply(dat, typeof)
# Combine the columns that are of type "character"
variable <- paste(dat[whatIsIt == "character"], collapse = " ")
# Put everything in a data-frame
res <- data.frame(
variable = variable,
dat[, whatIsIt != "character"])
# Change the names
names(res)[-1] <- c("note", "Year2021", "Year2020", "Year2019")
return(res)
}
Note that I do not call the columns with the yearly figure by only "numeric" names because giving rows or columns purely "numerical" names is not a good practice in R.
Once I have this function, I can (l)apply it to each line of the data by combining it with readLines, and collapse all the lines with an rbind.
out <- do.call("rbind", lapply(readLines("tests/PR.txt"), readBadlyDelimitedData))
out
                              variable note Year2021 Year2020 Year2019
1                         PR6 - Blabla 10.0      156     3920      245
2 PR7 - Autres produits d'exploitation  6.9      371      667     1389
Finally, I save the result with write.csv:
write.csv(out, file = "correctlyDelimitedFile.csv")
If you can get your hands on the Excel file, a simple gdata::read.xls or openxlsx::read.xlsx would be enough to read the data.
I wish I knew how to make the script simpler... maybe a tidyr magic person would have a more elegant solution?
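One possibly simpler variant is to split each line with a regular expression. This is only a sketch: it assumes every line ends in exactly four numeric fields and that thousands are written without a space ("1389", not "1 389"). The function name parse_line is made up, and the column names are copied from the function above.

```r
# Split a line into a free-text label followed by four trailing numbers.
# Assumption: exactly four numeric fields at the end, no thousands separator.
parse_line <- function(line) {
  pattern <- "^(.*?)\\s+([0-9.]+)\\s+([0-9.]+)\\s+([0-9.]+)\\s+([0-9.]+)$"
  # parts[1] is the full match, parts[2:6] are the five capture groups
  parts <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]
  data.frame(
    variable = parts[2],
    note     = as.numeric(parts[3]),
    Year2021 = as.numeric(parts[4]),
    Year2020 = as.numeric(parts[5]),
    Year2019 = as.numeric(parts[6]),
    stringsAsFactors = FALSE
  )
}

lines <- c(
  "PR6 - Blabla 10 156 3920 245",
  "PR7 - Autres produits d'exploitation 6.9 371 667 1389"
)
out <- do.call(rbind, lapply(lines, parse_line))
print(out)
```

The anchored pattern avoids guessing column types, at the cost of failing on lines that do not end in four numbers, so malformed lines would need to be filtered out first.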

R: Retrieving multiple variable of a nested list

I am looking at vote data stored as a nested list, and I am trying to get several variables from each element of the list (example below).
For each "vote" element, I am trying to get the uid and the list of individuals who voted for or against the law ("pours" and "contres").
I tried to simplify the original data (which can be found here).
This is the simplified list I came up with:
scrutin1_detail<-list(uid="VTANR5L14V1",organref="P0644420")
scrutin1_vote1_for<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin1_vote2_for<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin1_vote3_for<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin1_vote_for<-list(scrutin1_vote1_for,scrutin1_vote2_for,scrutin1_vote3_for)
scrutin1_vote1_against<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin1_vote2_against<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin1_vote3_against<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin1_vote_against<-list(scrutin1_vote1_against,scrutin1_vote2_against,scrutin1_vote3_against)
votant1<-list(pours=scrutin1_vote_for,contres=scrutin1_vote_against)
vote1<-list(decompte_nominatif=votant1)
ventilationVotes1<-list(vote=vote1)
scrutin1<-list(scrutin1_detail,list(ventilationVotes=ventilationVotes1))
# Scrutin 2
scrutin2_detail<-list(uid="VTANR5L14V5",organref="P0644423")
scrutin2_vote1_for<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin2_vote2_for<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin2_vote3_for<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin2_vote_for<-list(scrutin2_vote1_for,scrutin2_vote2_for,scrutin2_vote3_for)
scrutin2_vote1_against<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin2_vote2_against<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin2_vote3_against<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin2_vote_against<-list(scrutin2_vote1_against,scrutin2_vote2_against,scrutin2_vote3_against)
scrutin2_votant1<-list(pours=scrutin2_vote_for,contres=scrutin2_vote_against)
scrutin2_vote1<-list(decompte_nominatif=scrutin2_votant1)
scrutin2_ventilationVotes1<-list(vote=scrutin2_vote1)
scrutin2<-list(scrutin2_detail,list(ventilationVotes=scrutin2_ventilationVotes1))
scrutins<-list(scrutins=list(scrutin=list(scrutin1,scrutin2)))
In the end, I want to build a data frame with these columns (but I am really interested in understanding how to do it, as I run into this problem quite often):
- uid
- for/against (whether it was in the list "pours" (for) or "contres" (against))
- acteurref
- mandatRef
Sadly, I don't speak (or read) French, so I am not able to make many correct guesses as to the meaning of the names of items in the object constructed using alistaire's suggestion:
library(jsonlite)
scrutin1_detail <- fromJSON("~/Downloads/Scrutins_XIV.json")
> length(scrutin1_detail[[1]])
[1] 1
> length(scrutin1_detail[[1]][[1]])
[1] 18
> names(scrutin1_detail[[1]][[1]])
[1] "#xmlns:xsi" "uid"
[3] "numero" "organeRef"
[5] "legislature" "sessionRef"
[7] "seanceRef" "dateScrutin"
[9] "quantiemeJourSeance" "typeVote"
[11] "sort" "titre"
[13] "demandeur" "objet"
[15] "modePublicationDesVotes" "syntheseVote"
[17] "ventilationVotes" "miseAuPoint"
> str(scrutin1_detail[[1]][[1]]$uid)
chr [1:1219] "VTANR5L14V1" "VTANR5L14V2" "VTANR5L14V3" ...
> table( scrutin1_detail[[1]][[1]]$organeRef)
PO644420
1219
> table( scrutin1_detail[[1]][[1]]$sessionRef)
SCR5A2012E1 SCR5A2012E2 SCR5A2013E1 SCR5A2013E3 SCR5A2013O1 SCR5A2014E1
         15           5          42           4         529          50
SCR5A2014E2 SCR5A2014O1 SCR5A2015E1 SCR5A2015E2 SCR5A2015O1 SCR5A2016O1
          7         253          18           5         236          55
Maybe you should help us Anglophones to make sense of this. It is very beneficial to provide context rather than just code.
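For the simplified list in the question, one way to get to a data frame is to write a small function per nesting level and combine the pieces with rbind. The sketch below rebuilds a toy list with the same shape as the question's scrutins object. The helper names (make_voter, make_scrutin, voters_to_df, scrutin_to_df) are made up, and the real JSON is shaped differently, so the accessors would need adjusting there.

```r
# Toy data shaped like the question's simplified list (helpers are hypothetical)
make_voter <- function(a, m) list(acteurref = a, mandatRef = m)
make_scrutin <- function(uid, pours, contres) {
  list(
    list(uid = uid),
    list(ventilationVotes = list(vote = list(
      decompte_nominatif = list(pours = pours, contres = contres)
    )))
  )
}
scrutins <- list(scrutins = list(scrutin = list(
  make_scrutin("VTANR5L14V1",
               pours   = list(make_voter("PA1816", "PM645051"),
                              make_voter("PA1817", "PM645052")),
               contres = list(make_voter("PA1818", "PM645053"))),
  make_scrutin("VTANR5L14V5",
               pours   = list(make_voter("PA1818", "PM645053")),
               contres = list(make_voter("PA1816", "PM645051")))
)))

# One row per voter; rep() keeps the columns aligned even for an empty list
voters_to_df <- function(voters, uid, position) {
  n <- length(voters)
  data.frame(
    uid       = rep(uid, n),
    position  = rep(position, n),
    acteurref = vapply(voters, function(v) v$acteurref, character(1)),
    mandatRef = vapply(voters, function(v) v$mandatRef, character(1)),
    stringsAsFactors = FALSE
  )
}

# One ballot: read the uid, then stack the "for" and "against" voters
scrutin_to_df <- function(s) {
  uid <- s[[1]]$uid
  dn  <- s[[2]]$ventilationVotes$vote$decompte_nominatif
  rbind(voters_to_df(dn$pours, uid, "for"),
        voters_to_df(dn$contres, uid, "against"))
}

votes <- do.call(rbind, lapply(scrutins$scrutins$scrutin, scrutin_to_df))
print(votes)
```

The general pattern is to write one function that turns the innermost repeated unit into rows, lapply it over each level, and rbind the results. The same pattern carries over to other nested lists once the path to the repeated unit is known.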

Create separate vectors for each of a data frame's columns (variables)

Goal: Take a data frame and create separate vectors for each of its columns (variables).
The following code gets me close:
batting <- read.csv("mlb_2014.csv", header = TRUE, sep = ",")
hr <- batting[(batting$HR >= 20 & batting$PA >= 100), ]
var_names <- colnames(hr)
for(i in var_names) {
path <- paste("hr$", i, sep = "")
assign(i, as.vector(path))
}
It creates a vector for each column in the data frame, as shown by the output below:
> ls()
[1] "AB" "Age" "BA" "batting" "BB" "CS"
[7] "G" "GDP" "H" "HBP" "hr" "HR"
[13] "i" "IBB" "Lg" "Name" "OBP" "OPS"
[19] "OPS." "PA" "path" "Pos.Summary" "R" "RBI"
[25] "SB" "SF" "SH" "SLG" "SO" "TB"
[31] "Tm" "var_names" "X2B" "X3B"
So far so good until you call one of the vectors. For example:
AB
[1] "hr$AB"
Alas, all that is created is a one-element character vector, when what I want it to create is this...
> AB <- as.vector(hr$AB)
> AB
[1] 459 456 506 417 492 496 404 430 497 346 494 501 415 370 500 331 501 539 456 443 316 437
[23] 449 526 349 486 432 480 295 489 354 506 315 471
...for each variable in the original data frame.
How do I get R to recognize the elements in the character vector "path" as objects to call in the assign function, rather than an individual character element to assign to the vector I'm creating? I would like to keep this within the loop framework, since the main motivation behind this project is to teach myself how to use loops.
Thanks!
We have list2env for this:
list2env(iris, .GlobalEnv)
head(Species)
#[1] setosa setosa setosa setosa setosa setosa
#Levels: setosa versicolor virginica
However, there is almost never a reason to pollute your workspace like that.
Edit:
Here is how you can do this with a loop:
var_names <- colnames(iris)
for(i in var_names) {
assign(i, iris[[i]])
}
Note that instead of creating your paths I use [[ to access the data.frame columns. If you have a column name as a character vector, that (or [) is the way to use this character to access the column.
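To make the difference concrete, here is a small sketch with a made-up stand-in for the hr data frame. It also writes into a separate environment rather than the workspace, which is one possible way to keep the loop without cluttering the global environment.

```r
# Made-up stand-in for the filtered batting data
hr <- data.frame(AB = c(459, 456, 506), HR = c(22, 31, 25))

i <- "AB"
path <- paste0("hr$", i)   # just the string "hr$AB"; nothing is looked up
column <- hr[[i]]          # the actual column, selected by the name stored in i

# Same loop as above, but assigning into a dedicated environment
cols <- new.env()
for (i in colnames(hr)) {
  assign(i, hr[[i]], envir = cols)
}
home_runs <- get("HR", envir = cols)
```

paste() only builds text, so as.vector(path) has nothing to evaluate. [[ takes the name as a value and does the lookup itself.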
As @Roland mentions, you generally don't want to do that. Life is easier in the long run if you keep things together in lists, environments, or data frames.
A better approach is to learn how to use the with, within and related functions. These will temporarily attach a list, environment, or data frame to the beginning of the search path so that you can refer to the elements/columns directly by name:
> with(iris, head( Sepal.Width/Petal.Length ) )
[1] 2.500000 2.142857 2.461538 2.066667 2.571429 2.294118
These functions give you the convenience without polluting the global environment or search path.

Resources