Find the min and max from an unstructured array


I have the following vector, which shows the possible values that a variable can take. As you can see, it's not user-friendly, and I'm having a hard time finding a systematic way to go through it and identify the min and max values. Does anyone have any suggestions?
[211] "-1\n1-960" "-1\n1-960"
[213] "-1\n1-960" "-1\n1\n2\n3"
[215] "-1\n0\n1\n\n2\n3\n\n4\n\n5" "-1\nF\nG\nH\nP\nR\nS\nU"
[217] "-1\n0\n1\n2\n3" "-1\n0\n1"
[219] "-1\n0\n1\n2\n3\n4\n5\n6" "-1\n0-255"
[221] "-1\n0-255" "-1\n0-255"
[223] "-1\n0-255" "-1\n0-255"
[225] "-1\n0\n0.01–0.99\n1\n1.01–99.99" "-1\n0\n1\n2\n3\n4\n5\n\n6\n\n7\n8\n\n9\n10\n11\n12"
[227] "-1\n0\n1\n\n2\n\n3\n4\n5\n\n6" "-1\n0\n1\n2\n\n3\n\n4\n5\n6"
The value "-1\n1-960" means that the possible values range from 1 to 960. The -1 doesn't mean anything and should be disregarded, along with all letters.
For example:
"-1\n1-960"
"-1\n0\n1\n\n2\n\n3\n4\n5\n\n6" "-1\n0\n1\n2\n\n3\n\n4\n5\n6"
Should result in:
max min
960 1
6 0
6 0

After removing the leading -1, you can split on newlines. Then, since a - means a range, you can also split on - characters, as the two numbers give the min and max of the range. So here's some code:
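lapply(
  strsplit(
    gsub('^-1\n', '', dat),  # drop the leading -1
    '\n|-'                   # split on newlines and on range hyphens
  ),
  function(x) range(x)       # x is still character, so this range is alphabetical
)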
[[1]]
[1] "1" "960"
[[2]]
[1] "1" "960"
[[3]]
[1] "1" "960"
[[4]]
[1] "1" "3"
[[5]]
[1] "" "5"
[[6]]
[1] "F" "U"
[[7]]
[1] "0" "3"
[[8]]
[1] "0" "1"
[[9]]
[1] "0" "6"
[[10]]
[1] "0" "255"
[[11]]
[1] "0" "255"
[[12]]
[1] "0" "255"
[[13]]
[1] "0" "255"
[[14]]
[1] "0" "255"
[[15]]
[1] "0" "1.01–99.99"
[[16]]
[1] "" "9"
[[17]]
[1] "" "6"
[[18]]
[1] "" "6"
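The range() calls above compare strings, so empty strings sort first, letters are kept, and "1.01–99.99" survives intact because its en dash (–) is not the ASCII hyphen in the split pattern. Below is a minimal sketch of the same split-then-range idea that converts the pieces to numbers first. It assumes, as the question says, that the leading -1 and all letters should be ignored, and that both - and – mark ranges. The shortened dat vector and the min_max() helper are illustrative, not part of the original answer.

```r
# Illustrative subset of the question's data ("\u{2013}" is the en dash)
dat <- c("-1\n1-960", "-1\n1\n2\n3", "-1\nF\nG\nH\nP\nR\nS\nU",
         "-1\n0\n0.01\u{2013}0.99\n1\n1.01\u{2013}99.99",
         "-1\n0\n1\n\n2\n\n3\n4\n5\n\n6")

# Hypothetical helper: numeric min and max of one entry
min_max <- function(s) {
  s <- sub("^-1\n", "", s)                          # drop the leading -1 marker
  parts <- strsplit(s, "\n|-|\u{2013}")[[1]]        # newlines, hyphens, en dashes
  nums <- suppressWarnings(as.numeric(parts))       # letters and "" become NA
  nums <- nums[!is.na(nums)]
  if (length(nums) == 0) return(c(min = NA_real_, max = NA_real_))
  c(min = min(nums), max = max(nums))
}

res <- t(vapply(dat, min_max, numeric(2), USE.NAMES = FALSE))
res
```

Each row of res holds the numeric min and max for one entry; the all-letter entry gives NA in both columns, which you may want to filter out or handle separately.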

Expanding my comment with additional code which might or might not be a partial answer:
I'm guessing that -255 is some sort of missing-value marker. Some of those character values could currently be parsed in R as numeric values, but others would throw an error if you tried to parse them as such. What were you expecting from 1-960? That's an expression, so neither numeric nor character.
dat <- c( "-1\n1-960" , "-1\n1-960",
"-1\n1-960" , "-1\n1\n2\n3" ,
"-1\n0\n1\n\n2\n3\n\n4\n\n5" , "-1\nF\nG\nH\nP\nR\nS\nU",
"-1\n0\n1\n2\n3" , "-1\n0\n1" ,
"-1\n0\n1\n2\n3\n4\n5\n6" , "-1\n0-255" ,
"-1\n0-255" , "-1\n0-255" ,
"-1\n0-255" , "-1\n0-255" ,
"-1\n0\n0.01–0.99\n1\n1.01–99.99" , "-1\n0\n1\n2\n3\n4\n5\n\n6\n\n7\n8\n\n9\n10\n11\n12" ,
"-1\n0\n1\n\n2\n\n3\n4\n5\n\n6" , "-1\n0\n1\n2\n\n3\n\n4\n5\n6" )
scandat <- sapply( dat, function(x) try( scan(textConnection(x)) ) )
# Lots of error messages, but wrapping the scan call in try() lets it continue
# So these are the items that could be parsed as numeric:
> scandat[ sapply(scandat,class)=="numeric" ]
$`-1\n1\n2\n3`
[1] -1 1 2 3
$`-1\n0\n1\n\n2\n3\n\n4\n\n5`
[1] -1 0 1 2 3 4 5
$`-1\n0\n1\n2\n3`
[1] -1 0 1 2 3
$`-1\n0\n1`
[1] -1 0 1
$`-1\n0\n1\n2\n3\n4\n5\n6`
[1] -1 0 1 2 3 4 5 6
$`-1\n0\n1\n2\n3\n4\n5\n\n6\n\n7\n8\n\n9\n10\n11\n12`
[1] -1 0 1 2 3 4 5 6 7 8 9 10 11 12
$`-1\n0\n1\n\n2\n\n3\n4\n5\n\n6`
[1] -1 0 1 2 3 4 5 6
$`-1\n0\n1\n2\n\n3\n\n4\n5\n6`
[1] -1 0 1 2 3 4 5 6
I'm not cleaning this up, but you could replace the funky names with something else and it would print better:
> sapply( scandat[ sapply(scandat,class)=="numeric" ], function(x) list(minx=min(x), maxx=max(x) )
+ )
-1\n1\n2\n3 -1\n0\n1\n\n2\n3\n\n4\n\n5 -1\n0\n1\n2\n3 -1\n0\n1 -1\n0\n1\n2\n3\n4\n5\n6
minx -1 -1 -1 -1 -1
maxx 3 5 3 1 6
-1\n0\n1\n2\n3\n4\n5\n\n6\n\n7\n8\n\n9\n10\n11\n12 -1\n0\n1\n\n2\n\n3\n4\n5\n\n6 -1\n0\n1\n2\n\n3\n\n4\n5\n6
minx -1 -1 -1
maxx 12 6 6
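One way to act on the renaming suggestion, sketched under a few assumptions: the entries get short positional names ("v1", "v2", ...) so you can still trace each column back to dat, and the -1 is dropped before taking the minimum, as the question asks. The three-element dat here is an illustrative subset, and scan(text = ...) is used in place of textConnection().

```r
# Illustrative subset of the question's data
dat <- c("-1\n1-960", "-1\n1\n2\n3", "-1\n0\n1")

# try() keeps going when scan() cannot parse an entry as numbers
scandat <- lapply(dat, function(x) try(scan(text = x, quiet = TRUE), silent = TRUE))
names(scandat) <- paste0("v", seq_along(dat))  # v1 = dat[1], v2 = dat[2], ...

ok <- vapply(scandat, is.numeric, logical(1))  # try-error results are not numeric
summary_tab <- sapply(scandat[ok], function(x) {
  x <- x[x != -1]                               # ignore the -1 placeholder
  c(minx = min(x), maxx = max(x))
})
summary_tab
```

Entries that failed to parse (like v1 here) are simply absent from summary_tab; their ranges still need string handling of the kind shown in the previous answer.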

Related

How to get an element from a text string in R

> my_data <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25"
> lengths(gregexpr(",", my_data))+1
[1] 10
I need to get each element individually. I tried with
print(gregexpr(",", my_data))[[1]][1]
> print(gregexpr(",", my_data))[[1]][1]
[[1]]
[1] 3 6 17 19 21 33 44 48 51
attr(,"match.length")
[1] 1 1 1 1 1 1 1 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[1] 3
But the first element of my_data is "08", while this displays 3. Can anyone give me the correct syntax to display every element?

library(tidyverse)
strings <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25" %>%
str_split(pattern = ",") %>%
unlist()
strings[1]
#> [1] "08"
Created on 2022-06-29 by the reprex package (v2.0.1)

Let's try scan
> scan(text = my_data, what = "",sep = ",",quiet = TRUE)
[1] "08" "23" "02.06.2022" "5" "7"
[6] "THISPRODUCT" "09.02.2022" "yes" "89" "25"

Using lapply:
lapply(strsplit(my_data, ","), `[`)
Output:
[[1]]
[1] "08" "23" "02.06.2022" "5" "7" "THISPRODUCT" "09.02.2022" "yes"
[9] "89" "25"

You can simply do:
unlist(strsplit(my_data, split = ","))
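The 3 in the question is the position of the first comma: gregexpr(",", my_data) reports where the pattern matches, not the text between matches. To stay with gregexpr(), one option is to match the runs of non-comma characters instead and pull them out with regmatches(). A short sketch; note that, unlike strsplit(), this pattern silently skips empty fields such as ",,".

```r
my_data <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25"

# "[^,]+" matches each run of characters that are not commas
fields <- regmatches(my_data, gregexpr("[^,]+", my_data))[[1]]
fields[1]
fields
```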

Replacing values in a list based on a condition

I have a list of values called squares and would like to replace all values that are 0 with 40.
I tried:
replace(squares, squares==0, 40)
but the list remains unchanged

If it is a list, loop through it with lapply and use replace:
squares <- lapply(squares, function(x) replace(x, x==0, 40))
squares
#[[1]]
#[1] 40 1 2 3 4 5
#[[2]]
#[1] 1 2 3 4 5 6
#[[3]]
#[1] 40 1 2 3
data
squares <- list(0:5, 1:6, 0:3)
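The lapply() version assumes a flat list of vectors. If the list can itself contain lists, base R's rapply() with how = "replace" applies the same replace() call to every non-list element while keeping the nesting. A minimal sketch; the nested example list is made up for illustration.

```r
# Hypothetical nested input: a vector plus a sub-list of vectors
nested <- list(0:5, list(0:2, 7))

# rapply() walks the list recursively; how = "replace" keeps the structure
# and swaps each leaf for f(leaf)
res <- rapply(nested, function(x) replace(x, x == 0, 40), how = "replace")
str(res)
```

As with the lapply() answer, the result has to be assigned back (for example squares <- rapply(...)); replace() returns a modified copy rather than changing the list in place.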

I think for this purpose, you can just treat it as if it were a vector as follows:
squares <- list(2, 4, 6, 0, 8, 0, 10, 20)
squares[squares == 0] <- 40
Output:
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 40
[[5]]
[1] 8
[[6]]
[1] 40
[[7]]
[1] 10
[[8]]
[1] 20

Getting all splits of numeric sequence in R

I'm trying to get all the possible splits of a sequence [1:n] in R. E.g.:
getSplits(0,3)
Should return all possible splits of the sequence 123, in other words (in a list of vectors):
[1] 1
[2] 1 2
[3] 1 2 3
[4] 1 3
[5] 2
[6] 2 3
[7] 3
Now I've created a function that does reach these vectors recursively, but I'm having trouble combining them into one list as above. My function is:
getSplits <- function(currentDigit, lastDigit, split) {
  splits <- list()
  for (nextDigit in currentDigit:lastDigit) {
    currentSplit <- c(split, nextDigit)
    print(currentSplit)
    if (nextDigit < lastDigit) {
      possibleSplits <- c(list(currentSplit), getSplits(nextDigit + 1, lastDigit, currentSplit))
    } else {
      possibleSplits <- currentSplit
    }
    splits <- c(splits, list(possibleSplits))
  }
  return(splits)
}
Printing each currentSplit gives all the vectors I need, but somehow the final returned list (splits) nests them into deeper levels of lists, returning:
[1] 1
[[1]][[2]]
[[1]][[2]][[1]]
[1] 1 2
[[1]][[2]][[2]]
[1] 1 2 3
[[1]][[3]]
[1] 1 3
[[2]]
[[2]][[1]]
[1] 2
[[2]][[2]]
[1] 2 3
[[3]]
[1] 3
For the corresponding function call getSplits(1, 3, c()).
If anyone could help me out on getting this to work the way I described above, it'd be much appreciated!
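The nesting comes from the last line of the loop: `c(list(currentSplit), getSplits(...))` is already a flat list, but wrapping it in `list()` again before appending adds a level per recursion. A sketch of a modified version of the question's function (not taken from the answers below) appends each vector once with `list()` and splices the recursive result in with plain `c()`, so everything lands in one flat list.

```r
# Modified version of the question's getSplits(); split defaults to NULL
getSplits <- function(currentDigit, lastDigit, split = c()) {
  splits <- list()
  for (nextDigit in currentDigit:lastDigit) {
    currentSplit <- c(split, nextDigit)
    splits <- c(splits, list(currentSplit))  # add this vector as one element
    if (nextDigit < lastDigit) {
      # c() of two lists concatenates them, so the recursive results stay flat
      splits <- c(splits, getSplits(nextDigit + 1, lastDigit, currentSplit))
    }
  }
  splits
}

getSplits(1, 3)
```

For getSplits(1, 3) this yields 1; 1 2; 1 2 3; 1 3; 2; 2 3; 3, in the order listed in the question.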

character vector output
Try combn:
k <- 3
s <- unlist(lapply(1:k, combn, x = k, toString))
s
## [1] "1" "2" "3" "1, 2" "1, 3" "2, 3" "1, 2, 3"
data frame output
If you would prefer that the output be in the form of a data frame:
read.table(text = s, header = FALSE, sep = ",", fill = TRUE, col.names = 1:k)
giving:
X1 X2 X3
1 1 NA NA
2 2 NA NA
3 3 NA NA
4 1 2 NA
5 1 3 NA
6 2 3 NA
7 1 2 3
list output
or a list:
lapply(s, function(x) scan(textConnection(x), quiet = TRUE, sep = ","))
giving:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 1 2
[[5]]
[1] 1 3
[[6]]
[1] 2 3
[[7]]
[1] 1 2 3
Update: I have incorporated the improvement mentioned in the comments plus one further simplification, and added data frame and list output.

Here is another approach:
f <- function(nums) lapply(seq_along(nums), function(m) t(combn(nums, m = m)))
f(1:3)
This yields
[[1]]
[,1]
[1,] 1
[2,] 2
[3,] 3
[[2]]
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 2 3
[[3]]
[,1] [,2] [,3]
[1,] 1 2 3

The OP is looking for the power set of c(1,2,3). Several packages will get you this in one line. Using the package rje, we have:
library(rje)
powerSet(c(1,2,3))
[[1]]
numeric(0)
[[2]]
[1] 1
[[3]]
[1] 2
[[4]]
[1] 1 2
[[5]]
[1] 3
[[6]]
[1] 1 3
[[7]]
[1] 2 3
[[8]]
[1] 1 2 3
... and with iterpc:
library(iterpc)
getall(iterpc(c(2,1,1,1), 3, labels = 0:3))
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 0 0 2
[3,] 0 0 3
[4,] 0 1 2
[5,] 0 1 3
[6,] 0 2 3
[7,] 1 2 3
More generally,
n <- 3
getall(iterpc(c(n-1,rep(1, n)), n, labels = 0:n)) ## same as above
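For readers who would rather avoid a package dependency, the power set can also be enumerated in base R by treating each integer from 0 to 2^n - 1 as a bit mask: bit j set means "include x[j]". A minimal sketch; power_set() is a hypothetical helper, and this ordering happens to match the rje output shown earlier.

```r
# Hypothetical helper: power set of x via bit masks
power_set <- function(x) {
  n <- length(x)
  bits <- 2^(seq_len(n) - 1)  # 1, 2, 4, ...: one bit per element
  lapply(0:(2^n - 1), function(mask) x[bitwAnd(mask, bits) != 0])
}

ps <- power_set(c(1, 2, 3))
ps[-1]  # drop the empty set to get the seven splits the question asks for
```

The number of subsets doubles with each element, so this (like the other answers) is only practical for small n.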

convert character string into integer for modulo operation

I want to map md5 hashed character strings to weekday numbers (0-6) via modulo operation. Therefore I need to transform the character hashes into integers (numeric). I haven't found a way to output the hashes in byte form instead of ascii strings (via digest package). Any hints with base R or different approaches appreciated.

If you really want to do this, you'll require multiple-precision arithmetic, because a single md5 hash has 128 bits, which is too large to fit into a normal integer value. This can be done using the gmp package.
library('digest')
library('gmp')
as.integer(do.call(c, lapply(strsplit(sapply(letters, digest, 'md5'), ''), function(x)
  sum(as.bigz(match(x, c(0:9, letters[1:6])) - 1) * as.bigz(16)^((length(x) - 1):0))
)) %% 7)
## [1] 3 2 1 1 5 5 5 5 1 4 4 6 5 3 5 4 0 2 0 4 5 4 6 3 6 1
Let's break that down:
sapply(letters,digest,'md5')
## a b c ...
## "127a2ec00989b9f7faf671ed470be7f8" "ddf100612805359cd81fdc5ce3b9fbba" "6e7a8c1c098e8817e3df3fd1b21149d1" ...
I wanted to design this algorithm to be vectorized, and decided to use the built-in letters vector as 26 arbitrary input values for demonstration purposes. Unfortunately the dream of a fully vectorized algorithm (i.e. with no hidden loops) was dashed right away, since digest() is not vectorized for some reason, which is why I had to use sapply() here to produce a vector of md5 hashes corresponding to the inputs.
strsplit(...,'')
## $a
## [1] "1" "2" "7" "a" "2" "e" "c" "0" "0" "9" "8" "9" "b" "9" "f" "7" "f" "a" "f" "6" "7" "1" "e" "d" "4" "7" "0" "b" "e" "7" "f" "8"
##
## $b
## [1] "d" "d" "f" "1" "0" "0" "6" "1" "2" "8" "0" "5" "3" "5" "9" "c" "d" "8" "1" "f" "d" "c" "5" "c" "e" "3" "b" "9" "f" "b" "b" "a"
##
## $c
## [1] "6" "e" "7" "a" "8" "c" "1" "c" "0" "9" "8" "e" "8" "8" "1" "7" "e" "3" "d" "f" "3" "f" "d" "1" "b" "2" "1" "1" "4" "9" "d" "1"
## ...
Splits the hashes into character vectors, each element being one hex digit of the hash. We now have a list of 26 character vectors.
lapply(..., function(x) ... )
Process each character vector one at a time. Diving into the function (example output will be given for the value of x corresponding to input string 'a'):
match(x,c(0:9,letters[1:6]))-1
## [1] 1 2 7 10 2 14 12 0 0 9 8 9 11 9 15 7 15 10 15 6 7 1 14 13 4 7 0 11 14 7 15 8
This returns the value of each digit as a plain old integer, by finding the index within the hex digit sequence (c(0:9,letters[1:6])) and subtracting one.
as.bigz(...)
## Big Integer ('bigz') object of length 32:
## [1] 1 2 7 10 2 14 12 0 0 9 8 9 11 9 15 7 15 10 15 6 7 1 14 13 4 7 0 11 14 7 15 8
Cast to big integer, required for the arithmetic we're about to do.
...*as.bigz(16)^((length(x)-1):0)
## Big Integer ('bigz') object of length 32:
## [1] 21267647932558653966460912964485513216 2658455991569831745807614120560689152 581537248155900694395415588872650752 51922968585348276285304963292200960 649037107316853453566312041152512
## [6] 283953734451123385935261518004224 15211807202738752817960438464512 0 0 2785365088392105618523029504
## [11] 154742504910672534362390528 10880332376531662572355584 831136500985057557610496 42501298345826806923264 4427218577690292387840
## [16] 129127208515966861312 17293822569102704640 720575940379279360 67553994410557440 1688849860263936
## [21] 123145302310912 1099511627776 962072674304 55834574848 1073741824
## [26] 117440512 0 720896 57344 1792
## [31] 240 8
Treating the hash as a big-endian hex number, multiply each digit value by its place value.
sum(...)
## Big Integer ('bigz') :
## [1] 24560512346470571536449760694956189688
Add up each place-value-weighted digit value to get the bigz representation of the hash.
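Together, the place-value and summation steps compute the ordinary positional value of the hex string. Written as a formula, with d_0, ..., d_31 the 32 digit values read from left to right (notation introduced here for illustration), the bigz value is:

```latex
N = \sum_{i=0}^{31} d_i \cdot 16^{\,31-i}
```

For the hash of 'a', the leading term is 1 · 16^31 = 21267647932558653966460912964485513216, the first entry in the place-value output above.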
This completes the lapply() function. Thus, coming out of the lapply() call is a list of bigz values corresponding to the hashes:
lapply(..., function(x) ... )
## $a
## Big Integer ('bigz') :
## [1] 24560512346470571536449760694956189688
##
## $b
## Big Integer ('bigz') :
## [1] 295010738308890763454498908323798711226
##
## $c
## Big Integer ('bigz') :
## [1] 146851381511772731860674382282097773009
## ...
do.call(c,...)
## Big Integer ('bigz') object of length 26:
## [1] 24560512346470571536449760694956189688 295010738308890763454498908323798711226 146851381511772731860674382282097773009 277896596675540352347406615789605003835 196274166648971101707441276945175337351
## [6] 152164057440943545205375583549802787690 177176961461451259509149953911555923867 104722841650969351697149582356678916643 338417919426764038104581950237023359466 337938589168387959049175020406476846763
## [11] 182882473465429367490220828342074920857 80661780033646501757972845962914093977 251563583963884775614900275564391350478 279860001817578054753205218523665183571 158142488666995307556311659134646734337
## [16] 116423801372716526262639744414150237351 97172586736798383425273805088952414146 316382305028166656556246910315962582893 245775506345085992020540282526076959865 96713787940004003047734284080139522561
## [21] 227309401343419671779216095382349119699 250431221767618781785406207793096585421 33680856367414392588062933086110875192 119974848773126933055729663395967301868 296965764652868210844163281547943654188
## [26] 118199003122415992890118393158735259681
This "unlists" the list. Note: I tried sapply() instead of lapply(), and alternatively unlist(), and neither worked. This is probably related to the bigz class, possibly to the fact that a vector of bigz values is actually weirdly encoded as a single vector of raw.
...%%7
## Big Integer ('bigz') object of length 26:
## [1] 3 2 1 1 5 5 5 5 1 4 4 6 5 3 5 4 0 2 0 4 5 4 6 3 6 1
And finally we can take the result modulo 7.
as.integer(...)
## [1] 3 2 1 1 5 5 5 5 1 4 4 6 5 3 5 4 0 2 0 4 5 4 6 3 6 1
Last step is to convert back to plain old integer from bigz.
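If only the remainder is needed, the big integers can be avoided altogether. This is a swapped-in technique, not the gmp approach above: Horner's rule reduces the hex string modulo 7 one digit at a time, so every intermediate value stays below 7 · 16 + 16. A minimal sketch in base R; hex_mod() is a hypothetical helper, and the hashes would come from something like digest() as in the answer.

```r
# Hypothetical helper: value of a hex string modulo m, via Horner's rule
hex_mod <- function(hex, m = 7) {
  digits <- strtoi(strsplit(tolower(hex), "")[[1]], base = 16L)  # "a" -> 10, ..., "f" -> 15
  r <- 0
  for (d in digits) {
    r <- (r * 16 + d) %% m  # (value so far * 16 + next digit) mod m
  }
  r
}

# md5 of 'a' from the answer above, plus two small checks ("ff" = 255, "10" = 16)
hashes <- c("127a2ec00989b9f7faf671ed470be7f8", "ff", "10")
vapply(hashes, hex_mod, numeric(1), USE.NAMES = FALSE)
```

For the hash of 'a' this gives 3, the same first value as the gmp pipeline.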

Multiple operators in a string

I have some operators in a list
[[1]]
[1] "*"
[[2]]
[1] "-"
[[3]]
[1] "+"
[[4]]
[1] "/"
[[5]]
[1] "^"
I want to perform these operations between two datasets of the same dimensions, for example dataset1*dataset2, dataset1-dataset2, etc. Is it possible using the strings in the list?

Yes, here is one example:
ops <- list("+", "-")
x <- y <- 1:10
lapply(ops, function(op) eval(parse(text = paste0("x", op, "y"))))
# [[1]]
# [1] 2 4 6 8 10 12 14 16 18 20
#
# [[2]]
# [1] 0 0 0 0 0 0 0 0 0 0
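Because R's arithmetic operators are ordinary functions, an alternative to eval(parse()) is to call each one by name with do.call(), which avoids building code strings. A sketch; the short vectors x and y stand in for the two datasets and are assumptions, and the same call works element-wise on matrices or data frames of matching dimensions.

```r
ops <- list("*", "-", "+", "/", "^")
x <- c(2, 4, 6)  # stand-in for dataset1
y <- c(1, 2, 3)  # stand-in for dataset2

# do.call("*", list(x, y)) is the same as x * y
results <- lapply(ops, function(op) do.call(op, list(x, y)))
names(results) <- unlist(ops)
results
```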
