How to get consistent output when checking for infinite values - r

# works fine
check = c(1,2,3,4, Inf)
out = check[-which(check == Inf)]
print(out)
# [1] 1 2 3 4
# does not work fine
check = c(1,2,3,4)
out = check[-which(check == Inf)]
print(out)
# numeric(0)
The first example creates an out variable with the correct values 1, 2, 3, 4. The second example creates an empty out, because which() returns integer(0), and apparently removing integer(0) from the check vector gives 0 elements.
I know how to write this in several lines but is there a one-liner for this?
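For reference, a minimal sketch of why the second case comes back empty: negating integer(0) still gives integer(0), and indexing with a zero-length index selects nothing rather than dropping nothing. The name idx is only illustrative.

```r
check <- c(1, 2, 3, 4)

# which() finds no Inf, so it returns a zero-length integer vector
idx <- which(check == Inf)
length(idx)
# [1] 0

# Negating a zero-length vector leaves it zero-length, so the index
# means "select nothing", not "drop nothing"
identical(-idx, integer(0))
# [1] TRUE

out <- check[-idx]
length(out)
# [1] 0
```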

Try is.finite():
# example 1
check <- c(1, 2, 3, 4, Inf)
out <- check[ is.finite(check) ]
out
# [1] 1 2 3 4
# example 2
check <- c(1, 2, 3, 4)
out <- check[ is.finite(check) ]
out
# [1] 1 2 3 4
Related post about: is.finite().

check = c(1,2,3,4)
out = check[!is.infinite(check)]
print(out)
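The is.finite() and !is.infinite() answers above are not interchangeable once missing values are involved. A small sketch of the difference, using a made-up vector x:

```r
x <- c(1, NA, NaN, Inf, -Inf, 4)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so all four are dropped
kept_finite <- x[is.finite(x)]
kept_finite
# [1] 1 4

# !is.infinite() only drops Inf and -Inf; NA and NaN are kept
kept_not_inf <- x[!is.infinite(x)]
kept_not_inf
# [1]   1  NA NaN   4
```

Both variants also remove -Inf, which the original check == Inf comparison would leave in place.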

Not sure whether this is technically a one-liner...
out = if (any(is.infinite(check))) {check[-which(is.infinite(check))]} else {check}

Related

Is there an R function equivalent to the subset operator `[ ]`, in order to slice by row index?

I know that [] is a function itself, but is there a function that does the following ?
vect = c(1, 5, 4)
# Slicing by row index with []
vect[2]
# [1] 5
# Does this kind of function exist ?
slicing_func(vect, 2)
# [1] 5
# And for dataframes ?
To understand the deeper meaning of "[] is actually a function":
vect[2]
# [1] 5
is equivalent to:
`[`(vect, 2)
# [1] 5
Seems you have already used the function you are looking for.
Note, that it also works for data frames/matrices.
dat
# X1 X2 X3 X4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
`[`(dat, 2, 3)
# [1] 8
`[`(dat, 2, 3, drop=FALSE) ## to get a data frame back
#   X3
# 2  8
Data:
vect <- c(1, 5, 4)
dat <- data.frame(matrix(1:12, 3, 4))
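Because `[` is an ordinary function, it can also be handed to sapply() or lapply(), which is where the functional form tends to be useful. A short sketch, with the list contents made up for illustration:

```r
vect <- c(1, 5, 4)
dat <- data.frame(matrix(1:12, 3, 4))

# `[` passed as the function: take the 2nd element of each vector
second <- sapply(list(a = vect, b = c(10, 20, 30)), `[`, 2)
second
#  a  b
#  5 20

# Extra arguments are forwarded, so this extracts row 2, column 3
# of each data frame in the list
cell <- sapply(list(dat, dat * 10), `[`, 2, 3)
cell
# [1]  8 80
```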
You can use the getElement function:
vect = c(1, 5, 4)
getElement(vect, 2)
#> [1] 5
Or you can use
vctrs::vec_slice(vect, 2)
#> [1] 5
which works for slices and data frames too.
For a data frame you can use slice:
library(dplyr)
vect = c(1, 5, 4)
vect %>% as.data.frame() %>% slice(2)
#> .
#> 1 5
nth(vect, 2)
#> [1] 5
Created on 2022-07-10 by the reprex package (v2.0.1)
slice according to documentation:
slice() lets you index rows by their (integer) locations. It allows
you to select, remove, and duplicate rows.
We could use pluck or chuck from purrr package:
pluck() and chuck() implement a generalised form of [[ that allow you to index deeply and flexibly into data structures. pluck() consistently returns NULL when an element does not exist, chuck() always throws an error in that case.
library(purrr)
pluck(vect, 2)
# [1] 5
chuck(vect, 2)
# [1] 5

Function that splits numeric vector in the natural sequences it contains

I have a vector as the following:
example <- c(1, 2, 3, 8, 10, 11)
And I am trying to write a function that returns an output as the one you would get from:
desired_output <- list(first_sequence = c(1, 2, 3),
second_sequence = 8,
third_sequence = c(10, 11)
)
Actually, what I want is to count how many such sequences there are in my vector, and the length of each one. It just happens that a list like the one in "desired_output" would be sufficient.
The end goal is to construct another vector, let's call it "b", that contains the following:
b <- c(3, 3, 3, 1, 2, 2)
The real-world problem behind this is to measure the height of 3D objects contained in a 3D point cloud.
I've tried to write both a function that returns the list in "desired_output" and a recursive function that directly outputs vector "b", but succeeded at neither.
Does anyone have an idea?
Thank you very much.
We can split the vector into a list by building a grouping variable from the differences between adjacent elements:
out <- split(example, cumsum(c(TRUE, abs(diff(example)) != 1)))
Then, we get the lengths and replicate
unname(rep(lengths(out), lengths(out)))
[1] 3 3 3 1 2 2
You could do:
out <- split(example, example - seq_along(example))
To get the lengths:
ln <- unname(lengths(out))
rep(ln, ln)
[1] 3 3 3 1 2 2
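To see why example - seq_along(example) works as a grouping key, it may help to look at the intermediate values. The sketch below assumes the input is sorted in increasing order; it uses rle() instead of split(), and key and run_len are illustrative names.

```r
example <- c(1, 2, 3, 8, 10, 11)

# Within a run of consecutive integers, value and position grow together,
# so their difference stays constant and can serve as a group key
key <- example - seq_along(example)
key
# [1] 0 0 0 4 5 5

# rle() gives the length of each run of identical keys
run_len <- rle(key)$lengths
run_len
# [1] 3 1 2

b <- rep(run_len, run_len)
b
# [1] 3 3 3 1 2 2
```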
Here is one more. Not elegant, but a different approach:
1. Create a data frame from the example vector
2. Assign the elements to groups
3. Aggregate with tapply
example_df <- data.frame(example = example)
example_df$group <- cumsum(ifelse(c(1, diff(example) - 1), 1, 0))
tapply(example_df$example, example_df$group, function(x) x)
$`1`
[1] 1 2 3
$`2`
[1] 8
$`3`
[1] 10 11
One other option is to use ave:
ave(example, cumsum(c(1, diff(example) != 1)), FUN = length)
# [1] 3 3 3 1 2 2
# or just
ave(example, example - seq_along(example), FUN = length)

Changing values of a dice roll

So let's say I roll 5 dice.
The code to simulate the rolls would be
Rolls<-sample(1:6, 5, replace=TRUE)
and that's if I want to store my rolls under the object Rolls.
Now let's say for some reason I don't want there to be more than 2 sixes. That means if I roll, for example, 6 3 5 6 6 1, would I be able to re-roll one of the 6 values into a new value so that there are only 2 values of 6 and 4 values that are not 6?
Any support would be appreciated.
Thanks in advance
A solution without loops could be:
condition = which(Rolls==6)
if(length(condition)>=3){
Rolls[condition[3:length(condition)]] = sample(1:5, length(condition)-2, replace=TRUE)
}
condition holds the positions in Rolls that contain a 6. If there are more than 2, you select the third one onward with Rolls[condition[3:length(condition)]] and re-sample those from 1:5, so no new 6 can appear.
And the second question could be something like:
remove = 3
Rolls = Rolls[-which(Rolls==remove)[1]]
You can easily put those into functions if you like
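As a rough sketch of what such functions might look like: cap_sixes and remove_first are hypothetical names, not part of the answer above. The removal helper uses match() so that a value that never occurs leaves the vector untouched, instead of hitting the -which(...)[1] pitfall where the index becomes NA.

```r
# Hypothetical helper: re-roll every six beyond the first `max_six`
cap_sixes <- function(rolls, max_six = 2) {
  six_pos <- which(rolls == 6)
  extra <- length(six_pos) - max_six
  if (extra > 0) {
    # tail() picks the surplus positions; sampling 1:5 guarantees no new six
    rolls[tail(six_pos, extra)] <- sample(1:5, extra, replace = TRUE)
  }
  rolls
}

# Hypothetical helper: drop the first occurrence of `value`, if there is one
remove_first <- function(rolls, value) {
  pos <- match(value, rolls)
  if (is.na(pos)) rolls else rolls[-pos]
}

capped <- cap_sixes(c(6, 3, 5, 6, 6, 1))
sum(capped == 6)
# [1] 2

remove_first(c(6, 3, 5, 6), 3)
# [1] 6 5 6
remove_first(c(6, 3, 5, 6), 4)
# [1] 6 3 5 6
```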
Edit 1
To make the second answer a bit more interactive, you can build a function for it:
remove.roll = function(remove, rolls){
# assumes that remove occurs at least once in rolls
rolls[-which(rolls==remove)[1]]}
And then the user can call the function with whatever value of remove they like. You can also write a program that takes the value from the prompt:
remove = as.numeric(readline(prompt="Enter number to remove: "))
Rolls = Rolls[-which(Rolls==remove)[1]]
print(Rolls)
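If I understood it correctly, this should work (note that the loop keeps re-rolling until at most one six is left; use Nr_of_six > 2 as the condition to allow two):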
n <- 10
(Rolls<-sample(1:6, n, replace=TRUE))
#> [1] 6 2 4 1 1 6 5 2 1 6
(Nr_of_six <- sum(6 == Rolls))
#> [1] 3
while (Nr_of_six > 1) {
extra_roll <- sample(1:6, 1, replace=TRUE)
second_six <- which(Rolls==6)[2]
Rolls[second_six] <- extra_roll
print(Rolls)
Nr_of_six <- sum(6 == Rolls)
}
#> [1] 6 2 4 1 1 4 5 2 1 6
#> [1] 6 2 4 1 1 4 5 2 1 3
print(Rolls)
#> [1] 6 2 4 1 1 4 5 2 1 3
Created on 2021-03-21 by the reprex package (v1.0.0)
We can make this a fun demonstration of a use case for scan(). You can input the positions of the values that you want to replace. Note that you need to hand scan() the position values one at a time, pressing Enter after each one. To finish, hand over an empty string "" and press Enter.
Code
dice.roll <- function(){
# Initial toss
Rolls <- sample(seq(1, 6), 5, replace=TRUE)
# Communicate
cat("The outcome of the dice roll was:", "\n\n", Rolls, "\n\n",
"would you like to reroll any of those values ?", "\n",
"If yes enter the positions of the values you would \n like to replace, else just input an empty string.")
# Take input
tmp1 <- scan(what = "")
# Replace
Rolls[as.numeric(tmp1)] <- sample(seq(1, 6), length(tmp1), replace=TRUE)
# Return
cat("You successfully replaced", length(tmp1), "elements. Your rolls now look as follows: \n\n",
Rolls)
}
dice.roll()
# The outcome of the dice roll was:
#
# 6 4 6 3 4
#
# would you like to reroll any of those values ?
# If yes enter the positions of the values you would
# like to replace, else just input an empty string.
# 1: 1
# 2: 3
# 3: ""
# Read 2 items
# You successfully replaced 2 elements. Your rolls now look as follows:
#
# 2 4 2 3 4
Please note that this function is just a quick write-up. To implement this properly, you should use a while statement or recursion to repeat the replacement as often as you'd like. Additionally, before actually using it, you would have to add if statements that handle inputs that are too long and other user behavior that could cause an error.
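One possible shape for that input handling, sketched without the interactive part so it can be tried directly. parse_positions and reroll are hypothetical helpers, and silently dropping invalid entries is only one of several reasonable policies.

```r
# Hypothetical helper: turn raw scan() input (character) into valid positions
parse_positions <- function(input, n) {
  pos <- suppressWarnings(as.integer(input))
  # Drop non-numeric entries, out-of-range positions and duplicates
  unique(pos[!is.na(pos) & pos >= 1 & pos <= n])
}

# Hypothetical non-interactive core: re-roll only the valid positions
reroll <- function(rolls, input) {
  pos <- parse_positions(input, length(rolls))
  rolls[pos] <- sample(1:6, length(pos), replace = TRUE)
  rolls
}

parse_positions(c("1", "3", "x", "9", "3"), 5)
# [1] 1 3

# Positions 2, 4 and 5 are left untouched
new_rolls <- reroll(c(6, 4, 6, 3, 4), c("1", "3"))
```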
Here is my version of this function, which uses recursion to re-roll values until there are no more than two 6s. Note that I put the rolls vector outside of the function, so in order to replace the third, fourth, ... 6 from inside the function we use the superassignment operator <<-.
I personally chose to modify the first 6 whenever there are three or more 6s.
rolls <- sample(1:6, 6, replace = TRUE)
n_six <- function() {
n <- length(rolls[rolls == 6])
if(n <= 2) {
return(rolls)
} else {
extra <- sample(1:6, 1, replace = TRUE)
rolls[which(rolls == 6)][1] <<- extra
}
n_six()
}
# Imagine our rolls would be a vector with 3 six values like this
rolls <- c(1, 2, 6, 5, 6, 6)
> n_six()
[1] 1 2 3 5 6 6 # First 6 was replaced
# Or our rolls contains 4 six values
rolls <- c(1, 6, 6, 5, 6, 6)
> n_six()
[1] 1 4 1 5 6 6 # First 2 6s have been replaced
And so on ...
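If modifying a global vector with <<- feels fragile, the same recursion can be written with rolls as an argument. This is only a sketch of that variation; n_six_arg and max_six are made-up names.

```r
# Sketch: same recursion, but `rolls` is an argument instead of a global
n_six_arg <- function(rolls, max_six = 2) {
  six_pos <- which(rolls == 6)
  if (length(six_pos) <= max_six) {
    return(rolls)
  }
  # Re-roll the first six; the new value may be a six again,
  # in which case the next call deals with it
  rolls[six_pos[1]] <- sample(1:6, 1)
  n_six_arg(rolls, max_six)
}

result <- n_six_arg(c(1, 6, 6, 5, 6, 6))
sum(result == 6)
# [1] 2
```

The caller's vector stays unchanged, and the function can be applied to any set of rolls.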

Difference between dataframe's $ and [] functions

How and why are a data frame's $ and [] functions different when assigning values?
Can I tweak the abc.df[,"b"] = get("b") line to have the same effect as abc.df$b = get("b")?
abc.df = NULL
a = 1:10
abc.df = data.frame(a)
b_vector = 11:20
b_list = rep(list(c(1,2)),10)
sp_colmns1 = c("b_vector")
# This works :
abc.df$b_vector_method1 = get(sp_colmns1) # Method 1
abc.df[,"b_vector_method2"] = get(sp_colmns1) # Method 2
print(abc.df)
sp_colmns2 = c("b_list")
# Similarly :
# The same code as above, but does not work
# Only difference is b_list is a list
abc.df$b_list_method1 = get(sp_colmns2) # Method 1 (Works)
# TODO: Need to get the reason for & Solve the error on following line
# abc.df[,"b_list_method2"] = get(sp_colmns2) # Method 2 (Doesnt work)
print(abc.df)
You could add the list with any name "new" and change the column name in a second step with the string you saved somewhere else.
abc.df$new <- get(sp_colmns2)
names(abc.df)[which(names(abc.df) == "new")] <- "b_list_method2"
# > head(abc.df)
# a b_list_method2
# 1 1 1, 2
# 2 2 1, 2
# 3 3 1, 2
# 4 4 1, 2
# 5 5 1, 2
# 6 6 1, 2
After quite a lot of trial and error, this seems to work.
The solution turns out to be quite a simple one...
list(get(sp_colmns2)) instead of get(sp_colmns2)
abc.df = NULL
a = 1:10
abc.df = data.frame(a)
b_vector = 11:20
b_list = rep(list(c(1,2)),10)
sp_colmns1 = c("b_vector")
# This works :
abc.df$b_vector_method1 = get(sp_colmns1) # Method 1
abc.df[,"b_vector_method2"] = get(sp_colmns1) # Method 2
print(abc.df)
sp_colmns2 = c("b_list")
# Similarly :
# The same code as above, now also working for b_list
# Only difference is b_list is a list
abc.df$b_list_method1 = get(sp_colmns2) # Method 1 (Works)
# Wrapping the value in list() makes Method 2 work as well
abc.df[,"b_list_method2"] = list(get(sp_colmns2)) # Method 2 (Works)
print(abc.df)
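As far as I understand it, the reason is that the [ ] assignment method for data frames treats a list on the right-hand side as a list of columns, so a 10-element list is read as ten columns rather than one column with ten entries. Wrapping it in list() makes it a single column. The sketch below also shows [[ ]] assignment, which behaves like $ but accepts the column name as a string. The column names are made up for illustration.

```r
abc.df <- data.frame(a = 1:10)
b_list <- rep(list(c(1, 2)), 10)

# Wrapping in list() makes the 10-element list a single column
abc.df[, "via_bracket"] <- list(b_list)

# `[[<-` behaves like `$<-` here and takes the column name from a string
col_name <- "via_double_bracket"  # illustrative name
abc.df[[col_name]] <- b_list

is.list(abc.df$via_bracket)
# [1] TRUE
```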

count new elements in vector list

I want to count new elements that weren't present in previous years.
Sample data:
var1 <- list('2003' = 1:3, '2004' = c(4:3), '2005' = c(6,4,1), '2006' = 1:4 )
I would like to get the output
newcount <- list('2003' = 0, '2004' = 1, '2005' = 1, '2006' = 0)
Unsuccessful code:
newcount <- mapply(setdiff, var1, seq_along(var1), function(i)
{if (i > 1) {Reduce(union, var1[i-1], accumulate=T)}}, length)
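Almost there, but it's better to use vector indexing to work with the offset and add the always-known initial element afterwards (SIMPLIFY=FALSE keeps mapply from collapsing the result when all years happen to have the same number of new elements):
lapply(c(list(`2003`=integer(0)),
mapply(setdiff,var1[-1],
Reduce(union,var1,accumulate=TRUE)[-length(var1)],
SIMPLIFY=FALSE)),length)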
$`2003`
[1] 0
$`2004`
[1] 1
$`2005`
[1] 1
$`2006`
[1] 0
Assuming that var1 is sorted according to year, and that for 2003 you'd like 3 instead of 0, you could try
newcount <- lapply(seq_along(var1),function(x){
prev<-unlist(var1[seq_len(x-1)])
# Improvement suggested by plannapus
sum(!var1[[x]]%in%prev) # length(which(!var1[[x]]%in%prev))
})
names(newcount)<-names(var1)
newcount
# $`2003`
# [1] 3
# $`2004`
# [1] 1
# $`2005`
# [1] 1
# $`2006`
# [1] 0
OK, if you're absolutely sure that 2003 should be 0 (which I see as an exception to your logic), then you could do the following:
newcount <- c(0, lapply(seq_along(var1)[-1],function(x){
prev<-unlist(var1[seq_len(x-1)])
sum(!var1[[x]]%in%prev)
}))
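A different angle, offered as a sketch rather than a replacement for the answers above: flatten all years into one vector and use duplicated() to mark first appearances. The names values, year and is_new are illustrative, and the first year is set to 0 by hand to match the output requested in the question.

```r
var1 <- list('2003' = 1:3, '2004' = c(4:3), '2005' = c(6, 4, 1), '2006' = 1:4)

# Flatten all years, remembering which year each value came from
values <- unlist(var1, use.names = FALSE)
year <- factor(rep(names(var1), lengths(var1)), levels = names(var1))

# A value is "new" the first time it appears anywhere in the flattened vector
is_new <- !duplicated(values)
newcount <- tapply(is_new, year, sum)

# Treat the first year as the baseline, as in the question
newcount[1] <- 0
newcount
# 2003 2004 2005 2006
#    0    1    1    0
```

Like the other answers, this assumes var1 is ordered by year.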
