Recoding Numeric Vector R

I have a numeric vector, let's say something like:
x <- rep(1:6, 300)
What I would like to do is recode the vector, in place such that 6=1,5=2,4=3,3=4,2=5,1=6. I don't want to create a factor out of it.
Everything I have tried so far gives me the wrong counts because of the order of the assignments, i.e.:
x[x == 6] <- 1
x[x == 5] <- 2 ## By the time x[x == 2] is recoded, it also picks up the former 5's, so the 5's vanish from the counts.
Note: I'm aware of the car package, but would prefer to use base R for this problem.

Construct a map between the old and new values, and subset it with the old values:
(6:1)[x]
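A minimal sketch of why the lookup works (the names lookup and recoded are only for illustration): indexing the vector of new values by the old values does the whole recode in one step, so no value is overwritten before it has been read.

```r
x <- rep(1:6, 300)

# lookup[i] holds the new value for old value i (hypothetical name)
lookup <- 6:1
recoded <- lookup[x]

# every value still occurs 300 times, and the mapping is reversed
table(recoded)
all(recoded == 7 - x)   # TRUE
```

If the result should replace the original, assign it back with x <- lookup[x].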

Wouldn't something as simple as 7 - x give you what you are after?

See manual for car::recode. Otherwise, create variable y:
y <- numeric()
length(y) <- length(x)
y[x == 6] <- 1
y[x == 5] <- 2
## ad nauseam...
It is generally considered bad practice to recode variables in place, because if you mess things up, you will probably lose data. Be careful.

In your case, yes, just subtract. In general, match can be quite useful in cases like this. For example, suppose you wanted to recode the values in this x column to the values in the y column
> d <- data.frame(x=c(1,3,4,5 ,6),y=c(3,4,2.2,1,4.6))
> print(d, row.names=FALSE)
x y
1 3.0
3 4.0
4 2.2
5 1.0
6 4.6
Then this would recode the values in a to the new values.
> a <- c(3,4,6,1,5)
> d$y[match(a,d$x)]
[1] 4.0 2.2 4.6 3.0 1.0
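As a sketch, the same idea can be wrapped in a small helper (the name recode_by_table is made up here and does not come from any package). Values with no entry in the lookup table come back as NA, which is how match reports a miss.

```r
# hypothetical helper: look up each value in `old`, return the entry of `new`
recode_by_table <- function(values, old, new) {
  new[match(values, old)]
}

d <- data.frame(x = c(1, 3, 4, 5, 6), y = c(3, 4, 2.2, 1, 4.6))
a <- c(3, 4, 6, 1, 5)

recoded <- recode_by_table(a, d$x, d$y)
recoded
# [1] 4.0 2.2 4.6 3.0 1.0

# 2 does not occur in d$x, so the second result is NA
missing_case <- recode_by_table(c(3, 2), d$x, d$y)
missing_case
```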

rev(x) also works, at least when the length of x is an exact multiple of the length of the repeated sequence.

If you want to recode multiple variables, you might take the following approach:
MapFunc = function(x) {
y = NULL;
if (x %in% c("1","2","3")) {y=100}
if (x %in% c("0","4")) {y=200}
if (x %in% c("5")) {y=100}
y # return the recoded value; print(y) would also echo it on every call
}
MapFunc(x=1); MapFunc(x=0); #working ok for scalars
#
X = matrix( sample(0:5,25,replace=TRUE), nrow=5,ncol=5)
apply(X,c(1,2),MapFunc) #working ok for matrices...

Related

Filter R dataframe by another dataframe using a tolerance?

I have two data frames (A & B) of different lengths. For a given value in A, I want to know if there are values anywhere in B that are within a tolerance of +/- 0.3. It would also be useful to know the position of this value in B.
A<-c(1:10)
B<-c(2.2,15,1.8,4.9,20,14,8.2,33,9.8,41,16)
i.e., for A[1] there is no value in B within the tolerance,
but for A[2], the values at B[1] and B[3] are within the tolerance,
and so on.
I have experimented with the near function in dplyr, however I can only seem to get it to compare on a row by row basis. Any help would be greatly appreciated!
We can use between from dplyr:
library(dplyr)
library(purrr)
map(A, ~ B[between(B, .x - 0.3, .x + 0.3)]) %>%
flatten_dbl
Here is one idea. result is a list. If there is no match for an element of A, the corresponding element of result has length 0. Otherwise, it holds the indices in B that meet the requirement.
A <- 1:10
B <- c(2.2,15,1.8,4.9,20,14,8.2,33,9.8,41,16)
difference <- list()
for (i in 1:length(A)){
difference[[i]] <- B - A[i]
}
result <- lapply(difference, function(x) which(x < 0.3 & x > -0.3))
In base R, we can use sapply:
unlist(sapply(A, function(x) B[B >= (x- 0.3) & B <= (x + 0.3)]))
#[1] 2.2 1.8 4.9 8.2 9.8
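Since the question also asks for positions, here is a hedged base R sketch that compares every element of A with every element of B at once via outer and reports index pairs. The names tol, is_near and hits are illustrative only.

```r
A <- 1:10
B <- c(2.2, 15, 1.8, 4.9, 20, 14, 8.2, 33, 9.8, 41, 16)
tol <- 0.3

# is_near[i, j] is TRUE when B[j] lies within tol of A[i]
is_near <- abs(outer(A, B, "-")) <= tol

# one row per hit: position in A and position in B
hits <- which(is_near, arr.ind = TRUE)
colnames(hits) <- c("A_index", "B_index")
hits <- hits[order(hits[, "A_index"]), , drop = FALSE]
hits
```

This builds a length(A) by length(B) matrix, so it is convenient for small inputs but may use too much memory for very long vectors.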

Store and use operators on command

How can I (if at all) switch between different operators on demand inside an if/else statement?
x <- as.numeric(c(1,1,4,5,6,7,8))
if(mean(x) < 3){operator.is <- <}else{operator.is <- >}
sub <- subset(x, x operator.is 2)
#expected results
sub
[1] 4 5 6 7 8
I want to store the operator in "operator.is", based on the if statement. Yet, I do not seem to be able to store an operator and use it in the subset function. Later I want to use this operator to subset. Without this I will need to copy and paste the whole code just to use the other operator. Is there any elegant and simple way to solve this?
Thanks in advance
Custom infix operators can be defined by giving them a name wrapped in % signs:
`%op%` = `>`
vector <- c(1:10)
vector2 <- subset(vector, vector %op% 5)
In your case:
x <- as.numeric(c(1,1,4,5,6,7,8))
if(mean(x) < 3){`%operator.is%` <- `<`}else{`%operator.is%` <- `>`}
sub <- subset(x, x %operator.is% 2)
x <- as.numeric(c(1,1,4,5,6,7,8))
if(mean(x) < 3){`%my_op%` <- `<`}else{`%my_op%` <- `>`}
sub <- subset(x, x %my_op% 2)
sub
##[1] 4 5 6 7 8
"Things to remember while defining your own infix operators are that they must start and end with %. Surround it with back tick (`) in the function definition and escape any special symbols."
from https://www.datamentor.io/r-programming/infix-operator/
Better to follow the lead of @Oliver and just store the operator as an ordinary function and call it:
x <- as.numeric(c(1,1,4,5,6,7,8))
if(mean(x) < 3){operator.is <- `<`}else{operator.is <- `>`}
sub <- subset(x, operator.is(x,2))
sub
##[1] 4 5 6 7 8
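A related sketch, in case the operator arrives as a string (from a config value, say): base R's match.fun turns the name into the function. The names op_name and compare are only illustrative.

```r
x <- c(1, 1, 4, 5, 6, 7, 8)

# pick the operator by name, then look up the function behind it
op_name <- if (mean(x) < 3) "<" else ">"
compare <- match.fun(op_name)

# plain logical indexing; subset(x, compare(x, 2)) gives the same result
sub <- x[compare(x, 2)]
sub
# [1] 4 5 6 7 8
```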

Can I further vectorize this function

I am relatively new to R, and to matrix-based scripting languages in general. I have written this function to return the indices of each row whose content is similar to any other row's content. It is a primitive form of spam reduction that I am developing.
if (!require("RecordLinkage")) install.packages("RecordLinkage")
library("RecordLinkage")
# Takes a column of strings, returns a vector of indices
check_similarity <- function(x) {
threshold <- 0.8
values <- NULL
for(i in 1:length(x)) {
values <- c(values, which(jarowinkler(x[i], x[-i]) > threshold))
}
return(values)
}
Is there a way that I could write this to avoid the for loop entirely?
We can simplify the code somewhat using sapply.
# some test data #
x = c('hello', 'hollow', 'cat', 'turtle', 'bottle', 'xxx')
# same threshold as in the question's function
threshold = 0.8
# create an x by x matrix specifying which strings are alike
m = sapply(x, jarowinkler, x) > threshold
# set diagonal to FALSE: we're not interested in strings being identical to themselves
diag(m) = FALSE
# And find index positions of all strings that are similar to at least one other string
which(rowSums(m) > 0)
# [1] 1 2 4 5
I.e. this returns the index positions of 'hello', 'hollow', 'turtle', and 'bottle' as being similar to another string
If you prefer, you can use colSums instead of rowSums to get a named vector, but this could be messy if the strings are long:
which(colSums(m) > 0)
# hello hollow turtle bottle
# 1 2 4 5

How to properly avoid if-expressions by using vector indices?

x is a vector of integers ranging between 1 and 100
I created a function that determines in which category a number is:
x∈[1,20]: small
x∈[21,50]: med
x∈[51, 100]:large
Here is the function:
x <- c(1:99)
vector.fun<-function(x){
x[x >= 1 & x <=20] <-"small"
x[x >= 21 & x <=50] <-"med"
x[x >=51 & x <=99] <-"large"
return(x)
}
vector.fun(89)
However, as you can see, in the function my vector is 1:99 instead of 1:100. For some reason, when I change it to:
x <- c(1:100)
vector.fun<-function(x){
x[x >= 1 & x <=20] <-"small"
x[x >= 21 & x <=50] <-"med"
x[x >=51 & x <=100] <-"large"
return(x)
}
vector.fun(100)
it doesn't recognise any number from the last line: x[x >=51 & x <=100] <-"large", and when it does, it returns "med" instead of "large" as it should.
What am I doing wrong? What should I change in my function so that 100 is included and returns "large"?
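It is indeed a coercion problem. The first assignment of "small" turns x into a character vector, so the later comparisons are made on strings rather than numbers, and as strings "51" <= "100" is FALSE. With 99 as the upper bound, "51" <= "99" happens to be TRUE, which is why that version appeared to work.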
If you want to keep your function structure the way you created it, you can alter it as follows:
vector.fun<-function(y){
x <- y
x[y >= 1 & y <=20] <-"small"
x[y >= 21 & y <=50] <-"med"
x[y >=51 & y <=100] <-"large"
return(x)
}
Although the solution suggested by @alexis_laz is more concise and elegant:
vector.fun<-function(x){
cut(x, c(0,20,50,100), labels = c("small", "med", "large"))
}
Keep in mind, this second version will produce a factor type vector, while the first version will produce a character type vector.
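A short sketch of the pitfall and of one way to get character output from cut (the names y and labels_chr are only illustrative):

```r
x <- c(5, 51, 100)

# assigning a string converts the whole vector to character,
# even when the index selects nothing
y <- x
y[FALSE] <- "small"
class(y)          # "character"

# comparisons against a character vector are done on strings
"51" <= "100"     # FALSE, because "5" sorts after "1"

# cut() works on the numbers; as.character() turns the factor into strings
labels_chr <- as.character(
  cut(x, c(0, 20, 50, 100), labels = c("small", "med", "large"))
)
labels_chr
# [1] "small" "large" "large"
```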

How to apply a function to each element of a vector in R

Let's say I want to multiply each even element of a vector by 2 and each odd element of a vector by 3. Here is some code that can do this:
v <- 0:10
idx <- v %% 2 == 0
v[idx] <- v[idx] * 2
v[!idx] <- v[!idx] * 3
This would get difficult if I had more than two cases. It seems like the apply family of functions never deals with vectors so I don't know a better way to do this problem. Maybe using an apply function would work if I made transformations on the data, but it seems like that shouldn't be something that I would need to do to solve this simple problem.
Any ideas?
Edit: Sorry for the confusion. I am not specifically interested in the "%%" operator. I wanted to put some concrete code in my question but, judging by the responses, it was too specific. I wanted to figure out how to apply some arbitrary function to each member of the vector. This was not possible with apply(), and I thought sapply() only worked with lists.
You can do:
v <- v * c(2, 3)[v %% 2 + 1]
It is generalizable to any v %% n, e.g.:
v <- v * c(2, 3, 9, 1)[v %% 4 + 1]
Also it does not require that length(v) be a multiple of n.
You can use vector multiplication to do what you want:
tmp <- 1:10
tmp * rep(c(3,2), length(tmp)/2)
This is easy to extend to three or more cases:
tmp * rep(c(3,2,4), length(tmp)/3)
Easiest would be:
v*c(2,3) # as suggested by flodel in a comment.
The term to search for in the documentation is "argument recycling", a feature of the R language. It only works for dyadic infix functions (see ?Ops). For non-dyadic vectorized functions that would not error out with some of the arguments, and where you couldn't depend on the structure of "v" to be quite so regular, you could use ifelse:
ifelse( (1:length(v)) %% 2 == 0, func1(v), func2(v) )
This constructs two vectors and then chooses elements from the first or the second based on the truth value of the first argument. If you were trying to answer the question in the title of your posting, then you should look at:
?sapply
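For what the title literally asks, a minimal sketch: sapply accepts a plain vector and calls the function once per element. The function name scale_one is hypothetical.

```r
v <- 0:10

# applied to one element at a time, so an ordinary if/else is fine here
scale_one <- function(e) if (e %% 2 == 0) e * 2 else e * 3

res <- sapply(v, scale_one)
res
# [1]  0  3  4  9  8 15 12 21 16 27 20
```

For simple arithmetic like this, the vectorized forms above will be faster; sapply is the fallback when the per-element function cannot be vectorized.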
Here is an answer allowing any set of arbitrary functions to be applied to defined groups within a vector.
# source data
test <- 1:9
# categorisations of source data
cattest <- rep(1:3,each=3)
#[1] 1 1 1 2 2 2 3 3 3
Make the function to differentially apply functions:
categ <- function(x,catg) {
mapply(
function(a,b) {
switch(b,
a * 2,
a * 3,
a / 2
)
},
x,
catg
)
}
# where cattest = 1, multiply by 2
# where cattest = 2, multiply by 3
# where cattest = 3, divide by 2
The result:
categ(test,cattest)
#[1] 2.0 4.0 6.0 12.0 15.0 18.0 3.5 4.0 4.5
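A possible variation, sketched under the assumption that each group's function is itself vectorized: keep the functions in a list and apply each one to its whole group at once instead of element by element. The names funs and out are only illustrative.

```r
test <- 1:9
cattest <- rep(1:3, each = 3)

# one function per category, in category order (hypothetical list)
funs <- list(
  function(a) a * 2,
  function(a) a * 3,
  function(a) a / 2
)

out <- numeric(length(test))
for (k in seq_along(funs)) {
  idx <- cattest == k
  out[idx] <- funs[[k]](test[idx])   # one call per group
}
out
# [1]  2.0  4.0  6.0 12.0 15.0 18.0  3.5  4.0  4.5
```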
