Replace some component value in a vector with some other value - r

In R, in a vector, i.e. a 1-dim matrix, I would like to change components with value 3 to with value 1, and components with value 4 with value 2. How shall I do that? Thanks!

The idiomatic r way is to use [<-, in the form
x[index] <- result
If you are dealing with integers / factors or character variables, then == will work reliably for the indexing,
x <- rep(1:5,3)
x[x==3] <- 1
x[x==4] <- 2
x
## [1] 1 2 1 2 5 1 2 1 2 5 1 2 1 2 5
The car has a useful function recode (which is a wrapper for [<-), that will let you combine all the recoding in a single call
eg
library(car)
x <- rep(1:5,3)
xr <- recode(x, '3=1; 4=2')
x
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
xr
## [1] 1 2 1 2 5 1 2 1 2 5 1 2 1 2 5
Thanks to #joran for mentioning mapvalues from the plyr package, another wrapper for [<-
x <- rep(1:5,3)
mapvalues(x, from = c(3,1), to = c(1,2))
plyr::revalue is a wrapper for mapvalues specifically factor or character variables.

Related

Reorder (collate) vector elements automatically

It's an easy one, but I can find a simple solution for my problem. I have several vectors look like this one: rep(1:3, each = 3) and I want to convert them to like rep(1:3, times = 3).
So each element is repeated multiple times c(1,1,1,2,2,2,3,3,3) and I want to reorder them to c(1,2,3,1,2,3,1,2,3). How can I achieve that?
You can use a matrix transpose:
as.vector(t(matrix(x, nrow = 3)))
# [1] 1 2 3 1 2 3 1 2 3
v1 <- c(1,1,1,2,2,2,3,3,3)
o1 <- rle(v1)
rep(o1$values, min(o1$length))
[1] 1 2 3 1 2 3 1 2 3
This allows for unknown amount of numbers or strings but expects each value to be present in equal numbers. It only has some flexibility on what you want to do on some values occuring more than others.
Consider:
v2 <- c(1,1,1,2,2,2,3,3,3,3)
o2 <- rle(v2)
rep(o2$values, min(o2$length))
[1] 1 2 3 1 2 3 1 2 3
rep(o2$values, max(o2$length))
[1] 1 2 3 1 2 3 1 2 3 1 2 3

how to compare and select the minimum of two features in R?

Assume i have the following dataset:
dt<-data.frame(X=sample(5),Y=sample(5))
now, i need to compare these two features and select the one which is smaller.
X Y
1 4 3
2 5 2
3 2 4
4 3 5
5 1 1
Then the expected answer would be
3
2
2
3
1
I know
min(dt[1,])
could be helpful but it only gives me 1
Use pmin, which is the vectorized version of min:
pmin(dt$X,dt$Y)
Like thus:
> dt<-data.frame(X=sample(5),Y=sample(5))
> dt
X Y
1 3 2
2 4 3
3 1 5
4 2 4
5 5 1
> pmin(dt$X,dt$Y)
[1] 2 3 1 2 1
high <- apply(dt[,c("X","Y")], 1, max)
is another implementation
integer(0) or length 0 element happens when one of X or Y is of length(0)
For min or max, a length-one vector. For pmin or pmax, a vector of length the longest of the input vectors, or length zero if one of the inputs had zero length.
(from documentation)
max(which(1:3 == 5),10) works but pmax(which(1:3 == 5),10) gives integer(0)

Using factors in R programming

If I have the code:
x <- c(rnorm(10),runif(10), rnorm(10,1))
f <- gl(3,10)
f
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3
tapply(x,f,mean)
1 2 3
0.07368817 0.42992416 0.64212383
How are the 1,2,3's decided? I am assuming they are levels of something.
Furthermore, why is f used in the second argument, I dont see why it is an index and how does it know when to stop running through the index?.
I tried looking up the function definition but to no avail.
If you are asking about how tapply works (rather than gl) consider another simpler example:
> x1 <- c(1,1,2,2,3,3)
> tapply(x1, x1, mean)
1 2 3
1 2 3
> f2 <- c(2,2,2,2,3,3)
> tapply(x1, f2, mean)
2 3
1.5 3.0
In the first case, tapply has picked the first two items (indices), and found their mean
giving 1 for 1, then the next two items (2 and 2) having mean 2 etc.
In the second case, the first 4 items are treated as 2's, having mean (1+1+2+2)/4, and the last two and 3's having mean (3+3)/2
In effect, then "index" is labelling the data, and applying the requested function to each "group"

Create a vector listing run length of original vector with same length as original vector

This problem seems trivial but I'm at my wits end after hours of reading.
I need to generate a vector of the same length as the input vector that lists for each value of the input vector the total count for that value. So, by way of example, I would want to generate the last column of this dataframe:
> df
customer.id transaction.count total.transactions
1 1 1 4
2 1 2 4
3 1 3 4
4 1 4 4
5 2 1 2
6 2 2 2
7 3 1 3
8 3 2 3
9 3 3 3
10 4 1 1
I realise this could be done two ways, either by using run lengths of the first column, or grouping the second column using the first and applying a maximum.
I've tried both tapply:
> tapply(df$transaction.count, df$customer.id, max)
And rle:
> rle(df$customer.id)
But both return a vector of shorter length than the original:
[1] 4 2 3 1
Any help gratefully accepted!
You can do it without creating transaction counter with:
df$total.transactions <- with( df,
ave( transaction.count , customer.id , FUN=length) )
You can use rle with rep to get what you want:
x <- rep(1:4, 4:1)
> x
[1] 1 1 1 1 2 2 2 3 3 4
rep(rle(x)$lengths, rle(x)$lengths)
> rep(rle(x)$lengths, rle(x)$lengths)
[1] 4 4 4 4 3 3 3 2 2 1
For performance purposes, you could store the rle object separately so it is only called once.
Or as Karsten suggested with ddply from plyr:
require(plyr)
#Expects data.frame
dat <- data.frame(x = rep(1:4, 4:1))
ddply(dat, "x", transform, total = length(x))
You are probably looking for split-apply-combine approach; have a look at ddply in the plyr package or the split function in base R.

Create sequence of repeated values, in sequence?

I need a sequence of repeated numbers, i.e. 1 1 ... 1 2 2 ... 2 3 3 ... 3 etc. The way I implemented this was:
nyear <- 20
names <- c(rep(1,nyear),rep(2,nyear),rep(3,nyear),rep(4,nyear),
rep(5,nyear),rep(6,nyear),rep(7,nyear),rep(8,nyear))
which works, but is clumsy, and obviously doesn't scale well.
How do I repeat the N integers M times each in sequence?
I tried nesting seq() and rep() but that didn't quite do what I wanted.
I can obviously write a for-loop to do this, but there should be an intrinsic way to do this!
You missed the each= argument to rep():
R> n <- 3
R> rep(1:5, each=n)
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
R>
so your example can be done with a simple
R> rep(1:8, each=20)
Another base R option could be gl():
gl(5, 3)
Where the output is a factor:
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
Levels: 1 2 3 4 5
If integers are needed, you can convert it:
as.numeric(gl(5, 3))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
For your example, Dirk's answer is perfect. If you instead had a data frame and wanted to add that sort of sequence as a column, you could also use group from groupdata2 (disclaimer: my package) to greedily divide the datapoints into groups.
# Attach groupdata2
library(groupdata2)
# Create a random data frame
df <- data.frame("x" = rnorm(27))
# Create groups with 5 members each (except last group)
group(df, n = 5, method = "greedy")
x .groups
<dbl> <fct>
1 0.891 1
2 -1.13 1
3 -0.500 1
4 -1.12 1
5 -0.0187 1
6 0.420 2
7 -0.449 2
8 0.365 2
9 0.526 2
10 0.466 2
# … with 17 more rows
There's a whole range of methods for creating this kind of grouping factor. E.g. by number of groups, a list of group sizes, or by having groups start when the value in some column differs from the value in the previous row (e.g. if a column is c("x","x","y","z","z") the grouping factor would be c(1,1,2,3,3).

Resources