Recoding a number to a string in R

I am new to R and I am trying to recode a numeric variable
with the values 1, 2, 3 to strings. I have seen how to do it, but I do not know why mine
is not working. Maybe it is because recode should go from string to number?
This is what I got, and thanks in advance!
cars$origin = as.factor(cars$origin)
cars$origin
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 3 2 2 2 2 2 1 1 1 1 1 3 1 3 1 1
[35] 1 1 1 1 1 1 1 1 2 2 2 3 3 2 1 3 1 2 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1
[69] 2 2 3 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3 1 2 1 3 1 1 1
Levels: 1 2 3
cars$origin <- recode(cars$origin, "1='american';2='european';3='japan'")
Error: Argument 2 must be named, not unnamed

The factor() function has a labels argument for that:
cars$origin = factor(cars$origin,
                     levels = c(1, 2, 3),
                     labels = c("american", "european", "japan"))
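The error text in the question looks like what dplyr's recode() reports when it is handed a car-style recode string, but that is an assumption, since the question does not say which package was loaded. Below is a minimal base R sketch of the relabelling on a small made-up vector; origin_codes, origin_factor, lookup and origin_chr are hypothetical names, not part of the cars data.

```r
# Hypothetical stand-in for cars$origin (numeric codes 1, 2, 3)
origin_codes <- c(1, 1, 3, 2, 1)

# Option 1: factor() with explicit levels and labels
origin_factor <- factor(origin_codes,
                        levels = c(1, 2, 3),
                        labels = c("american", "european", "japan"))

# Option 2: index a lookup vector by the numeric code (plain character result)
lookup <- c("american", "european", "japan")
origin_chr <- lookup[origin_codes]

print(origin_factor)
print(origin_chr)
```

If the column has already been converted with as.factor(), Option 2 would need as.integer(as.character(x)) first so that the stored labels are used as the index.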

Related

How to find the streaks of a particular value in R?

The rle() function returns a list with values and lengths. I have not found a way to subset the output to isolate the streaks of a particular value that does not involve calling rle() twice, or saving the output into an object to later subset (an added step).
For instance, for runs of heads (1's) in a series of fair coin tosses:
s <- sample(c(0,1),100,T)
rle(s)
Run Length Encoding
lengths: int [1:55] 1 2 1 2 1 2 1 2 2 1 ...
values : num [1:55] 0 1 0 1 0 1 0 1 0 1 ...
# Double-call:
rle(s)[[1]][rle(s)[[2]]==1]
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
# Adding an intermediate step:
> r <- rle(s)
> r$lengths[r$values==1]
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
I see that a very easy way of getting the streak lengths just for 1 is to simply tweak the rle() code (answer), but there may be an even simpler way.
In base R:
with(rle(s), lengths[values==1])
[1] 1 3 2 2 1 1 1 3 2 1 1 3 1 1 1 1 1 2 3 1 2 1 3 3 1 2 1 1 2
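For a sequence of outcomes s, when you are interested solely in the lengths of the streaks of outcome oc:
sk = function(s, oc){
  n = length(s)
  # TRUE wherever the value changes from one element to the next
  y <- s[-1L] != s[-n]
  # positions where each run ends (the last element always ends a run)
  i <- c(which(y), n)
  # run lengths, keeping only the runs whose value equals oc
  diff(c(0L, i))[s[i] == oc]
}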
So to get the lengths for 1:
sk(s,1)
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
and likewise for 0:
sk(s,0)
[1] 1 1 1 1 2 2 2 2 4 1 1 2 1 1 1 1 1 1 3 1 1 2 6 2 1 1 4 4
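As a quick sanity check, the sketch below compares sk() with the with(rle(s), ...) one-liner on a seeded sample. The seed value and the names via_rle and via_sk are arbitrary choices for illustration.

```r
# Same function as above, repeated so the example is self-contained
sk <- function(s, oc) {
  n <- length(s)
  y <- s[-1L] != s[-n]
  i <- c(which(y), n)
  diff(c(0L, i))[s[i] == oc]
}

set.seed(42)  # arbitrary seed, only to make the sample reproducible
s <- sample(c(0, 1), 100, replace = TRUE)

via_rle <- with(rle(s), lengths[values == 1])
via_sk  <- sk(s, 1)

# Both approaches should give exactly the same streak lengths
print(identical(via_rle, via_sk))

# The streaks of 0 and of 1 together account for every toss
print(sum(sk(s, 0)) + sum(sk(s, 1)) == length(s))
```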

Error using function rle()

I have a data frame in R called pxlast. For example, to access the 5th column I use pxlast[[5]].
[1] 259.55 259.55 265.21 269.40 278.23 283.63 288.51 289.84 284.83 280.51 289.76 289.38 294.10 -1.00 -1.00 -1.00
[17] 300.30 303.86 311.65 303.29 296.44 295.13 297.22 294.60 299.65 290.23 295.80 -1.00 -1.00 -1.00 298.56 299.25
[33] 287.37 290.06 281.71 287.66 290.16 280.31 281.51 293.69 292.25 293.73 294.60 291.36 283.81 288.65 288.29 -1.00
[49] -1.00 -1.00 293.25 293.54 277.41 268.08 267.01 270.63 267.25 254.73 266.59 266.73 278.34 282.03 289.63 282.40
[65] 289.59 289.54 291.31 290.85 295.60 290.72 288.25 288.00 293.98 297.11 290.00 278.35 270.61 274.89 267.80 276.32
[81] 279.05 289.07 285.87 293.36 293.18 294.76 295.77 296.35 290.23 297.61 296.93 293.31 290.06 289.98 287.29 282.07
[97] 275.89 270.92 273.68 270.85 280.05 279.64 284.83 288.91 294.85 296.91 297.94 301.66 303.05 298.72 303.46 298.22
[113] 304.92 309.59 316.07 318.05 318.86 318.09 317.84 318.04 337.08 346.89 345.36 350.96 354.65 361.06 354.53 352.63
[129] 352.83 351.45 351.38 361.47 365.13 367.11 371.42 364.37 368.83 372.12 375.10 381.97 384.47 388.67 388.61 386.73
[145] 392.16 388.55 383.86 389.50 379.83 381.37 392.27 387.79 388.61 388.01 394.23 401.78 414.70 421.23 427.77 436.23
[161] 423.86 398.80 419.00 413.60 400.77 416.78 412.58 405.90 404.30 405.65 NA
As you can see, there are repeated values, for example the -1 values.
I want to return the values and indexes that are repeated X or more times in a row, for example the values that are repeated 3 or more times.
This is my code for doing that:
runs = rle(pxlast[[5]])
pxlast[[5]][runs$lengths > 2]
The result is:
[1] 294.10 299.65 294.60
This result should be the first element of each repeated run in my vector, but as you can see the values are incorrect.
Why?
I have been testing, and the rle function returns the following lengths in my runs variable:
[1] 2 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[59] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[117] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
As you can see, the function groups consecutive values that are the same. For example, the first 2 means that the first two numbers are the same. Because this vector has one entry per group rather than one per element, I can't use it to index my vector and return the repeated values: its length doesn't match the total number of indices.
If it looked like the following, for example for the first elements, I could use it:
[1] 2 2 1 1 1 1 1 1 1 1 1 1 1 3 3 3 1 1 1 1 1 1 1 1 1 1 1 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
because that keeps the total number of indices.
Any idea how to solve it?
If we need to extract the values based on the rle index:
runs <- within.list(rle(pxlast[[5]]), {
  i1 <- lengths > 2
  values <- values[i1]
  lengths <- lengths[i1]
})
inverse.rle(runs)
Using a reproducible example:
v1 <- c(2, 2, 1, 3, 3, 3, 2, 4, 4, 4, 5)
runs <- within.list(rle(v1), {
  i1 <- lengths > 2
  values <- values[i1]
  lengths <- lengths[i1]
})
inverse.rle(runs)
#[1] 3 3 3 4 4 4
This is a possible way:
df<-data.frame(lengths=as.numeric(runs$lengths),values=as.numeric(runs$values))
df[df[,"lengths"]>2,]
   lengths values
13       3     -1
25       3     -1
43       3     -1
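The indexing in the question goes wrong because runs$lengths has one entry per run, while the data vector has one entry per element. A minimal sketch of two ways to get back to element positions, using the small v1 vector from the answer above; the names run_ends, run_starts, long_runs and per_element are hypothetical.

```r
v1 <- c(2, 2, 1, 3, 3, 3, 2, 4, 4, 4, 5)
r <- rle(v1)

# Position in v1 where each run ends and starts
run_ends   <- cumsum(r$lengths)
run_starts <- run_ends - r$lengths + 1

# Runs of length 3 or more, with the index of their first element
keep <- r$lengths > 2
long_runs <- data.frame(value  = r$values[keep],
                        start  = run_starts[keep],
                        length = r$lengths[keep])
print(long_runs)

# One run length per element: the layout the question wished for
per_element <- rep(r$lengths, r$lengths)
print(per_element)

# This vector has the same length as v1, so it can be used as a mask
print(v1[per_element > 2])
```

On a column that contains NA, rle() treats each NA as its own run, so the trailing NA in pxlast[[5]] would simply appear as a run of length 1.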

number of occurrences by lines R

I have this array:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[75] 1 1 2 1 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2
[112] 2 1 1 2 2 2 2 2 2 1 2 1 1 2 1 1 2 1 1 2 1 1 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2
And I want to count the number of occurrences of '1' and '2'. From [1] to [70] and from [71] to the end.
I tried :
sum(x==1)
But this counts over the whole vector. How can I select a range of positions?
The function sum {base} returns the sum of all the values present in its arguments.
You could define the arguments the following way:
With x[a:b] you can set boundaries (for example, a=1 and b=10 selects the range from [1] to [10]).
With the operator == you can check which elements between your boundaries equal a specific value c, e.g. x[a:b]==c. This gives a logical vector, and sum() counts its TRUE values.
If you want to count more than one value (for example c and d, where c=1 and d=2), you can simply add up the results:
sum(x[a:b]==c) + sum(x[a:b]==d)
where a and b are your boundaries and c and d are the values you want to compare.
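A minimal sketch of the range counting described above, on a short made-up vector rather than the one in the question. The vector x, the split point split_at, and the names first_part and second_part are assumptions for illustration (the question would use a split at 70).

```r
# Hypothetical data: values are only 1 or 2
x <- c(1, 1, 1, 2, 1, 1, 2, 2, 1, 2)
split_at <- 6  # last position of the first range

first_part  <- x[1:split_at]
second_part <- x[(split_at + 1):length(x)]

# Counting one value at a time with sum() on a logical vector
ones_first <- sum(first_part == 1)
twos_first <- sum(first_part == 2)

# table() counts every value at once; factor() keeps a zero count
# for a value that happens to be missing from a range
counts_first  <- table(factor(first_part,  levels = c(1, 2)))
counts_second <- table(factor(second_part, levels = c(1, 2)))

print(counts_first)
print(counts_second)
```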

What does this R expression do?

sp_full_in is a matrix:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 0 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 1 1 1 1 2
2 1 0 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
3 2 2 0 2 2 2 2 2 2 1 1 2 2 2 1 2 1 1 1 2 1
4 1 2 1 0 2 2 2 1 2 1 1 1 2 2 1 2 1 1 2 2 1
5 2 2 2 2 0 2 2 2 2 1 1 2 1 2 1 2 1 1 1 2 2
6 2 1 1 1 1 0 1 1 1 2 2 2 2 2 1 2 1 2 2 1 1
7 2 1 1 2 1 1 0 1 1 2 1 1 2 1 1 2 1 1 1 2 1
8 1 2 1 1 1 2 2 0 1 1 1 2 2 2 1 2 1 1 2 1 1
9 2 2 1 2 1 1 2 2 0 1 1 2 1 2 1 2 1 1 2 2 2
10 2 2 1 1 1 2 2 1 1 0 2 2 2 2 1 1 1 1 1 2 2
11 2 2 1 1 1 2 1 1 1 1 0 2 1 2 1 2 1 1 1 1 2
12 1 2 1 1 2 1 1 2 1 1 1 0 2 2 1 2 1 2 1 1 1
13 2 2 2 2 1 3 2 2 2 1 1 3 0 2 1 2 2 1 2 2 2
14 2 2 1 2 1 2 1 2 1 2 2 2 1 0 1 2 1 1 1 1 1
15 2 2 2 2 2 2 2 2 2 1 1 2 2 1 0 2 1 1 1 1 2
16 1 2 2 1 1 2 2 2 1 1 2 2 2 2 1 0 1 1 2 1 2
17 2 2 1 1 1 1 1 2 1 1 1 1 2 2 1 2 0 2 2 1 1
18 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 0 1 1 1
19 2 2 1 2 1 2 2 2 2 1 1 2 2 2 1 2 1 1 0 2 2
20 2 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 1 1 0 1
21 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 2 1 1 2 1 0
mean(sp_full_in[which(sp_full_in != Inf)])
produces the result [1] 1.38322
I'm not quite sure I understand what this does, but the way I read it is: for every cell in sp_full_in, check if it is not infinite; if so, return the output 1; then average all the outputs. Is that correct? If not, how should it be read?
which(sp_full_in != Inf) returns a vector of integers (and only one of them is 1). That vector of integers is then handed to "[" as indices into sp_full_in and returns all the values of sp_full_in as a vector passed to the mean function.
It is a good idea to learn to read R expressions from the "inside out". Find the innermost function call and mentally evaluate it, in this case sp_full_in != Inf. That returns a logical matrix of all TRUEs that gets passed to which(), and since there is no 'arr.ind' argument, it returns an atomic vector of indices.
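The inside-out reading can be sketched by storing each intermediate result. The 3 x 3 matrix m and the names step1 to step3 are made up for illustration and stand in for sp_full_in.

```r
# Small hypothetical matrix in place of sp_full_in
m <- matrix(c(0, 1, 2,
              1, 0, 1,
              2, 1, 0), nrow = 3)

step1 <- m != Inf      # innermost call: a logical matrix, here all TRUE
step2 <- which(step1)  # integer positions of the TRUE cells (column-major)
step3 <- m[step2]      # the values at those positions, as a plain vector
result <- mean(step3)  # average of the selected values

print(step2)
print(result)

# With arr.ind = TRUE, which() returns row/column pairs instead
print(head(which(step1, arr.ind = TRUE), 3))
```

Since every cell passes the test here, the result is just the mean of the whole matrix, zeros on the diagonal included.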
The other answers are good at explaining why you get the mean of all the finite entries in the matrix, but it's worth noting that in this case the which does nothing. I used to have the bad habit of over-using which as well.
> a <- matrix(rnorm(4), nrow = 2)
> a
           [,1]       [,2]
[1,]  0.5049551 -0.7844590
[2,] -1.7170087 -0.8509076
> a[which(a != Inf)]
[1]  0.5049551 -1.7170087 -0.7844590 -0.8509076
> a[a != Inf]
[1]  0.5049551 -1.7170087 -0.7844590 -0.8509076
> ## Similarly if there was an infinite value
> a[1] <- Inf
> a
          [,1]       [,2]
[1,]       Inf -0.7844590
[2,] -1.717009 -0.8509076
> a[which(a != Inf)]
[1] -1.7170087 -0.7844590 -0.8509076
> a[a != Inf]
[1] -1.7170087 -0.7844590 -0.8509076
And, while we're at it, we should also mention the function is.finite which is often preferable to != Inf. is.finite will return FALSE on Inf, -Inf, NA and NaN.
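A small sketch of how the three filters differ once the data contains NA or -Inf. The matrix m below is made up for illustration. Note that with an NA present, which() is no longer a no-op, because it drops the NA that plain logical indexing keeps.

```r
# Hypothetical matrix with Inf, -Inf and NA mixed in
m <- matrix(c(1, Inf, NA, 4, -Inf, 6), nrow = 2)

kept_logical <- m[m != Inf]         # NA comparison yields NA, which is kept
kept_which   <- m[which(m != Inf)]  # which() drops the NA, keeps -Inf
kept_finite  <- m[is.finite(m)]     # drops Inf, -Inf, NA and NaN

print(kept_logical)
print(kept_which)
print(kept_finite)
print(mean(kept_finite))
```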
No, but you are close. When which is applied to the logical matrix, every cell has already been checked against the condition (here, not equal to Inf), and which returns the indices of all cells satisfying it. Your code then extracts the values of the cells at the returned indices and finally calculates the mean of those.

Off-diagonal and Diagonal symmetry check, Getting off-diagonal and diagonal element(s) without repetition of a Matrix

Suppose I have this matrix
8 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 3
3 8 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1
1 3 8 3 1 1 2 2 1 1 1 1 1 1 2 2 1
1 1 3 8 3 1 1 2 2 1 1 1 1 1 1 2 2
2 1 1 3 8 3 1 1 2 2 1 1 1 1 1 1 2
2 2 1 1 3 8 3 1 1 2 2 1 1 1 1 1 1
1 2 2 1 1 3 8 3 1 1 2 2 1 1 1 1 1
1 1 2 2 1 1 3 8 3 1 1 2 2 1 1 1 1
1 1 1 2 2 1 1 3 8 3 1 1 2 2 1 1 1
1 1 1 1 2 2 1 1 3 8 3 1 1 2 2 1 1
1 1 1 1 1 2 2 1 1 3 8 3 1 1 2 2 1
1 1 1 1 1 1 2 2 1 1 3 8 3 1 1 2 2
2 1 1 1 1 1 1 2 2 1 1 3 8 3 1 1 2
2 2 1 1 1 1 1 1 2 2 1 1 3 8 3 1 1
1 2 2 1 1 1 1 1 1 2 2 1 1 3 8 3 1
1 1 2 2 1 1 1 1 1 1 2 2 1 1 3 8 3
3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 3 8
I want to check:
1. Are the off-diagonals symmetric? (In the above matrix they are.)
2. Which elements occur in the off-diagonals, without repetition? (In the above matrix these are 1, 2, 3.)
3. Are the elements on the diagonal symmetric? If yes, print the element (like 8 in the above matrix).
# 1
all(mat == t(mat))
[1] TRUE
# 2
unique(mat[upper.tri(mat) | lower.tri(mat)])
[1] 3 1 2
# 3
if(length(unique(diag(mat))) == 1) print(diag(mat)[1])
[1] 8
mat <- as.matrix(read.table('abbas.txt'))
# From the documentation: 'Note that a matrix is only symmetric if its 'rownames' and 'colnames' are identical.' Hence unname().
isSymmetric(unname(mat))
unique(mat[lower.tri(mat)])
# I assume you mean the diagonal is symmetric when it is the same as its reverse.
all(diag(mat) == rev(diag(mat)))
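The three checks can be tried on a small matrix without reading a file. The 3 x 3 matrix below is a made-up stand-in for the 17 x 17 one in the question, and the result names are hypothetical.

```r
# Hypothetical small matrix with the same banded pattern
mat <- matrix(c(8, 3, 1,
                3, 8, 3,
                1, 3, 8), nrow = 3, byrow = TRUE)

# 1. Symmetry of the whole matrix (and so of the off-diagonals)
is_sym <- isSymmetric(mat)

# 2. Distinct off-diagonal elements: every cell where row != column
off_diag_values <- sort(unique(mat[row(mat) != col(mat)]))

# 3a. Diagonal is constant (first answer's reading of "symmetric")
diag_constant <- length(unique(diag(mat))) == 1
# 3b. Diagonal reads the same reversed (second answer's reading)
diag_palindrome <- all(diag(mat) == rev(diag(mat)))

print(is_sym)
print(off_diag_values)
if (diag_constant) print(diag(mat)[1])
print(diag_palindrome)
```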

Resources