Replacing Values in R - Error Received

So I have a data frame (called gen) filled with nucleotide information: each value is either A, C, G, or T. I am looking to replace A with 1, C with 2, G with 3, and T with 4. When I use the function gen[gen==A] = 1, I get the error:
Error in [<-.data.frame(*tmp*, gen == A, value = 1) :
object 'A' not found
I even tried using gen <- replace(gen, gen == A, 1), but it gives me the same error. Does anyone know how to fix this error? If not, is there a package that I can install in R with a program that will convert A, C, G, and T to numeric values?
Thanks

You need to wrap A in quotes or else R looks for a variable named A.
If the columns are character vectors:
R> gen = data.frame(x = sample(c("A", "C", "G", "T"), 10, replace = TRUE), y = sample(c("A", "C", "G", "T"), 10, replace= TRUE), stringsAsFactors = FALSE)
R> gen[gen == "A"] = 1
R> gen
x y
1 1 1
2 C C
3 G T
4 T T
5 G G
6 G G
7 1 1
8 C C
9 T 1
10 1 1
Also, one way to recode all four values at once:
R> library(car)
R> sapply(gen, recode, recodes = "'A'=1; 'C'=2; 'G'=3; 'T'=4")
x y
[1,] 1 1
[2,] 2 2
[3,] 3 4
[4,] 4 4
[5,] 3 3
[6,] 3 3
[7,] 1 1
[8,] 2 2
[9,] 4 1
[10,] 1 1
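If the columns are factors (since R 4.0.0, data.frame() only creates factors when stringsAsFactors = TRUE is given):
R> gen = data.frame(x = sample(c("A", "C", "G", "T"), 10, replace = TRUE), y = sample(c("A", "C", "G", "T"), 10, replace= TRUE), stringsAsFactors = TRUE)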
R> sapply(gen, as.numeric)
x y
[1,] 1 1
[2,] 2 4
[3,] 1 2
[4,] 4 1
[5,] 2 2
[6,] 1 4
[7,] 4 3
[8,] 3 3
[9,] 2 4
[10,] 4 2
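A base R alternative that needs no extra package is a named lookup vector. The following is only a sketch: the example data and the names codes and gen_num are made up for illustration, and it assumes every value in gen is one of A, C, G, or T.

```r
# Hypothetical example data; the real `gen` would come from the asker's file.
gen <- data.frame(x = c("A", "C", "G", "T", "A"),
                  y = c("T", "T", "G", "C", "A"),
                  stringsAsFactors = FALSE)

# Named lookup vector: names are the nucleotides, values are the codes.
codes <- c(A = 1L, C = 2L, G = 3L, T = 4L)

# Index the lookup vector by each column's values, one column at a time.
# as.character() makes this work for factor columns as well.
gen_num <- as.data.frame(
  lapply(gen, function(col) unname(codes[as.character(col)]))
)
gen_num
```

Unlike sapply(gen, as.numeric) on factors, this does not depend on which letters happen to occur in a column: factor codes are positions in the level set, so a column with no "A" would code "C" as 1.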

Related

Referring to multiple columns by name

My question is a variation of this question. What I want is to add a prefix to a vector of column names (a subset of all the column names). I tried to extend the solution from the link to more columns as follows, but got stuck.
Data:
m2 <- cbind(1,1:4,4:1)
colnames(m2) <- c("x","y","z")
x y z
[1,] 1 1 4
[2,] 1 2 3
[3,] 1 3 2
[4,] 1 4 1
colnames(m2)[colnames(m2) == c("x","z")] <- paste("Sub", colnames(m2)[colnames(m2) == c("x","z")], sep = "_")
Warning messages:
1: In colnames(m2) == c("x", "z") :
longer object length is not a multiple of shorter object length
2: In colnames(m2) == c("x", "z") :
longer object length is not a multiple of shorter object length
m2
Sub_x y z
[1,] 1 1 4
[2,] 1 2 3
[3,] 1 3 2
[4,] 1 4 1
The code gives two warnings and only changes one column.
Desired output:
m2 <- cbind(1,1:4,4:1)
colnames(m2) <- c("x","y","z")
colnames(m2)[1] <- paste("Sub", colnames(m2)[1], sep = "_")
colnames(m2)[3] <- paste("Sub", colnames(m2)[3], sep = "_")
m2
Sub_x y Sub_z
[1,] 1 1 4
[2,] 1 2 3
[3,] 1 3 2
[4,] 1 4 1
Alternative solution using dplyr
library(dplyr)
m2 %>%
as_tibble() %>%
rename_with(.cols = c("x", "z"), ~ stringr::str_c("Sub_", .))
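The warnings in the question come from ==, which compares element by element and recycles c("x", "z") against the three column names. A base R sketch using %in% instead, which tests set membership (the helper name sel is made up for illustration):

```r
m2 <- cbind(1, 1:4, 4:1)
colnames(m2) <- c("x", "y", "z")

# TRUE for every column whose name is in the target set; no recycling.
sel <- colnames(m2) %in% c("x", "z")

# Prefix only the selected names.
colnames(m2)[sel] <- paste("Sub", colnames(m2)[sel], sep = "_")
colnames(m2)
```

This keeps m2 a matrix, whereas the dplyr version returns a tibble.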

Creating a new vector based on pair of levels in R or Excel

Suppose I have an existing array that looks like
[,1][,2]
[1,] a b
[2,] a b
[3,] a b
[4,] a c
[5,] a c
[6,] b c
[7,] b c
[8,] b a
[9,] a b
I wish to create a vector of numbers that gives a unique code to each pair. The code itself is not important, as long as it is unique for each pair.
For example 2 valid codes are
[1] 1 1 1 2 2 3 3 4 1
[1] 14 14 14 15 15 8 8 67 14
The data I am working on has names in the first column and dates (in a format like 10-May-16) in the second column, stored in an Excel file.
A solution in either R or Excel is fine. Any help or suggestions, please.
Assuming that the matrix is named a:
as.integer(interaction(a[, 1], a[, 2]))
## [1] 3 3 3 5 5 6 6 2 3
Note: For the sake of reproducibility, we used this as a:
a <- matrix(c("a", "a", "a", "a", "a", "b", "b", "b", "a", "b",
"b", "b", "c", "c", "c", "c", "a", "b"), 9, 2)
You can first create a factor variable from the two columns and then coerce it to numeric:
as.numeric(as.factor(paste(myMat[,1], myMat[,2])))
[1] 1 1 1 2 2 4 4 3 1
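If the codes should follow the order in which pairs first appear (as in the first example in the question), a possible sketch is to match each pasted pair against the unique pairs. The names key and pair_code are made up for illustration, and the tab separator is an assumption: it must be a character that never occurs in the data.

```r
a <- matrix(c("a", "a", "a", "a", "a", "b", "b", "b", "a",
              "b", "b", "b", "c", "c", "c", "c", "a", "b"), 9, 2)

# One string per row; the separator keeps e.g. ("ab", "c") and ("a", "bc") apart.
key <- paste(a[, 1], a[, 2], sep = "\t")

# Position of each key among the unique keys, in order of first appearance.
pair_code <- match(key, unique(key))
pair_code
# [1] 1 1 1 2 2 3 3 4 1
```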

Recursively set dimnames on a list of matrices

On a list of matrices, I'd like to set only the colnames and leave the rownames as NULL. The matrices all have different dimensions. Unlike this example, the names are specific to each matrix.
provideDimnames gets me in the ballpark, but I'm having trouble telling it to ignore the NULL row names, and only set the column names. Here are my attempts.
> L <- list(matrix(1:6, 2), matrix(1:20, 5))
> dimnm <- list(list(NULL, letters[1:3]), list(NULL, letters[1:4]))
> lapply(L, provideDimnames, base = dimnm)
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, provideDimnames, base = list(dimnm))
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, provideDimnames, base = list(letters))
# [[1]]
# a b c
# a 1 3 5
# b 2 4 6
#
# [[2]]
# a b c d
# a 1 6 11 16
# b 2 7 12 17
# c 3 8 13 18
# d 4 9 14 19
# e 5 10 15 20
Almost, but I want [n,] for the row names. The desired result is:
> dimnames(L[[1]]) <- list(NULL, letters[1:3])
> dimnames(L[[2]]) <- list(NULL, letters[1:4])
> L
# [[1]]
# a b c
# [1,] 1 3 5
# [2,] 2 4 6
#
# [[2]]
# a b c d
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
> lapply(L, provideDimnames, base = list(NULL, letters))
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, `colnames<-`, , letters)
# Error in FUN(X[[1L]], ...) :
# unused argument (c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",
# "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"))
Is there a way to do this with provideDimnames()? setNames() wouldn't accept a list for the dim-names either.
How about something like this?
L <- list(matrix(1:6, 2), matrix(1:20, 5))
nms <- list(letters[1:3], letters[23:26])
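## SIMPLIFY = FALSE always returns a list, even if all matrices have the same dimensions
mapply(function(X,Y) {colnames(X) <-Y; X}, L, nms, SIMPLIFY = FALSE)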
[[1]]
a b c
[1,] 1 3 5
[2,] 2 4 6
[[2]]
w x y z
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
You can do this relatively easily, but you are complicating it by trying to set both dimnames when you really just want to change the column names. I would go about it this way:
## different dimnames; list of only the colnames
dimnm <- list(letters[1:3], letters[1:4])
## function to lapply which does the change
cnames <- function(i, lmat, names) {
colnames(lmat[[i]]) <- names[[i]]
lmat[[i]]
}
## do the change
L2 <- lapply(seq_along(L), cnames, lmat = L, names = dimnm)
L2
Gives us:
> L2
[[1]]
a b c
[1,] 1 3 5
[2,] 2 4 6
[[2]]
a b c d
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
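Both answers pair each matrix with its own vector of names. A more compact sketch of the same idea uses Map(), which walks the two lists in parallel and passes each pair to the replacement function `colnames<-` (the names nms and L2 are reused from the answers above for illustration):

```r
L <- list(matrix(1:6, 2), matrix(1:20, 5))
nms <- list(letters[1:3], letters[1:4])

# Map() calls `colnames<-`(L[[i]], nms[[i]]) for each i and returns a list.
L2 <- Map(`colnames<-`, L, nms)
L2
```

Because only the column names are assigned, the row names stay NULL and print as [1,], [2,], and so on.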

transforming dataset (similarity ratings)

I want to transform the following data format (simplified representation):
image1 image2 rating
1 1 2 6
2 1 3 5
3 1 4 7
4 2 3 3
5 2 4 5
6 3 4 1
Reproduced by:
structure(list(image1 = c(1, 1, 1, 2, 2, 3), image2 = c(2, 3,
4, 3, 4, 4), rating = c(6, 5, 7, 3, 5, 1)), .Names = c("image1",
"image2", "rating"), row.names = c(NA, -6L), class = "data.frame")
To a format where you get a sort of correlation matrix, where the first two columns figure as indicators, and ratings are the values:
1 2 3 4
1 NA 6 5 7
2 6 NA 3 5
3 5 3 NA 1
4 7 5 1 NA
Does any of you know of a function in R to do this?
I would rather use matrix indexing:
N <- max(dat[c("image1", "image2")])
out <- matrix(NA, N, N)
out[cbind(dat$image1, dat$image2)] <- dat$rating
out[cbind(dat$image2, dat$image1)] <- dat$rating
# [,1] [,2] [,3] [,4]
# [1,] NA 6 5 7
# [2,] 6 NA 3 5
# [3,] 5 3 NA 1
# [4,] 7 5 1 NA
I don't like the <<- operator very much, but it works for this (naming your structure s):
N <- max(s[,1:2])
m <- matrix(NA, nrow=N, ncol=N)
apply(s, 1, function(x) { m[x[1], x[2]] <<- m[x[2], x[1]] <<- x[3]})
> m
[,1] [,2] [,3] [,4]
[1,] NA 6 5 7
[2,] 6 NA 3 5
[3,] 5 3 NA 1
[4,] 7 5 1 NA
Not as elegant as Karsten's solution, but it does not rely on the order of the rows, nor does it require that all combinations be present.
Here is one approach, where dat is the data frame as defined in the question
res <- matrix(0, nrow=4, ncol=4) # dim may need to be adjusted
ll <- lower.tri(res, diag=FALSE)
res[which(ll)] <- dat$rating
res <- res + t(res)
diag(res) <- NA
This works only if the rows are ordered as in the question.
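The row-order caveat can be checked with a small sketch. It rebuilds the question's data, shuffles the rows (the permutation and the name shuffled are arbitrary choices for illustration), and applies the matrix-indexing approach, which looks up each cell by its (image1, image2) pair rather than by position:

```r
dat <- data.frame(image1 = c(1, 1, 1, 2, 2, 3),
                  image2 = c(2, 3, 4, 3, 4, 4),
                  rating = c(6, 5, 7, 3, 5, 1))

# Arbitrary reordering of the rows.
shuffled <- dat[c(6, 2, 4, 1, 5, 3), ]

N <- max(shuffled[c("image1", "image2")])
out <- matrix(NA_real_, N, N)

# A two-column index matrix addresses one (row, column) cell per data row.
out[cbind(shuffled$image1, shuffled$image2)] <- shuffled$rating
out[cbind(shuffled$image2, shuffled$image1)] <- shuffled$rating
out
```

Pairs that are missing from the data simply stay NA.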

data.table "key indices" or "group counter"

After creating a key on a data.table:
library(data.table)
set.seed(12345)
DT <- data.table(x = sample(LETTERS[1:3], 10, replace = TRUE),
y = sample(LETTERS[1:3], 10, replace = TRUE))
setkey(DT, x, y)
DT
# x y
# [1,] A B
# [2,] A B
# [3,] B B
# [4,] B B
# [5,] C A
# [6,] C A
# [7,] C A
# [8,] C A
# [9,] C C
# [10,] C C
I would like to get an integer vector giving for each row the corresponding "key index". I hope the expected output (column i) below will help clarify what I mean:
# x y i
# [1,] A B 1
# [2,] A B 1
# [3,] B B 2
# [4,] B B 2
# [5,] C A 3
# [6,] C A 3
# [7,] C A 3
# [8,] C A 3
# [9,] C C 4
# [10,] C C 4
I thought about using something like cumsum(!duplicated(DT[, key(DT), with = FALSE])) but am hoping there is a better solution. I feel this vector could be part of the table's internal representation, and maybe there is a way to access it? Even if it is not the case, what would you suggest?
Update: From v1.8.3, you can simply use the inbuilt special .GRP:
DT[ , i := .GRP, by = key(DT)]
I'd probably just do this, since I'm fairly confident that no index counter is available from within the call to [.data.table():
ii <- unique(DT)
ii[ , i := seq_len(nrow(ii))]
DT[ii]
# x y i
# 1: A B 1
# 2: A B 1
# 3: B B 2
# 4: B B 2
# 5: C A 3
# 6: C A 3
# 7: C A 3
# 8: C A 3
# 9: C C 4
# 10: C C 4
You could make this a one-liner, at the expense of an additional call to unique.data.table():
DT[unique(DT)[ , i := seq_len(nrow(unique(DT)))]]
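For comparison, the cumsum(!duplicated(...)) idea mentioned in the question can be sketched on a plain data frame without data.table. The data below is typed in by hand to mirror the sorted table shown in the question, and the approach assumes the rows are already sorted by the key columns, as they are after setkey():

```r
df <- data.frame(x = c("A", "A", "B", "B", "C", "C", "C", "C", "C", "C"),
                 y = c("B", "B", "B", "B", "A", "A", "A", "A", "C", "C"))

# duplicated() is FALSE at the first row of each new (x, y) combination,
# so the running count of those rows numbers the groups 1, 2, 3, ...
df$i <- cumsum(!duplicated(df))
df$i
# [1] 1 1 2 2 3 3 3 3 4 4
```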
