Why won't my matrix convert from character to numeric? - r

I'm trying to normalise my data for use in a neural network. My data train0 has all integer or double type columns except for the last one which is a factor. This is what I've tried doing.
n <- ncol(train0)-1
y_train <- train0$ffail
x_train <- as.matrix(train0[,4:n])
range_norm <- function(x) {
( (x - min(x)) / (max(x) - min(x)) )}
# Normalize training and test data
x_train_norm <- apply(x_train, 2, range_norm)
But I keep getting this error: Error in x - min(x) : non-numeric argument to binary operator
I've checked the type of each column in x_train and it says they're all character, so I've tried converting to numeric like this:
for(i in 1:ncol(x_train)){
x_train1[,i] <- as.numeric(x_train[,i])
print(typeof(x_train1[,i]))
}
However, after I use as.numeric, I print the type of each column to check and they're still characters.
I would appreciate any help with normalising the data and converting it to a numeric matrix. Thanks

Here is one way to convert a character matrix to a numeric matrix:
m = matrix(as.character(1:9), 3, 3)
m
## [,1] [,2] [,3]
## [1,] "1" "4" "7"
## [2,] "2" "5" "8"
## [3,] "3" "6" "9"
apply(m, 2, as.numeric)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9

We may set the storage mode of the matrix to "numeric".
m <- matrix(as.character(1:9), 3, 3)
m
# [,1] [,2] [,3]
# [1,] "1" "4" "7"
# [2,] "2" "5" "8"
# [3,] "3" "6" "9"
mode(m)
# [1] "character"
mode(m) <- "numeric" ## set storage mode
m
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
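To connect the conversion back to the normalisation in the question, here is a minimal sketch. The small x_train below is a made-up stand-in for the real data, and x_copy is a hypothetical name. It also shows why the column-by-column loop appears to do nothing: a matrix has a single type, so a numeric column assigned into a character matrix is coerced straight back to character.

```r
# Made-up stand-in for x_train: a character matrix holding numbers
x_train <- matrix(as.character(c(1, 2, 3, 10, 20, 30)), nrow = 3)

# Column-wise assignment into a character matrix is coerced back to character
x_copy <- x_train
x_copy[, 1] <- as.numeric(x_train[, 1])
typeof(x_copy)
# [1] "character"

# Convert the whole matrix in one step instead
mode(x_train) <- "numeric"

# Min-max normalisation, as in the question
range_norm <- function(x) (x - min(x)) / (max(x) - min(x))
x_train_norm <- apply(x_train, 2, range_norm)
x_train_norm
```

If any entry cannot be parsed as a number, the conversion yields NA with a warning, so it may be worth checking anyNA(x_train) before normalising.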

Related

R data suppression that ignores non-integers

I am trying to suppress all values in my data frame that are less than 5.
data[data < 5] <- NA
For the most part this works fine. However I also have some values that are categorical age groups "0-19" "20-24" and so on. I guess R is doing the subtraction and removing these as well. Is there a way to do this that ignores any value that isn't an integer?
Edit: Some dummy data as an example:
dummydata<-as.data.frame(rbind(c('0-19','4','5'),c('20-24','6','1')))
> dummydata
[,1] [,2] [,3]
[1,] "0-19" "4" "5"
[2,] "20-24" "6" "1"
dummydata[dummydata < 5] <- NA
> dummydata
[,1] [,2] [,3]
[1,] NA NA "5"
[2,] NA "6" NA
Desired output:
[,1] [,2] [,3]
[1,] "0-19" NA "5"
[2,] "20-24" "6" NA
We can first find the values that contain only digits (no '-'), convert them to numeric, and change those that are less than 5 to NA.
dummydata[] <- lapply(dummydata, function(x) {
tmp <- grepl('^\\d+$', x)
x[tmp][as.numeric(x[tmp]) < 5] <- NA
x
})
dummydata
# V1 V2 V3
#1 0-19 <NA> 5
#2 20-24 6 <NA>
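As a side note on why the age groups disappeared in the first place: R is not subtracting anything. When a character value is compared with a number, the number is converted to a string and the comparison is done string-wise. A small sketch on a made-up vector (x, before and is_num are illustrative names only):

```r
x <- c("0-19", "20-24", "4", "6")

# 5 is coerced to "5", so this is a string comparison, not a numeric one
before <- x < 5
before
# [1]  TRUE  TRUE  TRUE FALSE

# Restrict the numeric test to entries made up of digits only
is_num <- grepl("^\\d+$", x)
x[is_num][as.numeric(x[is_num]) < 5] <- NA
x
```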

Replace multiple values in a matrix

a is a matrix:
a <- matrix(1:9,3)
> a
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
I want to replace every 1 with good, every 4 with medium, and every 9 with bad.
I use the following code:
a[a==1] <- "good"
a[a==4] <- "medium"
a[a==9] <- "bad"
> a
[,1] [,2] [,3]
[1,] "good" "medium" "7"
[2,] "2" "5" "8"
[3,] "3" "6" "bad"
It works, but is this the simplest way to do it? Can I combine these lines into one command?
Using cut():
matrix(cut(a, breaks = c(0:9),
labels = c("good", 2:3, "medium", 5:8, "bad")), 3)
But I'm not really happy with the manual labels bit.
Maybe using match(), more flexible:
res <- matrix(c("good", "medium", "bad")[match(a, c(1, 4, 9))], 3)
res <- ifelse(is.na(res), a, res)
car::recode() does nicely here, returning the same matrix structure as was given as input.
car::recode(a, "1='good';4='medium';9='bad'")
# [,1] [,2] [,3]
# [1,] "good" "medium" "7"
# [2,] "2" "5" "8"
# [3,] "3" "6" "bad"
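Another base R variant of the match() idea is a named lookup vector. This is only a sketch; lookup, key, hit and res are names chosen here for illustration.

```r
a <- matrix(1:9, 3)

# Named vector: names are the old values, elements are the replacements
lookup <- c("1" = "good", "4" = "medium", "9" = "bad")

key <- as.character(a)          # old values as strings, in column order
hit <- key %in% names(lookup)   # which cells have a replacement

res <- a
res[hit] <- lookup[key[hit]]    # assigning strings turns res into a character matrix
res
```

Adding another replacement only requires one more entry in lookup.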

From list of characters to matrix/data-frame of numeric (R)

I have a long list, whose elements are lists of length one containing a character vector. These vectors can have different lengths.
The elements of the vectors are characters, but I would like to convert them to numeric, as they actually represent numbers.
I would like to create a matrix, or a data frame, whose rows are the vectors above, converted into numeric. Since they have different lengths, the "right ends" of each row could be filled with NA.
I am trying to use the function rbind.fill.matrix from the library {plyr}, but the only thing I could get is a long numeric 1-d array with all the numbers inside, instead of a matrix.
This is the best I could do to get a list of numeric (dat here is my original list):
dat<-sapply(sapply(dat,unlist),as.numeric)
How can I create the matrix now?
Thank you!
I would do something like:
library(stringi)
temp <- stri_list2matrix(dat, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
The basic idea is that stri_list2matrix will convert the list to a matrix, but it would still be a character matrix. as.numeric would remove the dimensional attributes of the matrix, so we add those back in with:
`dim<-` ## Yes, the backticks are required -- or at least quotes
POC:
dat <- list(1:2, 1:3, 1:2, 1:5, 1:6)
dat <- lapply(dat, as.character)
dat
# [[1]]
# [1] "1" "2"
#
# [[2]]
# [1] "1" "2" "3"
#
# [[3]]
# [1] "1" "2"
#
# [[4]]
# [1] "1" "2" "3" "4" "5"
#
# [[5]]
# [1] "1" "2" "3" "4" "5" "6"
library(stringi)
temp <- stri_list2matrix(dat, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
final
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 2 NA NA NA NA
# [2,] 1 2 3 NA NA NA
# [3,] 1 2 NA NA NA NA
# [4,] 1 2 3 4 5 NA
# [5,] 1 2 3 4 5 6
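If stringi is not available, the same padding can be sketched in base R. This assumes, as in the POC above, that dat is a list of character vectors; n is a helper name introduced here. Indexing a vector past its end returns NA, which provides the right-hand padding.

```r
dat <- list(c("1", "2"), c("1", "2", "3"), "4")

# Length of the longest vector decides the number of columns
n <- max(lengths(dat))

# Convert each vector, pad it to length n with NA, then stack the rows
final <- do.call(rbind, lapply(dat, function(v) as.numeric(v)[seq_len(n)]))
final
```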

R: duplicates elimination in a matrix, keeping track of multiplicities

I have a basic problem with R.
I have produced the matrix
M
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "a" "3"
[4,] "c" "1"
I would like to obtain the 3X2 matrix
[,1] [,2] [,3]
[1,] "a" "1" "3"
[2,] "b" "2" NA
[3,] "c" "1" NA
obtained by eliminating duplicates in M[,1] and writing in N[i,2], N[i,3] the values in M[,2] corresponding to the same element in M[,1], for all i's. The "NA"'s in N[,3] correspond to the singletons in M[,1].
I know how to eliminate duplicates from a vector in R: my problem is keeping track of the elements in M[,2] and writing them into the resulting matrix N. I tried for loops, but they do not work well in my "real world" case, where the matrices are much bigger.
Any suggestions?
I thank you very much.
You can use dcast in the reshape2 package after turning your matrix to a data.frame. To reverse the process you can use melt.
df = data.frame(c("a","b","a","c"),c(1:3,1))
colnames(df) = c("factor","obs")
require(reshape2)
df2=dcast(df, factor ~ obs)
Now df2 is:
factor 1 2 3
1 a 1 NA 3
2 b NA 2 NA
3 c 1 NA NA
To me it makes more sense to keep it like this. But if you need it in your format:
res = t(apply(df2, 1, function(x) {
  # keep the non-NA entries of the row
  newLine = as.vector(x[which(!is.na(x))], mode="any")
  # pad with NA back to the original width
  c(newLine, rep(NA, ncol(df2) - length(newLine)))
}))
res = res[,-ncol(res)]
[,1] [,2] [,3]
[1,] "a" " 1" " 3"
[2,] "b" " 2" NA
[3,] "c" " 1" NA
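A base R alternative that works directly on the character matrix is to split the second column by the first and pad each group with NA. This is a sketch rather than a tuned solution; groups, n and padded are illustrative names.

```r
M <- cbind(c("a", "b", "a", "c"), c("1", "2", "3", "1"))

# One character vector per distinct key; split() orders the keys alphabetically
groups <- split(M[, 2], M[, 1])

# Pad every group to the size of the largest one
n <- max(lengths(groups))
padded <- do.call(rbind, lapply(groups, function(v) v[seq_len(n)]))

# Put the keys in front and drop the dimnames added along the way
N <- unname(cbind(names(groups), padded))
N
```

Note that the rows come out sorted by key, which may differ from the order of first appearance in M.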

Construct dynamic-sized array in R

I was wondering what the ways are to construct a dynamically sized array in R.
For example, I want to construct an n-vector whose dimension n is determined dynamically. The following code works:
> x=NULL
> n=2;
> for (i in 1:n) x[i]=i;
> x
[1] 1 2
As another example, I want to construct an n by 2 matrix where the number of rows n is determined dynamically. But I fail even at assigning the first row:
> tmp=c(1,2)
> x=NULL
> x[1,]=tmp
Error in x[1, ] = tmp : incorrect number of subscripts on matrix
> x[1,:]=tmp
Error: unexpected ':' in "x[1,:"
Thanks and regards!
I think the answers you are looking for are rbind() and cbind():
> x=NULL # could also use x <- c()
> rbind(x, c(1,2))
[,1] [,2]
[1,] 1 2
> x <- rbind(x, c(1,2))
> x <- rbind(x, c(1,2)) # now extend row-wise
> x
[,1] [,2]
[1,] 1 2
[2,] 1 2
> x <- cbind(x, c(1,2)) # or column-wise
> x
[,1] [,2] [,3]
[1,] 1 2 1
[2,] 1 2 2
The strategy of trying to assign to "new indices" on the fly as you attempted can be done in some languages but cannot be done that way in R.
You can also use sparse matrices provided in the Matrix package. They would allow assignments of the form M <- sparseMatrix(i=200, j=50, x=234) resulting in a single value at row 200, column 50 and 0's everywhere else.
require(Matrix)
M <- sparseMatrix(i=200, j=50, x=234)
M[1,1]
# [1] 0
M[200, 50]
# [1] 234
But I think the use of sparse matrices is best reserved for later use after mastering regular matrices.
It is possible to set the array's dimensions after filling it as a one-dimensional vector.
Emulating the one-dimensional snippet from the question, here is how it can be done with higher dimensions.
> x=c()
> tmp=c(1,2)
> n=6
> for (i in seq(1, by=2, length=n)) x[i:(i+1)] =tmp;
> dim(x) = c(2,n)
> x
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 1 1 1
[2,] 2 2 2 2 2 2
>
Rather than using i:(i+1) as index, it may be preferable to use seq(i, length=2) or better yet, seq(i, length=length(tmp)) for a more generic approach, as illustrated below (for a 4 x 7 array example)
> x=c()
> tmp=c(1,2,3,4)
> n=7
> for (i in seq(1, by=length(tmp), length=n))
x[seq(i, length=length(tmp))] = tmp;
> dim(x) = c(length(tmp),n)
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2
[3,] 3 3 3 3 3 3 3
[4,] 4 4 4 4 4 4 4
>
We can also obtain a similar result by re-assigning x with cbind/rbind, as follows.
> tmp=c(1,2)
> n=6
> x=rbind(tmp)
> for (i in 1:n) x=rbind(x, tmp);
> x
[,1] [,2]
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
Note: one can get rid of the "tmp" names (these are a side effect of the rbind), with
> dimnames(x)=NULL
You can rbind it:
tmp = c(1,2)
x = NULL
rbind(x, tmp)
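When the number of rows is only known once the loop finishes, a common alternative to calling rbind on every iteration is to collect the rows in a list and bind them once at the end. The loop condition below is a made-up placeholder for whatever actually decides when to stop.

```r
rows <- list()
i <- 0

# Placeholder stopping rule; in practice this depends on the data
while (i < 4) {
  i <- i + 1
  rows[[i]] <- c(i, 2 * i)   # each element is one row of the future matrix
}

# Bind all collected rows into a matrix in a single call
x <- do.call(rbind, rows)
x
```

This avoids copying the growing matrix on each pass, which tends to matter for large n.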
I believe this is the approach you need:
arr <- array(1)
arr <- append(arr,3)
arr[1] <- 2
print(arr[1])
(found on rosettacode.org)
When I want to dynamically construct an array (matrix), I do it like so:
n <- 500
new.mtrx <- matrix(ncol = 2, nrow = n)
head(new.mtrx)
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] NA NA
[4,] NA NA
[5,] NA NA
[6,] NA NA
Your matrix is now ready to accept vectors.
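A short sketch of filling the preallocated matrix row by row; the values c(i, i^2) are arbitrary and only there for illustration.

```r
n <- 5
new.mtrx <- matrix(NA_real_, nrow = n, ncol = 2)

# Each row must receive a vector of length ncol(new.mtrx)
for (i in seq_len(n)) {
  new.mtrx[i, ] <- c(i, i^2)
}
new.mtrx
```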
Assuming you already have a vector, you pass that to the matrix() function. Notice how the values are "broken" into the matrix column-wise. This can be changed with the byrow argument.
matrix(letters, ncol = 2)
[,1] [,2]
[1,] "a" "n"
[2,] "b" "o"
[3,] "c" "p"
[4,] "d" "q"
[5,] "e" "r"
[6,] "f" "s"
[7,] "g" "t"
[8,] "h" "u"
[9,] "i" "v"
[10,] "j" "w"
[11,] "k" "x"
[12,] "l" "y"
[13,] "m" "z"
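For comparison, a small sketch of the same call with byrow = TRUE, using only the first six letters to keep the output short (by_row is an illustrative name):

```r
# Fill row by row instead of column by column
by_row <- matrix(letters[1:6], ncol = 2, byrow = TRUE)
by_row
#      [,1] [,2]
# [1,] "a"  "b"
# [2,] "c"  "d"
# [3,] "e"  "f"
```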
n = 5
x = c(1,2) %o% rep(1,n)
x
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 1
# [2,] 2 2 2 2 2
x = rep(1,n) %o% c(1,2)
x
# [,1] [,2]
# [1,] 1 2
# [2,] 1 2
# [3,] 1 2
# [4,] 1 2
# [5,] 1 2
