R data suppression that ignores non-integers

I am trying to suppress all values in my data frame that are less than 5.
data[data < 5] <- NA
For the most part this works fine. However, I also have some values that are categorical age groups ("0-19", "20-24" and so on). I guess R is doing the subtraction and removing these as well. Is there a way to do this that ignores any value that isn't an integer?
Edit: Some dummy data as an example:
dummydata<-as.data.frame(rbind(c('0-19','4','5'),c('20-24','6','1')))
> dummydata
[,1] [,2] [,3]
[1,] "0-19" "4" "5"
[2,] "20-24" "6" "1"
dummydata[dummydata < 5] <- NA
> dummydata
[,1] [,2] [,3]
[1,] NA NA "5"
[2,] NA "6" NA
Desired output:
[,1] [,2] [,3]
[1,] "0-19" NA "5"
[2,] "20-24" "6" NA

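The age groups are removed because the columns are character, so the < operator compares strings rather than numbers: the 5 becomes "5", and "0-19" and "20-24" both sort before "5". We can first find the values that consist only of digits (no '-'), convert those to numeric, and set the ones below 5 to NA.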
dummydata[] <- lapply(dummydata, function(x) {
  tmp <- grepl('^\\d+$', x)               # TRUE only for pure digit strings such as "4"
  x[tmp][as.numeric(x[tmp]) < 5] <- NA    # blank those numbers that are below 5
  x
})
dummydata
# V1 V2 V3
#1 0-19 <NA> 5
#2 20-24 6 <NA>
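A variation on the same idea, sketched here as an alternative to the regular expression: let as.numeric() decide what counts as a number. Strings such as "0-19" become NA (the coercion warning is suppressed on purpose), so only genuine numbers are compared with 5. The code passes stringsAsFactors = FALSE explicitly so the columns are character in any R version.

```r
# Same dummy data as in the question; stringsAsFactors = FALSE keeps the columns character
dummydata <- as.data.frame(rbind(c('0-19', '4', '5'), c('20-24', '6', '1')),
                           stringsAsFactors = FALSE)

dummydata[] <- lapply(dummydata, function(x) {
  num <- suppressWarnings(as.numeric(x))  # "0-19", "20-24" -> NA
  x[!is.na(num) & num < 5] <- NA          # only real numbers below 5 are suppressed
  x
})
dummydata
```

Note that as.numeric() also accepts strings such as "4.5" or "1e3", which the '^\\d+$' pattern leaves untouched, so pick whichever matches your definition of "a number".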

Related

Why won't my matrix convert from character to numeric?

I'm trying to normalise my data for use in a neural network. My data train0 has all integer or double type columns except for the last one which is a factor. This is what I've tried doing.
n <- ncol(train0)-1
y_train <- train0$ffail
x_train <- as.matrix(train0[,4:n])
range_norm <- function(x) {
( (x - min(x)) / (max(x) - min(x)) )}
# Normalize training and test data
x_train_norm <- apply(x_train, 2, range_norm)
But I keep getting this error: Error in x - min(x) : non-numeric argument to binary operator
I've checked the type of each column in x_train and they're all characters, so I've tried converting them to numeric like this:
for(i in 1:ncol(x_train)){
x_train1[,i] <- as.numeric(x_train[,i])
print(typeof(x_train1[,i]))
}
However, after I use as.numeric, I print the type of each column to check and they're still characters.
I would appreciate any help in trying to normalise the data and how to convert the data to a numeric matrix. Thanks
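Assigning numbers into a column of a character matrix (as the loop does) coerces them straight back to character, because a matrix can only hold one type; convert the whole matrix instead. Here is one way to convert a character matrix to a numeric matrix: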
m = matrix(as.character(1:9), 3, 3)
m
## [,1] [,2] [,3]
## [1,] "1" "4" "7"
## [2,] "2" "5" "8"
## [3,] "3" "6" "9"
apply(m, 2, as.numeric)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
Alternatively, we can set the storage mode of the matrix to "numeric".
m <- matrix(as.character(1:9), 3, 3)
m
# [,1] [,2] [,3]
# [1,] "1" "4" "7"
# [2,] "2" "5" "8"
# [3,] "3" "6" "9"
mode(m)
# [1] "character"
mode(m) <- "numeric" ## set storage mode
m
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
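To show why the loop in the question left everything as character, and how the conversion fits into the normalisation step, here is a small sketch. The matrix x_train below is a hypothetical stand-in for the real as.matrix(train0[, 4:n]) result; range_norm is the function from the question.

```r
# A matrix holds a single type: assigning numbers into a character matrix
# coerces them straight back to character.
m <- matrix(as.character(1:4), 2, 2)
m[, 1] <- as.numeric(m[, 1])
typeof(m)  # still "character"

# Hypothetical stand-in for the character matrix built from train0
x_train <- matrix(as.character(c(1, 2, 3, 10, 20, 30)), nrow = 3)

range_norm <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

mode(x_train) <- "numeric"                     # convert the whole matrix at once
x_train_norm <- apply(x_train, 2, range_norm)  # rescale each column to [0, 1]
x_train_norm
```

If some entries are not valid numbers, the conversion turns them into NA with a warning, and min()/max() then return NA unless na.rm = TRUE is used.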

From list of characters to matrix/data-frame of numeric (R)

I have a long list, whose elements are lists of length one containing a character vector. These vectors can have different lengths.
The elements of the vectors are characters, but I would like to convert them to numeric, as they actually represent numbers.
I would like to create a matrix, or a data frame, whose rows are the vectors above, converted into numeric. Since they have different lengths, the "right ends" of each row could be filled with NA.
I am trying to use the function rbind.fill.matrix from the plyr package, but all I could get is a long 1-d numeric vector with all the numbers in it, instead of a matrix.
This is the best I could do to get a list of numeric (dat here is my original list):
dat<-sapply(sapply(dat,unlist),as.numeric)
How can I create the matrix now?
Thank you!
I would do something like:
library(stringi)
temp <- stri_list2matrix(dat, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
The basic idea is that stri_list2matrix converts the list to a matrix (padding the short rows with NA), but the result is still a character matrix. as.numeric drops the dim attribute, so we add it back with:
`dim<-` ## Yes, the backticks are required -- or at least quotes
Proof of concept:
dat <- list(1:2, 1:3, 1:2, 1:5, 1:6)
dat <- lapply(dat, as.character)
dat
# [[1]]
# [1] "1" "2"
#
# [[2]]
# [1] "1" "2" "3"
#
# [[3]]
# [1] "1" "2"
#
# [[4]]
# [1] "1" "2" "3" "4" "5"
#
# [[5]]
# [1] "1" "2" "3" "4" "5" "6"
library(stringi)
temp <- stri_list2matrix(dat, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
final
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 2 NA NA NA NA
# [2,] 1 2 3 NA NA NA
# [3,] 1 2 NA NA NA NA
# [4,] 1 2 3 4 5 NA
# [5,] 1 2 3 4 5 6
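If you would rather not depend on stringi, a base-R sketch of the same idea pads each vector with NA by indexing past its end (out-of-range indices return NA) and then binds the rows together. The nested list below is made-up data shaped like the one described in the question: elements that are lists of length one holding a character vector.

```r
# Made-up data shaped like the question: each element is a length-one list
dat <- lapply(list(1:2, 1:3, 1:2, 1:5, 1:6), function(v) list(as.character(v)))

flat <- lapply(dat, unlist)          # list of character vectors
max_len <- max(lengths(flat))        # width of the final matrix

# Indexing beyond a vector's length yields NA, which pads the short rows
final <- do.call(rbind, lapply(flat, function(v) as.numeric(v)[seq_len(max_len)]))
final
```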

R - filtering Matrix based off True/False vector

I have a data structure (a list) that can contain both vectors and matrices. I want to filter it based on a TRUE/FALSE vector, but I can't figure out how to filter both kinds of element successfully.
result <- structure(list(aba = c(1, 2, 3, 4), beta = c("a", "b", "c", "d"),
chi = structure(c(0.438148361863568, 0.889733991585672, 0.0910745360888541,
0.0512442977633327, 0.812013201415539, 0.717306115897372, 0.995319503592327,
0.758843480376527, 0.366544214077294, 0.706843026448041, 0.108310810523108,
0.225777650484815, 0.831163870869204, 0.274351604515687, 0.323493955424055,
0.351171918679029), .Dim = c(4L, 4L))), .Names = c("aba", "beta", "chi"))
> result
$aba
[1] 1 2 3 4
$beta
[1] "a" "b" "c" "d"
$chi
[,1] [,2] [,3] [,4]
[1,] 0.43814836 0.8120132 0.3665442 0.8311639
[2,] 0.88973399 0.7173061 0.7068430 0.2743516
[3,] 0.09107454 0.9953195 0.1083108 0.3234940
[4,] 0.05124430 0.7588435 0.2257777 0.3511719
tf <- c(T,F,T,T)
What I would like to do is something like
> lapply(result,function(x) {ifelse(tf,x,NA)})
$aba
[1] 1 NA 3 4
$beta
[1] "a" NA "c" "d"
$chi
[1] 0.43814836 NA 0.09107454 0.05124430
but the $chi matrix structure is lost.
The result I'd expect is
ifelse(matrix(tf,ncol=4,nrow=4),result$chi,NA)
[,1] [,2] [,3] [,4]
[1,] 0.43814836 0.8120132 0.3665442 0.8311639
[2,] NA NA NA NA
[3,] 0.09107454 0.9953195 0.1083108 0.3234940
[4,] 0.05124430 0.7588435 0.2257777 0.3511719
The challenge I'm having a problem solving is how to match the tf vector to the data. It feels like I need to set it using a conditional based on data type, which I'd like to avoid. Thoughts and answers are appreciated.
I don't see how you can avoid either checking the data type or the "dimensions" of the data. As such, I would propose something like:
lapply(result, function(x) {
if (is.null(dim(x))) x[!tf] <- NA else x[!tf, ] <- NA
x
})
# $aba
# [1] 1 NA 3 4
#
# $beta
# [1] "a" NA "c" "d"
#
# $chi
# [,1] [,2] [,3] [,4]
# [1,] 0.43814836 0.8120132 0.3665442 0.8311639
# [2,] NA NA NA NA
# [3,] 0.09107454 0.9953195 0.1083108 0.3234940
# [4,] 0.05124430 0.7588435 0.2257777 0.3511719
This seems fairly simple:
is.na(tf) <- !tf # convert FALSE to NA
result$chi[ tf, ] # and use the default behavior of "[" with NA arg
[,1] [,2] [,3] [,4]
[1,] 0.43814836 0.8120132 0.3665442 0.8311639
[2,] NA NA NA NA
[3,] 0.09107454 0.9953195 0.1083108 0.3234940
[4,] 0.05124430 0.7588435 0.2257777 0.3511719
But now I see that you also wanted NAs at the corresponding positions of the atomic vectors. Unfortunately, "[" with the extra (empty) column argument, as in x[tf, ], throws an "incorrect number of dimensions" error on that type of object.
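One way to avoid checking the type at all, assuming every vector has length length(tf) and every matrix has length(tf) rows: R stores matrices column by column, so recycling tf over all elements of x marks the same rows in every column. A sketch with a made-up 4 x 4 integer matrix standing in for result$chi:

```r
# Made-up stand-in for the question's list (chi replaced by a simple 4 x 4 matrix)
result <- list(aba = c(1, 2, 3, 4),
               beta = c("a", "b", "c", "d"),
               chi = matrix(1:16, nrow = 4))
tf <- c(TRUE, FALSE, TRUE, TRUE)

out <- lapply(result, function(x) {
  # Column-major storage: element k of a matrix lies in row ((k - 1) %% nrow) + 1,
  # so recycling tf to length(x) selects whole rows; dim(x) is preserved
  x[!rep_len(tf, length(x))] <- NA
  x
})
out
```

The catch is that nothing checks the row count: if nrow(x) differs from length(tf), the recycled mask silently lands on the wrong cells, so the explicit dim() check in the first answer is safer for irregular data.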

R: duplicates elimination in a matrix, keeping track of multiplicities

I have a basic problem with R.
I have produced the matrix
M
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "a" "3"
[4,] "c" "1"
I would like to obtain the 3x3 matrix N
[,1] [,2] [,3]
[1,] "a" "1" "3"
[2,] "b" "2" NA
[3,] "c" "1" NA
That is, N is obtained by eliminating the duplicates in M[,1] and writing into N[i,2], N[i,3] the values of M[,2] that correspond to the same element of M[,1], for every i. The NAs in N[,3] correspond to the singletons in M[,1].
I know how to eliminate duplicates from a vector in R; my problem is keeping track of the elements of M[,2] and writing them into the resulting matrix N. I tried for loops, but they do not work well in my real-world case, where the matrices are much bigger.
Any suggestions?
I thank you very much.
You can use dcast in the reshape2 package after turning your matrix to a data.frame. To reverse the process you can use melt.
df = data.frame(c("a","b","a","c"),c(1:3,1))
colnames(df) = c("factor","obs")
require(reshape2)
df2=dcast(df, factor ~ obs)
now df2 is:
factor 1 2 3
1 a 1 NA 3
2 b NA 2 NA
3 c 1 NA NA
To me it makes more sense to keep it like this. But if you need it in your format:
res = t(apply(df2, 1, function(x) {
  newLine = as.vector(x[which(!is.na(x))], mode = "any")  # keep the non-NA cells
  c(newLine, rep(NA, ncol(df2) - length(newLine)))        # pad back to full width
}))
res = res[, -ncol(res)]  # drop the last (all-NA) column
[,1] [,2] [,3]
[1,] "a" " 1" " 3"
[2,] "b" " 2" NA
[3,] "c" " 1" NA

Construct dynamic-sized array in R

I was wondering what ways there are to construct a dynamically sized array in R.
For example, I want to construct an n-vector whose length n is determined dynamically. The following code works:
> x=NULL
> n=2;
> for (i in 1:n) x[i]=i;
> x
[1] 1 2
As another example, I want to construct an n-by-2 matrix where the number of rows n is determined dynamically. But I fail even at assigning the first row:
> tmp=c(1,2)
> x=NULL
> x[1,]=tmp
Error in x[1, ] = tmp : incorrect number of subscripts on matrix
> x[1,:]=tmp
Error: unexpected ':' in "x[1,:"
Thanks and regards!
I think the answers you are looking for are rbind() and cbind():
> x=NULL # could also use x <- c()
> rbind(x, c(1,2))
[,1] [,2]
[1,] 1 2
> x <- rbind(x, c(1,2))
> x <- rbind(x, c(1,2)) # now extend row-wise
> x
[,1] [,2]
[1,] 1 2
[2,] 1 2
> x <- cbind(x, c(1,2)) # or column-wise
> x
[,1] [,2] [,3]
[1,] 1 2 1
[2,] 1 2 2
The strategy of assigning to "new indices" on the fly, as you attempted, works in some languages (and for plain vectors in R, as your first example shows), but it cannot be done that way for matrices in R.
You can also use sparse matrices provided in the Matrix package. They would allow assignments of the form M <- sparseMatrix(i=200, j=50, x=234) resulting in a single value at row 200, column 50 and 0's everywhere else.
require(Matrix)
M <- sparseMatrix(i=200, j=50, x=234)
M[1,1]
# [1] 0
M[200, 50]
# [1] 234
But I think the use of sparse matrices is best reserved for later use after mastering regular matrices.
It is possible to set the dimensions of the array after filling it in a one-dimensional (vector) fashion.
Emulating the 1-dimension snippet of the question, here's the way it can be done with higher dimensions.
> x=c()
> tmp=c(1,2)
> n=6
> for (i in seq(1, by=2, length=n)) x[i:(i+1)] =tmp;
> dim(x) = c(2,n)
> x
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 1 1 1
[2,] 2 2 2 2 2 2
>
Rather than using i:(i+1) as the index, it may be preferable to use seq(i, length=2), or better yet seq(i, length=length(tmp)) for a more generic approach, as illustrated below for a 4 x 7 array:
> x=c()
> tmp=c(1,2,3,4)
> n=7
> for (i in seq(1, by=length(tmp), length=n))
x[seq(i, length=length(tmp))] = tmp;
> dim(x) = c(length(tmp),n)
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2
[3,] 3 3 3 3 3 3 3
[4,] 4 4 4 4 4 4 4
>
We can also obtain a similar result by re-assigning x with cbind/rbind, as follows:
> tmp=c(1,2)
> n=6
> x=rbind(tmp)
> for (i in 1:n) x=rbind(x, tmp);
> x
[,1] [,2]
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
Note: one can get rid of the "tmp" names (these are a side effect of the rbind), with
> dimnames(x)=NULL
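When n is not known in advance, a common alternative to calling rbind() inside the loop (which copies the growing matrix on every iteration) is to collect the rows in a list and bind them once at the end. A sketch, where the stopping rule and the row contents are placeholders:

```r
rows <- list()
i <- 0
repeat {
  i <- i + 1
  if (i > 6) break             # placeholder stopping condition
  rows[[i]] <- c(i, 2 * i)     # placeholder row contents
}
x <- do.call(rbind, rows)      # a single rbind call builds the n x 2 matrix
x
```

Because the list elements are unnamed, the result also has no "tmp"-style row names to clean up.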
You can rbind it:
tmp = c(1,2)
x = NULL
rbind(x, tmp)
I believe this is the approach you need:
arr <- array(1)
arr <- append(arr,3)
arr[1] <- 2
print(arr[1])
(found on rosettacode.org)
When I want to dynamically construct an array (matrix), I do it like so:
n <- 500
new.mtrx <- matrix(ncol = 2, nrow = n)
head(new.mtrx)
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] NA NA
[4,] NA NA
[5,] NA NA
[6,] NA NA
Your matrix is now ready to accept vectors.
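Filling the preallocated matrix then comes down to assigning one row per iteration. A sketch where the row values c(i, i^2) are arbitrary placeholders:

```r
n <- 5
new.mtrx <- matrix(ncol = 2, nrow = n)   # n x 2 matrix of NA (logical)
for (i in seq_len(n)) {
  new.mtrx[i, ] <- c(i, i^2)             # assigning numbers makes it a numeric matrix
}
new.mtrx
```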
Assuming you already have a vector, you can pass it to the matrix() function. Notice how the values are filled into the matrix column-wise; this can be changed with the byrow argument.
matrix(letters, ncol = 2)
[,1] [,2]
[1,] "a" "n"
[2,] "b" "o"
[3,] "c" "p"
[4,] "d" "q"
[5,] "e" "r"
[6,] "f" "s"
[7,] "g" "t"
[8,] "h" "u"
[9,] "i" "v"
[10,] "j" "w"
[11,] "k" "x"
[12,] "l" "y"
[13,] "m" "z"
n = 5
x = c(1,2) %o% rep(1,n)
x
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 1
# [2,] 2 2 2 2 2
x = rep(1,n) %o% c(1,2)
x
# [,1] [,2]
# [1,] 1 2
# [2,] 1 2
# [3,] 1 2
# [4,] 1 2
# [5,] 1 2
