Replace multiple values in a matrix - r

a is a matrix:
a <- matrix(1:9,3)
> a
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
I want to replace every 1 with "good", every 4 with "medium", and every 9 with "bad".
I use the following code:
a[a==1] <- "good"
a[a==4] <- "medium"
a[a==9] <- "bad"
> a
[,1] [,2] [,3]
[1,] "good" "medium" "7"
[2,] "2" "5" "8"
[3,] "3" "6" "bad"
It works, but is this the simplest way to do it? Can I combine these lines into one command?

Using cut():
matrix(cut(a, breaks = c(0:9),
labels = c("good", 2:3, "medium", 5:8, "bad")), 3)
But I'm not really happy with having to write out the labels manually.
Maybe use match(), which is more flexible:
res <- matrix(c("good", "medium", "bad")[match(a, c(1, 4, 9))], 3)
res <- ifelse(is.na(res), a, res)
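The match() idea above can also be written with a named lookup vector, so that the old and new values live in one place. This is only a minimal sketch; the names lookup and hit are illustrative and not part of the original answer.

```r
a <- matrix(1:9, 3)

# old value (as a name) -> new label; 'lookup' is a hypothetical helper name
lookup <- c("1" = "good", "4" = "medium", "9" = "bad")

res <- a
storage.mode(res) <- "character"   # keep dimensions, switch type to character

hit <- res %in% names(lookup)      # which cells have a replacement
res[hit] <- lookup[res[hit]]       # replace only those cells
res
```

Adding another replacement then only means adding one element to lookup.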

car::recode() does nicely here, returning the same matrix structure as was given as input.
car::recode(a, "1='good';4='medium';9='bad'")
# [,1] [,2] [,3]
# [1,] "good" "medium" "7"
# [2,] "2" "5" "8"
# [3,] "3" "6" "bad"

Related

Why won't my matrix convert from character to numeric?

I'm trying to normalise my data for use in a neural network. My data train0 has all integer or double type columns except for the last one which is a factor. This is what I've tried doing.
n <- ncol(train0)-1
y_train <- train0$ffail
x_train <- as.matrix(train0[,4:n])
range_norm <- function(x) {
( (x - min(x)) / (max(x) - min(x)) )}
# Normalize training and test data
x_train_norm <- apply(x_train, 2, range_norm)
But I keep getting this error: Error in x - min(x) : non-numeric argument to binary operator
I've checked the type of each column in x_train and it says they're all characters, so I've tried converting to numeric like this:
for(i in 1:ncol(x_train)){
x_train1[,i] <- as.numeric(x_train[,i])
print(typeof(x_train1[,i]))
}
However, after I use as.numeric, I print the type of each column to check and they're still characters.
I would appreciate any help in trying to normalise the data and how to convert the data to a numeric matrix. Thanks
Here is one way to convert a character matrix to a numeric matrix:
m = matrix(as.character(1:9), 3, 3)
m
## [,1] [,2] [,3]
## [1,] "1" "4" "7"
## [2,] "2" "5" "8"
## [3,] "3" "6" "9"
apply(m, 2, as.numeric)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
We may set the storage mode of the matrix to "numeric".
m <- matrix(as.character(1:9), 3, 3)
m
# [,1] [,2] [,3]
# [1,] "1" "4" "7"
# [2,] "2" "5" "8"
# [3,] "3" "6" "9"
mode(m)
# [1] "character"
mode(m) <- "numeric" ## set storage mode
m
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
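As a sketch of why the loop in the question leaves the columns as character: a matrix has a single type, so numeric values assigned into a column of a character matrix are coerced straight back to character. Converting the whole matrix at once avoids that. The toy matrix below stands in for x_train, which is not available here.

```r
m <- matrix(as.character(1:9), 3, 3)   # stand-in for the character x_train

# Column-by-column assignment: each numeric column is coerced back to character
m2 <- m
for (i in seq_len(ncol(m2))) {
  m2[, i] <- as.numeric(m2[, i])
}
typeof(m2)

# Converting the whole matrix changes the type and keeps the dimensions
storage.mode(m) <- "numeric"
typeof(m)

# The normalisation from the question then runs without the error
range_norm <- function(x) (x - min(x)) / (max(x) - min(x))
x_norm <- apply(m, 2, range_norm)
```

If the real data contains entries that are not numbers, the conversion will produce NA with a warning, which is worth checking before normalising.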

How to transpose a matrix into a second column, while the first column being the first element

First of all I've never coded anything in my life, and I'm just learning R this week.
I'm not sure if the title is at all clear, but I guess showing my problem is easier:
Let's say I have this Matrix (m):
[,1] [,2] [,3] [,4]
[1,] A 1 2 3
[2,] B 1 4
[3,] C 3
Basically that A contains 1, 2 and 3, B contains 1 and 4 and so on.
How would I show that in a matrix with 2 columns only?
[,1] [,2]
[1,] A 1
[2,] A 2
[3,] A 3
[4,] B 1
[5,] B 4
[6,] C 3
Thanks a lot!
Assuming that the blanks shown are NA, get the count of non-NA elements per row with rowSums, replicate the first column by those counts ('n'), and cbind it with the transposed remaining columns after omitting the NAs
n <- rowSums(!is.na(m1[,-1]))
cbind(rep(m1[,1], n), na.omit(c(t(m1[,-1]))))
# [,1] [,2]
#[1,] "A" "1"
#[2,] "A" "2"
#[3,] "A" "3"
#[4,] "B" "1"
#[5,] "B" "4"
#[6,] "C" "3"
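Or a slightly more compact option is to replicate the first column with the col index of the transposed remaining columns, cbind it with those transposed columns, and finally remove the NA rows with na.omit (taking col of the transpose keeps the indexing correct when the number of value columns differs from the number of rows)
na.omit(cbind(m1[,1][col(t(m1[,-1]))], c(t(m1[,-1]))))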
# [,1] [,2]
#[1,] "A" "1"
#[2,] "A" "2"
#[3,] "A" "3"
#[4,] "B" "1"
#[5,] "B" "4"
#[6,] "C" "3"
NOTE: a matrix cannot hold multiple column types. So, if any element is character, all the elements are converted to character
data
m1 <- structure(c("A", "B", "C", "1", "1", "3", "2", "4", NA, "3",
NA, NA), .Dim = 3:4)
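The same reshaping can be wrapped in a small function. This is a sketch only: to_long is a hypothetical helper name, and it assumes the first column holds the labels and the blanks are NA, as above. It is also tried on a matrix m2 (made up for illustration) that has fewer rows than value columns.

```r
m1 <- structure(c("A", "B", "C", "1", "1", "3", "2", "4", NA, "3",
                  NA, NA), .Dim = 3:4)

# to_long() is a hypothetical helper, not part of the original answer
to_long <- function(m) {
  vals <- m[, -1, drop = FALSE]        # everything except the label column
  keep <- !is.na(t(vals))              # non-NA cells, in row-by-row order
  cbind(rep(m[, 1], each = ncol(vals))[keep],  # label repeated once per value column
        t(vals)[keep])                         # values flattened row by row
}

to_long(m1)

# A non-square example: 2 rows, 3 value columns
m2 <- rbind(c("A", "1", "2", "3"),
            c("B", "4", NA, NA))
to_long(m2)
```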

strsplit split on either or depending on

Once again I'm struggling with strsplit. I'm transforming some strings to data frames, but there's a forward slash (/) and some white space in my strings that keep bugging me. I could work around it, but I'm eager to learn whether I can use some fancy either/or pattern in strsplit. My working example below should illustrate the issue.
The strsplit function I'm currently using:
str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "\\s+")[[x]])) }
one type of string I got,
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
str_to_df(string1)
#> [,1] [,2]
#> [1,] "One" "58/2"
#> [2,] "Two" "22/3"
#> [3,] "Three" "15/5"
another type I got in the same spot,
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string2)
#> [,1] [,2] [,3] [,4]
#> [1,] "One" "58" "/" "2"
#> [2,] "Two" "22" "/" "3"
#> [3,] "Three" "15" "/" "5"
They obviously create different outputs, and I can't figure out how to code a solution that works for both. Below is my desired outcome. Thank you in advance!
desired_outcome <- structure(c("One", "Two", "Three", "58", "22",
"15", "2", "3", "5"), .Dim = c(3L, 3L))
desired_outcome
#> [,1] [,2] [,3]
#> [1,] "One" "58" "2"
#> [2,] "Two" "22" "3"
#> [3,] "Three" "15" "5"
This works:
str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "[/[:space:]]+")[[x]])) }
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string1)
# [,1] [,2] [,3]
# [1,] "One" "58" "2"
# [2,] "Two" "22" "3"
# [3,] "Three" "15" "5"
str_to_df(string2)
# [,1] [,2] [,3]
# [1,] "One" "58" "2"
# [2,] "Two" "22" "3"
# [3,] "Three" "15" "5"
Another approach with tidyr could be:
library(dplyr) # for %>% and as_tibble()
library(tidyr) # for separate()
string1 %>%
as_tibble() %>%
separate(value, into = c("Col1", "Col2", "Col3"), sep = "[/[:space:]]+")
# A tibble: 3 x 3
# Col1 Col2 Col3
# <chr> <chr> <chr>
# 1 One 58 2
# 2 Two 22 3
# 3 Three 15 5
We can create a function to split at one or more space or tab or forward slash
f1 <- function(str1) do.call(rbind, strsplit(str1, "[/\t ]+"))
f1(string1)
# [,1] [,2] [,3]
#[1,] "One" "58" "2"
#[2,] "Two" "22" "3"
#[3,] "Three" "15" "5"
f1(string2)
# [,1] [,2] [,3]
#[1,] "One" "58" "2"
#[2,] "Two" "22" "3"
#[3,] "Three" "15" "5"
Or we can use read.csv after replacing each run of tabs, slashes, and spaces with a common delimiter
read.csv(text=gsub("[\t/ ]+", ",", string1), header = FALSE)
# V1 V2 V3
#1 One 58 2
#2 Two 22 3
#3 Three 15 5

Subset dataframe into equal subgroup chunks

I have a data frame df that needs subsetting into chunks of 2 names. In the example below, there are 4 unique names: a, b, c, d. I need to subset it into 2 one-column matrices, one for a,b and one for c,d.
Output format:
name1
item_value
item_value
...
END
name2
item_value
item_value
...
END
Example:
#dummy data
df <- data.frame(name=sort(c(rep(letters[1:4],2),"a","a","c")),
item=round(runif(11,1,10)),
stringsAsFactors=FALSE)
#tried approach - split per name. I need to split per 2 names.
lapply(split(df,f=df$name),
function(x)
{name <- unique(x$name)
as.matrix(c(name,x[,2],"END"))
})
#expected output
[,1]
[1,] "a"
[2,] "8"
[3,] "9"
[4,] "6"
[5,] "4"
[6,] "END"
[1,] "b"
[2,] "2"
[3,] "10"
[4,] "END"
[,2]
[1,] "c"
[2,] "6"
[3,] "6"
[4,] "2"
[5,] "END"
[1,] "d"
[2,] "4"
[3,] "1"
[4,] "END"
Note: Actual df has ~300000 rows with ~35000 unique names.
You may try this.
# for each 'name', "pad" 'item' with 'name' and 'END'
l1 <- lapply(split(df, f = df$name), function(x){
name <- unique(x$name)
as.matrix(c(name, x$item, "END"))
})
# create a sequence of offsets, to select the list elements two by two
# (assumes an even number of names)
steps <- seq(from = 0, to = length(l1) - 2, by = 2)
# loop over 'steps' to bind together list elements, two by two.
l2 <- lapply(steps, function(x){
do.call(rbind, l1[1:2 + x])
})
l2
# [[1]]
# [,1]
# [1,] "a"
# [2,] "6"
# [3,] "4"
# [4,] "10"
# [5,] "3"
# [6,] "END"
# [7,] "b"
# [8,] "6"
# [9,] "7"
# [10,] "END"
#
# [[2]]
# [,1]
# [1,] "c"
# [2,] "2"
# [3,] "6"
# [4,] "10"
# [5,] "END"
# [6,] "d"
# [7,] "5"
# [8,] "4"
# [9,] "END"
Instead of building the list from individual names, build it from subsets of the data.frame column:
res <- list("a_b" = c(df[df$name == "a",2],"END",df[df$name == "b", 2],"END"),
"c_d" = c(df[df$name == "c",2],"END", df[df$name == "d", 2],"END"))
res2 <- vector(mode="list",length=2)
res2 <- sapply(1:(length(unique(df$name))/2),function(x) {
sapply(seq(1,length(unique(df$name))-1,by=2), function(y) {
name <- unique(df$name)
res2[x] <- as.matrix(c(name[y],df[df$name == name[y],2],"END",name[y+1],df[df$name == name[y+1],2],"END"))
})
})
answer <- res2[,1]
This gives a matrix of lists, since there are two nested sapply calls; I think everything you want is in res2[,1].
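For a large number of names, it may be cheaper to pad each name's items once and then group the padded pieces by position, rather than subsetting df repeatedly. The following is a sketch under that assumption; chunk_size, padded and grp are illustrative names, and set.seed is only there to make the dummy data reproducible.

```r
set.seed(42)  # only so the dummy data is reproducible
df <- data.frame(name = sort(c(rep(letters[1:4], 2), "a", "a", "c")),
                 item = round(runif(11, 1, 10)),
                 stringsAsFactors = FALSE)

chunk_size <- 2                                  # names per chunk (assumed)
nm <- unique(df$name)

# items per name, kept in order of first appearance
items <- split(df$item, factor(df$name, levels = nm))

# pad each name once: name, its items, then "END"
padded <- Map(function(n, v) c(n, v, "END"), names(items), items)

# group index per name: 1 1 2 2 ...
grp <- ceiling(seq_along(padded) / chunk_size)

# bind each group into a one-column character matrix
chunks <- lapply(split(padded, grp),
                 function(p) as.matrix(unlist(p, use.names = FALSE)))
chunks
```

With an odd number of names, the last chunk simply contains a single name.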

R: duplicates elimination in a matrix, keeping track of multiplicities

I have a basic problem with R.
I have produced the matrix
M
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "a" "3"
[4,] "c" "1"
I would like to obtain the 3x3 matrix N
[,1] [,2] [,3]
[1,] "a" "1" "3"
[2,] "b" "2" NA
[3,] "c" "1" NA
obtained by eliminating duplicates in M[,1] and writing in N[i,2], N[i,3] the values in M[,2] corresponding to the same element in M[,1], for all i's. The "NA"'s in N[,3] correspond to the singletons in M[,1].
I know how to eliminate duplicates from a vector in R: my problem is to keep track of the elements in M[,2] and write them in the resulting matrix N. I tried with for loops, but they do not work so well in my "real world" case, where the matrices are much bigger.
Any suggestions?
I thank you very much.
You can use dcast in the reshape2 package after turning your matrix to a data.frame. To reverse the process you can use melt.
df = data.frame(c("a","b","a","c"),c(1:3,1))
colnames(df) = c("factor","obs")
require(reshape2)
df2=dcast(df, factor ~ obs)
now df2 is:
factor 1 2 3
1 a 1 NA 3
2 b NA 2 NA
3 c 1 NA NA
To me it makes more sense to keep it like this. But if you need it in your format:
res = t(apply(df2,1,function(x) { newLine = as.vector(x[which(!is.na(x))],mode="any"); newLine=c(newLine,rep(NA, ncol(df2)-length(newLine) )) }))
res = res[,-ncol(res)]
[,1] [,2] [,3]
[1,] "a" " 1" " 3"
[2,] "b" " 2" NA
[3,] "c" " 1" NA
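A base R alternative that avoids reshape2 is to split the second column by the first and pad each group with NA to a common width. This is a sketch rather than the answer's method; the names vals, width and padded are illustrative. Note that split() orders the groups alphabetically, not by first appearance.

```r
M <- matrix(c("a", "b", "a", "c",
              "1", "2", "3", "1"), ncol = 2)

vals  <- split(M[, 2], M[, 1])      # values grouped by key
width <- max(lengths(vals))         # widest group decides the column count

# pad every group with NA up to 'width' and stack the groups as rows
padded <- do.call(rbind, lapply(vals, function(v) {
  c(v, rep(NA_character_, width - length(v)))
}))

N <- unname(cbind(names(vals), padded))
N
```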
