Create new column based on 4 values in another column - r

I want to create a new column based on 4 values in another column.
if col1=1 then col2= G;
if col1=2 then col2=H;
if col1=3 then col2=J;
if col1=4 then col2=K.
How do I do this in R?
I have tried if/else and ifelse, but neither seems to work. Thanks.

You could use nested ifelse:
col2 <- ifelse(col1==1, "G",
        ifelse(col1==2, "H",
        ifelse(col1==3, "J",
        ifelse(col1==4, "K",
        NA )))) # all other values map to NA
For a simple one-to-one mapping like this it is overkill, but the same pattern extends to more complicated conditions.
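One possible reason the plain if/else attempts failed is that if expects a single TRUE or FALSE, whereas ifelse works element by element. A minimal sketch on a made-up vector (the value 5 is an assumption, included only to show the NA fallback):

```r
# Made-up input; 5 has no mapping and should fall through to NA
col1 <- c(1, 4, 2, 3, 5)

# ifelse is vectorised, so each element is tested in turn
col2 <- ifelse(col1 == 1, "G",
        ifelse(col1 == 2, "H",
        ifelse(col1 == 3, "J",
        ifelse(col1 == 4, "K",
               NA))))

col2
# "G" "K" "H" "J" NA
```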

You have a special case of a lookup where the index values are the integers 1:4. This means you can use vector indexing to solve your problem in one easy step.
First, create some sample data:
set.seed(1)
dat <- data.frame(col1 = sample(1:4, 10, replace = TRUE))
Next, define the lookup values, and use [ subsetting to find the desired results:
values <- c("G", "H", "J", "K")
dat$col2 <- values[dat$col1]
The results:
dat
col1 col2
1 2 H
2 2 H
3 3 J
4 4 K
5 1 G
6 4 K
7 4 K
8 3 J
9 3 J
10 1 G
More generally, you can use [ subsetting combined with match to solve this kind of problem:
index <- c(1, 2, 3, 4)
values <- c("G", "H", "J", "K")
dat$col2 <- values[match(dat$col1, index)]
dat
col1 col2
1 2 H
2 2 H
3 3 J
4 4 K
5 1 G
6 4 K
7 4 K
8 3 J
9 3 J
10 1 G
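The match approach pays off when the keys are not simply 1:4. The sketch below uses hypothetical codes (10, 20, 35, 50 and an unmapped 99) that are not in the original question, to show that unmatched keys come back as NA:

```r
# Hypothetical keys that are not consecutive integers,
# so plain positional indexing would not work here
index  <- c(10, 20, 35, 50)
values <- c("G", "H", "J", "K")

dat <- data.frame(col1 = c(20, 50, 10, 99, 35))

# match() returns the position of each col1 value in index (NA if absent)
dat$col2 <- values[match(dat$col1, index)]

dat$col2
# "H" "K" "G" NA "J"
```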

There are a number of ways of doing this, but here's one.
set.seed(357)
mydf <- data.frame(col1 = sample(1:4, 10, replace = TRUE))
mydf$col2 <- rep(NA, nrow(mydf))
mydf[mydf$col1 == 1, "col2"] <- "A"
mydf[mydf$col1 == 2, "col2"] <- "B"
mydf[mydf$col1 == 3, "col2"] <- "C"
mydf[mydf$col1 == 4, "col2"] <- "D"
mydf
col1 col2
1 1 A
2 1 A
3 2 B
4 1 A
5 3 C
6 2 B
7 4 D
8 3 C
9 4 D
10 4 D
Here's one using car's recode.
library(car)
mydf$col3 <- recode(mydf$col1, "1 = 'A'; 2 = 'B'; 3 = 'C'; 4 = 'D'")
One more from this question:
mydf$col4 <- c("A", "B", "C", "D")[mydf$col1]

You could have a look at ?symnum.
In your case, something like:
col2<-symnum(col1, seq(0.5, 4.5, by=1), symbols=c("G", "H", "J", "K"))
should get you close.
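Note that symnum returns a special noquote object rather than a plain character vector. If a plain character column is wanted, base R's cut (a different function, swapped in here as an alternative) can bin on the same break points. A small sketch with made-up input:

```r
# Made-up input values in 1:4
col1 <- c(1, 2, 3, 4, 2)

# Same break points as the symnum call: (0.5,1.5], (1.5,2.5], ...
col2 <- as.character(
  cut(col1,
      breaks = seq(0.5, 4.5, by = 1),
      labels = c("G", "H", "J", "K"))
)

col2
# "G" "H" "J" "K" "H"
```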

Related

Is it possible to return the values of reference columns to multiple columns in r

I am organizing a large dataset adapted to my research. Suppose that I have 9 observations (records) and 4 columns as follows:
z <- data.frame("fa" = c(1, NA, NA, 2, 1, 1, 2, 1, 1),
"fb" = c(2, 2, NA, 1, NA, NA, NA, 1, 2),
"initial_1" = c("A", "B", "B", "B", "A", "C", "D", "B", "A"),
"initial_2" = c("D", "C", "C", "A", "B", "A", "A", "D", "D"))
I want to create two new columns, fa_new and fb_new, according to the values of the first two columns, fa and fb, which are linked to the reference columns, initial_1 and initial_2, such that fa == # matches initial_#.
For example, as can be seen above, the first record of the column fa is 1, which is linked to "A" of initial_1. Thus, the first record of the new column fa_new will be "A". Likewise, the first record of fb is 2, which is linked to "D" of initial_2; thus, the first record of fb_new will be "D".
Accordingly, my expectation is:
fa_new fb_new
1 A D
2 NA C
3 NA NA
4 A B
5 A NA
6 C NA
7 A NA
8 B B
9 A D
Is this possible using r?
You can use lapply to do this for multiple columns :
cols <- 1:2
init_cols <- paste0('initial_', cols)
new_cols <- paste0(names(z)[cols], '_new')
inds <- 1:nrow(z)
z[new_cols] <- lapply(z[cols], function(x) z[init_cols][cbind(inds, x)])
z
# fa fb initial_1 initial_2 fa_new fb_new
#1 1 2 A D A D
#2 NA 2 B C <NA> C
#3 NA NA B C <NA> <NA>
#4 2 1 B A A B
#5 1 NA A B A <NA>
#6 1 NA C A C <NA>
#7 2 NA D A A <NA>
#8 1 1 B D B B
#9 1 2 A D A D
The logic here is that cbind builds a two-column index matrix of (row, column) pairs. The row number comes from inds (1:nrow(z)), and the column number comes from the fa/fb columns; this matrix is then used to subset the initial_* columns of z.
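To see the matrix-indexing idea in isolation, here is a small sketch on a made-up character matrix (the names m, rows, cols and picked are illustrative only). An NA in the index matrix yields NA in the result, which is why missing fa/fb values come through as NA:

```r
# Made-up 3 x 2 matrix: column 1 is A, B, C and column 2 is D, E, F
m <- matrix(c("A", "B", "C", "D", "E", "F"), nrow = 3)

rows <- 1:3
cols <- c(1, 2, NA)   # which column to pick in each row; NA = unknown

# Each row of cbind(rows, cols) is one (row, column) pair
picked <- m[cbind(rows, cols)]

picked
# "A" "E" NA
```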
The actual data frame is called dataset; the following version should work on the real data.
cols <- 1:2
init_cols <- paste0('fuinitials_', 1:94)
new_cols <- paste0(names(z)[cols], '_new')
inds <- 1:nrow(z)
z1 <- data.frame(z)
z1[cols][z1[cols] < 1] <- NA
z1[new_cols] <- lapply(z1[cols], function(x) z1[init_cols][cbind(inds, x)])

Sort matrix by colnames from another matrix

I have two matrices with the same dimensions and they both have the same stock names as colnames, but in a different order!
I would like to sort the matrix "A" by the colnames of the matrix "B".
So the A colnames and the according value should be in the same order as the colnames of B.
How can I do this?
Example:
Kind Regards
Your example in R terms would be
A <- matrix(c(1, 4, 2), nrow = 1)
colnames(A) <- c("B", "D", "E")
A
# B D E
# [1,] 1 4 2
B <- matrix(c(2, 5, 1), nrow = 1)
colnames(B) <- c("E", "B", "D")
B
# E B D
# [1,] 2 5 1
Then we may simply subset the columns of A in the same order as they are in B:
A[, colnames(B)]
# E B D
# 2 1 4
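Because A has a single row here, the subset above is simplified to a named vector. If the result should stay a matrix, adding drop = FALSE is one way to do it. A short sketch using the same example values (A_sorted is an illustrative name):

```r
A <- matrix(c(1, 4, 2), nrow = 1, dimnames = list(NULL, c("B", "D", "E")))
B <- matrix(c(2, 5, 1), nrow = 1, dimnames = list(NULL, c("E", "B", "D")))

# drop = FALSE keeps the 1 x 3 matrix shape instead of returning a vector
A_sorted <- A[, colnames(B), drop = FALSE]

is.matrix(A_sorted)
# TRUE
colnames(A_sorted)
# "E" "B" "D"
```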

filter rows of dataframe based on an ordered vector of characters

Not sure if my question is a duplicate, but searching in stackoverflow did not yield any possible solutions.
I have the following data frame
num char
1 A
2 K
3 I
4 B
5 I
6 N
7 G
8 O
9 Z
10 Q
I would like to select only those rows that form the word BINGO (in that order) in the char column resulting in the following dataframe:
num char
4 B
5 I
6 N
7 G
8 O
Any help would be much appreciated.
One option is to use zoo::rollapply:
library(zoo)
bingo = c("B", "I", "N", "G", "O") # the pattern you want to check
# use rollapply to check if the pattern exists in any window
index = which(rollapply(df$char, length(bingo), function(x) all(x == bingo)))
# extract the window from the table
df[mapply(`:`, index, index + length(bingo) - 1),]
# num char
#4 4 B
#5 5 I
#6 6 N
#7 7 G
#8 8 O
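If adding zoo is not desirable, the same sliding-window check can be sketched in base R. This is an illustrative alternative, not the answer's own method; it assumes char is a character column and that the data frame has at least as many rows as the pattern has letters:

```r
df <- data.frame(num = 1:10,
                 char = c("A", "K", "I", "B", "I", "N", "G", "O", "Z", "Q"),
                 stringsAsFactors = FALSE)
bingo <- c("B", "I", "N", "G", "O")
n <- length(bingo)

# Every possible window start, then test each window against the pattern
starts <- seq_len(nrow(df) - n + 1)
hit <- vapply(starts,
              function(i) all(df$char[i:(i + n - 1)] == bingo),
              logical(1))
index <- starts[hit]

# Expand each matching start into the full run of row numbers
rows <- unlist(lapply(index, function(i) i:(i + n - 1)))
df[rows, ]
```

With the example data this selects rows 4 to 8.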
Here is a solution using a recursive function - the letters of BINGO do not need to be consecutive, but they do need to be in order.
df <- data.frame(num=1:10,char=c("A","K","I","B","I","N","G","O","Z","Q"),stringsAsFactors = FALSE)
word<-"BINGO"
chars<-strsplit(word,"")[[1]]
findword <- function(chars,df,a=integer(0),m=0){ #a holds the result so far on recursion, m is the position to start searching
  z <- m+match(chars[1],df$char[(m+1):nrow(df)]) #next match of next letter
  if(!is.na(z)){
    if(length(chars)==1){
      a <- c(z,a)
    } else {
      a <- c(z,Recall(chars[-1],df,a,max(m,z))) #Recall is function referring to itself recursively
    }
    return(a) #returns row index numbers of df
  } else {
    return(NA)
  }
}
result <- df[findword(chars,df),]
I went too fast the first time, but based on the example you have given, I think this can work (filter here is from dplyr):
library(dplyr)
filter(df[which(df$char == "B"):dim(df)[1],], char %in% c("B","I","N","G","O"))
I guess nobody likes loops, but this is a possibility in base:
char <- c("A", "K", "I", "B", "I", "N", "G", "O", "Z", "Q")
num <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
df <- data.frame(num, char)
word <- "BINGO"
index <- NULL
for(z in 1:nrow(df)){
  if(substr(word, 1,1) == as.character(df[z,2])){
    index <- c(index, z)
    word <- substr(word, 2, nchar(word))
  }
}
df[index,]
d = data.frame(num=1:15, char=c('A', 'K', 'I', 'B', 'I', 'N', 'G', 'O', 'Z', 'Q', 'B', 'I', 'N', 'G', 'O'))
w = "BINGO"
N = nchar(w)
char_str = paste(d$char, sep='', collapse='')
idx = as.integer(gregexpr(w, char_str)[[1]])
idx = as.integer(sapply(idx, function(i)seq(i, length=N)))
d[idx, ]
num char
4 4 B
5 5 I
6 6 N
7 7 G
8 8 O
11 11 B
12 12 I
13 13 N
14 14 G
15 15 O

na.strings applied to a dataframe

I currently have a dataframe in which there are several rows I would like converted to "NA". When I first imported this dataframe from a .csv, I could use na.strings=c("A", "B", "C") and so on to remove the values I didn't want.
I want to do the same thing again, but this time using a dataframe already, not importing another .csv
To import the data, I used:
data<-read.csv("code.csv", header=T, strip.white=TRUE, stringsAsFactors=FALSE, na.strings=c("", "A", "B", "C"))
Now, with "data", I would like to subset it while removing even more specific values in the rows. I tried something like:
data2<-data.frame(data, na.strings=c("D", "E", "F"))
Of course this doesn't work because I think na.strings only works with the "read" package.. not other functions. Is there any equivalent to simply convert certain values into NA so I can na.omit(data2) fairly easily?
Thanks for your help.
Here's a way to replace values in multiple columns:
# an example data frame
dat <- data.frame(x = c("D", "E", "F", "G"),
y = c("A", "B", "C", "D"),
z = c("X", "Y", "Z", "A"))
# x y z
# 1 D A X
# 2 E B Y
# 3 F C Z
# 4 G D A
# values to replace
na.strings <- c("D", "E", "F")
# index matrix
idx <- Reduce("|", lapply(na.strings, "==", dat))
# replace values with NA
is.na(dat) <- idx
dat
# x y z
# 1 <NA> A X
# 2 <NA> B Y
# 3 <NA> C Z
# 4 G <NA> A
Just assign the NA values directly.
e.g.:
x <- data.frame(a=1:5, b=letters[1:5])
# > x
# a b
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
# 5 5 e
# convert the 'b' and 'd' in column b to NA
x$b[x$b %in% c('b', 'd')] <- NA
# > x
# a b
# 1 1 a
# 2 2 <NA>
# 3 3 c
# 4 4 <NA>
# 5 5 e
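The same %in% idea can be applied to every column at once, which is close to what na.strings does on import. A minimal sketch with made-up data (dat and na_strings are illustrative names; the columns are assumed to be character, not factor):

```r
# Made-up data; stringsAsFactors = FALSE keeps the columns as character
dat <- data.frame(x = c("D", "G", "F", "H"),
                  y = c("A", "B", "C", "D"),
                  stringsAsFactors = FALSE)
na_strings <- c("D", "E", "F")

# dat[] <- keeps the data frame structure while replacing each column
dat[] <- lapply(dat, function(col) {
  col[col %in% na_strings] <- NA
  col
})

# Rows containing any NA can then be dropped
na.omit(dat)
```

Here only the second row ("G", "B") survives na.omit.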
data[ data == "D" ] = NA
Note that if you were trying to replace NA with "D", the reverse (df[ df == NA ] = "D") will not work; you would need to use df[is.na(df)] <- "D"
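The reason is that any comparison with NA is itself NA, so it never selects anything. A small sketch on a made-up one-column data frame (all_na is an illustrative name):

```r
df <- data.frame(a = c("x", NA, "z"), stringsAsFactors = FALSE)

# Comparing with NA gives NA everywhere, never TRUE
all_na <- all(is.na(df == NA))
all_na
# TRUE

# is.na() is the test that actually finds the missing cells
df[is.na(df)] <- "D"
df$a
# "x" "D" "z"
```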
Since we don't have your data, I will use mtcars. Suppose we want to set any value in mtcars that is equal to 4 or 19.2 to NA:
ind <- which(mtcars == 4 | mtcars == 19.2, arr.ind = TRUE)
mtcars[ind] <- NA
In your setting you would replace these numbers with "D" or "E".

Finding the Column Index for a Specific Value

I am having a brain cramp. Below is a toy dataset:
df <- data.frame(
id = 1:6,
v1 = c("a", "a", "c", NA, "g", "h"),
v2 = c("z", "y", "a", NA, "a", "g"),
stringsAsFactors=F)
I have a specific value that I want to find across a set of defined columns and I want to identify the position it is located in. The fields I am searching are characters and the trick is that the value I am looking for might not exist. In addition, null strings are also present in the dataset.
Assuming I knew how to do this, the variable position indicates the values I would like returned.
> df
id v1 v2 position
1 1 a z 1
2 2 a y 1
3 3 c a 2
4 4 <NA> <NA> 99
5 5 g a 2
6 6 h g 99
The general rule is that I want to find the position of value "a", and if it is not located or if v1 is missing, then I want 99 returned.
In this instance, I am searching across v1 and v2, but in reality, I have 10 different variables. It is also worth noting that the value I am searching for can only exist once across the 10 variables.
What is the best way to generate this recode?
Many thanks in advance.
Use match:
> df$position <- apply(df,1,function(x) match('a',x[-1], nomatch=99 ))
> df
id v1 v2 position
1 1 a z 1
2 2 a y 1
3 3 c a 2
4 4 <NA> <NA> 99
5 5 g a 2
6 6 h g 99
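With many rows, a row-wise apply can be slow. One possible vectorised sketch, relying on the stated assumption that "a" occurs at most once per row, uses max.col on a logical matrix (hits and has_a are illustrative names, not part of the answer above):

```r
df <- data.frame(
  id = 1:6,
  v1 = c("a", "a", "c", NA, "g", "h"),
  v2 = c("z", "y", "a", NA, "a", "g"),
  stringsAsFactors = FALSE)

# Logical matrix of exact matches; missing values count as "not found"
hits <- df[-1] == "a"
hits[is.na(hits)] <- FALSE

# max.col gives the column of the (single) TRUE in each row;
# rows without any TRUE are overwritten with 99
has_a <- unname(rowSums(hits) > 0)
df$position <- ifelse(has_a, max.col(hits + 0, ties.method = "first"), 99)

df$position
# 1 1 2 99 2 99
```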
Firstly, drop the first column:
df <- df[, -1]
Then, do something like this (disclaimer: I'm feeling terribly sleepy*):
( df$result <- unlist(lapply(apply(df, 1, grep, pattern = "a"), function(x) ifelse(length(x) == 0, 99, x))) )
v1 v2 result
1 a z 1
2 a y 1
3 c a 2
4 <NA> <NA> 99
5 g a 2
6 h g 99
* sleepy = code is not vectorised
EDIT (slightly different solution, I still feel sleepy):
df$result <- rapply(apply(df, 1, grep, pattern = "a"), function(x) ifelse(length(x) == 0, 99, x))
