Adding values to a vector using for loop - r

I have a list named mylist that has character elements in it which I'm trying to merge and save in another object.
The following piece of code:
result <- c()
for (i in length(mylist)) {
temp <- paste(mylist[[i]][2], mylist[[i]][3], mylist[[i]][4], sep="")
result[i] <- temp
}
result
Results in the following output:
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
Why am I getting NA's instead of the merged characters for EVERY result[i]?

The reason for the unexpected result has already been explained by Brent and damir.
However, I suggest using seq_along(mylist), as it is safer than 1:length(mylist) in case mylist is empty for some reason.
result <- c()
for (i in seq_along(mylist)) {
result[i] <- paste(mylist[[i]][2:4], collapse = "")
}
result
[1] "BCD" "CDE" "DEF" "EFG" "FGH"
If mylist is empty, length(mylist) is 0, but the loop would still be executed twice because 1:0 evaluates to c(1, 0).
In addition, the collapse parameter tells paste() to concatenate the elements of a vector, which saves a lot of typing.
By the way, the same result can be achieved by using sapply():
sapply(mylist, function(x) paste(x[2:4], collapse = ""))
[1] "BCD" "CDE" "DEF" "EFG" "FGH"
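To make the empty-list case mentioned above concrete, here is a small sketch. The name empty_list and the two counters are made up for illustration only; they are not from the question.

```r
# Hypothetical empty list, used only to illustrate the edge case
empty_list <- list()

# 1:length(x) is 1:0, i.e. c(1, 0), so the loop body runs twice
n_colon <- 0
for (i in 1:length(empty_list)) n_colon <- n_colon + 1

# seq_along(x) is integer(0), so the loop body never runs
n_seq <- 0
for (i in seq_along(empty_list)) n_seq <- n_seq + 1

n_colon  # 2
n_seq    # 0
```

With real code in the loop body, the two spurious iterations would typically fail with a "subscript out of bounds" error or silently produce NA.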
Data
The OP has not provided a reproducible example but says that he has "a list named mylist that has character elements". So, here are some made-up data:
mylist <- lapply(1:5, function(i) LETTERS[i + (0:3)])
mylist
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] "B" "C" "D" "E"
[[3]]
[1] "C" "D" "E" "F"
[[4]]
[1] "D" "E" "F" "G"
[[5]]
[1] "E" "F" "G" "H"

length(mylist) is just a scalar (i.e., a single value), so the loop visits only that one index. You need to add the starting value, like this: 1:length(mylist). I am guessing that the last element of result does contain a merged string?

Your loop both starts and ends at length(mylist), because the only value i ever takes is length(mylist). So, your loop "loops" only once.
To change this, you need to specify the start and end value of i. Something like,
for (i in 1:length(mylist))
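A quick way to see this, as a sketch using the same kind of made-up data as in the other answer (the OP's real data is unknown):

```r
# Made-up data: five character vectors of length 4
mylist <- lapply(1:5, function(i) LETTERS[i + (0:3)])

result <- c()
for (i in length(mylist)) {   # iterates exactly once, with i == 5
  result[i] <- paste(mylist[[i]][2:4], collapse = "")
}
result
# [1] NA    NA    NA    NA    "FGH"
```

Assigning to position 5 of an empty vector pads positions 1 to 4 with NA, which is why the printed result starts with a long run of NA values.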

Related

apply function yields the wrong answer

I am trying to replace all NAs in those columns that contain only 0 or 1. However, I found that apply fails to deal with the NAs. If I replace the NAs with an arbitrary string, e.g. "Unknown", then lapply and apply yield the same result. Any explanation would be greatly appreciated.
Here is an example.
df<-data.frame(a=c(0,1,NA),b=c(0,1,0),c=c('d',NA,'c'))
apply(df,2,function(x){all(x %in% c(0,1,NA)) })
unlist(lapply(df,function(x){all(x %in% c(0,1,NA))}))
It is not recommended to use apply on a data.frame whose columns have different classes. The recommended option is lapply. The issue is that apply first converts the data.frame to a matrix, and with mixed classes this becomes a character matrix. Numeric columns are formatted to a common width during that conversion, which can introduce leading spaces when missing values are involved:
apply(df, 2, I)
# a b c
#[1,] " 0" "0" "d"
#[2,] " 1" "1" NA
#[3,] NA "0" "c"
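The same effect can be reproduced without apply, as a sketch: as far as I know, apply calls as.matrix on a data.frame internally, so inspecting that matrix directly shows why %in% stops matching.

```r
df <- data.frame(a = c(0, 1, NA), b = c(0, 1, 0), c = c("d", NA, "c"))

m <- as.matrix(df)           # the conversion apply() performs on a data.frame
m[, "a"]                     # " 0" " 1" NA  (padded character values)

m[, "a"] %in% c(0, 1, NA)    # FALSE FALSE TRUE: " 0" is not "0"
df$a %in% c(0, 1, NA)        # TRUE TRUE TRUE on the original numeric column
```

Column b has no NA, so its values are not padded and still match.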
If the first column were already character, this conversion from NA_real_ to NA_character_, and the padding that comes with it, would not occur:
df1 <- df
df1$a <- as.character(c(0, 1, NA))
apply(df1, 2, I)
# a b c
#[1,] "0" "0" "d"
#[2,] "1" "1" NA
#[3,] NA "0" "c"
One option is to wrap x in trimws to remove the leading spaces:
apply(df,2,function(x){all(trimws(x) %in% c(0,1,NA)) })
# a b c
# TRUE TRUE FALSE
NOTE: For testing the presence of NA, it is recommended to use is.na instead of %in%
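Following that note, here is a sketch of the check written with is.na and a column-wise loop over the data.frame. The name is_binary and the fill value 0 are assumptions for illustration; the question does not say which value the NAs should be replaced with.

```r
df <- data.frame(a = c(0, 1, NA), b = c(0, 1, 0), c = c("d", NA, "c"))

# TRUE for columns whose non-missing values are all 0 or 1
is_binary <- vapply(df, function(x) all(is.na(x) | x %in% c(0, 1)), logical(1))
is_binary

# Assumption: missing values in those columns should become 0
df[is_binary] <- lapply(df[is_binary], function(x) replace(x, is.na(x), 0))
df
```

Because vapply works on the columns directly, no matrix conversion happens and column c keeps its NA.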

Set values to NA from first occurrence of pattern to end

Is there a faster/shorter way to set all values from the first match onwards (including the match itself) to NA?
vec <- 1:10;vec[c(3,5,7)]<-c(NA,NaN,"remove")
#"1" "2" NA "4" "NaN" "6" "remove" "8" "9" "10"
Desired Outcome:
#"1" "2" NA "4" "NaN" "6" NA NA NA NA
My code:
vec[{grep("^remove$",vec)[1]}:length(vec)]<-NA
Please note:
In this case, we assume a "remove" element is always present, so the solution does not have to handle the case where there is none.
You can use match to stop searching after the first match is found:
m = match("remove", vec) - 1L
if (is.na(m)){
vec
} else {
c(head(vec, m), rep(vec[NA_integer_], length(vec)-m))
}
You'd have to have a pretty large vector to notice a speed difference, though, I guess. Alternately, this might prove faster:
m = match("remove", vec)
if (!is.na(m)){
vec[m:length(vec)] <- NA
}
Not sure if this is shorter or faster, but here is one alternative:
vec[which.max(vec == "remove"):length(vec)] <- NA
vec
#[1] "1" "2" NA "4" "NaN" "6" NA NA NA NA
Here, we find the first occurrence of "remove" using which.max and then assign NA from that position to the end of the vector.
The OP mentioned that a "remove" element is always present, so we need not handle the other case. If we still want a check, we can add a condition:
inds <- vec == "remove"
if (any(inds, na.rm = TRUE)) {
vec[which.max(inds) : length(vec)] <- NA
}
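To show why the check matters, here is a sketch with a hypothetical vector no_match that contains no "remove": which.max returns 1 for an all-FALSE input, so the unguarded version would wipe the whole vector.

```r
no_match <- c("1", "2", "3")           # hypothetical input without "remove"
which.max(no_match == "remove")        # 1, even though nothing matched

unguarded <- no_match
unguarded[which.max(unguarded == "remove"):length(unguarded)] <- NA
unguarded                              # NA NA NA

guarded <- no_match
inds <- guarded == "remove"
# na.rm = TRUE keeps the condition from becoming NA when the vector holds NAs
if (any(inds, na.rm = TRUE)) {
  guarded[which.max(inds):length(guarded)] <- NA
}
guarded                                # "1" "2" "3"
```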
We can use cumsum on a logical vector
vec[cumsum(vec %in% "remove") > 0] <- NA
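Spelled out step by step, the cumsum approach looks like this (a sketch; the intermediate name hit is made up):

```r
vec <- 1:10
vec[c(3, 5, 7)] <- c(NA, NaN, "remove")

hit <- vec %in% "remove"   # %in% returns FALSE for NA, never NA itself
cumsum(hit)                # 0 0 0 0 0 0 1 1 1 1
vec[cumsum(hit) > 0] <- NA
vec
# [1] "1"   "2"   NA    "4"   "NaN" "6"   NA    NA    NA    NA
```

A side effect of this variant is that it leaves vec untouched when there is no "remove" at all, so no extra check is needed.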
We can also just extend the vec to the desired length:
`length<-`(vec[1:(which(vec=="remove")-1)],length(vec))
[1] "1" "2" NA "4" "NaN" "6" NA NA NA NA

Extracting one column based on max of other columns of a Dataframe in R

I am trying to fetch the value in column 'a' that corresponds to the max value of each of the columns 'c', 'd' and 'e', and then store these values in a vector.
I have written the code below, which gives the column 'a' data preceded by two empty strings.
Can somebody help me fetch the exact data using sapply?
a<-c('A','B','C','D','E')
b<-c(10,30,45,25,40)
c<-c(19,23,25,37,39)
d<-c(43,21,17,14,26)
e<-c(NA,23,45,32,NA)
df<-data.frame(a,b,c,d,e)
A1<-vector("character",3)
for (i in 3:5){
A1[i]<-c(df[which(df[,i]==max(df[,i],na.rm = TRUE)),1])
A1
}
Actual Result: > A1
[1] "" "" "E" "A" "C"
Expected Result: A1 should have "E" "A" "C"
Please suggest a solution using sapply.
Thanks
We can use mapply
unname(mapply(function(x, y) x[which(y == max(y, na.rm = TRUE))], df[1], df[3:5]))
#[1] "E" "A" "C"
In the loop, the index i runs over 3:5, which are the column positions, while 'A1' is initialized with 3 elements. Because the assignment starts at the 3rd element, the vector is extended with new elements and the first 2 elements stay untouched.
A1<-vector("character",3)
A1
#[1] "" "" ""
A2 <- A1
A2[3:5] <- 15
A2
#[1] "" "" "15" "15" "15" #### this is the same thing happening in the loop
Instead, we can loop over the sequence and then assign
i1 <- 3:5
for(i in seq_along(i1)) {
A1[i] <- df[which(df[,i1[i]]==max(df[,i1[i]],na.rm = TRUE)),1]
}
A1
#[1] "E" "A" "C"
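Since the question asks for sapply specifically, here is a sketch of that variant. It uses which.max, which ignores NA and returns only the first maximum, so it assumes ties do not matter; stringsAsFactors = FALSE is set explicitly so that column 'a' is character in older R versions too.

```r
df <- data.frame(
  a = c("A", "B", "C", "D", "E"),
  b = c(10, 30, 45, 25, 40),
  c = c(19, 23, 25, 37, 39),
  d = c(43, 21, 17, 14, 26),
  e = c(NA, 23, 45, 32, NA),
  stringsAsFactors = FALSE
)

# For each of columns 3 to 5, pick the value of 'a' in the row of the maximum
A1 <- unname(sapply(df[3:5], function(col) df$a[which.max(col)]))
A1
# [1] "E" "A" "C"
```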

How can I keep NA when I change levels

I build a vector of factors containing NA.
my_vec <- factor(c(NA,"a","b"),exclude=NULL)
levels(my_vec)
# [1] "a" "b" NA
I change one of those levels.
levels(my_vec)[levels(my_vec) == "b"] <- "c"
NA disappears.
levels(my_vec)
# [1] "a" "c"
How can I keep it ?
EDIT
@rawr gave a nice solution that works most of the time. It works for my previous specific example, but not for the one shown below.
@Hack-R had a pragmatic option using addNA. I could make it work with that, but I'd rather have a fully general solution.
See this generalized issue
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a" NA "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
[1] "a" "c" # NA disappeared
@rawr's solution:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a" NA "b1" "b2"
attr(my_vec, 'levels')[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
droplevels(my_vec)
[1] "a" NA "c" "c" # c is duplicated
@Hack-R's solution:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a" NA "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
my_vec <- addNA(my_vec)
levels(my_vec)
[1] "a" "c" NA # NA is in the end
I want levels(my_vec) == c("a",NA,"c")
You have to quote NA; otherwise R treats it as a missing value rather than a factor level. Factor levels sort alphabetically by default, but obviously that's not always useful, so you can specify a different order by passing the levels in the desired order to factor():
require(plyr)
my_vec <- factor(c("NA","a","b1","b2"))
vec2 <- revalue(my_vec,c("b1"="c","b2"="c"))
#now reorder levels
my_vec2 <- factor(vec2, levels(vec2)[c(1,3,2)])
Levels: a NA c
I finally created a function that first replaces the NA level with a temporary one (inspired by @lmo), then does the replacement I wanted the standard way, then puts NA back in its place using @rawr's suggestion.
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
my_vec <- level_sub(my_vec,c("b1","b2"),"c")
my_vec
# [1] <NA> a c c
# Levels: a <NA> c
As a bonus level_sub can be used with na_rep = NULL which will remove the NA, and it will look good in pipe chains :).
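level_sub <- function(x, from, to, na_rep = "NA") {
  # temporarily give the NA level a placeholder name
  if (!is.null(na_rep)) {levels(x)[is.na(levels(x))] <- na_rep}
  # merge the 'from' levels into 'to' the standard way
  levels(x)[levels(x) %in% from] <- to
  # turn the placeholder back into an NA level
  if (!is.null(na_rep)) {attr(x, 'levels')[levels(x) == na_rep] <- NA}
  x
}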
Nevertheless it seems that R really doesn't want you to add NA to factors.
levels(my_vec) <- c(NA,"a") behaves strangely, and it doesn't stop there. While subset keeps NA levels in your columns, rbind quietly removes them! I wouldn't be surprised if further investigation revealed that half of R's functions remove NA levels, making them very unsafe to work with...
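For comparison, here is a sketch of another route that avoids levels<- altogether: build the new level names, map each element through its integer code, and call factor() again with exclude = NULL. The names old, new and y are made up for illustration, and this is only one possible approach, not the method used above.

```r
x <- factor(c(NA, "a", "b1", "b2"),
            levels = c("a", NA, "b1", "b2"), exclude = NULL)

old <- levels(x)
new <- old
new[old %in% c("b1", "b2")] <- "c"     # "a" NA "c" "c"

# as.integer(x) gives each element's level code; index the new names with it
y <- factor(new[as.integer(x)], levels = unique(new), exclude = NULL)
levels(y)
# [1] "a" NA  "c"
```

Because the factor is rebuilt from scratch, the NA level keeps its original position between "a" and "c".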

matching two lists of words and return the matched words

I want to match each string in one vector against the strings in another vector
cutter_Ch <- c('happy','birthday','Lucia')
pos <- c('Lucia','today')
one way I can do it is with lapply
pos.matches = lapply(cutter_Ch, pmatch, pos)
[[1]]
[1] NA
[[2]]
[1] NA
[[3]]
[1] 1
However, I want the function to return the matched string itself rather than the index of the match, like this
[[1]]
[1] NA
[[2]]
[1] NA
[[3]]
[1] Lucia
We need to use the index to subset the 'pos'
lapply(cutter_Ch, function(x) pos[pmatch(x, pos)])
It is not clear whether this example is a simplified version of something more complex. Anyway, with str_extract we can get the same output as a vector
library(stringr)
str_extract(cutter_Ch, paste(pos, collapse="|"))
#[1] NA NA "Lucia"
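If only exact, whole-string matches are needed, a base R sketch without stringr could look like this. Note that match is exact, whereas pmatch allows partial matches and str_extract matches substrings, so the three are not interchangeable in general.

```r
cutter_Ch <- c("happy", "birthday", "Lucia")
pos <- c("Lucia", "today")

# match() returns the position in pos (or NA); use it to subset pos
matched <- pos[match(cutter_Ch, pos)]
matched
# [1] NA      NA      "Lucia"

# If only the words common to both vectors are wanted
intersect(cutter_Ch, pos)
# [1] "Lucia"
```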

Resources