Selecting specific elements in a vector in R

I have a vector,
myvector <- c("a","b","c","cat","4","dog","cat","f"). I would like to select out those elements that immediately follow elements containing the string "cat".
I.e., I want myvector2 containing only "4" and "f". I'm not sure where to begin.

myvector <- c("a","b","c","cat","4","dog","cat","f")
where_is_cat <- which(myvector == "cat")
# [1] 4 7
myvector[where_is_cat + 1]
# [1] "4" "f"
myvector2 <- myvector[where_is_cat + 1]

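Try this (note that grep matches every element that contains "cat", not only exact matches):
myvector[grep('cat', myvector) + 1]
#[1] "4" "f"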

You can subset the vector minus its first element (myvector[-1]) by the positions where the vector minus its last element (myvector[-length(myvector)]) equals "cat":
myvector[-1][myvector[-length(myvector)] == "cat"]
# [1] "4" "f"
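The answers above differ on exact versus partial matching, and the index-based ones return NA when "cat" is the last element (there is nothing after it). A minimal sketch that covers both cases, assuming a hypothetical helper name elements_after that does not appear in the original answers:

```r
# Hypothetical helper: return the elements that immediately follow a match.
# exact = TRUE compares whole elements; exact = FALSE matches substrings.
elements_after <- function(v, pattern, exact = TRUE) {
  hits <- if (exact) which(v == pattern) else grep(pattern, v, fixed = TRUE)
  hits <- hits[hits < length(v)]  # drop a match in the last position
  v[hits + 1]
}

myvector <- c("a", "b", "c", "cat", "4", "dog", "cat", "f")
myvector2 <- elements_after(myvector, "cat")
myvector2
# [1] "4" "f"
```

Dropping matches in the last position is a design choice; keep them if an NA placeholder is more useful for your data.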

Related

Append value to more than one position in vector [duplicate]

How do we append a single value to multiple positions in a vector?
x=c(1,2,3)
append(x, "a", c(1,3))
[1] "1" "a" "2" "3"
Warning messages:
1: In if (!after) c(values, x) else if (after >= lengx) c(x, values) else c(x[1L:after], :
the condition has length > 1 and only the first element will be used
2: In if (after >= lengx) c(x, values) else c(x[1L:after], values, :
the condition has length > 1 and only the first element will be used
3: In 1L:after : numerical expression has 2 elements: only the first used
4: In (after + 1L):lengx :
numerical expression has 2 elements: only the first used
With the above code, only the first position is registered, with a warning message.
lapply(c(1,3), function(y) append(x, 'a', y))
yields this result:
[[1]]
[1] "1" "a" "2" "3"
[[2]]
[1] "1" "2" "3" "a"
Expected output:
1 a 2 3 a
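You can use the Reduce function. Each insertion shifts the later positions by one, which is what the cumsum(...) term corrects for: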
x=1:10
pos=c(3,5,7,10)
Reduce(function(i,j)append(i,"a",j),cumsum(c(pos[1],diff(pos)+1)),init=x)
[1] "1" "2" "3" "a" "4" "5" "a" "6" "7" "a" "8" "9" "10" "a"
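An alternative to adjusting the positions is to insert from the highest position down, so that earlier insertions never shift the positions still to be processed. This is a minimal sketch applied to the question's own data, not the answerer's method; the names pos and result are chosen here for illustration:

```r
x <- c(1, 2, 3)
pos <- c(1, 3)  # insert "a" after these positions of the original vector

# Work from the largest position to the smallest so indices stay valid.
result <- Reduce(function(v, p) append(v, "a", after = p),
                 sort(pos, decreasing = TRUE),
                 x)
result
# [1] "1" "a" "2" "3" "a"
```

As in the question, mixing "a" with numbers coerces the whole vector to character.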

sapply not applying a function created to all rows in R dataframe

I have the following dataframe in R and am trying to apply a string-splitting function to it to yield a different dataframe
DF
A        B      C
"1,2,3"         "1,2"
"2"      "1"
The cells of the dataframe are filled with characters. The empty spaces are blank values. I have created the following function
sepfunc<-function(x){strsplit(as.character(x, split= ","))[[1]][1]}
The function works neatly when I use it on a single column
sapply(DF$A, sepfunc)
[1] "1" "2"
However, the following command yields only a single row
sapply(DF, sepfunc)
A B C
"1" NA "1"
The second row is not displayed. I know I must be missing something rudimentary. I request someone to help.
The expected output is
A B C
"1" NA "1"
"2" "1" "NA"
strsplit returns a list of vectors, one per input element. Subsetting that list with [[1]] keeps only the first list element and skips the rest; here the first element corresponds to the first row. When sapply runs over a single column, it loops through each cell and calls strsplit on it separately, so taking [[1]] does no harm because the list has length 1. When sapply runs over the data frame, the function receives a whole column at a time, so the number of list elements equals the number of rows. We therefore need to loop through the list with sapply or lapply (the former simplifies to a vector or matrix where it can, while the latter always returns a list)
sapply(DF, function(x) sapply(strsplit(as.character(x), ","), `[`, 1))
# A B C
#[1,] "1" NA "1"
#[2,] "2" "1" NA
Let's look at this more closely by splitting the code into chunks. For each column, the output is a list of split vectors
lapply(DF, function(x) strsplit(as.character(x), ","))
#$A
#$A[[1]]
#[1] "1" "2" "3"
#$A[[2]]
#[1] "2"
#$B
#$B[[1]]
#[1] NA
#$B[[2]]
#[1] "1"
#$C
#$C[[1]]
#[1] "1" "2"
#$C[[2]]
#character(0)
When we do [[1]], the first element is extracted i.e. the first row of 'A', 'B', 'C'
lapply(DF, function(x) strsplit(as.character(x), ",")[[1]])
#$A
#[1] "1" "2" "3"
#$B
#[1] NA
#$C
#[1] "1" "2"
If we again subset on the above, i.e. the first element, the output will be 1 NA 1.
Instead we want to loop through the list and get the first element of each list
As you only want to extract the part before the first comma, you can also do
sapply(DF, function(x) gsub("^([^,]*),.*$", "\\1", x))
# A B C
# [1,] "1" NA "1"
# [2,] "2" "1" NA
This extracts the first capture group (\\1), which is the part of the pattern enclosed in parentheses: ([^,]*)
Or with stringr:
library(stringr)
sapply(DF, function(x) str_extract(x, "^([^,]*)"))
Here is another version of this
lapply(X = DF, FUN = function(x) sapply(strsplit(x = as.character(x), split = ","), FUN = head, n=1))
First of all, notice that your sepfunc as posted should always give an error:
sepfunc<-function(x){strsplit(as.character(x, split= ","))[[1]][1]}
split should go with strsplit, not as.character, so what you meant is probably:
sepfunc<-function(x){strsplit(as.character(x), split= ",")[[1]][1]}
Second, the question of data sanity. You have character variables stored as factors, and missing data stored as empty strings. I would recommend dealing with these issues before trying to do anything else. (Why do I say NA is more sensible here than an empty string? Because you told me so. You want NA's in the output, so I guess this means that if there are no numbers in the string, it means that something is missing. Missing = NA. There is also a technical reason which would take a bit longer to explain.)
So in the following, I'm just using an altered version of your DF:
DF <- data.frame(A=c("1,2,3", "2"), B=c(NA, "1"), C=c("1,2", NA), stringsAsFactors=FALSE)
(If DF comes from a file, then you could use read.csv("file", as.is=TRUE). And then DF[DF==""] <- NA.)
The output of strsplit is a list so you'll need sapply to get something useful out from it. And another sapply to apply it to all columns in a data frame.
sapply(DF, function(x) sapply(strsplit(x, ","), head, 1))
# A B C
# [1,] "1" NA "1"
# [2,] "2" "1" NA
Or step by step. Before you can sapply a function over all columns of a data frame, you need it to give meaningful results for all the columns. Let's try:
sf <- function(x) sapply(strsplit(x, ","), head, 1)
# and sepfunc as defined above:
sepfunc<-function(x){strsplit(as.character(x), split= ",")[[1]][1]}
sf(DF$A)
# [1] "1" "2"
# as expected
sepfunc(DF$A)
# [1] "1"
Notice that sepfunc uses only the first element (as you told it to!) of each column, and the rest is discarded. You need sapply or something similar to use all elements. So as a consequence, you get this:
sapply(DF, sepfunc)
# A B C
# "1" NA "1"
(It works, because we've redefined empty strings as NA. But you get the results only for the first row of each variable.)
sapply(DF, sf)
# A B C
# [1,] "1" NA "1"
# [2,] "2" "1" NA
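The same idea can be written so that empty strings become NA without recoding the data first. This is a minimal sketch under the assumption that blank cells are stored as "" in character columns; first_piece is a hypothetical name, and vapply is swapped in for the inner sapply so the result type is fixed:

```r
# Assumed version of the question's data, with blanks kept as empty strings
DF <- data.frame(A = c("1,2,3", "2"),
                 B = c("", "1"),
                 C = c("1,2", ""),
                 stringsAsFactors = FALSE)

# Hypothetical helper: first comma-separated piece of every cell in a column.
# strsplit("") yields character(0), which is mapped to NA here.
first_piece <- function(x) {
  pieces <- strsplit(as.character(x), ",", fixed = TRUE)
  vapply(pieces,
         function(p) if (length(p) > 0) p[1] else NA_character_,
         character(1))
}

res <- sapply(DF, first_piece)
res
```

The outer sapply still loops over columns; only the inner loop over cells changes.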

Extracting one column based on max of other columns of a Dataframe in R

I am trying to fetch the value in column 'a' corresponding to the max values of columns 'c', 'd' and 'e' and then store it in a vector.
I have written below code which gives column 'a' data along with two NA.
Can somebody help me to fetch the exact data using sapply.
a<-c('A','B','C','D','E')
b<-c(10,30,45,25,40)
c<-c(19,23,25,37,39)
d<-c(43,21,17,14,26)
e<-c(NA,23,45,32,NA)
df<-data.frame(a,b,c,d,e)
A1<-vector("character",3)
for (i in 3:5){
A1[i]<-c(df[which(df[,i]==max(df[,i],na.rm = TRUE)),1])
A1
}
Actual Result: > A1
[1] "" "" "E" "A" "C"
Expected Result: A1 should have "E" "A" "C"
Please suggest a solution using sapply.
Thanks
We can use mapply
unname(mapply(function(x, y) x[which(y == max(y, na.rm = TRUE))], df[1], df[3:5]))
#[1] "E" "A" "C"
In the loop, the index i runs over 3:5, which are the column positions, while the 'A1' vector is initialized with 3 elements. Because the assignment starts at the 3rd element, the vector is extended with new elements while the first 2 elements stay untouched as empty strings.
A1<-vector("character",3)
A1
#[1] "" "" ""
A2 <- A1
A2[3:5] <- 15
A2
#[1] "" "" "15" "15" "15" #### this is the same thing happening in the loop
Instead, we can loop over the sequence and then assign
i1 <- 3:5
for(i in seq_along(i1)) {
A1[i] <- df[which(df[,i1[i]]==max(df[,i1[i]],na.rm = TRUE)),1]
}
A1
#[1] "E" "A" "C"
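Since the question asks for sapply specifically, here is a minimal sketch that loops over the three columns directly. It swaps in which.max, which skips NA values and returns the first position of the maximum, so ties resolve to the first row; stringsAsFactors = FALSE is an assumption added so that column 'a' is character in any R version:

```r
a <- c("A", "B", "C", "D", "E")
b <- c(10, 30, 45, 25, 40)
c <- c(19, 23, 25, 37, 39)
d <- c(43, 21, 17, 14, 26)
e <- c(NA, 23, 45, 32, NA)
df <- data.frame(a, b, c, d, e, stringsAsFactors = FALSE)

# For each of columns c, d, e: the value of 'a' in the row holding the maximum
A1 <- unname(sapply(df[3:5], function(col) df$a[which.max(col)]))
A1
# [1] "E" "A" "C"
```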

How can I compare two lists and output "hits" into a dataframe

I've tried to find answers here and on Google but had no luck. I've been struggling with this issue for some days, so I would really appreciate help. I'm analyzing a network to see if cycles tend to be within discrete communities or between them, or show no pattern. My data are a list of cycles (three nodes forming a loop) and a list of communities (variable number of nodes). I have two questions: 1) how to compare two lists, and 2) how to output the comparison results in a way which is readable:
Question 1
I have two lists (both igraph objects), one containing 678 items (each of 3 elements, all characters) and another containing 11 items each with a differing number of elements. Example:
x1 <- as.character(c(1,3,5))
x2 <- as.character(c(2,4,6))
x3 <- as.character(c(7,8,9))
x4 <- as.character(c(10,11,12))
x <- list(x1, x2, x3, x4)
y1 <- as.character(c(1,2,3,4,5))
y2 <- as.character(c(2,3,4,5))
y3 <- as.character(c(1,2,3,4,5,7,8,9))
y <- list(y1, y2, y3)
Giving:
> x
[[1]]
[1] "1" "3" "5"
[[2]]
[1] "2" "4" "6"
[[3]]
[1] "7" "8" "9"
[[4]]
[1] "10" "11" "12"
> y
[[1]]
[1] "1" "2" "3" "4" "5"
[[2]]
[1] "2" "3" "4" "5"
[[3]]
[1] "1" "2" "3" "4" "5" "7" "8" "9"
I want to compare every component in x against every component in y and add every hit (i.e. when all the elements from x[[i]] are also found in y[[j]]) to a new dataframe. I tried a loop using all() and %in% but this didn't work:
for (i in 1:length(x)) {
for (j in 1:length(y)) {
hits <- all(y[[j]] %in% x[[i]]) == TRUE
print(hits)
}
}
This returns 12 FALSE hits. Checking individual components, it should have worked, because:
all(x[[1]] %in% y[[1]])
Returns TRUE as it should, and:
all(x[[1]] %in% y[[2]])
Returns FALSE as it should. Where am I going wrong here?
Question 2
I have seen some solutions for outputting loop results into a df, but that's not exactly what I need. What I want as an output is a dataframe telling me which community every cycle is in. Since there's only 11 communities, it could just refer me to the list component's index, but I haven't found a way to do this. I could also just use paste() to concatenate the node names of a community into a title. Either way, here is the output I need:
cycle community
1 1_3_5 1_2_3_4_5
2 1_3_5 1_2_3_4_5_7_8_9
3 7_8_9 1_2_3_4_5_7_8_9
I'm guessing some kind of an if statement. I feel this should be fairly simple to execute and that I should have been able to work it out myself. Nevertheless, thank you for your time and sorry about the essay.
You have the operands of %in% reversed: test whether the elements of x[[i]] are in y[[j]], not the other way round.
for (i in 1:length(x)) {
for (j in 1:length(y)) {
# hits <- all(y[[j]] %in% x[[i]]) == TRUE
hits <- all(x[[i]] %in% y[[j]]) == TRUE
print(hits)
}
}
For the second part you can store the indexes that have a hit and use them for later.
a <- list()
for (i in 1:length(x)) {
for (j in 1:length(y)) {
# hits <- all(y[[j]] %in% x[[i]]) == TRUE
hits <- all(x[[i]] %in% y[[j]]) == TRUE
if(hits == TRUE){
a[[length(a)+1]] <- c(i,j)
}
}
}
The final part of the question, creation of cycle and community tags, can be accomplished with stringi::stri_join() (or paste() as pointed out in the comments). The final step to wrangle the list created in Jt Miclat's answer is as follows, using the indexes in the list a to extract the appropriate strings for cycle and community, generate data frames, and rbind() the result to a single data frame.
# combine with cycle & community tags
cycles <- sapply(x,paste,collapse="_")
communities <- sapply(y,paste,collapse="_")
b <- lapply(a,function(x){
cycle <- cycles[x[1]]
community <- communities[x[2]]
data.frame(x=x[1],y=x[2],cycle=cycle,community=community,
stringsAsFactors=FALSE)
})
df <- do.call(rbind,b)
df
...and the output:
> df <- do.call(rbind,b)
> df
x y cycle community
1 1 1 1_3_5 1_2_3_4_5
2 1 3 1_3_5 1_2_3_4_5_7_8_9
3 3 3 7_8_9 1_2_3_4_5_7_8_9
>
Well, you can make use of outer:
results <- outer(x, y, function(w, z) Map(function(i, j) all(i %in% j), w, z))
results
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE TRUE
[4,] FALSE FALSE FALSE
x is the rows while y is the columns, so to check all(x[[1]] %in% y[[2]]), just look at row 1, column 2, i.e. element [1,2], and so on.
Then you can use apply with a function of your own:
fun <- function(i) c(paste(x[[i[1]]], collapse = "_"), paste(y[[i[2]]], collapse = "_"))
t(apply(which(results == TRUE, arr.ind = TRUE), 1, fun))
[,1] [,2]
[1,] "1_3_5" "1_2_3_4_5"
[2,] "1_3_5" "1_2_3_4_5_7_8_9"
[3,] "7_8_9" "1_2_3_4_5_7_8_9"
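The hit matrix and the requested data frame can also be built with nested sapply calls and which(..., arr.ind = TRUE). This is a minimal sketch on the question's example data; the names hit, idx, label and result are chosen here for illustration and are not from the answers above:

```r
x <- list(as.character(c(1, 3, 5)), as.character(c(2, 4, 6)),
          as.character(c(7, 8, 9)), as.character(c(10, 11, 12)))
y <- list(as.character(c(1, 2, 3, 4, 5)), as.character(c(2, 3, 4, 5)),
          as.character(c(1, 2, 3, 4, 5, 7, 8, 9)))

# Logical matrix: rows are cycles (x), columns are communities (y)
hit <- sapply(y, function(community)
  sapply(x, function(cycle) all(cycle %in% community)))

# Row/column indices of the hits, ordered by cycle and then by community
idx <- which(hit, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

label <- function(v) paste(v, collapse = "_")
result <- data.frame(cycle = sapply(x[idx[, "row"]], label),
                     community = sapply(y[idx[, "col"]], label),
                     stringsAsFactors = FALSE)
result
```

With 678 cycles and 11 communities this stays a small matrix, so nothing more elaborate than the nested loops should be needed.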

Converting a list of lists of strings to a data frame of numbers in R

I have a list of lists of strings as follows:
> ll
[[1]]
[1] "2" "1"
[[2]]
character(0)
[[3]]
[1] "1"
[[4]]
[1] "1" "8"
The longest list is of length 2, and I want to build a data frame with 2 columns from this list. Bonus points for also converting each item in the list to a number or NA for character(0). I have tried using mapply() and data.frame to convert to a data frame and fill with NA's as follows.
# Find length of each list element
len = sapply(ll, length)
# Number of NAs to fill for column shorter than longest
len = 2 - len
df = data.frame(mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len))
However, I do not get a data frame with 2 columns (and NA's as fillers) using the code above.
Thanks for the help.
We can use stri_list2matrix from stringi. As the list elements are all character vectors, it seems okay to use this function
library(stringi)
t(stri_list2matrix(ll))
# [,1] [,2]
#[1,] "2" "1"
#[2,] NA NA
#[3,] "1" NA
#[4,] "1" "8"
If we need to convert to data.frame, wrap it with as.data.frame
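For the numeric conversion the question asks about, a base R route is to pad each element to the longest length by indexing past its end, which yields NA. This is a minimal sketch, not the stringi approach above; n_col, padded and num_df are names assumed here, and the column names V1 and V2 are simply the as.data.frame defaults:

```r
ll <- list(c("2", "1"), character(0), "1", c("1", "8"))

n_col <- max(lengths(ll))  # length of the longest element

# Convert to numeric, then index 1..n_col; missing positions become NA
padded <- lapply(ll, function(v) as.numeric(v)[seq_len(n_col)])

num_df <- as.data.frame(do.call(rbind, padded))
num_df
```

Strings that are not valid numbers would also become NA, with a coercion warning.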
