Split elements at a value delimiter in vector R [duplicate] - r

I am trying to split a vector at a certain value delimiter.
I have the following vector: v <- c("A", "B", "C","-" ,"D", "E", "F")
Let's say for this example, the value delimiter is: '-'.
I want to split this vector into separate vectors wherever the delimiter occurs. I don't know in advance how many delimiters the vector contains, so the number of resulting vectors is not fixed. In this example, the result should be two vectors, v1 and v2:
> v1
[1] "A" "B" "C"
and
> v2
[1] "D" "E" "F"
Is there a function or package that does this?

We can take cumsum of the logical vector v == "-" to get a group id that increases by one at each delimiter, then split the non-delimiter elements by that id into a list of vectors.
lst <- split(v[v!='-'], cumsum(v=="-")[v!='-'])
names(lst) <- paste0("v", seq_along(lst))
If we need them as separate vector objects, use list2env (not recommended, though, because it fills the global environment with objects):
list2env(lst, envir = .GlobalEnv)
Otherwise, we can create the vector objects in the global environment directly, one group at a time:
i1 <- v=="-"
i2 <- v!= "-"
grp <- cumsum(i1)
v1 <- v[i2 & grp==0]
v2 <- v[i2 & grp == 1]
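
As a minimal sketch of the cumsum/split idea above, the steps can be wrapped in a small helper so that any number of delimiters is handled. The function name split_at and the longer example vector are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper: split a vector at every occurrence of a delimiter value.
split_at <- function(v, delim = "-") {
  is_delim <- v == delim
  # cumsum() of the logical vector increases by one at each delimiter,
  # so it labels every element with the index of its group.
  grp <- cumsum(is_delim)
  # Drop the delimiters themselves and split the rest by group.
  pieces <- split(v[!is_delim], grp[!is_delim])
  names(pieces) <- paste0("v", seq_along(pieces))
  pieces
}

# Example with two delimiters, giving three pieces.
v <- c("A", "B", "C", "-", "D", "E", "F", "-", "G")
res <- split_at(v)
print(res)
```

Because the group ids are computed only for the non-delimiter elements, two adjacent delimiters produce no empty piece in this sketch; whether that is the desired behavior depends on the data.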

Related

substitute the elements of a vector with values from dataframe

I need to substitute the elements of a vector which match the elements of a particular column in data frame in R.
Reproducible example:
a<-c("A","B","C","D")
b<-data.frame(col1=c("B","C","E"),col2=c("T","Y","N"))
I need to get the following vector:
new<-c("A","T","Y","D")
What I tried is:
new <- a
new <- b$col2[match(a, b$col1)]
which does the substitution, but converts the unmatched elements into NAs.
Any help is appreciated
You can make a data.table from a and then update only the rows for which there is a match when joining with b.
library(data.table)
setDT(b)
data.table(a)[b, on = .(a = col1), a := i.col2][]
# a
# 1: A
# 2: T
# 3: Y
# 4: D
In base R you could use your current approach but replace the NAs with elements of a using ifelse
temp <- as.character(b$col2[match(a, b$col1)])
ifelse(is.na(temp), a, temp)
# [1] "A" "T" "Y" "D"
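You can use replace in base R, with match supplying the positions so the result does not depend on the order of rows in b:
a <- c("A", "B", "C", "D")
b <- data.frame(col1 = c("B", "C", "E"), col2 = c("T", "Y", "N"), stringsAsFactors = FALSE)
# idx holds, for each element of a, its row in b (NA when there is no match)
idx <- match(a, b$col1)
replace(a, !is.na(idx), b$col2[idx[!is.na(idx)]])
#[1] "A" "T" "Y" "D"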
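
The match-based substitution discussed in these answers can be sketched as a small reusable function. The name recode_from is hypothetical, and the input below is deliberately given in a different order from b to check that the lookup pairs values by content rather than by position.

```r
# Hypothetical helper: replace elements of x found in `from` by the
# corresponding element of `to`; unmatched elements are left unchanged.
recode_from <- function(x, from, to) {
  idx <- match(x, from)      # position of each x in `from`, NA if absent
  hit <- !is.na(idx)         # TRUE where a replacement exists
  x[hit] <- as.character(to)[idx[hit]]
  x
}

a <- c("D", "C", "B", "A")
b <- data.frame(col1 = c("B", "C", "E"),
                col2 = c("T", "Y", "N"),
                stringsAsFactors = FALSE)

out <- recode_from(a, b$col1, b$col2)
print(out)
```

Here "C" becomes "Y" and "B" becomes "T", while "D" and "A" have no entry in b$col1 and stay as they are.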

Removing and appending elements from and to a list based on a vector overlap

I have a list with ~500 model objects. The names of these objects are stored in v1:
existing.list <- vector("list", 3)
v1 <- names(existing.list) <- c("A", "B", "C")
I now have a different dataset, which I need to model too and save in the same list. The objects in this new dataset overlap with some of the objects in existing.list. Because the modelling is very time-consuming, I want to keep the old results. The names of the objects in the new dataset are stored in v2:
v2 <- c("B", "C", "D")
I first want to remove the objects in v1, which are not in v2, then append to existing.list all the new, unique names from v2.
I can do the first task in a rather complicated way:
rm <- v1[!v1 %in% v2]
rm.i <- which(v1 %in% rm)
v1 <- v1[-rm.i]
But then I fail at appending the new objects, as determined by the elements unique to v2:
new.elements <- v2[!v2 %in% v1]
The desired output is a modified existing.list, with the elements "B" and "C" intact and a new empty element "D". Basically, it's a list whose elements are determined by the names in v2, but for a number of reasons it would be complicated to just create a new list and copy parts of existing.list to it.
Since I need to do this for a number of lists, a less complicated way than the one I am using now would be handy.
Thank you very much in advance! This is a last-minute addition to a project, so any help is highly appreciated!
This question is based on a previous question, which I worded sloppily and which therefore caused confusion. My thanks to the users who still tried to help me.
If I understood you correctly you can first get the names of elements that are in v2 but not in v1
tmp <- setdiff(v2, v1) # "D"
And then subset existing.list and append it as follows
existing.list <- c(existing.list[v1 %in% v2],
setNames(vector("list", length(tmp)), tmp))
Result
existing.list
#$B
#NULL
#$C
#NULL
#$D
#NULL
Are these what you are looking for?
intersect(v1, v2)
# [1] "B" "C"
setdiff(v2, v1)
# [1] "D"
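
To check that the intersect/setdiff approach really keeps the old results, here is a minimal sketch in which the list holds placeholder strings instead of fitted models. The placeholder contents and the names updated and todo are assumptions made for illustration.

```r
# Placeholder strings stand in for the fitted model objects.
existing.list <- list(A = "model A", B = "model B", C = "model C")
v1 <- names(existing.list)
v2 <- c("B", "C", "D")

keep    <- intersect(v1, v2)   # names present in both: results to retain
new_nms <- setdiff(v2, v1)     # names only in v2: still to be modelled

# Keep the retained elements by name and append one empty (NULL) slot
# per new name.
updated <- c(existing.list[keep],
             setNames(vector("list", length(new_nms)), new_nms))

# Names of the elements that still need a model.
todo <- names(updated)[vapply(updated, is.null, logical(1))]
print(names(updated))
print(todo)
```

Subsetting by name rather than by a logical vector avoids relying on v1 still matching the current order of the list.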

Is there a R function for limiting the length of list elements?

I am struggling with a list manipulation in R right now. I have a list containing about 3000 elements, where each element is a character vector. The length of these character vectors is between 7 and 10.
I would like to manipulate this list in such a way, that those character vectors, that contain more than 7 elements, are limited to only the first 7 elements - hence drop the 8th, 9th, and 10th element/word/number of the respective character vector of the list.
Is there an easy way to do this? I hope you understand what I mean.
Thanks in advance!
You can use lapply as below:
mylist <- list(a = c("a", "b"),
               b = c("a", "b", "c"))
mylist
$a
[1] "a" "b"
$b
[1] "a" "b" "c"
mylist2 <- lapply(mylist, function(x) {
  # seq_len() also behaves correctly for an empty vector, unlike 1:0
  x[seq_len(min(length(x), 2))]
})
mylist2
$a
[1] "a" "b"
$b
[1] "a" "b"
What you need is an auxiliary function that will shorten your vector. Something like
shorten_vector <- function(y, max_length = 7){
  # NOTE: assumes that there are at least 7 elements in the vector.
  y[seq_len(max_length)]
}
you can then shorten the vectors in your list with
lapply(your_list, shorten_vector)
Or better
lapply(your_list, head, 7) # Thanks Moody
Reproducible example
# Make an object for an example. A list of length 15
# where each element is a character vector between length 7 and 10
random_length <- sample(7:10, 15, replace = TRUE)
char_list <-
  lapply(random_length,
         function(x){
           letters[seq_len(x)]
         })
# utility function
shorten_vector <- function(y, max_length = 7){
  y[seq_len(max_length)]
}
lapply(char_list,
       shorten_vector)
Bonus
You said in a comment on Sonny's answer that you weren't really sure how lapply works. At its conceptual core, lapply is a wrapper around a for loop. The equivalent for loop would be
for(i in seq_along(char_list)){
  char_list[[i]] <- shorten_vector(char_list[[i]])
}
char_list
The lapply just handles the iteration limits for you and looks a little cleaner on the screen.
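
The note in shorten_vector about needing at least 7 elements matters in practice: indexing past the end of a vector pads the result with NA, whereas head simply returns what is there. A small sketch of the difference, using a made-up list (short_list is not from the question):

```r
x <- c("a", "b", "c")

padded  <- x[seq_len(5)]   # indexing past the end pads with NA
trimmed <- head(x, 5)      # head() returns at most 5 elements, no padding

# Hypothetical list with one short and one long element.
short_list <- list(short = letters[1:3], long = letters[1:10])
limited <- lapply(short_list, head, 7)

print(padded)
print(trimmed)
print(lengths(limited))
```

For the list in the question, where every vector has between 7 and 10 elements, both approaches give the same result; head is the safer choice if shorter vectors may appear.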

Numeric column is coerced to character column when another character column is modified

I have a data frame with two columns, the first one contains numbers, the second one strings. My problem is: once I replace a string in the second column by another string, the first column is coerced from class numeric to character. Here is an example:
df <- data.frame(num = c(1,2), char = c("a", "b"), stringsAsFactors = F)
class(df$num) # "numeric"
class(df$char) # "character"
df[df$char == "a", ] <- "c"
class(df$char) # "character"
class(df$num) # "character" !!
What is the reason for this behavior, and how can I prevent it?
I found my error: df[df$char == "a", ] <- "c" overwrites the whole row, which is why the first column is coerced. The correct way to replace "a" by "c" is: df$char[df$char == "a"] <- "c".
Look at df after you change it:
> df
  num char
1   c    c
2   2    b
So of course $num has become character. Your command (because of its comma syntax) identified entire rows to be changed.
A different substitution command
df[df == "a"] <- "c"
does what you were expecting.
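
The difference between replacing whole rows and replacing values in a single column can be checked side by side. This is a small sketch; the names df_row, df_col and df_cell are made up for the comparison.

```r
df <- data.frame(num = c(1, 2), char = c("a", "b"), stringsAsFactors = FALSE)

# Row assignment: "c" is written into every column of the matching row,
# so the numeric column is coerced to character.
df_row <- df
df_row[df_row$char == "a", ] <- "c"

# Column assignment: only the char column is touched.
df_col <- df
df_col$char[df_col$char == "a"] <- "c"

# Same effect with a row and a column index in the brackets.
df_cell <- df
df_cell[df_cell$char == "a", "char"] <- "c"

print(class(df_row$num))
print(class(df_col$num))
print(class(df_cell$num))
```

Only the first form changes the class of num, because it is the only one that writes a string into that column.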

Using row-wise column indices in a vector to extract values from data frame [duplicate]

I have a vector of column positions, one per row, such as:
> i <- c(3,1,2)
How can I use this index to extract the 3rd value from the first row of a data frame, the 1st value from the second row, the 2nd value from the third row, and so on?
For example, using the above index and:
> dframe <- data.frame(x=c("a","b","c"), y=c("d","e","f"), z=c("g","h","i"))
> dframe
  x y z
1 a d g
2 b e h
3 c f i
I would like to return:
[1] "g" "b" "f"
Just use matrix indexing, like this:
dframe[cbind(seq_along(i), i)]
# [1] "g" "b" "f"
The cbind(seq_along(i), i) part creates a two column matrix of the relevant row and column that you want to extract.
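
To make the two-column index matrix concrete, the sketch below builds it explicitly and applies it to a character matrix made from the data frame. The column labels row and col and the explicit as.matrix step are additions for illustration only.

```r
dframe <- data.frame(x = c("a", "b", "c"),
                     y = c("d", "e", "f"),
                     z = c("g", "h", "i"),
                     stringsAsFactors = FALSE)
i <- c(3, 1, 2)

# Each row of idx is one (row, column) pair to extract.
idx <- cbind(row = seq_along(i), col = i)
print(idx)

# Indexing a matrix with a two-column numeric matrix returns one
# element per pair.
m <- as.matrix(dframe)
picked <- m[idx]
print(picked)
```

The pairs here are (1, 3), (2, 1) and (3, 2), which select "g", "b" and "f" respectively.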
How about this:
Df <- data.frame(
x=c("a","b","c"),
y=c("d","e","f"),
z=c("g","h","i"))
##
i <- c(3,1,2)
##
index2D <- function(v = i, DF = Df){
  # seq_along() is safe even when v is empty, unlike 1:length(v)
  sapply(seq_along(v), function(X){
    DF[X,v[X]]
  })
}
##
> index2D()
[1] "g" "b" "f"
