R subscript based on a vector

df <- data.frame(name = c('aa', 'bb', 'cc', 'dd'),
                 code = 1:4,
                 value = seq(100, 400, by = 100))
df
v <- c(1, 2, 2)
v
A <- df[df$code %in% v,]$value
A
str(A)
I tried to look up the value corresponding to each code in v. I expected A to have length 3, but it actually has length 2. What can I do if I want A to be a vector of length 3, that is c(100,200,200)?

x %in% y returns a logical vector, the same length as x, that indicates whether each element of x occurs in y. Here x is df$code, so the result has one entry per row of df, and a code that is repeated in v still selects its row only once.
In contrast, match(x, y) returns, for each element of x, the position in y where that element first appears (or NA if it doesn't exist in y). With x set to v, the result has the same length and order as v. Try the following:
df[match(v, df$code), 'value']
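
To make the difference concrete, here is a minimal sketch that reuses df and v from the question. The name idx is introduced only for illustration.

```r
df <- data.frame(name = c("aa", "bb", "cc", "dd"),
                 code = 1:4,
                 value = seq(100, 400, by = 100))
v <- c(1, 2, 2)

# %in% asks "is each df$code found in v?": one answer per row of df,
# so a code repeated in v cannot produce a repeated row.
in_v <- df$code %in% v
in_v

# match() asks "where in df$code is each element of v?": one answer per
# element of v, so the result keeps the length and order of v.
idx <- match(v, df$code)
A <- df[idx, "value"]
A
```

If some element of v has no matching code, match() returns NA for it and the corresponding entry of A is NA, which may or may not be what you want.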

You could just use v directly as a row index if those are the rows whose "value"s you want (this works here only because each code equals its row number):
> df[v,]$value
[1] 100 200 200

df[v,3] # minimum characters :) (value is column 3; column 2 is code)

Related

R: Can the entries of a data frame be vectors of length > 1?

Is it possible for the entries of a data frame to be vectors of length > 1? For example, I tried the following:
A <- data.frame(matrix(ncol=2,nrow=2))
A[1,1] <- list("a","b")
But I got the following warning:
Warning message:
In `[<-.data.frame`(`*tmp*`, 1, 1, value = list("a", "b")) :
provided 2 variables to replace 1 variables
The result was that A[1,1] was assigned the value "a" rather than ("a","b"). Is there a way to make this work? Or do I need to use a multidimensional array?

It is possible, but you need to convert the column to a list first. Here is a rough example:
A[[1]] <- vector(mode="list", length=2L)
A[[c(1,1)]] <- list("a","b")
A
X1 X2
1 a, b NA
2 NULL NA
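
As an alternative sketch (not taken from the answer above), a list column can also be created when the data frame is built, by wrapping the list in I(). The names B, id and vals are made up for illustration.

```r
# I() marks the list "as is", so data.frame() keeps it as a single
# list column instead of trying to split it into several columns.
B <- data.frame(id = 1:2,
                vals = I(list(c("a", "b"), "c")))

# Each cell of the list column can hold a vector of any length.
first_cell <- B$vals[[1]]
first_cell

# Number of elements stored in each cell.
cell_sizes <- as.integer(lengths(B$vals))
cell_sizes
```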

R: Sum values that are in different parts of vector

I have a number of vectors consisting of 1s and 0s, such as:
x <- c(0, 1, 1, 0, 0, 1)
I would like to count the number of consecutive 1s at different parts of these sequences, and in this instance end up with:
[1] 2 1
I have considered using something like strsplit to split the sequence where there are zeros, though it is a numeric vector so strsplit won't work and ideally I don't want to change back and forth between numeric and character format.
Is there another, simpler, solution to this? Would appreciate any help.

You can split the value into a vector of single digits and use rle (run-length encoding) like this.
With your original value:
x <- 11001
temp <- rle(unlist(strsplit(as.character(x), split="")))
temp$lengths[temp$values == 1]
[1] 2 1
It's a bit simpler when starting with the numeric vector, as you don't have to use strsplit and unlist:
x <- c(0, 1, 1, 0, 0, 1)
temp <- rle(x)
temp$lengths[temp$values == 1]
[1] 2 1
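
Since the question mentions having a number of such vectors, the rle step can be wrapped in a small helper. This is only a sketch; the function name count_runs and its value argument are made up for illustration.

```r
# Return the lengths of all runs of `value` in x, in order of appearance.
count_runs <- function(x, value = 1) {
  runs <- rle(x)                       # run-length encoding of x
  runs$lengths[runs$values == value]   # keep only the runs of `value`
}

runs_a <- count_runs(c(0, 1, 1, 0, 0, 1))
runs_a

# A vector with no 1s yields an empty result rather than an error.
runs_b <- count_runs(c(0, 0, 0))
runs_b

# Apply to several vectors at once.
runs_all <- lapply(list(c(1, 1, 1), c(0, 1, 0, 1)), count_runs)
```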

Approximate pattern matching in a sequence of integer data and extraction using R

I have a pattern of integers, c(1,2,3,4,5), that needs to be approximately matched in a data vector such as c(1,10,1,6,3,4,5,1,2,3,4,5,9,10,1,2,3,4,6)
I have tried:
pmatch()
all.equal()
grepl()
but they don't seem to support this scenario.
pattern <- c(1,2,3,4,5)
data <- c(1,10,1,6,3,4,5,1,2,3,4,5,9,10,1,2,3,4,6)
For the above example I need to produce the following output:
1,6,3,4,5
1,2,3,4,5
1,2,3,4,6
Appreciate any thoughts on this.
Thanks

I think you are saying "match a sequence of integers in another sequence of integers where at least N-1 of the integers match". It's unclear what the behavior should be in the case of overlapping matches, so the following will pick up sequences that do overlap.
# helper function to test "match" at a threshold of 4 matches
is_almost <- function(s1, s2, thresh = 4) {
  sum(s1 == s2) >= thresh
}
# function to lookup and return sequences
extract_seq <- function(pattern, data) {
  res <- lapply(1:(length(data) - length(pattern) + 1), function(s) {
    subseq <- data[s:(s+length(pattern)-1)]
    if (is_almost(pattern, subseq)) {
      subseq
    }
  })
  Filter(Negate(is.null), res)
}
# let's test it out
pattern <- c(1,2,3,4,5)
data <- c(1,10,1,6,3,4,5,1,2,3,4,5,9,10,1,2,3,4,6)
extract_seq(pattern,data)
[[1]]
[1] 1 6 3 4 5
[[2]]
[1] 1 2 3 4 5
[[3]]
[1] 1 2 3 4 6

If you want to find the unique elements in a vector that match a given vector, you can use %in% to test for the presence of your 'pattern' elements within the larger vector. The operator %in% returns a logical vector. Passing that output to which() returns the index of each TRUE value, which can be used to subset the larger vector and return all of its elements that occur in the 'pattern', regardless of order. Passing the subset vector to unique() eliminates duplicates, so that each element of the larger vector that occurs in the 'pattern' appears only once. Note that this tests set membership only; it does not check that the elements are contiguous or in the same order as the 'pattern'.
For example:
> num.data <- c(1, 10, 1, 6, 3, 4, 5, 1, 2, 3, 4, 5, 9, 10, 1, 2, 3, 4, 5, 6)
> num.pattern.1 <- c(1,6,3,4,5)
> num.pattern.2 <- c(1,2,3,4,5)
> num.pattern.3 <- c(1,2,3,4,6)
> unique(num.data[which(num.data %in% num.pattern.1)])
[1] 1 6 3 4 5
> unique(num.data[which(num.data %in% num.pattern.2)])
[1] 1 3 4 5 2
> unique(num.data[which(num.data %in% num.pattern.3)])
[1] 1 6 3 4 2
Notice that the first result matches the order of num.pattern.1 by coincidence. The other two vectors do not match the order of the pattern vectors.
To find the exact sequence within num.data that matches the patterns you can use something similar to the following function:
set.seed(12102015)
test.data <- sample(c(1:99), size = 500, replace = TRUE)
test.pattern.1 <- test.data[90:94]
find_vector <- function(test.data, test.pattern.1) {
  # List of all the vectors from test.data with length = length(test.pattern.1), currently empty
  lst <- vector(mode = "list")
  # List of vectors that meet condition 1, currently empty
  lst2 <- vector(mode = "list")
  # List of vectors that meet condition 2, currently empty
  lst3 <- vector(mode = "list")
  # A modifier to the iteration variable used to build 'lst'
  a <- length(test.pattern.1) - 1
  # The loop to iterate through 'test.data' testing for conditions and building lists to return a match.
  # It stops at the last start position that still leaves room for a full-length window.
  for(i in seq_len(length(test.data) - a)) {
    # The list is built incrementally as 'i' increases
    lst[[i]] <- test.data[c(i:(i+a))]
    # Condition 1: every element of the window occurs somewhere in the pattern
    if(sum(lst[[i]] %in% test.pattern.1) == length(test.pattern.1)) {lst2[[i]] <- lst[[i]]}
    # Condition 2: the window equals the pattern element by element, in order
    if(identical(lst[[i]], test.pattern.1)) {lst3[[i]] <- lst[[i]]}
  }
  # Remove nulls from 'lst2' and 'lst3'
  lst2 <- lst2[!vapply(lst2, is.null, logical(1))]
  lst3 <- lst3[!vapply(lst3, is.null, logical(1))]
  # Return the intersection of 'lst2' and 'lst3' which should be a match to the pattern vector.
  return(intersect(lst2, lst3))
}
For reproducibility I used set.seed() and then created a test data set and pattern. The function find_vector() takes two arguments: first, test.data that is the larger numerical vector you wish to check for pattern vectors and second, test.pattern.1 that is the shorter numerical vector you wish to find in test.data. First, three lists are created: lst to hold test.data divided into smaller vectors of length equal to the length of the pattern vector, lst2 to hold the pattern vectors from lst that satisfy the first condition, and lst3 to hold from lst the vectors that satisfy the second condition. The first condition tests that the elements of the vectors in lst are in the pattern vector. The second condition tests that the vector from lst matches the pattern vector by order and by element.
One problem with this approach is that NULL entries are left in lst2 and lst3 at every position where the corresponding condition is not satisfied, because an element is assigned only when its condition holds. For reference you may print the lists to see all the vectors tested, the vectors that meet the first condition, and the vectors that meet the second condition. The nulls can be removed. With the nulls removed, finding the intersection of lst2 and lst3 will reveal the pattern matched identically in test.data.
To use the function, make sure to explicitly define test.data <- 'a numeric vector' and test.pattern.1 <- 'a numeric vector'. No special packages are needed. I didn't do any benchmarking, but the function appears to run quickly. I also did not look for scenarios where the function would fail.
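
A more compact variant of the same exact-match idea can be sketched as follows. It is not the function above: it returns the start positions of the matches rather than the matched vectors, and the name find_exact is made up for illustration.

```r
# Return every start index in `data` at which `pattern` occurs as a
# contiguous, in-order run. Returns integer(0) when there is no match.
find_exact <- function(data, pattern) {
  n <- length(pattern)
  if (n == 0 || n > length(data)) return(integer(0))
  starts <- seq_len(length(data) - n + 1)
  # For each start position, compare the window to the pattern element-wise.
  hits <- vapply(starts,
                 function(s) all(data[s:(s + n - 1)] == pattern),
                 logical(1))
  starts[hits]
}

data <- c(1, 10, 1, 6, 3, 4, 5, 1, 2, 3, 4, 5, 9, 10, 1, 2, 3, 4, 6)
pos <- find_exact(data, c(1, 2, 3, 4, 5))
pos

# Recover the matched window from its start position.
matched <- data[pos:(pos + 4)]
```

This sketch assumes neither vector contains NA; with missing values the comparison inside all() would need extra handling.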

Access R output with subsetting

I have the output of a function, which I assign like this:
function(i,var1,list1)->h
and then the output
value
2.8763
There is a line break in the output, and I only need the number part of the result, not the string. Hence I tried h[1], but this still prints
value
2.87..
and length(h) is also equal to 1. Is there any way to access only the number in this case?
Thanks,

You are already accessing only the value; the label you see is the name of the element in the vector your function returns. You can get rid of those names / attributes like this:
> v <- c("a" = 1, "b" = 2)
> v
a b
1 2
> attributes(v) <- NULL
> v
[1] 1 2
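
If only the names should be dropped, a few base R alternatives can be sketched as below. The vector h here is a stand-in built to resemble the output shown in the question; the real function is not known.

```r
# Hypothetical stand-in for the function result from the question.
h <- c(value = 2.8763)

# unname() removes the names but leaves the object otherwise unchanged.
h_plain <- unname(h)
h_plain

# Double-bracket indexing extracts a single element without its name.
h_elem <- h[["value"]]
h_elem

# as.numeric() also drops the names attribute.
h_num <- as.numeric(h)
h_num
```

Unlike attributes(h) <- NULL, unname() keeps any other attributes (for example dim) in place.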

igraph assign a vector as an attribute for a vertex

I am trying to assign a vector as an attribute for a vertex, but without any luck:
# assignment of a numeric value (everything is ok)
g<-set.vertex.attribute(g, 'checked', 2, 3)
V(g)$checked
.
# assignment of a vector (is not working)
g<-set.vertex.attribute(g, 'checked', 2, c(3, 1))
V(g)$checked
checking the manual, http://igraph.sourceforge.net/doc/R/attributes.html
it looks like this is not possible. Is there any workaround?
So far the only workarounds I have come up with are:
store this information in another structure
convert vector to a string with delimiters and store as a string

This works fine:
## replace c(3,1) by list(c(3,1))
g <- set.vertex.attribute(g, 'checked', 2, list(c(3, 1)))
V(g)[2]$checked
[1] 3 1
EDIT: Why does this work?
When you use :
g<-set.vertex.attribute(g, 'checked', 2, c(3, 1))
You get this warning :
number of items to replace is not a multiple of replacement length
Indeed, you are trying to put c(3,1), which has length 2, into a slot of length 1. So the idea is to replace c(3,1) with something similar but of length 1. For example:
length(list(c(3,1)))
[1] 1
length(data.frame(c(3,1)))
[1] 1
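
The same length mismatch can be reproduced in base R without igraph. The sketch below is only an analogy for the warning quoted above, not a description of how igraph stores attributes; the names plain, boxed and msg are made up for illustration.

```r
# Atomic vector: one slot holds one number, so assigning two values to a
# single slot triggers a warning and only the first value is kept.
plain <- numeric(3)
msg <- tryCatch({
  plain[2] <- c(3, 1)
  NA_character_
}, warning = function(w) conditionMessage(w))
msg

# List: one slot can hold a whole vector, as long as the replacement is
# wrapped in a list of length 1.
boxed <- vector("list", 3)
boxed[2] <- list(c(3, 1))
boxed[[2]]
```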
