I extract certain values out of dataset Z (the positions are given in dataset A) using a loop function.
#Exemplary datasets
Z <- data.frame(Depth=c(0.02,0.04,0.06,0.08,0.10,0.12,0.14,0.16,0.18,0.2),
Value=c(10,12,5,6,7,4,3,2,11,13))
A <- data.frame(Depth=c(0.067, 0.155))
for (n in c(1:nrow(A))) {
  find_values <- Z$Value[Z$Depth>=A$Depth[n]][1]
  print(find_values)
}
#Result
[1] 6
[1] 2
The result seems to consist of values in two separate vectors. How can I easily merge them into one vector, as follows?
[1] 6, 2
Thanks in advance!
For your code to work as it is, preallocate the result vector and store each value by index inside the for loop
find_values <- numeric(nrow(A))
for (n in seq_len(nrow(A))) {
  find_values[n] <- Z$Value[Z$Depth>=A$Depth[n]][1]
}
find_values
#[1] 6 2
However, you can simplify this with sapply by doing
sapply(A$Depth, function(x) Z$Value[which.max(Z$Depth >= x)])
#[1] 6 2
We can use a vectorized approach
Z$Value[findInterval(A$Depth, Z$Depth) + 1]
#[1] 6 2
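One caveat with the findInterval version: when a depth in A coincides exactly with a depth in Z, the + 1 steps past that row, whereas the loop's >= keeps it. Below is a minimal sketch of the difference, assuming R >= 3.3.0 for the left.open argument. The data frame A2 and its extra depth 0.06 are made up for illustration.

```r
Z <- data.frame(Depth = c(0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.2),
                Value = c(10, 12, 5, 6, 7, 4, 3, 2, 11, 13))
# Hypothetical lookup depths; 0.06 coincides exactly with a row of Z
A2 <- data.frame(Depth = c(0.06, 0.067, 0.155))

# Loop-style lookup: first Value whose Depth is >= the requested depth
by_loop <- sapply(A2$Depth, function(d) Z$Value[Z$Depth >= d][1])

# Default intervals are [a, b), so an exact hit moves on to the next row
by_default <- Z$Value[findInterval(A2$Depth, Z$Depth) + 1]

# left.open = TRUE uses (a, b] intervals, which reproduces the >= rule
by_left_open <- Z$Value[findInterval(A2$Depth, Z$Depth, left.open = TRUE) + 1]

by_loop        # 5 6 2
by_default     # 6 6 2
by_left_open   # 5 6 2
```

If exact hits cannot occur in your data, the two forms agree and the shorter one is fine.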
For example, I have two vectors:
x1 <- c(0,1,2)
x2 <- c(1,2,0,1,2,3,4,5,0,1,0,1,2)
I would like to find the position where x1 first appears in x2, which in this case is 3.
I have tried which() and match(), but both seem to find individual elements instead of whole vectors. Is there an R function that can achieve this, or a way to construct one?
Thank you!
If you are looking for the first match of 0:
match(x1[1], x2)
But for the whole vector:
which(apply(t(embed(x2, length(x1))) == rev(x1), 2, all))[1]
#[1] 3
Edit:
To give NA on no-match:
cond <- apply(t(embed(x2, length(x1))) == rev(x1), 2, all)
if (any(cond)) which(cond)[1] else NA
Explanation:
This uses embed to build every window of x2 that has the length of x1, one window per row and in reversed order (hence rev(x1)). After transposing, each column is compared with rev(x1), and apply with all gives a boolean vector (of TRUEs and FALSEs) marking the windows that equal x1. Finally, which returns the indices of the TRUEs, and [1] takes the first one.
For the edit, I check whether the boolean vector contains any TRUE; if so, I do the same as above, and if not, I return NA.
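To make the role of rev(x1) concrete, here is a small sketch of what embed returns, followed by the same logic wrapped in a function. The name first_match and the length guard are assumptions added for illustration, not part of the answer above.

```r
x1 <- c(0, 1, 2)
x2 <- c(1, 2, 0, 1, 2, 3, 4, 5, 0, 1, 0, 1, 2)

# embed() returns one row per window, with the newest element first,
# which is why the comparison is made against rev(x1)
embed(1:5, 3)
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3

# Hypothetical helper: position of the first occurrence of x1 in x2, or NA
first_match <- function(x1, x2) {
  # embed() fails if the window is longer than the vector, so guard first
  if (length(x1) > length(x2)) return(NA_integer_)
  cond <- apply(t(embed(x2, length(x1))) == rev(x1), 2, all)
  if (any(cond)) which(cond)[1] else NA_integer_
}

first_match(x1, x2)        # 3
first_match(c(9, 9), x2)   # NA
```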
One option would be to do rolling comparison.
x1 <- c(0,1,2)
x2 <- c(1,2,0,1,2,3,4,5,0,1,0,1,2)
which(zoo::rollapply(x2, length(x1), function(x) all(x == x1)))
#[1] 3 11
To get the 1st occurrence -
which(zoo::rollapply(x2, length(x1), function(x) all(x == x1)))[1]
#[1] 3
To get the first position of each element of x1 in x2 separately:
sapply(x1, function(x) match(x,x2))
[1] 3 1 2
Another approach with stringr (this works here because every element is a single digit):
library(stringr)
x1_ <- paste0(x1,collapse="")
x2_ <- paste0(x2,collapse="")
str_locate_all(x2_, x1_)[[1]]
gives,
     start end
[1,]     3   5
[2,]    11  13
which returns an empty matrix if there is no match.
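Pasting without a separator can give false hits once the values have more than one digit: c(1, 2) would be "found" inside c(11, 2). A possible base R workaround, sketched under the assumption that the values never contain a comma, is to join with commas and convert the character position back to an element index. The name find_pattern is made up for this example.

```r
# Hypothetical helper: index in x2 where x1 first occurs as a whole run, or NA
find_pattern <- function(x1, x2) {
  # wrap both strings in separators so only whole elements can match
  p <- paste0(",", paste(x1, collapse = ","), ",")
  s <- paste0(",", paste(x2, collapse = ","), ",")
  hit <- regexpr(p, s, fixed = TRUE)
  if (hit == -1) return(NA_integer_)
  # the number of commas up to the match start equals the element index
  nchar(gsub("[^,]", "", substr(s, 1, hit)))
}

find_pattern(c(0, 1, 2), c(1, 2, 0, 1, 2, 3, 4, 5, 0, 1, 0, 1, 2))  # 3
find_pattern(c(1, 2), c(11, 2, 1, 2))                               # 3
```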
I'm trying to learn R and a sample problem is asking to only reverse part of a string that is in alphabetical order:
String: "abctextdefgtext"
StringNew: "cbatextgfedtext"
Is there a way to identify alphabetical patterns to do this?
Here is one approach with base R based on the pattern shown in the example. We split the string into individual characters ('v1'), use match to find each character's position in the alphabet (letters), take the difference between adjacent positions, and start a new group wherever that difference is not 1 ('i1'). We then reverse (rev) the characters within each group and paste them together to get the expected output.
v1 <- strsplit(str1, "")[[1]]
i1 <- cumsum(c(TRUE, diff(match(v1, letters)) != 1L))
paste(ave(v1, i1, FUN = rev), collapse="")
#[1] "cbatextgfedtext"
Or as @alexislaz mentioned in the comments
v1 = as.integer(charToRaw(str1))
rawToChar(as.raw(ave(v1, cumsum(c(TRUE, diff(v1) != 1L)), FUN = rev)))
#[1] "cbatextgfedtext"
EDIT:
1) A mistake was corrected based on @alexislaz's comments
2) Updated with another method suggested by @alexislaz in the comments
data
str1 <- "abctextdefgtext"
You could do this in base R
vec <- match(unlist(strsplit(s, "")), letters)
x <- c(0, which(diff(vec) != 1), length(vec))
newvec <- unlist(sapply(seq(length(x) - 1), function(i) rev(vec[(x[i]+1):x[i+1]])))
paste0(letters[newvec], collapse = "")
#[1] "cbatextgfedtext"
Where s <- "abctextdefgtext"
First you find the positions of each letter in the sequence of letters ([1] 1 2 3 20 5 24 20 4 5 6 7 20 5 24 20)
Having the positions in hand, you look for consecutive numbers and, when found, reverse that sequence. ([1] 3 2 1 20 5 24 20 7 6 5 4 20 5 24 20)
Finally, you get the letters back in the last line.
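The grouping idea used in both answers can be wrapped in a small function. This is only a sketch: the name reverse_runs is made up, and treating characters outside a-z as run breaks is an assumption the original answers do not cover.

```r
# Hypothetical helper: reverse every run of consecutive lowercase letters
reverse_runs <- function(s) {
  v <- strsplit(s, "")[[1]]
  d <- diff(match(v, letters))
  # a new group starts wherever the step is not exactly +1
  # (NA steps come from characters that are not in letters)
  grp <- cumsum(c(TRUE, is.na(d) | d != 1L))
  paste(ave(v, grp, FUN = rev), collapse = "")
}

reverse_runs("abctextdefgtext")  # "cbatextgfedtext"
reverse_runs("ab1cd")            # "ba1dc"
```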
In the example below, I would like to know the number of 010 sequences, or the number of 1010 sequences. Below is a workable example:
x <- c(1,0,0,1,0,0,0,1,1,1,0,0,1,0,1,0,1,0,1,0,1,0)
In this example, the number of 010 sequences would be 6 and the number of 1010 sequences would be 4.
What would be the most efficient/simplest way to count the number of consecutive sequences?
A stringless way:
f = function(x, patt){
  # candidate start positions; none if the pattern is longer than x
  w = seq_len(max(length(x) - length(patt) + 1L, 0L))
  # keep only the starts whose k-th element matches patt[k]
  for (k in seq_along(patt)) w <- w[ x[w + k - 1L] == patt[k] ]
  w
}
length(f(x, patt = c(0,1,0))) # 6
length(f(x, patt = c(1,0,1,0))) # 4
Alternatives. From @cryo11, here's another way (embed returns each window in reverse order, so the pattern is reversed as well):
function(x,patt) sum(apply(embed(x,length(patt)),1,function(x) all(!xor(x,rev(patt)))))
or another variation:
function(x,patt) sum(!colSums( xor(rev(patt), t(embed(x,length(patt)))) ))
or with data.table:
library(data.table)
setkey(setDT(shift(x, seq_along(patt), type = "lead")))[as.list(patt), .N]
(The shift function is very similar to embed.)
Another solution would be this:
library(stringr)
x <- c(1,0,0,1,0,0,0,1,1,1,0,0,1,0,1,0,1,0,1,0,1,0)
xx = paste0(x, collapse = "")
str_count(xx, '(?<=010)')
[1] 6
str_count(xx, '(?<=1010)')
[1] 4
As @Pierre Lafortune pointed out in the comments, this can be done without using any packages:
length(gregexpr("(?<=010)", xx, perl=TRUE)[[1]])
[1] 6
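One thing to watch with the gregexpr form: when nothing matches, gregexpr returns -1, so length() reports 1 instead of 0. A minimal sketch that handles this case follows. The name count_overlaps is made up, and the pattern is assumed to contain only digits, so no regex escaping is needed.

```r
# Hypothetical helper: count overlapping occurrences of patt in the string xx
count_overlaps <- function(xx, patt) {
  # zero-width lookbehind, as above, so overlapping matches are all counted
  m <- gregexpr(paste0("(?<=", patt, ")"), xx, perl = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)
}

x <- c(1,0,0,1,0,0,0,1,1,1,0,0,1,0,1,0,1,0,1,0,1,0)
xx <- paste0(x, collapse = "")
count_overlaps(xx, "010")   # 6
count_overlaps(xx, "1010")  # 4
count_overlaps(xx, "2")     # 0
```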
Logic: at each position, take a substring as long as the pattern you are searching for and compare it with the pattern.
xx = paste0(x, collapse = "")
# [1] "1001000111001010101010"
# case 1 :
xxx = "010"
sum(sapply(1:(length(x)-nchar(xxx)+1), function(i) substr(xx,i,i+nchar(xxx)-1)==xxx))
# [1] 6
# case 2 :
xxx = "1010"
sum(sapply(1:(length(x)-nchar(xxx)+1), function(i) substr(xx,i,i+nchar(xxx)-1)==xxx))
# [1] 4
R introduced the startsWith function in 3.3.0. Using this and substring, we can implement @joel.wilson's method as
sum(startsWith(substring(paste(x, collapse=""),
head(seq_along(x), -2), tail(seq_along(x), -2)), "010"))
Here, substring constructs all three character adjacent sets and startsWith tests if each of these is the same as "010". The TRUE values are then summed together.
I have a vector as below
data <- c("6X75ML","24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
Here I want to extract the number before the "X" for each of the elements.
When there are two "X"s, as in "6X2X75ML", the product of the two numbers should be returned (6 multiplied by 2 gives 12).
expected output
6, 24, 12, 168
Thank you for the help...
Here's a possible solution using regular expressions :
data <- c("6X75ML","24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
# this regular expression finds any group of digits followed
# by a upper-case 'X' in each string and returns a list of the matches
tokens <- regmatches(data,gregexpr('[[:digit:]]+(?=X)',data,perl=TRUE))
res <- sapply(tokens,function(x)prod(as.numeric(x)))
res
[1] 6 24 12 168
Here is a method using base R:
dataList <- strsplit(data, split="X")
sapply(dataList, function(x) Reduce("*", as.numeric(head(x, -1))))
[1] 6 24 12 168
strsplit breaks up each string at "X". The resulting list is fed to sapply, which then performs an operation on all but the final element of each vector in the list. The operation is to convert the elements to numeric and then multiply them. The final element is dropped using head(x, -1).
As @zheyuan-li comments, prod can fill in for Reduce and will probably be a bit faster:
sapply(dataList, function(x) prod(as.numeric(head(x, -1))))
[1] 6 24 12 168
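A side effect of head(x, -1) worth knowing: a string with no "X" at all leaves nothing to multiply, and prod of an empty vector is 1. If such strings can occur, one option is to return NA for them. This is a sketch only; the name pack_count and the extra test string "75ML" are made up.

```r
data <- c("6X75ML", "24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")

# Hypothetical helper: product of the numbers in front of each "X",
# NA when a string contains no "X" at all
pack_count <- function(x) {
  parts <- strsplit(x, split = "X", fixed = TRUE)
  out <- sapply(parts, function(p) prod(as.numeric(head(p, -1))))
  out[!grepl("X", x, fixed = TRUE)] <- NA
  out
}

pack_count(c(data, "75ML"))
# [1]   6  24  12 168  NA
```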
We can also use str_extract_all
library(stringr)
sapply(str_extract_all(data, "\\d+(?=X)"), function(x) prod(as.numeric(x)))
#[1] 6 24 12 168
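# position of the first "X" and the number in front of it
ind <- regexpr("X", data)
val <- as.integer(substr(data, 1, ind - 1))
# remainder after the first "X"; look for a second "<digits>X" at its start
data2 <- substring(data, ind + 1)
ind2 <- regexpr("[0-9]+X", data2)
if (any(ind2 == 1)) {
  val2 <- as.integer(substr(data2[ind2 == 1], 1, attr(ind2, "match.length")[ind2 == 1] - 1))
  val[ind2 == 1] <- val[ind2 == 1] * val2
}
val
#[1] 6 24 12 168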
I want to apply a function over a data frame. The function takes V1 as arg1 and V2 as arg2 and I want to write the result to V3 or some other vector.
Is there an easy and compact way to do this? I've posted a (non-working) example below.
Thanks
Stu
my.func <- function(X, Y) {
return(X + Y)
}
a <- c(1,2,3)
b <- c(4,5,6)
my.df <- data.frame(a, b)
apply(my.df, 1, my.func, X="a", Y="b")
mapply() is made for this.
Either of the following will do the job. The advantage of the second approach is that it scales nicely to functions that take an arbitrary number of arguments.
mapply(my.func, my.df[,1], my.df[,2])
# [1] 5 7 9
do.call(mapply, c(FUN=list(my.func), unname(my.df)))
# [1] 5 7 9
I feel this would be better approached using with than mapply if you're calling elements inside a data.frame:
with(my.df,my.func(X=a,Y=b))
#[1] 5 7 9
It's still quite a clean method even if you need to do the explicit conversion from a matrix:
my.mat <- as.matrix(my.df)
with(data.frame(my.mat),my.func(X=a,Y=b))
#[1] 5 7 9
There isn't really any need for an *apply function here. Vectorization would suffice:
my.df$c <- my.df$a + my.df$b
# a b c
#1 1 4 5
#2 2 5 7
#3 3 6 9
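Your apply solution can't work the way you have written it because apply passes each row to your function as a single vector; X="a" and Y="b" are just the character strings "a" and "b", not the columns. The column names are not available through colnames inside the function either, e.g.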
colnames(my.df)
#[1] "a" "b"
apply( my.df , 1 , colnames )
#NULL
For your example, rowSums(my.df) will do the job. For more complicated tasks, you can use the mapply function. For example: mapply(my.func, my.df$a, my.df$b).
Alternatively, you could rewrite your function to take a vector argument:
my.otherfunc <- function(x) sum(x)
apply(my.df, 1, my.otherfunc)
It's important to understand that when apply feeds each row or column into the function, it's sending one vector, not a list of separate entries. So you should give it a function with a single (vector) argument.
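If you would rather keep the two-argument my.func, a small wrapper can unpack the row vector by name. A minimal sketch using the question's data (the wrapper itself is an illustration, not something from the answers above):

```r
my.func <- function(X, Y) X + Y
my.df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Each row arrives as one named numeric vector, so pick the entries by name
res <- apply(my.df, 1, function(row) my.func(row[["a"]], row[["b"]]))
res
```

Note that apply converts the data frame to a matrix first, so this only behaves well when all columns share one type. With mixed column types, mapply is the safer choice.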