Hello, I have a vector in R such as
x <- c(131144, 1311605, 1311766, 1312289, 1312804)
and another data frame like:
v1,v2
131144,1283758
1283758,19527672
1311605,19950311
198151,37268685
19950311,35307140
11281862,11292508
35261079,26296073
625349,37306860
84255273,84259752
I would like to end up with a final vector like this one:
x <- c(19527672, 35307140, 1311766, 1312289, 1312804)
It's like iteratively searching for a value: when a match is found in v1, update the value to the corresponding v2, then keep searching for the updated value until no match is found (e.g. 131144 -> 1283758 -> 19527672).
Thanks in advance.
An option with igraph
library(igraph)
g <- graph_from_data_frame(df)
v <- membership(components(g))
tb <- by(names(v), v, function(x) x[degree(g, x, mode = "out") == 0])
m <- unname(v[as.character(x)])
ifelse(is.na(m), x, as.numeric(tb[m]))
gives
[1] 19527672 35307140 1311766 1312289 1312804
[Figure: output of plot(g)]
A little tweak to the solution found here gives the following:
relation <- function(vec, dat) {
  .relation <- function(x) {
    k <- unique(c(dat[dat[, 1] %in% x, 2], x, dat[dat[, 2] %in% x, 1]))
    if (setequal(x, k)) tail(k, 2)[1] else Recall(k)
  }
  y <- unique(vec)
  sapply(y, .relation)[match(vec, y)]
}
relation(x, df)
[1] 19527672 35307140 1311766 1312289 1312804
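For comparison, the iterative lookup described in the question can also be written as a plain base-R loop without igraph. This is a minimal sketch: `follow_chain` is a hypothetical helper name, it assumes each value appears at most once in `v1` (otherwise only the first match is followed), and it stops with an error if the links form a cycle. The data frame is rebuilt from the question's table.

```r
# Rebuild the question's data (columns v1, v2 as in the question)
df <- data.frame(
  v1 = c(131144, 1283758, 1311605, 198151, 19950311,
         11281862, 35261079, 625349, 84255273),
  v2 = c(1283758, 19527672, 19950311, 37268685, 35307140,
         11292508, 26296073, 37306860, 84259752)
)
x <- c(131144, 1311605, 1311766, 1312289, 1312804)

# Hypothetical helper: replace each value by its v2 partner, repeatedly,
# until the current value no longer occurs in v1.
follow_chain <- function(vec, dat) {
  lookup <- setNames(dat$v2, as.character(dat$v1))  # named vector v1 -> v2
  vapply(vec, function(val) {
    seen <- character(0)
    key <- as.character(val)
    while (!is.na(lookup[key])) {          # key still found in v1
      if (key %in% seen) stop("cycle detected at value ", key)
      seen <- c(seen, key)
      val <- lookup[[key]]                 # move one step along the chain
      key <- as.character(val)
    }
    val
  }, numeric(1))
}

follow_chain(x, df)
```

Values that never occur in `v1` (here the last three) are returned unchanged, and the second element ends at 35307140, in line with both answers above.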
Related
The cbind() function works as x <- cbind(a, b), where the column name 'b' can also be given to the result of a function, b = get(paste0('var', i)),
that is, x <- cbind(a, b = get(paste0('var', i))).
I am trying to do the following:
x <- cbind(a, get(paste0('var',i))) = j), where "j" can be a vector or a function.
However, I got the following error: Error: unexpected '=' in "x <- cbind(a, get(paste0('var',i))) = j)"
If I just specify "x <- cbind(a, get(paste0('var',i)))", then the 2nd column name is "get(paste0('var',i))", which is not convenient.
How can I define column names with a function get(paste()) within cbind() or rbind() or bind_cols()? Or what would be the alternative solution?
An example would have been helpful to understand the problem but maybe this?
x <- cbind(a, j)
colnames(x)[2] <- get(paste0('var',i))
Or, if you want to do it in a single line (named list elements become column names):
x <- do.call(cbind, setNames(list(a, j), c("a", get(paste0('var', i)))))
We can use
x <- data.frame(a, j)
colnames(x)[2] <- get(paste('var', i, sep=""))
Or use tibble
b <- get(paste0('var', i))
tibble::tibble(a, !!b := j)
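Since the question has no example, here is a small self-contained one with made-up values (`var1`, `a`, `j` and `i` are hypothetical). It looks the name up once, then either passes it as an argument name to cbind() through do.call() or renames the second column of a data frame afterwards.

```r
# Hypothetical inputs: var1 stores the column name we want
var1 <- "score"
i <- 1
a <- 1:3
j <- c(10, 20, 30)

# Resolve the name once
nm <- get(paste0("var", i))

# Matrix via cbind(): named arguments become column names
x <- do.call(cbind, setNames(list(a, j), c("a", nm)))
colnames(x)

# Data frame variant: build first, then rename the second column
x_df <- data.frame(a, j)
names(x_df)[2] <- nm
names(x_df)
```

Both variants end up with the columns "a" and "score".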
I hope this question is posed clearly! I have looked at many guides on loops, if_else clauses, etc., but have not managed to figure this out.
I am trying to find passages in a large set of txt files where a number (say, 5) of keywords occur. Example keywords are "motion" and "cause".
My data is tidy (the txt files have been split so that there is one word per row) and, using regular expressions, I have added columns (one for each keyword) that are TRUE if the row contains the keyword and FALSE otherwise.
Now in order to find passages of interest, I want to make a copy of each column that says "TRUE" in the same rows, but also in the 250 rows above and below those rows. So for example I want to copy the column that says "TRUE" when the row contains the word "motion", such that in the new column the 500 words surrounding the word "motion" are also "TRUE" (i.e. the 250 rows above and below the one where the word is).
The idea is that I can then easily check whether there are any rows where all of the copied columns are true, indicating that there is a 500-word passage where all my keywords occur.
I have tried learning about and using loops in various ways to make these copied columns, but I have not had any success so far. This is how my latest attempt looks, but it seems to have just designated the same rows as "TRUE" 250 times, rather than making the next 250 rows "TRUE". (It also gave the error message "Problem with 'mutate()' input 'copied_column'. subscript out of bounds ℹ Input 'copied_column' is 'case_when(...)'.")
n <-1
corpus <- corpus %>%
mutate(copied_column = case_when(
str_detect(original_column, "TRUE") ~ (repeat{
n <- n+1
str_detect(original_column, "FALSE")
if (n == 250) {
break
}
})
))
If anyone has any suggestions, they would be most welcome. If you know any functions that I should probably be using, or how to properly use the ones in the above example, that would really help me out a lot.
Maybe the function below can solve the problem. Tested with fake data.
segmentTRUE <- function(X, y, dist){
  # window of row numbers around position y, clipped to 1..n
  f <- function(y, n, d){
    from <- max(1, y - d)
    to <- min(n, y + d)
    from:to
  }
  y <- deparse(substitute(y))   # column name passed unquoted
  w <- which(X[[y]])            # rows that are TRUE
  i <- Reduce(union, mapply(f, w, MoreArgs = list(n = nrow(X), d = dist), SIMPLIFY = FALSE))
  X[i, y] <- TRUE
  X[[y]]
}
Test
Make up some data and run the function in 3 different ways, two of them in a magrittr pipe.
library(dplyr)  # for %>% and mutate()
x <- rep(FALSE, 5e1)
x[c(2, 10, 35, 47)] <- TRUE
df1 <- data.frame(words = rep(letters, length.out = 5e1), x)
head(df1)
d <- 5
segmentTRUE(df1, x, d)
df1 %>% segmentTRUE(x, d)
df1 %>% mutate(x = segmentTRUE(., x, d))
Edit
With nrow(df1) == 1e4, the following function is orders of magnitude faster than the Reduce version.
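segmentTRUE2 <- function(X, y, dist){
  f <- function(y, n, d){
    max(1, y - d):min(n, y + d)
  }
  y <- deparse(substitute(y))
  w <- which(X[[y]])
  # SIMPLIFY = FALSE keeps a list even when all windows have the same length,
  # so unlist() always flattens it to a plain vector of row numbers
  i <- unique(unlist(mapply(f, w, MoreArgs = list(n = nrow(X), d = dist), SIMPLIFY = FALSE)))
  X[i, y] <- TRUE
  X[[y]]
}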
identical(segmentTRUE(df1, x, d), segmentTRUE2(df1, x, d))
#[1] TRUE
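As an alternative to building the row index with mapply(), each keyword column can be widened in one vectorized pass using cumulative sums: a row becomes TRUE if at least one TRUE lies within `d` rows of it. This is a sketch; `widen_true`, `motion` and `cause` are hypothetical names, and the windows are clipped at the first and last rows.

```r
# Hypothetical helper: TRUE wherever a TRUE in v lies within d positions
widen_true <- function(v, d) {
  n <- length(v)
  cs <- c(0, cumsum(v))            # cs[k + 1] = number of TRUEs in v[1:k]
  lo <- pmax(1, seq_len(n) - d)    # window start, clipped at row 1
  hi <- pmin(n, seq_len(n) + d)    # window end, clipped at row n
  (cs[hi + 1] - cs[lo]) > 0        # any TRUE inside v[lo:hi]?
}

# Two made-up keyword columns
motion <- rep(FALSE, 20); motion[3] <- TRUE
cause  <- rep(FALSE, 20); cause[8] <- TRUE

# Rows that have both keywords within d = 3 rows of them
which(widen_true(motion, 3) & widen_true(cause, 3))
```

For the question's data, `d` would be 250, and the widened columns for all keywords can be combined with `&` (or `Reduce("&", ...)` over a list of them) to find rows that have every keyword within 250 rows.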
In MATLAB there is a way to find the values in one vector but not in the other.
For example:
x <- c(1,2,3,4)
y <- c(2,3,4)
Is there any function that would tell me that the value in x that's not in y is 1?
You can use the setdiff() (set difference) function:
> setdiff(x, y)
[1] 1
Yes. For vectors you can simply use the %in% operator or the is.element() function.
> x[!(x %in% y)]
[1] 1
For a matrix, there are many different approaches. merge() is probably the most straightforward. I suggest looking at this question for that scenario.
The help file in R for setdiff, union, intersect, setequal, and is.element provides information on the standard set functions in R.
setdiff(x, y) returns the elements of x that are not in y.
As noted above, it is an asymmetric difference.
So for example:
> x <- c(1,2,3,4)
> y <- c(2,3,4,5)
>
> setdiff(x, y)
[1] 1
> setdiff(y, x)
[1] 5
> union(setdiff(x, y), setdiff(y, x))
[1] 1 5
x[is.na(match(x,y))]
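setdiff() is a tricky function because the output depends on the order of the inputs. You can instead write a simple function that does the opposite of intersect(): it returns the symmetric difference (the values in either vector but not in both), so, as a set, the result no longer depends on the argument order.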
> difference <- function(x, y) {
+   c(setdiff(x, y), setdiff(y, x))
+ }
> # Now let's test it.
> x <- c(1,2,3,4)
> y <- c(2,3,4,5)
> difference(x, y)
[1] 1 5
If:
x <- c(1,2,3,4)
y <- c(2,3,4)
Any of these expressions:
setdiff(x, y)
x[!(x %in% y)]
x[is.na(match(x,y))]
x[!(is.element(x,y))]
will give you the right answer, [1] 1, if the goal is to find the values/characters in x that are not present in y.
However, these expressions can give undesirable results depending on the vectors (for example, the indexing forms keep duplicate values, while setdiff() returns unique values) and on the order of x and y in the expression. For instance, if:
x <- c(1,1,2,2,3,4)
y <- c(2,3,4)
and the goal is just to find the unique values/characters in x that are not present in y, or vice versa, either of the following expressions gives the right answer, [1] 1:
union(setdiff(x, y), setdiff(y, x))
Thanks to the contribution of Jeromy Anglim
OR:
difference <- function(x, y) {
c(setdiff(x, y), setdiff(y, x))
}
difference(y,x)
Thanks to the contribution of Workhouse
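To make the differences concrete, here is how the forms behave on the duplicated example above. This is a short sketch; the comments note what each line returns.

```r
x <- c(1, 1, 2, 2, 3, 4)
y <- c(2, 3, 4)

setdiff(x, y)                        # unique values of x not in y: 1
x[!(x %in% y)]                       # indexing keeps duplicates: 1 1
setdiff(y, x)                        # argument order matters: numeric(0)
union(setdiff(x, y), setdiff(y, x))  # symmetric difference: 1
```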
Problem
I'm making a function that describes changes in the temporal state of a given time series. It should say whether the value in a given column is greater than, less than, or equal to the previous one, and print the result:
It could be in the same data frame or in a different object. I'm doing it to transform the data so that it is suitable for survival analysis.
What has been done
I already made an if-else ladder that looks like this, where x is the i-th column in a data frame and y is the column just before it (i-1). However, I am clueless about how to define the first line of the function so that it actually does this operation on each column of the data frame (starting from the second one) and also doesn't crash on the last column.
func_name <- function (x, columns) {
  if (is.na(x)) {
    print("gone")
  } else if (x < y) {
    print("less")
  } else if (x > y) {
    print("more")
  } else if (x == y) {
    print("same")
  } else {
    print("")
  }
}
What is being expected
Ideally, it would transform something like this:
Id <- c(1,2,3)
Time1 <- c(3,3,4)
Time2 <- c(2,5,4)
Time3 <- c(1,5,8)
df <- data.frame(Id,Time1,Time2,Time3)
df
Into something like this:
Id <- c(1,2,3)
Time1 <- c(3,3,4)
Time2 <- c("Less","More","Same")
Time3 <- c("Less","Same","More")
df2 <- data.frame(Id,Time1,Time2,Time3)
df2
Any help highly appreciated!
Solutions: both @Andrew's and @Cole's solutions solve the problem!
This sounds like what you are looking for. It is not a custom function, but it can be adapted if you need one. Hope this helps!
# Select the columns you need. NOTE: [-1] drops the starting time column
cols <- grep("Time", names(df), fixed = TRUE)[-1]
# Use case_when with your conditions; df[[i]] extracts column i as a vector
df[cols] <- lapply(cols, function(i) dplyr::case_when(
  is.na(df[[i]]) ~ "Gone",
  df[[i]] > df[[i - 1]] ~ "More",
  df[[i]] < df[[i - 1]] ~ "Less",
  df[[i]] == df[[i - 1]] ~ "Same"
))
df
Id Time1 Time2 Time3
1 1 3 Less Less
2 2 3 More Same
3 3 4 Same More
Here's the use of mapply with an anonymous function inside:
df <- data.frame(Id,Time1,Time2,Time3)
df[, 3:4] <- mapply(function(x, y) ifelse(y < x , 'Less', ifelse(y > x, 'More', 'Same'))
, df[, 2:3]
, df[, 3:4])
df
mapply walks along the columns of the two data frames in parallel and applies the function to each pair. In other words, I am comparing df[, 2] with df[, 3], and then df[, 3] with df[, 4]. I could have also done something like:
fx_select <- function(x, y) {
ifelse(y < x, 'Less', ifelse(y > x, 'More', 'Same'))
}
df[, 3:4] <- mapply(fx_select, df[, 2:3], df[, 3:4])
And here's one more approach (starting again from the original numeric df):
df[3:4] <- lapply(sign(df[2:3] - df[3:4]) + 2,
function(x) c('More', 'Same', 'Less')[x]
)
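The question also asks how to apply the comparison to every column from the second one onward without running past the last one. A loop over consecutive pairs of column names handles any number of Time columns. This is a sketch: `compare_to_previous` is a hypothetical name, it always compares against the original numeric values, and it returns NA where the previous value is missing but the current one is not.

```r
# Hypothetical helper: label each column relative to the one before it
compare_to_previous <- function(df, cols) {
  out <- df
  # start at the 2nd column; seq_along() stops at the last one
  for (k in seq_along(cols)[-1]) {
    prev <- df[[cols[k - 1]]]   # read from df, not out, to keep numbers
    curr <- df[[cols[k]]]
    out[[cols[k]]] <- ifelse(is.na(curr), "Gone",
                      ifelse(curr > prev, "More",
                      ifelse(curr < prev, "Less", "Same")))
  }
  out
}

df <- data.frame(Id = c(1, 2, 3),
                 Time1 = c(3, 3, 4),
                 Time2 = c(2, 5, 4),
                 Time3 = c(1, 5, 8))
compare_to_previous(df, grep("^Time", names(df), value = TRUE))
```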
I'm exercising my function writing skills today. Can someone explain why the function I wrote doesn't remove columns 2 and 3 from the data frame?
data <- data.frame(x = 2, y = 3, z = 4)
rmvar <- function(x){
lapply(X = x, FUN = function(x){
x <- NULL})}
rmvar(data[,2:3])
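Your function returns the result of lapply() (a list of NULLs) and never modifies or returns the data frame; assigning NULL to the local x inside the anonymous function has no effect outside it. You could modify it to assign into the data frame and return it: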
rmvar <- function(x, indx){
x[indx] <- lapply(x[indx], FUN=function(x) x <- NULL)
x
}
rmvar(data, 2:3)
# x
#1 2
As @nico mentioned in the comments, this is easier with just data[-(2:3)]. But I guess you want to do this with lapply/NULL.
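Assigning NULL to a subset of columns removes them directly, so the lapply() step is not strictly needed. Because R functions work on a local copy, the function still has to return the data frame and the caller has to reassign the result. A minimal sketch; `rmvar2` is a hypothetical name.

```r
rmvar2 <- function(df, cols) {
  df[cols] <- NULL   # removes the selected columns from the local copy
  df                 # return the modified copy
}

data <- data.frame(x = 2, y = 3, z = 4)
data <- rmvar2(data, c("y", "z"))   # reassign to keep the change
names(data)
```

Column positions such as `2:3` work as well as names.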