JsonPath: return first find - jsonpath

I would like to retrieve only the first occurrence of name.
[
  {
    "name": "xx",
    "lastname": "yy",
    "child": [
      { "name": "x2" },
      { "name": "x3" }
    ]
  },
  {
    "name": "yy",
    "lastname": "xz"
  }
]
With the JsonPath expression $..name, the result is:
[
"xx",
"x2",
"x3",
"yy"
]
and with $..name[0] it is:
[
"x",
"x",
"x",
"y"
]
but I'm looking for this result:
[
"xx"
]
Note: it has to be a single JsonPath expression; I can't store the result and manipulate it afterwards.

$[0].name
returns only the name of the first element of the root array, which here is the first match:
[
"xx"
]
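Base R has no JsonPath engine, so the sketch below is only an illustration of the difference between the two expressions, not a JsonPath implementation. It mirrors the question's JSON as a hand-built nested list, and a hypothetical deep_scan() helper emulates $..name by collecting every name depth-first, while $[0].name corresponds to plain indexing of the first array element (R counts from 1). The two agree here only because the first element has its own name; $[0].name never searches deeper.

```r
# Nested list mirroring the question's JSON (built by hand, so no JSON parser is needed)
doc <- list(
  list(name = "xx", lastname = "yy",
       child = list(list(name = "x2"), list(name = "x3"))),
  list(name = "yy", lastname = "xz")
)

# Hypothetical helper emulating recursive descent ($..key):
# depth-first, pre-order, so a node's own key comes before its children's keys
deep_scan <- function(x, key) {
  out <- character(0)
  if (is.list(x)) {
    if (!is.null(names(x)) && key %in% names(x) && is.character(x[[key]])) {
      out <- c(out, x[[key]])
    }
    for (el in x) {
      out <- c(out, deep_scan(el, key))
    }
  }
  out
}

deep_scan(doc, "name")   # like $..name:   "xx" "x2" "x3" "yy"
doc[[1]]$name            # like $[0].name: "xx"
```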

Related

How can I split a string "a\n\nb" into c("a", "", "", "b")?

When I stringr::str_split "\n\na\n" by "\n", I obtain c("", "", "a", "").
I expected to obtain c("a", "", "", "b") when I stringr::str_split "a\n\nb" by "\n", but I obtained c("a", "", "b") instead. How can I obtain c("a", "", "", "b") by splitting "a\n\nb"?
Try:
stringr::str_split("a\n\nb", "\n")
Expect:
c("a", "", "", "b")
Result:
c("a", "", "b")
Assuming that we want to extract each character separately, split on the empty string "" (i.e. into single characters) and then remove every \n, which turns each newline into an empty element:
library(stringr)
str_remove_all(strsplit(str1, "")[[1]], "\n")
[1] "a" "" "" "b"
Or split on the optional-newline pattern \n? with perl = TRUE:
strsplit(str1, "\n?", perl = TRUE)[[1]]
[1] "a" "" "" "b"
If the split should happen only at each \n:
strsplit(str_replace(str1, "(\n\\S)", "\n\\1"), "\n")[[1]]
[1] "a" "" "" "b"
data
str1 <- "a\n\nb"
You are getting the correct result, even if it's not what you expect. The length of the result equals the number of delimiters in the string + 1. Thus if you want an extra field, you need to add an extra delimiter:
x1 <- "a\n\nb"
x2 <- "abc\n\ndef"
strsplit(gsub("(\n{2,})", "\\1\n", x1), "\n")[[1]]
[1] "a" "" "" "b"
strsplit(gsub("(\n{2,})", "\\1\n", x2), "\n")[[1]]
[1] "abc" "" "" "def"
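A short base R sketch of the "delimiters + 1" rule: it counts the \n matches with gregexpr() and compares that count with the number of pieces from strsplit(). One caveat, per the strsplit() documentation: base strsplit() drops a trailing empty field, so the rule only holds when the string does not end with the delimiter (stringr::str_split keeps that trailing "", as the "\n\na\n" example in the question shows).

```r
x <- "a\n\nb"

# Count the "\n" delimiters; gregexpr() returns every match position
n_delim <- lengths(regmatches(x, gregexpr("\n", x)))
pieces  <- strsplit(x, "\n")[[1]]

n_delim          # 2
length(pieces)   # 3, i.e. n_delim + 1
pieces           # "a" "" "b"

# Caveat: base strsplit() drops a trailing empty field,
# so "a\n" gives one piece, not two
length(strsplit("a\n", "\n")[[1]])   # 1
```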

How to interleave vector of strings with a string

I have the following example vectors:
v1 <- c("AA", "BB")
v2 <- c("AA", "BB", "CCC")
Note that the length of each vector can vary.
What I want to do is to interleave each vector with a string:
linker <- "xxx"
Resulting in this:
c("AA", "xxx", "BB")
c("AA","xxx", "BB", "xxx", "CCC")
How can I achieve that?
You could use an rbind trick here:
v1 <- c("AA", "BB")
v2 <- c("AA", "BB", "CCC")
linker <- "xxx"
head(c(rbind(v2, linker)), -1)
[1] "AA" "xxx" "BB" "xxx" "CCC"
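Why the rbind trick works, step by step: rbind() stacks v2 on top of linker (recycling the single linker across every column), and c() flattens the matrix column by column, so each vector element is followed by one linker. head(..., -1) then drops the final, unwanted linker.

```r
v2 <- c("AA", "BB", "CCC")
linker <- "xxx"

m <- rbind(v2, linker)   # 2 x 3 matrix: row 1 is v2, row 2 is linker recycled
flat <- c(m)             # c() drops the dimensions, reading the matrix column by column
flat                     # "AA" "xxx" "BB" "xxx" "CCC" "xxx"
head(flat, -1)           # drop the trailing linker
```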
We can try strsplit + paste0 like below (this assumes the strings themselves contain no spaces):
> strsplit(paste0(v2, collapse = sprintf(" %s ", linker)), " ")[[1]]
[1] "AA" "xxx" "BB" "xxx" "CCC"
Here you have a possible answer:
v1 <- c("AA", "BB")
v2 <- c("AA", "BB", "CCC")
linker <- "xxx"
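interleave <- function(v, l) {
  result <- c()
  # seq_along() avoids iterating over 1:0 when v is empty
  for (i in seq_along(v)) {
    if (i != 1) {
      # every element after the first is preceded by the linker
      result <- c(result, l, v[i])
    } else {
      result <- c(result, v[i])
    }
  }
  return(result)
}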
interleave(v1,linker)
interleave(v2,linker)
Results:
> interleave(v1,linker)
[1] "AA" "xxx" "BB"
>
> interleave(v2,linker)
[1] "AA" "xxx" "BB" "xxx" "CCC"
We can use append, and then head to remove the trailing linker:
# Uses the native pipe |> and the \(x) lambda shorthand, both available from R 4.1.0
append_linker <- function(vector, linker) {
  lapply(vector, \(x) append(x, linker)) |>
    unlist() |>
    head(-1)
}
append_linker(v1, linker)
[1] "AA" "xxx" "BB"
append_linker(v2, linker)
[1] "AA" "xxx" "BB" "xxx" "CCC"
A creative application of mapply and c:
v1 <- c("AA", "BB")
v2 <- c("AA", "BB", "CCC")
linker <- 'xxx'
c(mapply(c, linker, v1))[-1]
[1] "AA" "xxx" "BB"
c(mapply(c, linker, v2))[-1]
[1] "AA" "xxx" "BB" "xxx" "CCC"
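Another base R option is to preallocate the result and fill it by index. A minimal sketch, where interleave_idx is a hypothetical name: a vector of n elements needs 2n - 1 slots, the odd positions hold the original elements and the even positions hold the linker.

```r
interleave_idx <- function(v, linker) {
  n <- length(v)
  if (n == 0) return(character(0))
  out <- rep(linker, 2 * n - 1)               # start with the linker in every slot
  out[seq(1, by = 2, length.out = n)] <- v    # overwrite the odd positions with v
  out
}

interleave_idx(c("AA", "BB"), "xxx")          # "AA" "xxx" "BB"
interleave_idx(c("AA", "BB", "CCC"), "xxx")   # "AA" "xxx" "BB" "xxx" "CCC"
```

This avoids growing the result inside a loop, which matters only for long vectors.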

R: How to delete words other than specific words in a corpus

In the corpus "tkn_pb", I would like to delete all words except for some keywords I chose (e.g. "attack" and "gunman"). Is it possible to do this?
You can use which and grepl to subset your corpus:
Data:
sample_tokens <- c("word", "another","a", "new", "word token", "one", "more", "and", "another one")
Remove all words except "a" and "and":
sample_tokens[which(grepl("\\b(a|and)\\b", sample_tokens))]
[1] "a" "and"
EDIT:
If the corpus is a list, then this solution suggested by #John would work:
Data:
sample_tokens <- list(c("word", "another","a", "new", "word token", "one", "more", "and", "another one"),
c("yet", "a", "few", "more", "words"),
c("and", "so on"))
lapply(sample_tokens, function(x) x[which(grepl("\\b(a|and)\\b", x))])
[[1]]
[1] "a" "and"
[[2]]
[1] "a"
[[3]]
[1] "and"
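Note that the \b pattern keeps any token that merely contains a keyword as a whole word (a token such as "a cat" would survive). If only exact tokens should be kept, value matching with %in% is a simpler sketch, shown on the same list data. If tkn_pb is actually a quanteda tokens object (an assumption; the question does not say), quanteda's tokens_keep() is the dedicated function for this.

```r
keep <- c("a", "and")
sample_tokens <- list(c("word", "another", "a", "new", "word token", "one", "more", "and", "another one"),
                      c("yet", "a", "few", "more", "words"),
                      c("and", "so on"))

# Keep only tokens that are exactly one of the keywords
kept <- lapply(sample_tokens, function(x) x[x %in% keep])

# The regex version keeps "a cat"; exact matching does not
grepl("\\b(a|and)\\b", "a cat")   # TRUE
"a cat" %in% keep                 # FALSE
```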

r - creating dummy columns of U.S. regions from a single states column

I have been working with a data set that has a single column of states with 3,000 observations. In order to run a neural network I was attempting to dummy code the states into region columns including pacific, central, eastern, AK, HI, and mountain.
The following code works but I feel like there must be an easier way.
Packages loaded:
library(tidyverse)
library(readr)
library(FNN)
library(rpart)
library(C50)
library(nnet)
library(FME)
The for loops I have been using:
for (i in 1:length(churn$Churn.)) {
  if(churn$State[i]== "CT" | churn$State[i]== "DE"| churn$State[i]== "FL" | churn$State[i]== "GA" | churn$State[i]== "IN" | churn$State[i]== "ME" | churn$State[i]== "MD" | churn$State[i]== "MA" |churn$State[i]== "MI" |churn$State[i]== "NH" |churn$State[i]== "NJ" | churn$State[i]== "NY" |churn$State[i]== "NC" | churn$State[i]== "OH" |churn$State[i]== "PA" |churn$State[i]== "RI" |churn$State[i]== "SC" | churn$State[i]== "VT" | churn$State[i]== "VA" |churn$State[i]== "DC" | churn$State[i]== "WV" ) {
    churn$state.cat.east[i]<-1
  } else {
    churn$state.cat.east[i]<-0
  }
}
for (i in 1:length(churn$Churn.)) {
  if(churn$State[i]== "AL" | churn$State[i]== "AR" | churn$State[i]== "IL" | churn$State[i]== "IA" | churn$State[i]== "KS" | churn$State[i]== "KY" | churn$State[i]== "LA" | churn$State[i]== "MN" | churn$State[i]== "MS" | churn$State[i]== "MO" | churn$State[i]== "NE" | churn$State[i]== "ND" | churn$State[i]== "OK" | churn$State[i]== "SD" | churn$State[i]== "TN" | churn$State[i]== "TX" | churn$State[i]== "WI" ) {
    churn$state.cat.central[i]<-1
  } else {
    churn$state.cat.central[i]<-0
  }
}
This is my first post on here so hopefully I have everything I need & thanks for the help!
You can do this in two lines using ifelse and the %in% operator:
#FIRST STATEMENT
east <- c("CT", "DE", "FL", "GA", "IN", "ME", "MD", "MA", "MI", "NH", "NJ", "NY", "NC", "OH", "PA", "RI", "SC", "VT", "VA", "DC", "WV")
churn$state.cat.east <- ifelse(churn$State %in% east,1,0)
Repeat the same for central values
#2ND STATEMENT
central <- c("AL" , "AR" , "IL" , "IA" , "KS" , "KY" , "LA" , "MN" , "MS" , "MO" , "NE" , "ND" , "OK" , "SD" , "TN" , "TX" , "WI")
churn$state.cat.central <- ifelse(churn$State %in% central,1,0)
Hope this helps
Gottavianoni
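To avoid repeating the same line for every region, the region definitions can live in one named list and a loop can create all the columns. A minimal sketch: the east and central vectors are copied from the question, while pacific, mountain, AK and HI would be added as further list entries (not shown, since the question does not list their states); the four-row churn data frame is hypothetical toy data.

```r
# Region definitions taken from the question; add the other regions the same way
regions <- list(
  east = c("CT", "DE", "FL", "GA", "IN", "ME", "MD", "MA", "MI", "NH", "NJ",
           "NY", "NC", "OH", "PA", "RI", "SC", "VT", "VA", "DC", "WV"),
  central = c("AL", "AR", "IL", "IA", "KS", "KY", "LA", "MN", "MS", "MO",
              "NE", "ND", "OK", "SD", "TN", "TX", "WI")
)

# Hypothetical toy data standing in for the real churn data set
churn <- data.frame(State = c("NY", "TX", "CA", "DC"), stringsAsFactors = FALSE)

# One 0/1 column per region, named state.cat.<region>
for (r in names(regions)) {
  churn[[paste0("state.cat.", r)]] <- as.integer(churn$State %in% regions[[r]])
}
churn
```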
Another option is to use R's built-in state data sets:
#Sample data
churn <- data.frame(state=c('CA', 'NY', 'TX', 'CA', 'TX'), stringsAsFactors = FALSE)
#map each state to its division using the built-in data
data(state)
churn$state_division <- sapply(churn$state, function(x) state.division[which(state.abb==x)])
#dummy code the new column created using above mapping
library(dummies)
churn <- dummy.data.frame(churn, names="state_division", sep = "-")
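The same idea can be sketched in base R without an extra package. match() looks up each abbreviation's position in state.abb, and sapply() builds one 0/1 column per level of state.division. Two caveats: "DC" is not in state.abb, so it maps to NA and gets NA in every column; and the built-in divisions are the Census divisions, not the custom regions (pacific, central, eastern, ...) from the question.

```r
churn <- data.frame(state = c("CA", "NY", "TX", "CA", "TX"), stringsAsFactors = FALSE)

# Position of each abbreviation in state.abb; unknown codes such as "DC" give NA
division <- state.division[match(churn$state, state.abb)]

# One 0/1 column per division level (all nine, even if unused in the data)
dummies <- sapply(levels(state.division), function(d) as.integer(division == d))
churn <- cbind(churn, dummies)
```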
We can do this without ifelse, since as.integer() converts the logical result of %in% to 0/1:
churn$state.cat.east <- with(churn,as.integer( State %in% c("CT", "DE", "FL", ...)))
churn$state.cat.central <- with(churn,as.integer( State %in% c("AL" , "AR", ...)))
NOTE: The ... refers to other states
If we need dummy columns for all the regions:
library(purrr)
library(dplyr)
state.region %>%
unique %>%
as.character %>%
set_names(.) %>%
map_df(~ as.integer(setNames(state.region, state.abb)[churn$State] %in% .x) ) %>%
bind_cols(churn, .)
data
set.seed(24)
churn <- data.frame(State = sample(state.abb, 100, replace = TRUE), stringsAsFactors = FALSE)

gsub function, exact match of the pattern

I have a list of words stored in a data frame called remove. I want to remove all of these words from text, but only exact matches.
remove <- data.frame("the", "a", "she")
text <- c("she", "he", "a", "the", "aaaa")
for (i in 1:3) {
text <- gsub(data[i, 1], "", text)
}
This is the result returned:
#[1] "" "he" "" "" ""
However, what I am expecting is:
#[1] "" "he" "" "" "aaaa"
I also tried the following code, but it does not return the expected result:
for (i in 1:3) {
text <- gsub("^data[i, 1]$", "", text)
}
Thanks so much for your help.
For exact match, use value matching (%in%)
remove<-c("the","a","she") #I made remove a vector too
replace(text, text %in% remove, "")
#[1] "" "he" "" "" "aaaa"
A simple base R solution, if the matching words should be dropped entirely rather than replaced with "", is:
text[!text %in% as.vector(unlist(remove, use.names = FALSE))]
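If you want to keep using gsub, the second attempt fails because "^data[i, 1]$" is a literal string: the pattern must be built from the word's value, e.g. with paste0(). A minimal sketch, assuming remove is a plain character vector (use unlist(remove) for the data frame) and that the words contain no regex metacharacters:

```r
remove <- c("the", "a", "she")
text <- c("she", "he", "a", "the", "aaaa")

# Option 1: loop, anchoring each word with ^ and $ so only whole strings match
out1 <- text
for (w in remove) {
  out1 <- gsub(paste0("^", w, "$"), "", out1)
}

# Option 2: a single alternation pattern, ^(the|a|she)$
pattern <- paste0("^(", paste(remove, collapse = "|"), ")$")
out2 <- gsub(pattern, "", text)

out1   # ""  "he"  ""  ""  "aaaa"
```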
