How to see the difference in two strings

I'm trying to find the difference between two columns in a CSV file that I've named Test.
I'd like to add a new column called 'Results' that contains the difference between Events_1 and Events_2. If there is no difference, Results can be blank.
This is a basic example of what I'm trying to accomplish; the real list contains hundreds of events in both columns.

Not tested with your data, but:
vec2 <- c("hello,goodbye","hello,goodbye")
vec1 <- c("hello","hello,goodbye")
# split on commas and/or whitespace; the POSIX class [[:space:]] is valid
# inside a bracket expression with R's default regex engine
Map(setdiff, strsplit(vec2, "[,[:space:]]+"), strsplit(vec1, "[,[:space:]]+"))
# [[1]]
# [1] "goodbye"
# [[2]]
# character(0)
If you need them to be comma-delimited strings, then:
mapply(function(a,b) paste(setdiff(a,b), collapse=","), strsplit(vec2, "[,[:space:]]+"), strsplit(vec1, "[,[:space:]]+"))
# [1] "goodbye" ""
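To tie this back to the question, here is a minimal sketch that applies the same idea to a data frame and stores the result in a Results column. The data frame contents and the helper name split_events are made up for illustration; with the real file you would presumably start from something like read.csv("Test.csv", stringsAsFactors = FALSE).

```r
# Hypothetical stand-in for the Test CSV from the question
Test <- data.frame(
  Events_1 = c("hello", "hello,goodbye", "a, b"),
  Events_2 = c("hello,goodbye", "hello,goodbye", "b, c, d"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each string on commas and/or whitespace
split_events <- function(x) strsplit(x, "[,[:space:]]+")

# For each row, keep the events found in Events_2 but not in Events_1
Test$Results <- mapply(
  function(a, b) paste(setdiff(a, b), collapse = ","),
  split_events(Test$Events_2),
  split_events(Test$Events_1)
)

Test$Results
# [1] "goodbye" ""        "c,d"
```

Note that setdiff is one-directional: events that appear only in Events_1 are not reported. If you need both directions, you could combine setdiff(a, b) with setdiff(b, a).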

Related

R - filter a list based on names which start with a numerical value

I have the following list in R:
x <- list("a"="m","a2"="test","001"="test2","002"="test3")
$a
[1] "m"
$a2
[1] "test"
$`001`
[1] "test2"
$`002`
[1] "test3"
I want to filter this list so that it returns only the items which begin with a number, i.e. it would return:
x$`001` and x$`002`

Peter hasn't picked it up yet, so I'll post my comment as an answer. We can use the regex pattern "^[0-9]" to find strings that start with a number. Applying that to the names of your list:
x[grepl("^[0-9]", names(x))]
# $`001`
# [1] "test2"
#
# $`002`
# [1] "test3"

Not exactly sure what you mean here, but here are two possibilities that take advantage of the fact that you can filter a list by supplying a logical vector within single brackets.
If what you want is elements of the list whose values contain digits:
x[sapply(x, function(i){grepl("[0-9]", i)})]
If what you want is elements of the list that have a name that can be interpreted as a number:
x[!is.na(suppressWarnings(as.numeric(names(x))))]
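The filters discussed in these answers can select different elements, which is easier to see side by side. The sketch below extends the list from the question with two made-up entries ("3rd" and "b") purely for illustration.

```r
x <- list("a" = "m", "a2" = "test", "001" = "test2", "002" = "test3")
# Two hypothetical extra entries so that the filters disagree
y <- c(x, list("3rd" = "none", "b" = "v1"))

# Names that start with a digit
starts_with_digit <- grepl("^[0-9]", names(y))
# Names that consist of digits only (a stricter variant)
all_digits <- grepl("^[0-9]+$", names(y))
# Values that contain a digit anywhere
value_has_digit <- vapply(y, function(v) grepl("[0-9]", v), logical(1))

names(y)[starts_with_digit]
# [1] "001" "002" "3rd"
names(y)[all_digits]
# [1] "001" "002"
names(y)[value_has_digit]
# [1] "001" "002" "b"
```

Any of these logical vectors can be used directly as y[starts_with_digit] and so on to subset the list.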

Strip off tracking from URL using R

I have a dataframe with a column of URLs from which I want to remove everything after the first question mark. Some URLs have no question mark, and I want these to remain unchanged. In short, I want to strip off all the tracking. This is a sample URL.
https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/?utm_source=exacttarget&utm_medium=newsletter&utm_term=dummydotcom-dummycomnewsletter&utm_content=na-readblog-blogpost&utm_campaign=dummy
This is the result I'm looking for.
https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/

Assuming your dataframe is called df and it has a column in it named url:
df$url <- sub('\\?.*', '', df$url)
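As a quick check, here is a small sketch on a made-up data frame (the URLs are placeholders). It shows that rows without a question mark are left unchanged and that everything from the first question mark onward is dropped.

```r
# Hypothetical data frame with a url column
df <- data.frame(
  url = c(
    "https://www.dummy.com/a/?utm_source=x&utm_medium=y",
    "https://www.dummy.com/b/",
    "https://www.dummy.com/c/?q=1?again"
  ),
  stringsAsFactors = FALSE
)

# \\? matches a literal question mark; .* consumes the rest of the string
df$url <- sub("\\?.*", "", df$url)

df$url
# [1] "https://www.dummy.com/a/" "https://www.dummy.com/b/" "https://www.dummy.com/c/"
```

Keep in mind that this removes the whole query string, not only tracking parameters, so it is only appropriate when none of the parameters are needed.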

With strsplit:
url <- "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/?utm_source=exacttarget&utm_medium=newsletter&utm_term=dummydotcom-dummycomnewsletter&utm_content=na-readblog-blogpost&utm_campaign=dummy"
result <- strsplit(url, "\\?")[[1]][1]
Output:
> result
[1] "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/"
And here is an example of using it on a vector rather than a single string:
strings <- c("here?string", "another?string", "stringnoquestion", "one?more")
> sapply(strsplit(strings, "\\?"), function(x) x[1])
[1] "here" "another" "stringnoquestion" "one"
strsplit returns a list because it is vectorized over its input: it produces one list element per input string. So in the first example, [[1]] accesses the first element of the list, and [1] then accesses the first element of that vector, i.e. the part of the URL before the ?.
Here is the first example broken out into steps:
# Returns a list of length one
> strsplit(url, "\\?")
[[1]]
[1] "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/"
[2] "utm_source=exacttarget&utm_medium=newsletter&utm_term=dummydotcom-dummycomnewsletter&utm_content=na-readblog-blogpost&utm_campaign=dummy"
# Each element of the list is a vector
> strsplit(url, "\\?")[[1]]
[1] "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/"
[2] "utm_source=exacttarget&utm_medium=newsletter&utm_term=dummydotcom-dummycomnewsletter&utm_content=na-readblog-blogpost&utm_campaign=dummy"
# The first element of that vector
> strsplit(url, "\\?")[[1]][1]
[1] "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/"
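A slightly more defensive variant of the vector version, as a sketch: vapply checks that each result is a single string, and fixed = TRUE treats the question mark literally so no escaping is needed. The input strings are made up to show two edge cases.

```r
strings <- c("here?string", "stringnoquestion", "?leading", "")

# Split on a literal "?" and keep the first piece of each string
first_part <- vapply(
  strsplit(strings, "?", fixed = TRUE),
  function(x) x[1],
  character(1)
)

first_part
# [1] "here"             "stringnoquestion" ""                 NA
```

An empty input string splits into character(0), so its first piece is NA rather than "". If that matters for your data, handle empty strings separately or use the sub approach, which returns "" for them.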

Extracting coefficients while looping over variable names

I'm working on some time-series stuff in R (version 3.4.1), and would like to extract coefficients from regressions I ran, in order to do further analysis.
All results are so far saved as uGARCHfit objects, which are basically complicated list objects, from which I want to extract the coefficients in the following manner.
What I want is in essence this:
for(i in list){
i_GARCH_mxreg <- i_GARCH@fit$robust.matcoef[5,1]
}
"list" is a list object, where every element is the name of one observation. For now, I want my loop to create a new numeric object named as I specified in the loop.
Now this obviously doesn't work because the index, 'i', isn't replaced as I would want it to be.
How do I rewrite my loop appropriately?
Minimal working example:
list <- as.list(c("one", "two", "three"))
one_a <- 1
two_a <- 2
three_a <- 3
for (i in list){
i_b <- i_a
}
What this should give me is:
> one_b
[1] 1
> two_b
[1] 2
> three_b
[1] 3
Clarification:
I want to extract the coefficients from multiple list objects. These are named in the manner 'string'_obj. The problem is that I don't have a function that would extract these coefficients, and the list "is not subsettable", so I have to access the individual objects via obj@fit$robust.matcoef[5,1] (or is there another way?). I wanted to use the loop to take my list of strings and, in every iteration, take one string, evaluate 'string'_obj@fit$robust.matcoef[5,1], and save this value into an object, again named " 'string'_name ".
It might well be easier to have this in a list rather than in individual objects, as someone suggested with lapply, but this is not my primary concern right now.
There is likely an easy way to do this, but I am unable to find it. Sorry for any confusion and thanks for any help.

The following should match your desired output:
# your list
l <- as.list(c("one", "two", "three"))
one_a <- 1
two_a <- 2
three_a <- 3
# my workspace: note that there is no one_b, two_b, three_b
ls()
[1] "l" "one_a" "three_a" "two_a"
for (i in l){
# first, let's define the names as characters, using paste:
dest <- paste0(i, "_b")
orig <- paste0(i, "_a")
# then let's assign the values. Since we are working with
# characters, the functions assign and get come in handy:
assign(dest, get(orig) )
}
# now let's check my workspace again. Note one_b, two_b, three_b
ls()
[1] "dest" "i" "l" "one_a" "one_b" "orig" "three_a"
[8] "three_b" "two_a" "two_b"
# let's check that the values are correct:
one_b
[1] 1
two_b
[1] 2
three_b
[1] 3
To comment on the functions used: assign takes a character as first argument, which is supposed to be the name of the newly created object. The second argument is the value of that object. get takes a character and looks up the value of the object in the workspace with the same name as that character. For instance, get("one_a") will yield 1.
Also, just to follow up on my comment earlier: If we already had all the coefficients in a list, we could do the following:
# hypothetical coefficients stored in list:
lcoefs <- list(1,2,3)
# let's name the coefficients:
lcoefs <- setNames(lcoefs, paste0(c("one", "two", "three"), "_c"))
# push them into the global environment:
list2env(lcoefs, envir = .GlobalEnv)
# look at environment:
ls()
[1] "dest" "i" "l" "lcoefs" "one_a" "one_b" "one_c"
[8] "orig" "three_a" "three_b" "three_c" "two_a" "two_b" "two_c"
one_c
[1] 1
two_c
[1] 2
three_c
[1] 3
And to address the comments, here is a slightly more realistic example that takes the list structure into account:
l <- as.list(c("one", "two", "three"))
# let's "hide" the values in a list:
one_a <- list(val = 1)
two_a <- list(val = 2)
three_a <- list(val = 3)
for (i in l){
dest <- paste0(i, "_b")
orig <- paste0(i, "_a")
# let's get the list-object:
tmp <- get(orig)
# extract value:
val <- tmp$val
assign(dest, val )
}
one_b
[1] 1
two_b
[1] 2
three_b
[1] 3
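If collecting the values in one object is acceptable, a sketch along the following lines avoids assign altogether. It reuses the toy objects from this answer; the name ids is made up for illustration. For real uGARCHfit objects the extractor would presumably be function(o) o@fit$robust.matcoef[5, 1], taken from the question and not tested here.

```r
# Same toy objects as above; `val` stands in for the real coefficient
one_a <- list(val = 1)
two_a <- list(val = 2)
three_a <- list(val = 3)

ids <- c("one", "two", "three")

# mget() fetches several objects by name and returns them as a named list
objs <- mget(paste0(ids, "_a"), envir = environment())

# Pull one number out of each object; vapply() enforces numeric length-1 results
coefs <- vapply(objs, function(o) o$val, numeric(1))
names(coefs) <- paste0(ids, "_b")

coefs
#   one_b   two_b three_b
#       1       2       3
```

Individual values are then available as coefs["one_b"], and the whole vector can be passed on to further analysis without cluttering the workspace.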

a list of multiple lists of 2 for synonyms

I want to read synonyms from a CSV file, where the first word in each record is the "main" word and the rest of the words in the same record are its synonyms.
Now I basically want to create a list like the one I would have in R:
synonyms <- list(
list(word="ss", syns=c("yy","yyss")),
list(word="ser", syns=c("sert","sertyy","serty"))
)
This gives me a list as
synonyms
[[1]]
[[1]]$word
[1] "ss"
[[1]]$syns
[1] "yy" "yyss"
[[2]]
[[2]]$word
[1] "ser"
[[2]]$syns
[1] "sert" "sertyy" "serty"
which is essentially a list of lists, each with a "word" and its "syns".
How do I go about creating a similar list while reading the words and synonyms from a CSV file?
Any pointers would help! Thanks.

This process should return what you want.
# read in data using readLines (temp is defined under "data" at the end of this answer)
myStuff <- readLines(textConnection(temp))
This will return a character vector with one element per line in the file. Note that textConnection is not necessary for reading in files; just supply the file path. Now, split each vector element into a vector of words using strsplit, which returns a list.
myList <- strsplit(myStuff, split=" ")
Now, separate the first element from the remaining elements for each vector within the list.
result <- lapply(myList, function(x) list(word=x[1], synonyms=x[-1]))
This returns the desired result. We use lapply to move through the list items. For each list item, we return a named list where the first element, named word, corresponds to the first element of the vector that is the list item and the remaining elements of this vector are placed in a second list element called synonyms.
result
[[1]]
[[1]]$word
[1] "ss"
[[1]]$synonyms
[1] "yy" "yyss"
[[2]]
[[2]]$word
[1] "ser"
[[2]]$synonyms
[1] "sert" "sertyy" "serty"
[[3]]
[[3]]$word
[1] "at"
[[3]]$synonyms
[1] "ate" "ater" "ates"
[[4]]
[[4]]$word
[1] "late"
[[4]]$synonyms
[1] "lated" "lates" "latee"
data
temp <-
"ss yy yyss
ser sert sertyy serty
at ate ater ates
late lated lates latee"
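Since the question mentions a CSV file, here is a sketch of the same approach for comma-separated records, using the element names word and syns from the question. The CSV content is made up; with a real file you would pass its path to readLines instead of using a text connection.

```r
# Hypothetical CSV content: main word first, synonyms after it
csv_text <- "ss,yy,yyss
ser, sert, sertyy, serty"

con <- textConnection(csv_text)
csv_lines <- readLines(con)
close(con)

# Split each record on commas and trim stray whitespace around the fields
parts <- lapply(strsplit(csv_lines, ",", fixed = TRUE), trimws)

# First field becomes the word, the remaining fields its synonyms
synonyms <- lapply(parts, function(x) list(word = x[1], syns = x[-1]))

str(synonyms)
```

Reading line by line sidesteps the fact that records have different numbers of fields, which is awkward to represent in a rectangular data frame.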

Storing array of words from string in a list of full names in R

I have a list of 120777 records containing names of people. I want to store an array of name parts for each record in the dataset. I tried this in R:
my_list$name_parts <- strsplit(my_list$name, " ")
I get my_list$name_parts as a list of 120777 items. When I try querying the number of words in each name using length(my_list$name_parts), I get 120777 for all.

Let's use this simple example:
my_list <- list()
my_list$name <- c("toto t. tutu", "foo bar")
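length(my_list$name_parts) returns the number of elements in the list itself, which is why you get 120777. To get the number of words in each name, apply length to every element of the list: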
lapply(strsplit(my_list$name," "), length)
which gives in the simple example above:
[[1]]
[1] 3
[[2]]
[1] 2
To avoid getting a list, you can even do:
unlist(lapply(strsplit(my_list$name," "), length))
[1] 3 2
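Base R (3.2.0 and later) also provides lengths, which returns the length of every list element as an integer vector in one call. A minimal sketch on the same toy data; the name word_counts is made up:

```r
my_list <- list()
my_list$name <- c("toto t. tutu", "foo bar")
my_list$name_parts <- strsplit(my_list$name, " ")

# One count per name, returned as an integer vector
word_counts <- lengths(my_list$name_parts)

word_counts
# [1] 3 2
```

Names with repeated spaces would produce empty strings that count as words; splitting on " +" instead of " " avoids that.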
