I have two character objects. I need to see how many characters they have in common and then print them. Counting the common ones is no problem, but I can't figure out the code to print them. Here's a simple example:
LETTERS
list <- c("A", "H", "J", "K")
length(na.exclude(pmatch(LETTERS[1:20],list[1:3])))
print(pmatch(LETTERS[1:20],list[1:3]))
This prints:
LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
list <- c("A", "H", "J", "K")
length(na.exclude(pmatch(LETTERS[1:20],list[1:3])))
[1] 3
print(pmatch(LETTERS[1:20],list[1:3]))
[1] 1 NA NA NA NA NA NA 2 NA 3 NA NA NA NA NA NA NA NA NA NA
So I know that there are 3 in common and I know their positions but how do I make it print "A" "H" "J"?
Try using %in%
> LETTERS[LETTERS %in% list]
[1] "A" "H" "J" "K"
For your example:
myletters<-LETTERS[1:20]
> myletters[myletters %in% list[1:3]]
[1] "A" "H" "J"
Alternatively, use pmatch as you suggested:
pmatch(list[1:3],myletters) # gives the indices
[1] 1 8 10
myletters[pmatch(list[1:3],myletters)] # get the letters
[1] "A" "H" "J"
If you want only the final result as a set (duplicates removed), use this:
intersect(LETTERS, c("A", "H", "J"))
If you want to use partial matching, note that by default (duplicates.ok = FALSE) pmatch does not let more than one element of the first argument match the same element of the second. Notice the difference:
mylist <- c("B","A","B","2")
> pmatch(mylist, LETTERS)
[1] 2 1 NA NA
> Vectorize(pmatch, "x")(mylist, LETTERS)
B A B 2
2 1 2 NA
Now, if you want to print the elements of mylist that match (partially) with the elements of, say, LETTERS, keeping the order and duplicates, you can use this:
> mylist[!is.na(Vectorize(pmatch, "x")(mylist, LETTERS))]
[1] "B" "A" "B"
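As a possible simplification of the Vectorize call above, pmatch has a duplicates.ok argument that allows several inputs to match the same table entry. The sketch below reuses mylist from the answer; the variable name idx is only illustrative.

```r
mylist <- c("B", "A", "B", "2")

# duplicates.ok = TRUE lets both "B" entries match LETTERS[2]
idx <- pmatch(mylist, LETTERS, duplicates.ok = TRUE)
idx

# Keep the elements of mylist that matched, preserving order and duplicates
matched <- mylist[!is.na(idx)]
matched
```

This should give the same elements as the Vectorize version, without the names attached to the index vector.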
So I am analyzing tweets from different accounts using get_timeline from rtweet. It returns a df with 90 variables, which is great. However, one of them, the variable hashtags, gives me either NA (no hashtags used in the tweet), a single hashtag, or a list of all the hashtags. I want to create a separate variable for each of the hashtags so that I can save the tweets to a CSV, load it into Power BI, and make some graphs.
Therefore, my question is: can you split all the elements of the list into different variables containing a single word each?
As I understand your problem, you do not need to split the list to get all the unique list entries; a combination of unlist and unique will do.
Let's assume you have a list of hashtags (just letters in the example) with elements of different lengths, l_hashtags.
Some hashtags are repeated.
Unlisting the list gives you a vector with all hashtags, including the repetitions.
Applying unique to the unlisted l_hashtags gives you the unique members of the original list.
l_hashtags <- list(c(LETTERS[1:2]), rep(NA,5), LETTERS[5:15], c('A', 'N', 'N', 'J', 'K'))
l_hashtags
#> [[1]]
#> [1] "A" "B"
#>
#> [[2]]
#> [1] NA NA NA NA NA
#>
#> [[3]]
#> [1] "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
#>
#> [[4]]
#> [1] "A" "N" "N" "J" "K"
table(unlist(l_hashtags))
#>
#> A B E F G H I J K L M N O
#> 2 1 1 1 1 1 1 2 2 1 1 3 1
l_hashtags_unlisted <- unlist(l_hashtags)
unique(l_hashtags_unlisted)
#> [1] "A" "B" NA "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
You can of course put all this into one single line:
unique(unlist(l_hashtags))
# [1] "A" "B" NA "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
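If the goal is one column per hashtag for the CSV export, as the question asks, one possible approach is to pad every list element with NA up to the longest one and bind the results as rows. This is only a sketch: the small list is made-up data, and the names n_max, padded, wide and the hashtag_ column prefix are illustrative assumptions, not part of rtweet.

```r
# Made-up stand-in for the hashtags column: one list element per tweet
l_hashtags <- list(c("A", "B"), NA, c("E", "F", "G"))

# Length of the longest hashtag vector
n_max <- max(lengths(l_hashtags))

# Extending a vector with `length<-` pads it with NA
padded <- lapply(l_hashtags, function(h) {
  length(h) <- n_max
  h
})

# One row per tweet, one column per hashtag position
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("hashtag_", seq_len(n_max))
wide
```

The resulting data frame can be combined with the other tweet columns (for example with cbind) before writing the CSV.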
My data structure is as follows:
m
[[1]]
[[1]][[1]]
[1] "g" "g" "h" "k" "k" "k" "l"
[[2]]
[[2]][[1]]
[1] "g" "h" "k" "k" "k" "l" "g"
[[3]]
[[3]][[1]]
[1] "g" "h" "h" "h" "k" "l" "h"
I want to find the positions of every unique character simultaneously. Individually, I can obtain the positions of each character using the following code:
t <- list()
for (i in 1:length(m)) {
  for (j in m[[i]][[1]]) {
    if (j == "k") {
      t[[i]] <- grep(j, m[[i]][[1]], fixed = TRUE)
    }
  }
}
The result I obtain is as follows:
t
[[1]]
[1] 4 5 6
[[2]]
[1] 3 4 5
[[3]]
[1] 5
There are 4 unique characters in list m, so my code gives me 4 lists with the positions of each unique character, but I have to manually enter each character into the loop. I need a single piece of code that calculates the positions for all the unique characters simultaneously.
The vectors in your list seem all to be of equal length (if not this could be adjusted by making them equal length). I would first restructure the data:
m <- list(list(c("g", "g", "h", "k", "k", "k", "l")),
list(c("g", "h", "k", "k", "k", "l", "g")))
m <- do.call(rbind, lapply(m, "[[", 1))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] "g" "g" "h" "k" "k" "k" "l"
#[2,] "g" "h" "k" "k" "k" "l" "g"
Then you can use outer to make all comparisons at once:
res <- which(outer(m, unique(c(m)), "=="), arr.ind = TRUE)
res <- as.data.frame(res)
res$dim3 <- factor(res$dim3, labels = unique(c(m)))
names(res) <- c("list_element", "vector_element", "letter")
#check
res[res$letter == "k" & res$list_element == 1, "vector_element"]
[1] 4 5 6
This solution won't work well if your data is huge, because outer builds a logical array with one cell per element of m for every unique value.
We can use a nested lapply, checking the positions with which for each unique value in m:
lst <- lapply(unique(unlist(m)), function(x)
lapply(m, function(y) which(x == y[[1]])))
lst
#[[1]]
#[[1]][[1]]
#[1] 1 2
#[[1]][[2]]
#[1] 1 7
#[[1]][[3]]
#[1] 1
#[[2]]
#[[2]][[1]]
#[1] 3
#[[2]][[2]]
#[1] 2
#[[2]][[3]]
#[1] 2 3 4 7
......
To identify which values represent which character we can name the list
names(lst) <- unique(unlist(m))
lst
#$g
#$g[[1]]
#[1] 1 2
#$g[[2]]
#[1] 1 7
#$g[[3]]
#[1] 1
...
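Another way to get the same information, grouped per vector rather than per character, is to split the positions of each vector by its values. This is a sketch using the three vectors from the question; the name pos is illustrative.

```r
m <- list(list(c("g", "g", "h", "k", "k", "k", "l")),
          list(c("g", "h", "k", "k", "k", "l", "g")),
          list(c("g", "h", "h", "h", "k", "l", "h")))

# For each vector, group its positions (1, 2, ...) by the character found there
pos <- lapply(m, function(y) split(seq_along(y[[1]]), y[[1]]))

pos[[1]]$k  # positions of "k" in the first vector
pos[[3]]$h  # positions of "h" in the third vector
```

A character that does not occur in a given vector simply has no entry in that vector's list, so pos[[1]]$z would be NULL.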
I'm working with a matrix that looks like this input.
I'm trying to replace numbers on column 2 by their corresponding row name. I.e. all 1s would be replaced by row.name(matrix). Thus, I'd have the following output.
The actual matrix is too large to loop over... I'm sorry for using images; I found it easier to represent this in Excel. I'm also sorry about being quite new to R...
Vectorized approach (should be the fastest you can get):
mat <- matrix(c(letters[1:11], 1,1,1,2,2,3,3,3,4,4,4), ncol = 2)
colnames(mat) <- c("A", "B")
rownames(mat) <- 1:11
> mat
A B
1 "a" "1"
2 "b" "1"
3 "c" "1"
4 "d" "2"
5 "e" "2"
6 "f" "3"
7 "g" "3"
8 "h" "3"
9 "i" "4"
10 "j" "4"
11 "k" "4"
mat[, "B"] <- mat[as.numeric(mat[, "B"]), "A"]
> mat
A B
1 "a" "a"
2 "b" "a"
3 "c" "a"
4 "d" "b"
5 "e" "b"
6 "f" "c"
7 "g" "c"
8 "h" "c"
9 "i" "d"
10 "j" "d"
11 "k" "d"
Or you could use sapply:
mat[, "B"] <- sapply(mat[, "B"], function(x) mat[as.numeric(x), "A"])
Edit: I've put the vectorized solution at the top, as this is clearly the faster (or even fastest?) approach.
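The vectorized line above relies on the row names being 1, 2, 3, ... so that they coincide with row positions. If that does not hold, a possible variant is to index by the row names themselves. The matrix below is hypothetical data made up for this sketch.

```r
# Hypothetical matrix whose row names are not the row positions
mat2 <- matrix(c("a", "b", "c", "10", "10", "30"), ncol = 2,
               dimnames = list(c("10", "20", "30"), c("A", "B")))

# Column B holds row names as strings, so use it directly as a character index
mat2[, "B"] <- mat2[mat2[, "B"], "A"]
mat2
```

Character indexing looks rows up by name, so no conversion with as.numeric is needed in this case.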
In RStudio, I want to create a vector of country names. They are in column 1 of my data set. countryvec gives the factor names
"Australia Australia ..."
x just gives the names of Russia, country 36, country ends up being
1,1,...,2,2,...,4,4.. etc.
They are also not in order, 3 ends up between 42 and 43. How do I make the numbers the factors?
gdppc=read.xlsx("H:/dissertation/ALL/YAS.xlsx",sheetIndex = 1,startRow = 1)
countryvec=gdppc[,1]
country=c()
for (j in 1:43){
x=rep(countryvec[j],25)
country=append(country,x)
}
You need to retrieve the levels attribute
set.seed(7)
v <- factor(letters[rbinom(20, 10, .5)])
> as.integer(v) # the integer codes; c(v) returns a factor in R >= 4.1
[1] 6 4 2 2 3 5 3 6 2 4 2 3 5 2 4 2 4 1 6 3
> levels(v)[v]
[1] "h" "e" "c" "c" "d" "f" "d" "h" "c" "e" "c" "d" "f" "c" "e" "c" "e" "a" "h" "d"
You'll probably need to modify the code inside the loop to:
x <- rep(levels(countryvec)[countryvec][j], 25)
Or convert the vector prior to the loop:
countryvec <- levels(countryvec)[countryvec]
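The difference between a factor's codes and its labels can be sketched as follows. The country names are made up for illustration, and the rep(..., each = 25) line is a suggested replacement for the loop, assuming each country should be repeated 25 times in order.

```r
# Made-up factor standing in for countryvec
v <- factor(c("Russia", "Australia", "Russia", "Brazil"))

as.integer(v)    # internal codes, which is what the loop was collecting
as.character(v)  # the labels, equivalent to levels(v)[v]

# Repeat each label 25 times without a loop
country <- rep(as.character(v), each = 25)
head(country, 3)
```

as.character(v) and levels(v)[v] return the same character vector, so either can be used before the loop.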
The match(x, y) function is perfect to search the elements of the vector x within the elements of vector y. But what is an efficient and easy way to do the similar job when y is a list of vectors - of possibly different lengths?
I mean the result should be a vector of the same length as x, and its i-th element should be the index of the first member of y that contains the i-th element of x, or NA.
To find the element of y in which each element of x (first) occurs, try this:
## First, a reproducible example
set.seed(44)
x <- letters[1:25]
y <- replicate(4, list(sample(letters, 8)))
y
# [[1]]
# [1] "t" "h" "m" "n" "a" "d" "i" "b"
#
# [[2]]
# [1] "c" "l" "z" "a" "s" "d" "i" "u"
#
# [[3]]
# [1] "b" "k" "e" "g" "o" "i" "h" "j"
#
# [[4]]
# [1] "g" "i" "f" "r" "h" "w" "l" "o"
## Find the element of y first containing each of the letters a-y
breaks <- c(0, cumsum(sapply(y, length))) + 1
findInterval(match(x, unlist(y)), breaks)
# [1] 1 1 2 1 3 4 3 1 1 3 3 2 1 1 3 NA NA 4 2 1 2 NA 4 NA NA
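The same lookup can also be sketched without findInterval, by recording which list element each position of unlist(y) came from. The data here is a small made-up example, and owner and res are illustrative names.

```r
x <- c("a", "c", "z", "i")
y <- list(c("t", "h", "a"), c("c", "a", "i"), c("i", "b"))

# For every position in unlist(y), the index of the list element it belongs to
owner <- rep(seq_along(y), lengths(y))

# match() finds the first occurrence, so this is the first element of y containing each x
res <- owner[match(x, unlist(y))]
res
```

Elements of x that occur nowhere in y yield NA, because indexing owner with NA returns NA.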