R - Finding duplicates in list entries


I am trying to figure out how to get duplicates out of list objects in R.
So my example list:
examplelist <- list(a = c("blue", "red", "yellow"),
b = c("red", "black", "green"),
c = c("black", "green", "brown"))
What I would like to get as a result:
duplicates: c("red", "black", "green")
vector of all entries, without duplicate entries: c("blue", "red", "yellow", "black", "green", "brown")
I was not able to find a function for this other than duplicated(), which only checks the list objects as a whole, not the individual entries.
Thank you for your help :)

You can unlist first:
unlisted <- unlist(examplelist)
unlisted[duplicated(unlisted)]
# b1 c1 c2
# "red" "black" "green"
unlisted[!duplicated(unlisted)]
# a1 a2 a3 b2 b3 c3
# "blue" "red" "yellow" "black" "green" "brown"
If you only want the vector (without the names), use unname:
unlisted <- unname(unlist(examplelist))
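duplicated() flags every repeat after the first occurrence. So a value present in all three elements would be reported twice, and a value repeated inside a single element would also count. Here is a minimal sketch, assuming you want the values shared by more than one list element, each reported once. It deduplicates each element first and then flags repeats across elements:

```r
examplelist <- list(a = c("blue", "red", "yellow"),
                    b = c("red", "black", "green"),
                    c = c("black", "green", "brown"))

# Deduplicate within each element so internal repeats are not counted
flat <- unlist(lapply(examplelist, unique), use.names = FALSE)

# Values occurring in more than one element, each reported once, in first-seen order
dups <- unique(flat[duplicated(flat)])
dups
# [1] "red"   "black" "green"

# All distinct values, in first-seen order
all_values <- unique(flat)
```

unique(flat) is equivalent to flat[!duplicated(flat)] and already returns an unnamed vector.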

Related

Delete duplicate word, comma and whitespace

How can I delete all duplicate words, together with the following comma and whitespace, using a regex in R?
So far I have come up with the following regular expression, which matches the duplicate but not the comma and whitespace:
(\b\w+\b)(?=[\S\s]*\b\1\b)
An example list would be:
blue, red, blue, yellow, green, blue
The output should look like:
blue, red, yellow, green
So in this case it would have to match two of the three "blue" entries, as well as the comma and whitespace following each of them (if there is any).

It depends on whether your data is truly a list/vector or a single comma-separated string.
# your data is actually already a list/vector
v <- c("blue", "red", "blue", "yellow", "green", "blue")
unique(v)
# [1] "blue" "red" "yellow" "green"
# if your data is actually a comma-separated string
s <- "blue, red, blue, yellow, green, blue"
# if output needs to be a vector
unique(strsplit(s, ", ")[[1]])
# [1] "blue" "red" "yellow" "green"
# if output needs to be a string again
paste(unique(strsplit(s, ", ")[[1]]), collapse = ", ")
# [1] "blue, red, yellow, green"
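strsplit(s, ", ") assumes that every separator is exactly a comma followed by one space. Here is a minimal sketch for irregular spacing: it splits on the comma alone and strips the surrounding spaces with trimws(). dedupe_csv is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: deduplicate one comma-separated string, tolerating irregular spacing
dedupe_csv <- function(s) {
  parts <- trimws(strsplit(s, ",")[[1]])  # split on commas, then strip surrounding spaces
  paste(unique(parts), collapse = ", ")
}

dedupe_csv("blue, red, blue, yellow, green, blue")
# [1] "blue, red, yellow, green"
dedupe_csv("blue,red , blue,yellow")
# [1] "blue, red, yellow"

# Several strings at once, e.g. a character column
vapply(c("a, b, a", "c,c, d"), dedupe_csv, character(1), USE.NAMES = FALSE)
```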
Example based on a list column in a data.table or data.frame:
library(data.table)
dt <- data.table(
id = 1:5,
colors = list(
c("blue", "red", "blue", "yellow", "green", "blue"),
c("blue", "blue", "yellow", "green", "blue"),
c("blue", "red", "blue", "yellow"),
c("red", "red", "yellow", "yellow", "green", "blue"),
c("black")
)
)
## using data.table
# setDT() converts a data.frame to a data.table in place; dt already is one, so this is only needed for data.frame input
setDT(dt)
# use colors instead of clean_list to just fix the existing column
dt[, clean_list := lapply(colors, unique)]
## using dplyr
library(dplyr)
# use colors instead of clean_list to just fix the existing column
dt %>% mutate(clean_list = lapply(colors, unique))
dt
# id colors clean_list
# 1: 1 blue,red,blue,yellow,green,blue blue,red,yellow,green
# 2: 2 blue,blue,yellow,green,blue blue,yellow,green
# 3: 3 blue,red,blue,yellow blue,red,yellow
# 4: 4 red,red,yellow,yellow,green,blue red,yellow,green,blue
# 5: 5 black black
# or just simply in base
dt$colors <- lapply(dt$colors, unique)
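For a plain data.frame, passing list(...) straight to data.frame() does not produce a list column, because data.frame() treats a list argument as a set of columns. The sketch below shows one way around this: create the data.frame first and assign the list column with $<-. Wrapping the list in I() inside data.frame() is another option. The example data is made up.

```r
# Build the data.frame first, then assign the list column with $<-
df <- data.frame(id = 1:3)
df$colors <- list(
  c("blue", "red", "blue", "yellow"),
  c("green", "green"),
  "black"
)

# unique() is applied to each list element separately
df$clean_list <- lapply(df$colors, unique)
```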

We could use paste with unique and collapse:
paste(unique(string), collapse = ", ")
# [1] "blue, red, yellow, green"
data:
string <- c("blue", "red", "blue", "yellow", "green", "blue")

Change vertex color when plotting community object in R igraph

I want to run a community detection algorithm on a graph g to get a community object cd, and then use plot(cd, g) to make a graph where the communities are contained in translucent blobs. However, I also want to colour the vertices, and it seems as if plot overrides the vertex colouring I assign via V(g)$color.
Here's an example to show what I mean:
library(igraph)
v1 <- c(1,1,2,3,4,4,6,7,8,9)
v2 <- c(2,3,3,4,5,6,5,8,9,7)
graph <- data.frame(v1,v2)
g <- graph.data.frame(graph, directed=FALSE)
cd <- fastgreedy.community(g)
vcolor <- c("white", "white", "white",
"blue", "blue", "blue",
"red", "red", "red")
vertex_attr(g)$color <- vcolor
plot(g)
plot(cd,g)
When you plot(g), the vertices are red, white and blue. However, when you plot(cd,g) they are blue, green and orange.
I want to keep the translucent blobs, but force my own coloring. The reason is I want to compare community membership (blobs) to another classification (vertex colors).
I did not have the same problem when I changed vertex labels. Also I should note that this did not work:
plot(cd,g,vertex.color=vcolor)

I appreciate this is an old question, but I have been struggling with it too and think I have found a simple answer, in case it's helpful. Using your example:
library(igraph)
v1 <- c(1,1,2,3,4,4,6,7,8,9)
v2 <- c(2,3,3,4,5,6,5,8,9,7)
graph <- data.frame(v1,v2)
g <- graph.data.frame(graph, directed=FALSE)
cd <- fastgreedy.community(g)
vcolor <- c("white", "white", "white",
"blue", "blue", "blue",
"red", "red", "red")
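# plot() on a communities object colours vertices through its own col argument
# (by default, community membership), so pass the colours there
plot(cd, g, col = vcolor)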
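vcolor is applied by position in V(g) order. graph.data.frame() / graph_from_data_frame() orders vertices by first appearance in the edge list, which here is 1, 2, 3, 4, 6, 7, 8, 9, 5 rather than 1 to 9. The sketch below looks colours up by vertex name instead. It uses the current igraph names graph_from_data_frame() and cluster_fast_greedy() in place of the deprecated ones. The classification itself (1-3 white, 4-6 blue, 7-9 red) is a hypothetical assumption about what the colours are meant to encode.

```r
library(igraph)

# Same edge list as in the question
edges <- data.frame(v1 = c(1, 1, 2, 3, 4, 4, 6, 7, 8, 9),
                    v2 = c(2, 3, 3, 4, 5, 6, 5, 8, 9, 7))
g <- graph_from_data_frame(edges, directed = FALSE)
cd <- cluster_fast_greedy(g)

# Hypothetical classification keyed by vertex name (assumed: 1-3 white, 4-6 blue, 7-9 red)
classification <- c("1" = "white", "2" = "white", "3" = "white",
                    "4" = "blue",  "5" = "blue",  "6" = "blue",
                    "7" = "red",   "8" = "red",   "9" = "red")

# Look colours up by name so they follow V(g) order instead of assuming 1 to 9
vcolor <- unname(classification[V(g)$name])

plot(cd, g, col = vcolor)
```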

create a vector in R using variable names

I have a variable called school_name
I am creating a vector to define colors that I will use later in ggplot2.
colors <- c("School1" = "yellow", "School2" = "red", ______ = "Orange")
In my code I use the variable school_name for some logic, and I want to use it as the name of the third element of my vector. Its value changes in my for loop, so it cannot be hard-coded.
I have tried the following but it does not work.
colors <- c("School1" = "yellow", "School2" = "red", get("school_name") = "Orange")
Please can someone help me with this

You can use structure:
school_name = "coolSchool"
colors <- structure(c("yellow", "red", "orange"), .Names = c("School1","School2", school_name))

You can just set the names of the colors using names():
colors <- c("yellow", "red", "orange")
names(colors) <- c("School1", "School2", school_name)
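Base R's setNames() does the same thing in a single call. That can be handy when the named vector is built inline, for example directly as a function argument. The value of school_name here is only a stand-in.

```r
school_name <- "coolSchool"  # stand-in for the value that changes in the loop

# setNames(object, nm) returns object with its names set to nm
colors <- setNames(c("yellow", "red", "orange"),
                   c("School1", "School2", school_name))
```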

This also works:
school_name <- "school3"
colors <- c("School1" = "yellow", "School2" = "red")
colors[school_name] <- "Orange"
# School1 School2 school3
# "yellow" "red" "Orange"
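Because school_name changes inside a for loop, the sketch below rebuilds the vector on each iteration with the assignment approach above. The school names are hypothetical. Each palette is stored in a list only so the results can be inspected afterwards.

```r
base_colors <- c("School1" = "yellow", "School2" = "red")
schools <- c("schoolA", "schoolB")  # hypothetical values taken by school_name

palettes <- list()
for (school_name in schools) {
  colors <- base_colors
  colors[school_name] <- "Orange"    # adds a new element named after the current school
  palettes[[school_name]] <- colors  # kept only to inspect the results afterwards
}
```

A named vector like this can be passed as values = to ggplot2's scale_fill_manual() or scale_colour_manual(), which then match colours to the data by name rather than by position.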

changing the position of headings

I am trying to draw pie charts in R with the following code. The headings are far from the pie charts, and I would like the pie charts to sit just below the headings. How can I do that?
x <- c(632,20,491,991,20)
y <- c(37376,41770,5210,5005,3947)
names <- c("alpha","beta","gamma","delta","omega")
par(mfrow=c(1,2))
pie(x, names, col = c("red", "yellow", "blue", "green", "cyan"), main="PIE CHART 1")
pie(y, names,col = c("red", "yellow", "blue", "green", "cyan"), main="PIE CHART 2")

x <- c(632,20,491,991,20)
y <- c(37376,41770,5210,5005,3947)
names <- c("alpha","beta","gamma","delta","omega")
par(fig=c(0,0.5,0,1))
pie(x, names, col = c("red", "yellow", "blue", "green", "cyan"))
title("CHART 1", line=-3)
par(fig=c(0.5,1,0,1),new=TRUE)
pie(y, names,col = c("red", "yellow", "blue", "green", "cyan"))
title("CHART 2", line=-3)
Alterations:
par(fig = c(x1, x2, y1, y2)): specifies the portion of the device (in normalized 0 to 1 coordinates) that each plot occupies; as written, each pie chart takes up half of the plot window.
par(new = TRUE): tells R not to clear the device before the next plot, so the second chart is drawn "overlaid" alongside the first.
title(): called separately from pie(); line = x sets how many margin lines out from the plot region the title sits (negative values move it inwards, toward the chart). Try different values until you get what you want.
As an alternative, you can also keep using mfrow:
par(mfrow=c(1,2))
pie(x, names, col = c("red", "yellow", "blue", "green", "cyan"))
title("PIE CHART 1", line=-1)
pie(y, names, col = c("red", "yellow", "blue", "green", "cyan"))
title("PIE CHART 2", line=-1)

Find Max Color & Count

I have a matrix in the following format:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] "blue" "red" "blue" "blue" "blue" "red" "green" "blue" "blue"
[2,] "green" "red" "blue" "blue" "blue" "red" "green" "blue" "blue"
[3,] "yellow" "red" "blue" "blue" "blue" "red" "green" "blue" "blue"
[4,] "red" "red" "blue" "blue" "blue" "red" "green" "blue" "blue"
[5,] "blue" "red" "green" "blue" "blue" "red" "green" "blue" "blue"
[6,] "green" "red" "green" "blue" "blue" "red" "green" "blue" "blue"
...
How do I quickly calculate the most frequent color and its count for each row?
For instance, for row 1 it would be "blue, 6". I am currently doing this with an apply() command that calls table().
However, my matrix has 1.9 million rows so it takes too long. How can I vectorize this?

How many different possible values can each cell of the matrix take? Is it just like in your example? If so, something like the following may be faster:
dat <- structure(c("blue", "green", "yellow", "red", "blue", "green",
"red", "red", "red", "red", "red", "red", "red", "red", "blue",
"blue", "blue", "blue", "green", "green", "red", "blue", "blue",
"blue", "blue", "blue", "blue", "red", "blue", "blue", "blue",
"blue", "blue", "blue", "blue", "red", "red", "red", "red", "red",
"red", "blue", "green", "green", "green", "green", "green", "green",
"blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue",
"blue", "blue", "blue", "blue", "blue", "blue", "green"), .Dim = c(7L,
9L))
values <- c("blue", "red", "green", "yellow")
counts <- vapply(values, function(value) rowSums(dat == value),
numeric(nrow(dat))) # Thanks to @RichardScriven for the improvement :)
counts
# blue red green yellow
# [1,] 6 2 1 0
# [2,] 5 2 2 0
# [3,] 5 2 1 1
# [4,] 5 3 1 0
# [5,] 5 2 2 0
# [6,] 4 2 3 0
# [7,] 4 4 1 0
max.value.col <- max.col(counts, ties.method = "first") # the default, "random", breaks ties randomly
max.value <- colnames(counts)[max.value.col]
max.counts <- counts[cbind(1:nrow(counts), max.value.col)]
paste(max.value, max.counts, sep = ", ")
# [1] "blue, 6" "blue, 5" "blue, 5" "blue, 5" "blue, 5" "blue, 4"
If you want the names of all tied colors when there is a tie, the following would work, but it may take a while (I'm not sure how apply performs in this case):
max.value.all.cols <- counts == max.counts
paste(
apply(max.value.all.cols, 1, function(r) paste(colnames(counts)[r], collapse = ", ")),
max.counts, sep = ", ")
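If the set of colours is not known in advance, the same rowSums() idea can collect it from the data with unique(). The sketch below assumes the number of distinct colours stays small, since each colour costs one full pass over the matrix. It uses the 6-row matrix from the question.

```r
# The example matrix from the question (6 rows x 9 columns)
mat <- matrix(c("blue","red","blue","blue","blue","red","green","blue","blue",
                "green","red","blue","blue","blue","red","green","blue","blue",
                "yellow","red","blue","blue","blue","red","green","blue","blue",
                "red","red","blue","blue","blue","red","green","blue","blue",
                "blue","red","green","blue","blue","red","green","blue","blue",
                "green","red","green","blue","blue","red","green","blue","blue"),
              ncol = 9, byrow = TRUE)

# Distinct colours taken from the data instead of being listed by hand
values <- sort(unique(as.vector(mat)))

# One rowSums() pass per colour: counts[i, v] = occurrences of colour v in row i
counts <- vapply(values, function(v) rowSums(mat == v), numeric(nrow(mat)))

# Most frequent colour per row; ties.method = "first" keeps the result deterministic
best <- max.col(counts, ties.method = "first")
max_count <- counts[cbind(seq_len(nrow(counts)), best)]
result <- paste(values[best], max_count, sep = ", ")
result
# [1] "blue, 6" "blue, 5" "blue, 5" "blue, 5" "blue, 5" "blue, 4"
```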

Here's an actual data.table solution, I think. It leverages data.table's fast .N to count value frequencies in each row:
library(data.table)
flip <- data.table(t(mat))
tally <- lapply(names(flip),
function(x) {
setnames(flip[, .N, by=eval(x)][order(-N)][1,],
c('clr', 'N')) } )
do.call(rbind, tally)
# clr N
# 1: blue 6
# 2: blue 5
# 3: blue 5
# 4: blue 5
# 5: blue 5
# 6: blue 4
I take the matrix and transpose it, then do counts by each column (i.e. by each row of the original matrix). The setnames bit is required so that we can conveniently collapse the results together, but if you are happy to get the results in list form it's not required.
I used the same data as others:
mat <-
matrix(c( "blue","red","blue","blue","blue","red","green","blue","blue",
"green","red","blue","blue","blue","red","green","blue","blue",
"yellow","red","blue","blue","blue","red","green","blue","blue",
"red","red","blue","blue","blue","red","green","blue","blue",
"blue","red","green","blue","blue","red","green","blue","blue",
"green","red","green","blue","blue","red","green","blue","blue"),
ncol = 9, byrow = TRUE)
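The approach above still runs one data.table query per original row, which adds up over 1.9 million rows. The sketch below swaps in a different technique. It reshapes the matrix to long format (row index, colour) and counts every row in a single grouped .N call. Then it keeps the top colour per row after sorting. It has not been benchmarked. The column names rid and clr are arbitrary.

```r
library(data.table)

mat <- matrix(c("blue","red","blue","blue","blue","red","green","blue","blue",
                "green","red","blue","blue","blue","red","green","blue","blue",
                "yellow","red","blue","blue","blue","red","green","blue","blue",
                "red","red","blue","blue","blue","red","green","blue","blue",
                "blue","red","green","blue","blue","red","green","blue","blue",
                "green","red","green","blue","blue","red","green","blue","blue"),
              ncol = 9, byrow = TRUE)

# Long format: one line per cell, with its original row index
long <- data.table(rid = as.vector(row(mat)), clr = as.vector(mat))

# Count each colour within each row in one grouped operation
cnt <- long[, .N, by = .(rid, clr)]

# Sort by row, then by descending count; the first line per row is its top colour
setorder(cnt, rid, -N)
res <- cnt[!duplicated(rid)]
res
```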
