Check for unique elements - r

Just a simple question.
I have a data frame (only one column is shown) that looks like:
cln1
A
b
A
A
c
d
A
....
I would like the following output:
cln1
b
c
d
In other words, I would like to remove every item that occurs more than once. The functions "unique" and "duplicated" both keep one copy of each replicated element in the output. I would like to drop such elements entirely.

You can use setdiff for that:
R> v <- c(1,1,2,2,3,4,5)
R> setdiff(v, v[duplicated(v)])
[1] 3 4 5
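As a sketch of the same idea on the character column from the question (the vector `cln1` below is retyped from the question and is only assumed to match the real data), a common base R alternative flags duplicates from both ends, so every copy of a repeated value is dropped. The same logical vector can also filter data frame rows:

```r
# Sample data retyped from the question (assumed to match the real column)
cln1 <- c("A", "b", "A", "A", "c", "d", "A")

# setdiff approach from the answer above
only_once_setdiff <- setdiff(cln1, cln1[duplicated(cln1)])

# Alternative: a value is "repeated" if it is a duplicate when scanning
# from the front OR from the back; keep everything that is not repeated
repeated <- duplicated(cln1) | duplicated(cln1, fromLast = TRUE)
only_once <- cln1[!repeated]
only_once
# [1] "b" "c" "d"

# The same logical vector can subset the rows of a data frame
df <- data.frame(cln1 = cln1)
df_once <- df[!repeated, , drop = FALSE]
```

Using `drop = FALSE` keeps the result a data frame even though only one column is selected.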

You could use count from the plyr package to count the occurrences of each item, and then drop every item that occurs more than once.
library(plyr)
l = c(1,2,3,3,4,5,6,6,7)
count_l = count(l)
x freq
1 1 1
2 2 1
3 3 2
4 4 1
5 5 1
6 6 2
7 7 1
l[!l %in% with(count_l, x[freq > 1])]
[1] 1 2 4 5 7
Note the !, which means NOT. You can of course put this in a one-liner:
l[!l %in% with(count(l), x[freq > 1])]

Another way using table:
With @juba's data:
as.numeric(names(which(table(v) == 1)))
# [1] 3 4 5
For OP's data, since it's a character output, as.numeric is not required.
names(which(table(v) == 1))
# [1] "b" "c" "d"
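Note that `names(table(...))` returns character strings in sorted order. If the original type and order matter, one possible variant (a sketch, not taken from the answers above) counts occurrences per position with `ave` and subsets the vector directly:

```r
v <- c(1, 1, 2, 2, 3, 4, 5)

# For every position, the number of times that value occurs in v
n_occurrences <- ave(seq_along(v), v, FUN = length)

# Keep only the values that occur exactly once; type and order are preserved
kept_num <- v[n_occurrences == 1]
kept_num
# [1] 3 4 5

# The same pattern works for a character vector
x <- c("A", "b", "A", "A", "c", "d", "A")
kept_chr <- x[ave(seq_along(x), x, FUN = length) == 1]
kept_chr
# [1] "b" "c" "d"
```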

Related

How to sort a vector and print Top X value when the value is flat?

R language: how do I sort a vector and print the top X values when some of the values are tied?
If I have a vector like
v <- c(1,2,3,3,4,5)
I want to print the top 1 to top 3 values.
So I use:
sort(v)[1:3]
[1] 1 2 3
In this case, the third-ranked value is tied: 3 appears twice.
What I want to print is:
[1] 1 2 3 3
and their indices.
One way to do it:
v[v %in% sort(v)[1:3]]
# [1] 1 2 3 3
# following up OP's comment, if you want ordered outcomes:
# sort(v[v %in% sort(v)[1:3]])
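The question also asks for the indices. A minimal sketch, assuming "top" means the smallest values as in the `sort(v)[1:3]` call above: take the third-smallest value as a cutoff and use `which` to get the positions.

```r
v <- c(1, 2, 3, 3, 4, 5)

# The third-smallest value acts as the cutoff; ties at the cutoff are kept
cutoff <- sort(v)[3]

top_values  <- v[v <= cutoff]
top_indices <- which(v <= cutoff)

top_values
# [1] 1 2 3 3
top_indices
# [1] 1 2 3 4
```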
We can use top_n from dplyr:
library(dplyr)
data.frame(v) %>% top_n(-3)
# v
#1 1
#2 2
#3 3
#4 3
This returns a data frame; if you want a vector, pull it:
data.frame(v) %>% top_n(-3) %>% pull(v)
#[1] 1 2 3 3

How to check if a list contains a certain element in R

I have the following list
A = list(c(1,2,3,4), c(5,6,7,8), c(4,6,2,3,1), c(6,2,1,7,12, 15, 16, 10))
A
[[1]]
[1] 1 2 3 4
[[2]]
[1] 5 6 7 8
[[3]]
[1] 4 6 2 3 1
[[4]]
[1] 6 2 1 7 12 15 16 10
I want to check whether the element 2 is present in each vector of the list. If it is, I need to assign 1 to the corresponding list element.
Thank you in advance.
@jasbner's comment can be further refined to
1 * sapply(A, `%in%`, x = 2)
# [1] 1 0 1 1
In this case sapply returns a logical vector, and then multiplication by 1 coerces TRUE to 1 and FALSE to 0. Also, as the syntax is x %in% table, we may avoid defining an anonymous function function(x) 2 %in% x and instead write as above. Lastly, using sapply rather than lapply returns a vector rather than a list, which seems to be what you are after.
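To make the equivalence concrete, here is a small sketch comparing the anonymous-function form with the shortcut. The `vapply` line is an addition not mentioned above: it behaves like `sapply` here but declares the expected result type up front.

```r
A <- list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(4, 6, 2, 3, 1),
          c(6, 2, 1, 7, 12, 15, 16, 10))

# Explicit anonymous function: is 2 among the elements of each vector?
flags_anon <- sapply(A, function(x) 2 %in% x)

# Shortcut: x = 2 is fixed, so each list element is passed as `table`
flags_short <- sapply(A, `%in%`, x = 2)

# vapply checks that every result is a single logical value
flags_safe <- vapply(A, function(x) 2 %in% x, logical(1))

# Coerce TRUE/FALSE to 1/0
as.integer(flags_safe)
# [1] 1 0 1 1
```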
Here is an option with tidyverse:
library(tidyverse)
map_lgl(A, `%in%`, x = 2) %>%
as.integer
#[1] 1 0 1 1
Here is a simple version with replacement! Note that it replaces each 2 inside the vectors with 1 rather than returning one flag per vector:
lapply(A, function(x) ifelse(x==2,1,x))

Change dataframe values R using different column name provided?

I have the following data frame:
Column1 Default_Val
1 A 2
2 B 2
3 C 2
4 D 2
5 E 2
...
colnames: "Column1" "Default_Val"
rownames: "1" "2" "3" "4" "5"
This data frame is part of my function and this function changes the default values according to some if's.
I want to generalize the assignment process because I want to support different column names of this data frame.
How can I change the default value without depending on the column names?
Here is what I did so far:
df[df$Column1 == "A", "Default_Val"]
[1] 2
df[df$Column1 == "A", "Default_Val"] = 1
df[df$Column1 == "A", "Default_Val"]
[1] 1
I want something generalized like:
t <- colnames(df)
df[t[1] == "A", t[2]] = 7
For some reason it doesn't work (each time this happens I love Python more :)).
Please advise.
I think it must be straightforward. Please check if this solves your problem.
> df
Column1 Default_val
1 A 1
2 B 3
3 A 4
4 C 1
5 D 4
> df[2][df[1] == 'A'] = 3
> df
Column1 Default_val
1 A 3
2 B 3
3 A 3
4 C 1
5 D 4
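The attempt in the question fails because `t[1] == "A"` compares the column *name* (the string "Column1") with "A", which is a single FALSE, so no rows are selected. A minimal sketch of a name-independent version, using a hypothetical variable `cols` for the column names and sample data modeled on the question:

```r
# Sample data modeled on the question (values are assumptions)
df <- data.frame(Column1 = c("A", "B", "C", "D", "E"),
                 Default_Val = 2)

cols <- colnames(df)

# df[[cols[1]]] extracts the first column's values, whatever it is called
rows <- df[[cols[1]]] == "A"

# Assign through the stored names instead of hard-coded ones
df[rows, cols[2]] <- 7
df
```

Positional indexing, as in the answer above, avoids the names altogether; going through `colnames` is useful when the column order is not guaranteed but the names are passed in.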

How to subset a list of vectors based on a vector of indexes?

I would like to use purrr to subset the elements from this list
u <- list(a=1:10, b=1:10)
using maxCol as the highest bound for a vector of indexes starting from 1. For example, suppose that
maxCol <- c(6L, 3L)
Then the output should look like
$a
[1] 1 2 3 4 5 6
$b
[1] 1 2 3
In fact, I want to keep all the values from position 1 to position 6 from a, and from position 1 to 3 from b.
I know how to do it with a loop, but I would like to use purrr. For example, I thought something like this would work, but it didn't:
map2(u, maxRow, u[1:maxCol])
We can use Map from base R:
Map(head, u, n = maxCol)
This worked for me:
map2(u,maxCol,head)
Output
$a
[1] 1 2 3 4 5 6
$b
[1] 1 2 3
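The attempt in the question fails because the third argument of `map2` must be a function (or formula) applied to each pair of elements, not an already evaluated expression such as `u[1:maxCol]`. A base R sketch that spells out the per-element subsetting, equivalent in result to the `head` versions above:

```r
u <- list(a = 1:10, b = 1:10)
maxCol <- c(6L, 3L)

# For each vector x and its bound n, keep positions 1..n
res <- Map(function(x, n) x[seq_len(n)], u, maxCol)
res
# $a
# [1] 1 2 3 4 5 6
#
# $b
# [1] 1 2 3
```

`seq_len(n)` is used instead of `1:n` so that a bound of 0 yields an empty vector rather than positions 1 and 0.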

Create a vector listing run length of original vector with same length as original vector

This problem seems trivial, but I'm at my wits' end after hours of reading.
I need to generate a vector of the same length as the input vector that lists for each value of the input vector the total count for that value. So, by way of example, I would want to generate the last column of this dataframe:
> df
customer.id transaction.count total.transactions
1 1 1 4
2 1 2 4
3 1 3 4
4 1 4 4
5 2 1 2
6 2 2 2
7 3 1 3
8 3 2 3
9 3 3 3
10 4 1 1
I realise this could be done two ways, either by using run lengths of the first column, or grouping the second column using the first and applying a maximum.
I've tried both tapply:
> tapply(df$transaction.count, df$customer.id, max)
And rle:
> rle(df$customer.id)
But both return a vector of shorter length than the original:
[1] 4 2 3 1
Any help gratefully accepted!
You can do it without creating a transaction counter, using ave:
df$total.transactions <- with( df,
ave( transaction.count , customer.id , FUN=length) )
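Since `FUN = length` only counts the elements in each group, the first argument of `ave` just needs to have the right length. A sketch, with sample data rebuilt from the table in the question, that uses `customer.id` alone and therefore does not need the `transaction.count` column at all:

```r
# Customer ids as in the question's table
df <- data.frame(customer.id = rep(1:4, c(4, 2, 3, 1)))

# Group customer.id by itself and return the group size at every position
df$total.transactions <- ave(df$customer.id, df$customer.id, FUN = length)
df$total.transactions
# [1] 4 4 4 4 2 2 3 3 3 1
```

Unlike `rle`, `ave` groups by value rather than by consecutive runs, so the rows do not need to be sorted by customer.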
You can use rle with rep to get what you want:
> x <- rep(1:4, 4:1)
> x
[1] 1 1 1 1 2 2 2 3 3 4
> rep(rle(x)$lengths, rle(x)$lengths)
[1] 4 4 4 4 3 3 3 2 2 1
For performance purposes, you could store the rle object separately so it is only called once.
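Storing the result could look like this (a minimal sketch; `runs` is just an illustrative variable name):

```r
x <- rep(1:4, 4:1)

# Compute the run-length encoding once and reuse it
runs <- rle(x)
run_totals <- rep(runs$lengths, runs$lengths)
run_totals
# [1] 4 4 4 4 3 3 3 2 2 1
```

Keep in mind that `rle` counts consecutive runs, so this gives per-value totals only when equal values are adjacent, for example after sorting by customer id.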
Or as Karsten suggested with ddply from plyr:
require(plyr)
#Expects data.frame
dat <- data.frame(x = rep(1:4, 4:1))
ddply(dat, "x", transform, total = length(x))
You are probably looking for a split-apply-combine approach; have a look at ddply in the plyr package or the split function in base R.
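A base R sketch of split-apply-combine for this problem, assuming the customer ids from the question's table: `split` breaks the vector into groups, `lapply` replaces each group with its size repeated, and `unsplit` puts the pieces back in the original order.

```r
customer.id <- c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4)

# Split: one vector per customer
groups <- split(customer.id, customer.id)

# Apply: replace each group by its size, repeated once per member
sizes <- lapply(groups, function(g) rep(length(g), length(g)))

# Combine: reassemble in the original row order
total.transactions <- unsplit(sizes, customer.id)
total.transactions
# [1] 4 4 4 4 2 2 3 3 3 1
```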
