How to create a vector with factor level frequency? - r

I have a factor F. I need to create a vector V of the same length as F that holds, for each element, the frequency of that element's level.
For example:
F <- factor(c("a","b","c","b","a","a","a","b"))
table(F)
F
a b c
4 3 1
V should be:
V
[1] 4 3 1 3 4 4 4 3

We can use ave
ave(seq_along(F), F, FUN = length)
#[1] 4 3 1 3 4 4 4 3
Or use the table itself
as.vector(table(F)[F])
#[1] 4 3 1 3 4 4 4 3
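As a minimal, self-contained sketch of the two ideas above, assuming the factor F from the question: the table is indexed here by level name via as.character(F), which is a choice made for this illustration so the lookup does not depend on the factor's internal integer codes. The names counts, V and V_ave are illustrative only.

```r
# Factor from the question
F <- factor(c("a", "b", "c", "b", "a", "a", "a", "b"))

# Frequency of each level, then look each element up by its level name
counts <- table(F)
V <- as.vector(counts[as.character(F)])

# Same result with ave(): the length of each group, repeated per element
V_ave <- ave(seq_along(F), F, FUN = length)

V
V_ave
```

Both vectors have the same length as F and should agree element by element.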

x <- c("a","b","c","b","a","a","a","b")
Then, depending on whether you want the output to be named,
table(x)[x]
# x
# a b c b a a a b
# 4 3 1 3 4 4 4 3
c(table(x)[x])
# a b c b a a a b
# 4 3 1 3 4 4 4 3
as.numeric(table(x)[x])
# [1] 4 3 1 3 4 4 4 3
unname(table(x)[x])
# [1] 4 3 1 3 4 4 4 3

You can try this:
tab <- table(F)  # avoid naming the table t, which would mask base::t()
as.numeric(sapply(seq_along(F), function(i) tab[F[i]]))
output
[1] 4 3 1 3 4 4 4 3

Related

Producing all combinations of two column values in R

I have a data.frame with two columns
> data.frame(a=c(5,4,3), b =c(1,2,4))
a b
1 5 1
2 4 2
3 3 4
I want to produce a list of data.frames with different combinations of those column values; there should be a total of six possible scenarios for the above example (correct me if I am wrong):
a b
1 5 1
2 4 2
3 3 4
a b
1 5 1
2 4 4
3 3 2
a b
1 5 2
2 4 1
3 3 4
a b
1 5 2
2 4 4
3 3 1
a b
1 5 4
2 4 2
3 3 1
a b
1 5 4
2 4 1
3 3 2
Is there a simple function to do it? I don't think expand.grid worked out for me.
Actually, expand.grid can work here, but it is not recommended because it becomes very inefficient as the number of rows in df grows: with n rows it generates n^n index combinations, of which only the n! permutations are kept.
Below is an example using expand.grid
u <- do.call(expand.grid, rep(list(seq(nrow(df))), nrow(df)))
lapply(
  asplit(
    subset(
      u,
      apply(u, 1, FUN = function(x) length(unique(x))) == nrow(df)
    ), 1
  ), function(v) within(df, b <- b[v])
)
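To make the cost of the expand.grid route concrete, here is a small sketch that counts how many index tuples are generated and how many survive the filter. It assumes the three-row df from the question; the names grid and is_perm are illustrative only.

```r
# Data frame from the question
df <- data.frame(a = c(5, 4, 3), b = c(1, 2, 4))
n <- nrow(df)

# All n^n tuples of row indices
grid <- do.call(expand.grid, rep(list(seq_len(n)), n))

# A tuple is a permutation only if it uses every index exactly once
is_perm <- apply(grid, 1, function(idx) length(unique(idx)) == n)

c(candidates = nrow(grid), kept = sum(is_perm))
```

The number of candidates grows as n^n while the number kept grows as n!, so most of the work is discarded as n increases.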
A more efficient option is to use perms from the pracma package
library(pracma)
> lapply(asplit(perms(df$b),1),function(v) within(df,b<-v))
[[1]]
a b
1 5 4
2 4 2
3 3 1
[[2]]
a b
1 5 4
2 4 1
3 3 2
[[3]]
a b
1 5 2
2 4 4
3 3 1
[[4]]
a b
1 5 2
2 4 1
3 3 4
[[5]]
a b
1 5 1
2 4 2
3 3 4
[[6]]
a b
1 5 1
2 4 4
3 3 2
Using combinat::permn, create all possible permutations of the b values and bind each one with column a.
df <- data.frame(a= c(5,4,3), b = c(1,2,4))
result <- lapply(combinat::permn(df$b), function(x) data.frame(a = df$a, b = x))
result
#[[1]]
# a b
#1 5 1
#2 4 2
#3 3 4
#[[2]]
# a b
#1 5 1
#2 4 4
#3 3 2
#[[3]]
# a b
#1 5 4
#2 4 1
#3 3 2
#[[4]]
# a b
#1 5 4
#2 4 2
#3 3 1
#[[5]]
# a b
#1 5 2
#2 4 4
#3 3 1
#[[6]]
# a b
#1 5 2
#2 4 1
#3 3 4
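If installing a package is not an option, the same list can be built in base R. The sketch below uses a small recursive helper, permutations(), which is a hypothetical name written for this illustration and not part of any package; it is only practical for short vectors, since the number of permutations is n!.

```r
# Hypothetical helper: all permutations of a vector, as a list of vectors
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    # Fix v[i] in front, then permute the remaining elements
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

df <- data.frame(a = c(5, 4, 3), b = c(1, 2, 4))

# One data frame per permutation of column b
result <- lapply(permutations(df$b), function(p) data.frame(a = df$a, b = p))
length(result)
```

The order of the data frames in result may differ from the order produced by pracma::perms or combinat::permn.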

Removing contents from a dataset in R

I am trying to remove a couple of elements from a dataset. It has A,B,C,1,2,3,4,5 as its contents:
>dataset
[1] A 4 3 C 3 3 3 C 3 B 3 4 3 3 3 B 3 3 5 3 3 4 A 3 3 5 3 3 4 3 2 3 C 6 A 3 3
[38] 3 A 3 3 A 3 3 3 3 3 A 3 C B 3 B 3 A 3 1 8 1 1 C 1 1 3 3 3 3 B 3 A A 3 5 3
I want to remove all "A"s and "B"s from the dataset.
The expected dataset should only have 1,2,3,4,5,C as its elements.
I have tried the following code without success:
>rm(dataset$"B") # to remove "B"s
> x.sub <- subset(dataset, "B" > 1) #to remove Bs appearing more than once
Do you know how I can remove them?
dataset <- dataset[!(dataset %in% c('A','B'))]
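The printed output in the question has no quotes, which suggests (but does not confirm) that dataset is a factor. If so, filtering with %in% removes the elements but keeps "A" and "B" as unused levels. A minimal sketch, assuming a factor and using a short made-up sample in place of the real data:

```r
# Hypothetical short sample standing in for the real dataset
dataset <- factor(c("A", "4", "3", "C", "B", "3", "5"))

# Drop the unwanted elements
kept <- dataset[!(dataset %in% c("A", "B"))]
levels(kept)   # "A" and "B" are still listed as levels

# Drop the now-unused levels as well
kept <- droplevels(kept)
levels(kept)
```

For a plain character vector the droplevels step is unnecessary.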

Create a new variable which count length of duplicate in R

I have a data frame, and I want to create a variable z that counts the duplicates in variable y: if y has 1,1, set z = 2,2; if y has 3,3,3, set z = 3,3,3.
x = c("a","b","c","d","e","a","b","c","d","e","a","b","c")
y = c(1,1,2,2,2,3,3,4,4,4,5,5,5)
data <- data.frame(x,y)
data
x y z
1 a 1 2
2 b 1 2
3 c 2 3
4 d 2 3
5 e 2 3
6 a 3 2
7 b 3 2
8 c 4 3
9 d 4 3
10 e 4 3
11 a 5 3
12 b 5 3
13 c 5 3
Thanks for your help.
You can try rle:
data$z <- with(data, {r <- rle(y); rep(r$lengths, r$lengths)})
data
x y z
1 a 1 2
2 b 1 2
3 c 2 3
4 d 2 3
5 e 2 3
6 a 3 2
7 b 3 2
8 c 4 3
9 d 4 3
10 e 4 3
11 a 5 3
12 b 5 3
13 c 5 3
If your variable y is sorted in increasing order, as you say, then the following solution will work:
# calculate counts of each level
counts <- table(data$y)
# fill in z
data$z <- counts[match(data$y, names(counts))]
Note, however, that this method will fail if y is not ordered, since you want to restart the count whenever a different level occurs. For that purpose, @psidom's solution is more robust to mis-ordered data, as rle will reset the count.
This method calculates the total occurrences of a level and then feeds these total counts to the proper location using match.
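The difference between total counts and run lengths only shows up when equal values are not adjacent. A small sketch with a made-up, unsorted y (not the data from the question) illustrates it; total_counts and run_counts are illustrative names.

```r
# Hypothetical unsorted input: the value 1 appears in two separate runs
y <- c(1, 1, 2, 1)

# Total occurrences of each value, looked up per element
total_counts <- as.vector(table(y)[as.character(y)])

# Length of the run each element belongs to
r <- rle(y)
run_counts <- rep(r$lengths, r$lengths)

total_counts
run_counts
```

Here the last element gets 3 from the table approach (three 1s overall) but 1 from the rle approach (a run of length one).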
Here is a quick method using dplyr, and its rather intuitive syntax:
library(dplyr)
left_join(data, data %>%
            group_by(y) %>%
            summarize(z = n()),
          by = "y")
x y z
1 a 1 2
2 b 1 2
3 c 2 3
4 d 2 3
5 e 2 3
6 a 3 2
7 b 3 2
8 c 4 3
9 d 4 3
10 e 4 3
11 a 5 3
12 b 5 3
13 c 5 3
We can do this easily with data.table
library(data.table)
setDT(data)[, z := .N , rleid(y)]
data
# x y z
# 1: a 1 2
# 2: b 1 2
# 3: c 2 3
# 4: d 2 3
# 5: e 2 3
# 6: a 3 2
# 7: b 3 2
# 8: c 4 3
# 9: d 4 3
#10: e 4 3
#11: a 5 3
#12: b 5 3
#13: c 5 3
Or using rle from base R without any loops
inverse.rle(within.list(rle(data$y), values <- lengths))
#[1] 2 2 3 3 3 2 2 3 3 3 3 3 3
Or another base R method with ave
with(data, ave(y, cumsum(c(TRUE, y[-1]!= y[-length(y)])), FUN=length))
#[1] 2 2 3 3 3 2 2 3 3 3 3 3 3

How to reverse a column in R

I have a dataframe as described below. I want to reverse the order of column B without disturbing the order of the rest of the dataframe. Column B currently has 5,4,3,2,1, and I want to change it to 1,2,3,4,5. I don't want to sort, as that would change the overall ordering.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3
You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).
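A short sketch of that last point, assuming a data frame x built from the values shown in the question; the name y for the result is illustrative only.

```r
# Data frame matching the question's example
x <- data.frame(A = 1:5, B = 5:1, C = c(6, 8, 5, 5, 3))

# transform() returns a modified copy and leaves x untouched
y <- transform(x, B = rev(B))

x$B   # still 5 4 3 2 1
y$B   # reversed

# To change x itself, assign the result back
x <- transform(x, B = rev(B))
```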

Relevel a factor

I am given an unordered factor ID, a reference vector with the rank of each level, and a label for each level. I want to order the levels of ID by the given rank and then override the labels in the factor.
Could you advise whether there is a better way to do this:
ID<-factor(c(1,2,2,3,1,3,3,2,1,1)+10)
Rank<-c("11"=3,"12"=1,"13"=2)
Label<-c("11"="B","12"="A","13"="C")
ID.Rank<-factor(ID, levels=names(Rank),labels=Rank)
ID.Rank<-factor(ID.Rank, levels=sort(Rank),order=T)
ID.Label<-factor(ID, levels=names(Label),labels=Label)
data.frame(ID,ID.Rank,ID.Label)
### here it is important that ID.Rank has a certain order.
factor(ID.Rank, labels=Label[match(levels(ID.Rank), Rank)])
If I understood your question correctly, here is how you can solve the problem.
set.seed(2)
ID<-as.numeric(ID)
df1<-as.data.frame(ID)
> df1
ID
1 1
2 1
3 3
4 2
5 3
6 2
7 3
8 3
9 2
10 3
df2<-as.data.frame(Rank)
df2$ID<-rownames(df2)
> df2
Rank ID
1 3 1
2 1 2
3 2 3
df3<-merge(df1,df2,by="ID")
ID Rank
1 1 3
2 1 3
3 2 1
4 2 1
5 2 1
6 3 2
7 3 2
8 3 2
9 3 2
10 3 2
df3$Rank is what you are looking for as the final result. You can convert it to a factor.
Updated as per comments: If you want the original order of ID:
df1$IDo<-rownames(df1)
df3
ID IDo Rank
1 1 1 3
2 1 7 3
3 1 4 3
4 2 3 1
5 2 9 1
6 2 10 1
7 3 2 2
8 3 5 2
9 3 6 2
10 3 8 2
myFac <- factor(ID, levels=Rank, labels=names(Rank) )
myFac
[1] 3 3 2 2 3 1 1 2 2 3
Levels: 1 < 2 < 3
match(levels(myFac), names(Label) )
[1] 1 2 3
Label[match(levels(myFac), names(Label) )]
1 2 3
"B" "A" "C"
levels(myFac) <- Label[match(levels(myFac), names(Label) )]
myFac
#-----
[1] C C A A C B B A A C
Levels: B < A < C
Assuming Rank and Label are always in the same order, you just need to order the labels appropriately and then use them to create the ordered factor.
ID <- factor(c(1,2,2,3,1,3,3,2,1,1)+10)
Rank <- c("11"=3,"12"=1,"13"=2)
Label <- c("11"="B","12"="A","13"="C")
Label <- Label[order(Rank)]
factor(ID, levels=names(Label), labels=Label, ordered=TRUE)
## [1] B A A C B C C A B B
## Levels: A < C < B
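If Rank and Label might not be stored in the same order, one option is to align them by name before ordering. The sketch below is a variation on the approach above, not a replacement for it, and it assumes both vectors are named by the level they refer to; ord and f are illustrative names.

```r
ID <- factor(c(1, 2, 2, 3, 1, 3, 3, 2, 1, 1) + 10)
Rank <- c("11" = 3, "12" = 1, "13" = 2)
Label <- c("11" = "B", "12" = "A", "13" = "C")

# Look up each label's rank by name, then order the labels by that rank
ord <- order(Rank[names(Label)])

f <- factor(ID,
            levels = names(Label)[ord],
            labels = unname(Label[ord]),
            ordered = TRUE)
f
levels(f)

# Comparisons follow the rank order, not alphabetical order
f[2] < f[1]
```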
