Removing contents from a dataset in R

I am trying to remove a couple of elements from a dataset. Its contents are A, B, C, 1, 2, 3, 4, 5:
>dataset
[1] A 4 3 C 3 3 3 C 3 B 3 4 3 3 3 B 3 3 5 3 3 4 A 3 3 5 3 3 4 3 2 3 C 6 A 3 3
[38] 3 A 3 3 A 3 3 3 3 3 A 3 C B 3 B 3 A 3 1 8 1 1 C 1 1 3 3 3 3 B 3 A A 3 5 3
I want to remove all "A"s and "B"s from the dataset.
The expected dataset should only have 1,2,3,4,5,C as its elements.
I tried the following code, without success:
>rm(dataset$"B") # to remove "B"s
> x.sub <- subset(dataset, "B" > 1) #to remove Bs appearing more than once
How can I remove them?

dataset <- dataset[!(dataset %in% c('A','B'))]
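The one-liner above keeps every element that is not "A" or "B". The printed vector in the question shows no quotes, so it may well be a factor; that is an assumption, as is the shortened sample data below. If it is a factor, the unused levels remain after filtering unless they are dropped explicitly. A minimal sketch:

```r
# Hypothetical, shortened stand-in for the dataset in the question
dataset <- factor(c("A", "4", "3", "C", "3", "B", "5", "A", "2", "1"))

# Keep only the elements that are neither "A" nor "B"
kept <- dataset[!(dataset %in% c("A", "B"))]

# The filtered factor still lists A and B as (empty) levels; drop them
kept <- droplevels(kept)

levels(kept)
```

For a plain character vector the droplevels() step is unnecessary.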

Related

Producing all combinations of two column values in R

I have a data.frame with two columns
> data.frame(a=c(5,4,3), b =c(1,2,4))
a b
1 5 1
2 4 2
3 3 4
I want to produce a list of data.frames with different combinations of those column values; there should be a total of six possible scenarios for the above example (correct me if I am wrong):
a b
1 5 1
2 4 2
3 3 4
a b
1 5 1
2 4 4
3 3 2
a b
1 5 2
2 4 1
3 3 4
a b
1 5 2
2 4 4
3 3 1
a b
1 5 4
2 4 2
3 3 1
a b
1 5 4
2 4 1
3 3 2
Is there a simple function to do it? I don't think expand.grid worked out for me.
Actually, expand.grid can work here, but it is not recommended because it becomes inefficient when df has many rows: with n rows it generates n^n candidate index combinations, of which only the n! permutations are kept.
Below is an example using expand.grid:
u <- do.call(expand.grid, rep(list(seq(nrow(df))), nrow(df)))
lapply(
  asplit(
    subset(
      u,
      apply(u, 1, FUN = function(x) length(unique(x))) == nrow(df)
    ), 1
  ), function(v) within(df, b <- b[v])
)
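To get a feel for the cost of the expand.grid approach, the number of candidate rows (n^n) can be compared with the number of permutations actually kept (n!). The row counts chosen here are arbitrary illustrations:

```r
# Candidate rows generated by expand.grid versus permutations kept
n <- c(3, 5, 8)
cost <- data.frame(
  n            = n,
  candidates   = n^n,          # rows expand.grid produces
  permutations = factorial(n)  # rows that survive the filter
)
cost
```

With 8 rows, more than 16 million candidates are generated to keep 40320 of them.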
One more efficient option is to use perms from package pracma
library(pracma)
> lapply(asplit(perms(df$b),1),function(v) within(df,b<-v))
[[1]]
a b
1 5 4
2 4 2
3 3 1
[[2]]
a b
1 5 4
2 4 1
3 3 2
[[3]]
a b
1 5 2
2 4 4
3 3 1
[[4]]
a b
1 5 2
2 4 1
3 3 4
[[5]]
a b
1 5 1
2 4 2
3 3 4
[[6]]
a b
1 5 1
2 4 4
3 3 2
Use combinat::permn to create all possible permutations of the b values, then bind each permutation to column a:
df <- data.frame(a= c(5,4,3), b = c(1,2,4))
result <- lapply(combinat::permn(df$b), function(x) data.frame(a = df$a, b = x))
result
#[[1]]
# a b
#1 5 1
#2 4 2
#3 3 4
#[[2]]
# a b
#1 5 1
#2 4 4
#3 3 2
#[[3]]
# a b
#1 5 4
#2 4 1
#3 3 2
#[[4]]
# a b
#1 5 4
#2 4 2
#3 3 1
#[[5]]
# a b
#1 5 2
#2 4 4
#3 3 1
#[[6]]
# a b
#1 5 2
#2 4 1
#3 3 4
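If installing a package is not an option, the same idea can be sketched in base R with a small recursive helper. The function name perms_base is hypothetical (it is not part of any package), and this is only an illustration, not a tuned implementation:

```r
# Hypothetical helper: returns a list of all permutations of vector v
perms_base <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    # Fix element i in front, then permute the remaining elements
    for (p in perms_base(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

df <- data.frame(a = c(5, 4, 3), b = c(1, 2, 4))

# One data.frame per permutation of column b
result <- lapply(perms_base(df$b), function(x) data.frame(a = df$a, b = x))
length(result)
```

The order of the permutations differs from what pracma::perms or combinat::permn return, but the set of data.frames is the same.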

Subset rows excluding special values

I want to subset rows which do not contain special values. For example:
df <- data.frame(a=c(1,2,2,3,4,4),b=c(-9999,2,3,4,5,6),c=c(2,3,4,-9999,2,4))
a b c
1 1 -9999 2
2 2 2 3
3 2 3 4
4 3 4 -9999
5 4 5 2
6 4 6 4
df has many rows and columns, and I want to keep only the rows that don't contain -9999. The expected result is what the following code produces:
df[which(df$a != -9999 & df$b != -9999 & df$c != -9999), ]
a b c
2 2 2 3
3 2 3 4
5 4 5 2
6 4 6 4
When there are too many columns to type out the condition above, how can I subset the rows?
You can try this one:
temp <- which(df == -9999, arr.ind = TRUE)
df[-unique(temp[,1]),]
a b c
2 2 2 3
3 2 3 4
5 4 5 2
6 4 6 4
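One caveat with negative indexing: if no row contains -9999, the index vector is empty and df[-integer(0), ] returns zero rows instead of all of them. A logical mask avoids that edge case. A minimal sketch using rowSums (the second data frame, df2, is a made-up example with no sentinel values):

```r
df <- data.frame(a = c(1, 2, 2, 3, 4, 4),
                 b = c(-9999, 2, 3, 4, 5, 6),
                 c = c(2, 3, 4, -9999, 2, 4))

# TRUE for rows in which no column equals the sentinel value
keep <- rowSums(df == -9999, na.rm = TRUE) == 0
clean <- df[keep, ]
clean

# Hypothetical data without any sentinel: all rows are kept
df2 <- data.frame(a = 1:3, b = 4:6)
clean2 <- df2[rowSums(df2 == -9999, na.rm = TRUE) == 0, ]
```

The na.rm = TRUE argument treats missing values as "not the sentinel"; whether that is the desired behaviour depends on the data.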

merge/join two long df in R

I have two dataframes a and b which I would like to combine
a <- data.frame(g=c("1","2","2","3","3","3","4","4","4","4"),h=c("1","1","2","1","2","3","1","2","3","4"))
b <- data.frame(g=c("1","2","3","3","3","4","4","4","4","4"),i=c("1","2","3","2","1","2","3","4","5","6"))
g represents a grouping variable and h and i the columns I want to merge/join
> a
g h
1 1 1
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
9 4 3
10 4 4
> b
g i
1 1 1
2 2 2
3 3 3
4 3 2
5 3 1
6 4 2
7 4 3
8 4 4
9 4 5
10 4 6
a and b should be merged within each level of the grouping variable g: identical values of h and i should be put together (independent of the order in which they appear in h/i), and non-identical values should each appear once (not in all possible combinations).
The final df would look like:
g h i
1 1 1 1
2 2 1 <NA>
3 2 2 2
4 3 1 1
5 3 2 2
6 3 3 3
7 4 1 <NA>
8 4 2 2
9 4 3 3
10 4 4 4
11 4 <NA> 5
12 4 <NA> 6
I need that df to perform a correlation analysis.
Sounds like a merge on h==i while retaining both h and i, so create a new variable x to join on, and keep the join results from both sides (all=TRUE). With a large hat-tip to @Moody_Mudskipper:
merge(transform(a,x=h), transform(b,x=i), all=TRUE)
# g x h i
#1 1 1 1 1
#2 2 1 1 <NA>
#3 2 2 2 2
#4 3 1 1 1
#5 3 2 2 2
#6 3 3 3 3
#7 4 1 1 <NA>
#8 4 2 2 2
#9 4 3 3 3
#10 4 4 4 4
#11 4 5 <NA> 5
#12 4 6 <NA> 6
We can also do this with dplyr:
library(dplyr)
a %>%
  mutate(x = h) %>%
  full_join(mutate(b, x = i), by = c("g", "x")) %>%
  select(-x)
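For reference, the base R merge can also name its join keys explicitly and drop the helper column afterwards, which yields exactly the three columns asked for. A sketch of that variant (stringsAsFactors = FALSE is set here only so the columns are plain character on any R version):

```r
a <- data.frame(g = c("1","2","2","3","3","3","4","4","4","4"),
                h = c("1","1","2","1","2","3","1","2","3","4"),
                stringsAsFactors = FALSE)
b <- data.frame(g = c("1","2","3","3","3","4","4","4","4","4"),
                i = c("1","2","3","2","1","2","3","4","5","6"),
                stringsAsFactors = FALSE)

# Join on the group g and the helper key x, keeping unmatched rows from both sides
res <- merge(transform(a, x = h), transform(b, x = i),
             by = c("g", "x"), all = TRUE)

# Drop the helper key so only g, h and i remain
res <- res[, c("g", "h", "i")]
res
```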

How to create a vector with factor level frequency?

I have a factor F. I need to create a vector V of the same length as F that holds, for each element, the frequency of its factor level.
For example:
F <- factor(c("a","b","c","b","a","a","a","b"))
table(F)
F
a b c
4 3 1
V should be:
V
[1] 4 3 1 3 4 4 4 3
We can use ave
ave(seq_along(F), F, FUN = length)
#[1] 4 3 1 3 4 4 4 3
Or use the table itself
as.vector(table(F)[F])
#[1] 4 3 1 3 4 4 4 3
x <- c("a","b","c","b","a","a","a","b")
Then, depending on whether you want the output to be named,
table(x)[x]
# x
# a b c b a a a b
# 4 3 1 3 4 4 4 3
c(table(x)[x])
# a b c b a a a b
# 4 3 1 3 4 4 4 3
as.numeric(table(x)[x])
# [1] 4 3 1 3 4 4 4 3
unname(table(x)[x])
# [1] 4 3 1 3 4 4 4 3
You can try this:
tab <- table(F)
as.numeric(sapply(seq_along(F), function(i) tab[F[i]]))
Output:
[1] 4 3 1 3 4 4 4 3
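The approaches above can be compared side by side. In this sketch the factor is named f rather than F, because F is also R's shorthand for FALSE; the renaming is a choice made here, not part of the original question:

```r
f <- factor(c("a", "b", "c", "b", "a", "a", "a", "b"))

# Group-wise length: each element is replaced by the size of its level's group
v1 <- ave(seq_along(f), f, FUN = length)

# Look up each element's count in the frequency table by level name
v2 <- as.vector(table(f)[as.character(f)])

v1
v2
```

Indexing the table by as.character(f) makes the lookup by name explicit instead of relying on the factor's integer codes.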

How to reverse a column in R

I have a dataframe as shown below. I want to reverse the order of column B without disturbing the order of the rest of the dataframe. Column B currently holds 5,4,3,2,1, and I want to change it to 1,2,3,4,5. I don't want to sort, because that would change the overall row ordering.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3
You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).
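The answers never define x, so here is a self-contained sketch that rebuilds the data frame from the question and shows the difference between the two approaches:

```r
# Data frame from the question
x <- data.frame(A = 1:5, B = 5:1, C = c(6, 8, 5, 5, 3))

# transform() returns a modified copy; x itself is left untouched
y <- transform(x, B = rev(B))
unchanged <- identical(x$B, 5:1)

# Assigning to x$B changes x in place
x$B <- rev(x$B)
x
```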
