I have this R data frame:
v1 <- LETTERS[1:10]
v2 <- LETTERS[1:4]
v3 <- LETTERS[4:5]
dat <- data.frame(cbind(v1,v2,v3))
v1 v2 v3
A A D
B B E
C C D
D D E
E A D
F B E
G C D
H D E
I A D
J B E
I would like to get a count of the number of occurrences of a given value (e.g. "A") for each column,
and save that as a new column in my data frame.
In my data frame I want to count the occurrences of "A" across columns v1 through v3 and store the result in a new column (CountA).
My desired output would be:
v1 v2 v3 CountA
A A D 2
B B E 0
C C D 0
D D E 0
E A D 1
F B E 0
G C D 0
H D E 0
I A D 1
J B E 0
Try this:
dat$CountA <- rowSums(dat=="A")
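To see why this works, here is a minimal sketch that rebuilds the example data and restricts the comparison to the three original columns. The explicit rep() calls and na.rm = TRUE are additions for illustration, not part of the original answer.

```r
# Rebuild the example data; rep() makes the recycling of v2 and v3 explicit
dat <- data.frame(v1 = LETTERS[1:10],
                  v2 = rep(LETTERS[1:4], length.out = 10),
                  v3 = rep(LETTERS[4:5], length.out = 10),
                  stringsAsFactors = FALSE)

# dat[...] == "A" is a logical matrix; rowSums() counts the TRUEs in each row.
# na.rm = TRUE (an assumption here) keeps missing cells from turning a count into NA.
dat$CountA <- rowSums(dat[, c("v1", "v2", "v3")] == "A", na.rm = TRUE)
dat
```

Selecting the columns by name means re-running the line does not accidentally include CountA itself in the comparison.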
I got these two data frames:
a <- c('A','B','C','D','E','F','G','H')
b <- c(1,2,1,3,1,3,1,6)
c <- c('K','K','H','H','K','K','H','H')
frame1 <- data.frame(a,b,c)
a <- c('A','A','B','B','C','C','D','D','E','E','F','F','G','H','H')
d <- c(5,5,6,3,1,9,1,0,2,3,6,5,5,5,4)
e <- c('W','W','D','D','D','D','W','W','D','D','W','W','D','W','W')
frame2<- data.frame(a,d,e)
And now I want to include the column 'e' from 'frame2' into 'frame1' depending on the matching value in column 'a' of both data frames. Note: 'e' is the same for all rows with the same value in 'a'.
The result should look like this:
a b c e
1 A 1 K W
2 B 2 K D
3 C 1 H D
4 D 3 H W
5 E 1 K D
6 F 3 K W
7 G 1 H D
8 H 6 H W
Any suggestions?
You can use match to look up the values of column 'a' of frame1 in column 'a' of frame2:
frame1$e <- frame2$e[match(frame1$a, frame2$a)]
frame1
# a b c e
#1 A 1 K W
#2 B 2 K D
#3 C 1 H D
#4 D 3 H W
#5 E 1 K D
#6 F 3 K W
#7 G 1 H D
#8 H 6 H W
or using merge:
merge(frame1, frame2[!duplicated(frame2$a), c("a", "e")], all.x=TRUE)
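As a small illustration of what match does in the lookup above, here is a sketch on toy data. The shortened frames and the unmatched key "Z" are made up for the example.

```r
# Toy versions of the two frames; "Z" has no partner in frame2
frame1 <- data.frame(a = c("A", "B", "Z"), b = c(1, 2, 3),
                     stringsAsFactors = FALSE)
frame2 <- data.frame(a = c("A", "A", "B", "B"), e = c("W", "W", "D", "D"),
                     stringsAsFactors = FALSE)

# match() returns, for each frame1$a, the position of its FIRST occurrence
# in frame2$a, or NA when there is none
idx <- match(frame1$a, frame2$a)
idx

# Indexing with NA yields NA, so unmatched keys get a missing 'e'
frame1$e <- frame2$e[idx]
frame1
```

Because only the first match is used, duplicated keys in frame2 do not multiply rows, which is why no de-duplication step is needed with this approach.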
You can also join the two data frames on column 'a' and keep only the matched values. Reduce the second data frame to one row per value of 'a', do a left join that keeps every row of the first data frame, and then drop the columns of the second data frame that aren't needed.
Using dplyr:
library(dplyr)
frame2 %>%
distinct(a, e, .keep_all = TRUE) %>%
right_join(frame1, by = 'a') %>%
select(-d) %>%
arrange(a)
# a e b c
#1 A W 1 K
#2 B D 2 K
#3 C D 1 H
#4 D W 3 H
#5 E D 1 K
#6 F W 3 K
#7 G D 1 H
#8 H W 6 H
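The same idea can be written as a left join from frame1, which keeps the column order of frame1. This is a sketch on shortened toy data and assumes dplyr is installed; the name lookup is hypothetical.

```r
library(dplyr)

# Shortened toy versions of the two frames
frame1 <- data.frame(a = c("A", "B", "C"), b = c(1, 2, 1),
                     stringsAsFactors = FALSE)
frame2 <- data.frame(a = c("A", "A", "B", "C", "C"),
                     d = c(5, 5, 6, 1, 9),
                     e = c("W", "W", "D", "D", "D"),
                     stringsAsFactors = FALSE)

# Keep one row per (a, e) pair and only those two columns
lookup <- frame2 %>% distinct(a, e)

# Left join: every row of frame1 is kept, 'e' is appended at the end
result <- frame1 %>% left_join(lookup, by = "a")
result
```

If 'e' were not constant within a value of 'a', distinct(a, e) would keep several rows per key and the join would duplicate rows of frame1.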
I have the following table that I generated using the table(data$a, data$b) function
        a  b  c NA
d   0 45 42 63  0
e   0 12 45 63  0
f   0 95 65 21  0
NA  0  0  0  0  0
How can I remove the columns labelled "" and NA?
Here is a reproducible example
a b
a d
a d
a d
a d
a d
a d
a d
a d
a d
a d
a d
a d
b d
b d
b e
b e
b e
b e
c e
c e
c e
c e
c e
c e
c e
c e
c e
c e
c f
c f
c f
c f
c f
c f
c f
c f
c f
c f
c f
Note that there are no "" or NAs in the set, but they still appear in the table
In this table, both of the variables are factors.
Thank you!
It is possible that the NAs are the character string "NA" rather than real NA values; otherwise table, with its default useNA = "no", would drop them. One option is to change the values "" and "NA" to NA:
df1[df1 == "NA"|df1 == ""] <- NA
This assumes a two-column data frame in which all columns are of class character.
Update
If the dataset had "NA" or "" values at some point, the columns are likely factors that still carry those values as unused levels. One option is to apply droplevels and then table:
table(droplevels(df1))
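A minimal sketch of the unused-levels effect, using a made-up df1 whose factors carry an empty-string level that never occurs in the data:

```r
# Hypothetical data: both factors have a "" level with no observations
df1 <- data.frame(
  a = factor(c("a", "a", "b", "c"), levels = c("", "a", "b", "c")),
  b = factor(c("d", "d", "e", "f"), levels = c("", "d", "e", "f"))
)

# table() reports every level, including the unused ""
tab_all <- table(df1)
dim(tab_all)

# droplevels() removes levels that do not occur, so they vanish from the table
tab_used <- table(droplevels(df1))
dim(tab_used)
```

This matches the situation where the data contain no "" values but the table still shows an all-zero row and column for them.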
If we create a table called "mytable", you could try the following:
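# TRUE for columns whose name is a real NA, the string "NA", or ""
bad_cols <- is.na(colnames(mytable)) | colnames(mytable) %in% c("NA", "")
mytable <- mytable[, !bad_cols, drop = FALSE]
This builds a logical vector marking the columns whose name is NA, "NA" or "", then excludes those columns by subsetting and saves the result back in mytable. The vectorised | is needed because || only compares single values, and logical indexing still keeps every column when nothing matches.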
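The same subsetting idea extends to rows. Below is a self-contained sketch on made-up vectors x and y; the helper names keep_rows, keep_cols and clean are hypothetical.

```r
# Made-up data containing an empty string and a real NA
x <- c("a", "", "b", NA)
y <- c("d", "e", "", "f")
mytable <- table(x, y, useNA = "ifany")

# Keep only dimnames that are neither NA nor the empty string
keep_rows <- !(is.na(rownames(mytable)) | rownames(mytable) %in% "")
keep_cols <- !(is.na(colnames(mytable)) | colnames(mytable) %in% "")

# drop = FALSE keeps the result two-dimensional even if one row or column remains
clean <- mytable[keep_rows, keep_cols, drop = FALSE]
clean
```

Checking is.na() separately matters because comparing a real NA name with == returns NA rather than TRUE or FALSE.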
How can R use the actual value of i when it is used to name columns and lists in a for loop?
For example, using the following data:
z <- data.frame(x = c(1,2,3,4,5), y = c("a", "b", "c", "d", "e"))
When I reference i inside the loop while creating the columns, each column is literally named i.
a_final <- NULL
for(i in z$x){
print(data.frame(i = z$y))
}
Instead, I'd like each column to be named by the value of i in that iteration.
I'd like the results to look something like:
1 2 3 4 5
a a a a a
b b b b b
c c c c c
d d d d d
e e e e e
You could create a square matrix filled with z$y, with nrow(z) rows and nrow(z) columns, and convert it to a data frame.
as.data.frame(matrix(z$y, ncol = nrow(z), nrow = nrow(z)))
# V1 V2 V3 V4 V5
#1 a a a a a
#2 b b b b b
#3 c c c c c
#4 d d d d d
#5 e e e e e
We can also use replicate
as.data.frame(replicate(nrow(z), z$y))
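Neither result above carries the values of z$x as column names. One possible way to get them, sketched here under the assumption that the names should be "1" to "5", is setNames inside lapply. In data.frame(i = z$y) the argument name i is taken literally, which is why the loop in the question produced a column called i.

```r
z <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c("a", "b", "c", "d", "e"),
                stringsAsFactors = FALSE)

# Build one single-column data frame per value of z$x and name it after that value
cols <- lapply(z$x, function(i) {
  setNames(data.frame(z$y, stringsAsFactors = FALSE), as.character(i))
})

# cbind() on data frames keeps the non-syntactic names "1", "2", ...
res <- do.call(cbind, cols)
res
```

Columns named with digits have to be accessed with res[["3"]] or backticks, since 3 on its own is not a syntactic name.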
I have many columns (120) in a dataframe.
I would like to create a new column in this dataframe, with each row containing the unique values (ignoring NAs) from the values across each row in the 120 columns. For example:
V1 V2 V3 V4 V5
a a NA c d
c d e f e
x x x NA NA
A V6 column should be added:
V6
a c d
c d e f
x
Any suggestion is more than welcome!
Wannes
By using apply and toString
db$New=apply(db,1,function(x) toString(sort(unique(x[!is.na(x)]))))
db
V1 V2 V3 V4 V5 New
1 a a <NA> c d a, c, d
2 c d e f e c, d, e, f
3 x x x <NA> <NA> x
Or using paste
db$New=apply(db,1,function(x) paste(sort(unique(x[!is.na(x)])),collapse = ' '))
db
V1 V2 V3 V4 V5 New
1 a a <NA> c d a c d
2 c d e f e c d e f
3 x x x <NA> <NA> x
The added sort ensures that the same set of unique values always appears in the same order.
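For a self-contained run, here is a sketch that builds the example data and writes the result to V6 as requested in the question. The data frame db is reconstructed from the question, and row_unique is a hypothetical helper name.

```r
# Example data reconstructed from the question
db <- data.frame(V1 = c("a", "c", "x"),
                 V2 = c("a", "d", "x"),
                 V3 = c(NA, "e", "x"),
                 V4 = c("c", "f", NA),
                 V5 = c("d", "e", NA),
                 stringsAsFactors = FALSE)

# Drop NAs, de-duplicate, sort, and join the values into one string
row_unique <- function(x) toString(sort(unique(x[!is.na(x)])))

# Select the source columns explicitly so a re-run does not include V6 itself
db$V6 <- apply(db[, paste0("V", 1:5)], 1, row_unique)
db
```

With 120 columns, the paste0("V", 1:5) selector would be replaced by whatever identifies the source columns in the real data.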
Is it possible to count the unique elements in each row of a data frame and return the one with the most occurrences, collecting the results in a vector?
example:
a a a b b b b -> b
c v f w w r t -> w
s s d f b b b -> b
You can use apply to run the table function on every row of the data frame.
df <- read.table(textConnection("a a a b b b b\nc v f w w r t\ns s d f b b b"), header = F)
df$result <- apply(df, 1, function(x) names(table(x))[which.max(table(x))])
df
## V1 V2 V3 V4 V5 V6 V7 result
## 1 a a a b b b b b
## 2 c v f w w r t w
## 3 s s d f b b b b
Yes, with table:
x=c("a", "a", "a", "b" ,"b" ,"b" ,"b")
table(x)
x
a b
3 4
EDIT with data.table
library(data.table)
DT = data.table(x=sample(letters[1:5],10,TRUE),y=sample(letters[1:5],10,TRUE))
#DT
# x y
# 1: d a
# 2: c d
# 3: d c
# 4: c a
# 5: a e
# 6: d c
# 7: c b
# 8: a b
# 9: b c
#10: c d
f = function(x) names(table(x))[which.max(table(x))]
DT[,lapply(.SD,f)]
# x y
#1: c c
Note that which.max returns only the first maximum, so if you want to keep all tied maxima you need to ask for them explicitly.
You can save them as a list inside the data.frame. If there is only one per row, the list is simplified to an ordinary vector.
df$result <- apply(df, 1, function(x) {T <- table(x); list(T[which(T==max(T))])})
With Ties for max:
df2 <- df[, 1:6]
df2$result <- apply(df2, 1, function(x) {T <- table(x); list(T[which(T==max(T))])})
> df2
V1 V2 V3 V4 V5 V6 result
1 a a a b b b 3, 3
2 c v f w w r 2
3 s s d f b b 2, 2
With No Ties for max:
df$result <- apply(df, 1, function(x) {T <- table(x); list(T[which(T==max(T))])})
> df
V1 V2 V3 V4 V5 V6 V7 result
1 a a a b b b b 4
2 c v f w w r t 2
3 s s d f b b b 3
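The result columns above show the tied counts rather than the values themselves. If the tied values are what is wanted, one possible variant collapses their names into a string. This is a sketch; row_modes is a hypothetical helper name.

```r
# Return every value that reaches the maximum count in a row, comma-separated
row_modes <- function(x) {
  counts <- table(x)
  toString(names(counts)[counts == max(counts)])
}

# Six-column version of the example, which contains ties
df2 <- read.table(text = "a a a b b b\nc v f w w r\ns s d f b b",
                  header = FALSE, colClasses = "character")

df2$result <- apply(df2, 1, row_modes)
df2
```

Tied values come out in the sorted order that table uses, not in order of appearance in the row.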