Delete multiple rows based on some constraints - r

I'm using R and I am trying to delete some rows from a data frame based on some constraints. So, if I have
dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),
R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))
I'd like to delete all the rows having an "N" in some given columns, such as R1, R3 and R4. For a single column, I found this solution (from the question "delete row for certain constrains"):
d <- dat[dat[,"R1"]!="N",]
which works fine. But if I pass multiple columns, as in
d <- dat[dat[,c("R1","R3","R4")]!="N",]
I get lots of extra rows full of NA. Where am I going wrong?

You can use
dat[rowSums(dat[, c("R1","R3","R4")] == "N") == 0, , drop=FALSE]
# Cs R1 R2 R3 R4 R5 R6
#5 c5 Y Y Y Y Y Y
Or, if you don't like excessive typing:
dat[!rowSums(dat[c('R1','R3','R4')]=='N'),]
This first tests each "cell" of columns "R1", "R3" and "R4" of your data for equality with "N" and then counts the TRUE values per row. If a row contains no "N", its sum is 0 and the row is kept. I added drop=FALSE to keep the structure as a data.frame.
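To make the intermediate steps visible, here is a minimal sketch on a small fixed data frame (the values are made up for illustration and are not the OP's random data; the names is_n, n_per_row and kept are only for this example):

```r
# Hypothetical fixed data so the result is reproducible
dat <- data.frame(Cs = c("c1", "c2", "c3"),
                  R1 = c("Y", "N", "Y"),
                  R3 = c("Y", "Y", "Y"),
                  R4 = c("Y", "Y", "N"))

# Step 1: logical matrix with one cell per value in the chosen columns
is_n <- dat[, c("R1", "R3", "R4")] == "N"

# Step 2: number of "N" values in each row
n_per_row <- rowSums(is_n)

# Step 3: keep only the rows whose count is zero
kept <- dat[n_per_row == 0, , drop = FALSE]
print(kept)
```

Only row c1 has no "N" in R1, R3 or R4, so it is the only row left in kept.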
Note after a comment by OP:
If you select only one column of a data.frame with [ , j] and do not specify drop=FALSE, the default behavior of [.data.frame is to coerce the resulting one-column data.frame to an atomic vector. rowSums does not work on that vector. To avoid this, change your code to:
dat[!rowSums(dat[,'R1', drop=FALSE]=='N'), ]
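The difference is easy to check directly. A small sketch, assuming a made-up two-column data frame (v, d and keep are illustrative names):

```r
# Hypothetical data for illustration
dat <- data.frame(R1 = c("Y", "N", "Y"),
                  R2 = c("N", "N", "Y"))

v <- dat[, "R1"]                # default drop = TRUE: result is a plain vector
d <- dat[, "R1", drop = FALSE]  # result stays a one-column data.frame

print(is.data.frame(v))
print(is.data.frame(d))

# rowSums needs something with two dimensions, so it works on d
keep <- !rowSums(d == "N")
print(dat[keep, ])
```

Selecting with a single index, as in dat["R1"], also keeps the data.frame structure, which is why the shorter version above works without drop=FALSE.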
Sample data:
set.seed(5)
dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),
R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))
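As for why the original attempt produced rows full of NA: comparing several columns at once gives a logical matrix, and when that matrix is used as a row index it is treated as one long logical vector with more elements than the data frame has rows. Every TRUE beyond the last row selects a row that does not exist. A sketch with made-up data (idx and bad are illustrative names):

```r
# Hypothetical fixed data for illustration
dat <- data.frame(Cs = c("c1", "c2", "c3"),
                  R1 = c("Y", "N", "Y"),
                  R3 = c("Y", "Y", "Y"),
                  R4 = c("Y", "Y", "N"))

idx <- dat[, c("R1", "R3", "R4")] != "N"  # 3 x 3 logical matrix
print(dim(idx))
print(length(idx))   # more index values than dat has rows

bad <- dat[idx, ]    # TRUE values past row 3 yield NA rows
print(bad)
```

Collapsing the matrix to one value per row first (with rowSums, as above) avoids this.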

You could make a 'keep'-variable consisting of booleans for each row:
keep <- apply(dat[,c("R1","R3","R4")],
MARGIN=1,
FUN=function(x){all(x!='N')})
res <- dat[keep,]
> res
Cs R1 R2 R3 R4 R5 R6
1 c1 Y Y Y Y Y Y
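apply() converts the selected columns to a matrix and calls the function once per row. If that turns out to be slow on a large data frame, one possible column-wise variant combines per-column tests with Reduce. This is only a sketch on made-up data, not something from the answer above:

```r
# Hypothetical fixed data for illustration
dat <- data.frame(Cs = c("c1", "c2", "c3"),
                  R1 = c("Y", "N", "Y"),
                  R3 = c("Y", "Y", "Y"),
                  R4 = c("Y", "Y", "N"))
cols <- c("R1", "R3", "R4")

# One logical vector per column: TRUE where the value is not "N"
not_n <- lapply(dat[cols], function(col) col != "N")

# Combine the vectors element-wise with AND: TRUE only if every column passes
keep <- Reduce(`&`, not_n)
res <- dat[keep, ]
print(res)
```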
data:
seed used: 1234
dat <- structure(list(Cs = structure(1:6, .Label = c("c1", "c2", "c3",
"c4", "c5", "c6"), class = "factor"), R1 = structure(c(2L, 1L,
1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), R2 = structure(c(2L,
2L, 1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"),
R3 = structure(c(2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N",
"Y"), class = "factor"), R4 = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "Y", class = "factor"), R5 = structure(c(2L,
1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"),
R6 = structure(c(2L, 2L, 2L, 1L, 2L, 1L), .Label = c("N",
"Y"), class = "factor")), .Names = c("Cs", "R1", "R2", "R3",
"R4", "R5", "R6"), row.names = c(NA, -6L), class = "data.frame")

Related

overlapping unique dataframes in R

My two dataframes are:
df1<-structure(list(header1 = structure(1:4, .Label = c("a", "b",
"c", "d"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
and
df2<-structure(list(sample_x = structure(c(1L, 1L, 2L, 3L), .Label = c("0",
"a", "c"), class = "factor"), sample_y = structure(c(1L, 3L,
2L, 4L), .Label = c("0", "a", "m", "t"), class = "factor"), sample_z = structure(c(3L,
2L, 1L, 1L), .Label = c("0", "a", "c"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
A 0 in df2 means there is no value.
Now I want to overlap df1 and df2 to make an output dataframe(df3):
df3<-structure(list(sample_x = c(2L, 2L, 0L), sample_y = c(1L, 3L,
2L), sample_z = c(2L, 2L, 0L)), class = "data.frame", row.names = c("overlap_df1_df2",
"unique_df1", "unique_df2"))
I tried the data.table function foverlaps:
setkeyv(df1, names(df1))
setkeyv(df2, names(df2))
df3<-foverlaps(df1,df2)
But it seems that I need some common column names in the two dataframes, which is obviously not the case here.
Thank you!
Loop through columns, and use set operations:
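sapply(df2, function(i){
  # drop missing entries: NA and the "0" placeholders used in df2
  x = i[ !is.na(i) & i != "0" ]
  # values present in both df1 and this column
  o = intersect(df1$header1, x)
  # values only in df1, and values only in this column of df2
  u_df1 = setdiff(df1$header1, o)
  u_df2 = setdiff(x, o)
  c(o = length(o),
    u_df1 = length(u_df1),
    u_df2 = length(u_df2))
})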
# sample_x sample_y sample_z
# o 2 1 2
# u_df1 2 3 2
# u_df2 0 2 0
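For readers new to the set functions, this is what happens for a single column. A minimal sketch using the values of sample_y written out as plain character vectors (ref, x, overlap, only_ref and only_x are illustrative names):

```r
ref <- c("a", "b", "c", "d")   # values of df1$header1
x   <- c("0", "m", "a", "t")   # values of df2$sample_y
x   <- x[x != "0"]             # "0" means no value, so drop it

overlap  <- intersect(ref, x)  # in both
only_ref <- setdiff(ref, x)    # in df1 only
only_x   <- setdiff(x, ref)    # in df2 only

print(c(o = length(overlap), u_df1 = length(only_ref), u_df2 = length(only_x)))
```

The three lengths are 1, 3 and 2, which is the sample_y column of the result above.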
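A solution using map_dbl from purrr:
library(purrr)
rbind(
  overlap = map_dbl(df2, ~length(intersect(df1$header1, .x))),
  unique_df1 = map_dbl(df2, ~length(setdiff(df1$header1, .x))),
  # df2 values (ignoring the "0" placeholders) that do not occur in df1
  unique_df2 = map_dbl(df2, ~length(setdiff(.x[.x != "0"], df1$header1)))
)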
sample_x sample_y sample_z
overlap 2 1 2
unique_df1 2 3 2
unique_df2 0 2 0

How to make a color bar using three columns?

I have a table like the following:
ID type group
A3EP 1 M
A3MA 2 M
A459 3 M
A3I1 5 M
A9D2 7 M
A3M9 4 M
A7XP 6 M
A4ZP 8 M
I want to make a color bar like the following: the red color represents "group", below that each color represents a "type", and below that I want the "ID" names.
Can anyone please tell me how to do this? Thank you.
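# df is the data frame given under "Input data" below
mypalette <- rainbow(8)
# one bar of height 0.5 per type, drawn edge to edge
barplot(rep(0.5,8), width=1, space=0, col=mypalette, axes=FALSE)
# ID labels, rotated by 90 degrees and centred in each bar
text(df$type-.5, .2, df$ID, srt=90)
# red strip for the group across the top of the bars, with its label
rect(0, .4, 8, .5, col="red")
text(4, .45, "M")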
Input data:
df <- structure(list(ID = structure(c(1L, 4L, 5L, 2L, 8L, 3L, 7L, 6L
), .Label = c("A3EP", "A3I1", "A3M9", "A3MA", "A459", "A4ZP",
"A7XP", "A9D2"), class = "factor"), type = 1:8, group = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "M")), .Names =
c("ID",
"type", "group"), row.names = c(NA, -8L), class = "data.frame")

R : Finding the corresponding row value

I'm trying to get the values from column one for the rows where column two is "B", and I need to collect those matching values in a list.
This has to work for 50,000 rows, of which around 37,000 match.
I'm incredibly new to this, so any help would be appreciated.
Data <- data.frame(
X = sample(1:10),
Y = sample(c("B", "W"), 10, replace = TRUE)
)
Count <- 1
If(data[count,2] == "B") {
List <- list(data[count,1]
Count <- count + 1
#I'm not sure what to use to repeat I just put
Repeat
} else {
Count <- count + 1
Repeat
}
The end result should be a list() containing only values from column one.
For example, if rows 1-5 had "B", I want the column one numbers from those rows.
Not sure if I understood correctly what you're looking for, but from the comments I would assume that this might help:
setNames(data.frame(Data[1][Data[2]=="B"]), "selected")
# selected
#1 2
#2 5
#3 7
#4 6
No loop needed.
data
Data <- structure(list(X = c(10L, 4L, 9L, 8L, 3L, 2L, 5L, 1L, 7L, 6L),
Y = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
.Label = c("B", "W"), class = "factor")),
.Names = c("X", "Y"), row.names = c(NA, -10L),
class = "data.frame")
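Since the question asks for a list(), here is a minimal sketch of the same vectorised selection returning a list instead of a data frame. It uses the data above written out with plain character values; selected and as_list are illustrative names:

```r
# Same values as the dput above
Data <- data.frame(X = c(10L, 4L, 9L, 8L, 3L, 2L, 5L, 1L, 7L, 6L),
                   Y = c("W", "W", "W", "W", "W", "B", "B", "W", "B", "B"))

# Logical test on column Y picks the matching values of column X in one step
selected <- Data$X[Data$Y == "B"]

# Wrap each value in a list element if a list is really needed
as_list <- as.list(selected)
print(selected)
```

The comparison runs over the whole column at once, so 50,000 rows need no counter and no repeat.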

R program, ?count, rename "freq" to something else

I am studying this webpage, and cannot figure out how to rename freq to something else, say "number of times imbibed".
Here is the dput of my data (bevs):
bevs <- structure(list(name = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("Bill", "Llib"), class = "factor"), drink = structure(c(2L,
3L, 1L, 4L, 2L, 3L, 1L, 4L), .Label = c("cocoa", "coffee", "tea",
"water"), class = "factor"), cost = 1:8), .Names = c("name",
"drink", "cost"), row.names = c(NA, -8L), class = "data.frame")
And this is working code with output. Again, I'd like to rename the freq column. Thanks!
library(plyr)
bevs$cost <- as.integer(bevs$cost)
count(bevs, "name")
Output
name freq
1 Bill 4
2 Llib 4
Are you trying to do this?
counts <- count(bevs, "name")
names(counts) <- c("name", "number of times imbibed")
counts
The count() function returns a data.frame. Just rename it like any other data.frame:
counts <- count(bevs, "name")
names(counts)[which(names(counts) == "freq")] <- "number of times imbibed"
print(counts)
# name number of times imbibed
# 1 Bill 4
# 2 Llib 4
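If plyr is not a requirement, base R can produce the count column under the desired name in one step. A sketch, assuming the same bevs data rebuilt with data.frame() and a hypothetical column name times_imbibed (a name without spaces is easier to use later with $):

```r
# Same values as the dput in the question
bevs <- data.frame(name = rep(c("Bill", "Llib"), 4),
                   drink = c("coffee", "tea", "cocoa", "water",
                             "coffee", "tea", "cocoa", "water"),
                   cost = 1:8)

# table() counts the names; responseName sets the name of the count column
counts <- as.data.frame(table(name = bevs$name),
                        responseName = "times_imbibed")
print(counts)
```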

Converting bar chart to pie chart in R

I have following data and code:
dd
grp categ condition value
1 A X P 2
2 B X P 5
3 A Y P 9
4 B Y P 6
5 A X Q 4
6 B X Q 5
7 A Y Q 8
8 B Y Q 2
dput(dd)
structure(list(grp = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("A", "B"), class = "factor"), categ = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("X", "Y"), class = "factor"),
condition = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("P",
"Q"), class = "factor"), value = c(2, 5, 9, 6, 4, 5, 8, 2
)), .Names = c("grp", "categ", "condition", "value"), out.attrs = structure(list(
dim = structure(c(2L, 2L, 2L), .Names = c("grp", "categ",
"condition")), dimnames = structure(list(grp = c("grp=A",
"grp=B"), categ = c("categ=X", "categ=Y"), condition = c("condition=P",
"condition=Q")), .Names = c("grp", "categ", "condition"))), .Names = c("dim",
"dimnames")), row.names = c(NA, -8L), class = "data.frame")
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)
How can I convert this bar chart to a pie chart? I want 4 pies here, with their sizes corresponding to the heights of the respective bars. I tried the following, but neither worked:
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)+coord_polar()
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)+coord_polar('y')
I also tried to make a pie chart similar to "Pie charts in ggplot2 with variable pie sizes", but I could not get it to work with my data. Thanks for your help.
Using the same idea as in the link you posted, you could add a column size to your dataframe that holds the sum of the values for each group, and use that as the width argument:
library(dplyr)
dd<-dd %>% group_by(categ,grp) %>% mutate(size=sum(value))
ggplot(dd, aes(x=size/2,y=value,fill=condition,width=size))+geom_bar(position="fill",stat='identity')+facet_grid(grp~categ)+coord_polar("y")
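To see what the size column contains without loading dplyr, the same per-group sum can be sketched in base R with ave(). The data frame below rebuilds the values of dd by hand:

```r
# Same values as dd in the question
dd <- data.frame(grp = rep(c("A", "B"), 4),
                 categ = rep(c("X", "X", "Y", "Y"), 2),
                 condition = rep(c("P", "Q"), each = 4),
                 value = c(2, 5, 9, 6, 4, 5, 8, 2))

# Sum of value within each categ/grp combination, repeated on every row
dd$size <- ave(dd$value, dd$categ, dd$grp, FUN = sum)
print(dd)
```

Each pie therefore gets a width equal to the total height of the corresponding stacked bar (6, 10, 17 and 8).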
You want both the group and the category to be facetting variables for the grid, not variables inside each plot. Here are two different layouts. x should be a single constant value, such as factor(1) or a string, so that each panel holds exactly one bar.
ggplot(dd, aes(x=factor(1),y=value,
fill=condition))+geom_bar(stat='identity')+
facet_grid(~grp+categ)+coord_polar("x")
ggplot(dd, aes(x=factor(1),y=value,
fill=condition))+geom_bar(stat='identity')+
facet_grid(grp~categ)+coord_polar("x")
Something strange happened with the opening at the top here; maybe it's just my interface. This should be enough to get you going, though!
