I have a data set and want to select all rows which contain the value 1 in R.
I tried
sel <- apply(data[,collist],1,function(row) "1" %in% row)
but it does not work and gives me the whole data frame.
[data set][1]
How can I subset these data?
Thanks
The Note at the end shows the data used in the examples below. I have changed the headings as shown since the ones provided in the question are unwieldy and have removed the column of minus signs.
1) Using that data, the correct answer to the question of selecting all rows with a 1 in any column is that only the first two data rows are selected and that is, in fact, what happens:
subset(data, A == 1 | B == 1 | C == 1)
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
2) This version does not make use of the headings:
has1 <- rowSums(data == 1) > 0
data[has1, ]
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
3) Although the above should work, it would be a bit safer to check only the numeric columns, which for this data can be done like this:
has1 <- rowSums(data[-1] == 1) > 0
data[has1, ]
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
4) or if we did not know which columns were numeric:
is.num <- sapply(data, is.numeric)
has1 <- rowSums(data[is.num] == 1) > 0
data[has1, ]
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
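Approach 4 can be wrapped in a small helper. This is only a sketch: the function name rows_with_value and its arguments are made up for illustration, and the sample data frame is rebuilt inline so that the block runs on its own.

```r
# Hypothetical helper (not part of the answer above): keep the rows in which
# any numeric column equals `value`.
rows_with_value <- function(df, value) {
  is_num <- vapply(df, is.numeric, logical(1))
  # Compare only the numeric columns; na.rm = TRUE treats NA cells as non-matches.
  hit <- rowSums(df[is_num] == value, na.rm = TRUE) > 0
  df[hit, , drop = FALSE]
}

# Same values as the data shown in the Note.
data <- data.frame(
  Sym = c("ACAP3", "ACTRT2", "AGRN", "ANKRD65", "ATAD3A"),
  A = c(0, 0, 0, 0, 0),
  B = c(0, 0, 0, 0, 0),
  C = c(1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

rows_with_value(data, 1)
```

Using vapply() instead of sapply() is a design choice here: it guarantees a logical vector even for a data frame with no columns.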
Note
As the question did not provide input in reproducible form, the input shown in such form is assumed to be:
Lines <- 'Hugo_Symbol "A - 3 A- A9J" "B - F2 - 7273 - 01" "C - FB - AAPP - 01"
ACAP3 0 0 - 1
ACTRT2 0 0 - 1
AGRN 0 0 - 0
ANKRD65 0 0 - 0
ATAD3A 0 0 - 0
'
data <- read.table(text = Lines, skip = 1, col.names = c("Sym", "A", "B", "X", "C"),
colClasses = c(NA, NA, NA, "NULL", NA))
The above produces this:
data
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
## 3 AGRN 0 0 0
## 4 ANKRD65 0 0 0
## 5 ATAD3A 0 0 0
I am tasked to create the vector
0 1 0 1 0 1 0 1 0 1
using two approaches without using c() or rep() in R.
I have tried a bunch of methods, but none of them seem to work.
Here are some of my attempts (all of which have failed) -
vector(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
a<-seq(from = 0, to = 1 , by = 1)
a
replicate(5, a)
b<-1*(0:1)
do.call(cbind, replicate(5, b, simplify=FALSE))
Any help on this would be appreciated! Thank you.
We can use bitwAnd
> bitwAnd(0:9, 1)
[1] 0 1 0 1 0 1 0 1 0 1
or kronecker
> kronecker(as.vector(matrix(1, 5)), 0:1)
[1] 0 1 0 1 0 1 0 1 0 1
> kronecker((1:5)^0, 0:1)
[1] 0 1 0 1 0 1 0 1 0 1
or outer
> as.vector(outer(0:1, (1:5)^0))
[1] 0 1 0 1 0 1 0 1 0 1
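The bitwAnd call works because the lowest bit of an integer is 1 exactly when the number is odd. A quick check, as a sketch that compares it with the remainder after division by 2 (the variable names are made up):

```r
# Bitwise AND with 1 keeps only the lowest bit, i.e. the parity of each integer.
parity_bit <- bitwAnd(0:9, 1L)
# The remainder after division by 2 gives the same 0/1 pattern.
parity_mod <- 0:9 %% 2L
all(parity_bit == parity_mod)
```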
Solution 1: Generalized Function my_rep()
A generalized solution my_rep() for any vector x you wish repeated n times
my_rep <- function(x, n) {
return(
# Use modulo '%%' to subscript the original vector (whose length I'll call "m"), by
# cycling 'n' times through its indices.
x[0:(length(x) * n - 1) %% length(x) + 1]
# 1 2 ... m 1 2 ... m 1 2 ... m
# | 1st cycle | | 2nd cycle | ... | nth cycle |
)
}
which can solve this case
my_rep(x = 0:1, n = 5)
# [1] 0 1 0 1 0 1 0 1 0 1
and many others
# Getting cute, to make a vector of strings without using 'c()'.
str_vec <- strsplit("a b ", split = " ")[[1]]
str_vec
# [1] "a" "b" ""
my_rep(x = str_vec, n = 3)
# [1] "a" "b" "" "a" "b" "" "a" "b" ""
Solution 2: Binary Vector of Arbitrary Length
Another quick solution, for a 0 1 0 1 ... 0 1 vector of arbitrary length l
# Whatever length you desire.
l <- 10
# Generate a vector of alternating 0s and 1s, of length 'l'.
(1:l - 1) %% 2
which yields the output:
[1] 0 1 0 1 0 1 0 1 0 1
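One caveat with 1:l is that for l = 0 it counts down (1, 0) instead of giving an empty sequence. A sketch that swaps in seq_len(), which is my substitution rather than part of the answer above; the wrapper name alternating is hypothetical:

```r
# seq_len(0) is empty, so a requested length of 0 yields an empty vector.
alternating <- function(l) (seq_len(l) - 1) %% 2

alternating(10)
alternating(0)
```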
Note
Special thanks to @Adam, who figured out 0:9 %% 2 on their own, shortly after my comment with that same solution; and who gracefully retracted their initial answer in favor of mine. :)
Exploiting boolean coercion.
+(1:10*c(-1, 1) > 0)
# [1] 0 1 0 1 0 1 0 1 0 1
Or without c().
+(1:10*(0:1*2) - 1 > 0)
# [1] 0 1 0 1 0 1 0 1 0 1
Here is a way using the apply functions.
unlist(lapply(1:5, function(x) 0:1))
# [1] 0 1 0 1 0 1 0 1 0 1
Similar but with replicate.
as.vector(replicate(5, 0:1))
# [1] 0 1 0 1 0 1 0 1 0 1
And just in case you love trig.
abs(as.integer(cos((1:10 * pi) / 2)))
# [1] 0 1 0 1 0 1 0 1 0 1
And here is one last one that I consider cheating just because. This one generalizes to any vector you want!
unlist(unname(read.table(textConnection("0 1 0 1 0 1 0 1 0 1"))))
We can use purrr::accumulate, and a simple negate(!) operation.
accumulate will perform the same operation recursively over its data argument and output all intermediate results.
In this case, it can be broken down into:
output[1] <-0
output[2] <-!output[1]
output[3] <-!output[2]
...
The output would then be c(0, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE), which is coerced to numeric.
purrr::accumulate(0:9, ~!.x)
[1] 0 1 0 1 0 1 0 1 0 1
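For readers without purrr, base R's Reduce() with accumulate = TRUE follows the same idea. This is a sketch of an equivalent, not the code from the answer; the names flip and result are made up:

```r
# Start from the first element (0) and repeatedly negate the previous result;
# the second argument (the next input value) is deliberately ignored.
flip <- function(previous, nxt) !previous
result <- Reduce(flip, 0:9, accumulate = TRUE)
# The intermediate results are simplified into one vector, so TRUE/FALSE
# are coerced to 1/0 alongside the initial 0.
result
```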
First we make a list of the given numbers and then apply the unlist() function to the list to convert it into a vector, as shown in the code below:
my_list = list(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
v = unlist(my_list)
print(v)
[1] 0 1 0 1 0 1 0 1 0 1
I have the following data frame in R:
ID source_field_1 field_1 source_field_3 field_3
ER-1 AC45U CD34I 1992-01-23 23/01/1992
ER-2 AB15X 1971-01-23 23/1/1971
ER-3 DB22U AC22Z 1962-11-13 3/11/1962
ER-4 CF12R BA23D 1992-01-23 23/01/1992
I need a grouped count of the character changes from column source_field_1 to field_1, for the letters A to Z and for the digits 0 to 9.
Required Output:
source_field_1 A B C D E . . . Z
A 1
B 1
C 1 1
D 1
E
F 1
.
. 1
. 1
Z
I need the same structure for numeric characters as well, for both field_1 and field_3.
df1 <- na.omit(df)
create <- function(from,to,nm)
{
s <- sprintf("[^%s]",paste0(nm,collapse = ""))
from <- unlist(strsplit(gsub(s,"",from),""))
to <- unlist(strsplit(gsub(s,"",to),""))
table(from,to)
}
create(df1$source_field_1,df1$field_1,0:9)
to
from 2 3 4
1 1 0 0
2 2 1 0
4 0 1 0
5 0 0 1
create(df1$source_field_1,df1$field_1,LETTERS)
to
from A B C D I Z
A 0 0 1 0 0 0
B 0 0 1 0 0 0
C 0 1 0 1 0 0
D 1 0 0 0 0 0
F 1 0 0 0 0 0
R 0 0 0 1 0 0
U 0 0 0 0 1 1
This is rather simple to achieve by splitting up each character and using the table function.
library(stringr)
df <- [your df]
out <- vector('list', nrow(df))
for(i in seq_along(out)){
#Split both columns
splitted_str <- str_split(unlist(df[i, c('source_field_1', 'field_1')]), '')
#Alternative in base R:
#strsplit(unlist(df[i, c('source_field_1', 'field_1')]), '')
#convert to factors, "levels" will be used in our columns
splitted_str <- lapply(splitted_str, factor, levels = LETTERS)
#Create table. dnn sets the names shown for column/rows
out[[i]] <- table(splitted_str, dnn = c('source_field_1', 'field_1'))
}
Note that I exploit the fact that factor(...) sets all values not in levels to NA, and that by default table(...) excludes these from the table.
This could all be combined into a single call:
out <- lapply(seq_len(nrow(df)),
function(x) table(lapply(str_split(unlist(df[x, c('source_field_1', 'field_1')]), ''), factor, levels = LETTERS), dnn = c('source_field_1', 'field_1'))
)
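The code above yields one table per row. If a single table of counts over all rows is wanted, the per-row tables can be added up. A base R sketch, assuming there are no missing values and that the two strings in each row have equal length; the example data and the names per_row and total are made up for illustration:

```r
# Hypothetical example data mirroring the complete rows of the question.
df <- data.frame(
  source_field_1 = c("AC45U", "DB22U", "CF12R"),
  field_1        = c("CD34I", "AC22Z", "BA23D"),
  stringsAsFactors = FALSE
)

# One 26 x 26 table per row, pairing characters by position.
# Digits become NA under levels = LETTERS and are dropped by table().
per_row <- lapply(seq_len(nrow(df)), function(i) {
  chars <- strsplit(unlist(df[i, c("source_field_1", "field_1")]), "")
  chars <- lapply(chars, factor, levels = LETTERS)
  table(chars, dnn = c("source_field_1", "field_1"))
})

# Element-wise sum of the tables gives the overall counts.
total <- Reduce(`+`, per_row)
total["U", "I"]
```

Replacing LETTERS with as.character(0:9) in the factor() call should give the corresponding table for digits.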
I have a data table and one of the columns is a bunch of 0's and 1's, just like vec below.
vec = c(rep(1, times = 6), rep(0, times = 10), rep(1, times = 11), rep(0, times = 4))
> vec
[1] 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
What I want to do is to split the data every time there's a change in that column from 0 to 1 or vice versa. Here is what I have done so far:
b = c(vec[1],diff(vec))
rowby = numeric(0)
for (i in 2:(length(b))) {
if (b[i] != 0) {
rowby <- c(rowby, i-1)
}
}
splitted_data <- split(vec, cumsum(c(TRUE,(1:length(vec) %in% rowby)[-length(vec)])))
There must be something right under my nose that I can't see. What is a correct way to do this? This works for the example above, but not generally.
Try
split(vec,cumsum(c(1, abs(diff(vec)))))
#$`1`
#[1] 1 1 1 1 1 1
#$`2`
#[1] 0 0 0 0 0 0 0 0 0 0
#$`3`
#[1] 1 1 1 1 1 1 1 1 1 1 1
#$`4`
#[1] 0 0 0 0
Or use rle
split(vec,inverse.rle(within.list(rle(vec), values <- seq_along(values))))
With current versions of data.table, rleid is one function which can be used for this job:
library(data.table)#v1.9.5+
split(vec,rleid(vec))
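The same run ids can be built in base R by comparing each element with its predecessor, which does not depend on the values being 0 and 1. A minimal sketch, assuming vec is non-empty; the names starts_run, run_id and runs are made up:

```r
vec <- c(rep(1, times = 6), rep(0, times = 10), rep(1, times = 11), rep(0, times = 4))

# TRUE wherever an element differs from its predecessor; the first element
# always starts a new run.
starts_run <- c(TRUE, vec[-1] != vec[-length(vec)])
# The cumulative sum turns the run starts into run ids 1, 2, 3, ...
run_id <- cumsum(starts_run)
runs <- split(vec, run_id)
lengths(runs)
```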
I would like to extract every row from the data frame my.data for which the first non-zero element is a 1.
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)
my.data
desired.result <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
1 1 1 2
0 1 0 0
', header = TRUE)
desired.result
I am not even sure where to begin. Sorry if this is a duplicate. Thank you for any suggestions or advice.
Here's one approach:
# index of rows
idx <- apply(my.data, 1, function(x) any(x) && x[as.logical(x)][1] == 1)
# extract rows
desired.result <- my.data[idx, ]
The result:
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
Probably not the best answer, but:
rows.to.extract <- apply(my.data, 1, function(x) {
no.zeroes <- x[x!=0] # removing 0
to.return <- no.zeroes[1] == 1 # checking whether the first non-zero number is 1
# if a row is all 0, then to.return will be NA
# this fixes that problem
to.return[is.na(to.return)] <- FALSE # if row is all 0
to.return
})
my.data[rows.to.extract, ]
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
1) Use apply to iterate over all rows:
first.element.is.one <- apply(my.data, 1, function(x) x[x != 0][1] == 1)
The function passed to apply compares the first ([1]) non-zero (x != 0) element of x with 1. It is called once for each row; in your example, x is a vector of four elements.
2) Use which to extract the indices of the candidate rows (this also removes NA values):
desired.rows <- which(first.element.is.one)
3) Select the rows of the matrix -- you probably know how to do this.
Bonus question: Where do the NA values mentioned in step 2 come from?
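The row-wise apply calls above can also be replaced by a vectorised lookup with max.col(). This is a sketch of an alternative, not taken from any of the answers; the intermediate names (m, nonzero, first_col, first_val, keep) are made up:

```r
my.data <- read.table(text = '
x1 x2 x3 x4
 0  0  1  1
 0  0  0  1
 0  2  1  1
 2  1  2  1
 1  1  1  2
 0  0  0  0
 0  1  0  0
', header = TRUE)

m <- as.matrix(my.data)
nonzero <- m != 0
# Column index of the first non-zero entry in each row (1 for all-zero rows).
first_col <- max.col(nonzero, ties.method = "first")
# Value at that position, picked with a two-column (row, column) index matrix.
first_val <- m[cbind(seq_len(nrow(m)), first_col)]
# Keep rows that have a non-zero entry and whose first non-zero entry is 1.
keep <- rowSums(nonzero) > 0 & first_val == 1
my.data[keep, ]
```

Because all-zero rows are excluded by the rowSums() condition, no NA values arise in this version.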
I have some qualitative data that I have coded into various categories, and I want to provide summaries for subgroups. The RQDA package is great for coding interviews, but I've struggled with creating summaries for open-ended survey responses. I've managed to export the coded file into HTML and copy/paste it into Excel. I now have 500 lines with all the categories in distinct columns; however, the same code may appear in different columns. For example, some data:
a <- c("ResponseA", "ResponseB", "ResponseC", "ResponseD", "NA")
b <- c("ResponseD", "ResponseC", "NA", "NA","NA")
c <- c("ResponseB", "ResponseA", "ResponseE", "NA", "NA")
d <- c("ResponseC", "ResponseB", "ResponseA", "NA", "NA")
df <- data.frame (a,b,c,d)
I'd like to be able to run something like
df$ResponseA <- recode (df$a | df$b | df$c, "
'ResponseA' = '1';
else='0' ")
df$ResponseB <- recode (df$a | df$b | df$c, "
'ResponseB' = '1';
else='0' ")
In short, I'd like to scan 9 columns and recode them into a single binary variable.
If I understand the question correctly, perhaps you can try something like this:
## Convert your data into a long format first
dfL <- cbind(id = sequence(nrow(df)), stack(lapply(df, as.character)))
## The next three lines are mostly cleanup
dfL$id <- factor(dfL$id, sequence(nrow(df)))
dfL$values[dfL$values == "NA"] <- NA
dfL <- dfL[complete.cases(dfL), ]
## `table` is the real workhorse here
cbind(df, (table(dfL[1:2]) > 0) * 1)
# a b c d ResponseA ResponseB ResponseC ResponseD ResponseE
# 1 ResponseA ResponseD ResponseB ResponseC 1 1 1 1 0
# 2 ResponseB ResponseC ResponseA ResponseB 1 1 1 0 0
# 3 ResponseC NA ResponseE ResponseA 1 0 1 0 1
# 4 ResponseD NA NA NA 0 0 0 1 0
# 5 NA NA NA NA 0 0 0 0 0
You can also try the following:
(table(rep(1:nrow(df), ncol(df)), unlist(df)) > 0) * 1L
#
# NA ResponseA ResponseB ResponseC ResponseD ResponseE
# 1 0 1 1 1 1 0
# 2 0 1 1 1 0 0
# 3 1 1 0 1 0 1
# 4 1 0 0 0 1 0
# 5 1 0 0 0 0 0
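A third possibility stays closer to the wording of the question: for each code, test whether it occurs anywhere in the row. This is a sketch under the assumption that missing entries are stored as the string "NA", as in the example data; the names codes and flags are made up:

```r
df <- data.frame(
  a = c("ResponseA", "ResponseB", "ResponseC", "ResponseD", "NA"),
  b = c("ResponseD", "ResponseC", "NA", "NA", "NA"),
  c = c("ResponseB", "ResponseA", "ResponseE", "NA", "NA"),
  d = c("ResponseC", "ResponseB", "ResponseA", "NA", "NA"),
  stringsAsFactors = FALSE
)

# All distinct codes, leaving out the "NA" placeholder string.
codes <- sort(setdiff(unique(unlist(df)), "NA"))
# For each code: does it occur in any column of the row? Stored as 0/1.
flags <- sapply(codes, function(code) rowSums(df == code) > 0) * 1L
cbind(df, flags)
```

Each column of flags is one binary variable, so a single one can be picked out with flags[, "ResponseA"].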