Select rows in a dataframe based on values of all columns [duplicate] - r

This question already has an answer here:
Subset dataframe such that all values in each row are less than a certain value
(1 answer)
Closed 6 years ago.
I feel this question is very simple, but I am new to R and I don't know how to solve it.
I have a dataframe df with 100 rows. The first column is Patient_ID and all the others are measurements of T cells over time. I want to select the rows (the patients) in which all the cell measurements are lower than 200.
My idea (maybe very complicated) was:
f200 = function(x){x < 200}
df2 = f200(df[,2:10])
select the rows where all elements are TRUE, i.e., where the product of all elements equals 1... But I don't know how to write this! Can you help me? Or tell me a simpler way?

We can try with Reduce and &:
df[Reduce(`&`, lapply(replace(df[-1], is.na(df[-1]), 0), `<`, 200)), ]
#  ID col1 col2
#1  1   NA   24
#2  2   20   NA
data
set.seed(24)
df <- data.frame(ID=1:4, col1 = c(NA, 20, 210, 30), col2 = c(24, NA, 30, 240))
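For completeness, the OP's original idea (a logical test followed by a check that every element in a row is TRUE) can also be written in base R with apply. This is only a sketch, assuming the same toy df as above; the intermediate object below is just an illustrative name, and NA measurements are treated as passing, mirroring the replace() trick in the answer:
below <- df[-1] < 200           # logical matrix: TRUE where a measurement is below 200
below[is.na(below)] <- TRUE     # treat missing measurements as passing
df[apply(below, 1, all), ]      # keep the rows where every comparison is TRUE
#  ID col1 col2
#1  1   NA   24
#2  2   20   NA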


Select factor values with level NA [duplicate]

This question already has answers here:
Select rows from a data frame based on values in a vector
(3 answers)
Closed 5 years ago.
How can I avoid using a loop to subset a dataframe based on multiple factor levels?
In the following example, my desired output is a dataframe that contains the rows of the original dataframe where the value in "Code" equals one of the values in "selected".
Working example:
#sample data
Code<-c("A","B","C","D","C","D","A","A")
Value<-c(1, 2, 3, 4, 1, 2, 3, 4)
data<-data.frame(cbind(Code, Value))
selected<-c("A","B") #want rows that contain A and B
#Begin subsetting
result<-data[which(data$Code==selected[1]),]
s1<-2
while(s1<length(selected)+1)
{
result<-rbind(result,data[which(data$Code==selected[s1]),])
s1<-s1+1
}
This is a toy example of a much larger dataset, so "selected" may contain a great number of elements and the data a great number of rows. Therefore I would like to avoid the loop.
You can use %in%
data[data$Code %in% selected,]
  Code Value
1    A     1
2    B     2
7    A     3
8    A     4
Here's another:
data[data$Code == "A" | data$Code == "B", ]
It's also worth mentioning that the subsetting factor doesn't have to be part of the data frame if it matches the data frame rows in length and order. In this case we made our data frame from this factor anyway. So,
data[Code == "A" | Code == "B", ]
also works, which is one of the really useful things about R.
Try this:
> data[match(as.character(data$Code), selected, nomatch = FALSE), ]
    Code Value
1      A     1
2      B     2
1.1    A     1
1.2    A     1
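If you are already working in the tidyverse, the same membership test can be used inside dplyr::filter(). This is just the %in% answer above written in dplyr syntax, a sketch assuming the same data and selected objects:
library(dplyr)
data %>% filter(Code %in% selected)
#  Code Value
#1    A     1
#2    B     2
#3    A     3
#4    A     4
Note that filter() resets the row names, so the original row numbers (1, 2, 7, 8) are not preserved.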

How to count the number of occurrences of the first character of each string in a column in R [duplicate]

This question already has answers here:
Counting unique / distinct values by group in a data frame
(12 answers)
Closed 4 years ago.
I have a data set which has a single column containing multiple names.
For example:
Alex
Brad
Christine
Alexa
Brandone
There are almost 100 records like this. I want to display the result as
A 2
B 2
C 1
That is, I need to show the frequencies from highest to lowest, and if there is a tie, the values should be shown in alphabetical order.
I have been trying to solve this but I am not able to. Any pointers?
df <- data.frame(name = c("Alex", "Brad", "Brad"))
first_characters <- substr(df$name, 1, 1)
result <- sort(table(first_characters), decreasing = TRUE)
# convert the frequency table to a data frame
data.frame(result)
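The sort(table(...)) call above orders by frequency only. To also break ties alphabetically, as the question asks, one option is to order on the counts and on the names together. A sketch using the sample names from the question (names_vec and freq are just illustrative names):
names_vec <- c("Alex", "Brad", "Christine", "Alexa", "Brandone")
freq <- as.data.frame(table(substr(names_vec, 1, 1)))
names(freq) <- c("letter", "count")
# descending count first, then alphabetical order within ties
freq[order(-freq$count, freq$letter), ]
#  letter count
#1      A     2
#2      B     2
#3      C     1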

Recursively compute difference between consecutive records within each group [duplicate]

This question already has answers here:
Calculate difference between values in consecutive rows by group
(4 answers)
Calculating the difference between consecutive rows by group using dplyr?
(2 answers)
Closed 5 years ago.
Good morning StackOverflow,
I have seen answers to similar questions; however, they do not consider the group_ID and are not efficient enough to be run on massive datasets.
I am struggling to find a solution to the following task:
within the consecutive elements of each group_ID, recursively compute the difference with the previous element, starting from the second element and going to the last element belonging to that group_ID.
Therefore, considering the following sample data:
data <- data.frame(time = c(1:3, 1:4),
                   group_ID = c(rep(c("1", "2"), c(3, 4))),
                   value = c(0, 400, 2000, 0, 500, 2000, 2120))
The expected result of the solution I am trying to find is:
solution_df <- data.frame(time = c(1:3, 1:4),
                          group_ID = c(rep(c("1", "2"), c(3, 4))),
                          difference = c(NA, 400, 1600, NA, 500, 1500, 120))
It is critical to bear in mind the dataset is massive and the solution must be efficient.
I hope the question was clear, otherwise please ask for further details.
You could use data.table for grouping and diff to calculate the differences.
library(data.table)
setDT(data)
data[, .(time = time,
         difference = c(NA, diff(value))), by = group_ID]
#   group_ID time difference
#1:        1    1         NA
#2:        1    2        400
#3:        1    3       1600
#4:        2    1         NA
#5:        2    2        500
#6:        2    3       1500
#7:        2    4        120
I don't know what is supposed to be recursive here.
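For comparison, a dplyr version of the same idea, using lag() to take the previous value within each group. This is only a sketch, assuming the same data as defined in the question (i.e. before setDT() is applied):
library(dplyr)
data %>%
  group_by(group_ID) %>%
  mutate(difference = value - lag(value)) %>%   # first value per group becomes NA
  ungroup()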

Extract matrix column with its name [duplicate]

This question already has an answer here:
How to subset matrix to one column, maintain matrix data type, maintain row/column names?
(1 answer)
Closed 5 years ago.
Let's say you have a matrix defined as
m1 = matrix(
  rbind(c(12, 8, 9), c(4100, 3600, 3200)),
  byrow = FALSE,
  nrow = 2,
  ncol = 3,
  dimnames = list(c("Days", "Amount"), c("Col1", "Col2", "Col3"))
)
Which yields:
       Col1 Col2 Col3
Days     12    8    9
Amount 4100 3600 3200
And you need to show (knowing the position of the column, here 3) the name of the column and its values, so that you have information about the parameters, like:
Days Amount
9 3200
But you also need to know the column name, which carries some real information about its values (e.g. a hotel name).
You could achieve the above with m1[, 3], as in the linked question, but how does one print it together with the column header (here "Col3")?
We can use drop = FALSE, without converting to a data.frame:
m1[, 3, drop = FALSE]
#       Col3
#Days      9
#Amount 3200
You could coerce m1 to a data.frame and slice the required column:
as.data.frame(m1)[3]
# OR
as.data.frame(m1)["Col3"]
#       Col3
#Days      9
#Amount 3200
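If only the column name itself is needed (for example to label output elsewhere), it can also be retrieved separately with colnames(); a small sketch to complement the answers above:
colnames(m1)[3]
#[1] "Col3"
# plain subsetting keeps the row names but drops the column name,
# which is why drop = FALSE (or the data.frame route) is used above
m1[, 3]
#  Days Amount 
#     9   3200 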

R Shuffle values in columns 3-6 based on values in columns 1 AND 2

I am very new to R and I am really trying to get better, but I have been stuck on the following problem for some time now:
I have a data frame with, let's say, 6 columns and 20 rows. What I need to do is shuffle my data per column, but only for columns 3-6, based on the values of columns 1 AND 2.
I will try to give an example to make it clearer: I am dealing with two quantified transcripts with two speakers each. Column 1 is a number for the Talk and column 2 is a number for the Speaker. So now I need to filter my data by Talk and Speaker and then randomly shuffle my data in all the other columns, and I need to repeat this for all talks and all speakers.
Does anyone have any ideas how to approach this?
We can try
library(data.table)
setDT(df1)[, lapply(.SD, function(x) x[sample(seq_along(x))]), .(Talk, Speaker)]
Or using dplyr
library(dplyr)
df1 %>%
  group_by(Talk, Speaker) %>%
  mutate_each(funs(.[sample(row_number())]))
data
set.seed(49)
df1 <- data.frame(Talk = rep(1:3, each = 3), Speaker = sample(1:3, 9, replace = TRUE),
                  col3 = rnorm(9), col4 = rnorm(9), col5 = rnorm(9), col6 = rnorm(9))
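mutate_each() and funs() have since been superseded in dplyr; with current versions the same per-group shuffle can be written with across(). This is only a sketch, assuming the same df1 and that the columns to shuffle are col3 through col6:
library(dplyr)
df1 %>%
  group_by(Talk, Speaker) %>%
  mutate(across(col3:col6, ~ .x[sample(length(.x))])) %>%   # permute each column within each group
  ungroup()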
