Count values per row in a data frame in R

I know there are other questions like this one, but none of them answers my specific problem.
In my data frame, I need to count the number of values in each row between columns 3 and 8.
I want something simple, like NB.VAL in Excel.
base_graphs$NB <- rowSums(!is.na(base_graphs)) # this counts all non-NA values, but I can't restrict it to specific columns
How can I create this new column "NB" in my data frame "base_graphs"?

You were really close:
base_graphs$NB <- rowSums(!is.na(base_graphs[, 3:8]))
The [, 3:8] subsets and selects columns 3 through 8.
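
As a quick check, here is a minimal sketch on made-up data. Only the name base_graphs and the 3:8 column range come from the question; the column names and values are hypothetical.

```r
# Hypothetical data: 8 columns, with some NAs in columns 3 to 8
base_graphs <- data.frame(
  id    = 1:3,
  label = c("a", "b", "c"),
  v1 = c(1, NA, NA),
  v2 = c(2, 5, NA),
  v3 = c(NA, 6, NA),
  v4 = c(3, NA, NA),
  v5 = c(4, 7, NA),
  v6 = c(NA, 8, 9)
)

# Count the non-NA values in columns 3 to 8 of each row
base_graphs$NB <- rowSums(!is.na(base_graphs[, 3:8]))
base_graphs$NB  # 4 4 1
```

Subsetting before calling is.na() also keeps the new NB column itself out of the count if the line is run a second time.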

apply can also apply a function to each row of a data frame. To count the non-NA values, try:
base_graphs$NB <- apply(base_graphs[3:8], 1, function(x) sum(!is.na(x)))

Related

Extracting values of a list containing several dataframes with the condition that a specific column of the data frame is positive by lapply?

Dear Stackoverflow members,
I am currently searching for a lapply command line to do the following:
I have a list with 95 data frames.
Each data frame has 6 columns, where column 3 contains positive and negative values. The corresponding values are in column 4.
What I want to do now is to extract, for each data frame, the positive values of column 3 and the corresponding values of column 4.
I had in mind that there are 2 possibilities:
1) using the lapply command, for which I need a function (which I have so far not been able to write), or
2) using a for loop, which is not working the way I want it to.
It would be wonderful if you could help me with that.
I wish you a nice weekend,
Chris

Assuming the dataframes are called df1, df2 etc, we can get them in a list using mget, select rows with positive values in 3rd column and select 3rd and 4th column.
list_df <- lapply(mget(paste0('df', 1:95)), function(df)
  subset(df, df[[3]] > 0, select = 3:4))
If the data is already in the list, we can do :
list_df <- lapply(All_data, function(df) subset(df, df[[3]] > 0, select = 3:4))
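
For illustration, a small sketch of the same idea on a toy list. The two data frames and their column names are hypothetical stand-ins for the 95 real ones, and plain `[` indexing with which() is used here instead of subset(); both drop rows where column 3 is NA.

```r
# Hypothetical list of two small data frames standing in for the 95 real ones
All_data <- list(
  a = data.frame(c1 = 1:3, c2 = 4:6, c3 = c(-1, 2, 3), c4 = c(10, 20, 30),
                 c5 = 0, c6 = 0),
  b = data.frame(c1 = 1:2, c2 = 3:4, c3 = c(5, -5), c4 = c(50, 60),
                 c5 = 0, c6 = 0)
)

# For each data frame, keep the rows where column 3 is positive
# and only columns 3 and 4; which() skips NA comparisons
list_df <- lapply(All_data, function(df) df[which(df[[3]] > 0), 3:4])

list_df$a  # two rows: (2, 20) and (3, 30)
list_df$b  # one row: (5, 50)
```

The result is again a list, with one filtered two-column data frame per input.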

Select whole columns by a word in the first row

I have a data frame (df) read from an Excel sheet. The first row of the data frame always contains "correct" or "wrong"; the other rows are filled with data.
Now I want to select all the columns where the first row says "correct" by using the function apply.
I tried:
apply(df,2,function(df) grepl ("correct",df))
The answer is just a data frame with TRUE and FALSE. How can I select the columns without losing the data in the other rows?

You shouldn't need a loop. The following should work,
df[,df[1,] == 'correct']

i <- sapply(df, function(x) x[1] =='correct')
df[,i]
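
A minimal sketch of the same selection on made-up data follows. The data frame is hypothetical, and drop = FALSE is an addition not in the answers above: without it, R returns a plain vector instead of a data frame when only one column matches.

```r
# Hypothetical data frame as it might come from a sheet: row 1 holds the
# labels, so every column is read as character
df <- data.frame(
  A = c("correct", "1", "2"),
  B = c("wrong",   "3", "4"),
  C = c("correct", "5", "6"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column, taken from the first row
keep <- unlist(df[1, ]) == "correct"

# drop = FALSE keeps the result a data frame even if a single column matches
selected <- df[, keep, drop = FALSE]
names(selected)  # "A" "C"
```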

Selecting different numbers of columns on each row of a data frame

This question is about selecting a different number of columns on every row of a data frame. I have a data frame:
df = data.frame(
START=sample(1:2, 10, replace=TRUE), END=sample(2:4, 10, replace=TRUE),
X1=rnorm(10), X2=rnorm(10), X3=rnorm(10), X4=rnorm(10)
)
I would like to have a way without loops to select columns (START[i]:END[i])+2 on row i for all rows of my data frame.

Base R solution
lapply(split(df,1:nrow(df)),function(row) row[(row$START+2):(row$END+2)])
Or something similar as given in the comment above (I would store the output in a list)
library(plyr)
alply(df,1,function(row) row[(row$START+2):(row$END+2)])
Edit per request of OP:
To get a TRUE/FALSE index matrix, use the following R base solution
idx_matrix=col(df)>=df$START+2&col(df)<=df$END+2
df[idx_matrix]
Note, however, that you lose some information here (compared to the list based solution).
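
To make the difference between the two results concrete, here is a sketch with fixed values in place of the random data from the question. The numbers are hypothetical, and seq_len() with direct row indexing is used as a variation on split().

```r
# Hypothetical fixed data in place of the random sample from the question
df <- data.frame(
  START = c(1, 2, 1), END = c(2, 4, 4),
  X1 = c(11, 21, 31), X2 = c(12, 22, 32),
  X3 = c(13, 23, 33), X4 = c(14, 24, 34)
)

# List-based result: row i keeps columns (START[i]:END[i]) + 2
picked <- lapply(seq_len(nrow(df)), function(i) {
  unlist(df[i, (df$START[i]:df$END[i]) + 2])
})
picked[[1]]  # X1 = 11, X2 = 12
picked[[2]]  # X2 = 22, X3 = 23, X4 = 24

# Index-matrix result: the same 9 values, but as one flat vector
idx_matrix <- col(df) >= df$START + 2 & col(df) <= df$END + 2
flat <- df[idx_matrix]
length(flat)  # 9
```

The list keeps track of which values belong to which row; the flat vector does not, which is the information loss mentioned above.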

How to remove rows where columns satisfy certain condition in data frame

I have a data frame that looks like this
df <- data.frame(cbind(1:10, sample(c(1:5), 10, replace=TRUE)))
# in real case the columns could be more than two
# and the column name could be anything.
What I want to do is remove all rows where the values in all columns
are smaller than 5.
How can I do that?

df[!apply(df,1,function(x)all(x<5)),]

First of all, please stop using cbind to create data.frames. You will be sorry if you continue. R will punish you. (cbind builds a matrix first, so columns of mixed types are coerced to a single type.)
df[ !rowSums(df <5) == length(df), ]
(The length() function returns the number of columns in a dataframe.)
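
A small reproducible sketch of this filter, using hypothetical fixed values instead of sample(). It uses ncol() and != in place of length() and the leading !, which should be equivalent for a data frame.

```r
# Hypothetical data with fixed values so the result is reproducible
df <- data.frame(a = c(1, 6, 3, 9), b = c(2, 1, 4, 7))

# df < 5 is a logical matrix; rowSums() counts the TRUEs per row.
# A row is dropped only when that count equals the number of columns.
kept <- df[rowSums(df < 5) != ncol(df), ]
kept$a  # 6 9
```

Rows 1 and 3 are removed because both of their values are below 5; row 2 stays because only one of its values is.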

Subset a dataframe based on a single condition applied to multiple columns

I've had a look through the existing subset Q&A's on this site and couldn't quite find what I was looking for.
I want to subset a data frame based on one condition (e.g. if the value is below 5). However, I only want the rows where the value in all of the columns is below 5.
For example using the iris dataset - I would like to select all the rows where columns 1-3 all have values below 5.
subdata <- iris[which(iris[,1:3]<5),]
This doesn't do it for me. I get lots of NA rows at the bottom of the subset data.
Any help much appreciated!

Try
subdata <- iris[apply(iris[,1:3] < 5, 1, all),]
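
The which() attempt fails because iris[,1:3] < 5 is a 150 x 3 logical matrix, so which() returns positions up to 450, and row indices beyond 150 come back as NA rows. As a sketch, a vectorised alternative to apply() (assuming columns 1 to 3 contain no NAs, which holds for iris):

```r
# iris ships with R; columns 1 to 3 are Sepal.Length, Sepal.Width, Petal.Length
below <- iris[, 1:3] < 5   # 150 x 3 logical matrix

# Keep a row only when all three comparisons are TRUE
subdata <- iris[rowSums(below) == 3, ]

# Same rows as the apply() version
same <- identical(subdata, iris[apply(below, 1, all), ])
```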
