R - counting with NA in dataframe [duplicate]

This question already has answers here:
ignore NA in dplyr row sum
(6 answers)
Closed 4 years ago.
Let's say that I have this data frame in R:
df <- read.table(text="
id a b c
1 42 3 2 NA
2 42 NA 6 NA
3 42 1 NA 7", header=TRUE)
I'd like to sum columns a, b and c into one new column, d, so the result should look like this:
id a b c d
1 42 3 2 NA 5
2 42 NA 6 NA 6
3 42 1 NA 7 8
My code below doesn't work because of the NA values. Please note that I have to choose which columns to add up, since my real data frame has some columns that I don't want included in the sum.
df %>%
mutate(d = a + b + c)

You can use rowSums() for this; it has an na.rm parameter that drops NA values:
df %>% mutate(d=rowSums(tibble(a,b,c), na.rm=TRUE))
Or, without dplyr, using just base R:
df$d <- rowSums(subset(df, select=c(a,b,c)), na.rm=TRUE)
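Since the real data frame has columns that must stay out of the sum, one option is to keep the chosen names in a character vector and index with it. A minimal base-R sketch on the question's data; `cols` is an illustrative name, not something from the original answers. Note that with na.rm = TRUE a row whose selected values are all NA sums to 0, not NA.

```r
# Example data from the question; the first field on each data line becomes the row name
df <- read.table(text = "
id a b c
1 42 3 2 NA
2 42 NA 6 NA
3 42 1 NA 7", header = TRUE)

# Columns to add up (illustrative; list only the ones you want summed)
cols <- c("a", "b", "c")

# Sum across the chosen columns, ignoring NA values
df$d <- rowSums(df[, cols], na.rm = TRUE)
df
```

Keeping the selection in `cols` means the id column (or any other column you want to exclude) is never touched.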

Related

R - Select all rows that have one NA value at most? [duplicate]

This question already has answers here:
How to delete rows from a dataframe that contain n*NA
(4 answers)
Closed 3 days ago.
I'm trying to impute my data and keep as many observations as I can. I want to select the observations that have at most one NA value from the PimaIndiansDiabetes2 data set in the mlbench package (loaded with data(PimaIndiansDiabetes2, package = "mlbench")).
For example:
Var1 Var2 Var3
1 NA NA
2 34 NA
3 NA NA
4 NA 55
5 NA NA
6 40 28
What I would like returned:
Var1 Var2 Var3
2 34 NA
4 NA 55
6 40 28
This code returns the rows with NA values, and I know I could combine the observations that have one NA value with the complete observations using merge(). I'm not sure how to extract those, though.
na_rows <- df[!complete.cases(df), ]
A base R solution:
df[rowSums(is.na(df)) <= 1, ]
Its dplyr equivalent:
library(dplyr)
df %>%
filter(rowSums(is.na(pick(everything()))) <= 1)
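Applied to the small example from the question, the base R version works like this: is.na(df) returns a logical matrix, and rowSums() counts each TRUE as 1, giving the number of missing values per row. A minimal sketch; `n_missing` and `kept` are illustrative names.

```r
# Example data from the question
df <- data.frame(
  Var1 = 1:6,
  Var2 = c(NA, 34, NA, NA, NA, 40),
  Var3 = c(NA, NA, NA, 55, NA, 28)
)

# Number of NAs in each row
n_missing <- rowSums(is.na(df))

# Keep rows with at most one missing value
kept <- df[n_missing <= 1, ]
kept
```

Changing the threshold (for example `<= 2`) adjusts how many missing values per row are tolerated.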

Rearrange a dataframe into a matrix based on column values in R? [duplicate]

This question already has answers here:
Pivoting data in R
(5 answers)
Closed 2 years ago.
I have a data frame with three columns: value, type and class.
It looks like this:
df = data.frame(value = c(1:10), type = c("a","b","c","a","b","b","c","a","b","b"),
class = c("aos","aos","ezx","ezx","kl","kl","wq","wq","us","us"))
value type class
1 a aos
2 b aos
3 c ezx
4 a ezx
5 b kl
6 b kl
7 c wq
8 a wq
9 b us
10 b us
I want to rearrange it into a matrix where the rows represent the different type values and the columns the class values, and where each cell holds the mean of the original value entries for that type/class combination.
The matrix I am looking for should look like this:
   aos  ezx   kl   wq   us
a    1    4         8
b    2       5.5       9.5
c         3         7
We can use tapply() from base R to build this structure:
with(df, tapply(value, list(type, class), FUN = mean))
# aos ezx kl us wq
#a 1 4 NA NA 8
#b 2 NA 5.5 9.5 NA
#c NA 3 NA NA 7
Or with pivot_wider(), making use of values_fn:
library(tidyr)
library(dplyr)
library(tibble)
df %>%
pivot_wider(names_from = class, values_from = value, values_fn = mean) %>%
column_to_rownames('type') %>%
as.matrix
# aos ezx kl wq us
#a 1 4 NA 8 NA
#b 2 NA 5.5 NA 9.5
#c NA 3 NA 7 NA
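The column order differs between the two outputs because tapply() sorts the groups by factor level, and a character vector's levels are alphabetical (so us comes before wq). A sketch, assuming you want the first-appearance order shown in the question: convert class to a factor with levels = unique(class) before calling tapply(). `class_f` and `m` are illustrative names.

```r
df <- data.frame(value = 1:10,
                 type = c("a","b","c","a","b","b","c","a","b","b"),
                 class = c("aos","aos","ezx","ezx","kl","kl","wq","wq","us","us"))

# Fix the column order to the order in which each class first appears
class_f <- factor(df$class, levels = unique(df$class))

# Mean value for every type x class combination; empty cells become NA
m <- tapply(df$value, list(df$type, class_f), FUN = mean)
m
```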

copy values from different columns based on conditions (r code)

I have data like the one in the picture, with two columns (Cday, Dday) that contain some missing values.
There can't be a row where there are values for both columns; there's a value on either one column or the other or in neither.
I want to create the column "new" that has copied values from whichever column there was a number.
Really appreciate any help!
Since no row has a value in both columns, you can just sum the two existing columns. Assuming your data frame is called df:
df$new <- rowSums(df[, 2:3], na.rm = TRUE)
This sums each row while ignoring NAs. Note that a row where both columns are NA gets 0 rather than NA. (You may need to adjust the column indices if your data has more columns than you've shown.)
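If rows where both columns are missing should stay NA (as in the coalesce() output below), one way is to reset them after the sum. A minimal sketch on the example data from the coalesce() answer; `day_cols` and `all_missing` are illustrative names, and it relies on the question's guarantee that no row has values in both columns.

```r
df <- data.frame(id = 1:8,
                 Cday = c(1, 2, NA, NA, 3, NA, 2, NA),
                 Dday = c(NA, NA, NA, 3, NA, 2, NA, 1))

day_cols <- c("Cday", "Dday")

# Sum the two columns, treating NA as absent
df$new <- rowSums(df[, day_cols], na.rm = TRUE)

# rowSums() yields 0 when every value in a row is NA; put NA back for those rows
all_missing <- rowSums(!is.na(df[, day_cols])) == 0
df$new[all_missing] <- NA
df
```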
The dplyr package has the coalesce() function, which returns the first non-missing value at each position:
library(dplyr)
df <- data.frame(id=1:8, Cday=c(1,2,NA,NA,3,NA,2,NA), Dday=c(NA,NA,NA,3,NA,2,NA,1))
new <- df %>% mutate(new = coalesce(Dday, Cday))
new
# id Cday Dday new
#1 1 1 NA 1
#2 2 2 NA 2
#3 3 NA NA NA
#4 4 NA 3 3
#5 5 3 NA 3
#6 6 NA 2 2
#7 7 2 NA 2
#8 8 NA 1 1

Data.table: rbind a list of data tables with unequal columns [duplicate]

This question already has answers here:
rbindlist data.tables with different number of columns
(1 answer)
Rbind with new columns and data.table
(5 answers)
Closed 4 years ago.
I have a list of data tables with unequal numbers of columns: some have 35 columns and others have 36.
I have these lines of code, but they generate an error:
> lst <- unlist(full_data.lst, recursive = FALSE)
> model_dat <- do.call("rbind", lst)
Error in rbindlist(l, use.names, fill, idcol) :
Item 1362 has 35 columns, inconsistent with item 1 which has 36 columns. If instead you need to fill missing columns, use set argument 'fill' to TRUE.
Any suggestions on how I can modify this so that it works properly?
No other package is needed: just set fill = TRUE in rbindlist(), e.g. model_dat <- rbindlist(lst, fill = TRUE) for your list.
Here's a minimal example of what you are trying to do:
library(data.table)
df1 <- data.table(m1 = c(1,2,3))
df2 <- data.table(m1 = c(1,2,3), m2=c(3,4,5))
df3 <- rbindlist(list(df1, df2), fill = TRUE)
print(df3)
m1 m2
1: 1 NA
2: 2 NA
3: 3 NA
4: 1 3
5: 2 4
6: 3 5
If I understand your question correctly, I can see two options for appending your data tables.
Option A: Drop the extra variable from the datasets that have it.
table$column_Name <- NULL
Option B: Create the variable, filled with missing values, in the incomplete datasets.
table$column_Name <- NA
Then call rbind().
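Option B can be generalized to a whole list: collect every column name that appears anywhere, add the missing ones as NA, and bind. A minimal base-R sketch for plain data frames; `lst`, `all_cols`, `add_missing_cols` and the column `extra` are hypothetical names. For data.tables, rbindlist(..., fill = TRUE) does the same job in one call, and d[cols] means something different on a data.table, so convert with as.data.frame() first if you use this helper.

```r
# Hypothetical list of data frames with partly different columns
lst <- list(
  data.frame(a = 1:2, b = c(10, 20)),
  data.frame(a = 3L, b = 30, extra = 99)
)

# Every column name that occurs in any element
all_cols <- unique(unlist(lapply(lst, names)))

# Add each missing column as NA, then put the columns in a common order
add_missing_cols <- function(d, cols) {
  for (col in setdiff(cols, names(d))) d[[col]] <- NA
  d[cols]
}

combined <- do.call(rbind, lapply(lst, add_missing_cols, cols = all_cols))
combined
```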
Try rbind.fill() from the plyr package.
Input data: three data frames with different numbers of columns
df1<-data.frame(a=c(1,2,3,4,5),b=c(1,2,3,4,5))
df2<-data.frame(a=c(1,2,3,4,5,6),b=c(1,2,3,4,5,6),c=c(1,2,3,4,5,6))
df3<-data.frame(a=c(1,2,3),d=c(1,2,3))
full_data.lst<-list(df1,df2,df3)
The solution
library("plyr")
rbind.fill(full_data.lst)
a b c d
1 1 1 NA NA
2 2 2 NA NA
3 3 3 NA NA
4 4 4 NA NA
5 5 5 NA NA
6 1 1 1 NA
7 2 2 2 NA
8 3 3 3 NA
9 4 4 4 NA
10 5 5 5 NA
11 6 6 6 NA
12 1 NA NA 1
13 2 NA NA 2
14 3 NA NA 3

Loop through columns and apply ddply [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
My data frame looks like this:
Stage Var1 var2 Var1 var2
A 1 11 9 12
A 2 NA 3 13
A NA NA 2 10
B 4 14 1 4
B NA NA 4 2
B 6 16 6 8
B 7 17 100 9
C 8 NA 4 6
C 9 19 34 12
C 10 NA 5 18
C 1 0 6 3
I would like to split the data frame by group using ddply and apply mean() to each group. Later this has to be looped over all the columns, hence I am trying something like this:
for(i in names(NewInput)){
NewInput[[i]] <- ddply(NewInput , "Model_Stage", function(x) {
mean.Cycle2 <- mean(x$NewInput[[i]])
})
}
The above code works fine without the for loop (i.e. ddply works fine with one variable). However, when I loop through the columns with a for loop, I get several warnings:
In loop_apply(n, do.ply): argument is not numeric or logical: returning NA
Question:
-> How to loop through ddply over all the variables using for loop?
-> Is it possible to use apply()?
Thank you.
-Chris
You can try
library(plyr)
ddply(df1, .(Stage), numcolwise(mean, na.rm=TRUE))
Other options include
library(dplyr)
df1 %>%
group_by(Stage) %>%
summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
Or
library(data.table)
setDT(df1)[, lapply(.SD, mean, na.rm=TRUE), Stage]
Or using base R
aggregate(.~Stage, df1, FUN=mean, na.rm=TRUE, na.action=NULL)
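A minimal check of the base R option on the posted data. The posted table repeats the names Var1 and var2, so this sketch assumes only the first pair of value columns to avoid duplicate names. The na.action = NULL argument matters: the formula method of aggregate() defaults to na.omit, which would drop every row containing any NA before grouping, while na.rm = TRUE is passed on to mean() so each column ignores only its own missing values.

```r
# First two value columns from the question's table (duplicate-named columns omitted)
df1 <- data.frame(
  Stage = c("A","A","A","B","B","B","B","C","C","C","C"),
  Var1  = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10, 1),
  var2  = c(11, NA, NA, 14, NA, 16, 17, NA, 19, NA, 0)
)

# Keep rows with NAs (na.action = NULL) and let mean() skip them (na.rm = TRUE)
res <- aggregate(. ~ Stage, data = df1, FUN = mean, na.rm = TRUE, na.action = NULL)
res
```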
