How to use the "apply" function with 2 conditions in R? - r

I have a string variable in a dataframe and want to delete the rows that contain strings like "A" or "B". I tried the following code, but it didn't work:
isna=apply(DATA[1], 2, function(x)x!="A"|"B")
isna=apply(DATA[1], 2, function(x)x!="A"||"B")

Is there a reason you need to use apply?
DATA <- data.frame(code=sample(LETTERS[1:5],10, replace = TRUE))
subset(DATA, code!="A" & code!="B")

If I understood what you need correctly, then this is also an option:
library(dplyr)
# an exemplary dataframe
df <- data.frame(col1 = sample(LETTERS[1:5], 20, replace = TRUE),
col2 = 1:20)
df
# the filter for choosing the rows
filter(df, !col1 %in% c("A", "B"))

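# each comparison has to be written out in full before combining with &
# (apply() over a one-column data frame returns a one-column logical matrix)
isna=apply(DATA[1], 2, function(x)(x!="A")&(x!="B"))
# drop = FALSE keeps DATA a data frame even when it has a single column
DATA <- DATA[isna, , drop = FALSE]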
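To spell out why the attempts in the question fail: in `x != "A" | "B"` the right-hand side of `|` is the bare string "B", which is not a logical value, so R stops with an error, and `||` is meant for single values rather than whole columns. A minimal sketch of the vectorised alternatives, assuming a hypothetical `DATA` with a `code` column standing in for the asker's string variable:

```r
# Hypothetical example data; `code` stands in for the asker's string column.
DATA <- data.frame(code = c("A", "C", "B", "D", "E", "A"),
                   value = 1:6)

# Each comparison must be complete on its own before combining with &.
keep <- DATA$code != "A" & DATA$code != "B"

# Equivalent, and easier to extend to more values.
keep_in <- !(DATA$code %in% c("A", "B"))

# drop = FALSE keeps a data frame even if only one column remains.
result <- DATA[keep_in, , drop = FALSE]
result
```

Neither version needs apply(), since comparison operators and %in% already work on whole columns.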

Related

Passing dataframe as argument to function

I am writing a function to process data from a huge dataframe (row by row) which always has the same column names. So I want to pass the dataframe itself as an argument to a function that reads out the information I need from the individual rows. However, when I try to use it as an argument, I can't read the information from it for some reason.
Dataframe:
DF <- data.frame("Name" = c("A","B"), "SN" = 1:2, "Age" = c("21,34,456,567,23,123,34", "15,345,567,3,23,45,67,76,34,34,55,67,78,3"))
My code:
List <- do.call(list, Map(function(DT) {
DT <- as.data.frame(DT)
aa <- as.numeric(strsplit(DT$Age, ","))
mean.aa <- mean(aa)
},
DF))
Trying this, I get a list with the column names, but all values are NULL.
Expected output :
My expected output is a list with length equal to the number of rows in the data frame. Under each list index there should be another list with the age of the corresponding row (and also other values from the same row of the data frame, later).
DF <- apply(data.frame("Name" = c("A","B"), "SN" = 1:2, "Age" = c("21,34,456,567,23,123,34", "15,345,567,3,23,45,67,76,34,34,55,67,78,3"), "mean.aa" = c(179.7143, 100.8571)), 1, as.list)
What am I doing wrong?
Here is one way :
DF <- data.frame("Name" = c("A","B"), "SN" = 1:2, "Age" = c("21,34,456,567,23,123,34", "15,345,567,3,23,45,67,76,34,34,55,67,78,3"))
apply(DF, 1, function(row){
aa <- as.numeric(strsplit(row["Age"], ",")[[1]])
row["mean.aa"] <- mean(aa)
as.list(row)
})
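Two things are worth knowing here. Map() over a data frame iterates over its columns, not its rows, which is why the original attempt never sees a row. And apply() converts the data frame to a matrix first, so in the answer above every element of each row, including SN and the new mean, ends up as a character string. If the original column types matter, one possible sketch (the name `row_list` is only illustrative) loops over the row indices instead:

```r
DF <- data.frame(Name = c("A", "B"), SN = 1:2,
                 Age = c("21,34,456,567,23,123,34",
                         "15,345,567,3,23,45,67,76,34,34,55,67,78,3"),
                 stringsAsFactors = FALSE)

# Loop over row indices so each column keeps its own type.
row_list <- lapply(seq_len(nrow(DF)), function(i) {
  row <- as.list(DF[i, ])                           # one row as a named list
  ages <- as.numeric(strsplit(row$Age, ",")[[1]])   # split the comma-separated string
  row$mean.aa <- mean(ages)                         # stays numeric
  row
})

str(row_list[[1]])
```

Here SN stays an integer and mean.aa stays numeric, at the cost of subsetting the data frame once per row, which may be slow for very large inputs.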

How to lapply() over selective columns? - R

I am a novice R programmer. I am wondering how to lapply over a dataframe while skipping certain columns.
# Some dummy dataframe
df <- data.frame(
grp = c("A", "B", "C", "D"),
trial = as.factor(c(1,1,2,2)),
mean = as.factor(c(44,33,22,11)),
sd = as.factor(c(3,4,1,.5)))
df <- lapply(df, function (x) {as.numeric(as.character(x))})
However, the method I used introduces NAs by coercion.
Is there a way to selectively (or deselectively) lapply over the dataframe while maintaining the integrity of the dataframe?
In other words, would there be a way to convert only mean and sd to numerics? (In general form)
Thank you
Try doing this:
df[,3:4] <- lapply(df[,3:4], function (x) {as.numeric(as.character(x))})
You are simply applying the function to the specified columns. You can also select the columns by a condition, for example by excluding the ones you don't want to convert. Use %in% rather than != for this, because != recycles the shorter vector and only gives the right answer by accident.
col = names(df)[!names(df) %in% c("grp","trial")]
df[,col] <- lapply(df[,col], function (x) {as.numeric(as.character(x))})
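For context, the NAs in the question come from running the conversion over every column, including `grp`, whose letters cannot be parsed as numbers. The detour through as.character() matters too, because as.numeric() on a factor returns the internal level codes rather than the labels. A small sketch of both points, using the question's data and setdiff() as one possible way to exclude columns by name:

```r
df <- data.frame(
  grp = c("A", "B", "C", "D"),
  trial = as.factor(c(1, 1, 2, 2)),
  mean = as.factor(c(44, 33, 22, 11)),
  sd = as.factor(c(3, 4, 1, .5)))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(df$mean)

# Going through as.character() recovers the values that were printed.
num_cols <- setdiff(names(df), c("grp", "trial"))
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(as.character(x)))

str(df)
```

Assigning into `df[num_cols]` rather than overwriting `df` keeps the result a data frame and leaves `grp` and `trial` untouched.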
Well as you might have guessed, there are many ways. Since you seem to be doing in place substitution, actually, a for loop would be suitable.
df <- data.frame(
grp = c("A", "B", "C", "D"),
trial = as.factor(c(1,1,2,2)),
mean = as.factor(c(44,33,22,11)),
sd = as.factor(c(3,4,1,.5)))
my_cols <- c("trial", "mean", "sd")
for(mc in my_cols) {
df[[mc]] <- as.numeric(as.character(df[[mc]]))
}
If you want to convert selectively by column names:
library(dplyr)
df %>%
mutate_if(names(.) %in% c("mean", "sd"),
function(x) as.numeric(as.character(x)))

Create Value in final column of dataframe based on multiple columns

I have a dataframe that looks like this (but with a lot more variables/columns)
set.seed(5)
id<-seq(5)*floor(runif(5,min=1000, max=10000))
vals1<-c("Y","N","N","N","N")
vals2<-c("N","N","N","N","N")
vals3<-c("N","N","N","Y","N")
df<-data.frame(id,vals1,vals2,vals3)
I'd like to create a final column in the frame that holds a flag with the following logic: if there is any value of 'Y' for an id, the final flag is 'Y'; otherwise it is 'N'. So, for this dataframe, the 1st and 4th ids (2801, 14236) have a 'Y' in the final column and the rest have an 'N'. I tried a few approaches like apply and if...else, to no avail.
Initialize by assigning "N" to every row. In the next step, for the rows with "Y" (check using apply), assign "Y".
df$final = "N"
df$final[apply(df, 1, function(a) "Y" %in% a)] = "Y"
A solution for your letter encoding below.
set.seed(5)
id <- seq(5) * floor(runif(5, min=1000, max=10000))
vals1 <- c("Y","N","N","N","N")
vals2 <- c("N","N","N","N","N")
vals3 <- c("N","N","N","Y","N")
df <- data.frame(id, vals1, vals2, vals3)
# If you really want to use the letter encoding, my solution works as below
df$Final <- apply(df[,2:4], MARGIN = 1, FUN = function(x) {any(x == 'Y')})
However, I think you should use a boolean (TRUE/FALSE) for this.
Works well in combination with apply and any
set.seed(5)
id <- seq(5) * floor(runif(5, min=1000, max=10000))
vals1 <- c("Y","N","N","N","N")
vals2 <- c("N","N","N","N","N")
vals3 <- c("N","N","N","Y","N")
df <- data.frame(id, vals1, vals2, vals3)
# Convert your labels into booleans:
df[,2:4] <- df[,2:4] == 'Y'
# Then summarise across rows
df$Final <- apply(df[,2:4], MARGIN = 1, FUN = function(x) {any(x)})
Somewhat similar to @d.b's answer:
df$final <- apply(df, 1, function(x) c("N","Y")[any(x == "Y")+1])
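The row-wise apply() calls above can also be replaced by a fully vectorised check. The sketch below assumes simplified, hypothetical ids and restricts the test to the flag columns, so a "Y" appearing in some unrelated column could not trigger the flag:

```r
# Hypothetical ids; the flag columns are the ones from the question.
df <- data.frame(id = 1:5,
                 vals1 = c("Y", "N", "N", "N", "N"),
                 vals2 = c("N", "N", "N", "N", "N"),
                 vals3 = c("N", "N", "N", "Y", "N"))

flag_cols <- c("vals1", "vals2", "vals3")

# Comparing the sub-frame with "Y" gives a logical matrix;
# rowSums() then counts the TRUE values in each row.
has_y <- unname(rowSums(df[flag_cols] == "Y") > 0)

df$final <- ifelse(has_y, "Y", "N")
df
```

Keeping `has_y` as a logical column is usually the more convenient choice; the ifelse() step is only needed if the letter encoding is required.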

How can I make list of R dataframe columns conditional on value of another column

Let's say that I have a dataframe:
df <- data.frame(VAR1 = c(1,2,3,4,5,6), VAR2 = c("A","A","A","B","B","B"))
and I want to make a list of VAR1 values grouped by each VAR2 level:
myList <- list(c(1,2,3), c(4,5,6))
I can use:
myList <- list(df[df$VAR2 == "A", ]$VAR1, df[df$VAR2 == "B", ]$VAR1)
Ideally, though, I'd like a more straightforward solution without hardcoding, because I have larger data with many levels in the factor variable.
We can use split
split(df$VAR1, df$VAR2)
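As a quick illustration of what split() returns for the question's data: the result is a list named after the levels of VAR2, and unname() can be used if the unnamed form from the question is needed (the variable name `by_level` is only illustrative):

```r
df <- data.frame(VAR1 = c(1, 2, 3, 4, 5, 6),
                 VAR2 = c("A", "A", "A", "B", "B", "B"))

# One list element per level of VAR2, named after the level.
by_level <- split(df$VAR1, df$VAR2)

# unname() drops the level names to match the unnamed list in the question.
myList <- unname(by_level)

by_level
```

The named form is often handier, since elements can be looked up with `by_level$A` or `by_level[["B"]]`.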

R filter rows : where clause : from dataframe

I am trying to filter a dataframe in R as follows.
Let mydf be the dataframe having two columns A and B.
Let udf be another dataframe having 1 column A.
I want to do the following.
Select rows from mydf where mydf[A] is in udf[A]
I am using dplyr and tried something on the lines as
T = filter(mydf, A %in% udf['A'])
That clearly doesn't work. Is there a straightforward workaround for this without explicitly writing for loop ? Thanks a lot!
You could use inner_join from dplyr
library(dplyr)
r1 <- inner_join(mydf, udf, by='A')
Or using filter, as commented by @BondedDust
r2 <- filter(mydf, A %in% udf[['A']])
identical(r1, r2)
#[1] TRUE
Or using data.table
library(data.table)
setkey(setDT(mydf),A)[udf, nomatch=0]
data
set.seed(24)
mydf <- as.data.frame(matrix(sample(1:10,2*10, replace=TRUE),
ncol=2, dimnames=list(NULL, LETTERS[1:2])) )
set.seed(29)
udf <- data.frame(A=sample(1:10,6,replace=TRUE))
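The reason the original `A %in% udf['A']` does not work is that single brackets return a one-column data frame, so %in% does not look inside the column, whereas `[[` (or `$`) extracts the underlying vector. A base R sketch with small, hypothetical data:

```r
# Hypothetical data standing in for the asker's mydf and udf.
mydf <- data.frame(A = c(1, 2, 3, 4, 5), B = c(10, 20, 30, 40, 50))
udf <- data.frame(A = c(2, 4, 9))

class(udf["A"])    # "data.frame": still a one-column data frame
class(udf[["A"]])  # "numeric": the plain vector that %in% needs

# Keep the rows of mydf whose A value occurs anywhere in udf$A.
matched <- mydf[mydf$A %in% udf[["A"]], ]
matched
```

Unlike a join, this approach never duplicates rows of mydf when udf contains repeated values.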
You can simply pipe the data and use the left_join function.
Here is a reproducible example for this:
Data:
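library(dplyr)
set.seed(123)
colors <- c(rep("yellow", 5), rep("blue", 5), rep("green", 5))
shapes <- c("circle", "star", "oblong")
numbers <- sample(1:15, replace = TRUE)
group <- sample(LETTERS, 15, replace = TRUE)
mydf <- data.frame(colors, shapes, numbers, group)
mydf
# the lookup table: only the yellow rows
mydf2 <- mydf %>%
  filter(colors == "yellow")
# note: left_join() keeps every row of mydf; use inner_join() or semi_join()
# instead to keep only the rows that have a match in mydf2
mydf3 <- mydf %>% left_join(mydf2)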
