if function for rowSums_Modify the code - r

I want to get summation over several columns and make a new column based on them. So I use
df$Sum <-rowSums(df[,grep("y", names(df))])
But sometimes df just includes one column and in this case, I will get the error. Since this function is part of my long programming procedure, I was wondering how I can make an if function in a way that If df[,grep("y", names(df))] includes just one column then get sum is equal to df[,grep("y", names(df))] otherwise if df[,grep("y", names(df))] have more at leat two columns get the summation over them?
suppose:
require(stats); require(graphics)
attach(cars)
cars$y1<-seq(20:69)
#cars$y2<-seq(30:79)
df<-cars
df$Sum <-rowSums(df[,grep("y", names(df))])

You can use drop = FALSE when subsetting:
df$Sum <-rowSums(df[,grep("y", names(df)), drop = FALSE])
This keeps df as a data frame even if you are selecting only one column.

Related

assign individual vector element to individual columns by rown and by name - r

I work mostly with data table but a data frame solution would work as well.
I have the result of an apply which returns this data structure
applyres=structure(c(0.0260, 3.6775, 0.92
), .Names = c("a.1", "a.2", "a.3"))
Then I have a data table
coltoadd=c('a.1','a.2','a.3')
dt <- data.table(variable1 = factor(c("what","when","where")))
dt[,coltoadd]=as.numeric(NA)
Now I would like to add the elements of applyres to the corresponding columns, just one row at a time, because applyres is calculated from another function. I have tried different assignments but nothing seems to work. Ideally I would like to assign based on column name, just in case the columns change order in one of the two structures.
This doesn't work
dt[1,coltoadd]=applyres
I also tried
dt[1,coltoadd := applyres]
And tried to change applyrest to a matrix or a data table and transpose.
I would like to do something like this
dt[1,coltoadd[i]]=applyres[coltoadd[i]]
But not sure if it should go in a loop, doesn't seem the best way to do it.
How do I avoid doing single assignments if I have a large number of columns?
1) data.frame Convert to data.frame, perform the assignments and convert back.
DF <- as.data.frame(dt)
DF[1, -1] <- applyres
# perform remaining of assignments
dt <- as.data.table(DF)
2) loop Another possibility is a for loop:
for(i in 2:ncol(dt)) dt[1, i] <- applyres[i-1]

Adding a column based on values of other columns

The variable Jaehrlichkeit is basically a factor with 3 levels: HQ30, HQ100, HQ300. I want R to read Jaehrlichkeit. If Jaehrlichkeit = HQ30, the copy the value from the column intHQ30 in the correponding row and paste it in the newly created column Intensitaet. Repeat this for HQ100 and HQ300.
I was trying to combine the mutate function with nested ifelse but keep getting errors. Can please someone help me out? or maybe suggest an easier solution?
We can do this with row/column indexing. Get the names of the columns that start with 'int' followed by 'HQ' and some numbers (\\d+) using grep. Then, get the column index for each row by matching the 'Jaehrlichkeit' with the substring of 'v1', cbind with the row sequence and use that to extract the values from the intHQ columns and assign it to create the 'Intensitaet'
v1 <- grep("^intHQ\\d+", names(sub1), value = TRUE)
sub1$Intensitaet <- sub1[v1][cbind(1:nrow(sub1),
match(sub1$Jaehrlichkeit, sub("int", "", v1)))]
Another option would be to split, and apply, i.e.
do.call(rbind, lapply(split(df, df$Jaehrlichkeit), function(i) {
i$Intensitaet <- i[[grep(i$Jaehrlichkeit[1], names(i))]]; i
}))
Since Jaehrlichkeit is of type factor, you could do this vectorized:
r <- sub1[,match(paste0("int", levels(sub1$Jaehrlichkeit)), names(sub1))]
sub1$Intensitaet <- r[cbind(seq(nrow(r)), as.numeric(sub1$Jaehrlichkeit))]
First you get the value of columns intHQ100, intHQ30 and intHQ300 in your data frame in the order of levels(sub1$Jaehrlichkeit).
Then you generate the indices and create the Intensitaet column.

How to apply ifelse function by column names?

I know there are many similar questions around but I'm afraid couldn't get my head around this particular one, though obviously it is very simple!
I am trying to write a simple ifelse function to be applied over a series of columns in a data frame by using column names (rather than numbers). What I try to do is to create a single u_all variable as shown below without typing column names repeatedly.
dat <- data.frame(id=c(1:20),u1 = sample(c(0:1),20,replace=T) , u2 = sample(c(0:1),20,replace=T) , u3 = sample(c(0:1),20,replace=T))
dat<-within(dat,u_all<-ifelse (u1==1 | u2==1 |u3==1,1,0))
dat
I tried many variants of apply but clearly I'm not on the right track as those grouping functions replicate the ifelse function on each column separately.
dat2 <- data.frame(id=c(1:20),u1 = sample(c(0:1),20,replace=T) , u2 = sample(c(0:1),20,replace=T) , u3 = sample(c(0:1),20,replace=T))
dat2<-cbind(dat2,sapply(dat2[,grepl("^u\\d{1,}",colnames(dat2))],
function(x){ u_all<-ifelse(x==1 & !is.na(x),1,0)}))
dat2
This line from the OP
dat<-within(dat,u_all<-ifelse (u1==1 | u2==1 |u3==1,1,0))
can instead be written as
dat$u_all <- +Reduce("|", dat[, c("u1", "u2", "u3")])
How it works, in terms of intermediate objects:
D = dat[, c("u1", "u2", "u3")] uses the names of the columns to subset the data frame.
r = Reduce("|", D) collapses the data by putting | between each pair of columns. The result is a logical (TRUE/FALSE) vector.
To convert r to a 0/1 integer vector, you could use ifelse(r,1L,0L) or as.integer(r) (since TRUE/FALSE converts to 1/0 by default) or just the unary +, like +r.
If you want to avoid using column names (it's really not clear to me from the post), you can construct D = dat[-1] to exclude the first column instead.
You were almost there, here's a solution using apply over rows and using all to transform a vector of tests to a single digit.
dat2$u_all <- apply(dat2[,-1], MARGIN=1, FUN=function(x){
any(x==1)&all(!is.na(x))*1
}
)

automatic column prefix with cbind and just one column

I have some trouble with a script which uses cbind to add columns to a data frame. I select these columns by regular expression and I love that cbind automatically provides a prefix if you add more then one column. Bit this is not working if you just append one column... Even if I cast this column as a data frame...
Is there a way to get around this behaviour?
In my example, it works fine for columns starting with a but not for b1 column.
df <- data.frame(a1=c(1,2,3),a2=c(3,4,5),b1=c(6,7,8))
cbind(df, log=log(df[grep('^a', names(df))]))
cbind(df, log=log(df[grep('^b', names(df))]))
cbind(df, log=as.data.frame(log(df[grep('^b', names(df))])))
A solution would be to create an intermediate dataframe with the log values and rename the columns :
logb = log(df[grep('^b', names(df))]))
colnames(logb) = paste0('log.',names(logb))
cbind(df, logb)
What about
cbw <- c("a","b") # columns beginning with
cbw_pattern <- paste0("^",cbw, collapse = "|")
cbind(df, log=log(df[grep(cbw_pattern, names(df))]))
This way you do select both pattern at once. (all three columns).
Only if just one column is selected the colnames wont fit.

How to apply operation and sum over columns in R?

I want to apply some operations to the values in a number of columns, and then sum the results of each row across columns. I can do this using:
x <- data.frame(sample=1:3, a=4:6, b=7:9)
x$a2 <- x$a^2
x$b2 <- x$b^2
x$result <- x$a2 + x$b2
but this will become arduous with many columns, and I'm wondering if anyone can suggest a simpler way. Note that the dataframe contains other columns that I do not want to include in the calculation (in this example, column sample is not to be included).
Many thanks!
I would simply subset the columns of interest and apply everything directly on the matrix using the rowSums function.
x <- data.frame(sample=1:3, a=4:6, b=7:9)
# put column indices and apply your function
x$result <- rowSums(x[,c(2,3)]^2)
This of course assumes your function is vectorized. If not, you would need to use some apply variation (which you are seeing many of). That said, you can still use rowSums if you find it useful like so. Note, I use sapply which also returns a matrix.
# random custom function
myfun <- function(x){
return(x^2 + 3)
}
rowSums(sapply(x[,c(2,3)], myfun))
I would suggest to convert the data set into the 'long' format, group it by sample, and then calculate the result. Here is the solution using data.table:
library(data.table)
melt(setDT(x),id.vars = 'sample')[,sum(value^2),by=sample]
# sample V1
#1: 1 65
#2: 2 89
#3: 3 117
You can easily replace value^2 by any function you want.
You can use apply function. And get those columns that you need with c(i1,i2,..,etc).
apply(( x[ , c(2, 3) ])^2, 1 ,sum )
If you want to apply a function named somefunction to some of the columns, whose indices or colnames are in the vector col_indices, and then sum the results, you can do :
# if somefunction can be vectorized :
x$results<-apply(x[,col_indices],1,function(x) sum(somefunction(x)))
# if not :
x$results<-apply(x[,col_indices],1,function(x) sum(sapply(x,somefunction)))
I want to come at this one from a "no extensions" R POV.
It's important to remember what kind of data structure you are working with. Data frames are actually lists of vectors--each column is itself a vector. So you can you the handy-dandy lapply function to apply a function to the desired column in the list/data frame.
I'm going to define a function as the square as you have above, but of course this can be any function of any complexity (so long as it takes a vector as an input and returns a vector of the same length. If it doesn't, it won't fit into the original data.frame!
The steps below are extra pedantic to show each little bit, but obviously it can be compressed into one or two steps. Note that I only retain the sum of the squares of each column, given that you might want to save space in memory if you are working with lots and lots of data.
create data; define the function
grab the columns you want as a separate (temporary) data.frame
apply the function to the data.frame/list you just created.
lapply returns a list, so if you intend to retain it seperately make it a temporary data.frame. This is not necessary.
calculate the sums of the rows of the temporary data.frame and append it as a new column in x.
remove the temp data.table.
Code:
x <- data.frame(sample=1:3, a=4:6, b=7:9); square <- function(x) x^2 #step 1
x[2:3] #Step 2
temp <- data.frame(lapply(x[2:3], square)) #step 3 and step 4
x$squareRowSums <- rowSums(temp) #step 5
rm(temp) #step 6
Here is an other apply solution
cols <- c("a", "b")
x <- data.frame(sample=1:3, a=4:6, b=7:9)
x$result <- apply(x[, cols], 1, function(x) sum(x^2))

Resources