How to divide each column by the following one in a data frame - r

I tried to create a function that divides every column by the one that follows it in a data frame. For example, if I have a data frame like this:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15
I would like to create a function that divides col1 by col2, col3 by col4, ..., col(n-1) by col(n) up to the end of the data frame, and prints a data frame that binds all the output lists.
I created a function that divides a column by the following one, but it does not loop over the columns.
bigfunction<-function(data,n){
n<-1
data[,1]
data[,n+1]
d<-(data[,n]/data[,n+1])
print(as.list(d))}

Vectorise that calculation!
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
df[c(1,3)]/df[c(2,4)]
# a c
#1 0.25 0.7000000
#2 0.40 0.7272727
#3 0.50 0.7500000
divdf <- function(data) {
data[seq(1,ncol(data),2)]/data[seq(2,ncol(data),2)]
}
divdf(df)
# a c
#1 0.25 0.7000000
#2 0.40 0.7272727
#3 0.50 0.7500000
You could add some further error checking to make sure you always have an even number of columns, etc., but this is the basic logic that you can build on.
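As a rough sketch of the error checking mentioned above, the function could refuse input with an odd (or zero) number of columns. The name divdf_checked and the error message are illustrative assumptions, not part of the original answer.

```r
# Hypothetical variant of divdf() that validates the column count first
divdf_checked <- function(data) {
  if (ncol(data) < 2 || ncol(data) %% 2 != 0) {
    stop("data must have an even, non-zero number of columns")
  }
  odd  <- seq(1, ncol(data), by = 2)  # numerator columns: 1, 3, 5, ...
  even <- seq(2, ncol(data), by = 2)  # denominator columns: 2, 4, 6, ...
  data[odd] / data[even]
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
res <- divdf_checked(df)
```

Calling divdf_checked(df[1:3]) stops with an error instead of silently recycling or dropping a column.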

You could try something like this:
fun1 <- function(df){
for (i in 1:ncol(df)){
if (i%%2 == 1){next}
else{
temp <- df[, i-1]/df[, i]
temp_df <- cbind(temp_df, temp)
}
}
return(temp_df)
}
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
temp_df <- data.frame(id = 1:nrow(df))
new_df <- fun1(df)
I have created a temporary data frame to keep cbind-ing the vectors to. You can remove the id column later on and change the column names as required. This assumes that you have an even number of columns (if not, it will simply ignore the last one).
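The clean-up described above (dropping the id column and renaming the results) might look like the sketch below. The name fun2 and the "_over_" naming scheme are assumptions made for illustration, and the temporary data frame is built inside the function rather than in the global environment. It assumes at least two columns.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Hypothetical variant of fun1 that keeps its temporary data frame local
fun2 <- function(df) {
  even <- seq(2, ncol(df), by = 2)           # positions of the divisor columns
  out <- data.frame(id = seq_len(nrow(df)))  # temporary id column to cbind onto
  for (i in even) {
    out <- cbind(out, df[, i - 1] / df[, i])
  }
  out$id <- NULL                             # drop the helper column
  names(out) <- paste(names(df)[even - 1], names(df)[even], sep = "_over_")
  out
}

new_df <- fun2(df)
```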

Related

How do I pass multiple columns from a dataframe as individual arguments to a custom function in R

I have a dataframe with a variable number of columns, and I want to pass some of them to a function.
For instance, let's say I have the dataframe df and I want to pass columns a and b as individual arguments to my custom function. The issue is that the list of column names of interest changes depending on the outcome of another operation and could be of any length.
df <- tibble(a = c(1:3), b = c(4:6), c=(7:9), d=c(10:12))
custom_function <- function(...){ do something }
custom_function(df$a, df$b)
I haven't found a clean way to achieve this. Any help would be great.
UPDATE
For better clarity, I need to add that the challenge is that the list of columns of interest is retrieved from another variable, for instance col_names <- c("a","b")
We can capture the ... as a list and then apply the function with do.call
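# data comes after ... so that it can only be supplied by name;
# otherwise the first unnamed column would be matched to data
custom_function <- function(fn, ..., data = NULL) {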
args <- list(...)
if(length(args) == 1 && is.character(args[[1]]) && !is.null(data)) {
args <- data[args[[1]]]
}
do.call(fn, args)
}
custom_function(pmax, df$a, df$b)
[1] 4 5 6
and we can pass ridge (from the survival package)
> custom_function(ridge, df$a, df$b)
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
attr(,"class")
[1] "coxph.penalty"
...
> custom_function(ridge, data = df, c("a", "b"))
a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
attr(,"class")
[1] "coxph.penalty"
attr(,"pfun")
...
> custom_function(ridge, data = df, col_names)
a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
attr(,"class")
...
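For the case in the update, where the column names arrive as a character vector, a self-contained sketch using only base R could look like this. It assumes data is declared after ... so that it is only ever matched by name, and it strips the column names with unname() so the columns are passed positionally. Both are choices made for this illustration.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
col_names <- c("a", "b")  # in practice this would come from another operation

custom_function <- function(fn, ..., data = NULL) {
  args <- list(...)
  # A single character vector plus a data frame means "look these columns up"
  if (length(args) == 1 && is.character(args[[1]]) && !is.null(data)) {
    args <- unname(as.list(data[args[[1]]]))
  }
  do.call(fn, args)
}

r1 <- custom_function(pmax, df$a, df$b)            # columns passed directly
r2 <- custom_function(pmax, col_names, data = df)  # columns looked up by name
r3 <- do.call(pmax, unname(as.list(df[col_names])))  # same idea without a wrapper
```

All three calls hand the same two vectors to pmax, so they should agree.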
If your outcome is a data.frame, one possibility is to use the tidyverse curly-curly operator, written {{ }}. The goal of this operator is to let an argument passed to our function refer to a column inside a dataframe.
Data
library(dplyr)
df <- tibble(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
Example
operations <- function(df,col1,col2){
df %>%
summarise(
n = n(),
addition = {{col1}} + {{col2}},
subtraction = {{col1}} - {{col2}},
multiplication = {{col1}} * {{col2}},
division = {{col1}} / {{col2}}
)
}
Output
operations(df,a,b)
# A tibble: 3 x 5
n addition subtraction multiplication division
<int> <int> <int> <int> <dbl>
1 3 5 -3 4 0.25
2 3 7 -3 10 0.4
3 3 9 -3 18 0.5

R- Insert rows of another dataframe after every row of dataframe?

I have 2 dataframes, mydata1 and mydata2, both 222x80.
I want to create a new dataframe in which, after every row of mydata1, I add the row of mydata2 with the same index.
I tried the transform function, but the output just duplicates the rows of the same dataframe.
I can't substitute all the column values.
If anyone has a suggestion, thank you!
insert.mydataFeat <- transform(mydata1, colnames(mydata1)=colnames(mydata2))
out.mydataFeat <- rbind(mydata1, insert.mydataFeat)
#reorder the rows
n <- nrow(mydata1)
out.mydataFeat<-out.mydataFeat[kronecker(1:n, c(0, n), "+"), ]
out.mydataFeat
You can use an indexing trick after combining the data with rbind.
mydata1 <- data.frame(col1 = 1:5, col2 = 'A')
mydata2 <- data.frame(col1 = 6:10, col2 = 'B')
combine_df <- rbind(mydata1, mydata2)
combine_df <- combine_df[rbind(1:(nrow(combine_df)/2),
((nrow(combine_df)/2) +1):nrow(combine_df)), ]
# col1 col2
#1 1 A
#6 6 B
#2 2 A
#7 7 B
#3 3 A
#8 8 B
#4 4 A
#9 9 B
#5 5 A
#10 10 B
where
rbind(1:(nrow(combine_df)/2), ((nrow(combine_df)/2) +1):nrow(combine_df))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 2 3 4 5
#[2,] 6 7 8 9 10
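The above creates a two-row matrix whose 1st row holds the row numbers from the first dataframe and whose 2nd row holds the row numbers from the second dataframe. When a matrix is used as a row index, R reads it column by column, so subsetting combine_df with it picks rows 1, 6, 2, 7, and so on.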
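The same trick can be wrapped in a small helper. This is only a sketch: the name interleave_rows and the check that both inputs have the same number of rows are assumptions added here.

```r
# Hypothetical helper: interleave the rows of two equally sized data frames
interleave_rows <- function(x, y) {
  stopifnot(nrow(x) == nrow(y))
  n <- nrow(x)
  # rbind() stacks the two index vectors; as.vector() reads the result
  # column by column, giving 1, n+1, 2, n+2, ...
  idx <- as.vector(rbind(seq_len(n), n + seq_len(n)))
  rbind(x, y)[idx, , drop = FALSE]
}

mydata1 <- data.frame(col1 = 1:5, col2 = "A")
mydata2 <- data.frame(col1 = 6:10, col2 = "B")
out <- interleave_rows(mydata1, mydata2)
```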

Generate all unique pairs of factor and align other variables in same way R

I have a data frame with unique personal IDs and measurements for each individual, e.g.
> personDat <- data.frame(personID = c("A","B","C","D"),value = rnorm(4))
> personDat
personID value
1 A -0.9246883
2 B 0.5175514
3 C -1.0109688
4 D 1.1614124
Now I need to create all unique pairs of individuals, which I can do with the combn function:
> perCombs <- t(combn(personDat$personID,2))
> perCombs
[,1] [,2]
[1,] A B
[2,] A C
[3,] A D
[4,] B C
[5,] B D
[6,] C D
Levels: A B C D
Now I would like to add two extra columns to perCombs, one for the corresponding measurement/value from the second column in personDat for the first personID and then the corresponding value for the second personID.
Basically I need to do a binary operation on the values of all unique pairs of personID, and if I have the columns I can do it in a fast vectorised manner.
EDIT: The naive way to do this would be:
perCombs <- data.frame(per1 = perCombs[,1],
per2 = perCombs[,2],
val1 = matrix(0,6,1),
val2 = matrix(0,6,1))
for(i in 1:6){
print(i)
perCombs[i,3] <- personDat[as.character(personDat$personID)==as.character(perCombs[i,1]),2]
perCombs[i,4] <- personDat[as.character(personDat$personID)==as.character(perCombs[i,2]),2]
}
We can use match
cbind(perCombs, `dim<-`(personDat$value[match(perCombs,
personDat$personID)], dim(perCombs)))
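To spell out what the match call does, here is a step-by-step sketch with made-up values (10, 20, 30, 40) so the result is easy to check by eye. It builds a data frame rather than using cbind, on the assumption that the IDs are character strings: cbind on a character matrix would turn the numeric values into strings.

```r
personDat <- data.frame(personID = c("A", "B", "C", "D"),
                        value = c(10, 20, 30, 40),  # made-up values
                        stringsAsFactors = FALSE)

pairs <- t(combn(personDat$personID, 2))  # 6 x 2 matrix of ID pairs

# match() treats the matrix as one long vector (column by column) and
# returns, for every ID, its row position in personDat
idx <- match(pairs, personDat$personID)

# Look up the values and restore the 6 x 2 shape
vals <- matrix(personDat$value[idx], nrow = nrow(pairs))

perCombs <- data.frame(per1 = pairs[, 1], per2 = pairs[, 2],
                       val1 = vals[, 1], val2 = vals[, 2])

# Any vectorised binary operation can now be applied to the two columns
perCombs$diff <- perCombs$val1 - perCombs$val2
```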

Insert nonexistent columns in matrix or dataframe in given order

I am on the lookout for a function in R that would check for the presence of particular columns, e.g.
cols=c("a","b","c","d")
in a matrix or dataframe that would insert a column with NAs in case any columns did not exist (in the position in which the columns are given in vector cols). Say if you had a matrix or dataframe with named columns "a", "d", that it would insert a column "b" and "c" filled up with NAs before column "d", and that any columns not listed in cols would be deleted (e.g. column "e"). What would be the easiest and fastest way to achieve this (I am dealing with a fairly large dataset of ca. 1 million rows)? Or is there already some function that does this?
I would separate the creation step and the ordering step. Here is an example:
cols <- letters[1:4]
## initialize test data set
my.df <- data.frame(a = rnorm(100), d = rnorm(100), e = rnorm(100))
## exclude columns not in cols
my.df <- my.df[ , colnames(my.df) %in% cols]
## add missing columns filled with NA
my.df[, cols[!(cols %in% colnames(my.df))]] <- NA
## reorder
my.df <- my.df[, cols]
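The three steps above could be collected into one function, sketched below. The name align_columns is an assumption, and drop = FALSE is added so that a single surviving column stays a data frame instead of collapsing to a vector.

```r
# Hypothetical wrapper around the exclude / add / reorder steps
align_columns <- function(x, cols) {
  # exclude columns not in cols
  x <- x[, colnames(x) %in% cols, drop = FALSE]
  # add missing columns filled with NA
  for (nm in setdiff(cols, colnames(x))) {
    x[[nm]] <- NA
  }
  # reorder
  x[, cols, drop = FALSE]
}

test.df <- data.frame(a = 1:2, d = 3:4, e = 5:6)
aligned <- align_columns(test.df, c("a", "b", "c", "d"))
```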
Another approach I just discovered uses match, but it only works for matrices:
# original matrix
matrix=cbind(a = 1:2, d = 3:4)
# required columns
coln=c("a","b","c","d")
colnmatrix=colnames(matrix)
matrix=matrix[,match(coln,colnmatrix)]
colnames(matrix)=coln
matrix
a b c d
[1,] 1 NA NA 3
[2,] 2 NA NA 4
Another possibility if your data is in a matrix
# original matrix
m1 <- cbind(a = 1:2, d = 3:4)
m1
# a d
# [1,] 1 3
# [2,] 2 4
# matrix with all columns, filled with NA
all.cols <- letters[1:4]
m2 <- matrix(nrow = nrow(m1), ncol = length(all.cols), dimnames = list(NULL, all.cols))
m2
# a b c d
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# replace columns in 'NA matrix' with values from original matrix
m2[ , colnames(m1)] <- m1
m2
# a b c d
# [1,] 1 NA NA 3
# [2,] 2 NA NA 4

merge multiple objects automatically using their names in a list with quotes in R

I have some objects with names a1, b1, c1 and would like to merge them all in a df using a list with their names instead of writing them down manually without quotes. The problem is that I don't know how to remove the quotes in order to merge them. Here's my code:
a1=rnorm(10)
b1=rnorm(10)
c1=rnorm(10)
mylist=c("a1","b1","c1")
mylist2=gsub('"',"",mylist)
myarray=merge(mylist2)
These aren't data.frames, so you can't merge them as you have in your example. Perhaps you meant to cbind them?
You can put your actual objects in a list and use do.call to cbind them...
mylist = list(a1,b1,c1)
do.call("cbind",mylist)
# [,1] [,2] [,3]
# [1,] 0.4221196 -1.2364700 1.71030549
# [2,] 0.2190202 -0.7730380 -0.27255412
# [3,] -0.1123769 -0.3365485 0.99418659
# [4,] 0.2940520 -1.2661584 -0.28545402
# [5,] 0.6301444 -1.3027926 -1.15401858
# [6,] 0.3505416 0.1636393 0.18114359
# [7,] -1.4592066 1.5832108 0.01407487
# [8,] -1.4251704 -1.1620232 -0.86712358
# [9,] -0.2840417 -2.3878617 0.57925139
#[10,] -0.9331564 1.1445266 -1.64355007
Of course here, you can just do cbind( a1, b1 , c1 ), so this notation is only handy if your vectors are in a list.
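If the object names really are only available as strings, as in the question, base R's mget() can look the objects up by name and return them as a named list. A minimal sketch, assuming the objects live in the environment the code is run from:

```r
a1 <- rnorm(10)
b1 <- rnorm(10)
c1 <- rnorm(10)
mylist <- c("a1", "b1", "c1")

objs <- mget(mylist)             # named list: list(a1 = ..., b1 = ..., c1 = ...)
myarray <- do.call(cbind, objs)  # 10 x 3 matrix, columns named after the objects
```

There is no need to strip quotes: the strings are used as names to look up, not as code.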
If you do want to merge them, the problem is that merge merges two data.frames, so you need a recursive function that adds a new data.frame to the result as you move through the list. Fortunately, such a function exists (it is actually quite simple to write), and it is in Hadley's reshape package...
a1 <- data.frame( ID = 1:10 , A = rnorm(10) )
b1 <- data.frame( ID = 1:10 , B = rnorm(10) )
c1 <- data.frame( ID = 1:10 , C = rnorm(10) )
mylist <- list( a1 , b1 , c1 )
require(reshape)
merge_recurse( mylist )
# ID A B C
#1 1 0.4922820 1.44436959 0.49294607
#2 2 1.0198506 0.80738257 -1.51090757
#3 3 0.2403974 0.47383044 -0.74280235
#4 4 0.9697800 -1.06054666 -1.11042732
#5 5 1.4001970 -0.30221304 1.62866212
#6 6 0.4705122 0.02784419 -0.05886697
#7 7 -0.4259260 0.29810051 0.77933144
#8 8 -0.5102871 0.36181297 1.51223053
#9 9 1.1900207 -0.60902034 -0.32316668
#10 10 -0.1694786 0.20842787 -0.33366816
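If you would rather not depend on reshape, a similar result can be sketched in base R with Reduce, which applies merge to the list pairwise from left to right. This is an alternative technique, not what merge_recurse does internally, and the explicit by = "ID" is an assumption that every data.frame shares that key column.

```r
set.seed(1)  # only so the example is reproducible
a1 <- data.frame(ID = 1:10, A = rnorm(10))
b1 <- data.frame(ID = 1:10, B = rnorm(10))
c1 <- data.frame(ID = 1:10, C = rnorm(10))
mylist <- list(a1, b1, c1)

# merge(merge(a1, b1), c1), generalised to a list of any length
merged <- Reduce(function(x, y) merge(x, y, by = "ID"), mylist)
```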
