Assuming I have a dataframe consisting of four columns
df1 <- data.frame(a=runif(10),b=runif(10),c=runif(10),d=runif(10))
And I want columns holding the products of all pairwise combinations of columns, excluding a column multiplied by itself:
a*b, a*c, a*d, b*c, b*d, c*d
The solution I'm looking for should work for any number of columns, not just four.
We can use combn to generate all combinations of the dataframe's column names taken 2 at a time, and pass it a custom function that subsets the two columns and multiplies them together.
combn(names(df1), 2, function(x) df1[x[1]] * df1[x[2]], simplify = FALSE)
This command returns a list of 6 dataframes (a*b, a*c, a*d, b*c, b*d, c*d) for the given example.
We could also use combn directly on the dataset: set m to 2 to select pairwise combinations of columns, and set FUN to Reduce with its argument f as `*` to multiply the corresponding elements of each pair of columns. With the default simplify = TRUE this returns a matrix with one column per pair.
combn(df1, 2, FUN = Reduce, f = `*`)
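As a rough sketch of how the result could be labelled, the same combn call on the column names can build matching names such as "a*b". The object name prods and the set.seed call are additions for illustration only.

```r
set.seed(1)  # only so the sketch is reproducible
df1 <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))

# one column per pair of original columns
prods <- combn(df1, 2, FUN = Reduce, f = `*`)

# build labels in the same pair order: "a*b", "a*c", ...
colnames(prods) <- combn(names(df1), 2, paste, collapse = "*")

dim(prods)
head(prods[, "a*b"])
```

Because both calls enumerate the pairs in the same order, the labels line up with the product columns, and this works for any number of columns in df1.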
I have a (5x4) matrix in R, named data, defined as follows:
set.seed(123)
data <- matrix(rnorm(5*4,mean=0,sd=1), 5, 4)
and I want to create 4 different matrices that follow this formula. Assume that data[,1] = [A1,A2,A3,A4,A5]. I want to create the following matrix:
     A1-A1 A1-A2 A1-A3 A1-A4 A1-A5
     A2-A1 A2-A2 A2-A3 A2-A4 A2-A5
G1 = A3-A1 A3-A2 A3-A3 A3-A4 A3-A5
     A4-A1 A4-A2 A4-A3 A4-A4 A4-A5
     A5-A1 A5-A2 A5-A3 A5-A4 A5-A5
Similarly for the other columns, I want to calculate all the G matrices (G1, G2, G3, G4) at once. How can I achieve that with the sapply function?
We can compute the elementwise pairwise differences of a column with outer:
outer(data[,1], data[,1], `-`)
To do this for every column, split the matrix by column (asplit with MARGIN = 2), then loop over the resulting list with lapply and apply outer to each column:
lapply(asplit(data, 2), function(x) outer(x, x, `-`))
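Since the question mentions sapply, here is a sketch of one way to get all four matrices in a single 3-dimensional array instead of a list. The name G is an assumption made for this example; slice G[, , k] would correspond to Gk.

```r
set.seed(123)
data <- matrix(rnorm(5 * 4, mean = 0, sd = 1), 5, 4)

# simplify = "array" stacks the 5x5 results along a third dimension
G <- sapply(seq_len(ncol(data)),
            function(j) outer(data[, j], data[, j], `-`),
            simplify = "array")

dim(G)     # rows x rows x columns of data
G[, , 1]   # the matrix built from data[, 1]
```

Entry G[i, j, k] is data[i, k] - data[j, k], so the diagonal of every slice is zero.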
New to R btw so I am sorry if it seems like a stupid question.
So basically I have a dataframe with 100 rows and 3 different columns of data. I also have a vector with 3 thresholds, one for each column. I was wondering how you would filter out the values of each column that are superior to the value of each threshold.
Edit: Sorry for the incomplete question.
So essentially what I would like to create is a function (that takes a dataframe and a vector of thresholds as parameters) that applies every threshold to its respective column of the dataframe (so there is one threshold for every column of the dataframe). The number of elements of each column that “respect” their threshold should then be put in a vector. So for example:
Column 1: values = 1,2,3. Threshold = 3 (only values lower than 3)
Column 2: values = 4,5,6. Threshold = 6 (only values lower than 6)
Output: A vector (2,2) since there are two elements in each column that are under their respective thresholds.
Thank you everyone for the help!!
Your example data:
df <- data.frame(a = 1:3, b = 4:6)
threshold <- c(3, 6)
One option to resolve your question is to use sapply(), which applies a function over a list or vector. In this case, I create a vector for the columns in df with 1:ncol(df). Inside the function, you can count the number of values less than a given threshold by summing the number of TRUE cases:
col_num <- 1:ncol(df)
sapply(col_num, function(x) {sum(df[, x] < threshold[x])})
Or, in a single line:
sapply(1:ncol(df), function(x) {sum(df[, x] < threshold[x])})
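Since the question asks for a function that takes the dataframe and the thresholds as parameters, the same idea could be wrapped up roughly as follows. This is only a sketch: the name count_below is made up for illustration, and it assumes "respecting" a threshold means being strictly lower than it, as in the example above.

```r
# Hypothetical helper (name chosen for illustration only).
# Counts, per column, how many values are strictly below that column's threshold.
count_below <- function(df, thresholds) {
  # one threshold per column is assumed
  stopifnot(length(thresholds) == ncol(df))
  # walk over columns and thresholds in parallel
  unname(mapply(function(col, th) sum(col < th), df, thresholds))
}

df <- data.frame(a = 1:3, b = 4:6)
threshold <- c(3, 6)
count_below(df, threshold)
```

mapply pairs each column with its own threshold, so there is no need to index into df and threshold by position.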
I have a matrix like
mat <- matrix(sample(100,100,replace=TRUE),nr=10)
I would now like to remove the 3 biggest values of each column so I would then have a new matrix with 7 rows.
I tried to make vectors of each column and then remove the 3 biggest values there with
x1 = x[x!=max(x)]
x2 = x1[x1!=max(x1)]
x3 = x2[x2!=max(x2)]
and then put the vectors into a new matrix, but as my matrices sometimes have a lot of columns I'd like to find an easier way.
Thanks for your help
We could loop through the columns using apply with MARGIN=2, sort each column and remove the three highest values with head
apply(mat, 2, FUN=function(x) head(sort(x),-3))
Or if we want to keep the order, use rank to get the numeric index, get a logical index by comparing with 1:3, negate (!) and subset the columns.
apply(mat, 2, FUN=function(x) x[!rank(-x, ties.method='first') %in% 1:3])
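To see how the two approaches differ, here is a small hand-made matrix (the values are invented for illustration, including a tie in the second column):

```r
m <- matrix(c(5, 3, 9, 7, 1,
              2, 8, 8, 4, 6), ncol = 2)

# sorted: the remaining values come out in increasing order
sorted <- apply(m, 2, function(x) head(sort(x), -3))

# kept: the remaining values stay in their original row order
kept <- apply(m, 2, function(x) x[!rank(-x, ties.method = "first") %in% 1:3])

sorted
kept
```

In the first column both versions keep 1 and 3, but sorted returns them as 1, 3 while kept returns 3, 1. In the second column the tied 8s each count as one of the three removed values because of ties.method = "first".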
I've been having problems with this one for a while.
What I would like, is to apply a function to a data.frame that is divided by factors. This data frame has n>2 columns of values that I need to use for this function.
For the sake of this example, this dataset has a column of 5 factors (a,b,c,d,e), and 2 columns of values (values1,values2). I would like to apply a number of functions that take into account each column of values (auto.arima first and forecast.Arima, in this case). A dataset to play with follows:
library(forecast)
set.seed(2)
dat <- data.frame(factors = letters[1:5],values1 = rnorm(50), values2 =rnorm(50))
I would like (for the sake of the exercise) to apply auto.arima to values1 and values2, per factor. My expected output would be something that, per factor, takes into account both columns of values and forecasts both (each as its own univariate time series). So if the dataset has 5 factors and 2 columns of values, I would need 10 lists/data.frames.
Some options that did not work: Splitting the data.frame per factor via:
split(dat, dat$factor)
And then using rapply:
rapply(dat,function(x) forecast.Arima(auto.arima(x)),dat$factors)
Or lapply:
lapply(split(dat,dat$factors), function(x) forecast.Arima(auto.arima(x)))
And some other combinations, all to no avail.
I thought that the easiest solution would involve a function in the apply family, but any solution would be valid.
Is this what you're looking for?
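library(reshape2)  # provides melt()
# reshape to long format: one row per (factor, variable, value)
m = melt(dat, id.vars = "factors")
l = split(m, paste(m$factors, m$variable))
# if forecast.Arima is not found (newer forecast releases), call forecast() instead
lapply(l, function(x) forecast.Arima(auto.arima(x$value)))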
i.e. splitting the data into 10 different frames, then applying the forecast on the values?
The problem with your apply solutions is that you were passing the whole dataframe to the auto.arima function, which takes a vector, so you'd need something like this:
lapply(split(dat,dat$factors), function(df) {
  apply(df[,-1], 2, function(col) forecast.Arima(auto.arima(col)))
})
This splits the dataframe as before on the factors and then applies over each column (ignoring the first which is the factor) the auto.arima function wrapped in forecast.Arima. This returns a list of lists (5 factors by 2 values) so allows you to keep values1 and values2 separate.
You can use unlist(x, recursive=FALSE) to flatten this to a list of 10.
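To show the shape of the result without needing the forecast package, here is a sketch that swaps in a stand-in function for the model fit. fit_one is hypothetical and only returns a small summary list; in practice it would be replaced by the auto.arima/forecast call.

```r
set.seed(2)
dat <- data.frame(factors = letters[1:5], values1 = rnorm(50), values2 = rnorm(50))

# Stand-in for the real model fit (hypothetical, for illustration only)
fit_one <- function(col) list(n = length(col), mean = mean(col))

# list of 5 factors, each holding a list of 2 results
nested <- lapply(split(dat[-1], dat$factors), function(df) lapply(df, fit_one))

# flatten one level: a list of 10, named like "a.values1"
flat <- unlist(nested, recursive = FALSE)
length(flat)
names(flat)
```

With recursive = FALSE only the outer level is flattened, so each of the 10 elements is still a complete result object.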
I have a very general question on data manipulations in R, and I am seeking a convenient and fast way. Suppose I have a matrix of dimension (R)-by-(nxm), i.e. R rows and n times m columns.
set.seed(999)
n = 5; m = 10; R = 100
ncol = m*n
mat = matrix(rnorm(n*m*R), nrow=R, ncol=ncol)
Now I want to have a new matrix (call it new.mat) of dimension (R)-by-(m), i.e. given a certain row of mat, I want to calculate a number (say sum) for the first n elements, then a number for the next n elements, and so on. In this way, the first row of mat ends up with m numbers. The same thing is done for every other row of mat.
For the given example above, the 1st element of the 1st row of the new matrix new.mat should be sum(mat[1,1:5]), the 2nd element is sum(mat[1,6:10]), and the last element is sum(mat[1,46:50]). The 2nd row of new.mat is (sum(mat[2,1:5]), sum(mat[2,6:10]),...).
If possible, avoiding for loops is preferred. Thank you!
rowsum is a useful function here. You will have to do a bit of transposing to get what you want.
You need to create a grouping vector that assigns each column to its block, something like c(1,1,1,1,1,2,2,2,2,2,....,10,10,10,10,10):
grp <- rep(seq_len(ceiling(ncol(mat)/5)), each = 5, length.out = ncol(mat))
# this will also work, but may be less clear why.
# grp <- (seq_len(ncol(mat))-1) %/%5
rowsum computes column sums across rows of a numeric matrix-like object for each level of a grouping variable
You are looking for row sums across columns, so you will have to transpose your results (and your input)
t(rowsum(t(mat),grp))
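Putting the pieces together, a sketch that builds new.mat and checks it against the sums spelled out in the question. It uses n and m from the setup instead of the hard-coded 5, which is a small variation on the grouping line above.

```r
set.seed(999)
n <- 5; m <- 10; R <- 100
mat <- matrix(rnorm(n * m * R), nrow = R, ncol = m * n)

# columns 1..n belong to group 1, the next n to group 2, and so on
grp <- rep(seq_len(m), each = n)

new.mat <- t(rowsum(t(mat), grp))
dim(new.mat)

# spot checks against the definitions in the question
all.equal(unname(new.mat[1, 1]), sum(mat[1, 1:5]))
all.equal(unname(new.mat[1, 10]), sum(mat[1, 46:50]))
```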