How to repeatedly add values in data frame rows? - r

I illustrate my question with a small data frame such as:
X1 X2 X3
1 0 1 2
2 0 1 3
3 0 1 4
4 0 2 3
5 0 2 4
6 0 3 4
7 1 2 3
8 1 2 4
9 1 3 4
10 2 3 4
(The real one will have a huge number of rows...)
I have to expand each row of this data frame with 12 additional values, considering that the 3 values already present are the 3 starting terms of a series defined by the recurrence equation:
U(n) = U(n-1) - Min(U(n-2), U(n-3))
Consider for example the 1st row with 0, 1, 2. The next term (4th) has to be :
2 - Min(1, 0) = 2 - 0 = 2
etc. At the end, my first row will be :
0 1 2 2 1 -1 -2 -1 1 3 4 3 0 -3 -3
And I have to repeat this operation on each row of my initial data frame. Of course, I know I can use intricate loops "for {***}" to do this, but it's time-consuming.
Is there any way to build the final data frame column by column? (I mean not listing the rows but constructing at once entire columns based on the recurrence equation)

You do not have to write an 'intricate loop' and work row by row. You can write just one simple loop and calculate column by column such as:
# recreate the sample dataframe
data <- data.frame(X1 = c(rep(0, 6), 1,1,1,2),
X2 = c(1,1,1,2,2,3,2,2,3,3),
X3 = c(2,3,4,3,4,4,3,4,4,4))
# create placeholder dataframe full of zeros
temp <- data.frame(matrix(data = 0, nrow = 10, ncol = 15))
# write the original dataframe into the placeholder dataframe
temp[, 1:3] <- data
# for columns 4 to 15 in the placeholder dataframe
for(i in 4:15)
{
# calculate each column based on the given formula
temp[, i] <- temp[, i-1] - pmin(temp[,i-2], temp[, i-3])
}
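The same column-by-column idea can be wrapped in a small function working on a plain numeric matrix, which avoids hard-coding the number of rows. This is only a sketch: the name extend_rows and the n_terms argument are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# 'start' holds the three starting terms U(1)..U(3), one series per row.
extend_rows <- function(start, n_terms = 15) {
  start <- as.matrix(start)
  stopifnot(ncol(start) == 3, n_terms >= 4)
  out <- matrix(0, nrow = nrow(start), ncol = n_terms)
  out[, 1:3] <- start
  for (i in 4:n_terms) {
    # U(n) = U(n-1) - min(U(n-2), U(n-3)), computed for all rows at once
    out[, i] <- out[, i - 1] - pmin(out[, i - 2], out[, i - 3])
  }
  out
}

start <- rbind(c(0, 1, 2), c(0, 1, 3))
res <- extend_rows(start)
res[1, ]
# 0 1 2 2 1 -1 -2 -1 1 3 4 3 0 -3 -3  (the first row given in the question)
```

The loop runs once per new column (12 times here) regardless of how many rows there are, so the row count only affects the vectorised pmin call.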

Related

Use if-else function on data frame with multiple values

I have a data frame that contains multiple values in each spot, like this:
ID<-c(1,1,1,2,2,2,2,3,3,4,4,4,5,6,6)
W<-c(29,72,32,33,34,44,42,78,32,42,18,26,10,34,39)
df1<-data.frame(ID, W)
library(plyr)
df<-ddply(df1, .(ID), summarize,
X=paste(unique(W),collapse=","))
ID X
1 1 29,72,32
2 2 33,34,44,42
3 3 78,32
4 4 42,18,26
5 5 10
6 6 34,39
I am trying to generate another column using an if-else function so that every ID that has an X value greater than 70 will show a 1, and all others will show a 0, like this:
ID X Y
1 1 29,72,32 1
2 2 33,34,44,42 0
3 3 78,32 1
4 4 42,18,26 0
5 5 10 0
6 6 34,39 0
This is the code that I tried:
df$Y <- ifelse(df$X>=70, 1, 0)
But it doesn't work; it only seems to put the first value of each spot through the function:
ID X Y
1 1 29,72,32 0
2 2 33,34,44,42 0
3 3 78,32 1
4 4 42,18,26 0
5 5 10 0
6 6 34,39 0
It worked fine on my one column that has only one value per spot. Is there a way to get the if-else function to evaluate every value in each spot and assign a 1 if any of them fit the statement?
Thank you, I'm sorry that I do not know a lot of R vocabulary yet.
Because 'X' is a string, we can split it at the commas with strsplit to get a list of character vectors, loop over that list with map_int, convert each vector to numeric, and check whether any value is greater than 70:
library(dplyr)
library(purrr)
df %>%
mutate(Y = map_int(strsplit(X, ","), ~ +(any(as.numeric(.x) > 70))))
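For anyone who prefers to stay in base R, roughly the same thing can be sketched with strsplit and vapply. The data frame is rebuilt by hand here (an assumption, so that the snippet runs without plyr). It also shows why the original attempt misbehaves: X is text, so df$X >= 70 compares strings character by character instead of comparing numbers.

```r
# Sample data typed in by hand so the snippet is self-contained
df <- data.frame(ID = 1:6,
                 X = c("29,72,32", "33,34,44,42", "78,32", "42,18,26", "10", "34,39"),
                 stringsAsFactors = FALSE)

# The original attempt: 70 is converted to "70" and the comparison is textual,
# so only strings that sort after "70" come out TRUE
df$X >= 70

# Split each string at the commas, convert to numbers, test each group
parts <- strsplit(df$X, ",", fixed = TRUE)
df$Y <- vapply(parts, function(p) as.integer(any(as.numeric(p) > 70)), integer(1))
df$Y
# 1 0 1 0 0 0
```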

Creating a matrix with extra row and column information in R

I am trying to create a matrix of certain pairwise values, first by doing the calculations in a matrix and then melting it and joining in some extra information. I would also like to include that extra information on the columns and rows, so that I get something like the following if I convert it back into a matrix format (or data frame or whatever is possible):
                      X.col 1 2 3 4
                      Y.col 1 2 3 4
                      Z.col 1 2 3 4
                      Col   1 2 3 4
X.row Y.row Z.row Row
1     1     1     1         1 0 0 1
2     2     2     2         0 1 0 0
3     3     3     3         0 1 1 0
4     4     4     4         0 1 0 1
or perhaps without the names, like this:
        1 2 3 4
        1 2 3 4
        1 2 3 4
        1 2 3 4
1 1 1 1 1 0 0 1
2 2 2 2 0 1 0 0
3 3 3 3 0 1 1 0
4 4 4 4 0 1 0 1
Basically x, y, z contain some extra information on some products whose IDs are stored in row and col. I'm doing some pairwise comparisons which I then present in a matrix for our managers, who would also like to see that extra information along with the matrix as shown above.
So for the data, let a df contain the melted matrix joined with the extra information:
row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #e.g. product id
col = rep(c(1,2,3,4), 4) #e.g. product id
value = c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1) #pairwise index value, calculated from comparing product in row with product in col
x.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some x information on row id
y.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some y information on row id
z.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some z information on row id
x.col = rep(c(1,2,3,4), 4) #some x information on col id
y.col = rep(c(1,2,3,4), 4) #some y information on col id
z.col = rep(c(1,2,3,4), 4) #some z information on col id
df <- data.frame(row, col, value, x.row, y.row, z.row, x.col, y.col, z.col)
The question is then: how to accomplish that matrix visual as shown above, or something like it in R?
It is fairly easy to go about the issue in Excel since it is cell-based, but I'm more interested in a solution in R (if possible). So I guess I'm looking for inspiration on how I might go about it, or maybe even a specific solution on how to do it. I've been wondering whether it is possible using the openxlsx package, manipulating an Excel sheet through R. Or maybe using lists, and storing them on the DF... Or heatmaply (which has an option for e.g. a dendrogram above a heatmap).
I must admit, however, I'm stuck. I can't get my head around it... So I guess I'm looking for your expertise :)
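One possible direction for the second layout (the one without names), offered only as a sketch: reshape the values back to a wide matrix, pull out one line of extra information per row ID and per column ID, and glue the pieces together around a blank corner. The object names m, row_info, col_info, corner and layout are made up for illustration, and the sample data is rebuilt compactly rather than copied from the question.

```r
# Compact rebuild of the sample data (same values as in the question)
row <- rep(1:4, each = 4)
col <- rep(1:4, times = 4)
df <- data.frame(row, col, value = as.numeric(row == col),
                 x.row = row, y.row = row, z.row = row,
                 x.col = col, y.col = col, z.col = col)

# Wide matrix of pairwise values: rows = row id, columns = col id
m <- tapply(df$value, list(df$row, df$col), sum)

# One line of extra information per row id and per col id
row_info <- unique(df[order(df$row), c("x.row", "y.row", "z.row", "row")])
col_info <- unique(df[order(df$col), c("x.col", "y.col", "z.col", "col")])

# Assemble:  blank corner | transposed column info
#            row info     | value matrix
corner <- matrix("", nrow = ncol(col_info), ncol = ncol(row_info))
top    <- cbind(corner, t(as.matrix(col_info)))
bottom <- cbind(as.matrix(row_info), m)
layout <- rbind(top, bottom)   # everything becomes character
dimnames(layout) <- NULL
layout
```

The result is a plain character matrix, so it could then be handed to whatever is used for presentation (for example written to a spreadsheet); that step is left open here.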

How can I merge dataframes of unequal length but known chunk length?

I have a model where individuals can die and reproduce. I record information from the model at set intervals. I know the identity of the individuals and the iteration number I sampled from:
df1<-data.frame(
who= c(1,2,3,4,1,2,3,3,5),
iteration = c(1,1,1,1,2,2,2,3,3)
)
df1
But each of the individuals has a list of numbers associated with it that I want to track. Because each individual has more than one number associated with it, I get two data frames of unequal sample size.
df2 <- data.frame(values=c(1,1, # id = 1
1,2, # id = 2
2,1, # id = 3
0,0, # id = 4
1,1, # id = 1
1,2, # id = 2
2,1, # id = 3
2,1, # id = 3
0,0)) # id = 5
df2
I want to bind them so the 'who' variable is matched up with its value. I did the following to split the values up into the right sized chunks but now I'm stuck.
df3 <- split(df2$values, ceiling(seq_along(df2$values)/2))
I should get something out that looks like this:
who iteration value1 value2
1 1 1 1
2 1 1 2
3 1 2 1
4 1 0 0
1 2 1 1
2 2 1 2
3 2 2 1
3 3 2 1
5 3 0 0
Here we split the 'values' column into a list of two vectors using a grouping index built with %% (odd positions go to the first vector, even positions to the second). Each list element is then padded with NA at the end, in case it is shorter than the longest one, by assigning length<- the maximum element length, and the results are bound column-wise:
lst <- split(df2$values, (seq_along(df2$values)-1) %% 2 +1)
m1 <- do.call(cbind, lapply(lst, "length<-", max(lengths(lst))))
cbind(df1, m1)
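Since the chunk length is known and every individual has the same number of values, another way to get the same shape is to fill a matrix row by row. A minimal sketch, assuming the values really are stored individual after individual; the variable chunk and the column names value1, value2 are chosen here to match the desired output.

```r
df1 <- data.frame(who = c(1, 2, 3, 4, 1, 2, 3, 3, 5),
                  iteration = c(1, 1, 1, 1, 2, 2, 2, 3, 3))
values <- c(1, 1,  1, 2,  2, 1,  0, 0,  1, 1,  1, 2,  2, 1,  2, 1,  0, 0)

chunk <- 2  # assumed known number of values per individual
stopifnot(length(values) == chunk * nrow(df1))

# byrow = TRUE puts each consecutive chunk of values on its own row
m <- matrix(values, ncol = chunk, byrow = TRUE)
colnames(m) <- paste0("value", seq_len(chunk))
out <- cbind(df1, m)
out
```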

How do I identify the first zero in a group of ordered columns?

I'm trying to format a dataset for use in some survival analysis models. Each row is a school, and the time-varying columns are the total number of students enrolled in the school that year. Say the data frame looks like this (there are time-invariant columns as well).
Name total.89 total.90 total.91 total.92
a 8 6 4 0
b 1 2 4 9
c 7 9 0 0
d 2 0 0 0
I'd like to create a new column indicating when the school "died," i.e., the first column in which a zero appears. Ultimately I'd like to have this column be "years since 1989" and can re-name columns accordingly.
A more general version of the question, for a series of time ordered columns, how do I identify the first column in which a given value occurs?
Here's a base R approach to get a column with the first zero (x = 0) or NA if there isn't one:
data$died <- apply(data[, -1], 1, match, x = 0)
data
# Name total.89 total.90 total.91 total.92 died
# 1 a 8 6 4 0 4
# 2 b 1 2 4 9 NA
# 3 c 7 9 0 0 3
# 4 d 2 0 0 0 2
Here is an option using max.col with rowSums
df1$died <- max.col(!df1[-1], "first") * NA^!rowSums(!df1[-1])
df1$died
#[1] 4 NA 3 2
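To get from a column position to "years since 1989", one option is to read the year out of the column names rather than renaming the columns. This is a sketch that assumes the time columns are all named total.YY with two-digit years in the 1900s; the names year_cols, years and years_since_1989 are illustrative.

```r
data <- data.frame(Name = c("a", "b", "c", "d"),
                   total.89 = c(8, 1, 7, 2),
                   total.90 = c(6, 2, 9, 0),
                   total.91 = c(4, 4, 0, 0),
                   total.92 = c(0, 9, 0, 0))

# Pick the time-ordered columns by name and recover the year of each
year_cols <- grep("^total\\.", names(data), value = TRUE)
years <- 1900 + as.integer(sub("^total\\.", "", year_cols))  # assumes 19xx

# Position of the first zero per row (NA if there is none), then map to years
first_zero <- apply(data[, year_cols], 1, match, x = 0)
data$years_since_1989 <- years[first_zero] - 1989
data$years_since_1989
# 3 NA 2 1
```

Replacing x = 0 with another value covers the more general version of the question.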

For loop to paste rows to create new dataframe from existing dataframe

New to SO, but can't figure out how to get this code to work. I have a dataframe that is very large, and is set up like this:
Number Year Type Amount
1 1 A 5
1 2 A 2
1 3 A 7
1 4 A 1
1 1 B 5
1 2 B 11
1 3 B 0
1 4 B 2
This continues for multiple numbers. I want to take this dataframe and make a new dataframe that places two of the rows side by side, for every pair of rows (for example, row 1 and row 2, row 1 and row 3, row 1 and row 4, row 2 and row 3, row 2 and row 4), so that each combination of years appears once within each type and number.
Example output:
Number Year Type Amount Number Year Type Amount
1 1 A 5 1 2 A 2
1 1 A 5 1 3 A 7
1 1 A 5 1 4 A 1
1 2 A 2 1 3 A 7
1 2 A 2 1 4 A 1
1 3 A 7 1 4 A 1
I thought that I would do a for loop to loop within number and type, but I do not know how to make the rows paste from there, or how to ensure that I am only getting the combinations of the rows once. For example:
for(i in 1:n_number){
for(j in 1:n_type){
....}}
Any tips would be appreciated! I am relatively new to coding, so I don't know if I should be using a for loop at all. Thank you!
df <- data.frame(Number= rep(1,8),
Year = rep(c(1:4),2),
Type = rep(c('A','B'),each=4),
Amount=c(5,2,7,1,5,11,0,2))
My interpretation is that you want to create a dataframe with all row combinations, where Number and Type are the same and Year is different.
First suggestion - join the data frame to itself on Number and Type, then remove rows where both sides have the same Year. I added an index to prevent redundant matches (1 with 2 and 2 with 1).
df$index <- 1:nrow(df)
out <- merge(df,df,by=c("Number","Type"))
# keep each pair once, with the earlier row on the left as in the example output
out <- out[which(out$index.x<out$index.y & out$Year.x!=out$Year.y),]
Second suggestion - if you want to see a version using a loop.
out2 <- NULL
for (i in c(1:(nrow(df)-1))){
for (j in c((i+1):nrow(df))){
if(df[i,"Year"]!=df[j,"Year"] && df[i,"Number"]==df[j,"Number"] && df[i,"Type"]==df[j,"Type"]){
out2 <- rbind(out2,cbind(df[i,],df[j,]))
}
}
}
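A third possibility, sketched here as an alternative and not taken from the suggestions above: split the data by Number and Type and let combn enumerate each pair of rows exactly once within a group. The helper name pair_rows and the ".2" suffix for the right-hand columns are illustrative choices.

```r
df <- data.frame(Number = rep(1, 8),
                 Year   = rep(1:4, 2),
                 Type   = rep(c("A", "B"), each = 4),
                 Amount = c(5, 2, 7, 1, 5, 11, 0, 2))

# Hypothetical helper: all unordered pairs of rows within one group
pair_rows <- function(g) {
  if (nrow(g) < 2) return(NULL)       # nothing to pair
  idx <- combn(nrow(g), 2)            # 2 x choose(n, 2) matrix of row indices
  left  <- g[idx[1, ], ]
  right <- g[idx[2, ], ]
  names(right) <- paste0(names(right), ".2")
  cbind(left, right)
}

groups <- split(df, list(df$Number, df$Type), drop = TRUE)
out3 <- do.call(rbind, lapply(groups, pair_rows))
rownames(out3) <- NULL
nrow(out3)
# 12  (6 pairs for Type A and 6 for Type B)
```

This avoids building the full self-join before filtering, which may matter when the groups are large, though that has not been measured here.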
