I have a very large dataframe that looks like so:
month <- c(201101, 201101, 201101, 201102, 201102, 201102, 201103, 201103, 201103, 201104, 201104, 201104)
su <- as.factor(rep("045110B238", 12))
item <- as.factor(c("045110B238A01", "045110B238A02", "045110B238A03", "045110B238A01", "045110B238A02", "045110B238A03", "045110B238A01", "045110B238A02", "045110B238A03", "045110B238A01", "045110B238A02", "045110B238A03"))
item.dlq <- c(1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1)
df <- data.frame(month, su, item, item.dlq)
Using the item.dlq variable, I count the cumulative number of consecutive months for which each item has item.dlq == 1:
library(dplyr)
df <- data.frame(df %>%
  group_by(item, grp = cumsum(item.dlq == 0)) %>%
  mutate(item.cum.dlq = cumsum(item.dlq)))
which should give me a vector like so:
item.cum.dlq <- c(1, 1, 1, 2, 0, 2, 3, 1, 3, 4, 2, 4)
Based on the information above, I would like to:
1. create a variable that counts the number of consecutive months in which ALL items for the su have item.dlq == 1.
2. create a variable that counts the number of consecutive months in which at least one item, but not all items, has item.dlq == 1. For example, where month is equal to 201102 (i.e. 2/2011), item 045110B238A02 has item.dlq == 0, so only 2 of the 3 items have item.dlq == 1.
Note that there is only one value of su in the example above, but there are many in the full data frame I am working with. If possible, I would also like to compress the data frame to one row per su and month, to avoid carrying around unnecessary observations. Here is what the raw data would look like without compressing:
su.cum.fulldlq <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 2, 2, 2) ## all items dlq ==1
su.cum.partdlq <- c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0) ## at least 1 item but not all have dlq == 1
If the data frame were compressed, it would look like so:
month <- c(201101, 201102, 201103, 201104)
su <- c("045110B238", "045110B238", "045110B238", "045110B238")
su.cum.fulldlq <- c(1, 0, 1, 2)
su.cum.partdlq <- c(0, 1, 0, 0)
I was thinking something along the lines of this, but I keep getting error messages.
df <- data.frame(df %>%
group_by(su, month)) %>%
mutate(burden = n_distinct(itemcode)) # count number of items
mutate(dlq.items = n_distinct(dlq == 1)) %>% # count number of items where dlq == 1
mutate(full.dlq = ifelse(burden == dlq_items, 1, 0)) %>% # if number of items equals the number of items with dlq == 1, then full.dlq == 1.
After this, I am not certain how to proceed.
Is there a way to do so using dplyr? If not, any other approaches would be welcome. If something is not clear please comment and I will change it. Either way, any help or suggestions would be greatly appreciated. Thanks so much!
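One possible approach, offered as a sketch rather than a definitive answer: collapse to one row per su and month first, then count runs within each su. The helper run_length() is a hypothetical name introduced here, and the codes are quoted as strings on the assumption that they are meant to be character values.

```r
library(dplyr)

# Sample data from the question, with the codes quoted as strings (assumption)
month <- rep(c(201101, 201102, 201103, 201104), each = 3)
su <- rep("045110B238", 12)
item <- rep(paste0("045110B238A0", 1:3), times = 4)
item.dlq <- c(1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1)
df <- data.frame(month, su, item, item.dlq)

# Hypothetical helper: length of the current run of 1s, reset to 0 at every 0
run_length <- function(flag) {
  ave(flag, cumsum(flag == 0), FUN = cumsum)
}

su_monthly <- df %>%
  group_by(su, month) %>%
  summarise(
    n.items = n_distinct(item),     # number of items in this su/month
    n.dlq   = sum(item.dlq == 1),   # number of those items with dlq == 1
    .groups = "drop"
  ) %>%
  arrange(su, month) %>%
  group_by(su) %>%
  mutate(
    su.cum.fulldlq = run_length(as.integer(n.dlq == n.items)),
    su.cum.partdlq = run_length(as.integer(n.dlq > 0 & n.dlq < n.items))
  ) %>%
  ungroup()
```

su_monthly is the compressed form (one row per su and month). If the item-level rows are still needed, left_join(df, su_monthly, by = c("su", "month")) should bring the two counters back onto the original data frame.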
The sample data is as follows
ID <- c(1, 2, 3)
O1D1 <- c(0, 0, 0)
O1D2 <- c(0, 0, 0)
O1D3 <- c(0, 10, 0)
O2D1 <- c(0, 0, 0)
O2D2 <- c(0, 0, 0)
O2D3 <- c(18, 0, 17)
O3D1 <- c(0, 9, 0)
O3D2 <- c(20, 1, 22)
O3D3 <- c(0, 0, 0)
x <- data.frame(ID, O1D1, O1D2, O1D3, O2D1, O2D2, O2D3, O3D1, O3D2, O3D3)
I created a new column with some conditional logic.
Say, the new column is n
x$n <- (x$O1D3 > 0 & x$O2D3 == 0)
> x$n
[1] FALSE TRUE FALSE
What I am looking to get instead is a column with values such as
> x$n
[1] 0 10 0
Or, in other words, the values of O1D3 should replace TRUE values in the n column and the FALSE values can be replaced with 0.
Thanks for your time and help.
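A minimal sketch of one way to get that column, assuming the condition stays exactly as written above: ifelse() picks the O1D3 value where the condition holds and 0 elsewhere. Only the columns involved are rebuilt here; the name cond is introduced for illustration.

```r
# Only the columns used by the condition, with the values from the question
x <- data.frame(ID = c(1, 2, 3),
                O1D3 = c(0, 10, 0),
                O2D3 = c(18, 0, 17))

# The logical condition from the question
cond <- x$O1D3 > 0 & x$O2D3 == 0

# Take the O1D3 value where cond is TRUE, otherwise 0
x$n <- ifelse(cond, x$O1D3, 0)

# Equivalent arithmetic form: a logical vector multiplies as 0/1
x$n2 <- x$O1D3 * cond
```

Both columns should hold the same values; the ifelse() form is probably easier to read if the replacement for FALSE ever needs to be something other than 0.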
I have line items indicating which groups my customers are members of.
cols <- c("CustomerName", "Magazines", "Books", "Emails")
df <- data.frame(matrix(ncol = length(cols), nrow=0))
colnames(df) <- cols
df[nrow(df) + 1,] <- c("Alice", 1, 0, 1)
df[nrow(df) + 1,] <- c("Bob", 0, 1, 1)
df[nrow(df) + 1,] <- c("Chris", 1, 1, 1)
df[nrow(df) + 1,] <- c("Darcy", 0, 1, 1)
How do I summarize data of this shape into a single summary row with columns & counts for each possible group-combination?
Desired output:
df_DesiredOutput <- c("Books" = 0, "Magazines" = 0, "Emails" = 0, "BooksMagazines" = 0, "BooksEmails" = 2, "MagazinesEmails" = 1, "BooksMagazinesEmails" = 1)
The transformation should be agnostic to the number of products as well as their actual product names.
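One way this could be sketched in base R, under two assumptions: the membership flags are stored as numeric 0/1 (the row-by-row construction above stores them as character), and combination labels follow the column order of the data, so the label reads "MagazinesBooks" rather than "BooksMagazines". The names groups, all_labels, row_labels and counts are introduced here for illustration.

```r
# Sample data rebuilt with numeric 0/1 flags (assumption)
df <- data.frame(
  CustomerName = c("Alice", "Bob", "Chris", "Darcy"),
  Magazines = c(1, 0, 1, 0),
  Books     = c(0, 1, 1, 1),
  Emails    = c(1, 1, 1, 1)
)

# Every column except the customer name is treated as a group
groups <- setdiff(names(df), "CustomerName")

# All non-empty combinations of group names, each pasted into one label
all_labels <- unlist(lapply(seq_along(groups), function(k) {
  combn(groups, k, FUN = paste, collapse = "")
}))

# Label each customer by the groups they belong to, in column order
row_labels <- apply(df[groups] == 1, 1, function(flag) {
  paste(groups[flag], collapse = "")
})

# Tabulate against the full label set so absent combinations show up as 0
counts <- table(factor(row_labels, levels = all_labels))
```

Because the labels are generated from names(df), nothing here depends on how many products there are or what they are called. A customer with no memberships gets an empty label and is dropped from the table.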
How can I assign values into a matrix based on a vector of column indices, one per row? A working example is:
# Input:
r <- c(2, 1, 3)
m <- matrix(rep(0, 9), nrow = 3)
# Desired output
result <- matrix(c(0, 1, 0,
                   1, 0, 0,
                   0, 0, 1), nrow = 3)
result
# I tried this notation, but it does not work:
sapply(1:3, function(x)m[x, r[x]] <- 1)
We can use row/column (matrix) indexing to assign:
m[cbind(seq_len(nrow(m)), r)] <- 1
Or use replace, which returns a modified copy rather than changing m in place:
replace(m, cbind(seq_len(nrow(m)), r), 1)
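As a rough illustration of why the sapply attempt leaves m untouched: the assignment inside the anonymous function modifies a copy of m that is local to that function call. The sketch below compares that with the matrix-index approach; the names m_unchanged, idx and m_copy are introduced here for illustration.

```r
r <- c(2, 1, 3)
m <- matrix(0, nrow = 3, ncol = 3)

# The assignment happens on a local copy inside the function, so m stays all zeros
invisible(sapply(1:3, function(i) m[i, r[i]] <- 1))
m_unchanged <- m

# A two-column matrix of (row, column) pairs selects one cell per row
idx <- cbind(seq_len(nrow(m)), r)
m[idx] <- 1

# replace() does the same but returns a new matrix instead of modifying its input
m_copy <- replace(matrix(0, nrow = 3, ncol = 3), idx, 1)
```

After this runs, m and m_copy should both have a 1 in cells (1, 2), (2, 1) and (3, 3), while m_unchanged is still all zeros.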
I have a data frame that includes many variables. Here is a shortened version of what I have so far:
n_20010_0_0 <- c(1,2,3,4)
n_20010_0_1 <- c(0, -2, NA, 4)
n_20010_0_2 <- c(3, 0, -7, 2)
x <- data.frame (n_20010_0_0, n_20010_0_1, n_20010_0_2)
I created a new variable that returns whether or not there is a 1 within the list of variables:
MotherIllness0 <- paste("n_20010_0_", 0:2, sep = "")
x$MotherCAD_0_0 <- apply(x, 1, function(x) as.integer(any(x[MotherIllness0] == 1, na.rm = TRUE)))
I would like to keep the NAs as 0's, but I would also like to recode it so that if there is a -7 the new value is NA.
This is what I've tried and it doesn't work:
x$MotherCAD_0_0[MotherIllness0 == -7] <- NA
You don't need to index by MotherIllness0 inside the function if you subset those columns before calling apply; the argument 1 then passes each row of just those columns to the function.
Here's a line of code that does both things you want.
MotherIllness0 <- paste("n_20010_0_", 0:2, sep = "")
x$MotherCAD_0_0 <- apply(x[, MotherIllness0], 1, function(x) ifelse(any(x == -7, na.rm = TRUE), NA,
                                                                  as.integer(any(x == 1, na.rm = TRUE))))
I assumed that a row with both 1s and -7s should have NA for the new variable. If not, then this should work:
x$MotherCAD_0_0 <- apply(x[, MotherIllness0], 1, function(x) ifelse(any(x == 1, na.rm = TRUE), 1,
                                                                  ifelse(any(x == -7, na.rm = TRUE), NA, 0)))
Note that with the example you have above, these two lines should produce the same outcome.
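To see where the two orderings would differ, here is a small sketch on data with an extra row that contains both a 1 and a -7. It uses vectorised rowSums() checks instead of apply, which is a swapped-in technique rather than the code above; the names cols, has_one, has_neg7, na_first and one_first are introduced for illustration.

```r
# The example data plus a fifth row holding both a 1 and a -7
x <- data.frame(n_20010_0_0 = c(1, 2, 3, 4, 1),
                n_20010_0_1 = c(0, -2, NA, 4, 0),
                n_20010_0_2 = c(3, 0, -7, 2, -7))
cols <- paste0("n_20010_0_", 0:2)

# Row-wise flags; na.rm = TRUE treats missing values as "not a match"
has_one  <- unname(rowSums(x[cols] == 1,  na.rm = TRUE) > 0)
has_neg7 <- unname(rowSums(x[cols] == -7, na.rm = TRUE) > 0)

# Variant 1: a -7 anywhere in the row wins, giving NA
na_first <- ifelse(has_neg7, NA, as.integer(has_one))

# Variant 2: a 1 anywhere in the row wins; -7 only matters when there is no 1
one_first <- ifelse(has_one, 1L, ifelse(has_neg7, NA, 0L))
```

The two results should agree on the first four rows and differ only on the fifth, where na_first is NA and one_first is 1.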
Here's another way to do it, without using any if-else logic:
# Here's your dataset, with a row including both 1 and -7 added:
x <- data.frame(n_20010_0_0 = c(1, 2, 3, 4, 1),
                n_20010_0_1 = c(0, -2, NA, 4, 0),
                n_20010_0_2 = c(3, 0, -7, 2, -7))
# Your original function:
MotherIllness0 <- paste("n_20010_0_", 0:2, sep = "")
x$MotherCAD_0_0 <- apply(x, MARGIN = 1, FUN = function(x) {
  as.integer(
    any(x[MotherIllness0] == 1, na.rm = TRUE)
  )
})
# A simplified version
x$test <- apply(x, MARGIN = 1, FUN = function(row) {
  as.integer(
    any(row[MotherIllness0] == 1, na.rm = TRUE) &
      !any(row[MotherIllness0] == -7, na.rm = TRUE)
  )
})
A couple of notes: the name of x in an anonymous function like function(x) can be anything, and you'll save yourself a lot of confusion by calling it what it is (I named it row above).
It's also unlikely that you actually need to convert your result column to integer - logical columns are easier to interpret, and they work the same as 0-1 columns for just about everything (e.g., TRUE + FALSE equals 1).
I first want to generate multinomially distributed data using R, and then I want the data in its "raw" form. For example, say that I have generated data by
set.seed(1)
df <- as.data.frame(cbind(rmultinom(1, 13, c(0.1, 0.3, 0.4, 0.2)), seq(from = 0, to = 3, by = 1)))
I get
V1 V2
3 0
2 1
4 2
4 3
I then want the data in a vector at the individual level, so that it looks like
0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3
Is there an easy way to do this? I'm new to this and it wasn't as easy as I thought it would be. I tried to create a function that looked something like
xcv <- vector(length = m)
asdf <- function(x, n){
for(i in 1:n){
xcv[j] <- seq(from = x[i,2], to = x[i,2], length.out = x[i,1])
}
return(xcv)
}
This did not work at all, so I hope to get some help.
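One short way this might be done, as a sketch: rep() with a times vector repeats each category value by its count, so no loop is needed. The data frame below is typed in from the output shown above rather than regenerated from the seed, and the name raw is introduced for illustration.

```r
# Counts (V1) and category values (V2) as shown in the question's output
df <- data.frame(V1 = c(3, 2, 4, 4), V2 = 0:3)

# Repeat each category value V2 as many times as its count V1
raw <- rep(df$V2, times = df$V1)
raw
```

With these counts, raw has 13 elements and matches the individual-level vector described above. The same call should work directly on the data frame built from rmultinom.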