How can I count row occurrences across several variables in R?

I have some example data.frame:
x<- data.frame(c(0,1,2,1,2,1,2),c(0,1,2,1,2,2,1),c(0,1,2,1,2,1,2),c(0,1,2,1,2,2,1))
colnames(x) <- c('PV','LA','Wiz','LAg')
I want to count how often each whole row occurs. The result should look like:
PV LA Wiz LAg Replace
0 0 0 0 1
1 1 1 1 2
2 2 2 2 2
1 2 1 2 1
2 1 2 1 1
The row 0 0 0 0 occurs once, the row 1 1 1 1 occurs twice, and so on.
Do you have any idea how I can do this?

Maybe you want this?
# paste all columns of each row into one key, then tabulate the keys
as.data.frame(table(do.call(paste, x)))
# Var1 Freq
#1 0 0 0 0 1
#2 1 1 1 1 2
#3 1 2 1 2 1
#4 2 1 2 1 1
#5 2 2 2 2 2
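If you would rather keep the original columns instead of a single pasted key, a base R sketch with aggregate is one option. The count column name Replace is taken from the question; the helper column of ones is only an illustration of the idea, not the only way to do it.

```r
x <- data.frame(PV  = c(0, 1, 2, 1, 2, 1, 2),
                LA  = c(0, 1, 2, 1, 2, 2, 1),
                Wiz = c(0, 1, 2, 1, 2, 1, 2),
                LAg = c(0, 1, 2, 1, 2, 2, 1))

# Add a column of ones, then sum it within each distinct combination
# of the remaining columns ("." stands for all other columns).
counts <- aggregate(Replace ~ ., data = cbind(x, Replace = 1), FUN = sum)
counts
```

Each distinct row appears once in counts, with its frequency in the Replace column.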

Related

R: Long-data: how to remove all following obs within same ID once condition is met?

I have long-format data that looks like this, for example:
ID time condition
1 1 0
1 2 0
1 3 0
1 4 1
2 1 0
2 2 1
2 3 1
2 4 0
3 1 1
3 2 1
3 3 0
3 4 0
4 1 0
4 2 1
4 3 NA
4 4 NA
I want to keep only the rows up to and including the first time the condition is met within each ID, so I want:
ID time condition
1 1 0
1 2 0
1 3 0
1 4 1
2 1 0
2 2 1
3 1 1
4 1 0
4 2 1
I tried a loop, but (a) I have read that looping is not good coding style in R and (b) it did not work.
Side note: in case you are wondering, it does make sense in my example that IDs meet the condition and then lose it again, but I am only interested in when they first met it.
Thank you.
Here's an easy way with dplyr:
library(dplyr)
df %>% group_by(ID) %>%
filter(row_number() <= which.max(condition) | sum(condition) == 0)
# # A tibble: 7 x 3
# # Groups: ID [3]
# ID time condition
# <int> <int> <int>
# 1 1 1 0
# 2 1 2 0
# 3 1 3 0
# 4 1 4 1
# 5 2 1 0
# 6 2 2 1
# 7 3 1 1
It relies on which.max, which returns the index of the first maximum value in a vector, so within each ID it finds the first row where condition is 1. The | sum(condition) == 0 part keeps censored cases (IDs where condition is always 0).
Using this data:
df <- read.table(header = TRUE, text = 'ID time condition
1 1 0
1 2 0
1 3 0
1 4 1
2 1 0
2 2 1
2 3 1
2 4 0
3 1 1
3 2 1
3 3 0
3 4 0')
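For comparison, here is a minimal base R sketch of the same "keep everything up to the first 1" idea, without dplyr. It rebuilds the question's data (including ID 4 with its NA values); the helper name prior_hits is made up for this illustration.

```r
df <- data.frame(
  ID        = rep(1:4, each = 4),
  time      = rep(1:4, times = 4),
  condition = c(0, 0, 0, 1,  0, 1, 1, 0,  1, 1, 0, 0,  0, 1, NA, NA)
)

# which.max returns the position of the first maximum
which.max(c(0, 1, 1, 0))  # 2

# For each row, count how many 1s occurred earlier within the same ID.
prior_hits <- ave(df$condition, df$ID, FUN = function(z) cumsum(z) - z)

# Keep rows with no earlier 1; which() silently drops the NA comparisons.
result <- df[which(prior_hits == 0), ]
result
```

An ID whose condition is always 0 has no earlier 1 on any row, so all of its rows are kept, which matches the censored-case handling described above.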

Copy and multiply values between data frames according to the group

I have a data frame DF1. id denotes the participant's number, and there are a few observations (rows) for each participant:
id blocktype condition blocknr markodd
1 1 1 1 0
1 3 2 2 0
1 3 3 2 0
2 1 2 1 0
2 1 1 2 0
2 1 1 2 0
3 4 1 1 0
3 1 1 2 0
3 2 1 2 0
I also have another data frame DF2 with additional data, this time with a single row for each person:
id taskorder exporder
1 1 1
2 2 1
3 1 2
I would like to take a value from DF2 for each id and repeat it across all observations for that id, in a new column of DF1, so that I get this:
id blocktype condition blocknr markodd taskorder
1 1 1 1 0 1
1 3 2 2 0 1
1 3 3 2 0 1
2 1 2 1 0 2
2 1 1 2 0 2
2 1 1 2 0 2
3 4 1 1 0 1
3 1 1 2 0 1
3 2 1 2 0 1
Can you please give me a tip on how to do it? A dplyr solution would be preferable!
Try this:
library(dplyr)
# join on id, then keep DF1's original columns plus taskorder
DF1 <- DF1 %>% left_join(DF2, by = "id") %>% dplyr::select(all_of(colnames(DF1)), taskorder)
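If you do not need a full join, a base R lookup with match is a possible alternative. This is only a sketch that rebuilds the two example data frames from the question; it assumes each id appears exactly once in DF2.

```r
DF1 <- data.frame(id        = rep(1:3, each = 3),
                  blocktype = c(1, 3, 3, 1, 1, 1, 4, 1, 2),
                  condition = c(1, 2, 3, 2, 1, 1, 1, 1, 1),
                  blocknr   = c(1, 2, 2, 1, 2, 2, 1, 2, 2),
                  markodd   = 0)
DF2 <- data.frame(id = 1:3, taskorder = c(1, 2, 1), exporder = c(1, 1, 2))

# match() gives, for every row of DF1, the row of DF2 with the same id;
# indexing DF2$taskorder with it repeats the value across that id's rows.
DF1$taskorder <- DF2$taskorder[match(DF1$id, DF2$id)]
DF1
```

Ids in DF1 that are missing from DF2 would get NA in taskorder, which is the same behaviour as a left join.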

R For Loop Not Working

My data set test is below. I want to create a new variable "indicator" which is 1 if all variables in a row equal 1 (for example row 3), and 0 otherwise.
id X10J X10f X10m X10ap X10myy X10junn X10julyy
1 1001 2 2 2 2 2 2 2
2 1002 1 1 -1 2 1 1 1
3 1003 1 1 1 1 1 1 1
4 1004 1 1 2 1 1 1 1
12 1012 1 2 1 1 1 1 1
I created the following for loop:
for (i in c(test$X10J,test$X20f,test$X10m,test$X10ap,test$Xmyy,test$X10junn,test$X10julyy)){
if(i==1){
test$indicator=1
}else if(i==2|i==-1){
test$indicator=0
}
}
This creates a variable whose values are all 1, instead of a mix of 0 and 1.
A vectorized solution:
test$indicator <- ifelse(rowSums(test[, -1] == 1) == ncol(test[, -1]), 1, 0)
No need for a for loop. You can use apply
> test$indicator <- apply(test[-1], 1, function(x) ifelse(all(x == 1), 1, 0))
> test
id X10J X10f X10m X10ap X10myy X10junn X10julyy indicator
1 1001 2 2 2 2 2 2 2 0
2 1002 1 1 -1 2 1 1 1 0
3 1003 1 1 1 1 1 1 1 1
4 1004 1 1 2 1 1 1 1 0
12 1012 1 2 1 1 1 1 1 0
You could just use:
indicator <- apply(test[,-1], 1, function(row)
{
ifelse(all(row==1), 1, 0)
})
Note: the second parameter of apply is 1 if you want to work over rows and 2 if you want to work over columns.
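To make the row/column distinction concrete, here is a small sketch on a made-up 2 x 3 matrix (the matrix m is purely illustrative and not part of the question's data):

```r
m <- matrix(c(1, 1, 1,
              1, 2, 1), nrow = 2, byrow = TRUE)

# MARGIN = 1: the function sees one row at a time (one result per row)
by_row <- apply(m, 1, function(r) all(r == 1))

# MARGIN = 2: the function sees one column at a time (one result per column)
by_col <- apply(m, 2, function(v) all(v == 1))

by_row  # TRUE FALSE
by_col  # TRUE FALSE TRUE
```

For the indicator in the question you want one value per row, hence the 1.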

Conditional counting in R

I have a question I hope some of you can help me with. I am writing a thesis on pharmaceuticals and the effect of parallel imports. I am working on this in R with a panel data set.
I need a variable that counts, for a given original product, how many parallel importers there are in a given time period.
Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3
Ideally, what I want here is a new column holding the number of PI products (PI=1) for an original (PI=0) at time t. So the output would look like:
Product_ID PI t nPIcomp
1 0 1 2
1 1 1
1 1 1
1 0 2 4
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1 1
2 1 1
2 0 2 1
2 1 2
2 0 3 3
2 1 3
2 1 3
2 1 3
I hope I have made my issue clear :)
Thanks in advance,
Henrik
Something like this?
x <- read.table(text = "Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3", header = TRUE)
find.count <- rle(x$PI)
count <- find.count$lengths[find.count$values == 1]
x[x$PI == 0, "nPIcomp"] <- count
Product_ID PI t nPIcomp
1 1 0 1 2
2 1 1 1 NA
3 1 1 1 NA
4 1 0 2 4
5 1 1 2 NA
6 1 1 2 NA
7 1 1 2 NA
8 1 1 2 NA
9 2 0 1 1
10 2 1 1 NA
11 2 0 2 1
12 2 1 2 NA
13 2 0 3 3
14 2 1 3 NA
15 2 1 3 NA
16 2 1 3 NA
I would use ave and your two columns Product_ID and t as grouping variables. Then, within each group, apply a function that returns the sum of PI followed by the appropriate number of NAs:
dat <- transform(dat, nPIcomp = ave(PI, Product_ID, t,
FUN = function(z) {
n <- sum(z)
c(n, rep(NA, n))
}))
The same idea can be used with the data.table package if your data is large and speed is a concern.
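A minimal data.table sketch of that idea might look like the following. It assumes, as in the example data, that the original product (PI = 0) is the first row of each Product_ID/t group; the object name dt is arbitrary.

```r
library(data.table)

dt <- data.table(
  Product_ID = rep(c(1L, 2L), each = 8),
  PI         = c(0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L,
                 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L),
  t          = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
                 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L)
)

# Within each Product_ID/t group: the count of PI rows goes on the first
# row, and the remaining .N - 1 rows are filled with NA.
dt[, nPIcomp := c(sum(PI), rep(NA_integer_, .N - 1L)), by = .(Product_ID, t)]
dt
```

The := operator adds the column by reference, so no copy of the data is made, which is where the speed benefit on large data comes from.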
Roman's answer gives exactly what you want. In case you want to summarise the data instead, this would be handy, using the plyr package (df is what I have called your data.frame)...
ddply( df , .(Product_ID , t ) , summarise , nPIcomp = sum(PI) )
# Product_ID t nPIcomp
#1 1 1 2
#2 1 2 4
#3 2 1 1
#4 2 2 1
#5 2 3 3
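If plyr is not available, roughly the same summary can be sketched in base R with aggregate. The data frame below just rebuilds the question's example; note that the row order of the result may differ from the ddply output above.

```r
df <- data.frame(
  Product_ID = rep(c(1, 2), each = 8),
  PI         = c(0, 1, 1, 0, 1, 1, 1, 1,
                 0, 1, 0, 1, 0, 1, 1, 1),
  t          = c(1, 1, 1, 2, 2, 2, 2, 2,
                 1, 1, 2, 2, 3, 3, 3, 3)
)

# Sum PI within each Product_ID/t combination, then rename the result column.
res <- aggregate(PI ~ Product_ID + t, data = df, FUN = sum)
names(res)[names(res) == "PI"] <- "nPIcomp"
res
```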

Changing the ID value based on another column

I have a large data set that looks something like this:
Conv. Rev. ID Order path_no
0 0 1 1 1
1 50 1 2 1
0 0 1 3 2
1 100 1 4 2
0 0 2 1 1
0 0 2 2 1
1 150 2 3 1
1 100 2 4 2
I want to make a new ID column that changes whenever path_no changes. I am hoping it will look something like this:
Conv. Rev. ID Order path_no
0 0 1 1 1
1 50 1 2 1
0 0 2 3 2
1 100 2 4 2
0 0 3 1 1
0 0 3 2 1
1 150 3 3 1
1 100 4 4 2
I think rleid from data.table should do the trick. Here's one solution that uses data.table and dplyr:
dplyr::mutate(df, ID = data.table::rleid(path_no))
Conv. Rev. ID Order path_no
1 0 0 1 1 1
2 1 50 1 2 1
3 0 0 2 3 2
4 1 100 2 4 2
5 0 0 3 1 1
6 0 0 3 2 1
7 1 150 3 3 1
8 1 100 4 4 2
Or with data.table only:
dt <- setDT(df)
dt[, ID := rleid(path_no)][]
Conv. Rev. ID Order path_no
1: 0 0 1 1 1
2: 1 50 1 2 1
3: 0 0 2 3 2
4: 1 100 2 4 2
5: 0 0 3 1 1
6: 0 0 3 2 1
7: 1 150 3 3 1
8: 1 100 4 4 2
Data:
text <- "Conv. Rev. ID Order path_no
0 0 1 1 1
1 50 1 2 1
0 0 1 3 2
1 100 1 4 2
0 0 2 1 1
0 0 2 2 1
1 150 2 3 1
1 100 2 4 2"
df <- read.table(text = text, stringsAsFactors = FALSE, header = TRUE)
You can also use a simple for loop:
vals <- c(1, 1, 1, 2, 2, 2, 1, 1, 2)
nobs <- length(vals)
idx <- rep(1, nobs)
for (i in 2:nobs) {
if (vals[i] != vals[i-1]) {
idx[i] <- idx[i-1] + 1
} else {
idx[i] <- idx[i-1]
}
}
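The loop above can also be written without explicit iteration. Here is a short base R sketch of the same run-counting logic, using the same made-up vals vector; idx_rle is just an illustrative name for the second variant.

```r
vals <- c(1, 1, 1, 2, 2, 2, 1, 1, 2)

# A new run starts at position 1 and wherever a value differs from the
# previous one; the cumulative sum of those starts is the run id.
idx <- cumsum(c(TRUE, diff(vals) != 0))
idx  # 1 1 1 2 2 2 3 3 4

# Equivalent using rle(): repeat each run's number by the run's length.
r <- rle(vals)
idx_rle <- rep(seq_along(r$lengths), r$lengths)
```

Both variants give the same ids as the loop and as data.table::rleid on this input.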

Resources