The results of
BB = RB[RB$Rep %in% c("1", "3"), ] and
Bb = subset(RB, Rep == c("1", "3"))
are different.
Can you tell me what the problem is?

When you use ==, the comparison is done element by element, in order.
Consider this example :
df <- data.frame(a = 1:6, b = c(1:3, 3:1))
# a b
#1 1 1
#2 2 2
#3 3 3
#4 4 3
#5 5 2
#6 6 1
When you use :
subset(df, b == c(1, 3))
# a b
#1 1 1
#4 4 3
The 1st value of b is compared with 1 and the 2nd with 3. Because c(1, 3) is shorter than b, its values are recycled: the 3rd value of b is compared with 1 again, the 4th with 3, and so on to the end of the data frame. Hence you get rows 1 and 4 as output here.
When you use %in%, each value of b is checked for membership in c(1, 3), so every row where b is 1 or 3 is selected.
subset(df, b %in% c(1, 3))
# a b
#1 1 1
#3 3 3
#4 4 3
#6 6 1
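To make the recycling visible, the sketch below spells out the vector that == effectively compares against. It reuses the df from the example above; the names recycled, eq_rows and in_rows are illustrative only.

```r
# Same example data as above
df <- data.frame(a = 1:6, b = c(1:3, 3:1))

# What `b == c(1, 3)` effectively compares against:
# c(1, 3) stretched to the length of b
recycled <- rep(c(1, 3), length.out = nrow(df))   # 1 3 1 3 1 3

# Element-wise comparison: TRUE only where b equals the recycled value
# at the same position
eq_rows <- which(df$b == recycled)

# Membership test: TRUE wherever b is any of the listed values
in_rows <- which(df$b %in% c(1, 3))

eq_rows  # 1 4
in_rows  # 1 3 4 6
```

If the length of b were not a multiple of 2, R would additionally warn that the longer object length is not a multiple of the shorter one, which is often the first hint that == was used where %in% was intended.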


I want to count the unique combinations of a variable that appear per group.
For example:
df <- data.frame(id = c(1,1,1,2,2,2,3,3,4,4,4,5,6,6,7,7,7),
status = c("a","b","c","a","b","c","b","c","b","c","d","b","b","c","b","c", "d"))
> df
id status
1 1 a
2 1 b
3 1 c
4 2 a
5 2 b
6 2 c
7 3 b
8 3 c
9 4 b
10 4 c
11 4 d
12 5 b
13 6 b
14 6 c
15 7 b
16 7 c
17 7 d
So that, for example, I can tally how many times a given combination of "status" appears.
By hand, for example, I see that "a,b,c" appears twice total (id's 1 and 2).
The result I think I am looking for would be something like:
abc 2
bc 3
b 1
An option with the tidyverse: group by 'id', paste the 'status' values together, and count the resulting strings
library(dplyr)
library(stringr)
df %>%
group_by(id) %>%
summarise(status = str_c(status, collapse="")) %>%
count(status)
# A tibble: 4 x 2
# status n
# <chr> <int>
#1 abc 2
#2 b 1
#3 bc 2
#4 bcd 2
Here is a base R option via aggregate
> aggregate(.~status,rev(aggregate(.~id,df,paste0,collapse = "")),length)
status id
1 abc 2
2 b 1
3 bc 2
4 bcd 2
You can also use the apply family of functions: tapply and lapply build the combination for each id, and table counts them.
tap <- tapply(df$status, df$id ,FUN= function(x) unique(x))
lap <- lapply(tap,FUN = function(x) paste0(x,collapse=""))
status <- unlist(lap)
df1 <- data.frame(table(status))
> df1
status Freq
1 abc 2
2 b 1
3 bc 2
4 bcd 2
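All three answers paste the statuses in the order in which they appear within each id, so "bc" and "cb" would be counted as different combinations. If order should not matter, one possible variant (a sketch, not part of the original answers) sorts the values before pasting. The names combo and counts are illustrative.

```r
df <- data.frame(
  id = c(1,1,1,2,2,2,3,3,4,4,4,5,6,6,7,7,7),
  status = c("a","b","c","a","b","c","b","c","b","c","d","b","b","c","b","c","d")
)

# One string per id: unique statuses, sorted so that the order of
# appearance within the group does not affect the result
combo <- vapply(
  split(as.character(df$status), df$id),
  function(x) paste(sort(unique(x)), collapse = ""),
  character(1)
)

# Tally how often each combination occurs across ids
counts <- table(combo)
counts
```

For the example data the rows are already in alphabetical order within each id, so this gives the same tallies as the answers above.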

I have a data.frame with n rows and I would like to repeat these rows according to the values of another variable.
This is an example of the data.frame:
df <- data.frame(a=1:3, b=letters[1:3])
a b
1 1 a
2 2 b
3 3 c
And this one is an example for a variable
df1 <- data.frame(x=1:3)
x
1 1
2 2
3 3
In the next step I would like to repeat every row of df as many times as given by the corresponding value in df1,
So that it would look like this
a b
1 1 a
2 2 b
3 2 b
4 3 c
5 3 c
6 3 c
If you have any idea how to solve this problem, I would be very thankful
You can simply repeat the row index:
df[rep(1:3, df1$x), ]
# a b
#1 1 a
#2 2 b
#2.1 2 b
#3 3 c
#3.1 3 c
#3.2 3 c
or, without fixing the size to 3:
df[rep(seq_along(df1$x), df1$x), ]
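A self-contained sketch of the index-repetition idea, using the df and df1 from the question (with b = letters[1:3], matching the printed table). Resetting the row names at the end is an optional extra, added here only to reproduce the 1 to 6 numbering in the desired output; idx and out are illustrative names.

```r
df  <- data.frame(a = 1:3, b = letters[1:3])
df1 <- data.frame(x = 1:3)

# Repeat row i of df exactly df1$x[i] times by repeating its index
idx <- rep(seq_len(nrow(df)), times = df1$x)   # 1 2 2 3 3 3
out <- df[idx, ]

# Optional: replace row names such as "2.1" with a plain 1..n sequence
rownames(out) <- NULL
out
```

This assumes df1 has exactly one value per row of df; rep() with a times vector of a different length would stop with an error.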

I have been using R for the past couple of days and I have a question that has me a little stumped. I have a dataframe with bidder names and bids, where some of the bids are empty. I am having trouble finding a dynamic way to take the average bid for each unique bidder and apply it to that bidder's empty cells. The line of code below computes the mean bid for each unique bidder. All I need to do is place the mean value from unique_bid in the empty cells that share the same bidder.
unique_bid <- aggregate(bid ~ bidder, auction[complete.cases(auction),], mean)
Here is a picture of what the dataframe looks like.
You could use ave.
df = data.frame(a = c(1,1,1,2,2,2), b=c(1,2,NA,4,5,NA),c= c(1,2,3,4,5,6))
> df
a b c
1 1 1 1
2 1 2 2
3 1 NA 3
4 2 4 4
5 2 5 5
6 2 NA 6
sel = is.na(df$b)
df$b[sel] = ave(df$b, df$a, FUN = function(x){mean(x, na.rm = TRUE)})[sel]
ave applies the function FUN to df$b, grouped by df$a. sel selects the NA elements of df$b, which are then replaced by the corresponding group mean.
> df
a b c
1 1 1.0 1
2 1 2.0 2
3 1 1.5 3
4 2 4.0 4
5 2 5.0 5
6 2 4.5 6
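Applied to the data in the question, the same pattern might look like the sketch below. The column names bidder and bid come from the question's aggregate call; the values are made up for illustration, since the original data is not shown, and missing and group_mean are illustrative names.

```r
# Hypothetical stand-in for the asker's data
auction <- data.frame(
  bidder = c("A", "A", "A", "B", "B", "B"),
  bid    = c(10, 20, NA, 40, 50, NA)
)

# Rows whose bid needs to be filled in
missing <- is.na(auction$bid)

# Per-bidder mean, ignoring NAs, returned at the same length as auction$bid
group_mean <- ave(auction$bid, auction$bidder,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Overwrite only the missing bids with the mean of the same bidder
auction$bid[missing] <- group_mean[missing]
auction
```

A bidder whose bids are all missing would get NaN here, because the mean of an empty set is undefined; such rows would need separate handling.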

I just migrated from Python to R and I would like to know if there is any function in R which is similar to pandas.MultiIndex.from_product?
letters <- c('a', 'b')
numbers <- c(1, 2, 3)
df <- somefunction(letters, numbers)
letters numbers
1 a 1
2 a 2
3 a 3
4 b 1
5 b 2
6 b 3
> letters <- c('a', 'b')
> numbers <- c(1, 2, 3)
> expand.grid(letters=letters, numbers=numbers)
letters numbers
1 a 1
2 b 1
3 a 2
4 b 2
5 a 3
6 b 3
You can also use CJ from the data.table package, which is faster. Note that the result is a data.table rather than an ordinary data frame:
> library(data.table)
> CJ(letters=letters, numbers=numbers)
letters numbers
1: a 1
2: a 2
3: a 3
4: b 1
5: b 2
6: b 3
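Note that expand.grid varies its first argument fastest, which is why its rows come out in a different order from the desired output. If the pandas-style ordering matters, one base R option is to pass the arguments in reverse and then reorder the columns. This is a sketch; lets and nums are illustrative names, chosen to avoid masking the built-in letters constant.

```r
lets <- c("a", "b")
nums <- c(1, 2, 3)

# The first argument varies fastest, so put the "inner" variable first
grid <- expand.grid(numbers = nums, letters = lets, stringsAsFactors = FALSE)

# Restore the desired column order: letters first, numbers second
grid <- grid[, c("letters", "numbers")]
grid
```

Setting stringsAsFactors = FALSE keeps the letters column as plain character strings instead of a factor.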

R - Using for loop to conditionally change values in a dataframe

All of the variables are on the same scale in the data.frame 1-5.
Example of data.frame
5 2 4 1
3 5 5 2
1 1 3 4
I would like to change every value that equals 5 to 1,
every 4 to 2,
every 2 to 4,
and every 1 to 5.
Example of data.frame after values have been changed.
1 4 2 5
3 1 1 4
5 5 3 2
What I have tried:
for(b in colnames(rpi_invert)){
  rpi_invert[[b]][rpi_invert[[b]] == 5] <- 1
  rpi_invert[[b]][rpi_invert[[b]] == 4] <- 2
  rpi_invert[[b]][rpi_invert[[b]] == 2] <- 4
  rpi_invert[[b]][rpi_invert[[b]] == 1] <- 5
}
This will only change the values in the first row and not the second column.
for(b in colnames(rpi_invert)){
  rpi_invert <- ifelse(rpi_invert[[b]] == 5,1,
                ifelse(rpi_invert[[b]] == 4,2,
                ifelse(rpi_invert[[b]] == 2,4,
                ifelse(rpi_invert[[b]] == 1,5,rpi_invert[[b]]))))
}
But this gives me the error:
Error in rpi_invert[[b]] : subscript out of bounds
If I try the same methods on an individual column instead of looping through the data.frame, both methods work, so I am not sure what the problem is.
I am sure that what I am trying to do can be done more efficiently without a for loop, probably with some type of apply function, but I am not sure how.
Any help will be appreciated; please let me know if further information is needed.
You can try (if your data.frame is df):
# A B C D
#1 1 4 2 5
#2 3 1 1 4
#3 5 5 3 2
or, same but written a bit differently: 6-df
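Two notes on the loops in the question, followed by a sketch. In the first loop the replacements run one after another, so values changed from 5 to 1 are turned back into 5 by the last line (and likewise for 4 and 2). In the second loop, the assignment overwrites the whole of rpi_invert with a single vector, so rpi_invert[[b]] fails on the next iteration. Both problems go away if all values are mapped in one step. The sketch below uses the example values from the question; the column names A to D are taken from the output above, and lookup is a hypothetical helper vector.

```r
df <- data.frame(A = c(5, 3, 1), B = c(2, 5, 1), C = c(4, 5, 3), D = c(1, 2, 4))

# On a 1-5 scale, reversing is simply 6 minus the value;
# arithmetic is vectorised over the whole data frame
reversed <- 6 - df

# More general alternative: a lookup vector in which the position is the
# old value and the entry is the new value (works for arbitrary recodings)
lookup <- c(5, 4, 3, 2, 1)
recoded <- df
recoded[] <- lapply(df, function(col) lookup[col])

reversed
```

The lookup variant assumes every cell holds a whole number from 1 to 5; anything else would index outside the vector and yield NA.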
