Delete rows with multiple conditions in R

I have tried to work through the examples at https://www.datasciencemadesimple.com/delete-or-drop-rows-in-r-with-conditions-2/ and in the question "Delete rows based on multiple conditions in r", but they are not working on my code.
I seem to be able to delete all of station 7, or none of it, but I only want to delete depth 1 and depth 2 of station 7 and keep depth 3. Is this possible?
Station <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7,7,8, 8, 8, 9, 9,9)
Depth <- c(1, 2, 3, 1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3)
Value <- c(5, 8, 3, 2, 6, 8, 3, 6, 3, 8, 3, 5, 7, 2, 6, 9, 1, 3, 456, 321, 2, 5, 7, 4, 2, 6, 8)
df <- data.frame(Station, Depth, Value)
df
a <- df[!(df$Station == 7 & df$Depth == 1 ) | !(df$Station == 7 & df$Depth == 2 ),]
a

Try:
a <- df[!((df$Station == 7 & df$Depth == 1) | (df$Station == 7 & df$Depth == 2)), ]
a
Or, more compactly:
a <- df[!(df$Station == 7 & (df$Depth == 1 | df$Depth == 2)), ]
a
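To see why the attempt in the question keeps every row, it may help to compare the two logical masks directly. The sketch below rebuilds the same data frame as the question; the names drop_d1, drop_d2, keep_wrong and keep_right are only illustrative. No row can have Depth equal to both 1 and 2, so `!A | !B` is TRUE everywhere, whereas `!(A | B)` (the same as `!A & !B`) is FALSE for exactly the two rows to drop.

```r
# Same data as in the question, built with rep() for brevity
Station <- rep(1:9, each = 3)
Depth   <- rep(1:3, times = 9)
Value   <- c(5, 8, 3, 2, 6, 8, 3, 6, 3, 8, 3, 5, 7, 2, 6, 9, 1, 3,
             456, 321, 2, 5, 7, 4, 2, 6, 8)
df <- data.frame(Station, Depth, Value)

# One logical vector per row pattern that should be dropped
drop_d1 <- df$Station == 7 & df$Depth == 1
drop_d2 <- df$Station == 7 & df$Depth == 2

# The mask from the question: a row is kept if it fails EITHER pattern,
# which is true for every row because no row matches both at once
keep_wrong <- !drop_d1 | !drop_d2

# The intended mask: a row is kept only if it matches NEITHER pattern
keep_right <- !(drop_d1 | drop_d2)

all(keep_wrong)       # TRUE: nothing would be removed
sum(!keep_right)      # 2: exactly the two unwanted rows

a <- df[keep_right, ]
a[a$Station == 7, ]   # only the depth 3 row of station 7 remains
```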

Here are a couple of ways to write this:
subset(df, !(Station == 7 & Depth %in% 1:2))
Or:
subset(df, Station != 7 | (Station == 7 & Depth == 3))
The same expression can also be used in dplyr::filter if you prefer that.

Related

Calculate intraclass correlation by group in R

I need some programming/statistics help.
I have a dataset with multiple groups (variable "group"). The members of each group rated some items (in our example dataset, the variables "var1", "var2" and "var3").
I would like to get the intraclass variance for each group. In particular, I would like to calculate r*wg(j), ICC(1) and ICC(2).
I looked for a solution, but the icc function in R expects the raters (my team members) to be columns, not rows. I could do it by creating a subset for every group and then transposing each one, but I believe there is an easier solution.
Thanks to anyone who can help me with this.
group <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4)
var1 <- c(4, 5, 4, 2, 3, 4, 5, 3, 5, 8, 4, 3, 4, 4, 5)
var2 <- c(2, 3, 4, 2, 4, 4, 5, 6, 6, 9, 3, 3, 2, 5, 4)
var3 <- c(4, 5, 6, 2, 3, 6, 7, 6, 7, 8, 5, 6, 3, 3, 6)
df <- data.frame(group, var1, var2, var3)
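The "subset and transpose" step described in the question can be done in one pass with split() and lapply(). This is only a sketch of the reshaping, not of the ICC computation itself: the name ratings_by_group is illustrative, and which ICC function is then applied to each matrix (for instance icc from the irr package, which is not called here) is left as an assumption.

```r
group <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4)
var1 <- c(4, 5, 4, 2, 3, 4, 5, 3, 5, 8, 4, 3, 4, 4, 5)
var2 <- c(2, 3, 4, 2, 4, 4, 5, 6, 6, 9, 3, 3, 2, 5, 4)
var3 <- c(4, 5, 6, 2, 3, 6, 7, 6, 7, 8, 5, 6, 3, 3, 6)
df <- data.frame(group, var1, var2, var3)

# Split the rating columns by group, then transpose each piece so that
# items (var1..var3) become rows and team members (raters) become columns
ratings_by_group <- lapply(split(df[c("var1", "var2", "var3")], df$group), t)

# One matrix per group, named after the group value
names(ratings_by_group)
ratings_by_group[["2"]]   # 3 items x 5 raters for group 2

# An ICC function that expects raters in columns could then be applied
# to each element, e.g. with lapply(ratings_by_group, some_icc_function),
# where some_icc_function is a placeholder
```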

Calculate the mean after filtering and groupby

I have a large dataframe of message exchanges that looks like this:
structure(list(from = c(1, 8, 3, 3, 8, 1, 4, 5, 8, 3, 1, 8, 4,
1, 4, 8, 1, 4, 5, 8, 3, 1, 8, 1, 4, 8), to = c(8, 3, 8, 54, 3,
4, 1, 6, 7, 1, 4, 3, 8, 8, 1, 3, 4, 1, 6, 7, 1, 4, 3, 8, 1, 3
), time = c(63200, 81282, 81543, 81548, 81844, 82199, 82514,
82711, 82739, 82814, 82936, 83889, 84207, 84427, 85523, 85545,
86883, 87187, 87701, 89004, 89619, 92662, 93384, 93443, 94042,
94203), month = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6), day = c(1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 15, 15, 15, 15, 15, 15
)), class = "data.frame", row.names = c(NA, -26L))
I'm aiming to calculate the average of the differences in time between the first and the last message someone gets in a day.
So, for each index, I want to filter the dataset to the rows where that index appears in either column "to" or "from", group by day using both the month ("month") and the day of the month ("day"), calculate the difference between the first and the last message in each day, and then average those differences.
At the end I should get a dataframe with the indexes and the daily average for each index, like this:
index avg
1 1 9429.333
2 3 2590.667
3 4 1982.000
4 8 7338.000
The value for 1 is the average of the differences between the max and min of time for each day: 19614 (for day 1 in month 2), 4251 (for day 2 in month 4) and 4423 (for day 15 in month 6). (Note: when the difference is equal to 0, it should be excluded from the average, as for day 3 in month 4 for index 8.)
Right now I'm trying this, but it does not work
dur<-function(x)max(x)-min(x) #The function to calculate the difference. In other cases I need to use other functions of my own
#index are the Names of the indexes for which I want the calculation
index <- c(1, 3, 4, 8)
names(index) <- index
index %>%
map_dfr(~ df %>% filter(from == .x | to == .x) %>% group_by (month,day) %>%
summarize(result = dur(time)) %>%
summarize(mdur = mean(result)) ,.id = "index")
The one below works to calculate the time difference for all messages, but I also need the daily average
index %>%
map_dfr(~ df %>%
filter(from == .x | to == .x) %>%
summarize(result = dur(time)),
.id = "index")
library(dplyr)
df = data.frame(from = c(1, 8, 3, 3, 8, 1, 4, 5, 8, 3, 1, 8, 4, 1, 4, 8, 1, 4, 5, 8, 3, 1, 8, 1, 4, 8, 2 ,3),
to = c(8, 3, 8, 54, 3, 4, 1, 6, 7, 1, 4, 3, 8, 8, 1, 3, 4, 1, 6, 7, 1, 4, 3, 8, 1, 3, 5, 8),
time = c(63200, 81282, 81543, 81548, 81844, 82199, 82514, 82711, 82739, 82814, 82936, 83889, 84207, 84427, 85523, 85545, 86883, 87187, 87701, 89004, 89619, 92662, 93384, 93443, 94042, 94203, 12402, 24932),
month = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 9, 9),
day = c(1, 1, 1, 15, 15, 22, 22, 22, 25, 25, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 15, 15, 15, 18, 18, 18, 9, 9))
df2 <- df %>%
  group_by(day, month) %>%
  summarise(f = first(time), l = last(time)) %>%
  mutate(diff = l - f) %>%
  group_by(month) %>%
  summarise(mt = sum(diff) / length(which(diff != 0)))
This gives:
> df2
# A tibble: 4 × 2
month mt
<dbl> <dbl>
1 2 4806.5
2 4 1834.5
3 6 2262.5
4 9 12530.0
Is this what you are after?
Although you have mentioned something about a person, your data does not include a person column, so I assume this is data from the same person. If you have multiple people, it's just a matter of applying this code to each person separately.
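If the per-index average described in the question is what is needed, one possible base R sketch follows. It uses the 26-row data from the question (stored here under the illustrative name msgs), the question's own dur function, and a hypothetical helper daily_avg. For each index it keeps the rows where the index appears in "from" or "to", computes max minus min of time per (month, day), drops the zero spans, and averages the rest.

```r
# Data from the question (26 messages)
msgs <- data.frame(
  from  = c(1, 8, 3, 3, 8, 1, 4, 5, 8, 3, 1, 8, 4, 1, 4, 8, 1, 4, 5, 8,
            3, 1, 8, 1, 4, 8),
  to    = c(8, 3, 8, 54, 3, 4, 1, 6, 7, 1, 4, 3, 8, 8, 1, 3, 4, 1, 6, 7,
            1, 4, 3, 8, 1, 3),
  time  = c(63200, 81282, 81543, 81548, 81844, 82199, 82514, 82711, 82739,
            82814, 82936, 83889, 84207, 84427, 85523, 85545, 86883, 87187,
            87701, 89004, 89619, 92662, 93384, 93443, 94042, 94203),
  month = c(rep(2, 10), rep(4, 10), rep(6, 6)),
  day   = c(rep(1, 10), rep(2, 9), 3, rep(15, 6))
)

# Span function from the question; another summary could be swapped in
dur <- function(x) max(x) - min(x)

index <- c(1, 3, 4, 8)

# Hypothetical helper: average daily span for one index
daily_avg <- function(i) {
  sub <- msgs[msgs$from == i | msgs$to == i, ]
  # One span per distinct (month, day) pair
  spans <- tapply(sub$time, paste(sub$month, sub$day), dur)
  # Days with a single message have a span of 0 and are excluded
  mean(spans[spans != 0])
}

result <- data.frame(index = index, avg = sapply(index, daily_avg))
result   # avg matches the values expected in the question
```

The same grouping could be written with dplyr; the point of the sketch is only that the zero spans must be removed before taking the mean, and that the mean is taken once per index rather than once per month.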

Split a vector after all of a predefined set of elements have occurred

I have to do the following:
I have a vector, let us say
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
I have to subset the remainder of the vector after 1, 2, 3 and 4 have each occurred at least once.
So the new vector would only include 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1.
I need a relatively easy way to do this. It might be possible with if statements and a while loop with breaks, but I am struggling to come up with a solution.
Is there a simple (even mathematical) way to do this in R?
Use sapply to find where each predefined number occurs for the first time, then drop everything up to the latest of those positions.
x[-seq(max(sapply(1:4, function(y) which(x == y)[1])))]
# [1] 4 5 5 3 2 11 1 3 3 4 1
Data
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
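A variant of the same idea, sketched here with match() instead of sapply() and which(): match() returns the position of the first occurrence of each target in x, so the cut point is simply the largest of those positions. The names targets, first_pos, cut and rest are illustrative, and the explicit check for missing targets is an added assumption about how that case should be handled.

```r
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
targets <- 1:4

# Position of the first occurrence of each target (NA if absent)
first_pos <- match(targets, x)

# Stop early if some target never occurs, since there is then no cut point
if (anyNA(first_pos)) stop("not every target value occurs in x")

# Everything after the last of the first occurrences
cut  <- max(first_pos)
rest <- x[seq_along(x) > cut]
rest
# [1]  4  5  5  3  2 11  1  3  3  4  1
```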
You can use run length encoding for this
x = c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
encoded = rle(x)
# Pick the first location of 1, 2, 3, and 4
# Then find the max index location
indices = c(which(encoded$values == 1)[1],
which(encoded$values == 2)[1],
which(encoded$values == 3)[1],
which(encoded$values == 4)[1])
index = max(indices)
# Find the index of x corresponding to your split location
reqd_index = cumsum(encoded$lengths)[index-1] + 2
# Print final split value
x[reqd_index:length(x)]
The result is as follows
> x[reqd_index:length(x)]
[1] 4 5 5 3 2 11 1 3 3 4 1

Operation Inside a Dataframe

I'm working on the following df:
Num1 <- c(1, 2, 1, 3, 4, 4, 6, 2)
Num2 <- c(3, 3, 2, 1, 1, 2,4, 4)
Num3 <- c(2, 2, 3, 4, 3, 5, 5, 7)
Num4 <- c(1, 3, 3, 1, 2,3, 3, 6)
Num5 <- c(2, 1, 1, 1, 5, 3, 2, 1)
df <- data.frame(Num1, Num2, Num3, Num4, Num5)
I need to create a new matrix having the first column as df[1] - df[2], the second as df[2] - df[3] and so on.
How about this?
mapply('-', df[-length(df)], df[-1])
Or (as mentioned by @Pierre Lafortune)
df[-length(df)] - df[-1]
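Since the question asks for a matrix, the data frame result can be converted explicitly. The sketch below does that and also relabels the columns so that each one shows which pair was subtracted; the name diffs and the "Num1-Num2" style labels are illustrative choices, not part of the original answers.

```r
Num1 <- c(1, 2, 1, 3, 4, 4, 6, 2)
Num2 <- c(3, 3, 2, 1, 1, 2, 4, 4)
Num3 <- c(2, 2, 3, 4, 3, 5, 5, 7)
Num4 <- c(1, 3, 3, 1, 2, 3, 3, 6)
Num5 <- c(2, 1, 1, 1, 5, 3, 2, 1)
df <- data.frame(Num1, Num2, Num3, Num4, Num5)

# All columns but the last, minus all columns but the first:
# column k of the result is df[k] - df[k + 1]
diffs <- as.matrix(df[-ncol(df)] - df[-1])

# Label each column with the pair it was computed from
colnames(diffs) <- paste(names(df)[-ncol(df)], names(df)[-1], sep = "-")
diffs
```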

merge table in R

I have the two tables below:
subj <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
gamble <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
ev <- c(4, 5, 6, 4, 5, 6, 4, 5, 6)
table1 <- data.frame(subj, gamble, ev)
subj2 <- c(1, 2, 3)
gamble2 <- c(1, 3, 2)
table2 <- data.frame(subj2, gamble2)
I want to merge the two tables by gamble, keeping for each subject only the row of table 1 whose gamble matches that subject's gamble in table 2. The expected output is as follows:
sub gamble ev
1 1 4
2 3 6
3 2 5
You are looking for merge, matching on both the subject and the gamble columns:
merge(table1, table2, by.x=c("subj", "gamble"), by.y=c("subj2", "gamble2"), all=FALSE, sort=TRUE)
edited as per Ananda's helpful observation
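The merge call acts as an inner join on the (subject, gamble) pair. As a rough check, the sketch below runs it on the tables from the question and compares it with a plain filter of table1 on a pasted key; the names merged, key1, key2 and filtered are illustrative.

```r
subj   <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
gamble <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
ev     <- c(4, 5, 6, 4, 5, 6, 4, 5, 6)
table1 <- data.frame(subj, gamble, ev)

subj2   <- c(1, 2, 3)
gamble2 <- c(1, 3, 2)
table2 <- data.frame(subj2, gamble2)

# Inner join: keep rows of table1 whose (subj, gamble) pair appears in table2
merged <- merge(table1, table2,
                by.x = c("subj", "gamble"),
                by.y = c("subj2", "gamble2"))
merged

# Equivalent filter without merge: build one key per row and match on it
key1 <- paste(table1$subj, table1$gamble)
key2 <- paste(table2$subj2, table2$gamble2)
filtered <- table1[key1 %in% key2, ]
filtered   # same three rows, with the original row names of table1
```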
