I have the two tables below:
subj <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
gamble <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
ev <- c(4, 5, 6, 4, 5, 6, 4, 5, 6)
table1 <- data.frame(subj, gamble, ev)
subj2 <- c(1, 2, 3)
gamble2 <- c(1, 3, 2)
table2 <- data.frame(subj2, gamble2)
I want to merge the two tables by gamble, keeping only the rows of table1 whose gamble matches the gamble listed for the same subject in table2. The expected output is as follows:
subj gamble ev
1 1 4
2 3 6
3 2 5
You are looking for merge(), matching on both the subject and the gamble columns:
merge(table1, table2, by.x=c("subj", "gamble"), by.y=c("subj2", "gamble2"), all=FALSE, sort=TRUE)
edited as per Ananda's helpful observation
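As a minimal sketch of the same join, one option is to rename table2's columns first so that merge() can pick the keys up by itself. The name keys is only an illustrative helper, not part of the original answer.

```r
# Data from the question
table1 <- data.frame(
  subj   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  gamble = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  ev     = c(4, 5, 6, 4, 5, 6, 4, 5, 6)
)
table2 <- data.frame(subj2 = c(1, 2, 3), gamble2 = c(1, 3, 2))

# 'keys' is a hypothetical helper: a copy of table2 with table1's column names
keys <- setNames(table2, c("subj", "gamble"))

# With identical key names, merge() joins on all shared columns by default
result <- merge(table1, keys)
print(result)
```

Only the (subj, gamble) pairs present in both tables survive, which is the inner-join behaviour that all=FALSE requests in the call above.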
I have a column in my dataframe containing ascending numbers which are interrupted by zeros.
I would like to find all rows which come directly before a zero and create a new data frame containing only these rows.
My Column: 1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0
What I need: 4, 6
Any help would be much appreciated! Thanks!
A dplyr solution:
library(dplyr)
df %>%
filter(lead(x) == 0, x != 0)
#> x
#> 1 4
#> 2 6
Created on 2021-07-08 by the reprex package (v2.0.0)
data
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))
Welcome to SO!
You can try with base R. The idea is to fetch the rownames of the rows before the 0 and subset() the df by them:
# your data
df <- data.frame(col = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))
# an index holding the row names of the rows just before each 0 (assumes default row names 1..n)
index <- as.numeric(rownames(df)[df$col == 0]) - 1
# subset the original df by that index; the != 0 condition drops a 0 that precedes another 0
df_ <- subset(df, rownames(df) %in% index & col != 0)
df_
col
4 4
12 6
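Using base R (this relies on the values ascending between the zeros, so a drop in value marks the row before a zero):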
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0),
y = LETTERS[1:13])
df[c(diff(df$x) < 0, FALSE), ]
x y
4 4 D
12 6 L
Using run lengths in base R. The cumulative sum of the run lengths gives the index in x where each run ends; the run just before each run of zeros ends in the value we want.
x <- c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0)
y <- rle(x)
x[cumsum(y$lengths)][which(y$values == 0) - 1]
# [1] 4 6
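The "compare each value with the next one" idea can also be sketched in plain base R without any package. The names next_x and before_zero are illustrative only.

```r
x <- c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0)

# next_x[i] holds x[i + 1]; the last element has no successor, so pad with NA
next_x <- c(x[-1], NA)

# Keep positions that are non-zero themselves and are followed by a zero;
# which() silently drops any NA produced by the padding
before_zero <- which(x != 0 & next_x == 0)

x[before_zero]
```

For a data frame, the same before_zero index can be used as the row index, e.g. df[before_zero, , drop = FALSE].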
I have tried to look through these examples https://www.datasciencemadesimple.com/delete-or-drop-rows-in-r-with-conditions-2/
Delete rows based on multiple conditions in r
but it's not working on my code.
I seem to be able to delete all of station 7, or not delete any, but I only want to delete depth 1 and depth 2 of station 7 but keep depth 3. Is this possible?
Station <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7,7,8, 8, 8, 9, 9,9)
Depth <- c(1, 2, 3, 1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3,1, 2, 3)
Value <- c(5, 8, 3, 2, 6, 8, 3, 6, 3, 8, 3, 5, 7, 2, 6, 9, 1, 3, 456, 321, 2, 5, 7, 4, 2, 6, 8)
df <- data.frame(Station, Depth, Value)
df
a <- df[!(df$Station == 7 & df$Depth == 1 ) | !(df$Station == 7 & df$Depth == 2 ),]
a
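Your condition ORs two negations, and since no row can be depth 1 and depth 2 at the same time, at least one of them is TRUE for every row, so nothing is removed. Negate the whole OR instead. Try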
a <- df[!( (df$Station == 7 & df$Depth == 1 ) | (df$Station == 7 & df$Depth == 2 )),]
a
or more compact one
a <- df[!( df$Station == 7 & (df$Depth == 1 | df$Depth == 2 )),]
a
Here are a couple of ways to write this -
subset(df, !(Station == 7 & Depth %in% 1:2))
Or -
subset(df, Station != 7 | (Station == 7 & Depth == 3))
The same expression can also be used in dplyr::filter if you prefer that.
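To see why the original attempt kept every row, here is a small check of the two logical expressions. It is only a sketch: the Value column is left out because it does not affect the filter, and is_a / is_b are illustrative names.

```r
# Same Station/Depth layout as in the question (Value omitted)
Station <- rep(1:9, each = 3)
Depth   <- rep(1:3, times = 9)
df <- data.frame(Station, Depth)

is_a <- df$Station == 7 & df$Depth == 1
is_b <- df$Station == 7 & df$Depth == 2

# !A | !B equals !(A & B); A and B never hold together, so this is TRUE everywhere
always_true <- all(!is_a | !is_b)

# !(A | B) equals !A & !B, which is the intended filter
kept <- df[!(is_a | is_b), ]

always_true
nrow(kept)
```

The first expression never excludes anything, while the second drops exactly the two station 7 rows.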
I have to do the following:
I have a vector, let us say
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
I need to keep only the part of the vector that comes after each of 1, 2, 3 and 4 has occurred at least once.
So the new vector would include only 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1.
I need a relatively easy solution on how to do this. It might be possible to do an if and while loop with breaks, but I am kinda struggling to come up with a solution.
Is there a simple (even mathematical way) to do this in R?
Use sapply to find where each predefined number occurs for the first time, then drop everything up to the latest of those positions.
x[-seq(max(sapply(1:4, function(y) which(x == y)[1])))]
# [1] 4 5 5 3 2 11 1 3 3 4 1
Data
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
You can use run length encoding for this
x = c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
encoded = rle(x)
# Pick the first location of 1, 2, 3, and 4
# Then find the max index location
indices = c(which(encoded$values == 1)[1],
            which(encoded$values == 2)[1],
            which(encoded$values == 3)[1],
            which(encoded$values == 4)[1])
index = max(indices)
# Find the index of x corresponding to your split location
reqd_index = cumsum(encoded$lengths)[index-1] + 2
# Print final split value
x[reqd_index:length(x)]
The result is as follows
> x[reqd_index:length(x)]
[1] 4 5 5 3 2 11 1 3 3 4 1
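Both answers boil down to "find the first position of each target, then cut after the latest one". A possible sketch wraps that in a small function using match(), which returns first positions directly. The function name drop_until_all_seen and the choice to return an empty vector when a target never occurs are assumptions of this sketch, not part of the answers above; targets is assumed to be non-empty.

```r
# Hypothetical helper: drop everything up to the point where all targets have been seen
drop_until_all_seen <- function(x, targets) {
  # match() gives the position of the first occurrence of each target in x (NA if absent)
  first_pos <- match(targets, x)
  # Assumption: if some target never occurs, nothing qualifies, so return an empty vector
  if (anyNA(first_pos)) return(x[0])
  # Remove positions 1..max(first_pos)
  x[-seq_len(max(first_pos))]
}

x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
res <- drop_until_all_seen(x, 1:4)
res
```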
I have a list of 1000 elements, each of which is itself a list of varying length (about 6000 elements on average). I need to save it to a .csv or, preferably, a .txt file. As it is a very big object, I illustrate the problem with a simple example.
Consider the following list of lists, made up of 2 lists that contain 4 and 6 elements respectively:
[[1]]
[[1]][[1]]
[1] 7 3 5 4 8
[[1]][[2]]
[1] 5 7 8
[[1]][[3]]
[1] 1 5
[[1]][[4]]
[1] 6
[[2]]
[[2]][[1]]
[1] 1 7 3 4 5 9
[[2]][[2]]
[1] 5 9 2 1
[[2]][[3]]
[1] 6 2 4
[[2]][[4]]
[1] 6 1
[[2]][[5]]
[1] 5 9
[[2]][[6]]
[1] 6
I need to save this list of lists to a .csv or, preferably, a .txt file in a way that keeps the list indices, so that the first two numbers on each line refer to the position in the outer and inner list, as follows:
1, 1, 7, 3, 5, 4, 8
1, 2, 5, 7, 8
1, 3, 1, 5
1, 4, 6
2, 1, 1, 7, 3, 4, 5, 9
2, 2, 5, 9, 2, 1
2, 3, 6, 2, 4
2, 4, 6, 1
2, 5, 5, 9
2, 6, 6
Does anyone have an idea how I could do that? Here is the data in reproducible form:
mylist <- list(list(c(7, 3, 5, 4, 8), c(5, 7, 8), c(1, 5), 6), list(c(1,
7, 3, 4, 5, 9), c(5, 9, 2, 1), c(6, 2, 4), c(6, 1), c(5, 9),
6))
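Here's an example. (ccat() isn't really necessary; it's just a helper function that saves a little typing. If you define ccat with file="" instead, the output is printed to the console. Because of append=TRUE, re-running the loop adds to an existing myfile.txt, so delete the file first if you want a fresh one.)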
ccat <- function(..., file = "myfile.txt") {
  cat(..., file = file, append = TRUE)
}
for (i in seq_along(mylist)) {
  for (j in seq_along(mylist[[i]])) {
    ccat(i, j, mylist[[i]][[j]], sep = ", ")
    ccat("\n")
  }
  ccat("\n")
}
Consider writeLines() with nested lapply(). The code below writes each line to the file and also builds a corresponding newlist object in memory:
out_path <- "/path/to/myfile.txt"
conn <- file(description = out_path, open = "w")
newlist <- lapply(seq_along(mylist), function(i) {
  lapply(seq_along(mylist[[i]]), function(j) {
    temp <- c(i, j, mylist[[i]][[j]])
    writeLines(text = paste(temp, collapse = ","), con = conn, sep = "\r\n")
    temp  # return the row; writeLines() itself returns NULL
  })
})
close(conn)
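Maybe something like this-
mylist <- list(list(c(7, 3, 5, 4, 8), c(5, 7, 8), c(1, 5), 6),
               list(c(1, 7, 3, 4, 5, 9), c(5, 9, 2, 1), c(6, 2, 4), c(6, 1), c(5, 9), 6))
# Build one data frame per outer list; i is the outer index, seq_along(x) the inner one.
# (A counter incremented after return() would never run, so the index is passed in instead.)
output <- plyr::ldply(lapply(seq_along(mylist), function(i) {
  x <- mylist[[i]]
  cbind(count = i, item = seq_along(x), plyr::ldply(x, rbind))
}))
# Then write the output as a csv file (shorter rows are padded with NA)
write.csv(output, file = "output.csv")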
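If only the text file is needed, another possible sketch builds all lines in memory first and writes them with a single writeLines() call, which avoids opening the file once per line. The names lines and out_file are illustrative, and tempfile() is used here only so the example does not overwrite anything; swap in a real path as needed.

```r
mylist <- list(
  list(c(7, 3, 5, 4, 8), c(5, 7, 8), c(1, 5), 6),
  list(c(1, 7, 3, 4, 5, 9), c(5, 9, 2, 1), c(6, 2, 4), c(6, 1), c(5, 9), 6)
)

# One character string per inner vector: outer index, inner index, then the values
lines <- unlist(lapply(seq_along(mylist), function(i) {
  vapply(seq_along(mylist[[i]]), function(j) {
    paste(c(i, j, mylist[[i]][[j]]), collapse = ", ")
  }, character(1))
}))

# Assumption: a temporary file stands in for the real output path
out_file <- tempfile(fileext = ".txt")
writeLines(lines, out_file)

lines
```

For the full-size list this holds every line in memory at once, which should be acceptable for a few million short strings but is worth keeping in mind.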
I'm working on the following df:
Num1 <- c(1, 2, 1, 3, 4, 4, 6, 2)
Num2 <- c(3, 3, 2, 1, 1, 2,4, 4)
Num3 <- c(2, 2, 3, 4, 3, 5, 5, 7)
Num4 <- c(1, 3, 3, 1, 2,3, 3, 6)
Num5 <- c(2, 1, 1, 1, 5, 3, 2, 1)
df <- data.frame(Num1, Num2, Num3, Num4, Num5)
I need to create a new matrix whose first column is df[1] - df[2], whose second column is df[2] - df[3], and so on.
How about this?
mapply('-', df[-length(df)], df[-1])
Or (as mentioned by @Pierre Lafortune)
df[-length(df)] - df[-1]
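Note that the second form returns a data frame rather than a matrix. If a matrix with descriptive column names is wanted, a sketch along these lines may help; the "Num1-Num2" style labels are just one possible naming choice.

```r
Num1 <- c(1, 2, 1, 3, 4, 4, 6, 2)
Num2 <- c(3, 3, 2, 1, 1, 2, 4, 4)
Num3 <- c(2, 2, 3, 4, 3, 5, 5, 7)
Num4 <- c(1, 3, 3, 1, 2, 3, 3, 6)
Num5 <- c(2, 1, 1, 1, 5, 3, 2, 1)
df <- data.frame(Num1, Num2, Num3, Num4, Num5)

left  <- df[-ncol(df)]  # columns 1..4
right <- df[-1]         # columns 2..5

# Element-wise difference of adjacent columns, as a numeric matrix
m <- as.matrix(left) - as.matrix(right)

# Illustrative labels such as "Num1-Num2"
colnames(m) <- paste(names(left), names(right), sep = "-")
m
```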