I have a data frame df. I would like to find the peaks and valleys in each column and then mark the positions where they occur with the value 1 (and every other position with 0).
Here is an example applied to only one column.
Is it possible to do this for all the columns in the dataframe?
df <- data.frame(a = sample(1:10,10),
b = sample(1:10,10),
c = sample(1:10,10),
d = sample(1:10,10),
e = sample(1:10,10))
library(quantmod)  # provides findPeaks() and findValleys()
valleys <- findValleys(df$b, thresh = 0)
peaks <- findPeaks(df$b, thresh = 0)
df$b <- rep(0, nrow(df))
df$b <- replace(df$b, peaks, values = 1)
df$b <- replace(df$b, valleys, values = 1)
Thank you
The easiest thing is to put your code into a function.
library(quantmod)
replace_peaks_valleys <- function(x) {
valleys <- findValleys(x, thresh = 0)
peaks <- findPeaks(x, thresh = 0)
new_col <- rep(0, length(x))
new_col <- replace(new_col, peaks, values = 1)
new_col <- replace(new_col, valleys, values = 1)
return(new_col)
}
Then you can choose whether to do it in base R, dplyr or data.table.
base R
As you want to assign back to your original data frame, in base R you can do (note the square brackets or it will return a list):
df[] <- lapply(df, replace_peaks_valleys)
head(df)
# a b c d e
# 1 0 0 0 0 0
# 2 0 0 0 0 0
# 3 1 1 1 1 1
# 4 1 0 1 1 0
# 5 1 1 0 1 0
# 6 0 1 1 1 1
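To see why the square brackets matter, here is a small sketch. The toy data frame and the use of rev() as a stand-in for replace_peaks_valleys() are assumptions made for illustration only:

```r
# Toy data; rev() stands in for any function that returns a vector
# of the same length as its input.
toy <- data.frame(a = 1:3, b = 4:6)

# Without brackets: lapply() returns a plain list.
as_list <- lapply(toy, rev)
class(as_list)

# With brackets: the list elements are written back into the columns,
# so the object stays a data frame.
toy[] <- lapply(toy, rev)
class(toy)
```

Assigning with toy[] keeps the data frame's class and row names, while plain assignment would overwrite the data frame with the list.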
dplyr
Alternatively, with dplyr you can just do:
library(dplyr)
df |>
mutate(
across(
a:e, replace_peaks_valleys
)
)
# a b c d e
# 1 0 0 0 0 0
# 2 0 0 0 0 0
# 3 1 1 1 1 1
# 4 1 0 1 1 0
# <etc>
data.table
You can also do this with data.table:
library(data.table)
dt <- setDT(df)
dt[, lapply(.SD, replace_peaks_valleys)]
# a b c d e
# 1: 0 0 0 0 0
# 2: 0 0 0 0 0
# 3: 1 0 1 1 1
# 4: 1 1 0 0 0
# <etc>
N.B. I used set.seed(1) before I ran your code. If you do this as well, you should get exactly the same output.
Function definition
I just copied and pasted your code and made it into a function. You could change it so you assign 0 or 1 to the existing vector, rather than creating a new vector every time:
replace_peaks_valleys2 <- function(x) {
valleys <- findValleys(x, thresh = 0)
peaks <- findPeaks(x, thresh = 0)
x[] <- 0
x[c(peaks,valleys)] <- 1
return(x)
}
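If you would rather not depend on quantmod, a rough base R sketch of the same idea is below. The function name mark_turning_points is hypothetical, and it uses its own definition of a turning point (an interior point where the slope changes sign). quantmod's findPeaks/findValleys may report positions with a different offset, so the results need not match the output above.

```r
# Flag interior points where the direction of the series changes.
# Assumption: no ties between neighbouring values (true for sample(1:10, 10)).
mark_turning_points <- function(x) {
  slope_sign <- sign(diff(x))                  # +1 rising, -1 falling
  turning <- which(diff(slope_sign) != 0) + 1  # index of the point itself
  out <- numeric(length(x))
  out[turning] <- 1
  out
}

mark_turning_points(c(1, 3, 2, 5, 4))
# [1] 0 1 1 1 0
```

It can be applied to every column in the same way, e.g. df[] <- lapply(df, mark_turning_points).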
Related
I would like to add a varying number (X) of columns with 0 to an existing data.frame within a function.
Here is an example data.frame:
dt <- data.frame(x=1:3, y=4:6)
I would like to get this result if X=1 :
a x y
1 0 1 4
2 0 2 5
3 0 3 6
And this if X=3 :
a b c x y
1 0 0 0 1 4
2 0 0 0 2 5
3 0 0 0 3 6
What would be an efficient way to do this?
We can assign multiple columns to '0' based on the value of 'X'
X <- 3
nm1 <- names(dt)
dt[letters[seq_len(X)]] <- 0
dt[c(setdiff(names(dt), nm1), nm1)]
Also, we can use add_column from tibble and create columns at a specific location
library(tibble)
add_column(dt, .before = 1, !!!setNames(as.list(rep(0, X)),
letters[seq_len(X)]))
A second option is cbind
f <- function(x, n = 3) {
cbind.data.frame(matrix(
0,
ncol = n,
nrow = nrow(x),
dimnames = list(NULL, letters[1:n])
), x)
}
f(dt, 5)
# a b c d e x y
#1 0 0 0 0 0 1 4
#2 0 0 0 0 0 2 5
#3 0 0 0 0 0 3 6
NOTE: because letters has a length of 26 the function would need some adjustment regarding the naming scheme if n > 26.
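One possible adjustment for n > 26 is to generate numbered names instead of using letters. This is only a sketch: the function name add_zero_cols and the "z" prefix are assumptions, not part of the answer above.

```r
# Prepend n zero-filled columns named z1, z2, ..., zn.
# Assumption: none of these names already exist in x.
add_zero_cols <- function(x, n, prefix = "z") {
  new_names <- paste0(prefix, seq_len(n))
  zeros <- as.data.frame(
    matrix(0, nrow = nrow(x), ncol = n, dimnames = list(NULL, new_names))
  )
  cbind(zeros, x)
}

dt <- data.frame(x = 1:3, y = 4:6)
res <- add_zero_cols(dt, 30)
dim(res)
```

Numbered names have no upper limit, at the cost of no longer matching the a, b, c naming asked for in the question.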
You can try the code below
dt <- cbind(`colnames<-`(t(rep(0,X)),letters[seq(X)]),dt)
If you don't care about the names of the added columns, you can just use
dt <- cbind(t(rep(0,X)),dt)
which is much shorter
I have the following data frame in R:
Row number  A  B  C  D  E  F  G  H  I  J
1              1  1  0  0  1  0  0  1  1
2                 1  0  0  0  1  0  0  1
3              1  0  0  0  1  0  0  1  1
I am trying to calculate the number of times the number changes between 1 and 0 excluding the Nulls
The result I am expecting is this
Row Number No of changes
---------- --------------
1 4
2 4
3 4
An explanation for row 1
In row 1, A has a null so we exclude that.
B and C have 1 which is our first set of values.
D and E have 0 which is our second set of values. Now Change = 1
F has our third set of values which is 1. Now Change = 1+1
G and H have 0 which is our fourth set of values. Now Change = 1+1+1
I and J have 1 which is our fifth set of values. Now Change = 1+1+1+1 = 4
Here's a tidyverse approach.
I reshape the data into long format (with tidyr::pivot_longer), then add a helper column noting when we have a change from 0 to 1 or from 1 to 0, and then sum those by row.
library(tidyverse)
df %>%
# before tidyr 1.0, this would be gather(col, value, -1)
pivot_longer(-1, names_to = "col") %>%
group_by(Row.number) %>%
mutate(chg = value == 1 & lag(value) == 0 |
value == 0 & lag(value) == 1) %>%
summarize(no_chgs = sum(chg, na.rm = TRUE))
# A tibble: 3 x 2
Row.number no_chgs
<int> <int>
1 1 4
2 2 4
3 3 4
Sample data:
df <- read.table(
header = T,
stringsAsFactors = F,
text = "'Row number' A B C D E F G H I J
1 NA 1 1 0 0 1 0 0 1 1
2 NA NA 1 0 0 0 1 0 0 1
3 NA 1 0 0 0 1 0 0 1 1")
Here's a data.table solution:
library(data.table)
dt <- as.data.table(df)
dt[,
no_change := max(rleid(na.omit(t(.SD)))) - 1,
by = Row.number  # read.table() turns 'Row number' into Row.number
]
dt
Alternatively, here's a base version:
apply(df[, -1],
1,
function(x) {
complete_case = complete.cases(x)
if (sum(complete_case) > 0) {
return(length(rle(x[complete_case])$lengths) - 1)
} else {
return (0)
}
}
)
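A shorter base R variant is sketched below. It relies on the fact that, once the NAs are dropped, a change is simply a nonzero difference between neighbouring values. The helper name count_changes is hypothetical, and the matrix reproduces the sample data above without the row-number column.

```r
# Count how often the value changes along a vector, ignoring NAs.
count_changes <- function(x) {
  x <- x[!is.na(x)]
  sum(diff(x) != 0)
}

# The three sample rows (columns A to J).
vals <- rbind(
  c(NA,  1, 1, 0, 0, 1, 0, 0, 1, 1),
  c(NA, NA, 1, 0, 0, 0, 1, 0, 0, 1),
  c(NA,  1, 0, 0, 0, 1, 0, 0, 1, 1)
)

apply(vals, 1, count_changes)
# [1] 4 4 4
```

On the data frame itself the equivalent call would be apply(df[, -1], 1, count_changes). A row that is entirely NA gives 0, since diff() of an empty vector is empty.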
I want to randomly insert 1's into the columns of a data frame that do not currently contain a 1, using a different seed for each of those columns.
Below is the code I have written so far:
# create the data frame
df <- data.frame(A = c(0,0,0,0,0,0,0,0,0,0),
B = c(0,0,0,0,0,0,0,0,0,0),
C = c(0,1,0,0,0,1,0,1,0,0),
D = c(0,0,0,0,0,0,0,0,0,0),
E = c(0,1,0,1,0,0,0,0,0,0))
# get index of columns that have 1's in them
one_index <- which(grepl(pattern = 1, df))
# function to randomly put 1's with separate seeds
funcccs <- function(x){
i = 0
for (i in 1:ncol(x)) {
set.seed(i + 1)
x[sample(nrow(x),3)] <- 1
}}
# Apply the function to the columns that do not have 1
funcccs(df[,-one_index])
Below is the error message I get:
Error in `[<-.data.frame`(`*tmp*`, sample(nrow(x), 3), value = 1) :
  new columns would leave holes after existing columns
Based on the above example, the function should randomly insert 3 values of 1 in variables 'A', 'B' and 'D', because these 3 variables do not currently have 1's in them.
Any help will be appreciated. Thanks
df <- data.frame(A = c(0,0,0,0,0,0,0,0,0,0),
B = c(0,0,0,0,0,0,0,0,0,0),
C = c(0,1,0,0,0,1,0,1,0,0),
D = c(0,0,0,0,0,0,0,0,0,0),
E = c(0,1,0,1,0,0,0,0,0,0))
one_index <- which(grepl(pattern = 1, df))
funcccs <- function(x){
i = 0
for (i in 1:ncol(x)) {
set.seed(i + 1)
x[sample(nrow(x),3),i]= 1
}
return(x)
}
df[,-one_index]=funcccs(df[,-one_index])
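You were indexing the whole data frame instead of column i: x[sample(nrow(x), 3)] with a single index selects columns, not rows, which is why R complained about new columns.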
> df
A B C D E
1 0 0 0 1 0
2 1 1 1 0 1
3 0 0 0 1 0
4 0 1 0 0 1
5 1 0 0 0 0
6 0 0 1 1 0
7 1 0 0 0 0
8 0 1 1 0 0
9 0 0 0 0 0
10 0 0 0 0 0
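As a variation, the columns without a 1 can be detected with colSums() instead of grepl(), and the whole step wrapped in one function. This is only a sketch: the name insert_ones and the n argument are assumptions, the seeds follow the i + 1 scheme used above, and it assumes the data contain no NA values.

```r
# Put n randomly placed 1's into every column that has no 1 yet.
insert_ones <- function(df, n = 3) {
  target <- which(colSums(df == 1) == 0)  # columns without any 1
  for (k in seq_along(target)) {
    set.seed(k + 1)                       # a different seed per column
    df[sample(nrow(df), n), target[k]] <- 1
  }
  df
}

df <- data.frame(A = rep(0, 10),
                 B = rep(0, 10),
                 C = c(0, 1, 0, 0, 0, 1, 0, 1, 0, 0),
                 D = rep(0, 10),
                 E = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0))
out <- insert_ones(df)
colSums(out)
```

Columns C and E are returned unchanged, and A, B and D each receive exactly three 1's because sample() draws row indices without replacement.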
I have the following sample
id <- c("a","b","a","b","a","a","a","a","b","b","c")
SOG <- c(4,4,0,0,0,0,0,0,0,0,9)
data <- data.frame(id,SOG)
I would like a new column with the cumulative value when SOG == 0.
With the following code:
tmp <- rle(SOG) #run length encoding:
tmp$values <- tmp$values == 0 #turn values into logicals
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values]) #cumulative sum of TRUE values
inverse.rle(tmp) #inverse the run length encoding
I create the column "Stops":
data$Stops <- inverse.rle(tmp)
and I can get in it:
[1] 0 0 1 1 1 1 1 1 1 1 0
But I would like to have instead
[1] 0 0 1 2 3 3 3 3 4 4 0
I mean that when the level of the factor "id" is different from the previous row, I want to jump to the next "stop" (i+1).
Have a look at the dplyr package:
library(dplyr)
data %>%
mutate(
Stops = ifelse(
SOG > 0,
0,
cumsum(SOG == 0 & lag(id) != id)
)
)
We can try
library(data.table)
setDT(data1)[, v1 := if(all(!SOG)) c(TRUE, id[-1]!= id[-.N]) else
rep(FALSE, .N), .(grp = rleid(SOG))][,cumsum(v1)*(!SOG)]
#[1] 0 0 1 2 3 3 3 3 4 4 0 0 0 0 5 5 0 6 6 0
Using the old data
setDT(data)[, v1 := if(all(!SOG)) c(TRUE, id[-1]!= id[-.N])
else rep(FALSE, .N), .(grp = rleid(SOG))][,cumsum(v1)*(!SOG)]
#[1] 0 0 1 2 3 3 3 3 4 4 0
data
id <- c("a","b","a","b","a","a","a","a","b","b","c","a","a","a","a","a","a","a","a", "a")
SOG <- c(4,4,0,0,0,0,0,0,0,0,9,1,5,3,0,0,4,0,0,1)
data1 <- data.frame(id, SOG, stringsAsFactors=FALSE)
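For comparison, the same rule can be sketched in base R: a new stop starts on a row where SOG is 0 and either the previous row was not a stop or the id differs from the previous row. The intermediate variable names are illustrative assumptions; the data are the extended example above.

```r
id <- c("a","b","a","b","a","a","a","a","b","b","c",
        "a","a","a","a","a","a","a","a","a")
SOG <- c(4,4,0,0,0,0,0,0,0,0,9,1,5,3,0,0,4,0,0,1)

zero <- SOG == 0
prev_zero <- c(FALSE, head(zero, -1))             # was the previous row a stop?
id_changed <- c(TRUE, id[-1] != id[-length(id)])  # did id change from the previous row?

new_stop <- zero & (!prev_zero | id_changed)
Stops <- cumsum(new_stop) * zero                  # counter on stop rows, 0 elsewhere
Stops
# [1] 0 0 1 2 3 3 3 3 4 4 0 0 0 0 5 5 0 6 6 0
```

Unlike a plain lag(id) != id test, this also starts a new stop when a fresh run of zeros begins with the same id as the row before it (rows 15 and 18 here).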
I have a list of dataframes with some overlapping columns in each. The number of dataframes in the list is unknown. How can I efficiently, in base, rbind the dataframes together and fill in non overlapping columns with zeros?
Example data:
x <- data.frame(a=1:2, b=1:2, c=1:2)
y <- data.frame(a=1:2, r=1:2, f=1:2)
z <- data.frame(b=1:3, c=1:3, v=1:3, t=c("A", "A", "D"))
L1 <- list(x, y, z)
Desired output:
a b c f r t v
1 1 1 1 0 0 0 0
2 2 2 2 0 0 0 0
3 1 0 0 1 1 0 0
4 2 0 0 2 2 0 0
5 0 1 1 0 0 A 1
6 0 2 2 0 0 A 2
7 0 3 3 0 0 D 3
Pad out each data frame with the missing columns, then rbind them:
allnames <- unique(unlist(lapply(L1, names)))
do.call(rbind, lapply(L1, function(df) {
not <- allnames[!allnames %in% names(df)]
df[, not] <- 0
df
}))
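The same idea can be wrapped in a small helper that also sorts the columns alphabetically, as in the desired output, and resets the row names. This is a sketch under assumptions: the name rbind_fill0 and the fill argument are hypothetical, and mixing the numeric fill with the character column t relies on rbind() coercing that column to character.

```r
# Pad each data frame with the columns it lacks, then stack them.
rbind_fill0 <- function(dfs, fill = 0) {
  all_names <- sort(unique(unlist(lapply(dfs, names))))
  padded <- lapply(dfs, function(d) {
    for (nm in setdiff(all_names, names(d))) d[[nm]] <- fill
    d[all_names]                 # same column order in every piece
  })
  out <- do.call(rbind, padded)
  rownames(out) <- NULL
  out
}

x <- data.frame(a = 1:2, b = 1:2, c = 1:2)
y <- data.frame(a = 1:2, r = 1:2, f = 1:2)
z <- data.frame(b = 1:3, c = 1:3, v = 1:3, t = c("A", "A", "D"))
res <- rbind_fill0(list(x, y, z))
names(res)
```

Looping over setdiff() means a data frame that already has every column is left untouched rather than being indexed with an empty name vector.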
I have an old (and probably inefficient) function that does this. I've made one modification here to allow the fill to be specified.
RBIND <- function(datalist, keep.rownames = TRUE, fill = NA) {
Len <- sapply(datalist, ncol)
if (all(diff(Len) == 0)) {
temp <- names(datalist[[1]])
if (all(sapply(datalist, function(x) names(x) %in% temp))) tryme <- "basic"
else tryme <- "complex"
}
else tryme <- "complex"
almost <- switch(
tryme,
basic = { do.call("rbind", datalist) },
complex = {
Names <- unique(unlist(lapply(datalist, names)))
NROWS <- c(0, cumsum(sapply(datalist, nrow)))
NROWS <- paste(NROWS[-length(NROWS)]+1, NROWS[-1], sep=":")
out <- lapply(1:length(datalist), function(x) {
emptyMat <- matrix(fill, nrow = nrow(datalist[[x]]), ncol = length(Names))
colnames(emptyMat) <- Names
emptyMat[, match(names(datalist[[x]]),
colnames(emptyMat))] <- as.matrix(datalist[[x]])
emptyMat
})
do.call("rbind", out)
})
Final <- as.data.frame(almost, row.names = 1:nrow(almost))
Final <- data.frame(lapply(Final, function(x) type.convert(as.character(x))))
if (isTRUE(keep.rownames)) {
row.names(Final) <- make.unique(unlist(lapply(datalist, row.names)))
}
Final
}
Here it is on your sample data.
RBIND(L1, fill = 0)
# a b c r f v t
# 1 1 1 1 0 0 0 0
# 2 2 2 2 0 0 0 0
# 1.1 1 0 0 1 1 0 0
# 2.1 2 0 0 2 2 0 0
# 1.2 0 1 1 0 0 1 A
# 2.2 0 2 2 0 0 2 A
# 3 0 3 3 0 0 3 D