I have some data with
h = [157 144 80 106 124 46 207 188 190 208 143 170 162 178 155 163 162 149 135 160 149 147 133 146 126 120 151 74 122 145 160 155 173 126 172 93];
Then I get the local maxima with the diff function
max = [1 5 7 10 12 14 16 20 24 27 31 33 35 36]
I have simple MATLAB code for the spline interpolation
maxenv = spline(max,h(max),1:N);
This code gives the result
maxenv = 157 86.5643152828762 67.5352696350679 84.9885891697257 124 169.645228239041 207 224.396380746179 223.191793341491 208 185.421032390413 170 172.173624690130 178 172.759468849065 163 158.147870987344 157.874968589889 159.414581897490 160 157.622863516083 153.308219179638 148.839465253375 146 146.051320982064 148.167322961480 151 153.474200222188 155.606188003845 157.685081783579 160 163.653263154551 173 186.027639098700 172 93
Now I'm trying the same in R but get an error
maxenv <- spline(max, h(max), seq(N))
which produces the error
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
Your code and question are not entirely clear, so I'm not 100% sure this is what you are looking for. (Note that R indexes with square brackets, h[max], not parentheses: h(max) tries to call h as a function.)
# vector of values
h <- c(157, 144, 80, 106, 124, 46, 207, 188, 190, 208, 143, 170, 162, 178, 155,
163, 162, 149, 135, 160, 149, 147, 133, 146, 126, 120, 151, 74, 122, 145,
160, 155, 173, 126, 172, 93)
# local maxima
lmax <- h[c(1, which(diff(sign(diff(h)))==-2)+1, length(h))]
# spline calculation
spl <- spline(1:length(lmax), lmax)
# visual inspection
plot(spl)
lines(spl)
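To reproduce the MATLAB envelope evaluated at every sample 1:N, here is a minimal sketch (assuming N = length(h); again, R indexes with square brackets, h[idx], not h(idx)):

```r
# data from the question
h <- c(157, 144, 80, 106, 124, 46, 207, 188, 190, 208, 143, 170, 162, 178, 155,
       163, 162, 149, 135, 160, 149, 147, 133, 146, 126, 120, 151, 74, 122, 145,
       160, 155, 173, 126, 172, 93)
N <- length(h)
# local maxima plus the two endpoints
idx <- c(1, which(diff(sign(diff(h))) == -2) + 1, N)
# xout = 1:N evaluates the interpolant at every sample index,
# matching MATLAB's spline(max, h(max), 1:N)
maxenv <- spline(idx, h[idx], xout = 1:N)$y
```

Since the spline passes through its knots, maxenv[idx] equals h[idx].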
Related
Good evening dear community, I have the following query. I have a dataframe of genes with more than 100 thousand rows; according to the total column, the values run between 213 and 221 (i.e. there are rows for 213, 214, 215, 216, 217, 218, 219, 220 and 221).
mix total
1 A2M-ACTB 221
2 A2M-ACTG1 221
3 A2M-ANXA1 221
4 A2M-APP 221
5 A2M-B2M 221
6 A2M-CD24 221
7 A2M-CD74 221
8 A2M-COL1A2 221
9 A2M-COL3A1 221
10 A2M-DSP 221
11 A2M-EEF1A1 221
12 A2M-ENO1 221
13 A2M-FN1 221
14 A2M-GAPDH 221
15 A2M-HLA-B 221
16 A2M-HSP90AB1 221
17 A2M-MGP 221
18 A2M-RPL13A 221
19 A2M-RPS6 221
20 A2M-TM4SF1 221
What I would like first is to count the number of gene rows with total 221, then 220, and so on down to 213. Subsequently I would like to separate each count into an individual object.
I hope I have explained this clearly; thank you very much for your suggestions.
Kind regards
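A base-R sketch of the counting and splitting (assuming the data frame is called df with columns mix and total, as in the sample; the toy data here is only a stand-in for the 100k-row frame):

```r
# stand-in for the real data frame
df <- data.frame(mix   = c("A2M-ACTB", "A2M-ACTG1", "A2M-ANXA1", "A2M-APP"),
                 total = c(221, 221, 220, 213))
# number of rows for each total value (213 through 221)
counts <- table(df$total)
# split into one data frame per total value; each count is nrow() of its piece
pieces <- split(df, df$total)
```

If individual objects are really needed, list2env(pieces, envir = .GlobalEnv) creates them, though keeping everything in a named list is usually easier to work with.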
I am trying to create a weekly cumulative result from the daily data detailed below
date A B C D E F G H
16-Jan-22 227 441 3593 2467 9 6 31 2196
17-Jan-22 224 353 3555 2162 31 5 39 2388
18-Jan-22 181 144 2734 2916 0 0 14 1753
19-Jan-22 95 433 3610 3084 42 19 10 2862
20-Jan-22 141 222 3693 3149 183 19 23 2176
21-Jan-22 247 426 3455 4016 68 0 1 2759
22-Jan-22 413 931 4435 4922 184 2 39 3993
23-Jan-22 389 1340 5433 5071 200 48 27 4495
24-Jan-22 281 940 6875 5009 343 47 71 3713
25-Jan-22 314 454 5167 4555 127 1 68 3554
26-Jan-22 315 973 5789 3809 203 1 105 4456
27-Jan-22 269 1217 6776 4578 227 91 17 5373
28-Jan-22 248 1320 5942 3569 271 91 156 4260
29-Jan-22 155 1406 6771 4328 426 44 109 4566
Solution using data.table and lubridate
library(lubridate)
library(data.table)
setDT(df)
df[, lapply(.SD, sum), by = isoweek(dmy(date))]
# isoweek A B C D E F G H
# 1: 2 227 441 3593 2467 9 6 31 2196
# 2: 3 1690 3849 26915 25320 708 93 153 20426
# 3: 4 1582 6310 37320 25848 1597 275 526 25922
I wanted to provide a solution using tidyverse principles.
You will need the group_by() and summarize() verbs and the very useful across() function.
library(tibble)     # tribble()
library(dplyr)
library(lubridate)
#recreate data in a tribble
df <- tribble(
~"date", ~"A", ~"B", ~"C", ~"D", ~"E", ~"F", ~"G", ~H,
"16-Jan-22",227, 441, 3593, 2467, 9, 6, 31, 2196,
"17-Jan-22",224, 353, 3555, 2162, 31, 5, 39, 2388,
"18-Jan-22",181, 144, 2734, 2916, 0, 0, 14, 1753,
"19-Jan-22",95, 433, 3610, 3084, 42, 19, 10, 2862,
"20-Jan-22",141, 222, 3693, 3149, 183, 19, 23, 2176,
"21-Jan-22",247, 426, 3455, 4016, 68, 0, 1, 2759,
"22-Jan-22",413, 931, 4435, 4922, 184, 2, 39, 3993,
"23-Jan-22",389, 1340, 5433, 5071, 200, 48, 27, 4495,
"24-Jan-22",281, 940, 6875, 5009, 343, 47, 71, 3713,
"25-Jan-22",314, 454, 5167, 4555, 127, 1, 68, 3554,
"26-Jan-22",315, 973, 5789, 3809, 203, 1, 105, 4456,
"27-Jan-22",269, 1217, 6776, 4578, 227, 91, 17, 5373,
"28-Jan-22",248, 1320, 5942, 3569, 271, 91, 156, 4260,
"29-Jan-22",155, 1406, 6771, 4328, 426, 44, 109, 4566)
#parse the day-month-year dates (dmy, since the dates look like "16-Jan-22"; ymd would misread them)
df$date <- dmy(df$date)
# Create a new column "week_num", group by it,
# then summarize every column except "date" with its sum
df %>%
group_by(week_num=lubridate::isoweek(date)) %>%
summarize(across(-c("date"),sum))
# This gives:
  week_num     A     B     C     D     E     F     G     H
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1        2   227   441  3593  2467     9     6    31  2196
2        3  1690  3849 26915 25320   708    93   153 20426
3        4  1582  6310 37320 25848  1597   275   526 25922
group_by() and summarize() are relatively straightforward. across() is a fairly new verb that is super powerful: it lets you reference columns using tidy selection principles (e.g. starts_with(), c(1:9), etc.), and it applies a function to each of the selected columns. Less typing!
Alternatively you would have to sum each column individually (A = sum(A), B = sum(B), and so on), which is more typing.
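A tiny self-contained illustration of that difference (toy data, not the poster's):

```r
library(dplyr)

toy <- data.frame(g = c(1, 1, 2), A = c(10, 20, 30), B = c(1, 2, 3))
# across(): one line covers every selected column
toy %>% group_by(g) %>% summarize(across(c(A, B), sum))
# the hand-written equivalent grows with every extra column
toy %>% group_by(g) %>% summarize(A = sum(A), B = sum(B))
```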
My goal is to perform multiple column operations in one line of code without hard coding the variable names.
My data looks like this (dput output):
structure(list(Subject = 1:6, Congruent_1 = c(359, 391, 384,
316, 287, 403), Congruent_2 = c(361, 378, 322, 286, 276, 363),
Congruent_3 = c(342, 355, 334, 274, 297, 335), Congruent_4 = c(365,
503, 324, 256, 266, 388), Congruent_5 = c(335, 354, 320,
272, 260, 337), Incongruent_1 = c(336, 390, 402, 305, 310,
400), Incongruent_2 = c(366, 407, 386, 280, 243, 393), Incongruent_3 = c(323,
455, 317, 308, 259, 325), Incongruent_4 = c(361, 392, 357,
274, 342, 350), Incongruent_5 = c(300, 366, 378, 263, 258,
349)), row.names = c(NA, 6L), class = "data.frame")
I need to do column subtraction and save the new values in new columns. For example, a new column by the name of selhistB_1 should be computed as Incongruent_1 - Congruent_1. I tried to write a for loop that indexes the existing columns by their names and creates new columns using the same indexing variable:
for(i in 1:5)(
DP4 = mutate(DP4, as.name(paste("selhistB_",i,sep="")) = as.name(paste("Incongruent_",i,sep="")) - as.name(paste("Congruent_",i,sep="")))
)
but I received this error:
Error: unexpected '=' in: "for(i in 1:5)( DP4 = mutate(DP4, as.name(paste("selhistB_",i,sep="")) ="
I'd rather use this modular approach, as opposed to hard-coding and writing out "selhistB_1 = Incongruent_1 - Congruent_1" five times in mutate().
I also wonder if I could achieve the same goal on the long version of this data; maybe that would make more sense.
library(dplyr)
library(tidyr)   # pivot_longer() / pivot_wider()
d %>%
pivot_longer(-Subject,
names_to = c(".value", "id"),
names_sep = "_") %>%
mutate(selhistB = Incongruent - Congruent) %>%
pivot_wider(names_from = id, values_from = c(Congruent, Incongruent, selhistB))
Or just skip the last pivot, and keep everything long.
As long as you are already using tidyverse packages, the following code will do exactly what you need:
library(dplyr)
for(i in 1:5){
DP4 <- DP4 %>% mutate(UQ(sym(paste0("selhistB_",i))) :=
UQ(sym(paste0("Incongruent_",i))) - UQ(sym(paste0("Congruent_",i))))
}
DP4
Subject Congruent_1 Congruent_2 Congruent_3 Congruent_4 Congruent_5
1 1 359 361 342 365 335
2 2 391 378 355 503 354
3 3 384 322 334 324 320
4 4 316 286 274 256 272
5 5 287 276 297 266 260
6 6 403 363 335 388 337
Incongruent_1 Incongruent_2 Incongruent_3 Incongruent_4 Incongruent_5
1 336 366 323 361 300
2 390 407 455 392 366
3 402 386 317 357 378
4 305 280 308 274 263
5 310 243 259 342 258
6 400 393 325 350 349
  selhistB_1 selhistB_2 selhistB_3 selhistB_4 selhistB_5
1        -23          5        -19         -4        -35
2         -1         29        100       -111         12
3         18         64        -17         33         58
4        -11         -6         34         18         -9
5         23        -33        -38         76         -2
6         -3         30        -10        -38         12
You can use split.default and split on the column-name suffix, then loop over the list and subtract the first column (Congruent_i) from the second (Incongruent_i), i.e.
lapply(split.default(df[-1], sub('.*_', '', names(df[-1]))), function(i) i[2] - i[1])
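To attach those differences back onto the data frame as selhistB_* columns, one possible continuation (the data is abbreviated here to two subjects and two column pairs, purely for brevity):

```r
# abbreviated version of the question's data
df <- data.frame(Subject = 1:2,
                 Congruent_1 = c(359, 391), Congruent_2 = c(361, 378),
                 Incongruent_1 = c(336, 390), Incongruent_2 = c(366, 407))
# group the columns by their numeric suffix; each element holds the
# Congruent_i / Incongruent_i pair, in that order
pairs <- split.default(df[-1], sub('.*_', '', names(df[-1])))
# Incongruent minus Congruent, per the question's definition of selhistB
diffs <- lapply(pairs, function(i) i[[2]] - i[[1]])
df[paste0("selhistB_", names(diffs))] <- diffs
```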
Using subtraction over all matching columns (Incongruent minus Congruent, as the question defines selhistB), then cbind, try:
x <- df1[, grepl("^I", colnames(df1)) ] - df1[, grepl("^C", colnames(df1)) ]
names(x) <- paste0("selhistB_", seq_along(names(x)))
res <- cbind(df1, x)
res
  Subject Congruent_1 Congruent_2 Congruent_3 Congruent_4 Congruent_5
1       1         359         361         342         365         335
2       2         391         378         355         503         354
3       3         384         322         334         324         320
4       4         316         286         274         256         272
5       5         287         276         297         266         260
6       6         403         363         335         388         337
  Incongruent_1 Incongruent_2 Incongruent_3 Incongruent_4 Incongruent_5
1           336           366           323           361           300
2           390           407           455           392           366
3           402           386           317           357           378
4           305           280           308           274           263
5           310           243           259           342           258
6           400           393           325           350           349
  selhistB_1 selhistB_2 selhistB_3 selhistB_4 selhistB_5
1        -23          5        -19         -4        -35
2         -1         29        100       -111         12
3         18         64        -17         33         58
4        -11         -6         34         18         -9
5         23        -33        -38         76         -2
6         -3         30        -10        -38         12
I have a simple dataset, attached below. I see a clear outlier (Qty = 6), which should get corrected after processing it through tsclean.
c(6, 187, 323, 256, 289, 387, 335, 320, 362, 359, 426, 481,
356, 408, 497, 263, 330, 521, 406, 350, 478, 320, 339)
What I have:
library(forecast)
library(readr)   # read_csv()
data1 <- read_csv("sample.csv", col_names = FALSE)
count_qty <-ts(data1, frequency = 12)
data1$clean_qty = tsclean(count_qty)
and the data returns
X1 clean_qty[,"X1"]
<dbl> <dbl>
1 6 6
2 187 187
3 323 323
4 256 256
5 289 289
6 387 387
7 335 335
8 320 320
9 362 362
10 359 359
# ... with 13 more rows
The first item should be removed.
You can remove outliers using boxplot (here vec1 is your vector of quantities):
vec1[! vec1 %in% boxplot(vec1, plot = F)$out]
# [1] 323 256 289 387 335 320 362 359 426 481 356 408 497 263 330 521 406 350 478 320 339
Note that 187 is also an outlier; as you said, 6 is the obvious one.
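If the goal is to keep the series the same length for ts() work rather than drop points, one option is to blank the outliers and interpolate across the gaps; a base-R sketch using the question's values as vec1:

```r
vec1 <- c(6, 187, 323, 256, 289, 387, 335, 320, 362, 359, 426, 481,
          356, 408, 497, 263, 330, 521, 406, 350, 478, 320, 339)
out <- boxplot.stats(vec1)$out     # same outliers as boxplot(vec1, plot = FALSE)$out
cleaned <- vec1
cleaned[cleaned %in% out] <- NA    # blank the outliers instead of dropping them
# linear interpolation over the NAs; rule = 2 carries the nearest value to the ends
cleaned <- approx(seq_along(vec1), cleaned, xout = seq_along(vec1), rule = 2)$y
```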
Sample data:
> ind1
Ind Gb19a Gb19b Gb24a Gb24b Gb28a Gb28b Gb11a Gb11b
1 9-2-J1-N3 378 386 246 248 360 372 162 261
2 9-2-J1-N3 380 386 246 248 360 372 187 261
14 9-2-J1-N3 380 386 246 248 NA NA NA NA
15 9-2-J1-N3 NA 246 248 360 187 NA NA NA
16 9-2-J1-N3 380 386 380 386 378 386 380 386
17 9-2-J1-N3 380 386 246 248 360 372 187 261
19 9-2-J1-N3 360 372 360 372 360 372 360 372
20 9-2-J1-N3 187 261 187 261 162 261 187 261
21 9-2-J1-N3 380 386 240 246 360 372 187 NA
> class(ind1)
[1] "data.frame"
So I need to count, for every column, how many values other than the most common one there are. Expected output would be:
Gb19a 3
Gb19b 3
Gb24a 5
etc.
I have a solution, given by folks here in a previous question I asked (thanks to them), that explicitly does the calculation for one variable, but I don't think it's a workable approach for my situation.
> table(ind1$Gb19a)
187 360 378 380
1 1 1 5
counts1 <- as.data.frame(table(ind1$Gb19a), stringsAsFactors = FALSE)
modal_value1 <- which.max(counts1$Freq)
(sum(counts1$Freq)-counts1$Freq[modal_value1])
[1] 3
How to apply this to entire data.frame ?
As always, thanks for any help!
You just say the word !
("How to apply this to entire data.frame?")
countValsButMostFreq <- function(values){
counts1 <- as.data.frame(table(values), stringsAsFactors = FALSE)
modal_value1 <- which.max(counts1$Freq)
return (sum(counts1$Freq)-counts1$Freq[modal_value1])
}
ind1 <- rbind.data.frame(
c('9-2-J1-N3', 378, 386, 246, 248, 360, 372, 162, 261),
c('9-2-J1-N3', 380, 386, 246, 248, 360, 372, 187, 261),
c('9-2-J1-N3', 380, 386, 246, 248, NA, NA, NA, NA),
c('9-2-J1-N3', NA, 246, 248, 360, 187, NA, NA, NA),
c('9-2-J1-N3', 380, 386, 380, 386, 378, 386, 380, 386),
c('9-2-J1-N3', 380, 386, 246, 248, 360, 372, 187, 261),
c('9-2-J1-N3', 360, 372, 360, 372, 360, 372, 360, 372),
c('9-2-J1-N3', 187, 261, 187, 261, 162, 261, 187, 261),
c('9-2-J1-N3', 380, 386, 240, 246, 360, 372, 187, NA))
colnames(ind1) <- c('Ind', 'Gb19a', 'Gb19b', 'Gb24a', 'Gb24b', 'Gb28a', 'Gb28b', 'Gb11a', 'Gb11b')
res <- apply(X=ind1,MARGIN=2,FUN=countValsButMostFreq)
res
Result:
Ind Gb19a Gb19b Gb24a Gb24b Gb28a Gb28b Gb11a Gb11b
0 3 3 5 5 3 2 3 2
Here is an example of doing this for mtcars:
as.data.frame(
  lapply(mtcars,
    function(x) length(na.omit(x)) - unname(tail(sort(table(x)), 1))
  )
)
  mpg cyl disp hp drat wt qsec vs am gear carb
1  30  18   29 29   29 29   30 14 13   17   22
How does this work?
Set up a function to get the count for a single column:
Use table to get the frequency of each value
Sort the results
Take the largest frequency (the most common value's count) with tail, and unname to drop its name
Subtract that from the number of non-NA values, leaving the count of everything but the most common value
Then simply pass that to lapply and convert the results to a data.frame
You're looking for the apply family. I'd probably use sapply here, but that's your choice.
ind1 <- read.table(text="Ind Gb19a Gb19b Gb24a Gb24b Gb28a Gb28b Gb11a Gb11b
1 9-2-J1-N3 378 386 246 248 360 372 162 261
2 9-2-J1-N3 380 386 246 248 360 372 187 261
14 9-2-J1-N3 380 386 246 248 NA NA NA NA
15 9-2-J1-N3 NA 246 248 360 187 NA NA NA
16 9-2-J1-N3 380 386 380 386 378 386 380 386
17 9-2-J1-N3 380 386 246 248 360 372 187 261
19 9-2-J1-N3 360 372 360 372 360 372 360 372
20 9-2-J1-N3 187 261 187 261 162 261 187 261
21 9-2-J1-N3 380 386 240 246 360 372 187 NA", header=TRUE)
# for each column: how many non-NA values are not the most frequent one
nonmodal <- function(x) {x <- na.omit(x); length(x) - max(table(x))}
sapply(ind1, nonmodal)
apply(mymatrix, 2, myfunc) runs myfunc(onecolumn) on each column of a matrix or data frame.
myfunc would be the code you posted to calculate the sum, except that ind1$Gb19a is replaced with onecolumn.
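Putting that together, a minimal sketch of the per-column count (toy columns shaped like the question's data; myfunc is the question's own calculation wrapped in a function, and sapply works just as well as apply here):

```r
# toy columns resembling Gb19a / Gb19b from the question
ind1 <- data.frame(Gb19a = c(378, 380, 380, NA, 380, 380, 360, 187, 380),
                   Gb19b = c(386, 386, 386, 246, 386, 386, 372, 261, 386))
myfunc <- function(onecolumn) {
  counts1 <- as.data.frame(table(onecolumn), stringsAsFactors = FALSE)  # table() drops NAs
  sum(counts1$Freq) - counts1$Freq[which.max(counts1$Freq)]
}
sapply(ind1, myfunc)   # one "all but the most common" count per column
```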