Rolling sample standard deviation in R

I want to get the standard deviation over a 7-row window: the 3 previous rows, the current row, and the 3 following rows.
This is my attempt:
mutate(ming_STDDEV_SAMP = zoo::rollapply(ming_f, list(c(-3:3)), sd, fill = 0)) %>%
Result:
ming_f       ming_STDDEV_SAMP
4.235279667  0.222740262
4.265353     0.463348209
4.350810667  0.442607461
3.864739333  0.375839159
3.935632333  0.213821765
3.802632333  0.243294783
3.718387667  0.051625808
4.288542333  0.242010836
4.134689     0.198929941
3.799883667  0.112733475
This is what I expected:
ming_f       ming_STDDEV_SAMP
4.235279667  0.225532646
4.265353     0.212776157
4.350810667  0.23658801
3.864739333  0.253399417
3.935632333  0.26144862
3.802632333  0.246259684
3.718387667  0.20514358
4.288542333  0.208578409
4.134689     0.208615874
3.799883667  0.233948429

It doesn't match your output exactly, but perhaps this is what you need:
zoo::rollapply(quux$ming_f, 7, FUN=sd, partial=TRUE)
(It also works replacing 7 with list(-3:3).)
This expression isn't really different from your sample code, but the output is correct. Perhaps your original frame has a group_by still applied?
Data
quux <- structure(list(ming_f = c(4.235279667, 4.265353, 4.350810667, 3.864739333, 3.935632333, 3.802632333, 3.718387667, 4.288542333, 4.134689, 3.799883667), ming_STDDEV_SAMP = c(0.225532646, 0.212776157, 0.23658801, 0.253399417, 0.26144862, 0.246259684, 0.20514358, 0.208578409, 0.208615874, 0.233948429)), class = "data.frame", row.names = c(NA, -10L))
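To make explicit what a centered 7-row window with partial = TRUE computes, here is a minimal base-R sketch. The helper name roll_sd_centered is hypothetical (it is not part of zoo), and it assumes that near the edges the window is simply truncated to the rows that exist, which is how I understand partial = TRUE to behave.

```r
# Hypothetical helper: sample sd over rows i-k .. i+k, truncated at the edges.
roll_sd_centered <- function(x, k = 3) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    lo <- max(1, i - k)   # first row of the window
    hi <- min(n, i + k)   # last row of the window
    sd(x[lo:hi])          # sample standard deviation (n - 1 denominator)
  }, numeric(1))
}

ming_f <- c(4.235279667, 4.265353, 4.350810667, 3.864739333, 3.935632333,
            3.802632333, 3.718387667, 4.288542333, 4.134689, 3.799883667)
roll_sd_centered(ming_f)
```

Comparing this against the rollapply() result on an ungrouped copy of the data is a quick way to check whether leftover grouping is what changes the numbers.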

Related

Separate several special rows based on row names by R

I have a table from which I want to extract some rows by row name.
          phi        psi
A-MET-0   NA         158.0945
A-VAL-1   -144.914   137.4733
A-LEU-2   -95.021    149.7834
A-SER-3   -82.6826   166.6769
B-GLU-4   -62.5314   -36.2634
H-GLY-5   -59.6463   -42.0355
A-GLU-6   -68.6914   -40.2354
AB-TRP-7  -63.5297   -36.769
A-GLN-8   -64.8056   -38.5128
I can select a single row with the following command:
tor$tbl["A-LEU-2",]
But when I try to select a range of rows, I get an error.
This is the command I use for the range:
tor$tbl["A-LEU-2":"AB-TRP-7", ]
How can I select several rows from the table based on their row names?
Thanks for any guidance.
Here's a function to do that:
subset_by_rowname_range <- function(data, start_name, end_name) {
  rn <- rownames(data)
  # Convert both row names to positions, then take the range between them
  data[match(start_name, rn) : match(end_name, rn), ]
}
subset_by_rowname_range(df, "A-LEU-2", "AB-TRP-7")
# phi psi
#A-LEU-2 -95.0210 149.7834
#A-SER-3 -82.6826 166.6769
#B-GLU-4 -62.5314 -36.2634
#H-GLY-5 -59.6463 -42.0355
#A-GLU-6 -68.6914 -40.2354
#AB-TRP-7 -63.5297 -36.7690
subset_by_rowname_range(df, "H-GLY-5", "A-GLN-8")
# phi psi
#H-GLY-5 -59.6463 -42.0355
#A-GLU-6 -68.6914 -40.2354
#AB-TRP-7 -63.5297 -36.7690
#A-GLN-8 -64.8056 -38.5128
data
df <- structure(list(phi = c(NA, -144.914, -95.021, -82.6826, -62.5314,
-59.6463, -68.6914, -63.5297, -64.8056), psi = c(158.0945, 137.4733,
149.7834, 166.6769, -36.2634, -42.0355, -40.2354, -36.769, -38.5128
)), class = "data.frame", row.names = c("A-MET-0", "A-VAL-1",
"A-LEU-2", "A-SER-3", "B-GLU-4", "H-GLY-5", "A-GLU-6", "AB-TRP-7",
"A-GLN-8"))
A simple solution is to add a row-number column, which shows the positions that can be used as indices for slice:
library(dplyr)
df %>%
  mutate(row = row_number()) %>%
  slice(3:8)  # rows 3 to 8 are "A-LEU-2" to "AB-TRP-7"
Alternatively, which() can look up the positions of the row names, and slice can then select that range:
df %>%
  slice(which(grepl("A-LEU-2", row.names(df))):which(grepl("AB-TRP-7", row.names(df))))
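One caveat with grepl() is that it matches patterns anywhere in the string, so a name such as "A-LEU-2" would also hit "A-LEU-20" in a longer table. The sketch below uses exact matching with match() instead and stops with an error when a name is missing. The helper rows_between and the toy data frame are made up for illustration.

```r
# Toy data (invented values) with two row names that share a prefix.
rn  <- c("A-LEU-2", "A-LEU-20", "AB-TRP-7")
toy <- data.frame(phi = c(1, 2, 3), row.names = rn)

which(grepl("A-LEU-2", rn))  # 1 2: the pattern also matches "A-LEU-20"
match("A-LEU-2", rn)         # 1: exact match only

# Hypothetical helper: select the rows between two exact row names.
rows_between <- function(data, first, last) {
  idx <- match(c(first, last), rownames(data))
  if (anyNA(idx)) stop("row name not found")
  data[idx[1]:idx[2], , drop = FALSE]  # drop = FALSE keeps a data frame
}

rows_between(toy, "A-LEU-2", "AB-TRP-7")
```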

How to select columns in R with colSums as condition?

I have tried to find a similar question, but none of them is quite like this one.
I'm trying to filter the columns of my data table using colSums. That is, if colSums gives a certain amount for a column (let's say under 5000), I want to include or exclude that column, and I want to do this for the whole data table with a loop or apply. This shouldn't be that hard, but I'm not sure what I'm doing wrong; maybe someone here can help.
Below is a representation of my data, followed by my code. I used the dput function to represent the data.
I have tried many different versions of the code, but none of them worked. I think this one is the closest, but when I run the line below, it gives me this error message: "Error: expecting a one sided formula, a function, or a function name."
I have been using the dplyr package; everything else should be base functions.
> dput(data999[1:2, ])
KER000_349094 = c(0.1806,
0.1806), KER000_349085 = c(0.1832, 0.1832), KER000_351771 = c(0.1858,
0.1858), KER000_60103549 = c(0.1034, 0.1034), KER000_391452 = c(0.0016,
0.0016), KER000_345696 = c(0.1718, 0.1718), KER000_342793 = c(0.189230769230769,
0.189230769230769), KER000_345615 = c(0.0165384615384615,
0.0165384615384615), KER000_344065 = c(0.0592307692307692,
0.0592307692307692), KER000_353687 = c(0.188076923076923,
0.188076923076923), KER000_340589 = c(2.44, 2.44), KER000_346489 = c(0,
0), KER000_348357 = c(0.16, 0.16), KER000_363845 = c(3.135,
3.135), KER000_60029018 = c(0.115, 0.115), KER000_341255 = c(0,
0)), row.names = 1:2, class = "data.frame")
jeejee = apply(data999, 2, function(x) select_if(colSums(x <= 5000)))
Copying my comment, since it seems to be the answer.
data999[,colSums(data999)<=5000]
to select all columns whose sum is <= 5000.
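As a small illustration of the same idea on made-up data: colSums() is applied to the whole data frame once, so no loop or apply() is needed. The sketch adds two optional safeguards that are my assumptions, not part of the answer above: na.rm = TRUE so a missing value does not turn a column sum into NA, and drop = FALSE so the result stays a data frame even when only one column qualifies.

```r
# Toy data (invented): column b sums to 7000, the others stay under 5000.
toy <- data.frame(a = c(1, 2), b = c(4000, 3000), c = c(0.5, NA))

keep  <- colSums(toy, na.rm = TRUE) <= 5000  # one TRUE/FALSE per column
small <- toy[, keep, drop = FALSE]           # keep only the qualifying columns

names(small)  # "a" "c"
```

Negating the condition (`!keep`) selects the excluded columns instead.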

Labelling Variables

I have a series of variables that fall under one related question: let's say there are 20 such variables in my data frame, each corresponding to an option on a multiple-choice question. They are named popn1, popn2, ..., popn20.
I want to label each variable by its option, for example: (popn1 = Everyone; popn2 = Children)
I'm using the labelVector package.
Is there a way I can do it without writing out each variable name? Ex. is there a paste function I can use, such as
df2 <- Set_label(df1,
(paste0(popn, 1:20) = "Everyone", "Children", .... "Youth"?)
This can be done in base R quite easily. Here's some sample data (using 5 columns instead of 20, to make it easier to view)
popn1 popn2 popn3 popn4 popn5
1 -0.4085141 3.240716 2.730837 6.428722 8.015210
2 3.1378943 2.512700 2.021546 3.333371 5.654401
3 2.4073278 1.475619 2.449742 2.817447 6.295569
It looks like you already have your new column names in a character vector:
your_column_names <- c("Everyone", "Youth", "Someone", "Something", "Somewhere")
Then you just use the setNames function on the column names of your data:
colnames(data) <- setNames(your_column_names, colnames(data))
Everyone Youth Someone Something Somewhere
1 -0.4085141 3.240716 2.730837 6.428722 8.015210
2 3.1378943 2.512700 2.021546 3.333371 5.654401
3 2.4073278 1.475619 2.449742 2.817447 6.295569
Sample Data:
data <- structure(list(popn1 = c(-0.408514139489243, 3.13789432899688,
2.40732780606037), popn2 = c(3.24071608151551, 2.51269963339946,
1.47561933493116), popn3 = c(2.73083728435832, 2.02154567048998,
2.44974180329751), popn4 = c(6.42872215439841, 3.3333709733048,
2.81744655980154), popn5 = c(8.0152099281755, 5.65440141443164,
6.29556905855252)), class = "data.frame", row.names = c(NA, -3L
))
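If the goal is to keep the names popn1, popn2, ... and attach a descriptive label to each column rather than rename it, a base-R sketch could look like the following. It assumes the label is stored in each column's "label" attribute, which is my understanding of the convention labelling packages use; the toy data and label values are placeholders.

```r
# Toy data with three option columns (invented values).
data <- data.frame(popn1 = c(1, 2), popn2 = c(3, 4), popn3 = c(5, 6))

labels <- c("Everyone", "Children", "Youth")
cols   <- paste0("popn", seq_along(labels))  # "popn1", "popn2", "popn3"

# Attach one label per column without typing each variable name.
for (j in seq_along(cols)) {
  attr(data[[cols[j]]], "label") <- labels[j]
}

attr(data$popn2, "label")  # "Children"
```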

Multiple Histograms On 1 page (without making long data)

I want to make a histogram for each column. Each column has three values (Phase_1_Mean, Phase_2_Mean and Phase_3_Mean).
The output should be:
12 histograms (because we have 12 rows), and per histogram the 3 values shown as bars (Y axis = value, X axis = Phase_1_Mean, Phase_2_Mean and Phase_3_Mean).
Where I am stuck: when I search the internet, almost everyone converts to a "long" data frame. That does not help in this example, because it generates a single "value" column, but I want to keep the three rows separate.
My data is below. Any help is appreciated!
I tried this (How do I generate a histogram for each column of my table?), but it runs into the "long table" problem. After that I tried Multiple Plots on 1 page in R, which solved how to plot multiple graphs on one page.
dput(Plots1)
structure(list(`0-0.5` = c(26.952381, 5.455598, 28.32947), `0.5-1` =
c(29.798635,
25.972696, 32.87372), `1-1.5` = c(32.922764, 41.95935, 41.73577
), `1.5-2` = c(31.844156, 69.883117, 52.25974), `2-2.5` = c(52.931034,
128.672414, 55.65517), `2.5-3` = c(40.7, 110.1, 63.1), `3-3.5` =
c(73.466667,
199.533333, 70.93333), `3.5-4` = c(38.428571, 258.571429, 95),
`4-4.5` = c(47.6, 166.5, 233.4), `4.5- 5` = c(60.846154,
371.730769, 74.61538), `5-5.5` = c(7.333333, 499.833333,
51), `5.5-6` = c(51.6, 325.4, 82.4), `6-6.5` = c(69, 411.5,
134)), class = "data.frame", .Names = c("0-0.5", "0.5-1",
"1-1.5", "1.5-2", "2-2.5", "2.5-3", "3-3.5", "3.5-4", "4-4.5",
"4.5- 5", "5-5.5", "5.5-6", "6-6.5"), row.names = c("Phase_1_Mean",
"Phase_2_Mean", "Phase_3_Mean"))
Something like what is shown in this example (which didn't work for me, because it is Python): https://www.google.com/search?rlz=1C1GCEA_enNL765NL765&biw=1366&bih=626&tbm=isch&sa=1&ei=Yqc8XOjMLZDUwQLp9KuYCA&q=multiple+histograms+r&oq=multiple+histograms+r&gs_l=img.3..0i19.4028.7585..7742...1.0..1.412.3355.0j19j1j0j1......0....1..gws-wiz-img.......0j0i67j0i30j0i5i30i19j0i8i30i19j0i5i30j0i8i30j0i30i19.j-1kDXNKZhI#imgrc=L0Lvbn1rplYaEM:
I think you have to reshape to long to make this work, but I don't see why this is a problem. I think this code achieves what you want. Note that there are 13 plots because you have 13 (not 12) columns in the dataframe you posted.
# Load libraries
library(reshape2)
library(ggplot2)
Plots1$ID <- rownames(Plots1) # Add an ID variable
Plots2 <- melt(Plots1) # melt to long format
ggplot(Plots2, aes(y = value, x = ID)) + geom_bar(stat = "identity") + facet_wrap(~variable)
I've kept the resulting plot basic, but of course you can make it prettier by adding further layers.
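If reshaping really is unwanted, base graphics can draw one bar chart per column directly from the wide data frame. The following is a minimal sketch, not a replacement for the ggplot2 approach: plot_columns is a hypothetical helper, the grid layout rule is my own choice, and only the first three columns of the posted data are used to keep the example short.

```r
# First three columns of the posted data; check.names = FALSE keeps names like "0-0.5".
Plots1 <- data.frame(
  `0-0.5` = c(26.952381, 5.455598, 28.32947),
  `0.5-1` = c(29.798635, 25.972696, 32.87372),
  `1-1.5` = c(32.922764, 41.95935, 41.73577),
  row.names = c("Phase_1_Mean", "Phase_2_Mean", "Phase_3_Mean"),
  check.names = FALSE
)

# Hypothetical helper: one bar chart per column, arranged on a roughly square grid.
plot_columns <- function(df) {
  n  <- ncol(df)
  nc <- ceiling(sqrt(n))   # number of grid columns
  nr <- ceiling(n / nc)    # number of grid rows
  op <- par(mfrow = c(nr, nc), mar = c(7, 3, 2, 1))
  on.exit(par(op))         # restore the previous graphics settings
  for (nm in names(df)) {
    barplot(df[[nm]], names.arg = rownames(df), main = nm,
            las = 2, cex.names = 0.7)
  }
  invisible(c(nr, nc))
}

plot_columns(Plots1)
```

With all 13 columns this layout rule gives a 4 x 4 grid, which may need a larger plotting device to avoid margin errors.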

Find zero crossing in R

If I have the following data:
df <- structure(list(x = c(1.63145539094563, 1.67548187017034, 1.71950834939504,
1.76353482861975, 1.80756130784445, 1.85158778706915, 1.89561426629386,
1.93964074551856, 1.98366722474327, 2.02769370396797, 2.07172018319267,
2.11574666241738, 2.15977314164208, 2.20379962086679, 2.24782610009149,
2.2918525793162, 2.3358790585409, 2.3799055377656, 2.42393201699031,
2.46795849621501, 2.51198497543972, 2.55601145466442, 2.60003793388912,
2.64406441311383, 2.68809089233853, 2.73211737156324, 2.77614385078794,
2.82017033001265, 2.86419680923735, 2.90822328846205, 2.95224976768676,
2.99627624691146, 3.04030272613617, 3.08432920536087, 3.12835568458557,
3.17238216381028, 3.21640864303498, 3.26043512225969, 3.30446160148439,
3.3484880807091, 3.3925145599338, 3.4365410391585, 3.48056751838321,
3.52459399760791, 3.56862047683262, 3.61264695605732, 3.65667343528202,
3.70069991450673, 3.74472639373143, 3.78875287295614), y = c(24.144973858154,
18.6408277478876, 21.9174270206615, 22.8017876727379, 20.9766270378248,
18.604384256745, 18.4805250429826, 15.8436744335752, 13.6357170277296,
11.6228806771368, 9.4065868126964, 6.81644596802601, 4.41187500831424,
4.31911614349431, 0.678259284890563, -1.18632719250877, -2.32986407762089,
-3.84480566043122, -5.24738510499144, -5.20160089844013, -5.42094587600499,
-5.39886757202858, -5.26753920575326, -4.68727963638973, -2.73267203102102,
0.296905237887623, 2.45725152489283, 5.12102449689086, 7.13986218237411,
10.2044876281093, 14.4358946463429, 19.0643081865458, 22.8920445618834,
26.7229418763085, 31.3776791707576, 36.19058349817, 41.2843224331918,
46.3396522631345, 51.4321502764393, 56.4080998038294, 61.5215778808583,
66.6845421308734, 71.3912749310486, 76.0856977880158, 80.7039319129457,
84.4095953723555, 88.0163019647757, 89.918078622734, 91.6341473685881,
94.0404562451352)), class = c("tbl_df", "tbl", "data.frame"), .Names = c("x",
"y"), row.names = c(NA, -50L))
How do I find the exact x value where y == 0? I tried interpolation, but it does not necessarily give me a y value equal to zero. Does anyone know of a function to find zero crossings?
Firstly, one can define a corresponding (linearly) interpolated function with
approxfun(df$x, df$y)
where the result looks like
curve(approxfun(df$x, df$y)(x), min(df$x), max(df$x))
The zero crossings can then be seen as the roots of this function. In base R there is a function uniroot, but it looks for a single root, while in your case we have two. Hence, one option would be the rootSolve package, as in
library(rootSolve)
uniroot.all(approxfun(df$x, df$y), interval = range(df$x))
# [1] 2.263841 2.727803
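Because the interpolating function is piecewise linear, its roots can also be computed directly without an extra package: find each pair of neighbouring points where y changes sign and solve the straight line between them for y = 0. The sketch below shows this on toy data; the function name zero_crossings is hypothetical, and it assumes x is sorted in increasing order.

```r
# Hypothetical helper: x positions where the linear interpolant of (x, y) crosses zero.
zero_crossings <- function(x, y) {
  i <- which(diff(sign(y)) != 0)  # intervals [i, i + 1] where the sign of y changes
  # Root of the straight line through (x[i], y[i]) and (x[i + 1], y[i + 1])
  roots <- x[i] - y[i] * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
  unique(roots)  # drops exact duplicates, which can arise when a sample is exactly zero
}

# Toy data with sign changes between x = 1 and 2, and between x = 3 and 4.
x <- 0:4
y <- c(2, 1, -1, -2, 2)
zero_crossings(x, y)  # 1.5 3.5
```

On the data from the question this would be called as zero_crossings(df$x, df$y).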

Resources