I have a dataset with multiple stations, depths, and concentrations. For each station, I am trying to find the depth (or layer thickness) at which the minimum concentration increases by 0.1.
For example, at station 1 the maximum depth is 14 m. The concentration is 0.1 at 4 m and rises to 0.2 at 6 m, but then it drops back to 0.1 at 10 m and stays there until 12 m. At 13 m it increases by only 0.05; at 14 m the increase reaches 0.1. So 14 m is the deepest point (the maximum depth) associated with the lowest concentration, and I need to fix my code so that it finds that 14 (i.e. where the concentration has increased by 0.1). I can already find the maximum depth and the minimum concentration for a given station.
The code below gives me a column with the maximum depth for each station (Max_depth) and another column with the minimum concentration for each station (min_conc).
How do I find the depth at which the lowest concentration increases by 0.1?
I'm trying to use which() with max and min, but I can't figure out the code. (See: How to use Dplyr's Summarize and which() to lookup min/max values.)
station <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4)
depth <- c(1, 2, 3, 4, 6, 8, 9, 10, 11, 12, 13, 14, 1, 3, 4, 6, 8, 10, 11, 14, 1, 2, 4, 6, 8, 9, 10, 15, 18, 20, 1, 2, 4, 6, 8, 10, 11)
conc <- c(0.4, 0.4, 0.3, 0.1, 0.2, NA, 0.2, 0.1, 0.1, 0.1, 0.15, 0.2, 0.5, 0.4, 0.3, 0.6, 0.4, 0.2, 0.1, 0.2, 0.3, 0.2, 0.5, 0.5, 0.3, 0.2, 0.1, 0.2, 0.2, 0.2, 0.8, 0.6, 0.4, 0.3, 0.2, 0.3, 0.3)
df <- cbind(station, depth, conc)
(df <- as.data.frame(df))
library(dplyr)
(depth <- df %>%
  group_by(station) %>%
  summarize(
    Max_depth = miss(max(depth)),  # miss() is not defined in the question
    min_conc = miss(min(conc, na.rm = TRUE)),
    press_depth = depth[tail(which(conc == min(conc, na.rm = TRUE)), 1)]))
When I try this instead:
press_depth = depth[tail(which(conc == min(conc > 0.1, na.rm = TRUE)), 1)])
I get an error: Column press_depth must be length 1 (a summary value), not 0
I'm not sure I understood completely what you are asking, but hopefully this can help you start. If it's different from what you have in mind, let me know:
library(dplyr)
df <- tibble(  # data_frame() is deprecated in favour of tibble()
station = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4),
depth = c(1, 2, 4, 6, 8, 9, 10, 11, 12, 13, 14, 1, 3, 4, 6, 8, 10, 11, 14, 1, 2, 4, 6, 8, 9, 10, 15, 18, 20, 1, 2, 4, 6, 8, 10, 11),
conc = c(0.4, 0.6, 0.3, 0.2, 0.2, NA, 0.2, 0.2, 0.2, 0.1, 0.2, 0.5, 0.4, 0.3, 0.6, 0.4, 0.2, 0.1, 0.2, 0.3, 0.2, 0.5, 0.5, 0.3, 0.2, 0.1, 0.2, 0.2, 0.2, 0.8, 0.6, 0.4, 0.3, 0.2, 0.3, 0.3)
)
df %>%
  group_by(station) %>%
  mutate(conc_diff = lead(conc) - conc,
         dept_diff = lead(depth) - depth) %>%
  filter(conc_diff == .1, conc == min(conc, na.rm = TRUE)) %>%
  filter(depth == max(depth))
#> # A tibble: 3 x 5
#> # Groups: station [3]
#> station depth conc conc_diff dept_diff
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 13 0.1 0.1 1
#> 2 2 11 0.1 0.1 3
#> 3 3 10 0.1 0.1 5
Created on 2020-06-17 by the reprex package (v0.3.0)
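One caveat with the filter above: conc_diff == .1 compares floating-point numbers exactly, and that can silently drop rows (in R, 0.3 - 0.2 == 0.1 is FALSE). Below is a minimal sketch of a tolerance-based variant using dplyr::near(). The two-station data is made up for illustration and is not the asker's dataset.

```r
library(dplyr)

# Hypothetical data: station 2 is invented so that the increase is 0.3 - 0.2,
# which is not exactly equal to 0.1 in floating point.
toy <- tibble(
  station = c(1, 1, 1, 1, 1, 2, 2, 2),
  depth   = c(10, 11, 12, 13, 14, 1, 3, 4),
  conc    = c(0.2, 0.2, 0.2, 0.1, 0.2, 0.4, 0.2, 0.3)
)

res <- toy %>%
  group_by(station) %>%
  mutate(conc_diff = lead(conc) - conc) %>%
  # near() compares with a small tolerance instead of exact equality
  filter(near(conc_diff, 0.1), conc == min(conc, na.rm = TRUE)) %>%
  filter(depth == max(depth)) %>%
  ungroup()

res
```

With exact equality, the invented station 2 would be filtered out; with near() it is kept. Whether "increases by 0.1" should mean exactly 0.1 or at least 0.1 is a separate question that the original post leaves open.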
I have a dataframe. Suppose it has 15 rows.
I want to assign a vector with length 12 to the rows 1 to 12 of one of its columns. How to do that?
month <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
DF <- c(0.2, 0.3, 0.5, 0.9, 0.8, 0.7, 1.2, 1.4, 1.5, 0.6, 0.4, 1, 0.4, 0.8, 1.3)
b <- c(1, 2, 1, 4, 1, 1, 1, 1, 3, 1, 1, 1)
df <- data.frame(DF)
df['cum_pd'] <- NA
df['marg_pd'] <- NA
rownames(df) <- month
df[1:12, "cum_pd"] <- b # This part is my question
This is pretty contrived, but it will work for arbitrary lengths of df and b, assuming you're always setting the first n rows, where n = length(b).
df$cum_pd <- c(b, rep(NA, nrow(df) - length(b)))
That said, this type of data editing is dubious code at best.
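For comparison, a minimal sketch of an index-based variant that does not hard-code 12: seq_along(b) yields 1..length(b), so only those rows are overwritten. The data is taken from the question; the guard line is an added assumption about what should happen when b is too long.

```r
# Data from the question.
df <- data.frame(DF = c(0.2, 0.3, 0.5, 0.9, 0.8, 0.7, 1.2, 1.4, 1.5,
                        0.6, 0.4, 1, 0.4, 0.8, 1.3))
b <- c(1, 2, 1, 4, 1, 1, 1, 1, 3, 1, 1, 1)

df$cum_pd <- NA_real_             # start with an all-NA numeric column
stopifnot(length(b) <= nrow(df))  # guard (assumption): b must fit into the column
df$cum_pd[seq_along(b)] <- b      # fill rows 1..length(b), leave the rest NA

df
```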
I have a random directed weighted graph gg with the following structure:
gg <-
structure(list(10, TRUE, c(0, 0, 1, 2, 2, 5, 5, 6, 6, 6, 6, 9,
9, 9, 9, 9), c(6, 9, 3, 0, 5, 3, 7, 1, 3, 5, 8, 2, 4, 6, 7, 8
), c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), c(3,
7, 11, 2, 5, 8, 12, 4, 9, 0, 13, 6, 14, 10, 15, 1), c(0, 2, 3,
5, 5, 5, 7, 11, 11, 11, 16), c(0, 1, 2, 3, 6, 7, 9, 11, 13, 15,
16), list(c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("C", "D", "I", "J", "K", "N", "O",
"Q", "S", "T"), color = c("yellow", "red", "red", "red",
"red", "red", "green", "red", "red", "green")), .Names = c("name",
"color")), structure(list(weight = c(0.5, 0.5, 1, 0.333333333333333,
0.333333333333333, 0.333333333333333, 0.333333333333333,
0.25, 0.25, 0.25, 0.25, 0.2, 0.2, 0.2, 0.2, 0.2)), .Names = "weight")),
<environment>), class = "igraph")
I need to find all walks from the root (the yellow node) to the leaves (red nodes). A leaf is defined by (a) edge direction and (b) distance: the path from the root to a leaf should be only two edges long.
In my case, the root is C and leaves should be D, J, N, S, I, K, Q.
I tried to define the (a) condition only.
root <- "C"
leaves = which(degree(gg, v = V(gg), mode = "out")==0, useNames = T)
leaves
# J K Q S
# 4 5 8 9
plot(gg, layout = layout.reingold.tilford(gg, root=root),
     edge.arrow.size=0.2, edge.curved=T,
     edge.label = round(E(gg)$weight,2))
Question: how do I define condition (b) and add the D, N, I, K nodes to the set of leaves?
Here's one way to do it: use shortest_paths to get all the vertices that are exactly two edges from the root node.
two.edges.from.root = unlist(sapply(shortest_paths(gg,
                                                   from = as.numeric(V(gg)["C"]),
                                                   mode = "out")$vpath,
                                    function(x) { if(length(x) == 3) { x[3] } }))
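An alternative sketch of the same idea using distances() instead of shortest_paths(). As far as I know, igraph's path functions use a weight edge attribute by default when one exists, so this passes weights = NA to count edges rather than sum weights. The small graph below is a hypothetical stand-in, not the asker's gg.

```r
library(igraph)

# Hypothetical toy graph: C is the root; the third column becomes the
# "weight" edge attribute, mimicking the weighted graph in the question.
edges <- data.frame(
  from   = c("C", "C", "O", "O", "T"),
  to     = c("O", "T", "J", "K", "Q"),
  weight = c(0.5, 0.5, 0.25, 0.25, 0.2)
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Hop counts from the root along outgoing edges; weights = NA ignores weights.
d <- distances(g, v = "C", mode = "out", weights = NA)

# Names of the vertices exactly two edges away from the root.
two_hops <- names(which(d[1, ] == 2))
two_hops
```

The result can then be combined with the out-degree test from the question, for example with union(), depending on how the two conditions are meant to interact.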
I have data that looks like:
year mean.streak
1958 2.142857
1959 3.066667
1960 2.166667
1961 2.190476
The code for my plot with localized regression looks like:
ggplot(aes(x = year, y = mean.streak, color = year), data = streaks)+
  geom_point(color = 'black')+
  geom_smooth(method = 'loess')
and outputs:
I'd like to capture the somewhat sinusoidal pattern of the data by passing a smooth line through all of the data points, rather than the typical jagged geom_line. I tried polynomial interpolation by writing:
ggplot(df)+
  geom_point(aes(x = year, y = mean.streak, colour = year), size = 3) +
  stat_smooth(aes(x = year, y = mean.streak), method = "lm",
              formula = y ~ poly(x, 57), se = FALSE)
Taken from this thread. But I get the error:
Warning message:
Computation failed in `stat_smooth()`:
'degree' must be less than number of unique points
seemingly because there are too many datapoints, as this answer seems to indicate.
Is there a way to pass a smooth line through all the data with 59 data points?
Full data is:
structure(list(year = 1958:2016, mean.streak = c(2.14285714285714,
3.06666666666667, 2.16666666666667, 2.19047619047619, 2.35, 2.42857142857143,
2.28571428571429, 1.92592592592593, 1.69230769230769, 2.61111111111111,
3, 2.94117647058824, 2.2, 2.5, 2.13636363636364, 1.76923076923077,
1.36111111111111, 1.41176470588235, 1.76, 2, 2.63157894736842,
2.08695652173913, 2.86666666666667, 2.125, 3, 3.125, 2.57894736842105,
1.84, 1.46666666666667, 1.7037037037037, 1.625, 1.67741935483871,
1.84, 1.6, 3, 3.11111111111111, 3.66666666666667, 4.18181818181818,
2.85714285714286, 3.66666666666667, 2.66666666666667, 2.92857142857143,
3.1875, 2.76923076923077, 5.375, 5.18181818181818, 4.08333333333333,
6.85714285714286, 2.77777777777778, 2.76470588235294, 3.15384615384615,
3.83333333333333, 3.06666666666667, 3.07692307692308, 4.41666666666667,
4.9, 5.22222222222222, 5, 5.27272727272727), median.streak = c(1,
3, 1.5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2,
2, 3, 2, 2, 2.5, 2, 2, 1, 1, 1, 1, 1, 1, 1.5, 2, 4, 4, 1, 3,
2, 2.5, 2, 2, 5.5, 4, 2.5, 9, 2, 2, 2, 1.5, 2, 3, 2.5, 4.5, 4,
5, 4), max.streak = c(6, 6, 9, 7, 5, 5, 7, 4, 3, 7, 9, 7, 6,
6, 6, 4, 3, 4, 4, 10, 8, 6, 6, 5, 10, 8, 5, 6, 3, 4, 4, 4, 4,
5, 8, 8, 11, 8, 8, 11, 10, 5, 12, 7, 10, 12, 12, 10, 7, 10, 10,
14, 9, 7, 9, 12, 10, 14, 12), mean.std = c(-0.73762950487994,
-0.480997734887942, -0.517355702126398, -0.387678832192802, -0.315808940316265,
-0.455313725347534, -0.520453518496716, -0.598412265824216, -0.523171795723798,
-0.62285788065637, -0.54170040191883, -0.590289727314622, -0.468222025966258,
-0.639180735884434, -0.656427002478427, -0.565745564840106, -0.473399411312895,
-0.564475310127763, -0.493531273810312, -0.543209721496256, -0.640240670332106,
-0.510337503791441, -0.596096374402028, -0.504696265560619, -0.620412635042488,
-0.497008319856979, -0.546623513153538, -0.613345407826292, -0.564945850817486,
-0.581770706442245, -0.5709080560492, -0.627986564445679, -0.680973485641403,
-0.548092447365696, -0.554620596559388, -0.483847268000936, -0.67619820292833,
-0.613245144944101, -0.509832316970819, -0.302654541906113, -0.623276311320811,
-0.431421947082012, -0.525548788393688, -0.244995094473986, -0.412444188256097,
-0.112114155982405, -0.299486359079708, -0.300201791042539, -0.240281366191648,
-0.359719754440627, -0.511417389357902, -0.474906675611613, -0.312106332395495,
-0.449137693833681, -0.526248555772371, -0.56052848268042, -0.390017880007091,
-0.537267264953157, -0.444528236868953)), class = c("tbl_df",
"tbl", "data.frame"), .Names = c("year", "mean.streak", "median.streak",
"max.streak", "mean.std"), row.names = c(NA, -59L))
Adjust the span:
ggplot(aes(x = year, y = mean.streak, color = year), data = streaks)+
  geom_point(color = 'black')+
  stat_smooth(method = 'loess', span = 0.3)
Or use a spline:
library(splines)
ggplot(aes(x = year, y = mean.streak, color = year), data = streaks)+
  geom_point(color = 'black')+
  stat_smooth(method = 'lm', formula = y ~ ns(x, 10))
Generally, you don't want to fit an extremely high-degree polynomial. Such fits look awful. It would be much better to fit an actual time series model to your data:
library(forecast)
library(zoo)
ggplot(aes(x = year, y = mean.streak, color = year), data = streaks)+
  geom_point(color = 'black')+
  geom_line(data = data.frame(year = sort(streaks$year),
                              mean.streak = fitted(auto.arima(zoo(streaks$mean.streak,
                                                                  order.by = streaks$year)))),
            show.legend = FALSE)
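If the goal is literally a smooth curve that passes through every point (interpolation rather than a fitted model), a different technique from the ones above is an interpolating spline. The sketch below uses base R's splinefun() on the first four rows shown in the question; the grid size of 200 is an arbitrary choice, and the ggplot2 part only runs if that package is installed.

```r
# First four rows of the data shown in the question.
streaks <- data.frame(
  year = 1958:1961,
  mean.streak = c(2.142857, 3.066667, 2.166667, 2.190476)
)

# Natural cubic spline that interpolates (passes through) every point.
f <- splinefun(streaks$year, streaks$mean.streak, method = "natural")

# Evaluate the spline on a fine grid to get a smooth curve for plotting.
curve_df <- data.frame(
  year = seq(min(streaks$year), max(streaks$year), length.out = 200)
)
curve_df$mean.streak <- f(curve_df$year)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(streaks, aes(x = year, y = mean.streak)) +
    geom_point(color = "black") +
    geom_line(data = curve_df)  # inherits the x/y mapping from ggplot()
  print(p)
}
```

Because an interpolating curve follows the noise in every observation, it describes the data rather than modelling it, which is why the smoothing and time series approaches above are usually preferable.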
I'm a beginneR using R Studio with R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet" in Windows 7.
Data I'm using...
> dput(head(data,20))
structure(list(case = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), age = c(37, 42, 44, 40, 26, 29, 42, 26,
18, 56, 29, 66, 71, 26, 30, 48, 39, 65, 65, 48), bmi = c(25.95,
29.07, 27.63, 27.4, 25.34, 31.38, 25.08, 28.01, 24.69, 25.06,
27.68, 23.51, 29.86, 21.72, 25.95, 22.86, 23.53, 21.3, 33.2,
29.39), ord.bmi = c(3, 3, 3, 3, 3, 4, 3, 3, 2, 3, 3, 2, 3, 2,
3, 2, 2, 2, 4, 3), alcohol = c(2, 2, 1, 1, 2, 1, 1, 1, 1, 1,
2, 1, 1, 1, 1, 1, 2, 2, 1, 1), tobacco = c(1, 1, 1, 2, 2, 1,
2, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1), dent.amalgam = c(1,
2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1), exp.years = c(7,
9, 9, 5, 2, 10, 15, 5, 1, 40, 10, 50, 50, 1, 12, 22, 22, 30,
40, 30), mn = c(0, 0, 0, 1.5, 1.5, 1, 0, 0, 0.5, 0.5, 1, 1, 0,
0, 0, 0.5, 0, 0.5, 2, 1), bn = c(2.5, 5, 2.5, 2, 1.5, 4, 2, 1.5,
4.5, 4.5, 2.5, 2, 6, 2, 5, 4, 1, 1.5, 7, 1.5), ln = c(0.5, 1.5,
0, 2, 1.5, 1.5, 1, 0.5, 2, 2, 1, 1, 4.5, 0, 2, 1, 3, 2, 3, 3),
pn = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5,
0.5, 0.5, 0, 0), cc = c(0, 1, 0, 2, 2, 4, 1, 1.5, 4.5, 2,
0, 3.5, 2, 1.5, 2, 1.5, 0.5, 1, 2, 1.5), kr = c(0, 0, 0,
0, 0, 0, 0.5, 0, 0.5, 1, 0, 0.5, 1.5, 0.5, 0.5, 0.5, 0, 0.5,
0, 0), kl = c(0.5, 2, 0, 1.5, 1.5, 0, 2, 0, 2, 2, 0, 1.5,
1.5, 1, 4, 3, 2, 3.5, 4.5, 2)), .Names = c("case", "age",
"bmi", "ord.bmi", "alcohol", "tobacco", "dent.amalgam", "exp.years",
"mn", "bn", "ln", "pn", "cc", "kr", "kl"), row.names = c(NA,
20L), class = "data.frame")
I'm plotting two different densities (which I get using density.a <- lapply(data[which(data$case == 0),], density) and density.b <- lapply(data[which(data$case == 1),], density)), and everything seems to work fine:
plot.densities <- function(sample.a, sample.b){ # declaring the function arguments
  for(i in seq(length(sample.a))){ # for every element in the first argument (expected equal lengths)
    plot(range(sample.a[[i]]$x, sample.b[[i]]$x), # generate a plot
         range(sample.a[[i]]$y, sample.b[[i]]$y),
         xlab = names(sample.a[i]), ylab = "Density", main = paste(names(sample.a[i]), "density plot"))
    lines(sample.a[[i]], col = "red") # red lines
    lines(sample.b[[i]], col = "green") # green lines
  }
}
When I call the function, I get plots like this:
Then, if I want to fill the area between the two curves, I add the polygon function, and it looks like this:
filled.plot <- function(sample.a, sample.b){ # declaring the function arguments
  for(i in seq(length(sample.a))){ # for every element in the first argument (expected equal lengths)
    plot(range(sample.a[[i]]$x, sample.b[[i]]$x), # generate a plot
         range(sample.a[[i]]$y, sample.b[[i]]$y),
         xlab = names(sample.a[i]), ylab = "Density",
         main = paste(names(sample.a[i])))
    lines(sample.a[[i]], col = "red") # red lines
    lines(sample.b[[i]], col = "green") # green lines
    polygon(x = c(range(sample.a[[i]]$x, sample.b[[i]]$x),
                  rev(range(sample.a[[i]]$x, sample.b[[i]]$x))),
            y = c(range(sample.a[[i]]$y, sample.b[[i]]$y),
                  rev(range(sample.a[[i]]$x, sample.b[[i]]$x))),
            col = "skyblue")
  }
}
But when I call the filled.plot function, I get plots like this:
I'm stuck, and some help would be just fine!
Thanks in advance.
Try with ggplot (I have changed the case value of rows 11:20 to 2):
library(ggplot2)
ggplot()+
  geom_density(data=testdf[testdf$case==1,], aes(age), fill='red', alpha=0.5)+
  geom_density(data=testdf[testdf$case==2,], aes(age), fill='green', alpha=0.5)
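For a base-graphics version, the likely problem in filled.plot is that polygon() needs the coordinates of the outline itself, while range() supplies only the two extreme values per axis (and the last y term uses $x). A minimal sketch, assuming both densities are evaluated on a shared grid: the ages come from the question's data, the split into rows 1:10 and 11:20 follows the answer above, and the padding of 15 around the data range is an arbitrary choice.

```r
# Ages from the question's data; the two groups are for illustration only.
age <- c(37, 42, 44, 40, 26, 29, 42, 26, 18, 56,
         29, 66, 71, 26, 30, 48, 39, 65, 65, 48)
a <- age[1:10]
b <- age[11:20]

# Evaluate both densities on the same grid so their x values line up.
lims <- range(age) + c(-15, 15)
da <- density(a, from = lims[1], to = lims[2], n = 512)
db <- density(b, from = lims[1], to = lims[2], n = 512)

# Empty plot sized to hold both curves.
plot(range(da$x), range(da$y, db$y), type = "n",
     xlab = "age", ylab = "Density", main = "age")

# Trace curve a left to right, then curve b right to left, to close the region.
polygon(x = c(da$x, rev(db$x)), y = c(da$y, rev(db$y)),
        col = "skyblue", border = NA)
lines(da, col = "red")
lines(db, col = "green")
```

The same pattern should drop into the loop in filled.plot, provided the density objects passed in were computed with matching from, to, and n.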
dput(head(P[,1:2],12))
structure(list(valoracion.c1 = list(c(0.75, 1, 1), c(0.75, 1,
1), c(0.75, 1, 1), c(0.75, 1, 1), c(0.75, 1, 1), c(0.5, 0.75,
1), c(0.75, 1, 1), c(0.75, 1, 1), c(0.5, 0.75, 1), c(0.75, 1,
1), c(0.5, 0.75, 1), c(0.75, 1, 1)), valoracion.c2 = list(c(0.75,
1, 1), c(0.75, 1, 1), c(0.5, 0.75, 1), c(0.75, 1, 1), c(0.75,
1, 1), c(0.75, 1, 1), c(0.5, 0.75, 1), c(0.75, 1, 1), c(0.25,
0.5, 0.75), c(0.25, 0.5, 0.75), c(0.75, 1, 1), c(0.5, 0.75, 1
))), .Names = c("valoracion.c1", "valoracion.c2"), row.names = c(NA,
12L), class = "data.frame")
I'd like to get the average for each column and preserve the data structure. I have tried something like this:
dat2 <- P[1,]
dat2[]<-(lapply(P,function(x) list(Reduce(mean,x))))
Error in mean.default(init, x[[i]]) :
'trim' must be numeric of length one
Could someone help me?
Assuming that the data you showed is similar to the one below:
dat2 <- dat1[1,]
dat2[] <- lapply(dat1, function(x) list(Reduce(`+`, x)))
dat2
# Col1 Col2 Col3
#1 18, 19, 23 20, 15, 25 22, 21, 17
head(dat1,3)
# Col1 Col2 Col3
#1 2, 1, 3 2, 1, 3 2, 3, 1
#2 2, 1, 3 3, 1, 2 1, 3, 2
#3 1, 2, 3 3, 1, 2 2, 3, 1
data
set.seed(45)
dat1 <- data.frame(Col1=I(lapply(1:10, function(i) sample(3))),
Col2=I(lapply(1:10, function(i) sample(3))),
Col3= I(lapply(1:10, function(i) sample(3))))
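Note that Reduce(`+`, x) gives the element-wise sum of the vectors in a column, not their average. The original attempt fails because Reduce(mean, x) calls mean(init, x[[i]]), so the second vector is taken as the trim argument. A minimal sketch of the element-wise mean, dividing the sum by the number of rows; it uses the first three rows of the question's P, wrapped in I() as in the answer's sample data.

```r
# First three rows of the question's P (values copied from its dput output).
P <- data.frame(
  valoracion.c1 = I(list(c(0.75, 1, 1), c(0.75, 1, 1), c(0.75, 1, 1))),
  valoracion.c2 = I(list(c(0.75, 1, 1), c(0.75, 1, 1), c(0.5, 0.75, 1)))
)

# For each column: add the vectors element-wise, then divide by the row count.
means <- lapply(P, function(col) Reduce(`+`, col) / length(col))
means
```

To keep the one-row data frame shape used in the answer, each result can be wrapped in list() in the same way the sums are.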