I have a massive data file that I am breaking down into day blocks by person. For each block I want to plot the events that occurred during the day and the duration of each event (A, B or C).
The data is structured as below: t_z (intervals in this example) is the interval between rows and period is the event variable. This example is for one individual for one day (the actual data is x days by x persons).
intervals <- c(0,5.1166667,6.2166667,3.5166667,0.06666667,3.0666667,6.3,
2.3833333,0.06666667,4.7,18.666667,17.383333,21.533333,
0.1,0.08333333,0.85)
period <- c("C", "B", "A", "B", "C", "B", "C", "B",
"C", "B", "C", "B", "C", "B", "C", "B")
i <- as.data.frame(intervals)
p <- as.data.frame(period)
d <- cbind(i,p)
Getting a bar plot is easy enough but it stacks all "periods" into blocks by day:
library(ggplot2)
d$id <- 1
e <- ggplot(d, aes(id))
e + geom_bar(aes(fill = period))
[Figure: simple aggregated stacked bar of time data]
However, I would like each "period" to be represented discretely and by its magnitude:
[Figure: example of periods as discrete stacked blocks]
Thanks YBS, your method comes close, but the size of the periods is not correct. Any ideas? The first C = 5 is not the same size as the first A = 5.
intervals <- c(5, 15, 5, 3,7,3,6, 2)
period <- c("C","B","A","B","C","B","C","B")
d <- data.frame(intervals,period)
colors=c("red","blue","green")
dc <- data.frame(period=unique(d$period),colors)
d2 <- d %>% mutate(nid = paste0(d$period,'_',row_number()))
d3 <- left_join(d2,dc, by="period")
d3$id<-1
e <- ggplot(d3,aes(x=id, y=intervals)) +
geom_col(aes(fill=nid))
e + scale_fill_manual(name='period', labels=d3$period, values=d3$colors )
The trick is to create a new id (nid) with one discrete value per row, and then map it back to the original period values via scale_fill_manual. You can use coord_flip() to make the plot horizontal and change the legend position as necessary. Perhaps this is the desired output.
library(dplyr)
library(ggplot2)
intervals <- c(0, 5.1166667, 6.2166667, 3.5166667, 0.6666667, 3.0666667, 6.3, 2.3833333)
#,0.06666667 , 4.7,18.666667,17.383333,21.533333, 0.1,0.08333333,0.85)
period <- c("C", "B", "A", "B", "C", "B", "C", "B")
# ,"C", "B", "C", "B", "C", "B", "C", "B")
d <- data.frame(intervals, period)
colors <- c("red", "blue", "green")
dc <- data.frame(period = unique(d$period), colors)
# one id per row; factor levels kept in row order so that labels and values
# below line up with the fill levels instead of being matched alphabetically
d2 <- d %>% mutate(nid = paste0(period, "_", row_number()),
                   nid = factor(nid, levels = nid))
d3 <- left_join(d2, dc, by = "period")
d3$id <- 1
e <- ggplot(d3, aes(x = id, y = intervals)) +
  geom_col(aes(fill = nid))
e + scale_fill_manual(name = "period", labels = d3$period, values = d3$colors)
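One likely reason the blocks in the follow-up look mismatched is that ggplot2 sorts a character nid alphabetically (A_3, B_2, B_4, ...), while labels and values are supplied in row order, so colours and legend entries can land on the wrong blocks. A minimal alternative sketch, not taken from the thread: keep fill = period and use the group aesthetic to stack one block per row. The column name block is a hypothetical helper, and position_stack(reverse = TRUE) is intended to put the first row at the bottom of the bar.

```r
library(ggplot2)

intervals <- c(5, 15, 5, 3, 7, 3, 6, 2)
period <- c("C", "B", "A", "B", "C", "B", "C", "B")
d <- data.frame(intervals, period)
d$id <- 1
d$block <- seq_len(nrow(d))  # hypothetical helper: one group per row

# fill decides the colour, group decides what gets stacked separately
p <- ggplot(d, aes(x = id, y = intervals, fill = period, group = block)) +
  geom_col(colour = "white", position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = c(A = "green", B = "blue", C = "red"))
p
```

Because the colours are a named vector keyed by period, the legend shows only A, B and C, and every block of height 5 is drawn with the same height regardless of its period.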
Short question:
I can substitute certain variable values like this:
values <- c("a", "b", "a", "b", "c", "a", "b")
df <- data.frame(values)
What's the easiest way to replace every value of df$values that is neither "a" nor "b" with "x"?
Output should be:
c("a", "b", "a", "b", "x", "a", "b")
Your example is a bit unclear and not reproducible.
However, guessing at what you actually want, I would suggest this option using the data.table package (df has to be a data.table first, e.g. via setDT(df)):
df[!values %in% c("a", "b"), values := "x"]
or the dplyr package:
df %>% mutate(values = ifelse(values %in% c("a", "b"), values, "x"))
What about:
df[!df[, 1] %in% c("a", "b"), ] <- "x"
values
1 a
2 b
3 a
4 b
5 x
6 a
7 b
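One caveat worth sketching: if values is a factor (the default for data.frame() before R 4.0), assigning "x" directly yields NA with a warning, because "x" is not an existing level. A small base R sketch that sidesteps this by converting to character first; keep and v are illustrative names, and stringsAsFactors = TRUE is set only to force the problematic case.

```r
values <- c("a", "b", "a", "b", "c", "a", "b")
df <- data.frame(values, stringsAsFactors = TRUE)  # force a factor column

keep <- c("a", "b")
v <- as.character(df$values)  # drop the factor so "x" can be assigned
v[!v %in% keep] <- "x"        # everything that is not "a" or "b"
df$values <- v

df$values
```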
I have a dataset like the one shown below
library(tidyverse)
dat <- data.frame(col.1 = 1:16,
col.2 = c("B", "B", "B", "B", "B", "B", "A", "B",
"A", "A", "B", "A", "A", "A", "A", "A"),
col.3 = c(30, 60, 75, 105, 40, 80, -20, 60, -20, -60, 40,
-40,-105,-20,-20,-45),
col.4 = c(39.34775, 31.66806, 28.57107, 28.43085, 29.30417, 36.21187,
40.29794, 40.70641, 65.85152, 66.85943, 69.26766, 67.24402,
74.85330, 79.17230, 78.75405, 64.47038))
dat
I'm trying to reach the final column which looks like this:
dat.2 <- dat %>%
mutate(col.Final = c(1180.43, 1900.08, 2142.83, 2985.24, 1172.17,
2896.95, -629.63, 2442.38, -655.37, -1966.11,
2770.71, -1460.48, -3833.76, -730.24, -730.24,
-1643.04))
So far, I have used the mutate() function to get to this point:
dat.1 <- dat %>%
mutate(col.5 = col.3*col.4) %>%
mutate(col.6 = cumsum(col.3)) %>%
mutate(col.7 = if_else(col.2 == 'B', col.6, col.6 - col.3),
col.8 = col.3/col.7)
When I'm trying to reach the final column I'm not getting the same results.
dat.1 %>%
mutate(col.9 = if_else(col.2 == 'A', col.8*lag(cumsum(col.5)), col.5))
Note: this same calculation was done successfully using Excel's SUMIFS() function.
I'm trying to get the same results with R instead.
I have seen some of the Q&A for similar posts but am still stuck on the final calculation. In Excel, it felt as if iteration was performed for one condition and then the next condition was executed. I am not sure exactly what Excel did, but I think this should be possible in R as well; I just can't figure out how.
Any help would be appreciated at this point.
Update:
Values for col.5 and col.8 corresponding to col.2:
col.2 = c("B", "B", "B", "B", "B", "B", "A", "B",
"A", "A", "B", "A", "A", "A", "A", "A")
col.5 <- c(1180.4325, 1900.0836, 2142.8302, 2985.2393, 1172.1668,
2896.9496, -805.9588, 2442.3846, -1317.0304, -4011.5658,
2770.7064, -2689.7608, -7859.5965, -1583.4460, -1575.0810,
-2901.1671)
col.8 <-c(1.00000000, 0.66666667, 0.45454545, 0.38888889, 0.12903226,
0.20512821, -0.05128205, 0.13953488, -0.04651163, -0.14634146,
0.10256410,-0.10256410, -0.30000000, -0.08163265, -0.08888889,
-0.21951220)
Verifying values Using Hand Calculation!
Calculations using col.5 & col.8
for "B" from top :
1180.43 + 1900.08 + 2142.83 + 2985.24 + 1172.17 + 2896.95 = 12277.7020
for A after :
12277.7020 x -0.05128205 = -629.6266509 .... the 1st desired value for A
for "B" after:
12277.7020 - 629.6266509 = 11648.07535
11648.07535 + 2442.3846 = 14090.45995
for "A" after:
14090.45995 x -0.04651163 = -655.37026 ... 2nd desired Value for A
for "A" after:
14090.45995 - 655.37026 = 13435.08969
13435.08969 x -0.14634146 = -1966.110641 ... 3rd desired value for A
and so on....
I hope this explains.
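The hand calculation above amounts to a sequential pass over the rows: keep a running total of all results so far, take col.5 for a "B" row, and take col.8 times the running total for an "A" row. A minimal base R sketch of that reading, using a plain for loop; the names running and value are illustrative, and whether Excel's SUMIFS() did exactly this is an assumption.

```r
dat <- data.frame(
  col.2 = c("B", "B", "B", "B", "B", "B", "A", "B",
            "A", "A", "B", "A", "A", "A", "A", "A"),
  col.3 = c(30, 60, 75, 105, 40, 80, -20, 60, -20, -60, 40,
            -40, -105, -20, -20, -45),
  col.4 = c(39.34775, 31.66806, 28.57107, 28.43085, 29.30417, 36.21187,
            40.29794, 40.70641, 65.85152, 66.85943, 69.26766, 67.24402,
            74.85330, 79.17230, 78.75405, 64.47038)
)

# Same intermediate columns as in the question
dat$col.5 <- dat$col.3 * dat$col.4
col.6 <- cumsum(dat$col.3)
col.7 <- ifelse(dat$col.2 == "B", col.6, col.6 - dat$col.3)
dat$col.8 <- dat$col.3 / col.7

# Sequential pass: each "A" row depends on the sum of all earlier results
running <- 0
dat$col.Final <- NA_real_
for (i in seq_len(nrow(dat))) {
  value <- if (dat$col.2[i] == "B") dat$col.5[i] else dat$col.8[i] * running
  dat$col.Final[i] <- value
  running <- running + value
}

round(dat$col.Final, 2)
```

A vectorised lag(cumsum(col.5)) cannot reproduce this, because the running total has to include the already adjusted "A" values rather than the raw col.5 values.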
I am struggling with a recursive function whose goal is to determine which raw materials belong to which product. I couldn't figure out how to handle multiple possible paths in data frame "db". The wanted function should give A-B-C-E, A-B-C-F and A-B-D-F for db. My function works for "da"; I added it to show what I am after. It is a bit like a bill of materials explosion, but not exactly.
da <- data.frame(parent = c("A", "B", "B", "C", "D"),
child = c("B", "C", "D", "E", "F"),
stringsAsFactors = FALSE)
db <- data.frame(parent = c("A", "B", "B", "C", "D", "C"),
child = c("B", "C", "D", "E", "F", "F"),
stringsAsFactors = FALSE)
my_path <- function(a, df) {
b <- df$parent[df$child == a]
if (length(b) == 0) {
return(a)
} else {
return(c(my_path(b, df), a))
}
}
end_points <- da$child[is.na(match(da$child, da$parent))]
lapply(end_points, function(x) my_path(x, da)) # -> ok
end_points <- db$child[is.na(match(db$child, db$parent))]
lapply(end_points, function(x) my_path(x, db)) # -> not ok
Thx & kind regards
This is a job for igraph:
#the data
db <- data.frame(parent = c("A", "B", "B", "C", "D", "C"),
child = c("B", "C", "D", "E", "F", "F"),
stringsAsFactors = FALSE)
#create a graph
library(igraph)
g <- graph_from_data_frame(db)
#plot the graph
plot(g)
#find all vertices that have no incoming (starts) or no outgoing (finals) edges
starts <- V(g)[degree(g, mode = "in") == 0]
finals <- V(g)[degree(g, mode = "out") == 0]
#find paths, you need to loop if starts is longer than 1
res <- all_simple_paths(g, from = starts[[1]], to = finals)
#[[1]]
#+ 4/6 vertices, named, from 4b85bd1:
#[1] A B C E
#
#[[2]]
#+ 4/6 vertices, named, from 4b85bd1:
#[1] A B C F
#
#[[3]]
#+ 4/6 vertices, named, from 4b85bd1:
#[1] A B D F
#coerce to vectors
lapply(res, as_ids)
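For comparison, the recursive idea from the question can also be made to work without igraph by branching over every parent of a node instead of assuming there is only one. A minimal base R sketch under the assumption that the data contains no cycles; all_paths is a hypothetical name, not a function from any package.

```r
db <- data.frame(parent = c("A", "B", "B", "C", "D", "C"),
                 child  = c("B", "C", "D", "E", "F", "F"),
                 stringsAsFactors = FALSE)

# Return a list of all root-to-node paths, one character vector per path
all_paths <- function(node, df) {
  parents <- df$parent[df$child == node]
  if (length(parents) == 0) {
    return(list(node))  # reached a root
  }
  out <- list()
  for (p in parents) {              # branch over every parent
    for (path in all_paths(p, df)) {
      out[[length(out) + 1]] <- c(path, node)
    }
  }
  out
}

# End points are children that never appear as a parent (deduplicated)
end_points <- unique(db$child[!db$child %in% db$parent])
paths <- unlist(lapply(end_points, all_paths, df = db), recursive = FALSE)
vapply(paths, paste, character(1), collapse = "-")
```

The unique() call matters for db, where F appears twice as a child and would otherwise be expanded twice.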
Let's assume that we have following toy data:
library(tidyverse)
data <- tibble(
subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
id1 = c("a", "a", "b", "a", "a", "a", "b", "a", "a", "b"),
id2 = c("b", "c", "c", "b", "c", "d", "c", "b", "c", "c")
)
which represents network relationships for each subject. For example, there are three unique subjects in the data, and the network for the first subject can be represented as a sequence of relations:
a -- b, a -- c, b -- c
The task is to compute centralities for each network. Using a for loop this is straightforward:
library(igraph)
# Get unique subjects
subjects_uniq <- unique(data$subject)
# Compute centrality of nodes for each graph, keeping one result per subject
centrality <- vector("list", length(subjects_uniq))
for (i in seq_along(subjects_uniq)) {
  current_data <- data %>% filter(subject == subjects_uniq[i]) %>% select(-subject)
  current_graph <- current_data %>% graph_from_data_frame(directed = FALSE)
  centrality[[i]] <- eigen_centrality(current_graph)$vector
}
Question: My dataset is huge, so I wonder how to avoid an explicit for loop. Should I use apply() and its modern cousins (maybe map() in the purrr package)? Any suggestions are very welcome.
Here is an option using map
library(tidyverse)
library(igraph)
map(subjects_uniq, ~data %>%
filter(subject == .x) %>%
select(-subject) %>%
graph_from_data_frame(directed = FALSE) %>%
{eigen_centrality(.)$vector})
#[[1]]
#a b c
#1 1 1
#[[2]]
# a b c d
#1.0000000 0.8546377 0.8546377 0.4608111
#[[3]]
#a b c
#1 1 1
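Since the dataset is described as huge, it may also help to avoid filtering the full table once per subject. A minimal sketch, assuming the same toy data, that partitions the rows once with base split() and then builds one graph per piece; by_subject and centralities are illustrative names.

```r
library(igraph)

data <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  id1 = c("a", "a", "b", "a", "a", "a", "b", "a", "a", "b"),
  id2 = c("b", "c", "c", "b", "c", "d", "c", "b", "c", "c")
)

# Partition the edge list once; the result is a list named by subject
by_subject <- split(data[, c("id1", "id2")], data$subject)

# One undirected graph and one centrality vector per subject
centralities <- lapply(by_subject, function(edges) {
  g <- graph_from_data_frame(edges, directed = FALSE)
  eigen_centrality(g)$vector
})

centralities[["2"]]
```

The result is a named list, so the centrality vector for a given subject can be looked up by its id as a string.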
I implemented the FR test here and now I would like to test it by means of visualizing the resulting minimum spanning trees in R. The vertices and edges should be plotted in a coordinate system.
Moreover, I want to set the color of every dot (depending on which sample it belongs to) and express a possible third dimension through the size of the dots.
This is what I have got so far:
library(ggplot2)
nodes <- data.frame(cbind(c("A", "A", "A", "B", "B", "B"), c(1,2,3,8,2,1), c(6,3,1,4,5,6)))
edges <- data.frame(cbind(c("A", "A", "A"), c("A", "B", "B"), c(1,3,2), c(6,1,5), c(2,8,1), c(3,4,6)))
p <- ggplot() +
geom_point(nodes, aes(x=nodes[,2], y=nodes[,3])) +
geom_line(edges)
p
I also think igraph would be best here...
nodes <- data.frame(a=c("A", "A", "A", "B", "B", "B"), b=c(1,2,3,8,2,1),
d=c(6,3,1,4,5,6))
#cbind made your nodes characters so i have removed it here
edges <- data.frame(a=c("A", "A", "A"), b=c("A", "B", "B"), d=c(1,3,2),
e=c(6,1,5), f=c(2,8,1), g=c(3,4,6))
Here is an example using your data as above. The vertex colours come from colouring, the vertex sizes from sizes, and the coordinate layout from coords:
library(igraph)
from <- c(rep(edges[,3],3),rep(edges[,4],2),edges[,5])
to <- c(edges[,4],edges[,5],edges[,6],edges[,5],edges[,6],edges[,6])
myedges <- data.frame(from,to)
actors <- data.frame(acts=c(1,2,3,4,5,6,8))
colouring <- sample(colours(), 7)
sizes <- sample(15,7)
coords<-cbind(x=runif(7,0,1),y=runif(7,0,1))
myg <- graph_from_data_frame(myedges, vertices=actors, directed=FALSE)
V(myg)$colouring <- colouring
V(myg)$sizes <- sizes
plot(myg,vertex.color=V(myg)$colouring,vertex.size=V(myg)$sizes,
layout=coords,edge.color="#55555533")
For plotting a spanning tree there are also many options, e.g.
d <- c(1,2,3)
E(myg)$colouring <- "#55555533"
E(myg, path=d)$colouring <- "red"
V(myg)[ d ]$colouring <- "red"
plot(myg,vertex.color=V(myg)$colouring,vertex.size=V(myg)$sizes
,edge.width=3,layout=coords,edge.color=E(myg)$colouring )
with axes:
plot(myg,vertex.color=V(myg)$colouring,vertex.size=V(myg)$sizes
,edge.width=3,layout=coords,edge.color=E(myg)$colouring, axes=TRUE )
and use rescale=FALSE to keep original axes scale
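If staying in ggplot2 is preferred, the same picture can be sketched with geom_segment() for the edges and geom_point() for the vertices. This is a sketch under assumptions: the last four edge columns are read as start and end coordinates (x1, y1, x2, y2), and z is a hypothetical third dimension that is not in the original data.

```r
library(ggplot2)

nodes <- data.frame(
  sample = c("A", "A", "A", "B", "B", "B"),
  x = c(1, 2, 3, 8, 2, 1),
  y = c(6, 3, 1, 4, 5, 6),
  z = c(1, 2, 3, 1, 2, 3)  # hypothetical third dimension
)
# Assumed reading of the edge columns: start point (x1, y1), end point (x2, y2)
edges <- data.frame(
  x1 = c(1, 3, 2), y1 = c(6, 1, 5),
  x2 = c(2, 8, 1), y2 = c(3, 4, 6)
)

p <- ggplot() +
  geom_segment(data = edges,
               aes(x = x1, y = y1, xend = x2, yend = y2),
               colour = "grey40") +
  geom_point(data = nodes,
             aes(x = x, y = y, colour = sample, size = z)) +
  coord_equal()
p
```

Each layer gets its own data argument, which avoids the positional-argument problem in the original geom_point(nodes, aes(...)) call, where nodes is taken as the mapping.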