I have the following data frame:
structure(list(C1 = c(1, 2, 2, 3, 4, 5, 5, 6), C2 = c(3.5, 3,
2.5, 2, 3, 2, 3, 5), C3 = c(6.5, 8, 9, 5, 7, 4, 3, 6)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
The first column is an index. The first observation is characterised by 1 point, the second by 2 points.
I need to make the intersection of all combinations of observations, one way. The result creates a new dataframe with a new index, with again some observations that are characterised by 2 rows/points: 1-2, 1-3, 1-4, 1-5, 1-6, 2-3, 2-4, 2-5, 2-6, 3-4, 3-5, 3-6, 4-5, 4-6, 5-6:
df2 = structure(list(C1 = c(1, 2, 3, 4, 4, 5, 6,7, 8, 8, 9, 10, 11, 11, 12, 13, 13, 14, 15, 15), C2 = c(3,2,3,3,2,3.5,2,3,2,3,3,2,3,2,2,2,3,3,2,3), C3 = c(6.5,5,6.5,3,4,6,5,7,4,3,6,5,3,4,5,4,3,6,4,3)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
where 3 in the first column is the new observation created by intersecting the 2 former.
I though I could use pmin in each row but it does not work. Can somenone tackle this?
I am not sure if the code is the thing you want, where cummin() is used
df2 <- cbind(df[1],cummin(df[-1]))
> df2
C1 C2 C3
1 1 3.5 6.5
2 2 3.0 6.5
3 2 2.5 6.5
4 3 2.0 5.0
5 4 2.0 5.0
6 5 2.0 4.0
7 5 2.0 3.0
8 6 2.0 3.0
DATA
df <- structure(list(C1 = c(1, 2, 2, 3, 4, 5, 5, 6), C2 = c(3.5, 3,
2.5, 2, 3, 2, 3, 5), C3 = c(6.5, 8, 9, 5, 7, 4, 3, 6)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
Related
I have a df like this
my_df <- data.frame(
b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, 7, 9, 5, 1, 1, 2, NA),
b10 = c(14, 2, 4, 2, 1, 1, 1, 1, 5))
I want to create a new column (NEW) which says BLUE or RED based on columns b2 and b3. so, if column b2 is Greater than or equal to 100 0R b3 is Greater than or equal to 75, then input BLUE otherwise input RED.
So that I will have something like this:
my_df <- data.frame(
b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, 7, 9, 5, 1, 1, 2, NA),
b10 = c(14, 2, 4, 2, 1, 1, 1, 1, 5),
NEW = c("BLUE", "BLUE", "BLUE", "BLUE", "RED", "RED", "RED", "RED", "BLUE"))
I have been able to work this out using this:
library (tidyverse)
greater_threshold <- 99.9
greater_threshold1 <- 74.9
my_df1 <- my_df %>%
mutate(NEW = case_when(b2 > greater_threshold ~ "BLUE",
b3 > greater_threshold1 ~ "BLUE",
+ T~"RED"))
At the moment, you can see that I am setting my 'greater threshold' to be slightly less than the required value. Although it works well. My question is this. Is there a way I set set my 'greater threshold to be ≥ 100 for b2 and ≥ 75 for b3.
For this example, I'd go whit if_else instead of case_when:
library(dplyr)
greater_threshold <- 100
greater_threshold1 <- 75
my_df <- data.frame(
b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, 7, 9, 5, 1, 1, 2, NA),
b10 = c(14, 2, 4, 2, 1, 1, 1, 1, 5)
)
my_df1 <- my_df %>%
mutate(
NEW = if_else(
b2 >= greater_threshold | b3 >= greater_threshold1,
"BLUE",
"RED"
)
)
my_df1
# b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 NEW
# 1 2 100 75 NA 2 9 1 NA 4 14 BLUE
# 2 6 4 79 6 12 2 3 8 5 2 BLUE
# 3 3 106 8 NA 1 4 7 4 7 4 BLUE
# 4 6 102 0 10 7 6 7 5 9 2 BLUE
# 5 4 6 2 12 8 7 4 1 5 1 RED
# 6 2 6 3 8 5 6 2 4 1 1 RED
# 7 1 1 9 3 5 6 2 1 1 1 RED
# 8 9 1 5 6 6 7 9 3 2 1 RED
# 9 NA 7 80 2 NA 9 5 6 NA 5 BLUE
I am trying to sort my data in descending or ascending order regardless of the data in the rows. I made a dummy example below:
A <- c(9,9,5,4,6,3,2,NA)
B <- c(9,5,3,4,1,4,NA,NA)
C <- c(1,4,5,6,7,4,2,4)
base <- data.frame(A,B,C)
df <- base
df$A <- sort(df$A,na.last = T)
df$B <- sort(df$B,na.last = T)
df$C <- sort(df$C)
We get this
structure(list(A = c(2, 3, 3, 4, 4, 4, 5, 5, 6, 9, 9, NA), B = c(1,
2, 3, 4, 4, 4, 5, 5, 9, 10, NA, NA), C = c(1, 2, 3, 4, 4, 4,
5, 5, 6, 7, 8, 8)), row.names = c(NA, -12L), class = "data.frame")
I want to get something similar to df but my data have hundreds of columns, is there an easier way to do it?
I tried arrange_all() but the result is not what i want.
library(tidyverse)
test <- base%>%
arrange_all()
Obtaining this:
structure(list(A = c(2, 3, 3, 4, 4, 4, 5, 5, 6, 9, 9, NA), B = c(NA,
2, 4, 4, 5, 10, 3, 4, 1, 5, 9, NA), C = c(2, 3, 4, 6, 8, 5, 5,
8, 7, 4, 1, 4)), class = "data.frame", row.names = c(NA, -12L
))
You can sort each column individually :
library(dplyr)
base %>% mutate(across(.fns = sort, na.last = TRUE))
# A B C
#1 2 1 1
#2 3 3 2
#3 4 4 4
#4 5 4 4
#5 6 5 4
#6 9 9 5
#7 9 NA 6
#8 NA NA 7
Or in base R :
base[] <- lapply(base, sort, na.last = TRUE)
This question already has answers here:
How collect additional row data on binned data in R
(1 answer)
Group value in range r
(3 answers)
Closed 3 years ago.
I am doing a statistic analysis in a big data frame (more than 48.000.000 rows) in r. Here is an exemple of the data:
structure(list(herd = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), cows = c(1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16), `date` = c("11/03/2013",
"12/03/2013", "13/03/2013", "14/03/2013", "15/03/2013", "16/03/2013",
"13/05/2012", "14/05/2012", "15/05/2012", "16/05/2012", "17/05/2012",
"18/05/2012", "10/07/2016", "11/07/2016", "12/07/2016", "13/07/2016",
"11/03/2013", "12/03/2013", "13/03/2013", "14/03/2013", "15/03/2013",
"16/03/2013", "13/05/2012", "14/05/2012", "15/05/2012", "16/05/2012",
"17/05/2012", "18/05/2012", "10/07/2016", "11/07/2016", "12/07/2016",
"13/07/2016", "11/03/2013", "12/03/2013", "13/03/2013", "14/03/2013",
"15/03/2013", "16/03/2013", "13/05/2012", "14/05/2012", "15/05/2012",
"16/05/2012", "17/05/2012", "18/05/2012", "10/07/2016", "11/07/2016",
"12/07/2016", "13/07/2016"), glicose = c(240666, 23457789, 45688688,
679, 76564, 6574553, 78654, 546432, 76455643, 6876, 7645432,
876875, 98654, 453437, 98676, 9887554, 76543, 9775643, 986545,
240666, 23457789, 45688688, 679, 76564, 6574553, 78654, 546432,
76455643, 6876, 7645432, 876875, 98654, 453437, 98676, 9887554,
76543, 9775643, 986545, 240666, 23457789, 45688688, 679, 76564,
6574553, 78654, 546432, 76455643, 6876)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -48L))
I need to identify how many cows are in the following category of glicose by herd and by date:
<=100000
100000 and <=150000
150000 and <=200000
200000 and <=250000
250000 and <=400000
>400000
I tried to use the functions filter() and select() but could not categorize the variable like that.
I tried either to make a vector for each category but it did not work:
ht <- df %>% group_by(herd, date) %>%
filter(glicose < 100000)
Actually I do not have a clue of how I could do this. Please help!
I expect to get the number of cows in each category of each herd based on each date in a table like this:
Calling your data df,
df %>%
mutate(glicose_group = cut(glicose, breaks = c(0, seq(1e5, 2.5e5, by = 0.5e5), 4e5, Inf)),
date = as.Date(date, format = "%d/%m/%Y")) %>%
group_by(herd, date, glicose_group) %>%
count
# # A tibble: 48 x 4
# # Groups: herd, date, glicose_group [48]
# herd date glicose_group n
# <dbl> <date> <fct> <int>
# 1 1 2012-05-13 (0,1e+05] 1
# 2 1 2012-05-14 (4e+05,Inf] 1
# 3 1 2012-05-15 (4e+05,Inf] 1
# 4 1 2012-05-16 (0,1e+05] 1
# 5 1 2012-05-17 (4e+05,Inf] 1
# 6 1 2012-05-18 (4e+05,Inf] 1
# 7 1 2013-03-11 (2e+05,2.5e+05] 1
# 8 1 2013-03-12 (4e+05,Inf] 1
# 9 1 2013-03-13 (4e+05,Inf] 1
# 10 1 2013-03-14 (0,1e+05] 1
# # ... with 38 more rows
I also threw in a conversion to Date class, which is probably a good idea.
structure(list(a = c(NA, 3, 4, NA, 3, "Council" , "Council", 1), b = c("Council A", 3, 4,
"Council B", 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4,
5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4,
2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA,
8L), class = "data.frame")
I am trying to convert objects in a using dplyr mutuate and case_when based on text in b . I want to convert values in a to Council if b contains Council in the string.
The code i've used is DF %>% select(a, b) %>% mutate(a =case_when(grepl("Council", b) ~"Council"))
However all values become NA in a if they do not contain the string Council. I've reviewed other posts and attempted various methods including ifelse. I want to maintain the same dataframe just make any NA values in a be converted to Council but only in the cases where it is NA values.
From ?case_when
If no cases match, NA is returned.
So for the cases when there is no "Council" word in b it returns NA.
You need to define the TRUE argument in case_when and assign it to a to keep the values unchanged when the condition is not met.
library(dplyr)
df %>%
mutate(a = case_when(grepl("Council", b) ~"Council",
TRUE ~ a))
# a b c d e f
#1 Council Council A 6 6 1 2
#2 3 3 3 2 2 3
#3 4 4 6 4 4 4
#4 Council Council B 5 5 5 2
#5 3 6 3 3 6 2
#6 Council 7 6 7 7 7
#7 Council 2 5 2 6 5
#8 1 6 3 6 3 2
In this case you could also achieve your result using base R
df$a[grepl("Council", df$b)] <- "Council"
You can also use str_detect from the package stringr to achieve your objective.
library(dplyr)
library(stringr)
df <- structure(list(a = c(NA, 3, 4, NA, 3, "Council" , "Council", 1), b = c("Council A", 3, 4,
"Council B", 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4,
5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4,
2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA,
8L), class = "data.frame")
df %>%
mutate(a=ifelse(str_detect(b,fixed("council",ignore_case = T)) & is.na(a),"Council",a))
I'm trying to calculate the clusters of a network using igraph in R, where all nodes are connected. The plot seems to work OK, but then I'm not able to return the correct groupings from my clusters.
In this example, the plot shows 4 main clusters, but in the largest cluster, not all nodes are connected:
I would like to be able to return the following list of clusters from this graph object:
[[1]]
[1] 8 9
[[2]]
[1] 7 10
[[3]]
[1] 4 6 11
[[4]]
[1] 2 3 5
[[5]]
[1] 1 3 5 12
Example code:
library(igraph)
topology <- structure(list(N1 = c(1, 3, 5, 12, 2, 3, 5, 1, 2, 3, 5, 12, 4,
6, 11, 1, 2, 3, 5, 12, 4, 6, 11, 7, 10, 8, 9, 8, 9, 7, 10, 4,
6, 11, 1, 3, 5, 12), N2 = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3,
3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,
11, 11, 11, 12, 12, 12, 12)), .Names = c("N1", "N2"), row.names = c(NA,
-38L), class = "data.frame")
g2 <- graph.data.frame(topology, directed=FALSE)
g3 <- simplify(g2)
plot(g3)
The cliques function gets me part of the way there:
tmp <- cliques(g3)
tmp
but, this list also gives groupings where not all nodes connect. For example, this clique includes the nodes 1,2,3,5 but 1 only connects to 3, and 2 only connects to 3 and 5, and 5 only connects to 2 :
topology[tmp[[31]],]
# N1 N2
#6 3 2
#7 5 2
#8 1 3
Thanks in advance for any help.
You could use maximal.cliques in the igraph package. See below.
# Load package
library(igraph)
# Load data
topology <- structure(list(N1 = c(1, 3, 5, 12, 2, 3, 5, 1, 2, 3, 5, 12, 4,
6, 11, 1, 2, 3, 5, 12, 4, 6, 11, 7, 10, 8, 9, 8, 9, 7, 10, 4,
6, 11, 1, 3, 5, 12), N2 = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3,
3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,
11, 11, 11, 12, 12, 12, 12)), .Names = c("N1", "N2"), row.names = c(NA,
-38L), class = "data.frame")
# Get rid of loops and ensure right naming of vertices
g3 <- simplify(graph.data.frame(topology[order(topology[[1]]),],directed = FALSE))
# Plot graph
plot(g3)
# Calcuate the maximal cliques
maximal.cliques(g3)
# > maximal.cliques(g3)
# [[1]]
# [1] 9 8
#
# [[2]]
# [1] 10 7
#
# [[3]]
# [1] 2 3 5
#
# [[4]]
# [1] 6 4 11
#
# [[5]]
# [1] 12 1 5 3