I know how to do this in Excel, but I am trying to translate it into R and create a new column. In R I have a data frame called CleanData. I want to see how many times the value in each row of column A shows up anywhere in column B. In Excel it would read like this:
=COUNTIF(B:B,A2)
The second part would be a nested IF/AND statement. It would look like this in Excel:
=IF(AND(COUNTIF(B:B,A2)>0,C="Purple"),"Yes", "No")
Anyone know where to start?
I have tried mutating and also this:
sum(CleanData$colA == CleanData$colB)
and am getting no values
You don't need any extra packages. Here is a solution with the base R function ifelse, which is frequently useful and worth learning. An example:
set.seed(7*11*13)
DF <- data.frame(cond=rnorm(100), X= sample(c("Yes","No"), 100, replace=TRUE))
with(DF, sum(ifelse( (cond>0)&(X=="Yes"), 1, 0)))
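Because sum() already treats TRUE as 1 and FALSE as 0, the same count can probably be had without ifelse. A small sketch reusing the DF from the example above (the names n_ifelse and n_direct are mine, purely for illustration):

```r
set.seed(7 * 11 * 13)
DF <- data.frame(cond = rnorm(100),
                 X = sample(c("Yes", "No"), 100, replace = TRUE))

# ifelse() turns each TRUE/FALSE into 1/0, then sum() adds them up
n_ifelse <- with(DF, sum(ifelse((cond > 0) & (X == "Yes"), 1, 0)))

# summing the logical vector directly gives the same count
n_direct <- with(DF, sum(cond > 0 & X == "Yes"))

n_ifelse == n_direct
```

The ifelse form is still the one to reach for when the two outcomes are labels such as "Yes" and "No" rather than counts.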
I think this will capture your if/countif scenario:
library(dplyr)
CleanData %>%
  mutate(YesOrNo = case_when(
    Color != "Purple" ~ "No",
    is.na(LABEL1) | !nzchar(LABEL1) ~ "No",
    !LABEL1 %in% LABEL2 ~ "No",
    TRUE ~ "Yes"
  ))
# LABEL1 LABEL2 Color YesOrNo
# 1 HELLO <NA> Purple Yes
# 2 <NA> HELLO!!! Blue No
# 3 HELLO$$ <NA> Purple Yes
# 4 <NA> HELLO Blue No
# 5 HELLOOO <NA> Purple Yes
# 6 <NA> <NA> Purple No
# 7 <NA> HELLOOO Blue No
# 8 <NA> HELLO$$ Blue No
# 9 <NA> HELLO Yellow No
Data
CleanData <- structure(list(LABEL1 = c("HELLO", NA, "HELLO$$", NA, "HELLOOO", NA, NA, NA, NA), LABEL2 = c(NA, "HELLO!!!", NA, "HELLO", NA, NA, "HELLOOO", "HELLO$$", "HELLO"), Color = c("Purple", "Blue", "Purple", "Blue", "Purple", "Purple", "Blue", "Blue", "Yellow")), class = "data.frame", row.names = c(NA, -9L))
or programmatically,
CleanData <- data.frame(LABEL1=c("HELLO",NA,"HELLO$$",NA,"HELLOOO",NA,NA,NA,NA), LABEL2=c(NA,"HELLO!!!",NA,"HELLO",NA,NA,"HELLOOO","HELLO$$","HELLO"),Color=c("Purple","Blue","Purple","Blue","Purple","Purple","Blue","Blue","Yellow"))
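For the literal COUNTIF part, here is a base R sketch on the same sample data. It assumes LABEL1 plays the role of column A and LABEL2 the role of column B; the Count column name is mine. Note that sum(CleanData$colA == CleanData$colB) compares the two columns row by row, which is not what COUNTIF(B:B, A2) does, so each value has to be checked against the whole of the other column:

```r
CleanData <- data.frame(
  LABEL1 = c("HELLO", NA, "HELLO$$", NA, "HELLOOO", NA, NA, NA, NA),
  LABEL2 = c(NA, "HELLO!!!", NA, "HELLO", NA, NA, "HELLOOO", "HELLO$$", "HELLO"),
  Color  = c("Purple", "Blue", "Purple", "Blue", "Purple",
             "Purple", "Blue", "Blue", "Yellow"),
  stringsAsFactors = FALSE
)

# COUNTIF(B:B, A2): for each LABEL1 value, count the matching entries in LABEL2
# (NA comparisons are dropped, so a missing LABEL1 counts as 0)
CleanData$Count <- vapply(
  CleanData$LABEL1,
  function(a) sum(CleanData$LABEL2 == a, na.rm = TRUE),
  integer(1),
  USE.NAMES = FALSE
)

# IF(AND(COUNTIF(...) > 0, C = "Purple"), "Yes", "No")
CleanData$YesOrNo <- ifelse(CleanData$Count > 0 & CleanData$Color == "Purple",
                            "Yes", "No")
CleanData
```

On this sample the YesOrNo column agrees with the case_when version above; the extra Count column is only there if the actual number of matches is wanted.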
Related
I am currently working with a data frame that looks like this:
Example <- structure(list(ID = c(12301L, 12301L, 15271L, 11888L, 15271L,
15271L, 15271L), StationOwner = c("Brian", "Brian", "Simon",
"Brian", "Simon", "Simon", "Simon"), StationName = c("Red", "Red",
"Red", "Green", "Yellow", "Yellow", "Yellow"), Parameter = c("Rain - Daily",
"Temperature -Daily", "VPD - Daily", "Rain - Daily", "Rain - Daily",
"Temperature -Daily", "VPD - Daily")), class = "data.frame", row.names = c(NA,
-7L))
I am looking into using str_detect to filter, for example, all the observations that start with "Rain -" and put what comes after the hyphen into a new column called "Rain". I have been able to filter only the values that start with "Rain" using str_detect, but I have not found a way to assign them automatically. Is there a specific function that would help with this? I'd appreciate any pointers, thanks!
Example of desired output that I am trying to achieve:
Desired <- structure(list(ID = c(12301L, 15271L, 11888L, 15271L
), StationOwner = c("Brian", "Simon", "Brian", "Simon"), StationName = c("Red",
"Red", "Green", "Yellow"), Rain = c("Daily", NA, "Daily", "Daily"
), Temperature = c("Daily", NA, NA, "Daily"), VPD = c(NA, "Daily",
NA, "Daily")), class = "data.frame", row.names = c(NA, -4L))
Directly using pivot_wider:
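library(tidyr)
library(stringr)
# names_repair strips everything from the first space in each column name;
# values_fn keeps only the text after the hyphen in each value
pivot_wider(Example, names_from = Parameter, values_from = Parameter,
names_repair = ~str_remove(.,' .*'),values_fn = ~str_remove(.,'.*- ?'))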
# A tibble: 4 x 6
ID StationOwner StationName Rain Temperature VPD
<int> <chr> <chr> <chr> <chr> <chr>
1 12301 Brian Red Daily Daily NA
2 15271 Simon Red NA NA Daily
3 11888 Brian Green Daily NA NA
4 15271 Simon Yellow Daily Daily Daily
It's not using str_detect, but you can achieve Desired with:
library(dplyr)
library(tidyr)
Example %>%
separate(Parameter, c('a', 'b'), sep = "-") %>%
mutate(across(where(is.character), ~trimws(.x))) %>%
pivot_wider(id_cols = c("ID","StationOwner", "StationName"), names_from = "a", values_from = "b")
ID StationOwner StationName Rain Temperature VPD
<int> <chr> <chr> <chr> <chr> <chr>
1 12301 Brian Red Daily Daily NA
2 15271 Simon Red NA NA Daily
3 11888 Brian Green Daily NA NA
4 15271 Simon Yellow Daily Daily Daily
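The reason str_detect alone does not get there is that it only answers TRUE or FALSE; the text around the hyphen still has to be pulled apart. A base R sketch of just that splitting step, on a few of the Parameter values from the question (the name and value vectors are illustrative):

```r
Parameter <- c("Rain - Daily", "Temperature -Daily", "VPD - Daily")

# everything before the first hyphen, with surrounding spaces trimmed
name <- trimws(sub("-.*$", "", Parameter))

# everything after the first hyphen, with surrounding spaces trimmed
value <- trimws(sub("^[^-]*-", "", Parameter))

data.frame(name, value)
```

Trimming afterwards is what lets the inconsistent spacing in "Temperature -Daily" come out the same as the other entries.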
I have a dataframe like this one:
Name Characteristic_1 Characteristic_2
Apple Yellow Italian
Pear British Yellow
Strawberries French Red
Blackberry Blue Austrian
As you can see, a given characteristic can be in different columns depending on the row. I would like to obtain a data frame where each column contains only the values of one specific characteristic.
Name Characteristic_1 Characteristic_2
Apple Yellow Italian
Pear Yellow British
Strawberries Red French
Blackberry Blue Austrian
My idea is to use the case_when function, but I would like to know if there are faster ways to achieve the same result.
Example data:
df <- structure(list(Name = c("Apple", "Pear", "Strawberries", "Blackberry"
), Characteristic_1 = c("Yellow", "British", "French", "Blue"
), Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
I suspect there is an easier way of solving the issue, but here is one potential solution:
# Load the libraries
library(tidyverse)
# Load the data
df <- structure(list(Name = c("Apple", "Pear", "Strawberries", "Blackberry"
), Characteristic_1 = c("Yellow", "British", "French", "Blue"
), Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
# R has 657 built in colour names. You can see them using the `colours()` function.
# Chances are your colours are contained in this list.
# The `str_to_title()` function capitalizes every colour in the list
list_of_colours <- str_to_title(colours())
# If your colours are not contained in the list, add them using e.g.
# `list_of_colours <- c(list_of_colours, "Octarine")`
# Create a new dataframe ("df2") by taking the original dataframe ("df")
df2 <- df %>%
# Create two new columns called "Colour" and "Origin" using `mutate()` with
# `ifelse` used to identify whether each word is in the list of colours.
# If the word is in the list of colours, add it to the "Colour" column; if
# it isn't, add it to the "Origin" column.
mutate(Colour = ifelse(!is.na(str_extract(Characteristic_1, paste(list_of_colours, collapse = "|"))),
Characteristic_1, Characteristic_2),
Origin = ifelse(is.na(str_extract(Characteristic_1, paste(list_of_colours, collapse = "|"))),
Characteristic_1, Characteristic_2)) %>%
# Then select the columns you want
select(Name, Colour, Origin)
df2
# A tibble: 4 x 3
# Name Colour Origin
# <chr> <chr> <chr>
#1 Apple Yellow Italian
#2 Pear Yellow British
#3 Strawberries Red French
#4 Blackberry Blue Austrian
I think there is also a better way of achieving this but for now this is the one solution that came to my mind:
library(dplyr)
library(stringr)
df <- structure(list(Name = c("Apple", "Pear", "Strawberries", "Blackberry"
), Characteristic_1 = c("Yellow", "British", "French", "Blue"
), Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
df %>%
mutate(char_1 = if_else(str_to_lower(Characteristic_1) %in% colours(distinct = TRUE),
Characteristic_1, Characteristic_2),
char_2 = if_else(Characteristic_1 == char_1, Characteristic_2, Characteristic_1)) %>%
select(-c(Characteristic_1, Characteristic_2))
# A tibble: 4 x 3
Name char_1 char_2
<chr> <chr> <chr>
1 Apple Yellow Italian
2 Pear Yellow British
3 Strawberries Red French
4 Blackberry Blue Austrian
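Both answers boil down to one test: is the first characteristic a known colour name? A base R sketch of that idea without any packages, assuming (as the answers do) that every colour in the data appears in colours(); the names first_is_colour and result are mine:

```r
df <- data.frame(
  Name = c("Apple", "Pear", "Strawberries", "Blackberry"),
  Characteristic_1 = c("Yellow", "British", "French", "Blue"),
  Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"),
  stringsAsFactors = FALSE
)

# TRUE where the first characteristic is one of R's built-in colour names
first_is_colour <- tolower(df$Characteristic_1) %in% colours()

# pick the colour from whichever column holds it, and the origin from the other
result <- data.frame(
  Name   = df$Name,
  Colour = ifelse(first_is_colour, df$Characteristic_1, df$Characteristic_2),
  Origin = ifelse(first_is_colour, df$Characteristic_2, df$Characteristic_1),
  stringsAsFactors = FALSE
)
result
```

Computing the test once and reusing it for both columns keeps the two ifelse calls from drifting apart.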
I want to get a diagram similar to the picture below, but the code I use creates a different diagram. With rbind I added some hierarchy to the diagram. In the data frame, col0 holds a string with the names of the animals. In col1 the string is split into individual animals, and col2 adds the Latin name for each animal. The col1 data are always changing, while the col2 data are constant (there will always be feline or canis names in that column).
library(igraph)
# I create my dataframe with animals
df <- data.frame(col0 = c("Cat Dog Wolf", "Cat Dog Wolf", "Cat Dog Wolf"),
col1 = c( "Cat", "Dog", "Wolf"),
col2 = c( "Feline", "Canis", "Canis2"))
# Add extra lines for hierarchy
# These lines work with the current graph; for a new one they should be replaced or deleted
df <-rbind(df, data.frame(col0 = "Cat Dog Wolf", col1 = "Feline", col2 ="Animal"))
df <-rbind(df, data.frame(col0 = "Cat Dog Wolf", col1 = "Canis", col2 = "Animal"))
df <-rbind(df, data.frame(col0 = "Cat Dog Wolf", col1 = "Canis2", col2 = "Canis"))
##########
df <-df[c('col2', 'col1')]
names(df) <-c('from', 'to')
abc <-union(df$to, df$from)
###########
g <-graph.data.frame(df, directed = TRUE, vertices = abc)
plot(g, vertex.size = 20, vertex.label.dist = 0.5, vertex.color = c("blue",
"red", "green", "white", "orange" ),
edge.arrow.size = 0.5, layout = layout.reingold.tilford(g))
This is the graph that the above code outputs, but it's not quite what I want:
I want a similar diagram to what's shown below:
I think that I understand what you want, but I will restate the problem
so that you can confirm whether or not I understood. I think that what
you want to do is this:
Find all of the leaves in the tree, i.e. the nodes with no descendants.
Each leaf will have one parent. Rename the parent with the name of the
leaf, then delete the leaf from the graph. The following code implements that.
## Assume that we have created the graph g using your code
g2 = g # Keep original graph intact
SourceNodes = sapply(strsplit(attr(E(g2), "vnames"), "\\|"), "[", 1)
DestNodes = sapply(strsplit(attr(E(g2), "vnames"), "\\|"), "[", 2)
## Leaf nodes are nodes that are destinations, but not sources
## Also need the node numbers for later deletion
(LeafNodes = DestNodes[which(!(DestNodes %in% SourceNodes))])
## [1] "Cat" "Dog" "Wolf"
(LeafNumbers = match(LeafNodes, attr(V(g), "name")))
## [1] 1 2 3
## Find the parents of the leaves
(UpOne = SourceNodes[match(LeafNodes, DestNodes)])
## [1] "Feline" "Canis" "Canis2"
## Rename the UpOne nodes (parents of leaves)
vertex_attr(g2)$name[match(UpOne, vertex_attr(g2)$name)] = LeafNodes
## Now delete the leaf nodes and plot
g2 = delete_vertices(g2, LeafNumbers)
plot(g2, vertex.size = 20, vertex.label.dist = 0.5,
vertex.color = c("red", "green", "white", "orange" ),
edge.arrow.size = 0.5, layout = layout.reingold.tilford(g2))
Result
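The leaf-and-parent steps described above can also be sketched directly on the edge list, before any graph is built. This is a base R illustration under the assumption that each leaf has exactly one parent; the edges, collapsed and relabel names are mine and not part of igraph:

```r
# the same edges as in the question, as a plain from/to table
edges <- data.frame(
  from = c("Feline", "Canis", "Canis2", "Animal", "Animal", "Canis"),
  to   = c("Cat", "Dog", "Wolf", "Feline", "Canis", "Canis2"),
  stringsAsFactors = FALSE
)

# leaves are nodes that appear as a destination but never as a source
leaves <- setdiff(edges$to, edges$from)

# the parent of each leaf is the source of the edge pointing at it
parents <- edges$from[match(leaves, edges$to)]

# drop the edges into the leaves, then rename each parent to its leaf
collapsed <- edges[!(edges$to %in% leaves), ]
rename <- setNames(leaves, parents)
relabel <- function(x) unname(ifelse(x %in% parents, rename[x], x))
collapsed$from <- relabel(collapsed$from)
collapsed$to <- relabel(collapsed$to)
collapsed
```

The collapsed table could then be passed to the same graph-building call as before, which avoids deleting vertices after the fact.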
Here is a dataset, created with the code below. I need to get a graph like the one in the picture without changing the data frame. I tried using rbind to add more hierarchy to the data frame in order to get a diagram like the one in the picture. The col0 and col1 data change depending on the input, while col2 always remains the same.
df <- data.frame(col0 = c("Cat Dog Wolf", "Cat Dog Wolf", "Cat Dog Wolf"),
col1 = c( "Cat", "Dog", "Wolf"),
col2 = c( "Feline", "Canis", "Canis2"))
df <-rbind(df, data.frame(col0="Cat Dog Wolf", col1 = "Canis2", col2 = "Canis"))
df <-df[c('col1', 'col2')]
names(df) <-c('from', 'to')
abc <-union(df$to, df$from)
g <-graph.data.frame(df, directed = TRUE, vertices = abc)
plot(g, vertex.size = 20, vertex.label.dist = 0.5, vertex.color = "blue",
edge.arrow.size = 0.5, layout = layout.reingold.tilford(g))
You need three edges taken from only two columns ("from" and "to"). But you have three columns in df, so you have to choose among them. I created a new column with the names from col1 and col2 pasted together. Then I took the first two edges from the top and added the third one with rbind.
df <- data.frame(col0 = "Cat Dog Wolf",
col1 = c( "Cat", "Dog", "Wolf"),
col2 = c( "Feline", "Canis", "Canis2"))
df$col1_2 <- paste(df$col2,df$col1)
df <- rbind(df[1:2,c(1,4)],data.frame(col0=df[2,4],col1_2=df[3,4]))
names(df) <-c('from', 'to')
abc <-union(df$to, df$from)
g <-graph.data.frame(df, directed = TRUE, vertices = abc)
plot(g, vertex.size = 20, vertex.label.dist = 0.5, vertex.color = c("lightblue","red","green","white"),
edge.arrow.size = 0.5, layout = layout.reingold.tilford(g))
Maybe this is a very simple question, but I cannot figure out what is wrong with my short code.
This is my (very simple) data frame:
df1 <- structure(list(sample = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 3L,
3L, 3L), .Label = c("a", "b", "c"), class = "factor"), value = c(0.1446689595,
0.9151456018, 0.880888083, 0.005522657, 0.7079621046, 0.4770259836,
0.6960717649, 0.5892328324, 0.1134234308), new = c("red", "red",
"red", "red", "red", "red", "red", "red", "red")), .Names = c("sample",
"value", "new"), row.names = c(NA, -9L), class = "data.frame")
What I would like to do is add a new column whose values depend on the values of the first column. In other, simpler words:
if (df1$sample != "a") {
df1$new <- "green"
} else {
df1$new <- "red"
}
but R returns an error:
In if (df1$sample != "a") { :
the condition has length > 1 and only the first element will be used
I also tried with an ifelse statement:
ifelse(df1$sample != "a", df1$new <- "green", df1$new <- "red")
but in this case the new column contains only "red" and no "green".
Am I missing something?
Thanks!
You could try
df1$new <- c('green', 'red')[(df1$sample=='a')+1L]
df1
# sample value new
#1 a 0.144668959 red
#2 b 0.915145602 green
#3 a 0.880888083 red
#4 a 0.005522657 red
#5 a 0.707962105 red
#6 b 0.477025984 green
#7 c 0.696071765 green
#8 c 0.589232832 green
#9 c 0.113423431 green
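The indexing trick works because a logical vector becomes 0/1 when 1L is added to it, which then selects from the two-element lookup vector. A short sketch on a made-up vector (sample_col and idx are illustrative names):

```r
sample_col <- c("a", "b", "a", "c")

# FALSE + 1L is 1 and TRUE + 1L is 2, so "a" maps to position 2
idx <- (sample_col == "a") + 1L
idx
# [1] 2 1 2 1

# position 1 is "green" (not "a"), position 2 is "red" (is "a")
c("green", "red")[idx]
# [1] "red"   "green" "red"   "green"
```

This is also why the plain if statement in the question fails: if expects a single TRUE or FALSE, while the comparison here yields one value per row.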
ifelse should work fine; you just need to assign its result, and the assignments inside the call are not needed:
df1$new1 <- ifelse(df1$sample != "a", "green", "red")
sample value new new1
1 a 0.144668959 red red
2 b 0.915145602 red green
3 a 0.880888083 red red
4 a 0.005522657 red red
5 a 0.707962105 red red
6 b 0.477025984 red green
7 c 0.696071765 red green
8 c 0.589232832 red green
9 c 0.113423431 red green
I would avoid using new as a variable name - it is the name of a function and this may cause issues.