I'm looking to create a binary indicator column that shows whether an existing column is equal to "R" or "P". If it is, I would like the new column to read "1", and if the observation is blank I would like it to read "0".
I would like this:
Person Play Key
A 1 R
B 2 P
C 3
D 4 R
E 5
To become this:
Person Play Key Indicator
A 1 R 1
B 2 P 1
C 3 0
D 4 R 1
E 5 0
I have tried:
df$Indicator <- (df$Key == 'R' | 'P')
But that doesn't work. I get the error: Error in df$Indicator <- (df$Key == 'R' | 'P') : operations are possible only for numeric, logical or complex types
Besides, I'm not sure that would provide the binary indicator I'm looking for.
Try any of these approaches. You were close: you used df$Indicator <- (df$Key == 'R' | 'P'), but each side of | must be a complete comparison, so the proper form is df$Indicator <- df$Key == 'R' | df$Key == 'P'. That produces TRUE/FALSE values, which as.numeric() converts to 1/0. Here is the code:
#Code 1
df$Indicator <- as.numeric(df$Key %in% c('R','P'))
#Code 2
df$Indicator <- as.numeric(df$Key == 'R' | df$Key== 'P')
Output:
Person Play Key Indicator
1 A 1 R 1
2 B 2 P 1
3 C 3 0
4 D 4 R 1
5 E 5 0
Some data used:
#Data
df <- structure(list(Person = c("A", "B", "C", "D", "E"), Play = 1:5,
Key = c("R", "P", "", "R", "")), row.names = c(NA, -5L), class = "data.frame")
Another option would be (all credit to #ChuckP):
#Code 3
df$Indicator <- ifelse(df$Key == 'R' | df$Key == 'P', 1, 0)
This produces the same output.
expl <- data.frame(Person = LETTERS[1:5], Play = 1:5, Key = c("R", "R"," ", "P", " "))
expl$Indicator <- expl$Key == 'R' | expl$Key =='P'
print(expl)
expl$Indicator2 <- as.numeric(expl$Key == 'R' | expl$Key =='P')
print(expl)
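One difference between the == and %in% approaches above only shows up when Key contains NA rather than an empty string: == propagates NA, while %in% returns FALSE. A minimal sketch, assuming a hypothetical data frame df_na in which one blank has been read in as NA:

```r
# Hypothetical variant of the data: one blank Key is NA instead of ""
df_na <- data.frame(Person = c("A", "B", "C", "D", "E"),
                    Play = 1:5,
                    Key = c("R", "P", NA, "R", ""),
                    stringsAsFactors = FALSE)

# == returns NA for a missing Key, so the indicator is NA as well
ind_eq <- as.numeric(df_na$Key == "R" | df_na$Key == "P")

# %in% treats a missing Key as "no match" and returns FALSE
ind_in <- as.numeric(df_na$Key %in% c("R", "P"))

print(ind_eq)  # 1 1 NA 1 0
print(ind_in)  # 1 1 0 1 0
```

If blanks may arrive as NA (for example after reading a CSV file), the %in% form is probably the safer choice for a strict 0/1 indicator.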
I am trying to write a function to solve a small problem. I have two lists, and each list comprises n samples. Each sample has a variable number of bacterial identifiers (letters in the example; in my real problem, identifiers like OTU1-OTUn; in both cases they are of type "character"). One list contains samples from diet, and the other list contains samples from gut contents. For each sample in the gut list, I want to know how many bacteria from the diet are in the gut and how many bacteria in the gut do not come from the diet. This was easily done with phyloseq, where diet and gut are both phyloseq objects with n samples each:
Bacteria_from_diet<-length(intersect(taxa_names(gut),taxa_names(diet)))
Bacteria_not_diet<-length(taxa_names(diet))- Bacteria_from_diet
However, this “summarizes” the result over the n samples of gut and diet, as if the data were collapsed across samples, and I need some measure of variation.
I have tried the following code in R:
diet<-list(DL1=c("A","B","C"),DL2=c("A","C","D"),DL3=c("B","D","E"),DL4=c("B","D","E"))
gut<-list(DL5=c("A","F","G"),DL6=c("B","F","H"),DL7=c("D","H","J"),DL8=c("A","G","F"))
gut_vs_diet <- function(a,b) ## a is diet and b is gut
{
xx<-10
gut = numeric(xx)
diet = numeric(xx)
all<-unlist(lapply(b,length)) ### get the number of elements of each element of list b
for(i in seq_along(b)){ #### loop over b (gut) to get:
diet<-length(intersect(b[[i]],a[[i]])) ### the number of elements of diet are present in gut
gut = all-diet ## the number of elements of gut that not come from diet
}
gutvsdiet = data.frame(all,gut,diet)
return(gutvsdiet)
}
When running the function I obtain this result, which is not correct:
gut_vs_diet(diet,gut)
all gut diet
DL5 3 3 0
DL6 3 3 0
DL7 3 3 0
DL8 3 3 0
In some cases, I was able to get some value in the diet column, but the function seemed to choose the diet sample at random.
I do not know where the mistake could be. Anyway, I would like to do this iteratively, that is, get the values for each sample of gut compared with all samples of diet. Alternatively, I can run replicate(10, gut_vs_diet(sample(diet), sample(gut))) to get random comparisons and avoid some kind of bias.
Thank you very much for your help
Manuel
Here is my version of your code:
diet <- list(DL1=c("A","B","C"), DL2=c("A","C","D"), DL3=c("B","D","E"), DL4=c("B","D","E"))
gut <- list(DL5=c("A","F","G"), DL6=c("B","F","H"), DL7=c("D","H","J"), DL8=c("A","G","F"))
gut_vs_diet <- function(a, b) ## a is diet and b is gut
{
all <- lengths(b) ### get the number of elements of each element of list b
diet <- mapply(function(ai, bi) length(intersect(ai, bi)), a, b)
# diet <- lengths(Map(intersect, a, b)) ## a variant; Map() always returns a list, whereas mapply() may simplify to a matrix
data.frame(all, gut=all-diet, diet)
}
gut_vs_diet(diet,gut)
# > gut_vs_diet(diet,gut)
# all gut diet
# DL5 3 2 1
# DL6 3 3 0
# DL7 3 2 1
# DL8 3 3 0
As #jogo suggested in a comment, you can use mapply instead of your for-loop:
FOO <- function(x, y){
all <- lengths(y)
diet <- mapply(function(a, b){
length(intersect(b, a))
}, x, y)
gut <- all - diet
return(data.frame(all, gut, diet))
}
> FOO(diet, gut)
all gut diet
DL5 3 2 1
DL6 3 3 0
DL7 3 2 1
DL8 3 3 0
Just for completeness, this is what it would look like with a for loop. Note that you need to compute all[[i]] - diet and build up the result row by row inside the loop; otherwise you only keep the result of the last iteration, which is data.frame(all = c(3,3,3,3), gut = 3, diet = 0).
diet <- list(DL1 = c("A", "B", "C"), DL2 = c("A", "C", "D"), DL3 = c("B", "D", "E"), DL4 = c("B", "D", "E"))
gut <- list(DL5 = c("A", "F", "G"), DL6 = c("B", "F", "H"), DL7 = c("D", "H", "J"), DL8 = c("A", "G", "F"))
gut_vs_diet <- function(a, b)
{
all <- lengths(b)
gutvsdiet <- NULL
for (i in seq_along(b)) {
diet <- length(intersect(b[[i]], a[[i]]))
gut <- all[[i]] - diet
resultForThisListElement <- c(all[[i]], gut, diet)
gutvsdiet <- rbind(gutvsdiet, resultForThisListElement)
}
colnames(gutvsdiet) <- c("all", "gut", "diet")
return(gutvsdiet)
}
gut_vs_diet(diet, gut)
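The question also asks how each gut sample compares with all diet samples, not only with the diet sample in the same position. A minimal sketch in base R, assuming the diet and gut lists defined above; the names overlap, all_diet, from_diet and pooled are made up for illustration:

```r
diet <- list(DL1 = c("A", "B", "C"), DL2 = c("A", "C", "D"),
             DL3 = c("B", "D", "E"), DL4 = c("B", "D", "E"))
gut <- list(DL5 = c("A", "F", "G"), DL6 = c("B", "F", "H"),
            DL7 = c("D", "H", "J"), DL8 = c("A", "G", "F"))

# Pairwise overlap: rows are gut samples, columns are diet samples
overlap <- sapply(diet, function(d) {
  sapply(gut, function(g) length(intersect(g, d)))
})
print(overlap)

# Alternative: pool all diet samples, then count per gut sample
all_diet <- unique(unlist(diet))
from_diet <- sapply(gut, function(g) length(intersect(g, all_diet)))
pooled <- data.frame(all = lengths(gut),
                     gut = lengths(gut) - from_diet,
                     diet = from_diet)
print(pooled)
```

Each row of overlap holds one gut sample's counts against every diet sample, so a row-wise summary such as apply(overlap, 1, sd) could serve as the measure of variation mentioned in the question.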
I have created a table with igraph listing the data as follows:
where a, b, c, d, e are the nodes.
a and b are connected by mutual edges,
with weight 1 for a->b and 2 for b->a (there are no self-loops).
By the way I used the following code to create the above table:
library(igraph)
library(dplyr)
g <- data.frame(from = c("a", "b", "c", "d", "e"),
to = c("b", "a", "a", "b", "a"), weight = c(1:5)) %>%
igraph::graph_from_data_frame()
Now I hope to create another table listing both the forward and backward edges between each pair of nodes, together with their weights, like:
Does anyone know how to do this with igraph?
First you could get a list of the pairs of nodes that share an edge, regardless of direction:
simplified <- as.undirected(g, mode="collapse")
pairs <- ends(simplified, E(simplified))
Then we can write a helper function that returns the weight of the edge from one node to another, or NA if that edge doesn't exist:
get_edge_weight<- Vectorize(function(a, b) {
e <- E(g)[a %->% b]
if(length(e)==1) {
e$weight
} else {
NA
}
})
Then you can build your desired data.frame with
data.frame(from=pairs[,1], to=pairs[,2],
fwd=get_edge_weight(pairs[,1], pairs[,2]),
back=get_edge_weight(pairs[,2], pairs[,1])
)
# from to fwd back
# b a b 1 2
# c a c NA 3
# d b d NA 4
# e a e NA 5
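If the edge list is still available as a data frame (or is recovered with igraph::as_data_frame(g, what = "edges")), a similar table can be sketched in base R with a merge. This is only an illustration under two assumptions: there is at most one edge per direction between any two nodes, and there are no self-loops. The column names n1 and n2 and the "alphabetically smaller node first" convention are made up here, so fwd means an edge from n1 to n2.

```r
edges <- data.frame(from = c("a", "b", "c", "d", "e"),
                    to = c("b", "a", "a", "b", "a"),
                    weight = 1:5,
                    stringsAsFactors = FALSE)

# Direction-free pair: n1 is the alphabetically smaller node
first_is_from <- edges$from < edges$to
edges$n1 <- ifelse(first_is_from, edges$from, edges$to)
edges$n2 <- ifelse(first_is_from, edges$to, edges$from)

# Split into edges running n1 -> n2 (fwd) and n2 -> n1 (back)
fwd <- edges[first_is_from, c("n1", "n2", "weight")]
back <- edges[!first_is_from, c("n1", "n2", "weight")]
names(fwd)[3] <- "fwd"
names(back)[3] <- "back"

# Full outer join keeps pairs that have only one direction (NA for the other)
result <- merge(fwd, back, by = c("n1", "n2"), all = TRUE)
print(result)
```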
I have a data frame with Column1, which can take the value of any letter of the alphabet. I want to create a second column that spells out the number corresponding to that letter. I am trying to do this with an if-then statement, but I keep getting an error. Sorry, this is a simple question, but I have tried the R for Dummies website http://www.dummies.com/how-to/content/how-to-use-if-statements-in-r.html with no luck!
x$Column2 <- NULL
if (x$Column1 == "A") then[x$Column2 <- "One"]
The best way to do this is to create a reference table:
> Reference = data.frame(Number = c("One", "Two", "Three", "Four"), Letter = c("A", "B", "C", "D"))
> Reference
Number Letter
1 One A
2 Two B
3 Three C
4 Four D
> Data = data.frame(Letter = c("B", "B", "C", "A", "D"))
> Data
Letter
1 B
2 B
3 C
4 A
5 D
Then you can find the indices:
> Indices = sapply(Data$Letter, function(x) which(x == Reference$Letter))
> Indices
[1] 2 2 3 1 4
And use them to create the column
> Data$Number = Reference[Indices,]$Number
> Data
Letter Number
1 B Two
2 B Two
3 C Three
4 A One
5 D Four
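The same lookup can also be written with match(), which returns the position of each letter in the reference table in a single vectorized call. A minimal sketch using the Reference and Data frames from above; unlike the which() version, a letter that is missing from the reference table simply yields NA:

```r
Reference <- data.frame(Number = c("One", "Two", "Three", "Four"),
                        Letter = c("A", "B", "C", "D"),
                        stringsAsFactors = FALSE)
Data <- data.frame(Letter = c("B", "B", "C", "A", "D"),
                   stringsAsFactors = FALSE)

# match() gives, for each Data$Letter, its row index in Reference
Indices <- match(Data$Letter, Reference$Letter)
Data$Number <- Reference$Number[Indices]
print(Data)

# A letter absent from Reference maps to NA instead of breaking the lookup
print(Reference$Number[match("Z", Reference$Letter)])  # NA
```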
As I understand it, what you want to do here is like creating a dummy variable. Try
> x$dummy <- as.numeric(x$Column1 != "A")
and you should get 0 for all A's and 1 for other values.
Look at the question "Generate a dummy-variable" for further information.
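As for the attempt in the question: R has no then keyword, and if expects a single TRUE or FALSE rather than a whole column. The vectorized counterpart is ifelse(). A minimal sketch, assuming a small hypothetical data frame x with a character Column1:

```r
# Hypothetical example data
x <- data.frame(Column1 = c("A", "B", "A", "C"),
                stringsAsFactors = FALSE)

# ifelse() tests every element; non-matching rows get NA here
x$Column2 <- ifelse(x$Column1 == "A", "One", NA)
print(x)
```

With many letters a chain of ifelse() calls gets unwieldy, which is where a reference table becomes the more practical choice.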
I am working with network data and have come across an odd (or at least I didn't expect it) behavior with count.multiple in the igraph package in R.
library(igraph)
library(plyr)
df <- data.frame( sender = c( "a", "a", "a", "b", "b", "c","c","d" ),
receiver = c( "b", "b", "b", "c", "a", "d", "d", "a" ) )
What I want is to count up all of the edges and use the number of duplicates as a weight.
When I do ddply(df, .(sender, receiver), "nrow") my results are:
sender receiver nrow
1 a b 3
2 b a 1
3 b c 1
4 c d 2
5 d a 1
Which is what I would expect.
However, I cannot reproduce this using igraph's count.multiple, which is the function I expected to do this within igraph:
df.graph <- graph.edgelist(as.matrix(df))
E(df.graph)$weight <- count.multiple(df.graph)
E(df.graph)$weight produces:
3 3 3 1 1 2 2 1
I then used the simplify command:
df.graph <- simplify(df.graph)
which produces
9 1 1 4 1
I get what is going on here: simplify is just adding the weights. But I don't understand why or when this would be used as opposed to what ddply is doing.
Any thoughts?
Thanks!
The default behaviour of simplify is to add the weights of multiple edges.
To avoid double counting, you can set the initial weights to 1
g <- graph.edgelist(as.matrix(df))
E(g)$weight <- 1
g <- simplify( g )
E(g)$weight
or change the way they are aggregated.
g <- graph.edgelist(as.matrix(df))
E(g)$weight <- count.multiple(g)
g <- simplify( g, edge.attr.comb = list(weight=max, name="concat", "ignore") )
E(g)$weight
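Another route is to count the duplicates before the graph is built, so that no simplify step is needed. A minimal sketch in base R, assuming the df from the question; the name edge_weights is made up for illustration:

```r
df <- data.frame(sender = c("a", "a", "a", "b", "b", "c", "c", "d"),
                 receiver = c("b", "b", "b", "c", "a", "d", "d", "a"),
                 stringsAsFactors = FALSE)

# Give every row a weight of 1, then sum within each sender/receiver pair
df$weight <- 1
edge_weights <- aggregate(weight ~ sender + receiver, data = df, FUN = sum)
print(edge_weights)

# The weighted edge list could then be passed to
# igraph::graph_from_data_frame(edge_weights), which stores extra
# columns such as weight as edge attributes.
```

The counts match the ddply result from the question, although aggregate() may list the rows in a different order.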