How do I get igraph to ignore blank cells? - r

So I have this dataframe that we will call test_a2. I want to use igraph to create a network map.
Col 1 | Col 2 | Col 3 | Col 4
Table A | Table B | Table C |
Table Z | Table A | Table C | Table Y
Table K | Table L | Table M | Table B
Table J | Table H |
I am currently using the following code to map multiple columns:
plot(graph.data.frame(rbindlist(lapply(seq(ncol(test_a2)-1), function(i) test_a2[i:(i+1)]))))
This gives me a graph with nodes and edges. However, wherever there is an empty cell, it creates a node for it and draws an unnecessary connection. Is there any way to have it ignore these?

Would this work?
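library(igraph)
library(data.table)
test_a2 <- data.frame(col1 = c("A","Z","K","J"),
                      col2 = c("B","A","L","H"),
                      col3 = c("C","C","M",""),
                      col4 = c("","Y","B",""), stringsAsFactors=FALSE)
# Treat empty cells as missing so they can be dropped
test_a2[test_a2 == ""] <- NA
# Stack each pair of adjacent columns into one two-column edge list (bound by position),
# then drop every edge that has a missing endpoint
test_a3 <- na.omit(rbindlist(lapply(seq(ncol(test_a2)-1), function(i) test_a2[i:(i+1)]), use.names=FALSE))
plot(graph.data.frame(test_a3))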
One note about this approach: the graph will not contain vertices that are not connected with anything else but "empty" cells. If you need to include them you can add them afterwards.
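Adding the dropped vertices back could look roughly like the sketch below. It is not part of the original answer: the data frame `cells` (with a row holding only "Q") and the variable names are made up for illustration, the edge list is built with base R, and the igraph step is skipped when the package is not installed.

```r
# Hypothetical data: the last row has a single non-empty cell, so "Q" has no edge
cells <- data.frame(col1 = c("A", "Z", "K", "Q"),
                    col2 = c("B", "A", "L", ""),
                    col3 = c("C", "C", "M", ""),
                    stringsAsFactors = FALSE)
cells[cells == ""] <- NA

# Pair adjacent columns by position and drop edges with a missing endpoint
pairs <- lapply(seq_len(ncol(cells) - 1), function(i) {
  setNames(cells[i:(i + 1)], c("from", "to"))
})
edges <- na.omit(do.call(rbind, pairs))

# Every non-empty cell value, including values that never appear in an edge
all_vertices <- sort(unique(na.omit(unlist(cells, use.names = FALSE))))
isolated <- setdiff(all_vertices, c(edges$from, edges$to))

if (requireNamespace("igraph", quietly = TRUE)) {
  # Passing the full vertex list keeps unconnected vertices in the graph
  g <- igraph::graph_from_data_frame(edges, directed = TRUE,
                                     vertices = data.frame(name = all_vertices))
  if (interactive()) plot(g)
}
```

Here `isolated` holds the vertices that `na.omit()` alone would have lost, and the `vertices` argument carries them into the graph.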

Related

Combining Grep and a For-Loop to Construct A Matrix (R)

I have a huge list of small data frames which I would like to meaningfully combine into one, however the logic around how to do so escapes me.
For instance, if I have a list of data frames that look something like this albeit with far more files, many of which I do not want in my data frame:
MyList = c("AthosVersusAthos.csv", "AthosVersusPorthos.csv", "AthosVersusAramis.csv", "PorthosVersusAthos.csv", "PorthosVersusPorthos.csv", "PorthosVersusAramis.csv", "AramisVersusAthos.csv", "AramisVersusPorthos.csv", "AramisVersusAramis.csv", "BobVersusMary.csv", "LostCities.txt")
What I want is to assemble these into one large data frame. Which would look like this.
| |
AthosVersusAthos | PorthosVersusAthos | AramisVersusAthos
| |
------------------------------------------------------
| |
AthosVersusPorthos | PorthosVersusPorthos| AramisVersusPorthos
| |
------------------------------------------------------
| |
AthosVersusAramis | PorthosVersusAramis| AramisVersusAramis
| |
Or perhaps more correctly (with sample numbers in only one portion of the matrix):
| Athos | Porthos | Aramis
-------|------------------------------------------------------
| 10 9 5 | |
Athos | 2 10 4 | |
| 3 0 10 | |
-------|------------------------------------------------------
| | |
Porthos | | |
| | |
-------|------------------------------------------------------
| | |
Aramis | | |
| | |
-------------------------------------------------------------
What I have managed so far is:
Musketeers = c("Athos", "Porthos", "Aramis")
for(i in 1:length(Musketeers)) {
  for(j in 1:length(Musketeers)) {
    CombinedMatrix <- cbind (
      rbind(MyList[grep(paste0("^(", Musketeers[i],
        ")(?=.*Versus[", Musketeers[j], "]"), names(MyList),
        value = T, perl=T)])
    )
  }
}
What I was trying to do was combine my grep command (quite important given the number of files and the specificity with which I need to select them) with rbind and cbind so that the rows and the columns of the matrix are meaningfully concatenated.
My general plan was to merge all the data frames starting with 'Athos' into one column, and doing this once again for data frames starting with 'Porthos' and 'Aramis', and then combine those three columns, row-wise into a final dataframe.
I know I'm quite far off but I can't quite get my head around where to start.
Edit: @PierreGramme generated a useful model data set, which I add below since it would have been useful to provide it originally.
Musketeers = c("Athos", "Porthos", "Aramis")
MyList = c("AthosVersusAthos.csv", "AthosVersusPorthos.csv", "AthosVersusAramis.csv",
"PorthosVersusAthos.csv", "PorthosVersusPorthos.csv", "PorthosVersusAramis.csv",
"AramisVersusAthos.csv", "AramisVersusPorthos.csv", "AramisVersusAramis.csv",
"BobVersusMary.csv", "LostCities.txt")
MyList = lapply(setNames(nm=MyList), function(x) matrix(rnorm(9), nrow=3, dimnames=list(c("a","b","c"), c("x","y","z"))) )
First, let's make a reproducible example. Is the following faithful to your data? If so, I will add code to answer.
Musketeers = c("Athos", "Pothos", "Aramis")
MyList = c("AthosVersusAthos.csv", "AthosVersusPothos.csv", "AthosVersusAramis.csv",
"PothosVersusAthos.csv", "PothosVersusPothos.csv", "PothosVersusAramis.csv",
"AramisVersusAthos.csv", "AramisVersusPothos.csv", "AramisVersusAramis.csv",
"BobVersusMary.csv", "LostCities.txt")
MyList = lapply(setNames(nm=MyList), function(x) matrix(rnorm(9), nrow=3, dimnames=list(c("a","b","c"), c("x","y","z"))) )
And then is it correct that you would like to concatenate 9 of these matrices into your combined matrix shaped as you described?
Edit:
Then the code solving your problem:
# Helper function to extract the relevant portion of MyList and rbind() it
makeColumns = function(n){
  re = paste0("^",n,"Versus")
  sublist = MyList[grep(re, names(MyList))]
  names(sublist) = sub(re, "", sub("\\.csv$","", names(sublist)))
  # Make sure sublist is sorted correctly and contains info on all musketeers
  sublist = sublist[Musketeers]
  # Change row and col names so that they are unique in the final result
  sublist = lapply(names(sublist), function(m) {
    res = sublist[[m]]
    rownames(res) = paste0(m,"_",rownames(res))
    colnames(res) = paste0(n,"_",colnames(res))
    res
  })
  do.call(rbind, sublist)
}
lColumns = lapply(setNames(nm=Musketeers), makeColumns)
CombinedMatrix = do.call(cbind, lColumns)
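When the file names follow the "XVersusY.csv" pattern exactly, a direct name lookup may be a simpler alternative to grep. The sketch below is only an illustration under that assumption: `block_column` and `Combined` are made-up names, the data are random stand-ins, and row/column names are left as they are rather than made unique.

```r
set.seed(1)
Musketeers <- c("Athos", "Porthos", "Aramis")

# Stand-in data: one 3x3 matrix per "XVersusY.csv" name, plus an unrelated file
files <- as.vector(outer(Musketeers, Musketeers,
                         function(a, b) paste0(a, "Versus", b, ".csv")))
MyList <- lapply(setNames(nm = c(files, "BobVersusMary.csv")), function(x) {
  matrix(rnorm(9), nrow = 3, dimnames = list(c("a", "b", "c"), c("x", "y", "z")))
})

# Stack all matrices whose name starts with `first`, in Musketeers order
block_column <- function(first) {
  blocks <- lapply(Musketeers, function(second) {
    MyList[[paste0(first, "Versus", second, ".csv")]]
  })
  do.call(rbind, blocks)
}

# One block column per musketeer, bound side by side into a 9x9 matrix
Combined <- do.call(cbind, lapply(Musketeers, block_column))
dim(Combined)
```

Rows 4 to 6 of the first three columns then hold the "AthosVersusPorthos.csv" matrix, which matches the layout sketched in the question.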

Add new row to dataframe when actionButton is clicked

I'm trying to create an R Shiny application that creates a 1 row dataframe based on the input values, and when an action button is clicked it adds that dataframe as a new row to another dataframe (which starts blank).
I've browsed around StackOverflow but I couldn't find something that addressed my issue.
What I would like to happen is the following:
input$one <- "A"
input$two <- "B"
input$three <- "C"
df1 = A | B | C
now when I press the actionButton, I would like df2 (which starts blank) to be the following:
df2 = A | B | C
next, I want to be able to add more rows. So if I change my input values to the following:
input$one <- "X"
input$two <- "Y"
input$three <- "Z"
df1 = X | Y | Z
and I click the actionButton again, df2 should update to be the following:
df2 = A | B | C
X | Y | Z
and finally, one last update and click of the actionButton would be the following:
input$one <- "1"
input$two <- "2"
input$three <- "3"
df1 = 1 | 2 | 3
*actionButton click*
df2 = A | B | C
X | Y | Z
1 | 2 | 3
I would like to be able to do this as many times as I want, so that every click of the actionButton adds whatever is in df1 as a new row to df2. I know that this will have to use rbind, but how can it be done with an actionButton?
This worked for me.
mydata <- data.frame()
df <- eventReactive(input$add_payer, {
  newrow <- data.frame(ALEMO())
  mydata <<- rbind(mydata, newrow)
})
output$all_data <- renderTable(df())
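A variant that avoids the global `<<-` assignment might keep the accumulated rows in `reactiveValues()`. This is a sketch, not the answerer's code: the input ids `one`, `two`, `three` come from the question, while `append_row`, `add`, `store` and `df2_table` are hypothetical names, and the app only starts in an interactive session with shiny installed.

```r
# Hypothetical helper: build df1 from the three inputs and append it to df2
append_row <- function(df2, one, two, three) {
  df1 <- data.frame(one = one, two = two, three = three,
                    stringsAsFactors = FALSE)
  rbind(df2, df1)
}

if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  ui <- fluidPage(
    textInput("one", "One"),
    textInput("two", "Two"),
    textInput("three", "Three"),
    actionButton("add", "Add row"),
    tableOutput("df2_table")
  )

  server <- function(input, output, session) {
    # df2 starts blank and lives inside the session, not in the global environment
    store <- reactiveValues(df2 = data.frame())

    # Each click appends the current input values as a new row
    observeEvent(input$add, {
      store$df2 <- append_row(store$df2, input$one, input$two, input$three)
    })

    output$df2_table <- renderTable(store$df2)
  }

  runApp(shinyApp(ui, server))
}
```

Keeping the row-building logic in a plain function makes it possible to check the append behaviour without launching the app.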

Using R to text mine and extract words

I asked a similar question before, but I still need some help or a pointer in the right direction.
I am trying to locate certain words within a column that holds a SQL statement in every row, and to extract the word that follows each of them, in RStudio.
Example: let's call this dataframe "SQL":
| **UserID** | **SQL Statement**
1 | N781 | "SELECT A, B FROM Table.1 p JOIN Table.2 pv ON
p.ProdID.1ProdID.1 JOIN Table.3 v ON pv.BusID.1 =
v.BusID WHERE SubID = 1 ORDER BY v.Name;"
2 | N283 | "SELECT D, E FROM Table.11 p JOIN Table.2 pv ON
p.ProdID.1ProdID.1 JOIN Table.3 v ON pv.BusID.1 =
v.BusID WHERE SubID = 1 ORDER BY v.Name;"
I am trying to pull out the table names: find each "FROM" or "JOIN" keyword and extract the table name that follows it.
I have been using some code I got help with earlier.
I turned the column "SQL Statement" into a list of 2 named "b".
I use the code:
z <- mapply(grepl,"(FROM|JOIN)",b)
which gives me a TRUE or FALSE for each word in each list element.
z <- mapply(grep,"(FROM|JOIN)",b)
The above is close. It gives me the position of every match in each of the lists.
But I just want to find the word JOIN or FROM and extract the word that follows it. I was trying to get output something like this:
| **UserID** | **SQL Statement** | Tables
1 | N781 | "SELECT A, B FROM Table.1 p JOIN Table.2 pv ON | Table.1, Table.2
p.ProdID.1ProdID.1 JOIN Table.3 v ON pv.BusID.1 =
v.BusID WHERE SubID = 1 ORDER BY v.Name;"
2 | N283 | "SELECT D, E FROM Table.11 p JOIN Table.2 pv ON
p.ProdID.1ProdID.1 JOIN Table.3 v ON pv.BusID.1 = | Table.11, Table.31
v.BusID WHERE SubID = 1 ORDER BY v.Name;"
Here is a working script which uses base R options. The inspiration here is to leverage strsplit to split the query string on the keywords FROM or JOIN. Then, the first separate word of each resulting term (except for the first term) should be a table name.
sql <- "SELECT A, B FROM Table.1 p JOIN Table.2 pv ON
p.ProdID.1ProdID.1 JOIN Table.3 v ON pv.BusID.1 =
v.BusID WHERE SubID = 1 ORDER BY v.Name;"
terms <- strsplit(sql, "(FROM|JOIN)\\s+")
out <- unlist(lapply(terms, function(x) gsub("^([^[:space:]]+).*", "\\1", x)))
out <- out[2:length(out)]
out
[1] "Table.1" "Table.2" "Table.3"
Demo
To understand better what I did, follow the demo and have a look at the terms list which resulted from splitting.
Edit:
Here is a link to another demo which shows how you might use the above logic on a vector of query strings, to generate a list of table-name vectors, one per query.
Demo
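For a whole column of queries, one possible sketch is shown here. It is not the code behind the demo link: it matches keyword-plus-name pairs directly with `gregexpr()` instead of splitting, `extract_tables` is a made-up helper name, the two queries are shortened stand-ins for the ones in the question, and it assumes uppercase keywords and table names without whitespace.

```r
# Hypothetical helper: return one character vector of table names per query
extract_tables <- function(queries) {
  # Find every "FROM <name>" or "JOIN <name>" occurrence in each query
  hits <- regmatches(queries,
                     gregexpr("\\b(FROM|JOIN)\\s+\\S+", queries, perl = TRUE))
  # Strip the keyword and whitespace, keeping only the table name
  lapply(hits, function(h) sub("^(FROM|JOIN)\\s+", "", h, perl = TRUE))
}

# Shortened stand-in data, shaped like the data frame in the question
SQL <- data.frame(
  UserID = c("N781", "N283"),
  Statement = c(
    "SELECT A, B FROM Table.1 p JOIN Table.2 pv ON\np.ProdID = pv.ProdID JOIN Table.3 v ON pv.BusID = v.BusID;",
    "SELECT D, E FROM Table.11 p JOIN Table.2 pv ON\np.ProdID = pv.ProdID JOIN Table.3 v ON pv.BusID = v.BusID;"
  ),
  stringsAsFactors = FALSE
)

tables <- extract_tables(SQL$Statement)
# Collapse each vector into a single comma-separated "Tables" column
SQL$Tables <- vapply(tables, paste, character(1), collapse = ", ")
SQL[, c("UserID", "Tables")]
```

A table name written directly before a semicolon or parenthesis would keep that character, so real queries may need a stricter pattern than `\\S+`.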

R: Aggregating a list of column names mapped to row numbers based off of a condition in a data frame

Scaled down, my dataframe looks like this:
+---+------------+-------------+
| | Label1 | Label2 |
+---+------------+-------------+
| 1 | T | F |
| 2 | F | F |
| 3 | T | T |
+---+------------+-------------+
I need to create a list of lists that map the column names to all the row numbers that have a false boolean as their value. For the above example it would look something like this:
{"Label1" : (2), "Label2" : (1,2)}
I am currently doing it as so:
myList = with(data.frame(which(!myDataFrame, arr.ind = TRUE)), list("colNames" = names(myDataFrame)[col], "rows" = row))
l = list()
count = 1;
for (i in myList[["colNames"]]) {
  tmpRowNum = myList[["rows"]][[count]];
  tmpList = l[[i]];
  if (is.null(tmpList)) {
    tmpList = list();
  }
  l[[i]] = c(tmpList, list(tmpRowNum))
  count = count + 1;
}
This does work, but as I am new to R I can only assume there is a more efficient method of doing this. The with function creates two separate lists that I essentially have to combine to get the result that I am looking for.
You could try:
df <- data.frame(Label1=c("T","F","T"),Label2=c("F","F","T"))
lapply(df,function(x) which(x=="F"))
$Label1
[1] 2
$Label2
[1] 1 2
EDIT To get the same by row, use apply with margin=1:
apply(df,1,function(x) which(x=="F"))
To get a vector of the "F"s in row 2:
res <- apply(df,1,function(x) which(x=="F"))
res[[2]]
1 2
One useful way to get the row/column index is with which and arr.ind
i1 <- which(df=="F", arr.ind=TRUE)
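The row/column index matrix can also be turned into the column-name-to-rows mapping the question asks for. The sketch below assumes a data frame of real logical values, as in the question; `flags`, `idx` and `false_rows` are made-up names.

```r
# Hypothetical logical data frame mirroring the table in the question
flags <- data.frame(Label1 = c(TRUE, FALSE, TRUE),
                    Label2 = c(FALSE, FALSE, TRUE))

# One (row, col) pair per FALSE cell
idx <- which(!as.matrix(flags), arr.ind = TRUE)

# Group the row numbers by column name; the factor levels keep columns
# that contain no FALSE at all (they map to an empty vector)
false_rows <- split(unname(idx[, "row"]),
                    factor(names(flags)[idx[, "col"]], levels = names(flags)))
false_rows
```

For this data the result maps Label1 to row 2 and Label2 to rows 1 and 2.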

Getting a dataframe of logical values from a vector of statements

I have a number of lists of conditions and I would like to evaluate their combinations, and then I'd like to get binary values for these logical values (True = 1, False = 0). The conditions themselves may change or grow as my project progresses, and so I'd like to have one place within the script where I can alter these conditional statements, while the rest of the script stays the same.
Here is a simplified, reproducible example:
# get the data
df <- data.frame(id = c(1,2,3,4,5), x = c(11,4,8,9,12), y = c(0.5,0.9,0.11,0.6, 0.5))
# name and define the conditions
names1 <- c("above2","above5")
conditions1 <- c("df$x > 2", "df$x >5")
names2 <- c("belowpt6", "belowpt4")
conditions2 <- c("df$y < 0.6", "df$y < 0.4")
# create an object that contains the unique combinations of these conditions and their names, to be used for labeling columns later
names_combinations <- as.vector(t(outer(names1, names2, paste, sep="_")))
condition_combinations <- as.vector(t(outer(conditions1, conditions2, paste, sep=" & ")))
# create a dataframe of the logical values of these conditions
condition_combinations_logical <- ????? # This is where I need help
# lapply to get binary values from these logical vectors
df[paste0("var_", names_combinations)] <- +(condition_combinations_logical)
to get output that could look something like:
-id -- | -x -- | -y -- | -var_above2_belowpt6 -- | -var_above2_belowpt4 -- | etc.
1 | 11 | 0.5 | 1 | 0 |
2 | 4 | 0.9 | 0 | 0 |
3 | 8 | 0.11 | 1 | 1 |
etc. ....
Looks like the dreaded eval(parse()) does it (hard to think of a much easier way ...). Then use storage.mode()<- to convert from logical to integer ...
res <- sapply(condition_combinations,function(x) eval(parse(text=x)))
storage.mode(res) <- "integer"
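If the conditions can be written as R expressions rather than strings, `eval(parse())` might be avoided altogether. This is a sketch of that alternative, not the answer's method: it stores each condition as a `quote()`d expression in a named list and evaluates it with the data frame as the environment; `combos` and `flags` are made-up names.

```r
df <- data.frame(id = c(1, 2, 3, 4, 5), x = c(11, 4, 8, 9, 12),
                 y = c(0.5, 0.9, 0.11, 0.6, 0.5))

# Conditions kept in one place as named, unevaluated expressions
conditions1 <- list(above2 = quote(x > 2), above5 = quote(x > 5))
conditions2 <- list(belowpt6 = quote(y < 0.6), belowpt4 = quote(y < 0.4))

# All combinations; the second set varies fastest, as in the question
combos <- expand.grid(c2 = names(conditions2), c1 = names(conditions1),
                      stringsAsFactors = FALSE)

# Evaluate each pair of expressions inside df and store the result as 0/1
flags <- mapply(function(a, b) {
  as.integer(eval(conditions1[[a]], df) & eval(conditions2[[b]], df))
}, combos$c1, combos$c2)
colnames(flags) <- paste0("var_", combos$c1, "_", combos$c2)

df <- cbind(df, flags)
df
```

Adding or changing a condition then only touches the two lists at the top, and the rest of the script stays the same.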
