R - Function that conditionally calls other functions

I am working with the igraph package and trying to build a function that counts the intra-community edges produced by different community detection algorithms. I tried to put everything inside the function, including the calls to the community detection functions, like this:
library("igraph")
intra.edges<-function(G,algorithm) {
if(algorithm==1){
Mod<-cluster_louvain(G)}
if(algorithm==2){
Mod<-cluster_edge_betweenness(G)}
if(algorithm==3){
Mod<-cluster_walktrap(G)}
Com<-as.data.frame(sizes(Mod))
NoCom<-as.vector(Com$Community.sizes)
vert<-NULL
for(i in 1:length(NoCom)){
M<-which(membership(Mod)==i)
sg<-induced.subgraph(G,M)
c.ec<-ecount(sg)
vert<-c.ec
}
intra<-data.frame(Com,vert)
print(intra)
}
When I try the function, it doesn't work correctly. For example, when I run:
G <- graph.famous("Zachary")
intra.edges(G,1)
I get:
Community.sizes Freq vert
1 9 6
2 7 6
3 9 6
4 4 6
5 5 6
And when I run intra.edges(G,2) or intra.edges(G,3) I get the same output.
Also, the value 6 is not correct for every community; it belongs to only one of them.

You can either add your calculated value of vert to the dataframe with each iteration of your for loop by changing your code to:
intra<-Com
for(i in 1:length(NoCom)){
M<-which(membership(Mod)==i)
sg<-induced.subgraph(G,M)
intra$vert[i]<-ecount(sg)
}
print(intra)
Or, as @dash2 suggested, create a vector called vert and add values sequentially like this:
vert<-NULL
for(i in 1:length(NoCom)){
M<-which(membership(Mod)==i)
sg<-induced.subgraph(G,M)
c.ec<-ecount(sg)
vert[i]<-c.ec
}
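The loop can also be avoided altogether. As a rough sketch in base R, assume the graph is available as a two-column matrix of vertex indices and the communities as an integer membership vector (with igraph these would presumably come from as_edgelist(G, names = FALSE) and membership(Mod)). An edge is then intra-community when both endpoints share a label. The toy edges and labels below are made up for illustration:

```r
# Hypothetical toy graph: 6 vertices, 7 undirected edges (one row per edge)
edges <- matrix(c(1, 2,
                  1, 3,
                  2, 3,
                  3, 4,
                  4, 5,
                  4, 6,
                  5, 6), ncol = 2, byrow = TRUE)

# Hypothetical community label for each vertex (vertices 1-3 and 4-6)
memb <- c(1, 1, 1, 2, 2, 2)

# An edge is intra-community when both endpoints carry the same label
from_com <- memb[edges[, 1]]
to_com   <- memb[edges[, 2]]
is_intra <- from_com == to_com

# Count intra-community edges per community, keeping empty communities as 0
intra_counts <- table(factor(from_com[is_intra], levels = sort(unique(memb))))

intra <- data.frame(community = as.integer(names(intra_counts)),
                    size      = as.vector(table(memb)),
                    intra     = as.vector(intra_counts))
print(intra)
```

Because the counts are computed for all communities at once, there is no per-iteration variable to overwrite by accident.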

Related

Recursively plot multiple tables and outputs to images in R

Hi I have multiple tables, let say 20 tables, to be plotted and outputted to jpeg images as below.
table1:
list percentage
1 20
2 50
3 30
...
table2:
list percentage
1 40
2 10
3 20
...
I wish to use a for loop to plot each table in turn and output it to a jpeg image. I tried the code below:
for (i in 1:20) {
jpeg(paste0("plot_",i,".jpg"))
plot(paste0("table",i, "$percentage"))
dev.off()
}
It raised an error. How can I get this to work? Thanks in advance.
Passing a string to plot is probably what is causing you problems. Use eval:
for (i in 1:20) {
jpeg(paste0("plot_",i,".jpg"))
plot(eval(parse(text=paste0("table",i, "$percentage"))))
dev.off()
}
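An alternative that avoids parsing code from strings is to look the tables up by name. This is only a minimal sketch: it assumes the tables are data frames called table1, table2, ... in the workspace (two made-up ones are created here) and writes the images to a temporary directory:

```r
# Made-up stand-ins for the real tables
table1 <- data.frame(list = 1:3, percentage = c(20, 50, 30))
table2 <- data.frame(list = 1:3, percentage = c(40, 10, 20))

# Fetch the data frames by name into one named list
tables <- mget(paste0("table", 1:2))

# Write one jpeg per table into a temporary directory
out_dir <- tempdir()
for (i in seq_along(tables)) {
  jpeg(file.path(out_dir, paste0("plot_", i, ".jpg")))
  plot(tables[[i]]$percentage)
  dev.off()
}
```

If the tables can be read straight into a list in the first place, the mget() step is not needed at all.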

R Refer to (part of) data frame using string in R

I have a large data set in which I have to search for specific codes depending on what I want. For example, chemotherapy is coded by ~40 codes that can appear in any of 40 columns (called diag1, diag2, etc.).
I am in the process of writing a function that produces plots depending on what I want to show. I thought it would be good to specify what I want to plot in a input data frame. Thus, for example, in case I only want to plot chemotherapy events for patients, I would have a data frame like this:
Dataframe name: Style
Name SearchIn codes PlotAs PlotColour
Chemo data[substr(names(data),1,4)=="diag"] 1,2,3,4,5,6 | red
I already have a function that searches for codes in specific parts of the data frame and flags the events of interest. What I cannot do, and need your help with, is referring to part of a data frame via the code stored as text in Style$SearchIn[1], as above.
> Style$SearchIn[1]
[1] data[substr(names(data),1,4)=="diag"]
Levels: data[substr(names(data),1,4)=="diag"]
I thought perhaps get() would work, but I can't get it to work:
> get(Style$SearchIn[1])
Error in get(vars$SearchIn[1]) : invalid first argument
or
> get(as.character(Style$SearchIn[1]))
Error in get(as.character(Style$SearchIn[1])) :
object 'data[substr(names(data),1,5)=="TDIAG"]' not found
Obviously, running data[substr(names(data),1,5)=="TDIAG"] works.
Example:
library(survival)
ex <- data.frame(SearchIn="lung[substr(names(lung),1,2) == 'ph']")
lung[substr(names(lung),1,2) == 'ph'] #works
get(ex$SearchIn[1]) # does not work
It is not a good idea to store R code in strings and then try to eval them when needed; there are nearly always better solutions for dynamic logic, such as lambdas.
I would recommend using a list to store the plot specification, rather than a data.frame. This would allow you to include a function as one of the list's components which could take the input data and return a subset of it for plotting.
For example:
library(survival);
plotFromSpec <- function(data,spec) {
filteredData <- spec$filter(data);
## ... draw a plot from filteredData and other stuff in spec ...
};
spec <- list(
Name='Chemo',
filter=function(data) data[,substr(names(data),1,2)=='ph'],
Codes=c(1,2,3,4,5,6),
PlotAs='|',
PlotColour='red'
);
plotFromSpec(lung,spec);
If you want to store multiple specifications, you could create a list of lists.
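A list of lists along those lines might look like the sketch below. The data frame dat, the second specification and the flagging step are made up for illustration; only the overall shape (one list per specification, each carrying its own filter function) follows the suggestion above:

```r
# Made-up data standing in for the real data set
dat <- data.frame(diag1 = c(1, 7, 3), diag2 = c(9, 8, 8), other = c(5, 0, 5))

# Each specification bundles a column filter with its plotting options
specs <- list(
  list(Name = "Chemo",
       filter = function(data) data[, substr(names(data), 1, 4) == "diag", drop = FALSE],
       Codes = 1:6, PlotAs = "|", PlotColour = "red"),
  list(Name = "Other",
       filter = function(data) data[, "other", drop = FALSE],
       Codes = 5, PlotAs = "o", PlotColour = "blue")
)

# For each specification, flag rows where any selected column holds one of its codes
flags <- lapply(specs, function(spec) {
  selected <- spec$filter(dat)
  apply(selected, 1, function(row) any(row %in% spec$Codes))
})
names(flags) <- vapply(specs, function(spec) spec$Name, character(1))
print(flags)
```

Adding another event type then only means appending one more list to specs.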
Have you tried using quote()?
I'm not entirely sure what you want but maybe you could store the things you're trying to get() like
quote(data[substr(names(data),1,4)=="diag"])
and then use eval()
eval(quote(data[substr(names(data),1,4)=="diag"]), list(data=data))
For example,
dat <- data.frame("diag1"=1:10, "diag2"=1:10, "other"=1:10)
Style <- list(SearchIn=c(quote(data[substr(names(data),1,4)=="diag"]), quote("Other stuff")))
> head(eval(Style$SearchIn[[1]], list(data=dat)))
diag1 diag2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
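As for why get() failed in the question: get() looks up an object by its name only, so a string that contains a whole expression is treated as one (non-existent) object name. A small sketch with a made-up data frame dat shows the difference; parsing and evaluating the string does work, although storing functions or quoted expressions as shown above is usually the cleaner route:

```r
# Made-up data frame and an expression stored as text
dat <- data.frame(diag1 = 1:3, diag2 = 4:6, other = 7:9)
code <- "dat[substr(names(dat), 1, 4) == 'diag']"

# get() resolves a plain object name and nothing more
same_object <- identical(get("dat"), dat)

# A string holding an expression has to be parsed and then evaluated
subset_cols <- eval(parse(text = code))
print(names(subset_cols))
```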

Poisson Table in R

I am trying to generate a Poisson Table in R for two events, one with mean 1.5 (lambda1) and the other with mean 1.25 (lambda2). I would like to generate the probabilities in both cases for x=0 to x=7+ (7 or more). This is probably quite simple but I can't seem to figure out how to do it! I've managed to create a data frame for the table but I don't really know how to input the parameters as I've never written a function before:
name <- c("0","1","2","3","4","5","6","7+")
zero <- mat.or.vec(8,1)
C <- data.frame(row.names=name,
"0"=zero,
"1"=zero,
"2"=zero,
"3"=zero,
"4"=zero,
"5"=zero,
"6"=zero,
"7+"=zero)
I am guessing I will need some "For" loops and will involve dpois(x,lambda1) at some point. Can somebody help please?
I'm assuming these events are independent. Here's one way to generate a table of the joint PMF.
First, here are the names you've defined, along with the lambdas:
name <- c("0","1","2","3","4","5","6","7+")
lambda1 <- 1.5
lambda2 <- 1.25
We can get the marginal probabilities for 0-6 by using dpois, and the marginal probability for 7+ using ppois with lower.tail=FALSE. Since ppois(q, lambda, lower.tail=FALSE) returns P(X > q), the 7+ bucket needs q = 6:
p.x <- c(dpois(0:6, lambda1), ppois(6, lambda1, lower.tail=FALSE))
p.y <- c(dpois(0:6, lambda2), ppois(6, lambda2, lower.tail=FALSE))
An even better way might be to create a function that does this given any lambda.
Then you just take the outer product (really, the same thing you would do by hand, outside of R) and set the names:
p.xy <- outer(p.x, p.y)
rownames(p.xy) <- colnames(p.xy) <- name
Now you're done. (The printout below came from a run with q = 7, so its 7+ row and column hold P(X > 7) rather than P(X >= 7); all other cells are unaffected.)
0 1 2 3 4 5
0 6.392786e-02 7.990983e-02 4.994364e-02 2.080985e-02 6.503078e-03 1.625770e-03
1 9.589179e-02 1.198647e-01 7.491546e-02 3.121478e-02 9.754617e-03 2.438654e-03
2 7.191884e-02 8.989855e-02 5.618660e-02 2.341108e-02 7.315963e-03 1.828991e-03
3 3.595942e-02 4.494928e-02 2.809330e-02 1.170554e-02 3.657982e-03 9.144954e-04
4 1.348478e-02 1.685598e-02 1.053499e-02 4.389578e-03 1.371743e-03 3.429358e-04
5 4.045435e-03 5.056794e-03 3.160496e-03 1.316873e-03 4.115229e-04 1.028807e-04
6 1.011359e-03 1.264198e-03 7.901240e-04 3.292183e-04 1.028807e-04 2.572018e-05
7+ 4.858139e-05 6.072674e-05 3.795421e-05 1.581426e-05 4.941955e-06 1.235489e-06
6 7+
0 3.387020e-04 1.094781e-05
1 5.080530e-04 1.642171e-05
2 3.810397e-04 1.231628e-05
3 1.905199e-04 6.158140e-06
4 7.144495e-05 2.309303e-06
5 2.143349e-05 6.927908e-07
6 5.358371e-06 1.731977e-07
7+ 2.573935e-07 8.319685e-09
You could have also used a loop, as you originally suspected, but that's a more roundabout way to the same solution.
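The function mentioned above could be sketched as follows. The name pois_buckets and its max_exact argument are made up for illustration. Note that ppois(q, lambda, lower.tail = FALSE) returns P(X > q), so a "7 or more" bucket corresponds to q = 6; with that choice each marginal, and hence the joint table, sums to 1:

```r
# Probabilities for 0..max_exact plus one lumped "more than max_exact" bucket
pois_buckets <- function(lambda, max_exact = 6) {
  c(dpois(0:max_exact, lambda),
    ppois(max_exact, lambda, lower.tail = FALSE))  # P(X > max_exact)
}

name <- c("0", "1", "2", "3", "4", "5", "6", "7+")
p.x <- pois_buckets(1.5)
p.y <- pois_buckets(1.25)

# Joint PMF under independence: outer product of the two marginals
p.xy <- outer(p.x, p.y)
dimnames(p.xy) <- list(name, name)

# Sanity check: both sums should print 1
print(sum(p.x))
print(sum(p.xy))
```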

%Rpush >> lists of complex objects (e.g. pandas DataFrames in IPython Notebook)

Once again, I am having a great time with Notebook and the emerging rmagic infrastructure, but I have another question about the bridge between the two. Currently I am attempting to pass several subsets of a pandas DataFrame to R for visualization with ggplot2. Just to be clear upfront, I know that I could pass the entire DataFrame and perform additional subsetting in R. My preference, however, is to leverage the data management capability of Python and the subset-wise operations I am performing are just easier and faster using pandas than the equivalent operations in R. So for the sake of efficiency and morbid curiosity...
I have been trying to figure out if there is a way to push several objects at once. The wrinkle is that sometimes I don't know in advance how many items will need to be pushed. To retain flexibility, I have been populating dictionaries with DataFrames throughout the front end of the script. The following code provides a reasonable facsimile of what I am working through (I have not converted via com.convert_to_r_dataframe for simplicity, but my real code does take this step):
import numpy as np
import pandas as pd
from pandas import DataFrame
%load_ext rmagic
d1=DataFrame(np.arange(16).reshape(4,4))
d2=DataFrame(np.arange(20).reshape(5,4))
d_list=[d1,d2]
names=['n1','n2']
d_dict=dict(zip(names,d_list))
for name in d_dict.keys():
    exec '%s=d_dict[name]' % name
%Rpush n1
As can be seen, I can assign a static name and push the DataFrame into the R namespace individually (as well as in a 'list' >> %Rpush n1 n2). What I cannot do is something like the following:
for name in d_dict.keys():
    %Rpush d_dict[name]
That snippet raises an exception >> KeyError: u'd_dict[name]'. I also tried to deposit the dynamically named DataFrames in a list, but the list entries end up pointing to the data rather than to the named objects:
df_list=[]
for name in d_dict.keys():
    exec '%s=d_dict[name]' % name
    exec 'df_list.append(%s)' % name
print df_list
for df in df_list:
    %Rpush df
[ 0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15,
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19]
%Rpush did not throw an exception when I looped through the list's contents, but the DataFrames could not be found in the R namespace. I have not been able to find much discussion of this topic beyond talk about the conversion of lists to R vectors. Any help would be greatly appreciated!
Rmagic's push uses the name that you give it both to look up the Python variable, and to name the R variable it creates. So it needs a valid name, not just any expression, on both sides.
There's a trick you can do to get the name from a Python variable:
d1=DataFrame(np.arange(16).reshape(4,4))
name = 'd1'
%Rpush {name}
# equivalent to %Rpush d1
But if you want to do more advanced things, it's best to get hold of the r object and use that to put your objects in. Rmagic is just a convenience wrapper over rpy2, which is a full API. So you can do:
from rpy2.robjects import r
r.assign('a', 1)
You can mix and match which interface you use - rmagic and rpy2 are talking to the same instance of R.

R stack alternative

I am trying to write code that takes values from one column of each of many files and prints out a list of the values of a different column depending on the values found in the first. If that makes sense. I have read the files in, but I am having trouble managing the table. I would like to limit the table to just those two columns, because the files are very large, cumbersome and unnecessary. In my attempt to do so I had this line:
tmp<-stack(lapply(inputFiles,function(x) x[,3]))
But ideally I would like to include two columns (3 and 1), not just one, so that I may use a line, such as these ones:
search<-tmp[tmp$values < 100, "Target"]
write(search, file = "Five", ncolumns = 2)
But I am not sure how. I am almost certain that stack is not going to work for more than one column. I tried some different things, similar to this:
tmp<-stack(lapply(inputFiles,function(x) x[,3], x[,1]))
But of course that didn't work.
But I don't know where to look. Does anyone have any suggestions?
The taRifx package has a list method for stack that will do what you want. It stacks lists of data.frames.
Untested code:
library(taRifx)
tmp<-stack(lapply(inputFiles,function(x) x[,c(1,3)]))
But you didn't change anything! Why does this work?
lapply() returns a list. In your case, it returns a list where each element is a data.frame.
Base R does not have a special method for stacking lists. So when you call stack() on your list of data.frames, it calls stack.default, which doesn't work.
Loading the taRifx library loads a method of stack that deals specifically with lists of data.frames. So everything works fine since stack() now knows how to properly handle a list of data.frames.
Tested example:
dat <- replicate(10, data.frame(x=runif(2),y=rnorm(2)), simplify=FALSE)
str(dat)
stack(dat)
x y
1 0.42692948 0.32023455
2 0.75388820 0.24154125
3 0.64035957 1.96580059
4 0.47690790 -1.89772855
5 0.41668993 0.78083412
6 0.12643784 0.38029833
7 0.01656855 0.51225268
8 0.40653094 1.09408159
9 0.94236491 -0.13410923
10 0.05578115 1.12475364
11 0.75651062 -0.65441493
12 0.48210444 1.67325343
13 0.95348755 0.04828449
14 0.02315498 -0.28481193
15 0.27370762 0.43927826
16 0.83045889 0.75880763
17 0.40049367 0.06945058
18 0.86212662 1.49918712
19 0.97611629 0.13959291
20 0.29107186 0.64483646
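If adding a package is not an option, a similar result can be had in base R by binding the selected columns row-wise. This is a minimal sketch with made-up data; it assumes inputFiles is a list of data frames whose first column is called Target and whose third is called values, as the question's later lines suggest:

```r
# Made-up stand-ins for the files that were read in; only columns 1 and 3 matter
set.seed(1)
inputFiles <- replicate(3,
  data.frame(Target = sample(letters, 4), skip = 0, values = round(runif(4, 0, 200))),
  simplify = FALSE)

# Keep columns 1 and 3 of every data frame, then bind them row-wise
tmp <- do.call(rbind, lapply(inputFiles, function(x) x[, c(1, 3)]))

# The question's follow-up lookup now works on the combined table
search <- tmp[tmp$values < 100, "Target"]
print(search)
```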
