I have a csv file which looks like this:
"","people_id","commit_id"
"1",1,0
"2",1,117
"3",1,144
"4",1,278
…
Here's the csv file if you wanna look at it. It contains 11735 lines but 5923 unique people ids.
Does anyone know how to connect the people ids that share a common "commit_id", while ignoring commit_id 0 (id 0 does not exist)?
For now I have done this:
library(Matrix)  # spMatrix, tcrossprod
library(igraph)  # graph.adjacency, write.graph
# read the csv file
commitsNetwork <- read.csv("commits.csv", header=TRUE)
# use a subset for demo purpose
commitsNetwork <- commitsNetwork[c("people_id", "commit_id")]
#build edgelist(for commits)
C <- spMatrix(nrow = length(unique(commitsNetwork$people_id)),
ncol = length(unique(commitsNetwork$commit_id)),
i = as.numeric(factor(commitsNetwork$people_id)),
j = as.numeric(factor(commitsNetwork$commit_id)),
x = rep(1, length(as.numeric(commitsNetwork$people_id))) )
row.names(C) <- levels(factor(commitsNetwork$people_id))
colnames(C) <- levels(factor(commitsNetwork$commit_id))
adjC <- tcrossprod(C)
comG <- graph.adjacency(adjC, mode = "undirected", weighted = TRUE, diag = FALSE)
#write to pajek file
write.graph(comG, "comNetwork.net", format = "pajek")
Also, the edges come from the "commit_id" column: two vertices (people) should be connected if they share a common commit_id.
Therefore I'm not sure how to generate the network from this csv file in R.
The ideal output should look like this:
*Vertices 5923
1
2
3
4
...
*Edges
1 4 1
1 25 1
1 39 1
1 41 1
1 48 1
until 5923...
Maybe you want something like this:
library(igraph)
library(Matrix)
download.file("https://www.dropbox.com/s/q7sxfwjec97qzcy/people.csv?dl=1",
tf <- tempfile(fileext = ".csv"), mode = "wb")
people <- read.csv(tf)
A <- spMatrix(nrow = length(unique(people$people)),
ncol = length(unique(people$repository_id)),
i = as.numeric(factor(people$people)),
j = as.numeric(factor(people$repository_id)),
x = rep(1, length(as.numeric(people$people))) )
row.names(A) <- levels(factor(people$people))
colnames(A) <- levels(factor(people$repository_id))
adj <- tcrossprod(A)
g <- graph.adjacency(adj, mode = "undirected", weighted = TRUE, diag = FALSE)
See also here.
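Applied to the people_id / commit_id columns from the question, the same idea could look roughly like the sketch below. The toy data frame is made up for illustration, and it is an assumption on my part that people who only ever appear with commit_id 0 should still show up as isolated vertices. graph_from_adjacency_matrix is the current igraph name for graph.adjacency, and sparseMatrix is used here in place of spMatrix.

```r
library(Matrix)
library(igraph)

# Toy stand-in for commits.csv (values are made up for illustration)
commits <- data.frame(
  people_id = c(1, 1, 2, 3, 3, 4),
  commit_id = c(0, 5, 5, 7, 0, 0)
)

# Fix the vertex set first so people with only commit_id 0 stay as isolated vertices
people_levels <- sort(unique(commits$people_id))

# Drop the placeholder commit 0 and any repeated (person, commit) pairs
real <- unique(commits[commits$commit_id != 0, c("people_id", "commit_id")])

p  <- factor(real$people_id, levels = people_levels)
cm <- factor(real$commit_id)

# People x commits incidence matrix
A <- sparseMatrix(
  i = as.integer(p),
  j = as.integer(cm),
  x = 1,
  dims = c(nlevels(p), nlevels(cm)),
  dimnames = list(levels(p), levels(cm))
)

# Entry (a, b) counts the commits shared by persons a and b
adj <- tcrossprod(A)
g <- graph_from_adjacency_matrix(adj, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)
```

From there, write.graph(g, "comNetwork.net", format = "pajek") as in the question should give the Pajek file.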
Related
I know how to append values such as numbers to a csv file using R. What I want is to also put the corresponding graph into its own column of the same file. Here's the code I tried, but it failed.
Please help.
library(muhaz)
library(readr)
ball_data <- read_csv("C:/Users/dovini jayasinghe/Desktop/Cricket/ball data.csv")
attach(ball_data)
#View(ball_data)
for(i in 1:3){
fit <- muhaz(times = scoreValue[help_index2==i],
delta = Censored[help_index2==i],
bw.method="local",
kern="epanechnikov",
max.time=max(scoreValue[help_index2==i]))
graph <- plot(fit)#draws the plot
max_x <- max(fit$pin$times)
#data frame excluding the graph
df <- data.frame("Ball Number" = i,
"Maximum Score" = max_x
)
#if we need the graph, I think the code is this way
# df <- data.frame("Ball Number" = i,
# "Maximum Score" = max_x,
# "Graph" = graph)
write.table(df,
'C://Users//dovini jayasinghe//Desktop//Cricket//hazard_i_th_ball.csv',
row.names = FALSE,
col.names = FALSE,
sep = ",",
append = TRUE)
}
To be specific, I need a table (a csv file) like this:
Ball Number | Maximum Score | Graph
1           | 4             | image of the plot 1
2           | 1             | image of the plot 2
3           | 3             | image of the plot 3
I am trying to execute this command:
df2 <- as.data.frame.matrix(table(stack(setNames(strsplit(df$col1, "---", fixed = TRUE), df$id))[2:1]))
However I receive this error:
Error in table(stack(setNames(strsplit(df$col1, :
attempt to make a table with >= 2^31 elements
Any idea why this error happened? Unfortunately I can't provide a reproducible example, because I can't find what caused the error.
What this command does is create 0/1 indicator columns from the values in col1, which are separated by ---.
Example input:
data.frame(id = c(1,2), col1 = c("text---here","text---there"))
expected output
data.frame(id = c(1,2), text = c(1,1), here = c(1,0), there = c(0,1))
If the task in question is complex, it is worth splitting it into chunks. Try this:
x = data.frame(id = c(1,2), col1 = c("text---here","text---there")); x$col1 = as.vector(x$col1)
Split = strsplit(as.vector(x$col1), split = "---")
levels = unique(unlist(Split))
x = cbind(x, matrix(ncol = length(levels), nrow = nrow(x)))
for(i in 1:length(levels))
{
x[,ncol(x)-length(levels)+i] <- sapply(Split, function(x) max(x == levels[i]))
}
colnames(x) <- c("id", "col1", levels)
x
# id col1 text here there
# 1 1 text---here 1 1 0
# 2 2 text---there 1 0 1
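As for the error itself: table() allocates one cell for every id/token combination, so it fails as soon as (number of ids) x (number of distinct tokens) reaches 2^31. If the real data is that large, the dense loop above will also be heavy on memory. A sparse indicator matrix is one possible way around it; the following is only a sketch on the example input, using sparseMatrix from the Matrix package, and I have not tried it on data of that size.

```r
library(Matrix)

df <- data.frame(id = c(1, 2), col1 = c("text---here", "text---there"),
                 stringsAsFactors = FALSE)

tokens <- strsplit(df$col1, "---", fixed = TRUE)
tok <- unlist(tokens)
lev <- unique(tok)

# One (row, column) pair per distinct token occurrence
pairs <- unique(data.frame(
  i = rep(seq_along(tokens), lengths(tokens)),
  j = match(tok, lev)
))

# Only the non-zero cells are stored
ind <- sparseMatrix(i = pairs$i, j = pairs$j, x = 1,
                    dims = c(nrow(df), length(lev)),
                    dimnames = list(as.character(df$id), lev))
```

Here as.matrix(ind) gives the dense version for small inputs; for large inputs the point is to keep working with ind directly.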
I have the following code in R:
library(party)
dat = read.csv("data.csv", header = TRUE)
train <- dat[1:1000, ]
test <- dat[1000:1200, ]
output.tree <- cforest(t_class ~ var1 + var2,
data = train)
train_predict <- predict(output.tree, newdata = test, OOB=TRUE, type = "prob")
for (name in names(train_predict))
{
p <- (train_predict[[name]][1:3])
write.table(p, file = "result.csv",col.names = FALSE, append=TRUE)
}
I am trying to write the result of the random forest prediction to a csv file.
The result train_predict looks like the following:
When I run the above code, it only writes the first column of each row to the csv and not all three.
How can I write all three columns of the list to the file?
Also, is there a way in R to clear the csv before writing to it, in case there is something in it already?
Rather than write serially, you can convert to a data.frame and just write all at once:
Generate fake data that looks similar to what you posted:
fakeVec <- function(dummy) t(setNames(rnorm(3), letters[1:3]))
my_list <- lapply(0:4, fakeVec)
names(my_list) <- 6000:6004
Here's the fake data:
$`6000`
a b c
[1,] -0.2444195 -0.2189598 -1.442364
$`6001`
a b c
[1,] 0.2742636 1.068294 -0.8335477
$`6002`
a b c
[1,] -1.13298 1.927268 -2.123603
$`6003`
a b c
[1,] 0.8260184 1.003259 -0.003590849
$`6004`
a b c
[1,] -0.2025963 0.1192242 -1.121807
Then convert format:
# crush to flat matrix
my_mat <- do.call(rbind, my_list)
# add in list names as new column
my_df <- data.frame(id = names(my_list), my_mat)
Now you have a data.frame like this:
id a b c
1 6000 -0.2444195 -0.2189598 -1.442364429
2 6001 0.2742636 1.0682937 -0.833547659
3 6002 -1.1329796 1.9272681 -2.123603334
4 6003 0.8260184 1.0032591 -0.003590849
5 6004 -0.2025963 0.1192242 -1.121807439
Which you can just write straight to a file:
write.csv(my_df, "my_file.csv", row.names = FALSE)
How about this?
temp = list(x = data.frame(a = "a", b = "b", c = "c"),
y = data.frame(d = "d", e = "e", f = "f"),
z = data.frame(g = "g", h = "h", i = "i"))
for (i in 1:length(temp)) {
write.table(temp[[i]], "temp.csv",col.names = F, append = T)
}
Regarding clearing the csv: if I understood your question correctly, just drop append = T for the first write. Without it, write.table overwrites whatever is already in the file.
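If you want to keep appending inside the loop, another option is to empty the file once before the loop starts. A minimal sketch, assuming a made-up list called preds in place of train_predict and a temporary file in place of result.csv; file.create() truncates the file if it already exists:

```r
# Hypothetical stand-in for train_predict: named list of 3-element vectors
preds <- list(
  `6000` = c(a = 0.1, b = 0.2, c = 0.7),
  `6001` = c(a = 0.3, b = 0.3, c = 0.4)
)

out_file <- tempfile(fileext = ".csv")  # stand-in for "result.csv"
writeLines("stale content from an earlier run", out_file)

# Empty the file once, before any appending happens
file.create(out_file)

for (nm in names(preds)) {
  # t() turns the vector into a 1 x 3 row so all three values land on one line
  write.table(t(preds[[nm]]), file = out_file, sep = ",",
              col.names = FALSE, row.names = nm, append = TRUE)
}

written <- readLines(out_file)
```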
I have a task that calls shortest.paths repeatedly, but my input is too large and I want to know how to make it faster. As far as I know, igraph is the best choice.
First, I read a real network and 1000 random networks, with the random ones stored in a list:
library(igraph)
# real ppi network
REAL.PPI <- paste0(RANDM, "real.ppi.txt")
real.ppi <- graph.data.frame(read.table(REAL.PPI, header = F), directed = F)
ppi.gs <- V(real.ppi)$name
# random network
random_net_names <- dir(paste0(RANDM, "randomnetwork"))
random_nets <- lapply(random_net_names, function(x){
path <- paste(RANDM, "randomnetwork/", x, sep="")
rn <- read.table(path, header = F)
graph.data.frame(rn, directed = F)
})
Then I have to compare the shortest paths within each node set in the real and random networks. For this, I chose a for loop instead of an apply function because the latter would not be faster.
The input format is:
hsa-let-7a-2-3p hsa-let-7a-3p GO:0001702 4040 10818 4089
hsa-let-7a-2-3p hsa-let-7a-3p GO:0001764 27185 2625 5048 429 6695
My kernel is as follows:
# input
ovr <- strsplit(readLines("ovr.txt"), '\t')
# for-loop
OUT.final <- file("out.txt", "w")
for(i in 1 : length(ovr)){
hyper.ppi.ovr <- ovr[[i]][- c(1, 2, 3)]
# total path length in the real network, comparable to each entry of CPLs
D1 <- sum(shortest.paths(real.ppi, hyper.ppi.ovr, hyper.ppi.ovr))
CPLs <- sapply(random_nets, function(r_net){
sum(shortest.paths(r_net, hyper.ppi.ovr, hyper.ppi.ovr))
}
)
D2.p <- sum(CPLs < D1) / 1000
if(D2.p > .01)next
# output
out.value <- ovr[[i]]
cat(out.value, sep = "\t", file = OUT.final)
cat("\n", file = OUT.final)
}
close(OUT.final)
Let's say I have the following data frame:
x <- data.frame(let = sample(LETTERS, 100, replace = T),
num = sample(1:10, 100, replace = T))
I want to create several subsets of x where each new data frame is named after the levels of x$let. So far, I've come up with this simple function:
ss <- function(letra){
return(subset(x, let == letra))
}
Which is very rudimentary and doesn't do the naming as I wanted. My question is: how can I automate the following procedure?
a <- ss('A')
b <- ss('B')
c <- ss('C')
...
z <- ss('Z')
To elaborate a bit.
xs <- split(x, x$let)
Now we have a list, xs, holding each subset of the original data frame. The name of each list component matches the factor level it was selected on:
xs[['D']]
let num
8 D 8
14 D 1
16 D 9
54 D 5
60 D 6
64 D 8
74 D 8
Most people use either xlsx or XLConnect to write Excel files from R. I happen to use XLConnect, but the solutions would be very similar.
Now we can simply do this:
require(XLConnect)
file_name <- paste0("file", names(xs), ".xlsx")  # one file per level actually present
for (i in seq_along(xs)){
wb <- loadWorkbook(file_name[i],create = TRUE)
createSheet(wb,"Sheet1")
writeWorksheet(wb,data = xs[[i]],sheet = 1)
saveWorkbook(wb)
}
I've done this in a for loop so that it's easier to read and understand, but obviously this could all be shoved into an lapply or mapply type solution as well.
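For what it's worth, a Map version could look something like the sketch below. To keep it free of the XLConnect/Java dependency it writes csv files with write.csv instead of Excel workbooks, which is my substitution and not what the loop above does; the temporary output directory is also just for the sake of the example.

```r
set.seed(1)
x <- data.frame(let = sample(LETTERS, 100, replace = TRUE),
                num = sample(1:10, 100, replace = TRUE))
xs <- split(x, x$let)

# One output path per list component, named after the level it holds
out_dir <- tempdir()
paths <- file.path(out_dir, paste0("file", names(xs), ".csv"))

# Map walks over the subsets and their paths in parallel
invisible(Map(function(d, p) write.csv(d, p, row.names = FALSE), xs, paths))
```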
Agreed with Joshua that you may want to do something different, but if you're really hooked on your previous idea, you can use:
x <- data.frame(let = sample(LETTERS, 100, replace = T),
num = sample(1:10, 100, replace = T))
ss <- function(letra){
assign(letra, subset(x, let == letra), envir = .GlobalEnv)
# Returning the DF is optional:
# return(subset(x, let == letra))
}
ss('A')
print(A)
Update: taking Joran's suggestion, one can write:
x_split <- split(x,x$let)
for (let in x_split) {
write.csv(let, file = paste0((let$let)[1], ".csv"))
}