I managed to fill my doi_list, but it stops working once I wrap the code in a function. From a tutorial I've seen, I assume this should be possible, but doi_list is empty after get_doi_from_category() finishes.
library(aRxiv)
get_doi_from_category <- function(category, doi_list) {
arxiv_rec <- arxiv_search(category)
arxiv_doi_list <- arxiv_rec[13]
by(arxiv_doi_list, 1:nrow(arxiv_doi_list),
function(row) {
if(nchar(row) > 0) {
doi_list <<- c(doi_list, row)
}
})
}
doi_list <- list()
get_doi_from_category('cat:stat.ML', doi_list)
for(doi in doi_list)
{
print(doi)
}
get_doi_from_category('cat:stat.CO', doi_list)
get_doi_from_category('cat:stat.ME', doi_list)
get_doi_from_category('cat:stat.TH', doi_list)
PS: First day with R.
Here's a better way to do what you want in R:
categ <- c(CO = "cat:stat.CO", #I'm naming these elements so
ME = "cat:stat.ME", # that the corresponding elements
TH = "cat:stat.TH", # in the list are named as well.
ML = "cat:stat.ML") # Could also just set 'names(doi_list)' to 'categ'.
doi_list <-
lapply(categ, function(ctg)
(doi <- arxiv_search(ctg)$doi)[nchar(doi) > 0])
I sort of threw you in the deep end on the last line with in-line assignment of doi; a more step-by-step approach would be:
lapply(categ, function(ctg){
arxiv.df <- arxiv_search(ctg)
doi <- arxiv.df$doi
doi[nchar(doi) > 0]})
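As for why the original function left doi_list empty: inside get_doi_from_category(), doi_list is a local copy of the argument, and `<<-` from the inner function updates that local copy rather than the global variable. The usual fix is to return the result and reassign it. A minimal sketch, assuming a hypothetical fake_search() in place of arxiv_search() so that it runs without network access:

```r
# Hypothetical stand-in for arxiv_search(): a data frame with a doi column.
fake_search <- function(category) {
  data.frame(id = 1:4,
             doi = c("10.1/a", "", "10.1/b", ""),
             stringsAsFactors = FALSE)
}

# Changes only its local copy of doi_list; the caller's variable is untouched.
append_in_place <- function(category, doi_list) {
  doi <- fake_search(category)$doi
  doi_list <- c(doi_list, doi[nchar(doi) > 0])
  invisible(NULL)
}

# Returns the extended vector so the caller can reassign it.
append_and_return <- function(category, doi_list) {
  doi <- fake_search(category)$doi
  c(doi_list, doi[nchar(doi) > 0])
}

doi_list <- character(0)
append_in_place("cat:stat.ML", doi_list)
length_after_in_place <- length(doi_list)   # still 0

doi_list <- append_and_return("cat:stat.ML", doi_list)
print(doi_list)                             # the two non-empty DOIs
```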
I have to automate this sequence of functions:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
FDI_angola_i <- FDI_angola[FDI_angola$isic==i,]
dist_ao_i <- distm(WBES_angola_i,FDI_angola_i, fun = distGeo)/1000
rm(WBES_sf_angola_i,WBES_angola_i,FDI_angola_i)
}
As a result, I want a "dist_ao" for each i. The indexed values are to be found in the isic columns of the WBES_sf_angola and the FDI_angola datasets.
How can I embed the index in the various items' names?
EDIT:
I tried with following modification:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
FDI_angola_i <- FDI_angola[FDI_angola$isic==i,]
result_list <- list()
result_list[[paste0("dist_ao_", i)]] <- distm(WBES_angola_i,FDI_angola_i, fun = distGeo)/1000
rm(WBES_sf_angola_i,WBES_angola_i,FDI_angola_i)
}
The output is just a list of length 1 that contains dist_ao_62. How do I avoid overwriting the earlier results?
Untested (due to missing MRE) but should work:
result_list <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
  result_list[[paste0("dist_ao_", i)]] <- distm(
    as_Spatial(subset(WBES_sf_angola, isic == i)),
    FDI_angola[FDI_angola$isic == i, ],
    fun = distGeo
  ) / 1000
}
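The essential change is that result_list is created once, before the loop, so each iteration adds a named element instead of starting from an empty list. A self-contained sketch with toy data: the data frames wbes and fdi and the helper pair_dist() are hypothetical stand-ins for the spatial datasets and distm(), used only so the example runs without extra packages:

```r
# Toy stand-ins for WBES_sf_angola and FDI_angola (hypothetical data).
wbes <- data.frame(isic = c(15, 15, 17, 20), x = c(1, 2, 3, 4))
fdi  <- data.frame(isic = c(15, 17, 17, 20), x = c(10, 20, 30, 40))

# Stand-in for distm(): absolute difference between every pair of x values.
pair_dist <- function(a, b) abs(outer(a$x, b$x, "-"))

result_list <- list()   # created once, outside the loop
for (i in c(15, 17, 20)) {
  result_list[[paste0("dist_ao_", i)]] <-
    pair_dist(wbes[wbes$isic == i, ], fdi[fdi$isic == i, ])
}

print(names(result_list))
print(dim(result_list[["dist_ao_17"]]))   # one wbes row against two fdi rows
```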
You could approach it this way. All resulting data frames are stored in lists, and the last line of the code shows how to combine one of those lists into a single data frame. NOTE: since the question is not reproducible, I have mostly reused the code from inside your loop.
WBES_sf_angola_result <- list() # renamed, as WBES_sf_angola is the name of your dataset
WBES_angola_result <- list()
FDI_angola_result <- list() # renamed as well, so the FDI_angola dataset is not overwritten
dist_ao <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
  key <- paste0("i_", i) # list name for this isic code
  WBES_sf_angola_result[[key]] <- subset(WBES_sf_angola, isic == i)
  WBES_angola_result[[key]] <- as_Spatial(WBES_sf_angola_result[[key]])
  FDI_angola_result[[key]] <- FDI_angola[FDI_angola$isic == i, ]
  dist_ao[[key]] <- distm(WBES_angola_result[[key]], FDI_angola_result[[key]], fun = distGeo) / 1000
}
WBES_sf_angola_result <- do.call(rbind, WBES_sf_angola_result) # to get a dataframe
The per-code results can also be accessed by name from their list, e.g.
dist_ao[["i_15"]] # for the first item.
I just read that vectorization improves performance and significantly lowers computation time, and that in the case of if() else, the best choice is ifelse().
My problem is that I have some if statements inside a for loop, and each if statement contains multiple assignments, like the following:
x <- matrix(1:10,10,3)
criteria <- matrix(c(1,1,1,0,1,0,0,1,0,0,
1,1,1,1,1,0,0,1,1,0,
1,1,1,1,1,1,1,1,1,1),10,3) #criteria for the ifs
output1 <- rep(list(NA),10) #storage list for output
for (i in 1:10) {
if (criteria[i,1]>=1) {
output1[[i]] <- colMeans(x)
output1[[i]] <- output1[[i]][1] #part of the somefunction output
} else {
if(criteria[i,2]>=1) {
output1[[i]] <- colSums(x)
output1[[i]] <- output1[[i]][1] #the same
} else {
output1[[i]] <- colSums(x+1)
output1[[i]] <- output1[[i]][1] #the same
}}}
How can I translate this into ifelse?
Thanks in advance!
Note that you don't need a for loop as all operations used are vectorized:
output2 <- ifelse(criteria[, 1] >= 1,
colMeans(x)[1],
ifelse(criteria[, 2] >= 1,
colSums(x)[1],
colSums(x+1)[1]))
identical(output1, as.list(output2))
## [1] TRUE
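The nested call works without a loop because ifelse() evaluates the test for every element at once and recycles length-one yes and no values across the whole test vector. A small sketch of that recycling, using made-up values:

```r
test <- c(TRUE, FALSE, TRUE)

# yes has length one and is recycled; no is taken element by element.
result <- ifelse(test, 100, c(1, 2, 3))
print(result)
## [1] 100   2 100
```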
At least you can convert two assignments into one. So instead of
output[[i]] <- somefunction(arg1,arg2,...)
output[[i]] <- output[[i]]$thing #part of the somefunction output
you can refer directly to the only part you are interested in.
output[[i]] <- somefunction(arg1,arg2,...)$thing #part of the somefunction output
Hope that it helps!
It seems I found the answer trying to build the example:
output2 <- rep(list(NA),10) #storage list for output
for (i in 1:10) {
output2[[i]] <- ifelse(criteria[i,1]>=1,
yes=colMeans(x)[1],
no=ifelse(criteria[i,2]>=1,
yes=colSums(x)[1],
no=colSums(x+1)[1]))}
I am having an issue with my code. I have a for loop that identifies all "strong" tags in an HTML document and then identifies the row number of a given word in the HTML. For any instances where the row numbers match, I want it to make a note of that row number. It works so far, but it fails if the word also occurs in a row that does not contain the strong tag.
url <- readLines("http://afip.gob.ar/contacto")
tagname=NULL
identifier=NULL
IDintag=NULL
rowst=NULL
rowend=NULL
data=NULL
tag <- as.matrix(grep("<strong>",url))
if(length(tag) > 0)
{ID <- grep("Telef|Numero",url)
for(i in 1:length(ID))
{IDintag[i] <- grep(ID[i],tag)
}
for(i in 1:length(IDintag))
{tagname[i] <- tag[IDintag[i]]
}
for(i in 1:length(tagname))
{rowst[i] <- which(grepl(tagname[i],tag))
rowend[i] <- tag[rowst[i] + 1,]-1
data[i] <- toString(url[tagname[i]:rowend[i]])
}
}
This works like a dream, but if I change the URL to one where the ID terms occur in rows without the tag, it fails. For example:
url <- readLines("http://www2.le.ac.uk/contact")
tagname=NULL
identifier=NULL
IDintag=NULL
rowst=NULL
rowend=NULL
data=NULL
tag <- as.matrix(grep("<h2>",url))
if(length(tag) > 0)
{ID <- grep("Telef|Numero|phone",url)
for(i in 1:length(ID))
{IDintag[i] <- grep(ID[i],tag)
}
for(i in 1:length(IDintag))
{tagname[i] <- tag[IDintag[i]]
}
for(i in 1:length(tagname))
{rowst[i] <- which(grepl(tagname[i],tag))
rowend[i] <- tag[rowst[i] + 1,]-1
data[i] <- toString(url[tagname[i]:rowend[i]])
}
}
Thanks in advance
I have a vector appended within a list. The entry successively grows.
li <- list()
for(i in 1:10)
{
v <- runif(2)
if(i==1)
{
li[[1]] <- v
} else {
li[[1]] <- append(li[[1]],v)
}
}
It's ugly that I need different code for the two cases 1) li[[1]] does not exist and 2) li[[1]] exists. Any solutions?
Background:
You can't initialize a list element the way you would initialize a vector:
v <- NULL
v <- append(v,c(1,2,3))
works
but
li <- list()
li[[1]] <- NULL
li[[1]] <- append(li[[1]],c(1,2,3))
throws an error, since li[[1]] can't be initialized by li[[1]] <- NULL .
Update: I learned that this will work with named lists (which also adds some grace), but there may be (dynamical) cases where naming is not a good option.
I am not sure that you need a loop here. But you can pre-allocate your list to avoid dealing with empty lists. You allocate using vector like this:
n <- 10
li <- vector('list',n)
Then you just assign each element :
for(i in 1:10) {
v <- runif(sample(n,1)) ## I chose a dynamic length here,
## otherwise the example doesn't make sense
li[[i]] <- v
}
The proper way to initialize list elements is this:
li <- list(NULL)
for(i in 1:10)
{
v <- runif(2)
li[[1]] <- append(li[[1]],v)
}
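The difference between the two initializations can be checked directly. A short sketch: assigning NULL to an element of an empty list leaves the list empty, while list(NULL) creates a list whose first element exists and is NULL, so append() can extend it:

```r
li_empty <- list()
li_empty[[1]] <- NULL        # assigning NULL removes elements; nothing is created
print(length(li_empty))      # 0, so li_empty[[1]] would be out of bounds

li <- list(NULL)
print(length(li))            # 1, the first element exists and is NULL
li[[1]] <- append(li[[1]], c(1, 2, 3))
print(li[[1]])
```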
Thanks
I have the following code (a nested for loop) in R, which is extremely slow. The loop matches values from two columns, then picks up a corresponding file and iterates through it to find a match, and finally extracts the matching row from the file. The number of iterations can exceed 100,000. Can someone provide insight into how to speed up the process?
for(i in 1: length(Jaspar_ids_in_Network)) {
m <- Jaspar_ids_in_Network[i]
gene_ids <- as.character(GeneTFS$GeneIds[i])
gene_names <- as.character(GeneTFS$Genes[i])
print("i")
print(i)
for(j in 1: length(Jaspar_ids_in_Exp)) {
l <- Jaspar_ids_in_Exp[j]
print("j")
print(j)
if (m == l) {
check <- as.matrix(read.csv(file=paste0(dirpath,listoffiles[j]),sep=",",header=FALSE))
data_check <- data.frame(check)
for(k in 1: nrow(data_check)) {
gene_ids_JF <- as.character(data_check[k,3])
genenames_JF <- as.character(data_check[k,4])
if(gene_ids_JF == gene_ids) {
GeneTFS$Source[i] <- as.character(data_check[k,3])
data1 <- rbind(data1, cbind(as.character(data_check[k,3]),
as.character(data_check[k,8]),
as.character(data_check[k,9]),
as.character(data_check[k,6]),
as.character(data_check[k,7]),
as.character(data_check[k,5])))
} else if (toupper(genenames_JF) == toupper(gene_names)) {
GeneTFS$Source[i] <- as.character(data_check[k,4])
data1 <- rbind(data1, cbind(as.character(data_check[k,4]),
as.character(data_check[k,5]),
as.character(data_check[k,6]),
as.character(data_check[k,7]),
as.character(data_check[k,8]),
as.character(data_check[k,2])))
} else {
# GeneTFS[i,4] <- "No Evidence"
}
}
} else {
# GeneTFS[i,4] <- "Record Not Found"
}
}
}
If you pull out the logic for processing one pair into a function, f(m,l), then you could replace the double loop with:
outer(Jaspar_ids_in_Network, Jaspar_ids_in_Exp, Vectorize(f))
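A toy sketch of that pattern, assuming made-up ID vectors and a hypothetical f() that only tests equality (the real f would also read the file and extract rows). Vectorize() is needed because outer() passes whole vectors to its function and expects one result per pair:

```r
# Made-up IDs standing in for Jaspar_ids_in_Network and Jaspar_ids_in_Exp.
network_ids <- c("MA0001", "MA0002", "MA0003")
exp_ids     <- c("MA0002", "MA0003", "MA0004")

# Hypothetical pairwise function: TRUE when the two IDs match.
f <- function(m, l) m == l

# Rows correspond to network_ids, columns to exp_ids.
match_matrix <- outer(network_ids, exp_ids, Vectorize(f))

# Row/column positions of the matching pairs.
print(which(match_matrix, arr.ind = TRUE))
```

Note that outer() with Vectorize() still calls f once per pair, so this mainly tidies the code; any speed gain would have to come from doing less work inside f.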