Arguments imply differing number of rows for an iteration loop - r

Problem: saving the result of every loop iteration into one data frame
library(rscopus)
library(dplyr)
auth_token_header("d2f02ad55dcfc907212f0e6b216bf847")
akey="d2f02ad55dcfc907212f0e6b216bf847"
set_api_key(akey)
df = data.frame(doi = c("10.1109/TPAMI.2018.2798607", "10.1109/CNS.2017.8228696"))
df_references <- NULL
for (i in 1:nrow(df)) {
  x = abstract_retrieval(df$doi[i], identifier = "doi")
  refs <- x$content$`abstracts-retrieval-response`$item$bibrecord$tail$bibliography$reference
  for (a in 1:length(refs)) {
    ref <- refs[[a]]$`ref-info`$`ref-title`
    df_references <- rbind(df_references, data.frame(initial_paper = df$doi[i],
                                                     ref_title = ref))
  }
}
I expect the results of every iteration to be saved into a single data frame.
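A minimal sketch of a more defensive loop body, assuming (as the code above suggests) that each reference is a nested list whose `ref-info` element may or may not contain a `ref-title`. The `references` list below is hypothetical mock data standing in for the rscopus response, and get_title() is a hypothetical helper. One likely cause of "arguments imply differing number of rows" is a reference whose title is missing or empty, so the helper turns that case into NA; the rows are then bound once with do.call(rbind, ...) instead of growing df_references inside the loop.

```r
# Hypothetical stand-in for the parsed Scopus reference list: some
# references carry a `ref-title`, one does not.
references <- list(
  list(`ref-info` = list(`ref-title` = list(`ref-titletext` = "Paper A"))),
  list(`ref-info` = list()),  # title missing
  list(`ref-info` = list(`ref-title` = list(`ref-titletext` = "Paper B")))
)

# Return a single title string, or NA when the field is absent or empty,
# so every row gets exactly one ref_title value.
get_title <- function(ref) {
  title <- unlist(ref$`ref-info`$`ref-title`)
  if (length(title) == 0) NA_character_ else title[[1]]
}

# Build one small data frame per reference, then bind them all at once.
rows <- lapply(references, function(ref) {
  data.frame(initial_paper = "10.1109/TPAMI.2018.2798607",
             ref_title = get_title(ref),
             stringsAsFactors = FALSE)
})
df_references <- do.call(rbind, rows)
print(df_references)
```

For several papers, the same lapply() can be wrapped in an outer lapply() over df$doi and the per-paper results combined with another do.call(rbind, ...).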

Related

r - writing a function that includes for i loop

I scanned similar questions previously answered but couldn't find the thread that is specific to my problem.
I have a number of datasets that all have five flagging columns (binary) at the end.
The aim is to produce an output that summarises the specified column in each dataset by each flag.
Hence, each output is a list of five summary tables.
library(tidyverse)
library(janitor)
## mydataset1
mydataset1 <- tibble(id = 1:100,
column_000 = sample(1:16, 100, replace = TRUE),
flag1 = sample(0:1, 100, replace = TRUE),
flag2 = sample(0:1, 100, replace = TRUE),
flag3 = sample(0:1, 100, replace = TRUE),
flag4 = sample(0:1, 100, replace = TRUE),
flag5 = sample(0:1, 100, replace = TRUE))
## summary table function
get_table <- function(data, column) {
data %>%
# select the flag
filter(data[[i]] == 1) %>%
# summary table
tabyl(column) %>%
arrange(desc(n)) %>%
top_n(5, n)
}
## list of tables function
output_list <- function(data, column) {
# empty list
output <- list()
# for loop - go through each flagging column
for (i in (length(data)-4):length(data)) {
output[[i]] <- get_table(data, column)
}
# for some reason, there are NULL list items for all other columns
output <- compact(output)
# rename and print
names(output) <- names(data)[(length(data)-4):length(data)]
print(output)
}
### execute
output_list(mydataset1, "column_000")
# error
### manually executing the function works fine
# empty list
output <- list()
# for loop - go through each flagging column
for (i in (length(mydataset1)-4):length(mydataset1)) {
output[[i]] <- get_table(mydataset1, "column_000")
}
# for some reason, there are NULL list items for all other columns
output <- compact(output)
# rename and print
names(output) <- names(mydataset1)[(length(mydataset1)-4):length(mydataset1)]
print(output)
This is what I have for now.
If I execute the contents of output_list function manually, it works fine.
However, if I execute it as a function, it gives me an error that object i is not found.
Where did I get it wrong? Please help!
Pass i as an input to the get_table function.
library(tidyverse)
library(janitor)
get_table <- function(data, column, i) {
data %>%
# select the flag
filter(data[[i]] == 1) %>%
# summary table
tabyl(column) %>%
arrange(desc(n)) %>%
top_n(5, n)
}
Make the corresponding changes in the output_list function.
output_list <- function(data, column) {
# empty list
output <- list()
# for loop - go through each flagging column
for (i in (length(data)-4):length(data)) {
output[[i]] <- get_table(data, column, i)
}
# for some reason, there are NULL list items for all other columns
output <- compact(output)
# rename and print
names(output) <- names(data)[(length(data)-4):length(data)]
print(output)
}
Run the function -
output_list(mydataset1, "column_000")
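The NULL entries mentioned in the comments ("for some reason, there are NULL list items") appear because output[[i]] is assigned starting at i = 3, so positions 1 and 2 are filled with NULL. A minimal sketch that avoids both the NULLs and the hidden dependency on i, by looping over the flag column names instead of their positions. To keep it dependency-free it swaps tabyl()/arrange()/top_n() for base R table() and order(); this is a substitute technique, not the janitor API, and unlike top_n() it does not keep ties at the cut-off.

```r
# Count the values of `column` among rows where `flag` == 1 and keep
# the 5 most frequent (base R stand-in for tabyl() + top_n()).
get_table <- function(data, column, flag) {
  kept <- data[[column]][data[[flag]] == 1]
  counts <- table(kept)
  out <- data.frame(value = names(counts), n = as.vector(counts),
                    stringsAsFactors = FALSE)
  out <- head(out[order(out$n, decreasing = TRUE), ], 5)
  names(out)[1] <- column
  rownames(out) <- NULL
  out
}

# Loop over flag *names*: the result has exactly one element per flag,
# already named, so there is nothing for compact() to remove.
output_list <- function(data, column, n_flags = 5) {
  flags <- tail(names(data), n_flags)
  setNames(lapply(flags, function(flag) get_table(data, column, flag)), flags)
}

set.seed(1)
mydataset1 <- data.frame(id = 1:100,
                         column_000 = sample(1:16, 100, replace = TRUE),
                         flag1 = sample(0:1, 100, replace = TRUE),
                         flag2 = sample(0:1, 100, replace = TRUE),
                         flag3 = sample(0:1, 100, replace = TRUE),
                         flag4 = sample(0:1, 100, replace = TRUE),
                         flag5 = sample(0:1, 100, replace = TRUE))

result <- output_list(mydataset1, "column_000")
print(result)
```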
In your get_table function you are using i, but you do not declare i as a function argument. Your code works when you run it line by line because the for loop then assigns i in the global environment, which is where get_table was defined and therefore where it looks for i. Inside output_list, i exists only in that function's local environment, which get_table cannot see. If you intend to use the loop's i in get_table, declare it as an argument, as in the get_table(data, column, i) version above.
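R uses lexical scoping: a function looks up variables it does not define in the environment where it was created, not in the one it is called from. A minimal sketch of that rule, with hypothetical function names, reproducing the "object 'i' not found" behaviour and the fix of passing i explicitly:

```r
# get_i() is defined at top level, so its free variable `i` is looked up
# in the global environment, not inside whichever function calls it.
get_i <- function() i

caller <- function() {
  i <- 42  # local to caller(); invisible to get_i()
  tryCatch(get_i(), error = function(e) conditionMessage(e))
}

# Passing i as an argument removes the dependency on scoping.
get_i_arg <- function(i) i
caller_fixed <- function() {
  i <- 42
  get_i_arg(i)
}

# Make sure no leftover global `i` (e.g. from an earlier loop) hides the error.
if (exists("i", envir = globalenv(), inherits = FALSE)) rm("i", envir = globalenv())

print(caller())        # the error message: object 'i' not found
print(caller_fixed())  # 42
```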

What does the error “arguments imply differing number of rows: x, y” mean?

I have a data.frame with 43958 rows and 3 columns (problem, project and value), and I'm trying to run a statistical test, but I'm getting this error:
Error in data.frame(problem.name = problem.name, avg.imp = avg.imp, k.esd = k.esd) :
arguments imply differing number of rows: 1, 0
My script is below:
require('ScottKnottESD')
require('rowr')
sk_format <- function (data){
variables <- unique(data$problem)
result <- data.frame(matrix(ncol=length(variables),nrow=113))
for(i in seq(1,length(variables))){
result[i] <- data$value[data$problem==variables[i]]
}
colnames(result) <- variables
return (result)
}
avg.imp <- function(data){
data.esd <- sk_esd(sk_format(data))
data.esd <- data.frame(data.esd$groups)
data.esd$problem <- rownames(data.esd)
rownames(data.esd) <- NULL
result <- data.frame(problem.name=vector(),
avg.imp=vector(),
k.esd=vector())
variables <- unique(data$problem)
print(length(variables))
for(problem in 1:length(variables)){
sub <- data[data$problem==variables[problem],]
avg.imp <- mean(sub$value)
problem.name <- variables[problem]
k.esd <- data.esd[data.esd$problem==paste(problem.name),1]
row <- data.frame(problem.name=problem.name,
avg.imp=avg.imp,
k.esd=k.esd)
result <- rbind(result, row)
}
return (result)
}
eclipse.varimp <- read.csv("agora_vai.csv", sep = ",")
eclipse.vimp <- avg.imp(eclipse.varimp)
eclipse.vimp
Can anyone tell me how to solve this error?
This is a data sample:
project,problem,value
albertoirurueta_irurueta-navigation,squid:CommentedOutCodeLine,0
albertoirurueta_irurueta-navigation,squid:S2129,0
albertoirurueta_irurueta-navigation,javascript:S1126,0
albertoirurueta_irurueta-navigation,Web:PageWithoutTitleCheck,0
albertoirurueta_irurueta-navigation,squid:S1155,0
albertoirurueta_irurueta-navigation,squid:S4784,0
It looks like one of the variables is empty (length 0) when you build the data frame that you assign to row.
This is what happens when you try to create a data frame from vectors of different lengths, for example, one of them being empty:
row <- data.frame(problem.name="X",
avg.imp= 5,
k.esd=vector())
This gives you the following error:
Error in data.frame(problem.name = "X", avg.imp = 5, k.esd = vector())
: arguments imply differing number of rows: 1, 0
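Check your code carefully. I suspect the empty value is k.esd: problem.name (variables[problem]) and avg.imp (a mean) always have length 1, but data.esd[data.esd$problem==paste(problem.name),1] returns a zero-length vector whenever problem.name does not appear among the problem names returned by sk_esd(). Comparing unique(data$problem) with data.esd$problem should show which problems are missing.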
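A minimal sketch of guarding that lookup, assuming data.esd has the group in column 1 and a problem column, as built in avg.imp() above. The data.esd contents and the lookup_group() helper are hypothetical; a missing problem yields NA instead of a zero-length k.esd, so every row has exactly one value per column.

```r
# Hypothetical stand-in for data.esd: group in column 1, problem name in `problem`.
data.esd <- data.frame(groups = c(1, 2),
                       problem = c("squid:S2129", "squid:S1155"),
                       stringsAsFactors = FALSE)

# Return the group for one problem, or NA when it is absent from data.esd
# (the plain lookup would otherwise be a zero-length vector).
lookup_group <- function(name) {
  k <- data.esd[data.esd$problem == name, 1]
  if (length(k) == 0) NA_real_ else k[1]
}

problems <- c("squid:S2129", "javascript:S1126", "squid:S1155")
rows <- lapply(problems, function(p) {
  data.frame(problem.name = p, k.esd = lookup_group(p), stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)
print(result)

# Problems with no match in data.esd: without the guard, these are the
# ones that would make data.frame() fail with "differing number of rows: 1, 0".
print(setdiff(problems, data.esd$problem))
```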

How to get the result of a for loop to print as a 0 instead of integer(0)

I have a for loop that queries a database and collects the elements I need into a vector, but some of the elements come back as "integer(0)". Is there a way to print them as "0" instead of "integer(0)"?
I have tried to switch the integer(0) to NA, but with no luck.
startAge <- c()
for(i in CancerMet) {
  x <- dbGetQuery(conn, paste('SELECT * FROM table WHERE Person = ', i, ';'))
  info = fromJSON(x$info)
  indx <- as.data.frame(info$dx)
  inrx <- as.data.frame(info$rx)
  beforedata <- indx[indx[,1]==4591,]
  start <- head(beforedata[,2],1)
  print(start)
  startAge <- c(startAge, capture.output(start))
}
ageEventStarts <- as.data.frame(startAge)
Now I get a vector that contains some "integer(0)" entries, and I want to replace them with 0 so that the vector holds only numeric information. The integer(0) entries come from the startAge <- c(startAge, capture.output(start)) line.
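Probably a check on beforedata will help. Because beforedata is a data frame, test nrow() rather than length(): length() counts columns, so it stays above 0 even when no row matches 4591. Appending start directly instead of capture.output(start) also keeps the result numeric, because capture.output() returns the printed text (for example "[1] 65" or "integer(0)").
startAge <- c()
for(i in CancerMet) {
  x <- dbGetQuery(conn, paste('SELECT * FROM table WHERE Person = ', i, ';'))
  info = fromJSON(x$info)
  indx <- as.data.frame(info$dx)
  inrx <- as.data.frame(info$rx)
  beforedata <- indx[indx[,1]==4591,]
  if (nrow(beforedata) > 0)
    start <- head(beforedata[,2],1)
  else
    start <- 0
  print(start)
  startAge <- c(startAge, start)
}
ageEventStarts <- as.data.frame(startAge)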
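A minimal sketch of the same idea that runs without a database, assuming each person's indx has a diagnosis code in column 1 and an age in column 2. The `people` list and first_age() helper are hypothetical; vapply() guarantees exactly one numeric value per person, so a person without code 4591 contributes 0 instead of integer(0).

```r
# Hypothetical per-person indx data frames: column 1 = code, column 2 = age.
people <- list(
  data.frame(code = c(4591, 1234), age = c(61, 64)),
  data.frame(code = c(1234, 5678), age = c(50, 52)),  # no 4591 row
  data.frame(code = c(4591, 4591), age = c(70, 72))
)

# First age recorded with `code`, or 0 when no row matches.
first_age <- function(indx, code = 4591) {
  hits <- indx[indx[, 1] == code, 2]  # zero-length when nothing matches
  if (length(hits) > 0) hits[1] else 0
}

startAge <- vapply(people, first_age, numeric(1))
print(startAge)
# [1] 61  0 70

# capture.output() returns printed text, not the number itself.
print(capture.output(65))  # "[1] 65"
```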

MHSMM package R input data format with multiple variables

My problem is similar to the following question: the problem of R-input Format.
I tried the code from that question and revised some parts to suit my data. My data looks like the following.
I want my data to be turned into a data frame with 4 variable vectors. My revised code is:
formatMhsmm <- function(data){
nb.sequences = nrow(data)
nb.variables = ncol(data)
data_df <- data.frame(matrix(unlist(data), ncol = 4, byrow = TRUE))
# iterate over these in loops
rows <- 1: nb.sequences
# build vector with id value
id = numeric(length = nb.sequences)
for( i in rows)
{
id[i] = data_df[i,2]
}
# build vector with time value
time = numeric (length = nb.sequences)
for( i in rows)
{
time[i] = data_df[i,3]
}
# build vector with observation values
sequences = numeric(length = nb.sequences)
for(i in rows)
{
sequences[i] = data_df[i, 4]
}
data.df = data.frame(id,time,sequences)
# creation of hsmm data object need for training
N <- as.numeric(table(data.df$id))
train <- list(x = data.df$sequences, N = N)
class(train) <- "hsmm.data"
return(train)
}
library(mhsmm)
dataset <- read.csv("location.csv", header = TRUE)
train <- formatMhsmm(dataset)
print(train)
The output observations are not the data of the 4th column. Instead they are (4, 8, 12, ..., 396, 1, 1, ..., 56, 192, ..., 6550, 68, NA, NA, ...); it seems to have picked up a quarter of the data from each column. Why does this happen?
Thank you very much!!!!
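The pattern comes from the line data_df <- data.frame(matrix(unlist(data), ncol = 4, byrow = TRUE)). unlist() flattens a data frame column by column, and refilling that vector with byrow = TRUE puts consecutive values of the same column side by side, so column 4 of the matrix picks every 4th value of the flattened data. A small sketch with made-up data:

```r
# Small stand-in for location.csv with four columns (names made up).
df <- data.frame(a = 1:4, b = 5:8, c = 9:12, d = 13:16)

# unlist() walks the data frame column by column: 1, 2, ..., 16.
flat <- unlist(df)

# byrow = TRUE puts 1:4 in row 1, 5:8 in row 2, and so on.
m <- matrix(flat, ncol = 4, byrow = TRUE)
print(m[, 4])   # 4 8 12 16: every 4th value, not column d

# Taking the column directly avoids the reshuffle.
print(df[[4]])  # 13 14 15 16
```

Using data[[4]] (or as.matrix(data)) instead of matrix(unlist(data), ..., byrow = TRUE) keeps each column intact.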
Why don't you simply count your observations by id and create the hsmm.data object directly? Supposing your data frame is called "data", we have:
N <- as.numeric(table(data$id))
train <- list(x=data$location, N = N)
class(train) <- "hsmm.data"
Extracted from http://www.jstatsoft.org/v39/i04/paper
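One caveat worth checking: table() lists ids in sorted order, so the rows should be sorted by id (and time) before x is taken; otherwise x and N describe different sequences. A minimal sketch with hypothetical long-format data using the id, time and location column names the answer assumes:

```r
# Hypothetical long-format data: one row per observation.
data <- data.frame(id = c(2, 1, 1, 2, 1),
                   time = c(1, 2, 1, 2, 3),
                   location = c(7, 5, 4, 8, 6))

# Sort rows to match the id order that table() uses.
data <- data[order(data$id, data$time), ]

N <- as.numeric(table(data$id))
train <- list(x = data$location, N = N)
class(train) <- "hsmm.data"
str(unclass(train))
```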

Reading series of values in R

I have read a series of 332 files as below, storing the data from each file as a data frame in a list.
files <- list.files()
data <- list()
for (i in 1:332){
data[[i]] = read.csv(files[[i]])
}
Each data set has 3 columns named id, city and town. Now I need to calculate the mean of all city values for the id values 1:10, for which I wrote the code below:
for(j in 1:10){
req.data <- data[[j]]$city
}
mean(na.omit(req.data))
But it gives me a wrong value, and when I call it inside a function it passes null values. Any help is highly appreciated.
Each time you iterate through j = 1:10 you assign data[[j]]$city to the object req.data. In doing so, for steps j = 2:10 you overwrite the previous version of req.data with the contents of the jth data set. Hence req.data only ever contains a single data set's worth of data at any one time, and you get the wrong answer because you are computing the mean for the last data set only, not all 10.
Also note that you could do mean(req.data, na.rm = TRUE) to remove the NAs.
You can do this without an explicit loop at the user R level using lapply(), for example, with dummy data,
set.seed(42)
data <- list(data.frame(city = rnorm(100)),
data.frame(city = rnorm(100)),
data.frame(city = rnorm(100)))
mean(unlist(lapply(data, `[`, "city")), na.rm = TRUE)
which gives
> mean(unlist(lapply(data, `[`, "city")), na.rm = TRUE)
[1] -0.02177902
So in your case, you need:
mean(unlist(lapply(data[1:10], `[`, "city")), na.rm = TRUE)
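An end-to-end sketch that also replaces the reading loop with lapply(). It writes three made-up CSV files to a temporary directory so it runs anywhere; the file names and contents are hypothetical. list.files(full.names = TRUE) returns paths that read.csv() can open regardless of the working directory.

```r
# Create three small CSV files (made-up data) in a temporary directory.
dir <- tempfile("monitors")
dir.create(dir)
for (k in 1:3) {
  write.csv(data.frame(id = k, city = c(k, k + 1, NA), town = 0),
            file.path(dir, sprintf("%03d.csv", k)), row.names = FALSE)
}

# Read every file into a list of data frames without an index loop.
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
data <- lapply(files, read.csv)

# Pool the city values of the first two data sets and average them.
pooled <- mean(unlist(lapply(data[1:2], `[`, "city")), na.rm = TRUE)
print(pooled)  # mean of 1, 2, 2, 3
```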
If you want to write a loop, then perhaps
req.data <- vector("list", length = 3) ## allocate, adjust to length = 10
for (j in 1:3) { ## adjust to 1:10 for your data / Q
req.data[[j]] <- data[[j]]$city ## fill in
}
mean(unlist(req.data), na.rm = TRUE)
> mean(unlist(req.data), na.rm = TRUE)
[1] -0.02177902
is one way. Alternatively, compute the mean of each data set and then average those means (this matches the pooled mean above only when every data set has the same number of non-missing values):
vec <- numeric(length = 3) ## allocate, adjust to length = 10
for (j in 1:3) { ## adjust to 1:10 for your question
vec[j] <- mean(data[[j]]$city, na.rm = TRUE)
}
mean(vec)
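A small worked example of why the two approaches can differ, using two hypothetical data sets of different sizes; weighting each mean by its number of values recovers the pooled mean.

```r
a <- c(1, 2, 3)  # mean 2
b <- c(10)       # mean 10

pooled <- mean(c(a, b))                     # (1 + 2 + 3 + 10) / 4 = 4
mean_of_means <- mean(c(mean(a), mean(b)))  # (2 + 10) / 2 = 6

# Weight each mean by its count (use sum(!is.na(x)) when NAs are present).
weighted <- weighted.mean(c(mean(a), mean(b)), w = c(length(a), length(b)))
print(c(pooled = pooled, mean_of_means = mean_of_means, weighted = weighted))
```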
