I am trying to run the cspade function from the arulesSequences package in R. After successfully reading in my transactions with read_baskets, I try to execute cspade against the resulting transactions object.
However, the command fails with the error "system invocation failed". Specifically, here is the output:
preprocessing ... 1 partition(s), 1.2 MB [0.23s]
mining transactions ...Error in cspade(table, parameter = list(support = 0.1), control = list (verbose = TRUE)) :
system invocation failed
The presence of "mining transactions" indicates that the following function call in the cspade code is failing.
if (system2(file.path(exe, "spade"), args = c("-i", file,
    "-s", parameter@support, opt, "-e", nop, "-o"), stdout = out))
    stop("system invocation failed")
For reference, I can successfully generate sequences using the example zaki dataset.
Does anyone have any idea why that command might be failing?
Thanks,
Stewart
http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Sequence_Mining/SPADE#Caveats
See the caveats section at the link above, and lower your support threshold.
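If the threshold alone doesn't help and memory is the bottleneck, cspade's control list also accepts memsize (a memory cap) and numpart (the number of input partitions); whether these help in this particular case is an assumption on my part, and the values below are illustrative, not tuned:

# give the external SPADE binary more memory and more input partitions
# (memsize/numpart values here are illustrative)
s <- cspade(table, parameter = list(support = 0.1),
            control = list(verbose = TRUE, memsize = 1024, numpart = 4))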
I had the same problem. I decreased the support (even set it to 0) and it still didn't work, but I came up with another solution: I reduced the data size (fewer rows), and that worked.
So if, for example, you can divide your data set into two parts of half the size and make transaction data from each part as x_1 and x_2:
frequent_pattern_1 <- cspade(x_1, parameter = list(support = 0))
frequent_pattern_2 <- cspade(x_2, parameter = list(support = 0))
then get the absolute support values from analysing each part, so the results can be combined:
support_x_1 <- support(frequent_pattern_1, x_1, type = "absolute", control = NULL)
support_x_2 <- support(frequent_pattern_2, x_2, type = "absolute", control = NULL)
Finally, find the matching sequences between the two result sets and sum the supports of the matched ones.
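A minimal sketch of that matching step, assuming the objects above exist (labels() gives the printable sequence labels; the merge logic is my addition, not something from arulesSequences):

# turn each result into a data frame of sequence labels and absolute supports
df1 <- data.frame(sequence = labels(frequent_pattern_1), support = support_x_1)
df2 <- data.frame(sequence = labels(frequent_pattern_2), support = support_x_2)

# match sequences across the two halves and sum the absolute supports;
# a sequence missing from one half contributes 0
combined <- merge(df1, df2, by = "sequence", all = TRUE)
combined[is.na(combined)] <- 0
combined$total.support <- combined$support.x + combined$support.y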
I'm trying to access an API from iNaturalist to download some citizen science data. I'm using the package rinat to get this done (see vignette). The loop below is, essentially, pulling all observations for one species, in one state, in one year iteratively on a per-month basis, then summing the number of observations for that year (input parameters subset from my actual script for convenience).
require(rinat)
state_ids <- c(18, 14)
bird_ids <- c(14886,1409)
months <- c(1:12)
final_nums <- vector()
for(i in 1:length(state_ids)){
total_count <- vector()
for(j in 1:length(months)){
monthly <- get_inat_obs(place_id=state_ids[i],
taxon_id=bird_ids[i],
year=2019,
month = months[j])
total_count <- append(total_count, length(monthly$scientific_name))
print(paste("done with month", months[j], "in state", state_ids[i]))
}
final_nums <- append(final_nums, sum(total_count))
print(paste("done with state", state_ids[i]))
}
Occasionally, and seemingly randomly, I get the following error:
No encoding supplied: defaulting to UTF-8.
Error in if (!x$headers$`content-type` == "text/csv; charset=utf-8") { :
argument is of length zero
This ends up breaking the loop or makes the loop run without actually pulling any real data. Is this an issue with my script, or the API, or something else? I've tried manually supplying encoding information to the get_inat_obs() function, but it doesn't accept that as an argument. Thank you in advance!
I don't believe this is an error in your script; the issue is most likely with the API.
The error "argument is of length zero" commonly occurs when an if condition is a comparison whose result has no length. For example:
if(logical(0) == "TEST") print("WORKED!!")
#Error in if (logical(0) == "TEST") print("WORKED!!") :
# argument is of length zero
I did a few greps on their source code to see where this if statement is, and it appears to be within inat_handle, at line 211 of get_inat_obs.R.
This would suggest that the authors did not expect
!x$headers$`content-type` == 'text/csv; charset=utf-8'
to evaluate to logical(0), but more specifically
x$headers$`content-type`
to be NULL.
I would suggest making a bug report on their GitHub and recommend they change the specified line to:
if(is.null(x$headers$`content-type`) || !x$headers$`content-type` == 'text/csv; charset=utf-8'){
Suggesting a bug is usually better received if you include a reproducible example.
Also, you could make this change yourself locally by cloning the git repository, editing the file, rebuilding the package, and then confirming that you no longer get the error in your code.
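If you'd rather not wait for a fix, a defensive workaround is to wrap the call in tryCatch and retry; this sketch assumes the same get_inat_obs() arguments as in the question, and the retry logic and empty fallback are my additions, not part of rinat:

# hypothetical wrapper: retry a flaky API call a few times before giving up
get_inat_obs_safe <- function(..., max_tries = 3, wait = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(get_inat_obs(...), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(wait)  # brief back-off before the next attempt
  }
  data.frame(scientific_name = character(0))  # empty fallback keeps the loop going
}

You can then call get_inat_obs_safe() inside the loop with the same arguments as before, so a single failed request no longer breaks the whole run.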
I have a large quantity of .wav files that I need to analyze using the acoustic indices from the "soundecology" package in R. However, the recordings do not have uniform start times and I need to analyze specific periods of time within the files. I want to create a function and loop for automating the process.
I have created a spreadsheet for each folder of recordings (each folder is a different location) that lays out each recording and the times within it that I need to analyze. Basically, a row contains: the sound file name, the time of day when the sample should start (e.g. 09:00:00), the number of seconds from the start of the file at which that time occurs, and the number of seconds from the start of the file at which the sample should end.
That data looks like this:
[Screenshot: spreadsheet of data]
I am using the packages "tuneR" and "warbleR" to select the specific portion of a sound file that I want to analyze. Here is the code and the output that I would like to loop across all the sound files:
wavrow1 <-read_wave(mvb$sound.files[1], from = mvb$start[1], to = mvb$end[1])
wavrow1.aci <- acoustic_complexity(wavrow1, j=10)
which yields:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 934.568
However, when I put this into a function in order to then put it into a loop I get a different output.
acianalyzeFUN <- function(mvb, i){
r <- read_wave(mvb$sound.files[i], mvb$start[i], mvb$end[i])
soundfile.aci <- acoustic_complexity(r, j=10)
}
row1.test <- acianalyzeFUN(mvb, 1)
This gives the output:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 19183.03
Acoustic Complexity Index (by minute): 931.98
Which is different.
So I need to fix this function and put it into a loop so that I can apply it across all the files and save the results into a data frame or ultimately another spread sheet.
I was thinking a loop like the following might work but I am also getting errors with it:
output <- vector("logical", length(97))
for (i in seq_along(mvb$sound.files)) {
output[[i]] <- acianalyzeFUN(mvb, i)
}
Which returns this error:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 19183.03
Acoustic Complexity Index (by minute): 931.98
Error in output[[i]] <- acianalyzeFUN(mvb, i) :
more elements supplied than there are to replace
Thanks for any help and advice on this. Please let me know if there are any other pieces of information that would be helpful.
The read_wave function takes the following arguments:
read_wave(X, index, from = X$start[index], to = X$end[index], channel = NULL,
header = FALSE, path = NULL)
In the manual test, you specify from = mvb$start[1], to = mvb$end[1]
In the function you created, you don't name the arguments:
r <- read_wave(mvb$sound.files[i], mvb$start[i], mvb$end[i])
so mvb$start[i] gets matched to index and mvb$end[i] to from.
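In other words, the positional call is equivalent to:

# the positional call above, with argument names made explicit
read_wave(X = mvb$sound.files[i], index = mvb$start[i], from = mvb$end[i])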
You should write:
acianalyzeFUN <- function(mvb, i){
r <- read_wave(mvb$sound.files[i], from = mvb$start[i], to = mvb$end[i])
soundfile.aci <- acoustic_complexity(r, j=10)
}
This should explain the difference you observe.
Regarding the error: output is a logical (atomic) vector, but acoustic_complexity() returns a list with several components, and a multi-element list cannot be stored in a single slot of an atomic vector; that is what "more elements supplied than there are to replace" means. Collect the results in a list instead, and have the function return the computed index as its value. Note also that vector("logical", length(97)) creates a vector of length 1, since length(97) is 1.
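Putting both fixes together, a minimal sketch (assuming mvb as in the question; the $AciTotAll_left component name is taken from soundecology's result list, so verify it with str() on your own output):

acianalyzeFUN <- function(mvb, i){
  r <- read_wave(mvb$sound.files[i], from = mvb$start[i], to = mvb$end[i])
  acoustic_complexity(r, j = 10)  # last expression is the function's return value
}

# collect list results in a list, not a logical vector
output <- vector("list", length(mvb$sound.files))
for (i in seq_along(mvb$sound.files)) {
  output[[i]] <- acianalyzeFUN(mvb, i)
}

# e.g. pull one index value per file into a data frame for export
aci.results <- data.frame(sound.file = mvb$sound.files,
                          aci.total = sapply(output, function(x) x$AciTotAll_left))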
Question
I am using the mongolite package to connect from R to a MongoDB database. I want to see whether it is possible to do multiple inserts into the database in parallel. While mcmapply from the parallel package works in other cases, it doesn't seem to work for parallel insertion into MongoDB.
Example code
I have a data frame grid.df containing grid IDs and geographical locations for the grid points. I want to insert the id_grid, lon and lat columns into the database.
insert.grid <- function(i){
# check which data to insert
criteria <- sprintf('{"id_grid" : %s}', grid.df$id_grid[i])
# values to be inserted
newdoc <- sprintf('{"$set" : {"id_grid" : %s, "loc" : {"type" : "Point", "coordinates" : [%s, %s] } } }',
grid.df$id_grid[i],
round(grid.df$lon[i], digits = 4),
round(grid.df$lat[i], digits = 4)
)
inserted <- FALSE
while (!inserted) {
inserted <- gcon$update(query = criteria, update = newdoc, upsert = TRUE)
}
}
Attempts
1) Using mcmapply
mcmapply works fine with 1 core. However, as soon as I try to use multiple cores, it fails.
results <- mcmapply(insert.grid, 1:nrow(grid.df), mc.cores = 4)
Setting the mc.preschedule parameter to FALSE didn't help either.
results <- mcmapply(insert.grid, 1:nrow(grid.df), mc.cores = 4, mc.preschedule = FALSE)
Both are giving the following error:
Warning message: In mclapply(seq_len(n), do_one, mc.preschedule =
mc.preschedule, : all scheduled cores encountered errors in user
code
2) Using file locks
I also followed the approach introduced here: http://www.quintuitive.com/2014/11/20/synchronization-for-r-with-the-flock-package/ and while it works with multiple cores, I don't really gain much performance: taking a lock for every insert slows the process down to roughly single-core speed.
This was solved by moving the gcon connection inside the function definition. The problem was simply that a single connection was defined and shared across the forked workers, and one connection cannot serve parallel inserts.
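For reference, a sketch of that fix (the collection, database, and URL below are placeholders, not values from the question):

# open a fresh connection inside the worker so each forked process
# gets its own socket instead of sharing the parent's
insert.grid <- function(i){
  gcon <- mongolite::mongo(collection = "grid", db = "mydb",
                           url = "mongodb://localhost")
  criteria <- sprintf('{"id_grid" : %s}', grid.df$id_grid[i])
  newdoc <- sprintf('{"$set" : {"id_grid" : %s, "loc" : {"type" : "Point", "coordinates" : [%s, %s] } } }',
                    grid.df$id_grid[i],
                    round(grid.df$lon[i], digits = 4),
                    round(grid.df$lat[i], digits = 4))
  gcon$update(query = criteria, update = newdoc, upsert = TRUE)
}

results <- mcmapply(insert.grid, 1:nrow(grid.df), mc.cores = 4)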
I am trying to build an inline segment to filter pages (e.g. to separate the pages for blogs and games) using the function BuildClassificationValueSegment() to get data from the Adobe Analytics API.
I have tried something like:
report.data.visits <- QueueTrended(reportsuite.id, date.from, date.to, metrics, elements,
                                   segment.inline = BuildClassificationValueSegment("evar2", "blog", "OR"))
I got an error like:
Error in ApiRequest(body = report.description, func.name = "Report.Validate") :
ERROR: segment_invalid - Segment "evar2" not valid for this company
In addition: Warning message:
In if (segment.inline != "") { :
the condition has length > 1 and only the first element will be used
Please help with this. Thanks in advance...
I recommend declaring the InlineSegment in advance and storing it in a variable, then passing it to the QueueTrended function.
I've been using the following syntax to generate an inline segment:
InlineSegment <- list(container=list(type=unbox("hits"),
                      rules=data.frame(
                        name=c("Page Name(eVar48)"),
                        element=c("evar48"),
                        operator=c("equals"),
                        value=c("value1", "value2")
                      )))
You can change the name and element arguments in order to personalize the query.
The next step is to pass the InlineSegment to the QueueRanked function:
Report <- as.data.frame(QueueRanked("reportsuite",
date.from = dateStart,
date.to = dateEnd,
metrics = c("pageviews"),
elements = c("element"),
segment.inline = InlineSegment,
max.attempts=500))
I borrowed that syntax from this thread some time ago: https://github.com/randyzwitch/RSiteCatalyst/issues/129
Please note that there might be easier ways to obtain this kind of report without using InlineSegmentation. Maybe you can use the selected argument from the QueueRanked function in order to narrow down the scope of the report.
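For example, something along these lines (a sketch only; the element name and values are placeholders, and I haven't validated this against your report suite):

# restrict the report to specific element values instead of segmenting
Report <- QueueRanked("reportsuite",
                      date.from = dateStart,
                      date.to = dateEnd,
                      metrics = "pageviews",
                      elements = "evar2",
                      selected = c("blog", "games"))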
Also, I'm purposefully avoiding the BuildClassificationValueSegment function as I found it a bit difficult to understand.
Hope this workaround helps...
Upon trying to calculate precision@k, I get an exception. What follows is a simple piece of code that reproduces the problem.
First the code defines the variable scope:
initializer = tf.random_uniform_initializer(-0.1, 0.1, seed=1234)
with tf.variable_scope("model", reuse=None, initializer=initializer):
Then it calls those lines:
predictions = tf.Variable(tf.ones([2, 10], tf.int64))
labels = tf.Variable(tf.ones([2, 1], tf.int64))
precision = tf.contrib.metrics.streaming_sparse_precision_at_k(predictions, labels, 5)
tf.initialize_all_variables().run()
(I know this code is meaningless, and tries to calculate the precision given 2 fixed matrices...)
Then I get the following exception:
W tensorflow/core/framework/op_kernel.cc:936] Failed precondition:
Attempting to use uninitialized value model/precision_at_5/false_positive_at_5
[[Node: model/precision_at_5/false_positive_at_5/read = Identity[T=DT_DOUBLE,
_class=["loc:@model/precision_at_5/false_positive_at_5"],
_device="/job:localhost/replica:0/task:0/gpu:0"]]
The same goes when I tried to invoke streaming_sparse_recall_at_k instead of streaming_sparse_precision_at_k.
The installed version is r0.10 on Linux with Python 2.7.
Please help... Thanks in advance :)
Unfortunately, tf.initialize_all_variables() doesn't initialize "local" variables (which tend to be internal implementation details for ops like tf.contrib.metrics.streaming_sparse_precision_at_k() and tf.train.string_input_producer(), as opposed to variables used as model weights).
You'll need to add a line to your program that runs tf.initialize_local_variables() before running the evaluation op:
sess.run(tf.initialize_local_variables()) # or `tf.initialize_local_variables().run()`