How to store vector of string in Julia? - julia

I have two numerical one dimensional vector (A,B) of size ~800000, and through some hashing operation these two are combined together and produce a string of size 6. Now I don't know how to store these strings? Whatever I do, it gives me an error.
I've tried using ArrayArray{string}(undef, 6) and also Dict.
My try is something like this:
# import pakages: read from csv, hashing
using DelimitedFiles
using GeohashHilbert
csvfilename = "C:/Users/lin/Desktop/uber-raw-data-jul14.csv"
csvdata = readdlm(csvfilename, ',', header=true)
data = csvdata[1]
header = csvdata[2]
lat = data[:,2]
long = data[:,3]
lat_len = length(lat)
#GeoHashed = GeohashHilbert.encode(lon, lat, precision, bits_per_char)
GeoHashed = Dict()
for i in 1:lat_len
GeoHashed[i] = GeohashHilbert.encode(long[i], lat[i], 6, 6)
end
What's the issue??
ERROR: LoadError: MethodError: no method matching isless(::Int64, ::SubString{String})
Closest candidates are:
isless(::AbstractString, ::AbstractString) at strings/basic.jl:344
isless(::Any, ::Missing) at missing.jl:88
isless(::Missing, ::Any) at missing.jl:87
...

It looks like you're feeding strings into the encode function which doesn't work:
julia> GeohashHilbert.encode("51.1", "0.5", 6, 6)
ERROR: MethodError: no method matching isless(::Int64, ::String)
...
Stacktrace:
[1] <(x::Int64, y::String)
# Base .\operators.jl:352
[2] <=(x::Int64, y::String)
# Base .\operators.jl:401
[3] encode(lon::String, lat::String, precision::Int64, bits_per_char::Int64)
# GeohashHilbert C:\Users\ngudat\.julia\packages\GeohashHilbert\vh6xu\src\GeohashHilbert.jl:118
[4] top-level scope
# REPL[62]:1
[5] top-level scope
# C:\Users\ngudat\.julia\packages\CUDA\KnJGx\src\initialization.jl:52
You probably meant to do:
julia> GeohashHilbert.encode(51.1, 0.5, 6, 6)
"W13T#3"
so your problem is likely reading in the data. It's impossible to tell without having the csv file available, but I'm assuming if you did typeof(lat) you would get Vector{SubString{String}} instead of Vector{Float64} as you seem to expect.
So the solution is probably to use a more fully featured CSV reader like CSV.jl to read your csv file to ensure that you end up with numerical data, or do parse.(Float64, lat) to convert your data after reading it in from csv.

Related

Using an R function to hash values produces a repeating value across rows

I'm using the following query:
let
Source = {1..5},
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), {"Numbers"}, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Letters", each Character.FromNumber([Numbers] + 64)),
#"Run R script" = R.Execute("# 'dataset' holds the input data for this script#(lf)#(lf)library(""digest"")#(lf)#(lf)dataset$SuffixedLetters <- paste(dataset$Letters, ""_suffix"")#(lf)dataset$HashedLetters <- digest(dataset$Letters, ""md5"", serialize = TRUE)#(lf)output<-dataset",[dataset=#"Added Custom"]),
output = #"Run R script"{[Name="output"]}[Value]
in
output
which leads to the resulting table:
And the here is the R script with better formatting:
# 'dataset' holds the input data for this script
library("digest")
dataset$SuffixedLetters <- paste(dataset$Letters, "_suffix")
dataset$HashedLetters <- digest(dataset$Letters, "md5", serialize = TRUE)
output<-dataset
The 'paste' function appears to iterate over rows and resolve on each row with the new input. But the 'digest' function only appears to return the first value in the table across all rows.
I don't know why the behavior of the two functions would seem to operate differently. Can anyone advise how to get the 'HashedLetters' column to resolve using the values from each row instead of just the initial one?
Use:
dataset$HashedLetters <- sapply(dataset$Letters, digest, algo = "md5", serialize = TRUE)
digest works on a whole object at a time, not individual elements of a vector.
vec <- letters[1:3]
digest::digest(vec, algo="md5", serialize=TRUE)
# [1] "38ce1fe9e19a222505e693e8bdd8aeec"
sapply(vec, digest::digest, algo="md5", serialize=TRUE)
# a b c
# "127a2ec00989b9f7faf671ed470be7f8" "ddf100612805359cd81fdc5ce3b9fbba" "6e7a8c1c098e8817e3df3fd1b21149d1"

How does R Markdown automatically format print effects into dataframes? Or how can I access special print methods?

I'm working with the WRS2 package and there are cases where it'll output its analysis (bwtrim) into a list with a special class of the analysis type class = "bwtrim". I can't as.data.frame() it, but I found that there is a custom print method called print.bwtrim associated with it.
As an example let's say this is the output: bwtrim.out <- bwtrim(...). When I run the analysis output in an Rmarkdown chunk, it seems to "steal" part of the text output and make it into a dataframe.
So here's my question, how can I either access print.bwtrim or how does R markdown automatically format certain outputs into dataframes? Because I'd like to take this outputted dataframe and use it for other purposes.
Update: Here is a minimally working example -- put the following in a chunk in Rmd file."
```{r}
library(WRS2)
df <-
data.frame(
subject = rep(c(1:100), each = 2),
group = rep(c("treatment", "control"), each = 2),
timepoint = rep(c("pre", "post"), times = 2),
dv = rnorm(200, mean = 2)
)
analysis <- WRS2::bwtrim(dv ~ group * timepoint,
id = subject,
data = df,
tr = .2)
analysis
```
With this, a data.frame automatically shows up in the chunk afterwards and it shows all the values very nicely. My main question is how can I get this data.frame for my own uses. Because if you do str(analysis), you see that it's a list. If you do class(analysis) you get "bwtrim". if you do methods(class = "bwtrim"), you get the print method. And methods(print) will have a line that says print.bwtrim*. But I can't seem to figure out how to call print.bwtrim myself.
Regarding what Rmarkdown is doing, compare the following
If you run this in a chunk, it actually steals the data.frame part and puts it into a separate figure.
```{r}
capture.output(analysis)
```
However, if you run the same line in the console, the entire output comes out properly. What's also interesting is that if you try to assign it to another object, the output will be stolen before it can be assigned.
Compare x when you run the following in either a chunk or the console.
```{r}
x<-capture.output(analysis)
```
This is what I get from the chunk approach when I call x
[1] "Call:"
[2] "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, "
[3] " data = df, tr = 0.2)"
[4] ""
[5] ""
This is what I get when I do it all in the console
[1] "Call:"
[2] "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, "
[3] " data = df, tr = 0.2)"
[4] ""
[5] " value df1 df2 p.value"
[6] "group 1.0397 1 56.2774 0.3123"
[7] "timepoint 0.0001 1 57.8269 0.9904"
[8] "group:timepoint 0.5316 1 57.8269 0.4689"
[9] ""
My question is what can I call whatever Rstudio/Rmarkdown is doing to make data.frames, so that I can have an easy data.frame myself?
Update 2: This is probably not a bug, as discussed here https://github.com/rstudio/rmarkdown/issues/1150.
Update 3: You can access the method by using WRS2:::bwtrim(analysis), though I'm still interested in what Rmarkdown is doing.
Update 4: It might not be the case that Rmarkdown is stealing the output and automatically making dataframes from it, as you can see when you call x after you've already captured the output. Looking at WRS2:::print.bwtrim, it prints a dataframe that it creates, which I'm guessing Rmarkdown recognizes then formats it out.
See below for the print.bwtrim.
function (x, ...)
{
cat("Call:\n")
print(x$call)
cat("\n")
dfx <- data.frame(value = c(x$Qa, x$Qb, x$Qab), df1 = c(x$A.df[1],
x$B.df[1], x$AB.df[1]), df2 = c(x$A.df[2], x$B.df[2],
x$AB.df[2]), p.value = c(x$A.p.value, x$B.p.value, x$AB.p.value))
rownames(dfx) <- c(x$varnames[2], x$varnames[3], paste0(x$varnames[2],
":", x$varnames[3]))
dfx <- round(dfx, 4)
print(dfx)
cat("\n")
}
<bytecode: 0x000001f587dc6078>
<environment: namespace:WRS2>
In R Markdown documents, automatic printing is done by knitr::knit_print rather than print. I don't think there's a knit_print.bwtrim method defined, so it will use the default method, which is defined as
function (x, ..., inline = FALSE)
{
if (inline)
x
else normal_print(x)
}
and normal_print will call print().
You are asking why the output is different. I don't see that when I knit the document to html_document, but I do see it with html_notebook. I don't know the details of what is being done, but if you look at https://rmarkdown.rstudio.com/r_notebook_format.html you can see a discussion of "output source functions", which manipulate chunks to produce different output.
The fancy output you're seeing looks a lot like what knitr::knit_print does for a dataframe, so maybe html_notebook is substituting that in place of print.

How to save an object whose name is in a variable?

This is calling for some "tricky R", but this time it's beyond my fantasy :-) I need to save() an object whose name is in the variable var. I tried:
save(get(var), file = ofn)
# Error in save(get(var), file = ofn) : object ‘get(var)’ not found
save(eval(parse(text = var)), file = ofn)
# Error in save(eval(parse(text = var)), file = ofn) :
# object ‘eval(parse(text = var))’ not found
both of which fail, unfortunatelly. How would you solve this?
Use the list argument. This saves x in the file x.RData. (The list argument can specify a vector of names if you need to save more than one at a time.)
x <- 3
name.of.x <- "x"
save(list = name.of.x, file = "x.RData")
# loading x.RData to check that it worked
rm(x)
load("x.RData")
x
## [1] 3
Note
Regarding the first attempt in the question which attempts to use get we need to specify the name rather than its value so that attempt could use do.call converting the character name to a name class object.
do.call("save", list(as.name(name.of.x), file = "x.RData"))
Regarding the second attempt in the question which uses eval, to do that write out the save, substitute in its name as a name class object and then evaluate it.
eval(substitute(save(Name, file = "x.RData"), list(Name = as.name(name.of.x))))
If it's just one object, you can use saveRDS:
a<-1:4
var<-"a"
saveRDS(get(var),file="test.R")
readRDS(file="test.R")
[1] 1 2 3 4

how to interpolate data within groups in R using seqtime?

I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-serie microbiome data, as follow:
meta = data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta<- meta[order(meta$day, meta$condition),]
meta.ts<-as.data.frame(t(meta))
otu=matrix(1:390, ncol = 39)
oturar<-rarefyFilter(otu, min=0)
rarotu<-oturar$rar
time<-meta.ts[1,]
interp.otu<-interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
the interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried to change method to "hyman", but it returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and I am a bit new to R.
Please can anyone tell me what I am doing wrong/ how to go around these errors?
Many thanks!
I used quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable used as input for the time.vector parameter.
When meta.ts is being converted to a data frame, all strings are automatically converted to factors - this includes day.
To adjust, you can edit your code to the following:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(meta.ts[1,]) # Now 'day' is character, so convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
As a bonus, read this blogpost for information on the stringsAsFactors parameter. Strings automatically being converted to Factors is a common bewilderment.

Concatenate multiple strings in R

I want to concatenate the below urls, I have written a below function to concatenate all the urls:
library(datetime)
library(lubridate)
get_thredds_url<- function(mon, hr){
a <-"http://abc.co.in/"
b <-"thredds/path/"
c <-paste0("%02d", ymd_h(mon))
d <-paste0(strftime(datetime_group, format="%Y%m%d%H"))
e <-paste0("/gfs.t%sz.pgrb2.0p25.f%03d",(c, hr))
url <-paste0(a,b,b,d)
return (url)
}
mon = datetime(2017, 9, 26, 0)
hr = 240
url = get_thredds_url(mon,hr)
print (url)
But I am getting below error when I execute the definition of get_thredds_url():
Error: unexpected ',' in:
" d<-paste0(strftime(datetime_group, format="%Y%m%d%H"))
e<-paste0("/gfs.t%sz.pgrb2.0p25.f%03d",(c,"
url <-paste0(a,b,b,d)
Error in paste0(a, b, b, d) : object 'a' not found
return (url)
Error: no function to return from, jumping to top level
}
Error: unexpected '}' in "}"
What is wrong with my function and how can I solve this?
The final output should be:
http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240
Using sprintf allows more control of values being inserted into string
library(lubridate)
get_thredds_url<- function(mon, hr){
sprintf("http://abc.co.in/thredds/path/%s/gfs.t%02dz.pgrb2.0p25.f%03d",
strftime(mon, format = "%Y%m%d%H", tz = "UTC"),
hour(mon),
hr)
}
mon <- make_datetime(2017, 9, 26, 0, tz = "UTC")
hr <- 240
get_thredds_url(mon, hr)
[1] "http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240"
It was a bit messy to figure out what it is, you're trying to do. There seem to be quite a couple of contradicting pieces in your code, especially compared to your wanted final output. Therefore, I decided to focus on the wanted output and the inputs you provided in your variables.
get_thredds_url <- function(yr, mnth, day, hrs1, hrs2){
part1 <- "http://abc.co.in/"
part2 <- "thredds/path/"
ymdh <- c(yr, formatC(c(mnth, day, hrs1), width=2, flag="0"))
part3 <- paste0(ymdh, collapse="")
pre4 <- formatC(hrs1, width=2, flag="0")
part4 <- paste0("/gfs.t", pre4, "z.pgrb2.0p25.f", hrs2)
return(paste0(part1, part2, part3, part4))
}
get_thredds_url(2017, 9, 26, 0, 240)
# [1] "http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240"
The key is using paste0() appropriately and I think formatC() may be new to some people (including me).
formatC() is used here to pad zeros in front of the number you provide, and thus makes sure that 9 is converted to 09, whereas 12 remains 12.
Note that this answer is in base R and does not require additional packages.
Also note that you should not use url and c as variable names. These names are already reserved for other functionalities in R. By using them as variable names, you are overwriting their actual purpose, which can (will) lead to problems at some point down the road

Resources