How can I initialize a date vector? - datetime

I have to use Julia for a problem set for class. I've never used it before and I've only ever really used Matlab/Stata.
I want to initialize an empty date vector, so that I can store values in it in a for loop. The for loop will have a lot of if statements that will change how I input into the date vector. It'd be something like this:
using Dates
date = vector(N,1) # Need help with this line
for i in 1:N
if stuff1
date[i] = blah1
elseif stuff2
date[i] = blah2
else
date[i] = blah3
end
end

Use:
julia> N = 5
5
julia> date = Vector{Date}(undef, N)
5-element Vector{Date}:
56207879210758--12056695473012767-3689348814741910322
97580475146329-6028347736506381-7378697629483820659
-99964444404694-24113390946025549-7378697629483820668
-54150401430865--30141738682531934-3689348814741910309
0001-02-07
for an uninitialized vector, or:
julia> fill(Date(2022, 03, 18), N)
5-element Vector{Date}:
2022-03-18
2022-03-18
2022-03-18
2022-03-18
2022-03-18
to initially fill it with some default value.
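For readers coming from Python, the same two choices can be sketched as follows. This is only an illustration: Python has no uninitialized memory, so None stands in for "not set yet", and the loop conditions are hypothetical placeholders for stuff1 and the fallback branch in the question.

```python
from datetime import date

N = 5

# Placeholder-filled list: None marks entries that have not been set yet.
dates = [None] * N

# Default-filled list, the analogue of fill(Date(2022, 3, 18), N).
defaults = [date(2022, 3, 18)] * N

# Fill the placeholders in a loop with branching, as in the question.
for i in range(N):
    if i % 2 == 0:   # hypothetical condition standing in for stuff1
        dates[i] = date(2022, 3, 18)
    else:            # hypothetical fallback standing in for the else branch
        dates[i] = date(2022, 1, 1)
```

Sharing one date object across the default-filled list is safe here because date values are immutable.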

Related

Dynamically accessing globals through string interpolation

You can call variables through a loop in Python like this:
var1 = 1
var2 = 2
var3 = 3
for i in range(1,4):
print(globals()[f"var{i}"])
This results in:
1
2
3
Now I'm looking for the equivalent way to call variables using interpolated strings in Julia! Is there any way?
PS: I didn't ask "How to get a list of all the global variables in Julia's active session". I asked for a way to call a local variable using an interpolation in a string.
PS: this is dangerous.
Code:
var1 = 1
var2 = 2
var3 = 3
for i in 1:3
varname = Symbol("var$i")
println(getfield(Main, varname))
end
List of globals:
vars = filter!(x -> !( x in [:Base, :Core, :InteractiveUtils, :Main, :ans, :err]), names(Main))
Values of globals:
getfield.(Ref(Main), vars)
To display names and values you can either just use varinfo() or, e.g., with the DataFrames package loaded, do DataFrame(name=vars, value=getfield.(Ref(Main), vars)).
You don't. Use a collection like an array instead:
julia> values = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> for v in values
println(v)
end
1
2
3
As suggested in earlier comments, dynamically finding the variable names really isn't the approach you want to go for.
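The same advice applies to the Python snippet in the question. A minimal sketch, assuming the values can simply be stored in a dictionary (the names variables and lookup are made up for illustration):

```python
# A dict keyed by name replaces the separate globals var1, var2, var3.
variables = {"var1": 1, "var2": 2, "var3": 3}


def lookup(index):
    """Return the value stored under the name f"var{index}"."""
    return variables[f"var{index}"]


results = [lookup(i) for i in range(1, 4)]
for value in results:
    print(value)
```

The interpolated string is still used to build the key, but the lookup goes through an ordinary container instead of the global namespace.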

Filter strings by its time content in R

Suppose I have a string vector (file names actually):
x<-c("abcd20090809.txt", "bc20100209.txt", "bcd19971109.txt",
"abcef20120802.txt", "efg20151109.txt","xyz19860102.txt")
The numbers in x represent the date in yyyymmdd format. What I want is to filter x for the files dated before the year 2000, e.g. the output would be:
> xx
[1] "bcd19971109.txt" "xyz19860102.txt"
You can use grep
grep(pattern = "^[a-z]+1", x, value = TRUE)
# [1] "bcd19971109.txt" "xyz19860102.txt"
edit
If we want to subset by the condition 'before 2010' we might do
thres <- as.Date("2010-01-01")
idx <- as.Date(unlist(regmatches(x, gregexpr("\\d+", text = x))), format = "%Y%m%d") < thres
x[idx]
# [1] "abcd20090809.txt" "bcd19971109.txt" "xyz19860102.txt"
Here, I use substring to pull out the year and then I check it against your condition (i.e., < 2000) and pull out the elements of x that are TRUE.
x<-c("abcd20090809.txt", "bc20100209.txt", "bcd19971109.txt",
"abcef20120802.txt", "efg20151109.txt","xyz19860102.txt")
x[as.numeric(substring(x,nchar(x)-11,nchar(x)-8))<2000]
#> [1] "bcd19971109.txt" "xyz19860102.txt"
Created on 2019-02-08 by the reprex package (v0.2.1)
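For comparison, the date-threshold approach can also be sketched in Python. This assumes every file name contains exactly one run of eight digits in yyyymmdd form; the helper names file_date and before are made up for illustration.

```python
import re
from datetime import date, datetime

files = ["abcd20090809.txt", "bc20100209.txt", "bcd19971109.txt",
         "abcef20120802.txt", "efg20151109.txt", "xyz19860102.txt"]


def file_date(name):
    """Parse the first run of eight digits in `name` as a yyyymmdd date."""
    match = re.search(r"\d{8}", name)
    if match is None:
        raise ValueError(f"no yyyymmdd stamp in {name!r}")
    return datetime.strptime(match.group(), "%Y%m%d").date()


def before(names, threshold):
    """Keep the names whose embedded date is strictly earlier than `threshold`."""
    return [n for n in names if file_date(n) < threshold]


print(before(files, date(2000, 1, 1)))
print(before(files, date(2010, 1, 1)))
```

Parsing to a real date rather than matching on the leading digit means the threshold can be any day, not just a century boundary.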

Conditional statements with Dataframes [Julia v1.0]

I am porting over custom functions from R. I would like to use Julia Dataframes to store my data. I like to reference by column name instead of, say, array indices hence I am using the Dataframes package.
I simplified the following to illustrate:
if( DataFrame(x=1).x .>1) end
The error is:
ERROR: TypeError: non-boolean (BitArray{1}) used in boolean context
Is there a simple workaround that would allow me to continue using DataFrames?
The expression:
DataFrame(x=1).x .> 1
Does the following things:
Creates a DataFrame
Extracts a column x from it
Compares all elements of this column to 1 using vectorized operation .> (broadcasting in Julia parlance)
In effect you get the following one element array:
julia> DataFrame(x=1).x .> 1
1-element BitArray{1}:
false
As opposed to R, Julia distinguishes between vectors and scalars, so this is not the same as simply writing false. Moreover, an if statement expects a scalar, not a vector, so something like this works:
if 2 > 1
println("2 is greater than 1")
end
but not something like this:
if DataFrame(x=2).x .> 1
println("success!")
end
However, for instance this would work:
if (DataFrame(x=2).x .> 1)[1]
println("success!")
end
as you extract the first (and only in this case) element from the array.
Notice that in R, if you pass a vector with more than one element to a conditional expression, you get a warning like this:
> if (c(T,F)) {
+ print("aaa") } else {print("bbb")}
[1] "aaa"
Warning message:
In if (c(T, F)) { :
  the condition has length > 1 and only the first element will be used
Simply put, Julia is stricter than R in checking the types in this case. In R there is no distinction between scalars and vectors, but in Julia there is.
EDIT:
length(df) returns you the number of columns of a DataFrame (not number of rows). If you are coming from R it is easier to remember nrow and ncol functions.
Now regarding your question you can write either:
for i in 1:nrow(df)
if df.x[i] > 3
df.y[i] = df.x[i] + 1
end
end
or
bigx = df.x .> 3
df.y[bigx] = df.x[bigx] .+ 1
or
df.y .= ifelse.(df.x .> 3, df.x .+ 1, df.y)
or using DataFramesMeta to shorten the notation:
using DataFramesMeta
@with df begin
df.y .= ifelse.(:x .> 3, :x .+ 1, :y)
end
or
using DataFramesMeta
@byrow! df begin
if :x > 3
:y = :x + 1
end
end

rowr::cbind.fill() changes characters value to numeric

I have two variables date and referencenumber. Both are extracted from a text string, with the use of a regular expression. They both have the class character.
When I use the cbind.fill function to combine these variables in an already existing dataframe, the values are transformed to the numeric values 1 and 1 instead of "06-07-2016" and "123ABC". I use the cbind.fill function because sometimes only one variable is found, and then this variable must still be placed in the dataframe.
When I run the same code on a computer at school, it doesn't transform the values to numeric. So maybe it has something to do with my settings?
Why is this happening?
library(rowr)
dataframevariablen <- as.data.frame(matrix(nrow = 0, ncol = 2))
colnames(dataframevariablen) <- c("date", "refnr")
rulebased(dfgg$Text[i]) #returns the date and refnr as global variable
dataframevariablen[i,] <- cbind.fill(date,refnr, fill = NULL)
Does this work for you?
x <- c("6jul2016", "2jan1960", "31mar1960", "30jul1960")
date <- as.Date(x, "%d%b%Y")
refnr <- "123ABC" # stands in for the refnr that rulebased() returns as a global variable
for (i in 1:length(date))
dataframevariablen[i,] <- data.frame(date[i],refnr,stringsAsFactors = F)
dataframevariablen$date=as.Date(dataframevariablen$date,origin="1970-01-01")
dataframevariablen
        date  refnr
1 2016-07-06 123ABC
2 1960-01-02 123ABC
3 1960-03-31 123ABC
4 1960-07-30 123ABC

Union of time intervals that are not necessarily contiguous

I am looking for an implementation of union for time intervals which is capable of dealing with unions that are not themselves intervals.
I have noticed lubridate includes a union function for time intervals but it always returns a single interval even if the union is not an interval (ie it returns the interval defined by the minimum of both start dates and the maximum of both end dates, ignoring intervening periods not covered by either interval):
library(lubridate)
int1 <- new_interval(ymd("2001-01-01"), ymd("2002-01-01"))
int2 <- new_interval(ymd("2003-06-01"), ymd("2004-01-01"))
union(int1, int2)
# Union includes intervening time between intervals.
# [1] 2001-01-01 UTC--2004-01-01 UTC
I have also looked at the interval package, but its documentation makes no reference to union.
My end goal is to use the complex union with %within%:
my_int %within% Reduce(union, list_of_intervals)
So if we consider a concrete example, suppose the list_of_intervals is:
[[1]] 2000-01-01 -- 2001-01-02
[[2]] 2001-01-01 -- 2004-01-02
[[3]] 2005-01-01 -- 2006-01-02
Then my_int <- 2001-01-01 -- 2004-01-01 is %within% the union of the list_of_intervals, so it should return TRUE, and my_int <- 2003-01-01 -- 2006-01-01 is not (it spans the gap between 2004-01-02 and 2005-01-01), so it should be FALSE.
However, I suspect the complex union has more uses than this.
If I understand your question correctly, you'd like to start with a set of potentially overlapping intervals and obtain a list of intervals that represents the UNION of the input set, rather than just the single interval spanning the minimum and maximum of the input set. This is the same question I had.
A similar question was asked at: Union of intervals
... but the accepted response fails with overlapping intervals. However, hosolmaz (I am new to SO, so don't know how to link to this user) posted a modification (in Python) that fixes the issue, which I then converted to R as follows:
library(dplyr) # for %>%, arrange, bind_rows
interval_union <- function(input) {
if (nrow(input) == 1) {
return(input)
}
input <- input %>% arrange(start)
output = input[1, ]
for (i in 2:nrow(input)) {
x <- input[i, ]
if (output$stop[nrow(output)] < x$start) {
output <- bind_rows(output, x)
} else if (output$stop[nrow(output)] == x$start) {
output$stop[nrow(output)] <- x$stop
}
if (x$stop > output$stop[nrow(output)]) {
output$stop[nrow(output)] <- x$stop
}
}
return(output)
}
With your example with overlapping and non-contiguous intervals:
d <- as.data.frame(list(
start = c('2005-01-01', '2000-01-01', '2001-01-01'),
stop = c('2006-01-02', '2001-01-02', '2004-01-02')),
stringsAsFactors = FALSE)
This produces:
> d
       start       stop
1 2005-01-01 2006-01-02
2 2000-01-01 2001-01-02
3 2001-01-01 2004-01-02
> interval_union(d)
       start       stop
1 2000-01-01 2004-01-02
2 2005-01-01 2006-01-02
I am a relative novice to R programming, so if anyone could convert the interval_union() function above to accept as parameters not only the input data frame, but also the names of the 'start' and 'stop' columns to use so the function could be more easily re-usable, that'd be great.
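As an illustration of the parameterised version asked for above, here is a minimal sketch in Python (the language the fix was originally posted in). It is not hosolmaz's code. It assumes the input is a list of dicts and that the bounds are ISO date strings, which compare correctly as plain strings; the start and stop arguments are hypothetical names for the configurable keys.

```python
def interval_union(rows, start="start", stop="stop"):
    """Merge overlapping or touching intervals.

    `rows` is a list of dicts; `start` and `stop` name the keys that hold
    the interval bounds.
    """
    merged = []
    for row in sorted(rows, key=lambda r: r[start]):
        if merged and row[start] <= merged[-1][stop]:
            # Overlaps or touches the last merged interval: extend it.
            merged[-1][stop] = max(merged[-1][stop], row[stop])
        else:
            # There is a gap before this interval: start a new one.
            merged.append({start: row[start], stop: row[stop]})
    return merged


d = [
    {"start": "2005-01-01", "stop": "2006-01-02"},
    {"start": "2000-01-01", "stop": "2001-01-02"},
    {"start": "2001-01-01", "stop": "2004-01-02"},
]
print(interval_union(d))
```

Sorting by the start bound first means each interval only has to be compared with the most recently merged one.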
Well, in the example you provided, the union of int1 and int2 could be seen just as a vector with the two intervals:
int1 <- new_interval(ymd("2001-01-01"), ymd("2002-01-01"))
int2 <- new_interval(ymd("2003-06-01"), ymd("2004-01-01"))
ints <- c(int1,int2)
%within% works on vectors, so you can do something like this:
my_int <- new_interval(ymd("2001-01-01"), ymd("2004-01-01"))
my_int %within% ints
# [1] TRUE FALSE
So you can check if your interval is in one of the intervals of your list with any:
any(my_int %within% ints)
# [1] TRUE
Your comment is right: the results given by %within% don't seem consistent with the documentation, which says:
If a is an interval, both its start and end dates must fall within b
to return TRUE.
If I look at the source code of %within% when a and b are both intervals, it seems to be the following :
setMethod("%within%", signature(a = "Interval", b = "Interval"), function(a,b){
as.numeric(a@start) - as.numeric(b@start) <= b@.Data & as.numeric(a@start) - as.numeric(b@start) >= 0
})
So it seems that only the starting point of a is tested against b, which is consistent with the results above. Maybe this should be considered a bug and reported?
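To make the difference concrete, here is a small Python sketch contrasting the start-only test in the quoted method with the both-ends test the documentation describes. It is not lubridate code: intervals are modelled as hypothetical (start, end) tuples, and the function names are made up for illustration.

```python
from datetime import date


def start_within(a, b):
    """Mimic the quoted method: only a's start is tested against b."""
    return b[0] <= a[0] <= b[1]


def fully_within(a, b):
    """Documented behaviour: both a's start and end must fall within b."""
    return b[0] <= a[0] and a[1] <= b[1]


int1 = (date(2001, 1, 1), date(2002, 1, 1))
int2 = (date(2003, 6, 1), date(2004, 1, 1))
my_int = (date(2001, 1, 1), date(2004, 1, 1))

start_only = [start_within(my_int, b) for b in (int1, int2)]
both_ends = [fully_within(my_int, b) for b in (int1, int2)]
print(start_only)
print(both_ends)
```

The start-only test reports my_int as lying within int1, while the both-ends test rejects it for both intervals because my_int ends after int1 does.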

Resources