I tried to enter these commands but they do not work... can someone help, please?
OPts <- read.csv(OILBRENT)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
'file' must be a character string or connection
OPts <- ts(OILBRENT, start = c(jun-87), end = c(Jan-20), frequency =12)
Error in ts(OILBRENT, start = c(jun - 87), end = c(Jan - 20), frequency = 12) :
object 'jun' not found
OPts <- ts(OILBRENT, start = c(jun-87, 1), end = c(Dec-19, 12), frequency =12)
Error in ts(OILBRENT, start = c(jun - 87, 1), end = c(Dec - 19, 12), frequency = 12) :
object 'jun' not found
It seems you are missing some quotation marks.
read.csv expects either a character string pointing to a file, or a connection. If you are trying to read a file in your current working directory, try OPts <- read.csv("OILBRENT").
Both ts() calls fail because the start argument expects either a single number or a numeric vector. When you write start = c(jun-87), R tries to find an object named jun and then compute the difference jun - 87. From the help page for ts:
start: the time of the first observation. Either a single number or
a vector of two integers, which specify a natural time unit
and a (1-based) number of samples into the time unit. See
the examples for the use of the second form.
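For what it's worth, here is a minimal sketch of the corrected calls, assuming the data sit in a file called OILBRENT.csv with a column named price (both names are assumptions) and the monthly series runs from June 1987 to January 2020:
OILBRENT <- read.csv("OILBRENT.csv")   # file name is an assumption
OPts <- ts(OILBRENT$price,             # 'price' column is an assumption
           start = c(1987, 6),         # June 1987: year first, then month number
           end = c(2020, 1),           # January 2020
           frequency = 12)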
I want to use the apriori algorithm to apply association rules between words in the tweet database I have, using RStudio. However, the code below throws an error on a million rows of data, while it works on a small subset of the data. I need your help, as I couldn't understand what caused the error.
TweetTrans <- read.transactions("../input/tweets/output.csv",
                                rm.duplicates = FALSE,
                                format = "basket",
                                sep = ",",
                                encoding = "UTF-8")
The Error is:
Error in validObject(.Object): invalid class “ngCMatrix” object: row indices are not sorted within columns
Traceback:
1. read.transactions("../input/tweets/output.csv", rm.duplicates = FALSE,
. format = "basket", sep = ",", encoding = "UTF-8")
2. as(data, "transactions")
3. asMethod(object)
4. new("transactions", as(from, "itemMatrix"), itemsetInfo = data.frame(transactionID = names(from),
. stringsAsFactors = FALSE))
5. initialize(value, ...)
6. initialize(value, ...)
7. callNextMethod()
8. .nextMethod(.Object = .Object, ... = ...)
9. callNextMethod()
10. .nextMethod(.Object = .Object, ... = ...)
11. as(from, "itemMatrix")
12. asMethod(object)
13. new("ngCMatrix", p = c(0L, p), i = as.integer(i) - 1L, Dim = c(length(levels(i)),
. length(p)))
14. initialize(value, ...)
15. initialize(value, ...)
16. callNextMethod()
17. .nextMethod(.Object = .Object, ... = ...)
18. validObject(.Object)
19. stop(msg, ": ", errors, domain = NA)
Here are some ideas for how to find a rogue line in the data file. The input to read.transactions should be a text file that looks something like
A, B, C
B, C
C, D, E
D, A, B, F
where A, B, C, etc. are the names of the items (probably longer than one character each!)
So you could read in the file using readLines...
data <- readLines("../input/tweets/output.csv")
Each element of data (one per line of the file) should be a string of the form "A, B, C" etc, as above.
You could then use functions (e.g. from the stringr package) to check if any lines contain unusual characters, or have an odd format. Without seeing your file, it is hard to say how to do this, but you might, for example, look for quotes in odd places (str_detect(data, '\\"')) or characters that are not letters, digits, spaces, or commas (str_detect(data, "[^\\w\\d\\s,]")).
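For instance, here is a rough sketch of those checks with stringr (the patterns are only a starting point and may need adjusting for your data):
library(stringr)
# lines containing a quote, or any character other than word characters,
# whitespace and commas
suspicious <- which(str_detect(data, '\\"') | str_detect(data, "[^\\w\\s,]"))
data[suspicious]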
Another thing you could try is to write a for loop to take each element of data (or perhaps larger chunks if that is too slow), save it as a file, try reading it with read.transactions, and see where it crashes.
for(i in seq_along(data)){
  writeLines(data[i], "dummyfile.csv")
  trans <- read.transactions("dummyfile.csv",
                             rm.duplicates = FALSE,
                             format = "basket",
                             sep = ",",
                             encoding = "UTF-8")
}
The value of i when it crashes will give you the problem row number. It might take a long time to run, though!
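If you would rather not have the loop stop with an error, here is a variation using tryCatch() (same idea, just a sketch) that records the offending line numbers instead:
bad_rows <- integer(0)
for (i in seq_along(data)) {
  writeLines(data[i], "dummyfile.csv")
  ok <- tryCatch({
    read.transactions("dummyfile.csv", rm.duplicates = FALSE,
                      format = "basket", sep = ",", encoding = "UTF-8")
    TRUE
  }, error = function(e) FALSE)        # swallow the error, remember the row
  if (!ok) bad_rows <- c(bad_rows, i)
}
bad_rows                               # line numbers that failed to parse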
I ran into a very similar problem: the same error got triggered when trying to cast a list to a transaction object.
I also couldn't easily figure out what lines in the data caused the issue, as it seems to be triggered by a combination of transactions and not necessarily by any individual one, but I managed to track down the source of the problem in this assignment (source):
p <- new("ngCMatrix", p = c(0L, p),
i = as.integer(i) - 1L,
Dim = c(length(levels(i)), length(p)))
My R got pretty rusty over time and I couldn't find an immediate way to patch the code, but I came up with an alternative solution for constructing the ngCMatrix object:
1. Assume you have the data in a data.frame following some sort of (user, item) format - in your case it would most likely be (tweet_id, term/word).
2. Create a unique incremental ID for every user and item and add it to your data.frame.
3. Use those IDs to create the sparse matrix and - optionally - enrich it with the labels for item and user to make it more interpretable.
4. Finally, cast the sparse matrix to a transactions object.
Example (I implemented mine with data.table, but a traditional dataframe implementation would be very similar):
library(Matrix)
library(data.table)
library(arules)
DT <- data.table(user = c('A','A','B','B','A','C','D'),
                 item = c('AAB','AAA','AAB','BBB','ABA','BBB','AAB'))
# Create user_ids
unique_users <- unique(DT$user)
users <- data.table(user = unique_users,
                    user_id = c(1:length(unique_users)))
# Repeat for items
unique_items <- unique(DT$item)
items <- data.table(item = unique_items,
                    item_id = c(1:length(unique_items)))
# Add indexes to original data table (setting keys helps with performance)
DT <- merge.data.table(x = DT, y = users, by = 'user')
DT <- merge.data.table(x = DT, y = items, by = 'item')
# Create the sparse matrix
mat <- sparseMatrix(
  i = DT$item_id,
  j = DT$user_id,
  dims = c(nrow(items), nrow(users)),
  dimnames = list(items$item, users$user)
)
# transform to arules 'transactions'
txn <- as(mat, "transactions")
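Once the coercion succeeds, the result can be sanity-checked with the usual arules helpers, for example:
summary(txn)       # item/transaction counts, density, most frequent items
inspect(txn[1:3])  # look at the first few transactions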
Please note that this doesn't help in understanding what caused the issue; it rather provides a workaround to solve it. In my data.table implementation the code is pretty performant, taking only a few seconds to process over 30M transactions on a laptop-sized machine (2 CPUs, 16 GB RAM).
I'm new at using GIS with R and I'm trying to open an ENVI file containing hyperspectral data, following the suggestions from this post: R how to read ENVI .hdr-file?. However, I don't seem to be able to do so. I tried three different approaches, but all of them failed. I also can't seem to find any other posts where my problem is described.
# install.packages("rgdal")
# install.packages("raster")
# install.packages("caTools")
library("rgdal")
library("raster")
library("caTools")
dirname <- "S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights"
filename <- file.path(dirname, "AISA_Flight_4_resampled")
file.exists(filename)
The first option I tried was using the file name only:
x <- read.ENVI(filename)
But I got the following error message:
#Error in read.ENVI(filename) :
# read.ENVI: Could not open input file: S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled
#In addition: Warning message:
# In nRow * nCol * nBand : NAs produced by integer overflow
I then tried the second option, which is passing the file name plus the header file name built with file.path:
headerfile <- file.path(dirname, "AISA_Flight_4_resampled")
x <- read.ENVI(filename = filename,headerfile = headerfile)
Again, I got an error message that says:
#Error in read.ENVI(filename = filename, headerfile = headerfile) :
# read.ENVI: Could not open input header file: S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled
Finally, I tried the third option, passing the file name plus the header file read with readLines:
hdr_file <- readLines(con = "S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled.hdr")
x <- read.ENVI(filename = filename,headerfile = hdr_file)
But I got the error message:
#Error in read.ENVI(filename = filename, headerfile = hdr_file) :
# read.ENVI: Could not open input header file: ENVIdescription = { Spectrally Resampled File. Input number of bands: 63, output number of bands: 115. [Fri Jun 25 16:57:21 2021]}samples = 5187lines = 6111bands = 115header offset = 0file type = ENVI Standarddata type = 4interleave = bilsensor type = Unknownbyte order = 0map info = {UTM, 1.000, 1.000, 482828.358, 5029367.353, 7.5000000000e-001, 7.5000000000e-001, 15, North, WGS-84, units=Meters}coordinate system string = {PROJCS["UTM_Zone_15N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-93.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]}default bands = {46,31,16}wavelength units = Nanometersdata ignore value = -9999.00000000e+000band names = { Resampled
# In addition: Warning message:
# In if (!file.exists(headerfile)) stop("read.ENVI: Could not open input header file: ", :
# the condition has length > 1 and only the first element will be used
Any help would be really appreciated!
I had a similar problem to the one posted here. To resolve the issue, I followed the answer by Jack Gisby there, but then a new error showed up.
Working on TCGA data, I am getting the same error (the first error):
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
Running duplicated() on each relevant field returned FALSE.
Here is the second error (which appeared just after trimming identifiers so they do not start with a common string like "TCGA-"):
Error in `[.data.frame`(df, neworder2) : undefined columns selected
> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(df, neworder2)
3: df[neworder2]
2: M3Creal(as.matrix(mydata), maxK = maxK, reps = repsreal, pItem = pItem,
pFeature = 1, clusterAlg = clusteralg, distance = distance,
title = "/home/christopher/Desktop/", des = des, lthick = lthick,
dotsize = dotsize, x1 = pacx1, x2 = pacx2, seed = seed, removeplots = removeplots,
silent = silent, fsize = fsize, method = method, objective = objective)
1: M3C(pro.vst, des = clin, removeplots = FALSE, iters = 25, objective = "PAC",
fsize = 8, lthick = 1, dotsize = 1.25)
I've added to an opened issue on the M3C GitHub.
I got the same error as Hamid Ghaedi while running M3C. I managed to track it down to the following line of code (line 476 of the M3C.R file):
df <- data.frame(m_matrix)
Many of my sample names (column names) started with a number, and the data.frame() function added an "X" to the beginning of each such name ("1" becomes "X1"). This caused a mismatch with the names listed in neworder2.
To get around this problem, I changed all of my sample names to start with a letter and M3C is now running correctly.
Edit: This workaround can be easily applied by using the data.frame() function on your input dataset before running M3C.
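Here is a quick toy illustration (made-up matrix and names) of the renaming behaviour described above, and why applying data.frame() up front keeps things consistent:
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("1A", "2B", "3C")))
colnames(data.frame(m))                       # "X1A" "X2B" "X3C" - "X" prepended
colnames(data.frame(m, check.names = FALSE))  # "1A" "2B" "3C" - names untouched
# M3C calls data.frame() with the default check.names = TRUE (see line 476 above),
# hence the mismatch unless the names already start with a letter.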
I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-series microbiome data, as follows:
meta = data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta<- meta[order(meta$day, meta$condition),]
meta.ts<-as.data.frame(t(meta))
otu=matrix(1:390, ncol = 39)
oturar<-rarefyFilter(otu, min=0)
rarotu<-oturar$rar
time<-meta.ts[1,]
interp.otu <- interpolate(rarotu, time.vector = time,
                          method = "stineman", groups = meta$condition)
The interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried to change method to "hyman", but it returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and I am a bit new to R.
Please can anyone tell me what I am doing wrong / how to get around these errors?
Many thanks!
I spent quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable used as input for the time.vector parameter.
When meta.ts is being converted to a data frame, all strings are automatically converted to factors - this includes day.
To adjust, you can edit your code to the following:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(meta.ts[1,]) # Now 'day' is character, so convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
                          method = "stineman", groups = meta$condition)
As a bonus, read this blogpost for information on the stringsAsFactors parameter. Strings automatically being converted to factors is a common source of confusion.
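Here is a toy illustration (made-up values) of why the factor conversion breaks the time vector:
days <- data.frame(day = c("15", "16", "17"), stringsAsFactors = TRUE)
as.integer(days$day)                # 1 2 3   - factor level codes, not the days
as.integer(as.character(days$day))  # 15 16 17 - the actual values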
I want to concatenate the pieces of the URL below, and I have written the following function to do so:
library(datetime)
library(lubridate)
get_thredds_url<- function(mon, hr){
a <-"http://abc.co.in/"
b <-"thredds/path/"
c <-paste0("%02d", ymd_h(mon))
d <-paste0(strftime(datetime_group, format="%Y%m%d%H"))
e <-paste0("/gfs.t%sz.pgrb2.0p25.f%03d",(c, hr))
url <-paste0(a,b,b,d)
return (url)
}
mon = datetime(2017, 9, 26, 0)
hr = 240
url = get_thredds_url(mon,hr)
print (url)
But I am getting the errors below when I execute the definition of get_thredds_url():
Error: unexpected ',' in:
" d<-paste0(strftime(datetime_group, format="%Y%m%d%H"))
e<-paste0("/gfs.t%sz.pgrb2.0p25.f%03d",(c,"
url <-paste0(a,b,b,d)
Error in paste0(a, b, b, d) : object 'a' not found
return (url)
Error: no function to return from, jumping to top level
}
Error: unexpected '}' in "}"
What is wrong with my function and how can I solve this?
The final output should be:
http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240
Using sprintf allows more control over the values being inserted into the string:
library(lubridate)
get_thredds_url <- function(mon, hr){
  sprintf("http://abc.co.in/thredds/path/%s/gfs.t%02dz.pgrb2.0p25.f%03d",
          strftime(mon, format = "%Y%m%d%H", tz = "UTC"),
          hour(mon),
          hr)
}
mon <- make_datetime(2017, 9, 26, 0, tz = "UTC")
hr <- 240
get_thredds_url(mon, hr)
[1] "http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240"
It was a bit messy to figure out what it is you're trying to do. There are quite a few contradictory pieces in your code, especially compared to your wanted final output. Therefore, I decided to focus on the desired output and the inputs you provided in your variables.
get_thredds_url <- function(yr, mnth, day, hrs1, hrs2){
  part1 <- "http://abc.co.in/"
  part2 <- "thredds/path/"
  ymdh <- c(yr, formatC(c(mnth, day, hrs1), width = 2, flag = "0"))
  part3 <- paste0(ymdh, collapse = "")
  pre4 <- formatC(hrs1, width = 2, flag = "0")
  part4 <- paste0("/gfs.t", pre4, "z.pgrb2.0p25.f", hrs2)
  return(paste0(part1, part2, part3, part4))
}
get_thredds_url(2017, 9, 26, 0, 240)
# [1] "http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240"
The key is using paste0() appropriately and I think formatC() may be new to some people (including me).
formatC() is used here to pad zeros in front of the number you provide, and thus makes sure that 9 is converted to 09, whereas 12 remains 12.
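For instance:
formatC(9,  width = 2, flag = "0")   # "09"
formatC(12, width = 2, flag = "0")   # "12"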
Note that this answer is in base R and does not require additional packages.
Also note that you should not use url and c as variable names. These are already the names of existing R functions. By using them as variable names, you are masking their actual purpose, which can (and eventually will) lead to problems at some point down the road.