How to convert a dataframe to a time series


I have the following R dataframe:
> str(H.dark)
'data.frame': 86400 obs. of 18 variables:
$ groupname: Factor w/ 8 levels "rowA","rowB",..: 8 8 8 8 8 8 8 8 8 8 ...
$ location : Factor w/ 96 levels "c1","c10","c11",..: 84 85 86 87 88 90 91 92 93 94 ...
$ starttime: int 7200 7200 7200 7200 7200 7200 7200 7200 7200 7200 ...
$ dark : int 1 1 1 1 1 1 1 1 1 1 ...
$ inadist : num 2.2 2.6 3.9 3.8 2.5 2.2 3 1.3 2.7 1.2 ...
$ smldist : num 11.5 22.6 9 18.6 19.1 18.9 11.7 28 13.9 9.8 ...
$ lardist : num 9.8 5.8 1.9 3.1 2.1 1.3 6 13.6 3.4 0.9 ...
$ emptydur : num 1.5 0.4 0 0.2 0.3 0.7 0.4 0.4 0.1 2 ...
$ inadur : num 2.1 2 3.3 2 2.5 1.8 2.8 1.2 2.6 1.7 ...
$ smldur : num 1.2 2.4 1.6 2.7 2.1 2.5 1.6 2.8 2.1 1.2 ...
$ lardur : num 0.2 0.3 0.1 0.1 0.1 0.1 0.3 0.6 0.1 0.1 ...
$ emptyct : int 5 7 1 1 5 1 1 3 1 2 ...
$ entct : int 0 0 0 0 0 0 0 0 0 0 ...
$ inact : int 8 10 9 11 13 9 12 7 7 6 ...
$ smlct : int 10 16 11 14 16 12 16 15 9 7 ...
$ larct : int 2 3 1 1 1 1 3 5 1 1 ...
I wish to turn it into a timeseries using ts() because ts has some nice functions for lags and other time specific analyses. The starttime column is the time. The min value is 7200 seconds and the max is 43200. Observations occur every 5 seconds. I've read the ts documentation and I think I may first have to convert my dataframe to a matrix. But I'm completely unclear on how I need to rearrange the data, or if it indeed needs to be rearranged.
Can anyone offer some clarity?
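One possible approach, sketched under assumptions rather than taken from the original post: ts() expects one row per time step, so the long data frame would first be reshaped so that each location becomes its own column, and the regular 5-second spacing is then passed through deltat. The toy data and the names toy, wide and inadist_ts are hypothetical stand-ins for H.dark, and the sketch assumes each location is observed once per time step.

```r
# Toy stand-in for H.dark: 3 locations observed every 5 seconds (hypothetical data)
times <- seq(7200, 7220, by = 5)
toy <- expand.grid(location = c("c1", "c2", "c3"), starttime = times)
toy$inadist <- seq_len(nrow(toy)) / 10

# Spread one measurement into a time x location matrix (one row per time step)
wide <- tapply(toy$inadist, list(toy$starttime, toy$location), mean)

# Build a multivariate ts: first observation at 7200 s, one sample every 5 s
inadist_ts <- ts(wide, start = 7200, deltat = 5)

time(inadist_ts)                          # time stamps in seconds
lagged <- stats::lag(inadist_ts, k = -1)  # same values, shifted one step (5 s) later
```

A ts object holds a single numeric matrix, so the factor columns (groupname, location) cannot be carried along as data; in this sketch location survives only as the column names.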

Related

Selecting subsets of each dataset in a list in R

After using kfold from the dismo package, I am attempting to select a subset of the groups that this function makes from different datasets in a list in R. With an individual dataset, this is easy:
#With an individual dataset:
library(dismo)
data_car <- mtcars
group_presence <- kfold(x = data_car, k = 5) # kfold is in dismo package
# Separate observations into training and testing groups:
presence_train <- data_car[group_presence != 1, ]
But, I can't seem to get it to work across multiple datasets in a list in R:
#Now, with listed datasets:
data_1 <- mtcars
data_2 <- iris
mylist <- list(data_1, data_2)
mylist_data <- lapply(mylist, function(q) {
data = q
return(data)
})
mylist_groups <- lapply(mylist, function(q) {
group_item = kfold(x = q,
k = 5)
q$group_obj = group_item
return(q)
})
presence_train <- mylist_groups[group_obj != 1, ]
#Result:
Error: object 'group_obj' not found

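We could use Map to pair each dataset with its own vector of fold labels. First create the labels, one vector per dataset:
mylist_groups <- lapply(mylist, function(q) {
kfold(x = q,
k = 5)})
Then subset each data frame with the matching vector:
out <- Map(function(x, y) x[y != 1, ], mylist, mylist_groups)
Output: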
> str(out)
List of 2
$ :'data.frame': 26 obs. of 11 variables:
..$ mpg : num [1:26] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:26] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:26] 160 160 108 258 360 ...
..$ hp : num [1:26] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:26] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:26] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:26] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:26] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:26] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:26] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:26] 4 4 1 1 2 1 4 2 2 4 ...
$ :'data.frame': 120 obs. of 5 variables:
..$ Sepal.Length: num [1:120] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:120] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:120] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:120] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
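The same pattern can be tried without installing dismo. The sketch below swaps in a hypothetical helper, assign_folds(), as a base-R stand-in for kfold(); it is an assumption made for illustration and does not reproduce how dismo assigns folds.

```r
# Hypothetical stand-in for dismo::kfold(): random fold labels 1..k, one per row
assign_folds <- function(d, k = 5) sample(rep(seq_len(k), length.out = nrow(d)))

set.seed(1)
mylist <- list(cars = mtcars, flowers = iris)

# One vector of fold labels per dataset, kept parallel to mylist
mylist_groups <- lapply(mylist, assign_folds, k = 5)

# Pair each data frame with its own labels: fold 1 is held out for testing
presence_train <- Map(function(d, g) d[g != 1, ], mylist, mylist_groups)
presence_test  <- Map(function(d, g) d[g == 1, ], mylist, mylist_groups)

sapply(presence_train, nrow)
```

The original attempt failed because group_obj is a column inside each list element, not an object in the workspace. If the labels are stored as a column, as in the question, something like lapply(mylist_groups, function(d) d[d$group_obj != 1, ]) should work as well.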

Error message in R: replacement has (x) rows, data has (y)

I am getting an error message when I am attempting to change the name of a data element in a column. This is the structure of the data frame I am using.
'data.frame': 2070 obs. of 7 variables:
$ Period: Factor w/ 188 levels "1 v 1","10 v 10",..: 158 158 158 158 158 158 158 158 158 158 ...
$ Dist : num 7548 7421 9891 8769 10575 ...
$ HIR : num 2676 2286 3299 2455 3465 ...
$ V6 : num 66.2 18.5 81 40 275.1 ...
$ Date : Factor w/ 107 levels "1/3/17","1/4/17",..: 38 38 38 38 38 38 38 38 38 38 ...
$ Type : Factor w/ 28 levels "Captain's Run",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Day : Factor w/ 8 levels "Friday","Monday",..: 1 1 1 1 1 1 1 1 1 1 ...
I wish to change the value Main Session in db$Type to Main Training so I can match this data frame to another I'm using. I'm using the code below to try and do this.
class(db$Type)
db$Type <- as.character(db$Type)
db$Type["Main Session"] = "Main Training"
I am getting this error message when I attempt to run the piece of code.
db$Type["Main Session"] = "Main Training"
Error in `$<-.data.frame`(`*tmp*`, Type, value = c("Main Session", "Main Session", :
replacement has 2071 rows, data has 2070
Being relatively new to R, is there anything I am missing in my code that could resolve this issue? Any suggestions will be greatly appreciated. Thank you.

The error you are encountering comes from your subset operation: db$Type["Main Session"] = "Main Training".
Using the iris dataset in R we can reproduce this error:
str(iris)
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
class(iris$Species)
#> [1] "factor"
iris$Species<- as.character(iris$Species)
iris$Species["setosa"] <- "new name"
#> Error in `$<-.data.frame`(`*tmp*`, Species, value = structure(c("setosa", : replacement has 151 rows, data has 150
Created on 2018-09-03 by the reprex package (v0.2.0).
Inside the square brackets you need to subset the vector using a logical operation (i.e. one that evaluates to TRUE or FALSE):
str(iris)
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris$Species<- as.character(iris$Species)
unique(iris$Species)
#> [1] "setosa" "versicolor" "virginica"
iris$Species[iris$Species == "setosa"] <- "new name"
unique(iris$Species)
#> [1] "new name" "versicolor" "virginica"
Created on 2018-09-03 by the reprex package (v0.2.0).
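The off-by-one in the error message (151 rows against 150) can be seen on a plain vector. This is a small sketch with made-up vectors x and y, not the original data: a character index refers to element names, and when no element has that name R appends a new one.

```r
x <- c("setosa", "versicolor", "virginica")

# A character index looks up element *names*; x has none, so this appends
x["setosa"] <- "new name"
length(x)   # 4: one longer than before
names(x)

# A logical index selects by value and leaves the length unchanged
y <- c("setosa", "versicolor", "virginica")
y[y == "setosa"] <- "new name"
length(y)   # 3
```

Assigning the lengthened vector back into the data frame is what triggers "replacement has 151 rows, data has 150".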

R glarma error: "requires numeric/complex matrix/vector arguments"

This is my data:
'data.frame': 72 obs. of 7 variables:
$ X1 : chr "2011M1" "2011M2" "2011M3" "2011M4" ...
$ KPR : int 0 0 0 0 0 0 0 0 0 0 ...
$ LTV : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ sukubunga: num 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 ...
$ inflasi : num 0.89 0.13 -0.32 -0.31 0.12 0.55 0.67 0.93 0.27 -0.12 ...
$ npl : num 2.31 2.39 2.22 2.2 2.12 ...
$ sbkredit : num 11.4 11.4 11.3 11.4 11.3 ...
I use the glarma package, and these are my steps:
library(readr)
b <- read_csv("E:/b.csv")
dataku<-as.data.frame(b)
dataku$LTV<-as.factor(dataku$LTV)
dataku$LTV<-relevel(dataku$LTV,ref="0")
glmmo<-glm(KPR~LTV+sbkredit+inflasi+npl,data=dataku,family=binomial(link=logit),na.action=na.omit,x=TRUE)
summary(glmmo)
X<-glmmo$x
X<-as.matrix(X)
y1<-dataku$KPR
n1<-rep(1,length(dataku$X))
Y<-cbind(y1,n1-y1)
Y<-as.matrix(Y)
library(glarma)
glarmamo<-glarma(Y,X,phiLags=c(1),phiInit=c(0.6),type="Bin",method="FS",residuals="Pearson",maxit=100,grad=1e-6)
But I get this error:
Error in GL$cov %*% GL$ll.d : requires numeric/complex matrix/vector arguments
When I multiply GL$cov %*% GL$ll.d for
So, what should I do?

How to use arithmetic with tapply() in R

I'm calling height, diameter and age from a csv file. I'm trying to calculate the volume of the tree using pi x h x r^2. In order to calculate the radius, I'm taking dbh and dividing it by 2. Then I get this error.
Error in dbh/2 : non-numeric argument to binary operator
setwd("/Users/user/Desktop/")
treeg <- read.csv("treeg.csv",row.names=1)
head(treeg)
heights <- tapply(treeg$height.ft,treeg$forest, identity)
ages <- tapply(treeg$age,treeg$forest, identity)
dbh <- tapply(treeg$dbh.in,treeg$forest, identity)
radius <- dbh / 2
The object dbh stores the diameters from the csv file grouped by forest, which is the ID.
How can I divide dbh by 2 while still keeping each value stored under its respective ID (the forest, i.e. treeg$forest)? treeg is the data frame read from the csv file.
> head(treeg)
tree.ID forest habitat dbh.in height.ft age
1 1 4 5 14.6 71.4 55
2 1 4 5 12.4 61.4 45
3 1 4 5 8.8 40.1 35
4 1 4 5 7.0 28.6 25
5 1 4 5 4.0 19.6 15
6 2 4 5 20.0 103.4 107
str(dbh)
List of 9
$ 1: num [1:36] 19.9 18.6 16.2 14.2 12.3 9.4 6.8 4.9 2.6 22 ...
$ 2: num [1:60] 16.5 15.5 14.5 13.7 12.7 11.4 9.5 8 5.9 4.1 ...
$ 3: num [1:50] 18.4 17.2 15.6 13.7 11.6 8.5 5.3 2.8 13.3 10.6 ...
$ 4: num [1:81] 14.6 12.4 8.8 7 4 20 18.8 17 15.9 14 ...
$ 5: num [1:153] 28 27.2 26.1 25 23.7 21.3 19 16.7 12.2 9.8 ...
$ 6: num [1:22] 21.3 20.2 19.1 18 16.9 15.6 14.8 13.3 11.3 9.2 ...
$ 7: num [1:63] 13.9 12.4 10.6 8.1 5.8 3.4 27 25.6 23 20.2 ...
$ 8: num [1:27] 20.8 17.7 15.6 13.2 10.5 7.5 4.8 2.9 12.9 11.3 ...
$ 9: num [1:50] 23.6 20.5 16.9 14.1 11.1 8 5.1 2.9 24.1 20.9 ...
- attr(*, "dim")= int 9
- attr(*, "dimnames")=List of 1
..$ : chr [1:9] "1" "2" "3" "4" ...

Are you just trying to create a radius column that is dbh.in divided by two?
treeg <- read.table(textConnection("tree.ID forest habitat dbh.in height.ft age
1 1 4 5 14.6 71.4 55
2 1 4 5 12.4 61.4 45
3 1 4 5 8.8 40.1 35
4 1 4 5 7.0 28.6 25
5 1 4 5 4.0 19.6 15
6 2 4 5 20.0 103.4 107"), header=TRUE)
treeg$radius <- treeg$dbh.in / 2
Or do you need that dbh list for something...
dbh <- tapply(treeg$dbh.in,treeg$forest, identity)
> dbh
$`4`
[1] 14.6 12.4 8.8 7.0 4.0 20.0
lapply(dbh, function(x)x/2)
List of 1
$ 4: num [1:6] 7.3 6.2 4.4 3.5 2 10
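The original error arises because tapply(..., identity) returns a list (one numeric vector per forest), and / is not defined for lists. A minimal sketch of both routes follows, using a small made-up data frame; the forest 5 rows and the inch-to-foot conversion are assumptions added for illustration, not part of the question.

```r
# Toy data: the first three rows mirror the question, forest 5 is made up
treeg <- data.frame(
  forest    = c(4, 4, 4, 5, 5),
  dbh.in    = c(14.6, 12.4, 8.8, 20.0, 18.8),
  height.ft = c(71.4, 61.4, 40.1, 103.4, 98.0)
)

# Route 1: compute row by row, then group afterwards.
# Assumption: dbh is in inches, so the radius is converted to feet first.
treeg$radius.ft  <- treeg$dbh.in / 2 / 12
treeg$volume.ft3 <- pi * treeg$height.ft * treeg$radius.ft^2
volume_by_forest <- split(treeg$volume.ft3, treeg$forest)

# Route 2: if the tapply() list already exists, divide inside each element
dbh    <- tapply(treeg$dbh.in, treeg$forest, identity)
radius <- lapply(dbh, function(x) x / 2)
```

Route 1 keeps everything in the data frame, so the values stay aligned with tree.ID and age; split() then gives the per-forest grouping only where it is needed.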

replacing randomly values in an existing matrix in R

I have an existing matrix and I want to replace some of the existing values by NA's in a random uniform way.
I tried to use the following, but it only replaced 392 values with NA, not 452 as I expected. What am I doing wrong?
N <- 452
ind1 <- (runif(N,2,length(macro_complet$Sod)))
macro_complet$Sod[ind1] <- NA
summary(macro_complet$Sod)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.3222 0.9138 1.0790 1.1360 1.3010 2.8610 392.0000
My data looks like this
> str(macro_complet)
'data.frame': 1504 obs. of 26 variables:
$ Sod : num 8.6 13.1 12 13.8 12.9 10 7 14.8 11.3 4.9 ...
$ Azo : num 2 1.7 2.2 1.9 1.89 1.61 1.72 2.1 1.63 2 ...
$ Cal : num 26 28.1 24 28.5 24.5 24 17.4 26.6 24.8 10.5 ...
$ Bic : num 72 82 81 84 77 68 66 81 70 37.8 ...
$ DBO : num 3 2.2 3 2.7 3.3 3 3.2 2.9 2.8 2 ...
$ AzoK : num 0.7 0.7 0.9 0.8 0.7 0.7 0.7 0.9 0.7 0.7 ...
$ Orho : num 0.3 0.2 0.31 0.19 0.19 0.2 0.16 0.24 0.2 0.01 ...
$ Ammo : num 0.12 0.16 0.15 0.13 0.19 0.22 0.19 0.16 0.17 0.08 ...
$ Carb : num 0.3 0.3 2 0.3 0.3 0.3 0.3 0.3 0.3 0.5 ...
$ Ox : num 10.2 9.7 9.8 9.6 9.7 9.1 9.1 8.1 9.7 10.6 ...
$ Mag : num 5.5 6.5 6.3 7 6.4 5.1 6 6.7 5.7 2 ...
$ Nit : num 4.2 4.7 5.7 4.6 4.2 3.5 4.9 4.5 4.2 2.8 ...
$ Matsu : num 17 9 24 15 17 19 20 19 13 3.9 ...
$ Tp : num 10.5 9.7 11.9 12 12.9 11.2 12.8 13.7 11.5 10.6 ...
$ Co : num 3 3.45 3.3 3.54 2.7 2.7 3.3 3.49 2.8 1.8 ...
$ Ch : num 17 24 22 28 25 19 13 28 23 6.4 ...
$ Cu : num 25 15 20 20 15 20 15 15 20 15 ...
$ Po : num 3.5 3.8 4 3.6 3.8 3.7 3 4.2 3.7 0.4 ...
$ Ph : num 0.2 0.17 0.2 0.14 0.18 0.2 0.17 0.17 0.17 0.01 ...
$ Cnd : int 226 275 285 295 272 225 267 283 251 61 ...
$ Txs : num 93 88 89 86 87 88 84 80 91 94 ...
$ Niti : num 0.06 0.09 0.07 0.06 0.08 0.07 0.08 0.11 0.1 0.01 ...
$ Dt : num 9 9.7 9 10.2 8 8 7 9.4 8.5 3 ...
$ H : num 7.6 7.7 7.6 7.7 7.55 7.4 7.3 7.5 7.5 7.6 ...
$ Dco : int 17 12 15 13 15 20 16 14 12 7 ...
$ Sf : num 22 20.5 18 22.2 22.1 21 11.6 21.7 21.9 6.8 ...
I also tried to do this for only a single variable, but got the same result.
I converted my data frame into a matrix using
as.matrix(n1)
then I replaced some values for only one variable
N <- 300
ind <- (runif(N,1,length(n1$Sodium)))
n1$Sodium[ind] <- NA
However, using summary() I observed that only 262 values were replaced instead of 300 as expected. What am I doing wrong?
summary(n1$Sodium)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.3222 0.8976 1.0790 1.1320 1.3010 2.8610 262.0000

Try this. This will sample your matrix uniformly without replacement (so the same value is not chosen and replaced twice). If you want some other distribution, you can modify the weights using the prob argument (see ?sample)
vec <- matrix(1:25, nrow = 5)
vec[sample(1:length(vec), 4, replace = FALSE)] <- NA
vec
[,1] [,2] [,3] [,4] [,5]
[1,] NA 6 NA 16 NA
[2,] NA 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25

You must apply runif in the right spot, which is the index to vec. (The way you have it now, you are asking R to draw random numbers from a uniform distribution between NA and NA, which of course does not make sense and so it gives you back NaNs.)
Try instead:
N <- 5 # the number of random values to replace
inds <- round ( runif(N, 1, length(vec)) ) # draw random values from [1, length(vec)]
vec[inds] <- NA # use the random values as indices to vec, for which to replace
Note that round(.) is not strictly necessary, since [ accepts non-integer numerics and truncates them toward zero. Either way the resulting indices are only approximately uniform, and the same index can be drawn more than once.

We could use
vec[sample(seq_along(vec), 4, replace = FALSE)] <- NA
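The shortfall in the question (392 NAs instead of 452) is consistent with repeated indices: runif() can return values that truncate to the same position, so some cells are set to NA more than once. The sketch below illustrates this on a made-up vector x standing in for macro_complet$Sod; the seed and data are arbitrary.

```r
set.seed(42)
x <- rnorm(1504)   # hypothetical stand-in for macro_complet$Sod
N <- 452

# Truncated uniform draws behave like sampling WITH replacement: indices repeat
idx_runif <- floor(runif(N, 1, length(x)))
length(unique(idx_runif))      # fewer than N distinct positions

# sample() without replacement returns N distinct positions
idx_sample <- sample(seq_along(x), N)
x[idx_sample] <- NA
sum(is.na(x))                  # exactly N
```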
