Truncate a Time-Series in R - r

I'm running a continuous Morlet wavelet transform (CWT) on a time series using the R package dplR. The series (gam_15min) is sampled every 15 minutes and has 7968 points (83 days of measurements).
I have the following output:
cwtGamma=morlet(gam_15min,x1=seq_along(gam_15min),p2=NULL,dj=0.1,siglvl=0.95)
str(cwtGamma)
List of 9
$ y : Time-Series [1:7968] from 1 to 1993: 672 674 673 672 672 ...
$ x : int [1:7968] 1 2 3 4 5 6 7 8 9 10 ...
$ wave : cplx [1:7968, 1:130] -0.00332+0.0008i 0.00281-0.00181i -0.00194+0.00234i ...
$ coi : num [1:7968] 0.73 1.46 2.19 2.92 3.65 ...
$ period: num [1:130] 1.03 1.11 1.19 1.27 1.36 ...
$ Scale : num [1:130] 1 1.07 1.15 1.23 1.32 ...
$ Signif: num [1:130] 0.000382 0.001418 0.005197 0.018514 0.062909 ...
$ Power : num [1:7968, 1:130] 1.17e-05 1.11e-05 9.26e-06 7.09e-06 5.54e-06 ...
$ siglvl: num 0.95
In my analysis I want to truncate the time series (I suppose $wave) by removing one period length at the beginning and one at the end. How do I do that? Maybe it's easy, but I'm not seeing how... Thanks
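One possible approach, as a minimal sketch rather than a definitive answer: the time axis runs along the rows of $wave and $Power and along $y, $x and $coi, so all of those can be subset with the same index. The helper trim_cwt and the mock object below are hypothetical (the mock only mimics the structure printed above), and n_trim, the number of samples in "one period length", is an assumption you would have to choose yourself, e.g. 96 if the period of interest is one day of 15-minute data.

```r
# Mock object with the same components as the morlet() output shown above
# (made-up values; 20 time steps and 4 scales instead of 7968 and 130).
n <- 20
n_scales <- 4
cwt_mock <- list(
  y      = ts(sin(seq_len(n))),
  x      = seq_len(n),
  wave   = matrix(complex(real = seq_len(n * n_scales), imaginary = 1), nrow = n),
  coi    = 0.73 * pmin(seq_len(n), rev(seq_len(n))),
  period = c(1.03, 1.11, 1.19, 1.27),
  Scale  = c(1, 1.07, 1.15, 1.23),
  Signif = c(0.0004, 0.0014, 0.0052, 0.0185),
  Power  = matrix(seq_len(n * n_scales) / 100, nrow = n),
  siglvl = 0.95
)

# Hypothetical helper: drop n_trim time steps from each end of every
# time-indexed component; the per-scale components are left untouched.
trim_cwt <- function(cwt, n_trim) {
  n_time <- length(cwt$x)
  stopifnot(n_trim >= 0, 2 * n_trim < n_time)
  keep <- seq(n_trim + 1, n_time - n_trim)
  cwt$y     <- cwt$y[keep]   # note: plain subsetting drops the ts attributes
  cwt$x     <- cwt$x[keep]
  cwt$coi   <- cwt$coi[keep]
  cwt$wave  <- cwt$wave[keep, , drop = FALSE]
  cwt$Power <- cwt$Power[keep, , drop = FALSE]
  cwt
}

trimmed <- trim_cwt(cwt_mock, n_trim = 3)
dim(trimmed$wave)
```

With the real object this would be called as trim_cwt(cwtGamma, n_trim = 96). Note that the trimmed $coi values still describe the cone of influence of the original, untrimmed series.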

Related

Box-Cox Tranformation Error: object 'x' not found

hopefully a relatively easy one for those more experienced than me!
Trying to perform a Box-Cox transformation using the following code:
fit <- lm(ABOVEGROUND_BIO ~ TREATMENT * P_LEVEL, data = MYCORRHIZAL_VARIANCE)
bc <- boxcox(fit)
lambda<-with(bc, x[which.max(y)])
MYCORRHIZAL_VARIANCE$bc <- ((x^lambda)-1/lambda)
boxplot(bc ~ TREATMENT * P_LEVEL, data = MYCORRHIZAL_VARIANCE)
however when I run it, I get the following error message:
Error: object 'x' not found. (on line 4)
For context, here's the str of my dataset:
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 24 obs. of 14 variables:
$ TREATMENT : Factor w/ 2 levels "Mycorrhizal",..: 1 1 1 1 1 1 1 1 1 1 ...
$ P_LEVEL : Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 2 2 2 2 ...
$ REP : int 1 2 3 4 5 6 1 2 3 4 ...
$ ABOVEGROUND_BIO : num 7.5 6.8 5.3 6 6.7 7 12 12.7 12 10.2 ...
$ BELOWGROUND_BIO : num 3 2.4 2 4 2.7 3.6 7.9 8.8 9.5 9.2 ...
$ ROOT_SHOOT : num 0.4 0.35 0.38 0.67 0.4 0.51 0.66 0.69 0.79 0.9 ...
$ ROOT_SHOOT.log : num -0.916 -1.05 -0.968 -0.4 -0.916 ...
$ ABOVEGROUND_BIO.log : num 2.01 1.92 1.67 1.79 1.9 ...
$ ABOVEGROUND_BIO.sqrt : num 2.74 2.61 2.3 2.45 2.59 ...
$ ABOVEGROUND_BIO.cubert: num 1.96 1.89 1.74 1.82 1.89 ...
$ BELOWGROUND_BIO.log : num 1.099 0.875 0.693 1.386 0.993 ...
$ BELOWGROUND_BIO.sqrt : num 1.73 1.55 1.41 2 1.64 ...
$ BELOWGROUND_BIO.cubert: num 1.44 1.34 1.26 1.59 1.39 ...
$ TOTAL_BIO : num 10.5 9.2 7.3 10 9.4 10.6 19.9 21.5 21.5 19.4 ...
- attr(*, "spec")=
.. cols(
.. TREATMENT = col_factor(levels = c("Mycorrhizal", "Non-mycorrhizal"), ordered = FALSE, include_na = FALSE),
.. P_LEVEL = col_factor(levels = c("Low", "High"), ordered = FALSE, include_na = FALSE),
.. REP = col_integer(),
.. ABOVEGROUND_BIO = col_number(),
.. BELOWGROUND_BIO = col_number(),
.. ROOT_SHOOT = col_number()
.. )
I understand there's no variable named bc in the MYCORRHIZAL_VARIANCE dataset, but I'm just following basic instructions given to me on performing a Box-Cox, and I guess I'm confused as to what 'x' should actually be denoted as, since I thought 'x' was being defined in line 3? Any suggestions as to how to fix this error?
Thanks in advance!
I thought 'x' was being defined in line 3?
Line 3 is lambda<-with(bc, x[which.max(y)]). It doesn't define x; it defines lambda. It does use x, but with() looks that x up inside bc, not in your workspace. If you're using boxcox() from the MASS package, bc does contain x and y components, so line 3 works. The error comes from line 4, where x is referenced on its own and no such object exists. Writing bc$x there would get rid of this error message, but I'd then expect an error about the replacement lengths, because...
bc$x holds the candidate lambda values evaluated by boxcox(). The default grid is seq(-2, 2, 1/10), i.e. 41 values, which boxcox() interpolates to 100 points when it draws the plot. Either way, that is not the 24 values needed to fill a new column in your data.
Line 3 picks out the lambda that maximizes the likelihood, so you never need the rest of the values in bc again. Use that lambda to transform your response variable, since that's what the Box-Cox transformation is for. ((x^lambda)-1/lambda) doesn't make statistical or programmatic sense. Use this instead:
MYCORRHIZAL_VARIANCE$bc <- (MYCORRHIZAL_VARIANCE$ABOVEGROUND_BIO ^ lambda - 1) / lambda
(Note that I also corrected the parentheses. You want (y ^ lambda - 1) / lambda, not (y ^ lambda) - 1 / lambda.)
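To see the whole workflow end to end, here is a minimal sketch on the built-in trees dataset, used as a stand-in for MYCORRHIZAL_VARIANCE. The helper box_cox is hypothetical (it is not part of MASS), and the model formula is only an example.

```r
library(MASS)  # provides boxcox(); MASS ships with standard R installations

# Stand-in model on a built-in dataset (the response must be strictly positive).
fit <- lm(Volume ~ Height + Girth, data = trees)

# plotit = FALSE skips the plot and returns the grid of lambdas (x)
# with their profile log-likelihoods (y).
bc <- boxcox(fit, plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]

# Box-Cox transform of the response; by definition it becomes log(y) at lambda = 0.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

trees_bc <- trees
trees_bc$Volume_bc <- box_cox(trees$Volume, lambda)
print(lambda)
head(trees_bc)
```

The new column has one value per row of the data, which is why the assignment succeeds here, whereas assigning bc$x would not.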

I'm getting an error while trying to create a confusion matrix

I'm getting the following error while trying to generate the confusion Matrix - this used to work.
str(credit_test)
# Generate predicted classes using the model object
class_prediction <- predict(object=credit_model,
newdata=credit_test,
type="class")
class(class_prediction)
class(credit_test$ACCURACY)
# Calculate the confusion matrix for the test set
confusionMatrix(data=class_prediction, reference=credit_test$ACCURACY)
'data.frame': 20 obs. of 4 variables:
$ ACCURACY : Factor w/ 2 levels "win","lose": 1 1 1 2 2 1 1 1 1 1 ...
$ PM_HIGH : num 5.7 5.12 10.96 7.99 1.73 ...
$ OPEN_PRICE: num 4.46 3.82 9.35 7.77 1.54 5.17 1.88 2.65 5.71 4.09 ...
$ PM_VOLUME : num 0.458 0.676 1.591 3.974 1.785 ...
[1] "factor"
[1] "factor"
**Error in confusionMatrix(data=class_prediction, reference=credit_test$ACCURACY) :
unused arguments (data=class_prediction, reference=credit_test$ACCURACY)**
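For some reason I now have to call it with the package prefix; something has changed. Presumably another loaded package (or an object in my workspace) also defines confusionMatrix and masks caret's version: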
caret::confusionMatrix(data=class_prediction,reference=credit_test$ACCURACY)
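The "unused arguments" message is what R reports when the name resolves to a function that does not have data and reference parameters. A minimal sketch of that situation, which does not need caret: the confusionMatrix defined below is a hypothetical stand-in for whatever is doing the masking, and the factors are made up.

```r
# Stand-in for a second definition of confusionMatrix() with different
# argument names (hypothetical; it only mimics what a masking package does).
confusionMatrix <- function(actual, predicted) table(actual, predicted)

pred  <- factor(c("win", "lose", "win"), levels = c("win", "lose"))
truth <- factor(c("win", "win", "win"), levels = c("win", "lose"))

# Calling it with caret-style argument names fails before any work is done.
msg <- tryCatch(
  confusionMatrix(data = pred, reference = truth),
  error = function(e) conditionMessage(e)
)
print(msg)

# Where does the name currently resolve to? The first entry on the search path wins.
print(find("confusionMatrix"))
print(environmentName(environment(confusionMatrix)))
```

If find() lists something ahead of package:caret, either keep calling caret::confusionMatrix() explicitly or load caret last so that its version is found first.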

“length of 'dimnames' [2] not equal to array extent”

So I have seen questions regarding this error code before, but the suggested troubleshooting that worked for those authors didn't help me diagnose. I'm self-learning R and new to Stackoverflow, so please give me constructive feedback on how to better ask my question, and I will do my best to provide necessary information. I've seen many, similar questions put on hold so I want to help you to help me. I'm sure the error probably stems from my lack of experience in data prep.
I'm trying to run a panel data model, loaded as .csv and this error returns when the model is run
fixed = plm(Y ~ X, data=pdata, model = "within")
Error in `colnames<-`(`*tmp*`, value = "1") :
length of 'dimnames' [2] not equal to array extent
running str() on my dataset returns that ID and Time are factors with 162 levels and 7 levels, respectively.
str(pdata)
Classes ‘plm.dim’ and 'data.frame': 1127 obs. of 11 variables:
$ ID : Factor w/ 162 levels "1","2","3","4",..: 1 1 1 1 1 1 1 2 2 2 ...
$ Time : Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7 1 2 3 ...
$ Online.Service.Index : num 0.083 0.131 0.177 0.268 0.232 ...
$ Eparticipation : num 0.0345 0.0328 0.0159 0.0454 0.0571 ...
$ CPI : num 2.5 2.6 2.5 1.5 1.4 0.8 1.2 2.5 2.5 2.4 ...
$ GE.Est : num -1.178 -0.883 -1.227 -1.478 -1.466 ...
$ RL.Est : num -1.67 -1.71 -1.72 -1.95 -1.9 ...
$ LN.Pop : num 16.9 17 17 17.1 17.1 ...
$ LN.GDP.Cap : num 5.32 5.42 5.55 5.95 6.35 ...
$ Human.Capital.Index : num 0.268 0.268 0.268 0.329 0.364 ...
$ Telecommunication.Infrastructure.Index: num 0.0016 0.00173 0.00202 0.01576 0.03278 ...
Still, I don't see how it would create this error. I've tried transforming it as a data frame or matrix, with the same result (I got desperate and it worked for some people)
dim() yields
[1] 1127 11
I have some NA values, but I understand that these shouldn't cause a problem. Again, I'm self-taught and new here, so please take it easy on me! Hope I explained the problem well.

Build a proper dataframe from a matrix list after importing .xlsx file

Implemented:
I am importing a .xlsx file into R.
This file consists of three sheets.
I am binding all the sheets into a list.
Need to Implement
Now I want to combine this list of matrices into a single data.frame, with names(dataset) as the header.
I tried using as.data.frame with read.xlsx as described in the help, but it did not work.
I also tried as.data.frame(as.table(dataset)) explicitly, but that still generates a long list of data.frames and nothing like what I want.
I want a structure with
header = names and the values below that, just like the way read.table imports the data.
This is the code I am using:
xlfile <- list.files(pattern = "*.xlsx")
wb <- loadWorkbook(xlfile)
sheet_ct <- wb$getNumberOfSheets()
b <- rbind(list(lapply(1:sheet_ct, function(x) {
res <- read.xlsx(xlfile, x, as.data.frame = TRUE, header = TRUE)
})))
b <- b [-c(1),] # Just want to remove the second header
I want to have the data arrangement something like below.
Ei Mi hours Nphy Cphy CHLphy Nhet Chet Ndet Cdet DON DOC DIN DIC AT dCCHO TEPC Ncocco Ccocco CHLcocco PICcocco par Temp Sal co2atm u10 dicfl co2ppm co2mol pH
1 1 1 1 0.1023488 0.6534707 0.1053458 0.04994161 0.3308593 0.04991916 0.3307085 0.05042275 49.76304 14.99330000 2050.132 2150.007 0.9642220 0.1339044 0.1040715 0.6500288 0.1087667 0.1000664 0.0000000 9.900000 31.31000 370 0.01 -2.963256000 565.1855 0.02562326 7.879427
2 1 1 2 0.1045240 0.6448216 0.1103250 0.04988347 0.3304699 0.04984045 0.3301691 0.05085697 49.52745 14.98729000 2050.264 2150.007 0.9308690 0.1652179 0.1076058 0.6386706 0.1164099 0.1001396 0.0000000 9.900000 31.31000 370 0.01 -2.971632000 565.7373 0.02564828 7.879042
3 1 1 3 0.1064772 0.6369597 0.1148174 0.04982555 0.3300819 0.04976363 0.3296314 0.05130091 49.29323 14.98221000 2050.396 2150.007 0.8997098 0.1941872 0.1104229 0.6291149 0.1225822 0.1007908 0.8695131 9.900000 31.31000 370 0.01 -2.980446000 566.3179 0.02567460 7.878636
4 1 1 4 0.1081702 0.6299084 0.1187672 0.04976784 0.3296952 0.04968840 0.3290949 0.05175249 49.06034 14.97810000 2050.524 2150.007 0.8705440 0.2210289 0.1125141 0.6213265 0.1273103 0.1018360 1.5513170 9.900000 31.31000 370 0.01 -2.989259000 566.8983 0.02570091 7.878231
5 1 1 5 0.1095905 0.6239005 0.1221460 0.04971029 0.3293089 0.04961446 0.3285598 0.05220978 48.82878 14.97485000 2050.641 2150.007 0.8431960 0.2459341 0.1140222 0.6152447 0.1308843 0.1034179 2.7777070 9.900000
Please don't suggest putting all the data on a single sheet, or converting the .xlsx to .csv or plain text. I am trying really hard to get a proper data frame from an .xlsx file.
Following is the file
And this is the post following : Followup
This is what resulted:
str(full_data)
'data.frame': 0 obs. of 19 variables:
$ Experiment : Factor w/ 2 levels "#","1":
$ Mesocosm : Factor w/ 10 levels "#","1","2","3",..:
$ Exp.day : Factor w/ 24 levels "1","10","11",..:
$ Hour : Factor w/ 24 levels "108","12","132",..:
$ Temperature: Factor w/ 125 levels "10","10.01","10.02",..:
$ Salinity : num
$ pH : num
$ DIC : Factor w/ 205 levels "1582.2925","1588.6475",..:
$ TA : Factor w/ 117 levels "1813","1826",..:
$ DIN : Factor w/ 66 levels "0.2","0.3","0.4",..:
$ Chl.a : Factor w/ 156 levels "0.171","0.22",..:
$ PIC : Factor w/ 194 levels "-0.47","-0.96",..:
$ POC : Factor w/ 199 levels "-0.046","1.733",..:
$ PON : Factor w/ 151 levels "1.675","1.723",..:
$ POP : Factor w/ 110 levels "0.032","0.034",..:
$ DOC : Factor w/ 93 levels "100.1","100.4",..:
$ DON : Factor w/ 1 level "µmol/L":
$ DOP : Factor w/ 1 level "µmol/L":
$ TEP : Factor w/ 100 levels "10.4934","11.0053",..:
[Note: Above is the structure after reading from .xlsx file......the levels makes the calculation and manipulation part tedious and messy.]
This is what I want to achieve:
str(a)
'data.frame': 9936 obs. of 29 variables:
$ Ei : int 1 1 1 1 1 1 1 1 1 1 ...
$ Mi : int 1 1 1 1 1 1 1 1 1 1 ...
$ hours : int 1 2 3 4 5 6 7 8 9 10 ...
$ Cphy : num 0.653 0.645 0.637 0.63 0.624 ...
$ CHLphy : num 0.105 0.11 0.115 0.119 0.122 ...
$ Nhet : num 0.0499 0.0499 0.0498 0.0498 0.0497 ...
$ Chet : num 0.331 0.33 0.33 0.33 0.329 ...
$ Ndet : num 0.0499 0.0498 0.0498 0.0497 0.0496 ...
$ Cdet : num 0.331 0.33 0.33 0.329 0.329 ...
$ DON : num 0.0504 0.0509 0.0513 0.0518 0.0522 ...
$ DOC : num 49.8 49.5 49.3 49.1 48.8 ...
$ DIN : num 15 15 15 15 15 ...
$ DIC : num 2050 2050 2050 2051 2051 ...
$ AT : num 2150 2150 2150 2150 2150 ...
$ dCCHO : num 0.964 0.931 0.9 0.871 0.843 ...
$ TEPC : num 0.134 0.165 0.194 0.221 0.246 ...
$ Ncocco : num 0.104 0.108 0.11 0.113 0.114 ...
$ Ccocco : num 0.65 0.639 0.629 0.621 0.615 ...
$ CHLcocco: num 0.109 0.116 0.123 0.127 0.131 ...
$ PICcocco: num 0.1 0.1 0.101 0.102 0.103 ...
$ par : num 0 0 0.87 1.55 2.78 ...
$ Temp : num 9.9 9.9 9.9 9.9 9.9 9.9 9.9 9.9 9.9 9.9 ...
$ Sal : num 31.3 31.3 31.3 31.3 31.3 ...
$ co2atm : num 370 370 370 370 370 370 370 370 370 370 ...
$ u10 : num 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...
$ dicfl : num -2.96 -2.97 -2.98 -2.99 -3 ...
$ co2ppm : num 565 566 566 567 567 ...
$ co2mol : num 0.0256 0.0256 0.0257 0.0257 0.0257 ...
$ pH : num 7.88 7.88 7.88 7.88 7.88 ...
[Note: sorry for the extra columns, this is another dataset (simple text), which I am reading from read.table]
With NA's handled:
> unique(mydf_1$Exp.num)
[1] # 1
Levels: # 1
> unique(mydf_2$Exp.num)
[1] # 2
Levels: # 2
> unique(mydf_3$Exp.num)
[1] # 3
Levels: # 3
> unique(full_data$Exp.num)
[1] 2 3 4
Without handling NA's:
> unique(full_data$Exp.num)
[1] 1 NA 2 3
> unique(full_data$Mesocosm)
[1] 1 2 3 4 5 6 7 8 9 NA
I think this is what you need. I've added a few comments on what I am doing:
xlfile <- list.files(pattern = "*.xlsx")
wb <- loadWorkbook(xlfile)
sheet_ct <- wb$getNumberOfSheets()
for (i in seq_len(sheet_ct)) { # read the sheets into separate data frames (mydf_1, mydf_2, mydf_3)
  print(i)
  variable_name <- sprintf('mydf_%s', i)
  # with startRow/endRow set you don't need my function below to drop NA rows,
  # but you do need to know the first and last rows
  assign(variable_name, read.xlsx(xlfile, sheetIndex = i, startRow = 1, endRow = 209))
}
colnames(mydf_1) <- names(mydf_2) # this part was unclear; I chose the second sheet's
# names as column names, but you can choose whichever you want in the same way (the second and third sheets had the same names).
# some of the sheets were loaded with a few blank rows (full of NAs), which I remove
# with the following function, based on the first column, which is always populated
# as far as I can see
remove_na_rows <- function(x) {
  sum(!is.na(x)) # number of non-NA entries, i.e. the last row to keep
}
mydf_1 <- mydf_1[1:remove_na_rows(mydf_1$Exp.num),]
mydf_2 <- mydf_2[1:remove_na_rows(mydf_2$Exp.num),]
mydf_3 <- mydf_3[1:remove_na_rows(mydf_3$Exp.num),]
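full_data <- rbind(mydf_1[-1,], mydf_2[-1,], mydf_3[-1,]) # make one data frame, dropping each sheet's second header row
# convert every column to numeric; go through as.character so factors give their values rather than
# their level codes, and assign into full_data[] so the result stays a data frame
full_data[] <- lapply(full_data, function(x) as.numeric(as.character(x)))
full_data$Ei <- as.integer(full_data[['Ei']]) # use this to convert any column to integer (substitute your own column names)
full_data$Mi <- as.integer(full_data[['Mi']])
full_data$hours <- as.integer(full_data[['hours']])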
#*********code to use for removing NA rows *****************
# if you rbind without caring about the NA rows, you can use the line below to get rid of them:
# it keeps only the rows with at least one non-NA value, and it is also safe when no row is all NA
# (indexing with -NULL, as a loop collecting row numbers would do in that case, returns zero rows)
full_data <- full_data[rowSums(!is.na(full_data)) > 0, ]
I think now this is what you need
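The same clean-up can also be written without assign() by keeping the sheets in a list. The sketch below uses small made-up data frames in place of the read.xlsx() results (so it runs without the xlsx package), and the helper clean_sheet is hypothetical. It assumes each sheet starts with a units row and may end with all-NA rows, as in the output above.

```r
# Stand-ins for the data frames returned by read.xlsx(), one per sheet.
# The first row of each holds units and the columns arrive as factors.
sheets <- list(
  data.frame(Exp.num = c("#", "1", "1", NA),
             DIC     = c("umol/L", "2050.1", "2050.3", NA),
             stringsAsFactors = TRUE),
  data.frame(Exp.num = c("#", "2", "2"),
             DIC     = c("umol/L", "1990.5", "1988.2"),
             stringsAsFactors = TRUE)
)

# Hypothetical helper: drop the units row, drop all-NA rows, then convert
# every column to numeric via as.character (so factor values, not level
# codes, are used).
clean_sheet <- function(df) {
  df <- df[-1, , drop = FALSE]
  df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]
  df[] <- lapply(df, function(col) as.numeric(as.character(col)))
  df
}

full <- do.call(rbind, lapply(sheets, clean_sheet))
rownames(full) <- NULL
str(full)
```

With real data the list would be built as lapply(seq_len(sheet_ct), function(i) read.xlsx(xlfile, sheetIndex = i)), provided all sheets share the same column names.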

Dealing with Zero Values in Principal Component Analysis

I've really been struggling to get my PCA working and I think it is because there are zero values in my data set. But I don't know how to resolve the issue.
The first problem is, the zero values are not missing values (they are areas with no employment in a certain sector), so I should probably keep them in there. I feel uncomfortable that they might be excluded because they are zero.
Secondly, even when I try remove all missing data I still get the same error message.
Starting with the following code, I get the following error message:
urban.pca.cov <- princomp(urban.cov, cor = TRUE)
Error in cov.wt(z) : 'x' must contain finite values only
Also, I can do this:
urban.cut<- na.omit(urban.cut)
> sum(is.na(urban.cut))
[1] 0
And then run it again and get the same issue.
urban.pca.cov <- princomp(urban.cov, cor = TRUE)
Error in cov.wt(z) : 'x' must contain finite values only
Is this a missing data issue? I've log transformed all of my variables according to this PCA tutorial. Here is the structure of my data.
> str(urban.cut)
'data.frame': 5490 obs. of 13 variables:
$ median.lt : num 2.45 2.57 2.53 2.6 2.31 ...
$ p.nga.lt : num 0.547 4.587 4.529 4.605 4.564 ...
$ p.mbps2.lt : num 1.66 4.17 4 3.9 4.2 ...
$ density.lt : num 3.24 3.44 3.85 3.21 4.28 ...
$ p_m_s.lt : num 4.54 4.61 4.56 4.61 4.61 ...
$ p_m_l.lt : num 1.87 -Inf 1.44 -Inf -Inf ...
$ p.tert.lt : num 4.59 4.61 4.55 4.61 4.61 ...
$ p.kibs.lt : num 4.25 3.05 3.12 3 3.03 ...
$ p.edu.lt : num 4.14 2.6 2.9 2.67 2.57 ...
$ p.non.white.lt : num 3.06 3.56 3.82 2.94 3.52 ...
$ p.claim.lt : num 0.459 1.287 1.146 1.415 1.237 ...
$ d.connections.lt: num 2.5614 0.6553 5.2573 0.9562 -0.0252 ...
$ SAM.KM.lt2 : num 1.449 1.081 1.071 1.246 0.594 ...
Thank you in advance for your help.
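It sounds like R wants finite values, and -Inf (minus infinity, which is what log(0) returns) is not finite. It is also not NA, so na.omit() leaves it in place, which is why removing missing data changes nothing. If you really need to log-transform your data, use log(data + 1) rather than taking the log of 0.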
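A minimal sketch of the difference, on made-up data (the column names and values are illustrative only): log() turns zeros into -Inf, na.omit() does not remove them, and log1p(), which computes log(1 + x), keeps everything finite so that princomp() runs.

```r
# Made-up sector counts; the zeros are real observations, not missing data.
raw <- data.frame(
  small = c(12, 0, 5, 0, 40, 7),
  large = c(3, 8, 0, 15, 22, 9)
)

logged <- log(raw)                                  # log(0) is -Inf
n_infinite <- sum(!is.finite(as.matrix(logged)))    # count the non-finite cells
n_after_na_omit <- nrow(na.omit(logged))            # -Inf is not NA, so no rows are dropped

shifted <- log1p(raw)                               # log(1 + x): zeros map to 0
pca <- princomp(shifted, cor = TRUE)
print(summary(pca))
```

Whether adding 1 before taking logs is appropriate depends on the scale of the variables, so treat it as one option rather than the only fix.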
