I'm attempting to use the treeclim package to analyze the relationship between my tree-ring growth data and climate. I measured the ring widths in CooRecorder, grouped them into series in CDendro, and read them into RStudio using dplR's read.rwl function. However, I keep getting this error message:
"Error in dcc(Plot92.crn, Site92PRISM, selection = -6:9, method = "response", :
Overlapping time span of chrono and climate records is smaller than number of parameters! Consider adapting the number of parameters to a maximum of 100."
I have 100 years of monthly climate data that looks like this:
# head(Site92PRISM)
year month ppt tmax tmin tmean vpdmin..hPa. vpdmax..hPa. site
1 1915 01 0.97 26.1 12.3 19.2 0.97 2.32 92
2 1915 02 1.20 31.5 16.2 23.9 1.03 3.30 92
3 1915 03 2.51 36.0 17.0 26.5 0.97 4.69 92
4 1915 04 3.45 48.9 26.3 37.6 1.14 8.13 92
5 1915 05 3.95 44.6 29.1 36.9 0.94 5.58 92
6 1915 06 6.64 51.0 31.5 41.3 1.04 7.93 92
And my chronology, made in dplR, looks like this:
#head(Plot92.crn)
CAMstd samp.depth
1840 0.7180693 1
1841 0.3175528 1
1842 0.5729651 1
1843 0.9785082 1
1844 0.7676334 1
1845 0.3633687 1
Where am I going wrong? Both files contain data from 1915-2015.
I posted a similar question to the package author in the package's Google Groups forum (https://groups.google.com/forum/#!forum/treeclim).
What you need to make sure of is that the number of parameters (n_param) is less than or equal to the sample size of your dendrochronological data. By 'number of parameters' I mean the total number of columns in the climate variable matrices.
For instance, in the following analysis:
resp <- dcc(chrono = my_chrono,
climate = list(precip, temp),
boot = 'stationary')
You need to make sure that the following is TRUE:
length(unique(rownames(my_chrono))) >= (ncol(precip)-1) + (ncol(temp)-1)
Use ncol(precip)-1 rather than ncol(precip) because the first column of the matrix is YEAR. Also note that in my example the years in my_chrono are the same as those in precip and temp, which doesn't have to be the case to run the function (it automatically uses the common years).
Finally, if the previous line of code returns FALSE, you can reduce the number of parameters with the selection argument, like this:
resp <- dcc(chrono = my_chrono,
climate = list(precip, temp),
selection = .range(6:12,'prec') + .range(6:12, 'temp'),
var_names = c('prec', 'temp'),
boot = 'stationary')
Because the dcc function automatically takes all months from previous June to current September (i.e. .range(-6:9)), you may need to reduce that range.
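To see why the original call fails, it may help to count the parameters by hand. The sketch below is plain arithmetic, not treeclim output. It assumes that dcc treats every column after year and month as a climate variable (so the site column would be counted too), and it takes the 100-year overlap from the error message.

```r
# Columns after `year` and `month` in the Site92PRISM table from the question.
# Assumption: each of them is treated as a separate climate variable.
clim_vars <- c("ppt", "tmax", "tmin", "tmean", "vpdmin", "vpdmax", "site")

# Default window: previous June to previous December,
# then current January to current September.
n_months <- length(6:12) + length(1:9)   # 7 + 9 = 16

n_param <- length(clim_vars) * n_months  # 7 * 16 = 112
n_years <- 100                           # overlap implied by the error message

n_param <= n_years                       # FALSE, so the check fails

# Keeping two variables and only June to December of the current year:
n_param_reduced <- 2 * length(6:12)      # 14
n_param_reduced <= n_years               # TRUE
```

If that assumption holds, dropping site and any other unneeded column before calling dcc, or narrowing selection as shown above, should bring the count under the limit.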
I have the following code to get a list of option chains and build a new list containing only the puts data (only_puts_list):
library(quantmod)
Symbols <- c("AA", "AAL", "AAOI", "ABBV", "ABC", "ABNB")
Options.20221111 <- lapply(Symbols, getOptionChain)
names(Options.20221111) <- Symbols
only_puts_list <- lapply(Options.20221111, function(x) x$puts)
I'd now like to subset only_puts_list and create a new list (i.e. new_list1) that keeps only the rows with a positive value in the ChgPct column.
I guess lapply should work, but how do I apply it to only the positive values of a specific column, ChgPct?
We could loop over the list with lapply and apply subset to each element:
new_list1 <- lapply(only_puts_list, subset, subset = ChgPct > 0)
If we check the output, most of the returned list elements have 0 rows because there were no positive observations in 'ChgPct'. We could use Filter to keep only the elements that have any rows:
new_list1_sub <- Filter(nrow, new_list1)
-output
new_list1_sub
$ABBV
ContractID ConractSize Currency Expiration Strike Last Chg ChgPct Bid Ask Vol OI LastTradeTime IV
31 ABBV221202P00155000 REGULAR USD 2022-12-02 155.0 0.66 0.1100000 20.00000 0.56 0.66 70 480 2022-11-29 13:10:43 0.2690503
32 ABBV221202P00157500 REGULAR USD 2022-12-02 157.5 1.49 0.2400000 19.20000 1.41 1.51 544 383 2022-11-29 13:17:43 0.2627027
33 ABBV221202P00160000 REGULAR USD 2022-12-02 160.0 3.05 0.4300001 16.41222 2.79 2.99 34 308 2022-11-29 12:07:54 0.2692944
34 ABBV221202P00162500 REGULAR USD 2022-12-02 162.5 4.95 1.6499999 50.00000 4.80 5.05 6 28 2022-11-29 13:26:10 0.3017648
ITM
31 FALSE
32 FALSE
33 TRUE
34 TRUE
$ABC
ContractID ConractSize Currency Expiration Strike Last Chg ChgPct Bid Ask Vol OI LastTradeTime IV ITM
18 ABC221202P00165000 REGULAR USD 2022-12-02 165 1.05 0.1999999 23.5294 0.6 0.8 3 111 2022-11-29 09:51:47 0.2710034 FALSE
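The same pattern can be tried without a network call. This is a minimal sketch on made-up data: the two small data frames below are hypothetical stand-ins for the puts tables, and the filter is written as an explicit anonymous function rather than by passing subset directly.

```r
# Hypothetical stand-in for only_puts_list (values are invented)
only_puts_list <- list(
  AA   = data.frame(Strike = c(40, 45),   ChgPct = c(-5, 0)),
  ABBV = data.frame(Strike = c(155, 160), ChgPct = c(20, -3))
)

# Keep only rows with a positive ChgPct in every list element
new_list1 <- lapply(only_puts_list, function(d) subset(d, ChgPct > 0))

# Drop elements that ended up with zero rows (nrow 0 is treated as FALSE)
new_list1_sub <- Filter(nrow, new_list1)

names(new_list1_sub)
new_list1_sub$ABBV
```

subset also drops rows where ChgPct is NA, which is usually what is wanted here.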
I want to add a new column computed from existing values. In the data frame below,
library(dplyr) # provides %>% and select()
Env <- c("High_inoc", "High_NO_inoc", "Low_inoc", "Low_NO_inoc")
CV1 <- c(30, 150, 20, 100)
CV2 <- c(74, 99, 49, 73)
CV3 <- c(78, 106, 56, 69)
CV4 <- c(86, 92, 66, 70)
CV5 <- c(74, 98, 57, 79)
Data <- data.frame(Env, CV1, CV2, CV3, CV4, CV5)
Data$Mean <- rowMeans(Data %>% select(-Env))
Data <- rbind(Data, c("Mean", colMeans(Data %>% select(-Env))))
I'd like to add a new column named 'Env_index', calculated as each value of the 'Mean' column minus the overall mean (76.3), i.e. 68.4 - 76.3, 109 - 76.3, ..., 78.2 - 76.3.
So I did it like this and got what I wanted:
Data$Env_index <- c(68.4-76.3,109-76.3,49.6-76.3,78.2-76.3, 76.3-76.3)
But I want to calculate it directly in code, so I tried this:
Data$Env_index <- with (data, data$Mean - 76.3)
It generates an error. Could you let me know how to calculate it?
Thanks,
To make the calculation dynamic, so that it works on any data, you can do:
Data$Mean <- as.numeric(Data$Mean)
Data$Env_index <- Data$Mean - Data$Mean[nrow(Data)]
Data
# Env CV1 CV2 CV3 CV4 CV5 Mean Env_index
#1 High_inoc 30 74 78 86 74 68.4 -7.9
#2 High_NO_inoc 150 99 106 92 98 109.0 32.7
#3 Low_inoc 20 49 56 66 57 49.6 -26.7
#4 Low_NO_inoc 100 73 69 70 79 78.2 1.9
#5 Mean 75 73.75 77.25 78.5 77 76.3 0.0
Data$Mean[nrow(Data)] selects the last value of Data$Mean, which here is the overall mean stored in the appended "Mean" row.
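The as.numeric call is needed because the rbind in the question appends a character vector (it starts with "Mean"), which turns the columns into character. As an alternative sketch, assuming the summary row is not required inside the data frame, the index can be computed before any such row is appended, so everything stays numeric:

```r
Env <- c("High_inoc", "High_NO_inoc", "Low_inoc", "Low_NO_inoc")
Data <- data.frame(
  Env,
  CV1 = c(30, 150, 20, 100),
  CV2 = c(74, 99, 49, 73),
  CV3 = c(78, 106, 56, 69),
  CV4 = c(86, 92, 66, 70),
  CV5 = c(74, 98, 57, 79)
)

# Row means over the numeric columns only
num_cols <- sapply(Data, is.numeric)
Data$Mean <- rowMeans(Data[num_cols])

# Deviation of each environment mean from the overall mean
Data$Env_index <- Data$Mean - mean(Data$Mean)
Data
```

Here mean(Data$Mean) equals the overall mean of 76.3 because every row averages the same number of values.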
The str() function in R shows that USArrests has only 4 variables.
However, there appear to be 5: the state names are in the first column, but that column is unlabeled.
I am struggling to understand the intuition behind this and how this works.
I ran a k-means clustering algorithm on the data, and it seems that the first column (state names) acts as labels in the analysis, without being used as categorical data.
This is the tutorial I used:
https://uc-r.github.io/kmeans_clustering
Below is some code to explain myself in a clearer manner.
str(USArrests)
'data.frame': 50 obs. of 4 variables:
$ Murder : num 13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
$ Assault : int 236 263 294 190 276 204 110 238 335 211 ...
$ UrbanPop: int 58 48 80 50 91 78 77 72 80 60 ...
$ Rape : num 21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
head(USArrests)
Murder Assault UrbanPop Rape
Alabama 13.2 236 58 21.2
Alaska 10.0 263 48 44.5
Arizona 8.1 294 80 31.0
Arkansas 8.8 190 50 19.5
California 9.0 276 91 40.6
Colorado 7.9 204 78 38.7
How it looks as "label" in the K Means Clustering
library(tidyverse) # data manipulation
library(cluster) # clustering algorithms
Data Cleaning
df <- USArrests
df <- na.omit(df)
Scaling
(df <- scale(df))
Compute K-means Clustering
k2 <- kmeans(df, centers = 2, nstart = 25)
Sample Output
Clustering vector:
Alabama Alaska Arizona Arkansas California
2 2 2 1 2
If there are only four variables how does R, or the clustering algorithm know to associate the cluster with the state name, which technically isn't a column?
The first "column" is not actually a column but the row names (the index) of the data frame. Instead of the default 1, 2, 3, 4, etc., the row names are Alabama, Alaska, Arizona, Arkansas, etc. That is why str() reports only 4 variables: the row names are never treated as a column.
Now, the clustering output showed which cluster each state belongs to. This is simply the index: the algorithm is telling us which cluster each row belongs to. For example, if the index were 1, 2, 3, 4, etc. instead of state names, we would still get row 1 in cluster 2, row 2 in cluster 2, row 3 in cluster 2, row 4 in cluster 1, etc. The algorithm does what you tell it to do. It sees the index and labels the respective cluster against that index.
Hope this helps.
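A short check makes this concrete. This is a sketch using only base R and the built-in USArrests data; the object names k2_nonames, df_nonames and with_state are just illustrative.

```r
# The state names live in the row names, not in a column
ncol(USArrests)             # 4
head(rownames(USArrests))

# scale() keeps the row names, and kmeans() reuses them to name the cluster vector
df <- scale(na.omit(USArrests))
set.seed(1)                 # arbitrary seed, only for reproducibility
k2 <- kmeans(df, centers = 2, nstart = 25)
identical(names(k2$cluster), rownames(USArrests))   # TRUE

# Without row names the cluster vector is unnamed: labels are just positions
df_nonames <- df
rownames(df_nonames) <- NULL
k2_nonames <- kmeans(df_nonames, centers = 2, nstart = 25)
is.null(names(k2_nonames$cluster))                  # TRUE

# To get a real fifth column, copy the row names into one
with_state <- data.frame(State = rownames(USArrests), USArrests, row.names = NULL)
ncol(with_state)            # 5
```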
I tried the RANN package to find the nearest coordinate points by comparing two files, and then to add the values at those nearest points to the other file.
My files -> fire
lat lon frp
30.037 80.572 38.5
23.671 85.008 7.2
22.791 86.206 11.4
23.755 86.421 5.6
23.673 86.088 4.2
23.768 86.392 8.4
23.789 86.243 7.8
23.805 86.327 6.4
23.682 86.085 7.8
23.68 86.095 5.7
21.194 81.41 19
16.95 81.912 8
16.952 81.898 11.5
16.899 81.682 10.6
12.994 79.651 16.1
9.2 77.603 14.5
12.291 77.346 20.5
17.996 79.708 13.9
17.998 79.718 29.6
16.61 81.266 6.6
16.499 81.2 6.8
19.505 81.784 22.4
18.322 80.555 7.7
19.506 81.794 28.2
21.081 81.957 8.7
21.223 82.127 9.4
20.918 81.025 6.3
19.861 82.123 9.3
20.62 75.049 11.6
and 2nd file -> wind
lat lon si10 u10 v10
40 60 3.5927058834376 -0.874587879393667 -0.375465368327018
40 60.125 3.59519876134577 -0.836646189656238 -0.388624092937835
40 60.25 3.59769163925393 -0.798704499918809 -0.401782817548651
40 60.375 3.6001845171621 -0.76076281018138 -0.414941542159468
40 60.5 3.60246965524458 -0.722821120443951 -0.428380239634345
40 60.625 3.60496253315275 -0.684585309080651 -0.441538964245162
40 60.75 3.60766315088659 -0.646937740969094 -0.454977661720038
40 60.875 3.60911732966636 -0.609878416109279 -0.468976304923035
40 61 3.608701850015 -0.575172064256437 -0.484934758174451
40 61.125 3.60807863053795 -0.540759834029467 -0.500893211425867
40 61.25 3.60787089071227 -0.506053482176625 -0.516851664677283
40 61.375 3.60745541106091 -0.471641251949655 -0.533090090792759
40 61.5 3.60703993140955 -0.437229021722684 -0.548768571180115
40 61.625 3.60662445175819 -0.402522669869843 -0.565006997295591
40 61.75 3.60454705350139 -0.398993210359384 -0.579285613362648
40 61.875 3.60163869594186 -0.411346318645989 -0.592724310837524
40 62 3.59873033838234 -0.423405305306722 -0.606163008312401
40 62.125 3.59540650117145 -0.435758413593327 -0.619601705787278
40 62.25 3.59249814361192 -0.44781740025406 -0.633320376126214
40 62.375 3.5895897860524 -0.460170508540664 -0.646759073601091
40 62.5 3.58668142849287 -0.471935373575526 -0.660197771075968
40 62.625 3.57546347790613 -0.509288820061212 -0.666357174085286
40 62.75 3.56445326714507 -0.546642266546898 -0.672236604230545
40 62.875 3.55323531655832 -0.584289834658455 -0.678675980103923
40 63 3.54201736597158 -0.621643281144141 -0.684835383113241
40 63.125 3.53100715521052 -0.658996727629827 -0.69099478612256
40 63.25 3.51978920462378 -0.696350174115513 -0.697154189131878
40 63.375 3.50005392118414 -0.726644701580281 -0.692954596170979
40 63.5 3.46266075256166 -0.743115512629088 -0.668037011269646
I want to add wind$si10, wind$u10 and wind$v10 to the fire file, taken from the nearest coordinates to each frp value. First, I tried with the variable si10 only, because with the RANN package the fire and wind files must have the same number of columns. So I used this code with si10 only:
library(RANN)
read.table(file.choose(), sep="\t", header = T) -> wind_jan
read.table(file.choose(), sep="\t", header = T) -> fire_jan
names(fire_jan)
names(wind_jan)
closest <- RANN::nn2(data = wind_jan, query = fire_jan, k = 1)
closest
fire_jan$wind_lat <- wind_jan[closest$nn.idx, "lat"]
fire_jan$wind_lon <- wind_jan[closest$nn.idx, "lon"]
fire_jan$WS <- wind_jan[closest$nn.idx, "si10"]
With the above code I am able to extract the si10 values at the coordinates nearest to fire$frp, but when I apply the same code to the u10 and v10 variables in the wind file, I do not get values extracted at the same coordinates as I got with si10.
How can I solve this with this code?
You call closest_u$nn.id, which doesn't exist.
Maybe there is also a problem with your column labels when reading the wind data frame?
Could that be the error?
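One way to keep all three variables on the same coordinates is to search on lat and lon only, then reuse a single index vector for every column. The sketch below does the search by brute force in base R so that it runs without extra packages; the data values are made up, and the names nearest_idx and wind_vars are hypothetical. Plain Euclidean distance in degrees is an approximation that is usually acceptable for a fine regular grid.

```r
# Small made-up tables with the same column layout as in the question
fire <- data.frame(lat = c(30.037, 23.671),
                   lon = c(80.572, 85.008),
                   frp = c(38.5, 7.2))
wind <- data.frame(lat  = c(30, 30, 23.75),
                   lon  = c(80.5, 80.625, 85),
                   si10 = c(3.1, 3.2, 2.0),
                   u10  = c(-0.8, -0.7, 0.1),
                   v10  = c(-0.3, -0.4, 0.5))

# For each fire point, find the wind row with the smallest squared distance,
# using the coordinate columns only
nearest_idx <- vapply(seq_len(nrow(fire)), function(i) {
  d2 <- (wind$lat - fire$lat[i])^2 + (wind$lon - fire$lon[i])^2
  which.min(d2)
}, integer(1))

# Reuse the same index for every wind variable
wind_vars <- c("si10", "u10", "v10")
fire[wind_vars] <- wind[nearest_idx, wind_vars]
fire

# With RANN, the equivalent index would come from a coordinates-only search:
# RANN::nn2(data = wind[, c("lat", "lon")],
#           query = fire[, c("lat", "lon")], k = 1)$nn.idx[, 1]
```

Restricting the search to the coordinate columns also removes the need for both tables to have the same number of columns overall.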
As I am a beginner in R, I hope R users won't mind a small question.
I want to show in one graphic (points, lines, curves) the weights of two groups of people, treated and not treated with a drug (1, 0), measured ten times (monthly).
drug NumberIndividu Mar Apr May June July August September November October December
1 9 25.92 24.6 31.85 38.50 53.70 53.05 65.65 71.45 69.10 67.20
1 10 28.10 26.6 32.00 38.35 53.60 53.25 65.35 65.95 67.80 65.95
1 11 29.10 28.8 30.80 38.10 52.25 47.30 62.20 68.05 66.20 67.55
1 13 27.16 25.0 27.15 34.85 47.30 43.85 54.65 62.25 60.85 58.05
0 5 25.89 25.2 26.50 27.45 37.05 38.95 43.30 50.60 48.20 50.10
0 6 28.19 27.6 28.05 28.60 36.15 37.20 40.40 47.80 45.25 44.85
0 7 28.06 27.2 27.45 28.85 39.20 41.80 51.40 57.10 54.55 55.30
0 8 22.39 21.2 30.10 30.90 42.95 46.30 48.15 54.85 53.35 49.90
I tried :
w <- read.csv(file = "/file-weight.csv", header = TRUE, sep = ",")
w<-data.frame(w)
rownames(w[1:8,])
rownames(w)<-(c(w[,1]))
cols <- character(nrow(w))
cols[rownames(w) %in% c(rownames(w[1:4,]))]<-"blue"
cols[rownames(w) %in% c(rownames(w[5:8,]))]<-"red"
pairs(w,col=cols)
My question is how to configure the matplot function to get a single graphic (points, curves, or histograms plus curves).
My main goal is to visualize the distributions of all individuals, colored by the first column (drug), for all dates in one image.
Thanks a lot for your suggestions.
Is this what you had in mind?
The code is based on the answer to this question, just using your dataset (df) instead of iris. So in that response, replace:
x <- with(iris, data.table(id=1:nrow(iris), group=Species, Sepal.Length, Sepal.Width,Petal.Length, Petal.Width))
with:
xx <- with(df, data.table(id=1:nrow(df), group=drug, df[3:12]))
If all you want is the density plots/histograms, it's easier (see below). These are complementary, because they show that weight is increasing in both control and test groups, just faster in the test group. You wouldn't pick that up from the scatterplot matrix. Also, there's the suggestion that variability in weight is greater in the control group, and grows over time.
library(ggplot2)
library(reshape2) # for melt(...)
# convert df into a form suitable to use with ggplot
gg <- melt(df,id=1:2, variable.name="Month", value.name="Weight")
# density plots
ggplot(gg) +
stat_density(aes(x=Weight, y=after_stat(scaled), color=factor(drug)),geom="line", position="dodge")+ # after_stat() replaces the older ..scaled.. notation
facet_grid(Month~.)+
scale_color_discrete("Drug")
# histograms
ggplot(gg) +
geom_histogram(aes(x=Weight, fill=factor(drug)), position="dodge")+
facet_grid(Month~.)+
scale_fill_discrete("Drug")
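Since the question asked about matplot specifically, here is a minimal base R sketch of that route. It uses four individuals and three months copied from the table in the question; the names weights and line_cols are just illustrative. matplot draws one line per column, so the weight columns are transposed to put months in rows and individuals in columns.

```r
# Subset of the data from the question (4 individuals, 3 months)
w <- data.frame(
  drug           = c(1, 1, 0, 0),
  NumberIndividu = c(9, 10, 5, 6),
  Mar = c(25.92, 28.10, 25.89, 28.19),
  Apr = c(24.6, 26.6, 25.2, 27.6),
  May = c(31.85, 32.00, 26.50, 28.05)
)

# Months in rows, individuals in columns
weights <- t(as.matrix(w[, c("Mar", "Apr", "May")]))

# One color per individual, chosen by treatment group
line_cols <- ifelse(w$drug == 1, "blue", "red")

matplot(weights, type = "b", pch = 16, lty = 1, col = line_cols,
        xaxt = "n", xlab = "Month", ylab = "Weight")
axis(1, at = seq_len(nrow(weights)), labels = rownames(weights))
legend("topleft", legend = c("drug = 1", "drug = 0"),
       col = c("blue", "red"), lty = 1, pch = 16)
```

With the full table, the same code applies after replacing the column selection with all ten month columns.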