I'm using 'pls' package and for this I need to produce a dataframe with a structure a bit different to what I'm used to.
Data frame structure I need to work with 'pls: gasoline
library(pls)
gasoline
Example to show how my data looks like: gasoline2
Background info - What I tend to do to load data into R is to transcript the data in a .xls and then convert the file to a .txt which is then loaded to R.
When my data is loaded it looks like this:
gasoline2 <- as.data.frame(as.matrix(gasoline))
Question
How can convert the structure of gasoline2 into the structure of gasoline?
Thanks a lot in advance for your help!
You're looking for I, which will allow you to combine different data structures (like lists or matrices) as columns in a data.frame:
## Assume you are starting with this:
X <- as.data.frame(as.matrix(gasoline))
## Create a new object where column 1 is the same as the first
## column in your existing data frame, and column 2 is a matrix
## of the remaining columns
newGas <- cbind(X[1], NIR = I(as.matrix(X[-1])))
str(gasoline)
# 'data.frame': 60 obs. of 2 variables:
# $ octane: num 85.3 85.2 88.5 83.4 87.9 ...
# $ NIR : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr "1" "2" "3" "4" ...
# .. ..$ : chr "900 nm" "902 nm" "904 nm" "906 nm" ...
str(newGas)
# 'data.frame': 60 obs. of 2 variables:
# $ octane: num 85.3 85.2 88.5 83.4 87.9 ...
# $ NIR : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr "1" "2" "3" "4" ...
# .. ..$ : chr "NIR.900 nm" "NIR.902 nm" "NIR.904 nm" "NIR.906 nm" ...
There's a slight difference in the column naming, but I think that can easily be taken care of...
> colnames(newGas$NIR) <- gsub("NIR.", "", colnames(newGas$NIR))
> identical(gasoline, newGas)
[1] TRUE
Related
I'm trying to compute a phenotypic covariance matrix between a fatty acid dataset and a phylogenetic tree using the Rphylopars package.
I'm able to load the data set and phylogeny; however, when I attempt to run the test I get the error message
Error in class(tree) <- "phylo" : attempt to set an attribute on NULL"
This is the code for the test
phy <- read.tree("combined_trees.txt")
plot(phy)
phy$tip.label
FA_data <- read.csv("fatty_acid_example_data.csv", header = TRUE, na.strings = ".")
head(FA_data)
str(FA_data)
PPE <- phylopars(trait_data = FA_data$fatty1_continuous, tree = FA_data$phy)
Not sure what other info will help figure out the issue. The data set and phylogeny loaded without an error.
In the tutorial, the tree and trait data are jointly simulated by the simtraits() function, so both end up as elements of a single list. In your case (which will be typical of real-data cases), the tree and the trait data come from different sources, so most likely you want
PPE <- phylopars(trait_data = FA_data, tree = phy)
provided that FA_data contains a first column species matching the tip names in phy, and otherwise only the numeric data you want to use (potentially only the single fatty_acid1 column).
For comparison, the data structure returned by simtraits() looks like this (using str()):
List of 4
$ trait_data:'data.frame': 45 obs. of 5 variables:
..$ species: chr [1:45] "t7" "t8" "t2" "t3" ...
..$ V1 : num [1:45] 1.338 0.308 1.739 2.009 2.903 ...
..$ V2 : num [1:45] -2.002 -0.115 -0.349 -4.452 NA ...
..$ V3 : num [1:45] -1.74 NA 1.09 -2.54 -1.19 ...
..$ V4 : num [1:45] 2.496 2.712 1.198 1.675 -0.117 ...
$ tree :List of 4
..$ edge : int [1:28, 1:2] 29 29 28 28 27 27 26 26 25 25 ...
..$ edge.length: num [1:28] 0.0941 0.0941 0.6233 0.7174 0.0527 ...
..$ Nnode : int 14
..$ tip.label : chr [1:15] "t7" "t8" "t2" "t3" ...
..- attr(*, "class")= chr "phylo"
..- attr(*, "order")= chr "postorder"
...
you can see that simtraits() returns a list containing (among other things) (1) a data frame with species as the first column and the other columns numeric and (2) a phylogenetic tree.
You
I have two zoo objects and I am wondering, why the first zoo object shows
..$ : NULL
when the second one shows
..$ : chr [1:2266] "1" "2" "3" "4" ...
instead.
What does this mean in this case? And how could I change
"..$ : chr [1:2266] "1" "2" "3" "4" ..." to "..$ : NULL" in the second object?
Background: I am conducting an eventstudy (package eventstudies) and I get the error:
Error in rval[i, j, drop = drop., ...] : subscript out of bounds
One of the possible problems could be the formatting of the data. The object "OtherReturns" is used as example data in the package. I am hoping to resolve the error by understanding more about zoo objects. Yet, I could not understand the formatting difference between the two following objects:
> str(OtherReturns)
‘zoo’ series from 2010-07-01 to 2013-03-28
Data: num [1:720, 1:4] -1.1568 -0.2727 -0.0229 1.01 -0.9107 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "NiftyIndex" "CallMoneyRate" "SP500" "USDINR"
Index: Date[1:720], format: "2010-07-01" "2010-07-02" "2010-07-05" "2010-07-06" "2010-07-07" "2010-07-08" "2010-07-09" "2010-07-12" ...
> str(zoo_SP500)
‘zoo’ series from 2007-01-03 to 2015-12-31
Data: num [1:2266, 1:2] 1417 1418 1410 1413 1412 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2266] "1" "2" "3" "4" ...
..$ : chr [1:2] "SP500_Closing" "Index_return"
Index: Date[1:2266], format: "2007-01-03" "2007-01-04" "2007-01-05" "2007-01-08" "2007-01-09" "2007-01-10" "2007-01-11" "2007-01-12" ...
At the first glance, both objects look the same in this regard:
> head(OtherReturns,3)
NiftyIndex CallMoneyRate SP500 USDINR
2010-07-01 -1.15678265 NA NA 0.3009785
2010-07-02 -0.27267977 NA -1.341017 0.0000000
2010-07-05 -0.02291607 NA NA 0.4070272
> head(zoo_SP500,3)
SP500_Closing Index_return
2007-01-03 1416.60 -0.1198618
2007-01-04 1418.34 0.1228293
2007-01-05 1409.71 -0.6084578
Any help and explanations much appreciated!
Thank you!
I would like to make some specific calculation within a large dataset.
This is my MWE using an API call (takes 3-4 sec ONLY to Download)
devtools::install_github('mingjerli/IMFData')
library(IMFData)
fdi_asst <- c("BFDA_BP6_USD","BFDAD_BP6_USD","BFDAE_BP6_USD")
databaseID <- "BOP"
startdate <- "1980-01-01"
enddate <- "2016-12-31"
checkquery <- FALSE
FDI_ASSETS <- as.data.frame(CompactDataMethod(databaseID, list(CL_FREA = "Q", CL_AREA_BOP = "", CL_INDICATOR_BOP= fdi_asst), startdate, enddate, checkquery))
my dataframe 'FDI_ASSETS' looks like this (I provide a picture instead of head() for convenience)
the last column is a list and contains three more variables:
head(FDI_ASSETS$Obs)
[[1]]
#TIME_PERIOD #OBS_VALUE #OBS_STATUS
1 1980-Q1 30.0318922812441 <NA>
2 1980-Q2 23.8926174547104 <NA>
3 1980-Q3 26.599634375058 <NA>
4 1980-Q4 32.7522451203517 <NA>
5 1981-Q1 44.124979234001 <NA>
6 1981-Q2 35.9907120805994 <NA>
MY SCOPE
I want to do the following:
if/when the "#UNIT_MULT == 6" then divide the "#OBS_VALUE" in FDI_ASSETS$Obs by 1000
if/when the "#UNIT_MULT == 3" then divide the "#OBS_VALUE" in FDI_ASSETS$Obs by 1000000
UPDATE
Structure of FDI_ASSETS looks like this:
str(FDI_ASSETS)
'data.frame': 375 obs. of 6 variables:
$ #FREQ : chr "Q" "Q" "Q" "Q" ...
$ #REF_AREA : chr "FI" "MX" "MX" "TO" ...
$ #INDICATOR : chr "BFDAE_BP6_USD" "BFDAD_BP6_USD" "BFDAE_BP6_USD" "BFDAD_BP6_USD" ...
$ #UNIT_MULT : chr "6" "6" "6" "3" ...
$ #TIME_FORMAT: chr "P3M" "P3M" "P3M" "P3M" ...
$ Obs :List of 375
..$ :'data.frame': 147 obs. of 3 variables:
.. ..$ #TIME_PERIOD: chr "1980-Q1" "1980-Q2" "1980-Q3" "1980-Q4" ...
.. ..$ #OBS_VALUE : chr "30.0318922812441" "23.8926174547104" "26.599634375058" "32.7522451203517" ...
.. ..$ #OBS_STATUS : chr NA NA NA NA ...
..$ :'data.frame': 60 obs. of 2 variables:
.. ..$ #TIME_PERIOD: chr "2001-Q1" "2001-Q3" "2002-Q1" "2002-Q2" ...
.. ..$ #OBS_VALUE : chr "9.99999999748979E-05" "9.99999997475243E-05" "9.8999999998739E-05" "-9.90000000342661E-05" ...
..$ :'data.frame': 63 obs. of 2 variables:
.. ..$ #TIME_PERIOD: chr "2001-Q1" "2001-Q2" "2001-Q3" "2001-Q4" ...
.. ..$ #OBS_VALUE : chr "130.0149" "189.627" "3453.8319" "630.483" ...
..$ :'data.frame': 17 obs. of 2 variables:
I downloaded your data and it is quite complicated. I have removed my wrong answer so that you can get it answered by #akrun or someone similar :) I don't have the time to parse through it right now.
I found the following solution
list_assets<-list(FDI_ASSETS=FDI_ASSETS, Portfolio_ASSETS=Portfolio_ASSETS, other_invest_ASSETS=other_invest_ASSETS, fin_der_ASSETS=fin_der_ASSETS, Reserves=Reserves)
for (df in list_assets){
for( i in 1:length(df$"#UNIT_MULT")){
if (df$"#UNIT_MULT"[i]=="6"){
df$Obs[[i]]$"#OBS_VALUE" <- as.numeric(df$Obs[[i]]$"#OBS_VALUE")
df$Obs[[i]]$"#OBS_VALUE" <- df$Obs[[i]]$"#OBS_VALUE"/1000
} else if ((df$"#UNIT_MULT"[i]=="3")){
df$Obs[[i]]$"#OBS_VALUE" <- as.numeric(df$Obs[[i]]$"#OBS_VALUE")
df$Obs[[i]]$"#OBS_VALUE" <- df$Obs[[i]]$"#OBS_VALUE"/1000000
}
}
}
Please let me know how I can modify the code in order to make it more efficient and avoid these loops.
in R, I have computed a k-means clustering as follows:
km = (mat2, centers=3)
where mat2 is a matrix of column vectors obtained by combining elements of a set of time series. There are 31 rows
Now that I have my k-means object how can I look at the data associated with a particular point? For example, supposed I clicked on a dot in that belongs to one of the partitions. How can I view this data? Of course what I mean is how to programmatically obtain this data.
I expect that you call kmeans as this:
set.seed(42)
df <- data.frame( row.names = paste0( "obs", 1:100 ),
V1 = rnorm(100),
V2 = rnorm(100),
V3 = rnorm(100) )
km <- kmeans( df, centers = 3 )
If you are unfamiliar with a new function, it's always a good idea to inspect the resulting object using str():
> str(km)
List of 7
$ cluster : Named int [1:100] 1 2 3 3 1 1 1 1 1 1 ...
..- attr(*, "names")= chr [1:100] "obs1" "obs2" "obs3" "obs4" ...
$ centers : num [1:3, 1:3] 0.65604 -1.09689 0.56428 0.11162 0.00549 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "1" "2" "3"
.. ..$ : chr [1:3] "V1" "V2" "V3"
$ totss : num 291
$ withinss : num [1:3] 43.7 65.7 51.3
$ tot.withinss: num 161
$ betweenss : num 130
$ size : int [1:3] 36 34 30
- attr(*, "class")= chr "kmeans"
As I understood from your question, you are looking for km$cluster, which tells you which observation of your data has been assigned to which cluster. The cluster centers can accordingly be investigated by km$centers.
If you now want to know which observations has been clustered to the third cluster with the center km$centers[3,], you can subset your data.frame (or matrix) by
> rownames(df[ km$cluster == 3, ])
[1] "obs3" "obs4" "obs12" "obs15" "obs16" "obs21" "obs25" "obs27" "obs32" "obs42" "obs43" "obs46" "obs48" "obs54" "obs55" "obs58" "obs61" "obs62" "obs63" "obs66" "obs67" "obs73" "obs76"
[24] "obs77" "obs81" "obs84" "obs86" "obs87" "obs90" "obs94"
I have a short R script that loads a bunch of data and plots it in an XBar chart. Using the following code, I can plot the data and view the various statistical information.
library(qcc)
tir<-read.table("data.dat", header=T,,sep="\t")
names(tir)
attach(tir)
rand <- sample(tir)
xbarchart <- qcc(rand[1:100,],type="R")
summary(xbarchart)
I want to be able to do some process capability analysis (described here(PDF) on page 5) immediately after the XBar chart is created. In order to create the analysis chart, I need to store the LCL and UCL results from the XBar chart results created before as variables. Is there any way I can do this?
I shall answer your question using the example in the ?qcc help file.
x <- c(33.75, 33.05, 34, 33.81, 33.46, 34.02, 33.68, 33.27, 33.49, 33.20,
33.62, 33.00, 33.54, 33.12, 33.84)
xbarchart <- qcc(x, type="xbar.one", std.dev = "SD")
A useful function to inspect the structure of variables and function results is str(), short for structure.
str(xbarchart)
List of 11
$ call : language qcc(data = x, type = "xbar.one", std.dev = "SD")
$ type : chr "xbar.one"
$ data.name : chr "x"
$ data : num [1:15, 1] 33.8 33 34 33.8 33.5 ...
..- attr(*, "dimnames")=List of 2
.. ..$ Group : chr [1:15] "1" "2" "3" "4" ...
.. ..$ Samples: NULL
$ statistics: Named num [1:15] 33.8 33 34 33.8 33.5 ...
..- attr(*, "names")= chr [1:15] "1" "2" "3" "4" ...
$ sizes : int [1:15] 1 1 1 1 1 1 1 1 1 1 ...
$ center : num 33.5
$ std.dev : num 0.342
$ nsigmas : num 3
$ limits : num [1, 1:2] 32.5 34.5
..- attr(*, "dimnames")=List of 2
.. ..$ : chr ""
.. ..$ : chr [1:2] "LCL" "UCL"
$ violations:List of 2
..$ beyond.limits : int(0)
..$ violating.runs: num(0)
- attr(*, "class")= chr "qcc"
You will notice the second to last element in this list is called $limits and contains the two values for LCL and UCL.
It is simple to extract this element:
limits <- xbarchart$limits
limits
LCL UCL
32.49855 34.54811
Thus LCL <- limits[1] and UCL <- limits[2]