convert triple nested list to dataframe - r

I'm trying to convert a triple nested list into a dataframe. This question has helped, but I can't get the dataframe I'd like.
The list is an options chain obtained from IBrokers; a summary is shown below. I've uploaded the actual, more detailed chain here.
Chain <-
list(
list(
list(
list(version="8",contract=list(symbol="BHP",right="C",expiry="20180621",strike="25")),
list(version="8",contract=list(symbol="BHP",right="C",expiry="20180621",strike="26"))
),
list(
list(version="8",contract=list(symbol="BHP",right="C",expiry="20180730",strike="25")),
list(version="8",contract=list(symbol="BHP",right="C",expiry="20180730",strike="26"))
)
),
list(
list(
list(version="8",contract=list(symbol="CBA",right="C",expiry="20180621",strike="65")),
list(version="8",contract=list(symbol="CBA",right="C",expiry="20180621",strike="64"))
),
list(
list(version="8",contract=list(symbol="CBA",right="C",expiry="20180730",strike="65")),
list(version="8",contract=list(symbol="CBA",right="C",expiry="20180730",strike="64"))
)
)
)
I'd like to convert the list into a dataframe like this:
Contracts <- data.frame(symbol=c("BHP","BHP","BHP","BHP","CBA","CBA","CBA","CBA"),
right=c("C","C","C","C","C","C","C","C"),
expiry=c("20180621","20180621","20180730","20180730","20180621","20180621","20180730","20180730"),
strike=c("25","26","25","26","65","64","65","64"))
I tried this code, but it didn't give me the dataframe I wanted.
X <- lapply(Chain,function(x) as.data.frame.list(lapply(x,as.data.frame.list)))
dfx <- do.call(rbind,X)
Any suggestions please?

How about the following?
df <- as.data.frame(matrix(unlist(Chain, recursive = TRUE), ncol = 5, byrow = TRUE)[, -1])
colnames(df) <- c("symbol", "right", "expiry", "strike")
# symbol right expiry strike
#1 BHP C 20180621 25
#2 BHP C 20180621 26
#3 BHP C 20180730 25
#4 BHP C 20180730 26
#5 CBA C 20180621 65
#6 CBA C 20180621 64
#7 CBA C 20180730 65
#8 CBA C 20180730 64
Explanation: Recursively unlist the nested Chain, then recast as matrix, remove column version and convert to data.frame. The only minor down-side is that we have to manually add column names.
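As an alternative sketch that avoids hard-coding ncol = 5 and the column names, the two outer list levels can be peeled off and each contract element converted on its own. The helper leaf() below is hypothetical and only rebuilds the summary Chain from the question; the approach assumes every leaf has a contract element with the same fields.

```r
# Hypothetical helper that builds one leaf in the same shape as the question's Chain.
leaf <- function(symbol, expiry, strike) {
  list(version = "8",
       contract = list(symbol = symbol, right = "C", expiry = expiry, strike = strike))
}

Chain <- list(
  list(list(leaf("BHP", "20180621", "25"), leaf("BHP", "20180621", "26")),
       list(leaf("BHP", "20180730", "25"), leaf("BHP", "20180730", "26"))),
  list(list(leaf("CBA", "20180621", "65"), leaf("CBA", "20180621", "64")),
       list(leaf("CBA", "20180730", "65"), leaf("CBA", "20180730", "64")))
)

# Peel off the two outer levels (symbol, then expiry) so each element is one leaf.
leaves <- unlist(unlist(Chain, recursive = FALSE), recursive = FALSE)

# One single-row data frame per contract, stacked row-wise.
Contracts <- do.call(rbind, lapply(leaves, function(x) {
  as.data.frame(x$contract, stringsAsFactors = FALSE)
}))
```

Because the column names come from the contract elements themselves, the result keeps working if further fields are added to every contract.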
Update
Since your actual data is quite different, here is a possibility.
Note: I assume the structure from the Gist is stored in tbl.
tbl;
#Source: local data frame [2 x 6]
#Groups: <by row>
#
## A tibble: 2 x 6
# symbol sectype exch currency multiplier Chain
# <fct> <fct> <fct> <fct> <fct> <list>
#1 BHP OPT ASX AUD 100 <list [1,241]>
#2 CBA OPT ASX AUD 100 <list [1,204]>
The following list contains two data.frames, one for each row from tbl.
lst <- lapply(tbl$Chain, function(x)
do.call(rbind.data.frame, lapply(x, function(y) as.data.frame(unclass(y$contract)))))
#List of 2
# $ :'data.frame': 1241 obs. of 16 variables:
# ..$ conId : Factor w/ 1241 levels "198440202","198440207",..: 1 2 3 4 5 6 7 8 9 10 ...
# ..$ symbol : Factor w/ 1 level "BHP": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ sectype : Factor w/ 1 level "OPT": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ exch : Factor w/ 1 level "ASX": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ primary : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ expiry : Factor w/ 18 levels "20180628","20181220",..: 1 1 1 1 1 1 1 1 1 1 ...
# ..$ strike : Factor w/ 118 levels "25","26","27",..: 1 1 2 2 3 3 4 4 5 5 ...
# ..$ currency : Factor w/ 1 level "AUD": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ right : Factor w/ 2 levels "C","P": 1 2 1 2 1 2 1 2 1 2 ...
# ..$ local : Factor w/ 1241 levels "BHPV78","BHPV88",..: 1 2 3 4 5 6 7 8 9 10 ...
# ..$ multiplier : Factor w/ 1 level "100": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ combo_legs_desc: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ comboleg : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ include_expired: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ secIdType : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ secId : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# $ :'data.frame': 1204 obs. of 16 variables:
# ..$ conId : Factor w/ 1204 levels "198447027","198447030",..: 1 2 3 4 5 6 7 8 9 10 ...
# ..$ symbol : Factor w/ 1 level "CBA": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ sectype : Factor w/ 1 level "OPT": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ exch : Factor w/ 1 level "ASX": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ primary : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ expiry : Factor w/ 18 levels "20180628","20181220",..: 1 1 1 1 1 1 1 1 1 1 ...
# ..$ strike : Factor w/ 179 levels "79.68","81.68",..: 1 1 2 2 3 3 4 4 5 5 ...
# ..$ currency : Factor w/ 1 level "AUD": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ right : Factor w/ 2 levels "C","P": 1 2 1 2 1 2 1 2 1 2 ...
# ..$ local : Factor w/ 1204 levels "CBAKT9","CBAKU9",..: 1 2 3 4 5 6 7 8 9 10 ...
# ..$ multiplier : Factor w/ 1 level "100": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ combo_legs_desc: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ comboleg : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ include_expired: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ secIdType : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# ..$ secId : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...

You can use unstack
unstack(data.frame(d<-unlist(Chain),names(d)))
contract.expiry contract.right contract.strike contract.symbol version
1 20180621 C 25 BHP 8
2 20180621 C 26 BHP 8
3 20180730 C 25 BHP 8
4 20180730 C 26 BHP 8
5 20180621 C 65 CBA 8
6 20180621 C 64 CBA 8
7 20180730 C 65 CBA 8
8 20180730 C 64 CBA 8
If you want you can delete the word contract.
unstack(data.frame(d<-unlist(Chain),sub(".*[.]","",names(d))))
expiry right strike symbol version
1 20180621 C 25 BHP 8
2 20180621 C 26 BHP 8
3 20180730 C 25 BHP 8
4 20180730 C 26 BHP 8
5 20180621 C 65 CBA 8
6 20180621 C 64 CBA 8
7 20180730 C 65 CBA 8
8 20180730 C 64 CBA 8
This can also be written as unstack(data.frame(d<-unlist(Chain),sub("contract[.]","",names(d)))), although I would prefer to keep the name contract in order to know which columns actually form the contract dataframe that is needed.
Alternatively, you can change the names after unstacking.
With the new data:
a=readLines("https://raw.githubusercontent.com/hughandersen/OptionsTrading/master/Stocks_option_chain")
b=eval(parse(text=paste(a,collapse="")))
s=unstack(data.frame(d<-unlist(b[6]),names(d)))

Related

variables length differ found

I'm running into a problem in R when fitting a glm. The error says that variable lengths differ, found for "var1". When I delete var1 from the data, the same error appears for the next variable in the data. I checked all the data, and the lengths do not actually differ. How can I resolve this? Can anyone please help? Thanks in advance.
The data looks like this. d_status is my response variable and is a factor; it doesn't appear here because there are more variables.
'data.frame': 300 obs. of 20 variables:
$ age : num 28 43 32 64 37 42 36 48 55 31 ...
$ gender : num 1 2 2 2 1 2 2 1 2 2 ...
$ u_clarity: num 1 2 1 2 1 1 1 2 1 1 ...
$ ph : num 5 5.5 5 5 5 5 5 5.2 5 5 ...
$ sp_g : num 1.01 1.02 1.01 1.01 1.01 ...
$ albumin : num 1 1 2 1 2 2 2 1 1 2 ...
$ glucose : num 2 2 2 2 2 2 2 2 2 2 ...
$ sugar : num 2 1 2 2 2 2 1 1 1 2 ...
$ kb : num 2 2 2 2 2 2 2 2 2 2 ...
$ bpigment : num 2 2 2 2 2 2 2 1 2 2 ...
$ ur_bi : num 2 2 2 2 2 2 2 2 2 2 ...
$ blood : num 2 2 2 2 2 2 2 2 2 2 ...
$ pus_cells: num 1 2 1 2 1 1 1 1 1 1 ...
$ red_cells: num 1 2 1 2 1 1 1 2 1 1 ...
$ epi_cells: num 1 2 1 2 1 2 1 1 2 2 ...
$ mt : num 1 2 1 2 1 1 2 2 2 1 ...
$ co : num 2 1 1 2 1 1 1 2 2 1 ...
$ gc : num 2 1 1 1 1 1 1 2 2 1 ...
$ bacteria : num 1 2 1 2 1 1 1 2 2 1 ...
$ cc : num 1 1 1 2 1 1 1 2 1 1 ...
f1=glm(y~.,family=quasibinomial(link='logit'),data=dataset1[training,])
Error in model.frame.default(formula = y ~ ., data = dataset1[training, :
variable lengths differ (found for 'age')
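One common cause of this message, which may or may not be what happened here, is that the response named in the formula (y) is not a column of the data frame passed to data. R then looks y up in the workspace, where it still has the full length, while the predictors come from the subset of rows. The sketch below reproduces that situation on made-up data; the column names mirror the question, but the values and the training index are assumptions.

```r
set.seed(1)
# Made-up stand-in for dataset1: two predictors and the factor response d_status.
dataset1 <- data.frame(age = rnorm(30), ph = rnorm(30),
                       d_status = factor(rep(c(0, 1), 15)))
training <- 1:20            # assumed row index of the training set
y <- dataset1$d_status      # free-standing vector of length 30

# Fails: y is not a column of dataset1, so it is taken from the workspace
# (30 values) while the predictors come from the 20 training rows.
res <- try(glm(y ~ ., family = binomial, data = dataset1[training, ]),
           silent = TRUE)

# Works: the response is named by the column that lives inside the data frame.
fit <- glm(d_status ~ ., family = binomial, data = dataset1[training, ])
```

The error names the first predictor it checks, which would explain why deleting that column only moves the message to the next variable.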

Categorical data in R with h2o

I have run a logistic regression model with both categorical and numerical variables. The model was trying to predict the number of website visits in a month based on the first week. Unsurprisingly, the number of website visits in the first week was the strongest indicator. However, when I ran h2o deep learning with various models and activation functions, the models performed very poorly. According to the var_imp function, deep learning gives importance to variables that my logistic regression model, which is quite good, finds unimportant, and only categorical subsets seem to be ranked with high importance. The model does not perform well even on the training data, a real warning sign! So I wanted to post my code to check that I am not doing anything that harms the model. It seems strange for logistic regression to get it about right but deep learning to get it so wrong, so I imagine it's something I've done.
summary(data)
$ VAR1: Factor w/ 8 levels ,..: 1 5 2 1 7 2 5 1 5 1 ...
$ VAR2: Factor w/ 5 levels ,..: 1 4 1 1 4 4 4 1 1 4 ...
$ VAR3: Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 2 ...
$ VAR4: Factor w/ 2 levels : 2 1 2 2 1 1 1 2 2 1 ...
$ VAR5 : num 1000 20 30 20 30 30 30 50 30 400 ...
$ VAR6: Factor w/ 2 levels "N","Y": 1 2 2 1 2 2 2 2 1 2 ...
$ VAR7: Factor w/ 2 levels "N","Y": 1 2 2 1 2 2 2 2 1 2 ...
$ VAR8: num 0 0 0 0 0 0 0 0 0 0 ...
$ VAR9: num 56 52 49 29 28 38 34 79 53 36 ...
$ VAR10: num 3 2 1 3 2 2 3 4 2 2 ...
$ VAR11: num 1 1 1 2 2 1 1 1 1 2 ...
$ VAR12: Factor w/ 2 levels "N","Y": 1 1 1 1 2 1 1 1 1 1 ...
$ VAR13: num 1 0 1 1 1 0 1 0 0 0 ...
$ VAR14: Factor w/ 2 levels "N","Y": 2 1 1 1 1 1 1 1 1 1 ...
$ VAR15: Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
$ VAR16: num 1 0 0 1 0 0 0 1 1 0 ...
$ VAR17: num 19 7 1 4 10 2 4 4 7 12 ...
$ VAR18: Factor w/ 2 levels "N","Y": 1 2 2 2 2 2 2 1 2 1 ...
$ VAR19: Factor w/ 2 levels "0","Y": 1 1 2 1 1 1 1 1 1 1 ...
$ VAR20: Factor w/ 2 levels "N","Y": 1 1 2 1 1 1 1 1 1 1 ...
$ VAR21: Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
$ VAR22: num 0.579 0 0 0 0.4 ...
$ VAR23: num 1.89 1 1 1 2.9 ...
$ VAR24: num 0.02962 0.00691 0.05327 0.02727 0.01043 ...
$ VAR25: Factor w/ 3 levels ..: 2 2 2 3 3 2 3 2 1 3 ...
$ VAR26: num 3 2 1 2 3 1 2 1 2 4 ...
$ VAR27: num 3 2 1 1 5 1 1 1 1 2 ...
$ VAR_RESPONSE: num 7 24 4 3 8 12 5 48 2 7 ...
sapply(data,function(x) sum(is.na(x)))
colSums(is.na(data))
data[is.na(data)] = 0
d.hex = as.h2o(data, destination_frame= "d.hex")
Data_g.split = h2o.splitFrame(data = d.hex,ratios = 0.75)
Data_train = Data_g.split[[1]]#75% training data
Data_test = Data_g.split[[2]]
activation_opt <-
c("Rectifier","RectifierWithDropout","Maxout","MaxoutWithDropout",
"Tanh","TanhWithDropout")
hidden_opt <- list(c(10,10),c(20,15),c(50,50,50),c(5,3,2),c(100,100),c(5),c(30,30,30),c(50,50,50,50),c(5,4,3,2))
l1_opt <- c(0,1e-3,1e-5,1e-7,1e-9)
l2_opt <- c(0,1e-3,1e-5,1e-7,1e-9)
hyper_params <- list( activation=activation_opt,
hidden=hidden_opt,
l1=l1_opt,
l2=l2_opt )
search_criteria <- list(strategy = "RandomDiscrete", max_models=30)
dl_grid10 <- h2o.grid("deeplearning"
,grid_id = "deep_learn10"
,hyper_params = hyper_params
,search_criteria = search_criteria
,x = 1:27
,y = "VAR_RESPONSE"
,training_frame = Data_train)
d_grid10 <- h2o.getGrid("deep_learn10",sort_by = "mse")
mn = h2o.deeplearning(x = 1:27,
y = "VAR_RESPONSE",
training_frame = Data_train,
model_id = "mn",
activation = "Maxout",
l1 = 0,
l2 = 1e-9,
hidden = c(100,100))

R, aggregate function apparently causes loss of column levels?

I just encountered a weird situation in RGui...I used the same script as always to get my data.frame into the right shape for ggplot2. So my data looks like the following:
time days treatment nucleic_acid habitat parallel disturbance variable cellcounts value
1 1 2 control dna water 1 none Proteobacteria batch 0.000000000
2 2 22 control dna water 1 none Proteobacteria batch 0.003586543
3 1 2 treated dna water 1 none Proteobacteria batch 0.000000000
4 2 22 treated dna biofilm 1 none Proteobacteria NA 0.000000000
'data.frame': 185648 obs. of 10 variables:
$ time : int 5 5 5 5 5 5 6 6 6 6 ...
$ days : int 62 62 62 62 62 62 69 69 69 69 ...
$ treatment : Factor w/ 2 levels "control","treated": 2 2 2 1 1 1 2 2 2 1 ...
$ parallel : int 1 2 3 1 2 3 1 2 3 1 ...
$ nucleic_acid: Factor w/ 2 levels "cdna","dna": 1 1 1 1 1 1 1 1 1 1 ...
$ habitat : Factor w/ 2 levels "biofilm","water": 1 1 1 1 1 1 1 1 1 1 ...
$ cellcounts : Factor w/ 4 levels "batch","high",..: NA NA NA NA NA NA NA NA NA NA ...
$ disturbance : Factor w/ 3 levels "high","low","none": 3 3 3 3 3 3 3 3 3 3 ...
$ variable : Factor w/ 656 levels "Proteobacteria",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 0 0 0 0 0 0 0 0 0 0 ...
and I wanted aggregate to calculate the mean value of my up to 3 parallels:
df_mean<-aggregate(value~time+days+treatment+nucleic_acid+habitat+disturbance+variable+cellcounts, data = df, mean)
afterwards, the level "biofilm" in column "habitat" is lost.
df_mean<-droplevels(df_mean)
str(df_mean)
'data.frame': 44608 obs. of 9 variables:
$ time : int 1 2 1 2 1 2 1 2 1 2 ...
$ days : int 2 22 2 22 2 22 2 22 2 22 ...
$ treatment : Factor w/ 2 levels "control","treated": 1 1 2 2 1 1 2 2 1 1 ...
$ nucleic_acid: Factor w/ 2 levels "cdna","dna": 2 2 2 2 2 2 2 2 2 2 ...
$ habitat : Factor w/ 1 level "water": 1 1 1 1 1 1 1 1 1 1 ...
$ disturbance : Factor w/ 3 levels "high","low","none": 3 3 3 3 3 3 3 3 3 3 ...
$ variable : Factor w/ 656 levels "Proteobacteria",..: 1 1 1 1 2 2 2 2 3 3 ...
$ cellcounts : Factor w/ 4 levels "batch","high",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 0 0.00359 0 0 0 ...
So I spent a lot of time looking into this (I actually only just realised that many more issues I had now all seem to be aggregate related). I removed the column "cellcounts" and it worked. Interestingly, for "biofilm" the columns "cellcounts" and "habitat" always carry the same, and therefore redundant, information ("biofilm" always goes with NA). Is this the cause? It always worked before, so I can't get my head around this. Was there a change to the base::aggregate function or something like that? Do you have an explanation for me? I'm using R-3.4.0; other packages used are reshape, reshape2 and ggplot2.
Thx a lot, a confused crazysantaclaus
The issue comes from the NA values. Maybe your file was loaded differently in the past, so that these were stored as strings instead of NA values? Here's a way to solve it by setting them to an "NA" string:
levels(df$cellcounts) <- c(levels(df$cellcounts),"NA")
df$cellcounts[is.na(df$cellcounts)] <- "NA"
df_mean <- aggregate(value ~ time+days+treatment+nucleic_acid+habitat+disturbance+variable+cellcounts, data = df, mean,na.rm=TRUE)
df_mean<-droplevels(df_mean)
str(df_mean)
'data.frame': 4 obs. of 9 variables:
$ time : int 1 2 1 2
$ days : int 2 22 2 22
$ treatment : Factor w/ 2 levels "control","treated": 1 1 2 2
$ nucleic_acid: Factor w/ 1 level "dna": 1 1 1 1
$ habitat : Factor w/ 2 levels "biofilm","water": 2 2 2 1
$ disturbance : Factor w/ 1 level "none": 1 1 1 1
$ variable : Factor w/ 1 level "Proteobacteria": 1 1 1 1
$ cellcounts : Factor w/ 2 levels "batch","NA": 1 1 1 2
$ value : num 0 0.00359 0 0
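To make the mechanism visible: the formula method of aggregate defaults to na.action = na.omit, so any row with an NA in a variable used in the formula is dropped before grouping. The sketch below uses the four sample rows from the question and a shortened formula; the replacement label "missing" is an arbitrary choice, not something from the original post.

```r
df <- read.table(text = "time days treatment nucleic_acid habitat parallel disturbance variable cellcounts value
1 1 2 control dna water 1 none Proteobacteria batch 0.000000000
2 2 22 control dna water 1 none Proteobacteria batch 0.003586543
3 1 2 treated dna water 1 none Proteobacteria batch 0.000000000
4 2 22 treated dna biofilm 1 none Proteobacteria NA 0.000000000
", header = TRUE, stringsAsFactors = TRUE)

# With the default na.action = na.omit, row 4 (cellcounts is NA) is removed,
# and with it the only "biofilm" observation.
lost <- aggregate(value ~ habitat + cellcounts, data = df, FUN = mean)

# Recode NA as an ordinary value before aggregating, so the row survives.
df$cellcounts <- as.character(df$cellcounts)
df$cellcounts[is.na(df$cellcounts)] <- "missing"
df$cellcounts <- factor(df$cellcounts)
kept <- aggregate(value ~ habitat + cellcounts, data = df, FUN = mean)
```

Comparing nrow(lost) with nrow(kept), or the habitat values in each, shows whether rows were dropped on the way.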
data
df <- read.table(text=" time days treatment nucleic_acid habitat parallel disturbance variable cellcounts value
1 1 2 control dna water 1 none Proteobacteria batch 0.000000000
2 2 22 control dna water 1 none Proteobacteria batch 0.003586543
3 1 2 treated dna water 1 none Proteobacteria batch 0.000000000
4 2 22 treated dna biofilm 1 none Proteobacteria NA 0.000000000
",header=T)

Merging in R: 1 row missing after merge

I have a dataframe movielens:
str(u.data)
'data.frame': 100000 obs. of 4 variables:
$ userID : int 196 186 22 244 166 298 115 253 305 6 ...
$ movieID : int 242 302 377 51 346 474 265 465 451 86 ...
$ rating : int 3 3 1 2 1 4 2 5 3 3 ...
$ timestamp: int 881250949 891717742 878887116 880606923 886397596 884182806 881171488 891628467 886324817 883603013 ...
and
str(u.item)
'data.frame': 1681 obs. of 20 variables:
$ unknown : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Action : Factor w/ 2 levels "0","1": 1 2 1 2 1 1 1 1 1 1 ...
$ Adventure : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
$ Animation : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 1 1 1 1 ...
$ Childrens : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 1 2 1 1 ...
$ Comedy : Factor w/ 2 levels "0","1": 2 1 1 2 1 1 1 2 1 1 ...
$ Crime : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
$ Documentary: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Drama : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 2 2 2 2 ...
$ Fantasy : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Film-Noir : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Horror : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Musical : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Mystery : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Romance : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Sci-Fi : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 1 1 1 ...
$ Thriller : Factor w/ 2 levels "0","1": 1 2 2 1 2 1 1 1 1 1 ...
$ War : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
$ Western : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ movieID : int 1 2 3 4 5 6 7 8 9 10 ...
The number of rows in u.data is 100,000:
nrow(u.data)
[1] 100000
And
nrow(u.item)
[1] 1681
Then, I want to merge them:
all_data = u.data
all_data = merge(all_data, u.item, by = "movieID")
But the merged data has only 99,999 rows:
nrow(all_data)
[1] 99999
Did I do something wrong while merging these two data frames?
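This happens whenever a movieID in u.data has no match in u.item, for example if min(u.data$movieID) < min(u.item$movieID) or if max(u.data$movieID) > max(u.item$movieID). By default, merge keeps only rows whose key appears in both data frames. Example for the latter: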
# max(u.data$movieID) = 10
u.data <- data.frame(movieID = 1:10, NAME = LETTERS[1:10])
dim(u.data)
# [1] 10 2
# max(u.item$movieID) = 11
u.item <- data.frame(movieID = c(1:9,11), name = letters[c(1:9,11)])
dim(u.item)
# [1] 10 2
out <- merge(u.data, u.item, by = "movieID")
dim(out)
# [1] 9 3
# check if all elements of u.data$movieID exist in u.item$movieID
is.element(u.data$movieID, u.item$movieID)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
Suggested by Batanichek:
out <- merge(u.data, u.item, by = "movieID", all.x = TRUE)
dim(out)
# [1] 10 3
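To find out which rows are affected before merging, the unmatched keys can be listed directly. This is a small sketch on the toy data frames from the example above; on the real data the same calls would be applied to u.data and u.item as loaded.

```r
u.data <- data.frame(movieID = 1:10, NAME = LETTERS[1:10])
u.item <- data.frame(movieID = c(1:9, 11), name = letters[c(1:9, 11)])

# movieIDs that occur in u.data but have no counterpart in u.item
missing_ids <- setdiff(u.data$movieID, u.item$movieID)

# the rows of u.data that an inner merge would drop
dropped <- u.data[u.data$movieID %in% missing_ids, ]
```

If dropped has exactly as many rows as went missing in the merge, the unmatched keys fully account for the difference.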

getting "PC1" instead of variable name in principal component analysis

I have some data that looks like this:
head(data)
net1re net2re net3re net4re net5re net6re
24 3 2 1 2 3 3
33 1 1 1 1 1 2
30 3 1 1 1 1 3
22 2 1 1 1 1 1
31 3 2 1 1 1 2
1 2 1 1 1 1 2
I'm running principal component analysis as follows:
library(psych)
fit <- principal(data[,1:6], rotate="varimax")
data$friendship=fit$scores
This creates the variable "friendship" which I can call on the console:
> colnames(data)
[1] "net1re" "net2re" "net3re" "net4re" "net5re"
[6] "net6re" "friendship"
But when I want to view my data, instead of the variable name I get "PC1":
> head(data)
net1re net2re net3re net4re net5re net6re PC1
24 3 2 1 2 3 3 1.29231531
33 1 1 1 1 1 2 -0.68448111
30 3 1 1 1 1 3 0.02783916
22 2 1 1 1 1 1 -0.67371031
31 3 2 1 1 1 2 0.10251282
1 2 1 1 1 1 2 -0.44345075
This becomes a major problem because I need to repeat this with different variables, and all the results get "PC1".
Why is this happening, and how can I assign the variable name instead of "PC1"?
Thanks
This unusual effect appears because fit$scores is a matrix:
str(data)
#'data.frame': 6 obs. of 7 variables:
# $ net1re : int 3 1 3 2 3 2
# $ net2re : int 2 1 1 1 2 1
# $ net3re : int 1 1 1 1 1 1
# $ net4re : int 2 1 1 1 1 1
# $ net5re : int 3 1 1 1 1 1
# $ net6re : int 3 2 3 1 2 2
# $ friendship: num [1:6, 1] 1.1664 -1.261 0.0946 -0.5832 1.1664 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr "24" "33" "30" "22" ...
# .. ..$ : chr "PC1"
To get the desired result, you can use
data$friendship=as.vector(fit$scores)
or
data$friendship=fit$scores[,1]
In either case, the output will be:
data
# net1re net2re net3re net4re net5re net6re friendship
#24 3 2 1 2 3 3 1.16635312
#33 1 1 1 1 1 2 -1.26098965
#30 3 1 1 1 1 3 0.09463653
str(data)
#'data.frame': 6 obs. of 7 variables:
# $ net1re : int 3 1 3 2 3 2
# $ net2re : int 2 1 1 1 2 1
# $ net3re : int 1 1 1 1 1 1
# $ net4re : int 2 1 1 1 1 1
# $ net5re : int 3 1 1 1 1 1
# $ net6re : int 3 2 3 1 2 2
# $ friendship: num 1.1664 -1.261 0.0946 -0.5832 1.1664 ...
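The same behaviour can be reproduced without psych. The sketch below uses a made-up one-column matrix named scores as a stand-in for fit$scores; the values are arbitrary.

```r
data <- data.frame(net1re = c(3L, 1L, 3L))

# Stand-in for fit$scores: a one-column matrix whose column is called "PC1".
scores <- matrix(c(0.5, -1, 0.5), ncol = 1, dimnames = list(NULL, "PC1"))

# Assigning the matrix keeps it as a matrix column inside the data frame.
data$friendship <- scores
was_matrix <- is.matrix(data$friendship)

# Dropping to a plain vector removes the dimnames, and with them "PC1".
data$friendship <- scores[, 1]
is_matrix_now <- is.matrix(data$friendship)
```

When the analysis is repeated for several sets of variables, extracting the first column each time gives every new column the name it was assigned.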

Resources