Importing CSV of arrays as list - r

I'm trying to do the following:
I have a .csv file with N rows and 2 columns that I need to import and convert to a list.
Example file from .csv:
First seven rows of data
I import with command: points <- read.csv("points.csv")
'data.frame': 42 obs. of 2 variables:
$ Firefly : int 0 1 0 1 0 1 0 1 0 1 ...
$ Hawkes_times: Factor w/ 42 levels "[ 0.03485687 0.20167375 0.20275073
I need it as a sorted "List of 2" (one for each Firefly) with the following structure:
> str(points)
List of 2
$ : num [1:33] 0.79 0.87 0.88 0.89 0.94 1.01 1.13 1.19 ...
$ : num [1:14] 0.00 0.10 0.56 0.67 1.27 1.31 1.37 1.42 ...
where the first element corresponds to Firefly == 0 and the second to Firefly == 1.
I attempt the following:
fy0 <- subset(points,Firefly == 0)
fy1 <- subset(points,Firefly == 1)
points.list <- list(fy0,fy1)
> str(points.list)
List of 2
$ :'data.frame': 21 obs. of 2 variables:
..$ Firefly : int [1:21] 0 0 0 0 0 0 0 0 0 0 ...
..$ Hawkes_times: Factor w/ 42 levels "[ 0.03485687 0.20167375 0.20275073 0.20941455 0.40515277 0.47026309\n 0.55714817 0.64789982 0.70749241 "| __truncated__,..: 30 29 28 31 39 40 33 37 25 24 ...
$ :'data.frame': 21 obs. of 2 variables:
..$ Firefly : int [1:21] 1 1 1 1 1 1 1 1 1 1 ...
..$ Hawkes_times: Factor w/ 42 levels "[ 0.03485687 0.20167375 0.20275073 0.20941455 0.40515277 0.47026309\n 0.55714817 0.64789982 0.70749241 "| __truncated__,..: 26 32 21 23 20 41 34 22 27 36 ...
I think I need an as.numeric(fy0$Hawkes_times) somewhere, but I want to avoid loops, since I will have hundreds of rows and n Firefly values (fy0, fy1, fy2, ... fyn).
Thank you!
-Richard

points <- data.frame(firefly=rep(0:1, times=10), times=1:20)
split(points$times, points$firefly)
# $`0`
# [1] 1 3 5 7 9 11 13 15 17 19
# $`1`
# [1] 2 4 6 8 10 12 14 16 18 20
This does not rely on equally-sized groups:
set.seed(42)
points <- data.frame(firefly=sample(0:1, size=20, replace=TRUE), times=1:20)
split(points$times, points$firefly)
# $`0`
# [1] 3 8 11 14 15 18 19
# $`1`
# [1] 1 2 4 5 6 7 9 10 12 13 16 17 20
and as you can see the order is preserved.
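The examples above use plain numeric times. In the question, each Hawkes_times entry is a bracketed string of numbers, so it still has to be parsed before or after splitting. A minimal sketch, assuming every cell looks like "[ 0.03 0.20 ... ]" (possibly with embedded newlines); the toy data and the helper name parse_times are made up for illustration:

```r
# Toy data imitating the structure in the question (values are invented)
points <- data.frame(
  Firefly = c(0L, 1L, 0L),
  Hawkes_times = c("[ 0.79 0.87\n 0.88 ]", "[ 0.00 0.10 ]", "[ 0.94 1.01 ]"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip the brackets, then read the numbers
parse_times <- function(x) {
  x <- gsub("\\[|\\]", " ", as.character(x))  # as.character() also handles factors
  scan(text = x, quiet = TRUE)
}

# Split the strings by Firefly, then parse and sort each group
by_fly <- split(points$Hawkes_times, points$Firefly)
points.list <- lapply(by_fly, function(s) {
  sort(unlist(lapply(s, parse_times), use.names = FALSE))
})
str(points.list)
```

The lapply calls still iterate internally, but no explicit loop or per-Firefly variable is needed, so the same code should work for any number of Firefly values.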

Related

Access frequencies of an atomic vector in a tibble data frame

I am doing Exploratory Data Analysis on a tibble data frame. I've never used tibbles before, so I'm experiencing some difficulties.
My tibble data frame has this structure:
spec_tbl_df [7,397 x 19] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ X1 : num [1:7397] 9617 12179 9905 5745 10067 ...
$ Administrative : num [1:7397] 5 26 4 3 7 16 4 3 2 0 ...
$ Administrative_Duration: num [1:7397] 408 1562 58 103 165 ...
$ Informational : num [1:7397] 2 9 2 0 1 3 4 5 0 0 ...
$ Informational_Duration : num [1:7397] 47.5 503.7 28.5 0 28.5 ...
$ ProductRelated : num [1:7397] 54 183 82 25 115 86 75 23 27 33 ...
$ ProductRelated_Duration: num [1:7397] 1547 9676 4729 1109 3428 ...
$ BounceRates : num [1:7397] 0 0.0111 0 0 0 ...
$ ExitRates : num [1:7397] 0.01733 0.0142 0.01454 0.00167 0.01629 ...
$ PageValues : num [1:7397] 0 19.57 9.06 61.3 4.97 ...
$ SpecialDay : num [1:7397] 0 0 0 0 0 0 0 0 0 0 ...
$ Month : Factor w/ 10 levels "Aug","Dec","Feb",..: 8 8 8 1 8 4 8 7 8 8 ...
$ OperatingSystems : Factor w/ 8 levels "1","2","3","4",..: 2 3 2 2 2 3 3 4 8 2 ...
$ Browser : Factor w/ 13 levels "1","2","3","4",..: 2 2 2 2 2 2 2 1 2 5 ...
$ Region : Factor w/ 9 levels "1","2","3","4",..: 3 2 1 6 4 8 1 1 7 3 ...
$ TrafficType : Factor w/ 19 levels "1","2","3","4",..: 2 12 2 5 10 4 2 4 2 1 ...
$ VisitorType : Factor w/ 3 levels "New_Visitor",..: 3 3 3 1 3 3 3 3 1 3 ...
$ Weekend : Factor w/ 2 levels "FALSE","TRUE": 2 1 1 1 1 1 1 1 1 1 ...
$ Revenue : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
Now if I use plot_bar (from the DataExplorer package) to plot the categorical data, I have no problem. I would like, for example, to create a boxplot for the categorical variable "Month", with one box per month showing how the values are distributed. The problem is that I can't find a way to access the frequencies. If I do the following:
boxplot(Month)
it creates a single boxplot for all the data (all the months), which is not helpful at all.
I would like the months on the x axis, the frequencies on the y axis, and a boxplot for each month.
I've tried to "extract" the Month feature, transform it to a matrix and repeat the process, but it does not work.
Here is the variable Month taken alone:
> summary(x_Month)
Aug Dec Feb Jul June Mar May Nov Oct Sep
258 1034 123 259 166 1125 2014 1814 327 277
What am I missing?
Something like this would probably work to create a bar plot of the frequencies of Month:
library(ggplot2)
library(dplyr)  # provides the %>% pipe
# spec_tbl_df stands in for the name of your tibble
spec_tbl_df %>%
  ggplot(aes(x = Month)) +
  geom_bar()
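A boxplot summarises a numeric variable; Month on its own is categorical, so there is no per-month distribution to draw from it alone. If the goal is one box per month, Month has to be paired with one of the numeric columns. A sketch with made-up data (the data frame name sessions and the values are assumptions; Month and PageValues are column names taken from the question):

```r
# Invented stand-in for the tibble in the question
sessions <- data.frame(
  Month = factor(c("Aug", "Aug", "Aug", "Dec", "Dec", "Dec", "Feb", "Feb", "Feb")),
  PageValues = c(0, 19.57, 9.06, 61.3, 4.97, 0, 12.1, 3.3, 0)
)

# Frequencies per month (what summary() of the factor reports)
freq <- table(sessions$Month)
print(freq)

# One box per month for a numeric column, using the formula interface
bp <- boxplot(PageValues ~ Month, data = sessions,
              xlab = "Month", ylab = "PageValues")
```

The counts in freq are a single number per month, which is why a bar plot suits them better than a boxplot.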

Extracting Gene Names RNAseq DataSet in R

I have a problem I can't understand or solve. I downloaded GSE115262 from GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115262. I want to extract the gene names from GSM3172784HC$annotation.gene_name, but when I do, I get numbers instead of the gene names. How do I get the character values? If I run str(), I get $ annotation.gene_name : Factor w/ 56233 levels "5_8S_rRNA","5S_rRNA",..: 53514 52750 11836 48738, so the values show up as numbers. If I run head() and look at GSM3172784HC$annotation.gene_name, I see the gene names, which is what I want. How do I get these?
#### Need to load in all libraries
#General Bioconductor packages
library("GEOquery");
library("Biobase");
# Loop Through Files for download
for(i in 1:length(tmp$V1)){
getGEOSuppFiles(tmp$V1[i])
};
######## Healthy Controls GSE115262 ##########
## May need to read thing mult. times to get into R
GSM3172784HC<-read.table(gzfile("FilePath.txt.gz"), header=T)
## New data-frame
HCData<- cbind(GSM3172784HC$annotation.gene_name, GSM3172784HC$expected_count);
HCData<- as.data.frame(HCData)
row.names(HCData) <- HCData$V1
colnames(HCData) <- c("HC1")
str(GSM3172784HC)
'data.frame': 57955 obs. of 11 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ annotation.gene_id : Factor w/ 57955 levels "ENSG00000000003",..: 1 2 3 4 5 6 7 8 9 10 ...
$ annotation.gene_biotype: Factor w/ 43 levels "3prime_overlapping_ncRNA",..: 20 20 20 20 20 20 20 20 20 20 ...
$ annotation.gene_name : Factor w/ 56233 levels "5_8S_rRNA","5S_rRNA",..: 53514 52750 11836 48738 5916 13731 7375 14125 14433 24521 ...
$ annotation.source : Factor w/ 4 levels "ensembl","ensembl_havana",..: 2 2 2 2 2 2 2 2 2 2 ...
$ transcript_id.s. : Factor w/ 57955 levels "ENST00000000233,ENST00000415666,ENST00000459680,ENST00000463733,ENST00000467281,ENST00000489673",..: 17666 17669 17397 16695 5799 17850 14301 7 1276 12553 ...
$ length : num 1749 940 1073 1538 2430 ...
$ effective_length : num 1623 814 947 1412 2304 ...
$ expected_count : num 0 0 1 1 0 2 2 0 1 1 ...
$ TPM : num 0 0 0.27 0.18 0 0.23 0.07 0 0.65 0.17 ...
$ FPKM : num 0 0 0.41 0.27 0 0.35 0.11 0 0.98 0.25 ...
head(GSM3172784HC)
X annotation.gene_id annotation.gene_biotype annotation.gene_name
1 1 ENSG00000000003 protein_coding TSPAN6
2 2 ENSG00000000005 protein_coding TNMD
3 3 ENSG00000000419 protein_coding DPM1
4 4 ENSG00000000457 protein_coding SCYL3
5 5 ENSG00000000460 protein_coding C1orf112
6 6 ENSG00000000938 protein_coding FGR
annotation.source
1 ensembl_havana
2 ensembl_havana
3 ensembl_havana
4 ensembl_havana
5 ensembl_havana
6 ensembl_havana
transcript_id.s.
1 ENST00000373020,ENST00000494424,ENST00000496771,ENST00000612152,ENST00000614008
2 ENST00000373031,ENST00000485971
3 ENST00000371582,ENST00000371584,ENST00000371588,ENST00000413082,ENST00000466152,ENST00000494752
4 ENST00000367770,ENST00000367771,ENST00000367772,ENST00000423670,ENST00000470238
5 ENST00000286031,ENST00000359326,ENST00000413811,ENST00000459772,ENST00000466580,ENST00000472795,ENST00000481744,ENST00000496973,ENST00000498289
6 ENST00000374003,ENST00000374004,ENST00000374005,ENST00000399173,ENST00000457296,ENST00000468038,ENST00000475472
length effective_length expected_count TPM FPKM
1 1749.40 1623.17 0 0.00 0.00
2 940.50 814.28 0 0.00 0.00
3 1073.00 946.77 1 0.27 0.41
4 1538.00 1411.77 1 0.18 0.27
5 2430.11 2303.88 0 0.00 0.00
6 2350.00 2223.77 2 0.23 0.35
We can convert the factor columns to character:
library(dplyr)
GSM3172784HC <- GSM3172784HC %>%
mutate_if(is.factor, as.character)
Or with mutate/across
GSM3172784HC <- GSM3172784HC %>%
mutate(across(where(is.factor), as.character))
In base R, we can do
i1 <- sapply(GSM3172784HC, is.factor)
GSM3172784HC[i1] <- lapply(GSM3172784HC[i1], as.character)
NOTE: With R >= 4.0.0, stringsAsFactors = FALSE is the default, so read.table() no longer creates factors unless asked to.
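The numbers in the question most likely come from the cbind() call: cbind() builds a matrix, and a factor passed to it contributes its integer codes rather than its labels. A small sketch with invented values (gsm is a hypothetical stand-in for GSM3172784HC) that shows the effect and one way to build the one-column table directly:

```r
# Invented three-gene stand-in for the real table
gsm <- data.frame(
  annotation.gene_name = factor(c("TSPAN6", "TNMD", "DPM1")),
  expected_count = c(0, 0, 1)
)

# cbind() makes a numeric matrix; the factor is reduced to its integer codes
codes <- cbind(gsm$annotation.gene_name, gsm$expected_count)
print(codes)

# Converting to character first keeps the names; make.unique() guards
# against repeated gene names, which row names do not allow
hc <- data.frame(
  HC1 = gsm$expected_count,
  row.names = make.unique(as.character(gsm$annotation.gene_name))
)
print(hc)
```

In the real data the gene-name column has fewer levels than rows, so some names repeat; make.unique() is one possible way to handle that, not necessarily the right one for a downstream analysis.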

How to combine training and testing dataset in same format

I am practicing with this dataset: http://archive.ics.uci.edu/ml/datasets/Census+Income
I loaded training & testing data.
# Downloading train and test data
trainFile = "adult.data"; testFile = "adult.test"
if (!file.exists (trainFile))
download.file (url = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
destfile = trainFile)
if (!file.exists (testFile))
download.file (url = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",
destfile = testFile)
# Assigning column names
colNames = c ("age", "workclass", "fnlwgt", "education",
"educationnum", "maritalstatus", "occupation",
"relationship", "race", "sex", "capitalgain",
"capitalloss", "hoursperweek", "nativecountry",
"incomelevel")
# Reading training data
training = read.table (trainFile, header = FALSE, sep = ",",
strip.white = TRUE, col.names = colNames,
na.strings = "?", stringsAsFactors = TRUE)
# Load the testing data set
testing = read.table (testFile, header = FALSE, sep = ",",
strip.white = TRUE, col.names = colNames,
na.strings = "?", fill = TRUE, stringsAsFactors = TRUE)
I need to combine the two into one, but there is a problem: the structure of the two data sets is not the same.
Display structure of the training data
> str (training)
'data.frame': 32561 obs. of 15 variables:
$ age : int 39 50 38 53 28 37 49 52 31 42 ...
$ workclass : Factor w/ 8 levels "Federal-gov",..: 7 6 4 4 4 4 4 6 4 4 ...
$ fnlwgt : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
$ education : Factor w/ 16 levels "10th","11th",..: 10 10 12 2 10 13 7 12 13 10 ...
$ educationnum : int 13 13 9 7 13 14 5 9 14 13 ...
$ maritalstatus: Factor w/ 7 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
$ occupation : Factor w/ 14 levels "Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
$ relationship : Factor w/ 6 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
$ race : Factor w/ 5 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
$ sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 1 1 1 2 1 2 ...
$ capitalgain : int 2174 0 0 0 0 0 0 0 14084 5178 ...
$ capitalloss : int 0 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int 40 13 40 40 40 40 16 45 50 40 ...
$ nativecountry: Factor w/ 41 levels "Cambodia","Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
$ incomelevel : Factor w/ 2 levels "<=50K",">50K": 1 1 1 1 1 1 1 2 2 2 ...
Display structure of the testing data
> str (testing)
'data.frame': 16282 obs. of 15 variables:
$ age : Factor w/ 74 levels "|1x3 Cross validator",..: 1 10 23 13 29 3 19 14 48 9 ...
$ workclass : Factor w/ 9 levels "","Federal-gov",..: 1 5 5 3 5 NA 5 NA 7 5 ...
$ fnlwgt : int NA 226802 89814 336951 160323 103497 198693 227026 104626 369667 ...
$ education : Factor w/ 17 levels "","10th","11th",..: 1 3 13 9 17 17 2 13 16 17 ...
$ educationnum : int NA 7 9 12 10 10 6 9 15 10 ...
$ maritalstatus: Factor w/ 8 levels "","Divorced",..: 1 6 4 4 4 6 6 6 4 6 ...
$ occupation : Factor w/ 15 levels "","Adm-clerical",..: 1 8 6 12 8 NA 9 NA 11 9 ...
$ relationship : Factor w/ 7 levels "","Husband","Not-in-family",..: 1 5 2 2 2 5 3 6 2 6 ...
$ race : Factor w/ 6 levels "","Amer-Indian-Eskimo",..: 1 4 6 6 4 6 6 4 6 6 ...
$ sex : Factor w/ 3 levels "","Female","Male": 1 3 3 3 3 2 3 3 3 2 ...
$ capitalgain : int NA 0 0 0 7688 0 0 0 3103 0 ...
$ capitalloss : int NA 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int NA 40 50 40 40 30 30 40 32 40 ...
$ nativecountry: Factor w/ 41 levels "","Cambodia",..: 1 39 39 39 39 39 39 39 39 39 ...
$ incomelevel : Factor w/ 3 levels "","<=50K.",">50K.": 1 2 2 3 3 2 2 2 3 2 ...
Problem 1:
age has become a factor in testing, and every other factor in testing has one more level than the same factor in training. This is because the first row of the testing file is an unnecessary row:
|1x3 Cross validator
I tried to get rid of it by re-assigning testing:
testing = testing[-1,]
but after running str() again, I don't see any change.
Problem 2:
As I said before, I need to combine these two data frames into one. So I ran this:
combined <- rbind(training , testing)
Besides problem 1, I can see a new problem after running str():
> str(combined)
'data.frame': 48842 obs. of 15 variables:
$ age : chr "39" "50" "38" "53" ...
$ workclass : Factor w/ 9 levels "Federal-gov",..: 7 6 4 4 4 4 4 6 4 4 ...
$ fnlwgt : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
$ education : Factor w/ 17 levels "10th","11th",..: 10 10 12 2 10 13 7 12 13 10 ...
$ educationnum : int 13 13 9 7 13 14 5 9 14 13 ...
$ maritalstatus: Factor w/ 8 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
$ occupation : Factor w/ 15 levels "Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
$ relationship : Factor w/ 7 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
$ race : Factor w/ 6 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
$ sex : Factor w/ 3 levels "Female","Male",..: 2 2 2 2 1 1 1 2 1 2 ...
$ capitalgain : int 2174 0 0 0 0 0 0 0 14084 5178 ...
$ capitalloss : int 0 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int 40 13 40 40 40 40 16 45 50 40 ...
$ nativecountry: Factor w/ 42 levels "Cambodia","Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
$ incomelevel : Factor w/ 5 levels "<=50K",">50K",..: 1 1 1 1 1 1 1 2 2 2 ...
The target variable (incomelevel) has 5 factor levels in the combined data frame, whereas it has 2 (which is correct) in the training data frame and 3 (one extra because of problem 1) in the testing data frame. This is because there is a . (dot) after each incomelevel value in the testing data frame (<=50K., <=50K., >50K., ...). So I need to remove that dot, but I have no idea how. Is there a function for this?
I am very new to data analysis and R, which is why I am running into this type of basic issue. Can you please help me solve it?
I think you can skip the first line of the test file, which looks like a header. That solves the issue of age being read as a factor:
head(readLines(testFile))
[1] "|1x3 Cross validator"
[2] "25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, Black, Male, 0, 0, 40, United-States, <=50K."
[3] "38, Private, 89814, HS-grad, 9, Married-civ-spouse, Farming-fishing, Husband, White, Male, 0, 0, 50, United-States, <=50K."
Rerunning your code, we can use read.csv with skip=1 for the test file:
colNames = c ("age", "workclass", "fnlwgt", "education",
"educationnum", "maritalstatus", "occupation",
"relationship", "race", "sex", "capitalgain",
"capitalloss", "hoursperweek", "nativecountry",
"incomelevel")
# Reading training data
training = read.csv (trainFile, header = FALSE, col.names = colNames,stringsAsFactors = TRUE,na.strings = "?",strip.white = TRUE)
testing = read.csv (testFile, header = FALSE, col.names = colNames,na.strings = "?",stringsAsFactors = TRUE,skip=1,strip.white = TRUE)
The income level unfortunately has to be corrected manually; it's a good thing you checked:
testing$incomelevel = factor(gsub("\\.","",as.character(testing$incomelevel)))
Checking the levels, the only difference is in nativecountry:
all.equal(sapply(testing,levels) ,sapply(training,levels))
[1] "Component “nativecountry”: Lengths (40, 41) differ (string compare on first 40)"
[2] "Component “nativecountry”: 26 string mismatches"
I don't think there's much you can do about that; you may have to remove that level before or after joining:
setdiff(levels(training$nativecountry),levels(testing$nativecountry))
[1] "Holand-Netherlands"
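Putting the steps together, here is a self-contained sketch on a tiny invented sample (three columns only, read from strings instead of the downloaded files). It assumes the same quirks as the real files: a junk first line in the test data and a trailing dot on incomelevel.

```r
# Invented miniature versions of adult.data and adult.test
train_txt <- "39, State-gov, <=50K\n50, Private, >50K"
test_txt  <- "|1x3 Cross validator\n25, Private, <=50K.\n38, Private, >50K."
cols <- c("age", "workclass", "incomelevel")

training <- read.csv(text = train_txt, header = FALSE, col.names = cols,
                     strip.white = TRUE, stringsAsFactors = TRUE)
# skip = 1 drops the junk first line, so age is read as integer
testing <- read.csv(text = test_txt, header = FALSE, col.names = cols,
                    strip.white = TRUE, stringsAsFactors = TRUE, skip = 1)

# Remove the trailing dot and reuse the training levels
testing$incomelevel <- factor(
  sub("\\.$", "", as.character(testing$incomelevel)),
  levels = levels(training$incomelevel)
)

combined <- rbind(training, testing)
str(combined)
```

Passing levels = levels(training$incomelevel) keeps the two factors aligned, so rbind() does not create extra levels for the target variable.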

Change type of variables in multiple data frames

I have a list of data frames:
str(df.list)
List of 34
$ :'data.frame': 506 obs. of 7 variables:
..$ Protocol : Factor w/ 5 levels "P1","P2","P3",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ Time : num [1:506] 0 2 3 0.5 6 1 24 24 24 24 ...
..$ SampleID : Factor w/ 40 levels "P1T0","P1T0.5",..: 1 5 7 2 8 3 6 6 6 6 ...
..$ VolunteerID: Factor w/ 15 levels "ID-02","ID-03",..: 10 10 10 10 10 10 10 11 13 14 ...
..$ Assay : Factor w/ 1 level "ALAT": 1 1 1 1 1 1 1 1 1 1 ...
..$ ResultAssay: int [1:506] 23 23 23 24 25 24 20 34 28 17 ...
..$ Index : Factor w/ 502 levels "P1T0.5VID-02",..: 8 31 37 2 43 19 25 26 28 29 ...
$ :'data.frame': 505 obs. of 7 variables:
..$ Protocol : Factor w/ 5 levels "P1","P2","P3",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ Time : num [1:505] 0 2 3 0.5 6 1 24 24 24 24 ...
..$ SampleID : Factor w/ 40 levels "P1T0","P1T0.5",..: 1 5 7 2 8 3 6 6 6 6 ...
..$ VolunteerID: Factor w/ 15 levels "ID-02","ID-03",..: 10 10 10 10 10 10 10 11 13 14 ...
..$ Assay : Factor w/ 1 level "ALB": 1 1 1 1 1 1 1 1 1 1 ...
..$ ResultAssay: int [1:505] 45 46 47 47 49 47 46 46 44 43 ...
..$ Index : Factor w/ 501 levels "P1T0.5VID-02",..: 8 31 37 2 43 19 25 26 28 29 ..
The list contains 34 data frames with identical variable names. The variables Time and ResultAssay are of the wrong type: I would like to have Time as a factor and ResultAssay as numeric.
I am trying to write a function to use together with lapply to convert the variable types in all 34 data frames in one go, but so far I have been unsuccessful.
I have tried things along the lines of:
ChangeType <- function(DF){
DF[,2] <- as.factor(DF[,2])
DF[, "ResultAssay"] <- as.numeric(DF[, c("ResultAssay")])
}
lapply(df.list, ChangeType)
What you have tried is nearly correct, but the function also needs to return the modified data.frame, and the result of lapply has to be stored back in your existing variable, like so:
ChangeType <- function(DF){
DF[,2] <- as.factor(DF[,2])
DF[, "ResultAssay"] <- as.numeric(DF[, "ResultAssay"])
DF #return the data.frame
}
# store the returned value to df.list,
# thus updating your existing data.frame
df.list <- lapply(df.list, ChangeType)
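For a version that can be run as is, here is the same idea on a small invented list of two data frames (only the two relevant columns are included, and the values are made up):

```r
# Toy list with the same column names as in the question
df.list <- list(
  data.frame(Time = c(0, 2, 0.5), ResultAssay = c(23L, 23L, 24L)),
  data.frame(Time = c(0, 24),     ResultAssay = c(45L, 46L))
)

ChangeType <- function(DF) {
  DF$Time <- as.factor(DF$Time)                  # numeric -> factor
  DF$ResultAssay <- as.numeric(DF$ResultAssay)   # integer -> double
  DF                                             # return the modified copy
}

# R functions work on copies, so the result must be assigned back
df.list <- lapply(df.list, ChangeType)
str(df.list)
```

Referring to columns by name rather than by position (DF[,2]) is a small safeguard in case the column order ever differs between data frames.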

R Appending Columns to Dataset Misnamed

Edit: Clarity
When I append a new column to an existing data.frame, the column titles are incorrect. In summary.myData, the last two columns, both shown as "Measure", should say "plus" and "minus" respectively.
This is tied in with another question I had, where I ask about how to correctly reference a column in a Tk/R GUI I am working on.
Parent Question
myData:
Group Subgroup Measure
1 A 1 0.234213
2 A 1 0.046248
3 A 1 0.391376
4 A 2 0.911849
5 A 2 0.729955
6 A 2 0.991110
7 A 2 0.378422
8 A 3 0.898037
9 A 3 0.258884
10 A 3 NA
11 A 3 0.057631
12 A 3 0.745202
13 A 3 0.121376
14 B 1 0.385198
15 B 1 0.484399
16 B 1 0.115034
17 B 1 0.073629
18 B 1 0.456150
19 B 2 0.336108
20 B 2 0.845458
21 B 2 0.267494
22 B 3 0.536123
23 B 3 1.331731
24 B 3 0.505114
25 B 3 0.843348
26 B 3 0.827932
27 B 3 0.813351
28 C 1 0.095587
29 C 1 0.158822
30 C 1 0.392376
31 C 1 0.284625
32 C 2 0.898819
33 C 2 0.743428
34 C 2 0.298989
35 C 2 0.423961
36 C 3 0.868351
37 C 3 0.181547
38 C 3 1.146131
39 C 3 0.234941
Append script:
summary.myData<-summarySE(myData, measurevar=paste(tx.choice1), groupvars=paste(tx.choice2),conf.interval=0.95,na.rm=TRUE,.drop=FALSE)
summary.myData$plus<-summary.myData[3]-summary.myData[6]
summary.myData$minus<-summary.myData[3]+summary.myData[6]
Result:
Group N Measure sd se ci Measure Measure
1 A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
2 B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
3 C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
The problem you're running into is that you've assigned $plus and $minus to data.frames, rather than atomic vectors. So when printing, R is showing the column name in the embedded data.frame ('Measure' in both cases), rather than the name of the list component ('plus' and 'minus').
str(summary.myData);
## 'data.frame': 3 obs. of 8 variables:
## $ Group : Factor w/ 3 levels "A","B","C": 1 2 3
## $ N : num 12 14 12
## $ Measure: num 0.48 0.559 0.477
## $ sd : num 0.354 0.341 0.347
## $ se : num 0.1022 0.0912 0.1
## $ ci : num 0.225 0.197 0.22
## $ plus :'data.frame': 3 obs. of 1 variable:
## ..$ Measure: num 0.255 0.362 0.257
## $ minus :'data.frame': 3 obs. of 1 variable:
## ..$ Measure: num 0.705 0.756 0.697
summary.myData;
## Group N Measure sd se ci Measure Measure
## 1 A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
## 2 B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
## 3 C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
Replace the assignments with
summary.myData$plus <- summary.myData[,3]-summary.myData[,6];
summary.myData$minus <- summary.myData[,3]+summary.myData[,6];
Then you get:
str(summary.myData);
## 'data.frame': 3 obs. of 8 variables:
## $ Group : Factor w/ 3 levels "A","B","C": 1 2 3
## $ N : num 12 14 12
## $ Measure: num 0.48 0.559 0.477
## $ sd : num 0.354 0.341 0.347
## $ se : num 0.1022 0.0912 0.1
## $ ci : num 0.225 0.197 0.22
## $ plus : num 0.255 0.362 0.257
## $ minus : num 0.705 0.756 0.697
summary.myData;
## Group N Measure sd se ci plus minus
## 1 A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
## 2 B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
## 3 C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
The key here is the different indexing style. When you use 1D indexing, you're actually treating the data.frame as a list (which it is internally), and so the index operation returns the specified list components, still classed as a data.frame. When you use 2D indexing, you index the rows and columns separately, which allows you to extract a 2D "subtable" of the data.frame. But when you only specify one column, the default behavior (drop=T) is for the column to be returned as an atomic vector, rather than as a one-column data.frame. You can change this with drop=F.
summary.myData[3];
## Measure
## 1 0.4803586
## 2 0.5586478
## 3 0.4772981
summary.myData[,3];
## [1] 0.4803586 0.5586478 0.4772981
summary.myData[,3,drop=F];
## Measure
## 1 0.4803586
## 2 0.5586478
## 3 0.4772981
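Another option, not used above, is to extract the columns by name with [[ ]], which always returns the column itself rather than a one-column data.frame. A minimal sketch on an invented summary table (the rounded values and the name smry are assumptions for illustration):

```r
# Small stand-in for summary.myData with rounded, invented values
smry <- data.frame(
  Group   = c("A", "B", "C"),
  Measure = c(0.48, 0.56, 0.48),
  ci      = c(0.22, 0.20, 0.22)
)

# [[ ]] returns atomic vectors, so the new columns keep their own names
smry$plus  <- smry[["Measure"]] - smry[["ci"]]
smry$minus <- smry[["Measure"]] + smry[["ci"]]

str(smry)
```

Indexing by name also avoids depending on the column positions 3 and 6, which would shift if summarySE were called with more than one grouping variable.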
