I'm using R and need to merge data sets of different lengths.
Here is my data:
> means2012
# A tibble: 232 x 2
exporter eci
<fct> <dbl>
1 ABW 0.235
2 AFG -0.850
3 AGO -1.40
4 AIA 1.34
5 ALB -0.480
6 AND 1.22
7 ANS 0.662
8 ARE 0.289
9 ARG 0.176
10 ARM 0.490
# ... with 222 more rows
> means2013
# A tibble: 234 x 2
exporter eci
<fct> <dbl>
1 ABW 0.534
2 AFG -0.834
3 AGO -1.26
4 AIA 1.47
5 ALB -0.498
6 AND 1.13
7 ANS 0.616
8 ARE 0.267
9 ARG 0.127
10 ARM 0.0616
# ... with 224 more rows
> str(means2012)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 232 obs. of 2 variables:
$ exporter: Factor w/ 242 levels "ABW","AFG","AGO",..: 1 2 3 4 5 6 7 9 10 11 ...
$ eci : num 0.235 -0.85 -1.404 1.337 -0.48 ...
> str(means2013)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 234 obs. of 2 variables:
$ exporter: Factor w/ 242 levels "ABW","AFG","AGO",..: 1 2 3 4 5 6 7 9 10 11 ...
$ eci : num 0.534 -0.834 -1.263 1.471 -0.498 ...
Note that the two tibbles have different lengths. The exporter values are countries.
Is there any way to merge both tibbles by the exporter factor and fill in the missing values with NA?
It doesn't matter whether the result is a tibble, a data frame, or something else.
Something like this:
tibble 1
a 5
b 10
c 15
d 25
tibble 2
a 7
c 23
d 20
merged one:
a 5 7
b 10 na
c 15 23
d 25 20
Using merge with the parameter all set to TRUE:
tibble1 <- read.table(text="
x y
a 5
b 10
c 15
d 25",header=TRUE,stringsAsFactors=FALSE)
tibble2 <- read.table(text="
x z
a 7
c 23
d 20",header=TRUE,stringsAsFactors=FALSE)
merge(tibble1,tibble2,all=TRUE)
x y z
1 a 5 7
2 b 10 NA
3 c 15 23
4 d 25 20
Or use dplyr::full_join(tibble1, tibble2) for the same effect.
You could rename the columns before joining them, so you get NA wherever the other year's value is missing.
library(tidyverse)
# rename eci in each tibble so the two years stay separate after the join
means2012 %>%
rename(eci2012 = eci) %>%
full_join(means2013 %>%
rename(eci2013 = eci), by = "exporter")
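For comparison, here is a base-R sketch of the same rename-then-join idea: merge() can attach the year to the clashing eci column names itself through its suffixes argument. The two small data frames are hypothetical stand-ins with made-up subsets of rows, not the real means2012/means2013.

```r
# Hypothetical miniature stand-ins for means2012 and means2013
means2012 <- data.frame(exporter = c("ABW", "AFG", "AGO"),
                        eci = c(0.235, -0.850, -1.40))
means2013 <- data.frame(exporter = c("ABW", "AGO", "AIA"),
                        eci = c(0.534, -1.26, 1.47))

# Full outer join on exporter (all = TRUE); eci exists in both inputs,
# so merge() appends the suffixes to tell the two columns apart
merged <- merge(means2012, means2013, by = "exporter", all = TRUE,
                suffixes = c("2012", "2013"))
merged  # AFG gets NA for 2013, AIA gets NA for 2012
```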
But a tidier approach would be to add a year column, keep the eci column as is, and just bind the rows together.
means2012 %>%
mutate(year = 2012) %>%
bind_rows(means2013 %>%
mutate(year = 2013))
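If you later need the wide layout, the long table can be reshaped back. This is a base-R sketch under the same assumptions (hypothetical miniature data frames), using rbind() with transform() in place of bind_rows()/mutate(), followed by reshape(direction = "wide"):

```r
# Hypothetical miniature stand-ins for means2012 and means2013
means2012 <- data.frame(exporter = c("ABW", "AFG", "AGO"),
                        eci = c(0.235, -0.850, -1.40))
means2013 <- data.frame(exporter = c("ABW", "AGO", "AIA"),
                        eci = c(0.534, -1.26, 1.47))

# Long format: one row per exporter and year
long <- rbind(transform(means2012, year = 2012),
              transform(means2013, year = 2013))

# Back to wide: one eci.<year> column per year, NA where a year is missing
wide <- reshape(long, direction = "wide", idvar = "exporter",
                timevar = "year", v.names = "eci")
wide
```

reshape() names the new columns eci.2012 and eci.2013 and fills in NA wherever an exporter is absent in a given year.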
I am trying to spread my data so that the months become columns, with one row per combination of site and species (sp). I tried to use recast, but I lose the information about species. What do I need to do to get the expected output shown below?
set.seed(111)
month <- rep(c("J","F","M"), each = 6)
site <- rep(c(1,2,3,4,5,6), times = 3)
spA <- rnorm(18,0,2)
spB <- rnorm(18,0,2)
spC <- rnorm(18,0,2)
spD <- rnorm(18,0,2)
df <- data.frame(month, site, spA, spB, spC, spD)
df.test <- reshape2::recast(df, site ~ month)
Here is what I am getting.
site F J M
1 1 5 5 5
2 2 5 5 5
3 3 5 5 5
4 4 5 5 5
5 5 5 5 5
6 6 5 5 5
# Expected output (dummy values)
site sp J F M
1 A 5 6 7
1 B 2 3 4
..
6 D 1 2 3
If the intention is not to aggregate but just to transpose, we can use pivot_longer to reshape to long format and then reshape back to wide with pivot_wider:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('sp'), names_prefix = 'sp',
names_to = 'sp') %>%
pivot_wider(names_from = month, values_from = value)
-output
# A tibble: 24 × 5
site sp J F M
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 0.470 -2.99 3.69
2 1 B -2.39 0.653 -6.23
3 1 C -0.232 -2.72 4.97
4 1 D 0.350 -0.433 0.405
5 2 A -0.661 -2.02 0.788
6 2 B 0.728 1.20 -1.88
7 2 C 0.669 0.962 3.92
8 2 D -1.69 2.89 -1.61
9 3 A -0.623 -1.90 1.60
10 3 B 0.723 -3.68 2.80
# … with 14 more rows
Or, using recast, specify id.var and also include variable in the formula:
library(reshape2)
reshape2::recast(df, site + variable ~ month, id.var = c("month", "site"))
site variable F J M
1 1 spA -2.99485331 0.4704414 3.6912725
2 1 spB 0.65309848 -2.3872179 -6.2264346
3 1 spC -2.72380897 -0.2323101 4.9713231
4 1 spD -0.43285732 0.3501913 0.4046144
5 2 spA -2.02037684 -0.6614717 0.7881082
6 2 spB 1.19650840 0.7283735 -1.8827148
7 2 spC 0.96224916 0.6685120 3.9199634
8 2 spD 2.89295633 -1.6945355 -1.6123984
9 3 spA -1.89695121 -0.6232476 1.5950570
10 3 spB -3.68306860 0.7233249 2.8005176
11 3 spC 1.48394325 -1.2417162 0.3833268
12 3 spD 0.81941960 1.9564633 0.5892684
13 4 spA -0.98792443 -4.6046913 -3.1333307
14 4 spB 5.43611120 0.6939287 -3.2409401
15 4 spC 0.05564925 -2.6196898 3.1050885
16 4 spD 1.82183314 3.6117365 2.8097662
...
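A base-R sketch of the same long-then-wide idea, in case no extra packages are available. It rebuilds df as in the question, stacks the four sp columns by hand (the sp_cols vector is introduced here for illustration), and then calls reshape(direction = "wide") with site and sp together as the id:

```r
# Rebuild the question's data
set.seed(111)
month <- rep(c("J", "F", "M"), each = 6)
site <- rep(c(1, 2, 3, 4, 5, 6), times = 3)
spA <- rnorm(18, 0, 2)
spB <- rnorm(18, 0, 2)
spC <- rnorm(18, 0, 2)
spD <- rnorm(18, 0, 2)
df <- data.frame(month, site, spA, spB, spC, spD)

sp_cols <- c("spA", "spB", "spC", "spD")
# Long format by hand: stack the four species columns into one value column
long <- data.frame(month = rep(df$month, length(sp_cols)),
                   site  = rep(df$site, length(sp_cols)),
                   sp    = rep(sub("^sp", "", sp_cols), each = nrow(df)),
                   value = unlist(df[sp_cols], use.names = FALSE))

# Wide again: one row per site/species pair, one value.<month> column per month
wide <- reshape(long, direction = "wide", idvar = c("site", "sp"),
                timevar = "month", v.names = "value")
wide <- wide[order(wide$site, wide$sp), ]
head(wide)
```

Unlike recast, which sorted the month columns alphabetically (F J M), reshape() keeps them in order of first appearance (J F M).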
I have a problem I can't understand or solve. I downloaded GSE115262 from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115262) and want to extract the gene names from GSM3172784HC$annotation.gene_name. When I do this, I get numbers instead of gene names. If I run str(), this is what I get: $ annotation.gene_name : Factor w/ 56233 levels "5_8S_rRNA","5S_rRNA",..: 53514 52750 11836 48738. So I get numbers. If I run head() and look at GSM3172784HC$annotation.gene_name, I see the gene names, which is what I want. How do I get these character values?
#### Need to load in all libraries
#General Bioconductor packages
library("GEOquery");
library("Biobase");
# Loop Through Files for download
for(i in 1:length(tmp$V1)){
getGEOSuppFiles(tmp$V1[i])
};
######## Healthy Controls GSE115262 ##########
## You may need to read the file multiple times to get it into R
GSM3172784HC<-read.table(gzfile("FilePath.txt.gz"), header=T)
## New data-frame
HCData<- cbind(GSM3172784HC$annotation.gene_name, GSM3172784HC$expected_count);
HCData<- as.data.frame(HCData)
row.names(HCData) <- HCData$V1
colnames(HCData) <- c("HC1")
str(GSM3172784HC)
'data.frame': 57955 obs. of 11 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ annotation.gene_id : Factor w/ 57955 levels "ENSG00000000003",..: 1 2 3 4 5 6 7 8 9 10 ...
$ annotation.gene_biotype: Factor w/ 43 levels "3prime_overlapping_ncRNA",..: 20 20 20 20 20 20 20 20 20 20 ...
$ annotation.gene_name : Factor w/ 56233 levels "5_8S_rRNA","5S_rRNA",..: 53514 52750 11836 48738 5916 13731 7375 14125 14433 24521 ...
$ annotation.source : Factor w/ 4 levels "ensembl","ensembl_havana",..: 2 2 2 2 2 2 2 2 2 2 ...
$ transcript_id.s. : Factor w/ 57955 levels "ENST00000000233,ENST00000415666,ENST00000459680,ENST00000463733,ENST00000467281,ENST00000489673",..: 17666 17669 17397 16695 5799 17850 14301 7 1276 12553 ...
$ length : num 1749 940 1073 1538 2430 ...
$ effective_length : num 1623 814 947 1412 2304 ...
$ expected_count : num 0 0 1 1 0 2 2 0 1 1 ...
$ TPM : num 0 0 0.27 0.18 0 0.23 0.07 0 0.65 0.17 ...
$ FPKM : num 0 0 0.41 0.27 0 0.35 0.11 0 0.98 0.25 ...
head(GSM3172784HC)
X annotation.gene_id annotation.gene_biotype annotation.gene_name
1 1 ENSG00000000003 protein_coding TSPAN6
2 2 ENSG00000000005 protein_coding TNMD
3 3 ENSG00000000419 protein_coding DPM1
4 4 ENSG00000000457 protein_coding SCYL3
5 5 ENSG00000000460 protein_coding C1orf112
6 6 ENSG00000000938 protein_coding FGR
annotation.source
1 ensembl_havana
2 ensembl_havana
3 ensembl_havana
4 ensembl_havana
5 ensembl_havana
6 ensembl_havana
transcript_id.s.
1 ENST00000373020,ENST00000494424,ENST00000496771,ENST00000612152,ENST00000614008
2 ENST00000373031,ENST00000485971
3 ENST00000371582,ENST00000371584,ENST00000371588,ENST00000413082,ENST00000466152,ENST00000494752
4 ENST00000367770,ENST00000367771,ENST00000367772,ENST00000423670,ENST00000470238
5 ENST00000286031,ENST00000359326,ENST00000413811,ENST00000459772,ENST00000466580,ENST00000472795,ENST00000481744,ENST00000496973,ENST00000498289
6 ENST00000374003,ENST00000374004,ENST00000374005,ENST00000399173,ENST00000457296,ENST00000468038,ENST00000475472
length effective_length expected_count TPM FPKM
1 1749.40 1623.17 0 0.00 0.00
2 940.50 814.28 0 0.00 0.00
3 1073.00 946.77 1 0.27 0.41
4 1538.00 1411.77 1 0.18 0.27
5 2430.11 2303.88 0 0.00 0.00
6 2350.00 2223.77 2 0.23 0.35
We can convert the factor columns to character:
library(dplyr)
GSM3172784HC <- GSM3172784HC %>%
mutate_if(is.factor, as.character)
Or with mutate/across:
GSM3172784HC <- GSM3172784HC %>%
mutate(across(where(is.factor), as.character))
In base R, we can do:
i1 <- sapply(GSM3172784HC, is.factor)
GSM3172784HC[i1] <- lapply(GSM3172784HC[i1], as.character)
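NOTE: Since R 4.0.0, stringsAsFactors = FALSE is the default, so read.table() and data.frame() no longer convert character columns to factors unless asked to.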
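As for why the question's cbind() call produced numbers: cbind() returns a matrix, and a factor contributes its integer codes to that matrix, not its labels. A minimal sketch with a hypothetical four-row stand-in for GSM3172784HC follows. Also note that annotation.gene_name has fewer levels (56233) than the data has rows (57955), so some gene names repeat and cannot be used directly as row names; make.unique() is one way around that.

```r
# Hypothetical miniature stand-in for GSM3172784HC: gene names stored as a factor,
# with one name repeated on purpose
gse <- data.frame(annotation.gene_name = factor(c("TSPAN6", "TNMD", "DPM1", "TNMD")),
                  expected_count = c(0, 0, 1, 1))

# cbind() builds a plain numeric matrix, so the factor contributes its integer codes
m <- cbind(gse$annotation.gene_name, gse$expected_count)
m[, 1]  # 3 2 1 2: codes into the sorted levels DPM1, TNMD, TSPAN6

# A data frame with the factor converted to character keeps the gene names
hc <- data.frame(gene = as.character(gse$annotation.gene_name),
                 HC1 = gse$expected_count)
# Row names must be unique, so repeated gene names need disambiguating
rownames(hc) <- make.unique(hc$gene)
rownames(hc)  # "TSPAN6" "TNMD" "DPM1" "TNMD.1"
```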
I'm looking for an efficient way to create multiple 2-dimension tables from an R dataframe of chi-square statistics. The code below builds on this answer to a previous question of mine about getting chi-square stats by groups. Now I want to create tables from the output by group. Here's what I have so far using the hsbdemo data frame from the UCLA R site:
ml <- foreign::read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
str(ml)
'data.frame': 200 obs. of 13 variables:
$ id : num 45 108 15 67 153 51 164 133 2 53 ...
$ female : Factor w/ 2 levels "male","female": 2 1 1 1 1 2 1 1 2 1 ...
$ ses : Factor w/ 3 levels "low","middle",..: 1 2 3 1 2 3 2 2 2 2 ...
$ schtyp : Factor w/ 2 levels "public","private": 1 1 1 1 1 1 1 1 1 1 ...
$ prog : Factor w/ 3 levels "general","academic",..: 3 1 3 3 3 1 3 3 3 3 ...
ml %>%
dplyr::select(prog, ses, schtyp) %>%
table() %>%
apply(3, chisq.test, simulate.p.value = TRUE) %>%
lapply(`[`, c(6,7,9)) %>%
reshape2::melt() %>%
tidyr::spread(key = L2, value = value) %>%
dplyr::rename(SchoolType = L1) %>%
dplyr::arrange(SchoolType, prog) %>%
dplyr::select(-observed, -expected) %>%
reshape2::acast(., prog ~ ses ~ SchoolType ) %>%
tbl_df()
The output after the last arrange statement produces this tibble (showing only the first five rows):
prog ses SchoolType expected observed stdres
1 general low private 0.37500 2 3.0404678
2 general middle private 3.56250 3 -0.5187244
3 general high private 2.06250 1 -1.0131777
4 academic low private 1.50000 0 -2.5298221
5 academic middle private 14.25000 14 -0.2078097
It's easy to select one column, for example, stdres, and pass it to acast and tbl_df, which gets pretty much what I'm after:
# A tibble: 3 x 6
low.private middle.private high.private low.public middle.public high.public
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3.04 -0.519 -1.01 1.47 -0.236 -1.18
2 -2.53 -0.208 1.50 -0.940 -2.06 3.21
3 -0.377 1.21 -1.06 -0.331 2.50 -2.45
Now I can repeat these steps for the observed and expected frequencies and bind the results by rows, but that seems inefficient. The output would have the observed frequencies stacked on the expected frequencies, stacked on the standardized residuals. Something like this:
low.private middle.private high.private low.public middle.public high.public
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 3 1 14 17 8
2 0 14 10 19 30 32
3 0 2 0 12 29 7
4 0.375 3.56 2.06 10.4 17.6 10.9
5 1.5 14.2 8.25 21.7 36.6 22.7
6 0.125 1.19 0.688 12.9 21.7 13.4
7 3.04 -0.519 -1.01 1.47 -0.236 -1.18
8 -2.53 -0.208 1.50 -0.940 -2.06 3.21
9 -0.377 1.21 -1.06 -0.331 2.50 -2.45
Seems there ought to be a way to do this without repeating the code for each column, probably by creating and processing a list. Thanks in advance.
Might this be the answer?
ml1 <- ml %>%
dplyr::select(prog, ses, schtyp) %>%
table() %>%
apply(3, chisq.test, simulate.p.value = TRUE) %>%
lapply(`[`, c(6,7,9)) %>%
reshape2::melt()
ml2 <- ml1 %>%
dplyr::mutate(type=paste(ses, L1, sep=".")) %>%
dplyr::select(-ses, -L1) %>%
tidyr::spread(type, value)
This gives you:
prog L2 high.private high.public low.private low.public middle.private middle.public
1 general expected 2.062500 10.910714 0.3750000 10.4464286 3.5625000 17.6428571
2 general observed 1.000000 8.000000 2.0000000 14.0000000 3.0000000 17.0000000
3 general stdres -1.013178 -1.184936 3.0404678 1.4663681 -0.5187244 -0.2360209
4 academic expected 8.250000 22.660714 1.5000000 21.6964286 14.2500000 36.6428571
5 academic observed 10.000000 32.000000 0.0000000 19.0000000 14.0000000 30.0000000
6 academic stdres 1.504203 3.212431 -2.5298221 -0.9401386 -0.2078097 -2.0607058
7 vocation expected 0.687500 13.428571 0.1250000 12.8571429 1.1875000 21.7142857
8 vocation observed 0.000000 7.000000 0.0000000 12.0000000 2.0000000 29.0000000
9 vocation stdres -1.057100 -2.445826 -0.3771236 -0.3305575 1.2081594 2.4999085
I am not sure I completely understand what you are after, but basically: create a new variable combining SES and school type, and spread on that. Then reorder it as you wish :-)
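The question asks for a list-based way to avoid repeating the steps for each statistic. Here is a base-R sketch of that idea. It assumes a hypothetical random table in place of the hsbdemo data, so it runs without the download. It runs one chisq.test() per school type, then loops over c("observed", "expected", "stdres") and stacks the resulting blocks with rbind(). The stat and prog column names are choices made for this example.

```r
set.seed(42)
# Hypothetical data standing in for table(prog, ses, schtyp) from ml
n <- 200
tab <- table(prog   = sample(c("general", "academic", "vocation"), n, replace = TRUE),
             ses    = sample(c("low", "middle", "high"), n, replace = TRUE),
             schtyp = sample(c("public", "private"), n, replace = TRUE))

# One chisq.test() per school type, kept in a list named by school type
tests <- lapply(setNames(nm = dimnames(tab)$schtyp), function(s)
  chisq.test(tab[, , s], simulate.p.value = TRUE))

# For each statistic, glue the per-school-type matrices side by side
# (columns named ses.schtyp) and label the rows
pieces <- c("observed", "expected", "stdres")
blocks <- lapply(pieces, function(p) {
  m <- do.call(cbind, lapply(names(tests), function(s) {
    x <- unclass(tests[[s]][[p]])
    colnames(x) <- paste(colnames(x), s, sep = ".")
    x
  }))
  data.frame(stat = p, prog = rownames(m), m)
})

# Stack the observed, expected and stdres blocks into one table
result <- do.call(rbind, blocks)
rownames(result) <- NULL
result
```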
I did some research on this and only found information on reading in multiple CSV files.
I'm trying to create a widget where I can read in a CSV file with data sets and print as many graphs as there are data sets.
I'm trying to work out a way to read a CSV file that contains multiple data sets stacked vertically. However, I won't know the length of each data set or how many data sets will be present.
Any ideas or concepts to consider would be appreciated.
# Create sample data
unlink("so-data.csv") # remove it if it exists
set.seed(1492) # reproducible
# make 3 data frames of different lengths
frames <- lapply(c(3, 10, 5), function(n) {
data.frame(X = runif(n), Y1 = runif(n), Y2= runif(n))
})
# write them to single file preserving the header
suppressWarnings(
invisible(
lapply(frames, write.table, file="so-data.csv", sep=",", quote=FALSE,
append=TRUE, row.names=FALSE)
)
)
That file looks like:
"X","Y1","Y2"
0.277646409813315,0.110495456494391,0.852662623859942
0.21606229362078,0.0521760624833405,0.510357670951635
0.184417578391731,0.00824321852996945,0.390395383816212
"X","Y1","Y2"
0.769067857181653,0.916519832098857,0.971386880846694
0.6415081594605,0.63678711745888,0.148033464793116
0.638599780155346,0.381162445060909,0.989824152784422
0.194932354846969,0.132614633999765,0.845784503268078
0.522090089507401,0.599085820373148,0.218151196138933
0.521618122234941,0.0903550288639963,0.983936473494396
0.792095972690731,0.932019826257601,0.703315682942048
0.12338977586478,0.584303047973663,0.421113619813696
0.343668724410236,0.561827397439629,0.111441049026325
0.660837838426232,0.345943035557866,0.0270762923173606
"X","Y1","Y2"
0.309987690066919,0.441982284653932,0.133840701542795
0.747786369873211,0.240106994053349,0.62044994905591
0.789473889162764,0.853503877297044,0.150850139558315
0.165826949058101,0.119402598123997,0.318282842403278
0.39083837531507,0.109747459646314,0.876092307968065
Now you can do:
# read in the data as lines
l <- readLines("so-data.csv")
# figure out where the individual data sets are
starts <- which(grepl("X", l))
# each set ends just before the next header; the last one ends at the final line
ends <- c(starts[-1] - 1, length(l))
# read them in
new_frames <- mapply(function(start, end) {
read.csv(text=paste0(l[start:end], collapse="\n"), header=TRUE)
}, starts, ends, SIMPLIFY=FALSE)
str(new_frames)
## List of 3
## $ :'data.frame': 3 obs. of 3 variables:
## ..$ X : num [1:3] 0.278 0.216 0.184
## ..$ Y1: num [1:3] 0.1105 0.05218 0.00824
## ..$ Y2: num [1:3] 0.853 0.51 0.39
## $ :'data.frame': 10 obs. of 3 variables:
## ..$ X : num [1:10] 0.769 0.642 0.639 0.195 0.522 ...
## ..$ Y1: num [1:10] 0.917 0.637 0.381 0.133 0.599 ...
## ..$ Y2: num [1:10] 0.971 0.148 0.99 0.846 0.218 ...
## $ :'data.frame': 5 obs. of 3 variables:
## ..$ X : num [1:5] 0.31 0.748 0.789 0.166 0.391
## ..$ Y1: num [1:5] 0.442 0.24 0.854 0.119 0.11
## ..$ Y2: num [1:5] 0.134 0.62 0.151 0.318 0.876
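Here is a variant of the same idea, sketched under the assumption that every data set repeats exactly the same header line. It flags the header lines, numbers the chunks with cumsum(), split()s the lines into chunks, and reads each chunk with read.csv(text = ...), which accepts a character vector of lines. The file written here is a small hypothetical example in a temporary file.

```r
# Hypothetical file with three stacked data sets of different lengths
path <- tempfile(fileext = ".csv")
writeLines(c("X,Y1,Y2", "1,2,3", "4,5,6",
             "X,Y1,Y2", "7,8,9",
             "X,Y1,Y2", "10,11,12", "13,14,15", "16,17,18"), path)

l <- readLines(path)
# TRUE on every header line (assumed identical to the first line)
is_header <- l == l[1]
# cumsum() gives each header and the rows after it a shared chunk number
chunks <- split(l, cumsum(is_header))
# read.csv() parses each chunk of lines, header included
new_frames <- lapply(chunks, function(chunk) read.csv(text = chunk))
unlink(path)

sapply(new_frames, nrow)
```

This works for any number of data sets, including a file that holds only one.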
As @Oriol Mirosa mentioned in the comments, this is one way you can do it. First read the whole file:
df = read.csv("path", header = TRUE)
Assuming the whole CSV file is structured like this:
df = data.frame(X=c(1:10, "X", 1:20, "X", 1:30),
Y=c(1:10, "Y", 1:20, "Y", 1:30),
Z=c(1:10, "Z", 1:20, "Z", 1:30))
df$newset = ifelse(df$X == "X", 1, 0)
df$newset = as.factor(cumsum(df$newset))
dfs = split(df, df$newset)
dfs[-1] = lapply(dfs[-1], function(x) x[-1,-ncol(x)])
dfs[[1]] = dfs[[1]][,-ncol(dfs[[1]])]
I created a binary variable, newset, indicating whether a row is a "header" row. Then I used cumsum() to give all rows of each data set the same unique number, and split() on newset to create a list with one data set per element. Finally, I removed the repeated header row from each subsequent data set and dropped the helper newset column; the first data set already has the correct column names from read.csv(). This should work no matter the length of each data set.
Result:
# $`0`
# X Y Z
# 1 1 1 1
# 2 2 2 2
# 3 3 3 3
# 4 4 4 4
# 5 5 5 5
# 6 6 6 6
# 7 7 7 7
# 8 8 8 8
# 9 9 9 9
# 10 10 10 10
#
# $`1`
# X Y Z
# 12 1 1 1
# 13 2 2 2
# 14 3 3 3
# 15 4 4 4
# 16 5 5 5
# 17 6 6 6
# 18 7 7 7
# 19 8 8 8
# 20 9 9 9
# 21 10 10 10
# 22 11 11 11
# 23 12 12 12
# 24 13 13 13
# 25 14 14 14
# 26 15 15 15
# 27 16 16 16
# 28 17 17 17
# 29 18 18 18
# 30 19 19 19
# 31 20 20 20
#
# $`2`
# X Y Z
# 33 1 1 1
# 34 2 2 2
# 35 3 3 3
# 36 4 4 4
# 37 5 5 5
# 38 6 6 6
# 39 7 7 7
# 40 8 8 8
# 41 9 9 9
# 42 10 10 10
# 43 11 11 11
# 44 12 12 12
# 45 13 13 13
# 46 14 14 14
# 47 15 15 15
# 48 16 16 16
# 49 17 17 17
# 50 18 18 18
# 51 19 19 19
# 52 20 20 20
# 53 21 21 21
# 54 22 22 22
# 55 23 23 23
# 56 24 24 24
# 57 25 25 25
# 58 26 26 26
# 59 27 27 27
# 60 28 28 28
# 61 29 29 29
# 62 30 30 30
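One caveat with this approach: because the header rows were read in as data, every column comes back as character. Here is a minimal sketch that converts the columns back with type.convert(), using a hypothetical smaller df of the same shape:

```r
# Hypothetical smaller version of the stacked data, read as character
df <- data.frame(X = c(1:3, "X", 1:2),
                 Y = c(1:3, "Y", 1:2),
                 stringsAsFactors = FALSE)
df$newset <- as.factor(cumsum(df$X == "X"))
dfs <- split(df, df$newset)
dfs[-1] <- lapply(dfs[-1], function(x) x[-1, -ncol(x)])
dfs[[1]] <- dfs[[1]][, -ncol(dfs[[1]])]

sapply(dfs[[1]], class)  # both columns are still "character"

# Convert every column of every data set to its natural type
dfs <- lapply(dfs, function(d) {
  d[] <- lapply(d, type.convert, as.is = TRUE)
  d
})
sapply(dfs[[1]], class)  # now "integer"
```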
When I append new columns to an existing data.frame, the column names are incorrect. In summary.myData, the last two columns, both shown as "Measure", should be named "plus" and "minus" respectively.
This is tied in with another question I had, where I ask how to correctly reference a column in a Tk/R GUI I am working on.
myData:
Group Subgroup Measure
1 A 1 0.234213
2 A 1 0.046248
3 A 1 0.391376
4 A 2 0.911849
5 A 2 0.729955
6 A 2 0.991110
7 A 2 0.378422
8 A 3 0.898037
9 A 3 0.258884
10 A 3 NA
11 A 3 0.057631
12 A 3 0.745202
13 A 3 0.121376
14 B 1 0.385198
15 B 1 0.484399
16 B 1 0.115034
17 B 1 0.073629
18 B 1 0.456150
19 B 2 0.336108
20 B 2 0.845458
21 B 2 0.267494
22 B 3 0.536123
23 B 3 1.331731
24 B 3 0.505114
25 B 3 0.843348
26 B 3 0.827932
27 B 3 0.813351
28 C 1 0.095587
29 C 1 0.158822
30 C 1 0.392376
31 C 1 0.284625
32 C 2 0.898819
33 C 2 0.743428
34 C 2 0.298989
35 C 2 0.423961
36 C 3 0.868351
37 C 3 0.181547
38 C 3 1.146131
39 C 3 0.234941
Append script:
summary.myData<-summarySE(myData, measurevar=paste(tx.choice1), groupvars=paste(tx.choice2),conf.interval=0.95,na.rm=TRUE,.drop=FALSE)
summary.myData$plus<-summary.myData[3]-summary.myData[6]
summary.myData$minus<-summary.myData[3]+summary.myData[6]
Result:
Group N Measure sd se ci Measure Measure
1 A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
2 B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
3 C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
The problem you're running into is that you've assigned $plus and $minus to data.frames, rather than atomic vectors. So when printing, R is showing the column name in the embedded data.frame ('Measure' in both cases), rather than the name of the list component ('plus' and 'minus').
str(summary.myData);
## 'data.frame': 3 obs. of 8 variables:
## $ Group : Factor w/ 3 levels "A","B","C": 1 2 3
## $ N : num 12 14 12
## $ Measure: num 0.48 0.559 0.477
## $ sd : num 0.354 0.341 0.347
## $ se : num 0.1022 0.0912 0.1
## $ ci : num 0.225 0.197 0.22
## $ plus :'data.frame': 3 obs. of 1 variable:
## ..$ Measure: num 0.255 0.362 0.257
## $ minus :'data.frame': 3 obs. of 1 variable:
## ..$ Measure: num 0.705 0.756 0.697
summary.myData;
## Group N Measure sd se ci Measure Measure
## 1 A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
## 2 B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
## 3 C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
Replace the assignments with
summary.myData$plus <- summary.myData[,3]-summary.myData[,6];
summary.myData$minus <- summary.myData[,3]+summary.myData[,6];
Then you get:
str(summary.myData);
## 'data.frame': 3 obs. of 8 variables:
## $ Group : Factor w/ 3 levels "A","B","C": 1 2 3
## $ N : num 12 14 12
## $ Measure: num 0.48 0.559 0.477
## $ sd : num 0.354 0.341 0.347
## $ se : num 0.1022 0.0912 0.1
## $ ci : num 0.225 0.197 0.22
## $ plus : num 0.255 0.362 0.257
## $ minus : num 0.705 0.756 0.697
summary.myData;
## Group N Measure sd se ci plus minus
## 1 A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
## 2 B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
## 3 C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
The key here is the different indexing style. When you use 1D indexing, you're treating the data.frame as a list (which it is internally), so the index operation returns the specified list components, still classed as a data.frame. When you use 2D indexing, you index the rows and columns separately, which lets you extract a 2D "subtable" of the data.frame. But when you specify only one column, the default behavior (drop = TRUE) is to return that column as an atomic vector rather than as a one-column data.frame. You can change this with drop = FALSE.
summary.myData[3];
## Measure
## 1 0.4803586
## 2 0.5586478
## 3 0.4772981
summary.myData[,3];
## [1] 0.4803586 0.5586478 0.4772981
summary.myData[,3,drop=F];
## Measure
## 1 0.4803586
## 2 0.5586478
## 3 0.4772981
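Here is a compact, self-contained recap of the indexing rules above, using a hypothetical two-column data frame. It also shows $ and [[ ]], which likewise return plain vectors, and indexing by column name rather than by position. The lower/upper names are made up for the example.

```r
# Hypothetical two-column data frame
d <- data.frame(Measure = c(0.48, 0.56), ci = c(0.22, 0.20))

class(d[1])                  # "data.frame": list-style indexing keeps the data.frame
class(d[, 1])                # "numeric": a single column is dropped to a vector
class(d[[1]])                # "numeric": [[ ]] extracts the list component itself
class(d[, 1, drop = FALSE])  # "data.frame": drop = FALSE keeps the one-column structure

# Build new columns from vectors, looked up by name rather than by position
d$lower <- d$Measure - d[["ci"]]
d$upper <- d$Measure + d[["ci"]]
names(d)                     # "Measure" "ci" "lower" "upper"
```

Indexing by name keeps the code correct even if the column order of the summary table changes.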