How to sum up durations if certain patterns are found across columns - r

I have a dataframe with words and their durations in speech:
test1
d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10
10 0.103 0.168 0.198 0.188 0.359 0.343 0.064 0.075 0.095 0.367 And I thought oh no Sarah do n't do it
132 0.091 0.072 0.109 0.119 0.113 0.087 0.088 0.264 0.092 0.249 I du n no you ca n't see his head
784 0.152 0.341 0.117 0.108 0.123 0.263 0.083 0.095 0.099 0.098 Oh honestly I did n't touch it I did n't
The short form n't is treated as if it were a separate word. That's okay as long as the preceding word ends in a consonant, such as did, but not if the preceding word ends in a vowel, such as do or ca. Because that separation into different words is incorrect, the separation into different durations is incorrect too.
What I'd like to do is sum up the durations of ca and n't as well as do and n't, but leave the separate durations for did and n't alone.
I know how to select the rows where the changes need to be implemented:
test1[which(grepl("(?<=(ca|do)\\s)n't", apply(test1, 1, paste0, collapse = " "), perl = T)),]
but I'm stuck going forward.
The desired result would look like this:
d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10
10 0.103 0.168 0.198 0.188 0.359 0.343 0.139 0.095 0.367 NA And I thought oh no Sarah do n't do it
132 0.091 0.072 0.109 0.119 0.113 0.175 0.264 0.092 0.249 NA I du n no you ca n't see his head
784 0.152 0.341 0.117 0.108 0.123 0.263 0.083 0.095 0.099 0.098 Oh honestly I did n't touch it I did n't
How can this be done? Help is much appreciated.
Reproducible data:
test1 <- structure(list(d1 = c(0.103, 0.091, 0.152), d2 = c(0.168, 0.072,
0.341), d3 = c(0.198, 0.109, 0.117), d4 = c(0.188, 0.119, 0.108
), d5 = c(0.359, 0.113, 0.123), d6 = c(0.343, 0.087, 0.263),
d7 = c(0.064, 0.088, 0.083), d8 = c(0.075, 0.264, 0.095),
d9 = c(0.095, 0.092, 0.099), d10 = c(0.367, 0.249, 0.098),
w1 = c("And", "I", "Oh"), w2 = c("I", "du", "honestly"),
w3 = c("thought", "n", "I"), w4 = c("oh", "no", "did"), w5 = c("no",
"you", "n't"), w6 = c("Sarah", "ca", "touch"), w7 = c("do",
"n't", "it"), w8 = c("n't", "see", "I"), w9 = c("do", "his",
"did"), w10 = c("it", "head", "n't")), row.names = c(10L,
132L, 784L), class = "data.frame")

I think this is best done with data in long instead of wide format so you can take advantage of grouping operations:
library(dplyr)
library(tidyr)
library(tibble)
test1 %>%
rownames_to_column() %>%
pivot_longer(-rowname, names_to = c(".value", "number"), names_pattern = "(\\D)(\\d+)") %>%
group_by(rowname) %>%
mutate(wid = cumsum(!(lag(w) %in% c("ca", "do") & w == "n't"))) %>%
group_by(rowname, wid) %>%
summarise(d = sum(d),
w = paste0(w, collapse = "")) %>%
pivot_wider(names_from = wid, values_from = c(d, w), names_sep = "")
`summarise()` regrouping output by 'rowname' (override with `.groups` argument)
# A tibble: 3 x 21
# Groups: rowname [3]
rowname d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 10 0.103 0.168 0.198 0.188 0.359 0.343 0.139 0.095 0.367 NA And I thought oh no Sarah don't do it NA
2 132 0.091 0.072 0.109 0.119 0.113 0.175 0.264 0.092 0.249 NA I du n no you can't see his head NA
3 784 0.152 0.341 0.117 0.108 0.123 0.263 0.083 0.095 0.099 0.098 Oh honestly I did n't touch it I did n't
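To see how the cumsum() grouping id in the answer above works, here is a minimal base R sketch on a slice of the first row (words 6 to 10). The names `prev`, `starts_new`, `merged_d` and `merged_w` are made up for illustration; `prev` stands in for dplyr's lag().

```r
# Words 6-10 of row 10 and their durations, taken from the example data
w <- c("Sarah", "do", "n't", "do", "it")
d <- c(0.343, 0.064, 0.075, 0.095, 0.367)

# Previous word for each position (NA for the first word), like dplyr::lag()
prev <- c(NA, head(w, -1))

# A word starts a new group unless it is "n't" directly after "ca" or "do"
starts_new <- !(prev %in% c("ca", "do") & w == "n't")

# The running count of group starts gives "do" and its "n't" the same id
wid <- cumsum(starts_new)

# Sum durations and paste words within each id
merged_d <- tapply(d, wid, sum)
merged_w <- tapply(w, wid, paste0, collapse = "")
```

Here `wid` comes out as 1 2 2 3 4, so the second and third words collapse into one entry while everything else keeps its own id.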

Related

r arrange data nested wide format

I have a dataset like this
Time1 Time2 Time3
A
Median 0.046 0.12 0
Q1, Q3 -0.12, 0.22 -1.67, -4.59 -0.245, 0.289
Range -2.75 -4.65 -2.20 - 1.425 -3.12, -1.928
B
Median 0.016 0.42 0.067
Q1, Q3 -0.21, 0.63 -1.17, -2.98 -0.478, 0.187
Range -2.15 -2.15 -1.12 - 1.125 -1.45, -1.478
What I want is to make this look like this
Time1 Time2 Time3
Median Q1,Q3 Range Median Q1,Q3 Range Median Q1,Q3 Range
A 0.046 -0.12, 0.22 -2.75 -4.65 0.12 -1.67, -4.59 -2.20 - 1.425 0 -0.245, 0.289 -3.12, -1.928
B 0.016 -0.21, 0.63 -2.15 -2.15 0.42 -1.17, -2.98 -1.12 - 1.125 0.067 -0.478, 0.187 -1.45, -1.478
I have used the spread function before to change long to wide, but I'm not sure how to turn this into a nested wide format. Any suggestions are much appreciated.
df <- structure(list(Col1 = c("A", "Median", "Q1, Q3", "Range", "B",
"Median", "Q1, Q3", "Range"), Time1 = c("", "0.046", "-0.12, 0.22",
"-2.75 -4.65", "", "0.016", "-0.21, 0.63", "-2.15 -2.15"), Time2 = c("",
"0.12", "-1.67, -4.59", "-2.20 - 1.425", "", "0.42", "-1.17, -2.98",
"-1.12 - 1.125"), Time3 = c("", "0 ", "-0.245, 0.289 ",
"-3.12, -1.928", "", "0.067 ", "-0.478, 0.187 ", "-1.45, -1.478"
)), class = "data.frame", row.names = c(NA, -8L))
Here is a potential solution, see comments for the step by step.
library(tidyr)
#find rows containing the ids
namerows <- which(df$Time1=="")
#create and fill in the id column
df$id <- ifelse(df$Time1=="", df$Col1, NA)
df <- fill(df, id, .direction="down")
#clean up the dataframe
df <- df[-namerows, ]
#pivot
pivot_wider(df, id_cols = "id", names_from = "Col1", values_from = starts_with("Time"))
The result:
# A tibble: 2 × 10
id Time1_Median `Time1_Q1, Q3` Time1_Range Time2_Median `Time2_Q1, Q3` Time2_Range Time3_Median `Time3_Q1, Q3` Time3_Range
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A 0.046 -0.12, 0.22 -2.75 -4.65 0.12 -1.67, -4.59 -2.20 - 1.425 "0 " "-0.245, 0.289 " -3.12, -1.928
2 B 0.016 -0.21, 0.63 -2.15 -2.15 0.42 -1.17, -2.98 -1.12 - 1.125 "0.067 " "-0.478, 0.187 " -1.45, -1.478
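The fill-down step can also be done without tidyr, and the trailing spaces visible in the Time3 values can be trimmed before pivoting. This is only a sketch on the first two columns of the example; the names `col1`, `time1`, `time3`, `is_header` and `id` are made up for illustration.

```r
# First two columns of the example data
col1  <- c("A", "Median", "Q1, Q3", "Range", "B", "Median", "Q1, Q3", "Range")
time1 <- c("", "0.046", "-0.12, 0.22", "-2.75 -4.65",
           "", "0.016", "-0.21, 0.63", "-2.15 -2.15")

# Header rows are the ones with an empty value column
is_header <- time1 == ""

# cumsum() counts headers seen so far, which indexes into the header labels;
# this is a base R equivalent of filling the id column downwards
id <- col1[is_header][cumsum(is_header)]

# Trailing whitespace like in the Time3 column can be removed with trimws()
time3 <- c("0 ", "-0.245, 0.289 ", "-3.12, -1.928")
time3_clean <- trimws(time3)
```

After this, the header rows can be dropped with `!is_header` exactly as in the answer.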

Divide matrix values by category means in R

I have a matrix (A) containing 211 rows and 6 columns (one per time period) and a different matrix (B) containing 211 rows and 2 columns, the second of which contains categorical information (1-9).
My aim is to create a new matrix (C) where each value in matrix A is the value(A) divided by the mean of (value(A) by category(B)). I managed to compute the means for each category per column with the aggregate function. These are stored in a separate dataframe, column_means, with each time wave in a separate column. This also contains the information about the group in column_means[,1].
I don't understand how to proceed from here and am looking for an elegant solution so I can transfer this knowledge to future projects (and possibly improve my existing code). My guess is that the solution is hidden somewhere in dplyr and rather simple once you know it.
Thank you for any suggestions.
Data example:
##each column here represents a wave:
initialmatrix <- structure(c(0.882647671948723, 0.847932241438909, 0.753052308699317,
0.754977233408875, NA, 0.886095543329695, 0.849625252682829,
0.78893884364632, 0.77111113840682, NA, 0.887255207679895, 0.851503493865384,
0.812107856411831, 0.793982699495818, NA, 0.885212452552841,
0.854894065774315, 0.815265718290737, 0.806766276556325, NA,
0.882027335190646, 0.85386634818439, 0.818052477777012, 0.815997781565393,
NA, 0.88245957310107, 0.855819521951304, 0.830425687228663, 0.820857689847061,
NA), .Dim = 5:6, .Dimnames = list(NULL, c("V1", "V2", "V3", "V4",
"V5", "V6")))
##the first column is unique ID, the 2nd the category:
categories <- structure(c(1L, 2L, 3L, 4L, 5L, 2L, 1L, 2L, 2L, 4L), .Dim = c(5L,
2L), .Dimnames = list(NULL, c("V1", "V2")))
##the first column represents the category, the remaining six columns the mean per category for each corresponding wave in "initialmatrix"
column.means <- structure(list(Group.1 = 1:5, x = c(0.805689153058216, 0.815006230419524,
0.832326976776262, 0.794835253329865, 0.773041961434791), asset_means_2...2. = c(0.80050960343197,
0.81923553710203, 0.833814773618545, 0.797834687980729, 0.780028077018158
), asset_means_3...2. = c(0.805053341257357, 0.828691564900149,
0.833953165695685, 0.799381078569563, 0.785813047374534), asset_means_4...2. = c(0.806116664276125,
0.832439754757116, 0.835982197159582, 0.801702200401293, 0.788814840753852
), asset_means_5...2. = c(0.807668548993891, 0.83801834926905,
0.836036508152776, 0.803433961863399, 0.79014026195926), asset_means_6...2. = c(0.808800359101212,
0.840923947682599, 0.839660313992458, 0.804901773257962, 0.793165113115977
)), row.names = c(NA, 5L), class = "data.frame")
Is this what you are trying to do?
options(digits=3)
divisor <- column.means[categories[, 2], -1]
divisor
# x asset_means_2...2. asset_means_3...2. asset_means_4...2. asset_means_5...2. asset_means_6...2.
# 2 0.815 0.819 0.829 0.832 0.838 0.841
# 1 0.806 0.801 0.805 0.806 0.808 0.809
# 2.1 0.815 0.819 0.829 0.832 0.838 0.841
# 2.2 0.815 0.819 0.829 0.832 0.838 0.841
# 4 0.795 0.798 0.799 0.802 0.803 0.805
initialmatrix/divisor
# x asset_means_2...2. asset_means_3...2. asset_means_4...2. asset_means_5...2. asset_means_6...2.
# 2 1.083 1.082 1.071 1.063 1.053 1.049
# 1 1.052 1.061 1.058 1.061 1.057 1.058
# 2.1 0.924 0.963 0.980 0.979 0.976 0.988
# 2.2 0.926 0.941 0.958 0.969 0.974 0.976
# 4 NA NA NA NA NA NA
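The indexing trick above relies on the category codes matching the row positions of column.means (Group.1 is 1:5 here). If that is not guaranteed, match() makes the lookup explicit. A minimal sketch with made-up toy numbers; `means`, `values`, `category`, `divisor` and `scaled` are hypothetical names:

```r
# Toy per-category means: one row per category, one column per wave
means <- data.frame(group = 1:3, w1 = c(10, 20, 40), w2 = c(1, 2, 4))

# Toy observations: 4 rows, 2 waves, plus the category of each row
values   <- matrix(c(10, 10, 20, 40,
                      4,  1,  2,  2), ncol = 2)
category <- c(2L, 1L, 3L, 2L)

# Look up the row of means for each observation by category code,
# so the divisor has the same shape as the values
divisor <- as.matrix(means[match(category, means$group), -1])

# Element-wise division of two equally shaped matrices
scaled <- values / divisor
```

Because `divisor` has one row per observation, the division is plain element-wise arithmetic and no loop is needed.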
This looks like a job for Superma ... no wait ... map2.
library(dplyr)
library(purrr)
as_tibble(initialmatrix) %>%
mutate(category = as.double(as_tibble(categories)$V2),
across(starts_with('V'),
~ unlist(map2(., category, ~ .x/mean(c(.x, .y)))))) %>%
select(-category)
# V1 V2 V3 V4 V5 V6
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0.612 0.614 0.615 0.614 0.612 0.612
# 2 0.918 0.919 0.920 0.922 0.921 0.922
# 3 0.547 0.566 0.578 0.579 0.581 0.587
# 4 0.548 0.557 0.568 0.575 0.580 0.582
# 5 NA NA NA NA NA NA
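Note that the map2() call above divides each value by the mean of that value and its category code, not by the mean of the values in that category. If the category means should come from the data itself, a minimal base R sketch with ave() could look like this (toy numbers; `values`, `category` and `scaled` are made-up names):

```r
# Toy observations: 4 rows, 2 waves, plus the category of each row
values   <- matrix(c(10, 10, 20, 30,
                      4,  1,  2,  2), ncol = 2)
category <- c(2L, 1L, 3L, 2L)

# For every column, ave() returns the within-category mean aligned to each row;
# dividing by it scales each value by the mean of its own category
scaled <- apply(values, 2, function(col) {
  col / ave(col, category, FUN = function(v) mean(v, na.rm = TRUE))
})
```

With the real data this would be `apply(initialmatrix, 2, ...)` with `categories[, 2]` as the grouping vector, assuming the rows of both matrices are in the same order.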

Control the level of detail in a pivot in R (tidyverse) [duplicate]

I have an extremely wide dataset that I am trying to unpivot to a degree but not completely. Essentially I am trying to group certain columns together based on a string before an underscore and pivot on those groups individually. My current method uses two opposite pivots, a for loop, and an intermediate list to accomplish my goal. I am able to get my final product but for my own knowledge, I am wondering if there is a more elegant solution. I realize that I am likely not explaining things well so I have recreated the scenario with a dummy dataset.
#Required packages
library(tidyverse)
#Dummy data
file <- as_tibble(data.frame(id = c("QQQ", "WWW", "EEE", "RRR", "TTT"),
state = c("aa", "bb", "cc", "dd", "ee"),
city = c("ff", "gg", "hh", "ii", "jj"),
a_1 = runif(5),
a_2 = runif(5),
a_3 = runif(5),
a_4 = runif(5),
a_5 = runif(5),
a_6 = runif(5),
a_7 = runif(5),
a_8 = runif(5),
a_9 = runif(5),
a_10 = runif(5),
b_1 = runif(5),
b_2 = runif(5),
b_3 = runif(5),
b_4 = runif(5),
b_5 = runif(5),
b_6 = runif(5),
b_7 = runif(5),
b_8 = runif(5),
b_9 = runif(5),
b_10 = runif(5),
c_1 = runif(5),
c_2 = runif(5),
c_3 = runif(5),
c_4 = runif(5),
c_5 = runif(5),
c_6 = runif(5),
c_7 = runif(5),
c_8 = runif(5),
c_9 = runif(5),
c_10 = runif(5)))
#My solution
longer <- file %>%
  pivot_longer(cols = -c(id:city),
               names_to = c(".value", "section"),
               names_pattern = "(.+)_([0-9]+$)")
num_letterGroup <- ncol(longer) - 4 #4 is the number of columns I want to retain
wide_list <- vector(mode = "list", length = num_letterGroup)
name_list <- vector(mode = "character", length = num_letterGroup)
for (i in 1:num_letterGroup) {
  col_num <- 4 + i
  col_name <- colnames(longer)[col_num]
  wide <- longer %>%
    select(1:4, all_of(col_name)) %>%
    #all_of() marks col_name as a character vector holding a column name
    pivot_wider(names_from = section, values_from = all_of(col_name)) %>%
    mutate(letterGroup = col_name)
  wide_list[[i]] <- wide
  name_list[i] <- col_name
}
names(wide_list) <- name_list
wide_df <- bind_rows(wide_list)
I realize that the amount of data given might seem excessive but I needed the column numbers to be sequential as well as reach double digits. Thank you in advance for any assistance you can provide.
EDIT TO CLARIFY: wide_df is the final product that I want
EDIT
This is actually much simpler than the original answer. (Thanks to @thelatemail)
library(tidyr)
pivot_longer(file,
cols = -c(id:city),
names_to = c('letterGroup', '.value'),
names_sep = '_')
# A tibble: 15 x 14
# id state city letterGroup `1` `2` `3` `4` `5` `6` `7` `8` `9` `10`
# <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 QQQ aa ff a 0.894 0.534 0.583 0.327 0.497 0.254 0.877 0.236 0.585 0.436
# 2 QQQ aa ff b 0.861 0.897 0.244 0.292 0.818 0.428 0.732 0.322 0.702 0.158
# 3 QQQ aa ff c 0.371 0.842 0.918 0.615 0.346 0.675 0.821 0.718 0.461 0.374
# 4 WWW bb gg a 0.573 0.00886 0.555 0.810 0.480 0.763 0.624 0.0667 0.705 0.872
# 5 WWW bb gg b 0.994 0.652 0.961 0.825 0.398 0.0138 0.560 0.695 0.0171 0.704
# 6 WWW bb gg c 0.113 0.988 0.663 0.0461 0.335 0.478 0.291 0.338 0.386 0.183
# 7 EEE cc hh a 0.482 0.197 0.630 0.442 0.633 0.932 0.317 0.119 0.872 0.678
# 8 EEE cc hh b 0.834 0.378 0.504 0.911 0.644 0.976 0.777 0.485 0.470 0.560
# 9 EEE cc hh c 0.819 0.240 0.683 0.570 0.969 0.956 0.745 0.790 0.0548 0.314
#10 RRR dd ii a 0.887 0.818 0.0266 0.444 0.554 0.817 0.332 0.0801 0.966 0.252
#11 RRR dd ii b 0.416 0.211 0.931 0.105 0.948 0.555 0.201 0.656 0.794 0.526
#12 RRR dd ii c 0.652 0.897 0.741 0.254 0.815 0.154 0.422 0.361 0.925 0.696
#13 TTT ee jj a 0.391 0.626 0.358 0.296 0.804 0.743 0.655 0.000308 0.257 0.415
#14 TTT ee jj b 0.764 0.686 0.0174 0.460 0.0164 0.0718 0.700 0.558 0.341 0.411
#15 TTT ee jj c 0.812 0.995 0.845 0.513 0.987 0.249 0.429 0.749 0.557 0.369
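To make the role of '.value' easier to see, here is the same call on a tiny made-up data frame (`toy` and `long` are hypothetical names). The part of each column name before the underscore becomes a value in letterGroup, and the part after it, matched to '.value', becomes the name of an output column.

```r
library(tidyr)

# Two ids, two letter groups (a, b), two numbered columns per group
toy <- data.frame(id  = c("QQQ", "WWW"),
                  a_1 = c(1, 2), a_2 = c(3, 4),
                  b_1 = c(5, 6), b_2 = c(7, 8))

# "a_1" is split at "_": "a" goes to letterGroup, "1" names the output column
long <- pivot_longer(toy,
                     cols = -id,
                     names_to = c("letterGroup", ".value"),
                     names_sep = "_")
```

The result has one row per id and letter group, with the numbered columns kept wide.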
Original Answer
You can get data completely in long format (no need for intermediate columns), separate the column names in two different columns and get the data in wide format.
file %>%
pivot_longer(cols = -c(id:city)) %>%
separate(name, into = c('letterGroup', 'col'), sep = "_") %>%
pivot_wider(names_from = col, values_from = value)
You can try this:
library(tidyr)
#Long format, with the column names split into letter and number
df1 <- pivot_longer(file, cols = names(file)[-c(1:3)]) %>%
  separate(name, into = c('letter', 'number'), sep = '_')
#Reshape back to wide with base R
df2 <- reshape(as.data.frame(df1), idvar = c('id', 'state', 'city', 'letter'), timevar = 'number', direction = 'wide')
names(df2) <- gsub('^value\\.', '', names(df2))

Summing one particular column to n number of columns in every 2 and 3 possible combinations

I have a dataset of 240 columns and 146 rows. I am providing only the first two chunks of the dataset, with 6 rows
DF <- data.frame(
D1 = c(-0.253, 0.253, -0.951, 0.951, 0.501, -0.501),
D2 = c(-0.52, -0.52, 0.52, 0.52, -0.172, -0.172),
D3 = c(0.014, 0.014, 0.014, 0.014, -0.014, -0.014),
S3 = c(0.095, 0.095, 0.095, 0.095, 0.095, 0.095),
D1 = c(-0.966, 0.966, -0.647, 0.647, 0.905, -0.905),
D2 = c(-0.078, -0.078, 0.078, 0.078, -0.943, -0.943),
D3 = c(-0.046, -0.046, -0.046, -0.046, 0.046, 0.046),
S3 = c(0.07, 0.07, 0.07, 0.07, 0.07, 0.07)
)
I want to add every 4th column (i.e. S3) to the preceding 3 columns in the following combinations
D1+S3
D2+S3
D3+S3
D1+D2+S3
D1+D3+S3
Now in the new dataframe the columns should be
D1 D2 D3 S3 D1+S3 D2+S3 D3+S3 D1+D2+S3 D1+D3+S3 D1 D2 D3 S3 D1+S3 D2+S3 D3+S3 D1+D2+S3 D1+D3+S3
How to do it in R? Any help in this regard is highly appreciated.
In the following code I reshape your data frame so that it brings all the values into 4 columns. To distinguish between the original columns, I have added an ID column. After that the operation you want to do becomes easy.
library(tidyverse)
df <- read_table(
"D1 D2 D3 S3 D1 D2 D3 S3
-0.253 -0.520 0.014 0.095 -0.966 -0.078 -0.046 0.070
0.253 -0.520 0.014 0.095 0.966 -0.078 -0.046 0.070
-0.951 0.520 0.014 0.095 -0.647 0.078 -0.046 0.070
0.951 0.520 0.014 0.095 0.647 0.078 -0.046 0.070
0.501 -0.172 -0.014 0.095 0.905 -0.943 0.046 0.070
-0.501 -0.172 -0.014 0.095 -0.905 -0.943 0.046 0.070
")
i <- seq(1, ncol(df)-3, 4)
df_out <- map_dfr(i, ~select(df, seq(., .+3)) %>% set_names(c("D1", "D2", "D3", "S3")))
df_out <- df_out %>%
  mutate(d1d2s3 = D1 + D2 + S3,
         d1d3s3 = D1 + D3 + S3,
         id = rep(1:length(i), each = nrow(df))) %>%
  #add S3 to each of D1, D2 and D3
  mutate_at(1:3, ~.+S3) %>%
  bind_cols(df_out, .)
If you want to return it to the original shape after that you can do the following.
df_out %>%
group_split(id) %>%
bind_cols()
Edit:
I have rewritten the code so that it works for a variable number of decompositions. You should just have to change n_decomp <- 3 to the appropriate number. It creates variables for all possible combinations of the decomposition variables with S3, so the number of columns escalates quickly as the number of decompositions increases.
n_decomp <- 3
n_var <- n_decomp + 1
i <- seq(1, ncol(df), n_var)
df_names <- names(df[1:n_var])
df_out <-
map_dfr(i,
~select(df, seq(., .+n_decomp)) %>%
set_names(df_names)) %>%
mutate(id = rep(1:length(i), each = nrow(df)))
decomp_combn <- map(1:n_decomp,
~combn(df_names[1:n_decomp], .) %>%
as_tibble %>%
as.list) %>%
flatten() %>%
map(c, "S3")
decomp_combn %>%
map(~select(df_out, .)) %>%
set_names(map(., ~str_c(names(.), collapse = "_"))) %>%
map(~apply(., 1, sum)) %>%
as_tibble %>%
bind_cols(df_out, .)
Quite long but should work:
data<-read.csv("Decompositions_1.csv")
nc_input=ncol(data)
nc_output = (ncol(data)/4)*5
output <- data.frame(as.data.frame(matrix(0,ncol=nc_output,nrow=nrow(data))))
firsts=data[,seq(1,nc_input,4)]
seconds=data[,seq(2,nc_input,4)]
thirds=data[,seq(3,nc_input,4)]
fourths=data[,seq(4,nc_input,4)]
starts_ou=seq(1,nc_output,5)
subsets=1:length(starts_ou)
for(i in subsets) {
ou_index=starts_ou[i]
output[,ou_index]=firsts[i]+fourths[i]
output[,ou_index+1]=seconds[i]+fourths[i]
output[,ou_index+2]=thirds[i]+fourths[i]
output[,ou_index+3]=firsts[i]+seconds[i]+fourths[i]
output[,ou_index+4]=firsts[i]+thirds[i]+fourths[i]
}
A little late - but here is a data.table approach:
library(data.table)
DT <- data.table(
D1 = c(-0.253, 0.253, -0.951, 0.951, 0.501, -0.501),
D2 = c(-0.52, -0.52, 0.52, 0.52, -0.172, -0.172),
D3 = c(0.014, 0.014, 0.014, 0.014, -0.014, -0.014),
S3 = c(0.095, 0.095, 0.095, 0.095, 0.095, 0.095),
D1 = c(-0.966, 0.966, -0.647, 0.647, 0.905, -0.905),
D2 = c(-0.078, -0.078, 0.078, 0.078, -0.943, -0.943),
D3 = c(-0.046, -0.046, -0.046, -0.046, 0.046, 0.046),
S3 = c(0.07, 0.07, 0.07, 0.07, 0.07, 0.07)
)
DT[, c("D1+S3", "D2+S3", "D3+S3", "D1+D2+S3", "D1+D3+S3") := list(D1+S3, D2+S3, D3+S3, D1+D2+S3, D1+D3+S3)]
DT
D1 D2 D3 S3 D1 D2 D3 S3 D1+S3 D2+S3 D3+S3 D1+D2+S3 D1+D3+S3
1: -0.253 -0.520 0.014 0.095 -0.966 -0.078 -0.046 0.07 -0.158 -0.425 0.109 -0.678 -0.144
2: 0.253 -0.520 0.014 0.095 0.966 -0.078 -0.046 0.07 0.348 -0.425 0.109 -0.172 0.362
3: -0.951 0.520 0.014 0.095 -0.647 0.078 -0.046 0.07 -0.856 0.615 0.109 -0.336 -0.842
4: 0.951 0.520 0.014 0.095 0.647 0.078 -0.046 0.07 1.046 0.615 0.109 1.566 1.060
5: 0.501 -0.172 -0.014 0.095 0.905 -0.943 0.046 0.07 0.596 -0.077 0.081 0.424 0.582
6: -0.501 -0.172 -0.014 0.095 -0.905 -0.943 0.046 0.07 -0.406 -0.077 0.081 -0.578 -0.420
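For the interleaved layout asked for in the question (each block of four columns followed by its five sums), a base R sketch could look like the following. It assumes the columns always come in blocks of D1, D2, D3, S3; the toy values and the names `combos`, `starts`, `blocks` and `result` are made up for illustration.

```r
# Toy data with two blocks; check.names = FALSE keeps the duplicated names
DF <- data.frame(
  D1 = c(1, 2), D2 = c(10, 20), D3 = c(100, 200), S3 = c(0.5, 0.5),
  D1 = c(3, 4), D2 = c(30, 40), D3 = c(300, 400), S3 = c(0.25, 0.25),
  check.names = FALSE
)

# The D-columns to combine with S3, in the order requested in the question
combos <- list("D1", "D2", "D3", c("D1", "D2"), c("D1", "D3"))

# First column index of every block of four
starts <- seq(1, ncol(DF), by = 4)

blocks <- lapply(starts, function(s) {
  block <- DF[, s:(s + 3)]
  names(block) <- c("D1", "D2", "D3", "S3")
  for (vars in combos) {
    # e.g. "D1+D2+S3" is the row-wise sum of D1, D2 and S3
    new_name <- paste(c(vars, "S3"), collapse = "+")
    block[[new_name]] <- unname(rowSums(block[, c(vars, "S3"), drop = FALSE]))
  }
  block
})

# Put the extended blocks back side by side, keeping the repeated names
result <- do.call(cbind, blocks)
```

Each block grows from 4 to 9 columns, so the full dataset of 240 columns would end up with 540.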

Strip leading zero from numeric vector without changing class

I have the following data, which is a few Major League Baseball statistics.
Year AVG SLG TB OBP IsoPow RC
1 1986 0.223 0.300 172 0.330 0.194 64.1
2 1987 0.261 0.356 271 0.329 0.230 92.8
3 1988 0.283 0.357 264 0.368 0.208 100.0
4 1989 0.248 0.328 247 0.351 0.178 91.9
5 1990 0.301 0.374 293 0.406 0.264 128.0
6 1991 0.292 0.367 262 0.410 0.222 118.2
Usually, percentage-type MLB statistics are displayed as a decimal, but with the leading zero removed. I'd like to do the same, but also preserve the class of the variable, which in this case is numeric.
For example, with bonds$AVG I'd like the result to be a numeric vector that looks exactly like
[1] .223 .261 .283 .248 .301 .292
Using sub, the vector goes from numeric to character, then back to its original numeric state after wrapping it with as.numeric.
> sub(0, "", bonds$AVG)
# [1] ".223" ".261" ".283" ".248" ".301" ".292"
> as.numeric(sub(0, "", bonds$AVG))
# [1] 0.223 0.261 0.283 0.248 0.301 0.292
Is this possible in R?
bonds <-
structure(list(Year = c(1986, 1987, 1988, 1989, 1990, 1991),
AVG = c(0.223, 0.261, 0.283, 0.248, 0.301, 0.292), SLG = c(0.3,
0.356, 0.357, 0.328, 0.374, 0.367), TB = c(172, 271, 264,
247, 293, 262), OBP = c(0.33, 0.329, 0.368, 0.351, 0.406,
0.41), IsoPow = c(0.194, 0.23, 0.208, 0.178, 0.264, 0.222
), RC = c(64.1, 92.8, 100, 91.9, 128, 118.2)), .Names = c("Year",
"AVG", "SLG", "TB", "OBP", "IsoPow", "RC"), row.names = c(NA,
6L), class = "data.frame")
Perhaps you could generalize the following by modifying print.data.frame?
f1 <- function(x) noquote(sub("^0", "", x))
f1(bonds$AVG)
.223 .261 .283 .248 .301 .292
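Since f1() returns a character vector, another option is to keep the numbers as they are and change only how they are displayed. The following is a sketch using a small S3 class; the class name `strip0` and its constructor are made up for illustration and are not part of any package.

```r
# Tag a numeric vector with an extra class; the underlying doubles are untouched
strip0 <- function(x) {
  stopifnot(is.numeric(x))
  class(x) <- c("strip0", class(x))
  x
}

# Format as usual, then drop a leading zero that sits directly before the decimal point
format.strip0 <- function(x, ...) {
  sub("^(-?)0\\.", "\\1.", format(unclass(x), ...))
}

# Print the formatted strings without quotes
print.strip0 <- function(x, ...) {
  print(noquote(format(x, ...)))
  invisible(x)
}

avg <- strip0(c(0.223, 0.261, 0.283, 0.248, 0.301, 0.292))
avg_text <- format(avg)
avg_mean <- mean(avg)
```

Here `is.numeric(avg)` is still TRUE and arithmetic such as mean() works on the stored values; only the printed form changes. For a whole data frame, the same idea would need the columns wrapped with strip0() before printing.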
