Using the output of `combn` to rename lists - r

I have the following data which looks like:
Pza_de_Espana Escuelas_Aguirre Av_Ramon_y_Cajal Arturo_Soria C_Farolillo Casa_de_Campo Barajas
1 12 29 26 27 19 4 31
2 40 42 55 49 41 25 53
3 51 51 73 57 56 51 60
4 53 52 65 56 64 64 56
5 46 46 59 50 53 34 65
6 30 34 39 39 34 20 50
7 31 39 40 28 37 28 37
I can run the following to get the first stage linear regression fitted values for each combination of the columns.
firstStage <- combn(names(data)[-1], 2, FUN = function(x)
  lm(reformulate(x[1], response = x[2]), data = data), simplify = FALSE)
library(dplyr)
library(purrr)
firstStagePreds <- firstStage %>%
  map(., ~ pluck(., "fitted.values"))
firstStageFittedValues <- firstStagePreds %>%
  bind_cols()
Which gives me:
# A tibble: 10 × 171
...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13 ...14 ...15 ...16 ...17 ...18
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 33.1 25.7 19.1 1.93 35.6 28.7 35.1 25.0 19.5 22.8 30.1 14.8 23.2 81.6 19.2 5.92 14.3 74.6
2 53.4 43.8 43.0 31.7 50.1 47.3 57.3 47.1 37.9 42.5 46.0 32.8 43.8 81.7 42.4 23.8 29.8 74.9
3 67.5 56.4 59.6 52.2 60.1 60.2 72.7 62.4 50.6 56.2 57.0 45.3 58.1 81.7 58.4 36.2 40.6 75.2
4 69.0 57.8 61.4 54.5 61.2 61.6 74.4 64.1 52.0 57.7 58.2 46.6 59.7 81.7 60.2 37.6 41.8 75.2
5 59.7 49.4 50.4 40.8 54.5 53.0 64.2 53.9 43.5 48.6 50.9 38.3 50.2 81.7 49.5 29.3 34.6 75.0
The column names are ...1, ...2, ...3, etc. I would like to use the output of combn(names(data)[-1], 2) to name the elements of the firstStage list. Then I could use something like bind_cols(.id = "myListNames") to get more meaningful column names. How can I store the output of combn and use it to rename the list?
Data:
data <- structure(list(Pza_de_Espana = c(12, 40, 51, 53, 46, 30, 31,
30, 26, 47), Escuelas_Aguirre = c(29, 42, 51, 52, 46, 34, 39,
31, 39, 41), Av_Ramon_y_Cajal = c(26, 55, 73, 65, 59, 39, 40,
49, 47, 56), Arturo_Soria = c(27, 49, 57, 56, 50, 39, 28, 25,
35, 50), C_Farolillo = c(19, 41, 56, 64, 53, 34, 37, 22, 25,
50), Casa_de_Campo = c(4, 25, 51, 64, 34, 20, 28, 7, 9, 38),
Barajas = c(31, 53, 60, 56, 65, 50, 37, 41, 36, 54), Pza_del_Carmen = c(24,
46, 59, 63, 54, 35, 43, 39, 40, 47), Moratalaz = c(35, 62,
76, 69, 67, 48, 37, 39, 47, 66), Cuatro_Caminos = c(19, 40,
65, 64, 52, 33, 50, 37, 33, 51), Barrio_de_Pilar = c(23,
40, 53, 52, 40, 29, 28, 19, 31, 41), Vallecas = c(21, 44,
55, 56, 54, 37, 28, 26, 33, 47), Mendez_Alvaro = c(24, 44,
55, 59, 51, 32, 51, 42, 31, 51), Retiro = c(13, 31, 44, 50,
36, 22, 30, 22, 21, 37), Ensanche_Vallecas = c(21, 47, 55,
56, 62, 34, 32, 29, 32, 45), Plaza_Eliptica = c(81.989743981193,
81.9187518755066, 81.8477597698195, 81.7767676641325, 81.7057755584454,
81.6347834527583, 81.5637913470712, 81.4927992413841, 81.421807135697,
81.3508150300099), Sanchinarro = c(16, 52, 61, 56, 46, 30,
28, 25, 33, 48), El_Pardo = c(7, 28, 30, 42, 31, 16, 12,
8, 13, 29), Parque_Juan_Carlos_1 = c(17, 36, 41, 41, 35,
27, 18, 12, 20, 32), Tres_Olivos = c(76.7710873995529, 76.3531164480543,
75.935145496561, 75.5171745450677, 75.0992035935744, 74.6812326420812,
74.2632616905879, 73.8452907390946, 73.4273197876013, 73.009348836108
)), row.names = c(NA, 10L), class = "data.frame")

You can return only the fitted values from within combn, as a one-column tibble named after the pair of columns:
library(dplyr)
combn(names(data)[-1], 2, FUN = function(x) {
  model <- lm(reformulate(x[1], response = x[2]), data = data)
  tibble(!!paste(x, collapse = ' vs ') := model$fitted.values)
}, simplify = FALSE) %>%
  bind_cols()
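If you would rather keep the model objects and only name the list, as the question asks, the same pairs can be stored first and reused for the names. This is a minimal base R sketch on a small made-up data frame (the columns id, a, b, c are placeholders, not the question's data):

```r
# Small stand-in data; the first column is skipped, as in the question
data <- data.frame(
  id = 1:5,
  a  = c(12, 40, 51, 53, 46),
  b  = c(29, 42, 51, 52, 46),
  c  = c(26, 55, 73, 65, 59)
)

# Store all pairs of column names as a list of length-2 character vectors
pairs <- combn(names(data)[-1], 2, simplify = FALSE)

# One model per pair: the second name is the response, the first the predictor
firstStage <- lapply(pairs, function(x)
  lm(reformulate(x[1], response = x[2]), data = data))

# Name the list elements from the stored pairs
names(firstStage) <- vapply(pairs, paste, character(1), collapse = " vs ")

# A named list of equal-length vectors becomes a data frame with those names
fittedValues <- data.frame(lapply(firstStage, fitted), check.names = FALSE)
names(fittedValues)
```

With the list named this way, the column names no longer depend on the order in which the models were fitted.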

Related

Linear regression model over different combinations of columns

I have some data which looks like:
date Moratalaz Cuatro_Caminos Barrio_de_Pilar Vallecas
1 2010-01-01 35 19 23 21
2 2010-01-02 62 40 40 44
3 2010-01-03 76 65 53 55
4 2010-01-04 69 64 52 56
5 2010-01-05 67 52 40 54
6 2010-01-06 48 33 29 37
7 2010-01-07 37 50 28 28
8 2010-01-08 39 37 19 26
9 2010-01-09 47 33 31 33
10 2010-01-10 66 51 41 47
I can run a linear regression model over individual columns using:
lm(data$Moratalaz ~ data$Cuatro_Caminos)
However, I would like to run the regression model over every combination of columns (excluding the date column)
I tried something like the following but was not able to get it working:
formula_list <- list(as.formula('data$Moratalaz ~ data$Barrio_de_Pilar'),
as.formula('data$Barrio_de_Pilar ~ data$Cuatro_Caminos')
)
lapply(formula_list, FUN = lm, data = data)
Data
data <- structure(list(date = structure(c(14610, 14611, 14612, 14613,
14614, 14615, 14616, 14617, 14618, 14619), class = "Date"), Moratalaz = c(35,
62, 76, 69, 67, 48, 37, 39, 47, 66), Cuatro_Caminos = c(19, 40,
65, 64, 52, 33, 50, 37, 33, 51), Barrio_de_Pilar = c(23, 40,
53, 52, 40, 29, 28, 19, 31, 41), Vallecas = c(21, 44, 55, 56,
54, 37, 28, 26, 33, 47)), row.names = c(NA, 10L), class = "data.frame")
Consider using
combn(names(data)[-1], 2, FUN = function(x)
  lm(reformulate(x[1], response = x[2]), data = data), simplify = FALSE)
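To see what that call does, it can help to look at the two pieces separately. The sketch below only uses the column names from the question; note that combn gives each unordered pair once, so every pair is fitted in one direction only (second name as response).

```r
cols <- c("Moratalaz", "Cuatro_Caminos", "Barrio_de_Pilar", "Vallecas")

# reformulate() builds a formula from character strings
f <- reformulate(cols[1], response = cols[2])
f

# combn() enumerates every unordered pair of column names
pairs <- combn(cols, 2, simplify = FALSE)
length(pairs)
```

If you also need the reverse regressions, you would have to add the swapped pairs yourself.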

Splitting a vector or list based on a value

I am trying to split the following list:
x <- c(1, 19, 25, 62, 38, 41, 52, 53, 60, 61, 1, 74, 72, 66, 1, 68, 5, 1)
What I would like to do is split the above using the number 1 as the break points.
x1 <- c(1, 19, 25, 62, 38, 41, 52, 53, 60, 61)
x2 <- c(1, 74, 72, 66)
x3 <- c(1, 68, 5)
There must be a simple method to use but I am drawing a blank and my search-fu is weak and coming up empty.
Thanks for your help.
Use split with cumsum:
x <- c(1, 19, 25, 62, 38, 41, 52, 53, 60, 61, 1, 74, 72, 66, 1, 68, 5, 1)
split(x, f=cumsum(x==1))
#> $`1`
#> [1] 1 19 25 62 38 41 52 53 60 61
#>
#> $`2`
#> [1] 1 74 72 66
#>
#> $`3`
#> [1] 1 68 5
#>
#> $`4`
#> [1] 1
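To see why this works, look at the grouping vector on its own: cumsum(x == 1) increases by one at every 1, so each element gets the index of the most recent break point. The sketch below also drops groups that contain only the break point itself, which is an assumption about what is wanted for the trailing 1.

```r
x <- c(1, 19, 25, 62, 38, 41, 52, 53, 60, 61, 1, 74, 72, 66, 1, 68, 5, 1)

# Group index: increases by one at every occurrence of 1
grp <- cumsum(x == 1)
grp

# Split on the index, then drop groups holding nothing but the break point
parts <- split(x, grp)
parts <- parts[lengths(parts) > 1]
unname(parts)
```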

How to group data by date in R and smooth them with a moving average

I want to group daily data from Google Trends into weekly observations and smooth them with a 7-day centered moving average. How can I do this, and in which order?
Should I first group the data, or should I apply the centered moving average to the daily data?
This is my data:
dput(multiTimeline)
structure(list(day = structure(c(1598400000, 1598486400, 1598572800,
1598659200, 1598745600, 1598832000, 1598918400, 1599004800, 1599091200,
1599177600, 1599264000, 1599350400, 1599436800, 1599523200, 1599609600,
1599696000, 1599782400, 1599868800, 1599955200, 1600041600, 1600128000,
1600214400, 1600300800, 1600387200, 1600473600, 1600560000, 1600646400,
1600732800, 1600819200, 1600905600, 1600992000, 1601078400, 1601164800,
1601251200, 1601337600, 1601424000, 1601510400, 1601596800, 1601683200,
1601769600, 1601856000, 1601942400, 1602028800, 1602115200, 1602201600,
1602288000, 1602374400, 1602460800, 1602547200, 1602633600, 1602720000,
1602806400, 1602892800, 1602979200, 1603065600, 1603152000, 1603238400,
1603324800, 1603411200, 1603497600, 1603584000, 1603670400, 1603756800,
1603843200, 1603929600, 1604016000, 1604102400, 1604188800, 1604275200,
1604361600, 1604448000, 1604534400, 1604620800, 1604707200, 1604793600,
1604880000, 1604966400, 1605052800, 1605139200, 1605225600, 1605312000,
1605398400, 1605484800, 1605571200, 1605657600, 1605744000, 1605830400,
1605916800, 1606003200, 1606089600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), football = c(36, 36, 41, 60, 45, 38, 38, 39,
43, 49, 70, 49, 44, 46, 50, 62, 71, 92, 96, 61, 51, 45, 50, 58,
87, 81, 54, 50, 43, 49, 58, 97, 84, 55, 48, 41, 51, 56, 94, 83,
51, 47, 46, 49, 62, 97, 84, 51, 55, 51, 47, 52, 96, 79, 51, 49,
42, 44, 52, 100, 82, 49, 45, 41, 42, 50, 89, 73, 48, 40, 21,
29, 36, 75, 69, 45, 37, 39, 45, 51, 87, 69, 47, 48, 43, 37, 45,
79, 66, 46)), row.names = c(NA, -90L), class = c("tbl_df", "tbl",
"data.frame"))
Data is from 2020-08-26 to 2020-11-23.
I allowed myself to use the packages dplyr, to make data manipulation easier, and lubridate, which makes date manipulation easy.
The code is:
library(dplyr)
library(lubridate)
df2 <- multiTimeline %>%
  mutate(week = week(day)) %>%
  group_by(week) %>%
  summarise(average = mean(football))
The only function I used from lubridate there was week(), if you're interested.
What I did was: first, I created another column (could have been the same one, though) that states the week. Note that this only works because your column was already in date-time format (though just date would have worked too, maybe even better). From that, I grouped by week and took the average. I hope I understood your question correctly and this will help.
It worked; this was the output:
> df2
# A tibble: 13 x 2
week average
<dbl> <dbl>
1 35 42
2 36 48.6
3 37 69
4 38 60.7
5 39 62
6 40 60.4
7 41 63.4
8 42 60.7
9 43 59.1
10 44 54.7
11 45 44.6
12 46 55.1
13 47 52.7
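One thing to be aware of: week() counts complete seven-day periods since 1 January (plus one), so these "weeks" start on whatever weekday 1 January fell on, not on Monday or Sunday. A rough base R equivalent, shown here on the first 14 days of the question's data only, makes that explicit:

```r
day <- seq(as.Date("2020-08-26"), by = "day", length.out = 14)
football <- c(36, 36, 41, 60, 45, 38, 38, 39, 43, 49, 70, 49, 44, 46)

# yday is zero-based, so this is the number of complete
# seven-day periods since 1 January, plus one
week <- as.POSIXlt(day)$yday %/% 7 + 1

# Mean per week
weekly <- tapply(football, week, mean)
round(weekly, 1)
```

If calendar weeks are needed instead, the grouping variable would have to be built differently.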
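You can use rollmean from the zoo package to do all this as a one-liner. Its default alignment is centered, so each value is the mean of the day itself and the three days on either side.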
multiTimeline$rolling <- zoo::rollmean(multiTimeline$football, 7, na.pad = TRUE)
multiTimeline
#> # A tibble: 90 x 3
#> day football rolling
#> <dttm> <dbl> <dbl>
#> 1 2020-08-26 00:00:00 36 NA
#> 2 2020-08-27 00:00:00 36 NA
#> 3 2020-08-28 00:00:00 41 NA
#> 4 2020-08-29 00:00:00 60 42
#> 5 2020-08-30 00:00:00 45 42.4
#> 6 2020-08-31 00:00:00 38 43.4
#> 7 2020-09-01 00:00:00 38 44.6
#> 8 2020-09-02 00:00:00 39 46
#> 9 2020-09-03 00:00:00 43 46.6
#> 10 2020-09-04 00:00:00 49 47.4
#> # ... with 80 more rows
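If you prefer not to depend on zoo, a centered moving average can also be sketched in base R with stats::filter. This is shown on the first 10 values of the question's data only:

```r
football <- c(36, 36, 41, 60, 45, 38, 38, 39, 43, 49)

# Equal weights over 7 days; sides = 2 centers the window on each day
rolling <- as.numeric(stats::filter(football, rep(1 / 7, 7), sides = 2))
round(rolling, 1)
```

The first and last three positions are NA because a full 7-day window is not available there.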
If you want to pick out the smoothed average for each week from Saturday to Friday, just use filter to select only Tuesdays. This will give you the 7-day average from the previous Saturday to the following Friday.
multiTimeline %>% filter(lubridate::wday(day) == 3)
#> # A tibble: 12 x 3
#> day football rolling
#> <dttm> <dbl> <dbl>
#> 1 2020-09-01 00:00:00 38 44.6
#> 2 2020-09-08 00:00:00 46 56
#> 3 2020-09-15 00:00:00 51 64.7
#> 4 2020-09-22 00:00:00 50 60.3
#> 5 2020-09-29 00:00:00 48 61.7
#> 6 2020-10-06 00:00:00 47 61.7
#> 7 2020-10-13 00:00:00 55 62.4
#> 8 2020-10-20 00:00:00 49 59
#> 9 2020-10-27 00:00:00 45 58.4
#> 10 2020-11-03 00:00:00 40 48
#> 11 2020-11-10 00:00:00 37 51.6
#> 12 2020-11-17 00:00:00 48 53.7
To show this is what you want, we can plot your data and the averaged line using ggplot:
library(ggplot2)
library(dplyr)
ggplot(multiTimeline, aes(day, football)) +
  geom_line() +
  geom_line(data = multiTimeline %>% filter(lubridate::wday(day) == 3),
            aes(y = rolling), col = "red", lty = 2, size = 1.5)

How do I separate the pattern counts with R?

Via a program, I have received the following pattern counts.
Counter({'CCCC': 22115, 'TTTT': 22043, 'AAAA': 22037, 'GGGG': 21930, 'AAAC': 154, 'TTAT': 152, 'CCCA': 152, 'CCTC': 152, 'GGGC': 151, 'TTTG': 150, 'GTGG': 149, 'GCCC': 148, 'CCGC': 145, 'CGGG': 145, 'TGGG': 144, 'AGAA': 144, 'TTGT': 144, 'GAAA': 142, 'CCCG': 142, 'CCCT': 142, 'TCCC': 141, 'CAAA': 139, 'ATTT': 137, 'CGCC': 134, 'GGTG': 133, 'GAGG': 133, 'TTTA': 132, 'CTTT': 131, 'TCTT': 131, 'ACCC': 130, 'AGGG': 130, 'GGAG': 129, 'AACA': 129, 'TAAA': 129, 'TATT': 128, 'TTTC': 128, 'AAGA': 127, 'GGGA': 126, 'ACAA': 126, 'TTCT': 125, 'CTCC': 124, 'GCGG': 124, 'ATAA': 123, 'GGCG': 120, 'CACC': 119, 'AAAT': 118, 'AATA': 117, 'AAAG': 114, 'GTTT': 114, 'TGTT': 112, 'GGGT': 112, 'CCAC': 110, 'CGCG': 45, 'AACC': 43, 'TTAA': 41, 'CTCT': 41, 'GGCC': 41, 'ACTC': 40, 'CTTC': 40, 'GCCG': 39, 'ATTA': 39, 'ACCT': 39, 'TGCG': 39, 'ATAT': 39, 'TCTC': 38, 'ACGG': 38, 'TATA': 37, 'ATCA': 37, 'CGGC': 37, 'CGAG': 36, 'AGAG': 36, 'GACA': 35, 'GTTG': 35, 'TGAG': 35, 'TGGT': 35, 'CCAA': 35, 'TTGG': 34, 'GTGT': 34, 'GCGC': 34, 'CACA': 34, 'GTAA': 34, 'GTAG': 34, 'TCCA': 34, 'TCCT': 34, 'AAGG': 34, 'GAGA': 34, 'GCTT': 34, 'GTGC': 33, 'CTAT': 33, 'TTGC': 33, 'CGGA': 33, 'AGGA': 32, 'GACG': 32, 'AATT': 32, 'CAAC': 32, 'CTGC': 32, 'CTAC': 32, 'ACGA': 32, 'CGAC': 32, 'CCGG': 32, 'TCTG': 32, 'GGAA': 32, 'GGAT': 32, 'TGCT': 32, 'TTAG': 32, 'GCTG': 32, 'GAGT': 31, 'AGGC': 31, 'TTCC': 31, 'ATGA': 31, 'TTCA': 31, 'CCAT': 31, 'AAGT': 31, 'GAGC': 31, 'GTAT': 31, 'CGAA': 31, 'TCAT': 31, 'ATTC': 31, 'TGTG': 30, 'AGTT': 30, 'ATCC': 30, 'AGCA': 30, 'GTCT': 30, 'TGTC': 30, 'TCAC': 30, 'CACT': 30, 'ACTA': 30, 'TAAT': 30, 'CCGT': 30, 'CCTA': 29, 'TCGG': 29, 'GGTA': 29, 'TATG': 29, 'AACG': 29, 'CACG': 29, 'GATT': 29, 'ATCT': 29, 'TGGC': 29, 'AGCC': 29, 'TATC': 29, 'GCTC': 29, 'GGCT': 29, 'TCTA': 29, 'AACT': 28, 'CCTT': 28, 'CTTA': 28, 'TGTA': 28, 'TAGT': 28, 'AGTG': 28, 'CCGA': 27, 'AATG': 27, 'CCTG': 27, 'CTGT': 27, 'AGTC': 27, 'GTCC': 27, 'GGTT': 27, 'ACAC': 26, 'TACC': 26, 'CATC': 26, 'CATA': 26, 'GTGA': 
26, 'TGAA': 26, 'GGTC': 26, 'CTTG': 26, 'GCAC': 26, 'GGCA': 26, 'CGTC': 26, 'CTGG': 26, 'TAAG': 26, 'TCGT': 26, 'TGAT': 25, 'CAGA': 25, 'GAAC': 25, 'ACCA': 25, 'TTAC': 25, 'CATT': 25, 'AGAT': 25, 'CGGT': 25, 'ATTG': 25, 'TTGA': 25, 'GATA': 24, 'GGAC': 24, 'AAGC': 24, 'GTCA': 24, 'CAAT': 24, 'GCAG': 24, 'ACAT': 24, 'TGCC': 24, 'ATAG': 24, 'CGTG': 24, 'CGCA': 24, 'TAGG': 23, 'ACCG': 23, 'TTCG': 23, 'AGCG': 23, 'GTTC': 23, 'ACTT': 23, 'CGTT': 23, 'AGAC': 23, 'GCAT': 22, 'TCCG': 22, 'TAAC': 22, 'ACGC': 22, 'CAGC': 22, 'GACC': 22, 'CATG': 22, 'TCGA': 22, 'TAGA': 22, 'GCAA': 22, 'CTCG': 22, 'TACT': 22, 'AATC': 21, 'CGCT': 21, 'GAAT': 21, 'GCGT': 21, 'AGTA': 21, 'GCCA': 21, 'ATGG': 21, 'TCAA': 21, 'CTCA': 21, 'TGGA': 20, 'GAAG': 20, 'GATC': 20, 'TGCA': 20, 'GCCT': 19, 'GTCG': 19, 'CAAG': 19, 'TCGC': 19, 'CTGA': 19, 'GATG': 19, 'CTAA': 19, 'GCGA': 19, 'ATAC': 18, 'GTTA': 18, 'GCTA': 18, 'AGGT': 18, 'CCAG': 18, 'ACAG': 18, 'CTAG': 17, 'CGTA': 17, 'ACGT': 17, 'TACA': 17, 'AGCT': 16, 'CAGG': 16, 'ATGT': 16, 'ATCG': 16, 'ATGC': 15, 'TGAC': 14, 'TAGC': 14, 'ACTG': 14, 'TCAG': 14, 'CGAT': 14, 'TACG': 13, 'CAGT': 11, 'GTAC': 10, 'GACT': 9})
I now want to convert it into a table, so that the first column, "AAAA", holds its corresponding value, and likewise for every other combination. Does anyone have an idea how to program this well?
This is how I read the data into R:
daten <- read.table("/PATTERN.txt", header = FALSE, sep = "\t");
So far I've tried direct reading, but somehow it doesn't really work. It should look like this:
AAAA CCCC
1 22128 22127
Thank you very much!
Assuming Lines, shown reproducibly in the Note at the end, contains the data, replace Counter( with [, ) with ] and ' with " in it, and read the result in using fromJSON:
library(jsonlite)
fromJSON(gsub("'", '"',
sub("\\)", "]",
sub("Counter.","[", Lines))))
giving:
CCCC TTTT AAAA GGGG AAAC TTAT CCCA CCTC GGGC TTTG GTGG GCCC CCGC CGGG
1 22115 22043 22037 21930 154 152 152 152 151 150 149 148 145 145
TGGG AGAA TTGT GAAA CCCG CCCT TCCC CAAA ATTT CGCC GGTG GAGG TTTA CTTT TCTT
1 144 144 144 142 142 142 141 139 137 134 133 133 132 131 131
ACCC AGGG GGAG AACA TAAA TATT TTTC AAGA GGGA ACAA TTCT CTCC GCGG ATAA GGCG
1 130 130 129 129 129 128 128 127 126 126 125 124 124 123 120
CACC AAAT AATA AAAG GTTT TGTT GGGT CCAC CGCG AACC TTAA CTCT GGCC ACTC CTTC
1 119 118 117 114 114 112 112 110 45 43 41 41 41 40 40
GCCG ATTA ACCT TGCG ATAT TCTC ACGG TATA ATCA CGGC CGAG AGAG GACA GTTG TGAG
1 39 39 39 39 39 38 38 37 37 37 36 36 35 35 35
TGGT CCAA TTGG GTGT GCGC CACA GTAA GTAG TCCA TCCT AAGG GAGA GCTT GTGC CTAT
1 35 35 34 34 34 34 34 34 34 34 34 34 34 33 33
TTGC CGGA AGGA GACG AATT CAAC CTGC CTAC ACGA CGAC CCGG TCTG GGAA GGAT TGCT
1 33 33 32 32 32 32 32 32 32 32 32 32 32 32 32
TTAG GCTG GAGT AGGC TTCC ATGA TTCA CCAT AAGT GAGC GTAT CGAA TCAT ATTC TGTG
1 32 32 31 31 31 31 31 31 31 31 31 31 31 31 30
AGTT ATCC AGCA GTCT TGTC TCAC CACT ACTA TAAT CCGT CCTA TCGG GGTA TATG AACG
1 30 30 30 30 30 30 30 30 30 30 29 29 29 29 29
CACG GATT ATCT TGGC AGCC TATC GCTC GGCT TCTA AACT CCTT CTTA TGTA TAGT AGTG
1 29 29 29 29 29 29 29 29 29 28 28 28 28 28 28
CCGA AATG CCTG CTGT AGTC GTCC GGTT ACAC TACC CATC CATA GTGA TGAA GGTC CTTG
1 27 27 27 27 27 27 27 26 26 26 26 26 26 26 26
GCAC GGCA CGTC CTGG TAAG TCGT TGAT CAGA GAAC ACCA TTAC CATT AGAT CGGT ATTG
1 26 26 26 26 26 26 25 25 25 25 25 25 25 25 25
TTGA GATA GGAC AAGC GTCA CAAT GCAG ACAT TGCC ATAG CGTG CGCA TAGG ACCG TTCG
1 25 24 24 24 24 24 24 24 24 24 24 24 23 23 23
AGCG GTTC ACTT CGTT AGAC GCAT TCCG TAAC ACGC CAGC GACC CATG TCGA TAGA GCAA
1 23 23 23 23 23 22 22 22 22 22 22 22 22 22 22
CTCG TACT AATC CGCT GAAT GCGT AGTA GCCA ATGG TCAA CTCA TGGA GAAG GATC TGCA
1 22 22 21 21 21 21 21 21 21 21 21 20 20 20 20
GCCT GTCG CAAG TCGC CTGA GATG CTAA GCGA ATAC GTTA GCTA AGGT CCAG ACAG CTAG
1 19 19 19 19 19 19 19 19 18 18 18 18 18 18 17
CGTA ACGT TACA AGCT CAGG ATGT ATCG ATGC TGAC TAGC ACTG TCAG CGAT TACG CAGT
1 17 17 17 16 16 16 16 15 14 14 14 14 14 13 11
GTAC GACT
1 10 9
Note
Lines <- "
Counter({'CCCC': 22115, 'TTTT': 22043, 'AAAA': 22037, 'GGGG':21930, 'AAAC': 154, 'TTAT': 152, 'CCCA': 152, 'CCTC': 152, 'GGGC': 151, 'TTTG': 150, 'GTGG': 149, 'GCCC': 148, 'CCGC': 145, 'CGGG': 145, 'TGGG': 144, 'AGAA': 144, 'TTGT': 144, 'GAAA': 142, 'CCCG': 142, 'CCCT': 142, 'TCCC': 141, 'CAAA': 139, 'ATTT': 137, 'CGCC': 134, 'GGTG': 133, 'GAGG': 133, 'TTTA': 132, 'CTTT': 131, 'TCTT': 131, 'ACCC': 130, 'AGGG': 130, 'GGAG': 129, 'AACA': 129, 'TAAA': 129, 'TATT': 128, 'TTTC': 128, 'AAGA': 127, 'GGGA': 126, 'ACAA': 126, 'TTCT': 125, 'CTCC': 124, 'GCGG': 124, 'ATAA': 123, 'GGCG': 120, 'CACC': 119, 'AAAT': 118, 'AATA': 117, 'AAAG': 114, 'GTTT': 114, 'TGTT': 112, 'GGGT': 112, 'CCAC': 110, 'CGCG': 45, 'AACC': 43, 'TTAA': 41, 'CTCT': 41, 'GGCC': 41, 'ACTC': 40, 'CTTC': 40, 'GCCG': 39, 'ATTA': 39, 'ACCT': 39, 'TGCG': 39, 'ATAT': 39, 'TCTC': 38, 'ACGG': 38, 'TATA': 37, 'ATCA': 37, 'CGGC': 37, 'CGAG': 36, 'AGAG': 36, 'GACA': 35, 'GTTG': 35, 'TGAG': 35, 'TGGT': 35, 'CCAA': 35, 'TTGG': 34, 'GTGT': 34, 'GCGC': 34, 'CACA': 34, 'GTAA': 34, 'GTAG': 34, 'TCCA': 34, 'TCCT': 34, 'AAGG': 34, 'GAGA': 34, 'GCTT': 34, 'GTGC': 33, 'CTAT': 33, 'TTGC': 33, 'CGGA': 33, 'AGGA': 32, 'GACG': 32, 'AATT': 32, 'CAAC': 32, 'CTGC': 32, 'CTAC': 32, 'ACGA': 32, 'CGAC': 32, 'CCGG': 32, 'TCTG': 32, 'GGAA': 32, 'GGAT': 32, 'TGCT': 32, 'TTAG': 32, 'GCTG': 32, 'GAGT': 31, 'AGGC': 31, 'TTCC': 31, 'ATGA': 31, 'TTCA': 31, 'CCAT': 31, 'AAGT': 31, 'GAGC': 31, 'GTAT': 31, 'CGAA': 31, 'TCAT': 31, 'ATTC': 31, 'TGTG': 30, 'AGTT': 30, 'ATCC': 30, 'AGCA': 30, 'GTCT': 30, 'TGTC': 30, 'TCAC': 30, 'CACT': 30, 'ACTA': 30, 'TAAT': 30, 'CCGT': 30, 'CCTA': 29, 'TCGG': 29, 'GGTA': 29, 'TATG': 29, 'AACG': 29, 'CACG': 29, 'GATT': 29, 'ATCT': 29, 'TGGC': 29, 'AGCC': 29, 'TATC': 29, 'GCTC': 29, 'GGCT': 29, 'TCTA': 29, 'AACT': 28, 'CCTT': 28, 'CTTA': 28, 'TGTA': 28, 'TAGT': 28, 'AGTG': 28, 'CCGA': 27, 'AATG': 27, 'CCTG': 27, 'CTGT': 27, 'AGTC': 27, 'GTCC': 27, 'GGTT': 27, 'ACAC': 26, 'TACC': 26, 'CATC': 26, 'CATA': 26, 'GTGA': 26, 
'TGAA': 26, 'GGTC': 26, 'CTTG': 26, 'GCAC': 26, 'GGCA': 26, 'CGTC': 26, 'CTGG': 26, 'TAAG': 26, 'TCGT': 26, 'TGAT': 25, 'CAGA': 25, 'GAAC': 25, 'ACCA': 25, 'TTAC': 25, 'CATT': 25, 'AGAT': 25, 'CGGT': 25, 'ATTG': 25, 'TTGA': 25, 'GATA': 24, 'GGAC': 24, 'AAGC': 24, 'GTCA': 24, 'CAAT': 24, 'GCAG': 24, 'ACAT': 24, 'TGCC': 24, 'ATAG': 24, 'CGTG': 24, 'CGCA': 24, 'TAGG': 23, 'ACCG': 23, 'TTCG': 23, 'AGCG': 23, 'GTTC': 23, 'ACTT': 23, 'CGTT': 23, 'AGAC': 23, 'GCAT': 22, 'TCCG': 22, 'TAAC': 22, 'ACGC': 22, 'CAGC': 22, 'GACC': 22, 'CATG': 22, 'TCGA': 22, 'TAGA': 22, 'GCAA': 22, 'CTCG': 22, 'TACT': 22, 'AATC': 21, 'CGCT': 21, 'GAAT': 21, 'GCGT': 21, 'AGTA': 21, 'GCCA': 21, 'ATGG': 21, 'TCAA': 21, 'CTCA': 21, 'TGGA': 20, 'GAAG': 20, 'GATC': 20, 'TGCA': 20, 'GCCT': 19, 'GTCG': 19, 'CAAG': 19, 'TCGC': 19, 'CTGA': 19, 'GATG': 19, 'CTAA': 19, 'GCGA': 19, 'ATAC': 18, 'GTTA': 18, 'GCTA': 18, 'AGGT': 18, 'CCAG': 18, 'ACAG': 18, 'CTAG': 17, 'CGTA': 17, 'ACGT': 17, 'TACA': 17, 'AGCT': 16, 'CAGG': 16, 'ATGT': 16, 'ATCG': 16, 'ATGC': 15, 'TGAC': 14, 'TAGC': 14, 'ACTG': 14, 'TCAG': 14, 'CGAT': 14, 'TACG': 13, 'CAGT': 11, 'GTAC': 10, 'GACT': 9})"
This answer may help you in this particular case, but you should insist that whoever produced that result export it in a format that can be easily imported by any programming language. What you have here is a string representation of a Python object, which is definitely not a good way of exchanging data.
However, you can try this:
#place here the correct path to the file
fn <- "pattern.txt"
#here we read the content of the file as is
filecontent <- readChar(fn,file.info(fn)$size)
#we manipulate the string a bit to have an R list
res <- eval(parse(text = gsub("[\\{\\}\n]", "",
gsub(":", "=", sub("Counter", "list", filecontent)))))
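If you would rather not eval() text read from a file, the pairs can also be pulled out with a regular expression. This is a sketch in base R on a shortened version of the string; the pattern assumes keys made of the letters A, C, G, T and integer counts:

```r
txt <- "Counter({'CCCC': 22115, 'TTTT': 22043, 'AAAA': 22037, 'GGGG':21930})"

# Extract every 'KEY': value pair
m <- regmatches(txt, gregexpr("'[ACGT]+': *[0-9]+", txt))[[1]]

# Split each match into its key and its integer value
keys <- sub("^'([ACGT]+)'.*$", "\\1", m)
vals <- as.integer(sub("^.*: *", "", m))

# Named vector, then a one-row data frame with one column per pattern
counts <- setNames(vals, keys)
res <- as.data.frame(as.list(counts))
res
```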

Trying to Repeat, but data is not a multiple

So I am trying to label a data matrix with conditions; however, when I did my experiment, I had 3 tubes where I repeated the first two 7 times and the third tube 6 times. How can I code the matrix to be re-written and ignore that there is "missing" data:
dm$Strain<-dm$variable
dm$Strain<-rep(c("446-1", "446-2", "446-3"), each.out=193)
dm$Strain<-factor(dm$Strain)
levels(dm$Strain)
Error in $<-.data.frame(*tmp*, "Strain", value = c("446-1", "446-2", :
replacement has 3 rows, data has 19300
Data Setup in Wells:
1) Control = 1, 16, 31, 46, 61, 76, 91
2) LI 446-1 tube = 2, 17, 32, 47, 62, 77, 92
3) LI 446-1 10^7 = 3, 18, 33, 48, 63, 78, 93
4) LI 446-1 10^6 = 4, 19, 34, 49, 64, 79, 94
5) LI 446-1 10^5 = 5, 20, 35, 50, 65, 80, 95
6) Control = 6, 21, 36, 51, 66, 81, 96
7) LI-446-2 tube = 7, 22, 37, 52, 67, 82, 97
8) LI-446-2 10^7 = 8, 23, 38, 53, 68, 83, 98
9) LI-446-2 10^6 = 9, 24, 39, 54, 69, 84, 99
10) LI-446-2 10^5 = 10, 25, 40 ,55, 70, 85, 100
11) Control = 11, 26, 41, 56, 71, 86
12) LI-446-3 tube = 12, 27, 42, 57, 72, 87
13) LI-446-3 10^7 = 13, 28, 43, 58, 73, 88
14) LI-446-3 10^6 = 14, 29, 44, 59, 74, 89
15) LI-446-3 10^5 = 15, 30, 45, 60, 75, 90
I have 19300 rows of data, where 1:193 correspond to Well 1 at 15 min intervals, 194:386 are Well 2 at 15 min intervals, etc. up to Well 100. However, 446-3 (AKA 11-15 above) are repeated 6 times and 446-1 and 446-2 are repeated 7 times.
str(dm)
'data.frame': 19300 obs. of 4 variables:
$ Time..mins.: int 15 30 45 60 75 90 105 120 135 150 ...
$ variable : Factor w/ 100 levels "Well_1","Well_2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 0.439 0.204 0.191 0.187 0.185 0.19 0.187 0.19 0.188 0.191 ...
$ Media : Factor w/ 2 levels "BHI","BHI_salt": 1 1 1 1 1 1 1 1 1 1 ...
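One possible approach, sketched under the assumption that the layout above repeats every 15 wells and that the well number can be read from the variable column: derive the strain from the well number instead of using rep(), so the shorter third block needs no special handling. The data frame below is a made-up stand-in with the same shape as dm.

```r
# Hypothetical stand-in for dm: 100 wells x 193 time points
dm <- data.frame(
  variable = factor(rep(paste0("Well_", 1:100), each = 193),
                    levels = paste0("Well_", 1:100))
)

# Well number from the factor label, e.g. "Well_17" -> 17
well <- as.integer(sub("Well_", "", as.character(dm$variable)))

# Position within the repeating block of 15 conditions (1 to 15)
pos <- (well - 1) %% 15 + 1

# Positions 1-5 are 446-1, 6-10 are 446-2, 11-15 are 446-3
dm$Strain <- factor(c("446-1", "446-2", "446-3")[ceiling(pos / 5)])

# Number of wells per strain
table(dm$Strain) / 193
```

Because the label is computed per row, wells 91 to 100 simply fall into the first two strains and nothing has to be padded.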
