Extracting gene name and ID number from a vector [duplicate]

Extracting gene name and ID number from a vector [duplicate] - r

This question already has answers here:
How do I separate a character column into two columns? [duplicate]
(2 answers)
Closed 1 year ago.
What gsub function can I use in R to get the gene name and the id number from a vector which looks like this?
head(colnames(cn), 20)
[1] "A1BG (1)" "NAT2 (10)" "ADA (100)" "CDH2 (1000)" "AKT3 (10000)" "GAGE12F (100008586)"
[7] "RNA5-8SN5 (100008587)" "RNA18SN5 (100008588)" "RNA28SN5 (100008589)" "LINC02584 (100009613)" "POU5F1P5 (100009667)" "ZBTB11-AS1 (100009676)"
[13] "MED6 (10001)" "NR2E3 (10002)" "NAALAD2 (10003)" "DUXB (100033411)" "SNORD116-1 (100033413)" "SNORD116-2 (100033414)"
[19] "SNORD116-3 (100033415)" "SNORD116-4 (100033416)"

1) Assuming the input s given in the Note at the end we can use read.table specifying that the fields are separated by ( and that ) is a comment character. We also strip white space around fields and give meaningful column names. No packages are used.
DF <- read.table(text = s, sep = "(", comment.char = ")",
strip.white = TRUE, col.names = c("Gene", "Id"))
DF
giving this data frame so DF$Gene is the genes and DF$Id is the id's.
Gene Id
1 A1BG 1
2 NAT2 10
3 ADA 100
4 CDH2 1000
5 AKT3 10000
6 GAGE12F 100008586
7 RNA5-8SN5 100008587
8 RNA18SN5 100008588
9 RNA28SN5 100008589
10 LINC02584 100009613
11 POU5F1P5 100009667
12 ZBTB11-AS1 100009676
13 MED6 10001
14 NR2E3 10002
15 NAALAD2 10003
16 DUXB 100033411
17 SNORD116-1 100033413
18 SNORD116-2 100033414
19 SNORD116-3 100033415
20 SNORD116-4 100033416
2) A variation of the above is to first remove the parentheses and then read it in giving the same result. Note that the second argument of chartr contains two spaces so that each parenthesis is translated to a space.
read.table(text = chartr("()", " ", s), col.names = c("Gene", "Id"))
Note
Lines <- '[1] "A1BG (1)" "NAT2 (10)" "ADA (100)" "CDH2 (1000)" "AKT3 (10000)" "GAGE12F (100008586)"
[7] "RNA5-8SN5 (100008587)" "RNA18SN5 (100008588)" "RNA28SN5 (100008589)" "LINC02584 (100009613)" "POU5F1P5 (100009667)" "ZBTB11-AS1 (100009676)"
[13] "MED6 (10001)" "NR2E3 (10002)" "NAALAD2 (10003)" "DUXB (100033411)" "SNORD116-1 (100033413)" "SNORD116-2 (100033414)"
[19] "SNORD116-3 (100033415)" "SNORD116-4 (100033416)" '
L <- Lines |>
textConnection() |>
readLines() |>
gsub(pattern = "\\[\\d+\\]", replacement = "")
s <- scan(text = L, what = "")
so s looks like this:
> dput(s)
c("A1BG (1)", "NAT2 (10)", "ADA (100)", "CDH2 (1000)", "AKT3 (10000)",
"GAGE12F (100008586)", "RNA5-8SN5 (100008587)", "RNA18SN5 (100008588)",
"RNA28SN5 (100008589)", "LINC02584 (100009613)", "POU5F1P5 (100009667)",
"ZBTB11-AS1 (100009676)", "MED6 (10001)", "NR2E3 (10002)", "NAALAD2 (10003)",
"DUXB (100033411)", "SNORD116-1 (100033413)", "SNORD116-2 (100033414)",
"SNORD116-3 (100033415)", "SNORD116-4 (100033416)")

First, in the future please share your data using the dput() command. See this for details.
Second, here is one solution for extracting the parts you need:
library(tidyverse)
g<-c("A1BG (1)","NAT2 (10)","ADA (100)" , "RNA18SN5 (100008588)", "RNA28SN5 (100008589)")
gnumber<-stringr::str_extract(g,"(?=\\().*?(?<=\\))")
gnumber
gname<-stringr::str_extract(g, "[:alpha:]+")
gname
# or, to get the whole first word:
gname<-stringr::word(g,1,1)
gname

Related

How to manipulate digits in a character string in R?

I feel like I have a super easy question but for the life of me I can't find it when googling or searching here (or I don't know the correct terms to find a solution) so here goes.
I have a large amount of text in R in which I want to identify all numbers/digits, and add a specific number to them, for example 5.
So just as a small example, if this were my text:
text <- c("Hi. It is 6am. I want to leave at 7am")
I want the output to be:
> text
[1] "Hi. It is 11am. I want to leave at 12am"
But also I need the addition for each individual digit, so if this is the text:
text <- c("Hi. It is 2017. I am 35 years old.")
...I want the output to be:
> text
[1] "Hi. It is 75612. I am 810 years old."
I have tried 'grabbing' the numbers from the string and adding 5, but I don't know how to then get them back into the original string so I can get the full text back.
How should I go about this? Thanks in advance!

Here is how I would do the time. I would search for a number that is followed by am or pm and then sub in a math expression to be evaluated by gsubfn. This is pretty flexible, but would require whole hours in its current implementation. I added an am and pm if you wanted to swap those, but I didn't try to code in detecting if the number changes from am to pm. Also note that I didn't code in rolling from 12 to 1. If you add numbers over 12, you will get a number bigger than 12.
text1 <- c("Hi. It is 6am. I want to leave at 7am")
text2 <- c("It is 9am. I want to leave at 10am, but the cab comes at 11am. Can I push my flight to 12am?")
change_time <- function(text, hours, sign, am_pm){
string_change <- glue::glue("`(\\1{sign}{hours})`{am_pm}")
gsub("(\\d+)(?=am|pm)(am|pm)", string_change, text, perl = TRUE)|>
gsubfn::fn$c()
}
change_time(text = text1, hours = 5, sign = "+", am_pm = "am")
#> [1] "Hi. It is 11am. I want to leave at 12am"
change_time(text = text2, hours = 3, sign = "-", am_pm = "pm")
#> [1] "It is 6pm. I want to leave at 7pm, but the cab comes at 8pm. Can I push my flight to 9pm?"

text1 <- c("Hi. It is 2017. I am 35 years old.")
text2 <- c("Hi. It is 6am. I want to leave at 7am")
change_number <- function(text, change, sign){
string_change <- glue::glue("`(\\1{sign}{change})`")
gsub("(\\d)", string_change, text, perl = TRUE) %>%
gsubfn::fn$c() }
change_number(text = text1, change = 5, sign = "+")
#>[1] "Hi. It is 75612. I am 810 years old."
change_number(text = text2, change = 5, sign = "+")
#>[1] "Hi. It is 11am. I want to leave at 12am"
This works perfectly. Many thanks to #AndS., I tweaked (or rather, simplified) your code to fit my needs better. I was determined to figure out the other text myself haha, so thanks for showing me how!

Something quick and dirty with base R:
add_n = \(x, n, by_digit = FALSE) {
if (by_digit) ptrn = "[0-9]" else ptrn = "[0-9]+"
tmp = gregexpr(ptrn, x)
raw = regmatches(x, gregexpr(ptrn, x))
raw_plusn = lapply(raw, \(x) as.integer(x) + n)
for (i in seq_along(x)) regmatches(x[i], tmp[i]) = raw_plusn[i]
x
}
text = c(
"Hi. It is 6am. I want to leave at 7am",
"wow it's 505 dollars and 19 cents",
"Hi. It is 2017. I am 35 years old."
)
> add_n(text, 5)
# [1] "Hi. It is 11am. I want to leave at 12am"
# [2] "wow it's 510 dollars and 24 cents"
# [3] "Hi. It is 2022. I am 40 years old."
> add_n(text, -2)
# [1] "Hi. It is 4am. I want to leave at 5am" "wow it's 503 dollars and 17 cents"
# [3] "Hi. It is 2015. I am 33 years old."
> add_n(text, 5, by_digit = TRUE)
# [1] "Hi. It is 11am. I want to leave at 12am"
# [2] "wow it's 10510 dollars and 614 cents"
# [3] "Hi. It is 75612. I am 810 years old."

Here's a tidyverse solution:
data.frame(text) %>%
# separate `text` into individual characters:
separate_rows(text, sep = "(?<!^)(?!$)") %>%
# add `5` to any digit:
mutate(
# if you detect a digit...
text = ifelse(str_detect(text, "\\d"),
# ... extract it, convert it to numeric, add `5`:
as.numeric(str_extract(text, "\\d")) + 5,
# ... else leave `text` as is:
text)
) %>%
# string the characters back together:
summarise(text = str_c(text, collapse = ""))
# A tibble: 1 × 1
text
<chr>
1 Hi. It is 11am. I want to leave at 12am
Data 1:
text <- c("Hi. It is 6am. I want to leave at 7am")
Note that the same code works for the second text as well without any change:
# A tibble: 1 × 1
text
<chr>
1 Hi. It is 75612. I am 810 years old.
Data 2:
text <- c("Hi. It is 2017. I am 35 years old.")

R Extract partially matching string

I have a question about extracting a part of a string from several files that has these rows:
units = specified
- name 0 = prDM: Pressure, Digiquartz [db]
- name 1 = t090C: Temperature [ITS-90, deg C]
- name 2 = c0S/m: Conductivity [S/m]
- name 3 = t190C:Temperature, 2 [ITS-90, deg C]
- name 4 = c1S/m: Conductivity, 2 [S/m]
- name 5 = flSP: Fluorescence, Seapoint
- name 6 = sbeox0ML/L: Oxygen, SBE 43 [ml/l]
- name 7 = altM: Altimeter [m]
- name 8 = sal00: Salinity, Practical [PSU]
- name 9 = sal11: Salinity, Practical, 2 [PSU]
- span 0 = 1.000, 42.000
I need to extract only the information of the columns that start with "name" and extract everything between = and: .
For example, in the row "name 0 = prDM: Pressure, Digiquartz [db]" the desired result will be prDM.
Some files have different number of "name"rows (i.e. this example has 13 rows but other files has 16, and the number varies), so I want it to be as general as I can so I can allways extract the right strings independently the number of rows.Rows starts with # and a space before name.
I have tried this code but it only extract the first row. Can you please help me with this? Many thanks!
CNV<-NULL
for (i in 1:nro.files){
x <- readLines(all.files[i])
name.col<-grep("^\\# name", x)
df <- data.table::fread(text = x[name.col])
CNV[[i]]<-df
}

using stringr and the regex pattern "name \\d+ = (.*?):" which means in words "name followed by one or more digits followed by an equals sign followed by a space followed by a captured group containing any character (the period) zero or more times (the *) followed by a colon".
library(stringr)
strings <- c("name 0 = prDM: Pressure, Digiquartz [db]",
"name 1 = t090C: Temperature [ITS-90, deg C]",
"name 2 = c0S/m: Conductivity [S/m]",
"name 3 = t190C:Temperature, 2 [ITS-90, deg C]",
"name 4 = c1S/m: Conductivity, 2 [S/m]",
"name 5 = flSP: Fluorescence, Seapoint",
"name 6 = sbeox0ML/L: Oxygen, SBE 43 [ml/l]",
"name 7 = altM: Altimeter [m]",
"name 8 = sal00: Salinity, Practical [PSU]",
"name 9 = sal11: Salinity, Practical, 2 [PSU]")
result <- str_match(strings, "name \\d+ = (.*):")
result[,2]
[1] "prDM" "t090C" "c0S/m" "t190C" "c1S/m" "flSP" "sbeox0ML/L"
[8] "altM" "sal00" "sal11"
Or if you prefer base
pattern = "name \\d+ = (.*):"
result <- regmatches(strings, regexec(pattern, strings))
sapply(result, "[[", 2)
[1] "prDM" "t090C" "c0S/m" "t190C" "c1S/m" "flSP" "sbeox0ML/L"
[8] "altM" "sal00" "sal11"

Use str_extract from package stringr and positive lookahead and lookbehind:
str <- "name 0 = prDM: Pressure, Digiquartz [db]"
str_extract(str, "(?<== ).*(?=:)")
[1] "prDM"
Explanation:
(?<== )if you see =followed by white space on the left (lookbehind)
.* match anything until ...
(?=:)... you see a colon on the right (lookahead)

In Base R
test <- c("name 0 = prDM: Pressure, Digiquartz [db]","name 1 = t090C: Temperature [ITS-90, deg C]")
gsub("^name [0-9]+ = (.+):.+","\\1",test)
[1] "prDM" "t090C"
explanation
^name [0-9]+ Searches for a the beginning of a string ^ with name folowed by any length of number
= (.+): any length + of any character . found between = and : are stored ( ) to be later recalled by \\1

4 lines records into one line? How to combine in a single line?

I would like to ask you for help. I have data looking like this: (one record is in four lines:
9540 16
0.1586E-03-0.3713E-04 0.1559E-03-0.4054E-04 0.2610E-02 0.2589E-03 0.4509E-03
0.7271E-03 0.2286E-03 0.8627E-03 0.1511E-02 0.1208E-03 0.1169 0.5486E-01
0.1419E-01 0.1715
9546 16
0.1546E-03-0.2273E-04 0.1504E-03-0.1516E-04 0.2517E-02 0.1968E-03 0.5512E-03
0.7556E-03 0.2998E-03 0.1024E-02 0.1495E-02 0.6889E-03 0.1134 0.5461E-01
0.1418E-01 0.1708
I would like to read this into R and look like this (in one line):
9540 16 0.1586E-03 -0.3713E-04 0.1559E-03 -0.4054E-04 0.2610E-02 0.2589E-03 0.4509E-03 0.7271E-03 0.2286E-03 0.8627E-03 0.1511E-02 0.1208E-03 0.1169 0.5486E-01 0.1419E-01 0.1715
9546 16 0.1546E-03 -0.2273E-04 0.1504E-03 -0.1516E-04 0.2517E-02 0.1968E-03 0.5512E-03 0.7556E-03 0.2998E-03 0.1024E-02 0.1495E-02 0.6889E-03 0.1134 0.5461E-01 0.1418E-01 0.1708

We could read the file using readLines. Create a grouping variable using gl, paste the 'lines' based on the group with tapply. If needed, we can remove the leading and lagging spaces with str_trim from library(stringr)
lines <- readLines('fourlines.txt')
lines2 <- tapply(lines, as.numeric(gl(length(lines), 4, length(lines))),
FUN= paste, collapse=' ')
library(stringr)
lines2 <- str_trim(unname(lines2))
Output of 'lines2'
lines2
#[1] "9540 16 0.1586E-03-0.3713E-04 0.1559E-03-0.4054E-04 0.2610E-02 0.2589E-03 0.4509E-03 0.7271E-03 0.2286E-03 0.8627E-03 0.1511E-02 0.1208E-03 0.1169 0.5486E-01 0.1419E-01 0.1715"
#[2] "9546 16 0.1546E-03-0.2273E-04 0.1504E-03-0.1516E-04 0.2517E-02 0.1968E-03 0.5512E-03 0.7556E-03 0.2998E-03 0.1024E-02 0.1495E-02 0.6889E-03 0.1134 0.5461E-01 0.1418E-01 0.1708"
If we want to remove the extra spaces
lines2 <- gsub('\\s+', ' ', lines2)

read.table and files with excess commas

I am trying to import a CSV file into R using the read.table command. I keep getting the error message "more columns than column names", even though I have set the strip.white to TRUE. The program that makes the csv files adds a large number of comma characters to the end of each line, which I think is the source of the extra columns.
read.table("filename.csv", sep=",", fill=T, header=TRUE, strip.white = T,
as.is=T,row.names = NULL, quote = "")
How can I get R to strip away the extraneous columns of commas from the header line and from the rest of the CSV file as it reads it into the R console?
Also, numerous cells in the csv file do not contain any data. Is it possible to get R to fill in these empty cells with "NA"?
The first two lines of the csv file:
Document_Name,Sequence_Name,Track_Name,Type,Name,Sequence,Minimum,Min_(with_gaps‌),Maximum,Max_(with_gaps),Length,Length_(with_gaps),#_Intervals,Direction,Average‌_Quality,Coverage,modified_by,Polymorphism_Type,Strand-Bias,Strand-Bias_>50%_P-va‌lue,Strand-Bias_>65%_P-value,Variant_Frequency,Variant_Nucleotide(s),Variant_P-Va‌lue_(approximate),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Chr2_FT,Chr2,Chr2.bed,CDS,10000_ARHGAP15,GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCA‌ATAACAAGTGGGCACTGAGAGAAAG,55916421,56019336,55916483,56019399,63,64,1,forward,,,U‌ser,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

You can use a combination of colClasses with "NULL" entries to "blank-out" the commas (also still needing , fill=TRUE:
read.table(text="1,2,3,4,5,6,7,8,,,,,,,,,,,,,,,,,,
9,9,9,9,9,9,9,9,,,,,,,,,,,,,,,,,", sep=",", fill=TRUE, colClasses=c(rep("numeric", 8), rep("NULL", 30)) )
#------------------
V1 V2 V3 V4 V5 V6 V7 V8
1 1 2 3 4 5 6 7 8
2 9 9 9 9 9 9 9 9
Warning message:
In read.table(text = "1,2,3,4,5,6,7,8,,,,,,,,,,,,,,,,,,\n9,9,9,9,9,9,9,9,,,,,,,,,,,,,,,,,", :
cols = 26 != length(data) = 38
I needed to add back in the missing linefeed at the end of the first line. (Yet another reason why you should edit questions rather than putting data examples in the comments.) There was an octothorpe in the header which required the comment.char be set to "":
read.table(text="Document_Name,Sequence_Name,Track_Name,Type,Name,Sequence,Minimum,Min_(with_gaps‌),Maximum,Max_(with_gaps),Length,Length_(with_gaps),#_Intervals,Direction,Average‌_Quality,Coverage,modified_by,Polymorphism_Type,Strand-Bias,Strand-Bias_>50%_P-va‌lue,Strand-Bias_>65%_P-value,Variant_Frequency,Variant_Nucleotide(s),Variant_P-Va‌lue_(approximate),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,\nChr2_FT,Chr2,Chr2.bed,CDS,10000_ARHGAP15,GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCA‌ATAACAAGTGGGCACTGAGAGAAAG,55916421,56019336,55916483,56019399,63,64,1,forward,,,U‌ser,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", header=TRUE, colClasses=c(rep("character", 24), rep("NULL", 41)), comment.char="", sep=",")
Document_Name Sequence_Name Track_Name Type Name
1 Chr2_FT Chr2 Chr2.bed CDS 10000_ARHGAP15
Sequence Minimum Min_.with_gaps... Maximum
1 GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCA‌ATAACAAGTGGGCACTGAGAGAAAG 55916421 56019336 55916483
Max_.with_gaps. Length Length_.with_gaps. X._Intervals Direction Average.._Quality Coverage modified_by
1 56019399 63 64 1 forward U‌ser
Polymorphism_Type Strand.Bias Strand.Bias_.50._P.va..lue Strand.Bias_.65._P.value Variant_Frequency
1
Variant_Nucleotide.s. Variant_P.Va..lue_.approximate.
1
If you know what your colClasses will be, then you can get missing values to be NA in the numeric columns automatically. You could also use the na.strings setting to accomplish this. You could also do some editing on the header to take out the illegal characters in the column names. (I didn't think I needed to be the one to do that though.)
read.table(text="Document_Name,Sequence_Name,Track_Name,Type,Name,Sequence,Minimum,Min_(with_gaps‌),Maximum,Max_(with_gaps),Length,Length_(with_gaps),#_Intervals,Direction,Average‌_Quality,Coverage,modified_by,Polymorphism_Type,Strand-Bias,Strand-Bias_>50%_P-va‌lue,Strand-Bias_>65%_P-value,Variant_Frequency,Variant_Nucleotide(s),Variant_P-Va‌lue_(approximate),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Chr2_FT,Chr2,Chr2.bed,CDS,10000_ARHGAP15,GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCA‌ATAACAAGTGGGCACTGAGAGAAAG,55916421,56019336,55916483,56019399,63,64,1,forward,,,U‌ser,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", header=TRUE, colClasses=c(rep("character", 24), rep("NULL", 41)), comment.char="", sep=",", na.strings="")
#------------------------------------------------------
Document_Name Sequence_Name Track_Name Type Name
1 Chr2_FT Chr2 Chr2.bed CDS 10000_ARHGAP15
Sequence Minimum Min_.with_gaps... Maximum
1 GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCA‌ATAACAAGTGGGCACTGAGAGAAAG 55916421 56019336 55916483
Max_.with_gaps. Length Length_.with_gaps. X._Intervals Direction Average.._Quality Coverage modified_by
1 56019399 63 64 1 forward <NA> <NA> U‌ser
Polymorphism_Type Strand.Bias Strand.Bias_.50._P.va..lue Strand.Bias_.65._P.value Variant_Frequency
1 <NA> <NA> <NA> <NA> <NA>
Variant_Nucleotide.s. Variant_P.Va..lue_.approximate.
1 <NA> <NA>

I have been fiddling with the first two lines of your file, and the problem appears to be the # in one of your column names. read.table treats # as a comment character by default, so it reads in your header, ignores everything after # and returns 13 columns.
You will be able to read in your file with read.table using the argument comment.char="".
Incidentally, this is yet another reason why those who ask questions should include examples of the files/datasets they are working with.

Use the string of characters from a cell in a dataframe to create a vector

>titletool<-read.csv("TotalCSVData.csv",header=FALSE,sep=",")
> class(titletool)
[1] "data.frame"
>titletool[1,1]
[1] Experiment name : CONTROL DB AD_1
>t<-titletool[1,1]
>t
[1] Experiment name : CONTROL DB AD_1
>class(t)
[1] "character"
now i want to create an object (vector) with the name "Experiment name : CONTROL DB AD_1" , or even better if possible CONTROL DB AD_1
Thank you

Use assign:
varname <- "Experiment name : CONTROL DB AD_1"
assign(varname, 3.14158)
get("Experiment name : CONTROL DB AD_1")
[1] 3.14158
And you can use a regular expression and sub or gsub to remove some text from a string:
cleanVarname <- sub("Experiment name : ", "", varname)
assign(cleanVarname, 42)
get("CONTROL DB AD_1")
[1] 42
But let me warn you this is an unusual thing to do.
Here be dragons.

If I understand correctly, you have a bunch of CSV files, each with multiple experiments in them, named in the pattern "Experiment ...". You now want to read each of these "experiments" into R in an efficient way.
Here's a not-so-pretty (but not-so-ugly either) function that might get you started in the right direction.
What the function basically does is read in the CSV, identify the line numbers where each new experiment starts, grabs the names of the experiments, then does a loop to fill in a list with the separate data frames. It doesn't really bother making "R-friendly" names though, and I've decided to leave the output in a list, because as Andrie pointed out, "R has great tools for working with lists."
read.funkyfile = function(funkyfile, expression, ...) {
temp = readLines(funkyfile)
temp.loc = grep(expression, temp)
temp.loc = c(temp.loc, length(temp)+1)
temp.nam = gsub("[[:punct:]]", "",
grep(expression, temp, value=TRUE))
temp.out = vector("list")
for (i in 1:length(temp.nam)) {
temp.out[[i]] = read.csv(textConnection(
temp[seq(from = temp.loc[i]+1,
to = temp.loc[i+1]-1)]),
...)
names(temp.out)[i] = temp.nam[i]
}
temp.out
}
Here is an example CSV file. Copy and paste it into a text editor and save it as "funkyfile1.csv" in the current working directory. (Or, read it in from Dropbox: http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv)
"Experiment Name: Here Be",,
1,2,3
4,5,6
7,8,9
"Experiment Name: The Dragons",,
10,11,12
13,14,15
16,17,18
Here is a second CSV. Again, copy-paste and save it as "funkyfile2.csv" in your current working directory. (Or, read it in from Dropbox: http://dl.dropbox.com/u/2556524/testing/funkyfile2.csv)
"Promises: I vow to",,
"H1","H2","H3"
19,20,21
22,23,24
25,26,27
"Promises: Slay the dragon",,
"H1","H2","H3"
28,29,30
31,32,33
34,35,36
Notice that funkyfile1 has no column names, while funkyfile2 does. That's what the ... argument in the function is for: to specify header=TRUE or header=FALSE. Also the "expression" identifying each new set of data is "Promises" in funkyfile2.
Now, use the function:
read.funkyfile("funkyfile1.csv", "Experiment", header=FALSE)
# read.funkyfile("http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv",
# "Experiment", header=FALSE) # Uncomment to load remotely
# $`Experiment Name Here Be`
# V1 V2 V3
# 1 1 2 3
# 2 4 5 6
# 3 7 8 9
#
# $`Experiment Name The Dragons`
# V1 V2 V3
# 1 10 11 12
# 2 13 14 15
# 3 16 17 18
read.funkyfile("funkyfile2.csv", "Promises", header=TRUE)
# read.funkyfile("http://dl.dropbox.com/u/2556524/testing/funkyfile2.csv",
# "Experiment", header=TRUE) # Uncomment to load remotely
# $`Promises I vow to`
# H1 H2 H3
# 1 19 20 21
# 2 22 23 24
# 3 25 26 27
#
# $`Promises Slay the dragon`
# H1 H2 H3
# 1 28 29 30
# 2 31 32 33
# 3 34 35 36
Go get those dragons.
Update
If your data are all in the same format, you can use the lapply solution mentioned by Andrie along with this function. Just make a list of the CSVs that you want to load, as below. Note that the files all need to use the same "expression" and other arguments the way the function is currently written....
temp = list("http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv",
"http://dl.dropbox.com/u/2556524/testing/funkyfile3.csv")
lapply(temp, read.funkyfile, "Experiment", header=FALSE)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Extracting gene name and ID number from a vector [duplicate] - r

Related

How to manipulate digits in a character string in R?

R Extract partially matching string

4 lines records into one line? How to combine in a single line?

read.table and files with excess commas

Use the string of characters from a cell in a dataframe to create a vector

Categories

Resources