R: write dataframe column to csv keeping leading zeroes

I have a table that stores prefixes of different lengths.
A snippet of the table (ClusterTable):
ClusterTable[ClusterTable$FeatureIndex == "Prefix2", c('FeatureIndex', 'FeatureValue')]
FeatureIndex FeatureValue
80 Prefix2 80
81 Prefix2 81
30 Prefix2 30
70 Prefix2 70
51 Prefix2 51
84 Prefix2 84
01 Prefix2 01
63 Prefix2 63
28 Prefix2 28
26 Prefix2 26
65 Prefix2 65
75 Prefix2 75
and I write to csv file using following:
write.csv(ClusterTable, file = "My_Clusters.csv")
The FeatureValue 01 loses its leading zero.
I tried first converting the column to characters
ClusterTable$FeatureValue <- as.character(ClusterTable$FeatureValue)
and also tried to append it to an empty string to convert it to string before writing to file.
ClusterTable$FeatureValue <- paste("",ClusterTable$FeatureValue)
Also, I have prefixes of various lengths in this table, so I can't use a simple format specifier of a fixed length, i.e. the table also has the values 001 (Prefix3), 0001 (Prefix4), etc.
Thanks

EDIT: As of testing again on 8/5/2021, this doesn't work anymore. :(
I know this is an old question, but I happened upon a solution for keeping the lead zeroes when opening .csv output in excel. Before writing your .csv in R, add an apostrophe at the front of each value like so:
vector <- sapply(vector, function(x) paste0("'", x))
When you open the output in excel, the apostrophe will tell excel to keep all the characters and not drop lead zeroes. At this point you can format the column as "text" and then do a find and replace to remove the apostrophes (maybe make a macro for this).
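As a minimal, self-contained sketch of that trick (using a hypothetical vector of padded codes in place of the real column):

```r
# Hypothetical character vector of zero-padded codes
vals <- c("01", "63", "28")

# Prepend an apostrophe so Excel keeps every character, including lead zeroes
vals_excel <- unname(sapply(vals, function(x) paste0("'", x)))

vals_excel
# "'01" "'63" "'28"
```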

If you just need it for the visual, you only need to add one line before you write the csv file, as such:
ClusterTable <- read.table(text=" FeatureIndex FeatureValue
80 Prefix2 80
81 Prefix2 81
30 Prefix2 30
70 Prefix2 70
51 Prefix2 51
84 Prefix2 84
01 Prefix2 01
63 Prefix2 63
28 Prefix2 28
26 Prefix2 26
65 Prefix2 65
75 Prefix2 75",
colClasses=c("character","character"))
ClusterTable$FeatureValue <- paste0(ClusterTable$FeatureValue,"\t")
write.csv(ClusterTable,file="My_Clusters.csv")
It adds a character to the end of the value, but it's hidden in Excel.

Save the file as a csv file, but with a txt extension. Then read it using read.table with sep=",":
write.csv(ClusterTable,file="My_Clusters.txt")
read.table(file = "My_Clusters.txt", sep = ",")

If you're trying to open the .csv with Excel, I recommend writing to excel instead. First you'll have to pad the data though.
library(openxlsx)
library(dplyr)
library(stringr)  # str_pad() comes from stringr, not dplyr
ClusterTable <- ClusterTable %>%
  mutate(FeatureValue = as.character(FeatureValue),
         FeatureValue = str_pad(FeatureValue, 2, 'left', '0'))
write.xlsx(ClusterTable, "Filename.xlsx")
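Since the question mentions prefixes of several lengths (Prefix2, Prefix3, ...), a fixed width of 2 won't cover every row. One possibility, assuming FeatureIndex always has the form "PrefixN", is to derive the pad width per row from that label. A sketch with hypothetical data:

```r
# Hypothetical rows with prefixes of different lengths
ClusterTable <- data.frame(FeatureIndex = c("Prefix2", "Prefix3", "Prefix4"),
                           FeatureValue = c(1L, 1L, 1L))

# The N in "PrefixN" gives the desired width for each row
width <- as.integer(sub("Prefix", "", ClusterTable$FeatureIndex))

# sprintf's "*" takes the field width from an argument, recycled per row
ClusterTable$FeatureValue <- sprintf("%0*d", width, ClusterTable$FeatureValue)

ClusterTable$FeatureValue
# "01" "001" "0001"
```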

This is pretty much the route you can take when exporting from R. It depends on the type of data and the number of records (size of data) you are exporting:
- If you have many rows (thousands), txt is the best route. You can export to csv only if you know you don't have leading or trailing zeros in the data; otherwise use txt or xlsx format, since exporting to csv will most likely remove the zeros.
- If you don't deal with many rows, then xlsx libraries are better.
- xlsx libraries may depend on Java, so make sure you use a library that does not require it.
- xlsx libraries are either problematic or slow when dealing with many rows, so txt or csv can still be a better route.
For your specific problem, it seems you don't deal with a large number of rows, so you can use:
library(openxlsx)
# read data from an Excel file or Workbook object into a data.frame
df <- read.xlsx('name-of-your-excel-file.xlsx')
# for writing a data.frame or list of data.frames to an xlsx file
write.xlsx(df, 'name-of-your-excel-file.xlsx')

You have to modify your column using format():
format(your_data$your_column, trim = F)
So when you export to .csv, the leading zeros will be kept.

When dealing with leading zeros you need to be cautious if exporting to Excel. Excel has a tendency to outsmart itself and automatically trim leading zeros. Your code is fine otherwise, and opening the file in any other text editor should show the zeros.

Related

Recognizing and importing multiple tables from a single Excel file in R

I tried to read all posts like this but I did not succeed.
I need to extract tables of different layouts from a single sheet in excel, for each sheet of the file.
Any help or ideas that can be provided would be greatly appreciated.
A sample of the data file and its structure can be found here.
I would use readxl. The code below reads just one sheet, but it is easy enough to adapt to read multiple or different sheets.
First we just want to read the sheet. Obviously you should change the path to reflect where you saved your file:
library(readxl)
sheet = read_excel("~/Downloads/try.xlsx", col_names = LETTERS[1:12])
If you didn't know you had 12 columns, then using read_excel without specifying the column names would give you enough information to find that out. The different tables in the sheet are separated by one or two blank rows. You can find the blank rows by testing each row to see if all of the cells in that row are NA using the apply function.
blanks = which(apply(sheet, 1, function(row)all(is.na(row))))
blanks
[1] 7 8 17 26 35 41 50 59 65 74 80 86 95 98
So you could extract the first table by taking rows 1--6 (7 - 1), the second table by taking rows 9--16 and so on.
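The extraction can be sketched by numbering the gaps with cumsum and splitting on that; here a toy two-table data frame stands in for the real sheet:

```r
# Toy stand-in for the sheet: two tables separated by an all-NA (blank) row
sheet <- data.frame(A = c(1, 2, NA, 3, 4),
                    B = c("a", "b", NA, "c", "d"))

# A row is blank when every cell in it is NA
blank <- apply(sheet, 1, function(row) all(is.na(row)))

# Each blank row bumps the group id, so rows between blanks share a group
group <- cumsum(blank)

# Drop the blank rows and split what remains into one data frame per table
tables <- split(sheet[!blank, ], group[!blank])

length(tables)
# 2
```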

Variant locations sometimes replaced by ID in subsetted large VCF file?

I have a large VCF file from which I want to extract certain columns and information, and have this matched to the variant location. I thought I had this working, but for some variants, instead of the corresponding variant location, I am given the ID.
My code looks like this:
# see what fields are in this vcf file
scanVcfHeader("file.vcf")
# define parameters on how to filter the vcf file
AN.adj.param <- ScanVcfParam(info="AN_Adj")
# load ALL allele counts (AN) from vcf file
raw.AN.adj. <- readVcf("file.vcf", "hg19", param=AN.adj.param)
# extract ALL allele counts (AN) and corresponding chr location with allele tags from vcf file - in dataframe/s4 class
sclass.AN.adj <- (info(raw.AN.adj.))
The result looks like this:
AN_adj
1:13475_A/T 91
1:14321_G/A 73
rs12345 87
1:15372_A/G 60
1:16174_G/A 41
1:16174_T/C 62
1:16576_G/A 87
rs987654 56
I would like the result to look like this:
AN_adj
1:13475_A/T 91
1:14321_G/A 73
1:14873_C/T 87
1:15372_A/G 60
1:16174_G/A 41
1:16174_T/C 62
1:16576_G/A 87
1:18654_A/T 56
Any ideas on what is going on here and how to fix it?
I would also be happy if there was a way to append the variant location using the CHROM and position fields but from my research data from these fields cannot be requested as they are essential fields used to create the GRanges of variant locations.

Collecting data in one row from different csv files by the name

It's hard to explain what exactly I want to achieve with my script but let me try.
I have 20 different csv files, so I loaded them into R:
tbl = list.files(pattern="*.csv")
list_of_data = lapply(tbl, read.csv)
then with your help I combined them into one and removed all of the duplicates:
data_rd <- subset(transform(all_data, X = sub("\\..*", "", X)),
!duplicated(X))
I have now 1 master table which includes all of the names (Accession):
Accession
AT1G19570
AT5G38480
AT1G07370
AT4G23670
AT5G10450
AT4G09000
AT1G22300
AT1G16080
AT1G78300
AT2G29570
Now I would like to find each accession in the other csv files and put the data for this accession in the same row. There are about 20 csv files, and each csv has about 20 columns, so in some cases it might give me 400 columns. It doesn't matter how long it takes. It has to be done. Is it even possible to do in R?
Example:
First csv Second csv Third csv
Accession Size Lenght Weight Size Lenght Weight Size Lenght Weight
AT1G19570 12 23 43 22 77 666 656 565 33
AT5G38480
AT1G07370 33 22 33 34 22
AT4G23670
AT5G10450
AT4G09000 12 45 32
AT1G22300
AT1G16080
AT1G78300 44 22 222
AT2G29570
It looks like a hard task to do. Probably it has to be done with a loop. Any ideas?
This is a merge loop. Here is rough R code that will inefficiently grow with each merge.
Begin as before:
tbl = list.files(pattern="*.csv")
list_of_data = lapply(tbl, read.csv)
tbl = list_of_data[[1]]
for(i in 2:length(list_of_data))
{
  tbl = merge(tbl, list_of_data[[i]], by="Accession", all=T)
}
The matching column names (not used as a key) will be renamed with a suffix (.x,.y, and so on), the all=T argument will ensure that whenever a new Accession key is merged a new row will be made and the missing cells will be filled with NA.
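The same fold can be written more compactly with Reduce; a sketch with two small hypothetical tables in place of the real files:

```r
# Two hypothetical per-file tables keyed by Accession
df1 <- data.frame(Accession = c("AT1G19570", "AT1G07370"), Size = c(12, 33))
df2 <- data.frame(Accession = c("AT1G19570", "AT4G09000"), Weight = c(43, 32))
list_of_data <- list(df1, df2)

# Fold merge() over the list; all = TRUE keeps accessions missing from some files
tbl <- Reduce(function(x, y) merge(x, y, by = "Accession", all = TRUE),
              list_of_data)

nrow(tbl)
# 3
```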

How can I read an MTL file in R

I am very new to R programming. Could someone kindly tell me how I can read the MTL file that is archived with Landsat satellite data?
For a standard MTL file provided with Landsat scenes obtained from the EarthExplorer or Glovis services, you could simply do:
mtl <- read.delim('L71181068_06820100518_MTL.txt', sep = '=', stringsAsFactors = F)
So, for something starting like this:
GROUP = L1_METADATA_FILE GROUP = METADATA_FILE_INFO...
You may use this:
> mtl[grep("LMAX",mtl$GROUP),]
GROUP L1_METADATA_FILE
64 LMAX_BAND1 293.700
66 LMAX_BAND2 300.900
68 LMAX_BAND3 234.400
70 LMAX_BAND4 241.100
72 LMAX_BAND5 47.570
74 LMAX_BAND61 17.040
76 LMAX_BAND62 12.650
78 LMAX_BAND7 16.540
80 LMAX_BAND8 243.100
84 QCALMAX_BAND1 255.0
86 QCALMAX_BAND2 255.0
88 QCALMAX_BAND3 255.0
90 QCALMAX_BAND4 255.0
92 QCALMAX_BAND5 255.0
94 QCALMAX_BAND61 255.0
96 QCALMAX_BAND62 255.0
98 QCALMAX_BAND7 255.0
100 QCALMAX_BAND8 255.0
There are dictionaries provided by each service, found here and here.
Information from the MTL may be critical for applying atmospheric and radiometric corrections. By the way, the landsat package allows you to run some of the more typical corrections using the DOS() and radiocorr() functions.
You will also need standard calibration values provided by Chander et al. (2009).
For more complex approaches this may be a good start.
The MTL file contains only metadata (I hope you knew that :-)) and is a plain text file, so you could just read it in and parse as desired. If you are reasonably familiar with Matlab, you could port this tool http://www.mathworks.com/matlabcentral/fileexchange/39073 , converting it into R code.
EDIT: I can't tell from your comments what you actually need. Here's an example MTL.txt file I pulled off the net:
http://landsat.usgs.gov/images/squares/processing_level_of_the_Landsat_scene_I_have_downloaded1.jpg
If you look at it, you can see the names and values of the data items. If those are what you want, perhaps the easiest way to get them would be to run the command
mtl.values <- read.table('filename.txt' , sep='=')
Which will give you a 2-column dataframe, with names in first column and values in the second.
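A self-contained sketch of that parse, with a small inline fragment standing in for a real MTL file:

```r
# Inline stand-in for a few KEY = VALUE lines of an MTL file
mtl_text <- "LMAX_BAND1 = 293.700
LMAX_BAND2 = 300.900
QCALMAX_BAND1 = 255.0"

# Two '='-separated columns; strip.white trims the spaces around the '=',
# and colClasses keeps values like 293.700 as strings rather than numbers
mtl.values <- read.table(text = mtl_text, sep = "=",
                         colClasses = "character", strip.white = TRUE)

# Look up a value by its key
mtl.values$V2[mtl.values$V1 == "LMAX_BAND1"]
# "293.700"
```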
For reading the MTL file along with your images (stacked image) you can do the following, using readMeta() and stackMeta() from the RStoolbox package:
library(RStoolbox)
Give the directory of your MTL file, for example:
mtlFile <- "\\LE07_L1TP_165035_20090803_20161220_01_T1_MTL.txt"
Read the metadata:
metaData <- readMeta(mtlFile)
metaData
Load rasters based on the metadata file:
lsat <- stackMeta(mtlFile, quantity = "all", category = "image",
                  allResolutions = FALSE)
lsat
plot(lsat)

read.csv appends/modifies column headings with date values

I'm trying to read a csv file into R that has date values in some of the column headings.
As an example, the data file looks something like this:
ID Type 1/1/2001 2/1/2001 3/1/2001 4/1/2011
A Supply 25 35 45 55
B Demand 26 35 41 22
C Supply 25 35 44 85
D Supply 24 39 45 75
D Demand 26 35 41 22
...and my read.csv logic looks like this
dat10 <- read.csv("c:/data.csv", header=TRUE, sep=",", as.is=TRUE)
The read.csv works fine except it modifies the names of the columns with dates as follows:
X1.1.2001 X2.1.2001 X3.1.2001 X4.1.2011
Is there a way to prevent this, or a easy way to correct afterwards?
Set check.names=FALSE. But be aware that 1/1/2001 et al are syntactically invalid names, therefore they may cause you some headaches.
You can always change the column names using the colnames function. For example,
colnames(dat10) = gsub("\\.", "/", colnames(dat10))
However, having slashes in your column names isn't a particularly good idea. You can always change them just before you print out the table or when you create a graph.
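A quick sketch of the check.names = FALSE route, with inline text standing in for the file:

```r
# Inline stand-in for the csv file, with date-valued column headings
csv_text <- "ID,Type,1/1/2001,2/1/2001
A,Supply,25,35
B,Demand,26,35"

# check.names = FALSE keeps the headings verbatim instead of mangling them
dat10 <- read.csv(text = csv_text, check.names = FALSE)

names(dat10)
# "ID" "Type" "1/1/2001" "2/1/2001"

# Syntactically invalid names need backticks when used directly
dat10$`1/1/2001`
# 25 26
```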
