How can I read an MTL file in R - r

I am very new to R programming. Could someone kindly tell me how to read the MTL file that is archived with Landsat satellite data?

For a standard MTL file provided with Landsat scenes obtained from the EarthExplorer or GloVis services, you could simply do:
mtl <- read.delim('L71181068_06820100518_MTL.txt', sep = '=', stringsAsFactors = F)
So, for something starting like this:
GROUP = L1_METADATA_FILE
  GROUP = METADATA_FILE_INFO
  ...
You may use this:
> mtl[grep("LMAX",mtl$GROUP),]
GROUP L1_METADATA_FILE
64 LMAX_BAND1 293.700
66 LMAX_BAND2 300.900
68 LMAX_BAND3 234.400
70 LMAX_BAND4 241.100
72 LMAX_BAND5 47.570
74 LMAX_BAND61 17.040
76 LMAX_BAND62 12.650
78 LMAX_BAND7 16.540
80 LMAX_BAND8 243.100
84 QCALMAX_BAND1 255.0
86 QCALMAX_BAND2 255.0
88 QCALMAX_BAND3 255.0
90 QCALMAX_BAND4 255.0
92 QCALMAX_BAND5 255.0
94 QCALMAX_BAND61 255.0
96 QCALMAX_BAND62 255.0
98 QCALMAX_BAND7 255.0
100 QCALMAX_BAND8 255.0
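Each value is returned as text, so to use one in a calculation you can convert it on the fly. A minimal follow-up sketch, using the mtl data frame built by the read.delim() call above (the parsed values sit in the second column):
# pull a single calibration constant out of the parsed metadata as a number
lmax_b1 <- as.numeric(mtl[grep("LMAX_BAND1", mtl$GROUP), 2])
lmax_b1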
There are dictionaries provided by each service, found here and here.
Information from the MTL file may be critical for applying atmospheric and radiometric corrections. By the way, the landsat package lets you run some of the more typical corrections using its DOS() and radiocorr() functions.
You will also need standard calibration values provided by Chander et al. (2009).
For more complex approaches this may be a good start.

The MTL file contains only metadata (I hope you knew that :-)) and is a plain text file, so you could just read it in and parse it as desired. If you are reasonably familiar with Matlab, you could port this tool http://www.mathworks.com/matlabcentral/fileexchange/39073 into R code.
EDIT: I can't tell from your comments what you actually need. Here's an example MTL.txt file I pulled off the net:
http://landsat.usgs.gov/images/squares/processing_level_of_the_Landsat_scene_I_have_downloaded1.jpg
If you look at it, you can see the names and values of the data items. If those are what you want, perhaps the easiest way to get them would be to run the command
mtl.values <- read.table('filename.txt', sep = '=', fill = TRUE)  # fill = TRUE tolerates the trailing END line
This will give you a two-column data frame, with the parameter names in the first column and their values in the second.
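As a small follow-up sketch building on that data frame (the SUN_ELEVATION key is just an example of a field that appears in Landsat MTL files), you can trim the whitespace around the names and look values up directly:
# named lookup: trimmed parameter names -> values as character
vals <- setNames(as.character(mtl.values[[2]]), trimws(mtl.values[[1]]))
as.numeric(vals["SUN_ELEVATION"])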

To read the MTL file together with your images (as a raster stack), you can use the readMeta() and stackMeta() functions from the RStoolbox package:
library(RStoolbox)
Point to your MTL file, for example:
mtlFile <- "\\LE07_L1TP_165035_20090803_20161220_01_T1_MTL.txt"
Read the metadata:
metaData <- readMeta(mtlFile)
metaData
Load rasters based on the metadata file
lsat <- stackMeta(mtlFile, quantity = "all", category = "image",
                  allResolutions = FALSE)
lsat
plot(lsat)
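As an optional follow-up (the band indices below assume the standard Landsat 7 band order, with band 3 = red, 2 = green, 1 = blue), you can check the stack visually with a true-colour composite:
library(raster)
plotRGB(lsat, r = 3, g = 2, b = 1, stretch = "lin")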

Related

R: recognizing and importing multiple tables from a single Excel file

I have tried to follow other posts like this, but I did not succeed.
I need to extract tables with different layouts from a single sheet in Excel, and to do this for each sheet of the file.
Any help or ideas that can be provided would be greatly appreciated.
A sample of the data file and its structure can be found Here
I would use readxl. The code below reads just one sheet, but it is easy enough to adapt to read multiple or different sheets.
First we just want to read the sheet. Obviously you should change the path to reflect where you saved your file:
library(readxl)
sheet = read_excel("~/Downloads/try.xlsx", col_names = LETTERS[1:12])
If you didn't know you had 12 columns, then using read_excel without specifying the column names would give you enough information to find that out. The different tables in the sheet are separated by one or two blank rows. You can find the blank rows by testing each row to see if all of the cells in that row are NA using the apply function.
blanks = which(apply(sheet, 1, function(row)all(is.na(row))))
blanks
[1] 7 8 17 26 35 41 50 59 65 74 80 86 95 98
So you could extract the first table by taking rows 1--6 (7 - 1), the second table by taking rows 9--16 and so on.
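If it helps, here is a sketch of that extraction step, assuming the sheet and blanks objects created above; it walks the gaps between blank rows and collects each non-empty block as its own table:
bounds <- c(0, blanks, nrow(sheet) + 1)     # boundaries on either side of every table
tables <- list()
for (i in seq_len(length(bounds) - 1)) {
  first <- bounds[i] + 1
  last  <- bounds[i + 1] - 1
  if (first <= last) {                      # skips runs of consecutive blank rows (e.g. 7 and 8)
    tables[[length(tables) + 1]] <- sheet[first:last, , drop = FALSE]
  }
}
length(tables)                              # number of tables found in the sheet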

Creating an LDA model using gensim from bag-of-words vectors

I want to create a topic model from data provided by JSTOR (e.g. https://www.jstor.org/dfr/about/sample-datasets). However, because of copyright, they do not allow full-text access. Instead, I can request a list of unigrams followed by their frequencies in the document (supplied in plain .txt), e.g.:
his 295
old 181
he 165
age 152
p 110
from 79
life 74
de 71
petrarch 58
book 51
courtier 47
This should be easy to convert to a bag-of-words vector. However, I have only found examples of gensim LDA models being built from full text. Would it be possible to pass it these vectors instead?
Yes. You only need to convert each (word, frequency) pair to (word_id, frequency) and pass a list of such tuples as the corpus to any gensim model. To map a word to an id, you can first count how many distinct words are in the whole corpus; if there are V words, each word can be represented as an integer between 1 and V.

Variant locations sometimes replaced by ID in subsetted large VCF file?

I have a large VCF file from which I want to extract certain columns and information, matched to the variant location. I thought I had this working, but for some variants I am given the ID instead of the corresponding variant location.
My code looks like this:
# load VariantAnnotation and see what fields are in this vcf file
library(VariantAnnotation)
scanVcfHeader("file.vcf")
# define parameters on how to filter the vcf file
AN.adj.param <- ScanVcfParam(info="AN_Adj")
# load ALL allele counts (AN) from vcf file
raw.AN.adj. <- readVcf("file.vcf", "hg19", param=AN.adj.param)
# extract ALL allele counts (AN) and corresponding chr location with allele tags from vcf file - in dataframe/s4 class
sclass.AN.adj <- (info(raw.AN.adj.))
The result looks like this:
AN_adj
1:13475_A/T 91
1:14321_G/A 73
rs12345 87
1:15372_A/G 60
1:16174_G/A 41
1:16174_T/C 62
1:16576_G/A 87
rs987654 56
I would like the result to look like this:
AN_adj
1:13475_A/T 91
1:14321_G/A 73
1:14873_C/T 87
1:15372_A/G 60
1:16174_G/A 41
1:16174_T/C 62
1:16576_G/A 87
1:18654_A/T 56
Any ideas on what is going on here and how to fix it?
I would also be happy if there were a way to append the variant location using the CHROM and POS fields, but from my research, data from these fields cannot be requested on their own, as they are essential fields used to create the GRanges of variant locations.
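One hedged sketch of a possible workaround (not a verified fix): the CHROM and POS values do end up in the GRanges returned by rowRanges(), so the row names can be rebuilt from there instead of relying on the defaults, which fall back to the ID field whenever an rs number is present:
rr <- rowRanges(raw.AN.adj.)
loc <- paste0(as.character(seqnames(rr)), ":", start(rr))  # "chr:pos" keys built from the GRanges
rownames(sclass.AN.adj) <- loc                             # REF/ALT could be appended similarly via ref() and alt() if needed
head(sclass.AN.adj)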

R readr package - file written out and read back in doesn't match source

I apologize in advance for the lack of reproducibility here. I am doing an analysis on a very large (for me) dataset from the CMS Open Payments database.
There are four files I downloaded from that website, read into R using readr, manipulated a bit to make them smaller (column removal), and then stuck together using rbind. I would like to write my pared-down file out to an external hard drive so I don't have to read in all the data and redo the paring each time I want to work on it. (Obviously, it's all scripted, but it takes about 45 minutes, so I'd like to avoid it if possible.)
So I wrote out the data and read it in, but now I am getting different results. Below is about as close as I can get to a good example. The data is named sa_all. There is a column in the table for the source. It can only take on two values: gen or res. It is a column that is actually added as part of the analysis, not one that comes in the data.
table(sa_all$src)
gen res
14837291 822559
So I save the sa_all dataframe into a CSV file.
write.csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv',
row.names = FALSE)
Then I open it:
sa_all2 <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
table(sa_all2$src)
g gen res
1 14837289 822559
I did receive the following parsing warnings.
Warning: 4 parsing failures.
row col expected actual
5454739 pmt_nature embedded null
7849361 src delimiter or quote 2
7849361 src embedded null
7849361 NA 28 columns 54 columns
Since I manually add the src column and it can only take on two values, I don't see how this could cause any parsing errors.
Has anyone had any similar problems using readr? Thank you.
Just to follow up on the comment:
write_csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv')
sa_all2a <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
Warning: 83 parsing failures.
row col expected actual
1535657 drug2 embedded null
1535657 NA 28 columns 25 columns
1535748 drug1 embedded null
1535748 year an integer No
1535748 NA 28 columns 27 columns
Even more parsing errors and it looks like some columns are getting shuffled entirely:
table(sa_all2a$src)
100000000278 Allergan Inc. gen GlaxoSmithKline, LLC.
1 1 14837267 1
No res
1 822559
There are columns for manufacturer names and it looks like those are leaking into the src column when I use the write_csv function.
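One hedged diagnostic step (readr's problems() helper is standard; the row numbers below come from the warnings above) is to pull up the parse failures and inspect the rows around them in both the re-read file and the in-memory original, to see where the shift starts:
problems(sa_all2a)            # full table of the 83 reported parse failures
sa_all2a[1535650:1535670, ]   # rows around the first reported failure
sa_all[1535650:1535670, ]     # the same rows in the original data frame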

R write dataframe column to csv having leading zeroes

I have a table that stores prefixes of different lengths.
Snippet of the table (ClusterTable):
ClusterTable[ClusterTable$FeatureIndex == "Prefix2", c('FeatureIndex', 'FeatureValue')]
FeatureIndex FeatureValue
80 Prefix2 80
81 Prefix2 81
30 Prefix2 30
70 Prefix2 70
51 Prefix2 51
84 Prefix2 84
01 Prefix2 01
63 Prefix2 63
28 Prefix2 28
26 Prefix2 26
65 Prefix2 65
75 Prefix2 75
and I write it to a csv file using the following:
write.csv(ClusterTable, file = "My_Clusters.csv")
The FeatureValue 01 loses its leading zero.
I first tried converting the column to character:
ClusterTable$FeatureValue <- as.character(ClusterTable$FeatureValue)
and I also tried appending it to an empty string to convert it to a string before writing to the file:
ClusterTable$FeatureValue <- paste("",ClusterTable$FeatureValue)
Also, I have prefixes of various lengths in this table, so I can't use a simple format specifier of fixed length; i.e., the table also has the values 001 (Prefix3), 0001 (Prefix4), etc.
Thanks
EDIT: As of testing again on 8/5/2021, this doesn't work anymore. :(
I know this is an old question, but I happened upon a solution for keeping the leading zeroes when opening .csv output in Excel. Before writing your .csv in R, add an apostrophe at the front of each value, like so:
vector <- sapply(vector, function(x) paste0("'", x))
When you open the output in Excel, the apostrophe tells Excel to keep all the characters and not drop leading zeroes. At this point you can format the column as "text" and then do a find-and-replace to remove the apostrophes (maybe make a macro for this).
If you just need it for the visual, you only need to add one line before you write the csv file, like so:
ClusterTable <- read.table(text=" FeatureIndex FeatureValue
80 Prefix2 80
81 Prefix2 81
30 Prefix2 30
70 Prefix2 70
51 Prefix2 51
84 Prefix2 84
01 Prefix2 01
63 Prefix2 63
28 Prefix2 28
26 Prefix2 26
65 Prefix2 65
75 Prefix2 75",
colClasses=c("character","character"))
ClusterTable$FeatureValue <- paste0(ClusterTable$FeatureValue,"\t")
write.csv(ClusterTable,file="My_Clusters.csv")
It adds a character to the end of the value, but it's hidden in Excel.
Save the file as a csv file, but with a txt extension, and then read it back using read.table with sep = ",":
write.csv(ClusterTable, file = "My_Clusters.txt")
read.table(file = "My_Clusters.txt", sep = ",", header = TRUE, colClasses = "character")  # colClasses keeps "01" as text
If you're trying to open the .csv with Excel, I recommend writing to an xlsx file instead. First you'll have to pad the data, though.
library(openxlsx)
library(dplyr)
library(stringr)  # provides str_pad() used below
ClusterTable <- ClusterTable %>%
mutate(FeatureValue = as.character(FeatureValue),
FeatureValue = str_pad(FeatureValue, 2, 'left', '0'))
write.xlsx(ClusterTable, "Filename.xlsx")
This is pretty much the route you can take when exporting from R. It depends on the type of data and number of records (size of data) you are exporting:
If you have many rows (thousands or more), txt is the best route. You can export to csv if you know you don't have leading or trailing zeros in the data; otherwise use the txt or xlsx format, since exporting to csv will most likely remove the zeros.
If you don't deal with many rows, then the xlsx libraries are better.
Some xlsx libraries depend on Java, so make sure you use a library that does not require it.
xlsx libraries can be problematic or slow when dealing with many rows, in which case txt or csv is still a better route.
For your specific problem, it seems you don't deal with a large number of rows, so you can use:
library(openxlsx)
# read data from an Excel file or Workbook object into a data.frame
df <- read.xlsx('name-of-your-excel-file.xlsx')
# for writing a data.frame or list of data.frames to an xlsx file
write.xlsx(df, 'name-of-your-excel-file.xlsx')
You have to modify your column using format():
format(your_data$your_column, trim = FALSE)
That way, when you export to .csv, the leading zeros will be kept.
When dealing with leading zeros, you need to be cautious if exporting to Excel. Excel has a tendency to outsmart itself and automatically trim leading zeros. Your code is fine otherwise, and opening the file in any other text editor should show the zeros.
