I have a text file. It contains lots of text in the following format:
text
text
Date in format of 12 December 2016
text
text
How do I extract only the date in such a case, given that there is no other date in the text section of the file? I need an R program for it.
This would do the trick. You would get the dates parsed, while the rest would become NA values, which you can filter out.
text=c('a','b','12 December 2016','10 December 2015')
strptime(text,format='%d %B %Y')
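A minimal sketch of the same idea using as.Date instead of strptime, keeping only the lines that parse. The vector lines is a hypothetical stand-in for readLines("yourFile.txt"), and %B assumes English month names in the current locale:

```r
# Hypothetical sample standing in for readLines("yourFile.txt")
lines <- c("text", "text", "12 December 2016", "text", "text")

# Lines that do not match "day full-month-name year" become NA
# (%B assumes English month names in the current locale)
parsed <- as.Date(lines, format = "%d %B %Y")

# Drop the NA entries, leaving only the date(s)
dates <- parsed[!is.na(parsed)]
print(dates)
```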
I've called your data set demo_set for practical purposes.
You start by reading in your data set:
demo_set <- readLines("yourFile.txt") # read in file
You can use other ways of reading in your data set.
Then you use regex to find lines with month names.
demo_set[grep(pattern = paste(month.name,collapse = "|"),demo_set)]
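If the date sits inside a longer line, a sketch along these lines pulls out just the date text before parsing it. The sample line is hypothetical, and the pattern assumes the day, the full month name and the four-digit year are separated by single spaces:

```r
# Hypothetical sample lines; only the second one contains a date
lines <- c("text", "Meeting held on 12 December 2016 in hall B", "text")

# 1-2 digit day, a full month name from month.name, and a 4-digit year
pattern <- paste0("[0-9]{1,2} (", paste(month.name, collapse = "|"), ") [0-9]{4}")

# regexpr finds the first match per line; regmatches keeps only the matched text
hits <- regmatches(lines, regexpr(pattern, lines))

# Convert the extracted text to a Date (%B assumes English month names)
dates <- as.Date(hits, format = "%d %B %Y")
print(dates)
```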
If the other lines of your text don't start with a number, you can keep only the lines that do with the code below:
abc<- subset(abc, grepl("^[0-9]",name))
where abc is your data frame and name is the column in your data frame that holds the text.
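For example, with a hypothetical one-column data frame (stringsAsFactors = FALSE is set so the code behaves the same on older R versions), the filter can be followed by the date conversion:

```r
# Hypothetical data frame standing in for your own abc
abc <- data.frame(name = c("text", "12 December 2016", "more text"),
                  stringsAsFactors = FALSE)

# Keep only the rows whose text starts with a digit
abc <- subset(abc, grepl("^[0-9]", name))

# Parse the remaining text as a date (%B assumes English month names)
abc$date <- as.Date(abc$name, format = "%d %B %Y")
print(abc)
```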
You can also filter out the missing values in a column such as date and print the rest to the screen like so:
print(data$date[!is.na(data$date)])
This will print all the records where there is a value in date, but if you would rather see just a sample, use:
print(head(data$date[!is.na(data$date)], 10))
Related
I'm trying to format dates given to me in an Excel spreadsheet, and some of the values are seen as text. I've tried using the format option to set a date and time format for the entire column.
Try
={""; ARRAYFORMULA(iferror(IF(A2:A="",,DATE(YEAR(E2:E), MONTH(E2:E), 1)),date("20" & REGEXEXTRACT(E2:E,".*\/(\d{2}) "),REGEXEXTRACT(E2:E,"\d+"),1)))}
Reference
REGEXEXTRACT
I use the code below and it's working fine. I don't want to change the type of the temp-table field (dActiveDate), but please help me change the date format.
Note: the date format can be changed by the user. It can be YY/MM/DD or DD/MM/YYYY or MM/DD/YY and so on...
DEFINE TEMP-TABLE tt_data NO-UNDO
FIELD cName AS CHARACTER
FIELD dActiveDate AS DATE.
CREATE tt_data.
ASSIGN
tt_data.cName = "David"
dActiveDate = TODAY
.
OUTPUT TO value("C:\Users\ast\Documents\QRF\data.csv").
PUT UNFORMATTED "Name,Activedate" SKIP.
FOR EACH tt_data NO-LOCK:
EXPORT DELIMITER "," tt_data. /* There are more than 15 fields, so using EXPORT DELIMITER keeps the code short */
END.
OUTPUT CLOSE.
As this is a "part two" of the question "How to change date format based on variable initial value?", why not build on the answer there?
Wrap the dateformat part in a function/procedure/method and call it in the EXPORT statement. The only change required will be to specify each field rather than just the temp-table.
EXPORT DELIMITER ","
tt_data.cName
dateformat(tt_data.dactivedate, cDateFormat).
This assumes that there's a function called dateformat that takes the date and format and returns a string with the formatted date (as in the previous question).
"and so on..." needs to be specified. Depending on the specification, you may have to resort to a custom function like the one in Jensd's answer.
If you can constrain the allowed formats, you can use the built-in session handling:
session:date-format = "ymd" / "mdy" / "dmy".
session:year-offset = 1 / 1950. // for four vs two digit year
These two attributes can be populated in a similar fashion as in the other question.
You may need to reset these session attributes to their initial state in a finally block.
I have an Excel file with a date column; some of the dates are aligned to the right while others are aligned to the left. When I read the file into R, I get this error:
Expecting date in A3547 / R3547C1: got '13/04/2018'
on the dates aligned to the left. I have tried to clean the dates in Excel with no success.
You can do something similar to this: I extracted the day, month and year from the text date using some formulas, and then recreated the actual date from the extracted day, month and year values.
The formulas are as follows:
B2: =FIND("/",A2)
C2: =FIND("/",A2,1+FIND("/",A2))
D2: =LEFT(A2,B2-1)
E2: =MID(A2,B2+1,C2-B2-1)
F2: =RIGHT(A2,4)
G2: =DATE(F2,E2,D2)
Depending upon your actual data you may need to amend the formulas a little bit.
Please check whether all the cells in the column have a date format, and the same date format.
Sometimes dates are entered as text strings or in other custom formats, which can lead to an error like this.
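If the column is read into R as text, both kinds of value can also be converted in R. This is a sketch under two assumptions: the column has been read as character (for example with col_types = "text" in readxl::read_excel), and the cells Excel stored as real dates arrive as serial numbers such as "43203", while the left-aligned text cells look like "13/04/2018". The helper parse_mixed_dates is hypothetical:

```r
# Hypothetical helper: convert a character vector that mixes Excel serial
# numbers (e.g. "43203") and "dd/mm/yyyy" text (e.g. "13/04/2018") to Date.
parse_mixed_dates <- function(x) {
  # Start with an all-NA Date vector of the right length
  out <- as.Date(rep(NA_real_, length(x)), origin = "1970-01-01")
  # Values made only of digits (optionally with a fraction) are serial numbers
  is_serial <- grepl("^[0-9]+(\\.[0-9]+)?$", x)
  # Origin 1899-12-30 matches Excel's default 1900 date system
  # for dates after February 1900
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  # Everything else is treated as day/month/year text
  out[!is_serial] <- as.Date(x[!is_serial], format = "%d/%m/%Y")
  out
}

dates <- parse_mixed_dates(c("43203", "13/04/2018", "01/02/2018"))
print(dates)
```

Values that match neither form come back as NA, so they can be located with is.na() and inspected.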
This is how the data looks in Excel, with both dates and text:
Load the same data in the MS Excel Power Query Editor and select the date column. Under the Transform tab, select "Split Column, By Delimiter", select "/" as the delimiter and click OK. This separates the date into Month, Day and Year+Time.
Highlight the Day column first, then Month, then Year+Time (in that order), and then click "Merge Columns" under the Transform tab.
Highlight the merged column and, under "Data Type", select "Date/Time". You can now go back to the Home tab and select Close & Load to get the cleaned data into Excel, as below:
I have a folder with tons of txt files from which I have to extract specific data. The problem is that the file format changed once, and the position of the data I need to extract changed with it. So I need to deal with files in different formats.
To try to make it clearer: in column 4 I have the name of the variable and in column 5 I have the value, but sometimes this is in a different row. Is there a way to find the name of the variable (in whichever row it is) and then extract its value?
EDITING
In some files I will have the data like this:
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Current--------28
But at some point the software was changed to add another variable, and the new file looks like this:
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Error------------5.
Current--------28
So I need to deal with these 2 types of data, extracting the same variables which are in different rows.
If these files can't be read with read.table, use readLines and then find the lines that start with the keyword you need.
For example:
Sample file 1 (with the dashes included and extra line breaks):
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Error------------5.
Current--------28
Sample file2 (with a comma as separator):
Column 1,Column 2.
Device ID,A.
Current,555
Voltage, 500.
Error,5.
For both cases do:
text <- readLines("your filename here")
curr <- text[grepl("^Current", text, ignore.case = TRUE)]
Which returns:
for file 1:
[1] "Current--------28"
for file 2:
[1] "Current,555"
Then use gsub to remove anything that is not a number.
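A sketch of that last step as a small, hypothetical helper that looks a variable up by name, so it works whichever row the variable is on. The sample vectors stand in for readLines() output from the two file layouts above. Negative values are not handled, because keeping "-" would also keep the dashes used as padding in the first layout:

```r
# Hypothetical helper: numeric value on the first line starting with `key`
get_value <- function(lines, key) {
  # Find the first line that starts with the keyword, wherever it sits
  hit <- lines[grepl(paste0("^", key), lines, ignore.case = TRUE)][1]
  # Remove the keyword itself, then everything that is not a digit or a dot
  rest <- sub(paste0("^", key), "", hit, ignore.case = TRUE)
  as.numeric(gsub("[^0-9.]", "", rest))
}

# Stand-ins for readLines() on the two sample files
file1 <- c("Column 1-------Column 2.", "Device ID------A.", "Voltage------- 500.",
           "Error------------5.", "Current--------28")
file2 <- c("Column 1,Column 2.", "Device ID,A.", "Current,555",
           "Voltage, 500.", "Error,5.")

sapply(c("Voltage", "Current"), function(k) get_value(file1, k))
sapply(c("Voltage", "Current"), function(k) get_value(file2, k))
```

If a keyword is missing from a file (for example Error in the older layout), the helper returns NA instead of failing.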
I have a csv file and am unsure how to get R to interpret it as a table, because all the title info is in one cell and all the data relating to the titles is in a separate cell. So all the info I need is in 2 cells, but it actually needs to be split up.
Cell A3 has a value called 'Team', which corresponds to the part in cell A4 that says 'Visitor'. Each part after that corresponds to the bit below it. Sorry, I don't know how to describe it, but ultimately it would look like this …
It looks like the field separator in your data is a semicolon (;).
read.csv has a parameter sep to change the field separator and another parameter header to tell it there is an initial line containing the column names. Use read.csv like this:
data <- read.csv(file = "/mydir/myfile.csv", sep = ";", header = TRUE)
To test you can print out the first 5 lines of the data table with:
head(data,5)
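If the header is not on the first line (the question mentions it being in cell A3), the skip argument of read.csv can drop the preceding lines. A sketch with hypothetical sample data passed through the text argument instead of a file:

```r
# Hypothetical sample: two preamble lines, then the header row (A3), then data
raw <- c("Match report;;",
         ";;",
         "Team;Goals;Shots",
         "Visitor;2;11",
         "Home;1;9")

# skip = 2 ignores the two preamble lines; sep = ";" splits the fields
data <- read.csv(text = raw, sep = ";", header = TRUE, skip = 2)
str(data)
```

For a real file, the same skip value goes alongside file = "/mydir/myfile.csv".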