Reading a .txt file into a table in R - r

I want to read a .txt file in R that was written by this line:
cat(sprintf("%s\n",paste0(matematykData)),file=nazwapliku2,append = TRUE)
The line is inside a loop, so the file is written line by line. The variable matematykData is a one-dimensional vector that holds a single record, which is replaced by the next record on each iteration of the for loop. It looks like this:
[1] "1884"
The reading method I use in another R script is:
dane2=read.table(file=nazwapliku2,sep="\n",skipNul= FALSE)
From this I get something that looks like a string without any rows or columns:
2962 1847
2963 1866
2964 1906
2965 429
2966 450
2967 450
2968 1910
2969 1900
2970 1889
Here the first "column" is the line number. I want to convert this into a table so that I can reference any row simply with dane2[i], where "i" is the number of the row I am looking for. I'm not sure whether I should change the way the data is saved, change the way it is read, or just read it and then convert it.
I also have another variable that needs to be converted, and it is more complicated because it contains 3 records per row: full_name, date and place of birth, date and place of death. The method I use for saving it is the same:
cat(sprintf("%s\n%s\n%s\n",paste0(matematyk[1]),paste0(matematyk[2]),paste0(matematyk[3])),file=nazwapliku1,append = TRUE)

For the first case, read.table() returns a data frame with a single column, so row i is dane2[i, 1] (dane2[i] on its own selects column i, not row i).
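As a minimal sketch of the first case (the sample values are made up, and sep is left at its default here because each line holds a single value), indexing the result by row could look like this:

```r
# Recreate a small file the way the question's loop does (sample values are made up).
nazwapliku2 <- tempfile(fileext = ".txt")
for (matematykData in c("1884", "1847", "1866")) {
  cat(sprintf("%s\n", matematykData), file = nazwapliku2, append = TRUE)
}

# read.table() returns a data frame with one column, named V1 by default.
dane2 <- read.table(file = nazwapliku2, header = FALSE)
dane2[2, 1]   # value in row 2

# Alternative: readLines() gives a plain character vector, so wiersze[i] is line i.
wiersze <- readLines(nazwapliku2)
wiersze[2]
```

With readLines() the values stay character strings; as.numeric() converts them if numbers are needed.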
You have to distinguish between records and variables.
In the second case, as I understand it, you have several records, each with these 5 variables:
full_name, date of birth, place of birth, date of death, place of death
In that case you need to change the way you save your data. Put a separator such as \t between the variables, and use \n only at the end of the format string in the sprintf() call, so that it separates the individual records.
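A sketch of that change, assuming the three fields from the question's sprintf() call (the records and the column names full_name, birth and death are hypothetical and only there for illustration):

```r
# Hypothetical records; names, dates and places are made up for illustration.
rekordy <- list(
  c("Jan Kowalski", "1 Jan 1900, Warsaw", "2 Feb 1980, Krakow"),
  c("Anna Nowak", "3 Mar 1910, Lodz", "4 Apr 1990, Gdansk")
)
nazwapliku1 <- tempfile(fileext = ".txt")

for (matematyk in rekordy) {
  # \t separates the fields, \n ends the record.
  cat(sprintf("%s\t%s\t%s\n", matematyk[1], matematyk[2], matematyk[3]),
      file = nazwapliku1, append = TRUE)
}

# Read one record per row; quote = "" keeps quote characters in the text literal.
dane1 <- read.table(nazwapliku1, sep = "\t", header = FALSE, quote = "",
                    stringsAsFactors = FALSE,
                    col.names = c("full_name", "birth", "death"))
dane1[2, "full_name"]
```

Each row of dane1 is then one record, and dane1[i, ] returns all fields of record i.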

Related

How to make a line of code span multiple lines in R where a new-line isn't created? Picture of desired outcome included

I'm trying to make a comment spanning multiple lines, but want them all to be the same line.
As in, Line 1 starts a comment that goes on to cover more lines, but each new line doesn't need a new '#' and Line 2 doesn't start until the comment is done.
Picture shows what I want in lines 3 and 5
You can use strings!
"
Module 4 Lecture 3: 911 call times
You have 250 datapoints of the time it takes to answer and resolve a
911 call saved in a csv file called 911TaskTimes.csv
You want...
"
Here's more information
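One caveat, sketched here under the assumption that the script is run with Rscript or pasted into the console: a bare string at top level is an ordinary expression, so R prints its value. Wrapping it in invisible() keeps the text as one multi-line block without the printout:

```r
# The string is still evaluated, but invisible() suppresses the automatic printing.
invisible("
Module 4 Lecture 3: 911 call times
You have 250 datapoints of the time it takes to answer and resolve a
911 call saved in a csv file called 911TaskTimes.csv
")

x <- 1 + 1  # regular code continues after the comment-like block
```

The text must not contain an unescaped double quote, since that would end the string early.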

Loop through a comma separated text and fix value with variables or get a first column, second column etc. to define variables with a value

I want to loop through a comma-separated text file.
For example:
mytext <- "3,24,25,276,2,87678,20-07-2022,1,5"
From this mytext I would like to loop through it like below:
for (i in 1:length(mytext)) {
print(mytext[[i]])
}
I need it to display like this:
3
24
25
276
2
87678
20-07-2022
1
5
Actually I need to set every value as an individual variable, like :
variable1:3
variable2:24
variable3:25
variable4:276
variable5:2
variable6:87678
variable7:20-07-2022
variable8:1
variable9:5
(My project is to retrieve data from a text file and then run database validations in R before entering the records into the database.)
Could anyone help me out? Thanks in advance.
Split your string:
strsplit(mytext, ",")[[1]]
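A short sketch of how the split result could be used, assuming mytext is a single character string. Storing the values in a named list (the variable1, variable2, ... names follow the question) is one option that avoids creating nine separate objects:

```r
mytext <- "3,24,25,276,2,87678,20-07-2022,1,5"

# strsplit() returns a list with one element per input string; take the first.
values <- strsplit(mytext, ",", fixed = TRUE)[[1]]

# Print each value on its own line.
for (v in values) {
  print(v)
}

# Keep the values addressable by name: variable1, variable2, ...
named <- setNames(as.list(values), paste0("variable", seq_along(values)))
named$variable7
```

All values are character strings at this point; convert individual entries with as.numeric() or as.Date() as the validation requires.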

R - extract data which changes position from file to file (txt)

I have a folder with tons of txt files from which I have to extract specific data. The problem is that the format of the files changed at one point, and the position of the data I need to extract changed with it. So I need to deal with files in different formats.
To try to make it more clear, in column 4 I have the name of the variable and in 5 I have the value, but sometimes this is in a different row. Is there a way to find the name of the variable (in which row) and then extract its value?
Thanks in advance
EDIT
In some files I will have the data like this:
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Current--------28
But at some point the software was changed to add another variable, and the new files look like this:
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Error------------5.
Current--------28
So I need to deal with these 2 types of data, extracting the same variables which are in different rows.
If these files can't be read with read.table use readLines and then find those lines that start with the keyword you need.
For example:
Sample file 1 (with the dashes included and extra line breaks):
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Error------------5.
Current--------28
Sample file2 (with a comma as separator):
Column 1,Column 2.
Device ID,A.
Current,555
Voltage, 500.
Error,5.
For both cases do:
text = readLines("your filename here")
curr = text[grepl("^Current", text, ignore.case = TRUE)]
Which returns:
for file 1:
[1] "Current--------28"
for file 2:
[1] "Current,555"
Then use gsub to remove anything that is not a number.
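The last step can be sketched like this. The helper name extract_value is hypothetical, and the pattern assumes the values are non-negative whole numbers, since stripping every non-digit would also drop decimal points and minus signs:

```r
# Lines as readLines() would return them for the two sample files.
file1 <- c("Column 1-------Column 2.", "Device ID------A.",
           "Voltage------- 500.", "Error------------5.", "Current--------28")
file2 <- c("Column 1,Column 2.", "Device ID,A.", "Current,555",
           "Voltage, 500.", "Error,5.")

# Find the line that starts with the keyword, then keep only its digits.
extract_value <- function(lines, key) {
  hit <- lines[grepl(paste0("^", key), lines, ignore.case = TRUE)]
  as.numeric(gsub("[^0-9]", "", hit))
}

extract_value(file1, "Current")
extract_value(file2, "Current")
extract_value(file1, "Voltage")
```

Because the lookup is by keyword, the row position of the variable in each file no longer matters.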

Character values stored in DATAFRAME with Double Quotes while reading into R

I have a csv file with almost 4 million records and 30+ columns.
The columns are of varied types, including numeric, alphanumeric, date, character, etc.
Attempt 1:
When I first read the file in R using the read.csv function, only 2 million of the records were read.
This may have happened because of some special characters in the data.
Attempt 2:
I provided the argument quote = "" to read.csv and all the records were read successfully.
However this brings up 2 issues:
a. All the column names were prefixed with 'x.':
e.g. x.date, x.name
b. All the character columns were loaded into the data frame enclosed in double quotes ("").
Can someone please advise me how to resolve these 2 issues and get the data loaded into R successfully?
I work for a financial institution and the data is highly sensitive, so I cannot paste a screenshot here.
I also tried to recreate the scenario at home, but my efforts were to little or no avail.
The screenshot below is the closest I have come to the exact scenario:
DATAFRAME SCREENSHOT: Not exact copy

fread() error and strange behaviour when reading csv

I used fread() from the data.table library to try to read a 540MB csv file. It returned an error message saying:
' ends field 36 on line 4 when detecting types: 20.00,8/25/2006 0:00:00,"07:05:00 PM","CST",143.00,"OTTAWA","KS","HAIL",1.00,"S","MINNEAPOLIS",8/25/2006 0:00:00,"07:05:00 PM",0.00,,1.00,"S","MINNEAPOLIS",0.00,0.00,,88.00,0.00,0.00,0.00,,0.00,,"TOP","KANSAS, East",,3907.00,9743.00,3907.00,9743.00,"Dime to nickel sized hail.
I have no idea what caused the error and want to track down whether it's a bug or just a data formatting issue that I can tweak fread() to process.
I managed to read the csv using read.csv(), and decided to track down the row that triggered the error above (line 617174, not line 4 as in the error message above). I then wrote out that row, together with the rows immediately preceding and following it, using write.csv() as testout.csv
I was able to read back testout.csv using read.csv() creating a data frame with 3 observations, as expected. Using fread() on testout.csv, however, resulted in a data table with only 1 observation, which is the last row.
The four lines in testout.csv are below (I start a new line for each entry below for readability).
"STATE__","BGN_DATE","BGN_TIME","TIME_ZONE","COUNTY","COUNTYNAME","STATE","EVTYPE","BGN_RANGE","BGN_AZI","BGN_LOCATI","END_DATE","END_TIME","COUNTY_END","COUNTYENDN","END_RANGE","END_AZI","END_LOCATI","LENGTH","WIDTH","F","MAG","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP","WFO","STATEOFFIC","ZONENAMES","LATITUDE","LONGITUDE","LATITUDE_E","LONGITUDE_","REMARKS","REFNUM"
20,"8/25/2006 0:00:00","07:01:00 PM","CST",139,"OSAGE","KS","TSTM WIND",5,"WNW","OSAGE CITY","8/25/2006 0:00:00","07:01:00 PM",0,NA,5,"WNW","OSAGE CITY",0,0,NA,52,0,0,0,"",0,"","TOP","KANSAS, East","",3840,9554,3840,9554,".",617129
20,"8/25/2006 0:00:00","07:05:00 PM","CST",143,"OTTAWA","KS","HAIL",1,"S","MINNEAPOLIS","8/25/2006 0:00:00","07:05:00 PM",0,NA,1,"S","MINNEAPOLIS",0,0,NA,88,0,0,0,"",0,"","TOP","KANSAS, East","",3907,9743,3907,9743,"Dime to nickel sized hail.
.",617130
20,"8/25/2006 0:00:00","07:07:00 PM","CST",125,"MONTGOMERY","KS","TSTM WIND",3,"N","COFFEYVILLE","8/25/2006 0:00:00","07:07:00 PM",0,NA,3,"N","COFFEYVILLE",0,0,NA,61,0,0,0,"",0,"","ICT","KANSAS, Southeast","",3705,9538,3705,9538,"",617131
When I ran fread("testout.csv", sep=",", verbose=TRUE), the output was
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 1.05E-06B
File is opened and mapped ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Looking for supplied sep ',' on line 5 (the last non blank line in the first 'autostart') ... found ok
Found 37 columns
First row with 37 fields occurs on line 5 (either column names or first row of data)
Some fields on line 5 are not type character (or are empty). Treating as a data row and using default column names.
Count of eol after first data row: 2
Subtracted 1 for last eol and any trailing empty lines, leaving 1 data rows
Type codes: 1444144414444111441111111414444111141 (first 5 rows)
Type codes: 1444144414444111441111111414444111141 (after applying colClasses and integer64)
Type codes: 1444144414444111441111111414444111141 (after applying drop or select (if supplied)
Any idea what may have caused the unexpected results, and the error in the first place? And any way around it? Just to be clear, my aim is to be able to use fread() to read the main file, even though read.csv() works so far.
UPDATE: Now fixed in v1.9.3 on GitHub:
fread() now accepts line breaks inside quoted fields. Thanks to Clayton Stanley for highlighting. See:
fread and a quoted multi-line column value
Windows users are reporting success with the latest version from GitHub.
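To check how an installed version behaves, a minimal reproduction can be sketched as follows. The three-column file is made up and only mimics the line break inside the REMARKS field. The fread() call is optional and wrapped in tryCatch() because its outcome depends on the data.table version:

```r
# Minimal file with a line break inside a quoted field (made-up columns).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  'id,remarks,refnum',
  '1,"Dime to nickel sized hail.',
  '.",617130',
  '2,"",617131'
), tmp)

# Base R reads this as 2 records; the line break stays inside remarks.
df <- read.csv(tmp, stringsAsFactors = FALSE)
nrow(df)

# Optional check with data.table, if installed; the outcome depends on the version.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- tryCatch(data.table::fread(tmp), error = function(e) NULL)
  if (!is.null(dt)) print(nrow(dt))
}
```

If fread() reports a different row count than read.csv() on such a file, the installed version most likely predates the fix described above.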

Resources