Hi, I want to do something that I thought should be simple, but I can't seem to find a way. I have some files in which I want to change something in the header lines, identified by a keyword. The lines I want to change are always within the first 20 lines, but not necessarily at exactly the same line number.
So I want to read in the first 20 lines (which is easy), find and change my string (again easy), and then write over the top 20 lines of the file while keeping the thousands of rows below unchanged. I'm going to have to do this hundreds and hundreds of times, which is why I don't want to read in the entire file and write it all back out.
I have created a very simplified example. WARNING: if you run this, it creates a file called Temp.txt in your current working directory.
# Create a dummy text file
write.table(file = "Temp.txt", data.frame(Test = letters[1:26]), row.names = F, quote = F)
And then I can read in the lines and make the change:
# read top 5 lines of the text file (in real life I need to look at 20, but for example's sake, 5)
TestHeader = readLines("Temp.txt", n = 5)
# Find and replace my search string
TestHeader[grepl("c",TestHeader)] <- "My New Line"
# <what to do here?> I want to write over only the first 20 lines
Obviously I can read in the entire thing and change it:
TestHeader = readLines("Temp.txt")
TestHeader[grepl("c",TestHeader)] <- "My New Line"
writeLines(TestHeader, "Temp.txt")
But this would involve unnecessarily reading and writing thousands and thousands of lines.
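As far as I know, a true in-place overwrite of just the header is only safe when the replacement is exactly the same byte length as the original, because the rest of the file does not shift to make room. Failing that, a common compromise is to stream the file through a connection rather than hold it all in memory at once. A minimal sketch, assuming the edited header keeps the same number of lines (the function name and the 10,000-line chunk size are just placeholders):

fix_header <- function(path, pattern, replacement, n_header = 20) {
  con_in <- file(path, open = "r")

  # Read and edit just the header lines
  header <- readLines(con_in, n = n_header)
  header[grepl(pattern, header)] <- replacement

  # Write the new header, then stream the untouched remainder across in chunks
  tmp <- paste0(path, ".tmp")
  con_out <- file(tmp, open = "w")
  writeLines(header, con_out)
  repeat {
    chunk <- readLines(con_in, n = 10000)
    if (length(chunk) == 0) break
    writeLines(chunk, con_out)
  }
  close(con_in)
  close(con_out)

  # Swap the rewritten file into place (file.rename may refuse to
  # overwrite on some platforms, so remove the original first)
  file.remove(path)
  file.rename(tmp, path)
}

fix_header("Temp.txt", "c", "My New Line")

This still reads and writes every line once, but only one chunk at a time lives in memory, which is usually the real constraint.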
I have a txt file called 'All_Words' and it consists of 2000 words. I'm making a hangman game, so I need to choose a random word. I've already thought of picking a random number from 0 to 2000 and reading the line at that number, but I don't know how to do that. Some background info: I am in 8th grade and I like coding. I'm trying to get better, so I try out what people suggest and figure out what every part does, including reserved words such as 'global'.
I have also tried to just shuffle the txt file, because I already got it to print the first word; if I were able to shuffle the txt file, it would print a different word, and I could create an if statement saying that if the chosen word had already been chosen, it would shuffle the file again and pick the first word again. I got this idea of shuffling the txt file from my dad, but he only did something called 'DOS', which he said he used before it was even called coding, so I don't know if it would work in Python. I've also asked my coding teacher, and he said he doesn't know how you would do that because he is used to Java and JavaScript.
This is what I have so far. I would also like it to pick only one word instead of every word in order:
import random
with open("All_Words.txt") as file:
for line in file:
print(line)
break
Assuming each word is on its own line in the file, you can read the text file into a list and use random.choice() to pick a random element from the list. Then remove the word from the list so you don't pick it again.
import random

# Read the whole file and split it into a list of words (one per line)
with open("All_Words.txt", "r") as file:
    words = file.read()
listOfWords = words.split("\n")

# Pick a random word
randWord = random.choice(listOfWords)
print(randWord)

# Remove it from the list so it cannot be picked again
listOfWords.remove(randWord)
newUniqueRandWord = random.choice(listOfWords)
print(newUniqueRandWord)
I am reading text data using the read.delim() and read.delim2() functions. They accept a skip argument, but it only works with a number of lines: skip = n simply drops the first n lines of the file before reading, so you have to know the position of the line in advance.
What I want instead is to iterate through a body of text, skipping every line until I get to a specific line identified by its content. Does anyone know how this can be done?
You cannot do that with base R functions, and I don't know of a package that directly provides it. However, here are two ways to get the same effect.
First, a file named file.txt:
I want to skip this
and this too
Absolute Irradiance
I need this line
library(dplyr) # for cumany()

txt <- readLines("file.txt")
txt[cumany(grepl("Absolute Irradiance", txt))]
# [1] "Absolute Irradiance" "I need this line"
If you don't want the "Irradiance" line but want everything after it, then add [-1] to remove the first of the returned lines:
txt[cumany(grepl("Absolute Irradiance", txt))][-1]
# [1] "I need this line"
If the file is relatively large and you do not want to read all of it into R, then
system2("sed", c("-ne", "'/Absolute Irradiance/,$p'", "file.txt"), stdout = TRUE)
# [1] "Absolute Irradiance" "I need this line"
This second technique is really not that great; it might be better to have sed write from file.txt into a second (temp) file and then just readLines() that file directly.
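A sketch of that temp-file variant, assuming sed is available on your system:

tmp <- tempfile()
# Let sed extract the lines into a temp file instead of capturing stdout in R
system2("sed", c("-ne", "'/Absolute Irradiance/,$p'", "file.txt"), stdout = tmp)
readLines(tmp)
# [1] "Absolute Irradiance" "I need this line"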
I have got this .txt file output by a microscope to process.
#read the .txt file generated by microscope, skipping the first 9 lines of garbage information
df <- read.csv("Objects_Population - AllCells.txt", sep="\t", skip = 9,header=TRUE, fill = T)
Then I started looking at the structure of the data frame. Everything seems fine, except that I found an extra column at the end of the data frame named "X.1", whose rows are all NA values. I don't see this column when I open the .txt file in Excel. I suspect the problem has something to do with the column names generated by the microscope, since they contain quite a few special characters.
Below is the data frame as read by Excel (only showing the last 2 columns, since I have 132 columns and their names are disgustingly long):
AllCells - Cell Contact Area with Neighbors [%] AllCells - Nucleus Nearest Neighbor Distance [µm]
0 4.82083
21.9512 0
15.7895 0
29.4118 0.584611
0 4.21569
0 1.99599
0 3.50767
...
This has happened to me before, but I never took it too seriously as I was always interested in a subset of my data frame. Now that I'm looking at all the columns, it's starting to bother me.
Is there any way I can read the file correctly, without R attaching that additional "X.1" column at the end? Preferably without manually deleting or subsetting out the last column...
Cheers,
ML
If all the other column names are correct, you probably have a trailing \t in the text file. R tries to include it and gives it the generic column name X.1.
You could try to read the file first as plain text, remove the trailing \t, and only then use read.csv:
file_connection <- file("Objects_Population - AllCells.txt")
content <- readLines(file_connection)
close(file_connection)
Now we try to get rid of the trailing \t (this might need some testing to fit your needs):
sanitized <- gsub("\\t$", "", content)
And then we read this sanitized string as if it were a file (using the text argument):
df <- read.csv(text = paste0(sanitized, collapse = "\n"), sep = "\t", skip = 9, header = TRUE, fill = TRUE)
Had that problem too. Fixed it by saving the file as "CSV (MS-DOS) (*.csv)" instead of what I originally had, "CSV (Comma delimited) (*.csv)".
This is almost certainly because you've got an extra empty column in your spreadsheet.
In Excel, open your sheet and press Ctrl-End. If you end up in an empty cell outside the range of your data, there's the problem. Select the column (Ctrl-Space), right-click, and choose Delete.
I also encountered a similar problem. I found that three extra columns (X, X.1, X.2) were created after I loaded a dataset from an Excel sheet into RStudio.
Steps I followed:
a) I went to the Excel sheet and selected the three extra columns after the last column with actual values, by putting the cursor on top of each column, then right-clicked and chose Delete.
b) I loaded that Excel sheet into R again, and those 3 columns were gone.
I am facing a problem with the fwrite function from the data.table package in R. It appends the wrong way, and I end up with something like:
user ref version status Type DataExtraction
user1 2.02E+11 1 Pending 1 No
user2 2.02E+11 1 Saved 2 No"user3" 2.01806E+11 1 Saved NB No
I am using the function as follows:
library(data.table)
fwrite(Save, "~/Downloads/Register.csv", append = TRUE, sep = ",", quote = TRUE)
Reproducible example:
fwrite(data.table(user="user3",
ref="204094093",
version="2",
status="Pending",
Type="1",DataExtraction="No"),
"~/Downloads/test.csv", sep = ",", append = FALSE)
fwrite(data.table(user="user3",
ref="204094093",
version="2",
status="Pending",
Type="1",DataExtraction="No"),
"~/Downloads/test.csv", sep = ",", append = TRUE)
I'm not sure if this isolates the problem, but it seems that if I manually change something in the .csv file (for instance, rename DataExtraction to Extraction), the wrong-way appending occurs.
Does someone know what is going wrong?
Thanks!
When I run your example code I have no problems with the behavior: the file comes out as desired. Based on your comments about manually changing what is in the file, and what the undesired output looks like, here is what I believe is happening.

When fwrite() (and many other similar IO functions) write to a file, each line ends with a line break (in R, this is generally represented as \n). This is desired, so that subsequent lines of data indeed appear on subsequent lines of the file. Generally this also means that when you open the file in a text editor, there will be a blank line at the very end, since it reflects the line break at the end of the last line that was written (different editors handle this differently, though).

So, I suspect that when you go in and manually edit the file in your editor, you are somehow losing that last line break. This means that when you write again using append, there is no line break at the end of the file, and therefore you get the undesired behavior of two lines of data on a single line of the file.
So, the solution would be either to find a way to keep your manual editing from deleting that last line break character, or, barring that, to write a single line break character to the file from R before appending, e.g. with the cat() function.
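For instance, a minimal sketch of that repair using the file path from the question (run it after the manual edit, before the next append):

# Append a single newline so the next fwrite(..., append = TRUE)
# starts on a fresh line
cat("\n", file = "~/Downloads/Register.csv", append = TRUE)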
I have a folder with tons of txt files from which I have to extract specific data. The problem is that the format of the files changed at some point, and the position of the data I need to extract changed with it, so I need to deal with files in different formats.
To try to make it clearer: in column 4 I have the name of the variable and in column 5 I have its value, but sometimes these are in a different row. Is there a way to find which row the variable name is in, and then extract its value?
Thanks in advance
EDIT:
In some files I will have the data like this:
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Current--------28
But at some point, there was a change in the software to add another variable, and the new file is like this:
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Error------------5.
Current--------28
So I need to deal with these 2 types of data, extracting the same variables even though they sit in different rows.
If these files can't be read with read.table, use readLines and then find the lines that start with the keyword you need.
For example:
Sample file 1 (with the dashes included and extra line breaks):
Column 1-------Column 2.
Device ID------A.
Voltage------- 500.
Error------------5.
Current--------28
Sample file 2 (with a comma as separator):
Column 1,Column 2.
Device ID,A.
Current,555
Voltage, 500.
Error,5.
For both cases do:
text <- readLines("your filename here")
curr <- text[grepl("^Current", text, ignore.case = TRUE)]
Which returns:
for file 1:
[1] "Current--------28"
for file 2:
[1] "Current,555"
Then use gsub to remove anything that is not a number.
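For example, a sketch of that last step, assuming the values are plain non-negative numbers (no scientific notation or minus signs):

# Strip everything that is not a digit or a decimal point, then convert
as.numeric(gsub("[^0-9.]", "", curr))
# file 1 gives 28, file 2 gives 555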