I want to print the head of a file in R. I know how to use read.table and the other input methods R supports. I just want to know whether R has alternatives to the Unix commands cat or head that read in a file and print some of its lines.
Thank you,
SangChul
read.table() takes an nrows argument for just this purpose:
read.table(header=TRUE, text="
a b
1 2
3 4
", nrows=1)
# a b
# 1 1 2
If you are instead reading in (possibly less structured) files with readLines(), you can use its n argument instead:
readLines(textConnection("a b
1 2 3 4 some other things
last"), n=1)
# [1] "a b"
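For a file on disk, the same n argument gives a rough equivalent of head, and reading without it behaves like cat. The following is a minimal sketch; the temporary file and its contents are made up purely so the example is self-contained.

```r
# Hypothetical sample file, created only so the example can run on its own
path <- tempfile(fileext = ".txt")
writeLines(c("a b", "1 2", "3 4", "5 6"), path)

# Rough equivalent of `head -n 2 file`: read at most two lines, print them as-is
first_two <- readLines(path, n = 2)
writeLines(first_two)

# Rough equivalent of `cat file`: read every line and print it
all_lines <- readLines(path)
writeLines(all_lines)
```

writeLines() prints the raw text without the [1] index and quotes that print() would add.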
I have 4K *.txt files in which the number of lines matters: if a file is empty I need to count zero, and if it has one or more lines, that count is informative too.
But the read_lines function in the readr package always gives me 1 line when the file is empty. In my example:
library(tidyverse)
# *txt file with 1 line
myfile1<-read_lines("https://raw.githubusercontent.com/Leprechault/trash/main/sample_672.txt")
length(myfile1)
#[1] 1
myfile1
#[1] "0 0.229592 0.382716 0.214286 0.246914"
# *txt file with 0 line
myfile2<-read_lines("https://raw.githubusercontent.com/Leprechault/trash/main/sample_1206.txt")
length(myfile2)
#[1] 1
myfile2
#[1] ""
myfile1 is OK, because it has 1 line, but in the second case (myfile2) there is no line at all: the file is empty. How can I solve this problem?
You can use readLines() from base R. It already has the correct behavior.
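A quick way to check this locally is sketched below. It assumes the empty file is truly zero bytes (the temporary files here are stand-ins for the ones in the question); a file that contains only a newline would still count as one empty line.

```r
# Stand-in for the empty file: a zero-byte temporary file
empty_file <- tempfile(fileext = ".txt")
file.create(empty_file)
n_empty <- length(readLines(empty_file))

# Stand-in for the one-line file
one_line_file <- tempfile(fileext = ".txt")
writeLines("0 0.229592 0.382716 0.214286 0.246914", one_line_file)
n_one <- length(readLines(one_line_file))

n_empty  # 0
n_one    # 1
```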
You can put read_lines() inside a wrapper function that does what you want ...
rl <- function(...) {
  x <- read_lines(...)
  if (identical(x, "")) return(character(0))
  return(x)
}
rl("https://raw.githubusercontent.com/Leprechault/trash/main/sample_1206.txt")
## character(0)
Imagine that I have 3 variables, a = 1, b = 2, c = 3, and then I want to print out:
a
b
c
a+b
When I use sink() I get a .txt file with this format
[1] 1
[1] 2
[1] 3
[1] 3
but in reality what I want is something like this in order to get traceability of my calculations...
[1] a = 1
[1] b = 2
[1] c = 3
[1] a + b = 3
I know I can write a string right before each operation so that it gets printed in my .txt file, but imagine that I have too many operations to do that by hand. Is there a way to get this kind of output in a .txt file?
Thank you!
EDIT: I just saw the txtStart function... it seems it might do this. Can someone confirm?
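One possible approach, independent of txtStart, is a small helper that captures the expression it is given and prints it next to its value. This is only a sketch: trace_eval is a hypothetical name, not a function from any package, and its output omits the [1] prefix shown above.

```r
# Hypothetical helper: prints "<expression> = <value>" and returns that string
trace_eval <- function(expr) {
  e <- substitute(expr)              # the unevaluated expression, e.g. a + b
  val <- eval(e, parent.frame())     # evaluate it where the helper was called
  line <- paste(paste(deparse(e), collapse = " "), "=",
                paste(format(val), collapse = " "))
  cat(line, "\n", sep = "")          # cat() output is redirected by sink() too
  invisible(line)
}

a <- 1
b <- 2
c <- 3

trace_eval(a)       # a = 1
trace_eval(b)       # b = 2
trace_eval(c)       # c = 3
trace_eval(a + b)   # a + b = 3
```

Because the lines are written with cat(), wrapping the calls between sink("file.txt") and sink() should send them to the file in the same way as ordinary printed output.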
I am trying to remove all ".1" occurrences from my labelexp data frame.
My input
ID
1 NE001403
2 NE001458.1
3 NE001494.1
4 NE001634.1
5 NE001635.1
6 NE001637.1
I tried labelexp$ID <- gsub(".1", "", labelexp$ID), but my output was:
ID
1 NE0403
2 NE0458
3 NE0494
4 NE0634
5 NE0635
6 NE0637
Any ideas? Thank you.
The "." is a special character in regular expressions in R - it means any character. You need to put "\\" in front of it to tell R that you mean it to be the character ".". Thus, try:
labelexp$ID <- gsub("\\.1", "", labelexp$ID)
Does that work for you?
You can also use the fixed=TRUE option:
sub(".1", "", "NE001458.1", fixed = TRUE)
# [1] "NE001458"
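To see the difference side by side, here is a small sketch on a few of the IDs from the question. Anchoring the pattern with $ is an extra precaution that is not part of the answers above; it restricts the match to a ".1" at the very end of the string.

```r
ids <- c("NE001403", "NE001458.1", "NE001494.1")

# Unescaped: "." matches any character, so "01" inside the ID is removed as well
unescaped <- gsub(".1", "", ids)
unescaped
# [1] "NE0403" "NE0458" "NE0494"

# Escaped and anchored: only a literal ".1" suffix is removed
anchored <- sub("\\.1$", "", ids)
anchored
# [1] "NE001403" "NE001458" "NE001494"
```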
Good afternoon,
Thanks for helping me out with this question.
I have a set of >5000 URLs within a list that I am interested in scraping. I have used lapply and readLines to extract the text for these webpages using the sample code below:
multipleURL <- c("http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?ndc=0002-1200&start=1&labeltype=all", "http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?ndc=0002-1407&start=1&labeltype=all", "http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?ndc=0002-1975&start=1&labeltype=all")
multipleText <- lapply(multipleURL, readLines)
Now I would like to query each of these texts for the word "radioactive". I am simply interested in figuring out if this term is mentioned in the text and have been using the logical grep command:
radioactive <- grepl("radioactive" , multipleText, ignore.case = TRUE)
When I count the number of items in our list that contain the word "radioactive" it returns a count of 0:
count(radioactive)
x freq
1 FALSE 3
However, a cursory review of the webpages for these URLs reveals that the first link (http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?ndc=0002-1200&start=1&labeltype=all) DOES in fact contain the word radioactive. Our "multipleText" list even includes the word radioactive, although our grepl command doesn't seem to pick it up.
Any thoughts on what I am doing wrong would be greatly appreciated.
Many thanks,
Chris
I think you should parse your documents with an HTML parser. Here I am using the XML package. I convert each document to an R list and then apply grep to it.
library(XML)
multipleText <- lapply(multipleURL, function(x) {
  y <- xmlToList(htmlParse(x))
  y.flat <- unlist(y, recursive = TRUE)
  length(grep('radioactive', c(y.flat, names(y.flat))))
})
multipleText
[[1]]
[1] 8
[[2]]
[1] 0
[[3]]
[1] 0
EDIT: to search for several words at once:
## define your words here
WORDS <- c('CLINICAL ','solution','Action','radioactive','Effects')
library(XML)
multipleText <- lapply(multipleURL, function(x) {
  y <- xmlToList(htmlParse(x))
  y.flat <- unlist(y, recursive = TRUE)
  ## count the matches for each word (w avoids shadowing the outer y)
  sapply(WORDS, function(w)
    length(grep(w, c(y.flat, names(y.flat)))))
})
do.call(rbind,multipleText)
     CLINICAL  solution Action radioactive Effects
[1,]         6       10      2           8       2
[2,]         1        3      1           0       3
[3,]         6       22      2           0       6
PS: maybe you should use ignore.case = TRUE for the grep command.
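If a simple yes/no per page is enough, one base R option is to test each list element separately instead of passing the whole list to grepl. This is a sketch only: the pages object below is made-up text standing in for the result of lapply(multipleURL, readLines), so it runs without network access.

```r
# Hypothetical stand-in for lapply(multipleURL, readLines): one character vector per page
pages <- list(
  c("<html>", "This product is Radioactive.", "</html>"),
  c("<html>", "Nothing relevant here.", "</html>")
)

# For each page, TRUE if any of its lines mentions the term (case-insensitive)
has_term <- vapply(
  pages,
  function(lines) any(grepl("radioactive", lines, ignore.case = TRUE)),
  logical(1)
)

has_term
# [1]  TRUE FALSE
sum(has_term)   # number of pages that mention the term
```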
I have a list written to a file, "file.txt", created by sink(). The file contains a single list holding only numbers, and it looks like this:
[[1]]
[1] 1 2
[[2]]
[1] 1 2 3
How can I read the data from such a file back in as a list?
EDIT:
I'm going to try reading it as a string, then use a regex to remove '[[*]]' and replace '[*]' with a special symbol, say '#'. Then I'll take every substring between '#' symbols, split it into a vector, and put it into an empty list.
Something like this should do the trick. (The exact details may vary, but at least this will give you some ideas to work with.)
l <- readLines("file.txt")
l2 <- gsub("\\[{2}\\d+\\]{2}", "#", l) # Replace [[*]] with '#'
l3 <- gsub("\\[\\d+\\]\\s", "", l2)[-1] # Remove all [*]
l4 <- paste(l3, collapse=" ") # Paste together into one string
l5 <- strsplit(l4, "#")[[1]] # Break into list
lapply(l5, function(X) scan(textConnection(X))) # Use scan to convert to numeric
# [[1]]
# [1] 1 2
#
# [[2]]
# [1] 1 2 3
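The same idea can be wrapped in a function and checked with a round trip. The sketch below makes a few assumptions: read_sunk_list is a hypothetical name, the file is written with capture.output() as a stand-in for sink(), and the list is flat, unnamed, and contains only non-empty numeric vectors.

```r
# Hypothetical helper: parse the printed form of a flat, unnamed list of numeric vectors
read_sunk_list <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(trimws(lines))]              # drop blank lines
  is_header <- grepl("^\\[\\[\\d+\\]\\]$", lines)    # "[[1]]", "[[2]]", ...
  group <- cumsum(is_header)                         # list element each line belongs to
  body <- sub("^\\s*\\[\\d+\\]\\s*", "", lines[!is_header])  # strip "[1] " style prefixes
  pieces <- split(body, group[!is_header])
  unname(lapply(pieces, function(p) {
    scan(text = paste(p, collapse = " "), quiet = TRUE)
  }))
}

# Round trip: print a list to a temporary file, then read it back
original <- list(c(1, 2), c(1, 2, 3))
path <- tempfile(fileext = ".txt")
writeLines(capture.output(print(original)), path)

restored <- read_sunk_list(path)
identical(restored, original)
```

Grouping lines by the preceding [[n]] header also covers long vectors whose printed form wraps over several lines.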