Hi all and thanks in advance for all your help.
In R, I'm sending a command to an external Windows program using system(command), which in turn outputs lines (with multiple values per line) that I see directly on the R console. They look something like this:
a,b,c,d,e,f,g,h
1,2,3,4,5,6,7,8
3,4,5,7,1,3,4,9
7,5,3,1,8,1,5,7
What I would like to do is create an array that has the top row as column names and each subsequent row from the input should be the values that go into these columns. Any and all help in making this work would be very appreciated.
This is my first foray into this territory so I'm quite stuck as to how to do it. I've meddled with scan(), pipe() and readLines() but haven't been able to succeed. I have no particular attachment to system(command), any function that will run the executable that will give me the output I need is fine by me if it helps achieve what I want.
The comment made by user1935457 did the trick.
read.table(text = system(command, intern=TRUE), sep = ",", header=TRUE)
Related
I have a quick question that I cannot figure out. I am reading some results from an output file using the code below and stored as a list in R that can be seen in the picture. I want to delete all of the information after an empty row, in other words, it would be everything after line 42:
Does anybody know anything that I could use? I tried using gsub was I was not very successful.
Thanks for all of the help I am new to programming in R. Again any help is very much appreciated.
LoadFFA <- function(filename, folder.out, TYPE = "PeakFQ_17C",
colStandard = TRUE){ # standardize column output names
require(data.table)
if(grepl("PEAKFQSA",TYPE)){ # PeakfqSA Bulleting 17C analysis
text.list<-lapply(fileinput,readLines)
skip.rows<-sapply(text.list, grep, pattern = '^Ann. Exc. Prob.\\s+EMA Est.')-1
PFA <- lapply(seq_along(text.list),function(i) read.delim(fileinput[i],skip=skip.rows[i],sep="\n",stringsAsFactors = TRUE,blank.lines.skip = FALSE))
}
EDIT
I don't know if I could upload directly so here is the google drive link.
Also, here is the command to run the function LoadFFA("03606500peaks.out","D:/Documents/hydraulic.failures","PEAKFQSA"). The screenshot is the result using print(PFA).
The reason why I am using a loop is because I am reading multiple files (output files) and they have a lot of data, multiple lenghts, and I am reading the data beginning Ann.Exc.Prob. and as per the screenshot provided I would like to end after line 42 (after a full empty row). I hope that clears some confusion.
Basically read the output files, start reading on "Ann.Exc.Prob" and end until the end of that data (line 42 for this particular file). I am using a function because I am running several times.
Again, sorry for the trouble. Thank you for your time and I appreciate your patience.
https://drive.google.com/file/d/1PGbGWIHFj7IQRevTAEfqqA9Okg4fz7Mg/view?usp=sharing
I want to read in a pdf via pdf tools, extract some data from it and write it to a csv. I have been able to do this successfully for one pdf, but I have many (440) to do. I'm trying to write a loop that goes through a list I have created that has all my file paths in it. The problem is it overwrites every time. So I think my program is doing what I've asked of it, but I am not asking the correct thing! My code is below:
temp <-as.list(list.files(pattern = "*.pdf"))
file_path <- file.path(getwd(),temp)%>%as.list()
data_anz<-as.character()
for (i in 1:length(file_path)){
data_anz<-pdf_text(file_path[[i]])[2]%>%str_split("\n")%>%.[[1]]%>%str_split_fixed("\\s{2,}", n=4)%>%as.data.frame(i:length(file_path))%>%rename(Commodity =V1, Level = V2, Change = V3, Description = V4)
}
What I would like achieve is a data frame that adds to with every iteration, not over writes. So first run, the df = 1 row, 4 cols, the next run 2 rows ect.
I'm lost! But I can get it to work for an individual member of the list and it seems to work through the whole list, but overwrites.
Any help would be super appreciated!
Each iteration of the loop is assigning your table to the same variable. You might want to try something like
data_anz<-list()
for (i in 1:length(file_path)){
data_anz[[i]] <- ...
}
data_anz_all <- do.call(data_anz, rbind)
which puts each table into its own position in a list, and then row-binds them all together at the end (assuming the columns of the individual frames are compatible).
I've searched a bit for answered questions related to this, but I still keep running into issues.
I have a 1.4 million dataframe loaded into R, containing gps route data for ~56 vehicles. I used the split() function to parse my data into smaller chunks by bus name (Bus name example: '1367/E0007489'). I used the following line of code:
dfs <- split(sater001_paired, f=sater001_paired[, "vehicleName"])
Where sater001_pairedis my dataframe, and vehicleName is the variable I split with. The # of rows for each chunk is uneven, given that this data was captured real-time.
The problem I'm facing now is attempting to save each of these chunks into their own .csv files. I tried using lapply as such:
lapply(names(dfs), function(x){write.table(dfs[[x]], file = paste("bus", x, sep = ""))})
But R returns en error message "cannot open the connection". It's likely I'm missing something, as I'm very rusty on using the lapply function.
Any suggestions based off this?
MrFlick has helped me realize the issue I was having here.
So just to close this, the Vehicle Names column I had contained a forward slash halfway in each identification code. As Rstudio on windows does not take kindly to these characters, I did not realize this, as I have only recently switched over from primarily Mac OS use.
By using gsub in the following code:
sater001_paired$vehicleName <- gsub('/', '-', sater001_paired$vehicleName)
This issue has now resolved. Thanks again to MrFlick for the help.
I'm not too advanced with R so any help would be appreciated. I am trying to add values to columns in my dataset and my dataset is called 'katie'.
For example, in the column 'word' I'd like to select instances where 'SUBJECTED' is written and then post 'middle' in the column 'pre.environment', on the same line as 'SUBJECTED' is written. Is there something that I am doing wrong? With this code, the initial line definitely works (as I can see how many "SUBJECTED" items are recognized in the column 'word') but nothing happens when I enter the second line of code.
>x=grep("SUBJECTED", katie$word)
>katie[x,]$pre.environment= c('middle')
I hope this example is sufficient. Thanks in advance for your help.
Try the following code, if I understand your question correctly,
katie$pre.environment <- ifelse(grepl("SUBJECTED", katie$word),
yes = "middle",
no = katie$pre.environment)
I have written an R script that is to be used as part of a shell script based pipeline which will feed dozens of files containing genetic sequence data to the R script one after the other (using args[]).
I am having trouble finding a way to write the results of each run of this script to a single results file. I thought that the easiest way to do this might be to create an empty results.csv table and then ask the script to write to the next row of this file each time it is run (saves the problem of the script writing straight over the file on each run). In this vein a friend helped me out with the following code:
x<-readLines("results.csv")
if(x[[1]]==""){x[[1]]<-paste("meancoscore", "meanboot", "CIres", "RIres", "RC", "nodecount", sep= ",")}
x[[length(x)+1]]<-paste(meancoscore, meanboot, CIres, RIres, RC, nodecount, sep = ",")
x<-data.frame(x)
write.table(x,"results.csv", row.names = F, col.names = F, sep = ",")
In the above code "meancoscore", "meanboot", "CIres", "RIres", "RC", and "nodecount" are first used as a header if the data frame has nothing on the first row.
Following this the results (objects: meancoscore, meanboot, CIres, RIres, RC and nodecount are written in the columns corresponding with their headers. The idea here is that if you run the R script again with different source files it should simply write the results to the next line in the results.csv file.
However, the following is seen in the results.csv file after three runs of this code with different input files:
"\""\\""meancoscore,meanboot,CIres,RIres,RC,nodecount\\""\""
""\""\\""0.000,76.3247863247863,0.721002252252252,0.983235214508053,0.708914804154032,117\\""\""
""\""0.845,77.6923076923077,0.723259762308998,0.983410513459875,0.711261254217159,117\""
""0.85,77.4358974358974,0.728886344116805,0.983878381369061,0.717135516451654,117"
Where my desired result would be the following:
meancoscore,meanboot,CIres,RIres,RC,nodecount
0.000,76.3247863247863,0.721002252252252,0.983235214508053,0.708914804154032,117
0.845,77.6923076923077,0.723259762308998,0.983410513459875,0.711261254217159,117
0.85,77.4358974358974,0.728886344116805,0.983878381369061,0.717135516451654,117
It is worth noting that each successive fun seems to be adding more backslashes and more quotation marks to the results.csv file.
Ideally I would like to be able to simply read in the results.csv file when it is done and analyse the data by accessing the columns with results$meanboot, or summary(results$meanboot) for example.
Could anyone offer some advice on how to modify the above code or offer an alternative solution?
I should add here that I purposefully did not go for the option of writing into the R script a loop that will run through the input files of interest and simply assemble a full table of results as an object (I am aware that this would be very simple to write out). This was because the work being done by this script will be farmed out to multiple machines in a cluster.
Thank you for your time and any help you might be able to offer.
The problem was solved by adding quote = FALSE to the write.table() call as per voidHead's suspicion.