Ignore first row in R

How do I ignore the first row of a data set in R and make the second row the column names?
I am currently reading a file that sometimes has garbage on the first or second line, and I am looking for a way to resolve this.
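
If the junk is always a single line, a minimal sketch (the file name here is hypothetical) is to skip it and let the next row supply the names:

# Skip the first (garbage) line; header = TRUE then treats
# the following row as the column names.
df <- read.csv("data.txt", skip = 1, header = TRUE)

If the number of junk lines varies, the readLines()-then-skip approach in the last answer below generalizes this.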

Related

How do I detect a specific string within five consecutive columns in a df with R, and mutate the value or copy it into another column?

I'm trying to write a script in which, within my df, I want to find in several consecutive columns (e.g. 16:25) a specific string of the form "SA (some string and spaces inside the brackets)". If the string is present in any of the columns, it will only appear in one of them.
When that string is detected, I would like it to be moved or copied to a different column named "SA_status".
If it is not found in any of the columns, then I'd like the value for that row on SA_status to appear as NA.
I've tried to create a for loop to search in the first column, then, if the string wasn't found, in the next one and so on, stopping as soon as it was found. Then I used a combination of the functions mutate(), case_when() and str_detect() to copy that value into the new column.
Finally, if after doing this process the values in the new column were still blank, I've assigned them to NA.
However, I must be doing something wrong because it is not working and I'm getting really desperate.
Could you please give me a hand? Cheers!
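
For what it's worth, here is a minimal sketch of that mutate()/str_detect() idea; the column positions 16:25 and the target column name SA_status come from the question, while the object names, the exact regex, and the assumption that those columns are character are all mine:

library(dplyr)
library(stringr)

# Row by row, keep the first value in columns 16:25 that matches
# the "SA (...)" pattern; otherwise record NA.
df <- df %>%
  rowwise() %>%
  mutate(SA_status = {
    vals <- c_across(16:25)
    hit <- vals[str_detect(vals, "^SA \\(.+\\)$") & !is.na(vals)]
    if (length(hit) > 0) hit[1] else NA_character_
  }) %>%
  ungroup()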

How to skip empty rows while reading multiple tabs in R?

I am trying to read an Excel file with multiple tabs. For that, I use the code provided here.
The problem is that each tab has a different number of empty rows before the actual data begins. For example, the first tab has two empty rows, the second tab has three empty rows, and so on.
Normally, I would use the parameter skip in the read_excel function to indicate the number of empty lines to skip. But how do I do that for multiple tabs with different numbers of rows to skip?
Perhaps the easiest solution would be to read it as-is and then remove the rows, i.e. yourdata <- yourdata[!is.na(yourdata$columname),]. This works if you don't expect any NAs in a particular column, like an id. If you have data gaps everywhere, you can test for all-NA values across multiple columns instead; let me know if that's what you need.
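
A sketch of that "read everything, then drop the padding rows" idea, assuming readxl, a hypothetical file name, and an id column that is never legitimately empty:

library(readxl)

path <- "workbook.xlsx"            # hypothetical file name
tabs <- lapply(excel_sheets(path), function(s) {
  d <- read_excel(path, sheet = s)
  d[!is.na(d$id), ]                # keep only rows where 'id' is filled in
})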

Find common rows between two dataframes based on two columns using bash

I found this very difficult to solve in bash - I have two files that I want to find the common rows between them based on two columns.
f1.csv:
col1,col2,col3,col4
Dalir,Cpne1,down,2174
Fendrr,Aco2,up,280
Cpne1,Tox1,down,8900
f2.csv:
col1,col2,col3,col4,col5,col6
Linc,Rmo,ch2,ch2,p,l
Tox1,Cpne1,ch1,ch2,l,p
Basically, the code should look only at the first two columns of the dfs and check whether the pairs are the same (the order within a pair is not important). You can see that the first df has Cpne1,Tox1 in its third row and the second df has Tox1,Cpne1 in its second row, so this pair should be printed in the output, taken from the second file.
Desired output:
Tox1,Cpne1
Unfortunately, I have not been able to develop a bash command for this - it would be great if you could help me with this. Thanks
Just adding the explanation to oguz' fine answer in the comments above:
BEGIN{FS=OFS=","} defines , to be the separator for both input and output.
NR==FNR{pair[$1,$2];next} while the record number of the entire input matches the current file's record number (in other words, for the first file), add an element with the first and second field as index to the array pair.
($1,$2) in pair||($2,$1) in pair{print $1,$2} operating on the second file, check if field one and two in any order are present as index in the array pair, and print them if they are.
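
For reference, assembling those three pieces gives the full command (presumably close to oguz's original answer):

awk 'BEGIN{FS=OFS=","} NR==FNR{pair[$1,$2];next} ($1,$2) in pair||($2,$1) in pair{print $1,$2}' f1.csv f2.csv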

Remove Column in R after an "if" clause

I am learning R and I have an R data table from which I want to remove unnecessary features (unnecessary table columns). For this I am using the ReliefFexpRank estimator from the CORElearn package, with table and originaltable being the R tables.
library(CORElearn)
estReliefF <- attrEval(FLAG_READMITIDO_MEAN ~ ., table, estimator="ReliefFexpRank", ReliefIterations=30)
for (i in estReliefF) {
  if (estReliefF[i] == 0) { originaltable[i] <- NULL }
}
output <- data.frame(estReliefF)
I know that estReliefF has the correct results, giving me values like the sample below for each feature:
LOCAL
-4.428817e-01
HORA
0.000000e+00
And I want to remove the HORA one, which is 0.
I don't know what the problem is, though I suspect it's around the if statement. Since it's my first time using R, I would appreciate some help, as I can't seem to find the mistake.
The issue comes from modifying your columns while running a loop over them. Let's say your vector and table are:
x<-c(1,1,0,1,0)
df<-data.frame(1:5,2:6,3:7,4:8,5:9)
If you run for(i in 1:5){if(x[i]==0){df[i]<-NULL}}, you'll see that the third column has been removed, but not the fifth. That's because after the third column is removed, the fifth column is no longer the fifth but the fourth, and x[4] is not 0.
You need to find all the unwanted columns before deleting them. One possible solution is:
df[-which(x==0)]
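
Applied to the question's objects, a sketch (assuming the scores in estReliefF line up one-to-one with the columns of originaltable) would be:

# Drop every column whose ReliefF score is exactly 0
originaltable <- originaltable[-which(estReliefF == 0)]

One caveat worth knowing: if no score is 0, which() returns integer(0) and df[-integer(0)] selects zero columns, so the logical subset originaltable[estReliefF != 0] is a safer variant.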

How to delete all rows in R until a certain value

I have several data frames which start with a bit of text. Sometimes the information I need starts at row 11 and sometimes at row 16, for instance; it changes. All the data frames have in common that the useful information starts after a row with the title "location".
I'd like to make a loop to delete all the rows in the data frame above the useful information (including the row with "location").
I'm guessing that you want something like this:
readfun <- function(fn, n = -1, target = "location", ...) {
  r <- readLines(fn, n = n)            # read the file as raw lines
  locline <- grep(target, r)[1]        # first line matching the target
  read.table(fn, skip = locline, ...)  # re-read, skipping everything up to and including that line
}
This is fairly inefficient because it reads the data file twice (once as raw character strings and once as a data frame), but it should work reasonably well if your files are not too big. (@MrFlick points out in the comments that if you have a reasonable upper bound on how far into the file your target will occur, you can set n so that you don't have to read the whole file just to search for the target.)
I don't know any other details of your files, but it might be safer to use "^location" to identify a line that begins with that string, or some other more specific target ...
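
Usage might look like this (file name hypothetical); extra arguments fall through to read.table:

dat <- readfun("measurements.txt", n = 100, header = TRUE)

Here n = 100 caps the search at the first 100 lines, per the comment above.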
