I need help writing an R script that can cross-check one .txt file against another to find where there are multiple instances. I have an Excel file with a list of miRNAs that I want to cross-reference with another Excel file that has a column containing the same miRNA names.
So is it a text file or an Excel file you are importing? Without more details, the file structure, or a reproducible example, it will be hard to help.
You can try:
# Install the package for reading Excel files
install.packages('readxl')
df1 <- readxl::read_excel('[directory name][file name].xlsx')
df2 <- readxl::read_excel('[directory name][file name].xlsx')
# Flag each miRNA in the first dataset that also appears in the second
df1$is_in_df2 <- ifelse(df1$miRNA %in% df2$miRNA, 'Yes', 'No')
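If you then only want the overlapping entries, a minimal follow-up (assuming the column is named miRNA in both files, as above) could be:
# Keep only the rows of df1 whose miRNA also appears in df2
matches <- df1[df1$is_in_df2 == 'Yes', ]
# Count how many times each shared miRNA occurs in df2, to find multiple instances
table(df2$miRNA[df2$miRNA %in% df1$miRNA])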
I want to edit an existing Excel file using R. For example, ExcelFile_1 has the data, and I need to place the data from ExcelFile_1 into another file called ExcelFile_2, based on the column and row names.
ExcelFile_1:
Store Shipped Qty
1111 100
2222 200
ExcelFile_2:
Store Shipped Qty
1111
2222
If I'm working with a data frame, I generally do:
ExcelFile_2$`Shipped Qty` <-
  ExcelFile_1$`Shipped Qty`[match(ExcelFile_2$`Store #`, ExcelFile_1$`Store #`)]
The above line works for my data frame, but I do not know how to write this result into a worksheet using the XLConnect package. All I see are the options below:
writeWorksheet(object, data, sheet, startRow, startCol, header, rownames)
I do not want to edit the data as a data frame and save it as another worksheet in an existing/new Excel file, because I want to preserve ExcelFile_2's formatting.
For example: I want to change the value of ExcelFile_2 cell "B2" using the values from another sheet.
Could anyone please help me with the above problem?
Assuming your files are stored in your home directory and named one.xlsx and two.xlsx, you can do the following:
library(XLConnect)
# Load content of the first sheet of one.xlsx
df1 <- readWorksheetFromFile("~/one.xlsx", 1)
# Do what you like to df1 ...
# Write df1 to the first sheet of two.xlsx
wb2 <- loadWorkbook("~/two.xlsx")
writeWorksheet(wb2, df1, sheet = 1)
saveWorkbook(wb2)
If needed, you can also use startRow and startCol in both readWorksheetFromFile() and writeWorksheet() to specify exact rows and columns, and header to specify whether you want to read/write the headers.
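For instance, to change only cell B2 of two.xlsx (the asker's example), a sketch along these lines should work, assuming the quantity sits in the second column of df1:
library(XLConnect)
df1 <- readWorksheetFromFile("~/one.xlsx", 1)
wb2 <- loadWorkbook("~/two.xlsx")
# Write the first quantity value into cell B2 (row 2, column 2);
# header = FALSE stops the column name from being written too
writeWorksheet(wb2, df1[1, 2, drop = FALSE], sheet = 1,
               startRow = 2, startCol = 2, header = FALSE)
saveWorkbook(wb2)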
I have 500 .csv files with data that looks like:
(sample data shown as an image in the original post)
I want to extract one cell (e.g. B4, or 0.477) per csv file and combine those values into a single csv. What are some recommendations on how to do this easily?
You can try something like this:
all.fi <- list.files("/path/to/csvfiles", pattern = "\\.csv$", full.names = TRUE) # store the full paths of the csv files as a character vector
library(readr) # provides read_lines() and write_lines()
ans <- sapply(all.fi, function(i) {
  eachline <- read_lines(i, skip = 3, n_max = 1) # read only the 4th line of the file
  unlist(strsplit(eachline, ","))[2]             # split the line on commas and take the 2nd field (column B)
})
write_lines(ans, "/path/to/output.csv")
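If you also want to record which file each value came from (an optional extension, not part of the original answer), you could write a two-column csv instead:
# Pair each extracted value with the name of the file it came from
out <- data.frame(file = basename(all.fi), value = ans)
write_csv(out, "/path/to/output_with_names.csv")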
I cannot add a comment, so I will write my comment here.
Since your data is very large and difficult to load file by file, try this: Importing multiple .csv files into R. It covers the first part of your problem. For the second part: you can save your data as a data.frame (as in @Bruno Zamengo's comment) and then use the select and merge functions in R; with select and merge you can pick out the values you need and then combine them into a single csv file. I used this idea in my project. Do not forget to use lapply, as in the sketch below.
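A minimal sketch of that idea (the folder path, and the assumption that all files share the same columns, are mine):
files <- list.files("/path/to/csvfiles", pattern = "\\.csv$", full.names = TRUE)
# Read every file into a list of data frames
dfs <- lapply(files, read.csv)
# Stack them row-wise, assuming all files share the same columns
combined <- do.call(rbind, dfs)
write.csv(combined, "/path/to/combined.csv", row.names = FALSE)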
I am new to R and I have run into a problem. I have a folder with 50 csv files, each representing a city. I want to import each csv file into RStudio as an independent data frame, to eventually plot all 50 cities in one time-series plot.
There are four things I want to do to each csv file, and in the end, have these four actions automated across all 50 files.
Skip the first 25 rows of the csv file
Combine the Date and Time columns of each csv file
Remove the rows where column 3 is empty
Change the name of column 3 from "ug/m3" to "CO"
After skipping, the first row will be the header
I used the code below on one csv file to see if it would work. Everything works except for city[,3][!(is.na(city[,3]))].
city1 <- read.csv("path",
skip = 25)
city1$rtime <- strptime(paste(city1$Date, city1$Time), "%m/%d/%Y %H:%M")
colnames(city1)[3] <- "CO"
city[,3][!(is.na(city[,3]))] ## side note: help with this would be appreciated; I was wondering if something goes before the comma especially.
I am not sure how to combine everything efficiently into a function.
I would appreciate suggestions on an efficient way to perform the four actions (in a function, maybe) on each csv file while importing them into R.
Use this function for each csv you want to read:
read_combine <- function(yourfile){
  file <- read.csv(yourfile, skip = 25)
  file$rtime <- strptime(paste(file$Date, file$Time), "%m/%d/%Y %H:%M")
  colnames(file)[3] <- "CO"
  # Drop the rows where CO is missing and return the cleaned data frame
  file[!is.na(file$CO), ]
}
Here yourfile is the path to the csv file (i.e. "path" above).
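To apply this to all 50 files at once, one possible follow-up (the folder path is an assumption) is:
# Read and clean every city file in the folder
files <- list.files("/path/to/city_csvs", pattern = "\\.csv$", full.names = TRUE)
cities <- lapply(files, read_combine)
# Name each data frame after its file, minus the .csv extension
names(cities) <- tools::file_path_sans_ext(basename(files))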
I have one complex fasta file containing 794 entries that I would like to subset based on various lists of IDs I have created.
The fasta file is in the format shown below:
>5_B1_CZ.1:572-889 ID:5_B1 Contig:1
ATGTCCTGGATDCGTTACTTGTGTATTGCCGGTCCTC
Based on a previous answer, I read the fasta file in using the code below.
library(seqinr)  # read.fasta() comes from the seqinr package
fastafile <- read.fasta(file = "test.fasta", seqtype = "AA", as.string = TRUE, set.attributes = FALSE)
And then used the following line to subset the fasta file based on a data frame containing a list of IDs.
f<-fastafile[c(which(names(fastafile) %in% Allint$`All Intersect`))]
An example of the ID list is shown below.
All Intersect
1 5_F2_CZ.13:475-2241
2 2_B8_CZ.9:133-1899
This seemed to work, but gave an output that had the various fasta IDs as column headers with the sequences in the rows below (as shown in Image 1).
I have had trouble trying to export this as a complete fasta file due to this format.
Is there an easier way to complete this task?
Sorry if this is convoluted, I am new to R.
A few things:
1) To subset your fasta object, you do not need c() or which():
f <- fastafile[names(fastafile) %in% Allint$`All Intersect`]
2) To output the sequences you have subset, use the write.fasta() function from seqinr. This will put the sequences back together and write a fasta file:
write.fasta(f, names(f), file.out="My_newfastaFile.fa")
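Putting the pieces together, the whole workflow (assuming the seqinr package and the Allint data frame from the question) might look like:
library(seqinr)
# Read the fasta file as plain character strings
fastafile <- read.fasta(file = "test.fasta", seqtype = "AA", as.string = TRUE, set.attributes = FALSE)
# Keep only the entries whose names appear in the ID list
f <- fastafile[names(fastafile) %in% Allint$`All Intersect`]
# Write the subset back out as a fasta file
write.fasta(f, names(f), file.out = "My_newfastaFile.fa")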
I am new to R programming. I imported a csv file using the following function:
PivotTest <- read.csv(file.choose(), header=T)
The csv file has 7 columns: Meter_Serial_No, Reading_Date, Reading_Description, Reading_value, Entry_Processed, TS_Inserted, TS_LastUpdated.
When importing, Meter_Serial_No is filled with zeros even though there are data in that column in the csv file. When running a function to see what data are in that particular column (PivotTest$Meter_Serial_No), it returns NULL. Can anyone assist me please?
Furthermore, the csv that I'm importing has more than 127,000 rows. When doing a test with only 10 rows of data, I don't have the problem of the Meter_Serial_No column being replaced with zeros.
This depends on the class of the values in that column (PivotTest$Meter_Serial_No). I believe there is a problem with type conversion; try the following:
PivotTest <- read.csv("test.csv", header=T,colClasses=c(PivotTest$Meter_Serial_No="character",rep("numeric",6)))