Hi, I have the following data frame:
b = data.frame(c(1,2),c(3,4))
> colnames(b) <- c("100.X0","100.00")
> b
  100.X0 100.00
1      1      3
2      2      4
I would like to save this as a csv file with headers as strings. When I use write.csv the result ends up being:
100.X0 100
1 3
2 4
It turns the 100.00 into 100; how do I prevent this?
I think the problem might be the way you read the csv file. Certain programs will guess the type and convert it (e.g. Excel).
Use write.xls from package dataframes2xls instead:
> library(dataframes2xls)
> write.xls(b, "test.csv")
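You can also check that write.csv itself keeps the header intact by reading the raw file back; a quick sanity check using the same data frame as above:

```r
b <- data.frame(c(1, 2), c(3, 4))
colnames(b) <- c("100.X0", "100.00")
write.csv(b, "test.csv", row.names = FALSE)
# the header line on disk still contains "100.00";
# it is the program displaying the csv (e.g. Excel), not write.csv, that shows it as 100
readLines("test.csv")[1]
```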
I want to read a xls file into R and select specific columns.
For example I only want columns 1 to 10 and rows 5 - 700. I think you can do this with xlsx but I can't use that library on the network that I am using.
Is there another package that I can use? And how would I go about selecting the columns and rows that I want?
You can try this:
library(xlsx)
read.xlsx("my_path\\my_file.xlsx", "sheet_name", rowIndex = 5:700, colIndex = 1:10)
Since you are unable to load the xlsx package, you might want to consider base R and use read.csv. For this, save your Excel file as a csv; how to do this is easily found on the web. Note that csv files can still be opened in Excel.
These are the steps you need to take to only read the 2nd and 3rd column and row.
hd <- read.csv('a.csv', header=F, nrows=1, as.is=T) # first read only the header row
removeCols <- c('NULL', NA, NA) # 'NULL' drops a column, NA keeps it with its default class
df <- read.csv('a.csv', skip=2, header=F, colClasses=removeCols) # skip the header row and the first data row
colnames(df) <- hd[is.na(removeCols)]
df
  two three
1   5     8
2   6     9
This is the example data I used.
a <- data.frame(one=1:3, two=4:6, three=7:9)
write.csv(a, 'a.csv', row.names=F)
read.csv('a.csv')
  one two three
1   1   4     7
2   2   5     8
3   3   6     9
I have data in text format whose structure is as follows:
ATCTTTGAT*TTAGGGGGAAAAATTCTACGC*TTACTGGACTATGCT
.........T.....,,,,,,,,,.......T,,,,,,.........
......A..*............,,,,,,,,.A........T......
....*..................,,,T...............
...*.....................*...........
...................*.....
I have been trying to import it into R using the read.table() command, but when I do, the output has an altered structure like this:
V1
1 ATCTTTGAT*TTAGGGGGAAAAATTCTACGC*TTACTGGACTATGCT
2 .........T.....,,,,,,,,,.......T,,,,,,.........
3 ......A..*............,,,,,,,,.A........T......
4 ....*..................,,,T...............
5 ...*.....................*...........
6 ...................*.....
For some reason, R is shifting the rows with fewer characters to the right. How can I load my data into R without altering the structure present in the original text file?
Try this :)
read.table(file, sep = "\n")
result:
V1
1 ATCTTTGAT*TTAGGGGGAAAAATTCTACGC*TTACTGGACTATGCT
2 .........T.....,,,,,,,,,.......T,,,,,,.........
3 ......A..*............,,,,,,,,.A........T......
4 ....*..................,,,T...............
5 ...*.....................*...........
6 ...................*.....
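An alternative that sidesteps read.table completely is readLines, which returns every line verbatim as a character string, with no field splitting at all. A self-contained sketch (pileup.txt is a stand-in name for the question's file):

```r
# two of the lines from the question, used as stand-in data
lines <- c("ATCTTTGAT*TTAGGGGGAAAAATTCTACGC*TTACTGGACTATGCT",
           "....*..................,,,T...............")
writeLines(lines, "pileup.txt")  # stand-in for the original file
x <- readLines("pileup.txt")     # each element is one untouched line
nchar(x)                         # line lengths are preserved exactly
```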
I have a data frame where some of the rows have blanks entries, e.g. to use a toy example
  Sample Gene  RS Chromosome
1      1    A rs1         10
2      2    B              X
3      3    C rs4          Y
i.e. sample 2 has no rs#. If I attempt to save this data frame in a file using:
write.table(mydata,file="myfile",quote=FALSE,sep='\t')
and then read.table('myfile',header=TRUE,sep='\t'), I get an error stating that the number of entries in line 2 doesn't have 4 elements. If I set quote=TRUE, then a "" entry appears in the table. I'm trying to figure out a way to create a table using write.table with quote=FALSE while retaining a blank placeholder for rows with missing entries such as 2.
Is there a simple way to do this? I attempted to use the argument NA="" in write.table() but this didn't change anything.
If my script's resulting data frame contains NA values, I always replace them. One way is to replace each NA in the data frame with some other text that tells you the entry was NA, especially if you are saving the result in a csv, a database, or some other non-R environment.
a simple script to do that
replace_NA <- function(x, replacement = "N/A") {
  x[is.na(x)] <- replacement
  x  # return the modified vector; without this line the function returns the replacement value instead
}
df[] <- lapply(df, replace_NA, replacement = "N/A") # lapply keeps df a data frame; sapply would simplify to a matrix
You are attempting to reinvent the fixed-width file format. Your requested format would have a blank column between every real column. I don't find a write.fwf, although the 'utils' package has read.fwf. The simplest method of getting your requested output would be:
capture.output(dat, file='test.dat')
# Result in a text file
  Sample Gene  RS Chromosome
1      1    A rs1         10
2      2    B              X
3      3    C rs4          Y
This essentially uses the print method (at the end of the R REPL) for dataframes to do the spacing for you.
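A self-contained run of the capture.output approach; the data frame below is a stand-in reconstructed from the question's example:

```r
dat <- data.frame(Sample = 1:3,
                  Gene = c("A", "B", "C"),
                  RS = c("rs1", "", "rs4"),
                  Chromosome = c("10", "X", "Y"),
                  stringsAsFactors = FALSE)
capture.output(dat, file = "test.dat")
# readLines("test.dat") shows each column padded to a fixed width,
# with row 2's empty RS kept as blank space rather than collapsed
```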
Let me start by saying I am brand new to R, so any solution with a detailed explanation would be appreciated so I can learn from it.
I have a set of csv files with the following rows of information:
"ID" "Date" "A" "B" (where A and B are some data points)
I am attempting to get the output in a meaningful manner and am stuck on what I am missing.
observations <- function(directory, id = 1:10){
#get all file names in a vector
all_files <- list.files(directory, full.names=TRUE)
#get the subset of files we want to read
file_contents <- lapply(all_files[id], read.csv)
#rbind the file contents into one data frame
output <- do.call(rbind, file_contents)
#remove all NA values
output <- output[complete.cases(output), ]
#at this point output is a data.frame so display the output
table(output[["ID"]])
}
My current output is :
   2    4    8   10   12
1000  500  200  150  100
which is correct but I need it in column form so it can be understood by looking at it. The output I am trying to get to is below:
  id obs_total
1  2      1000
2  4       500
3  8       200
4 10       150
5 12       100
What am I missing here?
table outputs a contingency table, but you want a data frame. You can wrap as.data.frame(...) around your output to convert it.
as.data.frame(table(ID = output[["ID"]]))
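Putting it together with the column names from the desired output (the ID values and counts below are made up to mirror the question):

```r
# stand-in for the combined data frame built inside the function
output <- data.frame(ID = rep(c(2, 4, 8), times = c(3, 2, 1)))
freq_df <- as.data.frame(table(ID = output[["ID"]]))  # columns: ID, Freq
names(freq_df) <- c("id", "obs_total")                # rename to match the desired output
freq_df
#   id obs_total
# 1  2         3
# 2  4         2
# 3  8         1
```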
Assuming that the numbers are correct, it looks like you have everything you need; just transpose the data frame. Try this:
mat<-matrix(round(runif(10),3),nrow=2)
df<-as.data.frame(mat)
colnames(df)=c("1","2","3","4","5")
t(df)
In excel I have a table that looks like this:
     Data Freq
1 [35-39]    1
2 [40-44]    3
3 [45-49]    5
4 [50-54]   11
5 [55-59]    7
6 [60-64]    7
I'm trying to figure out a way of being able to read the value in the Data column as the intervals for calculations in the R Project software.
I need to calculate things as:
`(39-35)/2`
# read
library(xlsx)
d <- read.xlsx('data.xlsx',header=T,sheetIndex=1)
# reorder
dl <- do.call(rbind,strsplit(as.character(d$Data),split='-|\\[|\\]'))
d$b <- as.numeric(dl[,3])
d$a <- as.numeric(dl[,2])
# calculate
d$mid <- (d$b-d$a)/2+d$a
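The same parsing works without an Excel file; the data frame below is a toy stand-in for what read.xlsx would return, so you can see what the strsplit step produces:

```r
d <- data.frame(Data = c("[35-39]", "[40-44]"), Freq = c(1, 3))
# split on "-", "[", and "]": dl[,1] is the (empty) text before "[",
# dl[,2] and dl[,3] are the lower and upper bounds
dl <- do.call(rbind, strsplit(as.character(d$Data), split = '-|\\[|\\]'))
d$a <- as.numeric(dl[, 2])
d$b <- as.numeric(dl[, 3])
d$mid <- (d$b - d$a) / 2 + d$a   # 37, 42
```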
Another way that doesn't use libraries is to convert your Excel file into a csv (via Save As in Excel) and then read the data using read.csv.
xlsx uses rJava and needs Java. An alternative is readxl
library(readxl)
ed=read_excel("myfile.xlsx")