Hi folks: I'm trying to write a vector of length 100 to a single-column .csv in R. Each time I try, I get two columns in the csv file: the first with the index numbers from the vector, the second with the contents of my vector. For example:
MyPath <- "~/rstudioshared/Data/HW3"
Files <- dir(MyPath)
write.csv(Files, "Names.csv", row.names = FALSE)
If I convert the vector to a data frame and then check its dimensions,
Files<-data.frame(Files)
dim(Files)
I get 100 rows by 1 column, and the column contains the names of the files in my directory folder. This is what I want.
Then I write the csv. When I open it outside of R, or read it back in and look at it, I get a 100 x 2 data frame where the first column contains the index numbers and the second column has the names of my files.
Why does this happen?
How do I write just the single column of data to the .csv?
Thanks!
Row names are written by write.csv() by default (and, by default, a data frame with n rows has row names 1, ..., n). You can see this by comparing, e.g.:
dat <- data.frame(mevar=rnorm(10))
# then compare what gets written by:
write.csv(dat, "outname1.csv")
# versus:
rownames(dat) <- letters[1:10]
write.csv(dat, "outname2.csv")
Just use write.csv(dat, "outname.csv", row.names=FALSE) and the row names won't show up.
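A minimal, self-contained sketch of the difference. The file names in files are made up (a stand-in for dir(MyPath)), and the output goes to temporary files so nothing in your working directory is touched:

```r
files <- c("a.txt", "b.txt", "c.txt")   # hypothetical stand-in for dir(MyPath)

# Default: write.csv() also writes the row names 1..n, which come back as an extra column
with_rn <- tempfile(fileext = ".csv")
write.csv(data.frame(Files = files), with_rn)
ncol(read.csv(with_rn))   # 2

# row.names = FALSE writes only the data column
no_rn <- tempfile(fileext = ".csv")
write.csv(data.frame(Files = files), no_rn, row.names = FALSE)
ncol(read.csv(no_rn))     # 1
```

Wrapping the vector in a named data frame also gives the column a readable header (Files) instead of the default x.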
And a suggestion: it might be easier/cleaner to write the vector directly to a text file with writeLines(your_vector, "your_outfile.txt") (you can still use read.csv(..., header = FALSE) to read it back in if you prefer that :p).
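A short sketch of the writeLines() route, again with hypothetical file names and a temporary output file. Because the file has no header line, reading it with read.csv() needs header = FALSE, otherwise the first name would be taken as the column header:

```r
files <- c("a.txt", "b.txt", "c.txt")   # hypothetical stand-in for dir(MyPath)
out <- tempfile(fileext = ".txt")

writeLines(files, out)                  # one element per line: no index, no header

readLines(out)                          # back as a character vector
back <- read.csv(out, header = FALSE)   # or as a one-column data frame
dim(back)                               # 3 1
```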
Related
I'm trying to pull data from a file, but only pull certain columns based on the column name.
I have this bit of code:
library(data.table)  # provides fread()
filepath <- ([my filepath])
files <- list.files(filepath, full.names = TRUE)
newData <- fread(file, select = selectCols)  # file: one element of files
selectCols is a character vector of column names. However, in the data I'm pulling, underscores may be placed differently in each file for the same column.
Here's an example:
PERIOD_ID
PERIOD_ID_
_PERIOD_ID_
And so on. I know I can use gsub to change the column names once the data is already pulled:
colnames(newData) <- gsub("_", "", colnames(newData))
Then I can select by column name, but given that it's a lot of data I'm not sure this is the most efficient idea.
Is there a way to ignore underscores or other characters within the fread function?
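One possible approach, sketched under assumptions: rather than renaming after the full read, match the wanted names against each file's header first and pass only the matching header names to select. The helper match_cols() below is hypothetical (not part of data.table); it compares names after stripping underscores. The commented lines assume data.table is installed and rely on fread(..., nrows = 0), which reads just the column names.

```r
# Hypothetical helper: return the header names whose underscore-free form
# matches one of the wanted names (also compared without underscores)
match_cols <- function(header, wanted) {
  strip <- function(x) gsub("_", "", x, fixed = TRUE)
  header[strip(header) %in% strip(wanted)]
}

match_cols(c("_PERIOD_ID_", "VALUE_", "OTHER"), c("PERIOD_ID", "VALUE"))
# "_PERIOD_ID_" "VALUE_"

# With data.table installed, per file:
# header  <- names(data.table::fread(file, nrows = 0))
# newData <- data.table::fread(file, select = match_cols(header, selectCols))
```

The selected columns keep their original spellings, so the gsub() renaming is still needed afterwards, but only on the handful of columns actually read.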
I would like to assign names to rows in R, but so far I have only found ways to assign names to columns. My data is in two columns: the first column (geo) holds the name of the specific location I'm investigating, and the second column (skada) is the observed value at that location. To clarify, I want to be able to assign a name to every location instead of just having them all in one .txt file, so that the data is easier to work with. Does anyone with more experience know how to handle this in R?
First you need to import the data into your global environment. Try the function read.table().
To name rows, try the following (assuming your data.frame is named df):
rownames(df) <- df[, "geo"]
df <- df[, -1, drop = FALSE]  # drop = FALSE keeps a data frame (and its row names)
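A small sketch of the two lines above on made-up data (the place names and values are hypothetical). It uses df[, -1, drop = FALSE] rather than df[, -1]: with only two columns, plain df[, -1] returns a bare vector and the new row names are lost.

```r
# Hypothetical data in the shape described in the question
df <- data.frame(geo = c("Lund", "Malmo", "Ystad"),
                 skada = c(12, 7, 3))

rownames(df) <- df[, "geo"]     # use the location names as row names
df <- df[, -1, drop = FALSE]    # drop the geo column but keep a data frame

df["Malmo", "skada"]            # look a value up by location name: 7
```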
Well, your question is not that clear...
I assume you are trying to create a data.frame with named rows. If you look at the data.frame help page (?data.frame), you can see the description of the row.names parameter:
NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
which means that, when you create the data.frame, you can specify either the row names themselves or the column containing them. The former can be achieved as follows:
d = data.frame(x = rnorm(10),             # 10 normally distributed random values
               y = rnorm(10),             # 10 normally distributed random values
               row.names = letters[1:10]  # use the first 10 letters as row names
)
while the latter is:
d = data.frame(x = rnorm(10),      # 10 normally distributed random values
               y = rnorm(10),      # 10 normally distributed random values
               r = letters[1:10],  # the first 10 letters
               row.names = 3       # the column holding the row names is the 3rd
)
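A quick check of both forms on 3 rows instead of 10, plus the by-name variant mentioned in the help text quoted above. When row.names points at a column, that column is used for the names and removed from the resulting data frame:

```r
# Row names given directly as a vector
d1 <- data.frame(x = rnorm(3), y = rnorm(3), row.names = c("a", "b", "c"))

# Row names taken from a column, by position or by name
d2 <- data.frame(x = rnorm(3), y = rnorm(3), r = c("a", "b", "c"), row.names = 3)
d3 <- data.frame(x = rnorm(3), y = rnorm(3), r = c("a", "b", "c"), row.names = "r")

rownames(d2)   # "a" "b" "c"
names(d2)      # "x" "y"  (column r is gone)
```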
If you are reading the data from a file, I will assume you are using read.table. Many of its parameters are the same as those of data.frame; in particular, you will find that the row.names parameter works the same way:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
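A minimal sketch of the by-name form with read.table. Inline text stands in for a hypothetical whitespace-separated file with a header; with a real file you would pass its path instead of text =.

```r
# Inline text stands in for a hypothetical file with a header line
txt <- "geo skada
Lund 12
Malmo 7"

df <- read.table(text = txt, header = TRUE, row.names = "geo")
df   # a single column, skada, with rows named Lund and Malmo
```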
Finally, if you have already read the data.frame and you want to change the row names, Pierre's answer is your solution.
Is it possible to read multiple CSV (Excel) files into R? All of the CSV files have the same 4 columns: the first is character, the second and third are numeric, and the fourth is integer. I want to combine the data in each numeric column and find the mean.
I can get the CSV file names into R with
data <- list.files(directory)
myFiles <- paste(directory,data[id],sep="/")
I am unable to get the numbers from the individual columns, combine them, and find the mean.
I am completely new to R and any advice is appreciated.
Here is a simple method:
Prep: Generate dummy data: (You already have this)
dummy <- data.frame(names = rep("a", 4), a = 1:4, b = 5:8)
write.csv(dummy, file = "data01.csv", row.names = FALSE)
write.csv(dummy, file = "data02.csv", row.names = FALSE)
write.csv(dummy, file = "data03.csv", row.names = FALSE)
Step0: Load the file names: (just like you are doing)
data <- dir(getwd(), pattern = "\\.csv$")  # only names ending in .csv
Step1: Read and combine:
DF <- do.call(rbind, lapply(data, function(fn) read.csv(file = fn, header = TRUE)))
DF
Step2: Find mean of appropriate columns:
apply(DF[,2:3],2,mean)
Hope that helps!!
EDIT: If you are having trouble with file paths, see ?file.path.
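A self-contained variant of the steps above, sketched under a few assumptions: the files live in a temporary folder (csv_demo is a made-up name), list.files(..., full.names = TRUE) returns complete paths so the working directory doesn't matter, and colMeans() stands in for apply(..., 2, mean) (same result for numeric columns).

```r
# Work in a temporary folder so the sketch does not touch your own files
dir_path <- file.path(tempdir(), "csv_demo")
dir.create(dir_path, showWarnings = FALSE)

# Dummy data, same shape as in the answer
dummy <- data.frame(names = rep("a", 4), a = 1:4, b = 5:8)
for (i in 1:3) {
  write.csv(dummy, file.path(dir_path, sprintf("data%02d.csv", i)), row.names = FALSE)
}

# full.names = TRUE gives complete paths, so read.csv() finds the files from anywhere
paths <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

DF <- do.call(rbind, lapply(paths, read.csv))
means <- colMeans(DF[, c("a", "b")])
means   # a = 2.5, b = 6.5
```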
I am importing a csv of stock data into R whose column names are stock tickers that start with a number and contain a space, e.g. "5560 JP". After reading into R, the column names get an "X" prefix and the space is replaced by ".", e.g. "X5560.JP". After all the work is done in R, I want to write the processed data back to a new csv, but with the original column names, e.g. "5560 JP" instead of "X5560.JP". How can I do that?
Thank you!
When you use write.table to save your data to a CSV file, you can set the column names to whatever you like with the col.names argument. (write.csv ignores col.names with a warning, so use write.table with sep = "," instead.)
But that assumes you still have the original column names available.
Once you've read in the data and R has converted the names, you've lost that information. To get around this, you can suppress the conversion with check.names = FALSE, save the original names, and then convert them yourself:
df <- read.csv("mydata.csv", check.names=FALSE)
orig.cols <- colnames(df)
colnames(df) <- make.names(colnames(df))
[your original code]
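write.table(df, "mydata_out.csv", sep = ",", qmethod = "double",  # output file name is a placeholder
            row.names = FALSE, col.names = orig.cols)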
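A runnable round trip of this idea, with made-up tickers and prices written to a temporary file. It assumes the processing step keeps the same columns in the same order, since orig.cols is matched to the columns by position. Equivalently, you could set colnames(df) <- orig.cols just before calling write.csv(df, ..., row.names = FALSE).

```r
# Hypothetical input file with ticker-style column names
csv_in <- tempfile(fileext = ".csv")
writeLines(c("Date,5560 JP,7203 JP", "2020-01-06,100,200"), csv_in)

df <- read.csv(csv_in, check.names = FALSE)
orig.cols <- colnames(df)                  # "Date" "5560 JP" "7203 JP"
colnames(df) <- make.names(colnames(df))   # "Date" "X5560.JP" "X7203.JP"

df$X5560.JP <- df$X5560.JP * 2             # stand-in for the real processing

csv_out <- tempfile(fileext = ".csv")
write.table(df, csv_out, sep = ",", qmethod = "double",
            row.names = FALSE, col.names = orig.cols)
readLines(csv_out)[1]                      # header uses the original names again
```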
I have several txt files, each containing 3 columns (A, B, C).
Column A is common to all the txt files. I want to combine the files so that column A appears only once, followed by columns B and C from each file. I used cbind, but it creates a data frame with column A repeated, which I don't want. Here is the R code I tried:
data <- read.delim(file.choose(),header=T)
data2 <- read.delim(file.choose(),header=T)
data3 <- cbind(data, data2)
write.table(data3,file="sample.txt",sep="\t",col.names=NA)
Unless your files are all sorted in exactly the same order, you'll need to use merge:
dat <- merge(data,data2,by="A")
dat <- merge(dat, data3, by = "A")  # data3 here: a third file, read the same way
This should automatically prevent you from having multiple A's, since merge uses A as the key column and keeps only one copy of it. Clashing B and C columns get .x/.y suffixes, so you'll likely want to rename the duplicate B's and C's before merging.
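A sketch of the rename-then-merge approach on three small hypothetical data frames whose rows are in different orders. It swaps the chained merge() calls for Reduce(), which applies the same pairwise merge across a list of any length; the _1/_2/_3 suffixes are an arbitrary naming choice.

```r
# Three hypothetical inputs standing in for the txt files (rows in different orders)
d1 <- data.frame(A = c("g1", "g2", "g3"), B = 1:3,   C = 4:6)
d2 <- data.frame(A = c("g3", "g1", "g2"), B = 7:9,   C = 10:12)
d3 <- data.frame(A = c("g2", "g3", "g1"), B = 13:15, C = 16:18)

dfs <- list(d1, d2, d3)
# Give B and C a file-specific suffix so they stay distinguishable after merging
for (i in seq_along(dfs)) {
  names(dfs[[i]])[-1] <- paste0(names(dfs[[i]])[-1], "_", i)
}

# Merge all data frames pairwise on the key column A
combined <- Reduce(function(x, y) merge(x, y, by = "A"), dfs)
combined   # columns A, B_1, C_1, B_2, C_2, B_3, C_3; one row per value of A
```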