I need to save a number of tables in a single CSV file and am having difficulty seeing how to retain dimension names. I searched SO and the closest I found was:
How to get dimnames in xtable.table output?
The problem he has with xtable is the problem I've got with write.table – dimnames exist in the table (and prop.table and ftable as well if I use that) but get dropped by write.table. I'm using write.table not write.csv for append=T.
The dataset is from a survey and the aim is to create the complete set of crosstabs, with labelled axes. In this case, actual row/column labels are not important, only dimension labels. I'm new to R, so hope I haven't missed something obvious.
d<-read.csv('dataset.csv') # dataset with column headings, no row labels
cat('BEGIN\n',file='xtabs.csv')
for (i in 1:ncol(d)) {
for (j in 1:ncol(d)) {
cat(paste('\ni=',i,' j=',j,'\n'),file='xtabs.csv',append=T)
t<-table(d[,i],d[,j],dnn=c(names(d[i]),names(d[j])))
pt<-prop.table(t,1)
write.table(pt,'xtabs.csv',sep=',',dec='.',row.names=F,col.names=F,append=T)
print(pt) # shows dimnames in the console as expected
}
}
Try this:
tbl <- with(warpbreaks, table(wool, tension))
pt <- prop.table(tbl)
write.ftable(ftable(pt),file = "~/Desktop/table.csv", sep = ",",
quote = FALSE)
I'm possibly abusing ftables here, which are intended for multi-dimensional tabular data (i.e. more than two variables). But it's the only thing I've found that will write the table to a text file with (seemingly) the formatting you want.
Related
I have several .txt files which need to be imported to R as dataframes for some data analysis. One of these files has no EOL in any form, so I'm left wondering how I would go about to import that.
\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\"\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
This is how the first ~500 characters of that .txt file look like. The EOL would need to be placed like this:
\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\"
\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
Normally I would just gsub a "\n" to the places I need it to be, but there is no reoccurring string at the places where I would place a \n, so I don't think that gsub would work in this instance.
Seeing how the missing values are clearly indicated with NA, is there a function similar to read_delim that has a "col_number = x" argument? Like the first x values are the headers, the next x values are the values of the first row and so on and so forth?
If it changes anything, these .txt files are rather big (>300mb).
Big thank you to Julian_Hn. Works like a charm.
I would probably just read this in as a vector and then reformat as matrix with the number of columns you know are in the dataset. This essentially does what you want
str <- "\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\";\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA"
vec <- strsplit(str,";")[[1]]
//EDIT: add byrow = T To stay in the right format. Thanks Yuriy
table <- matrix(vec,ncol=23,nrow=3, byrow = T)
df <- as.data.frame(table)
I'm new to R and need to create a bunch of histograms that are named according to the population they came from. When I try running the loop without the "names" part, it works fine. The code below loops through the list of names and applies them in order, but I end up with 3,364 versions of the same exact histogram. If anyone has any suggestions, I'd really appreciate it.
popFiles <- list.files(pattern = "*.txt") # generates a list of the files I'm working with
popTables <- lapply(popFiles, read.table, header=TRUE, na.strings="NA")
popNames <- read.table(file.path("Path to file containing names", "popNamesR.txt"), header=FALSE,)
popNames <- as.matrix(popNames)
name <- NULL
table <- c(1:58)
for (table in popTables){
for (name in popNames){
pVals <- table$p
hist(pVals, breaks=20, xlab="P-val", main=name))
}
}
Try making a distinct iterator, and use that, rather than iterating over the table list itself. It's just easier to see what's going on. For example:
pdf("Myhistograms.pdf")
for(i in 1:length(popTables)){
table = popTables[[i]]
name = popNames[i]
pVals = table$p
hist(pVals, breaks=20, xlab="P-val", main=name))
}
dev.off()
In this case, your problem is that name and table are actually linked, but you have two for loops, so actually every combination of table and name are generated.
I have imported a CSV file to R but now I would like to extract a variable into a vector and analyse it separately. Could you please tell me how I could do that?
I know that the summary() function gives a rough idea but I would like to learn more.
I apologise if this is a trivial question but I have watched a number of tutorial videos and have not seen that anywhere.
Read data into data frame using read.csv. Get names of data frame. They should be the names of the CSV columns unless you've done something wrong. Use dollar-notation to get vectors by name. Try reading some tutorials instead of watching videos, then you can try stuff out.
d = read.csv("foo.csv")
names(d)
v = d$whatever # for example
hist(v) # for example
This is totally trivial stuff.
I assume you have use the read.csv() or the read.table() function to import your data in R. (You can have help directly in R with ? e.g. ?read.csv
So normally, you have a data.frame. And if you check the documentation the data.frame is described as a "[...]tightly coupled collections of variables which share many of the properties of matrices and of lists[...]"
So basically you can already handle your data as vector.
A quick research on SO gave back this two posts among others:
Converting a dataframe to a vector (by rows) and
Extract Column from data.frame as a Vector
And I am sure they are more relevant ones. Try some good tutorials on R (videos are not so formative in this case).
There is a ton of good ones on the Internet, e.g:
* http://www.introductoryr.co.uk/R_Resources_for_Beginners.html (which lists some)
or
* http://tryr.codeschool.com/
Anyways, one way to deal with your csv would be:
#import the data to R as a data.frame
mydata = read.csv(file="SomeFile.csv", header = TRUE, sep = ",",
quote = "\"",dec = ".", fill = TRUE, comment.char = "")
#extract a column to a vector
firstColumn = mydata$col1 # extract the column named "col1" of mydata to a vector
#This previous line is equivalent to:
firstColumn = mydata[,"col1"]
#extract a row to a vector
firstline = mydata[1,] #extract the first row of mydata to a vector
Edit: In some cases[1], you might need to coerce the data in a vector by applying functions such as as.numeric or as.character:
firstline=as.numeric(mydata[1,])#extract the first row of mydata to a vector
#Note: the entire row *has to be* numeric or compatible with that class
[1] e.g. it happened to me when I wanted to extract a row of a data.frame inside a nested function
I have a script that is working perfectly except that in my R cbind operation, adjacent to the numerical value that I require in the first row, is an 'X'.
Here is my script:
library(ncdf)
library(Kendall)
library(forecast)
library(zoo)
setwd("/home/cohara/RainfallData")
files=list.files(pattern="*.nc")
j=81
for (i in seq(1,9))
{
file<-open.ncdf(sprintf("/home/cohara/RainfallData/%s.nc",i))
year<-get.var.ncdf(file,"time")
data<-get.var.ncdf(file,"var61")
fit<-lm(data~year) #least sqaures regression
mean=rollmean(data,4,fill=NA)
kendall<-Kendall(data,year)
write.table(kendall[[2]],file="/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_-_89_years.csv",append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)
write.table(kendall[[1]],file="/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_-_89_years.csv",append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)
png(sprintf("./10 percent increase over %s years.png",j))
par(family="serif",mar=c(4,6,4,1),oma=c(1,1,1,1))
plot(year,data,pch="*",col=4,ylab="Precipitation (mm)",main=(sprintf("10 percent increase over %s years",j)),cex.lab=1.5,cex.main=2,ylim=c(800,1400),abline(fit,col="red",lty=1.5))
par(new=T)
plot(year,mean,type="l",xlab="year",ylab="Precipitation (mm)",cex.lab=1.5,ylim=c(800,1400),lty=1.5)
legend("bottomright",legend=c("Kendall tau = ",kendall[[1]]))
legend("bottomleft",legend=c("Kendall 2-tailed p-value = ",kendall[[2]]))
legend(x="topright",c("4 year moving average","Simple linear trend"),lty=1.5,col=c("black","red"),cex=1.2)
legend("topleft",c("Annual total"),pch="*",col="blue",cex=1.2)
dev.off()
j=j+1
}
tmp<-read.csv("/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_to_89_years.csv")
tmp2<-read.csv("/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_-_89_years.csv")
tmp<-cbind(tmp,tmp2)
tmp3<-read.csv("/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_to_89_years.csv")
tmp4<-read.csv("/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_-_89_years.csv")
tmp3<-cbind(tmp3,tmp4)
write.table(tmp,"/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_to_89_years.csv",sep="\t",row.names=FALSE)
write.table(tmp3,"/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_to_89_years.csv",sep="\t",row.names=FALSE)
The output looks like this, from the .csv files created:
X0.0190228056162596 X0.000701081415172666
0.0395622998 0.00531819
0.0126547674 0.0108218994
0.0077754743 0.0015568719
0.0001407317 0.002680057
0.0096391216 0.012719159
0.0107234037 0.0092436085
0.0503448173 0.0103918528
0.0167525802 0.0025036721
I want to be able to use excel functions on the data, so, for simplicity, I don't want row names (I'll be running this loop maybe a hundred times), but I need column names because otherwise the first set of values is cut off.
Can anyone tell me where the 'X' is coming from and how to get rid of it?
Thanks in advance,
Ciara
Here is what I think is going on. Start by running these small examples:
df1 <- read.csv(text = "0.0190228056162596, 0.000701081415172666
0.0395622998, 0.00531819
0.0126547674, 0.0108218994")
df2 <- read.csv(text = "0.0190228056162596, 0.000701081415172666
0.0395622998, 0.00531819
0.0126547674, 0.0108218994", header = FALSE)
df1
df2
str(df1)
str(df2)
names(df1)
names(df2)
make.names(c(0.0190228056162596, 0.000701081415172666))
Please read ?read.csv and about the header argument. As you will find, header = TRUE is default in read.csv. Thus, if the csv file you read lacks header, read.csv will still 'assume' that the file has a header, and use the values in the first row as a header. Another argument in read.csv is check.names, which defaults to TRUE:
If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names).
In your case, it seems that the data you read lack a header and that the first row is numbers only. read.csv will default treat this row as a header. make.names takes values in the first row (here numbers 0.0190228056162596, 0.000701081415172666), and spits out the 'syntactically valid variable names' X0.0190228056162596 and X0.000701081415172666. Which is not what you want.
Thus, you need to explicitly set header = FALSE to avoid that read.csvconvert the first row to (valid) variable names.
For next time, please provide a minimal, self contained example. Check these links for general ideas, and how to do it in R: here, here, here, and here
I have a CSV file in Excel to be read into R using the read.csv function. However, in the Excel file, some elements are blank, which indicate 0's. When I read the file into R, those elements are still blank. How can I fill these elements as 0's in R? It seems that is.na-like functions won't apply to this situation. Thanks!
Depends on how they're being read in to the R. Blank cells in a numeric column should usually be interpreted as NA, in which case
your_df$your_column[is.na(your_df$your_column)] <- 0
should work. Your question suggests that doesn't work, in which case they might be read in as empty characters. In that case,
your_df$your_column[your_df$your_column==""] <- 0
ought to do it. If you post a reproducible example (e.g. with a link to the file on Dropbox) it will be possible to be more specific.
Like Drew says the way to get NAs from blank is when you read in. Provide code and example output of the read in data for better responses. Also play around with str as you can see what classes are in the data, which is valuable info.
You may run into some hanky panky (i.e., the data is a factor or character and not a numeric vector) if you have blank cells and the column is numeric. This approach would address that:
## Make up some data
dat <- data.frame(matrix(c(1:3, "", "", 1:3, "", 1:3, rep("", 3), 5), 4))
data.frame(apply(dat, 2, function(x) {
x[x == ""] <- 0
as.numeric(x)
}))