Tucson File is a standard format for tree-ring dataset (see : http://www.cybis.se/wiki/index.php?title=Tucson_format) for a precise description.
The aim is to convert Excel files with 1st Column as YEARS, and other columns as MEASUREMENTS into that RWL format to run DplR package on R.
Some clues are already on (creating a .rwl object) but actually, Chron() and Detrend() functions doen't handle column files as they introduce NAs by coercion.
I've been working many ways to built a "brutal" loop without succeeding, but I'm wondering if a smarter way is possible under R environment ?
Anyway, if somebody here is able to help on a loop I'll take it :)
Thanks a lot !
Alex,
OK, DplR Package have a write.tucson()function (o_O)
library("dplR")
dat <- read.table ("column.txt", header = T, row.names = 1)
write.tucson (dat, "tucson.txt", prec = 0.01, long.names = TRUE)
Related
I am not very familiar with loops in R, and am having a hard time stating a variable such that it is recognized by a function, DESeqDataSetFromMatrix.
pls is a table of integers. metaData is a data frame containing sample IDs and conditions corresponding to pls. I verified that the below steps run error-free with the individual elements of cond run successfully .
I reviewed relevant posts on referencing variables in R:
How to reference variable names in a for loop in R?
How to reference a variable in a for loop?
Based on these posts, I modified i in line 3 with single brackets, double brackets and "as.name". No luck. DESeqDataSetFromMatrix is reading the literal text after ~ and spits out an error.
cond=c("wt","dhx","mpp","taz")
for(i in cond){
dds <- DESeqDataSetFromMatrix(countData=pls,colData=metaData,design=~i, tidy = TRUE)
"sizeFactors"(dds) <- 1
paste0("PLS",i)<-DESeq(dds)
pdf <- paste(i,"-PLS_MA.pdf",sep="")
tsv <- paste(i,"-PLS.tsv",sep="")
pdf(file=pdf,paper = "a4r", width = 0, height = 0)
plotMA(paste0("PLS",i),ylim=c(-10,10))
dev.off()
write.table(results(paste0("PLS",i)),file = tsv,quote=FALSE, sep='\t', col.names = NA)
}
With brackets, an unexpected symbol error populates.
With i alone, DESEqDataSetFromMatrix tries to read "i" from my metaData column.
Is R just not capable of reading variables in some situations? Generally speaking, is it better to write loops outside of R in a more straightforward language, then push as standalone commands? Thanks for the help—I hope there is an easy fix.
For anyone else who may be having trouble looping with DESeq2 functions, comments above addressed my issue.
Correct input:
dds <- DESeqDataSetFromMatrix(countData=pls,colData=metaData,design=as.formula(paste0("~", i)), tidy = TRUE)
as.formula worked well with all DESeq functions that I tested.
reformulate(i) also worked well in most situations.
Thanks, everyone for the help!
I am analysing student level data from PISA 2015. The data is available in SPSS format here
I can load the data into R using the read_sav function in the haven package. I need to be able to edit the data in R and then save/export the data in SPSS format with the original value labels that are included in the SPSS download intact. The code I have used is:
library(haven)
student<-read_sav("CY6_MS_CMB_STU_QQQ.sav",user_na = T)
student2<-data.frame(student)
#some edits to data
write_sav(student2,"testdata1.sav")
When my colleague (who works in SPSS) tries to open the "testdata1.sav" the value labels are missing. I've read through the haven documentation and can't seem to find a solution for this. I have also tried read/write.spss in the foreign package but have issues loading in the dataset.
I am using R version 3.4.0 and the latest build of haven.
Does anyone know if there is a solution for this? I'd be very grateful of your help. Please let me know if you require any additional information to answer this.
library(foreign)
df <- read.spss("spss_file.sav", to.data.frame = TRUE)
This may not be exactly what you are looking for, because it uses the labels as the data. So if you have an SPSS file with 0 for "Male" and 1 for "Female," you will have a df with values that are all Males and Females. It gets you one step further, but perhaps isn't the whole solution. I'm working on the same problem and will let you know what else I find.
library ("sjlabelled")
student <- sjlabelled::read_spss("CY6_MS_CMB_STU_QQQ.sav")
student2 <-student
write_spss(student2,"testdata1.sav")
I did not try and hope it works. The sjlabelled package is good with non-ascii-characters as German Umlaute.
But keep in mind, that R saves the labels as attributes. These attributes are lost, when doing some data transformations (as subsetting data for example). When lost in R they won't show up in SPSS of course. The sjlabelled::copy_labels function is helpful in those cases:
student2 <- copy_labels(student2, student) #after data transformations and before export to spss
I think you need to recover the value labels in the dataframe after importing dataset into R. Then write the that dataframe into sav file.
#load library
libray(haven)
# load dataset
student<-read_sav("CY6_MS_CMB_STU_QQQ.sav",user_na = T)
#map to find class of each columns
map_dataset<-map(student, function(x)attr(x, "class"))
#Run for loop to identify all Factors with haven-labelled
factor_variable<-c()
for(i in 1:length(map_dataset)){
if(map_dataset[i]!="NULL"){
name<-names(map_dataset[i])
factor_variable<-c(factor_variable,name)
}
}
#convert all haven labelled variables into factor
student2<-student %>%
mutate_at(vars(factor_variable), as_factor)
#write dataset
write_sav(student2, "testdata1.sav")
I´m trying to do redundancy analysis (RDA) on my data in R. The data frame I´m using was uploaded as a Microsoft Excel csv file. The data frame looks something like this:
site biomass index
1 0.001 1.5
2 0.122 2.3
3 0.255 4.9
When trying to create a formula for the RDA, I constantly get the following message: "Error in formula.data.frame(object, env = baseenv()) :
cannot create a formula from a zero-column data frame"
Does anyone know how I can change my data frame so that I no longer get this error message?
Thanks in advance!
You load all your data to row.names instead of columns
you used default separator ',' while your data is separated by
';'
you specified row.names = 1 so that first column (and the
only one as the separator is wrong) goes to row.names.
that's why your data.frame has no columns. To fix this, use
read.csv('data.csv', sep = ';', row.names=NULL)
Problem solved :) Thanks for the your answers. I tried the suggestions but what ended up working was to import the data with the <- read.csv2("data.csv", row.names=1), so basically to use read.csv2 instead of read.csv, as it seems to be used in many European locales.
There is more info on the difference between csv and csv2 here: Difference between read.csv() and read.csv2() in R
I have imported a CSV file to R but now I would like to extract a variable into a vector and analyse it separately. Could you please tell me how I could do that?
I know that the summary() function gives a rough idea but I would like to learn more.
I apologise if this is a trivial question but I have watched a number of tutorial videos and have not seen that anywhere.
Read data into data frame using read.csv. Get names of data frame. They should be the names of the CSV columns unless you've done something wrong. Use dollar-notation to get vectors by name. Try reading some tutorials instead of watching videos, then you can try stuff out.
d = read.csv("foo.csv")
names(d)
v = d$whatever # for example
hist(v) # for example
This is totally trivial stuff.
I assume you have use the read.csv() or the read.table() function to import your data in R. (You can have help directly in R with ? e.g. ?read.csv
So normally, you have a data.frame. And if you check the documentation the data.frame is described as a "[...]tightly coupled collections of variables which share many of the properties of matrices and of lists[...]"
So basically you can already handle your data as vector.
A quick research on SO gave back this two posts among others:
Converting a dataframe to a vector (by rows) and
Extract Column from data.frame as a Vector
And I am sure they are more relevant ones. Try some good tutorials on R (videos are not so formative in this case).
There is a ton of good ones on the Internet, e.g:
* http://www.introductoryr.co.uk/R_Resources_for_Beginners.html (which lists some)
or
* http://tryr.codeschool.com/
Anyways, one way to deal with your csv would be:
#import the data to R as a data.frame
mydata = read.csv(file="SomeFile.csv", header = TRUE, sep = ",",
quote = "\"",dec = ".", fill = TRUE, comment.char = "")
#extract a column to a vector
firstColumn = mydata$col1 # extract the column named "col1" of mydata to a vector
#This previous line is equivalent to:
firstColumn = mydata[,"col1"]
#extract a row to a vector
firstline = mydata[1,] #extract the first row of mydata to a vector
Edit: In some cases[1], you might need to coerce the data in a vector by applying functions such as as.numeric or as.character:
firstline=as.numeric(mydata[1,])#extract the first row of mydata to a vector
#Note: the entire row *has to be* numeric or compatible with that class
[1] e.g. it happened to me when I wanted to extract a row of a data.frame inside a nested function
I am aware that there are similar questions on this site, however, none of them seem to answer my question sufficiently.
This is what I have done so far:
I have a csv file which I open in excel. I manipulate the columns algebraically to obtain a new column "A". I import the file into R using read.csv() and the entries in column A are stored as factors - I want them to be stored as numeric. I find this question on the topic:
Imported a csv-dataset to R but the values becomes factors
Following the advice, I include stringsAsFactors = FALSE as an argument in read.csv(), however, as Hong Ooi suggested in the page linked above, this doesn't cause the entries in column A to be stored as numeric values.
A possible solution is to use the advice given in the following page:
How to convert a factor to an integer\numeric without a loss of information?
however, I would like a cleaner solution i.e. a way to import the file so that the entries of column entries are stored as numeric values.
Cheers for any help!
Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R.
Please try the following: Read the raw file (before any excel manipulation) into R using read.csv(... stringsAsFactors=FALSE). [If that does not work, please take a look at ?read.table (which read.csv wraps), however there may be some other underlying issue].
For example:
delim = "," # or is it "\t" ?
dec = "." # or is it "," ?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)
Then, let's say your numeric columns is column 4
myDataFrame[, 4] <- as.numeric(myDataFrame[, 4]) # you can also refer to the column by "itsName"
Lastly, if you need any help with accomplishing in R the same tasks that you've done in Excel, there are plenty of folks here who would be happy to help you out
In read.table (and its relatives) it is the na.strings argument which specifies which strings are to be interpreted as missing values NA. The default value is na.strings = "NA"
If missing values in an otherwise numeric variable column are coded as something else than "NA", e.g. "." or "N/A", these rows will be interpreted as character, and then the whole column is converted to character.
Thus, if your missing values are some else than "NA", you need to specify them in na.strings.
If you're dealing with large datasets (i.e. datasets with a high number of columns), the solution noted above can be manually cumbersome, and requires you to know which columns are numeric a priori.
Try this instead.
char_data <- read.csv(input_filename, stringsAsFactors = F)
num_data <- data.frame(data.matrix(char_data))
numeric_columns <- sapply(num_data,function(x){mean(as.numeric(is.na(x)))<0.5})
final_data <- data.frame(num_data[,numeric_columns], char_data[,!numeric_columns])
The code does the following:
Imports your data as character columns.
Creates an instance of your data as numeric columns.
Identifies which columns from your data are numeric (assuming columns with less than 50% NAs upon converting your data to numeric are indeed numeric).
Merging the numeric and character columns into a final dataset.
This essentially automates the import of your .csv file by preserving the data types of the original columns (as character and numeric).
Including this in the read.csv command worked for me: strip.white = TRUE
(I found this solution here.)
version for data.table based on code from dmanuge :
convNumValues<-function(ds){
ds<-data.table(ds)
dsnum<-data.table(data.matrix(ds))
num_cols <- sapply(dsnum,function(x){mean(as.numeric(is.na(x)))<0.5})
nds <- data.table( dsnum[, .SD, .SDcols=attributes(num_cols)$names[which(num_cols)]]
,ds[, .SD, .SDcols=attributes(num_cols)$names[which(!num_cols)]] )
return(nds)
}
I had a similar problem. Based on Joshua's premise that excel was the problem I looked at it and found that the numbers were formatted with commas between every third digit. Reformatting without commas fixed the problem.
So, I had the similar situation here in my data file when I readin as a csv. All the numeric value were turned into char. But in my file there was a value with a word "Filtered" instead of NA. I converted "Filtered" to NA in vim editor of linux terminal with a command <%s/Filtered/NA/g> and saved this file and later used it and read it in R, all the values were num type and not char type any more.
Looks like character value "Filtered" was inducing all values to be char format.
Charu
Hello #Shawn Hemelstrand here are the steps in detail below:
example matrix file.csv having 'Filtered' word in it
I opened the file.csv in linux command terminal
vi file.csv
then press "Esc shift:"
and type the following command at the bottom
"%s/Filtered/NA/g"
press enter
then press "Esc shift:"
write "wq" at the bottom (this save the file and quit vim editor)
then in R script I read the file
data<- read.csv("file.csv", sep = ',', header = TRUE)
str(data)
All columns were num type which were earlier char type.
In case you need more help, it would be easier to share your txt or csv file.