I've been a longtime user of xmgrace/grace, but today I'm stumped. I have a .dat file that is simply two columns of space-separated X Y data like I'm accustomed to plotting. But for some reason the program is only using the Y information and plotting X as the integer sequence/row numbers. Here's a sample of the data:
$ cat test.dat
0.016426149 0.91780442
0.016559154 1.942617893
0.016692159 1.937870622
0.016758662 2.160227537
0.016825165 1.464688301
0.016891667 2.413390636
0.017024672 2.62378788
0.017024672 2.396263838
0.01722418 2.30436182
But when I run xmgrace test.dat (or try Data > Import > ASCII...), the x-axis goes from 0 to 14 instead of 0.0164 to 0.0172.
I don't see any hidden characters in the file (used :set list in vi)...
If this helps: The file was originally a CSV that I exported from Excel on my Mac, and used vi to replace the commas with spaces. So...there are no tabs in the file, and no Windows-style ^M's.
How do I get xmgrace to use my x information properly?
Related
How would one write an R data frame to the SAS xpt format and specify the length of each column? For example, in a column of text variables the longest string is 157 characters; however, I'd like the field length attribute to be 200 characters.
The package haven does not seem to have this option and the package SASxport's documentation is less than clear on this issue.
The SASformat() and SASiformat() functions are used to set an attribute on an R object that sets its format when written to a SAS xport file. To set a data frame column to a 200 character format, use the following approach:
SASformat(mydata$var) <- 'CHAR200.'
SASiformat(mydata$var) <- 'CHAR200.'
Then use write.xport() to write the data frame to a SAS xport format.
See page 17 of the SASxport package documentation for details.
SASxport is an old package, so you'll need to load an older version of Hmisc to get it to work properly, per another SO question.
However, on reading the file into SAS it uses the length of the longest string in any observation to set the length of the column, regardless of the format and informat attributes. Therefore, one must write at least one observation containing trailing blanks to the desired length in order for SAS to set the length to the desired size. Ironically, this makes the format and informat superfluous.
This can be accomplished with the str_c() function from the stringr package.
Putting it all together...
library("devtools")
install_version("Hmisc", version = "3.17-2")
library(SASxport)
library(Hmisc)
## manually create a data set
data <- data.frame(x = c(1, 2, NA, NA),
                   y = c('a', 'B', NA, '*'),
                   z = c("this is a test", "line 2",
                         "another text string", "bottom line"))
# workaround - extend the string variable to desired length (30 characters) by
# adding trailing blanks, using stringr::str_c() function
library(stringr)
data$z <- sapply(data$z, function(x) str_c(x, str_dup(" ", 30 - nchar(x))))
nchar(data$z)
# write to SAS XPORT file
tmp <- tempfile(fileext = ".dat")
write.xport( data, file = tmp )
We'll read the file into SAS and use lengthc() to check the size of the z column.
libname testlib xport '/folders/myfolders/xport.dat';
proc copy in=testlib out=work;
run;
data data;
set data;
lenZ = lengthc(z);
run;
...and the output shows lenZ = 30 for every observation, confirming that SAS set the column length from the padded strings.
How can I import data from a .xlsx file into R so that numbers are represented as numbers, when their original decimal separator is comma not a dot?
The only package I know of for dealing with Excel files is readxl from the tidyverse.
I'm looking for a solution that doesn't require opening and editing the Excel files in any other software (and can deal with hundreds of columns to import). If that were an option, I'd export all the Excel files to .csv and import them using tools I know of that can take the dec= argument.
So far my best working solution is to import numbers as characters and then transform it:
library(dplyr)
library(stringr)
var1<- c("2,1", "3,2", "4,5")
var2<- c("1,2", "3,33", "5,55")
var3<- c("3,44", "2,2", "8,88")
df<- data.frame(cbind(var1, var2, var3))
df %>%
  mutate_at(vars(contains("var")),
            str_replace, pattern = ",", replacement = "\\.") %>%
  mutate_at(vars(contains("var")), funs(as.numeric))
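Note that mutate_at() and funs() are deprecated in current dplyr. A sketch of the same conversion using across() (assuming dplyr >= 1.0; the data is the example from above):

```r
library(dplyr)

df <- data.frame(var1 = c("2,1", "3,2", "4,5"),
                 var2 = c("1,2", "3,33", "5,55"),
                 var3 = c("3,44", "2,2", "8,88"))

# Replace the comma with a period and convert in one step.
df <- df %>%
  mutate(across(contains("var"),
                ~ as.numeric(sub(",", ".", .x, fixed = TRUE))))

df$var1  # 2.1 3.2 4.5
```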
I suspect strongly that there is some other reason these columns are being read as character, most likely that they are the dreaded "Number Stored as Text".
For ordinary numbers (stored as numbers), after switching to comma as decimal separator either for an individual file or in the overall system settings, readxl::read_excel reads in a numeric properly. (This is on my Windows system.) Even when adding a character to one of the cells in that column or setting col_types="text", I get the number read in using a period as decimal, not as comma, giving more evidence that readxl is using the internally stored data type.
The only way I have gotten R to read in a comma as a decimal is when the data is stored in Excel as text instead of as numeric. (You can enter this by prefacing the number with a single quote, like '1,7.) I then get a little green triangle in the corner of the cell, which gives the popup warning "Number Stored as Text". In my exploration, I was surprised to discover that Excel will do calculations on numbers stored as text, so that's not a valid way of checking for this.
It's pretty easy to replace the "," with a "." and recast the column as numeric. Example:
> x <- c('1,00','2,00','3,00')
> df <- data.frame(x)
> df
x
1 1,00
2 2,00
3 3,00
> df$x <- gsub(',','.',df$x)
> df$x <- as.numeric(df$x)
> df
x
1 1
2 2
3 3
> class(df$x)
[1] "numeric"
Just using base R and gsub.
I just had the same problem while dealing with an Excel spreadsheet I had received from a colleague. After I had tried to import the file using readxl (which failed), I converted the file into a csv file hoping to solve the problem using read_delim and fiddling with the locale and decimal sign options. But the problem was still there, no matter which options I used.
Here is the solution that worked for me: I found out that the characters that were used in the cells containing the missing values (. in my case) were causing trouble. I went back to the Excel file, replaced . in all cells with missing values with blanks while just keeping the default option for the decimals (,). After that, all columns were imported correctly as numeric using readxl.
If you face this problem with your decimal separator set to "." (and "." also marking missing values), make sure to tick the box saying "Match entire cell contents" in Excel before replacing all instances of ".", so that the decimal points inside numbers are left alone.
I'm sure this is simple, but I'm not coming across an answer. I would like to import a data frame into R without processing the lines in a text editor first. Essentially, I want R to do it on read in. So all lines containing
FRAME 1 of ***
OR
ATOM-WISE TOTAL CONTACT ENERGY
will be skipped, deleted or ignored.
And all that will be left is;
Chain Resnum Atom number Energy(kcal/mol)
ATOM C 500 1519 -2.1286
ATOM C 500 1520 -1.1334
ATOM C 500 1521 -0.8180
ATOM C 500 1522 -0.7727
Is there a simple solution to this? I'm not sure which scan() or read.table() arguments would work.
EDIT
I was able to use readLines and gsub to read in the file and remove the (un)necessary lines. I omitted the "" left from the deleted words and now I am trying to convert the character df to a regular(numeric) df. When I use data.frame(x) or as.data.frame(x) I am left with a data frame with 100K rows and only one variable. There should be at least 5 variables.
readLines gives you a vector with one character string for each line of the file. So you have to split these strings into the elements you want before you convert to a dataframe. If you have nice space-separated values, try:
m = matrix(unlist(strsplit(data, " +")), ncol=5, byrow=TRUE)
# where 'data' is the name of the vector of strings
df = data.frame(m, stringsAsFactors=FALSE)
Then for each column with numeric data, use as.numeric() on the column to convert.
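Putting the answer together as a runnable sketch (the two input lines are copied from the question; the column names are my own guesses):

```r
# Hypothetical input: a vector of lines shaped like the ATOM records above,
# e.g. what readLines() returns after filtering out the unwanted lines.
data <- c("ATOM C 500 1519 -2.1286",
          "ATOM C 500 1520 -1.1334")

# Split each line on runs of whitespace and reshape into a 5-column matrix.
m  <- matrix(unlist(strsplit(data, " +")), ncol = 5, byrow = TRUE)
df <- data.frame(m, stringsAsFactors = FALSE)
names(df) <- c("record", "chain", "resnum", "atom", "energy")

# Convert the numeric columns.
df$resnum <- as.numeric(df$resnum)
df$atom   <- as.numeric(df$atom)
df$energy <- as.numeric(df$energy)

df$energy  # -2.1286 -1.1334
```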
I'd like to export a random value defined in R as a vector (or any other object) to a specific location in a text file. With the use of read.fwf I managed to read data that is not csv or tab delimited (based on location in the file), but now I cannot find a suitable way to write/export some value to a selected (defined) line/row and column in a txt file. I would appreciate any help or suggestions. I looked at write.table, sink, and some other options for data export, but none of them worked, or at least I was not able to complete the task with them.
You don't need to use read.fwf if you just want to replace specific characters. Instead, scan in the file line by line as a vector of character strings. Then you can use substring<- to replace specific positions by line and column.
Here's a simple example:
mydat <- scan(text='1234567890\n2345678901\n3456789012', what='character')
mydat
# [1] "1234567890" "2345678901" "3456789012"
substring(mydat[2],5,5) <- 'X'
mydat
# [1] "1234567890" "2345X78901" "3456789012"
substring(mydat[3],1,1) <- 'Y'
mydat
# [1] "1234567890" "2345X78901" "Y456789012"
The result can be written back to file using writeLines:
> writeLines(mydat)
1234567890
2345X78901
Y456789012
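End to end, the same idea applied to a file on disk (the file here is a temp file created just for the sketch):

```r
# Round-trip sketch: read a fixed-width file, overwrite the characters at a
# given line/column position, and write the result back.
path <- tempfile(fileext = ".txt")
writeLines(c("1234567890", "2345678901"), path)

lines <- readLines(path)
substring(lines[2], 5, 5) <- "X"   # row 2, column 5
writeLines(lines, path)

readLines(path)  # "1234567890" "2345X78901"
```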
I am aware that there are similar questions on this site, however, none of them seem to answer my question sufficiently.
This is what I have done so far:
I have a csv file which I open in excel. I manipulate the columns algebraically to obtain a new column "A". I import the file into R using read.csv() and the entries in column A are stored as factors - I want them to be stored as numeric. I find this question on the topic:
Imported a csv-dataset to R but the values becomes factors
Following the advice, I include stringsAsFactors = FALSE as an argument in read.csv(), however, as Hong Ooi suggested in the page linked above, this doesn't cause the entries in column A to be stored as numeric values.
A possible solution is to use the advice given in the following page:
How to convert a factor to an integer\numeric without a loss of information?
however, I would like a cleaner solution, i.e. a way to import the file so that the entries of column A are stored as numeric values.
Cheers for any help!
Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R.
Please try the following: Read the raw file (before any Excel manipulation) into R using read.csv(..., stringsAsFactors=FALSE). [If that does not work, please take a look at ?read.table (which read.csv wraps); however, there may be some other underlying issue.]
For example:
delim = "," # or is it "\t" ?
dec = "." # or is it "," ?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)
Then, let's say your numeric column is column 4:
myDataFrame[, 4] <- as.numeric(myDataFrame[, 4]) # you can also refer to the column by "itsName"
Lastly, if you need any help with accomplishing in R the same tasks that you've done in Excel, there are plenty of folks here who would be happy to help you out.
In read.table (and its relatives) it is the na.strings argument which specifies which strings are to be interpreted as missing values (NA). The default is na.strings = "NA".
If missing values in an otherwise numeric column are coded as something other than "NA", e.g. "." or "N/A", those rows will be interpreted as character, and then the whole column is converted to character.
Thus, if your missing values are coded as something other than "NA", you need to specify them in na.strings.
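A minimal sketch of na.strings in action, using an inline CSV (the values here are made up):

```r
# Without na.strings, the "." and "N/A" rows would force column x to character.
csv <- "x,y\n1.5,a\n.,b\nN/A,c"
df  <- read.csv(text = csv, na.strings = c("NA", ".", "N/A"))

class(df$x)  # "numeric"
df$x         # 1.5 NA NA
```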
If you're dealing with large datasets (i.e. datasets with a high number of columns), the solution noted above can be manually cumbersome, and requires you to know which columns are numeric a priori.
Try this instead.
char_data <- read.csv(input_filename, stringsAsFactors = FALSE)
num_data <- data.frame(data.matrix(char_data))
numeric_columns <- sapply(num_data,function(x){mean(as.numeric(is.na(x)))<0.5})
final_data <- data.frame(num_data[,numeric_columns], char_data[,!numeric_columns])
The code does the following:
Imports your data as character columns.
Creates an instance of your data as numeric columns.
Identifies which columns from your data are numeric (assuming columns with less than 50% NAs upon converting your data to numeric are indeed numeric).
Merges the numeric and character columns into a final dataset.
This essentially automates the import of your .csv file by preserving the data types of the original columns (as character and numeric).
Including this in the read.csv command worked for me: strip.white = TRUE
(I found this solution here.)
A data.table version, based on dmanuge's code above:
library(data.table)

convNumValues <- function(ds) {
  ds    <- data.table(ds)
  dsnum <- data.table(data.matrix(ds))
  num_cols <- sapply(dsnum, function(x) mean(as.numeric(is.na(x))) < 0.5)
  nds <- data.table(dsnum[, .SD, .SDcols = names(num_cols)[num_cols]],
                    ds[,    .SD, .SDcols = names(num_cols)[!num_cols]])
  return(nds)
}
I had a similar problem. Based on Joshua's premise that excel was the problem I looked at it and found that the numbers were formatted with commas between every third digit. Reformatting without commas fixed the problem.
So, I had a similar situation with my data file when I read it in as a csv: all the numeric values were turned into char. But in my file there was the value "Filtered" in place of NA. I converted "Filtered" to NA in the vim editor of a linux terminal with the command :%s/Filtered/NA/g, saved the file, and then read it into R; all the values were num type and no longer char.
It looks like the character value "Filtered" was forcing the affected columns to char format.
Charu
Hello @Shawn Hemelstrand, here are the steps in detail below:
Example: a matrix file, file.csv, containing the word "Filtered".
I opened the file.csv in linux command terminal
vi file.csv
then press Esc, type ":" to enter command mode, and run the following substitution at the bottom of the screen:
%s/Filtered/NA/g
press Enter
then press Esc, type ":" again,
and enter "wq" (this saves the file and quits the vim editor)
then in R script I read the file
data<- read.csv("file.csv", sep = ',', header = TRUE)
str(data)
All the columns that were char type earlier were now num type.
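As an alternative to editing the file in vim, the same fix can be done at read time with na.strings, as discussed in an earlier answer (the inline CSV below is a made-up stand-in for file.csv):

```r
# Tell read.csv to treat the literal string "Filtered" as NA,
# so the numeric columns stay numeric.
csv <- "a,b\n1.2,3.4\nFiltered,5.6"
df  <- read.csv(text = csv, na.strings = c("NA", "Filtered"))

class(df$a)  # "numeric"
df$a         # 1.2 NA
```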
In case you need more help, it would be easier to share your txt or csv file.