I am pretty new to R and I wonder if I can replace NA value (which looks like string) with blank, nothing
It is easy when entire table is as.character however my table contains double's as well therefore when I try to run
f <- as.data.frame(replace(df, is.na(df), ""))
or
df[is.na(df)] <- ""
Both does not work.
Error is like
Assigned data `values` must be compatible with existing data.
Error occurred for column `ID`.
Can't convert <character> to <double>.
and I understand, but I really need ID's as well as any other cell in the table (character, double or other) blank to remain in the table, later on it is connected to BI tool and I can't present "NA", just blank for the sake of clarity
If your column is of type double (numbers), you can't replace NAs (which is the R internal for missings) by a character string. And "" IS a character string even though you think it's empty, but it is not.
So you need to choose: converting you whole column to type character or leave the missings as NA.
EDIT:
If you really want to covnert your numeric column to character, you can just use as.character(MYCOLUMN). But I think what you really want is:
Telling your exporting function how to treat NA'S, which is easy, e.g. write.csv(df, na = ""). Also check the help function with ?write.csv.
Related
please see the the column name "if" in the second column,the deifference is :when check.name=F,"." beside "if" disappear
Sorry for the code,because I try to type some codes to generate this data.frame like in the picture,but i failed due to the "if".We know that "if" is a reserved word in R(like else,for, while ,function).And here, i deliberately use the "if" as the column name (the 2nd column),and see whether R will generate some novel things.
So using another way, I type the "if" in the excel and save as the format of csv in order to use read.csv.
Question is:
Why "if." changes to "if"?(After i use check.names=FALSE)
enter image description here
?read.csv describes check.names= in a similar fashion:
check.names: logical. If 'TRUE' then the names of the variables in the
data frame are checked to ensure that they are syntactically
valid variable names. If necessary they are adjusted (by
'make.names') so that they are, and also to ensure that there
are no duplicates.
The default action is to allow you to do something like dat$<column-name>, but unfortunately dat$if will fail with Error: unexpected 'if' in "dat$if", ergo check.names=TRUE changing it to something that the parser will not trip over. Note, though, that dat[["if"]] will work even when dat$if will not.
If you are wondering if check.names=FALSE is ever a bad thing, then imagine this:
dat <- read.csv(text = "a,a\n2,3")
dat
# a a.1
# 1 2 3
dat <- read.csv(text = "a,a\n2,3", check.names = FALSE)
dat
# a a
# 1 2 3
In the second case, how does one access the second column by-name? dat$a returns 2 only. However, if you don't want to use $ or [[, and instead can rely on positional indexing for columns, then dat[,colnames(dat) == "a"] does return both of them.
df <- read.csv(
text = '"2019-Jan","2019-Feb",
"3","1"',
check.names = FALSE
)
OK, so I use check.names = FALSE and now my column names are not syntactically valid. What are the practical consequences?
df
#> 2019-Jan 2019-Feb
#> 1 3 1 NA
And why is this NA appearing in my data frame? I didn't put that in my code. Or did I?
Here's the check.names man page for reference:
check.names logical. If TRUE then the names of the variables in the
data frame are checked to ensure that they are syntactically valid
variable names. If necessary they are adjusted (by make.names) so that
they are, and also to ensure that there are no duplicates.
The only consequence is that you need to escape or quote the names to work with them. You either string-quote and use standard evaluation with the [[ column subsetting operator:
df[['2019-Jan']]
… or you escape the identifier name with backticks (R confusingly also calls this quoting), and use the $ subsetting:
df$`2019-Jan`
Both work, and can be used freely (as long as they don’t lead to exceedingly unreadable code).
To make matters more confusing, R allows using '…' and "…" instead of `…` in certain contexts:
df$'2019-Jan'
Here, '2019-Jan' is not a character string as far as R is concerned! It’s an escaped identifier name.1
This last one is a really bad idea because it confuses names2 with character strings, which are fundamentally different. The R documentation advises against this. Personally I’d go further: writing 'foo' instead of `foo` to refer to a name should become a syntax error in future versions of R.
1 Kind of. The R parser treats it as a character string. In particular, both ' and " can be used, and are treated identically. But during the subsequent evaluation of the expression, it is treated as a name.
2 “Names”, or “symbols”, in R refer to identifiers in code that denote a variable or function parameter. As such, a name is either (a) a function name, (b) a non-function variable name, (c) a parameter name in a function declaration, or (d) an argument name in a function call.
The NA issue is unrelated to the names. read.csv is expecting an input with no comma after the last column. You have a comma after the last column, so read.csv reads the blank space after "2019-Feb", as the column name of the third column. There is no data for this column, so an NA value is assigned.
Remove the extra comma and it reads properly. Of course, it may be easier to just remove the last column after using read.csv.
df <- read.csv(
text = '"2019-Jan","2019-Feb"
"3","1"',
check.names = FALSE
)
df
# 2019-Jan 2019-Feb
# 1 3 1
Consider df$foo where foo is a column name. Syntactically invalid names will not work.
As for the NA it’s a consequence of there being three columns in your first line and only two in your second.
I am interested to assign names to list elements. To do so I execute the following code:
file_names <- gsub("\\..*", "", doc_csv_names)
print(file_names)
"201409" "201412" "201504" "201507" "201510" "201511" "201604" "201707"
names(docs_data) <- file_names
In this case the name of the list element appears with ``.
docs_data$`201409`
However, in this case the name of the list element appears in the following way:
names(docs_data) <- paste("name", 1:8, sep = "")
docs_data$name1
How can I convert the gsub() result to receive the latter naming pattern without quotes?
gsub() and paste () seem to produce the same class () object. What is the difference?
Both gsub and paste return character objects. They are different because they are completely different functions, which you seem to know based on their usage (gsub replaces instances of your pattern with a desired output in a string of characters, while paste just... pastes).
As for why you get the quotations, that has nothing to do with gsub and everything to do with the fact that you are naming variables/columns with numbers. Indeed, try
names(docs_data) <- paste(1:8)
and you'll realize you have the same problem when invoking the naming pattern. It basically has to do with the fact that R doesn't want to be confused about whether a number is really a number or a variable because that would be chaos (how can 1 refer to a variable and also the number 1?), so what it does in such cases is change a number 1 into the character "1", which can be given names. For example, note that
> 1 <- 3
Error in 1 <- 3 : invalid (do_set) left-hand side to assignment
> "1" <- 3 #no problem!
So R is basically correcting that for you! This is not a problem when you name something using characters. Finally, an easy fix: just add a character in front of the numbers of your naming pattern, and you'll be able to invoke them without the quotations. For example:
file_names <- paste("file_",gsub("\\..*", "", doc_csv_names),sep="")
Should do the trick (or just change the "file_" into whatever you want as long as it's not empty, cause then you just have numbers left and the same problem)!
I have a dataset of GPS locations (x), and there are occasional NA values in the dataset. When applying a read.csv command, the GPS lat/long values (in UTM meters) are imported as a factor class, with each GPS value as a level.
To convert back to numbers, I have attempted to use the
print(x$lat, quotes = F)
command to remove quotes. The output appears to lack quotes, but when I store it
x$lat <- print(x$lat, quotes = F)
the column is coerced into a character string. This is a good first step, but the quotes are retained in the character string. Normally, I read that applying the following function usually works for removing non-numeric data
x$lat <- x$lat[!is.na(as.numeric(as.character(x$lat)))]
However, because the quotes are retained, none of the data are entirely "numeric", and so the !is.na(...) part returns a vector completely filled with FALSE values, and the resulting vector is NA of length x$lat.
I also copied this string to attempt to remove the quotes from the character vector that I can make, to no avail.
x$lat <- gsub("\\'", "", x$lat)
I suppose I could go into excel and delete the NA values, but I would like to learn how to manage the data effectively in R instead. T
I am trying to read a data table into R. The data contains:
two columns with numeric values (continuous),
1556 columns with either 0 or 1, and
one last column with strings, representing two groups (group A and
group B).
Some values are missing, and they are replaced with either ? or some spaces and then ?. As a result, when I read the table into R, the numbers were read as characters.
For example, if Data[1,1]=125, when I write is.numeric(Data[1,1]) I get FALSE. I want to turn all the numbers into numbers, and I want all the ? (with or without spaces before) into missing values. Do not know how to do this. Thank you! (I have 3279 rows).
You can specify the na.strings argument of ?read.table to be na.strings = c("?", "?."). Use that inside the read.table() call when you read the data into R. It should then by recognised correctly. Since you also have some spaces in your data, you could additionally use the strip.white = TRUE argument inside the read.table call.