R: Character variable becomes numeric after cbind() and data.frame()

I am trying to combine one column with a data.frame. I used both cbind() and data.frame(), but after that the character variable became a numeric one.
> is.character(new_listing_zip)
[1] TRUE
> new_race_disp_use2 <- cbind(new_listing_zip,opo_trans)
> is.character(new_race_disp_use2$new_listing_zip)
[1] FALSE
> is.character(new_listing_zip)
[1] TRUE
> new_race_disp_use2 <- data.frame(new_listing_zip,opo_trans)
> is.character(new_race_disp_use2$new_listing_zip)
[1] FALSE
Could anyone help me with this? Thank you.

If you check the help file for data.frame(), I think you will find your answer:
?data.frame
By default (in R versions before 4.0.0), data.frame() converts character columns to factors, which is why is.character() now returns FALSE. You'll want to set
options(stringsAsFactors = FALSE)
to change that globally, or just set the parameter
stringsAsFactors = FALSE
when declaring your data.frame, assuming these are actual character strings. Otherwise, I would simply declare your variable as a factor when joining it:
new_race_disp_use2 <- cbind(factor(new_listing_zip),opo_trans)
Now, of course, if your 'factor' is actually a numeric value that you want as a string (seemingly zip codes in your example), you'll want to either set your zip codes as strings to begin with using quotes (e.g. "12345") or set the data type after the data.frame is built:
new_race_disp_use2$new_listing_zip <- as.character(new_race_disp_use2$new_listing_zip)
or, if you want a factor instead, use
as.factor(varName)
or simply
factor() instead of as.character()
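As a minimal sketch of the stringsAsFactors route (the zip codes and the opo_trans stand-in below are made up for illustration); note that since R 4.0.0 the default is already stringsAsFactors = FALSE:
new_listing_zip <- c("02139", "10001", "94305")    # character zip codes
opo_trans <- data.frame(x = 1:3)                   # placeholder for your other data
new_race_disp_use2 <- data.frame(new_listing_zip, opo_trans, stringsAsFactors = FALSE)
is.character(new_race_disp_use2$new_listing_zip)
[1] TRUE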

Related

unexpected output of paste(data.frame(a=as.character(as.Date("2019-12-31"))))

# eg1:
paste(data.frame(a=as.character(as.Date("2019-12-31"))))
[1] "1"
# eg2:
paste(data.table(a=as.character(as.Date("2019-12-31"))))
[1] "2019-12-31"
# eg3:
paste(data.frame(a=as.Date("2019-12-31")))
[1] 18261
My expected output is like eg2, but I don't want to use data.table.
I have only one question: how do I fix this issue for both eg1 and eg3?
When you put a character vector into a data.frame (with the old default stringsAsFactors = TRUE), it is turned into a factor, and paste() then shows the factor's underlying integer code ("1") rather than its label; data.table does not do that conversion, so its column stays character. For your particular case, I was able to get around it by unlisting and converting to character before using paste().
> paste(as.character(unlist(data.frame(a=as.character(as.Date("2019-12-31"))))))
[1] "2019-12-31"
Alternatively, you could sidestep this by setting stringsAsFactors = FALSE, which avoids the factor conversion entirely.
> paste(data.frame(a=as.character(as.Date("2019-12-31")), stringsAsFactors = FALSE))
[1] "2019-12-31"
I don't understand why you are trying to use paste() if what you want to do is view what is contained inside the data frame. Instead, just enter the variable name of the data frame:
df <- data.frame(a=as.character(as.Date("2019-12-31")))
df
a
1 2019-12-31
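If the goal is just the date string, another small sketch: call paste() on the column (a plain vector) rather than on the whole data.frame, which also fixes eg3, since the Date is then converted via as.character():
df <- data.frame(a = as.Date("2019-12-31"))
paste(df$a)
[1] "2019-12-31"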

Can not convert character to numeric in R

I'm struggling to convert a character vector to numeric in R. I import a dataframe from csv with:
a = read.csv('myData.csv', header=T, stringsAsFactors=F)
One of my factors, fac1, is a vector of numbers but contains some instances of "na" and "nr". Hence, typeof(a$fac1) returns "character"
I create a new dataframe without "na" and "nr" entries
k = a[a$fac1 != "na" & a$fac1 != "nr", ]
I then try to convert fac1 to numeric with:
k$fac1_num = as.numeric(k$fac1)
The problem is that this doesn't work, as typeof(k$fac1_num) now returns "double" instead of "numeric"
Can anyone suggest a fix / point out what I'm doing wrong? Thanks in advance!
Try just coercing to numeric:
a = read.csv('myData.csv', header=T, stringsAsFactors=F)
a$fac1_num = as.numeric(a$fac1)
If you need to subset (which is generally not needed, and I would advise against doing it routinely, since there might be value in knowing what the other column values tell you about the "reality" behind the data), then just:
k <- a[ !is.na(a$fac1_num) , ]
That way you will still have the original character values in the a data object and can examine them if needed. The proper test for whether a vector is numeric is is.numeric(); typeof() returning "double" is expected, because "double" is simply the underlying storage mode of numeric vectors, so nothing has gone wrong.
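Here is a small self-contained sketch (with made-up values) of what that coercion looks like:
fac1 <- c("1.5", "2", "na", "nr", "3.7")
fac1_num <- as.numeric(fac1)   # "na" and "nr" become NA, with a coercion warning
fac1_num
[1] 1.5 2.0  NA  NA 3.7
is.numeric(fac1_num)
[1] TRUE
typeof(fac1_num)
[1] "double"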
Try using sapply() with mode() to check the mode of each column:
sapply(your_df, mode)

Sapply function in R

I have read two .csv files and did some editing.
a1<-read.csv("2013.csv",header=T, na.strings = c("NULL","PrivacySuppressed"))
a2<-a1[,441,drop=F]
a3<-a1[,-441,drop=F]
a4<-cbind(a1,a2)
a4<-a4[, colSums(is.na(a4)) != nrow(a4)]
mode(a4)
[1] "list"
I need a4 to be numeric, so I used sapply:
s<-sapply(a4, as.numeric)
mode(s)
[1] "numeric"
However, the problem is, the column names disappeared.
names(s)
NULL
All the previous data had column names. Sorry, it is impossible to type them here since there are 600 variables (600 different column names). I had names for my columns up to a4. After applying sapply, names() says NULL. When I just print s, I can see the column names, but they are not detected as names. Please help.
Thank you.
sapply() has not actually dropped your names: it returns a matrix here, and a matrix stores its column names in colnames() rather than names(), so check colnames(s). If you would rather keep a data.frame (where names() works as before), convert the columns in place instead:
a4[] <- lapply(a4, as.numeric)
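A tiny sketch with made-up column names shows the difference:
a4 <- data.frame(score = c("1", "2"), rate = c("3", "4"), stringsAsFactors = FALSE)
s <- sapply(a4, as.numeric)
names(s)
NULL
colnames(s)
[1] "score" "rate"
a4[] <- lapply(a4, as.numeric)
names(a4)
[1] "score" "rate"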

Read data in form '1.4523e-9'

I'm trying to read data from a *.txt or *.csv file into R with read.table or read.csv. However, my data is written as e.g. 1.4523e-9 in the file, denoting 1.4523*10^{-9}, but ggplot treats it as a string instead of a real number. Is there some sort of eval()-like function to convert it to its correct value?
Depending on the exact format of the csv file you import, read.csv and read.table often simply convert such columns to factors. Since a straightforward conversion to numeric has failed, I assume this is your problem. You can change this using the colClasses argument, like so:
# if every column should be numeric:
df <- read.csv("foobar.csv", colClasses = "numeric")
# if only some columns should be numeric, use a vector.
# to read the first as a factor and the second as numeric:
read.csv("foobar.csv", colClasses = c("factor", "numeric"))
Of course, both of the above are bare-bones examples; you probably want to supply other arguments as well, e.g. header = TRUE.
If you don't want to supply the classes of each column when you read the table (maybe you don't know them yet!), you can convert after the fact using either of the following:
df$a <- as.numeric(as.character(df$a)) # as you already discovered
df$a <- as.numeric(levels(df$a)[df$a])
Yes, these are both clunky, but they are standard and frequently recommended.
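As a quick standalone check (not from the original post), scientific notation itself is not the obstacle; once the value is a character string rather than a factor, as.numeric() parses it directly:
as.numeric("1.4523e-9")
[1] 1.4523e-09
f <- factor("1.4523e-9")
as.numeric(f)                # wrong: returns the factor's internal level code
[1] 1
as.numeric(as.character(f))  # correct
[1] 1.4523e-09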

Write different datatype values to a file in R

Is it possible to write values of different datatypes to a file in R? Currently, I am using a simple vector as follows:
> vect = c(1, 2, "string")
> vect
[1] "1" "2" "string"
> write.table(vect, file="/home/sampleuser/sample.txt", append= FALSE, sep= "|")
However, since vect is now a vector of strings, opening the file shows the following contents, with everything in quoted form:
"x"
"1"|"1"
"2"|"2"
"3"|"string"
Is it not possible to preserve the data types so that entries 1 and 2 are treated as numeric values instead of strings? My expected result is:
"x"
"1"|1
"2"|2
"3"|"string"
Also, I am assuming the left-side values "1", "2" and "3" are vector indexes? I did not understand why the first line is "x".
I wonder if simply removing all the quotes from the output file will solve your problem? That's easy: Add quote=FALSE to your write.table() call.
write.table(vect, file="/home/sampleuser/sample.txt",
append=FALSE, sep="|", quote=FALSE)
x
1|1
2|2
3|string
Also, you can get rid of the column and row names if you like. (The "x" in your output is the automatically generated column name that write.table assigns when it converts your vector to a one-column data frame, and "1", "2", "3" are its row names rather than vector indexes.) But now your separator character doesn't appear, because you have a one-column table.
write.table(vect, file="/home/sampleuser/sample.txt", append=FALSE, sep="|",
quote=FALSE, row.names=FALSE, col.names=FALSE)
1
2
string
For vectors and matrices, R requires everything to have the same data type. By default, R will coerce all of the data in the vector/matrix into the same format. R will coerce more specific types of data into less specific data types. In this case, any of the items stored in your vector can be reasonably represented as type "character", so it will automatically coerce the numeric parts of the vector to fit that data type.
As @Dason said, you're better off using a list if this isn't what you want.
Alternatively, you can use a data.frame, which lets you store different datatypes in different columns (internally, R stores data.frames as lists, so it makes sense that this would be another option).
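For instance, a minimal sketch (with made-up column names) of the data.frame approach: numeric columns are written unquoted while character columns keep their quotes, which is essentially the behaviour the question asks for:
df <- data.frame(num = c(1, 2), txt = c("foo", "string"), stringsAsFactors = FALSE)
write.table(df, file = "sample.txt", append = FALSE, sep = "|", row.names = FALSE)
# sample.txt now contains:
# "num"|"txt"
# 1|"foo"
# 2|"string"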
