How to convert char to int in R retaining leading zeroes - r

I am converting a column that has characters such as 000024, 000120 etc to integers.
My code is as below
df$colname <- as.integer(df$colname)
But this removes the leading zeroes and I see result as 24, 120. Is there any way I can prevent it?

Integers don't have a fixed number of leading zeros (or I guess you could say they have infinitely many leading zeros), so computers don't track them when the values are numeric. It's only when displaying them, or turning them into a string, that you add a certain number of zeros. When you need them to be pretty, you can add zeros with functions like sprintf("%06d", c(12, 120)), but those return strings in the end (and they assume all values will use the same number of digits).
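As a minimal sketch of the point above: the data frame below is made up for illustration (only the name colname comes from the question), and the width of 6 in the format string is assumed to match the original padding.

```r
# Hypothetical data frame; `colname` follows the question's naming.
df <- data.frame(colname = c("000024", "000120"), stringsAsFactors = FALSE)

# Converting to integer drops the zeros, because integers carry no padding.
df$as_int <- as.integer(df$colname)

# Re-create the padded form only for display; the result is character again.
df$padded <- sprintf("%06d", df$as_int)

print(df)
```

The integer column is the one to use for arithmetic; the padded column exists only for presentation.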

Related

Finding specific decimal point digit [duplicate]

This question already has an answer here:
Extract digit from numeric in r
(1 answer)
Closed 1 year ago.
For example, if I had the number 7.12935239484 and wanted just the 10th decimal place digit (in this example the answer would be 8), how would I go about displaying that using R?
Multiply by 1e10, convert to an integer, and then take the result mod 10 to retrieve the digit.
floor(7.12935239484 * 1e10) %% 10
The easiest way is probably by string manipulation.
Use format() with enough digits to make sure that you include the digits you want.
I have written the digit position as 10+2 to emphasize that you are skipping over the first two digits (7.) and taking the 10th digit after the decimal point.
x <- 7.12935239484
substr(format(x,digits=20), start = 10+2, stop = 10+2)
It might be more principled (and robust) to use numerical manipulation
floor((x*1e10) %% 10)
This shifts the decimal point 10 places and then calculates the remainder modulo 10 (the parentheses around x*1e10 are needed to get the right order of operations). This would still work if there were more digits to the left of the decimal point (unlike the string-based solution).
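The numerical approach can be wrapped in a small helper. This is only a sketch: the function name nth_decimal_digit and the use of abs() for negative inputs are my own assumptions, not part of the answers above. Because doubles are stored in binary, values that are not exactly representable can occasionally yield a digit that is off by one.

```r
# Hypothetical helper: return the n-th digit after the decimal point of x.
nth_decimal_digit <- function(x, n) {
  # Shift the decimal point n places, take the remainder mod 10,
  # then drop the fractional part.
  floor((abs(x) * 10^n) %% 10)
}

x <- 7.12935239484
nth_decimal_digit(x, 10)       # 10th decimal digit of the example value
nth_decimal_digit(123.456, 2)  # more digits before the decimal point
```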
Extract digit from numeric in r is almost a duplicate ...

convert a character type column with leading zeros to numeric [duplicate]

This question already has answers here:
pad numeric column with leading zeros
(2 answers)
Closed 4 years ago.
I want to convert a character column into numeric column in a data frame.
The column values are like "0001" "0002"...
I used
as.numeric(as.character(column_name))
it returns 1,2...
but the leading zeros are removed.
I want to keep the leading zeros and change column's type.
If you want leading zeroes, you do not have the option to make it numeric. A numeric class is by definition, logically, numeric, which means leading zeroes have no meaning. So you need to decide whether you would like to keep it as a character column, or convert it to numeric and accept that this removes the leading zeroes.
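One way to live with that trade-off, sketched here under assumptions, is to keep the character column for display and add a numeric copy for calculations. The data frame dat, its column names, and the fixed width of 4 are all hypothetical.

```r
# Hypothetical data frame with zero-padded identifiers stored as character.
dat <- data.frame(id = c("0001", "0002", "0010"), stringsAsFactors = FALSE)

# Numeric copy for arithmetic or sorting; the zeros are gone here by design.
dat$id_num <- as.numeric(dat$id)

# The original character column still holds the padded form for display.
str(dat)

# If only the numeric column were kept, the padding could be rebuilt,
# assuming a known fixed width (4 here).
rebuilt <- formatC(dat$id_num, width = 4, format = "d", flag = "0")
```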

How do I select specific vectors of a matrix to be plotted against each other (such as when using hexplom)?

Is there a quick way to code for those specific vectors? Like I only want to use every 4th column in my matrix then plot the selected columns. I'm very new to R and have absolutely no idea what I'm doing. I know how to select a single vector and how to select a certain number in a row but that doesn't really help.
If you're looking to extract every 4th column from a matrix you can use seq().
Here's an example. I made a dummy dataset: foo <- matrix(rep(c(4,3,2,7), 25), nrow=10, ncol=10)
Then you can store the column indexes that you want from your matrix like so:
colsyouwant<-seq(from = 4, to = ncol(foo), by = 4)
from is the column you'd like to start from, in your case the 4th. to is where you'd like to stop, so I used the ncol function to count how many columns are in the matrix. In this case the number of columns (10) isn't a multiple of 4, but that doesn't matter because seq stops at the last value that does not exceed to. Then by = 4 because you want to select every fourth column.
colsyouwant now equals 4 8. Simply use brackets and the name of your variable to get the columns you want out: foo[,colsyouwant]. Here the brackets just specify what part of the matrix I want as output, in the order [rows,columns]. Since I want all the rows I leave that spot blank and then specify the columns using the colsyouwant variable, or in other words 4 8.
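Put together, the steps look roughly like this. The matrix contents are arbitrary placeholders, and the name selected is my own. pairs() from base graphics is mentioned only as a stand-in for whichever plotting function you actually use.

```r
# Dummy 10 x 10 matrix; the values are arbitrary placeholders.
foo <- matrix(seq_len(100), nrow = 10, ncol = 10)

# Indices of every 4th column, starting at column 4.
colsyouwant <- seq(from = 4, to = ncol(foo), by = 4)

# A blank row index keeps all rows; the column index picks columns 4 and 8.
selected <- foo[, colsyouwant]
dim(selected)

# The subset can then be handed to a plotting function, for example
# pairs(selected) from base graphics (used here only as a stand-in).
```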

Change decimal digits for data frame column in R

Questions about displaying of certain numbers of digits have been posted, however, just for single values or vectors, so I hope someone can help me with this.
I have a data frame with several columns and want to display all values in one column with two decimal digits (this column only). I have tried round() and format() and options(digits) but none worked on a column (numerical). I wonder if there is a method to do this without going the extra way of converting the column to a vector and gluing all together again.
Thanks a lot!
Here's an example of how to do this with the cars data.frame that comes installed with R.
First I'll add some variability so that we have numbers with decimal places:
data=cars+runif(nrow(cars))
Then to round just a single column (in this case the dist column to 2 decimal places):
data[,'dist']=round(data[,'dist'],2)
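If your data contain whole numbers, you can guarantee that all values are displayed with 2 decimal places by using format(). Note that format() returns character strings, so the column is no longer numeric afterwards: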
cars[,'dist']=format(round(cars[,'dist'],2),nsmall=2)
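The difference between the two approaches can be checked directly. In this sketch the seed is arbitrary and the variable name shown is my own. The formatted values are kept in a separate vector so that the built-in cars data is left untouched.

```r
# Copy of the built-in cars data with some random decimals added.
set.seed(1)  # arbitrary seed, only for reproducibility
data <- cars + runif(nrow(cars))

# round() keeps the column numeric; trailing zeros may not be displayed.
data$dist <- round(data$dist, 2)
is.numeric(data$dist)

# format() fixes the displayed decimals but returns character strings.
shown <- format(round(cars$dist, 2), nsmall = 2)
head(shown)
is.character(shown)
```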

Cluster analysis on two columns that contain name of person in R

I am a beginner in R. I have to do cluster analysis on data that contains two columns with names of persons. I converted it into a data frame, but it is of character type. To use the dist() function, the data frame must be numeric. Example of my data:
Interviewed.Type interviewed.Relation.Type
1. An1 Xuan
2. An2 The
3. An3 Ngoc
4. Bui Thi
5. ANT feed
7. Bach Thi
8. Gian1 Thi
9. Lan5 Thi
.
.
.
1100. Xung Van
I will be grateful for your help.
You can convert a character vector to a factor using factor. A factor is basically a vector of numbers together with an attribute giving the text associated with each number, which are called levels in R. One can use as.numeric or unclass to get at the raw numbers. These can then be fed into algorithms which require numbers, like e.g. dist.
Note that the order in which numbers are associated with texts is pretty much arbitrary (in fact alphabetical), so the difference between numbers has no meaning in most applications. Therefore calling dist on this result is technically possible, but not necessarily meaningful. For this reason, the author of this answer is not satisfied with it, even if the original poster seems to be happy about it. :-)
Also note that if there are different vectors, converting each separately will mean that the same number will represent different textual values and vice versa, unless both vectors are composed of exactly the same set of distinct values. Additional care has to be taken if you want the same levels for both factors. One way would be to concatenate both vectors, turn that into a factor, and then split the result into two factor vectors.
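A minimal sketch of getting consistent codes for both columns follows. It uses a variant of the idea above: instead of concatenating and splitting, it builds one shared set of levels and passes it to factor() for each column. The data frame people and its column names are made up for illustration.

```r
# Hypothetical two-column data frame of names, loosely modeled on the question.
people <- data.frame(
  a = c("An1", "Bui", "Xuan"),
  b = c("Xuan", "Thi", "Bui"),
  stringsAsFactors = FALSE
)

# Build one shared set of levels from both columns together.
shared_levels <- sort(unique(c(people$a, people$b)))

# Convert each column with the same levels, so equal names get equal codes.
people$a_code <- as.integer(factor(people$a, levels = shared_levels))
people$b_code <- as.integer(factor(people$b, levels = shared_levels))

print(people)
```

The codes are now comparable across columns, but they are still arbitrary labels, so distances computed on them carry the caveat described above.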

Resources