Convert commas in a column of a data set to points in R

I've imported a dataset from Excel. It has a column 'Height' in which I want to replace the ',' with '.'.
I tried this command, but it gives me an error.
apply(apply(DATASET$Height, 2, gsub, patt=",", replace="."), 2, as.numeric)
Thank you very much for your help

To recode column 'Height' in data frame 'DATASET':
DATASET$Height <- gsub(",",".",DATASET$Height,fixed=TRUE)
If this runs without errors, you can go on to convert the column to numeric with as.numeric.
If you get warnings or NA values when converting to numeric, the column probably still contains other characters besides "," that prevent R from reading the values as numbers. In that case, apply gsub a second time to remove all non-numeric characters.
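As a rough sketch of the steps above (the sample values and the names height_raw, height_dot, height_clean and height_num are made up for illustration): the apply call in the question fails because DATASET$Height is a plain vector without dimensions, whereas gsub and as.numeric are already vectorised and can be used on the column directly.

```r
# Hypothetical values standing in for DATASET$Height
height_raw <- c("1,75", "1,80 m", "1,62")

# Step 1: swap the decimal comma for a decimal point
height_dot <- gsub(",", ".", height_raw, fixed = TRUE)

# Step 2: drop anything that is not a digit, a point or a minus sign
height_clean <- gsub("[^0-9.-]", "", height_dot)

# Step 3: convert the cleaned strings to numeric
height_num <- as.numeric(height_clean)
print(height_num)
```

Whether step 2 is appropriate depends on the data: it silently discards units and stray text, so it may be worth inspecting the affected rows first.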

First, check that the column is character (for example with is.character). Then split each string at the comma and paste the pieces back together with a dot.
Suppose a is what you get from DATASET[["Height"]]:
a <- c("234,23", "2314,54", "234,65")
Then, with sapply, you can split and re-join each element:
b <- sapply(a, function(string) {
  paste0(unlist(strsplit(string, split = ",")), collapse = ".")
})
Now you can replace DATASET[["Height"]] with b.
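A small check of the split-and-paste approach, as a sketch: by default sapply names its result after the input strings, so the version below passes USE.NAMES = FALSE (my addition, not part of the answer) to get a plain character vector. It also confirms that the result matches the gsub approach and can be converted to numeric.

```r
a <- c("234,23", "2314,54", "234,65")

# Split each string at the comma and re-join the pieces with a dot
b <- sapply(a, function(string) {
  paste0(unlist(strsplit(string, split = ",")), collapse = ".")
}, USE.NAMES = FALSE)

# Same result as a direct, fixed-string replacement
same_as_gsub <- identical(b, gsub(",", ".", a, fixed = TRUE))

# The strings still need an explicit conversion to become numbers
b_num <- as.numeric(b)
print(b_num)
```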

Related

Using grepl() in a particular type of pattern matching

I'm not sure how to do this. I have a feeling that I can use grepl() for it, but I am not sure how.
I have a column in my dataset with names like "Abbot", "Baron", "William", and hundreds of other names, as well as many blanks/missing values.
I want to extract the first letter of each name into a new column that contains only that letter, and if it's missing a value then fill in with unknown.
Below I use a quick sapply call with strsplit to grab the first letter. There is likely a better way to do this, but here's one solution. :)
test <- c('Abbot', 'Baron', 'William')
firstLetter <- sapply(test, function(x){unlist(strsplit(x,''))[1]})
What do you mean by "and if it's missing a value then fill in with unknown"?
The following code using substr should be very fast with a large number of rows. It always returns the first letter and returns NA if the respective value in test$name is NA.
test <- data.frame(name = c('Abbot', 'Baron', 'William', NA))
test$first.letter <- substr(test$name, 1, 1)
If you want to convert all NA values in test$first.letter to 'unknown', you can do this afterwards:
test$first.letter <- ifelse(is.na(test$first.letter), "unknown", test$first.letter)
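The question also mentions blanks. A possible extension, assuming blanks are stored as empty strings (the vector names_raw and its values are invented for this sketch): substr returns "" for an empty string and NA for NA, so both cases can be caught in one step.

```r
# Hypothetical input with one empty string and one NA
names_raw <- c("Abbot", "Baron", "", NA, "William")

# First character of each name; "" stays "", NA stays NA
first_letter <- substr(names_raw, 1, 1)

# Treat both missing values and empty strings as unknown
first_letter[is.na(first_letter) | first_letter == ""] <- "unknown"
print(first_letter)
```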

R: Replace value in data frame as.numeric

I've imported a data frame from a csv-file
dat3 <- read.csv(file.choose(),as.is = TRUE)
It contains names and values. My problem is that when I try to replace a value in the data frame, e.g.
dat3[3,6]<-12
then it just assumes that "12" is a text string and not a number, which prevents me from using it in mathematical operations. I'd like to be able to replace some numbers in the data frame and use them in mathematical operations.
When I try adding 1 to dat3[3,6] I get: "Error in dat3[3, 6] + 1 : non-numeric argument to binary operator".
I've tried:
lapply(dat3[3,6], as.numeric)
dat3[3,6]<-as.numeric(12)
But it doesn't work. However, I have no problems using the numbers that were already imported in the data frame. This only happens for numbers that I replace.
Yes!
I've found the answer!
It is:
dat3[, 3:6] <- sapply(dat3[, 3:6], as.numeric)
to convert the columns to numbers.
Thank you all!
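One possible explanation for the behaviour, assuming the affected column was read in as character (for instance because one cell holds non-numeric text): a number assigned into a character column is stored as text. The data frame dat_demo below is a made-up stand-in, not the original dat3.

```r
# Hypothetical stand-in for the imported data; "n/a" makes the column character
dat_demo <- data.frame(name = c("a", "b", "c"),
                       value = c("1.5", "2.5", "n/a"),
                       stringsAsFactors = FALSE)

# Assigning a number into a character column stores it as the text "12"
dat_demo[3, 2] <- 12
class_before <- class(dat_demo$value)

# Convert the whole column; after that, arithmetic on single cells works
dat_demo$value <- as.numeric(dat_demo$value)
result <- dat_demo[3, 2] + 1
print(result)
```

Checking str() on the data frame right after import shows which columns are character and therefore need this conversion.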

Converting factors to numeric in R

I have hundreds of columns in my data set stored as factors. They actually contain numbers, but R treats them as factors. For my project, I need to convert them to numeric.
I can do that in bulk using sapply or a for loop. However, I am not sure how to check whether a variable contains numbers. I cannot just check is.factor(var_name), because the data set also contains character variables that are stored as factors.
Is there some other way to perform the check below?
if (is.numeric(var_name)) {
convert the variable to numeric
}
I am looking for something similar to stringsAsFactors = FALSE,
which keeps character variables as character instead of converting them to factors.
Any help or pointers would be appreciated.
One way would be to use type.convert after converting all the columns to character
df1[] <- lapply(df1, function(x) type.convert(as.character(x)))
Now the non-numeric character columns will have been converted to factor class. We can convert those columns back to character:
df1[] <- lapply(df1, function(x) if(is.factor(x)) as.character(x) else x)
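A variant worth considering, as a sketch rather than the answer's exact method: passing as.is = TRUE to type.convert leaves non-numeric columns as character, so the second step is not needed. Recent versions of R also expect the caller to state as.is explicitly. The data frame df1 and its column names below are invented for illustration.

```r
# Hypothetical data: every column is a factor, two of them hold numbers
df1 <- data.frame(
  id    = factor(c("10", "20", "30")),
  score = factor(c("1.5", "2.5", "3.5")),
  label = factor(c("a", "b", "c"))
)

# Go through character so the factor labels, not the internal codes, are converted
df1[] <- lapply(df1, function(x) type.convert(as.character(x), as.is = TRUE))

column_classes <- sapply(df1, class)
print(column_classes)
```

Converting through as.character matters: calling as.numeric directly on a factor returns its internal integer codes, not the numbers shown in its labels.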

How to combine the values in a column of dataframe

I have a dataframe with two columns. I want to concatenate the values in the second column and return a single string. How can I do this in R?
You can use paste with the appropriate delimiter. Here, I am using ''. You can set it to -, _ or anything else.
paste(df$Col2, collapse="")
If there are NAs, you could use na.omit:
paste(na.omit(df$Col2), collapse="")
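A short illustration of both calls on made-up data (the data frame and its values are assumptions for this sketch). Without na.omit, paste turns a missing value into the literal text "NA".

```r
# Hypothetical two-column data frame with one missing value
df <- data.frame(Col1 = 1:4,
                 Col2 = c("a", NA, "b", "c"),
                 stringsAsFactors = FALSE)

# NA is pasted as the string "NA"
with_na <- paste(df$Col2, collapse = "")

# Dropping NAs first, and using "-" as the delimiter
without_na <- paste(na.omit(df$Col2), collapse = "-")

print(with_na)
print(without_na)
```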

Prevent R from coercing non-numeric strings to "NA" when using "as.numeric"

I want to convert a column of numbers to numeric, but there are certain cells that say "New" and "Gone", which I want to retain as characters.
If I use as.numeric(df$col1), the numbers are converted to numeric, but the words are coerced into "NA" values.
Is there any way that I could convert all the numbers to numeric while preventing this coercion?
You can't do it with a vector because vectors can only contain a single type. However, you could do it with a list.
Data <- data.frame(col1=c("1","2","New","3","Gone"), stringsAsFactors=FALSE)
List <- lapply(as.list(Data$col1), type.convert, as.is=TRUE)
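To see what the list approach produces, the sketch below inspects the class of each element and sums only the numeric ones. The names element_classes and total are introduced here for illustration.

```r
Data <- data.frame(col1 = c("1", "2", "New", "3", "Gone"), stringsAsFactors = FALSE)

# Each element is converted on its own, so types can differ within the list
List <- lapply(as.list(Data$col1), type.convert, as.is = TRUE)

# Numbers become integer, words stay character
element_classes <- sapply(List, class)
print(element_classes)

# Arithmetic has to pick out the numeric elements first
total <- sum(unlist(List[element_classes == "integer"]))
print(total)
```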
A column of a data.frame will always be all of the same type. So you cannot have the string "New" and the number 5 in the same column.
However, an example to get you on your way:
x <- c('New', 1, 'Gone', 2)
ifelse(is.na(as.numeric(x)), x, as.numeric(x))
Depending on what you're doing this can be extended to apply to your specific case.
Per Joshua's comment, you can use functions in the ifelse statement:
ifelse(is.na(as.numeric(x)), sprintf('its a string %s', x), sprintf('its a number %f', as.numeric(x)))
However, the usual technique for dealing with this situation is as Joshua outlined in his answer.
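If the data has to stay in a data frame, one alternative that neither answer spells out is to split the information into two columns: a numeric one for the numbers and a character one for the words. This is only a sketch under that assumption, and the column names value and status are hypothetical.

```r
x <- c("New", "1", "Gone", "2")

# Numbers become numeric; words become NA (the coercion warning is expected here)
value <- suppressWarnings(as.numeric(x))

# Keep the original word wherever the conversion produced NA
status <- ifelse(is.na(value), x, NA_character_)

result <- data.frame(value = value, status = status, stringsAsFactors = FALSE)
print(result)

# The numeric column can now be used directly in calculations
average <- mean(result$value, na.rm = TRUE)
```

This treats a genuinely missing input the same as a word, so real NA values in x would need separate handling.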
