Cannot remove "," in data.frame(they said "Unexpected numeric constant.") - r

I have this df. I'd like to remove the commas from the fields in columns "2019" through "2015".
So I used the following:
(df <- as.numeric(gsub(",", "", df$2019)))
but R said:
Error: unexpected numeric constant in "df <- as.numeric(gsub(",", "", df$2019"
How can I solve the problem?

You can use lapply to loop over all the columns, remove the commas, and convert them to numeric.
df[] <- lapply(df, function(x) as.numeric(gsub(',', '', x)))
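The error itself comes from df$2019: R parses a name that starts with a digit as a numeric constant, so a non-syntactic column name must be quoted with backticks (or accessed as a string with [[ ]]). A minimal sketch for a single column, assuming the column is literally named "2019":
# backticks quote a non-syntactic column name
df$`2019` <- as.numeric(gsub(",", "", df$`2019`))
# equivalently, [[ ]] takes the name as a string
df[["2019"]] <- as.numeric(gsub(",", "", df[["2019"]]))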

With tidyverse, we can do
library(dplyr)
library(stringr)
df <- df %>%
  mutate(across(everything(), ~ as.numeric(str_remove_all(., ","))))
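For illustration, here is the lapply approach on a small made-up data frame (the data values are assumptions):
# check.names = FALSE keeps the purely numeric column names
df <- data.frame(`2019` = c("1,234", "5,678"),
                 `2015` = c("910", "2,345"),
                 check.names = FALSE)
df[] <- lapply(df, function(x) as.numeric(gsub(",", "", x)))
df
#   2019 2015
# 1 1234  910
# 2 5678 2345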

Related

dummy variable in R for partial string

I want to create a dummy variable that is 1 if the string contains one of these numbers. For some reason str_detect is not working. The error is as follows:
Error in type(pattern) : argument "pattern" is missing, with no default
sam_data_rd$high_int <- as.integer(str_detect(sam_data_rd$assertions.primarynaics,
c("2111", "3254", "3341", "3342", "3344","3345", "3364", "5112", "5171", "51331",
"5179", "5133Z", "5182", "5191", "5142", "5141Z", "5191Z","5191ZM", "5413", "5415", "5417")))
Try this:
library(dplyr)
library(stringr)
pattern <- paste(c("2111", "3254", "3341", "3342", "3344","3345", "3364", "5112", "5171", "51331",
"5179", "5133Z", "5182", "5191", "5142", "5141Z", "5191Z","5191ZM", "5413", "5415", "5417"), collapse = "|")
sam_data_rd %>%
  mutate(high_int = ifelse(str_detect(assertions.primarynaics, pattern), 1, 0))
The pattern can be a single string with the alternatives joined by OR (|). Note that while str_detect is vectorized over the pattern, a vector of patterns is matched elementwise against the strings, so the pattern's length must match the length of the string column; passing 21 codes against a column of a different length is why the original call fails.
library(stringr)
v1 <- c("2111", "3254", "3341", "3342", "3344","3345", "3364", "5112", "5171", "51331", "5179", "5133Z", "5182", "5191", "5142", "5141Z", "5191Z","5191ZM", "5413", "5415", "5417")
pat <- str_c("\\b(", str_c(v1, collapse = "|"), ")\\b")
sam_data_rd$high_int <-
as.integer(str_detect(sam_data_rd$assertions.primarynaics, pat))
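A quick sanity check of the word-boundary pattern on a few made-up values:
x <- c("3254", "32540", "5415")   # hypothetical codes; "32540" merely contains "3254"
str_detect(x, pat)
# [1]  TRUE FALSE  TRUE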
Or another option is to loop over each of the elements and then reduce it to a single logical vector
library(purrr)
library(dplyr)
sam_data_rd <- sam_data_rd %>%
  mutate(high_int = map(v1, ~ str_detect(assertions.primarynaics, .x)) %>%
           reduce(`|`) %>%
           as.integer)
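To see what the reduce step does, here is a minimal sketch with made-up inputs: map() yields one logical vector per code, and reduce(`|`) ORs them together elementwise.
library(purrr)
library(stringr)
codes <- c("3254", "5415")   # hypothetical codes
x <- c("3254", "1111")       # hypothetical column values
map(codes, ~ str_detect(x, .x)) %>%
  reduce(`|`)
# [1]  TRUE FALSE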

How to replace commas in a non-numerical list in R?

I have a data.frame in R that is also a list. I want to replace the "," with "." in the numbers. The data.frame is not numeric, but I think it has to be in order to change the decimal separator.
I tried a lot, but nothing works. I do not want to rearrange or manipulate my data.frame; all I want is to get rid of the "," in the decimal numbers.
df <- data.frame(a = c("2,3", "6"), b = c("56,0", "56,8"), c = c("1", "0"))
#trials to make df numeric and change from , to .
as.numeric(str_replace_all(df,",","."))
as.numeric(unlist(df[ ,2:3]))
lapply(df, as.numeric)
as.numeric(gsub(pattern = ",",replacement = ".",df[ ,2:3]))
as.numeric(df$a)
What else can I do about this nasty problem?
I guess you read the data incorrectly (you can specify dec = "," while reading the data).
You can use gsub to replace the commas (,) with dots (.) and convert the columns to numeric.
df[] <- lapply(df, function(x) as.numeric(gsub(',', '.', x)))
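For reference, a sketch of reading such a file with the decimal mark declared up front (the file name is an assumption):
# read.table lets you declare "," as the decimal separator
df <- read.table("data.txt", header = TRUE, dec = ",", stringsAsFactors = FALSE)
# read.csv2 defaults to sep = ";" and dec = "," (common in European locales)
df <- read.csv2("data.csv")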
We can also use mutate_all
library(dplyr)
library(stringr)
df %>%
  mutate_all(~ as.numeric(str_replace(., ",", ".")))

Replace special character in data frame

I have a data frame which contains, in various cells, a special character that I know in advance. An example of the structure:
df = data.frame(col_1 = c("21 myspec^ch2 12", NA),
                col_2 = c("1 myspec^ch2 4", "4 myspec^ch2 212"))
The substring is " myspec^ch2 " (with the surrounding spaces) and I would like to replace it with -. An example of the expected output:
df = data.frame(col_1 = c("21-12",NA),
col_2 = c("1-4","4-212"))
I tried this but it is not working:
df [ df == " myspec^ch2 " ] <- "-"
To get gsub to work on the whole data frame, use apply; since apply returns a character matrix, assign it back into df[] to keep a data frame:
df[] <- apply(df, 2, function(x) gsub(" myspec\\^ch2 ", "-", x))
You really want to do a regex-style substitution here. However, in regex, ^ is seen as the beginning of the line (rather than a literal caret). So you can do something like this (using the stringr package):
library(dplyr)
library(stringr)
fixed_df <- df %>%
  mutate_all(funs(str_replace_all(., " myspec\\^ch2 ", "-")))
Note the double backslash in front of the caret: it escapes the caret and tells R to interpret it literally, rather than as the beginning of the line.
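Alternatively, you can sidestep regex escaping entirely by matching the pattern as a literal string:
# fixed = TRUE treats the pattern literally, so the caret needs no escaping
df[] <- lapply(df, function(x) gsub(" myspec^ch2 ", "-", x, fixed = TRUE))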

gsub not working on colnames?

I have a dataframe called df with column names in the following format:
"A Agarwal" "A Agrawal" "A Balachandran"
"A.Brush" "A.Casavant" "A.Chakrabarti"
They are first initial and last name. However, some of them are separated with a space, while others are with a period. I need to replace the space with a period. (The first column is called author.ID, and I excluded it from the following code.)
I have tried the following codes but the resulting colnames still do not change.
colnames(df[, -1]) = gsub("\\s", "\\.", colnames(df[, -1]))
colnames(df[, -1]) = gsub(" ", ".", colnames(df[, -1]))
What am I doing wrong?
Thanks.
Note that df[, -1] gets you all rows and columns except the first column (see this reference), and assigning to colnames(df[, -1]) only modifies a copy of that subset, not df itself. In order to modify the column names you should assign to colnames(df).
To replace the first literal space with a dot, use
colnames(df) <- sub(" ", ".", colnames(df), fixed=TRUE)
If there can be more than one whitespace, use a regex:
colnames(df) <- sub("\\s+", ".", colnames(df))
If you need to replace every whitespace sequence in the column names with a single dot, use gsub:
colnames(df) <- gsub("\\s+", ".", colnames(df))
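Since the first column (author.ID) should stay untouched, index into colnames(df) itself rather than into a subset of df:
# assigning into colnames(df)[-1] modifies df, unlike colnames(df[, -1]) <- ...
colnames(df)[-1] <- gsub("\\s+", ".", colnames(df)[-1])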

Whitespace string can't be replaced with NA in R

I want to substitute whitespace-only cells with NA. A simple way could be df[df == ""] <- NA, and that works for most of the cells of my data frame... but not for all of them!
I have the following code:
library(rvest)
library(dplyr)
library(tidyr)
#Read website
htmlpage <- read_html("http://www.soccervista.com/results-Liga_MX_Apertura-2016_2017-844815.html")
#Extract table
df <- htmlpage %>% html_nodes("table") %>% html_table()
df <- as.data.frame(df)
#Set whitespaces into NA's
df[df == ""] <- NA
I figured out that some of the apparently empty cells actually contain a single space between the quotation marks:
df[11,1]
[1] " "
So my next attempt was: df[df == " "] <- NA
However, the problem is still there, and the cell still has the little whitespace! I thought the trimws function would work, but it didn't...
#Trim
df[,c(1:10)] <- sapply(df[,c(1:10)], trimws)
However, the problem won't go away.
Any ideas?
We need to use lapply instead of sapply here: sapply returns a matrix instead of a list, which can create problems when the result is assigned back into the data frame.
df[1:10] <- lapply(df[1:10], trimws)
and another option, if we have cells like " ", is to use gsub to strip the leading/trailing whitespace down to ""
df[1:10] <- lapply(df[1:10], function(x) gsub("^\\s+|\\s+$", "", x))
and then change the "" to NA
df[df == ""] <- NA
Or instead of doing the two replacements, we can do this in one go and change the class with type.convert:
df[] <- lapply(df, function(x)
type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE))
NOTE: We don't have to specify the column index when all the columns are looped
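To see what the type.convert line does, here is a minimal sketch on a made-up vector:
x <- c(" ", "1", "", "2")
type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE)
# [1] NA  1 NA  2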
I just spent some time trying to determine a method usable in a pipe.
Here is my method:
df <- df %>%
dplyr::mutate_all(funs(sub("^\\s*$", NA, .)))
Hope this helps the next searcher.
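On current dplyr versions, where funs() is deprecated, the same idea can be written with across(); a sketch under that assumption:
library(dplyr)
df <- df %>%
  mutate(across(everything(), ~ sub("^\\s*$", NA, .x)))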
