I want to convert a character column to numeric so I can calculate the mean of BasePay. However, I keep getting different errors.
I use the code
dataset <- read.csv("Wagegap.csv")
SFWage <- dataset %>%
as.numeric(dataset$BasePay)%>%
group_by(gender,JobTitle, Year) %>%
summarise(averageBasePay = mean(BasePay, na.rm=TRUE)) %>%
select(gender, JobTitle, averageBasePay, Year)
clean <- SFWage %>% filter(gender != "")
If I don't use $, it won't recognize my BasePay column, and if I do use $, it shows:
Error in function_list[i] :
'list' object cannot be coerced to type 'double'
The BasePay column shows numbers with a "." rather than a "," as the decimal separator, so I shouldn't need gsub(), should I?
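Inside the pipe, dataset is passed as the first argument to as.numeric(), so R tries to coerce the whole data frame (a list) to double. Convert the column before the pipe instead: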
dataset$BasePay <- as.numeric(dataset$BasePay)
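The same conversion can also stay inside the pipe by wrapping it in mutate(). The following is only a minimal sketch: the small data frame stands in for Wagegap.csv, and its rows (including the "Not Provided" entry) are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for read.csv("Wagegap.csv")
dataset <- data.frame(
  gender   = c("F", "M", "F", ""),
  JobTitle = c("Clerk", "Clerk", "Clerk", "Clerk"),
  Year     = c(2014, 2014, 2014, 2014),
  BasePay  = c("100.5", "200", "Not Provided", "50"),
  stringsAsFactors = FALSE
)

SFWage <- dataset %>%
  # Non-numeric strings become NA; the coercion warning is silenced on purpose
  mutate(BasePay = suppressWarnings(as.numeric(BasePay))) %>%
  filter(gender != "") %>%
  group_by(gender, JobTitle, Year) %>%
  summarise(averageBasePay = mean(BasePay, na.rm = TRUE), .groups = "drop")
```

Inside mutate(), BasePay refers to the column directly, so no $ is needed.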
I need to create a line ID column within a dataframe for further pre-processing steps. The code worked fine up until yesterday. Today, however, I am getting this error message:
"Error in mutate():
ℹ In argument: line_id = (function (x, y) ....
Caused by error:
! Can't convert y to match type of x ."
Here is my code - the dataframe consists of two character columns:
split_text <- raw_text %>%
mutate(text = enframe(strsplit(text, split = "\n", ))) %>%
unnest(cols = c(text)) %>%
unnest(cols = c(value)) %>%
rename(text_raw = value) %>%
select(-name) %>%
mutate(doc_id = str_remove(doc_id, ".txt")) %>%
# removing empty rows + add line_id
mutate(line_id = row_number())
Besides row_number(), I also tried rowid_to_column, and even c(1:1000) - the length of the dataframe. The error message stays the same.
Try explicitly specifying the data type of the "line_id" column as an integer using the as.integer() function, like this:
mutate(line_id = as.integer(row_number()))
This code works but is not fully satisfying, since I have to break the pipe:
split_text$line_id <- as.integer(c(1:nrow(split_text)))
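The thread does not establish the root cause of the error. One thing that might be worth trying is to avoid the nested enframe() column altogether, so that mutate() only ever sees plain vectors. The sketch below assumes a made-up two-row raw_text; whether it removes the error on the real data is untested.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for the real raw_text (two character columns)
raw_text <- tibble(
  doc_id = c("a.txt", "b.txt"),
  text   = c("first line\nsecond line", "only line")
)

split_text <- raw_text %>%
  # strsplit() returns a list-column: one character vector per document
  mutate(text_raw = strsplit(text, split = "\n", fixed = TRUE)) %>%
  select(-text) %>%
  # one row per line of text
  unnest(cols = c(text_raw)) %>%
  mutate(doc_id = str_remove(doc_id, fixed(".txt"))) %>%
  mutate(line_id = row_number())
```

Using fixed(".txt") matches the literal suffix; as a regular expression, the "." would match any character.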
I'm trying
qual %>% select(reasons_code) %>% str_replace('\\+.*',replacement = '')
but I get the Warning message: In stri_replace_first_regex(string, pattern, fix_replacement(replacement), : argument is not an atomic vector; coercing.
However, when I do the following, the replacement works fine.
str_replace(qual$reasons_code,'\\+.*',replacement = '')
Does anyone know why this is happening?
In ?str_replace, the string argument is documented as:
string - Input vector. Either a character vector, or something coercible to one.
The output of select, however, is a data.frame with a single column; it is not converted to a vector. If we pull the column as a vector instead of using select, it should work:
library(dplyr)
library(stringr)
qual %>%
pull(reasons_code) %>%
str_replace('\\+.*',replacement = '')
Or if we prefer to use the OP's code with select, there are several ways to convert to vector - unlist is one of them
qual %>%
select(reasons_code) %>%
unlist %>%
str_replace('\\+.*',replacement = '')
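If the cleaned values should stay in the data frame rather than come back as a loose vector, the replacement can also be done inside mutate(). This is a small sketch with a made-up qual data frame; only the column name reasons_code comes from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical example data
qual <- data.frame(
  reasons_code = c("A1+B2", "C3", "D4+E5+F6"),
  stringsAsFactors = FALSE
)

# select() keeps a data frame, pull() returns the underlying vector
is.data.frame(select(qual, reasons_code))
is.character(pull(qual, reasons_code))

# Inside mutate(), reasons_code is already a character vector
qual_clean <- qual %>%
  mutate(reasons_code = str_replace(reasons_code, "\\+.*", ""))
```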
I'm trying to analyse data from multiple csv files. To create a key that can be used for left_join, I think I need to merge two columns. At present I'm using the tidyverse packages (including mutate), but I'm running into an issue because the two columns to merge have different types: one is a double and the other is in date format. I'm using the following code:
qlik2 <- qlik %>%
separate('Admit DateTime', into = c('Admit Date', 'Admit Time'), sep = 10) %>%
mutate(key = MRN + `Admit Date`)
and getting this error:
Error in mutate_impl(.data, dots) :
Evaluation error: non-numeric argument to binary operator.
If there's another way around this (or if the error is actually related to something else), then I'd appreciate any thoughts on the matter. Equally, if people know of a way to left_join with multiple keys, then that would work as well.
Thanks,
Cal
This is hard to answer without a reproducible example. But if I understand your question, you either want a numeric key or are trying to concatenate strings with the plus operator (+).
Numeric key
library(hablar)
qlik2 <- qlik %>%
separate('Admit DateTime',
into = c('Admit Date', 'Admit Time'),
sep = 10) %>%
convert(num(MRN, `Admit Date`)) %>%
mutate(key = MRN + `Admit Date`)
String key
qlik2 <- qlik %>%
separate('Admit DateTime',
into = c('Admit Date', 'Admit Time'),
sep = 10) %>%
mutate(key = paste(MRN, `Admit Date`))
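On the question of joining on multiple keys: left_join() accepts a character vector of column names in by, so a combined key may not be needed at all. A minimal sketch, assuming two made-up tables that share an MRN column and a hypothetical admit_date column:

```r
library(dplyr)

# Hypothetical example tables
qlik <- data.frame(
  MRN        = c(101, 102),
  admit_date = c("2017-01-01", "2017-02-15"),
  stringsAsFactors = FALSE
)
other <- data.frame(
  MRN        = c(101, 102),
  admit_date = c("2017-01-01", "2017-03-01"),
  ward       = c("A", "B"),
  stringsAsFactors = FALSE
)

# Rows match only when both MRN and admit_date agree
joined <- left_join(qlik, other, by = c("MRN", "admit_date"))

# If a single string key is still wanted, an explicit separator keeps it unambiguous
keyed <- qlik %>% mutate(key = paste(MRN, admit_date, sep = "_"))
```

The key columns must have compatible types in both tables, so a date stored as character on one side and as Date on the other would need converting first.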
I need to create a subset of my main data frame (mydata1) in R.
The Date column in mydata1 has already been converted to Date using the following code:
mydata1$Date = as.Date(mydata1$Date)
I am currently running the following code to create the subset of my data:
mydata3 <- mydata1 %>%
filter(Total.Extras.Per.GN >= 100) %>%
filter(Original.Meal.Plan.Code %in% target) %>%
filter(Date, between ("2017-01-01"), ("2017-06-01")) %>%
select(PropertyCode, Date, Market, Original.Meal.Plan.Code, GADR, Total.Extras.Per.GN)
However, the line filter(Date, between ("2017-01-01"), ("2017-06-01")) %>% is giving me an error. How do I write it properly so that it filters my Date column with the dates specified therein?
Error message:
Error in filter_impl(.data, dots) :
argument "left" is missing, with no default
Simply place Date inside the between arg and wrap date strings in as.Date() for comparison:
mydata3 <- mydata1 %>%
filter(Total.Extras.Per.GN >= 100) %>%
filter(Original.Meal.Plan.Code %in% target) %>%
filter(between(Date, as.Date("2017-01-01"), as.Date("2017-06-01"))) %>%
select(PropertyCode, Date, Market, Original.Meal.Plan.Code, GADR, Total.Extras.Per.GN)
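Note that between() includes both endpoints. A small self-contained sketch on made-up data (the value column is hypothetical) shows which rows survive:

```r
library(dplyr)

# Hypothetical example data around the boundary dates
mydata1 <- data.frame(
  Date  = as.Date(c("2016-12-31", "2017-01-01", "2017-03-15",
                    "2017-06-01", "2017-06-02")),
  value = 1:5
)

# between(x, left, right) is equivalent to x >= left & x <= right
in_range <- mydata1 %>%
  filter(between(Date, as.Date("2017-01-01"), as.Date("2017-06-01")))
```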
I know this question has been asked many times (Converting Character to Numeric without NA Coercion in R, Converting Character\Factor to Numeric without NA Coercion in R, etc.) but I cannot seem to figure out what is going on in this one particular case (Warning message: NAs introduced by coercion). Here is some reproducible data I'm working with.
#dependencies
library(rvest)
library(dplyr)
library(pipeR)
library(stringr)
library(translateR)
#scrape data from website
url <- "http://irandataportal.syr.edu/election-data"
ir.pres2014 <- url %>%
read_html() %>%
html_nodes(xpath='//*[@id="content"]/div[16]/table') %>%
html_table(fill = TRUE)
ir.pres2014<-ir.pres2014[[1]]
colnames(ir.pres2014)<-c("province","Rouhani","Velayati","Jalili","Ghalibaf","Rezai","Gharazi")
ir.pres2014<-ir.pres2014[-1,]
#Get rid of unnecessary rows
ir.pres2014<-ir.pres2014 %>%
subset(province!="Votes Per Candidate") %>%
subset(province!="Total Votes")
#Get rid of commas
clean_numbers = function (x) str_replace_all(x, '[, ]', '')
ir.pres2014 = ir.pres2014 %>% mutate_each(funs(clean_numbers), -province)
#remove any possible whitespace in string
no_space = function (x) gsub(" ","", x)
ir.pres2014 = ir.pres2014 %>% mutate_each(funs(no_space), -province)
This is where things start going wrong for me. I tried each of the following lines of code but I got all NA's each time. For example, I begin by trying to convert the second column (Rouhani) to numeric:
#First check class of vector
class(ir.pres2014$Rouhani)
#convert character to numeric
ir.pres2014$Rouhani.num<-as.numeric(ir.pres2014$Rouhani)
Above returns a vector of all NA's. I also tried:
as.numeric.factor <- function(x) {seq_along(levels(x))[x]}
ir.pres2014$Rouhani2<-as.numeric.factor(ir.pres2014$Rouhani)
And:
ir.pres2014$Rouhani2<-as.numeric(levels(ir.pres2014$Rouhani))[ir.pres2014$Rouhani]
And:
ir.pres2014$Rouhani2<-as.numeric(paste(ir.pres2014$Rouhani))
All those return NA's. I also tried the following:
ir.pres2014$Rouhani2<-as.numeric(as.factor(ir.pres2014$Rouhani))
That created a list of single digit numbers so it was clearly not converting the string in the way I have in mind. Any help is much appreciated.
The reason is what looks like a leading space before the numbers:
> ir.pres2014$Rouhani
[1] " 1052345" " 885693" " 384751" " 1017516" " 519412" " 175608" …
Just remove that as well before the conversion. The situation is complicated by the fact that this character isn’t actually a space, it’s something else:
mystery_char = substr(ir.pres2014$Rouhani[1], 1, 1)
charToRaw(mystery_char)
# [1] c2 a0
I have no idea where it comes from but it needs to be replaced:
str_replace_all(x, rawToChar(as.raw(c(0xc2, 0xa0))), '')
Furthermore, you can simplify your code by applying the same transformation to all your columns at once:
mystery_char = rawToChar(as.raw(c(0xc2, 0xa0)))
to_replace = sprintf('[,%s]', mystery_char)
clean_numbers = function (x) as.numeric(str_replace_all(x, to_replace, ''))
ir.pres2014 = ir.pres2014 %>% mutate_each(funs(clean_numbers), -province)
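The bytes c2 a0 are the UTF-8 encoding of U+00A0, the no-break space, which can be written in R as "\u00a0". Since mutate_each() and funs() are deprecated in current dplyr, the same cleanup might look like the sketch below with across(). The votes data frame and its second row are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the scraped table, with no-break spaces and commas
votes <- data.frame(
  province = c("A", "B"),
  Rouhani  = c("\u00a01,052,345", "\u00a0885,693"),
  Jalili   = c("\u00a0384,751", "\u00a017,516"),
  stringsAsFactors = FALSE
)

# Strip commas, no-break spaces and ordinary whitespace, then convert
clean_numbers <- function(x) {
  as.numeric(str_replace_all(x, "[,\u00a0\\s]", ""))
}

votes <- votes %>% mutate(across(-province, clean_numbers))
```

Listing "\u00a0" explicitly in the character class avoids relying on how the regex engine defines whitespace.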