Losing trailing decimal zero when converting from character to numeric in R

I'm trying to convert character values (including those with decimal values) to numeric but it loses the decimal 0 and just converts it to integer:
results <- c("600.0","600","50","50.0","unknown","300xx300")
df <- data.frame(MIX = results, NUM_ONLY = as.numeric(results))
How can I change it so that it looks like this:
df2<- data.frame(MIX = results ,NUM_ONLY = c("600.0","600","50","50.0",NA,NA))

Use ifelse to set to NA those values that yield NA when coerced to numeric. The result is of class "character", though, since a "numeric" value cannot store a trailing decimal zero. I would stick with your own solution, whose result, by the way, is not "integer" but "numeric"; try it with, e.g., c("600.1","600","50","50.0","unknown","300xx300").
data.frame(results,
NUM_ONLY=suppressWarnings(ifelse(is.na(as.numeric(results)), NA, results)))
# results NUM_ONLY
# 1 600.0 600.0
# 2 600 600
# 3 50 50
# 4 50.0 50.0
# 5 unknown <NA>
# 6 300xx300 <NA>
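As a small check of the point above, the sketch below shows that "600.0" and "600" parse to the same double, and that the trailing zero can only be put back when the number is formatted for display. Using format() with nsmall = 1 is an assumption about the desired display, not part of the original answer.

```r
results <- c("600.0", "600", "50", "50.0", "unknown", "300xx300")

# Parse once; non-numeric strings become NA (the warning is suppressed on purpose).
num <- suppressWarnings(as.numeric(results))

# "600.0" and "600" parse to the same double, so the zero is not stored.
same_value <- identical(num[1], num[2])

# The parsed vector is of type double, not integer.
is_double <- is.double(num)

# A trailing zero can only be added back when formatting for display.
shown <- format(600, nsmall = 1)
```

Here shown is a character string again, so it is suitable for printing or export but not for arithmetic.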

Related

How do I convert hex to numbers keeping the sign (+/-) in R?

I am a first time poster so sorry if the format is not exactly as required.
I have a data frame that looks something like this, with each row containing three columns of hex strings:
id x
1 1 FFF8
2 2 FFBC
3 3 FFAE
4 4 0068
If I understand correctly, "FFF8" should convert to "-8", however all I have managed to do is convert it to the positive equivalent - "65528".
I have used:
dataframe$x<-as.numeric(dataframe$x)
I haven't found any R function that can maintain the minus sign, as intended.
Can anyone kindly help with converting the hex strings into a number whilst maintaining the intended minus sign?
Many thanks in advance.
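If you're assuming that the high bit indicates a negative number (16-bit two's complement), then convert with strtoi and subtract 0x10000 wherever that bit is set: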
strtoi(dat$x, base=16)
# [1] 65528 65468 65454 104
dat$x2 <- strtoi(dat$x, base=16)
dat$x3 <- ifelse(bitwAnd(dat$x2, 0x8000) > 0, -0xFFFF-1, 0) + dat$x2
dat
# id x x2 x3
# 1 1 FFF8 65528 -8
# 2 2 FFBC 65468 -68
# 3 3 FFAE 65454 -82
# 4 4 0068 104 104
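The same idea can be wrapped in a small function. This is a minimal sketch: the name hex_to_signed and its bits parameter are hypothetical, and it assumes two's-complement values narrower than 32 bits, because strtoi returns NA for values that do not fit in R's integer type.

```r
# Hypothetical helper: interpret hex strings as signed two's-complement
# integers of a given width (16 bits by default, matching the example).
hex_to_signed <- function(x, bits = 16) {
  value <- strtoi(x, base = 16L)   # unsigned interpretation
  sign_bit <- 2^(bits - 1)         # 0x8000 for 16 bits
  # Values at or above the sign bit wrap around to negative numbers.
  ifelse(value >= sign_bit, value - 2^bits, value)
}

signed <- hex_to_signed(c("FFF8", "FFBC", "FFAE", "0068"))
```

For the sample data this gives -8, -68, -82 and 104, matching the x3 column above.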

Convert delimited string to numeric vector in dataframe

This is such a basic question, I'm embarrassed to ask.
Let's say I have a dataframe full of columns which contain data of the following form:
test <-"3000,9843,9291,2161,3458,2347,22925,55836,2890,2824,2848,2805,2808,2775,2760,2706,2727,2688,2727,2658,2654,2588"
I want to convert this to a numeric vector, which I have done like so:
test <- as.numeric(unlist(strsplit(test, split=",")))
I now want to convert a large dataframe containing a column full of this data into a numeric vector equivalent:
mutate(data,
converted = as.numeric(unlist(strsplit(badColumn, split=","))),
)
This doesn't work because presumably it's converting the entire column into a numeric vector and then replacing a single row with that value:
Error in mutate_impl(.data, dots) : Column converted must be
length 20 (the number of rows) or one, not 1274
How do I do this?
Here's some sample data that reproduces your error:
data <- data.frame(a = 1:3,
badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
stringsAsFactors = FALSE)
Here's the error:
library(tidyverse)
mutate(data, converted = as.numeric(unlist(strsplit(badColumn, split=","))))
# Error in mutate_impl(.data, dots) :
# Column `converted` must be length 3 (the number of rows) or one, not 18
A straightforward way would be to just use strsplit on the entire column, and lapply ... as.numeric to convert the resulting list values from character vectors to numeric vectors.
x <- mutate(data, converted = lapply(strsplit(badColumn, ",", TRUE), as.numeric))
str(x)
# 'data.frame': 3 obs. of 3 variables:
# $ a : int 1 2 3
# $ badColumn: chr "10,20,30,40,50" "1,2,3,4,5,6" "9,8,7,6,5,4,3"
# $ converted:List of 3
# ..$ : num 10 20 30 40 50
# ..$ : num 1 2 3 4 5 6
# ..$ : num 9 8 7 6 5 4 3
This might help:
library(purrr)
mutate(data, converted = map(badColumn, function(txt) as.numeric(unlist(strsplit(txt, split = ",")))))
What you get is a list column which contains the numeric vectors.
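Base R, for the single string:
A <- as.numeric(strsplit(test, ',')[[1]])
A
# [1] 3000 9843 9291 2161 3458 2347 22925 55836 2890 2824 2848 2805 2808 2775 2760 2706 2727 2688 2727 2658 2654 2588
For a data frame column (here df$NEw), split each element separately:
df$NEw2 <- lapply(df$NEw, function(x) as.numeric(strsplit(x, ',')[[1]]))
With dplyr the same call needs rowwise(); without it, [[1]] picks only the first row's values and list() recycles them to every row:
df %>% rowwise() %>% mutate(NEw2 = list(as.numeric(strsplit(NEw, ',')[[1]]))) %>% ungroup()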
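If a list column is awkward to work with downstream, one alternative (not taken from the answers above) is to reshape to long format, with one row per number. The sketch below uses only base R and reuses the sample data from the first answer; the column name value is an assumption.

```r
data <- data.frame(a = 1:3,
                   badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
                   stringsAsFactors = FALSE)

# Split every row's string into its pieces (a list of character vectors).
parts <- strsplit(data$badColumn, ",", fixed = TRUE)

# Repeat each row id once per piece, then flatten the pieces into one vector.
long <- data.frame(a = rep(data$a, lengths(parts)),
                   value = as.numeric(unlist(parts)))
```

The result has 18 rows, one for each number in the three strings, and the a column records which original row each number came from.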

Adding NA's to a vector

Let's say I have a vector of prices:
foo <- c(102.25,102.87,102.25,100.87,103.44,103.87,103.00)
I want to get the percent change from x periods ago and, say, store it into another vector that I'll call log_returns. I can't bind vectors foo and log_returns into a data.frame because the vectors are not the same length. So I want to be able to append NA's to log_returns so I can put them in a data.frame. I figured out one way to append an NA at the end of the vector:
log_returns <- append((diff(log(foo), lag = 1)),NA,after=length(foo))
But that only helps if I'm looking at percent change 1 period before. I'm looking for a way to fill in NA's no matter how many lags I throw in so that the percent change vector is equal in length to the foo vector
Any help would be much appreciated!
You could use your own modification of diff:
mydiff <- function(data, lag){
c(diff(data, lag = lag), rep(NA, lag))
}
mydiff(foo, 1)
[1] 0.62 -0.62 -1.38 2.57 0.43 -0.87 NA
data.frame(foo = foo, diff = mydiff(foo, 3))
foo diff
1 102.25 -1.38
2 102.87 0.57
3 102.25 1.62
4 100.87 2.13
5 103.44 NA
6 103.87 NA
7 103.00 NA
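The function above pads at the end of the vector. If each row should instead hold the change from lag periods earlier, the padding goes at the front. The sketch below is one way to do that for log returns; the name lagged_log_return is hypothetical, and it assumes lag is smaller than the length of the input.

```r
foo <- c(102.25, 102.87, 102.25, 100.87, 103.44, 103.87, 103.00)

# Hypothetical helper: log return over `lag` periods, padded at the front so
# that element i describes the change from element i - lag to element i.
lagged_log_return <- function(x, lag = 1) {
  c(rep(NA_real_, lag), diff(log(x), lag = lag))
}

returns <- data.frame(foo = foo,
                      r1 = lagged_log_return(foo, 1),
                      r3 = lagged_log_return(foo, 3))
```

Both return columns have the same length as foo, so they can sit in one data frame regardless of the lag.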
To set elements of an array to NA: say you have an array holding the numbers 1 to 10 arranged as a matrix with 5 rows and 2 columns, and the 2nd column is to be assigned NA.
# Make one 5 x 2 x 1 array of the elements 1:10
Array_test <- array(1:10, dim = c(5, 2, 1))
Array_test
Array_test[, 2, ] <- NA # set the whole 2nd column to NA
Array_test
# Similarly, to make only one element of the matrix NA,
# say the 4th row of the 2nd column:
Array_test[4, 2, ] <- NA

Reformatting Messy Data Frame Column in R

I've imported a large data frame from a CSV file with oddly formatted numerical data. Here's a reproducible example of the data frame I'm working with:
df <- data.frame("r1" = c(1,2,3,4,5), "r2" = c(1,2.01,-3,"-","2,000"))
'r2' contains negative values, e.g. "-3", and zeros represented as a lone dash "-". To run some numerical analysis on this messy r2 column, I will need to:
Replace the lone "-" entries with "0" without removing the negative sign in front of the negative values.
Avoid coercion of legitimate values like "2,000" to NA. For some reason, when I run the command foo$row2<- as.numeric(sub("-",0,foo$row2)), R coerces the values formatted with commas to NA, thus corrupting the data in the column.
Here's an example of output after running foo$row2<- as.numeric(sub("-",0,foo$row2)) :
Warning message:
NAs introduced by coercion
r1 r2
1 1 1.00
2 2 2.01
3 3 3.00
4 4 0.00
5 5 NA
As you can see, "2,000" was coerced to NA. -3 was erroneously converted to 3 (dash removed). But hey, at least we got rid of the "-" in row 3, right!!!
Here's ultimately what I would like to produce:
r1 r2
1 1 1.00
2 2 2.01
3 3 -3.00
4 4 0.00
5 5 2000
Note that the comma from row 5 is removed. Column r2 should be formatted such that I can run commands like sum(df$r2) on it.
Your approach was sound. Just run the substitution twice: once to remove any commas, and once more to replace anything that is just a dash with zero.
df$r2<-as.numeric(gsub('^-$','0',gsub(',','',df$r2)))
And, if you aren't familiar with regular expressions, ^-$ matches only strings that start (^), have a single dash, and then end ($), so the minus sign in front of a negative number such as -3 is left alone.
nograpes' solution is way cooler:
## df <- data.frame("r1" = c(1,2,3,4,5), "r2" = c(1,2.01,-3,"-","2,000"))
df$r2 <- as.numeric(gsub(",", "", df$r2))
df$r2[is.na(df$r2)] <- 0
## r1 r2
## 1 1 1.00
## 2 2 2.01
## 3 3 -3.00
## 4 4 0.00
## 5 5 2000.00
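Both answers can be folded into one reusable function. The sketch below is a hedged variant, not the answerers' code: the name clean_number is hypothetical, and unlike the NA-to-zero replacement above it converts only a lone dash to zero, so other unparsable entries stay NA and remain visible.

```r
# Hypothetical helper: convert strings such as "2,000", "-3" and "-" to numbers.
clean_number <- function(x) {
  x <- trimws(as.character(x))          # also handles factor input
  x <- gsub(",", "", x, fixed = TRUE)   # drop thousands separators
  x[x %in% "-"] <- "0"                  # a lone dash stands for zero
  as.numeric(x)                         # anything else unparsable becomes NA
}

r2 <- clean_number(c("1", "2.01", "-3", "-", "2,000"))
```

With the question's data, df$r2 <- clean_number(df$r2) would leave a numeric column on which sum(df$r2) works.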

Creating a series of vectors from a vector

I have a simple two-column data frame (30 rows) that looks something like this:
> mDF
Param1 w.IL.L
1 AuZgFw 0.5
2 AuZfFw 2
3 AuZgVw 74.3
4 AuZfVw 20.52
5 AuTgIL 80.9
6 AuTfIL 193.3
7 AuCgFL 0.2
8 ...
I'd like to use each of the rows to form 30 single value numeric vectors with the name of the vector taken from mDF$Param1, so that:
> AuZgFw
[1] 0.5
etc
I've tried melting and casting, but I suspect there may be an easier way?
The simplest/shortest way is to apply assign over rows:
mDF <- read.table(textConnection("
Param1 w.IL.L
1 AuZgFw 0.5
2 AuZfFw 2
3 AuZgVw 74.3
4 AuZfVw 20.52
5 AuTgIL 80.9
6 AuTfIL 193.3
7 AuCgFL 0.2
"),header=TRUE,stringsAsFactors=FALSE)
invisible(apply(mDF,1,function(x)assign(x[[1]],as.numeric(x[[2]]),envir = .GlobalEnv)))
This involves converting the second column of the data frame to and from a string, because apply first converts the data frame to a character matrix. invisible is there only to suppress the output of apply.
EDIT: You can also use mapply to avoid coercion to/from strings:
invisible(mapply(function(x,y)assign(x,y,envir=.GlobalEnv),mDF$Param1,mDF$w.IL.L))
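If the goal is only to look values up by name, a named vector avoids creating 30 global variables. The sketch below is an alternative under that assumption, using a shortened copy of the sample data; the names params and env are hypothetical.

```r
mDF <- data.frame(Param1 = c("AuZgFw", "AuZfFw", "AuZgVw"),
                  w.IL.L = c(0.5, 2, 74.3),
                  stringsAsFactors = FALSE)

# Keep the values together in one named numeric vector and index by name.
params <- setNames(mDF$w.IL.L, mDF$Param1)
first_value <- params[["AuZgFw"]]

# If separate variables are really needed, create them in a dedicated
# environment rather than in the global one.
env <- list2env(as.list(params), envir = new.env())
second_value <- get("AuZfFw", envir = env)
```

Keeping the values in one object also makes it easy to loop over them or pass them to a function together.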
