R divide two dataframes by column [duplicate] - r

I've tried:
i <- as.numeric(as.character(Impress))
i <- as.numeric(as.character(levels(Impress)))
i <- as.numeric(paste(Impress))
I always get:
Warning message:
NAs introduced by coercion
> i
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
This is the data I want to be numeric:
> Impress
[1] 24,085,563.00 35,962,587.00 31,714,513.00 28,206,422.00 40,161,010.00 36,292,929.00 31,545,482.00
[8] 28,213,878.00 35,799,224.00 32,400,885.00 28,496,459.00 37,456,344.00 38,108,667.00 33,407,771.00
[15] 32,540,479.00 30,692,707.00 22,873,000.00 21,329,146.00 28,921,953.00 30,471,519.00 28,601,289.00
[22] 27,450,630.00 26,708,790.00 19,825,041.00 18,844,169.00 29,592,039.00 31,012,594.00 28,792,531.00
[29] 28,578,028.00 24,913,985.00
30 Levels: 18,844,169.00 19,825,041.00 21,329,146.00 22,873,000.00 24,085,563.00 24,913,985.00 ... 40,161,010.00
> paste(Impress)
[1] " 24,085,563.00 " " 35,962,587.00 " " 31,714,513.00 " " 28,206,422.00 " " 40,161,010.00 " " 36,292,929.00 " " 31,545,482.00 "
[8] " 28,213,878.00 " " 35,799,224.00 " " 32,400,885.00 " " 28,496,459.00 " " 37,456,344.00 " " 38,108,667.00 " " 33,407,771.00 "
[15] " 32,540,479.00 " " 30,692,707.00 " " 22,873,000.00 " " 21,329,146.00 " " 28,921,953.00 " " 30,471,519.00 " " 28,601,289.00 "
[22] " 27,450,630.00 " " 26,708,790.00 " " 19,825,041.00 " " 18,844,169.00 " " 29,592,039.00 " " 31,012,594.00 " " 28,792,531.00 "
[29] " 28,578,028.00 " " 24,913,985.00 "
and when I do i <- as.numeric(Impress), it returns the wrong values.
Thanks!

As far as R is concerned, a comma is not part of a number, so any string containing one cannot be parsed as numeric, even if it looks like a perfectly acceptable number to a human.
Get rid of the commas and the conversion will work, e.g. using gsub():
i <- as.numeric(gsub(",", "", as.character(Impress)))
E.g.
Impress <- c("24,085,563.00", "35,962,587.00", "31,714,513.00", "28,206,422.00")
gsub(",", "", as.character(Impress))
i <- as.numeric(gsub(",", "", as.character(Impress)))
i
R> gsub(",", "", as.character(Impress))
[1] "24085563.00" "35962587.00" "31714513.00" "28206422.00"
R> i
[1] 24085563 35962587 31714513 28206422
R> is.numeric(i)
[1] TRUE

Because the data contains commas, R cannot convert it to numeric. You have to remove the commas with gsub() first (sub() would only remove the first comma in each string) and then convert:
i <- as.numeric(gsub(",", "", as.character(Impress)))
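The two symptoms in the question (wrong values from as.numeric(Impress), and NA from the other attempts) can be reproduced on a small factor. The following is only a minimal sketch: the three values are taken from the question, the padding spaces mimic the paste() output shown there, and the variable names codes and values are made up for illustration.

```r
# A small factor with thousands separators and padding spaces, as in the question
Impress <- factor(c(" 24,085,563.00 ", " 35,962,587.00 ", " 18,844,169.00 "))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(Impress)

# Convert to character, trim the padding, strip the commas, then convert
values <- as.numeric(gsub(",", "", trimws(as.character(Impress)), fixed = TRUE))

print(codes)
print(values)
```

Using fixed = TRUE treats "," as a literal string rather than a regular expression, which is enough here.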

Related

How can I send a text to a logfile given that a certain event has finished within a loop in R?

I am trying to get error rates for different parameter settings of a random forest (classification).
Because I use a loop and this takes considerable time, I would like to know how much time has passed up to a certain point. To do this, I would like to write a result to a log file each time a certain event has occurred. The code looks like this:
library(randomForest)
ntree<-c(1:1000)
mtry<-c(1:30)
set.seed(123)
for (j in mtry) {
  for (i in ntree) {
    rf1 <- randomForest(mymodel, mtry = j, ntree = i)
    result = data.frame(mtry = j, ntree = i,
                        OOB = rf1[["err.rate"]][nrow(rf1[["err.rate"]]), "OOB"])
    oob_NP = rbind(oob_NP, result)
  }
}
I would like to get a result in a log file for every hundredth model, i.e. the error rate for
mtry=1, ntree=100
mtry=1, ntree=200
.
.
.
mtry=30,ntree=1000
Does anyone have an idea how to integrate this into the code?
This can be solved with sprintf to produce the log text lines and cat to write them to a connection.
logfile <- "Tacatico.log"
ntree <- 1:10
mtry <- 1:3
logfile_con <- file(logfile, open = "wt")
for (j in mtry) {
  for (i in ntree) {
    logtext <- sprintf("mtry=%d ntree=%d", j, i)
    cat(logtext, '\n', file = logfile_con)
  }
}
close(logfile_con)
Check what was written to the log file.
readLines(logfile)
# [1] "mtry=1 ntree=1 " "mtry=1 ntree=2 " "mtry=1 ntree=3 "
# [4] "mtry=1 ntree=4 " "mtry=1 ntree=5 " "mtry=1 ntree=6 "
# [7] "mtry=1 ntree=7 " "mtry=1 ntree=8 " "mtry=1 ntree=9 "
#[10] "mtry=1 ntree=10 " "mtry=2 ntree=1 " "mtry=2 ntree=2 "
#[13] "mtry=2 ntree=3 " "mtry=2 ntree=4 " "mtry=2 ntree=5 "
#[16] "mtry=2 ntree=6 " "mtry=2 ntree=7 " "mtry=2 ntree=8 "
#[19] "mtry=2 ntree=9 " "mtry=2 ntree=10 " "mtry=3 ntree=1 "
#[22] "mtry=3 ntree=2 " "mtry=3 ntree=3 " "mtry=3 ntree=4 "
#[25] "mtry=3 ntree=5 " "mtry=3 ntree=6 " "mtry=3 ntree=7 "
#[28] "mtry=3 ntree=8 " "mtry=3 ntree=9 " "mtry=3 ntree=10 "
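The loop above logs every iteration, whereas the question asks for one line per hundred models plus the time passed. One possible way to do that is to test i %% 100 == 0 and record the elapsed time with Sys.time(). This is a sketch under assumptions: the model fit is replaced by a placeholder (oob is set to NA instead of calling randomForest), and the log file is a temporary file rather than a real path.

```r
logfile <- tempfile(fileext = ".log")  # assumption: temporary file for the demo
ntree <- 1:1000
mtry <- 1:2

start <- Sys.time()
con <- file(logfile, open = "wt")
for (j in mtry) {
  for (i in ntree) {
    # Placeholder for the real fit, e.g. randomForest(..., mtry = j, ntree = i)
    oob <- NA_real_

    # Write one line for every hundredth value of ntree
    if (i %% 100 == 0) {
      elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
      cat(sprintf("mtry=%d ntree=%d OOB=%s elapsed=%.1fs\n",
                  j, i, format(oob), elapsed),
          file = con)
      flush(con)  # push the line to disk so the file can be watched while running
    }
  }
}
close(con)

log_lines <- readLines(logfile)
print(head(log_lines, 3))
```

Flushing after each write means the log can be inspected from another session while the loop is still running.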

Removing " " (empty values) from a Character of Strings

I have been looking around for a few hours now and have not been able to remove the " " values from the character vector below.
c("Final", "A", "7.43", "8.50", "15.93", "2.00",
"1.00", "0.30", "0.37", " 7.43", " 8.50", "0.50", "0.67", " ",
" ", " ", " ", " ", " ", " ", "B", "7.00", "3.77", "10.77",
" 7.00", "1.67", "3.77", " ", " ", " ", " ", " ", " ", " ", " ",
I have many more of these empty values in this dataset and just want to get rid of them before organizing them as a data frame like
Final
A B
7.43 7.43
8.50 8.50
15.93 0.50
2.00 0.67
1.00
0.30
Thanks,
You can use base grep with value = TRUE. That searches the character vector for a given regex pattern and returns all values in which the pattern is found.
You can think about the logic of your pattern a couple ways. One might be to think of it as keeping values with a "word" character, which are letters, numbers, or underscores.
x <- c("Final", "A", "7.43", "8.50", "15.93", "2.00", "1.00", "0.30", "0.37", " 7.43", " 8.50", "0.50", "0.67", " ", " ", " ", " ", " ", " ", " ", "B", "7.00", "3.77", "10.77", " 7.00", "1.67", "3.77", " ", " ", " ", " ", " ", " ", " ", " ")
grep("\\w", x, value = TRUE)
#> [1] "Final" "A" "7.43" "8.50" "15.93" "2.00" "1.00" "0.30"
#> [9] "0.37" " 7.43" " 8.50" "0.50" "0.67" "B" "7.00" "3.77"
#> [17] "10.77" " 7.00" "1.67" "3.77"
Another way is to find values with a character that isn't a space (\\S is the negation of \\s):
grep("\\S", x, value = TRUE)
#> [1] "Final" "A" "7.43" "8.50" "15.93" "2.00" "1.00" "0.30"
#> [9] "0.37" " 7.43" " 8.50" "0.50" "0.67" "B" "7.00" "3.77"
#> [17] "10.77" " 7.00" "1.67" "3.77"
Created on 2018-12-10 by the reprex package (v0.2.1)
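If a regex feels like overkill, a similar result can probably be had with trimws() and nzchar(), which also gives a chance to strip the leading spaces from values such as " 8.50". This is just a sketch on a shortened, made-up version of the vector; kept and cleaned are illustrative names.

```r
# Shortened example vector with blank entries of different widths
x <- c("Final", "A", "7.43", " 8.50", " ", "  ", "", "B", "7.00")

# Keep only entries that are non-empty after trimming whitespace
kept <- x[nzchar(trimws(x))]

# Optionally trim the surviving values as well
cleaned <- trimws(kept)

print(kept)
print(cleaned)
```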

Convert character matrix column to numeric matrix

I would like to draw a heatmap. I converted my data frame to a matrix, and the first column of the matrix contains 51 state names as character strings. Because of this, heatmap.2 fails with an error saying that 'x' must be a numeric matrix. If I convert the matrix to numeric, the state names are replaced by the numbers 1 to 51. How can I make the matrix numeric without changing the values in that column?
I get the following error:
> heatmap.2(matrix)
Error in heatmap.2(matrix) : `x' must be a numeric matrix
dput(matrix[1:20,1:5])
structure(c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE",
"FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA",
" 156023.01", " 934292.20", " 565543.16", " 859246.77", "1802826.03",
" 236048.04", " 277419.16", " 44170.06", " 364245.19", "3059883.80",
"1032052.28", " 49148.00", " 484355.76", " 103032.97", "1501399.16",
"1098716.37", " 536964.81", " 714912.96", " 930454.92", "1006184.61",
NA, " 647281.97", " 243467.03", " 222016.05", "1955376.54", " 284157.80",
" 546510.14", " 310209.01", " 238855.76", "3055374.94", " 620487.04",
" 52286.08", " 183689.95", " 101198.95", "2299302.42", " 682522.43",
" 203429.06", " 566182.29", " 434137.97", "1269701.60", " 279984.88",
" 1785117.72", " 1210217.08", " 1738388.11", "12313826.52", " 1033786.31",
" 1905870.34", " 1589936.20", " 1177198.27", " 7379680.11", " 3182089.09",
" 539865.15", " 907408.47", " 706547.91", " 5616722.28", " 2793763.32",
" 751262.24", " 2620593.80", " 3327343.31", " 3423941.61", " 277346.4",
" 3231424.9", " 1784411.7", " 2539940.3", "13107647.6", " 1623508.4",
" 2475804.7", " 1382151.2", " 1362240.3", "10431341.9", " 4514651.7",
" 1081821.1", " 1653629.7", " 594605.5", " 9147134.3", " 4121661.9",
" 1292330.2", " 3252592.8", " 3360762.2", " 4269284.1"), .Dim = c(20L,
5L), .Dimnames = list(NULL, c("Provider.State", "039 ", "057 ",
"064 ", "065 ")))
(I named it m so that I don't override the matrix function.)
First, your first column is an identifier. I'm going to assume the state codes are meaningful, so I'll keep them around as row names, but that doesn't change the outcome.
head(m)
# Provider.State 039 057 064 065
# [1,] "AK" " 156023.01" NA " 279984.88" " 277346.4"
# [2,] "AL" " 934292.20" " 647281.97" " 1785117.72" " 3231424.9"
# [3,] "AR" " 565543.16" " 243467.03" " 1210217.08" " 1784411.7"
# [4,] "AZ" " 859246.77" " 222016.05" " 1738388.11" " 2539940.3"
# [5,] "CA" "1802826.03" "1955376.54" "12313826.52" "13107647.6"
# [6,] "CO" " 236048.04" " 284157.80" " 1033786.31" " 1623508.4"
rn <- m[,1]
m <- m[,-1]
rn
# [1] "AK" "AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "HI" "IA" "ID" "IL" "IN" "KS" "KY" "LA" "MA"
head(m)
# 039 057 064 065
# [1,] " 156023.01" NA " 279984.88" " 277346.4"
# [2,] " 934292.20" " 647281.97" " 1785117.72" " 3231424.9"
# [3,] " 565543.16" " 243467.03" " 1210217.08" " 1784411.7"
# [4,] " 859246.77" " 222016.05" " 1738388.11" " 2539940.3"
# [5,] "1802826.03" "1955376.54" "12313826.52" "13107647.6"
# [6,] " 236048.04" " 284157.80" " 1033786.31" " 1623508.4"
(We'll use rn in a minute.) Now we need to convert everything to numbers.
m <- apply(m, 2, as.numeric)
rownames(m) <- rn
head(m)
# 039 057 064 065
# AK 156023.0 NA 279984.9 277346.4
# AL 934292.2 647282.0 1785117.7 3231424.9
# AR 565543.2 243467.0 1210217.1 1784411.7
# AZ 859246.8 222016.0 1738388.1 2539940.3
# CA 1802826.0 1955376.5 12313826.5 13107647.6
# CO 236048.0 284157.8 1033786.3 1623508.4
And now the heatmap works.
heatmap(m)
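As an alternative to apply(), the storage mode of the matrix can be changed in place, which keeps the dimensions and column names. The sketch below uses a tiny made-up 2 x 3 matrix built from values in the question; num is an illustrative name, and drop = FALSE is there only so the result stays a matrix even with few columns.

```r
# Tiny character matrix in the same shape as the question's data
m <- matrix(c("AK", "AL", " 156023.01", " 934292.20", NA, " 647281.97"),
            nrow = 2,
            dimnames = list(NULL, c("Provider.State", "039", "057")))

rn <- m[, 1]                   # keep the state codes for the row names
num <- m[, -1, drop = FALSE]   # drop the identifier column, stay a matrix

storage.mode(num) <- "double"  # convert strings to numbers, keep dim and dimnames
rownames(num) <- rn

print(num)
```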
It can also be done with the purrr package.
Try the following:
library(purrr)
df<-df %>%
map_if(is.factor,as.character) %>%
as.matrix

Scraping Financial Tables From Web Page with R, rvest,Rcurl

I'm trying to parse financial tables from a web page. I got as far as the code below, but I am not able to arrange the result into a list or data.frame.
library(rvest)
link <- "http://www.marketwatch.com/investing/stock/garan/financials/balance-sheet/quarter"
read <- read_html(link)
prs <- html_nodes(read, ".financials")
irre <- html_text(prs)
re <- strsplit(irre, split = "\r\n")
re is something like this:
[27] "Assets"
[28] ""
[29] " "
[30] " "
[31] " All values TRY millions."
[32] " 31-Dec-201431-Mar-201530-Jun-201530-Sep-201531-Dec-2015"
[33] " 5-qtr trend"
[34] " "
[35] " "
[36] " "
[37] " "
[38] " Total Cash & Due from Banks"
[39] " 27.26B26.27B26.7B34.51B27.9B"
[40] " "
[41] " "
and so on...
How can I turn this list into a properly structured data.frame, like the tables on this page?
Try
library(XML)
theurl <- "http://www.marketwatch.com/investing/stock/garan/financials/balance-sheet/quarter"
re <- readHTMLTable(theurl)
The result is a list with two dataframes.
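If you would rather stay with rvest, html_table() may do the same job by parsing table nodes directly instead of splitting their text. The sketch below runs on a small inline HTML snippet rather than the live page, so the markup (including the financials class on the table) is an assumption and the real page structure may differ. The two figures are the first values from the output in the question.

```r
library(rvest)

# Made-up HTML standing in for the real page
html <- '<table class="financials">
  <tr><th>Item</th><th>31-Dec-2014</th><th>31-Mar-2015</th></tr>
  <tr><td>Total Cash &amp; Due from Banks</td><td>27.26B</td><td>26.27B</td></tr>
</table>'

page <- read_html(html)

# html_table() returns one table per matched node
tables <- html_table(html_nodes(page, "table.financials"))
balance <- as.data.frame(tables[[1]])

print(balance)
```

The values come back as character strings such as "27.26B", so they would still need cleaning before any arithmetic.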
