Printing side-by-side data.table with title to PDF - r

I created a table from a data.frame using:
t_eth1 <- table(df_all$lettergrade1, df_all$ethnicity)
My output is great. I can run chisq.test on it just fine, and class(t_eth1) is "table".
It looks like the following (there are headers for columns tmp2 through tmp8 but no header for column tmp1; this gives me grief when trying to convert to a tibble because of the size mismatch, and I don't know what to do about that):
tmp1 <- c("A","B","C","D","F")
tmp2 <- c(0,1,1,1,0)
tmp3 <- c(4,1,4,1,0)
tmp4 <- c(0,0,2,0,0)
tmp5 <- c(3,5,12,2,0)
tmp6 <- c(1,5,6,2,0)
tmp7 <- c(8,10,16,3,2)
tmp8 <- c(0,2,2,1,0)
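For reference, this is roughly how the example can be put back together, with tmp1 as the row labels rather than a data column (just a sketch on my part, in case it clarifies the size mismatch):
# assemble the seven count vectors into a matrix and label the rows with the grades
m <- cbind(tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp8)
rownames(m) <- tmp1
t_example <- as.table(m)
chisq.test(t_example)  # behaves like my real table (with small-count warnings)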
My table will look similar to:
tst <- table(mtcars$cyl,mtcars$mpg)
I have many of these tables. I'd like to print them in some sort of grid fashion with a title above each table and the chisq.stat I calculated below each table.
table1title1 tabletitle2
| val val val | | val val val|
| val val val | | val val val|
| val val val | | val val val|
p-value chi stuff p-value chi stuff
I cannot seem to get these to print using either of the following (the table prints as text in RStudio, but nothing ends up in the PDF the way I'd expect from plot or ggplot):
pdf(file = "testprint.pdf", onefile = TRUE)
t_eth1  # or: print(t_eth1)
dev.off()
I've tried gtable, but it wants a tibble input and I can't seem to convert the data.table to a tibble.
I've tried MiKTeX, but it won't take the table format; in the error log it gives me: "! Sorry, but C:\Users\datad\AppData\Local\Programs\MiKTeX\miktex\bin\x64\pdflatex.exe did not succeed."
I've tried expss, which allows me to set.caption above tables, but it won't print the table to the pdf file.
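For concreteness, this is roughly the layout I'm imagining in code form (a rough, untested sketch using grid/gridExtra, which may not even be the right tools; it uses a built-in table in place of my t_eth1):
library(grid)
library(gridExtra)

tst <- table(mtcars$cyl, mtcars$gear)  # stand-in for t_eth1
chi <- chisq.test(tst)

# one panel = title above, table in the middle, chi-square result below
make_panel <- function(tab, title, chi) {
  arrangeGrob(
    textGrob(title, gp = gpar(fontface = "bold")),
    tableGrob(as.matrix(tab)),
    textGrob(sprintf("X-squared = %.2f, p-value = %.3f", chi$statistic, chi$p.value)),
    ncol = 1, heights = c(1, 6, 1)
  )
}

pdf(file = "testprint.pdf", onefile = TRUE)
grid.arrange(make_panel(tst, "table1title1", chi),
             make_panel(tst, "tabletitle2", chi),  # the second table would go here
             ncol = 2)
dev.off()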
I'm stuck. A little hand holding would be great.
> R.Version()
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "4"
$minor
[1] "0.2"
$year
[1] "2020"
$month
[1] "06"
$day
[1] "22"
$`svn rev`
[1] "78730"
$language
[1] "R"
$version.string
[1] "R version 4.0.2 (2020-06-22)"
$nickname
[1] "Taking Off Again"

Related

efficient processing of large amounts of data from a web socket in R

I am using the "websocket" library to process quotes from the Binance exchange. My script processes a large amount of incoming data and finds abnormally large data in them, which it stores in a small file. I ran into a problem in the processing speed of a large data stream. This results in a significant processing delay. How can I handle such large streams efficiently? unfortunately I cannot reproduce an example here, but I would like to know about the general practices of processing large amounts of data
DataProcess <- function(message) {
  data <- message                          # the incoming message
  ticker <- data$s                         # the symbol the data belongs to
  bid <- data$b                            # bid levels in which I look for large values
  ask <- data$a                            # ask levels in which I look for large values
  orderBook <- c(bid, ask)                 # combined levels to search
  secondValuesNumeric <- as.numeric(sapply(orderBook, "[[", 2))  # the size at each level
  maxIndex <- which.max(secondValuesNumeric)
  biggest_size <- orderBook[[maxIndex]]    # the level with the largest size in this message
  x <- list(symbol = ticker, biggest_size)
  # connection to the file in which the last recorded large value is stored
  con <- file(description = paste(ticker, '.txt', sep = ''))
  # if the largest size in this message is greater than the configured threshold (dat$bigSize,
  # defined elsewhere) and its price is not equal to the last recorded one, rewrite the file
  # and send a message to the messenger (bot is also defined elsewhere)
  if (as.double(x[[2]][2]) > dat$bigSize[dat$ticker == ticker] &
      !isTRUE(all.equal(as.double(readLines(con = con, n = 1, warn = FALSE)), as.double(x[[2]][1])))) {
    writeLines(paste(x[[2]][1]), con)
    bot$send_message(chat_id = -498542337,
                     text = paste(ticker, ': ', 'Large size for the price: ', x[[2]][1],
                                  ' Volume: ', x[[2]][2]))
  }
  close(con)
}
An incoming message looks like this:
$e
[1] "depthUpdate"
$E
[1] 1616530913906
$T
[1] 1616530913402
$s
[1] "COMPUSDT"
$U
[1] 276620494586
$u
[1] 276620494586
$pu
[1] 276620491003
$b
$b[[1]]
[1] "379.81" "0.170"
$b[[2]]
[1] "379.80" "2.200"
$b[[3]]
[1] "379.79" "0.486"
$b[[4]]
[1] "379.75" "4.269"
$b[[5]]
[1] "379.74" "1.410"
$b[[6]]
[1] "379.71" "0.427"
$b[[7]]
[1] "379.68" "1.208"
$b[[8]]
[1] "379.67" "3.949"
$b[[9]]
[1] "379.66" "1.115"
$b[[10]]
[1] "379.65" "7.200"
$b[[11]]
[1] "379.58" "0.308"
$b[[12]]
[1] "379.57" "0.500"
$b[[13]]
[1] "379.55" "1.100"
$b[[14]]
[1] "379.53" "1.000"
$b[[15]]
[1] "379.45" "1.855"
$b[[16]]
[1] "379.41" "5.333"
$b[[17]]
[1] "379.36" "3.149"
$b[[18]]
[1] "379.33" "6.838"
$b[[19]]
[1] "379.32" "10.000"
$b[[20]]
[1] "379.31" "0.806"
$a
$a[[1]]
[1] "380.09" "0.658"
$a[[2]]
[1] "380.10" "0.345"
$a[[3]]
[1] "380.12" "0.490"
$a[[4]]
[1] "380.18" "1.000"
$a[[5]]
[1] "380.20" "1.028"
$a[[6]]
[1] "380.21" "2.476"
$a[[7]]
[1] "380.22" "1.689"
$a[[8]]
[1] "380.28" "0.846"
$a[[9]]
[1] "380.30" "11.570"
$a[[10]]
[1] "380.34" "16.795"
$a[[11]]
[1] "380.35" "7.689"
$a[[12]]
[1] "380.36" "2.984"
$a[[13]]
[1] "380.37" "1.920"
$a[[14]]
[1] "380.38" "5.508"
$a[[15]]
[1] "380.39" "1.565"
$a[[16]]
[1] "380.40" "14.881"
$a[[17]]
[1] "380.41" "19.781"
$a[[18]]
[1] "380.42" "3.557"
$a[[19]]
[1] "380.43" "6.826"
$a[[20]]
[1] "380.44" "13.258"

Spatial data and memory

I am trying to add up geotiffs but am running into memory issues. R is using all 32GB according to the following R error...
In writeValues(y, x, start = 1) :
Reached total allocation of 32710Mb: see help(memory.size)
I also checked the properties of R and it is 64 bit and the target is...
"C:\Program Files\R\R-3.3.0\bin\x64\Rgui.exe"
The version is
R.Version()
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "3"
$minor
[1] "3.0"
$year
[1] "2016"
$month
[1] "05"
$day
[1] "03"
$`svn rev`
[1] "70573"
$language
[1] "R"
$version.string
[1] "R version 3.3.0 (2016-05-03)"
$nickname
[1] "Supposedly Educational"
So it looks like my max memory is being used by R. I tried to use the bigmemory package in R: in the code below I tried changing the matrix to a big.matrix, but that failed, and the error occurs when trying to write the output file. Any suggestions for altering the code so that less memory is used, or for making it work with the ff or bigmemory packages?
############ LOOP THROUGH AGE MAPS TO COMPILE THE NUMBER OF TIMES A CELL BURNS DURING A GIVEN SPAN OF TIME ####################
## Empirical Fires
print("1 of 3: 2010-2015")
burn.mat <- matrix(0, nrow, ncol)  # matrix of all zeros, the dimensions of the landscape (row, col)
# Read in historical fire maps
for (j in 2010:2015) {  # year loop
  age.tmp <- as.matrix(raster(paste('fr', j, '.tif', sep = '')))  # read in age map
  burn.mat <- burn.mat + (age.tmp == 1)  # in the ALFRESCO empirical fire history files, AGE = 1 where something burned; (age.tmp == 1) is a logical test returning a 0/1 map for FALSE/TRUE
  # Write the data to a GeoTIFF
  out <- raster(burn.mat, xmn = -1692148, xmx = 1321752, ymn = 490809.9, ymx = 2245610,
                crs = '+proj=aea +lat_1=55 +lat_2=65 +lat_0=50 +lon_0=-154 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs')
  writeRaster(out, filename = paste(outdir, '/burn.mat.hist.1950-2007.tif', sep = ''),
              format = 'GTiff', options = 'COMPRESS=LZW', datatype = 'FLT4S', overwrite = TRUE)
}
The problem will probably go away if you use Raster* objects rather than matrices. Something like:
library(raster)
r <- raster('fr2010.tif')
burn.mat <- setValues(r, 0)
for (j in 2010:2015) {
  age.tmp <- raster(paste0('fr', j, '.tif'))
  burn.mat <- burn.mat + (age.tmp == 1)
  # if age.tmp only has values of 0 and 1 use this instead:
  # burn.mat <- burn.mat + age.tmp
}
# write the results outside of the loop
writeRaster(burn.mat, filename = file.path(outdir, 'burn.mat.hist.1950-2007.tif'),
            options = 'COMPRESS=LZW', datatype = 'FLT4S', overwrite = TRUE)
A more direct approach without a loop:
files <- paste0('fr', 2010:2015, '.tif')
s <- stack(files)
burn <- sum(s)
Or
burn <- sum(s == 1)
Or to write to a file in one step:
b <- calc(s, sum, filename = file.path(outdir, 'burn.mat.hist.1950-2007.tif'),
          options = 'COMPRESS=LZW', datatype = 'FLT4S', overwrite = TRUE)

Dynamic data exporting using R

mybrowser$navigate("http://bitcointicker.co/transactions/")
> a <- mybrowser$findElement(using = 'css selector',"#transactionscontainer")
> a
[1] "remoteDriver fields"
$remoteServerAddr
[1] "localhost"
$port
[1] 4444
$browserName
[1] "firefox"
$version
[1] ""
$platform
[1] "ANY"
$javascript
[1] TRUE
$autoClose
[1] FALSE
$nativeEvents
[1] TRUE
$extraCapabilities
list()
[1] "webElement fields"
$elementId
[1] "0"
I am trying to web scrape live data using RSelenium and rvest. I am planning to create a control loop with a timer to run every minute, but I am struggling with dynamically exporting the data to a folder on my computer. Ideally I would create one output file and R would keep appending rows to it automatically, although I am not sure whether this is possible in R.
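What I have in mind is something along these lines (a rough sketch only; the file path is a placeholder, and I am not sure a repeat loop with Sys.sleep is the right way to do the timer):
out_file <- "C:/data/bitcoin_transactions.csv"  # placeholder path

repeat {
  txt <- a$getElementText()[[1]]  # current text of the scraped element from above
  row <- data.frame(timestamp = Sys.time(), text = txt)
  # append one row per pass; write the header only the first time
  write.table(row, out_file, sep = ",", append = TRUE,
              col.names = !file.exists(out_file), row.names = FALSE)
  Sys.sleep(60)  # wait a minute before scraping again
}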

Count number of times a word-wildcard appears in text (in R)

I have a vector of either regular words ("activated") or wildcard words ("activat*"). I want to:
1) Count the number of times each word appears in a given text (i.e., if "activated" appears in text, "activated" frequency would be 1).
2) Count the number of times each word wildcard appears in a text (i.e., if "activated" and "activation" appear in text, "activat*" frequency would be 2).
I'm able to achieve (1), but not (2). Can anyone please help? thanks.
library(tm)
library(qdap)
text <- "activation has begun. system activated"
text <- Corpus(VectorSource(text))
words <- c("activation", "activated", "activat*")
# Using termco to search for the words in the text
apply_as_df(text, termco, match.list=words)
# Result:
# docs word.count activation activated activat*
# 1 doc 1 5 1(20.00%) 1(20.00%) 0
Is it possible that this might have something to do with the versions? I ran the exact same code (see below) and got what you expected:
> text <- "activation has begun. system activated"
> text <- Corpus(VectorSource(text))
> words <- c("activation", "activated", "activat")
> apply_as_df(text, termco, match.list=words)
docs word.count activation activated activat
1 doc 1 5 1(20.00%) 1(20.00%) 2(40.00%)
Below is the output when I run R.Version(). I am running this in RStudio Version 0.99.491 on Windows 10.
> R.Version()
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "3"
$minor
[1] "2.3"
$year
[1] "2015"
$month
[1] "12"
$day
[1] "10"
$`svn rev`
[1] "69752"
$language
[1] "R"
$version.string
[1] "R version 3.2.3 (2015-12-10)"
$nickname
[1] "Wooden Christmas-Tree"
Hope this helps
Maybe consider a different approach using the stringi library?
text <- "activation has begun. system activated"
words <- c("activation", "activated", "activat*")
library(stringi)
counts <- unlist(lapply(words, function(word) {
  newWord <- stri_replace_all_fixed(word, "*", "\\p{L}")
  stri_count_regex(text, newWord)
}))
ratios <- counts/stri_count_words(text)
names(ratios) <- words
ratios
Result is:
activation activated activat*
0.2 0.2 0.4
In the code I convert * into \p{L}, which matches any letter in a regex pattern. After that I count the occurrences of the resulting regex.
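For illustration, this is what the conversion produces for the wildcard word and how it counts on the example text (the counts shown are what I get when running it):
library(stringi)
stri_replace_all_fixed("activat*", "*", "\\p{L}")
# [1] "activat\\p{L}"
stri_count_regex("activation has begun. system activated", "activat\\p{L}")
# [1] 2   (matches inside both "activation" and "activated")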

Split string in R

I am trying to split the output of the "ls -lrt" command from Linux, but strsplit is taking only one space as the delimiter. If there are two spaces, it takes the second space as a value. So I think I need to collapse multiple spaces into one. Does anybody have any idea how to do this?
> a <- try(system("ls -lrt | grep -i .rds", intern = TRUE))
> a
[1] "-rw-r--r-- 1 u7x9573 sashare 2297 Jun 9 16:10 abcde.RDS"
[2] "-rw-r--r-- 1 u7x9573 sashare 86704 Jun 9 16:10 InputSource2.rds"
> str(a)
chr [1:6] "-rw-r--r-- 1 u7x9573 sashare 2297 Jun 9 16:10 abcde.RDS" ...
>
>c = strsplit(a," ")
>c
[[1]]
[1] "-rw-r--r--" "1" "u7x9573" "sashare" ""
[6] "2297" "Jun" "" "9" "16:10"
[11] "abcde.RDS"
[[2]]
[1] "-rw-r--r--" "1" "u7x9573" "sashare"
[5] "86704" "Jun" "" "9"
[9] "16:10" "InputSource2.rds"
In the next step I needed just the file name, and I used the following code, which worked fine:
mtrl_name <- try(system("ls | grep -i .rds", intern = TRUE))
This returns that info in a data frame for the indicated files:
file.info(list.files(pattern = "[.]rds$", ignore.case = TRUE))
or if we knew the extensions were lower case:
file.info(Sys.glob("*.rds"))
strsplit takes a regular expression, so we can use one to help out. For more info read ?regex.
> x <- "Spaces everywhere right? "
> # Not what we want
> strsplit(x, " ")
[[1]]
[1] "Spaces" "" "" "everywhere" "right?"
[6] ""
> # Use " +" to tell it to split on 1 or more space
> strsplit(x, " +")
[[1]]
[1] "Spaces" "everywhere" "right?"
> # If we want to be more explicit and catch the possibility of tabs, new lines, ...
> strsplit(x, "[[:space:]]+")
[[1]]
[1] "Spaces" "everywhere" "right?"
