How to round specific columns by function using R?

I want to round specific columns, with each column rounded to a different number of digits. I tried the following code, but it gives an error:
roundCols <-function(repo, namcol, digiround){
repo[,"namcol"] = round(repo[,"namcol"], digits = digiround)
round.staus = TRUE
return(round.staus)
}
round.staus = FALSE
ils <- config[13]$ignoreColumns
ils <- gsub("\\{|\\}", "", ils)
ils <- ils %>% str_replace_all("\\&", ",")
coldrp <- unlist(strsplit(ils, "\\,"))
coldrp = gsub("[^a-zA-Z]+", ".", coldrp)
td <- fread(config[13]$save.location,stringsAsFactors = FALSE,drop=coldrp,blank.lines.skip = TRUE)
col_rnm <- c(names(td[,2]),names(td[,3])) # the two columns that will be rounded
col_rd <- c(2,3) # number of digits to round each column to
for (i in 1:length(col_rnm)) {
round.staus = roundCols(td,col_rnm,col_rd[i])
}
td
The error is:
Error in [.data.table(repo, , "namcol") :
column(s) not found: namcol
Running the body of the function directly in the console gives exactly the expected result.
Expected output:
Account Chargeable.Capacity Expected.Capacity.in.30.days Deviation
Kishore 0.01 0.007 3.778268e-11
My data initially:
Account Chargeable.Capacity Expected.Capacity.in.30.days Deviation
Kishore 0.007124108 0.007283185 3.778268e-11
The first table above is what I expect the function to produce. How can I fix this error? Any help is appreciated.

Do this instead:
for (i in seq_along(col_rnm)) {
  set(td, j = col_rnm[i], value = round(td[[col_rnm[i]]], col_rd[i]))
}
If you look at the help page for ?set (same help page as ?":="), you'll see it described as
set is a low-overhead loop-able version of :=
You'll find set used in many other answers on this site.
Reasons your approach didn't work:
You're missing an i in your loop: roundCols(td,col_rnm,col_rd[i]) needs to use col_rnm[i]
Your roundCols function neither updates the data by reference using data.table syntax (either set() or :=), nor does it return the updated data, so any changes are local to the function
The string "namcol" with quotes is just a string. To use the argument namcol, you need to use it without quotes.
You don't need an extra function for this; the approach above with set is simpler.
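If you would still rather keep a helper function, here is a minimal sketch (assuming the data.table package and the one-row sample from the question) of a corrected roundCols that takes a single column name, updates that column by reference with set(), and returns the table invisibly. It addresses all three problems listed above: it uses col_rnm[i], it uses the argument rather than the string "namcol", and the change is visible outside the function.

```r
library(data.table)

# One-row sample taken from the question
td <- data.table(
  Account = "Kishore",
  Chargeable.Capacity = 0.007124108,
  Expected.Capacity.in.30.days = 0.007283185,
  Deviation = 3.778268e-11
)

# Corrected helper (a sketch): rounds ONE column in place via set(),
# so the caller's table changes without reassignment; also returns it invisibly
roundCols <- function(repo, namcol, digiround) {
  set(repo, j = namcol, value = round(repo[[namcol]], digits = digiround))
  invisible(repo)
}

col_rnm <- names(td)[2:3]  # columns to round
col_rd  <- c(2, 3)         # digits for each column, in the same order

for (i in seq_along(col_rnm)) {
  roundCols(td, col_rnm[i], col_rd[i])  # note col_rnm[i], not col_rnm
}

td
```

This should print the expected output from the question (0.01 and 0.007 in the two rounded columns, Deviation unchanged).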

Related

Transaction problem in RStudio for tweet apriori analysis

I want to use the apriori algorithm in RStudio to mine association rules between words in my tweet database. However, the code below fails on the full data set of about a million rows, while it works on a small sample. I can't work out what causes the error and would appreciate your help.
TweetTrans <- read.transactions("../input/tweets/output.csv",
rm.duplicates=FALSE,
format = "basket",
sep = ",",
encoding = "UTF-8")
The error is:
Error in validObject(.Object): invalid class “ngCMatrix” object: row indices are not sorted within columns
Traceback:
1. read.transactions("../input/tweets/output.csv", rm.duplicates = FALSE,
. format = "basket", sep = ",", encoding = "UTF-8")
2. as(data, "transactions")
3. asMethod(object)
4. new("transactions", as(from, "itemMatrix"), itemsetInfo = data.frame(transactionID = names(from),
. stringsAsFactors = FALSE))
5. initialize(value, ...)
6. initialize(value, ...)
7. callNextMethod()
8. .nextMethod(.Object = .Object, ... = ...)
9. callNextMethod()
10. .nextMethod(.Object = .Object, ... = ...)
11. as(from, "itemMatrix")
12. asMethod(object)
13. new("ngCMatrix", p = c(0L, p), i = as.integer(i) - 1L, Dim = c(length(levels(i)),
. length(p)))
14. initialize(value, ...)
15. initialize(value, ...)
16. callNextMethod()
17. .nextMethod(.Object = .Object, ... = ...)
18. validObject(.Object)
19. stop(msg, ": ", errors, domain = NA)
Here are some ideas for finding a rogue line in the data file. The input to read.transactions should be a text file that looks something like this:
A, B, C
B, C
C, D, E
D, A, B, F
where A, B, C, etc. are the names of the items (probably longer than one character each!)
So you could read the file in with readLines:
data <- readLines("../input/tweets/output.csv")
Each element of data (one per line of the file) should be a string of the form "A, B, C" etc, as above.
You could then use functions (e.g. from the stringr package) to check whether any lines contain unusual characters or have an odd format. Without seeing your file it is hard to say exactly how to do this, but you might, for example, look for quotes in odd places (str_detect(data, '\\"')) or for characters that are not letters, digits, spaces or commas (str_detect(data, "[^\\w\\d\\s,]")).
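As a rough sketch of those two checks in base R (so stringr isn't required), assuming `data` holds the lines returned by readLines(). The five sample lines below are made up for illustration. Note that with perl = TRUE, `\w` matches only ASCII word characters, so accented letters or emoji in tweets will also be flagged, which may or may not be what you want.

```r
# Made-up lines standing in for readLines("../input/tweets/output.csv")
data <- c("A, B, C", "B, C", "C, \"D, E", "D, A, B, F", "E; F")

# Lines containing a double quote anywhere
has_quote <- grepl('"', data, fixed = TRUE)

# Lines containing anything other than word characters, whitespace or commas
# (\w already covers letters, digits and underscore)
has_odd_char <- grepl("[^\\w\\s,]", data, perl = TRUE)

# Row numbers worth inspecting by hand
suspect_rows <- which(has_quote | has_odd_char)
suspect_rows
data[suspect_rows]
```

Here rows 3 (stray quote) and 5 (semicolon instead of comma) would be flagged.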
Another thing you could try is to write a for loop to take each element of data (or perhaps larger chunks if that is too slow), save it as a file, try reading it with read.transactions, and see where it crashes.
for(i in seq_along(data)){
writeLines(data[i], "dummyfile.csv")
trans <- read.transactions("dummyfile.csv",
rm.duplicates=FALSE,
format = "basket",
sep = ",",
encoding = "UTF-8")
}
The value of i when it crashes will give you the problem row number. It might take a long time to run, though!
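Reading chunks instead of single lines is usually much faster. A minimal sketch of that idea, where find_bad_chunks and demo_reader are hypothetical names: each chunk is written to a temporary file, the reader is called inside tryCatch() so the loop keeps going, and the indices of failing chunks are returned. demo_reader is only a stand-in that fails on lines containing a double quote, so the example runs without arules; for the real data you would pass a function that calls read.transactions() on the file. Bear in mind that if the error comes from a combination of lines (as the next answer suggests), a chunk may fail only when those lines happen to land in it together.

```r
# Hypothetical helper: read `lines` in chunks of `chunk_size` with `reader`
# and return the indices of the chunks that raised an error
find_bad_chunks <- function(lines, chunk_size, reader) {
  starts <- seq(1, length(lines), by = chunk_size)
  bad <- integer(0)
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  for (k in seq_along(starts)) {
    idx <- starts[k]:min(starts[k] + chunk_size - 1, length(lines))
    writeLines(lines[idx], tmp)
    ok <- tryCatch({ reader(tmp); TRUE }, error = function(e) FALSE)
    if (!ok) bad <- c(bad, k)
  }
  bad
}

# Stand-in reader for this demo only: errors if the file contains a quote.
# With arules you would use something like
# function(f) read.transactions(f, format = "basket", sep = ",", encoding = "UTF-8")
demo_reader <- function(f) {
  if (any(grepl('"', readLines(f), fixed = TRUE))) stop("bad line")
  invisible(NULL)
}

data <- c("A, B, C", "B, C", "C, \"D, E", "D, A, B, F", "E, F")
bad_chunks <- find_bad_chunks(data, chunk_size = 2, reader = demo_reader)
bad_chunks
```

Chunk k covers lines (k - 1) * chunk_size + 1 to k * chunk_size, so a failing chunk can then be split further or checked line by line.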
I ran into a very similar problem: the same error got triggered when trying to cast a list to a transaction object.
I also couldn't easily figure out which lines in the data caused the issue, as it seems to be triggered by a combination of transactions rather than by any single one, but I tracked the problem down to this assignment in the arules source code:
p <- new("ngCMatrix", p = c(0L, p),
i = as.integer(i) - 1L,
Dim = c(length(levels(i)), length(p)))
My R got pretty rusty over time and I couldn't find an immediate way to patch the code, but I came up with an alternative solution for constructing the ngCMatrix object:
1. Assume you have the data in a data.frame in some sort of (user, item) format; in your case it would most likely be (tweet_id, term/word).
2. Create a unique incremental ID for every user and every item and add it to your data.frame.
3. Use those IDs to create the sparse matrix and, optionally, label its rows and columns with the item and user names to make it more interpretable.
4. Finally, cast the sparse matrix to a transactions object.
Example (I implemented mine with data.table, but a traditional dataframe implementation would be very similar):
library(Matrix)
library(data.table)
library(arules)
DT <- data.table(user = c('A','A','B','B','A','C','D'),
item = c('AAB','AAA','AAB','BBB','ABA','BBB','AAB'))
# Create user_ids
unique_users <- unique(DT$user)
users <- data.table(user=unique_users,
user_id=c(1:length(unique_users)))
# Repeat for items
unique_items <- unique(DT$item)
items <- data.table(item=unique_items,
item_id=c(1:length(unique_items)))
# Add the IDs to the original data table (setting keys first may help performance on large data)
DT <- merge.data.table(x=DT, y=users, by='user')
DT <- merge.data.table(x=DT, y=items, by='item')
# Create the sparse matrix
mat <- sparseMatrix(
i = DT$item_id,
j = DT$user_id,
dims = c(nrow(items), nrow(users)),
dimnames = list(items$item, users$user)
)
# transform to arules 'transactions'
txn <- as(mat, "transactions")
Please note that this doesn't explain what caused the issue; it only provides a workaround. In my data.table implementation the code is quite fast, taking only a few seconds to process over 30M transactions on a laptop-sized machine (2 CPUs, 16 GB RAM).
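If you only need the integer IDs for sparseMatrix(), a variant of the same idea (my substitution, not the merge-based approach above) is to let factor() assign the codes: each distinct value gets an integer code, and the factor levels play the role of the users and items lookup tables. This sketch uses only the Matrix package, which ships with R, and the same toy data; the final arules coercion is left as a comment so it runs without arules installed.

```r
library(Matrix)

# Same toy (user, item) pairs as in the example above
DT <- data.frame(user = c('A','A','B','B','A','C','D'),
                 item = c('AAB','AAA','AAB','BBB','ABA','BBB','AAB'),
                 stringsAsFactors = FALSE)

# factor() gives each distinct value an integer code (levels sorted alphabetically)
user_f <- factor(DT$user)
item_f <- factor(DT$item)

# Items as rows, users (transactions) as columns; no x given,
# so the result is a pattern matrix (ngCMatrix)
mat <- sparseMatrix(
  i = as.integer(item_f),
  j = as.integer(user_f),
  dims = c(nlevels(item_f), nlevels(user_f)),
  dimnames = list(levels(item_f), levels(user_f))
)

mat
# With arules installed, the same coercion as above would then apply:
# txn <- as(mat, "transactions")
```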

Using an R function to hash values produces a repeating value across rows

I'm using the following query:
let
Source = {1..5},
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), {"Numbers"}, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Letters", each Character.FromNumber([Numbers] + 64)),
#"Run R script" = R.Execute("# 'dataset' holds the input data for this script#(lf)#(lf)library(""digest"")#(lf)#(lf)dataset$SuffixedLetters <- paste(dataset$Letters, ""_suffix"")#(lf)dataset$HashedLetters <- digest(dataset$Letters, ""md5"", serialize = TRUE)#(lf)output<-dataset",[dataset=#"Added Custom"]),
output = #"Run R script"{[Name="output"]}[Value]
in
output
which produces a table in which the HashedLetters column contains the same value in every row.
And here is the R script with better formatting:
# 'dataset' holds the input data for this script
library("digest")
dataset$SuffixedLetters <- paste(dataset$Letters, "_suffix")
dataset$HashedLetters <- digest(dataset$Letters, "md5", serialize = TRUE)
output<-dataset
The 'paste' function appears to iterate over rows and resolve on each row with the new input. But the 'digest' function only appears to return the first value in the table across all rows.
I don't know why the behavior of the two functions would seem to operate differently. Can anyone advise how to get the 'HashedLetters' column to resolve using the values from each row instead of just the initial one?
Use:
dataset$HashedLetters <- sapply(dataset$Letters, digest, algo = "md5", serialize = TRUE)
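digest works on a whole object at a time, not on individual elements of a vector, so digest(dataset$Letters, ...) returns a single hash of the entire column, and that one value is recycled into every row. paste, by contrast, is vectorised and returns one result per element.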
vec <- letters[1:3]
digest::digest(vec, algo="md5", serialize=TRUE)
# [1] "38ce1fe9e19a222505e693e8bdd8aeec"
sapply(vec, digest::digest, algo="md5", serialize=TRUE)
# a b c
# "127a2ec00989b9f7faf671ed470be7f8" "ddf100612805359cd81fdc5ce3b9fbba" "6e7a8c1c098e8817e3df3fd1b21149d1"
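The same effect can be reproduced without digest. In this base-R sketch, whole_object_summary is a hypothetical stand-in that, like digest(), turns its entire input into a single string; assigning that single value to a data frame column recycles it to every row, whereas vapply() calls it once per element. vapply() is shown instead of sapply() because it guarantees exactly one character string per row; the digest equivalent is given in a comment.

```r
dataset <- data.frame(Letters = c("A", "B", "C"), stringsAsFactors = FALSE)

# Hypothetical stand-in: reduces its WHOLE input to one string, as digest() does
whole_object_summary <- function(x) paste(x, collapse = "|")

# Length-1 result, recycled to all rows (the behaviour seen with digest)
dataset$Whole <- whole_object_summary(dataset$Letters)

# One call per element, one result per row
dataset$PerRow <- vapply(dataset$Letters, whole_object_summary,
                         character(1), USE.NAMES = FALSE)

dataset
# digest equivalent:
# vapply(dataset$Letters, digest::digest, character(1),
#        algo = "md5", serialize = TRUE, USE.NAMES = FALSE)
```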

R get_ga function: filter component

I want to get Google Analytics data for a specific list of card numbers. The dimension ga:dimension10 contains the card numbers. The following code works:
ga_datasubset <- subset(get_ga(id, Startdatum, Einddatum,
metrics = c("ga:sessions", " ga:pageviews","ga:sessionDuration"),
dimensions="ga:dimension10, ga:deviceCategory, ga:medium",
fetch.by ="day"),
dimension10 %in% Datatest[,1])
But I want to do this without the subset function, by filtering in the get_ga() call itself. I tried the code below, but it doesn't work:
ga_datasubset <- get_ga(id, Startdatum, Einddatum,
metrics = c("ga:sessions", " ga:pageviews","ga:sessionDuration"),
dimensions="ga:dimension10, ga:deviceCategory, ga:medium",
filters ="ga:dimension10 %in% Datatest[,1]" ,
fetch.by ="day")
Error: Invalid parameter: Invalid value 'ga:dimension10 %in% Datatest[,1]' for filters parameter.
Any help will be greatly appreciated
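One likely reason for the error is that the filters value is sent to Google Analytics as a literal string, so R never evaluates `%in%` or `Datatest[,1]` inside it. Below is a sketch of building the filter string in R instead. It assumes that get_ga() passes filters through unchanged to the Core Reporting API v3, whose filter syntax uses `==` for an exact match and `,` to combine conditions with OR. The card numbers are made up. Values that themselves contain a comma or backslash would need backslash escaping, and a very long list may run into request-size limits.

```r
# Made-up card numbers standing in for Datatest[, 1]
card_numbers <- c("1001", "1002", "1003")

# "ga:dimension10==1001,ga:dimension10==1002,..." : exact matches joined by OR
ga_filter <- paste0("ga:dimension10==", card_numbers, collapse = ",")
ga_filter

# Then, in the call from the question:
# get_ga(id, Startdatum, Einddatum, ..., filters = ga_filter, fetch.by = "day")
```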

R shiny: how to create a dynamic list with names and values

I want to create a dynamic list with the names and values based on user inputs. I need to pass a function a list containing the name of each factor along with two values for each factor. For example,
factor.names <- list(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
The code below changes the factor values but leaves the names as nf1, nf2, etc.
if(input$fac==2){
names<-list(nf1 = c(input$l1,input$h1),nf2 = c(input$l2,input$h2))
}
I have tried using
names<-list(input$nf1 = c(input$l1,input$h1), input$nf2 = c(input$l2,input$h2))
But I keep on getting the following error:
Error in source(file, ..., keep.source = TRUE, encoding = checkEncoding(file)) :
C:\Users\Fred\Documents\App/server.R:49:59: unexpected '='
})
names<-list(n1 = c(input$l1,input$h1),input$nf2 =
^
I have also tried
n1<-reactive({
as.character(input$nf1)
})
names<-list(n1 = c(input$l1,input$h1),n2 = c(input$l2,input$h2))
}
But the names just stay as n1, n2 etc.
Any help or advice on the topic would be highly appreciated.
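The left-hand side of `name = value` inside list() has to be a literal name, which is why `input$nf1 = ...` is a syntax error. One way around it is to build the values first and attach the names afterwards with setNames(). In this sketch a plain list stands in for Shiny's input object, and the factor names and values are made up. Inside the app, this code would need to run in a reactive context such as reactive() or observe(). The second version generalises to input$fac factors using input[[...]] with pasted IDs, assuming the inputs are named nf1, l1, h1, nf2, and so on, as in the question.

```r
# Plain list standing in for Shiny's `input` (made-up values)
input <- list(nf1 = "Temperature", nf2 = "Pressure",
              l1 = -1, h1 = 1, l2 = -1, h2 = 1)

# Build the values, then attach the user-supplied names
factor.names <- setNames(
  list(c(input$l1, input$h1), c(input$l2, input$h2)),
  c(input$nf1, input$nf2)
)
str(factor.names)

# Generalised to n factors (n would be input$fac in the app)
n <- 2
factor.names2 <- setNames(
  lapply(seq_len(n), function(k) c(input[[paste0("l", k)]], input[[paste0("h", k)]])),
  vapply(seq_len(n), function(k) input[[paste0("nf", k)]], character(1))
)
```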

Error in table(x, y) : attempt to make a table with >= 2^31 elements

I have a problem plotting my results. About two weeks ago I could plot my data with the code below, but now I get an error:
data<- read.table("my_step.odt", header = FALSE, sep = "", quote="\"'", dec=".", as.is = FALSE, strip.white=FALSE, col.names=c(.......);
mgn_my <- data[1:49999,18]
sim <- data[1:49999, 21]
plot(sim , mgn_my , type="l",xlab="Time (ns)",ylab="mx")
The error:
Error in table(x, y) : attempt to make a table with >= 2^31 elements
Any suggestions?
I have had a similar problem before. Based on my answer to another post, here's what I would suggest before you run plot:
Option 1: Use droplevels
mgn_my <- droplevels(data[1:49999,18])
Option 2: Use apply. This approach seems "friendlier" if you are familiar with apply-family functions in R. For example:
mgn_my <- data[1:49999,18]
apply(mgn_my,1,plot)
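A likely cause, which is an assumption since the data file isn't shown: with as.is = FALSE, any column that read.table() cannot parse as numbers becomes a factor, and plot() with two factors draws a spineplot, which cross-tabulates x against y. With tens of thousands of distinct values in each column, that table exceeds 2^31 cells (49999^2 is about 2.5e9). The sketch below uses made-up values to show how to convert such factors back to numbers before plotting. Reading the file with as.is = TRUE (or explicit colClasses) and checking why those columns weren't numeric in the first place would avoid the factors altogether.

```r
# Made-up stand-ins for the two columns, read in as factors
sim    <- factor(c("0.1", "0.2", "0.3"))
mgn_my <- factor(c("0.95", "0.97", "0.99"))

is.factor(sim)  # TRUE, so plot(sim, mgn_my) would dispatch to a spineplot

# Convert via the level labels; as.numeric() on a factor alone
# would return the internal integer codes, not the values
to_num <- function(f) as.numeric(as.character(f))
sim_num    <- to_num(sim)
mgn_my_num <- to_num(mgn_my)

plot(sim_num, mgn_my_num, type = "l", xlab = "Time (ns)", ylab = "mx")
```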
