Undefined columns selected error in R - r

I apologize in advance because I'm extremely new to coding and was thrust into it just a few days ago by my boss for a project.
My data set is called s1. S1 has 123 variables and 4 of them have some form of "QISSUE" in their name. I want to take these four variables and duplicate them all, adding "Rec" to the end of each one (That way I can freely play with the new variables, while still maintaining the actual ones).
Running this line of code keeps giving me an error:
b<- llply(s1[,
str_c(names(s1)
[str_detect(names(s1), fixed("QISSUE"))],
"Rec")],table)
The error is as such:
Error in `[.data.frame`(s1, , str_c(names(s1)[str_detect(names(s1), fixed("QISSUE")) & :
undefined columns selected
Thank you!

Use this to get the subset. Of course there is other ways to do that with simpler code
b<- llply(s1[,
names(s1)[str_detect(names(s1), fixed("QISSUE"))]
],c)
nwnam=str_c(names(s1)[str_detect(names(s1), fixed("QISSUE"))],"Rec")
ndf=data.frame(do.call(cbind,b));colnames(ndf)=nwnam
ndf
# of course you can do
cbind(s1,ndf)

Related

Error while using "EpiEstim" and "ggplot2" libraries

First of all, I must say I'm completely noob in R. So I apologize in advance for asking for help with such a simple task. My task is to form a graph of COVID-19 cases for a certain period using data from the CSV file. Unfortunately, at the moment I cannot contact the person from the World Health Organization who provided the data and the script for launching. But I was left with an error that I cannot fix either myself, not with the help of Google.
script.R
library(EpiEstim)
library(ggplot2)
COVID<-read.csv("dataset.csv")
res_parametric_si<-estimate_R(COVID$I,method="parametric_si",config=make_config(list(mean_si=4,std_si=3)))
plot(res_parametric_si)
dataset.csv
Date,Suspected per day,Total suspected,Discarded/pending,Confirmed per day,Total confirmed,Deaths per day,Deaths Total,Case fatality rate,Daily confirmed,Recovered per day,Recovered total,Active cases,Tested with PCR,# of PCR tests total,average tests/ 7 days,Inf HCW,Inf HCW/d,Vent HCW,Susp per day
01-Jul-20,1239,91172,45285,889,45887,12,1185,2.58%,889,505,20053,24649,11109,676684,10073,6828,63,,1239
02-Jul-20,1249,92421,45658,876,46763,27,1212,2.59%,876,505,20558,24993,13167,689851,9966,6874,46,,1249
03-Jul-20,1288,93709,46032,914,47677,15,1227,2.57%,914,597,21155,25295,11825,701676,9915.7,6937,63,,1288
04-Jul-20,926,94635,46135,823,48500,22,1249,2.58%,823,221,21376,25875,9934,711610,9957,6990,53,,926
05-Jul-20,680,95315,46272,543,49043,13,1262,2.57%,543,327,21703,26078,6696,718306,9963.7,7030,40,,680
06-Jul-20,871,96186,46579,564,49607,21,1283,2.59%,564,490,22193,26131,9343,727649,10303.9,7046,16,,871
07-Jul-20,1170,97356,46942,807,50414,23,1306,2.59%,807,926,23119,25989,13568,741217,10806,7092,46,,1170
Error
Error in process_I(incid) (script.R#4): incid must be a vector or a dataframe with either i) a column called 'I', or ii) 2 columns called 'local' and 'imported'.
For the example data the issue seems to be that it does only cover 7 data points, and the configurator assumes that there it can window over more than 7 days. What worked for me was the following code (working in the sense that it does not throw an error).
config <- make_config(incid = COVID$Daily.confirmed,
method="parametric_si",
list(mean_si=4,std_si=3, t_start = c(2,3),t_end = c(6,7)))
res_parametric_si<-estimate_R(COVID$Daily.confirmed,method="parametric_si",config=config)
plot(res_parametric_si)

Adding rows to different dataframe programatically

Im creating a dataframe dynamically and Im using custom names to refer to those data frames.How ever, I can succesfully create the data frames dynamically and add information individually but manually when i try to add a record to it it will run the action but nothing happens. I can open the data frame and it shows as empty
#Extract unique machines on the system
machines <- unique(wo_raw$MACHINE)
for(machine in machines){
#Check if the machine is present on current data frames or has a record
if(exists(machine) && is.data.frame(get(machine))){
#Machine already exists on the system
cat(machine," is a dataframe","\n")
netlbs <- subset(wo_raw,((wo_raw$TYPE =="T" & wo_raw$TYPE2=="E") | (wo_raw$TYPE == "T" & is.na(wo_raw$TYPE2))) & wo_raw$WEEK<=curWeek & wo_raw$MACHINE == machine & wo_raw$YEAR == curYear,select = NET_LBS)
scraplbs<- subset(wo_raw,((wo_raw$TYPE =="T" & wo_raw$TYPE2=="E") | (wo_raw$TYPE == "T" & is.na(wo_raw$TYPE2))) & wo_raw$WEEK<=curWeek & wo_raw$MACHINE == machine & wo_raw$YEAR == curYear,select = SCRAP_LBS)
if(is.data.frame(netlbs) && nrow(netlbs)!=0){
totalNet<- sum(netlbs)
totalScrap<- sum(scraplbs)
scrapRate <- percent(totalScrap/(sum(totalNet,totalScrap)),accuracy = 2)
tempDf<-data.frame(curYear,curMonth,curDay,curWeek,totalNet,totalScrap,scrapRate)
names(tempDf)<-c("year","month","day","week","net_lbs","scrap_lbs","scrap_rate")
cat("Total Net lbs for ",machine,": ",totalNet,"\n")
cat("Total Scrap lbs for ",machine,": ",totalScrap,"\n")
cat("Total Scrap Rate for ",machine,": ",scrapRate,"\n")
#machine<-rbind(get(machine),tempDf)
#assign(machine,rbind(machine,tempDf))
add_row(get(machine),year=curYear,
month=curMonth,
day=curDay,
week=curWeek,
net_lbs=totalNet,
scrap_lbs=totalScrap,
scrap_rate=scrapRate)
cat("added row \n")
}
#info<-c(curYear,curMonth,curDay,curWeek,netlbs)
#cat("Total Net lbs: ",netlbs,"\n")
#netlbs <-NULL
}else{
cat("Creating machine dataframe: ",machine,"\n")
#Create a dataframe labeled with machine name contining
#date information, net lbs,scrap lbs and scrap rate
assign(paste0(machine,""),data.frame(year=integer(),
month=integer(),
day=integer(),
week=integer(),
net_lbs=double(),
scrap_lbs=double(),
scrap_rate=integer()
)
)
#machine$year<-curYear
}
#machine<-NULL
}
All the functions that I've tried are in commented lines from previous answers found on Stack Overflow. I did get working with a for but i dont think that would be really feasible since it will consume a lot of resources plus it doesn't work well when handling various data types . Does anybody have an idea of whats going on, I don't have an error to go by.
I think your code needs quite a lot of cleanup. Make sure you know for yourself at each step what exactly you are handling.
Some hints:
Try to make your code self-contained. If I run your code, I get an error right away, as I don't have wo_raw defined. I understand it's some kind of data.frame, but exactly what is in there? What do I need to do to try to run your code? Also with variables like curYear. I get that it needs to be 2019, but I need to type an awful lot to just get to the problem, I can't just copy-paste.
If you use any libraries, please also include a line for them. I don't know what add_row does or is supposed to do. So I also don't know if that's where your expectations are wrong?
Try to make your code minimal before posting it here. I like the comments and cats sprinkled throughout, but why a line such as netlbs <- subset(wo_raw,((wo_raw$TYPE =="T" & wo_raw$TYPE2=="E") | (wo_raw$TYPE == "T" & is.na(wo_raw$TYPE2))) & wo_raw$WEEK<=curWeek & wo_raw$MACHINE == machine & wo_raw$YEAR == curYear,select = NET_LBS)? For this problem, just something like subset(wo_raw, wo_raw$mach==machine, net) would suffice
I get that the code works, but try to work out where you are using what kind of objects. if (is.data.frame(netlbs)) {total=sum(netlbs)} may work, but summing a data.frame while you actually just need a column leads to confusion.
When using variables to store the names of other variables such as you are doing, be very aware of what you are actually refering to. For that reason, it's generally advisable to steer clear of these constructs, it's almost always easier to store your results in a list or something similar
Come to that: the variable machine is not a data.frame, it's a character. That character is the name of another variable, which is a data.frame. So (commented out) I see some instances of machine <- NULL and machine$year, those are wrong. As is rbind(machine, ...), as machine is not a data.frame
That being said, I think you got close with the assign-statement.
Does assign(machine,rbind(get(machine),tempDf)) work?

How to use create-<breeds>-with between two breed turtle agents?

I've been stuck by this issue for a long time. I have two networks in my model, so I want to create different types of links with different breed turtle agentsets.
Let's name the 1st turtle agentset T1 and the 2nd T2, so what I did is the following:
breed [T1s T1]
undirected-link-breed [TL1s TL1]
breed [T2s T2]
undirected-link-breed [TL2s TL2]
;;Got error report
ask T1s [create-TL1s-with other n-of 10 T1s]
The last line gave an error reporting that "You cannot use breeded and unbreeded links in the same world". I'm quite confused about what this means.
And then, I changed the last line to:
ask T1s [create-links-with other n-of 10 T1s]
It worked this time, but if that's the case, how can I define two different types of links, i.e., TL1 and TL2, with different turtle agentsets T1s and T2s?
Can anybody help me out? I really appreciate it!
Thanks
That error means that you've created some links that have no breed (probably with create-link-with) before creating links with a breed, or vice-versa. If you want to use link breeds, you can never use create-link-with, create-link-to, or create-link-from. You must always use create-<breed>-with, create-<breed>-to, and create-<breed>-from.
So, search your code for instances of create-link-with, create-link-to, or create-link-from and either delete them or change them to create-<breed>-with, create-<breed>-to, or create-<breed>-from. If you're still getting the error, call clear-all or clear-links to make sure you've removed all unbreeded links.

window function R code

fine people of stackoverflow. I have become trapped on a rather simple part of my program and was wondering if you guys could help me.
library(nonlinearTseries)
tt<-c(0,500,1000)
mm<-rep(0,2)
for (j in 1:2){mm[j]=estimateEmbeddingDim(window(rnorm(1000), start=tt[j],end=tt[j+1]), number.points=(tt[j+1]-tt[j]),do.plot=FALSE)}
Warning message:
In window.default(rnorm(1000), start = tt[j], end = tt[j + 1]) :
'start' value not changed
If I plug in the values directly (tt[1], tt[2], tt[3]), it works but I also get a warning
estimateEmbeddingDim(window(rnorm(1000), start=tt[1],end=tt[2]), number.points=(tt[2]-tt[1]),do.plot=FALSE)
[1] 9
Warning message:
In window.default(rnorm(1000), start = tt[1], end = tt[2]) :
'start' value not changed
Thanks, Matt.
The problem seems to be with the
window(rnorm(1000), start=tt[j],end=tt[j+1])
lines. First of all, window is only meant to be used with a time series object (class=="ts"). In this case, rnorm(1000) simply returns a numeric vector, there are no dates associated with this object. So i'm not sure what you think this function does. Did you only want to extract the values that were between 0-500 and 500-1000? If so that seems a bit because with a standard normal variable, the max of 1000 samples isn't likely to be much over 4 let alone 500.
So be sure to use a proper "ts" object with dates and everything to get this to work.

`data.table` error: "reorder received irregular lengthed list" in setkey

I have a fairly basic data.table in R, with 250k rows and 90 columns. I am trying to key the data.table on one of the columns which is of class character. When I call:
setkey(my.dt,my.column)
I receive the following cryptic error message:
"Error in setkeyv(x, cols, verbose=verbose) :
reorder received irregular lengthed list"
I have found a source-code commit with this message, but can't quite decipher what it means. My key column contains no NA or blank values, seems perfectly reasonable to look at (it contains stock tickers), and behaves well with the default order() command.
Even more frustrating, the following code completes correctly:
first.dt <- my.dt[1:100000]
setkey(first.dt,my.column)
second.dt <- my.dt[100001:nrow(my.dt]
setkey(second.dt,my.column)
I have no idea what could be going on here. Any tips?
Edit 1: I have confirmed every value in the key fits a fairly standard format:
> length(grep("[A-Z]{3,4}\\.[A-Z]{2}",my.dt$my.column)) == nrow(my.dt)
[1] TRUE
Edit 2: My system info is below (note that I'm actually using Windows 7). I am using data.table version 1.8.
> Sys.info()
sysname release version nodename machine login
"Windows" "Server 2008 x64" "build 7600" "WIN-9RH28AH0CKG" "x86-64" "Administrator"
user effective_user
"Administrator" "Administrator"
Please run :
sapply(my.dt, length)
I suspect that one or more columns have a different length to the first column, and that's an invalid data.table. It won't be one of the first 5 because your .Internal(inspect(my.dt)) (thanks) shows those and they're ok.
If so, there is this bug fix in v1.8.1 :
o rbind() of DT with an irregular list() now recycles the list items
correctly, #2003. Test added.
Any chance there's an rbind() at an earlier point to create my.dt together with an irregular lengthed list? If not, please step through your code running the sapply(my.dt,length) to see where the invalidly lengthed column is being created. Armed with that we can make a work around and also fix the potential bug. Thanks.
EDIT :
The original cryptic error message is now improved in v1.8.1, as follows :
DT = list(a=6:1,b=4:1)
setattr(DT,"class",c("data.table","data.frame"))
setkey(DT,a)
Error in setkeyv(x, cols, verbose = verbose) :
Column 2 is length 4 which differs from length of column 1 (6). Invalid
data.table. Check NEWS link at top of ?data.table for latest bug fixes. If
not already reported and fixed, please report to datatable-help.
NB: This method to create a data.table is not recommended because it lets you create an invalid data.table. Unless, you are really sure the list is regular and you really do need speed (i.e. for speed you want to avoid the checks that as.data.table() and data.table() do), or you need to demonstrate an invalid data.table, as I'm doing here.

Resources