Creating a data frame inside a for loop in R

I am very new to R and I am trying to create a data frame inside a for loop.
When I print the data frame, it seems to retain only the record from the most recent iteration. I believe the values are overwritten with the new record every time the loop iterates. Is there a way to keep the previous records and also append the new ones to the data frame? Thanks
ttest.function = function(x,cohort,sex,age){
  dataF=data.frame(Intensity=numeric(), Cohort=character(),Age=numeric(),sex=character(),Status=numeric(),stringsAsFactors=TRUE)
  for(val in c(1:length(age))){
    string=strsplit(x[val],":")
    intensity=string[[1]][1]
    Presence=string[[1]][2]
    status=""
    new_age=age[val]
    new_cohort=cohort[val]
    new_sex=sex[val]
    if(Presence=="true"){
      Ismissing=0
    }
    else{
      Ismissing=1
    }
    dataF=(cbind(intensity,new_cohort,new_age,new_sex,status))
  }
  print(dataF)
}
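One possible way to keep every iteration's record, sketched under assumptions because the question includes no sample data: the loop below stores one single-row data frame per iteration in a list and binds them together once after the loop. The function name build_records and the sample vectors are hypothetical, and the "intensity:presence" input format is inferred from the strsplit call in the question.

```r
# Hypothetical rewrite: collect one row per iteration, then bind them all.
build_records <- function(x, cohort, sex, age) {
  rows <- vector("list", length(age))
  for (val in seq_along(age)) {
    parts <- strsplit(x[val], ":")[[1]]
    rows[[val]] <- data.frame(
      Intensity = as.numeric(parts[1]),
      Cohort    = cohort[val],
      Age       = age[val],
      Sex       = sex[val],
      # 0 when the presence flag is "true", 1 otherwise (as in the question)
      Ismissing = if (parts[2] == "true") 0 else 1,
      stringsAsFactors = FALSE
    )
  }
  # Combine all single-row data frames into one
  do.call(rbind, rows)
}

# Made-up sample input in the assumed "intensity:presence" format
records <- build_records(
  x      = c("1.5:true", "2.0:false", "3.2:true"),
  cohort = c("A", "B", "A"),
  sex    = c("F", "M", "F"),
  age    = c(34, 51, 29)
)
print(records)
```

The loop in the question loses earlier records because cbind() receives only the current iteration's values and never the existing dataF, so each pass replaces the previous result.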

Related

Inserting listview items individually into database as new entries

TL;DR
I have a listview with items. I want each individual item inserted into my sqlite database as a new entry. Right now, I am only able to insert all items into the database as a single entry.
I am able to populate the list from my database correctly. If I manually enter items in SQLiteStudio, each added item shows up as an individual item.
Code setting up the list
private ObservableList listchosedescription;
listchosedescription = FXCollections.observableArrayList();
this.descriptionschosen.setItems(listchosedescription);
Code for populating the list
while (result.next()) {
listchosedescription.add(result.getString("description"));
}
descriptionschosen.setItems(listchosedescription);
Faulty code for adding listview items to the database
Connection conn = dbConnection.getConnection();
PreparedStatement statement2 = conn.prepareStatement(sqlDesInsert);
statement2.setString(1, String.valueOf(descriptionschosen.getItems()));
statement2.setInt(2, Integer.parseInt(labelidnew.getText()));
statement2.execute();
From looking online, I think that I need a for-loop that counts through the individual items in the list.
for(int i = listchosedescription.size(); i != 0; i--){
Then I need to add each individual entry to a batch and then execute the batch.
I also understand how to get a single item from the listview. So I feel a little stuck, hence I thought I would post for guidance.
for (int i = listchosedescription.size(); i != 0; i--) {
    statement2.setString(1, String.valueOf(listchosedescription.subList(i - 1, i)));
    statement2.setInt(2, Integer.parseInt(labelidnew.getText()));
    statement2.addBatch();
}
statement2.executeBatch();
In this for-loop, I have three statements:
I create an integer (i) that starts at the size() of my observableList.
I run the loop as long as i is not equal to 0 (it should probably be as long as i is larger than zero).
I decrease my integer (i) by 1 each time the loop is run.
Inside the loop, I set my two parameters as I normally would, but the values from the observableList are accessed through its subList. I access the location using my integer (i).
i-1 will make sure I reach the correct fromIndex.
i will make sure I reach the correct toIndex.
Note that subList(i - 1, i) returns a one-element list rather than the item itself, so String.valueOf of it usually yields the text wrapped in square brackets; listchosedescription.get(i - 1) returns the item directly.
Lastly, I add to the batch inside the loop and execute the batch after the loop.

Unable to update data in dataframe

I tried updating the data in a data frame, but it is not being updated.
# Initialize data and dataframe here
user_data=read.csv("train_5.csv")
baskets.df=data.frame(Sequence=character(),
                      Challenge=character(),
                      countno=integer(),
                      stringsAsFactors=FALSE)
# Updating data in dataframe here
for(i in 1:length((user_data)))
{
  for(j in i:length(user_data))
  {
    if(user_data$challenge_sequence[i]==user_data$challenge_sequence[j]&&user_data$challenge[i]==user_data$challenge[j])
    {
      writedata(user_data$challenge_sequence[i],user_data$challenge[i])
    }
  }
}
writedata=function( seqnn,challng)
{
  #print(seqnn)
  #print(challng)
  newRow <- data.frame(Sequence=seqnn,Challenge=challng,countno=1)
  baskets.df=rbind(baskets.df,newRow)
}
# view data here
View(baskets.df)
I've modified your code to what I believe will work. You haven't provided sample data, so I can't verify that it works the way you want. I'm basing my attempt here on a couple of common novice mistakes that I'll do my best to explain.
Your writedata function was written to be a little loose with its scope. When you create a new function, what happens in the function technically happens in its own environment. That is, it first looks for things defined within the function, and any new objects it creates exist only within that environment. R also has this neat (and sometimes tricky) feature where, if it can't find an object in an environment, it will look in the parent environment.
The impact this has on your writedata function is that when R looks for baskets.df in the function and can't find it, R then turns to the Global Environment, finds baskets.df there, and then uses it in rbind. However, the result of rbind gets saved to a baskets.df in the function environment, and does not update the object of the same name in the global environment.
To address this, I added an argument to writedata that is simply named data. We can then use this argument to pass a data frame to the function's environment and do everything locally. Because rbind is the last expression evaluated, the function returns its result.
Then, in your loop, instead of simply calling writedata, we assign its result back to baskets.df to replace the previous result.
# Define writedata before the loop that calls it
writedata <- function(data, seqnn, challng)
{
  #print(seqnn)
  #print(challng)
  newRow <- data.frame(Sequence = seqnn,
                       Challenge = challng,
                       countno = 1)
  rbind(data, newRow)
}
# length() of a data frame counts its columns, so iterate over nrow() instead
for(i in seq_len(nrow(user_data)))
{
  for(j in i:nrow(user_data))
  {
    if(user_data$challenge_sequence[i] == user_data$challenge_sequence[j] &&
       user_data$challenge[i] == user_data$challenge[j])
    {
      baskets.df <- writedata(baskets.df,
                              user_data$challenge_sequence[i],
                              user_data$challenge[i])
    }
  }
}
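The scoping behaviour described in this answer can be seen in a much smaller sketch. The names counter, bump_local and bump are made up for illustration and are not part of the original code.

```r
counter <- 0

# Assigning inside the function creates a local `counter`;
# the global one is left untouched.
bump_local <- function() {
  counter <- counter + 1
  counter
}

# Taking the value as an argument and returning the result
# lets the caller decide where to store it.
bump <- function(value) {
  value + 1
}

local_result <- bump_local()      # the function sees 0 and returns 1
counter_after_local <- counter    # the global counter is still 0
counter <- bump(counter)          # reassigning updates the global
```

This is the same pattern as passing baskets.df into writedata and assigning the returned data frame back to baskets.df.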
I'm not sure what your programming background is, but your loops will be very slow in R because it's an interpreted language. To get around this, many functions are vectorized (which simply means that you give them more than one data point, and they do the looping inside compiled code where the loops are fast).
With that in mind, here's what I believe will be a much faster implementation of your code:
user_data=read.csv("train_5.csv")
# challenge_indices will be a matrix with TRUE at every place "challenge" and "challenge_sequence" is the same
challenge_indices <- outer(user_data$challenge_sequence, user_data$challenge_sequence, "==") &
  outer(user_data$challenge, user_data$challenge, "==")
# since you don't want duplicates, get rid of them
challenge_indices[upper.tri(challenge_indices, diag = TRUE)] <- FALSE
# now let's get the indices of interest
index_list <- which(challenge_indices,arr.ind = TRUE)
# now we make the resulting data set all at once
# this is much faster, because it does not require copying the data frame many times - which would be required if you created a new row every time.
baskets.df <- with(user_data, data.frame(
  Sequence = challenge_sequence[index_list[,"row"]],
  challenge = challenge[index_list[,"row"]]
))
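To see what the outer() and which() steps produce, here is a small sketch on made-up data; the toy data frame stands in for train_5.csv, whose real contents are not shown in the question.

```r
# Made-up data standing in for train_5.csv
toy <- data.frame(
  challenge_sequence = c(1, 2, 1, 2),
  challenge          = c("a", "b", "a", "c"),
  stringsAsFactors   = FALSE
)

# TRUE wherever two rows agree on both columns
same <- outer(toy$challenge_sequence, toy$challenge_sequence, "==") &
  outer(toy$challenge, toy$challenge, "==")

# Keep each pair once and drop the self-matches on the diagonal
same[upper.tri(same, diag = TRUE)] <- FALSE

# One row per matching pair, with columns "row" and "col"
pairs <- which(same, arr.ind = TRUE)
print(pairs)
```

In this toy data only rows 1 and 3 agree on both columns, so pairs holds the single pair (row 3, col 1). Note that diag = TRUE also discards each row's match with itself, which the original nested loop (where j starts at i) would have counted.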

Getting Error "could not find function "assign<-" inside of colnames()

I'm using assign() to create some new data frames from another data frame. I then want to name some of the columns in each new data frame. When I use assign() to create the new data frames it works fine. But when I use assign() inside of colnames() it gives the error 'could not find function "assign<-"'.
Here's my snippet of code (abbreviated, of course):
for(i in 1:value) {
assign(Name[i], Old.Data.Frame[Old.Data.Frame$1 == Index[i]]) #I'm going to call this line of code 'New Data Frame' for brevity
for(j in 1:ncol(New Data Frame)) {
colnames(New Data Frame)[j] = as.character(Old.Data.Frame[3,j])
I do all this assign() stuff because the names of the old data frame constantly change, so I can't create any concrete variables in my code; only the dimensions of the frame stay the same.
The only error in this code is that R cannot "find function assign<- in colnames(...". I'm flustered because assign() had just worked in the line before. Any help is appreciated, thanks!
You have a list of variable names in Name, and you assign a value to each of them (your code block).
for(i in 1:value) { assign(Name[i], Old.Data.Frame[Old.Data.Frame$1 == Index[i]]) }
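Could you then try the following (note I'm separating this code block for debugging purposes)? Writing colnames(get(...)) <- ... directly fails in the same way, with "get<-" not found, so the data frame is fetched into a temporary variable, renamed, and stored back:
for(i in 1:value) { tmp <- get(Name[i]); colnames(tmp) <- as.character(Old.Data.Frame[3,]); assign(Name[i], tmp) }
get will retrieve the data (data.frame) assigned to the variable name Name[i] (character), and assign stores the renamed copy back under the same name.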
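A small self-contained sketch of this idea follows. The error arises because an expression of the form colnames(f(...)) <- value makes R look for a replacement function named `f<-`, and neither `assign<-` nor `get<-` exists. All object names below (old_df, df_names, index, frame_a, frame_b) are made up for illustration, and filtering rows on the first column is an assumption about what the abbreviated question code intends.

```r
# Made-up stand-ins for the objects in the question
old_df   <- data.frame(a = c(1, 1, 2), b = c("x", "y", "z"),
                       stringsAsFactors = FALSE)
df_names <- c("frame_a", "frame_b")
index    <- c(1, 2)

for (i in seq_along(df_names)) {
  # Build the subset in an ordinary variable first
  tmp <- old_df[old_df[[1]] == index[i], , drop = FALSE]
  colnames(tmp) <- c("group", "label")
  # Store it under the dynamic name only once it is complete
  assign(df_names[i], tmp)
}

print(colnames(frame_a))

# Alternative: a named list avoids assign()/get() altogether
frames <- lapply(index, function(k) old_df[old_df[[1]] == k, , drop = FALSE])
names(frames) <- df_names
```

Keeping the data frames in a named list, as in the last two lines, is often easier to work with than dynamically named variables, because the elements can be reached with frames[[name]] and modified with ordinary replacement functions.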

Multiple inputs into a R function

So I currently have this code below in a file, pullsec.R
pullsec <- function(session=NULL){
if(is.null(session)) session<-1
stopifnot(is.numeric(session))
paste("Session",1:10)[session]
}
In an .Rnw file, I call on this pullsec.R and choose session number 3 by:
source("pullsec.R")
setsec <- pullsec(3)
which would pull all of the rows where the column Session has data values of "Session 3"
I would like to add another block to pullsec.R that would allow me to pull data for a second column, Sessions where the data in that column is Sessions 1-2, Sessions 3-4, Session 5-6, etc. But I'm not sure how to modify the pullsec block to accept multiple inputs.
I have tried many solutions, but none of them worked. My most naive attempt is:
pullsec2 <- function(sessions1=NULL,sessions2=NULL){
if(is.null(sessions1)) sessions1<-1
stopifnot(is.numeric(session1))
paste("Sessions",1:10,"-",1:10)[sessions]
}
Either of these would do:
pullsec2 <- function(session1=1,session2=2){
  stopifnot(is.numeric(session1))
  stopifnot(is.numeric(session2))
  paste0("Sessions ",session1,'-',session2)
}
pullsec2(3,4)
pullsec2 <- function(sessions=1){
  stopifnot(is.numeric(sessions))
  paste("Sessions",paste0(sessions,collapse="-"))
}
pullsec2(3:4)
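As a rough sketch of how the returned label might then be used to pull rows, the example below filters a made-up data frame. The scores data frame and its columns are assumptions for illustration; the question does not show the real data.

```r
# Second variant from the answer above
pullsec2 <- function(sessions = 1) {
  stopifnot(is.numeric(sessions))
  paste("Sessions", paste0(sessions, collapse = "-"))
}

# Made-up data with a Sessions column in the "Sessions 1-2" style
scores <- data.frame(
  Sessions = c("Sessions 1-2", "Sessions 3-4", "Sessions 3-4", "Sessions 5-6"),
  Score    = c(10, 12, 15, 9),
  stringsAsFactors = FALSE
)

setsec2  <- pullsec2(3:4)                         # label for sessions 3 and 4
selected <- scores[scores$Sessions == setsec2, ]  # rows carrying that label
print(selected)
```

Passing a single vector such as 3:4 keeps the function signature the same shape as the original pullsec, which took one session argument.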

How to check specific column exist in datarow?

I came across a situation where the column name for a DataTable is dynamic. While fetching data, I want to check whether a column exists.
DataTable table = ds.Tables["Sample1"];
if(table.Rows.Count > 0)
{
    foreach(DataRow dr in table.Rows)
    {
        if(dr.Table.Columns.Contains("DateInfo"))
        {
            // store value in variable
            // first approach
        }
        if(table.Columns.Contains("DateInfo"))
        {
            // store value in variable
            // second approach
        }
    }
}
Which one is the best approach?
Will this be enough:
1st approach: checks the column collection of the DataTable directly.
datatable.Columns.Contains("column")
2nd approach: reaches the same column collection through each DataRow's parent table.
dr.Table.Columns.Contains("column")
3rd approach: fetches the columns into a DataColumnCollection object and then checks whether it contains the specific field.
DataColumnCollection columns = datatable.Columns;
if (columns.Contains(columnName))
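All three approaches query the same DataColumnCollection, so they give the same result; use whichever you find clearer. Checking once on the table, outside the row loop, avoids repeating the lookup for every row.
This is the best one:
dr.Table.Columns.Contains("DateInfo")
The foreach loop gets a single row at a time, so any per-row condition can also be applied at this point.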
