I have a data frame with zero columns and zero rows, and I want a for loop to fill it with the numbers 1 to 39, each repeated twice in a row, all in a single column.
Assume st is the data frame I have already created. This is what I have so far:
for(i in 1:39) {
append(st,i)
for(i in 1:39) {
append(st,i)
}
}
The expected outcome is a single column:
1
1
2
2
3
3
.
.
.
.
39
39
You don't need a for loop. Use rep() instead:
# How many times you want each number to repeat sequentially
times_repeat <- 2
# Assign the repeated values as a data frame
test_data <- as.data.frame(rep(1:39, each = times_repeat))
# Change the column name if you want to
names(test_data) <- "Dont_encourage_the_use_of_blanks_in_column_names"
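As a small sketch of why rep() fits here: the each argument repeats every element in place, while times repeats the whole sequence. The loop variant is shown only for comparison; it assumes the original attempt failed because append() returns a new object instead of modifying its argument, so the result has to be assigned back. The names each_twice, whole_twice, and looped are made up for this illustration.

```r
# each = 2 repeats every element in place; times = 2 repeats the whole vector
each_twice  <- rep(1:3, each = 2)   # 1 1 2 2 3 3
whole_twice <- rep(1:3, times = 2)  # 1 2 3 1 2 3

# Loop version for comparison: append() does not modify its argument,
# so the result must be assigned back on every iteration
looped <- integer(0)
for (i in 1:39) {
  looped <- append(looped, c(i, i))
}

# Both approaches give the same 78 values
st <- data.frame(value = rep(1:39, each = 2))
print(identical(looped, st$value))
```

The rep() call is also much faster than growing a vector inside a loop, because the loop copies the vector on every iteration.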
I have the following matrix:
x=matrix(c(1,2,2,1,10,10,20,21,30,31,40,
1,3,2,3,10,11,20,20,32,31,40,
0,1,0,1,0,1,0,1,1,0,0),11,3)
I would like to find for each unique value of the first column in x, the maximum value (across all records having that value of the first column in x) of the third column in x.
I have created the following code:
v1 <- sequence(rle(x[,1])$lengths)
A=split(seq_along(v1), cumsum(v1==1))
A_diff=rep(0,length(split(seq_along(v1), cumsum(v1==1))))
for( i in 1:length(split(seq_along(v1), cumsum(v1==1))) )
{
A_diff[i]=max(x[split(seq_along(v1), cumsum(v1==1))[[i]],3])
}
However, this code only works when equal elements are consecutive in the first column (because I use rle), and it relies on a for loop.
How can I make it work in general, and without the for loop, that is, using a function?
If I understand correctly, tapply() does this:
> tapply(x[,3],x[,1],max)
 1  2 10 20 21 30 31 40
 1  1  1  0  1  1  0  0
To group by more than one variable, I would use aggregate(). Matrices are cumbersome for this purpose, so I would suggest converting x to a data frame first; nonetheless, this works on the matrix:
> aggregate(x[,3],list(x[,1],x[,2]),max)
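The data frame route suggested above could look roughly like this. The column names key, other, and flag are placeholders invented for the sketch; the matrix itself is the one from the question.

```r
x <- matrix(c(1,2,2,1,10,10,20,21,30,31,40,
              1,3,2,3,10,11,20,20,32,31,40,
              0,1,0,1,0,1,0,1,1,0,0), 11, 3)

# Convert to a data frame and give the columns (hypothetical) names
d <- as.data.frame(x)
names(d) <- c("key", "other", "flag")

# Maximum of flag for each unique key; rows do not need to be consecutive
max_by_key <- aggregate(flag ~ key, data = d, FUN = max)

# Same idea when grouping by two columns at once
max_by_key_other <- aggregate(flag ~ key + other, data = d, FUN = max)

print(max_by_key)
```

The formula interface keeps the column names in the result, which makes the output easier to read than the list() form used on the raw matrix.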
Problem
Consider two data frames: one containing only 1's and 0's, and a second one with data:
set.seed(20)
df<-data.frame(sample(0:1,5,T),sample(0:1,5,T),sample(0:1,5,T))
#zero_one data frame
sample.0.1..5..T. sample.0.1..5..T..1 sample.0.1..5..T..2
1 0 1 0
2 1 0 0
3 1 1 1
4 0 0 0
5 1 0 1
df1<-data.frame(append(rnorm(4),10),append(runif(4),-5),append(rexp(4),20))
#with data
append.rnorm.4...10. append.runif.4....5. append.rexp.4...20.
1 0.08609139 0.2374272 0.3341095
2 -0.63778176 0.2297862 0.7537732
3 0.22642990 0.9447793 1.3011998
4 -0.05418293 0.8448115 1.2097271
5 10.00000000 -5.0000000 20.0000000
Now I want to replace the values in the second data frame at positions where the first data frame is 0 with the mean of the values in the same column at positions where the first data frame is 1.
Example
In the first column I want to replace 0.08609139 and -0.05418293 (the values where the first column of the first data frame is 0) with mean(c(-0.63778176, 0.22642990, 10.00000000)) (the values where the first column of the first data frame is 1).
I want to do it using mutate_all() function from dplyr package.
My work so far
df1<-df1 %>% mutate_all(
function(x) ifelse(df[x]==0, mean(x[df==1],na.rm=T,x)))
I know that the condition df[x] is meaningless, but I have no idea what I should put there. Could you please help me with that?
You could follow @deschen's suggestion and multiply the two data frames together.
Here is another approach to consider, using mapply. For each pair of columns, identify the positions (indices) in df where the value is zero.
Then replace the values at those positions in the corresponding df1 column with the mean of the other values in that column. y[-idx] is all values in the df1 column except those at the zero positions.
Note that my set.seed is different - when I used yours of 20 I got different values, and a column with all zeroes. Please let me know if you are able to reproduce.
set.seed(12)
df<-data.frame(sample(0:1,5,T),sample(0:1,5,T),sample(0:1,5,T))
df1<-data.frame(append(rnorm(4),10),append(runif(4),-5),append(rexp(4),20))
my_fun <- function(x, y) {
  idx <- which(x == 0)
  y[idx] <- mean(y[-idx])
  return(y)
}
mapply(my_fun, df, df1)
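To make the column-wise replacement easy to check by hand, here is a minimal sketch on small, made-up data. The frames mask and vals and the helper replace_with_mean are hypothetical names for this illustration; it uses a logical mask and Map() instead of which() and mapply(), and wraps the result back into a data frame.

```r
# Made-up 0/1 indicator frame and value frame with matching shapes
mask <- data.frame(a = c(0, 1, 1, 0, 1), b = c(1, 0, 0, 0, 1))
vals <- data.frame(a = c(1, 2, 4, 5, 9), b = c(10, 0, 0, 0, 20))

# Replace values where the indicator is 0 with the mean of the values
# where the indicator is 1 (computed within the same column)
replace_with_mean <- function(m, v) {
  keep <- m == 1
  v[!keep] <- mean(v[keep], na.rm = TRUE)
  v
}

# Map() pairs up the columns; as.data.frame() restores the frame shape
result <- as.data.frame(Map(replace_with_mean, mask, vals))
print(result)
```

In column a the kept values are 2, 4, and 9, so the two masked entries become 5. If an indicator column were all zeros, the mean of an empty set would be NaN and the whole column would be filled with NaN.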
Let's say I have a function called remove_fun which reduces the number of rows of a dataframe based on some conditions (this function is too verbose to include in this question). This function takes as its input a dataframe with 2 columns. For example, an input df called block_2_df could look like this:
block_2_df
Treatment seq
1 29
1 23
3 60
1 6
2 41
1 5
2 44
For this example, let's say the function remove_fun removes 1 row at a time based on the highest value of seq in block_2_df$seq. Applying remove_fun once would result in a new dataframe that looks like this:
remove_fun(block_2_df)
Treatment seq
1 29
1 23
1 6
2 41
1 5
2 44
I.e., the row containing seq==60 in block_2_df was removed via remove_fun
I can create a while loop which repeats this operation on block_2_df via remove_fun based on the number of rows remaining in block_2_df as:
while (dim(block_2_df)[1]>1) {
block_2_df <- remove_fun(block_2_df)
print(remove_fun(block_2_df))
}
This while loop reduces block_2_df until it has 1 row left (the one with the lowest value of block_2_df$seq), printing the 'updated' versions of block_2_df along the way.
However, I'd like to save each 'updated' version of block_2_df (i.e. block_2_df with 7, then 6, then 5, ..., then 1 row) produced by the while loop. How can I accomplish this? I know that with a for loop this could be done by creating an empty list and storing each 'updated' block_2_df in the ith element of that list. But I'm not sure how to do something similar in a while loop. It would be great to have a list of dfs as output from this while loop.
Just create and maintain an index counter yourself. It's a bit more trouble than a for() loop, which does that on its own, but it's not difficult.
saved <- list()
i <- 1
while (dim(block_2_df)[1]>1) {
  block_2_df <- remove_fun(block_2_df)
  saved[[i]] <- block_2_df
  i <- i + 1
  print(block_2_df)
}
Also, you were calling remove_fun twice in your loop, which was probably not what you wanted to do. I've corrected that; if I'm wrong, please say so.
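For a version that can be run end to end, here is a sketch with a stand-in remove_fun. The real function was not posted, so this one is an assumption that simply drops the row with the largest seq, as in the question's example. It also stores the starting data frame as the first list element, which the loop above does not do, so that the list covers 7 rows down to 1.

```r
# Stand-in for the real remove_fun (assumption): drop the row with the
# highest seq value
remove_fun <- function(d) {
  d[-which.max(d$seq), , drop = FALSE]
}

block_2_df <- data.frame(
  Treatment = c(1, 1, 3, 1, 2, 1, 2),
  seq       = c(29, 23, 60, 6, 41, 5, 44)
)

# Start the list with the original frame, then append one entry per pass
saved <- list(block_2_df)
while (nrow(block_2_df) > 1) {
  block_2_df <- remove_fun(block_2_df)
  saved[[length(saved) + 1]] <- block_2_df
}

# Row counts of the stored frames: 7 6 5 4 3 2 1
print(vapply(saved, nrow, integer(1)))
```

Using length(saved) + 1 as the index avoids keeping a separate counter, at the cost of being slightly less explicit.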
I have a dataframe like below.
11,15,12,25
11,12
15,25
134,45,56
46
45,56
15,12
66,45,56,24,14,11,25,12,134
I want to identify the frequency of pairs, triplets, or larger groups that occur in the data. For example, in the data above the occurrences of pairs look like this:
item No of occurrence
11,12 3
11,25 2
15,12 2
15,25 2
.
.
45,56 3
134,45,56 2 ....and so on
I am trying to write R code for the above, but I am finding it difficult to approach.
Given a 1 column data.frame with commas separating the variables, the following should produce your desired result:
# split column into a list
myList <- strsplit(df$V1, split=",")
# get all pairwise combinations
myCombos <- t(combn(unique(unlist(myList)), 2))
# count the instances where the pair is present
myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
  sum(sapply(myList, function(j) {
    sum(!is.na(match(c(myCombos[i,]), j)))})==2)})
# construct final matrix
allDone <- cbind(matrix(as.integer(myCombos), nrow(myCombos)), myCounts)
This returns a matrix where the first two columns are the items being compared and the third column is the number of rows of the data.frame that contain both items.
data
df <- read.table(text="11,15,12,25
11,12
15,25
134,45,56
46
45,56
15,12
66,45,56,24,14,11,25,12,134", as.is=TRUE)
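Since the question also asks about triplets and larger groups, the pairwise approach can be generalised by passing the group size to combn(). This is only a sketch: count_itemsets and baskets are names invented here, the rows are typed in directly rather than read with read.table, and enumerating all combinations becomes slow as the number of distinct items or the group size grows.

```r
rows <- c("11,15,12,25", "11,12", "15,25", "134,45,56", "46",
          "45,56", "15,12", "66,45,56,24,14,11,25,12,134")
baskets <- strsplit(rows, split = ",")

# Count, for every combination of k distinct items, the number of rows
# that contain all k of them; combinations that never occur are dropped
count_itemsets <- function(baskets, k) {
  items  <- unique(unlist(baskets))
  combos <- t(combn(items, k))
  counts <- apply(combos, 1, function(combo) {
    sum(vapply(baskets, function(b) all(combo %in% b), logical(1)))
  })
  out <- data.frame(combos, count = counts, stringsAsFactors = FALSE)
  out[out$count > 0, , drop = FALSE]
}

pairs    <- count_itemsets(baskets, 2)
triplets <- count_itemsets(baskets, 3)

# Most frequent pairs first
print(head(pairs[order(-pairs$count), ]))
```

The item columns are named X1, X2, and so on by data.frame(), and items appear in the order in which they were first seen in the data.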
I have a data frame consisting of values of 0 and 1. I would like to write a loop that randomly samples 38 of those observations and replaces them with NA. I can do one iteration, where the original vector's observations are replaced, with the following one line of code:
foo$V2[sample(seq(foo$V2), 38)] <- NA
However, I would like to do this 20 times and have each iteration stored as a separate column in a single object. Ultimately, I would have a 20-column data frame with 191 observations per column, each column with 38 randomly substituted NAs. At the end of the loop, I would like the data frame to be written out as a text file. Thank you for any help in the right direction.
Data Set:
https://drive.google.com/file/d/0BxfytpfgCdAcdEQ2LWFuVWVqMVU/view?usp=sharing
Maybe something like this:
# Fake data
set.seed(1998)
foo = data.frame(V2=sample(0:1, 191, replace=TRUE))
# Create 20 samples where we replace 38 randomly chosen values with NA
n = 20
samples = as.data.frame(replicate(n, {
  x = foo$V2
  x[sample(seq_along(x), 38)] = NA
  x
}))
Then you can write it in whatever format you wish. For example:
write.table(samples, "samples.txt")
write.csv(samples, "samples.csv", row.names=FALSE)
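A quick way to confirm the result has the intended shape is to count the NAs per column and read the written file back. The sketch below repeats the fake-data setup so that it runs on its own, and it writes to a temporary file (an assumption made here to avoid touching the working directory) with tab separators.

```r
set.seed(1998)
foo <- data.frame(V2 = sample(0:1, 191, replace = TRUE))

# 20 columns, each a copy of foo$V2 with 38 random positions set to NA
samples <- as.data.frame(replicate(20, {
  x <- foo$V2
  x[sample(seq_along(x), 38)] <- NA
  x
}))

# Every column should contain exactly 38 NAs
na_per_column <- colSums(is.na(samples))
print(table(na_per_column))

# Write as tab-separated text and read it back to check the dimensions
out_file <- tempfile(fileext = ".txt")
write.table(samples, out_file, sep = "\t", row.names = FALSE, quote = FALSE)
read_back <- read.table(out_file, header = TRUE, sep = "\t")
print(dim(read_back))
```

Because sample() draws without replacement by default, the 38 positions in each column are distinct, which is why the NA count is exact.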