I have a matrix with 10 columns and 1000 rows of data, and a vector of length 1000. I want to subtract that vector from every column separately, using a loop. What is the best way to do this in R?
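A minimal sketch, assuming a matrix `m` and a vector `v` filled with random data (both names are illustrative). R stores matrices column by column, so a vector whose length equals `nrow(m)` is recycled down each column, and `m - v` already subtracts it from every column. A loop gives the same result if one is required, and `sweep()` states the intent explicitly.

```r
set.seed(1)
m <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)
v <- rnorm(1000)

# Loop version: subtract v from each column in turn
res_loop <- m
for (j in seq_len(ncol(m))) {
  res_loop[, j] <- m[, j] - v
}

# Vectorized: v (length = nrow(m)) is recycled down every column
res_vec <- m - v

# sweep() with MARGIN = 1: element i of v is subtracted from row i
res_sweep <- sweep(m, 1, v)
```

The vectorized forms avoid the explicit loop and are usually the more idiomatic choice in R.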
I would like to create a matrix in R, say 1000 by 10, in which each column is a random sample, without replacement, of 1000 numbers between 1 and 1000. I would like to draw this sample 10 times, once for each column of the matrix.
Thank you in advance.
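One way to do this, sketched below, wraps `sample.int(1000)` in `replicate()`. Each call returns a random permutation of 1:1000, which is exactly 1000 draws without replacement, and `replicate()` binds the 10 results together as columns. The seed is optional and only makes the draws reproducible.

```r
set.seed(42)

# Each column is an independent permutation of 1:1000
mat <- replicate(10, sample.int(1000))

dim(mat)  # 1000 rows, 10 columns
```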
I have a .csv file with 39 variables and 713 rows, each containing a count of plastic items. Another column holds the survey length, and I want to standardise each count to a survey length of 100. I am unsure how to write a loop that runs through each row and cell individually to do this. Many cells also contain NA values.
Any ideas would be great.
Thank you.
Consider applying the formula directly to the columns, with no need for a loop:
# RETRIEVE ALL COLUMN NAMES (MINUS SURVEY LENGTH)
vars <- names(df)[!grepl("survey_length", names(df))]
# EXPAND SINGLE COLUMN TO EQUAL DIMENSION OF DATA FRAME
survey_length_mat <- matrix(df$survey_length, ncol=length(vars), nrow=nrow(df))
# APPLY FORMULA
df[vars] <- (df[vars] / survey_length_mat) * 100
df
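A shorter variant, sketched here with a small made-up data frame (the columns `bottles` and `bags` are hypothetical; in practice `df` would come from `read.csv()` on your file), relies on R recycling a vector of length `nrow(df)` down every column when it divides a data frame, so the helper matrix is not strictly needed. NA counts simply stay NA.

```r
# Hypothetical data: two count columns plus the survey length
df <- data.frame(
  bottles = c(10, NA, 4),
  bags = c(2, 6, NA),
  survey_length = c(50, 200, 100)
)

# Every column except survey_length (exact name match)
vars <- setdiff(names(df), "survey_length")

# Dividing by a vector of length nrow(df) recycles it down each column
df[vars] <- df[vars] / df$survey_length * 100
```

Using `setdiff()` excludes only the column named exactly `survey_length`, whereas `grepl()` would also drop any other column whose name merely contains that text.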
I have a number of dataframes, each with different numbers of rows. I want to break them all into smaller dataframes that have no more than 50 rows each, for example.
So, if I had a dataframe with 107 rows, I would want to output the following:
A dataframe containing rows 1-50
A dataframe containing rows 51-100
A dataframe containing rows 101-107
I have been reading many examples using the split() function, but I have not been able to find any usage of split(), or any other solution, that does not require pre-defining the number of dataframes to split into, scramble the order of the data, or introduce other problems.
This seems like such a simple task that I am surprised that I have not been able to find a solution.
Try:
split(df,(seq_len(nrow(df))-1) %/% 50)
What do the first 50 rows have in common? If you take the integer division (%/%) of each row index, minus one, by 50, they all give 0. As you can guess, rows 51-100 give 1, and so on. The expression (seq_len(nrow(df))-1) %/% 50 therefore gives the group each row belongs to, and split() uses it to form the chunks in their original order.
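For the 107-row case in the question, a quick check with a toy data frame shows the chunk sizes and that row order is preserved. The small wrapper `chunk_df()` and the names `df_a` and `df_b` are hypothetical, added to show how the same call can be applied to several data frames at once with `lapply()`.

```r
# Toy 107-row data frame
df <- data.frame(id = 1:107, value = (1:107)^2)

chunks <- split(df, (seq_len(nrow(df)) - 1) %/% 50)
sizes <- sapply(chunks, nrow)   # 50, 50, 7 (named "0", "1", "2")
last_ids <- chunks[[3]]$id      # rows 101 to 107, original order kept

# Hypothetical helper: split any data frame into chunks of at most `size` rows
chunk_df <- function(d, size = 50) split(d, (seq_len(nrow(d)) - 1) %/% size)

df_a <- df
df_b <- df[1:30, ]
all_chunks <- lapply(list(a = df_a, b = df_b), chunk_df)
```

The number of chunks follows from `nrow(d)`, so nothing has to be fixed in advance.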
Say I have a data.frame of arbitrary dimensions (n by p). I want to extract a vector of length n from that data.frame, one element in the vector per row in the data.frame. However, the column in which each element lies may vary by row. Is there a way to do this without loops?
For example, if I have the following (3x3) data frame, called say DATA
X Y Z
1 17 43
3 4 2
6 9 0
I want to extract one scalar value from DATA per row. I have a vector, call it column.list, c(1,3,1) (arbitrarily chosen in this case), which gives the column index of each element I want: the kth element of column.list is the column index for row k in DATA. How do I do this without loops? I want to avoid loops because I am using this repeatedly in a simulation study that will take a lot of running time even without them, and the number of rows might be 100,000 or so. Much appreciated!
You can do this by indexing your data.frame with a two-column matrix. The first column gives the row index and the second gives the column index. So if you do
column.list <- c(1,3,1)
DATA[cbind(1:nrow(DATA), column.list)]
You will get
[1] 1 2 6
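as desired. If the data frame has columns of different classes, all values will be coerced to the most general common type (for example, to character if any column is character), because the data frame is converted to a matrix before it is indexed.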
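Since the lookup runs repeatedly inside a simulation, one option, sketched below under that assumption, is to convert the data frame to a matrix once and index the matrix directly, so the conversion is not repeated on every call. The second part shows the coercion caveat with a hypothetical character column `label`.

```r
DATA <- data.frame(X = c(1, 3, 6), Y = c(17, 4, 9), Z = c(43, 2, 0))
column.list <- c(1, 3, 1)

# Convert once, outside the repeated simulation step
DATA.mat <- as.matrix(DATA)

# Row k, column column.list[k]
idx <- cbind(seq_len(nrow(DATA.mat)), column.list)
picked <- DATA.mat[idx]

# Mixed classes: one character column makes every extracted value character
DATA2 <- data.frame(X = c(1, 3, 6), label = c("a", "b", "c"))
mixed <- DATA2[cbind(1:3, c(1, 2, 1))]
```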
I have a dataframe with 1000 columns. I am trying to loop over 10 columns at a time and use the seqdef() function from the TraMineR package to do sequence alignment across the data in those columns. Hence, I want to apply this function to columns 1-10 in the first go-around, and columns 11-20 in the second go-around, all the way up to 1000.
This is the code I am using.
library(TraMineR)
by(df[, 1:10], seqdef(df))
However, this only loops over the first 10 and then stops. How do I loop it across chunks of 10 columns?
You could do this with mapply(), passing it the first and last column of each 10-column chunk:
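library(TraMineR)
colpicks <- seq(10, 1000, by = 10)  # last column of each chunk: 10, 20, ..., 1000
# SIMPLIFY = FALSE keeps each seqdef() result intact as a separate list element
seqs <- mapply(function(from, to) seqdef(df[, from:to]), colpicks - 9, colpicks, SIMPLIFY = FALSE)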
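If the number of columns is not an exact multiple of 10, one alternative is to group the column indices with `split()` and `ceiling()`, then `lapply()` over the groups. The sketch below uses a toy 25-column data frame and a hypothetical stand-in `process_chunk()` so it runs without TraMineR; with the package loaded, `seqdef` could take its place.

```r
# Toy stand-in for the real data: 5 rows, 25 columns (not a multiple of 10)
df <- as.data.frame(matrix(seq_len(125), nrow = 5))
chunk_size <- 10

# Consecutive column indices grouped as 1-10, 11-20, 21-25
col_groups <- split(seq_len(ncol(df)), ceiling(seq_len(ncol(df)) / chunk_size))

# Hypothetical per-chunk function; seqdef could be used here instead
process_chunk <- function(block) ncol(block)

# drop = FALSE keeps a one-column chunk as a data frame
results <- lapply(col_groups, function(cols) process_chunk(df[, cols, drop = FALSE]))
```

Because `lapply()` always returns a list, each chunk's result stays separate without needing `SIMPLIFY = FALSE`.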