I am trying to make a column called ID that contains 5000 rows to act as an identification column for observations on 20 individuals. I want there to be 200 observations for each of the first 10 individuals, and 300 observations for the next ten individuals (because I don't want the same number of observations for each individual). So I made two separate columns:
ID <- data.frame(ID=rep(c(12,122,242,329,595,130,145,245,654,878), each = 200))
ID2 <- data.frame(ID=rep(c(863,425,24,92,75,3,200,300,40,500), each = 300))
Why am I unable to stack one on top of the other (making a single column with all individuals) using rbind?
ID <- rbind(c(ID,ID2))
You were almost there; just don't use c() inside rbind():
ID <- rbind(ID,ID2)
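To see why the original call fails, it may help to inspect what c() does to two data frames. The sketch below reuses the question's data; the names wrong, stacked, ids and one_step are made up for illustration.

```r
# Same construction as in the question
ID  <- data.frame(ID = rep(c(12, 122, 242, 329, 595, 130, 145, 245, 654, 878), each = 200))
ID2 <- data.frame(ID = rep(c(863, 425, 24, 92, 75, 3, 200, 300, 40, 500), each = 300))

# c() on two data frames returns a plain list of their columns,
# so rbind() receives a single list and builds a 1 x 2 list matrix
wrong <- rbind(c(ID, ID2))
dim(wrong)

# Passing the data frames as separate arguments stacks their rows
stacked <- rbind(ID, ID2)
nrow(stacked)

# Possible one-step variant: give rep() a vector of repeat counts
ids <- c(12, 122, 242, 329, 595, 130, 145, 245, 654, 878,
         863, 425, 24, 92, 75, 3, 200, 300, 40, 500)
one_step <- data.frame(ID = rep(ids, times = rep(c(200, 300), each = 10)))
```

The one-step variant avoids the intermediate data frames entirely, which may be convenient if the per-individual counts change later.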
Related
I have a .csv file of 39 variables and 713 rows, each containing a count of plastic items. I have another column which is the survey length, and I want to standardise each count of items by a survey length of 100. I am unsure how to create a loop to run through each row and cell individually to do this. Many also have NA values.
Any ideas would be great.
Thank you.
Consider applying the formula directly to the columns, with no need for a loop:
# RETRIEVE ALL COLUMN NAMES (MINUS SURVEY LENGTH)
vars <- names(df)[!grepl("survey_length", names(df))]
# EXPAND SINGLE COLUMN TO EQUAL DIMENSION OF DATA FRAME
survey_length_mat <- matrix(df$survey_length, ncol=length(vars), nrow=nrow(df))
# APPLY FORMULA
df[vars] <- (df[vars] / survey_length_mat) * 100
df
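As a quick check of how this behaves with missing values, here is the same approach on a tiny made-up data frame. The column names bottles and bags and the name toy are assumptions; only survey_length comes from the answer.

```r
# Toy data: two count columns with some NAs, plus the survey length
toy <- data.frame(
  bottles       = c(10, NA, 4),
  bags          = c(2, 6, NA),
  survey_length = c(50, 200, 100)
)

# All columns except the survey length
vars <- names(toy)[!grepl("survey_length", names(toy))]

# Repeat the survey length once per count column (filled column by column)
survey_length_mat <- matrix(toy$survey_length, ncol = length(vars), nrow = nrow(toy))

# Counts per 100 units of survey length; NA counts simply stay NA
toy[vars] <- (toy[vars] / survey_length_mat) * 100
toy
```

Because arithmetic on NA yields NA, no special handling is needed unless the missing counts should be treated as zeros.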
Just wondering if there is a way to expand rows which have multiple observations, into rows of unique observations using R? I have data in an excel spreadsheet with the variable headings: Lease, Line, Bay, Date, Predators, Food.Index, DD, MM, YY.
On some dates, there have been multiple predators (from 1 to 4) recorded in the same row. Other days just have 0. On a day where there has been 4 predators recorded, I would like to somehow transform the data to show four unique observations (instead of one row with 4 recorded under "Predators").
I have 1669 rows of data, and multiple rows need to be expanded.
Example of Data set
Many thanks for your help in advance.
Assuming you have your data in a data.frame, df, one possible solution would be
df.expanded <- df[rep(row.names(df), df$Predators), ]
EDIT: If you also want to keep the rows with 0 predators, you can use pmax to always return at least one:
df.expanded <- df[rep(row.names(df), pmax(df$Predators, 1)),]
Here the pmax(df$Predators, 1) will return the elementwise maximum of df$Predators and 1 so that it returns a new vector where each element is at least 1 but takes the value of df$Predators if that number is greater than 1.
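A small illustration of both variants, using a made-up three-row data frame (the name obs and its values are assumptions, not the asker's data):

```r
# Toy data: one day with no predators, one with 4, one with 2
obs <- data.frame(
  Date      = c("2015-01-01", "2015-01-02", "2015-01-03"),
  Predators = c(0, 4, 2)
)

# Repeat each row name as many times as its predator count;
# rows with 0 predators are dropped
expanded <- obs[rep(row.names(obs), obs$Predators), ]
nrow(expanded)

# Keep zero-predator rows by repeating every row at least once
expanded_keep0 <- obs[rep(row.names(obs), pmax(obs$Predators, 1)), ]
nrow(expanded_keep0)
```

Note that the Predators column in the expanded rows still holds the original count, so it may need to be overwritten if each row is meant to represent a single predator.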
My Problem:
I have a dataframe consisting of 86016000 rows of observations:
there are 512000 observations for each hour
there are 24 hours data for seven days
So 24*7*512000 = 86016000
there are 40 columns (variables)
There is no column of date or datetimestamp
Only row numbers are good enough to identify how many obs. for each day, and there are no errors in recording of this data.
Given such a large dataset, what I want to do is create subsets of 12288000 (i.e. 24 * 512000) rows, so that we have 7 each day's subset.
What I tried:
d <- split(PltB_Fold3_1_Data, rep(1:12288000, each=7))
But unfortunately, after almost half an hour, I terminated the process as there was no result.
Is there any better solution than the one above?
You're probably looking for seq rather than rep. With seq, you can generate a sequence of numbers from 0 to 86016000 incremented by 12288000.
To save resources, you can then use this sequence to generate temporary data frames and do whatever you want with each.
sequence <- seq(from = 0, to = 86016000, by = 12288000)
for (i in seq_len(length(sequence) - 1)) {
  # parentheses are required here: `:` binds more tightly than `+`
  temp <- df[(sequence[i] + 1):sequence[i + 1], ]
  # do something here with your temporary data frame
}
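If a list of per-day data frames is still wanted, split() can work once the grouping vector has one label per row. In the question, rep(1:12288000, each = 7) builds 12,288,000 groups of 7 rows each, which is probably why it never finished. The sketch below swaps the arguments around on a scaled-down example; toy, rows_per_day and n_days are made-up names, and memory use at the full 86,016,000 rows is not something this example demonstrates.

```r
# Scaled-down stand-in: 7 "days" of 3 rows instead of 12288000
rows_per_day <- 3
n_days <- 7
toy <- data.frame(row = seq_len(rows_per_day * n_days))

# One label per row: 1,1,1,2,2,2,...,7,7,7
day <- rep(seq_len(n_days), each = rows_per_day)

# A list of 7 data frames, one per day
by_day <- split(toy, day)
length(by_day)
by_day[["2"]]$row
```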
Sample data
mysample <- data.frame(ID = 1:100, kWh = rnorm(100))
I'm trying to automate the process of returning the rows in a data frame that contain the 5 highest values in a certain column. In the sample data, the 5 highest values in the "kWh" column can be found using the code:
(tail(sort(mysample$kWh), 5))
which in my case returns:
[1] 1.477391 1.765312 1.778396 2.686136 2.710494
I would like to create a table that contains rows that contain these numbers in column 2.
I am attempting to use this code:
mysample[mysample$kWh == (tail(sort(mysample$kWh), 5)),]
This returns:
ID kWh
87 87 1.765312
I would like it to return the 5 rows that contain the figures above in the "kWh" column. I'm sure I've missed something basic but I can't figure it out.
We can use rank
mysample$Rank <- rank(-mysample$kWh)
head(mysample[order(mysample$Rank),],5)
If we don't need to create a column, we can use order directly (@Jaap suggested these three alternatives):
#order descending and get the first 5 rows
head(mysample[order(-mysample$kWh),],5)
#order ascending and get the last 5 rows
tail(mysample[order(mysample$kWh),],5)
#or just use a sequence as the row index (note the trailing comma)
mysample[order(-mysample$kWh), ][1:5, ]
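As for why the original attempt returned a single row: == compares the column with the five top values element by element, recycling the shorter vector, so a row only matches when its value happens to line up with the recycled position. A membership test with %in% avoids that. The sketch below uses fixed numbers instead of rnorm() so the result is reproducible; toy, top_vals, top_in and top_order are made-up names.

```r
# Deterministic stand-in for the sample data
toy <- data.frame(ID = 1:7, kWh = c(3, 9, 1, 7, 5, 8, 2))

# The three highest values
top_vals <- tail(sort(toy$kWh), 3)

# %in% asks "is this value one of the top values?" for every row
top_in <- toy[toy$kWh %in% top_vals, ]
top_in$ID

# The order-based approach returns the same rows, sorted by kWh
top_order <- head(toy[order(-toy$kWh), ], 3)
top_order$ID
```

The %in% version keeps the original row order and returns extra rows when there are ties, whereas head() with order() always returns exactly the requested number.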
I have a data frame with 21 variables and 1200 observations. The first column is the ID name for each species and column 21 is the total count of all the times each species was seen across multiple sites.
example columns: ID, RM1, RM2, RM10, Total
each row is an ID name and counts per river mile and total count
All I want is a list of the top 20 (or 100 for that matter) most abundant species and their total count. How do I do this?
This is driving me crazy and I don't want to do it in excel - there must be a way in R.
Sort your data frame (let's call it df) by Total in decreasing order and take the top 100 rows:
head(df[order(df$Total,decreasing = TRUE), ], 100)
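If only the species name and its total are wanted, the column selection can go in the same subsetting call. A minimal sketch on made-up data follows; the species names and counts are invented, and the column names ID and Total follow the question's description.

```r
# Toy data: counts per river mile and a total per species
toy <- data.frame(
  ID  = c("carp", "trout", "bass", "pike"),
  RM1 = c(1, 5, 2, 0),
  RM2 = c(3, 4, 0, 1),
  stringsAsFactors = FALSE
)
toy$Total <- toy$RM1 + toy$RM2

# Sort by Total (largest first), keep only ID and Total, take the top 2
top2 <- head(toy[order(toy$Total, decreasing = TRUE), c("ID", "Total")], 2)
top2
```

Changing the final argument of head() to 20 or 100 gives the longer lists mentioned in the question.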