Splitting one big dataframe into multiple CSV files [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Firstly, I have a big data.frame which has 104 rows and 12 columns; I would like to split it up into 13 data.frames of 8 rows each, with all 12 columns.
I am trying to write code robust enough not to care how many rows there are, but simply to make a new data.frame every 8 rows.
Also, is it possible after this point to write code which loops through the 13 data.frames for some calculations?

Here is a way using data.table's split() method:
library(data.table)
#sample data
set.seed(123)
AA <- data.frame( data = rnorm(104) )
#set number of rows to split on
chunksize = 8
#create a rowID per chunk, then split on it
l <- split( setDT(AA)[, rowID := (.I-1) %/% chunksize][], by = "rowID")
#names of the list will become the names of the data.frames
names(l) <- paste0( "df", names(l) )
#write the elements of the list to the global environment, using their names
list2env( l, envir = globalenv() )
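The title asks for CSV files and the question asks about looping over the pieces, neither of which the answer above shows. A minimal sketch, assuming base R's split() in place of data.table and a temporary directory as the output location (out_dir, paths and chunk_means are names made up for illustration):

```r
# Sample data, same shape as in the answer above
set.seed(123)
AA <- data.frame(data = rnorm(104))
chunksize <- 8

# Base R split on an integer chunk id (0, 0, ..., 1, 1, ...)
chunks <- split(AA, (seq_len(nrow(AA)) - 1) %/% chunksize)

# Assumption: write into a temporary directory; replace with a real folder
out_dir <- tempdir()
paths <- file.path(out_dir, paste0("df", names(chunks), ".csv"))

# One CSV file per chunk
for (i in seq_along(chunks)) {
  write.csv(chunks[[i]], paths[i], row.names = FALSE)
}

# Looping over the chunks for a calculation, here the mean of each chunk
chunk_means <- sapply(chunks, function(d) mean(d$data))
```

Keeping the pieces in a list, rather than as separate objects in the global environment, is what makes the final sapply() loop straightforward.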

Related

how to average groups of elements in a list in R? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
In R, I have a list of 10 elements, such as:
n <- c(5,5,5,5,5,6,6,6,6,6)
I want to create another list with 2 elements, where each element is the average of 5 elements of the previous list. So the other list should look as follows:
5,6
Is there a quick way to do that in R? What I want is a list which gives me the average of the first X elements, the average of the second X elements, and so on. In the example, the initial list has 10 elements and I want to group them by 5. So I am trying to calculate a list where the first element is the mean of the first 5 elements, and the second element is the mean of the last 5 elements.
Cast your vector of values
x = rnorm(50 * 10)
into a matrix and take the column means
colMeans(matrix(x, 50))
More generally, create a vector that describes how the data is to be partitioned
f = rep(seq_len(10), each=50)
split() your data, lapply() the function to each group, and combine (unlist() in this case) the results
unlist(lapply(split(x, f), mean))
Do this in two (e.g., sapply(split(x, f), mean)) or one (e.g., tapply(), by(), ave()) steps.
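For the one-step route, a small sketch with tapply() applied to the vector from the question (group_size, groups and group_means are illustrative names):

```r
n <- c(5, 5, 5, 5, 5, 6, 6, 6, 6, 6)
group_size <- 5

# Group id for each element: 1 for the first five, 2 for the next five
groups <- ceiling(seq_along(n) / group_size)

# Mean of each group in a single call
group_means <- tapply(n, groups, mean)
group_means
```

This also works when the length of the vector is not a multiple of group_size; the last group is then simply shorter.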
As you've mentioned, your original list has 500 elements. Here's an approach:
x <- runif(500)
list_x <- split(x, ceiling(seq_along(x)/50))
# find the mean in each group of 50
colMeans(as.data.frame(list_x))
For 10 elements:
y <- c(rep(5, 5), rep(6, 5))
list_y <- split(y, ceiling(seq_along(y)/5))
colMeans(as.data.frame(list_y))

Splitting vector in dataframe into 2 vectors [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I need to split a vector in a dataframe (last.first) into 2 separate vectors (firstname, lastname) and then put the 2 vectors back into the dataframe. What should I do?
You can split names with strsplit; use whatever separates the first and last names instead of " " (a space in my example).
This gives you a list, which can be turned into a data frame via ldply (from the plyr package) or unlisted into a matrix:
library(plyr)  # provides ldply
person.names <- c("Adam Smith", "Max Webber")
temp.list <- strsplit(person.names, " ")
names.df <- ldply(temp.list, function (x) data.frame(first = x[1], second = x[2]))
names.df
#   first second
# 1  Adam  Smith
# 2   Max Webber
or
matrix(unlist(temp.list), ncol = 2, byrow = TRUE)
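To put the pieces back into the dataframe, as the question asks, something along these lines might work in base R. The column name last.first comes from the question; the sample values and the ", " separator are assumptions, since the question does not show the data:

```r
# Hypothetical data: "Last, First" format is an assumption
df <- data.frame(last.first = c("Smith, Adam", "Webber, Max"),
                 stringsAsFactors = FALSE)

# Split each entry and bind the pieces into a two-column character matrix
parts <- do.call(rbind, strsplit(df$last.first, ", ", fixed = TRUE))

# Add the two new columns to the original dataframe
df$lastname  <- parts[, 1]
df$firstname <- parts[, 2]
df
```

This assumes every entry contains the separator exactly once; entries with more or fewer parts would need handling before the rbind step.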

R: sort rows, query them and add results as column [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have an R dataframe with the dimension 32 x 11. For each row I would like to determine the highest, the second highest, and the third highest value and add these values as extra columns to the initial dataframe (32 x 14). Many thanks in advance!
data(mtcars)  # mtcars ships with the base datasets package; no extra library is needed
mtcars
First, create a function to get the nth highest value for a vector. Then, create a copy of the dataframe, since the second highest value may change as you add more columns. Then apply your function using apply and 1 to operate row-wise. I'm not sure what would happen if there are NAs in the data. I haven't tested it...
Something like this...
# nth highest value of a vector
nth_highest <- function(x, n) sort(x, decreasing = TRUE)[n]
# work on a copy so the newly added columns do not affect later results
tmp <- mtcars
mtcars$highest <- apply(tmp, 1, function(x) nth_highest(x, 1))
mtcars$second_highest <- apply(tmp, 1, function(x) nth_highest(x, 2))
mtcars$third_highest <- apply(tmp, 1, function(x) nth_highest(x, 3))
rm(tmp)
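On the untested NA case: sort() drops NA values by default (its na.last argument defaults to NA), so the function should rank only the non-missing values and return NA when fewer than n of them exist. A small check on a made-up vector:

```r
nth_highest <- function(x, n) sort(x, decreasing = TRUE)[n]

# Made-up vector with one missing value
x <- c(3, NA, 1)

first  <- nth_highest(x, 1)  # 3: the NA is dropped before ranking
second <- nth_highest(x, 2)  # 1
third  <- nth_highest(x, 3)  # NA: only two non-missing values exist
```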

Recode 2 variables to one in one line [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Say I have a DF like:
df=data.frame(a=c(0,0,1,1),b=c(0,1,0,1))
only with a large number of rows. I'd like to create a column depending on the simultaneous values of a & b, e.g.
df
a b c
0 0 10
0 1 11
1 0 12
1 1 13
I take it this can be done with inner joins, using sqldf or maybe dplyr; is there a quicker way, with or without libraries?
Thanks in advance, p
You could do:
library(dplyr)
df %>% mutate(newcol = paste0(a, b))
Depending on how you want the new column to be labelled.
If you have a vector of desired values, let's call it lookup:
lookup <- 10:100
df %>% mutate(newcol = lookup[as.factor(paste0(a, b))])
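Indexing by a factor relies on the order of its levels, and on every (a, b) combination being present. A base R sketch that looks values up by name instead, assuming the mapping from the question's example table (the lookup values here are typed in by hand for illustration):

```r
df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1))

# Named lookup vector: names are the pasted (a, b) keys
lookup <- c("00" = 10, "01" = 11, "10" = 12, "11" = 13)

# One line: build the key per row and index the lookup by name
df$c <- unname(lookup[paste0(df$a, df$b)])
df
```

Because the lookup goes by name, the result does not depend on which combinations happen to occur in df or in what order.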
I think what you mean is that you have some other data frame (say called dictionary) with a c column, and you look up the (a, b) pair in the dictionary and grab the c from there?
df=data.frame(a=c(0,0,1,1),b=c(0,1,0,1))
dictionary <- df
dictionary$c <- 10:13
dictionary <- dictionary[sample(4), ] # shuffle it just to prove it works
In that case you can do
merge(df, dictionary, by=c('a', 'b'), all.x=TRUE)
And that will grab the matching c column from dictionary and plonk it into df. The all.x will put a NA there if there is no matching (a, b) in dictionary.
If speed becomes an issue, you might try data.table
library(data.table)
setDT(df) # convert to data.table
setDT(dictionary) # convert to data.table
# set key
setkey(df,a,b)
setkey(dictionary,a,b)
# merge
dictionary[df] # will be `df` with the `c` column added, `NA` if no match
Super cheaty and only applicable to this example, but:
df$c <- 10 + df$b + df$a*2
otherwise, look at ?merge

Differences between two data frames in R [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have two data frames, each with 9 columns, and DF2 is a subset of DF1. I'm trying to create a third data frame that contains only the contents of DF1 that are NOT present in DF2.
What is the most efficient way of doing this? I can write a while loop, but I was wondering if there is another way to do this (besides sqldf, which for some reason I cannot load in RStudio)?
The following can work (directly from Identify records in data frame A not contained in data frame B)
fun.12 <- function(x.1,x.2,...){
x.1p <- do.call("paste", x.1)
x.2p <- do.call("paste", x.2)
x.1[! x.1p %in% x.2p, ]
}
DF1 <- data.frame(a=c(1,2,3,4,5), b=c(1,2,3,4,5))
DF2 <- data.frame(a=c(1,1,2,3,4), b=c(1,1,99,3,4))
fun.12(DF1, DF2)
# a b
# 2 2 2
# 5 5 5
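One caveat with the paste-based comparison: paste() joins columns with a space by default, so character columns that themselves contain spaces can produce identical keys for different rows. A hedged variant, assuming a separator that does not occur in the data (rows_not_in and the "\r" separator are illustrative choices, not part of the original answer):

```r
# Rows of x whose full-row key does not appear in y
rows_not_in <- function(x, y, sep = "\r") {
  # Build one key per row, joining columns with an unlikely separator
  key_x <- do.call(paste, c(x, sep = sep))
  key_y <- do.call(paste, c(y, sep = sep))
  x[!key_x %in% key_y, , drop = FALSE]
}

DF1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 2, 3, 4, 5))
DF2 <- data.frame(a = c(1, 1, 2, 3, 4), b = c(1, 1, 99, 3, 4))

# Same rows as fun.12 returns above: rows 2 and 5 of DF1
result <- rows_not_in(DF1, DF2)
result
```

Like the original function, this assumes both data frames have the same columns in the same order.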
