I have a database as a data frame and I would like to order it by all columns, while keeping the relations between elements (i.e. keeping each row intact).
For example, if I do the following:
> DF
   A B   C D
1 11 2 432 4
2 11 3 432 4
3 13 4 241 5
4 42 5   2 3
5 51 5 332 2
6 51 5 332 1
7 51 5 332 1
> DF <- DF[with(DF, order(A, B, C, D)), ]
> DF
   A B   C D
1 11 2 432 4
2 11 3 432 4
3 13 4 241 5
4 42 5   2 3
6 51 5 332 1
7 51 5 332 1
5 51 5 332 2
OK, this is what I wanted (pay attention to the last two rows), but I would like a generic solution that is independent of the number of columns. I have tried the following, but it does not work.
> DF=DF[order(colnames(DF)),]
> DF
   A B   C D
1 11 2 432 4
2 11 3 432 4
3 13 4 241 5
4 42 5   2 3
I would be grateful if someone could help me with this little issue. Regards.

We can use do.call with order to order on all the columns of a dataset:
DF[do.call(order, DF),]
If we use the tidyverse, there is arrange_at, which takes column names:
library(dplyr)
DF %>%
  arrange_at(names(.))
# or, as @Sotos commented:
# arrange(!!! rlang::syms(names(.)))
# A B C D
#1 11 2 432 4
#2 11 3 432 4
#3 13 4 241 5
#4 42 5 2 3
#5 51 5 332 1
#6 51 5 332 1
#7 51 5 332 2
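One caveat with do.call(order, DF): each column is passed to order() as a named argument, so a column that happens to be called decreasing, na.last or method would be taken as that argument instead of as a sort key. A minimal sketch that avoids this by dropping the names first (the helper name sort_by_all_columns is hypothetical, not from the answers above):

```r
# Hypothetical helper: sort a data frame by all of its columns, left to right.
# unname() turns the columns into positional arguments, so a column named
# e.g. "decreasing" cannot be mistaken for order()'s own argument.
sort_by_all_columns <- function(df) {
  df[do.call(order, unname(as.list(df))), , drop = FALSE]
}

DF <- data.frame(A = c(11, 11, 13, 42, 51, 51, 51),
                 B = c(2, 3, 4, 5, 5, 5, 5),
                 C = c(432, 432, 241, 2, 332, 332, 332),
                 D = c(4, 4, 5, 3, 2, 1, 1))

sorted <- sort_by_all_columns(DF)
print(sorted)
```

Ties (rows 6 and 7) keep their original relative order, because order() breaks ties stably.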


Match values based on multiple conditions from dataframes of different sizes in R

I have two dataframes of different sizes. Example:
t1 <- data.frame("id"=c(1,1,1,2,2,2,4,5,5,5,6,7,8),"condition"=c(3,3,1,5,5,5,10,10,5,5,2,3,1) )
t2 <- data.frame("ind"=c(1,2,4,5,6,7,8),"test_c"=c(3,5,10,10,2,3,1), "time"=c(32,55,21,34,55,22,19))
I would like to match the cases based on two criteria:
t1$id == t2$ind and t1$condition == t2$test_c, and then create an additional column in t1 holding the corresponding value of t2$time.
Expected outcome:
t3 <- data.frame("id"=c(1,1,1,2,2,2,4,5,5,5,6,7,8),"condition"=c(3,3,1,5,5,5,10,10,5,5,2,3,1) , "time"=c (32,32,NA,55,55,55,21,34,NA,NA,55,22,19))
I suspect I should use merge or match functions but I am not sure which would be the right approach.
Base R
> out <- merge(t1, t2, by.x=c("id","condition"), by.y=c("ind","test_c"), all.x=TRUE)
> out
id condition time
1 1 1 NA
2 1 3 32
3 1 3 32
4 2 5 55
5 2 5 55
6 2 5 55
7 4 10 21
8 5 5 NA
9 5 5 NA
10 5 10 34
11 6 2 55
12 7 3 22
13 8 1 19
dplyr
library(dplyr)
left_join(t1, t2, by = c("id" = "ind", "condition" = "test_c"))
Differences with your t3
There are some differences between them. For the sake of display, I'll show them side-by-side, arranged so that we have an easier comparison.
cbind(out[with(out,order(id,condition)),], t3[with(t3,order(id,condition)),])
# id condition time id condition time
# 1 1 1 NA 1 1 NA
# 2 1 3 32 1 3 32
# 3 1 3 32 1 3 32
# 4 2 5 55 2 5 55
# 5 2 5 55 2 5 NA
# 6 2 5 55 2 5 NA
# 7 4 10 21 4 10 21
# 8 5 5 NA 5 5 NA
# 9 5 5 NA 5 5 NA
# 10 5 10 34 5 10 34
# 11 6 2 55 6 2 55
# 12 7 3 22 7 3 22
# 13 8 1 19 8 1 19
The only differences are for id=2, condition=5: the merge assigns time=55 to all three rows, whereas your t3 fills only the first of them. I don't think this is "first only" logic, since other repeated id/condition pairs do not behave the same way. I suspect this is just a mistake in the sample data, or perhaps there is post-merge processing you haven't told us about yet :-)
If you want to use match, you can combine it with interaction (or paste) so that several columns are matched at once; interaction turns the key columns of each row into a single factor level:
t1$time <- t2[match(interaction(t1), interaction(t2[-3])), 3]
# id condition time
#1 1 3 32
#2 1 3 32
#3 1 1 NA
#4 2 5 55
#5 2 5 55
#6 2 5 55
#7 4 10 21
#8 5 10 34
#9 5 5 NA
#10 5 5 NA
#11 6 2 55
#12 7 3 22
#13 8 1 19
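A self-contained sketch of the match approach, this time building the two-column key with paste and an explicit separator (the "_" separator is an arbitrary choice that cannot occur inside these numeric ids). Unlike merge, match keeps t1's original row order and row count:

```r
# Sample data from the question
t1 <- data.frame(id = c(1, 1, 1, 2, 2, 2, 4, 5, 5, 5, 6, 7, 8),
                 condition = c(3, 3, 1, 5, 5, 5, 10, 10, 5, 5, 2, 3, 1))
t2 <- data.frame(ind = c(1, 2, 4, 5, 6, 7, 8),
                 test_c = c(3, 5, 10, 10, 2, 3, 1),
                 time = c(32, 55, 21, 34, 55, 22, 19))

# One key string per row, e.g. "1_3"; "_" is an assumed separator that
# never appears inside these numeric values.
key1 <- paste(t1$id, t1$condition, sep = "_")
key2 <- paste(t2$ind, t2$test_c, sep = "_")

# For each row of t1, match() gives the position of the first t2 row with
# the same key (NA if there is none), so t1 keeps its order and row count.
t1$time <- t2$time[match(key1, key2)]
t1
```

If t2 could contain the same ind/test_c pair twice, match silently uses the first one, whereas merge would duplicate the matching t1 rows; which behaviour is wanted depends on the data.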

Calculate the count of each value for all variables at once in R

numbers1 <- c(4,23,4,23,5,43,54,56,657,67,67,435,
numbers2 <- c(4,23,4,23,5,44,54,56,657,67,67,435,
To perform the counting, I currently do it manually,
but I can have 100 variables, from mydat$x1 to mydat$x100,
and I don't want to type it 100 times.
How can I do the counting for all variables at once? This attempt:
…$x1-mydat$x100))
is not working.
We can collect all variables in the global environment whose names match a pattern (here, numbers followed by a digit) into a list, and then loop through the elements of that list:
number_lst <- mget(ls(pattern = 'numbers\\d'), envir = .GlobalEnv) #thanks NelsonGon
lapply(number_lst, function(x) as.data.frame(table(x)))
x Freq
1 4 2
2 5 1
3 23 2
4 34 2
5 43 1
6 54 1
7 56 2
8 65 1
9 67 2
10 324 1
11 435 3
12 453 1
13 456 1
14 567 1
15 657 1
x Freq
1 4 2
2 5 1
3 23 2
4 34 2
5 44 1
6 54 1
7 56 2
8 65 1
9 67 2
10 324 1
11 435 3
12 453 1
13 456 1
14 567 1
15 657 1
As I read your question, you want to count the number of times each unique element in a set occurs using minimal re-typing over many sets.
To do this, you'll first need to put the sets into a single object, e.g. into a list:
list_of_sets <- list(numbers1 = c(4,23,4,23,5,43,54,56,657,67,67,435,
numbers2 = c(4,23,4,23,5,44,54,56,657,67,67,435,
Then you loop over each list element, e.g. using a for loop:
list_of_counts <- list()
for(i in seq_along(list_of_sets)){
  list_of_counts[[i]] <- as.data.frame(table(list_of_sets[[i]]))
}
list_of_counts then contains the results:
Var1 Freq
1 4 2
2 5 1
3 23 2
4 34 2
5 43 1
6 54 1
7 56 2
8 65 1
9 67 2
10 324 1
11 435 3
12 453 1
13 456 1
14 567 1
15 657 1
Var1 Freq
1 4 2
2 5 1
3 23 2
4 34 2
5 44 1
6 54 1
7 56 2
8 65 1
9 67 2
10 324 1
11 435 3
12 453 1
13 456 1
14 567 1
15 657 1
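If the variables are already columns of one data frame, as with the question's mydat$x1 to mydat$x100, no list needs to be built by hand: a data frame is itself a list of columns. A minimal sketch with a small, hypothetical stand-in for mydat:

```r
# Hypothetical stand-in for the question's mydat (columns x1, x2, x3)
mydat <- data.frame(x1 = c(4, 23, 4, 5),
                    x2 = c(4, 4, 4, 7),
                    x3 = c(1, 2, 3, 3))

# lapply() visits each column; table() counts every distinct value and
# as.data.frame() turns the result into a (col, Freq) data frame.
counts <- lapply(mydat, function(col) as.data.frame(table(col)))
counts$x1
```

The result is a named list with one count table per column. For the real data, lapply(mydat[paste0("x", 1:100)], ...) would restrict the loop to the columns x1 to x100.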

How to keep initial row order

I have run this SQL statement through the sqldf package.
I got the output I wanted, but I would like to keep the original row order. Unfortunately, the output comes back in a different order.
For example:
> DF
   A B   C D
1 11 2 432 4
2 11 3 432 4
3 13 4 241 5
4 42 5   2 3
5 51 5 332 2
6 51 5 332 1
7 51 5 332 1
> sqldf("SELECT A,B,C,D, COUNT (*) AS NUM
FROM DF GROUP BY A,B,C,D")
   A B   C D NUM
1 11 2 432 4   1
2 11 3 432 4   1
3 13 4 241 5   1
4 42 5   2 3   1
5 51 5 332 1   2
6 51 5 332 2   1
As you can see, the row order changes (rows 5 and 6). It would be great if someone could help me with this issue.
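If this needs to be done with sqldf, add a row-number column (rn) and ORDER BY it, building the column lists by pasting the names together. fn$sqldf (gsubfn's fn, attached along with sqldf) substitutes $nm and $nm1 into the query string:
library(sqldf)
nm <- toString(names(DF))
DF1 <- cbind(rn = seq_len(nrow(DF)), DF)
nm1 <- toString(names(DF1))
fn$sqldf("SELECT $nm, COUNT (*) AS NUM
FROM DF1
GROUP BY $nm ORDER BY $nm1")
#   A B   C D NUM
#1 11 2 432 4   1
#2 11 3 432 4   1
#3 13 4 241 5   1
#4 42 5   2 3   1
#5 51 5 332 2   1
#6 51 5 332 1   2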
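For comparison, the same result can be obtained in base R without SQL; this is a swapped-in technique, not the sqldf answer. A minimal sketch, assuming "_" never occurs inside the values: each distinct row is numbered in order of first appearance, and rows are counted per number.

```r
DF <- data.frame(A = c(11, 11, 13, 42, 51, 51, 51),
                 B = c(2, 3, 4, 5, 5, 5, 5),
                 C = c(432, 432, 241, 2, 332, 332, 332),
                 D = c(4, 4, 5, 3, 2, 1, 1))

# One string key per row, built from all columns; "_" is an assumed
# separator that does not appear in the values themselves.
key <- do.call(paste, c(unname(as.list(DF)), sep = "_"))

# match() against unique(key) numbers each distinct row by its first
# appearance, so tabulate() returns the counts in that same order.
group_id <- match(key, unique(key))

out <- DF[!duplicated(key), , drop = FALSE]  # first occurrence of each row
out$NUM <- tabulate(group_id)
rownames(out) <- NULL
out
```

Because groups are ordered by their first row, (51, 5, 332, 2) comes before (51, 5, 332, 1), matching the desired output.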

Replace 0s in a data frame column with the preceding non-zero value

Here is an example of my data frame:
df = read.table(text = 'a b
120 5
120 5
120 5
119 0
118 0
88 3
88 3
87 0
10 3
10 3
10 3
7 4
6 0
5 0
4 0', header = TRUE)
I need to replace the 0s in column b with the preceding non-zero value.
Here is my desired output:
a b
120 5
120 5
120 5
119 5
118 5
88 3
88 3
87 3
10 3
10 3
10 3
7 4
6 4
5 4
4 4
So far I have tried:
df$b[df$b == 0] = (df$b == 0) - 1
But it does not work.
na.locf from zoo can help with this:
library(zoo)
# convert zeros to NA so that na.locf can fill them
df$b[df$b == 0] <- NA
#using na.locf to replace NA with previous value
df$b <- na.locf(df$b)
> df
a b
1 120 5
2 120 5
3 120 5
4 119 5
5 118 5
6 88 3
7 88 3
8 87 3
9 10 3
10 10 3
11 10 3
12 7 4
13 6 4
14 5 4
15 4 4
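Doing this with a single vectorised condition seems hard, but you could also use a small for loop instead of loading a package. Because the loop runs in order, df$b[i-1] has already been filled when row i is processed, so runs of several 0s are handled too:
for (i in which(df$b == 0)) {
  df$b[i] <- df$b[i-1]
}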
> df
a b
1 120 5
2 120 5
3 120 5
4 119 5
5 118 5
6 88 3
7 88 3
8 87 3
9 10 3
10 10 3
11 10 3
12 7 4
13 6 4
14 5 4
15 4 4
I assume this could be slow for large data frames.
Here is a base R method using rle.
# get the run length encoding of variable
temp <- rle(df$b)
# fill in 0s with previous value
temp$values[temp$values == 0] <- temp$values[which(temp$values == 0) -1]
# replace variable
df$b <- inverse.rle(temp)
This returns
a b
1 120 5
2 120 5
3 120 5
4 119 5
5 118 5
6 88 3
7 88 3
8 87 3
9 10 3
10 10 3
11 10 3
12 7 4
13 6 4
14 5 4
15 4 4
Note that the replacement line will not work correctly if the first element of the vector is 0, because there is no preceding value to copy. You can fix this by excluding that position from the replacement.
For example:
replacers <- which(temp$values == 0)
replacers <- replacers[replacers > 1]
temp$values[replacers] <- temp$values[replacers - 1]
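A vectorised base R alternative, without a package or a loop, is to carry forward the position of the last non-zero value with cummax. A minimal sketch; the function name fill_zeros_forward is hypothetical, and it assumes the column contains no NA values. Leading 0s are simply left as they are:

```r
fill_zeros_forward <- function(x) {
  # Position of each non-zero element, 0 where the element is zero
  pos <- seq_along(x) * (x != 0)
  # Running maximum = index of the most recent non-zero element so far
  last_nonzero <- cummax(pos)
  out <- x
  # Only positions that have some earlier non-zero value get replaced
  has_prev <- last_nonzero > 0
  out[has_prev] <- x[last_nonzero[has_prev]]
  out
}

b <- c(5, 5, 5, 0, 0, 3, 3, 0, 3, 3, 3, 4, 0, 0, 0)
filled <- fill_zeros_forward(b)
filled
```

With the question's data this would be used as df$b <- fill_zeros_forward(df$b).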

How to sum two tables of different dimensions in R?

I want to sum two tables in R, but they have different valid categories, which produces two different dimensions. How can I add them up?
1 2 3 4 6 7 8 9 10
652 1 300 777 9 615 167 26 67
1 2 3 4 5 6 7 8 9 10
285 5 282 367 1 12 289 129 33 1118
Error in table(cx$V2A) + table(cx$V2B) : non-conformable arrays
What can I do to solve this?
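Assuming the two variables are plain vectors (called VA and VB here), you can sum their tables by tabulating the combined vector; with your data that would be something like table(c(cx$V2A, cx$V2B)):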
> VA <- sample(1:10,20,replace=TRUE)
> VB <- sample(1:10,20,replace=TRUE)
> table(VA)
1 2 3 4 5 6 7 9 10
1 3 3 2 3 2 2 2 2
> table(VB)
1 2 4 5 6 7 8 9 10
1 2 2 2 4 3 1 2 3
> table(c(VA,VB))
1 2 3 4 5 6 7 8 9 10
2 5 3 4 5 6 5 1 4 5
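If you would rather tabulate the two variables separately and then add the tables, the "non-conformable arrays" error goes away once both are tabulated over the same set of levels. A sketch with small illustrative vectors (for the question's data the shared levels would be something like sort(union(cx$V2A, cx$V2B))):

```r
VA <- c(1, 2, 2, 3, 10)
VB <- c(1, 5, 5, 10)

# Shared, numerically sorted set of categories seen in either vector
lv <- sort(union(VA, VB))

# With identical levels both tables have the same dimension, so + works;
# a category missing from one vector contributes a count of 0.
tab_sum <- table(factor(VA, levels = lv)) + table(factor(VB, levels = lv))
tab_sum
```

The counts are the same as table(c(VA, VB)); the factor approach is mainly useful when the two tables are needed on their own as well.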
