This question already has answers here:
Select multiple ranges of columns using column names in data.table
(2 answers)
Closed 4 years ago.
I can select multiple ranges of columns in a data.table using a numeric vector like c(1:5,27:30). Is there any way to do the same with column names? For example, in some form similar to col1:col5,col27:col30?
You can with dplyr:
df <- data.frame(a=1, b=2, c=3, d=4, e=5, f=6, g=7)
dplyr::select(df, a:c, f:g)
  a b c f g
1 1 2 3 6 7
I am not sure how efficient my answer is, but it should at least give you a workaround when you need to stay within data.table.
My proposal is to use data.table's name-range subsetting in conjunction with cbind:
library(data.table)
df <- data.table(a=1, b=2, c=3, d=4, e=5, f=6, g=7)
multColSelectedByName <- cbind(df[, a:c], df[, f:g])
#    a b c f g
# 1: 1 2 3 6 7
One thing to be careful about: if one of the selections contains only a single column, for example df[, f], the resulting column is named something like V2 rather than f. In that case, name it explicitly:
multColSelectedByName<- cbind(df[,a:c],f=df[,f])
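The same idea also works in plain base R: translate each name range into numeric indices with match, then combine the index vectors. (name_range is a hypothetical helper written for this sketch, not part of any package; the resulting indices could equally be passed to a data.table with with = FALSE.)

```r
df <- data.frame(a=1, b=2, c=3, d=4, e=5, f=6, g=7)

# turn a pair of column names into the numeric index range between them
name_range <- function(d, from, to) {
  seq(match(from, names(d)), match(to, names(d)))
}

cols <- c(name_range(df, "a", "c"), name_range(df, "f", "g"))
df[, cols]
#   a b c f g
# 1 1 2 3 6 7
```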
This question already has answers here:
Filtering single-column data frames
(1 answer)
How to subset matrix to one column, maintain matrix data type, maintain row/column names?
(1 answer)
Closed 4 months ago.
I have a large matrix from which I want to select rows for further processing.
Ultimately, I want to convert the matrix to a data frame that contains only my rows of interest.
The easiest thing would be to simply run something like data.frame(my_matrix)[c(3,5),]. The problem with this is that I am working with a very big sparse matrix, so converting all of it to a data frame just to select a few rows is inefficient.
The option below does what I want, but somehow only returns the result I intend if I indicate at least 2 indices.
m <- matrix(1:25,nrow = 5)
rownames(m) <- 1:5
colnames(m) <- c("A","B","C","D","E")
data.frame(m[c(3,5),])
If I only want to select one row using the code above, the result is not a "wide" data frame but a long one, which looks like this:
data.frame(m[c(2),])
m.c.2....
A 2
B 7
C 12
D 17
E 22
Is there a simple way to get a dataframe with just one row out of the matrix without converting the whole matrix first? It feels like I am overlooking something very obvious here...
Any help is much appreciated!
You need to use drop=FALSE in the matrix subset, otherwise it will turn the matrix into a vector, as you saw.
m <- matrix(1:25,nrow = 5)
rownames(m) <- 1:5
colnames(m) <- c("A","B","C","D","E")
data.frame(m[c(2),, drop=FALSE])
#> A B C D E
#> 2 2 7 12 17 22
Created on 2022-10-12 by the reprex package (v2.0.1)
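The effect of drop can be checked directly: without drop = FALSE a single-row subset is demoted to a plain vector, with it the result stays a 1-row matrix.

```r
m <- matrix(1:25, nrow = 5,
            dimnames = list(1:5, c("A", "B", "C", "D", "E")))

is.matrix(m[2, ])                # FALSE: the row was dropped to a vector
is.matrix(m[2, , drop = FALSE])  # TRUE: still a matrix
dim(m[2, , drop = FALSE])        # 1 5
```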
This question already has answers here:
Standardize data columns in R
(16 answers)
Closed 2 years ago.
I have a data frame with n columns like the one below, all of them numeric (the example only has 3, but the actual one has an unknown number).
col_1 col_2 col_3
1 3 7
3 8 9
5 5 2
8 10 1
11 9 2
I'm trying to transform the data in every column based on this equation: (x - min(col)) / (max(col) - min(col)), so that every element is scaled based on the values in its column.
Is there a way to do this without using a for loop to iterate through every column? Would sapply or tapply work here?
We can use scale on the dataset (note that scale standardizes each column to mean 0 and unit variance, which is not quite the min-max formula in the question):
scale(df1)
Or if we want to use a custom function, create the function, loop over the columns with lapply, apply the function and assign it back to the dataframe
f1 <- function(x) (x - min(x)) / (max(x) - min(x))
df1[] <- lapply(df1, f1)
Or this can be done with mutate_all
library(dplyr)
df1 %>%
mutate_all(f1)
To complement @akrun's answer, you can also do this with data.table:
library(data.table)
setDT(df)
df[, lapply(.SD, function(x) (x - min(x)) / (max(x) - min(x)))]
If you want to use a subset of columns, you can use .SDcols argument, e.g.
library(data.table)
df[, lapply(.SD, function(x) (x - min(x)) / (max(x) - min(x))),
   .SDcols = c('a','b')]
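For reference, here is the corrected min-max formula as a standalone base-R function, applied to the example data from the question; after scaling, every column spans exactly 0 to 1.

```r
df1 <- data.frame(col_1 = c(1, 3, 5, 8, 11),
                  col_2 = c(3, 8, 5, 10, 9),
                  col_3 = c(7, 9, 2, 1, 2))

# min-max scaling: map each column linearly onto [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- as.data.frame(lapply(df1, rescale01))
sapply(scaled, range)  # each column now runs from 0 to 1
```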
This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
My goal is to create a wordcloud in R, but I'm working with nested JSON data (which also happens to be incredibly messy).
There's a nice explanation here for how to create a wordcloud of phrases rather than singular words. I also know melt() from reshape2 can create new rows out of entire columns. Is there a way in R to perform a melt-like function over nested substrings?
Example:
N Group String
1 A c("a", "b", "c")
2 A character(0)
3 B a
4 B c("b", "d")
5 B d
...should become:
N Group String
1 A a
2 A b
3 A c
4 A character(0)
5 B a
6 B b
7 B d
8 B d
...where each subsequent substring is moved to its own row. In my actual data, the pattern c("x", "y") is consistent but the substrings are too varied to know a priori.
If there's no great way to do this, too bad... just thought I'd ask the experts!
You can use separate_rows from the tidyr package:
library(tidyverse)
data %>%
  separate_rows(String, sep = ",\\s*") %>%               # split on commas
  mutate(String = gsub('^c\\("|"\\)$|"', "", String))    # clean up the quotations and parens
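If you would rather avoid tidyr, the same split can be sketched in base R: strip the c("...") wrapper, split on commas, then repeat the grouping column to match the number of pieces. (Column names here follow the example in the question.)

```r
data <- data.frame(N = 1:2, Group = c("A", "B"),
                   String = c('c("a", "b", "c")', 'c("b", "d")'),
                   stringsAsFactors = FALSE)

# remove the c(" prefix, the ") suffix, and any remaining quotes,
# then split each cleaned string on commas
pieces <- strsplit(gsub('^c\\("|"\\)$|"', "", data$String), ",\\s*")

# repeat Group once per extracted substring
out <- data.frame(Group  = rep(data$Group, lengths(pieces)),
                  String = unlist(pieces),
                  stringsAsFactors = FALSE)
out
```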
This question already has answers here:
Extract the maximum value within each group in a dataframe [duplicate]
(3 answers)
Closed 7 years ago.
I am searching for an efficient and fast way to do the following:
I have a data frame with, say, two variables, VarA and VarB, where the values of VarA can occur several times:
mat<-data.frame('VarA'=rep(seq(1,10),2),'VarB'=rnorm(20))
VarA VarB
1 0.95848233
2 -0.07477916
3 2.08189370
4 0.46523827
5 0.53500190
6 0.52605101
7 -0.69587974
8 -0.21772252
9 0.29429577
10 3.30514605
1 0.84938361
2 1.13650996
3 1.25143046
Now I want to get a vector giving me for every unique value of VarA
unique(mat$VarA)
the maximum of VarB conditional on VarA.
In the example here that would be
1 0.95848233
2 1.13650996
3 2.08189370
etc...
My data-frame is very big so I want to avoid the use of loops.
Try this:
library(dplyr)
mat %>% group_by(VarA) %>%
summarise(max=max(VarB))
Try to use data.table package.
library(data.table)
mat <- data.table(mat)
result <- mat[, max(VarB), by = VarA]
print(result)
Try this:
library(plyr)
ddply(mat, .(VarA), summarise, VarB=max(VarB))
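For completeness, the same group-wise maximum can be taken in base R without loops, either with tapply (named vector) or aggregate (data frame), using the question's own setup:

```r
set.seed(42)  # rnorm is random; seeded here so the sketch is reproducible
mat <- data.frame(VarA = rep(seq(1, 10), 2), VarB = rnorm(20))

tapply(mat$VarB, mat$VarA, max)          # named vector: one maximum per VarA
aggregate(VarB ~ VarA, data = mat, max)  # same result as a data frame
```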
This question already has answers here:
Add column with order counts
(2 answers)
Count number of rows within each group
(17 answers)
Closed 7 years ago.
What is the easiest way to count the occurrences of an element in a vector or data.frame within every group?
I don't mean just counting the total (as other Stack Overflow questions ask) but assigning a different number to every successive occurrence.
For example, for this simple data frame (though I will work with data frames with more columns):
mydata <- data.frame(A=c("A","A","A","B","B","A", "A"))
I've found this solution:
cbind(mydata, myorder = ave(rep(1, nrow(mydata)), mydata$A, FUN = cumsum))
and here the result:
A myorder
A 1
A 2
A 3
B 1
B 2
A 4
A 5
Isn't there a single command to do it, or a specialized package?
I want to use the result later with tidyr's spread() function.
My question is not the same as
Is there an aggregate FUN option to count occurrences?
because I don't want the total number of occurrences at the end but the cumulative count up to each element.
OK, my problem is a little bit more complex
mydata <- data.frame(group=c("x","x","x","x","y","y", "y"), letter=c("A","A","A","B","B","A", "A"))
I only know how to solve the first example I wrote above.
But what happens when I also want it by a second grouping variable?
Something like occurrences(letter) by group.
group letter "occurencies within group"
x A 1
x A 2
x A 3
x B 1
y B 1
y A 1
y A 2
I've found a way with
ave(rep(1, nrow(mydata)), list(mydata$group, mydata$letter), FUN = cumsum)
though there should be something easier.
Using data.table
library(data.table)
setDT(mydata)
mydata[, myorder := 1:.N, by = .(group, letter)]
The by argument makes the operation run within the groups defined by the group and letter columns. .N is the number of rows within that group (if by were empty, it would be the number of rows in the whole table), so within each sub-table, each row is indexed from 1 to the number of rows in that sub-table.
mydata
group letter myorder
1: x A 1
2: x A 2
3: x A 3
4: x B 1
5: y B 1
6: y A 1
7: y A 2
or a dplyr solution, which is pretty much the same:
library(dplyr)
mydata %>%
  group_by(group, letter) %>%
  mutate(myorder = 1:n())
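The same within-group running count also has a base-R equivalent: ave() with seq_along numbers the occurrences inside each (group, letter) pair, with no packages needed.

```r
mydata <- data.frame(group  = c("x", "x", "x", "x", "y", "y", "y"),
                     letter = c("A", "A", "A", "B", "B", "A", "A"))

# ave() splits the row indices by (group, letter) and seq_along
# renumbers each sub-group 1, 2, 3, ...
mydata$myorder <- ave(seq_len(nrow(mydata)),
                      mydata$group, mydata$letter,
                      FUN = seq_along)
mydata$myorder
# 1 2 3 1 1 1 2
```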