I want to calculate the mean of the resulting values returned by abs(((column A - column B)/column A)*100)
For example, on the mtcars data I tried:
> mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
...
mean((abs(mtcars['cyl']-mtcars['mpg'])/mtcars['mpg'])*100)
Which gives me this warning:
Warning message: In mean.default((abs(mtcars["cyl"] -
mtcars["mpg"])/mtcars["mpg"]) * : argument is not numeric or
logical: returning NA
How can I fix this?
You need to use the $ operator to extract the values as vectors, or use double brackets, i.e.
mean((abs(mtcars[['cyl']]-mtcars[['mpg']])/mtcars[['mpg']])*100)
#[1] 64.13455
#or
mean((abs(mtcars$cyl-mtcars$mpg)/mtcars$mpg)*100)
#[1] 64.13455
You can see the difference in structure,
str(mtcars['cyl'])
'data.frame': 32 obs. of 1 variable:
$ cyl: num 6 6 4 6 8 6 8 4 4 6 ...
str(mtcars[['cyl']])
num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
str(mtcars$cyl)
num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
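To make the difference concrete, here is a minimal sketch that wraps the calculation in a small helper. The name mean_abs_pct_diff and its arguments are made up for illustration; the only point is that [[ hands mean() a plain numeric vector, while [ keeps the one-column data frame.

```r
# Hypothetical helper (not part of base R or any package):
# mean of abs((a - b) / b) * 100 for two columns given by name.
mean_abs_pct_diff <- function(df, a, b) {
  x <- df[[a]]  # [[ returns a numeric vector, not a one-column data frame
  y <- df[[b]]
  mean(abs(x - y) / y * 100)
}

result <- mean_abs_pct_diff(mtcars, "cyl", "mpg")
result

# Single brackets keep the data frame wrapper, which mean() cannot handle
is.data.frame(mtcars["cyl"])
is.numeric(mtcars[["cyl"]])
```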
A tidyverse alternative:
library(dplyr)
mtcars %>%
summarise(Mean = mean(abs(cyl - mpg)/mpg) * 100)
Mean
1 64.13455
I have a data frame, df_1_2017, with 38 columns. I have another data frame, df_2_2018, with 43 columns. I want the same number of columns/header names so I can easily cbind the two data frames.
I have tried the code below without any luck:
col_names_2017 <- colnames(df_1_2017)
selected_cols_df_2_2018 <- df_2_2018 %>%
select(col_names_2017)
Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `Canopy_cover_mean` doesn't exist.
How can I write a select that keeps only the columns of df_2_2018 whose names are also present in df_1_2017, i.e. all the columns the two data frames share?
You can use
common_colsnms <- intersect(colnames(df_1_2017), colnames(df_2_2018))
# apply; all_of() tells select() that common_colsnms is an external character vector
selected_cols_df_2_2018 <- df_2_2018 %>%
select(all_of(common_colsnms))
Please see https://dplyr.tidyverse.org/reference/dplyr_tidy_select.html for future reference. Let me know if this works.
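col_names_2017 <- colnames(df_1_2017)
# any_of() skips names that are missing from df_2_2018;
# all_of() would raise the same "columns that don't exist" error
selected_cols_df_2_2018 <- df_2_2018 %>%
select(any_of(col_names_2017))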
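The difference between any_of() and all_of() matters in exactly this situation. Below is a small sketch on toy data; df_a, df_b and their columns are invented stand-ins for the two data frames in the question.

```r
library(dplyr)

# Toy stand-ins for the two data frames in the question
df_a <- data.frame(id = 1:2, height = c(10, 12), canopy = c(0.4, 0.6))
df_b <- data.frame(id = 3:4, height = c(9, 11), slope = c(5, 7))

wanted <- colnames(df_a)

# any_of() keeps the names that exist and silently skips the rest
shared <- df_b %>% select(any_of(wanted))
names(shared)

# all_of() is strict: a name missing from df_b is an error
strict <- tryCatch(
  df_b %>% select(all_of(wanted)),
  error = function(e) "error"
)
strict
```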
If you really mean cbind, then it has nothing to do with the numbers or names of columns (well, duplicate names are discouraged but possible). In this case, you should be looking at the number of rows in each, and if they align row-wise; normally either they are perfectly a match (same number of rows, each row means the same thing) or they have shared ID fields that require a join/merge operation.
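As a rough illustration of the join case, assuming both tables carry a shared key (the plot_id column and the cover values below are made up):

```r
# Hypothetical yearly tables sharing a plot_id key, rows in different order
plots_2017 <- data.frame(plot_id = c(1, 2, 3), cover_2017 = c(40, 55, 62))
plots_2018 <- data.frame(plot_id = c(3, 1, 2), cover_2018 = c(65, 42, 50))

# cbind() would pair rows by position and misalign the plots;
# merge() pairs them by the shared key instead
joined <- merge(plots_2017, plots_2018, by = "plot_id")
joined
```

Here each plot ends up on one row with both yearly values, regardless of the row order in the inputs.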
However, in case you mean rbind instead, where you feel you need the columns to match, by-name, then you have a couple of options.
base R
mt2 <- mtcars[1:3,]
mt3 <- mtcars[4:6,]
# paste0() avoids the space that paste() would insert before the suffix
names(mt2)[3:5] <- paste0(names(mt2)[3:5], "_2")
names(mt3)[6:8] <- paste0(names(mt3)[6:8], "_3")
mt2
# mpg cyl disp_2 hp_2 drat_2 wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
mt3
# mpg cyl disp hp drat wt_3 qsec_3 vs_3 am gear carb
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
common <- intersect(names(mt2), names(mt3))
mt2[,common]
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
mt3[,common]
# mpg cyl am gear carb
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
rbind(mt2[,common], mt3[,common])
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
dplyr, limiting names
library(dplyr)
rbind(select(mt2, any_of(names(mt3))), select(mt3, any_of(names(mt2))))
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
Or using the more-flexible bind_rows:
select(mt2, any_of(names(mt3))) %>%
bind_rows(select(mt3, any_of(names(mt2))))
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
dplyr, accept all columns
If you are less concerned about extra columns, then you can use bind_rows and its innate ability to align columns by name, filling a column with NA for the rows of any frame that lacks it.
bind_rows(mt2, mt3)
# mpg cyl disp_2 hp_2 drat_2 wt qsec vs am gear carb disp hp drat wt_3 qsec_3 vs_3
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 NA NA NA NA NA NA
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 NA NA NA NA NA NA
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 NA NA NA NA NA NA
# Hornet 4 Drive 21.4 6 NA NA NA NA NA NA 0 3 1 258 110 3.08 3.215 19.44 1
# Hornet Sportabout 18.7 8 NA NA NA NA NA NA 0 3 2 360 175 3.15 3.440 17.02 0
# Valiant 18.1 6 NA NA NA NA NA NA 0 3 1 225 105 2.76 3.460 20.22 1
Is there a way that the row names can be substituted based on predefined vector in R, something like:
rownames(GV) <- c(beta1='Age', beta10='Female Gender')
Or maybe case_when() will be easier for you:
library(dplyr)
df <- data.frame(a = c(1, 2, 3))
rownames(df)
#> [1] "1" "2" "3"
rownames(df) <- case_when(rownames(df) == "1" ~ "one",
rownames(df) == "2" ~ "two",
TRUE ~ rownames(df))
rownames(df)
#> [1] "one" "two" "3"
You specify a new value for each condition, plus a value for all remaining cases (the TRUE ~ rownames(df) line). For the remaining cases I'm keeping the previous row names.
We could do the following:
rownames(mtcars)[which(rownames(mtcars) == "Datsun 710")] <- "My Rowname"
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If we want to rename more rownames we can use %in%, but as @gss mentions in the comments, this comes with a caveat: no matter the order of the names in the character vector following %in%, the names will be replaced in the order they appear in rownames(). Compare the following two calls:
rownames(mtcars)[which(rownames(mtcars) %in% c("Datsun 710", "Mazda RX4 Wag"))] <- c("My Rowname1","My Rowname2")
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> My Rowname1 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Which has the same result as:
rownames(mtcars)[which(rownames(mtcars) %in% c("Mazda RX4 Wag", "Datsun 710"))] <- c("My Rowname1","My Rowname2")
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> My Rowname1 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Created on 2021-12-21 by the reprex package (v2.0.1)
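If the pairing between old and new names should follow the order in which you list them, match() is one way around the %in% caveat. This is only a sketch, and it assumes every old name actually exists in the row names (match() returns NA otherwise):

```r
example <- head(mtcars, 3)

old_names <- c("Datsun 710", "Mazda RX4 Wag")
new_names <- c("My Rowname1", "My Rowname2")

# match() returns the position of each old name, in the order given,
# so new_names[i] always replaces old_names[i]
idx <- match(old_names, rownames(example))
rownames(example)[idx] <- new_names
rownames(example)
```

With this version "Datsun 710" becomes "My Rowname1" even though it comes after "Mazda RX4 Wag" in the data.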
If you want to rename all the rows, and you have a vector of the desired new names in order:
example <- head(mtcars, 3)
mynewnames <- c("First", "Second", "Third")
rownames(example) <- mynewnames
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> First 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Second 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Third 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
If you want to rename all the rows, and you have a named vector (not necessarily in the correct order):
example <- head(mtcars, 3)
mynewnames <- c("Datsun 710" = "Datsun", "Mazda RX4" = "Mazda", "Mazda RX4 Wag" = "Also Mazda")
rownames(example) <- mynewnames[rownames(example)]
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Also Mazda 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
If you want to rename only some rows, and have a named vector (an ordered vector makes no sense in this context):
example <- head(mtcars, 3)
mynewnames <- c("Mazda RX4" = "This Mazda", "Mazda RX4 Wag" = "That Mazda")
rownames(example)[rownames(example) %in% names(mynewnames)] <-
mynewnames[rownames(example)[rownames(example) %in% names(mynewnames)]]
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> This Mazda 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> That Mazda 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
This is a bit unwieldy; if you are only replacing one or two row names then @TimTeaFan's first suggestion is probably easier.
The safest way, and the one the OP prefers (a predefined named vector), is to take the current row names, replace those that are defined in the vector, and set the row names again. This does not fail on an incomplete vector: a row name that has no replacement stays as it was.
The advantage of this solution is that it prevents the error below when your rename vector is incomplete.
Error in `.rowNamesDF<-`(x, value = value) :
missing values in 'row.names' are not allowed
solution
library(stringr) # used for str_replace_all()
df <- data.frame(
x = rep(1:5),
y = rep(11:15),
row.names = LETTERS[1:5]
)
df
# x y
# A 1 11
# B 2 12
# C 3 13
# D 4 14
# E 5 15
change <- c("A" = "a", "C" = "c")
row.names(df) <- str_replace_all(row.names(df), change)
df
# x y
# a 1 11
# B 2 12
# c 3 13
# D 4 14
# E 5 15
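One thing to keep in mind: with a named vector, str_replace_all() treats the names as regular-expression patterns and replaces them wherever they occur inside a row name, not only on whole-name matches. If exact matching is what you need, a base R sketch along the same lines (same df and change as above) could look like this:

```r
df <- data.frame(x = 1:5, y = 11:15, row.names = LETTERS[1:5])
change <- c("A" = "a", "C" = "c")

old <- row.names(df)
hit <- old %in% names(change)   # rows that have a replacement
old[hit] <- change[old[hit]]    # look up new names by exact match
row.names(df) <- old
row.names(df)
```

Rows without an entry in change are left untouched, so an incomplete vector is still safe.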
I asked a question about accessing a variable in the global environment that has the same name as one of the columns inside dplyr functions. One solution I received was to use get(). However, a new problem arises: without specifying the environment explicitly, the result differs between dplyr versions (see below). Could someone explain why?
# dplyr 0.6.0
mpg <- 21
mtcars %>% filter(mpg == get("mpg"))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> ...
# dplyr 0.5.0
mtcars %>% filter(mpg == get("mpg"))
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
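This does not explain the version difference, but the ambiguity can be sidestepped without get(). The sketch below assumes a reasonably recent dplyr (0.7.0 or later for !!, and a version that provides the .env pronoun, which I believe is available from 1.0.0 on):

```r
library(dplyr)

mpg <- 21

# !! evaluates mpg in the calling environment before filter() runs
by_bang <- mtcars %>% filter(mpg == !!mpg)

# .env$mpg asks for the environment variable, .data$mpg for the column
by_env <- mtcars %>% filter(.data$mpg == .env$mpg)

nrow(by_bang)
nrow(by_env)
```

Both calls compare the mpg column against the value 21 and keep only the two Mazda rows.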
I am working on a predefined dataset in R called mtcars. The head of this dataset looks like:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
On the left side there is the name of each car. How come it doesn't have a data type such as num or factor? How can I apply that to a similar dataset?
The structure is:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
where the names of the cars don't appear.
From what you share, the car names were read in and assigned as the row names of the dataset. Because str() only prints a summary of the columns, it does not print anything related to the row names.
So if you want to have the car name as a column just add a column:
mtcars_trial <- mtcars
mtcars_trial$carname <- rownames(mtcars)
Regarding the comment:
# this will assign the column as rowname for dataset
rownames(mtcars_trial) <- mtcars_trial$carname
# this will remove the carname column
mtcars_trial$carname <- NULL
The reason behind this is that here, the names of cars are stored as rownames and NOT as a column in the data.frame.
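A quick way to see this for yourself, and to give your own data the same layout. The my_df example is invented for illustration:

```r
# Row names live in the "row.names" attribute, not in a column
"row.names" %in% names(attributes(mtcars))
head(rownames(mtcars), 3)

# They are absent from the column list that str() walks over
names(mtcars)

# The same layout can be given to your own data at creation time
my_df <- data.frame(score = c(90, 85), row.names = c("Alice", "Bob"))
ncol(my_df)       # only 'score' counts as a column
rownames(my_df)
```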
Based on Kunal Puri,
## values
mpg <- c(21.0, 21.0,22.8)
cyl <- c(6,6,4)
disp <- c(160,160,108)
hp <- c(110, 110, 93)
drat <- c(3.90,3.90,3.85)
wt <- c(2.620, 2.875, 2.320)
qsec <- c(16.46, 17.02, 18.61)
vs <- c(0,0,1)
am <- c(1,1,1)
gear <- c(4,4,4)
carb <- c(4,4,1)
## data.frame
df <- data.frame(mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"))
## give a column name, take rownames values
df$cars <- rownames(df)
## row names removed
rownames(df) <- NULL
## rearranged df: put the new 12th column (cars) first
df <- df[c(12, 1:11)]
print(df)
output:
cars mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Does it help? I guess it is the same thing as the previous solution.
I would like to have the last column of the data frame moved to the start (as first column). How can I do it in R?
My data.frame has about a thousand columns, so reordering them all by hand won't do. I just want to pick one column and "move it to the start".
Dplyr's select() approach
Moving the last column to the start:
new_df <- df %>%
select(last_column_name, everything())
This is also valid for any column and any quantity:
new_df <- df %>%
select(col_5, col_8, everything())
Example using mtcars data frame:
head(mtcars, n = 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Last column is 'carb'
new_df <- mtcars %>% select(carb, everything())
head(new_df, n = 2)
# carb mpg cyl disp hp drat wt qsec vs am gear
# Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
dplyr 1.0.0 now includes the relocate() function to reorder columns. The default behaviour is to move the named column(s) to the first position.
library(dplyr) # from version 1.0.0
mtcars %>%
relocate(carb) %>%
head()
carb mpg cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 22.8 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 18.7 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 18.1 6 225 105 2.76 3.460 20.22 1 0 3
But other locations can be specified with the .before or .after arguments:
mtcars %>%
relocate(gear, carb, .before = cyl) %>%
head()
mpg gear carb cyl disp hp drat wt qsec vs am
Mazda RX4 21.0 4 4 6 160 110 3.90 2.620 16.46 0 1
Mazda RX4 Wag 21.0 4 4 6 160 110 3.90 2.875 17.02 0 1
Datsun 710 22.8 4 1 4 108 93 3.85 2.320 18.61 1 1
Hornet 4 Drive 21.4 3 1 6 258 110 3.08 3.215 19.44 1 0
Hornet Sportabout 18.7 3 2 8 360 175 3.15 3.440 17.02 0 0
Valiant 18.1 3 1 6 225 105 2.76 3.460 20.22 1 0
You can change the order of columns by addressing them explicitly in the new order with data[,c(ORDER YOU WANT THEM TO BE IN)]
If you just want the last column to be first use: data[,c(ncol(data),1:(ncol(data)-1))]
> head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
> head(cars[,c(2,1)])
dist speed
1 2 4
2 10 4
3 4 7
4 22 7
5 16 8
6 10 9
dataframe<-dataframe[,c(1000, 1:999)]
this will move your last column i.e. 1000th column to the first column.
I don't know if it's worth adding this as an answer or if a comment would be fine, but I wrote a function called moveme that lets you do what you want to do with the language you describe. You can find the function at this answer: https://stackoverflow.com/a/18540144/1270695
It works on the names of your data.frame and produces a character vector that you can use to reorder your columns:
mydf <- data.frame(matrix(1:12, ncol = 4))
mydf
moveme(names(mydf), "X4 first")
# [1] "X4" "X1" "X2" "X3"
moveme(names(mydf), "X4 first; X1 last")
# [1] "X4" "X2" "X3" "X1"
mydf[moveme(names(mydf), "X4 first")]
# X4 X1 X2 X3
# 1 10 1 4 7
# 2 11 2 5 8
# 3 12 3 6 9
If you're shuffling things around like this, I suggest converting your data.frame to a data.table and using setcolorder (with my moveme function, if you wish) to make the change by reference.
In your question, you also mentioned "I just want to pick one column and move it to the start". If it's an arbitrary column, and not specifically the last one, you could also look at using setdiff.
Imagine you're working with the "mtcars" dataset and want to move the "am" column to the start.
x <- "am"
mtcars[c(x, setdiff(names(mtcars), x))]
If you want to move any named column to the first position, simply use:
df[,c(which(colnames(df)=="desired_colname"),which(colnames(df)!="desired_colname"))]
A native R approach that works with any number of rows or columns to move the last column of a data frame to the first position:
df <- df[, c(ncol(df), 1:(ncol(df) - 1))]
To move any other column to the front, put its number first and follow it with the remaining column numbers:
k <- your_column_number_here
df <- df[, c(k, setdiff(seq_len(ncol(df)), k))]
If you don't know the column number, but know the column name, do the following, replacing "your_column_name_here":
columnNumber <- which(colnames(df) == "your_column_name_here")
df <- df[, c(columnNumber, setdiff(seq_len(ncol(df)), columnNumber))]
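A note on operator precedence when building index vectors like these: in R the : operator binds tighter than binary minus, so 1:n - 1 means (1:n) - 1, not 1:(n - 1). The sketch below shows the difference and a small name-based helper; move_to_front is a hypothetical name, not an existing function.

```r
n <- 5
a <- 1:n - 1     # parsed as (1:n) - 1, i.e. 0 1 2 3 4
b <- 1:(n - 1)   # 1 2 3 4
a
b

# Hypothetical helper: move one named column to the first position
move_to_front <- function(df, col) {
  stopifnot(col %in% names(df))
  df[, c(col, setdiff(names(df), col)), drop = FALSE]
}

moved <- move_to_front(mtcars, "hp")
names(moved)
```

Indexing by name with setdiff() avoids the arithmetic on column numbers altogether.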
There is also the data.table option with setcolorder():
library(data.table)
mtcars_copy <- copy(mtcars)
setDT(mtcars_copy)
# Move column "gear" in the first position
setcolorder(mtcars_copy, neworder = "gear")
head(mtcars_copy)
# gear mpg cyl disp hp drat wt qsec vs am carb
# 1: 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# 2: 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
# 3: 4 22.8 4 108 93 3.85 2.320 18.61 1 1 1
# 4: 3 21.4 6 258 110 3.08 3.215 19.44 1 0 1
# 5: 3 18.7 8 360 175 3.15 3.440 17.02 0 0 2
# 6: 3 18.1 6 225 105 2.76 3.460 20.22 1 0 1
If multiple columns, then mention the order in a vector:
setcolorder(mtcars_copy, neworder = c("vs", "carb"))
head(mtcars_copy)
# vs carb gear mpg cyl disp hp drat wt qsec am
# 1: 0 4 4 21.0 6 160 110 3.90 2.620 16.46 1
# 2: 0 4 4 21.0 6 160 110 3.90 2.875 17.02 1
# 3: 1 1 4 22.8 4 108 93 3.85 2.320 18.61 1
# 4: 1 1 3 21.4 6 258 110 3.08 3.215 19.44 0
# 5: 0 2 3 18.7 8 360 175 3.15 3.440 17.02 0
# 6: 1 1 3 18.1 6 225 105 2.76 3.460 20.22 0
Move any column from any position to the first position in your data:
n <- which(colnames(df) == "column_need_move")
column_need_move <- df$column_need_move
# drop = FALSE keeps a data frame even if only one column remains
df <- cbind(column_need_move, df[, -n, drop = FALSE])
If you want to create a new column and have it be the first column, use the .before=1 argument:
my_data <- my_data %>% mutate(newcol = a*b, .before=1)
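A self-contained version of the same idea, with a toy my_data made up for illustration (the .before argument of mutate() needs dplyr 1.0.0 or later):

```r
library(dplyr)

my_data <- data.frame(a = 1:3, b = 4:6)

# newcol is computed from a and b and placed ahead of the existing columns
my_data <- my_data %>% mutate(newcol = a * b, .before = 1)
names(my_data)
```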