I want to create a variable region based on a series of similar variables zipid1 to zipid26. My current code is like this:
dat$region <- with(dat, ifelse(zipid1 == 1, 1,
ifelse(zipid2 == 1, 2,
ifelse(zipid3 == 1, 3,
ifelse(zipid4 == 1, 4,
5)))))
How can I write a loop to avoid typing from zipid1 to zipid26? Thanks!
We subset the 'zipid' columns, create a logical matrix by comparing with 1 (== 1), get the column index of the TRUE value with max.col (assuming there is only a single 1 per each row and assign it to create 'region'
dat$region <- max.col(dat[paste0("zipid", 1:26)] == 1, "first")
Using a small reproducible example
max.col(dat[paste0("zipid", 1:5)] == 1, "first")
data
dat <- data.frame(id = 1:5, zipid1 = c(1, 3, 2, 4, 5),
zipid2 = c(2, 1, 3, 5, 4), zipid3 = c(3, 2, 1, 5, 4),
zipid4 = c(4, 3, 6, 2, 1), zipid5 = c(5, 3, 8, 1, 4))
Related
Let's cosnider very easy dataframe containing four groups:
cat <- c(1, 0, 0, 1, 2, 1, 2, 3, 2, 1, 3)
var <- c(10, 5, 3, 2, 5, 1, 2, 10, 50, 2, 30)
df <- data.frame(cat, var)
What I would like to do is that using dplyr plot distribution of values between those four categories
I have the feeling that it can be eaisly done with group_by, but I'm not sure how it can be done. Do you know how I can do it?
I have a column in my dataframe containing ascending numbers which are interrupted by Zeros.
I would like to find all rows which come before a Zero and create a new datatable containing only these rows.
My Column: 1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0
What I need: 4, 6
Any help would be much appreciated! Thanks!
A dplyr solution:
library(dplyr)
df %>%
filter(lead(x) == 0, x != 0)
#> x
#> 1 4
#> 2 6
Created on 2021-07-08 by the reprex package (v2.0.0)
data
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))
Welcome to SO!
You can try with base R. The idea is to fetch the rownames of the rows before the 0 and subset() the df by them:
# your data
df <- data.frame(col = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))
# an index that get all the rownames before the 0
index <- as.numeric(rownames(df)[df$col == 0]) -1
# here you subset your original df by index: there is also a != 0 to remove the 0 before 0
df_ <- subset(df, rownames(df) %in% index & col !=0)
df_
col
4 4
12 6
Using base R:
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0),
y = LETTERS[1:13])
df[diff(df$x)<0,]
x y
4 4 D
12 6 L
Using Run Lengths in base R. To get the index of x, add the run lengths until 0 value occurs.
x <- c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0)
y <- rle(x)
x[cumsum(y$lengths)][which(y$values == 0) - 1]
# [1] 4 6
Though this problem has been 'solved' many times, it turns out there's always another problem.
Without the print function it runs with no errors, but with it I get the following:
Error in .subset2(x, i) : recursive indexing failed at level 2
Which I'm taking to mean it doesn't like graphs being created in two layers of iteration? Changing the method to 'qplot(whatever:whatever)' has the exact same problem.
It's designed to print a graph for every pairing of the variables I'm looking at. There's too many for them to fit in a singular picture, such as for the pairs function, and I need to be able to see the actual variable names in the axes.
load("Transport_Survey.RData")
variables <- select(Transport, "InfOfReceievingWeather", "InfOfReceievingTraffic", "InfOfSeeingTraffic", "InfWeather.Ice", "InfWeather.Rain", "InfWeather.Wind", "InfWeather.Storm", "InfWeather.Snow", "InfWeather.Cold", "InfWeather.Warm", "InfWeather.DarkMorn", "InfWeather.DarkEve", "HomeParking", "WorkParking", "Disability", "Age", "CommuteFlexibility", "Gender", "PassionReduceCongest")
varnames <- list("InfOfReceivingWeather", "InfOfReceivingTraffic", "InfOfSeeingTraffic", "InfWeather.Ice", "InfWeather.Rain", "InfWeather.Wind", "InfWeather.Storm", "InfWeather.Snow", "InfWeather.Cold", "InfWeather.Warm", "InfWeather.DarkMorn", "InfWeather.DarkEve", "HomeParking", "WorkParking", "Disability", "Age", "CommuteFlexibility", "Gender", "PassionReduceCongest")
counterx = 1
countery = 1
for (a in variables) {
for (b in variables) {
print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]],
xlab=varnames[counterx], ylab=varnames[countery]))+
geom_point())
countery = countery+1
counterx = counterx+1
}
}
#variables2 <- select(Transport, one_of(InfOfReceivingWeather, InfOfReceivingTraffic, InfOfSeeingTraffic, InfWeather.Ice, InfWeather.Rain, InfWeather.Wind, InfWeather.Storm, InfWeather.Snow, InfWeather.Cold, InfWeather.Warm, InfWeather.DarkMorn, InfWeather.DarkEve, HomeParking, WorkParking, Disability, Age, CommuteFlexibility, Gender, PassionReduceCongest))
Here is a mini-data frame for reference, sampled from the columns I'm using:
structure(list(InfOfReceievingWeather = c(1, 1, 1, 1, 4), InfOfReceievingTraffic = c(1,
1, 1, 1, 4), InfOfSeeingTraffic = c(1, 1, 1, 1, 4), InfWeather.Ice = c(3,
1, 3, 5, 5), InfWeather.Rain = c(1, 1, 2, 2, 4), InfWeather.Wind = c(1,
1, 2, 2, 4), InfWeather.Storm = c(1, 1, 1, 2, 5), InfWeather.Snow = c(1,
1, 2, 5, 5), InfWeather.Cold = c(1, 1, 1, 2, 5), InfWeather.Warm = c(1,
1, 1, 1, 3), InfWeather.DarkMorn = c(1, 1, 1, 1, 1), InfWeather.DarkEve = c(1,
1, 1, 1, 1), HomeParking = c(1, 1, 3, 1, 1), WorkParking = c(1,
4, 4, 5, 4), Disability = c(1, 1, 1, 1, 1), Age = c(19, 45, 35,
40, 58), CommuteFlexibility = c(2, 1, 5, 1, 2), Gender = c(2,
2, 2, 2, 1), PassionReduceCongest = c(0, 0, 2, 0, 2)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
You get an error in the assignment of your a and b. Basically, when defining a and b in variables, they become the vector of values contained in columns of variables. Thus, in your aes mapping, when you are calling variables[[a]], basically, you are writing (for the first iteration of a in variables):
variables[[c(1, 1, 1, 1, 4)]] instead of variables[["InfOfReceievingWeather"]]. So, it can't work.
To get over this issue, you have to either choose between:
for (a in variables) {
for (b in variables) {
print(ggplot(variables, mapping=aes(x=a, y=b)) ...
or
for (a in 1:ncol(variables)) {
for (b in 1:ncol(variables)) {
print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]])) ...
Despite the first one seems to be simpler, I will rather prefere the second option because it will allow you to recycle a and b as column indicator to extract colnames of variables for xlab and ylab.
At the end, writing something like this should work:
for (a in 1:ncol(variables)) {
for (b in 1:ncol(variables)) {
print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]])) +
xlab(colnames(variables)[a])+
ylab(colnames(variables)[b])+
geom_point())
}
}
Does it answer your question ?
I am a beginner in R, and have a question about making boxplots of columns in R. I just made a dataframe:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessary complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated','Various function incohorent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration untill I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Someone who can help me out here?
You can do it as well using tidyverse
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long
tidyr::gather("var", "value", 1:7) %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
As #blondeclover says in the comment, boxplot() should work fine for doing a boxplot of each column.
If what you want is a boxplot for each row, then actually your current rows need to be your columns. If you need to do this, you can transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)
I'm working on the following df:
Num1 <- c(1, 2, 1, 3, 4, 4, 6, 2)
Num2 <- c(3, 3, 2, 1, 1, 2,4, 4)
Num3 <- c(2, 2, 3, 4, 3, 5, 5, 7)
Num4 <- c(1, 3, 3, 1, 2,3, 3, 6)
Num5 <- c(2, 1, 1, 1, 5, 3, 2, 1)
df <- data.frame(Num1, Num2, Num3, Num4, Num5)
I need to create a new matrix having the first column as df[1] - df[2], the second as df[2] - df[3] and so on.
How about this?
mapply('-', df[-length(df)], df[-1])
Or (as mentioned by #Pierre Lafortune)
df[-length(df)] - df[-1]