Writing a function for a data frame - r

I am a very new R user and struggling with writing a function.
I would like to write a function for the sample data frame below where the text “Number of students” is printed for each remark category with the corresponding number of students in each category.
student.id midterm final remark
student.1 83 81 Excellent
student.2 52 42 Work Harder
student.3 62 8 Work Harder
student.4 50 44 Work Harder
student.5 86 80 Excellent
student.6 90 1 Not Bad
student.7 73 70 Work Harder
student.8 87 84 Excellent
student.9 55 23 Work Harder
student.10 62 87 Not Bad
student.11 72 78 Work Harder
student.12 70 91 Not Bad
student.13 57 33 Work Harder
student.14 61 43 Work Harder
student.15 60 11 Work Harder
student.16 59 13 Work Harder
student.17 53 26 Work Harder

Here is one way to do this. The function takes the data frame as an argument:
remark_frequency <- function(df){
countTable <- as.data.frame(table(df$remark))
for(i in 1:nrow(countTable)){
printString <- paste("Number of students in",
countTable[i,1],
"category:",
countTable[i,2])
print(printString)
}
}

Related

rowsums accross specific row in a matrix

final.marks
# raj sanga rohan rahul
#physics 45 43 44 49
#chemistry 47 45 48 47
#total 92 88 92 96
This is the matrix I have. Now I want to find the total for each subject separately across respective subject rows and add them as a new column to the above matrix as the 5th column . However my code i.e class.marks.chemistry<- rowSums(final.marks[2,]) keeps producing an error saying
Error saying
rowSums(final.marks[2, ]) :
'x' must be an array of at least two dimensions
Can you please help me solve it. I am very new to R or any form of scripting or programming background.
Do you mean this?
# Sample data
df <- read.table(text =
" raj sanga rohan rahul
physics 45 43 44 49
chemistry 47 45 48 47
total 92 88 92 96", header = T)
# Add column total with row sum
df$total <- rowSums(df);
df;
# raj sanga rohan rahul total
#physics 45 43 44 49 181
#chemistry 47 45 48 47 187
#total 92 88 92 96 368
The above also works if df is a matrix instead of a data.frame.
If you look at ?rowSums you can see that the x argument needs to be
an array of two or more dimensions, containing numeric,
complex, integer or logical values, or a numeric data frame.
So in your case we must pass the entire data.frame (or matrix) as an argument, rather than a specific column (like you did).
Another option would be to use addmargins on a matrix
addmargins(as.matrix(df), 2)
# raj sanga rohan rahul Sum
#physics 45 43 44 49 181
#chemistry 47 45 48 47 187
#total 92 88 92 96 368

use dplyr mutate() in programming

I am trying to assign a column name to a variable using mutate.
df <-data.frame(x = sample(1:100, 50), y = rnorm(50))
new <- function(name){
df%>%mutate(name = ifelse(x <50, "small", "big"))
}
When I run
new(name = "newVar")
it doesn't work. I know mutate_() could help but I'm struggling in using it together with ifelse.
Any help would be appreciated.
Using dplyr 0.7.1 and its advances in NSE, you have to UQ the argument to mutate and then use := when assigning. There is lots of info on programming with dplyr and NSE here: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
I've changed the name of the function argument to myvar to avoid confusion. You could also use case_when from dplyr instead of ifelse if you have more categories to recode.
df <- data.frame(x = sample(1:100, 50), y = rnorm(50))
new <- function(myvar){
df %>% mutate(UQ(myvar) := ifelse(x < 50, "small", "big"))
}
new(myvar = "newVar")
This returns
x y newVar
1 37 1.82669 small
2 63 -0.04333 big
3 46 0.20748 small
4 93 0.94169 big
5 83 -0.15678 big
6 14 -1.43567 small
7 61 0.35173 big
8 26 -0.71826 small
9 21 1.09237 small
10 90 1.99185 big
11 60 -1.01408 big
12 70 0.87534 big
13 55 0.85325 big
14 38 1.70972 small
15 6 0.74836 small
16 23 -0.08528 small
17 27 2.02613 small
18 76 -0.45648 big
19 97 1.20124 big
20 99 -0.34930 big
21 74 1.77341 big
22 72 -0.32862 big
23 64 -0.07994 big
24 53 -0.40116 big
25 16 -0.70226 small
26 8 0.78965 small
27 34 0.01871 small
28 24 1.95154 small
29 82 -0.70616 big
30 77 -0.40387 big
31 43 -0.88383 small
32 88 -0.21862 big
33 45 0.53409 small
34 29 -2.29234 small
35 54 1.00730 big
36 22 -0.62636 small
37 100 0.75193 big
38 52 -0.41389 big
39 36 0.19817 small
40 89 -0.49224 big
41 81 -1.51998 big
42 18 0.57047 small
43 78 -0.44445 big
44 49 -0.08845 small
45 20 0.14014 small
46 32 0.48094 small
47 1 -0.12224 small
48 66 0.48769 big
49 11 -0.49005 small
50 87 -0.25517 big
Following the dlyr programming vignette, define your function as follows:
new <- function(name)
{
nn <- enquo(name) %>% quo_name()
df %>% mutate( !!nn := ifelse(x <50, "small", "big"))
}
enquo takes its expression argument and quotes it, followed by quo_name converting it into a string. Since nn is now quoted, we need to tell mutate not to quote it a second time. That's what !! is for. Finally, := is a helper operator to make it valid R code. Note that with this definition, you can simply pass newVar instead of "newVar" to your function, maintaining dplyr style.
> new( newVar ) %>% head
x y newVar
1 94 -1.07642088 big
2 85 0.68746266 big
3 80 0.02630903 big
4 74 0.18323506 big
5 86 0.85086915 big
6 38 0.41882858 small
Base R solution
df <-data.frame(x = sample(1:100, 50), y = rnorm(50))
new <- function(name){
df[,name]='s'
df[,name][df$x>50]='b'
return(df)
}
I am using dplyr 0.5 so i just combine base R with mutate
new <- function(Name){
df=mutate(df,ifelse(x <50, "small", "big"))
names(df)[3]=Name
return(df)
}
new("newVar")

assign objects to dynamic lists in r

I have a nested loops which produce outputs that I want to store in list objects with dynamic names. A toy example of this would look as follows:
set.seed(8020)
names<-sample(LETTERS,5,replace = F)
for(n in names)
{
#Create the list
assign(paste0("examples_",n),list())
#Poulate the list
get(paste0("examples_",n))[[1]]<-sample(100,10)
get(paste0("examples_",n))[[2]]<-sample(100,10)
get(paste0("examples_",n))[[3]]<-sample(100,10)
}
Unfortunately I keep getting the error:
Error in get(paste0("examples_", n))[[1]] <- sample(100, 10) :
target of assignment expands to non-language object
I have tried all kind of assign, eval, get type of functions to parse the object, but haven't had any luck
Expanding on my comment with a worked example:
examples <- vector(mode="list", length=length(names) )
names(examples) <- names # please change that to mynames
# or almost anything other than `names`
examples <- lapply( examples, function(L) {L[[1]] <- sample(100,10)
L[[2]] <- sample(100,10)
L[[3]] <- sample(100,10); L} )
# Top of the output:
> examples
$P
$P[[1]]
[1] 34 49 6 55 19 28 72 42 14 92
$P[[2]]
[1] 97 71 63 59 66 50 27 45 76 58
$P[[3]]
[1] 94 39 77 44 73 15 51 78 97 53
$F
$F[[1]]
[1] 12 21 89 26 16 93 4 13 62 45
$F[[2]]
[1] 83 21 68 74 32 86 52 49 16 13
$F[[3]]
[1] 14 45 40 46 64 85 88 28 53 42
This mode of programming does become more natural over time. It gets you out of writing clunky for-loops all the time. Develop your algorithms for a single list-node at a time and then use sapply or lapply to iterate the processing.

Display values on heatmap in R

I am working on a heatmap using heatmap.2 and would like to know if there is anyway to display the values on all heatmap positions. For example for the area representing "1" and rating I would like to display value "43", for "2" and privileges the value 51 and so on.
My sample data is as follows:
rating complaints privileges learning raises critical advance
1 43 51 30 39 61 92 45
2 63 64 51 54 63 73 47
3 71 70 68 69 76 86 48
4 61 63 45 47 54 84 35
Is this what you mean? By providing the data object as the cellnote argument, the values are printed in the heatmap.
heatmap.2(data, # cell labeling
cellnote=data,
notecex=1.0,
notecol="cyan",
na.color=par("bg"))
The answer is just for "For Cell labeling is there anyway not to display values that are 0".
cellnote=ifelse(data==0, NA, data) will work as you want.
In python when using seaborn.heatmap by simply using annot=True, all the values are displayed in the heatmap plot. See the example below:

Transpose with multiple variables and more than one metrics in R

I'm previously a SAS user - since I don't have SAS anymore I need to learn to use R for work.
The dataset has the following column:
market date sitename impression clicks
I want to transpose it into:
market date sitename-impression sitename-clicks
I think in SAS I used to do:
Proc Transpose
by market date;
id sitename;
var impression clicks;
run;
I do have a book on R and googled a lot, but couldn't find the solution that works...
Would really appreciate if anyone can help.
Thanks in advance!!!
Let me start by saying welcome to stackoverflow. Glad to have anew user. When you ask a question it's helpful and encouraged for you to provide the code you're using and a reproducible data set that looks like the original. This is called a minimal reproducible example. To get a data set into here you can use several options, here are two: use dput() around the object name and cut and paste what is displayed in the console or just post the dataframe directly. For the code provide all the code necessary to replicate your problem. I hope you find this helpful for future questions you'll ask.
I may not fully understand but I think you want to transform, not transpose, the data.
dat <- data.frame(market=rnorm(10), date=rnorm(10), #let's create a data set
sitename=rnorm(10), impression=rnorm(10), clicks=rnorm(10))
dat #look at it (I pasted it below)
# > dat
# market date sitename impression clicks
# 1 -0.9593797 -0.08411994 1.6079129 -0.5204772 -0.31633966
# 2 -0.5088689 1.78799500 -0.2469315 1.3476964 -0.04344779
# 3 -0.1527465 0.81673996 1.7824969 -1.5531260 -1.28304384
# 4 -0.7026194 0.52072913 -0.1174356 0.5722210 -1.20474443
# 5 -0.4537490 -0.69139062 1.1124277 -0.2452974 -0.33025320
# 6 0.7466588 0.36318337 -0.4623319 -0.9036768 -0.65754302
# 7 0.8007612 2.59588554 0.1820732 0.4318629 -0.36308748
# 8 1.0781715 -1.01512734 0.2297475 0.9219439 -1.15687902
# 9 0.3731450 -0.19004572 0.5190749 -1.4020371 -0.97370295
# 10 0.7724259 1.76528303 0.5781786 -0.5490849 -0.83819036
#now to create the new columns (I think this is what you want)
#the easiest way is to use transform. ?tranform for more
dat.new <- transform(dat, sitename.clicks=sitename-clicks,
impression.clicks=impression-clicks)
dat.new #here's the new data set. Notice it has the new and old columns.
#To get rid of the old columns you can use indexing and specify the columns you want.
dat.new[, c(1:2, 6:7)]
#We could have also done:
dat.new[, c(1,2,6,7)]
#or said the columns not wanted with negative indexing:
dat.new[, -c(3:5)]
EDIT In looking at Brian's comments and the variables I would think that a long to wide transformation is what the poster desires. I would likely approach it using Wickham's reshape2 package as well, as this method is easier for me to work with and I imagine it would be easier for an R beginner as well. However, here is a base way to do the long to wide format using the same data set Brian provided:
wide <- reshape(DF, v.names=c("impression", "clicks"), idvar=c("market", "date"),
timevar="sitename", direction="wide")
reshape(wide)
The reshape function is very flexible but takes some getting used to to use appropriately. I'm leaving my previous response up as well to keep the history of this post though I now believe this is not the posters intent. It serves as a reminder that a reproducible example is very helpful in providing clarity to your query.
Example data, as Tyler said, is important. I interpreted your question differently because I thought your data was different. I didn't take the - as a literal subtraction of numerics, but a combination of variables.
DF <- expand.grid(market = LETTERS[1:5],
date = Sys.Date()+(0:5),
sitename = letters[1:2])
n <- nrow(DF)
DF$impression <- sample(100, n, replace=TRUE)
DF$clicks <- sample(100, n, replace=TRUE)
I find the reshape2 package useful for these sort of transpositions/transformations/rearrangements.
library("reshape2")
dcast(melt(DF, id.vars=c("market","date","sitename")),
market+date~sitename+variable)
gives
market date a_impression a_clicks b_impression b_clicks
1 A 2012-02-28 74 97 11 71
2 A 2012-02-29 34 30 88 35
3 A 2012-03-01 40 85 40 49
4 A 2012-03-02 46 12 99 20
5 A 2012-03-03 6 95 85 56
6 A 2012-03-04 61 61 42 64
7 B 2012-02-28 4 53 74 9
8 B 2012-02-29 43 27 92 59
9 B 2012-03-01 34 26 86 43
10 B 2012-03-02 81 47 84 35
11 B 2012-03-03 3 5 91 48
12 B 2012-03-04 19 26 99 21
13 C 2012-02-28 22 31 100 53
14 C 2012-02-29 40 83 95 27
15 C 2012-03-01 78 89 81 29
16 C 2012-03-02 57 55 79 87
17 C 2012-03-03 37 61 3 97
18 C 2012-03-04 83 61 41 77
19 D 2012-02-28 81 18 47 3
20 D 2012-02-29 90 100 17 83
21 D 2012-03-01 12 40 35 93
22 D 2012-03-02 85 14 63 67
23 D 2012-03-03 63 53 29 58
24 D 2012-03-04 40 79 56 70
25 E 2012-02-28 97 62 68 31
26 E 2012-02-29 24 84 17 63
27 E 2012-03-01 94 93 32 2
28 E 2012-03-02 6 26 86 26
29 E 2012-03-03 100 34 37 80
30 E 2012-03-04 89 87 72 11
The column names have a _ between them rather than a -, but you can change that if you want. I wouldn't recommend it, though, because then you will have problems later referencing the column since the - will be taken as subtraction (you would need to quote the name).

Resources