R Function - Count Non Positives - r

I'm a beginner in R. Please help me write the function described below. Thanks!
Create a function named CountNonpositives that takes a numeric dataframe as its only input parameter. This function should return a dataframe with one row for each column of the input dataframe. This output dataframe should have two columns, one giving the name of each column of the input dataframe and the other giving the number of observations of that variable which are not positive.
Note: missing values, if any, must be included in the nonpositive count.

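sapply does most of the work here. Note that the snippet below counts only strictly negative, non-missing values per column; for the nonpositive count the question asks for (zeros and missing values included), change the condition to is.na(x) | x <= 0. I trust you can wrap it in a function to your specification.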
d <- data.frame(
x = c(sample(-10:10, 10, replace = TRUE),NA),
y = c(sample(-10:10, 10, replace = TRUE),NA),
z = c(sample(-10:10, 10, replace = TRUE),NA)
)
sapply(d, function(x) sum(x<0 & !is.na(x)) )
Output:
> d
x y z
1 5 10 2
2 9 -2 -2
3 -9 10 -2
4 -1 0 0
5 2 -9 7
6 -5 7 -3
7 1 -7 10
8 -10 5 -8
9 8 6 -9
10 -8 10 -4
11 NA NA NA
> sapply(d, function(x) sum(x<0 & !is.na(x)) )
x y z
5 3 6
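A minimal sketch of the function the question asks for, assuming the input has only numeric columns. The output column names (variable, nonpositive) and the small test data frame are assumptions made for illustration. Unlike the sapply call above, the condition here counts zeros and missing values as nonpositive.

```r
# Sketch only: the output column names are assumed, not given in the question.
CountNonpositives <- function(df) {
  # is.na(col) | col <= 0 is TRUE for NA, zero, and negative values
  counts <- vapply(df, function(col) sum(is.na(col) | col <= 0), integer(1))
  data.frame(variable = names(df),
             nonpositive = unname(counts),
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# Hypothetical example data
d_small <- data.frame(x = c(5, -9, 0, NA), y = c(1, 2, 3, 4))
result <- CountNonpositives(d_small)
print(result)  # x has 3 nonpositive values (-9, 0, NA), y has 0
```

vapply is used instead of sapply so that the result is always an integer vector, even for a data frame with no columns.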

Related

Ifelse statement inside for loop?

I have a data frame in which I need to change all negative values to positive and then multiply the changed values by 100, i.e. multiply all negative values by -100. I MUST use a for loop and if or ifelse.
My data frame: df <- data.frame(x = factor(c("a","b","c","d","e")), y = seq(-4, 4, by = 2), z = c(3,4,-5,6,-8))
x y z
1 a -4 3
2 b -2 4
3 c 0 -5
4 d 2 6
5 e 4 -8
So far I have succeeded in changing two of the negative values, but for some reason the other two didn't change.
Here is the code:
for(i in 2:length(df)){
value <- df[[i]][i]
if(value < 0){
df[[i]][i] = value* -100
}
}
The result
x y z
1 a -4 3
2 b 200 4
3 c 0 500
4 d 2 6
5 e 4 -8
As you can see, the two negative values at [2,2] and [3,3] have been multiplied by -100 but the other two have not. Can anyone help me understand why this happened?
Thanks!
Take a look at the simplicity of applying two vectorized functions, abs and *, to the last two columns:
dfrm <- read.table(text="x y z
1 a -4 3
2 b -2 4
3 c 0 -5
4 d 2 6
5 e 4 -8",head=TRUE, colClasses=c( "character", "factor", "numeric", "numeric") )
# I have no idea what your statement "z is c()" means
dfrm[-1] <- abs(dfrm[-1])*100
> dfrm
x y z
1 a 400 300
2 b 200 400
3 c 0 500
4 d 200 600
5 e 400 800
To answer some of your specific questions:
The if statement is not being ignored in:
if(i < 0) {
df <- df[ ,i]*-100
{
}
The loop index i will never be < 0, so the expression i < 0 will always be FALSE and the body never runs. (That conditional really makes no sense.)
If you want to make an assignment only in those rows where the value was negative, you could assign with a logical index.
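A minimal sketch of that logical-index assignment, using the data frame from the question. The loop in the question only ever touched df[[i]][i], that is, element i of column i, which is why only the cells at [2,2] and [3,3] changed. The names fixed and looped are introduced here for illustration; the second half shows one way to meet the "for loop plus ifelse" requirement by processing a whole column per iteration.

```r
df <- data.frame(x = factor(c("a", "b", "c", "d", "e")),
                 y = seq(-4, 4, by = 2),
                 z = c(3, 4, -5, 6, -8))

# Logical index: only the negative entries of z are assigned to.
fixed <- df
neg <- fixed$z < 0
fixed$z[neg] <- fixed$z[neg] * -100

# Loop variant: iterate over the numeric columns (2 and 3) and let
# ifelse() handle every row of the column at once.
looped <- df
for (i in 2:length(looped)) {
  looped[[i]] <- ifelse(looped[[i]] < 0, looped[[i]] * -100, looped[[i]])
}
print(looped)  # y becomes 400 200 0 2 4; z becomes 3 4 500 6 800
```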

R - Find the sum for the lagging record and add another column to current value iteratively

So my dataframe is structured like so:
x s
NA 0
13 0
-3 0
2 0
-4 0
For each row, I would like to take lag(s), add it to column x, and store the result as the new value of s.
My output data would therefore look like:
x s
NA 0
13 13
-3 10
2 12
-4 8
I tried the following function, but after fiddling I was only able to get all NA's or all 0's:
mydata$s = lag(mydata$s)+mydata$x
Note - if it helps, I can remove the first row.
You can use cumsum() to perform the job, and also replace NA with 0 during the calculation (without changing your original dataset).
library(tidyverse)
df %>% mutate(s = cumsum(ifelse(is.na(x), 0, x)))
x s
1 NA 0
2 13 13
3 -3 10
4 2 12
5 -4 8
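It works for me, assuming lag() here is dplyr::lag() (for example after library(dplyr)); stats::lag() does not shift the values of a plain vector.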
Set up:
mydata <- data.frame(x = c(NA, 13, -3, 2, -4), s = c(0, 13, 10, 12, 8) )
mydata$s <- lag(mydata$s)+mydata$x
Gives:
mydata
x s
1 NA NA
2 13 13
3 -3 10
4 2 12
5 -4 8
The difference is my first s is NA. That should be expected as the first x is NA.
Base R solution:
mydata$s <- c(mydata$x[1], cumsum(mydata$x[-1]))
Data:
mydata <- data.frame(x = c(NA, 13, -3, 2, -4))
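If the first s should be 0 rather than NA, as in the desired output in the question, a base R variant can treat missing x values as 0 before accumulating. This is a small sketch under that assumption; x0 is a helper name introduced here.

```r
mydata <- data.frame(x = c(NA, 13, -3, 2, -4))

# Replace NA with 0 in a temporary copy, then take the running sum.
x0 <- replace(mydata$x, is.na(mydata$x), 0)
mydata$s <- cumsum(x0)

print(mydata)  # s is 0 13 10 12 8; x itself is left unchanged
```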

Using Diff() in R for multiple columns

I would like to calculate the first order difference for many columns in a data frame without naming them explicitly. It works well with one column with this code:
set.seed(1)
Data <- data.frame(
X = sample(1:10),
Y = sample(1:10),
Z = sample(1:10))
Newdata <- as.data.frame(diff(Data$X, lag = 1))
How do I calculate the same for a lot of columns, e.g. [2:200], in a data frame?
I think this does what you want:
as.data.frame(lapply(Data, diff, lag=1))
## X Y Z
## 1 1 -1 -8
## 2 1 4 4
## 3 2 4 -5
## 4 -5 -5 8
## 5 6 2 -1
## 6 1 1 -1
## 7 -3 -4 -2
## 8 4 -3 -2
## 9 -9 8 1
Since data frames are internally lists, we can lapply over the columns. You can use Data[1:2] instead of Data to just do the first two columns, or any valid column indexing.
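As a sketch of the column-indexing idea, the same lapply call can be restricted to the numeric columns so that non-numeric columns do not cause an error in diff. The data frame below and the names num_cols and Diffs are made up for illustration.

```r
# Hypothetical data with one non-numeric column
Data2 <- data.frame(X = c(1, 4, 9, 16),
                    Y = c(2, 2, 5, 3),
                    label = c("a", "b", "c", "d"))

# Pick the numeric columns, then difference each of them.
num_cols <- vapply(Data2, is.numeric, logical(1))
Diffs <- as.data.frame(lapply(Data2[num_cols], diff, lag = 1))

print(Diffs)  # X: 3 5 7, Y: 0 3 -2 (one row fewer than Data2)
```

A positional selection such as Data2[1:2] would work the same way when the column positions are known in advance.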

For Loop Function in R

I have been struggling to figure out why I am not returning the correct values to my data frame from my function. I want to loop through a vector of my data frame and create a new column by a calculation within the vector's elements. Here's what I have:
# x will be the data frame's vector
y <- function(x){
new <- c()
for (i in x){
new <- c(new, x[i] - x[i+1])
}
return (new)
}
So here I want to create a new vector that returns the next element subtracted from current element. Now, when I apply it to my data frame
df$new <- lapply(df$I, y)
I get all NAs. I know I'm missing something completely obvious...
Also, how would I execute the function that resets itself if df$ID changes so I am not subtracting elements from two different df$IDs? For example, my data frame will have
ID I Order new
1001 5 1 1
1001 6 2 -2
1001 4 3 -2
1001 2 4 NA
1005 2 1 6
1005 8 2 0
1005 8 3 -2
1005 6 4 NA
Thanks!
Avoid the loop and use diff. Everything is vectorized here so it's easy.
df$new <- c(diff(df$I), NA)
But I don't understand your example result. Why are some 0 values changed to NA and some are not? And shouldn't 8-2 be 6 and not -6? I think that needs to be clarified.
If the 0 values need to be changed to NA, just do the following after the above code.
df$new[df$new == 0] <- NA
A one-liner of the complete process, that returns the new data frame, can be
within(df, { new <- c(diff(I), NA); new[new == 0] <- NA })
Update : With respect to your comments below, my updated answer follows.
> M <- do.call(rbind, Map(function(x) { x$z <- c(diff(x$I), NA); x },
split(dat, dat$ID)))
> rownames(M) <- NULL
> M
ID I Order z
1 1001 5 1 1
2 1001 6 2 -2
3 1001 4 3 -2
4 1001 2 4 NA
5 1005 2 1 6
6 1005 8 2 0
7 1005 8 3 -2
8 1005 6 4 NA
The dplyr library makes it very easy to do things separately for each level of a grouping variable, in your case ID. We can use diff as @Richard Scriven recommends, and use dplyr::mutate to add a new column.
> library(dplyr)
> df %>% group_by(ID) %>% mutate(new2 = c(diff(I), NA))
Source: local data frame [8 x 5]
Groups: ID
ID I Order new new2
1 1001 5 1 1 1
2 1001 6 2 -2 -2
3 1001 4 3 -2 -2
4 1001 2 4 NA NA
5 1005 2 1 6 6
6 1005 8 2 0 0
7 1005 8 3 -2 -2
8 1005 6 4 NA NA
Rather than a loop, you would be better off using a vector version of the math. The exact indices will depend on what you want to do with the last value... (Note this line is not placed into your for loop, but just gives the result.)
df$new = c(df$I[-1],NA) - df$I
Here you subtract the original df$I from a shifted copy that omits the first value ([-1]) and appends an NA at the end.
EDIT per comments: If you don't want to subtract across df$ID, you can blank out that subset of cells after subtraction:
df$new[df$ID != c(df$ID[-1],NA)] = NA
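For completeness, a sketch of why the loop in the question produced NAs and how a loop version could work. In the question, for (i in x) iterates over the values of x rather than over positions, and lapply(df$I, y) calls y once per element, so each call sees a length-one vector and indexes past its end. The function name next_minus_current is introduced here; ave() from base R applies it separately within each ID.

```r
df <- data.frame(ID = rep(c(1001, 1005), each = 4),
                 I = c(5, 6, 4, 2, 2, 8, 8, 6),
                 Order = rep(1:4, 2))

# Loop over positions 1..(n - 1); the last element has no successor
# and stays NA.
next_minus_current <- function(x) {
  out <- rep(NA_real_, length(x))
  for (i in seq_len(max(length(x) - 1, 0))) {
    out[i] <- x[i + 1] - x[i]
  }
  out
}

# ave() runs the function once per ID group and keeps the row order.
df$new <- ave(df$I, df$ID, FUN = next_minus_current)
print(df)  # new is 1 -2 -2 NA 6 0 -2 NA
```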

Subtracting small data.frame from large data.frame by grouped a variable

I have got a very large data set
mdf <- data.frame (sn = 1:40, var = rep(1:10, 4), block = rep(1:4, each = 10),
yld = c(1:40))
I have small data set
blockdf <- data.frame(block = 1:4, yld = c(10, 20, 30, 40)) # block means
All variables in both datasets except yld are factors.
I want to subtract the block means (blockdf$yld) from mdf$yld, so that each row of mdf is adjusted by the mean of its own block.
For example:
10 will be subtracted from the yld of every var in the first block of mdf
20 will be subtracted in the second block
and so on
Please note that I might sometimes have an unbalanced number of var within the blocks, so I want to write it in a way that handles the unbalanced case.
This should do the trick
block_match <- match(mdf$block, blockdf$block)
transform(mdf, yld = yld - blockdf[block_match, 'yld'])
This should work
newdf <- merge(x=mdf, y=blockdf, by="block", suffixes = c("",".blockmean"))
newdf$newvr <- newdf$yld-newdf$yld.blockmean
print(newdf, row.names=FALSE)
block sn var yld yld.blockmean newvr
1 1 1 1 10 -9
1 2 2 2 10 -8
1 3 3 3 10 -7
1 4 4 4 10 -6
1 5 5 5 10 -5
1 6 6 6 10 -4
1 7 7 7 10 -3
1 8 8 8 10 -2
1 9 9 9 10 -1
1 10 10 10 10 0
2 11 1 11 20 -9
2 12 2 12 20 -8
...........................
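A small sketch showing that the match() approach does not depend on the blocks being balanced: each row looks up its own block mean, however many rows a block has. The data below are made up for illustration, with three rows in block 1 and one row each in blocks 2 and 3, and yld_adj is an assumed column name.

```r
# Hypothetical unbalanced data
mdf_small <- data.frame(sn = 1:5,
                        block = c(1, 1, 1, 2, 3),
                        yld = c(12, 15, 11, 25, 31))
blockdf <- data.frame(block = 1:4, yld = c(10, 20, 30, 40))

# For every row of mdf_small, find the matching row of blockdf.
idx <- match(mdf_small$block, blockdf$block)
mdf_small$yld_adj <- mdf_small$yld - blockdf$yld[idx]

print(mdf_small)  # yld_adj is 2 5 1 5 1
```

A block that is missing from blockdf would give NA in idx and therefore NA in yld_adj, which makes unmatched rows easy to spot.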
