Using rowSum and subset to clean data [closed] - r

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I am having trouble subsetting a large data frame. I have 5,000 observations and 60+ columns. I want to subset based on about 30 columns: essentially, to drop any observations where the sum of the values in these columns of interest is 0. A small sample is below; I would want to get rid of UIDs #1, #3, and #4, which are all zeros.
UID 236.1(b) 261.5(c) 261.5(d)
1 0 0 0
2 2 3 0
3 0 0 0
4 0 0 0
I have tried the following code:
sub <- subset(df, rowSums(df[, 29:60]>0))
which generated the following error:
Error in subset.data.frame(merge_charge, rowSums(merge_charge[, 29:60] > : 'subset' must be logical
and:
test <- subset(rowSums(df[,29:60]>0))
which generated the following error:
Error in subset.default(rowSums(merge_charge[, 29:60] > 0)) :
argument "subset" is missing, with no default
Any suggestions or pointers would be most appreciated.

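First, take a look at the documentation for subset().
Its basic form is:
subset(data, condition)
where condition must be a logical vector. Your second attempt leaves out the data argument, which is why it complains that "subset" is missing.
Second, the closing parenthesis of rowSums() is in the wrong place. rowSums(df[, 29:60] > 0) counts the positive entries in each row and returns numbers, not TRUE/FALSE values. The comparison belongs outside the call: rowSums(df[, 29:60]) > 0
Putting it together: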
test <- subset(your_data, rowSums(your_data[,29:60])>0 )
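As a minimal sketch of the fix above, here is the same idea on a toy version of the sample data. The data frame name charges and the column names a, b, c are made up for illustration; the real data would use columns 29:60 instead.

```r
# Toy data modelled on the sample in the question (names are hypothetical)
charges <- data.frame(
  UID = 1:4,
  a = c(0, 2, 0, 0),
  b = c(0, 3, 0, 0),
  c = c(0, 0, 0, 0)
)
cols <- c("a", "b", "c")

# rowSums() returns one number per row; comparing it with 0 yields
# the logical vector that subset() needs
kept <- subset(charges, rowSums(charges[, cols]) > 0)

# Equivalent base indexing without subset()
kept_idx <- charges[rowSums(charges[, cols]) > 0, ]

print(kept)
```

If the columns can contain NA, rowSums(..., na.rm = TRUE) may be needed, and if they can contain negative values, testing rowSums(charges[, cols] != 0) > 0 is probably the safer condition.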

Related

Why if_else function does not work in other data set [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 3 months ago.
I am using if_else() to create a new outcome vector from 4 columns of data.
The command is as follows:
payment_amt <- if_else( interest_rate>0,
(balance-(balance*amortisation_factor)/(1+(interest_rate/12))^tenor)*((interest_rate/12)/(1-((1+(interest_rate/12))^(-1*tenor)))),
0 )
This command works well on one of my data sets
but does not work on the other.
I searched online but could not understand why the command fails for the second data set.
I would very much appreciate any help!
I attach my code and the data_work and data_not_work sets for reference:
# Data Work _ test
tenor = data_work[,"ECL_TENOR"]
interest_rate = data_work[,"INTEREST_RATE"]
amortisation_factor = data_work[,"AMORTISATION_FACTOR"]
balance = data_work[,"ECL_BALANCE"]
payment_amt <- if_else( interest_rate>0,
(balance-(balance*amortisation_factor)/(1+(interest_rate/12))^tenor)*((interest_rate/12)/(1-((1+(interest_rate/12))^(-1*tenor)))),
0 )
payment_amt
#####################################################
# Data Not work _ Test
tenor = data_not_work[,"ECL_TENOR"]
interest_rate = data_not_work[,"INTEREST_RATE"]
amortisation_factor = data_not_work[,"AMORTISATION_FACTOR"]
balance = data_not_work[,"ECL_BALANCE"]
payment_amt <- if_else( interest_rate>0,
(balance-(balance*amortisation_factor)/(1+(interest_rate/12))^tenor)*((interest_rate/12)/(1-((1+(interest_rate/12))^(-1*tenor)))),
0 )
Here is data
After posting this question, I found out that during the merging process the data_not_work set had been silently converted to a tibble, which is why if_else did not work. When I converted it back to a data frame, if_else worked.
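The likely mechanism is that single-bracket indexing such as data[, "INTEREST_RATE"] drops to a plain vector on a base data frame but keeps a one-column tibble on a tibble, so if_else no longer receives a vector. A hedged sketch of a way around this without converting back is to extract columns with [[ ]], which returns a vector in both cases. The data frame loans and its values are invented for illustration, and the tibble part only runs if the tibble package is installed.

```r
# Hypothetical toy data; column names follow the question
loans <- data.frame(
  ECL_BALANCE = c(1000, 2000),
  INTEREST_RATE = c(0.06, 0)
)

# [[ ]] always returns the column itself as a vector
rate <- loans[["INTEREST_RATE"]]
print(is.numeric(rate))

if (requireNamespace("tibble", quietly = TRUE)) {
  loans_tbl <- tibble::as_tibble(loans)
  single_bracket <- loans_tbl[, "INTEREST_RATE"]  # still a one-column tibble
  double_bracket <- loans_tbl[["INTEREST_RATE"]]  # plain numeric vector
  print(is.data.frame(single_bracket))
  print(is.numeric(double_bracket))
}
```

Using [[ ]] (or $) for column extraction keeps the rest of the calculation unchanged whichever class the merge step returns.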

Dividing all integers of a column in R [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 3 years ago.
I am trying to divide all integers in a column by another integer. I have a database with a column whose values go above 1*10^20. Because of this, my plots are way too big. I need to normalize the data to get a better understanding of what is going on. For example, the data that I have:
[x][Day] [Amount]
[1] 1 1 23440100
[2] 2 2 41231020
[3] 3 3 32012010
I am using a data.frame for my own data, so here you have the data frame for the data above
x <- c(1,2,3)
day <- c(1,2,3)
Amount <- c(23440100, 41231020, 32012010)
my.data <- data.frame(x, day, Amount)
I tried using another answer, provided here, but that doesn't seem to work.
The code that I tried:
test <- my.data[, 3]/1000
Hope someone can help me out! Cheers, Chester
I guess you are looking for this?
my.data$Amount <- my.data$Amount/1000
such that
> my.data
x day Amount
1 1 1 23440.10
2 2 2 41231.02
3 3 3 32012.01
Use mutate from dplyr
Since you're using a data.frame, you can use this simple code:
library(dplyr)
mutated.data <- my.data %>%
mutate(Amount = as.integer(Amount / 1000))
> mutated.data
x day Amount
1 1 1 23440
2 2 2 41231
3 3 3 32012
Note that as.integer() drops the decimal part; leave it out if you want to keep the decimals.
Hope this helps.
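For comparison, a small base R sketch of the same division, using the data frame from the question. It keeps the decimals in one result and truncates in another, so the effect of as.integer() is visible side by side. The names scaled and truncated are introduced here only for illustration.

```r
x <- c(1, 2, 3)
day <- c(1, 2, 3)
Amount <- c(23440100, 41231020, 32012010)
my.data <- data.frame(x, day, Amount)

# Division is vectorised, so the whole column is scaled at once
scaled <- transform(my.data, Amount = Amount / 1000)

# as.integer() truncates toward zero instead of rounding
truncated <- as.integer(my.data$Amount / 1000)

print(scaled)
print(truncated)
```

Be aware that as.integer() returns NA with a warning for values beyond the 32-bit integer range (about 2.1e9), so for values near 1e20 it is probably better to stay with doubles or use round().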

model.matrix doesn't work for dataframe with multiple rows [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 4 years ago.
I have a very simple data frame that I wish to turn into a matrix.
df <- data.frame(age=c(50, 60), sex=factor(c('M', 'F')))
However, when I try to run model.matrix it fails:
model.matrix(1 ~ age + sex, df)
Error in model.frame.default(object, data, xlev = xlev) :
variable lengths differ (found for 'age')
However, if I run a row at a time it's fine.
model.matrix(1 ~ age + sex, df[1, ])
(Intercept) age sexM
1 1 50 1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"
I've got what I want working with an lapply over the rows and do.call('rbind', ...) to join it back together, but I must be missing something, right?
The 1 on the left-hand side is your problem: it is treated as a response of length 1, which does not match the two rows of df. You can use model.matrix(~ age + sex, df) if you don't want to specify a response.
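A short sketch of the one-sided formula on the data frame from the question; the name mm is introduced here for illustration.

```r
df <- data.frame(age = c(50, 60), sex = factor(c("M", "F")))

# One-sided formula: no response, so there is nothing whose length
# has to match the number of rows
mm <- model.matrix(~ age + sex, df)

print(dim(mm))
print(colnames(mm))
```

The result has one row per row of df, so the lapply and rbind workaround should not be needed.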

Error in dataframe in R when trying to recode a variable [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
I am working with NLS data and trying to recode the gender variable. I named the variable female from the beginning, and now I want to recode the following values:
1 Male
2 Female
0 No Information
My code:
nlsy$female[ nlsy$female == 1 ] <- 0
nlsy$female[ nlsy$female == 2 ] <- 1
However, I get the following error from R:
Error in `$<-.data.frame`(`*tmp*`, "female", value = numeric(0)) : replacement has 0 rows, data has 7120
Any suggestions?
What I would check:
That the data.frame nlsy is not empty, that is, it does not have 0 rows.
That nlsy actually has a column named 'female' (a missing column would explain a replacement with 0 rows).
The class of the column: is it integer, character, or something else?
After all the checks:
nlsy$female[which(nlsy$female == 2)] <- 1
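The checks above can be sketched as follows. The data frame nlsy_demo and its source column gender are hypothetical stand-ins, since the real column name in the NLS file is not shown. Building the new column from a copy of the original codes also avoids the recoded 0 for Male colliding with the existing 0 for "No Information".

```r
# Hypothetical stand-in for the real data
nlsy_demo <- data.frame(gender = c(1, 2, 0, 2, 1))

# A column that does not exist comes back as NULL, which is what
# leads to a zero-length replacement
missing_col <- nlsy_demo[["no_such_column"]]
print(is.null(missing_col))

# Basic sanity checks before recoding
stopifnot(nrow(nlsy_demo) > 0, "gender" %in% names(nlsy_demo))
print(class(nlsy_demo$gender))

# Recode from the original codes: 2 -> 1 (female), 1 -> 0 (male),
# 0 (no information) -> NA
codes <- nlsy_demo$gender
nlsy_demo$female <- ifelse(codes == 2, 1, ifelse(codes == 1, 0, NA))

print(nlsy_demo)
```

Mapping "No Information" to NA is an assumption made for this sketch; keep a separate code if the analysis needs to distinguish it.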

How to use hist() function with count data as input [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
What are the requirements for using the hist() function in R?
I have two columns of data, which looks something like this:
1 8764
2 604
3 150
4 50
5 21
6 7
7 2
8 5
10 3
11 2
12 1
14 1
16 1
17 2
18 3
20 1
23 1
24 1
25 1
28 1
29 1
When I put that into a data frame in R, and try to plot that using the hist() function, it gives me an error "x: must be numerical". How do I go about solving this?
I'm trying to get the first column on the x-axis and the second column on the y-axis.
Pardon if the question sounds stupid, it's my first time using R.
You can do it like this if you absolutely want to use the hist function:
hist(rep(df[[1]], df[[2]]))
df being your data.frame (assuming, as Roland said in the comments, that the first column holds your values and the second column your frequency counts).
Edit: it appears that your data.frame only has one column. In this case this will work:
hist(rep(seq_along(df[[1]]), df[[1]]))
If there are NA values, do this first:
df <- na.omit(df)
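If your data is a data.frame called df, note that hist() expects a numeric vector rather than a whole data frame, so pass a single column, e.g. hist(df[[1]]). It plots by default, or you can plot the returned object with plot(hist(df[[1]])).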
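To make the rep() approach concrete, here is a minimal sketch on a shortened, made-up version of the value/count table. The names counts_df, value, and n are assumptions for illustration. The histogram is computed with plot = FALSE so the bin counts can be inspected; calling plot(h) afterwards draws it.

```r
# Shortened, hypothetical value/count table
counts_df <- data.frame(
  value = c(1, 2, 3, 4),
  n = c(5, 3, 2, 1)
)

# Expand the counts back into raw observations: each value is
# repeated n times
values <- rep(counts_df$value, counts_df$n)

# One bin per integer value; plot = FALSE only computes the histogram
h <- hist(values, breaks = seq(0.5, 4.5, by = 1), plot = FALSE)

print(h$counts)
```

Because the data are already counts, barplot(counts_df$n, names.arg = counts_df$value) may be a more direct way to get the first column on the x-axis and the second on the y-axis.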
