Subset data in R to look at a specific country [closed] - r

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I am currently using the EVS data, a panel dataset covering 1981-2021. It has 223099 observations and 635 variables. One of the variables of interest is the country. I am trying to subset the data frame to look at only one country while keeping all the other variables, but I am not sure how to do it.
I ran this code:
data <- subset(data, COW_NUM == "339")
where COW_NUM is the country number and 339 is the country of interest.
I keep getting an error message, and I don't know what I am doing wrong.

Can you provide a link to the data?
my_data: your data frame
my_data$COW_NUM: the column you want to filter on
In R, my_data$COW_NUM == "339" (is "339" really a string?) returns a vector of TRUE/FALSE values, one per row (223099 of them, if I understand correctly): TRUE where the value of my_data$COW_NUM equals 339 and FALSE where it does not.
You can then use this vector inside [ to subset my_data:
my_data[my_data$COW_NUM == "339",] keeps every row of my_data where my_data$COW_NUM == "339" is TRUE and discards the rows where it is FALSE.
The last step should be:
my_data_339 <- my_data[my_data$COW_NUM == "339",]
Hope it helps, but it is hard to be sure without the data!
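Since the real EVS file is not available here, the following is only a minimal sketch of the same logical-indexing idea on a made-up data frame. The column value and the numbers in it are assumptions for illustration, and which() is added as one possible way to keep rows with a missing country code from showing up as all-NA rows.

```r
# Toy stand-in for the real data (values are invented for illustration)
my_data <- data.frame(
  COW_NUM = c(339, 200, 339, NA, 220),
  value   = c(10, 20, 30, 40, 50)
)

# One TRUE/FALSE/NA per row
keep <- my_data$COW_NUM == 339

# which() keeps only the positions that are TRUE, dropping the NA comparison
my_data_339 <- my_data[which(keep), ]
print(my_data_339)
```

All columns are kept because the column position after the comma is left empty.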

Related

Errors in Executing While loop [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I am trying to read random data from a dataset in a while loop, but I'm getting errors. Can anyone here help me?
How do I calculate the percentage of points in the sample that are greater than 100?
I tried the following method:
dataset = 1:100
i=0
while(dataset[i] > condition) #compare every value in dataset
{
percent_age= dataset[i] + percent_age
i=i+1
if(i=100)
{break}
}
But it gives me only errors.
The while condition is evaluated before anything in the body. The first time it is evaluated, i is 0, so dataset[i] is dataset[0], which is an empty object (a vector of length 0). You also have not defined condition in the code that you gave us. So while is looking for a single logical value, but you are giving it the result of comparing a zero-length vector to an undefined value. That is going to give at least one error.
You can fix that by starting i at 1 and defining condition before the while.
In your if statement you have i=100. A single = is assignment, not comparison, and R rejects it inside an if condition as a syntax error; to compare (and return a logical) it should be i == 100.
Because R can be used interactively, it tries to evaluate code as soon as a complete statement has been entered. It is therefore best to put the opening curly bracket { on the same line as keywords like if and while.
A couple of nit-picky things that probably will not resolve errors, but could help for better programming in the future:
Use more whitespace within lines: i = i + 1 can be easier to read than i=i+1 and mistakes like i=100 vs i == 100 are easier to catch when whitespace is used appropriately.
I find the arrow assignment in R i <- 1 reads easier and lessens chances of confusing different uses of =, so I would recommend using it for all assignments.
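Putting these points together, one possible corrected version might look like the sketch below. The threshold of 50 is an assumption chosen only so the toy data gives a non-zero result (the question asks about 100, which no value in 1:100 exceeds), and the variable names count and percent_above are illustrative.

```r
dataset <- 1:100
condition <- 50   # assumed threshold, defined before the loop

i <- 1            # R vectors are indexed from 1, not 0
count <- 0
while (i <= length(dataset)) {
  if (dataset[i] > condition) {
    count <- count + 1
  }
  i <- i + 1
}
percent_above <- 100 * count / length(dataset)

# The same calculation without a loop: the mean of a logical vector
# is the proportion of TRUE values
percent_vectorized <- 100 * mean(dataset > condition)
print(c(percent_above, percent_vectorized))
```

The loop stops on its own once i passes the end of the vector, so no break is needed.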

Identifying a specific pattern in several adjacent rows of a single column - R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm back with my survey data.
This time, I need to remove a specific set of rows from data when they occur. In our survey, an automated telephone survey, the survey tool will attempt three times during that call to prompt the respondent to enter a response. After three timeouts of the question the survey tool hangs up. This mostly happens when the call goes to someone's voicemail.
I would like to identify that pattern when it happens so I can remove it from calculating call time.
The pattern I am looking for looks like this in the Interactions column:
It doesn't HAVE to be Intro. It can be any part of the survey where the tool prompts the respondent for a response THREE times but no response is provided, so the call fails. But it does have to be sandwiched between "Answer" (the phone picks up) and "Timeout. Call failed." (a failure).
I did try to apply what I learned from yesterday's solution (about run length encoding) to my other indexing question but I couldn't make it work in the slightest. So, here I am.
Here's an example dataset:
This is 15 respondents and every interaction between the survey tool and the respondent (or their phone, essentially).
Here's the code for the dataframe: This goes to a Google Drive text editor with the code
If I understand the question correctly, the function below removes all rows from a row with "Answer" through the next row with a failure value (there are 3 such values in the question).
The name of the column to search is given by col, which defaults to "Interaction" in the code (change it if your column is named differently), and the answer and failure values also have defaults assigned.
Note that all matching is case sensitive.
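removeRows <- function(X, col = "Interaction",
                       ans = "Answer",
                       fail = c("Timeout. Call failed.", "Partial", "Enqueueing call"))
{
  # rows where the call was answered
  a <- grep(ans, X[[col]])
  # rows holding one of the failure values
  f <- which(X[[col]] %in% fail)
  # for each failure, the most recent "Answer" row at or before it
  a <- a[findInterval(f, a)]
  # blank out every row from that answer through the failure
  for(i in seq_along(a)){
    X[[col]][a[i]:f[i]] <- NA_character_
  }
  # drop the blanked rows (this also drops rows with NA in any other column)
  Y <- X[complete.cases(X), , drop = FALSE]
  Y
}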
removeRows(survey_data)
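To make the pairing step easier to follow, here is a small self-contained sketch of how findInterval can match each failure row to the "Answer" row before it. The vector below is invented for illustration (the values "Q1" and "Complete" are assumptions, not taken from the survey data), and it works on a plain character vector rather than the full data frame.

```r
# Hypothetical call log: one failed call followed by one completed call
interactions <- c("Answer", "Intro", "Intro", "Intro", "Timeout. Call failed.",
                  "Answer", "Q1", "Complete")

a <- grep("Answer", interactions)                       # answer rows
f <- which(interactions %in% "Timeout. Call failed.")   # failure rows

# For each failure, index of the last answer row at or before it
start <- a[findInterval(f, a)]

# All row positions from each paired answer through its failure
drop <- unlist(Map(seq, start, f))

# setdiff() is used so that nothing is dropped when there are no failures
kept <- interactions[setdiff(seq_along(interactions), drop)]
print(kept)
```

Only the second call survives, because the first one runs from "Answer" to a timeout.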

Column type changes to "unknown" when its values are all the same [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
When I create a data frame, columns with the same value in every row are automatically set to type "unknown", and it is not possible to change it.
Here is an example for your better understanding:
data <- data.frame(c(1,1,1,1), c(1:4), c(4:1))
colnames(data) <- c("Not Working", "Ok", "Ok")
The first column of such data frame ("Not Working") is filled with the same values (all 1).
As you run the code, you'll notice that its type is "unknown", while "numeric" is automatically set for the others.
If you try to change it though, nothing works. For example:
data$`Not Working` <- as.numeric(data$`Not Working`)
data$`Not Working` <- as.numeric(as.character(data$`Not Working`))
You'll see that the column type is still the same after either line of code. Using brackets instead of the dollar sign does not change anything either.
This happens every time all the values in a column are equal. I also tried turning the data frame into a matrix and back into a data frame, and converting the columns to factors first (even though that is meaningless for my kind of data) and then to numeric, but nothing works.
Although this is not a problem in a regular R script, it becomes crucial when I try to knit the file, which returns the following error:
"Error [...]: replacement has length zero"
After several tests, I found that the error is specific to the column whose type should be numeric. I have R Markdown and LaTeX properly installed, so that should not be the cause.
Does anyone know why this happens and whether there is a way to fix it? It looks like a bug, but I have already updated the program to the latest version and nothing changed.
Firstly, you should not have two columns with the same name. I would recommend using tibble to create data frames.
library(tibble)
data <- tibble("Not Working" = c(1,1,1,1), "Ok" = c(1:4), "Oki" = c(4:1))
sapply(data, class) #check the data types
If you want a column to have a specific data type, you can specify it when you create the tibble.
data <- tibble("Not Working" = as.character(c(1,1,1,1)), "Ok" = c(1:4), "Oki" = c(4:1))
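As a quick check that does not depend on tibble, the sketch below builds a similar data frame in base R with unique column names and asks R for each column's class. The column names here are assumptions for illustration. It only shows what R itself reports for a constant column; it does not explain where the "unknown" label in the question comes from.

```r
# Base R version with unique, syntactic column names (names are illustrative)
df <- data.frame(constant = c(1, 1, 1, 1), up = 1:4, down = 4:1)

# Class of every column as R stores it
col_classes <- sapply(df, class)
print(col_classes)

# Structure overview: one line per column with its type and first values
str(df)
```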

I want to remove duplicated entries in a data.frame based on one column [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
uniqueIDs <- data.frame(unique(MASTERFILE$Number), MASTERFILE[,3])
So I have this big table called "MASTERFILE". In this table, I have a column called "Number" which has a number for each row. Some of the rows will have the same number, so for instance:
1
2
3
3
4
5
What I would like to do is remove the duplicate "3" entry. However, I also want column number 3 to be included in my new "uniqueIDs" data frame (hence the MASTERFILE[,3] part).
Unfortunately, when I try to run this, R says that the number of rows returned by the unique function differs from the number of rows in column 3, which makes sense. The question is: how can I make sure that the rows removed by the unique function are also removed from the 3rd column?
I am sure this question has already been asked and answered, but something like this should work:
MASTERFILE[!duplicated(MASTERFILE$Number), c("Number", "Col3")]
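Here is a small sketch of that approach on invented data, selecting the third column by position as in the question. The column names Other and Col3 and all values are assumptions for illustration.

```r
# Toy table with a repeated Number (3 appears twice)
MASTERFILE <- data.frame(
  Number = c(1, 2, 3, 3, 4, 5),
  Other  = letters[1:6],
  Col3   = c("u", "v", "w", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each Number,
# so negating it keeps the first row for every distinct value.
# The same row filter is applied to both selected columns at once.
uniqueIDs <- MASTERFILE[!duplicated(MASTERFILE$Number), c(1, 3)]
print(uniqueIDs)
```

Because rows and columns are selected in a single indexing step, the two columns can never end up with different lengths.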

R - error in for loop [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I want to do a for loop with lists.
I tried :
for(tri in tripletsFinaux){
CoefsCrit = apprentissage(data,tri$concurrents,tri$client,tri$depot)
/* I put actions here but you don't need to see it for my problem */
}
In my loop, I call a function that needs the different values of the list tri.
tripletsFinaux is a data frame, and I need the 3 values of each tri to run my function apprentissage() in the for loop.
tripletsFinaux looks like this:
head(tripletsFinaux,2)
      depot     client concurrents nbLignes
1 blablabla  blobloblo      tatata      131
2 bliblibli blublbublu      tututu      231
My error is :
$ operator is invalid for atomic vectors
What can I do ?
I don't know if the error is in the apprentissage() function or in the for loop
It seems like you want to loop over the rows, so:
for(i in seq_len(nrow(tripletsFinaux))){
CoefsCrit = apprentissage(data, tripletsFinaux$concurrents[i], tripletsFinaux$client[i], tripletsFinaux$depot[i])
# ...
}
The above is what you want. As you wrote it, for (tri in tripletsFinaux) loops over the columns, one column at a time, so tri is an atomic vector and tri$concurrents fails. You can see this by running for (tri in tripletsFinaux) {print(head(tri))}.
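The difference between the two loops can be seen on a small stand-in for tripletsFinaux. The sketch below rebuilds the two rows shown in the question and uses paste() in place of apprentissage(), whose definition is not given, so that part is purely illustrative.

```r
tripletsFinaux <- data.frame(
  depot       = c("blablabla", "bliblibli"),
  client      = c("blobloblo", "blublbublu"),
  concurrents = c("tatata", "tututu"),
  nbLignes    = c(131, 231),
  stringsAsFactors = FALSE
)

# Iterating over the data frame itself visits one whole column per step
column_lengths <- integer(0)
for (tri in tripletsFinaux) {
  column_lengths <- c(column_lengths, length(tri))
}

# Iterating over row indices gives access to one value per column per step
row_labels <- character(0)
for (i in seq_len(nrow(tripletsFinaux))) {
  # paste() stands in for the real call to apprentissage()
  row_labels <- c(row_labels,
                  paste(tripletsFinaux$concurrents[i],
                        tripletsFinaux$client[i],
                        tripletsFinaux$depot[i]))
}
print(column_lengths)
print(row_labels)
```

The first loop runs four times (once per column), while the second runs twice (once per row).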
