Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am trying to determine which entries in my V1 column have values in the V5 column in the range 95-105 and also values in the V6 column in the range 7-13. I am using the which() function and trying to store the matching V1 names in the variable x, but I keep getting integer(0) or character(0) as output, and I'm not sure what that means. An image of my code is attached below.
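integer(0) is a zero-length integer vector: which() found no rows of your data frame that satisfy all the conditions (character(0) is what you get when that empty index is used on a character column such as V1). You can check whether any rows qualify at all with
with(df, any(95 <= V5 & V5 <= 105 &
             7 <= V6 & V6 <= 13))
(edited on the basis of #H1's comment to match your description rather than your code; the comparisons are rearranged slightly to approximate the A < B < C syntax, which R's parser can't handle.)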
You should probably check str(df) and/or summary(df) (or sapply(df, class)) to make sure that your data frame has really been read in as intended, or use readr::read_csv(), which prints information about the column types it inferred from the data. In particular, any typo that makes an entry an invalid number (an extra decimal point, a missing-value code such as "?" that isn't recognized as missing, etc.) will make R read the entire column as character (since you've set stringsAsFactors=FALSE) rather than numeric.
If you want to force columns 2-14 to numeric, you can use
df[-1] <- lapply(df[-1], as.numeric)
(df[-1] is every column except the first); however, it would be better practice to find and fix any problems upstream ...
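A minimal sketch of why a character column produces integer(0), using a small hypothetical data frame (the values are invented; only the column names V1, V5 and V6 come from the question). With character columns, 95 <= V5 becomes a string comparison, so "100" sorts before "95" and nothing matches; after the lapply() conversion the intended row is found.

```r
# Hypothetical toy data; V5 and V6 were read in as character,
# e.g. because of a stray typo somewhere in the file.
df <- data.frame(
  V1 = c("a", "b", "c"),
  V5 = c("100", "90", "104"),
  V6 = c("10", "8", "20"),
  stringsAsFactors = FALSE
)

# Row indices where 95 <= V5 <= 105 and 7 <= V6 <= 13.
in_range <- function(d) {
  which(95 <= d$V5 & d$V5 <= 105 & 7 <= d$V6 & d$V6 <= 13)
}

# Character columns: string comparison, so no row qualifies.
idx_chr <- in_range(df)   # integer(0)
x_chr <- df$V1[idx_chr]   # character(0)

# Convert every column except the first to numeric, then retry.
df[-1] <- lapply(df[-1], as.numeric)
idx_num <- in_range(df)
x <- df$V1[idx_num]
```

Here x is "a": row 1 is the only row with V5 in 95-105 and V6 in 7-13 once the columns are numeric.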
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have two character vectors and want to compare them, keeping only the elements that contain the same character pattern (here, the country code).
a<-c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b<-c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv", "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")
# desired result:
result_a<-c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv")
result_b<-c("nutr_needs_AFG_pop.csv", "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")
I thought about extracting the country codes first and then comparing the strings:
a_ISO<-str_sub(a, start=10, end = -5) #subset just ISO name
b_ISO<-str_sub(b, start =12, end = -9 ) #subset just ISO name
dif1<-setdiff(a, b) # get difference (order is important)
dif2<-setdiff(b,a) # get difference
dif<-c(dif1,dif2) # selection which to remove
But I don't know how to go from here and compare a and b with dif. So, basically: how do I compare one character vector with another using a regex pattern?
I think you should extract the code with a more general, regex-based approach rather than by position. It is also easier to subset the elements you want to keep with intersect() than to determine the ones to drop with setdiff():
Extract the three-character code with a regex:
index_a<-stringr::str_extract(a, "[A-Z]{3}")
index_b<-stringr::str_extract(b, "[A-Z]{3}")
Then subset the vectors with intersect() and base indexing:
intersect_ab<-intersect(index_a, index_b)
result_a<-a[index_a %in% intersect_ab]
result_b<-b[index_b %in% intersect_ab]
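That said, your own approach also works if you apply setdiff() to the extracted codes (a_ISO, b_ISO) rather than to the full file names, and then add a final subsetting step:
dif1<-setdiff(a_ISO, b_ISO) # codes only in a
dif2<-setdiff(b_ISO, a_ISO) # codes only in b
result_a<-a[!a_ISO %in% dif1]
result_b<-b[!b_ISO %in% dif2]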
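If you would rather not depend on stringr, the same intersect() idea can be sketched in base R with regexpr() and regmatches(). This assumes every file name contains exactly one run of three capital letters; regmatches() silently drops elements with no match, which would misalign the codes with the names. The sketch uses the POSIX class [[:upper:]] instead of [A-Z] because R's documentation warns that character ranges are locale-dependent.

```r
a <- c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b <- c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv",
       "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")

# regexpr() locates the first run of three upper-case letters;
# regmatches() returns the matched text (non-matches are dropped).
extract_iso <- function(x) regmatches(x, regexpr("[[:upper:]]{3}", x))

iso_a <- extract_iso(a)
iso_b <- extract_iso(b)

# Keep only the names whose code appears in both vectors.
shared <- intersect(iso_a, iso_b)
result_a <- a[iso_a %in% shared]
result_b <- b[iso_b %in% shared]
```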
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have an R list where all of the values are in the first position (i.e. list[1]), while I want the values spread out across the list (list[1] contains one value, list[2] contains the next, etc.). I have been trying unsuccessfully for a while to split the values in that one position into separate elements (each value is a string of characters separated by spaces), but nothing has worked.
Below is an illustration of the sort of situation I am in.
Say "test" is the name of a list in R. Test is an object of length 1, and if you enter test[1] in the console, the output is thousands of values formatted like so:
[1] "value1" "value2" "value3" ... etc.
Now I want to somehow split the contents of list[1] so that each separated character string is in a separate position, so test[1] is "value1", test[2] is "value2", etc. I have looked around for and attempted many purported solutions to this sort of issue (recent example here: List to integer or double in R) but nothing has worked for me so far.
Here's a simple way:
l1 <- list(l1 = round(rnorm(100, 0, 5), 0))
v <- unlist(l1)
l2 <- as.list(v)
The length of l1 is 1 and the length of l2 is 100. Is this what you are after?
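A small sketch applying the same idea to a list shaped like test. Note that test[1] returns a sub-list of length 1, whereas test[[1]] returns the vector stored inside it, which is what needs to be spread out. The second case is an assumption: it covers the possibility that the first element is one long space-separated string rather than a character vector.

```r
# Hypothetical stand-in for `test`: a single list element holding all values.
test <- list(c("value1", "value2", "value3"))

# Case 1: test[[1]] is already a character vector (as the printed output suggests).
spread <- as.list(test[[1]])

# Case 2 (assumption): test[[1]] is a single space-separated string.
test2 <- list("value1 value2 value3")
spread2 <- as.list(strsplit(test2[[1]], " ", fixed = TRUE)[[1]])
```

In both cases spread[[1]] is "value1", spread[[2]] is "value2", and so on.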
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I want to replace each missing value in the first column of my dataframe with the previous value multiplied by a scalar (e.g. 3).
nRowsDf <- nrow(df)
for(i in 1:nRowsDf){
df[i,1] =ifelse(is.na(df[i,1]), lag(df[i,1])+3*lag(df[i,1]), df[i,1])
}
The above code does not give me an error but does not do the job either.
In addition, is there a better way to do this instead of writing a loop?
Update and Data:
Here is example data. I want to replace each missing value in the first column of my dataframe with the previous value multiplied by a scalar (e.g. 3). The NA values are in consecutive rows.
df <- mtcars
df[c(2,3,4,5),1] <-NA
IND <- is.na(df[,1])
df[IND,1] <- df[dplyr::lead(IND,1L, F),1] * 3
The last line of the above code fills only one row of the run each time (I have to run it 4 times to fill the 4 missing rows). How can I fill all of them at once?
Reproducible data (which you should provide):
df <- mtcars
df[c(1,5,8),1] <-NA
code:
IND <- is.na(df[,1])
df[IND,1] <- df[dplyr::lag(IND,1L, F),1] * 3
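Since you used lag(), I used lag() too. But you say "previous", so you may want lead() instead: with lag(), each NA is filled with 3 times the value in the row after it, while with lead() it is filled from the row before it. This single pass works here only because the missing rows are not adjacent.
It is also unclear what should happen if the first value (lead case) or the last value (lag case) is missing.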
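For the consecutive-NA case in the update, here is a loop-free sketch. It assumes that "the previous value" means the previous value after filling, so a run of NAs compounds (v*3, v*9, v*27, ...). The idea is to label each run with cumsum(!is.na(x)), look up the observed value that starts the run, and multiply it by the scalar raised to the position within the run. Leading NAs have no previous value and are left as NA. The function name fill_na_geometric and its multiplier argument are made up for this illustration.

```r
# Fill each NA with the preceding (possibly already filled) value times `multiplier`.
fill_na_geometric <- function(x, multiplier = 3) {
  observed <- !is.na(x)
  grp <- cumsum(observed)  # run id: 0 for leading NAs, then 1, 2, ... per observed value
  # observed value that starts each run (NA for leading NAs)
  last_obs <- ifelse(grp > 0, x[observed][pmax(grp, 1)], NA)
  # position within each run: 0 for the observed value, 1, 2, ... for the NAs after it
  k <- ave(seq_along(x), grp, FUN = seq_along) - 1
  out <- last_obs * multiplier^k
  out[observed] <- x[observed]
  out
}

df <- mtcars
df[c(2, 3, 4, 5), 1] <- NA
df[, 1] <- fill_na_geometric(df[, 1])
head(df[, 1], 6)  # first six values: 21, 63, 189, 567, 1701, 18.1
```

If each NA should instead get the last observed value times 3 without compounding, replacing multiplier^k with multiplier^(k > 0) gives that behaviour.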
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I have a dataframe with a column named Stage. The dataframe is generated from a regularly updated Excel file.
This column should only contain a few specific values, such as 'Planning' or 'Analysis', but people occasionally enter custom values and it is impractical to stop them.
I want the dataframe sorted by this column, with a custom sort order that makes sense chronologically (e.g., for us, planning comes before analysis). I could implement this with factors (e.g. Reorder rows using custom order), but if I use a predefined set of factor levels, I lose any unexpected values that people enter into that column. I am happy for the unexpected values not to be sorted properly, but I don't want to lose them entirely.
EDIT: The answer by floo0 is amazing, but I neglected to mention that I was planning to barplot the results, with something like
barplot(table(MESH_assurance_involved()[MESH_assurance_involved_sort_order(), 'Stage']), main="Stage became involved")
(The parentheses are there because these are Shiny reactive objects; that shouldn't make a difference.)
The results are unsorted, although testing in the console reveals the underlying data is sorted.
table() also breaks the sorting, but using ggplot without table() gives the same unsorted result.
To display a barplot maintaining the source order seems to require something like Ordering bars in barplot() but all solutions I have found require factors, and mixing them with the solution here is not working for me somehow.
Toy data-set:
dat <- data.frame(Stage = c('random1', 'Planning', 'Analysis', 'random2'), id=1:4,
stringsAsFactors = FALSE)
So dat looks as follows:
> dat
Stage id
1 random1 1
2 Planning 2
3 Analysis 3
4 random2 4
Now you can do something like this:
known_levels <- c('Planning', 'Analysis')
my_order <- order(factor(dat$Stage, levels = known_levels, ordered=TRUE))
dat[my_order, ]
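Which gives you the rows below. Values that are not in known_levels become NA in the factor, and order() puts NAs last by default, so they are kept rather than dropped: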
Stage id
2 Planning 2
3 Analysis 3
1 random1 1
4 random2 4
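For the barplot in the EDIT, one option (a variation on the answer above, not something it states) is to keep the unexpected values as extra factor levels placed after the known ones, instead of letting them become NA. table() and barplot() then follow the level order, and ggplot2 also orders a discrete axis by factor levels. The toy data below adds a second 'Planning' row purely for illustration.

```r
dat <- data.frame(Stage = c('random1', 'Planning', 'Analysis', 'random2', 'Planning'),
                  id = 1:5, stringsAsFactors = FALSE)

known_levels <- c('Planning', 'Analysis')
# Unexpected values become extra levels after the known ones, so none are lost.
extra_levels <- setdiff(unique(dat$Stage), known_levels)
dat$Stage <- factor(dat$Stage, levels = c(known_levels, extra_levels))

# Sorting and tabulating both follow the level order.
dat_sorted <- dat[order(dat$Stage), ]
counts <- table(dat$Stage)
barplot(counts, main = "Stage became involved")
```

The bars appear as Planning, Analysis, random1, random2, with unexpected values in the order they first occur in the data.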
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have CSV data as follows:
code, label, value
ABC, len, 10
ABC, count, 20
ABC, data, 102
ABC, data, 212
ABC, data, 443
...
XYZ, len, 11
XYZ, count, 25
XYZ, data, 782
...
The number of data entries is different for each code. (This doesn't matter for my question; I'm just pointing it out.)
I need to analyze the data entries for each code. This would include calculating the median, plotting graphs, etc. Does this mean I should separate out the data for each code and make it numeric?
Is there a better way of doing this than this kind of thing:
x = read.csv('dataFile.csv', header=T)
...
median(as.numeric(subset(x, x$code=='ABC' & x$label=='data')$value))
boxplot(as.numeric(subset(x, x$code=='ABC' & x$label=='data')$value))
split() and list2env() let you separate your data frame x by code, generating one data frame for each level of code:
list2env(split(x, x$code), envir=.GlobalEnv)
or just
my.list <- split(x, x$code)
if you prefer to work with lists.
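To go from the list to per-code statistics, a minimal sketch: keep only the "data" rows, split the values by code, and apply median() to each piece with sapply(). The data frame below is built directly from the rows visible in the question (the elided "..." rows are simply left out), so the numbers are illustrative only.

```r
# Only the rows shown in the question; the elided rows are omitted.
x <- data.frame(
  code  = c("ABC", "ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"),
  label = c("len", "count", "data", "data", "data", "len", "count", "data"),
  value = c(10, 20, 102, 212, 443, 11, 25, 782),
  stringsAsFactors = FALSE
)

# Keep the data entries, then split their values by code.
data_rows <- x[x$label == "data", ]
by_code <- split(data_rows$value, data_rows$code)

# One median per code, returned as a named numeric vector.
medians <- sapply(by_code, median)
```

Here medians is c(ABC = 212, XYZ = 782), and by_code can be passed straight to boxplot() to get one box per code.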
I'm not sure I fully understand the final objective of your question: do you just want some pointers on what you could do? There are a lot of possible solutions.
When you ask whether you should separate out the data for each code and make it numeric in order to calculate the median, plot graphs, etc.,
the answer is no, you don't strictly have to. You can use R functions that handle the grouping for you, for example:
# strip.white=TRUE removes the spaces after the commas in the file
x = read.csv('dataFile.csv', header=T, strip.white=TRUE)
#is it numeric?
class(x$value)
# if it is already numeric you don't need to convert it;
# if the column is strictly numeric there is usually no reason
# for it to be read as strings, but it does happen.
aggregate(value ~ code, data = subset(x, label == "data"), FUN = median)
boxplot(value ~ code, data = subset(x, label == "data"))
# and you can run ?boxplot to look into its options.