Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
I have an R list where all of the values are in the first position (i.e. list[1]), but I want the values spread across the list so that each position holds one value (list[1] contains the first value, list[2] contains the next, etc.). I have been trying for a while to split the values in that one position into separate elements (each value is a character string, and the values are separated by spaces), but nothing has worked.
Below is an illustration of the sort of situation I am in.
Say "test" is the name of a list in R. Test is an object of length 1, and if you enter test[1] in the console, the output is thousands of values formatted like so:
[1] "value1" "value2" "value3" ... etc.
Now I want to somehow split the contents of list[1] so that each separated character string is in a separate position, so test[1] is "value1", test[2] is "value2", etc. I have looked around for and attempted many purported solutions to this sort of issue (recent example here: List to integer or double in R) but nothing has worked for me so far.
Here's a simple way:
l1 <- list(l1 = round(rnorm(100, 0, 5), 0))
v <- unlist(l1)
l2 <- as.list(v)
The length of l1 is 1 and the length of l2 is 100. Is this what you are after?
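The same idea applied to character data can be sketched as follows. The object below is a hypothetical stand-in for the asker's list, and the second case assumes the values might instead arrive as one space-separated string, which the question leaves unclear.

```r
# Hypothetical stand-in: a list of length 1 whose only element
# is a character vector.
test <- list(c("value1", "value2", "value3"))

values <- test[[1]]            # [[ ]] extracts the vector itself; [ ] returns a list
split_list <- as.list(values)  # one list element per string

length(split_list)             # 3
split_list[[2]]                # "value2"

# Assumed alternative: the values are one string separated by spaces.
one_string <- list("value1 value2 value3")
split_list2 <- as.list(strsplit(one_string[[1]], " ", fixed = TRUE)[[1]])
length(split_list2)            # 3
```

Note the difference between test[1], which is still a list of length 1, and test[[1]], which is the character vector inside it.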
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have two character vectors and want to compare them, keeping only the elements that share the same character pattern, here the country code.
a<-c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b<-c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv", "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")
#wished result:
result_a<-c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv")
result_b<-c("nutr_needs_AFG_pop.csv", "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")
I thought about extracting the codes first and then comparing the strings:
a_ISO<-str_sub(a, start=10, end = -5) #subset just ISO name
b_ISO<-str_sub(b, start =12, end = -9 ) #subset just ISO name
dif1<-setdiff(a, b) # get difference (order is important)
dif2<-setdiff(b,a) # get difference
dif<-c(dif1,dif2) # selection which to remove
But I don't know how to go from here to comparing a and b with dif. So basically: how do I compare one character vector with another by regex?
I think you should extract the characters with a more general regex-based approach rather than by position. I also think it is easier to subset the elements you want to keep with intersect() than to determine the ones to drop with setdiff():
Extract the three-character code with a regex:
index_a<-stringr::str_extract(a, "[A-Z]{3}")
index_b<-stringr::str_extract(b, "[A-Z]{3}")
Then subset the vectors with intersect() and base indexing:
intersect_ab<-intersect(index_a, index_b)
result_a<-a[index_a %in% intersect_ab]
result_b<-b[index_b %in% intersect_ab]
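That said, your approach also works if you take the differences on the extracted codes (a_ISO and b_ISO) instead of the full file names, and then add a final subsetting step:
dif1<-setdiff(a_ISO, b_ISO) # codes that occur only in a
dif2<-setdiff(b_ISO, a_ISO) # codes that occur only in b
result_a<-a[!a_ISO %in% dif1]
result_b<-b[!b_ISO %in% dif2]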
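For reference, here is a minimal end-to-end sketch of the extract-then-intersect idea using only base R, with regmatches() and regexpr() standing in for the stringr call. The names code_a, code_b and common are illustrative, not from the original posts.

```r
a <- c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b <- c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv",
       "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")

# First run of three capital letters in each file name.
# Caveat: regmatches() drops elements with no match, so this assumes
# every file name contains a code.
code_a <- regmatches(a, regexpr("[A-Z]{3}", a))
code_b <- regmatches(b, regexpr("[A-Z]{3}", b))

common <- intersect(code_a, code_b)   # codes present in both vectors

result_a <- a[code_a %in% common]
result_b <- b[code_b %in% common]
```

If some file names might lack a code, an extractor that returns NA for non-matches keeps the index vectors aligned with a and b, which this sketch does not guarantee.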
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
For example, I have a data.frame with 40 rows and 20 columns and want to create 100 variables assigned to the name of the first row (a string):
row_name_1 <- df[1, ]
Is there a way to write a loop to do this for all 100 rows that would save the trouble of typing 40 lines of code?
I have tried using this code:
Phoneme_Features.list <- setNames(split(Phoneme_Features,
seq(nrow(Phoneme_Features))), rownames(Phoneme_Features))
The specific application for this would be to be able to search another data frame based on the values from the first data frame.
I have 2 data frames: Phoneme_Features and Phonetic_Dictionary (with 130,000 rows). Phoneme features is data frame where each row corresponds to around 20 phonetic features (e.g. F = consonant = 1, vowel = 0, labial = 1, dental = 1, etc). The Phonetic_Dictionary contains 130,000 words with the corresponding phonetic transcription (e.g. phonetics F AH0 N EH1 T IH0 K S)
I want to use the new variables to replace the values of another data frame (stored as factors) so that I can search items in the second data frame by the features in the first data frame (Phoneme Features).
I would like to be able to search Phonetic_Dictionary and return every entry in which the first column contains a value of 1 for consonant. In other words, to be able to search the dictionary for all entries with an initial consonant, or final high vowel, or any other feature from the first data frame Phoneme_Features.
You can use assign() and paste0() to create variable names programmatically.
An example using the iris dataset:
for(i in 1:nrow(iris)){
assign(paste0('row_name_',i),iris[i,])
}
paste0() attaches the row number, i, to the string row_name_, and assign() then creates a variable with that name in the environment, with the value iris[i,].
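As an alternative to creating many separate variables, the rows can be kept in a single named list, along the lines of the split()/setNames() attempt in the question. This is only a sketch; the name rows and the row_name_ prefix are illustrative.

```r
# Split the data frame into a list of one-row data frames.
rows <- split(iris, seq_len(nrow(iris)))

# Name each element instead of creating a separate variable per row.
names(rows) <- paste0("row_name_", seq_len(nrow(iris)))

# Look a row up by name.
rows[["row_name_3"]]

length(rows)   # 150
```

A named list keeps the workspace uncluttered and can be processed with lapply(), which is harder to do with variables created through assign().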
Thanks for everybody's help. I was able to get what I wanted by using:
for(i in 1:nrow(Phoneme_Features)){
assign(paste0(Phoneme_Features[i, ]), Phoneme_Features[i, ])}
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
Apologies that my question was unclear. It was my interpretation of the following assignment question:
Create a list (named mylist) that consists of one character vector (“J”,“A”,“G”,”B”,”H”,”E”,”C”,”F”,”D”,”I”), one numeric vector (10 random values from rnorm), and a matrix of size 10 x 10 (containing integer 1 to 100). After that you will provide a way of sorting the rows of all components (character, numeric and matrix) of mylist based on the order of the sorted character list. Finally, do a matrix times vector multiplication of the sorted second component and the third (integer) component (you will need to extract and convert these components to suitable modes).
Based on the code above, write a function that reads one character vector (of size n), one numeric vector (of size n) and one matrix (of size n x n). Then sorts the rows of all components based on the character vector, performs matrix times vector multiplication, combines the output of the multiplication with the input into a data frame that should be the output of the function.
We need to extract the character vector separately, order it, and then use lapply to order the elements:
i1 <- order(lst$vec1) #assuming that the character `vector` is named `vec1`
lst1 <- lapply(lst, function(x) if(is.vector(x)) x[i1] else x[i1,])
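A self-contained sketch of this approach on the data from the assignment might look like the following. The component names vec1, vec2 and mat are assumptions, as is filling the matrix column by column, since the assignment does not specify either.

```r
set.seed(1)  # only to make the random values reproducible

mylist <- list(
  vec1 = c("J", "A", "G", "B", "H", "E", "C", "F", "D", "I"),
  vec2 = rnorm(10),
  mat  = matrix(1:100, nrow = 10)   # filled column by column (assumption)
)

# Ordering that sorts the character vector.
i1 <- order(mylist$vec1)

# Reorder vectors by element and the matrix by row.
sorted <- lapply(mylist, function(x) {
  if (is.matrix(x)) x[i1, , drop = FALSE] else x[i1]
})

# Matrix times vector: (10 x 10) %*% (length 10) gives a 10 x 1 matrix.
product <- sorted$mat %*% sorted$vec2
dim(product)
```

Testing with is.matrix() rather than is.vector() is a small design choice: is.vector() returns FALSE for vectors that carry attributes other than names, which can be surprising.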
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
i have a string like this x <- "avd_1xx_2xx_3xx"
i need to extract the number from x(string) and put them in new variables
num1 <- 1xx
num2 <- 2xx
num3 <- 3xx
however, i can't predict the number of digits in each number
for instance, x could be "avd_1_2_3" or "avd_11_21_33" or similar
could you give me some solutions?
Thanks
We can use str_extract from stringr. To extract multiple matches we use str_extract_all, which returns a list of length 1 (as we have a single element in 'x'). To extract the list element, we can use [[ i.e. [[1]].
library(stringr)
str_extract_all(x, "\\d+[a-z]*")[[1]]
#[1] "1xx" "2xx" "3xx"
A similar option using base R would be regmatches/gregexpr
regmatches(x, gregexpr("\\d+[a-z]*", x))[[1]]
#[1] "1xx" "2xx" "3xx"
The pattern we match is one or more digits (\\d+) followed by zero or more lower case letters ([a-z]*).
It is better to keep it as a vector rather than having multiple objects in the global environment.
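To illustrate the point about keeping a vector, the matches can be indexed by position instead of being stored in separate variables. The sketch below uses the base R variant; the conversion to integer applies only to the digits-only inputs mentioned in the question.

```r
x <- "avd_1xx_2xx_3xx"
nums <- regmatches(x, gregexpr("\\d+[a-z]*", x))[[1]]
nums[1]   # "1xx"
nums[3]   # "3xx"

# Digits-only input with varying widths, converted to integers.
y <- "avd_11_21_33"
ints <- as.integer(regmatches(y, gregexpr("\\d+", y))[[1]])
ints      # 11 21 33
```

Because \\d+ matches a run of any length, the same code handles "avd_1_2_3" without changes.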
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have an R dataframe with the dimension 32 x 11. For each row I would like to determine the highest, second highest, and third highest values and add these as extra columns to the initial dataframe (making it 32 x 14). Many thanks in advance!
library(car)
data(mtcars)
mtcars
First, create a function to get the nth highest value for a vector. Then, create a copy of the dataframe, since the second highest value may change as you add more columns. Then apply your function using apply and 1 to operate row-wise. I'm not sure what would happen if there are NAs in the data. I haven't tested it...
Something like this...
nth_highest <- function(x, n)sort(x, decreasing=TRUE)[n]
tmp <- mtcars
mtcars$highest <- apply(tmp, 1, function(x)nth_highest(x,1))
mtcars$second_highest <- apply(tmp, 1, function(x)nth_highest(x,2))
mtcars$third_highest <- apply(tmp, 1, function(x)nth_highest(x,3))
rm(tmp)
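A variant of the same idea computes all three values in one pass and leaves the original data frame untouched. This is a sketch; the names top3 and cars_top3 are illustrative. On the NA question: sort() drops NA values by default, so a row with missing values would simply have fewer values to rank, and a position beyond them would come back as NA.

```r
# One pass per row: sort descending and keep the first three values.
top3 <- t(apply(mtcars, 1, function(r) unname(sort(r, decreasing = TRUE))[1:3]))
colnames(top3) <- c("highest", "second_highest", "third_highest")

# Bind the new columns onto a copy instead of modifying mtcars itself.
cars_top3 <- cbind(mtcars, top3)
dim(cars_top3)   # 32 14

# sort() removes NA by default, so asking for a third value here gives NA.
sort(c(3, NA, 1), decreasing = TRUE)[3]
```

Working from the original mtcars when computing top3 avoids the problem the answer guards against with tmp, where newly added columns would otherwise be included in later rankings.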