I am trying to format a list so that each element holds a single word (I imported it from a very poor quality CSV and can't do much to improve the file). I want every element to contain only one value, but the code I am currently using does not do this, even though I get no error messages.
Here is the code I am currently using:
# requires magrittr (or dplyr) for %>%
Terms <- list() # placeholder for the real data: 9020 elements with lengths 1, 2, and 3
for (x in 1:length(Terms)) {
  if (Terms[[x]] %>% is.list()) {
    term <- Terms[[x]]
    length(term) <- 1
    Terms[[x]] <- term
  }
} # should return list of same size, but only with elements of length 1
Any help figuring out what I could use to make it so that I can delete any second variables would be appreciated.
One option is to build a logical condition with lengths and use it to subset the list, which keeps only the elements that already have length 1:
lst2 <- lst1[lengths(lst1) == 1]
If the intention is instead to keep only the first value of every element:
lst2 <- lapply(lst1, `[`, 1)
NOTE: This assumes the list elements are vectors.
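To make the difference between the two options concrete, here is a small sketch on a made-up three-element list (the words in lst1 are invented for illustration, not taken from the original data):

```r
# Toy list with elements of length 1, 2, and 3 (made-up values)
lst1 <- list("alpha", c("beta", "gamma"), c("delta", "epsilon", "zeta"))

# Option 1: keep only the elements that already have length 1
only_singles <- lst1[lengths(lst1) == 1]

# Option 2: keep every element, truncated to its first value
first_only <- lapply(lst1, `[`, 1)

length(only_singles)  # 1: the list got shorter
lengths(first_only)   # 1 1 1: same number of elements, one value each
```

The second option is probably closer to what the question asks for, since it returns a list of the same size as the input.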
Related
I'm currently working on a homework assignment where I'm asked to subset a list of reviews into a new list containing only reviews with 5 or fewer words.
Using short_revs <- walk(mydoc, ~length(mydoc[[i]]) <= 5) returns the same initial list.
Can anyone help?
I think walk is not the right tool for this: it is called only for its side effects and always returns its input unchanged. Here are some simple alternatives; choose one:
short_revs <- mydoc[ lengths(mydoc) <= 5 ]
short_revs <- Filter(function(z) length(z) <= 5, mydoc)
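As a quick illustration, the following sketch runs both alternatives on a hypothetical mydoc in which each review is already split into a character vector of words (the reviews themselves are invented):

```r
# Hypothetical reviews, each already split into words
mydoc <- list(
  c("great", "product"),
  c("would", "not", "buy", "this", "again", "ever"),
  c("works", "as", "advertised", "so", "far")
)

# Logical vector marking the reviews with 5 or fewer words
keep <- lengths(mydoc) <= 5
short_revs <- mydoc[keep]

# Filter() gives the same result using a predicate function
short_revs2 <- Filter(function(z) length(z) <= 5, mydoc)

length(short_revs)  # 2: the six-word review is dropped
```

If the reviews are stored as single strings rather than word vectors, they would need to be split first (for example with strsplit) before lengths counts words.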
The title might be misleading but I have the scenario here:
half_paper <- lapply(data_set[,-1], function(x) x[x==0]<-0.5)
This line is supposed to substitute 0 for 0.5 in all of the columns except the first one.
Then I want to take half_paper and pass it to the following line, which should rank all of the columns except the first one:
prestige_paper <-apply(half_paper[,-1],2,rank)
But I get an error and I think that I need to somehow make half_paper into a data set like data_set.
Thanks for all of your help
Your main issue 'This line is supposed to substitute 0 for 0.5 in all of the columns except the first one' can be remedied by placing another line in your anonymous function. The gets operator <- returns the value of whatever is on the right hand side, so your lapply was returning a value of 0.5 for each column. To remedy this, another line can be added to the function that returns the modified vector.
It's also worth noting that lapply returns a list. apply was substituted in for lapply in this case for consistency, but plyr::ddply may suit this specific need better.
half_mtcars <- apply(mtcars[, -1], 2, function(x) {x[x == 0] <- 0.5; return(x)})
prestige_mtcars_tail <- apply(half_mtcars, 2, rank)
prestige_mtcars <- cbind(mtcars[, 1, drop = FALSE], prestige_mtcars_tail)
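The point about the assignment's return value can be seen in isolation with a short sketch. The function names bad and good and the vector x are made up for this illustration:

```r
x <- c(0, 2, 0, 5)

# The last expression is the assignment, so the function returns
# the right-hand side (0.5), not the modified vector
bad <- function(v) v[v == 0] <- 0.5

# Returning the modified vector explicitly
good <- function(v) {
  v[v == 0] <- 0.5
  v
}

bad_result  <- bad(x)   # 0.5
good_result <- good(x)  # 0.5 2.0 0.5 5.0
```

This is why the original lapply call produced a list of 0.5 values, one per column, instead of the modified columns.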
Let's say that I have these vectors:
time <- c(306,455,1010,210,883,1022,310,361,218,166)
status <- c(1,1,0,1,1,0,1,1,1,1)
gender <- c(1,1,1,1,1,1,2,2,1,1)
And I turn them into this data frame:
dataset <- data.frame(time, status, gender)
I want to list the factors in the third column using this function (p/s: pardon the immaturity. I'm still learning):
getFactor<-function(dataset){
result <- list()
result["Factors"] <- unique(dataset[[3]])
return(result)
}
And all I get is this:
getFactor(dataset)
$Factors
[1] 1
Warning message:
In result["Factors"] <- unique(dataset[[3]]) :
number of items to replace is not a multiple of replacement length
I tried using levels, but all I get is an empty list. My question is (1) why does this happen? and (2) is there any other way that I can get the list of the factor in a function?
Solution is simple, you just need double brackets around "Factors" :)
In the function
result[["Factors"]] <- unique(dataset[[3]])
That should be the line.
The double brackets return an element, single brackets return that selection as a list.
It sounds silly, but try this:
test <- list()
class(test["Factors"])
class(test[["Factors"]])
The first class will be of type 'list'. The second will be of type 'NULL'. This is because the single brackets returns a subset as a list, and the double brackets return the element itself. It's useful depending on the scenario. The element in this case is "NULL" because nothing has been assigned to it.
The warning "number of items to replace is not a multiple of replacement length" appears because you've asked R to put 2 things (the two unique values) into a single list slot, so only the first one is kept. With double brackets the whole vector is stored as that one element, and an element of a list can hold a vector of any length, so it works!
Hope that makes sense!
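Here is a minimal sketch of the two kinds of assignment side by side. The variable names are made up, and suppressWarnings is used only to keep the output quiet; without it, the first assignment emits the warning from the question:

```r
values <- c(1, 2)

# Single brackets address one list slot, so only the first value is kept
res_single <- list()
suppressWarnings(res_single["Factors"] <- values)

# Double brackets store the whole vector as that one element
res_double <- list()
res_double[["Factors"]] <- values

# Single brackets also work if the value is wrapped in a list first
res_wrapped <- list()
res_wrapped["Factors"] <- list(values)

res_single$Factors  # 1
res_double$Factors  # 1 2
```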
Currently, when you create your data frame, dataset$gender is a double vector (which is what R produces automatically when everything in it is a number). If you want it to be a factor, you can declare it that way at the beginning:
dataset <- data.frame(time, status, gender = as.factor(gender))
Or coerce it to be a factor later:
dataset$gender <- as.factor(gender)
Then getting a vector of the levels is simple, without writing a function:
level_vector <- levels(dataset$gender)
level_vector
Note that the extraction side of your function is fine: dataset[[3]] and dataset[, 3] both return the third column as a vector, and the first element of a list is retrieved with list[[1]]. The problem is only in how the result is assigned into the list.
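If a function is still wanted, a minimal sketch along the lines of the question's getFactor might look like this. The col argument and its default of 3 are additions for illustration, and gender is converted to a factor when the data frame is built:

```r
time   <- c(306, 455, 1010, 210, 883, 1022, 310, 361, 218, 166)
status <- c(1, 1, 0, 1, 1, 0, 1, 1, 1, 1)
gender <- c(1, 1, 1, 1, 1, 1, 2, 2, 1, 1)
dataset <- data.frame(time, status, gender = as.factor(gender))

# Returns a list whose "Factors" element holds the levels of one column
getFactor <- function(dataset, col = 3) {
  result <- list()
  result[["Factors"]] <- levels(dataset[[col]])
  result
}

factors <- getFactor(dataset)
factors$Factors  # "1" "2"
```

Note that levels returns character strings; on a column that is not a factor it returns NULL, which is likely why the earlier attempt with levels came back empty.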
I have a list with 20 elements, each containing a vector of 2 numbers. I have also generated a sequence of 20 numbers. Now I would like to construct one long vector that first lists the elements of intervals[[1]] followed by newvals[1], then intervals[[2]] followed by newvals[2], and so on.
I think the plyr package might be helpful, although I am not sure how to structure it. Help will be much appreciated!
s1 <- seq(0, 1, by = 0.05)
intervals <- Map(c, s1[-length(s1)], s1[-1])
intervals[[length(intervals)]][2] <- intervals[[length(intervals)]][2]+0.1
newvals <- seq(1,length(intervals),1)
#### HERE I WOULD LIKE TO HAVE A VECTOR IN THE FOLLOWING PATTERN
####UP TO THE LAST ELEMENT OF THE LIST:
stringreclass <- c(intervals[[1]], newvals[1], ...., intervals[[20]], newvals[20])
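One possible approach, sketched here in base R rather than plyr, is to pair each interval with its value using Map and then flatten the result with unlist. The setup lines are copied from the question; only the final assignment is new:

```r
# Setup copied from the question
s1 <- seq(0, 1, by = 0.05)
intervals <- Map(c, s1[-length(s1)], s1[-1])
intervals[[length(intervals)]][2] <- intervals[[length(intervals)]][2] + 0.1
newvals <- seq(1, length(intervals), 1)

# Pair each interval with its new value, then flatten into one vector:
# interval 1 (2 numbers), newvals[1], interval 2 (2 numbers), newvals[2], ...
stringreclass <- unlist(Map(c, intervals, newvals))

length(stringreclass)  # 60, i.e. 20 * (2 + 1)
```

Map walks the two inputs in parallel, so this relies on intervals and newvals having the same length.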
In R I have a function (ip2coordinates from the package RDSTK) which looks up 11 fields of data for each IP address you supply.
I have a list of IPs called ip.addresses:
> head(ip.addresses)
[1] "128.177.90.11" "71.179.12.143" "66.31.55.111" "98.204.243.187" "67.231.207.9" "67.61.248.12"
Note: Those or any other IP's can be used to reproduce this problem.
So I apply the function to that object with sapply:
ips.info <- sapply(ip.addresses, ip2coordinates)
and get a list called ips.info as my result. This is all good and fine, but I can't do much more with a list, so I need to convert it to a dataframe. The problem is that not all IP addresses are in the databases thus some list elements only have 1 field and I get this error:
> ips.df <- as.data.frame(ips.info)
Error in data.frame(`128.177.90.10` = list(ip.address = "128.177.90.10", :
arguments imply differing number of rows: 1, 0
My question is -- "How do I remove the elements with missing/incomplete data or otherwise convert this list into a data frame with 11 columns and 1 row per IP address?"
I have tried several things.
First, I tried to write a loop that removes elements with less than a length of 11
for (i in 1:length(ips.info)){
if (length(ips.info[i]) < 11){
ips.info[i] <- NULL}}
This leaves some records with no data and makes others say "NULL", but even those with "NULL" are not detected by is.null
Next, I tried the same thing with double square brackets and get
Error in ips.info[[i]] : subscript out of bounds
I also tried complete.cases() to see if it could potentially be useful
Error in complete.cases(ips.info) : not all arguments have the same length
Finally, I tried a variation of my for loop which was conditioned on length(ips.info[[i]]) == 11 and wrote complete records to another object, but somehow it results in an exact copy of ips.info
Here's one way you can accomplish this using the built-in Filter function
#input data
library(RDSTK)
ip.addresses<-c("128.177.90.10","71.179.13.143","66.31.55.111","98.204.243.188",
"67.231.207.8","67.61.248.15")
ips.info <- sapply(ip.addresses, ip2coordinates)
#data.frame creation
lengthIs <- function(n) function(x) length(x)==n
do.call(rbind, Filter(lengthIs(11), ips.info))
or if you prefer not to use a helper function
do.call(rbind, Filter(function(x) length(x)==11, ips.info))
An alternative solution using only base R:
# find non-complete elements
ids.to.remove <- sapply(ips.info, function(i) length(i) < 11)
# remove found elements
ips.info <- ips.info[!ids.to.remove]
# create data.frame
df <- do.call(rbind, ips.info)
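Since ip2coordinates needs network access, the filtering step can be tried offline on a stand-in list. In the sketch below all values are invented, records are shortened to 3 fields (n_fields) instead of 11, and each record is converted with as.data.frame before binding; that conversion is a variation on the answers above, not what they do:

```r
# Stand-in for ips.info: two complete records and one incomplete one
n_fields <- 3
ips.info <- list(
  "10.0.0.1" = list(ip.address = "10.0.0.1", latitude = 1.5, longitude = 2.5),
  "10.0.0.2" = list(ip.address = "10.0.0.2"),
  "10.0.0.3" = list(ip.address = "10.0.0.3", latitude = 3.5, longitude = 4.5)
)

# Keep only the records that have every field
complete <- Filter(function(x) length(x) == n_fields, ips.info)

# Turn each record into a one-row data frame, then stack the rows
ips.df <- do.call(
  rbind,
  lapply(complete, as.data.frame, stringsAsFactors = FALSE)
)

nrow(ips.df)  # 2: the incomplete record is gone
```

This also hints at why the loop in the question did not work: ips.info[i] with single brackets is always a list of length 1, so length(ips.info[i]) < 11 is true for every element, and deleting elements while looping over the original indices shifts the remaining ones.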