Having trouble filling a vector in a for-loop - r

I am having trouble filling a specified vector with search-values from the spotifyr package and I can't really understand where it is going wrong.
top2017id <- numeric(200)
for(i in top2017vec){
search <- search_spotify(i, type = "track", limit = 1)
top2017id[i] <- search$id
}
Error in top2017id[i] <- search$id : replacement has length zero
In addition: Warning message:
Unknown or uninitialised column: `id`.
top2017vec is a vector containing 200 track names, for example "Mi Gente". What I want the for-loop to do is search for the first track name in the vector using the search_spotify function, save the result to "search", then save search$id to the first place in the already defined vector top2017id, and then repeat the process with the second track name, and so on.
The function that I use inside the for-loop, "search_spotify" is a function from the spotifyr package, that returns a list with 27 variables. I have tested outside the for-loop, and indexing with search$id works perfectly fine in returning just a string with the tracks id.
Other than the error I receive, it does add some values to the top2017id vector. The first 200 values are 0, but after those it adds 27 values which alternate between a track name from top2017vec and that specific track's id. Like this:
> top2017id
"0"
"0"
"0"
"0"
...
Believer
"0pqnGHJpmpxLKifKRmU6WP"
Felices los 4
"1RouRzlg8OKFeqc6LvdxmB"
What is it that I have managed to screw up?
Edit:
I kept on trying after the answer from #Dylan_Gomes and I made some progress; however, I am stuck with another similar error now.
for(i in 1:length(top2017vec)){
search <- search_spotify(top2017vec[i], type = "track", limit = 1)
top2017id[i] <- search$id
}
It now works for the first 26 id's, but after the first 26 ids it gives me 0's for the rest of the vector, and then ends. The error message I receive is:
Error in top2017id[i] <- search$id : replacement has length zero
In addition: Warning message:
Unknown or uninitialised column: `id`.

The way you have your for loop might be the problem. For example:
vect<-numeric(200)
for(i in vect){
search<-rnorm(1,0,1)
vect[i]<-search
}
vect
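This doesn't work: it still returns a vector of 200 zeros, because i takes the values of vect (all 0) rather than its positions, and assigning to vect[0] does nothing. Yet, if we change the for loop structure to: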
for(i in 1:length(vect)){
search<-rnorm(1,0,1)
vect[i]<-search
}
vect
[1] 0.87096868 0.78146593 0.72339698 0.45954073 1.29507907 0.28822357 -0.97277289 -0.22033080
[9] -0.41323427 -1.79971088 -0.20233652 -1.30564552 0.46676890 -0.64209630 0.95616195 0.67121680
[17] -0.18220987 -0.45524523 -0.91059605 -1.65350181 -0.33524219 2.60902403 0.58630848 -1.22887993
It then works as expected. There might be a different problem going on with spotifyr but I can't check it because it doesn't work with the current version of R.
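For the second error in the edit, here is a minimal sketch of one way to guard the assignment, assuming the failure comes from a search that returns no rows (so search$id has length zero). The fake_search function is a hypothetical stand-in for search_spotify, not the real spotifyr API, and the track names and ids are made up:

```r
# Hypothetical stand-in for search_spotify(): returns a data frame with an
# `id` column, or zero rows when nothing matches (an assumption about the
# real function's behaviour).
fake_search <- function(name) {
  known <- c("Mi Gente" = "id_001", "Believer" = "id_002")
  if (name %in% names(known)) {
    data.frame(id = unname(known[name]), stringsAsFactors = FALSE)
  } else {
    data.frame(id = character(0), stringsAsFactors = FALSE)
  }
}

tracks <- c("Mi Gente", "No Such Track", "Believer")
ids <- character(length(tracks))  # character, since the ids are strings

for (i in seq_along(tracks)) {
  res <- fake_search(tracks[i])
  # Store NA when the search came back empty instead of failing
  ids[i] <- if (length(res$id) > 0) res$id[1] else NA_character_
}
ids
```

Afterwards, which(is.na(ids)) shows the positions of the track names that returned nothing.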

Related

Making list of factors in a function but return warning error

Let's say that I have these vectors:
time <- c(306,455,1010,210,883,1022,310,361,218,166)
status <- c(1,1,0,1,1,0,1,1,1,1)
gender <- c(1,1,1,1,1,1,2,2,1,1)
And I turn them into this data frame:
dataset <- data.frame(time, status, gender)
I want to list the factors in the third column using this function (p/s: pardon the immaturity. I'm still learning):
getFactor<-function(dataset){
result <- list()
result["Factors"] <- unique(dataset[[3]])
return(result)
}
And all I get is this:
getFactor(dataset)
$Factors
[1] 1
Warning message:
In result["Factors"] <- unique(dataset[[3]]) :
number of items to replace is not a multiple of replacement length
I tried using levels, but all I get is an empty list. My question is (1) why does this happen? and (2) is there any other way that I can get the list of the factor in a function?
Solution is simple, you just need double brackets around "Factors" :)
In the function
result[["Factors"]] <- unique(dataset[[3]])
That should be the line.
The double brackets return an element, single brackets return that selection as a list.
Sounds silly, but try this:
test <- list()
class(test["Factors"])
class(test[["Factors"]])
The first class will be of type 'list'. The second will be of type 'NULL'. This is because the single brackets returns a subset as a list, and the double brackets return the element itself. It's useful depending on the scenario. The element in this case is "NULL" because nothing has been assigned to it.
The warning "number of items to replace is not a multiple of replacement length" appears because you've asked it to put 2 values (1 and 2) into a single list slot, so only the first one is kept. When you use double brackets, the whole vector is stored as one element of the list, and that element can hold multiple values, so it works!
Hope that makes sense!
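A small sketch of the difference, reusing the gender vector from the question (the object names other than gender are just for illustration):

```r
gender <- c(1, 1, 1, 1, 1, 1, 2, 2, 1, 1)

# Double brackets: the whole vector becomes one element of the list
result <- list()
result[["Factors"]] <- unique(gender)

# Single brackets also work if the right-hand side is itself a list
result2 <- list()
result2["Factors"] <- list(unique(gender))

result$Factors
```

Both versions end up with a list of length 1 whose Factors element holds both values.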
Currently, when you create your data frame, dataset$gender is a double vector (which R will automatically do if everything in it is numbers). If you want it to be a factor, you can declare it that way at the beginning:
dataset <- data.frame(time, status, gender = as.factor(gender))
Or coerce it to be a factor later:
dataset$gender <- as.factor(gender)
Then getting a vector of the levels is simple, without writing a function:
level_vector <- levels(dataset$gender)
level_vector
As a side note on subsetting: the third column of dataset can be extracted with dataset[[3]] or dataset[, 3], and the first element of a list is extracted with list[[1]].

R: errors in cor() and corrplot()

Another stumbling block. I have a large set of data (called "brightly") with about ~180k rows and 165 columns. I am trying to create a correlation matrix of these columns in R.
Several problems have arisen, none of which I can resolve with the suggestions proposed on this site and others.
First, how I created the data set: I saved it as a CSV file from Excel. My understanding is that CSV should remove any formatting, such that anything that is a number should be read as a number by R. I loaded it with
brightly = read.csv("brightly.csv", header=TRUE)
But I kept getting "'x' must be numeric" error messages every time I ran cor(brightly), so I replaced all the NAs with 0s. (This may be altering my data, but I think it will be all right--anything that's "NA" is effectively 0, either for the continuous or dummy variables.)
Now I am no longer getting the error message about text. But any time I run cor()--either on all of the variables simultaneously or combinations of the variables--I get "Warning message:
In cor(brightly$PPV, brightly, use = "complete") :
the standard deviation is zero"
I am also having some of the correlations of that one variable with others show up as "NA." I have ensured that no cell in the data is "NA," so I do not know why I am getting "NA" values for the correlations.
I also tried both of the following to make REALLY sure I wasn't including any NA values:
cor(brightly$PPV, brightly, use = "pairwise.complete.obs")
and
cor(brightly$PPV,brightly,use="complete")
But I still get warnings about the SD being zero, and I still get the NAs.
Any insights as to why this might be happening?
Finally, when I try to do corrplot to show the results of the correlations, I do the following:
brightly2 <- cor(brightly)
Warning message:
In cor(brightly) : the standard deviation is zero
corrplot(brightly2, method = "number")
Error in if (min(corr) < -1 - .Machine$double.eps || max(corr) > 1 + .Machine$double.eps) { :
missing value where TRUE/FALSE needed
And instead of making my nice color-coded correlation matrix, I get this. I have yet to find an explanation of what that means.
Any help would be HUGELY appreciated! Thanks very much!!
Please check whether you replaced your NAs with 0 or '0': the first is numeric and the second is a character string. You can also try as.numeric(column_name) to convert character 0s to numeric 0s. This error also occurs if your dataset has factors: because those are not numeric values, corrplot throws this error.
It would be helpful if you put a sample of your data in the question using
str(head(your_dataset))
That would be helpful for you to check the datatypes of columns.
Let me know if I am wrong.
Cheerio.
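The warning itself suggests one more thing to check: cor() returns NA for any column whose standard deviation is zero, which a column that is all 0s would be. A minimal sketch with made-up data (the data frame and its column names are hypothetical) that drops constant columns before computing the correlations:

```r
# Made-up data: `const` never varies, so its standard deviation is zero
df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(2, 4, 6, 9),
  const = c(0, 0, 0, 0)
)

# With the constant column included, the matrix contains NA
has_na <- anyNA(suppressWarnings(cor(df)))

# Keep only the columns that actually vary
sds <- sapply(df, sd)
cm <- cor(df[, sds > 0, drop = FALSE])
cm
```

Here names(sds)[sds == 0] lists the columns that were dropped, and a matrix like cm has no NA entries to trip up corrplot.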

Creating data subset with a vector - why does data have to be sorted?

I am hoping someone can help with the following problem I am having while creating subsets of my data:
I have a data set titled 'LakeK_all'. One of the columns is titled 'Lake' and contains a list of lake names (S001-Out, S002-Out, Y001-Out, Y002-Out,...). I would like to pull out the subset of data that start with an 'S'. I find it works fine if my data are alphabetically sorted so that all the sites starting with 'S' are first and those starting with Y are last. If the lakes are mixed up it does not work. I could sort my data first, but if possible I would like to solve the problem directly and keep the steps simple.
Here is my code:
seki_vector = LakeK_all[grep("^[S].*", LakeK_all$Lake, value=TRUE)]
seki_vector
LakeK = subset(LakeK_all, subset=(LakeK_all$Lake==seki_vector))
LakeK
Here is the output i am getting:
> seki_vector = LakeK_all[grep("^S", LakeK_all$Lake, value=TRUE)]
Error in `[.data.frame`(LakeK_all, grep("^S", LakeK_all$Lake, value = TRUE)) :
undefined columns selected
> seki_vector
[1] "S005-Out" "S003-Out" "S004-Out" "S001-Out" "S040-Out" "S043-Out" "S044-Out" "S048-Out" "S049-Out" "S041-Out" "S047-Out" "S042-Out" "S046-Out" "S039-Out"
LakeK = subset(LakeK_all, subset=(LakeK_all$Lake==seki_vector))
Warning messages:
1: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
2: In `==.default`(LakeK_all$Lake, seki_vector) :
longer object length is not a multiple of shorter object length
> LakeK
[1] Y Year WYear Lake Panel Lat Long Cen LowerDL UpperDL InclProb PanelProb AdjInclProb
<0 rows> (or 0-length row.names)
It seems the vector is working, but not the subset step. Again, if i sort the data then it works just fine.
Reading through previous questions it sounds like it is better to use [] instead of 'subset'. I tried this and it did not fix the issue.
I think I spot a couple of problems. In grep you don't want to set value to TRUE. Setting value to TRUE returns the matched strings instead of the indices of the matching rows. Also, you are missing a comma (hence the undefined columns error).
Try This:
LakeK_all[grep("^S", LakeK_all$Lake), ]
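As for why sorting seemed to matter: == compares the column with seki_vector position by position, recycling the shorter vector, so it only matches when the names happen to line up. A minimal sketch with made-up, deliberately unsorted data (the values in the Y column are invented):

```r
# Made-up data with the lakes deliberately unsorted
LakeK_all <- data.frame(
  Lake = c("Y001-Out", "S002-Out", "Y002-Out", "S001-Out"),
  Y = c(1.2, 3.4, 5.6, 7.8),
  stringsAsFactors = FALSE
)

# grepl() gives one TRUE/FALSE per row, so row order does not matter
LakeK <- LakeK_all[grepl("^S", LakeK_all$Lake), ]

# If a vector of names is needed, compare with %in% rather than ==
seki_vector <- grep("^S", LakeK_all$Lake, value = TRUE)
LakeK2 <- LakeK_all[LakeK_all$Lake %in% seki_vector, ]
LakeK
```

Both approaches keep the same two rows without sorting the data first.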

How to subset a list based on the length of its elements in R

In R I have a function (coordinates from the package sp) which looks up 11 fields of data for each IP address you supply.
I have a list of IP's called ip.addresses:
> head(ip.addresses)
[1] "128.177.90.11" "71.179.12.143" "66.31.55.111" "98.204.243.187" "67.231.207.9" "67.61.248.12"
Note: Those or any other IP's can be used to reproduce this problem.
So I apply the function to that object with sapply:
ips.info <- sapply(ip.addresses, ip2coordinates)
and get a list called ips.info as my result. This is all good and fine, but I can't do much more with a list, so I need to convert it to a dataframe. The problem is that not all IP addresses are in the databases thus some list elements only have 1 field and I get this error:
> ips.df <- as.data.frame(ips.info)
Error in data.frame(`128.177.90.10` = list(ip.address = "128.177.90.10", :
arguments imply differing number of rows: 1, 0
My question is -- "How do I remove the elements with missing/incomplete data or otherwise convert this list into a data frame with 11 columns and 1 row per IP address?"
I have tried several things.
First, I tried to write a loop that removes elements with less than a length of 11
for (i in 1:length(ips.info)){
if (length(ips.info[i]) < 11){
ips.info[i] <- NULL}}
This leaves some records with no data and makes others say "NULL", but even those with "NULL" are not detected by is.null
Next, I tried the same thing with double square brackets and get
Error in ips.info[[i]] : subscript out of bounds
I also tried complete.cases() to see if it could potentially be useful
Error in complete.cases(ips.info) : not all arguments have the same length
Finally, I tried a variation of my for loop which was conditioned on length(ips.info[[i]] == 11 and wrote complete records to another object, but somehow it results in an exact copy of ips.info
Here's one way you can accomplish this using the built-in Filter function
#input data
library(RDSTK)
ip.addresses<-c("128.177.90.10","71.179.13.143","66.31.55.111","98.204.243.188",
"67.231.207.8","67.61.248.15")
ips.info <- sapply(ip.addresses, ip2coordinates)
#data.frame creation
lengthIs <- function(n) function(x) length(x)==n
do.call(rbind, Filter(lengthIs(11), ips.info))
or if you prefer not to use a helper function
do.call(rbind, Filter(function(x) length(x)==11, ips.info))
Alternative solution based on base package.
# find non-complete elements
ids.to.remove <- sapply(ips.info, function(i) length(i) < 11)
# remove found elements
ips.info <- ips.info[!ids.to.remove]
# create data.frame
df <- do.call(rbind, ips.info)
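For a version that runs without the geocoding service, here is a sketch with a made-up list standing in for ips.info (the fields and values are invented, and complete records have 3 fields instead of 11). It also hints at why the loop in the question misbehaves: length(info[i]) is always 1 because single brackets return a sub-list, and setting elements to NULL inside the loop shortens the list while i keeps counting.

```r
# Made-up stand-in for ips.info: element `b` is an incomplete record
info <- list(
  a = data.frame(ip = "1.1.1.1", lat = 10, lon = 20),
  b = data.frame(ip = "2.2.2.2"),
  c = data.frame(ip = "3.3.3.3", lat = 30, lon = 40)
)

# Single brackets return a list of length 1, not the element itself
len_single <- length(info["b"])
len_double <- length(info[["b"]])

# Keep only the complete records, then stack them into one data frame
complete <- Filter(function(x) length(x) == 3, info)
df <- do.call(rbind, complete)
df
```

Filtering first and binding afterwards avoids modifying the list while looping over it.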

R returns list instead of filling in dataframe column

I am trying to use apply() to fill in an additional column in a dataframe by calling a function I created with each row of the data frame.
The dataframe is called Hit.Data and has 2 columns, Zip.Code and Hits. Here are a few rows:
Zip.Code , Hits
97222 , 20
10100 , 35
87700 , 23
The apply code is the following:
Hit.Data$Zone = apply(Hit.Data, 1, function(x) lookupZone("89000", x["Zip.Code"]))
The lookupZone() function is the following:
lookupZone <- function(sourceZip, destZip){
sourceKey = substr(sourceZip, 1, 3)
destKey = substr(destZip, 1, 3)
return(zipToZipZoneMap[[sourceKey]][[destKey]])
}
All the lookupZone() function does is take the 2 strings, truncates to the required characters and looks up the values. What happens when I run this code though is that R assigns a list to Hit.Data$Zone instead of filling in data row by row.
> typeof(Hit.Data$Zone)
[1] "list"
What baffles me is that when I use apply and just tell it to put a number in it works correctly:
> Hit.Data$Zone = apply(Hit.Data, 1, function(x) 2)
> typeof(Hit.Data$Zone)
[1] "double"
I know R has a lot of strange behavior around dropping dimensions of matrices and doing odd things with lists but this looks like it should be pretty straightforward. What am I missing? I feel like there is something fundamental about R I am fighting, and so far it is winning.
Your problem is that you are occasionally looking up non-existing entries in your hashmap, which causes hash to silently return NULL. Consider:
> hash("890", hash("972"=3, "101"=3, "877"=3))[["890"]][["101"]]
[1] 3
> hash("890", hash("972"=3, "101"=3, "877"=3))[["890"]][["100"]]
NULL
If apply encounters any NULL values, then it can't coerce the result to a vector, so it will return a list. Same will happen with sapply.
You have to ensure that all possible combinations of the first three zip code digits in your data are present in your hash, or you need logic in your code to return NA instead of NULL for missing entries.
As others have said, it's hard to diagnose without knowing what zipToZipZoneMap contains, but you could try this:
Hit.Data$Zone <- sapply(Hit.Data$Zip.Code, function(x) lookupZone("89000", x))
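A minimal sketch of the "return NA instead of NULL" idea, using a plain nested list as a hypothetical stand-in for zipToZipZoneMap (the real object appears to be a hash, and the zone values here are made up):

```r
# Hypothetical stand-in for zipToZipZoneMap; the key "877" is missing on purpose
zone_map <- list("890" = list("972" = 3, "101" = 5))

lookup_zone <- function(source_zip, dest_zip) {
  source_key <- substr(source_zip, 1, 3)
  dest_key <- substr(dest_zip, 1, 3)
  val <- zone_map[[source_key]][[dest_key]]
  # A missing entry comes back as NULL; turn it into NA so that
  # sapply() can still simplify the result to a plain vector
  if (is.null(val)) NA_real_ else val
}

zips <- c("97222", "10100", "87700")
zones <- sapply(zips, function(z) lookup_zone("89000", z), USE.NAMES = FALSE)
typeof(zones)
```

With the NA fallback, every call returns a length-one value, so the result is a double vector that can be assigned to a data frame column directly.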
