I am trying to recode the following variable:
str(dades$Edat)
num [1:30000] 24 26 34 37 57 37 29 23 28 35 ...
Into this:
agrupar.edat<-function(x){
for (i in 1:length(x)){
if (x[i]>=21 & x[i]<30) {x[i]<-'1'} else
if (x[i]>=30 & x[i]<40) {x[i]<-'2'} else
if (x[i]>=40 & x[i]<50) {x[i]<-'3'} else
if (x[i]>=50 & x[i]<60) {x[i]<-'4'} else
if (x[i]>=60 & x[i]<70) {x[i]<-'5'} else
if (x[i]>=70 & x[i]<80) {x[i]<-'6'}
}
So I can put the results here:
edx<-agrupar.edat(dades$Edat)
But something is not working: edx always comes back as NULL.
Problem 1.
Your function has no return value.
As a result, it effectively reads like this:
agrupar.edat<-function(x){
# do stuff
# good bye
}
… so logically enough, nothing (NULL) comes out of it.
Try simply adding return(x) after the loop, just before the function's closing brace, and magic will happen.
Note, however, that your problem does not require a function. It requires…
Problem 2.
… using cut, as @akrun's comment instructs you to do.
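A minimal sketch of the cut approach, assuming the same age bands as in the question. The sample vector edat is hypothetical; the real input would be dades$Edat. Ages outside 21 to 79 come back as NA.

```r
# Hypothetical sample of ages; the real data would be dades$Edat.
edat <- c(24, 26, 34, 37, 57, 37, 29, 23, 28, 35)

# right = FALSE makes each interval [lower, upper), which matches the
# >= / < tests in the original function.
edx <- cut(edat,
           breaks = c(21, 30, 40, 50, 60, 70, 80),
           labels = as.character(1:6),
           right = FALSE)

print(edx)
```

The result is a factor with levels "1" to "6"; wrap it in as.character() if plain strings are needed.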
While working on exercise 2.2 of "Programming in Lua 4", I have to create a function that builds all permutations of the numbers 1-8. I decided to use Heap's algorithm and wrote the following script. I'm testing with the numbers 1-3.
In the function I store the permutations as tables {1,2,3}, {2,1,3} and so on in the local "a" and add them to the global "perm". But something goes wrong, and at the end of the recursion I get the same permutation in all slots. I can't figure it out. Please help.
function generateperm (k,a)
if k == 1 then
perm[#perm + 1] = a -- adds recent permutation to table
io.write(table.unpack(a)) -- debug print. it shows last added one
io.write("\n") -- so I can see the algorithm works fine
else
for i=1,k do
generateperm(k-1,a)
if k % 2 == 0 then -- builds a permutation
a[i],a[k] = a[k],a[i]
else
a[1],a[k] = a[k],a[1]
end
end
end
end
--
perm = {}
generateperm(3,{1,2,3}) -- start
--
for k,v in ipairs (perm) do -- prints all stored permutations
for k,v in ipairs(perm[k]) do -- but it's 6 times {1,2,3}
io.write(v)
end
io.write("\n")
end
debug print:
123
213
312
132
231
321
123
123
123
123
123
123
I am running the code below. It runs, but it shows no output:
for (name in tita$name){
if (tita$sex == 'female' && tita$embarked == 'S' && tita$age > 33.00)
{
print (name)
}
}
It's just showing me ****** in RStudio. When I check the dataset, it does contain females older than 33 who embarked from S, but this statement shows no result. When I change the value from 33 to 28, the same code does show a result. Why is that?
I am using the following dataset:
https://biostat.app.vumc.org/wiki/pub/Main/DataSets/titanic3.csv
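I think you're mixing loops and vectorization where you shouldn't. As I mentioned in the comments, your conditions are vectorized (tita$sex == 'female' compares the whole column at once), but it looks like you're trying to evaluate each element in a loop. In older versions of R, && silently used only the first element of each vector, so the test gave the same answer on every iteration and depended only on the first row; that is why changing 33 to 28 can flip the loop from printing nothing to printing every name. Since R 4.3.0 this is an error.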
You should do either:
# loop through elements
for (i in seq_along(tita$name)){
# isTRUE() treats a missing age (NA) as "no match" instead of raising an error
if (isTRUE(tita$sex[i] == 'female' && tita$embarked[i] == 'S' && tita$age[i] > 33.00)){
print(tita$name[i])
}
}
OR use vectorization (this will be faster and is recommended):
conditions <- tita$sex == 'female' & tita$embarked == 'S' & tita$age > 33.00
# which() drops rows where the condition is NA (e.g. a missing age)
matched <- tita$name[which(conditions)]
Here conditions is a logical vector of TRUE and FALSE values, TRUE where all the conditions are met (and NA where a value needed to decide is missing). We can use it to subset in R. For more information on what I mean by vectorization please see this link.
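To see the vectorized version at work without downloading the dataset, here is a small sketch on a made-up data frame. The rows and names are hypothetical stand-ins for titanic3, chosen so that one row has a missing age.

```r
# Hypothetical miniature stand-in for the titanic3 data.
tita <- data.frame(
  name     = c("Ann", "Beth", "Carl", "Dora", "Eve"),
  sex      = c("female", "female", "male", "female", "female"),
  embarked = c("S", "S", "S", "C", "S"),
  age      = c(29, NA, 40, 50, 45),
  stringsAsFactors = FALSE
)

# One logical value per row; Beth's missing age makes her entry NA.
conditions <- tita$sex == "female" & tita$embarked == "S" & tita$age > 33

# which() keeps only the positions that are TRUE, dropping FALSE and NA.
matched <- tita$name[which(conditions)]
print(matched)
```

Without which(), the NA entry in conditions would show up as an NA in the result.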
Below is my code:
my.dataset1<- data.frame(site=c(11,12,13,14),
season=c(21,22,23,24),
PH=c(1,2,3,4))
for i in names(my.dataset1){
for (j in nrow(my.dataset1)) {
print(my.dataset1$i[j])
}
}
What I want is for it to print:
11
12
13
14
21
22
23
24
1
2
3
4
What I actually get is
null
It does not work. I want to get the results just by for loop!
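The loop needs three changes. First, for requires parentheses around the loop specification: for (i in ...). Second, my.dataset1$i looks for a column literally named i, which does not exist, so it returns NULL; index by position with my.dataset1[j, i] instead, looping i over 1:ncol(my.dataset1) rather than names(my.dataset1). Third, j in nrow(my.dataset1) visits only the single value 4; use 1:nrow(my.dataset1) to visit every row. This will work for you: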
my.dataset1<- data.frame(site=c(11,12,13,14),
season=c(21,22,23,24),
PH=c(1,2,3,4))
for (i in 1:ncol(my.dataset1)){
for (j in 1:nrow(my.dataset1)) {
print(my.dataset1[j,i])
}
}
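If you would rather keep looping over the column names, as in the question, a sketch along these lines should also work. The vector printed is a hypothetical helper, added only so the output can be checked afterwards.

```r
my.dataset1 <- data.frame(site = c(11, 12, 13, 14),
                          season = c(21, 22, 23, 24),
                          PH = c(1, 2, 3, 4))

printed <- c()  # hypothetical helper: collects what was printed
for (i in names(my.dataset1)) {
  for (j in seq_len(nrow(my.dataset1))) {
    # [[i]] looks up the column whose name is stored in i,
    # whereas $i would look for a column literally called "i".
    value <- my.dataset1[[i]][j]
    print(value)
    printed <- c(printed, value)
  }
}
```

seq_len(nrow(...)) is used instead of 1:nrow(...) so that the inner loop is skipped cleanly when the data frame has no rows.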
I have several time-based datasets which are of very different scale, e.g.
[set 1]
2010-01-01 10
2010-02-01 12
2010-03-01 13
2010-04-01 19
…
[set 2]
2010-01-01 920
2010-02-01 997
2010-03-01 1010
2010-04-01 1043
…
I'd like to plot the relative growth of both since 2010-01-01. To put both curves on the same graph I have to normalize them. So I basically need to pick the first Y value and use it as a weight:
plot "./set1" using 1:($2/10), "./set2" using 1:($2/920)
But I want to do it automatically instead of hard-coding 10 and 920 as dividers. I don't even need the max value of the second column, I just want to pick the first value or, better, a value for a given date.
So my question: is there a way to look up the value of a given column that corresponds to a given value of the X column (X is a time axis)? Something like
plot "./set1" using 1:($2/$2($1="2010-01-01")), "./set2" using 1:($2/$2($1="2010-01-01"))
where $2($1="2010-01-01") is the feature I'm looking for.
Picking the first value is quite easy. Simply remember its value and divide all data values by it:
ref = 0
plot "./set1" using 1:(ref = ($0 == 0 ? $2 : ref), $2/ref),\
"./set2" using 1:(ref = ($0 == 0 ? $2 : ref), $2/ref)
Using the value at a given date is more involved:
Using an external tool (awk)
ref1 = system('awk ''$1 == "2010-01-01" { print $2; exit; }'' set1')
ref2 = system('awk ''$1 == "2010-01-01" { print $2; exit; }'' set2')
plot "./set1" using 1:($2/ref1), "./set2" using 1:($2/ref2)
Using gnuplot
You can use gnuplot's stats command to pick the desired value, but you must pay attention to do all time settings only after that:
a) String comparison
stats "./set1" using (strcol(1) eq "2010-01-01" ? $2 : 1/0)
ref1 = STATS_max
...
set timefmt ...
set xdata time
...
plot ...
b) Compare the actual time value (works like this only since version 5.0):
reftime = strptime("%Y-%m-%d", "2010-01-01")
stats "./set1" using (timecolumn(1, "%Y-%m-%d") == reftime ? $2 : 1/0)
ref1 = STATS_max
...
set timefmt ...
set xdata time
...
plot ...
I have the following array:
Year Month Day Hour
1 1 1 1 0
2 1 1 1 3
...
etc
I wrote a function which I then tried to vectorize by using apply in order to run calculations row-by-row basis, but it doesn't work due to the booleans:
day_in_season<-function(tarr){
#first month in season
if((tarr$month==12) || (tarr$month==3) ||(tarr$month==6) || (tarr$month==9)){
d=tarr$day
#second month in season
}else if ((tarr$month==1) || (tarr$month==4)){
d=31+tarr$day
}else if((tarr$month==7) || (tarr$month==10)){
d=30+tarr$day
#third month in season
}else if((tarr$month==2)){
d=62+tarr$day
}else{
d=61+tarr$day
}
h=tarr$hour/24
d=d+h
return(d)
}
I tried
apply(tdjf,1,day_in_season)
but it raised this exception:
Error in tarr$month : $ operator is invalid for atomic vectors
(I already knew about this potential pitfall, but that's why I wanted to use apply in the first place!)
The only way I can currently get it to work is if I do this:
days<-c()
for (x in 1:nrow(tdjf)){
d<-day_in_season(tdjf[x,])
days=append(days,d)
}
If there were only a few values, I'd throw up my hands and just use the for loop, efficiency be damned, but I have over 15,000 rows and that's just one dataset. I know that there has to be a way to make it work.
To vectorize your code, use ifelse() and | instead of ||:
ifelse(
(tarr$month==12) | (tarr$month==3) |(tarr$month==6) | (tarr$month==9),
tarr$day,
ifelse((tarr$month==1) | (tarr$month==4),
31+tarr$day,
ifelse((tarr$month==7) | (tarr$month==10),
30+tarr$day,
ifelse(tarr$month==2,
62+tarr$day,
61+tarr$day)
)
)
)+tarr$hour/24
You might be surprised at how quickly a well-constructed for loop can run. If designed well, it is about as efficient as an apply statement.
The proper for loop in your case is
tdjf$days <- vector ("numeric", nrow (tdjf))
for (x in seq_along (tdjf$days)){
tdjf$days [x] <- day_in_season(tdjf[x,])
}
If you really want to go the apply route, I would recommend rewriting your function to take three arguments -- month, day, and hour -- and passing those three columns into mapply.
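A rough sketch of that mapply approach follows. The function name day_in_season3, the lower-case column names, and the sample rows are assumptions made for illustration; the season logic is copied from the question.

```r
# Scalar version: takes one month, one day and one hour at a time.
day_in_season3 <- function(month, day, hour) {
  if (month %in% c(12, 3, 6, 9)) {        # first month in season
    d <- day
  } else if (month %in% c(1, 4)) {        # second month, after a 31-day month
    d <- 31 + day
  } else if (month %in% c(7, 10)) {       # second month, after a 30-day month
    d <- 30 + day
  } else if (month == 2) {                # third month in season
    d <- 62 + day
  } else {
    d <- 61 + day
  }
  d + hour / 24
}

# Hypothetical sample rows; the real data frame would be tdjf from the question.
tdjf <- data.frame(month = c(12, 1, 2), day = c(1, 1, 1), hour = c(0, 3, 12))

# mapply calls the function once per row, pairing up the three columns.
tdjf$days <- mapply(day_in_season3, tdjf$month, tdjf$day, tdjf$hour)
print(tdjf)
```

Because each call receives plain scalars, the $ operator is never applied to an atomic vector, which avoids the error raised by apply.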