Bugs filed versus the phase of the moon.
One of the more classic jokes about not-yet-understood bugs is that the phase of the moon is somehow involved in causing them.
Being bored, I decided to spend some time to see whether "date of bug filed" could somehow be correlated with "phase of the moon" for a given source package.
Fast forward an hour of perl experimenting, and here we are:
    #!/usr/bin/perl -w

    use strict;
    use warnings;
    use constant PI => 3.1415926535;
    use feature "say";

    use SOAP::Lite;
    use Astro::Coord::ECI::Moon;

    my $soap = SOAP::Lite->uri('Debbugs/SOAP')->proxy('http://bugs.debian.org/cgi-bin/soap.cgi');

    if (!defined($ARGV[0])) {
        die "E: must have a source package!\n";
    }

    # Fetch the bug numbers for the source package, then their status records.
    my @bugs = $soap->get_bugs(src=>$ARGV[0])->result();
    my $bugsdata = $soap->get_status(@bugs)->result();

    my $moon = Astro::Coord::ECI::Moon->new();
    my %count = ( 'new' => 0, 'first' => 0, 'full' => 0, 'last' => 0);

    foreach my $bug (keys %$bugsdata) {
        # {date} is the submission time in seconds since the epoch.
        my $time = $$bugsdata{$bug}->{date};
        # phase() returns an angle in radians: 0 is new moon, PI is full moon.
        my $phase = $moon->phase($time);
        # Sort the angle into one of four 90-degree buckets, each centred on a phase.
        if ($phase <= 45 * PI / 180 || $phase > 315 * PI / 180) {
            $count{'new'} = $count{'new'} + 1;
        } elsif ($phase <= 135 * PI / 180 && $phase > 45 * PI / 180) {
            $count{'first'} = $count{'first'} + 1;
        } elsif ($phase <= 225 * PI / 180 && $phase > 135 * PI / 180) {
            $count{'full'} = $count{'full'} + 1;
        } elsif ($phase <= 315 * PI / 180 && $phase > 225 * PI / 180) {
            $count{'last'} = $count{'last'} + 1;
        }
    }

    say "Number of bug submissions during new moon     : " . $count{new};
    say "Number of bug submissions during first quarter: " . $count{first};
    say "Number of bug submissions during full moon    : " . $count{full};
    say "Number of bug submissions during last quarter : " . $count{last};
This uses the (not packaged in Debian) Astro::Coord::ECI::Moon perl module.
Use like so:
    wouter@carillon:~code/perl$ ./debbugsmoon nbd
    Number of bug submissions during new moon     : 2
    Number of bug submissions during first quarter: 10
    Number of bug submissions during full moon    : 1
    Number of bug submissions during last quarter : 3
Apparently there's a reasonable correlation between 'the moon is in the first quarter' and 'people file bugs on nbd'.
Note: no, the above is probably not very scientific. That's not the point.
This line doesn't seem correct:
Shouldn't it be $count{'full'} + 1?
Assuming your count is correct, it's even highly statistically significant (p<0.001). GNU R says:
    data:  c(2, 10, 1, 1)
    X-squared = 16.2857, df = 3, p-value = 0.0009908

    Warning message:
    In chisq.test(c(2, 10, 1, 1)) :
      Chi-squared approximation may be incorrect
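For anyone who wants to check that arithmetic without R, here is a minimal sketch in Python. It assumes the null hypothesis is that bugs are spread evenly over the four phases, and it uses a closed-form tail probability that only holds for 3 degrees of freedom. The function names are made up for this illustration.

```python
import math

def chi_square_uniform(counts):
    """Pearson chi-squared statistic against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

def p_value_df3(x):
    """Upper-tail probability of a chi-squared variable with df = 3.

    Closed form for exactly 3 degrees of freedom; not valid for other df.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

counts = [2, 10, 1, 1]  # new, first, full, last, as passed to chisq.test above
stat = chi_square_uniform(counts)
print(f"expected per phase = {sum(counts) / len(counts)}")
print(f"X-squared = {stat:.4f}, p-value = {p_value_df3(stat):.7f}")
```

The expected count per phase here is 14 / 4 = 3.5. R prints that warning when an expected count falls below 5, which is the usual rule of thumb for trusting the chi-squared approximation.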
It's not an accurate count, actually. There was a small bug in my original implementation; I fixed the implementation, but not the count in this blog entry. Fixed that now.
With those new numbers, it's still pretty statistically significant, I'd say:
    X-squared = 12.5, df = 3, p-value = 0.005853
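With only 16 bugs, the chi-squared approximation is shaky, so an exact check is cheap enough to try. The sketch below (Python, hypothetical function names) enumerates every way 16 bugs could fall into four equally likely phases and adds up the multinomial probability of every table whose statistic is at least as large as the observed one. It is one possible definition of an exact p-value, not what chisq.test computes.

```python
from math import factorial

def chi_square_uniform(counts):
    """Pearson chi-squared statistic against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

def compositions(n, k):
    """Yield every way to split n items over k ordered cells."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def exact_p_value(observed):
    """Probability, under equal cell probabilities, of a statistic >= the observed one."""
    n, k = sum(observed), len(observed)
    # Small tolerance so that tables tying with the observed statistic are counted.
    threshold = chi_square_uniform(observed) - 1e-9
    total = 0.0
    for table in compositions(n, k):
        if chi_square_uniform(table) >= threshold:
            ways = factorial(n)
            for c in table:
                ways //= factorial(c)  # multinomial coefficient, exact integer division
            total += ways / k ** n
    return total

print(exact_p_value([2, 10, 1, 3]))  # new, first, full, last
```

For four cells and 16 observations this loops over fewer than a thousand tables, so brute force is fine; for larger counts a simulation would be the more practical route.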
Yeah. Unfortunately, what probably kills it is the assumption of independence. If any of these bugs are related (e.g. someone tried the program for the first time, found three bugs, and filed all of them at once), the model is invalid.
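One crude way to soften that problem would be to count a burst of reports from the same person as a single observation before tallying phases. The sketch below is Python with made-up data; the "originator" and "date" keys are assumed to mirror the fields in the bug status records, and "same submitter, same UTC day" is an arbitrary definition of a burst. It does not make the remaining reports truly independent.

```python
from datetime import datetime, timezone

def collapse_bursts(reports):
    """Keep only the earliest report per (submitter, UTC calendar day)."""
    seen = set()
    kept = []
    for report in sorted(reports, key=lambda r: r["date"]):
        day = datetime.fromtimestamp(report["date"], tz=timezone.utc).date()
        key = (report["originator"], day)
        if key not in seen:
            seen.add(key)
            kept.append(report)
    return kept

# Made-up example: one submitter files three bugs within twenty minutes.
reports = [
    {"id": 1, "originator": "alice@example.org", "date": 1_200_000_000},
    {"id": 2, "originator": "alice@example.org", "date": 1_200_000_600},
    {"id": 3, "originator": "alice@example.org", "date": 1_200_001_200},
    {"id": 4, "originator": "bob@example.org", "date": 1_200_000_900},
    {"id": 5, "originator": "alice@example.org", "date": 1_203_000_000},
]
kept_ids = [r["id"] for r in collapse_bursts(reports)]
print(kept_ids)
```

The phase counts would then be computed from the collapsed list rather than from every bug number.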
/* Steinar */