Sunday, June 15, 2008

Shell Scripting, Factorial Primes And Huge Number Computations On Linux And Unix

Hey There,

I know it's the weekend when I don't get any hate mail regarding posts like yesterday's on producing prime numbers with the bash shell ;)

One truth about prime numbers is that, except for 5, no number ending in 5 can be a prime. Logic dictates that it will, of course, be divisible by itself and by the number 5, thereby removing it from the prime pool. Another thing you may have noticed is the output from the script itself which showed that it found 35 to be a "prime number." It also missed quite a few after 19 (23, 29 and 31, to be specific).

There's a reason I let it go; and not just so I'd have something easy to write about on a "Lazy Sunday" ;) I actually tested this script both on an OpenBSD box and Cygwin for Windows (Unfortunately, I have no Unix computers at my disposal on the weekends, but Linux is just as capable an OS). Neither one could get past 19, when listing out primes; both related to the same two basic problems. The two situations that resulted, when trying to determine higher primes, both had to do with my system limitations, so you might have been able to get higher numbers than I did before the script went South on you :)

The first issue is the theorem we used to determine the primes. Since Wilson's Theorem depends on factorial computation in order to determine a number's prime status, each succeeding number requires an additional operation, which make the numbers required to test divisibility grow at an alarming rate.

For instance, the factorials of 2, 3 or 4 are trivial:

2 = 2 * 1 --> Factorial equals 2
3 = 3 * 2 * 1 --> Factorial equals 6
4 = 4 * 3 * 2 * 1 --> Factorial equals 24


and you can see how the numbers can begin to get large very quickly.

The second issue is the capacity of the Linux or Unix operating system to handle extremely large numbers. Although both of my systems are running 32 bit OS's, the 64 bit OS will have to eventually fail as well. On a 32 bit system, the magic number 20 is where the wrap-around occurs and the numbers become so large that the system can misinterpret them as negative something-or-other. Here are my two example outputs (truncated):

For Cygwin on 32 bit XP, you can see this in the script when you invoke bash with the -x flag, as it hits the breaking point at the number 22 (soooo close to 23 ;)

...
+ count=22
+ '[' 22 -le 22 ']'
++ prime_number 22
++ local prime=22
++ p_minus_1=21
+++ factorial 21
+++ local factorial_count=21
+++ '[' 21 -eq 0 ']'
+++ (( factor=20 ))
+++ (( 20 >= 1 ))
+++ factorial_count=420
+++ (( --factor ))
+++ (( 19 >= 1 ))
+++ factorial_count=7980
+++ (( --factor ))
...
+++ (( --factor ))
+++ (( 2 >= 1 ))
+++ factorial_count=-4249290049419214848
...


at this point, the test for divisibility becomes unreliable, at best.

On OpenBSD, 32 bit, the results were exactly the same:

...
+++ (( --factor ))
+++ (( 2 >= 1 ))
+++ factorial_count=-4249290049419214848
...


Naturally, then, every number tested for "prime status" by our prime number shell script, beginning with 22 couldn't be tested correctly, resulting in many more surprising results like 35 ;)

In any event, hopefully, the script helped make at least one method to determine primes more accessible or understandable. As we noted, also, in yesterday's post, no one has been able to define the largest prime number yet. Yesterday's script gives you a pretty clear idea of the reason. When faced with an infinite number of integers, the requirements for calculation, alone, make this an impossible task. But, basically, the reasons all boil down to the same thing (although, if you look around the web, there are a few unprovable - in the absolute sense - theories that claim the number can be known):

The nature of integers is infinite. If you think of the highest number you can and add 1, you're one step closer to being frustrated for the rest of your life ;)

Also, for clarity's sake, since the mathematics in my head don't always match what I type, I thought I'd save you some paraphrasing and just dump two definitions here on the page; for composite numbers and relative primes (with my side-notes and apologies):

Relative Primes: The integers a and b are relatively prime if they have no common factor other than 1 or, equivalently, if their greatest common divisor is 1 (Note that this was correctly defined yesterday, but not thoroughly enough, since I didn't mention the fact that, although the numbers can't be primes and must have the greatest common factor of 1, I neglected to mention that neither pair of numbers could have any other common factors. A fairly large distinction, but lost in the mix when I was typing. For instance, 6 and 35 are relative primes, while 6 and 36 aren't, since they both share a common factor of being divisible by 3. ...My mind can be a terrible thing ;)

Composite Numbers: These are any positive integers which have a positive divisor other than one or themselves (While these can be derived by multiplying any two prime numbers, which I pointed out, the less complicated definition is that composite numbers are any number that isn't a prime. The multiplication of any 2 primes to create a composite is more of a cool little frill than a necessity. It's a lot easier to wrap your head around when you look at it that way ;)

But, as promised, with regards to composites , we won't put a script out, so much as a description of a method, since the same mathematical boundaries would be hit performing this operation:

1. Run yesterday's script, or pick up a prime number table that will start with 2; the first real prime (check out yesterday's post on producing prime numbers for the reasons 1 and 0 aren't considered primes).

2. Remove every number in the prime table from the larger set of all numbers.

Voila: You have all of the composite numbers, although you will most probably never ever find them all ;)

Here's to another month or so of non-math-related posts ;)

Cheers,

, Mike