DTACK GROUNDED, The Journal of Simple 68000 Systems
Issue # 2 August 1981 Copyright Digital Acoustics, Inc.
There seems to be a considerable amount of misunderstanding about the precision of various floating point software and hardware packages. In particular, a significant number of Apple II users who read our first newsletter have mentioned that no faster floating point computations are needed, since there are two manufacturers who already offer boards with the AMD 9511A math processor. Now, the 9511A is a very fine device if you can get by with 6 decimal digit precision. When performing graphic calculations, that is more than sufficient precision. For business uses, which generally require 12 to 16 decimal digit precision (depending on the size of the business and complexity of the calculations) 6 decimal digits is completely inadequate and the 9511A is unusable. On the other hand, the precision of the floating point as calculated by the 68000 is determined by the software. That is why we will make available 61 bit (13 decimal digit precision) and 80 bit (18 digits) floating point packages along with our Microsoft compatible 40 bit, 9 decimal digit precision package which is already completed.
Another matter which needs to be BRIEFLY discussed is the question: Which is the best microprocessor? We interpret "best" as meaning the greatest computational throughput. Using this criterion the 68000 wins hands down. To mention only two unique advantages of the 68000:
However, we feel that further discussion of this matter should be left to other publications. This is, after all the Journal of Simple 68000 Systems. People might suspect we were prejudiced, or something!
HOW TO SPEED UP INTERPRETIVE BASIC: By using one of our boards, of course! (If you are easily bored by excessive detail, skip past the asterisks on the next page). Seriously, the first thing to do is to replace the vectors to the four basic floating point routines (add, subtract, multiply, divide) with vectors to routines which send the appropriate command to the 68000 followed by the contents of the two F.P. accumulators. The 68000 then performs the operation and returns an error status byte, then the result, provided the error status byte is zero. Exclusive of time for the data transfer, the mathematical computation is about 13 times faster than in the 6502.
When the transcendental computations have all been converted to 68000 code (as the logarithm routine has already) the improvement in speed actually approaches 13X since the proportion of time exchanging data is much less. The constants used in the transcendental computations will then be maintained in 68000 memory space so that no data transfer between processors is involved.
Next, we identify the "find variable" routine which determines the location of a numeric variable in memory. Then we modify this routine so that the numeric variable is maintained in the 68000 memory space rather than in the 6502. Also, the numeric variable table will be maintained in the 68000 memory space.
Now we locate the "evaluate expression" routine. This is a recursive routine which determines mathematical precedence and converts ASCII constants to F.P. variables when necessary. It also calls the "find variable" routine when it encounters a numeric variable name. This routine is supposed to return the address of the variable to a utility pointer in zero page. It does, but since we have modified the "find variable" routine, the utility pointer in question is in the 68000.
A routine is generally called at this point to move the variable onto the stack. Since the 6502 stack isn't very big, it also checks for stack overflow and sends an "expression too complex" error message if necessary. We modify this routine to send a command to the 68000 to use the utility pointer in the zero page OF THE 68000 to place the numeric variable on a stack in the 68000 memory space. It is unlikely that we will overflow the stack area! Still, a zero error status byte should be returned to the 6502 to let it know the operation was a success.
Now, when the 6502 decides to move the number on the stack to the second F.P. accumulator and multiply F.P. accumulator #1, it instead simply sends a command to the 68000, which leaves the result in FPACC#1, which of course is in the 68000. After an expression is evaluated, the result is generally stored in a variable or printed. In either case, the 68000 can do this better than the 6502.
Above is a rough sketch of what needs to be done to speed up interpretive BASIC. The process we have outlined moves the floating point math to the 68000 first, followed by the transcendentals. The variables themselves are moved into the 68000 memory space. Then the interpretive process itself is transferred to the 68000 a little bit at a time. The eventual goal is to have a 68000 BASIC interpreter with the host processor performing I/O operations exclusively. By I/O we mean keyboard, CRT, printer, disk, tape (ugh!) etc.
This will take some time, but we will eventually wind up with a machine that runs interpretive BASIC programs about twelve times as fast as the host system does now (assuming the program is not I/O limited). Although the full conversion will take at least a year, relocating the floating point computations has already been accomplished and results in a two times speedup for average programs, more for scientific number crunching (and less for text editing).
We think we have figured out a way to run about 10 (ten) times faster very soon, like by October of this year. This is what we will discuss next.
LET'S DISCUSS COMPILED BASIC: We will limit our discussion to compiled basic which support floating point operations (the only generally useful kind). DTL in England makes such a compiler for the Pet/CBM series, and Hayden in this country publishes one for the Apple II. The Hayden ad on page 315 of the Aug. 81 BYTE magazine states that "the resulting binary program runs 3 to 10 times faster than normally interpreted code". We asked a friend who has an Apple and this compiler to benchmark the speed improvement of this program:
10 FOR I=100TO100000STEP100:A=LOG(I):NEXTI 20 PRINT"DONE":END
and guess what? The speed improvement was 8%. Instead of a "3 to 10" times speed improvement, the program did not even run one tenth faster! Wha hoppen??
What happened is that the compilation process greatly speeds up all of the non-F.P. operations, but can't do a thing about the F.P. execution time. Since the program listed above spends most of the time performing the LOG calculation in the interpretive mode, the compilation didn't help much. Which means the Hayden advertisement is, well, a little bit optimistic about the low end performance improvement (this shouldn't surprise anyone in our society who is over 8 years old).
THE NUMBER CRUNCHING ABILITY OF THE 68000 IS A PERFECT MATCH FOR A 6502 COMPILED BASIC! The compiled BASIC runs about 10 times faster than interpretive BASIC exclusive of floating point. Our 68000 floating point package is about 13 times faster than the 6502. How do we tie them together?
The basic compiler has to have a table with pointers to all the functions including the F.P. math. The difference is that it compiles machine code with some form of direct call, probably a subroutine call ($20XXXX) or jump ($4CXXXX) where XXXX is the address of the function. If we replace the four pointers to the math in the jump table of the compiler with addresses of short routines which shove the numbers up to the 68000, we have a very fast compiled basic.
We plan to find those pointers, locate the jump table in the compiler, and change the pointers so the math can be done by our board. Once this has been accomplished, we will publish the necessary instructions to accomplish this in the following issue of DTACK GROUNDED. We plan to begin by compiling the following program:
We will then save the compiled program onto disk and examine the disk image. This should give us a very good idea about how the add routines in Applesoft ROM is accessed by the compiler. Then we will identify the hooks to the other three math functions, then go look for the jump table. Simple, no?
Maybe not so simple, if the software publisher has write protected his compiler. It's not that we can't break the write protection (everybody can do that these days), it is just that DTACK GROUNDED is a business. As a business, we don't want to get into a legal hassle with any software house. In fact, we want to be very friendly with software houses because without software we won't sell many boards.
HOW ABOUT A 68000 COMPILED BASIC? It should be obvious that a compiled BASIC that runs in the 68000 would be much better than one which runs in the 6502. Regrettably, that which is obvious ain't necessarily so. First, there already exist 6502 BASIC compilers, but there are no 68000 BASIC compilers, as far as we know. A program which exists is greatly to be preferred over one which does not exist.
However, there is a more subtle reason why a 68000 compiler would not be, by itself, such a hot idea. Although that portion of the compiled code which replaces the 6502 interpreter would run about 100 to 150 times faster, there would be no increase in the speed of the floating point math as compared to the 6502-68000 combination. A speed mismatch would exist between the compiler and the F.P. operations, just as it does now in the Hayden Apple II 6502 compiler.
We would go so far as to say it makes no sense (for a 6502 host with a DTACK GROUNDED board) to write a compiled 68000 BASIC unless the F.P. bottleneck can be licked. If we want a good match, we need to perform a floating point multiply about 100 times faster than in the 6502. While we are daydreaming, it would be nice to have greater precision than the nine decimal digits of the standard 6502 Microsoft BASIC.
TADAA!! INTRODUCING THE INTEL 8087!! By a remarkable coincidence, Intel (for you 6502 fans, Intel invented the 8080. Before that, they invented the microprocessor) has introduced the 8087 Floating Point Math Processor. This chip leaps tall buildings with a single bound and is more powerful than a steam locomotive. Besides that, it performs a 15 decimal digit precision floating point multiply in 19 microseconds. That is 125 times faster than the 6502 9 digit multiply! THE 8087 IS AN ABSOLUTELY PERFECT MATCH FOR A 68000 COMPILED BASIC! Are we going to manufacture an 8087 board to hook into the DTACK GROUNDED expansion bus? Why would we do a silly thing like that?
Although the 8087 is not yet commercially available (because there are some bugs on the current mask iteration) we have a sample chip set which runs, provided the room is air conditioned and a fan blows air past the 8087. Also, not all of the commands work correctly on the sample chip. But Intel gave us that sample set a full year ago! Since then, there have been at least three more mask iterations, so some improvement has to have been made. The next mask is due the end of this year. In any event, we don't have a 68000 BASIC compiler yet. If we are lucky, both may arrive more or less simultaneously.
Please do not get the idea that DTACK GROUNDED is going to write a compiled BASIC. Our forte is hardware - very fast hardware - and we will depend on the professional software houses to develop high level languages for the 68000.
ABOUT THOSE SLOW $8000 - $30000 68000 SYSTEMS: Yes, $30000 and more (the Ruben 68000 business computer runs $40,000 to $120,000). Yes, SLOW compared to our DTACK GROUNDED boards once the 8087 is available. Like about 8 times slower, if the comparison is made using compiled BASIC. All the reasons we gave on the previous page (the second and third paragraphs from the bottom) as to why a 68000 compiled BASIC was not such a hot idea using the 68000 alone also apply to those $8000 - $30000 68000 systems you have been reading about!
Does all this cost a lot of money? Sure! A fully loaded 92K 68000 board will run about $1300. The 8087 board will be about $1000 initially ($400 to us for the board, $600 to Intel for the chip set. $1600 if you buy the combination from us) and maybe $600 for a compiled 68000 BASIC with hooks to the 8087 board. Say, $3000 all told.
For $3000 plus your existing system you get a computer with 100 times the computational throughput of the standard interpretive BASIC in your Pet/CBM or Apple II. You get 8 times the throughput of those $8000 to $30000 68000 systems. And you get to keep ALL of your existing software because you are not faced with buying a new computer (for $8000 to $30000) in order to take advantage of the 68000.
It is a little difficult to comprehend (for us, anyway) the full meaning of a 100X speed increase. For example, did you know that the Concorde supersonic transport does NOT fly 100 times faster than an athletic 13 year old girl can run? The Concord cruises at 1350 mph. An athletic 13 year old girl can easily run faster than 13.5 mph. 100X is two full orders of magnitude and takes you into a whole new realm of computational performance.
This is the point where any public relations or advertising type would jump up on the table and shout MAINFRAME PERFORMANCE! Well, anyone who knows anything about modern mainframes would know that it ain't so, not even close. But it is ironic that DTACK GROUNDED, which started because we wanted a simple board (and to hell with building PDP11/70 killers like everybody else working with the 68000), is going to wind up making PDP11/70 computational power available on your Pet or Apple! The PDP 11/70 is a very good minicomputer, NOT a mainframe, by the way.
In fact, with the 68000/8087 combination your computer will spend most of its time waiting on your floppy disk. For this reason, we plan to eventually make an interface board for hard disks (we said the 6502 was a good I/O processor, not a perfect one). The interface will couple to the Shugart Associates Standard Interface (SASI), if you must know.
But you will not have to wait for the 128K expansion board because that will be available before October is out.
Incidentally, for those who might think that the 8087 board is an afterthought, Intel GAVE us an 8087 chip set a full year ago because we planned to make a board to mount on the Pet. It was the Intel representative's suggestion that it would be a good idea to make math processor boards for microcomputer systems in general, not just the Pet.
When we began to look at the specifies of coupling the 8087 to a 6502 system, it became obvious that, while it could be done and would run arithmetic very, very fast, the 6502 was going to be a HORRIBLE bottleneck for all the other stuff a high level language had to do. For a good match with the 8087, a much better microprocessor was needed. Well, the very best microprocessor available is the 68000, and even IT isn't fast enough to run efficiently with the 8087 using BASIC unless a compiled BASIC is used. Like we said two pages ago, the 8067 leaps tall buildings with a single bound and is more powerful than a steam locomotive.
While we all wait for the 8087 to become available, the 68000 can be used to do your arithmetic slowly. By slowly, we mean ONLY 13-times faster than your 6502 based system.
Those of you who already know about the 8097 probably know that it will be marketed as a package with its associated 40 pin clock generator chip, called the 8086. We will in fact use the 8087 with this clock generator.
WHAT ABOUT OTHER HOSTS? Why aren't we offering a version of this board for TRS80 and S100 users? We will. It's just that we are most familiar with the 6502, so we chose the Pet/CBM and Apple II as the initial host systems. We are therefore able to play a significant role in helping to mate the host with the 68000 code.
After we have useful programs to demonstrate to others, like a compiled BASIC that really does run 10 times faster, that will be the time to corner some TRS80 sharpie and say: "Look. Here is some really useful 68000 stuff. Here, see it run. Here is a board with an interface to the TRS80. Now YOU find the hooks into your TRS80 system, you clever devil you!" Why should he find the hooks? So that he can sell them, of course.
Which hosts will we make an interface for? Those which: A) have a large enough user base to be attractive B) Have users who are interested and capable of doing the necessary "hosting" and C) where it also makes sense. We have had no less than three serious suggestions that we hook our board up to the Commodore VIC, which has a 22 column display. Frankly, that doesn't make sense to us right now.
Obviously, all of the I/O specific parts of any 68000 software will have to be modified for each program since all of the various candidate hosts have unique I/O configurations. This raises the question, what will be the differences on the 68000 side of the Interface between different host systems?
THERE WILL BE ABSOLUTELY NO DIFFERENCE BETWEEN THE SYSTEMS ON THE 68000 SIDE OF THE INTERFACE!! Aside from the amount of RAM available, all of the 68000 systems made by DTACK GROUNDED will be absolutely identical for the various host systems. Not similar. Not very nearly the same. Not "the same for all practical purposes". Identical.
We are, of course, talking about identical from a software environment standpoint, which is what the professional programmers - the ones we want to be friendly with - are interested in. It may be necessary to build a kidney bean shaped board to most efficiently interface with a particular host, but that is irrelevant as long as the monitor is the same, the handshake is the same, and the available RAM always starts at the same address and is contiguous upward into the address space.
(If you do not write software we suggest you skip past the next line of asterisks.) This means that a software house which writes some high level 68000 code for a DTACK GROUNDED board on one host can transport that code to a different host without changing any portion of the 68000 code. In fact, reassembly will not even be necessary because EVERYTHING related to the 68000 software environment will be identical. Once the 8087 arrives, it will interface to the 68000, not the host, so the 8087 code will also require no modification.
WRITE H.P. 9826 CODE ON A DTACK GROUNDED BOARD! When writing complex 68000 code, we suggest that a software house place what FORTH programmers call "stubs" on all I/O functions. The floating point links should be arranged so the computations can be performed either by the 68000 or a math processor such as the 8087. The resulting code can be "hosted" not only on the various hosts supported by DTACK GROUNDED interfaces, but also on the 68000 systems which have already been announced, such as the $10000 H.P. 9826, the $20000 to $35000 Q1 Corp. system and the $40000 to $120000 Ruben Corp. system. Later the code can be rehosted on the inevitable Apple Corp 68000 computer, the Radio Shack version, etc.
Why use one of our boards? First, to write 68000 software you have to have an operational 68000 processor. If you already own a CBM system or an Apple II system, our board with 92K RAM installed will be about $8700 less than the next cheapest useful alternative. Second, if you get into the habit early on of isolating the host specific portion of the code (usually the I/O) from the rest of your program, it will be easier later to host multiple non-DTACK GROUNDED systems such as the H.P. and the Q1.
Speaking of 68000 systems, it is rumored that one recently announced unit uses BANK SELECT memory (barf!). Now, as we mentioned on page 1 of this newsletter, one of the major advantages of the 68000 is its large linearly addressable memory space. Using bank select memory discards this advantage. Apparently the company had some bank select memory boards left over from a 6502 project.
Now is the time to cover some specifics regarding the DTACK GROUNDED 68000 boards. Since the software is more important than the hardware, we will discuss the interface and handshake first.
HOST TO 68000 INTERFACE DETAILS: When the power switch is turned on, the command of the system generally rests with the user via the keyboard, or, on some systems, commands stored on a disk are automatically executed. In either case, the host obviously has to be in charge. In fact, until some 68000 code is transferred from disk to 68000 RAM, the 68000 is useless.
Therefore, the host is initially in command of the 68000 board. It can reset the 68000 at any time under host software control. It can also send two bits to the 68000 status input port. One of the two bits lets the 68000 distinguish between data bytes and command bytes. The other bit is undefined. Aside from the software reset and these two bits, the data handshake is bilaterally symmetrical. When the sending processor writes a byte into its' data output port, a flip-flop is set high. This bit can be read on D7 of the receiving processor's status port and on D6 of the sending processor's status port.
When the receiving processor detects a high on D7 of its' status port, it can read the data (or command, in the case of the 68000) from its input port. Reading the input port automatically resets the flip-flop low. The sending processor can detect the low bit on D6 of its' own status port, so it knows that the receiving processor is ready for the next byte.
When one of the processors is drastically slower than the other, as is the case with a 1MHz 6502 and an 8MHz 68000, the slower processor need utilize full handshake on only the first byte of a block transfer. Subsequent bytes can be sent (or received) with full confidence that the 68000 has accepted the data (or sent a new byte). Since a 1MHz 6502 can send or receive a byte every 14 microseconds, the block transfer rate is over 70K bytes/second. If you have a typical floppy disk drive, loading a 35K block from disk to the host will require about 12 seconds, and another 1/2 second to send the data to the 68000.
Since the disk drive generally has the fastest transfer rate of any peripheral tied to the host, it should be obvious that the transfer of data back and forth between the processors will not be a bottleneck. Incidentally, for short bursts such as transferring the F.P. accumulators where straight line coding can replace the loops the burst transfer rate is even faster.
HOST TO 68000 INTERFACE SUMMARY: The host can write data, read data, read status and write status (two command bits and the 68000 reset line). The data is in 8 bit bytes. When reading status, the host looks only at D7 (data received?) or D6 (data accepted?).
The 68000 can read data, write data, and read status. The data is in 8 bit bytes. The status includes D7 (data received?), D6 (data accepted), D1 (undefined) and D0 (high= command, low= data). D1 and D0 are the two status bits written by the host into its' status port.
The host processor uses a total of two input ports and two output ports. If the R/W line is decoded, only two bytes of memory space are required (for a 6502, 6800 or 6869 host). The 68000 uses two input ports and one output port.
THERE IS MUCH TO BE SAID FOR SYSTEMS WHICH RUN FAST AND RELIABLY. The interface technique outlined above was chosen as on which could be duplicated on any host, so that our promise to the systems programmer of a universally uniform 68000 environment could be kept. This interface technique is very reliable, since no "logic race" problems are possible. It is also "user friendly" since the average programmer can understand all aspects of the handshake, and will therefore have no problem implementing his own transfers. Future issues of DTACK GROUNDED will include sample source code for data transfers. This includes 6502 source code and 68000 source code, since both processors are involved in any data transfer.
WHERE IS MY DMA AND WHERE ARE MY PRIORITIZED MULTIPLE INTERRUPTS? Sir, you have become lost. This newsletter is for SIMPLE 68000 systems. Not necessarily small. Definitely not slow. But simple.
For every red-hot supergenius programmer who is contemptuous of any computer with less than seventeen levels of prioritized interrupts, there are probably about 20 competent programmers who would just as soon use simple methods of data transfer, particularly if the simple method does not become a system bottleneck. The aforementioned RHSP is probably scornful of these simplistic types, and calls them peasants. We are friendly disposed toward them, and we would prefer to use the term "customer" to describe them.
However, the RHSP is going to have a lot of complex systems to choose from, since Motorola is promoting the 68000 exclusively as the "engine" for incredibly complex computer systems (have we said that before?). Every OTHER 68000 system we have seen seems to be following Motorola's lead. Dtack seems to spend a lot of time high in those systems, which means that the 68000 is slowed down.
EVERYBODY IS OUT OF STEP BUT DTACK GROUNDED! Since we seem to be a minority of one, why do so many people we have talked to seem excited about our product? We concede that there are some people who absolutely DEMAND DMA and an interrupt. but they are in the minority. These people will, of course, buy their 68000 product elsewhere.
Anyone who follows the electronic trade journals has read that the manufacturers of 16 bit microprocessors (Intel, Zilog, Motorola) are puzzled about the seeming lack of market acceptance of their products. One industry spokesman has gone so far as to state "the 16 bit microprocessor market is dead in the water". We believe that the way the 16 bit processors have been promoted (as vehicles for very complex systems) is largely to blame.
Motorola, for example, brought in a group of minicomputer people to market the 68000 rather than let the old 6800 (remember the 6800, the one with only two zeros?) crowd handle it. The old crowd was given the 6809 to market, and guess what? The 6809 is being designed into a whole bunch of systems, despite the fact that it is the SLOWEST microprocessor of its generation (the 8088 is contemporary and is MUCH faster). In the meantime, if you call on Motorola about the 68000, they try to sell you a $28000 minicomputer called the EXORCISOR. If you want to see the rapidly receding back of a Motorola 68000 product specialist, refuse to buy his $28000 minicomputer!
SOME HARDWARE DETAILS: Both the Pet/CBM and the Apple II version have provisions for up to 92K bytes of static RAM on board (but we will sell the board with as little as 4K on board). The RAMs we are using are the Toshiba TMM2016P, which are 2K X8 devices with very low power consumption because unselected devices automatically power down. The speed is 200 nsec which is a good match for an 8MHz 68000. 8MHz is our standard board speed. Since dtack is (very firmly) grounded, the CPU runs at full speed at all times.
A 10MHz 68000 can be used if the crystal is changed and 150 nsec versions of the RAM are used. If Motorola, or some other 68000 supplier, comes out with a 12MHz 68000, our board will run at this speed with the appropriate crystal and 100nsec RAMs. The Toshiba RAM is already available in 200, 150 and 100 nsec versions. The 150 nsec version is now almost double the cost of the 200nsec part, and the 100nsec cost is more than double.
These boards are offered for sale as industrial development tools, in the same sense as the Rockwell AIM-65 and the Motorola 68000 Design Module. We plan to develop shielded enclosures so that the boards can eventually be sold as Class A devices (for home use), but realistically FCC certification is at least six months away.
The Pet/CBM version is a 6.5 by 15 inch PC board which mounts INSIDE the Pet. Since most Pets have a metal case, the radiation problem may be minimized for that instrument. There is also a +5V power supply board which mounts inside the Pet and uses the 16VAC CT secondary on the power transformer. We believe that there is room for a 128K expansion board inside as well (and enough reserve power capacity in the transformer) although this has yet to be confirmed. Anybody who wants to mount 1/2 megabyte of expansion memory will have to be planning to move outside the instrument, which is possible. A separate power supply will then be needed, of course.
The good news about the Pet/CBM version is that 2K additional static RAM, mapped into $8800 - $88F8 of the 6502 address space, is included. The bad news is that it is necessary, on the CBM 8032, to cut one trace and mount a 74LS32 "dead bug" style to decode that RAM and the 68000 interface.
Although it is it principle possible to work with the 68000 purely in software using the "wedge", this would slow down the basic interpreter and offset a portion of the speed improvement which can be obtained by doing the math in the 68000. For this reason, the best way to transfer the math to the 68000 involves modifying one of the BASIC ROMS. The existing code has to be transferred into RAM from one of the 4K byte ROMS, a few bytes modified, and then the code "burned" into a 4K byte 2532 EPROM. The original ROM is then removed and replaced by the 2532. You will have to have access to a 2532 EPROM blaster, of course. This may be simpler if you belong to a club than otherwise.
Although we could do this easily ourselves and provide the EPROM ready to go, it is possible to interpret this as selling the (presumably copyrighted) code in the ROM. Therefore, until and unless we have permission (in writing) to do this you will have to do it yourself. Since you already own a copy of the code in the ROM, we believe this modification of the code falls under the "fair use" provisions of the copyright law.
This also means that you must have a Pet/CBM which uses 24 pin ROMs. So far we have identified the locations to be modified in both the 8032 (Basic 4.0) and in the 4016/4032 (Basic 3.0). If you do not have one of these models, don't order a board yet!
The 6502 utilities needed will be provided in a 2532, assembled to plug into the $A000 socket. Source code of the binary code in the EPROM will be included.
The Apple II version is also a 6.5 by 15 inch PC board which mounts OUTSIDE the Apple, plus a small parallel interface card which plugs into one of the I/O slots inside the Apple. There is a separate +5V power supply which is in a case and has its own power cord. Only that portion of the circuit on the small interface board is powered by the Apple, the remainder is powered by the separate power supply. Enough power is provided for a fully populated 92K 68000 board and one 128K expansion board. A shielded case and interface cable will have to be developed before the device can be certified as a Class A device.
We have not yet looked for the "hooks" into the Apple BASIC. We expect them to be the same as in the Pet/CBM since both machines use 9 digit Microsoft BASIC.
TERMS OF SALE: We want a check in the amount of 10% of the order to secure your order. When the order is complete, it will be shipped UPS COD for the balance of the order, shipping costs and, for California residents, 6% sales tax. We believe this is a fair arrangement since the buyer has to put up very little money in front and our shipping costs are covered by the 10% in case the COD is refused on delivery. We expect in the future to also offer VISA/Mastercharge orders via a special DTACK GROUNDED telephone. Either of these two methods will provide the rapid receivables turnover that we will need if the sales of these boards takes off!
We will NOT accept purchase orders for these boards at this time. Many large businesses are very slow paying their bills to small businesses like us. With the prime rate over 20%, an unpaid bill amounts to a no interest loan. The big Aerospace firms are particularly notorious for taking no interest loans from small businesses. Mr. big company purchasing agent, we don't care WHAT inviolable rules your company has, if you don't send us a 10% down payment, you don't have an order placed with us!
WARRANTY TERMS: We offer a standard limited warranty to U.S. customers. Limited means that we will only fix the board if it breaks. We will not reimburse you for the 2.7 billion dollar skyscraper that fell down because the board got busted, nor will we pay for any emotional strain on that account. Regarding shipping, you pay for shipping a busted board to us in Santa Ana, CA. and we will ship warranty repairs back to you prepaid. Just in case that isn't clear, you pay for shipping the board to us, we pay for shipping the board to you. We will not accept collect shipments.
If you send us a board for repair that doesn't need repairing, we do not feel that we should have to pay for the return shipment, and we won't. On the other hand, we will not bill you for the work we did proving the board was O.K. This seems like a fair exchange.
The duration of the warranty is one year or the first instant you touch the board(s) with a soldering iron, whichever comes first. You would be surprised how many amateur technicians think that a 150 watt American Beauty soldering iron plus acid core solder is essential to any board modification.
The warranty is transferrable. However. we emphasize that what we warrant is what we sell. If a 4K board that we sold comes back as a 92K board for warranty repair, we will return the board collect with a polite note stating that the person or company which sold you the board has assumed the warranty on that board, since it ain't what we sold!
PRICING POLICY: We have decided to adopt the radical for the computer industry policy of assigning one price to a particular model and configuration which is the same for all customers. We will maintain this policy through round one and into the start of round 2. We wish we could maintain that policy forever, but (sigh), life in the computer industry just ain't that simple.
SOFTWARE STATUS REPORT: Unchanged; we have been busy with the hardware!
HARDWARE STATUS REPORT: Production prototypes of the Pet/CBM version, with space for 92K RAM, have now been ordered (today's date is Aug. 26). When this board has been debugged, the artwork will be modified to an Apple II version.
Production Pet/CBM boards should be available before the next newsletter is out. The price includes source code for the necessary 6502 utilities, description of the "hooks" to the floating point routines, source code for the 68000 monitor and floating point routines, and (for now) a BASIC program listing for a "hand assemblers helper" for the 68000 which runs on the Pet/CBM. As this is being written, the program is incomplete and has bugs. But it was successfully used to produce the floating point package. The board? Oh, THAT!
Pet RAM 68K RAM without 68000 with 68000 2K 4K $450 $725 2K 28K $690 $965 2K 60K $1000 $1275 2K 92K $1300 $1575
If you order one of these boards for your Pet/CBM system, be sure to let us know your exact equipment configuration. The $450 price for the 4K version without the 68000 should be pretty stable over the next year. The price WITH the 68000 will drop as the Motorola price drops. The prices with large memory will remain stable until Toshiba can make more RAMs than they can sell (right now they can't do that), then that price will drop as the RAM prices fall. By the way, the Motorola price for 1 ea. 8MHz 68000 is now $144, down from $195 as of the last newsletter.
ACKNOWLEDGEMENTS: Apple and Apple II are trademarks of the Apple Computer Co. Pet and CBM are trademarks of Commodore Business Machines. PDP 11/70 is a trademark of the Digital Equipment Corp. EXORCISOR is a trademark of Motorola. DTACK GROUNDED is a trademark of Digital Acoustics, Inc.
If you would like to subscribe to this newsletter, send $15 for six issues to:
1415 E. McFadden, St. F
SANTA ANA CA 92705
Payment should be made to DTACK GROUNDED. The subscription will start with the first issue unless otherwise specified.
If you received this newsletter as a free sample, or via photocopy machine discount, this is the end. For subscribers, the next four RED pages are a tutorial on division using the 68000.
WELCOME TO REDLANDS!
ERROR CITY! While we were writing our Microsoft compatible floating point package, we ran across a substantial discrepancy (nearly a full least significant mantissa bit) between what the results of our divide routine, which used a combination of the 16 bit hardware divide and the 16 bit hardware multiply, and the Microsoft divide routine, which uses a conventional "shift and maybe subtract" routine (we aren't making fun of Microsoft; that's the only way the 6502 can do the job).
Because our divide routine represented our first try at using the hardware divide to perform extended division, we weren't sure which routine was the more accurate. Since, as mentioned in the white pages of newsletter #1, we intended to write an 80 bit floating point package anyway, which would involve a 64 bit mantissa and therefore a 64 bit into 64 bit divide routine, we decided to write such a routine using the conventional "shift and maybe subtract" routine, 6502 style.
6502 TO 68000 TRANSLATION: We hauled out one of our old 6502 divide routines and settled down for the arduous task of translating the code from 6502 to 68000. SURPRISE! The translation was VERY easy, and the code was greatly simplified (the necessary 64 bit subtract was two instructions, not 24!). We actually had the code completed and running in about 2 hours, most of which was for the terribly crude "hand assembler's helper" we now use.
64 BIT DIVIDE: Reference the source code on page 13. The two 64 bit integers are assumed to be in the FPACC#1 and FPACC#2 of the Microsoft routines. Therefore, the integer overflows all the way from the sign to the two guard bytes of each FPACC, or just barely 64 bits, eight bytes.
The first two instructions are long word moves (32 bits each) to fetch the code from RAM into D1 and D0. D1 is the high order long word. The next two instructions fetch the second number to D3 and D2. Then D7 is set for #64 loops, and the routine is entered at "LDIV2" since the state of the carry is both unknown and irrelevant at the start of the first loop.
DIVISION IS NON-COMMUTATIVE. When we explained the multiply routine last month, it didn't matter which number multiplied the other. But A/B does NOT equal B/A. To be compatible with the Microsoft floating point package, FPACC#2 is the numerator and FPACC#1 the denominator. Therefore, we divide D3, D2 by D1, D0.
ABOUT THE POST-DIVIDE NORMALIZATION PROBLEM: When we discussed the multiply routine in the last issue, you will recall that, for properly normalized F.P. integers, the integer result could range from 0.9999... to 0.250. If the result is less than 0.5, is was necessary to normalize the mantissa by shifting it one bit left, then reducing the exponent by one.
The divide has the opposite problem: while the result can never be less than 0.5, it can be greater than 1. In fact, the largest integer result is 1.999999... If the integer result is one or greater, the result must be shifted right one bit position and the exponent increased by 1.
The above explanation is technically correct and in fact describes how our algorithm using the hardware divide works (in part). The "shift and maybe subtract" algorithm, by the way it is implemented, changes this procedure so that if the result is correctly normalized in F.P. fashion with the most significant bit equal to 1, we increase the exponent by 1. If the result is not normalized, we leave the exponent alone but shift the result one bit left. So much for consistency, the darn algorithm works!
HOW THE LOOP WORKS: Again, refer to the source code. We enter the first time at LDIV2, and clear D6. Then we compare the high order 32 bits of the numerator (D3) to the high order bits of the denominator (D1). If D3 is greater than D1, the carry bit will be set, and we branch to SUB! where we subtract D1, D0 from D3, D2 and set the least bit of D6 to 1. If D3 is not greater than D1 and also unequal to D1, we branch to NOSUB, where we of course do not perform a subtraction and do not set the least bit of D6 to zero.
If D3 is neither greater than D1 nor unequal, then the two registers are equal (this will happen once every 32,678 times for random numbers!) and we have to compare the low order 32 bits of each number to determine whether the numerator is equal to or greater than the denominator. If so, we subtract the denominator from the numerator and set the least bit of D6 to 1 and proceed with the code at NOSUB.
At NOSUB, we first shift the least bit of D1 into the carry and extend bits, then we shift this bit into the least bit of D4 while shifting the most significant bit of D4 into the carry and extend bits. Then we shift the new extend bit into D5.
When the division is completed, the result will be in D5, D4 with D5 being the most significant 64 bits. Note that it takes only two instructions, 4 bytes, to shift a 64 bit number.
Now we shift the 64 bit numerator one bit left by first performing an LSL on D2, then an ROXL on D3. IT IS VERY IMPORTANT TO NOTE THAT THIS MAY RESULT IN THE CARRY AND EXTEND BITS BEING SET! If the most significant bit of the numerator was a 1 prior to the shift, then the carry and extend bits will be set.
How can this happen? If the numerator is, for example, $80000000 and the denominator is $A0000000 at the start of the loop, no subtraction will take place. But the most significant bit of the numerator is a 1 and this will then be shifted into the carry and extend bits. Now we decrement the lowest 16 bits of D7 and loop to LDIV1 if the result is not $FFFF.
If the carry bit is set at LDIV1, we know IMMEDIATELY that the numerator is larger than the denominator, so we immediately branch to SUB! This is why we skipped over this conditional branch when we entered the first time.
WHAT HAPPENS AFTER WE DO 64 LOOPS? Once the DBF instruction results in a value of $FFFF in D7, instead of looping we continue the code at $44BA. At this point we have a value in D5, D4 which is either in correctly normalized F.P. form or one which will be correctly normalized if we shift the 64 bit number 1 bit left.
So we move D5 to D7 in order to set or clear the minus flag. If the most significant bit is one, the minus flag will be set and we will take the BMI LDIVX branch, which exits to the DSPREG (display the 68000 registers and flags) routine. Otherwise, we shift the 64 bit result one bit left and place $AAAA in D6 to indicate this happened, and exit to the DSPREG routine.
THAT LAST BUNCH OF CODE WAS SLOPPY! The way the final test for a shift was performed, the least significant bit will always be zero if the shift is performed, and that is not right! The result IS accurate to 63 (not 64) bits, and that is more than adequate to check the accuracy of our 32 bit divide routine and Microsoft's divide routine.
This will be fixed, of course, when this is developed into the divide for our forthcoming 80 bit floating point package.
Incidentally, we apologize for the compressed form of the 68000 source code on the preceding page. We generally prefer a blank line between each 68000 instruction for readability. However, the bozo who writes the white pages overran his allotment this month, so we're restricted to four pages.
The 64 bit divide routine is at this moment just a utility. On the last line it jumps to another utility called DSPREG. DSPREG calls a very simple 68000 routines which sends the data and address registers to the host, followed by the status register. That's 66 bytes in all. The host then has a utility which formats this data and presents it on the CRT.
If we divide $8000000000000000 by $A000000000000000 and then jump to the DSPREG utility, the CRT display will look like this on a CBM 8032:
D0= $00 00 00 00 D4= $CC CC CC CC A0= $00 00 44 80 A4= $FF FF 7F EF D1= $A0 00 00 00 D5= $CC CC CC CC A1= $E7 FF 4F EF A5= $FF FF 79 FF D2= $00 00 00 00 D6= $FD FF AA AA A2= $7D FF 7F 7E A6= $FF EF FF FF D3= $80 00 00 00 D7= $66 66 66 66 A3= $FF 7F F7 FF A7= $00 00 00 00 T= 1 S= 1 I= 7 X= 0 N= 1 Z= 0 V= 0 C= 1
Note that the divisor in D1, D0 remains unchanged. The numerator appears unchanged, but that is an accident due to the oddball numbers we chose to divide. The result is $CCCCCCCCCCCCCCCC, which is found in D5, D4. The lower 16 bits of D6 is $AAAA, indicating that a shift was necessary to leave the result in normalized form. This can be confirmed by examining D7, which is equal to D5 BEFORE the shift. Note that the most significant bit is a zero.
Also note that A0 contains $00004480, which just happens to be the starting address of the 64 bit divide routine. The DSPREG is a very useful utility to find out what is happening in the 68000!
INTRODUCTION TO THE 68000 HARDWARE DIVIDE: DIVU is a 16 bit unsigned hardware divide. The source is a 16 bit operand which is divided into the destination. The destination is a 32 bit operand and MUST be a data register. The source can be any data addressing mode (we are going to HAVE to discuss addressing modes sometime), but for simplicity we will stick to another data register as the source. The numerator is the destination register, the denominator is the source. The upper sixteen bits of the source register are "don't cares".
AFTER the divide, the source is unchanged and the result is in the lower 16 bits of the destination register. The remainder is left in the higher 16 bits of the destination register (the result and remainder are in reverse order in the sense that the 6502 address is usually in reverse order).
THE NUMERATOR MUST BE LESS THAN THE DENOMINATOR! If the upper 16 bits of the destination operand are equal to or larger than the 16 bits in the source operand, the destination register remains unchanged, no division takes place, and the instruction terminates with the V flag set to 1. When this happens, the instruction execution time is much faster than usual.
EXAMPLES: (BEFORE) SOURCE DESTINATION (AFTER) DESTINATION V FLAG $8888 $AAAA0000 $AAAA0000 1 $A000 $80000000 $8000CCCC 0 $A000 $0000FFFF $5FFF0001 0 $8123 $8456789A $8456789A 1 $889A $81234567 $1799F203 0
This discussion will continue in the next issue.
1 OPT P=68000,BRS,FRS 004480 2 ORG $004480 3 4 * MOVE DENOMINATOR TO D1, D0 5 004480 2238 7028 6 LDIV MOVE.L S1.W,D1 004484 2038 702C 7 MOVE.L 2+M1.W,D0 8 9 * MOVE NUMERATOR TO D3, D2 10 004488 2638 7030 11 MOVE.L S2.W,D3 00448C 2438 7034 12 MOVE.L 2+M2.W,D2 004490 3E3C 003F 13 MOVE.W #$003F,D7 004494 60 02 14 BRA LDIV2 15 16 * THIS IS THE MAIN LOOP, EXECUTE 64 TIMES 17 004496 65 0E 18 LDIV1 BCS SUB 004498 4246 19 LDIV2 CLR.W D6 20 21 * COMPARE THE HIGH ORDER 32 BITS 22 00449A B283 23 CMP.L D3,D1 00449C 65 08 24 BCS SUB 00449E 66 0C 25 BNE NOSUB 26 27 * THE HIGH ORDER 32 BITS ARE EQUAL; 28 * COMPARE THE LOW ORDER 32 BITS 29 0044A0 B082 30 CMP.L D2,D0 0044A2 65 02 31 BCS SUB 0044A4 66 06 32 BNE NOSUB 33 34 * NUMERATOR IS GREATER THAN OR EQUAL TO THE 35 * DENOMINATOR; SUB DENOMINATOR FROM NUMERATOR 36 0044A6 9480 37 SUB SUB.L D0,D2 0044A8 9781 38 SUBX.L D1,D3 39 40 * SET D6, BIT 0 - 1 41 0044AA 5246 42 ADDQ.W #1,D6 43 44 * SHIFT A 1 OR A 0 INTO THE CY FROM D6, BIT0 45 0044AC E24E 46 NOSUB LSR.W #1,D6 47 48 * SHIFT THE CY AND RESULT (64 BITS) 1 BIT LEFT 49 0044AE E394 50 ROXL.L #1,D4 0044B0 E395 51 ROXL.L #1,D5 52 53 * SHIFT THE NUMERATOR 1 BIT LEFT 54 0044B2 E38A 55 LSL.L #1,D2 0044B4 E393 56 ROXL.L #1,D3 0044B6 51CF FFDE 57 DBF D7,LDIV1 58 *( LOOP FOR 64 BITS ) 59 60 * TEST WHETHER THE MSB IS 1. IF NOT, SHIFT THE 61 * RESULT ONE BIT LEFT AND LEAVE $AAAA IN D6 62 0044BA 2E05 63 MOVE.L D5,D7 0044BC 6B 08 64 BMI LDIVX 65 0044BE E38C 66 LSL.L #1,D4 0044C0 E395 67 ROXL.L #1,D5 0044C2 3C3C AAAA 68 MOVE.W #$AAAA,D6 69 0044C6 4EF8 0274 70 LDIVX JMP DSPREG 71 # 00000274 72 DSPREG EQU $000274 # 00007028 73 S1 EQU $007028 # 0000702A 74 M1 EQU $00702A # 00007030 75 S2 EQU $007030 # 00007032 76 M2 EQU $007032