Machine Problem 4: Fractals
| Assigned |
Thursday, 16 March 2006 |
| Due Date |
Wednesday, 5 April 2006 |
| Purpose |
Learn 32-bit protected mode programming including
high resolution graphics, floating point and MMX operations, and
DPMI mouse interupts. |
| Points |
70 |
In this MP, you will write a program that generates the Mandelbrot set
fractal. The user will be able to click on a point in the display area and
get a more detailed view of that portion of the fractal. Once the new view
has been computed the program will use MMX instructions to fade from the old
image to the new. The color scheme used in painting the fractal will be
computed at runtime by linearly interpolating between color values specified
in a global table.
Reading:
Differences Between Real Mode and Protected Mode (Lab Notes, Chapter 17),
Introduction to PModeLib (Lab Notes, Chapter 18),
PModeLib Reference (Lab Notes, Appendix A),
MMX:
MMX Technology Overview (Introduction and section on alpha blending),
8087 Floating Point:
Art of Assembly (Section 14.4 recommended)
Mandelbrot set:
MathWorld and
a more intuitive explanation
Screen Dump
Here are a couple of possible outputs for the program. Your program
should support a wide variety of size and color configurations as suggested
by these:
Program Specification
This program has several major components that work together to produce
the beautiful images above. Trying to explain them all at once in great
detail would invite unnecessary confusion, so instead each component will be
presented on its own first and the relationships between the components
described later.
This bottom-up approach to explaining the MP is a bit different than the way
previous MPs have been specified. As a result, variables and subroutines will
be explained as they come up rather than all clumped together.
Before we begin however, it should be pointed out that there is a significant
amount of information needed to successfully complete this MP. This write-up
cannot cover every detail, so things like the semantics of MMX and floating point
instructions will be left to you and your group members to figure out. While,
it is a lot of information, you have several team members to help in this process
and there are a great number of resources availiable to you including lectures,
the reading list at the top of this document, your professor and TAs, and above
all else, Google.
Protected Mode
The first hurdle will be to acclimate yourself to protected mode. There are a
few things to keep in mind:
- Protected mode is a 32-bit mode. All the registers you're used to have a
a 32 bit version with the same name but prefixed by an 'E' (AX -> EAX) etc.
While these registers are 32-bit now, they can still be accessed as 16 or
8-bit registers as was the case in real mode (e.g., AL, AH, AX, and EAX are
all accessible registers).
- The exceptions to this rule are the segment registers (CS, DS, ES) which
keep their old names and gain a few new friends (FS and GS). They behave
rather differently in protected mode but outside the mouse handling routines
you should not need them.
- Addresses are also 32-bit, so you'll need to use the 32-bit versions of
registers to access memory locations.
- The good news is that addressing modes are more flexible in protected
mode. For example, there are no restrictions on which of the 32-bit registers
may be used for indexing. Furthermore, you can do handy things like multiple
a register by four inside of braces to do something like mov eax, [myvar + ecx + 4 *
esi] that would make a RISC designer cry.
- The call instruction still works and is good for subroutines, but for
functions you'll be better served by the proc, endproc, and invoke macros.
Together these macros work to enforce the C-style calling conventions
implemented manually in previous MPs. As a quick summary, the syntax is
something like this:
invoke _AddNums, dword 2, ebx ; returns 2 + ebx in eax
...
; AddNums(firstNumber, secondNumber) returns the sum of two numbers
proc _AddNums
.firstNumber arg 4 ; the first argument is 4 bytes
.secondNumber arg 4 ; the second argument is also 4 bytes
.sum local 4 ; define a 4 byte local variable on the stack
mov eax, [ebp + .firstNumber], ; eax gets the first argument
mov [ebp + .sum], eax ; store the first argument in the local variable sum
mov eax, [ebp + .secondNumber], ; eax gets the second argument
add eax, [ebp + .sum] ; eax = first argument + second argument
ret ; still need a return - endproc does not do this for you
endproc
_AddNums_arglen equ 8 ; define this value to be the total size of all the arguments
- As you can see, local variables can be declared and used almost exactly like
arguments. Furthermore, to access either of these, the pointer ebp is needed.
Most of what the proc macro does is push ebp and copy esp into ebp so don't change
ebp if you need to access stack data.
- A local variable in the code above is used not just to illustrate the use of the
'local' macro, but also to avoid modifying registers. One thing 'proc' does not
do for you is preserve any registers except for ebp.
- The protected mode debugger, cv32, that is availiable may initially seem limited
but it will be of tremendous benefit, especially for debugging your floating point
and MMX routines.
Fractal Generation
Next we'll focus on the computation of the the Mandelbrot set. The
Mandelbrot set is just that, a set. It contains certain numbers in the
complex plane and not others.
The rule for a point at complex coordinate C being in the set is that the
iterative equation Z = Z^2 + C does not diverge when the initial value of
Z is taken to be zero. Note that Z is a complex value so computing Z^2
is more involved than just squaring a real-valued number. A straight binary
representation of the set looks something like this:
As can be seen from this image, the set is bounded spatially; there is some
radius, r, such that no coordinate, C, whose magnitude is greater than r
(|C| > r) is contained by the set. This radius happens to be two.
So, in its simplest form, all a program to generate the Mandelbrot set needs
to do is to go through all the pixels in the current view, figure out what
complex coordinates they correspond to, and see whether or not iterating on
the equation above for that pixel's value of C ever gives a Z with magnitude
greater than 2. If so, draw the pixel white, otherwise draw it black and
you've got yourself the image above.
The obvious flaw with this plan is that all the black points were by
definition the ones that never stopped iterating. So if you want your
program to run in finite time (always a good goal), you eventually
have to give up on iterating and assume that since it's taken X number of
cycles and we haven't seen it diverge that it will never diverge. This
means we can't really compute the true Mandelbrot set.
An approximation can look pretty good though if we choose X to be high
enough that not many points from our current view require more than X
iterations to diverge. Generally speaking, the more you zoom in on the
set, the more iterations are required. So the first MP4 function we
specifically describe is one that adjusts the maximum number of iterations
(which we've been referring to as X), depending on the size of the current
view.
_SetLimit just implements the equation
_iterationLimit = 15 * (_timesZoomed + 1) * (_maxX - _minX + 1)
There is nothing magical about this equation, it's simply a heuristic that
gives okay results. As you can see, it combines information about
the number of times the user has zoomed in with the width of the current
view. As you go deeper and deeper, _timesZoomed increases while
(_maxX - _minX + 1) goes to 1. Overall, it just increases the number of
iterations linearly as you go deeper. The variables used to express the
equation are all actual variables from the given mp4.asm file. They
are:
- _minX, _maxX, _minY, _maxY: Quadwords
(64-bit locations containing double precision floats in this case) that
contain the coordinates of the bounds of the current view.
- _timesZoomed: Byte initialized to
zero and incremented everytime the user zooms in.
- _iterationLimit: Word containing
the maximum number of times to apply the Z = Z^2 + C equation to a point
before giving up and calling the point a member of the Mandelbrot set.
Note that since _minX and _maxX are floating point values, at least part
of the computation for this routine will need to be done using floating
point instructions. A few points of advice:
- Remember that the floating point registers (st0-st7) are
organized as a stack. That means that if you write the expressions you want to
compute in Reverse Polish Notation (RPN), then you can directly translate
from expression to a sequence of floating point instructions that operate
on the top elements of the stack (look at the "f_operation_p" instructions
in particular if you want to do things this way).
- The floating point instruction reference in the manual is great for
figuring out what instructions are availiable and generally what they do
but there are so many variants with subtly different semantics that are
only vaguely described that you *will* need to use the debugger to
understand exactly what each instruction does to the stack. Press F1 in
cv32 to get a list of the availiable panes (floating point pane is the
one you want, not numeric / SSE).
- The floating point registers overlap with the MMX registers so the
meaning of any one of these registers is dependent entirely on context.
What that means for you is that you must be careful when switching back
and forth between floating point and MMX. Something you put into an MMX
register can trigger a floating point exception that terminates your
program when you start doing floating point calculations again. The
way to avoid this is to place the emms instruction at the end of all
of your MMX routines. This empties the MMX state so that subsequent floating
point instructions won't read the bad data. It may also help to insert
finit instructions in your floating point routines before using the
floating point unit, but this is far less critical than emms.
Now that we have an iteration limit, we could write an algorithm for
generating the binary fractal shown above, but it wouldn't be all that
interesting.
Instead, some clever person noted that if we paint each of the areas
between successive
lemniscates of the Mandelbrot set a different color the image becomes
much prettier. The nth lemniscate simply refers to the curve that
contains the set of points in the graph that require more than n
iterations of the (in this case) Mandelbrot set equation to diverge
(defined as exceeding radius 2). Because we are already keeping track of
the number of iterations at every point (to check if it goes above
_iterationLimit), all we have to do is choose a color based on this
number.
Colors, Images, and Video Memory
The next question then is how we represent, store, and display
colors.
We'll be using 24-bit
color in this MP which means that the color of every pixel on
the screen requires 24 bits of space to specify. These 24 bits are
broken up into three 8-bit numbers, representing the intensities of
red, green, and blue light used to color that pixel. A value of 0
corresponds to no light, while 255 is full intensity. Hence {R,G,B} =
{0,0,0} gives the color black, {0,0,255} gives pure blue, and
{255,255,255} gives the color white.
It's worth taking a moment to contrast 24-bit mode with 32-bit mode.
We are not using 32-bit mode graphics in this MP. Nor will we ever,
because our new machines do not support 32-bit images the
way PModeLib uses them. That said, they also do not list 24-bit
graphics as an option in the 'Display Properties' control panel. Things
will work okay if you try to draw a 24-bit image using PModeLib with
Windows set to 32-bit mode but not if Windows is in 16-bit mode so
please check that your computer is not set to 16-bit mode in the
Control Panel's Display Properties. I think most of the machines are
configured correctly but there are a couple that seem to like to reset
themselves.
Another hardware/OS quirk worth mentioning is that Windows expects
24-bit color data to be stored in BGR format, that is, blue at
the lower addressed location, even though people always say "RGB".
Moving beyond individual pixels, images are stored as arrays of these
24-bit values. The first of these values corresponds to the color of
the pixel in the top left corner of the image. Images are almost always
(and for this MP, are always) stored in row major order as with other
2D arrays we've used in this class. So the second value in the array
defines the color of the pixel just to the right of the first pixel.
All in all, an image is just an array like any other. The most notable
difference is that it is a very big array. Because of this, it's preferable
to allocate space for them at runtime using the
_AllocMem function provided by PModeLib. _AllocMem and all
the other PModeLib functions require that you call _LibInit first
and _LibExit before exiting your program.
There's a somewhat elaborate initialization ritual needed to set up a display
window for your images. The lab manual has
details (that assume the 32-bit mode we can't use) but to summarize, you
must call PModeLib's _InitGraphics, _FindGraphicsMode, and
_SetGraphicsMode in that order with appropriate arguments. To clean
up after yourself when the program is done there is a shutdown sequence:
_UnsetGraphicsMode and then _ExitGraphics.
Once you've made space for your image and written some data into it,
displaying it to your window is rather trivial. All that is needed
is a call to _CopyToScreen.
Colormaps
We've talked about (or at least made reference to) all the technical
details of putting a given pattern of colors onto the screen now but
one issue remains and that is how we decide what color to use for a
given number of iterations.
More broadly, we'd like to define a general scheme that makes it easy
for us to completely change what colors are used without having to
change assembly instructions all the time. To this end, the design
we will use is a cyclic colormap generated at runtime by linearly
interpolating between a list of colors.
Breaking that statement down, the design will
center around a colormap, which is nothing more than an array of colors.
In the code this array is named _colorMap. It will map the number
of iterations executed on a pixel onto the color that should be used
for that pixel on the screen. So if a pixel took two iterations, you
would read the third row from _colorMap to determine what color to use
(third because the first row corresponds to zero iterations).
The colormap is cyclic because indices wrap around at the end.
If there end up being 100 entries in _colorMap, we need a rule to decide
what to do when more than 100 iterations are used. A simple but
flexible solution is to wrap around so that in the above example case
we would be reading the row corresponding to _colormap[i % 100].
_colorMap is populated with data by calling _FillColorMap, one
of the functions you are responsible for writing. _FillColorMap has a
number of implicit and explicit inputs:
- NUM_COLORS (global define): The number of distinct color
entries in the _colors array described below. Must be < 256.
- COLOR_STEPS (global define): The number of levels to
interpolate between neighboring colors in the _colors array. Must be
< 256.
- _colors (address passed as argument): An array of colors
given in BGR format. There are actually NUM_COLORS + 1 entries in the
array but the last is a repeat of the first for convenience in coding
_FillColorMap.
- _colorMap (address passed as argument): The array that
holds the output colormap as mentioned before. It has NUM_COLORS *
COLOR_STEPS rows.
The job of _FillColorMap is to take the
colors defined in the _colors array and blend successive neighbors
together with COLOR_STEPS different weightings. For example, if
COLOR_STEPS was 4 and the two colors were red and blue, the first four
entries output to _colorMap would be 100% red, 75% red + 25% blue, 50%
red + 50% blue, and 25% red + 75% blue.
Numerically that would be:
-Color 1- Weight 1 -Color 2- Weight 2 -Output Color-
{B,G,R} = { 0 , 0 ,255} * 100% + {255, 0 , 0 } * 0% = { 0 , 0 ,255}
{ 0 , 0 ,255} * 75% + {255, 0 , 0 } * 25% = { 64, 0 ,192}
{ 0 , 0 ,255} * 50% + {255, 0 , 0 } * 50% = {128, 0 ,128}
{ 0 , 0 ,255} * 25% + {255, 0 , 0 } * 75% = {192, 0 , 64}
A couple of visual examples may help. In the images below, the colors
in the upper row of each image represent the colors defined in the
_colors array. The colors in the lower row are colors output to the
_colorMap array with the exception of the part marked "No Entries";
because the last color this section corresponds to the extra row of _colors (which we
defined as having NUM_COLORS + 1 rows) and there are no corresponding
repeated entries in _colorMap.

NUM_COLORS = 3, COLOR_STEPS = 2, _colors = {red, green, blue, red}
Looking at this first example, there are three (NUM_COLORS) colors in the
_colors array but we repeat the first one giving a total of four
(NUM_COLORS + 1) entries in _colors. _FillColorMap should loop from 0 to
NUM_COLORS-1, thereby looping through all the entries of _colors except
for the repeated one at the end.
For each of these color entries, it performs mixing similar to the
example above. Since COLOR_STEPS is 2 for
this first image, this first iteration of the loop we will write 2
output colors to _colorMap. The first will correspond to 100% of the
current color (red) and 0% of the next (green), giving red. The second
entry will be 50% of the current color (still red) and 50% of the next
(still green) giving that nasty olive color you see. Then, in the
second iteration it will apply the same ratios (100% and 0%, 50% and 50%)
but to the second color in _colors (green) and the third color (blue).
For the third and final iteration it combines the third color, blue,
with the fourth color so that the color map cycles back to red where
it started.
The second and third examples, shown below, increase the number of
COLOR_STEPS for a more smooth appearance, and also vary the input colors
stored in the _colors array.

NUM_COLORS = 6, COLOR_STEPS = 4, _colors = {red, yellow, green, cyan, blue, magenta, red}

NUM_COLORS = 2, COLOR_STEPS = 200, _colors = {black, white, black}
_FillColorMap is probably most easily implemented with three nested loops.
The outermost one goes from 0 to NUM_COLORS and is indexed by 'i' let's
say. The middle one goes from 0 to COLOR_STEPS with index 'j'. The
innermost loop goes from 0 to 2 (for the three separate components of
each color) with index 'k'. The innermost loop implements the following
expression which stores the linearly interpolated colors into the
colormap:
_colorMap[(i * COLOR_STEPS + j) * 3][k] = (colors[i*3][k] * (COLOR_STEPS - j) + colors[(i+1)*3][k] * j) / COLOR_STEPS
It's a bit ugly all written out together like that but if you notice, the
left hand side of the equation just writes successive bytes in _colorMap.
Meanwhile, the right side of the equation is multiplying the first color by 1 -
(j / COLOR_STEPS), the second color by (j / COLOR_STEPS), and then taking the
sum of those two products. Since those two fractions always sum to one we know
that this expression generates evenly weighted colors (numerically, visually
is a completely different story). Furthermore the expression is linear in
terms of variable j so the transition from color to color is linear (again,
numerically).
This routine is to be implemented in integer arithmetic so be careful to avoid
overflow and underflow. Consider carefully the effects of the
order in which you execute arithmetic operations.
With this colormap, we've now specified how to
draw all of the points that prove not to be in the Mandelbrot set but we
haven't said anything about points that are in the set. These we will
simply assign a color. This color is specified by three defines:
LIMIT_BLUE, LIMIT_GREEN, and LIMIT_RED.
Fractal Generation, revisited
Pulling all of this together, we're finally able to generate a fractal in
color to draw to the screen with a call to _CopyToScreen. In the code this
task is split between two functions.
First there's _IteratePixel, which takes
a pointer to a complex coordinate and returns the number of iterations it
takes for that point to diverge. _Generate
is the other function, and it loops through all the pixels on the screen,
calling _IteratePixel on the corresponding complex coordinate and then
using _colorMap and the LIMIT_colors to write the appropriate color data
based on what _IteratePixel returns. _Generate is called by the main loop
implemented in _DoFrame (which is given to you). When this call
is made, the address of the buffer at which to store the fractal image
is passed as an argument.
Looking at some C code that summarizes what these functions do is
probably illustrative even if you are not intimately familiar with
C.
#define SCREEN_WIDTH 640
#define SCREEN_HEIGHT 480
struct Complex {
double Real, Imag;
}
struct Complex Z, C; // Declare two of these complex number structures
...
void _Generate(unsigned char*** buffer) {
// pass in the address to write the image at
unsigned short iterations;
_SetLimit(); // Call this once to set _iterationLimit
for (int j = 0; j < SCREEN_HEIGHT; j++) {
for (int i = 0; i < SCREEN_WIDTH; i++) {
C.real = _minX + (_maxX - _minX) * i / SCREEN_WIDTH;
C.imag = _minY + (_maxY - _minY) * j / SCREEN_HEIGHT;
// convert screen coordinates into complex-plane coordinates
iterations = _IteratePixel(&C);
// pass a pointer to C (the coordinates to test)
if (iterations == _iterationLimit) {
// paint the current pixel (buffer[j][i]) with the color for points in the set
buffer[j][i][0] = LIMIT_BLUE;
buffer[j][i][1] = LIMIT_GREEN;
buffer[j][i][2] = LIMIT_RED;
}
else {
// paint the current pixel with a color from _colorMap
unsigned short wrappedColor = iterations % (NUM_COLORS * COLOR_STEPS);
// use wraparound to map the number of iterations into a color index
for (int k = 0; k < 3; k++) {
buffer[j][i][k] = _colorMap[wrappedColor][k];
}
}
}
}
}
unsigned long _IteratePixel(const struct Complex* pos) {
// pass in the address of C in our Z = Z^2 + C equation
unsigned short iterations = 0;
Z.Real = 0;
Z.Imag = 0;
while (iterations < _iterationLimit) {
double ZRealNext, ZImagNext;
ZRealNext = Z.Real * Z.Real - Z.Imag * Z.Imag;
ZImagNext = 2 * Z.Real * Z.Imag;
// ZNext = Z^2
Z.Real = ZRealNext + pos->Real;
Z.Imag = ZImagNext + pos->Imag;
// Z = Z^2 + C
if (Z.Real * Z.Real + Z.Imag * Z.Imag >= 4) { // if |Z| >= 2
return iterations;
}
iterations = iterations + 1;
}
return iterations; // iterations == _iterationLimit if we got here
}
Most of the constructs and variables you see above see can be directly
implemented in your code or are already present:
- SCREEN_WIDTH and SCREEN_HEIGHT: Defines we've
given you that control the number of pixels in the generated images.
- Complex: A structure that simplifies access to related
variables. Implemented in the given mp4.asm with the STRUC/ENDSTRUC
macro. These macros create several defines for you including
Complex.Real, Complex.Imag, and Complex_size. Check the lab manual
for details of what they are and how to use them.
- Z, C: Two global variables of type Complex as defined
above.
- _Generate: Puts a fractal into the buffer specified by its
only argument. First it calls _SetLimit so that the _iterationLimit
will be appropriate for the current view. Then it loops through all
the pixels on the screen and converts them to complex coordinates (the
equation is given fairly clearly in the code above). The address of
where the current coordinate is stored is passed as the argument to
_IteratePixel which counts how many iterations can be done before that
point diverges. Using this information, it writes a pixel to the
buffer either from _colorMap or from the LIMIT_COLOR defines.
- _SetLimit: Described above.
- _minX, etc: Described above.
- _IteratePixel: Accepts a pointer to a 'Complex' structure
as described a couple lines above. It repeatedly applies the equation
Z = Z^2 + C using zero as the initial Z and the passed data as C. As
it's computing it keeps a count of how many times the equation has
been executed without the magnitude of Z going above two. If it does
exceed two, the function returns the current iteration count.
Otherwise it continues iterating until the count is equal to the
value in _iterationLimit at which point it gives up (still returning
the current count which should be equal to _iterationLimit as we've
just said).
- _iterationLimit: Described above.
- LIMIT_BLUE, etc: Described above.
- _colorMap: Described above.
User Interaction
Unlike previous assignments, the mouse is the sole input device for this
machine problem. The action taken upon clicking and then releasing
(nothing happens until the release) each button is as follows:
- Left: View zooms in by a factor of two
- Right: View zooms out by a factor of two (but not beyond the initial
view)
- Middle: Exits the program.
Because there is a good chance that the user will click the
mouse during the lengthy fractal computation, we need to use an interrupt.
In particular, we're going to setup a
callback function that will be called whenever either mouse
button is clicked. The job of this callback function will be to read
the button states (pressed or not) and also the position of the mouse
and store these in global variables that the rest of our program can
read and respond to.
Working with protected mode, we don't have the same access to interrupts
that we did in real mode. Instead, the program must switch to real mode,
make the interrupt call it wants, and then switch back to protected mode.
Doing this by hand would be a tedious exercise, but fortunately there is
a PModeLib function that does this for you.
DPMI_Int takes a real mode interrupt number given in BX
and register values in
DPMI_Regs. It then switches to real mode, sets the registers
to the corresponding values in DPMI_Regs, and calls the interrupt. The
resulting register state is written back to DPMI_Regs, where it will be
visible from protected mode, and the routine then finishes by returning
to protected mode. This function is at the heart of both the
_InstallMouse and _RemoveMouse routines that install and
remove your mouse handling routine, _MouseCallback.
The interrupt that installs the mouse callback is number 33h, function
0Ch. All the necessary details are given in the corresponding row in
the table
here.
Here is a summary of what each of the three mouse procedures must do:
_InstallMouse
- _LockArea for the locations in memory which will be read upon
executing the callback handler, namely the _mouseState variable and the
code for the handler itself.
- Use _Get_RMCB to get the address of a real mode callback
handler based on your protected mode handler (_MouseCallback). Store
this address for future use into the _MouseSeg and
_MouseOff registers.
- Write the necessary register values for the int 33h call into the
appropriate places in DPMI_Regs.
- Call DPMI_Int with bx = 33h to install your handler.
_RemoveMouse
- Set the values in the DPMI_Regs so that a call to int 33h will remove
your previously installed handler.
- Call DPMI_Int with bx = 33h.
- Use _Free_RMCB to remove the real mode version of your
callback function.
_MouseCallback
- The one argument that is passed to _MouseCallback is a pointer to
the DPMI_Regs structure you will need.
- Test if _mouseState is nonzero. If so, the previous mouse click
has not been processed so just exit.
- Check the state of DPMI_Regs to determine what buttons are pressed.
Note that although the 'button state' is returned in BX, you actually
want to look at AX, the 'event mask'. The bits in BX represent
current mouse button state rather than state transitions. So BX has 1 in
one of its low 3 bits if the corresponding button is pressed (if you're
following the
table, there's a typo: bit 2 corresponds to the middle button).
We're recording button releases though, and so the information in BX is
not useful. AX contains the information we need; AX & LEFT_BUTTON for
example, is nonzero when the left button has just been released.
- Copy the relevant mouse state data into _mouseX,
_mouseY, and _mouseState.
Implementing these three functions is all you need to do in order for
the interface to work; the code that reads _mouseState and decides
what to do is in the given function _DoFrame. In this function,
if _mouseState is LEFT_BUTTON or RIGHT_BUTTON, it calls _Zoom
(also given to you) with the appropriate arguments to zoom either in or
out. If _mouseState is MIDDLE_BUTTON, it causes the program to exit.
The catch is, the way interrupts are handled in protected mode,
if you use the library version of _InstallMouse, it installs the library
version of _MouseCallback, not yours. As a result, you will not be
able to test your versions of these functions one at a time. The
library version of _RemoveMouse *should* work with both your functions
and the library versions but it's possible that it will be incompatible
too.
Alpha Blending
The last part of this assignment makes one fractal fade to the next
using alpha blending. As was the case with the user interaction portion
of this MP, some of the grunt-work code has been given to you and you
are only required to implement the key interesting routines.
In particular, _Blend is the only function you need to write for
fading to work. It takes pointers to three images, one labeled the
top source, one the bottom source, and one the destination. _Blend's
task is to combine the data in the two sources and write the resulting
image into the destination buffer. The combination function depends
on the fourth argument, alpha.
Alpha can be interpreted in different ways, but in our program we will
consider it the percentage of the output image that comes from the top
source. The rest of the output comes from the bottom source. Although
we referred to it as a percentage, it is actually an integer from 0 to
255. The reason for this is that in 32-bit images, the
fourth byte of color data is this alpha value (the first three are still
blue, green, and red). So even though it's easiest to think of as a
percentage, the alpha value needs to be divided by 255 for it to behave
as such. Hence, an alpha byte of zero corresponds to 0/255 = 0% and a
byte of 255 corresponds to 255/255 = 100%.
The combination function itself is given by:
dst = (srcTop * alpha + srcBot * (255 - alpha)) / 255
This equation is applied per byte. That is to say, first you would
apply it to the blue value of the upper left pixels in the two
source images to get the blue value for the upper left pixel of
the destination image. After that you would proceed to the green and red
bytes, then continuing on to the remaining pixels. Such highly parallel
integer arithmetic is what MMX was designed for.
Instead of laboriously applying the formula to every byte of the
image we can leverage MMX instructions and instead only apply it once
per pixel. In doing so, the fade routine will run likely run about two
or three times more quickly. This will be important if we want to animate
fading since each step of the fade is done by calling _Blend; if it took
a long time to perform the _Blend, the animation would be slow and
choppy.
The algorithm will look something like this:
(code to load constants like 0, alpha, 255-alpha into MMX registers)
loop through pixel rows (Y) from zero to SCREEN_HEIGHT
loop through pixels (X) from zero to SCREEN_WIDTH
load current pixel from top source into an MMX register, say mm0
unpack mm0 from 4 bytes to 4 words
multiply the words in mm0 by alpha together at once
shift the result to the right by 8
load current pixel from bottom source into an MMX register, say mm1
unpack mm1 from 4 bytes to 4 words
multiply the words in mm1 by (255-alpha) together at once
shift the result to the right by 8
add the two results
pack the 4 words in the sum back into 4 bytes
store the 4 bytes into the current pixel to the destination image
end loop
end loop
There are a couple things about the pseudocode above that should seem a
little off. First, we appear to be reading and writing 4 bytes of data
at a time even though pixels are only 3 bytes apiece. This is
intentional. The rest of the code has been set up such that it works
safely. Namely, the buffers are created at neighboring locations in
memory with an extra byte at the end so that reads and writes to the
extra fourth byte in the last pixel won't cause a memory exception. For
the rest of the pixels, reading the data and performing computation on it
costs us nothing, and even though writing the extra byte into the output
might seem dangerous, that byte is always overwritten when the formula is
applied to the next pixel (assuming you loop through the image from the
top left as in the pseudocode). Moreover, this runs faster than trying
to worry about only reading, manipulating, and writing back 3 bytes at
a time.
The other apparent mistake is use of a shift right by 8 to implement
the divide by 255. It's true that this is technically incorrect, but
for our purposes speed is more important than accuracy (and the amount
of inaccuracy would probably not be noticeable anyways). By avoiding a
divide by 255, _Blend will run more quickly.
Program Organization
Three functions are given to you: _Main, _DoFrame, and _Zoom. It's up
to you to replace the library versions of _Generate and the other procedures
though. This is done by deleting the statements that call the libmp4
subroutine and by adding your own code. Each subroutine that you write should
match the output of the library subroutine exactly.
Friendly Advice
- The libmp4.lib file contains executable library subroutines
for each of the routines that you need to implement. The library
subroutines enable you to run the program and understand how it works before
you implement it. You can test your program with any combination of
your own code and library subroutines. You will receive credit only for
the subroutines that you implement yourself.
- Monitor the course WebBoard for clarifications and help.
- START EARLY!
Files for MP4 are on the V: drive in the directory
V:\ece390\mp4. In this directory are the program framework
mp4.asm and a running version of the program mp4lab.exe.
Lab versions of subroutines are in libmp4.lib, which contains all
subroutines of LIB291 plus versions of the MP4 subroutines (libMain for Main, etc).
You will use mp4xit instead of dosxit. You should start by copying these files to your home
directory with the following command: xcopy /s V:\ece390\mp4
W:\mp4 You may download the files from the server as
mp4.zip
Demonstration, Documentation, and Grading
Demonstrate your program to an ECE 390 staff member.
As in previous MPs, keep an MP development log and write a cover memo.
The memo should address the following questions:
- How much time did you spend on each subroutine, from planning and
design through final debugging?
- What kinds of defects (bugs) did you find during the development of
the program? When did you discover these defects (during code review or
during testing)? How did you find them?
- What did you learn about design, coding, testing, and debugging in
this MP?
- What went well in your work on this MP? What did not?
- What you would do differently for the final project?
After the demonstration, submit your program and cover memo. Your
program will be graded according to the clarity of your design and the
quality of your documentation.
Gradesheet:
- _Generate: 12
- _IteratePixel: 6
- _SetLimit: 6
- _InstallMouse: 3
- _MouseCallback: 6
- _RemoveMouse: 3
- _FillColorMap: 12
- _Blend: 12
- Technique: 3
- Documentation: 3
- Cover Memo: 4
mp4.asm (program framework)
; ECE 390 Spring 2006 MP4
; -- Fractals --
;
; Completed By:
; Your Name
;
; Mark Dykstra
; University of Illinois Urbana Champaign
; Dept. of Electrical & Computer Engineering
;
; Ver 1.0
%include "lib291.inc"
%include "libmp4.inc"
BITS 32
GLOBAL _main
;===== Section 1: Define Constants ============================================
; Number of pixels the screen is wide and tall
SCREEN_WIDTH EQU 640
SCREEN_HEIGHT EQU 480
; Number of colors given in _colors
NUM_COLORS EQU 6
; Number of levels to interpolate between entries in _colors
COLOR_STEPS EQU 16
; Amount to increase the alpha each frame of the blending animation
FADE_SPEED EQU 12
; The color to use when the iteration limit is exceeded
LIMIT_BLUE EQU 0
LIMIT_GREEN EQU 0
LIMIT_RED EQU 0
LEFT_BUTTON EQU 2
RIGHT_BUTTON EQU 8
MIDDLE_BUTTON EQU 32
STRUC Complex
.Real resq 1
.Imag resq 1
ENDSTRUC
;===== Section 2: Declare External Routines ===================================
;===== Section 3: .bss ========================================================
SECTION .bss
_kbINT resb 1
_kbIRQ resb 1
_kbPort resw 1
_mouseSeg resw 1 ; real mode segment for MouseCallback
_mouseOff resw 1 ; real mode offset for MouseCallback
_frontBuffer resd 1
_backBuffer resd 1
_blendedBuffer resd 1
_ScreenOffset resd 1
_graphicsMode resw 1
_mouseX resw 1
_mouseY resw 1
_iterationLimit resw 1
_colorMap resb 3 * NUM_COLORS * COLOR_STEPS
fildPass resd 1 ; can be used to hold integers you want to convert into floats with fild
diverges resd 1
Z resb Complex_size
C resb Complex_size
;===== Section 4: .data =======================================================
SECTION .data
_minX dq -2.5
_minY dq -1.5
_maxX dq 1.5
_maxY dq 1.5
_mouseState db 0
_timesZoomed db 0
%if 1
; BBB, GGG, RRR
_colors db 0, 0, 240
db 0, 240, 240
db 0, 240, 0
db 240, 240, 0
db 240, 0, 0
db 240, 0, 240
db 0, 0, 240
; repeat first color at the bottom for convenience later
%endif
;===== Section 5: Main Procedure ==============================================
SECTION .text
_main:
call _LibInit
test eax, eax
jnz near .mainExit
; Allocate memory to store three buffers
; (plus a byte at the end so we can write 32 bits to the last pixel)
invoke _AllocMem, dword SCREEN_WIDTH * SCREEN_HEIGHT * 9 + 1
cmp eax, -1
je near .mainExit
mov [_frontBuffer], eax
add eax, SCREEN_WIDTH * SCREEN_HEIGHT * 3
mov [_backBuffer], eax
add eax, SCREEN_WIDTH * SCREEN_HEIGHT * 3
mov [_blendedBuffer], eax
; Graphics Init
invoke _InitGraphics, dword _kbINT, dword _kbIRQ, dword _kbPort
test eax, eax
jnz near .graphicserror
; Find Graphics Mode: SCREEN_WIDTH x SCREEN_HEIGHT x 24
invoke _FindGraphicsMode, word SCREEN_WIDTH, word SCREEN_HEIGHT, word 24, dword 1
cmp ax, -1
je near .graphicserror
mov [_graphicsMode], ax
invoke _SetGraphicsMode, word [_graphicsMode]
test eax, eax
jnz near .graphicserror
call _InstallMouse
; Use the colors defined at _colors to make a color map
invoke _FillColorMap, dword _colors, dword _colorMap
.executionLoop
; generate fractal in _frontBuffer then fade to it from _back
; oddly, this works much more quickly than switching the pointers
invoke _DoFrame, dword[_frontBuffer], dword[_backBuffer]
test eax, eax
jnz .done
; generate fractal in _backBuffer then fade to it from _front
invoke _DoFrame, dword[_backBuffer], dword[_frontBuffer]
test eax, eax
jnz .done
jmp .executionLoop
.done
.mouseerror
call _RemoveMouse
call _UnsetGraphicsMode
.graphicserror
call _ExitGraphics
.mainExit
call _LibExit
call MP4LibExit
ret
; DoFrame(frontBuffer, backBuffer)
; PURPOSE: Generates a fractal in frontBuffer and fades it into view
; ARGS: frontBuffer: A pointer to the 24-bit buffer to hold the fractal
; backBuffer: A pointer to the 24-bit buffer holding the currently
; displayed image
; INPUTS: _mouseState: Information about what buttons have been pressed
; OUTPUTS: _blendedBuffer: The blended image is written here
; _ScreenOffset: This buffer implicitly gets updated when we _CopyToScreen
; CALLS: _Generate
; _Blend
; _CopyToScreen
; _Zoom
proc _DoFrame
.frontBuffer arg 4
.backBuffer arg 4
mov esi, [ebp + .frontBuffer]
mov edi, [ebp + .backBuffer]
pusha
invoke _Generate, esi
popa
mov ecx, FADE_SPEED
.fadeLoop
pusha
invoke _Blend, esi, edi, dword[_blendedBuffer], ecx
invoke _CopyToScreen, dword[_blendedBuffer], dword SCREEN_WIDTH * 3, dword 0, dword 0, dword SCREEN_WIDTH, dword SCREEN_HEIGHT, dword 0, dword 0
popa
add ecx, FADE_SPEED
cmp ecx, 256
jb .fadeLoop
invoke _CopyToScreen, dword esi, dword SCREEN_WIDTH * 3, dword 0, dword 0, dword SCREEN_WIDTH, dword SCREEN_HEIGHT, dword 0, dword 0
; finish by drawing the new fractal
.spinLoop
cmp byte [_mouseState], 0
je .spinLoop
; wait for mouse input
test byte [_mouseState], MIDDLE_BUTTON
jnz near .quit
test byte [_mouseState], LEFT_BUTTON
jz .notLeft
invoke _Zoom, 1
.notLeft
test byte [_mouseState], RIGHT_BUTTON
jz .notRight
invoke _Zoom, 0
.notRight
mov byte [_mouseState], 0
test eax, eax
je .spinLoop
; if the Zoom didn't do anything (because we couldn't zoom out any further)
mov eax, 0
ret
.quit
mov eax, 1
ret
endproc
_DoFrame_arglen equ 8
; Zoom(zoomIn)
; PURPOSE: Zooms the view in by a factor of two, centering on the last position
; that was clicked by the user
; ARGS: zoomIn: Nonzero if we should zoom in, zero if we should zoom out
; INPUTS: _minX, _maxX, minY, _maxY: Give the initial zoom state
; RETURNS: (eax): Nonzero if zooming was performed
; OUTPUTS: _minX, _maxX, minY, _maxY: Modified to give a view half or twice
; the size of the previous and centered on the last mouse click
; _timesZoomed: Global that keeps track of zoom depth in order to determine
; an appropriate iteration limit in _SetLimit
; CALLS: Nothing
; USES: Floating Point (64-bit)
proc _Zoom
.zoomIn arg 1
cmp byte [ebp + .zoomIn], 0
jne .safe
cmp byte [_timesZoomed], 0
jne .safe
mov eax, 0
ret
; disallow zooming out beyond initial view (would result in _iterationLimit of 0)
.safe
finit
fld qword [_minX]
mov dword [fildPass], SCREEN_WIDTH
fild dword [fildPass]
movzx esi, word [_mouseX]
mov dword [fildPass], esi
fild dword [fildPass]
fld qword [_maxX]
fld qword [_minX]
fsubp st1
fmulp st1
fdivrp st1
faddp st1
; push minX + (maxX - minX) * mouseX / SCREEN_WIDTH (new center)
fld qword [_maxX]
fld qword [_minX]
fsubp st1
; push window width
cmp byte [ebp + .zoomIn], 0
je .doneScaleX
fld1
fadd st0
fadd st0
fdivp st1
.doneScaleX
fld st0
; if we didn't jump, push width twice (giving total of current width * 2)
; otherwise pop window width, push quarterwidth two times (total 1/2 width)
fadd st2
fstp qword [_maxX]
fsubp st1
fstp qword [_minX]
; write center + quarterwidth to _maxX, center - quarterwidth to _minX
fld qword [_minY]
mov dword [fildPass], SCREEN_HEIGHT
fild dword [fildPass]
movzx esi, word [_mouseY]
mov dword [fildPass], esi
fild dword [fildPass]
fld qword [_maxY]
fld qword [_minY]
fsubp st1
fmulp st1
fdivrp st1
faddp st1
; push minY + (maxY - minY) * mouseY / SCREEN_HEIGHT (new center)
fld qword [_maxY]
fld qword [_minY]
fsubp st1
; push window height
cmp byte [ebp + .zoomIn], 0
je .doneScaleY
fld1
fadd st0
fadd st0
fdivp st1
.doneScaleY
fld st0
; if we didn't jump, push height twice (giving total of current height * 2)
; otherwise pop window height, push quarterheight two times (total 1/2 height)
fadd st2
fstp qword [_maxY]
fsubp st1
fstp qword [_minY]
; write center + quarterheight to _maxY, center - quarterheight to _minY
cmp byte [ebp + .zoomIn], 0
je .decrement
inc byte [_timesZoomed]
jmp .done
.decrement
dec byte [_timesZoomed]
; update _timesZoomed depending on which way we went
.done
mov eax, 1
ret
endproc
_Zoom_arglen equ 1
;--------------------------------------------------------------
;-- Replace Library Calls with your Code! --
;--You should only need to replace the 'call'/'invoke' lines --
;-- Do not forget to add Function Headers --
;--------------------------------------------------------------
;--------------------------------------------------------------
;-- _Blend() --
;--------------------------------------------------------------
proc _Blend
.srcTop arg 4
.srcBot arg 4
.dst arg 4
.alpha arg 4
invoke _libBlend, dword [ebp + .srcTop], dword [ebp + .srcBot], dword [ebp + .dst], dword [ebp + .alpha]
ret
endproc
_Blend_arglen equ 16
;--------------------------------------------------------------
;-- _Generate() --
;--------------------------------------------------------------
proc _Generate
.dst arg 4
invoke _libGenerate, dword [ebp + .dst]
ret
endproc
_Generate_arglen equ 4
;--------------------------------------------------------------
;-- _SetLimit() --
;--------------------------------------------------------------
_SetLimit
call _libSetLimit
ret
;--------------------------------------------------------------
;-- _IteratePixel --
;--------------------------------------------------------------
proc _IteratePixel
.pos arg 4
invoke _libIteratePixel, dword [ebp + .pos]
ret
endproc
_IteratePixel_arglen equ 4
;--------------------------------------------------------------
;-- _FillColorMap() --
;--------------------------------------------------------------
proc _FillColorMap
.colors arg 4
.cmap arg 4
invoke _libFillColorMap, dword [ebp + .colors], dword [ebp + .cmap]
ret
endproc
_FillColorMap_arglen equ 8
;--------------------------------------------------------------
;-- _InstallMouse() --
;--------------------------------------------------------------
_InstallMouse
call _libInstallMouse
ret
;--------------------------------------------------------------
;-- _RemoveMouse() --
;--------------------------------------------------------------
_RemoveMouse
call _libRemoveMouse
ret
;--------------------------------------------------------------
;-- _MouseCallback() --
;--------------------------------------------------------------
proc _MouseCallback
.DPMIRegsPtr arg 4
invoke _libMouseCallback, dword [ebp + .DPMIRegsPtr]
ret
endproc
_MouseCallback_end
_MouseCallback_arglen EQU 4
|