VGA mode in libc #2184

Open
toncho11 opened this issue Jan 14, 2025 · 42 comments
Labels
enhancement New feature

Comments

@toncho11
Contributor

toncho11 commented Jan 14, 2025

@ghaerr

Is it possible to add the inline asm code from:
https://github.com/ghaerr/elks/wiki/Coding-games-for-ELKS#2-using-direct-access-to-the-vga-memory that initializes the VGA mode to libc?

It could be a header "vgalib.h" with 1 function and several defines:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define VGA_256_COLOR_MODE  0x13      /* use to set 256-color mode. */
#define TEXT_MODE           0x03      /* use to set 80x25 text mode. */

typedef unsigned char  byte;

byte __far *VGA=(byte __far *)0xA0000000L;

void set_mode(byte mode);

This could be used by the new 8086 tool-chain, elksdoom developed by @FrenkelS and our basic interpreter for example. It will give common ground for future VGA development. The VGA will be set differently if needed - for PC98 for example, but will not require changing the program that uses the header.

@toncho11 toncho11 added the bug Defect in the product label Jan 14, 2025
@toncho11 toncho11 changed the title VGA mode switch VGA mode in libc Jan 14, 2025
@ghaerr ghaerr added enhancement New feature and removed bug Defect in the product labels Jan 15, 2025
@ghaerr
Owner

ghaerr commented Jan 15, 2025

Is it possible to add the inline asm code from:

It is possible I suppose, but I don't think it's necessarily a good idea, for the following reasons:

  • The code doesn't really do much, and is only compatible with VGA (not EGA or CGA or PC-98). Very many ELKS systems don't have a VGA adaptor, and this code doesn't manage between the display types. The actual drawing capability is limited to drawing single points, no line, circle, rectangle, fill or other basic graphics functions.
  • Doom doesn't actually use this code; it uses a much larger assembly-only source file for implementing extremely fast screen access.
  • Our Basic interpreter uses separate graphics functions for IBM PC and PC-98; vgalib.h won't help with compatibility, and a library routine would be expected to work on all systems. Also, both our IBM and PC-98 graphics routines, although considerably more complicated than this, are frankly too slow to be usable for anything but the most basic graphics or games.
  • Plotting a pixel is an extremely basic operation, much more is needed. For this we already have Nano-X, which seamlessly handles much more complex graphics. I would suggest that we push for programmers to use Nano-X for building graphical applications for ELKS, rather than starting from scratch rebuilding (slowly) what Nano-X already does. We could add Nano-X capability into libc, for instance, if this sort of thing mattered in libc.

In summary, vgalib.h can't be used by our new toolchain, as that toolchain doesn't support __far pointers, and this code can't be used in our Basic interpreter nor is it usable in Doom.

Further discussion points might be: what real usefulness does adding a super-simple vgalib.h add? What types of actual programs would be written, and why would we want them to use vgalib.h rather than Nano-X? For instance, ported games will always use their already-existing routines written for speed, which is very necessary on legacy non-framebuffer systems. New graphical applications should probably be pushed into the direction of Nano-X, especially with the new work allowing for multiple graphical applications to run concurrently.

@toncho11
Contributor Author

The idea of this post was actually to only switch to graphical mode and back. It was not to create an entire graphical library. So it could be called "vgamode.h" instead.

The target of this post is this code:

void set_mode(byte mode)
{
   // SI, DI, BP, ES and probably DS are to be saved
   // cli and sti are used to make a proper BIOS call from ELKS
   __asm__(
  "push %%si;"
  "push %%di;"
  "push %%bp;"
  "push %%es;"
  "cli;"
  "mov %%ah,0;" 
  "mov %%al,%0;" 
  "int $0x10;"
  "sti;"
  "pop %%es;"
  "pop %%bp;"
  "pop %%di;"
  "pop %%si;"
     : /* no outputs */
     : "r" (mode)
     : ); //list of modified registers
}

that I remember providing to Doom.

So what I wanted is to play with VGA with the new 8086-toolchain. But now I need the above assembly code to do it. So how do I do that? I suppose an assembler .as file can be created that I can link against. That will leave only the problem with the (byte __far *) pointer? Or maybe there is a simpler way to access the video memory compatible with the 8086-toolchain?
So let's focus on how to add VGA access to the toolchain. If you think that porting Nano-X to the 8086-toolchain is better, then it is OK for me. But it looks like a big project.

@ghaerr
Owner

ghaerr commented Jan 15, 2025

The idea of this post was actually to only switch to graphical mode and back

Ok. Let's start with the C86 toolchain, since that's what you're interested in and each compiler would need a special implementation anyway. C86 has an asm directive, but I've never tested or used it. Nonetheless, switching modes is the simple part; accessing video RAM at A000 will likely require a function call. C86 supposedly supports inline functions, but they were discussed as possibly buggy by BYU. The hard part will be making this anywhere near fast enough to be useful for anything other than experimentation.

I'll look further into this and respond with something you can play with, rather than adding immediately to libc, thank you!

@rafael2k
Contributor

If you implement this function, I can test in the image viewer I plan to develop.

@ghaerr
Owner

ghaerr commented Jan 17, 2025

Here's the set_mode function written for our C86 toolchain:

void set_mode(int mode)
{
    asm(
        "push   si\n"
        "push   di\n"
        "push   ds\n"
        "push   es\n"
        "mov    ax,[bp+4]\n"    /* AH=0, AL=mode */
        "int    0x10\n"
        "pop    es\n"
        "pop    ds\n"
        "pop    di\n"
        "pop    si\n"
    );
}

I can write a function drawpixel(int offset, int val) to write a pixel at segment A000, but for anyone adventurous, you might try it yourself. The function would be coded like above, but instead of saving all of SI/DI/DS/ES, it would just save DS and BX, load DS with A000, then load the video RAM offset passed to the function at [bp+4] into BX, load the pixel value at [bp+6] into AL, write it using "mov [bx],al", then restore BX and DS.

@toncho11
Contributor Author

You mean the drawpixel function should be in assembler? I am definitely not an assembler guy. The idea is to have only 1 function set_mode in assembler and have all the drawing routines in C with some kind of pointer to the video memory, as in:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define VGA_256_COLOR_MODE  0x13      /* use to set 256-color mode. */
#define TEXT_MODE           0x03      /* use to set 80x25 text mode. */

#define SCREEN_WIDTH        320       /* width in pixels of mode 0x13 */
#define SCREEN_HEIGHT       200       /* height in pixels of mode 0x13 */
#define NUM_COLORS          256       /* number of colors in mode 0x13 */

#define sgn(x) ((x<0)?-1:((x>0)?1:0)) /* macro to return the sign of a
                                         number */
typedef unsigned char  byte;
typedef unsigned short word;

byte __far *VGA=(byte __far *)0xA0000000L;        /* this points to video VGA memory. */

/**************************************************************************
 *  set_mode                                                              *
 *     Sets the video mode.                                               *
 **************************************************************************/

void set_mode(byte mode)
{
   // SI, DI, BP, ES and probably DS are to be saved
   // cli and sti are used to make a proper BIOS call from ELKS
   __asm__(
  "push %%si;"
  "push %%di;"
  "push %%bp;"
  "push %%es;"
  "cli;"
  "mov %%ah,0;" 
  "mov %%al,%0;" 
  "int $0x10;"
  "sti;"
  "pop %%es;"
  "pop %%bp;"
  "pop %%di;"
  "pop %%si;"
     : /* no outputs */
     : "r" (mode)
     : ); //list of modified registers
}

/**************************************************************************
 *  plot_pixel                                                            *
 *    Plot a pixel by directly writing to video memory, with no           *
 *    multiplication.                                                     *
 **************************************************************************/

void plot_pixel(int x,int y,byte color)
{
  /*  y*320 = y*256 + y*64 = y*2^8 + y*2^6   */
  VGA[(y<<8)+(y<<6)+x]=color;
}

int main()
{
  set_mode(VGA_256_COLOR_MODE);       /* set the video mode to 256 colors 320 x 200 */
								 
  for (int i=0;i<60;i++)
  	plot_pixel(100+i,100,5);

  for (int i=0;i<60;i++)
        plot_pixel(100,100+i,0xA);

  sleep(3);
  
  set_mode(TEXT_MODE);                /* set the video mode back to text mode. */

  return 0;
}

@ghaerr
Owner

ghaerr commented Jan 17, 2025

The idea is to have only 1 function set_mode in assembler and have all the drawing routines in C with some kind of pointer to the video memory

As I mentioned above, the toolchain C86 compiler doesn't support __far pointers, so we can't use some kind of pointer to video memory in C. All low-level drawing routines will have to be implemented using a function call written in assembler.

@rafael2k
Contributor

The idea is to have only 1 function set_mode in assembler and have all the drawing routines in C with some kind of pointer to the video memory

As I mentioned above, the toolchain C86 compiler doesn't support __far pointers, so we can't use some kind of pointer to video memory in C. All low-level drawing routines will have to be implemented using a function call written in assembler.

Maybe a draw() function that takes the draw buffer as input would work (calling the assembly-written function to write to video memory), instead of plot_pixel()?
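A minimal sketch of what such a buffer-based draw could look like in plain C, assuming a low-level routine with the same signature as the writevid(offset, val) discussed in this thread. The names draw_buffer, write_fn, fake_vram and fake_writevid are hypothetical and only for illustration; the fake routine stands in for video RAM so the sketch can be tried on a host machine.

```c
#include <stddef.h>

/* Hypothetical type matching the writevid(offset, val) signature. */
typedef void (*write_fn)(unsigned int offset, unsigned int val);

/* Copy len bytes from a near buffer to video RAM starting at offset,
 * making one call to the low-level write routine per byte. */
void draw_buffer(write_fn put, unsigned int offset,
                 const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        put(offset + (unsigned int)i, buf[i]);
}

/* Host-side stand-in for video RAM (assumption: used for testing only). */
unsigned char fake_vram[64];

void fake_writevid(unsigned int offset, unsigned int val)
{
    if (offset < sizeof fake_vram)
        fake_vram[offset] = (unsigned char)val;
}
```

This still pays one function call per pixel. A faster variant would probably move the whole copy loop into the assembler routine, which is not shown here.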

@ghaerr
Owner

ghaerr commented Jan 17, 2025

Here's your function; you'll have to convert x,y to an offset into video RAM, then call writevid(offset, val):

/* write byte val at video RAM at A000:offset */
void writevid(unsigned int offset, unsigned int val)
{
    asm(
        "push   ds\n"
        "push   bx\n"
        "mov    ax,#0xA000\n"
        "mov    ds,ax\n"
        "mov    bx,[bp+4]\n"    /* offset */
        "mov    al,[bp+6]\n"    /* val */
        "mov    [bx],al\n"
        "pop    bx\n"
        "pop    ds\n"
    );
}

I am writing this to show that one can't just assume that a single C library function can be used to implement graphics, it is always more complicated than that. Each of our compilers has to use incompatible non-standard extensions in order to accomplish even basic graphics. This in turn affects the speed and design of the application. Feel free to play with these two functions with our new toolchain and see what you can do with them.
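The x,y to offset conversion mentioned above can be sketched as follows. This assumes mode 0x13 (320x200, one byte per pixel); the function name vga_offset and the 0xFFFF "off-screen" return value are illustrative choices, not part of any ELKS API. Unsigned arithmetic is used because the largest valid offset (63999) does not fit in a signed 16-bit int.

```c
#define SCREEN_WIDTH  320
#define SCREEN_HEIGHT 200

/* Return the byte offset of pixel (x, y) in mode 0x13, or 0xFFFF when the
 * coordinates are off-screen (0xFFFF lies past the last pixel at 63999). */
unsigned int vga_offset(unsigned int x, unsigned int y)
{
    if (x >= SCREEN_WIDTH || y >= SCREEN_HEIGHT)
        return 0xFFFFu;

    /* y*320 = y*256 + y*64, so two shifts and two adds, no multiply */
    return (y << 8) + (y << 6) + x;
}
```

A caller would then do something like writevid(vga_offset(x, y), color) after checking for the 0xFFFF case.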

@toncho11
Contributor Author

Thank you @ghaerr !

@toncho11
Contributor Author

I think I got everything. I will have:

void writevid(unsigned int offset, unsigned int val)
{
    asm(
        "push   ds\n"
        "push   bx\n"
        "mov    ax,#0xA000\n"
        "mov    ds,ax\n"
        "mov    bx,[bp+4]\n"    /* offset */
        "mov    al,[bp+6]\n"    /* val */
        "mov    [bx],al\n"
        "pop    bx\n"
        "pop    ds\n"
    );
}

void plot_pixel(int x,int y,byte color)
{
     /*  y*320 = y*256 + y*64 = y*2^8 + y*2^6   */
    int offset = (y<<8)+(y<<6)+x;
    writevid(offset, color);
}

@toncho11
Contributor Author

toncho11 commented Jan 17, 2025

Below is the full source code I will test later.

/*#include <stdio.h>
#include <stdlib.h>*/
#include <unistd.h>

#define VGA_256_COLOR_MODE  0x13      /* use to set 256-color mode. */
#define TEXT_MODE           0x03      /* use to set 80x25 text mode. */

#define SCREEN_WIDTH        320       /* width in pixels of mode 0x13 */
#define SCREEN_HEIGHT       200       /* height in pixels of mode 0x13 */
#define NUM_COLORS          256       /* number of colors in mode 0x13 */

#define sgn(x) ((x<0)?-1:((x>0)?1:0)) /* macro to return the sign of a
                                         number */
typedef unsigned char  byte;

/**************************************************************************
 *  set_mode                                                              *
 *     Sets the video mode.                                               *
 **************************************************************************/

void set_mode(int mode)
{
    asm(
        "push   si\n"
        "push   di\n"
        "push   ds\n"
        "push   es\n"
        "mov    ax,[bp+4]\n"    /* AH=0, AL=mode */
        "int    0x10\n"
        "pop    es\n"
        "pop    ds\n"
        "pop    di\n"
        "pop    si\n"
    );
}

/**************************************************************************
 *  plot_pixel                                                            *
 *    Plot a pixel by directly writing to video memory, with no           *
 *    multiplication.                                                     *
 **************************************************************************/

void writevid(unsigned int offset, unsigned int val)
{
    asm(
        "push   ds\n"
        "push   bx\n"
        "mov    ax,#0xA000\n"
        "mov    ds,ax\n"
        "mov    bx,[bp+4]\n"    /* offset */
        "mov    al,[bp+6]\n"    /* val */
        "mov    [bx],al\n"
        "pop    bx\n"
        "pop    ds\n"
    );
}

void plot_pixel(int x,int y,byte color)
{
     /*  y*320 = y*256 + y*64 = y*2^8 + y*2^6   */
    int offset = (y<<8)+(y<<6)+x;
    writevid(offset, color);
}

int main()
{
  set_mode(VGA_256_COLOR_MODE);       /* set the video mode to 256 colors 320 x 200 */
								 
  for (int i=0;i<60;i++)
  	plot_pixel(100+i,100,5);

  for (int i=0;i<60;i++)
        plot_pixel(100,100+i,0xA);

  sleep(3);
  
  set_mode(TEXT_MODE);                /* set the video mode back to text mode. */

  return 0;
}

@toncho11
Contributor Author

Actually this can be another demo for the 8086-toolchain, right? :)

@toncho11
Contributor Author

toncho11 commented Jan 17, 2025

Sorry for all these posts. This should be the final version that does not use any headers. Maybe it is ready for an 8086 toolchain demo. Is volatile supported?

#define VGA_256_COLOR_MODE  0x13      /* use to set 256-color mode. */
#define TEXT_MODE           0x03      /* use to set 80x25 text mode. */

#define SCREEN_WIDTH        320       /* width in pixels of mode 0x13 */
#define SCREEN_HEIGHT       200       /* height in pixels of mode 0x13 */
#define NUM_COLORS          256       /* number of colors in mode 0x13 */

#define sgn(x) ((x<0)?-1:((x>0)?1:0)) /* macro to return the sign of a
                                         number */
typedef unsigned char  byte;

/* simple sleep that does not use any headers or asm */
void sleep(int seconds) {
    volatile unsigned long long count = 0;
    unsigned long long target = 1000000ULL * seconds; // Adjust as needed for the CPU speed
    
    while (count < target) {
        count++;
    }
}

/**************************************************************************
 *  set_mode                                                              *
 *     Sets the video mode.                                               *
 **************************************************************************/

void set_mode(int mode)
{
    asm(
        "push   si\n"
        "push   di\n"
        "push   ds\n"
        "push   es\n"
        "mov    ax,[bp+4]\n"    /* AH=0, AL=mode */
        "int    0x10\n"
        "pop    es\n"
        "pop    ds\n"
        "pop    di\n"
        "pop    si\n"
    );
}

/**************************************************************************
 *  plot_pixel                                                            *
 *    Plot a pixel by directly writing to video memory, with no           *
 *    multiplication.                                                     *
 **************************************************************************/

void writevid(unsigned int offset, unsigned int val)
{
    asm(
        "push   ds\n"
        "push   bx\n"
        "mov    ax,#0xA000\n"
        "mov    ds,ax\n"
        "mov    bx,[bp+4]\n"    /* offset */
        "mov    al,[bp+6]\n"    /* val */
        "mov    [bx],al\n"
        "pop    bx\n"
        "pop    ds\n"
    );
}

void plot_pixel(int x,int y,byte color)
{
     /*  y*320 = y*256 + y*64 = y*2^8 + y*2^6   */
    int offset = (y<<8)+(y<<6)+x;
    writevid(offset, color);
}

int main()
{
  set_mode(VGA_256_COLOR_MODE);       /* set the video mode to 256 colors 320 x 200 */
								 
  for (int i=0;i<60;i++)
  	plot_pixel(100+i,100,5);

  for (int i=0;i<60;i++)
        plot_pixel(100,100+i,0xA);

  sleep(3);
  
  set_mode(TEXT_MODE);                /* set the video mode back to text mode. */

  return 0;
}

@ghaerr
Owner

ghaerr commented Jan 17, 2025

Maybe it is ready for an 8086 toolchain demo.

Does it even compile? long long is not supported, but I seem to remember it accepts it then generates code using long instead.

You're better off using the C library routine sleep than trying to reinvent it with very inaccurate busy looping. You don't need unistd.h, although you'd want to use it should you use this as an example in toolchain. In order to test without copying unistd.h, you can always just declare it in the same way as done in unistd.h:

unsigned int sleep(unsigned int seconds);

Volatile is not supported. In general, C86 won't optimize any loops so the result would be the same without it.

typedef unsigned char byte;

Since writevid is declared using unsigned int val, it's better to pass it that (an int), rather than a char (byte). In most cases, declaring a char parameter to a function ends up generating more and slower code anyways, as the value passed has to be either stripped or sign extended to int.

@toncho11
Contributor Author

toncho11 commented Jan 18, 2025

Thanks! It works! Tested in emulator. Here is the image:
elks_toolchain_vga1.zip
I replaced test.c with the code below. Try it with ./test

#define VGA_256_COLOR_MODE  0x13      /* use to set 256-color mode. */
#define TEXT_MODE           0x03      /* use to set 80x25 text mode. */

#define SCREEN_WIDTH        320       /* width in pixels of mode 0x13 */
#define SCREEN_HEIGHT       200       /* height in pixels of mode 0x13 */
#define NUM_COLORS          256       /* number of colors in mode 0x13 */

unsigned int sleep(unsigned int seconds);

/**************************************************************************
 *  set_mode                                                              *
 *     Sets the video mode.                                               *
 **************************************************************************/

void set_mode(int mode)
{
    asm(
        "push   si\n"
        "push   di\n"
        "push   ds\n"
        "push   es\n"
        "mov    ax,[bp+4]\n"    /* AH=0, AL=mode */
        "int    0x10\n"
        "pop    es\n"
        "pop    ds\n"
        "pop    di\n"
        "pop    si\n"
    );
}

/**************************************************************************
 *  plot_pixel                                                            *
 *    Plot a pixel by directly writing to video memory, with no           *
 *    multiplication.                                                     *
 **************************************************************************/

void writevid(unsigned int offset, unsigned int val)
{
    asm(
        "push   ds\n"
        "push   bx\n"
        "mov    ax,#0xA000\n"
        "mov    ds,ax\n"
        "mov    bx,[bp+4]\n"    /* offset */
        "mov    al,[bp+6]\n"    /* val */
        "mov    [bx],al\n"
        "pop    bx\n"
        "pop    ds\n"
    );
}

void plot_pixel(int x,int y,unsigned int color)
{
     /*  y*320 = y*256 + y*64 = y*2^8 + y*2^6   */
    int offset = (y<<8)+(y<<6)+x;
    writevid(offset, color);
}

int main()
{
  set_mode(VGA_256_COLOR_MODE);       /* set the video mode to 256 colors 320 x 200 */
								 
  for (int i=0;i<60;i++)
  	plot_pixel(100+i,100,5);

  for (int i=0;i<60;i++)
        plot_pixel(100,100+i,0xA);

  sleep(3);
  
  set_mode(TEXT_MODE);                /* set the video mode back to text mode. */

  return 0;
}

@toncho11
Contributor Author

toncho11 commented Jan 18, 2025

Works well in both copy.sh and 86box emulator. I see two lines with two different colors. Drawing is faster than I thought.

@ghaerr
Owner

ghaerr commented Jan 18, 2025

Works well in both copy.sh and 86box emulator.

Nice work! I'm especially happy since I was unable to test either ASM routine I wrote except for compiling :)

I see two lines with two different colors. Drawing is faster than I thought.

It's pretty hard to judge speed when only two lines are drawn. Try clearing the whole screen (320x200=64,000 pixels) to a certain color and see how long that takes. A clear screen or rectangle fill will give a much better idea of how fast/usable the graphics routine(s) are.
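A rough sketch of such a fill test in C, assuming the per-pixel writevid(offset, val) style routine from this thread. The names fill_rect, pixel_writer, pixels_written and count_writes are hypothetical; count_writes is a host-side stand-in that only counts calls, so the sketch shows how many low-level writes a full-screen clear costs rather than how long they take.

```c
/* Hypothetical type matching the writevid(offset, val) signature. */
typedef void (*pixel_writer)(unsigned int offset, unsigned int val);

/* Fill a w x h rectangle at (x, y) in mode 0x13 (320 bytes per row).
 * The row offset is computed once per row, not once per pixel. */
void fill_rect(pixel_writer put, unsigned int x, unsigned int y,
               unsigned int w, unsigned int h, unsigned int color)
{
    unsigned int row, col;

    for (row = 0; row < h; row++) {
        unsigned int offset = (y + row) * 320u + x;

        for (col = 0; col < w; col++)
            put(offset++, color);
    }
}

/* Host-side stand-in that counts writes instead of touching video RAM. */
unsigned long pixels_written;

void count_writes(unsigned int offset, unsigned int val)
{
    (void)offset;
    (void)val;
    pixels_written++;
}
```

Calling fill_rect(writevid, 0, 0, 320, 200, color) on the target would clear the screen with 64,000 function calls, which is the cost worth measuring.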

@toncho11
Contributor Author

toncho11 commented Jan 18, 2025

Thank you again!
I played with it this morning. It is quite doable indeed with an emulated 8 MHz CPU. I can edit the file with "edit", recompile, and execute.
In the future we will make it recompile and execute from an editor :)

@rafael2k
Contributor

rafael2k commented Jan 20, 2025

I selected a jpeg decoder and will try to integrate with our "vga-lib":
https://github.com/rafael2k/elks-viewer

Btw, to use it with ia16-gcc, I can just use a __far pointer to access the video memory, right? But for portability, I like the idea of putting the functions in a library.
Thinking about how to ifdef the code at this point...

rafael2k added a commit to rafael2k/8086-toolchain that referenced this issue Jan 20, 2025
ghaerr pushed a commit to ghaerr/8086-toolchain that referenced this issue Jan 20, 2025
* added VGA example by @toncho11, as posted here
ghaerr/elks#2184

* added missing vga.c

* add vga example to main target in makefile for elks

* updated vga example to vgatest
@rafael2k
Contributor

rafael2k commented Jan 24, 2025

I started the image viewer, and I'm still compiling with OW, but wanna change to gcc-ia16. Quick question - can I use the new memory functions with gcc-ia16? And use far pointers (to the VGA memory)?
https://github.com/rafael2k/elks-viewer

@ghaerr
Owner

ghaerr commented Jan 24, 2025

can I use the new memory functions with gcc-ia16?

I assume you mean __fmalloc, the single-arena allocator? Yes you can and it should work. However, it is untested since there's really no need for it in ia16-elf-gcc, since it still uses only a single 64k (which is already available in ia16-elf-gcc's small model) and returns a far pointer (which usually complicates issues for small model programs). If you need a very large segment of memory, you can use fmemalloc directly.

And use far pointers (to the VGA memory)?

Yes, ia16-elf-gcc supports far pointers, so it'll be easier to access VGA using a pointer dereference, rather than a function call. @toncho11's original VGA sample posted above uses far pointers and was built for ia16-elf-gcc.
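For readers unfamiliar with the 0xA0000000L constant in that sample: it packs segment A000 and offset 0000 into one 32-bit value. The portable sketch below only illustrates the arithmetic; far_value and linear_address are hypothetical helper names, not part of ia16-elf-gcc or ELKS.

```c
#include <stdint.h>

/* Pack segment:offset into the 32-bit form used by the 0xA0000000L
 * literal: segment in the high 16 bits, offset in the low 16 bits. */
uint32_t far_value(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 16) | off;
}

/* Real-mode physical address: segment * 16 + offset. */
uint32_t linear_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

So the first VGA pixel at A000:0000 sits at physical address 0xA0000, and the last mode 0x13 pixel (offset 63999) at 0xAF9FF.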

@rafael2k
Contributor

Yay. The beginning of the viewer starts to come to light!

Image

The routine for color conversion is still very slow...

@ghaerr
Owner

ghaerr commented Jan 25, 2025

Wow!!!! Very impressive coding indeed! You're really on a roll here! :)

I have some fast blending routines that don't require a divide by 255, I'll post them for your consideration.

@rafael2k
Contributor

rafael2k commented Jan 25, 2025

Wow!!!! Very impressive coding indeed! You're really on a roll here! :)

I have some fast blending routines that don't require a divide by 255, I'll post them for your consideration.

: )
I'd appreciate it. I'm also thinking about loading custom palettes, so we could display e.g. high-quality grayscale images, if I'm getting the theory of loading custom palettes right. BMP, for example, can embed the palette, which can be very useful. If we ever have a bare, simple VGA lib, functions for retrieving and loading the color palette would be cool, for all gfx modes.

ps: after cleaning up the mess in the code, and porting to gcc-ia16, could such software go to upstream elkscmd?

@ghaerr
Owner

ghaerr commented Jan 25, 2025

after cleaning up the mess in the code, and porting to gcc-ia16, could such software go to upstream elkscmd?

Heck yeah!!! Actually I really like what you've done in main.c, very nicely coded. It'd be great to have that in ELKS. Of course, at first the image viewer would only run on VGA, not EGA or CGA. CGA image support probably isn't worth doing, but ultimately an EGA "driver" (see Nano-X for that) might be nice. Unfortunately, that complicates code by quite a bit, so starting with just VGA is fine with me.

Regarding speeding up the drawing, I have several ideas that should greatly help. First get rid of the "/ 255" divides. There are whole books written on how to speed up blending and such, I'm trying to find specific samples that best match your code. In particular, you can try replacing code like:

		   int red = (pixel[0] * 8) / 255;
		   int green = (pixel[1] * 8) / 255;
		   int blue = (pixel[2] * 4) / 255;

with a somewhat accurate

		   int red = ((pixel[0]+1) * 8) >> 8;
		   int green = ((pixel[1]+1) * 8) >> 8;
		   int blue = ((pixel[2]+1) * 4) >> 8;

or less accurate but still usable and quite fast:

		   int red = (pixel[0] * 8) >> 8;
		   int green = (pixel[1] * 8) >> 8;
		   int blue = (pixel[2] * 4) >> 8;

I am trying to find a copy of "Jim Blinn's Corner" graphics books where he gets into integer math blending with no division, but can't find it yet. Also see https://stackoverflow.com/questions/78481241/fast-alpha-blending-cpu-only.
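To compare the three variants above side by side, here is a small sketch with each formula wrapped in a function for the scale-by-8 case. The function names are hypothetical. One thing it makes visible: for an input of 255 the first two forms produce 8, which would not fit in a 3-bit field if that is what the result is used for (an assumption about the viewer code), while the plain shift stays within 0..7.

```c
/* Original form: scale an 8-bit channel (0..255) with a divide. */
int quant_div(int c)
{
    return (c * 8) / 255;
}

/* "Somewhat accurate" form: add 1, then shift instead of dividing. */
int quant_shift_plus1(int c)
{
    return ((c + 1) * 8) >> 8;
}

/* Fastest form: plain shift, equivalent to c >> 5. */
int quant_shift(int c)
{
    return (c * 8) >> 8;
}
```

For most inputs the three agree or differ by one step; the differences sit at the bucket edges and at the top value.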

Another big improvement is to calculate all channels at once. This is usually best when running 24 or 32 bit RGBA. Take a look at the framebuffer driver I wrote for ChrysaLisp in function blit_blend():

/* premultiplied alpha blend or color mod blit, no clipping */
static void blit_blend(Drawable *ts, const Rect *srect, Drawable *td, const Rect *drect)
{
    //unassert(srect->w == drect->w);   //FIXME check why src width can != dst width
    /* src and dst height can differ, will use dst height for drawing */
    pixel_t *dst = (pixel_t *)(td->pixels + drect->y * td->pitch + drect->x * td->bytespp);
    pixel_t *src = (pixel_t *)(ts->pixels + srect->y * ts->pitch + srect->x * ts->bytespp);
    int span = drect->w * td->bytespp;
    int dspan = td->pitch - span;
    int sspan = ts->pitch - span;
    int y = drect->h;
    do {
        int x = drect->w;
        do {
            pixel_t sa = *src++;
            if (sa > 0x00ffffff) {
                if (ts->color == 0xffffff) {        /* premul blend from source */
                    if (sa < 0xff000000) {
                        pixel_t drb = *dst;
                        pixel_t dg = drb & 0x00ff00;
                               drb = drb & 0xff00ff;
                        pixel_t da = 0xff - (sa >> 24);
                        drb = ((drb * da >> 8) & 0xff00ff) + (sa & 0xff00ff);
                        dg =   ((dg * da >> 8) & 0x00ff00) + (sa & 0x00ff00);
                        *dst = drb + dg;
                     } else {                       /* source copy */
                        *dst = sa & 0xffffff;
                     }
                } else {                            /* color mod blend (glyphs) */
                    pixel_t sr = sa & 0xff0000;
                    pixel_t sg = sa & 0x00ff00;
                    pixel_t sb = sa & 0x0000ff;
                    sr = (sr * ts->r >> 8) & 0xff0000;
                    sg = (sg * ts->g >> 8) & 0x00ff00;
                    sb =  sb * ts->b >> 8;
                    if (sa < 0xff000000) {
                        pixel_t da = 0xff - (sa >> 24);
                        pixel_t drb = *dst;
                        pixel_t dg = drb & 0x00ff00;
                               drb = drb & 0xff00ff;
                        drb = ((drb * da >> 8) & 0xff00ff) + sr + sb;
                        dg =   ((dg * da >> 8) & 0x00ff00) + sg;
                        *dst = drb + dg;
                    } else {
                        *dst = sr + sg + sb;
                    }
                }
            }
            dst++;
        } while (--x > 0);
        dst = (pixel_t *)((uint8_t *)dst + dspan);
        src = (pixel_t *)((uint8_t *)src + sspan);
    } while (--y > 0);
}

The above is complicated by trying to quickly handle cases where alpha=0 (no drawing) and alpha=255 (no blending). It demonstrates using >> 8 instead of / 255 and also calculating all channels at once for max speed.
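Stripped of the special cases, the core of the premultiplied blend path above can be sketched as a single function. This is a simplified extract for illustration, assuming 32-bit ARGB pixels with premultiplied source color; blend_premul is a hypothetical name and not a function in the ChrysaLisp driver.

```c
#include <stdint.h>

/* Blend one premultiplied-alpha ARGB source pixel over an RGB destination.
 * Red and blue are scaled together in one multiply, green in another, and
 * ">> 8" approximates the divide by 255. */
uint32_t blend_premul(uint32_t src, uint32_t dst)
{
    uint32_t da  = 0xff - (src >> 24);   /* inverse source alpha */
    uint32_t drb = dst & 0xff00ff;       /* red and blue channels */
    uint32_t dg  = dst & 0x00ff00;       /* green channel */

    drb = ((drb * da >> 8) & 0xff00ff) + (src & 0xff00ff);
    dg  = ((dg  * da >> 8) & 0x00ff00) + (src & 0x00ff00);
    return drb + dg;
}
```

Because of the ">> 8" approximation, a fully transparent source slightly darkens the destination (0xabcdef becomes 0xaaccee), which is one reason the original code skips alpha=0 pixels entirely.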

Finally, I would recommend you take the switch() statement out of your inner loops in main.c, and possibly write separate routines or loops for RGB vs BGR. Doing any excess add/subtract/multiplies or switches within a tight main blit loop will slow things down quite a bit. plot_pixel can be made an OWC inline function using a #pragma like you have with mode3() etc.

No need to switch just yet to ia16-elf-gcc - I would continue to get the best speed you can using the OWC extensions and then we can worry about how to get it into the ELKS tree. At some point, we probably need to run the OWC compiler to build ELKS, at least for an optional "2nd pass".

Nice work!

@ghaerr
Owner

ghaerr commented Jan 25, 2025

Hello @rafael2k,

Here's a great paper on the subject of fast blending without divisions. The formula I was looking for is on the top of Page 9, "Jim Blinn's best blending for 8-bit ARGB":

#define INT_MULT(a,b,t) ((t) = (a)*(b)+0x80, ((((t)>>8)+(t))>>8))

This multiplies an 8-bit value A by an 8-bit value B, scaled as A*B/255, using T as a temporary, and yields an 8-bit result. My comment above around a "somewhat accurate" blend adds 1 rather than 1/2 (0x80 in some cases) so it's not quite correct. When I was trying to write the super-fast framebuffer driver for ChrysaLisp on Raspberry Pi, I went so far as to write C test routines to see exactly what various formulas spit out as results. I've attached them here which might help see how all this relates to your cool image viewer code (see test.c):
fast_blend_test.zip
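As a quick illustration of what the INT_MULT macro computes, here is a function form next to a reference that uses a real division. Both names (int_mult, ref_mult) are hypothetical and this is not the content of the attached test.c. The intermediate values stay below 65536 for 8-bit inputs, so int_mult also fits in a 16-bit unsigned int.

```c
/* Function form of INT_MULT: a*b/255 rounded to nearest, using only a
 * multiply, adds and shifts. Inputs are expected to be 0..255. */
unsigned int int_mult(unsigned int a, unsigned int b)
{
    unsigned int t = a * b + 0x80;

    return ((t >> 8) + t) >> 8;
}

/* Reference: the same rounding done with an actual division. Uses long
 * arithmetic, so it is meant for checking results, not for speed. */
unsigned int ref_mult(unsigned int a, unsigned int b)
{
    return (unsigned int)(((unsigned long)a * b * 2 + 255) / 510);
}
```

Spot checks such as int_mult(255, 128) and int_mult(1, 127) give the same results as the dividing version.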

In the paper, the FbByteMul macro applies fast blending for all channels at once. If you're not fully blending, but have alpha=255 as when directly drawing on the screen, things can get simpler, which are sometimes handled in the final blit output routine. You'll likely have to play around a bit in order to find the fastest method for our very slow CPUs and displays. If you keep the final decoded image in RAM (far or near), we can probably grab a low-level blit from ELKS nano-X to output that as quickly as possible to EGA screens for later.

If you want some GCC-compatible fast output routines/macros, the fast output inline macro convblit_8888 in Microwindows is worth looking at; it takes advantage of the GCC ability to optimize out constants when used as macros inside a separate blit function, and also has some cool orientation swaps (left & right portrait modes and upside down). I'm pretty sure our ia16-elf-gcc will be able to handle the constant optimizations, I'm not sure about OWC.

@rafael2k
Contributor

Thanks @ghaerr!

I managed to at least fix the color conversion to the default VGA palette, but it is still slow, as I'm doing an almost exhaustive search for the nearest color in the RGB VGA palette:
https://github.com/rafael2k/elks-viewer/blob/c28890079ec2fc13c09763dc55cb6cd1467b5778/ppmview.c#L361
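As a point of comparison, a plain nearest-color search can be sketched as below. This is not the code at the link above; `struct rgb` and `nearest_index` are hypothetical names, and the distance metric (squared Euclidean distance in RGB) is an assumption. It stops early on an exact match, which helps on images that already use palette colors.

```c
#include <stdint.h>

struct rgb { uint8_t r, g, b; };

/* Return the index of the palette entry closest to (r,g,b),
 * using squared Euclidean distance. count must be >= 1. */
static int nearest_index(const struct rgb *pal, int count,
                         uint8_t r, uint8_t g, uint8_t b)
{
    int best = 0;
    uint32_t best_dist = 0xFFFFFFFFul;
    int i;

    for (i = 0; i < count; i++) {
        int32_t dr = (int32_t)pal[i].r - r;
        int32_t dg = (int32_t)pal[i].g - g;
        int32_t db = (int32_t)pal[i].b - b;
        uint32_t d = (uint32_t)(dr * dr + dg * dg + db * db);

        if (d < best_dist) {
            best_dist = d;
            best = i;
            if (d == 0)         /* exact match, nothing can be closer */
                break;
        }
    }
    return best;
}
```

With a 256-entry palette this is up to 256 distance computations per pixel, which is why the later comments in this thread move toward a fixed palette and simple bit operations.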

@toncho11
Contributor Author

I suppose some buffering can help. Size 20kb?
You can do rgb2vga for 20 KB of pixels (it does the search for many pixels at the same time) and then write the 20 KB buffer to VGA with something like plot_buffer().
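A rough sketch of that two-step idea, under several assumptions: `fake_vga` is an ordinary array standing in for video memory so the code runs on a host (on ELKS it would be a far pointer to the VGA segment), `rgb2vga_stub` is only a placeholder for the real color lookup, and `convert_pixels` and this `plot_buffer` signature are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_BYTES 64000u     /* 320 x 200, one byte per pixel */

/* Stand-in for video memory so the sketch can run anywhere. */
static uint8_t fake_vga[SCREEN_BYTES];

/* Placeholder for the real lookup: returns a gray level,
 * NOT a real VGA palette index. */
static uint8_t rgb2vga_stub(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(((unsigned)r + g + b) / 3u);
}

/* Step 1: convert n packed RGB pixels into one byte per pixel. */
static void convert_pixels(const uint8_t *rgb, uint8_t *buf, unsigned n)
{
    unsigned i;

    for (i = 0; i < n; i++, rgb += 3)
        buf[i] = rgb2vga_stub(rgb[0], rgb[1], rgb[2]);
}

/* Step 2: copy the converted pixels to the screen in one go.
 * Returns the number of bytes actually copied (clipped to the screen). */
static unsigned plot_buffer(unsigned long offset, const uint8_t *buf, unsigned n)
{
    if (offset >= SCREEN_BYTES)
        return 0;
    if (n > SCREEN_BYTES - offset)
        n = (unsigned)(SCREEN_BYTES - offset);
    memcpy(fake_vga + offset, buf, n);
    return n;
}
```

The buffer size is a trade-off: a single line (320 bytes) keeps memory use small, while a larger buffer means fewer copy calls.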

@rafael2k
Contributor

Yeap, I'll do some buffering, most likely trying line by line first.
I found a nice dithering implementation for 4-bit color, so definitely CGA and EGA will be supported.
https://github.com/rafael2k/elks-viewer/blob/master/dither16.c
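For readers new to dithering, here is a much simpler technique than the 4-bit color code linked above: ordered (Bayer) dithering of a gray value down to 1 bit. It is only an illustration of the general idea and is not the algorithm in dither16.c; the name `dither_1bit` is made up.

```c
#include <stdint.h>

/* Standard 4x4 Bayer threshold matrix (values 0..15). */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 }
};

/* Return 1 (pixel on) or 0 (pixel off) for an 8-bit gray value
 * at screen position (x, y). The threshold depends only on the
 * position inside the repeating 4x4 tile. */
static int dither_1bit(uint8_t gray, unsigned x, unsigned y)
{
    unsigned threshold = bayer4[y & 3u][x & 3u] * 16u + 8u;

    return gray > threshold;
}
```

A mid-gray input turns on half the pixels of each 4x4 tile, so the average brightness is kept even though each pixel is only black or white.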

@rafael2k
Contributor

rafael2k commented Jan 27, 2025

@ghaerr, I'm looking at nano-x, and I think the palette functions were never added, right? Or am I missing something?
For faster 24-bit RGB to VGA conversion (I can do with simple bit shifting), and even for 8-bit (4 and 1 too) bmp, it is faster if we load an optimized palette (or the bmp palette, in case of bmp <= 8-bit).

ps: now there is a viewbmp and viewppm, both working (to some extent) and viewjpg, where there is a bug somewhere, and the image does not look right yet.

@ghaerr
Owner

ghaerr commented Jan 27, 2025

I think the palette functions were never added, right?

I haven't looked at the ELKS version (too busy at the moment) but the main Microwindows repo has full support for palette drawing.

For faster 24-bit RGB to VGA conversion (I can do with simple bit shifting), and even for 8-bit (4 and 1 too) bmp, it is faster if we load an optimized palette (or the bmp palette, in case of bmp <= 8-bit).

Yes, a predefined optimized palette is a good idea. Microwindows uses that, see src/engine/devpal*.c in the main repo.
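One common fixed layout that makes the conversion pure bit shifting is 3-3-2 (3 bits red, 3 bits green, 2 bits blue). The sketch below assumes that layout; it is not necessarily what the Microwindows devpal files use, and both function names are hypothetical.

```c
#include <stdint.h>

/* Map 24-bit RGB to an 8-bit index laid out as RRRGGGBB. */
static uint8_t rgb_to_332(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}

/* Expand an index back to 8-bit RGB, e.g. to build the palette
 * table that gets loaded into the card once at startup. */
static void rgb332_entry(uint8_t index, uint8_t *r, uint8_t *g, uint8_t *b)
{
    unsigned r3 = index >> 5;           /* 0..7 */
    unsigned g3 = (index >> 2) & 7u;    /* 0..7 */
    unsigned b2 = index & 3u;           /* 0..3 */

    *r = (uint8_t)(r3 * 255u / 7u);
    *g = (uint8_t)(g3 * 255u / 7u);
    *b = (uint8_t)(b2 * 255u / 3u);
}
```

The per-pixel cost is two masks, two shifts and two ORs, with no search at all; the price is coarse blue resolution.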

And I realize now I got a bit overambitious talking about all the fast blending above, as you're actually doing screen output in palette mode. So most of that doesn't really apply (yet!!!) :) You will want to replace some of your internal / 255 routines with the integer versions using the Alvy Ray Smith paper I mentioned - lots faster.

And drawing into the VGA using palette mode is much quicker, since you're just writing 1 byte per pixel, right? Above regarding separate drivers for EGA and CGA I was thinking about the non-palette modes which require much more bit twiddling and shifting in order to work. But those only work for 16 colors so pretty useless for images, where 256 colors is much superior. Sorry for any confusion this may have caused!
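To make the "1 byte per pixel" point concrete: in the 320x200 256-color mode the pixel at (x, y) lives at byte offset y*320 + x. A small sketch, where `fb` is a plain pointer standing in for the far pointer to video memory and both function names are invented here:

```c
#include <stdint.h>

/* Byte offset of pixel (x, y) in a 320x200, 1 byte per pixel mode.
 * 320 = 256 + 64, so the multiply can be done with two shifts.
 * The caller must keep x < 320 and y < 200. */
static uint16_t mode13_offset(unsigned x, unsigned y)
{
    return (uint16_t)((y << 8) + (y << 6) + x);
}

/* Write one palette index, ignoring out-of-range coordinates. */
static void plot_pixel(uint8_t *fb, unsigned x, unsigned y, uint8_t color)
{
    if (x < 320u && y < 200u)
        fb[mode13_offset(x, y)] = color;
}
```

The largest offset is 199*320 + 319 = 63999, so the whole screen fits in one 64K segment, which is what makes this mode so convenient in real mode.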

now there is a viewbmp and viewppm, both working (to some extent) and viewjpg

Nice! This will be very nice to have in ELKS!

@rafael2k
Contributor

rafael2k commented Jan 28, 2025

I created some images with "MS Paint" and saved them as 1, 4 and 8-bit to test the BMP reader I'm writing. It works! But now I need to implement the functions to change the palette, as the speed difference between loading an 8-bit BMP without pixel conversion and loading a 24-bit image with all the nearest-pixel math is brutal at this point.

I realize that the palette operations on CGA, EGA and VGA are a little different, right?
I'm using this as reference:
https://www.chibialiens.com/8086/platform.php?noui=1

If we have these palette set functions somewhere, please lemme know.

@ghaerr
Owner

ghaerr commented Jan 28, 2025

If we have these palette set functions somewhere, please lemme know.

I looked around at both ELKS and main Nano-X and it seems that palette modes were never supported on older EGA/VGA hardware running in real mode - all the functions are for protected mode access to more modern and much faster graphics cards, those that can be set into a framebuffer mode and accessed directly. These cards were supported under Linux, but the kernel itself put the video hardware into and out of graphics mode.

The likely reasons these were never supported are that 1) the older original EGA and VGA cards are way too slow for reasonable graphics (other than just displaying an image, but no window system stuff), and 2) in real mode, one can't even access the video memory for many graphics cards as they consist of 256K RAM etc, with segments limited to 64K. That's why the EGA and VGA ended up using "banked/switched" planes of bits - a huge complication, adding even more slowdown to the process.
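To illustrate the extra bit twiddling in the planar 16-color modes: each byte in a plane covers 8 horizontal pixels, leftmost pixel in the top bit, and each of the 4 planes holds one bit of the color. The sketch below only computes where a pixel lives; it does not touch any hardware registers, and the struct and function names are hypothetical.

```c
#include <stdint.h>

struct planar_addr {
    uint16_t offset;    /* byte offset inside each plane */
    uint8_t  mask;      /* bit inside that byte */
};

/* Locate pixel (x, y) in a planar mode. bytes_per_row is 80 for
 * the 640-pixel-wide 16-color modes (640 / 8). */
static struct planar_addr planar_locate(unsigned x, unsigned y,
                                        unsigned bytes_per_row)
{
    struct planar_addr a;

    a.offset = (uint16_t)(y * bytes_per_row + (x >> 3));
    a.mask = (uint8_t)(0x80u >> (x & 7u));
    return a;
}

/* Bit that plane number `plane` (0..3) stores for a 4-bit color. */
static int plane_bit(uint8_t color, unsigned plane)
{
    return (color >> plane) & 1;
}
```

So one pixel write becomes up to four read-modify-write operations, one per plane, compared with a single byte store in the 256-color mode.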

I realize that the palette operations on CGA, EGA and VGA are a little different, right?

I'm not completely sure, it's been so long. It seems that you can use the BIOS INT 10h AH=10h function Get/Set Palette Registers to portably set a uniform palette for what you're doing. What is happening now, do these cards initialize with a standard palette that seems usable, or do you need a specialized palette for your images? (I realize that each image does best with a custom palette but an equidistant palette mapped to 24-bit RGB might look OK?)
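One way to build such an equidistant palette is a 6x6x6 color cube: 6 evenly spaced levels per channel, 216 entries in total, leaving 40 indexes free. This is a sketch under that assumption; the function names are made up and nothing here is taken from ELKS or Nano-X.

```c
#include <stdint.h>

/* Round an 8-bit channel value to the nearest of 6 levels (0..5). */
static uint8_t level6(uint8_t v)
{
    return (uint8_t)((v * 5u + 127u) / 255u);
}

/* Palette index of the cube entry nearest to (r, g, b): 0..215. */
static uint8_t rgb_to_cube(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(36u * level6(r) + 6u * level6(g) + level6(b));
}

/* 8-bit channel value stored in the palette for a level (0..5):
 * 0, 51, 102, 153, 204, 255. */
static uint8_t cube_level_value(unsigned level)
{
    return (uint8_t)(level * 51u);
}
```

The division in `level6` could be replaced by a 256-entry lookup table built once at startup, so the per-pixel work is three table reads and a few multiplies by constants.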

@rafael2k
Contributor

rafael2k commented Jan 28, 2025

Indeed. But I'm starting to like the int 0x10 facilities. They are pretty handy, and the planar mapping to 0xA000 in VGA makes video memory access easy. We'll need to have an equidistant palette mapped to 24-bit RGB for jpg, ppm and bmp >= 16 bpp. But for BMP with bpp <= 8, which contains the palette, it is optimal to just call the int 0x10 ah=0x10 function to set the appropriate palette and voila, the image loading is basically a memory copy.
: )
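A sketch of the data preparation that such a palette load needs, assuming the usual BMP color table layout (4 bytes per entry in blue, green, red, reserved order) and the 6-bit-per-channel VGA DAC. The BIOS call itself is left out so the code stays portable; the function name is hypothetical and this is not the code in graphics.c.

```c
#include <stdint.h>

/* Convert a BMP color table into a VGA DAC table.
 * bmp_pal: count entries of 4 bytes each: B, G, R, reserved (8-bit).
 * dac:     count entries of 3 bytes each: R, G, B (6-bit, 0..63).
 * The dac table has the layout expected by INT 10h AX=1012h
 * (set block of DAC registers: BX = first index, CX = count,
 * ES:DX = table). */
static void bmp_palette_to_dac(const uint8_t *bmp_pal, uint8_t *dac,
                               unsigned count)
{
    unsigned i;

    for (i = 0; i < count; i++) {
        dac[0] = (uint8_t)(bmp_pal[2] >> 2);    /* red   */
        dac[1] = (uint8_t)(bmp_pal[1] >> 2);    /* green */
        dac[2] = (uint8_t)(bmp_pal[0] >> 2);    /* blue  */
        bmp_pal += 4;
        dac += 3;
    }
}
```

Once the table is loaded, the 8-bit pixel data can go to the screen unchanged, which matches the "basically a memory copy" observation above (apart from BMP rows being stored bottom-up and padded to 4 bytes).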

ps: With more time close to the end of the week I'll add the plot_pixel for different modes and palette manipulation functions to graphics.c

@rafael2k
Contributor

rafael2k commented Jan 29, 2025

Questions a bit off-topic but related to graphics development - if I can use small memory model it is better/faster than large model, right? If I want to compile with OW for small model, it is just a matter of pointing to the small model compiled libc, right?

I might port the basic assembly code of elks-viewer to compile with our 3 toolchains assembly formats (OW, gcc-ia16 and C86).

With C86, which memory model are we using? Does small mean DS=CS, or is that tiny?

@ghaerr
Owner

ghaerr commented Jan 29, 2025

Small model, with a single code and single but separate data segment, is faster than large model in most cases. We don't support tiny model (DS=CS) in the ELKS kernel, as programs are always loaded with separate code/data segments - we support shared code segments and fork requires a separate data segment.

I might port the basic assembly code of elks-viewer to compile with our 3 toolchains assembly formats (OW, gcc-ia16 and C86).

Be aware that the compilers use different function calling sequences - in particular OW uses a register based calling sequence, and all use different assembly language. Each compiler uses and saves registers in different ways. That said, it'd be a good exercise to port your viewer to each compiler!

C86 implements small model, and never accesses the DS or ES register explicitly, but issues DS=ES=SS.

@ghaerr
Owner

ghaerr commented Jan 29, 2025

If I want to compile with OW for small model, it is just a matter of pointing to the small model compiled libc, right?

Yes. For now, you'll have to replace the MODEL= in libc/watcom.model and rebuild libc.lib. I have been planning to automate production of all four supported models producing libcl.lib, libcs.lib (large and small, etc). I will up the priority on getting that done so you don't have to do it.

@rafael2k
Contributor

rafael2k commented Jan 29, 2025

Small model, with a single code and single but separate data segment, is faster than large model in most cases. We don't support tiny model (DS=CS) in the ELKS kernel, as programs are always loaded with separate code/data segments - we support shared code segments and fork requires a separate data segment.

I might port the basic assembly code of elks-viewer to compile with our 3 toolchains assembly formats (OW, gcc-ia16 and C86).

Be aware that the compilers use different function calling sequences - in particular OW uses a register based calling sequence, and all use different assembly language. Each compiler uses and saves registers in different ways. That said, it'd be a good exercise to port your viewer to each compiler!

C86 implements small model, and never accesses the DS or ES register explicitly, but issues DS=ES=SS.

Thanks @ghaerr.
When I try to do the ports I'll shout. I might hit some trouble with stack vs register parameter passing to the assembly code. But I'm sure I'll find good examples of how to call assembly code from gcc-ia16 on ELKS.

Btw, I implemented palette get and set functions too:
https://github.com/rafael2k/elks-viewer/blob/a25ef7280fbfa3ca783dd24af3f61af14dc219ec/graphics.c#L400

The 4 and 8-bit BMP load really fast and with correct colors now! Next is the custom palette to ease RGB 24-bit to 8-bit conversion. Then dithering (especially for 4-bit and 1-bit modes, mono or not).

@rafael2k
Contributor

If I want to compile with OW for small model, it is just a matter of pointing to the small model compiled libc, right?

Yes. For now, you'll have to replace the MODEL= in libc/watcom.model and rebuild libc.lib. I have been planning to automate production of all four supported models producing libcl.lib, libcs.lib (large and small, etc). I will up the priority on getting that done so you don't have to do it.

Thanks. I imagined it was just that. I'll play with the small model a bit and see if all goes well, at least with the bmp and ppm viewers. JPEG is another beast, but I think it will fit in small model too. It will be one of the very few jpg viewers (I found none, but I bet one exists) for real-mode 8086. I've been looking for options for DOS, but they are all 386+.

@ghaerr
Owner

ghaerr commented Jan 30, 2025

JPEG is another beast but I also think it will fit in small model too.

The decoder will probably fit but I'm wondering about image sizes - it's pretty easy to get over 64k image size. You might consider using compact model, which is small code large data. This reduces code size in the executable but still allows for far data. If you end up having to index into a much larger array (than 64k), OWC also supports "huge" model for pointers, separately.

@rafael2k
Contributor

rafael2k commented Jan 30, 2025

I'm studying the picojpeg decoding loop so that I don't have to buffer the decoded image, since even a small resolution * 3 bytes per pixel will inevitably be bigger than 64k. So first I need to remove the buffering in decoding and write directly to video memory, like the bmp and ppm readers, which buffer just one line. In JPEG the buffer will be one macroblock instead of the whole image, so I'm optimistic. The next task would be to remove the input buffering, which will also not be trivial.
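The arithmetic behind this, as a small sketch. The helper names are invented, and the 16x16 macroblock size is an assumption (it depends on the JPEG's chroma subsampling). The product is computed in `unsigned long` because it would overflow a 16-bit `int`.

```c
/* Bytes needed for a w x h buffer at the given bytes per pixel. */
static unsigned long image_bytes(unsigned w, unsigned h,
                                 unsigned bytes_per_pixel)
{
    return (unsigned long)w * h * bytes_per_pixel;
}

/* True if a buffer of that size fits in one 64K segment. */
static int fits_in_segment(unsigned long bytes)
{
    return bytes <= 65536ul;
}
```

For a 320x200 image, `image_bytes(320, 200, 3)` is 192000 bytes, about three segments, while one RGB line is `image_bytes(320, 1, 3)` = 960 bytes and one 16x16 RGB macroblock is `image_bytes(16, 16, 3)` = 768 bytes, both of which fit easily in near data.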

And thanks for the memory model explanation - I'll try to target compact model, as I do need far pointers indeed.
ps: is the huge model working with the libc (after recompile)? I see our malloc functions take a 16-bit size argument.
