VGA mode in libc #2184
It is possible, I suppose, but I don't think it's necessarily a good idea, for the following reasons:
In summary, vgalib.h can't be used by our new toolchain, as it doesn't support __far pointers, and this code can't be used in our BASIC interpreter, nor is it usable in Doom. Further discussion points might be: what real usefulness does adding a super-simple vgalib.h bring? What types of actual programs would be written, and why would we want them to use vgalib.h rather than Nano-X? For instance, ported games will always use their already-existing routines written for speed, which is very necessary on legacy non-framebuffer systems. New graphical applications should probably be pushed in the direction of Nano-X, especially with the new work allowing multiple graphical applications to run concurrently.
The idea of this post was actually to only switch to graphical mode and back. It was not to create an entire graphical library. So it could be called "vgamode.h" instead. The target of this post is this code:
that I remember providing to Doom. So what I want is to play with VGA using the new 8086 toolchain. But now I need the above assembly code to do it. So how do I do that? I suppose an assembler .as file could be created that I can link against. That would leave only the problem with
OK. Let's start with the C86 toolchain, since that's what you're interested in and each compiler would need a special implementation anyway. C86 has an asm directive, but I've never tested or used it. Nonetheless, switching modes is the simple part; accessing video RAM at A000 will likely require a function call. C86 supposedly supports inline functions, but they were discussed as possibly buggy by BYU. The hard part will be making this anywhere near fast enough to be useful for anything other than experimentation. I'll look further into this and respond with something you can play with, rather than adding immediately to libc. Thank you!
If you implement this function, I can test it in the image viewer I plan to develop.
Here's the set_mode function written for our C86 toolchain:
I can write a function
You mean the drawpixel function should be in assembler? I am definitely not an assembler guy. The idea is to have only one function, set_mode, in assembler, and all the drawing routines in C with some kind of pointer to the video memory, as in:
As I mentioned above, the C86 toolchain compiler doesn't support __far pointers, so we can't use some kind of pointer to video memory in C. All low-level drawing routines will have to be implemented using a function call written in assembler.
Maybe a draw() function with the draw buffer as input would work (calling the assembly-written function to write to video memory), instead of plot_pixel()?
Here's your function; you'll have to convert y,x to an offset into video RAM and then call write_vid(offset, value):
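For illustration, the y,x-to-offset conversion in mode 13h (320x200, one byte per pixel) can be sketched as below. The `write_vid` body and the local `vidmem` array are hypothetical stand-ins for the assembler routine and the A000:0000 video segment, so this can be tried on any host:

```c
#include <assert.h>

#define SCREEN_WIDTH  320
#define SCREEN_HEIGHT 200

/* stand-in for video RAM at A000:0000 */
static unsigned char vidmem[SCREEN_WIDTH * SCREEN_HEIGHT];

/* stand-in for the assembler write_vid(offset, value) routine */
static void write_vid(unsigned int offset, unsigned char value)
{
    vidmem[offset] = value;
}

/* mode 13h is linear: rows of 320 bytes, one byte per pixel */
static void plot_pixel(int x, int y, unsigned char color)
{
    write_vid((unsigned int)y * SCREEN_WIDTH + x, color);
}
```

On real hardware only `write_vid` changes per toolchain; the offset math stays the same.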
I am writing this to show that one can't just assume a single C library function can be used to implement graphics; it is always more complicated than that. Each of our compilers has to use incompatible non-standard extensions in order to accomplish even basic graphics. This in turn affects the speed and design of the application. Feel free to play with these two functions with our new toolchain and see what you can do with them.
Thank you @ghaerr!
I think I got everything. I will have:
Below is the full source code I will test later.
Actually, this could be another demo for the 8086-toolchain, right? :)
Sorry for all these posts. This should be the final version that does not use any headers. Maybe it is ready for an 8086 toolchain demo. Is volatile supported?
Does it even compile? You're better off using the C library routine
Volatile is not supported. In general, C86 won't optimize any loops so the result would be the same without it.
Thanks! It works! Tested in emulator. Here is the image:
Works well in both copy.sh and 86box emulator. I see two lines with two different colors. Drawing is faster than I thought.
Nice work! I'm especially happy since I was unable to test either ASM routine I wrote except for compiling :)
It's pretty hard to judge speed when only two lines are drawn. Try clearing the whole screen (320x200 = 64,000 pixels) to a certain color and see how long that takes. A clear screen or rectangle fill will give a much better idea of how fast/usable the graphics routine(s) are.
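A minimal sketch of such a fill benchmark, with a local buffer standing in for the 64,000-byte A000 segment (on the real machine this would loop over the assembler write routine, or better, block-move into video RAM):

```c
#include <assert.h>
#include <string.h>

#define SCREEN_SIZE (320 * 200)  /* 64,000 pixels, one byte each in mode 13h */

static unsigned char screen[SCREEN_SIZE];  /* stand-in for the A000 segment */

/* fill every pixel with one color; the simplest speed test for the
   low-level write path, since it touches all 64,000 bytes */
static void clear_screen(unsigned char color)
{
    memset(screen, color, SCREEN_SIZE);
}
```

Timing a few hundred calls of this gives a realistic upper bound on the achievable fill rate.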
Thank you again!
I selected a JPEG decoder and will try to integrate it with our "vga-lib": Btw, to use it with ia16-gcc, I can just use a __far pointer to access the video memory, right? But for portability, I like the idea of using library functions.
* added VGA example by @toncho11, as posted here ghaerr/elks#2184
* added missing vga.c
* add vga example to main target in makefile for elks
* updated vga example to vgatest
I started the image viewer, and I'm still compiling with OW, but wanna change to gcc-ia16. Quick question - can I use the new memory functions with gcc-ia16? And use far pointers (to the VGA memory)?
I assume you mean __fmalloc, the single-arena allocator? Yes, you can, and it should work. However, it is untested, since there's really no need for it with ia16-elf-gcc: it still uses only a single 64K (which is already available in ia16-elf-gcc's small model) and returns a far pointer (which usually complicates issues for small model programs). If you need a very large segment of memory, you can use
Yes, ia16-elf-gcc supports far pointers, so it'll be easier to access VGA using a pointer dereference, rather than a function call. @toncho11's original VGA sample posted above uses far pointers and was built for ia16-elf-gcc.
Wow!!!! Very impressive coding indeed! You're really on a roll here! :) I have some fast blending routines that don't require a divide by 255; I'll post them for your consideration.
:) PS: after cleaning up the mess in the code, and porting to gcc-ia16, could such software go upstream into elkscmd?
Heck yeah!!! Actually I really like what you've done in main.c, very nicely coded. It'd be great to have that in ELKS. Of course, at first the image viewer would only run on VGA, not EGA or CGA. CGA image support probably isn't worth doing, but ultimately an EGA "driver" (see Nano-X for that) might be nice. Unfortunately, that complicates code by quite a bit, so starting with just VGA is fine with me. Regarding speeding up the drawing, I have several ideas that should greatly help. First, get rid of the "/ 255" divides. There are whole books written on how to speed up blending and such; I'm trying to find specific samples that best match your code. In particular, you can try replacing code like:
with a somewhat accurate
or less accurate but still usable and quite fast:
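The exact expressions being compared were elided above, but the usual shape of the trade-off can be sketched like this (function names are mine, not from the thread); the fast version substitutes a >> 8 for the / 255 and is off by at most a level or two:

```c
#include <assert.h>

/* exact blend of a over b: correct, but costs a divide by 255 per channel */
static unsigned char blend_exact(unsigned char a, unsigned char b, unsigned char alpha)
{
    return (unsigned char)((a * alpha + b * (255 - alpha)) / 255);
}

/* fast approximation: 256-based weight on b allows >> 8 instead of / 255;
   all intermediate terms stay non-negative, so the shift is well-defined */
static unsigned char blend_fast(unsigned char a, unsigned char b, unsigned char alpha)
{
    return (unsigned char)((a * alpha + b * (256 - alpha)) >> 8);
}
```

On an 8086, where DIV is one of the slowest instructions, replacing the divide with a shift in the inner loop matters a lot more than the one-level error.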
I am trying to find a copy of "Jim Blinn's Corner" graphics books where he gets into integer math blending with no division, but can't find it yet. Also see https://stackoverflow.com/questions/78481241/fast-alpha-blending-cpu-only. Another big improvement is to calculate all channels at once. This is usually best when running 24 or 32 bit RGBA. Take a look at the framebuffer driver I wrote for ChrysaLisp in function blit_blend():
The above is complicated by trying to quickly handle cases where alpha=0 (no drawing) and alpha=255 (no blending). It demonstrates using >> 8 instead of / 255 and also calculating all channels at once for max speed. Finally, I would recommend you take the switch() statement out of your inner loops in main.c, and possibly write separate routines or loops for RGB vs BGR. Doing any excess add/subtract/multiplies or switches within a tight main blit loop will slow things down quite a bit. No need to switch just yet to ia16-elf-gcc - I would continue to get the best speed you can using the OWC extensions and then we can worry about how to get it into the ELKS tree. At some point, we probably need to run the OWC compiler to build ELKS, at least for an optional "2nd pass". Nice work!
Hello @rafael2k, Here's a great paper on the subject of fast blending without divisions. The formula I was looking for is on the top of Page 9, "Jim Blinn's best blending for 8-bit ARGB":
This mixes an 8-bit color A with B and returns an 8-bit T. My comment above around a "somewhat accurate" blend adds 1 rather than 1/2 (0x80 in some cases), so it's not quite correct. When I was trying to write the super-fast framebuffer driver for ChrysaLisp on Raspberry Pi, I went so far as to write C test routines to see exactly what various formulas spit out as results. I've attached them here, which might help show how all this relates to your cool image viewer code (see test.c): If you want some GCC-compatible fast output routines/macros, the fast output inline macro convblit_8888 in Microwindows is worth looking at; it takes advantage of GCC's ability to optimize out constants when used as macros inside a separate blit function, and also has some cool orientation swaps (left & right portrait modes and upside down). I'm pretty sure our ia16-elf-gcc will be able to handle the constant optimizations; I'm not sure about OWC.
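A sketch of that trick as I understand it (the helper names are mine): the core is an exactly rounded a*b/255 with no divide, built from one add of 0x80 and two shifts, which the blend then applies to both weights:

```c
#include <assert.h>

/* exact round(a*b/255) for 8-bit a,b with no divide:
   t = a*b + 0x80; result = (t + (t >> 8)) >> 8 */
static unsigned char blinn_mult8(unsigned char a, unsigned char b)
{
    unsigned int t = (unsigned int)a * b + 0x80;
    return (unsigned char)((t + (t >> 8)) >> 8);
}

/* blend: T = A*(255-alpha)/255 + B*alpha/255, each term rounded */
static unsigned char blinn_blend(unsigned char a, unsigned char b, unsigned char alpha)
{
    return (unsigned char)(blinn_mult8(a, 255 - alpha) + blinn_mult8(b, alpha));
}
```

Note the +0x80 is exactly the "add 1/2" correction mentioned above; replacing it with +1 gives the slightly-off "somewhat accurate" variant.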
Thanks @ghaerr! I managed to at least fix the color conversion to the default VGA palette, but it is still slow, as I'm doing an almost exhaustive search for the nearest color in the RGB VGA palette:
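The exhaustive search itself is short; the cost is running it once per pixel. A sketch of the distance loop (the tiny palette here is a hypothetical stand-in for the real 256-entry one; squared distance avoids any sqrt, and caching the result for the last RGB seen helps a lot since images have long runs of similar colors):

```c
#include <assert.h>

/* hypothetical 4-entry palette for illustration; a real one has 256 triples */
static const unsigned char pal[][3] = {
    {0, 0, 0}, {255, 0, 0}, {0, 255, 0}, {255, 255, 255}
};
#define PAL_SIZE ((int)(sizeof(pal) / sizeof(pal[0])))

/* nearest palette index by squared RGB distance */
static int nearest_color(int r, int g, int b)
{
    long best = 0x7fffffffL;
    int i, besti = 0;
    for (i = 0; i < PAL_SIZE; i++) {
        long dr = r - pal[i][0], dg = g - pal[i][1], db = b - pal[i][2];
        long d = dr * dr + dg * dg + db * db;
        if (d < best) { best = d; besti = i; }
    }
    return besti;
}
```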
I suppose some buffering can help. Size 20 KB?
Yep, I'll do some buffering, most likely trying line by line first.
@ghaerr, I'm looking at Nano-X; I think the palette functions were never added, right? Or am I missing something? PS: now there is a viewbmp and a viewppm, both working (to some extent), and a viewjpg, where there is a bug somewhere and the image does not look right yet.
I haven't looked at the ELKS version (too busy at the moment) but the main Microwindows repo has full support for palette drawing.
Yes, a predefined optimized palette is a good idea. Microwindows uses that, see src/engine/devpal*.c in the main repo. And I realize now I got a bit overambitious talking about all the fast blending above, as you're actually doing screen output in palette mode. So most of that doesn't really apply (yet!!!) :) You will want to replace some of your internal / 255 routines with the integer versions using the Alvy Ray Smith paper I mentioned - lots faster. And drawing into the VGA using palette mode is much quicker, since you're just writing 1 byte per pixel, right? Above regarding separate drivers for EGA and CGA I was thinking about the non-palette modes which require much more bit twiddling and shifting in order to work. But those only work for 16 colors so pretty useless for images, where 256 colors is much superior. Sorry for any confusion this may have caused!
Nice! This will be very nice to have in ELKS!
I created some images with "MS Paint" and saved them as 1, 4 and 8-bit to test the BMP reader I'm writing. It works! But now I need to implement the functions to change the palette, as the speed difference between loading an 8-bit BMP without pixel conversion and loading a 24-bit image with all the nearest-pixel math is brutal at this point. I realize that the palette operations on CGA, EGA and VGA are a little different, right? If we have these palette set functions somewhere, please lemme know.
I looked around at both ELKS and main Nano-X and it seems that palette modes were never supported on older EGA/VGA hardware running in real mode - all the functions are for protected mode access to more modern and much faster graphics cards, those that can be set into a framebuffer mode and accessed directly. These cards were supported under Linux, but the kernel itself put the video hardware into and out of graphics mode. The likely reasons these were never supported is that 1) the older original EGA and VGA cards are way too slow for reasonable graphics (other than just displaying an image, but no window system stuff), and 2) in real mode, one can't even access the video memory for many graphics cards as they consist of 256K RAM etc, with segments limited to 64K. That's why the EGA and VGA ended up using "banked/switched" planes of bits - a huge complication, adding even more slowdown to the process.
I'm not completely sure, it's been so long. It seems that you can use the BIOS INT 10h AH=10h function Get/Set Palette Registers to portably set a uniform palette for what you're doing. What is happening now - do these cards initialize with a standard palette that seems usable, or do you need a specialized palette for your images? (I realize that each image does best with a custom palette, but an equidistant palette mapped to 24-bit RGB might look OK?)
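One common equidistant mapping is 3-3-2 (3 bits red, 3 green, 2 blue), which makes the 24-bit-to-index conversion pure bit shifts, with no nearest-color search at all. A sketch (names are mine; the scaled-up 24-bit triples are what you'd program into the palette registers):

```c
#include <assert.h>

/* build a 256-entry equidistant 3-3-2 palette as 24-bit RGB triples */
static unsigned char pal332[256][3];

static void build_pal332(void)
{
    int i;
    for (i = 0; i < 256; i++) {
        pal332[i][0] = (unsigned char)((((i >> 5) & 7) * 255) / 7);  /* 3 bits red */
        pal332[i][1] = (unsigned char)((((i >> 2) & 7) * 255) / 7);  /* 3 bits green */
        pal332[i][2] = (unsigned char)(((i & 3) * 255) / 3);         /* 2 bits blue */
    }
}

/* with this palette loaded, quantizing 24-bit RGB is shifts only */
static unsigned char rgb_to_332(unsigned char r, unsigned char g, unsigned char b)
{
    return (unsigned char)(((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6));
}
```

Only 4 blue levels is the weak spot of 3-3-2, but for a uniform palette on a 256-color mode it's a reasonable trade.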
Indeed. But I'm starting to like the INT 0x10 facilities. They are pretty handy, and the planar mapping to 0xA000 in VGA makes video memory access easy. We'll need an equidistant palette mapped to 24-bit RGB for jpg, ppm and bmp >= 16 bpp. But for BMPs with bpp <= 8, which contain the palette, it is optimal to just call the INT 0x10 AH=0x10 function to set the appropriate palette and voila, the image loading is basically a memory copy. PS: With more time towards the end of the week I'll add plot_pixel for different modes and palette manipulation functions to graphics.c
A question a bit off-topic but related to graphics development - if I can use the small memory model, it is better/faster than the large model, right? If I want to compile with OW for small model, it is just a matter of pointing to the small-model-compiled libc, right? I might port the basic assembly code of elks-viewer to compile with our 3 toolchains' assembly formats (OW, gcc-ia16 and C86). Which memory model are we using with C86? Does small mean DS=CS, or is that tiny?
Small model, with a single code and single but separate data segment, is faster than large model in most cases. We don't support tiny model (DS=CS) in the ELKS kernel, as programs are always loaded with separate code/data segments - we support shared code segments, and fork requires a separate data segment.
Be aware that the compilers use different function calling sequences - in particular, OW uses a register-based calling sequence - and all use different assembly language. Each compiler uses and saves registers in different ways. That said, it'd be a good exercise to port your viewer to each compiler! C86 implements small model, and never accesses the DS or ES register explicitly, but assumes DS=ES=SS.
Yes. For now, you'll have to replace the MODEL= in libc/watcom.model and rebuild libc.lib. I have been planning to automate production of all four supported models, producing libcl.lib, libcs.lib (large and small, etc). I will up the priority on getting that done so you don't have to do it.
Thanks @ghaerr. Btw, I implemented palette get and set functions too: The 4 and 8-bit BMPs load really fast and with correct colors now! Next is the custom palette to ease RGB 24-bit to 8-bit conversion. Then dithering (especially for 4-bit and 1-bit modes, mono or not).
Thanks. I imagined it was just that. I'll play with the small model a bit and see if all goes well, at least with the bmp and ppm viewers. JPEG is another beast, but I think it will fit in small model too. It will be one of the very few jpg viewers (I found none, but I bet one exists) for real-mode 8086. I've been looking for options for DOS, but they are all 386+.
The decoder will probably fit, but I'm wondering about image sizes - it's pretty easy to get over 64K of image data. You might consider using compact model, which is small code, large data. This reduces code size in the executable but still allows for far data. If you end up having to index into an array much larger than 64K, OWC also supports a "huge" model for pointers, separately.
But I'm studying the picojpeg decoding loop in order not to buffer the decoded image: any small resolution * 3 will inevitably be bigger than 64K, so first I need to remove the buffering in decoding and write directly to video memory, as the bmp and ppm readers do, which buffer just one line. In JPEG the buffer will be one macroblock instead of the whole image, so I'm optimistic. The next task will be to remove the input buffering, which will also not be trivial. And thanks for the memory model explanation - I'll target compact model, as I do need far pointers indeed.
@ghaerr
Is it possible to add to libc the inline asm code from https://github.com/ghaerr/elks/wiki/Coding-games-for-ELKS#2-using-direct-access-to-the-vga-memory that initializes the VGA mode?
It could be a header "vgalib.h" with 1 function and several defines:
This could be used by the new 8086 toolchain, elksdoom developed by @FrenkelS, and our BASIC interpreter, for example. It would give common ground for future VGA development. The VGA mode setup could be implemented differently if needed - for PC98, for example - without changing the program that uses the header.
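A sketch of what such a header could look like (all names are hypothetical; the mode numbers 0x13 and 0x03 are the standard BIOS INT 10h values, and the actual INT 10h call is left as a stub since each toolchain needs its own assembly):

```c
#include <assert.h>

/* vgamode.h sketch - switch to VGA 320x200x256 and back via BIOS INT 10h */

#define VGA_MODE_13H  0x13   /* 320x200, 256 colors */
#define TEXT_MODE_03H 0x03   /* 80x25 color text */

/* per-toolchain stub: issue INT 10h with AH=00h (set video mode), AL=mode.
   With ia16-elf-gcc this would be inline asm; with C86, a linked .as file. */
static void set_mode(int mode)
{
    (void)mode;  /* placeholder so the sketch compiles on any host */
}

#define vga_on()  set_mode(VGA_MODE_13H)
#define vga_off() set_mode(TEXT_MODE_03H)
```

A program would then bracket its drawing with vga_on()/vga_off(), and a PC98 port would only swap the set_mode implementation.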