The software running on the STM32F103CBT6 microcontroller on the Maple-Mini clone is the wonderful Mecrisp Forth interpreter/compiler, which targets ARM Cortex-M3 based CPUs. It also uses FORTH software libraries from JeeLabs, some of which have been modified to improve performance and add functionality.
You can find all the FORTH code running on this board at https://github.com/lmamakos/fluke8050a-forth
File and library grouping
The FORTH-based application software is organized into a number of files, which reflect a range of functionality from basic hardware support for a platform, drivers for specific hardware devices, libraries for functions (such as font rendering), debugging support (FORTH word disassembler) and finally the actual Fluke 8050A multimeter "application" itself.
The FORTH program is compiled into the STM32F103's flash as it is loaded, starting just beyond the FORTH interpreter itself. Files are loaded in groups, and the flash can be reset and erased back to the end of one or more earlier groups. This is an aid during development, since it avoids having to reload the entire FORTH program (which can be rather voluminous, as it includes font bitmap definitions).
To simplify loading, there is a mechanism which looks for a pseudo FORTH word, include, as a means of having one file include another.
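The include handling can be pictured with a small host-side sketch. This is not the actual loader used for this project; it assumes the expansion happens in an upload tool on the host, that an include line has the form "include <filename>", and that file names are relative to the including file. The function name expand_includes is made up for illustration.

```python
import os
import tempfile


def expand_includes(path, active=None):
    """Return the lines of `path`, with every `include <file>` line
    replaced by the (recursively expanded) lines of that file.

    Assumption: the syntax is exactly two tokens, `include` and a file
    name relative to the including file. The real loader may differ.
    """
    active = set() if active is None else active
    real = os.path.realpath(path)
    if real in active:
        raise ValueError(f"circular include: {path}")
    active.add(real)  # files currently being expanded

    lines = []
    base_dir = os.path.dirname(real)
    with open(real, encoding="utf-8") as source:
        for raw in source:
            line = raw.rstrip("\n")
            tokens = line.split()
            if len(tokens) == 2 and tokens[0] == "include":
                nested = os.path.join(base_dir, tokens[1])
                lines.extend(expand_includes(nested, active))
            else:
                lines.append(line)

    active.discard(real)  # allow the same file to be included again later
    return lines
```

The expanded list of lines is what would then be sent to the FORTH interpreter over the serial connection, one line at a time.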
At the end, when all the FORTH code has been loaded, it is possible to extract the entire STM32F103's Flash image, up through the last programmed page, and then reload it into another device to replicate the entire compiled FORTH environment.
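As a rough illustration of "up through the last programmed page", here is a sketch that trims a full flash dump down to the last page that holds anything other than erased bytes. It assumes the dump has already been read out to the host by some other tool, that erased flash reads back as 0xFF, and that the part uses 1 KB pages (as the medium-density STM32F103 devices do). The function name is hypothetical.

```python
PAGE_SIZE = 1024  # STM32F103 medium-density flash page size in bytes
ERASED = 0xFF     # value of an erased flash byte


def trim_to_last_programmed_page(image: bytes, page_size: int = PAGE_SIZE) -> bytes:
    """Return `image` cut off after the last page containing programmed data."""
    end = 0
    for start in range(0, len(image), page_size):
        page = image[start:start + page_size]
        if any(byte != ERASED for byte in page):
            end = start + len(page)  # keep everything through this page
    return image[:end]
```

The trimmed image is smaller to store and quicker to program into another device than the full 128K dump.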
Display update loop
Looking at the FORTH application code in the 40_app.fs file, it is broadly structured around a display update loop.
The basic notion is that the code waits for the Fluke 8050A's existing F8 microprocessor to do an A/D conversion and measurement, and then watches as that microprocessor strobes out 5 digits for the 7-segment Fluke display. The FORTH code watches for each of the digit strobe signals (which occur in sequence from the sign/highest-order digit to the least significant digit) and captures each 4-bit BCD digit value, as well as the decimal points and other annunciators.
Once all the digits are captured, the graphical LCD display is updated with the current measurement.
Each update cycle is performed by the fluke-multimeter-display word, which gets all the current digit values (by way of the get-strobes word), computes a bunch of internal state updates in the compute-update word, and then updates the display in the display-Update word. There is, of course, lots more detail than that :-)
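To make the capture step a little more concrete, here is a minimal sketch of turning a set of captured BCD nibbles into a printable reading. It is only an illustration in Python, not a translation of the FORTH code: the way the real 8050A encodes the sign, the leading digit and the decimal points on its strobes is not covered here, so the negative and decimals parameters are hypothetical stand-ins for whatever get-strobes actually collects.

```python
def bcd_digits_to_int(nibbles):
    """Combine 4-bit BCD digit values (most significant first) into an integer."""
    value = 0
    for nibble in nibbles:
        if not 0 <= nibble <= 9:
            raise ValueError(f"not a BCD digit: {nibble}")
        value = value * 10 + nibble
    return value


def format_reading(negative, nibbles, decimals):
    """Render captured digits as text.

    negative : hypothetical flag decoded from the sign strobe
    nibbles  : captured BCD digits, most significant first
    decimals : hypothetical count of digits after the decimal point
    """
    text = f"{bcd_digits_to_int(nibbles):0{len(nibbles)}d}"  # keep leading zeros
    if decimals:
        text = text[:-decimals] + "." + text[-decimals:]
    return ("-" if negative else "") + text
```

For example, format_reading(True, [1, 2, 3, 4, 5], 3) gives "-12.345".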
Font rendering speed-up
Font rendering was sped up immensely in this implementation by adding a new primitive to the graphical LCD display driver to render a bitmap to the display, using the currently set display foreground and background colors. A font library was created which renders a character by invoking this bitmap rendering primitive.
Previously, rendering fonts was performed by setting one pixel at a time on the LCD display. This was rather inefficient due to the relatively large overhead of setting the X/Y position for the display and of beginning and ending an SPI transaction for every pixel. By rendering a bitmap corresponding to a character, the X/Y position and rectangular window size are defined once, and then all pixels (16 bit RGB values) are streamed out for each row within one command/data invocation.
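A back-of-the-envelope count of SPI bytes shows why this matters. The numbers below are assumptions, since the display controller's command set is not described here: they model a typical controller where setting the window costs a column-address command (1 command byte + 4 data bytes), a row-address command (1 + 4) and a memory-write command (1), for 11 bytes in total, followed by 2 bytes per 16 bit pixel.

```python
WINDOW_SETUP_BYTES = 11  # assumed: (1 + 4) column + (1 + 4) row + 1 memory-write
BYTES_PER_PIXEL = 2      # 16 bit RGB value per pixel


def bytes_pixel_at_a_time(width, height):
    """Every pixel pays the full positioning overhead."""
    return width * height * (WINDOW_SETUP_BYTES + BYTES_PER_PIXEL)


def bytes_bitmap_window(width, height):
    """The window is set once, then all pixels are streamed."""
    return WINDOW_SETUP_BYTES + width * height * BYTES_PER_PIXEL


if __name__ == "__main__":
    w, h = 8, 16  # an example glyph size, chosen only for illustration
    print(bytes_pixel_at_a_time(w, h), bytes_bitmap_window(w, h))
```

Under these assumptions an 8x16 glyph takes 1664 bytes on the bus when drawn pixel by pixel and 267 bytes when drawn as a bitmap, before even counting the per-transaction chip-select handling.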
SPI interface speed-up
Upon examination with a logic analyzer, it became clear that the SPI bus utilization wasn't as efficient as possible.
Each byte that was streamed out was followed by an idle period on the bus. This was due to the full-duplex nature of the SPI words as defined. That is, as a byte of data was transmitted on the SPI interface to the peripheral, another byte of data was being clocked in from the SPI peripheral. At a low level, a byte is written to the SPI controller, and then the controller is polled until it indicates that the incoming byte has been fully received, so that it can be returned.
However, for this application, it is not necessary to receive any data from the SPI LCD display. So the SPI driver was modified to allow the byte to be transmitted to be written to the SPI controller without waiting for the (nonexistent) input data to be clocked in. The controller begins clocking out the data while the FORTH code continues and begins computing the next byte to be transmitted, concurrently with the previous byte being sent.
The effect of the change is that the code only checks that the TX register in the SPI interface is "ready" before transmitting a byte, rather than waiting for each transfer to complete. In many cases, this greatly reduces the idle time on the SPI bus and results in data being rendered on the SPI LCD display faster.
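The gain can be estimated with a very simple timing model. This is a sketch under stated assumptions, not a measurement: it treats the time to compute the next byte and the time to shift one byte out as fixed numbers of ticks, and it treats "TX ready" as "the previous byte has finished", which ignores any extra buffering in the SPI controller. The function names are made up for illustration.

```python
def total_time_wait_for_rx(n_bytes, compute_ticks, transfer_ticks):
    """Original scheme: write a byte, wait for the transfer to finish,
    and only then start computing the next byte."""
    return n_bytes * (compute_ticks + transfer_ticks)


def total_time_check_tx_ready(n_bytes, compute_ticks, transfer_ticks):
    """Modified scheme: computing byte k+1 overlaps with sending byte k,
    so each step costs only the slower of the two activities."""
    if n_bytes == 0:
        return 0
    overlap = max(compute_ticks, transfer_ticks)
    # first compute, then (n - 1) overlapped steps, then the final transfer
    return compute_ticks + (n_bytes - 1) * overlap + transfer_ticks


if __name__ == "__main__":
    print(total_time_wait_for_rx(100, 1, 2))     # 300 ticks
    print(total_time_check_tx_ready(100, 1, 2))  # 201 ticks
```

In this model the bus only goes idle when computing the next byte takes longer than sending the previous one, which matches the "in many cases" qualifier above.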
Tweaked Mecrisp Forth
I built a slightly tweaked version of Mecrisp Forth for the STM32F103 series of parts, configured to use the larger Flash capacity available on the STM32F103xB series parts (128K) rather than the minimum size of 64K.
I created a new directory, stm32f103xB, which is a copy of stm32f103 with the following changes and file name changes:
$ diff -ur stm32f103/Makefile stm32f103xB/Makefile
--- stm32f103/Makefile 2014-05-19 10:32:23.000000000 -0400
+++ stm32f103xB/Makefile 2016-08-21 11:38:53.000000000 -0400
@@ -4,15 +4,15 @@
 COPS = -Wall -Os -nostdlib -nostartfiles -ffreestanding -save-temps
 AOPS = --warn --fatal-warnings
 
-all : mecrisp-stellaris-stm32f103.bin
+all : mecrisp-stellaris-stm32f103xB.bin
 
-mecrisp-stellaris-stm32f103.o : mecrisp-stellaris-stm32f103.s
-	$(ARMGNU)-as mecrisp-stellaris-stm32f103.s -o mecrisp-stellaris-stm32f103.o
+mecrisp-stellaris-stm32f103xB.o : mecrisp-stellaris-stm32f103xB.s
+	$(ARMGNU)-as mecrisp-stellaris-stm32f103xB.s -o mecrisp-stellaris-stm32f103xB.o
 
-mecrisp-stellaris-stm32f103.bin : memmap mecrisp-stellaris-stm32f103.o
-	$(ARMGNU)-ld -o mecrisp-stellaris-stm32f103.elf -T memmap mecrisp-stellaris-stm32f103.o
-	$(ARMGNU)-objdump -D mecrisp-stellaris-stm32f103.elf > mecrisp-stellaris-stm32f103.list
-	$(ARMGNU)-objcopy mecrisp-stellaris-stm32f103.elf mecrisp-stellaris-stm32f103.bin -O binary
+mecrisp-stellaris-stm32f103xB.bin : memmap mecrisp-stellaris-stm32f103xB.o
+	$(ARMGNU)-ld -o mecrisp-stellaris-stm32f103xB.elf -T memmap mecrisp-stellaris-stm32f103xB.o
+	$(ARMGNU)-objdump -D mecrisp-stellaris-stm32f103xB.elf > mecrisp-stellaris-stm32f103xB.list
+	$(ARMGNU)-objcopy mecrisp-stellaris-stm32f103xB.elf mecrisp-stellaris-stm32f103xB.bin -O binary
 
 clean:
 	rm -f *.bin
$ diff -ur stm32f103/mecrisp-stellaris-stm32f103.s stm32f103xB/mecrisp-stellaris-stm32f103xB.s
--- stm32f103/mecrisp-stellaris-stm32f103.s 2015-04-05 12:46:41.000000000 -0400
+++ stm32f103xB/mecrisp-stellaris-stm32f103xB.s 2016-08-21 11:47:03.000000000 -0400
@@ -46,7 +46,7 @@
 .equ Kernschutzadresse, 0x00004000 @ Darunter wird niemals etwas geschrieben ! Mecrisp core never writes flash below this address.
 .equ FlashDictionaryAnfang, 0x00004000 @ 16 kb für den Kern reserviert... 16 kb Flash reserved for core.
-.equ FlashDictionaryEnde, 0x00010000 @ 48 kb Platz für das Flash-Dictionary 48 kb Flash available. Porting: Change this !
+.equ FlashDictionaryEnde, 0x00020000 @ 112 kb Platz für das Flash-Dictionary 112 kb Flash available. Porting: Change this !
 .equ Backlinkgrenze, RamAnfang @ Ab dem Ram-Start.
@@ -73,7 +73,7 @@
 @ Catch the pointers for Flash dictionary
 .include "../common/catchflashpointers.s"
 
- welcome " for STM32F103 by Matthias Koch"
+ welcome " for STM32F103xB by Matthias Koch"
 
 @ Ready to fly !
 .include "../common/boot.s"
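The sizes quoted in the assembler comments follow directly from the addresses in the diff. A quick check, using only the FlashDictionaryAnfang and FlashDictionaryEnde values shown above (the Python names are just for this illustration):

```python
FLASH_DICTIONARY_START = 0x00004000    # FlashDictionaryAnfang: first 16 KB hold the core
FLASH_DICTIONARY_END_64K = 0x00010000  # FlashDictionaryEnde in stm32f103
FLASH_DICTIONARY_END_128K = 0x00020000 # FlashDictionaryEnde in stm32f103xB


def dictionary_kib(end, start=FLASH_DICTIONARY_START):
    """Flash available to the FORTH dictionary, in KB (1024-byte units)."""
    return (end - start) // 1024


if __name__ == "__main__":
    print(dictionary_kib(FLASH_DICTIONARY_END_64K))   # 48
    print(dictionary_kib(FLASH_DICTIONARY_END_128K))  # 112
```

So the one-line change more than doubles the space available for compiled FORTH code, from 48 KB to 112 KB.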