You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I saw an article about the speed of pthread_mutex implementations ( https://justine.lol/mutex/ )
and tried the mentioned test program (high contended scenario) with OpenWatcom.
The test program uses one counter, 30 threads and 100000 iterations per thread.
Yes, that is just a test program, that makes an implementation detail visible,
but with the default thread count, the OpenWatcom compiled programm uses 100% cpu,
until i canceled it after ~5 minutes.
The test was created to compare the mutex implementation of the cosmopolitan libc (64 bit)
and a part of the mutex implementation is based on nsync: https://github.com/google/nsync
(nsync supports many more cpu`s, including x86, alpha, mips, ppc, ...)
When using all Hardware threads (12 on my system), the mutex implementation of cosmopolitan (64bit)
is more than twice as fast as glibc (64bit),
and the mutex in cosmopolitan is more than 100 times faster as the mutex in OpenWatcom (32bit).
OpenWatcom is still about 35 times slower as glibc (32bit)
12 threads (supported by my hardware):
cosmopolitan libc(64bit): 16,2ms
glibc(64bit): 36,3ms / glibc(32bit): 47,0ms
OpenWatcom: 1682ms (1,68s)
When the program uses more threads as available in hardware,
the slowdown factor for the OpenWatcom compiled program increases dramatically to about
~ 7400 compared to cosmopolitan(64bit)
~ 2500 compared to glibc(32bit)
Even when using only 4 threads, OpenWatcom is ~59 times slower than glibc(32bit)
With only one or two threads, OpenWatcom is >100 times slower as glibc(32bit)
I also retested with 25 threads:
cosmopolitan libc(64bit): 31,4ms
glibc(64bit): 77,6ms / glibc(32bit): 89,7ms
OpenWatcom: 694362ms (694s) (A second run took 705 seconds)
########
The worker code for the thread is simple, but really slow, when compiled with OpenWatcom.
pthread_mutex_t g_locker = PTHREAD_MUTEX_INITIALIZER;
int g_chores;
void *worker(void *arg) {
for (int i = 0; i < ITERATIONS; ++i)
{
pthread_mutex_lock(&g_locker);
++g_chores;
pthread_mutex_unlock(&g_locker);
}
return 0;
}
The Test program was adapted to work with OpenWatcom (rusage is not available): pthread_mutex.c.txt
The implementation of pthread_mutex_lock is in ptmutex.c
The Author mentioned at the head of the file is J. Armstrong.
Is he still around?
pthread_mutex_lock uses sem_wait and sem_wait uses the futex syscall.
Is there something that can be changed, to handle some sync cases faster?
Is it possible to use some code or logic from other pthread_mutex_* or other semaphore implementations?
The text was updated successfully, but these errors were encountered:
The pthread mutex code I implemented is a bit of a mess mostly because I did run into some issues with futexes. The issue isn't using futexes, it has something to do with the implementation of semaphores/futexes in OW. I remember it being a problem when I committed the code, but I don't remember the specifics. It should "work," but there is a time-consuming workaround if I'm not mistaken.
I saw an article about the speed of pthread_mutex implementations ( https://justine.lol/mutex/ )
and tried the mentioned test program (high contended scenario) with OpenWatcom.
The test program uses one counter, 30 threads and 100000 iterations per thread.
Yes, that is just a test program, that makes an implementation detail visible,
but with the default thread count, the OpenWatcom compiled programm uses 100% cpu,
until i canceled it after ~5 minutes.
The test was created to compare the mutex implementation of the cosmopolitan libc (64 bit)
and a part of the mutex implementation is based on
nsync
: https://github.com/google/nsync(nsync supports many more cpu`s, including x86, alpha, mips, ppc, ...)
When using all Hardware threads (12 on my system), the mutex implementation of cosmopolitan (64bit)
is more than twice as fast as glibc (64bit),
and the mutex in cosmopolitan is more than 100 times faster as the mutex in OpenWatcom (32bit).
OpenWatcom is still about 35 times slower as glibc (32bit)
When the program uses more threads as available in hardware,
the slowdown factor for the OpenWatcom compiled program increases dramatically to about
~ 7400 compared to cosmopolitan(64bit)
~ 2500 compared to glibc(32bit)
Even when using only 4 threads, OpenWatcom is ~59 times slower than glibc(32bit)
With only one or two threads, OpenWatcom is >100 times slower as glibc(32bit)
I also retested with 25 threads:
########
The worker code for the thread is simple, but really slow, when compiled with OpenWatcom.
The Test program was adapted to work with OpenWatcom (rusage is not available):
pthread_mutex.c.txt
The implementation of
pthread_mutex_lock
is inptmutex.c
The Author mentioned at the head of the file is J. Armstrong.
Is he still around?
pthread_mutex_lock
usessem_wait
andsem_wait
uses thefutex
syscall.Is there something that can be changed, to handle some sync cases faster?
Is it possible to use some code or logic from other
pthread_mutex_*
or other semaphore implementations?The text was updated successfully, but these errors were encountered: