The biggest problem with the threading system is that in pthreads I do not know the equivalents of these functions:
WaitForMultipleObjects(2, hWaitHandles, FALSE, INFINITE)
ReleaseSemaphore(m_emptySlot, 1, NULL);
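For reference, the pthread_* API itself has no semaphore type, but POSIX `<semaphore.h>` does, and it covers the `ReleaseSemaphore` half of the question. POSIX has no direct equivalent of `WaitForMultipleObjects`. The minimal sketch below assumes Linux (unnamed semaphores created with `sem_init` are not supported on macOS); `semaphoreRoundTrip` is a hypothetical demo function, not part of Newton.

```cpp
#include <semaphore.h>

// Rough mapping of the Win32 semaphore calls onto POSIX semaphores:
//   CreateSemaphore(NULL, n, max, NULL)  ->  sem_init(&s, 0, n)   (no user-set maximum)
//   ReleaseSemaphore(h, 1, NULL)         ->  sem_post(&s)
//   WaitForSingleObject(h, INFINITE)     ->  sem_wait(&s)
// Hypothetical demo: posts twice, waits once, and returns the remaining count.
int semaphoreRoundTrip() {
    sem_t emptySlot;
    sem_init(&emptySlot, 0, 0);   // 0 = shared only between threads of this process; initial count 0
    sem_post(&emptySlot);         // like ReleaseSemaphore(m_emptySlot, 1, NULL)
    sem_post(&emptySlot);
    sem_wait(&emptySlot);         // would sleep (no spinning) if the count were 0
    int value = 0;
    sem_getvalue(&emptySlot, &value);
    sem_destroy(&emptySlot);
    return value;
}
```

Calling `semaphoreRoundTrip()` leaves one unit in the semaphore, so it returns 1.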
This loop in Windows:
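```cpp
HANDLE hWaitHandles[2];
hWaitHandles[0] = m_workToDo;   // signaled when a job is in the queue
hWaitHandles[1] = m_exit;       // signaled when the worker must quit
// Sleep until either handle is signaled; the thread uses no CPU while blocked.
if ((WaitForMultipleObjects(2, hWaitHandles, FALSE, INFINITE) - WAIT_OBJECT_0) == 1) {
    return 0;   // m_exit was signaled
}
EnterCriticalSection(&m_criticalSection);
*job = m_queue[m_bottomIndex];                        // take the oldest job
m_bottomIndex = (m_bottomIndex + 1) % DG_MAXQUEUE;
ReleaseSemaphore(m_emptySlot, 1, NULL);               // one more free slot for producers
LeaveCriticalSection(&m_criticalSection);
```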
is very efficient, because the threads are not busy-waiting for anything: Windows takes any thread that is blocked in the wait off the CPU, so waiting costs no time at all.
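Since POSIX has no `WaitForMultipleObjects`, a common substitute is to guard both conditions ("work queued" and "exit requested") with one mutex and one condition variable, so a sleeping worker wakes up for either. This is a sketch under that assumption, not Newton's code; `WorkOrExit` and `countJobsBeforeExit` are hypothetical names. Like the Windows call with `bWaitAll = FALSE`, which reports the lowest signaled index, it gives pending work priority over exit.

```cpp
#include <pthread.h>
#include <thread>

// Hypothetical stand-in for the handle pair {m_workToDo, m_exit}.
struct WorkOrExit {
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    int workToDo = 0;            // like the m_workToDo count
    bool exitRequested = false;  // like the m_exit event

    void postWork() {
        pthread_mutex_lock(&mutex);
        ++workToDo;
        pthread_mutex_unlock(&mutex);
        pthread_cond_signal(&cond);      // wake one sleeping worker
    }
    void requestExit() {
        pthread_mutex_lock(&mutex);
        exitRequested = true;
        pthread_mutex_unlock(&mutex);
        pthread_cond_broadcast(&cond);   // wake every worker
    }
    // Returns 0 after claiming one unit of work, 1 if exit was requested,
    // mirroring (WaitForMultipleObjects(...) - WAIT_OBJECT_0) in the Windows loop.
    int wait() {
        pthread_mutex_lock(&mutex);
        // The thread sleeps inside pthread_cond_wait and uses no CPU;
        // the loop also absorbs spurious wakeups.
        while (workToDo == 0 && !exitRequested) {
            pthread_cond_wait(&cond, &mutex);
        }
        int result = 1;
        if (workToDo > 0) {              // pending work wins over exit
            --workToDo;
            result = 0;
        }
        pthread_mutex_unlock(&mutex);
        return result;
    }
};

// Demo: one worker drains every posted job, then sees the exit request.
int countJobsBeforeExit(int jobs) {
    WorkOrExit sync;
    int processed = 0;
    std::thread worker([&] {
        while (sync.wait() == 0) {
            ++processed;                 // a real worker would pop and run a job here
        }
    });
    for (int i = 0; i < jobs; ++i) sync.postWork();
    sync.requestExit();
    worker.join();
    return processed;
}
```

Because work is checked before the exit flag, `countJobsBeforeExit(50)` returns 50 even if the exit request arrives before the worker has drained the queue.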
In pthreads the equivalent loop is this:
```cpp
dgInterlockedIncrement(&m_workInProgress);
// Spin until a slot is free, yielding the time slice on each pass.
while (m_emptySlot == 0) {
    dgThreadYield();
}
dgInterlockedDecrement(&m_emptySlot);           // claim the free slot
dgSpinLock(&m_criticalSection);
m_queue[m_topIndex] = job;                      // append the job
m_topIndex = (m_topIndex + 1) % DG_MAXQUEUE;
dgInterlockedIncrement(&m_workToDo);            // tell the workers there is work
dgSpinUnlock(&m_criticalSection);
```
which means the threads are always active, yielding their time slice to the system when there is no pending work.
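One common way to avoid the yield loop is to block on POSIX semaphores, which put the thread to sleep the way the Windows wait does. The sketch below is a hypothetical bounded queue, not Newton's code: `BlockingQueue`, `semWait` and `pushPopSum` are made-up names, `kMaxQueue` stands in for `DG_MAXQUEUE`, and it assumes Linux/glibc unnamed semaphores. A full queue puts the producer to sleep in `sem_wait` instead of spinning on `m_emptySlot`.

```cpp
#include <cerrno>
#include <pthread.h>
#include <semaphore.h>
#include <thread>

constexpr int kMaxQueue = 8;  // hypothetical stand-in for DG_MAXQUEUE

// Retry if a signal handler interrupts the wait.
static void semWait(sem_t* s) {
    while (sem_wait(s) != 0 && errno == EINTR) {
    }
}

// Hypothetical bounded job queue where producers and consumers sleep instead of spinning.
struct BlockingQueue {
    int queue[kMaxQueue];
    int topIndex = 0;
    int bottomIndex = 0;
    sem_t emptySlot;   // number of free slots, like m_emptySlot
    sem_t workToDo;    // number of queued jobs, like m_workToDo
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    BlockingQueue() {
        sem_init(&emptySlot, 0, kMaxQueue);
        sem_init(&workToDo, 0, 0);
    }
    ~BlockingQueue() {
        sem_destroy(&emptySlot);
        sem_destroy(&workToDo);
    }
    void push(int job) {
        semWait(&emptySlot);             // sleeps while full: no dgThreadYield() loop
        pthread_mutex_lock(&lock);
        queue[topIndex] = job;
        topIndex = (topIndex + 1) % kMaxQueue;
        pthread_mutex_unlock(&lock);
        sem_post(&workToDo);             // wake one consumer
    }
    int pop() {
        semWait(&workToDo);              // sleeps while empty
        pthread_mutex_lock(&lock);
        int job = queue[bottomIndex];
        bottomIndex = (bottomIndex + 1) % kMaxQueue;
        pthread_mutex_unlock(&lock);
        sem_post(&emptySlot);            // like ReleaseSemaphore(m_emptySlot, 1, NULL)
        return job;
    }
};

// Demo: push 1..n through the 8-slot queue while another thread pops; returns the sum.
long long pushPopSum(int n) {
    BlockingQueue q;
    long long sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) sum += q.pop();
    });
    for (int i = 1; i <= n; ++i) q.push(i);
    consumer.join();
    return sum;
}
```

With `n = 100` the producer repeatedly fills the 8 slots and sleeps until the consumer frees one; `pushPopSum(100)` returns 5050.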
The side effect is that time consumed by any other task shows up as if it were time spent in the Newton engine, because the time slice of a task is about 100 milliseconds (at least in Windows; I believe in Linux it is even bigger).
So if some other thread takes its whole time slice, the engine thread has to wait until that slice is over.
Even if all the threads in the system take very little time, the engine has to wait until all the pending threads finish or yield the rest of their time slices before its own threads can complete their jobs.
I have tried to solve that problem with pthreads for a long time, but I do not know how.
Anyway, the thread system in Newton 3.00 is more efficient than in 2.00, but if you can fix that, it will be a huge improvement for GCC systems.
However, it is not as bad as it seems, as long as there are real physical cores in the system: a thread can take a long time on one core, but other cores can serve the other threads.
That is why in Newton 2.00 I limit the maximum number of threads to the number of cores.
This is because threads in Newton 2.00 try to execute equal amounts of work, so if one thread is delayed, the whole system is also delayed.
In Newton 3.00 I took a different approach: threads do not take a fixed batch of microtasks, they serve whatever microtask is next in the queue,
so even if all cores are busy with heavy loads and only one is available, that one keeps working on the pending microtasks in the queue.
This allows setting an unlimited number of threads even if there is only one core in the system.
More importantly, it takes advantage of hyper-threaded systems, which in 2.00 are a liability.
In 2.00, if you set the thread count equal to or greater than the core count, the system starts to lose efficiency,
while in 3.00 more threads than cores make the engine more efficient; efficiency increases asymptotically with the thread count.
An example can explain it better.
Say we have 1000 bodies whose force-and-torque callbacks must be served, and say we set 4 threads.
In Newton 2.00 the 1000 callbacks are divided into 4 batches of 250 microtasks. These batches are sent to the thread manager, and each thread takes one batch.
If one thread is delayed, then the entire system is also delayed, because after each thread completes its batch it waits at a synchronization barrier for the others to finish.
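A minimal sketch of the 2.00-style static split, assuming C++11 `std::thread` (illustrative only, not Newton's source; `runStaticBatches` is a hypothetical name). `join()` plays the role of the synchronization barrier, so the whole step finishes only when the slowest batch does.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Runs task(i) for every i in [0, taskCount) with one fixed batch per thread.
// Returns how many microtasks each thread executed.
std::vector<int> runStaticBatches(int taskCount, int threadCount,
                                  const std::function<void(int)>& task) {
    std::vector<int> executed(threadCount, 0);
    std::vector<std::thread> threads;
    const int base = taskCount / threadCount;
    const int extra = taskCount % threadCount;   // leftovers go to the first batches
    for (int t = 0; t < threadCount; ++t) {
        const int begin = t * base + std::min(t, extra);
        const int end = begin + base + (t < extra ? 1 : 0);
        threads.emplace_back([&executed, &task, t, begin, end] {
            for (int i = begin; i < end; ++i) {
                task(i);
                ++executed[t];   // each thread only touches its own counter
            }
        });
    }
    // Barrier: the step is over only when every batch, including a delayed one, is done.
    for (auto& th : threads) th.join();
    return executed;
}
```

With 1000 callbacks and 4 threads each thread executes exactly 250, regardless of how long any individual callback takes.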
In Newton 3.00 all microtasks are placed in a single batch array, then the list is sent to the thread manager, where each thread competes to get one microtask.
When a thread has its microtask, it works on it until it is finished; after completing that microtask, the thread goes back to the manager to get another one.
If one thread is interrupted because some higher-priority OS thread kicked it out in the middle of the work, the other threads can take the rest of the microtasks and finish the job.
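The shared-queue idea can be sketched with a single atomic counter that every thread increments to claim the next microtask. This assumes C++11 atomics; `runSharedQueue` is a hypothetical name, and Newton's real thread manager may be organized differently.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// All microtasks live in one array; every thread repeatedly claims the next index.
// Returns how many times each task ran (should be exactly once each).
std::vector<int> runSharedQueue(int taskCount, int threadCount,
                                const std::function<void(int)>& task) {
    std::vector<int> runs(taskCount, 0);
    std::atomic<int> next{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            // fetch_add hands out each index once. A thread that is preempted
            // simply claims fewer tasks; the others keep draining the array.
            for (int i = next.fetch_add(1); i < taskCount; i = next.fetch_add(1)) {
                task(i);
                ++runs[i];   // only the claiming thread writes runs[i]
            }
        });
    }
    for (auto& th : threads) th.join();   // barrier at the end of the step
    return runs;
}
```

The thread count is free here: 16 threads on a 4-core machine still run each of the 1000 tasks exactly once, they just share the tasks unevenly depending on who gets scheduled.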
It is not completely perfect, because the thread that was interrupted still has to complete its microtask, but it is one hell of a lot better than the strategy in 2.00.
It also does a kind of automatic load balancing. For example, say each microtask is to solve a collision pair, and say one collision is compound versus compound while another is sphere versus sphere.
In 2.00 the thread that takes the batch with the compound/compound pair will take much longer than the others, while the others have to yield the rest of their time at the synchronization barrier.
In 3.00 the thread with the compound/compound pair can take its time while the other threads solve everything else, and when the compound/compound pair finishes there is no more pending work to do.
This means the other threads have to wait less at the synchronization barrier.
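To put rough numbers on the compound/compound example, the sketch below computes how long one step takes under both schemes, using made-up costs: one pair costing 100 units followed by 99 pairs costing 1 unit each, on 4 threads, with zero scheduling overhead. The costs and the names `staticMakespan`, `dynamicMakespan` and `exampleCosts` are assumptions for illustration.

```cpp
#include <algorithm>
#include <vector>

// Time for one step when thread t gets the t-th contiguous batch (2.00 style).
int staticMakespan(const std::vector<int>& cost, int threads) {
    const int n = static_cast<int>(cost.size());
    const int base = n / threads, extra = n % threads;
    int worst = 0;
    for (int t = 0; t < threads; ++t) {
        const int begin = t * base + std::min(t, extra);
        const int end = begin + base + (t < extra ? 1 : 0);
        int sum = 0;
        for (int i = begin; i < end; ++i) sum += cost[i];
        worst = std::max(worst, sum);   // everyone waits for the slowest batch
    }
    return worst;
}

// Time for one step when whichever thread is free first claims the next task (3.00 style).
int dynamicMakespan(const std::vector<int>& cost, int threads) {
    std::vector<int> busyUntil(threads, 0);
    for (int c : cost) {
        *std::min_element(busyUntil.begin(), busyUntil.end()) += c;
    }
    return *std::max_element(busyUntil.begin(), busyUntil.end());
}

// Made-up costs: one compound/compound pair (100) and 99 cheap pairs (1 each).
std::vector<int> exampleCosts() {
    std::vector<int> cost(100, 1);
    cost[0] = 100;
    return cost;
}
```

Static batching gives 124 units (the first batch holds the expensive pair plus 24 cheap ones), while claiming tasks one at a time gives 100: the other three threads split the 99 cheap pairs and finish at 33, so the step is bounded only by the expensive pair itself.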
It goes even further, because the method in 3.00 allows out-of-order microtask execution.
For example, say the other threads complete the rest of the tasks while one thread is held in a time-consuming microtask.
In some cases the other threads do not have to wait at the synchronization barrier for that thread to complete its task; they can go on and start executing other tasks.
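One way to get this kind of barrier-free, out-of-order execution is a per-task dependency counter: the thread that finishes the last input of a later-phase task runs it immediately, while the other threads keep working on the earlier phase. The sketch assumes a hypothetical two-phase job in which phase-B task `j` needs only phase-A tasks `2j` and `2j + 1`; `TwoPhaseJob` and `runTwoPhase` are made-up names, and this is not a description of how Newton's solver is structured.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical two-phase job: phase A computes a[i]; phase-B task j combines
// a[2j] and a[2j + 1]. There is no barrier between the phases.
struct TwoPhaseJob {
    std::vector<long long> a, b;
    std::vector<std::atomic<int>> pending;  // unfinished inputs of each B task
    std::atomic<int> next{0};               // next phase-A task to claim

    explicit TwoPhaseJob(int pairs) : a(2 * pairs), b(pairs), pending(pairs) {
        for (auto& p : pending) p.store(2);
    }

    void worker() {
        const int n = static_cast<int>(a.size());
        for (int i = next.fetch_add(1); i < n; i = next.fetch_add(1)) {
            a[i] = static_cast<long long>(i) * i;   // phase-A microtask
            const int j = i / 2;
            // Whoever finishes the second input runs B[j] at once, possibly while
            // other threads are still on phase A. The seq_cst fetch_sub makes the
            // partner thread's write to a[] visible here.
            if (pending[j].fetch_sub(1) == 1) {
                b[j] = a[2 * j] + a[2 * j + 1];     // phase-B microtask
            }
        }
    }
};

// Runs the job on `threads` workers and checks every phase-B result.
bool runTwoPhase(int pairs, int threads) {
    TwoPhaseJob job(pairs);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back([&job] { job.worker(); });
    for (auto& th : pool) th.join();
    for (int j = 0; j < pairs; ++j) {
        const long long x = 2LL * j, y = 2LL * j + 1;
        if (job.b[j] != x * x + y * y) return false;
    }
    return true;
}
```

The only barrier left is the final `join()`; a thread stuck in a slow phase-A task delays just the one phase-B task that depends on it.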
I use this method extensively in Newton 3.00 in the solver and, to some extent, in collision, and it yields about a 30% increase in performance, sometimes even as high as 50%.