
Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 3:32 pm
by Dr.Disaster
badhabit wrote:
Dr.Disaster wrote:
badhabit wrote: Hmm, what I see is that the load is distributed across 4 cores but does not add up to more than 1.68 cores' worth (21% * 8)... do you see an FPS increase going from 2 to 3 cores?
Nope, no increase at all with more than 2 cores. If that had been the case, I would have stated it above.
So LoG2 is using at most 2 cores (not 4), and maybe it is just the GPU driver that uses the second core. That makes the official recommendation of a quad-core system pretty dubious.
... and again you are on the wrong track. A pity, after the info I just gave you.

When I switch the affinity of LoG2 from 1 to all cores, the game immediately jumps onto 3 more cores.
People with rather slow quad-cores might see an increase in FPS, but I don't have one for testing.
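The 1.68 figure quoted above converts Task Manager's combined CPU load into "cores' worth" of work: total percentage × number of logical processors. A minimal C sketch of that arithmetic, assuming the input is the overall 0-100 % figure; the helper name `core_equivalents` is hypothetical:

```c
#include <assert.h>

/* Convert an aggregate CPU load (0-100 %, the overall figure Task Manager
   shows) into how many logical processors' worth of work it represents.
   Hypothetical helper, for illustration only. */
double core_equivalents(double total_load_pct, int logical_cpus)
{
    assert(logical_cpus > 0);
    assert(total_load_pct >= 0.0 && total_load_pct <= 100.0);
    return total_load_pct / 100.0 * logical_cpus;
}
```

Here `core_equivalents(21.0, 8)` comes out at about 1.68, i.e. the work of roughly 1.7 logical processors, regardless of how the scheduler spreads it across cores.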

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 3:49 pm
by badhabit
Dr.Disaster wrote:
badhabit wrote:
Dr.Disaster wrote: Nope, no increase at all with more than 2 cores. If that had been the case, I would have stated it above.
So LoG2 is using at most 2 cores (not 4), and maybe it is just the GPU driver that uses the second core. That makes the official recommendation of a quad-core system pretty dubious.
... and again you are on the wrong track. A pity, after the info I just gave you.

When I switch the affinity of LoG2 from 1 to all cores, the game immediately jumps onto 3 more cores.
People with rather slow quad-cores might see an increase in FPS, but I don't have one for testing.
It is normal behaviour for the OS to move/distribute/smear threads between processors. This does NOT indicate usage. Usage would show up as a higher load: e.g. when switching from 2 to 3 cores, the FPS rise, or at least the combined CPU load (21% for you) rises.

On my dual-core system I tested the theory that with less clock speed available, the second core gets used more. No, it does not: the FPS were directly related to the clock, indicating completely single-threaded code.
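The clock-speed argument can be put as a number: compare the relative change in frame rate with the relative change in CPU clock. A value near 1 means FPS tracks the clock one-to-one (CPU-bound). On its own this only shows the game is CPU-bound; the single-threaded conclusion additionally rests on the second core's load not rising, as described above. A minimal C sketch; the function name is hypothetical and any numbers fed to it would be your own measurements:

```c
#include <assert.h>

/* Elasticity of frame rate with respect to CPU clock, from two measurements:
   (relative FPS change) / (relative clock change).
   ~1.0 : FPS scales one-to-one with clock (CPU-bound).
   ~0.0 : FPS does not react to clock (bottleneck elsewhere, e.g. the GPU).
   Clock values may be in any consistent unit (MHz, GHz). */
double fps_clock_elasticity(double fps_a, double clock_a,
                            double fps_b, double clock_b)
{
    assert(fps_a > 0.0 && clock_a > 0.0 && clock_b > 0.0);
    assert(clock_b != clock_a); /* need two different clock settings */
    return (fps_b / fps_a - 1.0) / (clock_b / clock_a - 1.0);
}
```

For example, halving the clock from 3.0 to 1.5 GHz while FPS drops from 60 to 30 gives 1; an unchanged 60 FPS would give 0. These are illustrative numbers, not measurements from this thread.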

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 4:31 pm
by Dr.Disaster
This is my affinity graph for LoG2 on the game start screen (party in cage).
LoG2 started out on core 0, and it's clearly visible when/where I added cores 2, 4 and 6.
After 4 cores were assigned there was no more change, and my CPU was hardly impressed by the load.

[Image: LoG2 CPU affinity graph, game start screen]

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 4:39 pm
by badhabit
Dr.Disaster wrote: This is my affinity graph for LoG2 on the game start screen (party in cage). It's clearly visible when/where I added cores 2, 4 and 6.
After 4 cores were assigned there was no more change, and my CPU was hardly impressed by the load.
Thanks. The correct interpretation is that after 2 cores nothing changes load-wise (and, I guess, FPS-wise too); the random jumping of the threads across the 4 physical cores has no meaning usage-wise. 1 core: core 0 fully loaded, 100% (plus noise from something else; browser in the background?). 2 cores: the load is distributed over two cores at ~40% + ~70% (maybe +3% additional load overall, but that is within the noise, so hard to say). 3 to 8 cores: no change besides noise. So the question remains why AH recommended quad-core systems.

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 4:48 pm
by Dr.Disaster
Nothing in the background, just LoG2, Steam and Win7.
Of course there are changes from 3 to 4 cores: core 6 did a lot less before, while cores 2 and 4 go a tad further down.
From 4 to all cores there is no change.

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 4:53 pm
by badhabit
Dr.Disaster wrote: Nothing in the background, just LoG2, Steam and Win7.
Of course there is a change from 3 to 4 cores; core 6 did a lot less before
Again, it is not about how the work is distributed between cores; it is about whether more cores lead to more work being done (e.g. more FPS), indicated by a higher overall load. If no more work is done, the code makes no use of multicore resources, and Win7 just shuffles the running process/threads around so that the cores don't get bored doing nothing.
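The test described here (does adding cores raise the total amount of work done?) can be sketched by summing the per-core loads for each affinity setting and comparing the totals. The function names and the noise threshold are hypothetical; the example values in the note below are the approximate ones mentioned in this thread (100% on one core vs. ~40% + ~70% on two).

```c
#include <assert.h>
#include <stddef.h>

/* Sum per-core loads (each in percent of one core) into a total,
   expressed in percent of a single core. */
double total_load(const double *per_core_pct, size_t n_cores)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_cores; ++i)
        sum += per_core_pct[i];
    return sum;
}

/* Returns 1 if the total load with more cores exceeds the total with fewer
   cores by more than noise_pct (percent of one core), i.e. the extra cores
   got extra work done; returns 0 if the difference is within the noise. */
int extra_cores_did_work(const double *fewer, size_t n_fewer,
                         const double *more, size_t n_more,
                         double noise_pct)
{
    assert(noise_pct >= 0.0);
    return total_load(more, n_more) - total_load(fewer, n_fewer) > noise_pct;
}
```

With `{100}` and `{40, 70}` the totals are 100 and 110 (percent of one core). Whether that 10-point difference counts as "more work" depends on the noise threshold chosen, which is exactly the judgement call being argued over here.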

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 4:54 pm
by Dr.Disaster
As I said: you need a much slower CPU than mine to check that.

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 6:06 pm
by badhabit
Dr.Disaster wrote: As I said: you need a much slower CPU than mine to check that.
I checked it in a CPU-bound situation while varying my CPU's frequency: no additional load visible -> conclusion: single-threaded code.

Also, you could make your system CPU-bound too by turning all GPU settings down, disabling v-sync and raising the frame-rate limiter to 400 FPS, and then analyze the influence of the number of cores in a CPU-bound situation.

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 7:21 pm
by vlzvl
Actually, Grimrock 2 seems to be multi-threaded.
I created a simple console application that checks how many threads a specific process is using.
The rule of thumb is easy: if a process is using more threads than the system provides, then this process is multi-threaded.

For example, my Intel Core i3 370M has 2 physical cores, but Windows reports 4 cores.
Well, I don't really have 4 cores but 4 hardware threads, and Windows counts them as 4 cores.

In fact, if you call this piece of Windows API code:

   #include <windows.h>

   SYSTEM_INFO sysinfo;
   GetSystemInfo( &sysinfo );
   // sysinfo.dwNumberOfProcessors -> gives me 4 (logical processors, not physical cores)
So the first point is that what Windows shows on the Performance tab is not really the usage per physical core,
but rather the usage per hardware thread (logical processor). This is due to Hyper-Threading.
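On POSIX-like systems (Linux, macOS, the BSDs) the closest analogue to the `dwNumberOfProcessors` field above is `sysconf(_SC_NPROCESSORS_ONLN)`, which likewise counts logical processors (hardware threads), so a 2-core/4-thread i3 would report 4 there too. A minimal sketch; the wrapper name is hypothetical, and `_SC_NPROCESSORS_ONLN` is a widely supported extension rather than part of POSIX proper:

```c
#include <unistd.h>

/* Number of logical processors currently online. Hyper-Threading siblings
   are counted separately, just like dwNumberOfProcessors on Windows.
   Returns -1 on error. Hypothetical wrapper around sysconf(). */
long logical_processor_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```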

As I said, I created a console application which is hard-coded to look for grimrock2.exe and then traverse all threads the game is using. It is based on this premise: if the number of threads used > the total number of hardware threads of the system (CPU-wise), then we can be sure the process is multi-threaded.

TESTS

// output 1: (my OpenGL application, named CROSS.exe, using no multi-threading at all) //////////////////////

CPU/Thread number for Intel(R) Core(TM:  4
Printing threads for Pid: 83768 (CROSS.exe)

Thread ID: 83730
Thread ID: 83354
Thread ID: 83410
Thread ID: 83220

End printing threads for Pid: 83768
Conclusion: Since my system provides 2 physical cores and 4 threads, and my CROSS.exe uses 4 threads, my application is not multi-threaded, no matter how high the usage per thread is; this is just the OS doing its job.


// output 2: (grimrock2.exe) ///////////////////////////////////////////////////////////////

CPU/Thread number for Intel(R) Core(TM:  4
Printing threads for Pid: 89268 (grim

Thread #0 with ID: 90468
Thread #1 with ID: 88984
Thread #2 with ID: 90656
Thread #3 with ID: 86948
Thread #4 with ID: 89388
Thread #5 with ID: 47916

End printing threads for Pid: 89268
Conclusion: Grimrock 2 is indeed a multi-threaded application, since it uses more threads than the system provides (4), although I suppose the bottleneck is low-end GPUs.

I uploaded my application (with source, for anyone allergic to .exe files) here

Re: LoG 2 very poor performance...

Posted: Fri Oct 24, 2014 7:32 pm
by badhabit
vlzvl wrote: Actually, Grimrock 2 seems to be multi-threaded.
Thanks for also looking into this topic, and for your enthusiasm in writing code for it (the thread count can also be checked via Task Manager). About threads: as a programmer I can tell you that there are many reasons to spawn threads in a program that do not handle performance-relevant work at all (sound, network, input, timing...). Grimrock might spawn some threads, but this does not necessarily mean that the relevant engine parts run concurrently.

Whether a program was written with serious multithreading in mind can be checked by profiling: for instance, bring the LoG2 engine into a CPU-bound situation, then remove and add CPU cores. Does performance increase/decrease? If not, no significant parts were written with concurrency in mind. I tried to explain how to do that here.
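The add/remove-cores test boils down to measuring the speedup S = FPS(p cores) / FPS(1 core) in a CPU-bound scene. One standard way to condense that into a single number is the Karp-Flatt metric, an estimate of the serial fraction e = (1/S - 1/p) / (1 - 1/p): e near 1 means the extra cores did nothing (effectively single-threaded), e near 0 means near-linear scaling. This metric is swapped in here as a common tool, not something computed in the thread; the function names are hypothetical. A minimal C sketch:

```c
#include <assert.h>

/* Speedup from running on p cores instead of 1, from measured frame rates. */
double speedup(double fps_one_core, double fps_p_cores)
{
    assert(fps_one_core > 0.0);
    return fps_p_cores / fps_one_core;
}

/* Karp-Flatt metric: experimentally determined serial fraction
   e = (1/S - 1/p) / (1 - 1/p), for p > 1 cores and speedup S > 0.
   e ~ 1: adding cores gave no speedup (effectively serial code).
   e ~ 0: near-linear scaling (work is spread across the cores). */
double karp_flatt(double s, int p)
{
    assert(p > 1 && s > 0.0);
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p);
}
```

For example, if a CPU-bound scene ran at the same frame rate on 1 and 4 cores, S = 1 and e = 1; a doubling of FPS on 2 cores would give e = 0.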

About Hyper-Threading: in short, hyper-threaded cores are not real physical silicon cores. Each physical core presents itself as two logical cores (HT cores) that share the core's internal resources (e.g. the out-of-order execution units). The disadvantage is that these resources are then shared with the other logical core, potentially leading to somewhat slower performance per thread. But with code that is "ridiculously trivial" to multithread (like several simulation/computer science tasks), splitting one powerful core into two less powerful cores can be beneficial for overall performance.