Оцените презентацию от 1 до 5 баллов!
Тип файла:
ppt / pptx (powerpoint)
Всего слайдов:
39 слайдов
Для класса:
1,2,3,4,5,6,7,8,9,10,11
Размер файла:
227.50 kB
Просмотров:
65
Скачиваний:
0
Автор:
неизвестен
Слайды и текст к этой презентации:
№1 слайд
Содержание слайда: Common C++ Performance Mistakes in Games
Pete Isensee
Xbox Advanced Technology Group
№2 слайд
Содержание слайда: About the Data
ATG reviews code to find bottlenecks and make perf recommendations
50 titles per year
96% use C++
1 in 3 use “advanced” features like templates or generics
№3 слайд
Содержание слайда: Why This Talk Is Important
The majority of Xbox games are CPU bound
The CPU bottleneck is often a language or C++ library issue
These issues are not usually specific to the platform
№4 слайд
Содержание слайда: Format
Definition of the problem
Examples
Recommendation
For reference
A frame is 17 or 33 ms (60fps / 30fps)
Bottlenecks given in ms per frame
№5 слайд
Содержание слайда: Issue: STL
Game using std::list
Adding ~20,000 objects every frame
Rebuilding the list every frame
Time spent: 6.5 ms/frame!
~156K overhead (2 pointers per node)
Objects spread all over the heap
№6 слайд
Содержание слайда: std::set and map
Many games use set/map as sorted lists
Inserts are slow (log(N))
Memory overhead: 3 ptrs + color
Worst case in game: 3.8 ms/frame
№7 слайд
Содержание слайда: std::vector
Hundreds of push_back()s per frame
VS7.1 expands vector by 50%
Question: How many reallocations for 100 push_back()s?
Answer: 13! (1,2,3,4,5,7,10,14,20,29,43,64,95)
№8 слайд
Содержание слайда: Clearly, the STL is Evil
№9 слайд
Содержание слайда: Use the Right Tool for the Job
The STL is powerful, but it’s not free
Filling any container is expensive
Be aware of container overhead
Be aware of heap fragmentation and cache coherency
Prefer vector, vector::reserve()
№10 слайд
Содержание слайда: The STL is Evil, Sometimes
The STL doesn’t solve every problem
The STL solves some problems poorly
Sometimes good old C-arrays are the perfect container
Mike Abrash puts it well:
“The best optimizer is between your ears”
№11 слайд
Содержание слайда: Issue: NIH Syndrome
Example: Custom binary tree
Sorted list of transparent objects
Badly unbalanced
1 ms/frame to add only 400 items
Example: Custom dynamic array class
Poorer performance than std::vector
Fewer features
№12 слайд
Содержание слайда: Optimizations that Aren’t
void appMemcpy( void* d, const void* s, size_t b )
{
// lots of assembly code here ...
}
appMemcpy( pDest, pSrc, 100 ); // bottleneck
appMemcpy was slower than memcpy for anything under 64K
№13 слайд
Содержание слайда: Invent Only What You Need
std::set/map more efficient than the custom tree by 10X
Tested and proven
Still high overhead
An even better solution
Unsorted vector or array
Sort once
20X improvement
№14 слайд
Содержание слайда: Profile
Run your profiler
Rinse. Repeat.
Prove the improvement.
Don’t rewrite the C runtime or STL just because you can. There are more interesting places to spend your time.
№15 слайд
Содержание слайда: Issue: Tool Knowledge
If you’re a programmer, you use C/C++ every day
C++ is complex
CRT and STL libraries are complex
The complexities matter
Sometimes they really matter
№16 слайд
Содержание слайда: vector::clear
Game reused global vector in frame loop
clear() called every frame to empty the vector
C++ Standard
clear() erases all elements (size() goes to 0)
No mention of what happens to vector capacity
On VS7.1/Dinkumware, frees the memory
Every frame reallocated memory
№17 слайд
Содержание слайда: Zero-Initialization
struct Array { int x[1000]; };
struct Container {
Array arr;
Container() : arr() { }
};
Container x; // bottleneck
Costing 3.5 ms/frame
Removing : arr() speeds this by 20X
№18 слайд
Содержание слайда: Know Thine Holy Standard
Use resize(0) to reduce container size without affecting capacity
T() means zero-initialize PODs. Don’t use T() unless you mean it.
Get a copy of the C++ Standard. Really.
www.techstreet.com; search on 14882
Only $18 for the PDF
№19 слайд
Содержание слайда: Issue: C Runtime
void BuildScore( char* s, int n )
{
if( n > 0 )
sprintf( s, “%d”, n );
else
sprintf( s, “” );
}
n was often zero
sprintf was a hotspot
№20 слайд
Содержание слайда: qsort
Sorting is important in games
qsort is not an ideal sorting function
No type safety
Comparison function call overhead
No opportunity for compiler inlining
There are faster options
№21 слайд
Содержание слайда: Clearly, the CRT is Evil
№22 слайд
Содержание слайда: Understand Your Options
itoa() can replace sprintf( s, “%d”, n )
*s = ‘\0’ can replace sprintf( s, “” )
std::sort can replace qsort
Type safe
Comparison can be inlined
Other sorting options can be even faster: partial_sort, partition
№23 слайд
Содержание слайда: Issue: Function Calls
50,000-100,000 calls/frame is normal
At 60Hz, Xbox has 12.2M cycles/frame
Function call/return averages 20 cycles
A game calling 61,000 functions/frame spends 10% CPU (1.7 ms/frame) in function call overhead
№24 слайд
Содержание слайда: Extreme Function-ality
120,000 functions/frame
140,000 functions/frame
130,000 calls to a single function/frame (ColumnVec<3,float>::operator[])
And the winner:
340,000 calls per frame!
9 ms/frame of call overhead
№25 слайд
Содержание слайда: Beware Elegance
Elegance → levels of indirection → more functions → perf impact
Use algorithmic solutions first
One pass through the world
Better object rejection
Do AI/physics/networking less often than once/frame
№26 слайд
Содержание слайда: Inline Judiciously
Remember: inline is a suggestion
Try “inline any suitable” compiler option
15 to 20 fps
68,000 calls down to 47,000
Try __forceinline or similar keyword
Adding to 5 funcs shaved 1.5 ms/frame
Don’t over-inline
№27 слайд
Содержание слайда: Issue: for loops
// Example 1: Copy indices to push buffer
for( DWORD i = 0; i < dwIndexCnt; ++i )
*pPushBuffer++ = arrIndices[ i ];
// Example 2: Initialize vector array
for( DWORD i = 0; i < dwMax; ++i )
mVectorArr[i] = XGVECTOR4(0,0,0,0);
// Example 3: Process items in world
for( itr i = c.begin(); i < c.end(); ++i )
Process( *i );
№28 слайд
Содержание слайда: Watch Out For For
Never copy/clear a POD with a for loop
std::algorithms are optimized; use them
memcpy( pPushBuffer, arrIndices,
dwIndexCnt * sizeof(DWORD) );
memset( mVectorArr, 0, dwMax * sizeof(XGVECTOR4) );
for_each( c.begin(), c.end(), Process );
№29 слайд
Содержание слайда: Issue: Exception Handling
Most games never throw
Most games never catch
Yet, most games enable EH
EH adds code to do stack unwinding
A little bit of overhead to a lot of code
10% size increase is common
2 ms/frame in worst case
№30 слайд
Содержание слайда: Disable Exception Handling
Don’t throw or catch exceptions
Turn off the C++ EH compiler option
For Dinkumware STL
Define “_HAS_EXCEPTIONS=0”
Write empty _Throw and _Raise_handler; see stdthrow.cpp and raisehan.cpp in crt folder
Add #pragma warning(disable: 4530)
№31 слайд
Содержание слайда: Issue: Strings
Programmers love strings
Love hurts
~7000 calls to stricmp in frame loop
1.5 ms/frame
Binary search of a string table
2 ms/frame
№32 слайд
Содержание слайда: Avoid strings
String comparisons don’t belong in the frame loop
Put strings in an table and compare indices
At least optimize the comparison
Compare pointers only
Prefer strcmp to stricmp
№33 слайд
Содержание слайда: Issue: Memory Allocation
Memory overhead
Xbox granularity/overhead is 16/16 bytes
Overhead alone is often 1+ MB
Too many allocations
Games commonly do thousands of allocations per frame
Cost: 1-5 ms/frame
№34 слайд
Содержание слайда: Hidden Allocations
push_back(), insert() and friends typically allocate memory
String constructors allocate
Init-style calls often allocate
Temporary objects, particularly string constants that convert to string objects
№35 слайд
Содержание слайда: Minimize Per-Frame Allocations
Use memory-friendly data structures, e.g. arrays, vectors
Reserve memory in advance
Use custom allocators
Pool same-size allocations in a single block of memory to avoid overhead
Use the explicit keyword to avoid hidden temporaries
Avoid strings
№36 слайд
Содержание слайда: Other Tidbits
Compiler settings: experiment
dynamic_cast: just say no
Constructors: performance killers
Unused static array space: track this
Loop unrolling: huge wins, sometimes
Suspicious comments: watch out
“Immensely slow matrix multiplication”
№37 слайд
Содержание слайда: Wrap Up
Use the Right Tool for the Job
The STL is Evil, Sometimes
Invent Only What You Need
Profile
Know Thine Holy Standard
Understand Your Options
№38 слайд
Содержание слайда: Call to Action: Evolve!
Pass the rubber chicken
Share your C++ performance mistakes with your team
Mentor junior programmers
So they only make new mistakes
Don’t stop learning
You can never know enough C++
№39 слайд
Содержание слайда: Questions
Fill out your feedback forms
Email: pkisensee@msn.com
This presentation: www.tantalon.com/pete.htm