Presentation: Development and Simulation
- File type: ppt / pptx (PowerPoint)
- Total slides: 85
- File size: 362.56 kB
- Author: unknown
Slides and text of this presentation:
Slide 5
Slide content: Objectives
Develop a framework and building blocks for an application-specific vector processor, and create simulation software.
Simplify the development of hardware architectures that perform operations on data structures of finite length (vectors):
signal processing - OFDM symbols, code blocks
cryptography - cipher blocks
networking - data packets
Slide 6
Slide content: Objectives
The framework should provide a unified approach to the development of the functional components, Execution Units (EU) and Data Memories (DM), as well as to their interoperation
The specific set and configuration of the functional units are determined by the target application
Slide 7
Slide content: Objectives
Elaborate a methodology for developing and simulating the hardware architecture, starting from system specifications (algorithms) and throughput requirements
Follow a top-down development and optimization strategy
Area: improvement of the utilization rate of the processing blocks and storage elements (RAMs)
Throughput: datapath is elaborated at the stage of the architecture development
Power
Non-recurring engineering (NRE) resources and risks
Slide 8
Slide content: Objectives
Employ SystemC
Short simulation-analysis-update cycle which allows for simulation driven development and optimization
Cycle accurate simulation for the vector core to obtain realistic timing and throughput estimates at the earlier stages
High level of abstraction for Vector Core preferences, runtime configuration and status to stay focused on the architectural tasks
Large ecosystem of C/C++ libraries for data manipulation and processing allows for top-down development approach: from high level processing functions down to elementary arithmetic operations
Allows for integration into the existing simulation workflows
Open source
Slide 10
Slide content: Components in Brief
Vector Core functions
Performing vector computations in accordance with the configuration supplied from the Scalar Core
Generation of the events at the different stages of the execution of the vector operations
Synchronization of the processing threads
Update of the statuses of the vector operations
Data dependent processing
Sequencing of the vector operations
Slide 11
Slide content: Components in Brief
Vector Core components
Data Memories (DM): temporary storage of the vectors.
A set of Address Generators (AG) associated with each DM allows for flexible addressing/fetching of the elements of the vector.
Execution Units (EU): perform successive processing of the elements of the vectors.
This part of the Vector Core is specific to the target application
Slide 12
Slide content: Components in Brief
Streaming Devices: interfacing Vector Core with the external devices
ADC or DAC
Preliminary or subsequent processing blocks
Interface to the external storage (DMA)
XBAR: Network-On-Chip (NOC) intended for routing of the vector streams between Execution Units (EU), Data Memories (DM) and/or Streaming Devices
DMs, EUs and Streaming Devices have a unified interface for connection to XBAR
Slide 13
Slide content: Components in Brief
Functions of the Scalar Infrastructure
Respond to the events and statuses from the Vector Core components
Control the execution flow inside the Vector Core
Synchronization of the vector processing threads
Generate and deliver configuration data to the Vector Core components
Slide 14
Slide content: Components in Brief
Scalar Infrastructure components
Scalar Core processes the events and statuses from Vector Core, and generates configurations for the components of the Vector Core.
It can be implemented as a programmable general purpose CPU subsystem or an FSM depending on the complexity of the control procedures
Event MUX: delivers events which were generated by the Vector Core components to the Scalar Core
Slide 15
Slide content: Components in Brief
Scalar Infrastructure components (cont’d)
Config De-multiplexer: distributes commands and data supplied from the Scalar Core to the Vector Core components.
Broadcast commands are supported for execution control
Status MUX: delivers to the Scalar Core the status data sent by the Vector Core components in response to request commands from the Scalar Core
Slide 16
Slide content: Outline of the development strategy
Decompose the processing algorithm down to the level of functional blocks and storage elements
Develop the functional blocks having the specified interface with XBAR and Scalar Infrastructure
Assemble the functional blocks and storage elements into the framework
Develop the control FSM and/or CPU firmware (FW)
Slide 21
Slide content: VALID-READY Channel
SOURCE retains the state of the VALID and DATA signals until the transaction is acknowledged by the DESTINATION asserting its READY output
The READY signal may change its state at any time. DESTINATION may implement multi-cycle processing or use a shared resource whose availability changes from cycle to cycle
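The handshake rule above can be illustrated with a small cycle-by-cycle model. This is a minimal Python sketch, not the framework's SystemC code; the function name `transfer` and the representation of READY as a list of per-cycle booleans are assumptions made for illustration.

```python
def transfer(data, ready_pattern):
    """Cycle-by-cycle model of one VALID-READY channel.

    data: items the SOURCE wants to send, in order.
    ready_pattern: iterable of booleans, the DESTINATION's READY per cycle.
    Returns (items received, number of cycles spent with VALID asserted).
    """
    received = []
    index = 0                      # item currently presented on DATA
    cycles = 0
    for ready in ready_pattern:
        if index == len(data):
            break                  # nothing left to send: VALID is deasserted
        cycles += 1
        valid = True               # SOURCE holds VALID and DATA stable
        if valid and ready:        # transaction acknowledged in this cycle
            received.append(data[index])
            index += 1             # only now may DATA change
    return received, cycles
```

With READY low in cycles 1, 3 and 4, three items need six cycles, and no item is lost or duplicated while the DESTINATION stalls.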
Slide 22
Slide content: VALID-READY Channel
At the DM/EU interface, the DATA signal transferred in 1 clock cycle contains 4 data slots
Contents of 1 slot
Flag which identifies that the data field is valid
Data field of type “complex double” representing 1 element of the vector or 1 sample
Other application-specific data structures are possible e.g. fixed-point types, pixel colours, etc.
Slide 23
Slide content: VALID-READY Channel
The VALID signal is extended to 4 states (2 bits) to support transfers of vectors of finite length
IDLE – Inactive state
HEAD – first data transaction if the number of data transactions is more than 1
BODY – intermediate data transaction if the number of data transactions is more than 2
TAIL – last data transaction or data transaction with a single element
Extended application-specific set of states is possible
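The marker rules can be sketched as a function that labels each data transaction of a vector. This is a minimal Python illustration under stated assumptions: the numeric encoding of the states, the names `Valid`, `transactions_for` and `markers`, and the reading of "single element" as "vector that fits into a single transaction" are not taken from the slides. The 4 slots per transaction follow the DM/EU interface described on slide 22.

```python
from enum import Enum


class Valid(Enum):
    # The 2-bit encoding below is an assumption; the slides only name the states.
    IDLE = 0
    HEAD = 1
    BODY = 2
    TAIL = 3


def transactions_for(length, slots=4):
    """Number of data transactions needed for a vector of `length` elements."""
    return -(-length // slots)          # ceiling division


def markers(num_transactions):
    """VALID state carried by each transaction of one vector."""
    if num_transactions <= 0:
        return []
    if num_transactions == 1:
        return [Valid.TAIL]             # a single transaction is marked TAIL
    body = [Valid.BODY] * (num_transactions - 2)
    return [Valid.HEAD] + body + [Valid.TAIL]
```

For example, a 9-element vector needs 3 transactions, marked HEAD, BODY, TAIL; the last transaction would carry one valid slot and three invalid ones.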
Slide 28
Slide content: VALID-READY Channel
When merging streams, a processing stage should derive its VALID output from all the upstream SOURCES and wait for an acknowledgement from the DESTINATION stage
Time alignment buffers can be used to supply vector elements from all the SOURCES in a single clock cycle
There are 2 ways of splitting streams:
1. The DESTINATION which stalls is nominated as the master, and the VALID signal to the other DESTINATIONs is AND’ed with the READY signal from the master DESTINATION.
For DMs and EUs this feature is implemented as a function of XBAR.
2. Supply the VALID signal and process the READY signal individually for each DESTINATION, latching the acknowledgements for the given transaction.
This requires dedicated source ports in the DM/EU interface.
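Both ways of splitting a stream can be written as per-cycle combinational rules. The Python sketch below is illustrative only: the function names are hypothetical, and in the master variant the non-master destinations are assumed to be always ready.

```python
def master_fork(valid, ready_master):
    """Master variant: the stalling DESTINATION is the master.

    Returns (valid_to_master, valid_to_others, ready_to_source).
    The other destinations are assumed to be always ready.
    """
    return valid, valid and ready_master, ready_master


def latched_fork(acked, ready):
    """Latched variant: one clock cycle of a 1-to-N fork.

    acked: per-destination flags, True if it already accepted the current item.
    ready: per-destination READY inputs in this cycle.
    Returns (valid_out, acked_next, done); done means the SOURCE may advance.
    """
    valid_out = [not a for a in acked]            # offer only to pending ones
    now = [a or (v and r) for a, v, r in zip(acked, valid_out, ready)]
    done = all(now)
    acked_next = [False] * len(acked) if done else now
    return valid_out, acked_next, done
```

In the latched variant a destination that has already accepted the item stops seeing VALID, so it cannot receive the same item twice while the slower destination catches up.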
Slide 29
Slide content: VALID-READY Channel
To constrain combinational paths through the XBAR, the VALID, DATA and READY signals should be constrained at both the inputs and the outputs of each DM or EU.
Full-bandwidth elastic buffers, described in [2], Section 2.1.2, can be used on both the input and output sides.
The elastic buffer should be extended to support the 4-state VALID signal
These buffers add a delay of 2 clock cycles per DM or EU to the datapath.
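A generic two-entry elastic buffer gives a feel for why such a stage breaks combinational paths without losing bandwidth. The Python model below is an assumption-laden sketch, not the design from [2] and not the framework's code: the class name and the `clock` method are hypothetical, and the 4-state VALID extension is omitted. READY and VALID depend only on the stored state, and each buffer adds one cycle of latency, which is consistent with a 2-cycle delay when one buffer sits on the input and one on the output.

```python
from collections import deque


class ElasticBuffer:
    """Two-entry FIFO whose READY and VALID depend only on stored state."""

    def __init__(self):
        self.slots = deque()

    @property
    def ready(self):                 # towards the upstream SOURCE
        return len(self.slots) < 2

    @property
    def valid(self):                 # towards the downstream DESTINATION
        return len(self.slots) > 0

    def clock(self, in_valid, in_data, out_ready):
        """Advance one cycle; return the item handed downstream, or None."""
        accept = in_valid and self.ready      # sampled before the state changes
        out = self.slots.popleft() if (self.valid and out_ready) else None
        if accept:
            self.slots.append(in_data)
        return out
```

Feeding one item per cycle into an always-ready destination yields one item per cycle after a single cycle of latency; when the destination stalls, the buffer absorbs two items and then deasserts READY.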
Slide 36
Slide content: General Considerations
Efficient operation of the Vector Core from the perspective of the utilization rate and throughput:
Vectors are processed back to back, and
Multiple processing chains or threads run concurrently
The configuration of the Vector Core should be placed inside the corresponding components before the processing of the new vector starts.
Pick up the new configuration upon the completion of the current vector
Start the processing of the new vector without involving Scalar Infrastructure and thus with less delay
Slide 38
Slide content: Synchronization between Cores:
Scalar Core to Vector Core
The processing in the Vector Core is managed with the commands from the Scalar Core
put configuration to a component of the Vector Core
get status of a component of the Vector Core
run configuration with exec_id, which is broadcast to all the components of the Vector Core
Slide 39
Slide content: Synchronization between Cores:
Scalar Core to Vector Core
Configuration
Each Vector Core component can have a number of slots to store the configuration.
Compulsory fields in a configuration slot
exec_id: the configuration becomes active when the block receives a "run" command with the matching exec_id. The block picks up the configuration from the slot and starts its execution.
status_slot: the execution status is written to the slot pointed to by status_slot
This field is not applicable for XBAR
events: the block issues specific events during the execution of the configuration.
This field is not applicable for XBAR
Slide 40
Slide content: Synchronization between Cores:
Scalar Core to Vector Core
Compulsory fields in a configuration slot (cont’d)
config_next: when the execution of the current configuration is complete, i.e.
DM/EU received or transmitted vector TAIL
XBAR identified that all the blocks have completed their processing
the block picks up the configuration from the slot pointed to by config_next and starts its execution.
This forms a processing chain
An invalid or nonexistent config_next completes the chain
Looped chains are possible
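The compulsory fields and the chaining rule can be sketched as a small data structure plus a walk over config_next pointers. This Python model is illustrative: the field types, the event names, the use of `None` for an invalid pointer and the `max_vectors` bound (needed only so that a looped chain terminates in the example) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSlot:
    exec_id: int                 # matched against the broadcast "run" command
    status_slot: int             # where the execution status is written
    events: frozenset            # e.g. {"HEAD", "TAIL"}; names are assumed
    config_next: Optional[int]   # next slot index; None or unknown ends chain


def run_chain(slots, exec_id, max_vectors=16):
    """Return the slot indices a block executes after "run" with exec_id.

    slots: dict mapping slot index -> ConfigSlot.
    """
    current = next((i for i, s in slots.items() if s.exec_id == exec_id), None)
    order = []
    while current in slots and len(order) < max_vectors:
        order.append(current)                 # one vector is processed here
        current = slots[current].config_next  # picked up on vector TAIL
    return order
```

A chain 0 -> 1 -> 2 ends when slot 2 points nowhere, while two slots pointing at each other form a looped chain that runs until something outside the block stops it.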
Slide 41
Slide content: Synchronization between Cores:
Vector Core to Scalar Core
Status
EU components can have a number of slots to store the results of the vector processing.
Can reflect runtime state of the component.
This state is visible to Scalar Core
Data fields inside the status slot are specific to a particular block
Events
Components of the Vector Core can notify the Scalar Core of the progress of the execution of the configuration.
Full set of events is specific to DM/EU.
Slide 42
Slide content: Synchronization between Cores:
Vector Core to Scalar Core
Events (cont’d)
The subset of events to be issued for the vector being processed is selected in the configuration
XBAR can issue an event when all the connected components have finished processing the vector.
This is useful when the components are executing looped chains
In the simulator, events are implemented as transfers through FIFO channels, with polling and readout on the Scalar Core side.
Slide 43
Slide content: Execution Model
Command sequencing under the control of the Scalar Core
Scalar Core configures DMs and EUs, which are employed in the processing chains
Multiple processing chains can run concurrently
Scalar Core configures routing through the XBAR
The same exec_id is programmed into the configurations of all the DMs, EUs and the XBAR
Scalar Core issues the run command with that exec_id
Slide 44
Slide content: Execution Model
Command sequencing under the control of the Scalar Core
Scalar Core waits for the TAIL event from the blocks which are employed in the processing chain
Scalar Core reads and processes the blocks’ status
Scalar Core issues run command for the new chain which has been previously configured
Repeat
This behavior is tested in vri_test1..3
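The configure / run / wait / read-status loop can be sketched with an untimed model in which events travel through a FIFO that the Scalar Core polls. This Python code is a hypothetical illustration, not the vri_test sources: the `Block` class, its `put`/`run`/`get` methods and the status strings are assumptions, and each vector completes instantly instead of taking clock cycles.

```python
from collections import deque


class Block:
    """Stand-in for a DM or EU that processes one vector per "run"."""

    def __init__(self, name, event_fifo):
        self.name, self.event_fifo = name, event_fifo
        self.config = {}          # exec_id -> configuration payload
        self.status = None

    def put(self, exec_id, payload):
        self.config[exec_id] = payload

    def run(self, exec_id):
        if exec_id in self.config:            # only matching blocks start
            self.status = f"done:{self.config[exec_id]}"
            self.event_fifo.append((self.name, "TAIL"))

    def get(self):
        return self.status


def scalar_core(blocks, event_fifo, jobs):
    """jobs: list of (exec_id, {block_name: payload}) processed in order."""
    log = []
    for exec_id, cfg in jobs:
        for name, payload in cfg.items():
            blocks[name].put(exec_id, payload)      # configure
        for block in blocks.values():
            block.run(exec_id)                      # broadcast run command
        pending = set(cfg)
        while pending:                              # poll the event FIFO
            name, event = event_fifo.popleft()
            if event == "TAIL":
                pending.discard(name)
        log.append({name: blocks[name].get() for name in cfg})  # read status
    return log
```

Because the run command is broadcast, a block without a configuration for the given exec_id simply ignores it, which is what lets several chains share one command channel.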
Slide 45
Slide content: Execution Model
Command sequencing under the control of the Vector Core
Configuration Chaining
Cycle stationary (data independent) execution can be handled by the Vector Core without the need for the intervention from the Scalar Core
Events from DMs/EUs/XBAR can be issued in the process of the chain execution to notify the Scalar Core.
The status of the blocks, which have completed the processing, can be read out.
Any of the configuration slots of the blocks, which completed the processing, can be reconfigured
The blocks, which completed the processing, can execute another configuration
FFT with 1 EU and 2 DMs for ping-pong is a good application example of this mode
This behavior is reflected in vri_test4
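The FFT example can be made concrete with an algorithm whose passes naturally alternate between two memories. The Python sketch below uses a radix-2 Stockham autosort FFT, which is a choice made here for illustration; the slides do not say which FFT variant the framework uses. Each pass of the outer loop corresponds to one chained configuration: one memory is read, the butterfly "EU" processes the elements, the other memory is written, and then the roles swap.

```python
import cmath


def pingpong_fft(x):
    """Radix-2 Stockham FFT using two buffers that swap roles every pass."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    dm = [list(x), [0j] * n]       # two data memories
    src = 0                        # index of the DM that is read in this pass
    half, stride = n // 2, 1
    while half >= 1:
        rd, wr = dm[src], dm[1 - src]
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / (2 * half))   # twiddle factor
            for q in range(stride):
                a = rd[q + stride * p]
                b = rd[q + stride * (p + half)]
                wr[q + stride * (2 * p)] = a + b
                wr[q + stride * (2 * p + 1)] = (a - b) * w
        src = 1 - src              # swap roles: ping-pong
        half, stride = half // 2, stride * 2
    return dm[src]                 # result is in natural order
```

Since the sequence of passes does not depend on the data, the whole transform could run as one configuration chain of log2(n) steps without Scalar Core involvement between passes.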
Slide 46
Slide content: Execution Model:
Example of the Configuration Chaining
Succession of the operations in the processing chain:
Processing chain is initiated when the block receives run command with exec_id=ABC
Configuration slot X becomes active. A vector is processed. Block status is updated in the slot P. No events are generated.
Configuration slot Y becomes active. A vector is processed. Block status is updated in the slot Q. Event is generated when vector HEAD marker is received.
Configuration slot Z becomes active. A vector is processed. Block status is updated in the slot R. Event is generated when vector TAIL marker is received.
The processing chain is complete.
Slide 47
Slide content: Execution Model
Deferred Execution
Deferred Execution allows writing a configuration and issuing a run command to DMs, EUs and XBAR while they are busy with processing.
This relaxes performance constraints on Scalar Core
Configuration chaining prevails over the deferred execution
DM or EU operation
If a DM or EU is busy processing a chain when it receives a run command, the valid exec_id is placed into a FIFO buffer to be processed after the chain execution is complete.
This behavior is reflected in vri_test6
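The deferral rule for a DM or EU amounts to a busy flag plus a FIFO of pending execution indexes. The Python sketch below is a hypothetical model: the class and method names are invented, and a "valid" exec_id is assumed to mean one that matches a configuration stored in the block.

```python
from collections import deque


class DeferredBlock:
    """DM/EU model that queues run commands received while it is busy."""

    def __init__(self, known_exec_ids):
        self.known = set(known_exec_ids)   # exec_ids with a stored config
        self.busy = None                   # exec_id in progress, None if idle
        self.pending = deque()             # exec_ids deferred while busy
        self.history = []                  # order in which chains started

    def run(self, exec_id):
        if exec_id not in self.known:
            return                         # no matching configuration: ignore
        if self.busy is None:
            self._start(exec_id)
        else:
            self.pending.append(exec_id)   # deferred execution

    def chain_complete(self):
        """Called when the last configuration of the current chain finishes."""
        self.busy = None
        if self.pending:
            self._start(self.pending.popleft())

    def _start(self, exec_id):
        self.busy = exec_id
        self.history.append(exec_id)
```

The pending FIFO is consulted only when a whole chain has finished, which reflects the rule that configuration chaining takes priority over deferred execution.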
Slide 48
Slide content: Execution Model
Deferred Execution
XBAR
If XBAR receives a run command for a configuration which connects DMs/EUs and at least one of them is busy, then it postpones the execution until all of the DMs and EUs for the requested configuration become idle.
This behavior is reflected in vri_test5 and vri_test6
A request to change or run a configuration which is already active is illegal.
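The XBAR condition for starting a deferred configuration reduces to a set check over the endpoints of the requested routes. A minimal Python sketch, assuming a route is represented as a mapping from each source to its list of destinations (this representation and the function name are hypothetical):

```python
def xbar_can_start(route, busy):
    """Return True if no endpoint of the requested configuration is busy.

    route: dict mapping source name -> list of destination names.
    busy: set of DM/EU names that are currently processing.
    """
    endpoints = set(route) | {d for dsts in route.values() for d in dsts}
    return endpoints.isdisjoint(busy)
```

For a requested route src1->dst2 while src1->dst1 is still running, the check fails because src1 is busy; it passes once src1 is idle, regardless of blocks that the route does not touch.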
Slide 49
Slide content: Execution Model
Data Exchange and Data Dependent Execution
Components of the Vector Core can exchange small amounts of data on their own without involving Scalar Core or communication through XBAR
This data can be used as a Processing Parameter, or
A configuration pointer and/or execution trigger
Simple data transfers should not involve Scalar Core
Relaxes throughput constraints for Scalar Infrastructure
This exchange can be done via common register files or FIFOs on the top of the EUs
HW semaphores
Currently this feature is outside of the scope of the simulation framework but it can be implemented as a specific feature of the simulated application.
Slide 50
Slide content: Execution Model
Data Exchange and Data Dependent Execution
Example task: frequency offset estimation in one chain and frequency offset correction in another
Estimation task processes a vector and produces a resulting scalar value
Upon completion, the estimation task sends its result to the correction task
Correction task uses the scalar value in its processing
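The estimation/correction pair can be illustrated numerically. The Python sketch below assumes a simple lag-correlation estimator and a phase-rotation corrector; the slides do not specify the algorithms, so both functions and their names are illustrative choices. The point is the data flow: one task reduces a vector to a single scalar, and the other task consumes that scalar as a processing parameter.

```python
import cmath


def estimate_offset(x, lag=1):
    """Estimation task: vector in, one scalar out (cycles per sample).

    Assumes consecutive samples differ mainly by a constant phase rotation.
    """
    acc = sum(x[n + lag] * x[n].conjugate() for n in range(len(x) - lag))
    return cmath.phase(acc) / (2 * cmath.pi * lag)


def correct_offset(x, offset):
    """Correction task: removes the rotation using the scalar parameter."""
    return [s * cmath.exp(-2j * cmath.pi * offset * n) for n, s in enumerate(x)]
```

For a pure tone with an offset of 0.01 cycles per sample, the estimator recovers 0.01 and the corrected vector becomes constant.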
Slide 55
Slide content: vri_test
Verifies system integration and overall functionality
Vector transfers
Block configuration, status and events transfers
Verifies operation of VALID-READY interface and XBAR functionality
Verifies execution modes and command sequencing
Under the control of the Scalar Core
Under the control of the Vector Core
vrisrc: Streaming block which simulates VALID-READY source
Configurable random VALID or always VALID
Transmits random DATA with checksum at TAIL
vridst: Streaming block which simulates VALID-READY destination
Configurable random READY or always READY
Receives DATA and verifies the checksum
Assertions inside VALID-READY channel
Inspection of the VCD trace
Tests 01..07 run continuously in random order for 1 ms
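The source/destination self-check can be sketched as follows. This Python code is not the vrisrc/vridst implementation: the checksum algorithm (a 32-bit wrapping sum), the 16-bit data range and the function names are assumptions chosen for illustration.

```python
import random


def make_vector(length, rng):
    """Source side: random data followed by its checksum in the last position."""
    data = [rng.randrange(1 << 16) for _ in range(length)]
    return data + [sum(data) & 0xFFFFFFFF]   # checksum travels with TAIL


def check_vector(received):
    """Destination side: recompute the checksum and compare with the last item."""
    *data, checksum = received
    return (sum(data) & 0xFFFFFFFF) == checksum
```

Seeding the generator, for example with `random.Random(1)`, makes a failing run reproducible, which is useful when the VALID and READY patterns are randomized as well.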
Slide 58
Slide content: vri_test / test02
Verifies operation of the “always ready” destination
Scalar core configures vector core components
src1->dst2 with READY=1, src2->dst1
Scalar core initiates 2 concurrent vector transfers.
Wait for transfer completion by polling the events from XBAR and EUs
Slide 60
Slide content: vri_test / test04
Verifies automatic stepping through the EU and XBAR configuration slots
Scalar core configures vector core components to execute 3 configuration slots in succession.
1. src1->dst1, src2->dst2
2. src1->dst2, src2->dst1
3. src1->dst1, dst2. dst1 set as master, dst2 has READY=1
Scalar core initiates vector transfers which correspond to slot 1.
Scalar core waits for the completion of the transfers which correspond to slot 3 by polling the corresponding events
Slide 61
Slide content: vri_test / test05
Verifies deferred execution in XBAR
Scalar core configures vector core components in slot 1
src1->dst1
Scalar core initiates vector transfers which correspond to slot 1.
Without waiting for completion, Scalar core configures vector core components in slot 2
src1->dst2
Scalar core initiates vector transfers which correspond to slot 2. Vector core defers the execution of slot 2 until slot 1 is complete and src1 becomes available.
Scalar core waits for the completion of the transfers which correspond to slot 2.
Slide 62
Slide content: vri_test / test06
Verifies deferred execution in XBAR and EUs
Scalar core configures vector core components in slot 1
src1->dst1
src2->dst2
Scalar core initiates vector transfers which correspond to slot 1.
Without waiting for completion, Scalar core configures vector core components in slot 2
src1->dst2
src2->dst1
Scalar core initiates vector transfers which correspond to slot 2. Vector core defers the execution of slot 2 until slot 1 is complete and EUs become available.
Scalar core waits for the completion of the transfers which correspond to slot 2.
Slide 63
Slide content: vri_test / test07
Verifies execution priorities
Scalar core configures vector core components to execute configuration slots 1 and 2 in succession.
1. src1->dst1
2. src1->dst2
Scalar core initiates vector transfers which correspond to slot 1.
Without waiting for completion Scalar core configures vector core components in slot 3
src1->dst1
Scalar core initiates vector transfers which correspond to slot 3. Vector core defers the execution of slot 3 until slots 1 and 2 are complete and EUs become available.
The resulting succession of transfers:
src1->dst1
src1->dst2
src1->dst1
Slide 64
Slide content: dm2dm
Verifies operation of the DM RAM blocks
dm_ram_1rw – single-port RAM with non-simultaneous read and write operations at a single address
dm_ram_1r1w – simple dual-port RAM with one read and one write operation performed simultaneously at different locations
The test checks
Integration into the vector core structure
Operational modes of address generator
AG register and configuration
Test runs continuously with random vector sizes and VRI parameters for 1 ms
Slide 66
Slide content: dm2dm
Scalar core configures vector core components to execute configuration slots 1..3 in succession.
1. src1->dm1 (lower half), dst1
2. src1->dm1 (upper half), dst1; dm1 (lower half)->dm2, dst2
3. dm1 (upper half)->dst1; dm2->dm1 (lower half), dm2
Scalar core initiates vector transfers which correspond to slot 1.
Slide 68
Slide content: dm_init
Clean
make EXAMPLE=basic/dm_init clean
Build
make EXAMPLE=basic/dm_init all
Generate initialization .mat file
Execute ./examples/basic/dm/init/mat/dm_init.m
dm_init.mat should be created in the same directory
Run
./build/Release/out/simsimd
Inspect the result:
In gtkwave File->Open New Window->trace.vcd
Slide 69
Slide content: dm_init
4 regions of the dm1 block are initialized from the file before the simulation starts
Initialize with the data accepted by vridst
Scalar core configures vector core components to execute configuration slots 1..3 in succession.
dm1->dst1, Initialized region 1
dm1->dst1, Initialized region 2
dm1->dst1, Initialized region 3
dm1->dst1, Initialized region 4
Scalar core initiates vector transfers which correspond to slot 1.
Slide 70
Slide content: transp
Verifies operation of the Transparent EU blocks
Synchronous input-to-output transfer: via the register
Asynchronous input-to-output transfer: wires
The test checks
Integration into the vector core structure
Operation of the basic EU block
Asynchronous operation inside EU
Test runs continuously with random vector sizes and VRI parameters for 100 us
Slide 74
Slide content: Runtime Statistics
For the specific application, collect runtime usage data for the resources which were allocated in the preferences
Access to DMs and EUs. Are they actually used?
Which blocks need to support command sequencing under the control of the Vector Core
Range of the execution indexes
Internals of the DMs and EUs:
Configuration and status slots, bitwidth of the fields
Execution modes
Depth of the FIFO for the execution indexes
EU operational modes
DM AG modes
Events which were issued and processed
Slide 75
Slide content: Runtime Statistics
Logging of the XBAR
Switching matrix
For the particular application not all of the switching routes are used.
Matrices for data/valid and for ready can be different if data transfers to multiple destinations are used
Configuration slots, bitwidth of the fields
Execution modes
Events which were issued and processed
Slide 76
Slide content: RTL Code and Testbench Generator
Generate RTL code on the basis of the initial preferences and runtime statistics
Generate vector core and parameters for XBAR, DMs and EUs
Use VRI interfaces or modules
Data injection
Data pickup
VRI protocol assertions
Generate testbench
Verifies system integration of the vector core
Slide 83
Slide content: Books, Papers and Presentations
[1] D. Liu, A. Nilsson, E. Tell, D. Wu, J. Eilert, "Bridging dream and reality: Programmable baseband processors for software-defined radio," IEEE Communications Magazine 47 (9), 134-140.
[2] G. Dimitrakopoulos, A. Psarras, I. Seitanidis, "Microarchitecture of Network-on-Chip Routers: A Designer’s Perspective," Springer, 2015, 175 p.
Slide 85
Slide content: Patents (for reference only)
EP2751670B1: Digital signal processor
EP2751671B1: Digital signal processor and baseband communication device
US20060271764A1: Programmable digital signal processor including a clustered SIMD microarchitecture configured to execute complex vector instructions
US20060271765A1: Digital signal processor including a programmable network
US20070198815A1: Programmable digital signal processor having a clustered SIMD microarchitecture including a complex short multiplier and an independent vector load unit
US20140244970A1: Digital signal processor and baseband communication device
US20140281373A1: Digital signal processor and baseband communication device
US20140344549A1: Digital signal processor and baseband communication device
US20140351555A1: Digital signal processor and method for addressing a memory in a digital signal processor
US20140359252A1: Digital signal processor
US20140372728A1: Vector execution unit for digital signal processor
US7299342B2: Complex vector executing clustered SIMD micro-architecture DSP with accelerator coupled complex ALU paths each further including short multiplier/accumulator using two's complement
US7415595B2: Data processing without processor core intervention by chain of accelerators selectively coupled by programmable interconnect network and to memory
US8874968B1: Method and system for testing a processor designed by a configurator
US9557996B2: Digital signal processor and method for addressing a memory in a digital signal processor