#
99a48a76 |
| 21-Apr-2025 |
cz4e <[email protected]> |
timing(LoadQueueUncache): adjust s1 enq and s2 enq valid generate logic (#4603)
|
#
522c7f99 |
| 07-Mar-2025 |
Anzo <[email protected]> |
fix(LSU): misaligned violation detection stuck (#4369)
Since a load instruction that crosses a 16-byte boundary must be split and accessed twice, it enters the `RAR Queue` twice but occupies only one `virtual load queue` entry. In the extreme case, 36 load instructions that cross a 16-byte boundary can fill all 72 `RAR Queue` entries.
---
Our previous handling had a problem: if the oldest load instruction crossing a 16-byte boundary enters the `replayqueue` while the `loadmisalignbuffer` holds an instruction that cannot finish executing because the `RAR Queue` is full, then the oldest load instruction can never be issued, because the `loadmisalignbuffer` is always occupied.
---
Therefore, we use a more aggressive scheme. When the `RAR Queue` is full, the misaligned load generates a rollback, and the next load instruction that the `loadmisalignbuffer` accepts must be the oldest one (if it is misaligned).
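The occupancy arithmetic behind this fix can be illustrated with a small Python sketch. The queue size of 72 and the two-entries-per-split-load cost come from the message above; the function names are hypothetical and this is a model of the counting only, not of the RTL.

```python
RAR_QUEUE_SIZE = 72  # number of RAR Queue entries quoted in the commit message


def rar_entries_needed(num_loads: int, num_crossing: int) -> int:
    """Count RAR Queue entries consumed by in-flight loads.

    Every load takes one entry; a load crossing a 16-byte boundary is
    split and therefore takes one extra entry (two in total).
    """
    if not 0 <= num_crossing <= num_loads:
        raise ValueError("num_crossing must be between 0 and num_loads")
    return num_loads + num_crossing


def rar_is_full(num_loads: int, num_crossing: int) -> bool:
    """True when the loads occupy every RAR Queue entry."""
    return rar_entries_needed(num_loads, num_crossing) >= RAR_QUEUE_SIZE
```

With 36 loads that all cross a 16-byte boundary, `rar_is_full(36, 36)` holds, which is the extreme case the message describes.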
|
#
638f3d84 |
| 17-Feb-2025 |
Yanqin Li <[email protected]> |
fix(uncache): uncache load fails to replay (#4275)
Fixed the situation where the nc_with_data was not replayed correctly.
|
#
9e12e8ed |
| 08-Feb-2025 |
cz4e <[email protected]> |
style(Bundles): move bundles to Bundles.scala (#4247)
|
#
e836c770 |
| 16-Jan-2025 |
Zhaoyang You <[email protected]> |
feat(TopDown): add TopDown PMU Events (#4122)
This PR adds hardware-synthesizable, three-level categorized TopDown performance counters. Level-1: Retiring, Frontend Bound, Bad Speculation, Backend Bound. Level-2: Fetch Latency Bound, Fetch Bandwidth Bound, Branch Misprediction, Machine Clears, Core Bound, Memory Bound. Level-3: L1 Bound, L2 Bound, L3 Bound, Mem Bound, Store Bound.
|
#
0b4afd34 |
| 15-Jan-2025 |
cz4e <[email protected]> |
timing(LoadUnit): optimization load unit writeback data generate logic (#4167)
Optimize the load unit writeback data generation logic:
* merge multi-source data at `s2`, select and expand data at `s3`
* select data using one-hot instead of a shifter
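The difference between a shifter and a one-hot select can be sketched in Python as follows. This is only a behavioral illustration: the 128-bit input, 64-bit output, and all function names are assumptions, not taken from the LoadUnit source.

```python
LANES = 16                   # assumed: 16 byte lanes (128-bit data)
OUT_MASK = (1 << 64) - 1     # assumed: 64-bit writeback data


def select_by_shift(data: int, offset: int) -> int:
    """Shifter style: shift by a variable byte offset, then truncate."""
    return (data >> (8 * offset)) & OUT_MASK


def select_by_onehot(data: int, onehot: int) -> int:
    """One-hot style: every candidate is a fixed slice of the input, and the
    one-hot vector gates which candidate is OR-ed into the result."""
    result = 0
    for lane in range(LANES):
        candidate = (data >> (8 * lane)) & OUT_MASK  # fixed shift per lane
        if (onehot >> lane) & 1:
            result |= candidate
    return result


def to_onehot(offset: int) -> int:
    """Encode a byte offset as a one-hot vector."""
    return 1 << offset
```

Both functions return the same value for every offset; in hardware the one-hot form replaces a variable shifter with an AND-OR tree over fixed slices, which is usually the shorter path.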
|
#
9b12a106 |
| 25-Dec-2024 |
Anzo <[email protected]> |
area(LoadQueue): remove useless regs (#4062)
The additional release logic for vector loads in the `RAR/RAW Queue` appears to be unnecessary, and it makes the `RAR/RAW Queue` store redundant `regs` for `uopidx`.
|
#
b240e1c0 |
| 07-Nov-2024 |
Anzooooo <[email protected]> |
feat(Zicclsm): refactoring misalign and support vector misalign
|
#
e9e6cd09 |
| 27-Nov-2024 |
Yanqin Li <[email protected]> |
perf(uncache): mmio and nc share LQUncache; nc data can writeback to ldu1-2
|
#
46236761 |
| 22-Nov-2024 |
Yanqin Li <[email protected]> |
fix(uncache): fix a list of bugs of outstanding
* fix(uncache): delay flush until receiving uncache resp when state is not idle
* style(pbmt): remove some useless code and comments
* fix(uncache): not alloc and ready when existing same address entry
|
#
bb76fc1b |
| 10-Oct-2024 |
Yanqin Li <[email protected]> |
fix(NC): fix a list of bugs of NC WMO access
* fix(PBMT): skip nc difftest and handle the conflict of nc and normal store
* fix(PBMT): nc st req is changed to a state machine execution
* fix(pbmt): fix typo and control error of nc ld
* fix(pbmt): nc data assignment error
* fix(pbmt): nc should be used to wakeup
* fix(pbmt): remove wrong assert
* fix(pbmt): lots of bugs of nc st ld forward
* fix(pbmt): fix address align error
|
#
c7353d05 |
| 03-Sep-2024 |
Yanqin Li <[email protected]> |
feat(NCld): support WMO access for NC ld
* feat(LDU): add support for NC in LoadUnit
* feat(LQ,UB): add support for NC in load queue and uncache buffer
* chore(pbmt): add xsperf for nc ld statistic
|
#
46e9ee74 |
| 27-Sep-2024 |
Haoyuan Feng <[email protected]> |
fix(exception): fix exception vaddr generate logic (#3639)
In LSU, for exceptions that can be detected before address
translation (`preaf`, `prepf` or `pregpf`), the original vaddr should be
retained. For exceptions detected after address translation, the
48-bit vaddr needs to be zero-extended or sign-extended according to
the mode (`GenExceptionVa`) and then written to *tval.
Also fix some connection bugs.
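The zero-extension versus sign-extension of the 48-bit vaddr described above can be sketched in Python. This is only an illustration of the bit manipulation: the function name and the `sign_extend` flag are hypothetical, and the commit message does not say which modes select which extension.

```python
VADDR_BITS = 48  # virtual address width named in the commit message
XLEN = 64        # assumed width of the *tval registers


def extend_vaddr(vaddr: int, sign_extend: bool) -> int:
    """Extend a 48-bit virtual address to 64 bits.

    sign_extend=True replicates bit 47 into bits 63..48;
    sign_extend=False fills bits 63..48 with zeros.
    """
    low_mask = (1 << VADDR_BITS) - 1
    vaddr &= low_mask
    top_bit_set = (vaddr >> (VADDR_BITS - 1)) & 1
    if sign_extend and top_bit_set:
        high_ones = ((1 << XLEN) - 1) ^ low_mask  # bits 63..48 all set
        return vaddr | high_ones
    return vaddr
```

For example, `extend_vaddr(0x8000_0000_0000, True)` sets the upper 16 bits, while the same address with `sign_extend=False` is returned unchanged.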
|
#
87b463aa |
| 24-Sep-2024 |
Anzo <[email protected]> |
fix(exception): connect new address port for vector access exceptions (#3626)
The vector exception address comes from the VMergebuffer, which needs to
store the full 64-bit address and connect to the LSQ exception processing.
|
#
a53daa0f |
| 11-Sep-2024 |
Haoyuan Feng <[email protected]> |
fix(exception): Add guest page fault logic of misalign and vlsu (#3537)
In our previous design, we did not consider the handling of guest page
faults (gpf) for misaligned and vector loads and stores. This commit adds
a fix to correctly return the guest paddr when a gpf happens in these
instructions.
|
#
94998b06 |
| 04-Sep-2024 |
happy-lx <[email protected]> |
fix(Zicclsm, trigger): fix the problem of missing breakpoint exception (#3470)
+ @wissygh Refactored Trigger check code of Memblock.
+ Move Trigger address cmp from load S3 to S1. In addition, the
detection of trigger is moved from Memblock to LoadUnit.
- Once the breakpoint exception is detected, enter the exception Buffer
directly to handle the exception (previously, the
load instruction was executed first and then the exception was handled,
which would cause the mmio load to change the
status of the peripheral).
+ If Trigger address matches and the action is to enter debug mode, both
loadUnit and storeUnit will directly write this instruction back without
any execution (by setting this instruction as an exception).
+ Match trigger addresses for vector instructions in LoadUnit.
+ If both a misalign exception and a breakpoint occur, the breakpoint
exception will be processed first.
---------
Co-authored-by: chengguanghui <[email protected]>
|
#
08b0bc30 |
| 03-Sep-2024 |
happy-lx <[email protected]> |
timing(MemBlock): optimize MemBlock timing (#3467)
This PR optimizes the timing of MemBlock. Specific optimizations include
but are not limited to:
+ TLB use the redirect for the next cycle
+ Optimize VLSU feedback and redirect
+ Optimise ldCancel and writeback signal generation
+ Optimise TLB Query Vaddr/hlv/hlvx/valid etc
+ Delay MMIO Store writeback for 1 Cycle
+ Fix tlbNoQuery and pmp logic
+ Remove clock gating for s3_fast_rep
+ Remove wbq conflict check to LoadPipe/MainPipe
+ Remove Mux in dcache resp data
+ Optimise data generation logic of LoadUnit
+ Duplicate Register in LoadUnit for data writeback
+ Duplicate Register in loadPipe for missQueue enq
+ Add skid buffer in VLSU
+ Select data from metaArray at S1
+ Simplify the enqueuing logic of missQueue
+ Separately generate the ready logic of miss Queue
+ Relax the conditions valid for bankdataArray reads
+ Add Reg between Dcache Mainpipe with sms prefetcher
+ Optimise store exceptionBuffer pipeline
---------
Co-authored-by: weiding liu <[email protected]>
Co-authored-by: Charlie Liu <[email protected]>
Co-authored-by: good-circle <[email protected]>
|
#
b189aafa |
| 22-Aug-2024 |
zmx <[email protected]> |
zfhmin:add zfhmin extensions
* Decode unit adds decoding of Zfhmin extension instructions
* Re-instantiated the functional units for scalar fpcvt
|
#
41d8d239 |
| 21-Aug-2024 |
happy-lx <[email protected]> |
RVA23: Support Zicclsm & Zama16b (Handling Unaligned Load Store by Hardware) (#3320)
This PR supports handling load store unaligned exceptions by hardware
and provides CSR-controlled switches
---------
Co-authored-by: xiaofeibao <[email protected]>
|
#
3406b3af |
| 18-Jul-2024 |
weiding liu <[email protected]> |
LoadUnit: refactor writeback data select logic
|
#
16ede6bb |
| 11-Jul-2024 |
weiding liu <[email protected]> |
MemBlock: refactor selectOldest of rollback for better timing
Do not select the oldest rollback twice in LoadQueueRAW. Instead, LoadQueueRAW gets ports to send its rollback requests to MemBlock, which selects the oldest among them together with the other rollback sources.
|
#
47986d36 |
| 10-Jul-2024 |
Anzo <[email protected]> |
VLSU: fix bugs related to vector access exceptions (#3169)
Fix the vector unit-stride exception address calculation.
Fix the connection between the vector exception and the 'exceptionBuffer' in 'LoadQueue'.
At present, vector access exception handling still has to wait
for changes in the back-end. We will test after the back-end is
completed, and may also adapt the memory access side.
|
#
58cb1b0b |
| 06-Jun-2024 |
zhanglinjuan <[email protected]> |
CoupledL2, Uncache, LSQ: support non-data error handling (#3042)
According to CHI specification, a non-data error should be reported when
an error is detected that is not related to data corruption. Typically
this error is reported for:
* An attempt to access a location that does not exist.
* An illegal access, such as a write to a read only location.
* An attempt to use a transaction type that is not supported.
While the second kind of error can be resolved by PMA, the first and
the third kinds were not yet supported.
This commit implements non-data error handling path. MMIOBridge in
CoupledL2 transfers CHI `RespErr` field downwards into TileLink `denied`
field upwards. Uncache in DCache passes the error to LSQ to generate
access fault exception:
* For MMIO loads, UncacheBuffer writes back `exceptionVec` to LoadUnit
s0 and informs exception address to ExceptionBuffer at the same time.
* For MMIO stores, SQ writes back `exceptionVec` to Backend directly.
BTW, data error is still not supported.
|
#
25df626e |
| 04-May-2024 |
good-circle <[email protected]> |
Merge branch 'master' into vlsu-tmp-master
|
#
627be78b |
| 23-Apr-2024 |
good-circle <[email protected]> |
VLSU, lsq: support more than one vector pipeline
|