#
522c7f99 |
| 07-Mar-2025 |
Anzo <[email protected]> |
fix(LSU): misaligned violation detection stuck (#4369)
Since a load instruction that cross 16Byte needs to be split and accessed twice, it needs to enter the `RAR Queue` twice, but occupies only one
fix(LSU): misaligned violation detection stuck (#4369)
Since a load instruction that cross 16Byte needs to be split and accessed twice, it needs to enter the `RAR Queue` twice, but occupies only one `virtual load queue`, so in the extreme case it may happen that 36 load instructions that span 16Byte fill all 72 `RAR queues`.
---
There is some problem with our previous handling; if the oldest load instruction spanning 16Byte enters the `replayqueue` and at the same time there exists an instruction in the `loadmisalignbuffer` that can't finish executing because the `RAR Queue` is full, then the oldest load instruction is never cannot be issued because the `loadmisalignbuffer` has instructions in it all the time.
---
Therefore, we use a more violent scheme to do this. When the RAR is full, we let the misaligned load generate a rollback, and the next load instruction that the loadmisalignbuffer can receive must be the oldest (if it is misaligned).
show more ...
|
#
3c808de0 |
| 17-Feb-2025 |
Anzo <[email protected]> |
fix(LSU): fix cbo instr exceptions and implementation (#4262)
1. typo.
2. `cbo` instr not produce misaligned exception.
3. `cbo zero` instr need flush `sbuffer`.
4. `cbo zero` sets mask correctly
fix(LSU): fix cbo instr exceptions and implementation (#4262)
1. typo.
2. `cbo` instr not produce misaligned exception.
3. `cbo zero` instr need flush `sbuffer`.
4. `cbo zero` sets mask correctly
5. Adding RAW checks to `cbo zero`.
6. Adding trigger(Debug Mode) checks to `cbo zero`.
7. Fixed several issues with the CBO instruction in NEMU.
----
In order not to create ambiguity with `io.mmioStout`, a new port of
`StoreQueue` is introduced for writeback `cbo zero` after flush sbuffer.
arbitration is performed in `MemBlock`, and currently, `cbo zero` has
higher priority by default.
`cbo zero` should not be writteback at the same time as `mmio`.
---
A check on `CacheLine` has been added to `RAWQueue` to ensure memory
consistency when executing `cbo zero`.
See this issues:https://github.com/OpenXiangShan/XiangShan/issues/4240
for specific issues.
---
The `cbo` instruction requires a trigger check.
---------
Co-authored-by: zhanglinjuan <[email protected]>
show more ...
|
#
9e12e8ed |
| 08-Feb-2025 |
cz4e <[email protected]> |
style(Bundles): move bundles to Bundles.scala (#4247)
|
#
74050fc0 |
| 26-Jan-2025 |
Yanqin Li <[email protected]> |
perf(Uncache): add merge policy when entering (#4154)
# Background
## Problem
How to design a more efficient entry rule for a new load/store request when a load/store with the same address already
perf(Uncache): add merge policy when entering (#4154)
# Background
## Problem
How to design a more efficient entry rule for a new load/store request when a load/store with the same address already exists in the `ubuffer`?
* **Old Design**: Always **reject** the new request. * **New Desig**n: Consider **merging** requests.
## Merge Scenarios
‼️If the new one can be merge into the existing one, both need to be `NC`.
1. **New Store Request:** 1. **Existing Store:** Merge (the new store is younger). 2. **Existing Load:** Reject.
2. **New Load Request:** 1. **Existing Load:** Merge (the new load may be younger or older. Both are ok to merge). 2. **Existing Store:** Reject.
# What this PR do?
## 1. Entry Actions
1. **Allocate** a new entry and mark as `valid` 1. When there is no matching address. 2. **Allocate** a new entry and mark as `valid` and `waitSame`: 1. When there is a matching address, and: * The virtual addresses and attributes are the same. * The older entry is either selected to issue or issued. 3. **Merge** into an Existing Entry: 1. When there is a matching address, and: * The virtual addresses and attributes are the same. * The older entry is **not** selected to issue or issued. 4. **Reject** the New Request: 1. When the ubuffer is full. 2. When there is a matching address, but: * The virtual addresses or attributes are **different**.
**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must be continuous and naturally aligned, and the `addr` must correspond to the mask. Therefore, the "**same attributes**" here introduces a new condition: the merged `mask` must meet the requirements of being continuous and naturally aligned (function `continueAndAlign`). During merging, the block offset of addr must be synchronously updated in `UncacheEntry.update`.
## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)`
> `mid`: master id > > `sid`: slave id
**Old Design:**
- `M` sends a `req` with a **`mid`**. - `S` receives the `req`, records the **`mid`**. - `S` sends a `resp` with the **`mid`**. - `M` receives the `resp` and matches it with the recorded **`mid`**.
**New Design:**
- `M` sends a `req` with a **`mid`**. - `S` receives the `req` and responds with `{mid, sid}` . - `M` matches it with the **`mid`** and updates its record with the received **`sid`**. - `S` sends a `resp` with the its **`sid`**. - `M` receives the `resp` and matches it with the recorded **`sid`**.
**Benefit:** The new design allows `S` to merge requests when new request enters.
## 3. Forwarding Mechanism
**Old Design:** Each address in the `ubuffer` is **unique**, so forwarding is straightforward based on a match.
**New Design:**
* A single address may have up to two entries matched in the `ubuffer`. * If it has two matched enties, it must be true that one entry is marked `inflight` and the other entry is marked `waitSame`. In this case, the forwarded data comes from the merged data of two entries, with the `inflight` entry being the older one.
## 4. Bug Fixes
1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`, because when `tlbValid` is false, `!tlbMiss` can still be true. 2. `Uncache` state machine transition: The state indicating "**able to send requests**" (previously `s_refill_req`, now `s_inflight`) should not be triggered by `reqFire` but rather by `acquireFire`.
<img width="747" alt="image" src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef" />
# Evaluation
- ✅ timing - ✅ performance
| Type | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO | | -------------- | ------- | ----------- | ------- | ----------- | | IO | 51026 | 1 | 208149 | 1.00 | | NC | 42343 | 1.21 | 169248 | 1.23 | | NC+OT | 20379 | 2.50 | 160101 | 1.30 | | NC+OT+mergeOpt | 16308 | 3.13 | 126369 | 1.65 | | cache | 1298 | 39.31 | 4410 | 47.20 |
show more ...
|
#
e836c770 |
| 16-Jan-2025 |
Zhaoyang You <[email protected]> |
feat(TopDown): add TopDown PMU Events (#4122)
This PR adds hardware synthesizable three-level categorized TopDown performance counters. Level-1: Retiring, Frontend Bound, Bad Speculation, Backend Bo
feat(TopDown): add TopDown PMU Events (#4122)
This PR adds hardware synthesizable three-level categorized TopDown performance counters. Level-1: Retiring, Frontend Bound, Bad Speculation, Backend Bound. Level-2: Fetch Latency Bound, Fetch Bandwidth Bound, Branch Missprediction, machine clears, Core Bound, Memory Bound. Leval-3: L1 Bound, L2 Bound, L3 Bound, Mem Bound, Store Bound.
show more ...
|
#
be8e95bc |
| 25-Dec-2024 |
Anzo <[email protected]> |
fix(MemBlock): fix overflow during lsqptr calculation (#4084)
The addition used previously to calculate the `lsq` pointer results in overflow, this is because, the bit width of `numLsElem` is 5 and
fix(MemBlock): fix overflow during lsqptr calculation (#4084)
The addition used previously to calculate the `lsq` pointer results in overflow, this is because, the bit width of `numLsElem` is 5 and multiple uop accumulations result in data overflow.
---
Theoretically this would have been a problem in previous versions as well, but for some reason the bug didn't occur in previous versions until `newDispatch`.
show more ...
|
#
0a7d1d5c |
| 22-Nov-2024 |
xiaofeibao <[email protected]> |
feat(backend): NewDispatch
|
#
b240e1c0 |
| 07-Nov-2024 |
Anzooooo <[email protected]> |
feat(Zicclsm): refactoring misalign and support vector misalign
|
#
e9e6cd09 |
| 27-Nov-2024 |
Yanqin Li <[email protected]> |
perf(uncache): mmio and nc share LQUncache; nc data can writeback to ldu1-2
|
#
e04c5f64 |
| 19-Nov-2024 |
Yanqin Li <[email protected]> |
feat(outstanding): support nc outstanding and remove mmio st outstanding
|
#
bb76fc1b |
| 10-Oct-2024 |
Yanqin Li <[email protected]> |
fix(NC): fix a list of bugs of NC WMO access
* fix(PBMT): skip nc difftest and handle the conflict of nc and normal store
* fix(PBMT): nc st req is changed to a state machine execution
* fix(pbmt)
fix(NC): fix a list of bugs of NC WMO access
* fix(PBMT): skip nc difftest and handle the conflict of nc and normal store
* fix(PBMT): nc st req is changed to a state machine execution
* fix(pbmt): fix typo and control error of nc ld
* fix(pbmt): nc data assignment error
* fix(pbmt): nc should be used to wakeup
* fix(pbmt): remove wrong assert
* fix(pbmt): lots of bugs of nc st ld forward
* fix(pbmt): fix address align error
show more ...
|
#
c7353d05 |
| 03-Sep-2024 |
Yanqin Li <[email protected]> |
feat(NCld): support WMO access for NC ld
* feat(LDU): add support for NC in LoadUnit
* feat(LQ,UB): add support for NC in load queue and uncache buffer
* chore(pbmt): add xsperf for nc ld statistic
|
#
dc4fac13 |
| 02-Dec-2024 |
CharlieLiu <[email protected]> |
feat(DCache): merge CMO requests into DCache TL-A Channel (#3968)
* remove previous cmo datapath in memblock.
* add datapath for cmo requests between lsq and dcache.
* add new CMOUnit in MissQueue
feat(DCache): merge CMO requests into DCache TL-A Channel (#3968)
* remove previous cmo datapath in memblock.
* add datapath for cmo requests between lsq and dcache.
* add new CMOUnit in MissQueue.
* bump rocket-chip & coupledL2.
show more ...
|
#
189d8d00 |
| 29-Oct-2024 |
Anzo <[email protected]> |
refactor(MemBlock): turn on `dontTouch` only when debugging (#3792)
This will result in the delivery of clean generated code and may remove
some of the pseudo-paths.
|
#
cee1d5b2 |
| 15-Oct-2024 |
Yanqin Li <[email protected]> |
fix(lsq): uncache req can be assigned only in idle state (#3732)
**Bug Description:**
When an uncache store (st) is immediately followed by an uncache load
(ld), due to the `AddPipelineReg` in M
fix(lsq): uncache req can be assigned only in idle state (#3732)
**Bug Description:**
When an uncache store (st) is immediately followed by an uncache load
(ld), due to the `AddPipelineReg` in MemBlock when the LSQ transfers
data with the Uncache, even though Uncache is handling the store
request, `MemBlock.uncacheReq.ready` is still true. Under the original
assignment conditions, the ld request(ld req) from LQ will be received
by `MemBlock.uncacheReq` in the `s_store` state. So when
`MemBlock.uncacheReq` is received by Uncache, the LSQ state has already
transitioned from `s_store` to `s_idle`, without switching to `s_load`.
As a result, the load response (ld resp) from Uncache can never be
received by the LSQ. The process is briefly described as follows:
1. SQ: st req
2. Uncache: st req received
3. LQ: ld req in `s_store` state
4. Uncache: st resp
5. SQ: st resp received; Uncache: ld req received
6. LSQ: state to `s_idle`
7. Uncache: ld resp
8. **ERROR**: LSQ can not receive ld resp in `s_idle` state
**Fix**:In LSQ, uncache req can be assigned only in idle state.
<img width="1179" alt="image"
src="https://github.com/user-attachments/assets/1d2d417d-06d6-43bf-a876-5cc53d0ff9ed">
show more ...
|
#
46e9ee74 |
| 27-Sep-2024 |
Haoyuan Feng <[email protected]> |
fix(exception): fix exception vaddr generate logic (#3639)
In LSU, for exceptions that can be detected before address
translation(`preaf`, `prepf` or `pregpf`), the original vaddr should be
retain
fix(exception): fix exception vaddr generate logic (#3639)
In LSU, for exceptions that can be detected before address
translation(`preaf`, `prepf` or `pregpf`), the original vaddr should be
retained. And for exceptions detected after address translation, the
48-bit vaddr needs to be zero-extended or sign-extended according to
different modes(`GenExceptionVa`), and then write to *tval.
Also fix some connection bugs.
show more ...
|
#
ad415ae0 |
| 21-Sep-2024 |
Xiaokun-Pei <[email protected]> |
feat(trap): support m/htinst for specific G-stage translation (#3604)
According to RISC-V priv spec, mtinst/htinst could be always written
zero on trap into M/HS-mode, except for Guest-Page-Fault t
feat(trap): support m/htinst for specific G-stage translation (#3604)
According to RISC-V priv spec, mtinst/htinst could be always written
zero on trap into M/HS-mode, except for Guest-Page-Fault traps that meet
both of the following conditions:
- the trap is caused by a G-stage translation which supports VS-stage
translation
- a nonzero value is written to mtval2/htval
"isForVSnonLeafPTE" is used only in exceptional circumstances that gpf
happens in the G-stage translation which supports VS-stage translation,
such as searching the non-leaf pte of VS-stage.
This patch adds support for writing proper value to mtinst/htinst when
specific trap occurs. And bump the nemu.
show more ...
|
#
db6cfb5a |
| 19-Sep-2024 |
Haoyuan Feng <[email protected]> |
fix(exception): check high address bits of lsu (#3596)
In previous implementation, we simply truncated the higher bits of jump
target or load & store address, which made it impossible to raise
exc
fix(exception): check high address bits of lsu (#3596)
In previous implementation, we simply truncated the higher bits of jump
target or load & store address, which made it impossible to raise
exceptions in such cases.
Commit
https://github.com/OpenXiangShan/XiangShan/commit/c1b28b66879239a5b3a44741376f3b002e8ac834
has already fixed high address bits checking of jump target. This commit
fixes lsu part, checking full address in tlb and passing full address
directly to csr.
show more ...
|
#
b4d41c12 |
| 10-Sep-2024 |
xiaofeibao <[email protected]> |
timing(LsqEnqCtrl): fix timing of lqAllocNumber and sqAllocNumber
|
#
94998b06 |
| 04-Sep-2024 |
happy-lx <[email protected]> |
fix(Zicclsm, trigger): fix the problem of missing breakpoint exception (#3470)
+ @wissygh Refactored Trigger check code of Memblock.
+ Move Trigger address cmp from load S3 to S1. In addition, the
fix(Zicclsm, trigger): fix the problem of missing breakpoint exception (#3470)
+ @wissygh Refactored Trigger check code of Memblock.
+ Move Trigger address cmp from load S3 to S1. In addition, the
detection of trigger is moved from Memblock to LoadUnit.
- Once the breakpoint exception is detected, enter the exception Buffer
directly to handle the exception (previously, the
load instruction was executed first and then the exception was handled,
which would cause the mmio load to change the
status of the peripheral).
+ If Trigger address matches and the action is to enter debug mode, both
loadUnit and storeUnit will directly write this instruction back without
any execution (by setting this instruction as an exception).
+ Match trigger addresses for vector instructions in LoadUnit.
+ If both a misalign exception and a breakpoint occur, the breakpoint
exception will be processed first.
---------
Co-authored-by: chengguanghui <[email protected]>
show more ...
|
#
e3ed843c |
| 30-Aug-2024 |
happy-lx <[email protected]> |
Remove `RVA23` prefix and enable CMO by default (#3431)
+ Remove `RVA23` prefix to clean up code
+ set `hasCMO` to true by default
|
#
3fbc86fc |
| 26-Aug-2024 |
Chen Xi <[email protected]> |
RVA23 CMO (Cache Maintenance Operation) (#3426)
Supports Zicbom Extension (Clean/Flush/Invalid)
- https://github.com/OpenXiangShan/CoupledL2/pull/225
This PR also includes other CPL2 changes:
-
RVA23 CMO (Cache Maintenance Operation) (#3426)
Supports Zicbom Extension (Clean/Flush/Invalid)
- https://github.com/OpenXiangShan/CoupledL2/pull/225
This PR also includes other CPL2 changes:
- bug fixes
- timing fixes
- SRAM-Queue | https://github.com/OpenXiangShan/CoupledL2/pull/228
- data SRAM splitted into 4 |
https://github.com/OpenXiangShan/CoupledL2/pull/229
---------
Co-authored-by: lixin <[email protected]>
show more ...
|
#
41d8d239 |
| 21-Aug-2024 |
happy-lx <[email protected]> |
RVA23: Support Zicclsm & Zama16b (Handling Unaligned Load Store by Hardware) (#3320)
This PR supports handling load store unaligned exceptions by hardware
and provides CSR-controlled switches
--
RVA23: Support Zicclsm & Zama16b (Handling Unaligned Load Store by Hardware) (#3320)
This PR supports handling load store unaligned exceptions by hardware
and provides CSR-controlled switches
---------
Co-authored-by: xiaofeibao <[email protected]>
show more ...
|
#
5003e6f8 |
| 23-Jul-2024 |
Huijin Li <[email protected]> |
LSQ: optimize static clock gating coverage and fix x_value in vcs (#3176)
optimize LSQ static clock gating coverage, fix x_value in vcs
|
#
16ede6bb |
| 11-Jul-2024 |
weiding liu <[email protected]> |
MemBlock: refactor selectOldest of rollback for better timing
Don't select oldest rollback twice in LoadQueueRAW, send to memblock select oldest with other, will have port to send rollback request
MemBlock: refactor selectOldest of rollback for better timing
Don't select oldest rollback twice in LoadQueueRAW, send to memblock select oldest with other, will have port to send rollback request to memblock in LoadQueueRAW.
show more ...
|