#
efee2982 |
| 18-Apr-2025 |
Huijin Li <[email protected]> |
fix(LoadUnit): fix ldld && stld query revoke logic (#4580)
The prior design reassigns `io.lsq.ldin.bits.rep_info.need_rep` to 0 when the source is the MisalignBuffer, which prevents rar/raw enqueue requests from being cancelled when a misaligned instruction is reissued.
Thus, when the source is the MisalignBuffer, we must use `io.misalign_ldout.bits.rep_info.need_rep` to decide whether to revoke rar/raw enqueue requests.
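The fix can be read as a choice of which `need_rep` signal drives the revoke. The following is a minimal behavioral sketch in Python, not the Chisel source. The function name and its boolean arguments are hypothetical stand-ins for the signals named above, and it assumes a revoke is wanted exactly when the load needs a replay.

```python
def should_revoke(from_misalign_buffer: bool,
                  lsq_need_rep: bool,
                  misalign_need_rep: bool) -> bool:
    """Decide whether to revoke the rar/raw enqueue request.

    lsq_need_rep      models io.lsq.ldin.bits.rep_info.need_rep
    misalign_need_rep models io.misalign_ldout.bits.rep_info.need_rep
    """
    if from_misalign_buffer:
        # The lsq-side flag is forced to 0 for this source, so it carries
        # no information; use the misalign-side flag instead.
        return misalign_need_rep
    return lsq_need_rep
```

With the old behavior, a reissued misaligned load would always see `lsq_need_rep == False` and so would never revoke its enqueue request.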
|
#
35bb7796 |
| 14-Apr-2025 |
Anzo <[email protected]> |
fix(LSU): fix exception for misalign access to `nc` space (#4526)
For a misaligned access, if an access produced by the split targets `nc` space, a misaligned exception should also be generated.
Co-authored-by: Yanqin Li <[email protected]>
|
#
4ec1f462 |
| 09-Apr-2025 |
cz4e <[email protected]> |
timing(StoreMisalignBuffer): fix misalign buffer enq timing (#4493)
* A misaligned store enqueues into the misalign buffer at s1 and is revoked at s2 if necessary.
|
#
1592abd1 |
| 08-Apr-2025 |
Yan Xu <[email protected]> |
feat: support inst lifetime trace (#4007)
PerfCCT (performance counter commit trace) is an instruction-granularity performance counter, similar to the one in GEM5.
How to use this:
1. Build with the `WITH_CHISELDB=1` argument.
2. Run with `--dump-db --dump-select-db lifetime` to get the database.
3. To visualize instruction lifetimes, run `python3 scripts/perfcct.py "the-db-file-path" -p 1 -v | less`.
4. The analysis script currently lives in the XS-GEM5 repo; see https://github.com/OpenXiangShan/GEM5/blob/xs-dev/util/ClockAnalysis.py
How it works:
1. Allocate a unique tag, `seqNum` (as in GEM5), for each instruction at the fetch stage.
2. Pass the `seqNum` along each pipeline.
3. Record perf data through the DPIC interface.
|
#
83e17083 |
| 01-Apr-2025 |
Anzo <[email protected]> |
fix(LoadUnit): not enter misalignbuffer on exception (#4477)
|
#
0b8a9d16 |
| 28-Mar-2025 |
Yanqin Li <[email protected]> |
fix(LDU): only selected can be used in address mux (#4466)
|
#
dac94c49 |
| 20-Mar-2025 |
Anzo <[email protected]> |
fix(LoadUnit): uncache should not be generated when page fault (#4442)
As the comment says, even if a `PF` is generated, an address is still generated for `PMP/PMA` checking, which can lead to some strange responses. Because the previous modification (https://github.com/OpenXiangShan/XiangShan/pull/4426) removed `s2_exception`, `s2_uncache` was generated incorrectly.
This is now expressed with clearer semantics: `s2_actually_uncache` means the real physical address lies in uncache space. `s2_uncache` has been retained to distinguish whether the request comes from prefetching, which may be left to **YQ senior sister** in a subsequent change.
I synchronized the changes to StoreUnit in this PR (https://github.com/OpenXiangShan/XiangShan/pull/4441).
|
#
bbed9f8d |
| 17-Mar-2025 |
Anzo <[email protected]> |
fix(LoadUnit): fix misalign exception and clearer uncache semantics (#4426)
A loadAddrMisaligned exception is generated when a misaligned access targets uncache space.
---
A misaligned load sets a loadAddrMisaligned exception flag at s0 to ensure that it only enters the loadmisalignbuffer and has no other side effects. This, however, prevents `s2_uncache` from being generated properly. Previously we used an additional `s2_un_misalign_exception` to flag this. Now, after examining the semantics of `s2_uncache`, they can be represented appropriately by directly removing the exception-related signals.
|
#
522c7f99 |
| 07-Mar-2025 |
Anzo <[email protected]> |
fix(LSU): misaligned violation detection stuck (#4369)
Since a load instruction that crosses a 16-byte boundary needs to be split and accessed twice, it needs to enter the `RAR Queue` twice but occupies only one `virtual load queue` entry. In the extreme case, 36 load instructions that each span a 16-byte boundary can fill all 72 `RAR Queue` entries.
---
Our previous handling had a problem: if the oldest load instruction spanning a 16-byte boundary enters the `replayqueue` while an instruction in the `loadmisalignbuffer` cannot finish executing because the `RAR Queue` is full, then the oldest load instruction can never be issued, because the `loadmisalignbuffer` is always occupied.
---
Therefore, we use a more aggressive scheme. When the RAR is full, we let the misaligned load generate a rollback, and the next load instruction that the loadmisalignbuffer accepts must be the oldest one (if it is misaligned).
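The occupancy arithmetic behind the extreme case can be checked with a short Python sketch. The constant 72 comes from the message above; the helper names and the byte-granular address model are assumptions made for illustration only.

```python
RAR_QUEUE_ENTRIES = 72      # figure quoted in the commit message
ENTRIES_PER_SPLIT_LOAD = 2  # one RAR entry per half of a split load


def crosses_16_bytes(addr: int, size: int) -> bool:
    """True if the access [addr, addr + size) spans a 16-byte boundary."""
    return addr // 16 != (addr + size - 1) // 16


def split_loads_to_fill_rar(entries: int = RAR_QUEUE_ENTRIES) -> int:
    """Number of split loads needed to occupy every RAR entry."""
    return entries // ENTRIES_PER_SPLIT_LOAD
```

For example, an 8-byte load at `0x1C` crosses a boundary while one at `0x18` does not, and `split_loads_to_fill_rar()` gives the 36 loads mentioned above.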
|
#
90f8d3cf |
| 06-Mar-2025 |
cz4e <[email protected]> |
fix(LoadUnit): exclude prefetch requests (#4367)
* To meet timing, the RAR enqueue conditions have to be compromised; the worst timing paths come from `pmp` and `missQueue`.
* If `LoadQueueRARSize` == `VirtualLoadQueueSize`, only prefetches need to be excluded.
* If `LoadQueueRARSize` < `VirtualLoadQueueSize`, `s2_can_query` also needs to be considered.
|
#
25381b72 |
| 05-Mar-2025 |
Anzo <[email protected]> |
fix(LoadUnit): misalign wakeup should not set s0 valid (#4359)
`s0_src_valid_vec` is not `s0_src_select_vec`: a bit of `s0_src_valid_vec` is set whenever the corresponding input is `valid`. Therefore, `misalign wakeup` needs to control `s0_valid` globally.
|
#
7ea48366 |
| 03-Mar-2025 |
Anzo <[email protected]> |
fix(LoadUnit): misalign load wakeup not enter loadunit (#4333)
|
#
0d55e1db |
| 28-Feb-2025 |
cz4e <[email protected]> |
timing(LoadQueueRAR, LoadUnit): adjust rar/raw query logic (#4297)
* Because `LoadQueueRARSize == VirtualLoadQueueSize`, no additional logic is needed for rar enqueue.
* When fast replay is not needed, the load unit allocates a raw entry.
|
#
66e9b546 |
| 27-Feb-2025 |
Yanqin Li <[email protected]> |
fix(LDU): nc is also not mis-aligned (#4326)
|
#
99ce5576 |
| 20-Feb-2025 |
cz4e <[email protected]> |
style(Bundles): rewrite bundles with new style (#4274)
|
#
48f7f553 |
| 20-Feb-2025 |
Yanqin Li <[email protected]> |
fix(LDU): only tlb hit can use tlb resp (#4293)
|
#
5a36f63d |
| 20-Feb-2025 |
Anzo <[email protected]> |
fix(LoadUnit): corrupt should be triggered on valid mshr (#4292)
|
#
638f3d84 |
| 17-Feb-2025 |
Yanqin Li <[email protected]> |
fix(uncache): uncache load fails to replay (#4275)
Fixed the situation where the nc_with_data was not replayed correctly.
|
#
ccde5272 |
| 16-Feb-2025 |
cz4e <[email protected]> |
fix(LoadUnit): fix misalign load wrong wakeup (#4263)
When `io.dcache.req.ready` is false, the misaligned load stalls, but `wakeup` still fires normally and is not canceled in `s3`, which causes the backend to get wrong data.
|
#
9e12e8ed |
| 08-Feb-2025 |
cz4e <[email protected]> |
style(Bundles): move bundles to Bundles.scala (#4247)
|
#
faeef328 |
| 27-Jan-2025 |
Anzo <[email protected]> |
fix(LoadUnit): `dcache_kill` if `prf_wr` has no permissions (#4226)
`prefetch.w` sends a write request to `TLB/PMA/PMP`. As a result, `PMA/PMP` returns a permission check result (`io.pmp.st`) for the write request.
---
Previously, we only handled the case where `prefetch.r` lacked read permission, not the case where `prefetch.w` lacked write permission. **So, when `prefetch.w` targets an address without write permission, the request is still sent to `Dcache`, which generates an error.**
**This PR fixes that: when `PMA/PMP` returns `io.pmp.st`, we generate `dcache.s2_kill`.**
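The kill condition described above can be sketched as a small Python predicate. This is a behavioral illustration only: `pmp_st` models `io.pmp.st` from the message, while `pmp_ld`, the function name, and the two prefetch flags are assumed names that do not appear in the original.

```python
def prefetch_dcache_kill(is_prefetch_read: bool,
                         is_prefetch_write: bool,
                         pmp_ld: bool,
                         pmp_st: bool) -> bool:
    """Return True when the prefetch must not reach the dcache.

    pmp_ld: read-permission failure reported by PMA/PMP (assumed name)
    pmp_st: write-permission failure reported by PMA/PMP (io.pmp.st)
    """
    read_denied = is_prefetch_read and pmp_ld
    # Case added by this fix: prefetch.w without write permission.
    write_denied = is_prefetch_write and pmp_st
    return read_denied or write_denied
```

Before the fix, only the `read_denied` term was effectively considered, so a `prefetch.w` to a non-writable address was not killed.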
|
#
74050fc0 |
| 26-Jan-2025 |
Yanqin Li <[email protected]> |
perf(Uncache): add merge policy when entering (#4154)
# Background
## Problem
How to design a more efficient entry rule for a new load/store request when a load/store with the same address already exists in the `ubuffer`?
* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.
## Merge Scenarios
‼️ A new request can be merged into an existing one only if both are `NC`.
1. **New Store Request:**
   1. **Existing Store:** Merge (the new store is younger).
   2. **Existing Load:** Reject.
2. **New Load Request:**
   1. **Existing Load:** Merge (the new load may be younger or older; both are fine to merge).
   2. **Existing Store:** Reject.
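The four scenarios reduce to "same kind and both `NC`". A minimal Python sketch of that rule follows; the `Kind` enum and the function name are hypothetical and used only for illustration.

```python
from enum import Enum


class Kind(Enum):
    LOAD = "load"
    STORE = "store"


def can_merge(new_kind: Kind, existing_kind: Kind,
              new_is_nc: bool, existing_is_nc: bool) -> bool:
    """Apply the merge scenarios listed above to one pair of requests."""
    if not (new_is_nc and existing_is_nc):
        return False  # both requests must be NC
    # store+store and load+load merge; mixed pairs are rejected
    return new_kind == existing_kind
```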
# What does this PR do?
## 1. Entry Actions
1. **Allocate** a new entry and mark it as `valid`:
   1. When there is no matching address.
2. **Allocate** a new entry and mark it as `valid` and `waitSame`:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is either selected to issue or issued.
3. **Merge** into an existing entry:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is **not** selected to issue or issued.
4. **Reject** the new request:
   1. When the ubuffer is full.
   2. When there is a matching address, but:
      * The virtual addresses or attributes are **different**.
**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must be continuous and naturally aligned, and the `addr` must correspond to the mask. Therefore, the "**same attributes**" here introduces a new condition: the merged `mask` must meet the requirements of being continuous and naturally aligned (function `continueAndAlign`). During merging, the block offset of addr must be synchronously updated in `UncacheEntry.update`.
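The entry rules and the mask requirement can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the Chisel implementation: `continue_and_align` is only a guess at what `continueAndAlign` checks (a contiguous run of bytes whose length is a power of two and whose offset is a multiple of that length, for an 8-byte mask), and `entry_action` checks "ubuffer full" first, an ordering the text does not specify.

```python
def continue_and_align(mask: int, width: int = 8) -> bool:
    """True if the byte mask is contiguous and naturally aligned."""
    if mask == 0 or mask >> width:
        return False
    low = (mask & -mask).bit_length() - 1   # index of the lowest set bit
    run = mask >> low
    length = run.bit_length()               # run length if contiguous
    contiguous = run == (1 << length) - 1
    power_of_two = length & (length - 1) == 0
    aligned = low % length == 0
    return contiguous and power_of_two and aligned


def entry_action(buffer_full: bool, addr_match: bool,
                 same_vaddr_and_attr: bool,
                 older_selected_or_issued: bool) -> str:
    """Pick one of the four entry actions for a new request."""
    if buffer_full:
        return "reject"
    if not addr_match:
        return "allocate"              # marked valid
    if not same_vaddr_and_attr:
        return "reject"
    if older_selected_or_issued:
        return "allocate_wait_same"    # marked valid and waitSame
    return "merge"
```

Under this model, merging masks `0b00000011` and `0b00001100` gives `0b00001111`, which passes the check, whereas merging `0b00000010` and `0b00000100` gives `0b00000110`, which does not.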
## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)`
> `mid`: master id
>
> `sid`: slave id
**Old Design:**
- `M` sends a `req` with a **`mid`**.
- `S` receives the `req`, records the **`mid`**.
- `S` sends a `resp` with the **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.
**New Design:**
- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}`.
- `M` matches it with the **`mid`** and updates its record with the received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.
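The new handshake can be illustrated with a toy Python model. The class and method names are hypothetical, and the way `S` reuses a `sid` for a request to an already pending address is an assumption made to show why matching on `sid` permits merging.

```python
class Slave:
    """Toy model of Uncache (S): hands out one sid per pending address."""

    def __init__(self):
        self.sid_by_addr = {}
        self.next_sid = 0

    def on_req(self, mid, addr):
        # A request to an already pending address shares its sid (merge).
        if addr not in self.sid_by_addr:
            self.sid_by_addr[addr] = self.next_sid
            self.next_sid += 1
        return mid, self.sid_by_addr[addr]


class Master:
    """Toy model of LoadQueueUncache (M): tracks the sid for each mid."""

    def __init__(self):
        self.sid_of = {}  # mid -> sid, or None until S has answered

    def send_req(self, mid):
        self.sid_of[mid] = None
        return mid

    def on_ack(self, mid, sid):
        self.sid_of[mid] = sid  # match on mid, record the sid

    def on_resp(self, sid):
        # Match on sid; every merged request completes together.
        done = sorted(m for m, s in self.sid_of.items() if s == sid)
        for m in done:
            del self.sid_of[m]
        return done
```

In this model two requests to the same address receive the same `sid`, so a single `resp` from `S` completes both, which the old `mid`-only matching could not express.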
**Benefit:** The new design allows `S` to merge requests when a new request enters.
## 3. Forwarding Mechanism
**Old Design:** Each address in the `ubuffer` is **unique**, so forwarding is straightforward based on a match.
**New Design:**
* A single address may have up to two matching entries in the `ubuffer`.
* If it has two matching entries, one entry must be marked `inflight` and the other `waitSame`. In this case, the forwarded data comes from the merged data of the two entries, with the `inflight` entry being the older one.
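A byte-level sketch of the two-entry forwarding case, in Python. It assumes that where both masks cover a byte, the younger (`waitSame`) entry wins; the text only states that the `inflight` entry is the older one, so that priority rule, the function name, and the `(mask, bytes)` representation are all assumptions.

```python
def merge_forward(older, younger, width=8):
    """Merge forwarding data from an inflight (older) and a waitSame
    (younger) entry. Each entry is a (mask, data_bytes) pair."""
    o_mask, o_data = older
    y_mask, y_data = younger
    merged = []
    for i in range(width):
        if (y_mask >> i) & 1:
            merged.append(y_data[i])   # younger entry takes priority
        elif (o_mask >> i) & 1:
            merged.append(o_data[i])
        else:
            merged.append(0)           # byte not covered by either entry
    return o_mask | y_mask, merged
```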
## 4. Bug Fixes
1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`, because when `tlbValid` is false, `!tlbMiss` can still be true.
2. `Uncache` state machine transition: The state indicating "**able to send requests**" (previously `s_refill_req`, now `s_inflight`) should not be triggered by `reqFire` but rather by `acquireFire`.
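The first fix is a one-line boolean correction. In Python terms, with function names chosen here for illustration:

```python
def tlb_hit_buggy(tlb_valid: bool, tlb_miss: bool) -> bool:
    # Old reasoning: "not a miss" was treated as a hit.
    return not tlb_miss


def tlb_hit(tlb_valid: bool, tlb_miss: bool) -> bool:
    # A hit additionally requires the TLB response to be valid.
    return tlb_valid and not tlb_miss
```

The two differ exactly when `tlb_valid` is false and `tlb_miss` is false: the buggy form reports a hit and the corrected form does not.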
<img width="747" alt="image" src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef" />
# Evaluation
- ✅ timing
- ✅ performance

| Type           | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO             | 51026   | 1           | 208149  | 1.00        |
| NC             | 42343   | 1.21        | 169248  | 1.23        |
| NC+OT          | 20379   | 2.50        | 160101  | 1.30        |
| NC+OT+mergeOpt | 16308   | 3.13        | 126369  | 1.65        |
| cache          | 1298    | 39.31       | 4410    | 47.20       |
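The speedup columns appear to be the `IO` measurement divided by each row's measurement, rounded to two decimals. The short Python check below reproduces the table under that assumption; the dictionary layout and the function name are illustrative only.

```python
# (4B*1000, 1B*4096) measurements copied from the table
measurements = {
    "IO": (51026, 208149),
    "NC": (42343, 169248),
    "NC+OT": (20379, 160101),
    "NC+OT+mergeOpt": (16308, 126369),
    "cache": (1298, 4410),
}


def speedups(rows, baseline="IO"):
    """Speedup of each row relative to the baseline row, per column."""
    base = rows[baseline]
    return {
        name: tuple(round(b / v, 2) for b, v in zip(base, values))
        for name, values in rows.items()
    }
```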
|
#
fa5e530d |
| 21-Jan-2025 |
cz4e <[email protected]> |
timing(VSegmentUnit): duplicate latchVAddr (#4209)
* `latchVAddr` needs to index all dcache data SRAMs from top to bottom, which causes a large fanout, so `latchVAddr` is duplicated.
|
#
0b4afd34 |
| 15-Jan-2025 |
cz4e <[email protected]> |
timing(LoadUnit): optimization load unit writeback data generate logic (#4167)
Optimize the load unit's writeback data generation logic:
* Merge multi-source data at `s2`; select and expand data at `s3`.
* Select data with a one-hot select instead of a shifter.
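The second bullet replaces a variable shift with a one-hot select. The Python sketch below shows that the two produce the same result when the one-hot vector encodes the byte offset. The 16-byte data width and both function names are assumptions for illustration; in hardware the one-hot form is an AND-OR of fixed-offset slices rather than a barrel shifter.

```python
def select_by_shift(data: int, offset: int) -> int:
    """Shifter form: shift the data right by a variable byte offset."""
    return data >> (8 * offset)


def select_by_one_hot(data: int, one_hot: int, nbytes: int = 16) -> int:
    """One-hot form: OR together fixed-offset slices gated by one_hot."""
    result = 0
    for i in range(nbytes):
        if (one_hot >> i) & 1:
            result |= data >> (8 * i)   # slice at constant offset i
    return result
```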
|
#
37f33e11 |
| 13-Jan-2025 |
cz4e <[email protected]> |
timing(LoadUnit): fpWen and pdest reg out (#4144)
When the load unit writes back:
* **fpWen** is output directly from a register.
* **pdest** is output directly from a register.
|