LoadUnit.scala - OpenGrok history log for /XiangShan/src/main/scala/xiangshan/mem/pipeline/LoadUnit.scala

Revision	Date	Author	Comments
# efee2982	18-Apr-2025	Huijin Li <[email protected]>	fix(LoadUnit): fix ldld && stld query revoke logic (#4580) The prior design reassigns `io.lsq.ldin.bits.rep_info.need_rep` to 0 when the source comes from MisalignBuffer, preventing cancellation of rar/raw enqueue requests during misaligned instruction reissuance. Thus, we must use `io.misalign_ldout.bits.rep_info.need_rep` to determine whether to revoke rar/raw enqueue requests when the source is from MisalignBuffer.
# 35bb7796	14-Apr-2025	Anzo <[email protected]>	fix(LSU): fix exception for misalign access to `nc` space (#4526) For misaligned accesses, if an access after the split goes to `nc` space, a misaligned exception should also be generated. Co-authored-by: Yanqin Li <[email protected]>
# 4ec1f462	09-Apr-2025	cz4e <[email protected]>	timing(StoreMisalignBuffer): fix misalign buffer enq timing (#4493) * A misaligned store enqueues into the misalign buffer at s1, and the enqueue is revoked at s2 if needed.
# 1592abd1	08-Apr-2025	Yan Xu <[email protected]>	feat: support inst lifetime trace (#4007) PerfCCT (performance counter commit trace) is an instruction-level granularity perfCounter like GEM5. How to use this: 1. Make with the "WITH_CHISELDB=1" argument. 2. Run with "--dump-db --dump-select-db lifetime", then get the database. 3. To visualize instruction lifetimes, run `python3 scripts/perfcct.py "the-db-file-path" -p 1 -v | less`. 4. The analysis script is currently in the XS-GEM5 repo, see https://github.com/OpenXiangShan/GEM5/blob/xs-dev/util/ClockAnalysis.py How it works: 1. Allocate one unique tag "seqNum", like GEM5, for each instruction at the fetch stage. 2. Pass the "seqNum" through each pipeline. 3. Record perf data through the DPIC interface.
# 83e17083	01-Apr-2025	Anzo <[email protected]>	fix(LoadUnit): not enter misalignbuffer on exception (#4477)
# 0b8a9d16	28-Mar-2025	Yanqin Li <[email protected]>	fix(LDU): only selected can be used in address mux (#4466)
# dac94c49	20-Mar-2025	Anzo <[email protected]>	fix(LoadUnit): uncache should not be generated when page fault (#4442) As the comment says, even if a `PF` is generated, an address is still generated for `PMP/PMA` checking, which can lead to some strange responses. Since the previous modification (https://github.com/OpenXiangShan/XiangShan/pull/4426) removed `s2_exception`, `s2_uncache` was generated incorrectly. This is now represented using clearer semantics: `s2_actually_uncache` means the real physical address is in uncache space. `s2_uncache` has been retained to distinguish whether the request comes from prefetching, which may be handled in a subsequent change by YQ. I synchronised the changes to StoreUnit in this PR (https://github.com/OpenXiangShan/XiangShan/pull/4441).
# bbed9f8d	17-Mar-2025	Anzo <[email protected]>	fix(LoadUnit): fix misalign exception and clearer uncache semantics (#4426) The loadAddrMisaligned exception is generated when a misaligned access goes to uncache space. --- A misaligned load sets a loadAddrMisaligned exception flag at s0 to ensure that it only enters the loadmisalignbuffer and has no other side effects. This prevents s2_uncache from being generated properly. Previously we used an additional `s2_un_misalign_exception` to flag this. Now, after examining the semantics of s2_uncache, they can be represented appropriately by directly removing the exception-related signals.
# 522c7f99	07-Mar-2025	Anzo <[email protected]>	fix(LSU): misaligned violation detection stuck (#4369) Since a load instruction that crosses a 16-byte boundary needs to be split and accessed twice, it needs to enter the `RAR Queue` twice, but occupies only one `virtual load queue` entry, so in the extreme case 36 load instructions that span a 16-byte boundary may fill all 72 `RAR queue` entries. --- There is a problem with our previous handling: if the oldest load instruction spanning a 16-byte boundary enters the `replayqueue` while an instruction in the `loadmisalignbuffer` cannot finish executing because the `RAR Queue` is full, then the oldest load instruction can never be issued, because the `loadmisalignbuffer` always has an instruction in it. --- Therefore, we use a more aggressive scheme. When the RAR is full, we let the misaligned load generate a rollback, and the next load instruction that the loadmisalignbuffer can receive must be the oldest (if it is misaligned).
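The capacity arithmetic in #4369 can be checked with a small model. This is only an illustrative sketch, not XiangShan code: the function names are hypothetical, and the queue size of 72 is taken from the commit message above.

```python
RAR_QUEUE_SIZE = 72  # RAR queue entries, as quoted in the commit message


def crosses_16b(addr: int, size: int) -> bool:
    """True if an access of `size` bytes starting at `addr` spans a 16-byte boundary."""
    return (addr % 16) + size > 16


def rar_entries_for(addr: int, size: int) -> int:
    """A load that crosses a 16-byte boundary is split and enqueues into the RAR queue twice."""
    return 2 if crosses_16b(addr, size) else 1


def loads_to_fill(entries_per_load: int, queue_size: int = RAR_QUEUE_SIZE) -> int:
    """Number of loads needed to occupy every RAR entry (ceiling division)."""
    return -(-queue_size // entries_per_load)


# An 8-byte load at offset 12 crosses the boundary; one at offset 8 does not.
print(rar_entries_for(0x0C, 8), rar_entries_for(0x08, 8))
# With two entries per load, 36 loads are enough to fill all 72 entries.
print(loads_to_fill(2))
```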
# 90f8d3cf	06-Mar-2025	cz4e <[email protected]>	fix(LoadUnit): exclude prefetch requests (#4367) * To meet timing, the RAR enqueue conditions need to be compromised; the worst timing paths come from `pmp` and `missQueue`. * If `LoadQueueRARSize` == `VirtualLoadQueueSize`, only prefetching needs to be excluded. * If `LoadQueueRARSize` < `VirtualLoadQueueSize`, `s2_can_query` also needs to be considered.
# 25381b72	05-Mar-2025	Anzo <[email protected]>	fix(LoadUnit): misalign wakeup should not set s0 valid (#4359) `s0_src_valid_vec` is not `s0_src_select_vec`: a bit in `s0_src_valid_vec` is set whenever the corresponding input is `valid`. Therefore, `misalign wakeup` needs to globally control `s0_valid`.
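The difference between a valid vector and a select vector can be sketched as follows. This is a behavioural illustration only: the function names are hypothetical, and it assumes that lower indices have higher priority, which the commit message does not state.

```python
from typing import List, Optional


def priority_select(valid_vec: List[bool]) -> List[bool]:
    """Return a one-hot vector marking the first valid source (assumed highest priority)."""
    seen = False
    select_vec = []
    for valid in valid_vec:
        select_vec.append(valid and not seen)
        seen = seen or valid
    return select_vec


def mux_address(select_vec: List[bool], addrs: List[int]) -> Optional[int]:
    """Pick the address of the selected source; only the select vector may drive the mux."""
    for selected, addr in zip(select_vec, addrs):
        if selected:
            return addr
    return None


valid_vec = [False, True, True]          # two sources are valid at the same time
select_vec = priority_select(valid_vec)  # but only one of them is selected
print(select_vec, hex(mux_address(select_vec, [0x100, 0x200, 0x300])))
```

Several bits of the valid vector can be set at once, while at most one bit of the select vector is set, so logic that must act on exactly one source has to use the select vector.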
# 7ea48366	03-Mar-2025	Anzo <[email protected]>	fix(LoadUnit): misalign load wakeup not enter loadunit (#4333)
# 0d55e1db	28-Feb-2025	cz4e <[email protected]>	timing(LoadQueueRAR, LoadUnit): adjust rar/raw query logic (#4297) * Because `LoadQueueRARSize == VirtualLoadQueueSize`, no additional logic is needed for rar enq. * When fast replay is not needed, the load unit allocates a raw entry.
# 66e9b546	27-Feb-2025	Yanqin Li <[email protected]>	fix(LDU): nc is also not mis-aligned (#4326)
# 99ce5576	20-Feb-2025	cz4e <[email protected]>	style(Bundles): rewrite bundles with new style (#4274)
# 48f7f553	20-Feb-2025	Yanqin Li <[email protected]>	fix(LDU): only tlb hit can use tlb resp (#4293)
# 5a36f63d	20-Feb-2025	Anzo <[email protected]>	fix(LoadUnit): corrupt should be triggered on valid mshr (#4292)
# 638f3d84	17-Feb-2025	Yanqin Li <[email protected]>	fix(uncache): uncache load fails to replay (#4275) Fixed the situation where the nc_with_data was not replayed correctly.
# ccde5272	16-Feb-2025	cz4e <[email protected]>	fix(LoadUnit): fix misalign load wrong wakeup (#4263) When `io.dcache.req.ready` is false, a misaligned load is stalled, but `wakeup` still works normally and is not canceled in `s3`, which causes the backend to get wrong data.
# 9e12e8ed	08-Feb-2025	cz4e <[email protected]>	style(Bundles): move bundles to Bundles.scala (#4247)
# faeef328	27-Jan-2025	Anzo <[email protected]>	fix(LoadUnit): `dcache_kill` if `prf_wr` has no permissions (#4226) `prefetch.w` sends a write request to `TLB/PMA/PMP`. As a result, `PMA/PMP` returns a permission check (`io.pmp.st`) for the write request. --- Previously, we only handled the case where `prefetch.r` did not have read permissions, but not the case where `prefetch.w` did not have write permissions. So, when `prefetch.w` targets an address without write permissions, the request is still sent to `Dcache`, which generates an error. This PR fixes that: when `PMA/PMP` returns `io.pmp.st`, we generate `dcache.s2_kill`.
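The permission rule described in #4226 can be written as a small truth function. This is a sketch under assumptions, not the LoadUnit logic itself: the function and parameter names are hypothetical, `pmp_st_fault` stands in for `io.pmp.st`, and `pmp_ld_fault` is an assumed read-side counterpart.

```python
def prefetch_dcache_kill(is_prefetch_write: bool,
                         pmp_ld_fault: bool,
                         pmp_st_fault: bool) -> bool:
    """Decide whether a prefetch request should be killed before reaching the dcache.

    Before the fix (as described in the commit), only a read-permission fault
    killed the request. After the fix, a write prefetch without write
    permission is killed as well.
    """
    if is_prefetch_write:
        return pmp_st_fault  # assumption: prefetch.w is checked as a write
    return pmp_ld_fault      # assumption: prefetch.r is checked as a read


# A write prefetch to an address without write permission is now killed.
print(prefetch_dcache_kill(is_prefetch_write=True, pmp_ld_fault=False, pmp_st_fault=True))
```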
# 74050fc0	26-Jan-2025	Yanqin Li <[email protected]>	perf(Uncache): add merge policy when entering (#4154)

# Background

## Problem

How to design a more efficient entry rule for a new load/store request when a load/store with the same address already exists in the `ubuffer`?

* Old Design: Always reject the new request.
* New Design: Consider merging requests.

## Merge Scenarios

‼️ If the new request can be merged into the existing one, both need to be `NC`.

1. New Store Request:
   1. Existing Store: Merge (the new store is younger).
   2. Existing Load: Reject.
2. New Load Request:
   1. Existing Load: Merge (the new load may be younger or older; both are OK to merge).
   2. Existing Store: Reject.

# What does this PR do?

## 1. Entry Actions

1. Allocate a new entry and mark it as `valid`:
   1. When there is no matching address.
2. Allocate a new entry and mark it as `valid` and `waitSame`:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is either selected to issue or issued.
3. Merge into an Existing Entry:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is not selected to issue or issued.
4. Reject the New Request:
   1. When the ubuffer is full.
   2. When there is a matching address, but:
      * The virtual addresses or attributes are different.

NOTE: According to the definition in the TL-UL SPEC, the `mask` must be continuous and naturally aligned, and the `addr` must correspond to the mask. Therefore, the "same attributes" here introduces a new condition: the merged `mask` must meet the requirements of being continuous and naturally aligned (function `continueAndAlign`). During merging, the block offset of addr must be synchronously updated in `UncacheEntry.update`.

## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)`

> `mid`: master id
>
> `sid`: slave id

Old Design:

- `M` sends a `req` with a `mid`.
- `S` receives the `req` and records the `mid`.
- `S` sends a `resp` with the `mid`.
- `M` receives the `resp` and matches it with the recorded `mid`.

New Design:

- `M` sends a `req` with a `mid`.
- `S` receives the `req` and responds with `{mid, sid}`.
- `M` matches it with the `mid` and updates its record with the received `sid`.
- `S` sends a `resp` with its `sid`.
- `M` receives the `resp` and matches it with the recorded `sid`.

Benefit: The new design allows `S` to merge requests when a new request enters.

## 3. Forwarding Mechanism

Old Design: Each address in the `ubuffer` is unique, so forwarding is straightforward based on a match.

New Design:

* A single address may have up to two matching entries in the `ubuffer`.
* If it has two matching entries, one entry must be marked `inflight` and the other `waitSame`. In this case, the forwarded data comes from the merged data of the two entries, with the `inflight` entry being the older one.

## 4. Bug Fixes

1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`, because when `tlbValid` is false, `!tlbMiss` can still be true.
2. `Uncache` state machine transition: The state indicating "able to send requests" (previously `s_refill_req`, now `s_inflight`) should not be triggered by `reqFire` but rather by `acquireFire`.
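The entry rules of #4154 can be summarised as a decision function. This is a minimal behavioural sketch, not the `Uncache` implementation: all names are hypothetical, an 8-bit byte mask is assumed, and the sketch assumes that a full buffer only blocks actions that need a new entry, which the commit message does not spell out.

```python
from enum import Enum


class Action(Enum):
    ALLOCATE = "allocate (valid)"
    ALLOCATE_WAIT_SAME = "allocate (valid + waitSame)"
    MERGE = "merge into existing entry"
    REJECT = "reject"


def continue_and_align(mask: int) -> bool:
    """True if the set bits of a byte mask are contiguous and naturally aligned."""
    if mask == 0:
        return False  # assumption: an empty mask is not a legal request
    low = (mask & -mask).bit_length() - 1      # index of the lowest set bit
    size = bin(mask).count("1")                # number of enabled bytes
    contiguous = mask == ((1 << size) - 1) << low
    power_of_two = size & (size - 1) == 0
    return contiguous and power_of_two and low % size == 0


def entry_action(buffer_full: bool, addr_match: bool,
                 same_vaddr_and_attrs: bool, older_issuing_or_issued: bool) -> Action:
    """Choose the action for a new request, following the four entry rules."""
    if not addr_match:
        return Action.REJECT if buffer_full else Action.ALLOCATE
    if not same_vaddr_and_attrs:
        return Action.REJECT
    if older_issuing_or_issued:
        return Action.REJECT if buffer_full else Action.ALLOCATE_WAIT_SAME
    return Action.MERGE  # assumption: merging needs no free entry


# Two adjacent 4-byte accesses merge into one legal 8-byte mask.
print(continue_and_align(0b00001111 | 0b11110000))
# A 4-byte access at offset 2 is contiguous but not naturally aligned.
print(continue_and_align(0b00111100))
print(entry_action(False, True, True, False))
```

In this sketch, `same_vaddr_and_attrs` is expected to already include the `continue_and_align` check on the merged mask, as the NOTE in the commit message requires.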
# Evaluation

- ✅ timing
- ✅ performance

| Type           | 4B1000 | Speedup1-IO | 1B4096 | Speedup2-IO |
| -------------- | ------ | ----------- | ------ | ----------- |
| IO             | 51026  | 1           | 208149 | 1.00        |
| NC             | 42343  | 1.21        | 169248 | 1.23        |
| NC+OT          | 20379  | 2.50        | 160101 | 1.30        |
| NC+OT+mergeOpt | 16308  | 3.13        | 126369 | 1.65        |
| cache          | 1298   | 39.31       | 4410   | 47.20       |
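The speedup columns in the evaluation table are consistent with dividing the `IO` baseline by each configuration's measurement. The sketch below recomputes them from the numbers in the table; the units of the measurements are not stated in the commit message, and the variable and function names are illustrative.

```python
# Measurements from the evaluation table: (4B1000, 1B4096), lower is better.
measurements = {
    "IO": (51026, 208149),
    "NC": (42343, 169248),
    "NC+OT": (20379, 160101),
    "NC+OT+mergeOpt": (16308, 126369),
    "cache": (1298, 4410),
}


def speedups(table, baseline="IO"):
    """Speedup of every configuration relative to the baseline, rounded to 2 decimals."""
    base = table[baseline]
    return {
        name: tuple(round(b / v, 2) for b, v in zip(base, values))
        for name, values in table.items()
    }


for name, (s1, s2) in speedups(measurements).items():
    print(f"{name:15s} {s1:6.2f} {s2:6.2f}")
```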
# fa5e530d	21-Jan-2025	cz4e <[email protected]>	timing(VSegmentUnit): duplicate latchVAddr (#4209) * `latchVAddr` needs to index all dcache data SRAMs from top to bottom, which causes a large fanout, so duplicate `latchVAddr`.
# 0b4afd34	15-Jan-2025	cz4e <[email protected]>	timing(LoadUnit): optimization load unit writeback data generate logic (#4167) Optimize the load unit writeback data generation logic: * merge multi-source data at `s2`, select and expand data at `s3` * select data using one-hot instead of a shifter
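The idea of selecting data with a one-hot vector instead of a shifter can be illustrated with a behavioural model. This is not the LoadUnit code: the 16-byte line width, the 64-bit result width, and the function names are assumptions made for the example.

```python
LINE_BYTES = 16
MASK64 = (1 << 64) - 1


def select_with_shifter(line: int, offset: int) -> int:
    """Variable shift: the shift amount depends on the byte offset."""
    return (line >> (8 * offset)) & MASK64


def select_with_onehot(line: int, offset: int) -> int:
    """One-hot select: every candidate is a fixed slice, and one of them is picked."""
    onehot = [i == offset for i in range(LINE_BYTES)]
    candidates = [(line >> (8 * i)) & MASK64 for i in range(LINE_BYTES)]  # fixed slices
    result = 0
    for selected, candidate in zip(onehot, candidates):
        if selected:
            result |= candidate  # models an AND-OR mux over the candidates
    return result


line = int.from_bytes(bytes(range(LINE_BYTES)), "little")  # byte i holds the value i
same = all(select_with_shifter(line, o) == select_with_onehot(line, o)
           for o in range(LINE_BYTES))
print(same)
```

Both functions return the same value for every offset; in hardware, the one-hot form replaces a variable shifter with fixed wiring plus a mux, which is the usual motivation for this kind of timing change.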
# 37f33e11	13-Jan-2025	cz4e <[email protected]>	timing(LoadUnit): fpWen and pdest reg out (#4144) when loadunit writeback * fpWen uses register directly out * pdest uses register directly out