History log of /XiangShan/src/main/scala/xiangshan/mem/pipeline/LoadUnit.scala (Results 1 – 25 of 414)
Revision Date Author Comments
# efee2982 18-Apr-2025 Huijin Li <[email protected]>

fix(LoadUnit): fix ldld && stld query revoke logic (#4580)

The prior design reassigns `io.lsq.ldin.bits.rep_info.need_rep` to 0
when source comes from MisalignBuffer, preventing cancellation of
rar/raw enqueue requests during misaligned instruction reissuance.

Thus, we must use `io.misalign_ldout.bits.rep_info.need_rep` to
determine whether to revoke rar/raw enqueue requests when source is from
MisalignBuffer.
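
A minimal Python sketch of the selection described above. The function and argument names are hypothetical; they only model the boolean choice and are not the Chisel signals themselves.

```python
def need_rep_for_revoke(from_misalign_buffer: bool,
                        lsq_need_rep: bool,
                        misalign_need_rep: bool) -> bool:
    """Pick the replay flag used to decide whether to revoke rar/raw enqueue.

    lsq_need_rep models `io.lsq.ldin.bits.rep_info.need_rep`, which is
    forced to 0 when the source is the MisalignBuffer.
    misalign_need_rep models `io.misalign_ldout.bits.rep_info.need_rep`.
    """
    if from_misalign_buffer:
        return misalign_need_rep
    return lsq_need_rep


# A reissued misaligned load that needs replay must still revoke its query,
# even though the LSQ-side flag has been cleared.
revoke = need_rep_for_revoke(from_misalign_buffer=True,
                             lsq_need_rep=False,
                             misalign_need_rep=True)
```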


# 35bb7796 14-Apr-2025 Anzo <[email protected]>

fix(LSU): fix exception for misalign access to `nc` space (#4526)

For a misaligned access, if an access after the split goes to `nc`
space, a misaligned exception should also be generated.

Co-authored-by: Yanqin Li <[email protected]>


# 4ec1f462 09-Apr-2025 cz4e <[email protected]>

timing(StoreMisalignBuffer): fix misalign buffer enq timing (#4493)

* a misaligned store enqueues into the misalign buffer at s1 and is
revoked at s2 if necessary


# 1592abd1 08-Apr-2025 Yan Xu <[email protected]>

feat: support inst lifetime trace (#4007)

PerfCCT (performance counter commit trace) is an instruction-level
performance counter, similar to the one in GEM5.
How to use this:
1. Make with the "WITH_CHISELDB=1" argument
2. Run with "--dump-db --dump-select-db lifetime" to get the database
3. To visualize instruction lifetimes, run "python3 scripts/perfcct.py
"the-db-file-path" -p 1 -v | less"
4. The analysis script is now in the XS-GEM5 repo, see
https://github.com/OpenXiangShan/GEM5/blob/xs-dev/util/ClockAnalysis.py

How it works:
1. Allocate a unique tag "seqNum" (as in GEM5) for each instruction at
the fetch stage
2. Pass the "seqNum" along each pipeline
3. Record perf data through the DPIC interface


# 83e17083 01-Apr-2025 Anzo <[email protected]>

fix(LoadUnit): not enter misalignbuffer on exception (#4477)


# 0b8a9d16 28-Mar-2025 Yanqin Li <[email protected]>

fix(LDU): only selected can be used in address mux (#4466)


# dac94c49 20-Mar-2025 Anzo <[email protected]>

fix(LoadUnit): uncache should not be generated when page fault (#4442)

As the comment says, even if a `PF` is generated, an address is still
generated for `PMP/PMA` checking, which can lead to some strange
responses.
Since the previous modification
(https://github.com/OpenXiangShan/XiangShan/pull/4426) removed
`s2_exception`, `s2_uncache` was generated incorrectly.

This is now represented using clearer semantics:
`s2_actually_uncache`: the real physical address lies in uncache space.
`s2_uncache` has been retained to distinguish whether the request comes
from prefetching, which may be handled in a subsequent change by **YQ
senior sister**.

I synchronised the changes to StoreUnit in this
PR (https://github.com/OpenXiangShan/XiangShan/pull/4441).


# bbed9f8d 17-Mar-2025 Anzo <[email protected]>

fix(LoadUnit): fix misalign exception and clearer uncache semantics (#4426)

The loadAddrMisaligned exception is generated when a misaligned access
targets uncache space.

---

A misaligned load sets a loadAddrMisaligned exception flag at s0 to
ensure that it only enters the loadmisalignbuffer and has no other side
effects.
This, however, prevents s2_uncache from being generated properly.
Previously we used an additional `s2_un_misalign_exception` to flag
this.
Now, after examining the semantics of s2_uncache, they can be
represented appropriately by directly removing the exception-related
signals.


# 522c7f99 07-Mar-2025 Anzo <[email protected]>

fix(LSU): misaligned violation detection stuck (#4369)

Since a load instruction that crosses a 16-byte boundary needs to be
split and accessed twice, it needs to enter the `RAR Queue` twice, but
occupies only one `virtual load queue` entry. In the extreme case, 36
load instructions that span 16 bytes can fill all 72 `RAR queue` entries.
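
The extreme case above is simple arithmetic. The sketch below spells it out; the constant and function names are hypothetical, and only the values 72 and 2 come from the message.

```python
RAR_QUEUE_ENTRIES = 72        # RAR queue size quoted in the commit message
ACCESSES_PER_SPLIT_LOAD = 2   # a load crossing 16 bytes is accessed twice


def split_loads_to_fill_rar(rar_entries: int = RAR_QUEUE_ENTRIES,
                            accesses_per_load: int = ACCESSES_PER_SPLIT_LOAD) -> int:
    """Number of split loads that suffice to occupy every RAR entry."""
    return rar_entries // accesses_per_load


worst_case_loads = split_loads_to_fill_rar()
```

With the quoted sizes this gives 36 loads, matching the figure in the message.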

---

There is a problem with our previous handling: if the oldest load
instruction spanning 16 bytes enters the `replayqueue` while an
instruction in the `loadmisalignbuffer` cannot finish executing because
the `RAR Queue` is full, then the oldest load instruction can never be
issued, because the `loadmisalignbuffer` always has an instruction in it.

---

Therefore, we use a more aggressive scheme.
When the RAR is full, we let the misaligned load generate a rollback,
and the next load instruction that the loadmisalignbuffer can receive
must be the oldest (if it is misaligned).


# 90f8d3cf 06-Mar-2025 cz4e <[email protected]>

fix(LoadUnit): exclude prefetch requests (#4367)

* To meet timing, the RAR enqueue conditions need to be compromised;
the worst timing paths come from `pmp` and `missQueue`.

* If `LoadQueueRARSize` == `VirtualLoadQueueSize`, only prefetches need
to be excluded.

* If `LoadQueueRARSize` < `VirtualLoadQueueSize`, `s2_can_query` also
needs to be considered.


# 25381b72 05-Mar-2025 Anzo <[email protected]>

fix(LoadUnit): misalign wakeup should not set s0 valid (#4359)

`s0_src_valid_vec` is not `s0_src_select_vec`: a bit in
`s0_src_valid_vec` is set whenever its input is `valid`. Therefore,
`misalign wakeup` needs to globally control `s0_valid`.


# 7ea48366 03-Mar-2025 Anzo <[email protected]>

fix(LoadUnit): misalign load wakeup not enter loadunit (#4333)


# 0d55e1db 28-Feb-2025 cz4e <[email protected]>

timing(LoadQueueRAR, LoadUnit): adjust rar/raw query logic (#4297)

* Because `LoadQueueRARSize == VirtualLoadQueueSize`, no additional
logic is needed for RAR enqueue
* When fast replay is not needed, the load unit allocates a RAW entry


# 66e9b546 27-Feb-2025 Yanqin Li <[email protected]>

fix(LDU): nc is also not mis-aligned (#4326)


# 99ce5576 20-Feb-2025 cz4e <[email protected]>

style(Bundles): rewrite bundles with new style (#4274)


# 48f7f553 20-Feb-2025 Yanqin Li <[email protected]>

fix(LDU): only tlb hit can use tlb resp (#4293)


# 5a36f63d 20-Feb-2025 Anzo <[email protected]>

fix(LoadUnit): corrupt should be triggered on valid mshr (#4292)


# 638f3d84 17-Feb-2025 Yanqin Li <[email protected]>

fix(uncache): uncache load fails to replay (#4275)

Fixed the situation where the nc_with_data was not replayed correctly.


# ccde5272 16-Feb-2025 cz4e <[email protected]>

fix(LoadUnit): fix misalign load wrong wakeup (#4263)

When `io.dcache.req.ready` is false, a misaligned load is stalled, but
`wakeup` still fires normally and is not canceled in `s3`, which
causes the backend to get wrong data.


# 9e12e8ed 08-Feb-2025 cz4e <[email protected]>

style(Bundles): move bundles to Bundles.scala (#4247)


# faeef328 27-Jan-2025 Anzo <[email protected]>

fix(LoadUnit): `dcache_kill` if `prf_wr` has no permissions (#4226)

`prefetch.w` sends a write request to `TLB/PMA/PMP`.
As a result, `PMA/PMP` returns a permission check (`io.pmp.st`) for the
write request.

---

Previously, we only handled the case where `prefetch.r` did not have
read permissions, but not the case where `prefetch.w` did not have
write permissions.
**So, when `prefetch.w` has an address without write permissions, the
request will still be sent to `Dcache`, which generates an error.**

**This pr fixes that, when `PMA/PMP` returns `io.pmp.st`, we generate
`dcache.s2_kill`.**


# 74050fc0 26-Jan-2025 Yanqin Li <[email protected]>

perf(Uncache): add merge policy when entering (#4154)

# Background

## Problem

How to design a more efficient entry rule for a new load/store request
when a load/store with the same address already exists in the `ubuffer`?

* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.

## Merge Scenarios

‼️ For the new request to be merged into the existing one, both must be
`NC`.

1. **New Store Request:**
   1. **Existing Store:** Merge (the new store is younger).
   2. **Existing Load:** Reject.

2. **New Load Request:**
   1. **Existing Load:** Merge (the new load may be younger or older; both
      are fine to merge).
   2. **Existing Store:** Reject.
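
The merge scenarios above can be sketched as a small decision function. This is a Python model under the stated rules only; the function name and the string results are hypothetical and do not appear in the PR.

```python
def merge_decision(new_is_store: bool, existing_is_store: bool,
                   new_is_nc: bool, existing_is_nc: bool) -> str:
    """Return "merge" or "reject" for a new request hitting an existing entry."""
    # Both requests must be NC for a merge to be considered at all.
    if not (new_is_nc and existing_is_nc):
        return "reject"
    # Store merges into store, load merges into load; mixed pairs are rejected.
    if new_is_store == existing_is_store:
        return "merge"
    return "reject"


decision = merge_decision(new_is_store=True, existing_is_store=True,
                          new_is_nc=True, existing_is_nc=True)
```

This covers only the request-type rule; the address, attribute, and issue-state conditions are described in the entry actions of the PR.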

# What does this PR do?

## 1. Entry Actions

1. **Allocate** a new entry and mark as `valid`:
   1. When there is no matching address.
2. **Allocate** a new entry and mark as `valid` and `waitSame`:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is either selected to issue or issued.
3. **Merge** into an Existing Entry:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is **not** selected to issue or issued.
4. **Reject** the New Request:
   1. When the ubuffer is full.
   2. When there is a matching address, but:
      * The virtual addresses or attributes are **different**.

**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must
be continuous and naturally aligned, and the `addr` must correspond to
the mask. Therefore, the "**same attributes**" here introduces a new
condition: the merged `mask` must meet the requirements of being
continuous and naturally aligned (function `continueAndAlign`). During
merging, the block offset of addr must be synchronously updated in
`UncacheEntry.update`.
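
One way to read the mask condition in the note above is sketched below in Python. It is not the actual `continueAndAlign` function from the PR; the name `continue_and_align`, the integer bit-mask encoding (bit i = byte i), and the choice to reject an empty mask are assumptions made for illustration.

```python
def continue_and_align(mask: int) -> bool:
    """True if the set bits form one contiguous, naturally aligned run."""
    if mask == 0:
        return False  # assumption: an empty mask never qualifies
    low = (mask & -mask).bit_length() - 1   # index of the lowest set bit
    run = mask >> low                       # run of set bits moved down to bit 0
    if run & (run + 1):
        return False                        # a hole exists: not contiguous
    length = run.bit_length()               # number of bytes covered
    if length & (length - 1):
        return False                        # size is not a power of two
    return low % length == 0                # start offset is a multiple of size


examples = {
    0b00001111: continue_and_align(0b00001111),  # 4 bytes at offset 0
    0b00111100: continue_and_align(0b00111100),  # 4 bytes at offset 2
    0b00000101: continue_and_align(0b00000101),  # hole in the middle
}
```

Under this reading, two adjacent 2-byte masks such as `0b0011` and `0b1100` merge into `0b1111`, which passes, while `0b0110` merged with `0b1000` gives `0b1110`, which fails.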

## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache
(S)`

> `mid`: master id
>
> `sid`: slave id

**Old Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req`, records the **`mid`**.
- `S` sends a `resp` with the **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.

**New Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}` .
- `M` matches it with the **`mid`** and updates its record with the
received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.

**Benefit:** The new design allows `S` to merge requests when a new
request enters.
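
A toy Python model of the new handshake, to show why matching on `sid` makes merging possible. The classes, method names, and the "same address shares one `sid`" policy are hypothetical simplifications, not the `LoadQueueUncache` or `Uncache` implementation.

```python
class Slave:
    """Models `S`: answers each request with {mid, sid}."""

    def __init__(self):
        self.addr_to_sid = {}
        self.next_sid = 0

    def accept(self, mid, addr):
        # Assumption: a request to an address that already has an entry is
        # merged into it and therefore shares that entry's sid.
        if addr not in self.addr_to_sid:
            self.addr_to_sid[addr] = self.next_sid
            self.next_sid += 1
        return mid, self.addr_to_sid[addr]


class Master:
    """Models `M`: records the sid for each mid, then matches responses by sid."""

    def __init__(self):
        self.sid_of = {}

    def on_accept(self, mid, sid):
        self.sid_of[mid] = sid

    def on_resp(self, sid):
        return sorted(m for m, s in self.sid_of.items() if s == sid)


master, slave = Master(), Slave()
master.on_accept(*slave.accept(mid=0, addr=0x1000))
master.on_accept(*slave.accept(mid=1, addr=0x1000))  # merged with mid 0
master.on_accept(*slave.accept(mid=2, addr=0x2000))
woken = master.on_resp(sid=0)
```

In this model a single response carrying `sid` 0 completes both `mid` 0 and `mid` 1, which a response keyed by one `mid` could not express.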

## 3. Forwarding Mechanism

**Old Design:** Each address in the `ubuffer` is **unique**, so
forwarding is straightforward based on a match.

**New Design:**

* A single address may have up to two entries matched in the `ubuffer`.
* If it has two matched entries, one entry must be marked `inflight` and
the other `waitSame`. In this case, the forwarded data comes from the
merged data of the two entries, with the `inflight` entry being the
older one.
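
A byte-wise Python sketch of forwarding from two matched entries. The function name and the list-based data and mask layout are hypothetical, and letting the younger `waitSame` entry override the older `inflight` entry on overlapping bytes is an assumption, not something the PR text states.

```python
def forward_merged(inflight_data, inflight_mask, wait_data, wait_mask):
    """Merge two matched entries into one forwarded (data, mask) pair.

    Data arguments are lists of byte values; masks are lists of booleans.
    Assumption: bytes valid in the younger waitSame entry take priority.
    """
    out_data, out_mask = [], []
    for old_byte, old_valid, new_byte, new_valid in zip(
            inflight_data, inflight_mask, wait_data, wait_mask):
        if new_valid:
            out_data.append(new_byte)
            out_mask.append(True)
        elif old_valid:
            out_data.append(old_byte)
            out_mask.append(True)
        else:
            out_data.append(0)
            out_mask.append(False)
    return out_data, out_mask


data, mask = forward_merged(
    [1, 2, 3, 4], [True, True, False, False],   # older, inflight
    [9, 9, 9, 9], [False, True, True, False],   # younger, waitSame
)
```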

## 4. Bug Fixes

1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`,
because when `tlbValid` is false, `!tlbMiss` can still be true.
2. `Uncache` state machine transition: The state indicating "**able to
send requests**" (previously `s_refill_req`, now `s_inflight`) should
not be triggered by `reqFire` but rather by `acquireFire`.
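
The first bug fix above comes down to a two-input truth table. A minimal Python sketch follows; the function name is hypothetical, while the argument names mirror `tlbValid` and `tlbMiss` from the text.

```python
def tlb_hit(tlb_valid: bool, tlb_miss: bool) -> bool:
    """A hit requires a valid TLB response that is also not a miss."""
    return tlb_valid and not tlb_miss


# The case the old `!tlbMiss` shortcut got wrong: no valid response at all.
shortcut = not False                                 # `!tlbMiss` alone says "hit"
corrected = tlb_hit(tlb_valid=False, tlb_miss=False)
```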

<img width="747" alt="image"
src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef"
/>

# Evaluation

- ✅ timing
- ✅ performance

| Type | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO | 51026 | 1 | 208149 | 1.00 |
| NC | 42343 | 1.21 | 169248 | 1.23 |
| NC+OT | 20379 | 2.50 | 160101 | 1.30 |
| NC+OT+mergeOpt | 16308 | 3.13 | 126369 | 1.65 |
| cache | 1298 | 39.31 | 4410 | 47.20 |
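
The speedup columns can be reproduced from the raw columns by dividing the `IO` baseline by each row. The Python sketch below does that; the dictionary layout and function name are hypothetical, and the unit of the raw numbers is not stated in the PR (lower is assumed to be better, as the speedup columns imply).

```python
RAW = {
    "IO":             {"4B*1000": 51026, "1B*4096": 208149},
    "NC":             {"4B*1000": 42343, "1B*4096": 169248},
    "NC+OT":          {"4B*1000": 20379, "1B*4096": 160101},
    "NC+OT+mergeOpt": {"4B*1000": 16308, "1B*4096": 126369},
    "cache":          {"4B*1000": 1298,  "1B*4096": 4410},
}


def speedup_over_io(kind: str, workload: str) -> float:
    """Speedup of `kind` relative to the IO baseline, rounded to 2 places."""
    return round(RAW["IO"][workload] / RAW[kind][workload], 2)


merge_opt_speedups = (speedup_over_io("NC+OT+mergeOpt", "4B*1000"),
                      speedup_over_io("NC+OT+mergeOpt", "1B*4096"))
```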


# fa5e530d 21-Jan-2025 cz4e <[email protected]>

timing(VSegmentUnit): duplicate latchVAddr (#4209)

* `latchVAddr` needs to index all dcache data SRAMs from top to bottom,
which causes a large fanout, so `latchVAddr` is duplicated


# 0b4afd34 15-Jan-2025 cz4e <[email protected]>

timing(LoadUnit): optimization load unit writeback data generate logic (#4167)

Optimize the load unit's writeback data generation logic:
* merge multi-source data at `s2`, select and expand data at `s3`
* select data using one-hot instead of a shifter
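
A small Python model of the difference between the two selection styles. The 16-byte data width and the function names are assumptions for illustration; the actual logic is Chisel hardware, where a one-hot select avoids the variable shift.

```python
NUM_BYTES = 16                      # assumed data width: 16 bytes (128 bits)
DATA_MASK = (1 << (NUM_BYTES * 8)) - 1


def select_with_shifter(data: int, offset: int) -> int:
    """Shift the data right by a binary-encoded byte offset."""
    return (data >> (offset * 8)) & DATA_MASK


def select_one_hot(data: int, offset_one_hot: int) -> int:
    """OR together the pre-shifted candidates enabled by a one-hot vector."""
    result = 0
    for i in range(NUM_BYTES):
        if (offset_one_hot >> i) & 1:
            result |= (data >> (i * 8)) & DATA_MASK
    return result


sample = int.from_bytes(bytes(range(NUM_BYTES)), "little")
same = all(select_one_hot(sample, 1 << off) == select_with_shifter(sample, off)
           for off in range(NUM_BYTES))
```

Both functions return the same value for every offset; the one-hot form only requires that exactly one bit of the select vector is set.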


# 37f33e11 13-Jan-2025 cz4e <[email protected]>

timing(LoadUnit): fpWen and pdest reg out (#4144)

On load unit writeback:
* **fpWen** is driven directly from a register
* **pdest** is driven directly from a register

