History log of /XiangShan/src/main/scala/xiangshan/mem/lsqueue/LoadQueueUncache.scala (Results 1 – 10 of 10)
Revision Date Author Comments
# afa1262c 24-Feb-2025 Yanqin Li <[email protected]>

fix(LoadQueueUncache): exhaust the various cases of flush (#4300)

**Bug trigger point:**

The flush occurs during the `s_wait` phase. The entry has already passed
the flush trigger condition of `io.uncache.resp.fire`, so no flush
happens. As a result, `needFlushReg` remains set in the register until
the next new entry's `io.uncache.resp.fire`, at which point that normal
entry is wrongly flushed, causing the program to get stuck.

**Bug analysis:** The granularity of flush handling is too coarse.

In the original calculation:
```
val flush = (needFlush && uncacheState === s_idle) || (io.uncache.resp.fire && needFlushReg)
```
Flush is only handled in two cases: `s_idle` and non-`s_idle`. This
distinction makes the handling of the three non-`s_idle` states very
coarse. In fact, each of the remaining three states needs its own
handling, depending on when `needFlush` is generated and when
`needFlushReg` is delayed in the register.
1. In the `s_req` state, before the uncache request is sent, the flush
can be performed in time, using `needFlush` to prevent the request from
being sent.
2. If the request has been sent and the state reaches `s_resp`, then to
avoid a mismatch between the uncache request and response, the flush can
only be performed after receiving the uncache response, i.e., use
`needFlush || needFlushReg` to flush when `io.uncache.resp.fire`.
3. If a flush occurs during the `s_wait` state, the pending write-back
can likewise be cancelled, using `needFlush` to flush in time.

**Bug Fix:**

For better code readability, the wire `flush` is now updated alongside
the `uncacheState` state machine update. Here `flush` means actually
executing the flush, `needFlush` is the signal that triggers the flush,
and `needFlushReg` is the flush signal stored in a register for delayed
processing.
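The per-state handling above can be modeled as a small pure-Scala sketch. Signal and state names mirror the text; this is an illustrative software model, not the actual Chisel RTL in `LoadQueueUncache`:

```scala
// Illustrative model of the per-state flush decision described above.
object FlushModel {
  sealed trait State
  case object Idle extends State // s_idle
  case object Req  extends State // s_req: request not yet sent
  case object Resp extends State // s_resp: request sent, awaiting response
  case object Wait extends State // s_wait: awaiting write-back

  // Whether the entry is flushed this cycle.
  def flush(state: State,
            needFlush: Boolean,      // flush trigger arriving this cycle
            needFlushReg: Boolean,   // trigger delayed in a register
            uncacheRespFire: Boolean): Boolean = state match {
    case Idle => needFlush
    case Req  => needFlush // flush in time; the request is never sent
    case Resp => uncacheRespFire && (needFlush || needFlushReg) // keep req/resp matched
    case Wait => needFlush // cancel the pending write-back immediately
  }
}
```

Note that in `s_wait` the entry now reacts to `needFlush` directly instead of waiting for a response that will never come, which is exactly the bug trigger described above.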



# 9e12e8ed 08-Feb-2025 cz4e <[email protected]>

style(Bundles): move bundles to Bundles.scala (#4247)


# c590fb32 08-Feb-2025 cz4e <[email protected]>

refactor(MemBlock): move MemBlock.scala from backend to mem (#4221)


# 74050fc0 26-Jan-2025 Yanqin Li <[email protected]>

perf(Uncache): add merge policy when entering (#4154)

# Background

## Problem

How to design a more efficient entry rule for a new load/store request
when a load/store with the same address already exists in the `ubuffer`?

* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.

## Merge Scenarios

‼️ If the new request is to be merged into the existing one, both need to
be `NC`.

1. **New Store Request:**
1. **Existing Store:** Merge (the new store is younger).
2. **Existing Load:** Reject.

2. **New Load Request:**
1. **Existing Load:** Merge (the new load may be younger or older. Both
are ok to merge).
2. **Existing Store:** Reject.

# What does this PR do?

## 1. Entry Actions

1. **Allocate** a new entry and mark it as `valid`:
1. When there is no matching address.
2. **Allocate** a new entry and mark it as `valid` and `waitSame`:
1. When there is a matching address, and:
* The virtual addresses and attributes are the same.
* The older entry is either selected to issue or issued.
3. **Merge** into an Existing Entry:
1. When there is a matching address, and:
* The virtual addresses and attributes are the same.
* The older entry is **not** selected to issue or issued.
4. **Reject** the New Request:
1. When the ubuffer is full.
2. When there is a matching address, but:
* The virtual addresses or attributes are **different**.

**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must
be continuous and naturally aligned, and the `addr` must correspond to
the mask. Therefore, the "**same attributes**" here introduces a new
condition: the merged `mask` must meet the requirements of being
continuous and naturally aligned (function `continueAndAlign`). During
merging, the block offset of addr must be synchronously updated in
`UncacheEntry.update`.
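That mask requirement can be sketched as follows. This stand-alone function is an assumption for illustration, not the actual `continueAndAlign` implementation: a mask qualifies when its set bits form one contiguous run whose length is a power of two and whose byte offset is a multiple of that length, as TL-UL requires.

```scala
// Sketch of the "continuous and naturally aligned" mask check (illustrative).
object MaskCheck {
  // mask: one bit per byte lane of the bus.
  def continuousAndAligned(mask: Int): Boolean = {
    if (mask == 0) return false
    val lo   = java.lang.Integer.numberOfTrailingZeros(mask) // byte offset
    val size = java.lang.Integer.bitCount(mask)              // bytes covered
    val shifted = mask >>> lo
    val continuous = ((shifted + 1) & shifted) == 0  // one run of 1s, no holes
    val powerOfTwo = (size & (size - 1)) == 0        // legal TL-UL size
    continuous && powerOfTwo && (lo % size == 0)     // naturally aligned offset
  }
}
```

For example, a 2-byte mask at offset 2 (`0b1100`) is acceptable, while the same run at offset 1 (`0b0110`) is not, so the corresponding merge would be rejected.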

## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache
(S)`

> `mid`: master id
>
> `sid`: slave id

**Old Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req`, records the **`mid`**.
- `S` sends a `resp` with the **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.

**New Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}` .
- `M` matches it with the **`mid`** and updates its record with the
received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.

**Benefit:** The new design allows `S` to merge requests when a new
request enters.
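The new handshake's bookkeeping can be sketched as a toy software model. The `Master`/`ReqAck` names are hypothetical, not the real bundle types, and timing is ignored:

```scala
// Toy model of the new mid/sid handshake described above.
object Handshake {
  final case class ReqAck(mid: Int, sid: Int) // S's immediate answer to a req

  class Master {
    private var sidOf = Map.empty[Int, Int] // mid -> recorded sid

    // M matches the ack by mid and records the sid that S assigned.
    def onReqAck(ack: ReqAck): Unit = sidOf += (ack.mid -> ack.sid)

    // The later resp carries only sid; return the matching mid, if any.
    def onResp(sid: Int): Option[Int] =
      sidOf.collectFirst { case (mid, s) if s == sid => mid }
  }
}
```

Because `S` hands out its own `sid`, it can map several masters' requests onto one merged slave-side entry, which is the point of the redesign.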

## 3. Forwarding Mechanism

**Old Design:** Each address in the `ubuffer` is **unique**, so
forwarding is straightforward based on a match.

**New Design:**

* A single address may have up to two entries matched in the `ubuffer`.
* If it has two matched entries, one entry must be marked `inflight` and
the other marked `waitSame`. In this case, the forwarded data is the
merged data of the two entries, with the `inflight` entry being the older
one.
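A byte-wise sketch of that two-entry merge, assuming (as stated above) the `inflight` entry is the older one; names and the byte-array representation are illustrative, not the RTL:

```scala
// Merge forwarded data from two matched ubuffer entries: bytes covered by
// the younger (waitSame) entry's mask override the older (inflight) bytes.
object ForwardMerge {
  def merge(olderData: Array[Byte], olderMask: Int,
            youngerData: Array[Byte], youngerMask: Int): (Array[Byte], Int) = {
    val out = olderData.clone()
    for (i <- out.indices if ((youngerMask >> i) & 1) == 1)
      out(i) = youngerData(i) // younger store wins on overlapping bytes
    (out, olderMask | youngerMask)
  }
}
```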

## 4. Bug Fixes

1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`,
because when `tlbValid` is false, `!tlbMiss` can still be true.
2. `Uncache` state machine transition: The state indicating "**able to
send requests**" (previously `s_refill_req`, now `s_inflight`) should
not be triggered by `reqFire` but rather by `acquireFire`.

<img width="747" alt="image"
src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef"
/>

# Evaluation

- ✅ timing
- ✅ performance

| Type | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO | 51026 | 1 | 208149 | 1.00 |
| NC | 42343 | 1.21 | 169248 | 1.23 |
| NC+OT | 20379 | 2.50 | 160101 | 1.30 |
| NC+OT+mergeOpt | 16308 | 3.13 | 126369 | 1.65 |
| cache | 1298 | 39.31 | 4410 | 47.20 |



# a035c20d 02-Jan-2025 Yanqin Li <[email protected]>

fix(LQUncache): fix a potential deadlock when enqueue (#4096)

**Old design**:
When enqueuing, it is in the order of ldu0-1, i.e. ldu0 is allocated
first.

**Bug scene:**
LQUncacheBuffer is small. The enqueue `robIdx`s from ldu0-1 are [57, 56,
55]; [57, 56] can enqueue, but [55] cannot because the buffer is full.
57/56 send the `NC` request after enqueuing. 55 is rolled back. In
principle, 57 and 56 need to be flushed. But to ensure the correspondence
between uncache requests and responses, 57 is only flushed when the
uncache response arrives. So when the same sequence [57, 56, 55] comes
again, there is still no space to allocate 55, and it is rolled back
again. A deadlock emerges.
This bug is triggered after cutting `LoadUncacheBufferSize` from 20 to
4.

**One way to fix**:
When enqueuing, allocation follows `robIdx` order, i.e. the oldest
request is allocated first.
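The fix amounts to ordering enqueue candidates by age before allocating. A minimal plain-Scala sketch, ignoring `robIdx` wrap-around (`allocate` is a hypothetical helper, not the RTL):

```scala
// Oldest-first allocation: with the old port order, the oldest load could
// starve forever behind younger ones; sorting by robIdx guarantees it gets
// a buffer slot, so the rollback loop above cannot recur.
object EnqueueOrder {
  // Smaller robIdx = older here (wrap-around ignored for simplicity).
  def allocate(robIdxs: Seq[Int], freeSlots: Int): Seq[Int] =
    robIdxs.sorted.take(freeSlots)
}
```

With the bug scene's sequence and two free slots, the oldest request (55) is now among the allocated ones instead of being rejected every round.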



# 519244c7 25-Dec-2024 Yanqin Li <[email protected]>

submodule(CoupledL2, OpenLLC): support pbmt in CHI scene (#4071)

* L1: deliver the NC and PMA signals of uncacheReq to L2
* L2: [support Svpbmt on CHI
MemAttr](https://github.com/OpenXiangShan/CoupledL2/pull/273)
* LLC: [Non-cache requests are forwarded directly downstream without
entering the slice](https://github.com/OpenXiangShan/OpenLLC/pull/28)



# 54b55f34 24-Dec-2024 Yanqin Li <[email protected]>

fix(LQUncache): consider offset when allocating (#4080)

bug scene:

When the valid vector of ldu0-2 is [0, 0, 1] and the freelist can only
allocate one entry (the `canAllocate` vector is [1, 0, 0]), ldu2's
request cannot be allocated and is then rolled back. This is because the
allocation did not take the valid offset into account.
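Offset-aware allocation can be sketched as: the i-th *valid* request consumes the i-th *free* entry, instead of pairing port i directly with slot i. Names here are assumptions for illustration, not the RTL:

```scala
// Offset-aware allocation sketch: each valid port consumes the next free
// entry, so ldu2 can still be granted when only one entry is free.
object AllocModel {
  // valid(i): port i has a request; freeEntries: number of allocatable slots.
  // Returns which ports are granted an entry this cycle.
  def grant(valid: Seq[Boolean], freeEntries: Int): Seq[Boolean] = {
    var used = 0
    valid.map { v =>
      val ok = v && used < freeEntries
      if (ok) used += 1
      ok
    }
  }
}
```

In the bug scene, valid = [0, 0, 1] with one free entry now grants ldu2, avoiding the spurious rollback.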



# 8b33cd30 13-Dec-2024 klin02 <[email protected]>

feat(XSLog): move all XSLog outside WhenContext for collection

Data in a WhenContext is not accessible from another module. To support
XSLog collection, we move all XSLog calls and related signals outside the
WhenContext. For example, `when(cond1){XSDebug(cond2, pable)}` becomes
`XSDebug(cond1 && cond2, pable)`.



# e10e20c6 27-Nov-2024 Yanqin Li <[email protected]>

style(pbmt): remove the useless and standardize code

* style(pbmt): remove outstanding constant which is just for self-test

* fix(uncache): added mask comparison for `addrMatch`

* style(mem): code normalization

* fix(pbmt): handle cases where the load unit is byte, word, etc

* style(uncache): fix an import

* fix(uncache): address match should use the non-offset address when forwarding

In this case, to ensure correct forwarding, stores with the same address but overlapping masks cannot be entered at the same time.

* style(RAR): remove redundant design of `nc` reg



# e9e6cd09 27-Nov-2024 Yanqin Li <[email protected]>

perf(uncache): mmio and nc share LQUncache; nc data can writeback to ldu1-2