History log of /XiangShan/src/main/scala/xiangshan/mem/lsqueue/LSQWrapper.scala (Results 1 – 25 of 154)
Revision Date Author Comments
# 522c7f99 07-Mar-2025 Anzo <[email protected]>

fix(LSU): misaligned violation detection stuck (#4369)

Since a load instruction that crosses a 16-byte boundary must be split and
accessed twice, it enters the `RAR Queue` twice but occupies only one
`virtual load queue` entry. In the extreme case, 36 load instructions that
each cross a 16-byte boundary can therefore fill all 72 `RAR Queue` entries.
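The occupancy arithmetic above can be checked with a minimal plain-Scala sketch (not the Chisel RTL). `RarOccupancy`, `crosses16B` and `totalRar` are hypothetical names, and the sketch assumes each split half needs exactly one RAR entry.

```scala
object RarOccupancy {
  // True if an access of `sizeBytes` starting at `addr` spans a 16-byte boundary.
  def crosses16B(addr: Long, sizeBytes: Int): Boolean =
    (addr & 0xfL) + sizeBytes > 16

  // A crossing load is split into two accesses, each needing its own RAR entry.
  def rarEntriesNeeded(addr: Long, sizeBytes: Int): Int =
    if (crosses16B(addr, sizeBytes)) 2 else 1

  // Total RAR entries used by a group of in-flight loads (one virtual LQ entry each).
  def totalRar(loads: Seq[(Long, Int)]): Int =
    loads.map { case (a, s) => rarEntriesNeeded(a, s) }.sum
}
```

With 36 four-byte loads at offset 0xe, `totalRar` yields 72 while only 36 virtual load queue entries are in use.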

---

There was a problem with our previous handling: if the oldest load
instruction spanning a 16-byte boundary entered the `replayqueue` while an
instruction in the `loadmisalignbuffer` could not finish executing because
the `RAR Queue` was full, the oldest load could never be issued, because
the `loadmisalignbuffer` always had an instruction in it.

---

Therefore, we use a more aggressive scheme: when the `RAR Queue` is full,
the misaligned load triggers a rollback, and the next load instruction that
the `loadmisalignbuffer` can accept must be the oldest one (if it is
misaligned).
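The policy can be summarized as two decisions, shown here as a hedged plain-Scala sketch. `MisalignRarPolicy`, `onRarRequest`, `bufferAccepts` and the `oldestOnly` flag are hypothetical names; the real logic lives in the LoadUnit/LoadMisalignBuffer RTL and may be structured differently.

```scala
object MisalignRarPolicy {
  sealed trait Action
  case object Rollback extends Action       // flush and re-execute from this load
  case object NormalHandling extends Action // existing RAR-full handling, not modeled here

  // A misaligned load that cannot get a RAR entry because the RAR Queue is full is
  // rolled back instead of waiting, so it cannot occupy the misalign buffer forever.
  def onRarRequest(isMisaligned: Boolean, rarFull: Boolean): Action =
    if (isMisaligned && rarFull) Rollback else NormalHandling

  // After such a rollback, the misalign buffer only accepts the oldest load.
  // `oldestOnly` is a hypothetical flag set by the rollback above.
  def bufferAccepts(oldestOnly: Boolean, isOldest: Boolean): Boolean =
    !oldestOnly || isOldest
}
```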



# 3c808de0 17-Feb-2025 Anzo <[email protected]>

fix(LSU): fix cbo instr exceptions and implementation (#4262)

1. Fix a typo.
2. `cbo` instructions no longer raise misaligned exceptions.
3. `cbo zero` instructions now flush the `sbuffer`.
4. `cbo zero` now sets the mask correctly.
5. Add RAW checks for `cbo zero`.
6. Add trigger (Debug Mode) checks for `cbo zero`.
7. Fix several issues with the CBO instructions in NEMU.
----

To avoid ambiguity with `io.mmioStout`, a new `StoreQueue` port is
introduced to write back `cbo zero` after the `sbuffer` has been flushed.
Arbitration is performed in `MemBlock`, and `cbo zero` currently has
higher priority by default.
`cbo zero` should not be written back at the same time as `mmio`.
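A minimal sketch of the fixed-priority arbitration described above, in plain Scala rather than Chisel. `StoreWritebackArb`, `Wb` and `arbitrate` are hypothetical names standing in for the two `StoreQueue` writeback ports.

```scala
object StoreWritebackArb {
  final case class Wb(valid: Boolean, tag: String)

  // Fixed priority: the cbo zero writeback wins over the mmio store writeback.
  // Since both are not expected at once, priority only matters as a safety net.
  def arbitrate(cboZero: Wb, mmio: Wb): Option[Wb] =
    if (cboZero.valid) Some(cboZero)
    else if (mmio.valid) Some(mmio)
    else None
}
```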

---
A check on `CacheLine` has been added to `RAWQueue` to ensure memory
consistency when executing `cbo zero`.
See https://github.com/OpenXiangShan/XiangShan/issues/4240 for details.
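Because `cbo zero` writes an entire cache line, a younger load conflicts with it whenever both touch the same line, regardless of byte offset. The sketch below illustrates that comparison; the 64-byte line size and the names `CacheLineRaw` and `cboZeroConflicts` are assumptions for illustration.

```scala
object CacheLineRaw {
  val LineBytes = 64L // assumed L1 cache line size

  // Clear the block-offset bits to get the cache-line address.
  def lineAddr(paddr: Long): Long = paddr & ~(LineBytes - 1)

  // A cbo zero overlaps a younger load if both addresses fall in the same line.
  def cboZeroConflicts(cboPaddr: Long, loadPaddr: Long): Boolean =
    lineAddr(cboPaddr) == lineAddr(loadPaddr)
}
```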

---
The `cbo` instruction requires a trigger check.

---------

Co-authored-by: zhanglinjuan <[email protected]>



# 9e12e8ed 08-Feb-2025 cz4e <[email protected]>

style(Bundles): move bundles to Bundles.scala (#4247)


# 74050fc0 26-Jan-2025 Yanqin Li <[email protected]>

perf(Uncache): add merge policy when entering (#4154)

# Background

## Problem

How to design a more efficient entry rule for a new load/store request
when a load/store with the same address already exists in the `ubuffer`?

* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.

## Merge Scenarios

‼️ A new request can be merged into an existing one only if both are
`NC`.

1. **New Store Request:**
   1. **Existing Store:** Merge (the new store is younger).
   2. **Existing Load:** Reject.

2. **New Load Request:**
   1. **Existing Load:** Merge (the new load may be younger or older;
      either way, merging is fine).
   2. **Existing Store:** Reject.
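The kind-compatibility part of these scenarios reduces to one predicate. This is a hedged sketch: `UbufferMerge`, `Req` and `canMergeByKind` are hypothetical names, and the address and attribute conditions from the entry rules are deliberately left out.

```scala
object UbufferMerge {
  sealed trait Kind
  case object Load extends Kind
  case object Store extends Kind

  final case class Req(kind: Kind, isNC: Boolean)

  // Only NC requests of the same kind may merge: store into store, load into load.
  def canMergeByKind(incoming: Req, existing: Req): Boolean =
    incoming.isNC && existing.isNC && incoming.kind == existing.kind
}
```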

# What does this PR do?

## 1. Entry Actions

1. **Allocate** a new entry and mark it as `valid`:
   1. When there is no matching address.
2. **Allocate** a new entry and mark it as `valid` and `waitSame`:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry has been selected to issue or has issued.
3. **Merge** into an existing entry:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry has **not** been selected to issue or issued.
4. **Reject** the new request:
   1. When the ubuffer is full.
   2. When there is a matching address, but:
      * The virtual addresses or attributes are **different**.
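The four entry actions can be read as a single decision function. In this plain-Scala sketch, `UbufferEntry`, `AddrMatch` and `decide` are hypothetical names, and a full ubuffer is treated as an unconditional reject, which is the literal reading of rule 4.1; the RTL may order these checks differently.

```scala
object UbufferEntry {
  sealed trait Action
  case object AllocValid extends Action    // new entry, valid
  case object AllocWaitSame extends Action // new entry, valid + waitSame
  case object Merge extends Action         // merge into the matching entry
  case object Reject extends Action

  // Simplified view of an existing entry whose address matches the new request.
  final case class AddrMatch(sameVaddrAndAttr: Boolean, olderSelectedOrIssued: Boolean)

  def decide(full: Boolean, matched: Option[AddrMatch]): Action =
    if (full) Reject
    else matched match {
      case None                               => AllocValid
      case Some(m) if !m.sameVaddrAndAttr     => Reject
      case Some(m) if m.olderSelectedOrIssued => AllocWaitSame
      case Some(_)                            => Merge
    }
}
```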

**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must
be continuous and naturally aligned, and the `addr` must correspond to
the mask. Therefore, the "**same attributes**" here introduces a new
condition: the merged `mask` must meet the requirements of being
continuous and naturally aligned (function `continueAndAlign`). During
merging, the block offset of addr must be synchronously updated in
`UncacheEntry.update`.
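A hypothetical re-implementation of the "contiguous and naturally aligned" condition, for illustration only (it is not the actual `continueAndAlign`). It assumes a per-byte mask in an `Int` and treats the new block offset as the index of the lowest enabled byte.

```scala
object MaskCheck {
  // True if the set bits form one contiguous run whose length is a power of two
  // and whose start index is a multiple of that length.
  def continueAndAlign(mask: Int): Boolean =
    if (mask == 0) false
    else {
      val low = Integer.numberOfTrailingZeros(mask)
      val run = mask >>> low
      val len = Integer.bitCount(mask)
      val contiguous = (run & (run + 1)) == 0 // run has the form 0b0..01..1
      val powerOfTwo = (len & (len - 1)) == 0
      contiguous && powerOfTwo && low % len == 0
    }

  // Merge two masks; if legal, return the merged mask and the updated block offset.
  def merge(oldMask: Int, newMask: Int): Option[(Int, Int)] = {
    val merged = oldMask | newMask
    if (continueAndAlign(merged)) Some((merged, Integer.numberOfTrailingZeros(merged)))
    else None
  }
}
```

For example, merging byte masks `0x04` and `0x08` gives `0x0c` at offset 2, while `0x02` and `0x04` give `0x06`, which is contiguous but not naturally aligned, so the merge is refused.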

## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)`

> `mid`: master id
>
> `sid`: slave id

**Old Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req`, records the **`mid`**.
- `S` sends a `resp` with the **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.

**New Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}`.
- `M` matches the response by **`mid`** and updates its record with the
received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.

**Benefit:** The new design allows `S` to merge requests when a new
request enters.
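The master side of the new handshake can be modeled as a `mid -> sid` table that is filled in by the ID response and consulted on the data response. `UncacheHandshake`, `IdResp`, `Resp` and `Master` are hypothetical names; the sketch assumes several `mid`s may share one `sid` once `S` merges them.

```scala
import scala.collection.mutable

object UncacheHandshake {
  final case class IdResp(mid: Int, sid: Int)
  final case class Resp(sid: Int, data: Long)

  // Master side: remember which slave entry (sid) serves each master entry (mid).
  final class Master {
    private val sidOf = mutable.Map.empty[Int, Int]

    def onIdResp(r: IdResp): Unit = sidOf(r.mid) = r.sid

    // Complete every mid recorded against this sid, then drop those records.
    def onResp(r: Resp): Seq[Int] = {
      val done = sidOf.collect { case (mid, sid) if sid == r.sid => mid }.toSeq.sorted
      done.foreach(mid => sidOf.remove(mid))
      done
    }
  }
}
```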

## 3. Forwarding Mechanism

**Old Design:** Each address in the `ubuffer` is **unique**, so
forwarding is straightforward based on a match.

**New Design:**

* A single address may match up to two entries in the `ubuffer`.
* If it matches two entries, one must be marked `inflight` and the other
`waitSame`. In this case, the forwarded data is the merged data of the two
entries, with the `inflight` entry being the older one.
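A sketch of the two-entry forwarding merge, in which bytes written by the younger `waitSame` entry override those of the older `inflight` entry. `UncacheForward`, `Entry` and `forward` are hypothetical names, and the 8-byte entry width is an assumption.

```scala
object UncacheForward {
  // One ubuffer entry: data bytes plus a per-byte enable mask.
  final case class Entry(data: Array[Byte], mask: Int)

  // Start from the older entry, then overlay bytes enabled in the younger entry.
  def forward(inflight: Entry, waitSame: Option[Entry]): (Array[Byte], Int) = {
    val out = inflight.data.clone()
    var mask = inflight.mask
    waitSame.foreach { y =>
      for (i <- out.indices if ((y.mask >> i) & 1) == 1) out(i) = y.data(i)
      mask |= y.mask
    }
    (out, mask)
  }
}
```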

## 4. Bug Fixes

1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`,
because when `tlbValid` is false, `!tlbMiss` can still be true.
2. `Uncache` state machine transition: The state indicating "**able to
send requests**" (previously `s_refill_req`, now `s_inflight`) should
not be triggered by `reqFire` but rather by `acquireFire`.
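Bug fix 1 amounts to qualifying the miss signal with its valid bit. A one-line sketch with a hypothetical `TlbHit.tlbHit` helper:

```scala
object TlbHit {
  // !tlbMiss alone is not a hit: with no valid TLB response, tlbMiss can also be low.
  def tlbHit(tlbValid: Boolean, tlbMiss: Boolean): Boolean = tlbValid && !tlbMiss
}
```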


# Evaluation

- ✅ timing
- ✅ performance

| Type           | 4B × 1000 | Speedup vs. IO (4B) | 1B × 4096 | Speedup vs. IO (1B) |
| -------------- | --------- | ------------------- | --------- | ------------------- |
| IO             | 51026     | 1.00                | 208149    | 1.00                |
| NC             | 42343     | 1.21                | 169248    | 1.23                |
| NC+OT          | 20379     | 2.50                | 160101    | 1.30                |
| NC+OT+mergeOpt | 16308     | 3.13                | 126369    | 1.65                |
| cache          | 1298      | 39.31               | 4410      | 47.20               |
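The speedup columns are consistent with dividing the IO result by each configuration's result, so lower raw numbers are better (the table does not state the unit; cycles is a plausible but unconfirmed guess). A small sketch reproducing a few entries, with hypothetical helper names:

```scala
object Speedup {
  // Speedup relative to the IO baseline: baseline result / configuration result.
  def speedup(base: Double, value: Double): Double = base / value

  // Round to two decimals, as in the table.
  def round2(x: Double): Double = math.round(x * 100) / 100.0
}
```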



# e836c770 16-Jan-2025 Zhaoyang You <[email protected]>

feat(TopDown): add TopDown PMU Events (#4122)

This PR adds hardware synthesizable three-level categorized TopDown
performance counters.
Level-1: Retiring, Frontend Bound, Bad Speculation, Backend Bound.
Level-2: Fetch Latency Bound, Fetch Bandwidth Bound, Branch
Misprediction, Machine Clears, Core Bound, Memory Bound.
Level-3: L1 Bound, L2 Bound, L3 Bound, Mem Bound, Store Bound.
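The commit lists the categories per level but not the parent links. The sketch below assumes the usual Top-Down Microarchitecture Analysis hierarchy (for example, Memory Bound under Backend Bound, and the Level-3 categories under Memory Bound); `TopDownTree` is a hypothetical name.

```scala
object TopDownTree {
  // Parent of each Level-2 and Level-3 category (assumed standard TopDown hierarchy).
  val parent: Map[String, String] = Map(
    "Fetch Latency Bound"   -> "Frontend Bound",
    "Fetch Bandwidth Bound" -> "Frontend Bound",
    "Branch Misprediction"  -> "Bad Speculation",
    "Machine Clears"        -> "Bad Speculation",
    "Core Bound"            -> "Backend Bound",
    "Memory Bound"          -> "Backend Bound",
    "L1 Bound"              -> "Memory Bound",
    "L2 Bound"              -> "Memory Bound",
    "L3 Bound"              -> "Memory Bound",
    "Mem Bound"             -> "Memory Bound",
    "Store Bound"           -> "Memory Bound"
  )

  // Path from a category up to its Level-1 root.
  def path(cat: String): List[String] =
    parent.get(cat) match {
      case Some(p) => cat :: path(p)
      case None    => List(cat)
    }
}
```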



# be8e95bc 25-Dec-2024 Anzo <[email protected]>

fix(MemBlock): fix overflow during lsqptr calculation (#4084)

The addition previously used to calculate the `lsq` pointer could
overflow: `numLsElem` is only 5 bits wide, so accumulating it across
multiple uops overflows.

---

In theory, previous versions had this problem as well, but the bug did
not show up until `newDispatch`.
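The overflow is ordinary fixed-width wraparound. This sketch (hypothetical `LsqPtrOverflow` helper) truncates each partial sum to `width` bits the way a narrow hardware adder would; widening the accumulator is one way to avoid it, though the exact fix in #4084 may differ.

```scala
object LsqPtrOverflow {
  val NumLsElemBits = 5

  // Sum per-uop element counts, truncating each partial sum to `width` bits.
  def sumTruncated(counts: Seq[Int], width: Int): Int =
    counts.foldLeft(0)((acc, c) => (acc + c) & ((1 << width) - 1))
}
```

Two uops with 16 elements each sum to 32, which wraps to 0 in 5 bits but is preserved with 6 bits.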



# 0a7d1d5c 22-Nov-2024 xiaofeibao <[email protected]>

feat(backend): NewDispatch


# b240e1c0 07-Nov-2024 Anzooooo <[email protected]>

feat(Zicclsm): refactoring misalign and support vector misalign


# e9e6cd09 27-Nov-2024 Yanqin Li <[email protected]>

perf(uncache): mmio and nc share LQUncache; nc data can writeback to ldu1-2


# e04c5f64 19-Nov-2024 Yanqin Li <[email protected]>

feat(outstanding): support nc outstanding and remove mmio st outstanding


# bb76fc1b 10-Oct-2024 Yanqin Li <[email protected]>

fix(NC): fix a list of bugs of NC WMO access

* fix(PBMT): skip nc difftest and handle the conflict of nc and normal store

* fix(PBMT): nc st req is changed to a state machine execution

* fix(pbmt): fix typo and control error of nc ld

* fix(pbmt): nc data assignment error

* fix(pbmt): nc should be used to wakeup

* fix(pbmt): remove wrong assert

* fix(pbmt): lots of bugs of nc st ld forward

* fix(pbmt): fix address align error



# c7353d05 03-Sep-2024 Yanqin Li <[email protected]>

feat(NCld): support WMO access for NC ld

* feat(LDU): add support for NC in LoadUnit

* feat(LQ,UB): add support for NC in load queue and uncache buffer

* chore(pbmt): add xsperf for nc ld statistic


# dc4fac13 02-Dec-2024 CharlieLiu <[email protected]>

feat(DCache): merge CMO requests into DCache TL-A Channel (#3968)

* remove previous cmo datapath in memblock.
* add datapath for cmo requests between lsq and dcache.
* add new CMOUnit in MissQueue.
* bump rocket-chip & coupledL2.



# 189d8d00 29-Oct-2024 Anzo <[email protected]>

refactor(MemBlock): turn on `dontTouch` only when debugging (#3792)

This produces cleaner generated code and may remove some of the
pseudo-paths.


# cee1d5b2 15-Oct-2024 Yanqin Li <[email protected]>

fix(lsq): uncache req can be assigned only in idle state (#3732)

**Bug Description:**

When an uncache store (st) is immediately followed by an uncache load
(ld), the `AddPipelineReg` in MemBlock between the LSQ and the Uncache
means that `MemBlock.uncacheReq.ready` is still true even while Uncache
is handling the store request. Under the original assignment conditions,
the load request (ld req) from the LQ is accepted by
`MemBlock.uncacheReq` while the LSQ is in the `s_store` state. By the
time Uncache receives it, the LSQ state has already transitioned from
`s_store` to `s_idle` without switching to `s_load`. As a result, the
load response (ld resp) from Uncache can never be received by the LSQ.
The sequence is as follows:

1. SQ: st req
2. Uncache: st req received
3. LQ: ld req in `s_store` state
4. Uncache: st resp
5. SQ: st resp received; Uncache: ld req received
6. LSQ: state to `s_idle`
7. Uncache: ld resp
8. **ERROR**: LSQ cannot receive ld resp in `s_idle` state

**Fix**: In the LSQ, an uncache request can be assigned only in the idle state.
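A plain-Scala model of the fix, in which a new uncache request is accepted only in the idle state, so a load can no longer slip in while the LSQ is still in `s_store`. The state names `SIdle`, `SLoad` and `SStore` stand in for `s_idle`, `s_load` and `s_store` (Scala treats lowercase names in patterns as variables); `UncacheLsqFsm` and `accept` are hypothetical.

```scala
object UncacheLsqFsm {
  sealed trait State
  case object SIdle extends State
  case object SLoad extends State
  case object SStore extends State

  // Returns (accepted, nextState). Only SIdle accepts a new request.
  def accept(state: State, isLoad: Boolean): (Boolean, State) =
    state match {
      case SIdle => (true, if (isLoad) SLoad else SStore)
      case other => (false, other)
    }
}
```

Replaying steps 1 to 3 of the trace: the store is accepted from `SIdle`, and the following load is now refused while the FSM is in `SStore`, so it is retried after the store response returns the FSM to idle.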



# 46e9ee74 27-Sep-2024 Haoyuan Feng <[email protected]>

fix(exception): fix exception vaddr generate logic (#3639)

In LSU, for exceptions that can be detected before address translation
(`preaf`, `prepf` or `pregpf`), the original vaddr should be retained.
For exceptions detected after address translation, the 48-bit vaddr must
be zero-extended or sign-extended according to the mode
(`GenExceptionVa`) and then written to *tval.

Also fix some connection bugs.
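The two extensions mentioned above, sketched for a 64-bit `Long`. The commit does not spell out which modes use which extension, so `useSignExt` in the hypothetical `genTval` helper only stands in for that choice.

```scala
object ExceptionVa {
  private val VaBits = 48

  // Sign-extend bit 47 of a 48-bit virtual address to 64 bits.
  def signExtend48(va: Long): Long = (va << (64 - VaBits)) >> (64 - VaBits)

  // Zero-extend a 48-bit virtual address to 64 bits.
  def zeroExtend48(va: Long): Long = va & ((1L << VaBits) - 1)

  // `useSignExt` is a stand-in for the mode-dependent choice made by GenExceptionVa.
  def genTval(va: Long, useSignExt: Boolean): Long =
    if (useSignExt) signExtend48(va) else zeroExtend48(va)
}
```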



# ad415ae0 21-Sep-2024 Xiaokun-Pei <[email protected]>

feat(trap): support m/htinst for specific G-stage translation (#3604)

According to the RISC-V privileged spec, mtinst/htinst may always be
written with zero on a trap into M/HS-mode, except for Guest-Page-Fault
traps that meet both of the following conditions:
- the trap is caused by a G-stage translation that supports VS-stage
translation
- a nonzero value is written to mtval2/htval

`isForVSnonLeafPTE` is used only in the exceptional case where a gpf
occurs during a G-stage translation that supports VS-stage translation,
such as when looking up a non-leaf PTE of the VS stage.

This patch adds support for writing the proper value to mtinst/htinst
when such a trap occurs. It also bumps NEMU.



# db6cfb5a 19-Sep-2024 Haoyuan Feng <[email protected]>

fix(exception): check high address bits of lsu (#3596)

In the previous implementation, we simply truncated the high bits of jump
targets and load/store addresses, which made it impossible to raise
exceptions in such cases.

Commit
https://github.com/OpenXiangShan/XiangShan/commit/c1b28b66879239a5b3a44741376f3b002e8ac834
already fixed the high-address-bit check for jump targets. This commit
fixes the LSU side by checking the full address in the TLB and passing
the full address directly to the CSR.
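What a full-address check catches that truncation hides: under SvN translation (for example Sv39 or Sv48), bits 63 down to N-1 of a virtual address must all be equal. A hypothetical `HighBitsCheck.isCanonical` helper illustrates this; it does not model Bare mode or XiangShan's actual TLB logic.

```scala
object HighBitsCheck {
  // Valid iff bits 63..(vaBits-1) are all zeros or all ones
  // (the arithmetic shift preserves the sign).
  def isCanonical(va: Long, vaBits: Int): Boolean = {
    val top = va >> (vaBits - 1)
    top == 0L || top == -1L
  }
}
```

Truncating `0x4000000000` to 39 bits would silently produce a legal-looking address, whereas the full check flags it for Sv39.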



# b4d41c12 10-Sep-2024 xiaofeibao <[email protected]>

timing(LsqEnqCtrl): fix timing of lqAllocNumber and sqAllocNumber


# 94998b06 04-Sep-2024 happy-lx <[email protected]>

fix(Zicclsm, trigger): fix the problem of missing breakpoint exception (#3470)

+ @wissygh refactored the trigger-check code of MemBlock.
+ Move the trigger address comparison from load stage S3 to S1. In
addition, trigger detection is moved from MemBlock to LoadUnit.
  - Once a breakpoint exception is detected, the instruction enters the
  exception buffer directly (previously, the load instruction was
  executed first and the exception handled afterwards, which could cause
  an mmio load to change the state of a peripheral).
+ If a trigger address matches and the action is to enter Debug Mode,
both LoadUnit and StoreUnit write the instruction back directly without
executing it (by marking it as an exception).
+ Match trigger addresses for vector instructions in LoadUnit.
+ If both a misaligned exception and a breakpoint occur, the breakpoint
exception is processed first.
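The last rule is a simple priority selection, which also matches the RISC-V ordering of address breakpoints ahead of misaligned exceptions. This sketch models only those two sources; `LoadExceptionPriority` and `select` are hypothetical names.

```scala
object LoadExceptionPriority {
  sealed trait Exc
  case object Breakpoint extends Exc
  case object LoadAddrMisaligned extends Exc

  // When both are detected for the same access, the breakpoint is reported.
  def select(breakpoint: Boolean, misaligned: Boolean): Option[Exc] =
    if (breakpoint) Some(Breakpoint)
    else if (misaligned) Some(LoadAddrMisaligned)
    else None
}
```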

---------

Co-authored-by: chengguanghui <[email protected]>



# e3ed843c 30-Aug-2024 happy-lx <[email protected]>

Remove `RVA23` prefix and enable CMO by default (#3431)

+ Remove `RVA23` prefix to clean up code
+ Set `hasCMO` to true by default


# 3fbc86fc 26-Aug-2024 Chen Xi <[email protected]>

RVA23 CMO (Cache Maintenance Operation) (#3426)

Supports Zicbom Extension (Clean/Flush/Invalid)
- https://github.com/OpenXiangShan/CoupledL2/pull/225

This PR also includes other CPL2 changes:
- bug fixes
- timing fixes
- SRAM-Queue | https://github.com/OpenXiangShan/CoupledL2/pull/228
- data SRAM split into 4 | https://github.com/OpenXiangShan/CoupledL2/pull/229

---------

Co-authored-by: lixin <[email protected]>



# 41d8d239 21-Aug-2024 happy-lx <[email protected]>

RVA23: Support Zicclsm & Zama16b (Handling Unaligned Load Store by Hardware) (#3320)

This PR adds hardware handling of misaligned loads and stores (instead
of raising misaligned exceptions) and provides CSR-controlled switches.
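One way to picture hardware handling of a misaligned access is to split it at a 16-byte boundary into two pieces that each stay within one block. This is a sketch under that assumption only: the actual misalign buffer may issue wider aligned accesses and merge the results, and `MisalignSplit` and `split16` are hypothetical names.

```scala
object MisalignSplit {
  final case class Access(addr: Long, size: Int)

  // Split an access that crosses a 16-byte boundary into two in-block pieces;
  // an access that fits in one block is returned unchanged.
  def split16(a: Access): Seq[Access] = {
    val offset = (a.addr & 0xfL).toInt
    if (offset + a.size <= 16) Seq(a)
    else {
      val firstSize = 16 - offset
      Seq(Access(a.addr, firstSize), Access(a.addr + firstSize, a.size - firstSize))
    }
  }
}
```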

---------

Co-authored-by: xiaofeibao <[email protected]>



# 5003e6f8 23-Jul-2024 Huijin Li <[email protected]>

LSQ: optimize static clock gating coverage and fix x_value in vcs (#3176)

optimize LSQ static clock gating coverage, fix x_value in vcs


# 16ede6bb 11-Jul-2024 weiding liu <[email protected]>

MemBlock: refactor selectOldest of rollback for better timing

Don't select the oldest rollback twice. Instead, LoadQueueRAW gets ports
that send its rollback requests to MemBlock, where the oldest one is
selected together with the other rollback sources.
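A single oldest-first selection over all rollback sources can be sketched with a circular ROB pointer made of a wrap flag and a value, the usual circular-queue convention (assumed here). `RollbackSelect`, `RobPtr`, `isBefore` and `selectOldest` are hypothetical names.

```scala
object RollbackSelect {
  // ROB index as a circular-queue pointer: wrap flag plus value.
  final case class RobPtr(flag: Boolean, value: Int)

  // With equal flags the smaller value is older; with different flags the
  // larger value is older, because the other pointer has wrapped around.
  def isBefore(a: RobPtr, b: RobPtr): Boolean =
    if (a.flag == b.flag) a.value < b.value else a.value > b.value

  // Pick the oldest valid request among all ports in one pass.
  def selectOldest(reqs: Seq[Option[RobPtr]]): Option[RobPtr] =
    reqs.flatten.reduceOption((a, b) => if (isBefore(b, a)) b else a)
}
```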


