The adjustments made for XSA-379 as well as those subsequently becoming
XSA-387 still left a race window, when a HVM or PVH guest does a grant
table version change from v2 to v1 in parallel with mapping the status
page(s) via XENMEM_add_to_physmap. Some of the status pages may then be
freed while mappings of them would still be inserted into the guest's
secondary (P2M) page tables.
Any guest can cause xenstored to crash by issuing a XS_RESET_WATCHES
command within a transaction due to an assert() triggering.
In case xenstored was built with NDEBUG #defined nothing bad will
happen, as assert() is doing nothing in this case. Note that the
default is not to define NDEBUG for xenstored builds even in release
builds of Xen.
Any guest issuing a Xenstore command accessing a node using the
(illegal) node path "/local/domain/", will crash xenstored due to a
clobbered error indicator in xenstored when verifying the node path.
Note that the crash is forced via a failing assert() statement in
xenstored. In case xenstored is being built with NDEBUG #defined,
an unprivileged guest trying to access the node path "/local/domain/"
will result in it no longer being serviced by xenstored, other guests
(including dom0) will still be serviced, but xenstored will use up
all cpu time it can get.
The Intel EPT paging code uses an optimization to defer flushing of any cached
EPT state until the p2m lock is dropped, so that multiple modifications done
under the same locked region only issue a single flush.
Freeing of paging structures however is not deferred until the flushing is
done, and can result in freed pages transiently being present in cached state.
Such stale entries can point to memory ranges not owned by the guest, thus
allowing access to unintended memory regions.
In the context switch logic Xen attempts to skip an IBPB in the case of
a vCPU returning to a CPU on which it was the previous vCPU to run.
While safe for Xen's isolation between vCPUs, this prevents the guest
kernel correctly isolating between tasks. Consider:
1) vCPU runs on CPU A, running task 1.
2) vCPU moves to CPU B, idle gets scheduled on A. Xen skips IBPB.
3) On CPU B, guest kernel switches from task 1 to 2, issuing IBPB.
4) vCPU moves back to CPU A. Xen skips IBPB again.
Now, task 2 is running on CPU A with task 1's training still in the BTB.
Shadow mode tracing code uses a set of per-CPU variables to avoid
cumbersome parameter passing. Some of these variables are written to
with guest controlled data, of guest controllable size. That size can
be larger than the variable, and bounding of the writes was missing.
When passing through PCI devices, the detach logic in libxl won't remove
access permissions to any 64bit memory BARs the device might have. As a
result a domain can still have access any 64bit memory BAR when such
device is no longer assigned to the domain.
For PV domains the permission leak allows the domain itself to map the memory
in the page-tables. For HVM it would require a compromised device model or
stubdomain to map the leaked memory into the HVM domain p2m.
[This CNA information record relates to multiple CVEs; the
text explains which aspects/vulnerabilities correspond to which CVE.]
Some Viridian hypercalls can specify a mask of vCPU IDs as an input, in
one of three formats. Xen has boundary checking bugs with all three
formats, which can cause out-of-bounds reads and writes while processing
the inputs.
* CVE-2025-58147. Hypercalls using the HV_VP_SET Sparse format can
cause vpmask_set() to write out of bounds when converting the bitmap
to Xen's format.
* CVE-2025-58148. Hypercalls using any input format can cause
send_ipi() to read d->vcpu[] out-of-bounds, and operate on a wild
vCPU pointer.
[This CNA information record relates to multiple CVEs; the
text explains which aspects/vulnerabilities correspond to which CVE.]
Some Viridian hypercalls can specify a mask of vCPU IDs as an input, in
one of three formats. Xen has boundary checking bugs with all three
formats, which can cause out-of-bounds reads and writes while processing
the inputs.
* CVE-2025-58147. Hypercalls using the HV_VP_SET Sparse format can
cause vpmask_set() to write out of bounds when converting the bitmap
to Xen's format.
* CVE-2025-58148. Hypercalls using any input format can cause
send_ipi() to read d->vcpu[] out-of-bounds, and operate on a wild
vCPU pointer.
[This CNA information record relates to multiple CVEs; the
text explains which aspects/vulnerabilities correspond to which CVE.]
There are two issues related to the mapping of pages belonging to other
domains: For one, an assertion is wrong there, where the case actually
needs handling. A NULL pointer de-reference could result on a release
build. This is CVE-2025-58144.
And then the P2M lock isn't held until a page reference was actually
obtained (or the attempt to do so has failed). Otherwise the page can
not only change type, but even ownership in between, thus allowing
domain boundaries to be violated. This is CVE-2025-58145.
[This CNA information record relates to multiple CVEs; the
text explains which aspects/vulnerabilities correspond to which CVE.]
There are two issues related to the mapping of pages belonging to other
domains: For one, an assertion is wrong there, where the case actually
needs handling. A NULL pointer de-reference could result on a release
build. This is CVE-2025-58144.
And then the P2M lock isn't held until a page reference was actually
obtained (or the attempt to do so has failed). Otherwise the page can
not only change type, but even ownership in between, thus allowing
domain boundaries to be violated. This is CVE-2025-58145.
[This CNA information record relates to multiple CVEs; the
text explains which aspects/vulnerabilities correspond to which CVE.]
There are multiple issues related to the handling and accessing of guest
memory pages in the viridian code:
1. A NULL pointer dereference in the updating of the reference TSC area.
This is CVE-2025-27466.
2. A NULL pointer dereference by assuming the SIM page is mapped when
a synthetic timer message has to be delivered. This is
CVE-2025-58142.
3. A race in the mapping of the reference TSC page, where a guest can
get Xen to free a page while still present in the guest physical to
machine (p2m) page tables. This is CVE-2025-58143.
[This CNA information record relates to multiple CVEs; the
text explains which aspects/vulnerabilities correspond to which CVE.]
There are multiple issues related to the handling and accessing of guest
memory pages in the viridian code:
1. A NULL pointer dereference in the updating of the reference TSC area.
This is CVE-2025-27466.
2. A NULL pointer dereference by assuming the SIM page is mapped when
a synthetic timer message has to be delivered. This is
CVE-2025-58142.
3. A race in the mapping of the reference TSC page, where a guest can
get Xen to free a page while still present in the guest physical to
machine (p2m) page tables. This is CVE-2025-58143.
[This CNA information record relates to multiple CVEs; the
text explains which aspects/vulnerabilities correspond to which CVE.]
There are multiple issues related to the handling and accessing of guest
memory pages in the viridian code:
1. A NULL pointer dereference in the updating of the reference TSC area.
This is CVE-2025-27466.
2. A NULL pointer dereference by assuming the SIM page is mapped when
a synthetic timer message has to be delivered. This is
CVE-2025-58142.
3. A race in the mapping of the reference TSC page, where a guest can
get Xen to free a page while still present in the guest physical to
machine (p2m) page tables. This is CVE-2025-58143.
When setting up interrupt remapping for legacy PCI(-X) devices,
including PCI(-X) bridges, a lookup of the upstream bridge is required.
This lookup, itself involving acquiring of a lock, is done in a context
where acquiring that lock is unsafe. This can lead to a deadlock.
Certain instructions need intercepting and emulating by Xen. In some
cases Xen emulates the instruction by replaying it, using an executable
stub. Some instructions may raise an exception, which is supposed to be
handled gracefully. Certain replayed instructions have additional logic
to set up and recover the changes to the arithmetic flags.
For replayed instructions where the flags recovery logic is used, the
metadata for exception handling was incorrect, preventing Xen from
handling the the exception gracefully, treating it as fatal instead.
For a brief summary of Xapi terminology, see:
https://xapi-project.github.io/xen-api/overview.html#object-model-overview
Xapi contains functionality to backup and restore metadata about Virtual
Machines and Storage Repositories (SRs).
The metadata itself is stored in a Virtual Disk Image (VDI) inside an
SR. This is used for two purposes; a general backup of metadata
(e.g. to recover from a host failure if the filer is still good), and
Portable SRs (e.g. using an external hard drive to move VMs to another
host).
Metadata is only restored as an explicit administrator action, but
occurs in cases where the host has no information about the SR, and must
locate the metadata VDI in order to retrieve the metadata.
The metadata VDI is located by searching (in UUID alphanumeric order)
each VDI, mounting it, and seeing if there is a suitable metadata file
present. The first matching VDI is deemed to be the metadata VDI, and
is restored from.
In the general case, the content of VDIs are controlled by the VM owner,
and should not be trusted by the host administrator.
A malicious guest can manipulate its disk to appear to be a metadata
backup.
A guest cannot choose the UUIDs of its VDIs, but a guest with one disk
has a 50% chance of sorting ahead of the legitimate metadata backup. A
guest with two disks has a 75% chance, etc.
PVH guests have their ACPI tables constructed by the toolstack. The
construction involves building the tables in local memory, which are
then copied into guest memory. While actually used parts of the local
memory are filled in correctly, excess space that is being allocated is
left with its prior contents.
The hypervisor contains code to accelerate VGA memory accesses for HVM
guests, when the (virtual) VGA is in "standard" mode. Locking involved
there has an unusual discipline, leaving a lock acquired past the
return from the function that acquired it. This behavior results in a
problem when emulating an instruction with two memory accesses, both of
which touch VGA memory (plus some further constraints which aren't
relevant here). When emulating the 2nd access, the lock that is already
being held would be attempted to be re-acquired, resulting in a
deadlock.
This deadlock was already found when the code was first introduced, but
was analysed incorrectly and the fix was incomplete. Analysis in light
of the new finding cannot find a way to make the existing locking
discipline work.
In staging, this logic has all been removed because it was discovered
to be accidentally disabled since Xen 4.7. Therefore, we are fixing the
locking problem by backporting the removal of most of the feature. Note
that even with the feature disabled, the lock would still be acquired
for any accesses to the VGA MMIO region.
In x86's APIC (Advanced Programmable Interrupt Controller) architecture,
error conditions are reported in a status register. Furthermore, the OS
can opt to receive an interrupt when a new error occurs.
It is possible to configure the error interrupt with an illegal vector,
which generates an error when an error interrupt is raised.
This case causes Xen to recurse through vlapic_error(). The recursion
itself is bounded; errors accumulate in the the status register and only
generate an interrupt when a new status bit becomes set.
However, the lock protecting this state in Xen will try to be taken
recursively, and deadlock.