| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
mm/rmap: fix incorrect pte restoration for lazyfree folios
We batch unmap anonymous lazyfree folios by folio_unmap_pte_batch. If the
batch has a mix of writable and non-writable bits, we may end up setting
the entire batch writable. Fix this by respecting writable bit during
batching.
Although on a successful unmap of a lazyfree folio, the soft-dirty bit is
lost, preserve it on pte restoration by respecting the bit during
batching, to make the fix consistent w.r.t both writable bit and
soft-dirty bit.
I was able to write the below reproducer and crash the kernel.
Explanation of reproducer (set 64K mTHP to always):
Fault in a 64K large folio. Split the VMA at mid-point with
MADV_DONTFORK. fork() - parent points to the folio with 8 writable ptes
and 8 non-writable ptes. Merge the VMAs with MADV_DOFORK so that
folio_unmap_pte_batch() can determine all the 16 ptes as a batch. Do
MADV_FREE on the range to mark the folio as lazyfree. Write to the memory
to dirty the pte, eventually rmap will dirty the folio. Then trigger
reclaim, we will hit the pte restoration path, and the kernel will crash
with the trace given below.
The BUG happens at:
BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
The code path is asking for anonymous page to be mapped writable into the
pagetable. The BUG_ON() firing implies that such a writable page has been
mapped into the pagetables of more than one process, which breaks
anonymous memory/CoW semantics.
[ 21.134473] kernel BUG at mm/page_table_check.c:118!
[ 21.134497] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 21.135917] Modules linked in:
[ 21.136085] CPU: 1 UID: 0 PID: 1735 Comm: dup-lazyfree Not tainted 7.0.0-rc1-00116-g018018a17770 #1028 PREEMPT
[ 21.136858] Hardware name: linux,dummy-virt (DT)
[ 21.137019] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 21.137308] pc : page_table_check_set+0x28c/0x2a8
[ 21.137607] lr : page_table_check_set+0x134/0x2a8
[ 21.137885] sp : ffff80008a3b3340
[ 21.138124] x29: ffff80008a3b3340 x28: fffffdffc3d14400 x27: ffffd1a55e03d000
[ 21.138623] x26: 0040000000000040 x25: ffffd1a55f7dd000 x24: 0000000000000001
[ 21.139045] x23: 0000000000000001 x22: 0000000000000001 x21: ffffd1a55f217f30
[ 21.139629] x20: 0000000000134521 x19: 0000000000134519 x18: 005c43e000040000
[ 21.140027] x17: 0001400000000000 x16: 0001700000000000 x15: 000000000000ffff
[ 21.140578] x14: 000000000000000c x13: 005c006000000000 x12: 0000000000000020
[ 21.140828] x11: 0000000000000000 x10: 005c000000000000 x9 : ffffd1a55c079ee0
[ 21.141077] x8 : 0000000000000001 x7 : 005c03e000040000 x6 : 000000004000ffff
[ 21.141490] x5 : ffff00017fffce00 x4 : 0000000000000001 x3 : 0000000000000002
[ 21.141741] x2 : 0000000000134510 x1 : 0000000000000000 x0 : ffff0000c08228c0
[ 21.141991] Call trace:
[ 21.142093] page_table_check_set+0x28c/0x2a8 (P)
[ 21.142265] __page_table_check_ptes_set+0x144/0x1e8
[ 21.142441] __set_ptes_anysz.constprop.0+0x160/0x1a8
[ 21.142766] contpte_set_ptes+0xe8/0x140
[ 21.142907] try_to_unmap_one+0x10c4/0x10d0
[ 21.143177] rmap_walk_anon+0x100/0x250
[ 21.143315] try_to_unmap+0xa0/0xc8
[ 21.143441] shrink_folio_list+0x59c/0x18a8
[ 21.143759] shrink_lruvec+0x664/0xbf0
[ 21.144043] shrink_node+0x218/0x878
[ 21.144285] __node_reclaim.constprop.0+0x98/0x338
[ 21.144763] user_proactive_reclaim+0x2a4/0x340
[ 21.145056] reclaim_store+0x3c/0x60
[ 21.145216] dev_attr_store+0x20/0x40
[ 21.145585] sysfs_kf_write+0x84/0xa8
[ 21.145835] kernfs_fop_write_iter+0x130/0x1c8
[ 21.145994] vfs_write+0x2b8/0x368
[ 21.146119] ksys_write+0x70/0x110
[ 21.146240] __arm64_sys_write+0x24/0x38
[ 21.146380] invoke_syscall+0x50/0x120
[ 21.146513] el0_svc_common.constprop.0+0x48/0xf8
[ 21.146679] do_el0_svc+0x28/0x40
[ 21.146798] el0_svc+0x34/0x110
[ 21.146926] el0t
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
nvdimm/bus: Fix potential use after free in asynchronous initialization
Dingisoul with KASAN reports a use after free if device_add() fails in
nd_async_device_register().
Commit b6eae0f61db2 ("libnvdimm: Hold reference on parent while
scheduling async init") correctly added a reference on the parent device
to be held until asynchronous initialization was complete. However, if
device_add() results in an allocation failure the ref count of the
device drops to 0 prior to the parent pointer being accessed. Thus
resulting in use after free.
The bug bot AI correctly identified the fix. Save a reference to the
parent pointer to be used to drop the parent reference regardless of the
outcome of device_add(). |
| In the Linux kernel, the following vulnerability has been resolved:
sunrpc: fix cache_request leak in cache_release
When a reader's file descriptor is closed while in the middle of reading
a cache_request (rp->offset != 0), cache_release() decrements the
request's readers count but never checks whether it should free the
request.
In cache_read(), when readers drops to 0 and CACHE_PENDING is clear, the
cache_request is removed from the queue and freed along with its buffer
and cache_head reference. cache_release() lacks this cleanup.
The only other path that frees requests with readers == 0 is
cache_dequeue(), but it runs only when CACHE_PENDING transitions from
set to clear. If that transition already happened while readers was
still non-zero, cache_dequeue() will have skipped the request, and no
subsequent call will clean it up.
Add the same cleanup logic from cache_read() to cache_release(): after
decrementing readers, check if it reached 0 with CACHE_PENDING clear,
and if so, dequeue and free the cache_request. |
| In the Linux kernel, the following vulnerability has been resolved:
HID: bpf: prevent buffer overflow in hid_hw_request
right now the returned value is considered to be always valid. However,
when playing with HID-BPF, the return value can be arbitrary big,
because it's the return value of dispatch_hid_bpf_raw_requests(), which
calls the struct_ops and we have no guarantees that the value makes
sense. |
| In the Linux kernel, the following vulnerability has been resolved:
nfsd: fix heap overflow in NFSv4.0 LOCK replay cache
The NFSv4.0 replay cache uses a fixed 112-byte inline buffer
(rp_ibuf[NFSD4_REPLAY_ISIZE]) to store encoded operation responses.
This size was calculated based on OPEN responses and does not account
for LOCK denied responses, which include the conflicting lock owner as
a variable-length field up to 1024 bytes (NFS4_OPAQUE_LIMIT).
When a LOCK operation is denied due to a conflict with an existing lock
that has a large owner, nfsd4_encode_operation() copies the full encoded
response into the undersized replay buffer via read_bytes_from_xdr_buf()
with no bounds check. This results in a slab-out-of-bounds write of up
to 944 bytes past the end of the buffer, corrupting adjacent heap memory.
This can be triggered remotely by an unauthenticated attacker with two
cooperating NFSv4.0 clients: one sets a lock with a large owner string,
then the other requests a conflicting lock to provoke the denial.
We could fix this by increasing NFSD4_REPLAY_ISIZE to allow for a full
opaque, but that would increase the size of every stateowner, when most
lockowners are not that large.
Instead, fix this by checking the encoded response length against
NFSD4_REPLAY_ISIZE before copying into the replay buffer. If the
response is too large, set rp_buflen to 0 to skip caching the replay
payload. The status is still cached, and the client already received the
correct response on the original request. |
| In the Linux kernel, the following vulnerability has been resolved:
NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd
The /proc/fs/nfs/exports proc entry is created at module init
and persists for the module's lifetime. exports_proc_open()
captures the caller's current network namespace and stores
its svc_export_cache in seq->private, but takes no reference
on the namespace. If the namespace is subsequently torn down
(e.g. container destruction after the opener does setns() to a
different namespace), nfsd_net_exit() calls nfsd_export_shutdown()
which frees the cache. Subsequent reads on the still-open fd
dereference the freed cache_detail, walking a freed hash table.
Hold a reference on the struct net for the lifetime of the open
file descriptor. This prevents nfsd_net_exit() from running --
and thus prevents nfsd_export_shutdown() from freeing the cache
-- while any exports fd is open. cache_detail already stores
its net pointer (cd->net, set by cache_create_net()), so
exports_release() can retrieve it without additional per-file
storage. |
| In the Linux kernel, the following vulnerability has been resolved:
NFSD: Defer sub-object cleanup in export put callbacks
svc_export_put() calls path_put() and auth_domain_put() immediately
when the last reference drops, before the RCU grace period. RCU
readers in e_show() and c_show() access both ex_path (via
seq_path/d_path) and ex_client->name (via seq_escape) without
holding a reference. If cache_clean removes the entry and drops the
last reference concurrently, the sub-objects are freed while still
in use, producing a NULL pointer dereference in d_path.
Commit 2530766492ec ("nfsd: fix UAF when access ex_uuid or
ex_stats") moved kfree of ex_uuid and ex_stats into the
call_rcu callback, but left path_put() and auth_domain_put() running
before the grace period because both may sleep and call_rcu
callbacks execute in softirq context.
Replace call_rcu/kfree_rcu with queue_rcu_work(), which defers the
callback until after the RCU grace period and executes it in process
context where sleeping is permitted. This allows path_put() and
auth_domain_put() to be moved into the deferred callback alongside
the other resource releases. Apply the same fix to expkey_put(),
which has the identical pattern with ek_path and ek_client.
A dedicated workqueue scopes the shutdown drain to only NFSD
export release work items; flushing the shared
system_unbound_wq would stall on unrelated work from other
subsystems. nfsd_export_shutdown() uses rcu_barrier() followed
by flush_workqueue() to ensure all deferred release callbacks
complete before the export caches are destroyed.
Reviwed-by: Jeff Layton <jlayton@kernel.org> |
| In the Linux kernel, the following vulnerability has been resolved:
media: dvb-net: fix OOB access in ULE extension header tables
The ule_mandatory_ext_handlers[] and ule_optional_ext_handlers[] tables
in handle_one_ule_extension() are declared with 255 elements (valid
indices 0-254), but the index htype is derived from network-controlled
data as (ule_sndu_type & 0x00FF), giving a range of 0-255. When
htype equals 255, an out-of-bounds read occurs on the function pointer
table, and the OOB value may be called as a function pointer.
Add a bounds check on htype against the array size before either table
is accessed. Out-of-range values now cause the SNDU to be discarded. |
| In the Linux kernel, the following vulnerability has been resolved:
net: qrtr: ns: Limit the maximum server registration per node
Current code does no bound checking on the number of servers added per
node. A malicious client can flood NEW_SERVER messages and exhaust memory.
Fix this issue by limiting the maximum number of server registrations to
256 per node. If the NEW_SERVER message is received for an old port, then
don't restrict it as it will get replaced. While at it, also rate limit
the error messages in the failure path of qrtr_ns_worker().
Note that the limit of 256 is chosen based on the current platform
requirements. If requirement changes in the future, this limit can be
increased. |
| In the Linux kernel, the following vulnerability has been resolved:
xfrm: Wait for RCU readers during policy netns exit
xfrm_policy_fini() frees the policy_bydst hash tables after flushing the
policy work items and deleting all policies, but it does not wait for
concurrent RCU readers to leave their read-side critical sections first.
The policy_bydst tables are published via rcu_assign_pointer() and are
looked up through rcu_dereference_check(), so netns teardown must also
wait for an RCU grace period before freeing the table memory.
Fix this by adding synchronize_rcu() before freeing the policy hash tables. |
| In the Linux kernel, the following vulnerability has been resolved:
netfs: Fix read abandonment during retry
Under certain circumstances, all the remaining subrequests from a read
request will get abandoned during retry. The abandonment process expects
the 'subreq' variable to be set to the place to start abandonment from, but
it doesn't always have a useful value (it will be uninitialised on the
first pass through the loop and it may point to a deleted subrequest on
later passes).
Fix the first jump to "abandon:" to set subreq to the start of the first
subrequest expected to need retry (which, in this abandonment case, turned
out unexpectedly to no longer have NEED_RETRY set).
Also clear the subreq pointer after discarding superfluous retryable
subrequests to cause an oops if we do try to access it. |
| In the Linux kernel, the following vulnerability has been resolved:
dmaengine: idxd: fix possible wrong descriptor completion in llist_abort_desc()
At the end of this function, d is the traversal cursor of flist, but the
code completes found instead. This can lead to issues such as NULL pointer
dereferences, double completion, or descriptor leaks.
Fix this by completing d instead of found in the final
list_for_each_entry_safe() loop. |
| In the Linux kernel, the following vulnerability has been resolved:
netfs: Fix kernel BUG in netfs_limit_iter() for ITER_KVEC iterators
When a process crashes and the kernel writes a core dump to a 9P
filesystem, __kernel_write() creates an ITER_KVEC iterator. This
iterator reaches netfs_limit_iter() via netfs_unbuffered_write(), which
only handles ITER_FOLIOQ, ITER_BVEC and ITER_XARRAY iterator types,
hitting the BUG() for any other type.
Fix this by adding netfs_limit_kvec() following the same pattern as
netfs_limit_bvec(), since both kvec and bvec are simple segment arrays
with pointer and length fields. Dispatch it from netfs_limit_iter() when
the iterator type is ITER_KVEC. |
| In the Linux kernel, the following vulnerability has been resolved:
dmaengine: xilinx: xdma: Fix regmap init error handling
devm_regmap_init_mmio returns an ERR_PTR() upon error, not NULL.
Fix the error check and also fix the error message. Use the error code
from ERR_PTR() instead of the wrong value in ret. |
| In the Linux kernel, the following vulnerability has been resolved:
xfrm: fix refcount leak in xfrm_migrate_policy_find
syzkaller reported a memory leak in xfrm_policy_alloc:
BUG: memory leak
unreferenced object 0xffff888114d79000 (size 1024):
comm "syz.1.17", pid 931
...
xfrm_policy_alloc+0xb3/0x4b0 net/xfrm/xfrm_policy.c:432
The root cause is a double call to xfrm_pol_hold_rcu() in
xfrm_migrate_policy_find(). The lookup function already returns
a policy with held reference, making the second call redundant.
Remove the redundant xfrm_pol_hold_rcu() call to fix the refcount
imbalance and prevent the memory leak.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller. |
| In the Linux kernel, the following vulnerability has been resolved:
xsk: validate MTU against usable frame size on bind
AF_XDP bind currently accepts zero-copy pool configurations without
verifying that the device MTU fits into the usable frame space provided
by the UMEM chunk.
This becomes a problem since we started to respect tailroom which is
subtracted from chunk_size (among with headroom). 2k chunk size might
not provide enough space for standard 1500 MTU, so let us catch such
settings at bind time. Furthermore, validate whether underlying HW will
be able to satisfy configured MTU wrt XSK's frame size multiplied by
supported Rx buffer chain length (that is exposed via
net_device::xdp_zc_max_segs). |
| In the Linux kernel, the following vulnerability has been resolved:
xsk: tighten UMEM headroom validation to account for tailroom and min frame
The current headroom validation in xdp_umem_reg() could leave us with
insufficient space dedicated to even receive minimum-sized ethernet
frame. Furthermore if multi-buffer would come to play then
skb_shared_info stored at the end of XSK frame would be corrupted.
HW typically works with 128-aligned sizes so let us provide this value
as bare minimum.
Multi-buffer setting is known later in the configuration process so
besides accounting for 128 bytes, let us also take care of tailroom space
upfront. |
| In the Linux kernel, the following vulnerability has been resolved:
ixgbevf: add missing negotiate_features op to Hyper-V ops table
Commit a7075f501bd3 ("ixgbevf: fix mailbox API compatibility by
negotiating supported features") added the .negotiate_features callback
to ixgbe_mac_operations and populated it in ixgbevf_mac_ops, but forgot
to add it to ixgbevf_hv_mac_ops. This leaves the function pointer NULL
on Hyper-V VMs.
During probe, ixgbevf_negotiate_api() calls ixgbevf_set_features(),
which unconditionally dereferences hw->mac.ops.negotiate_features().
On Hyper-V this results in a NULL pointer dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine [...]
Workqueue: events work_for_cpu_fn
RIP: 0010:0x0
[...]
Call Trace:
ixgbevf_negotiate_api+0x66/0x160 [ixgbevf]
ixgbevf_sw_init+0xe4/0x1f0 [ixgbevf]
ixgbevf_probe+0x20f/0x4a0 [ixgbevf]
local_pci_probe+0x50/0xa0
work_for_cpu_fn+0x1a/0x30
[...]
Add ixgbevf_hv_negotiate_features_vf() that returns -EOPNOTSUPP and
wire it into ixgbevf_hv_mac_ops. The caller already handles -EOPNOTSUPP
gracefully. |
| In the Linux kernel, the following vulnerability has been resolved:
ASoC: SDCA: Fix errors in IRQ cleanup
IRQs are enabled through sdca_irq_populate() from component probe
using devm_request_threaded_irq(), this however means the IRQs can
persist if the sound card is torn down. Some of the IRQ handlers
store references to the card and the kcontrols which can then
fail. Some detail of the crash was explained in [1].
Generally it is not advised to use devm outside of bus probe, so
the code is updated to not use devm. The IRQ requests are not moved
to bus probe time as it makes passing the snd_soc_component into
the IRQs very awkward and would the require a second step once the
component is available, so it is simpler to just register the IRQs
at this point, even though that necessitates some manual cleanup. |
| In the Linux kernel, the following vulnerability has been resolved:
mshv: Fix infinite fault loop on permission-denied GPA intercepts
Prevent infinite fault loops when guests access memory regions without
proper permissions. Currently, mshv_handle_gpa_intercept() attempts to
remap pages for all faults on movable memory regions, regardless of
whether the access type is permitted. When a guest writes to a read-only
region, the remap succeeds but the region remains read-only, causing
immediate re-fault and spinning the vCPU indefinitely.
Validate intercept access type against region permissions before
attempting remaps. Reject writes to non-writable regions and executes to
non-executable regions early, returning false to let the VMM handle the
intercept appropriately.
This also closes a potential DoS vector where malicious guests could
intentionally trigger these fault loops to consume host resources. |