This fixes a Spectre-v1/L1TF vulnerability in picdev_write().
It replaces index computations based on the (attacker-controlled) port
number with constants through a minor refactoring.
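For context, a minimal sketch of the pattern being removed (illustrative
only; the diff below is authoritative):

/*
 * Before: the array index is computed from the guest-controlled port
 * number. The CPU may speculate past the switch's case match with an
 * out-of-range addr, so pics[addr >> 7] becomes a speculative
 * out-of-bounds load (a Spectre-v1/L1TF gadget).
 */
pic_ioport_write(&s->pics[addr >> 7], addr, data);

/*
 * After: each case uses a compile-time constant index, leaving no
 * attacker-controlled arithmetic to speculate on. Behavior is
 * unchanged, since addr >> 7 is 0 for ports 0x20/0x21 and 1 for
 * 0xa0/0xa1.
 */
switch (addr) {
case 0x20:
case 0x21:
	pic_ioport_write(&s->pics[0], addr, data);
	break;
case 0xa0:
case 0xa1:
	pic_ioport_write(&s->pics[1], addr, data);
	break;
}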
Fixes: 85f455f7ddbe ("KVM: Add support for in-kernel PIC emulation")
Signed-off-by: Nick Finco <nifi(a)google.com>
Signed-off-by: Marios Pomonis <pomonis(a)google.com>
Reviewed-by: Andrew Honig <ahonig(a)google.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/kvm/i8259.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 8b38bb4868a6..629a09ca9860 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -460,10 +460,14 @@ static int picdev_write(struct kvm_pic *s,
switch (addr) {
case 0x20:
case 0x21:
+ pic_lock(s);
+ pic_ioport_write(&s->pics[0], addr, data);
+ pic_unlock(s);
+ break;
case 0xa0:
case 0xa1:
pic_lock(s);
- pic_ioport_write(&s->pics[addr >> 7], addr, data);
+ pic_ioport_write(&s->pics[1], addr, data);
pic_unlock(s);
break;
case 0x4d0:
--
2.24.0.525.g8f36a354ae-goog
The rseq.h UAPI documents that the rseq_cs field must be cleared
before reclaiming memory that contains the targeted struct rseq_cs.
We should extend this comment to also dictate that the rseq_cs field
must be cleared before reclaiming the memory containing the code
referred to by the rseq_cs start_ip and post_commit_offset fields.
While we can expect that use of dlclose(3) will typically unmap both
the struct rseq_cs and its associated code at once, nothing would
theoretically prevent a JIT from reclaiming the code without
reclaiming the struct rseq_cs. After such a code reclaim, the stale
rseq_cs would erroneously allow the kernel to consider new code that
is not a rseq critical section to be one.
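As a concrete illustration, a minimal user-space sketch of the rule for
a JIT (hypothetical names: rseq_area is the thread's registered struct
rseq, jit_buf/jit_len a buffer of generated code; the declaration of
the rseq_cs field has varied across uapi header versions, so the plain
64-bit store below is an assumption):

#include <stddef.h>
#include <sys/mman.h>
#include <linux/rseq.h>

extern volatile struct rseq rseq_area;	/* registered via sys_rseq() */

static void reclaim_jit_code(void *jit_buf, size_t jit_len)
{
	/*
	 * Clear rseq_cs (with single-copy atomicity, as the uapi comment
	 * requires) before reclaiming either the struct rseq_cs or the
	 * code it targets, so the kernel never matches a stale
	 * [start_ip, start_ip + post_commit_offset] range against
	 * whatever is mapped at those addresses next.
	 */
	rseq_area.rseq_cs = 0;
	munmap(jit_buf, jit_len);
}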
Suggested-by: Florian Weimer <fw(a)deneb.enyo.de>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Florian Weimer <fw(a)deneb.enyo.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: "Paul E. McKenney" <paulmck(a)linux.ibm.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: "H . Peter Anvin" <hpa(a)zytor.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Neel Natu <neelnatu(a)google.com>
Cc: linux-api(a)vger.kernel.org
---
include/uapi/linux/rseq.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index 9a402fdb60e9..d94afdfc4b7c 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -100,7 +100,9 @@ struct rseq {
* instruction sequence block, as well as when the kernel detects that
* it is preempting or delivering a signal outside of the range
* targeted by the rseq_cs. Also needs to be set to NULL by user-space
- * before reclaiming memory that contains the targeted struct rseq_cs.
+ * before reclaiming memory that contains the targeted struct rseq_cs
+ * or reclaiming memory that contains the code referred to by the
+ * start_ip and post_commit_offset fields of struct rseq_cs.
*
* Read and set by the kernel. Set by user-space with single-copy
* atomicity semantics. This field should only be updated by the
--
2.17.1
[Why]
When the connection status in an MST topology changes, the MST device
that detects the event sends out a CONNECTION_STATUS_NOTIFY message.
e.g. src-mst-mst-sst => src-mst (unplug) mst-sst
Currently, in the unplug case above, ports which have been allocated
payloads and are no longer in the topology still occupy time slots and
remain recorded in proposed_vcpi[] of the topology manager.
If we don't clean up the proposed_vcpi[], then when the code flow later
tries to update the payload table by calling
drm_dp_update_payload_part1(), port validation fails because there are
ports with proposed time slots that are no longer in the MST topology.
As a result, we also stop updating the DPCD payload table of the
downstream port.
[How]
While handling the CONNECTION_STATUS_NOTIFY message, detect whether the
event indicates that a device was unplugged from an output port. If so,
iterate over all proposed_vcpi[] entries to check whether the port of
each entry is still in the topology. If a port is invalid, set its
num_slots to 0.
Thereafter, when we try to update the payload table by calling
drm_dp_update_payload_part1(), we can successfully update the DPCD
payload table of the downstream port and clear the stale proposed_vcpi[]
entries to NULL.
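A condensed, comment-annotated sketch of the cleanup loop (the diff
below is authoritative; the comments spell out the reasoning):

/* proposed_vcpis[] stores pointers to the vcpi embedded in each
 * drm_dp_mst_port, so container_of() recovers the owning port.
 */
for (i = 0; i < mgr->max_payloads; i++) {
	struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
	struct drm_dp_mst_port *port_validated;

	if (!vcpi)
		continue;

	port_validated =
		container_of(vcpi, struct drm_dp_mst_port, vcpi);

	/* Taking a validated reference fails iff the port is no longer
	 * reachable in the topology; zero its proposed allocation so
	 * that drm_dp_update_payload_part1() can drop the stale payload
	 * and clear the proposed_vcpi[] slot, as described above.
	 */
	port_validated =
		drm_dp_mst_topology_get_port_validated(mgr, port_validated);
	if (!port_validated) {
		mutex_lock(&mgr->payload_lock);
		vcpi->num_slots = 0;
		mutex_unlock(&mgr->payload_lock);
	} else {
		drm_dp_mst_topology_put_port(port_validated);
	}
}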
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/drm_dp_mst_topology.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 5306c47dc820..2e236b6275c4 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2318,7 +2318,7 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
{
struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
struct drm_dp_mst_port *port;
- int old_ddps, ret;
+ int old_ddps, old_input, ret, i;
u8 new_pdt;
bool dowork = false, create_connector = false;
@@ -2349,6 +2349,7 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
}
old_ddps = port->ddps;
+ old_input = port->input;
port->input = conn_stat->input_port;
port->mcs = conn_stat->message_capability_status;
port->ldps = conn_stat->legacy_device_plug_status;
@@ -2373,6 +2374,27 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
dowork = false;
}
+ if (!old_input && old_ddps != port->ddps && !port->ddps) {
+ for (i = 0; i < mgr->max_payloads; i++) {
+ struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
+ struct drm_dp_mst_port *port_validated;
+
+ if (vcpi) {
+ port_validated =
+ container_of(vcpi, struct drm_dp_mst_port, vcpi);
+ port_validated =
+ drm_dp_mst_topology_get_port_validated(mgr, port_validated);
+ if (!port_validated) {
+ mutex_lock(&mgr->payload_lock);
+ vcpi->num_slots = 0;
+ mutex_unlock(&mgr->payload_lock);
+ } else {
+ drm_dp_mst_topology_put_port(port_validated);
+ }
+ }
+ }
+ }
+
if (port->connector)
drm_modeset_unlock(&mgr->base.lock);
else if (create_connector)
--
2.17.1
commit 4929a4e6faa0f13289a67cae98139e727f0d4a97 upstream.
The quota/period ratio is used to ensure a child task group won't get
more bandwidth than the parent task group, and is calculated as:
normalized_cfs_quota() = [(quota_us << 20) / period_us]
If the quota/period ratio is changed by the period scaling done in
sched_cfs_period_timer() due to precision loss, it will cause
inconsistency between parent and child task groups.
See below example:
A userspace container manager (kubelet) does three operations:
1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
2) Create a few children cgroups.
3) Set quota to 1,000us and period to 10,000us on a child cgroup.
These operations are expected to succeed. However, if the scaling of
147/128 happens before step 3, quota and period of the parent cgroup
will be changed:
new_quota: 1148437ns, 1148us
new_period: 11484375ns, 11484us
And when step 3 comes in, the ratio of the child cgroup will be
104857, which is larger than the parent cgroup ratio (104821), so the
operation will fail.
Scaling them by a factor of 2 will fix the problem.
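A quick worked check of these numbers (a stand-alone user-space sketch
mirroring the normalized_cfs_quota() formula above, not the kernel
code):

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

/* normalized_cfs_quota() = (quota_us << 20) / period_us */
static uint64_t normalized_cfs_quota(uint64_t quota_us, uint64_t period_us)
{
	return (quota_us << 20) / period_us;
}

int main(void)
{
	/* Parent before scaling, and the child set up in step 3. */
	printf("%" PRIu64 "\n", normalized_cfs_quota(1000, 10000));	/* 104857 */
	/* Parent after the 147/128 scaling: 1148us / 11484us. */
	printf("%" PRIu64 "\n", normalized_cfs_quota(1148, 11484));	/* 104821 */
	/* Scaling by 2 instead: 2000us / 20000us keeps the ratio exact. */
	printf("%" PRIu64 "\n", normalized_cfs_quota(2000, 20000));	/* 104857 */
	return 0;
}

The child's ratio (104857) exceeds the parent's post-scaling ratio
(104821), so step 3 is rejected; with the factor-of-2 scaling the
parent's ratio is unchanged and step 3 succeeds.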
Tested-by: Phil Auld <pauld(a)redhat.com>
Signed-off-by: Xuewei Zhang <xueweiz(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Phil Auld <pauld(a)redhat.com>
Cc: Anton Blanchard <anton(a)ozlabs.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
kernel/sched/fair.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ea2d33aa1f55..773135f534ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3753,20 +3753,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
if (++count > 3) {
u64 new, old = ktime_to_ns(cfs_b->period);
- new = (old * 147) / 128; /* ~115% */
- new = min(new, max_cfs_quota_period);
-
- cfs_b->period = ns_to_ktime(new);
-
- /* since max is 1s, this is limited to 1e9^2, which fits in u64 */
- cfs_b->quota *= new;
- cfs_b->quota = div64_u64(cfs_b->quota, old);
-
- pr_warn_ratelimited(
- "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
- smp_processor_id(),
- div_u64(new, NSEC_PER_USEC),
- div_u64(cfs_b->quota, NSEC_PER_USEC));
+ /*
+ * Grow period by a factor of 2 to avoid losing precision.
+ * Precision loss in the quota/period ratio can cause __cfs_schedulable
+ * to fail.
+ */
+ new = old * 2;
+ if (new < max_cfs_quota_period) {
+ cfs_b->period = ns_to_ktime(new);
+ cfs_b->quota *= 2;
+
+ pr_warn_ratelimited(
+ "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+ smp_processor_id(),
+ div_u64(new, NSEC_PER_USEC),
+ div_u64(cfs_b->quota, NSEC_PER_USEC));
+ } else {
+ pr_warn_ratelimited(
+ "cfs_period_timer[cpu%d]: period too short, but cannot scale up without losing precision (cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+ smp_processor_id(),
+ div_u64(old, NSEC_PER_USEC),
+ div_u64(cfs_b->quota, NSEC_PER_USEC));
+ }
/* reset count so we don't come right back in here */
count = 0;
--
2.24.0.393.g34dc348eaf-goog
From: Kaike Wan <kaike.wan(a)intel.com>
When a TID RDMA ACK to RESYNC request is received, the flow PSNs for
pending TID RDMA WRITE segments will be adjusted with the next flow
generation number, based on the resync_psn value extracted from the
flow PSN of the TID RDMA ACK packet. The resync_psn value indicates
the last flow PSN for which a TID RDMA WRITE DATA packet has been
received by the responder and the requester should resend TID RDMA
WRITE DATA packets, starting from the next flow PSN. However, if
resync_psn points to the last flow PSN for a segment and the next
segment flow PSN starts with a new generation number, use of the
old resync_psn to adjust the flow PSN for the next segment will
lead to miscalculation, resulting in WARN_ON and sge rewinding
errors:
[2419460.492485] WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
[2419460.514565] Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common
[2419460.594432] crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt]
[2419460.605645] CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
[2419460.619424] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018
[2419460.631062] Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
[2419460.637423] Call Trace:
[2419460.641044] <IRQ> [<ffffffff9e361dc1>] dump_stack+0x19/0x1b
[2419460.647980] [<ffffffff9dc97648>] __warn+0xd8/0x100
[2419460.654023] [<ffffffff9dc9778d>] warn_slowpath_null+0x1d/0x20
[2419460.661025] [<ffffffffc05d28c6>] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
[2419460.669333] [<ffffffffc05c21cc>] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1]
[2419460.677295] [<ffffffffc05c23ef>] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1]
[2419460.685693] [<ffffffffc0574f15>] kdeth_process_eager+0x35/0x90 [hfi1]
[2419460.693394] [<ffffffffc0575b5a>] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1]
[2419460.702745] [<ffffffffc056a623>] receive_context_interrupt+0x23/0x40 [hfi1]
[2419460.710963] [<ffffffff9dd4a294>] __handle_irq_event_percpu+0x44/0x1c0
[2419460.718659] [<ffffffff9dd4a442>] handle_irq_event_percpu+0x32/0x80
[2419460.726086] [<ffffffff9dd4a4cc>] handle_irq_event+0x3c/0x60
[2419460.732903] [<ffffffff9dd4d27f>] handle_edge_irq+0x7f/0x150
[2419460.739710] [<ffffffff9dc2e554>] handle_irq+0xe4/0x1a0
[2419460.746091] [<ffffffff9e3795dd>] do_IRQ+0x4d/0xf0
[2419460.752040] [<ffffffff9e36b362>] common_interrupt+0x162/0x162
[2419460.759029] <EOI> [<ffffffff9dfa0f79>] ? swiotlb_map_page+0x49/0x150
[2419460.766758] [<ffffffffc05c2ed1>] hfi1_verbs_send_dma+0x291/0xb70 [hfi1]
[2419460.774637] [<ffffffffc05c2c40>] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1]
[2419460.782080] [<ffffffffc05c3f26>] hfi1_verbs_send+0x126/0x2b0 [hfi1]
[2419460.789606] [<ffffffffc05ce683>] _hfi1_do_tid_send+0x1d3/0x320 [hfi1]
[2419460.797298] [<ffffffff9dcb9d4f>] process_one_work+0x17f/0x440
[2419460.804292] [<ffffffff9dcbade6>] worker_thread+0x126/0x3c0
[2419460.811025] [<ffffffff9dcbacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
[2419460.818710] [<ffffffff9dcc1c31>] kthread+0xd1/0xe0
[2419460.824751] [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40
[2419460.832013] [<ffffffff9e374c1d>] ret_from_fork_nospec_begin+0x7/0x21
[2419460.839611] [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40
This patch fixes the issue by adjusting the resync_psn first if the flow
generation has been advanced for a pending segment.
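To make the arithmetic concrete, a self-contained sketch of the
miscalculation and the adjustment, assuming (hypothetically) an 11-bit
sequence field below the generation bits of a flow PSN; SEQ_SHIFT
stands in for HFI1_KDETH_BTH_SEQ_SHIFT and all values are illustrative:

#include <stdio.h>
#include <stdint.h>

#define SEQ_SHIFT 11	/* stands in for HFI1_KDETH_BTH_SEQ_SHIFT */

static uint32_t gen(uint32_t psn) { return psn >> SEQ_SHIFT; }

int main(void)
{
	/* resync_psn from the TID RDMA ACK: a PSN still in generation 4. */
	uint32_t resync_psn = (4u << SEQ_SHIFT) | 0x2ff;
	/* First flow PSN of the pending segment: already generation 5. */
	uint32_t fpsn = (5u << SEQ_SHIFT) | 0x010;

	/*
	 * Mirrors resync_npkts += delta_psn(resync_psn + 1, fpsn): with
	 * the stale generation the delta goes negative (-1296 here),
	 * rewinding the SGEs and tripping the WARN_ON.
	 */
	printf("bogus delta: %d\n", (int32_t)(resync_psn + 1 - fpsn));

	/*
	 * The fix: when the segment's generation differs from the one
	 * carried in resync_psn, clamp resync_psn to fpsn - 1 so the
	 * delta is 0 and nothing is counted as already received for
	 * this segment.
	 */
	if (gen(fpsn) != gen(resync_psn))
		resync_psn = fpsn - 1;
	printf("adjusted delta: %d\n", (int32_t)(resync_psn + 1 - fpsn));
	return 0;
}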
Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Kaike Wan <kaike.wan(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
---
drivers/infiniband/hw/hfi1/tid_rdma.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c b/drivers/infiniband/hw/hfi1/tid_rdma.c
index e53f542..8a2e0d9 100644
--- a/drivers/infiniband/hw/hfi1/tid_rdma.c
+++ b/drivers/infiniband/hw/hfi1/tid_rdma.c
@@ -4633,6 +4633,15 @@ void hfi1_rc_rcv_tid_rdma_ack(struct hfi1_packet *packet)
*/
fpsn = full_flow_psn(flow, flow->flow_state.spsn);
req->r_ack_psn = psn;
+ /*
+ * If resync_psn points to the last flow PSN for a
+ * segment and the new segment (likely from a new
+ * request) starts with a new generation number, we
+ * need to adjust resync_psn accordingly.
+ */
+ if (flow->flow_state.generation !=
+ (resync_psn >> HFI1_KDETH_BTH_SEQ_SHIFT))
+ resync_psn = mask_psn(fpsn - 1);
flow->resync_npkts +=
delta_psn(mask_psn(resync_psn + 1), fpsn);
/*