[v13,3/3] drivers/net: add diagnostics macros to make code portable

Message ID 1736992511-20462-4-git-send-email-andremue@linux.microsoft.com (mailing list archive)
State Superseded
Delegated to: Thomas Monjalon
Series add diagnostics macros to make code portable

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-marvell-Functional success Functional Testing PASS
ci/github-robot: build fail github build: failed
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-compile-amd64-testing success Testing PASS

Commit Message

Andre Muezerie Jan. 16, 2025, 1:55 a.m. UTC
It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.
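The centralized macros this cover letter refers to could be sketched roughly as below. This is an illustration only, not the patch's actual definitions: the RTE_DIAG_* names and the read_through_unqualified() helper are invented for this example, and the real series defines its macros in a common DPDK header. The idea is that MSVC, which does not understand GCC pragmas, gets empty definitions, while gcc and clang share one _Pragma-based spelling:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative centralized diagnostic macros (names are made up for this
 * sketch). MSVC does not understand GCC pragmas, so it gets empty
 * definitions; gcc and clang share the _Pragma-based spelling.
 */
#ifdef _MSC_VER
#define RTE_DIAG_PUSH
#define RTE_DIAG_POP
#define RTE_DIAG_IGNORED_WCAST_QUAL
#else
#define RTE_DIAG_PUSH _Pragma("GCC diagnostic push")
#define RTE_DIAG_POP _Pragma("GCC diagnostic pop")
#define RTE_DIAG_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#endif

/* Drop const around a single cast instead of disabling the warning
 * for a whole file. */
static uint32_t
read_through_unqualified(const uint32_t *p)
{
	uint32_t *q;

	RTE_DIAG_PUSH
	RTE_DIAG_IGNORED_WCAST_QUAL
	q = (uint32_t *)p;	/* would warn under -Wcast-qual otherwise */
	RTE_DIAG_POP

	return *q;
}
```

Call sites then bracket only the offending statement with push/ignored/pop, instead of disabling -Wcast-qual for a whole translation unit the way the removed per-file pragmas did.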

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 drivers/net/axgbe/axgbe_rxtx.h                |  9 --
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 -
 drivers/net/dpaa2/dpaa2_rxtx.c                | 15 +---
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 20 ++---
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  6 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 24 +++---
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 23 ++---
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 40 +++++----
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 30 ++++---
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 -
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 39 ++++-----
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 32 +++----
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 84 ++++++++++---------
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 78 ++++++++---------
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 12 ++-
 drivers/net/iavf/iavf_rxtx_vec_neon.c         | 26 +++---
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 52 ++++++------
 drivers/net/ice/ice_rxtx_common_avx.h         | 24 +++---
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 74 ++++++++--------
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 64 ++++++--------
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 -
 drivers/net/ice/ice_rxtx_vec_sse.c            | 28 +++----
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 -
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       | 18 ++--
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 20 ++---
 drivers/net/mlx5/mlx5_flow.c                  |  5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         | 18 ++--
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 71 +++++++++-------
 drivers/net/ngbe/ngbe_rxtx_vec_neon.c         |  8 +-
 drivers/net/tap/tap_flow.c                    |  6 +-
 drivers/net/txgbe/txgbe_rxtx_vec_neon.c       |  8 +-
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 -
 35 files changed, 398 insertions(+), 465 deletions(-)
  

Comments

Bruce Richardson Jan. 16, 2025, 8:57 a.m. UTC | #1
On Wed, Jan 15, 2025 at 05:55:11PM -0800, Andre Muezerie wrote:
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC's pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).
> 
> Now that effort is being made to make the code compatible with MSVC
> these expressions would become more complex. It makes sense to hide
> this complexity behind macros. This makes maintenance easier as these
> macros are defined in a single place. As a plus the code becomes
> more readable as well.
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

On a stylistic note, I think you can be slightly less aggressive in
wrapping the new code in the patch. DPDK allows lines up to 100
characters without wrapping, so please don't wrap at 80.

Thanks,
/Bruce
  
Morten Brørup Jan. 16, 2025, 9:08 a.m. UTC | #2
> From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> Sent: Thursday, 16 January 2025 02.55
> 
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC's pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).
> 
> Now that effort is being made to make the code compatible with MSVC
> these expressions would become more complex. It makes sense to hide
> this complexity behind macros. This makes maintenance easier as these
> macros are defined in a single place. As a plus the code becomes
> more readable as well.

Here is some food for thought and discussion...

> @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue,
>  			if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) ==
> 0))
>  				continue;
>  		}
> -		fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage);
> +		fd[num_rx] = RTE_PTR_DROP_QUALIFIERS(qbman_result_DQ_fd(dq_storage));

I do not think this makes the code more readable; quite the opposite.
Before this, I could see which type the variable was being cast to.

How about a macro that resembles "traditional" type casting:

/**
 * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
 * without the compiler emitting a warning.
 *
 * @warning
 * Although this macro can be abused for casting a pointer to point to a different type,
 * alignment may be incorrect when casting to point to a larger type. E.g.:
 *   struct s {
 *       uint16_t a;
 *       uint8_t  b;
 *       uint8_t  c;
 *       uint8_t  d;
 *   } v;
 *   uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
 */
#define RTE_CAST_PTR(type, ptr) \
	((type)(uintptr_t)(ptr))


Writing the above warning led me down another path...
Can we somehow use __typeof_unqual__?
It is available in both GCC [1] and MSVC [2].

[1]: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html
[2]: https://learn.microsoft.com/en-us/cpp/c-language/typeof-unqual-c?view=msvc-170


We are making a workaround, and should take care not to endorse overusing it,
especially for purposes other than intended.

Unfortunately, I think some of the type casts don't just remove qualifiers, but do exactly what my warning above describes: cast a pointer to a completely different type.
If the new type is a larger type, the pointer's alignment becomes invalid, and if the compiler considers alignment a "qualifier", -Wcast-qual emits a warning about it.


Backtracking a bit...
If the macro is intended to remove qualifiers, and not to cast to a different type, RTE_PTR_DROP_QUALIFIERS(ptr) might be better than RTE_CAST_PTR(type, ptr).
For brevity and to resemble the C23 keyword typeof_unqual, it could be named RTE_PTR_UNQUAL instead of RTE_PTR_DROP_QUALIFIERS.
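The two shapes discussed above can be put side by side in a small, self-contained sketch. Both definitions below follow the suggested pattern of round-tripping through uintptr_t to silence -Wcast-qual; the demo() function is purely illustrative, and a C23 toolchain could define RTE_PTR_UNQUAL with typeof_unqual instead of void *:

```c
#include <assert.h>
#include <stdint.h>

/* Cast away qualifiers while keeping the target type visible at the call
 * site; the uintptr_t round trip keeps -Wcast-qual quiet. */
#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

/* Drop qualifiers without naming a type; the void * result converts
 * implicitly. A C23 toolchain could use typeof_unqual here instead. */
#define RTE_PTR_UNQUAL(ptr) ((void *)(uintptr_t)(ptr))

static int
demo(void)
{
	static const int value = 7;
	const int *cp = &value;

	int *p1 = RTE_CAST_PTR(int *, cp); /* explicit target type */
	int *p2 = RTE_PTR_UNQUAL(cp);      /* type taken from context */

	return *p1 + *p2;
}
```

The trade-off is exactly the one debated in this thread: RTE_CAST_PTR keeps the destination type readable, while RTE_PTR_UNQUAL cannot be abused to change the pointed-to type.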
  
Andre Muezerie Jan. 17, 2025, 3:56 a.m. UTC | #3
On Thu, Jan 16, 2025 at 10:08:07AM +0100, Morten Brørup wrote:
> > From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> > Sent: Thursday, 16 January 2025 02.55
> > 
> > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > sprinkled over the code and only activate these pragmas for certain
> > compilers (gcc and clang). Clang supports GCC's pragma for
> > compatibility with existing source code, so #pragma GCC diagnostic
> > and #pragma clang diagnostic are synonyms for Clang
> > (https://clang.llvm.org/docs/UsersManual.html).
> > 
> > Now that effort is being made to make the code compatible with MSVC
> > these expressions would become more complex. It makes sense to hide
> > this complexity behind macros. This makes maintenance easier as these
> > macros are defined in a single place. As a plus the code becomes
> > more readable as well.
> 
> Here is some food for thought and discussion...
> 
> > @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue,
> >  			if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) ==
> > 0))
> >  				continue;
> >  		}
> > -		fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage);
> > +		fd[num_rx] = RTE_PTR_DROP_QUALIFIERS(qbman_result_DQ_fd(dq_storage));
> 
> I do not think this makes the code more readable; quite the opposite.
> Before this, I could see which type the variable was being cast to.
> 
> How about a macro that resembles "traditional" type casting:
> 
> /**
>  * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
>  * without the compiler emitting a warning.
>  *
>  * @warning
>  * Although this macro can be abused for casting a pointer to point to a different type,
>  * alignment may be incorrect when casting to point to a larger type. E.g.:
>  *   struct s {
>  *       uint16_t a;
>  *       uint8_t  b;
>  *       uint8_t  c;
>  *       uint8_t  d;
>  *   } v;
>  *   uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
>  */
> #define RTE_CAST_PTR(type, ptr) \
> 	((type)(uintptr_t)(ptr))
> 
> 
> Writing the above warning led me down another path...
> Can we somehow use __typeof_unqual__?
> It is available in both GCC [1] and MSVC [2].
> 
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html
> [2]: https://learn.microsoft.com/en-us/cpp/c-language/typeof-unqual-c?view=msvc-170
> 
> 
> We are making a workaround, and should take care not to endorse overusing it,
> especially for purposes other than intended.
> 
> Unfortunately, I think some of the type casts don't just remove qualifiers, but do exactly what my warning above describes: cast a pointer to a completely different type.
> If the new type is a larger type, the pointer's alignment becomes invalid, and if the compiler considers alignment a "qualifier", -Wcast-qual emits a warning about it.
> 
> 
> Backtracking a bit...
> If the macro is intended to remove qualifiers, and not to cast to a different type, RTE_PTR_DROP_QUALIFIERS(ptr) might be better than RTE_CAST_PTR(type, ptr).
> For brevity and to resemble the C23 keyword typeof_unqual, it could be named RTE_PTR_UNQUAL instead of RTE_PTR_DROP_QUALIFIERS.
> 

These are great suggestions, and __typeof_unqual__ seems to be exactly what we need to drop the qualifiers. I'll look more closely at the code and find out where a cast is actually being used for purposes other than removing the qualifiers.
  
Andre Muezerie Jan. 18, 2025, 3:05 a.m. UTC | #4
On Thu, Jan 16, 2025 at 07:56:52PM -0800, Andre Muezerie wrote:
> On Thu, Jan 16, 2025 at 10:08:07AM +0100, Morten Brørup wrote:
> > > From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> > > Sent: Thursday, 16 January 2025 02.55
> > > 
> > > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > > sprinkled over the code and only activate these pragmas for certain
> > > compilers (gcc and clang). Clang supports GCC's pragma for
> > > compatibility with existing source code, so #pragma GCC diagnostic
> > > and #pragma clang diagnostic are synonyms for Clang
> > > (https://clang.llvm.org/docs/UsersManual.html).
> > > 
> > > Now that effort is being made to make the code compatible with MSVC
> > > these expressions would become more complex. It makes sense to hide
> > > this complexity behind macros. This makes maintenance easier as these
> > > macros are defined in a single place. As a plus the code becomes
> > > more readable as well.
> > 
> > Here is some food for thought and discussion...
> > 
> > > @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue,
> > >  			if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) ==
> > > 0))
> > >  				continue;
> > >  		}
> > > -		fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage);
> > > +		fd[num_rx] = RTE_PTR_DROP_QUALIFIERS(qbman_result_DQ_fd(dq_storage));
> > 
> > I do not think this makes the code more readable; quite the opposite.
> > Before this, I could see which type the variable was being cast to.
> > 
> > How about a macro that resembles "traditional" type casting:
> > 
> > /**
> >  * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
> >  * without the compiler emitting a warning.
> >  *
> >  * @warning
> >  * Although this macro can be abused for casting a pointer to point to a different type,
> >  * alignment may be incorrect when casting to point to a larger type. E.g.:
> >  *   struct s {
> >  *       uint16_t a;
> >  *       uint8_t  b;
> >  *       uint8_t  c;
> >  *       uint8_t  d;
> >  *   } v;
> >  *   uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
> >  */
> > #define RTE_CAST_PTR(type, ptr) \
> > 	((type)(uintptr_t)(ptr))
> > 
> > 
> > Writing the above warning led me down another path...
> > Can we somehow use __typeof_unqual__?
> > It is available in both GCC [1] and MSVC [2].
> > 
> > [1]: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html
> > [2]: https://learn.microsoft.com/en-us/cpp/c-language/typeof-unqual-c?view=msvc-170
> > 
> > 
> > We are making a workaround, and should take care not to endorse overusing it,
> > especially for purposes other than intended.
> > 
> > Unfortunately, I think some of the type casts don't just remove qualifiers, but do exactly what my warning above describes: cast a pointer to a completely different type.
> > If the new type is a larger type, the pointer's alignment becomes invalid, and if the compiler considers alignment a "qualifier", -Wcast-qual emits a warning about it.
> > 
> > 
> > Backtracking a bit...
> > If the macro is intended to remove qualifiers, and not to cast to a different type, RTE_PTR_DROP_QUALIFIERS(ptr) might be better than RTE_CAST_PTR(type, ptr).
> > For brevity and to resemble the C23 keyword typeof_unqual, it could be named RTE_PTR_UNQUAL instead of RTE_PTR_DROP_QUALIFIERS.
> > 
> 
> These are great suggestions, and __typeof_unqual__ seems to be exactly what we need to drop the qualifiers. I'll look more closely at the code and find out where a cast is actually being used for other purposes than removing the qualifier.

I took a closer look at the code and this is what I found:

* In only two places were qualifiers being dropped without also casting to a different type. I used RTE_PTR_UNQUAL in those, as suggested, for clarity.

* I experimented with C23 typeof_unqual. It indeed works on gcc, clang and MSVC, but there are some details:
    a) With gcc the project needs to be compiled with -std=c2x. Many other warnings show up, unrelated to the scope of this patchset. Some look suspicious and should be looked at. An error also showed up, for which I sent out a small patch.
    b) When using typeof_unqual and passing "-Wcast-qual" to the compiler, a warning about the qualifier being dropped is emitted. The project currently uses "-Wcast-qual".
Due to (a) I decided not to use typeof_unqual for now, but it would be trivial to change the macro in the future to do so.

* All other places where I was using RTE_PTR_DROP_QUALIFIERS I'm using RTE_CAST_PTR now. I also think that the code became more readable by doing so.
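The call-site effect described above can be illustrated with a scalar stand-in for the SIMD stores in the patch. The struct desc, store_desc() and demo_fill() names below are invented for this example; the real code casts volatile descriptor pointers for _mm_store_si128() and similar intrinsics:

```c
#include <assert.h>
#include <stdint.h>

#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

/* Invented stand-in for a volatile hardware descriptor ring entry. */
struct desc {
	uint64_t addr;
	uint64_t meta;
};

/* Invented stand-in for a vector store such as _mm_store_si128(),
 * which writes through an unqualified pointer. */
static void
store_desc(uint64_t *dst, uint64_t addr, uint64_t meta)
{
	dst[0] = addr;
	dst[1] = meta;
}

static uint64_t
demo_fill(void)
{
	static volatile struct desc ring[1];

	/* Before the series: (uint64_t *)&ring[0] warns under -Wcast-qual.
	 * With RTE_CAST_PTR the target type stays visible at the call site. */
	store_desc(RTE_CAST_PTR(uint64_t *, &ring[0]), 0x1000, 0x2);

	return ring[0].addr + ring[0].meta;
}
```

Unlike the earlier RTE_PTR_DROP_QUALIFIERS form, the reader can still see which pointer type the store expects without looking up the callee's prototype.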
  
Andre Muezerie Jan. 18, 2025, 3:07 a.m. UTC | #5
On Thu, Jan 16, 2025 at 08:57:27AM +0000, Bruce Richardson wrote:
> On Wed, Jan 15, 2025 at 05:55:11PM -0800, Andre Muezerie wrote:
> > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > sprinkled over the code and only activate these pragmas for certain
> > compilers (gcc and clang). Clang supports GCC's pragma for
> > compatibility with existing source code, so #pragma GCC diagnostic
> > and #pragma clang diagnostic are synonyms for Clang
> > (https://clang.llvm.org/docs/UsersManual.html).
> > 
> > Now that effort is being made to make the code compatible with MSVC
> > these expressions would become more complex. It makes sense to hide
> > this complexity behind macros. This makes maintenance easier as these
> > macros are defined in a single place. As a plus the code becomes
> > more readable as well.
> > 
> > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > ---
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> On a stylistic note, I think you can be slightly less aggressive in
> wrapping the new code in the patch. DPDK allows lines up to 100 long
> without wrapping, so please don't wrap at 80.
> 
> Thanks,
> /Bruce

Thanks for calling this out. I followed your suggestion in the v14 series of this patchset.
  

Patch

diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h
index a326ba9ac8..f5f74a0a39 100644
--- a/drivers/net/axgbe/axgbe_rxtx.h
+++ b/drivers/net/axgbe/axgbe_rxtx.h
@@ -6,15 +6,6 @@ 
 #ifndef _AXGBE_RXTX_H_
 #define _AXGBE_RXTX_H_
 
-/* to suppress gcc warnings related to descriptor casting*/
-#ifdef RTE_TOOLCHAIN_GCC
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
-#ifdef RTE_TOOLCHAIN_CLANG
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 /* Descriptor related defines */
 #define AXGBE_MAX_RING_DESC		4096 /*should be power of 2*/
 #define AXGBE_TX_DESC_MIN_FREE		(AXGBE_MAX_RING_DESC >> 3)
diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h
index 479e1ddcb9..5b98f86932 100644
--- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h
+++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h
@@ -11,10 +11,6 @@ 
 #include "cpfl_ethdev.h"
 #include "cpfl_rxtx.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define CPFL_SCALAR_PATH		0
 #define CPFL_VECTOR_PATH		1
 #define CPFL_RX_NO_VECTOR_FLAGS (		\
diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c
index e3b6c7e460..f8b07a5acd 100644
--- a/drivers/net/dpaa2/dpaa2_rxtx.c
+++ b/drivers/net/dpaa2/dpaa2_rxtx.c
@@ -1962,14 +1962,6 @@  dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	return num_tx;
 }
 
-#if defined(RTE_TOOLCHAIN_GCC)
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#elif defined(RTE_TOOLCHAIN_CLANG)
-#pragma clang diagnostic push
-#pragma clang diagnostic ignored "-Wcast-qual"
-#endif
-
 /* This function loopbacks all the received packets.*/
 uint16_t
 dpaa2_dev_loopback_rx(void *queue,
@@ -2083,7 +2075,7 @@  dpaa2_dev_loopback_rx(void *queue,
 			if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0))
 				continue;
 		}
-		fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage);
+		fd[num_rx] = RTE_PTR_DROP_QUALIFIERS(qbman_result_DQ_fd(dq_storage));
 
 		dq_storage++;
 		num_rx++;
@@ -2118,8 +2110,3 @@  dpaa2_dev_loopback_rx(void *queue,
 
 	return 0;
 }
-#if defined(RTE_TOOLCHAIN_GCC)
-#pragma GCC diagnostic pop
-#elif defined(RTE_TOOLCHAIN_CLANG)
-#pragma clang diagnostic pop
-#endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 68acaca75b..6fc9097ebc 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -11,10 +11,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static void
 fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
 
@@ -270,7 +266,7 @@  fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 		/* Clean up all the HW/SW ring content */
 		for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) {
 			mb_alloc[i] = &rxq->fake_mbuf;
-			_mm_store_si128((__m128i *)&rxdp[i].q,
+			_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].q),
 						dma_addr0);
 		}
 
@@ -316,8 +312,8 @@  fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 		dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
 
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->q), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->q), dma_addr1);
 
 		/* enforce 512B alignment on default Rx virtual addresses */
 		mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr
@@ -465,7 +461,7 @@  fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs0[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -477,11 +473,11 @@  fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 #endif
 
 		/* A.1 load desc[2-0] */
-		descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs0[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
-		descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs0[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
-		descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs0[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp));
 
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts  */
@@ -736,7 +732,7 @@  vtx1(volatile struct fm10k_tx_desc *txdp,
 	__m128i descriptor = _mm_set_epi64x(flags << 56 |
 			(uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len,
 			MBUF_DMA_ADDR(pkt));
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h
index bbb5478015..a0acb2a3d6 100644
--- a/drivers/net/hns3/hns3_rxtx_vec_neon.h
+++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h
@@ -9,8 +9,6 @@ 
 
 #include <arm_neon.h>
 
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 static inline void
 hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt)
 {
@@ -22,8 +20,8 @@  hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt)
 		0,
 		((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT
 	};
-	vst1q_u64((uint64_t *)&desc->addr, val1);
-	vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2);
+	vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&desc->addr), val1);
+	vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&desc->tx.outer_vlan_tag), val2);
 }
 
 static uint16_t
diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
index 14424c9921..6eafe51e3d 100644
--- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
+++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
@@ -10,8 +10,6 @@ 
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
 
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 void
 i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs)
 {
diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h
index 85958d6c81..f8a4a96eee 100644
--- a/drivers/net/i40e/i40e_rxtx_common_avx.h
+++ b/drivers/net/i40e/i40e_rxtx_common_avx.h
@@ -11,10 +11,6 @@ 
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #ifdef __AVX2__
 static __rte_always_inline void
 i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
@@ -36,7 +32,7 @@  i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -72,8 +68,10 @@  i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read),
+				dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read),
+				dma_addr1);
 	}
 #else
 #ifdef __AVX512VL__
@@ -144,8 +142,10 @@  i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
 
 			/* flush desc with pa dma_addr */
-			_mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3);
-			_mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7);
+			_mm512_store_si512(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp->read), dma_addr0_3);
+			_mm512_store_si512(RTE_PTR_DROP_QUALIFIERS
+					(&(rxdp + 4)->read), dma_addr4_7);
 		}
 	} else
 #endif /* __AVX512VL__*/
@@ -190,8 +190,10 @@  i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room);
 
 			/* flush desc with pa dma_addr */
-			_mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1);
-			_mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3);
+			_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp->read), dma_addr0_1);
+			_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS
+					(&(rxdp + 2)->read), dma_addr2_3);
 		}
 	}
 
diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
index b6b0d38ec1..3c67f959b8 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
@@ -15,8 +15,6 @@ 
 
 #include <rte_altivec.h>
 
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -43,8 +41,7 @@  i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 			dma_addr0 = (__vector unsigned long){};
 			for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				vec_st(dma_addr0, 0,
-				       (__vector unsigned long *)&rxdp[i].read);
+				vec_st(dma_addr0, 0, RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read));
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -84,8 +81,8 @@  i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		dma_addr1 = vec_add(dma_addr1, hdr_room);
 
 		/* flush desc with pa dma_addr */
-		vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read);
-		vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read);
+		vec_st(dma_addr0, 0, RTE_PTR_DROP_QUALIFIERS(&rxdp++->read));
+		vec_st(dma_addr1, 0, RTE_PTR_DROP_QUALIFIERS(&rxdp++->read));
 	}
 
 	rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -286,7 +283,8 @@  _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = *(__vector unsigned long *)&sw_ring[pos];
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = *(__vector unsigned long *)(rxdp + 3);
+		descs[3] = *(__vector unsigned long *)
+			RTE_PTR_DROP_QUALIFIERS(rxdp + 3);
 		rte_compiler_barrier();
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
@@ -296,11 +294,14 @@  _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2];
 
 		/* A.1 load desc[2-0] */
-		descs[2] = *(__vector unsigned long *)(rxdp + 2);
+		descs[2] = *(__vector unsigned long *)
+			RTE_PTR_DROP_QUALIFIERS(rxdp + 2);
 		rte_compiler_barrier();
-		descs[1] = *(__vector unsigned long *)(rxdp + 1);
+		descs[1] = *(__vector unsigned long *)
+			RTE_PTR_DROP_QUALIFIERS(rxdp + 1);
 		rte_compiler_barrier();
-		descs[0] = *(__vector unsigned long *)(rxdp);
+		descs[0] = *(__vector unsigned long *)
+			RTE_PTR_DROP_QUALIFIERS(rxdp);
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		*(__vector unsigned long *)&rx_pkts[pos + 2] =  mbp2;
@@ -534,7 +535,7 @@  vtx1(volatile struct i40e_tx_desc *txdp,
 
 	__vector unsigned long descriptor = (__vector unsigned long){
 		pkt->buf_iova + pkt->data_off, high_qw};
-	*(__vector unsigned long *)txdp = descriptor;
+	*(__vector unsigned long *)RTE_PTR_DROP_QUALIFIERS(txdp) = descriptor;
 }
 
 static inline void
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
index 19cf0ac718..217add8be7 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
@@ -15,10 +15,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -39,8 +35,10 @@  desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp,
 			 const uint32_t desc_idx)
 {
 	/* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */
-	__m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2);
-	__m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2);
+	__m128i *rxdp_desc_0 = RTE_PTR_DROP_QUALIFIERS
+			((&rxdp[desc_idx + 0].wb.qword2));
+	__m128i *rxdp_desc_1 = RTE_PTR_DROP_QUALIFIERS
+			((&rxdp[desc_idx + 1].wb.qword2));
 	const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0);
 	const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1);
 
@@ -276,21 +274,29 @@  _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 				_mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
 
-		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+		const __m128i raw_desc7 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
-		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+		const __m128i raw_desc6 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
-		const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+		const __m128i raw_desc5 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
-		const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+		const __m128i raw_desc4 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
-		const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+		const __m128i raw_desc3 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
-		const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+		const __m128i raw_desc2 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
-		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+		const __m128i raw_desc1 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
-		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+		const __m128i raw_desc0 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 
 		const __m256i raw_desc6_7 = _mm256_inserti128_si256(
 				_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@ -695,7 +701,7 @@  vtx1(volatile struct i40e_tx_desc *txdp,
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
@@ -728,8 +734,8 @@  vtx(volatile struct i40e_tx_desc *txdp,
 		__m256i desc0_1 = _mm256_set_epi64x(
 				hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off,
 				hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off);
-		_mm256_store_si256((void *)(txdp + 2), desc2_3);
-		_mm256_store_si256((void *)txdp, desc0_1);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp + 2), desc2_3);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_1);
 	}
 
 	/* do any last ones */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
index 3b2750221b..52a54a9e79 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
@@ -15,10 +15,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define RTE_I40E_DESCS_PER_LOOP_AVX 8
 
 static __rte_always_inline void
@@ -41,8 +37,10 @@  desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp,
 			 const uint32_t desc_idx)
 {
 	/* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */
-	__m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2);
-	__m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2);
+	__m128i *rxdp_desc_0 = RTE_PTR_DROP_QUALIFIERS
+			(&rxdp[desc_idx + 0].wb.qword2);
+	__m128i *rxdp_desc_1 = RTE_PTR_DROP_QUALIFIERS
+			(&rxdp[desc_idx + 1].wb.qword2);
 	const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0);
 	const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1);
 
@@ -264,28 +262,28 @@  _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 
 		/* load in descriptors, in reverse order */
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp + 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-			_mm_load_si128((void *)(rxdp + 1));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
-			_mm_load_si128((void *)(rxdp + 0));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 
 		raw_desc6_7 =
 			_mm256_inserti128_si256
@@ -875,7 +873,7 @@  vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
@@ -909,7 +907,7 @@  vtx(volatile struct i40e_tx_desc *txdp,
 			hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off,
 			hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off,
 			hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off);
-		_mm512_storeu_si512((void *)txdp, desc0_3);
+		_mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3);
 	}
 
 	/* do any last ones */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 8b745630e4..ec59a68f9d 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -11,10 +11,6 @@ 
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline uint16_t
 reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 		   uint16_t nb_bufs, uint8_t *split_flags)
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index e1c5c7041b..23525db319 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -16,9 +16,6 @@ 
 #include "i40e_rxtx.h"
 #include "i40e_rxtx_vec_common.h"
 
-
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -41,7 +38,7 @@  i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		    rxq->nb_rx_desc) {
 			for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				vst1q_u64((uint64_t *)&rxdp[i].read, zero);
+				vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), zero);
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -58,11 +55,11 @@  i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		dma_addr0 = vdupq_n_u64(paddr);
 
 		/* flush desc with pa dma_addr */
-		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0);
 
 		paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr1 = vdupq_n_u64(paddr);
-		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1);
 	}
 
 	rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -87,10 +84,10 @@  descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt)
 {
 	/* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */
 	uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23;
-	desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2);
-	desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2);
-	desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2);
-	desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2);
+	desc0_qw23 = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 0)->wb.qword2));
+	desc1_qw23 = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 1)->wb.qword2));
+	desc2_qw23 = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 2)->wb.qword2));
+	desc3_qw23 = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 3)->wb.qword2));
 
 	/* FDIR ID data: move last u32 of each desc to 4 u32 lanes */
 	uint32x4_t v_unpack_02, v_unpack_13;
@@ -421,18 +418,22 @@  _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 		int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT};
 
 		/* A.1 load desc[3-0] */
-		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
-		descs[2] =  vld1q_u64((uint64_t *)(rxdp + 2));
-		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
-		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
+		descs[3] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
+		descs[2] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
+		descs[1] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
+		descs[0] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp));
 
 		/* Use acquire fence to order loads of descriptor qwords */
 		rte_atomic_thread_fence(rte_memory_order_acquire);
 		/* A.2 reload qword0 to make it ordered after qword1 load */
-		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
-		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
-		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
-		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+		descs[3] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 3), descs[3], 0);
+		descs[2] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 2), descs[2], 0);
+		descs[1] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 1), descs[1], 0);
+		descs[0] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp), descs[0], 0);
 
 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
@@ -662,7 +663,7 @@  vtx1(volatile struct i40e_tx_desc *txdp,
 			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
 
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
-	vst1q_u64((uint64_t *)txdp, descriptor);
+	vst1q_u64(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index ad560d2b6b..61c71c8c98 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@@ -14,10 +14,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -41,7 +37,7 @@  i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -72,8 +68,8 @@  i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1);
 	}
 
 	rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -97,10 +93,14 @@  descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt)
 {
 	/* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */
 	__m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23;
-	desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2);
-	desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2);
-	desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2);
-	desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2);
+	desc0_qw23 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&(rxdp + 0)->wb.qword2));
+	desc1_qw23 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&(rxdp + 1)->wb.qword2));
+	desc2_qw23 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&(rxdp + 2)->wb.qword2));
+	desc3_qw23 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&(rxdp + 3)->wb.qword2));
 
 	/* FDIR ID data: move last u32 of each desc to 4 u32 lanes */
 	__m128i v_unpack_01, v_unpack_23;
@@ -462,7 +462,7 @@  _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -474,11 +474,11 @@  _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 
 		/* A.1 load desc[2-0] */
-		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
-		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
-		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp));
 
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts  */
@@ -681,7 +681,7 @@  vtx1(volatile struct i40e_tx_desc *txdp,
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index 49d41af953..ab5c10fe03 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -6,10 +6,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
@@ -193,21 +189,29 @@  _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq,
 			 _mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
 
-		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+		const __m128i raw_desc7 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 7));
 		rte_compiler_barrier();
-		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+		const __m128i raw_desc6 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 6));
 		rte_compiler_barrier();
-		const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+		const __m128i raw_desc5 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 5));
 		rte_compiler_barrier();
-		const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+		const __m128i raw_desc4 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 4));
 		rte_compiler_barrier();
-		const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+		const __m128i raw_desc3 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 3));
 		rte_compiler_barrier();
-		const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+		const __m128i raw_desc2 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 2));
 		rte_compiler_barrier();
-		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+		const __m128i raw_desc1 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 1));
 		rte_compiler_barrier();
-		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+		const __m128i raw_desc0 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 0));
 
 		const __m256i raw_desc6_7 =
 			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@ -509,7 +513,7 @@  _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 			0, rxq->mbuf_initializer);
 	struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail];
 	volatile union iavf_rx_flex_desc *rxdp =
-		(union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+		(volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
 
 	rte_prefetch0(rxdp);
 
@@ -743,28 +747,28 @@  _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
 
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp + 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-			_mm_load_si128((void *)(rxdp + 1));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
-			_mm_load_si128((void *)(rxdp + 0));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 
 		raw_desc6_7 =
 			_mm256_inserti128_si256
@@ -960,36 +964,36 @@  _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 			    rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 				/* load bottom half of every 32B desc */
 				const __m128i raw_desc_bh7 =
-					_mm_load_si128
-						((void *)(&rxdp[7].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[7].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh6 =
-					_mm_load_si128
-						((void *)(&rxdp[6].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[6].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh5 =
-					_mm_load_si128
-						((void *)(&rxdp[5].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[5].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh4 =
-					_mm_load_si128
-						((void *)(&rxdp[4].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[4].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh3 =
-					_mm_load_si128
-						((void *)(&rxdp[3].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[3].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh2 =
-					_mm_load_si128
-						((void *)(&rxdp[2].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[2].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh1 =
-					_mm_load_si128
-						((void *)(&rxdp[1].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[1].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh0 =
-					_mm_load_si128
-						((void *)(&rxdp[0].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[0].wb.status_error1));
 
 				__m256i raw_desc_bh6_7 =
 					_mm256_inserti128_si256
@@ -1664,7 +1668,7 @@  iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static __rte_always_inline void
@@ -1719,8 +1723,8 @@  iavf_vtx(volatile struct iavf_tx_desc *txdp,
 				 pkt[1]->buf_iova + pkt[1]->data_off,
 				 hi_qw0,
 				 pkt[0]->buf_iova + pkt[0]->data_off);
-		_mm256_store_si256((void *)(txdp + 2), desc2_3);
-		_mm256_store_si256((void *)txdp, desc0_1);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp + 2), desc2_3);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_1);
 	}
 
 	/* do any last ones */
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index d6a861bf80..dbb9588a47 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -6,10 +6,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define IAVF_DESCS_PER_LOOP_AVX 8
 #define PKTLEN_SHIFT 10
 
@@ -165,28 +161,28 @@  _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq,
 
 		__m512i raw_desc0_3, raw_desc4_7;
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp + 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-			_mm_load_si128((void *)(rxdp + 1));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
-			_mm_load_si128((void *)(rxdp + 0));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 
 		raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4);
 		raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1);
@@ -600,7 +596,7 @@  _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 						    rxq->mbuf_initializer);
 	struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail];
 	volatile union iavf_rx_flex_desc *rxdp =
-		(union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+		(volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
 
 	rte_prefetch0(rxdp);
 
@@ -734,28 +730,28 @@  _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 		__m512i raw_desc0_3, raw_desc4_7;
 
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp + 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-			_mm_load_si128((void *)(rxdp + 1));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
-			_mm_load_si128((void *)(rxdp + 0));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 
 		raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4);
 		raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1);
@@ -1112,36 +1108,36 @@  _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 			    rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 				/* load bottom half of every 32B desc */
 				const __m128i raw_desc_bh7 =
-					_mm_load_si128
-						((void *)(&rxdp[7].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[7].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh6 =
-					_mm_load_si128
-						((void *)(&rxdp[6].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[6].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh5 =
-					_mm_load_si128
-						((void *)(&rxdp[5].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[5].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh4 =
-					_mm_load_si128
-						((void *)(&rxdp[4].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[4].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh3 =
-					_mm_load_si128
-						((void *)(&rxdp[3].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[3].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh2 =
-					_mm_load_si128
-						((void *)(&rxdp[2].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[2].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh1 =
-					_mm_load_si128
-						((void *)(&rxdp[1].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[1].wb.status_error1));
 				rte_compiler_barrier();
 				const __m128i raw_desc_bh0 =
-					_mm_load_si128
-						((void *)(&rxdp[0].wb.status_error1));
+					_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[0].wb.status_error1));
 
 				__m256i raw_desc_bh6_7 =
 					_mm256_inserti128_si256
@@ -1983,7 +1979,7 @@  iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 					    pkt->buf_iova + pkt->data_off);
-	_mm_storeu_si128((__m128i *)txdp, descriptor);
+	_mm_storeu_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 #define IAVF_TX_LEN_MASK 0xAA
@@ -2037,7 +2033,7 @@  iavf_vtx(volatile struct iavf_tx_desc *txdp,
 				 pkt[1]->buf_iova + pkt[1]->data_off,
 				 hi_qw0,
 				 pkt[0]->buf_iova + pkt[0]->data_off);
-		_mm512_storeu_si512((void *)txdp, desc0_3);
+		_mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3);
 	}
 
 	/* do any last ones */
@@ -2225,7 +2221,7 @@  ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
 	__m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off,
 							high_ctx_qw, low_ctx_qw);
 
-	_mm256_storeu_si256((__m256i *)txdp, ctx_data_desc);
+	_mm256_storeu_si256(RTE_PTR_DROP_QUALIFIERS(txdp), ctx_data_desc);
 }
 
 static __rte_always_inline void
@@ -2300,7 +2296,7 @@  ctx_vtx(volatile struct iavf_tx_desc *txdp,
 						hi_ctx_qw1, low_ctx_qw1,
 						hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off,
 						hi_ctx_qw0, low_ctx_qw0);
-		_mm512_storeu_si512((void *)txdp, desc0_3);
+		_mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3);
 	}
 
 	if (nb_pkts)
diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h
index 5c5220048d..a7f7f977ec 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/iavf/iavf_rxtx_vec_common.h
@@ -11,10 +11,6 @@ 
 #include "iavf.h"
 #include "iavf_rxtx.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline uint16_t
 reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 		   uint16_t nb_bufs, uint8_t *split_flags)
@@ -422,7 +418,7 @@  iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 				rxp[i] = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -458,8 +454,10 @@  iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr1);
 	}
 #else
 #ifdef CC_AVX512_SUPPORT
diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c
index 04be574683..e989868d7a 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_neon.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c
@@ -36,7 +36,7 @@  iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 		    rxq->nb_rx_desc) {
 			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 				rxep[i] = &rxq->fake_mbuf;
-				vst1q_u64((uint64_t *)&rxdp[i].read, zero);
+				vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), zero);
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -53,11 +53,11 @@  iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 		dma_addr0 = vdupq_n_u64(paddr);
 
 		/* flush desc with pa dma_addr */
-		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0);
 
 		paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr1 = vdupq_n_u64(paddr);
-		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1);
 	}
 
 	rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH;
@@ -269,18 +269,22 @@  _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq,
 		int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT};
 
 		/* A.1 load desc[3-0] */
-		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
-		descs[2] =  vld1q_u64((uint64_t *)(rxdp + 2));
-		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
-		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
+		descs[3] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
+		descs[2] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
+		descs[1] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
+		descs[0] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp));
 
 		/* Use acquire fence to order loads of descriptor qwords */
 		rte_atomic_thread_fence(rte_memory_order_acquire);
 		/* A.2 reload qword0 to make it ordered after qword1 load */
-		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
-		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
-		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
-		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+		descs[3] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 3), descs[3], 0);
+		descs[2] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 2), descs[2], 0);
+		descs[1] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 1), descs[1], 0);
+		descs[0] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+			(rxdp), descs[0], 0);
 
 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c
index 0db6fa8bd4..10173e2102 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_sse.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c
@@ -12,10 +12,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
@@ -38,7 +34,7 @@  iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 				rxp[i] = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -69,8 +65,10 @@  iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr1);
 	}
 
 	rxq->rxrearm_start += rxq->rx_free_thresh;
@@ -578,7 +576,8 @@  _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 3));
 		rte_compiler_barrier();
 
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -590,11 +589,14 @@  _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 
 		/* A.1 load desc[2-0] */
-		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 2));
 		rte_compiler_barrier();
-		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 1));
 		rte_compiler_barrier();
-		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp));
 
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts  */
@@ -783,7 +785,7 @@  _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 	/* Just the act of getting into the function from the application is
 	 * going to cost about 7 cycles
 	 */
-	rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+	rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
 
 	rte_prefetch0(rxdp);
 
@@ -864,7 +866,7 @@  _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -876,11 +878,11 @@  _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 #endif
 
 		/* A.1 load desc[2-0] */
-		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
-		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
-		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp));
 
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts  */
@@ -927,17 +929,17 @@  _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 			offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP ||
 			rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 			/* load bottom half of every 32B desc */
-			descs_bh[3] = _mm_load_si128
-					((void *)(&rxdp[3].wb.status_error1));
+			descs_bh[3] = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp[3].wb.status_error1));
 			rte_compiler_barrier();
-			descs_bh[2] = _mm_load_si128
-					((void *)(&rxdp[2].wb.status_error1));
+			descs_bh[2] = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp[2].wb.status_error1));
 			rte_compiler_barrier();
-			descs_bh[1] = _mm_load_si128
-					((void *)(&rxdp[1].wb.status_error1));
+			descs_bh[1] = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp[1].wb.status_error1));
 			rte_compiler_barrier();
-			descs_bh[0] = _mm_load_si128
-					((void *)(&rxdp[0].wb.status_error1));
+			descs_bh[0] = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp[0].wb.status_error1));
 		}
 
 		if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) {
@@ -1349,7 +1351,7 @@  vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 					    pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h
index dacb87dcb0..e35e79e39f 100644
--- a/drivers/net/ice/ice_rxtx_common_avx.h
+++ b/drivers/net/ice/ice_rxtx_common_avx.h
@@ -7,10 +7,6 @@ 
 
 #include "ice_rxtx.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #ifdef __AVX2__
 static __rte_always_inline void
 ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
@@ -33,7 +29,7 @@  ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < ICE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -77,8 +73,10 @@  ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+			(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+			(&rxdp++->read), dma_addr1);
 	}
 #else
 #ifdef __AVX512VL__
@@ -157,8 +155,10 @@  ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
 
 			/* flush desc with pa dma_addr */
-			_mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3);
-			_mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7);
+			_mm512_store_si512(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp->read), dma_addr0_3);
+			_mm512_store_si512(RTE_PTR_DROP_QUALIFIERS
+				(&(rxdp + 4)->read), dma_addr4_7);
 		}
 	} else
 #endif /* __AVX512VL__ */
@@ -213,8 +213,10 @@  ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room);
 
 			/* flush desc with pa dma_addr */
-			_mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1);
-			_mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3);
+			_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp->read), dma_addr0_1);
+			_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS
+				(&(rxdp + 2)->read), dma_addr2_3);
 		}
 	}
 
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index d6e88dbb29..da4f433db9 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -7,10 +7,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 ice_rxq_rearm(struct ice_rx_queue *rxq)
 {
@@ -254,21 +250,29 @@  _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			 _mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
 
-		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+		const __m128i raw_desc7 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 7));
 		rte_compiler_barrier();
-		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+		const __m128i raw_desc6 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 6));
 		rte_compiler_barrier();
-		const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+		const __m128i raw_desc5 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 5));
 		rte_compiler_barrier();
-		const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+		const __m128i raw_desc4 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 4));
 		rte_compiler_barrier();
-		const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+		const __m128i raw_desc3 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 3));
 		rte_compiler_barrier();
-		const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+		const __m128i raw_desc2 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 2));
 		rte_compiler_barrier();
-		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+		const __m128i raw_desc1 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 1));
 		rte_compiler_barrier();
-		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+		const __m128i raw_desc0 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(rxdp + 0));
 
 		const __m256i raw_desc6_7 =
 			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@ -444,37 +448,29 @@  _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads &
 					RTE_ETH_RX_OFFLOAD_RSS_HASH) {
 				/* load bottom half of every 32B desc */
-				const __m128i raw_desc_bh7 =
-					_mm_load_si128
-						((void *)(&rxdp[7].wb.status_error1));
+				const __m128i raw_desc_bh7 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[7].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh6 =
-					_mm_load_si128
-						((void *)(&rxdp[6].wb.status_error1));
+				const __m128i raw_desc_bh6 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[6].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh5 =
-					_mm_load_si128
-						((void *)(&rxdp[5].wb.status_error1));
+				const __m128i raw_desc_bh5 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[5].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh4 =
-					_mm_load_si128
-						((void *)(&rxdp[4].wb.status_error1));
+				const __m128i raw_desc_bh4 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[4].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh3 =
-					_mm_load_si128
-						((void *)(&rxdp[3].wb.status_error1));
+				const __m128i raw_desc_bh3 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[3].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh2 =
-					_mm_load_si128
-						((void *)(&rxdp[2].wb.status_error1));
+				const __m128i raw_desc_bh2 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[2].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh1 =
-					_mm_load_si128
-						((void *)(&rxdp[1].wb.status_error1));
+				const __m128i raw_desc_bh1 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[1].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh0 =
-					_mm_load_si128
-						((void *)(&rxdp[0].wb.status_error1));
+				const __m128i raw_desc_bh0 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[0].wb.status_error1));
 
 				__m256i raw_desc_bh6_7 =
 					_mm256_inserti128_si256
@@ -790,7 +786,7 @@  ice_vtx1(volatile struct ice_tx_desc *txdp,
 		ice_txd_enable_offload(pkt, &high_qw);
 
 	__m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt));
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static __rte_always_inline void
@@ -841,8 +837,8 @@  ice_vtx(volatile struct ice_tx_desc *txdp,
 			_mm256_set_epi64x
 				(hi_qw1, rte_pktmbuf_iova(pkt[1]),
 				 hi_qw0, rte_pktmbuf_iova(pkt[0]));
-		_mm256_store_si256((void *)(txdp + 2), desc2_3);
-		_mm256_store_si256((void *)txdp, desc0_1);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp + 2), desc2_3);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_1);
 	}
 
 	/* do any last ones */
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index add095ef06..5613478bca 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -7,10 +7,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define ICE_DESCS_PER_LOOP_AVX 8
 
 static __rte_always_inline void
@@ -244,28 +240,28 @@  _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 
 		/* load in descriptors, in reverse order */
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp + 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-			_mm_load_si128((void *)(rxdp + 1));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
-			_mm_load_si128((void *)(rxdp + 0));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 
 		raw_desc6_7 =
 			_mm256_inserti128_si256
@@ -474,37 +470,29 @@  _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 			if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads &
 					RTE_ETH_RX_OFFLOAD_RSS_HASH) {
 				/* load bottom half of every 32B desc */
-				const __m128i raw_desc_bh7 =
-					_mm_load_si128
-						((void *)(&rxdp[7].wb.status_error1));
+				const __m128i raw_desc_bh7 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[7].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh6 =
-					_mm_load_si128
-						((void *)(&rxdp[6].wb.status_error1));
+				const __m128i raw_desc_bh6 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[6].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh5 =
-					_mm_load_si128
-						((void *)(&rxdp[5].wb.status_error1));
+				const __m128i raw_desc_bh5 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[5].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh4 =
-					_mm_load_si128
-						((void *)(&rxdp[4].wb.status_error1));
+				const __m128i raw_desc_bh4 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[4].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh3 =
-					_mm_load_si128
-						((void *)(&rxdp[3].wb.status_error1));
+				const __m128i raw_desc_bh3 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[3].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh2 =
-					_mm_load_si128
-						((void *)(&rxdp[2].wb.status_error1));
+				const __m128i raw_desc_bh2 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[2].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh1 =
-					_mm_load_si128
-						((void *)(&rxdp[1].wb.status_error1));
+				const __m128i raw_desc_bh1 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[1].wb.status_error1));
 				rte_compiler_barrier();
-				const __m128i raw_desc_bh0 =
-					_mm_load_si128
-						((void *)(&rxdp[0].wb.status_error1));
+				const __m128i raw_desc_bh0 = _mm_load_si128
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[0].wb.status_error1));
 
 				__m256i raw_desc_bh6_7 =
 					_mm256_inserti128_si256
@@ -987,7 +975,7 @@  ice_vtx1(volatile struct ice_tx_desc *txdp,
 		ice_txd_enable_offload(pkt, &high_qw);
 
 	__m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt));
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static __rte_always_inline void
@@ -1029,7 +1017,7 @@  ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
 				 hi_qw2, rte_pktmbuf_iova(pkt[2]),
 				 hi_qw1, rte_pktmbuf_iova(pkt[1]),
 				 hi_qw0, rte_pktmbuf_iova(pkt[0]));
-		_mm512_storeu_si512((void *)txdp, desc0_3);
+		_mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3);
 	}
 
 	/* do any last ones */
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 4b73465af5..45147decff 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -7,10 +7,6 @@ 
 
 #include "ice_rxtx.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline uint16_t
 ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 			  uint16_t nb_bufs, uint8_t *split_flags)
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index c01d8ede29..5f536fe5c5 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -6,10 +6,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline __m128i
 ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3)
 {
@@ -52,7 +48,7 @@  ice_rxq_rearm(struct ice_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < ICE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -91,8 +87,8 @@  ice_rxq_rearm(struct ice_rx_queue *rxq)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1);
 	}
 
 	rxq->rxrearm_start += ICE_RXQ_REARM_THRESH;
@@ -425,7 +421,7 @@  _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -437,11 +433,11 @@  _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 
 		/* A.1 load desc[2-0] */
-		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
-		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
-		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp));
 
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts  */
@@ -491,19 +487,19 @@  _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			/* load bottom half of every 32B desc */
 			const __m128i raw_desc_bh3 =
 				_mm_load_si128
-					((void *)(&rxdp[3].wb.status_error1));
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[3].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh2 =
 				_mm_load_si128
-					((void *)(&rxdp[2].wb.status_error1));
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[2].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh1 =
 				_mm_load_si128
-					((void *)(&rxdp[1].wb.status_error1));
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[1].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh0 =
 				_mm_load_si128
-					((void *)(&rxdp[0].wb.status_error1));
+					(RTE_PTR_DROP_QUALIFIERS(&rxdp[0].wb.status_error1));
 
 			/**
 			 * to shift the 32b RSS hash value to the
@@ -680,7 +676,7 @@  ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt,
 		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt));
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h
index 2787d27616..002c1e6948 100644
--- a/drivers/net/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/idpf/idpf_rxtx_vec_common.h
@@ -11,10 +11,6 @@ 
 #include "idpf_ethdev.h"
 #include "idpf_rxtx.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define IDPF_SCALAR_PATH		0
 #define IDPF_VECTOR_PATH		1
 #define IDPF_RX_NO_VECTOR_FLAGS (		\
diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
index d451562269..92a89f8def 100644
--- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
+++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
@@ -8,8 +8,6 @@ 
 #include "ixgbe_ethdev.h"
 #include "ixgbe_rxtx.h"
 
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 void
 ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs)
 {
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index 952b032eb6..1c4a52a84a 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -11,8 +11,6 @@ 
 #include "ixgbe_rxtx.h"
 #include "ixgbe_rxtx_vec_common.h"
 
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 static inline void
 ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 {
@@ -36,7 +34,7 @@  ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 		    rxq->nb_rx_desc) {
 			for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				vst1q_u64((uint64_t *)&rxdp[i].read,
+				vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 					  zero);
 			}
 		}
@@ -60,12 +58,12 @@  ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 		paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr0 = vsetq_lane_u64(paddr, zero, 0);
 		/* flush desc with pa dma_addr */
-		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0);
 
 		vst1_u8((uint8_t *)&mb1->rearm_data, p);
 		paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr1 = vsetq_lane_u64(paddr, zero, 0);
-		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1);
 	}
 
 	rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH;
@@ -367,10 +365,10 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
 
 		/* A. load 4 pkts descs */
-		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
-		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
-		descs[2] =  vld1q_u64((uint64_t *)(rxdp + 2));
-		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
+		descs[0] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp));
+		descs[1] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
+		descs[2] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
+		descs[3] =  vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);
@@ -554,7 +552,7 @@  vtx1(volatile union ixgbe_adv_tx_desc *txdp,
 			pkt->buf_iova + pkt->data_off,
 			(uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len};
 
-	vst1q_u64((uint64_t *)&txdp->read, descriptor);
+	vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&txdp->read), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index a77370cdb7..c3c71d442f 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -12,10 +12,6 @@ 
 
 #include <rte_vect.h>
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline void
 ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 {
@@ -41,7 +37,7 @@  ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -76,8 +72,8 @@  ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 		dma_addr1 =  _mm_and_si128(dma_addr1, hba_msk);
 
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1);
 	}
 
 	rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH;
@@ -466,7 +462,7 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -478,11 +474,11 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 
 		/* A.1 load desc[2-0] */
-		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
-		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
-		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp));
 
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts  */
@@ -676,7 +672,7 @@  vtx1(volatile union ixgbe_adv_tx_desc *txdp,
 	__m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 |
 			flags | pkt->data_len,
 			pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)&txdp->read, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&txdp->read), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 16ddd05448..8bfbc7290d 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7287,10 +7287,7 @@  flow_tunnel_from_rule(const struct mlx5_flow *flow)
 {
 	struct mlx5_flow_tunnel *tunnel;
 
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wcast-qual"
-	tunnel = (typeof(tunnel))flow->tunnel;
-#pragma GCC diagnostic pop
+	tunnel = (typeof(tunnel))RTE_PTR_DROP_QUALIFIERS(flow->tunnel);
 
 	return tunnel;
 }
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index 240987d03d..b37483bcca 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@@ -25,11 +25,6 @@ 
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#pragma GCC diagnostic ignored "-Wstrict-aliasing"
-#endif
-
 /**
  * Store free buffers to RX SW ring.
  *
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index dc1d30753d..290395cb5d 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -25,8 +25,6 @@ 
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
 
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 /**
  * Store free buffers to RX SW ring.
  *
@@ -75,7 +73,7 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		    struct rte_mbuf **elts, bool keep)
 {
 	volatile struct mlx5_mini_cqe8 *mcq =
-		(void *)&(cq + !rxq->cqe_comp_layout)->pkt_info;
+		(volatile void *)&(cq + !rxq->cqe_comp_layout)->pkt_info;
 	/* Title packet is pre-built. */
 	struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0];
 	unsigned int pos;
@@ -139,9 +137,9 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 	 */
 cycle:
 	if (rxq->cqe_comp_layout)
-		rte_prefetch0((void *)(cq + mcqe_n));
+		rte_prefetch0(RTE_PTR_DROP_QUALIFIERS(cq + mcqe_n));
 	for (pos = 0; pos < mcqe_n; ) {
-		uint8_t *p = (void *)&mcq[pos % 8];
+		uint8_t *p = RTE_PTR_DROP_QUALIFIERS(&mcq[pos % 8]);
 		uint8_t *e0 = (void *)&elts[pos]->rearm_data;
 		uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data;
 		uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data;
@@ -157,7 +155,7 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		if (!rxq->cqe_comp_layout)
 			for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i)
 				if (likely(pos + i < mcqe_n))
-					rte_prefetch0((void *)(cq + pos + i));
+					rte_prefetch0((volatile void *)(cq + pos + i));
 		__asm__ volatile (
 		/* A.1 load mCQEs into a 128bit register. */
 		"ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t"
@@ -367,8 +365,8 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		if (!rxq->cqe_comp_layout) {
 			if (!(pos & 0x7) && pos < mcqe_n) {
 				if (pos + 8 < mcqe_n)
-					rte_prefetch0((void *)(cq + pos + 8));
-				mcq = (void *)&(cq + pos)->pkt_info;
+					rte_prefetch0((volatile void *)(cq + pos + 8));
+				mcq = (volatile void *)&(cq + pos)->pkt_info;
 				for (i = 0; i < 8; ++i)
 					cq[inv++].op_own = MLX5_CQE_INVALIDATE;
 			}
@@ -383,7 +381,7 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		    MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) {
 			pos = 0;
 			elts = &elts[mcqe_n];
-			mcq = (void *)cq;
+			mcq = (volatile void *)cq;
 			mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1;
 			pkts_n += mcqe_n;
 			goto cycle;
@@ -663,7 +661,7 @@  rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		mask = vcreate_u16(pkts_n - pos < MLX5_VPMD_DESCS_PER_LOOP ?
 				   -1UL >> ((pkts_n - pos) *
 					    sizeof(uint16_t) * 8) : 0);
-		p0 = (void *)&cq[pos].pkt_info;
+		p0 = RTE_PTR_DROP_QUALIFIERS(&cq[pos].pkt_info);
 		p1 = p0 + (pkts_n - pos > 1) * sizeof(struct mlx5_cqe);
 		p2 = p1 + (pkts_n - pos > 2) * sizeof(struct mlx5_cqe);
 		p3 = p2 + (pkts_n - pos > 3) * sizeof(struct mlx5_cqe);
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index 81a177fce7..c235c8eeee 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -24,10 +24,6 @@ 
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 /**
  * Store free buffers to RX SW ring.
  *
@@ -75,7 +71,8 @@  static inline uint16_t
 rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		    struct rte_mbuf **elts, bool keep)
 {
-	volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout);
+	volatile struct mlx5_mini_cqe8 *mcq =
+		(volatile void *)(cq + !rxq->cqe_comp_layout);
 	/* Title packet is pre-built. */
 	struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0];
 	unsigned int pos;
@@ -130,7 +127,7 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 	 */
 cycle:
 	if (rxq->cqe_comp_layout)
-		rte_prefetch0((void *)(cq + mcqe_n));
+		rte_prefetch0(RTE_PTR_DROP_QUALIFIERS(cq + mcqe_n));
 	for (pos = 0; pos < mcqe_n; ) {
 		__m128i mcqe1, mcqe2;
 		__m128i rxdf1, rxdf2;
@@ -141,10 +138,10 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		if (!rxq->cqe_comp_layout)
 			for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i)
 				if (likely(pos + i < mcqe_n))
-					rte_prefetch0((void *)(cq + pos + i));
+					rte_prefetch0(RTE_PTR_DROP_QUALIFIERS(cq + pos + i));
 		/* A.1 load mCQEs into a 128bit register. */
-		mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]);
-		mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]);
+		mcqe1 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(&mcq[pos % 8]));
+		mcqe2 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(&mcq[pos % 8 + 2]));
 		/* B.1 store rearm data to mbuf. */
 		_mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm);
 		_mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm);
@@ -355,8 +352,8 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		if (!rxq->cqe_comp_layout) {
 			if (!(pos & 0x7) && pos < mcqe_n) {
 				if (pos + 8 < mcqe_n)
-					rte_prefetch0((void *)(cq + pos + 8));
-				mcq = (void *)(cq + pos);
+					rte_prefetch0(RTE_PTR_DROP_QUALIFIERS(cq + pos + 8));
+				mcq = (volatile void *)(cq + pos);
 				for (i = 0; i < 8; ++i)
 					cq[inv++].op_own = MLX5_CQE_INVALIDATE;
 			}
@@ -371,7 +368,7 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		    MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) {
 			pos = 0;
 			elts = &elts[mcqe_n];
-			mcq = (void *)cq;
+			mcq = (volatile void *)cq;
 			mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1;
 			pkts_n += mcqe_n;
 			goto cycle;
@@ -651,38 +648,44 @@  rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		p = _mm_andnot_si128(mask, p);
 		/* A.1 load cqes. */
 		p3 = _mm_extract_epi16(p, 3);
-		cqes[3] = _mm_loadl_epi64((__m128i *)
-					   &cq[pos + p3].sop_drop_qpn);
+		cqes[3] = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS
+					   (&cq[pos + p3].sop_drop_qpn));
 		rte_compiler_barrier();
 		p2 = _mm_extract_epi16(p, 2);
-		cqes[2] = _mm_loadl_epi64((__m128i *)
-					   &cq[pos + p2].sop_drop_qpn);
+		cqes[2] = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS
+					   (&cq[pos + p2].sop_drop_qpn));
 		rte_compiler_barrier();
 		/* B.1 load mbuf pointers. */
 		mbp1 = _mm_loadu_si128((__m128i *)&elts[pos]);
 		mbp2 = _mm_loadu_si128((__m128i *)&elts[pos + 2]);
 		/* A.1 load a block having op_own. */
 		p1 = _mm_extract_epi16(p, 1);
-		cqes[1] = _mm_loadl_epi64((__m128i *)
-					   &cq[pos + p1].sop_drop_qpn);
+		cqes[1] = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS
+					   (&cq[pos + p1].sop_drop_qpn));
 		rte_compiler_barrier();
-		cqes[0] = _mm_loadl_epi64((__m128i *)
-					   &cq[pos].sop_drop_qpn);
+		cqes[0] = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS
+					   (&cq[pos].sop_drop_qpn));
 		/* B.2 copy mbuf pointers. */
 		_mm_storeu_si128((__m128i *)&pkts[pos], mbp1);
 		_mm_storeu_si128((__m128i *)&pkts[pos + 2], mbp2);
 		rte_io_rmb();
 		/* C.1 load remained CQE data and extract necessary fields. */
-		cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p3]);
-		cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos + p2]);
+		cqe_tmp2 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos + p3]));
+		cqe_tmp1 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos + p2]));
 		cqes[3] = _mm_blendv_epi8(cqes[3], cqe_tmp2, blend_mask);
 		cqes[2] = _mm_blendv_epi8(cqes[2], cqe_tmp1, blend_mask);
-		cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p3].csum);
-		cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos + p2].csum);
+		cqe_tmp2 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos + p3].csum));
+		cqe_tmp1 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos + p2].csum));
 		cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x30);
 		cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30);
-		cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]);
-		cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]);
+		cqe_tmp2 = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos + p3].rsvd4[2]));
+		cqe_tmp1 = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos + p2].rsvd4[2]));
 		cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04);
 		cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04);
 		/* C.2 generate final structure for mbuf with swapping bytes. */
@@ -700,16 +703,20 @@  rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		/* E.1 extract op_own field. */
 		op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]);
 		/* C.1 load remained CQE data and extract necessary fields. */
-		cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]);
-		cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]);
+		cqe_tmp2 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(&cq[pos + p1]));
+		cqe_tmp1 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(&cq[pos]));
 		cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask);
 		cqes[0] = _mm_blendv_epi8(cqes[0], cqe_tmp1, blend_mask);
-		cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p1].csum);
-		cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos].csum);
+		cqe_tmp2 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos + p1].csum));
+		cqe_tmp1 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos].csum));
 		cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x30);
 		cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30);
-		cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]);
-		cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]);
+		cqe_tmp2 = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos + p1].rsvd4[2]));
+		cqe_tmp1 = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS
+			(&cq[pos].rsvd4[2]));
 		cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04);
 		cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04);
 		/* C.2 generate final structure for mbuf with swapping bytes. */
diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c
index 37075ea5e7..3c555214a8 100644
--- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c
+++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c
@@ -35,7 +35,7 @@  ngbe_rxq_rearm(struct ngbe_rx_queue *rxq)
 		    rxq->nb_rx_desc) {
 			for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero);
+				vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i]), zero);
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -58,12 +58,12 @@  ngbe_rxq_rearm(struct ngbe_rx_queue *rxq)
 		paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr0 = vsetq_lane_u64(paddr, zero, 0);
 		/* flush desc with pa dma_addr */
-		vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp++), dma_addr0);
 
 		vst1_u8((uint8_t *)&mb1->rearm_data, p);
 		paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr1 = vsetq_lane_u64(paddr, zero, 0);
-		vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp++), dma_addr1);
 	}
 
 	rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH;
@@ -484,7 +484,7 @@  vtx1(volatile struct ngbe_tx_desc *txdp,
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off,
 				(uint64_t)pkt_len << 45 | flags | pkt_len};
 
-	vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor);
+	vst1q_u64(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index c0e44bb1a7..373b773e2d 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -23,10 +23,10 @@ 
 
 #ifdef HAVE_BPF_RSS
 /* Workaround for warning in bpftool generated skeleton code */
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wcast-qual"
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 #include "tap_rss.skel.h"
-#pragma GCC diagnostic pop
+__rte_diagnostic_pop
 #endif
 
 #define ISOLATE_HANDLE 1
diff --git a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c
index d4d647fab5..713b8fc26b 100644
--- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c
+++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c
@@ -34,7 +34,7 @@  txgbe_rxq_rearm(struct txgbe_rx_queue *rxq)
 		    rxq->nb_rx_desc) {
 			for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero);
+				vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i]), zero);
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -57,12 +57,12 @@  txgbe_rxq_rearm(struct txgbe_rx_queue *rxq)
 		paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr0 = vsetq_lane_u64(paddr, zero, 0);
 		/* flush desc with pa dma_addr */
-		vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp++), dma_addr0);
 
 		vst1_u8((uint8_t *)&mb1->rearm_data, p);
 		paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr1 = vsetq_lane_u64(paddr, zero, 0);
-		vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp++), dma_addr1);
 	}
 
 	rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH;
@@ -484,7 +484,7 @@  vtx1(volatile struct txgbe_tx_desc *txdp,
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off,
 				(uint64_t)pkt_len << 45 | flags | pkt_len};
 
-	vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor);
+	vst1q_u64(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 
 static inline void
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index 438256970d..439e00a7e1 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -23,10 +23,6 @@ 
 
 #include "virtio_rxtx_simple.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 int __rte_cold
 virtio_rxq_vec_setup(struct virtnet_rx *rxq)
 {