Patchwork [OpenWrt-Devel] ar71xx TCP/IPsec unaligned instructions

Submitter Markus Stockhausen
Date 2012-01-04 15:58:33
Message ID <web-3872266@collogia.de>
Permalink /patch/1721/
State Superseded

Comments

Markus Stockhausen - 2012-01-04 15:58:33
Hello,

Attached is an extension to the existing patch that fixes some more unaligned
accesses on ar71xx devices. The first patched location is in tcp_rcv_established(),
on the TCP receive path. The second is relevant only for IPsec tunnels. In my setup
this results in the following numbers:

Without patch:

Normal transmission of 400 MB between router and client: ~140k mem faults
Encrypted transmission of 200 MB between router and client: ~600k mem faults

With patch:

Normal transmission of 400 MB between router and client: no more mem faults
Encrypted transmission of 200 MB between router and client: ~480k mem faults

Signed-off-by: markus <dot> stockhausen <at> collogia <dot> de

Best regards.

Markus
****************************************************************************
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497

****************************************************************************
Jonathan Bither - 2012-01-04 16:10:51
On a side note, would the unaligned-access patch in the ar71xx target also
benefit the atheros target?

Dave Taht - 2012-01-04 16:23:00
2012/1/4 Markus Stockhausen <markus.stockhausen@collogia.de>:
> Hello,
>
> attached an extension to the already existing patch that will fix some more
> unaligned
> accesses on ar71xx devices. First patch location during establishing of tcp
> recv operations.

Awesome. Did you get any different performance numbers from something
like iperf or netperf from this?

Markus Stockhausen - 2012-01-04 19:18:58
> Awesome. Did you get any different performance numbers from something
> like iperf or netperf from this?
  
Sorry, but the patch is only a by-product of my work on a fast IPsec module.
It might help with normal transfers over a gigabit interconnect. Here is
a profile snippet, taken with the patch applied, for IPsec AES128/MD5
throughput of around 5 MB/s.

CPU: MIPS 24K, speed 0 MHz (estimated)
Counted INSTRUCTIONS events (Instructions completed) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        app name                 symbol name
8834     47.9587  mipsec.ko                aes_enc_loop
2559     13.8925  mipsec.ko                mi_md5_transform
725       3.9359  crypto_algapi.ko         crypto_xor
620       3.3659  vmlinux                  __copy_user
482       2.6167  mipsec.ko                mi_aes_encrypt
448       2.4321  vmlinux                  rt_intern_hash
367       1.9924  ip_tables.ko             ipt_do_table
251       1.3626  mipsec.ko                aes_dec_loop
...
30        0.1629  vmlinux                  do_ade
...

Markus
Dave Taht - 2012-01-29 16:17:27
2012/1/4 Markus Stockhausen <markus.stockhausen@collogia.de>:
> Hello,
>
> attached an extension to the already existing patch that will fix some more
> unaligned
> accesses on ar71xx devices. First patch location during establishing of tcp
> recv operations.
> The second one is only relevant for IPsec tunnels. In my setup this results
> in the following
> numbers:
>
> Without patch:
>
> Normal transmission of 400 MB between router and client: ~140k mem faults
> Encrypted transmission of 200 MB between router and client: ~600k mem faults
>
> With patch:
>
> Normal transmission of 400 MB between router and client: no more mem faults
> Encrypted transmission of 200 MB between router and client: ~480k mem faults


I am planning to play with this patch this week, and update to 3.2.2.

I am curious if you tracked down the causes of the other 480k mem faults?
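
For scale, the quoted figures can be normalized per megabyte transferred. The sketch below is only arithmetic on the numbers from the original mail; it assumes that "k" means 1000 and that each count covers the whole transfer, and the function names are made up for this illustration.

```c
#include <assert.h>

/* Faults per transferred megabyte (inputs: total faults, size in MB). */
static long faults_per_mb(long faults, long megabytes)
{
    assert(megabytes > 0);
    return faults / megabytes;
}

/* Share of faults, in percent, that disappears going from `before` to `after`. */
static long percent_removed(long before, long after)
{
    assert(before > 0 && after <= before);
    return (before - after) * 100 / before;
}
```

With the reported values, faults_per_mb(140000, 400) is 350, faults_per_mb(600000, 200) is 3000 and faults_per_mb(480000, 200) is 2400, while percent_removed(600000, 480000) is 20. Under those assumptions the patch removes about a fifth of the faults in the encrypted case.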

Patch

Index: target/linux/ar71xx/patches-2.6.39/910-unaligned_access_hacks.patch
===================================================================
--- target/linux/ar71xx/patches-2.6.39/910-unaligned_access_hacks.patch	(revision 29645)
+++ target/linux/ar71xx/patches-2.6.39/910-unaligned_access_hacks.patch	(working copy)
@@ -115,3 +115,47 @@ 
  
  	return true;
  }
+--- a/net/ipv4/tcp_input.c
++++ b/net/ipv4/tcp_input.c
+@@ -5234,6 +5234,7 @@ int tcp_rcv_established(struct sock *sk,
+ {
+ 	struct tcp_sock *tp = tcp_sk(sk);
+ 	int res;
++	__be32 *tfw = ((union tcp_word_hdr *)(th))->words + 3;
+ 
+ 	/*
+ 	 *	Header prediction.
+@@ -5261,7 +5262,7 @@ int tcp_rcv_established(struct sock *sk,
+ 	 *	PSH flag is ignored.
+ 	 */
+ 
+-	if ((tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags &&
++	if ((__get_unaligned_cpu32(tfw) & TCP_HP_BITS) == tp->pred_flags &&
+ 	    TCP_SKB_CB(skb)->seq == tp->rcv_nxt &&
+ 	    !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
+ 		int tcp_header_len = tp->tcp_header_len;
+--- a/net/xfrm/xfrm_input.c
++++ b/net/xfrm/xfrm_input.c
+@@ -52,6 +52,7 @@ int xfrm_parse_spi(struct sk_buff *skb,
+ {
+ 	int offset, offset_seq;
+ 	int hlen;
++	__be32 *pspi, *pseq;
+ 
+ 	switch (nexthdr) {
+ 	case IPPROTO_AH:
+@@ -77,8 +78,12 @@ int xfrm_parse_spi(struct sk_buff *skb,
+ 	if (!pskb_may_pull(skb, hlen))
+ 		return -EINVAL;
+ 
+-	*spi = *(__be32*)(skb_transport_header(skb) + offset);
+-	*seq = *(__be32*)(skb_transport_header(skb) + offset_seq);
++	pspi = (__be32 *)(skb_transport_header(skb) + offset);
++	pseq = (__be32 *)(skb_transport_header(skb) + offset_seq);
++
++	*spi = __get_unaligned_cpu32(pspi);
++	*seq = __get_unaligned_cpu32(pseq);
++
+ 	return 0;
+ }
+
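
Both hunks above replace a direct 32-bit dereference of a pointer into packet data with an unaligned-safe load. The following is a minimal userspace sketch of that idea, not the kernel helper itself: it swaps in memcpy() as the portable technique, and the names load_u32_unaligned and read_field are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value in CPU byte order from an address that may not be
 * 4-byte aligned. Copying through memcpy() avoids a plain word load, which
 * is the kind of access that can trap on CPUs without hardware support for
 * unaligned loads.
 */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/*
 * Hypothetical stand-in for the SPI/sequence-number reads in
 * xfrm_parse_spi(): fetch a 32-bit field at byte offset `off` of a packet
 * buffer of `len` bytes without assuming anything about alignment.
 */
static uint32_t read_field(const unsigned char *buf, size_t len, size_t off)
{
    assert(off <= len && len - off >= sizeof(uint32_t));
    return load_u32_unaligned(buf + off);
}
```

Like the patched kernel code, the sketch returns the value in whatever byte order it is stored in; any conversion from network byte order is left to the caller.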