Of the range of OSs available for the Raspberry Pi, I would like to know whether the distro affects performance in pure terminal mode. I plan to run computationally intensive algorithms on it, so I would like to get the best performance out of it.

Specifically, I am interested in Raspbian, Ubuntu MATE and ArchLinux. How does the performance compare when no GUI-related process is running in the background? If I were to disable startx and boot into text mode, would there be a difference in speed between these distros?

I would prefer Raspbian or Ubuntu MATE, as they already have WiFi and Bluetooth configured (I need both), but I was wondering whether Arch could be any faster.

Ébe Isaac
  • If you really want to know, try each and benchmark, rather than relying on others' opinions. – Milliways Jul 26 '16 at 04:57
  • Thanks for the suggestion, @Milliways. I was going to do just that, but was wondering if it has been done before. – Ébe Isaac Jul 26 '16 at 05:10
  • I thought I saw a performance comparison of distros some time ago but I am unable to find it again. While it might be true that a lightweight distro saves some time during boot, I would not expect a big performance gain. Do not underestimate the value of a well documented distro with a large user base, which makes any configuration or bug fixing task a lot easier (ok, but Arch is well documented too). Just consider making that part of your *performance metrics* too. – Ghanima Jul 26 '16 at 07:50
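
For anyone taking up the "try each and benchmark" suggestion, here is a minimal sketch of a timing harness that could be compiled and run unchanged on each distro. The names `workload` and `best_seconds` are made up for illustration, and the sum-of-squares loop is only a stand-in for the real algorithm; swap in something representative of your own job.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical CPU-bound stand-in for the real computation:
// sum of i*i for i = 1..n, wrapping modulo 2^64.
std::uint64_t workload(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        acc += i * i;
    }
    return acc;
}

// Call fn() `repeats` times and return the fastest wall-clock time in
// seconds. Taking the minimum filters out runs disturbed by other processes.
template <typename Fn>
double best_seconds(Fn fn, int repeats) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}
```

To use it, call something like `best_seconds([] { volatile std::uint64_t sink = workload(100000000); (void)sink; }, 5)` from `main` and print the result; the `volatile` sink keeps the compiler from discarding the loop. Build with the same compiler flags on every distro, otherwise you end up comparing toolchains rather than distros.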

3 Answers

3

I would like to know whether the distro affects performance in pure terminal mode. I plan to run computationally intensive algorithms on it, so I would like to get the best performance out of it...

It depends on the computation and the OS, and less on the terminal. The RPI-3 is ARMv8/Aarch64. ARMv8 is 64-bit, so it can be more efficient than earlier models of the Raspberry Pi.

ARMv8/Aarch64 also has two (maybe more) useful instructions: pmull and pmull2. They perform 64x64 → 128-bit carryless multiplies. They can serve as the building block of a larger multiplier, which improves speed in some contexts by 30%. Additional instructions include hardware acceleration for CRC, AES, SHA1 and SHA2.
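
To make "carryless multiply" concrete, here is a slow, portable reference model of the 64x64 → 128-bit operation. It is only a sketch of the arithmetic (polynomial multiplication over GF(2), where partial products are combined with XOR instead of addition), not how the hardware instruction is invoked. `U128` and `clmul64` are names invented for this example.

```cpp
#include <cstdint>

// Hypothetical 128-bit result type: high and low 64-bit halves.
struct U128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

// Carry-less multiply: treat a and b as polynomials over GF(2).
// For every set bit i of b, XOR (a << i) into the 128-bit result.
U128 clmul64(std::uint64_t a, std::uint64_t b) {
    U128 r{0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0) {
                // Bits of a that were shifted out of the low half.
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}
```

For example, `clmul64(3, 3)` gives 5 rather than 9, because (x + 1)(x + 1) = x^2 + 1 once the two middle terms cancel under XOR. The hardware instruction does in one step what this loop does in 64 iterations, which is where the speedup in things like GCM comes from.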

The OS image provided by the Raspberry Pi project does not utilize or expose the ARMv8 instructions, and the toolchain is not capable of compiling programs that take advantage of them (even though Aarch32 execution environments on Aarch64 are completely valid). Also see Enable crc32 for armv7? on the GCC help mailing list.


Raspbian, Ubuntu MATE and ArchLinux...

I think you should make ARMv8/AArch64 a priority in your quest. Unfortunately, I don't believe you will find a 64-bit OS from Ubuntu or ArchLinux. My apologies if it's bikeshedding.

The CentOS ARM folks are getting ready to investigate the feasibility of a 64-bit OS image for the device. Also see Raspberry Pi 3 and Aarch64 image? on the ARM-dev mailing list.


You can see some of the CPU features by inspecting /proc/cpuinfo:

$ cat /proc/cpuinfo
processor   : 0
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS    : 38.40
Features    : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part    : 0xd03
CPU revision    : 4
...

Hardware    : BCM2709
Revision    : a22082
Serial      : 00000000e7ffc20d

Notice that flags like pmull and pmull2 are missing. Surprisingly, it does include crc32, even though that is an ARMv8 extension.
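
If you want to check for a flag programmatically instead of by eye, something along these lines should work. It is a minimal sketch that assumes the `Features : flag flag ...` layout shown in the listing above; `parse_features` and `has_feature` are names made up here, and the exact flag spellings a given kernel reports are not guaranteed.

```cpp
#include <fstream>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Collect the flags from the first "Features" line of a cpuinfo-style stream.
std::set<std::string> parse_features(std::istream& in) {
    std::set<std::string> flags;
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 8, "Features") != 0) {
            continue;  // not the line we are looking for
        }
        const auto colon = line.find(':');
        if (colon == std::string::npos) {
            continue;
        }
        std::istringstream words(line.substr(colon + 1));
        std::string flag;
        while (words >> flag) {
            flags.insert(flag);
        }
        break;  // every core reports the same list, so one line is enough
    }
    return flags;
}

// True if the named flag (e.g. "crc32") was listed.
bool has_feature(const std::set<std::string>& flags, const std::string& name) {
    return flags.count(name) != 0;
}
```

On the Pi you would open the real file with `std::ifstream cpuinfo("/proc/cpuinfo");` and pass it to `parse_features`. Keep in mind this only tells you what the kernel chooses to report, which is not necessarily everything the silicon can execute.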


I know the Broadcom SoC can consume the instructions because I wrote a program that tested the instructions. The program simply emitted the byte codes for the instructions, and the compiler could not stop me from using them :)

If interested, here's what it would look like. If you compile it with -march=armv7-a -mfpu=neon and then run it, you will notice it does not die with an illegal instruction and it returns 0 as expected.

$ cat ../test.cc
#include <arm_neon.h>
int main(int argc, char* argv[])
{
  __asm__ __volatile__
  (
   #ifndef __aarch64__
     ".code 32\n"
   #endif

   // PMULL
   ".byte 0x0e, 0xe1, 0xe0, 0x00;\n"
   // PMULL2
   ".byte 0x4e, 0xe1, 0xe0, 0x00;\n"
   ...

  : : : "cc", "d0", "d1", "d2", "q0", "q1", "q2"
 );

  return 0;
}
  • Thank you very much for the detailed explanation. Is there a significant performance difference for code implemented in Python with NumPy? I realize C++ is faster in theory, but as my applications are in data science, I'm mostly inclined towards Python and R. Any views on this? – Ébe Isaac Jul 28 '16 at 03:43
  • I think there are two questions there. First, is there a difference between 32-bit Python and 64-bit Python? For this question, I'm guessing YES. I expect 64-bit to be faster because it can operate on more data per cycle. The second question is, does Python take advantage of the instructions? For this question, I'm guessing NO, but I have no real knowledge. (I have knowledge of some of the issues because I help maintain the [Crypto++ library](http://www.cryptopp.com/). IoT gadgets are a priority for me. I'm the guy who implemented GCM mode for ARM using instructions like PMULL and PMULL2). –  Jul 28 '16 at 04:32
  • Nice, interesting, detailed answer. Two observations for posterity though: Ubuntu, Archlinux, and I'm presuming all other major distros *do* have `aarch64` versions and there's no reason they could not be used on the Pi 3 [following this methodology](http://raspberrypi.stackexchange.com/a/27545/5538) -- of course some people are going to have trouble with that, and other people are going to recognize it takes less than an hour if you understand it; the only uncertain part is *if* you want to go all the way, you'd want to compile the kernel for the architecture too. – goldilocks Aug 03 '16 at 14:26
  • (BTW, I do use that on the Pi 3 but have not bothered trying aarch64 as I currently use the same card in a 2, and...) Second, while I understand you are being completely honest, people should not get too excited/misled by the "may improve speed by 30% in some contexts" -- based on this I still think overall, for general purposes, it is not going to make much difference for most people doing most things, so unless you are really interested and know what you are doing, don't bother with what I've just suggested should be possible to try without much trouble. – goldilocks Aug 03 '16 at 14:26
2

I don't have a thorough answer as I have not done any benchmarking or deep investigation of such, but the only thing that could be significant here is the difference between something compiled for ARMv6 (e.g., Raspbian), something compiled for ARMv7 (various distros targeting the 2/3 including Ubuntu variants), and something targeting 64-bit ARMv8 (Pi 3 only; there is nothing pi specific here, but there are distros you could adapt to it).

I run an ARMv7 distro on the 2/3 and TBH I do not think that amounts to much if any difference performance wise. The version of ARMv6 implemented on the single core models, for which Raspbian is compiled, is actually very close to v7. Further, the fact that the Raspberry Pi Foundation did their benchmarks on the 2 and 3 using Raspbian implies that they did not see a huge advantage in using anything else (I do not have a direct reference for that but it is implicit in the material on their site).
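
If you are unsure which of these targets a given toolchain is actually building for, the compiler's predefined macros will tell you. The sketch below relies on `__aarch64__`, `__ARM_ARCH` and `__arm__`, which GCC and Clang define on ARM targets; `compiled_target` is just a name chosen for this example.

```cpp
#include <string>

// Report the ARM architecture this translation unit was compiled for,
// based on predefined compiler macros (evaluated at compile time, so this
// describes the build target, not the CPU the binary happens to run on).
std::string compiled_target() {
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__ARM_ARCH)
    return "arm v" + std::to_string(__ARM_ARCH);
#elif defined(__arm__)
    return "arm (version macro not defined)";
#else
    return "non-ARM";
#endif
}
```

Printing the result from `main` on each distro, with its default compiler and no `-march` flag, is a quick way to see what the default build target is.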

A quick search for "aarch64" distro benchmarks on the Pi 3 at first seems to show a modest advantage, although if you look closely at that page you'll notice the tests weren't done on a Pi 3, they were done on a Cortex based Juno board -- something to watch out for if you do your own research.

At this point, if someone had demonstrated aarch64 made much difference on the Pi 3, I think that information would be easy to find, so the silence implies it does not.

How does the performance compare when no GUI-related process is running in the background?

Running something inside a GUI won't make any real difference performance wise unless the additional memory consumed becomes significant. Ideally, on the pi you probably want at least 50-100 MB free for the page cache, which generally improves performance. If you are not getting that, you want to try and free some up, whether it is from a GUI or something else. Beware that many tools include (or distinguish) the page cache in their reporting of memory use, so if at first memory appears full, check whether that is the case. If the figure includes the cache and the system has been in use for a while, it should be full.
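
To see how memory splits between genuinely free and page cache without relying on how a particular tool presents it, you can read /proc/meminfo directly. This is a minimal sketch assuming the usual `Key:   value kB` layout of that file; `parse_meminfo` and `field_mb` are hypothetical helper names.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse "Key:   value kB" lines into a map from key to value in kB.
std::map<std::string, long> parse_meminfo(std::istream& in) {
    std::map<std::string, long> kb;
    std::string line;
    while (std::getline(in, line)) {
        const auto colon = line.find(':');
        if (colon == std::string::npos) {
            continue;  // skip anything that is not a key/value line
        }
        std::istringstream rest(line.substr(colon + 1));
        long value = 0;
        if (rest >> value) {
            kb[line.substr(0, colon)] = value;
        }
    }
    return kb;
}

// Value of one field in whole MB, or -1 if the field is absent.
long field_mb(const std::map<std::string, long>& kb, const std::string& key) {
    const auto it = kb.find(key);
    return it == kb.end() ? -1 : it->second / 1024;
}
```

Open the file with `std::ifstream meminfo("/proc/meminfo");` (from `<fstream>`), then compare `field_mb(m, "MemFree")` with `field_mb(m, "Cached")`. A small MemFree alongside a large Cached is the normal, healthy state described above.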

Likewise, if there are other things active and contending for CPU time that are part of the GUI, this will have an impact. However, this is equally possible without a GUI. In my experience the only significant culprits here are (ironically) eye candy system monitors, which on a pi may use as much as 5-10% of a core depending how frequently they update.

You can use tools such as top or htop to check for this kind of thing, GUI or not. Since they're monitors, they may end up appearing frequently near the top of their own list, but you only need them for evaluation.

goldilocks
  • Thank you for the detailed answer. About the "eye candy system monitors"; could you name a few? – Ébe Isaac Jul 26 '16 at 12:15
  • I'm a `gkrellm` fan so more specifically that's what I was referring to, but that is with quite a pile of metrics updated several times per second; another one that is/was popular on linux is `conky`. The little taskbar based CPU monitors that may be available in LXDE (the Raspbian desktop) are probably much lighter. – goldilocks Jul 26 '16 at 12:19
0

If your application benefits from the extra instructions available in the ARMv8 instruction set, then you could consider Gentoo and build your own. I looked at the Gentoo Raspberry Pi guide and it looks like a doable solution. The Gentoo community has good and knowledgeable support. If you're setting up multiple Pis, then it might not be a big overhead. The most hassle could be the distcc setup for compiling everyt

NTwoO