See the question and my original answer on StackOverflow

The performance problem comes neither from interop nor from C#. It comes from the fact that you read the bitmap's Width and Height in the loop condition, so they are evaluated on every iteration. Both properties internally call a GDI+ API:

public int Width {
    get {
        int width; 
 
        int status = SafeNativeMethods.Gdip.GdipGetImageWidth(new HandleRef(this, nativeImage), out width);
 
        if (status != SafeNativeMethods.Gdip.Ok)
            throw SafeNativeMethods.Gdip.StatusException(status);
 
        return width;
    }
}

Note that you don't do this in the C/C++ case, where you pass a precomputed height and width. So, if you change the C# version to this:

unsafe
{
    byte* array = (byte*)bmpData.Scan0.ToPointer();
    byte temp;
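    // Read Width and Height once, outside the loop.
    // Assumes 24bpp pixels with no row padding (Stride == Width * 3).
    var max = bmp.Width * bmp.Height * 3;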
    for (int x = 0; x < max; x = x + 3) {
        temp = *(array + x + 2);
        *(array + x + 2) = *(array + x);
        *(array + x) = temp;
    }
}

It may even run faster overall than the C/C++ version. You can also use a safe version like this:

private static void ChangeSafe(Bitmap bmp, BitmapData bmpData)
{
    var array = bmpData.Scan0;
    byte temp;
    var max = bmp.Width * bmp.Height * 3;
    for (var x = 0; x < max; x = x + 3)
    {
        temp = Marshal.ReadByte(array + x + 2);
        Marshal.WriteByte(array + x + 2, Marshal.ReadByte(array + x));
        Marshal.WriteByte(array + x, temp);
    }
}

It's marginally slower but avoids the need for unsafe code.
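
One caveat with both versions above: `bmp.Width * bmp.Height * 3` only covers the whole image when each row is exactly `Width * 3` bytes long. GDI+ may pad rows, in which case `BitmapData.Stride` is larger than that. Here is a minimal sketch of a row-by-row variant, assuming a 24bpp format and a positive stride. It works on a plain `byte[]` so it can be tried without a bitmap; `ChannelSwap` and `SwapRedBlue` are names I made up for illustration, not part of any library.

```csharp
using System;

static class ChannelSwap
{
    // Swaps the first and third byte of every 3-byte pixel, one row at a time.
    // stride is the number of bytes per row including padding
    // (what BitmapData.Stride reports); assumed positive here.
    public static void SwapRedBlue(byte[] pixels, int width, int height, int stride)
    {
        for (int y = 0; y < height; y++)
        {
            int row = y * stride;          // first byte of this row
            int rowEnd = row + width * 3;  // one past the last pixel byte, padding excluded
            for (int i = row; i < rowEnd; i += 3)
            {
                byte temp = pixels[i + 2];
                pixels[i + 2] = pixels[i];
                pixels[i] = temp;
            }
        }
    }
}
```

With a locked bitmap you could copy the pixels into such a buffer with `Marshal.Copy(bmpData.Scan0, buffer, 0, bmpData.Stride * bmp.Height)`, call the method with `bmpData.Stride`, and copy the result back. The same row/stride arithmetic also applies directly to the pointer version.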