tamthegod's blog

By tamthegod, history, 4 weeks ago, In English
ll res = -oo;
    for(int k=1; k<=n; k++)
        for(int i=1; i<=n-k+1; i++)
            for(int j=1; j<=n-k+1; j++)
            {
                f[i][j][k] = f[i][j][k - 1] + a[i + k - 1] * b[j + k - 1];
                res = max(res, f[i][j][k]);
            }
    cout << res;
ll res = -oo;
    for(int i=1; i<=n; i++)
        for(int j=1; j<=n; j++)
            for(int k=1; k<=n; k++)
            {
                if(k + i > n || k + j > n) break;
                f[i][j][k] = f[i][j][k - 1] + a[i + k - 1] * b[j + k - 1];
                res = max(res, f[i][j][k]);
            }
    cout << res;

These code seem not to be differ too much but execution time have much differ. Example in case n = 500, code1 run in 3103ms but code 2 run only 789ms. What is the reason ? Pls explain to me. Thanks.

 
 
 
 
  • Vote: I like it
  • +20
  • Vote: I do not like it

»
4 weeks ago, # |
  Vote: I like it +3 Vote: I do not like it

From my little understanding, arrays are arranged row order in continuos blocks of memory. i.e. the first row is in a block of memory, then the second row immediately continues from where the first one left off and so on...

This makes it easier to go to the next row in a loop that is ordered row by row by just going to the next position in memory eveytime making it cache friendly. If ordered the other way around, it takes more time as you have to continuosly make jumps in memory position when switching to different rows which isn't cache friendly

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Yes, C arrays and C++ vectors have to store elements contiguously to make pointer arithmetic to work.

    In fact C's pointer arithmetic prevents compiler optimizations unless used with restrict

»
4 weeks ago, # |
  Vote: I like it +3 Vote: I do not like it

Hey, can you please put the codes inside the block

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by tamthegod (previous revision, new revision, compare).

»
4 weeks ago, # |
  Vote: I like it +3 Vote: I do not like it

Auto comment: topic has been updated by tamthegod (previous revision, new revision, compare).

»
11 days ago, # |
  Vote: I like it 0 Vote: I do not like it

pro vjp idol

»
11 days ago, # |
Rev. 4   Vote: I like it +5 Vote: I do not like it

Because of how memory works
Whenever you read 4 bytes, you actually load 64 bytes into cpu caches and basically read other 15 values for free (MUCH cheaper than from memory) if you read contigious ranges of memory

But if you random jump, you lose advantages of caches, because they're limited and always filled with data you currently don't need and you have to read each value from memory directly