Vectorisation #130

sv2518 · 2020-07-23T12:59:38Z

I would like to merge TJs work on introduction of a C vector target in order to enable vectorisation. PyOP2 and firedrake pass all tests with this branch. If possible I would like to merge this into master, since it should not depend on the callables structure introduced in the firedrake branch.

…vector types.

Don't make code generation dependent on the variable name of the zero temporary in various places. Generate temporary inside cvec codegeneration. Generate one zero vec per type for constant assignment to vector types.

…ames is zero.

sv2518 · 2020-07-23T13:23:21Z

loopy/check.py

-                        "parameter '%s' and contains parallel "
-                        "inames '%s'. This is not allowed (for now)."
-                        % (i, par, ", ".join(par_inames)))
+                warn_with_kernel(kernel, "data_dep_par_bound('%s')" % par,


PyOP2 and Firedrake tests are passing even though we get this error, so I changed the error to a warning. This might need a more thorough investigation, but that was out of my depth for the moment.

sv2518 · 2020-07-23T13:24:48Z

loopy/codegen/loop.py

        if lower_slab:
            slabs.append(lower_slab)
+        slabs.append(("bulk", bulk_slab))


This should probably only go into the firedrake branch. The code generation in PyOP2 needs the right order of the slabs.

sv2518 · 2020-07-23T13:29:45Z

loopy/target/c/__init__.py

+    """C Target with vector extensions, e.g. double4
+    """
+
+    def __init__(self, length):


Compared to TJ I have added that the CVecTarget gets initialised with the batch size. The information is necessary to generate the right preamble, e.g. to typedef double to double4. For an instruction with a constant right hand side, we need to add a vector type zero otherwise the compiler won't vectorise - for the declaration of those zeros the batch size is also necessary.

sv2518 · 2020-08-17T10:26:03Z

loopy/target/c/__init__.py

-        return Comment(s)
+        # Attaching empty comment wrapping in block needed
+        # otw compiler error due to empty block
+        return Block([Comment(s), Comment(" ")])


The remaining failing test is a DocTestFailure due to this.

I fixed it by adjusting the test. Maybe this is something that should only go in the firedrake branch as well.

kaushikcfd · 2020-08-17T16:19:11Z

Hi Sophia,
Thanks for the PR. Could you please point out what additional features does c_vec provide that's not already present in the vec iname tags.
IIUC the advantage of the c_vec tag over vec is it emits cleaner code compared to vec by making use of language extensions provided by GCC / Clang.

sv2518 · 2020-08-19T07:41:43Z

In my understanding vec iname tags are only available for OpenCL/Cuda target, am I mistaken? The cvec tag allows to use C vector extensions.

inducer · 2020-08-19T23:05:52Z

Could you explain how c_vec differs semantically from vec?

Off the cuff, rather than introduce a new tag, I think I would prefer realizing the existing vec tag the way you realize c_vec now in the target you're adding.

sv2518 · 2020-08-20T08:33:04Z

I had a look at where the code generation differs dependent on the tags.

Dependent on the tag (CVectoriseTag or VectorizeTag) sequential or vectorised loop code is generated here https://github.com/sv2518/loopy/blob/cda1d551c7b607915d9d6aaea9d4edef132d13a2/loopy/codegen/control.py#L123
The access info to arrays differs between the dim_tags (VectorArrayDimTag or CVectorArrayDimTag) here https://github.com/sv2518/loopy/blob/cda1d551c7b607915d9d6aaea9d4edef132d13a2/loopy/kernel/array.py#L1350
and here https://github.com/sv2518/loopy/blob/cda1d551c7b607915d9d6aaea9d4edef132d13a2/loopy/kernel/array.py#L441 the code generation differs as well between the dim_tags, but I don't quite understand yet what the function is doing.

I have asked @tj-sun to join the discussion, he is the owner of this code and can probably explain better why he decided to do it this way.

inducer · 2020-08-20T21:35:17Z

OIC, thanks for those links. In that case, I'm not sure how cvec (as an array axis tag) is different from a normal (strided) array axis. And I don't really understand what CVectorizeTag (as an iname tag) does. Could you explain?

tj-sun · 2020-08-20T22:13:47Z

Hi,

It's been a while since I last looked at this so I might end up saying something stupid. But the idea of c_vec is indeed an abstraction of vector extensions in C, i.e. to suppose the use cases such as:

typedef double double4 __attribute__ ((vector_size (32)));
double4 a, b, c;
// use as vector
c = a + b;
// but can also access individual elements
for (int i = 0; i < 4; i++)
  c[i] = a[i] + b[i];

I'm not sure if we can achieve this by specialising vec for CTarget, but I seems to recall that I tried and ran into some problems with vec.

The iname tag was introduced mostly because there are some special cases where such vectors can't be used, e.g.

c = sin(b);  // can't compile
c = 0.0;  // can't compile

Let me know what you think?

inducer · 2020-08-20T22:27:43Z

Thanks @tj-sun for your help! I'm still of the mind to address the problems you encountered (and allow a unified "vocabulary" across targets) rather than introduce a target-specific widget, at least since I can't think of a first-principles reason that would prevent this. To that end, do you have a vague sense of what the issues were on either end?

sv2518 · 2020-08-21T07:35:38Z

In my understanding you want to keep the target, you only request to use the same tag as for vectorisation on other targets, is that right?
I will go ahead and try to make this work. To me it looks like the new tags were only introduced to go into different code paths for vectorisation depending on the targets. I think we should be able to use the VectorizeTag, and make the code path simply conditional on the kernel target.
So for example for the snippet controlling the generation of the loop code here https://github.com/sv2518/loopy/blob/cda1d551c7b607915d9d6aaea9d4edef132d13a2/loopy/codegen/control.py#L123,
instead of the following

        if filter_iname_tags_by_type(tags, (UnrollTag, UnrolledIlpTag)):
            func = generate_unroll_loop
        elif filter_iname_tags_by_type(tags, VectorizeTag):
            func = generate_vectorize_loop
        elif filter_iname_tags_by_type(tags, CVectorizeTag):
            func = generate_sequential_loop_dim_code

we would do something like

        if filter_iname_tags_by_type(tags, (UnrollTag, UnrolledIlpTag)):
            func = generate_unroll_loop
        elif filter_iname_tags_by_type(tags, VectorizeTag):
            if isinstance(kernel.target, CVecTarget):
                 func = generate_sequential_loop_dim_code
            else:
                 func = generate_vectorize_loop

tj-sun · 2020-08-21T08:43:03Z

That sounds like a reasonable approach.

I remember that one thing to watch out is the codegen of indexing into a cvec here:
https://github.com/sv2518/loopy/blob/cda1d551c7b607915d9d6aaea9d4edef132d13a2/loopy/target/c/codegen/expression.py#L962

The index is put into the subscripts, and it should just print as a 1D array: a[0], I can't remember what happens when using vec in CTarget, but you probably need a slightly different mechanism as vec has a separate index field?

Also the iname tag c_vec is an IlpBaseTag and vec is an HardwareConcurrentTag, which seems more restrictive.

sv2518 · 2020-08-21T09:12:01Z

The codegen of the indexing there is working on the access information to the array if I see that right. I would simply condition on the target in get_acess_info as well, I don't think that is a problem.

@kaushikcfd / @inducer can you explain the difference between IlpBaseTag and HardwareConcurrentTag? Do you think it would be a problem to switch from IlpBaseTag to HardwareConcurrentTag for the iname tag on the CVectorTarget?

…uto'

…iname of interest

…get.

sv2518 · 2020-08-25T07:13:40Z

Okay, done that. We have got "unified vocabulary across targets" now by using the VectorizeTag also on CVecTarget. Let me know what you think.
More feedback welcome.

kaushikcfd · 2020-08-26T17:36:52Z

An even simpler implementation is possible by handling the vector extensions similar to OpenCLTarget's support for vector types (like floatn, doublen). I have demonstrated it in the c_vecextensions_target branch.

I am curious to know if there's any feature in the implementation here that cannot be expressed through the implementation in the c_vecextensions_target branch?

Thanks.

sv2518 · 2020-08-31T12:43:00Z

Okay, I am currently trying to understand the difference between our branches. Can I just ask a few questions?

The way the vector extensions are define in your branch is base_type __attribute__((vector_size(byte_count))) and you use that for every temporary declaration, we go over a typedef in the preabmble typedef base_type vector_type __attribute__ ((vector_size(byte_count))); I think going over the typedef would be a bit neater, or is there any disadvantage to that?

Just a little question, you keep track of the new types of declarations, but why do you do it twice? You set it as class attribute here already

loopy/loopy/target/c_vector_extensions.py

Line 67 in f81f3cd

setattr(vec, name, dtype)

but then you also save it into a dict that is also in saved in the class

loopy/loopy/target/c_vector_extensions.py

Line 69 in f81f3cd

vec.names_and_dtypes.append((name, dtype))

Your VectorExtensionASTBuilder does not have the functions emit_sequential_loop and generate_temporary_decl as ours does, former is used for the fallback option of simd pragmas, latter is used to declare global constants.

The former we definitely need, in case expressions can't be vectorised with help of the vector extensions. That's one of the big difference between our two branches.

And I think, instead of the latter, you made use of add_vector_access in the ast builder. The generated code for a declaration is different however, for a 10 by 1 vector the code in your branch produces

for (int32_t i = 0; i <= 9; ++i)
  {
    temp1[0] = x[4 * i];
    temp1[1] = x[1 + 4 * i];
    temp1[2] = x[2 + 4 * i];
    temp1[3] = x[3 + 4 * i];

}

In our branch this would look like

for (int i3 = 0; i3 <= 9; ++i3)
        temp1[i3] = vector_value;

where vector_value is of size 4, so that an access to the second element of the say third batch would be t2[1][2]. So our storage format is a) transposed to yours, plus in your case the access the array would be expressed in c as t2[2+4*1], is that right?

I think this difference in storage format is also why we need the ExpressionToCVecExpressionMapper and CVecExpressionToCodeMapper and you do not.

Would you be able to change the storage format and add the fallback OpenMP pragma option in your branch without ending up doing what we did?

Another little question: What's your arguments against using a DtypeRegistryWrapper for the vector types? Surely that it is nicer to encapsulate this in specialised DtypeRegisterWrapper, than just calling _register_vector_types?
Another major difference is that rather than generating_seq_loop_dim_code, you go over vectorise_loop here https://github.com/sv2518/loopy/blob/e2684a5f5062fb5f7bfb5c5f092ceb3092c98e5b/loopy/codegen/control.py#L125. I don't quite understand yet in which ways the code generation of these two differ, maybe someone else can comment on this, otherwise I will have a closer look

Besides that your branch does not contain the fallback option of OpenMP SIMD pragmas and the difference in storage format, another big difference is the preprocessing work we do in realize_c_vec before generating the code. Some expressions need some transformations, before the compiler wants to vectorise them. One example is, if you have a constant right hand side, we need to add a vector valued 0 to the expression.

sv2518 · 2020-08-31T12:50:18Z

And another thing is, that how many data of one type you can fit into one vector depends on the vector extensions available on your machine. We pass this information through from PyOP2, I think you don't consider this in your branch.

kaushikcfd · 2020-08-31T17:24:37Z

Thanks for taking a look into the alternative implementation!

I think going over the typedef would be a bit neater, or is there any disadvantage to that?

True. That can be implemented by overriding the preamble generator of the target's AST builder and maintaining the seen_dtypes.

but then you also save it into a dict that is also in saved in the class..

Yep, that seems unnecessary and should be removed. I copied it over from loopy/target/opencl.py and didn't notice it.

Your VectorExtensionASTBuilder does not have the functions emit_sequential_loop and generate_temporary_decl as ours does, former is used for the fallback option of simd pragmas, latter is used to declare global constants.

In such cases, I would prefer if the user realizes that the instruction cannot be vectorized and tags the iname with the fallback tag. This way the code generation process is more transparent to the user.

The generated code for a declaration is different however, for a 10 by 1 vector the code in your branch produces

I think the indexing you provided is incorrect, in the first kernel it should be temp1[4*i3+1]
The assembly of both the kernels would be similar i.e. would have vmov operations.

How many data of one type you can fit into one vector depends on the vector extensions available on your machine. We pass this information through from PyOP2, I think you don't consider this in your branch.

This isn't needed, as that's inferred from the iname bounds of the loop being vectorized. This freedom is also allowed in the underlying target as one can have types of double __attribute__((vector_size(16))) on processors with AVX support.

Some expressions need some transformations, before the compiler wants to vectorise them. One example is, if you have a constant right hand side, we need to add a vector valued 0 to the expression.

In this case the loop responsible for zero-ing the array should be tagged with unr and not vec and the compiler vectorizes it. I have compared the 2 approaches here.

Would you be able to change the storage format and add the fallback OpenMP pragma option in your branch without ending up doing what we did?

I think before answering this, we need to figure out if we intend to using pragma omp_simd as the fallback for vec tags implicitly. I would lean towards "no", but others can voice their opinions.

sv2518 · 2020-09-01T08:45:19Z

True. That can be implemented by overriding the preamble generator of the target's AST builder and maintaining the seen_dtypes.

Okay, thanks! I will try to rewrite our preamble generation with consideration of your fix.

This isn't needed, as that's inferred from the iname bounds of the loop being vectorized. This freedom is also allowed in the underlying target as one can have types of double attribute((vector_size(16))) on processors with AVX support.

Aha, okay! I will try to clean to this up a little then.

In such cases, I would prefer if the user realizes that the instruction cannot be vectorized and tags the iname with the fallback tag. This way the code generation process is more transparent to the user.

Okay, since PyOP2 is basically the user from your perspective, we would need to collect the inames that need to be retagged, and provide feedback about it. How would you like to do that? loopy.generate_code_v2 is our trigger for the loopy process. Maybe I can collect all the inames here https://github.com/sv2518/loopy/blob/e2684a5f5062fb5f7bfb5c5f092ceb3092c98e5b/loopy/preprocess.py#L2195 and then throw an Error, so that in PyOP2 we can say

try:
     loopy.generate_code_v2(...)
catch(CVecError):
     retag_from_vec_to_simd(kernel, to_be_retagged_inames)
     loopy.generate_code_v2(...)

I don't think this is ideal however, it means we need to run twice through your pipeline, at least partially.

Maybe we can port the check, if the expressions are vectorisable with the vector extensions or or simd pragmas into a function on it's own, say def check_vectorisability(...) and invoke that before we call loopy.generate_code_v2(...).

Are there other options?

I think the indexing you provided is incorrect, in the first kernel it should be temp1[4*i3+1]

I actually copied the first code snippet from the kernel in the print on your branch. The full kernel produced by your branch is

#include <stdint.h>
void loopy_kernel(double const *__restrict__ x, double *__restrict__ y)
{
  double __attribute__((vector_size(32))) temp1;
  double __attribute__((vector_size(32))) temp2;

  for (int32_t i = 0; i <= 9; ++i)
  {
    temp1[0] = x[4 * i];
    temp1[1] = x[1 + 4 * i];
    temp1[2] = x[2 + 4 * i];
    temp1[3] = x[3 + 4 * i];
    temp2 = 2.0 * temp1 + 1.0;
    y[4 * i] = temp2[0];
    y[1 + 4 * i] = temp2[1];
    y[2 + 4 * i] = temp2[2];
    y[3 + 4 * i] = temp2[3];
  }
}

That looks to me like your storage format is temp1[batch_number+batch_size*element_number], our storage format is temp1[element_number][batch_number], so we don't use striding in this case (We would use striding in the first dimension, if the considered data were in matrix form). I think this is important because the code generation in map_subscript differs see here https://github.com/sv2518/loopy/blob/e2684a5f5062fb5f7bfb5c5f092ceb3092c98e5b/loopy/target/c/codegen/expression.py#L962 and here https://github.com/sv2518/loopy/blob/e2684a5f5062fb5f7bfb5c5f092ceb3092c98e5b/loopy/target/c/codegen/expression.py#L1009

In this case the loop responsible for zero-ing the array should be tagged with unr and not vec and the compiler vectorizes it. I have compared the 2 approaches here.

I had a look at that and it is slightly different to what I meant to say. I meant to say, if you have an expression vector_valued_lhs=vector_valued_constant; then in the preprocessing we transform that to vector_valued_lhs=vector_valued_constant+vector_valued_zero; Maybe it's the same story, however, do you think vector_valued_lhs=vector_valued_constant; would be vectorised by the compiler, if retagged to unr? @tj-sun did you try this at the time you wrote the code?

Thanks for your reviews so far! I hope we can sort this out rather quickly, so I can continue with my own work on batched blas. Let me know, what you think about the ideas I had in regards to the OpenMP SIMD pragmas. I will also chat to Firedrake people about moving the retagging a level up to be controlled by PyOP2.

kaushikcfd · 2020-09-01T09:37:45Z

(about the implementation in PyOP2)

In PyOP2 the steps that would be needed (not very different from what's already implemented):

Split n into n_batch depending on the underlying arch.
Privatize the temporaries in n_batch loop
- Tag the privatized array axes of the temporaries with vec dim tag
Tag n_batch with vec iname tag

In this case loopy would emit a warning saying that the gather/scatter instructions of the PyOP2 kernel weren't vectorized and would unroll n_batch for those instructions. (@tj-sun: Could you please confirm if this would capture all the desired vectorization opportunities?)

(about the discrepancy in the storage format)

temp is not a multi-dimensional array in the provided example. Trying an example with multi-dimensional array realized as a vector type:

knl = lp.make_kernel(
        "{[i, j1, j2, j3, k]: 0<=i<10 and 0<=j1,j2,j3<4 and 0<=k<2}",
        """
        <> temp1[k, j1] = x[i, k, j1]
        <> temp2[k, j2] = 2*temp1[k, j2] + 1 {inames=i:j2}
        y[i, k, j3] = temp2[k, j3]
        """,
        [lp.GlobalArg('x, y', shape=lp.auto, dtype=float)],
        seq_dependencies=True,
        lang_version=(2018, 2),
        target=lp.CVectorExtensionsTarget())

knl = lp.tag_inames(knl, 'j2:vec, j1:ilp, j3:ilp')
knl = lp.tag_array_axes(knl, 'temp1,temp2', 'c,vec')

Generated code:

void loopy_kernel(double const *__restrict__ x, double *__restrict__ y)
{
  double __attribute__((vector_size(32))) temp1[2];
  double __attribute__((vector_size(32))) temp2[2];

  for (int k = 0; k <= 1; ++k)
    for (int i = 0; i <= 9; ++i)
    {
      (temp1[k])[0] = x[8 * i + 4 * k];
      (temp1[k])[1] = x[1 + 8 * i + 4 * k];
      (temp1[k])[2] = x[2 + 8 * i + 4 * k];
      (temp1[k])[3] = x[3 + 8 * i + 4 * k];
      temp2[k] = 2.0 * temp1[k] + 1.0;
      y[8 * i + 4 * k] = (temp2[k])[0];
      y[1 + 8 * i + 4 * k] = (temp2[k])[1];
      y[2 + 8 * i + 4 * k] = (temp2[k])[2];
      y[3 + 8 * i + 4 * k] = (temp2[k])[3];
    }
}

sv2518 · 2020-09-01T10:56:41Z

In PyOP2 the steps that would be needed (not very different from what's already implemented):

Split n into n_batch depending on the underlying arch.
Privatize the temporaries in n_batch loop
Tag the privatized array axes of the temporaries with vec dim tag
Tag n_batch with vec iname tag
In this case loopy would emit a warning saying that the gather/scatter instructions of the PyOP2 kernel weren't vectorized and would unroll n_batch for those instructions. (@tj-sun: Could you please confirm if this would capture all the desired vectorization opportunities?)

I think I don't understand this completely. I have two questions:

In my understanding you want to move even more than the retagging from VectoriseTag to OpenMPSIMDTag to PyOP2? The kernel passed to loopy from PyOP2 is actually a program. Going through your pipeline has the advantage that we end up in preprocess_single_kernel and can do our transformations there (on a single kernel), rather than walking through the kernels in the program ourselves.
Does your proposal imply that we would drop the fallback option of OpenMPSIMD pragmas completely?

sv2518 · 2020-09-01T15:44:40Z

Okay maybe I have got something working that fits to what you said. I outsourced the checks, if vectorizable with extensions, OpenMP pragmas or unrolling, into check_cvec_vectorizability(...). It gives back information about which instruction and iname pairs belong to which of the three categories. These information are then passed to cvec_retag_and_privatize, which is responsible for all the substitutions. The according loopy branch is here sv2518/loopy@cvec-minus-firedrake...sv2518:cvec-restructure-checks-minus-firedrake, how the new functions are called from PyOP2 you can find here OP2/PyOP2@vectorisation...vectorisation-restructure-checks.

The only drawback is that I have to call realize_ilp by hand before vectorising. Maybe we would also need to realise the reductions and so forth by hand. Previously this was happening by going through the standard preprocessing pipeline of loopy.

Let me know if this fits to what you had in mind. At least this exposes more of what is happening to the user: checking if vectorisable, retagging and privatising.

kaushikcfd · 2020-09-04T03:32:30Z

Does your proposal imply that we would drop the fallback option of OpenMPSIMD pragmas completely?

I agree omp_simd tags are nice to have as a fallback. New proposal: have the class CVectorExtensionsTarget take an input for the 'vec_fallback_tag' (defauled to unr which is the current implementation).

In my understanding you want to move even more than the retagging from VectoriseTag to OpenMPSIMDTag to PyOP2?

Because this would lead to a complicated user script, I think there is a merit in having a fallback tag. (see above). So, in PyOP2 if we just have n_batch: vec as the iname tags, this should be fine.

how the new functions are called from PyOP2 you can find here OP2/PyOP2@vectorisation...vectorisation-restructure-checks.

The interface is still a bit complicated than I was imagining. I was hoping on the PyOP2 end we do no more than spliting iname -> privatizing -> tagging arrays axes, tagging inames.

So the changes that would be needed on the loopy end would be to generalize the vectorization end of codegen so that it does not always fallback to unrolling.

A simple test case would be something on the lines of:

import loopy as lp

knl = lp.make_kernel(
        "{[i]: 0<=i<4}",
        """
        <> x[i] = 1.0f
        <> y[i] = 2.0f
        <> z[i] = x[i] + y[i]
        out[i] = z[i]
        """, lang_version=(2018, 2),
        target=lp.CVectorExtensionsTarget(vec_fallback_tag='omp_simd'))

knl = lp.tag_inames(knl, 'i:vec')
knl = lp.tag_array_axes(knl, 'x, y, z', 'vec')

print(lp.generate_code_v2(knl).device_code())

which would generate the following code:

typedef float v4f __attribute__((vector_size(16)));

void loopy_kernel(float* __restrict__ out)
{
  v4f x;
  v4f y;
  v4f z;
  #pragma omp simd
  for (int i=0; i<4; i++)
  {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }
  z = x + y; 
  #pragma omp simd
  for (int i=0; i<4; i++)
    out[i] = z[i];
}

tj-sun and others added 15 commits Jan 18, 2019

update c vector target

c55aa9e

zero declaration

e573b83

do not vectorize reduction over globals

d27bde0

check for conditionals

2122ced

check for comparisions

0fc9073

add more math functions

ce42314

add omp simd vectorization mode

5641b46

Codegen: intermediate fix for loops only containing a Noop.

3b6ab9c

Complex: register complex types in CVecTarget.

543e3a9

Codegen of preamble: Generate CVecTarget specific preamble to define …

7fdf7c9

…vector types.

Codegen for zero vector temporaries:

573ec0b

Don't make code generation dependent on the variable name of the zero temporary in various places. Generate temporary inside cvec codegeneration. Generate one zero vec per type for constant assignment to vector types.

Codegen: Maintain iteration order in slab decompositon.

9d9d7ad

Fix assertion: check that either min or max of lower bounds of ilp in…

32a0233

…ames is zero.

Check: Handle data dependent parallel loops with warning.

80f5173

Drop omp target pure realisation only.

Loading status checks…

06b0a1d

sv2518 mentioned this pull request Jul 23, 2020

Vectorisation #129

Closed

sv2518 reviewed Jul 23, 2020

View changes

Don't set target axis for VectorArrayDimTag.

8837488

sv2518 reviewed Aug 17, 2020

View changes

kaushikcfd and others added 9 commits Aug 12, 2020

make flake8 happy

62412eb

leftovers from cherrypicking

e926d38

Statistics: Handle the case of an axis being tagged with stride 'lp.a…

883f1a8

…uto'

Mem access stride finding: handle Array.axis_tags is None

61c04eb

Mem access stride finding: only extract coefficients with respect to …

8961894

…iname of interest

PyOpenCLExecutionWrapperGenerator: retain and write to numpy inputs

46ae1ff

appease flake8

835c691

Throw error for invalid type casting.

6bd99d8

Update doctest for splitting inames.

c5a4437

sv2518 force-pushed the sv2518:cvec-minus-firedrake branch 2 times, most recently from 496278f to 1fe6a7f Aug 23, 2020

sv2518 added 4 commits Aug 21, 2020

Expunge CVectorArrayDimTag.:

0874269

Expunge VectorizeTag.

9f60b2d

Supress functionality of HardwareConcurrentTag when target is CVecTar…

6971b0f

…get.

Get functionaly of IlpBaseTag for the VectorizeTag on CVecTargets.

502bb5e

sv2518 force-pushed the sv2518:cvec-minus-firedrake branch from 1fe6a7f to a28f62a Aug 23, 2020

Flake8.

Loading status checks…

e2684a5

sv2518 force-pushed the sv2518:cvec-minus-firedrake branch from a28f62a to e2684a5 Aug 23, 2020

kaushikcfd mentioned this pull request Aug 31, 2020

Extract seen_dtypes from ImplementedDataInfo #139

Merged

Dec	JAN	Feb
	20
2020	2021	2022

inducer / loopy

Vectorisation #130

Vectorisation #130

sv2518 commented Jul 23, 2020 •

edited

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

kaushikcfd commented Aug 17, 2020

sv2518 commented Aug 19, 2020

inducer commented Aug 19, 2020

sv2518 commented Aug 20, 2020

inducer commented Aug 20, 2020

tj-sun commented Aug 20, 2020

inducer commented Aug 20, 2020

sv2518 commented Aug 21, 2020

tj-sun commented Aug 21, 2020

sv2518 commented Aug 21, 2020 •

edited

sv2518 commented Aug 25, 2020

kaushikcfd commented Aug 26, 2020 •

edited

sv2518 commented Aug 31, 2020 •

edited

sv2518 commented Aug 31, 2020

kaushikcfd commented Aug 31, 2020

sv2518 commented Sep 1, 2020

kaushikcfd commented Sep 1, 2020 •

edited

sv2518 commented Sep 1, 2020

sv2518 commented Sep 1, 2020

kaushikcfd commented Sep 4, 2020

inducer / loopy

Vectorisation #130

Vectorisation #130

Conversation

sv2518 commented Jul 23, 2020 • edited

This comment has been minimized.

sv2518 Jul 23, 2020 Author Contributor

This comment has been minimized.

sv2518 Jul 23, 2020 Author Contributor

This comment has been minimized.

sv2518 Jul 23, 2020 Author Contributor

This comment has been minimized.

sv2518 Aug 17, 2020 Author Contributor

This comment has been minimized.

sv2518 Aug 17, 2020 Author Contributor

kaushikcfd commented Aug 17, 2020

sv2518 commented Aug 19, 2020

inducer commented Aug 19, 2020

sv2518 commented Aug 20, 2020

inducer commented Aug 20, 2020

tj-sun commented Aug 20, 2020

inducer commented Aug 20, 2020

sv2518 commented Aug 21, 2020

tj-sun commented Aug 21, 2020

sv2518 commented Aug 21, 2020 • edited

sv2518 commented Aug 25, 2020

kaushikcfd commented Aug 26, 2020 • edited

sv2518 commented Aug 31, 2020 • edited

sv2518 commented Aug 31, 2020

kaushikcfd commented Aug 31, 2020

sv2518 commented Sep 1, 2020

kaushikcfd commented Sep 1, 2020 • edited

sv2518 commented Sep 1, 2020

sv2518 commented Sep 1, 2020

kaushikcfd commented Sep 4, 2020

sv2518 commented Jul 23, 2020 •

edited

sv2518 Jul 23, 2020
Author Contributor

sv2518 Jul 23, 2020
Author Contributor

sv2518 Jul 23, 2020
Author Contributor

sv2518 Aug 17, 2020
Author Contributor

sv2518 Aug 17, 2020
Author Contributor

sv2518 commented Aug 21, 2020 •

edited

kaushikcfd commented Aug 26, 2020 •

edited

sv2518 commented Aug 31, 2020 •

edited

kaushikcfd commented Sep 1, 2020 •

edited