Aksenov239's blog

By Aksenov239, history, 7 weeks ago, In English

Hello everybody!

After a long break I would like to announce the fourth Bioinformatics contest organized by ITMO University, Bioinformatics Institute, Rosalind and Stepik. The contest is sponsored by JetBrains, Yandex, Serokell, and Genotek.

The contest is hosted on Stepik platform. To participate you have to sign the rules.

The contest consists of two rounds:

The contest is prepared by the following team: Alexey Sergushichev, Nikita Alexeev, Nebuchadnezzar, demon1999, tourist, cdkrot, GShark, doreshnikov (all ITMO University), German Demidov (Center for Genomic Regulation), Andrey Prjibelski (CAB SPbU).

Are there any prizes?

  • 1st place – Whole Genome Sequencing
  • 2nd and 3rd – Whole Exome Sequencing
  • 4th and 5th – 23andMe or Genotek DNA Service
  • 1st–30th – Honorable Bioinformatics T-Shirt

Do I need to know biology?

The knowledge of biology is not necessary to solve the problems!

We wish that you will like the problems!

Good luck!

 
 
 
 
  • Vote: I like it
  • +254
  • Vote: I do not like it

»
7 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

What is the requirement for being qualified to the final round? It's not stated how maby participants advance, and only 'score the right number of points' is mentioned on the website :(

  • »
    »
    7 weeks ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    The only requirement is 'scoring the right number of points', it is stated clearly :)

    Everybody who will get at least 952 points (including one point from the "Welcome!" step with the Honor Code) will advance to the Final Round.

»
7 weeks ago, # |
  Vote: I like it +1 Vote: I do not like it

Daniil Oreshnikov is doreshnikov, you could have made him a colorful link too. Now it looks like he isn't registered on Codeforces.

»
7 weeks ago, # |
Rev. 2   Vote: I like it +157 Vote: I do not like it

Is it ethical to sequence the genome of one of the greatest programmers of our time? What if a bad agent gets access to it, creating a million clones, thereby causing rating deflation?

  • »
    »
    7 weeks ago, # ^ |
      Vote: I like it -117 Vote: I do not like it

    It's not ethical to clone a human (as unethical as a UN-level war crime). Your question is still valid btw, concerning privacy of the data but I assume you meant it as a joke?

    And identical twins have exactly the same DNA but one is typically dumber. So, if they made 10000 clones of the greatest programmer, they may get a "sheep" 9999x

  • »
    »
    7 weeks ago, # ^ |
    Rev. 2   Vote: I like it -101 Vote: I do not like it

    Ok why the downvotes?

    What 1-gon said is COMPLETELY dumb. Your DNA is "legally" flying around, everything you touch, your hair, saliva, typically anything. If a bad agent needs your DNA, they already have it.

    So why the downvotes? cz unrated? Elitist/racist ruskis again?

    • »
      »
      »
      7 weeks ago, # ^ |
        Vote: I like it +69 Vote: I do not like it

      Because his comment was a joke and you are treating as it was a serious inference, genius.

      • »
        »
        »
        »
        7 weeks ago, # ^ |
          Vote: I like it -22 Vote: I do not like it

        Beg to differ, only joke implied is the rating deflation and I CLEARLY stated that in my OP. And if you clone a tourist 10,000x it's very likely you get a "sheep" like me 10,000x, if IQ: phenome.

  • »
    »
    7 weeks ago, # ^ |
      Vote: I like it -19 Vote: I do not like it

    ka joke maare ho bey! maza aa gavo. badiya joke hegi..

»
7 weeks ago, # |
  Vote: I like it +8 Vote: I do not like it

Welcome back! You were missed, even if my CPU is dreading what cursed implementations I come up with.

»
7 weeks ago, # |
  Vote: I like it +5 Vote: I do not like it

Hope Benq(or someone else) can share us something about his genome after the contest:-)

  • »
    »
    7 weeks ago, # ^ |
      Vote: I like it +31 Vote: I do not like it

    By the way, does "Whole Genome Sequencing" really mean the service?

  • »
    »
    7 weeks ago, # ^ |
      Vote: I like it -53 Vote: I do not like it

    I don't think it'll be nothing special. What we really need is the brain sample :)

    However, if Benq or any toprated guy/girl has special mutations that qualify them as unfairly advantaged, he/she could be banned from CP if CP is an Olympic sport.

»
7 weeks ago, # |
  Vote: I like it +1 Vote: I do not like it

One of it's kind!

»
7 weeks ago, # |
  Vote: I like it +9 Vote: I do not like it

The link for time and date says "19 June", but leads to "19 July"

»
7 weeks ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

I will try to get more points in 19th july

»
7 weeks ago, # |
Rev. 2   Vote: I like it -9 Vote: I do not like it

Hope! series of negative deltas will end(If it's rated for me).

»
7 weeks ago, # |
  Vote: I like it -7 Vote: I do not like it

Contest prepared by tourist

»
7 weeks ago, # |
Rev. 2   Vote: I like it -42 Vote: I do not like it

is this contest rated? if so, how do you know which user is which on the other platform? edit 1: so, let me get things clear, in this website people downvote posts where there's a genuine question and mods do nothing to prevent this?

»
7 weeks ago, # |
  Vote: I like it +8 Vote: I do not like it

t is a substring of s if t is contained as a contiguous collection of symbols in contiguous collection of symbols in s

Is this a non-standard definition of a substring? It's very confusing.

»
6 weeks ago, # |
Rev. 3   Vote: I like it +36 Vote: I do not like it

Could anyone tell me how to solve Diagnosis without running 2.5 hours for all the tests?

I only have solultions with time complexity $$$O(m(n+q))$$$ or $$$O((\sum cq_i)(\sum cm_i))$$$ or $$$O(m(\sum cq_i)+q(\sum cm_i))$$$.

  • »
    »
    6 weeks ago, # ^ |
    Rev. 2   Vote: I like it +29 Vote: I do not like it

    I solved it by realizing that several test-cases are weak and having a certain pattern e.g.:

    • There is one input file with $$$|Q_p|=1$$$ and $$$|D_m|=1$$$. I solved it using a linear $$$O(n+m+q)$$$ solution to associate every vertex with a disease.

    • On another input file, the vertices for each patient are a subset of a certain disease, i.e. $$$Q_p \subseteq D_m$$$. Hence for each patient, I only need to find a disease that is a superset of patient vertices. With a decent filtering trick, my code ran for less than 1 minute for this test-case.

    • On another input file, I need to draw a portion of the tree to realize that the vertices that belong to the same disease or patient is not randomly placed within the tree, e.g. there is a certain pattern on it. The solution for this test-case becomes pretty trivial.

    The fact that the constraint for each test-case isn't explicitly stated made me think that the clue is hidden within the test-case itself.

    • »
      »
      »
      6 weeks ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      How did you solve tests 6 and 7? Are those the last two cases you described?

      • »
        »
        »
        »
        6 weeks ago, # ^ |
          Vote: I like it +11 Vote: I do not like it

        test #7 is $$$Q_p \subseteq D_m$$$ the vertices for each patient are a subset of a certain disease

        test #6 is the one with the funny pattern (I need to draw a portion of the tree on a paper).

        • »
          »
          »
          »
          »
          6 weeks ago, # ^ |
            Vote: I like it +11 Vote: I do not like it

          Another way to solve test 6 is to notice that depths are $$$\leq 5$$$(let's call this number $$$d$$$) and the sizes are $$$\leq 3$$$(let's call this number $$$s$$$). Now, for each (disease, patient) pair the solution is the maximum sum of nodes such that the $$$i$$$-th node is a parent of some disease's node and the $$$i$$$-th patient's node. For each disease, the number of possible parents is $$$s \cdot d$$$, so we can just precompute for each disease all possible ways its parents can contribute to the solution; there are up to $$$(s \cdot d)^s$$$ ways. Then, for each patient, try all $$$d^s$$$ ways to choose the parents and see if there is a disease which can also have that way of choosing parents.

    • »
      »
      »
      6 weeks ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Thanks!

      Maybe I am not good at finding the patterns of tests...

      • »
        »
        »
        »
        6 weeks ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        I think that (is this task and competetion) finding patterns + thinking about + impelemnting multiple solutions per test takes not less time than running $$$O(m (n + q))$$$. I'd better spent this time on actualizing template from old GCJ to run output only tasks in multiple threads (but I had spent it on optimizing running time from 2h+ to 1h...)

»
5 weeks ago, # |
Rev. 3   Vote: I like it +13 Vote: I do not like it

Problem 2 was B(inary)S(earch)

  • »
    »
    5 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Assuming that problem 2 can't be solved without doing a lot number of submission, yes I hate the problem. Even worse that the allocated score for that problem is quite big (2100 IIRC). RIP my carpal tunnel.

»
5 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Thanks a lot for challenge, I really enjoyed it (especially binary search on Stepik on B).

1). What was the optimal strategy for problem A?

2). OK, I received some points on problem D. But I don't understand, how was counting the rate of minor haplotype? Was some read counted if it differed with minor haplotype by less than k polymorphisms?

»
5 weeks ago, # |
  Vote: I like it +19 Vote: I do not like it

When you got 299.97 of 300 points trying to reconstruct the minor haplotype in Problem 4: