We had a great launch last week and Higgs Hunters was off to a flying start. After a day or two though, we realised that something wasn’t right. Over the weekend, you all began seeing way too many simulations. That should start to fix itself now. Here’s what happened, and how we want to move forward:
On many Zooniverse projects, there are simulated data mixed in with the real data. We do this to help calibrate the project as a whole, and, in the case of Higgs Hunters, to see what sort of events can easily be seen in this data, and what can’t. It became apparent over the weekend that the balance of real and simulated data on Higgs Hunters was wrong. Several weeks ago when the ATLAS team delivered the Higgs Hunters data to the Zooniverse, a piece of metadata was missing and the developers at Zooniverse thought that a whole batch of simulated data was actual data. The entire dataset was uploaded and the project launched as planned.
We realised a day or two after launch that we weren’t telling volunteers when they had seen a simulation. This is important to know, especially if you think you found something really cool, and so we moved quickly to get the sims flagged as such after each classification. However, as many of you noticed, this fix made it apparent that, in fact, 8 out of 10 images were sims! Fixing the one mistake brought the other to the surface, and after a lot of searching we figured out what had happened.
Today we have done the following:
- We have paused 90% of the simulated data — so you won’t be asked to classify it now.
- We are aggressively retiring the simulations — so they’ll be removed 4x faster than the real data.
- ATLAS are preparing more data for Higgs Hunters — 10 times as much!
- Hopefully we can upload the new data before the end of 2014, and we’ll email participants to let them know.
After all of this, we now have ~8k subjects in Higgs Hunters and ~30% are sims — that means we’ll be done in ~100k classifications. So let’s call this Round One and see where it takes us. In the meantime, we thank everyone for their efforts on Higgs Hunters, and we apologise for all the confusion and frustration that this has caused some users.
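As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes that sims really were 8 in 10 of the originally uploaded subjects and that exactly 90% of them were paused. The function name is made up for illustration and is not part of any Zooniverse tooling.

```python
def sim_fraction_after_pause(initial_sim_fraction, paused_share):
    """Fraction of the active subjects that are sims after pausing some sims.

    initial_sim_fraction: share of all uploaded subjects that were sims (assumed 0.8)
    paused_share: share of the sims that were paused (assumed 0.9)
    """
    real = 1.0 - initial_sim_fraction
    sims_left = initial_sim_fraction * (1.0 - paused_share)
    return sims_left / (real + sims_left)


# 8 in 10 subjects were sims, and 90% of those sims are now paused.
fraction = sim_fraction_after_pause(0.8, 0.9)
print(f"{fraction:.1%}")  # 28.6%
```

Under those assumptions the sims come out at just under 29% of the active subjects, which is in line with the ~30% quoted above.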
We’re really sorry, and hope that this helps to explain some of the problems that have occurred and the ways in which we’re moving forward.
I find it amazing that you consider humans to be better than computers at this task. If it is possible to get the coordinates of the lines, it should be a trivial matter to a) filter out lines that are irrelevant, b) locate and calculate branches, and c) get rid of those circles … but assuming this can’t be done, could you provide the following for the humans:
– Images without background
– Images where you can show and hide lines
Also, it might be a better idea to try to identify each line as something. E.g. allow users to select and delete the lines that should be ignored, then store this data and remove those irrelevant lines from the images you show to subsequent users.
Also consider releasing the numerical data, explaining the format, and asking programmers to write an algorithm that does something with the data. E.g. filter out the irrelevant lines, filter out images that have nothing interesting in them, or find branches.
But keep up the current project also. You are more likely to succeed if you try many different alternatives.
Thanks for the thorough explanation.
In other Zooniverse projects with sims – e.g. Andromeda Project (AP), SpaceWarps – the Science Teams wrote blog posts (etc) explaining, in some detail, just what the sims were being used to calibrate. For example, in AP, that meant the thresholds at which zooites (collectively) identify clusters of a given mass and concentration.
Is the Science Team planning to write a blog post or two on what – in some detail – the sims will be used to calibrate in Higgs Hunters (HH)?