Feel free to ask and discuss anything about the challenge!
Hi all, I am a little confused by the problem itself and the datasets.
What is the context-free problem to solve?
What is the context-dependent problem to solve?
I mean, should I try to learn a good policy to recommend articles? Or are clicks at random, so that I only need to learn the right balance between exploration and exploitation?
Thanks,
jamh.
Imagine you have a visit on your website and you can choose to highlight one of your articles. The user is described by 136 features. You are rewarded if the article you choose is clicked.
Newest articles tend to perform better (that’s why “always last” performs better than random).
Some articles tend to perform better simply because they are better (that’s why UCB-like algorithms that make no use of the user description also perform better than random; a minimal sketch of such a scorer follows below).
I guess one could combine the first two points in a clever way to get quite good performance without using the user description.
Some articles perform better on some kinds of users (context dependency). That’s the hardest part.
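For reference, the context-free, UCB-like scoring mentioned above can be as simple as the following sketch. It is purely illustrative: the class, method and field names are invented, and this is not code from the challenge starter kit.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative context-free UCB1-style scorer over article ids
// (it ignores the 136 user features entirely).
public class Ucb1ArticleScorer {

    private static class Stats { long displays = 0; long clicks = 0; }

    private final Map<String, Stats> stats = new HashMap<String, Stats>();
    private long totalDisplays = 0;

    // Pick the candidate article id with the highest UCB1 index.
    public String choose(List<String> candidateIds) {
        String best = null;
        double bestIndex = Double.NEGATIVE_INFINITY;
        for (String id : candidateIds) {
            Stats s = stats.get(id);
            if (s == null) { stats.put(id, s = new Stats()); }
            double index = (s.displays == 0)
                    ? Double.POSITIVE_INFINITY // try every article at least once
                    : (double) s.clicks / s.displays
                        + Math.sqrt(2.0 * Math.log(totalDisplays) / s.displays);
            if (index > bestIndex) { bestIndex = index; best = id; }
        }
        return best;
    }

    // Feed back the 0/1 click observed for a matched record.
    public void update(String id, boolean clicked) {
        Stats s = stats.get(id);
        if (s == null) { stats.put(id, s = new Stats()); }
        s.displays++;
        totalDisplays++;
        if (clicked) s.clicks++;
    }
}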
Hi Jeremie, thanks for your answer; good hints.
But I have more questions.
As I understand it, I am not rewarded for selecting an article that the user has clicked, but rather one that is in the dataset. In that sense, what am I learning?
And then, only if I select the same article as in the data (who selected this article?) can I learn whether the user clicked on that article or not.
What is the connection between the selection of the articles at presentation time and the user click?
Thanks a lot,
jamh
The complete answer is in these papers: http://www.research.rutgers.edu/~lihong/pub/Li11Unbiased.pdf
http://hunch.net/~jl/projects/interactive/scavenging/scavenging.pdf
The short one is: do not care about the dataset, consider the problem:
- you see a (described) user and you have a list of possible articles to show,
- you choose to display one article from the list,
- you receive your reward (0 or 1).
Of course you want to maximize your probability of a click (which is the score).
Sometimes you do not receive the reward, but in that case just consider that the user never came to your website.
Ok let me try:
* I should always try to select the right article, because if just by chance it is the chosen one (with 1/30 uniform probability?), then the probability of being rewarded is maximized?
However, I think this only explains the beginning. Is that right?
Thanks,
jamh
I’m not sure I understand the part about probability maximization, but I could agree.
If your policy is to always choose the last element of the list, this basically means you are always choosing the most recent article.
If by chance it was also chosen by the sample collection strategy (which is uniformly random), then you will score the corresponding 0 or 1. If not, this round of evaluation is discarded.
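For concreteness, the “always last” baseline is essentially a one-liner. Here is a minimal sketch reusing the Yahoo types and method names from the Java starter kit (the class name is invented, and the sketch leaves out the ContextualBanditPolicy interface declaration to stay self-contained):

import java.util.List;

import exploChallenge.logs.yahoo.YahooArticle;
import exploChallenge.logs.yahoo.YahooVisitor;

// Sketch of the "always last" baseline: always display the last (most recent)
// article of the candidate list, and ignore all feedback.
public class AlwaysLastPolicy {

    public YahooArticle getActionToPerform(YahooVisitor visitor,
            List<YahooArticle> possibleArticles) {
        return possibleArticles.get(possibleArticles.size() - 1);
    }

    public void updatePolicy(YahooVisitor visitor, YahooArticle article, Boolean reward) {
        // This baseline does not learn anything.
    }
}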
Hi,
I’ve just updated the post concerning the evaluation process. All the details I could think of are there. Hope it helps
Thanks a lot!
How can I differentiate individual users? If, for instance, I don’t want to recommend an article that the user has already clicked, can I assume that the 136-bit context description is unique per user? Or, if I choose an article again for the same user, will the evaluation count it as a click because at some point in the history this same user did click on the article?
We do not have this information… It may be one of the 136 boolean features, but we do not know which one (and it would be interesting to discover it automatically).
In any case, it won’t be counted as a click just because the selected article was clicked before in the history.
thanks.
I think participants should read this paper :
http://dl.acm.org/citation.cfm?doid=1772690.1772758
OK, as I see it now, there are two points of view:
1) I can forget about clicks altogether! I just need to learn the pseudorandom distribution of the selected articles, and that will give me optimal returns.
2) I can forget about the selection altogether! I just need to learn the right articles (the ones the users clicked on), so that when, by random chance (1/30), my article matches the dataset, I maximize the probability of getting a reward of 1 and therefore maximize the return.
What do you think?
1) No. Even if you guess the pseudorandom sequence (good luck) and always choose the matching article, your reward will be the average click value (i.e. 0.0366, so a score of 366).
2) Yes and no. If all users have the same behavior (i.e. the context/description of users is not correlated with click probability), then it’s true. There is strong evidence that this assumption is false (read http://dl.acm.org/citation.cfm?doid=1772690.1772758 ). Selecting the right article is the object of this challenge.
As we don’t know anything about the articles other than their IDs, the user context seems useless to me. So I didn’t use the context information in my approach; actually, my approach works well.
Dear Jeremie, sorry for my emails. I think I am getting closer to understanding the point, but please be patient with me.
Reading the evaluation text of the challenge, it says that:
evaluation = cr / hr,
where:
cr is the number of times you got a click (the click count),
hr is the number of times you chose the same article as in the data (the hit count).
So, let’s consider the four possibilities (2×2 combinations of events):
event A: you chose the same article as in the data (match = 1, no match = 0)
event B: given a match, you got a click (1) or not (0)
A=0, B=*: nothing happens to your return, since the evaluation does not change
A=1, B=0: a penalty, since hr increases but cr remains constant
A=1, B=1: a reward? cr increases by 1, but so does hr, so?
So, in principle you should try to avoid the A=1, B=0 combination, that is, avoid making bad recommendations.
However, is this not perhaps a system biased in favor of avoiding bad recommendations rather than making good ones?
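To make the scoring concrete, the replay evaluation described in the linked papers boils down to something like the sketch below. The LogRecord and Policy types and all field names are invented for illustration; this is not the actual evaluator code.

import java.util.List;

import exploChallenge.logs.yahoo.YahooArticle;
import exploChallenge.logs.yahoo.YahooVisitor;

// Sketch of the replay ("rejection sampling") evaluation discussed above.
public class ReplaySketch {

    // Minimal view of what a policy must provide (mirrors the starter-kit methods).
    public interface Policy {
        YahooArticle getActionToPerform(YahooVisitor visitor, List<YahooArticle> possibleArticles);
        void updatePolicy(YahooVisitor visitor, YahooArticle article, Boolean reward);
    }

    // One logged event: the visitor, the candidate pool, the uniformly random
    // article that was actually displayed, and whether it was clicked.
    public static class LogRecord {
        public YahooVisitor visitor;
        public List<YahooArticle> possibleArticles;
        public YahooArticle displayedArticle;
        public boolean clicked;
    }

    public static double evaluate(Policy policy, Iterable<LogRecord> log) {
        long cr = 0; // clicks obtained on matched records
        long hr = 0; // matched records ("hits")
        for (LogRecord r : log) {
            YahooArticle chosen = policy.getActionToPerform(r.visitor, r.possibleArticles);
            if (chosen.equals(r.displayedArticle)) { // your choice matches the logged one
                hr++;
                if (r.clicked) cr++;
                policy.updatePolicy(r.visitor, chosen, r.clicked); // feedback only on matches
            }
            // non-matching records are simply discarded (event A = 0 above)
        }
        return (double) cr / hr; // evaluation = cr / hr
    }
}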
Right. Avoiding making bad recommendations would be cool. But how to do it?
In the dataset the choice of the article is independent of the user: it is a uniformly random policy.
You can do it by trying to find the best possible association between user and probability of click: that’s fine, that’s our goal.
You can also try to do it by trying not to choose the same article each time you see a “bad” user (well, in fact you would need to be able to recognize a bad user, or to get a first initial click and then always play something other than what is logged). In that case you would need to find the seeds of the random number generators. You don’t know the generation method (probably close to a Mersenne Twister), you don’t know which action was chosen (only whether it matches your choice), and you are not allowed to log information in order to identify it over several evaluations…
Seed identification doesn’t sound reasonable… and this is not the goal. Moreover, the final evaluation will be done with a different dataset, so it is useless to guess the seeds on the first data.
To score the maximum it would be simpler to scan the memory to find the dataset and read the right answers (and that is explicitly forbidden)…
Hi Jeremie, thank you again for your great clarifications!
I think I now understand (I hope so).
Unless I have an oracle that tells me whether an article will match the logging policy, I should select the article with the highest probability of being clicked. So I have to learn such probabilities using the available information, i.e., click feedback and visitor features.
However, I am an Oracle certified professional, so, expect the unexpected! @_@
Great
The unexpected is my grail.
Any guidelines for the acceptable number of simultaneous submissions? So far I have been assuming that 3-5 is OK but that more than 10 is bad.
Unfortunately most of the algorithms I can think of for this problem have at least one free parameter, and (by design) there is no real offline data.
Thanks,
Ed
3-5 seems fair. In fact the “real rule” is that I dedicated 16 cores on the cluster to evaluations, and I want most of the jobs not to stay “pending” for more than 30 minutes. If that happens, I’ll have to limit the number of submissions.
What does it mean when a submission is marked incomplete and there is nothing more in the error/logs file?
I would like to know if the problem was due to
a) timing problem
b) memory usage
c) program ends before input is complete
It can be a): the process is stopped with no error message.
It cannot be b): that would lead to an error.
It can be c): if your program exits, no error message is reported.
Option d) is that you modified the log process and added something at the end of the log file.
I have the following error from time to time.
Exception in thread "main" java.lang.NoClassDefFoundError: myPolicy/MyPolicy
at exploChallenge.MainCluster.main(Unknown Source)
Caused by: java.lang.ClassNotFoundException: myPolicy.MyPolicy
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
… 1 more
I am wondering: is this an issue on the evaluation cluster, or is it my code?
In that case the first thing to check is your Java version: it must be 1.6.
Edit: in that case it was linked to a disk management problem on the cluster. Contact me if this happens again; I’ll do my best to fix it.
Is it possible to add an option to attach a private comment to a submission, so we can tag the submission with some metadata about the algorithm, the parameter values we used, etc.? Just some text that can be associated with a particular submission and is visible only to the contestant who submitted it.
Hmm, I’ll think about it, but with the current version it is not straightforward to implement this feature (though not impossible). Some participants already do that kind of thing with the name of their submission (e.g. submission_algoname_parameters.jar). Of course, if you have 20 parameters, it’s not a very suitable solution.
Edit: when/where do you want to be able to add these comments? At submission time?
Preferably at submission time, yes. Of course, I can note these comments offline in a document and record the performance separately after the evaluation finishes, so it is not a critical requirement.
I made something.
Test it and give me your feedback.
Wow, that was quick! Thanks. I will try it out for my next submissions. My current two submissions are going to breach the time limit, I guess.
Thanks a lot! This feature really helps.
Hi –
Below is how I did this. Since it’s easy to get back the *.jar file from various runs, you can download the jar you want and run a main() method you bury in MyPolicy: java -cp good.jar myPolicy.MyPolicy
package myPolicy;

import java.util.List;

import exploChallenge.logs.yahoo.YahooArticle;
import exploChallenge.logs.yahoo.YahooVisitor;
import exploChallenge.policies.ContextualBanditPolicy;

// Thin wrapper that delegates to whichever policy implementation is being tested.
// Note: the generic type parameters of ContextualBanditPolicy are assumed from
// the method signatures below.
public class MyPolicy implements ContextualBanditPolicy<YahooVisitor, YahooArticle, Boolean> {

    private ContextualBanditPolicy<YahooVisitor, YahooArticle, Boolean> implementation;

    public MyPolicy() {
        // implementation = new SimplePolicy();
        // implementation = new YoungestPolicy();
        // . . .
        implementation = new BestPolicyEver();
    }

    @Override
    public YahooArticle getActionToPerform(YahooVisitor visitor,
            List<YahooArticle> possibleArticles) {
        return implementation.getActionToPerform(visitor, possibleArticles);
    }

    @Override
    public void updatePolicy(YahooVisitor visitor, YahooArticle article, Boolean reward) {
        implementation.updatePolicy(visitor, article, reward);
    }

    // Run offline ("java -cp good.jar myPolicy.MyPolicy") to print which
    // implementation and parameters a given jar was built with.
    public static void main(String[] args) {
        MyPolicy policy = new MyPolicy();
        System.out.println(policy.implementation.toString());
    }
}
Each implementation’s toString() method can be set up to self-document the algorithm, parameters, etc.
Hi, is it just my perception, or is the cluster running a little bit slower now than before?
I guess it is a little bit slower because each process wants to access the data (disk access) and there is quite a large number of processes running at the same time. On the submission webpage I added a cluster load indicator to provide feedback on the load.
The run for victory (on new data after the 1st of June) will be done alone on the cluster if the submitted algorithm has time limit problems.
Clarification: sometimes when the cluster load is orange (9 to 14), your jobs can be sent to the cluster but do not start computation immediately. When the color is red, your process will have to wait. Of course, this extra wait time is not counted in the time limit.
OK! So it is not just my perception; something is going on with the server.
Hi Jeremie,
Another feature request. I don’t know if others are also facing this stupid problem: I seem to notice a bug in my code almost immediately after I submit my solution.
Can we have a way of killing our own currently running tasks somehow? I feel guilty about submitting again almost immediately with a small change, while the submission with the bug is still running, using up valuable server resources and blocking others as well.
-exploreit
Well… I was waiting for such a request (in last year’s challenge we were always facing that kind of problem). This is not straightforward to implement (because of the separation between the web servers and the cluster).
I’ll think about doing something, but as it is really a lot of mess, you can also ask me to kill a job (providing the full reference of the job as stated in the submission mail, xx%yyyyy-zzzz.{zip,jar}). If I get fed up with manual killing, I’ll write an automatic solution.
If your bug leads to a Java exception (which is not caught), the execution will stop and your algorithm won’t waste any resources. If it doesn’t, you can add a few tests in your code and throw an Error as follows if you notice unexpected behavior.
if (n == 0)
    throw new Error("division by 0!");
double a = t / n;
Hi, Jeremie,
Could you please confirm that feature #1 of the user feature vectors is always 1? I found that not to be the case for the 100 test cases given. Thanks.
Do not try to gather information from this sample data: click or no click has been modified, the choice possibilities have been modified, and some attribute values too. They are only here to show you the shape of the data, nothing more.
Edit: I looked more carefully, and you are right: feature 1 is always active, and on some lines very few other features are activated at the same time. So, without any guarantee, I would say that when a feature is not present it may be because it is a “missing” value.
Thanks, Jeremie. Just to double check, does this mean “Feature #1 is the constant (always 1) feature” in the real dataset, as stated in the raw data description?
I was double-checking while you were writing this.
So yes, feature 1 is always there.
Great, thanks
So, can I, for instance, modify the Python code in order to set feature #1 to 1 automatically?
jamh : ok
I am a bit interested in the meaning of “missing”. Does it mean the feature is absent, or that whether the feature is active or not is unobservable for some reason, or both?
In this dataset “missing” seems to refer to both.
Hi, would it be possible to get, within the information for a submission, some kind of indication of the CPU time or elapsed time used by the submission? Then we could tell how much time the algorithm consumed and how much time remains before the time limit.
It would help a lot.
Thanks,
OK. If you want, you can log it as a new column in the log files (if you have any doubt about how to do it, mail me).
I’d be interested to know how to do this.
In explochallenge/eval/MyEvaluationPolicy.py, change the log method to add a timestamp at the end of the output (separated from the score by a space).
I will not override this modification during evaluation. It should be enough (if needed, I’ll do it myself).
Quick question: do all contest participants have to submit a write-up to the workshop before the May 7 workshop submission deadline? Or is the contest winners’ presentation a separate item on the agenda? Also, how many of the top contesting teams are invited to give a presentation?
No, workshop papers are separate from the challenge.
The possibility to give a presentation and to write a challenge paper will (probably) be offered to the 3 best submissions. It also depends on the originality of the contributions.
About prizes and invitations with paid registration to ICML, I’m waiting for more definitive information about sponsorship and the amounts needed.
Some of my recent submissions have been returning with errors unexpectedly. They are the same as other submissions that returned successfully, just with different parameter values, so I am pretty sure the error is on the server end. Thought I should let you all know.
Best,
Ed
They are errors of this type:
/bin/bash: line 0: cd: 59%KF_2_0001-fnf11j8cjko: No such file or directory
Unfortunately I also submitted a file with an actual syntax error, so don’t get thrown off by that.
Yes, you’re right… I saw them. I’m working on it, but I do not understand what is happening. This is something quite rare which seems to happen with higher probability when there are more than 7-8 jobs on the cluster node. I suspect some WebDAV synchronization issues.
If it helps, the error seems to occur most frequently when two submissions are made within a couple of minutes of each other.
-Ed
Thanks… It confirms my feeling that this is a WebDAV bug (I’m thinking of an I/O lock). I restarted it; let’s see if things improve.
The problem seems to be fixed. Thanks!
-Ed
Hi, what about the order in which the features appear in the dataset? In the log reader the order is lost when the features are converted to a binary vector, but in the dataset the features are not trivially ordered.
No, I do not have any information about this. But you are right, in the dataset it doesn’t seem ordered (and I don’t know whether the order is related to a trust value or to an order of appearance for this user).
Are we allowed to convert the timestamp into real world dates and learn from that?
Yes (and it can be useful).
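For example, assuming the 10-digit visitor timestamp is Unix epoch seconds (which its range suggests), calendar features can be derived like this (an illustrative sketch, not starter-kit code):

import java.util.Calendar;
import java.util.TimeZone;

// Turn a visitor timestamp (assumed to be Unix epoch seconds) into calendar
// features such as day of week and hour of day.
public class TimestampFeatures {
    public static void main(String[] args) {
        long timestamp = 1317513291L; // an example visitor timestamp
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(timestamp * 1000L);
        System.out.println(cal.get(Calendar.DAY_OF_WEEK)); // 1 = Sunday ... 7 = Saturday
        System.out.println(cal.get(Calendar.HOUR_OF_DAY)); // 0-23, in UTC
    }
}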
Thanks.
Some clarification questions, answer what you can:
a) Can you tell us anything about whether the number of times an arm is selectable is approximately uniform? In other words, can you say anything about the possibility that one arm will be among the ~30 selectable arms 3 million times while another is selectable only 300,000 times?
b) The articles have a 6-digit id, e.g. 560620. The visitors have a 10-digit timestamp, e.g. 1317513291. Are the article ids just timestamps with ’1317′ removed? This seems to be suggested in the ‘some remarks’ post. That would mean that, in the test data, the visitor timestamp is always /smaller/ than the article timestamp. Can you tell us if this is true in the actual data?
c) You also say “So between two consecutive users possible choices tends to be same but evolves over time.” I think this means that articles with smaller ids will be available towards the beginning of the evaluation, and articles with larger ids will be available towards the end, but that it /isn’t/ necessarily true that at the very beginning I will have the 30 smallest ids and at the very end the 30 largest. Is this right?
Thanks,
Ed
Don’t get me wrong, but I do not think this information is really important. As one of the organizers, I do not have any confidential information about the data: all the information we have is available on this website; I simply have the whole dataset, so I can compute some statistics on it. Anyway, here are a few hints:
a) No, it’s not. It is more or less Gaussian (strictly speaking, it’s not).
The range is 1630 – 107400 displays, the mean is 42600, and the standard deviation is around 20000.
The distribution is skewed towards 0.
b) This is really not important (see below). Anyway, I checked and this is not true.
c) To me, the ids themselves are really not meaningful; they are just a way to identify entities, track them, and differentiate between them. In my code, I simply follow the flow of data and record new ids as they appear, along with the timestamp at which they appear for the first time. I guess that the ids present at the beginning appeared earlier than the first timestamp in the log; so the thing has to warm up before you observe truly new ids. But we do not have any information about all that.
I have a number of jobs that have finished but are still being reported as “in the cluster”.
OK… It seems that the WebDAV bug is back… I’m fixing it.
Edit: should be done. WebDAV again (lots of simultaneous submissions)…
I’m having trouble exporting the jar file. When I tried using build.xml, I got this error message:
../Documents/ExploChallenge/build.xml:21: restrict doesn't support the nested "name" element.
and when I tried using the export function in the package explorer, I got a bunch of error messages like these:
Could not find source file attribute for: '../Documents/ExploChallenge/bin/exploChallenge/Main.class'
Source name not found in a class file - exported all class files in ExploChallenge/bin/exploChallenge
Resource is out of sync with the file system: '/ExploChallenge/bin/exploChallenge/Main.class'.
Can anyone help me with this?
Thanks!
Check your Java version (1.6), recompile, and try to resubmit: it could have been a local bug.
In case someone runs into the same problem, I found a solution: the Ant built into my Eclipse (version 1.7) is outdated, and all you need to do is install Ant 1.8.
Cheers!
Hi, is there any chance numpy can be upgraded to a more current version? I believe at some point I have had some weird issues where the old version of numpy behaves differently.
Apologies for the inconvenience,
Ed
I’ll try to do it on April 25th (the planned release date of the new LTS), but I cannot guarantee it.
If I do it, I will have to stop evaluations for a few hours.
It is also possible to have a local Python installed, independent of the Python used by the system, and to install whatever packages you like on this local Python without affecting the system Python. However, you have to do it manually. This could be a solution.
By the way, I am very happy to have Python colleagues here!
Likewise – thanks for coding the python evaluator!
-Ed
Hi Jeremie, Is today the python day?
Ubuntu delayed the release of the LTS… so we have to wait too.
Edit: Python 2.7 should be available now; see the Python post.
Thanks Jeremie.
I noticed a marked slowdown with Python 2.7.
That’s strange, because 2.7 is usually faster than 2.6.
At the installation level I do not see any reason for that.
Could it be linked to the use of scipy?
Edit: I ran some quick tests.
- Python 2.7 is slightly faster than 2.6
- numpy 1.7 is significantly slower than 1.3
Hi, maybe some Good Samaritan wants to discuss or help me understand why I should (or should not) pay attention to the “age” of an article. I’ve read in the task description that it is very important, but I can’t see why… maybe I am confused by the selection procedure?
I also don’t understand why.
Forget the selection procedure. Think of it this way:
1. Given a context, each article has a different propensity of being clicked. Your job is to select the article with the highest chance of being clicked.
2. For the same article, the chance that it will be clicked by a context c is higher earlier in its lifetime.
So, now the task is to somehow model and trade off between the “appropriateness” and the “novelty” of an article, given a context.
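One crude way to encode that trade-off (purely illustrative: the decay form, half-life and smoothing constants are arbitrary choices, not anything prescribed by the challenge):

// Illustrative trade-off between "appropriateness" and "novelty": a smoothed
// click-through-rate estimate multiplied by an exponential decay in the
// article's age. All constants here are arbitrary.
public class RecencyScore {

    private static final double HALF_LIFE_SECONDS = 6 * 3600; // assumed 6-hour half-life

    // clicks/displays: feedback gathered for this article so far;
    // ageSeconds: time elapsed since the article was first seen in the stream.
    public static double score(long clicks, long displays, long ageSeconds) {
        double ctr = (clicks + 1.0) / (displays + 2.0); // Laplace-smoothed CTR estimate
        double decay = Math.pow(0.5, ageSeconds / HALF_LIFE_SECONDS); // novelty discount
        return ctr * decay;
    }
}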
But I have another interpretation.
At time t, a visitor A comes and reads a new article B.
At time t+n, a new visitor C comes, and he may treat B as a new article too, because visitor C has never read B before.
So the novelty of an article may not decrease across different visitors within a short time.
Hello,
I am getting this error:
bash: ./go.sh: No such file or directory
I changed the argument to YahooLogLineReader, but it doesn’t help. I am using the Python version.
Please let me know if I am making a mistake somewhere… Thanks!
Yasin
In your case (I had a look at your zip files) this is because you are not using the build_submission.sh script.
Your zip file extracts everything into a “submission” directory, whereas the ./go.sh file must be extracted in the current directory. More precisely, the command “unzip yourfile.zip -d mydirectory” must extract everything into “mydirectory”, with the go.sh file located at mydirectory/go.sh.
Nice name.
I think that, according to the current plan, the best model of each team gets evaluated in the second pass of the competition.
I would like to ask: can we pick the algorithm ourselves? My concern is that the best one on the leaderboard might not be the best algorithm, because of overfitting.
Thanks.
No problem, just give me the full name (or the file) of the submission you want after the end of the first part.
Hi, I have a question about the end of the challenge. In the information you say:
“In phase 1, winners will be known at the beginning of June, these winners are strongly encouraged to present their work at the workshop.”
So “who” will be the winners of phase 1? The first X participants? All submissions above a threshold?
“Phase 2 results will be known only at the workshop, it will be the same procedure of evaluation but with more (and new) data. Participants cannot submit any new algorithm, we will use their best submission of phase 1.”
Will this new data have the same shape, or can, for example, the number of user features vary?
And many thanks for the great challenge, it’s real fun!
About the workshop, we think it’s more about having interesting discussions; to stick to your description, I would say X = 3.
But we are open-minded, and if somebody has something new and fun to present then it’s OK.
About the second phase, it will be the same kind of data, but maybe a very few features will be removed (I may have a bigger dataset in the next few days).
If the feature dimension of the phase 2 data is different, then some submissions from phase 1 might not work.
Well… at least mine won’t work. I hard-coded the number of dimensions.
Do not worry (in fact 2 dimensions could be missing), so your hard-coded value should not be a problem.
In any case, for all algorithms that beat “always last” in phase 1, I’ll pay attention to getting them working on the phase 2 data.
I am noticing that random seed variations between runs can cause a swing of up to +/-10 points. Since there are nearly 10 people at the top within a difference of 20, this could be a significant effect. Do you have any suggestions for fixing this issue? Will the best submission of each participant be run a number of times and the best/average score taken?
The final dataset is 4 times bigger, so I expect to have lower variance. Anyway, if the scores appear to be close, I’ll check with some t-tests.
Hi Dr. exploreit.
As I understand it, for the first part of the challenge there is no final round. The winner will be the one with the highest score once the deadline has passed.
For the second round I am not clear yet on how it will work. As I understand it, there will be a final round with more data, as Jeremie explains.
Yes, it is.
The second round is there to avoid any possible overfitting.
Hi Jeremie,
Will we get a chance after June 2 to submit code for the Phase 2 evaluation (taking care of feature removal, tuning constants, etc.)? Or will you just use our best submission file on the phase 2 data? I did not understand what you meant by “I’ll pay attention to get them working on phase 2 data”.
Also, what time exactly will the last submission be accepted tomorrow?
The precise time limit is in Samoa Standard Time (that means that as long as it is June 2nd somewhere on Earth, you can submit).
After the deadline you cannot submit, but you can indicate to me a favorite old submission; this submission will then replace your “best” one. As for “paying attention”, this means not excluding a submission because of a trivial problem with the Phase 2 data.
Thank you for hosting such an enjoyable competition, had a great time!
Yeah, thank you very much for the great competition!!!
Will you later publish the initial dataset, or the dataset from phase 2, so we can continue offline? That would be great.
thanks again
It’ll be released after the workshop through the Yahoo! Webscope program.
It is really great that Yahoo is willing to release these datasets. Lacking real-world data is a big concern for contextual bandit researchers (at least it is for me).