A page for users to ask each other questions and exchange comments about WikiLens. Not meant to be a substitute for reading the documentation, etc! :)
Active Discussion
pwdh: So is it just me or is this site not working? The recommendations don't change. 3 July 2006
DanFr: Two things: 1) The recs. right now are based on your buddies' opinions; no buddies, and you will get averages. We should probably change that, but haven't yet for a variety of reasons; 2) the traffic on the site is not yet high enough for things to radically change often. I hope that answers your questions. We would like the recs to change more, and for there to be more traffic. We'll keep plugging away!
Suggested Changes to Buddy System
rivette: Hi, I've been thinking a bit about the buddy system, and I have a couple of suggestions for changes. Please let me know if you think they are good or bad ideas, and how easy they would be to implement.
rivette: The Buddy system currently forms the personalized part of the recommendations system. Ratings made by your Buddies are weighted more strongly than those made by other users. You therefore want to identify people who rate things similarly to you. At the moment there is a page which gives you a simple measure of this. It shows you what proportion of pages that you rated highly are also rated highly by other users. Personally I don't find this number particularly useful. One big problem is that this is more strongly influenced by your rating distribution than your taste. For example, if someone rates everything identically to me, but one star lower, they would be characterized as frequently disagreeing with me. For this sort of problem, I think a more correct (and computationally equally simple) measure would be to take your overlapping ratings and calculate a Rank Correlation Coefficient. This basically asks: if I rate X higher than Y, then does this other user do the same? If your ratings are completely uncorrelated it will be 0. If you rank all pages in the same order (even if your actual ratings do not overlap at all!) it will be 1. What this would require would be identifying overlapping page ratings (already done), ranking them (straightforward), and then some simple number crunching. The final number can be given a non-scary sounding name like Compatibility Score, which you can then view in a big table. I hope someone likes the idea!
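A minimal sketch of the rank correlation rivette describes, assuming each user's ratings are available as a dict mapping page name to a numeric rating. The function names (`average_ranks`, `pearson`, `compatibility_score`) are hypothetical and not part of WikiLens; tied ratings share an averaged rank, which is one common convention.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n; tied values get the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Plain Pearson correlation; returns None if either side has no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

def compatibility_score(mine, theirs):
    """Rank correlation over the pages both users have rated."""
    shared = sorted(set(mine) & set(theirs))
    if len(shared) < 2:
        return None  # not enough overlap to compare orderings
    my_ranks = average_ranks([mine[p] for p in shared])
    their_ranks = average_ranks([theirs[p] for p in shared])
    return pearson(my_ranks, their_ranks)

# Same ordering, but every rating one star lower: still a perfect match.
me = {"A": 5, "B": 4, "C": 3, "D": 2}
other = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 5}
score = compatibility_score(me, other)
```

Because only the ordering matters, the "one star lower" user from the example above scores as fully compatible, while a user with the reverse ordering scores at the opposite extreme.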
rivette: The second issue I have is with how you assign buddies. Currently it is a two way process. You nominate a buddy, and then they need to confirm the link. Given what Buddies do on this site (weight your personal recommendations), this seems completely unnecessary, and maybe counterproductive. I would like to be able to pick and choose people who will influence my recommendations without some confirmation process and without them having to have me as a buddy too. I might even want to add buddies who are no longer active on the site. I understand, however, that the creators of the site wanted this to be a social network, so I propose the following change: the current buddy system would be split into two parts. 1) Add as Recommender. This would allow you to add a user to the list of people who weight your predicted ratings. It would be a one way interaction invisible to everyone but you, and would not require permission from the other user. 2) Add as Buddy. This would be a two way interaction that would highlight information about the other person's ratings, exactly like the current system. It would require confirmation from the other user, but you could both choose independently if you want to use the other person as a recommender.
rivette: Thanks for reading. I hope I made some sense.
ddjiii: As a user, I agree with these suggestions in principle. Since the power of the site comes from the recommendations, it makes sense to give users as much control as possible over how they are generated (2nd suggestion) and as much information as possible to guide this (1st suggestion).
DanFr: First, I like both suggestions at the "high level." In particular, people want more personalization than they can get now, and I think they should have it. Originally we were thinking (hoping?) that people who were actual buddies in real life would buddy up and pull each other into the system, but that seems to be rare compared with people who are just online and want to interact with other online acquaintances. Anyway, let me discuss each part of the suggestion in turn.
DanFr: 1) Rank correlation. Some simple score would be nice; I don't happen to like true Pearson rank correlation because it's hard to interpret. Is 0.3 good? Two ways to attack this: a) find another score that is more interpretable (e.g., the probability that if the other person likes something, you will too); b) have the 0.3, but put it into percentile terms (this person is in the top 10% of people in the system for compatibility with you).
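Both options DanFr lists can be sketched briefly. This is an illustration only: the "likes" threshold of 4 stars, the function names, and the dict-of-ratings input are all assumptions, not anything taken from WikiLens.

```python
LIKE_THRESHOLD = 4  # assumption: a rating of 4 or more counts as "likes"

def prob_you_like_given_they_like(mine, theirs, threshold=LIKE_THRESHOLD):
    """Option (a): of the shared pages the other user likes, what share do you like?"""
    liked_by_them = [p for p in theirs if p in mine and theirs[p] >= threshold]
    if not liked_by_them:
        return None  # nothing to condition on
    both = sum(1 for p in liked_by_them if mine[p] >= threshold)
    return both / len(liked_by_them)

def compatibility_percentile(score, all_scores):
    """Option (b): percentage of users whose score with you is strictly lower."""
    if not all_scores:
        return None
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)

# A raw 0.3 is hard to read on its own; its rank among everyone's scores is easier.
everyone = [-0.2, 0.0, 0.1, 0.3, 0.5]
percentile = compatibility_percentile(0.3, everyone)
```

With these made-up scores, a raw 0.3 sits above three of the five users, which is the kind of statement option (b) is after.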
DanFr: 2) Assigning buddies. I agree that you should be able to choose people to influence your recommendations even if they don't want a reciprocal relationship. There is some additional complication because we offer the ability for people to show their ratings to everybody, only buddies, or no one else. Thus, it would have to correctly interact with that (i.e., you can only be a Recommender if your ratings are public or you've granted permission to a "buddy"). Recommender is exactly the right way to describe it. In that world, I'm not sure "buddy" is the right thing, then. It might be instead, "I grant permission for this person to see my ratings, but I don't want them to influence my recommendations." So it would require a little extra thought, but I think it could be worked out. Furthermore, I can imagine having a bunch of Recommenders, but temporarily turning some on and off. (You can imagine doing it by category, too, but that's really complicated!)
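The interaction between rating visibility and the proposed Recommender role could look roughly like this. It is a sketch under stated assumptions: the `Visibility` enum, the function names, and the on/off flag are hypothetical names for the three visibility settings and the toggle described above.

```python
from enum import Enum

class Visibility(Enum):
    EVERYBODY = "everybody"
    BUDDIES_ONLY = "buddies only"
    NOBODY = "no one"

def can_be_recommender(visibility, granted_buddy_permission):
    """A user can recommend for you only if you are allowed to see their ratings."""
    if visibility is Visibility.EVERYBODY:
        return True
    if visibility is Visibility.BUDDIES_ONLY:
        return granted_buddy_permission
    return False

def active_recommenders(candidates):
    """candidates: list of (name, visibility, granted_buddy_permission, switched_on)."""
    return [
        name
        for name, visibility, granted, switched_on in candidates
        if switched_on and can_be_recommender(visibility, granted)
    ]

chosen = active_recommenders([
    ("public_user", Visibility.EVERYBODY, False, True),
    ("confirmed_buddy", Visibility.BUDDIES_ONLY, True, True),
    ("stranger", Visibility.BUDDIES_ONLY, False, True),
    ("muted_user", Visibility.EVERYBODY, False, False),  # temporarily turned off
])
```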
rivette: This all sounds great. The ability to tune your recommendations in real time would be fun. Do you think people should only be recommenders if their ratings are public? I think if someone doesn't want their ratings to be used for recommendations they are missing the point of the site! In fact should people have the option of hiding their ratings at all? I don't think rateyourmusic has this option, for example, and that seems more in the spirit of the site.
DanFr: The biggest problem with both of these is simply finding the time to work on it! As I've told some people, I believe strongly in wikilens, but I only get to work on it nights and weekends, and I have a lot of that time occupied with other things. I'm always sorry things don't move faster, but that's how it is. My current priorities are:
- Fix a few critical bugs (renaming pages is broken, when a user is created I have to manually put them in the User category, rolling out a simple captcha to prevent spam users)
- Work on SubCategory because the number of Restaurant categories is out of control!!
rivette: I totally understand. Thanks for all the hard work you put into the site. I think it would be interesting to have a page listing what you're working on, if it's not too much trouble.
DanFr: I'd be willing to put up a page of what I am currently working on if people would like that information. This sort of proposal would go somewhere on the page, but if it's not at the top, I'm not working on it yet. In any case, thanks for your ideas and enthusiasm. It's definitely a motivator for me.
Older Discussion
Ratings bug with international characters in Safari
rwfitzg: Occasionally I run across a page on the site that refuses to record my rating. Misérables, Les is the most recent one. Any reason? (added at 10:19:41 AM on 08/03/05)
KurtWilms: What do you mean 'refuses'? You click a smiley face and nothing happens? (added at 12:06:51 PM on 08/03/05)
rivette: I have the same problem with La Jetée and John Fahey +. I can rate them normally, but the ratings disappear. It seems like a problem with the é character and maybe with the + in the second case. Anyone understand why this might be? (added at 01:20:19 PM on 08/03/05)
DanFr: Almost certainly this is special characters, probably in the Javascript to submit a rating. Unfortunately, it works for me (Firefox 1.0.6 and IE 6.0). It looked like possibly rwfitzg uses Safari. rivette? (added at 02:04:16 PM on 08/03/05)
rwfitzg: I am using Safari 2.0 in Mac OS 10.4.2. IE 5.2 doesn't show ratings at all for anything I've rated, nor does it save ratings when I select the smilies. It does, however, color the appropriate smilies when I click only to get rid of them when I reload or move to another page. Firefox 1.0.1 works as expected. Safari doesn't retain any changes I make in Safari to the rating set in Firefox, but it does display the rating appropriately that I set in Firefox. (added at 02:45:39 PM on 08/03/05)
rivette: Thanks - yes I was using Safari - I'll try with Firefox (added at 03:54:43 PM on 08/03/05)
rivette: Using Firefox seems to remedy the 'é' problem, but I still can't rate the John Fahey Album 'Red Cross' which has a '+' in the title. (added at 04:03:59 PM on 08/03/05)
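One plausible explanation for the '+' case, offered as an assumption since the actual WikiLens submit code is not shown here: in a form-encoded query string a bare '+' is decoded as a space, so an unescaped title no longer matches the page name on the server. The sketch below uses Python's standard `urllib.parse`; the `page` parameter and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

def naive_query(title):
    """Builds the query string by plain concatenation (no escaping)."""
    return "page=" + title

def safe_query(title):
    """Percent-encodes the title, so '+' becomes %2B and 'é' becomes %C3%A9."""
    return urlencode({"page": title})

def read_title(query):
    """What a form-decoding server would see as the page title."""
    return parse_qs(query)["page"][0]

broken = read_title(naive_query("A+B"))      # the '+' is read as a space
fixed = read_title(safe_query("A+B"))        # the '+' survives the round trip
accented = read_title(safe_query("La Jetée"))
```

If this is the cause, escaping the title before submitting the rating would be the fix on the client side; browser differences in how non-ASCII characters are encoded could similarly account for the Safari-only behaviour.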
Bottom part of page
KurtWilms: How does everyone feel about the 'bottom area' of a Wikilens page? Some pages have a short description of the 'item/thing' the page is rating. However, other times the description is included in the fields (i.e., the Book category's synopsis field). Should the bottom area be reserved for comments or should the bottom area contain a description? (added at 12:38:12 PM on 08/04/05)
DanFr: Good question Kurt. I'd love to hear people's opinions on this. (added at 10:44:28 PM on 08/04/05)
KurtWilms: I am of the opinion that a short summary of the 'thing' should be in a field and that the bottom of the page should be free for comments. (added at 01:23:36 PM on 08/08/05)
Item importing
DanFr: I'm curious what people think about item importing from Amazon for books and albums. A pro is that it is easy to add new items. A con is that it ties our data to Amazon. If you look at Wikipedia, they have careful rules about copyrighted material, and thus they own their data. Right now, we do not. Ideally, I'd like to turn off importing, rip out all the item details added by importing, and leave it blank until people are willing to fill it in without legal encumbrance. However, if people won't do it, then that's a bad idea. What do you say? (added at 10:17:48 PM on 08/05/05)
DanFr: (Note that this problem doesn't exist with restaurant importing from chefmoz, since their data allows use with attribution. Of course, their site isn't as useful, either.) (added at 10:18:55 PM on 08/05/05)
rwfitzg: I would think the ownership of the material would only help the long term health of the site. It would seem to me people would add items; they've been adding movies which don't allow import like books or albums. You are likely to get extra information added more slowly though. (added at 09:03:51 AM on 08/06/05)
KurtWilms: I agree. If Amazon owns the content then we should rip it out before the problem (a lot more things get added that will have to be ripped out) gets worse. However, I find the importers a quick and painless way to add things to WikiLens. (added at 01:22:36 PM on 08/08/05)
TheTibetanTravellere: Exactly what content is Amazon alleging to own? You import title, author, publisher, publication date, ISBN number, and a link to the book on Amazon's site. All those are facts which the Supreme Court has ruled are not copyrightable. And, the clickable link also passes legal muster since it is not a "deep link" whose content you are trying to pass as your own. I do not see any problem from the legal point. (added at 04:17:19 AM on 08/09/05)
DanFr: There is one, though. When we set up an Amazon importer, we got a developer services token, and agreed to the Amazon web services license. It states very clearly that all text and images we import are "Amazon Properties" and may not be redistributed (see clause #3). Thus, basically the supporting data for all imported pages can never be in a public data dump. I am glad to see several people weigh in on this conversation, as I want to know how people would react if I disabled the feature .. and ripped out every page that has been imported (!!). (added at 09:38:45 AM on 08/09/05)
TheTibetanTravellere: What they claim and what the courts will enforce are not necessarily the same thing. The issue has already been decided in the Supreme Court. (I will look up the exact cite if you want it.) Some company took the phone company's telephone book, copied the names and numbers, and sold their own competing phone book. The phone company took them to court claiming copyright infringement. The court said no. They said that names and numbers are facts which are not copyrightable. So, you are covered because the only data field that isn't a "fact" is the synopsis which is not imported from Amazon. But, to keep or tear out the pages is a decision that you will have to make since you are the one that Amazon is going to holler at if they decide to interpret that clause to mean even the author's name.
P.S. I don't think Amazon will holler since you have a link directly to Amazon's web site. It drives traffic to them. I read an article a little while ago that specifically mentioned someone who took Amazon's data, remixed it, and presented it on their own web site. (Yes, I will try to relocate the article.) The only change Amazon required was that the site owner take down links to Amazon's competitors.
P.P.S. If you are going to "tear them out", sooner is better than later. The longer you wait, the more pages will have to be deleted. Also, you might want to add a feature to search for entries that have missing fields. That way some enterprising fool (I mean soul) can find them easily and add them. (added at 11:31:00 AM on 08/10/05)
TheTibetanTravellere: Here are the links. The magazine article, "Mix, Match, and Mutate". Amazon's response to the website "Amazon Light". A lawyer's webpage discusses copyright and databases. And, finally, the case I was referring to is FEIST PUBLICATIONS, INC. v. RURAL TEL. SERVICE CO., 499 U.S. 340 (1991) and here is the full opinion. Hope that helps. (added at 11:51:00 AM on 08/10/05)
DanFr: Tibetan, thanks for all the thoughts. I'd say, a) We clicked "I agree" on an agreement, so it seems odd to back out; b) If I were on the wrong side of some Amazon lawyers, I'd probably cave pretty fast. I am just waiting for rivette to weigh in, since that user has contributed a lot.
TheTibetanTravellere: As I said, it is your call. Just wanted you to make an informed decision. However, one thought occurred to me while I was putting in my last entry. If you do rip them out and automate the process: do NOT filter on the existence of the "clickable" link. There are several occasions where I manually entered the information including the link to Amazon's web page. (added at 04:55:00 AM on 08/17/05)
TheTibetanTravellere: Having said the above, I wouldn't be upset to see the feature go away. While I sometimes use the feature, I have added the fields by hand because I did not like the way Amazon handles the title. One example is "Charlie and the Chocolate Factory (Puffin Novels)". Puffin Novels is not part of the title of the book. It is the content of the book that makes the book worth reading, not the publisher. As noted, the downside of removing the feature is that most people do not fill in the blanks. A workaround might be to not let anyone create a page unless certain essential fields are filled in like "title", "author", and "publication date". (Actually, I do not like the last field either. It should be copyright date, not publication date. 1984 was written in 1948, not 1983!) (added at 04:27:06 AM on 08/09/05)
chowhound: I say rip it out. It'll mean less/slower growth, but it avoids disaster scenarios, so at least that slow growth will be leading somewhere durable! **BUT**: couldn't hurt to ask Amazon for special dispensation first. You never know. And if they say "yes", it'll be best of all worlds.
rivette: I like the Amazon import feature. However, it would be unfortunate if the data could somehow not be made public. Having said that, the most important data here is in the titles and ratings which can't belong to Amazon (or can they?) and not the subfields. I don't understand about pulling all the Amazon pages right now. If we reentered the data wouldn't it look exactly the same? It seems you could get away with just stripping the current Amazon pages of all but the bare essentials that are general knowledge, Title/Author/Year, say. Then you won't lose any rating information or site growth. I definitely think it's worth seeing if Amazon will be friendly first. (added at 11:29:43 AM on 08/17/05)
rivette: Also, how have movielens dealt with this? I'm guessing they import their data from IMDB. Is this the case, and are they limited in what they can make public? (added at 11:36:21 AM on 08/17/05)
DanFr: MovieLens is definitely NOT importing data from IMDB. That's illegal, against license. In fact, they received a letter from IMDB at one point. Actually, a single person, a "movie guru" named Chad, enters most of the movies. It is a bottleneck. They are also looking at opening up movie adding and editing to members. Finally, their data is not yet "public" in the sense that they don't redistribute it. (added at 12:39:19 PM on 08/17/05)
HomePage
DanFr wrote (in WikiLens/FAQ): Sorry about the HomePage. I've had to defend against spammers (really!), and the HomePage is the most often defaced. In fact, I believe that's just about the only page that's locked in the whole thing. By the way, if others are willing to help defend it, I would unlock the HomePage, too. (added at 06:21:50 PM on 07/28/05)
chowhound: Well, of course, a natural immune system is fundamental to the operation of any Wiki. Connecting the community in a place where members can interact about the project would be an important first step in building a community that can self-defend without your intercession. Right now, we're all just sort of eerie, phantasmagorical sets of preferences!
Hopefully this page can serve that purpose, and also take some of the burden off of you guys to answer questions, etc. That is....if people see it! :)
DanFr: Sounds good to me! (added at 08:02:52 AM on 08/01/05)
chowhound: Also...this page is doing amazingly well considering it's kind of not linked anywhere. Imagine how useful it would be (users helping users, ability to poll users' opinions and ask their help) if it was linked to home page and help...just an idea!
DanFr: I added Discuss WikiLens .. and unlocked the HomePage. If the HomePage gets defaced too much, I'll have to lock it again. (added at 07:43:06 AM on 08/16/05)
Resolved Discussion
# things counter not working
rivette: I think the # things counter for the movie category on the home page is stuck (added at 01:38:48 PM on 08/03/05)
DanFr: The "# things" counter on the homepage has a small bug, whereby it counts number of pages with a page reference to their category instead of just number of pages in category. I am fixing this in the next version. (added at 02:05:12 PM on 08/03/05)
DanFr: I added a new column for "# things" that shows the actual # of items in a category. What I am saying is this counter is fixed now. (added at 03:48:18 PM on 10/04/05)
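The difference between the two counts DanFr describes can be illustrated with a toy model. The page records below (a `category` field plus a set of `links` to other pages) are an assumed data layout, not the real WikiLens schema.

```python
# Assumed toy data: each page has one category and a set of page references.
pages = [
    {"name": "Movie A", "category": "Movie", "links": {"Movie"}},
    {"name": "Movie B", "category": "Movie", "links": {"Movie"}},
    {"name": "My top ten", "category": "List", "links": {"Movie", "Movie A"}},
]

def count_by_reference(pages, category):
    """The buggy count: every page that merely links to the category page."""
    return sum(1 for page in pages if category in page["links"])

def count_by_membership(pages, category):
    """The intended count: only pages that actually belong to the category."""
    return sum(1 for page in pages if page["category"] == category)

buggy_count = count_by_reference(pages, "Movie")
fixed_count = count_by_membership(pages, "Movie")
```

In this toy data the list page inflates the reference-based count even though it is not a movie.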
WikiLens slow
DanFr: I must apologize that WikiLens has become so slow! Believe me, I'm working on it. (added at 09:01:42 AM on 09/09/05)
ScottYilek: Yeah, it's unbearable. Maybe switch to 'Groovlets'? ;-) (added at 11:11:42 AM on 09/09/05)
KurtWilms: What's up with the RecentChanges page never showing up!!?? (added at 04:07:48 PM on 09/09/05)
DanFr: Yeah, my bad. RecentChanges would show many hundreds of things, and the server grinds and runs out of either time or memory, and comes back with nothing. The best thing is to add paging (only show 20 at a time or something). Seems simple, but still I need a spare few hours to code it up. Also, I want to add memcached to speed everything up. That will take a number of hours to get right. (added at 05:26:22 PM on 09/09/05)
KurtWilms: Rock on. (added at 09:08:24 PM on 09/09/05)
DanFr: Okay, I added a limit so that RecentChanges only shows the last 50. Now it loads again. (added at 07:55:54 PM on 09/13/05)
DanFr: I made some changes to speed up the page and HomePage about 4X. Still somewhat slow, but a lot better. (added at 03:47:45 PM on 10/04/05)
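The two fixes discussed in this thread, paging and a hard limit, can be sketched as follows, assuming the changes are already available as a newest-first list. The function names and the in-memory list are illustrative; a real implementation would more likely push the limit into the database query.

```python
def paginate(changes, page=1, per_page=20):
    """Return one page of a newest-first list of changes (pages start at 1)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return changes[start:start + per_page]

def latest(changes, limit=50):
    """The simpler fix: only ever show the most recent `limit` changes."""
    return changes[:limit]

# Stand-in for several hundred recent edits, newest first.
all_changes = ["edit %d" % n for n in range(300)]
second_page = paginate(all_changes, page=2, per_page=20)
capped = latest(all_changes, limit=50)
```

Either way the server only has to render a bounded number of rows per request, which is what keeps the page from timing out.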
Spammers
DanFr: Spammers have discovered us again. They used to barf out tons of links, so I borrowed PhpWiki's solution: more than 20 links on a page means "spam", so don't save. Well, someone's gotten smarter and is submitting fewer links. I just put in place another strategy, and we'll see if it helps. (added at 11:40:00 AM on 10/11/05)
DanFr: The other simple strategy (changing the URL slightly) didn't work at all. I've changed the pages to require a login before editing. That is sad and I don't want to do it, but our community is not yet strong enough to resist massive spamming. I would like to look into other strategies (e.g., change the "Save" button names and such), and open back up. (added at 3:43:00 PM on 10/11/05)
ScottYilek: What about using captchas or something similar?
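The link-count rule borrowed from PhpWiki (more than 20 links means spam) is easy to sketch. The regular expression and function names below are assumptions for illustration and are not taken from PhpWiki or WikiLens.

```python
import re

MAX_LINKS = 20  # threshold mentioned in the thread
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)  # assumed definition of "a link"

def count_links(text):
    """Count occurrences of http:// or https:// in the submitted page text."""
    return len(LINK_PATTERN.findall(text))

def looks_like_spam(text, max_links=MAX_LINKS):
    """Reject the save when the page carries more than max_links links."""
    return count_links(text) > max_links

flood = " ".join("http://example.com/%d" % n for n in range(21))
trickle = " ".join("http://example.com/%d" % n for n in range(5))
flood_blocked = looks_like_spam(flood)
trickle_blocked = looks_like_spam(trickle)
```

The second case shows the weakness DanFr ran into: a spammer who submits only a handful of links per edit passes straight through a fixed threshold.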
New Questions
scrognoid: What do the colors of ratings under 'pred' mean? (added at 08:16:14 PM on 05/27/07)
DanFr: Green is "above your average" and red is "below your average." (added at 11:34:34 PM on 05/28/07)
ddjiii: Check out the discussion of ratings systems and their effect on Machinist (added at 02:40:25 AM on 06/15/07)
baaic: Has there been a change to the prediction algorithm? I'm looking at Seabiscuit: An American Legend, and my prediction is 3.7, even though average rating is 4.5, and I have two "buddies" that have given it a 4.5 and a 5. That doesn't match up with my understanding of how the predictions work here. (added at 10:48:33 AM on 06/16/07)
DanFr: There has been no change. I'd have to grovel your data and do the calculation to see why your prediction is what it is. There might be nonintuitive things in the algorithm. For example, it average-adjusts (i.e., if you tend to rate things around 4 and your buddies around 4.5, then their 4.5 is really a 4 for you). Also, I think it might use negative correlations (which perhaps it shouldn't). (added at 09:58:00 PM on 06/25/07)
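To make the average adjustment concrete, here is a generic mean-centred prediction of the kind used in many collaborative filters. It is not the actual WikiLens algorithm, which is not shown on this page; the function name, the tuple layout, and the weights are assumptions.

```python
def predict(my_mean, buddy_ratings):
    """Mean-centred prediction.

    buddy_ratings: list of (buddy_rating, buddy_mean, weight) tuples, where
    weight is the buddy's correlation with you (it may be negative).
    """
    total_weight = sum(abs(weight) for _, _, weight in buddy_ratings)
    if total_weight == 0:
        return my_mean  # nothing to go on, fall back to your own average
    adjustment = sum(
        weight * (rating - mean) for rating, mean, weight in buddy_ratings
    )
    return my_mean + adjustment / total_weight

# You average 4.0; a buddy who averages 4.5 gives the item 4.5.
# Their 4.5 is merely "average for them", so the prediction stays at 4.0.
same_taste = predict(4.0, [(4.5, 4.5, 1.0)])

# A negatively correlated buddy who loved the item pulls your prediction down.
opposite_taste = predict(4.0, [(5.0, 4.0, -0.5)])
```

Both effects can push a prediction below the site-wide average even when buddies rated the item highly, which may be what baaic is seeing.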
