Is it ethical to ask SDNers to code for free on a community project?

Former Member
0 Likes
594

As some of you may be aware, Anton Wenzelheumer has been writing a lot of beautiful php code down in the Scripting Languages forum, in response to problems I've been posing in the bioinformatics section of the SAP wiki.

But now, I have about 100-200 hours of straightforward but complex code that has to be written in perl or php - no fancy web scripting - just read a file in, analyze it in various ways (no regex - all comma-separated values), and output one or more files. (But the analytic routines are very interesting from a technical point of view.)

So, suppose there are 20 folks here at SDN who know perl and/or php as well as Anton knows php.

That means that the job could get done if each one of these folks did 10 hrs for free.

Is it fair to propose a project of this size to the SDN community?

What I mean is the following.

Even if 20 people actually signed on for 10 hours each, would it be fair to ask them in the first place?

Please note that I'm not asking if it would be "appropriate". In my opinion, it's definitely "appropriate" because SDN not only has a scripting languages forum, but also actively seeks code samples that people can learn from. So in this case, the code samples would teach folks how to solve a wide variety of very complex analytic problems in perl or php.

But what do you think of the "fairness" question?

djh

Former Member
0 Likes

hi djh,

I don't think it's unethical to ask the community this question, but I think it's kind of an obfuscation to declare your personal task to be a community project, and I therefore assume that you will not be successful with that approach (for you personally, of course, I hope it'll work out).

to get a community project running I think you must understand the individual's motivation to take part in such an effort. I can at least tell you something about my motivation to work on some of your challenges in the Scripting Languages forum.

1) I love to code for fun; writing a little routine here and there is like solving my daily sudoku puzzle, reading a political analysis about Georgia or riding my bike through the woods; but since it's a spare time activity I take the liberty to code exactly what seems to be fun or challenging to me

2) I feel a certain urge to demonstrate to the interested reader the beauty of using scripting languages in certain circumstances, to show that complex string manipulation or network programming is really easy to handle in such environments; to show what is possible to do beyond ABAP and thereby to maybe contribute a tiny little bit to moving the maintainers of ABAP to support and advance certain modern features

3) I want to help you free your mind, to change your attitude a little from asking 'can XY be done?' to better ask 'how is XY done best'

that said, I wouldn't code your project but I would certainly help you here and there to get beyond some conceptual or technical obstacles. programming a routine for a supposedly simple statistics problem from a scientific point of view doesn't appear to be an intellectual challenge to me but rather something to cash $$$ on. I also don't see much that SAP or anyone else can learn from that (better than in any other practicing environment) and a quick thought on a possible business plan and a future big commercial success (startup aptitude) doesn't yield very shiny results either, but that's just my 2 cents.

kind regards, anton

Former Member
0 Likes

Hi Anton -

I want to thank you very much for taking the time to reply because the nature of your response makes this a much more interesting discussion than it might otherwise have been.

Before responding to the substance of your reply (which I may actually do in a different post later on), I want to clarify two important things.

I.

The project I'm working on has NO cash value for me personally. I have done what the US Government calls a "laydown" of all the relevant IP ("intellectual property") at a public website owned and operated by the US Government. And if you know anything about the US Government, you know that because of this "laydown", there is only one owner of the IP, namely the US government.

So for all intents and purposes this is truly an "open-source" project.

II.

The coding that needs to be done is not the trivial programming required for normality or t-testing. For that, the STATA statistical package can be called in batch mode, just as Bill Mann did when he wrote a lot of the existing project code back in 2003-2004.

So no - I would never ask you or any other scripting language SDN hotshot to code something that's already available off-the-shelf in STATA and probably hundreds of other cheap or free statistical packages.

The problem is complex and interesting for an entirely different reason having to do with the set up of the control experiment from the data. I have about 1M "words" of various lengths that are all subsequences drawn from the same set of 20000 strings, and for each of these 1M words, I know the start and end positions for where it occurs in some string S belonging to the set of 20000.

Now it may seem that 1M words drawn from 20000 strings would completely "cover" each and every string, i.e. that there is no substring of any string S which is not in the set of 1M words.

But in fact this is not the case - there are large "unused" portions of each string S that contain no words in the set of 1M.

So, suppose that in the set of 1M words, there are 5 words of length 10 from some string Si, 2 words of length 20 from the same Si, and 10 words of length 30 from the same Si. Then in order to set up the data for the control experiment, I have to find 5 words of length 10 that occur in the "unused" portion of Si, and 2 words of length 20 that occur in the "unused" portion of Si, and 10 words of length 30 that occur in the "unused" portion of Si. And I have to do this for each of the 20000 strings in relation to the set of 1M words. And furthermore, the REALLY complex requirement comes in when the following condition has to be satisfied: the set of control words from the unused portion of S has to overlap or not overlap each other in exactly the same way as the set of real words from the "used" portion of S do or do not overlap each other.
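One possible way to think about the requirement above, offered only as a minimal sketch and not as the project's actual method: group the real words of a string into "overlap clusters" (runs of words that overlap one another), then slide each cluster as a rigid block into an unused gap that is big enough to hold it. A rigid shift keeps every word length and every pairwise overlap within the cluster unchanged. The function names, the 0-based half-open `(start, end)` convention, and the greedy first-fit placement are all assumptions made for illustration; a real control experiment would probably want randomized placement and a fallback when no single gap can hold a whole cluster.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical convention: 0-based start, exclusive end


def clusters(words: List[Interval]) -> List[List[Interval]]:
    """Group words into runs in which each word overlaps the run so far."""
    groups: List[List[Interval]] = []
    cur_end = 0
    for s, e in sorted(words):
        if groups and s < cur_end:          # overlaps the current run
            groups[-1].append((s, e))
            cur_end = max(cur_end, e)
        else:                               # starts a new run
            groups.append([(s, e)])
            cur_end = e
    return groups


def unused_gaps(words: List[Interval], length: int) -> List[Interval]:
    """Return the stretches of a string of the given length that no word covers."""
    gaps: List[Interval] = []
    pos = 0
    for group in clusters(words):
        start = group[0][0]
        end = max(e for _, e in group)
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < length:
        gaps.append((pos, length))
    return gaps


def place_controls(words: List[Interval], length: int) -> List[Interval]:
    """Greedy first-fit: shift each overlap cluster rigidly into an unused gap."""
    free = unused_gaps(words, length)

    def span(group: List[Interval]) -> int:
        return max(e for _, e in group) - group[0][0]

    controls: List[Interval] = []
    # Place the widest clusters first so they get first pick of the gaps.
    for group in sorted(clusters(words), key=span, reverse=True):
        width = span(group)
        for i, (gs, ge) in enumerate(free):
            if ge - gs >= width:
                shift = gs - group[0][0]
                controls.extend((s + shift, e + shift) for s, e in group)
                free[i] = (gs + width, ge)  # consume the part of the gap just used
                break
        else:
            raise ValueError("no unused gap is large enough for one of the clusters")
    return sorted(controls)


# Toy string of length 100 with two overlapping words and one isolated word.
real_words = [(0, 10), (5, 15), (40, 60)]
print(unused_gaps(real_words, 100))     # [(15, 40), (60, 100)]
print(place_controls(real_words, 100))  # [(15, 35), (60, 70), (65, 75)]
```

In the toy run, the two overlapping real words of length 10 become two control words of length 10 that overlap by the same 5 positions, and the isolated word of length 20 becomes an isolated control word of length 20. Because both steps only sort and scan intervals per string, the work for different strings is independent, which is at least compatible with splitting the 20000 strings across machines.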

This may strike you as a trivial string manipulation problem, but it doesn't strike me that way, particularly since the size of the word set and the string set will increase dramatically (by orders of magnitude) in the future, perhaps to the point where true distributed algorithms on MPP systems will be required.

And therefore, what I would ask from you is not "code", but exactly what you offered - how to design the required algorithms "best", i.e. so that they can scale up indefinitely. Once you (and others) designed the architecture, it could be encoded by others who are simply interested in writing elegant perl or php against well-defined complex problems.

Furthermore, I am prepared to argue with you (though not now) about how relevant the structure of the above problem is or isn't to "real" problems in MM/IM, not the kind of more-or-less "trivial" everyday IM/MM problems that SAP ERP now handles. I called my first SDN presentation back in 2005 "Beyond Best Practices: SAP as a Platform for Innovation" because I felt back then and still feel that the potential of SAP for solving "real" MM/IM problems hasn't even been explored yet. And like you, I want to wake some folks in Walldorf up - but in a different way - namely, by showing that a number of very interesting "real" problems in MM/IM are suggested from a serious consideration of bioinformatic problems in relation to MM/IM problems.

That being said, I am also prepared to continue arguing with you (not now, later) about the marketability of an "open" turnkey SAP bioinformatic system. This issue involves the question of whether SAP will ever market NW7 or 8 or 9 or 10 or 20 as a standalone IDE or not, and since that's something I very much would like to see happen, it's another reason why I'm surfacing the "bioinformatic" questions here. (BTW, did you notice Mike Pokraka's response in another thread where he said he agreed that SAP should do this ??? I was very surprised that he agreed.)

In any event, I really meant it when I said that your response was thought-provoking, illuminating, and yes - even a little inspiring. If more people thought like you in this world, it would be a better place.

So thanks very much again.

djh

Former Member
0 Likes

I've read this thread multiple times and I can't get over one point and I know it's not your intention because you are simply not like that (at least I don't think) but it sounds kind of like that.

When I read this I get the feeling you want to use the community in an official project capacity, meaning they make a commitment to do something on a regular basis and are held to it. I know it's not what you mean, but it does come across that way, so in that sense it is of course unethical to ask. HOWEVER, with the idea that any community project is based on desire and interest, asking if anyone in the community wants to join in on a project is of course the nature of community and is fine.

It's of course always best to lead by example, and most successful community projects all started with something tangible (not just an idea) first. So, for example, you bring a chunk of code to the table and the community works together to optimize it and make it better, but the work was already done; it's about enhancing. On another level, you have an idea but just need a nudge or two in the right direction to make it actually work. For example, I have an idea in my head and I'm tossing it around with others because I'm having trouble determining if the code is actually doing what I think it's doing or if I have just convinced myself that what I see as output is what I want it to be.

There are lots of ways to "do it" in the community but the thing to remember is that it's all very informal for the most part...

Your project here with Bioinformatics is extremely interesting and I'd love to be part, I've just so little time I won't step forward because I don't think it's fair to join a project if I don't really have time to do something.

I am looking forward to Vegas though because I want to see/hear how it's going.

Former Member
0 Likes

Hey Craig -

Really glad you took the time to respond here because I agree with you completely - there's a fine line that shouldn't be crossed and my post was exactly an attempt to generate discussion on exactly what that line is or where it needs to be drawn.

I think in a way you're saying the same thing Anton was implicitly saying when he listed his reasons for actually writing the code that he has. He wrote the code because he had his own personal reasons for doing so (very good ones in my opinion), and he wanted to make it absolutely clear that by doing what he did, he wasn't "signing on" to some project in any "official" sense. In other words, there will be times when I pose a problem in the bio-wiki that he finds interesting for his own reasons, and he'll code something up in response, and other times when he won't. Not because he doesn't have the time or ability - just because he doesn't choose to at that particular time for whatever reason.

Fortunately, I think, the community is its own "self-regulator" that sorts out questions like this very well.

What I mean by saying that is simple. If my "20K string, 1M word" problem happened to strike enough folks here as an interesting technical problem in and of itself, then those folks would coalesce around it and something would get done. But not because any "official" project has been "declared" or "signed up for" - just because the idea happened to tickle enough folks' intellectual "funny-bones" in the same way at the same time.

At the same time, the community can't properly "self-regulate" these matters unless it is in possession of all the facts.

To see what I mean by this, you gotta think back to when I posted my first problem very informally in the SL forum and you and several others posted solutions almost immediately in about as many different scripting languages as there are.

Although I was extremely grateful to you and all the others for doing so, I felt a little guilty because on my side, it wasn't just "puzzle-solving" in the fun sense - I really needed the code that you guys wrote for a very practical reason. Not because I was going to make money on it or anything like that, but just because someone had asked me to prove something to them that I couldn't do without the particular code that you guys wrote.

So it's kind of the same thing in this case. On the one hand, I definitely think the SDN community as a whole would benefit from watching how a "working group" at SDN attacked and solved my "20K string, 1M word" problem, and yes, I think the solution to the problem has applicability in areas that SAP has traditionally been interested in.

But at the same time, I'm getting something done that I need to get done to further a personal project of mine. And as I said, the community is entitled to know this up front - that I would take any solution they come up with and use it in a practical way. Not because I intend to "make money" on it or anything like that - as I said, everything I do in the bio area is completely "pro bono".

The reason is that I'm taking the results of a community effort and using them outside the community, and that makes this case very different from a case like SAPLink, for example. In that case, there was no use for the results of the project outside the SAP community, so the issue that we're discussing here couldn't have arisen.

I know this response is getting long (as usual, right?), but two other things have to be said.

Since I began sharing my research on the Web four years ago, I have been amazed at the generosity of complete strangers who have donated their time simply because, like I said, a particular problem tickled their intellectual "funny-bone". There's a very sharp C/Basis coder named Gunter Sterten over on your side of the pond who works from his home in a little town called Hilter (I think), and he downloaded a whole human chromosome from the public databases and wrote a lot of complex code against it just to get independent verification of a result I had already obtained. More recently, there's a guy in Spain named Angel Herraez from the "JMol" maintenance group (a volunteer group) who offered to collaborate on a problem that struck his interest.

So it's kind of the same thing here - if I put an idea out here at SDN, and folks volunteer to help, I think that's fine. And if they don't, then no damage done. Just so long as they have all the facts up front, which as I said, was one of the reasons I started this thread.

The last thing I wanted to say is about what I'm "putting in" on my side in return for what I'm "getting out". Well, the answer to that is the WIKI - because I really do believe that folks should know more about "bioinformatics" than they do, and the WIKI is my way of providing a little mini-seminar that will get folks comfortable with the basics.

Also, I really do believe that SAP should be seriously thinking about penetrating bioinformatics as a new vertical sector, and the WIKI is a place for me to try and show why. (That may strike some folks as not really "putting in" anything of value on my side, but I think one main purpose of SDN is to provide a place where ideas can be aired and maybe float upwards if they have any merit.)

And finally, I really do believe that SAP should market NW7 as a stand-alone IDE, and the WIKI gives me a place to show why that is not an unreasonable idea. By doing so, I'm airing another idea that maybe should get more discussion in the SAP community (at least Mike Pokraka agrees with it, and maybe there are others like him), so I think I'm making a contribution here as well.

Anyway, thanks again for your response - glad you took the time to post it.

Best

djh

Former Member
0 Likes

> Although I was extremely grateful to you and all the others for doing so, I felt a little guilty because on my side, it wasn't just "puzzle-solving" in the fun sense - I really needed the code that you guys wrote for a very practical reason. Not because I was going to make money on it or anything like that, but just because someone had asked me to prove something to them that I couldn't do without the particular code that you guys wrote.
>
> But at the same time, I'm getting something done that I need to get done to further a personal project of mine. And as I said, the community is entitled to know this up front - that I would take any solution they come up with and use it in a practical way. Not because I intend to "make money" on it or anything like that - as I said, everything I do in the bio area is completely "pro bono".

A bit out of context but those two paragraphs I think sum everything up about starting a community project.

1) Lay it all out upfront - some people might have felt "used" otherwise.

2) Be clear what's in it for you

The motivation behind starting a project is key to its success. SAPLink, FLOB, and many others are very clear about the benefit for all, so they are working - just like this case http://redmonk.com/sogrady/2008/08/17/wetry/ - be clear, honest, upfront and transparent and you can't go wrong, and you'll find you have more supporters than you might think, even from people like me.

Former Member
0 Likes

Craig -

I think that's a fair summary of the do's and don'ts, and I only want to add a couple of more comments.

First, my ability to maximize the benefit of this project to SDN and SAP is limited by the fact that I have the subscription stack but no adequate server to run it on (adequate in terms of RAM and disk space). This is due solely to personal financial limitations which I can't do anything about at the moment.

Suppose I had such a server. Then I would already have been able at this point to demonstrate a WDA presentation engine sitting on top of an ABAP analytic engine, and in particular:

a) a WDA presentation engine that links out "safely" to applets in iframes;

b) an ABAP analytic engine that links out via RFCs to perl and other scripting languages to do things that are best done in scripting languages.

(Of course, I would have needed help on the protocols for the WDA iframe linkouts to applets and the ABAP RFC linkouts to perl, etc. But I think I would have easily gotten this help from the community or even been able to find the examples I need in existing tutorials, forum posts, blogs, etc.)

So if these two engines were up and running now, then I would have satisfied one of your main requirements: namely, the requirement to get something running first and then ask people to join in if they're interested.

Furthermore, if these two engines were up and running, then the benefit of the project to SAP and SDN would be maximized in three important ways.

c) SDN'ers who don't know about "safe" WDA iframe linkouts to applets and ABAP RFC linkouts to scripting languages would have some great examples to learn from (because this project has to "do everything in the book");

d) SDNers (and SAP itself) would have a real-world example that illustrates my reasons for thinking that SAP should market NW7 as a stand-alone IDE;

e) SDNers (and SAP itself) would have a real-world example that illustrates how easy and natural it would be for SAP to get involved in bioinformatics.

Now, I know you're annoyed at this point because you think all of the above is just another plea to SAP to give this project an adequately resourced server.

But that's not true. I know it's inappropriate to make that request again.

The only reason I mentioned all of the above is by way of saying that my feelings are a little hurt in one respect.

In particular, I think it's a little unfair of you to have compared what I am able to do in my present situation to what folks like Ed and Dan could do a couple of years ago as members of an elite group at C-P who were specifically tasked to prototype new ideas and were given adequate resources to do that job.

Yes yes - I know what you'll say: "well, David, if you were as good as Ed and Dan, you'd have a job like they had and could do the same kind of things."

And when you say it, I'm afraid I'll have to agree that you're absolutely right !!!

Best

djh

Former Member
0 Likes

David,

Ask Eddie and Dan, Tom and Rich, and many others what they use primarily for the stuff they do. I think you'll be surprised.

We all have limitations; heck, even I don't have access to a standard dev system unless I ask someone for a favor.

Let's chat in Vegas about future possibilities.

Former Member
0 Likes

Hi Craig -

Oooooo - you shouldn't have left the door open like that ... you know you'll never get rid of me now ....

Regarding what the big dogs use, I know that whatever machines they've got, they've gotta have 6-8G RAM, cause NW7 just won't run reasonably on anything less ... again, if I had the coin right now, the new Dell XPS's would be fine ..., or I'd just buy a new server.

And even if my old personal Sun RaQ550 appliance server wasn't having old age problems right now, I doubt that NW7 would install because Sun, in its infinite wisdom, installed RedHat with all kinds of patches that come with warnings saying "change this and you void your warranty." When the new drive comes in next week, I may give the install a shot ... but I really doubt it'll work.

The other factor here is size of transparent tables ... in this kind of application the tables will make BSEG look puny ... that's why it's interesting from a distributed point of view .... something else that SDN'ers could learn from if they've never worked at truly high volumes ...

Anyway, thanks very much again.

djh

thomas_jung
Developer Advocate
0 Likes

> I know that whatever machines they've got, they've gotta have 6-8G RAM, cause NW7 just won't run reasonably on anything less

I run CE Java 7.1 EnhP1 and ABAP 7.0 EnhP1 and MDM 5.5 all on my laptop with 3Gb of RAM. Admittedly I generally don't run ABAP and Java at the same time, but it is possible to do so. I just find that it slows down my normal desktop apps too much when I run them both. However I keep my ABAP stack running just about 100% of the time and get great performance on everything else while doing so.

When I am at home I do have a Dell XPS 420. It has quad core, 3Gb of RAM, and 1Tb of RAID 0 HD. I run Server 2008 64-bit and it handles JAVA + ABAP + MDM at the same time just beautifully.

Former Member
0 Likes

Hey Thomas J -

Thanks very much for stopping by.

When you say Server 2008, I'm assuming that you're using the XPS 420 just as a server, right ????

thomas_jung
Developer Advocate
0 Likes

On the Dell XPS I dual boot. When running in Windows Server 2008 it is used mostly as a server - although I have the SAPGUI installed over there and have been known to work directly on the box. However, about 80% of the time I run with the systems off my laptop and have my Dell booted back over to Vista; otherwise the family gets upset.

Right now on my laptop I have a typical load. I am running Outlook, SAPGUI, NetWeaver Business Client, Twhirl, SnagIt, NetWeaver Developers' Studio (with Flex Builder), an MDM Server, and my ABAP Server.

Here is the memory load by process:

http://www.flickr.com/photos/tjung/2783870139/

And overall:

http://www.flickr.com/photos/tjung/2783870165

That's a fairly heavy load, yet my performance is still excellent. One thing I have found is that if you are going to run server products (ABAP or Java) on your main client machine in XP, you should configure the Performance Options as though the system is a server. Set the Processor Scheduling to favor Background Services and the Memory usage to favor System cache. That will have a huge impact.

http://www.flickr.com/photos/tjung/2783870193

Also I found that the profile parameters on the ABAP sneak preview are actually a little small if you have 2Gb plus. You can get better performance by increasing the program buffer to 200Mb and the shared pool to 512Mb.

Former Member
0 Likes

Hi again Thomas -

Well, whatever other purposes this thread may or may not have served, I'm glad it elicited these wonderful responses from you. It's good to know that 3G will handle loads like the ones in your snapshots.

The reason I may go up to a higher-RAM machine when I can afford to do so is that the bioinformatics perl programs can involve some very long running "nsquared minus n" (i.e. "quadratic") case-loads, and I'd like to be able to let those run "in background" on the server while the "foreground" of the server is supporting my WDA/ABAP development and testing.
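To put a number on "nsquared minus n": comparing every item with every other item, in both directions and never with itself, gives n*n - n ordered pairs. The sketch below is only an illustration; the choice of n = 20,000 (the size of the string set mentioned earlier in the thread) is an assumption about what the cases might be, and the helper name is made up.

```python
from itertools import permutations


def ordered_pair_count(n: int) -> int:
    """Number of ordered pairs (a, b) with a != b among n items."""
    return n * n - n


# Cross-check the formula against an explicit enumeration on a tiny input.
items = ["s1", "s2", "s3", "s4"]
pairs = list(permutations(items, 2))
assert len(pairs) == ordered_pair_count(len(items))  # 12 pairs

# Illustrative only: n = 20,000 is an assumed case count.
print(ordered_pair_count(20_000))  # 399980000
```

At roughly 400 million cases, even a fast per-case routine adds up to a long background job, which is the reason for wanting spare RAM and CPU alongside the development system.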

Best regards

djh