Osclass forums
Support forums => General help => Topic started by: SteveJohnson on December 16, 2016, 04:05:10 am
-
Hello,
What do you recommend for stopping duplicate posts?
Currently I am using the Tuffclassified theme (which has spam protection built in: it checks title and description for duplication). I will soon switch to another theme, so I will lose this function.
Wondered what others use to stop duplication.
Thanks!
-
Come on guys, reply. I think this is quite a valid topic
-
Hi
Not sure if there exists a mechanism within Osclass to achieve this but you can try the following plugins:
First One Plugin 1.3.3
Gives the users the freedom to push their Ads to the top. This way they will have fewer tendencies to post duplicates
Activate Deactivate Item 3.0.1
This is useful for those who want to pause their Ads for certain duration and reactivate when needed.
Hope this helps.
-
Not sure if there exists a mechanism within Osclass to achieve this but you can try the following plugins:
--- There does exist a mechanism in Osclass to mark as spam/block, but I have no idea how it works, as duplicate ads, even with the exact same title and description, are never marked as spam.
First One Plugin 1.3.3
Gives the users the freedom to push their Ads to the top. This way they will have fewer tendencies to post duplicates
--- This plugin has nothing to do with spam, this is a plugin to move an item to the top.
Activate Deactivate Item 3.0.1
This is useful for those who want to pause their Ads for certain duration and reactivate when needed.
--- This plugin is for activating and deactivating ads. Again, nothing to do with spam.
The only solution I see is the plugin by Michael (https://market.osclass.org/plugins/security/spam-solution-plugin_318). But I still don't get what the built-in spam/block function in the Osclass core is, and why it doesn't work for such a basic feature.
-
I don't think the core of Osclass has ANY spam filters or duplicate checks.
-
I don't think the core of Osclass has ANY spam filters or duplicate checks.
I guess so. Are you using some automated way to stop duplication Aficionado, or manually removing them? Some users just keep copy pasting multiple times, and it's quite annoying .
Thanks.
-
I guess so. Are you using some automated way to stop duplication Aficionado, or manually removing them? Some users just keep copy pasting multiple times, and it's quite annoying .
Thanks.
No, I don't have any checks like that. And to be honest I don't know how feasible it is: imagine checking every new ad against my 10.000+ posted ads. Not sure, but this could put a lot of load on the server and I don't want that.
-
There is a modified version of the spamkiller plugin that checks for similar ads published before and blocks the new ad.
If I remember well, it only checks between ads with the same email.
You can surf this post to see if you find a working version http://forums.osclass.org/plugins/(new-plugin)-spam-killer/
-
There is a modified version of the spamkiller plugin that checks for similar ads published before and blocks the new ad.
If I remember well, it only checks between ads with the same email.
You can surf this post to see if you find a working version http://forums.osclass.org/plugins/(new-plugin)-spam-killer/
I haven't found any in that topic. Maybe deleted.
-
Come on guys, reply. I think this is quite a valid topic
The interest for valid topics is almost zero in this forum. Sad but true.
Do you know how many "valid topics" i have started and got zero interest ?
-
Since this is a very important issue, I'll put it on my priority list and make a plugin for this purpose.
A feature that is still missing for my page
-
I have a version working on my sites, but it is highly customized (it also checks whether at least one of the custom fields is filled), so I can't upload it.
-
There is a modified version of the spamkiller plugin that checks for similar ads published before and blocks the new ad.
If I remember well, it only checks between ads with the same email.
You can surf this post to see if you find a working version http://forums.osclass.org/plugins/(new-plugin)-spam-killer/
This plugin checks specific keywords, and marks as spam. I think it does not check duplications.. or am i missing something?
I currently use the version that is modified by Aficionado
-
No i don't have any checks like that. And to be honest i don't know how feasible it is, since imagine to check every new ad Topic against my 10.000+ posted ads. Not sure but this could pose a lot of server load and i don't want that.
Let's see.. If the duplications are checked only with the ads posted by one email address (one user), and not the entire DB, the load will not be that high.
I think my theme currently does the same, only checking with respect to email addresses
-
Since this is a very important issue, I put this on my priorities list and make a plugin for this purpose.
A feature that is still missing for my page
Liath, you're the only one these days who is interested in making something for the community.. thank you so much.
I would like to contribute if you need any help with this..
-
I would like to contribute if you need any help with this..
Is not necessary, but thanks anyway :)
I have some ideas for this plugin: I want to check for duplicate posts, banned words and banned email addresses, and I want to include a honeypot... any suggestions for more security?
-
I would like to contribute if you need any help with this..
Is not necessary, but thanks anyway :)
I have some Ideas for this Plugin, so i want to check for duplicate posts, banned words, banned emailadresses and want to include a honeypot... any suggestions for more security?
Ideas no, suggestions yes.
Right now i do the following. Check for spam/stopwords and the plugin marks them as SPAM and DISABLES them. Then i use the Butler plugin to delete the spam automatically.
So it is important to think what to do if a spam/stopword is found. Mark as spam, disable the ad, or whatever.
-
I have some Ideas for this Plugin, so i want to check for duplicate posts, banned words, banned emailadresses and want to include a honeypot... any suggestions for more security?
This much sounds good to me Liath.. just that i'd make sure not to scan the entire DB, and scan only a particular user's listings to mark as spam/duplicate.
-
just that i'd make sure not to scan the entire DB, and scan only a particular user's listings to mark as spam/duplicate.
sure
So it is important to think what to do if a spam/stopword is found. Mark as spam, disable the ad, or whatever.
i think its the best to do the same... mark it as spam and disable it
-
One problem I see is that professional spammers (paid from India, Pakistan, Philippines etc.) are using more than one email to post the same post.
-
One problem i see is that prof spammers (paid from India, Pakistan, Philinines etc) are using more than one email to post the same post.
True, i've had one terrible case in which one guy just wouldn't give up. He was using a new email every time i blocked his previous one, and even when i blocked his IP, he was using proxies to access from new addresses.
I don't think such desperate cases can be blocked automatically. They can probably only be dealt with manually.. luckily such cases are not very common
-
This simple trick also helps (.htaccess) and maybe give Liath some ideas for the plugin
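# Protect from spam bots
<IfModule mod_rewrite.c>
RewriteEngine On
# Only POST requests to the WordPress comment handler
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp-comments-post\.php
# Referer is not this site, or the user agent is empty
RewriteCond %{HTTP_REFERER} !yourwebsite\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
# Send the request back to the client's own address
RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]
</IfModule>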
Replace “yourwebsite.com” with your blog url. The code only blocks SPAM BOTS and not humans who manually try to spam. Bots are the most annoying pests so preventing them from spamming is a good first step.
-
I know this from Spamassassin... but it's for WordPress; it redirects users/bots and, I think, prevents DDoS attacks on the login page
-
First screenshot of the options page... for now, all working well :)
-
hmmmm. Where is the spam or stopwords ?
-
will be integrated too
-
will be integrated too
oh. i feel better now. thanks
8)
-
all antispam mechanism working like a charm :)
-
all antispam mechanism working like a charm :)
Phew you're fast! The dashboard looks nice.. eagerly waiting to test.
Thanks Liath
IDEA - How about the duplication has two options : 1.) To check both title and description OR 2.) To check only the description
Also, if the number of characters that need to be tested can be inputted from the dashboard.
...not sure if you like this, but just suggesting
-
i can think about it... but for now i'm finished... check all files and will upload it tomorrow to the market
here the last screenshot, where you can check and handle spam files and the settings page again
-
i can think about it... but for now i'm finished... check all files and will upload it tomorrow to the market
here the last screenshot, where you can check and handle spam files and the settings page again
This is awesome. I hope it gets approved for the market soon.
-
I'm done... I have checked everything twice, added some translations for German support, made a help section and uploaded it to the market...
If all is right, you can download it there soon
-
I'm done... have checked all twice, add some translations for german support, made a help section and uploaded it to the market...
if all are right, you can download it there soon
Much appreciated, thanks
-
@liath
One suggestion for the Stopwords. Could you add an extra check for the PLURAL of stop words ?
For example if the stopword is EXAMPLE, EXAMPLES could be banned also (example #2 LOAN and LOANS). How about it ? Seems logical ? Not sure if this could possibly create some problems .....
That way we could keep the words down and everything faster.
Thanks
-
good idea, but for this i have to use another method to compare the words... i'm working on some other functions to do this job, to make it faster and easier...
at the moment I'm playing with the PHP function similar_text() (http://php.net/manual/en/function.similar-text.php) for this, so admins can set a percentage value that determines how similar the compared strings must be
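A minimal sketch of how a similarity threshold with similar_text() could work. The helper name and the threshold value are assumptions for illustration; the thread does not show how the plugin actually uses the function.

```php
<?php
// Hypothetical helper: true when $word is at least $minPercent similar to $stopword.
function is_similar_to_stopword(string $word, string $stopword, float $minPercent): bool
{
    $percent = 0.0;
    // similar_text() writes the similarity (0 to 100) into its third argument.
    similar_text(strtolower($word), strtolower($stopword), $percent);

    return $percent >= $minPercent;
}

// With a threshold of 85, a plural such as "LOANS" still matches the stopword "loan",
// while an unrelated word such as "boat" does not.
$pluralMatches    = is_similar_to_stopword('LOANS', 'loan', 85.0);
$unrelatedMatches = is_similar_to_stopword('boat', 'loan', 85.0);
```

The lower the threshold, the more near-misses are caught, at the price of more false positives, so the value would have to be tuned per site.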
-
Also: got your plugin from your blog and tried it.
It seems that it doesn't work well for stopwords. While i have (for example) the tinyurl.com to be filtered as spam, all ads with it (containing it) pass as normal.
-
humm... i tried different words and cases... all time it works fine for me :(
-
humm... i tried different words and cases... all time it works fine for me :(
Well, they don't. Trust me.
Here is my stop words.
-
i'm working on it :)
-
I have uploaded a new version with improved stopword search ... now I use strpos() to search for stopwords, this is not looking for the exact word, but whether the stopword occurs at any point in the text
if you think its ok, i will upload the modified version to the market too
-
I have uploaded a new version with improved stopword search ... now I use strpos() to search for stopwords, this is not looking for the exact word, but whether the stopword occurs at any point in the text
if you think its ok, i will upload the modified version to the market too
Now seems to work, while i tested only in my dev online osclass with a few ads i posted myself.
-
The same applies to me, I could not test it in the live operation, but good to hear that all working fine now
-
The same applies to me, I could not test it in the live operation, but good to hear that all working fine now
When and if approved in Market, i will be glad to test it in a real online environment.
-
I'd like to test too.. where can i download the current version?
Thanks
-
i'm sorry... if i'm right, it is not allowed to post my webpage here, but you could take a look on my other plugins
-
I'd like to test too.. where can i download the current version?
Thanks
Download one of Liath's available plugins from Market. And see the remarks from the index and readme files. The urls are in there.
-
Some minor Notices.
[05-Jan-2017 12:30:16 America/Los_Angeles] PHP Notice: Undefined index: sp_honeypot in /home/public_html/oc-content/plugins/spamprotection/classes/class.spamprotection.php on line 157
[05-Jan-2017 12:30:16 America/Los_Angeles] PHP Notice: Undefined index: sp_blocked in /home/public_html/oc-content/plugins/spamprotection/classes/class.spamprotection.php on line 159
[05-Jan-2017 12:30:16 America/Los_Angeles] PHP Notice: Undefined index: sp_mxr in /home/public_html/oc-content/plugins/spamprotection/classes/class.spamprotection.php on line 160
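Notices like these typically appear when an array key is read that was never set, for example a checkbox that was left unchecked and therefore not submitted. The sketch below shows one common way to guard such reads; the key names come from the log above, but the helper and the way the plugin reads its settings are assumptions.

```php
<?php
// Hypothetical helper: read an optional flag without triggering "Undefined index".
function read_flag(array $settings, string $key): bool
{
    // The ?? operator (PHP 7.0+) returns the fallback when the key is missing or null.
    return (bool) ($settings[$key] ?? false);
}

// Only sp_honeypot was submitted; sp_blocked and sp_mxr are absent.
$posted   = ['sp_honeypot' => '1'];
$honeypot = read_flag($posted, 'sp_honeypot');
$blocked  = read_flag($posted, 'sp_blocked');
$mxr      = read_flag($posted, 'sp_mxr');
```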
-
fixed
-
Some other suggestion: Since we are adding stop words all the time, could the list when saved to be SORTED ?
That way we could easily spot duplicates.
-
done
in the same way, blocked email addresses are sorted too
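A rough sketch of tidying such a list on save. It assumes the entries are stored one per line, which the thread does not confirm, and the helper name is made up.

```php
<?php
// Hypothetical helper: normalise, de-duplicate and sort a newline-separated list.
function tidy_list(string $raw): array
{
    // Split on any line break and drop empty lines.
    $entries = preg_split('/\R/', $raw, -1, PREG_SPLIT_NO_EMPTY);
    // Trim whitespace and lowercase so "Loan " and "loan" count as the same entry.
    $entries = array_map('strtolower', array_map('trim', $entries));
    // Remove entries that were only whitespace, then remove duplicates.
    $entries = array_unique(array_filter($entries, 'strlen'));
    // sort() also reindexes the array from 0.
    sort($entries, SORT_STRING);

    return $entries;
}

$saved = tidy_list("loan\nCasino\n loan \n\ntinyurl.com");
```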
-
8)
-
The same applies to me, I could not test it in the live operation, but good to hear that all working fine now
When and if approved in Market, i will be glad to test it in a real online environment.
Many thanks, Liath
I can tell you it works very well on a live website when it comes to the stopwords. Especially the part where the ads are not activated is very good!
The check for duplicate posts doesn't work well.
I see copies of ads not marked as spam.
I use it on only one, not so busy website, so maybe those spammers are just lucky :)
I will test it on a busier website soon
-
Thanks for this report :)
marking duplicates as spam has three conditions.
- it has to be from the same user
- it has to have exactly the same title as another ad
- only new or edited ads are checked for duplicates
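The conditions above can be sketched roughly as follows. This is not the plugin's code: the function names, the array layout and the whitespace/case normalisation are assumptions, and the optional description check reflects the suggestion made earlier in this topic.

```php
<?php
// Hypothetical normaliser so "Buy  Bike" and "buy bike" compare as equal.
function normalize_text(string $text): string
{
    return strtolower(trim(preg_replace('/\s+/', ' ', $text)));
}

// $existingAds must contain the ads of ONE user only (same email), which keeps the
// comparison cheap. Each ad is ['title' => ..., 'description' => ...].
function is_duplicate(array $newAd, array $existingAds, bool $checkDescription = false): bool
{
    foreach ($existingAds as $ad) {
        $sameTitle = normalize_text($ad['title']) === normalize_text($newAd['title']);
        $sameDescription = normalize_text($ad['description']) === normalize_text($newAd['description']);
        // Title match always counts; description match only when the option is on.
        if ($sameTitle || ($checkDescription && $sameDescription)) {
            return true;
        }
    }

    return false;
}
```

Restricting `$existingAds` to one user's listings means the loop runs over a handful of rows rather than the whole items table.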
-
Very Clear.
Most duplicated ads have the same content; they only change the title.
That's why I didn't notice them
No problem because they are stopped by the stopwords
Thanks again
-
Just a small contribution:
[1] MX record check is completely unreliable and should be removed IMHO. Why? Because, it can reliably only verify domain part, not the local part (user mailbox).
[2] Also, hidden input form is useless IMHO, on my test sites it stopped literally 0 bots, as they are smart enough today to know it's a bait. That module should be removed, too.
[3] Algorithm used for stopwords is completely inappropriate. It will return many false positives.
[4] Plugin uses deprecated functions not available in PHP 7.0 (or scheduled for removal): split() and eregi().
I have not yet had time to actually install it and test it, this is just by the look in the code.
Regards
-
[1] MX record check is completely unreliable and should be removed IMHO. Why? Because, it can reliably only verify domain part, not the local part (user mailbox).
You're right, it can only check the domain, but users often use fake mail addresses for spam. For people who run Osclass with unregistered users allowed to publish ads, it is a good addition I think, so why remove it?
[2] Also, hidden input form is useless IMHO, on my test sites it stopped literally 0 bots, as they are smart enough today to know it's a bait. That module should be removed, too.
I don't know how your honeypot is implemented, and I know most bots are smart enough for this, but if you can block just one bot, why remove it? It doesn't hurt anyone ;)
[3] Algorithm used for stopwords is completely inappropriate. It will return many false positives.
you have an example for this case?
[4] Plugin uses deprecated functions not available in PHP 7.0 (or scheduled for removal): split() and eregi().
You're right, I will change these functions to filter_var and explode...
edit:
instead of saying "this is bad, remove them", "this is useless, remove them", you could make improvement suggestions in the spirit of open source :)
thanks anyway for your contribution
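For reference, a hedged sketch of what the replacements for point [4] could look like. split() and eregi() belonged to the ereg extension, which was removed in PHP 7.0. How the plugin actually called them is not shown in this topic, so the inputs below are made up.

```php
<?php
// split(',', $list) is gone in PHP 7.0; explode() covers a literal separator.
$stopwords = array_map('trim', explode(',', 'loan, casino ,tinyurl.com'));

// preg_split() is the replacement when the separator really is a pattern.
$parts = preg_split('/[\s,]+/', "loan, casino\ntinyurl.com", -1, PREG_SPLIT_NO_EMPTY);

// eregi() matched case-insensitively; preg_match() with the i flag does the same.
$found = preg_match('/casino/i', 'Best CASINO bonus') === 1;

// For e-mail syntax checks, filter_var() avoids a hand-written pattern.
$validMail   = filter_var('user@example.com', FILTER_VALIDATE_EMAIL) !== false;
$invalidMail = filter_var('not-an-address', FILTER_VALIDATE_EMAIL) !== false;
```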
-
This is rather strange, I run with debug on and PHP 7.0.14 and I see no deprecation notices (I don't doubt they are deprecated, I'm just saying it is strange).
-
This is rather strange, i run debug on and PHP 7.0.14 and i see no depreciated notices (i don't doubt they are, i'm just saying it is strange).
dev101 is right... but i've changed it already to more future proof functions
and in addition, I've added new functions for stopwords... now you are able to choose between two different algorithms
- Fast Method (strpos()) - searches for substrings in Title/Description (could find "are" in "care")
- Slow Method (preg_match()) - searches for particular words in Title/Description
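The difference between the two methods can be sketched like this. The function names are hypothetical, and the case-insensitive stripos() is used here as an assumption; the plugin's real implementation is not shown in this topic.

```php
<?php
// Fast method: substring search, so the stopword "are" also matches inside "care".
function contains_substring(string $text, string $stopword): bool
{
    // Compare with !== false: a match at position 0 would otherwise read as "not found".
    return stripos($text, $stopword) !== false;
}

// Slow method: whole-word search using \b word boundaries.
function contains_word(string $text, string $stopword): bool
{
    // preg_quote() escapes characters such as the dot in "tinyurl.com".
    $pattern = '/\b' . preg_quote($stopword, '/') . '\b/i';

    return preg_match($pattern, $text) === 1;
}

$text          = 'Take care of it';
$fastHit       = contains_substring($text, 'are'); // matches inside "care"
$wholeWordHit  = contains_word($text, 'are');      // no standalone word "are"
```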
I will upload the new version tomorrow after some tests
edit:
for those who want to test the new functions too... I've updated it now
-
Hi Liath,
I am really not sure if you have actual experience with spam, but I have some real one and all above points are based on it. So, let's start with arguments of why:
[1] MX record check is completely unreliable and should be removed IMHO. Why? Because, it can reliably only verify domain part, not the local part (user mailbox).
You're right, it can only check the domain, but often users are using fake mail addresses for spam. For this people who running OSClass and unregistered users can publish ads, it is a good addition i think, so why remove it?
First of all, whoever runs a bot or a limited bot campaign these days will not use a fake mailbox. It is counter-productive. 99.9999999...999% of websites use mailbox validation, so their attempt will fail and they will never know for sure if they succeeded in the first step. This is why the MX check is useless (a waste of resources), because thisismyfakeemail@gmail.com will return true. Also, you should not validate emails with a regex of your own; Osclass has a lot of built-in helper functions to do the job for you. No need for duplicate code or custom solutions (if you do have a better solution for something, you can always contribute @ github, as you already know).
Again, if we lived in the era where bots do use fake domains, it would be fine, but sadly, not. Ok, leave it, does not do much harm, but just watch out how much fake emails it will catch. And in the case of fake email, that will be the least of your problems, because other 'fields' should not fail in catching the bot, ideally.
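To make the limitation concrete, here is a small sketch of an MX check. Only the part after the @ is ever looked up, so any invented mailbox on a real domain passes. The helper names are assumptions; the DNS lookup is injectable so the logic can be tried without network access, and by default it uses PHP's checkdnsrr().

```php
<?php
// Hypothetical helper: returns the lowercased domain part, or '' if there is none.
function mail_domain(string $email): string
{
    $at = strrpos($email, '@');

    return $at === false ? '' : strtolower(substr($email, $at + 1));
}

// $hasMx is an optional callable taking a domain and returning bool.
function domain_accepts_mail(string $email, $hasMx = null): bool
{
    if ($hasMx === null) {
        // Default: ask DNS whether the domain publishes an MX record.
        $hasMx = function (string $domain): bool {
            return checkdnsrr($domain, 'MX');
        };
    }
    $domain = mail_domain($email);

    // The local part (before the @) is never inspected.
    return $domain !== '' && $hasMx($domain);
}
```

With a stub resolver that only knows gmail.com, thisismyfakeemail@gmail.com is accepted while xyz@ddddd.com is rejected, which is exactly the trade-off argued about above.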
I don't know how your honeypot is implemented and i know, most Bots are smart enough for this, but when you can block just one bot, why should remove this? It does'nt hurt anyone ;)
Essentially, the same way (basic part). The bots that are stupid enough to be caught by a hidden form field are even less capable of passing the next stage that you have in the chain, trust me. If they can't pass that one, they won't pass a simple 2+2=? either. Not to mention the much (much) more advanced stuff in google's nocaptcha.
you have an example for this case?
It will catch partials and trigger the net for no reason. Just test it.
You're right, i will change this functions to filter_var and explode...
edit:
instead of saying "this is bad, remove them", this is useless, remove them" you could make improvement suggestions in the sense of open source :)
Oh, come on, be fair ;)
The suggestions are very easy to make (under 5-10 minutes), and I never said anything in that tone. And I would help as much as I'd like, but lately too much is going on for me, not work related. Also, 'open-source' (free) is great, but sadly, does not pay the bills and feed your family :)
Also, I deal with spam in a little different way, for one, always use google's product if you feel hopeless, it will prevent a lot more than simple solutions like this. Honestly.
thanks anyway for your contribution
You're welcome, glad to help.
Regards
-
Unfortunately this new version triggers some front-end firewall protection of my hosting provider when saving the options, so I can't use it / test it.
It returns 403 Forbidden.
-
If something must be deleted from the plugin, that is the HTACCESS editor option. That is dangerous to have in there, you never know....
-
Hello friends i have osclass classifieds sites. some user continues post similar ads so want to stop their recent posting.
my sites -
some one help. also some people post escort ads how to stop ?
Well, how to say this .... , install this plugin and filter out what you don't want.
I hope the plugin will be in Osclass market soon (well, soon is relative, i guess in 1-2 months).
-
this topic is not about spam killer plugin
i used spam killer and works ok
now i test the plugin from this topic and have better solutions for me
but this plugin is not in the market yet
you have to be a little patient
-
spam killer plugin not proper working. when we add some stop words & check posts then many other ads display so its not 100% perfect solutions. i am using this plugin on my sites but not happy. need some best solutions.
Yeah. Ok, so please open a new topic about it.
-
Oh, common, be fair
The suggestions are very easy to make (under 5-10 minutes), and I never said anything in that tone. And I would help as much as I'd like, but lately too much is going on for me, not work related. Also, 'open-source' (free) is great, but sadly, does not pay the bills and feed your family
I'm glad about your suggestions and really dont want to attack you, but i dont do this here to feed my family, for this i have my job :)
OSClass is open source and i want to give something back, what the community and the developer have done for us. Because of this i dont want to be paid for my work here, its just for fun for me and i think we can do much more together, but mostly the people want to be paid.
Because of the Honeypot:
Bots differ, they are not all the same: some detect honeypots that are hidden through "display: none", others use more specific methods. But as I said... even if it only stops one bot, it's a good addition...
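A minimal sketch of the server-side half of a honeypot. The field name "sp_website" is hypothetical, and the thread does not show how the plugin implements its own check.

```php
<?php
// The form contains an extra field that is hidden from humans via CSS.
// A real visitor leaves it empty; a naive bot fills in every field it finds.
function honeypot_triggered(array $post, string $field = 'sp_website'): bool
{
    return isset($post[$field]) && trim((string) $post[$field]) !== '';
}

$human = ['title' => 'Bike for sale', 'sp_website' => ''];
$bot   = ['title' => 'Cheap loans', 'sp_website' => 'http://spam.example'];

$humanCaught = honeypot_triggered($human);
$botCaught   = honeypot_triggered($bot);
```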
Because of the MX Record
Don't think only bots are spammers... there are enough script kiddies who want to abuse your site... not all of them use real mail addresses from real domains; sometimes they try to use addresses like xyz@ddddd.com, and these would be sorted out and marked as spam
Because the Stopwords
Yes, the function that was used, strpos(), searches for substrings, so e.g. "are" would be found in "care" and can trigger a false positive, but in that case you can check the ad and activate it manually
Summary:
Remember this is a free plugin, i will do my best to make it for all kind of user and sites, so there are options for everyone. if you dont like an option, disable it :)
-
If something must be deleted from the plugin, that is the HTACCESS editor option. That is dangerous to have in there, you never know....
Don't you think the warnings are enough? Maybe I should make it clearer not to edit this file if you don't know what you are doing?
Unfortunately this new version triggers some front end firewall protection of my hosting provider when saving the options and i can't use it / test it.It returns 403 forbiden.
Thats strange... i take a look on this
edit:
do you know what exactly is blocked by this firewall?
-
Unfortunately this new version triggers some front end firewall protection of my hosting provider when saving the options and i can't use it / test it.It returns 403 forbiden.
Thats strange... i take a look on this
edit:
did you know what exactly are blocked from this firewall?
Not exactly but when i select the first radio of the stopwords and save, i get 403. The second radio (the exact word) seem to allow me to save (i didn't know that when i 1st posted my problem).
I had that also a couple of times with Wordpress plugins. I couldn't save their changed options. Nothing i could do about it.
-
I had that also a couple of times with Wordpress plugins. I couldn't save their changed options. Nothing i could do about it.
really strange ... so i dont know if it is caused through my plugin ...
-
I had that also a couple of times with Wordpress plugins. I couldn't save their changed options. Nothing i could do about it.
really strange ... so i dont know if it be caused through my plugin ...
Not 100% sure it is caused BY your plugin, probably a side-effect from my hosting security. But it is happening for sure IN your plugin. Maybe the stop words trigger something ? I have no idea.
Or your url during saving/edit cause this ?
oc-admin/index.php?page=plugins&action=renderplugin&file=spamprotection/admin/config.php?tab=settings
-
but in every case the same url is used when you save the settings... and you can save if you choose to search for particular words.
Saving the settings doesn't use the search algorithm, so I think this cannot be the cause
edit:
i've changed the url a little bit and uploaded new package, maybe now it works fine
at first the parameter tab was separated by ?, but & is correct because there were already other parameters before it
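What the wrong separator does to the query string can be shown with parse_str(). The two strings below are the URL quoted earlier in this topic and its corrected form; the variable names are only for illustration.

```php
<?php
$broken = 'page=plugins&action=renderplugin&file=spamprotection/admin/config.php?tab=settings';
$fixed  = 'page=plugins&action=renderplugin&file=spamprotection/admin/config.php&tab=settings';

parse_str($broken, $brokenParams);
parse_str($fixed, $fixedParams);

// In $brokenParams there is no 'tab' key: "?tab=settings" is glued onto the 'file' value.
// In $fixedParams, 'file' and 'tab' are separate parameters.

// http_build_query() inserts the & separators itself and percent-encodes the values.
$query = http_build_query([
    'page'   => 'plugins',
    'action' => 'renderplugin',
    'file'   => 'spamprotection/admin/config.php',
    'tab'    => 'settings',
]);
```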
-
Hi guys sorry for the delay, have been away.
I just tested the plugin on a test (but complete) environment .. and it seems the plugin also marks listings that have different descriptions as spam.
Suggestions -
1.) It seems it checking the titles, from the same email. I really think there can be an option to check title+description too.
2.) When an item is marked spam, currently the standard flash message displays "your listing has been published" . Maybe there can be a custom flash message, which can be modified from the dashboard? (if this is hard, just a message that says "listing marked as spam")
Nonetheless, this plugin seems to be going in a nice direction. Great job Liath.
-
2.) When an item is marked spam, currently the standard flash message displays "your listing has been published" . Maybe there can be a custom flash message, which can be modified from the dashboard? (if this is hard, just a message that says "listing marked as spam")
Nonetheless, this plugin seems to be going in a nice direction. Great job Liath.
Why should you help the spammer by letting him know that the ad is not published?
-
and it seems the plugin is marking spam to listing that have different descriptions too.
Are they marked as duplicates or for another reason? Currently the plugin marks an ad as a duplicate if it is from the same user and has the same title. For now I don't have any falsely marked ads, but I'll keep watching
It seems it checking the titles, from the same email. I really think there can be an option to check title+description too.
I can make it optional, to check descriptions too, but it will need much more time for (i think) less success
When an item is marked spam, currently the standard flash message displays "your listing has been published" . Maybe there can be a custom flash message, which can be modified from the dashboard? (if this is hard, just a message that says "listing marked as spam")
About this point I'm still thinking. We shouldn't inform the user that his ad is marked as spam
- he would be afraid, he did something wrong (if he is a real user)
- we would inform spammer, that we have some protection installed and he would do a better job next time
Nonetheless, this plugin seems to be going in a nice direction. Great job Liath.
Thank you :)
-
About this point i'm still thinking. We should'nt inform the user about that, that his ad is marked as spam
Hmm, yes that makes sense.
I see that users are sent emails for spam posts too. You think you could stop that?
Also, for the description, some users change the title like "buy xxx in london" and "buy xxx in liverpool" ... but with the same description. I think checking the description can be a good addition (even better if that can be made an option on the dashboard)
-
Unfortunately this new version triggers some front end firewall protection of my hosting provider when saving the options and i can't use it / test it.It returns 403 forbiden.
I've changed the settings page from radio buttons to select boxes... maybe this will help with this error.
Also, for the description, some users change the title like "buy xxx in london" and "buy xxx in liverpool" ... but with the same description. I think checking the description can be a good addition (even better if that can be made an option on the dashboard)
In the new version, this is now optional
-
Unfortunately this new version triggers some front end firewall protection of my hosting provider when saving the options and i can't use it / test it.It returns 403 forbiden.
I've changed the settings page from radio buttons to select boxex... maybe this will help for this error.
Well ok, thanks but this is not a problem of your plugin but rather my hosting security (over-reacting i guess).
-
don't worry about that :) it could have been caused by my plugin, but good to know that it isn't
-
I see that if an ad is marked as SPAM it is also DE-Activated. Why ?
Now i need two steps to activate and unmark as spam.
-
If the ad is only marked as spam, the ad is still visible I think.
Maybe an option that gives you the choice to deactivate or mark as spam would be a solution. But I don't need that.
This way the ad is not visible.
The plugin saves me a lot of time by blocking hundreds of ads every day.
Only sometimes an ad passes, but more than 90 percent are blocked
-
If the ad is only marked as spam the ad is still visible i think.
Maybe a option to give u the choice to deactivate or mark as spam maybe a solution. But i dont need that.
This way the ad is not visible.
The ad saves me a lot om time by blocking hundreds of ads every day.
Only sometimes a ad pass but more then 90 procents are blocked
No, flagged as spam are not visible.
-
So, if somebody marks an ad as spam on the website, it is not visible from that moment?
I didn't know that. But still no problem with the plugin.
Do you use it on live site at the moment?
-
So, if somebody mark a ad as spam on the website it is not visible from that moment?
I didnt know that. But still no problem whit the plugin.
Do you use it on live site at the moment?
Mark as spam of a LISTING from a user from the dropdown of the ad page DOESN'T disable the ad.
Mark as spam within the admin listing also hide the ad.
This is how Osclass works, trust me.
-
of course I believe you, I just didn't know that
-
If you are using the built-in spam check, there is an option to unmark and activate at once
-
Thanks Liath, didn't know that. Ok, seems faster than two clicks.
-
welcome :)
-
BLOCK user within the plugins option, doesn't work. Doesn't Block the user.
Also in multi-language site, the Check Spam page is not showing well. No ad body and other visual problems.
-
i will check this
-
i will check this
To help you more, here is a capture of a two languages site that i admin:
-
i will check this
To help you more, here is a capture of a two languages site that i admin:
The user IS registered (and not blocked). Not sure why it says Not Registered.
-
Does this happen to all ads or only this single one? It looks like the check page didn't receive the ad id and cannot read the information about user and ad.
Can you please check if the itemid is submitted to the check page?
-
Is this happened to all or this single ad? It looks like the check page didnt become the ad id and cannot read informations about user and ad.
Can you check please, if the itemid is submitted to the check page?
This happens only to ONE multi-language site. Single language sites work great.
ALL flagged spam ads, show the same.
The url is:
oc-admin/index.php?page=plugins&action=renderplugin&file=spamprotection/admin/check.php&itemid=4464
-
Also:
PHP Warning: Invalid argument supplied for foreach() in /home/public_html/oc-content/plugins/spamprotection/admin/check.php on line 117
-
PHP Warning: Invalid argument supplied for foreach() in /home/public_html/oc-content/plugins/spamprotection/admin/check.php on line 117
This error is caused by the missing item id, which the check page did not receive. So we have to find out why the id is missing, and for now I have no answer for this.
$id = Params::getParam('itemid');
didn't get any result (rather unlikely)
$item = $sp->_getRow('t_sp_items', array('key' => 'fk_i_item_id', 'value' => $id), 'pk_i_id', 'DESC');
didn't get any data about the ad because $id is missing
$item_spam = $sp->_getResult('t_item_description', array('key' => 'fk_i_item_id', 'value' => $item['fk_i_item_id']));
didn't get any result because $item['fk_i_item_id'] is missing
You can check whether the id is submitted:
in spamprotection/admin/check.php add
echo 'SUBMITTED ID: '.$id;
before line 10:
$item = $sp->_getRow('YOUR_PREFIX_t_spam_protection_items', array('key' => 'fk_i_item_id', 'value' => $id), 'pk_i_id', 'DESC');
After refreshing the check page you should see the submitted id.
Or you could try to hardcode the table name in spamprotection/admin/check.php, line 10:
$item = $sp->_getRow('YOUR_PREFIX_t_spam_protection_items', array('key' => 'fk_i_item_id', 'value' => $id), 'pk_i_id', 'DESC');
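Whatever the reason for the missing row turns out to be, the foreach() warning itself can be avoided by checking the query result before iterating over it. Below is a minimal sketch of that guard. The helper name sp_rows_or_empty() is hypothetical and not part of the plugin, and the "not found" values are simulated, since it is an assumption here what _getRow() and _getResult() return when nothing matches.

```php
<?php
// Hypothetical helper (not part of the plugin): make any query result
// safe to pass to foreach.
function sp_rows_or_empty($result)
{
    // Anything that is not an array (false, null, '') becomes an empty list.
    return is_array($result) ? $result : array();
}

// Simulated "nothing found" results. What the plugin's query helpers
// really return in that case is an assumption.
$item      = array();
$item_spam = false;

// Read the key defensively instead of assuming it exists.
$itemId = isset($item['fk_i_item_id']) ? (int) $item['fk_i_item_id'] : 0;

foreach (sp_rows_or_empty($item_spam) as $row) {
    // Not reached in this example: there is nothing to iterate over.
    echo $row['s_title'], PHP_EOL;
}

if ($itemId === 0) {
    echo 'No spam protection entry found for this item.', PHP_EOL;
}
```

This only hides the symptom (the warning and the undefined index notices); the missing table entry still has to be explained separately.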
-
1. SUBMITTED ID: 4477 (with the echo command)
2. This
$item = $sp->_getRow('YOUR_PREFIX_t_spam_protection_items', array('key' => 'fk_i_item_id', 'value' => $id), 'pk_i_id', 'DESC');
doesn't seem to show anything, or I didn't get what I have to do.
-
sorry, I forgot... you have to replace YOUR_PREFIX with your database prefix
edit: I have uploaded a second version with some debugging features. It writes some info into your debug.log after the check page is called. If you want, you can try it... the info should look like this:
Reading table: `MY_PREFIX_t_spam_protection_items`
Reading ARRAY $where:
Array
(
[key] => fk_i_item_id
[value] => 29
)
SHOWING RESULT
Array
(
[pk_i_id] => 33
[fk_i_item_id] => 29
[fk_i_user_id] => 1
[s_reason] => Bad/Stopword found. Please check this ad manually
[s_user_mail] => mymail@domain.tld
[dt_date] => 2017-01-24 20:10:10
)
-
Hello Liath,
Please note that you have some typos in the plugin. See screenshot.
It's E-Mail addresses/E-Mail address
Also, the correct wording for Summary blocked ads, is Blocked ads summary.
.htaccess Editor (correction needed here also): Beware of editing this file, unless you know what you're doing!!!
And as a suggestion, for the e-mail blocking, you should add another field there: Blocked email domains.
This would help us deal with mail domains that are known for spam, like mail.ru, bigmir.net etc.
Also, at the moment the Check Spam feature is only available for Administrators.
Given that a Moderator has access to the listings moderation, I think he should have access to the Spam Checking also.
Can this be done?
Regards.
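To make the "Blocked email domains" suggestion concrete, here is a small sketch of how such a check could look. It is only an illustration under assumptions: the function name sp_is_blocked_domain() and the way the list is passed in are hypothetical and have nothing to do with how the plugin stores its settings.

```php
<?php
// Hypothetical helper: true when the address belongs to a blocked domain
// (or to a subdomain of one). The comparison is case-insensitive.
function sp_is_blocked_domain($email, array $blockedDomains)
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return false; // not an e-mail address; leave that to other validation
    }
    $domain = strtolower(substr($email, $at + 1));

    foreach ($blockedDomains as $blocked) {
        $blocked = strtolower(trim($blocked));
        if ($blocked === '') {
            continue; // ignore empty lines in the list
        }
        $suffix = '.' . $blocked;
        // Match the domain itself or anything ending in ".<blocked domain>".
        if ($domain === $blocked || substr($domain, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

$blocked = array('mail.ru', 'bigmir.net');
var_dump(sp_is_blocked_domain('someone@MAIL.ru', $blocked));
var_dump(sp_is_blocked_domain('someone@example.com', $blocked));
```

Matching on the suffix with a leading dot means notmail.ru is not caught by a mail.ru entry, while sub.mail.ru is.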
-
Hello Liath,
Please note that you have some typos in the plugin. See screenshot.
It's E-Mail addresses/E-Mail address
Also, the correct wording for Summary blocked ads, is Blocked ads summary.
.htaccess Editor (correction needed here also): Beware of editing this file, unless you know what you're doing!!!
thanks, i will correct this
And as a suggestion, for the e-mail blocking, you should add another field there: Blocked email domains.
This would help us deal with mail domains that are known for spam, like mail.ru, bigmir.net etc.
i can look for this feature for the next update
Also, at the moment the Check Spam feature is only available for Administrators.
Given that a Moderator has access to the listings moderation, I think he should have access to the Spam Checking also.
Can this be done?
it isn't restricted by this plugin; maybe there is a way to make the hook available to moderators, I have to look into this
-
@liath here are the results:
Reading table: `oc_t_spam_protection_items`
Reading ARRAY $where:
Array
(
[key] => fk_i_item_id
[value] => 4477
)
SHOWING RESULT
Array
(
)
PHP Notice: Undefined index: fk_i_item_id in /home/public_html/oc-content/plugins/spamprotection/admin/check.php on line 11
PHP Notice: Undefined index: fk_i_user_id in /home/public_html/oc-content/plugins/spamprotection/admin/check.php on line 12
PHP Notice: Undefined index: s_user_mail in /home/public_html/oc-content/plugins/spamprotection/admin/check.php on line 13
PHP Notice: Undefined index: s_user_mail in /home/public_html/oc-content/plugins/spamprotection/admin/check.php on line 95
PHP Warning: Invalid argument supplied for foreach() in /home/public_html/oc-content/plugins/spamprotection/admin/check.php on line 117
-
strange...
it seems the entry for this item id was not written to oc_t_spam_protection_items or no longer exists. Maybe you handled this item before, or it was deleted?
-
strange...
it seems the entry for this item id was not written to oc_t_spam_protection_items or no longer exists. Maybe you handled this item before, or it was deleted?
Of course it exists and it is not deleted. I delete nothing. Flagged spam is deleted every 30 days by the Butler plugin.
I handled nothing, touched nothing. Nothing strange or modified. A very stable Osclass installation. It must be something about the MULTI-LANGUAGE setup, since this is the only difference from the other sites I run. Why don't you add a second language and try?
-
in my installation I'm using three languages... bangla, english and german... all works fine here... so for the moment I don't know what causes this error :-\
-
in my installation I'm using three languages... bangla, english and german... all works fine here... so for the moment I don't know what causes this error :-\
I wish I could help more. Which of your languages is the default? Maybe bangla is set as the default?
-
in my installation I'm using three languages... bangla, english and german... all works fine here... so for the moment I don't know what causes this error :-\
I wish I could help more. Which of your languages is the default? Maybe bangla is set as the default?
default is bangla
Some questions:
- Is this a real spam ad or a test ad from you?
- Do you know why the ad was marked as spam?
- Can you take a look into the database to see whether the entry for this item id exists in table oc_t_spam_protection_items?
- If this doesn't help, can you try another debugging version with more information later?
Maybe it wasn't written to the database for a reason we haven't thought about, and I am looking for the error in the wrong place.
-
Some questions:
- Is this a real spam ad or a test ad from you?
- Do you know why the ad was marked as spam?
- Can you take a look into the database to see whether the entry for this item id exists in table oc_t_spam_protection_items?
- If this doesn't help, can you try another debugging version with more information later?
Maybe it wasn't written to the database for a reason we haven't thought about, and I am looking for the error in the wrong place.
Answers:
1. Yes, real spam, 3 of them. Just posted one more myself and same problem.
2. Yes. Words like "loan" are flagged.
3. The table is EMPTY
4. I can try whatever you want but not today (it is 2 in the morning in Europe).
-
@Liath do not do anything more on this. Tomorrow I will completely erase your plugin and its tables and try from scratch.
-
I have already finished a modified debugging version that gives me deeper insight into which function isn't working. I will upload it tomorrow, it's nearly 01:00 am here too
but if your table is empty, that is a good starting point: we should look at why it is empty... maybe because of a different configuration the entry isn't written to your database
thank you for your help :)
-
I have already finished a modified debugging version that gives me deeper insight into which function isn't working. I will upload it tomorrow, it's nearly 01:00 am here too
but if your table is empty, that is a good starting point: we should look at why it is empty... maybe because of a different configuration the entry isn't written to your database
thank you for your help :)
Yep. I remember a 403 error (from mod_security) when saving the stopwords, and maybe that broke something. I'd hate to make you chase ghosts, so I will sort this out tomorrow.
I have 5 Osclass sites and your plugin works great on 4 of them, so maybe something is wrong on my side.
-
Yep. I remember a 403 error (from mod_security) when saving the stopwords, and maybe that broke something. I'd hate to make you chase ghosts, so I will sort this out tomorrow.
I have 5 Osclass sites and your plugin works great on 4 of them, so maybe something is wrong on my side.
don't worry about that, if there is an error in my plugin, it's in my interest to solve it :)
if not, maybe i can help you to solve errors on your side ;D
-
Good afternoon! I can not find a link to download?
-
don't worry about that, if there is an error in my plugin, it's in my interest to solve it :)
if not, maybe i can help you to solve errors on your side ;D
Today i totally uninstalled your plugin (cleaning all the tables also) and i reinstalled from scratch. And guess what: it works well.
Deeply apologize for the trouble.
-
Good afternoon! I can not find a link to download?
Since it is not allowed to post any links here anymore (sigh...), go to the Osclass market, download one of Liath's plugins and look at Index.php.
-
don't worry about that, if there is an error in my plugin, it's in my interest to solve it :)
if not, maybe i can help you to solve errors on your side ;D
Today i totally uninstalled your plugin (cleaning all the tables also) and i reinstalled from scratch. And guess what: it works well.
Deeply apologize for the trouble.
Sounds really good and don't worry about the trouble :)
-
Sometimes a perfectly OK ad gets flagged as spam, probably because of some substring.
It would be of great help if the manual Check Spam option could also highlight the string which triggers the spam filter, so we know which stopwords to modify.
Thanks !
-
I've thought about this a few times and will change it in the next version
-
Hello Liath,
I'm not sure why, but sometimes a completely duplicate ad is not marked as spam. And sometimes it does catch the duplicate post.
-
Does it trigger for title or description?
To avoid comparing the whole descriptions every time, I only convert the description to md5 and compare the hashes; maybe there is some small difference in the text
-
@Liath Currently my settings are just for checking the title.
-
Today I have some exactly identical spam posts too: same content, same email, and yet they are not marked as spam. However, when I test by posting duplicates myself, the spam does get caught.
I'm not sure what could possibly be causing this.. any ideas?
Edit - After I manually marked a couple of items as spam, when I click on "Check Spam" on the manage listings page I see "user is not registered" (although that user actually is registered), the number of ads is blank, and the ad date is blank too. On the right-hand side I see ---
Warning: Invalid argument supplied for foreach() in /serverpath/public/oc-content/plugins/spamprotection/admin/check.php on line 117
-
Edit - After I manually marked a couple of items as spam, when I click on "Check Spam" on the manage listings page I see "user is not registered" (although that user actually is registered), the number of ads is blank, and the ad date is blank too. On the right-hand side I see ---
Warning: Invalid argument supplied for foreach() in /serverpath/public/oc-content/plugins/spamprotection/admin/check.php on line 117
True. If you MANUALLY just mark as spam, then you get the above.
-
True. If you MANUALLY just mark as spam, then you get the above.
I see.
I have disabled the plugin on my website. It is also randomly marking some posts as spam, even completely unique ads. I'm not sure what is wrong, but at the moment I can't use it.
@Liath.. i know you've put in a lot of hard work here, and i hope you can make this better.
-
I see.
I have disabled the plugin on my website. It is also randomly marking some posts as spam, even completely unique ads. I'm not sure what is wrong, but at the moment I can't use it.
@Liath.. i know you've put in a lot of hard work here, and i hope you can make this better.
Works great for me. And sure, sometimes it marks or misses some, but this is how it works. My previous spam filter also did that.
SOME admin check is needed always.
-
It is meant to be only a helper and was not intended as an automatic spam-free solution. For now I don't know why ads are sometimes marked wrongly, but I'm still working on it and trying to make it better...
When you mark ads as spam manually, the error occurs because the table entry that is written for automatically marked ads is missing.
-
It is meant to be only a helper and was not intended as an automatic spam-free solution. For now I don't know why ads are sometimes marked wrongly, but I'm still working on it and trying to make it better...
Ads wrongly marked as spam: this could be solved/explained if Check Spam showed the reason the ad was marked.
-
Sometimes "some" duplicates are wanted (for example on a multi-country site). So maybe an option could be added to allow a certain number of duplicates.
-
Today I also had the above problem. Spam/fraud posts were not blocked.
The word in the ad was "Stroller", and the stopword was "stroller" (lowercase). Substring method selected.
So the post contained "Stroller" and was not blocked.
Another example of an ad that was NOT blocked is one with "LOAN" (all capitals). The stopword was "loan" (lowercase), and the ad was not blocked.
Maybe upper/lower case has something to do with it?
Apparently this is a major and serious bug.
Thanks
-
@Liath If you start working on the next version of this plugin (looking forward to it ;D), please don't forget about the previous issues/suggestions:
First (http://forums.osclass.org/general-help/duplicate-posts/msg146807/#msg146807), Second (http://forums.osclass.org/general-help/duplicate-posts/msg146927/#msg146927), Third (http://forums.osclass.org/general-help/duplicate-posts/msg147518/#msg147518).
Thanks!
-
Yes, in my next version I will change all of this... but it needs some time, at the moment I'm very busy with my other projects
-
New Version: 1.3.4
- Added found string for stopwords in check page
- Improved the search for stopwords and duplicates (compare always in lowercase now)
- Changed some translations
Sometimes "some" duplicates are wanted (for example on a multi-country site). So maybe an option could be added to allow a certain number of duplicates.
Maybe i will add this feature in the future
-
New Version: 1.3.4
- Added found string for stopwords in check page
- Improved the search for duplicates (compare always in lowercase now)
- Changed some translations
Thanks.
You may also want to update the version in index.php, since it shows Version: 1.3.2.
Also you should offer this plugin Via Osclass market.
-
Thanks.
You may also want to update the version in index.php, since it shows Version: 1.3.2.
Also you should offer this plugin Via Osclass market.
damn... i'll change this ::)
I uploaded it yesterday... last time I had used the wrong language domains, so it wasn't published in the OSClass Market, but now I'm sure it will be reviewed successfully and will be available soon
-
Thanks Liath, i'm gonna try this on the live website and give my feedback.
Btw, could you tell what you mean by this -
Improved the search for duplicates (compare always in lowercase now)
Do you mean you first convert the text entered for blocking in the admin panel to lowercase, and then compare it with what the user has posted?
-
Yes,
In order to avoid mistakes, now all strings will be changed to lowercase before comparing. I hope this improves the search algorithm.
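To illustrate what "lowercase before comparing" means for the stopword check, here is a minimal sketch. It is not the plugin's code: sp_find_stopword() is a hypothetical name, and returning the matched stopword is just one way to show the reason on the check page.

```php
<?php
// Hypothetical helper: returns the first stopword found in the text
// (in lowercase), or null when none matches.
function sp_find_stopword($text, array $stopwords)
{
    // Use the multibyte version when the mbstring extension is available,
    // so that non-ASCII letters are lowercased too.
    $lower = function ($s) {
        return function_exists('mb_strtolower')
            ? mb_strtolower($s, 'UTF-8')
            : strtolower($s);
    };

    $haystack = $lower($text);
    foreach ($stopwords as $word) {
        $needle = $lower(trim($word));
        // Both sides are lowercase now, so "LOAN" and "loan" are the same.
        if ($needle !== '' && strpos($haystack, $needle) !== false) {
            return $needle;
        }
    }
    return null;
}

var_dump(sp_find_stopword('Cheap LOAN offer', array('loan')));
var_dump(sp_find_stopword('Nice bike for sale', array('loan')));
```

Note that the stopwords are lowercased as well, so it does not matter how they were typed in the admin panel.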
-
New Version: 1.3.4
- Added found string for stopwords in check page
- Improved the search for duplicates (compare always in lowercase now)
- Changed some translations
Sometimes "some" duplicates are wanted (for example on a multi-country site). So maybe an option could be added to allow a certain number of duplicates.
Maybe i will add this feature in the future
Thank you Liath!
Amazing work as always!
-
Improved the search for duplicates (compare always in lowercase now)
I would appreciate feedback on this feature: whether ads are still marked incorrectly or not.
Thank you all for your help
-
Improved the search for duplicates (compare always in lowercase now)
I would appreciate feedback on this feature: whether ads are still marked incorrectly or not.
Thank you all for your help
For me, ads were wrongly marked (or not marked) NOT by the duplicate function but by the spam stop-words.
Has that changed ?
-
Improved the search for duplicates (compare always in lowercase now)
I would appreciate feedback on this feature: whether ads are still marked incorrectly or not.
Thank you all for your help
It works for me.
For example, if I specify apple as a stopword and post an ad using !ApPlE, it gets marked as spam. :)
Also, thanks a lot for the Block banned E-Mail TLD feature, this works great too.
The only issue i'm having, is with Toggle HTML/Code button which doesn't really look like a button. Very hard to notice.
Also, clicking on the item ID in the Blocked ads summary, doesn't do anything.
Other than that, still a few small typos here and there, like: Block banned E-Mailadresses , wo'nt
Cheers!
-
Are we talking about the same plugin, i wonder.
What do upper/lower-case duplicates have to do with stop words?
Am i missing something ?
-
For me, ads were wrongly marked (or not marked) NOT by the duplicate function but by the spam stop-words.
Has that changed ?
Yes... i've changed it for stopwords and search for duplicates... i think i forgot this in the update notification xD
-
The only issue i'm having, is with Toggle HTML/Code button which doesn't really look like a button. Very hard to notice.
Please clear your browser cache, the button looks like my screenshot
Also, clicking on the item ID in the Blocked ads summary, doesn't do anything.
I noticed it... normally it should redirect to the edit page... have to fix it
Other than that, still a few small typos here and there, like: Block banned E-Mailadresses , wo'nt
I can't speak/write perfect english... when you find some typos, pls tell me the exact place to change it :)
-
Hello,
I've been running the latest version 1.3.4 for a couple of days. Some genuine posts still get caught as spam for some reason. And same with some spams not getting caught. However, the percentage is fairly low.
The keyword blocking works quite good.
Though it's not perfect, but it does reduce manual work a bit.
Thanks.
-
Hello,
I've been running the latest version 1.3.4 for a couple of days. Some genuine posts still get caught as spam for some reason. And same with some spams not getting caught. However, the percentage is fairly low.
The keyword blocking works quite good.
Though it's not perfect, but it does reduce manual work a bit.
Thanks.
The REASON is now shown so you can see what is wrong (or not).
See:
-
it would be nice to know which cases are (not) marked wrong...
-
Here are some language corrections:
Plugin description:
Spam Protection for OSClass. Checks new ads for duplicates and banned email-addresses or stopwords, inclusive honeypot and more.
Correction: Spam Protection for Osclass. Checks new ads for duplicates, banned e-mail addresses and stopwords. Includes a honeypot and many other features.
Settings area:
This Option activates the whole Spam Protection. Some features are optional and can be de/activated separetely
Correction: This option activates the whole Spam Protection. Some features are optional and can be de/activated separately.
This Option enables the System to check the MX Record of the sbmitted Mail address
Correction: This option enables the System to check the MX Record of the submitted Email address.
This Option enables the System to block ads from banned E-Mailaddresses
Correction: This option enables the System to block ads from banned Email addresses.
This method searches in title/description for substrings and mark ads as spam if found (e.g. are will be found in care)
Correction: This method searches in Title/Description for substrings and if found, marks ads as spam (e.g. 'are' will be found in 'care').
This method searches in Title/Description for particular words and mark ads as spam if found (e.g. are wo'nt be found in care)
Correction: This method searches in Title/Description for particular words and if found, marks ads as spam (e.g. 'are' won't be found in 'care').
HELP area:
This plugin gives you the option to block spam on different ways
Correction: This plugin gives you different options to stop spam.
You can configure this plugin on two ways.
Correction: You can configure this plugin in two ways.
Hope it helps. :)
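For anyone wondering what the difference between the two stopword methods in those help texts looks like in code, here is a rough sketch. The function names are hypothetical and this is not taken from the plugin; it only shows the 'are' / 'care' example from the settings text.

```php
<?php
// Substring method: the stopword may appear anywhere, even inside another word.
function sp_match_substring($text, $word)
{
    return stripos($text, $word) !== false; // stripos() ignores case
}

// Word method: the stopword must stand on its own.
function sp_match_word($text, $word)
{
    // \b = word boundary, i = case-insensitive,
    // u = treat pattern and subject as UTF-8.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/iu';
    return preg_match($pattern, $text) === 1;
}

var_dump(sp_match_substring('Take care', 'are')); // 'are' is found inside 'care'
var_dump(sp_match_word('Take care', 'are'));      // 'are' is not a separate word here
var_dump(sp_match_word('We are here', 'ARE'));    // separate word, case ignored
```

The substring method catches more (including obfuscated variants), but it is also the one that produces false positives on perfectly fine ads.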
-
Here are my 2 cents:
Plugin spam check:
save the md5 of title and description into a plugin table
function check_spam($item) {
    if (md5(title slug) exists in database) {
        if (md5(description) exists) {
            mark as spam
        }
    }
}
osc_add_hook('posted_item', 'check_spam');
Not a fancy plugin, but it will check for exact title/description duplicates faster than a text search or string comparison.
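The idea above can be tried out without Osclass at all. The sketch below is only an illustration: SpHashStore is a hypothetical in-memory stand-in for the plugin table, sp_check_duplicate() is a made-up name, and unlike the pseudocode it stores title and description hashes as a pair, which is an assumption about what "duplicate" should mean.

```php
<?php
// Hypothetical in-memory stand-in for the plugin table of stored hashes.
class SpHashStore
{
    private $hashes = array();

    public function has($titleHash, $descHash)
    {
        return isset($this->hashes[$titleHash . ':' . $descHash]);
    }

    public function add($titleHash, $descHash)
    {
        $this->hashes[$titleHash . ':' . $descHash] = true;
    }
}

// Returns true when an ad with exactly the same title AND description
// has been seen before; otherwise remembers the ad and returns false.
function sp_check_duplicate(SpHashStore $store, $title, $description)
{
    $titleHash = md5($title);
    $descHash  = md5($description);

    if ($store->has($titleHash, $descHash)) {
        return true; // the caller would mark the item as spam here
    }
    $store->add($titleHash, $descHash);
    return false;
}

// In Osclass this would be called from a function registered with
// osc_add_hook('posted_item', ...), as in the post above.
$store = new SpHashStore();
var_dump(sp_check_duplicate($store, 'Red bike', 'Almost new, call me'));
var_dump(sp_check_duplicate($store, 'Red bike', 'Almost new, call me'));
```

A lookup like this stays fast however long the descriptions are, but it only catches exact copies.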
-
thank you both...
@TangoX
i will change it with new update
@Web-Media
good idea, will think about a good implementation for this
-
I have a huge problem with people registering and then using the Contact publisher form to spam the ad posters.
Could the plugin solve this also? Could those contact emails be filtered? Could COMMENTS be filtered also?
Thanks
-
The REASON is now shown so you can see what is wrong (or not).
Thanks, I will share the reason the next time I see a wrongly marked post.
-
it would be nice to know which cases are (not) marked wrong...
Hi Liath, this morning I have 3 posts that are exactly the same and are not marked as spam. Tell me how you would like me to help you debug this.
Regards,
-
it would be nice to know which cases are (not) marked wrong...
Hi Liath, this morning I have 3 posts that are exactly the same and are not marked as spam. Tell me how you would like me to help you debug this.
Regards,
All from the same user ?
-
All from the same user ?
Exactly. Same everything including email.
-
Tell me how you would like me to help you debug this.
I will prepare a debug version for this and let you know here when you can download it... but please be patient, I'm very busy at the moment :-\
-
Liath, in your next version could you please add another button next to Block User, for BAN User!, that uses the standard Osclass Ban rules (http://localhost/oc-admin/index.php?page=users&action=ban) and completely bans the user IP & Email, using SPAM as a reason?
I'm asking this because on my website, everyone can post an ad and many users can't be blocked as they don't have an account.
Also, please don't forget about the issue with the clicking on the item ID in the Blocked ads summary, that doesn't do anything and about the language corrections.
Thanks!
-
ok... i'll check this :)
-
Well, instead of a hash (it is very sensitive, and I don't think it is a reliable method for duplicate detection: one extra space or line break and it will fail), you can switch to a similarity % calculation and set it to something like a 99% threshold (or a custom one).
$text1 = 'abcdefg';
$text2 = 'bcdefsgjk';
$similarity = similar_text($text1, $text2, $percent);
if ($percent >= 99) {
// block code
}
This problem is best dealt with neural networks and AI, that can be trained over time to learn what is similar and what is not, but that may be well beyond our knowledge and scope for now (like facebook does with deeptext) :)
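A small sketch of the point about hashes being sensitive: two ads that differ only in whitespace get different md5 values, while normalising the text first (or using similar_text()) treats them as the same. The helper names sp_normalize() and sp_similarity_percent() are hypothetical, and strtolower() only handles ASCII letters, so this is an illustration rather than a finished solution.

```php
<?php
// Collapse every run of whitespace (spaces, tabs, line breaks) into one
// space, trim the ends and lowercase the result.
function sp_normalize($text)
{
    return strtolower(trim(preg_replace('/\s+/', ' ', $text)));
}

// Similarity in percent between two texts, computed on the normalised form.
function sp_similarity_percent($a, $b)
{
    similar_text(sp_normalize($a), sp_normalize($b), $percent);
    return $percent;
}

$a = "Red bike for sale,\ncall me";
$b = "Red  bike for sale, call me ";

// The raw hashes differ because of the line break and the extra spaces.
var_dump(md5($a) === md5($b));
// After normalising, both texts are identical and so are their hashes.
var_dump(md5(sp_normalize($a)) === md5(sp_normalize($b)));

if (sp_similarity_percent($a, $b) >= 99) {
    echo 'would be treated as a duplicate', PHP_EOL;
}
```

Normalising before hashing keeps the speed of md5 for the simple cases; the similarity check is still needed when the spammer changes a few words.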
-
My first intention was to use this function, maybe i can add this as an option, so the user can choose between both
thx dev101
-
Btw, i got 2 more spam posts that were not caught today.
I´ll wait for your debug version.
Thanks
-
@Liath,
I know I have asked this before, but why is this plugin still not available on Osclass Market?
It is hard for users to find it, since we are not even allowed to post any links here (go figure ...).
Thanks !
-
I have tried many automated script and word filtering. Still some dude from third world countries are sitting and copy pasting contents like they got diarrhea.
The easy way is to piss them off by adding the waiting time like 5 mins between each post.
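A waiting time between posts is simple to express in code. The sketch below is only an assumption of how it could be done: sp_seconds_to_wait() is a hypothetical name, and where the time of the user's last post comes from (database, session) is left open.

```php
<?php
// Returns how many seconds the user still has to wait before posting
// again, or 0 when posting is allowed. Times are Unix timestamps.
function sp_seconds_to_wait($lastPostTime, $now, $minInterval = 300)
{
    if ($lastPostTime === null) {
        return 0; // first post, nothing to wait for
    }
    $elapsed = $now - $lastPostTime;
    return $elapsed >= $minInterval ? 0 : $minInterval - $elapsed;
}

$wait = sp_seconds_to_wait(time() - 60, time()); // last post one minute ago
if ($wait > 0) {
    echo 'Please wait ', $wait, ' more seconds before posting again.', PHP_EOL;
}
```

The default of 300 seconds corresponds to the 5 minutes mentioned above.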
-
I have tried many automated script and word filtering. Still some dude from third world countries are sitting and copy pasting contents like they got diarrhea.
The easy way is to piss them off by adding the waiting time like 5 mins between each post.
Many ? But there aren't many available.
Also, if you filter correctly and carefully, you cut down most of them. Of course nothing will solve spam 100%; some administration is always needed.
-
@Liath,
I know I have asked this before, but why is this plugin still not available on Osclass Market?
It is hard for users to find it, since we are not even allowed to post any links here (go figure ...).
Thanks !
I hope it will be reviewed successfully soon
-
I've also been bombarded with duplicates recently, BUT from different email accounts.
So I was thinking: it is hard to check in realtime (when the ad is posted) for TITLE and DESCRIPTION duplicates in the WHOLE database.
BUT it can be done in chunks by a CRON job.
How about it?
-
real-time is easy from coding perspective, all you have to do is just scan entire database each time new item is posted. From server side, it could be very challenging, depending on your # of items and if server is not up to the task.
md5() is a very fast function, but not very reliable for duplicate detection, as we've seen above. On average, it is 750-1000 times faster in execution than the similarity function proposed above (so, for example, it may take 3 seconds to scan 1000 items with similarity, and only 0.003 seconds with md5). The values are indicative (tested on various systems running PHP 5.4 ~ 7.x; they also depend greatly on the length of the content being compared). Your database server will add to the overall delay in execution, plus other traffic/tasks as well.
real-time, for that matter, can be also very dangerous, causing entire server to hang while it does the work, possibly even crashing it.
And cron solution is harder to implement in the code, because you need to keep track of each comparison job, number of scanned items etc. Also, if you compare something to 1000 items, you will still not get the complete picture if it is duplicate or not.
Another clever idea is to limit scanning only to the N latest items, that should greatly speed up the execution, but may be blind for older items.
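Limiting the scan to the N latest items could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name, the item format (an array of id/text pairs, newest first) and the default values are made up, and in a real plugin the items would come from a database query with ORDER BY and LIMIT instead of an array.

```php
<?php
// Compare a new text against at most $limit of the latest items.
// $latestItems is assumed to be ordered newest first.
// Returns the id of the first matching item, or null.
function sp_find_duplicate_in_latest($newText, array $latestItems, $limit = 1000, $threshold = 99.0)
{
    foreach (array_slice($latestItems, 0, $limit) as $item) {
        // Cheap test first: an exact copy (what the md5 comparison detects).
        if ($item['text'] === $newText) {
            return $item['id'];
        }
        // Slower fuzzy test only when the cheap one did not match.
        similar_text($newText, $item['text'], $percent);
        if ($percent >= $threshold) {
            return $item['id'];
        }
    }
    return null;
}

$items = array(
    array('id' => 3, 'text' => 'Red bike for sale'),
    array('id' => 2, 'text' => 'Blue car'),
    array('id' => 1, 'text' => 'Old sofa, good condition'),
);

var_dump(sp_find_duplicate_in_latest('Red bike for sale', $items));
// With a limit of 2 the oldest item is never looked at.
var_dump(sp_find_duplicate_in_latest('Old sofa, good condition', $items, 2));
```

The second call shows the trade-off mentioned above: the scan is fast, but it is blind to anything older than the last N items.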
-
@dev101
Yes, that was what I meant: a realtime check is very resource-consuming for the server.
As for the CRON check, I believe what you say about the coding difficulty. But I don't get why the results would not be correct. What do you mean? If you mean 100% correct, of course not. But what is the difference in CHECKING accuracy between realtime and a CRON cleanup?
Also, I think Osclass already uses CRON chunks to count regions etc.? Could that help to simplify the tracking? I mean, insert something in there and let Osclass do the tracking of the items checked?
-
Say, you set a limit to scan/compare with 1000 other items every hour. So, new item is posted, initial scanning batch begins, you scan with first* 1000 items, no duplicates found. But, what if duplicates are contained in the rest of the items you have (say, total of 20k site-wide)? This is what I meant if you split the scanning into batches, you might not be sure if it is a dupe or not.
* this is actually why you maybe don't need entire scanning in the end, only small sample of, say, latest 1000 items, as the probability for duplicates is much higher in this case, than scanning items from past weeks or months.
-
Say, you set a limit to scan/compare with 1000 other items every hour. So, new item is posted, initial scanning batch begins, you scan with first* 1000 items, no duplicates found. But, what if duplicates are contained in the rest of the items you have (say, total of 20k site-wide)? This is what I meant if you split the scanning into batches, you might not be sure if it is a dupe or not.
Ok, then the new ads will be scanned in the next few CRON runs. What is wrong with that, apart from some delay? Could it be that .... some ads will never be scanned?
-
Well, how do you exactly imagine cron chunking? What is your idea about it?
* * *
Addition to previous thoughts/ideas: when you find duplicates (say, more than 5 - that is an alarming rate), user account should be temporarily suspended, to stop posting.
Also, the scanning algorithm should ignore whether an item is active, spam or blocked, or comes from a different account, because blocked items contain precious information about potential spam. If you only compare new items with fresh content and, for example, a spammer registers one last time and posts again, that last item might be missed if it is only compared with the account's own items (the new account will have only 1 item in total).
-
Well, how do you exactly imagine cron chunking? What is your idea about it?
As i said, Osclass already uses chunk CRONS to count (?) regions/states or something. 5000 per CRON if i remember (i actually asked for chunks and Daniel was kind to do it a few years back because i was out of memory back then).
So could we inject something in there ?
If not, then in my case an hourly cron check of the latest ads against the db could do the job. I don't have that many ads posted daily, maybe 100 or 150.
For checking the complete amount of ads against each other, I will have to think about it.
-
Yeah, but those are FIXED numbers, you don't really change your regions and cities in a daily fashion, do you?
This is why that algorithm is much simpler, you scan first 1000 cities, then next, simply storing the multiplication index into db.
$limit = max(1000, ceil($total_cities/22));
And items are live: they constantly change, new ones are added, old ones get deleted or expire, indexes pile up etc. Also, you need to store the scanning shift index for every item separately. Say you publish a new item, compare it to the latest N items and store the position where you stopped. By the next run, new items have already been published, so you can go either way.
Again, if we limit the scan to the latest items only, things become much simpler.
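For the simple case of scanning a list in batches and remembering where the last run stopped, a sketch could look like this. It is an illustration only: sp_next_batch() is a hypothetical name, the list of ids is passed in as an array, and where the offset is persisted between cron runs (a plugin table, a preference) is left open.

```php
<?php
// Returns the ids to scan in this run and the offset to store for the
// next run. The offset wraps around to 0 at the end of the list.
function sp_next_batch(array $itemIds, $offset, $batchSize)
{
    $total = count($itemIds);
    if ($total === 0) {
        return array('ids' => array(), 'next_offset' => 0);
    }
    if ($offset >= $total) {
        $offset = 0; // the list shrank since the last run, start over
    }
    $ids  = array_slice($itemIds, $offset, $batchSize);
    $next = $offset + count($ids);

    return array(
        'ids'         => $ids,
        'next_offset' => $next >= $total ? 0 : $next,
    );
}

$ids    = range(1, 5);
$offset = 0;
// Three simulated cron runs with a batch size of 2.
for ($run = 1; $run <= 3; $run++) {
    $batch  = sp_next_batch($ids, $offset, 2);
    $offset = $batch['next_offset'];
    echo 'run ', $run, ': ', implode(',', $batch['ids']), PHP_EOL;
}
```

This works well for fixed lists such as regions and cities. For items the list changes between runs, which is exactly the complication described above: a single offset is no longer enough to know what has been compared with what.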
-
Again, if we limit the scan to the latest items only, things become much simpler.
Well, duplicate ads usually come within a fixed period of time. Someone is hiring people to post the same (more or less) ad. So checking OLD ads (= more than ... say ... a month old) could be useless.
So maybe Latest items could mean within a period of time, maybe a month or two ?
-
Well, not exactly useless, but less harmful, sure.
Say, you have a user who tries to "re-publish" (post again) his same item over and over again (every month or so, and given your site new items frequency of 100-150/day, it will be missed, if the limit is set to 1000 latest only). So, if you wish to prevent even those, another function - low priority one mind you - could run via cron, to scan items from particular users only and, well, block either the old ones, or new ones (per your preference).
The key point is, of course, to stop new "immediate" duplicates, and this can also be relatively simple to implement, just limit the scan to latest N items and you are safe. Another function, low priority one, could be split to check users with IDs, say from 1 to 1000, then next batch, then next, and split them around a full 24 hours period.
That should be more acceptable, performance wise.
-
One /off topic/ question - do you use noCaptcha reCaptcha? Also, do you block cloud services somehow? Maybe CloudFlare's filtering? Strange that you get so much spam, but is it mostly human-driven?
I use ZB Block (as per your old suggestion :) ) for years now. There are just 2 minor changes you need to make to be PHP 7 compatible (http://www.spambotsecurity.com/forum/viewtopic.php?p=24722&sid=37a43c24066ee229aa0e82226ed2568d#p24722), and with latest forks for the rules @ github by Maikuolan (zbb-dirty-30 (https://github.com/Maikuolan/zbb-dirty30-fork), zbb-badip-fork (https://github.com/Maikuolan/zbb-badip-fork), and CIDRAM (https://github.com/Maikuolan/CIDRAM) (newer ip filter), it works really great. With some scripting wizardry, setting up auto-updates from github can put you at a complete ease of mind :)
-
One /off topic/ question - do you use noCaptcha reCaptcha? Also, do you block cloud services somehow? Maybe CloudFlare's filtering? Strange that you get so much spam, but is it mostly human-driven?
I use ZB Block (as per your old suggestion :) ) for years now. There are just 2 minor changes you need to make to be PHP 7 compatible (http://www.spambotsecurity.com/forum/viewtopic.php?p=24722&sid=37a43c24066ee229aa0e82226ed2568d#p24722), and with latest forks for the rules @ github by Maikuolan (zbb-dirty-30 (https://github.com/Maikuolan/zbb-dirty30-fork), zbb-badip-fork (https://github.com/Maikuolan/zbb-badip-fork), and CIDRAM (https://github.com/Maikuolan/CIDRAM) (newer ip filter), it works really great. With some scripting wizardry, setting up auto-updates from github can put you at a complete ease of mind :)
I don't use ZB Block any more for my Osclass and Wordpress sites.
For Wordpress i use other better plugins.
For Osclass it blocks some human posters so now i don't block anything. I prefer to check for bad words and duplicates.
-
sorry...
I didn't read and understand it all... is there a better solution to compare title/description?
I think letting the admin choose between md5 (fast/many false results) and similarity (slow/fewer false results) would be the best way. I don't see another good solution
-
How about a function that checks the entire DB for duplicates manually (irrespective of email addresses).
Or maybe it can run at a particular time, say everyday at 3:00 am (the timer could be set in the admin panel).
Or maybe as a queued job whose limit can be set by the admin?
-
sorry...
I didn't read and understand it all... is there a better solution to compare title/description?
I think letting the admin choose between md5 (fast/many false results) and similarity (slow/fewer false results) would be the best way. I don't see another good solution
We are talking about checking for duplicates across different emails (accounts), and not in REALTIME but via a cleanup/CRON.
-
@Liath Any news on a new version containing: 1 (http://forums.osclass.org/general-help/duplicate-posts/msg147867/#msg147867), 2 (http://forums.osclass.org/general-help/duplicate-posts/msg148050/#msg148050), 3 (http://forums.osclass.org/general-help/duplicate-posts/msg148025/#msg148025), 4 (http://forums.osclass.org/general-help/duplicate-posts/msg148061/#msg148061)?
Thanks!
-
but please be patient, i'm very busy at moment :-\
sorry... no
-
Plugin is available on OSClass - Market now: https://market.osclass.org/plugins/security/spam-protection_787 (https://market.osclass.org/plugins/security/spam-protection_787)
here is the new thread: http://forums.osclass.org/plugins/(plugin)-spam-protection/ (http://forums.osclass.org/plugins/(plugin)-spam-protection/)
-
I will test this on live server